【Question Title】: Writing a loop to create ggplot figures with different data sources and titles
【Posted】: 2015-10-21 13:09:11
【Question】:

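I have no experience with loops, but it looks like I need to write some to analyze my data properly. Could you show me how to build a simple loop around the code I have already written? Let's use a loop to produce these plots: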

pdf(file = sprintf("complex I analysis", tbl_comp_abu1), paper='A4r')

ggplot(df_tbl_data1_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data1 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(df_tbl_data2_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data2 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


ggplot(df_tbl_data3_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data3 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

dev.off()

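Now for what I actually want to achieve. I have 10 complexes to analyze, which means 10 PDF files should be created; the example above shows the plots for one complex (complex I) from three different datasets. To do this, the number in comp1 (from df_tbl_dataX_comp1) has to change from 1 to 10, depending on which complex we are plotting. The loop also has to change the name of the PDF file and the title of each plot... Is it possible to write a loop like this?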
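
The naming scheme described above can be generated with a nested loop. This is a minimal base-R sketch, assuming 10 complexes and 3 datasets whose objects follow the df_tbl_dataX_compY pattern. It only builds the object names, plot titles and file names (nothing is plotted), so the naming can be checked before the ggplot code is added; as.roman() converts 1 to "I" so the titles match "complex I".

```r
# Build every object name, plot title and PDF file name the loop needs,
# assuming 10 complexes x 3 datasets named df_tbl_data<d>_comp<c>.
n_complex  <- 10
n_datasets <- 3

rows <- list()
for (cx in seq_len(n_complex)) {
  roman <- as.character(as.roman(cx))                     # 1 -> "I", 10 -> "X"
  file_name <- sprintf("complex %s analysis.pdf", roman)  # one PDF per complex
  for (d in seq_len(n_datasets)) {
    rows[[length(rows) + 1]] <- data.frame(
      object = sprintf("df_tbl_data%d_comp%d", d, cx),
      title  = sprintf("Data%d - complex %s", d, roman),
      file   = file_name,
      stringsAsFactors = FALSE
    )
  }
}
plan <- do.call(rbind, rows)
head(plan, 3)
```

Each row of plan pairs an object name (for get()) with its title (for ggtitle()) and the PDF it belongs in.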

Data:

structure(list(Size_Range = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 
8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 
13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 
17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("10", 
"34", "59", "84", "110", "134", "165", "199", "234", "257", "362", 
"433", "506", "581", "652", "733", "818", "896", "972", "1039"
), class = "factor"), Abundance = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 142733.475, 108263.525, 98261.11, 649286.165, 
3320759.803, 3708515.148, 6691260.945, 30946562.92, 180974.3725, 
4530005.805, 21499827.89, 0, 15032198.54, 4058060.583, 0, 3842964.97, 
2544030.857, 0, 1640476.977, 286249.1775, 0, 217388.5675, 1252965.433, 
0, 1314666.05, 167467.8825, 0, 253798.15, 107244.9925, 0, 207341.1925, 
15755.485, 0, 71015.85, 14828.5075, 0, 25966.2325, 0, 0, 0, 0, 
0, 0), Gene_Name = c("AT1G01080", "AT1G01090", "AT1G01320", "AT1G01420", 
"AT1G01470", "AT1G01560", "AT1G01800", "AT1G02150", "AT1G02500", 
"AT1G02560", "AT1G02780", "AT1G02880", "AT1G02920", "AT1G02930", 
"AT1G03030", "AT1G03090", "AT1G03110", "AT1G03130", "AT1G03220", 
"AT1G03230", "AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680", 
"AT1G03870", "ATCG00420", "ATCG00470", "ATCG00480", "ATCG00490", 
"ATCG00500", "ATCG00650", "ATCG00660", "ATCG00670", "ATCG00740", 
"ATCG00750", "ATCG00842", "ATCG01100", "ATCG01030", "ATCG01114", 
"ATCG01665", "ATCG00770", "ATCG00780", "ATCG00800", "ATCG00810", 
"ATCG00820", "ATCG00722", "ATCG00744", "ATCG00855", "ATCG00853", 
"ATCG00888", "ATCG00733", "ATCG00766", "ATCG00812", "ATCG00821", 
"ATCG00856", "ATCG00830", "ATCG00900", "ATCG01060", "ATCG01110", 
"ATCG01120")), .Names = c("Size_Range", "Abundance", "Gene_Name"
), row.names = c(NA, -60L), class = "data.frame")

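【Comments】:

  • Is your data large? You could consider creating a named list of data frames (or even one big data frame) and using lapply or something similar.
  • They are not that big. It would be easy if I knew how to do that...
  • Another approach (if the plots don't have to go into separate files): save the different plots into a list and write the whole list to a single PDF, which gives you one page per plot. p = as.list(1:3); p[[1]] = ggplot(...) + ...; p[[2]] = ...; etc. Then pdf("plots.pdf", paper = "A4r"); p; dev.off()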
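
A minimal sketch of the two suggestions in the comments above: a named list of data frames processed with lapply(), and the resulting list of plots written to one multi-page PDF. The three toy data frames are hypothetical stand-ins for df_tbl_data1_comp1 to df_tbl_data3_comp1, and the list names Data1 to Data3 are reused as plot titles. The explicit print() matters: plots built inside lapply() or a loop are not drawn automatically.

```r
library(ggplot2)

# Hypothetical toy data standing in for df_tbl_data1_comp1 ... df_tbl_data3_comp1:
# two genes measured across four size ranges in each dataset.
set.seed(1)
make_toy <- function() {
  data.frame(
    Size_Range = factor(rep(c("10", "34", "59", "84"), times = 2),
                        levels = c("10", "34", "59", "84")),
    Abundance  = runif(8),
    Gene_Name  = rep(c("AT1G01080", "AT1G01090"), each = 4),
    stringsAsFactors = FALSE
  )
}
datasets <- list(Data1 = make_toy(), Data2 = make_toy(), Data3 = make_toy())

# Build one ggplot per list element; the list names become the titles.
plots <- lapply(names(datasets), function(nm) {
  ggplot(datasets[[nm]], aes(Size_Range, Abundance, group = Gene_Name)) +
    geom_line(aes(color = Gene_Name)) +
    ggtitle(paste(nm, "- complex I")) +
    theme(legend.title = element_blank(),
          axis.text.x = element_text(angle = 90, hjust = 1))
})

# Write all plots to a single PDF, one page per plot.
out_file <- file.path(tempdir(), "complex_I_analysis.pdf")
pdf(out_file, paper = "a4r")
invisible(lapply(plots, print))
dev.off()
```

For the 10 complexes, the same code can be wrapped in an outer loop that opens a new pdf() per complex.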

Tags: r ggplot2


【Solution 1】:

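This might work: start two loops, one iterating over the complexes and the other over the datasets. Then use paste0() or paste() to build the right file names and titles.

PS: I haven't tested the code because I don't have your data, but it should give you the idea.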

library(ggplot2)

#loop over complexes
for (c in 1:10) {

    #create one pdf for every complex
    pdf(file = paste0("complex", c, "_analysis.pdf"), paper='A4r')

    #loop over datasets
    for(d in 1:3) {

        #ggplot objects are not drawn automatically inside a loop, so print() each one
        print(
            ggplot(get(paste0("df_tbl_data",d,"_comp",c)), aes(Size_Range, Abundance, group=factor(Gene_Name))) +
              theme(legend.title=element_blank()) +
              geom_line(aes(color=factor(Gene_Name))) +
              ggtitle(paste0("Data",d," - complex ",c)) +
              theme(axis.text.x = element_text(angle = 90, hjust = 1))
        )
    }
    dev.off()

}

【Comments】:

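  • It creates the files but without any extension (I mean, without the "pdf" extension). Even if I rename them to .pdf manually, the files won't open.
  • @ShaxiLiver What is tbl_comp_abu1?
  • @ShaxiLiver I made a small change; hopefully it works now. I can't really tell what the problem is because I don't have your data. Are there any error messages?
  • Wouldn't it be easiest to use ggsave inside the loop? You have to give your plot a name. This worked well for my plots. Put something like this inside the loop: ggsave(filename=paste("complex",c,"analysis.pdf",sep=""), plot=myplot)
  • Try wrapping the ggplot call in a print statement.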
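
Following up on the ggsave and print suggestions: inside a for loop a ggplot object is only created, not drawn, so with pdf() nothing reaches the file unless it is printed (which would likely explain a PDF that won't open). ggsave() sidesteps this because it draws the plot passed as plot =. A minimal sketch, assuming the df_tbl_dataX_compY naming from the question; the two toy data frames are hypothetical stand-ins, and get0() is used so that missing combinations are skipped instead of raising an error. Unlike Solution 1, this writes one file per plot rather than one multi-page PDF per complex.

```r
library(ggplot2)

# Hypothetical stand-ins: only datasets 1 and 2 of complex 1 exist here.
set.seed(2)
toy <- function() data.frame(
  Size_Range = factor(rep(c("10", "34", "59"), 2), levels = c("10", "34", "59")),
  Abundance  = runif(6),
  Gene_Name  = rep(c("AT1G01080", "AT1G01090"), each = 3)
)
df_tbl_data1_comp1 <- toy()
df_tbl_data2_comp1 <- toy()

saved <- character(0)
for (cx in 1:10) {
  for (d in 1:3) {
    # get0() returns NULL instead of an error when the object does not exist
    obj <- get0(paste0("df_tbl_data", d, "_comp", cx))
    if (is.null(obj)) next
    p <- ggplot(obj, aes(Size_Range, Abundance, group = Gene_Name)) +
      geom_line(aes(color = Gene_Name)) +
      ggtitle(paste0("Data", d, " - complex ", cx))
    f <- file.path(tempdir(), paste0("complex", cx, "_data", d, ".pdf"))
    # ggsave() draws p itself, so no print() is needed; 11.69 x 8.27 in is A4 landscape
    ggsave(filename = f, plot = p, width = 11.69, height = 8.27)
    saved <- c(saved, f)
  }
}
basename(saved)
```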

【Solution 2】:

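After writing this answer, I realized it doesn't address your actual question about loops. Still, I hope it shows you a different way to approach the underlying problem (and I didn't want to waste the work).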

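I couldn't get your plot to work with the data you posted. There are 60 unique gene names in a 60-row data frame, so when you make a geom_line and group by gene (aes(group=Gene_Name)), each line has only one point. You need at least two points to draw a line.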
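
The one-point-per-gene problem can be checked before plotting by counting rows per group. A small sketch with a hypothetical helper, points_per_gene(), run on the first three rows of the posted data; applied to the full posted data frame, it should report 60 genes with one row each, as described above.

```r
# Hypothetical helper: how many rows (points) does each gene contribute?
# geom_line() needs at least two points per group to draw a line.
points_per_gene <- function(df, group_col = "Gene_Name") {
  table(df[[group_col]])
}

# The first three rows of the posted data: every row has its own gene name
toy <- data.frame(
  Size_Range = factor(c("10", "10", "10")),
  Abundance  = c(0, 0, 0),
  Gene_Name  = c("AT1G01080", "AT1G01090", "AT1G01320")
)
counts <- points_per_gene(toy)
genes_with_lines <- names(counts)[counts >= 2]
length(genes_with_lines)  # 0: no gene has enough points for a line
```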

So I made up some data and ran the analysis on that.

library(ggplot2)

# Function to generate random data (needs the truncnorm and dplyr packages)
generate_data = function() {
  require(truncnorm)
  require(dplyr)

  gene_names = LETTERS[1:20]
  n_genes = length(gene_names)
  size_ranges = c(10, 34, 59, 84, 110, 134, 165, 199, 
                  234, 257, 362, 433, 506, 581, 652, 
                  733, 818, 896, 972, 1039)
  gene_size_means = rtruncnorm(n_genes, 10, 1000, 550, 300)
  genes_in_complex = rbinom(n_genes, 1, 0.3)
  true_variance = 50
  gene_size_variances = rchisq(n_genes, n_genes-1) * (true_variance/(n_genes-1))
  df = data.frame(gene_name=gene_names, 
                  gene_mean=gene_size_means, 
                  gene_var=gene_size_variances,
                  in_complex=genes_in_complex)
  df = df %>% group_by(gene_name) %>% 
    do(data.frame(size_ranges, 
                  abundance=dnorm(size_ranges, .$gene_mean, .$gene_var)*.$in_complex))
  return(df)
}

# Generate a list of tables. Each table is for one data set for one complex
data_tables = list()
n_comps = 3
for( complex_i in 1:2 ) {
  for( comp_j in 1:n_comps ) {
    loop_df = generate_data()
    loop_df$comp = comp_j
    loop_df$complex = complex_i
    data_tables = c(data_tables, list(loop_df))
  }
}

# Concatenate the tables into a larger data frame
dat = do.call(rbind, data_tables)

# Make one plot for complex 1, with one panel per data set
dat_complex1 = subset(dat, complex==1)
p = ggplot(dat_complex1, aes(x=size_ranges, y=abundance, color=gene_name, group=gene_name)) +
  geom_line() + 
  facet_wrap(~comp, ncol=1)
print(p)

# Make one plot with a subpanel for every complex/data set combination
# (rows: data sets, columns: complexes)
print(p %+% dat + facet_grid(comp ~ complex))
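
To connect this long-format approach back to the loop question: with all data in one table, the loop only has to run over the complexes, writing one faceted PDF each. A minimal sketch on a hypothetical toy table with the same columns as dat above (size_ranges, abundance, gene_name, comp, complex); the file names and A4 portrait size are illustrative choices.

```r
library(ggplot2)

# Hypothetical long-format table with the same columns as dat above:
# 2 complexes x 3 data sets (comp) x 2 genes x 4 size ranges = 48 rows.
set.seed(3)
dat_long <- expand.grid(size_ranges = c(10, 34, 59, 84),
                        gene_name   = c("A", "B"),
                        comp        = 1:3,
                        complex     = 1:2,
                        stringsAsFactors = FALSE)
dat_long$abundance <- runif(nrow(dat_long))

out_files <- character(0)
for (cx in sort(unique(dat_long$complex))) {
  # one faceted plot (one panel per data set) per complex
  p <- ggplot(dat_long[dat_long$complex == cx, ],
              aes(size_ranges, abundance, color = gene_name, group = gene_name)) +
    geom_line() +
    facet_wrap(~comp, ncol = 1) +
    ggtitle(paste("Complex", cx))
  f <- file.path(tempdir(), paste0("complex", cx, "_analysis.pdf"))
  ggsave(f, plot = p, width = 8.27, height = 11.69)  # A4 portrait, in inches
  out_files <- c(out_files, f)
}
basename(out_files)
```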

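So you're studying protein complexes in Arabidopsis? If anyone is familiar with your field, one sentence of background might help them answer your question. Alternatively, a picture of the desired output might help. Also, more complete sample data and/or screenshots might attract more interest in future posts.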

【Comments】:

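【Solution 3】:

Have a look at this approach. It relies on a data.frame (dat) that holds the dataset names, the plot titles and the file names.

First I create a function that builds the plot and saves it, then I call that function in a for loop and, alternatively, with apply (apply is more compact, but it is not meaningfully faster than a for loop here, since drawing and saving each plot dominates the run time).

The code looks like this: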

library(ggplot2)

# create a custom function for ggplot,
# which creates the plot and then saves it as a pdf
custom_ggplot_function <- function(input.data.name, graph.title, f.name){
  # get(input.data.name) looks up the data frame whose name is stored as a
  # string in input.data.name
  p <- ggplot(get(input.data.name), aes(Size_Range, Abundance, group=factor(Gene_Name))) +
    theme(legend.title=element_blank()) +
    geom_line(aes(color=factor(Gene_Name))) +
    ggtitle(graph.title)+
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

  # ggsave() draws p itself, so no print() is needed inside the loop
  ggsave(filename = paste0(f.name, ".pdf"), plot = p)
  NULL
}

# dat contains the names of your datasets, the titles of the graphs and the file names.
# stringsAsFactors = FALSE keeps the columns as character strings: get() does not
# accept factors, which data.frame() created by default before R 4.0.0
dat <- data.frame(df.names = c("df_tbl_data1_comp1",
                               "df_tbl_data2_comp1"),
                  graph.titles = c("Data1 - Complex I",
                                   "Data2 - Complex I"),
                  file.names = c("file1", "file2"),
                  stringsAsFactors = FALSE)
# Instead of typing the columns out, you can generate them, e.g. for complex 1:
# df.names     = paste0("df_tbl_data", 1:3, "_comp1") and
# graph.titles = paste0("Data", 1:3, " - Complex I")

# loop through the rows of dat
for (i in seq_len(nrow(dat))) {
  custom_ggplot_function(input.data.name = dat[i, "df.names"],
                         graph.title = dat[i, "graph.titles"],
                         f.name = dat[i, "file.names"])
}

# or using the apply function (apply() turns dat into a character matrix first)
invisible(apply(dat, 1, function(row.el) {
  custom_ggplot_function(input.data.name = row.el["df.names"],
                         graph.title = row.el["graph.titles"],
                         f.name = row.el["file.names"])
}))
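
For the full 3 datasets x 10 complexes, the dat table can be generated instead of typed, and Map() (a wrapper around mapply()) can walk its columns in parallel without the character-matrix conversion that apply() performs. A sketch under these assumptions: expand.grid() builds every combination, and describe_plot() is a hypothetical stand-in for custom_ggplot_function() that only returns the arguments it would receive, so the sketch runs without the real data.

```r
# Full plan: 3 datasets x 10 complexes (names follow the question's pattern).
grid <- expand.grid(d = 1:3, cx = 1:10)
dat <- data.frame(
  df.names     = sprintf("df_tbl_data%d_comp%d", grid$d, grid$cx),
  graph.titles = sprintf("Data%d - Complex %s", grid$d,
                         as.character(as.roman(grid$cx))),
  file.names   = sprintf("complex%d_data%d", grid$cx, grid$d),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for custom_ggplot_function(); swap in the real one.
describe_plot <- function(input.data.name, graph.title, f.name) {
  paste0(input.data.name, " -> '", graph.title, "' -> ", f.name, ".pdf")
}

# Map() calls the function once per row, passing matching column elements.
res <- Map(describe_plot,
           input.data.name = dat$df.names,
           graph.title     = dat$graph.titles,
           f.name          = dat$file.names)
res[[1]]
```

With the real function, the call becomes invisible(Map(custom_ggplot_function, input.data.name = dat$df.names, graph.title = dat$graph.titles, f.name = dat$file.names)).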
    

【Comments】:
