Writing a loop to create ggplot figures with different data sources and titles: answers

【Question title】: Writing a loop to create ggplot figures with different data sources and titles
【Posted】: 2015-10-21 13:09:11
【Question】:

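I have no experience with loops, but it looks like I need to write some to analyze my data properly. Could you show me how to build a simple loop around the code I have already written? Let's use a loop to produce some plots: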

pdf(file = sprintf("complex I analysis", tbl_comp_abu1), paper='A4r')

ggplot(df_tbl_data1_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data1 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(df_tbl_data2_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data2 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


ggplot(df_tbl_data3_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
  theme(legend.title=element_blank()) +
  geom_line(aes(color=factor(Gene_Name))) +
  ggtitle("Data3 - complex I")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

dev.off()

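Here is what I want to achieve. I have 10 complexes to analyze, so 10 PDF files should be created; the example above plots one complex from three different data sets. For this to work, the number in comp1 (from df_tbl_dataX_comp1) has to change from 1 to 10, depending on which complex is being plotted. The loop should also change the name of the PDF file and the title of each figure. Is it possible to write such a loop?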

Data:

structure(list(Size_Range = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 
8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 
13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 
17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("10", 
"34", "59", "84", "110", "134", "165", "199", "234", "257", "362", 
"433", "506", "581", "652", "733", "818", "896", "972", "1039"
), class = "factor"), Abundance = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 142733.475, 108263.525, 98261.11, 649286.165, 
3320759.803, 3708515.148, 6691260.945, 30946562.92, 180974.3725, 
4530005.805, 21499827.89, 0, 15032198.54, 4058060.583, 0, 3842964.97, 
2544030.857, 0, 1640476.977, 286249.1775, 0, 217388.5675, 1252965.433, 
0, 1314666.05, 167467.8825, 0, 253798.15, 107244.9925, 0, 207341.1925, 
15755.485, 0, 71015.85, 14828.5075, 0, 25966.2325, 0, 0, 0, 0, 
0, 0), Gene_Name = c("AT1G01080", "AT1G01090", "AT1G01320", "AT1G01420", 
"AT1G01470", "AT1G01560", "AT1G01800", "AT1G02150", "AT1G02500", 
"AT1G02560", "AT1G02780", "AT1G02880", "AT1G02920", "AT1G02930", 
"AT1G03030", "AT1G03090", "AT1G03110", "AT1G03130", "AT1G03220", 
"AT1G03230", "AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680", 
"AT1G03870", "ATCG00420", "ATCG00470", "ATCG00480", "ATCG00490", 
"ATCG00500", "ATCG00650", "ATCG00660", "ATCG00670", "ATCG00740", 
"ATCG00750", "ATCG00842", "ATCG01100", "ATCG01030", "ATCG01114", 
"ATCG01665", "ATCG00770", "ATCG00780", "ATCG00800", "ATCG00810", 
"ATCG00820", "ATCG00722", "ATCG00744", "ATCG00855", "ATCG00853", 
"ATCG00888", "ATCG00733", "ATCG00766", "ATCG00812", "ATCG00821", 
"ATCG00856", "ATCG00830", "ATCG00900", "ATCG01060", "ATCG01110", 
"ATCG01120")), .Names = c("Size_Range", "Abundance", "Gene_Name"
), row.names = c(NA, -60L), class = "data.frame")

【Discussion】:

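You could look at: stackoverflow.com/questions/23439266/… or stackoverflow.com/questions/11357139/…
Is your data large? You could consider creating a named list of data frames (or even one big data frame) and using lapply or something similar.
It's not that large. It would be easy if I knew how to do it...
Another approach, if the plots don't have to go into separate files: store the different plots in a list and write the whole list to a single PDF, which gives you one page per plot. p = as.list(1:3), p[[1]] = ggplot(...) + ..., p[[2]] = ... and so on, then pdf("plots.pdf", paper = "A4r"); p; dev.off().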
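The list-of-plots idea from the last comment can be sketched as follows. This sketch assumes that the three data sets for one complex are collected into a named list. The names `Data1` to `Data3` and the random toy data are hypothetical stand-ins for df_tbl_data1_comp1 and the other data frames. One detail matters here: evaluating `p` on its own only draws the plots when R auto-prints at the top level. When the code runs through `source()`, or inside a function or loop, each plot has to be passed to `print()` explicitly.

```r
library(ggplot2)

# Hypothetical toy data standing in for df_tbl_data1_comp1 ... df_tbl_data3_comp1
make_toy <- function() {
  data.frame(Size_Range = factor(rep(c("10", "34", "59"), times = 2),
                                 levels = c("10", "34", "59")),
             Abundance  = runif(6),
             Gene_Name  = rep(c("AT1G01080", "AT1G01090"), each = 3))
}
datasets <- list(Data1 = make_toy(), Data2 = make_toy(), Data3 = make_toy())

# Build one ggplot object per data set; the list names are reused in the titles
plots <- lapply(names(datasets), function(nm) {
  ggplot(datasets[[nm]], aes(Size_Range, Abundance, group = factor(Gene_Name))) +
    geom_line(aes(color = factor(Gene_Name))) +
    theme(legend.title = element_blank(),
          axis.text.x = element_text(angle = 90, hjust = 1)) +
    ggtitle(paste0(nm, " - complex I"))
})

# Print every plot explicitly so each one becomes a page of the same PDF
out_file <- file.path(tempdir(), "complex_I_analysis.pdf")
pdf(out_file, paper = "a4r")
invisible(lapply(plots, print))
dev.off()
```

With one such list per complex, wrapping the last four lines in an outer loop over complexes would give one multi-page PDF per complex.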

Tags: r ggplot2

【Solution 1】:

This might work: use two nested loops, one over the complexes and one over the data sets. Then use paste0() or paste() to build the correct file names and titles.
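The strings the loop needs can be built as shown below. The values `cx = 2` and `d = 3` are only example values. The Roman-numeral title is an assumption, added to match the question's "complex I" style. paste0() joins its pieces with no separator. paste() inserts a space by default, which is why the ggsave() comment further down passes `sep=""`.

```r
cx <- 2  # complex number (example value)
d  <- 3  # data set number (example value)

# Name of the data frame that get() would look up: "df_tbl_data3_comp2"
object_name <- paste0("df_tbl_data", d, "_comp", cx)

# Plot title with the complex as a Roman numeral: "Data3 - complex II"
plot_title <- paste0("Data", d, " - complex ", as.character(as.roman(cx)))

# One PDF per complex: "complex2_analysis.pdf"
pdf_file <- paste0("complex", cx, "_analysis.pdf")

# paste() separates its arguments with spaces: "complex 2 analysis.pdf"
spaced <- paste("complex", cx, "analysis.pdf")
```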

P.S.: I didn't test the code because I don't have your data, but it should give you the idea.

library(ggplot2)

#loop over complex
for (c in 1:10) {

    #create pdf for every complex
    pdf(file = paste0("complex", c, "analysis.pdf"), paper='A4r')

    #loop over datasets
    for(d in 1:3) {

    #plot; inside a for loop a ggplot object is not auto-printed,
    #so print() is needed to actually draw it into the open pdf
    print(ggplot(get(paste0("df_tbl_data",d,"_comp",c)), aes(Size_Range, Abundance, group=factor(Gene_Name))) +
      theme(legend.title=element_blank()) +
      geom_line(aes(color=factor(Gene_Name))) +
      ggtitle(paste0("Data",d," - complex ",c))+
      theme(axis.text.x = element_text(angle = 90, hjust = 1)))
    }
    dev.off()

}

【Discussion】:

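It creates the files, but without any extension (I mean no ".pdf" extension). Even if I rename them to .pdf manually, they won't open.
@ShaxiLiver What is tbl_comp_abu1?
@ShaxiLiver I made a small change; hopefully it works now. I can't really tell what the problem is because I don't have your data. Are there any error messages?
Wouldn't it be easiest to use ggsave in the loop? You have to give your plot a name. That worked well for my plots. Put something like this inside the loop: ggsave(filename=paste("complex",c,"analysis.pdf",sep=""), plot=myplot)
Try wrapping the ggplot call in a print statement.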
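Following the ggsave() suggestion in the comments, here is a minimal sketch of that variant. It writes one file per plot rather than one multi-page PDF per complex. It assumes the data frames follow the question's `df_tbl_data<d>_comp<c>` naming scheme. The small random tables created with `assign()` are hypothetical toy data, there only so the loop can run. The page size (11.69 × 8.27 in, roughly A4 landscape) is also an assumption.

```r
library(ggplot2)

# Hypothetical toy data: 2 complexes x 3 data sets, named like the question's objects
for (cx in 1:2) for (d in 1:3) {
  assign(paste0("df_tbl_data", d, "_comp", cx),
         data.frame(Size_Range = factor(rep(c("10", "34", "59"), 2),
                                        levels = c("10", "34", "59")),
                    Abundance  = runif(6),
                    Gene_Name  = rep(c("geneA", "geneB"), each = 3)))
}

out_dir <- tempdir()
for (cx in 1:2) {
  for (d in 1:3) {
    p <- ggplot(get(paste0("df_tbl_data", d, "_comp", cx)),
                aes(Size_Range, Abundance, group = factor(Gene_Name))) +
      geom_line(aes(color = factor(Gene_Name))) +
      theme(legend.title = element_blank(),
            axis.text.x = element_text(angle = 90, hjust = 1)) +
      ggtitle(paste0("Data", d, " - complex ", cx))
    # ggsave() renders the plot itself, so no print() is needed here
    ggsave(file.path(out_dir, paste0("complex", cx, "_data", d, ".pdf")),
           plot = p, width = 11.69, height = 8.27, units = "in")
  }
}
```

Choosing between this and print() inside pdf()/dev.off() mainly decides whether the plots of one complex end up as pages of one file or as separate files.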

【Solution 2】:

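After writing this answer, I realized it doesn't address your actual question about loops. Still, I hope it shows you a different way to solve the underlying problem (and I didn't want to waste the work).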

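I couldn't get your plot to work with the data you posted. Your 60-row data frame contains 60 unique gene names. So when you draw geom_line grouped by gene (aes(group=Gene_Name)), each line has only one point, and you need at least two points to draw a line.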
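A quick way to check this before plotting is to count the rows per group. `groups_without_line()` is a hypothetical helper written for illustration. The toy data frame mimics the posted data, in which every gene occurs exactly once.

```r
# Hypothetical helper: returns the groups with fewer than two rows,
# i.e. the groups for which geom_line() has nothing to connect
groups_without_line <- function(df, group_col) {
  counts <- table(df[[group_col]])
  names(counts)[counts < 2]
}

# Toy example shaped like the posted data: each gene appears only once
toy <- data.frame(Size_Range = c("10", "10", "34"),
                  Abundance  = c(0, 1.5, 2),
                  Gene_Name  = c("AT1G01080", "AT1G01090", "AT1G01320"))

groups_without_line(toy, "Gene_Name")  # all three gene names
```

Run on the data frame from the question, the helper should return all 60 gene names.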

So I simulated some data and ran the analysis on that.

# ggplot2 is needed for the plots below; truncnorm and dplyr for the simulation
library(ggplot2)

# Function to generate random data
generate_data = function() {
  require(truncnorm)
  require(dplyr)

  gene_names = LETTERS[1:20]
  n_genes = length(gene_names)
  size_ranges = c(10, 34, 59, 84, 110, 134, 165, 199, 
                  234, 257, 362, 433, 506, 581, 652, 
                  733, 818, 896, 972, 1039)
  gene_size_means = rtruncnorm(n_genes, 10, 1000, 550, 300)
  genes_in_complex = rbinom(n_genes, 1, 0.3)
  true_variance = 50
  gene_size_variances = rchisq(n_genes, n_genes-1) * (true_variance/(n_genes-1))
  df = data.frame(gene_name=gene_names, 
                  gene_mean=gene_size_means, 
                  gene_var=gene_size_variances,
                  in_complex=genes_in_complex)
  df = df %>% group_by(gene_name) %>% 
    do(data.frame(size_ranges, 
                  abundance=dnorm(size_ranges, .$gene_mean, .$gene_var)*.$in_complex))
  return(df)
}

# Generate a list of tables. Each table is for one data set for one complex
data_tables = list()
n_comps = 3
for( complex_i in 1:2 ) {
  for( comp_j in 1:n_comps ) {
    loop_df = generate_data()
    loop_df$comp = comp_j
    loop_df$complex = complex_i
    data_tables = c(data_tables, list(loop_df))
  }
}

# Concatenate the tables into a larger data frame
dat = do.call(rbind, data_tables)

# Make a plot with one panel per data set for complex 1
dat_complex1 = subset(dat, complex==1)
p = ggplot(dat_complex1, aes(x=size_ranges, y=abundance, color=gene_name, group=gene_name)) +
  geom_line() + 
  facet_wrap(~comp, ncol=1)
print(p)

# Make a plot with many subpanels for all complexes and data sets
p %+% dat + facet_grid(comp~complex) # rows: data sets, columns: complexes
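To connect this long-format approach back to the loop the question asks for, here is one possible sketch. It splits the combined table by complex and saves one PDF per complex, with the data sets stacked as panels. The small `expand.grid()` table is hypothetical toy data that reuses the column names of the simulated `dat` above. The portrait A4-like page size is an assumption.

```r
library(ggplot2)

# Hypothetical long-format table with the same columns as the simulated `dat`
dat <- expand.grid(size_ranges = c(10, 34, 59, 84),
                   gene_name   = c("A", "B"),
                   comp        = 1:3,
                   complex     = 1:2)
dat$abundance <- runif(nrow(dat))

out_dir <- tempdir()
by_complex <- split(dat, dat$complex)   # list named "1", "2", ...
for (cx in names(by_complex)) {
  p <- ggplot(by_complex[[cx]],
              aes(x = size_ranges, y = abundance, color = gene_name, group = gene_name)) +
    geom_line() +
    facet_wrap(~comp, ncol = 1) +
    ggtitle(paste("Complex", cx))
  # One PDF per complex, each data set in its own panel
  ggsave(file.path(out_dir, paste0("complex", cx, "_analysis.pdf")),
         plot = p, width = 8.27, height = 11.69, units = "in")
}
```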

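So you're studying protein complexes in Arabidopsis? If you add a sentence of background, someone familiar with your field may be better able to answer. A picture of the desired output could also help. More complete sample data and/or screenshots may attract more interest in future posts, too.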

【Discussion】:

【Solution 3】:

Take a look at this approach. It relies on a data.frame (dat) that holds the names of the data sets, the graph titles and the file names.

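First I write a function that creates the plot and saves it, then I call that function from a for loop and, alternatively, from apply. apply is more concise, although it is not actually faster than a for loop, since apply() itself loops in R.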

The code looks like this:

# create a custom function for ggplot, 
# which creates the plot and then saves it as a pdf
custom_ggplot_function <- function(input.data.name, graph.title, f.name){
  # get(input.data.name) gets you the variable which is stored as a string in
  # input.data.name

  p <- ggplot(get(input.data.name), aes(Size_Range, Abundance, group=factor(Gene_Name))) +
    theme(legend.title=element_blank()) +
    geom_line(aes(color=factor(Gene_Name))) +
    ggtitle(graph.title)+
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

  ggsave(filename = paste0(f.name, ".pdf"), plot = p)
  NULL
}

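# dat contains the names of your datasets, the titles of the graphs and filenames;
# stringsAsFactors = FALSE keeps the names as character strings, because get()
# does not accept a factor (the default for string columns before R 4.0.0)
dat <- data.frame(df.names = c("df_tbl_data1_comp1",
                              "df_tbl_data2_comp1"),
                  graph.titles = c("Data1 - Complex I",
                                   "Data2 - Complex I"),
                  file.names = c("file1", "file2"),
                  stringsAsFactors = FALSE)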
# If you create your data.frame dat, you can also say 
# df.names  = paste0("df_tbl_data", 1:10, "_comp1") and
# graph.titles = paste0("Data", 1:10, " - Complex ", 1:10)     


# loop through the rows of dat
for (i in 1:nrow(dat)) {
  custom_ggplot_function(input.data.name = dat[i, "df.names"],
                         graph.title = dat[i, "graph.titles"], 
                         f.name = dat[i, "file.names"])
}

# or using the apply function
apply(dat, 1, function(row.el) {
  custom_ggplot_function(input.data.name = row.el["df.names"], 
                         graph.title = row.el["graph.titles"], 
                         f.name = row.el["file.names"])
})
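Filling `dat` by hand gets tedious for 10 complexes × 3 data sets. One way to generate all 30 rows is `expand.grid()` plus `paste0()`, sketched below. The helper columns `d` and `cx` and the file-name pattern are assumptions for illustration. `paste0()` always returns character vectors, so `get()` receives plain strings.

```r
n_data    <- 3
n_complex <- 10

# One row per data set / complex combination (3 x 10 = 30 rows)
dat <- expand.grid(d = 1:n_data, cx = 1:n_complex)
dat$df.names     <- paste0("df_tbl_data", dat$d, "_comp", dat$cx)
dat$graph.titles <- paste0("Data", dat$d, " - Complex ", as.character(as.roman(dat$cx)))
dat$file.names   <- paste0("data", dat$d, "_complex", dat$cx)

head(dat[, c("df.names", "graph.titles", "file.names")], 3)
```

The rows can then be passed to the function above without apply()'s matrix conversion, e.g. `Map(custom_ggplot_function, dat$df.names, dat$graph.titles, dat$file.names)`.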

【Discussion】: