【Question title】: How to make multiple boxplots by two different groups in one graph?
【Posted】: 2019-08-14 03:49:52
【Question description】:

Part of the dataset looks like this:

  Treatment   Status     gene1    gene2
1      Both Deceased  3.1934860 63.8697194   
2      Both Deceased  0.0000000 11.3436426   
3     Chemo Deceased  7.2186817 35.0621681   
4      Both Deceased  7.2186817 23.7185255   
5     Chemo Deceased  0.8049256 17.7083638   
6     Chemo Censored  0.8250437  0.8250437   
7     Chemo Censored  3.4136505 23.895533   
8     Radio Censored  0.9428735  4.7143673   
9      None Censored  3.3001750 10.7255686   

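I want to compare the expression of each gene between "Deceased" and "Censored" within each treatment. So far I can only plot one gene at a time, like this: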

library(ggpubr)  # provides ggboxplot()
ggboxplot(df, x = "Treatment", y = "gene1", fill = "Status")

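Is there a way to combine the boxplots for both genes in one graph? Or is there a better way to show how the expression levels of these genes differ between Deceased and Censored within each group?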

【Comments】:

  • Use melt and facets

Tags: r ggplot2 reshape boxplot


【Solution 1】:

We can use boxplot() in base R; we first need reshape() to get the data into long format.

boxplot(gene ~ Status + time + Treatment, 
        reshape(cbind(id=rownames(dat), dat), 4:5, sep="", direction="long"), 
        border=1:2)
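
To see what the reshape() call above hands to boxplot(), here is a minimal sketch on a made-up four-row data frame. The name `toy` and its values are hypothetical and only serve as illustration. With sep="", reshape() splits gene1 and gene2 into a value column `gene` and a `time` column holding the numeric suffix.

```r
# Hypothetical toy data, only to illustrate the wide-to-long step
toy <- data.frame(
  Treatment = c("Both", "Both", "Chemo", "Chemo"),
  Status    = c("Deceased", "Censored", "Deceased", "Censored"),
  gene1     = c(3.2, 0.8, 7.2, 3.4),
  gene2     = c(63.9, 0.8, 35.1, 23.9)
)

# Add an id column so that columns 4:5 are gene1 and gene2, as in the answer
wide <- cbind(id = rownames(toy), toy)

# varying = 4:5 with sep = "" splits "gene1"/"gene2" into name "gene" and time 1/2
long <- reshape(wide, varying = 4:5, sep = "", direction = "long")

print(long)  # columns: id, Treatment, Status, time, gene
```

Each original row now appears twice, once per gene, which is why `time` can be used as a grouping term in the boxplot formula.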

However, this produces a very crowded plot. We can instead draw separate boxplots, e.g. one per treatment group, using sapply():

par(mfrow=c(2, 2))
sapply(unique(dat$Treatment), function(x) {
  boxplot(value ~ Status + gene, 
          reshape(cbind(id=rownames(dat[dat$Treatment == x, ]), dat[dat$Treatment == x, ]), 
                  4:5, sep="", direction="long", v.names="value", timevar="gene"), 
          at=c(1:2, 4:5),
          main=x,
          border=1:2)
})

Result (figure not included)

Data

dat <- structure(list(Treatment = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L), .Label = c("Both", "Chemo", "None", "Radio"), class = "factor"), 
    Status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("Censored", "Deceased"), class = "factor"), 
    gene1 = c(2.83185327992901, 5.21658677992433, 9.36719279899948, 
    1.77809421116808, 6.39453760571561, 3.08376117126782, -1.99524072673447, 
    0.380722587753265, -0.947148460332481, 1.73014054712629, 
    0.855919162512028, 0.501667581598007, 0.0638735169737497, 
    10.1712355237258, 5.34317645471502, -7.96626158445742, -0.0781613844302278, 
    5.59930916967042, -0.725717330717595, 0.492793009977729, 
    -0.546677404630108, 0.290301979542245, 2.83540215865274, 
    -1.25738031049913), gene2 = c(6.97361394841868, -6.86012827859373, 
    -0.193731972798249, -5.64669185350061, -20.6664537342379, 
    32.5477488386544, 12.6210452154023, 6.56845245925654, 13.5491140544121, 
    -2.9113829554538, 2.90958200298303, -6.56806056188421, 50.2577234864485, 
    17.0734922804668, 49.0769939658538, -2.0186433516603, 32.3823429023035, 
    17.7654319738005, 12.2884241568455, 21.7600566866782, 19.68978862329, 
    -12.6277420840716, 27.555120882401, 17.5164450232983)), row.names = c(3L, 
23L, 13L, 44L, 34L, 50L, 90L, 67L, 62L, 100L, 95L, 96L, 132L, 
144L, 124L, 174L, 171L, 168L, 196L, 205L, 207L, 233L, 229L, 212L
), class = "data.frame")

【Comments】:

    【Solution 2】:

    Using the data from jay.sf, you can try ggplot. I am using the tidyverse here, but that is not required.

    library(tidyverse)
    dat %>% 
      as_tibble() %>%
      gather(gene, mRNA, -Treatment, -Status) %>% 
      ggplot(aes(Status, mRNA, fill =gene)) + 
       geom_boxplot() +
       facet_wrap(~Treatment, ncol = 2, scales = "free_y")
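
The gather() step above is superseded in current versions of tidyr; the same reshaping can be sketched with pivot_longer(). This assumes tidyr 1.0.0 or later is installed. The column names `gene` and `mRNA` follow the answer, while the small `toy` data frame is hypothetical. Because the gene columns are selected by excluding Treatment and Status, the columns do not need to be called gene1, gene2, and so on.

```r
library(tidyr)

# Hypothetical toy data, only to illustrate the reshaping step
toy <- data.frame(
  Treatment = c("Both", "Both", "Chemo", "Chemo"),
  Status    = c("Deceased", "Censored", "Deceased", "Censored"),
  gene1     = c(3.2, 0.8, 7.2, 3.4),
  gene2     = c(63.9, 0.8, 35.1, 23.9)
)

# Every column except Treatment and Status is treated as a gene column
long <- pivot_longer(
  toy,
  cols      = -c(Treatment, Status),
  names_to  = "gene",   # former column names
  values_to = "mRNA"    # expression values
)

print(long)
```

The resulting long table can be passed to the same ggplot() call in place of the gather() output.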
    

    With facet_grid, you can add significance levels automatically:

    dat %>% 
      as_tibble() %>%
      gather(gene, mRNA, -Treatment, -Status) %>% 
      ggplot(aes(gene, mRNA, fill =gene)) + 
       geom_boxplot(show.legend = F) +
       ggbeeswarm::geom_beeswarm(show.legend = F) +
       ggsignif::geom_signif(comparisons = list(c("gene1", "gene2"))) +
       facet_grid(Status~Treatment, scales = "free_y") 
    

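    【Comments】:

    • Thanks for sharing your answer. I like your second plot. If I have multiple genes (and their names are not gene1, gene2, etc.), should I rename them to gene1, gene2, gene3, ... so that they can be gathered?
    • @RongrongChai Not necessarily. If you need the significance values, you only have to change the names in comparisons = list(c("gene1", "gene2")). With more than two genes, you have to supply every comparison, e.g. comparisons = list(c("gene1", "gene2"), c("gene1", "gene3"), c("gene2", "gene3"))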
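
When there are many genes, the list of pairwise comparisons described in the last comment does not have to be typed by hand. A minimal sketch using base R's combn(); the gene names in `genes` are hypothetical placeholders.

```r
# Hypothetical gene names; replace with the actual column names
genes <- c("TP53", "BRCA1", "EGFR")

# All unordered pairs, each returned as a character vector of length 2
comparisons <- combn(genes, 2, simplify = FALSE)

str(comparisons)
```

The resulting list has the same shape as the hand-written one, so it can be supplied as ggsignif::geom_signif(comparisons = comparisons).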