[Question title]: Volcano plot in R: adding details: coloring common factors only
[Posted]: 2020-01-29 07:40:40
[Question description]:

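I am having trouble coloring certain genes to mark the genes shared by two datasets (whole_colon/volcano). The code below runs fine. However, I would like to add one rather tricky detail.

I would like to apply a different color (red would be great) to the common genes, only when this condition holds: (whole_colon$genes==volcano$genes). I tried to split the groups into (specified_increased/specified_decreased), but sadly without success.

Here is my code.

Thanks a lot in advance.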

    #volcano plot using ggplot2
    library(data.table)
    # Adding group to decipher if the gene is significant or not:
    whole_colon <- data.frame(whole_colon)
    whole_colon["group"] <- "NotSignificant"
    whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
    whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] > 1.5),"group"] <- "colon_Increased_specialized"
    whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] < -1.5),"group"] <- "colon_Decreased_specialized"

    with(subset(whole_colon , FDR<0.05), points(logFC, -log10(FDR), pch=20,col="red"), whole_colon$genes==volcano$genes)

    library(ggplot2)
    ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group))+
      scale_colour_manual(values = cols) +
      ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
      geom_point(size = 2.5, alpha = 1, na.rm = T) +
      theme_bw(base_size = 14) + 
      theme(legend.position = "right") + 
      xlab(expression(log[2]("logFC"))) + 
      ylab(expression(-log[10]("FDR"))) +
      geom_hline(yintercept = 1.30102, colour="#990000", linetype="dashed") + 
      geom_vline(xintercept = 1.5849, colour="#990000", linetype="dashed") + 
      geom_vline(xintercept = -1.5849, colour="#990000", linetype="dashed")+ 
      scale_y_continuous(trans = "log1p")

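This gives me a broken-looking image like this. (I want the 'whole_colon' data to be plotted in full, with points colored red when they have the same gene as the 'volcano' data.)

Here are some subsets of the data from whole_colon and volcano. whole_colon: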

        genes    logFC      FDR       group
    1   CST1     9.554742   5.64e-45  Increased
    3   OTOP2    -9.408177  5.76e-32  Decreased
    4   COL11A1  6.825363   1.00e-31  Increased
    5   INHBA    6.271879   2.07e-30  Increased
    6   MMP7     7.594926   2.07e-30  Increased
    7   BEST4    -7.756451  8.30e-30  Decreased
    8   COL10A1  7.634386   1.82e-23  Increased
    9   MMP11    4.767644   2.70e-23  Increased
    10  GUCA2B   -6.346156  2.17e-21  Decreased
    11  KRT6B    11.801550  5.37e-20  Increased
    12  WNT2     9.485133   6.47e-20  Increased
    13  COL8A1   3.974965   6.47e-20  Increased

volcano:

        genes    logFC      FDR           group
    1   INHBA    6.271879   2.070000e-30  Increased
    2   COL10A1  7.634386   1.820000e-23  Increased
    3   WNT2     9.485133   6.470000e-20  Increased
    4   COL8A1   3.974965   6.470000e-20  Increased
    5   THBS2    4.104176   2.510000e-19  Increased
    6   BGN      3.524484   5.930000e-18  Increased
    7   COMP     11.916956  2.740000e-17  Increased
    9   SULF1    3.540374   1.290000e-15  Increased
    10  CTHRC1   3.937028   4.620000e-14  Increased
    11  TRIM29   3.827088   1.460000e-11  Increased
    12  SLC6A20  5.060538   5.820000e-11  Increased
    13  SFRP4    5.924330   8.010000e-11  Increased
    14  CDH3     5.330732   8.940000e-11  Increased
    15  ESM1     6.491496   3.380000e-10  Increased
    614 TDP2     -1.801368  0.002722461   NotSignificant
    615 EPHX2    -1.721039  0.002722461   NotSignificant
    616 RAVER2   -1.581812  0.002749728   NotSignificant
    617 BMP6     -2.702780  0.002775460   Increased
    619 SCNN1G   -4.012111  0.002870500   Increased
    620 SLC52A3  -1.868920  0.002931197   NotSignificant
    621 VIPR1    -1.556238  0.002945578   NotSignificant
    622 SUCLG2   -1.720993  0.003059717   NotSignificant

[Comments]:

    Tags: r dataframe ggplot2 data.table


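    [Solution 1]:

    The example dataset provided is incomplete in that there is no overlap, so it is hard to color-code based on it. Try the following. The key point is that you cannot use ==; use %in% instead, which returns a boolean for each gene in whole_colon indicating whether it also appears in volcano: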

    whole_colon=structure(list(genes = structure(c(5L, 11L, 3L, 
    7L, 10L, 1L, 
    2L, 9L, 6L, 8L, 12L, 4L, 13L, 14L), .Label = c("BEST4", "COL10A1", 
    "COL11A1", "COL8A1", "CST1", "GUCA2B", "INHBA", "KRT6B", "MMP11", 
    "MMP7", "OTOP2", "WNT2", "ABC", "DEF"), class = "factor"), logFC = c(9.554742, 
    -9.408177, 6.825363, 6.271879, 7.594926, -7.756451, 7.634386, 
    4.767644, -6.346156, 11.80155, 9.485133, 3.974965, 0.5, -0.5), 
        FDR = c(5.64e-45, 5.76e-32, 1e-31, 2.07e-30, 2.07e-30, 8.3e-30, 
        1.82e-23, 2.7e-23, 2.17e-21, 5.37e-20, 6.47e-20, 6.47e-20, 
        1, 1), group = c("Increased", "Decreased", "Increased", "specific_Increased", 
        "Increased", "Decreased", "specific_Increased", "Increased", 
        "Decreased", "Increased", "specific_Increased", "specific_Increased", 
        "NotSignificant", "NotSignificant")), row.names = c("1", 
    "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
    "2"), class = "data.frame")
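To illustrate why `==` does not work for this, here is a minimal sketch. The two short vectors `colon_genes` and `volcano_genes` are made up for the example (the gene names are taken from the question's tables); they are not the full datasets. `==` compares position by position and recycles the shorter vector, whereas `%in%` tests membership.

```r
# Illustrative vectors only; gene names copied from the question's tables.
colon_genes   <- c("CST1", "OTOP2", "INHBA", "COL10A1", "WNT2", "COL8A1")
volcano_genes <- c("INHBA", "COL10A1", "WNT2")

# `==` compares element 1 with element 1, element 2 with element 2, and so on.
# volcano_genes is silently recycled because 6 is a multiple of 3.
positional <- colon_genes == volcano_genes

# `%in%` asks, for each colon gene, whether it occurs anywhere in volcano_genes.
membership <- colon_genes %in% volcano_genes

print(positional)
print(membership)
```

In this sketch three genes are shared, yet `positional` contains no `TRUE` at all because the shared genes sit at different positions in the two vectors; `membership` flags exactly the shared ones.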
    

    Set the groups:

    #set the decreased and increased like you did:
    whole_colon["group"] <- "NotSignificant"
    whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
    whole_colon[which(whole_colon['FDR'] < 0.05 & -whole_colon['logFC'] > 1.5),"group"] <- "Decreased"
    whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5 & whole_colon$genes %in% volcano$genes),"group"] <- "specific_Increased"
    whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] < -1.5 & whole_colon$genes %in% volcano$genes),"group"] <- "specific_Decreased"
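As an alternative sketch (not the answer's own method), the same five labels can be computed in one vectorized pass instead of overwriting the column step by step. The data frames `demo` and `volcano_demo` and the names `fdr_cutoff` and `logfc_cutoff` are hypothetical, with a few values copied from the question.

```r
# Hypothetical mini datasets; values copied from the question's tables.
demo <- data.frame(
  genes = c("CST1", "OTOP2", "INHBA", "COL8A1", "ABC"),
  logFC = c(9.554742, -9.408177, 6.271879, 3.974965, 0.5),
  FDR   = c(5.64e-45, 5.76e-32, 2.07e-30, 6.47e-20, 1),
  stringsAsFactors = FALSE
)
volcano_demo <- data.frame(genes = c("INHBA", "COL8A1", "THBS2"),
                           stringsAsFactors = FALSE)

# Same cutoffs as in the answer's code.
fdr_cutoff   <- 0.05
logfc_cutoff <- 1.5

significant <- demo$FDR < fdr_cutoff & abs(demo$logFC) > logfc_cutoff
direction   <- ifelse(demo$logFC > 0, "Increased", "Decreased")
shared      <- demo$genes %in% volcano_demo$genes

# Not significant wins first; otherwise prefix shared genes with "specific_".
demo$group <- ifelse(!significant, "NotSignificant",
                     ifelse(shared, paste0("specific_", direction), direction))

print(demo[, c("genes", "group")])
```

Keeping the cutoffs in named variables means the grouping and any threshold lines in the plot can be driven by the same numbers.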
    

    And the plot:

    cols = c("grey","blue","blue","red","red")
    names(cols) = c("NotSignificant","Increased","Decreased",
    "specific_Increased","specific_Decreased")
    
    library(ggplot2)
    ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group)) +
      scale_colour_manual(values = cols) +
      ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
      geom_point(size = 2.5, alpha = 1, na.rm = TRUE) +
      theme_bw(base_size = 14) +
      theme(legend.position = "right") +
      xlab(expression(log[2]("logFC"))) +
      ylab(expression(-log[10]("FDR"))) +
      geom_hline(yintercept = 1.30102, colour = "#990000", linetype = "dashed") +
      geom_vline(xintercept = 1.5849, colour = "#990000", linetype = "dashed") +
      geom_vline(xintercept = -1.5849, colour = "#990000", linetype = "dashed") +
      scale_y_continuous(trans = "log1p")
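One detail worth checking in the plot code: the horizontal line at 1.30102 is approximately -log10(0.05), matching the FDR cutoff, but the vertical lines at ±1.5849 are approximately log2(3), which is not the 1.5 logFC cutoff used when assigning groups. A hedged sketch that derives the lines from the same cutoffs (the names `fdr_cutoff`, `logfc_cutoff`, and `threshold_layers` are introduced here for illustration):

```r
# Cutoffs used for grouping in the answer; variable names are hypothetical.
fdr_cutoff   <- 0.05
logfc_cutoff <- 1.5

# Line positions derived from the cutoffs instead of hard-coded numbers.
h_line  <- -log10(fdr_cutoff)
v_lines <- c(-logfc_cutoff, logfc_cutoff)

print(h_line)   # compare with the hard-coded 1.30102
print(log2(3))  # compare with the hard-coded 1.5849

# Build the ggplot2 layers only if the package is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  threshold_layers <- list(
    geom_hline(yintercept = h_line, colour = "#990000", linetype = "dashed"),
    geom_vline(xintercept = v_lines, colour = "#990000", linetype = "dashed")
  )
  # Usage: ggplot(...) + geom_point() + threshold_layers
}
```

Whether the lines should sit at 1.5 or at log2(3) depends on the intended fold-change cutoff; the point of the sketch is only that grouping and lines should use the same value.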
    


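    [Discussion]:

    • I doubt this will help you solve the problem, but they do share a common entry called "COL8A1" (if you like, I can change this sample data to include more common genes). To be clear, I am only looking for a match in one column [genes], not the whole data row. And sadly this command did not work either. It gave me a complete image of the "whole_colon" data [that is progress!!], but the color marking for the shared data is still missing.
    • You want to color-code the data that is significant and also found in the other dataset, right? That is, 5 different groups: significantly up/down, not significant, and significantly up/down and found in volcano. Did I get that right?
    • You need to specify 5 colors for the vector cols.
    • Thank you for your sincere help. Yes, I would love to color them differently, but not all 5 sectors need different colors. The colors above are sufficient; we just need to add one "red" for "specific_Increased" and "specific_Decreased".
    [Solution 2]:

    I think I solved the problem. It was simple: adding one more statement fixed it. After adapting @StupidWolf's suggestion and slightly redefining cols, I got the image I wanted.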

    cols<- c(red="red", orange="orange", NotSignificant="darkgrey", Increased= "#00B2FF" ,Decreased="#00B2FF", specific_Increased="#ff4d00", specific_Decreased="#ff4d00" )
    head(cols)
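When `cols` is a named vector, `scale_colour_manual(values = cols)` matches colours to group labels by name, so it helps to confirm that every label in the data has an entry. A small base R sketch of that check; `groups_in_data` is a hypothetical stand-in for `unique(whole_colon$group)`, and `cols` here keeps only the five group entries from the vector above:

```r
# Named colour vector: one entry per group label.
cols <- c(NotSignificant = "darkgrey",
          Increased = "#00B2FF", Decreased = "#00B2FF",
          specific_Increased = "#ff4d00", specific_Decreased = "#ff4d00")

# Hypothetical set of labels present in the data,
# standing in for unique(whole_colon$group).
groups_in_data <- c("NotSignificant", "Increased", "Decreased", "specific_Increased")

# Labels that have no colour assigned (should be empty).
missing_colours <- setdiff(groups_in_data, names(cols))

# Colours defined for labels that never occur in the data (harmless).
unused_colours <- setdiff(names(cols), groups_in_data)

print(missing_colours)
print(unused_colours)
```

Note also that `head(cols)` shows only the first six elements, so for a seven-element vector `print(cols)` is the safer way to inspect it.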
    
