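[Posted]: 2020-01-29 07:40:40
[Question]:
I am having trouble colouring certain genes to mark the genes that two datasets (whole_colon/volcano) have in common. The code below runs fine, but I want to add one tricky detail.
I want to apply a different colour (red would be great) to the common genes, only where this condition holds: (whole_colon$genes==volcano$genes). I tried to split the groups into (specified_increased/specified_decreased), but unfortunately it did not work.
Here is my code.
Thanks a lot in advance.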
# Volcano plot using ggplot2
library(data.table)
# Add a group column marking whether each gene is significant
whole_colon <- data.frame(whole_colon)
whole_colon["group"] <- "NotSignificant"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
# Attempt to relabel genes using the volcano table (the step that did not work as intended)
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] > 1.5),"group"] <- "colon_Increased_specialized"
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] < -1.5),"group"] <- "colon_Decreased_specialized"
# Attempt to overplot the shared genes in red with base graphics
with(subset(whole_colon , FDR<0.05), points(logFC, -log10(FDR), pch=20,col="red"), whole_colon$genes==volcano$genes)
library(ggplot2)
ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group)) +
  scale_colour_manual(values = cols) +
  ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
  geom_point(size = 2.5, alpha = 1, na.rm = TRUE) +
  theme_bw(base_size = 14) +
  theme(legend.position = "right") +
  xlab(expression(log[2]("logFC"))) +
  ylab(expression(-log[10]("FDR"))) +
  geom_hline(yintercept = 1.30102, colour = "#990000", linetype = "dashed") +
  geom_vline(xintercept = 1.5849, colour = "#990000", linetype = "dashed") +
  geom_vline(xintercept = -1.5849, colour = "#990000", linetype = "dashed") +
  scale_y_continuous(trans = "log1p")
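A note on the condition itself: `whole_colon$genes == volcano$genes` compares the two columns position by position (recycling the shorter one), so it can only match when the same gene happens to sit in the same row of both tables. A minimal sketch contrasting it with `%in%`, which tests membership instead. The vectors `a` and `b` are hypothetical stand-ins, not the real columns:

```r
# Toy vectors standing in for whole_colon$genes and volcano$genes (hypothetical)
a <- c("CST1", "INHBA", "WNT2", "MMP7")
b <- c("INHBA", "WNT2", "THBS2", "BGN")

# Element-wise: compares a[1] with b[1], a[2] with b[2], and so on
positional <- a == b

# Membership: is each element of a found anywhere in b?
shared <- a %in% b

print(positional)  # FALSE FALSE FALSE FALSE
print(shared)      # FALSE  TRUE  TRUE FALSE
print(a[shared])   # "INHBA" "WNT2"
```

If this is indeed what goes wrong, replacing `==` with `%in%` in the condition would be the smallest change.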
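This gives me a broken-looking plot like this. (I want the 'whole_colon data' to be fully labelled, with points coloured red when they have the same gene as the 'volcano data'.)
Here are some subsets of the data from whole_colon and volcano. whole_colon: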
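One way the desired picture might be drawn, sketched under assumptions: `cols` is not defined anywhere in the code above, so the named colour vector here is only a guess at its shape, and the data frame `df`, its values, and the group label `Shared` are invented for illustration. Naming the vector by group label ties each colour to its label regardless of level order.

```r
library(ggplot2)

# Hypothetical data: "Shared" marks genes that also occur in the second table
df <- data.frame(
  logFC = c(-3.2, -0.4, 0.3, 2.8, 4.1, 5.0),
  FDR   = c(1e-6, 0.40, 0.70, 1e-4, 1e-9, 1e-12),
  group = c("Decreased", "NotSignificant", "NotSignificant",
            "Increased", "Shared", "Shared"),
  stringsAsFactors = FALSE
)

# Named colours (assumed; the original 'cols' is not shown in the post)
cols <- c(NotSignificant = "grey60", Increased = "orange",
          Decreased = "steelblue", Shared = "red")

# Put shared genes last so their red points are drawn on top
df <- df[order(df$group == "Shared"), ]

p <- ggplot(df, aes(x = logFC, y = -log10(FDR), colour = group)) +
  geom_point(size = 2.5) +
  scale_colour_manual(values = cols) +
  geom_hline(yintercept = -log10(0.05), colour = "#990000", linetype = "dashed") +
  geom_vline(xintercept = c(-1.5, 1.5), colour = "#990000", linetype = "dashed") +
  theme_bw(base_size = 14)

print(p)
```

The dashed lines here are placed at the same cut-offs used for grouping (FDR 0.05 and logFC ±1.5), which is a choice made for this sketch rather than the values in the code above.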
genes logFC FDR group
1 CST1 9.554742 5.64e-45 Increased
3 OTOP2 -9.408177 5.76e-32 Decreased
4 COL11A1 6.825363 1.00e-31 Increased
5 INHBA 6.271879 2.07e-30 Increased
6 MMP7 7.594926 2.07e-30 Increased
7 BEST4 -7.756451 8.30e-30 Decreased
8 COL10A1 7.634386 1.82e-23 Increased
9 MMP11 4.767644 2.70e-23 Increased
10 GUCA2B -6.346156 2.17e-21 Decreased
11 KRT6B 11.801550 5.37e-20 Increased
12 WNT2 9.485133 6.47e-20 Increased
13 COL8A1 3.974965 6.47e-20 Increased
volcano:
genes logFC FDR group
1 INHBA 6.271879 2.070000e-30 Increased
2 COL10A1 7.634386 1.820000e-23 Increased
3 WNT2 9.485133 6.470000e-20 Increased
4 COL8A1 3.974965 6.470000e-20 Increased
5 THBS2 4.104176 2.510000e-19 Increased
6 BGN 3.524484 5.930000e-18 Increased
7 COMP 11.916956 2.740000e-17 Increased
9 SULF1 3.540374 1.290000e-15 Increased
10 CTHRC1 3.937028 4.620000e-14 Increased
11 TRIM29 3.827088 1.460000e-11 Increased
12 SLC6A20 5.060538 5.820000e-11 Increased
13 SFRP4 5.924330 8.010000e-11 Increased
14 CDH3 5.330732 8.940000e-11 Increased
15 ESM1 6.491496 3.380000e-10 Increased
614 TDP2 -1.801368 0.002722461 NotSignificant
615 EPHX2 -1.721039 0.002722461 NotSignificant
616 RAVER2 -1.581812 0.002749728 NotSignificant
617 BMP6 -2.702780 0.002775460 Increased
619 SCNN1G -4.012111 0.002870500 Increased
620 SLC52A3 -1.868920 0.002931197 NotSignificant
621 VIPR1 -1.556238 0.002945578 NotSignificant
622 SUCLG2 -1.720993 0.003059717 NotSignificant
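To make the intended labelling concrete, here is a sketch that rebuilds the whole_colon rows shown above and relabels the genes that also appear in volcano. The thresholds and the two `colon_..._specialized` labels are taken from the code in the question; `volcano_genes` holds only the first few gene names from the volcano sample, and the helper vectors `sig`, `up`, `down` and `shared` are names introduced here for illustration.

```r
# Rows copied from the whole_colon sample above
whole_colon <- data.frame(
  genes = c("CST1", "OTOP2", "COL11A1", "INHBA", "MMP7", "BEST4",
            "COL10A1", "MMP11", "GUCA2B", "KRT6B", "WNT2", "COL8A1"),
  logFC = c(9.554742, -9.408177, 6.825363, 6.271879, 7.594926, -7.756451,
            7.634386, 4.767644, -6.346156, 11.801550, 9.485133, 3.974965),
  FDR   = c(5.64e-45, 5.76e-32, 1.00e-31, 2.07e-30, 2.07e-30, 8.30e-30,
            1.82e-23, 2.70e-23, 2.17e-21, 5.37e-20, 6.47e-20, 6.47e-20),
  stringsAsFactors = FALSE
)
# First few gene names from the volcano sample (truncated on purpose)
volcano_genes <- c("INHBA", "COL10A1", "WNT2", "COL8A1", "THBS2", "BGN", "COMP")

# Thresholds as used in the question
sig    <- whole_colon$FDR < 0.05
up     <- sig & whole_colon$logFC >  1.5
down   <- sig & whole_colon$logFC < -1.5
shared <- whole_colon$genes %in% volcano_genes

# Base labels first, then override the genes shared with volcano
whole_colon$group <- "NotSignificant"
whole_colon$group[up]            <- "Increased"
whole_colon$group[down]          <- "Decreased"
whole_colon$group[up & shared]   <- "colon_Increased_specialized"
whole_colon$group[down & shared] <- "colon_Decreased_specialized"

print(table(whole_colon$group))
print(whole_colon$genes[shared])
```

With these rows, the shared genes are INHBA, COL10A1, WNT2 and COL8A1, all of which end up in `colon_Increased_specialized`; the resulting `group` column can then be mapped to colours with a named vector in `scale_colour_manual()`.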
[Discussion]:
Tags: r dataframe ggplot2 data.table