【Posted】: 2017-03-18 11:22:06
【Question】:
I am using ggplot2 to create a bar chart with standard deviation bars. My data frame is large, but here is a truncated version as an example:
SampleName Target.ID Maj.Allele.Freq SD AVG.MAF
W15-P2-1 rs1005533 99.74811083 24.98883743 93.70753223
W15-P2-2 rs1005533 100 24.98883743 93.70753223
W15-P2-3 rs1005533 100 24.98883743 93.70753223
W15-P2-4 rs1005533 100 24.98883743 93.70753223
W15-P2-1 rs1005533 99.94819995 24.98883743 93.70753223
W15-P2-2 rs1005533 100 24.98883743 93.70753223
W15-P2-3 rs1005533 100 24.98883743 93.70753223
W15-P2-4 rs1005533 100 24.98883743 93.70753223
W21-P2-1 rs1005533 100 24.98883743 93.70753223
W21-P2-2 rs1005533 100 24.98883743 93.70753223
W21-P2-3 rs1005533 99.90044798 24.98883743 93.70753223
W21-P2-4 rs1005533 99.72375691 24.98883743 93.70753223
W21-P2-1 rs1005533 100 24.98883743 93.70753223
W21-P2-2 rs1005533 100 24.98883743 93.70753223
W21-P2-3 rs1005533 100 24.98883743 93.70753223
W21-P2-4 rs1005533 0 24.98883743 93.70753223
W15-P2-1 rs10092491 52.40641711 1.340954343 51.8604281
W15-P2-2 rs10092491 53.69923603 1.340954343 51.8604281
W15-P2-3 rs10092491 52.56689284 1.340954343 51.8604281
W15-P2-4 rs10092491 50.11764706 1.340954343 51.8604281
W15-P2-1 rs10092491 50.30094583 1.340954343 51.8604281
W15-P2-2 rs10092491 50.96277279 1.340954343 51.8604281
W15-P2-3 rs10092491 50.94102886 1.340954343 51.8604281
W15-P2-4 rs10092491 51.2849162 1.340954343 51.8604281
W21-P2-1 rs10092491 53.56976202 1.340954343 51.8604281
W21-P2-2 rs10092491 50.27861123 1.340954343 51.8604281
W21-P2-3 rs10092491 52.8358209 1.340954343 51.8604281
W21-P2-4 rs10092491 51.42585551 1.340954343 51.8604281
W21-P2-1 rs10092491 52.77890467 1.340954343 51.8604281
W21-P2-2 rs10092491 52.89017341 1.340954343 51.8604281
W21-P2-3 rs10092491 53.70786517 1.340954343 51.8604281
W21-P2-4 rs10092491 50 1.340954343 51.8604281
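Because the mean in the last column (AVG.MAF) plus the standard deviation can exceed the maximum possible value of 100, the plot shows error bars that extend past the y-axis limit of 100.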
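As a quick check of the arithmetic, the upper whisker end can be computed directly from the AVG.MAF and SD values in the table above. The vector names `avg_maf`, `sd_maf`, and `upper` are only illustrative and do not appear in the original code.

```r
# AVG.MAF and SD per marker, copied from the table in the question
avg_maf <- c(rs1005533 = 93.70753223, rs10092491 = 51.8604281)
sd_maf  <- c(rs1005533 = 24.98883743, rs10092491 = 1.340954343)

# Upper end of each error bar as drawn by ymax = AVG.MAF + SD
upper <- avg_maf + sd_maf
print(upper)

# TRUE for any marker whose whisker leaves the 0-100 range
print(upper > 100)
```

For rs1005533 the upper end is about 118.7, which is why the bar overshoots the axis; rs10092491 stays well below 100.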
Here is the code that creates the plot described above:
library(ggplot2)

pe1 = ggplot(half1, aes(x = Target.ID, y = AVG.MAF)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black",
           width = 0.5, fill = "yellowgreen") +
  xlab("") +
  ylab("Average Major Allele Frequency") +
  labs(title = "Allele Balance AmpliSeq Identity Sample P2") +
  # Error bars span AVG.MAF +/- SD, so the upper end can exceed 100
  geom_errorbar(aes(ymin = AVG.MAF - SD, ymax = AVG.MAF + SD),
                width = 0.4, position = position_dodge(0.9),
                size = 0.6) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
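I tried truncating the plot with coord_cartesian, but that makes the plot look as if it is hiding some data.
Here is the code that creates the plot with the truncated standard deviation bars: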
pe1 = ggplot(half1, aes(x = Target.ID, y = AVG.MAF)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black",
           width = 0.5, fill = "yellowgreen") +
  xlab("") +
  ylab("Average Major Allele Frequency") +
  labs(title = "Allele Balance AmpliSeq Identity Sample P2") +
  geom_errorbar(aes(ymin = AVG.MAF - SD, ymax = AVG.MAF + SD),
                width = 0.4, position = position_dodge(0.9),
                size = 0.6) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5)) +
  coord_cartesian(ylim = c(0, 100))
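It seems there must be a way to cap the standard deviation bars at my expected ymax of 100 while still keeping the top horizontal cap visible in the plot. Does anyone know how to do this?
【Comments】: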
-
Why would you want to misrepresent the standard deviation by truncating the top of the std dev bars?
-
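Would ...geom_errorbar(aes(ymin = AVG.MAF-SD, ymax = pmin(AVG.MAF+SD,100)... do what you want? That said, you are almost certainly now understating the uncertainty, probably because the underlying error model being used is not appropriate.
-
@NathanDay and Miff, you have both given me something to think about. Thanks for your comments and possible solutions.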
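The pmin() suggestion from the comment above can be sketched roughly as follows. This is only a minimal illustration under some assumptions: `summary_df`, `upper`, and `p_capped` are hypothetical names, the data frame holds one row per marker with the AVG.MAF and SD values from the question's table, and geom_col() is used as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# One row per marker, values copied from the table in the question.
# `summary_df` is a hypothetical stand-in for the asker's `half1`.
summary_df <- data.frame(
  Target.ID = c("rs1005533", "rs10092491"),
  AVG.MAF   = c(93.70753223, 51.8604281),
  SD        = c(24.98883743, 1.340954343)
)

# Cap the upper whisker end at 100; pmin() works element-wise.
summary_df$upper <- pmin(summary_df$AVG.MAF + summary_df$SD, 100)

p_capped <- ggplot(summary_df, aes(x = Target.ID, y = AVG.MAF)) +
  geom_col(colour = "black", width = 0.5, fill = "yellowgreen") +
  # The horizontal cap is drawn at `upper`, so it stays inside the panel
  geom_errorbar(aes(ymin = AVG.MAF - SD, ymax = upper), width = 0.4) +
  xlab("") +
  ylab("Average Major Allele Frequency") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p_capped)
```

Because the data itself is clamped rather than the view, the top cap is drawn at 100 instead of being clipped the way it is with coord_cartesian. As the comments point out, a capped whisker no longer shows mean + SD, so it is probably worth stating the cap in the figure caption.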
-
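To tie in with Nathan Day's comment, perhaps the standard deviation is not what you should really be after. What if you could get bootstrapped confidence intervals instead?
Tags: r ggplot2 standard-deviation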
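One way the bootstrap idea might look in base R is a percentile bootstrap of the mean, sketched below for the rs1005533 readings from the question's table. The names `maf`, `n_boot`, `boot_means`, and `ci`, the seed, the number of resamples, and the 95% level are all assumptions made for illustration; other bootstrap interval types exist and may suit the data better.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# The 16 Maj.Allele.Freq readings for rs1005533 from the question
maf <- c(99.74811083, 100, 100, 100, 99.94819995, 100, 100, 100,
         100, 100, 99.90044798, 99.72375691, 100, 100, 100, 0)

n_boot <- 2000  # number of bootstrap resamples (assumed)

# Resample with replacement and record the mean of each resample
boot_means <- replicate(n_boot, mean(sample(maf, replace = TRUE)))

# Percentile interval: the 2.5% and 97.5% quantiles of the resampled means
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

Each resampled mean is an average of observed values, so this interval cannot leave the 0-100 range, unlike mean + SD. The two ends of `ci` could then be supplied as ymin and ymax to geom_errorbar.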