【Question title】: Add median trend line and p-value for one-sided repeated measures test in 2-y axis scatter plot [R]
【Posted】: 2016-03-29 11:46:55
【Question】:

Load the example data frame

df <- structure(list(ID = c(1,1,1,2,2,2,3,3,3), 
time = c(0L,1L,2L,0L,1L,2L,0L,1L,2L),
M1a = c(0, 0.2, 0.3, 0, 1.5, 2.9,0, 2.4, 3.9), 
M2a = c(0, 0.4, 0.6,0,0.9, 0.9,0,0.5, 0.7), 
M3a = c(0,0.3, 0.4, 0, 0.6, 0.9,0, 0.5, 0.8), 
M4a = c(0,0.6, 0.6,0, 0.4, 0.6,0, 0.2, 0.9), 
M1b = c(0L, 200L, 300L,0L, 300L, 900L,0L, 900L, 1000L), 
M2b = c(0L, 400L, 600L,0L, 600L, 900L,0L, 600L, 1000L), 
M3b = c(0L, 300L, 400L,0L, 200L, 800L,0L, 200L, 900L), 
M4b = c(0L, 600L, 600L,0L, 800L, 1000L,0L, 400L, 1100L)), 
.Names = c("ID", "time", "M1a", "M2a", "M3a", "M4a","M1b", "M2b","M3b", "M4b"), class = "data.frame", row.names = c(NA, -9L))

Now draw the scatter plot with two y axes

par(mar=c(5,4,4,5)+.1)
plot(df$time,df$M1a,type="p",col="red", main="M1", cex=0.5, cex.main=2, cex.lab=1.0, cex.axis=0.7)
par(new = TRUE)
plot(df$time,df$M1b,type="p",col="blue",xaxt="n",yaxt="n",xlab="",ylab="")
mtext("Relative change (%)",side=4,line=3)
axis(4)
legend("topleft",col=c("red","blue"),lty=1,legend=c("Absolute Change","Relative Change"))

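What am I stuck on?

1. Median trend line

I was able to add a regression line, but what I want is a median trend line that connects the medians of M1a and M1b at the three time points.

2. Adding the p-value from a one-way repeated-measures ANOVA to the plot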

fit1=aov(df$M1a~df$time + Error(ID/time),na.action=na.exclude,data=df);
sig1= summary(fit1)$"Error: Within"$"Pr(>F)"
if (sig<0.001) star='**' else if (sig>=0.001&sig<0.05) star='*' else star='';
if (sig1<0.001) star='**' else star='';

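I plan to use the code above to add the p-value to my 2-y-axis plot. Here sig1 comes back as NULL, but it should print 0.153.

If the result is significant, the final plot should carry a * mark in its main title (M1).

Any suggestions? Thanks in advance!

【Question comments】:

Tags: r

【Solution 1】:

To answer #2 first, you need to look at the internal structure of the summary.aov object: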

dput(summary(fit1))
structure(list(`Error: ID` = structure(list(structure(list(Df = 1, 
    `Sum Sq` = 5.60666666666667, `Mean Sq` = 5.60666666666667, 
    `F value` = NA_real_, `Pr(>F)` = NA_real_), .Names = c("Df", 
"Sum Sq", "Mean Sq", "F value", "Pr(>F)"), class = c("anova", 
"data.frame"), row.names = "Residuals")), class = c("summary.aov", 
"listof")), `Error: ID:time` = structure(list(structure(list(
    Df = 1, `Sum Sq` = 11.3157142857143, `Mean Sq` = 11.3157142857143), .Names = c("Df", 
"Sum Sq", "Mean Sq"), class = c("anova", "data.frame"), row.names = "df$time")), class = c("summary.aov", 
"listof")), `Error: Within` = structure(list(structure(list(Df = c(1, 
5), `Sum Sq` = c(0.325952380952381, 0.573888888888889), `Mean Sq` = c(0.325952380952381, 
0.114777777777778), `F value` = c(2.83985617480293, NA), `Pr(>F)` = c(0.152766396924706, 
NA)), .Names = c("Df", "Sum Sq", "Mean Sq", "F value", "Pr(>F)"
), class = c("anova", "data.frame"), row.names = c("df$time  ", 
"Residuals"))), class = c("summary.aov", "listof"))), .Names = c("Error: ID", 
"Error: ID:time", "Error: Within"), class = "summary.aovlist")

Note that the values in summary(fit1)$"Error: Within" are actually buried one level deeper (and that level has no name, so a numeric index is needed). Do this:

summary(fit1)$"Error: Within"[[1]]$`Pr(>F)`
[1] 0.1527664        NA
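
As a self-contained check of the indexing above, the sketch below rebuilds only the columns of the example data that the model uses, refits the model with the same call as in the question, and pulls out the single p-value for the time term. The trailing [1] and the names within_tab and p_time are additions for illustration and are not part of the original answer; for this data the value should match the 0.1527664 shown above.

```r
# Columns of the example data frame that the model needs (values copied from the question)
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  time = c(0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L),
  M1a  = c(0, 0.2, 0.3, 0, 1.5, 2.9, 0, 2.4, 3.9)
)

# Same model call as in the question
fit1 <- aov(df$M1a ~ df$time + Error(ID/time), na.action = na.exclude, data = df)

# Each stratum of the summary is an unnamed list of length one holding the ANOVA table
within_tab <- summary(fit1)$"Error: Within"[[1]]

# The Pr(>F) column has one entry per row (time term, Residuals); keep the first
p_time <- within_tab$`Pr(>F)`[1]
print(p_time)
```

Calling str(summary(fit1)) is a quicker way than dput() to see this nesting when exploring a new model.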

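Now to see whether I can figure out the two-ordinate plotting problem. I'm fairly sure that any median plotting for the first series has to be done before the par(new=TRUE) operation, because the second plot() call changes the user coordinate system to fit the new data.

Adding a title with the extracted value to your plot, enhanced by @VincentBonhomme's helpful comment: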

plot(df$time,df$M1a,type="p",col="red", cex=0.5, cex.main=2, cex.lab=1.0, cex.axis=0.7)
lines(unique(df$time), 
        tapply(df$M1a, df$time, median)) 
par(new = TRUE) 
plot( df$time, df$M1b,type="p", col="blue", xaxt="n", yaxt="n", xlab="",ylab="") 
lines(unique(df$time), 
      tapply(df$M1b, df$time, median))
mtext("Relative change (%)",side=4,line=3)
axis(4)
legend("topleft",col=c("red","blue"), lty=1,legend=c("Absolute Change","Relative Change"))

# [1] keeps only the p-value of the time term and drops the NA from the Residuals row
title(main=bquote("P-value for M1 (absolute scale)"== 
                       .(round(summary(fit1)$"Error: Within"[[1]]$`Pr(>F)`[1], 3) ) )  )
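
One caveat about the lines() calls above: tapply() returns the medians ordered by the sorted time values, whereas unique(df$time) returns the times in order of first appearance, so the two can disagree when the rows are not sorted by time. In addition, median() returns NA for any time point that contains a missing value. A minimal sketch that guards against both, assuming a hypothetical helper named median_by_time (not part of the original answer):

```r
# Hypothetical helper: median of y at each distinct time, with the x values taken
# from the same tapply() result so that x and y are guaranteed to line up
median_by_time <- function(y, time) {
  med <- tapply(y, time, median, na.rm = TRUE)
  data.frame(time = as.numeric(names(med)), median = as.vector(med))
}

# M1a and M1b values from the question, with the rows deliberately shuffled
time <- c(2, 0, 1, 1, 2, 0, 0, 2, 1)
M1a  <- c(0.3, 0, 0.2, 1.5, 2.9, 0, 0, 3.9, 2.4)
M1b  <- c(300, 0, 200, 300, 900, 0, 0, 1000, 900)

med_a <- median_by_time(M1a, time)
med_b <- median_by_time(M1b, time)
print(med_a)
print(med_b)
```

In the plot, lines(med_a$time, med_a$median, col="red") would go right after the first plot() call and lines(med_b$time, med_b$median, col="blue") right after the second, so that each line is drawn in the coordinate system of its own axis.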

【Comments】:

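You are right about par(new=TRUE); this seems to work: plot(df$time,df$M1a,type="p",col="red", cex=0.5, cex.main=2, cex.lab=1.0, cex.axis=0.7); lines(unique(df$time), tapply(df$M1a, df$time, median)); par(new = TRUE); plot(df$time,df$M1b,type="p",col="blue",xaxt="n",yaxt="n",xlab="",ylab=""); lines(unique(df$time), tapply(df$M1b, df$time, median))
Usually one would set ylim in each plot from the joint range of the values, but the goal here is quite different. Two-ordinate plots have been severely criticized as having great potential for distorting relationships.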
The quantreg package provides quantile regression: library(quantreg); abline(rq(M1b ~ time, data = df, tau = 0.5), col = 2). Here the 50% quantile (tau = 0.5) is used for the fit.
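@VincentBonhomme - Thanks. I tried your script on the example df and it ran successfully, but it does not work on my large data set. After I reshaped that data to mimic the structure of the example df, your script ran without any errors or warnings, but I still do not see the median trend line; nothing in the plot actually changed. Any idea why?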
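@42- Thanks! Your tip works well; this code gets the p-value: sig1= summary(fit1)$"Error: Within"[[1]]$`Pr(>F)`[[1]]. I do not want to state the p-value explicitly in the title, only the ** or * symbol, indicating sig1 < 0.001 and 0.001 <= sig1 < 0.05 respectively, i.e. the title would be M1**, M1*, or M1. Rationale: keeping the plots tidy and symmetric, since I may run this for 300 independent variables. Looking forward to your reply!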
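
The star-only title requested in the last comment could be built along the following lines. This is a sketch rather than code from the thread: the helper name p_to_stars is hypothetical, the p-values in the example are made up for illustration, and the cut-offs (** below 0.001, * below 0.05) are taken from the if/else chain in the question.

```r
# Hypothetical helper: map a single p-value to the significance marker for the title
p_to_stars <- function(p) {
  if (is.na(p)) return("")            # no marker when the test gave no p-value
  if (p < 0.001) "**" else if (p < 0.05) "*" else ""
}

# Illustrative p-values only (not results from the thread's data)
p_values <- c(0.0004, 0.02, 0.153)
titles <- paste0("M1", vapply(p_values, p_to_stars, character(1)))
print(titles)
```

With the fitted model from the question, title(main = paste0("M1", p_to_stars(sig1))) after the plot has been drawn would give M1**, M1*, or M1; the same call can sit inside a loop over the 300 variables.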