Averaging curves with a 95% shaded band: answers

【Question title】: Averaging of curves with 95% shade
【Posted】: 2019-04-07 15:27:48
【Question description】:

I have the following data:

Data = data.frame(Participant = rep(sprintf("part%03d", 1:100), each=100),
                  Group = rep(c(0,1), each=5*1e3),
                  Evidence = rnorm(1e4),
                  CorrectOrNot = c(rbinom(5*1e3, size=1, prob=.3),
                                   rbinom(5*1e3, size=1, prob=.6)))

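Here "Participant" is the index of each participant, "Group" is the condition each participant was assigned to, "Evidence" is the "strength" of the stimulus presented to the participant, and "CorrectOrNot" records whether the participant's answer to that stimulus was correct.

So I fitted a logistic regression for each participant, relating Evidence to the probability of a correct answer.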

plot(1, type="n", xlab="Evidence", ylab="probCorrect", 
     xlim=c(-3, 3), ylim=c(0, 1))

for (i in 1:100)
{
  part = sprintf("part%03d", i)
  test = Data[Data$Participant==part,]

  fit = glm(CorrectOrNot ~ Evidence, test, family=binomial)
  newDat = data.frame(Evidence=seq(min(test$Evidence),max(test$Evidence),len=100))
  newDat$pc = predict(fit, newdata=newDat, type="response")

  lines(pc ~ Evidence, newDat, col=ifelse(test$Group[1]==0, "green", "red"), lwd=2)
}

# Both groups are drawn as solid lines, so the legend uses lty=1 for both entries
legend(-3, 1, legend=c("Group 0", "Group 1"),
       col=c("green", "red"), lty=1, lwd=2, cex=0.6)

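I wrote the code above to visualize the resulting curves, but the result looks messy. So I would like to "average" these lines into two representative lines, one per group, each surrounded by a shaded band that shows the 95% "range" of that group.

Any help, including help using ggplot2, would be much appreciated.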

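Tags: r ggplot2

【Solution 1】:

The tidyverse package (which also loads ggplot2) can help reorganize your code a little. For example, instead of writing an explicit loop, we can run a series of operations for each unique value of the Participant column: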

library(tidyverse)

newDat2 <- Data %>% 
  nest(data = -Participant) %>%   # tidyr >= 1.0 syntax; older versions used nest(-Participant)
  mutate(
    smoothDat = map(data, function(x) data.frame(Group = x$Group[1], Evidence=seq(min(x$Evidence),max(x$Evidence),len=100))),
    fit = map(data, function(x) glm(CorrectOrNot ~ Evidence, x, family=binomial)),
    predict = map2(smoothDat, fit, function(s, f) {
      s$pc <- predict(f, newdata = s, type = 'response')
      return(s)
    })
  )

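In the call to mutate, "smoothDat" creates the data used to generate the predictions, "fit" computes the model for each participant, and "predict" holds the returned predictions. Finally, we unnest "predict":

# Keep only the identifier and the predictions first:
# tidyr >= 1.0 no longer drops the other list columns when unnesting
newDat2 <- newDat2 %>% 
  select(Participant, predict) %>% 
  unnest(cols = predict)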

   Participant Group Evidence    pc
   <fct>       <dbl>    <dbl> <dbl>
 1 part001         0    -2.47 0.215
 2 part001         0    -2.42 0.215
 3 part001         0    -2.37 0.216
 4 part001         0    -2.32 0.217
 5 part001         0    -2.27 0.217
 6 part001         0    -2.22 0.218
 7 part001         0    -2.17 0.219
 8 part001         0    -2.12 0.219
 9 part001         0    -2.07 0.220
10 part001         0    -2.02 0.221
# ... with 9,990 more rows

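This gives a ggplot2-compatible dataset covering all participants.

From there, the plotting code is relatively easy. I am using geom_smooth to compute a summary for each group; it offers many options.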
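For readers who prefer to avoid the nest/map pattern, the same per-participant table can be sketched in base R with split() and lapply(). This is only an illustration, not part of the original answer: the helper name predict_one and the object name curves are made up here, the data are re-simulated as in the question, and set.seed(1) is added so the run is reproducible.

```r
# Simulated data with the same structure as in the question
# (the seed is an addition for reproducibility).
set.seed(1)
Data <- data.frame(
  Participant  = rep(sprintf("part%03d", 1:100), each = 100),
  Group        = rep(c(0, 1), each = 5000),
  Evidence     = rnorm(1e4),
  CorrectOrNot = c(rbinom(5000, size = 1, prob = 0.3),
                   rbinom(5000, size = 1, prob = 0.6))
)

# Hypothetical helper: fit one logistic regression for a single participant
# and predict on 100 points spanning that participant's own Evidence range.
predict_one <- function(d) {
  fit  <- glm(CorrectOrNot ~ Evidence, data = d, family = binomial)
  grid <- data.frame(
    Participant = d$Participant[1],
    Group       = d$Group[1],
    Evidence    = seq(min(d$Evidence), max(d$Evidence), length.out = 100)
  )
  grid$pc <- predict(fit, newdata = grid, type = "response")
  grid
}

# split() yields one data frame per participant; rbind() stacks the results
# into a long table with columns Participant, Group, Evidence, pc.
curves <- do.call(rbind, lapply(split(Data, Data$Participant), predict_one))
rownames(curves) <- NULL
```

The resulting curves data frame has one row per participant and grid point, the same layout as newDat2, so it can be passed to ggplot in the same way.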

plot.newdat <- ggplot(data = newDat2, aes(x = Evidence, y = pc, color = factor(Group), group = Participant)) +
  geom_line(alpha = 0.2) +
  geom_smooth(aes(group = Group), method = glm, method.args = list(family = binomial))
print(plot.newdat)
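Note that the band drawn by geom_smooth is a confidence interval around the fitted group curve, and that fitting a binomial glm to predicted probabilities produces "non-integer #successes" warnings. If the goal is a band that covers 95% of the individual participant curves, one possible alternative is to evaluate every participant's fit on a common Evidence grid and take pointwise quantiles. The sketch below is an assumption-laden illustration, not the original answer's method: the grid from -3 to 3 (which extrapolates slightly beyond some participants' observed range), the helper summarise_group, and the column names mean, lower and upper are all choices made here.

```r
# Simulated data as in the question (seed added for reproducibility).
set.seed(1)
Data <- data.frame(
  Participant  = rep(sprintf("part%03d", 1:100), each = 100),
  Group        = rep(c(0, 1), each = 5000),
  Evidence     = rnorm(1e4),
  CorrectOrNot = c(rbinom(5000, size = 1, prob = 0.3),
                   rbinom(5000, size = 1, prob = 0.6))
)

# Assumption: a common grid so that curves can be compared point by point.
grid <- data.frame(Evidence = seq(-3, 3, length.out = 100))

# One predicted curve per participant, evaluated on the common grid.
# Result: matrix with rows = grid points, columns = participants.
parts  <- split(Data, Data$Participant)
curves <- sapply(parts, function(d) {
  fit <- glm(CorrectOrNot ~ Evidence, data = d, family = binomial)
  predict(fit, newdata = grid, type = "response")
})
groups <- sapply(parts, function(d) d$Group[1])

# Hypothetical helper: pointwise mean and 2.5% / 97.5% quantiles
# across the participants of one group.
summarise_group <- function(g) {
  m <- curves[, groups == g, drop = FALSE]
  data.frame(
    Group    = g,
    Evidence = grid$Evidence,
    mean     = rowMeans(m),
    lower    = apply(m, 1, quantile, probs = 0.025),
    upper    = apply(m, 1, quantile, probs = 0.975),
    row.names = NULL
  )
}
band <- rbind(summarise_group(0), summarise_group(1))
band$Group <- factor(band$Group)

# Plot the mean curve per group with the percentile band (needs ggplot2).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(band, aes(x = Evidence, y = mean, colour = Group, fill = Group)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
    geom_line() +
    labs(y = "probCorrect")
  print(p)
}
```

The ribbon here describes the spread of the participant curves themselves, so it will generally be wider than a confidence band for the group mean. Averaging on the probability scale is one choice; averaging on the logit scale and back-transforming is another and can give a slightly different mean curve.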
