How to efficiently bootstrap groups (multilevel) using R: answers

【Question title】: How to efficiently bootstrap groups (multilevel) using R
【Posted】: 2013-03-11 23:16:04
【Question】:

I am analyzing a study with 40 individuals, each of whom rates 10 vignettes.

indiv     vign      score    score2    gender    
  1         1         5         3        1
  1         2         2         4        1   
  1         3         8         1        1
  .         .         .         .        .
  .         .         .         .        .
  .         .         .         .        .
  39       10         9         1        1 
  40        8         1         5        0 
  40        9         3         8        0

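I wanted to try bootstrapping it, but I soon realized that resampling vignettes does not make sense; we should resample people (so each sampled person brings along roughly 10 rows).

The following function works, but it is the bottleneck of the function that calls it. So the question is: how can this be done more efficiently?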

ResampleMultilevel <- function(data, groupvar) {
  # Sample from the unique group ids so every group is equally likely,
  # even if groups differ in size.
  ids <- unique(data[, groupvar])
  n <- length(ids)

  index <- sample(ids, n, replace = TRUE)

  resampled <- NULL      # one of the issues is that we do not know
                         # the size of the matrix yet, since it may vary.
  for (i in seq_len(n)) {
    # Growing the result with rbind on every pass is the slow part.
    resampled <- rbind(resampled, data[data[, groupvar] == index[i], ])
  }
  return(resampled)
}

The problem with subset is that I cannot find a way to keep the duplicates.

a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))

index <- c(1,1)

subset(a, a[,1] == index)

【Question comments】:

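Sample data: cbind(1:40, rep(1:10, 4), rnorm(40), rnorm(40))
What are you currently passing as the groupvar argument, indiv or vign?
I think your for loop can be replaced with data[index, ]. I think that would save a bit.
@Marius I am using indiv at the moment.
@Seth, that won't work. You need to select the roughly 10 vignettes for each number (person) in index. Note that index may also contain the same person more than once, and those duplicates would not be picked up.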

Tags: r sample statistics-bootstrap

【Solution 1】:

This almost works, except that the result is a list rather than the matrix I want.

lapply(index, function(x) a[which(a[,1] == x),])

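The following also comes close. It would be nice to have a non-looping way to do this, because as written it only works for a single number such as 2: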

a[which(a[,1] == 2),]       # works
a[which(a[,1] == index), ]  # does not work
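
The vector comparison fails because `==` recycles `index` along the column instead of checking each row against every element of `index`. A minimal sketch of one way around both problems, assuming `a` is the example matrix from the question with the person id in column 1: bind the `lapply` pieces back together with `do.call(rbind, ...)`, which gives a matrix and keeps repeated people. The seed, the explicit vector lengths, and `drop = FALSE` are additions made here for illustration only.

```r
set.seed(1)  # assumption: only to make the illustration reproducible
a <- cbind(rep(1:40, each = 10), rep(1:10, 40), rnorm(400), rnorm(400))
index <- c(1, 1, 3)

# One matrix block per drawn person; drop = FALSE keeps single rows as matrices.
pieces <- lapply(index, function(x) a[a[, 1] == x, , drop = FALSE])

# Stack the blocks into one matrix in a single call.
resampled <- do.call(rbind, pieces)

dim(resampled)         # 30 rows, 4 columns
table(resampled[, 1])  # person 1: 20 rows, person 3: 10 rows
```

Calling `rbind` once on the whole list avoids regrowing the result inside a loop, which is typically where the time goes.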

【Comments】:

【Solution 2】:

Based on the comments, I am revising my answer.

a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
index <- c(1, 1, 3, 4, 2)
a[a[, 1] %in% index, ]
##       [,1] [,2]        [,3]        [,4]
##  [1,]    1    1  0.28135473  0.47970116
##  [2,]    1    2 -0.12628982  0.34862899
##  [3,]    1    3 -0.41140740  1.30204100
##  [4,]    1    4 -0.61163593 -1.13354157
##  [5,]    1    5 -0.31538238  1.42701315
##  [6,]    1    6 -0.20403098  2.13989392
##  [7,]    1    7  0.37681973  0.65843232
##  [8,]    1    8 -0.94062165  0.97246212
##  [9,]    1    9  0.63377352 -0.48948273
## [10,]    1   10 -0.39817929 -1.03607028
## [11,]    2    1  0.54866153 -0.55127459
## [12,]    2    2  0.08410140  0.01457366
## [13,]    2    3 -1.19006851  1.33213116
## [14,]    2    4 -0.47210092  0.83369309
## [15,]    2    5  0.75968678 -0.48212390
## [16,]    2    6 -1.00205770  0.56376027
## [17,]    2    7  0.67251644  0.07234657
## [18,]    2    8  0.73165780 -0.51483172
## [19,]    2    9 -0.26022238  2.33181762
## [20,]    2   10  0.03370091 -0.71427295
## [21,]    3    1  0.60810461  0.15054307
## [22,]    3    2 -1.29363706  1.30510127
## [23,]    3    3 -0.20479713 -2.39797975
## [24,]    3    4 -0.86927664 -0.10845738
## [25,]    3    5  0.89040130 -0.08459249
## [26,]    3    6 -0.21511823  1.33960644
## [27,]    3    7 -0.32413278 -0.31691484
## [28,]    3    8 -0.61545941 -0.10457591
## [29,]    3    9 -1.85072358  0.93267270
## [30,]    3   10  0.38456423  0.76231047
## [31,]    4    1  0.76016236  1.63854054
## [32,]    4    2 -0.94463491  1.87271085
## [33,]    4    3  1.62451250  1.63298961
## [34,]    4    4 -1.96908559  0.89058201
## [35,]    4    5  1.66755533  0.10288947
## [36,]    4    6 -0.02182803 -0.91358891
## [37,]    4    7 -0.09382921 -0.54950093
## [38,]    4    8  0.74597002  2.31924468
## [39,]    4    9  0.64732694  0.29681494
## [40,]    4   10 -0.66535049  1.81285111
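
Note that the output above lists person 1 only once although `index` contains 1 twice: `%in%` tests membership, so repeated draws collapse. For a bootstrap the repeats matter. The following is a rough sketch of one alternative, not part of the original answer: compute the row numbers of each group once with `split`, then index by the drawn group labels. `ResampleGroups` is a hypothetical name, and the data layout (group id in column 1, 10 rows per person) is assumed from the question.

```r
# Hypothetical helper: resample whole groups with replacement, returning a
# group's rows once for every time that group is drawn.
ResampleGroups <- function(data, groupvar) {
  groups <- data[, groupvar]
  # Row numbers belonging to each group, computed once.
  rows_by_group <- split(seq_len(nrow(data)), groups)
  # Draw group labels with replacement; every group is equally likely.
  drawn <- sample(names(rows_by_group), length(rows_by_group), replace = TRUE)
  # Concatenate the row numbers of every drawn group, repeats included.
  rows <- unlist(rows_by_group[drawn], use.names = FALSE)
  data[rows, , drop = FALSE]
}

set.seed(42)  # assumption: only to make the illustration reproducible
a <- cbind(rep(1:40, each = 10), rep(1:10, 40), rnorm(400), rnorm(400))
boot <- ResampleGroups(a, 1)
nrow(boot)                 # 400 here, because every person has 10 rows
length(unique(boot[, 1]))  # usually fewer than 40, since some people repeat
```

Because the subsetting happens in one step, the size of the result does not need to be known in advance.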

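【Comments】:

Because we want it to return more. Remember that these 1, 1, 3, 4, 2 are supposed to be people, each with roughly 10 vignettes attached.
I see now that my example was invalid... my mistake. Try this: cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
a[which(a[,1] == 2),] sort of works; now I would like to replace the "2" with a vector of values that may match!
So you don't want the same row repeated, but rather all rows whose first-column value is in index, right?
Wow, this is great. What exactly does %in% do?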
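
For reference, `x %in% table` returns one logical value per element of `x`, TRUE when that element occurs anywhere in `table`; it is documented on the same help page as `match()`. By contrast, `==` compares element by element and recycles the shorter vector. A small illustration with made-up values (`ids` and `index` here are only examples):

```r
ids   <- c(1, 1, 2, 2, 3, 3)
index <- c(1, 3)

ids %in% index  # TRUE TRUE FALSE FALSE TRUE TRUE
ids == index    # TRUE FALSE FALSE FALSE FALSE TRUE (index recycled as 1,3,1,3,1,3)

# Repeating a value in the lookup vector does not change the result,
# which is why %in% alone cannot keep bootstrap duplicates.
identical(ids %in% c(1, 1, 3), ids %in% index)  # TRUE
```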