In R, access indices stored in a list: answers

【Question title】: In R, access indices stored in a list
【Posted】: 2018-04-15 07:30:40
【Question description】:

I have a data frame df and a list of indices L marking the positions where I should put 0 in place of the current value of df.

Example:

df:

# A tibble: 11 x 3
      A     B     C
    <dbl> <dbl> <dbl>
    1724     4  2013
    1758     4  2013
    1612     3  2013
    1692     3  2013
    1260    33  2014
    1157    22  2014
    1359    63  2014
    1414    27  2014
    387     3  2016
    374     3  2016

L:

[[1]]
[1] 3 4

[[2]]
[1] 1 2 3 4 5

[[3]]
[1] 1

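So in this example, I have to put zeros in rows 3 and 4 of column A, rows 1:5 of column B, and row 1 of column C.

Is there a way to do this as a one-liner in R? A dplyr or base R solution would be great! Also, I would like to avoid apply or loops, because I have to do this very efficiently.

【Question discussion】:

Tags: r dplyr subset

【Solution 1】:

A loop seems very fast to me. I have not done a complexity comparison, but if your replacement indices are in list form and you want to replace with 'val', simply: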

df
    a  b  c
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10

val<-0
for(i in seq_along(L)){
  df[L[[i]],i]<-val
}

df
    a  b  c
1   1  0  0
2   2  0  2
3   0  0  3
4   0  0  4
5   5  0  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10
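
For reuse, the loop above could be wrapped in a small function. This is only a sketch: the name zero_at is hypothetical, and it assumes the i-th element of L refers to the i-th column of df. It uses seq_along(L) so that an empty L leaves df untouched, whereas 1:length(L) would iterate over 1 and 0.

```r
# Hypothetical helper (not from the original answer):
# set df[L[[i]], i] to val for every i.
zero_at <- function(df, L, val = 0) {
  stopifnot(length(L) <= ncol(df))  # assumption: one index vector per column
  for (i in seq_along(L)) {
    df[L[[i]], i] <- val
  }
  df
}

df <- data.frame(a = 1:10, b = 1:10, c = 1:10)
L <- list(c(3, 4), 1:5, 1)
out <- zero_at(df, L)
print(out)
```

Because the function works on its own copy of df, the caller has to assign the result back (df <- zero_at(df, L)) to keep the change.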

I tested this on x, a df with 10,000 rows and 10,0000 columns:

> b<-Sys.time()
> for(i in 1:length(L)){
+ x[L[[i]],i]<-0
+ }
> Sys.time()-b
Time difference of 0.490464 secs

Seems quick :) I know this is obvious, but I hope it helps!
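
To repeat this kind of timing yourself, something like the following might do. It is a rough sketch on made-up data: the size n, the random values, and the choice of 10 random rows per column are all assumptions, not the data used above, so the timings will differ.

```r
set.seed(1)
n <- 500  # assumed size, much smaller than the test described above
x <- as.data.frame(matrix(runif(n * n), nrow = n))

# Assumed index list: 10 random row numbers for each column
L <- lapply(seq_len(n), function(i) sample.int(n, 10))

# system.time() evaluates the loop in the calling environment,
# so x is modified in place here
elapsed <- system.time(
  for (i in seq_along(L)) {
    x[L[[i]], i] <- 0
  }
)["elapsed"]

print(elapsed)
```

system.time() is a little more convenient than subtracting two Sys.time() calls, and it also reports user and system CPU time.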

******** Edit 1 ********

If we look at @mt1022's approach using unlist and cbind:

> b<-Sys.time()
> Lcol <- rep(seq_along(L), lengths(L))
> x[cbind(unlist(L), Lcol)] <- 0
> Sys.time()-b
Time difference of 7.467723 secs

Clearly much slower (because when we unlist, we are essentially looping over every element in L rather than every vector in L). ;)
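
The slowdown seen here concerns matrix indexing on a data frame. If every column has the same type (an assumption that holds for the example data but not for data frames in general), one possible workaround is to convert to a plain matrix first, where assignment through a 2-column index matrix is vectorized. A minimal sketch, using the first five rows of the question's data; it has not been timed against the loop, so measure on your own data before relying on it:

```r
df <- data.frame(A = c(1724, 1758, 1612, 1692, 1260),
                 B = c(4, 4, 3, 3, 33),
                 C = c(2013, 2013, 2013, 2013, 2014))
L <- list(c(3, 4), 1:5, 1)

m <- as.matrix(df)                                      # all-numeric columns assumed
idx <- cbind(unlist(L), rep(seq_along(L), lengths(L)))  # (row, column) pairs
m[idx] <- 0                                             # vectorized on a matrix
res <- as.data.frame(m)
print(res)
```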

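【Discussion】:

Thanks for benchmarking my solution. As stated in the manual, it seems that "Matrix indexing (x[i] with a logical or a 2-column integer matrix i) using [ is not recommended".
I really should read the manual more. It seems fair though, otherwise you would be using the same memory unnecessarily.

【Solution 2】:

Another way, using an index matrix: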

# DF <- read.table(textConnection('A     B  C
#     1724     4  2013
#     1758     4  2013
#     1612     3  2013
#     1692     3  2013
#     1260    33  2014
#     1157    22  2014
#     1359    63  2014
#     1414    27  2014
#     387     3  2016
#     374     3  2016'), header = T)
# 
# L <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))


Lcol <- rep(seq_along(L), lengths(L))
DF[cbind(unlist(L), Lcol)] <- 0

# > DF
#       A  B    C
# 1  1724  0    0
# 2  1758  0 2013
# 3     0  0 2013
# 4     0  0 2013
# 5  1260  0 2014
# 6  1157 22 2014
# 7  1359 63 2014
# 8  1414 27 2014
# 9   387  3 2016
# 10  374  3 2016
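
To see what the index matrix contains, it may help to print it on its own. In this sketch the column names row and col are added purely for display and are not part of the answer above; each row of idx is one (row, column) position that gets overwritten.

```r
L <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))

# Column number for each index: element i of L is repeated lengths(L)[i] times
Lcol <- rep(seq_along(L), lengths(L))

# One row per cell to replace; the names are illustrative only
idx <- cbind(row = unlist(L), col = Lcol)
print(idx)
```

With this L the matrix has 2 + 5 + 1 = 8 rows: two positions in column 1, five in column 2, and one in column 3.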

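【Discussion】:

Excellent answer. Perhaps you could make it a one-liner: DF[cbind(unlist(mylist), rep(seq_along(L), lengths(L)))] <- 0
@MKR Sure, but a one-liner hurts readability. Anyway, the main problem is that this approach is too slow :(.

【Solution 3】:

Another option is to use mapply together with do.call.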

  do.call(cbind, mapply(function(x,y){
    df[x,y]<-0
    df[y]
  }, mylist, seq_along(mylist)))

  #         A  B    C
  # [1,] 1724  0    0
  # [2,] 1758  0 2013
  # [3,]    0  0 2013
  # [4,]    0  0 2013
  # [5,] 1260  0 2014
  # [6,] 1157 22 2014
  # [7,] 1359 63 2014
  # [8,] 1414 27 2014
  # [9,]  387  3 2016
  # [10,]  374  3 2016

Data:

  df <- read.table(text = 
      "A       B     C
      1724     4  2013
      1758     4  2013
      1612     3  2013
      1692     3  2013
      1260    33  2014
      1157    22  2014
      1359    63  2014
      1414    27  2014
      387     3  2016
      374     3  2016", header = TRUE)

  mylist <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))
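
Note that the do.call(cbind, ...) call above returns a matrix rather than a data frame. If a data frame is wanted, a variation with Map and replace may work; this is a sketch, not part of the original answer, and it assumes mylist has exactly one index vector per column of df, in column order.

```r
df <- read.table(text =
    "A       B     C
    1724     4  2013
    1758     4  2013
    1612     3  2013
    1692     3  2013
    1260    33  2014
    1157    22  2014
    1359    63  2014
    1414    27  2014
    387     3  2016
    374     3  2016", header = TRUE)

mylist <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))

# Pair each column with its index vector; replace() returns the modified column.
# Assigning into df[] keeps the data.frame class and column names.
df[] <- Map(function(col, idx) replace(col, idx, 0), df, mylist)
print(df)
```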

【Discussion】: