【Question title】: How to replace a for loop to make processing faster?
【Posted】: 2021-08-12 11:44:23
【Question description】:

I have a lot of files to process. The data looks like this:

         V2              V3   V4
1 ID_0071817               1    1
2          1 201912312200+00 0.36
3          2 201912312300+00 0.36
4          3 202001010000+00 0.38
5 ID_0089011               1 1.00
6          1 202001010200+00 0.36

This is what I am doing now:


for (j in 1:nrow(data)) {
  # A row whose second column is "1" starts a new block; remember its ID
  if (data[j, 2] == "1") ID <- data[j, 1]
  data[j, 4] <- ID
}

which produces:


      V2              V3   V4       V4.1
1 ID_0071817               1    1 ID_0071817
2          1 201912312200+00 0.36 ID_0071817
3          2 201912312300+00 0.36 ID_0071817
4          3 202001010000+00 0.38 ID_0071817
5 ID_0089011               1 1.00 ID_0089011
6          1 202001010200+00 0.36 ID_0089011

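The problem is that processing the whole data set this way is too slow. A single file takes about 5 minutes, and I have several thousand of them.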

【Comments】:

Please add a language tag. Is this R?

Tags: r performance loops for-loop

【Solution 1】:

We can try cumsum, as shown below:

transform(
  df,
  V4.1 = V2[V3 == 1][cumsum(V3 == 1)]
)

which gives

          V2              V3   V4       V4.1
1 ID_0071817               1 1.00 ID_0071817
2          1 201912312200+00 0.36 ID_0071817
3          2 201912312300+00 0.36 ID_0071817
4          3 202001010000+00 0.38 ID_0071817
5 ID_0089011               1 1.00 ID_0089011
6          1 202001010200+00 0.36 ID_0089011
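
To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below rebuilds the sample data and uses the illustrative names is_header, group_idx and ids, which are not part of the original answer. It assumes the first row of the data is always a header row (V3 equal to "1"); otherwise cumsum would start at 0 and the lookup would not line up with the rows.

```r
# Sample data, rebuilt from the dput() output in this answer
df <- data.frame(
  V2 = c("ID_0071817", "1", "2", "3", "ID_0089011", "1"),
  V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
         "1", "202001010200+00"),
  V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36),
  stringsAsFactors = FALSE
)

# Illustrative intermediate names (not from the original answer)
is_header <- df$V3 == "1"       # TRUE on rows that start a new ID block
group_idx <- cumsum(is_header)  # running count of headers: block number per row
ids       <- df$V2[is_header]   # one ID per block, in order of appearance

# Look up each row's block ID in a single vectorized indexing step
df$V4.1 <- ids[group_idx]
print(df)
```

Each row is handled by vector operations instead of one data frame assignment per iteration, which is where the loop spends most of its time.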

Data

> dput(df)
structure(list(V2 = c("ID_0071817", "1", "2", "3", "ID_0089011",
"1"), V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
"1", "202001010200+00"), V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36)), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))

【Discussion】:
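
As a sanity check before running this on thousands of files, the cumsum expression can be compared against the loop on the sample data. This is only a sketch: the loop is restated here with a preallocated result vector (loop_ids, a hypothetical name) rather than writing into the data frame, and the column names V2 and V3 are taken from the sample above.

```r
# Sample data, rebuilt from the dput() output in the answer
df <- data.frame(
  V2 = c("ID_0071817", "1", "2", "3", "ID_0089011", "1"),
  V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
         "1", "202001010200+00"),
  V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36),
  stringsAsFactors = FALSE
)

# Loop version: carry the most recent header ID forward, row by row
loop_ids <- character(nrow(df))
for (j in seq_len(nrow(df))) {
  if (df$V3[j] == "1") ID <- df$V2[j]
  loop_ids[j] <- ID
}

# Vectorized version, same expression as in the answer
vec_ids <- df$V2[df$V3 == "1"][cumsum(df$V3 == "1")]

# Both approaches should assign the same ID to every row
identical(loop_ids, vec_ids)
```

If the comparison holds on a few real files as well, the loop can be dropped in favor of the vectorized form.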