【Question title】: How to replace a for loop to make processing faster?
【Posted】: 2021-08-12 11:44:23
【Question】:

I have a lot of files to process. The data looks like this:

          V2              V3   V4
1 ID_0071817               1    1
2          1 201912312200+00 0.36
3          2 201912312300+00 0.36
4          3 202001010000+00 0.38
5 ID_0089011               1 1.00
6          1 202001010200+00 0.36


What I am doing now is:


for (j in seq_len(nrow(data))) {
  # A "1" in column 2 marks a header row; its first column holds the ID
  if (data[j, 2] == "1") ID <- data[j, 1]
  data[j, 4] <- ID
}


This produces:


          V2              V3   V4       V4.1
1 ID_0071817               1    1 ID_0071817
2          1 201912312200+00 0.36 ID_0071817
3          2 201912312300+00 0.36 ID_0071817
4          3 202001010000+00 0.38 ID_0071817
5 ID_0089011               1 1.00 ID_0089011
6          1 202001010200+00 0.36 ID_0089011


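The problem is that processing the whole data set this way is too slow. A single file takes about 5 minutes, and I have several thousand of them.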


【Comments】:

  • Please add a language tag. Is this R?

Tags: r performance loops for-loop


【Solution 1】:

We can try cumsum, as shown below

transform(
  df,
  V4.1 = V2[V3 == 1][cumsum(V3 == 1)]
)

which gives

          V2              V3   V4       V4.1
1 ID_0071817               1 1.00 ID_0071817
2          1 201912312200+00 0.36 ID_0071817
3          2 201912312300+00 0.36 ID_0071817
4          3 202001010000+00 0.38 ID_0071817
5 ID_0089011               1 1.00 ID_0089011
6          1 202001010200+00 0.36 ID_0089011
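
To see why this works, the one-liner can be broken into its intermediate steps. The following is only an illustrative sketch: the variable names is_header, ids, block and filled are introduced here and are not part of the answer. It assumes, as in the sample data, that the first row is a header row (a row where V3 equals 1).

```r
# Columns copied from the sample data in the question
V2 <- c("ID_0071817", "1", "2", "3", "ID_0089011", "1")
V3 <- c("1", "201912312200+00", "201912312300+00", "202001010000+00",
        "1", "202001010200+00")

is_header <- V3 == 1            # TRUE on rows that start a new ID block
ids       <- V2[is_header]      # one ID per block, in order of appearance
block     <- cumsum(is_header)  # running block number for every row
filled    <- ids[block]         # look up each row's ID by its block number

print(data.frame(is_header, block, filled))
```

Because every step is a single vectorized operation, no row-by-row assignment into the data frame is needed. If the first row were not a header row, block would start at 0 and R would silently drop those zero indices, so the result would be shorter than the data; that case would need separate handling.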

Data

> dput(df)
structure(list(V2 = c("ID_0071817", "1", "2", "3", "ID_0089011",
"1"), V3 = c("1", "201912312200+00", "201912312300+00", "202001010000+00",
"1", "202001010200+00"), V4 = c(1, 0.36, 0.36, 0.38, 1, 0.36)),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
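
Since the question mentions several thousand files, the same lookup could be wrapped in a function and applied to each table in turn. This is a rough sketch under stated assumptions, not part of the original answer: fill_ids is a hypothetical helper name, the two small tables are made-up stand-ins for real files, and each table is assumed to start with a header row.

```r
# Hypothetical helper: adds the ID column using the cumsum lookup.
# Assumes the first row of each table is a header row (V3 equal to 1).
fill_ids <- function(d) {
  is_header <- d$V3 == 1
  d$V4.1 <- d$V2[is_header][cumsum(is_header)]
  d
}

# Two small made-up tables stand in for data read from disk
tables <- list(
  a = data.frame(V2 = c("ID_A", "1", "2"), V3 = c("1", "t1", "t2"),
                 V4 = c(1, 0.36, 0.38), stringsAsFactors = FALSE),
  b = data.frame(V2 = c("ID_B", "1"), V3 = c("1", "t1"),
                 V4 = c(1, 0.5), stringsAsFactors = FALSE)
)

results <- lapply(tables, fill_ids)
print(results$a)

# With real files, `tables` might instead be built along these lines
# (the directory and the read.table arguments are assumptions):
# files  <- list.files("path/to/data", full.names = TRUE)
# tables <- lapply(files, read.table, stringsAsFactors = FALSE)
```

Keeping the per-file work in one function makes it easy to time a single file first before running the whole batch.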

【Comments】:
