R: Using indices while looping through a list of data frames (answers)

【Question title】: R: Using indices while looping through list of data frames
【Posted】: 2017-08-12 16:20:10
【Question description】:

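For the past few days I have been trying to use indices to loop through a list of data frames in order to populate the same field in each data frame, but I have not been able to come up with a solution. I'm pretty sure I should be using lapply, but I don't know how to reference row numbers within a list of data frames to execute the commands.

My data looks like this: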

           pin        keypin2
01011030030000 01011030030000
01011030030000              0
01011030040000 01011030030000
01011030040000              0
01011040040000 01011040030000
01011040040000 01011040030000
01011040040000 01011040030000
01011040040000              0
01011060040000 01011060010000
01011060040000              0
01011060040000              0
01011060040000              0

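The goal is to fill each "0" value in the keypin2 field with the keypin2 value directly above it, on the condition that the pin values match.

I wrote a simple for loop that accomplishes this on a single data frame: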

for(i in 2:nrow(test3)) {
  if(test3[i,2] == "0") {
    if(test3[i,1]==test3[c(i-1),1]){
      test3[i,2] <- test3[c(i-1),2]
    }
  }
}

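I start at 2:nrow(test3) because otherwise i - 1 would be an invalid index (0) on the first record, and I know that if the first record has a keypin2 of "0" I can leave it as "0", because there is no earlier keypin2 to copy.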
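One edge case in that choice of range: `2:nrow(test3)` counts downward (`2, 1`) when a data frame has only one row, so the loop would try to read a nonexistent row 2. A minimal sketch of a guard, assuming the same two-column layout as above; `fill_down_loop` and `one_row` are hypothetical names used only for illustration:

```r
fill_down_loop <- function(df) {
  # seq_len(nrow(df))[-1] is empty for 0- or 1-row data frames,
  # whereas 2:nrow(df) would yield c(2, 1) for a single row.
  for (i in seq_len(nrow(df))[-1]) {
    if (df[i, 2] == "0" && df[i, 1] == df[i - 1, 1]) {
      df[i, 2] <- df[i - 1, 2]
    }
  }
  df
}

# A single-row data frame passes through untouched instead of raising an error.
one_row <- data.frame(pin = "01011030030000", keypin2 = "0",
                      stringsAsFactors = FALSE)
fill_down_loop(one_row)
```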

The result is exactly what I want:

           pin        keypin2
01011030030000 01011030030000
01011030030000 01011030030000
01011030040000 01011030030000
01011030040000 01011030030000
01011040040000 01011040030000
01011040040000 01011040030000
01011040040000 01011040030000
01011040040000 01011040030000
01011060040000 01011060010000
01011060040000 01011060010000
01011060040000 01011060010000
01011060040000 01011060010000

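I now want to apply this to a list of data frames with the same structure. I'm sure I should be able to do this with lapply, but I can't seem to get it to work. Any help or guidance would be greatly appreciated.

【Question discussion】:

Tags: r list loops dataframe lapply

【Solution 1】:

Simply write a function that wraps your code, then use lapply on the list.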

# reproduce data, create list
# (pins are written as strings so that the leading zeros are preserved)
test3 <- data.frame(
  pin = c("01011030030000", "01011030030000", "01011030040000", "01011030040000",
          "01011040040000", "01011040040000", "01011040040000", "01011040040000",
          "01011060040000", "01011060040000", "01011060040000", "01011060040000"),
  keypin2 = c("01011030030000", "0", "01011030030000", "0", "01011040030000",
              "01011040030000", "01011040030000", "0", "01011060010000", "0", "0", "0"),
  stringsAsFactors = FALSE
)
my.data <- list(test3, test3)



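# define custom function (includes your code)
process.df <- function(df) {
  test3 <- df
  # seq_len(...)[-1] skips the loop for data frames with fewer than two rows
  for (i in seq_len(nrow(test3))[-1]) {
    if (test3[i, 2] == "0") {
      if (test3[i, 1] == test3[i - 1, 1]) {
        test3[i, 2] <- test3[i - 1, 2]
      }
    }
  }
  return(test3)
}

# execute; lapply returns a new list, so keep the result
result <- lapply(my.data, process.df)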
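If the position of each data frame in the list is also needed, one option is to iterate over the list indices rather than over the elements. The following is a small sketch under assumed toy data; `dfs` and `fill_one` are hypothetical stand-ins (the latter for `process.df`) so that the block runs on its own:

```r
# Two tiny data frames with the same layout as in the question.
dfs <- list(
  a = data.frame(pin = c("1", "1"), keypin2 = c("9", "0"), stringsAsFactors = FALSE),
  b = data.frame(pin = c("2", "3"), keypin2 = c("8", "0"), stringsAsFactors = FALSE)
)

# Stand-in for process.df: copy the value from the row above when the pins match.
fill_one <- function(df) {
  for (i in seq_len(nrow(df))[-1]) {
    if (df[i, 2] == "0" && df[i, 1] == df[i - 1, 1]) df[i, 2] <- df[i - 1, 2]
  }
  df
}

# Element-wise: names of the list are kept, inputs are not modified in place.
filled <- lapply(dfs, fill_one)

# Index-wise: k is the position in the list, dfs[[k]] the data frame itself.
filled_by_index <- lapply(seq_along(dfs), function(k) fill_one(dfs[[k]]))
```

Both calls produce the same data frames; the index-based form drops the list names unless they are reassigned with `names(filled_by_index) <- names(dfs)`.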

【Discussion】:

【Solution 2】:

One way to do this without a loop is to use the fill function from the tidyr package:

df<-read.table(header=TRUE, text="pin        keypin2
01011030030000 01011030030000
               01011030030000              0
               01011030040000 01011030030000
               01011030040000              0
               01011040040000 01011040030000
               01011040040000 01011040030000
               01011040040000 01011040030000
               01011040040000              0
               01011060040000 01011060010000
               01011060040000              0
               01011060040000              0
               01011060040000              0",   colClasses=c("character", "character"))

# replace the "0" placeholders with NA
df$keypin2[df$keypin2 == "0"] <- NA

library(tidyr)
#replace the NA with the cell above
fill(df, keypin2, .direction = "down")

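This is much faster than using a loop, but it assumes that keypin2[n] is non-zero whenever pin[n] != pin[n-1], i.e. that the first row of each pin always carries a real keypin2 value. If it does not, the "0" would be filled with a value belonging to a different pin.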
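If that assumption cannot be guaranteed, the fill can be restricted to consecutive runs of the same pin without going back to a row-by-row loop. The following is a base R sketch, not part of the answer above; `fill_within_pin` and `demo` are hypothetical names, and the column names pin and keypin2 are taken from the question:

```r
fill_within_pin <- function(df) {
  n <- nrow(df)
  if (n < 2) return(df)
  # TRUE where a row starts a new run of identical pin values
  new_run <- c(TRUE, df$pin[-1] != df$pin[-n])
  # A row keeps its own value if it is non-zero or if it starts a run
  anchor <- df$keypin2 != "0" | new_run
  # For every row, the index of the most recent anchor row at or above it
  idx <- cummax(ifelse(anchor, seq_len(n), 0L))
  df$keypin2 <- df$keypin2[idx]
  df
}

demo <- data.frame(pin     = c("A", "A", "B", "B", "B"),
                   keypin2 = c("x", "0", "0", "y", "0"),
                   stringsAsFactors = FALSE)
filled_demo <- fill_within_pin(demo)
filled_demo$keypin2
# "x" "x" "0" "y" "y"  (row 3 stays "0": there is no earlier value for pin B)
```

Applied to a list of data frames, this would be `lapply(my.list, fill_within_pin)`, where `my.list` is whatever list holds the data frames.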

【Discussion】: