R nested for loop iterating over rows and column names (answers)

【Question title】: R nested for loop iterating over rows and column names
【Posted】: 2017-04-09 01:15:48
【Question description】:

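I'm new to R, so please forgive the basic question.

Here is a Dropbox link to a .csv of my data.

I have country data for the years 1990-2010. My data is wide: each country is a row, and each year has two columns corresponding to two data sources. However, the data for some countries is incomplete. For example, a country row might have NA values in the 1990-1995 columns.

I'd like to create two columns where, for each country row, the value is the earliest non-NA value for each of the two data types.

I'd also like to create two other columns where, for each country row, the value is the earliest non-NA year for each of the two data types.

So the last four columns would look like this: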

1990, 12, 1990, 87
1990, 7, 1990, 132
1996, 22, 1996, 173
1994, 14, 1994, 124

Here is my rough, semi-pseudocode attempt at a nested for loop:

for i in (number of rows){
  for j in names(df){
    if(is.na(df$j) == FALSE)  df$earliest_year = j
  }
}

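How can I generate these four desired columns? Thanks!

【Question comments】:

Tags: r for-loop

【Solution 1】:

You mentioned a for loop, so I tried to write one. You might want to try other R functions, such as apply, later on. The code is a bit long, but I hope it helps: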

# Read the data; I'm assuming the first column holds row names and isn't important
df <- read.csv("wb_wide.csv", row.names = 1)

# Get the column names for the two data sources.
# Here I used grep to find column names matching the NY and SP patterns,
# but if the columns consistently alternate,
# you could use a sequence of numbers instead
dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# Create the new columns, initialised to NA.
# If I understand correctly, these hold the first non-NA value from source A
# and from source B, plus the year of each of those values
df$sourceA <- NA
df$yearA <- NA
df$sourceB <- NA
df$yearB <- NA

# Loop over the rows
for (i in seq_len(nrow(df))) {

  # Position of the first non-NA value among the source A columns
  # (NA if the whole row is missing, in which case the new columns stay NA)
  firstA <- which(!is.na(df[i, dataSourceA]))[1]
  if (!is.na(firstA)) {
    df$sourceA[i] <- df[i, dataSourceA[firstA]]
    # gsub strips everything up to the "X" in the column name so only the year is left;
    # you could also skip this and clean the values afterwards
    df$yearA[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceA[firstA])
  }

  # Same steps for source B
  firstB <- which(!is.na(df[i, dataSourceB]))[1]
  if (!is.na(firstB)) {
    df$sourceB[i] <- df[i, dataSourceB[firstB]]
    df$yearB[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceB[firstB])
  }

}

I tried to make the code specific to your example and desired output.
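
The answer mentions apply as something to try later. As a minimal sketch of that idea (not the answerer's code), the same four columns can be built without an explicit row loop. The toy column names and the helper name first_non_na are hypothetical, and the sketch assumes each column name ends in "X" followed by the year, as the gsub pattern above implies:

```r
# Toy data with hypothetical column names; the real file's names may differ
toy <- data.frame(
  NY.X1990 = c(12, NA, NA, NA),
  NY.X1991 = c(15, 7, NA, NA),
  NY.X1992 = c(18, 9, 22, NA),
  SP.X1990 = c(87, NA, NA, NA),
  SP.X1991 = c(90, 132, NA, NA),
  SP.X1992 = c(95, 140, 173, NA),
  row.names = c("A", "B", "C", "D")
)

# Hypothetical helper: for each row, return the first non-NA value among
# `cols` and the year parsed from that column's name
first_non_na <- function(data, cols) {
  m <- as.matrix(data[, cols, drop = FALSE])
  # Column position of the first non-NA entry in each row (NA if all missing)
  pos <- apply(m, 1, function(r) which(!is.na(r))[1])
  data.frame(
    # Matrix indexing by (row, column) pairs; a pair containing NA yields NA
    value = m[cbind(seq_len(nrow(m)), pos)],
    year = as.integer(sub("^.*X", "", cols[pos]))
  )
}

# Pick the column names before adding any new columns
colsA <- grep("NY", names(toy), value = TRUE)
colsB <- grep("SP", names(toy), value = TRUE)
a <- first_non_na(toy, colsA)
b <- first_non_na(toy, colsB)

toy$sourceA <- a$value
toy$yearA <- a$year
toy$sourceB <- b$value
toy$yearB <- b$year
```

Row D is entirely NA, so its four new columns stay NA rather than raising an error. The columns are assumed to be in chronological order from left to right, since "first" here means the leftmost non-NA column.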

【Comments】:

This is great! Thanks so much!! Very helpful explanations, too.