Question title: R nested for loop iterating over rows and column names
Posted: 2017-04-09 01:15:48
Question:

I'm new to R, so please forgive the basic question.

Here is a Dropbox link to a .csv of my data.

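I have country data for 1990-2010. My data is wide: each country is a row, and each year has two columns, one for each of two data sources. However, the data for some countries is incomplete. For example, a country's row might have NA values in the 1990-1995 columns.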

I want to create two columns where, for each country row, the values are the earliest non-NA value from each of the two data sources.

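I also want to create two more columns where, for each country row, the values are the year of that earliest non-NA value for each of the two data sources.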

So the final four columns would look like this:

1990, 12, 1990, 87
1990, 7, 1990, 132
1996, 22, 1996, 173
1994, 14, 1994, 124
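For one row of one data source, the core step is finding the position of the first non-NA entry and the column it sits in. A minimal sketch, assuming the row has been pulled out as a named numeric vector whose names carry the year after an "X" (the names and values below are hypothetical):

```r
# One row of one data source as a named vector (hypothetical names and values)
row_vals <- c(X1990 = NA, X1991 = NA, X1992 = 22, X1993 = 25)

# Position of the first non-NA entry; NA if every entry is NA
first <- which(!is.na(row_vals))[1]

earliest_value <- unname(row_vals[first])                 # 22
earliest_year  <- sub("^X", "", names(row_vals)[first])   # "1992"
```

Because the columns are assumed to be in year order, "first non-NA" and "earliest non-NA" coincide.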

Here is my rough, semi-pseudocode attempt at a nested for loop:

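df$earliest_year <- NA
for (i in seq_len(nrow(df))) {
  for (j in names(df)) {
    # df$j would look for a column literally named "j"; df[i, j] uses the name stored in j
    if (!is.na(df[i, j])) {
      df$earliest_year[i] <- j  # record the first non-NA column name
      break                     # stop at the earliest match
    }
  }
}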
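A common stumbling block when looping over column names: `df$j` looks for a column literally named `j`, not the column whose name is stored in the variable `j`. A small illustration on a hypothetical two-column data frame:

```r
# Hypothetical data frame with two year columns
df <- data.frame(X1990 = c(NA, 5), X1991 = c(3, 7))
j <- "X1991"

is.null(df$j)   # TRUE: `$` does not evaluate j, and there is no column "j"
df[[j]]         # 3 7: `[[` uses the value stored in j
df[1, j]        # 3: a single cell, row 1 of column "X1991"
```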

How can I generate these four columns? Thanks!

Tags: r for-loop


    Answer 1:

    You mentioned a for loop, so I wrote one. Later on, though, you might want to try other R functions such as apply. This code is a bit long, but I hope it helps:

    # read data; I'm assuming the first column is the row name and not important
    df <- read.csv("wb_wide.csv", row.names = 1)
    
    # get the names of the columns for the two data sources
    # here I used grep to find column names matching the NY and SP patterns;
    # but if the columns consistently alternate between the two sources,
    # you can use a sequence of numbers instead
    dataSourceA <- grep("NY", names(df), value = TRUE)
    dataSourceB <- grep("SP", names(df), value = TRUE)
    
    # create the new columns for the data set:
    # if I understand correctly, the first non-NA value from source A
    # and from source B, and then the year of each of these non-NA values
    df$sourceA <- NA
    df$yearA <- NA
    df$sourceB <- NA
    df$yearB <- NA
    
    # iterate over the rows
    for (i in seq_len(nrow(df))) {
    
      # select the source A columns for this row as a named vector,
      # find the position of the first non-NA value, and store that value
      # (unlist() gives a plain vector, so sourceA does not become a list column)
      valsA <- unlist(df[i, dataSourceA, drop = FALSE])
      firstA <- which(!is.na(valsA))[1]   # NA if the whole row is NA
      df$sourceA[i] <- valsA[firstA]
    
      # use gsub to strip the column name so that only the year is left
      # (you can also skip this and clean the column afterward)
      df$yearA[i] <- gsub(pattern = "^.*X", replacement = "", x = names(valsA)[firstA])
    
      # same as above, but selecting from source B
      valsB <- unlist(df[i, dataSourceB, drop = FALSE])
      firstB <- which(!is.na(valsB))[1]
      df$sourceB[i] <- valsB[firstB]
      df$yearB[i] <- gsub(pattern = "^.*X", replacement = "", x = names(valsB)[firstB])
    
    }
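The `grep`/`gsub` steps depend entirely on how the column headers are spelled. A quick check on hypothetical names, assuming each name carries the indicator code and ends in "X" followed by the year (`read.csv` prefixes column names that start with a digit with "X", which may be where the X comes from); adjust the patterns to the real headers:

```r
# Hypothetical column names: indicator code, then "X" and the year
cols <- c("NY.GDP.X1990", "SP.POP.X1990", "NY.GDP.X1991", "SP.POP.X1991")

# value = TRUE returns the matching names rather than their positions
dataSourceA <- grep("NY", cols, value = TRUE)
dataSourceB <- grep("SP", cols, value = TRUE)

# "^.*X" greedily removes everything up to the last "X", leaving the year
years <- gsub(pattern = "^.*X", replacement = "", x = dataSourceA)
```

If a real name contained another "X" after the year, the greedy pattern would cut too much, so it is worth printing `names(df)` first.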
    

    I tried to make the code specific to your example and desired output.
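The answer mentions `apply` as an alternative to the explicit loop. A minimal sketch of that route, assuming the year columns are in chronological order and the year can be recovered with the same `"^.*X"` pattern; the helper `first_non_na` and the small data frame are hypothetical:

```r
# Hypothetical helper: for one row (a named vector), return the first
# non-NA value and the year parsed from its column name
first_non_na <- function(vals) {
  k <- which(!is.na(vals))[1]                      # NA if the row is all NA
  c(value = unname(vals[k]),
    year  = as.numeric(gsub("^.*X", "", names(vals)[k])))
}

# Hypothetical wide data: two sources (NY, SP) per year, one row per country
df <- data.frame(NY.X1990 = c(12, NA), SP.X1990 = c(87, NA),
                 NY.X1991 = c(13, 22), SP.X1991 = c(90, 173),
                 row.names = c("A", "B"))

dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# apply() over rows (MARGIN = 1) returns one column per row, so transpose it
resA <- t(apply(df[, dataSourceA, drop = FALSE], 1, first_non_na))
resB <- t(apply(df[, dataSourceB, drop = FALSE], 1, first_non_na))

df$yearA   <- resA[, "year"]
df$sourceA <- resA[, "value"]
df$yearB   <- resB[, "year"]
df$sourceB <- resB[, "value"]
```

`apply` converts the selected columns to a matrix, so this only suits all-numeric data columns; the loop version is easier to debug row by row.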

    Comments:

    • This is great! Thank you so much!! Very helpful explanation, too.