【Question title】: How can I create a data.frame from a nested list with differing numbers of variables?
【Posted】: 2014-10-16 06:11:33
【Question】:

I have downloaded a JSON file of Nobel Prize winners and converted it to a list named "nobels". A couple of records are shown in the structure below:

str(nobels)  

List of 1
$ laureates:List of 2
  ..$ :List of 13
  .. ..$ id             : chr "359"
  .. ..$ firstname      : chr "Axel Hugo Theodor"
  .. ..$ surname        : chr "Theorell"
  .. ..$ born           : chr "1903-07-06"
  .. ..$ died           : chr "1982-08-15"
  .. ..$ bornCountry    : chr "Sweden"
  .. ..$ bornCountryCode: chr "SE"
  .. ..$ bornCity       : chr "Linköping"
  .. ..$ diedCountry    : chr "Sweden"
  .. ..$ diedCountryCode: chr "SE"
  .. ..$ diedCity       : chr "Stockholm"
  .. ..$ gender         : chr "male"
  .. ..$ prizes         :List of 1
  .. .. ..$ :List of 5
  .. .. .. ..$ year        : chr "1955"
  .. .. .. ..$ category    : chr "medicine"
  .. .. .. ..$ share       : chr "1"
  .. .. .. ..$ motivation  : chr "\"for his discoveries concerning the nature and mode of action of oxidation enzymes\""
  .. .. .. ..$ affiliations:List of 1
  .. .. .. .. ..$ :List of 3
  .. .. .. .. .. ..$ name   : chr "Karolinska Institutet, Nobel Medical Institute"
  .. .. .. .. .. ..$ city   : chr "Stockholm"
  .. .. .. .. .. ..$ country: chr "Sweden"

  ..$ :List of 10
  .. ..$ id             : chr "774"
  .. ..$ firstname      : chr "Richard"
  .. ..$ surname        : chr "Axel"
  .. ..$ born           : chr "1946-07-02"
  .. ..$ died           : chr "0000-00-00"
  .. ..$ bornCountry    : chr "USA"
  .. ..$ bornCountryCode: chr "US"
  .. ..$ bornCity       : chr "New York, NY"
  .. ..$ gender         : chr "male"
  .. ..$ prizes         :List of 1
  .. .. ..$ :List of 5
  .. .. .. ..$ year        : chr "2004"
  .. .. .. ..$ category    : chr "medicine"
  .. .. .. ..$ share       : chr "2"
  .. .. .. ..$ motivation  : chr "\"for their discoveries of odorant receptors and the organization of the olfactory system\""
  .. .. .. ..$ affiliations:List of 1
  .. .. .. .. ..$ :List of 3
  .. .. .. .. .. ..$ name   : chr "Columbia University"
  .. .. .. .. .. ..$ city   : chr "New York, NY"
  .. .. .. .. .. ..$ country: chr "USA"

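How should I go about converting this to a data.frame?

Although there are lists within lists, I would be happy with just the year and category, without the rest of the prizes data.

There is an additional issue: not every record has the same number of variables. For example, the second record here has no diedCountry field, among others.

TIA

Apologies, I really should not do this late at night. The answers provided work fine for my original question. However, when I run the full list, I get an error: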

Error in data.frame(year = "1931", category = "literature", share = "1",  : 
arguments imply differing number of rows: 1, 0

Below is the data that causes the problem. It appears to be related to the affiliations:

nobels <- list(structure(list(id = "359", firstname = "Axel Hugo Theodor", 
surname = "Theorell", born = "1903-07-06", died = "1982-08-15", 
bornCountry = "Sweden", bornCountryCode = "SE", bornCity = "Linköping", 
diedCountry = "Sweden", diedCountryCode = "SE", diedCity = "Stockholm", 
gender = "male", prizes = list(structure(list(year = "1955", 
category = "medicine", share = "1", motivation = "\"for his discoveries concerning the          nature and mode of action of oxidation enzymes\"", 
affiliations = list(structure(list(name = "Karolinska Institutet, Nobel Medical    Institute", 
city = "Stockholm", country = "Sweden"), .Names = c("name", 
"city", "country")))), .Names = c("year", "category", 
"share", "motivation", "affiliations")))), .Names = c("id", 
"firstname", "surname", "born", "died", "bornCountry", "bornCountryCode", 
"bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender", 
"prizes")), structure(list(id = "604", firstname = "Erik Axel", 
surname = "Karlfeldt", born = "1864-07-20", died = "1931-04-08", 
bornCountry = "Sweden", bornCountryCode = "SE", bornCity = "Karlbo", 
diedCountry = "Sweden", diedCountryCode = "SE", diedCity = "Stockholm", 
gender = "male", prizes = list(structure(list(year = "1931", 
category = "literature", share = "1", motivation = "\"The poetry of Erik Axel   Karlfeldt\"", 
affiliations = list(list())), .Names = c("year", "category", 
"share", "motivation", "affiliations")))), .Names = c("id", 
"firstname", "surname", "born", "died", "bornCountry", "bornCountryCode", 
"bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender", 
"prizes")))

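【Comments】:

  • Please add dput(head(your_list)).
  • @pssguy It is better to show the dput of your dataset (as I did in my post) to give an accurate representation of the data structure.
  • @akrun Sorry, I forgot about dput. Thanks for the suggestion.
  • @pssguy Please check my update.

Tags: r list nested-lists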


【Answer 1】:

As you correctly identified, the problem is caused by affiliations, whose sub-list is an empty list.

> str(nobels)
List of 2
 $ :List of 13
  ..$ id             : chr "359"
  ..$ firstname      : chr "Axel Hugo Theodor"
  ..$ surname        : chr "Theorell"
  ..$ born           : chr "1903-07-06"
  ..$ died           : chr "1982-08-15"
  ..$ bornCountry    : chr "Sweden"
  ..$ bornCountryCode: chr "SE"
  ..$ bornCity       : chr "Linköping"
  ..$ diedCountry    : chr "Sweden"
  ..$ diedCountryCode: chr "SE"
  ..$ diedCity       : chr "Stockholm"
  ..$ gender         : chr "male"
  ..$ prizes         :List of 1
  .. ..$ :List of 5
  .. .. ..$ year        : chr "1955"
  .. .. ..$ category    : chr "medicine"
  .. .. ..$ share       : chr "1"
  .. .. ..$ motivation  : chr "\"for his discoveries concerning the          nature and mode of action of oxidation enzymes\""
  .. .. ..$ affiliations:List of 1
  .. .. .. ..$ :List of 3
  .. .. .. .. ..$ name   : chr "Karolinska Institutet, Nobel Medical    Institute"
  .. .. .. .. ..$ city   : chr "Stockholm"
  .. .. .. .. ..$ country: chr "Sweden"
 $ :List of 13
  ..$ id             : chr "604"
  ..$ firstname      : chr "Erik Axel"
  ..$ surname        : chr "Karlfeldt"
  ..$ born           : chr "1864-07-20"
  ..$ died           : chr "1931-04-08"
  ..$ bornCountry    : chr "Sweden"
  ..$ bornCountryCode: chr "SE"
  ..$ bornCity       : chr "Karlbo"
  ..$ diedCountry    : chr "Sweden"
  ..$ diedCountryCode: chr "SE"
  ..$ diedCity       : chr "Stockholm"
  ..$ gender         : chr "male"
  ..$ prizes         :List of 1
  .. ..$ :List of 5
  .. .. ..$ year        : chr "1931"
  .. .. ..$ category    : chr "literature"
  .. .. ..$ share       : chr "1"
  .. .. ..$ motivation  : chr "\"The poetry of Erik Axel   Karlfeldt\""
  .. .. ..$ affiliations:List of 1
  .. .. .. ..$ : list()                             <-- problem here
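
To see why the empty sub-list matters, here is a minimal sketch that reproduces the failure in isolation. The `prize` object is a trimmed-down stand-in built from the values in the question, not the full record, and the error is captured with tryCatch so the script keeps running.

```r
# Trimmed-down stand-in for the second laureate's prize (values from the question).
prize <- list(
  year         = "1931",
  category     = "literature",
  affiliations = list(list())   # one affiliation slot with nothing in it
)

# The empty inner list turns into a 0-row data frame, which cannot be
# recycled against the 1-row columns, so data.frame() stops with an error.
msg <- tryCatch(
  {
    do.call(data.frame, prize)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A zero-length element is easy to detect before converting:
is_empty <- length(prize$affiliations[[1]]) == 0
print(is_empty)
```

The length check at the end is the same test the asker later describes using in a loop over the full list.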

If you add some random data to that list, the code works fine.

nobels[[2]]$prizes[[1]]$affiliations[[1]]<-list(name="random data")
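
For the full list, patching one record by hand will not scale. The following is a rough sketch of applying the same idea everywhere, assuming every affiliation should have the three fields seen in the question (name, city, country). `fill_empty_affiliations` and `nobels_small` are hypothetical names introduced here for illustration, and NA is used in place of "random data".

```r
# Hypothetical helper: replace every zero-length affiliation with an
# all-NA placeholder so each record flattens to the same set of columns.
fill_empty_affiliations <- function(laureates) {
  placeholder <- list(name = NA_character_, city = NA_character_,
                      country = NA_character_)
  lapply(laureates, function(person) {
    if (is.null(person$prizes)) return(person)
    person$prizes <- lapply(person$prizes, function(prize) {
      if (!is.null(prize$affiliations)) {
        prize$affiliations <- lapply(prize$affiliations, function(a) {
          if (length(a) == 0) placeholder else a
        })
      }
      prize
    })
    person
  })
}

# Trimmed-down stand-in for the data in the question.
nobels_small <- list(
  list(id = "359", surname = "Theorell",
       prizes = list(list(year = "1955", category = "medicine",
                          affiliations = list(list(
                            name = "Karolinska Institutet, Nobel Medical Institute",
                            city = "Stockholm", country = "Sweden"))))),
  list(id = "604", surname = "Karlfeldt",
       prizes = list(list(year = "1931", category = "literature",
                          affiliations = list(list()))))
)

patched <- fill_empty_affiliations(nobels_small)

# Base R counterpart of ldply(patched, data.frame): one row per record.
mydf <- do.call(rbind, lapply(patched, data.frame, stringsAsFactors = FALSE))
str(mydf)
```

Using NA rather than a dummy string means no extra column has to be dropped afterwards.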

Using the plyr package:

library(plyr)
mydf <- ldply(nobels, data.frame)

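【Comments】:

  • Thanks, but I get the error "Error in data.frame(... arguments imply differing number of rows". This is probably because some list items do not include all variables.
  • @pssguy I tried this on the data akrun shared in the other answer and did not get any errors.
  • @ujwal You are quite right. The error I get occurs when running on the full list. I think I have pinned it down in my edit above.
  • @pssguy I have edited my answer above. Could you try it now?
  • Thanks. I looped through each element of the list, tested whether length(nobels[[i]]$prizes[[1]]$affiliations[[1]]) was zero and, if so, applied your code. This created a new column (which I can subsequently drop) with a value of either "random data" or NA. It also ensured that NA was entered in those categories where the list was of length zero. Now I just need to sort out multiple prizes for a single laureate (e.g. Marie Curie) and hopefully get to actually analyse the data!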
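
For the multiple-prize case mentioned in the last comment, one possible approach is to build one row per laureate and prize. This is only a sketch: `or_na` and `prizes_long` are hypothetical helpers, and the second sample record is an illustrative stand-in with a placeholder id, not data taken from the question.

```r
# Return NA when a field is missing or empty, so data.frame() always
# receives length-1 values.
or_na <- function(x) if (length(x) == 0) NA_character_ else x

# One row per (laureate, prize) pair, keeping only a few fields.
prizes_long <- function(laureates) {
  per_person <- lapply(laureates, function(person) {
    per_prize <- lapply(person$prizes, function(prize) {
      data.frame(
        id       = or_na(person$id),
        surname  = or_na(person$surname),
        year     = or_na(prize$year),
        category = or_na(prize$category),
        stringsAsFactors = FALSE
      )
    })
    do.call(rbind, per_prize)
  })
  do.call(rbind, per_person)
}

sample_laureates <- list(
  list(id = "359", surname = "Theorell",
       prizes = list(list(year = "1955", category = "medicine"))),
  # Illustrative two-prize record; the id is a placeholder.
  list(id = "placeholder-id", surname = "Curie",
       prizes = list(list(year = "1903", category = "physics"),
                     list(year = "1911", category = "chemistry")))
)

long <- prizes_long(sample_laureates)
print(long)
```

Keeping the laureate id on every row makes it straightforward to join back to a per-person table later.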
【Answer 2】:

You could also use unnest from tidyr.

 devtools::install_github("hadley/tidyr")
 library(tidyr)

With your new dataset, this seems to work:

  res1 <-unnest(lapply(nobels, function(x)
         as.data.frame.list(rapply(x,unlist), stringsAsFactors=FALSE)))

  str(res1)
  #Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  19 variables:
  #$ id                         : chr  "359" "604"
  #$ firstname                  : chr  "Axel Hugo Theodor" "Erik Axel"
  #$ surname                    : chr  "Theorell" "Karlfeldt"
  #$ born                       : chr  "1903-07-06" "1864-07-20"
  #$ died                       : chr  "1982-08-15" "1931-04-08"
  #$ bornCountry                : chr  "Sweden" "Sweden"
  #$ bornCountryCode            : chr  "SE" "SE"
  #$ bornCity                   : chr  "Linköping" "Karlbo"
  #$ diedCountry                : chr  "Sweden" "Sweden"
  #$ diedCountryCode            : chr  "SE" "SE"
  #$ diedCity                   : chr  "Stockholm" "Stockholm"
  #$ gender                     : chr  "male" "male"
  #$ prizes.year                : chr  "1955" "1931"
  #$ prizes.category            : chr  "medicine" "literature"
  #$ prizes.share               : chr  "1" "1"
  #$ prizes.motivation          : chr  "\"for his discoveries concerning the          nature and mode of action of oxidation enzymes\"" "\"The poetry of Erik Axel   Karlfeldt\""
  #$ prizes.affiliations.name   : chr  "Karolinska Institutet, Nobel Medical    Institute" NA
  #$ prizes.affiliations.city   : chr  "Stockholm" NA
  #$ prizes.affiliations.country: chr  "Sweden" NA
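
The same flatten-then-bind idea can also be sketched in base R, for cases where the development version of tidyr is not at hand. This is an approximation under stated assumptions, not a reimplementation of unnest: `flatten_record`, `bind_fill` and `nobels_small` are hypothetical names, and the sample data is a trimmed-down version of the records in the question.

```r
# Flatten one nested record into a one-row data frame. unlist() builds
# dotted names such as "prizes.year" and silently drops empty lists.
flatten_record <- function(rec) {
  flat <- unlist(rec)
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# Row-bind data frames whose columns differ, filling the gaps with NA.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA_character_
    d[all_cols]
  })
  do.call(rbind, filled)
}

nobels_small <- list(
  list(id = "359", surname = "Theorell", diedCountry = "Sweden",
       prizes = list(list(year = "1955", category = "medicine",
                          affiliations = list(list(
                            name = "Karolinska Institutet, Nobel Medical Institute",
                            city = "Stockholm", country = "Sweden"))))),
  # No diedCountry field, as in the question's second record.
  list(id = "774", surname = "Axel",
       prizes = list(list(year = "2004", category = "medicine",
                          affiliations = list(list(
                            name = "Columbia University",
                            city = "New York, NY", country = "USA"))))),
  # Empty affiliation, as in the record that triggered the error.
  list(id = "604", surname = "Karlfeldt", diedCountry = "Sweden",
       prizes = list(list(year = "1931", category = "literature",
                          affiliations = list(list()))))
)

res_base <- bind_fill(lapply(nobels_small, flatten_record))
str(res_base)
```

One caveat: a laureate with several prizes produces repeated names such as prizes.year, which as.data.frame() makes unique by appending suffixes, so those records end up wider rather than longer.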

【讨论】:

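  • Thanks for taking the time to make up for my omissions and for introducing me to unnest in the tidyr package. As you can see above, I still have a problem with some of the data, which I have reproduced in my edit.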