[Posted]: 2015-11-23 19:20:53
[Question]:
This is my first time here, so I hope I don't break anything... I have a list of character vectors:
Browse[2]> head(str(mylist))
List of 33
$ : chr [1:33] "0001" "space" "28" "night_club" ...
$ : chr [1:33] "0002" "concert" "28" "night_club" ...
$ : chr [1:31] "0003" "night_club" "24" "martial_arts" ...
$ : chr [1:31] "0004" "stage" "24" "basketball" ...
$ : chr [1:43] "0005" "night_club" "16" "concert" ...
$ : chr [1:43] "0006" "night_club" "16" "concert" ...
$ : chr [1:39] "0007" "night_club" "22" "concert" ...
$ : chr [1:39] "0008" "night_club" "22" "concert" ...
$ : chr [1:31] "0009" "night_club" "46" "martial_arts" ...
$ : chr [1:31] "0010" "night_club" "46" "martial_arts" ...
$ : chr [1:41] "0011" "night_club" "17" "martial_arts" ...
$ : chr [1:41] "0012" "night_club" "17" "martial_arts" ...
$ : chr [1:29] "0013" "concert" "23" "night_club" ...
$ : chr [1:29] "0014" "concert" "23" "night_club" ...
$ : chr [1:25] "0015" "night_club" "26" "concert" ...
$ : chr [1:31] "0016" "night_club" "42" "concert" ...
$ : chr [1:31] "0017" "night_club" "42" "concert" ...
$ : chr [1:31] "0018" "night_club" "25" "wrestling" ...
$ : chr [1:31] "0019" "night_club" "25" "wrestling" ...
$ : chr [1:33] "0020" "night_club" "46" "wrestling" ...
$ : chr [1:33] "0021" "night_club" "46" "wrestling" ...
$ : chr [1:41] "0022" "concert" "21" "stage" ...
$ : chr [1:41] "0023" "concert" "21" "stage" ...
$ : chr [1:55] "0024" "basketball" "8" "concert" ...
$ : chr [1:55] "0025" "basketball" "8" "concert" ...
$ : chr [1:37] "0026" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0027" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0028" "night_club" "32" "business_meeting" ...
$ : chr [1:37] "0029" "night_club" "32" "business_meeting" ...
$ : chr [1:15] "0030" "night_club" "59" "stage" ...
$ : chr [1:37] "0031" "stage" "12" "night_club" ...
$ : chr [1:37] "0032" "stage" "12" "night_club" ...
$ : chr [1:33] "0033" "night_club" "23" "portrait" ...
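I want to convert this list into a wide-format data frame. The first column should hold the first element of each inner vector ("0001", "0002", and so on), and there should be one column for every category that occurs in the file: "space", "night_club", "concert", "martial_arts", "wrestling", etc. The data frame will therefore be very wide: each row starts with an id (0001, 0002, 0003, ...), and for each row, if a category is present for that id, its column is filled with the value that follows the category in the list (for example, "space" -> 28 in the first row). Categories that are absent for an id get 0, as in the expected output below.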
I tried to build a normalized (long) data frame with a loop and then convert it to wide format, but this will scale badly as the data grows:
for (file in files) {  # iterate over files in folder
  mylist <- strsplit(readLines(file), ":")
  #close(mylist)
  for (elem in mylist) {
    dataframe <- data.frame(frameid = numeric(), category = character(), nrow = length(unlist(elem)))
    frameid <- rep.int(elem[[1]], length(elem)-1)
    categories <- elem[-1:-1]
    dataframe$frameid <- frameid
    dataframe$category <- categories
  }
}
Reproducible example of input and output. Input:
list(c("0001", "space", "28", "night_club", "25"), c("0002",
"concert", "28", "night_club", "26"), c("0003", "night_club",
"24", "martial_arts", "27"), c("0004", "stage", "24", "basketball",
"30"))
Output:
Dataframe
frameid, cat_space, cat_night_club, cat_concert, cat_martial_arts, cat_stage, cat_basketball
0001, 28, 25, 0, 0, 0, 0
0002, 0, 26, 28, 0, 0, 0
0003, 0, 24, 0, 27, 0, 0
0004, 0, 0, 0, 0, 24, 30
[Comments]:
-
I have updated the question with a dput object of the input, as you requested.
-
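For a general discussion of converting a list to a data frame, I suggest you look at this article and this solution. This is a fairly common task and has been widely discussed on SO. The whole philosophy boils down to assembling your list elements one by one into a data.frame.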
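To make the "assemble element by element" idea concrete, here is a minimal base R sketch run on the dput input from the question. It is only one possible approach, not the method from the linked posts. It assumes every vector has the shape id, category, value, category, value, ... with at least one pair, and the names to_long, long, wide_mat and wide are made up for illustration.

```r
# Input copied from the question's dput() output.
mylist <- list(
  c("0001", "space", "28", "night_club", "25"),
  c("0002", "concert", "28", "night_club", "26"),
  c("0003", "night_club", "24", "martial_arts", "27"),
  c("0004", "stage", "24", "basketball", "30")
)

# Step 1: turn one element into a long (normalized) data frame.
# Assumption: elem is an id followed by alternating category/value pairs.
to_long <- function(elem) {
  pairs <- elem[-1]
  odd <- seq(1, length(pairs), by = 2)   # positions of the category names
  data.frame(
    frameid  = elem[1],
    category = pairs[odd],
    value    = as.numeric(pairs[odd + 1]),
    stringsAsFactors = FALSE
  )
}

# Bind all per-element frames in a single call instead of growing one in a loop.
long <- do.call(rbind, lapply(mylist, to_long))

# Step 2: pivot to wide format by filling a zero matrix via (row, column) indices.
ids  <- unique(long$frameid)
cats <- unique(long$category)
wide_mat <- matrix(0, nrow = length(ids), ncol = length(cats),
                   dimnames = list(ids, paste0("cat_", cats)))
wide_mat[cbind(match(long$frameid, ids), match(long$category, cats))] <- long$value

# frameid stays a character column, so the leading zeros are preserved.
wide <- data.frame(frameid = ids, wide_mat,
                   row.names = NULL, stringsAsFactors = FALSE)
print(wide)
```

Columns appear in order of first occurrence of each category, and absent categories stay at 0. If a category is repeated within one id, the last value overwrites the earlier ones, so aggregate first if that can happen in your files.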