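Combining data frame rows: answers

【Question title】: combining data frames rows
【Posted】: 2012-07-10 15:29:11
【Question description】:

I have a data frame with two Id variables and one name variable. The combinations of these variables occur an unequal number of times.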

## dput'ed data.frame
df <- structure(list(V1 = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("A", 
"B", "C", "D", "E"), class = "factor"), V2 = c(1L, 2L, 3L, 1L, 
2L, 3L, 2L, 2L, 1L, 3L, 1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L
), V3 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 
2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L), .Label = c("test1", "test2", 
"test3"), class = "factor")), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-20L))
> df
   V1 V2    V3
1   A  1 test1
2   B  2 test2
3   C  3 test3
4   D  1 test1
5   E  2 test2
6   A  3 test3
7   B  2 test2
8   C  2 test2
9   D  1 test1
10  E  3 test3
11  A  1 test1
12  B  2 test2
13  C  1 test1
14  D  3 test3
15  E  2 test2
16  A  1 test1
17  B  1 test1
18  C  3 test3
19  D  1 test1
20  E  1 test1

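I would like to combine the rows so that the result has only one entry per V1, with comma-separated lists of values as the second and third variables. Like this: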

  f    V2            V3
1 A    1 ,3 ,1 ,1    test1 ,test3 ,test1 ,test1
2 B    2 ,2 ,2 ,1    test2 ,test2 ,test2 ,test1
3 C    3 ,2 ,1 ,3    test3 ,test2 ,test1 ,test3
4 D    1 ,1 ,3 ,1    test1 ,test1 ,test3 ,test1
5 E    2 ,3 ,2 ,1    test2 ,test3 ,test2 ,test1

I have tried the code below, which works fine, if a bit slow. Any suggestions for a faster solution?

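# Build one row per level of V1, collapsing V2 and V3 into " ,"-separated strings
df = lapply(levels(df$V1), function(f){
  cbind(f,
        paste(df$V2[df$V1==f],collapse=" ,"),
        paste(df$V3[df$V1==f],collapse=" ,"))
})
# Stack the per-level rows into one data frame (note: this overwrites df)
df = as.data.frame(do.call(rbind, df))
df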

Edit: corrected the dput(df)

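【Comments】:

It looks like you dput'ed the result you want rather than the data to be transformed.
Sorry about that. It should be fixed now.
Is speed the only thing you are after? Collapsing all of these values into single strings also limits what you can do with the data afterwards. Using aggregate avoids this: each column of the output is a list, from which you can easily get back to the original data format.
I hadn't realised that. Thanks, being able to recover the data is handy.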
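
To illustrate the point about list columns, here is a minimal sketch (not code from the thread). It assumes the aggregate call with simplify=FALSE and rebuilds the long format with rep and unlist. The names agg and long are made up for illustration, and df is re-created from plain vectors holding the same values as the question's data.

```r
# Same values as the question's data, built from plain vectors (no factors).
df <- data.frame(
  V1 = rep(c("A", "B", "C", "D", "E"), times = 4),
  V2 = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L,
         1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)
df$V3 <- paste0("test", df$V2)  # in this data V3 is always "test" followed by V2

# One row per V1; V2 and V3 become list columns holding the grouped values.
agg <- aggregate(df[-1], by = list(V1 = df$V1), FUN = c, simplify = FALSE)

# Expand the list columns again: repeat each key once per stored value.
long <- data.frame(
  V1 = rep(agg$V1, times = lengths(agg$V2)),
  V2 = unlist(agg$V2, use.names = FALSE),
  V3 = unlist(agg$V3, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

The rows of long come back grouped by V1 rather than in the original row order, so this recovers the values, not the original ordering.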

Tags: r dataframe lapply

【Solution 1】:

Make sure V3 (and any other factor variable) is converted to character with as.character, then use aggregate:

df$V3 = as.character(df$V3)
aggregate(df[-1], by=list(df$V1), c, simplify=FALSE)
#   Group.1         V2                         V3
# 1       A 1, 3, 1, 1 test1, test3, test1, test1
# 2       B 2, 2, 2, 1 test2, test2, test2, test1
# 3       C 3, 2, 1, 3 test3, test2, test1, test3
# 4       D 1, 1, 3, 1 test1, test1, test3, test1
# 5       E 2, 3, 2, 1 test2, test3, test2, test1

【Discussion】:
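
If the comma-separated strings from the question are what is actually needed, rather than list columns, a variation on the aggregate call might look like the sketch below. This is an assumption about the goal, not part of the answer above: it passes paste as the function and forwards collapse through aggregate's `...` argument. The name res is illustrative, and df is re-created from plain vectors with the same values as the question's data.

```r
# Same values as the question's data, built from plain vectors (no factors).
df <- data.frame(
  V1 = rep(c("A", "B", "C", "D", "E"), times = 4),
  V2 = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L,
         1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)
df$V3 <- paste0("test", df$V2)  # in this data V3 is always "test" followed by V2

# collapse = " ," is handed on to paste, so each group becomes a single string.
res <- aggregate(df[-1], by = list(V1 = df$V1), FUN = paste, collapse = " ,")
res
```

Because every group collapses to a length-one string, the result has ordinary character columns instead of list columns.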

【Solution 2】:

do.call("rbind", lapply(split(df[, 2:3], df[,1]), function(x) sapply(x, paste, collapse=",")))
  V2        V3                       
A "1,3,1,1" "test1,test3,test1,test1"
B "2,2,2,1" "test2,test2,test2,test1"
C "3,2,1,3" "test3,test2,test1,test3"
D "1,1,3,1" "test1,test1,test3,test1"
E "2,3,2,1" "test2,test3,test2,test1"

【Discussion】:
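
The split/lapply/rbind approach returns a character matrix with the V1 values as row names. If a data frame like the one in the question is wanted, one possible follow-up step is sketched below. The names m and out are illustrative, and df is re-created from plain vectors with the same values as the question's data.

```r
# Same values as the question's data, built from plain vectors (no factors).
df <- data.frame(
  V1 = rep(c("A", "B", "C", "D", "E"), times = 4),
  V2 = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L,
         1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)
df$V3 <- paste0("test", df$V2)  # in this data V3 is always "test" followed by V2

# Character matrix: one row per V1 value, one column per collapsed variable.
m <- do.call(rbind, lapply(split(df[, 2:3], df[, 1]),
                           function(x) sapply(x, paste, collapse = ",")))

# Move the row names into a regular column and drop them from the result.
out <- data.frame(V1 = rownames(m), m, stringsAsFactors = FALSE)
rownames(out) <- NULL
out
```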