How to reshape data frame in R or Excel? [duplicate]

【Question title】: How to reshape data frame in R or Excel? [duplicate]
【Posted】: 2015-06-16 16:50:28
【Question】:

Here is the code to generate a sample dataset:

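set.seed(0)   # sample() changed in R 3.6.0, so newer R versions draw different numbers
practice <- matrix(sample(1:100, 20), ncol = 2)   # 10 x 2 matrix of random integers
data <- as.data.frame(practice)
data <- cbind(lob = sprintf("objective%d", rep(1:2, each = 5)), data)
data <- cbind(student = sprintf("student%d", rep(1:5, 2)), data)
names(data) <- c("student", "learning objective", "attempt", "score")
data <- data[-8, ]   # drop row 8 so that student3 has no objective2 record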

The data looks like this:

    student learning objective attempt score
1  student1         objective1      90     6
2  student2         objective1      27    19
3  student3         objective1      37    16
4  student4         objective1      56    60
5  student5         objective1      88    34
6  student1         objective2      20    66
7  student2         objective2      85    42
9  student4         objective2      61    82
10 student5         objective2      58    31
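Because the random draws depend on the R version, set.seed(0) may not reproduce the numbers above in a current R. A minimal sketch that types the printed rows in directly instead (values copied from the table above; the row names become 1 to 9 rather than the original ones):

```r
# Re-create the sample data shown above as a literal data frame,
# so later experiments do not depend on R's random number generator.
data <- data.frame(
  student = c("student1", "student2", "student3", "student4", "student5",
              "student1", "student2", "student4", "student5"),
  `learning objective` = rep(c("objective1", "objective2"), c(5, 4)),
  attempt = c(90, 27, 37, 56, 88, 20, 85, 61, 58),
  score   = c(6, 19, 16, 60, 34, 66, 42, 82, 31),
  check.names = FALSE,        # keep the space in "learning objective"
  stringsAsFactors = FALSE
)
str(data)
```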

What I want is:

    student       objective1         objective2 
                 attempt  score     attempt score
1  student1         90     6          20      66
2  student2         27    19          85      42
3  student3         ...                0       0
4  student4         ...                  ...
5  student5         ...                  ...

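There are 70 learning objectives, so copying and pasting the attempt and score columns by hand would be tedious. Is there a better way to clean up the data?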

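R: I tried the melt function in R to build the new data, but it didn't work well. Some students have no score for an objective and their names are not listed at all (student3 in this example), so I can't just cbind the scores.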
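One way around the missing student3 row, sketched here in base R as an assumption rather than the asker's code: build every student/objective combination first, then left-join the observed rows, so missing records become explicit NA (or 0) instead of shifting the columns out of line. The small table below is hypothetical.

```r
# Hypothetical long-format table: student3 has no objective2 record.
long <- data.frame(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90, 27, 37, 20, 85),
  score     = c(6, 19, 16, 66, 42),
  stringsAsFactors = FALSE
)

# Every student x objective combination, whether or not it was observed.
grid <- expand.grid(student   = unique(long$student),
                    objective = unique(long$objective),
                    stringsAsFactors = FALSE)

# Left join: combinations missing from `long` get NA in attempt and score.
complete <- merge(grid, long, by = c("student", "objective"), all.x = TRUE)

# Show missing records as 0, as in the desired output.
complete[is.na(complete)] <- 0
complete <- complete[order(complete$objective, complete$student), ]
complete
```

After this step every student has exactly one row per objective, so per-objective slices line up and can be bound side by side.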

Excel: with 70 learning objectives and some names missing, I have to look up the matching rows for every one of the 70 objectives with VLOOKUP:

=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,5,0)
=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,6,0)
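The lookup that these VLOOKUP formulas repeat by hand can be written once in R with match(), which, like VLOOKUP with its last argument set to 0, looks for an exact match and returns NA when the student is absent. A minimal sketch on a small hypothetical table; the helper lookup() is illustrative and not part of the original post.

```r
# Hypothetical long-format table: student3 has no objective2 record.
long <- data.frame(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90, 27, 37, 20, 85),
  score     = c(6, 19, 16, 66, 42),
  stringsAsFactors = FALSE
)
students <- sort(unique(long$student))

# Hypothetical helper: like VLOOKUP(key, range, col, 0), match() finds the
# first exact match and returns NA when the key is absent.
lookup <- function(keys, objective, column) {
  rows <- long[long$objective == objective, ]
  rows[[column]][match(keys, rows$student)]
}

# One pair of columns per objective, built in a loop instead of by copy-paste.
wide <- data.frame(student = students, stringsAsFactors = FALSE)
for (obj in unique(long$objective)) {
  for (col in c("attempt", "score")) {
    wide[[paste(col, obj, sep = "_")]] <- lookup(students, obj, col)
  }
}
wide
```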

Is there a better way?

【Question comments】:

Tags: r excel

【Solution 1】:

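We can use the development version of data.table, i.e. v1.9.5, whose dcast can take multiple value.var columns and reshape from "long" to "wide" format. Installation instructions are here.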

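 library(data.table)   # v1.9.5+: dcast accepts several value.var columns
 names(data)[2] <- 'objective'   # rename "learning objective" so it can be used in the formula
 dcast(setDT(data), student~objective, value.var=c('attempt', 'score'))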
 #    student attempt_objective1 attempt_objective2 score_objective1
 #1: student1                 90                 20                6
 #2: student2                 27                 85               19
 #3: student3                 37                 96               16
 #4: student4                 56                 61               60
 #5: student5                 88                 58               34
 #    score_objective2
 #1:               66
 #2:               42
 #3:               87
 #4:               82
 #5:               31
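With a data.table release whose dcast accepts several value.var columns, combinations with no record come back as NA by default; dcast's fill argument can supply 0 instead, matching the desired table in the question. A sketch assuming data.table is installed, on a small hypothetical table:

```r
library(data.table)

# Hypothetical long-format table: student3 has no objective2 record.
long <- data.table(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90, 27, 37, 20, 85),
  score     = c(6, 19, 16, 66, 42)
)

# One row per student; one attempt_* and score_* column per objective.
# fill = 0 writes 0 instead of NA where a student has no record.
wide <- dcast(long, student ~ objective,
              value.var = c("attempt", "score"), fill = 0)
wide
```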

Or, using reshape from base R:

 # assumes the second column was already renamed: names(data)[2] <- 'objective'
 reshape(data, idvar='student', timevar='objective', direction='wide')
 #  student attempt.objective1 score.objective1 attempt.objective2
 #  1 student1                 90                6                 20
 #  2 student2                 27               19                 85
 #  3 student3                 37               16                 96
 #  4 student4                 56               60                 61
 #  5 student5                 88               34                 58
 #    score.objective2
 #  1               66
 #  2               42
 #  3               87
 #  4               82
 #  5               31
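reshape() does not need to be told how many objectives there are: timevar='objective' produces one attempt.* and score.* pair per distinct objective, so the same call should cover all 70. A student with no record for an objective (like student3 for objective2 in the question) gets NA. A small sketch on a hypothetical table that also turns those NAs into 0, as in the desired output:

```r
# Hypothetical long-format table: student3 has no objective2 record.
long <- data.frame(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90, 27, 37, 20, 85),
  score     = c(6, 19, 16, 66, 42),
  stringsAsFactors = FALSE
)

# One attempt.* / score.* pair per distinct objective, however many there are.
wide <- reshape(long, idvar = "student", timevar = "objective",
                direction = "wide")

# Missing records come back as NA; show them as 0.
wide[is.na(wide)] <- 0
wide
```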

【Comments】:

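Thanks, but both lines of code seem to give errors. 1: > names(data)[2] dcast(setDT(data), student~objective, value.var=c('attempt', 'score')) Error in .subset2(x, i, exact = exact) : subscript out of bounds
@SongTianyang Are you using the development version of data.table?
@SongTianyang I added a base R version, which should work if you don't have the devel version of data.table.
Thanks! I downloaded the package, but the default version is 1.9.4.
The reshape one works! Thanks a lot!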