Tidy data Melt and Cast: answers

【Question Title】: Tidy data Melt and Cast
【Posted】: 2015-07-08 15:58:13
【Question Description】:

In Wickham's Tidy Data pdf, he has an example that takes data from messy to tidy.

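I'd like to know: where is the code?

For example, what code takes you from

Table 1: Typical presentation dataset.

to

Table 3: The same data as in Table 1 but with variables in columns and observations in rows.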

Maybe melt or cast? But from http://www.statmethods.net/management/reshape.html I can't see how to do it.

(Note to self: need this for GDPpercapita...)

【Question Discussion】:

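It looks to me like "Table 1" is a matrix, so you can use library(reshape2); melt(table1) (if your dataset is called "table1").
@Molx, that's not the most intuitive place to look (or the most intuitive search term), since they are different packages (although one contains many wrappers for "reshape2" methods). The "tidyr" vignette focuses only on data.frames, while the "reshape2" package also handles other data types.
@AnandaMahto You're right. Because of its title I actually thought the paper was about tidyr, and didn't notice it was about reshape2.
The paper predates tidyr by several years. I'd still encourage the OP to look at the tidyr vignette, which covers many of the same principles and shows the accompanying tidyr code.
@Gregor, but it's still important to recognize that "tidyr" does less than "reshape2" and is more restricted in the types of data it accepts as input.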

Tags: r reshape reshape2 melt

【Solution 1】:

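The answer depends on your data structure. In the paper you linked to, Hadley is writing about the "reshape" and "reshape2" packages.

It's ambiguous what the data structure in "Table 1" actually is. From the description, it sounds like a matrix whose dimnames hold the names (like the one I show in mymat). In that case, a simple melt will do: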

library(reshape2)
melt(mymat)
#           Var1       Var2 value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1
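Since the question also asks about cast, here is a minimal sketch of the round trip: melt the matrix into long form, then cast it back to wide form with reshape2's dcast(). It assumes the dash in the paper marks a missing value and stores it as NA so the values stay numeric; the column names "name" and "treatment" and the object names long and wide are illustrative choices, not from the paper.

```r
library(reshape2)

# Table 1 as a numeric matrix. Assumption: the dash in the paper marks a
# missing value, so it is stored here as NA rather than as a string.
mymat <- matrix(c(NA, 16, 3, 2, 11, 1), nrow = 3,
                dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"),
                                c("treatmenta", "treatmentb")))

# melt: one row per (person, treatment) cell; varnames replaces Var1/Var2
long <- melt(mymat, varnames = c("name", "treatment"))

# cast: spread the treatment column back out into one column per treatment
wide <- dcast(long, name ~ treatment, value.var = "value")
wide
```

The formula passed to dcast() reads "rows ~ columns": each level of name becomes a row and each level of treatment becomes a column.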

If it's not a matrix but a data.frame with row.names, you can still use the matrix method via something like melt(as.matrix(mymat)).
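A minimal sketch of that data.frame case, assuming a hypothetical df_rn whose people are stored as row.names and whose missing value is NA. as.matrix() carries the row.names over as dimnames, so melt() dispatches to its matrix method.

```r
library(reshape2)

# Hypothetical data.frame version of Table 1: people stored as row.names
df_rn <- data.frame(treatmenta = c(NA, 16, 3),
                    treatmentb = c(2, 11, 1),
                    row.names = c("John Smith", "Jane Doe", "Mary Johnson"))

# as.matrix() keeps row.names and column names as dimnames,
# so melt() uses the matrix method rather than the data.frame method
long_rn <- melt(as.matrix(df_rn), varnames = c("name", "treatment"))
long_rn
```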

On the other hand, if the "names" are a column in the data.frame (as they are in the "tidyr" vignette), you need to specify id.vars or measure.vars so that melt knows how to handle those columns.

melt(mydf, id.vars = "name")
#           name   variable value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1
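A minimal sketch showing that specifying measure.vars instead of id.vars gives the same values, and how variable.name, value.name, and na.rm adjust the output. It uses a numeric copy of mydf with NA in place of the dash (an assumption); in the original mydf, treatmenta is character while treatmentb is integer, so the combined value column cannot stay numeric.

```r
library(reshape2)

# Numeric copy of mydf; assumption: the dash means "missing", stored as NA
mydf <- data.frame(name = c("John Smith", "Jane Doe", "Mary Johnson"),
                   treatmenta = c(NA, 16, 3),
                   treatmentb = c(2, 11, 1),
                   stringsAsFactors = FALSE)

# Name the identifier column; every other column is melted
by_id <- melt(mydf, id.vars = "name")

# Or name the measured columns; the remaining column becomes the id
by_measure <- melt(mydf, measure.vars = c("treatmenta", "treatmentb"),
                   variable.name = "treatment", value.name = "result")

# na.rm = TRUE drops the row whose value is NA (John Smith, treatmenta)
complete <- melt(mydf, id.vars = "name", na.rm = TRUE)
```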

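The new kid on the block is "tidyr". The "tidyr" package works with data.frames, since it is often used together with dplyr. I won't reproduce the "tidyr" code here, because the vignette already covers it well.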
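For a quick comparison anyway, here is a minimal sketch assuming tidyr 1.0.0 or later, where pivot_longer() supersedes gather(). It is not code from the vignette; names_to and values_to are set explicitly because the default names_to ("name") would clash with the existing name column. As above, the dash is assumed to mean a missing value and is stored as NA.

```r
library(tidyr)

# Names stored in a column, as in the tidyr vignette (NA = missing value)
mydf <- data.frame(name = c("John Smith", "Jane Doe", "Mary Johnson"),
                   treatmenta = c(NA, 16, 3),
                   treatmentb = c(2, 11, 1))

# Gather the two treatment columns into (treatment, result) pairs
long_t <- pivot_longer(mydf, cols = c(treatmenta, treatmentb),
                       names_to = "treatment", values_to = "result")
long_t
```

The result is a tibble, and its rows come out grouped by person (both treatments for John Smith first) rather than by treatment as with melt().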

Sample data:

mymat <- structure(c("—", "16", "3", " 2", "11", " 1"), .Dim = c(3L, 
    2L), .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"
    ), c("treatmenta", "treatmentb")))

mydf <- structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Jane Doe", 
    "John Smith", "Mary Johnson"), class = "factor"), treatmenta = c("—", 
    "16", "3"), treatmentb = c(2L, 11L, 1L)), .Names = c("name", 
    "treatmenta", "treatmentb"), row.names = c(NA, 3L), class = "data.frame")

【Discussion】: