Transpose by column in R using plyr

[Question title]: Transpose by column in R using plyr
[Posted]: 2014-04-04 22:12:03
[Question]:

This is my data.frame, named test:

    strain  variable    value       L1
1   AB1            n    582.00000   1
2   AB4            n    12.00000    1
3   CB4852         n    375.00000   1
4   CB4853         n    113.00000   1
5   CB4854         n    160.00000   1

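This is a melted data.frame in which L1 runs from 1 to 30, and each L1 has 78 variables and 96 strains ... 219,552 rows in total.

What I would like to do is take this data.frame (test) and create L1 (30) x variable (78) new data.frames with the following orientation:

L1_variable (this would be the name of a df)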

               strain1   strain2 .... strainN
    row.name     value     value        value
    variable x   value     value        value

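So a new df is created for each combination of L1 and variable, holding that variable's values with one column per strain.

These data frames will then be passed to a function.

I think I need to write a function and then use ddply on my df test, but I don't know how to implement it.

Thanks for any help.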

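[Comments]:

Have you looked at reshape2? cran.r-project.org/web/packages/reshape2/reshape2.pdf
Do you need them as separate data frames, rather than one large data frame that you can subset as needed?
I have looked at reshape2, yes. Ideally I would have separate dfs, but if you have a solution that gives one large df with colnames = strains and row names = unique(variables), with the values filling the cells, I'd love to see it.
Rather than making so many data.frames, keep them in a list; i.e. split(df, interaction(df$L1, df$variable, drop = T)) will output a list in which each element holds a different L1 & variable combination. You can then lapply a reshaping function over it. You could also consider something like xtabs(value ~ variable + strain + L1, df), which builds a 3D array. You can then subset by L1 & variable, e.g. my3darr["n", , 23] (i.e. variable = "n" & L1 = 23 & all strains).
@user2813055 See my answer for a solution that gives one large df with colnames = strains and row names = unique(variables).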
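
The list and 3D-array suggestions from the comments can be sketched roughly as follows. This is only an illustration: the small `df` is a hand-made subset of the sample values used in the answer, and the names `pieces`, `wide` and `arr` are made up for this sketch.

```r
# Small melted example in the same shape as `test`
# (a subset of the answer's sample values, rebuilt here for illustration)
df <- data.frame(
  strain   = rep(c("AB1", "AB4", "CB4852"), times = 4),
  variable = rep(rep(c("n", "m"), each = 3), times = 2),
  value    = c(582, 12, 375, 753, 92, 115, 462, 72, 305, 142, 132, 75),
  L1       = rep(1:2, each = 6),
  stringsAsFactors = FALSE
)

# One list element per L1 x variable combination; names look like "1.n"
pieces <- split(df, interaction(df$L1, df$variable, drop = TRUE))

# Reshape each piece into a one-row data frame with one column per strain
wide <- lapply(pieces, function(d) {
  as.data.frame(as.list(setNames(d$value, d$strain)))
})
wide[["1.n"]]

# Alternative: a 3D array indexed by variable, strain and L1
arr <- xtabs(value ~ variable + strain + L1, data = df)
arr["n", , "2"]  # variable "n", L1 = 2, all strains
```

Indexing the array by the character label `"2"` rather than by position avoids depending on the order of the L1 levels.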

Tags: r dataframe plyr transpose

[Solution 1]:

There is no need to create separate data frames. You can reshape the data frame as follows:

# creating sample data (extending your sample in order to illustrate the method)
df <- structure(list(strain = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("AB1", "AB4", "CB4852", "CB4853", "CB4854"), class = "factor"), variable = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("m", "n"), class = "factor"), value = c(582, 12, 375, 113, 160, 753, 92, 115, 163, 189, 462, 72, 305, 183, 360, 142, 132, 75, 308, 216), L1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("strain", "variable", "value", "L1"), class = "data.frame", row.names = c(NA, -20L))

# transforming the data with the reshape2 package
library(reshape2)
df2 <- dcast(df, L1 + variable ~ strain, value.var="value")

# creating a variable with unique identifiers
df2$L1var <- paste0(df2$L1, df2$variable)

This produces the following data frame:

df2 <- structure(list(L1 = c(1L, 1L, 2L, 2L), variable = structure(c(1L, 2L, 1L, 2L), .Label = c("m", "n"), class = "factor"), AB1 = c(753, 582, 142, 462), AB4 = c(92, 12, 132, 72), CB4852 = c(115, 375, 75, 305), CB4853 = c(163, 113, 308, 183), CB4854 = c(189, 160, 216, 360), L1var = c("1m", "1n", "2m", "2n")), .Names = c("L1", "variable", "AB1", "AB4", "CB4852", "CB4853", "CB4854", "L1var"), row.names = c(NA, -4L), class = "data.frame")
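
Since the question asked for strains as column names and an identifier as row names, one possible follow-up is to move `L1var` into the row names and index by it. This is a minimal sketch, not part of the original answer; `df2` is rebuilt by hand from the output above so the block runs on its own, and `strain_cols` and `wide` are names made up for the illustration.

```r
# df2 as produced by dcast() above, rebuilt by hand so this block is self-contained
df2 <- data.frame(
  L1       = c(1L, 1L, 2L, 2L),
  variable = c("m", "n", "m", "n"),
  AB1      = c(753, 582, 142, 462),
  AB4      = c(92, 12, 132, 72),
  CB4852   = c(115, 375, 75, 305),
  CB4853   = c(163, 113, 308, 183),
  CB4854   = c(189, 160, 216, 360),
  L1var    = c("1m", "1n", "2m", "2n"),
  stringsAsFactors = FALSE
)

# Keep only the strain columns and move the identifier into the row names
strain_cols <- setdiff(names(df2), c("L1", "variable", "L1var"))
wide <- df2[, strain_cols]
rownames(wide) <- df2$L1var

# One L1/variable combination across all strains
wide["2n", ]

# A single cell: variable "m" at L1 = 1 for strain CB4853
wide["1m", "CB4853"]
```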

If you want a separate file for each unique identifier, you can split df2 like this:

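# split the data frame into a list of data frames
dfs <- split(df2, df2$L1var)

# save each data frame in the list to a separate file
# (dfs[[i]] extracts the data frame itself; dfs[i] would pass a one-element list)
invisible(lapply(seq_along(dfs), function(i) write.csv(dfs[[i]], file = paste0(names(dfs)[i], '.csv'))))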
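
The split-and-write step can also be checked end to end, for example by writing into a scratch directory and reading one file back. The sketch below assumes a trimmed, hand-built `df2` with two strain columns; the directory name `strain_tables` and the variables `out_dir`, `paths` and `back` are hypothetical.

```r
# Trimmed stand-in for df2 (two strains only), built by hand for illustration
df2 <- data.frame(
  L1       = c(1L, 1L, 2L, 2L),
  variable = c("m", "n", "m", "n"),
  AB1      = c(753, 582, 142, 462),
  AB4      = c(92, 12, 132, 72),
  L1var    = c("1m", "1n", "2m", "2n"),
  stringsAsFactors = FALSE
)

# One data frame per identifier
dfs <- split(df2, df2$L1var)

# Write into a scratch directory instead of the working directory
out_dir <- file.path(tempdir(), "strain_tables")
dir.create(out_dir, showWarnings = FALSE)

# vapply() returns the file paths, named by identifier
paths <- vapply(names(dfs), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(dfs[[nm]], file = path, row.names = FALSE)
  path
}, character(1))

# Read one file back to confirm the round trip
back <- read.csv(paths[["1n"]])
back
```

Passing `row.names = FALSE` keeps the original row index out of the files; drop it if that index is wanted.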

[Discussion]: