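【Posted】: 2017-03-18 10:17:37
【Question】:
I have a table of descriptive statistics created with stat.desc() from the pastecs package. The challenge is that I have to collect the statistics into a list column, and I then cannot unlist that column cleanly. I found the "R list to data frame" thread, but I have to create a temporary data.frame to make that approach work. The actual data I am working with is large, so creating a temporary data frame is not practical.
Here is my code:
[You will need the pastecs package. It is already loaded on my system.]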
dput(df)
structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L), .Label = c("A", "B", "C", "D"), class = "factor"), dt = c(60,
60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, NA, 68, 56,
NA, 60, 61, 63, 64, 63, 59)), .Names = c("group", "dt"), row.names = c(NA,
-24L), class = "data.frame")
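# Load the required packages
library(data.table)
library(pastecs)
# Convert to data.table by reference
setDT(df)
# Store one vector of descriptive statistics per group in a list column
df1 <- df[, .(newvar = list(stat.desc(dt))), by = group]
# Flatten the list column into a temporary data frame, one row per group
b <- data.frame(matrix(unlist(df1$newvar, use.names = TRUE), nrow = nrow(df1), byrow = TRUE), stringsAsFactors = FALSE)
names(b) <- names(df1$newvar[[1]])
# Replace the list column with the flattened statistics
df1$newvar <- NULL
df1 <- cbind(df1, b)
rm(b)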
Here b is the temporary table, and I am not comfortable with it.
Expected output:
structure(list(group = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor"), nbr.val = c(4, 8, 6, 4), nbr.null = c(0,
0, 0, 0), nbr.na = c(0, 0, 2, 0), min = c(59, 63, 56, 59), max = c(63,
71, 71, 64), range = c(4, 8, 15, 5), sum = c(242, 530, 383, 249
), median = c(60, 66, 64, 63), mean = c(60.5, 66.25, 63.8333333333333,
62.25), SE.mean = c(0.866025403784439, 0.881354477089505, 2.32975916733421,
1.10867789130417), CI.mean.0.95 = c(2.75607934655562, 2.08407217077572,
5.9888365969565, 3.5283078589307), var = c(3, 6.21428571428571,
32.5666666666667, 4.91666666666667), std.dev = c(1.73205080756888,
2.49284690951645, 5.70672118354022, 2.21735578260835), coef.var = c(0.0286289389680806,
0.0376278778794936, 0.0894003318570269, 0.0356201732145919)), .Names = c("group",
"nbr.val", "nbr.null", "nbr.na", "min", "max", "range", "sum",
"median", "mean", "SE.mean", "CI.mean.0.95", "var", "std.dev",
"coef.var"), row.names = c(NA, -4L), class = "data.frame")
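Sorry if this is too basic. I am looking for a faster approach, that is, one with no intermediate table, preferably a data.table-based solution.
Thank you for your time.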
【Discussion】:
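One possible direction, offered as a sketch rather than a definitive answer: data.table spreads a list returned in j into one column per element, so wrapping the result of stat.desc() in as.list() may avoid both the list column and the temporary table. This assumes stat.desc() returns a named numeric vector when given a numeric vector, which is what the names() call in the question relies on. The small data set below is the one from the question, rebuilt with rep() for brevity.

```r
library(data.table)
library(pastecs)

# Sample data from the question, rebuilt compactly
df <- data.table(
  group = factor(rep(c("A", "B", "C", "D"), times = c(4, 8, 8, 4))),
  dt = c(60, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
         71, 67, NA, 68, 56, NA, 60, 61, 63, 64, 63, 59)
)

# as.list() turns the named vector from stat.desc() into a named list;
# data.table then creates one column per list element, per group,
# so no intermediate data frame is needed.
res <- df[, as.list(stat.desc(dt)), by = group]
print(res)
```

The result is a data.table with one row per group and one column per statistic. If a plain data.frame is needed, as in the expected output, setDF(res) converts it by reference.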
Tags: r data.table