【Posted】: 2016-05-15 11:36:13
【Question】:
I have a table like this:
> head(dt2)
Weight Height Fitted interval limit value
1 65.6 174.0 71.91200 pred lwr 53.73165
2 80.7 193.5 91.63237 pred lwr 73.33198
3 72.6 186.5 84.55326 pred lwr 66.31751
4 78.8 187.2 85.26117 pred lwr 67.02004
5 74.8 181.5 79.49675 pred lwr 61.29244
6 86.4 184.0 82.02501 pred lwr 63.80652
I want it to look like this:
> head(reshape2::dcast(dt2,
Weight + Height + Fitted + interval ~ limit,
fun.aggregate = mean))
Weight Height Fitted interval lwr upr
1 42.0 153.4 51.07920 conf 49.15463 53.00376
2 42.0 153.4 51.07920 pred 32.82122 69.33717
3 43.2 160.0 57.75378 conf 56.35240 59.15516
4 43.2 160.0 57.75378 pred 39.54352 75.96404
5 44.8 149.5 47.13512 conf 44.87642 49.39382
6 44.8 149.5 47.13512 pred 28.83891 65.43133
How can I get the same result with tidyr::spread instead?
I am using:
> tidyr::spread(dt2, limit, value)
but I get this error:
Error: Duplicate identifiers for rows (1052, 1056), (238, 242), (1209, 1218), (395, 404), (839, 1170), (25, 356), (1173, 1203, 1215), (359, 389, 401), (1001, 1200), (187, 386), (906, 907), (92, 93), (930, 1144), (116, 330), (958, 1171), (144, 357), (902, 1018), (88, 204), (960, 1008), (146, 194), (1459, 1463), (645, 649), (1616, 1625), (802, 811), (1246, 1577), (432, 763), (1580, 1610, 1622), (766, 796, 808), (1408, 1607), (594, 793), (1313, 1314), (499, 500), (1337, 1551), (523, 737), (1365, 1578), (551, 764), (1309, 1425), (495, 611), (1367, 1415), (553, 601)
Here are 10 random rows:
> dt[sample(nrow(dt), 10), ]
Weight Height Fitted interval limit value
1253 52.2 162.5 60.28203 conf upr 61.51087
426 49.1 158.8 56.54022 pred upr 74.75756
1117 78.4 184.5 82.53066 conf lwr 80.98778
1171 85.9 166.4 64.22611 conf lwr 63.21254
948 61.4 177.8 75.75494 conf lwr 74.66393
384 90.9 172.7 70.59731 pred lwr 52.41828
289 75.9 172.7 70.59731 pred lwr 52.41828
3 44.8 149.5 47.13512 pred lwr 28.83891
774 87.3 182.9 80.91258 pred upr 99.12445
772 86.4 175.3 73.22669 pred upr 91.40919
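【Comments】:
-
Your example contains no upr in limit and no conf in interval, which means your expected result is not reproducible.
-
Why not keep it in long format and aggregate? See here for an example with base R, dplyr and data.table.
-
I have already done it with dcast, but I want to do it with tidyr just to learn. @mtoto That was only the head of my dataset; I will edit the question to give you a random sample for reproducibility.
-
This should work:
dt2 %>% group_by(interval, limit) %>% summarise_each(funs(mean)) %>% spread(limit, value, -c(1:3))
-
Summarising by interval and limit gives me only two rows.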
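-
A possible sketch of what the comments are circling around, offered as an assumption and not as a confirmed fix: spread fails when several rows share the same identifier columns and the same limit, and dcast with fun.aggregate = mean silently averages those rows. Grouping by every identifier column plus limit (not only interval and limit) and taking the mean first should mimic that before spreading. The toy data frame below uses made-up values, not the real dt2, and the names toy and wide are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (NOT the asker's dt2): the first two rows share
# the same identifiers and the same limit, which is what trips up spread().
toy <- data.frame(
  Weight   = c(42.0, 42.0, 42.0, 42.0, 42.0),
  Height   = c(153.4, 153.4, 153.4, 153.4, 153.4),
  Fitted   = c(51.0, 51.0, 51.0, 51.0, 51.0),
  interval = c("pred", "pred", "pred", "conf", "conf"),
  limit    = c("lwr", "lwr", "upr", "lwr", "upr"),
  value    = c(32, 34, 69, 49, 53)
)

wide <- toy %>%
  # Group by ALL identifier columns plus the key column, so each
  # identifier/limit combination collapses to a single row.
  group_by(Weight, Height, Fitted, interval, limit) %>%
  summarise(value = mean(value)) %>%
  ungroup() %>%
  # With duplicates averaged away, spread() has unique identifiers.
  spread(limit, value)

print(wide)
```

On the real data the same pipeline would start from dt2 instead of toy. Whether averaging duplicates is appropriate depends on why the duplicates exist in the first place, so it is worth inspecting them before collapsing.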