【发布时间】:2014-02-21 21:46:35
【问题描述】:
是否可以仅针对拆分变量的某些值返回 ddply 结果?例如,使用数据框example:
example <- structure(list(shape = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("circle", "square", "triangle"
), class = "factor"), property = structure(c(1L, 3L, 2L, 1L,
2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L), .Label = c("color",
"intensity", "size"), class = "factor"), value = structure(c(5L,
2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L, 4L, 3L, 6L, 5L), .Label = c("3",
"5", "6", "7", "blue", "green", "red"), class = "factor")), .Names = c("shape",
"property", "value"), class = "data.frame", row.names = c(NA,
-14L))
看起来像这样
shape property value
1 circle color blue
2 circle size 5
3 circle intensity 3
4 circle color blue
5 square intensity 7
6 square size 3
7 square color blue
8 square color green
9 square color green
10 triangle color red
11 triangle intensity 7
12 triangle size 6
13 triangle color green
14 triangle color blue
我想返回一个数据框,其中包含具有某种颜色的每个形状的数量,如下所示:
shape property blue green red
1 circle color 2 0 0
2 square color 1 2 0
3 triangle color 1 1 1
但是,我似乎无法让它正确返回!我已经使用这样的方式获得了一部分:
ColorSummary <- ddply(example,.(shape,property="color"), function(example) summary(example$value))
但这会返回一个数据框,其中包含所有其他唯一 value 的列(来自属性 size 和 intensity,我不想要):
shape property 3 5 6 7 blue green red
1 circle color 1 1 0 0 2 0 0
2 square NA 1 0 0 1 1 2 0
3 triangle NA 0 0 1 1 1 1 1
我做错了什么 - 有没有办法像我展示的第一个结果一样返回数据框?
此外,虽然这是一个小而快的示例,但我的“真实”数据要大得多,并且需要很长时间来计算。限制为property="color",ddply 的速度会不会提高?
编辑:感谢您到目前为止的回答!对我来说不幸的是,我把情况过于简单化了,我不确定dcast 解决方案是否适合我。让我解释一下 - 我实际上正在使用数据框example2:
example2 <- structure(list(factory = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle",
"square", "triangle"), class = "factor"), property = structure(c(1L,
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"),
value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L,
4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7",
"blue", "green", "red"), class = "factor")), .Names = c("factory",
"shape", "property", "value"), class = "data.frame", row.names = c(NA,
-17L))
我正在尝试将factory 和shape 分开。我有一个使用ddply 的混乱解决方案:
ColorSummary2 <- ddply(example2,.(factory,shape,property="color"), function(example2) summary(example2$value))
这给了
factory shape property 3 5 6 7 blue green red
1 A circle color 1 1 0 0 2 0 0
2 A square NA 1 0 0 1 1 2 0
3 A triangle NA 0 0 1 1 1 1 1
4 B circle NA 1 1 0 0 1 0 0
但我想返回的是这个(对不起,凌乱的表格,我在这里格式化表格时遇到了麻烦):
factory shape property blue green red
1 A circle color 2 0 0
2 A square NA 1 2 0
3 A triangle NA 1 1 1
4 B circle NA 1 0 0
这可能吗?
编辑 2: 很抱歉所有的编辑,我过于简单化了我的情况。这是一个更复杂的数据框,更接近我的真实示例。这个有一个列state,我不想用它来拆分。我可以用 ddply 做到这一点(混乱),但我可以使用 dcast 忽略 state 吗?
example3 <- structure(list(state = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("CA", "FL"
), class = "factor"), factory = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle",
"square", "triangle"), class = "factor"), property = structure(c(1L,
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"),
value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L,
4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7",
"blue", "green", "red"), class = "factor")), .Names = c("state",
"factory", "shape", "property", "value"), class = "data.frame", row.names = c(NA,
-17L))
【问题讨论】:
-
包
reshape2可能更适合这项任务。