【Question title】: ddply with only certain values of splitting variable
【Posted】: 2014-02-21 21:46:35
【Question】:

Is it possible to return ddply results for only certain values of the splitting variable? For example, take the data frame named example:

example <- structure(list(shape = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("circle", "square", "triangle"
), class = "factor"), property = structure(c(1L, 3L, 2L, 1L, 
2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L), .Label = c("color", 
"intensity", "size"), class = "factor"), value = structure(c(5L, 
2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L, 4L, 3L, 6L, 5L), .Label = c("3", 
"5", "6", "7", "blue", "green", "red"), class = "factor")), .Names = c("shape", 
"property", "value"), class = "data.frame", row.names = c(NA, 
-14L))

which looks like this:

    shape     property  value
1   circle    color     blue
2   circle    size      5
3   circle    intensity 3
4   circle    color     blue
5   square    intensity 7
6   square    size      3
7   square    color     blue
8   square    color     green
9   square    color     green
10  triangle  color     red
11  triangle  intensity 7
12  triangle  size      6
13  triangle  color     green
14  triangle  color     blue

I would like to return a data frame containing the number of each shape that has each color, like this:

    shape    property  blue green   red
1   circle   color     2    0       0
2   square   color     1    2       0
3   triangle color     1    1       1

However, I can't seem to get it to return correctly! I got part of the way there with this:

ColorSummary <- ddply(example,.(shape,property="color"), function(example) summary(example$value))

But this returns a data frame with columns for all the other unique values of value (from the properties size and intensity, which I don't want):

    shape     property      3   5   6   7   blue    green   red
1   circle    color         1   1   0   0   2       0       0
2   square    NA            1   0   0   1   1       2       0
3   triangle  NA            0   0   1   1   1       1       1

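What am I doing wrong? Is there a way to return a data frame like the first result I showed?

Also, although this is a small, fast example, my "real" data is much larger and takes a long time to compute. Would restricting to property="color" make ddply faster?

Edit: Thanks for the answers so far! Unfortunately I oversimplified my situation, and I'm not sure the dcast solution will work for me. Let me explain: I'm actually working with the data frame example2: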

example2 <- structure(list(factory = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle", 
"square", "triangle"), class = "factor"), property = structure(c(1L, 
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"), 
    value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L, 
    4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7", 
    "blue", "green", "red"), class = "factor")), .Names = c("factory", 
"shape", "property", "value"), class = "data.frame", row.names = c(NA, 
-17L))

I am trying to split by both factory and shape. I have a messy solution using ddply:

ColorSummary2 <- ddply(example2,.(factory,shape,property="color"), function(example2) summary(example2$value))

which gives

    factory  shape     property  3  5  6  7  blue  green  red
1   A        circle    color     1  1  0  0  2     0      0
2   A        square    NA        1  0  0  1  1     2      0
3   A        triangle  NA        0  0  1  1  1     1      1
4   B        circle    NA        1  1  0  0  1     0      0

But what I would like to return is this (sorry for the messy table, I had trouble formatting tables here):

    factory  shape     property  blue  green  red
1   A        circle    color     2     0      0
2   A        square    NA        1     2      0
3   A        triangle  NA        1     1      1
4   B        circle    NA        1     0      0

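Is this possible?

Edit 2: Sorry for all the edits, I oversimplified my situation. Here is a more complex data frame that is closer to my real example. This one has a column state that I do not want to split by. I can do this (messily) with ddply, but can I ignore state using dcast?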

example3 <- structure(list(state = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("CA", "FL"
), class = "factor"), factory = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), shape = structure(c(1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L), .Label = c("circle", 
"square", "triangle"), class = "factor"), property = structure(c(1L, 
3L, 2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 2L
), .Label = c("color", "intensity", "size"), class = "factor"), 
    value = structure(c(5L, 2L, 1L, 5L, 4L, 1L, 5L, 6L, 6L, 7L, 
    4L, 3L, 6L, 5L, 5L, 2L, 1L), .Label = c("3", "5", "6", "7", 
    "blue", "green", "red"), class = "factor")), .Names = c("state", 
"factory", "shape", "property", "value"), class = "data.frame", row.names = c(NA, 
-17L))

【Comments】:

  • reshape2 is probably better suited to this task.

Tags: r plyr


【Solution 1】:

Use dcast from reshape2:

dcast(...~value,data=subset(example,property=='color'))
Aggregation function missing: defaulting to length
     shape property blue green red
1   circle    color    2     0   0
2   square    color    1     2   0
3 triangle    color    1     1   1

Edit

Using the second example data set:

dcast(...~value,data=subset(example2,property=='color'))
Aggregation function missing: defaulting to length
  factory    shape property blue green red
1       A   circle    color    2     0   0
2       A   square    color    1     2   0
3       A triangle    color    1     1   1
4       B   circle    color    1     0   0
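
For comparison, here is a minimal ddply sketch of the same idea. It is not the answerer's method, only an illustration of the point that the filtering has to happen before the split. In the question's call, property="color" inside .() does not filter anything, which is consistent with the output there still counting 3, 5, 6 and 7. The name colors_only is hypothetical, and the data frame is rebuilt in compact form from the table in the question so that the block runs on its own.

```r
library(plyr)

# Rebuild the question's `example` data frame in compact form.
# stringsAsFactors = TRUE mirrors the factor columns of the original.
example <- data.frame(
  shape = rep(c("circle", "square", "triangle"), c(4, 5, 5)),
  property = c("color", "size", "intensity", "color",
               "intensity", "size", "color", "color", "color",
               "color", "intensity", "size", "color", "color"),
  value = c("blue", "5", "3", "blue",
            "7", "3", "blue", "green", "green",
            "red", "7", "6", "green", "blue"),
  stringsAsFactors = TRUE
)

# Filter first, then drop the factor levels that no longer occur
# ("3", "5", "6", "7", "size", "intensity"), so that summary() only
# reports counts for blue, green and red.
colors_only <- droplevels(subset(example, property == "color"))

# Split by shape and property; summary() on a factor returns one
# count per remaining level, which ddply turns into columns.
ColorSummary <- ddply(colors_only, .(shape, property),
                      function(d) summary(d$value))
print(ColorSummary)
```

Because the subset is taken before ddply runs, fewer rows are split and summarised, so on larger data this may also be faster than summarising everything and discarding columns afterwards; that is an expectation to measure, not a guarantee.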

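【Comments】:

  • Thanks agstudy, this works perfectly for the example I gave. However, I was trying to simplify my example to keep it easy. Unfortunately I am also splitting by another variable, which is why I was trying ddply. I will edit my question above to show the more complex case I am working on. Is there a way to handle this new case with dcast?
  • Thank you so much, agstudy! I really appreciate your help. Can you tell me whether there is a way to tell dcast which columns to "split" on? For example, can I specify only the "factory" and "shape" columns? The problem is that I am working with a larger data frame that has other columns, so the dcast solution does not count factory and shape correctly. For example, imagine I had another column "state": dcast would give me values for state/factory/shape, whereas I only want factory/shape. I hope that makes sense; if not, I can add another example data frame. Thanks again!
  • Dear agstudy, I have updated my original question with a new data frame showing a column (state) that I do not want to split by.
  • One workaround is to make a new data frame containing only the grouping columns of interest, but ideally I would like to do this using only my original data frame. Thanks again for your help, agstudy.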
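
The thread does not show an answer to the last request, so the following is only a sketch of one way to do it: instead of ... on the left-hand side of the dcast formula, the grouping columns can be listed explicitly, so any column left out (here state) is ignored. The data frame is rebuilt in compact form from example3 in the question; the name ColorCounts is hypothetical, and fun.aggregate = length and value.var = "value" are spelled out to make the counting explicit rather than relying on defaults.

```r
library(reshape2)

# Rebuild the question's `example3` data frame in compact form.
example3 <- data.frame(
  state = rep(c("CA", "FL"), length.out = 17),
  factory = rep(c("A", "B"), c(14, 3)),
  shape = c(rep(c("circle", "square", "triangle"), c(4, 5, 5)),
            rep("circle", 3)),
  property = c("color", "size", "intensity", "color",
               "intensity", "size", "color", "color", "color",
               "color", "intensity", "size", "color", "color",
               "color", "size", "intensity"),
  value = c("blue", "5", "3", "blue",
            "7", "3", "blue", "green", "green",
            "red", "7", "6", "green", "blue",
            "blue", "5", "3"),
  stringsAsFactors = TRUE
)

# Name the grouping columns explicitly; `state` is not in the formula,
# so rows are counted per factory/shape regardless of state.
ColorCounts <- dcast(subset(example3, property == "color"),
                     factory + shape ~ value,
                     fun.aggregate = length,
                     value.var = "value")
print(ColorCounts)
```

This works on the original data frame directly, so no intermediate data frame with only the grouping columns is needed.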