【Question title】: Proportion by subset within columns
【Posted】: 2015-09-18 09:11:48
【Question】:

I think I'm missing something obvious. I have the following data frame:

df <- data.frame(type = c("cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep"),
        region = c("EA", "EA", "EA", "EA", "EA", "NW", "NW", "NW", "NW", "NW", "S", "S", "S", "S", "S" ),
        number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18,  1))

I want to calculate the proportion of each type within each region. I have tried both of the following:

require(plyr)

ddply(df, .(region, type), mutate,
prop = number/sum(number))

transform(df, prop = number/ave(number, region, type, FUN = sum))

Both give:

         type region number prop
1      cattle     EA     14    1
2       mixed     EA      9    1
3  not stated     EA     80    1
4       other     EA      0  NaN
5       sheep     EA      2    1
6      cattle     NW     36    1
7       mixed     NW     15    1
8  not stated     NW     45    1
9       other     NW      0  NaN
10      sheep     NW      7    1
11     cattle      S     12    1
12      mixed      S     35    1
13 not stated      S     92    1
14      other      S     18    1
15      sheep      S      1    1

Thanks for reading.

【Comments】:

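  • How do you calculate it for a single region?
  • Isn't this just transform(df, prop = number/ave(number, region, FUN = sum))? Each type appears only once per region, so there seems to be no need to include type in the calculation.
  • If the types were not unique within a region, you would need something like transform(df, prop = ave(number, region, type, FUN = sum)/ ave(number, region, FUN = sum))
  • @David Arenburg - 'transform(df, prop = ave(number, region, type, FUN = sum)/ ave(number, region, FUN = sum))' works great. Thanks!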
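
For reference, the two base R expressions from the comments can be run side by side. This is a minimal sketch on the question's data; the object names `res` and `res2` are introduced here for illustration and are not part of the original thread.

```r
# Same data as in the question, built with rep() for brevity
df <- data.frame(
  type   = rep(c("cattle", "mixed", "not stated", "other", "sheep"), times = 3),
  region = rep(c("EA", "NW", "S"), each = 5),
  number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1)
)

# Case 1: each type appears once per region, so dividing each row
# by its regional total is enough
res <- transform(df, prop = number / ave(number, region, FUN = sum))

# Case 2: general form; the numerator first sums rows that share
# the same region and type, so it also works when a type repeats
res2 <- transform(df, prop = ave(number, region, type, FUN = sum) /
                             ave(number, region, FUN = sum))

# On this data the two forms should agree
all.equal(res$prop, res2$prop)
```

The second form costs one extra `ave()` call, but it does not rely on the assumption that every (region, type) pair occurs only once.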

Tags: r transform plyr


【Answer 1】:

Actually, you only need to call ddply grouped by "region" alone.

Try this:

ddply(df, .(region), mutate, prop = number/sum(number))

         type region number        prop
1      cattle     EA     14 0.133333333
2       mixed     EA      9 0.085714286
3  not stated     EA     80 0.761904762
4       other     EA      0 0.000000000
5       sheep     EA      2 0.019047619
6      cattle     NW     36 0.349514563
7       mixed     NW     15 0.145631068
8  not stated     NW     45 0.436893204
9       other     NW      0 0.000000000
10      sheep     NW      7 0.067961165
11     cattle      S     12 0.075949367
12      mixed      S     35 0.221518987
13 not stated      S     92 0.582278481
14      other      S     18 0.113924051
15      sheep      S      1 0.006329114

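Reason: you want each row's share of its region's total, so the sum has to be taken per region, and ddply should group by region only. Grouping by region and type puts every row in a group of its own, so number/sum(number) is always 1 (or NaN where number is 0).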
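
A quick way to check this is to count the rows in each (region, type) group and to confirm that the proportions add up to 1 within each region. The sketch below assumes the plyr package is installed; `group_sizes` and `out` are names made up for this illustration.

```r
library(plyr)

# Same data as in the question
df <- data.frame(
  type   = rep(c("cattle", "mixed", "not stated", "other", "sheep"), times = 3),
  region = rep(c("EA", "NW", "S"), each = 5),
  number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1)
)

# Grouping by region AND type: count how many rows fall in each group.
# If every group has one row, number / sum(number) can only be 1 or NaN.
group_sizes <- ddply(df, .(region, type), summarise, n = length(number))
range(group_sizes$n)

# Grouping by region only, as in the answer
out <- ddply(df, .(region), mutate, prop = number / sum(number))

# Sanity check: proportions within each region should add up to 1
tapply(out$prop, out$region, sum)
```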
