【Posted】:2015-09-18 09:11:48
【Question】:
I think I'm missing something obvious. I have the following data frame:
df <- data.frame(type = c("cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep", "cattle", "mixed", "not stated", "other", "sheep"),
                 region = c("EA", "EA", "EA", "EA", "EA", "NW", "NW", "NW", "NW", "NW", "S", "S", "S", "S", "S"),
                 number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1))
I want to calculate the proportion of each type within each region. I have tried both:
require(plyr)
ddply(df, .(region, type), mutate,
prop = number/sum(number))
and
transform(df, prop = number/ave(number, region, type, FUN = sum))
both of which give:
         type region number prop
1      cattle     EA     14    1
2       mixed     EA      9    1
3  not stated     EA     80    1
4       other     EA      0  NaN
5       sheep     EA      2    1
6      cattle     NW     36    1
7       mixed     NW     15    1
8  not stated     NW     45    1
9       other     NW      0  NaN
10      sheep     NW      7    1
11     cattle      S     12    1
12      mixed      S     35    1
13 not stated      S     92    1
14      other      S     18    1
15      sheep      S      1    1
Thanks for reading.
【Comments】:
-
How would you calculate it for a single region?
-
Isn't this just
transform(df, prop = number/ave(number, region, FUN = sum))? Your types are unique within each region, so it seems there is no need to include type in the grouping. -
If the types were not unique within a region, you would need something like
transform(df, prop = ave(number, region, type, FUN = sum)/ ave(number, region, FUN = sum)) -
@David Arenburg - 'transform(df, prop = ave(number, region, type, FUN = sum)/ ave(number, region, FUN = sum))' works nicely. Thanks!
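For anyone landing here later, a minimal sketch of why the original attempts returned 1 everywhere, and how the two suggestions from the comments compare. It rebuilds the question's data with rep() for brevity; the names by_both, by_region and general are only illustrative and do not come from the thread.

```r
# Same data as in the question, rebuilt compactly with rep()
df <- data.frame(
  type   = rep(c("cattle", "mixed", "not stated", "other", "sheep"), times = 3),
  region = rep(c("EA", "NW", "S"), each = 5),
  number = c(14, 9, 80, 0, 2, 36, 15, 45, 0, 7, 12, 35, 92, 18, 1)
)

# Grouping by region AND type: each group holds a single row, so the
# denominator equals the numerator and every ratio is 1 (or 0/0 = NaN)
by_both <- transform(df, prop = number / ave(number, region, type, FUN = sum))

# Grouping by region only: the denominator is the regional total
by_region <- transform(df, prop = number / ave(number, region, FUN = sum))

# General form from the comments: sum per (region, type) over sum per region.
# It also handles a type that appears on several rows within a region.
general <- transform(
  df,
  prop = ave(number, region, type, FUN = sum) / ave(number, region, FUN = sum)
)

# Sanity check: proportions should add up to 1 within each region
region_totals <- tapply(by_region$prop, by_region$region, sum)
print(by_region)
print(region_totals)
```

Because each type occurs only once per region in this data, by_region and general should give the same prop column; the general form only matters when a type is repeated within a region.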