【Posted】: 2015-08-12 04:34:39
【Question】:
I want to summarise a data frame (df) by location name. The data looks like this:
location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN","WA","MA","VT","CA","OR","NJ","OH","MI","IL","GA","FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE),
                 tree_type = sample(tree_type, 20, replace = TRUE),
                 density = runif(20, min = 24, max = 365),
                 income = runif(20, min = 37000, max = 62000))
What I want is something like this:
location mean(density) mean(income) birch maple palm pine
1 AZ 38.44009 52032.95 0 0 1 0
2 CA 136.85112 42243.35 0 1 0 0
3 GA 101.24081 53405.60 2 0 0 0
4 IL 172.02651 46368.42 1 1 0 0
5 MA 198.69868 51117.18 0 0 0 1
6 MI 153.93358 60425.87 1 0 0 0
7 MN 185.05276 46468.68 0 0 1 0
8 NC 181.42187 46007.93 1 0 2 0
9 NJ 302.66541 59316.94 0 0 2 0
10 OR 303.88283 48497.03 0 0 0 2
11 SC 84.05136 50348.41 0 1 0 1
12 SD 158.47423 57894.27 0 0 1 0
13 VT 126.32967 42853.04 0 0 1 0
This is how I did it:
require(dplyr)
require(reshape2)
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income))
df_catvarslong <- as.data.frame(table(df[1:2]))
df_catvarswide <- dcast(df_catvarslong, location ~ tree_type, value.var = "Freq")
final_df <- left_join(df_quantvars, df_catvarswide, by = "location")
Is there a way to do this within the dplyr group_by idiom? At the risk of sounding silly, I tried this:
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income), table(df[1:2]))
What am I missing?
【Comments】:
-
`summarise(mean(income), mean(density), birch = sum(tree_type == "birch"), maple = sum(tree_type == "maple"), palm = sum(tree_type == "palm"), pine = sum(tree_type == "pine"))`
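The summarise() call in the comment above can be sketched as a complete pipeline. This is a minimal sketch, assuming a small hand-made data frame (the values are illustrative, not the asker's sample) and the column names mean_density and mean_income, which are introduced here only to give the summary columns plain names.

```r
library(dplyr)

# Illustrative toy data (made up for this sketch, not from the question).
df <- data.frame(
  location  = c("NC", "NC", "NC", "GA", "GA", "AZ"),
  tree_type = c("birch", "palm", "palm", "birch", "birch", "palm"),
  density   = c(100, 200, 300, 50, 150, 40),
  income    = c(40000, 50000, 60000, 45000, 55000, 52000)
)

# One row per location: means of the numeric columns plus one count per tree type.
# sum() over a logical vector counts the TRUE values within each group.
result <- df %>%
  group_by(location) %>%
  summarise(
    mean_density = mean(density),
    mean_income  = mean(income),
    birch = sum(tree_type == "birch"),
    maple = sum(tree_type == "maple"),
    palm  = sum(tree_type == "palm"),
    pine  = sum(tree_type == "pine")
  )

print(result)
```

The tree types are hard-coded here, so a new type appearing in the data would need its own line in summarise().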
-
What is wrong with your current approach?
-
Here is another attempt:
df %>% group_by(location) %>% summarise(mean(density), mean(income)); df.table <- aggregate(tree_type ~ location, data = df, FUN = table); left_join(df, df.table, by = "location")
-
Don't forget that dcast can aggregate directly. I'm not sure exactly what you are looking for (do you need to use table?), but you can use group_by, mutate, and dcast:
df %>% group_by(location) %>% mutate(mean(density), mean(income)) %>% dcast(location + `mean(density)` + `mean(income)` ~ tree_type, fun.aggregate = length)
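The group_by, mutate, and dcast idea above can be sketched end to end as follows. This is a sketch under stated assumptions: the toy data is made up for illustration, the names mean_density and mean_income are introduced here to avoid backtick-quoted column names, and value.var is set explicitly (any column would do, since length only counts rows) so that dcast does not have to guess it.

```r
library(dplyr)
library(reshape2)

# Illustrative toy data (made up for this sketch, not from the question).
df <- data.frame(
  location  = c("NC", "NC", "NC", "GA", "GA", "AZ"),
  tree_type = c("birch", "palm", "palm", "birch", "birch", "palm"),
  density   = c(100, 200, 300, 50, 150, 40),
  income    = c(40000, 50000, 60000, 45000, 55000, 52000)
)

wide <- df %>%
  group_by(location) %>%
  # mutate() keeps every row and attaches the per-location means to each one.
  mutate(mean_density = mean(density), mean_income = mean(income)) %>%
  ungroup() %>%
  as.data.frame() %>%
  # Cast to wide form: one column per tree type, filled with row counts.
  dcast(location + mean_density + mean_income ~ tree_type,
        fun.aggregate = length, value.var = "density")

print(wide)
```

Here the tree-type columns are derived from the data, so only types that actually occur get a column.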