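【Posted】: 2020-11-03 17:56:45
【Question】:
In my current research I run into this particular problem surprisingly often. Suppose I have a data frame containing total consumption for every US state. I want to use county population (which I have) to estimate county consumption (which I do not have). Population data is usually arranged in long format, with columns for county, state, and population. If the consumption data frame is called cons and the population data frame is called pop, my usual algorithm for solving the problem looks like this: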
# Simulated example data
pop <- as.data.frame(rnorm(12)+4)
pop$county <- letters[10:21]
pop$state <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C","C","C")
colnames(pop)[1] <- "pop"
cons <- as.data.frame(c(10^5, 4*10^4, 8*10^4))
colnames(cons) <- "cons"
cons$state <- c("A", "B", "C")
agg_pop <- aggregate(list(pop_state = pop$pop), by = list(state = pop$state), FUN = sum, na.rm = TRUE) # aggregating population by state
pop <- merge(pop, agg_pop, by = "state") # Merging the state population with the county population data
pop$share <- pop$pop/pop$pop_state # Calculating each county's share of state population
pop <- merge(pop, cons, by = "state") # Merging consumption data onto population data
pop$estimated_cons <- pop$cons * pop$share # multiplying county's share of state population with state consumption
Can anyone think of a simpler way to do this, using only one or two functions?
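For illustration only, here is a minimal sketch of one shorter route in base R. It is not taken from any answer in this thread. It assumes the same column names as the code above (pop, county, state, cons); the result name est and the seed are hypothetical additions. A single merge() attaches the state total to each county row, and ave() returns the state population sum aligned row by row, which replaces the separate aggregate() and second merge().

```r
# Simulated data with the same shape as in the question.
set.seed(42)  # assumed seed, added only so the sketch is reproducible
pop <- data.frame(
  pop    = rnorm(12) + 4,
  county = letters[10:21],
  state  = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C")
)
cons <- data.frame(cons = c(10^5, 4 * 10^4, 8 * 10^4),
                   state = c("A", "B", "C"))

# Attach each state's total consumption to its county rows.
est <- merge(pop, cons, by = "state")  # `est` is a hypothetical name

# ave() applies FUN within each state and returns a vector as long as the input,
# so the state population total lines up with every county row.
state_pop <- ave(est$pop, est$state, FUN = function(x) sum(x, na.rm = TRUE))

# County share of state population times state consumption.
est$estimated_cons <- est$cons * est$pop / state_pop
```

Because the shares within a state sum to 1, the estimated county values in est add back up to the state totals in cons, which gives a quick sanity check on the result.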
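【Comments】:
-
Hello! Could you provide a minimal reproducible example?
-
Could you share a reproducible data example?
-
@grouah I tried adding an example with simulated data.
-
Hi @pkpkPPkafa, was my answer useful? If so, please don't hesitate to accept it.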