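【Posted】: 2023-03-16 09:45:01
【Question】:
I want to roll a custom function that uses two data columns over a data frame. I know how to do this with one data column, but I can't quite work it out for two. (The real data frame is much larger.)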
my_df <- data.frame("id"=c("151", "143", "199", "122", "156"),
"person"=c("mother", "father", "grandma", "child", "sister", "mother", "grandma", "grandma", "father", "mother","mother", "mother", "grandma", "child", "sister", "mother", "mother", "grandma", "father", "mother", "mother", "mother", "mother", "mother", "mother"))
library(dplyr)
library(zoo)
my_new_df <- my_df %>%
  group_by(id) %>% # first I subset by ID number
  mutate(total = n()) %>% # calculate the total number of observations per ID
  filter(person == 'mother') %>% # then I filter the observations I want to know about
  mutate(n_mother = n()) %>% # calculate the # of 'mother' observations per ID
  mutate(prop_mother = rollapply(n_mother/total, width=1, FUN=(??))) # Here I get stuck - I want the proportion of 'mother' observations updated for every observation from this ID number
Should I write a custom function to call inside the pipe?
calculate_mother <- function(n_mother, total) {
  # total has to be passed in explicitly; it is not visible inside the function otherwise
  n_mother / total
}
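On the two-column part of the question: one possible route, sketched here under assumptions, is to hand `zoo::rollapply` a matrix and set `by.column = FALSE`, so that the custom function receives each window as a small matrix holding both columns. The columns `x` and `y`, the helper `ratio_of_sums`, and the window width of 3 are hypothetical and only serve the illustration.

```r
library(zoo)

# Hypothetical toy columns, not taken from the question's data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Hypothetical custom function: receives one window as a matrix
# with named columns "x" and "y" and combines both of them
ratio_of_sums <- function(w) sum(w[, "x"]) / sum(w[, "y"])

res <- rollapply(
  cbind(x = x, y = y),  # both columns travel together as one matrix
  width = 3,            # assumed window size
  FUN = ratio_of_sums,
  by.column = FALSE,    # apply FUN to whole windows, not column by column
  fill = NA,            # pad positions without a full window
  align = "right"
)
```

If the calculation is purely element-wise (one value of each column per row), no rolling call is needed at all; a plain `n_mother / total` inside `mutate()` already works on both columns.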
After this, I also want to compute the rolling mean and variance of prop_mother, but I can't do that until I have actually computed prop_mother.
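A minimal sketch of how the whole chain might look, assuming that "updated for every observation" means a running proportion (the share of 'mother' rows seen so far within each id). That reading, the window width of 3, and the column names `roll_mean` and `roll_var` are assumptions, not something the question states. The rows are not filtered here, because filtering to 'mother' first would leave nothing to take a proportion against.

```r
library(dplyr)
library(zoo)

# Same data as in the question; the id recycling is written out explicitly
my_df <- data.frame(
  id = rep(c("151", "143", "199", "122", "156"), times = 5),
  person = c("mother", "father", "grandma", "child", "sister",
             "mother", "grandma", "grandma", "father", "mother",
             "mother", "mother", "grandma", "child", "sister",
             "mother", "mother", "grandma", "father", "mother",
             "mother", "mother", "mother", "mother", "mother")
)

result <- my_df %>%
  group_by(id) %>%
  mutate(
    # running count of 'mother' rows divided by rows seen so far in this id
    prop_mother = cumsum(person == "mother") / row_number(),
    # right-aligned rolling statistics over an assumed window of 3 rows
    roll_mean = rollapplyr(prop_mother, width = 3, FUN = mean, fill = NA),
    roll_var  = rollapplyr(prop_mother, width = 3, FUN = var, fill = NA)
  ) %>%
  ungroup()
```

The first two rows of each id carry `NA` in the rolling columns because a full window of 3 is not yet available; passing `partial = TRUE` to `rollapplyr` instead of `fill = NA` would be one way to compute on shorter windows.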
【Discussion】:
Tags: r dplyr rolling-computation