【Question title】: r rolling custom function over 2 columns
【Posted】: 2023-03-16 09:45:01
【Question】:

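I want to roll a custom function that uses two data columns over a data frame. I know how to do this with one column, but I can't quite work it out for two. (The real data frame is much larger.)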

my_df <- data.frame("id"=c("151", "143", "199", "122", "156"), 
                "person"=c("mother", "father", "grandma", "child", "sister", "mother", "grandma", "grandma", "father", "mother","mother", "mother", "grandma", "child", "sister", "mother", "mother", "grandma", "father", "mother", "mother", "mother", "mother", "mother", "mother"))

library(dplyr)
library(zoo) # rollapply() comes from zoo

my_new_df <- my_df %>%
  group_by(id) %>% # first I subset by ID number
  mutate(total = n()) %>% # calculate the total number of observations per ID
  filter(person == 'mother') %>% # then I filter the observations I want to know about
  mutate(n_mother = n()) %>% # calculate the # of 'mother' observations per ID
  mutate(prop_mother = rollapply(n_mother/total, width=1, FUN=(??))) # Here I get stuck - I want the proportion of 'mother' observations updated for every observation from this ID number
Should I write a custom function to call inside the pipe?
calculate_mother = function(n_mother, total){ # total passed in explicitly so it is in scope
   return(n_mother / total)
}
After this, I also want to compute the rolling mean and variance of prop_mother, but I can't do that until I have actually computed prop_mother.

【Comments】:

    Tags: r dplyr rolling-computation


    【Solution 1】:

    I would try something like this:

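    library(dplyr)
    
    # count() is group_by() and n() rolled into one
    all_ids <- my_df %>% count(id)
    
    # number of 'mother' rows per id
    mom_ids <- my_df %>% filter(person == 'mother') %>% count(id, name = "n_mother")
    
    # join on id explicitly; ids with no 'mother' rows get NA here
    my_new_df <- full_join(all_ids, mom_ids, by = "id")
    
    # treat missing counts as zero
    my_new_df$n_mother[is.na(my_new_df$n_mother)] <- 0
    
    my_new_df$prop_mother <- my_new_df$n_mother / my_new_df$n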
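
The same per-ID table can probably be built in a single grouped summary, which avoids the join and the NA fix-up. The sketch below is only an illustration: `toy_df` is a hypothetical, smaller stand-in for `my_df`, and the column name `total` follows the question rather than the `n` used above.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the question's my_df
toy_df <- data.frame(
  id     = c("151", "143", "151", "143", "151"),
  person = c("mother", "father", "mother", "mother", "child")
)

# One row per id: total rows, number of 'mother' rows, and their ratio
prop_by_id <- toy_df %>%
  group_by(id) %>%
  summarise(
    total       = n(),
    n_mother    = sum(person == "mother"),  # 0 for ids with no 'mother' rows
    prop_mother = n_mother / total
  )
```

Because `sum(person == "mother")` is computed inside each group, an ID with no 'mother' rows gets 0 directly rather than NA.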
    

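    【Comments】:

      【Solution 2】:

      Are you looking for something like this? I couldn't work out what to order by, because the IDs needed for a "rolling" calculation are repeated for mother... Alternatively, you could group by ID as well, rather than just by person.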

      library(dplyr)
      
      my_new_df <- my_df %>%
        dplyr::group_by(id) %>% 
        dplyr::mutate(total = n())  %>% 
        dplyr::mutate(n_mother = n()) %>%
        dplyr::group_by(person) %>%
        dplyr::mutate(prop_mother = n_mother/sum(total),
                      roll_prop_mother = cumsum(prop_mother))
      

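      【Comments】:

      • I didn't mean to duplicate ID and mother - each ID has multiple "mother" observations. Fixed now.
      • This is close, but I don't want the sum of the proportions; I want the mother/total proportion updated for every row of the data frame, grouped by ID.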
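
One way to read the last comment is as a running proportion: for each row, the number of 'mother' observations seen so far for that ID divided by the number of observations seen so far for that ID. The sketch below follows that reading and then takes a rolling mean and variance of the result, as the question mentions. It is not taken from either answer. It assumes the rows are already in observation order, uses a hypothetical `toy_df` in place of `my_df`, and picks a window width of 3 arbitrarily; `obs`, `roll_mean` and `roll_var` are illustrative names.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: two ids with five observations each, in time order
toy_df <- data.frame(
  id     = rep(c("151", "143"), each = 5),
  person = c("mother", "father", "mother", "mother", "child",
             "father", "grandma", "mother", "mother", "mother")
)

running_df <- toy_df %>%
  group_by(id) %>%
  mutate(
    obs         = row_number(),                # observations so far for this id
    n_mother    = cumsum(person == "mother"),  # 'mother' observations so far
    prop_mother = n_mother / obs,              # running proportion, uses both columns
    # right-aligned windows of 3 rows; the first two rows of each id are NA
    roll_mean   = rollapplyr(prop_mother, width = 3, FUN = mean, fill = NA),
    roll_var    = rollapplyr(prop_mother, width = 3, FUN = var,  fill = NA)
  ) %>%
  ungroup()
```

Since the proportion only needs cumulative counts, `cumsum()` and `row_number()` are enough for that step, and `rollapplyr()` is only needed for the windowed mean and variance. Every ID in the toy data has at least as many rows as the window width; shorter groups would need separate handling.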