【Question title】: group_by and conditionally mutate by group
【Posted】: 2021-06-24 22:45:54
【Question】:

I need to create a goal variable that is 1 when the share of cases with dummy$ciiu_comparado = 1 is greater than 50%, and 0 otherwise.

17/26=0.65 -> 1

The expected result is the goal variable.

Note: group by year and id.

Data

db = structure(list(
  year = structure(c("2020", "2020", "2020", "2019", "2019", "2019", "2019",
                     "2019", "2019", "2019", "2019", "2019", "2019", "2019",
                     "2019", "2019", "2019", "2019", "2019", "2019", "2019",
                     "2019", "2019", "2019", "2019", "2019", "2019", "2019",
                     "2019"), label = "AÑO", format.stata = "%9s"),
  id = structure(c(732437, 732437, 732437, 178036, 178036, 178036, 178036,
                   178036, 178036, 178036, 178036, 178036, 178036, 178036,
                   178036, 178036, 178036, 178036, 178036, 178036, 178036,
                   178036, 178036, 178036, 178036, 178036, 178036, 178036,
                   178036), label = "EXPEDIENTE", format.stata = "%12.0g"),
  n_shareholder = c(3L, 3L, 3L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
                    26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
                    26L, 26L, 26L, 26L, 26L),
  dummy = structure(list(
    ciiu_comparado = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1,
                       1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -29L)),
  n_dummy = c(3L, 3L, 3L, 17L, 17L, 9L, 17L, 9L, 9L, 9L, 17L, 17L, 17L, 9L,
              17L, 17L, 9L, 17L, 17L, 9L, 17L, 9L, 17L, 17L, 17L, 17L,
              17L, 9L, 17L),
  goal = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)),
  row.names = c(NA, -29L),
  groups = structure(list(
    year = structure(c("2019", "2020"), label = "AÑO", format.stata = "%9s"),
    id = structure(c(178036, 732437), label = "EXPEDIENTE", format.stata = "%12.0g"),
    .rows = structure(list(4:29, 1:3), ptype = integer(0),
                      class = c("vctrs_list_of", "vctrs_vctr", "list"))),
    row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

# A tibble: 29 x 6
# Groups:   year, id [2]
   year      id n_shareholder dummy$ciiu_comparado n_dummy  goal
   <chr>  <dbl>         <int>                <dbl>   <int> <dbl>
 1 2020  732437             3                    0       3     0
 2 2020  732437             3                    0       3     0
 3 2020  732437             3                    0       3     0
 4 2019  178036            26                    1      17     1
 5 2019  178036            26                    1      17     1
 6 2019  178036            26                    0       9     1
 7 2019  178036            26                    1      17     1
 8 2019  178036            26                    0       9     1
 9 2019  178036            26                    0       9     1
10 2019  178036            26                    0       9     1
# ... with 19 more rows

【Question comments】:

    Tags: r group-by dplyr


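    【Solution 1】:

    The following creates the dummy variable as defined in the question.

    1. The comparison dummy$ciiu_comparado == 1 returns FALSE/TRUE, which R treats as 0/1 in arithmetic;
    2. sum(<logical>) gives the number of 1s;
    3. n() is the number of rows in the current group;
    4. finally, check whether the resulting share is greater than the threshold 0.5.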
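
    The four steps can be traced by hand on a single group. The sketch below uses plain base R on the 26 values of dummy$ciiu_comparado from the (2019, 178036) group in the question's data; the variable names (x, is_one, n_ones, n_rows, share) are only illustrative, and length(x) stands in for what n() returns inside that group.

```r
# Values of dummy$ciiu_comparado for rows 4 to 29 of db,
# i.e. the (year = 2019, id = 178036) group, copied from the question.
x <- c(1, 1, 0, 1, 0, 0, 0, 1, 1,
       1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1)

is_one <- x == 1                    # step 1: logical vector (TRUE/FALSE)
n_ones <- sum(is_one)               # step 2: each TRUE counts as 1
n_rows <- length(x)                 # step 3: stand-in for n() within the group
share  <- n_ones / n_rows           # proportion of rows with value 1
goal   <- as.integer(share > 0.5)   # step 4: compare with the 0.5 threshold

print(c(n_ones = n_ones, n_rows = n_rows, goal = goal))
```

    This matches the 17/26 case in the question.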

    Output omitted.

    library(dplyr)
    
    db %>%
      group_by(year, id) %>%
      mutate(goal = sum(dummy$ciiu_comparado == 1)/n(),
             goal = as.integer(goal > 0.5))
    

    goal can also be computed in a single statement.

    db %>%
      group_by(year, id) %>%
      mutate(goal = +(sum(dummy$ciiu_comparado)/n() > 0.5))
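
    A note on the one-liner: the unary + converts the logical result of the comparison to an integer, and because the column only holds 0 and 1, sum(x)/n() is the same quantity as mean(x == 1). A small base R sketch, assuming a made-up vector x with no missing values:

```r
# Hypothetical 0/1 values for one group (not taken from the question's data).
x <- c(1, 1, 0, 1, 0)

# Unary plus coerces TRUE/FALSE to 1L/0L.
via_sum <- +(sum(x) / length(x) > 0.5)

# Equivalent formulation: the mean of a logical vector is the share of TRUEs.
via_mean <- as.integer(mean(x == 1) > 0.5)

print(c(via_sum = via_sum, via_mean = via_mean))
```

    If the column could contain NA, both sum() and mean() would need na.rm = TRUE, and the denominator would have to be chosen deliberately.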
    

    【Comments】:

      【Solution 2】:

      You can do it like this:

      library(dplyr)
      db %>%
          group_by(year, id) %>%
          # n() is the row count of the current group; nrow(.) would count every row of db
          mutate(new_goal = ifelse(sum(dummy$ciiu_comparado) > 0.5 * n(), 1, 0)) %>%
          ungroup()
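
      One thing to watch in this kind of expression is the denominator. Inside a grouped mutate(), n() counts the rows of the current group, whereas nrow(.) in a magrittr pipeline refers to the whole data frame passed into mutate(). The sketch below uses a made-up data frame (toy, with columns g and x, none of which come from the question) to show that the two can disagree:

```r
library(dplyr)

# Hypothetical data: group "a" has 2 ones out of 3 rows,
# group "b" has 1 one out of 5 rows.
toy <- data.frame(
  g = c("a", "a", "a", "b", "b", "b", "b", "b"),
  x = c(1, 1, 0, 0, 0, 0, 0, 1)
)

res <- toy %>%
  group_by(g) %>%
  mutate(
    by_group = as.integer(sum(x) > 0.5 * n()),      # threshold is half the group size
    by_total = as.integer(sum(x) > 0.5 * nrow(.))   # threshold is half of all 8 rows
  ) %>%
  ungroup()

print(res)
```

      Here group "a" passes the per-group threshold (2 > 1.5) but not the whole-table one (2 > 4), so by_group and by_total differ for its rows.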
      

      【Comments】:
