【Question title】: Mutate a subset of a variable based on a condition
【Posted】: 2021-03-10 16:57:13
【Question description】:

I have the following data:

have_df <- tibble(
    year = c(2015, 2016, 2017, 2018, 2019, 2015, 2016, 2017, 2018, 2019),
    da_assist = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
    priority = c(NA, NA, NA, "Priority4", "Priority4", NA, NA, NA, NA, NA),
    nces_dist = c(065441, 065441, 065441, 065441, 065441,074911, 074911, 074911, 074911, 074911))

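Using dplyr, I would like to say: when `priority` equals "Priority4" in 2018, convert the NA values of `priority` in 2015, 2016, and 2017 to "Priority4". I only want to change the values of the priority variable for the specific id (nces_dist) where `priority` equals "Priority4" in 2018, so that the data looks like this: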

need_df <- tibble(
    year = c(2015, 2016, 2017, 2018, 2019, 2015, 2016, 2017, 2018, 2019),
    da_assist = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
    priority = c("Priority4", "Priority4", "Priority4","Priority4", "Priority4", NA, NA, NA, NA, NA), 
    nces_dist = c(065441, 065441, 065441, 065441, 065441, 074911, 074911, 074911, 074911, 074911))

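I have searched through a dozen or so mutate posts, but I cannot find a way to mutate a subset of one variable using a subset of another variable. Thanks.

【Question comments】:

    Tags: r subset dplyr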


    【Solution 1】:

    For each nces_dist, you can fill the priority values in the 'up' direction.

    library(dplyr)
    library(tidyr)
    
    need_df <- have_df %>% group_by(nces_dist) %>% fill(priority, .direction = 'up')
    need_df
    
    #    year da_assist priority  nces_dist
    #   <dbl>     <dbl> <chr>         <dbl>
    # 1  2015         0 Priority4     65441
    # 2  2016         0 Priority4     65441
    # 3  2017         0 Priority4     65441
    # 4  2018         1 Priority4     65441
    # 5  2019         1 Priority4     65441
    # 6  2015         0 NA            74911
    # 7  2016         0 NA            74911
    # 8  2017         0 NA            74911
    # 9  2018         0 NA            74911
    #10  2019         0 NA            74911
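
Note that `fill()` copies any later non-missing value upward, whatever year it comes from. If the rule has to be tied specifically to the 2018 row, a stricter variant might look like the minimal sketch below. The helper column `flag_2018` is a hypothetical name introduced only for this illustration, and the data are rebuilt with `rep()` so the block runs on its own.

```r
library(dplyr)
library(tibble)

# Same data as in the question, rebuilt so the example is self-contained
have_df <- tibble(
  year      = rep(2015:2019, times = 2),
  da_assist = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
  priority  = c(NA, NA, NA, "Priority4", "Priority4", rep(NA, 5)),
  nces_dist = rep(c(65441, 74911), each = 5)
)

need_df <- have_df %>%
  group_by(nces_dist) %>%
  mutate(
    # TRUE for every row of a district whose 2018 row is "Priority4"
    # (flag_2018 is a hypothetical helper column)
    flag_2018 = any(year == 2018 & priority == "Priority4", na.rm = TRUE),
    # only fill missing values in the years before 2018
    priority  = if_else(flag_2018 & year < 2018 & is.na(priority),
                        "Priority4", priority)
  ) %>%
  ungroup() %>%
  select(-flag_2018)
```

On the sample data this gives the same result as `fill()`; the two only differ when a district has a non-missing priority in some other year.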
    

    【Comments】:

      【Solution 2】:

      We can use replace

      library(dplyr)
      have_df %>% 
          group_by(nces_dist) %>%   # apply the replacement within each district
          mutate(priority = replace(priority,
                   seq_len(match(2018, year)), priority[year == 2018][1])) %>%
          ungroup()
      

      -output

      # A tibble: 10 x 4
      #    year da_assist priority  nces_dist
      #   <dbl>     <dbl> <chr>         <dbl>
      # 1  2015         0 Priority4     65441
      # 2  2016         0 Priority4     65441
      # 3  2017         0 Priority4     65441
      # 4  2018         1 Priority4     65441
      # 5  2019         1 Priority4     65441
      # 6  2015         0 <NA>          74911
      # 7  2016         0 <NA>          74911
      # 8  2017         0 <NA>          74911
      # 9  2018         0 <NA>          74911
      #10  2019         0 <NA>          74911
      

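      【Comments】:

      • Could you explain the syntax, for my help/learning?
      • @AnilGoyal `match` returns the first index at which 2018 occurs. `seq_len` builds the sequence up to that point (e.g. `seq_len(5)` gives 1 2 3 4 5), and `replace` does the replacement based on either an index or a logical vector. In the replacement value, we take the single value of `priority` where `year` is 2018; the `[1]` is there because 2018 occurs more than once.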
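
The pieces described in that comment can be tried one at a time on plain vectors. The following is a small sketch using only the first district's values from the question; the variable names `idx`, `rows`, `value`, and `filled` are made up for the illustration.

```r
# Values for the first district only
year     <- c(2015, 2016, 2017, 2018, 2019)
priority <- c(NA, NA, NA, "Priority4", "Priority4")

idx    <- match(2018, year)            # position of the first 2018
rows   <- seq_len(idx)                 # positions 1 up to idx
value  <- priority[year == 2018][1]    # the (first) priority recorded for 2018
filled <- replace(priority, rows, value)

filled
```

Because `replace()` overwrites every position in `rows`, any non-missing value before 2018 would be overwritten too, which is harmless here but worth keeping in mind for other data.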
      【Solution 3】:

      If I understand correctly, the following code should help as well

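      library(dplyr)
      have_df %>% 
        # flag the 2018 rows whose priority is "Priority4" (NA where priority is missing)
        mutate(dummy = ifelse(year == 2018 & priority == "Priority4", 1, 0)) %>%
        group_by(nces_dist) %>%
        mutate(dummy = ifelse(is.na(dummy), 0, dummy),   # treat missing flags as 0
               dummy = cumsum(dummy),                    # running count within each district
               priority = ifelse(last(dummy) > 0, "Priority4", NA_character_)) %>%
        select(-dummy)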
      
      # A tibble: 10 x 4
      # Groups:   nces_dist [2]
      #    year da_assist priority  nces_dist
      #   <dbl>     <dbl> <chr>         <dbl>
      # 1  2015         0 Priority4     65441
      # 2  2016         0 Priority4     65441
      # 3  2017         0 Priority4     65441
      # 4  2018         1 Priority4     65441
      # 5  2019         1 Priority4     65441
      # 6  2015         0 NA            74911
      # 7  2016         0 NA            74911
      # 8  2017         0 NA            74911
      # 9  2018         0 NA            74911
      #10  2019         0 NA            74911
      

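      If the condition is met anywhere in a group, the cumulative sum of the dummy variable makes the last value of that group (grouped by id) equal to 1 (or > 0); after that, the priority column can be mutated easily.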
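
To see what the cumulative sum is doing, it can help to keep the intermediate columns instead of dropping them. The sketch below is only an illustration: `running` and `group_flag` are hypothetical column names, and `%in%` is swapped in for the `ifelse()`/`is.na()` pair because it returns FALSE rather than NA when priority is missing.

```r
library(dplyr)
library(tibble)

# Same data as in the question, rebuilt so the example is self-contained
have_df <- tibble(
  year      = rep(2015:2019, times = 2),
  da_assist = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
  priority  = c(NA, NA, NA, "Priority4", "Priority4", rep(NA, 5)),
  nces_dist = rep(c(65441, 74911), each = 5)
)

steps <- have_df %>%
  # 1 on the 2018 row if its priority is "Priority4", otherwise 0
  mutate(dummy = as.integer(year == 2018 & priority %in% "Priority4")) %>%
  group_by(nces_dist) %>%
  mutate(running    = cumsum(dummy),        # running count within each district
         group_flag = last(running) > 0) %>% # TRUE if the condition was met anywhere
  ungroup()

steps
```

The `group_flag` column is the same for every row of a district, which is what allows the whole group's priority to be set in one step.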

      【Comments】: