【Title】: Drop rows conditional on value on other rows using dplyr in R
【Posted】: 2023-03-03 14:19:02
【Question】:

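Using the sample data below: for each institution ("a" and "b"), I want to drop the rows with fac == "no" in any year that also has a row with fac == "yes". I then want to sum the values by year. However, I can't figure out how to drop the correct "no" rows. Below are a few attempts I made based on the answers here.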

library(dplyr)  # also provides tibble() and %>%

set.seed(123)
ext <- tibble(
  institution = c(rep("a", 7), rep("b", 7)),
  year = rep(c("2005", "2005", "2006", "2007", "2008", "2009", "2009"), 2),
  fac = rep(c("yes", "no", "no", "no", "no", "yes", "no"), 2),
  value = sample(1:100, 14, replace = TRUE)
)

ext %>%
  group_by(institution, year) %>%
  filter(if (fac == "yes") fac != "no")

ext %>%
  group_by(institution, year) %>%
  case_when(fac == "yes" ~ filter(., fac != "no"))

ext %>%
  group_by(institution, year) %>%
  {if (fac == "yes") filter(., fac != "no")}
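
A note on why these attempts fail: `if ()` expects a single TRUE/FALSE, while `fac == "yes"` is a whole vector within each group. The sketch below is one possible repair of the first attempt, not necessarily what the asker intended. It collapses the comparison with `any()`. The `demo` tibble and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical data: 2005 has a "yes" and a "no" row, 2006 has only a "no" row.
demo <- tibble(
  institution = c("a", "a", "a"),
  year        = c("2005", "2005", "2006"),
  fac         = c("yes", "no", "no"),
  value       = c(10L, 20L, 30L)
)

kept <- demo %>%
  group_by(institution, year) %>%
  # any() reduces the group's comparison to one TRUE/FALSE, which if() requires
  filter(if (any(fac == "yes")) fac != "no" else TRUE) %>%
  ungroup()

kept$value  # 10 30
```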

【Comments】:

    Tags: r dplyr tidyverse


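    【Answer 1】:

    Another approach is to keep a row if it is a 'yes' row or if it is the only row in its institution-year group. This works for the sample data because every group with more than one row contains a 'yes' row: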

    library(dplyr)
    ext %>%
      group_by(institution, year) %>%
      filter(fac == 'yes' | n() < 2)
    
    #   institution year  fac   value
    # 1 a           2005  yes      31
    # 2 a           2006  no       51
    # 3 a           2007  no       14
    # 4 a           2008  no       67
    # 5 a           2009  yes      42
    # 6 b           2005  yes      43
    # 7 b           2006  no       25
    # 8 b           2007  no       90
    # 9 b           2008  no       91
    # 10 b          2009  yes      69
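
The `n() < 2` test above looks at group size, not at whether a "yes" row is present. The following sketch compares it with a presence test on made-up data in which one group has two "no" rows and no "yes" row. The tibble `dup` and its values are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical data: 2006 has two "no" rows and no "yes" row.
dup <- tibble(
  institution = c("a", "a", "a", "a"),
  year        = c("2005", "2005", "2006", "2006"),
  fac         = c("yes", "no", "no", "no"),
  value       = c(10L, 20L, 30L, 40L)
)

# Group-size test: drops both 2006 rows, because n() is 2 there.
by_size <- dup %>%
  group_by(institution, year) %>%
  filter(fac == "yes" | n() < 2) %>%
  ungroup()

# Presence test: keeps "no" rows whenever the group has no "yes" row.
by_presence <- dup %>%
  group_by(institution, year) %>%
  filter(fac == "yes" | !any(fac == "yes")) %>%
  ungroup()

nrow(by_size)      # 1
nrow(by_presence)  # 3
```

If a group can contain several "no" rows without a "yes", the presence test is probably the safer choice.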
    

    If you want the totals by year, pipe the result into these two lines, which yields the following output:

    group_by(year) %>%
    summarise(value=sum(value))
    
    #   year  value
    #   <chr> <int>
    # 1 2005     74
    # 2 2006     76
    # 3 2007    104
    # 4 2008    158
    # 5 2009    111
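
For reference, here is a sketch of the filter and the summary joined into one pipeline. It uses a small stand-in table with fixed values (`ext_demo`, hypothetical) rather than the randomly sampled `ext`, so the totals are easy to check by hand.

```r
library(dplyr)

# Illustrative stand-in for `ext`, with fixed values instead of sample().
ext_demo <- tibble(
  institution = c("a", "a", "a", "b", "b", "b"),
  year        = c("2005", "2005", "2006", "2005", "2005", "2006"),
  fac         = c("yes", "no", "no", "yes", "no", "no"),
  value       = c(1L, 2L, 3L, 4L, 5L, 6L)
)

totals <- ext_demo %>%
  group_by(institution, year) %>%
  filter(fac == "yes" | n() < 2) %>%  # the answer's filter
  group_by(year) %>%                  # replaces the previous grouping
  summarise(value = sum(value))

totals$value  # 5 9
```

Calling `group_by(year)` on an already grouped table overrides the earlier grouping by default, so a separate `ungroup()` is optional here.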
    

    【Comments】:

      【Answer 2】:

      Does this work? For the summary, I am assuming you want to sum by year after applying the filter.

      library(dplyr)
      ext %>%
        group_by(institution, year) %>%
        filter(fac == 'yes' | all(fac == 'no'))
      # A tibble: 10 x 4
      # Groups:   institution, year [10]
      #    institution year  fac   value
      #    <chr>       <chr> <chr> <int>
      #  1 a           2005  yes      31
      #  2 a           2006  no       51
      #  3 a           2007  no       14
      #  4 a           2008  no       67
      #  5 a           2009  yes      42
      #  6 b           2005  yes      43
      #  7 b           2006  no       25
      #  8 b           2007  no       90
      #  9 b           2008  no       91
      # 10 b           2009  yes      69

      ext %>%
        group_by(institution, year) %>%
        filter(fac == 'yes' | all(fac == 'no')) %>%
        ungroup() %>%
        group_by(year) %>%
        summarise(value = sum(value))
      # `summarise()` ungrouping output (override with `.groups` argument)
      # A tibble: 5 x 2
      #   year  value
      #   <chr> <int>
      # 1 2005     74
      # 2 2006     76
      # 3 2007    104
      # 4 2008    158
      # 5 2009    111
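
To see what the filter condition evaluates to, the sketch below writes it into columns with `mutate()` instead of filtering. The `demo` tibble and the column names `all_no` and `keep` are made up for illustration.

```r
library(dplyr)

# Hypothetical data: 2005 has a "yes" and a "no" row, 2006 has only a "no" row.
demo <- tibble(
  institution = c("a", "a", "a"),
  year        = c("2005", "2005", "2006"),
  fac         = c("yes", "no", "no")
)

flags <- demo %>%
  group_by(institution, year) %>%
  mutate(
    all_no = all(fac == "no"),      # one value per group, recycled to every row
    keep   = fac == "yes" | all_no  # the condition passed to filter()
  ) %>%
  ungroup()

flags$keep  # TRUE FALSE TRUE
```

`all()` returns a single logical per group, so a "no" row survives only when every row in its group is "no".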
      

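      【Comments】:

        【Answer 3】:

        Try creating a flag that identifies the occurrence of "yes", then filter for only the rows you need. Group by institution and year, then count how many values in each group equal "yes". If that count is at least 1, flag the group's "no" rows. Finally, keep only the rows where Flag is zero, which drops the rows as expected. Here is the code: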

        library(dplyr)
        #Code
        newdf <- ext %>% group_by(institution,year) %>%
          mutate(NYes=length(fac[fac=='yes']),
                 Flag=ifelse(fac=='no' & NYes>=1,1,0)) %>%
          filter(Flag==0) %>% select(-c(NYes,Flag))
        

        Output:

        # A tibble: 10 x 4
        # Groups:   institution, year [10]
           institution year  fac   value
           <chr>       <chr> <chr> <int>
         1 a           2005  yes      31
         2 a           2006  no       51
         3 a           2007  no       14
         4 a           2008  no       67
         5 a           2009  yes      42
         6 b           2005  yes      43
         7 b           2006  no       25
         8 b           2007  no       90
         9 b           2008  no       91
        10 b           2009  yes      69
        

        And the full code with the summary by year:

        #Code 2
        newdf <- ext %>% group_by(institution,year) %>%
          mutate(NYes=length(fac[fac=='yes']),
                 Flag=ifelse(fac=='no' & NYes>=1,1,0)) %>%
          filter(Flag==0) %>% select(-c(NYes,Flag)) %>%
          ungroup() %>%
          group_by(year) %>%
          summarise(value=sum(value))
        

        Output:

        # A tibble: 5 x 2
          year  value
          <chr> <int>
        1 2005     74
        2 2006     76
        3 2007    104
        4 2008    158
        5 2009    111
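
One detail about the count used for `NYes`: `length(fac[fac=='yes'])` also counts missing values, because indexing with `NA` returns an `NA` element. The sample data has no missing values, so the code above is unaffected. The base R sketch below only illustrates the difference on a made-up vector, with `sum(..., na.rm = TRUE)` as a possible alternative.

```r
# Hypothetical group with one "no" row and one missing value.
fac <- c("no", NA)

n_by_length <- length(fac[fac == "yes"])        # the NA index yields an NA element
n_by_sum    <- sum(fac == "yes", na.rm = TRUE)  # NA comparisons are ignored

n_by_length  # 1
n_by_sum     # 0
```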
        

        【Comments】:

          【Answer 4】:

          An option with data.table:

          library(data.table)
          setDT(ext)[ext[, .I[fac == 'yes'|all(fac == 'no')], .(institution, year)]$V1]
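
The one-liner above does two things: the inner call collects, per group, the row numbers (`.I`) that pass the test, and the outer call subsets the table by those row numbers. Below is a sketch of the two steps taken apart, on a made-up table `dt`. The names `idx` and `kept` are only for illustration.

```r
library(data.table)

# Hypothetical data: 2005 has a "yes" and a "no" row, 2006 has only a "no" row.
dt <- data.table(
  institution = c("a", "a", "a"),
  year        = c("2005", "2005", "2006"),
  fac         = c("yes", "no", "no"),
  value       = c(10L, 20L, 30L)
)

# Inner step: per group, the original row numbers (.I) that pass the test.
idx <- dt[, .I[fac == "yes" | all(fac == "no")], by = .(institution, year)]$V1

# Outer step: subset the table by those row numbers.
kept <- dt[idx]

idx         # 1 3
kept$value  # 10 30
```

Note that `setDT()` converts `ext` to a data.table by reference, so `ext` itself is changed after the call.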
          

          【Comments】:
