【Question title】: Filter out multiple rows based on the content of one element in a single row
【Posted】: 2019-11-02 11:33:58
【Question description】:

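I have a data frame in which many rows hold duplicated values, except for the value in column dd.

If any row contains the value "ACT" in this non-duplicated column, I need to remove every row that matches that "ACT" row, as well as the "ACT" row itself. So in the example code, I want to keep only the six rows whose aa column contains "c" or "e".

I have tried various nested if-else statements inside for loops that trigger when "ACT" is present in dd, and tried to filter on the value in aa somehow, but I can't figure out how to get beyond matching against a single row.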

aa <- c("b","b","b","c","c","c","d","d","d","e","e","e")
bb <- c("t","t","t","w","w","w","r","r","r","s","s","s")
cc <- c(1,1,1,2,2,2,3,3,3,4,4,4)
dd <- c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")

Ideally I'm looking for a tidyverse solution, but I'm certainly open to anything.

【Question comments】:

    Tags: r dataframe dplyr duplicates tidyverse


    【Solution 1】:
    • Using the dplyr package:
    library(dplyr)
    df1 <- tibble(
      aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
      bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
      cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
      dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
    )
    
    anti_join(df1, df1[df1$dd=="ACT", ], by=c("aa","bb","cc"))
    #> # A tibble: 6 x 4
    #>   aa    bb       cc dd   
    #>   <chr> <chr> <dbl> <chr>
    #> 1 c     w         2 CVR  
    #> 2 c     w         2 CVR  
    #> 3 c     w         2 CVR  
    #> 4 e     s         4 CVR  
    #> 5 e     s         4 CVR  
    #> 6 e     s         4 CVR
    
    • Using the data.table package:
    library(data.table)
    df2 <- data.table(
      aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
      bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
      cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
      dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
    )
    
    # Anti-join: drop every row whose (aa, bb, cc) key appears in an "ACT" row
    df2[!df2[dd=="ACT",], on = c("aa","bb","cc")]
    #>    aa bb cc  dd
    #> 1:  c  w  2 CVR
    #> 2:  c  w  2 CVR
    #> 3:  c  w  2 CVR
    #> 4:  e  s  4 CVR
    #> 5:  e  s  4 CVR
    #> 6:  e  s  4 CVR
    

    Created on 2019-06-19 by the reprex package (v0.3.0)
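
    Since the question asks for a tidyverse approach, the same result can probably also be reached with a grouped filter instead of an anti-join: group by the duplicated columns and keep only the groups in which no dd value equals "ACT". This is a minimal sketch, assuming aa, bb and cc together identify a set of matching rows and that dd contains no missing values; the name `kept` is introduced here only for illustration.

```r
library(dplyr)

df1 <- tibble(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

# Keep a group only if none of its dd values is "ACT"
kept <- df1 %>%
  group_by(aa, bb, cc) %>%
  filter(!any(dd == "ACT")) %>%
  ungroup()

kept
```

    The grouped filter reads closer to the wording of the question, while the anti-join states the key columns explicitly; which one is preferable is largely a matter of taste.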

    【Comments】:

      【Solution 2】:

      You can put the vectors in a data.table and keep only the (aa, bb, cc) groups that have no "ACT" in column dd.

      library(data.table)
      
      df <- data.table(
        aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
        bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
        cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
        dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
      )
      
      df[, if(!"ACT" %in% dd) .SD, .(aa, bb, cc)]
      #    aa bb cc  dd
      # 1:  c  w  2 CVR
      # 2:  c  w  2 CVR
      # 3:  c  w  2 CVR
      # 4:  e  s  4 CVR
      # 5:  e  s  4 CVR
      # 6:  e  s  4 CVR
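
      The same group-wise idea can be sketched without any packages. The version below is a hedged base R illustration, not part of the original answer: it builds one key per (aa, bb, cc) combination with `interaction()`, collects the keys that occur in an "ACT" row, and drops every row carrying one of those keys. The names `key`, `bad_keys` and `kept` are introduced here for illustration only.

```r
df <- data.frame(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR"),
  stringsAsFactors = FALSE
)

# One factor level per observed (aa, bb, cc) combination
key <- interaction(df$aa, df$bb, df$cc, drop = TRUE)

# Keys of the groups that contain at least one "ACT" row
bad_keys <- unique(key[df$dd == "ACT"])

# Keep only the rows whose key is not among the "ACT" groups
kept <- df[!key %in% bad_keys, ]

kept
```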
      

      【Comments】:
