【Question title】: Remove rows from data.frame according to a rule in R [duplicate]
【Posted】: 2020-10-29 15:23:25
【Question】:

In a data.frame, I want to automatically remove the rows that have NA in Column_E whenever another row exists with the same values in all the other columns. For example:

Column_A   Column_B   Column_C    Column_D      Column_E
A121       NAME1      A321        2019-01-01    NA
A121       NAME1      A321        2019-01-01    2020-02-01
A123       NAME2      A322        2019-01-01    2020-01-01
A123       NAME2      A322        2019-01-01    NA
A124       NAME3      A323        2019-01-01    2019-01-01
A124       NAME4      A324        2019-01-01    NA

The output should be:

Column_A   Column_B   Column_C    Column_D      Column_E
A121       NAME1      A321        2019-01-01    2020-02-01
A123       NAME2      A322        2019-01-01    2020-01-01
A124       NAME3      A323        2019-01-01    2019-01-01
A124       NAME4      A324        2019-01-01    NA

Any ideas?

【Question discussion】:

    Tags: r dataframe row


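    【Solution 1】:

    You can keep the rows that have a non-NA value in Column_E, plus any row that is the only row in its group, where a group is a set of rows sharing the same Column_A through Column_D.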

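    library(dplyr)  # across() inside group_by() needs dplyr >= 1.0.0
    df %>%
      # group rows that agree on every column except Column_E
      group_by(across(Column_A:Column_D)) %>%
      # keep rows with a value in Column_E, or rows that are alone in their group
      filter(!is.na(Column_E) | n() == 1)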
    
    # Column_A Column_B Column_C Column_D   Column_E  
    #  <chr>    <chr>    <chr>    <chr>      <chr>     
    #1 A121     NAME1    A321     2019-01-01 2020-02-01
    #2 A123     NAME2    A322     2019-01-01 2020-01-01
    #3 A124     NAME3    A323     2019-01-01 2019-01-01
    #4 A124     NAME4    A324     2019-01-01 NA       
    

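    The same logic in data.table:

    library(data.table)
    
    setDT(df)  # convert df to a data.table by reference
    # within each group, keep the non-NA rows, or the row itself if the group has one row
    df[, .SD[!is.na(Column_E) | .N == 1], .(Column_A, Column_B, Column_C, Column_D)]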
    

    And in base R:

    subset(df, ave(!is.na(Column_E),Column_A, Column_B, Column_C, Column_D, 
              FUN = function(x) x | length(x) == 1))
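
The `ave()` one-liner above packs several steps into a single call. As a rough sketch of what it computes, the same rule can be spelled out step by step in base R. The names `key`, `group_size`, `has_value`, `keep` and `result` are illustrative and not part of the original answer, and the data frame is rebuilt here only so the block runs on its own.

```r
# Same sample data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  Column_A = c("A121", "A121", "A123", "A123", "A124", "A124"),
  Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", "NAME4"),
  Column_C = c("A321", "A321", "A322", "A322", "A323", "A324"),
  Column_D = "2019-01-01",
  Column_E = c(NA, "2020-02-01", "2020-01-01", NA, "2019-01-01", NA),
  stringsAsFactors = FALSE
)

# One key per row, built from every column except Column_E
# (assumes "|" never appears in the data itself)
key <- paste(df$Column_A, df$Column_B, df$Column_C, df$Column_D, sep = "|")

# Number of rows that share each row's key
group_size <- ave(seq_along(key), key, FUN = length)

# TRUE where Column_E holds a value
has_value <- !is.na(df$Column_E)

# Keep a row if it has a value, or if it is the only row in its group
keep <- has_value | group_size == 1

result <- df[keep, ]
print(result)
```

Keeping `keep` as a separate logical vector makes it easy to inspect which rows would be dropped before actually subsetting.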
    

    Data

    df <- structure(list(Column_A = c("A121", "A121", "A123", "A123", "A124", 
    "A124"), Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", 
    "NAME4"), Column_C = c("A321", "A321", "A322", "A322", "A323", 
    "A324"), Column_D = c("2019-01-01", "2019-01-01", "2019-01-01", 
    "2019-01-01", "2019-01-01", "2019-01-01"), Column_E = c(NA, "2020-02-01", 
    "2020-01-01", NA, "2019-01-01", NA)), class = "data.frame", 
    row.names = c(NA, -6L))
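
One case the sample data does not cover is a group with several rows whose Column_E values are all NA. Under the rule used in the answer, every row of such a group would be dropped, because no row has a value and the group has more than one row. If that can happen in your data and one row should survive, a possible variant is sketched below in base R. The data frame `df2` and the names `group_has_value` and `first_in_group` are hypothetical, and a single grouping column is used only to keep the example short.

```r
# Hypothetical data: group "A1" has two rows, both NA in Column_E
df2 <- data.frame(
  Column_A = c("A1", "A1", "A2", "A2"),
  Column_E = c(NA, NA, NA, "2020-01-01"),
  stringsAsFactors = FALSE
)

has_value <- !is.na(df2$Column_E)

# TRUE for every row whose group contains at least one non-NA Column_E
group_has_value <- as.logical(ave(has_value, df2$Column_A, FUN = any))

# TRUE for the first row of each group
first_in_group <- !duplicated(df2$Column_A)

# Keep rows with a value; for groups with no value at all, keep the first row only
keep <- has_value | (!group_has_value & first_in_group)

result <- df2[keep, ]
print(result)
```

With more grouping columns, `duplicated()` can be given the data frame of key columns instead of a single vector, and `ave()` accepts several grouping vectors.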
    

    【Discussion】:
