【Question title】: Use unique in R with more than one logical condition
【Posted】: 2020-08-29 17:06:20
【Question】:

I have the following data frame as a data.table:

df <- data.table (id=c(1,1,2,2,3,3,4,4),
                  date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
                  status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
df
   id       date  status
1:  1 2013-11-22  Former
2:  1 2017-01-24 Current
3:  2 2017-06-24  Former
4:  2 2020-02-10   Never
5:  3 2011-01-03 Current
6:  3 2013-11-24    <NA>
7:  4 2015-01-24 Current
8:  4 2017-08-24  Former

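I want to create one unique row per id using the following logic. The row with the latest date should be kept. If the status at the latest date is <NA> or Never and there is also a status at an earlier date, the row with the earlier date should be kept instead. I solved this with the following code: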

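library(anytime)  # provides anydate()
# Latest row per id among the rows whose status is Former or Current
unique1 <- df[df$status %in% c("Former","Current"),]
unique1 <- unique1[,.SD[which.max(anydate(date))],by=.(id)]
# One row per id from the full table, then overwrite with the rows found above
unique_final <- unique(df[order(id,ordered(status,c("Former","Current","Never",NA)))],by='id')
unique_final[match(unique1$id,unique_final$id),]<-unique1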

and got these results:

   id       date  status
1:  1 2017-01-24 Current
2:  2 2017-06-24  Former
3:  3 2011-01-03 Current
4:  4 2017-08-24  Former

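Is there a way to combine these two logical subsetting steps? I would like to avoid creating a new data frame and then matching the two. I am working with data.table, so a solution that scales to larger datasets would be great. Thanks!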

【Comments】:

    Tags: r data.table unique


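    【Solution 1】:

    You could try the following. It assumes the rows are already sorted by date within each id, as they are in the example: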

    library(data.table)
    
    df[, .SD[
      if (all(status %in% c(NA, 'Never'))) .N
      else max(which(!status %in% c(NA, 'Never')))
      ], by = id]
    

    Output:

       id       date  status
    1:  1 2017-01-24 Current
    2:  2 2017-06-24  Former
    3:  3 2011-01-03 Current
    4:  4 2017-08-24  Former
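
The snippet above selects rows by position, so it depends on the input order. As a rough sketch of an alternative that sorts explicitly, the rows could be ordered so that informative statuses come first and the latest date comes first within them, after which `unique(..., by = "id")` keeps the first row per id. The helper column `informative` and the working copy `dt` are names introduced here for illustration only; they are not part of the original answer.

```r
library(data.table)

df <- data.table(
  id = c(1, 1, 2, 2, 3, 3, 4, 4),
  date = c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
           "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24"),
  status = c("Former", "Current", "Former", "Never", "Current", NA,
             "Current", "Former")
)

# Work on a copy so the original table is left untouched
dt <- copy(df)
dt[, date := as.IDate(date)]

# Hypothetical helper column: 1 if the status is informative
# (neither NA nor "Never"), 0 otherwise
dt[, informative := as.integer(!status %in% c(NA, "Never"))]

# Within each id: informative rows first, then latest date first
setorder(dt, id, -informative, -date)

# Keep the first row per id and drop the helper column
result <- unique(dt, by = "id")[, informative := NULL][]
print(result)
```

Because the sort is explicit, this version should not depend on how the input happens to be ordered, and an id whose statuses are all NA or Never still keeps its latest row.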
    

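    【Comments】:

      【Solution 2】:

      Here is a dplyr-based solution. It recodes status so that Current and Former share the same level, then sorts the rows and takes the first row for each id.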

      library(dplyr)
      library(data.table)
      
      df <- data.table(id=c(1,1,2,2,3,3,4,4),
                       date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
                       status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
      
      
      
      df %>% 
        mutate(
          status = factor(status, levels = c("Never", "Former", "Current")),
          status2 = forcats::fct_recode(status, "Current" = "Former")
          ) %>% 
        group_by(id) %>% 
        arrange(desc(status2), desc(date)) %>% 
        select(-status2) %>% 
        slice(1)
      #> # A tibble: 4 x 3
      #> # Groups:   id [4]
      #>      id date       status 
      #>   <dbl> <chr>      <fct>  
      #> 1     1 2017-01-24 Current
      #> 2     2 2017-06-24 Former 
      #> 3     3 2011-01-03 Current
      #> 4     4 2017-08-24 Former
      

      Created on 2020-08-29 by the reprex package (v0.3.0)

      【Comments】:

        【Solution 3】:

        Here is a base R option using subset + ave:

        subset(
          df[!df$status %in% c(NA, "Never"), ],
          as.logical(ave(date, id, FUN = function(x) x == max(x)))
        )
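
One thing to check with the approach above: because rows with NA or Never are filtered out before the maximum date is taken, an id whose statuses are all NA or Never disappears from the result. That case does not occur in the sample data. The sketch below is one possible base R variant that falls back to the latest row for such ids. The function name `pick_row` and the extra row for id 5 are hypothetical, added only to illustrate the fallback.

```r
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  date = as.Date(c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
                   "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24",
                   "2019-05-01")),  # id 5 is a made-up row for illustration
  status = c("Former", "Current", "Former", "Never", "Current", NA,
             "Current", "Former", "Never"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: choose one row from the rows of a single id
pick_row <- function(d) {
  informative <- !d$status %in% c(NA, "Never")
  # Restrict to informative rows only when at least one exists
  if (any(informative)) d <- d[informative, , drop = FALSE]
  # Latest date among the remaining rows
  d[which.max(as.numeric(d$date)), , drop = FALSE]
}

result <- do.call(rbind, lapply(split(df, df$id), pick_row))
rownames(result) <- NULL
print(result)
```

This splits the data by id, which is simple but may be slow on large inputs; for bigger tables a data.table grouping is likely the better fit.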
        

        【Comments】:
