【Question title】: Identifying the first rows in a data frame grouped by an ID and date
【Posted】: 2021-04-21 14:09:30
【Question】:

I have a dataset similar to the following:

dt = structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 
4, 5, 5, 6, 6, 6, 6), date = structure(c(1332288000, 1332288000, 
1360540800, 1384819200, 1384819200, 1325548800, 1326499200, 1365292800, 
1365292800, 1365292800, 1400284800, 1442966400, 1450051200, 1404864000, 
1330387200, 1330387200, 1366329600, 1366329600, 1412467200, 1412467200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), type = c("A", 
"C", "B", "A", "B", "C", "C", "A", "B", "C", "C", "A", "A", "C", 
"C", "C", "C", "B", "B", "A")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

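Each row records a unique individual (ID), the date they appeared in the system (date), and the type of event (type). The rows are sorted first by ID and then by date. As you can see, an individual can appear on multiple dates, and there can be several event types within each date.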

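I am trying to create an additional column (first) that flags the first date on which an individual appears, marking "1" on every row that falls on their first appearance date, not just on the first row in which they appear. This is what I am after: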

    ID       date type first
 1:  1 2012-03-21    A     1
 2:  1 2012-03-21    C     1
 3:  1 2013-02-11    B     0
 4:  1 2013-11-19    A     0
 5:  1 2013-11-19    B     0
 6:  2 2012-01-03    C     1
 7:  2 2012-01-14    C     0
 8:  2 2013-04-07    A     0
 9:  2 2013-04-07    B     0
10:  2 2013-04-07    C     0
11:  2 2014-05-17    C     0
12:  3 2015-09-23    A     1
13:  3 2015-12-14    A     0
14:  4 2014-07-09    C     1
15:  5 2012-02-28    C     1
16:  5 2012-02-28    C     1
17:  6 2013-04-19    C     1
18:  6 2013-04-19    B     1
19:  6 2014-10-05    B     0
20:  6 2014-10-05    A     0

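I have seen solutions for identifying first occurrences/rows, for example here and here. But these are not what I am after, because I am grouping by both ID and date. I tried using the duplicated function from data.table while grouping by ID and date, but this identifies the unique combinations of ID and date rather than each individual's first date: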

df[!duplicated(df, by=c("ID", "date")), first := 1]
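
To make the problem concrete, here is a minimal sketch of what that attempt does on a cut-down table. The table `toy` and its values are made up for illustration (they are not the data above). Every first row of an ID/date pair is flagged, including later dates, and the remaining rows are left as NA:

```r
library(data.table)

# Hypothetical four-row table: one ID seen on two dates, two events per date.
toy <- data.table(
  ID   = c(1, 1, 1, 1),
  date = as.Date(c("2012-03-21", "2012-03-21", "2013-02-11", "2013-02-11")),
  type = c("A", "C", "B", "A")
)

# Same pattern as the attempt above: flag rows that are not duplicates
# of an earlier (ID, date) pair.
toy[!duplicated(toy, by = c("ID", "date")), first := 1]

# Rows 1 and 3 get 1 (first row of each date); rows 2 and 4 stay NA.
toy$first
```

The desired result for this table would instead be 1, 1, 0, 0.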

Any help would be much appreciated, especially solutions using data.table or base R.

Thanks in advance

    Tags: r data.table


    【Solution 1】:

    For each ID, assign 1 to first where the date is the same as that ID's first date. With dplyr this can be written as:

    library(dplyr)
    
    dt %>%
      group_by(ID) %>%
      mutate(first = as.integer(as.Date(date) == first(as.Date(date)))) %>%
      ungroup
    

    data.table

    library(data.table)
    setDT(dt)[, first := as.integer(as.Date(date) == first(as.Date(date))), ID]
    dt
    
    #    ID       date type first
    # 1:  1 2012-03-21    A     1
    # 2:  1 2012-03-21    C     1
    # 3:  1 2013-02-11    B     0
    # 4:  1 2013-11-19    A     0
    # 5:  1 2013-11-19    B     0
    # 6:  2 2012-01-03    C     1
    # 7:  2 2012-01-14    C     0
    # 8:  2 2013-04-07    A     0
    # 9:  2 2013-04-07    B     0
    #10:  2 2013-04-07    C     0
    #11:  2 2014-05-17    C     0
    #12:  3 2015-09-23    A     1
    #13:  3 2015-12-14    A     0
    #14:  4 2014-07-09    C     1
    #15:  5 2012-02-28    C     1
    #16:  5 2012-02-28    C     1
    #17:  6 2013-04-19    C     1
    #18:  6 2013-04-19    B     1
    #19:  6 2014-10-05    B     0
    #20:  6 2014-10-05    A     0
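
Since the question also asks about base R, the same idea can be sketched without any packages using `ave()`, which returns a per-group summary aligned with the original rows. The data frame `toy` below is a made-up stand-in with the same columns as the question's data, so treat this as an illustration rather than a tested drop-in:

```r
# Hypothetical stand-in for the question's data (same columns, fewer rows).
toy <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  date = as.POSIXct(c("2012-03-21", "2012-03-21", "2013-02-11",
                      "2012-01-03", "2012-01-14"), tz = "UTC"),
  type = c("A", "C", "B", "C", "C")
)

# Work on the underlying seconds so ave() deals with a plain numeric vector.
secs <- as.numeric(toy$date)

# For every row, the earliest timestamp observed for that row's ID.
first_secs <- ave(secs, toy$ID, FUN = min)

# 1 where the row falls on the ID's earliest date, 0 otherwise.
toy$first <- as.integer(secs == first_secs)
toy$first
```

Using `min` rather than the first element means the result does not depend on the rows already being sorted by date.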
    

    【Comments】:

    • Thanks. I used your dplyr solution and converted it for use with data.table: df[, first := as.integer(as.Date(date) == first(as.Date(date))), by = "ID"]
    【Solution 2】:

    Here is a data.table approach:

    library(data.table)
    setDT(dt)
    dt[,first := fifelse(date == min(date), 1, 0), by = "ID"]
    #    ID       date type first
    # 1:  1 2012-03-21    A     1
    # 2:  1 2012-03-21    C     1
    # 3:  1 2013-02-11    B     0
    # 4:  1 2013-11-19    A     0
    # 5:  1 2013-11-19    B     0
    # 6:  2 2012-01-03    C     1
    # 7:  2 2012-01-14    C     0
    # 8:  2 2013-04-07    A     0
    # 9:  2 2013-04-07    B     0
    #10:  2 2013-04-07    C     0
    #11:  2 2014-05-17    C     0
    #12:  3 2015-09-23    A     1
    #13:  3 2015-12-14    A     0
    #14:  4 2014-07-09    C     1
    #15:  5 2012-02-28    C     1
    #16:  5 2012-02-28    C     1
    #17:  6 2013-04-19    C     1
    #18:  6 2013-04-19    B     1
    #19:  6 2014-10-05    B     0
    #20:  6 2014-10-05    A     0
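
One caveat worth checking: `date == min(date)` compares full POSIXct timestamps. The sample data prints without a time component, which suggests every timestamp sits at midnight, so the comparison behaves like a date comparison there. If the real data carried times of day, converting with `as.Date()` first (as Solution 1 does) would probably be needed. A small sketch with made-up timestamps, where `toy`, `first_ts`, and `first_day` are illustrative names:

```r
library(data.table)

# Hypothetical data: two events on the same day at different times,
# followed by one event on a later day.
toy <- data.table(
  ID   = c(1, 1, 1),
  date = as.POSIXct(c("2012-03-21 08:00:00", "2012-03-21 17:30:00",
                      "2013-02-11 09:00:00"), tz = "UTC")
)

# Comparing timestamps flags only the single earliest instant.
toy[, first_ts := fifelse(date == min(date), 1, 0), by = ID]

# Comparing calendar dates flags every row on the earliest day.
toy[, first_day := fifelse(as.Date(date) == min(as.Date(date)), 1, 0), by = ID]

toy$first_ts   # 1 0 0
toy$first_day  # 1 1 0
```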
    

      【Solution 3】:

      Here is another data.table approach. The + sign converts TRUE/FALSE to 1/0.

      library(data.table)
      
      setDT(dt)[, first := +(date == min(date)), by=ID]
      
      #        ID       date   type first
      #  1:     1 2012-03-21      A     1
      #  2:     1 2012-03-21      C     1
      #  3:     1 2013-02-11      B     0
      #  4:     1 2013-11-19      A     0
      #  5:     1 2013-11-19      B     0
      #  6:     2 2012-01-03      C     1
      #  7:     2 2012-01-14      C     0
      #  8:     2 2013-04-07      A     0
      #  9:     2 2013-04-07      B     0
      # 10:     2 2013-04-07      C     0
      # 11:     2 2014-05-17      C     0
      # 12:     3 2015-09-23      A     1
      # 13:     3 2015-12-14      A     0
      # 14:     4 2014-07-09      C     1
      # 15:     5 2012-02-28      C     1
      # 16:     5 2012-02-28      C     1
      # 17:     6 2013-04-19      C     1
      # 18:     6 2013-04-19      B     1
      # 19:     6 2014-10-05      B     0
      # 20:     6 2014-10-05      A     0
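
For readers unfamiliar with the `+` trick, the following standalone sketch shows it on a plain logical vector. The variable names are illustrative only. Unary plus coerces a logical vector to integer, so it gives the same result as `as.integer()`:

```r
flags <- c(TRUE, FALSE, TRUE)

# Unary plus performs arithmetic coercion: TRUE becomes 1L, FALSE becomes 0L.
plus_version <- +flags

# The more explicit spelling of the same conversion.
explicit_version <- as.integer(flags)

identical(plus_version, explicit_version)  # TRUE
class(plus_version)                        # "integer"
```

The `+(...)` form is compact, while `as.integer(...)` states the intent more directly; which one to use is a matter of taste.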
      
