[Question title]: Count rows by grouping and reset counter on new group with data.table and R
[Posted]: 2022-11-25 05:53:57
[Question description]:

I need to count the rows in each group of data in a data.table. The groups are determined by the columns "week" and "Exist". I have the following data:

library(data.table)

# Week identifiers in YYYYWW form: 15 weeks of 2020, 10 weeks of 2021
week_2020 <- seq(202001, 202015, 1)
week_2021 <- seq(202101, 202110, 1)

# "TRUE" marks a week that exists, NA a missing week
Exist <- c("TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE","TRUE",NA,NA,"TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE",NA, NA)

Year <- c(rep(2020,15),rep(2021,10) )

df<-data.table(Store = "store_1", Client = "client_1", 
               Year = Year,
               week = c(week_2020, week_2021),
               Exist = Exist)
Store Client Year Week Exist
Store 1 Client 1 2020 202001 TRUE
Store 1 Client 1 2020 202002 TRUE
Store 1 Client 1 2020 202003 TRUE
Store 1 Client 1 2020 202004 TRUE
Store 1 Client 1 2020 202005 TRUE
Store 1 Client 1 2020 202006 TRUE
Store 1 Client 1 2020 202007 TRUE
Store 1 Client 1 2020 202008 TRUE
Store 1 Client 1 2020 202009 NA
Store 1 Client 1 2020 202010 NA
Store 1 Client 1 2020 202011 NA
Store 1 Client 1 2020 202012 TRUE
Store 1 Client 1 2020 202013 TRUE
Store 1 Client 1 2020 202014 NA
Store 1 Client 1 2020 202015 NA
Store 1 Client 1 2021 202101 TRUE
Store 1 Client 1 2021 202102 TRUE
Store 1 Client 1 2021 202103 TRUE
Store 1 Client 1 2021 202104 TRUE
Store 1 Client 1 2021 202105 NA
Store 1 Client 1 2021 202106 NA
Store 1 Client 1 2021 202107 NA
Store 1 Client 1 2021 202108 TRUE
Store 1 Client 1 2021 202109 NA
Store 1 Client 1 2021 202110 NA

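As the table shows, some values in the "Exist" column are NA, meaning the week does not exist, but those groups must be counted as well.

I created a variable to count the weeks until a missing week is found, at which point the counter should reset and start counting the missing weeks, and a second variable, "n_week_Count", that holds the maximum count of each run. I cannot get the result I need. I hope you can help me with this. Thanks in advance.

This is what I have...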

df[, ':=' (n_weekCount = 1:.SD[,(.N)] ), keyby = c("Store", "Client", "Year", "Exist")
   ][, ':=' (MaxweekCount = .SD[, max(n_weekCount)]), keyby = c("Store", "Client", "Year", "Exist")
][order(week)]
Store Client Year Week Exist n_weekCount maxWeek_Count
Store 1 Client 1 2020 202001 TRUE 1 10
Store 1 Client 1 2020 202002 TRUE 2 10
Store 1 Client 1 2020 202003 TRUE 3 10
Store 1 Client 1 2020 202004 TRUE 4 10
Store 1 Client 1 2020 202005 TRUE 5 10
Store 1 Client 1 2020 202006 TRUE 6 10
Store 1 Client 1 2020 202007 TRUE 7 10
Store 1 Client 1 2020 202008 TRUE 8 10
Store 1 Client 1 2020 202009 NA 1 5
Store 1 Client 1 2020 202010 NA 2 5
Store 1 Client 1 2020 202011 NA 3 5
Store 1 Client 1 2020 202012 TRUE 9 10
Store 1 Client 1 2020 202013 TRUE 10 10
Store 1 Client 1 2020 202014 NA 4 5
Store 1 Client 1 2020 202015 NA 5 5
Store 1 Client 1 2021 202101 TRUE 1 10
Store 1 Client 1 2021 202102 TRUE 2 10
Store 1 Client 1 2021 202103 TRUE 3 10
Store 1 Client 1 2021 202104 TRUE 4 10
Store 1 Client 1 2021 202105 NA 1 5
Store 1 Client 1 2021 202106 NA 2 5
Store 1 Client 1 2021 202107 NA 3 5
Store 1 Client 1 2021 202108 TRUE 1 10
Store 1 Client 1 2021 202109 NA 4 5
Store 1 Client 1 2021 202110 NA 5 5

The desired result is:

Store Client Year Week Exist n_weekCount maxWeek_Count
Store 1 Client 1 2020 202001 TRUE 1 8
Store 1 Client 1 2020 202002 TRUE 2 8
Store 1 Client 1 2020 202003 TRUE 3 8
Store 1 Client 1 2020 202004 TRUE 4 8
Store 1 Client 1 2020 202005 TRUE 5 8
Store 1 Client 1 2020 202006 TRUE 6 8
Store 1 Client 1 2020 202007 TRUE 7 8
Store 1 Client 1 2020 202008 TRUE 8 8
Store 1 Client 1 2020 202009 NA 1 3
Store 1 Client 1 2020 202010 NA 2 3
Store 1 Client 1 2020 202011 NA 3 3
Store 1 Client 1 2020 202012 TRUE 1 2
Store 1 Client 1 2020 202013 TRUE 2 2
Store 1 Client 1 2020 202014 NA 1 2
Store 1 Client 1 2020 202015 NA 2 2
Store 1 Client 1 2021 202101 TRUE 1 4
Store 1 Client 1 2021 202102 TRUE 2 4
Store 1 Client 1 2021 202103 TRUE 3 4
Store 1 Client 1 2021 202104 TRUE 4 4
Store 1 Client 1 2021 202105 NA 1 3
Store 1 Client 1 2021 202106 NA 2 3
Store 1 Client 1 2021 202107 NA 3 3
Store 1 Client 1 2021 202108 TRUE 1 1
Store 1 Client 1 2021 202109 NA 1 2
Store 1 Client 1 2021 202110 NA 2 2

[Question comments]:

    Tags: r group-by count data.table


    [Solution 1]:

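    We can group by rleid(Exist), which gives each consecutive run of identical values its own id, and create the columns by reference (:=) from seq_len(.N) for the running count and the group size (.N) for the maximum.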
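As a rough illustration of the grouping idea, the sketch below runs `rleid()` on a small made-up vector (the name `exist` and its values are assumptions for the demo, not the asker's data). It is meant to show why the counter restarts on every new run, whereas grouping by the value itself would pool all "TRUE" rows together.

```r
library(data.table)

# Hypothetical toy vector: two "TRUE" runs separated by a run of NA
exist <- c("TRUE", "TRUE", NA, NA, "TRUE")

dt <- data.table(exist = exist)

# rleid() increments the id each time the value changes; consecutive NAs share one id
dt[, run_id := rleid(exist)]          # run_id is 1 1 2 2 3

# Counting within each run restarts the counter at every change of value
dt[, n := seq_len(.N), by = run_id]   # n is 1 2 1 2 1

print(dt)
```

Because the run id depends on row order, the table presumably needs to be sorted by week before the ids are computed.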

    library(data.table)
    df[, c("n_WeekCount", "maxWeek_Count") := .(seq_len(.N), .N),
          .(grp = rleid(Exist), Store, Client, Year)]
    

    -output

    > df
          Store   Client  Year   week  Exist n_WeekCount maxWeek_Count
         <char>   <char> <num>  <num> <char>       <int>         <int>
     1: store_1 client_1  2020 202001   TRUE           1             8
     2: store_1 client_1  2020 202002   TRUE           2             8
     3: store_1 client_1  2020 202003   TRUE           3             8
     4: store_1 client_1  2020 202004   TRUE           4             8
     5: store_1 client_1  2020 202005   TRUE           5             8
     6: store_1 client_1  2020 202006   TRUE           6             8
     7: store_1 client_1  2020 202007   TRUE           7             8
     8: store_1 client_1  2020 202008   TRUE           8             8
     9: store_1 client_1  2020 202009   <NA>           1             3
    10: store_1 client_1  2020 202010   <NA>           2             3
    11: store_1 client_1  2020 202011   <NA>           3             3
    12: store_1 client_1  2020 202012   TRUE           1             2
    13: store_1 client_1  2020 202013   TRUE           2             2
    14: store_1 client_1  2020 202014   <NA>           1             2
    15: store_1 client_1  2020 202015   <NA>           2             2
    16: store_1 client_1  2021 202101   TRUE           1             4
    17: store_1 client_1  2021 202102   TRUE           2             4
    18: store_1 client_1  2021 202103   TRUE           3             4
    19: store_1 client_1  2021 202104   TRUE           4             4
    20: store_1 client_1  2021 202105   <NA>           1             3
    21: store_1 client_1  2021 202106   <NA>           2             3
    22: store_1 client_1  2021 202107   <NA>           3             3
    23: store_1 client_1  2021 202108   TRUE           1             1
    24: store_1 client_1  2021 202109   <NA>           1             2
    25: store_1 client_1  2021 202110   <NA>           2             2
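For comparison, the same two counters can be sketched in base R with `rle()` and `sequence()`. This is a swapped-in alternative, not the answer's method, and it makes two assumptions that are flagged in the comments: the vector covers a single Store/Client/Year block, and the placeholder string `"NA"` never occurs as a real value. The vector `exist` is a made-up example.

```r
# Hypothetical example vector for one Store/Client/Year block
exist <- c("TRUE", "TRUE", "TRUE", NA, NA, "TRUE")

# rle() treats every NA as its own run, so map NA to a placeholder first
# (assumption: the literal string "NA" is not a genuine value in the data)
key <- ifelse(is.na(exist), "NA", exist)

runs <- rle(key)

# Running count inside each run: 1..length for every run, concatenated
n_week_count <- sequence(runs$lengths)

# Run length repeated for every row of that run
max_week_count <- rep(runs$lengths, times = runs$lengths)

print(data.frame(exist, n_week_count, max_week_count))
```

On the full table this would have to be applied separately to each Store, Client and Year combination, which is what the extra grouping columns in the data.table call take care of.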
    

    [Comments]:
