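[Posted]: 2022-11-25 05:53:57
[Question]:
I need to count the number of rows in each group of a data.table. The groups have to be built from the week column ("week") and the "Exist" column. I have the following data: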
library(data.table)

# Week identifiers in YYYYWW form: 15 weeks of 2020 and 10 weeks of 2021
week_2020 <- seq(202001, 202015, 1)
week_2021 <- seq(202101, 202110, 1)
# "TRUE" marks a week that exists; NA marks a missing week
Exist <- c("TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE","TRUE",NA,NA,"TRUE","TRUE","TRUE","TRUE",
           NA,NA,NA,
           "TRUE",NA, NA)
Year <- c(rep(2020,15),rep(2021,10) )
df <- data.table(Store = "store_1", Client = "client_1",
                 Year = Year,
                 week = c(week_2020, week_2021),
                 Exist = Exist)
| Store | Client | Year | Week | Exist |
|---|---|---|---|---|
| Store 1 | Client 1 | 2020 | 202001 | TRUE |
| Store 1 | Client 1 | 2020 | 202002 | TRUE |
| Store 1 | Client 1 | 2020 | 202003 | TRUE |
| Store 1 | Client 1 | 2020 | 202004 | TRUE |
| Store 1 | Client 1 | 2020 | 202005 | TRUE |
| Store 1 | Client 1 | 2020 | 202006 | TRUE |
| Store 1 | Client 1 | 2020 | 202007 | TRUE |
| Store 1 | Client 1 | 2020 | 202008 | TRUE |
| Store 1 | Client 1 | 2020 | 202009 | NA |
| Store 1 | Client 1 | 2020 | 202010 | NA |
| Store 1 | Client 1 | 2020 | 202011 | NA |
| Store 1 | Client 1 | 2020 | 202012 | TRUE |
| Store 1 | Client 1 | 2020 | 202013 | TRUE |
| Store 1 | Client 1 | 2020 | 202014 | NA |
| Store 1 | Client 1 | 2020 | 202015 | NA |
| Store 1 | Client 1 | 2021 | 202101 | TRUE |
| Store 1 | Client 1 | 2021 | 202102 | TRUE |
| Store 1 | Client 1 | 2021 | 202103 | TRUE |
| Store 1 | Client 1 | 2021 | 202104 | TRUE |
| Store 1 | Client 1 | 2021 | 202105 | NA |
| Store 1 | Client 1 | 2021 | 202106 | NA |
| Store 1 | Client 1 | 2021 | 202107 | NA |
| Store 1 | Client 1 | 2021 | 202108 | TRUE |
| Store 1 | Client 1 | 2021 | 202109 | NA |
| Store 1 | Client 1 | 2021 | 202110 | NA |
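As the table shows, some values in the "Exist" column are NA, meaning the week does not exist, but those groups must be counted as well.
I created one variable (n_weekCount) that counts the weeks until a missing week appears; at that point the counter should reset and count the missing weeks instead. A second variable holds the maximum count of each run. I have not been able to get the result I need, and I hope you can help me solve this. Thanks in advance.
This is what I have so far: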
# Running count within each Store/Client/Year/Exist group
df[, n_weekCount := seq_len(.N), by = .(Store, Client, Year, Exist)
][, MaxweekCount := max(n_weekCount), by = .(Store, Client, Year, Exist)
][order(week)]
| Store | Client | Year | Week | Exist | n_weekCount | maxWeek_Count |
|---|---|---|---|---|---|---|
| Store 1 | Client 1 | 2020 | 202001 | TRUE | 1 | 10 |
| Store 1 | Client 1 | 2020 | 202002 | TRUE | 2 | 10 |
| Store 1 | Client 1 | 2020 | 202003 | TRUE | 3 | 10 |
| Store 1 | Client 1 | 2020 | 202004 | TRUE | 4 | 10 |
| Store 1 | Client 1 | 2020 | 202005 | TRUE | 5 | 10 |
| Store 1 | Client 1 | 2020 | 202006 | TRUE | 6 | 10 |
| Store 1 | Client 1 | 2020 | 202007 | TRUE | 7 | 10 |
| Store 1 | Client 1 | 2020 | 202008 | TRUE | 8 | 10 |
| Store 1 | Client 1 | 2020 | 202009 | NA | 1 | 5 |
| Store 1 | Client 1 | 2020 | 202010 | NA | 2 | 5 |
| Store 1 | Client 1 | 2020 | 202011 | NA | 3 | 5 |
| Store 1 | Client 1 | 2020 | 202012 | TRUE | 9 | 10 |
| Store 1 | Client 1 | 2020 | 202013 | TRUE | 10 | 10 |
| Store 1 | Client 1 | 2020 | 202014 | NA | 4 | 5 |
| Store 1 | Client 1 | 2020 | 202015 | NA | 5 | 5 |
| Store 1 | Client 1 | 2021 | 202101 | TRUE | 1 | 10 |
| Store 1 | Client 1 | 2021 | 202102 | TRUE | 2 | 10 |
| Store 1 | Client 1 | 2021 | 202103 | TRUE | 3 | 10 |
| Store 1 | Client 1 | 2021 | 202104 | TRUE | 4 | 10 |
| Store 1 | Client 1 | 2021 | 202105 | NA | 1 | 5 |
| Store 1 | Client 1 | 2021 | 202106 | NA | 2 | 5 |
| Store 1 | Client 1 | 2021 | 202107 | NA | 3 | 5 |
| Store 1 | Client 1 | 2021 | 202108 | TRUE | 1 | 10 |
| Store 1 | Client 1 | 2021 | 202109 | NA | 4 | 5 |
| Store 1 | Client 1 | 2021 | 202110 | NA | 5 | 5 |
The expected result is:
| Store | Client | Year | Week | Exist | n_weekCount | maxWeek_Count |
|---|---|---|---|---|---|---|
| Store 1 | Client 1 | 2020 | 202001 | TRUE | 1 | 8 |
| Store 1 | Client 1 | 2020 | 202002 | TRUE | 2 | 8 |
| Store 1 | Client 1 | 2020 | 202003 | TRUE | 3 | 8 |
| Store 1 | Client 1 | 2020 | 202004 | TRUE | 4 | 8 |
| Store 1 | Client 1 | 2020 | 202005 | TRUE | 5 | 8 |
| Store 1 | Client 1 | 2020 | 202006 | TRUE | 6 | 8 |
| Store 1 | Client 1 | 2020 | 202007 | TRUE | 7 | 8 |
| Store 1 | Client 1 | 2020 | 202008 | TRUE | 8 | 8 |
| Store 1 | Client 1 | 2020 | 202009 | NA | 1 | 3 |
| Store 1 | Client 1 | 2020 | 202010 | NA | 2 | 3 |
| Store 1 | Client 1 | 2020 | 202011 | NA | 3 | 3 |
| Store 1 | Client 1 | 2020 | 202012 | TRUE | 1 | 2 |
| Store 1 | Client 1 | 2020 | 202013 | TRUE | 2 | 2 |
| Store 1 | Client 1 | 2020 | 202014 | NA | 1 | 2 |
| Store 1 | Client 1 | 2020 | 202015 | NA | 2 | 2 |
| Store 1 | Client 1 | 2021 | 202101 | TRUE | 1 | 4 |
| Store 1 | Client 1 | 2021 | 202102 | TRUE | 2 | 4 |
| Store 1 | Client 1 | 2021 | 202103 | TRUE | 3 | 4 |
| Store 1 | Client 1 | 2021 | 202104 | TRUE | 4 | 4 |
| Store 1 | Client 1 | 2021 | 202105 | NA | 1 | 3 |
| Store 1 | Client 1 | 2021 | 202106 | NA | 2 | 3 |
| Store 1 | Client 1 | 2021 | 202107 | NA | 3 | 3 |
| Store 1 | Client 1 | 2021 | 202108 | TRUE | 1 | 1 |
| Store 1 | Client 1 | 2021 | 202109 | NA | 1 | 2 |
| Store 1 | Client 1 | 2021 | 202110 | NA | 2 | 2 |
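The counter in the expected result restarts every time "Exist" switches between TRUE and NA, so the groups are consecutive runs rather than all rows sharing the same "Exist" value. A minimal sketch of one possible way to get there, assuming `rleid()` from data.table is acceptable and that rows should be ordered by week inside each Store/Client/Year; `run_id` is a helper column introduced here for illustration only:

```r
library(data.table)

# Same data as in the question, written compactly
Exist <- c(rep("TRUE", 8), rep(NA, 3), rep("TRUE", 2), rep(NA, 2),
           rep("TRUE", 4), rep(NA, 3), "TRUE", rep(NA, 2))
df <- data.table(Store = "store_1", Client = "client_1",
                 Year = c(rep(2020, 15), rep(2021, 10)),
                 week = c(202001:202015, 202101:202110),
                 Exist = Exist)

# Runs only make sense on week-ordered rows
setorder(df, Store, Client, Year, week)

# run_id (hypothetical helper) increases each time Exist changes
# within a Store/Client/Year; consecutive NAs fall into the same run
df[, run_id := rleid(Exist), by = .(Store, Client, Year)]

# Running counter and run length per consecutive run
df[, `:=`(n_weekCount = seq_len(.N), maxWeek_Count = .N),
   by = .(Store, Client, Year, run_id)]

# Drop the helper column
df[, run_id := NULL]

print(df)
```

Grouping by Store, Client, Year and "Exist" alone pools every TRUE row of a year into one group, which is why the original attempt keeps counting across the gaps; adding the run identifier to the grouping is what resets the counter.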
[Comments]:
Tags: r group-by count data.table