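Count based on multiple conditions using sqldf

【Question title】: Count based on multiple conditions using sqldf
【Posted】: 2020-02-07 01:40:00
【Question description】:

Hi everyone, I am writing a SQL query in R using sqldf and seem to have hit a wall. I have a table with an ID column, two date columns, and a group-by column.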

AlertDate  AppointmentDate  ID  Branch
01/01/20   04/01/20         1   W1
01/01/20   09/01/20         1   W1
08/01/20   09/01/20         1   W2
01/01/20   23/01/20         1   W1

The query I wrote is

sqldf('select Branch,count(ID) from df where AlertDate <= AppointmentDate 
and AppointmentDate <AlertDate+7 group by Branch')

The result I get from this query is

Branch Count
W1      1
W2      1

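Given the query, this is correct. What I want to achieve is this: when my second condition (AppointmentDate < AlertDate+7) is false, the row should not be dropped from the count but counted in a later group determined by the dates. For example, if AlertDate is 01/01/20 and AppointmentDate is 23/01/20, the row should be counted in W4, where the group number is ceil((AppointmentDate - AlertDate)/7). So the result I want in the end is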

Branch  Count
W1      1
W2      2
W4      1

Row 2 should be counted in W2 and row 4 in W4 instead of being dropped. I am trying to do this in SQL via sqldf in R, but any solution in R or SQL would work for me.
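To make the intended rule concrete, here is a minimal base R sketch that applies it to the four sample rows. The variable names (`alert`, `appoint`, `gap_days`, `week_no`, `bucket`) are illustrative and not part of the original data, and the sketch assumes that a gap shorter than 7 days keeps its existing Branch, as in the expected output above.

```r
# Sample rows from the question, with dates written as ISO strings
alert   <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01"))
appoint <- as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23"))
branch  <- c("W1", "W1", "W2", "W1")

# Days between alert and appointment
gap_days <- as.numeric(appoint - alert)

# Week number implied by the gap: ceiling(days / 7)
week_no <- ceiling(gap_days / 7)

# Gaps of 7 or more days move to bucket "W<week_no>"; shorter gaps keep their Branch
bucket <- ifelse(gap_days >= 7, paste0("W", week_no), branch)

# Count rows per bucket
print(table(bucket))
```

On the sample data the gaps are 3, 8, 1 and 22 days, so row 2 moves to W2 and row 4 to W4.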

Output of dput(test)

structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class = 
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems = 
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null", 
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(  cols = list(AlertDate = 
structure(list(format = "%d/%m/%y"), class = c("collector_date", 
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date",  "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = 
c("collector_character",  "collector"))), default = structure(list(), class = c("collector_guess",  "collector")), skip = 1), class = "col_spec"))

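【Question comments】:

Please provide reproducible data. With your example and your query as posted, we get 2, 1 rather than 1, 1. Post the output of dput(head(df)).
Hi, I have uploaded an image with the query result; I only get 1, 1.
Please paste the output of dput(test) as text.
If you don't care whether the condition evaluates to true or false, why have the second condition at all?
I do care about the second condition: if it is true, the row is counted in its own group; if it is false, the row should be counted in a later group, as shown in the expected output.

Tags: sql r sqldf

【Solution 1】:

Here is one approach using data.table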

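df <- structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"),
    AppointmentDate = structure(c(18265, 18270, 18270, 18284), class = "Date"),
    ID = c(1, 1, 1, 1), Branch = c("W1", "W1", "W2", "W1")),
  class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L),
  problems = structure(list(row = 4L, col = "Branch", expected = "", actual = "embedded null",
    file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")),
  spec = structure(list(cols = list(
    AlertDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")),
    AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")),
    ID = structure(list(), class = c("collector_double", "collector")),
    Branch = structure(list(), class = c("collector_character", "collector"))),
    default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))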

I convert it to a data.table and create a new column that implements your logic.

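library(data.table)
df <- data.table(df)
# Keep only rows where the alert is on or before the appointment
df <- df[AlertDate <= AppointmentDate]
# Gaps of 7+ days go to bucket "W<ceiling(days / 7)>"; shorter gaps keep their Branch
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
                          paste0("W", ceiling(as.numeric(AppointmentDate - AlertDate) / 7)),
                          Branch)]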

Here is the resulting table

    AlertDate AppointmentDate ID Branch new_branch
1: 2020-01-01      2020-01-04  1     W1         W1
2: 2020-01-01      2020-01-09  1     W1         W2
3: 2020-01-08      2020-01-09  1     W2         W2
4: 2020-01-01      2020-01-23  1     W1         W4

And here is the result of the group-by:

df[, .(.N, alert=head(AlertDate,1),  appoint=head(AppointmentDate,1)), by = list(new_branch)]
   new_branch N      alert    appoint
1:         W1 1 2020-01-01 2020-01-04
2:         W2 2 2020-01-01 2020-01-09
3:         W4 1 2020-01-01 2020-01-23
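
Since the question asked for sqldf, the same bucketing could probably be written as a single query as well. The following is a sketch rather than a tested drop-in: it assumes sqldf's default SQLite backend, where Date columns reach SQL as day numbers so that subtracting them gives a gap in days. It uses integer arithmetic, (days + 6) / 7, in place of ceil(), because ceil() may not be available in every SQLite build. The column alias `n` and the plain data.frame rebuilt here are illustrative.

```r
library(sqldf)

# Sample rows from the question as a plain data.frame
df <- data.frame(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1"),
  stringsAsFactors = FALSE
)

# Gaps of 7+ days are relabelled 'W' || ceiling(days / 7), computed with
# integer division; shorter gaps keep their original Branch.
res <- sqldf("
  SELECT CASE
           WHEN AppointmentDate - AlertDate >= 7
             THEN 'W' || ((CAST(AppointmentDate - AlertDate AS INTEGER) + 6) / 7)
           ELSE Branch
         END AS new_branch,
         COUNT(ID) AS n
  FROM df
  WHERE AlertDate <= AppointmentDate
  GROUP BY new_branch
  ORDER BY new_branch
")
print(res)
```

If the date columns were read in as character strings rather than Date values, the subtraction would not yield a day count, so they would need converting with as.Date() first.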

【Comments】: