Find the number of customers added each month (answers)

【Question title】: Find the number of customers added each month
【Posted】: 2018-10-24 18:34:00
【Question description】:

customer_id  transaction_id  month  year
          1               3      7  2014
          1               4      7  2014
          2               5      7  2014
          2               6      8  2014
          1               7      8  2014
          3               8      9  2015
          1               9      9  2015
          4              10      9  2015
          5              11      9  2015
          2              12      9  2015

I'm familiar with R basics. Any help would be appreciated.

The expected output should look like this:

month  year  number_unique_customers_added
    7  2014                              2
    8  2014                              0
    9  2015                              3

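In month 7 of 2014, only customer_ids 1 and 2 are present, so the number of customers added is 2. In month 8 of 2014, no new customer IDs appear, so zero customers were added in that period. Finally, in month 9 of 2015, customer_ids 3, 4 and 5 are new, so the number of customers added in that period is 3.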

【Question discussion】:

library(dplyr); df %>% group_by(month, year) %>% summarise(new_cus = n_distinct(customer_id))
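@RonakShah The solution given here only gives the number of unique customers within a month of a given year. However, the same customer can appear in any two months, so this solution doesn't answer my question, does it?
Yes, maybe. Can you update your post with your expected output? If the answer I provided doesn't solve your problem, I'll reopen it.
@RonakShah Please see the updated question.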
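The difference the asker points out can be checked directly on the sample data. The base-R sketch below assumes the data frame is called `df` (the name used in the comment) and that its rows are in chronological order; it counts distinct customers per period, which is what the `n_distinct()` comment computes, next to customers whose first transaction falls in that period.

```r
# Sample data from the question (the name `df` is an assumption)
df <- data.frame(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# One sortable key per year-month period, e.g. "2014-07"
period <- sprintf("%d-%02d", df$year, df$month)

# Distinct customers active in each period (what n_distinct() counts)
active <- tapply(df$customer_id, period, function(x) length(unique(x)))

# Customers seen for the first time in each period (rows assumed chronological)
added <- tapply(!duplicated(df$customer_id), period, sum)

print(rbind(active, added))
```

Customer 2 and customer 1 are active in month 8 of 2014, so `active` is non-zero there, while `added` is 0 because both were already seen in month 7.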

Tags: r

【Solution 1】:

Using data.table:

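library(data.table)

setDT(dt)  # make sure dt is a data.table (rows assumed to be in chronological order)
# keep each customer's first transaction, then count distinct customers per year-month
dt[, .SD[1, ], by = customer_id][, uniqueN(customer_id), by = .(year, month)]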

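Explanation: we first drop every later transaction of each customer (we only care about the first one, when she is a "new customer"), then count the unique customers for each combination of year and month. Note that this relies on the rows being in chronological order, and that year-month combinations with no new customers (such as month 8 of 2014) are dropped from the result rather than reported as 0.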
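A base-R sketch of the same "first transaction per customer" idea that also reports periods with zero new customers, such as month 8 of 2014 in the expected output. It assumes the data frame is called `df` and that sorting by `year`, `month`, `transaction_id` gives chronological order; both are assumptions, not part of the original answer.

```r
# Sample data from the question (the name `df` is an assumption)
df <- data.frame(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# Sort chronologically so each customer's first row is their first transaction
df <- df[order(df$year, df$month, df$transaction_id), ]

# TRUE on each customer's first transaction, FALSE on later ones
df$is_new <- !duplicated(df$customer_id)

# Summing the logical flag per period counts new customers; every period that
# has at least one transaction is kept, so periods with no newcomers show 0
res <- aggregate(is_new ~ year + month, data = df, FUN = sum)
names(res)[3] <- "number_unique_customers_added"
res <- res[order(res$year, res$month), ]
print(res, row.names = FALSE)
```

Flagging rows instead of deleting them is what keeps the zero rows: month 8 of 2014 still has transactions, they are just all flagged FALSE.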

【Discussion】:

【Solution 2】:

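Using dplyr, we can first create a column that flags whether each row is a customer's first appearance (i.e. not a repeat), then group_by month and year and count the new customers in each group.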

library(dplyr)
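df %>%
  # TRUE the first time a customer_id appears, FALSE for repeats
  mutate(unique_customers = !duplicated(customer_id)) %>%
  group_by(month, year) %>%
  # summing the logical flag counts the new customers in each month/year
  summarise(unique_customers = sum(unique_customers))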

#  month  year unique_customers
#  <int> <int>            <int>
#1     7  2014                2
#2     8  2014                0
#3     9  2015                3
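Both answers depend on row order: `duplicated()` marks every occurrence after the first one it encounters, so each customer's first row has to be their earliest transaction. The base-R sketch below (data frame name `df` and helper `count_new` are hypothetical) reverses the rows to show how the counts change, then sorts by `year`, `month`, `transaction_id` to restore them.

```r
# Sample data from the question (the name `df` is an assumption)
df <- data.frame(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# Hypothetical helper: new customers per "year-month" period, using the
# first-occurrence flag exactly as Solution 2 does
count_new <- function(d) {
  period <- sprintf("%d-%02d", d$year, d$month)
  tapply(!duplicated(d$customer_id), period, sum)
}

in_order <- count_new(df)
reversed <- df[nrow(df):1, ]  # same rows, latest transaction first
wrong    <- count_new(reversed)
fixed    <- count_new(reversed[order(reversed$year, reversed$month,
                                     reversed$transaction_id), ])
print(rbind(in_order, wrong, fixed))
```

With the rows reversed, every customer's 2015 transaction is met first, so all five customers are counted as new in month 9 of 2015; sorting first makes the result independent of the input order.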

【Discussion】: