Count how many orders were shared between customers

【Question title】: Count how many orders were shared between customers
【Posted】: 2017-07-21 08:42:58
【Question】:

I have a table with two columns:

Order | CustomerID
A     | C1
B     | C1
C     | C1
D     | C2
B     | C3
C     | C3
D     | C4

The real table is long. I want output that looks like this:

C1 | C3 | 2   # Customers C1 and C3 share 2 orders (B and C)
C1 | C2 | 0   # Customers C1 and C2 share 0 orders
C2 | C4 | 1   # Customers C2 and C4 share 1 order (D)
C2 | C3 | 0   # Customers C2 and C3 share 0 orders

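【Comments】:

So if there are n customers, do you need to show counts for all nC2 combinations? If you intend to do this with a SQL query, please tag the DBMS you are using.
Join the table to itself on order, group by customerID1 and customerID2, and count. There are plenty of posts about merging/joining and about group-by counting.
It is fine not to show the combinations with 0 shared orders. I only included them for illustration.
For the [r] tag, possible duplicate of stackoverflow.com/questions/28742825/…

Tags: sql r dplyr plyr sqldf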

【Solution 1】:

select 
    a.CustomerId
  , b.CustomerId
  , sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId

Test setup: http://rextester.com/ISSCL35174

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C2         |            0 |
| C1         | C3         |            2 |
| C2         | C3         |            0 |
| C1         | C4         |            0 |
| C2         | C4         |            1 |
| C3         | C4         |            0 |
+------------+------------+--------------+

To return only the pairs that share at least one order:

select a.CustomerId
     , b.CustomerId
     , count(*) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
   and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C3         |            2 |
| C2         | C4         |            1 |
+------------+------------+--------------+
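
For readers without a SQL Server instance at hand, the two queries above can be tried against an in-memory SQLite database. This is only a sketch: it assumes the table is named `t` as in the answer, relies on SQLite accepting the `[Order]` bracket quoting for compatibility, and adds an `order by` so the row order is deterministic. `ROWS` and `run_query` are names made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (order, customer).
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]

# Every customer pair, including pairs with 0 shared orders.
ALL_PAIRS_SQL = """
select a.CustomerId, b.CustomerId,
       sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId
order by a.CustomerId, b.CustomerId
"""

# Only pairs that share at least one order.
SHARED_ONLY_SQL = """
select a.CustomerId, b.CustomerId, count(*) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
   and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId
order by a.CustomerId, b.CustomerId
"""


def run_query(sql, rows=ROWS):
    """Load the rows into a throwaway table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t ([Order] text, CustomerId text)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


for row in run_query(ALL_PAIRS_SQL):
    print(row)
```

The first query pairs every row of one customer with every row of another, so on a long table it does far more work than the second query, which joins on the order as well.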

【Discussion】:

【Solution 2】:

Here is a base R approach using table, crossprod, combn, and matrix subsetting.

# get counts of customer IDs
myMat <- crossprod(with(df, table(Order, CustomerID)))
myMat
          CustomerID
CustomerID C1 C2 C3 C4
        C1  3  0  2  0
        C2  0  1  0  1
        C3  2  0  2  0
        C4  0  1  0  1

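Note that the diagonal holds each customer's total number of orders, and the (symmetric) off-diagonal entries hold the number of orders shared by each pair of customers.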

# get all customer pairs
customers <- t(combn(rownames(myMat), 2))

# use matrix subsetting to pull out order counts and cbind.data.frame to put it together
cbind.data.frame(customers, myMat[customers])
   1  2 myMat[customers]
1 C1 C2                0
2 C1 C3                2
3 C1 C4                0
4 C2 C3                0
5 C2 C4                1
6 C3 C4                0

If you need specific variable names, you can wrap this in setNames:

setNames(cbind.data.frame(customers, myMat[customers]), c("c1", "c2", "counts"))
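
The cross-product idea is not specific to R. As a rough sketch in plain Python, the order-by-customer count table can be built by hand and multiplied by its own transpose, which is what `crossprod` computes. The function names `cooccurrence` and `pair_counts` are made up for this illustration, and the sample rows are copied from the question.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question: (order, customer).
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def cooccurrence(rows):
    """Return (customers, matrix) where matrix = t(X) %*% X for the count table X."""
    orders = sorted({order for order, _ in rows})
    customers = sorted({customer for _, customer in rows})
    counts = Counter(rows)  # like table(Order, CustomerID)
    table = [[counts[(o, c)] for c in customers] for o in orders]
    n = len(customers)
    matrix = [[sum(row[i] * row[j] for row in table) for j in range(n)]
              for i in range(n)]
    return customers, matrix


def pair_counts(rows):
    """One (customer1, customer2, shared) tuple per unordered customer pair."""
    customers, matrix = cooccurrence(rows)
    return [(customers[i], customers[j], matrix[i][j])
            for i, j in combinations(range(len(customers)), 2)]


customers, matrix = cooccurrence(ROWS)
for name, row in zip(customers, matrix):
    print(name, row)
print(pair_counts(ROWS))
```

The matrix has one row and one column per customer, so this approach suits a moderate number of customers; with very many customers a dense matrix may become too large.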

Data

df <- 
structure(list(Order = c("A", "B", "C", "D", "B", "C", "D"), 
    CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4")), .Names = c("Order", 
"CustomerID"), class = "data.frame", row.names = c(NA, -7L))

【Discussion】:

【Solution 3】:

A SQL Server demo (but the code is generic):

; with data as (select 'A' as [Order], 'C1' as CustomerID 
                union all 
                select 'B', 'C1'
                union all 
                select 'C', 'C1'
                union all 
                select 'D', 'C2'
                union all 
                select 'B', 'C3'
                union all 
                select 'C', 'C3'
                union all 
                select 'D', 'C4'
        )
select c1, c2, count(*) from (
select x.[Order], x.CustomerID c1, y.CustomerID c2
from data x join data y on x.[Order] = y.[Order] and x.CustomerID < y.CustomerID
) temp
group by c1, c2

This only considers pairs that share at least one order. I think returning pairs that share no orders would be a waste of resources.
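
The same "join on the order, then count per pair" logic can be sketched in Python by grouping customers per order and counting each pair that appears together. This is an illustration only: `shared_pairs` is a made-up name, and it assumes each (order, customer) row appears at most once, since duplicate rows are collapsed here while the SQL join would count them repeatedly.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows from the question: (order, customer).
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def shared_pairs(rows):
    """Count shared orders for customer pairs that share at least one order."""
    customers_by_order = defaultdict(set)
    for order, customer in rows:
        customers_by_order[order].add(customer)

    counts = Counter()
    for customers in customers_by_order.values():
        # sorted() keeps c1 < c2, mirroring x.CustomerID < y.CustomerID.
        for c1, c2 in combinations(sorted(customers), 2):
            counts[(c1, c2)] += 1
    return dict(counts)


print(shared_pairs(ROWS))
```

Pairs with nothing in common never show up in the result, so the work done depends on how many customers share each order rather than on the total number of customer pairs.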

【Discussion】:

【Solution 4】:

I would use a cross join to get all pairs of customers, and then left joins to bring in the orders. The final step is the aggregation:

-- ORDER is a reserved word, so the column is bracket-quoted here (SQL Server style, as in the other answers)
select c1.CustomerId, c2.CustomerId, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
     (select distinct CustomerID from t) c2 left join
     t t1
     on t1.CustomerId = c1.CustomerId left join
     t t2
     on t2.CustomerId = c2.CustomerId and
        t2.[Order] = t1.[Order]
where c1.CustomerId < c2.CustomerId
group by c1.CustomerId, c2.CustomerId;

The process is a bit tricky because you also want the pairs that have no orders in common.
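
To make the "all pairs first, then match orders" shape concrete, here is a small Python sketch of the same logic, not a translation of the query itself. It enumerates every customer pair up front (the cross join), then counts the orders the two have in common (the left joins plus `count`), so pairs with nothing in common still appear with 0. `all_pair_counts` is a made-up name, and the sample rows are copied from the question.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the question: (order, customer).
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def all_pair_counts(rows):
    """Return (c1, c2, in_common) for every customer pair with c1 < c2."""
    orders_by_customer = defaultdict(set)
    for order, customer in rows:
        orders_by_customer[customer].add(order)

    result = []
    # combinations over the sorted distinct customers plays the role of
    # the cross join restricted by c1.CustomerId < c2.CustomerId.
    for c1, c2 in combinations(sorted(orders_by_customer), 2):
        in_common = len(orders_by_customer[c1] & orders_by_customer[c2])
        result.append((c1, c2, in_common))
    return result


for row in all_pair_counts(ROWS):
    print(row)
```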

【Discussion】: