【Question title】: Calculating customers' trips based on transaction dates
【Posted】: 2016-07-24 00:49:50
【Question】:

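Basic T-SQL user here. I'm having trouble with a task and would appreciate some guidance. Apologies in advance for any mistakes, as English is not my native language.

I have a table with a lot of transactions. To keep it simple, assume it has only two columns: CUSTOMER_ID, which identifies the customer, and DATE, which is the date of the transaction.

My customers make a lot of transactions while they are in town, but it may then take them weeks, months, or even years to come back and start transacting again. I want to identify each of these "trips" and group the transactions that belong to it, so that I can then calculate things like trip duration, number of transactions, and so on.

I want to treat any transaction that follows an idle period of more than 10 days as the start of a new trip.

Let me try to explain my request better with a simple example.

This is my transactions table: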

+-------------+------------+
| CUSTOMER_ID |    DATE    |
+-------------+------------+
| JHON        | 01-01-2016 |
| JHON        | 01-02-2016 |
| PEDRO       | 01-02-2016 |
| JHON        | 01-05-2016 |
| MIKE        | 01-05-2016 |
| MIKE        | 01-10-2016 |
| JHON        | 01-07-2016 |
| …           | …          |
| JHON        | 02-15-2016 |
| JHON        | 02-18-2016 |
| MIKE        | 02-19-2016 |
| MIKE        | 02-19-2016 |
+-------------+------------+

So far I have written this query to number each customer's visits:

SELECT
    CUSTOMER_ID,
    DATE,
    ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM

FROM
    TRANSACTIONS
WHERE
    CUSTOMER_ID IN ('JHON','MIKE','PEDRO')

Running that query gives a result like this:

+-------------+------------+-----------+
| CUSTOMER_ID |    DATE    | VISIT_NUM |
+-------------+------------+-----------+
| JHON        | 01-01-2016 |         1 |
| JHON        | 01-02-2016 |         2 |
| JHON        | 01-07-2016 |         3 |
| JHON        | 02-15-2016 |         4 |
| JHON        | 02-18-2016 |         5 |
| MIKE        | 01-05-2016 |         1 |
| MIKE        | 01-10-2016 |         2 |
| MIKE        | 02-19-2016 |         3 |
| MIKE        | 02-19-2016 |         4 |
| PEDRO       | 01-02-2016 |         1 |
+-------------+------------+-----------+

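Now comes the tricky part: I need a query (possibly using the query above as a first step) that shows me each customer's trip information. Continuing the example, the ideal result would look like this: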

+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON        |        1 | 01-01-2016    | 01-07-2016  |             7 |            3 |
| JHON        |        2 | 02-15-2016    | 02-18-2016  |             3 |            2 |
| MIKE        |        1 | 01-05-2016    | 01-10-2016  |             5 |            2 |
| MIKE        |        2 | 02-19-2016    | 02-19-2016  |             1 |            2 |
| PEDRO       |        1 | 01-02-2016    | 01-02-2016  |             1 |            1 |
+-------------+----------+---------------+-------------+---------------+--------------+

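As you can see, Mr. Jhon came in 3 times in January and came back again in February. More than 10 days had passed since his last transaction in January, so I want to treat his new set of transactions as a new "trip". Mike also had some activity in January and came back in February; on his second trip he made two transactions on the same day, and I want to account for that as well. If a customer came for only one day and had some activity (as with Mr. Pedro), I still want to record that single-day, single-transaction visit as a trip.

I would appreciate any help. I really don't know how to proceed (I have been reading about cursors, but at this point they seem like black magic and I can't find a way to implement them).

Again, apologies for any grammar mistakes and possible omissions. I will clarify further if necessary.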

【Comments】:

  • Your English is better than that of 99% of native English speakers.

Tags: sql sql-server tsql


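【Answer 1】:

In your example, the trip duration is not calculated consistently across customers, so I adjusted it to follow the method used for the first customer ID.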

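-- Assumes #temp (cid, datee) holds the transactions.
-- Trips are grouped by calendar month only (the year is ignored).
;WITH cte AS
(
    SELECT cid,
           datee,
           DATEPART(month, datee) AS monthh,
           -- rank of the calendar month within each customer; used as the trip number
           DENSE_RANK() OVER (PARTITION BY cid ORDER BY DATEPART(month, datee)) AS samemonth,
           -- number of transactions the customer made in that month
           COUNT(*) OVER (PARTITION BY cid, DATEPART(month, datee)) AS cnt
    FROM #temp
),
cte1 AS
(
    SELECT cid,
           MAX(samemonth) AS tripnumber,
           MIN(datee) AS startdate,
           MAX(datee) AS enddate,
           MAX(cnt) AS numberoftransactions
    FROM cte
    GROUP BY cid, samemonth
)
-- duration counts both the start day and the end day
SELECT *, DATEDIFF(day, startdate, DATEADD(day, 1, enddate)) AS duration
FROM cte1;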

Output:

cid   tripnumber startdate      enddate    numberoftransactions duration
JHON    1        2016-01-01    2016-01-07   3                    7
JHON    2        2016-02-15    2016-02-18   2                    4
MIKE    1        2016-01-05    2016-01-10   2                    6
MIKE    2        2016-02-19    2016-02-19   2                    1
PEDRO   1        2016-01-02    2016-01-02   1                    1
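
One caveat with grouping by calendar month is that a trip straddling a month boundary is split in two, even when the visits are only a couple of days apart. The sketch below illustrates this with made-up data. The table variable @demo and the customer ANA are hypothetical and not part of the question.

```sql
-- Hypothetical data: one customer, three visits two days apart across a month boundary.
DECLARE @demo TABLE (cid varchar(5), datee date);

INSERT INTO @demo (cid, datee)
VALUES ('ANA', '2016-01-30'),
       ('ANA', '2016-02-01'),
       ('ANA', '2016-02-03');

-- Month-based grouping returns two rows (one for January, one for February),
-- although no gap between consecutive visits exceeds 10 days.
SELECT cid,
       DATEPART(month, datee) AS monthh,
       MIN(datee)             AS startdate,
       MAX(datee)             AS enddate,
       COUNT(*)               AS numberoftransactions
FROM @demo
GROUP BY cid, DATEPART(month, datee);
```

Under the question's 10-day idle rule these three visits would form a single trip, so the month-based approach is only an approximation when visits cluster neatly inside one month.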

【Comments】:

  • Thanks for your answer. It's not exactly what I was looking for, but it helped me a lot!
【Answer 2】:

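I found the perfect answer elsewhere. All credit goes to Reddit user nvarscar for this amazing solution!

I will copy his/her answer below in case anyone else needs it in the future:

You can use window functions, which let you aggregate over the rows between the current row and all preceding rows. The code looks rather long, but at least you can see each step taken.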

DECLARE @t TABLE 
    ([CUSTOMER_ID] varchar(5), [DATE] datetime)
;

INSERT INTO @t
    ([CUSTOMER_ID], [DATE])
VALUES
    ('JHON', '2016-01-01 00:00:00'),
    ('JHON', '2016-01-02 00:00:00'),
    ('PEDRO', '2016-01-02 00:00:00'),
    ('JHON', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-10 00:00:00'),
    ('JHON', '2016-01-07 00:00:00'),
    ('JHON', '2016-02-15 00:00:00'),
    ('JHON', '2016-02-18 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('JHON', '2016-02-01 00:00:00'),
    ('JHON', '2016-02-02 00:00:00'),
    ('PEDRO', '2016-03-02 00:00:00'),
    ('JHON', '2016-03-05 00:00:00'),
    ('MIKE', '2016-05-05 00:00:00'),
    ('MIKE', '2016-05-10 00:00:00'),
    ('JHON', '2016-03-07 00:00:00'),
    ('JHON', '2016-04-15 00:00:00'),
    ('JHON', '2016-04-18 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00')
;


WITH CTE1 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM @t
GROUP BY 
  [CUSTOMER_ID]
, [DATE]
)
, CTE2 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
, CTE3 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
, CTE4 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
SELECT 
  [CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
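
The query above returns a TripDuration of 0 for a single-day trip, whereas the question's expected output shows 1 for Pedro. If the duration should count both the first and the last day, adding 1 to the DATEDIFF is one way to do it. The following is a more compact sketch of the same flag-and-running-total idea, not the original author's code. It assumes SQL Server 2012 or later (for LAG), and the names @tx, Flagged, Numbered, IsNewTrip and TripIndex are my own. It skips the per-day pre-aggregation and counts rows directly.

```sql
-- Self-contained sketch; @tx holds the sample rows from the question's table.
DECLARE @tx TABLE (CUSTOMER_ID varchar(5), [DATE] date);

INSERT INTO @tx (CUSTOMER_ID, [DATE])
VALUES ('JHON', '2016-01-01'), ('JHON', '2016-01-02'), ('PEDRO', '2016-01-02'),
       ('JHON', '2016-01-05'), ('MIKE', '2016-01-05'), ('MIKE', '2016-01-10'),
       ('JHON', '2016-01-07'), ('JHON', '2016-02-15'), ('JHON', '2016-02-18'),
       ('MIKE', '2016-02-19'), ('MIKE', '2016-02-19');

WITH Flagged AS (
    -- 1 when more than 10 days have passed since the customer's previous transaction.
    -- For a customer's first row LAG returns NULL, so the CASE falls through to 0.
    SELECT CUSTOMER_ID,
           [DATE],
           CASE WHEN DATEDIFF(day,
                              LAG([DATE]) OVER (PARTITION BY CUSTOMER_ID ORDER BY [DATE]),
                              [DATE]) > 10
                THEN 1 ELSE 0 END AS IsNewTrip
    FROM @tx
),
Numbered AS (
    -- Running total of the flags gives a 0-based trip index per customer.
    SELECT CUSTOMER_ID,
           [DATE],
           SUM(IsNewTrip) OVER (PARTITION BY CUSTOMER_ID ORDER BY [DATE]
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripIndex
    FROM Flagged
)
SELECT CUSTOMER_ID,
       TripIndex + 1                                   AS TRIP_NUM,
       MIN([DATE])                                     AS TRIP_START_DT,
       MAX([DATE])                                     AS TRIP_END_DT,
       DATEDIFF(day, MIN([DATE]), MAX([DATE])) + 1     AS TRIP_DURATION, -- counts both end days
       COUNT(*)                                        AS TRANSACTIONS
FROM Numbered
GROUP BY CUSTOMER_ID, TripIndex
ORDER BY CUSTOMER_ID, TRIP_NUM;
```

Whether to add the 1 depends on how you define duration; the question's own expected output mixes both conventions, so pick one and apply it consistently.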
