【Question title】: Lag functions and SUM
【Posted】: 2020-11-04 00:16:24
【Question】:

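I need to get the list of users who were offline for at least 20 minutes every day. Here is my data

I have this starting query, but I am stuck on how to sum the offline_mins differences, i.e. I need to add something like "and sum(offline_mins) >= 20" to the where clause.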

SELECT
    userid,
    connected,
    LAG(recordeddt) OVER (PARTITION BY userid ORDER BY userid, recordeddt) AS offline_period,
    DATEDIFF(minute,
             LAG(recordeddt) OVER (PARTITION BY userid ORDER BY userid, recordeddt),
             recordeddt) AS offline_mins
FROM device_data
WHERE connected = 0;

My expected result:

Thanks in advance.

【Comments】:

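  • Please show your expected result.
  • (1) Is there always one record per user every 5 minutes? (2) Define "every day" in "users who were offline for at least 20 minutes every day".
  • Every day = 24 hours. There may not be a record every 5 minutes, but when the user eventually comes back online there will be a record with connected=1.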

Tags: sql sql-server datetime gaps-and-islands date-arithmetic


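【Solution 1】:

This looks like a gaps-and-islands problem, where you want to group together adjacent rows that have the same userid and status.

First, here is a query that computes the islands: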

select userid, connected, min(recordeddt) startdt, max(lead_recordeddt) enddt,
    datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
    select dd.*,
        row_number()     over(partition by userid order by recordeddt) rn1,
        row_number()     over(partition by userid, connected order by recordeddt) rn2,
        lead(recordeddt) over(partition by userid order by recordeddt) lead_recordeddt
    from device_data dd
) dd
group by userid, connected, rn1 - rn2
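
To see why grouping by connected and rn1 - rn2 isolates each island, it can help to look at the two row numbers side by side. The following is a minimal sketch, assuming a handful of made-up rows in a table variable named @device_data (hypothetical; only the column names come from the question):

```sql
-- Hypothetical sample rows; column names follow the question's device_data table.
declare @device_data table (userid int, recordeddt datetime, connected bit);
insert into @device_data values
    (1, '2020-11-02T10:00:00', 1),
    (1, '2020-11-02T10:05:00', 0),
    (1, '2020-11-02T10:10:00', 0),
    (1, '2020-11-02T10:15:00', 1),
    (1, '2020-11-02T10:20:00', 0);

select userid, recordeddt, connected,
    -- position of the row among all rows of the user
    row_number() over(partition by userid order by recordeddt) rn1,
    -- position of the row among the user's rows with the same status
    row_number() over(partition by userid, connected order by recordeddt) rn2,
    row_number() over(partition by userid order by recordeddt)
      - row_number() over(partition by userid, connected order by recordeddt) island_key
from @device_data
order by userid, recordeddt;

-- recordeddt  connected  rn1  rn2  island_key
-- 10:00       1          1    1    0
-- 10:05       0          2    1    1
-- 10:10       0          3    2    1
-- 10:15       1          4    2    2
-- 10:20       0          5    3    2
```

The difference stays constant while the status does not change, so the two consecutive offline rows share island_key 1. The same difference can show up for both statuses (2 in the last two rows), which is why connected has to be part of the group by as well.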

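Now, say you want the users that were offline for at least 20 minutes every day. You can break the islands down per day and filter with a having clause: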

select userid
from (
    select recordedday, userid, connected,
        datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
    from (
        select dd.*, v.*,
            row_number()     over(partition by v.recordedday, userid order by recordeddt) rn1,
            row_number()     over(partition by v.recordedday, userid, connected order by recordeddt) rn2,
            lead(recordeddt) over(partition by v.recordedday, userid order by recordeddt) lead_recordeddt
        from device_data dd
        cross apply (values (convert(date, recordeddt))) v(recordedday)
    ) dd
    group by recordedday, userid, connected, rn1 - rn2
) dd
group by userid
having count(distinct case when connected = 0 and duration >= 20 then recordedday end) = count(distinct recordedday)
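
The having clause compares two counts per user: the number of distinct days that contain at least one offline island of 20 minutes or more, and the number of distinct days on which the user has any rows. A minimal sketch of that comparison in isolation, assuming hand-written island rows in a table variable named @islands (the table and its values are hypothetical and only illustrate the filter):

```sql
-- Hypothetical, pre-computed islands: one row per user, day, status and island.
declare @islands table (userid int, recordedday date, connected bit, duration int);
insert into @islands values
    (1, '2020-11-02', 0, 25),
    (1, '2020-11-03', 0, 30),
    (2, '2020-11-02', 0, 40),
    (2, '2020-11-03', 0, 10),   -- offline, but shorter than 20 minutes
    (2, '2020-11-03', 1, 500);

select userid
from @islands
group by userid
-- the case expression yields NULL for non-qualifying rows, and count(distinct ...) ignores NULLs
having count(distinct case when connected = 0 and duration >= 20 then recordedday end)
     = count(distinct recordedday);
-- returns userid 1 only: user 2 has no qualifying offline island on 2020-11-03
```

Note that a day on which a user has no rows at all is not counted on either side, so under this formulation "every day" means every day on which the user appears in the data.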

【Comments】:

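    【Solution 2】:

    As already mentioned, this is a gaps-and-islands problem. Here is my take on it: it uses a simple LAG function to create the groups, filters out the connected rows, and then works on the date ranges.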

    CREATE TABLE #tmp(ID int, UserID int, dt datetime, connected int)
    INSERT INTO #tmp VALUES
    (1,1,'11/2/20 10:00:00',1),
    (2,1,'11/2/20 10:05:00',0),
    (3,1,'11/2/20 10:10:00',0),
    (4,1,'11/2/20 10:15:00',0),
    (5,1,'11/2/20 10:20:00',0),
    (6,2,'11/2/20 10:00:00',1),
    (7,2,'11/2/20 10:05:00',1),
    (8,2,'11/2/20 10:10:00',0),
    (9,2,'11/2/20 10:15:00',0),
    (10,2,'11/2/20 10:20:00',0),
    (11,2,'11/2/20 10:25:00',0),
    (12,2,'11/2/20 10:30:00',0)
    
    
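    -- Offline stretches of at least 20 minutes, one row per user and stretch
    SELECT UserID, connected, DATEDIFF(minute, MIN(dt), MAX(dt)) AS OFFLINE_MINUTES
    FROM
    (
        -- Running count of status changes: rows of the same stretch share a grp value
        SELECT *, SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END) OVER (ORDER BY UserID, dt) AS grp
        FROM
        (
            -- LG is the user's previous status (the row's own status on the user's first row)
            SELECT *, LAG(connected, 1, connected) OVER (PARTITION BY UserID ORDER BY UserID, dt) AS LG
            FROM #tmp
        ) x
    ) y
    WHERE connected <> 1
    GROUP BY UserID, grp, connected
    HAVING DATEDIFF(minute, MIN(dt), MAX(dt)) >= 20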
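
The query above measures each offline stretch from its first to its last offline record. The question's comments mention that a connected=1 record is written when the user comes back online, so one possible variation is to measure up to that next record instead. This is only a sketch under that assumption, not part of the original answer: next_dt is a name introduced here, and the #tmp table from above is assumed to exist in the session.

```sql
-- Assumes the #tmp table created above already exists in the current session.
SELECT UserID,
       MIN(dt)      AS offline_start,
       MAX(next_dt) AS offline_end,
       DATEDIFF(minute, MIN(dt), MAX(next_dt)) AS offline_minutes
FROM
(
    -- Running count of status changes per user: rows of the same stretch share a grp value
    SELECT *, SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END)
                  OVER (PARTITION BY UserID ORDER BY dt) AS grp
    FROM
    (
        SELECT *,
               LAG(connected, 1, connected) OVER (PARTITION BY UserID ORDER BY dt) AS LG,
               -- timestamp of the user's next record; falls back to dt on the user's last row
               LEAD(dt, 1, dt) OVER (PARTITION BY UserID ORDER BY dt) AS next_dt
        FROM #tmp
    ) x
) y
WHERE connected = 0
GROUP BY UserID, grp
HAVING DATEDIFF(minute, MIN(dt), MAX(next_dt)) >= 20;
```

When an offline stretch is the last thing recorded for a user, as in the sample rows, next_dt falls back to the last offline timestamp and the duration matches the original query.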
    

    【Comments】:
