【Question title】: Finding overlapping timestamps in query
【Posted】: 2023-01-12 00:13:18
【Question】:

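I'm having trouble removing or flagging overlapping timestamps grouped by a specific ID.

The times can overlap by nesting, and they can share the same start time or end time.
If a second interval starts before the previous one ends, it will end before or at the same time as the previous one. The time difference will never exceed 12 hours.

Using T-SQL.

Sample data: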

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  1       2023-01-10 08:00:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 10:50:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
13  4       2023-01-08 21:00:00.000         2023-01-09 02:00:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

Desired result:

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

I have tried approaches using the LEAD or LAG functions, but they don't seem to handle the edge cases well. For example:

case when lead(starttime) over (partition by task_id order by starttime) <> endtime then 1 else 0 end as overlap_tag

This does not count the 18:49-18:50 interval in ID 11, task_id 2 as non-overlapping, and it doesn't seem to take the change of date into account.
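
One likely reason the `LEAD` expression misbehaves: `<>` tests whether the next start differs from this end, not whether two intervals overlap, and for the last row of each partition `LEAD` returns NULL, so the comparison is unknown and the `CASE` falls through to 0. A minimal sketch of an alternative flag, assuming the rows live in a table called `task_duration` (a hypothetical name, the question does not give one) and that an interval starting exactly when an earlier one ends should count as overlapping:

```sql
-- Assumption: task_duration(ID, task_id, starttime, endtime) holds the sample rows.
-- A row is tagged when it starts at or before the latest end time seen so far
-- within the same ID / task_id group (use < instead of <= to allow touching intervals).
SELECT ID,
       task_id,
       starttime,
       endtime,
       CASE WHEN starttime <= MAX(endtime) OVER (
                     PARTITION BY ID, task_id
                     ORDER BY starttime, endtime DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
            THEN 1 ELSE 0 END AS overlap_tag   -- first row of a group: frame is empty, MAX is NULL, tag is 0
FROM task_duration
ORDER BY ID, task_id, starttime, endtime DESC;
```

Because full datetime values are compared, an interval that crosses midnight needs no special handling. On the sample data this should tag only the shorter row of task_id 1, 3 and 4 and leave every other row at 0.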

【Comments】:

    Tags: sql tsql timestamp overlap


    【Solution 1】:

    I have only tested this on PostgreSQL, but it may help.


    Scenario

    Setup

    CREATE TABLE task_duration (
        id INTEGER, 
        task_id INTEGER,
        start_time TIMESTAMP, 
        end_time TIMESTAMP
    );
    
    INSERT INTO task_duration VALUES (11, 1, '2023-01-10 06:31:00.000', '2023-01-10 08:53:00.000');
    INSERT INTO task_duration VALUES (11, 1, '2023-01-10 08:00:00.000', '2023-01-10 08:53:00.000');
    INSERT INTO task_duration VALUES (11, 2, '2023-01-10 13:14:00.000', '2023-01-10 15:15:00.000');
    INSERT INTO task_duration VALUES (11, 2, '2023-01-10 15:46:00.000', '2023-01-10 17:59:00.000');
    INSERT INTO task_duration VALUES (11, 2, '2023-01-10 18:49:00.000', '2023-01-10 18:50:00.000');
    INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 11:10:00.000');
    INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 10:50:00.000');
    INSERT INTO task_duration VALUES (13, 4, '2023-01-08 20:00:00.000', '2023-01-09 03:44:00.000');
    INSERT INTO task_duration VALUES (13, 4, '2023-01-08 21:00:00.000', '2023-01-09 02:00:00.000');
    INSERT INTO task_duration VALUES (14, 5, '2023-01-01 19:23:00.000', '2023-01-01 20:47:00.000');
    INSERT INTO task_duration VALUES (14, 5, '2023-01-02 03:35:00.000', '2023-01-02 06:57:00.000');
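
Since the question targets T-SQL, note that in SQL Server `TIMESTAMP` is a synonym for `rowversion` and cannot hold date and time values, so the setup above needs a different column type there. A minimal sketch of an equivalent setup, assuming `DATETIME2(3)` is acceptable for the millisecond-precision sample data:

```sql
-- T-SQL variant of the setup; DATETIME2(3) is an assumed choice of type.
CREATE TABLE task_duration (
    id         INT,
    task_id    INT,
    start_time DATETIME2(3),
    end_time   DATETIME2(3)
);

-- Multi-row VALUES keeps the sample data in one statement.
INSERT INTO task_duration (id, task_id, start_time, end_time) VALUES
    (11, 1, '2023-01-10 06:31:00.000', '2023-01-10 08:53:00.000'),
    (11, 1, '2023-01-10 08:00:00.000', '2023-01-10 08:53:00.000'),
    (11, 2, '2023-01-10 13:14:00.000', '2023-01-10 15:15:00.000'),
    (11, 2, '2023-01-10 15:46:00.000', '2023-01-10 17:59:00.000'),
    (11, 2, '2023-01-10 18:49:00.000', '2023-01-10 18:50:00.000'),
    (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 11:10:00.000'),
    (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 10:50:00.000'),
    (13, 4, '2023-01-08 20:00:00.000', '2023-01-09 03:44:00.000'),
    (13, 4, '2023-01-08 21:00:00.000', '2023-01-09 02:00:00.000'),
    (14, 5, '2023-01-01 19:23:00.000', '2023-01-01 20:47:00.000'),
    (14, 5, '2023-01-02 03:35:00.000', '2023-01-02 06:57:00.000');
```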
    

    Query

    SELECT id, 
        task_id, 
        start_time, 
        end_time
    FROM (
        SELECT id, 
            task_id, 
            start_time, 
            end_time, 
            LAG(start_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_start_time, 
            LAG(end_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_end_time
        FROM task_duration
    ) v
    WHERE prev_start_time IS NULL  -- 1st condition
        OR NOT (v.end_time >= v.prev_start_time AND v.start_time <= v.prev_end_time);  -- 2nd condition
    

    Result

    id|task_id|start_time             |end_time               |
    --+-------+-----------------------+-----------------------+
    11|      1|2023-01-10 06:31:00.000|2023-01-10 08:53:00.000|
    11|      2|2023-01-10 13:14:00.000|2023-01-10 15:15:00.000|
    11|      2|2023-01-10 15:46:00.000|2023-01-10 17:59:00.000|
    11|      2|2023-01-10 18:49:00.000|2023-01-10 18:50:00.000|
    12|      3|2023-01-09 10:10:00.000|2023-01-09 11:10:00.000|
    13|      4|2023-01-08 20:00:00.000|2023-01-09 03:44:00.000|
    14|      5|2023-01-01 19:23:00.000|2023-01-01 20:47:00.000|
    14|      5|2023-01-02 03:35:00.000|2023-01-02 06:57:00.000|
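
The `LAG`-based filter above compares each row only with the row immediately before it. If an interval can be nested inside an earlier, longer one without overlapping its direct predecessor (for example 06:00-10:00, 07:00-08:00, 08:30-09:00), comparing against the latest end time seen so far is more robust. A sketch under that assumption, using the same table and column names; `max_prev_end` is an illustrative alias, and the frame clause should also be accepted by SQL Server 2012 or later:

```sql
-- Keep a row only if it starts after every earlier interval of the same
-- id / task_id group has already ended.
SELECT id,
       task_id,
       start_time,
       end_time
FROM (
    SELECT id,
           task_id,
           start_time,
           end_time,
           MAX(end_time) OVER (
               PARTITION BY id, task_id
               ORDER BY start_time, end_time DESC   -- longest interval first among equal starts
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS max_prev_end
    FROM task_duration
) v
WHERE max_prev_end IS NULL          -- first row of the group
   OR start_time > max_prev_end     -- starts after all earlier rows ended
ORDER BY id, task_id, start_time;
```

On the sample data this should return the same eight rows as the result above.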
    

    【Comments】:

      【Solution 2】:

      Try this: https://dbfiddle.uk/id_waBN_

      with task_duration_wrn(id, task_id, starttime, endtime, rn, act) as (
          select id, task_id, starttime, endtime, 
              rank() over(partition by id, task_id order by starttime, endtime) as rn,
              cast( 
              case when 
               starttime <= lag(endtime) over(partition by id, task_id order by starttime, endtime) 
              then 'PACK' end as VARCHAR(4))
          from task_duration
      ),
      cte(id, task_id, starttime, endtime, rn, lvl, act) as (
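          -- Seed lvl with the row's own rank so the recursive step looks at the row that follows it in the group.
          select d.id, d.task_id, d.starttime, d.endtime, d.rn, d.rn,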
          CAST(NULL AS VARCHAR(4))
          from task_duration_wrn d
          where act is NULL
          
          union all
          
          select d.id, d.task_id, c.starttime, d.endtime, d.rn, c.lvl+1, d.act
          from cte c
          join task_duration_wrn d on c.id = d.id and c.task_id = d.task_id and 
              c.lvl+1 = d.rn
          where d.act = 'PACK'    
      )
      select id, task_id, starttime, max(endtime) as endtime
      from cte c
      group by id, task_id, starttime
      order by id, task_id, starttime
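
The recursive CTE above merges each run of overlapping rows into a single interval. The same packing can be sketched without recursion as a gaps-and-islands query: flag each row that starts a new run, number the runs with a running sum, then aggregate. This is an alternative technique, not the answer's method; `flagged`, `numbered`, `island_start`, and `island_no` are illustrative names.

```sql
WITH flagged AS (
    SELECT id, task_id, starttime, endtime,
           -- 1 when the row starts after every earlier row in the group has ended
           -- (also 1 for the first row, where the frame is empty and MAX is NULL).
           CASE WHEN starttime <= MAX(endtime) OVER (
                         PARTITION BY id, task_id
                         ORDER BY starttime, endtime
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS island_start
    FROM task_duration
),
numbered AS (
    SELECT id, task_id, starttime, endtime,
           -- Running count of run starts gives every row its run number.
           SUM(island_start) OVER (
               PARTITION BY id, task_id
               ORDER BY starttime, endtime
               ROWS UNBOUNDED PRECEDING) AS island_no
    FROM flagged
)
SELECT id,
       task_id,
       MIN(starttime) AS starttime,
       MAX(endtime)   AS endtime
FROM numbered
GROUP BY id, task_id, island_no
ORDER BY id, task_id, MIN(starttime);
```

With the sample data, where every overlap is a nested interval, this should produce the eight rows listed under the desired result.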
      

      【Comments】:
