【Question title】: Merge consecutive duplicate records including time range
【Posted】: 2018-05-11 10:47:20
【Question】:

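I have a problem very similar to the one asked here: Merge duplicate temporal records in database

The difference is that I need the end date to be an actual date rather than NULL.

So, given the following data: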

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2010/04/30   X         Y
1000         2010/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/04/30   X         X
1000         2014/05/01  2014/06/01   X         X

The desired result is:

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/06/01   X         X

The solution suggested in the linked thread is this:

with  t1 as  --tag first row with 1 in a continuous time series
(
select t1.*, case when t1.column1=t2.column1 and t1.column2=t2.column2
                  then 0 else 1 end as tag
  from test_table t1
  left join test_table t2
    on t1.EmployeeId= t2.EmployeeId and dateadd(day,-1,t1.StartDate)= t2.EndDate
)
select t1.EmployeeId, t1.StartDate, 
       case when min(T2.StartDate) is null then null
            else dateadd(day,-1,min(T2.StartDate)) end as EndDate,
       t1.Column1, t1.Column2
  from (select t1.* from t1 where tag=1 ) as t1  -- to get StartDate
  left join (select t1.* from t1 where tag=1 ) as t2  -- to get a new EndDate
    on t1.EmployeeId= t2.EmployeeId and t1.StartDate < t2.StartDate
 group by t1.EmployeeId, t1.StartDate, t1.Column1,   t1.Column2;

However, this does not seem to work when you need an actual end date rather than just NULL.

Can someone help me solve this?

【Comments】:

    Tags: db2


    【Solution 1】:

    How about this?

    create table test_table (EmployeeId int, StartDate  date, EndDate  date,   Column1 char(1),  Column2 char(1))
    ;
    insert into test_table values
     (1000    ,     '2009-05-01','2010-04-30','X','Y')
    ,(1000    ,     '2010-05-01','2011-04-30','X','Y')
    ,(1000    ,     '2011-05-01','2012-04-30','X','X')
    ,(1000    ,     '2012-05-01','2013-04-30','X','Y')
    ,(1000    ,     '2013-05-01','2014-04-30','X','X')
    ,(1000    ,     '2014-05-01','2014-06-01','X','X')
    ;
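    -- Outer query: keep only the first row of each run (DIFF = 1), which carries the run's StartDate
    SELECT EmployeeId, StartDate, EndDate, Column1, Column2 FROM 
    (
        SELECT EmployeeId, StartDate 
        -- The latest EndDate within a run becomes the merged EndDate
        ,      MAX(EndDate) OVER(PARTITION BY EmployeeId, RN) AS EndDate
        ,      Column1 
        ,      Column2
        ,      DIFF
        FROM
        (
            SELECT t.*
            -- Running total of DIFF: every row of the same run gets the same RN
            ,      SUM(DIFF) OVER(PARTITION BY EmployeeId ORDER BY StartDate ) AS RN
            FROM
            (
                SELECT t.*
                -- DIFF = 0 when both columns match the previous row, otherwise 1 (start of a new run)
                ,      CASE WHEN
                           Column1 = LAG(Column1,1) OVER(PARTITION BY EmployeeId ORDER BY StartDate)
                       AND Column2 = LAG(Column2,1) OVER(PARTITION BY EmployeeId ORDER BY StartDate)
                       THEN 0 ELSE 1 END AS DIFF
                FROM
                    test_table t
            ) t
        )
    )
    WHERE DIFF = 1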
    ;
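
To make the three nested steps easier to follow, here is a minimal Python sketch of the same idea: flag a row when it differs from the previous one, then let each unflagged row extend the end date of the run it belongs to. The helper name `merge_with_running_flag` is hypothetical and not part of the answer. The sketch also assumes Column1 and Column2 contain no NULLs, because Python treats `None == None` as true while the SQL comparison above does not.

```python
from datetime import date

# Sample rows from the question: (EmployeeId, StartDate, EndDate, Column1, Column2)
rows = [
    (1000, date(2009, 5, 1), date(2010, 4, 30), "X", "Y"),
    (1000, date(2010, 5, 1), date(2011, 4, 30), "X", "Y"),
    (1000, date(2011, 5, 1), date(2012, 4, 30), "X", "X"),
    (1000, date(2012, 5, 1), date(2013, 4, 30), "X", "Y"),
    (1000, date(2013, 5, 1), date(2014, 4, 30), "X", "X"),
    (1000, date(2014, 5, 1), date(2014, 6, 1), "X", "X"),
]


def merge_with_running_flag(rows):
    """Hypothetical helper mirroring the DIFF / RN / MAX(EndDate) steps."""
    merged = []
    prev = None  # previous row in StartDate order, playing the role of LAG(..., 1)
    for emp, start, end, c1, c2 in sorted(rows, key=lambda r: (r[0], r[1])):
        same_as_prev = (
            prev is not None and prev[0] == emp and prev[3:] == (c1, c2)
        )
        if same_as_prev:
            # DIFF = 0: the row belongs to the current run, so only its
            # EndDate matters (the MAX(EndDate) per run in the SQL).
            last = merged[-1]
            merged[-1] = (last[0], last[1], max(last[2], end), last[3], last[4])
        else:
            # DIFF = 1: the row starts a new run and supplies its StartDate.
            merged.append((emp, start, end, c1, c2))
        prev = (emp, start, end, c1, c2)
    return merged


result = merge_with_running_flag(rows)
for row in result:
    print(row)
```

On the question's sample data this yields the four desired rows, with the first run ending on 2011-04-30 and the last on 2014-06-01.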
    

    【Comments】:

      【Solution 2】:

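      Here is another solution (taken from How do I group on continuous ranges). It is simpler to code and also handles NULL values: a plain LAG() comparison never evaluates NULL = NULL as true, so it would not merge rows whose columns are NULL, whereas PARTITION BY and GROUP BY place NULLs in the same group. However, because of the GROUP BY, it may be less efficient on large data sets.
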
      SELECT EmployeeId
      ,      MIN(StartDate) AS StartDate
      ,      MAX(EndDate)   AS EndDate
      ,      Column1 
      ,      Column2
      FROM
      (
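          SELECT t.*
          -- GRN numbers rows within each (EmployeeId, Column1, Column2) combination,
          -- RN numbers all rows of the employee; RN - GRN stays constant within a consecutive run
          ,      ROW_NUMBER() OVER(PARTITION BY EmployeeId, Column1, Column2 ORDER BY StartDate ) AS GRN
          ,      ROW_NUMBER() OVER(PARTITION BY EmployeeId                   ORDER BY StartDate ) AS RN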
          FROM 
                 test_table t
          ) t
      GROUP BY
             EmployeeId
      ,      Column1 
      ,      Column2
      ,      RN - GRN
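
The reason `RN - GRN` identifies a run may not be obvious, so here is a small Python sketch that recomputes both row numbers on the question's sample data and groups on their difference. The helper name `merge_with_row_number_diff` is hypothetical, and the dictionaries merely stand in for the window functions and the GROUP BY; this is an illustration of the idea, not a replacement for the query.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (EmployeeId, StartDate, EndDate, Column1, Column2)
rows = [
    (1000, date(2009, 5, 1), date(2010, 4, 30), "X", "Y"),
    (1000, date(2010, 5, 1), date(2011, 4, 30), "X", "Y"),
    (1000, date(2011, 5, 1), date(2012, 4, 30), "X", "X"),
    (1000, date(2012, 5, 1), date(2013, 4, 30), "X", "Y"),
    (1000, date(2013, 5, 1), date(2014, 4, 30), "X", "X"),
    (1000, date(2014, 5, 1), date(2014, 6, 1), "X", "X"),
]


def merge_with_row_number_diff(rows):
    """Hypothetical helper mirroring the RN - GRN grouping."""
    rn = defaultdict(int)   # row number per EmployeeId
    grn = defaultdict(int)  # row number per (EmployeeId, Column1, Column2)
    islands = {}            # grouping key -> [MIN(StartDate), MAX(EndDate)]
    for emp, start, end, c1, c2 in sorted(rows, key=lambda r: (r[0], r[1])):
        rn[emp] += 1
        grn[(emp, c1, c2)] += 1
        # Within a consecutive run both counters advance together, so the
        # difference is constant; an interruption bumps only RN and changes it.
        key = (emp, c1, c2, rn[emp] - grn[(emp, c1, c2)])
        if key in islands:
            islands[key][0] = min(islands[key][0], start)
            islands[key][1] = max(islands[key][1], end)
        else:
            islands[key] = [start, end]
    return sorted(
        (emp, s, e, c1, c2) for (emp, c1, c2, _), (s, e) in islands.items()
    )


result = merge_with_row_number_diff(rows)
for row in result:
    print(row)
```

For the sample data the differences are 0, 0, 2, 1, 3, 3 in StartDate order, so the first two rows and the last two rows collapse into one row each, giving the four desired rows.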
      

      【Comments】:
