Identify value changes in history table: answers

【Question title】: Identify value changes in history table
【Posted】: 2021-08-04 15:21:57
【Question】:

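I have the following table which, among other attributes, contains:

Customer ID - unique identifier
Value
CreatedDate - when the record was created (based on ETL)
UpdatedDate - the time until which the record was valid

Because attributes other than [Value] are also history-tracked, the same customer can have multiple rows with the same [Value] but different [CreatedDate] / [UpdatedDate] timestamps. The data may therefore look like this: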

Customer ID	Value	CreatedDate	UpdatedDate
1	111	04/08/2021 15:00	04/08/2021 17:00
1	111	01/08/2021 09:00	04/08/2021 15:00
1	222	20/07/2021 01:30	01/08/2021 09:00
1	222	01/06/2021 08:00	20/07/2021 01:30
1	111	01/04/2021 07:15	01/06/2021 08:00
2	333	03/08/2021 04:30	04/08/2021 17:00
2	444	23/07/2021 01:20	03/08/2021 04:30
2	444	01/04/2021 13:50	23/07/2021 01:20
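
To reproduce the problem, the sample rows above could be loaded into a temp table roughly as follows. This is only a sketch: the question does not state the column types, so `int` and `datetime2(0)` are assumptions, and the dates are rewritten from dd/mm/yyyy into ISO 8601 literals to avoid any day/month ambiguity.

```sql
-- Hypothetical setup script; column types are assumed, not taken from the question.
CREATE TABLE #History (
    [Customer ID] int          NOT NULL,
    Value         int          NOT NULL,
    CreatedDate   datetime2(0) NOT NULL,
    UpdatedDate   datetime2(0) NULL      -- nullable: the comments mention NULL for the latest record
);

INSERT INTO #History ([Customer ID], Value, CreatedDate, UpdatedDate)
VALUES
    (1, 111, '2021-08-04T15:00:00', '2021-08-04T17:00:00'),
    (1, 111, '2021-08-01T09:00:00', '2021-08-04T15:00:00'),
    (1, 222, '2021-07-20T01:30:00', '2021-08-01T09:00:00'),
    (1, 222, '2021-06-01T08:00:00', '2021-07-20T01:30:00'),
    (1, 111, '2021-04-01T07:15:00', '2021-06-01T08:00:00'),
    (2, 333, '2021-08-03T04:30:00', '2021-08-04T17:00:00'),
    (2, 444, '2021-07-23T01:20:00', '2021-08-03T04:30:00'),
    (2, 444, '2021-04-01T13:50:00', '2021-07-23T01:20:00');
```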

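I want to keep the distinct [Value]s in the correct order, so each [Value] should be kept with its earliest [CreatedDate]. However, if a customer originally had Value1, then changed to Value2, and finally changed back to Value1, I want to keep both of those changes as well. The ideal output would therefore look like this: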

Customer ID	Value	CreatedDate	UpdatedDate
1	111	01/08/2021 09:00	04/08/2021 17:00
1	222	01/06/2021 08:00	01/08/2021 09:00
1	111	01/04/2021 07:15	01/06/2021 08:00
2	333	03/08/2021 04:30	04/08/2021 17:00
2	444	01/04/2021 13:50	03/08/2021 04:30

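The changes are ordered chronologically by CreatedDate/UpdatedDate, and each change gets its earliest CreatedDate and its latest UpdatedDate. However, if a value occurs several times with a different value in between, I want to keep each occurrence separately.

I tried the following, which works well in general but not for the scenario above. Its output is shown after the query: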

SELECT [Customer ID]
        ,Value
        ,MIN(CreatedDate) as CreatedDate
        ,MAX(UpdatedDate) as UpdatedDate
FROM #History
GROUP BY [Customer ID], Value

Customer ID	Value	CreatedDate	UpdatedDate
1	111	01/04/2021 07:15	04/08/2021 17:00
1	222	01/06/2021 08:00	01/08/2021 09:00
2	333	03/08/2021 04:30	04/08/2021 17:00
2	444	01/04/2021 13:50	03/08/2021 04:30

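Any ideas? I also tried using LAG and LEAD, but could not get that to work either.

【Comments】:

Does this answer your question? Find min and max for subsets of consecutive rows - gaps and islands
Does this answer your question? Group similar objects in different date ranges to get min and max dates in SQL Server
I'm not sure I understand what the UpdatedDate column means; can you elaborate? Also, just to be clear, is CreatedDate recorded when the row is inserted into the history table?
@JacobFW Yes, that is correct. For the latest record, UpdatedDate is usually NULL; once a change is identified, UpdatedDate is the CreatedDate of the row above. Does that make sense?
@Srpic Got it. So the history table records the changes, but do you have another table that holds the latest values? For example, say you have a table storing contact information such as first name, last name, address, phone number, and so on. You would have a main contacts table that always contains the latest information, and then a history table that records all changes. Do you have a table like that?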

Tags: sql sql-server tsql

【Solution 1】:

This is a type of gaps-and-islands problem, best solved by using a cumulative max to look for overlaps:

select [Customer ID], value,
       min(createddate) as createddate,
       max(updateddate) as updateddate
from (select t.*,
             -- running count of group starts within each customer/value pair
             sum(case when prev_updateddate >= createddate then 0 else 1 end) over (partition by [Customer ID], value order by createddate) as grp
      from (select h.*,
                   -- latest updateddate among the earlier rows for the same customer and value
                   max(updateddate) over (partition by [Customer ID], value order by createddate rows between unbounded preceding and 1 preceding) as prev_updateddate
            from #History h
           ) t
     ) g
group by [Customer ID], value, grp;

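The logic is to look, for each row, at the latest updateddate among the preceding rows for the same customer and value. If that is earlier than the row's createddate (or there is no preceding row), the row starts a new group.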
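
To make that rule concrete, the intermediate columns can be inspected for a single customer/value pair. The following is a minimal sketch that assumes `#History` uses the question's column names (`[Customer ID]`, `Value`, `CreatedDate`, `UpdatedDate`); the alias `starts_new_group` is introduced here for illustration only.

```sql
-- Sketch: show the "previous latest UpdatedDate" and the group-start flag
-- for customer 1, value 111 only.
SELECT t.CreatedDate,
       t.UpdatedDate,
       t.prev_updateddate,
       CASE WHEN t.prev_updateddate >= t.CreatedDate THEN 0 ELSE 1 END AS starts_new_group
FROM (SELECT h.*,
             MAX(UpdatedDate) OVER (PARTITION BY [Customer ID], Value
                                    ORDER BY CreatedDate
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_updateddate
      FROM #History h
      WHERE [Customer ID] = 1 AND Value = 111
     ) t
ORDER BY t.CreatedDate;

-- With the question's sample rows (dates shown as dd/mm/yyyy):
-- CreatedDate       prev_updateddate   starts_new_group
-- 01/04/2021 07:15  NULL               1
-- 01/08/2021 09:00  01/06/2021 08:00   1
-- 04/08/2021 15:00  04/08/2021 15:00   0
```

The first row has no predecessor, so the comparison with NULL falls through to the ELSE branch and starts a group. The second row starts another group because value 222 was in effect in between. The third row continues the second group because its CreatedDate equals the previous row's UpdatedDate.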

The final result simply aggregates the rows within each group.
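
Since the question mentions trying LAG, an alternative formulation may also be worth sketching. This is not the method of the answer above: it orders all of a customer's rows by CreatedDate and starts a new island whenever Value differs from the previous row. It assumes `#History` uses the question's column names; `is_change` and `island_no` are names made up for this example.

```sql
-- Sketch of a LAG-based gaps-and-islands variant (an assumption, not the answer's approach).
WITH flagged AS (
    SELECT h.*,
           -- 1 when the value differs from the customer's previous row, or there is no previous row
           CASE WHEN LAG(Value) OVER (PARTITION BY [Customer ID] ORDER BY CreatedDate) = Value
                THEN 0 ELSE 1 END AS is_change
    FROM #History h
),
islands AS (
    SELECT f.*,
           -- running total of change flags numbers the islands per customer
           SUM(is_change) OVER (PARTITION BY [Customer ID]
                                ORDER BY CreatedDate
                                ROWS UNBOUNDED PRECEDING) AS island_no
    FROM flagged f
)
SELECT [Customer ID],
       Value,
       MIN(CreatedDate) AS CreatedDate,
       MAX(UpdatedDate) AS UpdatedDate
FROM islands
GROUP BY [Customer ID], Value, island_no
ORDER BY [Customer ID], MIN(CreatedDate) DESC;
```

On the question's sample rows this should return the five rows listed as the ideal output. One caveat applies to both variants: MAX ignores NULLs, so if the latest record carries a NULL UpdatedDate as described in the comments, the newest island's end date would need separate handling.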
