Displaying the difference between rows in the same table: answers

【Question title】: Displaying the difference between rows in the same table
【Posted】: 2021-03-11 16:18:39
【Question】:

I have a table named Employee_audit with the following schema and data:

emp_audit_id	eid	name	salary
1	1	Daniel	1000
2	1	Dani	1000
3	1	Danny	3000

My goal is to write a SQL query that returns the changes in the following format, treating the first row as a change from null to a value as well.

columnName	oldValue	newValue
name	null	Daniel
salary	null	1000
name	Daniel	Dani
name	Dani	Danny
salary	1000	3000

I have written the SQL query below:

WITH cte AS
(
  -- Number each employee's audit rows in the order they were recorded
  SELECT eid,
         name,
         salary,
         rn = ROW_NUMBER() OVER (PARTITION BY eid ORDER BY emp_audit_id)
  FROM   Employee_audit
)
SELECT oldname = CASE WHEN c1.name = c2.name THEN '' ELSE c1.name END,
       newname = CASE WHEN c1.name = c2.name THEN '' ELSE c2.name END,
       oldsalary = CASE WHEN c1.salary = c2.salary THEN NULL ELSE c1.salary END,
       newsalary = CASE WHEN c1.salary = c2.salary THEN NULL ELSE c2.salary END
FROM cte c1 INNER JOIN cte c2
ON c1.eid = c2.eid AND c2.rn = c1.rn + 1;

But it returns the result in the following format:

oldname	newname	oldsalary	newsalary
Daniel	Dani	null	null
Dani	Danny	1000	3000

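Could you tell me how I can get the required result?

【Comments】:

You have tagged 2 different databases, mysql and sql-server. Are you sure you need a solution for both of them?
Please tag only the RDBMS you are interested in, not several. Also, if you show your sample data as DDL+DML, people can help more easily.
What are you trying to do? Which database? ROW_NUMBER() won't give you the previous or next value in a set; that's what LEAD() or LAG() do. All of these functions were added in MySQL 8, so you are asking either about SQL Server or specifically about MySQL 8.
SQL Server has had change tracking since 2005 and temporal tables since 2016, so basically in all supported versions. You could build a better solution than Employee_audit. As for the format you want, there is no way to tell which change refers to which row (primary key).
@Techie321 you want the SQL and T-SQL tags, which are what the question is really about. SQL Server is just the RDBMS, whereas T-SQL is the language you are writing the query in.

Tags: sql sql-server tsql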

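【Solution 1】:

If you give each row a row number in a CTE and then join the CTE to itself on the previous row, you can compare the old and new values. Unioning the 2 different column names is a bit clunky; if you need a more robust solution you might consider pivoting the data.

You obviously also have to convert all values to a common data type, e.g. a string.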

declare @Test table (emp_audit_id int, eid int, [name] varchar(32), salary money);

insert into @Test (emp_audit_id, eid, [name], salary)
values
(1, 1, 'Daniel', 1000),
(2, 1, 'Dani', 1000),
(3, 1, 'Danny', 3000);

with cte as (
    select emp_audit_id, eid, [name], salary
      , row_number() over (partition by eid order by emp_audit_id) rn
    from @Test
)
select C.emp_audit_id, 'name' columnName, P.[Name] oldValue, C.[name] newValue
from cte C
left join cte P on P.eid = C.eid and P.rn + 1 = C.rn
where coalesce(C.[name],'') != coalesce(P.[Name],'')
union all
select C.emp_audit_id, 'salary' columnName, convert(varchar(21),P.salary), convert(varchar(21),C.salary)
from cte C
left join cte P on P.eid = C.eid and P.rn + 1 = C.rn
where coalesce(C.salary,0) != coalesce(P.salary,0)
order by C.emp_audit_id, columnName;

emp_audit_id	columnName	oldValue	newValue
1	name	NULL	Daniel
1	salary	NULL	1000.00
2	name	Daniel	Dani
3	name	Dani	Danny
3	salary	1000.00	3000.00

I strongly recommend adding DDL+DML (as shown above) to all your future questions, as it makes it much easier for people to help.
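
One caveat with the `coalesce(C.salary,0) != coalesce(P.salary,0)` filter above: it treats a NULL salary and a salary of 0 as equal, so a first row whose salary is 0 would not be reported. The following is a minimal sketch of a NULL-safe comparison using `EXISTS (SELECT ... EXCEPT SELECT ...)`. It is not part of the original answer, and the sample row with a salary of 0 is hypothetical, added only to illustrate the case.

```sql
-- Sketch only: the answer's table variable, with a hypothetical first salary of 0.
declare @Test table (emp_audit_id int, eid int, [name] varchar(32), salary money);

insert into @Test (emp_audit_id, eid, [name], salary)
values
(1, 1, 'Daniel', 0),      -- hypothetical row: first recorded salary is 0
(2, 1, 'Dani', 1000);

with cte as (
    select emp_audit_id, eid, salary
      , row_number() over (partition by eid order by emp_audit_id) rn
    from @Test
)
select C.emp_audit_id, 'salary' columnName
  , convert(varchar(21), P.salary) oldValue
  , convert(varchar(21), C.salary) newValue
from cte C
left join cte P on P.eid = C.eid and P.rn + 1 = C.rn
-- EXCEPT treats two NULLs as equal, so EXISTS is true only when the values
-- really differ, including NULL -> 0, which coalesce(..., 0) would hide.
where exists (select C.salary except select P.salary)
order by C.emp_audit_id;
```

The same predicate can be applied to the `name` branch, which avoids having to pick a sentinel value such as `''` or `0` for each data type.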

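【Comments】:

Thanks Dale, it works perfectly. :)
@Techie321 no worries - the other solution works perfectly well too - so you can learn both techniques and take your pick.
Dale, how can we do this in a single SELECT query? Suppose there are 10 fields: 10 SELECT queries plus the unions will make the query slow.
Techie321 I knew you would ask that. This time you need to ask a new question, making sure it accurately reflects your problem. What @PanagiotisKanavos suggested in the comments would be a starting point.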

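【Solution 2】:

The LEAD and LAG functions can help you here.

The "diffs" CTE computes the old and new value for every column you need to find differences in (some_table stands for your audit table):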

with diffs as (
    select 'name' colName, emp_audit_id, eid, lag(name, 1, null) over (partition by eid order by emp_audit_id) oldValue, name newValue
    from some_table
    union all
    select 'salary', emp_audit_id, eid, cast(lag(salary, 1, null) over (partition by eid order by emp_audit_id) as varchar), cast(salary as varchar) newValue
    from some_table
)
select * 
from diffs 
where oldValue <> newValue or oldValue is null 
order by emp_audit_id, eid
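
With many audited columns, repeating one SELECT per column means reading the table once per column. One possible alternative, sketched here under assumptions and not taken from either answer, is to fetch the previous values once with LAG and then turn each row into one row per column with `CROSS APPLY (VALUES ...)`. The table variable `@Test` and the names `prev`, `prevName`, `prevSalary` and `V` are illustrative; LAG requires SQL Server 2012 or later.

```sql
-- Sketch only: sample data copied from the question, held in a table variable.
declare @Test table (emp_audit_id int, eid int, [name] varchar(32), salary money);

insert into @Test (emp_audit_id, eid, [name], salary)
values
(1, 1, 'Daniel', 1000),
(2, 1, 'Dani', 1000),
(3, 1, 'Danny', 3000);

with prev as (
    -- One pass over the table: pick up the previous row's values per employee.
    select emp_audit_id, eid, [name], salary
      , lag([name]) over (partition by eid order by emp_audit_id) prevName
      , lag(salary) over (partition by eid order by emp_audit_id) prevSalary
    from @Test
)
select P.emp_audit_id, V.columnName, V.oldValue, V.newValue
from prev P
cross apply (values
    -- One VALUES row per audited column, converted to a common string type.
    ('name',   convert(varchar(32), P.prevName),   convert(varchar(32), P.[name])),
    ('salary', convert(varchar(32), P.prevSalary), convert(varchar(32), P.salary))
) V (columnName, oldValue, newValue)
-- Keep only real changes; a NULL on exactly one side counts as a change.
where V.oldValue <> V.newValue
   or (V.oldValue is null and V.newValue is not null)
   or (V.oldValue is not null and V.newValue is null)
order by P.emp_audit_id, V.columnName;
```

Adding another audited column then means one more LAG expression and one more VALUES row rather than another full SELECT. Whether this is actually faster depends on the data and indexes, so it is worth measuring against the UNION ALL versions.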

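【Comments】:

Hi @PanagiotisKanavos, that's an interesting point - post your answer and let the best solution (from the OP's perspective) win! (:
It would be ugly, even if it works.... To get multiple results per row, CROSS APPLY with rows could work. But the SELECT would need a CASE for each attribute field and one big CASE to determine the field name.
@Techie321 only you can find that out. Run both solutions against a database with real data and compare the output and the characteristics (IO, CPU time and execution time). The solution suggested by PanagiotisKanavos may well be the better approach here.
Hey @PanagiotisKanavos! I really think you need to post your idea as a separate answer.
If you turn on the execution plan display in SSMS, it shows the percentage of time taken by each query you run. So in my window, 8% went on setting up the table and data, 58% on running my query and 34% on running yours (I put them both in the same window). It's fairly rough and ready, and not very accurate (use statistics for that), but it gives an idea of which of the two solutions performs better.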
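
For the "use statistics" suggestion in the last comment, a minimal sketch of how the measurement could be set up in SSMS follows. `SET STATISTICS IO` and `SET STATISTICS TIME` are standard T-SQL session options; the query between them is only a placeholder against the question's Employee_audit table and should be replaced by whichever solution is being measured. Note that the percentages shown in an execution plan are estimated costs relative to the batch, so the figures reported here are usually the more reliable comparison.

```sql
-- Sketch only: run in SSMS against your own data and read the Messages tab.
set statistics io on;    -- reports logical/physical reads per table
set statistics time on;  -- reports CPU time and elapsed time per statement

-- Placeholder: substitute the full query you want to measure.
select emp_audit_id, eid
  , lag([name]) over (partition by eid order by emp_audit_id) oldValue
  , [name] newValue
from Employee_audit;

set statistics time off;
set statistics io off;
```

Running each candidate query this way on realistic data volumes, one after the other, gives directly comparable read counts and timings.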