【Posted】: 2021-11-28 10:33:39
【Question】:
First, I have this information:
- Weight A
- Weight B
- Relationship of B to A: one-to-many
With this, I can get the following result:
| A_Id | Weight A | Weight B | B_Id |
|---|---|---|---|
| 1 | 3 | 16 | 1 |
| 2 | 5 | 16 | 1 |
| 3 | 6 | 16 | 1 |
| 4 | 7 | 16 | 1 |
| 5 | 2 | 12 | 2 |
| 6 | 6 | 12 | 2 |
Now I add two more columns: Sum Weight A By B_Id and Accumulative Difference.
(Call the table below t2.)
| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulative Diff |
|---|---|---|---|---|---|
| 1 | 3 | 21 | 16 | 1 | 5 |
| 2 | 5 | 21 | 16 | 1 | 5 |
| 3 | 6 | 21 | 16 | 1 | 5 |
| 4 | 7 | 21 | 16 | 1 | 5 |
| 5 | 2 | 8 | 12 | 2 | 1 |
| 6 | 6 | 8 | 12 | 2 | 1 |
For example, in the table above:
- Accumulative Diff of the first row => 21 - 16 = 5
- Accumulative Diff of the fifth row => (21 + 8) - (16 + 12) = 1
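The arithmetic in the two bullets can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the `groups` list and the `accumulative_diffs` helper are hypothetical, hand-built from the table above, with one entry per B holding the group's summed Weight A and its single Weight B.

```python
# Per-B summary hand-copied from the table above: (B_Id, Sum Weight A By B_Id, Weight B)
groups = [(1, 3 + 5 + 6 + 7, 16), (2, 2 + 6, 12)]


def accumulative_diffs(groups):
    """Map each B_Id to (running sum of Weight A) - (running sum of Weight B)."""
    acc_a = acc_b = 0
    result = {}
    for b_id, sum_a, weight_b in groups:
        acc_a += sum_a      # Accumulate Weight A By B_Id
        acc_b += weight_b   # Accumulate Weight B, counted once per B
        result[b_id] = acc_a - acc_b
    return result


print(accumulative_diffs(groups))  # {1: 5, 2: 1}
```

Every A row then simply repeats the value of its B group.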
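So my goal is to calculate this 'Accumulative Difference'; the whole result will be shown in a report.
Technically, this can be done without any problem by using window functions.
First, I have to create 2 more columns, Accumulate Weight A By B_Id and Accumulate Weight B. Then it is just a matter of taking the difference between them.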
In fact, I need 3 more columns:
- [Row By B_Id]
- [Accumulate Weight A By B_Id]
- [Accumulate Weight B]
| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Row By B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 21 | 16 | 1 | 1 | 21 | 16 | 5 |
| 2 | 5 | 21 | 16 | 1 | 2 | 21 | 16 | 5 |
| 3 | 6 | 21 | 16 | 1 | 3 | 21 | 16 | 5 |
| 4 | 7 | 21 | 16 | 1 | 4 | 21 | 16 | 5 |
| 5 | 2 | 8 | 12 | 2 | 1 | 29 | 28 | 1 |
| 6 | 6 | 8 | 12 | 2 | 2 | 29 | 28 | 1 |
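The role of the three helper columns can be sketched in plain Python. This is a hypothetical illustration, not the author's code: consecutive rows with the same B_Id behave like peers under `ORDER BY B_Id` with SQL Server's default `RANGE` frame, so they share one running total. Weight B is added only on a group's first row, which is what `CASE WHEN [Row By B_Id] = 1` achieves in the SQL. The `rows` list and `add_helper_columns` function are assumptions made for the example.

```python
from itertools import groupby

# Rows from the table above: (A_Id, Weight A, Weight B, B_Id), already in report order.
rows = [(1, 3, 16, 1), (2, 5, 16, 1), (3, 6, 16, 1), (4, 7, 16, 1),
        (5, 2, 12, 2), (6, 6, 12, 2)]


def add_helper_columns(rows):
    """Return (A_Id, Row By B_Id, Accumulate Weight A, Accumulate Weight B, Diff) per row."""
    out = []
    acc_a = acc_b = 0
    # Consecutive rows with the same B_Id act like peers under ORDER BY B_Id with the
    # default RANGE frame: every row of the group sees the same running totals.
    for _, group in groupby(rows, key=lambda r: r[3]):
        group = list(group)
        acc_a += sum(r[1] for r in group)  # all Weight A values of this B
        acc_b += group[0][2]  # Weight B added once, i.e. only where Row By B_Id = 1
        for row_no, r in enumerate(group, start=1):
            out.append((r[0], row_no, acc_a, acc_b, acc_a - acc_b))
    return out


for line in add_helper_columns(rows):
    print(line)
```

Adding Weight B on every row instead would count 16 four times for B_Id 1, which is why the row number is needed.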
Sample SQL (generates t2):
```sql
;WITH tableA AS (
    SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
    UNION ALL
    SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
    UNION ALL
    SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
    UNION ALL
    SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
    UNION ALL
    SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
    UNION ALL
    SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    --Uncomment for testing NULL value
    --UNION ALL
    --SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
    --UNION ALL
    --SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
),
tableB AS (
    SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
    UNION ALL
    SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'
    --Uncomment for testing NULL value
    --UNION ALL
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    -- One row per A with its B's weight; [Row By B_Id] numbers the A rows within each B
    SELECT
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER (PARTITION BY tableB.B_Id ORDER BY tableA.A_Id)
    FROM
        tableA
        FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    -- Running totals ordered by B_Id; rows with the same B_Id are peers and share a total.
    -- [Sum Weight A By B_Id] here is the running total shown as "Accumulate Weight A By B_Id" above.
    -- Weight B is only counted on the first row of each B, so it is not added once per A.
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT
    *
FROM t2
```
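Now here is the problem: what if one of the B_Id values is NULL? (Uncomment the parts that generate a NULL B_Id.)
Below is my expected result; pay particular attention to the highlighted rows: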
| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 21 | 16 | 1 | 21 | 16 | 5 |
| 2 | 5 | 21 | 16 | 1 | 21 | 16 | 5 |
| 3 | 6 | 21 | 16 | 1 | 21 | 16 | 5 |
| 4 | 7 | 21 | 16 | 1 | 21 | 16 | 5 |
| 5 | 2 | 8 | 12 | 2 | 29 | 28 | 1 |
| 6 | 6 | 8 | 12 | 2 | 29 | 28 | 1 |
| 7 | 9 | 9 | 0 | NULL | 38 | 28 | 10 |
| 8 | 7 | 10 | 8 | 3 | 48 | 36 | 12 |
| 9 | 3 | 10 | 8 | 3 | 48 | 36 | 12 |
However, my sample query does not produce this. Instead, the row with the NULL B_Id appears first (the order is wrong).
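Here is a hypothetical reproduction of the wrong order, run in SQLite through Python's built-in `sqlite3` module rather than SQL Server. It assumes SQLite 3.25 or later for window functions. It also swaps the FULL JOIN for a LEFT JOIN, because FULL JOIN only arrived in SQLite 3.39, and shortens the column names (`RowByB`, `AccA`, `AccB`, `Diff` are made up for the example). Like SQL Server, SQLite sorts NULLs first in ascending order. The NULL `B_Id` row therefore becomes the first peer group of `ORDER BY B_Id`, and its Weight A is added to every running total after it.

```python
import sqlite3

# SQLite translation of the question's t1a/t1b/t2 CTEs with the NULL test rows enabled.
# Assumes SQLite >= 3.25 (window functions); LEFT JOIN replaces FULL JOIN.
QUERY = """
WITH tableA(A_Id, Weight, B_Id) AS (
    VALUES (1, 3, 1), (2, 5, 1), (3, 6, 1), (4, 7, 1),
           (5, 2, 2), (6, 6, 2), (7, 9, NULL), (8, 10, 3)
),
tableB(B_Id, Weight) AS (
    VALUES (1, 16), (2, 12), (3, 8)
),
t1a AS (
    SELECT a.A_Id, a.Weight AS WeightA, b.Weight AS WeightB, b.B_Id,
           ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id) AS RowByB
    FROM tableA a LEFT JOIN tableB b ON a.B_Id = b.B_Id
),
t1b AS (
    SELECT *,
           SUM(WeightA) OVER (ORDER BY B_Id) AS AccA,
           SUM(CASE WHEN RowByB = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id) AS AccB
    FROM t1a
)
SELECT A_Id, B_Id, AccA, AccB, AccA - AccB AS Diff
FROM t1b
ORDER BY B_Id, A_Id  -- NULL B_Id sorts first, just as in the window ORDER BY
"""

conn = sqlite3.connect(":memory:")
result = conn.execute(QUERY).fetchall()
conn.close()
for row in result:
    print(row)
```

The first row comes out as `(7, None, 9, None, None)`, and the B_Id 1 rows then show 30 - 16 = 14 instead of 5. The SUM over the single NULL Weight B is itself NULL.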
So my question is: how do I handle this case, keeping the rows in the same order as the expected result?
Why is the order like this? (asked by @ThorstenKettner)
The default order is based on B_TransactionDatetime. If B_Id is NULL, the order is based on A_TransactionDatetime instead. So I compute another column, RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and sort by it.
PS:
Inspired by @ThorstenKettner, I realized I should use RefDateTime in the window functions, i.e.:
```sql
[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)
```
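Here is a sketch of the whole fix, again in SQLite through Python's `sqlite3` rather than SQL Server. It makes the same assumptions as a minimal translation: SQLite 3.25+, a LEFT JOIN instead of FULL JOIN, the `[date]` columns renamed `dt`, and hypothetical short names (`RowByB`, `AccA`, `AccB`). `WeightB` is coalesced to 0 to match the expected table. Both window sums are ordered by `RefDateTime = COALESCE(b.dt, a.dt)`, using the commented-out test rows from the question's SQL.

```python
import sqlite3

# The fix in SQLite syntax (assumes SQLite >= 3.25 for window functions).
# RefDateTime = COALESCE(B date, A date) replaces B_Id as the window ORDER BY key.
QUERY = """
WITH tableA(A_Id, Weight, B_Id, dt) AS (
    VALUES (1, 3, 1, '2021-10-01'), (2, 5, 1, '2021-10-02'),
           (3, 6, 1, '2021-10-03'), (4, 7, 1, '2021-10-04'),
           (5, 2, 2, '2021-10-05'), (6, 6, 2, '2021-10-06'),
           (7, 9, NULL, '2021-10-07'), (8, 10, 3, '2021-10-08')
),
tableB(B_Id, Weight, dt) AS (
    VALUES (1, 16, '2021-10-03'), (2, 12, '2021-10-06'), (3, 8, '2021-10-08')
),
t1a AS (
    SELECT a.A_Id, a.Weight AS WeightA, COALESCE(b.Weight, 0) AS WeightB, b.B_Id,
           COALESCE(b.dt, a.dt) AS RefDateTime,  -- fall back to A's date when there is no B
           ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id) AS RowByB
    FROM tableA a LEFT JOIN tableB b ON a.B_Id = b.B_Id
),
t1b AS (
    SELECT *,
           SUM(WeightA) OVER (ORDER BY RefDateTime) AS AccA,
           SUM(CASE WHEN RowByB = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime) AS AccB
    FROM t1a
)
SELECT A_Id, B_Id, AccA, AccB, AccA - AccB AS Diff
FROM t1b
ORDER BY RefDateTime, A_Id
"""

conn = sqlite3.connect(":memory:")
result = conn.execute(QUERY).fetchall()
conn.close()
for row in result:
    print(row)
```

This gives the Accumulative Diff values 5, 1, 10 and 12 from the expected table. Rows sharing a RefDateTime are peers under the default RANGE frame and receive the same totals. That is what keeps all A rows of one B together here, but it would also merge two different B rows that happen to share a timestamp.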
Case closed.
【Discussion】:
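- Are you using `-` to represent `NULL`s here? If you used `NULL`, it would be far less confusing. `'-'` is a string, which means you are storing numeric data in a `varchar`; that is a major design flaw.
- You talk about a set of data where `B_Id` is `NULL`, but none of your sample data demonstrates this problem.
- @Larnu Thanks for the feedback. I just added a column `B_Id`. Also, '-' is not NULL; you can treat it as the same value as the previous row. (I used a hyphen to stand for the other values.)
- You need to correct your data, zeroflaw. `The value of the previous row`: if multiple rows have the same value, then that value should appear in your sample data as well.
- So there are As without a B. However, I don't understand by what criteria you order the rows. At first I thought it was by b.id, a.weight, but the position of the NULL b.id row contradicts that, and so does the order of the b.id 3 rows.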
Tags: sql sql-server window-functions