【Question title】: SQL - How to do Window function if there is NULL value?
【Posted】: 2021-11-28 10:33:39
【Question】:

First, I have this information:

  1. Weight A
  2. Weight B
  3. Relationship of B to A: one to many

From this, the following result can be obtained:

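| A_Id | Weight A | Weight B | B_Id |
|------|----------|----------|------|
| 1 | 3 | 16 | 1 |
| 2 | 5 | 16 | 1 |
| 3 | 6 | 16 | 1 |
| 4 | 7 | 16 | 1 |
| 5 | 2 | 12 | 2 |
| 6 | 6 | 12 | 2 |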

Now, add two more columns: Sum Weight A By B_Id and Accumulative Difference (treat the table below as t2):

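| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulative Diff |
|------|----------|----------------------|----------|------|-------------------|
| 1 | 3 | 21 | 16 | 1 | 5 |
| 2 | 5 | 21 | 16 | 1 | 5 |
| 3 | 6 | 21 | 16 | 1 | 5 |
| 4 | 7 | 21 | 16 | 1 | 5 |
| 5 | 2 | 8 | 12 | 2 | 1 |
| 6 | 6 | 8 | 12 | 2 | 1 |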

For example, in the table above:

  1. Accumulative difference for the first row => 21 - 16 = 5

  2. Accumulative difference for the fifth row => (21 + 8) - (16 + 12) = 1
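
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the intended calculation, not the SQL itself; the helper name `accumulative_diff` is made up for this example, and it assumes Weight B should be counted once per B_Id.

```python
from itertools import groupby

# Sample rows from the question: (A_Id, Weight A, Weight B, B_Id)
rows = [
    (1, 3, 16, 1),
    (2, 5, 16, 1),
    (3, 6, 16, 1),
    (4, 7, 16, 1),
    (5, 2, 12, 2),
    (6, 6, 12, 2),
]

def accumulative_diff(rows):
    """Return {B_Id: running sum of Weight A minus running sum of Weight B}."""
    result = {}
    running_a = 0
    running_b = 0
    by_b_id = lambda r: r[3]
    # groupby needs its input sorted by the grouping key (B_Id here).
    for b_id, group in groupby(sorted(rows, key=by_b_id), key=by_b_id):
        group = list(group)
        running_a += sum(weight_a for _, weight_a, _, _ in group)
        running_b += group[0][2]  # Weight B counts once per B_Id
        result[b_id] = running_a - running_b
    return result

print(accumulative_diff(rows))  # {1: 5, 2: 1}
```

Every A row of a given B_Id then shows the value of its group: 5 for B_Id 1 and 1 for B_Id 2.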

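So my goal is to calculate this 'Accumulative Difference'. The whole result will be displayed in a report.

Technically, this can be achieved without any problem by using window functions. First, I have to create two more columns: Accumulate Weight A By B_Id and Accumulate Weight B. Then it is just a matter of taking the difference between the two.

I actually need 3 more columns:

  • [Row By B_Id]
  • [Sum Weight A By B_Id]
  • [Accumulate Weight B]

| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Row By B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|------|----------|----------------------|----------|------|-------------|-----------------------------|---------------------|-------------------|
| 1 | 3 | 21 | 16 | 1 | 1 | 21 | 16 | 5 |
| 2 | 5 | 21 | 16 | 1 | 2 | 21 | 16 | 5 |
| 3 | 6 | 21 | 16 | 1 | 3 | 21 | 16 | 5 |
| 4 | 7 | 21 | 16 | 1 | 4 | 21 | 16 | 5 |
| 5 | 2 | 8 | 12 | 2 | 1 | 29 | 28 | 1 |
| 6 | 6 | 8 | 12 | 2 | 2 | 29 | 28 | 1 |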

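Earlier sketch (struck through in the original post):

SELECT
    *,
    [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY ... ORDER BY B_Id),
    [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY ... ORDER BY B_Id)
FROM t2
-- (...) could be by date, year, month
-- Accumulate Weight B may only be set on the first row, etc.

Sample SQL (to generate t2):
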
;WITH tableA AS (
SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
UNION
SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
UNION
SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
UNION
SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
UNION
SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
UNION
SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    
--Uncomment for testing NULL value
--UNION
--SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
--UNION
--SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
    
),
tableB AS (
     SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
     UNION
     SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'

    --Uncomment for testing NULL value
    --UNION
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    SELECT 
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
    FROM 
        tableA 
    FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT 
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT 
    *
FROM t2

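Now here comes the problem: what if one of the B_Id values is NULL? (Uncomment the parts that generate the NULL B_Id.)

Below is my expected result, especially on the highlighted rows: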

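| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|------|----------|----------------------|----------|------|-----------------------------|---------------------|-------------------|
| 1 | 3 | 21 | 16 | 1 | 21 | 16 | 5 |
| 2 | 5 | 21 | 16 | 1 | 21 | 16 | 5 |
| 3 | 6 | 21 | 16 | 1 | 21 | 16 | 5 |
| 4 | 7 | 21 | 16 | 1 | 21 | 16 | 5 |
| 5 | 2 | 8 | 12 | 2 | 29 | 28 | 1 |
| 6 | 6 | 8 | 12 | 2 | 29 | 28 | 1 |
| 7 | 9 | 9 | 0 | NULL | 38 | 28 | 10 |
| 8 | 7 | 10 | 8 | 3 | 48 | 36 | 12 |
| 9 | 3 | 10 | 8 | 3 | 48 | 36 | 12 |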

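However, this does not work with my sample query. Instead, I get the following:

The NULL B_Id appears in the first row. (The order is wrong.)

So my question is: how do I handle this situation? (The rows should keep their original order, consistent with the expected result.)

Why is the order like this? (asked by @ThorstenKettner)

The default order is based on B_TransactionDatetime. If B_Id is NULL, it is based on A_TransactionDatetime instead. So I compute another column, RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and sort by that.

PS:

Inspired by @ThorstenKettner, I should use RefDateTime in the window functions, i.e.:

[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)

Case closed.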
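
To see the effect of the ordering key, here is a minimal sketch that runs both variants against the sample data with the NULL rows uncommented. It uses SQLite (3.25 or later, for window functions) through Python's `sqlite3` module as a stand-in for SQL Server. Both engines sort NULLs first in ascending order, but the rest is an assumption of this sketch: the join is a LEFT JOIN instead of the FULL JOIN in the question, `COALESCE(WeightB, 0)` is added so the row without a B contributes 0, and `running_diff` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (A_Id INTEGER, Weight INTEGER, B_Id INTEGER, date TEXT);
CREATE TABLE tableB (B_Id INTEGER, Weight INTEGER, date TEXT);
INSERT INTO tableA VALUES
  (1, 3, 1, '2021-10-01'), (2, 5, 1, '2021-10-02'),
  (3, 6, 1, '2021-10-03'), (4, 7, 1, '2021-10-04'),
  (5, 2, 2, '2021-10-05'), (6, 6, 2, '2021-10-06'),
  (7, 9, NULL, '2021-10-07'), (8, 10, 3, '2021-10-08');
INSERT INTO tableB VALUES
  (1, 16, '2021-10-03'), (2, 12, '2021-10-06'), (3, 8, '2021-10-08');
""")

# {key} is replaced by the column used to order the running sums.
QUERY = """
WITH t1a AS (
    SELECT a.A_Id,
           a.Weight AS WeightA,
           b.Weight AS WeightB,
           b.B_Id,
           COALESCE(b.date, a.date) AS RefDateTime,
           ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id) AS RowByB
    FROM tableA a
    LEFT JOIN tableB b ON a.B_Id = b.B_Id
)
SELECT A_Id,
       SUM(WeightA) OVER (ORDER BY {key})
       - SUM(CASE WHEN RowByB = 1 THEN COALESCE(WeightB, 0) ELSE 0 END)
             OVER (ORDER BY {key}) AS Diff
FROM t1a
ORDER BY {key}, A_Id
"""

def running_diff(order_key):
    """Return (A_Id, accumulative difference) rows ordered by order_key."""
    return conn.execute(QUERY.format(key=order_key)).fetchall()

print(running_diff("B_Id"))         # the NULL row (A_Id 7) comes first
print(running_diff("RefDateTime"))  # rows stay in date order
```

Ordering by B_Id puts A_Id 7 at the top and shifts every later running total, while ordering by RefDateTime yields the differences 5, 1, 10 and 12 from the expected result.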

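【Question comments】:

  • Are you using - to represent NULLs here? It would be far less confusing if you used NULL. '-' is a string, which would mean you are storing numeric data in a varchar; a major design flaw.
  • You talk about a set of data with a NULL B_Id, but none of your sample data demonstrates the problem.
  • @Larnu Thanks for the feedback. I have just added a B_Id column. Also, '-' is not NULL; you can read it as the same value as the previous row. (I used the hyphen as a placeholder for those values.)
  • Then you need to correct your data, zeroflaw. If the value is "the value of the previous row" and several rows share the same value, those values should be written out in your sample data too.
  • So there are A rows without a B. What I don't understand, though, is by what criteria you order the rows. At first I thought by b.id, a.weight, but the position of the b.id NULL row contradicts the order of the rows with b.id 3.

Tags: sql sql-server window-functions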


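【Solution 1】:

You want to outer join B to A, because not every A has an associated B.

Then you look at the rows in blocks. A block is either all rows belonging to one B, or a single A row that has no B. b_id makes a good group key for the former, and a_id works for the latter. For a combined key there are different options. COALESCE(b_id, a_id) is not one of them, because the result set can contain both an a_id 1 and a b_id 1, and we don't want those in the same group. One solution is simply COALESCE(b_id, -a_id), provided of course that your IDs cannot be negative.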
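
The key collision described above can be shown with a small Python sketch. The `coalesce` helper and the three sample pairs are made up for illustration; they only mimic what SQL's COALESCE would return for each joined row.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

# (a_id, b_id) pairs after outer joining B to A; b_id is None when A has no B.
joined = [(1, None), (2, 1), (3, 1)]

# Naive key: the A row without a B (a_id 1) lands in the same group as b_id 1.
naive_keys = [coalesce(b_id, a_id) for a_id, b_id in joined]

# Negating a_id keeps the two ID ranges apart, as long as IDs are never negative.
safe_keys = [coalesce(b_id, -a_id) for a_id, b_id in joined]

print(naive_keys)  # [1, 1, 1]
print(safe_keys)   # [-1, 1, 1]
```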

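Now, all your calculations are based on the aggregated groups, i.e. you are not interested in the individual A values when they belong to a B group. For this reason I would aggregate right away and only join the individual A rows again at the end.

The rows are ordered by COALESCE(b_date, a_date).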

with grouped as
(
  -- one row per block: a B with all of its A rows, or a single A row without a B
  select
    coalesce(b.b_id, -a.a_id) as grp_id,
    max(coalesce(b.date, a.date)) as grp_date,
    coalesce(max(b.weight), 0) as b_weight,
    sum(a.weight) as a_weight
  from a
  left join b on b.b_id = a.b_id
  group by coalesce(b.b_id, -a.a_id)
)
, calculated as
(
  -- running difference over the blocks, in date order
  select
    grp_id,
    grp_date,
    b_weight,
    a_weight,
    sum(a_weight - b_weight) over (order by grp_date) as running_diff
  from grouped
)
-- join the individual A rows back to their block
select *
from calculated c
join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
order by c.grp_date, a.date;

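I hope I got everything right. I don't have a computer at hand and am typing this on my phone, which turned out to be harder than I thought :-)

【Comments】:

  • Could someone please edit this code for me? It doesn't work on my phone. There is no {} button, and indenting by four spaces doesn't seem to work properly for me either.
  • I have edited it for you :) Your answer inspired me to ORDER BY RefDateTime, i.e. COALESCE(b_date, a_date), in the window functions. (That does the job.) I also learned that COALESCE(b_id, -a_id) can serve as a group key. Awesome.
【Solution 2】:

You can use coalesce().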

SELECT 
 *,
 [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
 [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
 SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference

FROM t2

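PS: Actually, your initial query looks wrong to me; if it is correct, then this will do. You should probably provide sample data for A and B. To me it makes more sense to sum() before joining them.

【Comments】:

  • Hi, my apologies. I have added more information to the question.
【Solution 3】: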

You will have to make changes, but this should help.

SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
        PARTITION BY...ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(WeightB) OVER (
        PARTITION BY...ORDER BY B_Id
        )
FROM t2
WHERE B_Id IS NOT NULL

UNION

SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
FROM (
    SELECT WeightA
        ,WeightB
        ,B_Id
        ,ROW_NUMBER() OVER (
            ORDER BY B_Id
            ) AS ROW_NUM
    FROM t2
    WHERE B_Id IS NULL
    ) TAB

【Comments】:

  • Hi, my apologies. I have added more information to the question. Judging from your query, does this mean the final order will change?