How can I create a column which computes only the change of another column on Redshift? (Answers)

【Question title】: How can I create a column which computes only the change of another column on Redshift?
【Posted】: 2021-03-20 07:03:05
【Question】:

I have this dataset:

product   customer    date                        value     buyer_position
A         123455      2020-01-01 00:01:01         100       1
A         123456      2020-01-02 00:02:01         100       2
A         523455      2020-01-02 00:02:05         100       NULL
A         323455      2020-01-03 00:02:07         100       NULL
A         423455      2020-01-03 00:09:01         100       3
B         100455      2020-01-01 00:03:01         100       1
B         999445      2020-01-01 00:04:01         100       NULL
B         122225      2020-01-01 00:04:05         100       2
B         993848      2020-01-01 10:04:05         100       3
B         133225      2020-01-01 11:04:05         100       NULL
B         144225      2020-01-01 12:04:05         100       4

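The dataset contains the products the company sells and the customers who saw each product. A customer can see more than one product, but each product + customer combination appears only once. For each row, I want to know how many people had already bought the product before that customer saw it.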

This would be the ideal output:

product   customer    date                        value     buyer_position     people_before
A         123455      2020-01-01 00:01:01         100       1                  0
A         123456      2020-01-02 00:02:01         100       2                  1
A         523455      2020-01-02 00:02:05         100       NULL               2
A         323455      2020-01-03 00:02:07         100       NULL               2
A         423455      2020-01-03 00:09:01         100       3                  2
B         100455      2020-01-01 00:03:01         100       1                  0
B         999445      2020-01-01 00:04:01         100       NULL               1
B         122225      2020-01-01 00:04:05         100       2                  1
B         993848      2020-01-01 10:04:05         100       3                  2
B         133225      2020-01-01 11:04:05         100       NULL               3
B         144225      2020-01-01 12:04:05         100       4                  3

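As you can see, when customer 122225 saw the product he wanted, one person had already bought it. In the case of customer 323455, two people had already bought product A.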

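I think I should use a window function such as lag(). But lag() only looks at a single earlier row, so it does not give me this "cumulative" information. So I'm a bit lost.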

【Question comments】:

Tags: sql count amazon-redshift window-functions lag

【Solution 1】:

This looks like a window count of the non-null values of buyer_position over the preceding rows:

select t.*,
    coalesce(count(buyer_position) over(
        partition by product
        order by date
        rows between unbounded preceding and 1 preceding
    ), 0) as people_before
from mytable t
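
To check the query against the sample data without a Redshift cluster, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (an assumption, and it needs SQLite 3.25 or newer for window functions); the function name `people_before` and the final `order by` are added for illustration and are not part of the answer.

```python
import sqlite3

# Sample rows from the question: (product, customer, date, value, buyer_position)
ROWS = [
    ("A", 123455, "2020-01-01 00:01:01", 100, 1),
    ("A", 123456, "2020-01-02 00:02:01", 100, 2),
    ("A", 523455, "2020-01-02 00:02:05", 100, None),
    ("A", 323455, "2020-01-03 00:02:07", 100, None),
    ("A", 423455, "2020-01-03 00:09:01", 100, 3),
    ("B", 100455, "2020-01-01 00:03:01", 100, 1),
    ("B", 999445, "2020-01-01 00:04:01", 100, None),
    ("B", 122225, "2020-01-01 00:04:05", 100, 2),
    ("B", 993848, "2020-01-01 10:04:05", 100, 3),
    ("B", 133225, "2020-01-01 11:04:05", 100, None),
    ("B", 144225, "2020-01-01 12:04:05", 100, 4),
]

# Same window expression as the answer; the outer ORDER BY is added only
# to make the result order deterministic for the check below.
QUERY = """
select t.product, t.customer,
    coalesce(count(buyer_position) over(
        partition by product
        order by date
        rows between unbounded preceding and 1 preceding
    ), 0) as people_before
from mytable t
order by t.product, t.date
"""


def people_before():
    """Load the sample rows into SQLite (stand-in for Redshift) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (product text, customer integer, "
        "date text, value integer, buyer_position integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in people_before():
        print(row)
```

count(buyer_position) skips NULLs, and the frame ends at `1 preceding`, so the current row's own purchase is never counted. That is why a row with a NULL buyer_position still gets the running total of earlier buyers.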

【Comments】:

【Solution 2】:

Hmm . . . If I understand correctly, you want the maximum value of buyer_position for the customer/product, minus 1:

select t.*,
       max(buyer_position) over (partition by customer, product order by date rows between unbounded preceding and current row) - 1
from t;

【Comments】:

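It didn't work. When the buyer_position column is null, the people_before column is also null. That's not what I want.
@dummyds . . . Your sample data has no example where the first buyer position is NULL, so it's unclear what you want in that case.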
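
One possible reading of the NULL results reported above: each product + customer pair occurs only once, so `partition by customer, product` gives every row its own single-row partition, and the max is just that row's own buyer_position. The sketch below compares that with a variant partitioned by product only and ending at the previous row. The variant is an assumption added for illustration, not part of either answer, and it only matches the expected output if buyer_position increases 1, 2, 3, ... without gaps within a product. SQLite again stands in for Redshift, and the aliases `per_customer` and `per_product` and the function name `compare` are hypothetical.

```python
import sqlite3

# Product B rows from the question: (product, customer, date, value, buyer_position)
ROWS = [
    ("B", 100455, "2020-01-01 00:03:01", 100, 1),
    ("B", 999445, "2020-01-01 00:04:01", 100, None),
    ("B", 122225, "2020-01-01 00:04:05", 100, 2),
    ("B", 993848, "2020-01-01 10:04:05", 100, 3),
    ("B", 133225, "2020-01-01 11:04:05", 100, None),
    ("B", 144225, "2020-01-01 12:04:05", 100, 4),
]

QUERY = """
select customer,
       -- Solution 2 as posted: one row per partition, so NULL stays NULL
       max(buyer_position) over (partition by customer, product order by date
           rows between unbounded preceding and current row) - 1 as per_customer,
       -- Illustrative variant (assumption): highest position among earlier rows
       coalesce(max(buyer_position) over (partition by product order by date
           rows between unbounded preceding and 1 preceding), 0) as per_product
from t
order by date
"""


def compare():
    """Return (customer, per_customer, per_product) for each product B row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (product text, customer integer, "
        "date text, value integer, buyer_position integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in compare():
        print(row)
```

If buyer positions can have gaps or repeats, counting non-null values as in Solution 1 is the safer choice, since it does not depend on the numbering.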