[Posted]: 2022-12-01 07:14:27
[Question]:
Sample table:
| object_id | event_time | event_type | event_subtype | stage |
|---|---|---|---|---|
| 1 | 2022-10-01 | create | name, stage | A |
| 1 | 2022-10-02 | update | stage | B |
| 1 | 2022-10-03 | update | stage | C |
| 1 | 2022-10-04 | update | stage | A |
| 2 | 2022-10-01 | create | name, stage | A |
| 2 | 2022-10-02 | update | stage | C |
| 2 | 2022-10-03 | update | stage | A |
| 2 | 2022-10-04 | update | stage | B |
| 2 | 2022-10-05 | update | stage | C |
| 2 | 2022-10-06 | update | stage | A |
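What I need is a column that numbers the rows based on stage: once an object_id has reached stage C, the number for the following rows of that same object_id should increase by one. It would look like this: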
| object_id | event_time | event_type | event_subtype | stage | row_number |
|---|---|---|---|---|---|
| 1 | 2022-10-01 | create | name, stage | A | 1 |
| 1 | 2022-10-02 | update | stage | B | 1 |
| 1 | 2022-10-03 | update | stage | C | 1 |
| 1 | 2022-10-04 | update | stage | A | 2 |
| 2 | 2022-10-01 | create | name, stage | A | 1 |
| 2 | 2022-10-02 | update | stage | C | 1 |
| 2 | 2022-10-03 | update | stage | A | 2 |
| 2 | 2022-10-04 | update | stage | B | 2 |
| 2 | 2022-10-05 | update | stage | C | 2 |
| 2 | 2022-10-06 | update | stage | A | 3 |
The table must be ordered by object_id, event_time. I am having trouble writing a window function that does this. Here is what I tried:
row_number() over (partition by object_id, stage order by event_time)
It does not work for all cases. I also struggle to see how it could work at all, since I never define stage = C as the delimiter anywhere. Any ideas?
Thanks!
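
One way to make stage = C the explicit delimiter is to count, for each row, how many C rows of the same object_id come strictly before it, and add 1. The following is only a minimal sketch of that idea, not a verified Presto solution: it runs the query in SQLite (3.25 or later, for window functions) through Python so the sample data above can be checked. The table name `events` and the column alias `cycle_no` are assumptions made for illustration. The frame clause `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` is standard SQL and should carry over to Presto, but that has not been tested here.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The table name "events" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "object_id INTEGER, event_time TEXT, event_type TEXT, "
    "event_subtype TEXT, stage TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2022-10-01", "create", "name, stage", "A"),
        (1, "2022-10-02", "update", "stage", "B"),
        (1, "2022-10-03", "update", "stage", "C"),
        (1, "2022-10-04", "update", "stage", "A"),
        (2, "2022-10-01", "create", "name, stage", "A"),
        (2, "2022-10-02", "update", "stage", "C"),
        (2, "2022-10-03", "update", "stage", "A"),
        (2, "2022-10-04", "update", "stage", "B"),
        (2, "2022-10-05", "update", "stage", "C"),
        (2, "2022-10-06", "update", "stage", "A"),
    ],
)

# For each row, sum the C rows that precede it within the same object_id.
# The frame ends at "1 PRECEDING", so a C row itself still belongs to the
# group it closes. The frame is empty on the first row of each object_id,
# where SUM yields NULL, hence the COALESCE to 0.
QUERY = """
SELECT object_id,
       event_time,
       stage,
       1 + COALESCE(
               SUM(CASE WHEN stage = 'C' THEN 1 ELSE 0 END) OVER (
                   PARTITION BY object_id
                   ORDER BY event_time
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ),
               0
           ) AS cycle_no
FROM events
ORDER BY object_id, event_time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The attempt with `partition by object_id, stage` numbers rows within each stage separately, which is why it cannot track the C boundaries; partitioning by object_id alone and accumulating a C flag is what ties the counter to stage C.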
[Comments]:
Tags: sql presto row-number