【Question title】: row_number() expression question in SQL Presto
【Posted】: 2022-12-01 07:14:27
【Question】:

Sample table:

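| object_id | event_time | event_type | event_subtype | stage |
|-----------|------------|------------|---------------|-------|
| 1 | 2022-10-01 | create | name, stage | A |
| 1 | 2022-10-02 | update | stage | B |
| 1 | 2022-10-03 | update | stage | C |
| 1 | 2022-10-04 | update | stage | A |
| 2 | 2022-10-01 | create | name, stage | A |
| 2 | 2022-10-02 | update | stage | C |
| 2 | 2022-10-03 | update | stage | A |
| 2 | 2022-10-04 | update | stage | B |
| 2 | 2022-10-05 | update | stage | C |
| 2 | 2022-10-06 | update | stage | A |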

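What I need is a column that numbers the rows by stage: after an object_id reaches stage C, the row number for that same object_id should increment. It should look like this: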

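| object_id | event_time | event_type | event_subtype | stage | row_number |
|-----------|------------|------------|---------------|-------|------------|
| 1 | 2022-10-01 | create | name, stage | A | 1 |
| 1 | 2022-10-02 | update | stage | B | 1 |
| 1 | 2022-10-03 | update | stage | C | 1 |
| 1 | 2022-10-04 | update | stage | A | 2 |
| 2 | 2022-10-01 | create | name, stage | A | 1 |
| 2 | 2022-10-02 | update | stage | C | 1 |
| 2 | 2022-10-03 | update | stage | A | 2 |
| 2 | 2022-10-04 | update | stage | B | 2 |
| 2 | 2022-10-05 | update | stage | C | 2 |
| 2 | 2022-10-06 | update | stage | A | 3 |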

The table must stay ordered by object_id, event_time. I'm having trouble writing a window function that does this. Here is what I tried:

row_number() over (partition by object_id, stage order by event_time)

It just doesn't work in every case. I also don't understand how it could work at all, since I never define stage = C as the delimiter anywhere. Any ideas?

Thanks!
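To see why the attempted window fails, here is a small Python sketch that mimics `ROW_NUMBER() OVER (PARTITION BY object_id, stage ORDER BY event_time)` on the sample rows. It is an illustration, not Presto code; the names `ROWS`, `EXPECTED` and `row_number_by_object_and_stage` are hypothetical, and only the columns that matter here are kept.

```python
from collections import defaultdict

# Sample rows from the question, reduced to (object_id, event_time, stage)
ROWS = [
    (1, "2022-10-01", "A"), (1, "2022-10-02", "B"),
    (1, "2022-10-03", "C"), (1, "2022-10-04", "A"),
    (2, "2022-10-01", "A"), (2, "2022-10-02", "C"),
    (2, "2022-10-03", "A"), (2, "2022-10-04", "B"),
    (2, "2022-10-05", "C"), (2, "2022-10-06", "A"),
]

# The row_number column the question expects, in the same row order
EXPECTED = [1, 1, 1, 2, 1, 1, 2, 2, 2, 3]


def row_number_by_object_and_stage(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY object_id, stage ORDER BY event_time)."""
    counters = defaultdict(int)
    result = []
    # Walk rows in (object_id, event_time) order; one counter per (object_id, stage)
    for object_id, _event_time, stage in sorted(rows, key=lambda r: (r[0], r[1])):
        counters[(object_id, stage)] += 1
        result.append(counters[(object_id, stage)])
    return result


attempt = row_number_by_object_and_stage(ROWS)
mismatches = [
    (row, got, want)
    for row, got, want in zip(ROWS, attempt, EXPECTED)
    if got != want
]
print(mismatches)  # [((2, '2022-10-04', 'B'), 1, 2)]
```

Partitioning by stage counts how many times each stage has occurred so far for that object. That only coincides with the desired cycle number when every cycle visits every stage, as object 1 happens to do. Object 2 reaches stage B for the first time in its second cycle, so the count is 1 where 2 is wanted.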

【Discussion】:

    Tags: sql presto row-number


    【Solution 1】:

    Producing the final "correct" order is not the job of row_number.

    For that, you need to use ORDER BY:

    SELECT
    "object_id", "event_time", "event_type", "event_subtype", "stage",
    ROW_NUMBER() OVER(PARTITION BY "object_id","stage" ORDER BY "event_time") rn
      FROM tab1
      ORDER BY "object_id",rn,"stage"
    
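    | object_id | event_time | event_type | event_subtype | stage | rn |
    |-----------|------------|------------|---------------|-------|----|
    | 1 | 2022-10-01 | create | name, stage | A | 1 |
    | 1 | 2022-10-02 | update | stage | B | 1 |
    | 1 | 2022-10-03 | update | stage | C | 1 |
    | 1 | 2022-10-04 | update | stage | A | 2 |
    | 2 | 2022-10-01 | create | name, stage | A | 1 |
    | 2 | 2022-10-04 | update | stage | B | 1 |
    | 2 | 2022-10-02 | update | stage | C | 1 |
    | 2 | 2022-10-03 | update | stage | A | 2 |
    | 2 | 2022-10-05 | update | stage | C | 2 |
    | 2 | 2022-10-06 | update | stage | A | 3 |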
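A quick Python check makes the effect of this answer's final ORDER BY visible. It simulates the SQL rather than running it, and `add_rn`, `solution1` and `times_ascending` are hypothetical helper names. Sorting by object_id, rn, stage reproduces the row order in the output table above, and that order no longer keeps each object's rows sorted by event_time, which the question requires.

```python
from collections import defaultdict

# Sample rows from the question, reduced to (object_id, event_time, stage)
ROWS = [
    (1, "2022-10-01", "A"), (1, "2022-10-02", "B"),
    (1, "2022-10-03", "C"), (1, "2022-10-04", "A"),
    (2, "2022-10-01", "A"), (2, "2022-10-02", "C"),
    (2, "2022-10-03", "A"), (2, "2022-10-04", "B"),
    (2, "2022-10-05", "C"), (2, "2022-10-06", "A"),
]


def add_rn(rows):
    """Append ROW_NUMBER() OVER (PARTITION BY object_id, stage ORDER BY event_time)."""
    counters = defaultdict(int)
    out = []
    for object_id, event_time, stage in sorted(rows, key=lambda r: (r[0], r[1])):
        counters[(object_id, stage)] += 1
        out.append((object_id, event_time, stage, counters[(object_id, stage)]))
    return out


# Simulate the answer's final ORDER BY object_id, rn, stage
solution1 = sorted(add_rn(ROWS), key=lambda r: (r[0], r[3], r[2]))


def times_ascending(rows, object_id):
    """True if this object's rows are still in event_time order."""
    times = [r[1] for r in rows if r[0] == object_id]
    return times == sorted(times)


print([r[1] for r in solution1 if r[0] == 2])
# ['2022-10-01', '2022-10-04', '2022-10-02', '2022-10-03', '2022-10-05', '2022-10-06']
print(times_ascending(solution1, 1), times_ascending(solution1, 2))  # True False
```

The 2022-10-04 stage B row also keeps rn = 1, whereas the question expects 2 for it, so reordering changes the presentation without changing the numbering.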

    【Comments】:

      【Solution 2】:

      I suggest a running sum over a flag computed from the previous value of stage:

      -- sample data
      with dataset(object_id, event_time, event_type, event_subtype, stage) as (
          values    (1, '2022-10-01', 'create',   'name, stage', 'A'),
          (1, '2022-10-02', 'update', 'stage', 'B'),
          (1, '2022-10-03', 'update', 'stage', 'C'),
          (1, '2022-10-04', 'update', 'stage', 'A'),
          (2, '2022-10-01', 'create', 'name, stage', 'A'),
          (2, '2022-10-02', 'update', 'stage', 'C'),
          (2, '2022-10-03', 'update', 'stage', 'A'),
          (2, '2022-10-04', 'update', 'stage', 'B'),
          (2, '2022-10-05', 'update', 'stage', 'C'),
          (2, '2022-10-06', 'update', 'stage', 'A')
      )
      
      -- query
      select object_id,
             event_time,
             event_type,
             event_subtype,
             stage,
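             -- running total of cycle starts so far, shifted so numbering starts at 1
             1 + sum(counter) over (partition by object_id order by event_time) as num
      from (select *,
                   -- 1 on the first row after a stage C (a new cycle starts), otherwise 0;
                   -- lag() is NULL on each object's first row, so if() yields 0 there
                   if(lag(stage) over (partition by object_id order by event_time) = 'C', 1, 0) counter
            from dataset);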
      

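      Output:

      | object_id | event_time | event_type | event_subtype | stage | num |
      |-----------|------------|------------|---------------|-------|-----|
      | 1 | 2022-10-01 | create | name, stage | A | 1 |
      | 1 | 2022-10-02 | update | stage | B | 1 |
      | 1 | 2022-10-03 | update | stage | C | 1 |
      | 1 | 2022-10-04 | update | stage | A | 2 |
      | 2 | 2022-10-01 | create | name, stage | A | 1 |
      | 2 | 2022-10-02 | update | stage | C | 1 |
      | 2 | 2022-10-03 | update | stage | A | 2 |
      | 2 | 2022-10-04 | update | stage | B | 2 |
      | 2 | 2022-10-05 | update | stage | C | 2 |
      | 2 | 2022-10-06 | update | stage | A | 3 |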
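The lag-plus-running-sum idea in this answer can be mirrored in Python as a sketch, assuming the rows fit in memory; `stage_cycle_numbers` is a hypothetical name and only the relevant columns are kept. Each row gets a flag of 1 when the previous row of the same object was stage C (the start of a new cycle), and a per-object running sum of those flags, plus 1, gives the cycle number.

```python
from itertools import accumulate, groupby

# Sample rows from the question, reduced to (object_id, event_time, stage)
ROWS = [
    (1, "2022-10-01", "A"), (1, "2022-10-02", "B"),
    (1, "2022-10-03", "C"), (1, "2022-10-04", "A"),
    (2, "2022-10-01", "A"), (2, "2022-10-02", "C"),
    (2, "2022-10-03", "A"), (2, "2022-10-04", "B"),
    (2, "2022-10-05", "C"), (2, "2022-10-06", "A"),
]


def stage_cycle_numbers(rows):
    """Mirror 1 + SUM(counter) OVER (PARTITION BY object_id ORDER BY event_time),
    where counter = 1 if LAG(stage) = 'C' else 0."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    # groupby on the sorted rows plays the role of PARTITION BY object_id
    for _object_id, group in groupby(ordered, key=lambda r: r[0]):
        stages = [stage for _, _, stage in group]
        # LAG(stage): None on the first row of each partition, like SQL NULL
        previous = [None] + stages[:-1]
        counters = [1 if prev == "C" else 0 for prev in previous]
        # Running sum within the partition, shifted to start at 1
        result.extend(1 + total for total in accumulate(counters))
    return result


print(stage_cycle_numbers(ROWS))  # [1, 1, 1, 2, 1, 1, 2, 2, 2, 3]
```

This is the usual flag-then-cumulative-sum pattern for grouping consecutive rows: stage C acts as the delimiter through the LAG comparison, which is the piece the partition-by-stage attempt in the question was missing.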

      【Comments】:
