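Question title: Add a column to populate rank for every group
Posted: 2018-09-23 18:05:37
Question:

I have historical account data in which the account status is either "Active" or "Cancelled". When an account is reopened, its status becomes "Active" again and can later change back to "Cancelled", as in the data below. I want to tell apart the rows belonging to each reopening of the account by numbering them with an account_sub_number.

I used the following query: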

select status, status_code, account_number, date,
       row_number() over (partition by account_number, status_code order by date) as Account_Sub_Number
from schema.account
where account_number = 1234
order by date

Source data:

Account Number  Status     Status Code  Date
1234            Active     A            2017-12-04
1234            Active     A            2017-12-05
1234            Active     A            2017-12-06
1235            Active     A            2017-12-07
1234            Active     A            2018-03-02
1234            Cancelled  C            2018-03-03
1234            Cancelled  C            2018-03-04
1234            Cancelled  C            2018-05-10
1234            Cancelled  C            2018-05-11
1234            Active     A            2018-05-24
1234            Active     A            2018-05-25
1234            Active     A            2018-05-26
1234            Active     A            2018-05-27
1234            Cancelled  C            2018-05-28
1234            Cancelled  C            2018-06-15
1234            Cancelled  C            2018-06-16
1234            Cancelled  C            2018-06-17

Required output:

Account Number  Status     Status Code  Date        Account Sub Number
1234            Active     A            2017-12-04  1
1234            Active     A            2017-12-05  1
1234            Active     A            2017-12-06  1
1235            Active     A            2017-12-07  1
1234            Active     A            2018-03-02  1
1234            Cancelled  C            2018-03-03  1
1234            Cancelled  C            2018-03-04  1
1234            Cancelled  C            2018-05-10  1
1234            Cancelled  C            2018-05-11  1
1234            Active     A            2018-05-24  2
1234            Active     A            2018-05-25  2
1234            Active     A            2018-05-26  2
1234            Active     A            2018-05-27  2
1234            Cancelled  C            2018-05-28  2
1234            Cancelled  C            2018-06-15  2
1234            Cancelled  C            2018-06-16  2
1234            Cancelled  C            2018-06-17  2

My query's result:

Account Number  Status     Status Code  Date        Account_sub_number
1234            Active     A            2017-12-04  1
1234            Active     A            2017-12-05  2
1234            Active     A            2017-12-06  3
1235            Active     A            2017-12-07  4
1234            Active     A            2018-03-02  5
1234            Active     A            2018-05-24  6
1234            Active     A            2018-05-25  7
1234            Active     A            2018-05-26  8
1234            Active     A            2018-05-27  9
1234            Cancelled  C            2018-03-03  1
1234            Cancelled  C            2018-03-04  2
1234            Cancelled  C            2018-05-10  3
1234            Cancelled  C            2018-05-11  4
1234            Cancelled  C            2018-05-28  5
1234            Cancelled  C            2018-06-15  6
1234            Cancelled  C            2018-06-16  7
1234            Cancelled  C            2018-06-17  8
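To see why the query above does not produce the required numbering, here is a minimal reproduction using Python's built-in sqlite3 module (SQLite 3.25 or later supports window functions). It is a sketch under assumptions: an in-memory table named `account` stands in for `schema.account`, dates are stored as ISO text, and the WHERE clause drops the single 1235 row, so the Active numbers here run from 1 to 8. Because row_number() restarts only when account_number or status_code changes, every Active row shares one sequence and every Cancelled row shares another, no matter how often the account is reopened.

```python
import sqlite3

# Sample rows from the question: (account_number, status, status_code, date)
ROWS = [
    (1234, "Active", "A", "2017-12-04"),
    (1234, "Active", "A", "2017-12-05"),
    (1234, "Active", "A", "2017-12-06"),
    (1235, "Active", "A", "2017-12-07"),
    (1234, "Active", "A", "2018-03-02"),
    (1234, "Cancelled", "C", "2018-03-03"),
    (1234, "Cancelled", "C", "2018-03-04"),
    (1234, "Cancelled", "C", "2018-05-10"),
    (1234, "Cancelled", "C", "2018-05-11"),
    (1234, "Active", "A", "2018-05-24"),
    (1234, "Active", "A", "2018-05-25"),
    (1234, "Active", "A", "2018-05-26"),
    (1234, "Active", "A", "2018-05-27"),
    (1234, "Cancelled", "C", "2018-05-28"),
    (1234, "Cancelled", "C", "2018-06-15"),
    (1234, "Cancelled", "C", "2018-06-16"),
    (1234, "Cancelled", "C", "2018-06-17"),
]

# The question's query, run against the hypothetical in-memory "account" table.
QUERY = """
select status, status_code, account_number, date,
       row_number() over (partition by account_number, status_code
                          order by date) as account_sub_number
from account
where account_number = 1234
order by date
"""


def load(rows):
    """Create an in-memory SQLite table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account (account_number integer, status text,"
        " status_code text, date text)"
    )
    conn.executemany("insert into account values (?, ?, ?, ?)", rows)
    return conn


def asker_numbers(rows):
    """Return (status_code, date, account_sub_number) tuples in date order."""
    conn = load(rows)
    return [(code, date, n) for _, code, _, date, n in conn.execute(QUERY)]


for row in asker_numbers(ROWS):
    print(row)
```

The reactivation on 2018-05-24 simply continues the Active sequence instead of starting a new group, which is exactly the behaviour shown in the posted result.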

Comments on the question:

  • . . . Please tag your question with the database you are using.

Tags: sql group-by amazon-redshift rank


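Answer 1:

Use lag to get the status of the previous row (per account, ordered by date), compare it with the current row's status to flag where a new group starts, and take a running sum of those flags to number the groups.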

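select a.*,
       -- 1 marks the start of a group: the account's first row, or a reactivation
       sum(case when prev_status is null or (prev_status = 'Cancelled' and status = 'Active') then 1 else 0 end)
         over (partition by account_number order by date
               rows between unbounded preceding and current row) as sub_account_number
from (select status, status_code, account_number, date,
             -- status of the previous row for the same account
             lag(status) over (partition by account_number order by date) as prev_status
      from schema.account
      where account_number = 1234
     ) a
order by date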
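A sketch of Answer 1 run against the sample data, again with Python's sqlite3 module and an in-memory `account` table standing in for `schema.account` (an assumption for illustration). The outer select uses `a.*` to match the subquery alias, and the running sum has an explicit ROWS frame, since Redshift reports an error when an ordered aggregate window has no frame clause. Every row up to the end of the first cancellation period should get 1, and every row from the reactivation on 2018-05-24 onward should get 2.

```python
import sqlite3

# Sample rows from the question: (account_number, status, status_code, date)
ROWS = [
    (1234, "Active", "A", "2017-12-04"),
    (1234, "Active", "A", "2017-12-05"),
    (1234, "Active", "A", "2017-12-06"),
    (1235, "Active", "A", "2017-12-07"),
    (1234, "Active", "A", "2018-03-02"),
    (1234, "Cancelled", "C", "2018-03-03"),
    (1234, "Cancelled", "C", "2018-03-04"),
    (1234, "Cancelled", "C", "2018-05-10"),
    (1234, "Cancelled", "C", "2018-05-11"),
    (1234, "Active", "A", "2018-05-24"),
    (1234, "Active", "A", "2018-05-25"),
    (1234, "Active", "A", "2018-05-26"),
    (1234, "Active", "A", "2018-05-27"),
    (1234, "Cancelled", "C", "2018-05-28"),
    (1234, "Cancelled", "C", "2018-06-15"),
    (1234, "Cancelled", "C", "2018-06-16"),
    (1234, "Cancelled", "C", "2018-06-17"),
]

# Answer 1: flag the first row and every Cancelled -> Active transition,
# then number the groups with a running sum of the flags.
QUERY = """
select a.*,
       sum(case when prev_status is null
                  or (prev_status = 'Cancelled' and status = 'Active')
                then 1 else 0 end)
         over (partition by account_number order by date
               rows between unbounded preceding and current row) as sub_account_number
from (select status, status_code, account_number, date,
             lag(status) over (partition by account_number order by date) as prev_status
      from account
      where account_number = 1234) a
order by date
"""


def sub_account_numbers(rows):
    """Return (date, sub_account_number) pairs for account 1234 in date order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account (account_number integer, status text,"
        " status_code text, date text)"
    )
    conn.executemany("insert into account values (?, ?, ?, ?)", rows)
    # Columns: status, status_code, account_number, date, prev_status, sub number
    return [(date, n) for _, _, _, date, _, n in conn.execute(QUERY)]


for date, n in sub_account_numbers(ROWS):
    print(date, n)
```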

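Comments:

Answer 2:

Basically, you need to define the groups. In this case, you can mark where a group starts by looking for an active status that follows a non-active status.

The cumulative sum of the group starts is then the sub-number you are looking for: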

select a.*,
       -- 0 when the status code repeats or the row is not Active; 1 when an Active run starts
       sum(case when prev_status_code = status_code or
                     status <> 'Active'
                then 0 else 1
           end) over (partition by account_number order by date rows between unbounded preceding and current row) as account_subnumber
from (select a.*,
             lag(status_code) over (partition by account_number order by date) as prev_status_code
      from schema.account a
     ) a
where account_number = 1234
order by date;
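A sketch of Answer 2 on the same sample data, using Python's sqlite3 module with a hypothetical in-memory `account` table. On the first row lag() returns NULL, so `prev_status_code = status_code` is NULL rather than true; the row is Active, so it falls through to the ELSE branch and opens group 1. This sketch uses a ROWS frame, which gives the same running sum as RANGE here because each account has at most one row per date.

```python
import sqlite3

# Sample rows from the question: (account_number, status, status_code, date)
ROWS = [
    (1234, "Active", "A", "2017-12-04"),
    (1234, "Active", "A", "2017-12-05"),
    (1234, "Active", "A", "2017-12-06"),
    (1235, "Active", "A", "2017-12-07"),
    (1234, "Active", "A", "2018-03-02"),
    (1234, "Cancelled", "C", "2018-03-03"),
    (1234, "Cancelled", "C", "2018-03-04"),
    (1234, "Cancelled", "C", "2018-05-10"),
    (1234, "Cancelled", "C", "2018-05-11"),
    (1234, "Active", "A", "2018-05-24"),
    (1234, "Active", "A", "2018-05-25"),
    (1234, "Active", "A", "2018-05-26"),
    (1234, "Active", "A", "2018-05-27"),
    (1234, "Cancelled", "C", "2018-05-28"),
    (1234, "Cancelled", "C", "2018-06-15"),
    (1234, "Cancelled", "C", "2018-06-16"),
    (1234, "Cancelled", "C", "2018-06-17"),
]

# Answer 2: count 1 only for an Active row whose previous code differs
# (or is NULL), i.e. the first row of each Active run.
QUERY = """
select a.*,
       sum(case when prev_status_code = status_code or status <> 'Active'
                then 0 else 1 end)
         over (partition by account_number order by date
               rows between unbounded preceding and current row) as account_subnumber
from (select a.*,
             lag(status_code) over (partition by account_number order by date) as prev_status_code
      from account a) a
where account_number = 1234
order by date
"""


def account_subnumbers(rows):
    """Return (date, account_subnumber) pairs for account 1234 in date order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account (account_number integer, status text,"
        " status_code text, date text)"
    )
    conn.executemany("insert into account values (?, ?, ?, ?)", rows)
    # Columns: account_number, status, status_code, date, prev_status_code, sub number
    return [(date, n) for _, _, _, date, _, n in conn.execute(QUERY)]


for date, n in account_subnumbers(ROWS):
    print(date, n)
```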
    

Comments:

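• I get this error: ERROR: Aggregate window functions with an ORDER BY clause require a frame clause
• @Spider . . . As I asked, tag your question with the database you are using. According to the standard, the window frame clause is optional. However, some databases require it.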
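The point about the frame clause can be checked in an engine that follows the standard here. A small sketch, again with Python's sqlite3 module and a hypothetical in-memory `account` table holding only account 1234's rows: when the frame is omitted, SQLite applies the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same running sum as an explicit ROWS frame because each date appears only once. On Redshift, which raised the error above, spelling out the ROWS frame is the form that works in both places.

```python
import sqlite3

# (status_code, date) for account 1234, from the question's sample data
DATA = [
    ("A", "2017-12-04"), ("A", "2017-12-05"), ("A", "2017-12-06"),
    ("A", "2018-03-02"), ("C", "2018-03-03"), ("C", "2018-03-04"),
    ("C", "2018-05-10"), ("C", "2018-05-11"), ("A", "2018-05-24"),
    ("A", "2018-05-25"), ("A", "2018-05-26"), ("A", "2018-05-27"),
    ("C", "2018-05-28"), ("C", "2018-06-15"), ("C", "2018-06-16"),
    ("C", "2018-06-17"),
]

# 1 when a group starts: the first row, or an 'A' right after a 'C'
FLAG = "case when prev is null or (prev = 'C' and status_code = 'A') then 1 else 0 end"


def running_group_numbers(frame):
    """Running sum of FLAG in date order, using the given frame clause text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table account (status_code text, date text)")
    conn.executemany("insert into account values (?, ?)", DATA)
    sql = f"""
        select sum({FLAG}) over (order by date {frame})
        from (select status_code, date,
                     lag(status_code) over (order by date) as prev
              from account)
        order by date
    """
    return [n for (n,) in conn.execute(sql)]


implicit = running_group_numbers("")  # frame omitted, as the standard allows
explicit = running_group_numbers("rows between unbounded preceding and current row")
print(implicit == explicit)
```

If an account could have two rows on the same date, the RANGE default would treat them as peers and give them the same running total, so the two frames are only interchangeable when the ORDER BY key is unique within each partition.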