【Question title】: Repeat rows cumulative
【Posted】: 2021-07-07 18:32:20
【Question】:

I have this table:

| date       | id | number |
|------------|----|--------|
| 2021/05/01 | 1  | 10     |
| 2021/05/02 | 2  | 20     |
| 2021/05/03 | 3  | 30     |
| 2021/05/04 | 1  | 20     |

I am trying to write a query that produces this other table:

| date       | id | number |
|------------|----|--------|
| 2021/05/01 | 1  | 10     |
| 2021/05/02 | 1  | 10     |
| 2021/05/02 | 2  | 20     |
| 2021/05/03 | 1  | 10     |
| 2021/05/03 | 2  | 20     |
| 2021/05/03 | 3  | 30     |
| 2021/05/04 | 1  | 20     |
| 2021/05/04 | 2  | 20     |
| 2021/05/04 | 3  | 30     |

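The idea is that each date should carry every distinct id seen up to that date, together with its number. If an id appears more than once, only its most recent value should be used.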

【Comments】:

  • Tag your question with the database you are using.

Tags: sql google-bigquery


【Solution 1】:

One method is to expand all the rows for each date, then use qualify to keep the most recent value per date and id:

with t as (
    select date '2021-05-01' as date, 1 as id, 10 as number union all
    select date '2021-05-02' as date, 2 as id, 20 as number union all
    select date '2021-05-03' as date, 3 as id, 30 as number union all
    select date '2021-05-04' as date, 1 as id, 20 as number
)
select d.date, t.id, t.number 
from t join
     (select date
      from (select min(date) as min_date, max(date) as max_date
            from t
           ) tt cross join 
           unnest(generate_date_array(min_date, max_date, interval 1 day)) date
     ) d
     on t.date <= d.date
where 1=1
qualify row_number() over (partition by d.date, t.id order by t.date desc) = 1
order by 1, 2, 3;
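
For readers who want to check the logic outside BigQuery, here is a minimal plain-Python sketch of the same expand-then-filter idea. It is not the query above, only an illustration: the function name `expand_then_filter` and the tuple layout `(date, id, number)` are assumptions made for this example.

```python
from datetime import date, timedelta

def expand_then_filter(rows):
    """Pair every calendar day with all earlier-or-equal rows, keep the latest per (day, id)."""
    lo = min(d for d, _, _ in rows)
    hi = max(d for d, _, _ in rows)
    # Analogue of generate_date_array(min_date, max_date, interval 1 day)
    calendar = [lo + timedelta(days=i) for i in range((hi - lo).days + 1)]
    latest = {}  # (calendar_day, id) -> (source_date, number)
    for day in calendar:
        for src_date, id_, num in rows:
            if src_date <= day:  # join condition: t.date <= d.date
                key = (day, id_)
                # Analogue of row_number() ... order by t.date desc = 1
                if key not in latest or src_date > latest[key][0]:
                    latest[key] = (src_date, num)
    return sorted((day, id_, num) for (day, id_), (_, num) in latest.items())

SAMPLE = [
    (date(2021, 5, 1), 1, 10),
    (date(2021, 5, 2), 2, 20),
    (date(2021, 5, 3), 3, 30),
    (date(2021, 5, 4), 1, 20),
]

for row in expand_then_filter(SAMPLE):
    print(row)
```

As in the query, the amount of intermediate work grows with the number of days times the number of rows, which is the cost the next approach avoids.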

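A more efficient method avoids generating all the rows and then filtering them. Instead, it produces only the rows that are needed, by generating the appropriate range of dates for each source row. This requires a couple of window functions: one to get the "next" date for each id, and one to get the maximum date in the data: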

with t as (
    select date '2021-05-01' as date, 1 as id, 10 as number union all
    select date '2021-05-02' as date, 2 as id, 20 as number union all
    select date '2021-05-03' as date, 3 as id, 30 as number union all
    select date '2021-05-04' as date, 1 as id, 20 as number
)
select date, t.id, t.number 
from (select t.*,
             date_add(lead(date) over (partition by id order by date), interval -1 day) as next_date,
             max(date) over () as max_date
      from t 
     ) t cross join
     unnest(generate_date_array(date, coalesce(next_date, max_date))) date
order by 1, 2, 3;
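
The range-per-row idea can also be sketched in plain Python to make the window-function logic easier to follow. This is an illustrative sketch, not BigQuery code: `expand_by_validity` and the `(date, id, number)` tuple layout are names assumed for the example.

```python
from datetime import date, timedelta

def expand_by_validity(rows):
    """Emit each row once per day from its own date until the day before the id's next row."""
    max_date = max(d for d, _, _ in rows)  # analogue of max(date) over ()
    by_id = {}
    for d, id_, num in sorted(rows):  # sorted by date, so each id's entries are in date order
        by_id.setdefault(id_, []).append((d, num))
    out = []
    for id_, entries in by_id.items():
        for i, (start, num) in enumerate(entries):
            if i + 1 < len(entries):
                # Analogue of lead(date) over (partition by id order by date) minus 1 day
                end = entries[i + 1][0] - timedelta(days=1)
            else:
                # No later row for this id: valid through the last date in the data
                end = max_date
            day = start
            while day <= end:
                out.append((day, id_, num))
                day += timedelta(days=1)
    return sorted(out)
```

Each source row contributes exactly the days on which it is the current value for its id, so nothing is generated only to be discarded.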

【Comments】:

    【Solution 2】:

    Consider the [less verbose] approach below:

    select t1.date, t2.id, t2.number
    from (
      select *, array_agg(struct(date, id,number)) over(order by date) arr
      from `project.dataset.table`
    ) t1, unnest(arr) t2
    where true 
    qualify row_number() over (partition by t1.date, t2.id order by t2.date desc) = 1
    # order by date, id    
    

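    Applied to the sample data in your question, this returns the nine rows of the desired table shown there (row order is not guaranteed unless the commented-out order by is enabled).

    【Comments】: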
