Repeat rows cumulative: answers

【Question title】: Repeat rows cumulative
【Posted】: 2021-07-07 18:32:20
【Question】:

I have this table:

| date       | id | number |
|------------|----|--------|
| 2021/05/01 | 1  | 10     |
| 2021/05/02 | 2  | 20     |
| 2021/05/03 | 3  | 30     |
| 2021/05/04 | 1  | 20     |

I am trying to write a query that produces this other table:

| date       | id | number |
|------------|----|--------|
| 2021/05/01 | 1  | 10     |
| 2021/05/02 | 1  | 10     |
| 2021/05/02 | 2  | 20     |
| 2021/05/03 | 1  | 10     |
| 2021/05/03 | 2  | 20     |
| 2021/05/03 | 3  | 30     |
| 2021/05/04 | 1  | 20     |
| 2021/05/04 | 2  | 20     |
| 2021/05/04 | 3  | 30     |

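The idea is that each date should list every distinct id seen on or before that date, together with its number. If an id appears more than once, only its most recent value should be used.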

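【Comments】:

Tag your question with the database you are using.

Tags: sql google-bigquery

【Solution 1】:

One method is to expand all the rows for each date, then use qualify to keep the most recent value: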

with t as (
    select date '2021-05-01' as date, 1 as id, 10 as number union all
    select date '2021-05-02' as date, 2 as id, 20 as number union all
    select date '2021-05-03' as date, 3 as id, 30 as number union all
    select date '2021-05-04' as date, 1 as id, 20 as number
)
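select d.date, t.id, t.number
from t join
     -- d: one row per calendar date between the earliest and latest date in t
     (select date
      from (select min(date) as min_date, max(date) as max_date
            from t
           ) tt cross join
           unnest(generate_date_array(min_date, max_date, interval 1 day)) date
     ) d
     -- pair each calendar date with every row dated on or before it
     on t.date <= d.date
where 1=1
-- keep only the most recent row per calendar date and id
qualify row_number() over (partition by d.date, t.id order by t.date desc) = 1
order by 1, 2, 3;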
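
To make the join-then-filter logic concrete, here is a rough Python sketch of the same two steps. It is not BigQuery code, and the function name `expand_then_filter` is made up for illustration. It assumes the rows are plain `(date, id, number)` tuples and that the calendar runs day by day from the earliest to the latest date, as in the query above.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, id, number).
rows = [
    (date(2021, 5, 1), 1, 10),
    (date(2021, 5, 2), 2, 20),
    (date(2021, 5, 3), 3, 30),
    (date(2021, 5, 4), 1, 20),
]


def expand_then_filter(rows):
    """Return (joined, result): all joined rows, then the latest row per (date, id)."""
    first = min(r[0] for r in rows)
    last = max(r[0] for r in rows)
    # Equivalent of generate_date_array(min_date, max_date, interval 1 day).
    calendar = [first + timedelta(days=i) for i in range((last - first).days + 1)]

    # Join step: each source row is repeated for every calendar date on or after it.
    joined = [
        (d, src_date, id_, number)
        for d in calendar
        for (src_date, id_, number) in rows
        if src_date <= d
    ]

    # Filter step: keep the most recent source row per (calendar date, id),
    # which is what the row_number() ... = 1 condition does.
    latest = {}
    for d, src_date, id_, number in joined:
        key = (d, id_)
        if key not in latest or src_date > latest[key][0]:
            latest[key] = (src_date, number)

    result = sorted((d, id_, number) for (d, id_), (_, number) in latest.items())
    return joined, result


joined, result = expand_then_filter(rows)
print(len(joined), len(result))  # prints "10 9"
```

On the sample data the join produces 10 rows and the filter keeps 9 of them. The gap between the two counts grows as ids repeat more often, since every superseded row is still joined to all later dates before being discarded.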

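A more efficient method avoids generating all the rows and then filtering them. Instead, it produces only the rows that are needed, by generating the appropriate dates within each row. This takes a couple of window functions: one to get the "next" date for each id, and one to get the maximum date in the data: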

with t as (
    select date '2021-05-01' as date, 1 as id, 10 as number union all
    select date '2021-05-02' as date, 2 as id, 20 as number union all
    select date '2021-05-03' as date, 3 as id, 30 as number union all
    select date '2021-05-04' as date, 1 as id, 20 as number
)
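select date, t.id, t.number
from (select t.*,
             -- last day this row stays current: the day before the same id's next row
             date_add(lead(date) over (partition by id order by date), interval -1 day) as next_date,
             -- latest date anywhere in the data, used when the id has no later row
             max(date) over () as max_date
      from t
     ) t cross join
     -- one output row per day from the row's own date through its last valid day
     unnest(generate_date_array(date, coalesce(next_date, max_date))) date
order by 1, 2, 3;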
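
The same idea can be sketched in Python: give each row a validity range and emit one output row per day in that range. This is only an illustration of the logic, not BigQuery code. The function name `expand_validity_ranges` is hypothetical, and the sketch assumes `(date, id, number)` tuples with daily granularity.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, id, number).
rows = [
    (date(2021, 5, 1), 1, 10),
    (date(2021, 5, 2), 2, 20),
    (date(2021, 5, 3), 3, 30),
    (date(2021, 5, 4), 1, 20),
]


def expand_validity_ranges(rows):
    """Emit each row once per day from its own date until it is superseded."""
    max_date = max(r[0] for r in rows)  # max(date) over ()

    # Group rows by id, mirroring "partition by id".
    by_id = {}
    for src_date, id_, number in rows:
        by_id.setdefault(id_, []).append((src_date, number))

    out = []
    for id_, entries in by_id.items():
        entries.sort()  # "order by date" within the partition
        for i, (start, number) in enumerate(entries):
            if i + 1 < len(entries):
                # lead(date) minus one day: the day before this id's next row.
                end = entries[i + 1][0] - timedelta(days=1)
            else:
                # No later row for this id: stay valid through the global max date.
                end = max_date
            d = start
            while d <= end:  # generate_date_array(start, end)
                out.append((d, id_, number))
                d += timedelta(days=1)
    return sorted(out)


result = expand_validity_ranges(rows)
for d, id_, number in result:
    print(d.strftime("%Y/%m/%d"), id_, number)
```

Every row produced here appears in the final result, so no intermediate rows are generated only to be thrown away.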

【Discussion】:

【Solution 2】:

Consider the [less verbose] approach below:

select t1.date, t2.id, t2.number
from (
  select *, array_agg(struct(date, id,number)) over(order by date) arr
  from `project.dataset.table`
) t1, unnest(arr) t2
where true 
qualify row_number() over (partition by t1.date, t2.id order by t2.date desc) = 1
# order by date, id

If applied to the sample data in your question, the output is the desired table shown there.
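
As a cross-check of that claim, here is a small Python sketch that mirrors the steps of the array_agg query and prints the rows for the sample data. It is an approximation of the logic rather than BigQuery code; the function name `cumulative_latest` is made up, and rows are assumed to be `(date, id, number)` tuples.

```python
from datetime import date

# Sample rows from the question: (date, id, number).
rows = [
    (date(2021, 5, 1), 1, 10),
    (date(2021, 5, 2), 2, 20),
    (date(2021, 5, 3), 3, 30),
    (date(2021, 5, 4), 1, 20),
]


def cumulative_latest(rows):
    """For each source row t1, collect all rows up to t1's date and keep the latest per id."""
    latest = {}  # (t1 date, id) -> (t2 date, number)
    for t1_date, _, _ in rows:
        # array_agg(...) over (order by date): every row dated on or before t1's date.
        arr = [r for r in rows if r[0] <= t1_date]
        # unnest(arr) plus the qualify filter: most recent t2 row per (t1.date, t2.id).
        for t2_date, id_, number in arr:
            key = (t1_date, id_)
            if key not in latest or t2_date > latest[key][0]:
                latest[key] = (t2_date, number)
    return sorted((d, id_, number) for (d, id_), (_, number) in latest.items())


result = cumulative_latest(rows)
for d, id_, number in result:
    print(d.strftime("%Y/%m/%d"), id_, number)
```

One difference from Solution 1 is worth noting: the output dates here come from the table's own rows, so a calendar day with no row in the table does not appear in the result, whereas the generate_date_array queries fill in such days.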

【Discussion】: