【Question title】: How to fill missing dates in BigQuery?
【Posted】: 2021-04-07 18:14:57
【Question】:

This question is related to "How to fill missing dates and values in partitioned data?", but since that solution does not work in BigQuery, I am posting the question again.

I have the following hypothetical table:

name       date          val
-------------------------------
A          01/01/2020     1.5
A          01/03/2020     2
A          01/06/2020     5
B          01/02/2020     90
B          01/07/2020     10

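I want to fill in the dates inside the gaps and copy the value from the nearest following date. In addition, I want to pad the dates 1) back to a preset MINDATE (say 12/29/2019) and 2) forward to the current date (say 01/09/2020). For 2), the default value is 1.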

So the output would be:

name       date          val
-------------------------------
A          12/29/2019     1.5
A          12/30/2019     1.5
A          12/31/2019     1.5
A          01/01/2020     1.5   <- original
A          01/02/2020     2
A          01/03/2020     2     <- original
A          01/04/2020     5
A          01/05/2020     5
A          01/06/2020     5     <- original
A          01/07/2020     1
A          01/08/2020     1
A          01/09/2020     1
B          12/29/2019     90
B          12/30/2019     90
B          12/31/2019     90
B          01/01/2020     90
B          01/02/2020     90    <- original
B          01/03/2020     10
B          01/04/2020     10
B          01/05/2020     10
B          01/06/2020     10
B          01/07/2020     10    <- original
B          01/08/2020     1
B          01/09/2020     1

The accepted solution in the question above does not work in BigQuery.

【Comments】:

    Tags: python date datetime google-bigquery


    【Solution 1】:

    This should work:

    -- Sample data from the question; date is stored as an MM/DD/YYYY string.
    with base as (
    
    select 'A' as name,           '01/01/2020' as date,     1.5 as val  union all
    select 'A' as name,           '01/03/2020' as date,     2 as val union all
    select 'A' as name,           '01/06/2020' as date,     5 as val union all
    select 'B' as name,           '01/02/2020' as date,     90 as val union all
    select 'B' as name,           '01/07/2020' as date,     10 as val
    ),
    
    -- One row per (name, calendar day) between MINDATE and the current date.
    missing_dates as (
    
    select name, dates as date from
    UNNEST(GENERATE_DATE_ARRAY(DATE '2019-12-29', DATE '2020-01-09', INTERVAL 1 DAY)) AS dates cross join (select distinct name from base)
    
    ),
    
    -- Attach the known values; generated days without a match keep val = NULL.
    joined as (
    select distinct missing_dates.name, missing_dates.date, val
    from  missing_dates
    left join base on missing_dates.name = base.name
    and  parse_date('%m/%d/%Y', base.date) = missing_dates.date
    
    )
    
    -- Take the next non-NULL val at or after each day; default to 1 when none follows.
    select * except(val),
    ifnull(first_value(val ignore nulls) over(partition by name order by date ROWS BETWEEN CURRENT ROW AND
        UNBOUNDED FOLLOWING),1) as val
    from joined
    order by name, date
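
    To sanity-check the expected output without running BigQuery, the same back-fill logic can be sketched in plain Python. This is only an illustration and not part of the original answer: the function name fill_missing_dates and its parameters are made up here, and it assumes each (name, date) pair appears at most once with a non-null val.

```python
from datetime import date, timedelta


def fill_missing_dates(rows, min_date, max_date, default=1):
    """Back-fill values for every day in [min_date, max_date], per name.

    rows is an iterable of (name, date, val) tuples (hypothetical layout).
    Days after a name's last known date receive `default`.
    """
    known = {}
    for name, day, val in rows:
        known.setdefault(name, {})[day] = val

    n_days = (max_date - min_date).days + 1
    filled = []
    for name in sorted(known):
        per_name = []
        next_val = default  # no known value follows max_date
        # Walk backwards so the nearest later value is always at hand.
        for offset in range(n_days - 1, -1, -1):
            day = min_date + timedelta(days=offset)
            next_val = known[name].get(day, next_val)
            per_name.append((name, day, next_val))
        filled.extend(reversed(per_name))
    return filled


rows = [
    ("A", date(2020, 1, 1), 1.5),
    ("A", date(2020, 1, 3), 2),
    ("A", date(2020, 1, 6), 5),
    ("B", date(2020, 1, 2), 90),
    ("B", date(2020, 1, 7), 10),
]
result = fill_missing_dates(rows, date(2019, 12, 29), date(2020, 1, 9))
for name, day, val in result:
    print(name, day.strftime("%m/%d/%Y"), val)
```

    Walking each name's dates in reverse plays the same role as the window frame ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING in the query: every day sees the closest known value at or after it, and falls back to 1 when none exists.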
    

    【Comments】:
