【问题标题】:Filling missing dates in BigQuery (SQL) without creating a new calendar在 BigQuery (SQL) 中填充缺失的日期而不创建新日历
【发布时间】:2018-11-15 12:06:27
【问题描述】:

我正在尝试创建一个 SQL,以便我可以在 Google Data Studio 中制作一个连接 BigQuery 的时间序列图。你可以在下面看到我的 SQL。

WITH 
CTE_1 AS
(SELECT ID, Date, Min_Predict, Max_Predict, Interval
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date) AS row_num
FROM
    table),
    CTE_2 AS
(SELECT Date, Min_Predict, Max_Predict,
SUM(Min_Predict) OVER (ORDER BY Date) AS Min,
SUM(Max_Predict) OVER (ORDER BY Date) AS Max
FROM CTE_1
WHERE
    row_num = 1 AND Interval = 'A')

SELECT Date, Min, Max
From CTE_2
GROUP BY Date, Min, Max
ORDER BY Date

结果我得到了这张表。

Row ProgressDate            EstMin  EstMax  
1   2017-07-21T00:00:00Z    0.125   0.25
2   2017-07-24T00:00:00Z    5.125   5.375
3   2017-07-25T00:00:00Z    8.75    10.25
4   2017-07-26T00:00:00Z    10.0    12.0
5   2017-07-27T00:00:00Z    10.5    12.75
6   2017-08-01T00:00:00Z    15.25   19.125
7   2017-08-02T00:00:00Z    15.5    19.375
8   2017-08-05T00:00:00Z    16.25   20.625

如您所见,我缺少日期,例如21.07 和 24.07 之间。如何用前一天的数据填充那些缺失的日期?因为在数据工作室中,我丢失了那些日子的数据,我可以将它们也等于 0,但我不想要这个。

【问题讨论】:

    标签: sql google-bigquery missing-data


    【解决方案1】:

    以下是 BigQuery 标准 SQL 并基于您当前的结果构建

    #standardSQL
    WITH your_current_result AS (
      ......
    ), days AS (
      SELECT day
      FROM (
        SELECT 
          MIN(DATE(TIMESTAMP(ProgressDate))) min_dt, 
          MAX(DATE(TIMESTAMP(ProgressDate))) max_dt
        FROM your_current_result
      ), UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day
    )
    SELECT day, 
      LAST_VALUE(EstMin IGNORE NULLS) OVER(ORDER BY day) EstMin,
      LAST_VALUE(EstMax IGNORE NULLS) OVER(ORDER BY day) EstMax
    FROM days
    LEFT JOIN your_current_result
    ON day = DATE(TIMESTAMP(ProgressDate))
    -- ORDER BY day   
    

    您可以使用问题中的输出示例来测试、使用上述内容

    #standardSQL
    WITH your_current_result AS (
      SELECT '2017-07-21T00:00:00Z' ProgressDate, 0.125 EstMin, 0.25 EstMax UNION ALL
      SELECT '2017-07-24T00:00:00Z', 5.125, 5.375 UNION ALL
      SELECT '2017-07-25T00:00:00Z', 8.75, 10.25 UNION ALL
      SELECT '2017-07-26T00:00:00Z', 10.0, 12.0 UNION ALL
      SELECT '2017-07-27T00:00:00Z', 10.5, 12.75 UNION ALL
      SELECT '2017-08-01T00:00:00Z', 15.25, 19.125 UNION ALL
      SELECT '2017-08-02T00:00:00Z', 15.5, 19.375 UNION ALL
      SELECT '2017-08-05T00:00:00Z', 16.25, 20.625 
    ), days AS (
      SELECT day
      FROM (
        SELECT 
          MIN(DATE(TIMESTAMP(ProgressDate))) min_dt, 
          MAX(DATE(TIMESTAMP(ProgressDate))) max_dt
        FROM your_current_result
      ), UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day
    )
    SELECT day, 
      LAST_VALUE(EstMin IGNORE NULLS) OVER(ORDER BY day) EstMin,
      LAST_VALUE(EstMax IGNORE NULLS) OVER(ORDER BY day) EstMax
    FROM days
    LEFT JOIN your_current_result
    ON day = DATE(TIMESTAMP(ProgressDate))
    ORDER BY day  
    

    结果

    Row day         EstMin  EstMax   
    1   2017-07-21  0.125   0.25     
    2   2017-07-22  0.125   0.25     
    3   2017-07-23  0.125   0.25     
    4   2017-07-24  5.125   5.375    
    5   2017-07-25  8.75    10.25    
    6   2017-07-26  10.0    12.0     
    7   2017-07-27  10.5    12.75    
    8   2017-07-28  10.5    12.75    
    9   2017-07-29  10.5    12.75    
    10  2017-07-30  10.5    12.75    
    11  2017-07-31  10.5    12.75    
    12  2017-08-01  15.25   19.125   
    13  2017-08-02  15.5    19.375   
    14  2017-08-03  15.5    19.375   
    15  2017-08-04  15.5    19.375   
    16  2017-08-05  16.25   20.625   
    

    【讨论】:

    • 这可以用于按组添加日期吗?查询应该检测到 group_id 的第一个实例,然后只填充值。谢谢。
    猜你喜欢
    • 2019-07-20
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2020-12-16
    • 2015-07-08
    • 1970-01-01
    • 2022-08-15
    • 1970-01-01
    相关资源
    最近更新 更多