【Question title】:Calculate the duration between two dates in different rows in BigQuery (partition)
【Posted】:2019-10-10 07:20:47
【Question description】:

I have data like this:

`id      box_id         event               time                     
1       1001           'start'       2019-06-13 16:00                                       
2       1001           'end'         2019-06-13 15:22             
2       2001           'start'       2019-06-18 15:20                
3       1001           'start'       2019-06-13 15:20               
4       2003           'start'       2019-06-18 15:20`

Expected result:

date          box_id         start                end              idle 
 2019-06-13    1001       2019-06-13 16:00         NA              0 
 2019-06-13    1001       2019-06-13 15:20    2019-06-13 15:22     2 
 2019-06-18    2001       2019-06-18 15:20         NA              0 
 2019-06-18    2003       2019-06-18 15:20         NA              0

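I want to get the difference between two timestamps, pairing each start with the end that is closest in time. When a box_id has no end event close to its start, idle should be 0 for that row. How should I do this? I have read some references about using OVER (PARTITION BY ...).

【Question discussion】:

    Tags: sql google-bigquery diff partitioning duration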


    【Solution 1】:

    Use lead():

    select cast(time as date) as date,
           box_id,
           time as start_time,
           end_time
    from (select t.*,
                 lead(time) over (partition by box_id order by time) as end_time
          from t
         ) t
    where event = 'start';
    

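    【Discussion】:

    • Thanks, it works! But how can I get the duration between end_time and start_time within the single query above? @Gordon
    • @Nadyaf . . . If you want the difference in minutes, use timestamp_diff() or datetime_diff(), depending on the type of the arguments.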
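
    A minimal sketch of how the comment above could be combined with the lead() query, assuming `time` is a TIMESTAMP (with a DATETIME column, DATETIME_DIFF would take the place of TIMESTAMP_DIFF). The inline CTE `t` is only a stand-in for the real table, and the IFNULL default of 0 follows the expected result in the question:

    ```sql
    -- Stand-in for the real table (assumption): the question's sample rows as TIMESTAMPs.
    WITH t AS (
      SELECT 1 AS id, 1001 AS box_id, 'start' AS event, TIMESTAMP '2019-06-13 16:00:00' AS time UNION ALL
      SELECT 2, 1001, 'end',   TIMESTAMP '2019-06-13 15:22:00' UNION ALL
      SELECT 2, 2001, 'start', TIMESTAMP '2019-06-18 15:20:00' UNION ALL
      SELECT 3, 1001, 'start', TIMESTAMP '2019-06-13 15:20:00' UNION ALL
      SELECT 4, 2003, 'start', TIMESTAMP '2019-06-18 15:20:00'
    )
    SELECT
      CAST(time AS DATE) AS date,
      box_id,
      time AS start_time,
      end_time,
      -- A start with no later row in its box has a NULL end_time, so idle falls back to 0.
      IFNULL(TIMESTAMP_DIFF(end_time, time, MINUTE), 0) AS idle
    FROM (
      SELECT
        t.*,
        -- Time of the next row for the same box, in chronological order.
        LEAD(time) OVER (PARTITION BY box_id ORDER BY time) AS end_time
      FROM t
    ) AS s
    WHERE event = 'start'
    ORDER BY box_id, start_time;
    ```

    On the sample rows this should give idle = 2 for the 15:20 start of box 1001 and 0 for the other starts. Note that lead() picks up the next row whatever its event is, so two consecutive 'start' rows for one box would be paired with each other; also checking LEAD(event) = 'end' is one way to guard against that.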
    【Solution 2】:

    Hi @Nadyav: below is a pseudocode outline to help you get started.

    【Discussion】:

    • @fintangilane thx for the suggestion, but wouldn't it be too much if I have to create new columns?
    【Solution 3】:

    Below is for BigQuery Standard SQL

    #standardSQL
    SELECT MIN(day) AS day, box_id, 
      MAX(IF(event = 'start', time, NULL)) start,
      MAX(IF(event = 'end', time, NULL)) `end`,
      IFNULL(TIMESTAMP_DIFF(MAX(IF(event = 'end', time, NULL)), MAX(IF(event = 'start', time, NULL)), SECOND), 0) idle
    FROM (
      SELECT box_id, event, PARSE_TIMESTAMP('%Y-%m-%d %H:%M', time) time, PARSE_DATE('%Y-%m-%d', SUBSTR(time, 1, 10)) AS day, COUNTIF(event = 'start') OVER(win) grp
      FROM `project.dataset.table`
      WINDOW win AS (PARTITION BY box_id ORDER BY time)
    )
    GROUP BY grp, box_id
    

    If applied to the sample data from your question

    WITH `project.dataset.table` AS (
      SELECT 1 id, 1001 box_id, 'start' event, '2019-06-13 16:00' time UNION ALL
      SELECT 2, 1001, 'end', '2019-06-13 15:22' UNION ALL
      SELECT 2, 2001, 'start', '2019-06-18 15:20' UNION ALL
      SELECT 3, 1001, 'start', '2019-06-13 15:20' UNION ALL
      SELECT 4, 2003, 'start', '2019-06-18 15:20'
    )
    

    the result is

    Row day         box_id  start                       end                         idle     
    1   2019-06-13  1001    2019-06-13 15:20:00 UTC     2019-06-13 15:22:00 UTC     120  
    2   2019-06-13  1001    2019-06-13 16:00:00 UTC     null                        0    
    3   2019-06-18  2001    2019-06-18 15:20:00 UTC     null                        0    
    4   2019-06-18  2003    2019-06-18 15:20:00 UTC     null                        0    
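
    The query above relies on `grp`, a running count of 'start' events per box_id, to tie each 'end' row to the 'start' that precedes it. A small sketch for inspecting that key on its own, assuming a hypothetical inline CTE `sample` holding the three rows of box 1001 as TIMESTAMPs:

    ```sql
    -- Hypothetical sample rows (assumption): box 1001 from the question, as TIMESTAMPs.
    WITH sample AS (
      SELECT 1001 AS box_id, 'start' AS event, TIMESTAMP '2019-06-13 15:20:00' AS time UNION ALL
      SELECT 1001, 'end',   TIMESTAMP '2019-06-13 15:22:00' UNION ALL
      SELECT 1001, 'start', TIMESTAMP '2019-06-13 16:00:00'
    )
    SELECT
      box_id,
      event,
      time,
      -- Running count of starts: increases by one at every 'start' row,
      -- so an 'end' row keeps the number of the start before it.
      COUNTIF(event = 'start') OVER (PARTITION BY box_id ORDER BY time) AS grp
    FROM sample
    ORDER BY box_id, time;
    ```

    Here the 15:20 start and the 15:22 end both get grp = 1, while the 16:00 start gets grp = 2, which is why GROUP BY grp, box_id produces one output row per start. To report idle in minutes, as in the question's expected result, MINUTE can be used in place of SECOND in TIMESTAMP_DIFF.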
    

    【Discussion】:

      【Solution 4】:

      A slightly different solution (using LAG):

      select
         date(end_time) as date,
         box_id,
         start_time,
         end_time,
         if(pevent = 'start' and event = 'end', timestamp_diff(end_time, start_time,minute), null) as idle
      from (
         select 
            box_id, 
            lag(time) over(partition by box_id order by time) as start_time, 
            time as end_time,  
            lag(event) over(partition by box_id order by time) as pevent,
            event
         from `dataset.table`
      )
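
      As written, the query returns one row per event, with idle filled in only where a 'start' is directly followed by an 'end'. A sketch of one possible variant that keeps only those completed pairs, assuming `time` is a TIMESTAMP and using a hypothetical inline CTE `sample` in place of `dataset.table`:

      ```sql
      -- Hypothetical sample rows (assumption): the question's data as TIMESTAMPs.
      WITH sample AS (
        SELECT 1001 AS box_id, 'start' AS event, TIMESTAMP '2019-06-13 16:00:00' AS time UNION ALL
        SELECT 1001, 'end',   TIMESTAMP '2019-06-13 15:22:00' UNION ALL
        SELECT 2001, 'start', TIMESTAMP '2019-06-18 15:20:00' UNION ALL
        SELECT 1001, 'start', TIMESTAMP '2019-06-13 15:20:00' UNION ALL
        SELECT 2003, 'start', TIMESTAMP '2019-06-18 15:20:00'
      )
      SELECT
        DATE(end_time) AS date,
        box_id,
        start_time,
        end_time,
        TIMESTAMP_DIFF(end_time, start_time, MINUTE) AS idle
      FROM (
        SELECT
          box_id,
          LAG(time) OVER w AS start_time,  -- time of the previous event for this box
          time AS end_time,
          LAG(event) OVER w AS pevent,     -- type of the previous event for this box
          event
        FROM sample
        WINDOW w AS (PARTITION BY box_id ORDER BY time)
      )
      -- Keep only rows where an 'end' directly follows a 'start'.
      WHERE pevent = 'start' AND event = 'end';
      ```

      This variant drops starts that have no matching end, so it does not reproduce the idle = 0 rows from the question's expected result; those would need the lead()-based or grouping approach from the other answers.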
      

      【Discussion】:
