Calculate the duration between two dates in different rows in BigQuery (partition)

【Question title】: Calculate duration between two dates in different rows in BigQuery (partition)
【Posted】: 2019-10-10 07:20:47
【Question】:

I have data like this:

`id      box_id         event               time                     
1       1001           'start'       2019-06-13 16:00                                       
2       1001           'end'         2019-06-13 15:22             
2       2001           'start'       2019-06-18 15:20                
3       1001           'start'       2019-06-13 15:20               
4       2003           'start'       2019-06-18 15:20`

Expected result:

date          box_id         start                end              idle 
 2019-06-13    1001       2019-06-13 16:00         NA              0 
 2019-06-13    1001       2019-06-13 15:20    2019-06-13 15:22     2 
 2019-06-18    2001       2019-06-18 15:20         NA              0 
 2019-06-18    2003       2019-06-18 15:20         NA              0

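I want to get the difference between two timestamps, pairing each 'start' with the closest following 'end' for the same box_id. When a 'start' for a box_id has no matching 'end' event, that row should show idle = 0. How should I do this? I have read some references about using OVER (PARTITION BY ...).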

【Question comments】:

Tags: sql google-bigquery diff partitioning duration

【Solution 1】:

Use lead():

select cast(time as date) as date,
       box_id,
       time as start_time,
       end_time
from (select t.*,
             lead(time) over (partition by box_id order by time) as end_time
      from t
     ) t
where event = 'start';

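【Comments】:

Thanks, it works! But how can I get the duration between end_time and start_time in the same query above? @Gordon
@Nadyaf ... If you want the difference in minutes, use timestamp_diff() or datetime_diff(), depending on the type of the arguments.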
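To make the comment above concrete, here is a minimal sketch that folds timestamp_diff() into the lead() query. It assumes `time` is a TIMESTAMP column; the CTE `t` simply reproduces the question's sample rows, and the names `next_time` and `next_event` are introduced here for illustration. It also adds one guard that is not in the original answer: the following row only counts as an end time when its event is 'end', so two consecutive 'start' rows do not get paired with each other.

```sql
#standardSQL
-- Sample rows from the question, with `time` assumed to be a TIMESTAMP.
with t as (
  select 1 as id, 1001 as box_id, 'start' as event, timestamp '2019-06-13 16:00:00' as time union all
  select 2, 1001, 'end',   timestamp '2019-06-13 15:22:00' union all
  select 2, 2001, 'start', timestamp '2019-06-18 15:20:00' union all
  select 3, 1001, 'start', timestamp '2019-06-13 15:20:00' union all
  select 4, 2003, 'start', timestamp '2019-06-18 15:20:00'
)
select date(time) as date,
       box_id,
       time as start_time,
       -- keep the next row's time only when that row really is an 'end' event
       if(next_event = 'end', next_time, null) as end_time,
       -- minutes between start and end; 0 when there is no matching 'end'
       if(next_event = 'end', timestamp_diff(next_time, time, minute), 0) as idle
from (select t.*,
             lead(time)  over (partition by box_id order by time) as next_time,
             lead(event) over (partition by box_id order by time) as next_event
      from t
     ) t
where event = 'start';
```

If `time` is stored as DATETIME instead, datetime_diff() takes the same argument order. On the sample rows, the 15:20 start for box 1001 pairs with the 15:22 end and yields idle = 2, while the other three starts have no end and yield idle = 0.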

【Solution 2】:

Hi @Nadyav: below is a pseudocode outline to help you get started.

【Comments】:

@fintangilane thanks for the suggestion, but wouldn't it be too much if I have to create new columns?

【Solution 3】:

Below is for BigQuery Standard SQL

#standardSQL
SELECT MIN(day) AS day, box_id, 
  MAX(IF(event = 'start', time, NULL)) start,
  MAX(IF(event = 'end', time, NULL)) `end`,
  IFNULL(TIMESTAMP_DIFF(MAX(IF(event = 'end', time, NULL)), MAX(IF(event = 'start', time, NULL)), SECOND), 0) idle
FROM (
  SELECT box_id, event,
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M', time) time,
    PARSE_DATE('%Y-%m-%d', SUBSTR(time, 1, 10)) AS day,
    COUNTIF(event = 'start') OVER(win) grp
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY box_id ORDER BY time)
)
GROUP BY grp, box_id

If applied to the sample data from your question:

WITH `project.dataset.table` AS (
  SELECT 1 id, 1001 box_id, 'start' event, '2019-06-13 16:00' time UNION ALL
  SELECT 2, 1001, 'end', '2019-06-13 15:22' UNION ALL
  SELECT 2, 2001, 'start', '2019-06-18 15:20' UNION ALL
  SELECT 3, 1001, 'start', '2019-06-13 15:20' UNION ALL
  SELECT 4, 2003, 'start', '2019-06-18 15:20'
)

the result is

Row day         box_id  start                       end                         idle     
1   2019-06-13  1001    2019-06-13 15:20:00 UTC     2019-06-13 15:22:00 UTC     120  
2   2019-06-13  1001    2019-06-13 16:00:00 UTC     null                        0    
3   2019-06-18  2001    2019-06-18 15:20:00 UTC     null                        0    
4   2019-06-18  2003    2019-06-18 15:20:00 UTC     null                        0
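The grouping in this answer rests on the running COUNTIF: every 'start' row increments the counter within its box_id, so a 'start' and the 'end' that follows it share the same grp value. The sketch below only exposes that intermediate column on the question's sample rows; the CTE name `events` is a placeholder chosen here, and `time` is kept as a string as in the answer (the 'YYYY-MM-DD HH:MM' format sorts chronologically).

```sql
#standardSQL
-- Question's sample rows; `events` is a hypothetical name for illustration.
WITH events AS (
  SELECT 1 id, 1001 box_id, 'start' event, '2019-06-13 16:00' time UNION ALL
  SELECT 2, 1001, 'end', '2019-06-13 15:22' UNION ALL
  SELECT 2, 2001, 'start', '2019-06-18 15:20' UNION ALL
  SELECT 3, 1001, 'start', '2019-06-13 15:20' UNION ALL
  SELECT 4, 2003, 'start', '2019-06-18 15:20'
)
SELECT box_id, event, time,
  -- running count of 'start' rows per box_id, ordered by time
  COUNTIF(event = 'start') OVER (PARTITION BY box_id ORDER BY time) AS grp
FROM events
ORDER BY box_id, time
-- box 1001: 15:20 start -> grp 1, 15:22 end -> grp 1, 16:00 start -> grp 2
-- box 2001 and box 2003: single start -> grp 1
```

Grouping by (box_id, grp) then collapses each start/end pair into one output row. The answer reports idle in SECOND (120 for the paired row); using MINUTE as the date part in TIMESTAMP_DIFF would give the 2 shown in the question's expected result.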

【Comments】:

【Solution 4】:

A slightly different solution (using LAG):

select
   date(end_time) as date,
   box_id,
   start_time,
   end_time,
   if(pevent = 'start' and event = 'end', timestamp_diff(end_time, start_time, minute), null) as idle
from (
   select 
      box_id, 
      lag(time) over(partition by box_id order by time) as start_time, 
      time as end_time,  
      lag(event) over(partition by box_id order by time) as pevent,
      event
   from `dataset.table`
)

【Comments】: