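【Question Title】:Time difference between 2 distinct events in BigQuery
【Posted】:2020-10-29 04:04:40
【Question】:

I am trying to calculate the time difference between 2 events in BigQuery (they are 2 custom events we set up in Firebase). The first is event_a, and the second is event_b, which fires at some point after event_a (however long that takes).

I tried the following query: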

SELECT round(AVG(time_diff),2) avg_duration_minutes
FROM(
SELECT user_pseudo_id,        
  CASE WHEN event_name = 'event_a' AND 
 LEAD(event_name,1) OVER(PARTITION BY user_id ORDER BY event_timestamp ASC) = 'event_b'
   THEN TIMESTAMP_DIFF(TIMESTAMP_MICROS(LEAD(event_timestamp, 1) OVER(PARTITION BY user_id ORDER BY event_timestamp ASC)), TIMESTAMP_MICROS(event_timestamp), minute) END time_diff
FROM  `database`
WHERE event_name in ('event_a', 'event_b')
)
where time_diff > 0.2

Sample data:

user_pseudo_id   event      timestamp
aaa              event_a    1587995938387000
bbb              event_a    1590948191239003
aaa              event_b    1587995943075005
ccc              event_a    1589130017650008
aaa              event_a    1593078261900005
aaa              event_b    1593078881226002
bbb              event_b    1590948208425007
ccc              event_b    1589130462706020

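The result I want is the average time and the total time between event_a and event_b for each user.

Do you have any suggestions? What matters is knowing how much time elapsed between the 2 specific events (regardless of when the second one happens).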
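To make the desired output concrete, here is a minimal Python sketch that computes the per-user average and total directly from the sample rows above. It is not part of the original post. The pairing rule (each event_b closes the most recent unmatched event_a of the same user) and the helper name `summarize` are assumptions made for illustration. Timestamps are treated as microseconds, as in the Firebase export.

```python
from collections import defaultdict

# Sample rows copied from the question: (user_pseudo_id, event, timestamp in microseconds).
ROWS = [
    ("aaa", "event_a", 1587995938387000),
    ("bbb", "event_a", 1590948191239003),
    ("aaa", "event_b", 1587995943075005),
    ("ccc", "event_a", 1589130017650008),
    ("aaa", "event_a", 1593078261900005),
    ("aaa", "event_b", 1593078881226002),
    ("bbb", "event_b", 1590948208425007),
    ("ccc", "event_b", 1589130462706020),
]


def summarize(rows):
    """Return {user: (avg_us, total_us)} over event_a -> event_b gaps (hypothetical helper)."""
    events_by_user = defaultdict(list)
    for user, event, ts in rows:
        events_by_user[user].append((ts, event))

    result = {}
    for user, events in events_by_user.items():
        gaps = []
        pending_a = None  # latest event_a still waiting for an event_b (assumed pairing rule)
        for ts, event in sorted(events):
            if event == "event_a":
                pending_a = ts
            elif event == "event_b" and pending_a is not None:
                gaps.append(ts - pending_a)
                pending_a = None
        if gaps:
            result[user] = (sum(gaps) / len(gaps), sum(gaps))
    return result


# Print average and total per user, converted from microseconds to seconds.
for user, (avg_us, total_us) in sorted(summarize(ROWS).items()):
    print(user, round(avg_us / 1e6, 3), round(total_us / 1e6, 3))
```

Under these assumptions, user aaa has two pairs (about 4.7 s and 619.3 s), giving an average of roughly 312 s and a total of roughly 624 s. Users bbb and ccc each have a single pair, of about 17.2 s and 445.1 s respectively. A correct SQL query should reproduce these figures.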

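【Comments on the question】:

  • Please provide sample data and desired results. "Doesn't seem right" is not helpful either. Describe the problem.
  • I agree with @GordonLinoff: could you provide sample data and the desired output, so that a query can be written and tested against it?
  • @GordonLinoff Sorry, this is my first time using this site. Is it better now? I added a simple table with data similar to what is in my database.

Tags: sql google-bigquery


【Solution 1】:

The following is for BigQuery Standard SQL: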

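#standardSQL
-- Assumes `timestamp` holds microseconds (as in the sample data), so both durations are in microseconds.
SELECT
  user_pseudo_id,
  AVG(duration) AS avg_duration,
  SUM(duration) AS total_duration
FROM (
  -- Gap from each row to the same user's next row; this equals the event_a -> event_b gap
  -- only when every event_a is immediately followed by its event_b.
  SELECT *, LEAD(timestamp) OVER(win) - timestamp AS duration
  FROM `project.dataset.table`
  WHERE event IN ('event_a', 'event_b')
  WINDOW win AS (PARTITION BY user_pseudo_id ORDER BY timestamp)
)
WHERE event = 'event_a'
GROUP BY user_pseudo_id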

【Comments】:

【Solution 2】:

If you want the time of the first event_b after each event_a, you can use a conditional cumulative minimum:

    SELECT ab.*
    FROM (SELECT user_pseudo_id, event_name, event_timestamp AS event_a_timestamp,
                 -- Earliest event_b timestamp from the current row onward
                 MIN(CASE WHEN event_name = 'event_b' THEN event_timestamp END) OVER
                     (PARTITION BY user_pseudo_id
                      ORDER BY event_timestamp
                      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                     ) AS event_b_timestamp
          FROM `database`
          WHERE event_name IN ('event_a', 'event_b')
         ) ab
    WHERE event_name = 'event_a'
    

Your question does not provide enough detail to determine what else needs to be done.
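The difference between looking at the next row (the LEAD approach) and taking the conditional minimum only shows up when a user fires event_a more than once before event_b. The Python sketch below mimics both window computations on a small made-up event list. The data and both function names are hypothetical and are not taken from the thread.

```python
# Made-up events for one user: (timestamp, event). Two event_a rows precede one event_b.
EVENTS = [(100, "event_a"), (130, "event_a"), (190, "event_b")]


def next_row_gaps(events):
    """Gap from each event_a to the very next row, whatever its event type (LEAD-style)."""
    ordered = sorted(events)
    return [
        ordered[i + 1][0] - ts
        for i, (ts, event) in enumerate(ordered[:-1])
        if event == "event_a"
    ]


def first_b_gaps(events):
    """Gap from each event_a to the earliest event_b at or after it (conditional minimum)."""
    ordered = sorted(events)
    gaps = []
    for i, (ts, event) in enumerate(ordered):
        if event != "event_a":
            continue
        later_b = [t for t, e in ordered[i:] if e == "event_b"]
        if later_b:  # an event_a with no later event_b is skipped here
            gaps.append(min(later_b) - ts)
    return gaps


print(next_row_gaps(EVENTS))  # [30, 60]
print(first_b_gaps(EVENTS))   # [90, 60]
```

With the next-row rule, the first event_a is measured against the second event_a (a gap of 30) rather than against event_b (a gap of 90). The conditional minimum measures both event_a rows against the same event_b. Which behavior is wanted depends on how repeated event_a rows should be treated, and the question does not say.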

【Comments】:

【Solution 3】:

I would answer it like this:

      with data as (
        select user_pseudo_id, event_name, event_timestamp from `database` where event_name in ('event_a', 'event_b')
      ),
      ea as (
       -- Get first event_a per user
       select user_pseudo_id, min(event_timestamp) as first_a_ts from data where event_name = 'event_a' group by 1
      ),
      eb as (
       -- Get first event_b per user
       select user_pseudo_id, min(event_timestamp) as first_b_ts from data where event_name = 'event_b' group by 1
      ),
      joined as (
        -- Assume we only want to calculate duration if user has an event_b, hence inner join
        select * 
        from ea 
        inner join eb using(user_pseudo_id) 
        where first_b_ts > first_a_ts
      )
      select 
        avg(timestamp_diff(timestamp_micros(first_b_ts), timestamp_micros(first_a_ts), second)) / 60.0 as avg_duration_minutes
      from joined
      

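I did not include your 0.2 filter, because I am not sure why you would arbitrarily filter out differences shorter than 12 seconds (0.2 minutes).

【Comments】: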
