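【Question title】: Count the number of events before and after an event "A" until another event "A" is encountered in BigQuery?
【Posted】: 2018-02-15 15:25:18
【Question】:

I have a table containing a date, an event, and a user. There is an event named "A". I want to find out how many times a particular kind of event occurred before and after event "A" in BigQuery SQL. Event A may appear multiple times, and the count should stop as soon as another event A is encountered, in both the before and the after direction.
For example,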

User    Date          Events
123     2018-02-14    X.Y.A
123     2018-02-12    X.Y.B
134     2018-02-10    Y.Z.A
123     2018-02-11    A
123     2018-02-01    X.Y.Z
134     2018-02-05    X.Y.B
134     2018-02-04    A
123     2018-02-13    A

The output would look like this.

User       Event    Before   After
123          A      1        1
123          A      0        1
134          A      0        1

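All other conditions stay the same.

This question is an extension of my previous question.

For details, see "How to count number of a particular event before another event in SQL Bigquery?"

The events I have to count carry a specific prefix, meaning I have to check for events that start with X.Y. followed by some event name. So X.Y.SomeEvent is the kind of event I need a counter for. Any suggestions?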

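【Question comments】:

  • We could easily answer, but if you just ask without trying first, you will never learn. Show what you have tried so far and which problems you still need to solve to make it work. By now you should have a good foundation from your previous question.

Tags: sql google-bigquery legacy-sql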


【Solution 1】:

The following is for BigQuery Standard SQL

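#standardSQL
WITH grps AS (
  SELECT user, dt, event, 
    -- running count of 'A' events per user: each 'A' and the rows after it
    -- (up to the next 'A') share the same group id
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    -- before: prefixed events in the previous group (rows with grp = current grp - 1)
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    -- after: prefixed events in the current group (rows with the same grp)
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
-- ORDER BY user  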
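
To make the windowing logic easier to follow, here is a minimal Python sketch of the same idea. It is not BigQuery code, only an emulation under stated assumptions: rows are `(user, dt, event)` tuples, dates are ISO strings, and the names `count_around_a` and `SAMPLE_ROWS` are made up for illustration. A running count of `A` events per user serves as a group id; `before` is the number of `X.Y.`-prefixed events in the previous group and `after` is the number in the current group.

```python
from collections import Counter, defaultdict

# Sample rows copied from the question: (user, dt, event)
SAMPLE_ROWS = [
    (123, "2018-02-14", "X.Y.A"),
    (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"),
    (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"),
    (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
]

def count_around_a(rows, marker="A", prefix="X.Y."):
    """Return [(dt, user, before, after)] for every marker row."""
    by_user = defaultdict(list)
    for user, dt, event in rows:
        by_user[user].append((dt, event))

    result = []
    for user in sorted(by_user):
        grp = 0               # running count of marker events seen so far
        prefixed = Counter()  # group id -> number of prefixed events in it
        markers = []          # (dt, group id) for each marker row
        # ISO date strings sort chronologically.
        for dt, event in sorted(by_user[user]):
            if event == marker:
                grp += 1      # mirrors COUNTIF(event = 'A') OVER (... ORDER BY dt)
                markers.append((dt, grp))
            elif event.startswith(prefix):
                prefixed[grp] += 1
        for dt, g in markers:
            # previous group -> before, current group -> after
            result.append((dt, user, prefixed[g - 1], prefixed[g]))
    return result

print(count_around_a(SAMPLE_ROWS))
```

On the question's sample data this yields before/after values of 1/1 and 1/1 for user 123 and 0/1 for user 134.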

You can test / play with the above using the dummy data in the example below

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user,  '2018-02-14' dt, 'X.Y.A' event UNION ALL
  SELECT 123,       '2018-02-13', 'A'     UNION ALL
  SELECT 123,       '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123,       '2018-02-11', 'A'     UNION ALL
  SELECT 123,       '2018-02-01', 'X.Y.Z' UNION ALL
  SELECT 134,       '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134,       '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134,       '2018-02-04', 'A'     
), grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
ORDER BY user  

The result is

Row dt          user    event   before  after    
1   2018-02-11  123     A       1       1    
2   2018-02-13  123     A       1       1    
3   2018-02-04  134     A       0       1    

【Comments】:

    【Solution 2】:

    This is a more general problem. You can use the same idea with lag() and lead():

    select userid,
           -- rows between the previous 'A' (or the start) and this 'A'
           (seqnum - lag(seqnum, 1, 0) over (partition by userid order by date) - 1) as before,
           -- rows between this 'A' and the next 'A' (or the end);
           -- cnt + 1 stands in for a row just past the last one
           (lead(seqnum, 1, cnt + 1) over (partition by userid order by date) - seqnum - 1) as after
    from (select t.*,
                 row_number() over (partition by userid order by date) as seqnum,
                 count(*) over (partition by userid) as cnt
          from t
          where event like 'X.Y.%' or event = 'A'
         ) t
    where event = 'A';
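
The row-number arithmetic can be checked outside BigQuery. The following is a minimal Python sketch, assuming the same filtering as the inner query (only `A` and `X.Y.`-prefixed events are kept) and 1-based row numbers per user. The function name `before_after_by_seqnum` and the sample list are hypothetical. The gap between two consecutive `A` row numbers, minus one, is the number of prefixed events between them; 0 and `cnt + 1` serve as sentinels when there is no previous or next `A`.

```python
from collections import defaultdict

def before_after_by_seqnum(rows, marker="A", prefix="X.Y."):
    """rows: iterable of (user, dt, event). Returns [(user, dt, before, after)]."""
    kept = defaultdict(list)
    for user, dt, event in rows:
        # Same filter as the inner query: keep only marker and prefixed events.
        if event == marker or event.startswith(prefix):
            kept[user].append((dt, event))

    out = []
    for user in sorted(kept):
        events = sorted(kept[user])  # ISO date strings sort chronologically
        cnt = len(events)            # like count(*) over (partition by userid)
        # 1-based positions of the marker rows, like row_number().
        pos = [(i, dt) for i, (dt, ev) in enumerate(events, start=1) if ev == marker]
        for k, (seqnum, dt) in enumerate(pos):
            prev_seq = pos[k - 1][0] if k > 0 else 0                   # lag default
            next_seq = pos[k + 1][0] if k + 1 < len(pos) else cnt + 1  # lead default
            out.append((user, dt, seqnum - prev_seq - 1, next_seq - seqnum - 1))
    return out

sample = [
    (123, "2018-02-14", "X.Y.A"), (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"), (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"), (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"), (134, "2018-02-04", "A"),
]
print(before_after_by_seqnum(sample))
```

Note that a sentinel of `cnt` instead of `cnt + 1` would undercount the events after the last `A` by one, because the last kept row itself has row number `cnt`.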
    

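    【Comments】:

    • Hey Gordon, when I tried to implement this, it gave me an error saying that event is not recognized in the last where condition. Could you take a look at the query and suggest what I am doing wrong?
    • @VSR . . . If you want to use it in the outer query, you need to select it in the subquery.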