【Question title】: BigQuery - Get most recent data for each individual user
【Posted】: 2021-04-14 07:57:31
【Question】:

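I am wondering whether anyone here can help with a BigQuery query I am working on.

It needs to pull the most recent gplus/currents activity for each user in the domain. I tried the following query, but it pulls every activity for each user: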

SELECT
  TIMESTAMP_MICROS(time_usec) AS date,
  email,
  event_type,
  event_name
FROM
  `bqadminreporting.adminlogtracking.activity`
WHERE
  record_type LIKE 'gplus'
ORDER BY
  email ASC;

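I have tried using DISTINCT, but I still get multiple entries for the same user. Ideally, I also need to limit the lookback to 90 days (that is, between today and 90 days ago, get the most recent activity for each user, if that makes sense). This led me to another question about the same problem.

Edit: sample data and expected output.

Fields: there are more than 500 fields, so I have listed only the relevant ones.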

+--------------------------------+---------+----------+
|           Field name           |  Type   |   Mode   |
+--------------------------------+---------+----------+
| time_usec                      | INTEGER | NULLABLE |
| email                          | STRING  | NULLABLE |
| event_type                     | STRING  | NULLABLE |
| event_name                     | STRING  | NULLABLE |
| record_type                    | STRING  | NULLABLE |
| gplus                          | RECORD  | NULLABLE |
| gplus. log_event_resource_name | STRING  | NULLABLE |
| gplus. attachment_type         | STRING  | NULLABLE |
| gplus. plusone_context         | STRING  | NULLABLE |
| gplus. post_permalink          | STRING  | NULLABLE |
| gplus. post_resource_name      | STRING  | NULLABLE |
| gplus. comment_resource_name   | STRING  | NULLABLE |
| gplus. post_visibility         | STRING  | NULLABLE |
| gplus. user_type               | STRING  | NULLABLE |
| gplus. post_author_name        | STRING  | NULLABLE |
+--------------------------------+---------+----------+

Output of my query: this is what I get when I run the query above.

+-----+--------------------------------+------------------+----------------+----------------+
| Row |              date              |      email       |   event_type   |   event_name   |
+-----+--------------------------------+------------------+----------------+----------------+
|   1 | 2020-01-30 07:10:19.088 UTC    | user1@domain.com | post_change    | create_post    |
|   2 | 2020-03-03 08:47:25.086485 UTC | user1@domain.com | coment_change  | create_comment |
|   3 | 2020-03-23 09:10:09.522 UTC    | user1@domain.com | post_change    | create_post    |
|   4 | 2020-03-23 09:49:00.337 UTC    | user1@domain.com | plusone_change | remove_plusone |
|   5 | 2020-03-23 09:48:10.461 UTC    | user1@domain.com | plusone_change | add_plusone    |
|   6 | 2020-01-30 10:04:29.757005 UTC | user1@domain.com | coment_change  | create_comment |
|   7 | 2020-03-28 08:52:50.711359 UTC | user2@domain.com | coment_change  | create_comment |
|   8 | 2020-11-08 10:08:09.161325 UTC | user2@domain.com | coment_change  | create_comment |
|   9 | 2020-04-21 15:28:10.022683 UTC | user3@domain.com | coment_change  | create_comment |
|  10 | 2020-03-28 09:37:28.738863 UTC | user4@domain.com | coment_change  | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+

Desired result: only one row per user, showing only the most recent event.

+-----+--------------------------------+------------------+----------------+----------------+
| Row |              date              |      email       |   event_type   |   event_name   |
+-----+--------------------------------+------------------+----------------+----------------+
|   1 | 2020-03-23 09:49:00.337 UTC    | user1@domain.com | plusone_change | remove_plusone |
|   2 | 2020-11-08 10:08:09.161325 UTC | user2@domain.com | coment_change  | create_comment |
|   3 | 2020-04-21 15:28:10.022683 UTC | user3@domain.com | coment_change  | create_comment |
|   4 | 2020-03-28 09:37:28.738863 UTC | user4@domain.com | coment_change  | create_comment |
+-----+--------------------------------+------------------+----------------+----------------+

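【Comments】:

  • Could you show some sample input data and the expected output?
  • I have updated my question to show the field types, my current output, and the output I want.

Tags: sql google-bigquery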


【Solution 1】:

If you want all the columns from the most recent row, you can use this BigQuery syntax:

select array_agg(t order by date desc limit 1)[ordinal(1)].*
from mytable t
group by t.email;

If you want specific columns, then Sergey's solution may be simpler.

【Comments】:
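
The query above uses placeholder names (`mytable`, `date`). A minimal sketch of how it might look against the table from the question, assuming the most recent row is the one with the largest `time_usec` (the source table has no `date` column):

```sql
-- Sketch only: orders by time_usec because the source table stores
-- event time as integer microseconds, not as a TIMESTAMP column.
SELECT
  ARRAY_AGG(t ORDER BY t.time_usec DESC LIMIT 1)[ORDINAL(1)].*
FROM
  `bqadminreporting.adminlogtracking.activity` AS t
WHERE
  t.record_type = 'gplus'
GROUP BY
  t.email;
```

Because `.*` expands every column of the winning row, this returns all 500+ fields per user, which is the trade-off against selecting a few named columns.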

    【Solution 2】:

    Use array_agg:

    select 
      email,
      array_agg(STRUCT(TIMESTAMP_MICROS(time_usec) as date, event_type, event_name) ORDER BY time_usec desc LIMIT 1)[OFFSET(0)].*
    from `bqadminreporting.adminlogtracking.activity`
    where
      record_type LIKE 'gplus'
      and time_usec > unix_micros(timestamp_sub(current_timestamp(), interval 90 day))
    group by email
    order by email
    

    Test example:

    with mytable as (
      select timestamp '2020-01-30 07:10:19.088 UTC' as date, 'user1@domain.com' as email, 'post_change' as event_type, 'create_post' as event_name union all
      select timestamp '2020-03-03 08:47:25.086485 UTC', 'user1@domain.com', 'coment_change', 'create_comment' union all
      select timestamp '2020-03-23 09:10:09.522 UTC', 'user1@domain.com', 'post_change', 'create_post' union all
      select timestamp '2020-03-23 09:49:00.337 UTC', 'user1@domain.com', 'plusone_change', 'remove_plusone' union all
      select timestamp '2020-03-23 09:48:10.461 UTC', 'user1@domain.com', 'plusone_change', 'add_plusone' union all
      select timestamp '2020-01-30 10:04:29.757005 UTC', 'user1@domain.com', 'coment_change', 'create_comment' union all
      select timestamp '2020-03-28 08:52:50.711359 UTC', 'user2@domain.com', 'coment_change', 'create_comment' union all
      select timestamp '2020-11-08 10:08:09.161325 UTC', 'user2@domain.com', 'coment_change', 'create_comment' union all
      select timestamp '2020-04-21 15:28:10.022683 UTC', 'user3@domain.com', 'coment_change', 'create_comment' union all
      select timestamp '2020-03-28 09:37:28.738863 UTC', 'user4@domain.com', 'coment_change', 'create_comment'
    )
    select 
      email,
      array_agg(STRUCT(date, event_type, event_name) ORDER BY date desc LIMIT 1)[OFFSET(0)].*
    from mytable
    group by email
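
The same "one row per user" result can also be expressed with a window function instead of `array_agg`. This is a different technique from the one in this answer, sketched here for comparison on a reduced set of the question's sample rows; the CTE name `mytable` and the column name `date` follow the test example above.

```sql
WITH mytable AS (
  SELECT TIMESTAMP '2020-03-23 09:48:10.461 UTC' AS date, 'user1@domain.com' AS email, 'plusone_change' AS event_type, 'add_plusone' AS event_name UNION ALL
  SELECT TIMESTAMP '2020-03-23 09:49:00.337 UTC', 'user1@domain.com', 'plusone_change', 'remove_plusone' UNION ALL
  SELECT TIMESTAMP '2020-03-28 08:52:50.711359 UTC', 'user2@domain.com', 'coment_change', 'create_comment' UNION ALL
  SELECT TIMESTAMP '2020-11-08 10:08:09.161325 UTC', 'user2@domain.com', 'coment_change', 'create_comment'
)
SELECT
  date,
  email,
  event_type,
  event_name
FROM mytable
WHERE TRUE  -- harmless filter; included in case QUALIFY requires a WHERE, GROUP BY, or HAVING clause
-- Number each user's rows from newest to oldest and keep only the newest.
QUALIFY ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1
ORDER BY email;
```

With these four rows, the query keeps the `remove_plusone` row for user1 and the 2020-11-08 row for user2. If two rows for one user share the same timestamp, `ROW_NUMBER()` picks one of them arbitrarily.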
    

    【Comments】:

    • Ah! Amazing. If I only wanted to look back 90 days, do you know how I would do that, Sergey?
    • Oh, I forgot. I have added time_usec > unix_micros(timestamp_sub(current_timestamp(), interval 90 day))
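
The 90-day condition works because `time_usec` is an INTEGER holding microseconds since the Unix epoch, so the cutoff has to be converted into the same unit before comparing. A small sketch that only prints the values involved (the column aliases are illustrative):

```sql
SELECT
  CURRENT_TIMESTAMP() AS now_ts,
  -- The moment exactly 90 days before now, as a TIMESTAMP.
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AS cutoff_ts,
  -- The same moment as integer microseconds, directly comparable to time_usec.
  UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)) AS cutoff_usec;
```

Comparing the raw `time_usec` column against `cutoff_usec` computes the cutoff once, rather than wrapping the column in `TIMESTAMP_MICROS` for every row.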
    【Solution 3】:

    Another way to solve the problem is:

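    -- Find each user's latest timestamp, then join back to fetch that row.
    select t.*
    from mytable t
    join (
      select email, max(date1) as max_dt
      from mytable
      group by email
    ) m
      on t.email = m.email and t.date1 = m.max_dt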
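
To check the max-and-join idea on a per-user basis, the sketch below runs it against a reduced set of the question's sample rows. The CTE and the column name `date1` are placeholders taken from this answer, not names from the real table.

```sql
WITH mytable AS (
  SELECT TIMESTAMP '2020-03-23 09:48:10.461 UTC' AS date1, 'user1@domain.com' AS email, 'plusone_change' AS event_type, 'add_plusone' AS event_name UNION ALL
  SELECT TIMESTAMP '2020-03-23 09:49:00.337 UTC', 'user1@domain.com', 'plusone_change', 'remove_plusone' UNION ALL
  SELECT TIMESTAMP '2020-03-28 08:52:50.711359 UTC', 'user2@domain.com', 'coment_change', 'create_comment' UNION ALL
  SELECT TIMESTAMP '2020-11-08 10:08:09.161325 UTC', 'user2@domain.com', 'coment_change', 'create_comment'
)
SELECT t.*
FROM mytable AS t
JOIN (
  -- One row per user holding that user's latest timestamp.
  SELECT email, MAX(date1) AS max_dt
  FROM mytable
  GROUP BY email
) AS m
  ON t.email = m.email AND t.date1 = m.max_dt
ORDER BY t.email;
```

Unlike the `array_agg ... LIMIT 1` approach, a join on the maximum timestamp returns every tied row when a user has several events with the same latest timestamp.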
    

    【Comments】:
