【Question title】: BigQuery: pivot and aggregate repeated fields
【Posted】: 2017-08-11 19:05:06
【Question】:

I want to pivot the "unitId" and "firebase_screen_class" fields so that each one appears in its own column:

SELECT
  event.name,
  event_param.value.string_value AS ad_unit,
  COUNT(*) AS event_count
FROM
  `app_events_20170510`, 
  UNNEST(event_dim) AS event, 
  UNNEST(event.params) as event_param
WHERE
  event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
  and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2

I tried the following query in legacy SQL, but it does not return the correct aggregated results:

SELECT event_name, ad_unit, count(*) FROM
(
SELECT
  event_dim.name as event_name,
  MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
  MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
  [app_events_20170510]
WHERE
  event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
  and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2

I am looking for the following output:

_________________________________________________________________________
| event_dim.name   | unitId         | screen_class         | count_events|
|__________________|________________|______________________|_____________|
| Ad_requested     | hpg            | socialFeed           |    520      |
|__________________|________________|______________________|_____________|
| Ad_shown         | hpg            | chat                 |    950      |
|__________________|________________|______________________|_____________|
| Ad_requested     | hni            | chat                 |    740      |
|__________________|________________|______________________|_____________|

All of the events (Ad_requested, Ad_shown, Ad_clicked) have the same keys in their params (unitId, screen_class), and the same set of values for each key (unitId: hpg, hni / screen_class: socialFeed, chat).

【Question comments】:

Tags: google-bigquery


【Solution 1】:

The following is for BigQuery Standard SQL:

#standardSQL
WITH `aggregation` AS (
  SELECT
    event.name,
    event_param.key,
    COUNT(*) AS event_count
  FROM
    `app_events_20170510`, 
    UNNEST(event_dim) AS event, 
    UNNEST(event.params) AS event_param
  WHERE
    event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
    AND event_param.key IN ('unitId', 'firebase_screen_class','house')
  GROUP BY 1, 2
)
SELECT 
  name,
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
  MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name  

Update based on the clarification in the comments:

#standardSQL
SELECT
  event.name,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
  COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4
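
To try the updated query without access to the real export, here is a minimal sketch that runs it against inline sample data. The `sample_events` CTE and its three events are hypothetical stand-ins for `app_events_20170510`; only the `event_dim` / `params` / `value.string_value` shape is taken from the queries above. Each scalar subquery pulls one key out of the event's own params array, so it assumes a key appears at most once per event (a repeated key would make the subquery fail with more than one row).

```sql
#standardSQL
-- Hypothetical inline sample standing in for `app_events_20170510`:
-- one table row whose event_dim array holds three events.
WITH sample_events AS (
  SELECT [
    STRUCT('Ad_requested' AS name,
           [STRUCT('unitId' AS key, STRUCT('hpg' AS string_value) AS value),
            STRUCT('firebase_screen_class' AS key, STRUCT('socialFeed' AS string_value) AS value)] AS params),
    STRUCT('Ad_requested' AS name,
           [STRUCT('unitId' AS key, STRUCT('hpg' AS string_value) AS value),
            STRUCT('firebase_screen_class' AS key, STRUCT('socialFeed' AS string_value) AS value)] AS params),
    STRUCT('Ad_shown' AS name,
           [STRUCT('unitId' AS key, STRUCT('hpg' AS string_value) AS value),
            STRUCT('firebase_screen_class' AS key, STRUCT('chat' AS string_value) AS value)] AS params)
  ] AS event_dim
)
SELECT
  event.name,
  -- One scalar subquery per key turns that key's value into a column.
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  COUNT(1) AS event_count
FROM sample_events, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
```

With this sample, the two identical Ad_requested events should collapse into a single row with event_count 2, and the Ad_shown event should produce a row with event_count 1. Because the params are pivoted per event before grouping, the count is per event rather than per parameter row.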

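"... out of curiosity, I tried to replicate the query using legacy SQL ..."

Added a version for BigQuery Legacy SQL (purely for learning purposes; I hope it helps those who are considering migrating to Standard SQL, since both versions of the same task are now available here):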

#legacySQL
SELECT name, unitId, firebase_screen_class, house, COUNT(1) AS event_count
FROM (
  SELECT event_dim.name AS name,
    MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
    MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
    MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house
  FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
  WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4

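【Comments】:

  • Thanks for your answer. I ran the query, but it is not what I am trying to achieve. I am looking for: event_name | unitId.values | house.values | firebase_screen_class.values | count_event
  • I see what you mean now
  • Yes, sorry. I accidentally posted my comment before finishing it :-)
  • I just realized that I don't know what you mean by xxx.values. Is it a list of the string_values for the respective keys, or something else? I think you should provide more details and an example of the output!
  • @DorianRoy - try replacing `app_events_20170510` with (SELECT * FROM `IOS.app_events_20171106` UNION ALL SELECT * FROM `ANDROID.app_events_20171106`)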