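【Posted】: 2019-07-09 23:33:55
【Question】:
I'm trying to combine data from two flat, related BigQuery views into a single nested table schema using Standard SQL.
I have two tables like these:
Analytics data: one row per minute across a global time span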
-------------------------------------------------------------------
minute_index | users | users_new | ...
-------------------------------------------------------------------
1312017 | 8 | 3 | ...
1312018 | 9 | 2 | ...
1312019 | 5 | 1 | ...
1312020 | 3 | 0 | ...
1312021 | 5 | 2 | ...
1312023 | 4 | 3 | ...
1312024 | 7 | 4 | ...
1312025 | 6 | 3 | ...
1312026 | 9 | 4 | ...
Event data: one row for each external event that occurred
----------------------------------------
minute_index | event |
----------------------------------------
1312019 | "TV Spot Broadcast" |
1312023 | "Radio Spot Broadcast" |
1312026 | "Radio Spot Broadcast" |
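I'm trying to merge them into a single table in which each row holds one event together with the subset of the Analytics table that spans the event's minute and the next few minutes (let's say 5):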
-----------------------------------------------------------------------------
minute_index | event | window_treated |
-----------------------------------------------------------------------------
1312019 | "TV Spot Broadcast" | minute_index | users | users_new |
|------------------------------------
| 1312019 | 5 | 1 |
| 1312020 | 3 | 0 |
| 1312021 | 5 | 2 |
| 1312023 | 4 | 3 |
| 1312024 | 7 | 4 |
-----------------------------------------------------------------------------
1312023 | "Radio Spot Broadcast" | minute_index | users | users_new |
|------------------------------------
| 1312023 | 4 | 3 |
| 1312024 | 7 | 4 |
| 1312025 | 6 | 3 |
| 1312026 | 9 | 4 |
| ... | ... | ... |
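I have actually been able to build a nested table like this, but only by building and joining a set of intermediate tables that is clearly far more complicated than it should be. I'd like to understand how to do this kind of thing in a single query.
This is just one example of the various approaches I've tried: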
SELECT
ed.timestamp AS timestamp,
ed.minute_index AS minute_index,
(SELECT AS STRUCT
ad.minute_index, ad.users, ad.users_new
FROM `my_project.my_dataset.analytics_data` ad
WHERE (ad.minute_index >= ed.minute_index)
AND (ad.minute_index < (ed.minute_index + 5))
ORDER BY
ed.minute_index) AS units_treated
FROM
`my_project.my_dataset.event_data` ed
It is one of several attempts that seemed close, but they all fail with the same validator error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
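For reference, the de-correlation that the error message hints at might look roughly like the sketch below: the correlated subquery is replaced by a range JOIN, and the matched analytics rows are folded back into one nested column with ARRAY_AGG. This is only a sketch under assumptions, not a verified solution: the table and column names are the ones used above, the `event` column is assumed to exist in `event_data` as shown in the sample, and the window of 5 minutes is the example value.

```sql
-- Sketch only: JOIN + ARRAY_AGG instead of a correlated subquery.
-- Assumes event_data(minute_index, event) and
-- analytics_data(minute_index, users, users_new) as in the samples above.
SELECT
  ed.minute_index,
  ed.event,
  -- Collect every analytics row in the window into one nested array,
  -- ordered by the analytics minute.
  ARRAY_AGG(
    STRUCT(
      ad.minute_index AS minute_index,
      ad.users AS users,
      ad.users_new AS users_new
    )
    ORDER BY ad.minute_index
  ) AS window_treated
FROM `my_project.my_dataset.event_data` AS ed
JOIN `my_project.my_dataset.analytics_data` AS ad
  -- Window: the event's minute plus the following 4 minutes.
  ON ad.minute_index >= ed.minute_index
  AND ad.minute_index < ed.minute_index + 5
GROUP BY
  ed.minute_index,
  ed.event
```

Two caveats with this shape: the inner JOIN drops any event that has no analytics rows in its window, and grouping by `minute_index` and `event` merges two identical events that fall in the same minute. The window is also defined by minute values, so a minute missing from the analytics data (such as 1312022 in the sample) simply yields a shorter array rather than pulling in a later row.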
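【Comments】:
- I've been able to find a handful of answers here that come close but don't quite fit, such as this one: Avoid correlated subqueries error in BigQuery. They generally don't give enough guidance on where I need to go from there.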