[Posted]: 2021-07-29 23:28:53
[Question]:
I am trying to forward fill a table in BigQuery, but the query runs out of resources during execution. The table is 2 GB. It looks like this:
with t as (
select timestamp '2021-05-01 00:00:01' as time, 10 as number union all
select timestamp '2021-05-01 05:00:01' as time, NULL as number union all
select timestamp '2021-05-01 23:00:01' as time, 20 as number union all
select timestamp '2021-05-02 00:00:01' as time, NULL as number union all
select timestamp '2021-05-02 01:00:01' as time, NULL as number union all
select timestamp '2021-05-02 05:00:01' as time, 12 as number
)
| time | number |
|---|---|
| 2021-05-01 00:00:01 | 10 |
| 2021-05-01 05:00:01 | NULL |
| 2021-05-01 23:00:01 | 20 |
| 2021-05-02 00:00:01 | NULL |
| 2021-05-02 01:00:01 | NULL |
| 2021-05-02 05:00:01 | 12 |
The desired output is:
| time | number |
|---|---|
| 2021-05-01 00:00:01 | 10 |
| 2021-05-01 05:00:01 | 10 |
| 2021-05-01 23:00:01 | 20 |
| 2021-05-02 00:00:01 | 20 |
| 2021-05-02 01:00:01 | 20 |
| 2021-05-02 05:00:01 | 12 |
My current solution is:
SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(ORDER BY time) AS number
FROM t
It throws:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
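The problem seems to be the ORDER BY inside OVER: with no PARTITION BY, the whole table is ordered as a single window partition. I tried running the query partitioned by day, and it executed successfully, with this result: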
SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(PARTITION BY DATETIME_TRUNC(time, day) ORDER BY time) AS number
FROM t
| time | number |
|---|---|
| 2021-05-01 00:00:01 | 10 |
| 2021-05-01 05:00:01 | 10 |
| 2021-05-01 23:00:01 | 20 |
| 2021-05-02 00:00:01 | NULL |
| 2021-05-02 01:00:01 | NULL |
| 2021-05-02 05:00:01 | 12 |
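The problem is that the result still contains NULLs (the first rows of a day that come before that day's first non-null value), although about 500 times fewer than the original table. I am not sure whether the problem can be solved by building on this. Is there an efficient way to do it?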
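One direction that might build on the per-day result, as a rough sketch rather than a verified fix: fill within each day as above, reduce the table to one row per day holding that day's last non-null value, forward fill that much smaller per-day table, and join it back to patch the remaining NULLs. The CTE names (`filled_in_day`, `day_last`, `day_carry`) and the `day_key` column are made up for illustration, and the sketch assumes `time` is a TIMESTAMP and that the per-day table is small enough to be windowed without partitioning.

```sql
-- Sketch only: two-stage forward fill on the sample data from the question.
WITH t AS (
  SELECT TIMESTAMP '2021-05-01 00:00:01' AS time, 10 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 05:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 23:00:01' AS time, 20 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 00:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 01:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 05:00:01' AS time, 12 AS number
),
-- Stage 1: forward fill inside each day (the query that already succeeded).
filled_in_day AS (
  SELECT
    time,
    DATE(time) AS day_key,  -- hypothetical helper column
    LAST_VALUE(number IGNORE NULLS)
      OVER (PARTITION BY DATE(time) ORDER BY time) AS number
  FROM t
),
-- Stage 2: one row per day with that day's last non-null value
-- (NULL if the whole day is NULL).
day_last AS (
  SELECT
    DATE(time) AS day_key,
    ARRAY_AGG(number IGNORE NULLS ORDER BY time DESC LIMIT 1)[SAFE_OFFSET(0)]
      AS last_number
  FROM t
  GROUP BY day_key
),
-- Stage 3: for each day, the last non-null value from any earlier day.
-- This unpartitioned window only sees one row per day.
day_carry AS (
  SELECT
    day_key,
    LAST_VALUE(last_number IGNORE NULLS) OVER (
      ORDER BY day_key
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS carried
  FROM day_last
)
-- Patch the leading NULLs of each day with the value carried from earlier days.
-- No global ORDER BY here, since sorting the full result could itself be costly.
SELECT
  f.time,
  COALESCE(f.number, c.carried) AS number
FROM filled_in_day AS f
LEFT JOIN day_carry AS c USING (day_key)
```

On the six sample rows this should reproduce the desired output, with the two NULL rows on 2021-05-02 picking up 20 from the previous day. Whether it stays within memory on the real 2 GB table depends on how the rows are spread across days; if a single day is still too large, a finer key such as the hour could be used in the same way.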
[Comments]:
Tags: sql google-bigquery