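[Posted]: 2018-02-16 07:54:25
[Question]:
This query fails once a month because its BETWEEN conditions become invalid. A BETWEEN is evaluated as value BETWEEN min AND max, so on March 1 my query will fail again: it will evaluate to partition_2 BETWEEN 28 AND 1, which matches nothing. How can I make this query more reliable while still scanning only the partitions it needs?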
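The failure can be reproduced outside Athena. The sketch below imitates the WHERE clause as written: each zero-padded string component (yyyy, mm, dd) of a day must lie between yesterday's and today's value on its own. The helper names are hypothetical, and the check only models the string comparison, not Athena itself. The same breakage hits partition_1 on January 1, when it becomes BETWEEN '12' AND '01'.

```python
from datetime import date, timedelta

# Format of partition_0, partition_1, partition_2 respectively.
FORMATS = ("%Y", "%m", "%d")


def passes_filter(day: date, today: date) -> bool:
    """Mimic the question's filter: every component of `day`, as a
    zero-padded string, must lie between yesterday's and today's value."""
    yesterday = today - timedelta(days=1)
    return all(
        yesterday.strftime(f) <= day.strftime(f) <= today.strftime(f)
        for f in FORMATS
    )


def selected_days(today: date) -> list:
    """Which of the two wanted days (yesterday, today) survive the filter."""
    wanted = [today - timedelta(days=1), today]
    return [d for d in wanted if passes_filter(d, today)]


print(selected_days(date(2018, 2, 11)))  # both days kept
print(selected_days(date(2018, 3, 1)))   # → []
```

On March 1 neither day survives, because '28' <= '01' is false for string (and integer) comparison alike.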
WITH recent_tasks AS (
    SELECT task_id, state, timestamp, partition_0, partition_1, partition_2,
           row_number() OVER (PARTITION BY task_id
                              ORDER BY timestamp DESC) AS rn
    FROM firehose
    WHERE
        "partition_0" BETWEEN to_char(current_date - interval '1' day, 'yyyy') AND to_char(current_date, 'yyyy')
        AND "partition_1" BETWEEN to_char(current_date - interval '1' day, 'mm') AND to_char(current_date, 'mm')
        AND "partition_2" BETWEEN to_char(current_date - interval '1' day, 'dd') AND to_char(current_date, 'dd')
    ORDER BY rn
)
SELECT * FROM recent_tasks
WHERE rn = 1
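For reference, the row_number() OVER (PARTITION BY task_id ORDER BY timestamp DESC) ... WHERE rn = 1 pattern keeps, for each task_id, the single row with the newest timestamp. A plain-Python equivalent, assuming timestamps are ISO-8601 strings (so they sort chronologically); the Row type and the sample values are hypothetical:

```python
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    task_id: str
    state: str
    timestamp: str  # assumed ISO-8601, so string order == time order


def latest_per_task(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep, per task_id, the row that row_number() ... DESC would number 1.
    On equal timestamps this keeps the first row seen; row_number()
    makes no promise about which tied row gets rn = 1."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row.task_id)
        if current is None or row.timestamp > current.timestamp:
            latest[row.task_id] = row
    return latest


rows = [
    Row("a", "queued", "2018-02-28T23:50:00"),
    Row("a", "done", "2018-03-01T00:10:00"),
    Row("b", "running", "2018-02-28T12:00:00"),
]
for task_id, row in latest_per_task(rows).items():
    print(task_id, row.state)
```

Task "a" above changes state across the Feb/Mar boundary, which is exactly the case where the date filter must include both days for the newest row to be found.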
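A few notes:
- The partitions are character (string) values, not integers
- partition_2 is the day-of-month partition
- The goal of the query is to find the latest state of each task_id
- I am using AWS Athena
- The data is stored in S3 in a /yyyy/mm/dd layout, and each day is a new partition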
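Because the partition values are zero-padded strings, the combined yyyy/mm/dd key sorts in the same order as the dates themselves; the rollover only breaks things when each component is compared on its own. A small check of that property, assuming the padding matches the S3 layout described above (partition_key is a hypothetical helper). Whether Athena still prunes partitions when a predicate is written on such a derived key, rather than on the partition columns directly, is not guaranteed and is worth confirming from the amount of data scanned.

```python
from datetime import date, timedelta


def partition_key(d: date) -> str:
    """Hypothetical helper: the zero-padded yyyy/mm/dd segment for a day."""
    return d.strftime("%Y/%m/%d")


# Span both a year boundary and a month boundary (Dec 2017 to Mar 2018).
days = [date(2017, 12, 1) + timedelta(days=i) for i in range(120)]
keys = [partition_key(d) for d in days]

# Sorting the string keys reproduces the chronological order exactly.
print(keys == sorted(keys))

# A single range on the whole key spans the Feb/Mar rollover correctly.
print(partition_key(date(2018, 2, 28)) <= partition_key(date(2018, 3, 1)))
```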
Ideally, my query would handle month transitions correctly:
BETWEEN FEB 10 AND FEB 11 (works with the query above)
BETWEEN FEB 28 AND MAR 1 (fails with the query above)
BETWEEN MAR 1 AND MAR 2 (works with the query above)
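One way to get all three cases right is to drop the ranges and enumerate the wanted days instead, comparing each partition column with = and OR-ing one (yyyy, mm, dd) triple per day. The Python sketch below builds that WHERE fragment for a given date. The function name partition_predicate and the choice to compute the literals on the client side are assumptions; the column names come from the question. Equality on partition columns is the form Athena is most likely to use for pruning, but it is still worth checking the data scanned by the rewritten query.

```python
from datetime import date, timedelta


def partition_predicate(today: date, days: int = 2) -> str:
    """Hypothetical helper: a parenthesised WHERE fragment matching exactly
    the last `days` daily partitions (today included), oldest first."""
    clauses = []
    for offset in range(days - 1, -1, -1):
        d = today - timedelta(days=offset)
        # Zero-padded string literals, because the partition values are strings.
        clauses.append(
            f"(partition_0 = '{d:%Y}' AND partition_1 = '{d:%m}' "
            f"AND partition_2 = '{d:%d}')"
        )
    return "(" + "\n   OR ".join(clauses) + ")"


for today in (date(2018, 2, 11), date(2018, 3, 1), date(2018, 3, 2)):
    print(partition_predicate(today))
```

The returned string would take the place of the three BETWEEN conditions inside the CTE's WHERE clause; the outer parentheses keep it safe to combine with further AND conditions.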
[Discussion]:
- "WITH recent_tasks AS (SELECT task_id, state, timestamp, partition_0, partition_1, partition_2, row_number() OVER (PARTITION BY task_id" is only supported by MySQL 8.0, and MySQL 8.0 is not production-ready yet. Are you sure you are using MySQL?
Tags: presto amazon-athena