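【Question title】: This query fails once per month, how can it be refactored?
【Posted】: 2018-02-16 07:54:25
【Question description】:

This query fails once per month because the BETWEEN clause becomes invalid. BETWEEN expects value BETWEEN min AND max, and on March 1 my query will fail again because the day predicate evaluates to partition_2 BETWEEN 28 AND 1. How can I make this query more reliable while still scanning only the required partitions?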

WITH recent_tasks AS
(SELECT task_id, state, timestamp, partition_0, partition_1, partition_2,
  row_number() OVER (PARTITION BY task_id
               ORDER BY timestamp DESC) AS rn
FROM firehose
WHERE
 "partition_0" BETWEEN to_char(current_date - interval '1' day, 'yyyy') AND to_char(current_date, 'yyyy')
 and "partition_1" BETWEEN to_char(current_date - interval '1' day, 'mm') AND to_char(current_date, 'mm')
 and "partition_2" BETWEEN to_char(current_date - interval '1' day, 'dd') AND to_char(current_date, 'dd')
ORDER BY rn)
SELECT * FROM recent_tasks
WHERE rn=1

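A few notes:

  • The partitions are character values, not integers
  • partition_2 is the day-of-month partition
  • The purpose of the query is to find the latest state of each task_id
  • Using AWS Athena
  • The data is stored in S3 in /yyyy/mm/dd format, with a new partition every day

Ideally, my query would handle the month transition correctly: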

BETWEEN FEB 10 AND FEB 11 (works with above)
BETWEEN FEB 28 AND MAR 1  (fails with above)
BETWEEN MAR 1 AND MAR 2   (works with above)
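
To see why only the day predicate breaks, here is a minimal sketch with the dates written out as literals. It assumes the query runs on 2018-03-01 and that the partition values are zero-padded strings such as '01' and '28' (an assumption based on the /yyyy/mm/dd layout, not something confirmed above).

```sql
-- Assumed run date: 2018-03-01, so "yesterday" is 2018-02-28.
-- Each column mirrors one BETWEEN predicate from the query above.
SELECT
  '2018' BETWEEN '2018' AND '2018' AS year_ok,       -- true
  '02'   BETWEEN '02'   AND '03'   AS month_ok,      -- true
  '28'   BETWEEN '28'   AND '01'   AS day_ok_feb_28, -- false: lower bound > upper bound
  '01'   BETWEEN '28'   AND '01'   AS day_ok_mar_01; -- false: same reason
```

Because the lower bound is greater than the upper bound, no day value can satisfy the last predicate, so both the Feb 28 and the Mar 1 partitions are filtered out.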

【Question comments】:

  • WITH recent_tasks AS (SELECT task_id, state, timestamp, partition_0, partition_1, partition_2, row_number() OVER (PARTITION BY task_id is only supported by MySQL 8.0, and MySQL 8.0 is not production-ready yet. Are you sure you are using MySQL?

Tags: presto amazon-athena


【Solution 1】:

If you want to get zero instead of 28:

cast(to_char(current_date, 'dd') as signed)-1

So on 03/01, to_char(current_date, 'dd') returns 1, and subtracting 1 from it gives you zero:

and "partition_2" BETWEEN to_char(cast(to_char(current_date, 'dd') as signed)-1) AND to_char(current_date, 'dd')
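
Two caveats on the snippet above: cast(... as signed) is MySQL syntax rather than Presto/Athena syntax, and a lower bound of zero still leaves out the last day of the previous month. As an alternative sketch (not the answerer's method), the range can be replaced by equality matches for the two days of interest. It reuses the to_char calls and the table and column names from the question. Whether Athena prunes partitions for this predicate is an assumption worth checking against the amount of data scanned.

```sql
WITH recent_tasks AS (
  SELECT task_id, state, timestamp, partition_0, partition_1, partition_2,
         row_number() OVER (PARTITION BY task_id
                            ORDER BY timestamp DESC) AS rn
  FROM firehose
  WHERE
    -- yesterday's partition: year, month and day all come from the same date
    (    "partition_0" = to_char(current_date - interval '1' day, 'yyyy')
     AND "partition_1" = to_char(current_date - interval '1' day, 'mm')
     AND "partition_2" = to_char(current_date - interval '1' day, 'dd'))
    OR
    -- today's partition
    (    "partition_0" = to_char(current_date, 'yyyy')
     AND "partition_1" = to_char(current_date, 'mm')
     AND "partition_2" = to_char(current_date, 'dd'))
)
SELECT * FROM recent_tasks
WHERE rn = 1
```

Since each branch takes all three parts from a single date, month and year rollovers (Feb 28 to Mar 1, Dec 31 to Jan 1) need no special handling.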

【Comments】:
