[Posted]: 2020-07-01 08:25:14
[Question]:
I have a Redshift UNION query that performs very poorly. The query is as follows:
WITH a1 AS (SELECT
revenue_month,
SUM(revenue) AS revenue,
SUM(cost1) AS cost1,
SUM(cost2) AS cost2,
SUM(cost3) AS cost3
FROM orders1
GROUP BY revenue_month),
a2 AS (SELECT
revenue_month,
SUM(revenue) AS revenue,
SUM(cost1) AS cost1,
SUM(cost2) AS cost2,
SUM(cost3) AS cost3
FROM orders2
GROUP BY revenue_month),
b1 AS (SELECT
revenue_month,
amount_type,
SUM(amount) AS amount
FROM monthly
GROUP BY revenue_month,amount_type)
SELECT 'a1' AS data_set, 'revenue' AS amount_type, a1.revenue AS amount FROM a1 UNION
SELECT 'a1' AS data_set, 'cost1' AS amount_type, a1.cost1 AS amount FROM a1 UNION
SELECT 'a1' AS data_set, 'cost2' AS amount_type, a1.cost2 AS amount FROM a1 UNION
SELECT 'a1' AS data_set, 'cost3' AS amount_type, a1.cost3 AS amount FROM a1 UNION
SELECT 'a2' AS data_set, 'revenue' AS amount_type, a2.revenue AS amount FROM a2 UNION
SELECT 'a2' AS data_set, 'cost1' AS amount_type, a2.cost1 AS amount FROM a2 UNION
SELECT 'a2' AS data_set, 'cost2' AS amount_type, a2.cost2 AS amount FROM a2 UNION
SELECT 'a2' AS data_set, 'cost3' AS amount_type, a2.cost3 AS amount FROM a2 UNION
SELECT 'b1' AS data_set, b1.amount_type, b1.amount FROM b1
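The goal of the UNION part is to reshape a1 and a2 into the same result-set schema as b1, so that I end up with one combined data set.
Run on their own, the a1 and a2 subqueries each take about 60 seconds to return 6000 rows, and b1 takes 5 seconds to return 500 rows. Those run times are acceptable to me, but the "combined" query above runs for as long as 20 minutes.
I think the fetching part is where this query spends too much time. I tried UNION ALL instead, but performance did not improve by much. It would be great if I could somehow reshape a1 and a2 into the b1 schema without using UNION, but I have not been able to do that.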
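For reference, the kind of UNION-free reshape meant here might look like the sketch below, shown for a1 only. It cross joins the aggregated CTE with a four-row list of amount types and picks the matching column with CASE, so a1 is referenced once rather than four times. The `types` CTE and the alias `t` are hypothetical names introduced for illustration. This is a sketch and has not been run against the real tables.

```sql
WITH a1 AS (
    SELECT revenue_month,
           SUM(revenue) AS revenue,
           SUM(cost1)   AS cost1,
           SUM(cost2)   AS cost2,
           SUM(cost3)   AS cost3
    FROM orders1
    GROUP BY revenue_month
),
-- Hypothetical helper: one row per column to unpivot.
-- The UNION ALL here only combines four literals, not table scans.
types AS (
    SELECT 'revenue' AS amount_type
    UNION ALL SELECT 'cost1'
    UNION ALL SELECT 'cost2'
    UNION ALL SELECT 'cost3'
)
SELECT 'a1' AS data_set,
       t.amount_type,
       -- Pick the aggregated column that matches the current type row.
       CASE t.amount_type
           WHEN 'revenue' THEN a1.revenue
           WHEN 'cost1'   THEN a1.cost1
           WHEN 'cost2'   THEN a1.cost2
           WHEN 'cost3'   THEN a1.cost3
       END AS amount
FROM a1
CROSS JOIN types t;
```

Note that this returns one row per month and amount type, which matches UNION ALL rather than UNION: since revenue_month is not in the output, plain UNION would also collapse months that happen to share the same amount. Whether this actually runs faster would need to be checked with EXPLAIN on the real cluster.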
Any help would be greatly appreciated. Thanks.
[Discussion]:
Tags: amazon-redshift