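【Posted】:2021-10-20 03:11:03
【Question】:
I am migrating queries from PySpark to Snowflake, and I would like to know which of options A and B below is better.
To avoid writing redundant filters, I would prefer option B if there is no significant performance difference.
With option B, does the Snowflake query engine optimize the query automatically, so that it behaves internally like option A?
Option A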
With A1 AS (select * from a1 where date='2021-10-20'),
A2 AS (select * from a2 where date='2021-10-20'),
A3 AS (select * from a3 where date='2021-10-20'),
A4 AS (select * from a4 where date='2021-10-20'),
A5 AS (select * from a5 where date='2021-10-20')
SELECT *
FROM final_merged_table
and option B
With A1 AS (select * from a1),
A2 AS (select * from a2),
A3 AS (select * from a3),
A4 AS (select * from a4),
A5 AS (select * from a5)
SELECT *
FROM final_merged_table
WHERE date = '2021-10-20'
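【Comments】:
-
Is it safe to assume that each of your CTEs is supposed to read from the previous table expression, and that "final_merged_table" is supposed to be A5?
-
In the actual code, the CTEs depend on one another in several places; for example, A3 is the result of joining A1 and A2, and A5 is the result of joining A3 and A4. For simplicity, though, you can assume that final_merged_table is the union of all of A1 through A5.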
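For illustration only, the dependency structure described in the comment above might look roughly like the sketch below, written in the option B style with a single filter at the end. The join key `id` and the `c1` to `c5` CTE names are hypothetical (the CTEs are renamed here only to keep them distinct from the table names), and every table is assumed to have a `date` column. Prefixing the statement with Snowflake's `EXPLAIN` returns the query plan without running the query, so the plans for the two options could be compared to see where the date filter ends up relative to the table scans.

```sql
-- Hypothetical sketch: the join key `id` and the CTE names c1..c5 are
-- assumptions for illustration, not taken from the original query.
-- EXPLAIN returns the plan only; the statement itself is not executed.
EXPLAIN USING TEXT
WITH c1 AS (SELECT * FROM a1),
     c2 AS (SELECT * FROM a2),
     -- c3 depends on c1 and c2
     c3 AS (SELECT c1.id, c1.date
            FROM c1
            JOIN c2 ON c1.id = c2.id AND c1.date = c2.date),
     c4 AS (SELECT * FROM a4),
     -- c5 depends on c3 and c4
     c5 AS (SELECT c3.id, c3.date
            FROM c3
            JOIN c4 ON c3.id = c4.id AND c3.date = c4.date)
SELECT *
FROM c5
WHERE date = '2021-10-20';
```

Running the same `EXPLAIN` on an option A style version, with the date filter repeated inside each CTE, and comparing the two outputs is one way to check whether the optimizer treats them alike for a given set of tables.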
Tags: sql snowflake-cloud-data-platform query-engine