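[Posted]: 2016-10-24 12:57:51
[Question]:
Suppose a project uses partitions to structure its data. This concept is purely business-specific and has nothing to do with database partitioning.
Suppose the business logic works like this:
- delete from output_table where partition =
- insert into output_table (select * from input_table where partition = )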
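As a rough sketch, the two steps above might look like the following in Oracle SQL. The column name PARTITION_ID is taken from the options further down; the bind variable `:partition_id` is a hypothetical placeholder for the partition value, which the description leaves unspecified.

```sql
-- Assumption: PARTITION_ID is the business partition column on both tables,
-- and :partition_id is a hypothetical bind variable for the partition to rebuild.

-- Step 1: clear the target partition.
DELETE FROM output_table
 WHERE PARTITION_ID = :partition_id;

-- Step 2: reload it from the source table.
INSERT INTO output_table
SELECT *
  FROM input_table
 WHERE PARTITION_ID = :partition_id;

COMMIT;
```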
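Keeping in mind that everything is structured this way, let's complicate the problem (to get to the actual one).
Suppose I have a potential killer query (the SELECT part) in terms of run time: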
insert into output_table (
    select *
    from input_table
    left outer join additional_table additional_table1
        on input_table.id = additional_table1.id
    left outer join additional_table additional_table2
        on additional_table2.id = additional_table1.parent
    where partition = <partitionX>
)
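Let's optimize this and explore the options. Keep in mind that every table has partitions. Also note that additional_table is joined twice, on different columns, and that the second join is against itself.
Everything uses a WITH clause, but there are several options, and I would like to know why one of them is better.
A. Direct, duplicated queries in the WITH section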
WITH
CACHED_input_table AS (
    SELECT *
    FROM input_table
    WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
    SELECT *
    FROM additional_table
    WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
    SELECT *
    FROM additional_table
    WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
    ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
    ON additional_table1.PARENT_ID = additional_table2.ID
B. Reusing the query in the FROM section
WITH
CACHED_input_table AS (
    SELECT *
    FROM input_table
    WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table AS (
    SELECT *
    FROM additional_table
    WHERE PARTITION_ID = < partition X >
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
    ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
    ON additional_table1.PARENT_ID = additional_table2.ID
C. Reusing the query in the WITH section
WITH
CACHED_input_table AS (
    SELECT *
    FROM input_table
    WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table1 AS (
    SELECT *
    FROM additional_table
    WHERE PARTITION_ID = < partition X >
),
CACHED_additional_table2 AS (
    SELECT *
    FROM CACHED_additional_table1
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table1 additional_table1
    ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table2 additional_table2
    ON additional_table1.PARENT_ID = additional_table2.ID
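Empirically, option A is the fastest. But why? Can someone explain it? (I'm working on Oracle v11.2.)
I know that my optimization around this company-specific notion of partitions may have nothing to do with the general SQL optimization of WITH clauses I'm asking about, but please take it as a real-life example.
Explain plans
Option A (9900 rows in 7 seconds)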
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 1 | HASH JOIN OUTER | | 1 | 1037 | 18540 (8)| 00:00:03 | | |
|* 2 | HASH JOIN OUTER | | 1 | 605 | 9271 (8)| 00:00:02 | | |
| 3 | PARTITION LIST SINGLE| | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 4 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 5 | PARTITION LIST SINGLE| | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 6 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
| 7 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 8 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("additional_table"."PARENT"="additional_table"."ID"(+))
2 - access("input_table"."ID"="additional_table"."ID"(+))
Option B (9900 rows in 10 seconds)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA2_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table1 | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
Option C (9900 rows in 17 seconds)
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA7_C26AF925 | | | | | | |
| 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY |
| 4 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 |
|* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | |
|* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | |
| 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY |
| 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 |
| 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
| 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+))
6 - access("input_table"."ID"="additional_table1"."ID"(+))
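Plans like the three above can be produced with EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. A minimal sketch for option B follows; `:partition_id` is a hypothetical bind variable standing in for the partition value, which is not given here.

```sql
-- Assumption: :partition_id is a hypothetical placeholder for the partition value.
EXPLAIN PLAN FOR
WITH
CACHED_input_table AS (
    SELECT *
    FROM input_table
    WHERE PARTITION_ID = :partition_id
),
CACHED_additional_table AS (
    SELECT *
    FROM additional_table
    WHERE PARTITION_ID = :partition_id
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
    ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
    ON additional_table1.PARENT_ID = additional_table2.ID;

-- Show the plan that was just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A TEMP TABLE TRANSFORMATION step in the output, as in options B and C, indicates that a WITH subquery was written to a temporary table before the joins ran.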
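Edits:
- Added the explain plans
- Edited the base query: there is one input_table and one additional table joined twice, once to input_table and once to itself
- Edited the query for option A: there is one input_table, and additional_table is joined twice, once to input_table and once to a copy of itself (additional_table)
- Edited the query for option B: there is one input_table, and additional_table is joined twice, once to input_table and once to itself, using the same alias (additional_table)
- Edited the query for option C: there is one input_table, and additional_table is joined twice, once to input_table and once to another table created from itself in the WITH section
[Discussion]: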
-
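Changing the table names and aliases does not change the fact that option A reduces the size of the tables being joined, so option A is not equivalent to your baseline, and that may be where the performance gain comes from. It is not due to using WITH, but to the additional join conditions. Please see my suggested answer for more.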
-
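@Used_By_Already I'm optimizing the baseline, so I don't expect them to be identical (I think they end up doing the same thing, or at least I hope so). What I want to compare here, though, are the options.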
-
I think you are trying to understand why "with" is apparently faster. To really work that out, you need to study equivalent queries.
-
Did you reach a conclusion on this?
-
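@Used_By_Already No, not about why the observations are what they are. Both option A and your optimization give good results and clearly show plans with fewer intermediate objects. I also haven't tried the inline hint, because I don't know what it means or how to use it (that needs more research and understanding on my part).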
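For reference, a minimal sketch of what trying the inline hint on option B might look like. INLINE (and its opposite, MATERIALIZE) are undocumented but widely used Oracle hints. The assumption here is that the TEMP TABLE TRANSFORMATION step in plans B and C comes from Oracle materializing a WITH subquery that is referenced more than once, and that the hint asks the optimizer to expand the subquery in place instead. `:partition_id` is a hypothetical bind variable for the partition value.

```sql
-- Assumption: INLINE is an undocumented hint; its behavior may vary by version.
-- :partition_id is a hypothetical placeholder for the partition value.
WITH
CACHED_input_table AS (
    SELECT *
    FROM input_table
    WHERE PARTITION_ID = :partition_id
),
CACHED_additional_table AS (
    -- Ask the optimizer not to materialize this subquery into a temp table.
    SELECT /*+ INLINE */ *
    FROM additional_table
    WHERE PARTITION_ID = :partition_id
)
SELECT *
FROM CACHED_input_table input_table
LEFT OUTER JOIN CACHED_additional_table additional_table1
    ON input_table.ID = additional_table1.ID
LEFT OUTER JOIN CACHED_additional_table additional_table2
    ON additional_table1.PARENT_ID = additional_table2.ID;
```

If the hint takes effect, the plan should no longer show the LOAD AS SELECT into a SYS_TEMP table and should look closer to the plan for option A. Whether it is actually faster would still need to be measured.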
Tags: sql oracle query-optimization