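【Posted】:2014-01-15 15:17:04
【Question】:
I have two tables, Table1 and Table2. The primary index (PI) of both tables is (col1, col2, col3, col4). I join the two tables and group by a set of columns that includes the tables' PI columns. Can someone tell me why the explain plan says "Aggregate Intermediate Results are computed globally" rather than locally? My understanding is that when the GROUP BY columns include all of the PI columns, the aggregate results are computed locally rather than globally.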
select
A.col1
,A.col2
,A.col3
,A.col4
,col5
,col6
,col7
,col8
,col9
,SUM(col10)
,COUNT(col11)
from table1 A
left outer join
table2 B
on A.col1 = B.col1
and A.col2 = B.col2
and A.col3 = B.col3
and A.col4 = B.col4
group by A.col1,A.col2,A.col3,A.col4,col5,col6,col7,col8,col9
Below is the explain plan for the query:
1) First, we lock a distinct DATEBASE_NAME."pseudo table" for read on a
RowHash to prevent global deadlock for DATEBASE_NAME.S.
2) Next, we lock a distinct DATEBASE_NAME."pseudo table" for write on a
RowHash to prevent global deadlock for
DATEBASE_NAME.TARGET_TABLE.
3) We lock a distinct DATEBASE_NAME."pseudo table" for read on a RowHash
to prevent global deadlock for DATEBASE_NAME.E.
4) We lock DATEBASE_NAME.S for read, we lock
DATEBASE_NAME.TARGET_TABLE for write, and we lock
DATEBASE_NAME.E for read.
5) We do an all-AMPs JOIN step from DATEBASE_NAME.S by way of a RowHash
match scan with no residual conditions, which is joined to
DATEBASE_NAME.E by way of a RowHash match scan. DATEBASE_NAME.S and
DATEBASE_NAME.E are left outer joined using a merge join, with
condition(s) used for non-matching on left table ("(NOT
(DATEBASE_NAME.S.col1 IS NULL )) AND ((NOT
(DATEBASE_NAME.S.col2 IS NULL )) AND ((NOT
(DATEBASE_NAME.S.col3 IS NULL )) AND (NOT
(DATEBASE_NAME.S.col4 IS NULL ))))"), with a join condition of (
"(DATEBASE_NAME.S.col1 = DATEBASE_NAME.E.col1) AND
((DATEBASE_NAME.S.col2 = DATEBASE_NAME.E.col2) AND
((DATEBASE_NAME.S.col3 = DATEBASE_NAME.E.col3) AND
(DATEBASE_NAME.S.col4 = DATEBASE_NAME.E.col4 )))"). The input
table DATEBASE_NAME.S will not be cached in memory. The result goes
into Spool 3 (all_amps), which is built locally on the AMPs. The
result spool file will not be cached in memory. The size of Spool
3 is estimated with low confidence to be 675,301,664 rows (
812,387,901,792 bytes). The estimated time for this step is 3
minutes and 37 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
DATEBASE_NAME.S.col1 ,DATEBASE_NAME.S.col2
,DATEBASE_NAME.S.col3 ,DATEBASE_NAME.S.col4
,DATEBASE_NAME.E.col5
,DATEBASE_NAME.S.col6 ,DATEBASE_NAME.S.col7
,DATEBASE_NAME.S.col8 ,DATEBASE_NAME.S.col9). Aggregate
Intermediate Results are computed globally, then placed in Spool 4.
The aggregate spool file will not be cached in memory. The size
of Spool 4 is estimated with low confidence to be 506,476,248 rows
(1,787,354,679,192 bytes). The estimated time for this step is 1
hour and 1 minute.
7) We do an all-AMPs MERGE into DATEBASE_NAME.TARGET_TABLE
from Spool 4 (Last Use). The size is estimated with low
confidence to be 506,476,248 rows. The estimated time for this
step is 33 hours and 12 minutes.
8) We spoil the parser's dictionary cache for the table.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
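
The expectation stated in the question can be illustrated with a toy model. This is only a sketch of the reasoning, not of how the Teradata optimizer decides between local and global aggregation, and it does not explain the plan above. The names `NUM_AMPS`, `amp_for`, `local_aggregate`, and `groups_are_amp_local` are hypothetical, and Python's built-in `hash` stands in for Teradata's row-hash function, which is different. The assumption is that rows are placed on an AMP according to the hash of their PI values, so two rows with the same GROUP BY key (which contains the PI columns) always land on the same AMP.

```python
from collections import defaultdict

NUM_AMPS = 4  # hypothetical AMP count, chosen only for illustration


def amp_for(pi_values):
    # Stand-in for row-hash distribution; the real Teradata hash differs.
    return hash(pi_values) % NUM_AMPS


def local_aggregate(rows, pi_len):
    """Compute SUM per group independently on each simulated AMP.

    Each row is (group_key, value). The first pi_len entries of
    group_key are the PI columns, which decide the owning AMP.
    """
    per_amp = defaultdict(lambda: defaultdict(int))
    for key, value in rows:
        per_amp[amp_for(key[:pi_len])][key] += value
    return per_amp


def groups_are_amp_local(per_amp):
    """Return True if no group key appears on more than one AMP."""
    seen = set()
    for groups in per_amp.values():
        for key in groups:
            if key in seen:
                return False
            seen.add(key)
    return True


# Group key = (col1, col2, col3, col4, col5); the first four are the PI.
rows = [
    ((1, 2, 3, 4, "x"), 10),
    ((1, 2, 3, 4, "x"), 5),
    ((1, 2, 3, 4, "y"), 7),
    ((9, 8, 7, 6, "x"), 1),
]
per_amp = local_aggregate(rows, pi_len=4)
print(groups_are_amp_local(per_amp))
```

In this model every group is complete on a single AMP, so concatenating the per-AMP results already gives the final answer without redistributing rows. Whether the optimizer actually takes advantage of this for a spool produced by a join is a separate matter that the model does not address.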
【Comments】:
Tags: teradata