【Posted】:2017-12-06 20:41:21
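【Question】:
I am facing a performance problem with the query below, in which the same table is self-joined several times. How can I avoid multiple joins on the same table?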
INSERT INTO "TEMP"."TABLE2"
SELECT
T1."PRODUCT_SNO"
,T2."PRODUCT_SNO"
,T3."PRODUCT_SNO"
,T4."PRODUCT_SNO"
,((COUNT(DISTINCT T1."ACCESS_METHOD_ID")(FLOAT)) /
(MAX(T5.GROUP_NUM(FLOAT))))
FROM
"TEMP"."TABLE1" T1
,"TEMP"."TABLE1" T2
,"TEMP"."TABLE1" T3
,"TEMP"."TABLE1" T4
,"TEMP"."_TWM_GROUP_COUNT" T5
WHERE
T1."ACCESS_METHOD_ID" = T2."ACCESS_METHOD_ID"
AND T2."ACCESS_METHOD_ID" = T3."ACCESS_METHOD_ID"
AND T3."ACCESS_METHOD_ID" = T4."ACCESS_METHOD_ID"
AND T1."SUBSCRIPTION_DATE" < T2."SUBSCRIPTION_DATE"
AND T2."SUBSCRIPTION_DATE" < T3."SUBSCRIPTION_DATE"
AND T3."SUBSCRIPTION_DATE" < T4."SUBSCRIPTION_DATE"
GROUP BY 1, 2, 3, 4;
It takes 3 hours to complete. Here is its EXPLAIN output:
1) First, we lock a distinct TEMP."pseudo table" for write on a
RowHash to prevent global deadlock for
TEMP.TABLE2.
2) Next, we lock a distinct TEMP."pseudo table" for read on a
RowHash to prevent global deadlock for TEMP.T5.
3) We lock TEMP.TABLE2 for write, we lock
TEMP.TABLE1 for access, and we lock TEMP.T5 for read.
4) We do an all-AMPs RETRIEVE step from TEMP.T5 by way of an
all-rows scan with no residual conditions into Spool 4 (all_amps),
which is duplicated on all AMPs. The size of Spool 4 is estimated
with high confidence to be 48 rows (816 bytes). The estimated
time for this step is 0.01 seconds.
5) We execute the following steps in parallel.
    1) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of
       an all-rows scan, which is joined to TEMP.T4 by way of an
       all-rows scan with no residual conditions. Spool 4 and
       TEMP.T4 are joined using a product join, with a join
       condition of ("(1=1)"). The result goes into Spool 5
       (all_amps), which is built locally on the AMPs. Then we do a
       SORT to order Spool 5 by the hash code of (
       TEMP.T4.ACCESS_METHOD_ID). The size of Spool 5 is
       estimated with high confidence to be 8,051,801 rows (
       233,502,229 bytes). The estimated time for this step is 1.77
       seconds.
    2) We do an all-AMPs JOIN step from TEMP.T2 by way of a
       RowHash match scan with no residual conditions, which is
       joined to TEMP.T1 by way of a RowHash match scan with no
       residual conditions. TEMP.T2 and TEMP.T1 are joined
       using a merge join, with a join condition of (
       "(TEMP.T1.ACCESS_METHOD_ID = TEMP.T2.ACCESS_METHOD_ID)
       AND (TEMP.T1.SUBSCRIPTION_DATE <
       TEMP.T2.SUBSCRIPTION_DATE)"). The result goes into Spool
       6 (all_amps), which is built locally on the AMPs. The size
       of Spool 6 is estimated with low confidence to be 36,764,681
       rows (1,213,234,473 bytes). The estimated time for this step
       is 4.12 seconds.
6) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a
RowHash match scan, which is joined to TEMP.T3 by way of a
RowHash match scan with no residual conditions. Spool 5 and
TEMP.T3 are joined using a merge join, with a join condition
of ("(TEMP.T3.SUBSCRIPTION_DATE < SUBSCRIPTION_DATE) AND
(TEMP.T3.ACCESS_METHOD_ID = ACCESS_METHOD_ID)"). The result
goes into Spool 7 (all_amps), which is built locally on the AMPs.
The size of Spool 7 is estimated with low confidence to be
36,764,681 rows (1,360,293,197 bytes). The estimated time for
this step is 4.14 seconds.
7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 6 and Spool 7 are joined using a
merge join, with a join condition of ("(SUBSCRIPTION_DATE <
SUBSCRIPTION_DATE) AND ((ACCESS_METHOD_ID = ACCESS_METHOD_ID) AND
((ACCESS_METHOD_ID = ACCESS_METHOD_ID) AND ((ACCESS_METHOD_ID =
ACCESS_METHOD_ID) AND (ACCESS_METHOD_ID = ACCESS_METHOD_ID ))))").
The result goes into Spool 3 (all_amps), which is built locally on
the AMPs. The result spool file will not be cached in memory.
The size of Spool 3 is estimated with low confidence to be
766,489,720 rows (29,893,099,080 bytes). The estimated time for
this step is 1 minute and 21 seconds.
8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
TEMP.T1.PRODUCT_SNO ,TEMP.T2.PRODUCT_SNO
,TEMP.T3.PRODUCT_SNO ,TEMP.T4.PRODUCT_SNO
,TEMP.T1.ACCESS_METHOD_ID). Aggregate Intermediate Results
are computed globally, then placed in Spool 9. The aggregate
spool file will not be cached in memory. The size of Spool 9 is
estimated with low confidence to be 574,867,290 rows (
46,564,250,490 bytes). The estimated time for this step is 6
minutes and 38 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 9 (Last Use) by
way of an all-rows scan , grouping by field1 (
TEMP.T1.PRODUCT_SNO ,TEMP.T2.PRODUCT_SNO
,TEMP.T3.PRODUCT_SNO ,TEMP.T4.PRODUCT_SNO). Aggregate
Intermediate Results are computed globally, then placed in Spool
11. The size of Spool 11 is estimated with low confidence to be
50,625 rows (3,695,625 bytes). The estimated time for this step
is 41.87 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps), which is redistributed
by the hash code of (TEMP.T1.PRODUCT_SNO,
TEMP.T2.PRODUCT_SNO, TEMP.T3.PRODUCT_SNO,
TEMP.T4.PRODUCT_SNO) to all AMPs. Then we do a SORT to order
Spool 1 by row hash. The size of Spool 1 is estimated with low
confidence to be 50,625 rows (1,873,125 bytes). The estimated
time for this step is 0.04 seconds.
11) We do an all-AMPs MERGE into TEMP.TABLE2 from
Spool 1 (Last Use). The size is estimated with low confidence to
be 50,625 rows. The estimated time for this step is 1 second.
12) We spoil the parser's dictionary cache for the table.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
All required statistics have been collected.
【Discussion】:
-
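I suggest you ask a new question that includes sample data, the desired results, and an explanation of what you are trying to do. In the meantime, learn JOIN syntax so that your queries can enter the 21st century.
-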
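The problem is that you are joining the same table multiple times. You are also joining T5 with no condition at all, i.e. a Cartesian join. So if T5 has a significant number of rows, it will certainly be slow.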
-
@GordonLinoff My logic requires me to join like this. Is there any other way to avoid this kind of join?
-
@JeffUK T5 has no matching condition. What are my options?
-
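What is the DDL of these tables? How many AMPs does your system have?
What data is in _TWM_GROUP_COUNT? You are doing something strange: the COUNT(DISTINCT) will always be 1, and the MAX is probably pointless too. Which business problem is this query supposed to solve?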
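To make the comments about JOIN syntax and the unconditioned T5 join concrete, here is a minimal sketch, not a tested Teradata solution. It uses Python's built-in sqlite3 module as a stand-in for Teradata, with invented toy rows, and compares the original comma-join form against an explicit-JOIN form in which T5 is first reduced to a single row holding MAX(GROUP_NUM). The derived-table alias G, the column name MAX_GROUP_NUM, and all sample data are assumptions made for illustration; the Teradata `(FLOAT)` cast is written as `CAST(... AS REAL)` because SQLite does not accept the Teradata form.

```python
import sqlite3

# In-memory stand-ins for "TEMP"."TABLE1" and "TEMP"."_TWM_GROUP_COUNT".
# The rows below are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (
        ACCESS_METHOD_ID  INTEGER,
        PRODUCT_SNO       INTEGER,
        SUBSCRIPTION_DATE TEXT
    );
    CREATE TABLE "_TWM_GROUP_COUNT" (GROUP_NUM INTEGER);
""")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", [
    (1, 10, "2017-01-01"), (1, 20, "2017-02-01"),
    (1, 30, "2017-03-01"), (1, 40, "2017-04-01"),
    (2, 10, "2017-01-05"), (2, 20, "2017-02-05"),
    (2, 30, "2017-03-05"), (2, 40, "2017-04-05"),
    (3, 10, "2017-01-01"), (3, 20, "2017-02-01"),
    (3, 30, "2017-03-01"), (3, 50, "2017-04-01"),
])
conn.executemany('INSERT INTO "_TWM_GROUP_COUNT" VALUES (?)', [(2,), (4,), (3,)])

# The query as posted: comma joins, with T5 joined without any condition.
ORIGINAL_SQL = """
SELECT T1.PRODUCT_SNO, T2.PRODUCT_SNO, T3.PRODUCT_SNO, T4.PRODUCT_SNO,
       CAST(COUNT(DISTINCT T1.ACCESS_METHOD_ID) AS REAL)
           / CAST(MAX(T5.GROUP_NUM) AS REAL)
FROM TABLE1 T1, TABLE1 T2, TABLE1 T3, TABLE1 T4, "_TWM_GROUP_COUNT" T5
WHERE T1.ACCESS_METHOD_ID = T2.ACCESS_METHOD_ID
  AND T2.ACCESS_METHOD_ID = T3.ACCESS_METHOD_ID
  AND T3.ACCESS_METHOD_ID = T4.ACCESS_METHOD_ID
  AND T1.SUBSCRIPTION_DATE < T2.SUBSCRIPTION_DATE
  AND T2.SUBSCRIPTION_DATE < T3.SUBSCRIPTION_DATE
  AND T3.SUBSCRIPTION_DATE < T4.SUBSCRIPTION_DATE
GROUP BY T1.PRODUCT_SNO, T2.PRODUCT_SNO, T3.PRODUCT_SNO, T4.PRODUCT_SNO
ORDER BY 1, 2, 3, 4
"""

# Same logic with explicit JOINs. T5 contributes only MAX(GROUP_NUM), so it is
# collapsed to one row (hypothetical alias G) before the CROSS JOIN.
REWRITTEN_SQL = """
SELECT T1.PRODUCT_SNO, T2.PRODUCT_SNO, T3.PRODUCT_SNO, T4.PRODUCT_SNO,
       CAST(COUNT(DISTINCT T1.ACCESS_METHOD_ID) AS REAL)
           / CAST(MAX(G.MAX_GROUP_NUM) AS REAL)
FROM TABLE1 T1
JOIN TABLE1 T2
  ON T2.ACCESS_METHOD_ID = T1.ACCESS_METHOD_ID
 AND T1.SUBSCRIPTION_DATE < T2.SUBSCRIPTION_DATE
JOIN TABLE1 T3
  ON T3.ACCESS_METHOD_ID = T2.ACCESS_METHOD_ID
 AND T2.SUBSCRIPTION_DATE < T3.SUBSCRIPTION_DATE
JOIN TABLE1 T4
  ON T4.ACCESS_METHOD_ID = T3.ACCESS_METHOD_ID
 AND T3.SUBSCRIPTION_DATE < T4.SUBSCRIPTION_DATE
CROSS JOIN (SELECT MAX(GROUP_NUM) AS MAX_GROUP_NUM
            FROM "_TWM_GROUP_COUNT") G
GROUP BY T1.PRODUCT_SNO, T2.PRODUCT_SNO, T3.PRODUCT_SNO, T4.PRODUCT_SNO
ORDER BY 1, 2, 3, 4
"""

original = conn.execute(ORIGINAL_SQL).fetchall()
rewritten = conn.execute(REWRITTEN_SQL).fetchall()
print(original)
print(rewritten)
```

On this toy data both forms return the same rows. The rewrite keeps the three self-joins, since the ordered four-step sequence still needs them, and only takes the multi-row Cartesian join with T5 out of the query text. Whether that shortens the runtime on Teradata would have to be measured. One difference to keep in mind: if _TWM_GROUP_COUNT were empty, the original would return no rows, while the rewritten form would return rows with a NULL ratio.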
Tags: sql performance teradata