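【Posted】: 2018-03-23 22:25:29
【Question】:
I built a tree structure in a single table using id and parent_id. To query it I use the recursive CTE that PostgreSQL provides, but joining against the recursive result takes a very long time. For example, with only 100 rows in the sadt_lot table, this query takes 8 seconds to return. Does anyone have a better idea?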
WITH RECURSIVE downlots as (
SELECT s1.sadt_lot_id, 0 AS level, s1.sadt_lot_id as root_id
FROM sadt_lot s1
WHERE s1.parent_lot_id IS NULL
UNION
SELECT s2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
FROM sadt_lot s2
INNER JOIN downlots d ON d.sadt_lot_id = s2.parent_lot_id
)
SELECT
"s"."sadt_lot_id",
"s"."name",
concat(lpad(s.sadt_lot_id::TEXT, 3, '0'), '-', to_char(to_timestamp(s.created_at), 'DDMMYY')) AS sadt_lot_code,
"s"."created_at" AS "created_at",
"s"."version" AS "version",
"s"."sadt_lot_status_id",
SUM(procedure_performed.amount_requested) AS procedures_total,
SUM(procedure_performed.total_value) AS procedures_total_value
FROM "sadt_lot" "s"
LEFT JOIN "sadt" ON sadt.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = s.sadt_lot_id)
LEFT JOIN "procedure_auth" ON sadt.procedure_auth_id = procedure_auth.procedure_auth_id
LEFT JOIN "procedure_performed" ON procedure_auth.procedure_auth_id = procedure_performed.procedure_auth_id
WHERE "s"."parent_lot_id" IS NULL
GROUP BY "s"."sadt_lot_id"
ORDER BY "created_at" DESC
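The join condition `sadt.sadt_lot_id = any(SELECT ... WHERE root_id = s.sadt_lot_id)` is a correlated subquery, so it has to be re-evaluated for each candidate pair of rows. One possible alternative, shown here only as a sketch that has not been run against the real data, is to join sadt to downlots with a plain equality and aggregate per root first. It assumes the recursive term should carry `d.root_id` forward (the query above stores the direct parent's id there, which equals the root only in a two-level tree); the CTE name `totals` and the aliases `pa` and `pp` are introduced for illustration.

```sql
WITH RECURSIVE downlots AS (
    -- Anchor: every root lot is its own root.
    SELECT s1.sadt_lot_id, 0 AS level, s1.sadt_lot_id AS root_id
    FROM sadt_lot s1
    WHERE s1.parent_lot_id IS NULL
    UNION
    -- Assumption: propagate the root id instead of the parent id.
    SELECT s2.sadt_lot_id, d.level + 1, d.root_id
    FROM sadt_lot s2
    INNER JOIN downlots d ON d.sadt_lot_id = s2.parent_lot_id
),
totals AS (
    -- Hypothetical helper CTE: one row per root that has procedures.
    SELECT d.root_id,
           SUM(pp.amount_requested) AS procedures_total,
           SUM(pp.total_value)      AS procedures_total_value
    FROM downlots d
    INNER JOIN sadt ON sadt.sadt_lot_id = d.sadt_lot_id
    INNER JOIN procedure_auth pa ON pa.procedure_auth_id = sadt.procedure_auth_id
    INNER JOIN procedure_performed pp ON pp.procedure_auth_id = pa.procedure_auth_id
    GROUP BY d.root_id
)
SELECT s.sadt_lot_id,
       s.name,
       concat(lpad(s.sadt_lot_id::TEXT, 3, '0'), '-',
              to_char(to_timestamp(s.created_at), 'DDMMYY')) AS sadt_lot_code,
       s.created_at,
       s.version,
       s.sadt_lot_status_id,
       t.procedures_total,
       t.procedures_total_value
FROM sadt_lot s
-- Roots without any procedures keep NULL totals, as in the original LEFT JOIN chain.
LEFT JOIN totals t ON t.root_id = s.sadt_lot_id
WHERE s.parent_lot_id IS NULL
ORDER BY s.created_at DESC;
```

Because the outer query no longer aggregates, the GROUP BY on the outer level is not needed, and every join is a simple equality that the planner is free to execute as a hash or merge join.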
Another example, listing all sadt rows grouped by their root sadt_lot:
EXPLAIN ANALYZE WITH RECURSIVE downlots as (
SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
FROM sadt_lot sl1
WHERE sl1.parent_lot_id IS NULL
UNION
SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
FROM sadt_lot sl2
INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN sadt s ON s.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = sl.sadt_lot_id)
WHERE sl.parent_lot_id IS NULL
GROUP BY sl.sadt_lot_id
ORDER BY sl.sadt_lot_id
Query plan:
GroupAggregate  (cost=42.53..15077.74 rows=1 width=36) (actual time=104.090..8436.505 rows=90 loops=1)
  Group Key: sl.sadt_lot_id
  CTE downlots
    ->  Recursive Union  (cost=0.00..42.39 rows=101 width=12) (actual time=0.006..0.104 rows=95 loops=1)
          ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.94 rows=1 width=12) (actual time=0.005..0.019 rows=90 loops=1)
                Filter: (parent_lot_id IS NULL)
                Rows Removed by Filter: 5
          ->  Hash Join  (cost=0.33..3.74 rows=10 width=12) (actual time=0.027..0.028 rows=2 loops=2)
                Hash Cond: (sl2.parent_lot_id = d.sadt_lot_id)
                ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.94 rows=94 width=8) (actual time=0.002..0.008 rows=95 loops=2)
                ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.010..0.010 rows=48 loops=2)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  WorkTable Scan on downlots d  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.004 rows=48 loops=2)
  ->  Nested Loop Left Join  (cost=0.14..15004.14 rows=6242 width=8) (actual time=8.234..8434.229 rows=11345 loops=1)
        Join Filter: (SubPlan 2)
        Rows Removed by Join Filter: 1112125
        ->  Index Only Scan using sadt_lot_sadt_lot_id_parent_lot_id_idx on sadt_lot sl  (cost=0.14..12.86 rows=1 width=4) (actual time=0.011..0.252 rows=90 loops=1)
              Index Cond: (parent_lot_id IS NULL)
              Heap Fetches: 90
        ->  Seq Scan on sadt s  (cost=0.00..635.83 rows=12483 width=8) (actual time=0.002..1.785 rows=12483 loops=90)
        SubPlan 2
          ->  CTE Scan on downlots  (cost=0.00..2.27 rows=1 width=4) (actual time=0.003..0.007 rows=1 loops=1123470)
                Filter: (root_id = sl.sadt_lot_id)
                Rows Removed by Filter: 94
Planning time: 0.203 ms
Execution time: 8436.598 ms
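Reading the plan, almost all of the time sits in the Nested Loop Left Join: each of the 90 root lots is paired with all 12483 rows of sadt, and SubPlan 2 rescans the CTE once per pair (loops=1123470, which is 90 × 12483). The recursive CTE itself finishes in about 0.1 ms. A minimal sketch of one way to avoid the per-pair subplan is to join sadt to downlots by equality and group by the root. It is an illustration under stated assumptions, not a measured fix: the temp tables `demo_lot` and `demo_sadt` and their rows are hypothetical stand-ins for sadt_lot and sadt, the recursive term carries `d.root_id` forward rather than the parent id, and the `FILTER` clause needs PostgreSQL 9.4 or later.

```sql
-- Hypothetical miniature of the question's tables; names and data are illustrative only.
CREATE TEMP TABLE demo_lot (
    sadt_lot_id   integer PRIMARY KEY,
    parent_lot_id integer
);
CREATE TEMP TABLE demo_sadt (
    sadt_id     integer PRIMARY KEY,
    sadt_lot_id integer
);

-- Lots 1 and 2 are roots; 3 is a child of 1; 4 is a child of 3.
INSERT INTO demo_lot VALUES (1, NULL), (2, NULL), (3, 1), (4, 3);
INSERT INTO demo_sadt VALUES (10, 1), (11, 3), (12, 4), (13, 2);

WITH RECURSIVE downlots AS (
    SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id AS root_id
    FROM demo_lot sl1
    WHERE sl1.parent_lot_id IS NULL
    UNION
    -- Assumption: keep the root id for every descendant, at any depth.
    SELECT sl2.sadt_lot_id, d.level + 1, d.root_id
    FROM demo_lot sl2
    INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT d.root_id AS sadt_lot_id,
       -- FILTER drops the NULLs produced by lots that have no sadt rows.
       array_agg(s.sadt_id ORDER BY s.sadt_id)
           FILTER (WHERE s.sadt_id IS NOT NULL) AS sadt_ids
FROM downlots d
LEFT JOIN demo_sadt s ON s.sadt_lot_id = d.sadt_lot_id
GROUP BY d.root_id
ORDER BY d.root_id;
-- sadt_lot_id | sadt_ids
--           1 | {10,11,12}
--           2 | {13}
```

One behavioural difference to be aware of: a root with no sadt rows at all yields NULL here, whereas the original query yields `{NULL}`. Running EXPLAIN ANALYZE on the real tables would be needed to confirm whether the planner actually switches to a hash join.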
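【Comments】:
-
Where is the FROM "s" in the SQL?
-
That was a typo in the question; I have edited it.
-
WITH RECURSIVE ... FROM sadt_lot s1 - why is there no WHERE parent_lot_id IS NULL?
-
The result is the same with it; I already have that condition in the WHERE of the main query.
Tags: postgresql tree common-table-expression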