[Posted]: 2016-12-08 09:14:44
[Question]:
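Summary
In a multi-table join over many rows, adding certain conditions makes the query slower by several orders of magnitude. I have tried many things to speed it up: every type of table join, reordering the joins, reordering the WHERE clause, using subqueries, using CASE statements in the WHERE clause, and so on.
The SQL details follow.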
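Questions
- Why does adding this simple condition make the planner change its execution plan so drastically?
- Is there a way to tell the planner to evaluate a particular condition first, without drastically changing the query or resorting to subqueries (for example, by using WITH)?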
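Note: I am trying to write a generic SQL builder for an API that lets callers specify arbitrary conditions at any point in the graph. The problem is that some of these calls are fast and others are not, because of the way Postgres plans their execution. Optimizations designed specifically for this query will not help me with the larger goal of a generic SQL builder.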
Details
I have a schema that stores vertices and edges in Postgres (a simple graph database):
CREATE TABLE IF NOT EXISTS vertex (type text, id serial, name text, data jsonb, UNIQUE (id));
CREATE INDEX vertex_data_idx ON vertex USING gin (data jsonb_path_ops);
CREATE INDEX vertex_type_idx ON vertex (type);
CREATE INDEX vertex_name_idx ON vertex (name);
CREATE TABLE IF NOT EXISTS edge (src integer REFERENCES vertex (id), dst integer REFERENCES vertex (id));
CREATE INDEX edge_src_idx ON edge (src);
CREATE INDEX edge_dst_idx ON edge (dst);
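The schema stores graphs, one of which looks like this: PLANET --> CONTINENT --> COUNTRY --> REGION
My sample database has 447554 vertices and 3155047 edges, but the relevant data is:
- 5 planets (each related to 5 continents)
- 25 continents (each related to 2500 countries)
- 62500 countries (25% of which are each related to 100 regions; the rest have no region relationships)
- 250000 regions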
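To get a feel for how much fan-out a top-down walk from PLANET has to cover, the figures above can be multiplied out. This is a rough sketch that assumes every relationship listed is exactly one row in edge and ignores all other vertex types in the database; the variable names are illustrative only.

```python
# Cardinalities as stated in the list above.
planets = 5
continents_per_planet = 5
countries_per_continent = 2500
regions_per_country = 100      # only for countries that have regions

# Number of edge rows followed at each level when starting from PLANET.
planet_to_continent = planets * continents_per_planet
continent_to_country = planet_to_continent * countries_per_continent
countries_with_regions = continent_to_country // 4   # 25% of the countries
country_to_region = countries_with_regions * regions_per_country

print(planet_to_continent)    # 25
print(continent_to_country)   # 62500
print(country_to_region)      # 1562500
```

Under these assumptions a traversal that starts at the planets touches well over a million country-to-region edges before the REGION filter can be applied, whereas a traversal that starts at a matching region only follows that region's incoming edges.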
This query, which finds the planets that have Spanish spoken in any given region, is fast:
SELECT DISTINCT
v1.name as name, v1.id as id
FROM vertex v1
LEFT JOIN edge e1 ON (v1.id = e1.src)
LEFT JOIN vertex v2 ON (v2.id = e1.dst)
LEFT JOIN edge e2 ON (v2.id = e2.src)
LEFT JOIN vertex v3 ON (v3.id = e2.dst)
LEFT JOIN edge e3 ON (v3.id = e3.src)
LEFT JOIN vertex v4 ON (v4.id = e3.dst)
WHERE
v4.type = 'REGION' AND
v4.data @> '{"languages":["spanish"]}'::jsonb
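Planning time: 6.289 ms Execution time: 0.744 ms
When I add a condition that has no effect on the result, on an indexed column of the first table in the graph (v1), the query becomes 12,657 times slower: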
SELECT DISTINCT
v1.name as name, v1.id as id
FROM vertex v1
LEFT JOIN edge e1 ON (v1.id = e1.src)
LEFT JOIN vertex v2 ON (v2.id = e1.dst)
LEFT JOIN edge e2 ON (v2.id = e2.src)
LEFT JOIN vertex v3 ON (v3.id = e2.dst)
LEFT JOIN edge e3 ON (v3.id = e3.src)
LEFT JOIN vertex v4 ON (v4.id = e3.dst)
WHERE
v1.type = 'PLANET' AND
v4.type = 'REGION' AND
v4.data @> '{"languages":["spanish"]}'::jsonb
Planning time: 7.664 ms Execution time: 89010.096 ms
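For reference, the WITH form raised in the questions above could be emitted by a builder roughly as follows. This is only a sketch: the function name `cte_first_query` and the CTE name `spanish_regions` are hypothetical, and its effect on the plan has not been measured here. In Postgres versions before 12 a CTE is always materialized and acts as an optimization fence; from version 12 on that behavior has to be requested with MATERIALIZED. The sketch uses inner joins, which return the same rows here because the WHERE condition on v4 already discards null-extended rows.

```python
def cte_first_query(materialized: bool = False) -> str:
    """Build the slow query with the selective v4 filter moved into a CTE.

    Hypothetical helper for illustration; pass materialized=True on
    Postgres 12+ to request the optimization fence explicitly.
    """
    fence = "MATERIALIZED " if materialized else ""
    return f"""
WITH spanish_regions AS {fence}(
    SELECT id FROM vertex
    WHERE type = 'REGION'
      AND data @> '{{"languages":["spanish"]}}'::jsonb
)
SELECT DISTINCT v1.name AS name, v1.id AS id
FROM spanish_regions v4
JOIN edge e3 ON (e3.dst = v4.id)
JOIN vertex v3 ON (v3.id = e3.src)
JOIN edge e2 ON (e2.dst = v3.id)
JOIN vertex v2 ON (v2.id = e2.src)
JOIN edge e1 ON (e1.dst = v2.id)
JOIN vertex v1 ON (v1.id = e1.src)
WHERE v1.type = 'PLANET'
""".strip()


print(cte_first_query(materialized=True))
```

The fence only guarantees that the REGION filter is evaluated inside the CTE; the planner still chooses the join order around it, so the resulting plan would need to be checked with EXPLAIN.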
Here is the EXPLAIN (ANALYZE, BUFFERS) output for the first, fast call:
Unique (cost=154592.03..155453.96 rows=114925 width=28) (actual time=0.585..0.616 rows=4 loops=1)
Buffers: shared hit=92
-> Sort (cost=154592.03..154879.34 rows=114925 width=28) (actual time=0.579..0.588 rows=4 loops=1)
Sort Key: v1.name, v1.id
Sort Method: quicksort Memory: 17kB
Buffers: shared hit=92
-> Nested Loop (cost=37.96..142377.39 rows=114925 width=28) (actual time=0.155..0.549 rows=4 loops=1)
Buffers: shared hit=92
-> Nested Loop (cost=37.53..80131.76 rows=114925 width=4) (actual time=0.141..0.468 rows=4 loops=1)
Join Filter: (v2.id = e1.dst)
Buffers: shared hit=76
-> Nested Loop (cost=37.10..49179.08 rows=14270 width=8) (actual time=0.126..0.386 rows=4 loops=1)
Buffers: shared hit=60
-> Nested Loop (cost=36.68..41450.17 rows=14270 width=4) (actual time=0.112..0.304 rows=4 loops=1)
Join Filter: (v3.id = e2.dst)
Buffers: shared hit=44
-> Nested Loop (cost=36.25..37606.57 rows=1772 width=8) (actual time=0.092..0.209 rows=4 loops=1)
Buffers: shared hit=28
-> Nested Loop (cost=35.83..36646.82 rows=1772 width=4) (actual time=0.074..0.116 rows=4 loops=1)
Buffers: shared hit=12
-> Bitmap Heap Scan on vertex v4 (cost=30.99..1514.00 rows=220 width=4) (actual time=0.039..0.042 rows=1 loops=1)
Recheck Cond: (data @> '{"languages":["spanish"]}'::jsonb)
Filter: (type = 'REGION'::text)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on vertex_data_idx (cost=0.00..30.94 rows=392 width=0) (actual time=0.020..0.020 rows=1 loops=1)
Index Cond: (data @> '{"languages":["spanish"]}'::jsonb)
Buffers: shared hit=4
-> Bitmap Heap Scan on edge e3 (cost=4.84..159.12 rows=57 width=8) (actual time=0.023..0.037 rows=4 loops=1)
Recheck Cond: (dst = v4.id)
Heap Blocks: exact=4
Buffers: shared hit=7
-> Bitmap Index Scan on edge_dst_idx (cost=0.00..4.82 rows=57 width=0) (actual time=0.013..0.013 rows=4 loops=1)
Index Cond: (dst = v4.id)
Buffers: shared hit=3
-> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.008..0.011 rows=1 loops=4)
Index Cond: (id = e3.src)
Heap Fetches: 4
Buffers: shared hit=16
-> Index Scan using edge_dst_idx on edge e2 (cost=0.43..1.46 rows=57 width=8) (actual time=0.008..0.011 rows=1 loops=4)
Index Cond: (dst = e3.src)
Buffers: shared hit=16
-> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.006..0.009 rows=1 loops=4)
Index Cond: (id = e2.src)
Heap Fetches: 4
Buffers: shared hit=16
-> Index Scan using edge_dst_idx on edge e1 (cost=0.43..1.46 rows=57 width=8) (actual time=0.005..0.008 rows=1 loops=4)
Index Cond: (dst = e2.src)
Buffers: shared hit=16
-> Index Scan using vertex_id_key on vertex v1 (cost=0.42..0.53 rows=1 width=28) (actual time=0.006..0.009 rows=1 loops=4)
Index Cond: (id = e1.src)
Buffers: shared hit=16
Planning time: 6.940 ms
Execution time: 0.714 ms
And for the second, slow call:
HashAggregate (cost=592.23..592.24 rows=1 width=28) (actual time=89009.873..89009.885 rows=4 loops=1)
Group Key: v1.name, v1.id
Buffers: shared hit=11644657 read=1240045
-> Nested Loop (cost=2.98..592.22 rows=1 width=28) (actual time=9098.961..89009.833 rows=4 loops=1)
Buffers: shared hit=11644657 read=1240045
-> Nested Loop (cost=2.56..306.89 rows=522 width=32) (actual time=0.424..30066.007 rows=3092522 loops=1)
Buffers: shared hit=454795 read=46267
-> Nested Loop (cost=2.13..86.31 rows=65 width=36) (actual time=0.306..2120.293 rows=62500 loops=1)
Buffers: shared hit=239162 read=12162
-> Nested Loop (cost=1.70..51.10 rows=65 width=32) (actual time=0.261..574.490 rows=62500 loops=1)
Buffers: shared hit=488 read=562
-> Nested Loop (cost=1.27..23.95 rows=8 width=36) (actual time=0.205..1.206 rows=25 loops=1)
Buffers: shared hit=109 read=17
-> Nested Loop (cost=0.85..19.62 rows=8 width=32) (actual time=0.173..0.547 rows=25 loops=1)
Buffers: shared hit=12 read=14
-> Index Scan using vertex_type_idx on vertex v1 (cost=0.42..8.44 rows=1 width=28) (actual time=0.123..0.153 rows=5 loops=1)
Index Cond: (type = 'PLANET'::text)
Buffers: shared hit=2 read=4
-> Index Scan using edge_src_idx on edge e1 (cost=0.43..10.18 rows=100 width=8) (actual time=0.021..0.039 rows=5 loops=5)
Index Cond: (src = v1.id)
Buffers: shared hit=10 read=10
-> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=25)
Index Cond: (id = e1.dst)
Heap Fetches: 25
Buffers: shared hit=97 read=3
-> … (cost=0.43..2.39 rows=100 width=8) (actual time=0.031..8.504 rows=2500 loops=25)
Index Cond: (src = v2.id)
Buffers: shared hit=379 read=545
-> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.010..0.013 rows=1 loops=62500)
Index Cond: (id = e2.dst)
Heap Fetches: 62500
Buffers: shared hit=238674 read=11600
-> Index Scan using edge_src_idx on edge e3 (cost=0.43..2.39 rows=100 width=8) (actual time=0.013..0.163 rows=49 loops=62500)
Index Cond: (src = v3.id)
Buffers: shared hit=215633 read=34105
-> Index Scan using vertex_id_key on vertex v4 (cost=0.42..0.54 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=3092522)
Index Cond: (id = e3.dst)
Filter: ((data @> '{"languages":["spanish"]}'::jsonb) AND (type = 'REGION'::text))
Rows Removed by Filter: 1
Buffers: shared hit=11189862 read=1193778
Planning time: 7.664 ms
Execution time: 89010.096 ms
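One way to read the slow plan is to compare the planner's estimated row count with the actual row count for each node. The sketch below pulls both numbers out of EXPLAIN ANALYZE text with a regular expression; the helper name `row_estimates` is made up for illustration, and the sample lines are copied from the plan above. Note that both figures are per loop, so the loop count is returned as well.

```python
import re

# Matches the "(cost=... rows=N width=W) (actual time=... rows=M loops=L)" part.
NODE = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=(?P<loops>\d+)\)"
)


def row_estimates(plan_line):
    """Return (estimated rows, actual rows, loops) for a plan node, or None."""
    m = NODE.search(plan_line)
    if m is None:
        return None
    return int(m["est"]), int(m["act"]), int(m["loops"])


slow_plan = [
    "-> Nested Loop (cost=2.56..306.89 rows=522 width=32) (actual time=0.424..30066.007 rows=3092522 loops=1)",
    "-> Nested Loop (cost=2.13..86.31 rows=65 width=36) (actual time=0.306..2120.293 rows=62500 loops=1)",
    "-> Index Scan using vertex_type_idx on vertex v1 (cost=0.42..8.44 rows=1 width=28) (actual time=0.123..0.153 rows=5 loops=1)",
]

for line in slow_plan:
    est, act, loops = row_estimates(line)
    print(f"estimated={est} actual={act} loops={loops} ratio={act / est:.0f}x")
```

For the lines sampled here, the estimate for the v1 scan is 1 row against 5 actual rows, and the join that was expected to yield 522 rows actually yields 3092522.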
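[Comments]:
- Remove the LEFT JOINs. They are not needed and only confuse the optimizer.
- The outer join on v4 is useless: because of the WHERE condition, it effectively becomes an inner join.
- Voluntari, how did you get on with the answer below?
- (Downvoted, see above.)
- halfer - The answer below did not help; it seems my question may not have a good answer. I am surprised that you downvoted my detailed post.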
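The point made in the comments, that the LEFT JOIN chain behaves like an inner join once the WHERE clause filters on v4, can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres (an assumption made so the example is self-contained); the five vertices and three edges are invented for illustration, and the jsonb condition is left out because SQLite has no jsonb type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE edge (src INTEGER, dst INTEGER);
INSERT INTO vertex VALUES
  (1, 'PLANET', 'earth'), (2, 'PLANET', 'mars'),
  (3, 'CONTINENT', 'europe'), (4, 'COUNTRY', 'spain'),
  (5, 'REGION', 'madrid');
-- earth --> europe --> spain --> madrid; mars has no edges at all
INSERT INTO edge VALUES (1, 3), (3, 4), (4, 5);
""")

QUERY = """
SELECT DISTINCT v1.name, v1.id
FROM vertex v1
{join} edge e1 ON (v1.id = e1.src)
{join} vertex v2 ON (v2.id = e1.dst)
{join} edge e2 ON (v2.id = e2.src)
{join} vertex v3 ON (v3.id = e2.dst)
{join} edge e3 ON (v3.id = e3.src)
{join} vertex v4 ON (v4.id = e3.dst)
WHERE v1.type = 'PLANET' AND v4.type = 'REGION'
ORDER BY v1.id
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
print(left_rows)   # [('earth', 1)]
print(inner_rows)  # [('earth', 1)]
```

The row for mars is null-extended by the LEFT JOINs, so v4.type is NULL and the WHERE clause removes it, which is why both forms return the same rows.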
Tags: sql database postgresql join query-optimization