[Posted]: 2021-09-08 10:36:35
[Question]:
I am using PostgreSQL 10 with the pg_trgm extension.
Table layout:
    Column    |       Type        | Collation | Nullable |        Default        | Storage  |
--------------+-------------------+-----------+----------+-----------------------+----------+
 id           | integer           |           | not null |                       | plain    |
 reps         | integer           |           |          | 1                     | plain    |
 user         | integer           |           |          |                       | plain    |
 ip           | character varying |           | not null | ''::character varying | extended |
 visittime    | integer           |           |          |                       | plain    |
 domain       | character varying |           |          |                       | extended |
 address      | text              |           |          |                       | extended |
 method       | character varying |           |          |                       | extended |
 mime         | character varying |           |          |                       | extended |
 duration     | integer           |           |          |                       | plain    |
 size         | bigint            |           |          |                       | plain    |
 req_status   | integer           |           |          |                       | plain    |
 http_status  | integer           |           |          |                       | plain    |
 xproxymeta   | integer           |           |          |                       | plain    |
Indexes:
    "http_requests_pkey" PRIMARY KEY, btree (id)
    "http_trgm_idx" gin (address gin_trgm_ops)
    "md_userid_idx" btree ("user" DESC)
    "md_visittime_idx" btree (visittime DESC)
Foreign-key constraints:
    "http_requests_user_fkey" FOREIGN KEY ("user") REFERENCES username_id(id)
Triggers:
    add_occupied_space_record_num AFTER INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE add_occupied_space_record_num_func()
    count_repeated_records BEFORE INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE count_repeated_records_func()
    delete_occupied_space_record_num AFTER DELETE ON http_requests FOR EACH ROW EXECUTE PROCEDURE delete_occupied_space_record_num_func()
Note that there is a GIN trigram index on the address column.
The table currently holds about 10 million records.
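For reference, the trigram index listed in the table layout would correspond to a definition roughly like the following. This is a sketch reconstructed from the "http_trgm_idx" line only; the IF NOT EXISTS guards are an addition so the statements can be re-run safely, not something the original setup is known to have used.

```sql
-- pg_trgm supplies the gin_trgm_ops operator class used below.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on address; it allows LIKE/ILIKE '%...%' searches
-- on that column to be answered through the index.
CREATE INDEX IF NOT EXISTS http_trgm_idx
    ON http_requests USING gin (address gin_trgm_ops);
```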
Now these two queries, identical except for the search pattern, produce very different plans. The first takes 56 ms, the second about 25 seconds.
# explain analyze select * from http_requests where address ilike '%abc%' order by visittime desc limit 10;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..60.93 rows=10 width=209) (actual time=2.862..21.725 rows=10 loops=1)
   ->  Index Scan using md_visittime_idx on http_requests  (cost=0.43..500074.21 rows=82654 width=209) (actual time=2.861..21.719 rows=10 loops=1)
         Filter: (address ~~* '%abc%'::text)
         Rows Removed by Filter: 6663
 Planning time: 0.279 ms
 Execution time: 21.751 ms
(6 rows)
Now the same query, differing only in the search pattern: xyz
# explain analyze select * from http_requests where address ilike '%xyz%' order by visittime desc limit 10;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5246.47..5246.50 rows=10 width=209) (actual time=23367.849..23367.870 rows=10 loops=1)
   ->  Sort  (cost=5246.47..5248.52 rows=818 width=209) (actual time=23367.846..23367.848 rows=10 loops=1)
         Sort Key: visittime DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Bitmap Heap Scan on http_requests  (cost=2090.34..5228.79 rows=818 width=209) (actual time=17.202..23352.607 rows=18926 loops=1)
               Recheck Cond: (address ~~* '%xyz%'::text)
               Heap Blocks: exact=18243
               ->  Bitmap Index Scan on http_trgm_idx  (cost=0.00..2090.14 rows=818 width=0) (actual time=12.342..12.342 rows=18926 loops=1)
                     Index Cond: (address ~~* '%xyz%'::text)
 Planning time: 0.190 ms
 Execution time: 23368.164 ms
(11 rows)
Why are the plans so different, and how can I fix the slow query?
[Comments]:
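One way to read the cost figures in the two plans: under a LIMIT, the planner charges an ordered index scan only for the fraction of rows it expects to fetch, roughly startup + (total - startup) * limit / estimated_rows. The query below is a sketch of that arithmetic using numbers copied from the plans. It assumes the full md_visittime_idx scan would cost about the same (500074.21) for the xyz pattern; that figure is borrowed from the abc plan, since the xyz plan does not show it.

```sql
-- Estimated cost of "walk md_visittime_idx and stop after 10 matches"
-- for different row-count figures for the ILIKE filter.
SELECT label,
       match_rows,
       round(0.43 + (500074.21 - 0.43) * 10 / match_rows, 2) AS limit_index_scan_cost
FROM (VALUES ('abc, planner estimate', 82654),  -- rows=82654 in the first plan
             ('xyz, planner estimate', 818),    -- rows=818 in the second plan
             ('xyz, actual matches',   18926)   -- actual rows=18926 in the second plan
     ) AS t(label, match_rows);
```

The first row reproduces the 60.93 shown on the Limit node of the abc plan. The second comes out near 6113.80, which is above the 5246.50 of the bitmap-and-sort plan, so under this reading the planner preferred the trigram index for xyz. The third row, around 264.66, hints that the choice rests on the estimate of 818 rows being far below the 18926 actually found, although the bitmap plan's own cost would also change with a better estimate.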
-
The queries are completely different. The results have to be sorted before any rows are returned. The second can return any rows it likes.
-
Both have "order by visittime desc". Right?
-
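People are only notified of comments when their own question or answer is commented on, or when you tag their name. So @GordonLinoff will be notified of this comment, but was not notified of yours. (Now that I have tagged him, hopefully he will come back, see your comment, and perhaps reply.)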
-
@MatBailie, sure, thanks.
Tags: sql postgresql database-optimization