【Posted】: 2022-01-20 22:51:10
【Question】:
Django: Performance issues with query sets using m2m
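I asked this question here before but got no answer, so I am reposting it with more detail.
When I use ORDER BY on a Count aggregate value, the index is not used for some reason and the query takes a long time to execute.
The videos_video_tags table has about 1.3 million rows.
The following query takes about 500-800 ms.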
SELECT "videos_tag"."id",
"videos_tag"."name",
COUNT("videos_video_tags"."video_id") AS "count"
FROM "videos_tag"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_tag"."id" = "videos_video_tags"."tag_id")
GROUP BY "videos_tag"."id"
ORDER BY "count" DESC
LIMIT 100;
Removing ORDER BY "count" DESC from this SQL statement brings the time down to about 2-10 ms.
Looking at the execution plan with EXPLAIN shows that the query with ORDER BY does not use an index.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=35198.66..35198.91 rows=100 width=37) (actual time=770.355..770.376 rows=100 loops=1)
  Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
  Buffers: shared hit=6928 read=4311
  -> Sort (cost=35198.66..35212.53 rows=5548 width=37) (actual time=770.354..770.366 rows=100 loops=1)
        Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
        Sort Key: (count(videos_video_tags.video_id)) DESC
        Sort Method: top-N heapsort Memory: 37kB
        Buffers: shared hit=6928 read=4311
        -> HashAggregate (cost=34931.14..34986.62 rows=5548 width=37) (actual time=766.050..768.090 rows=5548 loops=1)
              Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
              Group Key: videos_tag.id
              Batches: 1 Memory Usage: 977kB
              Buffers: shared hit=6928 read=4311
              -> Hash Right Join (cost=221.83..28246.14 rows=1337000 width=45) (actual time=2.840..497.697 rows=1337000 loops=1)
                    Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
                    Inner Unique: true
                    Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
                    Buffers: shared hit=6928 read=4311
                    -> Seq Scan on public.videos_video_tags (cost=0.00..24512.00 rows=1337000 width=32) (actual time=0.008..109.061 rows=1337000 loops=1)
                          Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                          Buffers: shared hit=6831 read=4311
                    -> Hash (cost=152.48..152.48 rows=5548 width=29) (actual time=2.795..2.796 rows=5548 loops=1)
                          Output: videos_tag.id, videos_tag.name
                          Buckets: 8192 Batches: 1 Memory Usage: 399kB
                          Buffers: shared hit=97
                          -> Seq Scan on public.videos_tag (cost=0.00..152.48 rows=5548 width=29) (actual time=0.008..1.048 rows=5548 loops=1)
                                Output: videos_tag.id, videos_tag.name
                                Buffers: shared hit=97
Planning:
  Buffers: shared hit=14
Planning Time: 0.497 ms
Execution Time: 770.812 ms
(32 rows)
Time: 772.336 ms
Without ORDER BY, the plan looks like this:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.71..1689.61 rows=100 width=37) (actual time=0.069..9.664 rows=100 loops=1)
  Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
  Buffers: shared hit=7761
  -> GroupAggregate (cost=0.71..93700.72 rows=5548 width=37) (actual time=0.069..9.647 rows=100 loops=1)
        Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
        Group Key: videos_tag.id
        Buffers: shared hit=7761
        -> Merge Left Join (cost=0.71..86960.24 rows=1337000 width=45) (actual time=0.060..8.222 rows=11375 loops=1)
              Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
              Merge Cond: (videos_tag.id = videos_video_tags.tag_id)
              Buffers: shared hit=7761
              -> Index Scan using videos_tag_pkey on public.videos_tag (cost=0.28..635.50 rows=5548 width=29) (actual time=0.011..0.066 rows=101 loops=1)
                    Output: videos_tag.id, videos_tag.name, videos_tag.is_actress, videos_tag.created_at
                    Buffers: shared hit=102
              -> Index Scan using videos_video_tags_tag_id_2673cfc8 on public.videos_video_tags (cost=0.43..69598.37 rows=1337000 width=32) (actual time=0.012..5.928 rows=11375 loops=1)
                    Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                    Buffers: shared hit=7659
Planning:
  Buffers: shared hit=14
Planning Time: 0.364 ms
Execution Time: 9.734 ms
(21 rows)
Time: 10.639 ms
The indexes all exist as well, so I don't think anything is wrong there.
public | videos_tag_name_key | index | postgres | videos_tag
public | videos_tag_pkey | index | postgres | videos_tag
public | videos_video_tags_pkey | index | postgres | videos_video_tags
public | videos_video_tags_tag_id_2673cfc8 | index | postgres | videos_video_tags
public | videos_video_tags_video_id_8220dbb8 | index | postgres | videos_video_tags
public | videos_video_tags_video_id_tag_id_f8d6ba70_uniq | index | postgres | videos_video_tags
I have spent quite a lot of time on this problem and still cannot solve it. What do you think the cause could be?
【Comments】:
-
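The HashAggregate (... rows=5548 ...) (... rows=5548 ...) line shows that your query produces 5548 result rows. When you add ORDER BY, all of those rows have to be computed and sorted before the top 100 (from LIMIT) can be returned. If you remove ORDER BY, an arbitrary 100 rows are returned instead, which is faster but useless, because you cannot tell whether they are the top 100. -
So what should I do? We are implementing pagination, so we need to sort by count and fetch roughly the top 100.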
-
Maybe a MATERIALIZED VIEW could help, as in this answer: stackoverflow.com/a/12925639/724039
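For illustration, a minimal sketch of what the materialized view suggestion might look like for the query in the question. The view name videos_tag_counts, the column name video_count, and both index names are hypothetical, and how often to refresh is an assumption that depends on how stale the counts are allowed to be:

```sql
-- Hypothetical view: precomputes the per-tag counts once instead of on every request.
CREATE MATERIALIZED VIEW videos_tag_counts AS
SELECT t.id,
       t.name,
       COUNT(vt.video_id) AS video_count
FROM videos_tag AS t
LEFT OUTER JOIN videos_video_tags AS vt ON t.id = vt.tag_id
GROUP BY t.id;

-- A unique index is required before REFRESH ... CONCURRENTLY can be used.
CREATE UNIQUE INDEX videos_tag_counts_id_idx
    ON videos_tag_counts (id);

-- Gives the planner the option of reading rows already ordered by count.
CREATE INDEX videos_tag_counts_video_count_idx
    ON videos_tag_counts (video_count DESC);

-- Paginated read against the precomputed rows.
SELECT id, name, video_count
FROM videos_tag_counts
ORDER BY video_count DESC
LIMIT 100;

-- Run periodically (for example from a scheduled job); counts are stale between refreshes.
REFRESH MATERIALIZED VIEW CONCURRENTLY videos_tag_counts;
```

The refresh still pays for the full aggregation over videos_video_tags, but it does so once per refresh rather than once per page view. On the Django side the view would presumably need an unmanaged model or raw SQL to be queried.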
Tags: sql postgresql