Django: It takes a long time to filter the m2m model from the m2m-connected model by specifying the field values of the m2m model

【Question title】: Django: It takes a long time to filter the m2m model from the m2m-connected model by specifying the field values of the m2m model
【Posted】: 2022-01-22 22:44:05
【Question】:

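The m2m through table has about 1.4 million rows.

The slowdown may be due to the large number of rows, but I am sure I am writing the queryset correctly. What do you think the cause is?

The query takes about 400-1000 ms.

If I filter by pk instead of name, it is not as slow.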

# models.py
import uuid

from django.db import models
from django.utils import timezone


class Tag(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(unique=True, max_length=30)
    created_at = models.DateTimeField(default=timezone.now)


class Video(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    title = models.CharField(max_length=300)
    thumbnail_url = models.URLField(max_length=1000)
    preview_url = models.URLField(max_length=1000, blank=True, null=True)
    embed_url = models.URLField(max_length=1000)
    sources = models.ManyToManyField(Source)  # Source model is not shown in the question
    duration = models.CharField(max_length=6)
    tags = models.ManyToManyField(Tag, blank=True, db_index=True)
    views = models.PositiveIntegerField(default=0, db_index=True)
    is_public = models.BooleanField(default=True)
    published_at = models.DateTimeField(default=timezone.now, db_index=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

# The slow queryset
Video.objects.filter(tags__name='word').only('id').order_by('-published_at')

Issued query

SELECT "videos_video"."id"
FROM "videos_video"
INNER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
INNER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
WHERE "videos_tag"."name" = 'word'
ORDER BY "videos_video"."published_at" DESC;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)

                                                                                                                                       QUERY PLAN                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4225.63..4226.23 rows=241 width=24) (actual time=456.321..473.827 rows=135178 loops=1)
   Output: videos_video.id, videos_video.published_at
   Sort Key: videos_video.published_at DESC
   Sort Method: external merge  Disk: 4504kB
   Buffers: shared hit=540568 read=11368, temp read=563 written=566
   ->  Nested Loop  (cost=20.45..4216.10 rows=241 width=24) (actual time=5.538..398.841 rows=135178 loops=1)
         Output: videos_video.id, videos_video.published_at
         Inner Unique: true
         Buffers: shared hit=540568 read=11368
         ->  Nested Loop  (cost=20.02..4102.13 rows=241 width=16) (actual time=5.513..76.291 rows=135178 loops=1)
               Output: videos_video_tags.video_id
               Buffers: shared hit=2 read=11222
               ->  Index Scan using videos_tag_name_620230b0_like on public.videos_tag  (cost=0.28..8.30 rows=1 width=16) (actual time=0.020..0.022 rows=1 loops=1)
                     Output: videos_tag.id, videos_tag.name, videos_tag.is_actress, videos_tag.created_at
                     Index Cond: ((videos_tag.name)::text = 'word'::text)
                     Buffers: shared hit=1 read=2
               ->  Bitmap Heap Scan on public.videos_video_tags  (cost=19.74..4079.23 rows=1460 width=32) (actual time=5.489..62.122 rows=135178 loops=1)
                     Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                     Recheck Cond: (videos_video_tags.tag_id = videos_tag.id)
                     Heap Blocks: exact=11112
                     Buffers: shared hit=1 read=11220
                     ->  Bitmap Index Scan on videos_video_tags_tag_id_2673cfc8  (cost=0.00..19.38 rows=1460 width=0) (actual time=4.215..4.215 rows=135178 loops=1)
                           Index Cond: (videos_video_tags.tag_id = videos_tag.id)
                           Buffers: shared hit=1 read=108
         ->  Index Scan using videos_video_pkey on public.videos_video  (cost=0.42..0.47 rows=1 width=24) (actual time=0.002..0.002 rows=1 loops=135178)
               Output: videos_video.id, videos_video.title, videos_video.thumbnail_url, videos_video.preview_url, videos_video.embed_url, videos_video.duration, videos_video.views, videos_video.is_public, videos_video.published_at, videos_video.created_at, videos_video.updated_at
               Index Cond: (videos_video.id = videos_video_tags.video_id)
               Buffers: shared hit=540566 read=146
 Planning:
   Buffers: shared hit=33 read=13
 Planning Time: 0.991 ms
 Execution Time: 481.274 ms
(32 rows)

Time: 482.869 ms
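One way to read the plan above is to compare the planner's estimated row counts with the actual ones. The sketch below is only an illustration: it parses two node lines copied from the output, and the helper name `row_estimates` is made up for this example. A large gap between the two numbers suggests the planner expected far fewer rows than it got, which may be worth checking alongside the on-disk sort reported in the plan.

```python
import re

# Two node lines copied verbatim from the EXPLAIN output above.
plan_lines = [
    "Sort  (cost=4225.63..4226.23 rows=241 width=24) "
    "(actual time=456.321..473.827 rows=135178 loops=1)",
    "Bitmap Heap Scan on public.videos_video_tags  "
    "(cost=19.74..4079.23 rows=1460 width=32) "
    "(actual time=5.489..62.122 rows=135178 loops=1)",
]

# First group: estimated rows in the (cost=...) part.
# Second group: actual rows in the (actual ...) part.
ROWS = re.compile(r"\(cost=[^)]*rows=(\d+)[^)]*\) \(actual [^)]*rows=(\d+)")


def row_estimates(line):
    """Return (estimated_rows, actual_rows) for one plan node, or None."""
    match = ROWS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for line in plan_lines:
    estimated, actual = row_estimates(line)
    node = line.split("(")[0].strip()
    print(f"{node}: estimated {estimated}, actual {actual}, "
          f"ratio {actual / estimated:.0f}x")
```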

【Comments】:

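Would something like this be faster? Tag.objects.get(name='word').video_set.order_by('-published_at')
Oh! This works really fast! Thank you very much. Why is it so fast?
Because in the worst case you are querying all three tables, you create a JOIN with 1.4 million rows and then search all of those rows for your result. Because I split the query, you get only one row from the tag table, so the join is much smaller. I also think Postgres can more easily determine which indexes to use when the query only filters and joins on PKs and FKs.
I see, this may be why my other queries through m2m are slow. Thank you very much for your help!
I have posted another question similar to this one, and I hope you can answer it.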

Tags: python sql django postgresql

【Solution 1】:

Does your database have these indexes?

"videos_tag" ("name", "id")
"videos_video_tags" ("tag_id", "video_id")
"videos_video" ("id", "published_at")

If not, try adding them!
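As a rough illustration of what such composite indexes do, here is a minimal sketch that uses Python's built-in sqlite3 module instead of PostgreSQL, so it can be run without a database server. The index names and the simplified table definitions are assumptions made for this example, and PostgreSQL's planner may choose differently on the real data, so the actual effect should be confirmed with EXPLAIN there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the tables in the question.
    CREATE TABLE videos_tag (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE videos_video (id TEXT PRIMARY KEY, published_at TEXT);
    CREATE TABLE videos_video_tags (
        id INTEGER PRIMARY KEY,
        video_id TEXT REFERENCES videos_video (id),
        tag_id TEXT REFERENCES videos_tag (id)
    );
    -- The three suggested indexes; the index names are hypothetical.
    CREATE INDEX idx_tag_name_id ON videos_tag (name, id);
    CREATE INDEX idx_video_tags_tag_video
        ON videos_video_tags (tag_id, video_id);
    CREATE INDEX idx_video_id_published
        ON videos_video (id, published_at);
""")


def plan(sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return [row[3] for row in rows]


# Looking up a tag id by name can be answered from the (name, id) index.
tag_plan = plan("SELECT id FROM videos_tag WHERE name = ?", ("word",))

# Finding the video ids for a tag can be answered from (tag_id, video_id).
link_plan = plan(
    "SELECT video_id FROM videos_video_tags WHERE tag_id = ?", ("t1",)
)

print(tag_plan)
print(link_plan)
```

Because each index contains every column the lookup needs, the database can answer these two steps from the index alone without visiting the table rows.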

【Discussion】:

【Solution 2】:

I solved this using the approach described in Iain Shelvington's comment.

Tag.objects.get(name='word').video_set.order_by('-published_at')

【Discussion】:

But what query does this translate to?
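The SQL was not posted in the thread, so what follows is an assumption about its general shape rather than captured output. The split approach most likely becomes two statements: one that fetches the single tag row by name, and one that joins only the video table and the through table, filtered by that tag's id. The sketch below reproduces that shape with Python's built-in sqlite3 module and a few made-up rows, and checks that it returns the same ids as the original three-table join. In a real project the actual statements can be inspected with `str(queryset.query)` or `django.db.connection.queries`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the tables in the question, with sample rows.
    CREATE TABLE videos_tag (id TEXT PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE videos_video (id TEXT PRIMARY KEY, published_at TEXT);
    CREATE TABLE videos_video_tags (
        id INTEGER PRIMARY KEY,
        video_id TEXT REFERENCES videos_video (id),
        tag_id TEXT REFERENCES videos_tag (id)
    );
    INSERT INTO videos_tag VALUES ('t1', 'word'), ('t2', 'other');
    INSERT INTO videos_video VALUES
        ('v1', '2022-01-01'), ('v2', '2022-01-03'), ('v3', '2022-01-02');
    INSERT INTO videos_video_tags (video_id, tag_id) VALUES
        ('v1', 't1'), ('v2', 't1'), ('v3', 't2');
""")

# Statement 1: roughly what Tag.objects.get(name='word') does.
(tag_id,) = conn.execute(
    "SELECT id FROM videos_tag WHERE name = ?", ("word",)
).fetchone()

# Statement 2: roughly what tag.video_set.order_by('-published_at') does.
# Only two tables are joined, and the filter is on the foreign key.
split_ids = [row[0] for row in conn.execute(
    """
    SELECT v.id
    FROM videos_video AS v
    INNER JOIN videos_video_tags AS vt ON v.id = vt.video_id
    WHERE vt.tag_id = ?
    ORDER BY v.published_at DESC
    """,
    (tag_id,),
)]

# The original single statement from the question, for comparison.
joined_ids = [row[0] for row in conn.execute(
    """
    SELECT v.id
    FROM videos_video AS v
    INNER JOIN videos_video_tags AS vt ON v.id = vt.video_id
    INNER JOIN videos_tag AS t ON vt.tag_id = t.id
    WHERE t.name = ?
    ORDER BY v.published_at DESC
    """,
    ("word",),
)]

print(split_ids, joined_ids)
```

Both forms return the same videos in the same order; the difference is only in how much work the database has to plan and execute per statement.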