Order rows in a many-to-many relationship by count, but fast

【Question title】: Order rows in a many-to-many relationship by count, but fast
【Posted】: 2021-09-13 02:59:24
【Question description】:

My project has a many-to-many relationship between the Image and Tag tables:

tags2images = db.Table("tags2images",
    db.Column("tag_id", db.Integer, db.ForeignKey("tags.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True),
    db.Column("image_id", db.Integer, db.ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True)
)

class Image(db.Model):
    __tablename__ = "images"

    id = db.Column(db.Integer, primary_key=True, autoincrement=False)
    title = db.Column(db.String(1000), nullable=True)

    tags = db.relationship("Tag", secondary=tags2images, back_populates="images", passive_deletes=True)

class Tag(db.Model):
    __tablename__ = "tags"

    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String(250), nullable=False, unique=True)

    images = db.relationship(
        "Image",
        secondary=tags2images,
        back_populates="tags",
        passive_deletes=True
    )

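I want to get a list of tags, ordered by how many images use each one. My images and tags tables contain ~200,000 and ~1,000,000 rows respectively, so it is a decent amount of data.

After a bit of fiddling, I arrived at this monstrosity: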

db.session.query(Tag, func.count(tags2images.c.tag_id).label("total"))\
        .join(tags2images)\
        .group_by(Tag)\
        .order_by(text("total DESC"))\
        .limit(20).all()

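While it does return a list of (Tag, count) tuples the way I want, it takes several seconds, which is not optimal.

I found this very helpful post (Counting relationships in SQLAlchemy), which helped me simplify the above to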

db.session.query(Tag.name, func.count(Tag.id))\
        .join(Tag.images)\
        .group_by(Tag.id)\
        .limit(20).all()

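While this is far faster than my first attempt, the output is obviously no longer sorted. How can I get SQLAlchemy to produce the desired result while keeping the query fast?

【Comments】:

Since your Tag class has only one meaningful attribute, Tag.name, and it is defined as unique=True, you could simply use it as the primary key and drop the id surrogate (autoincrement) key. That way, your aggregate query on the association table would return the Tag.name PK directly, possibly avoiding an unnecessary join on the surrogate key.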
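A minimal sketch of what this comment describes, assuming the association table stores the tag name directly. The column name tag_name, the in-memory SQLite engine, and the sample rows are hypothetical and only there to make the example runnable. The aggregate reads the association table alone and already yields the tag names, so no join to tags is needed.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, func, insert, select)

metadata = MetaData()

# Hypothetical variant of the schema: tags.name is the primary key, no surrogate id.
tags = Table("tags", metadata, Column("name", String(250), primary_key=True))
images = Table(
    "images", metadata,
    Column("id", Integer, primary_key=True, autoincrement=False),
    Column("title", String(1000), nullable=True),
)
tags2images = Table(
    "tags2images", metadata,
    # tag_name is an assumed column name; it replaces tag_id from the question.
    Column("tag_name", String(250),
           ForeignKey("tags.name", ondelete="CASCADE", onupdate="CASCADE"),
           primary_key=True),
    Column("image_id", Integer,
           ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"),
           primary_key=True),
)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)

# Made-up sample data: "cat" is used 3 times, "dog" twice, "bird" once.
with engine.begin() as conn:
    conn.execute(insert(tags), [{"name": n} for n in ("cat", "dog", "bird")])
    conn.execute(insert(images), [{"id": i, "title": None} for i in (1, 2, 3)])
    conn.execute(insert(tags2images), [
        {"tag_name": "cat", "image_id": 1},
        {"tag_name": "cat", "image_id": 2},
        {"tag_name": "cat", "image_id": 3},
        {"tag_name": "dog", "image_id": 1},
        {"tag_name": "dog", "image_id": 2},
        {"tag_name": "bird", "image_id": 1},
    ])

# Count, group and order on the association table only; no join to tags.
total = func.count(tags2images.c.image_id).label("total")
stmt = (
    select(tags2images.c.tag_name, total)
    .group_by(tags2images.c.tag_name)
    .order_by(total.desc())
    .limit(20)
)

with engine.connect() as conn:
    top_tags = [tuple(row) for row in conn.execute(stmt)]

print(top_tags)
```

The trade-off is a wider association table (a string per row instead of an integer), and renaming a tag has to cascade into it, so whether this helps is something to measure on the real data.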

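Tags: python sqlalchemy

【Solution 1】:

This looks like something you will probably need to run EXPLAIN on in psql. I added a composite index on tag_id and image_id via Index('idx_tags2images', 'tag_id', 'image_id'). I am not sure which is better, separate indexes or a composite one. It may also be worth checking whether it is faster to run a limited subquery on just the association table before joining.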

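from sqlalchemy import Column, ForeignKey, Index, Integer, String, Table, func, select
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()
# Session is assumed to be a sessionmaker configured for your database elsewhere.
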
tags2images = Table("tags2images",
                    Base.metadata,
                    Column("id", Integer, primary_key=True),
                    Column("tag_id", Integer, ForeignKey("tags.id", ondelete="CASCADE", onupdate="CASCADE"), index=True),
                    Column("image_id", Integer, ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"), index=True),
                    Index('idx_tags2images', 'tag_id', 'image_id'),
)

class Image(Base):
    __tablename__ = "images"

    id = Column(Integer, primary_key=True)
    title = Column(String(1000), nullable=True)

    tags = relationship("Tag", secondary=tags2images, back_populates="images", passive_deletes=True)

class Tag(Base):
    __tablename__ = "tags"

    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(250), nullable=False, unique=True)

    images = relationship(
        "Image",
        secondary=tags2images,
        back_populates="tags",
        passive_deletes=True
    )

with Session() as session:
    total = func.count(tags2images.c.image_id).label("total")
    # Count, group and order just the association table itself.
    sub = select(
        tags2images.c.tag_id,
        total
    ).group_by(
        tags2images.c.tag_id
    ).order_by(
        total.desc()
    ).limit(20).alias('sub')
    # Now bring in the Tag names with a join
    # we order again but this time only across 20 entries.
    # @NOTE: Subquery will not get tags with image_count == 0
    # since we use INNER join.
    q = session.query(
        Tag,
        sub.c.total
    ).join(
        sub,
        Tag.id == sub.c.tag_id
    ).order_by(sub.c.total.desc())
    for tag, image_count in q.all():
        print(tag.name, image_count)
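
The answer suggests checking the query with EXPLAIN. One possible way to do that from Python is sketched below, assuming SQLAlchemy 1.4 or later. It rebuilds the same subquery-then-join statement with Core tables, renders it to SQL with the parameters inlined, and asks the database for its plan. The in-memory SQLite engine is only a stand-in so the sketch runs anywhere; its planner differs from PostgreSQL's, so the real comparison between index layouts has to be run against the actual database.

```python
from sqlalchemy import (Column, ForeignKey, Index, Integer, MetaData, String,
                        Table, create_engine, func, select, text)

metadata = MetaData()

tags = Table(
    "tags", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(250), nullable=False, unique=True),
)
tags2images = Table(
    "tags2images", metadata,
    Column("id", Integer, primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id"), index=True),
    # The foreign key to images is omitted to keep the sketch short.
    Column("image_id", Integer, index=True),
    Index("idx_tags2images", "tag_id", "image_id"),
)

# Assumption: an in-memory SQLite database stands in for the real server.
engine = create_engine("sqlite://")
metadata.create_all(engine)

# Same shape as the answer: aggregate and limit on the association table first,
# then join the 20 surviving rows to tags.
total = func.count(tags2images.c.image_id).label("total")
sub = (
    select(tags2images.c.tag_id, total)
    .group_by(tags2images.c.tag_id)
    .order_by(total.desc())
    .limit(20)
    .subquery("sub")
)
stmt = (
    select(tags.c.name, sub.c.total)
    .select_from(tags.join(sub, tags.c.id == sub.c.tag_id))
    .order_by(sub.c.total.desc())
)

# Render the statement with its parameters inlined so it can follow EXPLAIN.
sql = str(stmt.compile(engine, compile_kwargs={"literal_binds": True}))

with engine.connect() as conn:
    # SQLite spelling; on PostgreSQL the prefix would be "EXPLAIN " or
    # "EXPLAIN ANALYZE " (the latter actually executes the query).
    result = conn.execute(text("EXPLAIN QUERY PLAN " + sql))
    plan = [tuple(row)[-1] for row in result]

print(sql)
for step in plan:
    print(step)
```

Running this once with only the composite index and once with only the single-column indexes, on the real database, is one way to settle which layout the planner actually prefers.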

【Comments】: