Speeding up SQLAlchemy ORM dynamic relationship slicing with negative indices: answers

【Question title】: Speed up SQLAlchemy ORM dynamic relationship slicing with negative indices
【Posted】: 2020-09-21 13:03:43
【Question】:

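I have the following SQLA models and relationships. I log a measurement for each channel every second, so there are a lot of measurements in the database.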

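from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer
from sqlalchemy.orm import relationship

# Model is the project's declarative base class.
class Channel( Model ) :
    __tablename__   = 'channel'
    id              = Column( Integer, primary_key=True )
    #! --- Relationships ---
    measurements    = relationship( 'Measurement', back_populates='channel', lazy='dynamic' )

class Measurement( Model ) :
    __tablename__   = 'measurement'
    id              = Column( Integer, primary_key=True )
    # Foreign key backing the relationship to Channel.
    channel_id      = Column( Integer, ForeignKey( 'channel.id' ) )
    timestamp       = Column( DateTime, nullable=False )
    value           = Column( Float, nullable=False )
    #! --- Relationships ---
    channel         = relationship( 'Channel', back_populates='measurements', uselist=False )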

If I want the latest measurement, I can get it through the ORM by slicing with a negative index.

channel.measurements[-1]

However, it is very, very slow!

I can filter the relationship query further with .filter(), .order_by() and so on to get what I want, but I like using the ORM (why have it otherwise?).

I noticed that slicing with a positive index is fast (comparable to the explicit SQLA query mentioned above).

channel.measurements[0]

I changed the relationship to keep measurements in reverse order, which seems to work in combination with index zero.

    measurements    = relationship( 'Measurement', back_populates='channel', lazy='dynamic', order_by='Measurement.id.desc()' )

So why is slicing with a negative index so slow?

Is this a bug in SQLAlchemy? I would have thought it would be smart enough to execute the right SQL and fetch only the latest item from the database.

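Is there anything else I need to do so that the measurements stay in their natural order, I can slice with a negative index, and I still get the same speed as the other methods?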

【Comments】:

Tags: orm sqlalchemy slice negative-integer

【Answer 1】:

You haven't specified any ordering, so it has to load all of the objects into a list and then take the last one.

If you add the echo=True parameter (for example to create_engine()), you can see the difference between the queries:

For measurements[0], it selects only one of the measurements matching the channel (LIMIT 1):

SELECT measurement.id AS measurement_id, measurement.ts AS measurement_ts,
  measurement.value AS measurement_value,
  measurement.channel_id AS measurement_channel_id
FROM measurement
WHERE %(param_1)s = measurement.channel_id
 LIMIT %(param_2)s
{'param_1': 6, 'param_2': 1}

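For measurements[-1], it selects all of the measurements matching the channel. You haven't ordered the query, so the database is asked to return the rows in whatever order it chooses (probably by the primary key of measurement, but that is not guaranteed):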

SELECT measurement.id AS measurement_id, measurement.ts AS measurement_ts,  
  measurement.value AS measurement_value,
  measurement.channel_id AS measurement_channel_id
FROM measurement
WHERE %(param_1)s = measurement.channel_id
{'param_1': 6}
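The difference between the two queries can be reproduced without reading the echo log. The following is a minimal sketch, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database; the models are simplified versions of the ones in the question, and `record_statement` and `last_measurement_select` are hypothetical helper names. The "load everything" case is written out explicitly as `list(...)[-1]`, because negative indexing on a query is, as far as I know, deprecated or rejected in newer SQLAlchemy versions.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Channel(Base):
    __tablename__ = "channel"
    id = Column(Integer, primary_key=True)
    measurements = relationship("Measurement", back_populates="channel", lazy="dynamic")


class Measurement(Base):
    __tablename__ = "measurement"
    id = Column(Integer, primary_key=True)
    channel_id = Column(Integer, ForeignKey("channel.id"), nullable=False)
    value = Column(Float, nullable=False)
    channel = relationship("Channel", back_populates="measurements")


# In-memory SQLite (an assumption); pass echo=True here to log the SQL instead.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []  # every SQL string sent to the database is appended here


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


def last_measurement_select():
    """Return the most recently recorded SELECT against the measurement table."""
    return [s for s in statements if "FROM measurement" in s][-1]


session = Session(engine, expire_on_commit=False)
channel = Channel()
session.add(channel)
session.flush()  # assigns channel.id
session.add_all([Measurement(channel_id=channel.id, value=float(i)) for i in range(100)])
session.commit()

first = channel.measurements[0]          # positive index: sliced in SQL
sql_indexed = last_measurement_select()

everything = list(channel.measurements)  # every row is fetched and turned into an object
last = everything[-1]                    # "last" in whatever order the database returned
sql_all = last_measurement_select()

print(sql_indexed)
print(sql_all)
```

Only the first statement carries a LIMIT clause; the second one fetches every row of the channel, which is where the time goes when the table is large.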

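If you only want the latest measurement, select it explicitly and order by the timestamp column (ts in the queries shown here, timestamp in the question's model); you probably also want an index on the channel_id and timestamp columns: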

db.session.query(Measurement)\
    .filter(Measurement.channel_id == channel_id)\
    .order_by(Measurement.ts.desc())\
    .limit(1)\
    .first()
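For the index suggestion, here is a rough sketch of how a composite index could be declared on the model and checked. It assumes SQLAlchemy 1.4 or newer and in-memory SQLite; the index name `ix_measurement_channel_ts` is made up for illustration, and `channel_id` is a plain integer column here to keep the example to a single table. The query plan output is SQLite-specific, so other databases will report it differently.

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Float, Index, Integer, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Measurement(Base):
    __tablename__ = "measurement"
    id = Column(Integer, primary_key=True)
    channel_id = Column(Integer, nullable=False)  # a ForeignKey in the real model
    ts = Column(DateTime, nullable=False)
    value = Column(Float, nullable=False)
    # Composite index: equality match on channel_id, then ordering by ts.
    __table_args__ = (Index("ix_measurement_channel_ts", "channel_id", "ts"),)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)  # also creates the index
session = Session(engine)

start = datetime(2020, 9, 21, 13, 0, 0)
session.add_all([
    Measurement(channel_id=ch, ts=start + timedelta(seconds=i), value=float(i))
    for ch in (1, 2)
    for i in range(50)
])
session.commit()

# Same query as above, without the explicit limit(1).
latest = (
    session.query(Measurement)
    .filter(Measurement.channel_id == 1)
    .order_by(Measurement.ts.desc())
    .first()
)

# Ask SQLite how it would run the equivalent SQL.
plan = session.execute(text(
    "EXPLAIN QUERY PLAN "
    "SELECT id, value FROM measurement "
    "WHERE channel_id = 1 ORDER BY ts DESC LIMIT 1"
)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(latest.ts, latest.value)
print(plan_text)
```

Query.first() applies a limit of one row by itself, so the query in this sketch leaves out limit(1).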

【Comments】:

Thanks @kielni. I specifically asked for an ORM slicing solution rather than explicitly calling all of these functions. Is limit(1) needed if first() is called?

【Answer 2】:

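It seems the answer is that SQLA does not support efficient slicing of queries or relationship collections with negative indices. In fact, the code contains a rather clumsy attempt at it, which is going to be removed from SQLA because it was not carefully thought through.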

https://github.com/sqlalchemy/sqlalchemy/issues/5605

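I solved my problem by implementing a hybrid property that returns the latest measurement, instead of slicing the relationship collection directly.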

    # Needs: from sqlalchemy.ext.hybrid import hybrid_property
    @hybrid_property
    def latest_measurement( self ) -> 'Measurement' :
        """
        Hybrid property that returns the latest Measurement for the channel
        (None if the channel has no measurements).
        """
        measurement = self.measurements.order_by( Measurement.id.desc() ).first()
        return measurement
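To show the property in context, here is a self-contained sketch of the same idea. It assumes SQLAlchemy 1.4 or newer and in-memory SQLite, and the models are trimmed down from the ones in the question. It relies on the autoincrementing id following insertion order, which is the same assumption the property above makes.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, create_engine
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Channel(Base):
    __tablename__ = "channel"
    id = Column(Integer, primary_key=True)
    measurements = relationship("Measurement", back_populates="channel", lazy="dynamic")

    @hybrid_property
    def latest_measurement(self):
        """Newest Measurement of this channel, or None if it has none."""
        # ORDER BY id DESC plus first() lets the database return a single row.
        return self.measurements.order_by(Measurement.id.desc()).first()


class Measurement(Base):
    __tablename__ = "measurement"
    id = Column(Integer, primary_key=True)
    channel_id = Column(Integer, ForeignKey("channel.id"), nullable=False)
    value = Column(Float, nullable=False)
    channel = relationship("Channel", back_populates="measurements")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

channel = Channel()
empty_channel = Channel()
session.add_all([channel, empty_channel])
session.flush()  # assigns the channel ids
session.add_all([Measurement(channel_id=channel.id, value=float(i)) for i in range(10)])
session.commit()

latest = channel.latest_measurement
print(latest.value)
print(empty_channel.latest_measurement)
```

On instances a plain @property would behave the same way; hybrid_property only makes a difference when the attribute is also used in class-level query expressions.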

【Comments】: