[Question title]: Calculate ratios of counts of rows with subqueries
[Posted]: 2021-04-06 11:08:00
[Question description]:

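(I couldn't come up with a better title for this question. Suggestions are welcome.)

(In case the version matters, I'm using SQLAlchemy 1.4.4 and PostgreSQL 13.1.)

I have a table ('test') that holds multiple boolean values for each of several persons. Each value is a test result (passed or failed). I would like to create a query that returns, for each person, the ratio of passes to failures.

That is, for this table: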

 id | person | passed
----+--------+--------
  1 | p1     | t
  2 | p1     | f
  3 | p1     | f
  4 | p2     | t
  5 | p2     | t
  6 | p2     | t
  7 | p2     | t
  8 | p2     | t
  9 | p2     | f
 10 | p2     | f
 11 | p2     | f

the query should return:

person | pass_fail_ratio
-------+-------------------
p1     | 0.5
p2     | 1.6666666666666667
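
For reference, each expected value is simply the number of passes divided by the number of failures for that person. A minimal plain-Python sketch of that arithmetic on the sample rows above (the names `results`, `passes`, `fails`, and `ratios` are illustrative only and are not part of the original code):

```python
from collections import Counter

# Sample rows from the table above as (person, passed) pairs.
results = [
    ("p1", True), ("p1", False), ("p1", False),
    ("p2", True), ("p2", True), ("p2", True), ("p2", True), ("p2", True),
    ("p2", False), ("p2", False), ("p2", False),
]

# Count passes and failures per person.
passes = Counter(person for person, passed in results if passed)
fails = Counter(person for person, passed in results if not passed)

# Ratio of passes to failures; persons without any failure are skipped
# here because the ratio is undefined for them.
ratios = {
    person: passes[person] / fails[person]
    for person in sorted(passes.keys() | fails.keys())
    if fails[person]
}

print(ratios)  # {'p1': 0.5, 'p2': 1.6666666666666667}
```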

Here is the solution I have been able to come up with so far. (A complete MWE is appended at the end.)

results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()

pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()

fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()

pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)

To me this looks overly complicated for something that seems conceptually quite simple. Is there a better solution?


MWE:

# To change database name, modify 'dbname'.

# Expected output:
# ('p1', 0.5)
# ('p2', 1.6666666666666667)

# Lots of constraints and checks omitted for brevity.

# To view generated SQL, uncomment the line containing "echo" below.

import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.types as types

dbname = 'test'


base = orm.declarative_base()


class test(base):
    __tablename__ = 'test'
    id = sa.Column(sa.Integer, primary_key=True)
    person = sa.Column(sa.String)
    passed = sa.Column(types.Boolean)
    pass


engine = sa.create_engine(
    f"postgresql://localhost:5432/{dbname}", future=True
)
base.metadata.drop_all(engine)
base.metadata.create_all(engine)
session = orm.Session(engine)

# Add some data.
session.add(test(person='p1', passed=True))
session.add(test(person='p1', passed=False))
session.add(test(person='p1', passed=False))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.commit()

results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()

pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()

fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()

pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)

# engine.echo = True
with orm.Session(engine) as session:
    res = session.execute(pass_fail_ratio)
    for row in res:
        print(row)

[Question comments]:

    Tags: sql postgresql sqlalchemy


    [Solution 1]:

    This is far too complicated. I would not use subqueries. One method is:

    select person,
           count(*) filter (where passed) * 1.0 / count(*) filter (where not passed)
    from test t
    group by person;
    

    If FILTER is not available, you can express this "the old-fashioned way":

    select person,
           sum( passed::int ) * 1.0 / sum( (not passed)::int )
    from test t
    group by person;
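
The conditional-sum form can be tried without a PostgreSQL server. The sketch below is an assumption-laden stand-in, not the answer's exact query: it uses Python's built-in `sqlite3` module with an in-memory database, stores `passed` as 0/1 integers, and replaces the PostgreSQL-specific `::int` casts with portable `CASE` expressions.

```python
import sqlite3

# Sample rows from the question as (person, passed) pairs.
rows = [
    ("p1", True), ("p1", False), ("p1", False),
    ("p2", True), ("p2", True), ("p2", True), ("p2", True), ("p2", True),
    ("p2", False), ("p2", False), ("p2", False),
]

conn = sqlite3.connect(":memory:")
# SQLite has no boolean type, so `passed` is stored as 0 or 1.
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, person TEXT, passed INTEGER)"
)
conn.executemany(
    "INSERT INTO test (person, passed) VALUES (?, ?)",
    [(person, int(passed)) for person, passed in rows],
)

# One grouped pass over the table: sum a 1 for every pass and a 1 for
# every failure, then divide. The `* 1.0` forces floating-point division.
query = """
    SELECT person,
           SUM(CASE WHEN passed THEN 1 ELSE 0 END) * 1.0
           / SUM(CASE WHEN passed THEN 0 ELSE 1 END) AS pass_fail_ratio
    FROM test
    GROUP BY person
    ORDER BY person
"""
ratios = dict(conn.execute(query).fetchall())
print(ratios)
```

Note that a person with no failures makes the denominator zero. SQLite returns NULL in that case, while PostgreSQL raises a division-by-zero error, so wrapping the denominator in NULLIF(..., 0) may be worth considering there.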
    

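    Note that the pass rate (passes over all attempts) is more commonly used than the ratio of passes to failures. That is simply: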

    select person,
           avg( passed::int ) as pass_ratio
    from test t
    group by person;
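
The two quantities carry the same information: with pass rate r, the pass/fail ratio is r / (1 - r). A small sketch of that conversion, using exact fractions so the arithmetic is easy to check (the helper name `pass_fail_ratio` is hypothetical):

```python
from fractions import Fraction


def pass_fail_ratio(pass_rate):
    """Convert a pass rate (passes / all) into a pass/fail ratio.

    Raises ZeroDivisionError when pass_rate is exactly 1, i.e. when
    there are no failures and the ratio is undefined.
    """
    return pass_rate / (1 - pass_rate)


# Pass rates from the sample data: p1 passed 1 of 3, p2 passed 5 of 8.
p1 = pass_fail_ratio(Fraction(1, 3))
p2 = pass_fail_ratio(Fraction(5, 8))

print(p1, p2)  # 1/2 5/3
```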
    

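    [Comments]:

    • Wow, that's a killer! As someone whose knowledge of SQL is, to put it mildly, not perfect, I had no idea about FILTER.
    • As for the pass/fail versus pass/all ratio, I have my reasons :-)
    • @toomas . . . FILTER is standard SQL, but Postgres is one of the few databases that implement it.
    [Solution 2]:

    I got Gordon Linoff's answer working in SQLAlchemy. Here is my final solution: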

    import sqlalchemy as sa
    pass_fail_ratio_query = sa.select(
            test.person,
            (
                sa.cast(
                    sa.funcfilter(sa.func.count(), test.passed == True),  # noqa
                    sa.Float
                )
                / sa.cast(
                    sa.funcfilter(sa.func.count(), test.passed == False),  # noqa
                    sa.Float
                )
            )
        ).group_by(test.person)
    
