Extremely slow "row_number() over order" query: answers

【Question title】: Extremely slow "row_number() over order" query
【Posted】: 2023-04-09 23:02:01
【Question】:

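I have a users table with the columns (id, xp, ...) and about 1.5 million rows.

I get a user's position on the XP leaderboard with the following query (execution time: 33 seconds):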

EXPLAIN ANALYZE WITH counts AS (
    SELECT DISTINCT
        id,
        ROW_NUMBER() OVER (ORDER BY xp DESC)
    FROM
        users
)
SELECT
    *
FROM
    counts
WHERE
    id = 1;

Subquery Scan on counts  (cost=344492.80..395160.57 rows=7404 width=16) (actual time=30683.244..32174.117 rows=1 loops=1)
  Filter: (counts.id = '1'::bigint)
  Rows Removed by Filter: 1481060
  ->  HashAggregate  (cost=344492.80..376651.79 rows=1480702 width=24) (actual time=30679.440..32034.921 rows=1481061 loops=1)
        Group Key: users.id, row_number() OVER (?)
        Planned Partitions: 64  Batches: 65  Memory Usage: 4369kB  Disk Usage: 125960kB
        ->  WindowAgg  (cost=212155.06..238067.34 rows=1480702 width=24) (actual time=2983.137..20302.548 rows=1481061 loops=1)
              ->  Sort  (cost=212155.06..215856.81 rows=1480702 width=16) (actual time=2983.082..5040.782 rows=1481061 loops=1)
                    Sort Key: users.xp DESC
                    Sort Method: external merge  Disk: 37760kB
                    ->  Seq Scan on users  (cost=0.00..35094.02 rows=1480702 width=16) (actual time=25.467..880.626 rows=1481061 loops=1)
Planning Time: 2.593 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 12.061 ms, Inlining 0.000 ms, Optimization 1.503 ms, Emission 26.086 ms, Total 39.650 ms
Execution Time: 32325.206 ms

My table definition:

CREATE TABLE users
(
    id                    bigint                                        NOT NULL
        CONSTRAINT users_pkey
            PRIMARY KEY,
    xp                    bigint               DEFAULT 0                NOT NULL,
    ...
);
CREATE INDEX user_xp_leaderboard_index
    ON users (xp DESC, id ASC);

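But it is very slow. That isn't surprising, since it sorts the whole table and only then filters it, but I don't know how to improve or optimize this query.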
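To see why the whole table has to be sorted, here is a small sketch that uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL. It assumes SQLite 3.25 or newer, which is needed for window functions, and the five sample rows are made up. ROW_NUMBER() can only number a row once every row ahead of it in xp order is known. The id = 1 filter therefore has to run after the window, and pushing it inside changes the result. The question's DISTINCT is left out because id is the primary key, so each (id, row_number) pair is already unique. In the plan above, that DISTINCT is what produces the HashAggregate that spilled to disk.

```python
import sqlite3

# Hypothetical sample rows (id, xp); assumes SQLite >= 3.25 for window functions.
USERS = [(1, 50), (2, 90), (3, 70), (4, 10), (5, 80)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, xp INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO users (id, xp) VALUES (?, ?)", USERS)

# Filter applied AFTER the window function (as in the question):
# every row has to be numbered before id = 1 can be picked out.
outside = conn.execute("""
    WITH counts AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY xp DESC) AS pos
        FROM users
    )
    SELECT pos FROM counts WHERE id = 1
""").fetchone()[0]

# Filter pushed INSIDE the window query: the window only sees one row,
# so the "position" is always 1, which is wrong.
inside = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY xp DESC) AS pos
    FROM users
    WHERE id = 1
""").fetchone()[0]

print(outside, inside)  # 4 1
```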

I ran SET work_mem TO '1 GB';. It helped a little, but not much.

Any help would be greatly appreciated. Thanks in advance.

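【Comments】:

An index on xp?
Filter: (counts.id = '90076279646212096'::bigint)
I'm new to indexes. Could you explain how I should create the index and change my query accordingly? @Amadan
90076279646212096 is the ID I used in my sample query instead of 1. I'll fix that. Sorry. @wildplasser
That actually helped further. Thank you. @wildplasser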

Tags: sql postgresql query-optimization window-functions

【Solution 1】:

You can write the query like this:

select count(*)
from users u 
where u.xp >= (select u2.xp from users u2 where u2.id = 1);

This can use an index on users(id, xp), and it should eliminate sorting entirely. If the rows are wide and Postgres can use an index-only scan, an index on users(xp) may also help.
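Here is a minimal sketch of the answer's idea, again using Python's sqlite3 module as a stand-in for PostgreSQL, with made-up rows. Instead of numbering every row, it looks up the user's xp by id and counts how many users have at least that much. The helper names position_window and position_count are hypothetical. The index created here only mirrors the answer's suggestion, and SQLite's planner is not PostgreSQL's.

```python
import sqlite3

# Hypothetical sample rows (id, xp); xp values are distinct, so there are no ties.
USERS = [(1, 50), (2, 90), (3, 70), (4, 10), (5, 80)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, xp INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO users (id, xp) VALUES (?, ?)", USERS)
# Index on xp, analogous to one the answer mentions; SQLite may plan differently.
conn.execute("CREATE INDEX users_xp_idx ON users (xp)")

def position_window(user_id):
    """Question's approach: number every row by xp, then filter by id."""
    return conn.execute("""
        SELECT pos FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY xp DESC) AS pos
            FROM users
        ) WHERE id = ?
    """, (user_id,)).fetchone()[0]

def position_count(user_id):
    """Answer's approach: count users whose xp is at least this user's xp."""
    return conn.execute("""
        SELECT count(*)
        FROM users u
        WHERE u.xp >= (SELECT u2.xp FROM users u2 WHERE u2.id = ?)
    """, (user_id,)).fetchone()[0]

# Without ties, both approaches give the same position for every user.
for uid, _ in USERS:
    assert position_window(uid) == position_count(uid)

print(position_count(1))  # 4
```

With ties on xp, the two approaches can differ. ROW_NUMBER() places tied users in an unspecified order, while the >= count returns the last position of the tied group for every member of that group.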

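【Comments】:

Thanks, you're a legend. The query now takes about 100 ms. I do have a couple of questions, though. I have this index: CREATE INDEX user_xp_leaderboard_index ON users (xp DESC, id); 1) Should id be in it? 2) If so, which should come first, xp DESC or id?
I'm asking because the EXPLAIN ANALYZE output says "Partial Aggregate" when it uses the index I created.
@Midorina . . . The answer says which indexes are helpful. (xp, id) does not have the keys in the right order.
In the comments on my post, @wildplasser suggested otherwise. That's why I'm asking.
@Midorina . . . That was for the query in your question. In some cases an index on xp might help this query, but the effect probably won't be large.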
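The leaderboard may need to follow the (xp DESC, id ASC) order of the question's user_xp_leaderboard_index. One option, sketched here and not taken from the answer, is to count the rows that sort at or before the user in that order. That count matches ROW_NUMBER() OVER (ORDER BY xp DESC, id ASC) even when xp values tie. The sample data and helper names are hypothetical, and SQLite again stands in for PostgreSQL. Whether PostgreSQL uses the index efficiently for this OR condition has not been checked here.

```python
import sqlite3

# Hypothetical rows (id, xp) with a tie: ids 3 and 6 both have 70 XP.
USERS = [(1, 50), (2, 90), (3, 70), (4, 10), (5, 80), (6, 70)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, xp INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO users (id, xp) VALUES (?, ?)", USERS)
conn.execute("CREATE INDEX user_xp_leaderboard_index ON users (xp DESC, id ASC)")

def position_window(user_id):
    """Reference: deterministic ROW_NUMBER() with id as the tie-breaker."""
    return conn.execute("""
        SELECT pos FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY xp DESC, id ASC) AS pos
            FROM users
        ) WHERE id = ?
    """, (user_id,)).fetchone()[0]

def position_count(user_id):
    """Count rows that sort at or before the user in (xp DESC, id ASC) order."""
    return conn.execute("""
        SELECT count(*)
        FROM users u, (SELECT xp, id FROM users WHERE id = ?) t
        WHERE u.xp > t.xp OR (u.xp = t.xp AND u.id <= t.id)
    """, (user_id,)).fetchone()[0]

# The tie-broken count agrees with the deterministic ROW_NUMBER() for every user.
for uid, _ in USERS:
    assert position_window(uid) == position_count(uid)

print(position_count(3), position_count(6))  # 3 4
```

With these rows, users 3 and 6 both have 70 XP. The tie-broken count returns 3 and 4 respectively, whereas the answer's >= query would return 4 for both.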