【Question title】:PostgreSQL query slow if empty table in IN clause
【Posted】:2021-09-29 23:42:46
【Question description】:

I have the following SQL:

WITH filtered_users_pre as (
  SELECT value as username,row_number() OVER (partition by value) AS rk
    FROM "user-stats".tag_table
    WHERE _at_timestamp = 1626955200
       AND tag in ('commercial','marketing')
  ),

  filtered_users as (
    SELECT username
    FROM filtered_users_pre
    WHERE rk = 2
  ),

  valid_users as (
    SELECT aa.username, aa.rank, aa.points, aa.version
    FROM "users-results".ai_algo aa
    WHERE aa._at_timestamp = 1626955200
          AND aa.rank_timeframe = '7d'
          AND aa.username IN (SELECT * FROM filtered_users)
    ORDER BY aa.rank ASC
    LIMIT 15
    OFFSET 0
  )
select * from valid_users;

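"user-stats".tag_table is a table with about 60 million rows and proper indexes. "users-results".ai_algo is a table with about 10 million rows and proper indexes.

By proper indexes I mean indexes on all the fields that appear in the WHERE clauses above.

If filtered_users is empty, the query takes 4 seconds to run. If filtered_users has at least one row, it takes 400 ms.

Can anyone explain why? Is there a way to get the same performance (400 ms) when filtered_users is empty? I expected performance to improve as the number of rows in filtered_users decreases, and that is what happens down to 1 row. With 0 rows, it takes 10 times as long.

Of course, the same thing happens if, instead of the IN clause in the WHERE, I put an INNER JOIN between ai_algo and filtered_users.

UPDATE: this is the EXPLAIN (ANALYZE, BUFFERS) output for the query when filtered_users has 0 rows (4-second execution):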

Limit  (cost=14592.13..15870.39 rows=15 width=35) (actual time=3953.945..3953.949 rows=0 loops=1)
  Buffers: shared hit=7456641
  ->  Nested Loop Semi Join  (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=3953.944..3953.947 rows=0 loops=1)
        Join Filter: (aa.username = filtered_users_pre.username)
        Buffers: shared hit=7456641
        ->  Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa  (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.085..3885.547 rows=313611 loops=1)
              Index Cond: (rank_timeframe = '7d'::"valid-users-timeframe")
              Filter: (_at_timestamp = 1626955200)
              Rows Removed by Filter: 7793096
              Buffers: shared hit=7456533
        ->  Materialize  (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.000..0.000 rows=0 loops=313611)
              Buffers: shared hit=108
              ->  Subquery Scan on filtered_users_pre  (cost=14591.56..14672.44 rows=13 width=21) (actual time=3.543..3.545 rows=0 loops=1)
                    Filter: (filtered_users_pre.rk = 2)
                    Rows Removed by Filter: 2415
                    Buffers: shared hit=108
                    ->  WindowAgg  (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.996..3.356 rows=2415 loops=1)
                          Buffers: shared hit=108
                          ->  Sort  (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.990..2.189 rows=2415 loops=1)
                                Sort Key: tag_table_20210722.value
                                Sort Method: quicksort  Memory: 285kB
                                Buffers: shared hit=108
                                ->  Bitmap Heap Scan on tag_table_20210722  (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.612..1.080 rows=2415 loops=1)
                                      Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))
                                      Filter: (_at_timestamp = 1626955200)
                                      Rows Removed by Filter: 2415
                                      Heap Blocks: exact=72
                                      Buffers: shared hit=105
                                      ->  Bitmap Index Scan on tag_table_20210722_tag_idx  (cost=0.00..145.57 rows=5428 width=0) (actual time=0.292..0.292 rows=4830 loops=1)
                                            Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))
                                            Buffers: shared hit=33
Planning Time: 0.914 ms
Execution Time: 3954.035 ms

And this is the output when filtered_users has at least 1 row (300 ms):

Limit  (cost=14592.13..15870.39 rows=15 width=35) (actual time=15.958..300.759 rows=15 loops=1)
  Buffers: shared hit=11042
  ->  Nested Loop Semi Join  (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=15.957..300.752 rows=15 loops=1)
        Join Filter: (aa.username = filtered_users_pre.username)
        Rows Removed by Join Filter: 1544611
        Buffers: shared hit=11042
        ->  Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa  (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.075..10.455 rows=645 loops=1)
              Index Cond: (rank_timeframe = '7d'::"valid-users-timeframe")
              Filter: (_at_timestamp = 1626955200)
              Rows Removed by Filter: 16124
              Buffers: shared hit=10937
        ->  Materialize  (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.003..0.174 rows=2395 loops=645)
              Buffers: shared hit=105
              ->  Subquery Scan on filtered_users_pre  (cost=14591.56..14672.44 rows=13 width=21) (actual time=1.895..3.680 rows=2415 loops=1)
                    Filter: (filtered_users_pre.rk = 1)
                    Buffers: shared hit=105
                    ->  WindowAgg  (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.894..3.334 rows=2415 loops=1)
                          Buffers: shared hit=105
                          ->  Sort  (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.889..2.102 rows=2415 loops=1)
                                Sort Key: tag_table_20210722.value
                                Sort Method: quicksort  Memory: 285kB
                                Buffers: shared hit=105
                                ->  Bitmap Heap Scan on tag_table_20210722  (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.604..1.046 rows=2415 loops=1)
                                      Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))
                                      Filter: (_at_timestamp = 1626955200)
                                      Rows Removed by Filter: 2415
                                      Heap Blocks: exact=72
                                      Buffers: shared hit=105
                                      ->  Bitmap Index Scan on tag_table_20210722_tag_idx  (cost=0.00..145.57 rows=5428 width=0) (actual time=0.287..0.287 rows=4830 loops=1)
                                            Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))
                                            Buffers: shared hit=33
Planning Time: 0.310 ms
Execution Time: 300.954 ms

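【Comments】:

  • Nobody can answer this without seeing the EXPLAIN (ANALYZE, BUFFERS) output. By the way, indexing everything in sight is not proper indexing.
  • I haven't indexed everything; I indexed all the fields involved in the search, which is what I meant. I'll provide the EXPLAIN output shortly.
  • @LaurenzAlbe Following your suggestion, I have updated the question above with the EXPLAIN (ANALYZE, BUFFERS) output.

Tags: sql postgresql query-optimization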


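【Solution 1】:

The problem is that if there are no matching filtered_users, PostgreSQL has to scan all of "users-results".ai_algo without ever finding 15 result rows. If the subquery contains elements, it quickly finds 15 matching "users-results".ai_algo rows and can stop processing. The plans show this: the outer index scan returns 313611 rows in the slow case but only 645 in the fast one.

There is nothing you can do about that, but you can speed up the scan of "users-results".ai_algo. Currently, you have: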

->  Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa
                              ... (actual time=0.085..3885.547 rows=313611 loops=1)
      Index Cond: (rank_timeframe = '7d'::"valid-users-timeframe")
      Filter: (_at_timestamp = 1626955200)
      Rows Removed by Filter: 7793096
      Buffers: shared hit=7456533

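You can see that the index scan is less effective than it could be: it reads 313611 + 7793096 = 8106707 rows from the table and discards all but the 313611 that satisfy the filter condition.

You can do better by creating an index that finds only the result rows directly: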
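
As a quick check of the arithmetic, the row counts from the two plans in the question can be compared directly. The query below is only a sketch that plugs in the numbers reported by EXPLAIN; the column aliases are made up for illustration.

```sql
-- Row counts copied from the two EXPLAIN (ANALYZE, BUFFERS) outputs in the question.
-- The aliases (rows_read_slow, rows_read_fast, pct_kept_slow) are illustrative only.
SELECT
    313611 + 7793096 AS rows_read_slow,  -- empty filtered_users: 8106707 rows fetched
    645 + 16124      AS rows_read_fast,  -- non-empty filtered_users: 16769 rows fetched
    round(100.0 * 313611 / (313611 + 7793096), 2) AS pct_kept_slow;
```

In the slow case fewer than 4% of the fetched rows survive the filter on _at_timestamp, and the scan touches several hundred times more rows than in the fast case, where the LIMIT stops it early.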

CREATE INDEX ON "users-results".ai_algo (rank_timeframe, _at_timestamp);

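Then you can drop the index ai_algo_rank_timeframe_rank_idx, because the new index can do everything the old one could (and more).

【Comments】:

  • Thanks, the index you suggested finally solved the problem (the query now runs in 350 ms, even when it returns 0 rows). Answers like this are a great help in understanding the logic behind it!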
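
A minimal sketch of how the change might be rolled out and checked. The index names ai_algo_tf_ts_idx and ai_algo_tf_ts_rank_idx are hypothetical, chosen here for illustration. The pg_indexes lookup is included because the plans show a per-partition index (ai_algo_202107_rank_timeframe_rank_idx), so the exact name and definition of the old index should be confirmed before dropping anything. The three-column variant is not part of the answer: it rests on the assumption that keeping rank as a trailing column lets ORDER BY aa.rank with LIMIT 15 still be read in index order.

```sql
-- 1. Inspect the existing index definitions (pg_indexes is a standard system view).
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'users-results'
  AND tablename LIKE 'ai_algo%';

-- 2. Create the index from the answer, here with an explicit (hypothetical) name.
CREATE INDEX ai_algo_tf_ts_idx
    ON "users-results".ai_algo (rank_timeframe, _at_timestamp);

-- Optional variant, NOT from the answer (assumption): with equality conditions on
-- the first two columns, a trailing rank column returns rows already sorted by rank.
-- CREATE INDEX ai_algo_tf_ts_rank_idx
--     ON "users-results".ai_algo (rank_timeframe, _at_timestamp, rank);

-- 3. Re-check the scan on its own. Both conditions should now appear under
--    "Index Cond", without a large "Rows Removed by Filter" count.
EXPLAIN (ANALYZE, BUFFERS)
SELECT aa.username, aa.rank, aa.points, aa.version
FROM "users-results".ai_algo aa
WHERE aa._at_timestamp = 1626955200
  AND aa.rank_timeframe = '7d';

-- 4. Only after confirming the new plan and the old index's name in step 1:
-- DROP INDEX "users-results".ai_algo_rank_timeframe_rank_idx;
```

The DROP INDEX is left commented out on purpose, so the old index stays in place until the new plan has been verified.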