【发布时间】:2021-09-29 23:42:46
【问题描述】:
我有以下 SQL
WITH filtered_users_pre as (
SELECT value as username,row_number() OVER (partition by value) AS rk
FROM "user-stats".tag_table
WHERE _at_timestamp = 1626955200
AND tag in ('commercial','marketing')
),
filtered_users as (
SELECT username
FROM filtered_users_pre
WHERE rk = 2
),
valid_users as (
SELECT aa.username, aa.rank, aa.points, aa.version
FROM "users-results".ai_algo aa
WHERE aa._at_timestamp = 1626955200
AND aa.rank_timeframe = '7d'
AND aa.username IN (SELECT * FROM filtered_users)
ORDER BY aa.rank ASC
LIMIT 15
OFFSET 0
)
select * from valid_users;
"user-stats".tag_table 是一个包含大约 6000 万行的表,具有适当的索引。
"users-results".ai_algo 是一个包含大约 1000 万行的表,具有适当的索引。
适当的索引我指的是出现在上面 WHERE 子句中的所有字段。
如果 filtered_users 为空,则查询需要 4 秒才能运行。如果filtered_users至少有一行,则需要400ms。
谁能解释我为什么?有没有办法让查询以相同的性能(400ms)运行,filtered_users 为空?我期待通过减少filtered_users 中的行数来获得更好的性能。这就是最多 1 行发生的情况。当行数为0时,需要10倍的时间。
当然,如果我在 WHERE 中而不是 IN 子句,而是在 ai_algo 和 filtered_users 之间放置一个 INNER JOIN,也会发生同样的情况
更新
这是filtered_users 有0 行(4 秒执行)时的EXPLAIN (ANALYZE, BUFFERS) 输出查询
Limit (cost=14592.13..15870.39 rows=15 width=35) (actual time=3953.945..3953.949 rows=0 loops=1)
Buffers: shared hit=7456641
-> Nested Loop Semi Join (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=3953.944..3953.947 rows=0 loops=1)
Join Filter: (aa.username = filtered_users_pre.username)
Buffers: shared hit=7456641
-> Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.085..3885.547 rows=313611 loops=1)
" Index Cond: (rank_timeframe = '7d'::""valid-users-timeframe"")"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 7793096
Buffers: shared hit=7456533
-> Materialize (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.000..0.000 rows=0 loops=313611)
Buffers: shared hit=108
-> Subquery Scan on filtered_users_pre (cost=14591.56..14672.44 rows=13 width=21) (actual time=3.543..3.545 rows=0 loops=1)
Filter: (filtered_users_pre.rk = 2)
Rows Removed by Filter: 2415
Buffers: shared hit=108
-> WindowAgg (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.996..3.356 rows=2415 loops=1)
Buffers: shared hit=108
-> Sort (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.990..2.189 rows=2415 loops=1)
Sort Key: tag_table_20210722.value
Sort Method: quicksort Memory: 285kB
Buffers: shared hit=108
-> Bitmap Heap Scan on tag_table_20210722 (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.612..1.080 rows=2415 loops=1)
" Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 2415
Heap Blocks: exact=72
Buffers: shared hit=105
-> Bitmap Index Scan on tag_table_20210722_tag_idx (cost=0.00..145.57 rows=5428 width=0) (actual time=0.292..0.292 rows=4830 loops=1)
" Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Buffers: shared hit=33
Planning Time: 0.914 ms
Execution Time: 3954.035 ms
这是当 filters_users 至少有 1 行(300 毫秒)时
Limit (cost=14592.13..15870.39 rows=15 width=35) (actual time=15.958..300.759 rows=15 loops=1)
Buffers: shared hit=11042
-> Nested Loop Semi Join (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=15.957..300.752 rows=15 loops=1)
Join Filter: (aa.username = filtered_users_pre.username)
Rows Removed by Join Filter: 1544611
Buffers: shared hit=11042
-> Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.075..10.455 rows=645 loops=1)
" Index Cond: (rank_timeframe = '7d'::""valid-users-timeframe"")"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 16124
Buffers: shared hit=10937
-> Materialize (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.003..0.174 rows=2395 loops=645)
Buffers: shared hit=105
-> Subquery Scan on filtered_users_pre (cost=14591.56..14672.44 rows=13 width=21) (actual time=1.895..3.680 rows=2415 loops=1)
Filter: (filtered_users_pre.rk = 1)
Buffers: shared hit=105
-> WindowAgg (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.894..3.334 rows=2415 loops=1)
Buffers: shared hit=105
-> Sort (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.889..2.102 rows=2415 loops=1)
Sort Key: tag_table_20210722.value
Sort Method: quicksort Memory: 285kB
Buffers: shared hit=105
-> Bitmap Heap Scan on tag_table_20210722 (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.604..1.046 rows=2415 loops=1)
" Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 2415
Heap Blocks: exact=72
Buffers: shared hit=105
-> Bitmap Index Scan on tag_table_20210722_tag_idx (cost=0.00..145.57 rows=5428 width=0) (actual time=0.287..0.287 rows=4830 loops=1)
" Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Buffers: shared hit=33
Planning Time: 0.310 ms
Execution Time: 300.954 ms
【问题讨论】:
-
没有看到
EXPLAIN (ANALYZE, BUFFERS)输出,没有人可以回答这个问题。顺便说一句,对眼前的一切进行 iindexing 是不是正确的索引。 -
我还没有索引所有内容,我索引了搜索中涉及的所有字段,这就是我的意思。将很快提供解释
-
@LaurenzAlbe 遵循您的建议,上面的答案更新为
EXPLAIN (ANALYZE, BUFFERS)
标签: sql postgresql query-optimization