[Posted]: 2020-09-01 02:47:56
[Question]:
I have a feeling I'm doing something wrong, but I can't seem to figure out what.
I am trying to run the following query:
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;
I also have an index on day, month, year, which I created with the following command:
CREATE INDEX CONCURRENTLY testtable_dmy_idx on test_table (day, month, year);
I then discovered the setting that turns sequential scans on and off, and tried the query both ways.
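As a rough sketch of that experiment, the toggle can be scoped to a single transaction with SET LOCAL so that it does not leak into the rest of the session. The query is the one from the start of the post; wrapping it in a transaction that is rolled back is an assumption about how one might run the test, not something described in the post.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of the current transaction.
-- Note that enable_seqscan = off does not forbid sequential scans;
-- it only makes the planner treat them as very expensive.
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;

-- The setting reverts to its previous value here.
ROLLBACK;
```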
When I run the original query with SET enable_seqscan TO on; (which is the default, by the way) under EXPLAIN (analyze,buffers,timing), I get the following output:
-- Select Query with Sequential scan on
QUERY PLAN
Finalize GroupAggregate (cost=9733303.39..10836008.34 rows=5102790 width=89) (actual time=1100914.091..1110820.480 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=25020 read=2793049 dirtied=10040, temp read=74932 written=75039
  I/O Timings: read=1059425.134
  -> Gather Merge (cost=9733303.39..10607468.38 rows=6454984 width=89) (actual time=1100911.426..1110193.876 rows=795097 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
        I/O Timings: read=3178066.529
        -> Partial GroupAggregate (cost=9732303.36..9861403.04 rows=3227492 width=89) (actual time=1100791.915..1107668.495 rows=265032 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
              I/O Timings: read=3178066.529
              -> Sort (cost=9732303.36..9740372.09 rows=3227492 width=81) (actual time=1100788.479..1105630.411 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge Disk: 241320kB
                    Worker 0: Sort Method: external merge Disk: 246776kB
                    Worker 1: Sort Method: external merge Disk: 246336kB
                    Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
                    I/O Timings: read=3178066.529
                    -> Parallel Seq Scan on test_table (cost=0.00..9074497.49 rows=3227492 width=81) (actual time=656277.982..1073808.146 rows=2630708 loops=3)
                          Filter: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Filter: 24027044
                          Buffers: shared hit=76855 read=8416561 dirtied=33686
                          I/O Timings: read=3178066.180
Planning Time: 4.017 ms
Execution Time: 1111033.041 ms
Total time - Around 18 minutes
Then, when I SET enable_seqscan TO off; and run the same query with EXPLAIN, I get the following:
-- Select Query with Sequential scan off
QUERY PLAN
Finalize GroupAggregate (cost=10413126.05..11515831.01 rows=5102790 width=89) (actual time=59211.363..66579.750 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=104091, temp read=77942 written=78052
  I/O Timings: read=28662.857
  -> Gather Merge (cost=10413126.05..11287291.05 rows=6454984 width=89) (actual time=59211.262..65973.857 rows=795178 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=33 read=218096, temp read=230092 written=230418
        I/O Timings: read=51560.508
        -> Partial GroupAggregate (cost=10412126.03..10541225.71 rows=3227492 width=89) (actual time=57013.922..62453.555 rows=265059 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=33 read=218096, temp read=230092 written=230418
              I/O Timings: read=51560.508
              -> Sort (cost=10412126.03..10420194.76 rows=3227492 width=81) (actual time=57013.423..60368.530 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge Disk: 246944kB
                    Worker 0: Sort Method: external merge Disk: 246120kB
                    Worker 1: Sort Method: external merge Disk: 241408kB
                    Buffers: shared hit=33 read=218096, temp read=230092 written=230418
                    I/O Timings: read=51560.508
                    -> Parallel Bitmap Heap Scan on test_table (cost=527733.84..9754320.16 rows=3227492 width=81) (actual time=18155.864..30957.312 rows=2630708 loops=3)
                          Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Index Recheck: 1423
                          Heap Blocks: exact=13374 lossy=44328
                          Buffers: shared hit=3 read=218096
                          I/O Timings: read=51560.508
                          -> Bitmap Index Scan on testtable_dmy_idx (cost=0.00..525797.34 rows=7745982 width=0) (actual time=18148.218..18148.228 rows=7892123 loops=1)
                                Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                                Buffers: shared hit=3 read=46389
                                I/O Timings: read=17368.250
Planning Time: 2.787 ms
Execution Time: 66783.481 ms
Total Time - Around 1 min
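To put the two Buffers lines in perspective, the page counts can be converted to bytes. This sketch takes the numbers copied from the plans above (read=8416561 for the Parallel Seq Scan, read=218096 for the Parallel Bitmap Heap Scan including its index pages) and multiplies them by the server's own block_size. With the usual 8 kB block size that works out to roughly 64 GB read by the sequential scan against less than 2 GB for the bitmap scan.

```sql
-- Page counts are taken verbatim from the two EXPLAIN outputs above.
-- current_setting('block_size') returns the page size in bytes as text,
-- so it is cast to bigint before multiplying.
SELECT
    pg_size_pretty(8416561 * current_setting('block_size')::bigint) AS seq_scan_read,
    pg_size_pretty(218096 * current_setting('block_size')::bigint) AS bitmap_scan_read;
```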
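I don't understand why I'm seeing this behavior or what I'm doing wrong: I expected Postgres to optimize the query automatically, but that is not happening.
Any help would be greatly appreciated.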
Edit 1:
More information on the RDS Postgres version:
SELECT version();
PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
Edit 2:
Running with SET max_parallel_workers_per_gather TO 0 (the default was 2, as shown by SHOW max_parallel_workers_per_gather):
-- Select Query with Sequential scan ON
QUERY PLAN
GroupAggregate (cost=11515667.22..11799074.58 rows=5102790 width=89) (actual time=1120868.377..1133231.165 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
  I/O Timings: read=1041723.362
  -> Sort (cost=11515667.22..11535032.17 rows=7745982 width=81) (actual time=1120865.304..1129419.809 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge Disk: 734304kB
        Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
        I/O Timings: read=1041723.362
        -> Seq Scan on test_table (cost=0.00..9888011.58 rows=7745982 width=81) (actual time=663266.269..1070560.993 rows=7892123 loops=1)
              Filter: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Filter: 72081131
              Buffers: shared hit=92450 read=8400966
              I/O Timings: read=1041723.362
Planning Time: 5.829 ms
Execution Time: 1133422.968 ms
Total Time - Around 18 mins
Followed by:
-- Select Query with Sequential scan OFF
QUERY PLAN
GroupAggregate (cost=12190966.21..12474373.57 rows=5102790 width=89) (actual time=109048.306..119255.079 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=218096, temp read=295993 written=296321
  I/O Timings: read=55697.723
  -> Sort (cost=12190966.21..12210331.17 rows=7745982 width=81) (actual time=109047.621..115468.268 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge Disk: 734304kB
        Buffers: shared hit=3 read=218096, temp read=295993 written=296321
        I/O Timings: read=55697.723
        -> Bitmap Heap Scan on test_table (cost=527733.84..10563310.57 rows=7745982 width=81) (actual time=16941.764..62203.367 rows=7892123 loops=1)
              Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Index Recheck: 4270
              Heap Blocks: exact=39970 lossy=131737
              Buffers: shared hit=3 read=218096
              I/O Timings: read=55697.723
              -> Bitmap Index Scan on testtable_dmy_idx (cost=0.00..525797.34 rows=7745982 width=0) (actual time=16933.964..16933.964 rows=7892123 loops=1)
                    Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                    Buffers: shared hit=3 read=46389
                    I/O Timings: read=16154.294
Planning Time: 3.684 ms
Execution Time: 119440.147 ms
Total Time - Around 2 mins
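Both non-parallel plans sort on disk (Sort Method: external merge Disk: 734304kB), and the bitmap heap scan reports lossy heap blocks. Both commonly indicate that work_mem is smaller than the data being processed. A hedged sketch for raising it in one session only and re-running the query; the value '1GB' is purely illustrative, not a figure from the post and not a recommendation:

```sql
-- Raise work_mem for the current session only (illustrative value).
SET work_mem = '1GB';

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;

-- Return to the server's configured value.
RESET work_mem;
```

In the new plan, the things to compare would be the Sort Method line and the exact/lossy split under Heap Blocks.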
Edit 3:
I checked the number of inserts, updates, deletes, live tuples and dead tuples using the following:
SELECT n_tup_ins as "inserts",n_tup_upd as "updates",n_tup_del as "deletes", n_live_tup as "live_tuples", n_dead_tup as "dead_tuples"
FROM pg_stat_user_tables
where relname = 'test_table';
which returned the following:
| inserts | updates | deletes | live_tuples | dead_tuples |
|-------------|---------|-----------|-------------|-------------|
| 296590964 | 0 | 412400995 | 79717032 | 7589442 |
I then ran the following command:
VACUUM (VERBOSE, ANALYZE) test_table
and got the following output:
[2020-05-15 18:34:08] [00000] vacuuming "public.test_table"
[2020-05-15 18:37:13] [00000] scanned index "testtable_dmy_idx" to remove 7573896 row versions
[2020-05-15 18:37:56] [00000] scanned index "testtable_unixts_idx" to remove 7573896 row versions
[2020-05-15 18:38:16] [00000] "test_table": removed 7573896 row versions in 166450 pages
[2020-05-15 18:38:16] [00000] index "testtable_dmy_idx" now contains 79973254 row versions in 1103313 pages
[2020-05-15 18:38:16] [00000] index "testtable_unixts_idx" now contains 79973254 row versions in 318288 pages
[2020-05-15 18:38:16] [00000] "test_table": found 99 removable, 2196653 nonremovable row versions in 212987 out of 8493416 pages
[2020-05-15 18:38:16] [00000] vacuuming "pg_toast.pg_toast_25023"
[2020-05-15 18:38:16] [00000] index "pg_toast_25023_index" now contains 0 row versions in 1 pages
[2020-05-15 18:38:16] [00000] "pg_toast_25023": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
[2020-05-15 18:38:16] [00000] analyzing "public.test_table"
[2020-05-15 18:38:27] [00000] "test_table": scanned 30000 of 8493416 pages, containing 282611 live rows and 0 dead rows; 30000 rows in sample, 80011093 estimated total rows
[2020-05-15 18:38:27] completed in 4 m 19 s 58 ms
After that, the same statistics query returned:
| inserts | updates | deletes | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0 | 412400995 | 80011093 | 0 |
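The VACUUM output above reports 8493416 heap pages for roughly 80 million live rows, so it may be worth comparing the on-disk size of the table with its row count. The following is a sketch using standard catalog columns and size functions; reading a large bytes-per-row figure as leftover free space from the 412400995 deletes is an assumption, not something the post establishes.

```sql
-- relpages and reltuples are the planner's estimates, refreshed by
-- VACUUM and ANALYZE; the size functions report actual on-disk size.
SELECT
    c.relpages,
    c.reltuples::bigint                           AS estimated_rows,
    pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size_with_indexes,
    pg_relation_size(c.oid)
        / NULLIF(c.reltuples::bigint, 0)          AS heap_bytes_per_row
FROM pg_class AS c
WHERE c.oid = 'public.test_table'::regclass;
```

For comparison, the plans above show width=81 for the columns the query selects, which is only a lower bound on the full row size.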
[Discussion]:
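-
Did you run ANALYZE and/or VACUUM on the table?
-
What is the value of work_mem?
-
Parallel plans are harder to interpret than non-parallel ones. Can you repeat with max_parallel_workers_per_gather=0? Hopefully any lessons we learn will carry over to the parallel case.
-
Did you run these plans repeatedly, alternating between them, to rule out caching effects?
-
@jjanes I added the version info in an edit, though here it is as well:
PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
-
Splitting a date into three {year,month,day} columns results in three low-cardinality columns, on which an index has little effect. (A complication might be that the three columns could be nullable, which could cause another level of disaster.)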
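Several of the comments above ask about settings that feed the planner's cost estimates. A minimal sketch for listing them in one go from the standard pg_settings view; which of these actually matters for this query is not established in the thread.

```sql
-- "source" shows where the current value comes from
-- (default, configuration file, session, ...).
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN (
    'work_mem',
    'max_parallel_workers_per_gather',
    'seq_page_cost',
    'random_page_cost',
    'effective_cache_size'
)
ORDER BY name;
```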
Tags: sql postgresql performance amazon-rds