【Posted at】:2021-07-19 15:31:57
【Problem description】:
I have a problem with a PostgreSQL query that takes a long time to run. Here is some sample code that reproduces it.
create table t1 (
cust_id int,
cust_name varchar(100),
comment char(100));
insert into t1
select i, 'TESTNAME'||i, 'dummyy' from generate_series(1,1000) a(i);
select * from t1;
create table t2
(id int, cust_id int, amount bigint);
insert into t2
select i, case when i < 10 then i else i+1000 end, i*10
from generate_series(1,100) a(i);
select * from t2;
create table t3(
singo_id int, cust_id int, reg_date date, comment char(200));
insert into t3
select i, mod(i,1000), '2021-01-01'::date + mod(i,1000), 'dummyyyy'
from generate_series(1,2000) a(i);
-- I added 'offset 0' on purpose to prevent the subqueries from being collapsed.
select count(*)
from t1 a
where exists (select 1
from t3 b
where reg_date >= '2021-02-02'
and a.cust_id = b.cust_id
offset 0)
and exists (select 1
from t2 c
where a.cust_id = c.cust_id
offset 0);
-- Execution plan
| Aggregate (actual time=8.047..8.048 rows=1 loops=1) |
| Buffers: shared hit=1568 |
| -> Seq Scan on t1 a (actual time=8.042..8.043 rows=0 loops=1) |
| Filter: ((SubPlan 2) AND (SubPlan 1)) |
| Rows Removed by Filter: 1000 |
| Buffers: shared hit=1568 |
| SubPlan 2 |
| -> Seq Scan on t2 c (actual time=0.005..0.005 rows=0 loops=1000) |
| Filter: (a.cust_id = cust_id) |
| Rows Removed by Filter: 99 |
| Buffers: shared hit=1000 |
| SubPlan 1 |
| -> Seq Scan on t3 b (actual time=0.293..0.293 rows=0 loops=9) |
| Filter: ((reg_date >= '2021-02-02'::date) AND (a.cust_id = cust_id)) |
| Rows Removed by Filter: 2000 |
| Buffers: shared hit=549 |
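I know the test SQL above is silly. My real SQL in the production system is very complex, and its subqueries cannot be collapsed. Judging by the execution plan above, PostgreSQL appears to filter table t1 with table t2 first (SubPlan 2 runs 1000 times, SubPlan 1 only 9 times). What I want is to force the optimizer to filter with table t3 first. How can I do that? I changed the test SQL to the version below, but it made no difference.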
select count(*)
from t1 a
where exists (select 1
from t2 c
where a.cust_id = c.cust_id
offset 0)
and exists (select 1
from t3 b
where reg_date >= '2021-02-02'
and a.cust_id = b.cust_id
offset 0);
【Discussion】:
-
Is cust_id the primary key in all three tables? (And t3.reg_date could also be part of a key or index.)
Tags: postgresql subquery execution exists
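To make the comment above concrete, here is a minimal sketch of the keys and indexes it hints at, written against the test tables from the question. The index names are hypothetical, and the uniqueness assumption holds only for the generated test data: cust_id is unique in t1 and t2, but each value appears twice in t3, so t3 gets a plain composite index rather than a primary key. Whether the planner then uses these indexes inside the subplans depends on its cost estimates, so the result should be checked with explain (analyze, buffers).

```sql
-- Assumption: cust_id is unique in t1 (true for generate_series(1,1000) in the test data).
alter table t1 add primary key (cust_id);

-- Hypothetical index name; lets the subquery on t2 probe by cust_id instead of scanning.
create index t2_cust_id_idx on t2 (cust_id);

-- Hypothetical index name; cust_id is not unique in t3, so this is an ordinary
-- composite index covering both the join column and the reg_date filter.
create index t3_cust_id_reg_date_idx on t3 (cust_id, reg_date);

-- Refresh planner statistics after loading data and creating the indexes.
analyze t1;
analyze t2;
analyze t3;
```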