Is this Postgres query not optimal? Answers

【Question title】: Is this Postgres query not optimal?
【Posted】: 2020-04-28 08:08:06
【Question description】:

I am facing a problem where the following query takes a very long time to run in Postgres 9.2:

select coalesce(sum(col_a), 0) 
from table_a 
where tid not in ( 
    select distinct tid 
    from table_b 
    where col_b = 13 )

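Note that tid is the primary key of table_a. In table_b, tid is indexed and references table_a as a foreign key.

The problem mostly occurs when the disk is nearly full and some reindexing is happening on the tables. I am not a database expert, and I don't really understand what the problem might be.

Can someone help me understand this problem, or tell me whether there is a more optimal query?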

【Question comments】:

Do you have an index on col_b? That is where the seq scan happens.
No, there is no index on col_b.
Maybe a left join or NOT EXISTS would perform better? select coalesce(sum(col_a), 0) from table_a left join table_b on table_a.tid = table_b.tid and table_b.col_b = 13 where table_b.tid is null, or select coalesce(sum(col_a), 0) from table_a where NOT EXISTS (select * from table_b where col_b = 13 and table_b.tid = table_a.tid)
NOT EXISTS is usually faster than NOT IN, and the distinct in the subquery is unnecessary.
Unrelated to your problem, but: Postgres 9.2 is no longer supported, and you should plan an upgrade as soon as possible.

Tags: sql postgresql postgresql-9.2

【Solution 1】:

I would try NOT EXISTS:

select coalesce(sum(a.col_a), 0) 
from table_a a
where not exists (select 1 from table_b b where b.tid = a.tid and b.col_b = 13);
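
One likely reason the two forms behave differently is NULL handling: NOT IN has to account for a NULL in the subquery result, which generally prevents Postgres from planning it as an anti-join, while NOT EXISTS has no such restriction. The sketch below illustrates the semantic difference on throwaway tables. The names demo_a and demo_b and the sample rows are made up for illustration and are not the asker's schema. The two forms return the same result only when table_b.tid contains no NULLs.

```sql
-- Hypothetical scratch tables, not the asker's schema.
create temporary table demo_a (tid int primary key, col_a int);
create temporary table demo_b (tid int, col_b int);

insert into demo_a values (1, 10), (2, 20), (3, 30);
insert into demo_b values (1, 13), (null, 13);

-- NOT IN: the NULL in the subquery result makes the predicate unknown for
-- every non-matching row, so no row qualifies and the result is 0.
select coalesce(sum(col_a), 0)
from demo_a
where tid not in (select tid from demo_b where col_b = 13);

-- NOT EXISTS: the NULL row never matches, so tids 2 and 3 are summed (50).
select coalesce(sum(a.col_a), 0)
from demo_a a
where not exists (select 1 from demo_b b where b.tid = a.tid and b.col_b = 13);
```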

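Aggregation can also work. Use a left join so that table_a rows with no match in table_b are kept, and sum in an outer query so that the result is a single total. A conditional count is used because the filter clause is not available in Postgres 9.2:

select coalesce(sum(t.col_a), 0)
from (select a.tid, a.col_a
      from table_a a left join
           table_b b
           on b.tid = a.tid
      group by a.tid, a.col_a
      having count(case when b.col_b = 13 then 1 end) = 0
     ) t;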

Another option is to use a left join:

select coalesce(sum(a.col_a), 0) 
from table_a a left join
     table_b b
     on b.tid = a.tid and b.col_b = 13
where b.tid is null;

For best performance, indexes on table_a(tid, col_a) and table_b(tid, col_b) would help.
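
A minimal sketch of how those two indexes might be created and then checked against the actual plan. The index names are hypothetical. Note that explain with the analyze option really executes the query, and that building indexes needs free disk space, which matters here because the question mentions a nearly full disk.

```sql
-- Index names are hypothetical; the column lists follow the suggestion above.
create index idx_table_a_tid_col_a on table_a (tid, col_a);
create index idx_table_b_tid_col_b on table_b (tid, col_b);

-- Refresh planner statistics so the new indexes are costed sensibly.
analyze table_a;
analyze table_b;

-- Show the chosen plan with real timings and buffer usage.
-- This runs the query, so expect it to take as long as the query itself.
explain (analyze, buffers)
select coalesce(sum(a.col_a), 0)
from table_a a
where not exists (select 1 from table_b b where b.tid = a.tid and b.col_b = 13);
```

If the plan still shows a sequential scan on table_b with a filter on col_b, the planner has judged the index not worth using for this query.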

【Discussion】:

【Solution 2】:

I would recommend NOT EXISTS together with the right index. So, write the query as:

select coalesce(sum(col_a), 0) 
from table_a a
where not exists (select 1
                  from table_b b
                  where b.tid = a.tid and b.col_b = 13
                 );

The index you want is on table_b(tid, col_b):

create index idx_table_b_tid_col_b on table_b(tid, col_b);

【Discussion】: