How to optimize sql where = (select from same_table): answers

【Question title】: How to optimize sql where = (select from same_table)
【Posted】: 2020-01-28 10:41:46
【Question】:

I have a PostgreSQL query and I don't know how to optimize it.

I think the main bottleneck of the query is the subquery.

select social_status, count(*)
from client
where 1 = 1
  and social_status = (select social_status from client where id = 1)
  and created_at between '2018-09-10 06:05:41'::timestamp - interval '14 day' and '2018-09-10 06:05:41'::timestamp
group by social_status

I also tried replacing = with in, but nothing changed.

I tried using a join, but it returns nothing:

select a.social_status, count(*)
from client a
JOIN client b
     ON a.id = b.id
where 1 = 1
   and b.id = 1
  and a.social_status = b.social_status
  and a.created_at between '2018-09-10 06:05:41'::timestamp - interval '14 day' and '2018-09-10 06:05:41'::timestamp
group by a.social_status
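
A likely reason the join above returns nothing: joining on a.id = b.id together with b.id = 1 restricts a to the single row with id = 1, so the result is empty unless that one row also falls inside the date range. A minimal sketch of a join that matches the subquery version joins on social_status instead. It assumes client.id is unique, so b contributes exactly one row and the count is not inflated:

```sql
-- Sketch: self-join on social_status rather than on id.
-- Assumption: client.id is unique, so b matches exactly one row.
select a.social_status, count(*)
from client a
join client b
  on a.social_status = b.social_status
where b.id = 1
  and a.created_at between '2018-09-10 06:05:41'::timestamp - interval '14 day'
                       and '2018-09-10 06:05:41'::timestamp
group by a.social_status;
```

This form is only expected to be equivalent to the subquery version, not faster; without a suitable index it still has to scan the whole table.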

It currently takes about 13-19 seconds.

Result of explain (analyze, buffers, format text):

QUERY PLAN
GroupAggregate  (cost=8.44..206659.09 rows=12 width=17) (actual time=23584.356..23584.357 rows=1 loops=1)
  Group Key: a.social_status
  Buffers: shared hit=8737 read=183781
  I/O Timings: read=22802.316
  InitPlan 1 (returns $0)
    ->  Index Scan using client_id_index on client  (cost=0.42..8.44 rows=1 width=9) (actual time=1.405..1.407 rows=1 loops=1)
          Index Cond: (id = 1)
          Buffers: shared hit=1 read=3
          I/O Timings: read=1.374
  ->  Seq Scan on client a  (cost=0.00..206645.81 rows=943 width=9) (actual time=202.157..23582.677 rows=2323 loops=1)
        Filter: ((created_at >= '2018-08-27 06:05:41'::timestamp without time zone) AND (created_at <= '2018-09-10 06:05:41'::timestamp without time zone) AND ((social_status)::text = ($0)::text))
        Rows Removed by Filter: 812931
        Buffers: shared hit=8737 read=183781
        I/O Timings: read=22802.316
Planning Time: 0.217 ms
Execution Time: 23584.460 ms

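【Comments】:

Please edit your question and add the execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text, and make sure you preserve the indentation of the plan. Paste the text, then put ``` on the line before the plan and on the line after the plan. Please also include the complete create index statements for all indexes.
The subquery is not the problem (it takes about 1.5 ms). Is there an index on created_at?
@a_horse_with_no_name I don't think that column has an index, but unfortunately I can't change the database.
As far as I can tell, the missing index on that column is the main reason the query is slow. If you want it to be faster, you have to create the index.
Thanks, I'll try to get that done. But doesn't the initial query itself need any optimization? Could you add your comment as an answer so I can accept it?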
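
The index suggested in the comments above could be created roughly as follows. This is a sketch only: the index name client_created_at_idx is hypothetical, and it assumes someone with permission to change the schema runs it. Re-running the plan afterwards shows whether the sequential scan on client is replaced by an index scan:

```sql
-- Hypothetical index name; "concurrently" avoids blocking writes while the
-- index builds, but cannot be run inside a transaction block.
create index concurrently client_created_at_idx
    on client (created_at);

-- Check the plan again once the index exists.
explain (analyze, buffers, format text)
select social_status, count(*)
from client
where social_status = (select social_status from client where id = 1)
  and created_at between '2018-09-10 06:05:41'::timestamp - interval '14 day'
                     and '2018-09-10 06:05:41'::timestamp
group by social_status;
```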

Tags: sql postgresql performance optimization

【Answer 1】:

You can try using a window function:

select social_status, count(*)
from (select c.*,
             max(social_status) filter (where id = 1) over () as social_status_1
      from client c
     ) c
where social_status = social_status_1 and
      created_at between '2018-09-10 06:05:41'::timestamp - interval '14 day' and
                         '2018-09-10 06:05:41'::timestamp
group by social_status;

For this query, you want indexes on client(id, social_status) and client(created_at, social_status).
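
Written out as statements, the two indexes named in this answer might look like the following. The index names are made up for illustration; only the column lists come from the answer:

```sql
-- Hypothetical names; column lists as given in the answer.
create index client_id_social_status_idx
    on client (id, social_status);

create index client_created_at_social_status_idx
    on client (created_at, social_status);
```

Whether the planner actually uses them depends on the table's statistics, so it is worth comparing explain (analyze, buffers) output before and after.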

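【Comments】:

Thanks. Unfortunately, it doesn't give the correct result. I need the count for the social_status that is set for a specific client. With your approach I get counts for every social_status value.
@Alex . . . I fixed the query.
Thanks! It works now. I think the main problem is that created_at has no index, as @a_horse_with_no_name mentioned. I'll try to get that resolved within my company.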