Speeding up a query with multiple joins, GROUP BY and ORDER BY

【Question title】: Speeding up the query with multiple joins, group by and order by
【Posted】: 2019-05-28 03:50:04
【Question】:

I have a SQL query:

SELECT
title,
(COUNT(DISTINCT A.id)) AS "count_title"

FROM 
B 
INNER JOIN D ON B.app = D.app
INNER JOIN A ON D.number = A.number 
INNER JOIN C ON A.id = C.id 

GROUP BY C.title
ORDER BY count_title DESC
LIMIT 10
;

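Table D contains 50M records, A contains 30M records, and B and C contain 30k records each. Indexes are defined on all columns used in the joins, GROUP BY and ORDER BY.

Without the ORDER BY clause the query works fine and returns results in about 2-3 seconds.

With the sort (ORDER BY), however, the query time increases to 10-12 seconds.

I understand the reason behind this: the executor has to go through all records to sort them, and an index is of almost no help here.

Is there any other way to speed up this query?

Here is the EXPLAIN ANALYZE output for this query: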

"QUERY PLAN"
"Limit  (cost=974652.20..974652.22 rows=10 width=54) (actual time=2817.579..2825.071 rows=10 loops=1)"
"  Buffers: shared hit=120299 read=573195"
"  ->  Sort  (cost=974652.20..974666.79 rows=5839 width=54) (actual time=2817.578..2817.578 rows=10 loops=1)"
"        Sort Key: (count(DISTINCT A.id)) DESC"
"        Sort Method: top-N heapsort  Memory: 26kB"
"        Buffers: shared hit=120299 read=573195"
"        ->  GroupAggregate  (cost=974325.65..974526.02 rows=5839 width=54) (actual time=2792.465..2817.097 rows=3618 loops=1)"
"              Group Key: C.title"
"              Buffers: shared hit=120299 read=573195"
"              ->  Sort  (cost=974325.65..974372.97 rows=18931 width=32) (actual time=2792.451..2795.161 rows=45175 loops=1)"
"                    Sort Key: C.title"
"                    Sort Method: quicksort  Memory: 5055kB"
"                    Buffers: shared hit=120299 read=573195"
"                    ->  Gather  (cost=968845.30..972980.74 rows=18931 width=32) (actual time=2753.402..2778.648 rows=45175 loops=1)"
"                          Workers Planned: 1"
"                          Workers Launched: 1"
"                          Buffers: shared hit=120299 read=573195"
"                          ->  Parallel Hash Join  (cost=967845.30..970087.64 rows=11136 width=32) (actual time=2751.725..2764.832 rows=22588 loops=2)"
"                                Hash Cond: ((C.id)::text = (A.id)::text)"
"                                Buffers: shared hit=120299 read=573195"
"                                ->  Parallel Seq Scan on C  (cost=0.00..1945.87 rows=66687 width=32) (actual time=0.017..4.316 rows=56684 loops=2)"
"                                      Buffers: shared read=1279"
"                                ->  Parallel Hash  (cost=966604.55..966604.55 rows=99260 width=9) (actual time=2750.987..2750.987 rows=20950 loops=2)"
"                                      Buckets: 262144  Batches: 1  Memory Usage: 4032kB"
"                                      Buffers: shared hit=120266 read=571904"
"                                      ->  Nested Loop  (cost=219572.23..966604.55 rows=99260 width=9) (actual time=665.832..2744.270 rows=20950 loops=2)"
"                                            Buffers: shared hit=120266 read=571904"
"                                            ->  Parallel Hash Join  (cost=219571.79..917516.91 rows=99260 width=4) (actual time=665.804..2583.675 rows=20950 loops=2)"
"                                                  Hash Cond: ((D.app)::text = (B.app)::text)"
"                                                  Buffers: shared hit=8 read=524214"
"                                                  ->  Parallel Bitmap Heap Scan on D  (cost=217542.51..895848.77 rows=5126741 width=13) (actual time=661.254..1861.862 rows=6160441 loops=2)"
"                                                        Recheck Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
"                                                        Heap Blocks: exact=242152"
"                                                        Buffers: shared hit=3 read=523925"
"                                                        ->  Bitmap Index Scan on D_index_action_type  (cost=0.00..214466.46 rows=12304178 width=0) (actual time=546.470..546.471 rows=12320882 loops=1)"
"                                                              Index Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
"                                                              Buffers: shared hit=3 read=33669"
"                                                  ->  Parallel Hash  (cost=1859.36..1859.36 rows=13594 width=12) (actual time=4.337..4.337 rows=16313 loops=2)"
"                                                        Buckets: 32768  Batches: 1  Memory Usage: 1152kB"
"                                                        Buffers: shared hit=5 read=289"
"                                                        ->  Parallel Index Only Scan using B_index_app on B  (cost=0.29..1859.36 rows=13594 width=12) (actual time=0.015..2.218 rows=16313 loops=2)"
"                                                              Heap Fetches: 0"
"                                                              Buffers: shared hit=5 read=289"
"                                            ->  Index Scan using A_index_number on A  (cost=0.43..0.48 rows=1 width=24) (actual time=0.007..0.007 rows=1 loops=41900)"
"                                                  Index Cond: ((number)::text = (D.number)::text)"
"                                                  Buffers: shared hit=120258 read=47690"
"Planning Time: 0.747 ms"
"Execution Time: 2825.118 ms"

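【Comments】:

Can you add the output of EXPLAIN (ANALYZE, BUFFERS) to the question?
I have added it. @LaurenzAlbe Also, I guess the query was cached, which is why it ran faster.
That execution plan is from a different query. I see d.action_type IN ('10', '11') in there.
Yes, that is an additional condition. But it won't make much of a difference.
It does make a difference, because in that case you could try for an index-only scan on d.

Tags: sql postgresql sqlperformance postgresql-performance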

【Solution 1】:

You could try to get a nested loop join between b and d, since b is much smaller:

CREATE INDEX ON d (app);
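
One way to check whether a nested loop driven by b is actually faster is to discourage the other join strategies for a single transaction and compare the timings. This is only a diagnostic sketch: it assumes the query from the question plus the action_type filter that appears in the posted plan, and the planner switches are meant for experiments, not for production settings.

```sql
BEGIN;

-- Planner switches, limited to this transaction by SET LOCAL.
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

-- Note: EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT C.title,
       COUNT(DISTINCT A.id) AS count_title
FROM B
INNER JOIN D ON B.app = D.app
INNER JOIN A ON D.number = A.number
INNER JOIN C ON A.id = C.id
WHERE D.action_type IN ('10', '11')  -- condition seen in the posted plan
GROUP BY C.title
ORDER BY count_title DESC
LIMIT 10;

ROLLBACK;  -- discards the SET LOCAL changes
```

If the forced plan is slower than the hash join plan, the planner's original choice was probably reasonable and the new index is unlikely to help.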

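If d is vacuumed frequently enough, you can see whether an index-only scan is even faster. For that, include number in the index (in v11, use the INCLUDE clause for that!). The EXPLAIN output suggests that you have an additional condition on action_type; you would have to include that column as well for an index-only scan.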
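
A minimal sketch of that setup follows. The index names are hypothetical, and the catalog lookup assumes the table was created with an unquoted name, so it is stored as d.

```sql
-- PostgreSQL 11 or later: app is the search key, while number and
-- action_type are carried along only so the heap need not be visited.
CREATE INDEX d_app_covering_idx ON d (app) INCLUDE (number, action_type);

-- Before v11 there is no INCLUDE clause; a multicolumn key is the usual substitute:
-- CREATE INDEX d_app_number_action_idx ON d (app, number, action_type);

-- Index-only scans skip the heap only for pages marked all-visible,
-- so refresh the visibility map (and the statistics) first.
VACUUM (ANALYZE) d;

-- If relallvisible is close to relpages, most of the table is all-visible.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'd';
```

In the resulting plan, a low "Heap Fetches" count on the Index Only Scan node indicates that the visibility map is doing its job.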

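【Discussion】:

I created a covering index: CREATE INDEX d_covering_index ON d(app) INCLUDE(number,action_type). The executor still does a bitmap heap scan.
a) Did you VACUUM d? b) Did you try with an index on app only?
Yes, I ran VACUUM. A single-column index on app is defined as well. However, the execution plan still does not change.
Also, after joining A and C, the row count is around 80K. In table D, after applying the action_type constraint, there are 4M records with 3 loops (using a bitmap heap scan). This step takes up most of the execution time.
OK, then I don't see much hope of making the query faster. You could throw hardware (RAM) at it.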
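
For a rough idea of how much RAM would be needed, note that the posted plan reports about 573,000 shared-buffer reads, which is roughly 4.4 GB at the default 8 kB block size. The sketch below compares the size of d with the configured cache and then loads the table into shared buffers. It assumes the pg_prewarm contrib module is installed and that shared_buffers is large enough to hold the table; neither is confirmed in the thread.

```sql
-- How large are the table and its indexes?
SELECT pg_size_pretty(pg_table_size('d'))   AS d_table,
       pg_size_pretty(pg_indexes_size('d')) AS d_indexes;

-- How much memory does PostgreSQL use for its own buffer cache?
SHOW shared_buffers;

-- Assumption: the pg_prewarm contrib extension is available.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table's main fork into shared buffers; returns the number of blocks read.
SELECT pg_prewarm('d');
```

After prewarming, rerunning EXPLAIN (ANALYZE, BUFFERS) should show more "hit" and fewer "read" buffers for the scan on d if the table fits in the cache.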