How to improve query performance when joining a lot of huge tables? Answers

【Question title】: How to improve the performance of queries when joining a lot of huge tables?
【Posted】: 2021-10-31 01:26:41
【Question description】:

This is my SQL script. It joins 7 tables:

SELECT concat_ws('-', it.item_id, it.model_id) AS product_id,
       concat_ws('-', aip.partner_item_id, aip.partner_model_id) AS product_reseller_id,
       i.name as item_name,
       im.name AS model_name,
       p.partner_code,
       sum(it.quantity) AS transfer_total,
       sum(isb.remaining_item) as remaining_stock,
       sum(isb.sold_item) as partner_sold
FROM transfer t
INNER JOIN partner p ON p.reseller_store_id = t.reseller_store_id
INNER JOIN item_transfer it ON t.id = it.transfer_id
INNER JOIN item i ON i.id = it.item_id
INNER JOIN item_model im ON it.model_id = im.id
INNER JOIN affiliate_item_mapping aip on it.item_id = aip.seller_item_id and it.model_id = aip.seller_model_id
and t.reseller_store_id = aip.reseller_store_id
LEFT JOIN inventory_summary_branch isb on isb.inventory_summary_id = concat_ws('-', aip.partner_item_id, aip.partner_model_id)
WHERE p.store_id = 9805
GROUP BY it.item_id, it.model_id, p.partner_code, i.id, im.id, aip.id, isb.inventory_summary_id

This is the EXPLAIN output:

GroupAggregate  (cost=13861.57..13861.62 rows=1 width=885) (actual time=1890.392..1890.525 rows=15 loops=1)
  Group Key: it.item_id, it.model_id, p.partner_code, i.id, im.id, aip.id, isb.inventory_summary_id
  Buffers: shared hit=118610
  ->  Sort  (cost=13861.57..13861.58 rows=1 width=765) (actual time=1890.310..1890.338 rows=21 loops=1)
        Sort Key: it.item_id, it.model_id, p.partner_code, aip.id, isb.inventory_summary_id
        Sort Method: quicksort  Memory: 28kB
        Buffers: shared hit=118610
        ->  Nested Loop  (cost=1.27..13861.56 rows=1 width=765) (actual time=73.156..1890.057 rows=21 loops=1)
              Buffers: shared hit=118610
              ->  Nested Loop  (cost=0.85..13853.14 rows=1 width=753) (actual time=73.134..1889.495 rows=21 loops=1)
                    Buffers: shared hit=118526
                    ->  Nested Loop  (cost=0.43..13845.32 rows=1 width=609) (actual time=73.099..1888.733 rows=21 loops=1)
                          Join Filter: ((p.reseller_store_id = t.reseller_store_id) AND (it.transfer_id = t.id))
                          Rows Removed by Join Filter: 2142
                          Buffers: shared hit=118442
                          ->  Nested Loop  (cost=0.43..13840.24 rows=1 width=633) (actual time=72.793..1879.961 rows=21 loops=1)
                                Join Filter: ((aip.seller_item_id = it.item_id) AND (aip.seller_model_id = it.model_id))
                                Rows Removed by Join Filter: 6003
                                Buffers: shared hit=118379
                                ->  Nested Loop Left Join  (cost=0.43..13831.47 rows=1 width=601) (actual time=72.093..1861.415 rows=24 loops=1)
                                      Buffers: shared hit=118307
                                      ->  Nested Loop  (cost=0.00..11.44 rows=1 width=572) (actual time=0.042..0.696 rows=24 loops=1)
                                            Join Filter: (p.reseller_store_id = aip.reseller_store_id)
                                            Rows Removed by Join Filter: 150
                                            Buffers: shared hit=7
                                            ->  Seq Scan on partner p  (cost=0.00..10.38 rows=1 width=524) (actual time=0.026..0.039 rows=6 loops=1)
                                                  Filter: (store_id = 9805)
                                                  Buffers: shared hit=1
                                            ->  Seq Scan on affiliate_item_mapping aip  (cost=0.00..1.03 rows=3 width=48) (actual time=0.006..0.043 rows=29 loops=6)
                                                  Buffers: shared hit=6
                                      ->  Index Scan using branch_id_inventory_summary_id_inventory_summary_branch on inventory_summary_branch isb  (cost=0.43..13820.01 rows=1 width=29) (actual time=77.498..77.498 rows=0 loops=24)
                                            Index Cond: ((inventory_summary_id)::text = concat_ws('-'::text, aip.partner_item_id, aip.partner_model_id))
                                            Buffers: shared hit=118300
                                ->  Seq Scan on item_transfer it  (cost=0.00..5.31 rows=231 width=32) (actual time=0.024..0.391 rows=251 loops=24)
                                      Buffers: shared hit=72
                          ->  Seq Scan on transfer t  (cost=0.00..3.83 rows=83 width=16) (actual time=0.011..0.256 rows=103 loops=21)
                                Buffers: shared hit=63
                    ->  Index Scan using pk_item on item i  (cost=0.42..7.81 rows=1 width=152) (actual time=0.022..0.023 rows=1 loops=21)
                          Index Cond: (id = it.item_id)
                          Buffers: shared hit=84
              ->  Index Scan using pk_item_model on item_model im  (cost=0.43..8.41 rows=1 width=20) (actual time=0.016..0.018 rows=1 loops=21)
                    Index Cond: (id = it.model_id)
                    Buffers: shared hit=84
Planning time: 10.051 ms
Execution time: 1890.943 ms

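The statement works correctly, but it is slow. Is there a better way to write it?

How can I improve the performance? Is a join or a subquery better in this case? Any help would be appreciated.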

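【Question discussion】:

The formatting of the execution plan is completely garbled. Please copy and paste the output of EXPLAIN (ANALYZE, BUFFERS).
What is branch_id_inventory_summary_id_inventory_summary_branch? For that matter, what are your other indexes?
@Domenico Are you still looking for an answer to this question?
@Rahul Biswas - Yes, I am still looking for a solution.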
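
The question about the existing indexes can be answered from the system catalog. A minimal sketch, assuming the seven tables from the question are visible to the current user; `pg_indexes` is a standard PostgreSQL system view, and its `indexdef` column shows the column order of each index, which is what matters for the "leading column" point raised in the answers:

```sql
-- List every index defined on the tables used in the query.
-- Table names are taken from the question; nothing else is assumed.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('transfer', 'partner', 'item_transfer', 'item',
                    'item_model', 'affiliate_item_mapping',
                    'inventory_summary_branch')
ORDER BY tablename, indexname;
```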

Tags: sql database postgresql query-optimization database-performance

【Solution 1】:

Two things may help:

Run VACUUM ANALYZE on all the tables involved.
Create an index on item_transfer.item_id and model_id.
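
The two suggestions above could look roughly like this. It is only a sketch: the index name is hypothetical, and a single composite index on (item_id, model_id) is one reading of the suggestion, chosen because the query joins item_transfer to affiliate_item_mapping on both columns. The plan in the question estimates 3 rows for affiliate_item_mapping but reads 29, which hints that the statistics may be out of date.

```sql
-- Refresh planner statistics and clean up dead rows, one table at a time.
-- VACUUM cannot run inside a transaction block.
VACUUM ANALYZE transfer;
VACUUM ANALYZE partner;
VACUUM ANALYZE item_transfer;
VACUUM ANALYZE item;
VACUUM ANALYZE item_model;
VACUUM ANALYZE affiliate_item_mapping;
VACUUM ANALYZE inventory_summary_branch;

-- Hypothetical index name; covers both join columns used against
-- affiliate_item_mapping (seller_item_id, seller_model_id).
CREATE INDEX idx_item_transfer_item_model
    ON item_transfer (item_id, model_id);
```

With only a few hundred rows in item_transfer, the planner may still prefer a sequential scan, so the effect of the index should be checked with EXPLAIN (ANALYZE, BUFFERS).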

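【Discussion】:

【Solution 2】:

Essentially all of your time (77.498 ms × 24 loops ≈ 1,860 ms of the 1,890 ms total) is spent on the index scan of branch_id_inventory_summary_id_inventory_summary_branch.

The only explanation I can see is that the index is poorly suited to the query: it is being scanned in full (in place of a full table scan) rather than being searched efficiently. That probably means the index contains the column inventory_summary_id, but not as its leading column. (It would be nice if EXPLAIN made this kind of inefficient usage more obvious than it does.)

You would probably benefit from an index such as one on inventory_summary_branch (inventory_summary_id), which is more likely to be used efficiently.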
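
A minimal sketch of such an index follows. The index name is hypothetical; the table and column come from the plan in the question.

```sql
-- Hypothetical index name. With inventory_summary_id as the only (and
-- therefore leading) column, an equality lookup can descend the B-tree
-- directly instead of walking the whole index.
CREATE INDEX idx_isb_inventory_summary_id
    ON inventory_summary_branch (inventory_summary_id);

-- Refresh statistics so the planner sees the table's current contents.
ANALYZE inventory_summary_branch;
```

If the new index is used, the Index Scan node in the plan should name it in place of branch_id_inventory_summary_id_inventory_summary_branch, and its "Buffers: shared hit" count should fall well below the 118300 reported in the question.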

I don't know why it doesn't just do a hash join against that table. Perhaps your work_mem is too low.
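
One way to test the work_mem theory is to raise it for the current session only and re-run the plan. The value below is purely illustrative, not a recommendation:

```sql
-- Show the current per-operation memory budget.
SHOW work_mem;

-- Raise it for this session only ('64MB' is an arbitrary test value).
SET work_mem = '64MB';

-- ... re-run EXPLAIN (ANALYZE, BUFFERS) on the query here and check
-- whether the join to inventory_summary_branch becomes a Hash Join ...

-- Return to the configured default afterwards.
RESET work_mem;
```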

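【Discussion】:

Thank you very much, @jjanes. I created an index on "inventory_summary_id". It improved performance, but not by much.
Can you post the new EXPLAIN (ANALYZE, BUFFERS) output, preferably captured after turning on track_io_timing?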
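
Capturing the requested plan could look like this. It is a sketch that reuses the query from the question unchanged; note that changing track_io_timing normally requires superuser rights, so it may need to be enabled by an administrator.

```sql
-- Record time spent on block reads and writes (adds "I/O Timings" lines
-- to the plan whenever blocks are actually read from disk).
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT concat_ws('-', it.item_id, it.model_id) AS product_id,
       concat_ws('-', aip.partner_item_id, aip.partner_model_id) AS product_reseller_id,
       i.name AS item_name,
       im.name AS model_name,
       p.partner_code,
       sum(it.quantity) AS transfer_total,
       sum(isb.remaining_item) AS remaining_stock,
       sum(isb.sold_item) AS partner_sold
FROM transfer t
INNER JOIN partner p ON p.reseller_store_id = t.reseller_store_id
INNER JOIN item_transfer it ON t.id = it.transfer_id
INNER JOIN item i ON i.id = it.item_id
INNER JOIN item_model im ON it.model_id = im.id
INNER JOIN affiliate_item_mapping aip
        ON it.item_id = aip.seller_item_id
       AND it.model_id = aip.seller_model_id
       AND t.reseller_store_id = aip.reseller_store_id
LEFT JOIN inventory_summary_branch isb
       ON isb.inventory_summary_id = concat_ws('-', aip.partner_item_id, aip.partner_model_id)
WHERE p.store_id = 9805
GROUP BY it.item_id, it.model_id, p.partner_code, i.id, im.id, aip.id, isb.inventory_summary_id;
```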

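【Solution 3】:

Inner joins are always going to be slow, especially with this many tables.

Instead of joining against the whole table, you can join against a subquery that selects only the columns you need, and see whether that helps:

From:

INNER JOIN partner p ON p.reseller_store_id = t.reseller_store_id

To: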

INNER JOIN (SELECT reseller_store_id, store_id, partner_code FROM partner) AS p ON p.reseller_store_id = t.reseller_store_id

See whether that speeds things up.

If not, I would recommend indexes on the join keys.

【Discussion】: