Optimising an OR query on a left join table: answers

【Question title】: Optimising an OR query on a left join table
【Posted】: 2021-06-08 19:08:18
【Question description】:

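I have this query on table A. It performs a left join on the associated table B and fetches the records that match either certain conditions on A or certain conditions on B: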

SELECT  A.*, B.status FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' AND "A"."deleted_at" IS NULL AND 
      (A.a_field = 20371 OR A.another_field = 69074 OR B.id IS NOT NULL) 
ORDER BY "A"."updated_at" DESC LIMIT 10 OFFSET 0;

Here is the EXPLAIN ANALYZE output:

Limit  (cost=234623.62..234624.83 rows=10 width=635) (actual time=4034.840..4034.984 rows=10 loops=1)
  ->  Gather Merge  (cost=234623.62..344565.43 rows=909175 width=635) (actual time=4034.839..4034.982 rows=10 loops=1)
        Workers Planned: 5
        Workers Launched: 0
        ->  Sort  (cost=233623.54..234078.13 rows=181835 width=635) (actual time=4033.536..4033.540 rows=10 loops=1)
              Sort Key: A.updated_at DESC
              Sort Method: top-N heapsort  Memory: 34kB
              ->  Hash Left Join  (cost=113.31..229694.15 rows=181835 width=635) (actual time=5.680..4033.139 rows=79 loops=1)
                    Hash Cond: (A.id = B.a_id)
                    Filter: ((A.a_field = 20371) OR (A.another_field = 69074) OR (B.id IS NOT NULL))
                    Rows Removed by Filter: 860017
                    ->  Parallel Seq Scan on A  (cost=0.00..228898.94 rows=181835 width=635) (actual time=0.011..3833.346 rows=860096 loops=1)
                          Filter: ((deleted_at IS NULL) AND ((type)::text = 'SomeValue'::text))
                          Rows Removed by Filter: 5265254
                    ->  Hash  (cost=112.92..112.92 rows=31 width=8) (actual time=0.107..0.108 rows=79 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 12kB
                          ->  Index Scan using index_B_on_b_field on B B  (cost=0.42..112.92 rows=31 width=8) (actual time=0.014..0.087 rows=79 loops=1)
                                Index Cond: (b_field = 20371)
Planning Time: 0.790 ms
Execution Time: 4035.090 ms
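
For readers who want to reproduce a plan like the one above, a minimal sketch follows. The query is the one from the question. The BUFFERS option is an addition that is not in the original post; it is optional and only adds shared-buffer hit/read counts to each plan node.

```sql
-- ANALYZE actually executes the query and reports real timings and row counts.
-- BUFFERS (an assumption here, not used in the original post) adds I/O counters.
EXPLAIN (ANALYZE, BUFFERS)
SELECT  A.*, B.status FROM "A"
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' AND "A"."deleted_at" IS NULL AND
      (A.a_field = 20371 OR A.another_field = 69074 OR B.id IS NOT NULL)
ORDER BY "A"."updated_at" DESC LIMIT 10 OFFSET 0;
```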

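As you can see, Postgres is not using any of the indexes on the main table to discard the majority of the records in A (via A.type, A.deleted_at, etc.), since it has to scan the B records. The plan does a sequential scan on A, reads 860096 rows that pass the type and deleted_at filter, and only then throws away 860017 of them in the join filter.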

Here is the query without the OR condition on B (the B.id IS NOT NULL branch removed):

SELECT  A.*, B.status FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' AND "A"."deleted_at" IS NULL AND 
      (A.a_field = 20371 OR A.another_field = 69074) 
ORDER BY "A"."updated_at" DESC LIMIT 10 OFFSET 0;

And its EXPLAIN ANALYZE output:

Limit  (cost=1397.52..1397.55 rows=10 width=635) (actual time=0.018..0.019 rows=0 loops=1)
  ->  Sort  (cost=1397.52..1397.64 rows=48 width=635) (actual time=0.017..0.018 rows=0 loops=1)
        Sort Key: A.updated_at DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Left Join  (cost=128.65..1396.49 rows=48 width=635) (actual time=0.013..0.014 rows=0 loops=1)
              Hash Cond: (A.id = B.a_id)
              ->  Bitmap Heap Scan on A  (cost=15.33..1282.98 rows=48 width=635) (actual time=0.012..0.013 rows=0 loops=1)
                    Recheck Cond: ((a_field = 20371) OR (another_field = 69074))
                    Filter: ((deleted_at IS NULL) AND ((type)::text = 'SomeValue'::text))
                    ->  BitmapOr  (cost=15.33..15.33 rows=325 width=0) (actual time=0.011..0.012 rows=0 loops=1)
                          ->  Bitmap Index Scan on index_A_on_a_field  (cost=0.00..10.87 rows=325 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                                Index Cond: (a_field = 20371)
                          ->  Bitmap Index Scan on index_A_on_another_field  (cost=0.00..4.44 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
                                Index Cond: (another_field = 69074)
              ->  Hash  (cost=112.92..112.92 rows=31 width=4) (never executed)
                    ->  Index Scan using index_B_on_b_field on B B  (cost=0.42..112.92 rows=31 width=4) (never executed)
                          Index Cond: (b_field = 20371)
Planning Time: 0.552 ms
Execution Time: 0.104 ms

Is there a way to rewrite this query, or otherwise get Postgres to use some of the indexes on the main table A?
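
Since the answer depends on which indexes actually exist, it may help to list them. The sketch below reads the standard pg_indexes view. The table names are taken from the anonymised question; the lower() call is only there because it is not known whether the real tables were created with quoted, mixed-case names.

```sql
-- List every index defined on the two tables, with its full definition.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE lower(tablename) IN ('a', 'b')  -- assumed names, adjust to the real tables
ORDER BY tablename, indexname;
```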

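【Comments】:

You don't need table B in the main query; you could move it into an EXISTS(...) term.
I do need it. I edited the query to add a column from B to the select clause. @wildplasser
So we optimised the wrong query (plan)?
No, the query plan is correct.
"get Postgres to use some of the indexes on the main table A": you haven't told us what those indexes are.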

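The EXISTS(...) rewrite suggested in the first comment could look roughly like the sketch below. It only applies when no column from B is needed in the select list, which the asker says is not the case here, so treat it as an illustration of the comment rather than a solution. The aliases a and b are introduced for readability, and nothing here guarantees that the planner picks a better plan.

```sql
-- B is only consulted as an existence test, so it drops out of the FROM clause.
SELECT a.*
FROM "A" AS a
WHERE a."type" = 'SomeValue'
  AND a."deleted_at" IS NULL
  AND (a.a_field = 20371
       OR a.another_field = 69074
       OR EXISTS (SELECT 1
                  FROM B AS b
                  WHERE b.a_id = a.id
                    AND b.b_field = 20371))
ORDER BY a."updated_at" DESC
LIMIT 10 OFFSET 0;
```
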
Tags: sql postgresql indexing

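【Solution 1】:

You can try expanding the OR into a UNION, so that each branch can use its own index. The original query is shown first, followed by the rewritten version: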

SELECT  A.* FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' AND "A"."deleted_at" IS NULL AND 
      (A.a_field = 20371 OR A.another_field = 69074 OR B.id IS NOT NULL) 
ORDER BY "A"."updated_at" DESC LIMIT 10 OFFSET 0;

SELECT  A.* 
FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' 
  AND "A"."deleted_at" IS NULL 
  AND  A.a_field = 20371  
UNION
SELECT  A.* 
FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' 
  AND "A"."deleted_at" IS NULL 
  AND  A.another_field = 69074 
UNION 
SELECT  A.* 
FROM "A" 
LEFT JOIN B ON B.a_id = A.id AND B.b_field = 20371
WHERE "A"."type" = 'SomeValue' 
  AND "A"."deleted_at" IS NULL
  AND  B.id IS NOT NULL   -- this would be the same as changing LEFT to INNER JOIN
-- ORDER BY on a UNION result must name an output column, not "A"."updated_at"
ORDER BY updated_at DESC LIMIT 10 OFFSET 0;
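
The rewrite above selects only A.*, while the edited question also needs B.status. A possible variant that keeps that column is sketched below, with the two A-side conditions folded into one branch, as in the fast plan from the question. The aliases a and b are introduced here for readability. Two assumptions to check before relying on it: UNION removes duplicate rows, so the result differs from the original if one A row can match several B rows with the same status, and UNION needs every selected column to be of a type that supports equality comparison.

```sql
-- Branch 1: rows that qualify through A, with B.status attached when a B row exists.
SELECT a.*, b.status
FROM "A" AS a
LEFT JOIN B AS b ON b.a_id = a.id AND b.b_field = 20371
WHERE a."type" = 'SomeValue'
  AND a."deleted_at" IS NULL
  AND (a.a_field = 20371 OR a.another_field = 69074)
UNION
-- Branch 2: rows that qualify through B; B.id IS NOT NULL becomes an inner join.
SELECT a.*, b.status
FROM "A" AS a
JOIN B AS b ON b.a_id = a.id AND b.b_field = 20371
WHERE a."type" = 'SomeValue'
  AND a."deleted_at" IS NULL
-- Applies to the whole UNION, so only output column names are allowed here.
ORDER BY updated_at DESC
LIMIT 10 OFFSET 0;
```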

【Discussion】: