Postgres chooses an awful query plan; how can that be fixed? (Answers)

【Question title】: Postgres chooses an awful query plan; how can that be fixed?
【Posted】: 2017-02-02 18:42:00
【Question】:

I am trying to optimize this query:

EXPLAIN ANALYZE  
select
  dtt.matching_protein_seq_ids
from detected_transcript_translation dtt
join peptide_spectrum_match psm 
    on psm.detected_transcript_translation_id = 
       dtt.detected_transcript_translation_id
join peptide_spectrum_match_sequence psms 
    on  psm.peptide_spectrum_match_sequence_id = 
       psms.peptide_spectrum_match_sequence_id
WHERE
dtt.matching_protein_seq_ids && ARRAY[654819, 294711]
;

When sequential scans are allowed (set enable_seqscan = on), the optimizer chooses a very poor plan that takes 49.85 seconds to run:

https://explain.depesz.com/s/WKbew

With set enable_seqscan = off, the chosen plan uses the appropriate indexes and the query runs almost instantly:

https://explain.depesz.com/s/ISHV

Note that I did run ANALYZE on all three tables...
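
One way to compare the two plans without changing enable_seqscan for the rest of the session is to scope the setting to a single transaction with SET LOCAL. This is only a sketch: the BUFFERS option is an addition that the original post did not use, and the query is otherwise the one shown above.

```sql
BEGIN;

-- SET LOCAL reverts automatically when the transaction ends
SET LOCAL enable_seqscan = off;

-- BUFFERS is optional; it adds I/O counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
select
  dtt.matching_protein_seq_ids
from detected_transcript_translation dtt
join peptide_spectrum_match psm
    on psm.detected_transcript_translation_id =
       dtt.detected_transcript_translation_id
join peptide_spectrum_match_sequence psms
    on psm.peptide_spectrum_match_sequence_id =
       psms.peptide_spectrum_match_sequence_id
WHERE
  dtt.matching_protein_seq_ids && ARRAY[654819, 294711];

-- Nothing was modified, so either ROLLBACK or COMMIT works here
ROLLBACK;
```

Running the same EXPLAIN outside the transaction gives the default plan for comparison.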

【Comments】:

Tags: postgresql

【Solution 1】:

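Your problem is that PostgreSQL cannot estimate the selectivity of the WHERE condition well, so it assumes the condition matches a fixed percentage of the table's estimated total rows, and that estimate is far too high.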
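
As a rough sketch, the misestimate can be seen by comparing the planner's row estimate for the filter alone with the real count. The pg_stats lookup is an extra check that is not part of the original answer; it shows whether ANALYZE collected element statistics for the array column, and it may return NULLs.

```sql
-- Planner's estimate: look at "rows=" on the scan node
EXPLAIN
SELECT *
FROM detected_transcript_translation
WHERE matching_protein_seq_ids && ARRAY[654819, 294711];

-- Actual number of matching rows
SELECT count(*)
FROM detected_transcript_translation
WHERE matching_protein_seq_ids && ARRAY[654819, 294711];

-- Element statistics gathered by ANALYZE for the array column
SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE tablename = 'detected_transcript_translation'
  AND attname = 'matching_protein_seq_ids';
```

A large gap between the estimated and the actual row count is what pushes the planner toward the sequential-scan plan.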

If you know that queries like this will always return only a few rows, you can cheat by defining a function:

CREATE OR REPLACE FUNCTION matching_transcript_translations(integer[])
   RETURNS SETOF detected_transcript_translation
   LANGUAGE SQL
   STABLE STRICT
   ROWS 2  /* pretend there are always exactly two matching rows */
AS
'SELECT * FROM detected_transcript_translation
   WHERE matching_protein_seq_ids && $1';

You can use it like this:

select
  dtt.matching_protein_seq_ids
from matching_transcript_translations(ARRAY[654819, 294711]) dtt
join peptide_spectrum_match psm 
    on psm.detected_transcript_translation_id = 
       dtt.detected_transcript_translation_id
join peptide_spectrum_match_sequence psms 
    on  psm.peptide_spectrum_match_sequence_id = 
       psms.peptide_spectrum_match_sequence_id;

PostgreSQL should then be fooled into thinking that there are exactly two matching rows, as declared by the ROWS 2 clause.

However, if there are many matching rows, the resulting plan will be even worse than your current one...
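
Because the ROWS value is a fixed guess, it is worth checking it against real data. The sketch below assumes the function matching_transcript_translations defined above already exists; the value 100 is purely illustrative and not a recommendation.

```sql
-- The row estimate on the Function Scan node comes from the ROWS clause
EXPLAIN
SELECT *
FROM matching_transcript_translations(ARRAY[654819, 294711]);

-- How many rows really match for a given input
SELECT count(*)
FROM matching_transcript_translations(ARRAY[654819, 294711]);

-- If typical inputs match more rows, adjust the declared estimate
-- (100 is a hypothetical value chosen for illustration)
ALTER FUNCTION matching_transcript_translations(integer[]) ROWS 100;
```

If the real counts vary widely between inputs, no single ROWS value will fit, which is the risk described above.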

【Comments】: