PostgreSQL index is not used for slow queries (>30 s): answers

【Question title】: Postgresql index is not used for slow queries >30s
【Posted】: 2021-09-19 21:58:06
【Question】:

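PostgreSQL version: 10. Hardware: 4 workers / 16 GB RAM / 50% used.

I am not a PostgreSQL expert. I have just read a lot of documentation and run a lot of tests. Some of my PostgreSQL queries take a long time (> 30 s) because the table has 10 million rows.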

            Column            |           Type           | Collation | Nullable |                         Default                          
------------------------------+--------------------------+-----------+----------+----------------------------------------------------------
 id                           | integer                  |           | not null | 
 cveid                        | character varying(50)    |           |          | 
 summary                      | text                     |           | not null | 
 published                    | timestamp with time zone |           |          | 
 modified                     | timestamp with time zone |           |          | 
 assigner                     | character varying(128)   |           |          | 
 vulnerable_products          | character varying(250)[] |           |          | 
 cvss                         | double precision         |           |          | 
 cvss_time                    | timestamp with time zone |           |          | 
 cvss_vector                  | character varying(250)   |           |          | 
 access                       | jsonb                    |           | not null | 
 impact                       | jsonb                    |           | not null | 
 score                        | integer                  |           | not null | 
 is_exploitable               | boolean                  |           | not null | 
 is_confirmed                 | boolean                  |           | not null | 
 is_in_the_news               | boolean                  |           | not null | 
 is_in_the_wild               | boolean                  |           | not null | 
 reflinks                     | jsonb                    |           | not null | 
 reflinkids                   | jsonb                    |           | not null | 
 created_at                   | timestamp with time zone |           |          | 
 history_id                   | integer                  |           | not null | nextval('vulns_historicalvuln_history_id_seq'::regclass)
 history_date                 | timestamp with time zone |           | not null | 
 history_change_reason        | character varying(100)   |           |          | 
 history_type                 | character varying(1)     |           | not null |
Indexes:
    "vulns_historicalvuln_pkey" PRIMARY KEY, btree (history_id)
    "btree_varchar" btree (history_type varchar_pattern_ops)
    "vulns_historicalvuln_cve_id_850876bb" btree (cve_id)
    "vulns_historicalvuln_cwe_id_2013d697" btree (cwe_id)
    "vulns_historicalvuln_history_user_id_9e25ebf5" btree (history_user_id)
    "vulns_historicalvuln_id_773f2af7" btree (id)
--- TRUNCATE
Foreign-key constraints:
    "vulns_historicalvuln_history_user_id_9e25ebf5_fk_custusers" FOREIGN KEY (history_user_id) REFERENCES custusers_user(id) DEFERRABLE INITIALLY DEFERRED

Example query: SELECT * FROM vulns_historicalvuln WHERE history_type <> '+' ORDER BY id DESC FETCH FIRST 10000 ROWS ONLY; -> 30 s without cache

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..31878.33 rows=10000 width=1736) (actual time=0.173..32839.474 rows=10000 loops=1)
   ->  Index Scan Backward using vulns_historicalvuln_id_773f2af7 on vulns_historicalvuln  (cost=0.43..26346955.92 rows=8264960 width=1736) (actual time=0.172..32830.958 rows=10000 loops=1)
         Filter: ((history_type)::text <> '+'::text)
         Rows Removed by Filter: 296
 Planning time: 19.514 ms
 Execution time: 32845.015 ms

SELECT DISTINCT "vulns"."id", "vulns"."uuid", "vulns"."feedid", "vulns"."cve_id", "vulns"."cveid", "vulns"."summary", "vulns"."published", "vulns"."modified", "vulns"."assigner", "vulns"."cwe_id", "vulns"."vulnerable_packages_versions", "vulns"."vulnerable_products", "vulns"."vulnerable_product_versions", "vulns"."cvss", "vulns"."cvss_time", "vulns"."cvss_version", "vulns"."cvss_vector", "vulns"."cvss_metrics", "vulns"."access", "vulns"."impact", "vulns"."cvss3", "vulns"."cvss3_vector", "vulns"."cvss3_version", "vulns"."cvss3_metrics", "vulns"."score", "vulns"."is_exploitable", "vulns"."is_confirmed", "vulns"."is_in_the_news", "vulns"."is_in_the_wild", "vulns"."reflinks", "vulns"."reflinkids", "vulns"."created_at", "vulns"."updated_at", "vulns"."id" AS "exploit_count", false AS "monitored", '42' AS "org" FROM "vulns" WHERE ( "vulns"."score" >= 0 AND "vulns"."score"

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=315191.32..315192.17 rows=10 width=1691) (actual time=3013.964..3013.990 rows=10 loops=1)
   ->  Unique  (cost=315191.32..329642.42 rows=170013 width=1691) (actual time=3013.962..3013.986 rows=10 loops=1)
         ->  Sort  (cost=315191.32..315616.35 rows=170013 width=1691) (actual time=3013.961..3013.970 rows=10 loops=1)
               Sort Key: updated_at DESC, id, uuid, feedid, cve_id, cveid, summary, published, modified, assigner, cwe_id, vulnerable_packages_versions, vulnerable_products, vulnerable_product_versions, cvss, cvss_time, cvss_version, cvss_vector, cvss_metrics, access, impact, cvss3, cvss3_vector, cvss3_version, cvss3_metrics, score, is_exploitable, is_confirmed, is_in_the_news, is_in_the_wild, reflinks, reflinkids, created_at
               Sort Method: external merge  Disk: 277648kB
               ->  Seq Scan on vulns  (cost=0.00..50542.19 rows=170013 width=1691) (actual time=0.044..836.597 rows=169846 loops=1)
                     Filter: ((score >= 0) AND (score <= 100))
 Planning time: 3.183 ms
 Execution time: 3070.346 ms

I created a btree varchar index, "btree_varchar" btree (history_type varchar_pattern_ops), like this: CREATE INDEX CONCURRENTLY btree_varchar ON vulns_historicalvuln (history_type varchar_pattern_ops);

I also created an index on the vulns score column for my second query: CREATE INDEX CONCURRENTLY ON vulns (score);
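A quick way to judge whether an index on score can help at all is to compare the number of rows the filter keeps with the total row count. The following is only a sketch, assuming the bounds 0 and 100 shown in the plan's Filter line; when nearly every row matches, a sequential scan is usually the cheaper choice and the planner is right to skip the index.

```sql
-- Compare rows matching the score filter with all rows in the table.
-- The bounds 0 and 100 are taken from the Filter line of the plan above.
SELECT count(*) FILTER (WHERE score >= 0 AND score <= 100) AS matching_rows,
       count(*)                                           AS total_rows
FROM vulns;
```

If matching_rows is close to total_rows, the filter is not selective and the time is more likely spent elsewhere, for example in the sort, which the plan reports as "external merge  Disk: 277648kB".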

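I have read a lot of posts and documentation about slow queries and indexes. I was sure an index was the solution for slow queries, but PostgreSQL's query plan does not use the indexes I created. It estimates that a sequential scan is faster than going through the index...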

SELECT relname, indexrelname, idx_scan FROM   pg_catalog.pg_stat_user_indexes;

               relname               |                          indexrelname                           |  idx_scan  
-------------------------------------+-----------------------------------------------------------------+------------
 vulns_historicalvuln                | btree_varchar                                                   |          0

Can you tell me whether my indexes are well designed? How can I debug this? Feel free to ask for more information if needed.
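One common way to debug a case like this is to discourage sequential scans for a single transaction and compare the resulting plan and timing with the default one. The sketch below uses the simplified score filter from the second plan rather than the full query, which is an assumption made for brevity. Note that EXPLAIN ANALYZE really executes the statement.

```sql
BEGIN;
-- Make sequential scans look very expensive for this transaction only,
-- so the planner shows what an index-based plan would cost.
SET LOCAL enable_seqscan = off;

-- BUFFERS reports how many pages came from cache versus disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM vulns
WHERE score >= 0 AND score <= 100;

-- Discard the setting change; nothing was modified.
ROLLBACK;
```

If the forced index plan turns out slower than the original, the planner's choice was reasonable. For the first query, keep in mind that a B-tree index supports the operators <, <=, =, >= and >, so a `<>` condition such as `history_type <> '+'` cannot be answered from the btree_varchar index in any case.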

Thanks

【Comments】:

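history_type is a low-cardinality column. See: stackoverflow.com/q/68306005/905902
The query that takes 30 seconds is the one using an index; the one not using an index takes less than 30 seconds.
@wildplasser is right. Your filter only removes 254 of the 10 million rows you mention. This query is basically just sorting 10 million rows.
I am not familiar with cardinality and databases. I found this definition: "We don't usually talk about cardinality as a number, though. It is more common to simply talk about 'high' and 'low' cardinality. A lot of distinct values is high cardinality; a lot of repeated values is low cardinality."
OK, so 30 s to query 10 million rows is "normal". Indexes are not the solution here, so do I have to rework my Postgres query or my storage to make it faster? For example, history_type is a single character, one of: '+' -> creation, '~' -> modification, '-' -> deletion. Maybe there is a way to improve this, such as using an ENUM?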

Tags: postgresql

【Solution 1】:

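After some research, I understand that indexes are not the solution to my problem. The low cardinality of this field (few distinct values, many repeats) makes the index useless.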
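To see the low cardinality directly, one can look at the planner's statistics for the column and, if needed, at the exact value distribution. This is a minimal sketch using the table and column names from the question; pg_stats is only as fresh as the last ANALYZE.

```sql
-- Planner statistics for the column (refreshed by ANALYZE).
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'vulns_historicalvuln'
  AND attname = 'history_type';

-- Exact distribution; this scans the whole table, so expect it to be slow
-- on a table with 10 million rows.
SELECT history_type, count(*) AS row_count
FROM vulns_historicalvuln
GROUP BY history_type
ORDER BY row_count DESC;
```

A small n_distinct together with large values in most_common_freqs means each value covers a big share of the table, which is the situation where the planner tends to ignore an index on that column.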

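The PostgreSQL query time is normal here because most of the table's roughly 10 million rows match the filter. I am closing this question because there is nothing wrong with the indexes.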

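【Comments】:

As it is currently written, your answer is unclear. Please edit it to add details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.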