【发布时间】:2021-09-19 21:58:06
【问题描述】:
POSTGRESQL 版本:10 硬件:4 名工人 / 16GBRAM / 50% 已使用
我不是 Postgresql 专家。我刚刚阅读了很多文档并做了很多测试。 由于表上有 1000 万行,我有一些 postgresql 查询需要很多时间 > 30 秒。
Column | Type | Collation | Nullable | Default
------------------------------+--------------------------+-----------+----------+----------------------------------------------------------
id | integer | | not null |
cveid | character varying(50) | | |
summary | text | | not null |
published | timestamp with time zone | | |
modified | timestamp with time zone | | |
assigner | character varying(128) | | |
vulnerable_products | character varying(250)[] | | |
cvss | double precision | | |
cvss_time | timestamp with time zone | | |
cvss_vector | character varying(250) | | |
access | jsonb | | not null |
impact | jsonb | | not null |
score | integer | | not null |
is_exploitable | boolean | | not null |
is_confirmed | boolean | | not null |
is_in_the_news | boolean | | not null |
is_in_the_wild | boolean | | not null |
reflinks | jsonb | | not null |
reflinkids | jsonb | | not null |
created_at | timestamp with time zone | | |
history_id | integer | | not null | nextval('vulns_historicalvuln_history_id_seq'::regclass)
history_date | timestamp with time zone | | not null |
history_change_reason | character varying(100) | | |
history_type | character varying(1) | | not null |
Indexes:
"vulns_historicalvuln_pkey" PRIMARY KEY, btree (history_id)
"btree_varchar" btree (history_type varchar_pattern_ops)
"vulns_historicalvuln_cve_id_850876bb" btree (cve_id)
"vulns_historicalvuln_cwe_id_2013d697" btree (cwe_id)
"vulns_historicalvuln_history_user_id_9e25ebf5" btree (history_user_id)
"vulns_historicalvuln_id_773f2af7" btree (id)
--- TRUNCATE
Foreign-key constraints:
"vulns_historicalvuln_history_user_id_9e25ebf5_fk_custusers" FOREIGN KEY (history_user_id) REFERENCES custusers_user(id) DEFERRABLE INITIALLY DEFERRED
查询示例: SELECT * FROM vulns_historicalvuln WHERE history_type '+' order by id desc 仅获取前 10000 行; -> 30s 没有缓存
查询计划
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..31878.33 rows=10000 width=1736) (actual time=0.173..32839.474 rows=10000 loops=1)
-> Index Scan Backward using vulns_historicalvuln_id_773f2af7 on vulns_historicalvuln (cost=0.43..26346955.92 rows=8264960 width=1736) (actual time=0.172..32830.958 rows=10000 loops=1)
Filter: ((history_type)::text <> '+'::text)
Rows Removed by Filter: 296
Planning time: 19.514 ms
Execution time: 32845.015 ms
SELECT DISTINCT "vulns"."id", "vulns"."uuid", "vulns"."feedid", "vulns"."cve_id", "vulns"."cveid", "vulns"."摘要”,“vulns”。“已发布”,“vulns”。“修改”,“vulns”。“assigner”,“vulns”。“cwe_id”,“vulns”。“vulnerable_packages_versions”,“vulns”。“vulnerable_products” , "vulns"."vulnerable_product_versions", "vulns"."cvss", "vulns"."cvss_time", "vulns"."cvss_version", "vulns"."cvss_vector", "vulns"."cvss_metrics", " vulns"."access", "vulns"."impact", "vulns"."cvss3", "vulns"."cvss3_vector", "vulns"."cvss3_version", "vulns"."cvss3_metrics", "vulns" ."score", "vulns"."is_exploitable", "vulns"."is_confirmed", "vulns"."is_in_the_news", "vulns"."is_in_the_wild", "vulns"."reflinks", "vulns"." reflinkids", "vulns"."created_at", "vulns"."updated_at", "vulns"."id" AS "exploit_count", false AS "monitored", '42' AS "org" FROM "vulns" WHERE ( "vulns"."score" >= 0 AND "vulns"."score"
查询计划
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=315191.32..315192.17 rows=10 width=1691) (actual time=3013.964..3013.990 rows=10 loops=1)
-> Unique (cost=315191.32..329642.42 rows=170013 width=1691) (actual time=3013.962..3013.986 rows=10 loops=1)
-> Sort (cost=315191.32..315616.35 rows=170013 width=1691) (actual time=3013.961..3013.970 rows=10 loops=1)
Sort Key: updated_at DESC, id, uuid, feedid, cve_id, cveid, summary, published, modified, assigner, cwe_id, vulnerable_packages_versions, vulnerable_products, vulnerable_product_versions, cvss, cvss_time, cvss_version, cvss_vector, cvss_metrics, access, impact, cvss3, cvss3_vector, cvss3_version, cvss3_metrics, score, is_exploitable, is_confirmed, is_in_the_news, is_in_the_wild, reflinks, reflinkids, created_at
Sort Method: external merge Disk: 277648kB
-> Seq Scan on vulns (cost=0.00..50542.19 rows=170013 width=1691) (actual time=0.044..836.597 rows=169846 loops=1)
Filter: ((score >= 0) AND (score <= 100))
Planning time: 3.183 ms
Execution time: 3070.346 ms
我创建了一个 btree varchar 索引btree_varchar" btree (history_type varchar_pattern_ops),如下所示:
在 vulns_historicalvuln (history_type varchar_pattern_ops) 上同时创建索引 btree_varchar;
我还为我的第二个查询创建了一个 vulns 得分索引: CREATE INDEX CONCURRENTLY ON vulns (score);
我阅读了很多关于慢查询和索引的帖子和文档。我确定这是关于慢查询的解决方案,但 Postgresql 的查询计划不使用我创建的索引。它估计它使用 seq 扫描比使用索引处理更快...
SELECT relname, indexrelname, idx_scan FROM pg_catalog.pg_stat_user_indexes;
relname | indexrelname | idx_scan
-------------------------------------+-----------------------------------------------------------------+------------
vulns_historicalvuln | btree_varchar | 0
你能告诉我我的索引是否设计得很好?我如何调试它,如果需要,请随时询问更多信息。
谢谢
【问题讨论】:
-
history_type是一个低基数列。见:stackoverflow.com/q/68306005/905902 -
使用索引需要 30 秒的是,不使用索引的需要不到 30 秒。
-
@wildplasser 是对的。你的过滤器只删除了你说的 1000 万中的 254 个。这个查询基本上只是对 1000 万行进行排序。
-
我对基数和数据库不熟悉。我找到了这个定义:不过,我们通常不会将基数作为一个数字来讨论。简单地谈论“高”和“低”基数更为常见。许多不同的值是高基数;大量重复值是低基数。
-
好吧,30 秒查询 1000 万行是“正常的”。索引不是这里的解决方案,所以我必须重新制作我的 postgres 查询或存储以使其更快?例如,history_type 是一个字符,它是: * '+' -> creation * '~' -> modify * '-' -> 抑制 也许有一种方法可以让它变得更好,比如使用 ENUM ?
标签: postgresql