【Question title】: Improve seq scan filter cost on column being NOT NULL
【Posted】: 2019-12-08 01:50:30
【Question description】:

I have the following query, shown with its EXPLAIN output:

SELECT  "carts".* FROM "carts" WHERE ("carts"."content_updated_at" IS NOT NULL)  ORDER BY carts.content_updated_at desc LIMIT 50 OFFSET 0
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=208231.51..208231.63 rows=50 width=1267)
   ->  Sort  (cost=208231.51..208558.04 rows=130615 width=1267)
         Sort Key: content_updated_at DESC
         ->  Seq Scan on carts  (cost=0.00..203892.57 rows=130615 width=1267)
               Filter: (content_updated_at IS NOT NULL)

If I add an index on content_updated_at IS NOT NULL, will it improve the performance of this query?

Here is the EXPLAIN output without the ORDER BY clause:

EXPLAIN for: SELECT  "carts".* FROM "carts" WHERE ("carts"."content_updated_at" IS NOT NULL) LIMIT 50 OFFSET 0
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.00..75.03 rows=50 width=1270)
   ->  Seq Scan on carts  (cost=0.00..204483.29 rows=136264 width=1270)
         Filter: (content_updated_at IS NOT NULL)

EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT  "carts".* FROM "carts" WHERE ("carts"."content_updated_at" IS NOT NULL)  ORDER BY carts.content_updated_at desc LIMIT 50 OFFSET 0;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=209373.22..209373.34 rows=50 width=1270) (actual time=18482.469..18482.620 rows=50 loops=1)
   ->  Sort  (cost=209373.22..209717.30 rows=137633 width=1270) (actual time=18482.463..18482.517 rows=50 loops=1)
         Sort Key: content_updated_at DESC
         Sort Method: top-N heapsort  Memory: 50kB
         ->  Seq Scan on carts  (cost=0.00..204801.15 rows=137633 width=1270) (actual time=0.553..18283.431 rows=139318 loops=1)
               Filter: (content_updated_at IS NOT NULL)
               Rows Removed by Filter: 3023640

【Discussion】:

    Tags: database postgresql indexing


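    【Solution 1】:

    The full seq scan is due to the ORDER BY, not the IS NOT NULL: without an index on the sort column, every row has to be read before the top 50 can be returned. You should put an index on content_updated_at. If you always add IS NOT NULL to the WHERE clause, it can be a partial index: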

    CREATE INDEX ON carts (content_updated_at) WHERE content_updated_at IS NOT NULL;
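
    To check whether the index actually helps, one option is to build it, refresh the planner statistics, and re-run the original query under EXPLAIN ANALYZE. The following is a minimal sketch, not the answerer's exact procedure: the index name is hypothetical, and CONCURRENTLY is an assumption that the table is live and writes should not be blocked during the build.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes on carts
-- while the index builds; it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY carts_content_updated_at_not_null_idx
    ON carts (content_updated_at)
    WHERE content_updated_at IS NOT NULL;

-- Refresh planner statistics so the new index is costed realistically.
ANALYZE carts;

-- Re-run the query from the question and compare the plan with the original.
EXPLAIN ANALYZE
SELECT carts.*
FROM carts
WHERE carts.content_updated_at IS NOT NULL
ORDER BY carts.content_updated_at DESC
LIMIT 50;
```

    If the planner picks the index, the Seq Scan and Sort nodes should give way to an index scan that reads the B-tree backward and stops after 50 rows. The index is declared ascending here because PostgreSQL can scan a B-tree in either direction.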
    

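    【Comments】:

    • Oh, reading this line, Seq Scan on carts (cost=0.00..203892.57 rows=130615 width=1267), I thought it was because of the IS NOT NULL. How did you come to that conclusion?
    • I have updated the question with the EXPLAIN output without the ORDER BY.
    • Without EXPLAIN ANALYZE it is easy to miss some details. SELECT * ... LIMIT 50 just picks the first 50 matching rows it finds (via the sequential scan) and stops. SELECT * ... ORDER BY x LIMIT 50 has to read and sort every matching row before it can return the first 50, which is far more expensive.
    • Thanks for the insight. I am updating the question with the EXPLAIN ANALYZE output.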
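
    The difference described in the comments can be reproduced on a throwaway table. This is a rough sketch under stated assumptions: carts_demo, its two columns, the row count, and the 5% share of non-NULL rows are all made up for illustration and are not taken from the real carts table.

```sql
-- Hypothetical demo table: 1,000,000 rows, about 5% with a non-NULL timestamp.
CREATE TEMP TABLE carts_demo AS
SELECT g AS id,
       CASE WHEN g % 20 = 0
            THEN now() - g * interval '1 second'
       END AS content_updated_at
FROM generate_series(1, 1000000) AS g;

ANALYZE carts_demo;

-- No ORDER BY: the Limit node can stop the seq scan after 50 matching rows.
EXPLAIN ANALYZE
SELECT * FROM carts_demo
WHERE content_updated_at IS NOT NULL
LIMIT 50;

-- With ORDER BY and no index: every matching row must be read before
-- the top 50 are known, so the seq scan covers the whole table.
EXPLAIN ANALYZE
SELECT * FROM carts_demo
WHERE content_updated_at IS NOT NULL
ORDER BY content_updated_at DESC
LIMIT 50;
```

    Comparing the actual rows and Rows Removed by Filter figures on the Seq Scan node of each plan shows how much of the table each query had to read.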