Why is only one index used (answers)

【Question title】: Why is only one index used
【Posted】: 2018-03-21 07:25:32
【Question description】:

I have a table

CREATE TABLE timedevent
(
  id bigint NOT NULL,
  eventdate timestamp with time zone NOT NULL,
  newstateids character varying(255) NOT NULL,
  sourceid character varying(255) NOT NULL,
  CONSTRAINT timedevent_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);

with PK id.

I have to query for rows between two dates that have a specific new state and a source from a set of possible sources.

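I created one btree index on eventdate, one on newstateids, and a hash index on sourceid. Only the index on the date makes the query faster; the other two do not seem to be used. Why is that? How can I make my query faster?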

CREATE INDEX eventdate_index     ON timedevent USING btree (eventdate);
CREATE INDEX newstateids_index   ON timedevent USING btree (newstateids COLLATE pg_catalog."default");
CREATE INDEX sourceid_index_hash ON timedevent USING hash  (sourceid COLLATE pg_catalog."default");

This is the query generated by Hibernate:

select this_.id as id1_0_0_, this_.description as descript2_0_0_, this_.eventDate as eventDat3_0_0_, this_.locationId as location4_0_0_, this_.newStateIds as newState5_0_0_, this_.oldStateIds as oldState6_0_0_, this_.sourceId as sourceId7_0_0_ 
from TimedEvent this_
where ((this_.newStateIds=? and this_.sourceId in (?, ?, ?, ?, ?, ?)))
    and this_.eventDate between ? and ?
    limit ?

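Edit:
Sorry for the misleading title; it seems Postgres does use all the indexes. The problem is that my query time still stays the same. This is the query plan I get: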

Limit  (cost=25130.29..33155.77 rows=321 width=161) (actual time=705.330..706.744 rows=279 loops=1)
  Buffers: shared hit=6 read=8167 written=61
  ->  Bitmap Heap Scan on timedevent this_  (cost=25130.29..33155.77 rows=321 width=161) (actual time=705.330..706.728 rows=279 loops=1)
        Recheck Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer" (...)
        Filter: ((eventdate >= '2017-11-01 15:41:00+01'::timestamp with time zone) AND (eventdate <= '2018-03-20 14:58:16.724+01'::timestamp with time zone))
        Buffers: shared hit=6 read=8167 written=61
        ->  BitmapAnd  (cost=25130.29..25130.29 rows=2122 width=0) (actual time=232.990..232.990 rows=0 loops=1)
              Buffers: shared hit=6 read=2152
              ->  Bitmap Index Scan on sourceid_index_hash  (cost=0.00..1403.36 rows=39182 width=0) (actual time=1.195..1.195 rows=9308 loops=1)
                    Index Cond: ((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Z (...)
                    Buffers: shared hit=6 read=26
              ->  Bitmap Index Scan on state_index  (cost=0.00..23726.53 rows=777463 width=0) (actual time=231.160..231.160 rows=776520 loops=1)
                    Index Cond: ((newstateids)::text = 'ACTIV'::text)
                    Buffers: shared read=2126
Total runtime: 706.804 ms

After creating a btree index on (sourceid, newstateids), as suggested by a_horse_with_no_name, the cost dropped:

Limit  (cost=125.03..8150.52 rows=321 width=161) (actual time=13.611..14.454 rows=279 loops=1)
  Buffers: shared hit=18 read=4336
  ->  Bitmap Heap Scan on timedevent this_  (cost=125.03..8150.52 rows=321 width=161) (actual time=13.609..14.432 rows=279 loops=1)
        Recheck Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer","r (...)
        Filter: ((eventdate >= '2017-11-01 15:41:00+01'::timestamp with time zone) AND (eventdate <= '2018-03-20 14:58:16.724+01'::timestamp with time zone))
        Buffers: shared hit=18 read=4336
        ->  Bitmap Index Scan on src_state_index  (cost=0.00..124.95 rows=2122 width=0) (actual time=0.864..0.864 rows=4526 loops=1)
              Index Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer (...)
              Buffers: shared hit=18 read=44
Total runtime: 14.497 ms
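
The statement that created this index is not shown in the question. A minimal sketch of what it might have looked like, assuming the name src_state_index that appears in the plan above and the column order (sourceid, newstateids) mentioned in the text:

```sql
-- Assumed reconstruction: the index name comes from the plan output,
-- the column order from the description above. The original statement
-- was not posted.
CREATE INDEX src_state_index
    ON timedevent USING btree (sourceid, newstateids);
```

With this index, both the IN list on sourceid and the equality on newstateids can be answered by a single Bitmap Index Scan, which matches the second plan. The eventdate range is still applied as a Filter on the heap rows.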

【Comments】:

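Please edit your question and add the exact create index statements you used and the execution plan generated with explain (analyze, buffers). Formatted text please, no screen shots.
@zerkms: That is not true.
It is entirely possible for Postgres to use, for example, a bitmap index scan for the IN condition and another index for the between condition. Whether that makes sense, and whether the planner actually chooses to do so, depends on many different factors. But the general claim that "only one index is used" is wrong.
A single (btree) index on (newstateids, sourceid) is probably a better choice than two single-column indexes.
@zerkms: Yes, if it means creating fewer or smaller indexes (for example, because those indexes can be used for more queries, not just one).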

Tags: postgresql indexing postgresql-performance

【Solution 1】:

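Basically, only one index is used because, for several indexes to be useful together, the database has to combine them (or combine the search results from each of them), and doing so is expensive. In this case it chooses not to: it uses just one of the indexes, for one of the predicates, and checks the other predicates directly against the values in the rows it finds.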

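A single B-tree index that contains multiple columns would work better, as a_horse_with_no_name suggested in the comments. Also note that the order of the columns matters: columns used for single-value lookups should come first and columns used for range searches last, because you want to restrict the range search as much as possible. The database then walks the index, using the first column to find entries that satisfy its predicate (hopefully narrowing the number of rows a lot), after which the second column and the second predicate come into play, and so on.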
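
To illustrate the column-order rule for the query in the question, here is a minimal sketch. The index name is made up, and adding eventdate as a third column goes beyond what the comments suggested, so treat it as an assumption to test rather than a recommendation:

```sql
-- Hypothetical index name. Equality / IN columns come first,
-- the range column (eventdate BETWEEN ...) comes last.
CREATE INDEX timedevent_state_source_date_idx
    ON timedevent USING btree (newstateids, sourceid, eventdate);
```

Whether the extra column pays for the larger index depends on the data distribution, so it is worth comparing plans before and after.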

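When predicates are combined with the AND operator, using separate B-tree indexes often does not pay off for the database: it has to use one index to select all rows that satisfy one predicate, and then use the other index, again reading its blocks from disk (where the index is stored), only to get rows that satisfy the condition tied to that second index but may not satisfy the others. It can be cheaper to load the rows after the first index lookup and check the remaining predicates directly on them, without using a second index.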
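
To see which of these strategies the planner actually picks, the query can be run under explain (analyze, buffers), as requested in the comments. The sketch below uses literal values copied from the plans in the question. The IN list is shortened to two entries, the select list is reduced to columns from the posted table definition, and the LIMIT value is a placeholder because the real one is not shown:

```sql
-- Sketch only: LIMIT 500 is a placeholder, the IN list is shortened,
-- and the column list is limited to columns in the posted CREATE TABLE.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, eventdate, newstateids, sourceid
FROM timedevent
WHERE newstateids = 'ACTIV'
  AND sourceid IN ('root,kus-chemnitz,ize-159,Anwesend Bad',
                   'root,kus-chemnitz,ize-159,Alarmruf')
  AND eventdate BETWEEN '2017-11-01 15:41:00+01'
                    AND '2018-03-20 14:58:16.724+01'
LIMIT 500;
```

In the output, a BitmapAnd node means two indexes were combined. A single Bitmap Index Scan or Index Scan followed by a Filter line means one index was used and the remaining predicates were checked against the fetched rows.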

【Discussion】: