You can use an expression index, but you shouldn't. Keep it simple and use a plain b-tree.
An expression index can be created on colname IS NOT NULL:
test=> CREATE TABLE blah(name text);
CREATE TABLE
test=> CREATE INDEX name_notnull ON blah((name IS NOT NULL));
CREATE INDEX
test=> INSERT INTO blah(name) VALUES ('a'),('b'),(NULL);
INSERT 0 3
test=> SET enable_seqscan = off;
SET
test=> SELECT * FROM blah WHERE name IS NOT NULL;
name
------
a
b
(2 rows)
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on blah (cost=9.39..25.94 rows=1303 width=32)
  Filter: (name IS NOT NULL)
  -> Bitmap Index Scan on name_notnull (cost=0.00..9.06 rows=655 width=0)
       Index Cond: ((name IS NOT NULL) = true)
(4 rows)
test=> SET enable_bitmapscan = off;
SET
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using name_notnull on blah (cost=0.15..55.62 rows=1303 width=32)
  Index Cond: ((name IS NOT NULL) = true)
  Filter: (name IS NOT NULL)
(3 rows)
... but Pg doesn't realise that the index is also usable for IS NULL:
test=> EXPLAIN SELECT * FROM blah WHERE name IS NULL;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on blah (cost=10000000000.00..10000000023.10 rows=7 width=32)
  Filter: (name IS NULL)
(2 rows)
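It even rewrites NOT (name IS NOT NULL) to name IS NULL, which is usually what you want, but here it stops the query from matching the index expression: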
test=> EXPLAIN SELECT * FROM blah WHERE NOT (name IS NOT NULL);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on blah (cost=10000000000.00..10000000023.10 rows=7 width=32)
  Filter: (name IS NULL)
(2 rows)
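The Seq Scan cost of 10000000000.00 in the two plans above is not a real estimate. On PostgreSQL versions that print plans like these, enable_seqscan = off does not forbid sequential scans; it adds a very large penalty to their cost, so the planner still falls back to one when no index is usable. When experimenting like this, it can be tidier to scope the planner overrides to one transaction with SET LOCAL so they do not leak into the rest of the session. A minimal sketch, assuming the blah table and name_notnull index created above:

```sql
-- Scope the planner overrides to this transaction only;
-- SET LOCAL values revert at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL enable_seqscan = off;
SET LOCAL enable_bitmapscan = off;

-- The predicate matches the indexed expression, so name_notnull is usable.
EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;

-- The planner simplifies this to "name IS NULL", which no longer
-- matches the indexed expression, so expect a penalised Seq Scan.
EXPLAIN SELECT * FROM blah WHERE NOT (name IS NOT NULL);
ROLLBACK;

-- Outside the transaction the setting is back to its session value.
SHOW enable_seqscan;
```

If the session itself already ran SET enable_seqscan = off, as in the transcript above, SHOW still reports off; RESET enable_seqscan restores the default.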
So you are actually better off with two disjoint partial expression indexes, one over the NULL rows and one over the non-NULL rows:
test=> DROP INDEX name_notnull ;
DROP INDEX
test=> CREATE INDEX name_notnull ON blah((name IS NOT NULL)) WHERE (name IS NOT NULL);
CREATE INDEX
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using name_notnull on blah (cost=0.13..8.14 rows=3 width=32)
  Index Cond: ((name IS NOT NULL) = true)
(2 rows)
test=> CREATE INDEX name_null ON blah((name IS NULL)) WHERE (name IS NULL);
CREATE INDEX
test=> EXPLAIN SELECT * FROM blah WHERE name IS NULL;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using name_null on blah (cost=0.12..8.14 rows=1 width=32)
  Index Cond: ((name IS NULL) = true)
(2 rows)
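To double-check which expression and which WHERE predicate each of these indexes carries, the pg_indexes system view can be queried. A small sketch, assuming no other schema contains a table that is also named blah:

```sql
-- List every index on blah with its full CREATE INDEX definition,
-- including the indexed expression and any partial-index WHERE clause.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'blah'
ORDER BY indexname;
```

A partial index is only considered when the query's WHERE clause implies the index predicate, which is why each of the two indexes serves exactly one of the two queries.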
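That's pretty gruesome, though. For most sensible uses I'd just use a plain b-tree index. The index size improvement isn't too exciting, at least for small-ish inputs, like the dummy table I created with a bunch of md5 values: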
test=> SELECT pg_size_pretty(pg_relation_size('blah'));
pg_size_pretty
----------------
9416 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('blah_name'));
pg_size_pretty
----------------
7984 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('name_notnull'));
pg_size_pretty
----------------
2208 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('name_null'));
pg_size_pretty
----------------
2208 kB
(1 row)
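For comparison, here is a rough sketch of the plain b-tree setup recommended above. The source does not show how blah_name was created or how the md5 dummy data was loaded, so the index definition, the row count and the share of NULLs below are all assumptions for illustration, and the sizes you get will differ from the figures above.

```sql
-- Hypothetical dummy data: md5 strings, with roughly 1 row in 10 left NULL.
INSERT INTO blah(name)
SELECT CASE WHEN x % 10 = 0 THEN NULL ELSE md5(x::text) END
FROM generate_series(1, 100000) AS x;

-- Plain b-tree index on the column itself (assumed definition of blah_name).
CREATE INDEX blah_name ON blah(name);
ANALYZE blah;

-- A b-tree index stores NULL entries, so both predicates are indexable.
EXPLAIN SELECT * FROM blah WHERE name IS NULL;
EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;

-- Compare table and index sizes in a single query.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('blah', 'blah_name', 'name_notnull', 'name_null')
ORDER BY relname;
```

Whether the planner actually picks the index for a given query still depends on selectivity: when most rows are non-NULL, a sequential scan for IS NOT NULL can be the cheaper plan.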