【Posted】:2013-04-12 07:45:45
【Question】:
My table lead has an index on email:
\d lead
...
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
...
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Why doesn't PG (9.1) use the index for this simple equality selection? The emails are almost all unique...
db=> explain select * from lead where email = 'blah';
QUERY PLAN
------------------------------------------------------------
Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108)
Filter: (email = 'blah'::text)
(2 rows)
Queries that hit the other indexes seem fine (though I don't know why this one doesn't just use the pkey index):
db=> explain select * from lead where id = '';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using lead_id_prefix on lead (cost=0.00..8.57 rows=1 width=5108)
Index Cond: (id = ''::text)
(2 rows)
db=> explain select * from lead where account__c = '';
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using lead_account__c on lead (cost=0.00..201.05 rows=49 width=5108)
Index Cond: (account__c = ''::text)
(2 rows)
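At first I thought this might be because email doesn't have enough distinct values. For example, if the statistics claimed that email is blah for most of the table, then a seq scan would be faster. But that's not the case: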
db=> select count(*), count(distinct email) from lead;
count | count
--------+--------
749148 | 733416
(1 row)
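As a quick sanity check on those counts, the share of distinct values can be worked out directly. This is a minimal Python sketch that only redoes the arithmetic on the two numbers above; the variable names are illustrative. Note that count(distinct email) ignores NULLs while count(*) does not, so part of the gap may be rows with no email at all.

```python
# Row counts copied from the query output above.
total_rows = 749148
distinct_emails = 733416

# Fraction of the table covered by distinct email values.
distinct_ratio = distinct_emails / total_rows

# Average number of rows per distinct email value.
rows_per_value = total_rows / distinct_emails

print(f"distinct ratio:     {distinct_ratio:.4f}")
print(f"avg rows per email: {rows_per_value:.4f}")
```

With close to one row per distinct value, an equality filter on email should be highly selective, which agrees with the rows=1 estimate in the plan.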
Even if I force seq scans off, the planner behaves as if it has no other choice:
db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
enable_seqscan
----------------
off
(1 row)
db=> explain select * from lead where email = 'foo@blah.com';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319599.38 rows=1 width=5108)
Filter: (email = 'foo@blah.com'::text)
(2 rows)
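The cost in this plan looks like the earlier seq-scan cost plus a constant of 1e10, which is consistent with enable_seqscan = off penalizing sequential scans rather than forbidding them. The Python sketch below only checks that arithmetic. It assumes the penalty is exactly 1e10, a value inferred here from comparing the two plans, and the variable names are hypothetical.

```python
# Assumption: the penalty added to a disabled plan type is 1e10,
# inferred from comparing the two EXPLAIN outputs above.
DISABLE_PENALTY = 1.0e10

# Costs copied from the plan produced with enable_seqscan = off.
penalized_startup = 10000000000.00
penalized_total = 10000319599.38

# Subtracting the penalty should give back the original seq-scan cost.
startup = penalized_startup - DISABLE_PENALTY
total = round(penalized_total - DISABLE_PENALTY, 2)

print(f"startup cost without penalty: {startup:.2f}")
print(f"total cost without penalty:   {total:.2f}")
```

If that reading is right, a seq scan that still wins despite the penalty suggests the planner did not consider any index path for this query at all, rather than judging the index too expensive.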
I also tried EXPLAIN ANALYZE:
db=> explain analyze select * from lead where email = 'foo@blah.com';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
Filter: (email = 'foo@blah.com'::text)
Total runtime: 77857.215 ms
(3 rows)
Here's the \d output (sorry, I had to obscure the column names and trim it to fit SO's limits; see http://pastebin.com/ve3gzJpY for the untrimmed version):
Table "lead"
Column | Type | Modifiers
--------------------------------------------+-----------------------------+-----------
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
email | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
account__c | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
id | text | not null
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
And here's pg_dump --schema-only -t lead (again, see http://pastebin.com/ve3gzJpY for the untrimmed version, which also has unique column names in case that helps reproducibility):
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace:
--
CREATE TABLE lead (
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
account__c text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
id text NOT NULL,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);
ALTER TABLE lead OWNER TO pod;
--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace:
--
ALTER TABLE ONLY lead
ADD CONSTRAINT lead_pkey PRIMARY KEY (id);
--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_account__c ON lead USING btree (account__c);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_email ON lead USING btree (email);
--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);
--
-- PostgreSQL database dump complete
--
Some PG catalog incantations:
db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
indexrelid | indrelid | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
215251995 | 101034456 | 1 | f | f | f | t | f | t | t | t | 101 | 100 | 10043 | 0 | ¤ | ¤
(1 row)
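The pg_index row above is wide enough to be hard to read, so here is a small Python sketch that transposes it into one column/value pair per line. The header and row strings are copied from the output above and nothing else is assumed. The ¤ characters are how this psql session displays NULL.

```python
# Header and data row copied verbatim from the psql output above.
header = (
    "indexrelid | indrelid | indnatts | indisunique | indisprimary | "
    "indisexclusion | indimmediate | indisclustered | indisvalid | "
    "indcheckxmin | indisready | indkey | indcollation | indclass | "
    "indoption | indexprs | indpred"
)
row = (
    "215251995 | 101034456 | 1 | f | f | f | t | f | t | t | t | "
    "101 | 100 | 10043 | 0 | ¤ | ¤"
)

# Split on the psql column separator and trim the padding.
columns = [name.strip() for name in header.split("|")]
values = [value.strip() for value in row.split("|")]

# Pair each column name with its value.
index_row = dict(zip(columns, values))

for name, value in index_row.items():
    print(f"{name:15} {value}")
```

Read this way, the boolean flags (indisvalid, indcheckxmin, indisready and so on) are easier to compare against the other indexes on the table.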
Some locale info:
db=> show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
db=> show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
I've searched through many past SO questions, but none involve a simple equality query like this one.
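【Comments】:
-
Odd... simple equality shouldn't need a text_pattern_ops index, so this is hard to explain. Can you reproduce this on a small sample? If so, please post it to sqlfiddle.com and link it here.
-
Please show the full table definition (preferably via pg_dump).
-
@PeterEisentraut Updated the question with the \d output and the pg_dump schema.
-
@CraigRinger I'll do my best to reproduce it, but it may take a while; this table holds a lot of data with sensitive customer information.
-
Your column-name redaction wasn't done well: every column ended up with the same name. That makes it hard to reproduce your scenario, because the CREATE INDEX statements refer to ambiguous names (not to mention that CREATE TABLE itself fails because of the duplicate column names). Using a different name for each column would be better. Also, what are the lc_collate and lc_ctype settings? These may matter to anyone trying to reproduce this (the index's indcollation = 100 means "default collation"). In any case, a non-default collation would show up under "Modifiers".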
Tags: postgresql indexing postgresql-9.1