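Posted: 2015-09-14 23:05:34
Question:
I have a PostgreSQL database with a large amount of data (currently about 46 GB, and it will keep growing). I created indexes on the frequently used columns and tuned the configuration file: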
shared_buffers = 1GB
temp_buffers = 256MB
work_mem = 512MB
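Changes to the configuration file only take effect after a reload, and `shared_buffers` only after a server restart. It may therefore be worth confirming which values the server is actually using. Here is a minimal check against the standard `pg_settings` view:

```sql
-- Show the values currently in effect for the three settings tuned above.
-- "context" shows how a change is applied ("postmaster" = server restart required).
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'temp_buffers', 'work_mem')
ORDER BY name;
```

Note that `setting` is expressed in the unit given in the `unit` column. With the default 8 kB block size, `shared_buffers = 1GB` is reported as 131072 with unit `8kB`.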
But this query is still very slow:
select distinct us_category_id as cat, count(h_user_id) as res from web_hits
inner join users on h_user_id = us_id
where (h_datetime)::date = ('2015-06-26')::date and us_category_id != ''
group by us_category_id
EXPLAIN ANALYZE output:
HashAggregate  (cost=2870958.72..2870958.93 rows=21 width=9) (actual time=899141.683..899141.683 rows=0 loops=1)
  Group Key: users.us_category_id, count(web_hits.h_user_id)
  ->  HashAggregate  (cost=2870958.41..2870958.62 rows=21 width=9) (actual time=899141.681..899141.681 rows=0 loops=1)
        Group Key: users.us_category_id
        ->  Hash Join  (cost=5974.98..2869632.11 rows=265259 width=9) (actual time=899141.679..899141.679 rows=0 loops=1)
              Hash Cond: ((web_hits.h_user_id)::text = (users.us_id)::text)
              ->  Seq Scan on web_hits  (cost=0.00..2857563.80 rows=275260 width=7) (actual time=899141.676..899141.676 rows=0 loops=1)
                    Filter: ((h_datetime)::date = '2015-06-26'::date)
                    Rows Removed by Filter: 55051918
              ->  Hash  (cost=4292.99..4292.99 rows=134559 width=10) (never executed)
                    ->  Seq Scan on users  (cost=0.00..4292.99 rows=134559 width=10) (never executed)
                          Filter: ((us_category_id)::text <> ''::text)
Planning time: 1.309 ms
Execution time: 899141.789 ms
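In this plan, the outer HashAggregate (grouped by `users.us_category_id, count(web_hits.h_user_id)`) is the DISTINCT step. Since `GROUP BY us_category_id` already produces one row per category, that step cannot remove anything. Here is a sketch of the same query without it, with the date filter left unchanged:

```sql
-- Same query as above, minus the redundant DISTINCT:
-- GROUP BY already returns one row per us_category_id.
SELECT u.us_category_id   AS cat,
       count(w.h_user_id) AS res
FROM web_hits AS w
JOIN users    AS u ON w.h_user_id = u.us_id
WHERE w.h_datetime::date = DATE '2015-06-26'
  AND u.us_category_id <> ''
GROUP BY u.us_category_id;
```

On its own, this only saves the small outer aggregation. Almost all of the roughly 899 s is spent in the sequential scan of `web_hits`.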
The date has been changed. How can I speed up this query?
Table and index definitions:
CREATE TABLE web_hits (
h_id integer NOT NULL DEFAULT nextval('w_h_seq'::regclass),
h_user_id character varying,
h_datetime timestamp without time zone,
h_db_id character varying,
h_voc_prefix character varying,
...
h_bot_chek integer, -- 1 = bot...
CONSTRAINT w_h_pk PRIMARY KEY (h_id)
);
ALTER TABLE web_hits OWNER TO postgres;
COMMENT ON COLUMN web_hits.h_bot_chek IS '1 = bot, 0 = not a bot';
CREATE INDEX h_datetime ON web_hits (h_datetime);
CREATE INDEX h_db_index ON web_hits (h_db_id COLLATE pg_catalog."default");
CREATE INDEX h_pref_index ON web_hits (h_voc_prefix COLLATE pg_catalog."default" text_pattern_ops);
CREATE INDEX h_user_index ON web_hits (h_user_id text_pattern_ops);
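The `h_datetime` index above is built on the raw column, but the query filters on the expression `h_datetime::date`. This is consistent with the sequential scan in the plan. One option is an expression index that matches the predicate exactly. The sketch below uses a hypothetical index name. The cast from `timestamp without time zone` to `date` is immutable, so PostgreSQL accepts it in an index definition:

```sql
-- Hypothetical index name. The indexed expression matches the predicate
-- (h_datetime)::date = ..., so the original query text can use it unchanged.
-- CONCURRENTLY avoids blocking writes during the build on a large table;
-- it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY web_hits_h_date_idx
    ON web_hits ((h_datetime::date));

-- Refresh planner statistics, including those for the new index expression.
ANALYZE web_hits;
```

The trade-off is one more index to maintain on every insert into `web_hits`, which is already large and still growing.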
CREATE TABLE users (
us_id character varying NOT NULL,
us_category_id character varying,
...
CONSTRAINT user_pk PRIMARY KEY (us_id),
CONSTRAINT cities_users_fk FOREIGN KEY (us_city_home)
REFERENCES cities (city_id),
CONSTRAINT countries_users_fk FOREIGN KEY (us_country_home)
REFERENCES countries (country_id),
CONSTRAINT organizations_users_fk FOREIGN KEY (us_institution_id)
REFERENCES organizations (org_id),
CONSTRAINT specialities_users_fk FOREIGN KEY (us_speciality_id)
REFERENCES specialities (speciality_id),
CONSTRAINT us_affiliation FOREIGN KEY (us_org_id)
REFERENCES organizations (org_id),
CONSTRAINT us_category FOREIGN KEY (us_category_id)
REFERENCES categories (cat_id),
CONSTRAINT us_reading_room FOREIGN KEY (us_reading_room_id)
REFERENCES reading_rooms (rr_id)
);
ALTER TABLE users OWNER TO sveta;
COMMENT ON COLUMN users.us_type IS '0 = anonymous, 1 = reader, 2 = remote';
CREATE INDEX us_cat_index ON users (us_category_id);
CREATE INDEX us_user_index ON users (us_id text_pattern_ops);
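Alternatively, the existing index on `h_datetime` can usually be used if the date condition is written as a half-open range on the bare column instead of a cast. The sketch below assumes the intent is "all hits on 2015-06-26". Because `h_datetime` is `timestamp without time zone`, no time zone conversion is involved. The sketch also drops the redundant DISTINCT and wraps the query in `EXPLAIN (ANALYZE, BUFFERS)` for comparison with the plan above:

```sql
-- Half-open range [2015-06-26 00:00, 2015-06-27 00:00):
-- selects the same rows as h_datetime::date = '2015-06-26' for a
-- timestamp without time zone column, but compares the indexed column
-- directly, so a B-tree index on h_datetime is applicable.
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.us_category_id   AS cat,
       count(w.h_user_id) AS res
FROM web_hits AS w
JOIN users    AS u ON w.h_user_id = u.us_id
WHERE w.h_datetime >= TIMESTAMP '2015-06-26'
  AND w.h_datetime <  TIMESTAMP '2015-06-27'
  AND u.us_category_id <> ''
GROUP BY u.us_category_id;
```

Whether the planner actually picks the index depends on how many rows fall into the range. The BUFFERS option reports how many pages each plan node read, which makes it easier to compare against the current sequential scan.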
Comments:
-
Note that you can drop the DISTINCT keyword: because of your GROUP BY, the results are already distinct.
-
Please post the table and index definitions.
-
Could you give details of the indexes you have already created? Both tables appear to be accessed by a sequential scan.
-
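I removed the noise (default settings) from your setup. On the other hand, important information is missing. Consider the instructions in the tag info for postgresql-performance. Why does postgres own one table and sveta the other? Is there a specific reason you use character data types for several ID columns instead of a plain integer (or bigint)?
-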
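Why is web_hits.h_user_id not defined as NOT NULL? Are there NULL values in the column? If so, how do you intend to count them? It really looks like there should be an FK constraint from web_hits.h_user_id to users.us_id...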
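The questions in the last comment can be checked directly. Keep in mind that `count(h_user_id)` in the query skips NULLs, whereas `count(*)` would include them. The sketch below first counts NULLs and user IDs with no matching row in `users`, then adds the foreign key. The constraint name `web_hits_user_fk` is hypothetical. `NOT VALID` skips checking existing rows when the constraint is added, and `VALIDATE CONSTRAINT` checks them afterwards while holding a weaker lock. Validation fails if any orphaned IDs exist:

```sql
-- How many hits have no user id at all?
SELECT count(*) AS null_user_ids
FROM web_hits
WHERE h_user_id IS NULL;

-- How many non-NULL user ids have no matching row in users?
-- Any such rows would make the validation step below fail.
SELECT count(*) AS orphan_hits
FROM web_hits AS w
WHERE w.h_user_id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM users AS u WHERE u.us_id = w.h_user_id);

-- Add the constraint without scanning existing rows (hypothetical name) ...
ALTER TABLE web_hits
    ADD CONSTRAINT web_hits_user_fk
    FOREIGN KEY (h_user_id) REFERENCES users (us_id) NOT VALID;

-- ... then check the existing rows in a separate step.
ALTER TABLE web_hits VALIDATE CONSTRAINT web_hits_user_fk;
```

If `null_user_ids` turns out to be 0, `ALTER TABLE web_hits ALTER COLUMN h_user_id SET NOT NULL` would also record that fact in the schema.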
Tags: sql windows postgresql postgresql-performance