【Posted】: 2014-04-18 07:00:58
【Question】:
I have 4 tables:
create table web_content_3 ( content integer, hits bigint, bytes bigint, appid varchar(32) );
create table web_content_4 ( content character varying(128), hits bigint, bytes bigint, appid varchar(32) );
create table web_content_5 ( content character varying(128), hits bigint, bytes bigint, appid integer );
create table web_content_6 ( content integer, hits bigint, bytes bigint, appid integer );
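I run the same GROUP BY query over roughly 2 million rows in each table (web_content_{3,4,5,6} stands for each table in turn):
SELECT content, sum(hits) AS hits, sum(bytes) AS bytes, appid FROM web_content_{3,4,5,6} GROUP BY content, appid;
The results are: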
Table name    | content type      | appid type        | Time taken (ms)
--------------+-------------------+-------------------+----------------
web_content_3 | integer           | character varying | 27277.931
web_content_4 | character varying | character varying | 151219.388
web_content_5 | character varying | integer           | 127252.023
web_content_6 | integer           | integer           | 5412.096
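The web_content_6 query takes only about 5 seconds, roughly 5 to 28 times faster than the other three combinations. From these numbers it is clear that grouping by an (integer, integer) pair is much faster, but the question is: why?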
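To put "much faster" in numbers, each timing in the table can be divided by the web_content_6 timing. A minimal Python sketch (Python is only a calculator here; `slowdown_vs` is a hypothetical helper, and nothing below runs inside PostgreSQL):

```python
# Timings (ms) copied from the results table above.
timings_ms = {
    "web_content_3": 27277.931,   # content integer,           appid varchar(32)
    "web_content_4": 151219.388,  # content varchar(128),      appid varchar(32)
    "web_content_5": 127252.023,  # content varchar(128),      appid integer
    "web_content_6": 5412.096,    # content integer,           appid integer
}


def slowdown_vs(baseline: str, timings: dict[str, float]) -> dict[str, float]:
    """Return each table's runtime divided by the baseline table's runtime."""
    base = timings[baseline]
    return {name: t / base for name, t in timings.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs("web_content_6", timings_ms).items():
        print(f"{name}: {ratio:.1f}x")
```

Run as a script, it prints about 5.0x for web_content_3, 27.9x for web_content_4, 23.5x for web_content_5 and 1.0x for the baseline, so a varchar `content` column costs far more than a varchar `appid` column.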
I also have the EXPLAIN output, but it does not explain the huge difference between the web_content_4 and web_content_6 queries.
Here it is:
test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_4 GROUP BY content,appid;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate  (cost=482173.36..507552.31 rows=17680 width=63) (actual time=138099.612..151565.655 rows=17680 loops=1)
  ->  Sort  (cost=482173.36..487196.11 rows=2009100 width=63) (actual time=138099.202..149256.707 rows=2009100 loops=1)
        Sort Key: content, appid
        Sort Method: external merge  Disk: 152488kB
        ->  Seq Scan on web_content_4  (cost=0.00..45218.00 rows=2009100 width=63) (actual time=0.010..349.144 rows=2009100 loops=1)
Total runtime: 151613.569 ms
(6 rows)
Time: 151614.106 ms
test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_6 GROUP BY content,appid;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate  (cost=368814.36..394194.51 rows=17760 width=24) (actual time=3282.333..5840.953 rows=17760 loops=1)
  ->  Sort  (cost=368814.36..373837.11 rows=2009100 width=24) (actual time=3282.176..3946.025 rows=2009100 loops=1)
        Sort Key: content, appid
        Sort Method: external merge  Disk: 74632kB
        ->  Seq Scan on web_content_6  (cost=0.00..34864.00 rows=2009100 width=24) (actual time=0.011..297.235 rows=2009100 loops=1)
Total runtime: 6172.960 ms
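Both plans have the same shape: a Seq Scan (about 0.3 s in each case) feeds a Sort on (content, appid), and GroupAggregate collapses the sorted output in a single pass. Nearly all of the difference sits in the Sort node, which finishes at about 149 s for web_content_4 versus about 3.9 s for web_content_6. The Python sketch below is a simplified in-memory model of that plan shape (it ignores the on-disk external merge) that also counts key comparisons. Assuming the text keys sort in exactly the same order as the integer keys, the sort makes the same number of comparisons either way, which suggests the gap comes from the cost of each comparison and the wider rows (width=63 vs width=24, 152488kB vs 74632kB spilled), not from extra work. `group_aggregate` and `make_rows` are hypothetical helpers.

```python
import random
from functools import cmp_to_key
from itertools import groupby
from typing import Any

Row = tuple[Any, int, int, Any]  # (content, hits, bytes, appid)


def group_aggregate(rows: list[Row]) -> tuple[list[Row], int]:
    """Sort rows on (content, appid), then sum hits/bytes per run of equal keys.

    Returns the aggregated rows and how many key comparisons the sort made.
    """
    comparisons = 0

    def compare_keys(a: Row, b: Row) -> int:
        nonlocal comparisons
        comparisons += 1
        ka, kb = (a[0], a[3]), (b[0], b[3])
        return (ka > kb) - (ka < kb)

    # Sort node: "Sort Key: content, appid" (in memory here, no disk spill).
    ordered = sorted(rows, key=cmp_to_key(compare_keys))

    # GroupAggregate node: one pass; each run of equal keys becomes one row.
    result: list[Row] = []
    for (content, appid), members in groupby(ordered, key=lambda r: (r[0], r[3])):
        members = list(members)
        result.append((content, sum(r[1] for r in members),
                       sum(r[2] for r in members), appid))
    return result, comparisons


def make_rows(n: int, as_text: bool, seed: int = 0) -> list[Row]:
    """Hypothetical test data: the same keys either as ints or as zero-padded text."""
    rng = random.Random(seed)
    rows: list[Row] = []
    for _ in range(n):
        content, appid = rng.randrange(100), rng.randrange(10)
        if as_text:
            # Zero-padding makes the text keys sort in exactly the integer order.
            content, appid = f"{content:08d}", f"{appid:04d}"
        rows.append((content, 1, 100, appid))
    return rows


if __name__ == "__main__":
    int_rows, int_cmps = group_aggregate(make_rows(10_000, as_text=False))
    txt_rows, txt_cmps = group_aggregate(make_rows(10_000, as_text=True))
    print(int_cmps == txt_cmps)            # True: same number of comparisons
    print(len(int_rows) == len(txt_rows))  # True: same number of groups
```

Because the comparison count is identical, any difference in wall-clock time in a model like this can only come from how expensive each individual comparison (and row copy) is.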
【Comments】:
-
Because of the comparisons: comparing integers is faster than comparing strings.
-
Probably, for strings, it compares character by character, so the sort takes longer; you can see that in the EXPLAIN plan too.
-
Are there any indexes on these tables?
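The character-by-character explanation in the comments can be modeled with the Python sketch below. It assumes a plain character-wise comparison (similar to a C-locale byte comparison for ASCII text): a text comparison has to walk any shared prefix before it finds a difference, while an integer comparison is a single step. The URL-like values are hypothetical and not taken from the question's data. Note that PostgreSQL compares text under a non-"C" collation through the C library's locale-aware `strcoll`, which is typically more expensive than this simple loop.

```python
def compare_text(a: str, b: str) -> tuple[int, int]:
    """Character-by-character comparison (C-locale-like for ASCII text).

    Returns (sign, characters_examined); sign is -1, 0 or 1.
    """
    examined = 0
    for ca, cb in zip(a, b):
        examined += 1
        if ca != cb:
            return (-1 if ca < cb else 1), examined
    # Every shared character matched, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b)), examined


def compare_int(a: int, b: int) -> tuple[int, int]:
    """Integer comparison: one step no matter how large the values are."""
    return (a > b) - (a < b), 1


if __name__ == "__main__":
    # Hypothetical URL-like content values that share a long prefix.
    print(compare_text("http://example.com/a", "http://example.com/b"))  # (-1, 20)
    print(compare_int(1001, 1002))                                       # (-1, 1)
```

A sort of 2 million rows performs tens of millions of such comparisons, so a per-comparison cost that scales with string length (or with collation rules) adds up quickly.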
Tags: sql postgresql group-by explain sql-execution-plan