【发布时间】:2015-10-06 15:27:57
【问题描述】:
我正在对 Postgres 9.3.9 中的一个大表进行查询。它是一个空间数据集,并且是空间索引的。比如说,我需要找到 3 种类型的对象:A、B 和 C。标准是 B 和 C 都在 A 的一定距离内,比如 500 米。
我的查询是这样的:
select
school.osm_id as school_osm_id,
school.name as school_name,
school.way as school_way,
restaurant.osm_id as restaurant_osm_id,
restaurant.name as restaurant_name,
restaurant.way as restaurant_way,
bar.osm_id as bar_osm_id,
bar.name as bar_name,
bar.way as bar_way
from (
select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'school') as school,
(select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'restaurant') as restaurant,
(select osm_id, name, amenity, way, way_geo
from planet_osm_point
where amenity = 'bar') as bar
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
and ST_DWithin(school.way_geo, bar.way_geo, 500, false);
这个查询给了我我想要的,但它需要很长时间,比如 13 秒才能执行。我想知道是否有另一种方法来编写查询并提高效率。
查询计划:
Nested Loop (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
Buffers: shared hit=530967 read=8733
-> Nested Loop (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
Buffers: shared hit=389863 read=8707
-> Bitmap Heap Scan on planet_osm_point (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
Recheck Cond: (amenity = 'school'::text)
Buffers: shared hit=859 read=5204
-> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
Index Cond: (amenity = 'school'::text)
Buffers: shared hit=3 read=24
-> Bitmap Heap Scan on planet_osm_point planet_osm_point_1 (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
Rows Removed by Filter: 0
Buffers: shared hit=389004 read=3503
-> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
Buffers: shared hit=384528 read=2841
-> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
Buffers: shared hit=146631 read=2841
-> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
Index Cond: (amenity = 'restaurant'::text)
Buffers: shared hit=237897
-> Bitmap Heap Scan on planet_osm_point planet_osm_point_2 (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
Rows Removed by Filter: 1
Buffers: shared hit=141104 read=26
-> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
Buffers: shared hit=127019
-> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
Buffers: shared hit=101609
-> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
Index Cond: (amenity = 'bar'::text)
Buffers: shared hit=25410
Total runtime: 11238.605 ms
目前我只使用一张表,其中有 1,372,711 行。它有 73 列:
Column | Type | Modifiers
--------------------+----------------------+---------------------------
osm_id | bigint |
access | text |
addr:housename | text |
addr:housenumber | text |
addr:interpolation | text |
admin_level | text |
aerialway | text |
aeroway | text |
amenity | text |
area | text |
barrier | text |
bicycle | text |
brand | text |
bridge | text |
boundary | text |
building | text |
capital | text |
construction | text |
covered | text |
culvert | text |
cutting | text |
denomination | text |
disused | text |
ele | text |
embankment | text |
foot | text |
generator:source | text |
harbour | text |
highway | text |
historic | text |
horse | text |
intermittent | text |
junction | text |
landuse | text |
layer | text |
leisure | text |
lock | text |
man_made | text |
military | text |
motorcar | text |
name | text |
natural | text |
office | text |
oneway | text |
operator | text |
place | text |
poi | text |
population | text |
power | text |
power_source | text |
public_transport | text |
railway | text |
ref | text |
religion | text |
route | text |
service | text |
shop | text |
sport | text |
surface | text |
toll | text |
tourism | text |
tower:type | text |
tunnel | text |
water | text |
waterway | text |
wetland | text |
width | text |
wood | text |
z_order | integer |
tags | hstore |
way | geometry(Point,4326) |
way_geo | geography |
gid | integer | not null default nextval('...
Indexes:
"planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
"idx_planet_osm_point_amenity" btree (amenity)
"idx_planet_osm_point_waygeo" gist (way_geo)
"planet_osm_point_index" gist (way)
"planet_osm_point_pkey" btree (osm_id)
便利学校、餐厅、酒吧分别有8811、6291、779排。
【问题讨论】:
-
由于您交叉连接同一个表 3 次,您可能会得到相同记录的 3 个或更多结果。 table1 中有多少条记录,您希望通过匹配获得多少条记录?这可以告诉我们 13 秒是否太长。如果没有完整的连接语法,很难相信“这个查询给了我想要的东西”。
-
您的编辑改进了一些事情,但您并没有一路走好。基本细节仍然缺失。考虑 instructions here。点击链接并阅读。另外,特别是:类型为
A、B、C的行(大约)有多少?在 psql 中提供\d table1的输出(不带无关列)、EXPLAIN (ANALYZE, BUFFERS)的输出以及所有相关索引。 -
@ErwinBrandstetter 我更改了查询。桌子很稀疏。例如,如果我创建子表,我会创建一个表,该表只包含用于便利设施等的无空值的记录,这将加快进程。但问题是我有 73 列,我可能需要创建大约 70 个新表...
-
@ErwinBrandstetter 原来的表只有way,是几何,但是我需要在缓冲区中使用米所以我创建了一个地理列way_geo,但最终我需要纬度和经度,所以我猜测显示几何或地理都可以。查询可以是任何事物,甚至可以是超过三个事物(最少两个事物),并且每个事物都可以是任何列,除了 id、主键、地理和几何。是的,表很稀疏,很多都是空的,但我认为每一列至少有一个条目
-
"any of the columns" ... 如果只是
amenity列,我会有一些想法来加快速度,但是大约 70 列中的任何一个都很难索引。我认为你有一个数据库设计问题,就像你已经暗示过自己一样。另一个问题:您在究竟的结果中寻找什么?当前查询返回距离同一学校足够近的所有餐馆和酒吧的交叉连接。对于一个有 10 人休息的学校。附近有 10 家酒吧,您可以获得 100 行。如果您再组合 2 个项目,每个项目有 10 个命中,那已经是 10.000 行。这真的是你想要的吗?
标签: sql postgresql postgis spatial postgresql-performance