PostGIS中计算重叠区域的有效方法答案

【问题标题】：Efficient way of calculating overlap area in PostGISPostGIS中计算重叠区域的有效方法
【发布时间】：2022-03-28 06:05:36
【问题描述】：

我正在计算 PostGIS 数据库中两个表/图层的重叠。一组是六边形网格，我想为每个六边形计算与另一组多边形的重叠比例。多面体集也有一些重叠的多边形，所以我需要溶解/合并它们。在我在 FME 中执行此操作之前，但对于一些较大的多边形，它的内存一直不足。这就是我想在数据库中执行此操作的原因（PostGIS 应该非常有能力执行此操作）。

这是我目前所拥有的，它可以工作，而且内存现在没有用完：

EXPLAIN ANALYZE
WITH rh_union AS (
    SELECT (ST_Dump(ST_Union(rh.geometry))).geom AS geometry
    FROM relevant_habitats rh 
    WHERE rh.assessment_area_id=1
) 
SELECT h.receptor_id, 
SUM(ROUND((ST_Area(ST_Intersection(rhu.geometry, h.geometry)) / ST_Area(h.geometry))::numeric,3)) AS frac_overlap
FROM rh_union rhu, hexagons h
WHERE h.zoom_level=1 AND ST_Intersects(rhu.geometry, h.geometry)
GROUP BY h.receptor_id

所以我首先打破多面体并尽可能合并。然后计算六边形与多边形的叠加。然后计算（所有小块的总和）面积。

现在，我的问题是：

这是一种有效的方法吗？还是有更好的方法？

（还有一个附带问题：先 ST_Union 然后 ST_Dump 是否正确？）

--用 EXPLAIN ANALYZE 更新单个区域的输出：

"QUERY PLAN"
"GroupAggregate  (cost=1996736.77..15410052.20 rows=390140 width=36) (actual time=571.303..1137.657 rows=685 loops=1)"
"  Group Key: h.receptor_id"
"  ->  Sort  (cost=1996736.77..1998063.55 rows=530712 width=188) (actual time=571.090..620.379 rows=806 loops=1)"
"        Sort Key: h.receptor_id"
"        Sort Method: external merge  Disk: 42152kB"
"        ->  Nested Loop  (cost=55.53..1848314.51 rows=530712 width=188) (actual time=382.769..424.643 rows=806 loops=1)"
"              ->  Result  (cost=55.25..1321.51 rows=1000 width=32) (actual time=382.550..383.696 rows=65 loops=1)"
"                    ->  ProjectSet  (cost=55.25..61.51 rows=1000 width=32) (actual time=382.544..383.652 rows=65 loops=1)"
"                          ->  Aggregate  (cost=55.25..55.26 rows=1 width=32) (actual time=381.323..381.325 rows=1 loops=1)"
"                                ->  Index Scan using relevant_habitats_pkey on relevant_habitats rh  (cost=0.28..28.75 rows=12 width=130244) (actual time=0.026..0.048 rows=12 loops=1)"
"                                      Index Cond: (assessment_area_id = 94)"
"              ->  Index Scan using idx_hexagons_geometry_gist on hexagons h  (cost=0.29..1846.45 rows=53 width=156) (actual time=0.315..0.624 rows=12 loops=65)"
"                    Index Cond: (geometry && (((st_dump((st_union(rh.geometry))))).geom))"
"                    Filter: (((zoom_level)::integer = 1) AND st_intersects((((st_dump((st_union(rh.geometry))))).geom), geometry))"
"                    Rows Removed by Filter: 19"
"Planning Time: 0.390 ms"
"Execution Time: 1372.443 ms"

更新 2：现在在第二次选择 (SELECT h.receptor_id~) 上输出 EXPLAIN ANALYZE，并将 CTE 替换为 (temp) 表：

"QUERY PLAN"
"GroupAggregate  (cost=2691484.47..20931829.74 rows=390140 width=36) (actual time=29.455..927.945 rows=685 loops=1)"
"  Group Key: h.receptor_id"
"  ->  Sort  (cost=2691484.47..2693288.89 rows=721768 width=188) (actual time=28.382..31.514 rows=806 loops=1)"
"        Sort Key: h.receptor_id"
"        Sort Method: quicksort  Memory: 336kB"
"        ->  Nested Loop  (cost=0.29..2488035.20 rows=721768 width=188) (actual time=0.189..27.852 rows=806 loops=1)"
"              ->  Seq Scan on rh_union rhu  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.050 rows=65 loops=1)"
"              ->  Index Scan using idx_hexagons_geometry_gist on hexagons h  (cost=0.29..1828.89 rows=53 width=156) (actual time=0.258..0.398 rows=12 loops=65)"
"                    Index Cond: (geometry && rhu.geometry)"
"                    Filter: (((zoom_level)::integer = 1) AND st_intersects(rhu.geometry, geometry))"
"                    Rows Removed by Filter: 19"
"Planning Time: 0.481 ms"
"Execution Time: 928.583 ms"

【问题讨论】：

是否应用了空间索引？
两者（h 和 rh）都有关于几何的要点索引。
@romepa 可以添加查询计划吗：explain analyse？
@JimJones，我添加了解释分析。必须弄清楚实际放置的位置！这有用吗？
您能否创建一个表而不是 CTE 子句并再次为该函数计时？

标签： postgresql postgis psql overlap

【解决方案1】：

您需要一个指标来描述一个多边形表在另一个表上的其他多边形上的重叠程度。此查询返回同一个表中重叠多边形的 id 和一个百分比指示符。

SELECT
CONCAT(a.polyid,' ',b.polyid) AS intersecting_polys,
CONCAT(a.attribute_x,' ',b.attribute_x) AS attribute,
ROUND((100 * (ST_Area(ST_Intersection(a.geom, b.geom))) / ST_Area(a.geom))::NUMERIC,0) AS pc_overlap
FROM your_schema.your_table AS a
LEFT JOIN your_schema.your_table AS b
ON (a.geom && b.geom AND ST_Relate(a.geom, b.geom, '2********'))
WHERE a.polyid != b.polyid
;

注意这个词使用ON (a.geom && b.geom AND ST_Relate(a.geom, b.geom, '2********')) 代替 ST_Covers。您可能想在您的用例中试验哪个是正确的。

【讨论】：

还有atcoordinates.info/2018/11/26/…