Python - speed up reverse geo-coding

【Question title】: Python - speed up reverse geo-coding
【Posted】: 2021-06-02 05:09:51
【Question】:

I am currently doing reverse geocoding as follows:

import json
from shapely.geometry import shape, Point
import time

with open('districts.json') as f: districts = json.load(f)
# file also kept at https://raw.githubusercontent.com/Thevesh/Display/master/districts.json

def reverse_geocode(lon,lat):
    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point): return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

start_time = time.time()
for i in range(1000): test = reverse_geocode(103, 3)
print('----- Code ran in ' + "{:.3f}".format(time.time() - start_time) + ' seconds -----')

This takes about 13 seconds to reverse geocode 1,000 points, which is fine.

However, I need to reverse geocode 10 million coordinate pairs for a task, which, assuming linear complexity, would take 130k seconds (about 1.5 days). Not good.
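The extrapolation can be checked with a few lines of arithmetic. This is only a sketch of the estimate, assuming the cost grows linearly with the number of points; the variable names are illustrative.

```python
# Extrapolate the measured benchmark to the full task, assuming linear scaling.
measured_seconds = 13        # time reported for the benchmark run
measured_points = 1_000      # points in the benchmark run
total_points = 10_000_000    # points in the real task

total_seconds = measured_seconds * total_points / measured_points
total_days = total_seconds / 86_400   # 86,400 seconds in a day

print(f"{total_seconds:,.0f} s = {total_days:.2f} days")
```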

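The obvious inefficiency in this algorithm is that it iterates over the entire set of polygons every time it classifies a point, which wastes a lot of time.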
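One common way to cut the cost of that full scan is to compute each polygon's bounding box once and use it as a cheap rejection test before the exact check. The sketch below is an illustration, not a drop-in replacement. To stay dependency-free it uses a plain ray-casting test in place of shapely's `polygon.contains`, assumes every feature is a simple GeoJSON `Polygon` (outer ring only, no `MultiPolygon`), and runs on a hypothetical two-district data set instead of `districts.json`. The helper names `ring_contains` and `build_index` are made up for this example.

```python
def ring_contains(ring, lon, lat):
    """Ray-casting test: is (lon, lat) inside the closed ring of [x, y] points?"""
    inside = False
    for a, b in zip(ring, ring[1:]):
        x1, y1, x2, y2 = a[0], a[1], b[0], b[1]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def build_index(features):
    """Precompute (min_x, min_y, max_x, max_y, ring, properties) once per feature."""
    index = []
    for feature in features:
        ring = feature["geometry"]["coordinates"][0]  # outer ring only (assumption)
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        index.append((min(xs), min(ys), max(xs), max(ys), ring, feature["properties"]))
    return index

def reverse_geocode(index, lon, lat):
    for min_x, min_y, max_x, max_y, ring, props in index:
        # Four comparisons reject most districts without touching their vertices.
        if min_x <= lon <= max_x and min_y <= lat <= max_y:
            if ring_contains(ring, lon, lat):
                return [props["ADM1_EN"], props["ADM2_EN"]]
    return ["", ""]

# Hypothetical data standing in for districts.json.
demo_features = [
    {"geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]},
     "properties": {"ADM1_EN": "State A", "ADM2_EN": "District 1"}},
    {"geometry": {"type": "Polygon",
                  "coordinates": [[[2, 0], [4, 0], [4, 2], [2, 2], [2, 0]]]},
     "properties": {"ADM1_EN": "State A", "ADM2_EN": "District 2"}},
]
demo_index = build_index(demo_features)
print(reverse_geocode(demo_index, 3, 1))
```

The same precomputation idea applies to the code in the question: `shape(feature['geometry'])` is rebuilt for every feature on every call, so building the polygons once up front removes repeated work even before any bounding-box filtering.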

How can I improve this code? To process 10 million pairs in a time that is acceptable for the task, I need 1k pairs to run in about 1 second.

【Comments】:

The first obvious solution is to run it with multiprocessing. Have you tried that?

Tags: python reverse-geocoding

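【Solution 1】:

I came up with this algorithm using parallelism.

If you can, let me know whether it works for your purposes. Keep in mind that this is an amateur algorithm and needs tuning.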

import concurrent.futures
import json
import time

from shapely.geometry import shape, Point

with open('districts.json') as f: districts = json.load(f)

def reverse_geocode(lon: float, lat: float) -> list:
    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

if __name__ == '__main__':
    time_start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as process:
        # Submit every lookup, then collect the results in submission order
        futures = [process.submit(reverse_geocode, 103, 3) for _ in range(1000)]
        results = [future.result() for future in futures]

    time_end = time.time()
    print(f'\nfinished in {round(time_end - time_start, 2)} seconds')

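【Comments】:

This cut the time in half, which is great, but I think the fundamental problem is still that we are iterating over the entire list. I will try an approach that sorts both the list being searched and the features being searched for.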
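A different route from sorting, offered here only as a sketch, is a spatial index that returns a short candidate list for each point, so the exact containment test runs on a handful of polygons instead of all of them. Shapely ships an `STRtree` class for this purpose. The example below is a simpler, dependency-free uniform grid over bounding boxes. The `GridIndex` class, the `cell_size` value and the rectangle data are all hypothetical and chosen just for illustration.

```python
from collections import defaultdict
from math import floor

class GridIndex:
    """Uniform grid: each cell lists the items whose bounding box overlaps it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, item, min_x, min_y, max_x, max_y):
        # Register the item in every cell its bounding box touches.
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append(item)

    def candidates(self, x, y):
        # .get avoids creating empty cells for points outside every box.
        return self.cells.get(self._cell(x, y), [])

# Hypothetical districts given as (name, min_x, min_y, max_x, max_y).
demo_boxes = [
    ("District 1", 0, 0, 2, 2),
    ("District 2", 2, 0, 4, 2),
    ("District 3", 100, 2, 104, 6),
]
index = GridIndex(cell_size=1.0)
for name, *box in demo_boxes:
    index.insert(name, *box)

print(index.candidates(103, 3))
```

Each candidate still needs the exact test (for example `polygon.contains(point)`), because sharing a grid cell with a bounding box does not mean the polygon contains the point. The cell size is a trade-off: smaller cells give shorter candidate lists but use more memory.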