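Posted: 2022-10-21 04:43:21
Question:
I'm trying to find an efficient way to compute, for each point in a set of (latitude, longitude) coordinates, the distance to its nearest neighbor: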
[[51.51045038114607, -0.1393407528617875],
[51.5084300350736, -0.1261805976142865],
[51.37912856172232, -0.1038613174724213]]
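To have reference values for checking any faster method, here is a minimal brute-force sketch. It uses NumPy broadcasting to compute every pairwise haversine distance in one go instead of nested Python loops. It is still O(n²) in both time and memory. Assumptions: each point is [latitude, longitude] in decimal degrees, the Earth is treated as a sphere with a mean radius of 6371 km, and `nearest_neighbor_km` is a hypothetical helper name.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def nearest_neighbor_km(coords_deg):
    """Brute-force O(n^2) nearest-neighbor distance (km) for [lat, lon] points in degrees."""
    pts = np.radians(np.asarray(coords_deg, dtype=float))
    lat = pts[:, 0][:, np.newaxis]  # shape (n, 1)
    lon = pts[:, 1][:, np.newaxis]
    # Broadcasting (n, 1) against (1, n) gives every pairwise difference at once
    dlat = lat.T - lat
    dlon = lon.T - lon
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    # clip guards against floating-point rounding pushing a outside [0, 1]
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.min(axis=1)


coords = [[51.51045038114607, -0.1393407528617875],
          [51.5084300350736, -0.1261805976142865],
          [51.37912856172232, -0.1038613174724213]]
print(nearest_neighbor_km(coords))
```

For these three points this gives about 0.94 km for each of the first two points and about 14.46 km for the third.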
I previously had a working (I think!) piece of code that uses sklearn's NearestNeighbors to reduce the algorithmic complexity of this task:
import numpy as np
from sklearn.neighbors import NearestNeighbors
from math import radians
# coordinates as [latitude, longitude] in decimal degrees
coords = [[51.51045038114607, -0.1393407528617875],
          [51.5084300350736, -0.1261805976142865],
          [51.37912856172232, -0.1038613174724213]]
# tree method that reduces algorithmic complexity from O(n^2) to O(n log n)
# _haversine_distance is my custom metric, defined below
nbrs = NearestNeighbors(n_neighbors=2,
                        metric=_haversine_distance
                        ).fit(coords)
distances, indices = nbrs.kneighbors(coords)
# the outputted distances: column 0 is each point's zero distance to itself,
# column 1 is the distance to its nearest other point
result = distances[:, 1]
The output is:
array([ 1.48095104, 1.48095104, 14.59484348])
It uses my own version of the haversine distance as the distance metric:
def _haversine_distance(p1, p2):
    """
    p1: array of two floats, the first point
    p2: array of two floats, the second point
    return: Returns a float value, the haversine distance
    """
    lon1, lat1 = p1
    lon2, lat2 = p2
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # get the deltas
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine formula
    a = np.sin(dlat/2)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2)
    c = 2 * np.arcsin(np.sqrt(a))
    # approximate radius of earth in km
    R = 6373.0
    # convert to km distance
    distance = R * c
    return distance
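For reference, this is the standard haversine formula that the function implements, with latitudes φ1, φ2 and longitudes λ1, λ2 in radians and R the Earth's radius. The cosine factors apply to the latitudes only, so swapping latitude and longitude changes the result rather than just relabelling it.

```latex
d = 2R \arcsin\left( \sqrt{ \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```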
These distances are wrong. My first question is: why? And is there a way to correct this while keeping the reduced algorithmic complexity of the NearestNeighbors approach?
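One likely cause, judging from the code: `coords` stores each point as [latitude, longitude], but `_haversine_distance` unpacks it as `lon1, lat1 = p1`, so latitude and longitude are swapped before the formula is applied. The points are then effectively treated as lying near the equator, which is consistent with the roughly 1.48 km in the output. A minimal sketch of a corrected metric, still plugged into NearestNeighbors, follows. `haversine_km` is a hypothetical name, the 6371 km mean radius is an assumption, and `algorithm="ball_tree"` is set explicitly because BallTree accepts a Python callable as its metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def haversine_km(p1, p2):
    """Haversine distance in km between two points given as [lat, lon] in degrees."""
    # Unpack in the same order the coordinates are stored: latitude first
    lat1, lon1 = np.radians(p1)
    lat2, lon2 = np.radians(p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # 6371 km mean Earth radius is an assumption; min() guards against rounding above 1
    return 2 * 6371.0 * np.arcsin(np.sqrt(min(a, 1.0)))


coords = [[51.51045038114607, -0.1393407528617875],
          [51.5084300350736, -0.1261805976142865],
          [51.37912856172232, -0.1038613174724213]]

# The ball-tree search accepts a Python callable as the metric
nbrs = NearestNeighbors(n_neighbors=2, algorithm="ball_tree",
                        metric=haversine_km).fit(coords)
distances, indices = nbrs.kneighbors(coords)
result = distances[:, 1]  # column 0 is each point itself (distance 0)
print(result)
```

With the unpacking order fixed, this gives about 0.94 km for the first two points and about 14.46 km for the third. A Python callable is invoked once per distance evaluation, so it is noticeably slower than a built-in metric on large inputs.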
I then found that I can get the correct answer with geopy.distance, but it has no built-in technique for reducing the complexity and computation time:
import geopy.distance
coords_1 = (51.51045038, -0.13934075)
coords_2 = (51.50843004, -0.1261806)
geopy.distance.geodesic(coords_1, coords_2).km
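My second question is: is there an implementation of this approach that reduces the complexity? Otherwise I will be forced to use nested for loops to check the distance between each point and every other point.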
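On the second question, scikit-learn ships a built-in `"haversine"` metric that BallTree supports natively, so the tree search runs in compiled code instead of calling back into Python. A minimal sketch, assuming the same [latitude, longitude] inputs in degrees: sklearn expects latitude first and both values in radians, and it returns distances as central angles in radians, so they are multiplied by an assumed mean Earth radius of 6371 km.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

coords_deg = np.array([[51.51045038114607, -0.1393407528617875],
                       [51.5084300350736, -0.1261805976142865],
                       [51.37912856172232, -0.1038613174724213]])

# sklearn's built-in "haversine" metric expects [latitude, longitude] in radians
coords_rad = np.radians(coords_deg)

nbrs = NearestNeighbors(n_neighbors=2, algorithm="ball_tree",
                        metric="haversine").fit(coords_rad)
distances, indices = nbrs.kneighbors(coords_rad)

# Distances come back as central angles in radians; scale them to km
nearest_km = distances[:, 1] * EARTH_RADIUS_KM
nearest_idx = indices[:, 1]  # index of each point's nearest other point
print(nearest_km, nearest_idx)
```

This computes spherical (haversine) distances rather than geopy's ellipsoidal geodesic ones; the two typically agree to within about 0.5%. If exact geodesic distances are required, a callable such as `lambda a, b: geopy.distance.geodesic(tuple(a), tuple(b)).km` on degree inputs with `algorithm="ball_tree"` should work in principle, since geodesic distance is a true metric, but every distance evaluation then runs in Python.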
Any help is appreciated!
Tags: python scikit-learn geopy haversine