【Question title】: How to identify which coordinates are within a specific distance of each other
【Posted】: 2021-09-29 22:49:13
【Question】:

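I'm trying to identify which coordinates are within a specific distance of each other. Currently my code puts all the points into one group, when it should produce two separate groups.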

# scikit-learn >= 1.1; older versions import DistanceMetric from sklearn.neighbors
from sklearn.metrics import DistanceMetric
import pandas as pd
import numpy as np
from collections import Counter

data = {'Lat': [38.42447, 38.424474, 38.424493, 38.424394, 38.424457, 38.424434],
    'Long': [-77.402199, -77.402228, -77.402186, -77.398625, -77.398602, -77.398459],
    'Name': ['Truck', 'Truck1','Truck2','Truck3','Truck4','Truck5',]}
df = pd.DataFrame(data)

df['Lat'] = np.radians(df['Lat'])
df['Long'] = np.radians(df['Long'])

dist = DistanceMetric.get_metric('haversine')

coords = df[['Lat', 'Long']].to_numpy()
# haversine distances come back in radians; multiply by the Earth's radius (m) to get meters
dist_m = dist.pairwise(coords) * 6371000

final_df = pd.DataFrame(dist_m, columns=df.Name.unique(), index=df.Name.unique())

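potential_grouping = []
# final_df.items() yields (column name, column of distances) pairs
for name, distances in final_df.items():
    for item in distances:
        # the diagonal (a truck's zero distance to itself) is counted here too
        if int(item) < 15:
            potential_grouping.append(name)

# names close only to themselves vs. names with at least one other truck within 15 m
outside_features = [k for k, v in Counter(potential_grouping).items() if v == 1]
acceptable_features = [k for k, v in Counter(potential_grouping).items() if v > 1]
print(acceptable_features)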

Current output: ['Truck', 'Truck1', 'Truck2', 'Truck3', 'Truck4', 'Truck5']
Desired output: [['Truck', 'Truck1', 'Truck2'], ['Truck3', 'Truck4', 'Truck5']]

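Here is a rough picture of what is happening... The 6 small circles are currently being grouped together (big red circle) but should be separate (2 green circles). This happens because every coordinate (small brown circle) is within 15 meters of another one. How can I make sure I get my desired output?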
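Why the loop in the question cannot produce the nested list: it tests each truck on its own ("is at least one other truck within 15 m?", since the zero distance to itself on the diagonal already contributes one count), and all six trucks pass, so all six come back in one flat list. It never records *which* trucks are close to each other, so it has no way to split them. One way to get groups is to treat "closer than 15 m" as the edges of a graph and take its connected components. The sketch below does that with plain NumPy; `haversine_matrix` and `group_within` are hypothetical helper names, and it assumes the same 6,371 km Earth radius as the question.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, as in the question

lat = [38.42447, 38.424474, 38.424493, 38.424394, 38.424457, 38.424434]
lon = [-77.402199, -77.402228, -77.402186, -77.398625, -77.398602, -77.398459]
names = ['Truck', 'Truck1', 'Truck2', 'Truck3', 'Truck4', 'Truck5']


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in meters for points given in degrees."""
    lat_r = np.radians(np.asarray(lat_deg, dtype=float))
    lon_r = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat_r[:, None] - lat_r[None, :]
    dlon = lon_r[:, None] - lon_r[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat_r[:, None]) * np.cos(lat_r[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def group_within(names, dist_m, max_dist):
    """Connected components of the graph that links every pair closer than max_dist."""
    close = dist_m < max_dist  # boolean adjacency matrix (diagonal is True, harmless here)
    seen, groups = set(), []
    for start in range(len(names)):
        if start in seen:
            continue
        # depth-first search collecting everything reachable from `start`
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in np.flatnonzero(close[i]).tolist():
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append([names[i] for i in sorted(component)])
    return groups


groups = group_within(names, haversine_matrix(lat, lon), 15)
print(groups)
# [['Truck', 'Truck1', 'Truck2'], ['Truck3', 'Truck4', 'Truck5']]
```

Note that this links trucks through chains of short hops, so two trucks can end up in the same group even if they are not themselves within 15 m of each other.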

【Comments】:

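  • Do you need any two members of a green group to be at most 15 (?) apart? Or does each member of a green group just need at least one other member of the same group within 15?
  • @YuliaV Each coordinate needs to be within 15 meters to be grouped. Every coordinate in both groups is within 15 meters, which is why they are currently (incorrectly) grouped together. But 'Truck2' and 'Truck3' are more than 100 meters apart (they are in separate green circles).
  • Are you looking for a quick-and-dirty solution or a proper one? Clustering is a well-known algorithmic problem, see e.g. towardsdatascience.com/…
  • It's unclear what output you want. "Distance less than d" is not transitive, so it is not an equivalence relation and does not define a partition.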
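The last comment's point can be checked on the question's own data. Truck4 is within 15 m of both Truck3 and Truck5, yet Truck3 and Truck5 are slightly more than 15 m apart, so "within 15 m" does not carry over transitively even inside the second green circle. A minimal sketch with a hand-written haversine helper (`haversine_m` is a hypothetical name; a 6,371 km Earth radius is assumed, as in the question):

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, as in the question


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


truck3 = (38.424394, -77.398625)
truck4 = (38.424457, -77.398602)
truck5 = (38.424434, -77.398459)

d34 = haversine_m(*truck3, *truck4)
d45 = haversine_m(*truck4, *truck5)
d35 = haversine_m(*truck3, *truck5)
print(f"Truck3-Truck4: {d34:.1f} m")  # Truck3-Truck4: 7.3 m
print(f"Truck4-Truck5: {d45:.1f} m")  # Truck4-Truck5: 12.7 m
print(f"Truck3-Truck5: {d35:.1f} m")  # Truck3-Truck5: 15.1 m
```

So the desired output only follows if a group means "points linked by a chain of hops of at most 15 m". Under the stricter reading in the first comment (every pair within 15 m), Truck5 could not share a group with Truck3.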

Tags: python pandas algorithm cluster-analysis distance


【Answer 1】:

Here is one way to do it with DBSCAN:

from sklearn.cluster import DBSCAN

# here Lat and Long are already in radians
X = df[['Lat', 'Long']].to_numpy()

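# eps is your max distance in meters divided by the Earth's radius in meters
# (6371000, the same radius used in the question), because the haversine metric works in radians;
# min_samples=1 makes every point a core point, so no truck is labelled as noise (-1)
clustering = DBSCAN(eps=15/6371000, min_samples=1, metric='haversine').fit(X)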

# see groups
print(clustering.labels_)
# [0 0 0 1 1 1]

# get the result as you want
acceptable_features = df['Name'].groupby(clustering.labels_).agg(list).tolist()
print(acceptable_features)
# [['Truck', 'Truck1', 'Truck2'], ['Truck3', 'Truck4', 'Truck5']]
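With `min_samples=1` every point is a core point, so DBSCAN puts two trucks in the same cluster whenever they are joined by a chain of hops no longer than `eps`; that is essentially single-linkage clustering cut at 15 m. Whether that is the grouping you want depends on the question raised in the comments. The sketch below (assuming SciPy is available; `haversine_matrix` and `groups_from_labels` are hypothetical helpers, and distances are computed directly in meters with a 6,371 km radius) contrasts single linkage with complete linkage, which only keeps a group together while *every* pair in it is within 15 m:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, as in the question

names = ['Truck', 'Truck1', 'Truck2', 'Truck3', 'Truck4', 'Truck5']
lat = [38.42447, 38.424474, 38.424493, 38.424394, 38.424457, 38.424434]
lon = [-77.402199, -77.402228, -77.402186, -77.398625, -77.398602, -77.398459]


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in meters for points given in degrees."""
    lat_r = np.radians(np.asarray(lat_deg, dtype=float))
    lon_r = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat_r[:, None] - lat_r[None, :]
    dlon = lon_r[:, None] - lon_r[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat_r[:, None]) * np.cos(lat_r[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def groups_from_labels(names, labels):
    """Turn flat cluster labels into lists of names, in order of first appearance."""
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return list(groups.values())


# linkage() expects the condensed (upper-triangle) form of the distance matrix
condensed = squareform(haversine_matrix(lat, lon), checks=False)

# single: merge while ANY cross pair is within 15 m (chaining, like DBSCAN with min_samples=1)
single_groups = groups_from_labels(
    names, fcluster(linkage(condensed, method='single'), t=15, criterion='distance'))
# complete: merge only while EVERY pair in the merged group is within 15 m
complete_groups = groups_from_labels(
    names, fcluster(linkage(condensed, method='complete'), t=15, criterion='distance'))

print(single_groups)
# [['Truck', 'Truck1', 'Truck2'], ['Truck3', 'Truck4', 'Truck5']]
print(complete_groups)
# [['Truck', 'Truck1', 'Truck2'], ['Truck3', 'Truck4'], ['Truck5']]
```

Single linkage reproduces the DBSCAN result above. Complete linkage splits off Truck5 because Truck3 and Truck5 are about 15.1 m apart, so the right choice depends on which definition of "group" you actually need.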
