【Question title】: Finding pairs of latitude and longitude within a certain radius in Python
【Posted】: 2021-06-02 20:33:08
【Question】:

Given a DataFrame df as follows:

    id              location        lon       lat
0    1            Onyx Spire  116.35425  39.87760
1    2        Unison Lookout  116.44333  39.93237
2    3       History Lookout  116.14857  39.73727
3    4     Domination Pillar  116.46387  39.96286
4    5           Union Tower  116.36373  39.95064
5    6   Ruby Forest Obelisk  116.35786  39.89463
6    7      Rust Peak Pillar  116.34870  39.98170
7    8      Ash Forest Tower  116.38461  39.94938
8    9  Prestige Mound Tower  116.34052  39.98977
9   10  Sapphire Mound Tower  116.35063  39.92982
10  11       Kinship Lookout  116.43020  39.99997
11  12    Exhibition Obelisk  116.45108  39.94371

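For each location, I need to find the names of the other locations whose distance from it is less than or equal to some threshold, e.g. 5 km.

The code is based on the answer at this link: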

import pandas as pd
from scipy.spatial import distance
from math import sin, cos, sqrt, atan2, radians

def get_distance(point1, point2):
    # Haversine great-circle distance in km; each point is (lat, lon) in degrees
    R = 6370  # approximate Earth radius in km
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

# Pairwise distance matrix (km) between all locations
all_points = df[['lat', 'lon']].values
dm = distance.cdist(all_points, all_points, get_distance)
pd.DataFrame(dm, index=df.index, columns=df.index)

Output:

           0          1          2   ...         9          10         11
0    0.000000   9.736316  23.494395  ...   5.813891  15.066709  11.054762
1    9.736316   0.000000  33.222475  ...   7.908015   7.598415   1.423357
2   23.494395  33.222475   0.000000  ...  27.492814  37.822285  34.549129
3   13.312235   3.815179  36.787014  ...  10.327235   5.024900   2.391864
4    8.160542   7.082601  30.000842  ...   2.569988   7.883467   7.484839
5    1.918235   8.409888  25.009951  ...   3.960618  13.235325   9.641336
6   11.583243   9.752599  32.096627  ...   5.770232   7.233093   9.692770
7    8.389761   5.350670  31.017383  ...   3.622002   6.835323   5.700434
8   12.525586  10.838805  32.501864  ...   6.720541   7.722060  10.722467
9    5.813891   7.908015  27.492814  ...   0.000000  10.334273   8.701063
10  15.066709   7.598415  37.822285  ...  10.334273   0.000000   6.502921
11  11.054762   1.423357  34.549129  ...   8.701063   6.502921   0.000000

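But I would like output similar to the following DataFrame. Note that location1, location2 and location3 are the names of the locations within the given distance of location (the paired names may not be accurate; they are only an example to aid understanding). NaN means no such location exists.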

    id              location  ...            location2          location3
0    1            Onyx Spire  ...                  NaN                NaN
1    2        Unison Lookout  ...                  NaN                NaN
2    3       History Lookout  ...                  NaN                NaN
3    4     Domination Pillar  ...                  NaN                NaN
4    5           Union Tower  ...                  NaN                NaN
5    6   Ruby Forest Obelisk  ...                  NaN                NaN
6    7      Rust Peak Pillar  ...                  NaN                NaN
7    8      Ash Forest Tower  ...      Kinship Lookout                NaN
8    9  Prestige Mound Tower  ...                  NaN                NaN
9   10  Sapphire Mound Tower  ...                  NaN                NaN
10  11       Kinship Lookout  ...  Ruby Forest Obelisk  Domination Pillar
11  12    Exhibition Obelisk  ...                  NaN                NaN

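How can I do this in Python? Thanks.

【Comments】:

    Tags: python-3.x pandas dataframe levenshtein-distance euclidean-distance


    【Solution 1】:

    The idea is to create a mask for values that are not 0 and are less than 5 km, then use DataFrame.dot for matrix multiplication with the location names, and finally Series.str.split to produce new columns that are joined to the original (the code uses a strict lt(5); swap in le(5) if a distance of exactly 5 km should count):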

    df1 = pd.DataFrame(dm, index=df.index, columns=df.index)
    
    df = (df.join((df1.ne(0) & df1.lt(5)).dot(df['location']+ ',')
                                         .str[:-1]
                                         .str.split(',', expand=True)
                                         .add_prefix('loc')))
    

    print (df)
        id              location        lon       lat                 loc0  \
    0    1            Onyx Spire  116.35425  39.87760  Ruby Forest Obelisk   
    1    2        Unison Lookout  116.44333  39.93237    Domination Pillar   
    2    3       History Lookout  116.14857  39.73727                        
    3    4     Domination Pillar  116.46387  39.96286       Unison Lookout   
    4    5           Union Tower  116.36373  39.95064     Rust Peak Pillar   
    5    6   Ruby Forest Obelisk  116.35786  39.89463           Onyx Spire   
    6    7      Rust Peak Pillar  116.34870  39.98170          Union Tower   
    7    8      Ash Forest Tower  116.38461  39.94938          Union Tower   
    8    9  Prestige Mound Tower  116.34052  39.98977          Union Tower   
    9   10  Sapphire Mound Tower  116.35063  39.92982          Union Tower   
    10  11       Kinship Lookout  116.43020  39.99997                        
    11  12    Exhibition Obelisk  116.45108  39.94371       Unison Lookout   
    
                        loc1                  loc2                  loc3  
    0                   None                  None                  None  
    1     Exhibition Obelisk                  None                  None  
    2                   None                  None                  None  
    3     Exhibition Obelisk                  None                  None  
    4       Ash Forest Tower  Prestige Mound Tower  Sapphire Mound Tower  
    5   Sapphire Mound Tower                  None                  None  
    6       Ash Forest Tower  Prestige Mound Tower                  None  
    7       Rust Peak Pillar  Sapphire Mound Tower                  None  
    8       Rust Peak Pillar                  None                  None  
    9    Ruby Forest Obelisk      Ash Forest Tower                  None  
    10                  None                  None                  None  
    11     Domination Pillar                  None                  None  
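
    To see why the dot product yields names, here is a minimal sketch on a made-up 3x3 distance matrix. The names A, B, C and the distances are hypothetical and not taken from the question. Multiplying a boolean by a string keeps the string for True and gives an empty string for False, so each row of the product is the concatenation of the names whose mask entry is True.

```python
import pandas as pd

# Hypothetical toy data: three places and their pairwise distances in km
names = pd.Series(["A", "B", "C"], dtype=object)
dm = pd.DataFrame([[0.0, 2.0, 9.0],
                   [2.0, 0.0, 4.0],
                   [9.0, 4.0, 0.0]])

# True where the other place is closer than 5 km (and is not the place itself)
mask = dm.ne(0) & dm.lt(5)

# bool * str keeps the name for True and "" for False; dot sums (concatenates) per row
joined = mask.dot(names + ",").str[:-1]

# One column per neighbour, named loc0, loc1, ...
wide = joined.str.split(",", expand=True).add_prefix("loc")
print(wide)
```

    A row with no neighbours ends up as an empty string rather than NaN, which matches the blank loc0 cells in the output above.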
    

    For values sorted by distance, use:

    df1 = pd.DataFrame(dm, index=df.index, columns=df['location'])
    
    df1 = df.join(df1.apply(lambda x: pd.Series(x[(x!=0)&(x < 5)].sort_values().index), axis=1)
                    .add_prefix('loc'))
    print (df1)
        id              location        lon       lat                  loc0  \
    0    1            Onyx Spire  116.35425  39.87760   Ruby Forest Obelisk   
    1    2        Unison Lookout  116.44333  39.93237    Exhibition Obelisk   
    2    3       History Lookout  116.14857  39.73727                   NaN   
    3    4     Domination Pillar  116.46387  39.96286    Exhibition Obelisk   
    4    5           Union Tower  116.36373  39.95064      Ash Forest Tower   
    5    6   Ruby Forest Obelisk  116.35786  39.89463            Onyx Spire   
    6    7      Rust Peak Pillar  116.34870  39.98170  Prestige Mound Tower   
    7    8      Ash Forest Tower  116.38461  39.94938           Union Tower   
    8    9  Prestige Mound Tower  116.34052  39.98977      Rust Peak Pillar   
    9   10  Sapphire Mound Tower  116.35063  39.92982           Union Tower   
    10  11       Kinship Lookout  116.43020  39.99997                   NaN   
    11  12    Exhibition Obelisk  116.45108  39.94371        Unison Lookout   
    
                        loc1                 loc2                  loc3  
    0                    NaN                  NaN                   NaN  
    1      Domination Pillar                  NaN                   NaN  
    2                    NaN                  NaN                   NaN  
    3         Unison Lookout                  NaN                   NaN  
    4   Sapphire Mound Tower     Rust Peak Pillar  Prestige Mound Tower  
    5   Sapphire Mound Tower                  NaN                   NaN  
    6            Union Tower     Ash Forest Tower                   NaN  
    7   Sapphire Mound Tower     Rust Peak Pillar                   NaN  
    8            Union Tower                  NaN                   NaN  
    9       Ash Forest Tower  Ruby Forest Obelisk                   NaN  
    10                   NaN                  NaN                   NaN  
    11     Domination Pillar                  NaN                   NaN  
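
    The same row-wise idea can be sketched with plain NumPy on hypothetical toy data (the names and distances below are made up). This variant skips the point itself by position instead of testing x != 0, which may matter if two different locations share the same coordinates; that is an assumption about the data, not something the question states.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three places and their pairwise distances in km
names = np.array(["A", "B", "C"], dtype=object)
dm = np.array([[0.0, 4.0, 2.0],
               [4.0, 0.0, 9.0],
               [2.0, 9.0, 0.0]])
limit_km = 5.0

rows = []
for i, row in enumerate(dm):
    order = np.argsort(row)  # column positions from nearest to farthest
    # keep neighbours under the limit, skipping the point itself by position
    rows.append([names[j] for j in order if j != i and row[j] < limit_km])

# Ragged lists are padded with missing values when the frame is built
sorted_neighbours = pd.DataFrame(rows).add_prefix("loc")
print(sorted_neighbours)
```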
    

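    【Comments】:

    • Thanks. By the way, can loc0, loc1, ... loc3 be ordered from shortest to longest distance?
    • @ahbon - It is complicated, but possible. I am working on an answer.
    • By the way, I found that some of the paired locations are not within 5 km of each other. Could the Earth radius setting, R = 6370, be inaccurate?
    • @ahbon - Unfortunately this area is unknown to me, so I don't know.
    • @ahbon - Maybe some geopandas solution is needed; I have never used it, so I don't know. I added a solution sorted from lowest to highest.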
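
    On the R = 6370 question, a rough check is to compute the same haversine distances with two radii and compare. The sketch below is a vectorised NumPy version of the question's formula applied to the first three rows of the table. The helper name haversine_matrix is hypothetical, and 6371 km is assumed as the alternative mean radius. Since the distance scales linearly with R, the two results differ by about 0.016 %, which seems too small to move a pair across a 5 km threshold in most cases.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_km):
    """Pairwise haversine distances in km (hypothetical helper)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds all pairwise differences at once
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# First three rows of the question's table
lat = [39.87760, 39.93237, 39.73727]
lon = [116.35425, 116.44333, 116.14857]

d_6370 = haversine_matrix(lat, lon, 6370.0)
d_6371 = haversine_matrix(lat, lon, 6371.0)

# Largest change in any pairwise distance caused by the radius choice, in km
max_diff = np.abs(d_6371 - d_6370).max()
print(d_6370.round(3))
print(max_diff)
```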
    【Solution 2】:

    Here is an approach using BallTree, with results sorted from shortest to longest distance

    from sklearn.neighbors import BallTree
    import pandas as pd
    import numpy as np
    
    
    data = { 'lon' : [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786, 116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
    'lat' : [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463, 39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
    'location' : ["Onyx Spire", "Unison Lookout", "History Lookout", "Domination Pillar", "Union Tower", "Ruby Forest Obelisk", "Rust Peak Pillar", "Ash Forest Tower", "Prestige Mound Tower", "Sapphire Mound Tower", "Kinship Lookout", "Exhibition Obelisk"]}
    
    locations = pd.DataFrame.from_dict(data)
    

    Create the ball tree

    locations_radians =  np.radians(locations[["lat","lon"]].values)
    tree = BallTree(locations_radians, leaf_size=12, metric='haversine')
    
    distance_in_meters = 5000
    earth_radius = 6371000
        
    radius = distance_in_meters / earth_radius
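
    The division is needed because the haversine metric works on a unit sphere, so the search radius has to be an angle in radians rather than a length. A small worked sketch of the conversion follows. The helper radians_to_km is hypothetical, and 6371000 m is the mean Earth radius assumed in this answer.

```python
distance_in_meters = 5000
earth_radius = 6371000  # assumed mean Earth radius in metres

# Arc length divided by radius gives the angle in radians (about 0.000785 here)
radius = distance_in_meters / earth_radius
print(radius)

def radians_to_km(angle, earth_radius_m=earth_radius):
    """Convert an angular distance back to kilometres (hypothetical helper)."""
    return angle * earth_radius_m / 1000

# Distances returned by the tree are angles too, so convert them the same way
print(radians_to_km(radius))
```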
    

    Note that I first sort is_within by distance to get is_within_sorted

    is_within, distances = tree.query_radius(locations_radians, r=radius, count_only=False, return_distance=True) 
    
    is_within_sorted = [ iw[ np.argsort(di) ] for iw,di in zip(is_within, distances) ]
    distances_sorted = [np.sort(d) for d in distances]
    

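    is_within contains arrays of varying length, each holding the indices of the locations within the radius. You can store these together with the actual distances.

    Now I pad with NaN and create a DataFrame to join later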

    pad_with_nans = [ np.pad(locations.location[iw], (0,locations.lat.size), 'constant', constant_values=np.nan)[:locations.lat.size] for iw in is_within_sorted]
    location_names = [ 'location_{}'.format(i) for i in range(locations.lat.size) ]
    
    within_radius = pd.DataFrame(pad_with_nans, index=locations.index, columns=location_names)
    

    We have

    locations.join(within_radius)
    

    which gives

             lon       lat           location         location_0  \
    0  116.35425  39.87760         Onyx Spire         Onyx Spire   
    1  116.44333  39.93237     Unison Lookout     Unison Lookout   
    2  116.14857  39.73727    History Lookout    History Lookout   
    3  116.46387  39.96286  Domination Pillar  Domination Pillar   
    4  116.36373  39.95064        Union Tower        Union Tower   
    
                location_1            location_2        location_3  \
    0  Ruby Forest Obelisk                   NaN               NaN   
    1   Exhibition Obelisk     Domination Pillar               NaN   
    2                  NaN                   NaN               NaN   
    3   Exhibition Obelisk        Unison Lookout               NaN   
    4     Ash Forest Tower  Sapphire Mound Tower  Rust Peak Pillar   
    
                 location_4  location_5  location_6  location_7  location_8  \
    0                   NaN         NaN         NaN         NaN         NaN   
    1                   NaN         NaN         NaN         NaN         NaN   
    2                   NaN         NaN         NaN         NaN         NaN   
    3                   NaN         NaN         NaN         NaN         NaN   
    4  Prestige Mound Tower         NaN         NaN         NaN         NaN   
    
       location_9  location_10  location_11  
    0         NaN          NaN          NaN  
    1         NaN          NaN          NaN  
    2         NaN          NaN          NaN  
    3         NaN          NaN          NaN  
    4         NaN          NaN          NaN  
    

    A point is always within its own radius, so you can drop the first column.
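
    As an alternative to dropping the column afterwards, here is a hedged sketch that filters each point out of its own neighbour list by index and lets query_radius do the sorting through its sort_results flag. It uses four rows from the question's table. The variable names are illustrative, and 6371000 m is again an assumed Earth radius.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Four rows taken from the question's table
locations = pd.DataFrame({
    "location": ["Onyx Spire", "Ruby Forest Obelisk",
                 "Sapphire Mound Tower", "History Lookout"],
    "lat": [39.87760, 39.89463, 39.92982, 39.73727],
    "lon": [116.35425, 116.35786, 116.35063, 116.14857],
})

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in metres
coords = np.radians(locations[["lat", "lon"]].to_numpy())
tree = BallTree(coords, metric="haversine")

# sort_results=True orders each neighbour list by distance (needs return_distance=True)
ind, dist = tree.query_radius(coords, r=5000 / EARTH_RADIUS_M,
                              return_distance=True, sort_results=True)

# Drop each point from its own list by index, then map indices to names
names = locations["location"].to_numpy()
neighbours = [[names[j] for j in row if j != i] for i, row in enumerate(ind)]

# Ragged lists are padded with missing values when the frame is built
within_radius = pd.DataFrame(neighbours, index=locations.index).add_prefix("location_")
result = locations.join(within_radius)
print(result)
```

    Building the frame from ragged lists produces only as many columns as the longest neighbour list, so there is no long run of all-NaN columns to trim.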

    【Comments】:
