Using geopy in a DataFrame to get distances: answers

【Question title】: Using geopy in a Dataframe to get distances
【Posted】: 2019-04-29 18:58:53
【Question】:

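I'm new to geopy. I work at a transport company and need to find the total number of kilometers a truck has driven.

I've seen some answers here, but they didn't work for me.

I have the following DataFrame, which comes from the GPS installed on the truck: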

    latitude    longitude
0   -25.145439  -54.294871
1   -24.144564  -54.240094
2   -24.142564  -54.198901
3   -24.140093  52.119021

The first step is to create a third column that combines each row's coordinates into a point, but all my attempts have failed.

I wrote:

df['point'] = df['latitude'].astype(float),df['longitude'].astype(float)

It returns an object. I want it to return a point. My goal is:

    latitude    longitude      Point
0   -25.145439  -54.294871     (-25.145439  -54.294871)
1   -24.144564  -54.240094     (-24.144564  -54.240094)
2   -24.142564  -54.198901     (-24.142564  -54.198901)
3   -24.140093  52.119021      (-24.140093  52.119021)

Then I want to get the distance between consecutive points, so I would end up with something like this:

    latitude    longitude      Point                        Distance KM
0   -25.145439  -54.294871     (-25.145439  -54.294871)     0
1   -24.144564  -54.240094     (-24.144564  -54.240094)     0,2
2   -24.142564  -54.198901     (-24.142564  -54.198901)     0,4
3   -24.140093  52.119021      (-24.140093  52.119021)      0,2

Note that each distance is measured from the previous row (the data is already sorted).

I'm trying:

df['distance'] = geodesic(df['point'],df['point'].shift(1))

I get an error saying it doesn't work with tuples.

Does anyone know a solution?

Thanks

【Question comments】:

Got it. Thanks. In this post: stackoverflow.com/questions/30969282/…

Tags: python pandas geopy

【Solution 1】:

Create a point Series:

import pandas as pd

df = pd.DataFrame(
    [
        (-25.145439,  -54.294871),
        (-24.144564,  -54.240094),
        (-24.142564,  -54.198901),
        (-24.140093,  52.119021),
    ],
    columns=['latitude', 'longitude']
)

from geopy import Point
from geopy.distance import distance

df['point'] = df.apply(lambda row: Point(latitude=row['latitude'], longitude=row['longitude']), axis=1)

In [2]: df
Out[2]:
    latitude  longitude                                point
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E

Add a new shifted point_next Series (despite its name, shift(1) makes it hold the previous row's point):

df['point_next'] = df['point'].shift(1)
df.loc[df['point_next'].isna(), 'point_next'] = None

In [4]: df
Out[4]:
    latitude  longitude                                point                           point_next
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W                                 None
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W  25 8m 43.5804s S, 54 17m 41.5356s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W  24 8m 40.4304s S, 54 14m 24.3384s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  24 8m 33.2304s S, 54 11m 56.0436s W

Compute the distances:

df['distance_km'] = df.apply(lambda row: distance(row['point'], row['point_next']).km if row['point_next'] is not None else float('nan'), axis=1)
df = df.drop('point_next', axis=1)

In [6]: df
Out[6]:
    latitude  longitude                                point   distance_km
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W           NaN
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W    111.003172
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W      4.192654
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  10449.661388

【Comments】:

【Solution 2】:

If you are handling a lot of data (hundreds of thousands of rows), be prepared for .apply(geopy.distance(), axis=1) to be very slow.

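One workaround is to use the Haversine formula, which vectorizes efficiently within pandas/numpy (though it may be less precise). Another option is a package called geopandas, if you are fine with external dependencies.

【Comments】: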
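
A minimal sketch of the vectorized Haversine approach, using the sample rows from the question. The function name haversine_km and the constant EARTH_RADIUS_KM are assumptions made for this example, not part of any library. The formula treats the Earth as a sphere, so the results differ slightly from geopy's ellipsoidal geodesic distances.

```python
import numpy as np
import pandas as pd

# Mean Earth radius in km; an assumption of the spherical model.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between arrays of points given in degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoot above 1; NaN passes through.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df = pd.DataFrame(
    {
        'latitude': [-25.145439, -24.144564, -24.142564, -24.140093],
        'longitude': [-54.294871, -54.240094, -54.198901, 52.119021],
    }
)

# shift(1) lines each row up with the previous row; the first row becomes NaN.
df['distance_km'] = haversine_km(
    df['latitude'].shift(1), df['longitude'].shift(1),
    df['latitude'], df['longitude'],
)
print(df)
```

The whole column is computed in a handful of numpy array operations, with no per-row Python call, which is where the speedup over .apply comes from.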