【Question title】: Python pandas - aggregate column on latitude, longitude based distance
【Posted】: 2020-01-30 10:06:14
【Question】:

Given a dataframe like the following:

Time            Lat         Long        Val
19:24:50.925    35.61068333 139.6304283 -54.6   
19:24:51.022    35.61068333 139.6304283 -52.9   
19:24:51.118    35.61068333 139.6304283 -52.6   
19:24:51.215    35.61068394 139.6304283 -52.2
19:24:51.312    35.61068455 139.6304283 -49.3
19:24:51.409    35.61068515 139.6304283 -52.1
19:24:51.506    35.61068576 139.6304283 -52.2
19:24:51.603    35.61068636 139.6304283 -51.3
19:24:51.699    35.61068697 139.6304283 -51.8
19:24:51.796    35.61068758 139.6304283 -52.6
19:24:51.892    35.61068818 139.6304283 -53.5
19:24:51.990    35.61068879 139.6304283 -51.8
19:24:52.087    35.61068939 139.6304283 -54.1
19:24:52.183    35.61069042 139.6304283 -51.8
19:24:52.281    35.61069083 139.6304283 -53.5
19:24:52.378    35.61069125 139.6304283 -55.6
19:24:52.474    35.61069222 139.6304283 -53.2
19:24:52.571    35.61069278 139.6304283 -50.8
19:24:52.668    35.61069333 139.6304283 -54

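The Lat and Long columns together hold the geographic coordinates of each location, and the Val column holds a measurement of some metric at that location. What I need to do is aggregate the Val column (mean) every 0.005 m. That is, starting with the first location (latitude/longitude) as the reference, find the rows that fall within 0.005 m of it and take the mean of their Vals, then repeat from the next location (the first one beyond the 0.005 m limit). The result should look like the one below. I looked at pandas.Grouper, but I'm not sure how to use it to achieve this.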

Lat Long Val Count_of_records
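Because the threshold is in metres but the coordinates are in decimal degrees, every comparison needs a distance function. A minimal sketch, assuming a spherical Earth with a 6,371 km mean radius (the `haversine_m` name and the radius constant are illustrative choices, not part of the question), computes the great-circle distance with the haversine formula. `geopy.distance.geodesic`, which the answer uses, works on an ellipsoid and will give slightly different numbers.

```python
import numpy as np

# Mean Earth radius in metres: a spherical-Earth assumption, not geopy's ellipsoid.
EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in decimal degrees.

    Works on scalars or equal-length NumPy arrays.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Two consecutive distinct positions taken from the sample data.
d = haversine_m(35.61068333, 139.6304283, 35.61068394, 139.6304283)
print(f"{d:.3f} m")
```

On the sample data, distinct consecutive positions come out several centimetres apart (roughly 4 to 12 cm), so a 0.005 m threshold would only merge rows whose coordinates are identical.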

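【Comments】:

  • So to start, you want the pairwise Euclidean distances between all points? geopy.distance should make this easy. But I'm not clear on what you want. The pairwise mean of all points within 0.005 m of each other? Or the mean of "groups" in which every member is within 0.005 m of every other member?
  • If it's the latter case, this previous question might help.
  • @Ralph - the idea is to start from the first location as the reference, check which rows have locations within 0.005 m of it, and average them. The result will have Lat, Long and mean(Val).
  • To verify my understanding: you would then keep applying the same procedure to every row, using each row as the reference?
  • Procedurally: start with the first row as the reference and check the distance of the next row from the reference row. If it is less than the limit distance { store the current row's Val for aggregation } else { aggregate the stored Vals and change the reference to the current row }; repeat.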
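The procedure in the last comment can be sketched as a single pass that labels each row with a group id and then aggregates per group. This is a minimal sketch, assuming a spherical-Earth haversine distance and a positive threshold; `haversine_m` and `reference_groups` are hypothetical helper names, and the four rows are a subset of the sample data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth assumption

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def reference_groups(lat, lon, limit_m):
    """Label rows as in the comments: a row joins the current reference's group
    while it is closer than limit_m to the reference; otherwise it becomes the
    new reference and starts a new group. Assumes limit_m > 0."""
    group_ids = np.empty(len(lat), dtype=int)
    ref, gid = 0, 0
    for i in range(len(lat)):
        if haversine_m(lat[ref], lon[ref], lat[i], lon[i]) >= limit_m:
            ref, gid = i, gid + 1
        group_ids[i] = gid
    return group_ids

df = pd.DataFrame({
    "Lat": [35.61068333, 35.61068333, 35.61068394, 35.61068455],
    "Long": [139.6304283] * 4,
    "Val": [-54.6, -52.9, -52.2, -49.3],
})
df["group"] = reference_groups(df["Lat"].to_numpy(), df["Long"].to_numpy(), 0.005)

# One output row per group: the reference position, mean Val and record count.
result = (
    df.groupby("group")
      .agg(Lat=("Lat", "first"), Long=("Long", "first"),
           Val=("Val", "mean"), Count_of_records=("Val", "count"))
      .reset_index(drop=True)
)
print(result)
```

The reference row is always the first row of its group, so `"first"` recovers its Lat/Long. The loop is O(n) distance calls, which is fine for data of this size.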

Tags: python pandas


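【Solution 1】:

Sorry, I still don't fully understand the question. Hopefully this is what you mean.

Admittedly this solution is a bit verbose, but I think that makes the logic clearer and easier to maintain in the future.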

import pandas as pd
from io import StringIO
import geopy.distance
import numpy as np

# Setup data as in MWE
df = pd.read_fwf(StringIO("""
    Time            Lat         Long        Val
19:24:50.925    35.61068333 139.6304283 -54.6   
19:24:51.022    35.61068333 139.6304283 -52.9   
19:24:51.118    35.61068333 139.6304283 -52.6   
19:24:51.215    35.61068394 139.6304283 -52.2
19:24:51.312    35.61068455 139.6304283 -49.3
19:24:51.409    35.61068515 139.6304283 -52.1
19:24:51.506    35.61068576 139.6304283 -52.2
19:24:51.603    35.61068636 139.6304283 -51.3
19:24:51.699    35.61068697 139.6304283 -51.8
19:24:51.796    35.61068758 139.6304283 -52.6
19:24:51.892    35.61068818 139.6304283 -53.5
19:24:51.990    35.61068879 139.6304283 -51.8
19:24:52.087    35.61068939 139.6304283 -54.1
19:24:52.183    35.61069042 139.6304283 -51.8
19:24:52.281    35.61069083 139.6304283 -53.5
19:24:52.378    35.61069125 139.6304283 -55.6
19:24:52.474    35.61069222 139.6304283 -53.2
19:24:52.571    35.61069278 139.6304283 -50.8
19:24:52.668    35.61069333 139.6304283 -54"""), header=1)

# Extract longitude and latitude from df
coords = df[['Lat', 'Long']].values
# Compute the distances between consecutive rows of the dataframe
consec_dist = [geopy.distance.geodesic(*i).m for i in zip(coords[:-1], coords[1:])]

# Set up column in which to store our aggregates
df['mean'] = np.zeros(df.shape[0])

# The threshold distance
d = 0.005

# Loop over the rows one at a time
for row in range(df.shape[0] - 1):

    # From comments:
    # if less than limit distance { store Val of current row for aggregation}
    # else { perform aggregation on stored Vals and change reference to current row } repeat

    if consec_dist[row] < d:
        df.loc[row, 'mean'] = df.loc[row, 'Val']
    else:
        df.loc[row, 'mean'] = df.loc[row:row + 1, 'Val'].mean()

This gives me the following:

In [2]: df
Out[2]:
            Time        Lat        Long   Val   mean
0   19:24:50.925  35.610683  139.630428 -54.6 -54.60
1   19:24:51.022  35.610683  139.630428 -52.9 -52.90
2   19:24:51.118  35.610683  139.630428 -52.6 -52.40
3   19:24:51.215  35.610684  139.630428 -52.2 -50.75
4   19:24:51.312  35.610685  139.630428 -49.3 -50.70
5   19:24:51.409  35.610685  139.630428 -52.1 -52.15
6   19:24:51.506  35.610686  139.630428 -52.2 -51.75
7   19:24:51.603  35.610686  139.630428 -51.3 -51.55
8   19:24:51.699  35.610687  139.630428 -51.8 -52.20
9   19:24:51.796  35.610688  139.630428 -52.6 -53.05
10  19:24:51.892  35.610688  139.630428 -53.5 -52.65
11  19:24:51.990  35.610689  139.630428 -51.8 -52.95
12  19:24:52.087  35.610689  139.630428 -54.1 -52.95
13  19:24:52.183  35.610690  139.630428 -51.8 -52.65
14  19:24:52.281  35.610691  139.630428 -53.5 -54.55
15  19:24:52.378  35.610691  139.630428 -55.6 -54.40
16  19:24:52.474  35.610692  139.630428 -53.2 -52.00
17  19:24:52.571  35.610693  139.630428 -50.8 -52.40
18  19:24:52.668  35.610693  139.630428 -54.0   0.00
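The loop above compares each row only with the row after it (`consec_dist`) and leaves the last row's `mean` at 0.00. If grouping by consecutive gaps is acceptable, the same idea can be vectorized: flag every row whose step from the previous row reaches the threshold and take a cumulative sum to get group ids. This is a minimal sketch, assuming a spherical-Earth haversine distance in place of geopy's geodesic (`haversine_m` is a hypothetical helper). Note that it is not the question's reference-based procedure: a chained group can drift more than the threshold from its first point as long as each individual step stays below it.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth assumption

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    "Lat": [35.61068333, 35.61068333, 35.61068394, 35.61068455],
    "Long": [139.6304283] * 4,
    "Val": [-54.6, -52.9, -52.2, -49.3],
})
d = 0.005  # threshold in metres, as in the answer

lat, lon = df["Lat"].to_numpy(), df["Long"].to_numpy()
# Distance from each row to the next one (analogous to consec_dist).
step = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])
# The first row never starts a new group; any later row whose step reaches d does.
# cumsum turns these boolean flags into consecutive integer group ids.
groups = np.cumsum(np.concatenate(([False], step >= d)))

# Every row, including the last, ends up in exactly one group.
result = (
    df.groupby(groups)
      .agg(Lat=("Lat", "first"), Long=("Long", "first"),
           Val=("Val", "mean"), Count_of_records=("Val", "count"))
      .reset_index(drop=True)
)
print(result)
```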

【Discussion】:
