【Posted】: 2020-09-21 11:41:27
【Question】:
How can I perform a time-based JOIN in pandas, including a tolerance, when I need to match multiple rows from the right-hand side?
import pandas as pd

# Left table: the rows that need matches.
df = pd.DataFrame({'id_group_key':[1,1,1,2], 'time':['2020-01-01 00:01:00', '2020-01-01','2020-01-02', '2020-01-01'], 'left_value':[1,2,4,3]})
df['time'] = pd.to_datetime(df['time'])
# Right table: several candidate rows per group, some close together in time.
df_right = pd.DataFrame({'id_group_key':[1,1,1,1,1,1,2], 'time':['2020-01-01 00:02:00', '2020-01-01 00:03:00', '2020-01-01 00:04:00','2020-01-01 00:02:00','2020-01-01 00:05:00', '2020-01-01', '2020-01-01'], 'right_value':[1,11,12,12,12,2,3]})
df_right['time'] = pd.to_datetime(df_right['time'])
# merge_asof requires both sides to be sorted on the time key.
df = df.set_index('time').sort_index()
print(len(df))
df_right = df_right.set_index('time').sort_index()
print(df)
print(df_right)
# Group with by=; on= cannot be combined with left_index/right_index.
# Note that merge_asof returns at most one right-hand row per left-hand row.
result = pd.merge_asof(df, df_right, by='id_group_key', tolerance=pd.Timedelta('36 days'), left_index=True, right_index=True)
print(result)
print(len(result))
【Discussion】:
Tags: python pandas join time-series