【Question title】: pandas join on time with tolerance and allow for multiple matches
【Posted】: 2020-09-21 11:41:27
【Question】:

How can I perform a time-based JOIN in pandas, with a tolerance, when I need to match multiple rows from the right-hand side?

import pandas as pd

df = pd.DataFrame({'id_group_key':[1,1,1,2], 'time':['2020-01-01 00:01:00', '2020-01-01','2020-01-02', '2020-01-01'], 'left_value':[1,2,4,3]})
df['time'] = pd.to_datetime(df['time'])
df_right = pd.DataFrame({'id_group_key':[1,1,1,1,1,1,2], 'time':['2020-01-01 00:02:00', '2020-01-01 00:03:00', '2020-01-01 00:04:00','2020-01-01 00:02:00','2020-01-01 00:05:00', '2020-01-01', '2020-01-01'], 'right_value':[1,11,12,12,12,2,3]})
df_right['time'] = pd.to_datetime(df_right['time'])


df = df.set_index(['time'])
df = df.sort_index()
print(len(df))

df_right = df_right.set_index(['time'])
df_right = df_right.sort_index()

display(df)
display(df_right)

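# Group on id_group_key with `by`; passing it as `on` together with left_index/right_index raises a MergeError.
result = pd.merge_asof(df, df_right, by='id_group_key', tolerance=pd.Timedelta('36 days'), left_index=True, right_index=True)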
display(result)
len(result)

【Comments】:

    Tags: python pandas join time-series


    【Solution 1】:

    Simply reversing the order of the two frames solves the problem:

    result = pd.merge_asof(df_right, df, by='id_group_key', tolerance=pd.Timedelta('36 days'), left_index=True, right_index=True)
    

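    However, this does not allow an OUTER JOIN: merge_asof always performs a left join, so rows of df that are not matched to any row of df_right are missing from the result.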
