【Question title】: Trying to concat two time series dataframes and match up the timestamps as closely as possible
【Posted】: 2017-10-18 08:56:41
【Question】:

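How can I concatenate two pandas dataframes that have datetime indexes so that their timestamps line up as closely as possible? Can this be done with a fill method?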

An example:

#required packages
import pandas as pd
import numpy as np

# defining stuff
num_periods_1 = 11
num_periods_2 = 4

# create sample time series
dates1 = pd.date_range('1/1/2000', periods=num_periods_1, freq='10min')
dates2 = pd.date_range('1/1/2000 00:40:00', periods=num_periods_2, freq='10min')

column_names_1 = ['B', 'C', 'A']
column_names_2 = ['B', 'C', 'D']

df1 = pd.DataFrame(np.random.randn(num_periods_1, len(column_names_1)), index=dates1, columns=column_names_1)
df2 = pd.DataFrame(np.random.randn(num_periods_2, len(column_names_2)), index=dates2, columns=column_names_2)

print("\nData Frame One:\n", df1)
print("\nData Frame Two:\n", df2)

df3 = pd.concat([df1.reset_index().add_suffix('_x'), df2.reset_index().add_suffix('_y')], axis=1).set_index(['index_x', 'index_y']).sort_index(axis=1)
print("\nData Frame Three:\n", df3)

The output looks like this:

                                              A_x       B_x       B_y  \
index_x             index_y                                             
2000-01-01 00:00:00 2000-01-01 00:40:00  0.878508 -0.608439 -0.468326   
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.056812  0.070073  0.802728   
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.085436  0.577973  1.278077   
2000-01-01 00:30:00 2000-01-01 01:10:00 -0.061046 -0.410809 -1.913346   
2000-01-01 00:40:00 NaT                 -0.522415 -1.128558       NaN   
2000-01-01 00:50:00 NaT                  0.423210  1.266240       NaN   
2000-01-01 01:00:00 NaT                 -2.411029 -0.303869       NaN   
2000-01-01 01:10:00 NaT                  0.050969 -0.807989       NaN   
2000-01-01 01:20:00 NaT                 -0.466958  0.311464       NaN   
2000-01-01 01:30:00 NaT                 -0.137329 -0.234095       NaN   
2000-01-01 01:40:00 NaT                 -1.089133 -0.173481       NaN   

                                              C_x       C_y       D_y  
index_x             index_y                                            
2000-01-01 00:00:00 2000-01-01 00:40:00  2.298649  0.673585 -1.586648  
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.791427  0.907333  0.950786  
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.980498 -0.625798  0.284694  
2000-01-01 00:30:00 2000-01-01 01:10:00  1.337427 -0.859036 -0.237332  
2000-01-01 00:40:00 NaT                 -1.493857       NaN       NaN  
2000-01-01 00:50:00 NaT                  0.455737       NaN       NaN  
2000-01-01 01:00:00 NaT                  0.393388       NaN       NaN  
2000-01-01 01:10:00 NaT                 -1.612417       NaN       NaN  
2000-01-01 01:20:00 NaT                  2.471329       NaN       NaN  
2000-01-01 01:30:00 NaT                 -0.541828       NaN       NaN  
2000-01-01 01:40:00 NaT                 -0.162694       NaN       NaN

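What I want is to shift the second index so that each of its timestamps sits on the row where the first index has the matching timestamp. Can this be done with concat, join, or merge?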

【Comments】:

  • Maybe do df2 = df2.reindex_axis(df1.index, 0, method='nearest') before the concat.
  • Decide which direction you want when the timestamps do not match exactly, then use pd.merge_asof(df1,df2,left_on='index_x',right_on='index_y',direction='backward')
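The two suggestions in the comments can be sketched roughly as follows. This is an illustration under assumptions, not the commenters' exact code. The second series is offset by 3 minutes so that the timestamps do not match exactly, and a 5-minute tolerance is chosen arbitrarily. It also uses `reindex` instead of `reindex_axis`, which was removed in pandas 1.0, and it resets the indexes before `merge_asof` so that the `index_x` and `index_y` columns actually exist.

```python
import numpy as np
import pandas as pd

# Sample data; the 3-minute offset of the second series is an assumption
# made here so that the timestamps do not line up exactly.
dates1 = pd.date_range('2000-01-01', periods=11, freq='10min')
dates2 = pd.date_range('2000-01-01 00:43:00', periods=4, freq='10min')
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((11, 3)), index=dates1, columns=['B', 'C', 'A'])
df2 = pd.DataFrame(rng.standard_normal((4, 3)), index=dates2, columns=['B', 'C', 'D'])

# Option 1: snap df2 onto df1's index, taking the nearest df2 row
# but only if it lies within 5 minutes (tolerance is an arbitrary choice).
df2_aligned = df2.reindex(df1.index, method='nearest', tolerance=pd.Timedelta('5min'))

# Option 2: merge_asof on the former index columns. Both frames must be
# sorted by their key; direction can be 'backward', 'forward' or 'nearest'.
left = df1.reset_index().add_suffix('_x')
right = df2.reset_index().add_suffix('_y')
merged = pd.merge_asof(left, right, left_on='index_x', right_on='index_y',
                       direction='nearest', tolerance=pd.Timedelta('5min'))
```

With `merge_asof` the original df2 timestamp survives in `index_y`, which makes it easy to check how far each match is from `index_x`. With `reindex` the df2 rows simply take over df1's timestamps.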

Tags: python-3.x pandas datetime


【Solution 1】:

Not sure whether this does what you need, but you could use reindex before the concat:

# Align df2 to df1's timestamps; rows without an exact match become NaN
df2 = df2.reindex(df1.index)
df3 = (pd.concat([df1.reset_index().add_suffix('_x'),
                  df2.reset_index().add_suffix('_y')], axis=1)
       .set_index(['index_x', 'index_y'])
       .sort_index(axis=1))
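When the timestamps of both frames fall on the same 10-minute grid, as in the question's sample data, an index-based left join gives a similar alignment with a single index. The sketch below is an alternative to the reindex-then-concat approach, not a restatement of it. Note that `lsuffix` and `rsuffix` are applied only to overlapping column names, so `A` and `D` keep their names.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: df2 starts 40 minutes after df1.
dates1 = pd.date_range('2000-01-01', periods=11, freq='10min')
dates2 = pd.date_range('2000-01-01 00:40:00', periods=4, freq='10min')
df1 = pd.DataFrame(np.random.randn(11, 3), index=dates1, columns=['B', 'C', 'A'])
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates2, columns=['B', 'C', 'D'])

# Left join on the index: each df2 row lands on the df1 row with the
# identical timestamp; df1 rows without a partner get NaN in df2's columns.
joined = df1.join(df2, lsuffix='_x', rsuffix='_y')

print(joined.sort_index(axis=1))
```

This only matches identical timestamps. If the two series can drift apart, a nearest-match method such as `reindex(..., method='nearest')` or `pd.merge_asof` is the safer choice.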

【Comments】:
