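【Posted】:2017-10-18 08:56:41
【Question】:
How can I join two pandas DataFrames that have datetime indexes so that their timestamps match up as closely as possible? Is it possible to use a fill method for this?
An example: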
#required packages
import pandas as pd
import numpy as np
# defining stuff
num_periods_1 = 11
num_periods_2 = 4
# create sample time series
dates1 = pd.date_range('1/1/2000', periods=num_periods_1, freq='10min')
dates2 = pd.date_range('1/1/2000 00:40:00', periods=num_periods_2, freq='10min')
column_names_1 = ['B', 'C', 'A']
column_names_2 = ['B', 'C', 'D']
df1 = pd.DataFrame(np.random.randn(num_periods_1, len(column_names_1)), index=dates1, columns=column_names_1)
df2 = pd.DataFrame(np.random.randn(num_periods_2, len(column_names_2)), index=dates2, columns=column_names_2)
print("\nData Frame One:\n", df1)
print("\nData Frame Two:\n", df2)
df3 = pd.concat([df1.reset_index().add_suffix('_x'), df2.reset_index().add_suffix('_y')], axis=1).set_index(['index_x', 'index_y']).sort_index(axis=1)
print("\nData Frame Three:\n", df3)
The output looks like this:
                                              A_x       B_x       B_y  \
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:40:00  0.878508 -0.608439 -0.468326
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.056812  0.070073  0.802728
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.085436  0.577973  1.278077
2000-01-01 00:30:00 2000-01-01 01:10:00 -0.061046 -0.410809 -1.913346
2000-01-01 00:40:00 NaT                 -0.522415 -1.128558       NaN
2000-01-01 00:50:00 NaT                  0.423210  1.266240       NaN
2000-01-01 01:00:00 NaT                 -2.411029 -0.303869       NaN
2000-01-01 01:10:00 NaT                  0.050969 -0.807989       NaN
2000-01-01 01:20:00 NaT                 -0.466958  0.311464       NaN
2000-01-01 01:30:00 NaT                 -0.137329 -0.234095       NaN
2000-01-01 01:40:00 NaT                 -1.089133 -0.173481       NaN

                                              C_x       C_y       D_y
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:40:00  2.298649  0.673585 -1.586648
2000-01-01 00:10:00 2000-01-01 00:50:00 -1.791427  0.907333  0.950786
2000-01-01 00:20:00 2000-01-01 01:00:00 -0.980498 -0.625798  0.284694
2000-01-01 00:30:00 2000-01-01 01:10:00  1.337427 -0.859036 -0.237332
2000-01-01 00:40:00 NaT                 -1.493857       NaN       NaN
2000-01-01 00:50:00 NaT                  0.455737       NaN       NaN
2000-01-01 01:00:00 NaT                  0.393388       NaN       NaN
2000-01-01 01:10:00 NaT                 -1.612417       NaN       NaN
2000-01-01 01:20:00 NaT                  2.471329       NaN       NaN
2000-01-01 01:30:00 NaT                 -0.541828       NaN       NaN
2000-01-01 01:40:00 NaT                 -0.162694       NaN       NaN
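What I want is to shift the second index so that each of its timestamps sits in the row where the first index has the matching timestamp. Can this be done with concat, join, or merge?
【Comments】: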
-
Maybe, before the concat, do
df2 = df2.reindex_axis(df1.index, 0, method='nearest')? -
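A minimal sketch of the reindexing idea from the comment above, written with `DataFrame.reindex` because `reindex_axis` is no longer available in recent pandas releases. The `tolerance` of 5 minutes, the fixed random seed, and the names `df2_aligned` and `aligned` are assumptions made for illustration; without a tolerance, `method='nearest'` would also fill the rows of `df1` that lie before `df2` begins.

```python
import numpy as np
import pandas as pd

# Same shapes and timestamps as in the question; seeded for reproducibility
rng = np.random.default_rng(0)
dates1 = pd.date_range("2000-01-01", periods=11, freq="10min")
dates2 = pd.date_range("2000-01-01 00:40:00", periods=4, freq="10min")
df1 = pd.DataFrame(rng.standard_normal((11, 3)), index=dates1, columns=["B", "C", "A"])
df2 = pd.DataFrame(rng.standard_normal((4, 3)), index=dates2, columns=["B", "C", "D"])

# Snap df2 onto df1's timestamps. Rows of df1 with no df2 timestamp
# within the (assumed) 5-minute tolerance are filled with NaN.
df2_aligned = df2.reindex(
    index=df1.index, method="nearest", tolerance=pd.Timedelta("5min")
)

# Join on the now-shared index; only the overlapping column names
# (B and C) receive the suffixes, A and D keep their names.
aligned = df1.join(df2_aligned, lsuffix="_x", rsuffix="_y")
print(aligned)
```

When the timestamps of both frames coincide exactly, as they do in the question, a plain `df1.join(df2, lsuffix='_x', rsuffix='_y')` gives the same alignment; the reindex step only matters when the two indexes are slightly offset.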
It depends on which direction you want when the timestamps do not match exactly:
pd.merge_asof(df1,df2,left_on='index_x',right_on='index_y',direction='backward')
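The `merge_asof` call in the comment above refers to `index_x` and `index_y`, so it presumably expects the frames after `reset_index().add_suffix(...)` as in the question, not the raw `df1` and `df2`. The following is a sketch under that assumption. The choice of `direction='nearest'`, the 5-minute `tolerance`, the fixed seed, and the names `left`, `right`, and `merged` are illustrative and not from the original thread.

```python
import numpy as np
import pandas as pd

# Same shapes and timestamps as in the question; seeded for reproducibility
rng = np.random.default_rng(0)
dates1 = pd.date_range("2000-01-01", periods=11, freq="10min")
dates2 = pd.date_range("2000-01-01 00:40:00", periods=4, freq="10min")
df1 = pd.DataFrame(rng.standard_normal((11, 3)), index=dates1, columns=["B", "C", "A"])
df2 = pd.DataFrame(rng.standard_normal((4, 3)), index=dates2, columns=["B", "C", "D"])

# Turn each index into a regular column ("index") and suffix every column,
# giving the key columns index_x and index_y used in the comment.
left = df1.reset_index().add_suffix("_x")
right = df2.reset_index().add_suffix("_y")

# merge_asof needs both frames sorted by their key columns (they already are).
# direction may be 'backward', 'forward', or 'nearest'.
merged = pd.merge_asof(
    left,
    right,
    left_on="index_x",
    right_on="index_y",
    direction="nearest",
    tolerance=pd.Timedelta("5min"),  # assumed limit on how far a match may be
)

# Rebuild the two-level index and sorted column order used in the question
df3 = merged.set_index(["index_x", "index_y"]).sort_index(axis=1)
print(df3)
```

With `direction='backward'` and no tolerance, as in the comment, every row of `df1` at or after 00:40 is matched to the most recent `df2` row, so the rows after 01:10 would all repeat the last `df2` row. Adding a tolerance restricts matches to timestamps that are actually close.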
Tags: python-3.x pandas datetime