Pandas: moving data from two dataframes to another with tuple index - Answers

【Question title】: Pandas: moving data from two dataframes to another with tuple index
【Posted】: 2019-11-14 01:20:50
【Question】:

I have the following three dataframes:

final_df

                                other   ref
(2014-12-24 13:20:00-05:00, a)  NaN     NaN
(2014-12-24 13:40:00-05:00, b)  NaN     NaN
(2018-07-03 14:00:00-04:00, d)  NaN     NaN

ref_df

                                a   b   c   d
2014-12-24 13:20:00-05:00       1   2   3   4
2014-12-24 13:40:00-05:00       2   3   4   5
2017-11-24 13:10:00-05:00       ..............
2018-07-03 13:25:00-04:00       ..............
2018-07-03 14:00:00-04:00       9   10  11  12
2019-07-03 13:10:00-04:00       ..............

other_df

                                a   b   c   d
2014-12-24 13:20:00-05:00       10  20  30  40
2014-12-24 13:40:00-05:00       20  30  40  50
2017-11-24 13:10:00-05:00       ..............
2018-07-03 13:20:00-04:00       ..............
2018-07-03 13:25:00-04:00       ..............
2018-07-03 14:00:00-04:00       90  100 110 120
2019-07-03 13:10:00-04:00       ..............

I need to replace the NaN values in my final_df with the matching values from the other two dataframes, like this:

                                other   ref
(2014-12-24 13:20:00-05:00, a)  10      1
(2014-12-24 13:40:00-05:00, b)  30      3
(2018-07-03 14:00:00-04:00, d)  120     12

How can I get this result?

【Question comments】:

Tags: python python-3.x pandas dataframe

【Solution 1】:

`pandas.DataFrame.lookup`

final_df['ref'] = ref_df.lookup(*zip(*final_df.index))
final_df['other'] = other_df.lookup(*zip(*final_df.index))
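On the asterisks: `zip(*final_df.index)` transposes the (row, column) pairs into one tuple of row labels and one tuple of column labels, and the outer `*` then passes those two tuples as two separate positional arguments. A minimal sketch in plain Python, where `pairs` and `show` are hypothetical stand-ins for `final_df.index` and for a two-argument call such as `ref_df.lookup`:

```python
# Stand-in for final_df.index, which iterates as (row_label, column_label) tuples.
pairs = [(1, 'a'), (2, 'b'), (3, 'd')]

# The inner * unpacks the list, so zip receives three 2-tuples
# and regroups their elements by position.
row_labels, col_labels = zip(*pairs)


def show(rows, cols):
    """Hypothetical stand-in for a call that takes row labels and column labels."""
    return rows, cols


# The outer * spreads the two tuples produced by zip into the two parameters.
unpacked = show(*zip(*pairs))

print(row_labels, col_labels)
```

So `ref_df.lookup(*zip(*final_df.index))` is shorthand for calling `lookup` with all the row labels first and all the column labels second.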

`map` and `get`

For when some of the (row, column) pairs are missing from ref_df or other_df

final_df['ref'] = list(map(ref_df.stack().get, final_df.index))
final_df['other'] = list(map(other_df.stack().get, final_df.index))
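To see why this variant tolerates gaps, here is a small sketch with made-up data (not the asker's frames) in which row 2 is deliberately left out of `ref_df`. `Series.get` returns its default, `None`, for a key that is not in the stacked index instead of raising:

```python
import pandas as pd

# Made-up reference frame; row label 2 is deliberately absent.
ref_df = pd.DataFrame(
    [[1, 2, 3, 4], [9, 10, 11, 12]],
    index=[1, 3],
    columns=['a', 'b', 'c', 'd'],
)

# The (row, column) pairs we want to look up.
keys = [(1, 'a'), (2, 'b'), (3, 'd')]

# stack() turns the frame into a Series keyed by (row, column) pairs.
stacked = ref_df.stack()

# get() falls back to None when a pair is not present.
values = list(map(stacked.get, keys))

print(values)
```

The missing pair comes back as `None`, which pandas shows as a missing value once the list is assigned to a column.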

Demo

Setup

import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'a'), (2, 'b'), (3, 'd')])
final_df = pd.DataFrame(index=idx, columns=['other', 'ref'])
ref_df = pd.DataFrame([
    [ 1,  2,  3,  4],
    [ 2,  3,  4,  5],
    [ 9, 10, 11, 12]
], [1, 2, 3], ['a', 'b', 'c', 'd'])
other_df = pd.DataFrame([
    [ 10,  20,  30,  40],
    [ 20,  30,  40,  50],
    [ 90, 100, 110, 120]
], [1, 2, 3], ['a', 'b', 'c', 'd'])

print(final_df, ref_df, other_df, sep='\n\n')

    other  ref
1 a   NaN  NaN
2 b   NaN  NaN
3 d   NaN  NaN

   a   b   c   d
1  1   2   3   4
2  2   3   4   5
3  9  10  11  12

    a    b    c    d
1  10   20   30   40
2  20   30   40   50
3  90  100  110  120

Result

final_df['ref'] = ref_df.lookup(*zip(*final_df.index))
final_df['other'] = other_df.lookup(*zip(*final_df.index))

final_df

     other  ref
1 a     10    1
2 b     30    3
3 d    120   12
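Note for newer pandas: `DataFrame.lookup` was deprecated in pandas 1.2.0 and removed in 2.0, so the calls above no longer work on current versions. One possible replacement, sketched here on the demo data, translates the labels to integer positions with `get_indexer` and indexes the underlying NumPy array. The helper name `lookup_values` is hypothetical, and the sketch assumes the row and column labels of the source frame are unique:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'a'), (2, 'b'), (3, 'd')])
final_df = pd.DataFrame(index=idx, columns=['other', 'ref'])
ref_df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [9, 10, 11, 12]],
    [1, 2, 3], ['a', 'b', 'c', 'd'])
other_df = pd.DataFrame(
    [[10, 20, 30, 40], [20, 30, 40, 50], [90, 100, 110, 120]],
    [1, 2, 3], ['a', 'b', 'c', 'd'])


def lookup_values(df, row_labels, col_labels):
    """Hypothetical helper: pick df[row, col] for each label pair (unique labels assumed)."""
    rows = df.index.get_indexer(row_labels)    # -1 marks a label that was not found
    cols = df.columns.get_indexer(col_labels)
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError("one or more labels were not found")
    return df.to_numpy()[rows, cols]


row_labels = final_df.index.get_level_values(0)
col_labels = final_df.index.get_level_values(1)
final_df['ref'] = lookup_values(ref_df, row_labels, col_labels)
final_df['other'] = lookup_values(other_df, row_labels, col_labels)

print(final_df)
```

Like the original `lookup`, this sketch raises when a label is missing; the `map` and `get` variant is the one to reach for when gaps are expected.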

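【Comments】:

Nice answer! But if other_df or ref_df is missing a row (for example, 2018-07-03 13:20:00-04:00 does not exist in ref_df), wouldn't this approach fail, since you are checking against the index? Just curious!
"map and get" worked for me. A bit more explanation of the asterisks and I will mark this as solved.
@Alfonso_MA I have updated the answer to use list instead of [*...]. list makes it clearer what I mean, and it was easier to change it.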

【Solution 2】:

Another solution, which can handle dates that are missing from ref_df and other_df:

index = pd.MultiIndex.from_tuples(final_df.index)
ref = ref_df.stack().rename('ref')
other = other_df.stack().rename('other')

result = pd.DataFrame(index=index).join(ref).join(other)
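A small sketch of how this behaves when a row is missing, using made-up integer row labels instead of the asker's timestamps and deliberately leaving row 2 out of `ref_df`. Because `join` defaults to a left join, every pair in the target index is kept, and pairs with no match are filled with NaN:

```python
import pandas as pd

# Target pairs, kept in this order by the left join.
index = pd.MultiIndex.from_tuples([(1, 'a'), (2, 'b'), (3, 'd')])

# Made-up frames; ref_df deliberately has no row 2.
ref_df = pd.DataFrame(
    [[1, 2, 3, 4], [9, 10, 11, 12]],
    index=[1, 3], columns=['a', 'b', 'c', 'd'])
other_df = pd.DataFrame(
    [[10, 20, 30, 40], [20, 30, 40, 50], [90, 100, 110, 120]],
    index=[1, 2, 3], columns=['a', 'b', 'c', 'd'])

# Each stacked Series is keyed by (row, column); the name becomes the joined column name.
ref = ref_df.stack().rename('ref')
other = other_df.stack().rename('other')

result = pd.DataFrame(index=index).join(ref).join(other)

print(result)
```

The `ref` value for the (2, 'b') pair ends up as NaN, while the `other` column is filled for all three pairs.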

【Comments】:
