Track an operation in Dataframe: answers

【Question title】: Track an operation in Dataframe
【Posted】: 2019-11-18 02:50:59
【Question description】:

My dataframes:

df1:

Index                 Amount               
01.01.2018 08:00:00   23.25
01.01.2018 08:10:00   25.50
01.01.2018 08:20:00   26.30
01.01.2018 08:30:00   25.00
01.01.2018 08:40:00   20.00
01.01.2018 08:50:00   21.20
01.01.2018 09:00:00   21.20
01.01.2018 09:10:00   31.20

df2:

Index       Operation
01.01.2018  -5.00
01.01.2018  10.00
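To make the examples reproducible, the two frames above can be rebuilt as follows. This is a sketch that assumes the index strings are day-first (`01.01.2018` meaning 1 January 2018) and that both indexes are named `Index`, as the printed tables suggest.

```python
import pandas as pd

# Index strings copied from the df1 table (day-first format assumed)
times = ["01.01.2018 08:00:00", "01.01.2018 08:10:00", "01.01.2018 08:20:00",
         "01.01.2018 08:30:00", "01.01.2018 08:40:00", "01.01.2018 08:50:00",
         "01.01.2018 09:00:00", "01.01.2018 09:10:00"]
amounts = [23.25, 25.50, 26.30, 25.00, 20.00, 21.20, 21.20, 31.20]

# An explicit format avoids any month/day ambiguity when parsing
df1 = pd.DataFrame({"Amount": amounts},
                   index=pd.to_datetime(times, format="%d.%m.%Y %H:%M:%S"))
df1.index.name = "Index"

# df2 only carries a date, no time of day
df2 = pd.DataFrame({"Operation": [-5.00, 10.00]},
                   index=pd.to_datetime(["01.01.2018", "01.01.2018"], format="%d.%m.%Y"))
df2.index.name = "Index"
```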

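I'd like to track the operations from df2 in my df1.

So basically: check the operations in df2 and find where each event happens in df1. For example, for -5.00 the event happens here: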

01.01.2018 08:30:00   25.00
01.01.2018 08:40:00   20.00

My expected output:

df:

Index                 Amount  Operation_T/F  Amount_Operation              
01.01.2018 08:00:00   23.25   0              0
01.01.2018 08:10:00   25.50   0              0
01.01.2018 08:20:00   26.30   0              0
01.01.2018 08:30:00   25.00   0              0
01.01.2018 08:40:00   20.00   1              -5.0
01.01.2018 08:50:00   21.20   0              0
01.01.2018 09:00:00   21.20   0              0
01.01.2018 09:10:00   31.20   1              10.0

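The fact that operations can repeat during the day is not a problem. Of course, playing with some for loops and if statements could be a solution, but I'm trying to write clean code in Python and I'm looking for a better approach.

I ran into some problems writing the True/False operation value depending on whether the event is in row or in row + 1. My idea for solving this is to create bins of two consecutive rows and then check whether an operation event happened within each bin. What do you think?

Thanks in advance :)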
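The two-row bin idea amounts to comparing each row with the one before it, which is what `Series.diff()` computes (current value minus previous value). A minimal sketch that ignores the date for now; rounding to two decimals is an assumption based on the amounts shown, used so that float noise from the subtraction cannot break the comparison:

```python
import pandas as pd

amounts = [23.25, 25.50, 26.30, 25.00, 20.00, 21.20, 21.20, 31.20]
operations = [-5.00, 10.00]
s = pd.Series(amounts)

# diff() pairs every row with the previous one: the "bin" of rows i-1 and i.
# Rounding to cents absorbs float noise such as 26.30 - 25.50 = 0.8000000000000007.
step = s.diff().round(2)

# 1 where the change between consecutive rows equals one of the operations
flag = step.isin(operations).astype(int)
# Keep the matched change, 0.0 elsewhere (this also replaces the leading NaN)
amount_op = step.where(flag == 1, 0.0)
```

This does not check which day each operation belongs to; the answers below deal with that.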

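【Question discussion】:

Could you explain better what you mean by "I'd like to track operations from df2 in my df1"?
I want to find Operation -5.00 (df2) in the Amount column of df1. It happens in df1 when moving from index 01.01.2018 08:30:00 to index 01.01.2018 08:40:00. Sorry for not explaining that; I'll edit my question.
I guess the date matters? So there has to be a groupby first?
You should check the difference between the current value and the previous value, and fill the Amount_Operation column based on the result.
If the days differ between df1 and df2, should we ignore the days and merge whenever the difference matches?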

Tags: python pandas timestamp

【Solution 1】:

IIUC, you want to merge on the date and the difference in Amount:

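import pandas as pd

# Parse the string indexes as day-first datetimes (both indexes assumed to be named 'Index')
df1.index = pd.to_datetime(df1.index, dayfirst=True)
df2.index = pd.to_datetime(df2.index, dayfirst=True)

df1['date'] = df1.index.floor('D')            # calendar day of each row
df1['Amount_Operation'] = df1.Amount.diff()   # change since the previous row

# left_on and left_index cannot be combined, so only the key columns are used
df = (df1.reset_index()
         .merge(df2.reset_index(),
                left_on=['date', 'Amount_Operation'],
                right_on=['Index', 'Operation'],
                suffixes=('', '_y'),
                how='left')
         .drop(['Index_y', 'date'], axis=1)
     )

# 1 where an operation from df2 matched, 0 otherwise
df['Operation_T/F'] = df.Operation.notna().astype(int)
df['Amount_Operation'] = df.Operation.fillna(0)

Output:

                Index  Amount  Amount_Operation  Operation  Operation_T/F
0 2018-01-01 08:00:00   23.25               0.0        NaN              0
1 2018-01-01 08:10:00   25.50               0.0        NaN              0
2 2018-01-01 08:20:00   26.30               0.0        NaN              0
3 2018-01-01 08:30:00   25.00               0.0        NaN              0
4 2018-01-01 08:40:00   20.00              -5.0       -5.0              1
5 2018-01-01 08:50:00   21.20               0.0        NaN              0
6 2018-01-01 09:00:00   21.20               0.0        NaN              0
7 2018-01-01 09:10:00   31.20              10.0       10.0              1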
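A variant of the same merge, sketched under the assumption that amounts have two decimals: the differences are rounded before joining so that float noise from `diff()` cannot break the exact-match join, and `merge(indicator=True)` produces the flag directly through the `_merge` column pandas adds. The helper column names `key_date` and `step` are illustrative, not from the answer.

```python
import pandas as pd

# Rebuild the question's data (day-first timestamps assumed)
idx = pd.to_datetime(["01.01.2018 08:00:00", "01.01.2018 08:10:00", "01.01.2018 08:20:00",
                      "01.01.2018 08:30:00", "01.01.2018 08:40:00", "01.01.2018 08:50:00",
                      "01.01.2018 09:00:00", "01.01.2018 09:10:00"], format="%d.%m.%Y %H:%M:%S")
df1 = pd.DataFrame({"Amount": [23.25, 25.50, 26.30, 25.00, 20.00, 21.20, 21.20, 31.20]}, index=idx)
df2 = pd.DataFrame({"Operation": [-5.00, 10.00]},
                   index=pd.to_datetime(["01.01.2018", "01.01.2018"], format="%d.%m.%Y"))

left = pd.DataFrame({
    "Index": df1.index,
    "Amount": df1["Amount"].to_numpy(),
    "key_date": df1.index.floor("D"),                   # calendar day of each row
    "step": df1["Amount"].diff().round(2).to_numpy(),   # change since the previous row
})
right = pd.DataFrame({
    "key_date": df2.index,
    "step": df2["Operation"].round(2).to_numpy(),
}).drop_duplicates()  # a repeated (day, operation) pair would otherwise duplicate df1 rows

out = left.merge(right, on=["key_date", "step"], how="left", indicator=True)
out["Operation_T/F"] = (out["_merge"] == "both").astype(int)
out["Amount_Operation"] = out["step"].where(out["Operation_T/F"] == 1, 0.0)
out = out[["Index", "Amount", "Operation_T/F", "Amount_Operation"]]
```

The column order then matches the expected output in the question.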

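【Discussion】:

【Solution 2】:

Here is an approach that uses diff to check where the first difference of df1.Amount equals a value in df2.Operation, taking advantage of broadcasting:

# m has shape (len(df2), len(df1)): True where a row's change equals an operation
m = df1.Amount.diff().values == df2.Operation.values[:, None]
# number of matching operations for each df1 row
df1['Operation_T/F'] = m.sum(0)
# value of the matching operation (0 where nothing matched)
df1['Amount_Operation'] = (m * df2.Operation.values[:, None]).sum(0)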

         Index         Amount         Operation_T/F  Amount_Operation
0 2018-01-01 08:00:00   23.25              0               0.0
1 2018-01-01 08:10:00   25.50              0               0.0
2 2018-01-01 08:20:00   26.30              0               0.0
3 2018-01-01 08:30:00   25.00              0               0.0
4 2018-01-01 08:40:00   20.00              1              -5.0
5 2018-01-01 08:50:00   21.20              0               0.0
6 2018-01-01 09:00:00   21.20              0               0.0
7 2018-01-01 09:10:00   31.20              1              10.0
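To see what the broadcasting in this answer does, here is a small NumPy-only sketch with the question's numbers. `operation[:, None]` has shape `(2, 1)` and the differences have shape `(8,)`, so the comparison yields a `(2, 8)` boolean matrix with one row per operation and one column per df1 row; summing over axis 0 collapses it to one value per df1 row.

```python
import numpy as np

amount = np.array([23.25, 25.50, 26.30, 25.00, 20.00, 21.20, 21.20, 31.20])
operation = np.array([-5.00, 10.00])

# First differences, like df1.Amount.diff().values (first entry is NaN)
diff = np.concatenate(([np.nan], np.diff(amount)))

# (2, 1) against (8,) broadcasts to a (2, 8) mask: row = operation, column = df1 row
m = diff == operation[:, None]

flags = m.sum(axis=0)                              # how many operations match each row
amount_op = (m * operation[:, None]).sum(axis=0)   # matched operation value, 0 if none
```

Exact `==` happens to work for these values; with noisier data, `np.isclose(diff, operation[:, None])` broadcasts the same way and tolerates rounding error.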

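【Discussion】:

This doesn't seem to account for whether the date in the index actually matches the date being merged on. For example, if you change index 4 to the 2nd, it would still add an entry.
As far as I understand from the OP's comments, that isn't needed: "the problem is that df2 doesn't have a time index. I don't want to group by them" @user3483203
Later, he says "I don't want to group by them, just to check where during the day, two operations from df2 happened in df1". Agreed, it's a bit confusing.
He means there is no time, but there is still a day: "just to check where during the day".
What I infer from this is that, at any given time, the OP's df2 only holds values for a single day, though you may be right @user3483203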
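If the day does matter, as the first comment suggests, the same broadcasting trick can also compare calendar days. A sketch assuming both frames have a `DatetimeIndex` and df2 holds midnight timestamps; row 4 is deliberately moved to 2 January (hypothetical data) to reproduce the case raised in that comment.

```python
import numpy as np
import pandas as pd

# Row 4 is moved to 02.01.2018 on purpose (hypothetical), everything else as in the question
idx = pd.to_datetime(["01.01.2018 08:00:00", "01.01.2018 08:10:00", "01.01.2018 08:20:00",
                      "01.01.2018 08:30:00", "02.01.2018 08:40:00", "01.01.2018 08:50:00",
                      "01.01.2018 09:00:00", "01.01.2018 09:10:00"], format="%d.%m.%Y %H:%M:%S")
df1 = pd.DataFrame({"Amount": [23.25, 25.50, 26.30, 25.00, 20.00, 21.20, 21.20, 31.20]}, index=idx)
df2 = pd.DataFrame({"Operation": [-5.00, 10.00]},
                   index=pd.to_datetime(["01.01.2018", "01.01.2018"], format="%d.%m.%Y"))

ops = df2["Operation"].to_numpy()[:, None]              # shape (n_ops, 1)
same_amount = df1["Amount"].diff().to_numpy() == ops    # shape (n_ops, n_rows)
# Compare calendar days too; both sides are datetime64 arrays, so == broadcasts elementwise
same_day = df1.index.floor("D").to_numpy() == df2.index.floor("D").to_numpy()[:, None]
m = same_amount & same_day

df1["Operation_T/F"] = m.sum(axis=0)
df1["Amount_Operation"] = (m * ops).sum(axis=0)
```

Without the `same_day` mask, row 4 would still be flagged with -5.0 even though its date differs from the one in df2.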