Updated version: works for multiple IDs
This solution was inspired by a reply in this thread.
import pandas as pd
df = pd.DataFrame({'ID':['001']*10 + ['002']*10,
                   'Event':['event-1','event-2','event-3','event-final','event-1',
                            'event-2','event-3','event-final','event-1','event-2',
                            'event-1','event-2','event-3','event-final','event-1',
                            'event-2','event-final','event-1','event-2','event-3'],
                   'time':pd.date_range('2021-03-22 09:00:00', periods=20, freq="min")
                   })
# convert time to string format to match your data
df['time'] = df['time'].dt.strftime("%H:%M")
# flag the 'event-final' rows, reverse the order, and take a cumulative sum per ID
# a value of 0 means the row comes after the last 'event-final' for that ID
# picking those rows gives the desired result
# sort_index() restores the original row order so the mask lines up with df
print (df[df.Event.eq('event-final')[::-1].astype(int).groupby(df.ID).cumsum().sort_index().eq(0)])
print (df)
The output will be:
ID Event time
8 001 event-1 09:08
9 001 event-2 09:09
17 002 event-1 09:17
18 002 event-2 09:18
19 002 event-3 09:19
For the dataframe:
ID Event time
0 001 event-1 09:00
1 001 event-2 09:01
2 001 event-3 09:02
3 001 event-final 09:03
4 001 event-1 09:04
5 001 event-2 09:05
6 001 event-3 09:06
7 001 event-final 09:07
8 001 event-1 09:08
9 001 event-2 09:09
10 002 event-1 09:10
11 002 event-2 09:11
12 002 event-3 09:12
13 002 event-final 09:13
14 002 event-1 09:14
15 002 event-2 09:15
16 002 event-final 09:16
17 002 event-1 09:17
18 002 event-2 09:18
19 002 event-3 09:19
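To make the one-liner easier to follow, it can be unpacked into separate steps. The following is only a sketch on a smaller made-up dataframe; the intermediate names (`rev`, `finals_after`, `mask`, `result`) are illustrative and not part of the original answer.

```python
import pandas as pd

# Small made-up sample: two IDs, each with rows after its last 'event-final'
df = pd.DataFrame({
    'ID': ['001'] * 5 + ['002'] * 4,
    'Event': ['event-1', 'event-final', 'event-1', 'event-2', 'event-3',
              'event-1', 'event-2', 'event-final', 'event-1'],
})

# Reverse the row order so each ID is walked from its last row to its first
rev = df.iloc[::-1]

# Per ID, count how many 'event-final' rows have been seen so far
# (the current row included), then restore the original row order
finals_after = (rev['Event'].eq('event-final')
                .astype(int)
                .groupby(rev['ID'])
                .cumsum()
                .sort_index())

# A count of 0 means there is no 'event-final' at or after this row for its ID
mask = finals_after.eq(0)
result = df[mask]
print(result)
```

Because the cumulative sum runs from the end of each group, the first `event-final` it meets is the last one in time order, so every row before it gets a count of at least 1 and is filtered out.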
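Previous answer for a single ID
You can find the index of the last occurrence of event-final and then take all rows after that point. Before doing this, you need to sort the values by time and call reset_index so that the index follows the time order.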
import pandas as pd
df = pd.DataFrame({'ID':['001']*10,
                   'Event':['event-1','event-2','event-3','event-final','event-1',
                            'event-2','event-3','event-final','event-1','event-2'],
                   'time':pd.date_range('2021-03-22 09:00:00', periods=10, freq="min")})
# convert time to string format to match your data
df['time'] = df['time'].dt.strftime("%H:%M")
# sort time in ascending order (this assumes all rows fall within the same day;
# if the data spans more than 24 hours, keep df['time'] in datetime format)
df = df.sort_values(by='time').reset_index(drop=True)
print (df)
# find the index of every row whose Event is `event-final`
# and keep only the last one using [-1]
idx = df.index[df['Event']=='event-final'][-1]
# with loc, select all records after the last `event-final` row
print (df.loc[idx+1:])
The output will be:
Original dataframe:
ID Event time
0 001 event-1 09:00
1 001 event-2 09:01
2 001 event-3 09:02
3 001 event-final 09:03
4 001 event-1 09:04
5 001 event-2 09:05
6 001 event-3 09:06
7 001 event-final 09:07
8 001 event-1 09:08
9 001 event-2 09:09
Final dataframe, containing only the rows after the last event-final:
ID Event time
8 001 event-1 09:08
9 001 event-2 09:09
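Note that the `[-1]` lookup raises an IndexError when the dataframe contains no event-final row at all. A possible way to guard against that is sketched below. The helper name `rows_after_last_final` is hypothetical, and returning the whole dataframe when no marker is found is an assumption about the desired behavior, not something stated in the question.

```python
import pandas as pd

def rows_after_last_final(df, marker='event-final'):
    """Return the rows after the last `marker` row (hypothetical helper)."""
    # Positions (not labels) of all rows whose Event equals the marker
    positions = df['Event'].eq(marker).to_numpy().nonzero()[0]
    if positions.size == 0:
        # Assumption: with no marker present, every row counts as "after"
        return df
    # Positional slicing works regardless of what the index labels are
    return df.iloc[positions[-1] + 1:]

sample = pd.DataFrame({
    'ID': ['001'] * 5,
    'Event': ['event-1', 'event-final', 'event-2', 'event-final', 'event-3'],
})
print(rows_after_last_final(sample))
```

The dataframe still has to be sorted by time before calling the helper, as in the answer above.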