I think you can use groupby with filter. The condition drops a group only when it has exactly 2 rows (the duplicated keys) and every value in its message column is T or X:
import pandas as pd
df = pd.DataFrame({"ID":["AA-1", "AA-1", "C-0" ,"BB-2", "BB-2"],
"symbol":["A","A","C","B","B"],
"date":["06/24/2014","06/24/2014","06/20/2013","06/25/2015","06/25/2015"],
"message": ["T","X","T","",""] })
print (df)
ID date message symbol
0 AA-1 06/24/2014 T A
1 AA-1 06/24/2014 X A
2 C-0 06/20/2013 T C
3 BB-2 06/25/2015 B
4 BB-2 06/25/2015 B
# Keep a group unless it has exactly 2 rows and every message is 'T' or 'X'
df1 = df.groupby(['ID','date','symbol']).filter(
    lambda x: not (len(x) == 2 and x.message.isin(['T','X']).all()))
print(df1)
ID date message symbol
2 C-0 06/20/2013 T C
3 BB-2 06/25/2015 B
4 BB-2 06/25/2015 B
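To see why each group is kept or dropped, it can help to spell the condition out as a named function and look at the decision per group. This is only a sketch of the same filter: `keep_group` and `decisions` are names I made up for illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA-1", "AA-1", "C-0", "BB-2", "BB-2"],
    "symbol": ["A", "A", "C", "B", "B"],
    "date": ["06/24/2014", "06/24/2014", "06/20/2013",
             "06/25/2015", "06/25/2015"],
    "message": ["T", "X", "T", "", ""],
})

keys = ["ID", "date", "symbol"]


def keep_group(g):
    """Hypothetical helper: return True if the group should be kept."""
    is_pair = len(g) == 2
    only_t_or_x = bool(g["message"].isin(["T", "X"]).all())
    # Drop only groups that are a pair AND consist solely of T/X messages.
    return not (is_pair and only_t_or_x)


# One decision per group, keyed by the (ID, date, symbol) tuple.
decisions = {key: keep_group(g) for key, g in df.groupby(keys)}

# filter() calls keep_group once per group and keeps the rows of groups
# for which it returns True.
kept = df.groupby(keys).filter(keep_group)
print(decisions)
print(kept)
```

The single C-0 row survives because its group is not a pair, and BB-2 survives because its messages are empty strings rather than T or X.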
See Filtration in the pandas docs.
Edit, following the comment:
import pandas as pd
df = pd.DataFrame({"ID":["AA-1", "AA-1", "C-0", "C-0","BB-2", "BB-2"],
"symbol":["A","A","C","C", "B","B"],
"date":["06/24/2014","06/24/2014","06/20/2013","06/20/2013","06/25/2015","06/25/2015"],
"message": ["T","X","X","X","",""] })
print (df)
ID date message symbol
0 AA-1 06/24/2014 T A
1 AA-1 06/24/2014 X A
2 C-0 06/20/2013 X C
3 C-0 06/20/2013 X C
4 BB-2 06/25/2015 B
5 BB-2 06/25/2015 B
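If you need to remove every group whose values are all X or T (this also removes groups with a double X or a double T), and each group always has length 2: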
# Keep only groups where at least one message is neither 'T' nor 'X'
df1 = df.groupby(['ID','date','symbol']).filter(
    lambda x: not x.message.isin(['T','X']).all())
print(df1)
ID date message symbol
4 BB-2 06/25/2015 B
5 BB-2 06/25/2015 B
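If the frame is large, calling a Python lambda once per group can be slow. A possible alternative, sketched here under the same assumptions as above, is to build a boolean mask with transform and index with it. The names `hit` and `all_hit` are mine, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA-1", "AA-1", "C-0", "C-0", "BB-2", "BB-2"],
    "symbol": ["A", "A", "C", "C", "B", "B"],
    "date": ["06/24/2014", "06/24/2014", "06/20/2013",
             "06/20/2013", "06/25/2015", "06/25/2015"],
    "message": ["T", "X", "X", "X", "", ""],
})

# True for each row whose message is T or X.
hit = df["message"].isin(["T", "X"])

# Broadcast "all rows in my group are T/X" back onto every row.
all_hit = hit.groupby([df["ID"], df["date"], df["symbol"]]).transform("all")

# Keep the rows whose group is not made up entirely of T/X messages.
df1 = df[~all_hit]
print(df1)
```

This should give the same rows as the filter version (only the BB-2 group remains), without running a Python function per group.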
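If you need to remove only the groups whose values are T and X, you can first sort by message with sort_values, and then check in each group whether the first value is T and the second is X (after sorting, 'T' comes first and 'X' second):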
# Parentheses let the method chain span several lines
df2 = (df.sort_values('message')
         .groupby(['ID','date','symbol'], sort=False)
         .filter(lambda x: (x.message.iloc[0] != 'T') | (x.message.iloc[1] != 'X')))
print(df2)
ID date message symbol
4 BB-2 06/25/2015 B
5 BB-2 06/25/2015 B
2 C-0 06/20/2013 X C
3 C-0 06/20/2013 X C
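The sorted version relies on every group having exactly 2 rows (`iloc[1]` would fail on a single-row group). A variant that avoids both the sort and that assumption is to compare the set of messages in each group with {'T', 'X'}. Treat it as a sketch: it also drops a group such as T, T, X, which may or may not be what you want.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA-1", "AA-1", "C-0", "C-0", "BB-2", "BB-2"],
    "symbol": ["A", "A", "C", "C", "B", "B"],
    "date": ["06/24/2014", "06/24/2014", "06/20/2013",
             "06/20/2013", "06/25/2015", "06/25/2015"],
    "message": ["T", "X", "X", "X", "", ""],
})

# Drop a group only when its distinct messages are exactly T and X.
df2 = df.groupby(["ID", "date", "symbol"]).filter(
    lambda g: set(g["message"]) != {"T", "X"}
)
print(df2)
```

Because nothing is sorted, the remaining rows keep their original order (C-0 before BB-2), unlike the output above.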