Finding the last occurrence of a particular category within each group and filtering out rows - Pandas answers

[Question title]: Finding last occurrence of a particular category within each group and Filter out rows-Pandas
[Posted]: 2022-08-18 15:56:17
[Question description]:

I have a dataset as follows:

import pandas as pd

data = [[1, 'bot', 'a'], [1, 'cust', 'b'], [1, 'bot', 'c'], [1, 'cust', 'd'], [1, 'agent', 'e'], [1, 'cust', 'f'],
        [2, 'bot', 'a'], [2, 'cust', 'b'], [2, 'bot', 'c'], [2, 'bot', 'd'], [2, 'agent', 'e'], [2, 'cust', 'f'], [2, 'agent', 'g'],
        [3, 'cust', 'h'], [3, 'cust', 'i'], [3, 'agent', 'k'], [3, 'agent', 'l']]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'sender', 'text'])
df

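I want to filter out records within each id group based on a particular category (sender). For example, if I want to filter on the 'bot' category, I need to find the last occurrence of 'bot' within each group (id) and remove the records before that occurrence.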

Expected output:

I tried various approaches with groupby but did not get the expected output. Any pointers would be helpful.

Tags: python-3.x pandas dataframe group-by

[Solution 1]:

You can use a reversed groupby.cummin to build a mask for boolean indexing:


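# Scan each id group bottom-to-top: the mask stays True until the last 'bot' row is reached
m = df.loc[::-1,'sender'].ne('bot').groupby(df['id']).cummin()

# Boolean indexing aligns the mask on the index, so the original row order is kept
out = df[m]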

Output:

    id sender text
3    1   cust    d
4    1  agent    e
5    1   cust    f
10   2  agent    e
11   2   cust    f
12   2  agent    g
13   3   cust    h
14   3   cust    i
15   3  agent    k
16   3  agent    l
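
To see why the reversed cummin works, take id 1: read bottom-to-top, sender is cust, agent, cust, bot, cust, bot, so ne('bot') gives True, True, True, False, True, False, and the cumulative minimum turns that into True, True, True, False, False, False. Only the rows after the last 'bot' stay True, and the last 'bot' row itself is dropped along with everything before it. A group with no 'bot' at all (id 3) is kept whole. The sketch below wraps the same idea in a helper; the function name rows_after_last and its parameters are hypothetical, and it assumes the frame has a unique index.

```python
import pandas as pd

data = [[1, 'bot', 'a'], [1, 'cust', 'b'], [1, 'bot', 'c'], [1, 'cust', 'd'], [1, 'agent', 'e'], [1, 'cust', 'f'],
        [2, 'bot', 'a'], [2, 'cust', 'b'], [2, 'bot', 'c'], [2, 'bot', 'd'], [2, 'agent', 'e'], [2, 'cust', 'f'], [2, 'agent', 'g'],
        [3, 'cust', 'h'], [3, 'cust', 'i'], [3, 'agent', 'k'], [3, 'agent', 'l']]
df = pd.DataFrame(data, columns=['id', 'sender', 'text'])


def rows_after_last(frame, category, group_col='id', cat_col='sender'):
    """Keep, per group, only the rows after the last row equal to `category`.

    Hypothetical helper for illustration; assumes `frame` has a unique index.
    """
    # True where the row is NOT the category, read from the bottom row upwards.
    not_cat = frame.loc[::-1, cat_col].ne(category)
    # Per group, the cumulative minimum stays True until the first False
    # (the last occurrence in original order) and is False from there on.
    keep = not_cat.groupby(frame[group_col]).cummin().astype(bool)
    # Put the mask back into the frame's row order before indexing.
    return frame[keep.reindex(frame.index)]


out = rows_after_last(df, 'bot')
print(out)
```

Passing a different category, for example rows_after_last(df, 'agent'), applies the same rule to that sender. Groups whose last row is the category come back empty.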

[Discussion]: