How to slice Python Pandas groupby objects with various lengths? Answers

【Question title】: How to slice Python Pandas groupby objects with various lengths?
【Posted】: 2021-12-02 19:37:15
【Question】:

Create a DataFrame:

import pandas as pd
df = pd.DataFrame({'Set': [1, 1, 1, 2, 2, 2, 2, 2], 'Value': [1, 2, 3, 1, 2, 3, 4, 5]})

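This gives a DataFrame with eight rows: three in Set 1 (Value 1 to 3) and five in Set 2 (Value 1 to 5).

Next I group by Set and look at the first group: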

grouped_by_Set = df.groupby('Set')
grouped_by_Set.get_group(1)

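Now I want to select all but the last entry of the Value column in each group. I can select the first three entries (for example) and the last entry of each group with grouped_by_Set.nth([0, 1, 2]) and grouped_by_Set.nth(-1), but selecting all but the last entry of each group with grouped_by_Set.nth(0:-1) does not work. Because the groups have different lengths, I cannot list the entries explicitly.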

【Comments】:

Please post the expected output DataFrame.

Tags: python-3.x pandas pandas-groupby slice

【Solution 1】:

IIUC, you can use iloc inside apply:

print(df.groupby('Set').apply(lambda x: x.iloc[:-1]).reset_index(drop=True))
   Set  Value
0    1      1
1    1      2
2    2      1
3    2      2
4    2      3
5    2      4

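Or you can build a mask with duplicated and keep='last', then use that mask with loc. The mask is True for every row except the last occurrence of each Set value: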

print(df.loc[df.duplicated(subset='Set', keep='last')])
   Set  Value
0    1      1
1    1      2
3    2      1
4    2      2
5    2      3
6    2      4
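
As a small check that is not part of the original answer: the duplicated mask depends only on the values in Set, so it should also work when the rows of a group are not adjacent. The sketch below uses a made-up, interleaved version of the question's data to illustrate this; the variable names mask and result are only for illustration.

```python
import pandas as pd

# Hypothetical data: same Set/Value pairs as in the question, but interleaved
df = pd.DataFrame({'Set': [1, 2, 1, 2, 2, 1, 2, 2],
                   'Value': [1, 1, 2, 2, 3, 3, 4, 5]})

# True for every row that is followed later by another row with the same Set,
# i.e. every row except the last occurrence of each Set value
mask = df.duplicated(subset='Set', keep='last')

# Keep only the masked rows; the original index labels are preserved
result = df.loc[mask]
print(result)
```

Note that a group with a single row has no duplicates, so that row is treated as the last entry and dropped.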

【Solution 2】:

You can use tail(1) to get the last entry of each group, then use its index to deselect those rows from the original DataFrame by negating isin:

df[~df.index.isin(df.groupby("Set").tail(1).index)]

# Output:
    Set Value
0   1   1
1   1   2
3   2   1
4   2   2
5   2   3
6   2   4
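
A related variant, offered here only as a sketch and not taken from the answer above, is to number the rows of each group from the end with cumcount(ascending=False) and keep everything whose counter is greater than zero. The names from_end and all_but_last are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'Set': [1, 1, 1, 2, 2, 2, 2, 2],
                   'Value': [1, 2, 3, 1, 2, 3, 4, 5]})

# Position of each row counted from the end of its group:
# Set 1 -> 2, 1, 0 and Set 2 -> 4, 3, 2, 1, 0
from_end = df.groupby('Set').cumcount(ascending=False)

# The last row of every group has counter 0, so "> 0" drops exactly that row
all_but_last = df[from_end > 0]
print(all_but_last)
```

The same counter can be reused to drop the last n rows of each group by filtering with from_end >= n.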

【Solution 3】:

Try this with reset_index(drop=True) plus .max(). The idea is to use each group's own index to set the start and end of the slice:

grouped_by_Set = df.groupby('Set')
group = grouped_by_Set.get_group(1).reset_index(drop=True)
start = group.index[0]
end = group.index.max()
df_output = group.iloc[start:end, :]
print(df_output)

Output:

Group 1:

	Set	Value
0	1	1
1	1	2

Group 2 (the same code with get_group(2)):

	Set	Value
0	2	1
1	2	2
2	2	3
3	2	4
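
The snippet above handles one group at a time. One possible way to apply the same idea to every group, sketched here under the assumption that a single combined DataFrame is wanted, is to loop over the groupby object and concatenate the slices. The names pieces and df_output are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Set': [1, 1, 1, 2, 2, 2, 2, 2],
                   'Value': [1, 2, 3, 1, 2, 3, 4, 5]})

pieces = []
for key, group in df.groupby('Set'):
    # Renumber the group's rows from 0 so positions and labels coincide
    group = group.reset_index(drop=True)
    # Largest label equals len(group) - 1, so slicing up to it drops the last row
    end = group.index.max()
    pieces.append(group.iloc[:end, :])

# Stack the per-group slices into one frame with a fresh 0..n-1 index
df_output = pd.concat(pieces, ignore_index=True)
print(df_output)
```

For larger data, the mask-based approaches in the other answers avoid the Python-level loop.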
