Implement different functions for different groupby objects in pandas

【Question title】: Implement different functions for different groupby objects pandas
【Posted】: 2020-11-25 02:08:09
【Question description】:

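After some research, I found the following (Apply different functions to different items in group object: Python pandas). It may be exactly what I want, but I could not make sense of the proposed answers. Let me try to explain what I want with a simple example: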

import pandas as pd
import numpy as np

df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
grouped = df.groupby(['B'])

Suppose the code above builds a simple dataset like this:

       B         C         D
0    one -1.758565 -1.544788
1    one -0.309472  2.289912
2    two -1.885911  0.384215
3  three  0.444186  0.551217
4    two -0.502636  2.125921
5    two -2.247551 -0.188705
6    one -0.575756  1.473056
7  three  0.640316 -0.410318

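After grouping on column 'B', 3 groups are created:

one
two
three

Now, how can I apply a different function to each of these groups while still keeping them as part of the same data frame? For example, if I want to check whether the elements in group 1 are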

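【Comments】:

Can you state exactly what you mean? Running the functions on the groups manually?
@MadPhysicist, I don't want to implement them manually. I just want to apply different functions to different groups and then have the results in a single data frame, rather than handling each group separately as its own data frame.
Please post your expected output.

Tags: python python-3.x pandas dataframe pandas-groupby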

【Solution 1】:

You can use np.where to define whatever logic you want:

df['Flag'] = np.where((df['B'] == 'one') & (df['C'] < 0.5), True, False)
df['Flag'] = np.where((df['B'] == 'two') & (df['C'] >= 0.5), True, df['Flag'])
df['Flag'] = np.where((df['B'] == 'three') & (df['C'] < 0.5), True, df['Flag'])

Out[85]: 
       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False
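
As a possible alternative to the chained np.where calls above, the same per-group logic can be sketched with np.select, which takes a list of conditions and a matching list of choices. This is only a sketch: the 'C' values are copied from the sample table so the result is reproducible, and the names conditions and choices are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample values copied from the table above (column D omitted for brevity).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# One boolean mask per group; each row matches exactly one of them.
conditions = [
    df['B'] == 'one',
    df['B'] == 'two',
    df['B'] == 'three',
]
# The check whose result is used for rows matching the corresponding mask.
choices = [
    df['C'] < 0.5,
    df['C'] >= 0.5,
    df['C'] < 0.5,
]
# Rows that match none of the masks fall back to False.
df['Flag'] = np.select(conditions, choices, default=False)
print(df)
```

Keeping the group masks and the checks in two parallel lists makes it easier to add a fourth group without repeating the assignment line.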

From there, suppose you want to group by B and count the total number of True values:

# Store the summary under a new name so df keeps columns C and D for later steps
counts = df.groupby('B')['Flag'].sum().reset_index()

       B    Flag
0    one     3.0
1  three     1.0
2    two     0.0
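
Since the question asks to keep everything in the same data frame, one option is to broadcast the per-group total back onto every row with transform instead of collapsing to one row per group. A minimal sketch, assuming the Flag values from the table above; the column name FlagCount is hypothetical.

```python
import pandas as pd

# Flag values copied from the np.where output above.
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'Flag': [True, True, False, True, False, False, True, False],
})

# transform returns one value per original row, so the group total
# lines up with df's index and can be stored as a new column.
# The flags are cast to int first so the sum is an explicit count.
df['FlagCount'] = df['Flag'].astype(int).groupby(df['B']).transform('sum')
print(df)
```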

To implement this as an adjustable custom function (per the comment), you can:

def flag(one, two, three):
    df['Flag'] = np.where((df['B'] == 'one') & (one), True, False)
    df['Flag'] = np.where((df['B'] == 'two') & (two), True, df['Flag'])
    df['Flag'] = np.where((df['B'] == 'three') & (three), True, df['Flag'])


flag(one=df['C'] < 0.5, two=df['C'] >= 0.5, three=df['C'] < 0.5)
df

       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False
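
If each group needs a genuinely different function rather than a boolean expression, one way to sketch it is a dictionary that maps each group key to a callable, applied while iterating over the groupby object. The helpers below_half and at_least_half and the funcs mapping are hypothetical names chosen for illustration; they mirror the checks used above and are not part of the original answer.

```python
import pandas as pd

# Sample values copied from the table above (column D omitted for brevity).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# Hypothetical per-group functions: each takes a group's 'C' values
# and returns a Series with the same index.
def below_half(values):
    return values < 0.5

def at_least_half(values):
    return values >= 0.5

# Map each group key to the function that should handle that group.
funcs = {'one': below_half, 'two': at_least_half, 'three': below_half}

# Run the matching function on each group, then stitch the pieces together.
pieces = [funcs[name](group['C']) for name, group in df.groupby('B')]

# Column assignment aligns on the index, so each result returns to its row.
df['Flag'] = pd.concat(pieces)
print(df)
```

Because every piece keeps the original row index, the results stay in the same data frame and no manual reordering is needed.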

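【Discussion】:

David Erickson, can the same approach be applied if a custom function is needed rather than a simple logical check like the one shown with np.where?
@UGuntupalli See my revised answer.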