Creating a Pandas DataFrame Column with Results from the Sum of a Condition: Answers

【Question title】: Creating a Pandas DataFrame Column with Results from the Sum of a Condition
【Posted】: 2017-06-20 02:22:23
【Question description】:

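Related to this question on calculating DataFrame values from a condition, I have a more complex problem that I am struggling with: including, for a given row, a sum based on that condition. Here is the initial df: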

Key UID VID count   month   option  unit    year
0   1   5   100     1       A       10      2015
1   1   5   200     1       B       20      2015
2   1   5   300     2       A       30      2015
3   1   5   400     2       B       40      2015
4   1   7   450     2       B       45      2015
5   1   5   500     3       B       50      2015

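I am looking to iterate over this time-series DataFrame and add a column 'unit_count' to each row. It should hold that row's 'unit' value divided by the sum of 'count' for the same month, summing only rows where the option is 'B'. Essentially: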

df['unit_count'] = df['unit'] / sum of df['count'] for all records containing 'option' 'B' in the same month

This would append a column to the DataFrame as follows:

Key UID VID count   month   option  unit    year    unit_count
0   1   5   100     1       A       10      2015    0.050
1   1   5   200     1       B       20      2015    0.100
2   1   5   300     2       A       30      2015    0.035
3   1   5   400     2       B       40      2015    0.047
4   1   7   450     2       B       45      2015    0.053
5   1   5   500     3       B       50      2015    0.100
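
As a sanity check on the expected values above, the formula can be worked through in plain Python without pandas. This is only a sketch: the row tuples are copied from the example df, and the names rows, b_count_by_month and unit_count are illustrative.

```python
from collections import defaultdict

# (month, option, unit, count) for each row of the example df
rows = [
    (1, "A", 10, 100),
    (1, "B", 20, 200),
    (2, "A", 30, 300),
    (2, "B", 40, 400),
    (2, "B", 45, 450),
    (3, "B", 50, 500),
]

# Sum of 'count' per month, counting only rows where option == 'B'
b_count_by_month = defaultdict(int)
for month, option, unit, count in rows:
    if option == "B":
        b_count_by_month[month] += count

# unit_count = unit / (that month's option-'B' count total)
unit_count = [unit / b_count_by_month[month] for month, option, unit, count in rows]
print([round(v, 3) for v in unit_count])
# prints [0.05, 0.1, 0.035, 0.047, 0.053, 0.1]
```

For month 2 the denominator is 400 + 450 = 850, which gives 30/850 ≈ 0.035, 40/850 ≈ 0.047 and 45/850 ≈ 0.053, matching the table.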

The code for the example df above is:

import pandas as pd

df = pd.DataFrame({'UID':[1,1,1,1,1,1],
                   'VID':[5,5,5,5,7,5],
                   'year':[2015,2015,2015,2015,2015,2015],
                   'month':[1,1,2,2,2,3],
                   'option':['A','B','A','B','B','B'],
                   'unit':[10,20,30,40,45,50],
                   'count':[100,200,300,400,450,500]
                   })

【Question discussion】:

Tags: python pandas dataframe conditional

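【Solution 1】:

You only want to look within the same month, so you can group by the month column. Then, within each group, subset the count column with option == "B", take the sum, and divide the unit column by that summed value (a direct translation of your logic):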

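# Per month: divide each row's unit by the summed count of the option "B" rows.
# group_keys=False keeps the original row index so the result aligns on assignment.
df['unit_count'] = df.groupby('month', group_keys=False).apply(
                      lambda g: g.unit/g['count'][g.option == "B"].sum())
df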
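
The same logic can also be sketched without apply. This is a variant, not the method given in this answer: zero out count on the non-B rows, then use groupby(...).transform('sum') to broadcast each month's B total back onto every row. The names b_count and monthly_b_total are illustrative. The sketch assumes every month has at least one option "B" row with a non-zero count; otherwise the division produces inf or NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 1, 1, 1],
    'VID': [5, 5, 5, 5, 7, 5],
    'year': [2015] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'option': ['A', 'B', 'A', 'B', 'B', 'B'],
    'unit': [10, 20, 30, 40, 45, 50],
    'count': [100, 200, 300, 400, 450, 500],
})

# Keep 'count' only on option 'B' rows; other rows contribute 0 to the sum
b_count = df['count'].where(df['option'] == 'B', 0)

# Broadcast each month's 'B' total back onto every row of that month
monthly_b_total = b_count.groupby(df['month']).transform('sum')

df['unit_count'] = df['unit'] / monthly_b_total
print(df['unit_count'].round(3).tolist())
# prints [0.05, 0.1, 0.035, 0.047, 0.053, 0.1]
```

Because transform returns a Series with the same index as its input, the result can be assigned directly, with no need to consider group_keys.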

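【Discussion】:

Nice solution! I think using .loc[] would make it a bit nicer: df.groupby(['year','month']).apply(lambda g: g.unit / g.loc[g.option=='B', 'count'].sum())
@MaxU I feel the same. I don't know whether it would be faster, but it is more compact.
@Psidom's solution works great, especially when adding the df.groupby(['year','month']) recommended by @MaxU. @MaxU's more compact solution returned two errors, one of which was: ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'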
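
For reference, here is a sketch that combines the two suggestions in this thread: grouping by both year and month, and selecting the B rows with .loc. It keeps group_keys=False from the answer so that the result retains the original row index and aligns when assigned as a column. It runs on the example df only. It does not attempt to diagnose the errors reported above, which may depend on the commenter's actual data or pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 1, 1, 1],
    'VID': [5, 5, 5, 5, 7, 5],
    'year': [2015] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'option': ['A', 'B', 'A', 'B', 'B', 'B'],
    'unit': [10, 20, 30, 40, 45, 50],
    'count': [100, 200, 300, 400, 450, 500],
})

# group_keys=False keeps the original row index on the result,
# so the Series aligns with df when assigned as a new column
df['unit_count'] = df.groupby(['year', 'month'], group_keys=False).apply(
    lambda g: g['unit'] / g.loc[g['option'] == 'B', 'count'].sum())

print(df['unit_count'].round(3).tolist())
```

Grouping on year as well as month matters once the data spans more than one year, since otherwise January 2015 and January 2016 would share a denominator.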