Groupby multiple columns with product aggregation: answers

【Question title】: Groupby multiple columns with product aggregation
【Posted】: 2020-12-25 12:30:45
【Question】:

I have a pandas DataFrame with several item and count columns, like this:

index	item1	item2	item3	count1	count2	count3
1	0	0.5	0.5	10	15	0
2	0.5	0	0.5	20	20	20
3	1	0	0	30	10	30
4	0	1	0	20	20	0

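I want to group the data by the item columns with a product aggregation, so that I end up with a DataFrame that has the items as its index and the counts as its columns, where the value of cell (i, j) is the sum over all rows of item{i} * count{j}. For example: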

index	count1	count2
item1	(0 * 10) + (0.5 * 20) + (1 * 30) + (0 * 20) = 40	(0 * 15) + (0.5 * 20) + (1 * 10) + (0 * 20) = 20
item2	(0.5 * 10) + (0 * 20) + (0 * 30) + (1 * 20) = 25	(0.5 * 15) + (0 * 20) + (0 * 10) + (1 * 20) = 27.5
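Written as a formula, with $I$ standing for the $n \times p$ matrix of item columns and $C$ for the $n \times q$ matrix of count columns (symbols introduced here only for illustration, with $n$ rows), each cell of the desired result is a sum of row-wise products, which is exactly the matrix product of $I^{\top}$ and $C$:

```latex
R_{ij} = \sum_{k=1}^{n} I_{ki}\, C_{kj}, \qquad R = I^{\top} C
```

For example, $R_{11} = 0 \cdot 10 + 0.5 \cdot 20 + 1 \cdot 30 + 0 \cdot 20 = 40$, matching the first cell above.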

I tried using groupby:

df[items + counts].groupby(items).agg('prod')

and

df.groupby(items)[counts].agg('prod')

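but the problem is that groupby groups by the values in the columns rather than by the columns themselves. I ran into the same problem with pivot_table: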

df.pivot_table(index=items, values=counts, aggfunc='prod')

I feel the solution should be trivial, but I can't quite pin down what I'm missing.
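A minimal sketch of the groupby behavior described in the question, assuming the sample data shown above (the name `by_value` is illustrative only): grouping by `item1` uses that column's distinct values (0, 0.5, 1) as group keys, so the result has one row per value rather than one row per item column, which is why groupby and pivot_table cannot produce the desired layout.

```python
import pandas as pd

# Sample data from the question (index 1..4)
df = pd.DataFrame({'item1': [0, 0.5, 1, 0], 'item2': [0.5, 0, 0, 1],
                   'item3': [0.5, 0.5, 0, 0], 'count1': [10, 20, 30, 20],
                   'count2': [15, 20, 10, 20], 'count3': [0, 20, 30, 0]},
                  index=[1, 2, 3, 4])

# groupby treats the *values* of item1 as keys: 0 -> rows 1 and 4,
# 0.5 -> row 2, 1 -> row 3; the column name 'item1' never becomes a row label
by_value = df.groupby('item1')['count1'].sum()
print(by_value)
```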

【Discussion】:

Where is item 3?

Tags: python pandas dataframe pandas-groupby

【Solution 1】:

IIUC, you can take the dot product of the (transposed) item columns with the count columns:

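import pandas as pd

# sample data from the question
df = pd.DataFrame({'item1': [0, 0.5, 1, 0], 'item2': [0.5, 0, 0, 1],
                   'item3': [0.5, 0.5, 0, 0], 'count1': [10, 20, 30, 20],
                   'count2': [15, 20, 10, 20], 'count3': [0, 20, 30, 0]},
                  index=[1, 2, 3, 4])

# create DataFrame with only item columns
items = df.filter(regex='^item')

# create DataFrame with only count columns
counts = df.filter(regex='^count')

# matrix product: rows of items.T are item columns, summed against each count column
res = items.T.dot(counts)

print(res)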

Output

       count1  count2  count3
item1    40.0    20.0    40.0
item2    25.0    27.5     0.0
item3    15.0    17.5    10.0

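The items and counts DataFrames are obtained with filter, using a regex that keeps only the columns whose names start with item or count, respectively.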
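To confirm that `dot` computes exactly the sum described in the question, here is a minimal cross-check sketch, assuming the same sample data; the explicit loop and the names `loop_res` and `matmul_res` are illustrative and not part of the original answer. It also uses the `@` operator, which pandas DataFrames support as an equivalent of `.dot`.

```python
import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'item1': [0, 0.5, 1, 0], 'item2': [0.5, 0, 0, 1],
                   'item3': [0.5, 0.5, 0, 0], 'count1': [10, 20, 30, 20],
                   'count2': [15, 20, 10, 20], 'count3': [0, 20, 30, 0]},
                  index=[1, 2, 3, 4])

items = df.filter(regex='^item')
counts = df.filter(regex='^count')

# explicit double loop: cell (i, j) = sum over rows of items[i] * counts[j]
rows = {}
for i in items.columns:
    rows[i] = {j: (items[i] * counts[j]).sum() for j in counts.columns}
loop_res = pd.DataFrame.from_dict(rows, orient='index')

# the same result via the @ operator on DataFrames (equivalent to .dot)
matmul_res = items.T @ counts

print(loop_res)
print(np.allclose(loop_res.to_numpy(), matmul_res.to_numpy()))
```

The loop is only there to make the arithmetic visible; for real data the vectorized `dot` (or `@`) is the better choice.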
