Using Pandas to group by multiple columns for min/max and prepend another column's value to the min/max columns

【Question title】: Using Pandas, want to group by multiple columns for min/max and add another column value to min/max columns
【Posted】: 2022-12-12 08:32:11
【Question description】:

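First of all, apologies if the title is hard to understand.

Goal: I am trying to group by the source and type columns, add min and max columns for each group, and then prepend the corresponding target value to the min and max values.

I can't figure out how to get a Pandas result in this format: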

source	type	min	max
Person1	bow	Person 2: 0.001	Person 3: 0.05

I have a list of dictionaries like this:

`[{'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636}, {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040}, {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433}, {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699}, {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171}, {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522}]`

As a table it looks like this:

source	target	type	similarity
Person1	Person2	bow	0.636
Person1	Person2	bigram	0.040
Person1	Person2	tfidf	0.433
Person1	Person3	bow	0.699
Person1	Person3	bigram	0.171
Person1	Person3	tfidf	0.522
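
For reference, a minimal sketch of how such a list of dictionaries can be loaded into a DataFrame. The variable name `records` is an assumption for illustration; the post does not name the list.

```python
import pandas as pd

# `records` is a hypothetical name for the list of dictionaries shown above.
records = [
    {'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636},
    {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040},
    {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433},
    {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699},
    {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171},
    {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522},
]

# Each dictionary becomes one row; the dictionary keys become the column names.
df = pd.DataFrame(records)
print(df)
```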

For the group-by with min/max I use the following:

df = df.groupby(['source','type']).similarity.agg(['min','max'])

The result is:

source	type	min	max
Person1	bow	0.636	0.699
Person1	bigram	0.040	0.171
Person1	tfidf	0.433	0.522

So far so good, but how do I convert the output to the following structure:

[source]: source; [type]: type; [min]: target: min(similarity); [max]: target: max(similarity)

source	type	min	max
Person1	bow	Person2: 0.636	Person3: 0.699
Person1	bigram	Person2: 0.040	Person3: 0.171
Person1	tfidf	Person2: 0.433	Person3: 0.522

Should I use .loc to find the rows that hold the min/max values and then somehow add them to the result?

【Comments】:

Tags: pandas

【Solution 1】:

Here is an approach using GroupBy and pandas.merge:

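import pandas as pd

# Group once; sort=False keeps the groups in order of first appearance.
g = df.groupby(by=['source', 'type'], sort=False)

out = (
    # Merge the rows holding each group's minimum with the rows holding its maximum.
    pd.merge(
        df.loc[g['similarity'].idxmin()]
          .rename(columns={'similarity': 'sim_min', 'target': 'target_min'}),
        df.loc[g['similarity'].idxmax()]
          .rename(columns={'similarity': 'sim_max', 'target': 'target_max'}),
        on=['source', 'type'])
    # pop() drops each helper column while building the "target: value" strings.
    .assign(min=lambda x: x.pop('target_min') + ': ' + x.pop('sim_min').astype(str),
            max=lambda x: x.pop('target_max') + ': ' + x.pop('sim_max').astype(str))
)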
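
The key step here is `idxmin` / `idxmax`: on a grouped column they return, for each group, the index label of the row that holds the minimum or maximum, and `df.loc` then recovers the full row, including `target`. A small sketch of that step alone, assuming `df` holds the question's data with the default integer index (the names `min_labels` and `min_rows` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['Person1'] * 6,
    'target': ['Person2'] * 3 + ['Person3'] * 3,
    'type': ['bow', 'bigram', 'tfidf'] * 2,
    'similarity': [0.636, 0.040, 0.433, 0.699, 0.171, 0.522],
})

g = df.groupby(['source', 'type'], sort=False)

# One index label per (source, type) group: the row with the smallest similarity.
min_labels = g['similarity'].idxmin()

# Selecting by those labels returns whole rows, so `target` comes along.
min_rows = df.loc[min_labels]
print(min_rows)
```

This is also what the question's `.loc` idea amounts to: look up the rows first, then format them.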

Another variant:

g = df.groupby(by=['source', 'type'], sort=False)

out = (
    pd.merge(
        # Build the "target: similarity" label directly on each set of rows.
        df.loc[g['similarity'].idxmin()]
          .assign(min=lambda x: x[['target', 'similarity']]
                                 .astype(str).agg(': '.join, axis=1)),
        df.loc[g['similarity'].idxmax()]
          .assign(max=lambda x: x[['target', 'similarity']]
                                 .astype(str).agg(': '.join, axis=1)),
        on=['source', 'type'], suffixes=('', '_'))
    # Keep only the key columns and the two label columns.
    .loc[:, ['source', 'type', 'min', 'max']]
)

# Output:

print(out)

    source    type             min             max
0  Person1     bow  Person2: 0.636  Person3: 0.699
1  Person1  bigram   Person2: 0.04  Person3: 0.171
2  Person1   tfidf  Person2: 0.433  Person3: 0.522
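
Note that `astype(str)` prints `0.04` rather than the `0.040` shown in the question. If a fixed number of decimals is wanted, one option is to format the number explicitly instead of casting. A minimal sketch under that assumption; the three-decimal format and the helper name `label` are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['Person1'] * 6,
    'target': ['Person2'] * 3 + ['Person3'] * 3,
    'type': ['bow', 'bigram', 'tfidf'] * 2,
    'similarity': [0.636, 0.040, 0.433, 0.699, 0.171, 0.522],
})

def label(rows):
    """Return 'target: similarity' strings with three decimals (hypothetical helper)."""
    return rows['target'] + ': ' + rows['similarity'].map('{:.3f}'.format)

g = df.groupby(['source', 'type'], sort=False)['similarity']
lo = df.loc[g.idxmin()]  # rows holding each group's minimum
hi = df.loc[g.idxmax()]  # rows holding each group's maximum, same group order

# Both selections follow the same group order, so they can be combined by position.
out = lo[['source', 'type']].reset_index(drop=True)
out['min'] = label(lo).to_numpy()
out['max'] = label(hi).to_numpy()
print(out)
```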

【Comments】:

It works just the way I wanted, thank you!

【Solution 2】:

Example

data = [['Person1', 'Person2', 'bow', 0.636],
        ['Person1', 'Person2', 'bigram', 0.04],
        ['Person1', 'Person2', 'tfidf', 0.433],
        ['Person1', 'Person3', 'bow', 0.699],
        ['Person1', 'Person3', 'bigram', 0.171],
        ['Person1', 'Person3', 'tfidf', 0.522]]
df = pd.DataFrame(data, columns=['source', 'target', 'type', 'similarity'])

df

    source  target  type    similarity
0   Person1 Person2 bow     0.6
1   Person1 Person2 bigram  0.0
2   Person1 Person2 tfidf   0.4
3   Person1 Person3 bow     0.7
4   Person1 Person3 bigram  0.2
5   Person1 Person3 tfidf   0.5

Procedure

df.groupby(['source','type']).agg(['min', 'max'])

Result:

                target              similarity
                min     max         min     max
source  type                
Person1 bigram  Person2 Person3     0.0     0.2
        bow     Person2 Person3     0.6     0.7
        tfidf   Person2 Person3     0.4     0.5

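Turning the result into your desired output

Append ': ' to the values of df's target column, convert the aggregated result to str, and concatenate the target and similarity strings that share the same min/max label.

Full code and output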

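# Aggregate every column with min and max, cast to str, then concatenate the
# columns that share a second-level label (min with min, max with max).
# Note: groupby(..., axis=1) is deprecated as of pandas 2.1.
(df.assign(target=df['target'] + ': ')
 .groupby(['source','type']).agg(['min', 'max']).astype('str')
 .groupby(level=1, axis=1, sort=False).sum().reset_index())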

Output:

    source  type    min             max
0   Person1 bigram  Person2: 0.04   Person3: 0.171
1   Person1 bow     Person2: 0.636  Person3: 0.699
2   Person1 tfidf   Person2: 0.433  Person3: 0.522
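
One caveat worth checking with this approach: `agg(['min', 'max'])` aggregates `target` and `similarity` independently, so the `target` it reports is the alphabetically smallest or largest name in the group, not necessarily the one on the row with the minimum or maximum similarity. On the sample data the two happen to coincide. The sketch below uses made-up values (not from the question) where they differ, and compares the result with an `idxmin`-based row lookup.

```python
import pandas as pd

# Hypothetical data: here Person3, not Person2, has the lower similarity.
df2 = pd.DataFrame({
    'source': ['Person1', 'Person1'],
    'target': ['Person2', 'Person3'],
    'type': ['bow', 'bow'],
    'similarity': [0.9, 0.1],
})

# Independent aggregation: min(target) and min(similarity) come from different rows.
independent = df2.groupby(['source', 'type']).agg(['min', 'max'])
print(independent)

# Row-based lookup: take the target from the row that actually holds the minimum.
row_of_min = df2.loc[df2.groupby(['source', 'type'])['similarity'].idxmin()]
print(row_of_min[['target', 'similarity']])
```

If the targets in a group are not guaranteed to sort in the same order as their similarities, the row-based lookup is the safer choice.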

【Comments】:

Both your answer and abokey's helped me understand my problem, thank you!
