Python: Get key value pairs within a group in pandas DataFrame

【Question title】: Python: Get key value pairs within a group in pandas DataFrame
【Posted】: 2021-11-14 00:10:18
【Question description】:

I have the following data frame:

import pandas as pd

df = pd.DataFrame([
    {'car_id': 123, 'country_code': 'CZ', 'grade': 5.0},
    {'car_id': 123, 'country_code': 'SK', 'grade': 1.0},
    {'car_id': 123, 'country_code': 'PL', 'grade': 4.0},
    {'car_id': 234, 'country_code': 'CZ', 'grade': 4.0},
    {'car_id': 234, 'country_code': 'SK', 'grade': 2.0},
    {'car_id': 234, 'country_code': 'PL', 'grade': 3.0},
    {'car_id': 345, 'country_code': 'CZ', 'grade': 2.0},
    {'car_id': 345, 'country_code': 'SK', 'grade': 5.0},
    {'car_id': 345, 'country_code': 'PL', 'grade': 1.0},
    {'car_id': 456, 'country_code': 'CZ', 'grade': None},
    {'car_id': 456, 'country_code': 'SK', 'grade': None},
    {'car_id': 456, 'country_code': 'PL', 'grade': None}
])

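Now I would like to group the data by car_id and get two columns:

the minimum grade,
the country code of the minimum grade.

So far I have the following code: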

>>> (
...     df
...     .groupby('car_id')
...     .apply(lambda x: pd.Series({
...         'min_grade': x['grade'].min(),
...         'min_grade_country': x.loc[x.grade == x.grade.min(), 'country_code'],
...     }))
...     .reset_index()
... )
 car_id  min_grade   min_grade_country
0   123        1.0   1 SK Name: country_code, dtype: object
1   234        2.0   4 SK Name: country_code, dtype: object
2   345        1.0   8 PL Name: country_code, dtype: object
3   456        NaN   Series([], Name: country_code, dtype: object)

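As you can see, I am not able to extract the country code of the minimum grade: the min_grade_country column holds a whole Series instead of a single value. Also, I am not sure whether there is a more elegant pandas way to get it, I mean one that does not combine the .apply() method with a lambda function. Could you help me with this?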

【Question comments】:

Tags: python pandas group-by

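【Solution 1】:

Use GroupBy.agg with the min and DataFrameGroupBy.idxmin aggregations, after moving country_code into the index so that idxmin returns the country code rather than a row number: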

df1 = (
    df
    .set_index('country_code')
    .groupby('car_id')
    .agg(
        min_grade=('grade', 'min'),
        min_grade_country=('grade', 'idxmin')
    )
    .reset_index()
)

print(df1)
   car_id  min_grade min_grade_country
0     123        1.0                SK
1     234        2.0                SK
2     345        1.0                PL
3     456        NaN               NaN
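
Depending on the pandas version, calling idxmin on a group whose values are all NaN (car_id 456 here) may emit a warning or raise instead of returning NaN; treat that as an assumption to check against your installed version. A minimal sketch of one way around it, not part of the original answer: drop the NaN grades first, look up the rows at the idxmin labels, then reindex so that cars without any grade come back as NaN. The names valid, idx and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {'car_id': 123, 'country_code': 'CZ', 'grade': 5.0},
    {'car_id': 123, 'country_code': 'SK', 'grade': 1.0},
    {'car_id': 123, 'country_code': 'PL', 'grade': 4.0},
    {'car_id': 234, 'country_code': 'CZ', 'grade': 4.0},
    {'car_id': 234, 'country_code': 'SK', 'grade': 2.0},
    {'car_id': 234, 'country_code': 'PL', 'grade': 3.0},
    {'car_id': 345, 'country_code': 'CZ', 'grade': 2.0},
    {'car_id': 345, 'country_code': 'SK', 'grade': 5.0},
    {'car_id': 345, 'country_code': 'PL', 'grade': 1.0},
    {'car_id': 456, 'country_code': 'CZ', 'grade': None},
    {'car_id': 456, 'country_code': 'SK', 'grade': None},
    {'car_id': 456, 'country_code': 'PL', 'grade': None},
])

# Keep only rows that actually have a grade, so no group is all-NaN
valid = df.dropna(subset=['grade'])

# Row label of the minimum grade within each car_id (first one on ties)
idx = valid.groupby('car_id')['grade'].idxmin()

# Every car_id from the original frame, including those without grades
all_cars = pd.Index(df['car_id'].unique(), name='car_id')

result = (
    df.loc[idx, ['car_id', 'grade', 'country_code']]
    .rename(columns={'grade': 'min_grade',
                     'country_code': 'min_grade_country'})
    .set_index('car_id')
    .reindex(all_cars)   # cars with no valid grade become NaN rows
    .reset_index()
)

print(result)
```

If several countries share the minimum grade for a car, this sketch keeps only the first one in row order, the same as idxmin does.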

【Comments】:

@Jaroslav Bezděk Dakujem。