How do you group, sort, and limit in Python Pandas? (i.e. Get Top 10): Answers

【Question title】: How do you group, sort, and limit in Python Pandas? (i.e. Get Top 10)
【Posted】: 2021-03-31 08:40:19
【Question description】:

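I have a Pandas DataFrame with the columns actor_id and account_id. An actor is a person, and an account is just an account. So a person can have multiple accounts, and an account can have multiple people.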

My goal is to group by actor_id and then rank the actor_ids by the number of accounts they have, so that I can get a list of the top 10 actors with the most accounts.

In SQL, it would be something like SELECT actor_id, account_id, COUNT(account_id) GROUP BY actor_id LIMIT 10. But I am trying to do this in Python.

I looked at this question, "Pandas group and sort by index count", but it did not work for me. Below is the code I tried.

df['count'] = df['actor_id'].map(df['account_id'].value_counts())
df.sort_index('count', ascending=False)

The dataset looks like this:

In the image, replace project_id with account_id.

【Question discussion】:

Tags: python pandas pandas-groupby

【Solution 1】:

You can do it like this:

df_nb_acc = (
    df.groupby('actor_id')['account_id']  # group by actor_id, select the account_id column
      .count()  # count the number of accounts per actor
      .reset_index()  # actor_id becomes a column instead of the index
      .rename(columns={'account_id': 'Nb_account'})  # rename the count column
      .sort_values('Nb_account', ascending=False)
      # sort rows by Nb_account, largest to smallest (no axis=1, which would raise a KeyError)
    )

To get the top 10, run df_nb_acc.head(10)
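To try the group, count, sort, and limit steps end to end, here is a minimal self-contained sketch. The sample data and the helper name top_actors (with its n and distinct parameters) are made up for illustration and are not part of the original question. The distinct option is an assumption about what you might want: count() counts rows, so an account that appears twice for the same actor is counted twice, while nunique() counts each account once per actor.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "actor_id":   ["a1", "a1", "a1", "a1", "a2", "a2", "a3"],
    "account_id": ["p1", "p2", "p2", "p3", "p1", "p3", "p1"],
})

def top_actors(frame, n=10, distinct=False):
    """Return the n actors with the most accounts, largest first.

    Illustrative helper (not a pandas API): distinct=True counts each
    account_id once per actor instead of counting rows.
    """
    grouped = frame.groupby("actor_id")["account_id"]
    counts = grouped.nunique() if distinct else grouped.count()
    return (
        counts.rename("Nb_account")            # name the resulting count column
              .sort_values(ascending=False)    # largest counts first
              .head(n)                         # keep only the top n actors
              .reset_index()                   # actor_id becomes a regular column
    )

top_by_rows = top_actors(df)                     # counts rows per actor
top_by_distinct = top_actors(df, distinct=True)  # counts unique accounts per actor
print(top_by_rows)
print(top_by_distinct)
```

Actors with equal counts may come out in either order, so add a second sort key if the order of ties matters.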

【Discussion】: