Grouping the index with the three highest values of another column using pandas in Python: answers

【Question title】: Grouping the index with the three highest values of another column using pandas in Python
【Posted】: 2018-08-05 11:19:05
【Question description】:

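I have a CSV file with columns such as StateName, Population, CityName, and so on. Note that each state can have several city names, so each state has several population values, one per city.

What I want is to group by StateName and keep the three highest population values within each state.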

What I have: (image)

What I want to have: (image)

My code is:

def answer_six():
    x = census_df['STNAME'].unique()
    census_df2 = df = pd.DataFrame()

    for a in x:
        census_dfcopy = census_df.copy()
        census_dfcopy = census_dfcopy.set_index(['STNAME'])
        census_dfcopy = census_dfcopy.loc[a]
        census_dfcopy = census_dfcopy.reset_index()
        census_dfcopy = census_dfcopy.set_index(['CENSUS2010POP'])
        census_dfcopy1 = census_dfcopy.sort_index(ascending=False)
        census_dfcopy1 = census_dfcopy1.append(census_dfcopy1)
        census_dfcopy1.groupby('STNAME')

    return census_dfcopy1.head(3)

answer_six()

I only get the top 3 values of the last state.

To download the CSV file, use the following link: https://drive.google.com/open?id=1ptE6MRQ1NGrfRYBB7NKjqhOJZXlxScPo

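【Question comments】:

There is no data at the link provided.
@fuglede Sorry, the link is correct now.
Someone asked exactly the same question on Code Review last year: codereview.stackexchange.com/questions/151530/…

Tags: python python-3.x pandas numpy series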

【Solution 1】:

You can do

census_df.groupby('STNAME').CENSUS2010POP.nlargest(3)
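
For comparison, here is a minimal sketch of what that one-liner does, written out as the explicit per-state loop the question was attempting. The small frame and its county names are made up for illustration; only the column names STNAME, CTYNAME and CENSUS2010POP come from the question. The loop in the question reassigns census_dfcopy1 on every pass, so only the last state survives; the sketch collects one result per state instead.

```python
import pandas as pd

# Hypothetical stand-in for the census data (values are invented).
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Alaska"] * 4,
    "CTYNAME": ["County A", "County B", "County C", "County D",
                "County E", "County F", "County G", "County H"],
    "CENSUS2010POP": [10, 9, 1, 3, 12, 12, 13, 14],
})

# The one-liner: split by state, then take the 3 largest values per group.
one_liner = census_df.groupby("STNAME")["CENSUS2010POP"].nlargest(3)

# The same idea spelled out: one top-3 Series per state, glued together.
# The dict keys become the outer level of the resulting index.
explicit = pd.concat({
    state: group["CENSUS2010POP"].nlargest(3)
    for state, group in census_df.groupby("STNAME")
})

print(one_liner)
print(explicit)
```

Both results carry the state as the outer index level and the original row label as the inner one.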

In action:

In [51]: df
Out[51]:
    ctyname  pop stname
0         0   10      a
1         1    9      a
2         2    1      a
3         3    3      a
4         4   12      b
5         5   12      b
6         6   13      b
7         7   14      b
8         8    4      c
9         9    3      c
10       10    2      c
11       11    1      c

In [68]: df.groupby('stname').pop.nlargest(3)
Out[68]:
stname
a       0     10
        1      9
        3      3
b       7     14
        6     13
        4     12
c       8      4
        9      3
        10     2
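
The result above is a Series indexed by (stname, original row label), so the city names are not in it. If the full rows are wanted, one possible approach (a sketch, not part of the original answer) is to use the inner index level to look the rows up again. The frame below rebuilds the demo data shown above; bracket selection is used because `pop` is also the name of a DataFrame method.

```python
import pandas as pd

# Rebuild the demo frame from the session above.
df = pd.DataFrame({
    "ctyname": range(12),
    "pop": [10, 9, 1, 3, 12, 12, 13, 14, 4, 3, 2, 1],
    "stname": list("aaaabbbbcccc"),
})

top3 = df.groupby("stname")["pop"].nlargest(3)

# Option 1: turn the state level back into an ordinary column.
flat = top3.reset_index(level=0)

# Option 2: fetch the complete rows via the original row labels
# stored in the inner level of the index.
rows = df.loc[top3.index.get_level_values(1)]

print(flat)
print(rows)
```

Here `rows` keeps ctyname, pop and stname together, ordered by state and then by descending population.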

【Discussion】:

Great. If the answer helped you, you can accept it.