Get the dataframe based on top two values of a group in grouped dataframe

【Question title】: Get the dataframe based on top two values of a group in grouped dataframe
【Posted】: 2021-06-11 10:42:32
【Question】:

My dataframe df is:

import pandas as pd

data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005', '2010', '2010','2010','2010','2010','2010','2010','2010', '2010'],
    'Votes':[30, 50, 20, 26, 30, 45, 20, 46, 80, 60, 46, 95, 60, 10, 95, 16, 65, 35, 50, 100, 70, 26, 180, 100, 120, 46, 80], 
    'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
    'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df

    
    Election Year   Votes   Party   Region
  0   2000           30      A       a
  1   2000           50      B       a
  2   2000           20      C       a
  3   2000           26      A       b
  4   2000           30      B       b
  5   2000           45      C       b 
  6   2000           20      A       c
  7   2000           46      B       c
  8   2000           80      C       c
  9   2005           60      A       a
  10  2005           46      B       a
  11  2005           95      C       a
  12  2005           60      A       b
  13  2005           10      B       b
  14  2005           95      C       b
  15  2005           16      A       c
  16  2005           65      B       c
  17  2005           35      C       c
  18  2010           50      A       a
  19  2010           100     B       a
  20  2010           70      C       a
  21  2010           26      A       b
  22  2010           180     B       b
  23  2010           100     C       b 
  24  2010           120     A       c
  25  2010           46      B       c
  26  2010           80      C       c

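I want a sub-dataframe that shows, for each of the top 2 parties of the 2010 election, the lowest number of votes that party has received in any region across all past elections. So the desired output is: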

 Election Year   Party   Votes   Region
     2005         B       10        b
     2000         C       20        a

First, I tried to get the top two parties by total votes in 2010, but my code returns the top parties for every year.

df1 = df.groupby(['Election Year','Party'])['Votes'].sum().reset_index()
df1 = df1.sort_values(['Election Year','Votes'], ascending=False)
top_2 = df1.groupby('Election Year').head(8).reset_index()
top_2 = top_2[['Election Year', 'Party']].to_string(index=False)
top_2

How can I fix this so that I get the top 2 parties of 2010 and then find their lowest vote counts across all years?

【Comments】:

In your attempt, you never select only the 2010 data. Have you tried doing that? That would be the first thing I'd do...
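The suggestion in this comment could be sketched roughly as follows. This is only an illustration, not code from the thread: the frame is the question's data built in a more compact way, and the names `df_2010`, `totals_2010`, and `top_two` are made up for the example.

```python
import pandas as pd

# Same 27 rows as in the question, built compactly (assumed equivalent layout).
df = pd.DataFrame({
    "Election Year": ["2000"] * 9 + ["2005"] * 9 + ["2010"] * 9,
    "Votes": [30, 50, 20, 26, 30, 45, 20, 46, 80,
              60, 46, 95, 60, 10, 95, 16, 65, 35,
              50, 100, 70, 26, 180, 100, 120, 46, 80],
    "Party": ["A", "B", "C"] * 9,
    "Region": (["a"] * 3 + ["b"] * 3 + ["c"] * 3) * 3,
})

# Select only the 2010 rows first. The year column holds strings,
# so the comparison value must be the string "2010".
df_2010 = df[df["Election Year"] == "2010"]

# Total votes per party in 2010, then the two largest totals.
totals_2010 = df_2010.groupby("Party")["Votes"].sum()
top_two = totals_2010.nlargest(2).index.tolist()
print(top_two)  # ['B', 'C']
```

Filtering before grouping means there is a single year in play, so there is no need to group by 'Election Year' at all.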

Tags: python pandas dataframe data-science

【Solution 1】:

To get the top 2 parties in the 2010 election:

m=df['Election Year'].eq('2010')
# boolean mask that selects only the 2010 rows
party=df[m].groupby(['Election Year','Party'],as_index=False)['Votes'].sum().sort_values('Votes',ascending=False).head(2)['Party'].values
# apply the mask, total the votes per party, sort in descending order, and keep the names of the top 2 parties

Finally, to get the lowest vote count for each of those two parties:

out=df[df['Party'].isin(party)].sort_values('Votes').drop_duplicates(subset=['Party'])
# keep only those parties, sort by votes ascending, and keep each party's first (lowest) row

Now, if you print out, you will get the expected output.
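As a possible alternative to sorting and dropping duplicates, the per-party minimum can also be picked with `groupby(...).idxmin()`. The sketch below is not part of the original answer. It assumes the top two parties are already known to be B and C (as confirmed in the comments), and the names `top_parties`, `subset`, and `lowest_idx` are illustrative.

```python
import pandas as pd

# Same 27 rows as in the question, built compactly (assumed equivalent layout).
df = pd.DataFrame({
    "Election Year": ["2000"] * 9 + ["2005"] * 9 + ["2010"] * 9,
    "Votes": [30, 50, 20, 26, 30, 45, 20, 46, 80,
              60, 46, 95, 60, 10, 95, 16, 65, 35,
              50, 100, 70, 26, 180, 100, 120, 46, 80],
    "Party": ["A", "B", "C"] * 9,
    "Region": (["a"] * 3 + ["b"] * 3 + ["c"] * 3) * 3,
})

# Assumption for this sketch: the top two parties of 2010 are already known.
top_parties = ["B", "C"]

# Restrict to those parties, across all years and regions.
subset = df[df["Party"].isin(top_parties)]

# idxmin gives, per party, the index label of the row with the fewest votes
# (the first such row if several rows tie).
lowest_idx = subset.groupby("Party")["Votes"].idxmin()

# Look the rows up by label and order them by vote count.
out = subset.loc[lowest_idx].sort_values("Votes")
print(out)
```

Both approaches return one row per party. `idxmin` states the intent a little more directly, while `sort_values` plus `drop_duplicates` keeps everything in a single chained expression.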

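【Comments】:

Thank you for the kind answer. The first part gives only two parties, but not the two with the highest total votes in 2010. Do I need to sort by vote count in descending order to get the two largest parties of 2010? The top two parties in 2010 are B and C.
@Dpk Oh, you're right... I've updated the answer... please take a look :)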

【Solution 2】:

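First, we extract the top 2 performers in 2010.

# 'Election Year' holds strings, so compare against '2010' rather than the integer 2010
top_2_in_2010 = df[df['Election Year'] == '2010'].groupby(['Election Year', 'Party'],
       as_index = False)['Votes'].sum().sort_values('Votes', ascending = False)['Party'][:2].values

Create a dataframe for the earlier years:

# combine both conditions into one mask; four-digit year strings compare correctly as text
df_2 = df[(df['Election Year'] < '2010') & df['Party'].isin(top_2_in_2010)]

Finally,

# sort ascending and keep the first (lowest) row of each party, so each party appears exactly once
result = df_2.sort_values('Votes', ascending = True).drop_duplicates(subset = 'Party')

Printing result gives the desired output.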
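The two solutions differ in one detail: Solution 1 searches all years for the minimum, while Solution 2 looks only at years before 2010. A rough sketch that makes this choice explicit is shown below. The helper `lowest_votes_of_top_parties` and its `past_only` flag are hypothetical names introduced for illustration and do not come from either answer. The sketch also converts the year strings to integers, which is an assumption about how one might prefer to compare years.

```python
import pandas as pd

def lowest_votes_of_top_parties(df, year, n=2, past_only=False):
    """Hypothetical helper: lowest-vote row for each of the top-n parties of `year`."""
    years = df["Election Year"].astype(int)  # year strings -> integers
    # Total votes per party in the chosen year, then the n largest totals.
    totals = df.loc[years == year].groupby("Party")["Votes"].sum()
    top = totals.nlargest(n).index
    mask = df["Party"].isin(top)
    if past_only:
        # Optionally ignore the chosen year itself and anything after it.
        mask = mask & (years < year)
    candidates = df[mask]
    # One row per party: the row holding that party's minimum vote count.
    lowest = candidates.loc[candidates.groupby("Party")["Votes"].idxmin()]
    return lowest.sort_values("Votes")

# Same 27 rows as in the question, built compactly (assumed equivalent layout).
df = pd.DataFrame({
    "Election Year": ["2000"] * 9 + ["2005"] * 9 + ["2010"] * 9,
    "Votes": [30, 50, 20, 26, 30, 45, 20, 46, 80,
              60, 46, 95, 60, 10, 95, 16, 65, 35,
              50, 100, 70, 26, 180, 100, 120, 46, 80],
    "Party": ["A", "B", "C"] * 9,
    "Region": (["a"] * 3 + ["b"] * 3 + ["c"] * 3) * 3,
})

all_years = lowest_votes_of_top_parties(df, 2010)
past_only = lowest_votes_of_top_parties(df, 2010, past_only=True)
print(all_years)
print(past_only)
```

On the question's data both variants select the same two rows, because neither party's lowest count falls in 2010. On other data they could differ, so it is worth deciding which reading of "past elections" is intended.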

【Comments】: