Getting a sub-dataframe after sorting and groupby: answers

【Question title】: Getting sub-dataframe after sorting and groupby
【Posted】: 2021-06-10 11:16:38
【Question description】:

I have a dataframe df as:

  Election Year     Votes   Vote %      Party              Region   
0   2000            42289   29.40   Janata Dal (United)     A
1   2000            27618   19.20   Rashtriya Janata Dal    A
2   2000            20886   14.50   Bahujan Samaj Party     B 
3   2000            17747   12.40   Congress                B
4   2000            14047   19.80   Independent             C
5   2000            17047   10.80   JLS                     C
6   2005            8358    15.80   Janvadi Party           A
7   2005            4428    13.10   Independent             A
8   2005            1647    1.20    Independent             B
9   2005            1610    11.10   Independent             B
10  2005            1334    15.06   Nationalist             C
11  2005            1834    18.06   NJM                     C
12  2010            21114   20.80   Independent             A
13  2010            1042    10.5    Bharatiya Janta Dal     A
14  2010            835     0.60    Independent             B
15  2010            14305   15.50   Independent             B
16  2010            22211   17.70   Congress                C
17  2010            20011   14.70   INC                     C

How can I get the list of regions in which, for each election year, two or more parties have a vote percentage above 10?

Expected output:

Election Year    Region    Vote %
  2000             A        29.40
  2000             A        19.20
  2000             C        19.80
  2000             C        10.80
  2005             A        15.80
  2005             A        13.10
  2005             C        15.06
  2005             C        18.06
  2010             A        20.80
  2010             A        10.5
  2010             C        17.70
  2010             C        14.70

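The output should contain only the regions whose vote percentages are above 10% in every year, with election year and region name sorted in ascending order. So here only regions "A" and "C" appear in the output.

I used the following code to sort 'Vote %' in descending order after grouping by 'Election Year' and 'Region', planning to then compare the top 2 vote percentages for each year, but it raises an error.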

df1 = df.groupby(['Election Year','Region'])sort_values('Vote %', ascending = False).reset_index()

【Question comments】:

Please don't ask the same question over and over: stackoverflow.com/questions/67917504/…

Tags: python pandas dataframe

【Solution 1】:

Try groupby filter:

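cols = ['Election Year', 'Region', 'Vote %']
df1 = (
    df.groupby('Region')
        # keep a region only if every one of its rows has Vote % >= 10
        .filter(lambda g: g['Vote %'].ge(10).all())
        # year and region ascending, vote share descending
        .sort_values(cols, ascending=(True, True, False))
    [cols].reset_index(drop=True)
)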

df1:

    Election Year Region  Vote %
0            2000      A   29.40
1            2000      A   19.20
2            2000      C   19.80
3            2000      C   10.80
4            2005      A   15.80
5            2005      A   13.10
6            2005      C   18.06
7            2005      C   15.06
8            2010      A   20.80
9            2010      A   10.50
10           2010      C   17.70
11           2010      C   14.70
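
Note that the filter above groups by Region alone, so it keeps a region only when all of its rows across every year are at least 10. If the question is read literally (two or more parties above 10 within each election year), a per-year count may be closer to the intent. The following is a minimal sketch under that assumption; `regions_with_two_plus`, `threshold`, and `min_parties` are hypothetical names not taken from the answer, and all rows of a qualifying region are kept. On this data both readings select regions A and C.

```python
import pandas as pd

# Same 'Election Year', 'Region' and 'Vote %' values as the question's data.
df = pd.DataFrame({
    'Election Year': [2000] * 6 + [2005] * 6 + [2010] * 6,
    'Region': ['A', 'A', 'B', 'B', 'C', 'C'] * 3,
    'Vote %': [29.4, 19.2, 14.5, 12.4, 19.8, 10.8, 15.8, 13.1, 1.2,
               11.1, 15.06, 18.06, 20.8, 10.5, 0.6, 15.5, 17.7, 14.7],
})


def regions_with_two_plus(df, threshold=10, min_parties=2):
    """Hypothetical helper: keep regions that have at least `min_parties`
    rows above `threshold` in every election year."""
    above = df['Vote %'].gt(threshold)
    # Number of qualifying parties per (year, region) pair.
    counts = above.groupby([df['Election Year'], df['Region']]).sum()
    # A region qualifies only if every one of its years meets the minimum.
    ok = counts.ge(min_parties).groupby(level='Region').all()
    keep = ok[ok].index
    cols = ['Election Year', 'Region', 'Vote %']
    return (
        df[df['Region'].isin(keep)]
        .sort_values(cols, ascending=[True, True, False])[cols]
        .reset_index(drop=True)
    )


result = regions_with_two_plus(df)
print(result)
```

Region B drops out here because in 2005 and 2010 only one of its two parties is above 10.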

The df used:

import pandas as pd

df = pd.DataFrame({
    'Election Year': [2000, 2000, 2000, 2000, 2000, 2000, 2005, 2005, 2005,
                      2005, 2005, 2005, 2010, 2010, 2010, 2010, 2010, 2010],
    'Votes': [42289, 27618, 20886, 17747, 14047, 17047, 8358, 4428, 1647, 1610,
              1334, 1834, 21114, 1042, 835, 14305, 22211, 20011],
    'Vote %': [29.4, 19.2, 14.5, 12.4, 19.8, 10.8, 15.8, 13.1, 1.2, 11.1, 15.06,
               18.06, 20.8, 10.5, 0.6, 15.5, 17.7, 14.7],
    'Party': ['Janata Dal (United)', 'Rashtriya Janata Dal',
              'Bahujan Samaj Party', 'Congress', 'Independent', 'JLS',
              'Janvadi Party', 'Independent', 'Independent', 'Independent',
              'Nationalist', 'NJM', 'Independent', 'Bharatiya Janta Dal',
              'Independent', 'Independent', 'Congress', 'INC'],
    'Region': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'A',
               'A', 'B', 'B', 'C', 'C']
})
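
As for the attempt in the question, it calls `sort_values` on a groupby object, which pandas groupby objects do not provide, so it cannot run as written. The idea itself (sort by 'Vote %' descending, then compare the top two per year and region) can be made to work by sorting first and grouping afterwards. This is only a sketch of that idea, assuming the same column names; the variable names `keys`, `top2`, `stats`, and `regions` are illustrative.

```python
import pandas as pd

# Same 'Election Year', 'Region' and 'Vote %' values as the question's data.
df = pd.DataFrame({
    'Election Year': [2000] * 6 + [2005] * 6 + [2010] * 6,
    'Region': ['A', 'A', 'B', 'B', 'C', 'C'] * 3,
    'Vote %': [29.4, 19.2, 14.5, 12.4, 19.8, 10.8, 15.8, 13.1, 1.2,
               11.1, 15.06, 18.06, 20.8, 10.5, 0.6, 15.5, 17.7, 14.7],
})

keys = ['Election Year', 'Region']

# Sort the frame first, then take the two highest shares per (year, region).
top2 = df.sort_values('Vote %', ascending=False).groupby(keys).head(2)

# Per group: the smaller of the top two shares, and how many rows were found.
stats = top2.groupby(keys)['Vote %'].agg(['min', 'count'])

# A (year, region) pair passes if it has two rows and both exceed 10.
pair_ok = stats['min'].gt(10) & stats['count'].ge(2)

# A region passes only if all of its years pass.
region_ok = pair_ok.groupby(level='Region').all()
regions = sorted(region_ok[region_ok].index)
print(regions)  # ['A', 'C']
```

The `count` check guards against groups with a single row, where the minimum of the "top two" would otherwise be that lone value.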

【Comments】: