【Question title】: python pandas find percentile for a group in column
【Posted】: 2021-04-29 16:40:38
【Question description】:

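I want to compute a percentile for each column, add it to the df DataFrame, and attach a label:

If a column's value is

in the top 20% (value > 80th percentile), then "Strong"

in the bottom 20% (value < 20th percentile), then "Weak"

otherwise "Average"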

Below is my DataFrame:

  import numpy as np
  import pandas as pd
  
  df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                     'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
                     'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
                     'X3': [23,78,95,52,60,76,68,92,34,76,34,12]})
  df

Below is my attempt:

  df['X1_percentile'] = df.X1.rank(pct=True)
  df['X1_segment'] = np.where(df['X1_percentile'] > 0.8, 'Strong',
                              np.where(df['X1_percentile'] < 0.20, 'Weak', 'Average'))

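But I want to do this for each month and for every column. If possible, can this be automated with a function that works for any number of columns, and also name the new columns colname+"_per" and colname+"_segment" for each column? Thanks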
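A minimal sketch of such a function, under stated assumptions: `add_percentile_segments` is a hypothetical name (not from pandas or the original post), every column other than the grouping column is assumed to be numeric, and the thresholds `hi`/`lo` default to the 0.8/0.2 cut-offs with the same strict comparisons as the attempt above. It ranks within each month with `groupby(...).rank(pct=True)` and writes `<col>_per` and `<col>_segment` for every column, using the capitalized labels from the question.

```python
import numpy as np
import pandas as pd


def add_percentile_segments(df, group_col, cols=None, hi=0.8, lo=0.2):
    """Hypothetical helper: add <col>_per and <col>_segment for each column.

    Percentiles are within-group percentile ranks from rank(pct=True).
    Assumes the ranked columns are numeric.
    """
    out = df.copy()
    if cols is None:
        # Default: every column except the grouping column
        cols = [c for c in df.columns if c != group_col]
    # Percentile rank of each value within its group (ties share the average rank)
    pct = df.groupby(group_col)[cols].rank(pct=True)
    for col in cols:
        out[col + '_per'] = pct[col]
        # First true condition wins; anything else falls through to 'Average'
        out[col + '_segment'] = np.select(
            [pct[col] > hi, pct[col] < lo], ['Strong', 'Weak'], default='Average'
        )
    return out


df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
                   'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
                   'X3': [23,78,95,52,60,76,68,92,34,76,34,12]})

result = add_percentile_segments(df, 'month')
print(result[['month', 'X1', 'X1_per', 'X1_segment']])
```

Returning a copy rather than mutating `df` is a design choice here; assigning the result back (`df = add_percentile_segments(df, 'month')`) gives the in-place behaviour if that is preferred.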

【Question discussion】:

    Tags: pandas percentile


    【Solution 1】:

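    We can use groupby + rank with the optional parameter pct=True to compute each value's rank as a percentile rank within its month, then use np.select to bin the percentile values into discrete labels.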
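To make the percentile rank concrete, here is a small sketch using the month-2 values of X1 from the question. `rank()` defaults to `method='average'`, so tied values share the mean of the ranks they occupy, and `pct=True` divides each rank by the number of non-null values in the group. These are the same numbers that appear in the X1_per column for month 2 in the output below.

```python
import numpy as np
import pandas as pd

# Month-2 values of X1 from the question's DataFrame
s = pd.Series([10, 4, 6, 5, 10, 24, 21])

# Default method='average': the two 10s occupy ranks 4 and 5, so each gets 4.5
ranks = s.rank()
print(ranks.tolist())  # [4.5, 1.0, 3.0, 2.0, 4.5, 7.0, 6.0]

# pct=True rescales each rank by the number of non-null values (7 here)
pct = s.rank(pct=True)
print(np.allclose(pct, ranks / len(s)))  # True
```

Inside `df.groupby('month').rank(pct=True)` the same computation runs separately for each month, so the divisor is 5 for month 1 and 7 for month 2.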

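    import numpy as np
    
    # Percentile rank of each value within its month (the group key column is excluded)
    p = df.groupby('month').rank(pct=True)
    
    df[p.columns + '_per'] = p
    # np.select takes the first true condition: > 0.8 -> 'strong', < 0.2 -> 'weak', else 'average'
    df[p.columns + '_seg'] = np.select([p.gt(.8), p.lt(.2)], ['strong', 'weak'], 'average')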
    

       month  X1   X2  X3    X1_per    X2_per    X3_per   X1_seg   X2_seg   X3_seg
    0      1  30   10  23  0.600000  0.200000  0.200000  average  average  average
    1      1  42   76  78  1.000000  0.800000  0.800000   strong  average  average
    2      1  25  100  95  0.400000  1.000000  1.000000  average   strong   strong
    3      1  32   23  52  0.800000  0.400000  0.400000  average  average  average
    4      1  12   65  60  0.200000  0.600000  0.600000  average  average  average
    5      2  10   94  76  0.642857  1.000000  0.785714  average   strong  average
    6      2   4   67  68  0.142857  0.500000  0.571429     weak  average  average
    7      2   6   24  92  0.428571  0.142857  1.000000  average     weak   strong
    8      2   5   67  34  0.285714  0.500000  0.357143  average  average  average
    9      2  10   54  76  0.642857  0.285714  0.785714  average  average  average
    10     2  24   87  34  1.000000  0.857143  0.357143   strong   strong  average
    11     2  21   81  12  0.857143  0.714286  0.142857   strong  average     weak
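The question states the rule in terms of values ("value > 80th percentile"), whereas the answer compares each value's percentile rank against 0.8 and 0.2. With only 5 and 7 rows per month the two readings can disagree. The sketch below is an alternative technique, not the answer's method: it compares each value with its month's 80th/20th percentile values from `Series.quantile` (pandas' default linear interpolation), broadcast back to the rows with `groupby(...).transform`. The `_qseg` suffix is a hypothetical name chosen only to keep these columns apart from `_seg`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'month': ['1','1','1','1','1','2','2','2','2','2','2','2'],
                   'X1': [30,42,25,32,12,10,4,6,5,10,24,21],
                   'X2': [10,76,100,23,65,94,67,24,67,54,87,81],
                   'X3': [23,78,95,52,60,76,68,92,34,76,34,12]})

for col in ['X1', 'X2', 'X3']:
    grp = df.groupby('month')[col]
    # Per-month 80th/20th percentile *values*, repeated for every row of that month
    upper = grp.transform(lambda s: s.quantile(0.8))
    lower = grp.transform(lambda s: s.quantile(0.2))
    # Hypothetical '_qseg' columns: value-threshold labels instead of rank-based ones
    df[col + '_qseg'] = np.select([df[col] > upper, df[col] < lower],
                                  ['strong', 'weak'], 'average')

print(df[['month', 'X1', 'X1_qseg']])
```

With this data the value-threshold version differs from the X1_seg column above only at rows 4 (X1 = 12) and 8 (X1 = 5), which become 'weak' because they fall below their month's 20th-percentile values (22.4 and 5.2). Which definition is appropriate depends on what "top 20%" is meant to capture.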
    

    【Discussion】:
