Add a column to a data frame based on percentiles calculated by group

【Question title】: Add a column to a data frame based on percentiles calculated by group
【Posted】: 2016-12-24 05:07:14
【Question description】:

I have a data frame that looks like the following table:

Group    Value
  A       0.20
  A       0.86
  A       1.42
  A       0.35
  B       1.77
  B       0.56
  B       0.21
  .        .
  .        .

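I want to add a column, Alert, that takes one of two possible values:

'1' if the Value is above the (1 - thr) percentile or below the thr percentile of the Values in its Group, where thr is a user-defined threshold
'0' otherwise.

For example, suppose the (1 - thr) and thr percentiles of Value in Group A are 1.0 and 0.25, respectively; the corresponding values of the new column (call it Alert) would be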

    Group    Value   Alert
      A       0.20     1
      A       0.86     0
      A       1.42     1
      A       0.35     0
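
The rule can be checked against this example with a minimal sketch. The bounds 0.25 and 1.0 are the values assumed in the example above, and the names `a`, `lower` and `upper` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Group A rows from the example table
a = pd.DataFrame({"Group": ["A"] * 4, "Value": [0.20, 0.86, 1.42, 0.35]})

# Bounds assumed in the example: thr percentile and (1 - thr) percentile
lower, upper = 0.25, 1.0

# Alert is 1 when Value falls outside [lower, upper], 0 otherwise
a["Alert"] = ((a["Value"] < lower) | (a["Value"] > upper)).astype(int)
print(a)
```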

I have tried the following:

def make_alert(x, thr):
    if x >= np.percentile(x, 1 - thr) | x <= np.percentile(x, thr):
        return 0
    else:
        return 1

pdf.groupby('Name').apply(lambda x: make_alert(x['Value'], AlertThr))

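But this does not work, because my function is applied to each element of the column, so the upper and lower bounds for each group are never computed.

Can anyone offer a hint on how to do this?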

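【Question discussion】:

Tags: python-2.7 pandas pandas-groupby

【Solution 1】:

I think... (I'm new to this myself) that using .apply this way means the function is applied to the contents of the 'Name' column. Instead, consider...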

df['Alert'] = df['Value'].apply(your_function)

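【Discussion】:

So, in this case, you create a new column ['Alert'] that holds the result of applying your make_alert function to the values in the ['Value'] column. I'm not sure how you would pass two arguments to your function, though, since I have only used .apply to pass the contents of a cell. Maybe check the documentation.
If I do pdf.groupby('Group').quantile([thr, 1 - thr]), I get a table containing the upper and lower decision bounds for each Group; what I don't see is how to use the values in that table to compute the alerts I need.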
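
One possible way to use per-group bounds without building a separate quantile table is `groupby(...).transform`, which returns a result aligned with the original rows. The following is a sketch rather than an answer from the thread: the sample data and `thr = 0.25` are assumptions, and `lower` and `upper` are hypothetical names. Note that `Series.quantile` takes a fraction between 0 and 1, whereas `np.percentile` expects a value between 0 and 100.

```python
import pandas as pd

# Sample data taken from the rows shown in the question
pdf = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B"],
    "Value": [0.20, 0.86, 1.42, 0.35, 1.77, 0.56, 0.21],
})
thr = 0.25  # assumed user-defined threshold, as a fraction in [0, 1]

grouped = pdf.groupby("Group")["Value"]

# transform broadcasts each group's quantile back onto that group's rows,
# so lower and upper have the same index as pdf
lower = grouped.transform(lambda s: s.quantile(thr))
upper = grouped.transform(lambda s: s.quantile(1 - thr))

# Alert is 1 when Value lies outside its own group's bounds, 0 otherwise
pdf["Alert"] = ((pdf["Value"] < lower) | (pdf["Value"] > upper)).astype(int)
print(pdf)
```

Because `lower` and `upper` are already row-aligned, the comparison is a plain vectorized expression and no merge against the quantile table is needed.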