【Question title】: Capping the outliers
【Posted】: 2020-12-24 05:24:50
【Question】:

I have a DataFrame with 3 numeric variables. I'm trying to cap the outliers at the 0.01 and 0.99 percentiles, but it isn't working.

df[['TotalVisits', 'Total Time Spent on Website', 
'Page Views Per Visit']].describe(percentiles=[.25, .5, .75, .90, .95, .99])

This is the output

I then tried to cap the outliers like this:

q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)

df['TotalVisits'][df['TotalVisits']<= q_l] = q_l
df['TotalVisits'][df['TotalVisits']>= q_h] = q_h

But the output stays the same; I expected the maximum to become 17.

【Question discussion】:

    Tags: python-3.x pandas outliers


    【Answer 1】:

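    You are doing chained slice assignment (`df['TotalVisits'][mask] = value`), which is not guaranteed to write back to `df`: the first `[]` may return a copy, and the second assignment then modifies that copy instead of your DataFrame.

    Fix your code by using a single `.loc` call for each assignment: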

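    # 0.01 matches the 1st percentile from the question (quantile(0.00) is just the minimum)
    q_l = df['TotalVisits'].quantile(0.01)
    q_h = df['TotalVisits'].quantile(0.99)
    
    # Row mask and column label go in one .loc indexer, so the write targets df itself
    df.loc[df['TotalVisits'] <= q_l, 'TotalVisits'] = q_l
    df.loc[df['TotalVisits'] >= q_h, 'TotalVisits'] = q_h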
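A self-contained sketch of the `.loc` fix, using hypothetical toy data in place of the asker's DataFrame. Note that `quantile(0.00)` is simply the column minimum, so capping at that value never changes anything; the sketch uses 0.01 to match the percentile range stated in the question. The strict `<`/`>` comparisons give the same result as `<=`/`>=`, because values equal to a bound are already at that bound. Float data is used so that assigning the (float) quantiles does not require a dtype change.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's 'TotalVisits' column
df = pd.DataFrame({"TotalVisits": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0,
                                   6.0, 7.0, 8.0, 9.0, 10.0, 250.0]})

q_l = df["TotalVisits"].quantile(0.01)  # 1st percentile
q_h = df["TotalVisits"].quantile(0.99)  # 99th percentile

# One .loc call per assignment: the boolean row mask and the column label
# are in the same indexer, so pandas writes directly into df
df.loc[df["TotalVisits"] < q_l, "TotalVisits"] = q_l
df.loc[df["TotalVisits"] > q_h, "TotalVisits"] = q_h

print(df["TotalVisits"].describe())
```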
    

    Better still, use the built-in pandas `clip` method:

    df['TotalVisits'] = df['TotalVisits'].clip(lower=q_l, upper=q_h)
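Since the question involves three columns, `clip` can also cap all of them in one call, each against its own percentiles. A minimal sketch, assuming hypothetical sample values under the question's column names: `quantile` on a DataFrame returns a Series of per-column bounds, and `axis=1` tells `clip` to align those bounds with the DataFrame's columns.

```python
import pandas as pd

# Hypothetical sample values; only the column names come from the question
df = pd.DataFrame({
    "TotalVisits": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 250.0],
    "Total Time Spent on Website": [0.0, 10.0, 50.0, 100.0, 300.0, 500.0, 2000.0],
    "Page Views Per Visit": [0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 50.0],
})
cols = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]

# Per-column bounds: each is a Series indexed by column name
lower = df[cols].quantile(0.01)
upper = df[cols].quantile(0.99)

# axis=1 aligns the bound Series with the columns, so every column
# is clipped to its own 1st/99th percentile
df[cols] = df[cols].clip(lower=lower, upper=upper, axis=1)

print(df[cols].describe())
```

Computing the bounds once per column and passing them as Series avoids writing a separate pair of assignments for each variable.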
    

    【Discussion】:
