【Question title】:Function to replace outliers with the lower and upper limits in Python
【Posted】:2019-02-18 19:09:00
【Question】:
from sklearn import datasets
import pandas as pd
import numpy as np

dt = datasets.load_diabetes()
data = pd.DataFrame(data=np.c_[dt['data'], dt['target']],
                    columns=dt['feature_names'] + ['target'])
data = data.drop('sex', axis = 1)

# mean +- 2sigma
# function to calculate outlier of a variable
def out1(x):
    mu = np.average(x)
    sigma = np.std(x)
    LL = mu - 2*sigma # Lower limit 
    UL = mu + 2*sigma # Upper limit
    out = [1 if (a >= UL) | (a <= LL) else 0 for a in x]
    return(out)

# check #outliers in each variable
print(data.apply(out1).apply(sum))


# Function to Replace outlier with LL / UL

def out_impute(x):
    mu = np.average(x)
    sigma = np.std(x)
    LL = mu - 2*sigma # Lower limit 
    UL = mu + 2*sigma # Upper limit
    xnew = "Enter Code Here"
    return(xnew)

data1 = data.apply(out_impute) # Create new data with imputed values

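Could someone help me replace the outliers with the lower and upper limits?

I define outliers as values >= mu + 2*sigma or <= mu - 2*sigma.

Thanks in advance!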

【Comments】:

    Tags: python pandas machine-learning statistics


    【Solution 1】:

    Use df.clip:

    LL = mu - 2*sigma # Lower limit 
    UL = mu + 2*sigma # Upper limit
    df['data'].clip(LL, UL)
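
One way the clip call could be dropped into the question's out_impute is sketched below. This is only an illustration, not the answerer's exact code: it assumes each column reaches the function as a pandas Series (which is what data.apply passes), and the `demo` frame is made-up data used purely to show the effect.

```python
import pandas as pd


def out_impute(x: pd.Series) -> pd.Series:
    """Replace values outside mean +/- 2*sigma with the nearest limit."""
    mu = x.mean()
    sigma = x.std(ddof=0)  # ddof=0 matches the np.std default used in the question
    LL = mu - 2 * sigma    # Lower limit
    UL = mu + 2 * sigma    # Upper limit
    # Values below LL become LL, values above UL become UL, the rest are kept.
    return x.clip(lower=LL, upper=UL)


# Hypothetical demo data: column "a" has one large value, column "b" has none.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
})
demo_clipped = demo.apply(out_impute)  # limits are computed separately per column
print(demo_clipped)
```

Because apply calls out_impute once per column, LL and UL are recomputed from each column's own mean and standard deviation.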
    

    【Comments】:

    • I think clip applies the same LL and UL to all columns. Is there a way to make it work with column-specific LL and UL?
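
On the column-specific limits raised in this comment, a minimal sketch: DataFrame.clip accepts a Series for lower and upper, and with axis=1 those Series are aligned with the column labels, so each column gets its own bounds. The helper name `clip_per_column`, its parameter `k`, and the `demo` frame are hypothetical and not part of the original thread.

```python
import pandas as pd


def clip_per_column(df: pd.DataFrame, k: float = 2.0) -> pd.DataFrame:
    """Clip every column to its own mean +/- k*sigma."""
    mu = df.mean()            # Series indexed by column name
    sigma = df.std(ddof=0)    # population std, as np.std computes by default
    lower = mu - k * sigma    # one lower limit per column
    upper = mu + k * sigma    # one upper limit per column
    # axis=1 aligns the limit Series with the columns of df.
    return df.clip(lower=lower, upper=upper, axis=1)


# Hypothetical demo data on two very different scales.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
})
clipped = clip_per_column(demo)
print(clipped)
```

Passing scalars to clip would indeed apply a single pair of limits to every column; passing Series avoids that without needing apply.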