【Question title】: RANSACRegressor changing base_estimator properties after construction
【Posted】: 2022-01-22 20:53:40
【Question】:

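Based on the accepted answer to this question, I am trying to implement a polynomial regressor with RANSAC to fit a 5th-degree polynomial.

Let the data to fit be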

x = [0.02965717 0.10966089 0.17002236 0.19015372 0.27044443 0.33011883
 0.40844298 0.4659353  0.54051902 0.61236153 0.68116213 0.74673223
 0.82403296 0.88216575 0.96342659]

y = [3.96001134e-03 6.81505094e-04 0.00000000e+00 1.13660854e-04
 2.26741003e-03 5.64587625e-03 1.24338500e-02 1.91707798e-02
 3.02265331e-02 4.34929443e-02 5.87863985e-02 7.59236901e-02
 9.96780928e-02 1.20366687e-01 1.53936744e-01]

and the estimator used be

from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import mean_squared_error

class PolynomialRegression(object):
    def __init__(self, degree=3, coeffs=None):
        print(f"Degree: {degree}")
        self.degree = degree
        self.coeffs = coeffs

    def fit(self, X, y):
        self.coeffs = np.polyfit(X.ravel(), y, self.degree)

    def get_params(self, deep=False):
        return {'coeffs': self.coeffs}

    def set_params(self, coeffs=None, random_state=None):
        self.coeffs = coeffs

    def predict(self, X):
        poly_eqn = np.poly1d(self.coeffs)
        y_hat = poly_eqn(X.ravel())
        return y_hat

    def score(self, X, y):
        return mean_squared_error(y, self.predict(X))

The fit is done in the following snippet:

import numpy as np
ransac = RANSACRegressor(base_estimator=PolynomialRegression(degree=5),
                          residual_threshold=np.std(y),
                          random_state=0,
                          min_samples=2)
ransac.fit(np.expand_dims(x, axis=1), y)
w = np.array(ransac.estimator_.coeffs)
print(w)

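As you can see, I pass a PolynomialRegression with degree=5 to RANSACRegressor, and I expect w to have 6 components. However, when the code runs, the degree of the PolynomialRegression changes to 3 at some point, and the fit is done with that default value rather than the one I constructed it with.

Output: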

Degree: 5
Degree: 3
[ 0.07331904  0.14501533 -0.05369491  0.00492718]

How do I correctly set the degree of the fit?

【Question comments】:

    Tags: python machine-learning scikit-learn ransac


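    【Solution 1】:

    The problem is in the get_params method: it is expected to return the estimator's hyperparameters, i.e. it should return the degree of the polynomial regression, not the estimated regression coefficients. RANSACRegressor clones the estimator it is given, and the clone is rebuilt from whatever get_params returns; since degree is not among those parameters, the copy falls back to the default degree=3. See the documentation.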

    import warnings
    import numpy as np
    from sklearn.linear_model import RANSACRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.datasets import make_regression
    warnings.filterwarnings('ignore')
    
    class PolynomialRegression(object):
        def __init__(self, degree=3):
            print(f"Degree: {degree}")
            self.degree = degree
    
        def fit(self, X, y):
            self.coeffs = np.polyfit(X.ravel(), y, self.degree)
    
        def get_params(self, deep=False):
            return {'degree': self.degree}
    
        def set_params(self, **parameters):
            for parameter, value in parameters.items():
                setattr(self, parameter, value)
            return self
    
        def predict(self, X):
            poly_eqn = np.poly1d(self.coeffs)
            y_hat = poly_eqn(X.ravel())
            return y_hat
    
        def score(self, X, y):
            return mean_squared_error(y, self.predict(X))
    
    x, y = make_regression(n_features=1, random_state=42)
    
    ransac = RANSACRegressor(
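        # Note: scikit-learn 1.1 renamed this parameter to `estimator`;
        # `base_estimator` was removed in 1.3.
        base_estimator=PolynomialRegression(degree=5),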
        residual_threshold=np.std(y),
        random_state=0,
        min_samples=2
    )
    
    ransac.fit(x, y)
    
    print(ransac.estimator_.coeffs)
    # Degree: 5
    # Degree: 5
    # [ 2.15861169e-14  1.51841316e-14 -5.09828681e-14  2.71301269e-15
    #   4.17411003e+01 -5.11272743e-15]
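
To see why the original estimator printed "Degree: 3", here is a minimal sketch of the copying step. It does not use scikit-learn: `naive_clone` is a hypothetical helper that stands in for `sklearn.base.clone`, on the assumption that the real function builds a fresh instance from the result of `get_params`. `BrokenPoly` and `FixedPoly` are illustrative names for cut-down versions of the question's and the answer's classes.

```python
class BrokenPoly:
    """Like the question's estimator: get_params reports a fitted attribute."""

    def __init__(self, degree=3, coeffs=None):
        self.degree = degree
        self.coeffs = coeffs

    def get_params(self, deep=False):
        return {"coeffs": self.coeffs}  # degree is missing here


class FixedPoly:
    """Like the answer's estimator: get_params reports the constructor argument."""

    def __init__(self, degree=3):
        self.degree = degree

    def get_params(self, deep=False):
        return {"degree": self.degree}


def naive_clone(estimator):
    # Hypothetical helper, a simplified stand-in for sklearn.base.clone:
    # build a fresh, unfitted instance from whatever get_params() returns.
    return type(estimator)(**estimator.get_params(deep=False))


broken_copy = naive_clone(BrokenPoly(degree=5))
fixed_copy = naive_clone(FixedPoly(degree=5))

print(broken_copy.degree)  # 3: degree was not in get_params, so the default is used
print(fixed_copy.degree)   # 5: degree survives the copy
```

The rule of thumb this suggests: every argument of `__init__` should appear in `get_params`, and values learned in `fit` (such as `coeffs`) should not.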
    

    【Comments】:

    • This is exactly the solution, thank you. Where is it documented what get_params must return?
    • I added a link to the documentation.