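Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the regression coefficients. The ridge coefficients minimize a penalized residual sum of squares:
$$\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$
Here $\alpha \ge 0$ is the complexity parameter that controls the amount of shrinkage:
the larger $\alpha$, the greater the shrinkage, and the more robust the coefficients become to collinearity.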
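The collinearity claim can be checked with a minimal NumPy sketch. It assumes the closed-form ridge solution $(X^TX + \alpha I)^{-1}X^Ty$ without an intercept; the helper names `ridge_fit` and `coef_spread`, the synthetic data and the noise level are illustrative assumptions, not part of the original text.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution without intercept (hypothetical helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-6 * rng.normal(size=50)   # almost an exact copy of x1
X = np.column_stack([x1, x2])

def coef_spread(alpha, n_trials=20):
    """Largest standard deviation of the coefficients over noisy refits."""
    coefs = []
    for _ in range(n_trials):
        y = x1 + x2 + 0.1 * rng.normal(size=50)
        coefs.append(ridge_fit(X, y, alpha))
    return np.std(coefs, axis=0).max()

spread_tiny = coef_spread(alpha=1e-12)   # practically ordinary least squares
spread_ridge = coef_spread(alpha=1.0)
print(spread_tiny, spread_ridge)
```

With a negligible penalty the two nearly identical columns receive huge, unstable coefficients of opposite sign; with `alpha=1.0` the coefficients barely move between refits.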
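1. Problems with ordinary linear regression
On complex data, ordinary linear regression runs into several problems, chiefly:
- Prediction accuracy: this depends on the relationship between the number of samples $n$ and the number of features $p$:
  - when $n \gg p$, least squares regression has low variance;
  - when $n \approx p$, it tends to overfit;
  - when $n < p$, least squares regression yields no meaningful result.
- Model interpretability: if the features in the model are correlated with one another, they add complexity without improving the model's explanatory power. In that case we need feature selection.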
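Why least squares breaks down when there are fewer samples than features can be sketched in a few lines of NumPy. The sizes, the random data and the value of `alpha` below are illustrative assumptions: the Gram matrix $X^TX$ is singular in this regime, so the normal equations have no unique solution, while adding $\alpha I$ makes the system well posed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 10             # fewer samples than features
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

gram = X.T @ X                             # 10 x 10, but rank at most 5
min_eig_ols = np.linalg.eigvalsh(gram).min()

alpha = 0.1                                # illustrative penalty strength
ridge_gram = gram + alpha * np.eye(n_features)
min_eig_ridge = np.linalg.eigvalsh(ridge_gram).min()

# The penalized system is positive definite, so it has a unique solution.
w = np.linalg.solve(ridge_gram, X.T @ y)
print(min_eig_ols, min_eig_ridge)
```

The smallest eigenvalue of `gram` is zero up to rounding, whereas every eigenvalue of `ridge_gram` is at least `alpha`.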
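These problems show up mainly as the variance and bias of the model.
[Figure: the bias-variance trade-off; taken from Machine Learning in Action]
Variance refers to the differences between models fitted on different samples, while bias refers to the difference between the model's predictions and the data. We need to find a compromise between the two.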
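2. The concept of ridge regression
Feature selection is generally done in one of three ways:
- Subset selection
- Shrinkage methods, also called regularization, which mainly include ridge regression and lasso regression.
- Dimensionality reduction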
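Ridge regression adds a regularization term to the squared error:
$$J(w) = \|y - Xw\|_2^2 + \alpha \|w\|_2^2, \quad \alpha > 0$$
Choosing the value of $\alpha$ balances variance against bias: as $\alpha$ increases, the model's variance decreases and its bias increases.
Differentiating with respect to $w$ gives
$$\frac{\partial J}{\partial w} = -2X^T(y - Xw) + 2\alpha w$$
Setting this to zero yields the value of $w$:
$$\hat{w} = (X^TX + \alpha I)^{-1}X^Ty$$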
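A quick numerical check of this derivation, as a sketch on made-up data: it computes $\hat{w} = (X^TX + \alpha I)^{-1}X^Ty$ and evaluates the gradient $-2X^T(y - Xw) + 2\alpha w$ at that point, which should vanish. The data, the coefficients used to generate `y` and the value of `alpha` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
alpha = 0.5

# Closed-form minimizer: (X^T X + alpha I)^{-1} X^T y
w_hat = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Gradient of ||y - Xw||^2 + alpha ||w||^2, evaluated at w_hat
grad = -2 * X.T @ (y - X @ w_hat) + 2 * alpha * w_hat
print(np.abs(grad).max())   # zero up to floating-point rounding
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common choice for numerical stability; it is not something the original text prescribes.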
3. Experiments
We now examine how different values of $\alpha$ affect the model as a whole.
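Before turning to scikit-learn, the effect of the penalty on bias and variance can be simulated directly. This is a sketch under stated assumptions: the ground-truth vector `w_true`, the fixed design matrix, unit-variance Gaussian noise and the helper `bias_variance` are all hypothetical, and the estimate is the closed-form ridge solution without an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_trials = 20, 5, 500
w_true = np.array([2.0, -1.0, 0.5, 0.0, 1.5])    # assumed ground truth
X = rng.normal(size=(n_samples, n_features))       # fixed design matrix
# The same noisy targets are reused for every alpha, one row per trial.
targets = X @ w_true + rng.normal(size=(n_trials, n_samples))

def bias_variance(alpha):
    """Return (squared bias, total variance) of the ridge estimate."""
    A = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    estimates = targets @ A.T                       # (n_trials, n_features)
    bias_sq = np.sum((estimates.mean(axis=0) - w_true) ** 2)
    variance = estimates.var(axis=0).sum()
    return bias_sq, variance

results = {alpha: bias_variance(alpha) for alpha in (0.01, 1.0, 100.0)}
for alpha, (bias_sq, variance) in results.items():
    print(f"alpha={alpha:<6} bias^2={bias_sq:.4f} variance={variance:.4f}")
```

As the penalty grows, the spread of the estimates across trials shrinks while their average drifts away from `w_true`.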
Like other linear models, Ridge takes the arrays X, y in its fit method and stores the fitted coefficients of the linear model in its coef_ member:
>>> from sklearn import linear_model
>>> clf = linear_model.Ridge(alpha=.5)
>>> clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
normalize=False, random_state=None, solver='auto', tol=0.001)
>>> clf.coef_
array([ 0.34545455, 0.34545455])
>>> clf.intercept_
0.13636...
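Plot ridge coefficients as a function of the regularization
Ridge regression is the estimator used in this example. Each color represents a different feature of the coefficient vector, displayed as a function of the regularization parameter. At the end of the path, as alpha tends toward zero and the solution tends toward ordinary least squares, the coefficients exhibit large oscillations.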
# Author: Fabian Pedregosa -- <[email protected]>
# License: BSD 3 clause
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
###############################################################################
# Compute paths: a is alpha, the regularization strength. Sweep 200
# log-spaced values from 1e-10 to 1e-2 and record the coefficients for each.
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
clf = linear_model.Ridge(fit_intercept=False)
coefs = []
for a in alphas:
    clf.set_params(alpha=a)
    clf.fit(X, y)
    coefs.append(clf.coef_)
###############################################################################
# Display results
ax = plt.gca()
# set_color_cycle was removed from matplotlib; set_prop_cycle replaces it
ax.set_prop_cycle(color=['b', 'r', 'g', 'c', 'k', 'y', 'm'])
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
class sklearn.linear_model.RidgeClassifier(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)
Classifier using Ridge regression.
Read more in the User Guide.
Parameters:
- alpha : float
- class_weight : dict or 'balanced', optional
- copy_X : boolean, optional, default True
- fit_intercept : boolean
- max_iter : int, optional
- normalize : boolean, optional, default False
- solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag'}
- tol : float
- random_state : int seed, RandomState instance, or None (default)
Attributes:
- coef_ : array, shape (n_features,) or (n_classes, n_features)
- intercept_ : float | array, shape = (n_targets,)
- n_iter_ : array or None, shape (n_targets,)
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
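The one-versus-all idea can be sketched in plain NumPy, as an illustration of the note above rather than the library's actual code. Assumptions: three synthetic Gaussian blobs, each class encoded as a +1 / -1 target column, and a closed-form multi-output ridge fit in which centering `X` and `Y` stands in for an unpenalized intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 2D (illustrative data).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + 0.5 * rng.normal(size=(90, 2))

# Encode each class as a +1 / -1 target column (one-versus-all).
Y = np.where(labels[:, None] == np.arange(3), 1.0, -1.0)

# Fit all target columns at once with the closed-form ridge solution;
# centering X and Y plays the role of the intercept.
alpha = 1.0
X_mean, Y_mean = X.mean(axis=0), Y.mean(axis=0)
Xc = X - X_mean
W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(2), Xc.T @ (Y - Y_mean))

scores = (X - X_mean) @ W + Y_mean    # one score column per class
predicted = scores.argmax(axis=1)     # the class with the highest score wins
accuracy = (predicted == labels).mean()
print(accuracy)
```

Each column of `W` is a separate "this class versus the rest" regressor, and a single linear solve fits them all because the targets share the same design matrix.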
Methods
| Method | Description |
|---|---|
| decision_function(X) | Predict confidence scores for samples. |
| fit(X, y[, sample_weight]) | Fit Ridge regression model. |
| get_params([deep]) | Get parameters for this estimator. |
| predict(X) | Predict class labels for samples in X. |
| score(X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
| set_params(\*\*params) | Set the parameters of this estimator. |
- __init__(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)
- decision_function(X)
  Predict confidence scores for samples.
  The confidence score for a sample is the signed distance of that sample to the hyperplane.
  Parameters:
    X : {array-like, sparse matrix}, shape = (n_samples, n_features)
      Samples.
  Returns:
    array, shape = (n_samples,) if n_classes == 2 else (n_samples, n_classes)
      Confidence scores per (sample, class) combination. In the binary case, the confidence score is for self.classes_[1], where > 0 means this class would be predicted.
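A small sketch of how the binary scores relate to the predicted labels, assuming scikit-learn is installed. The four-sample dataset and the string labels are made up for illustration; `RidgeClassifier`, `decision_function`, `predict` and `classes_` are the documented names.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array(["neg", "neg", "pos", "pos"])

clf = RidgeClassifier(alpha=1.0).fit(X, y)
scores = clf.decision_function(X)    # shape (n_samples,) in the binary case

# A positive score selects classes_[1], otherwise classes_[0].
manual = clf.classes_[(scores > 0).astype(int)]
print(scores, manual, clf.predict(X))
```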
- fit(X, y, sample_weight=None)
  Fit Ridge regression model.
  Parameters:
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
      Training data.
    y : array-like, shape = [n_samples]
      Target values.
    sample_weight : float or numpy array of shape (n_samples,)
      Sample weight.
      New in version 0.17: sample_weight support to Classifier.
  Returns:
    self : returns an instance of self.
- get_params(deep=True)
  Get parameters for this estimator.
  Parameters:
    deep : boolean, optional
      If True, will return the parameters for this estimator and contained subobjects that are estimators.
  Returns:
    params : mapping of string to any
      Parameter names mapped to their values.
- predict(X)
  Predict class labels for samples in X.
  Parameters:
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
      Samples.
  Returns:
    C : array, shape = [n_samples]
      Predicted class label per sample.
- score(X, y, sample_weight=None)
  Returns the mean accuracy on the given test data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the full label set of each sample be predicted correctly.
  Parameters:
    X : array-like, shape = (n_samples, n_features)
      Test samples.
    y : array-like, shape = (n_samples) or (n_samples, n_outputs)
      True labels for X.
    sample_weight : array-like, shape = [n_samples], optional
      Sample weights.
  Returns:
    score : float
      Mean accuracy of self.predict(X) wrt. y.
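For single-label problems the mean accuracy is simply the fraction of correct predictions, which the following sketch verifies. It assumes scikit-learn is installed; the synthetic data and the labeling rule are illustrative.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
# Illustrative labels: the sign of a noisy copy of the first feature.
y = (X[:, 0] + 0.3 * rng.normal(size=40) > 0).astype(int)

clf = RidgeClassifier().fit(X, y)
manual_accuracy = np.mean(clf.predict(X) == y)
print(clf.score(X, y), manual_accuracy)
```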
- set_params(**params)
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
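The two cases can be sketched as follows, assuming scikit-learn is installed. The pipeline is an illustrative choice; `make_pipeline` names each step after its lowercased class name, which is where the `ridgeclassifier__alpha` key comes from.

```python
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
clf = RidgeClassifier(alpha=1.0)
clf.set_params(alpha=0.1)

# Nested object: <component>__<parameter> reaches into a pipeline step.
pipe = make_pipeline(StandardScaler(), RidgeClassifier())
pipe.set_params(ridgeclassifier__alpha=10.0)

print(clf.get_params()["alpha"])
print(pipe.get_params()["ridgeclassifier__alpha"])
```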
Reference: http://blog.csdn.net/google19890102/article/details/27228279