Andrew Ng Machine Learning Lecture Notes, Chapter 8: Regularization

The problem of overfitting

If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

  • Underfitting => high bias
  • Overfitting => high variance

Methods for addressing overfitting

  • Reduce number of features
    Manually choose which features to keep => this discards some potentially useful information.
    Use a model selection algorithm to choose automatically.
  • Regularization
    Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
    Works well when we have lots of features, each of which contributes a bit to predicting y.

Cost function

Intuition

Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$:

  • “Simpler” hypothesis
  • Less prone to overfitting
  • We cannot know in advance which parameters to shrink, so we ask for every parameter to be small.

Cost function

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
NOTE:

  • We do not penalize $\theta_0$.
  • $\lambda$ is the regularization parameter and must be chosen carefully: if $\lambda$ is too large, all $\theta_j$ are driven near zero and the hypothesis underfits; if it is too small, overfitting remains.
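As a concrete sketch of this cost function in NumPy (the function name and array layout are my own; the design matrix is assumed to have a leading column of ones):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is an (m, n+1) design matrix whose first column is all ones,
    y is an (m,) target vector, lam is the regularization parameter.
    theta[0] is excluded from the penalty term, matching the note above.
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta_0
    return (residuals @ residuals + penalty) / (2 * m)
```

With `lam = 0` this reduces to the ordinary least-squares cost; increasing `lam` adds a growing price for large parameter values.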

Regularized linear regression

Gradient descent

Repeat until convergence:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 1, \ldots, n)$$

Equivalently, $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$: each step first shrinks $\theta_j$ slightly toward zero, then applies the usual gradient update.
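A minimal NumPy sketch of one regularized gradient-descent step for linear regression (function name is my own; the design matrix is assumed to have a leading column of ones):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta_0 is updated without the regularization term; every other
    theta_j receives an extra shrinkage of alpha * (lam / m) * theta_j.
    """
    m = len(y)
    grad = X.T @ (X @ theta - y) / m                 # unregularized gradient
    new_theta = theta - alpha * grad
    new_theta[1:] -= alpha * (lam / m) * theta[1:]   # shrink only j >= 1
    return new_theta
```

Note that with `lam = 0` and a perfect fit the parameters stay fixed, while `lam > 0` still pulls every `theta[j]` (j >= 1) toward zero each step.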

Normal equation

$$\theta = \left(X^TX + \lambda M\right)^{-1}X^Ty$$

where $M$ is the $(n+1)\times(n+1)$ identity matrix with its top-left entry set to 0, so that $\theta_0$ is not penalized. For $\lambda > 0$, $X^TX + \lambda M$ is always invertible, which also resolves the non-invertibility of $X^TX$ when $m \le n$.
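A sketch of the regularized normal equation in NumPy (function name is my own; `np.linalg.solve` is used instead of forming the explicit inverse, which is numerically preferable):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized solution theta = (X^T X + lam*M)^{-1} X^T y,
    where M is the identity with M[0, 0] = 0 so theta_0 is not penalized."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0                                  # do not regularize theta_0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```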

Regularized logistic regression

Cost function

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

Gradient descent

The update rule has the same form as for regularized linear regression, but with the sigmoid hypothesis $h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}$.
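A NumPy sketch of the regularized logistic-regression cost and gradient (function names are my own; the design matrix is assumed to have a leading column of ones and labels in {0, 1}):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost; theta_0 is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    ce = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return ce + lam * np.sum(theta[1:] ** 2) / (2 * m)

def logistic_gradient(theta, X, y, lam):
    """Gradient with the same form as the linear case, but sigmoid h."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]   # regularize only j >= 1
    return grad
```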
