Andrew Ng Machine Learning Lecture Notes, Chapter 8: Regularization

The problem of overfitting

If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

  • Underfitting => high bias
  • Overfitting => high variance

Methods for addressing overfitting

  • Reduce number of features
    Manually choose which features to keep => this discards some potentially useful information.
    Use a model selection algorithm to choose automatically.
  • Regularization
    Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
    Works well when we have lots of features, each of which contributes a bit to predicting y.

Cost function

Intuition

Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$:

  • “Simpler” hypothesis
  • Less prone to overfitting
  • We cannot know in advance which parameters to shrink, so we ask for every parameter to be small.

Cost function

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
NOTE:

  • We do not penalize $\theta_0$.
  • $\lambda$ is the regularization parameter and must be chosen carefully: if $\lambda$ is too large, all $\theta_j$ are driven near zero and the hypothesis underfits; if it is too small, overfitting remains.
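As a concrete sketch of this cost function in NumPy (the function name and array layout are my own; the design matrix is assumed to have a leading column of ones):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is an (m, n+1) design matrix whose first column is all ones,
    y is an (m,) target vector, lam is the regularization parameter.
    theta[0] is excluded from the penalty term, matching the note above.
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta_0
    return (residuals @ residuals + penalty) / (2 * m)
```

With `lam = 0` this reduces to the ordinary least-squares cost; increasing `lam` adds a growing price for large parameter values.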

Regularized linear regression

Gradient descent

Repeat until convergence:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 1, \ldots, n)$$

Equivalently, $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$: each step first shrinks $\theta_j$ slightly toward zero, then applies the usual gradient update.
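A minimal NumPy sketch of one regularized gradient-descent step for linear regression (function name is my own; the design matrix is assumed to have a leading column of ones):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta_0 is updated without the regularization term; every other
    theta_j receives an extra shrinkage of alpha * (lam / m) * theta_j.
    """
    m = len(y)
    grad = X.T @ (X @ theta - y) / m                 # unregularized gradient
    new_theta = theta - alpha * grad
    new_theta[1:] -= alpha * (lam / m) * theta[1:]   # shrink only j >= 1
    return new_theta
```

Note that with `lam = 0` and a perfect fit the parameters stay fixed, while `lam > 0` still pulls every `theta[j]` (j >= 1) toward zero each step.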

Normal equation

$$\theta = \left(X^TX + \lambda M\right)^{-1}X^Ty$$

where $M$ is the $(n+1)\times(n+1)$ identity matrix with its top-left entry set to 0, so that $\theta_0$ is not penalized. For $\lambda > 0$, $X^TX + \lambda M$ is always invertible, which also resolves the non-invertibility of $X^TX$ when $m \le n$.
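A sketch of the regularized normal equation in NumPy (function name is my own; `np.linalg.solve` is used instead of forming the explicit inverse, which is numerically preferable):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized solution theta = (X^T X + lam*M)^{-1} X^T y,
    where M is the identity with M[0, 0] = 0 so theta_0 is not penalized."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0                                  # do not regularize theta_0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```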

Regularized logistic regression

Cost function

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

Gradient descent

The update rule has the same form as for regularized linear regression, but with the sigmoid hypothesis $h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}$.
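A NumPy sketch of the regularized logistic-regression cost and gradient (function names are my own; the design matrix is assumed to have a leading column of ones and labels in {0, 1}):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost; theta_0 is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    ce = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return ce + lam * np.sum(theta[1:] ** 2) / (2 * m)

def logistic_gradient(theta, X, y, lam):
    """Gradient with the same form as the linear case, but sigmoid h."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]   # regularize only j >= 1
    return grad
```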
