1 Evaluating the Hypothesis
1.1 Training / Testing Procedure
| Training set | Test set |
| --- | --- |
| 70% | 30% |
| $(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})$ | $(x_{test}^{(1)},y_{test}^{(1)}),\dots,(x_{test}^{(m_{test})},y_{test}^{(m_{test})})$ |
The data should be randomly shuffled before being split.
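The shuffle-and-split step above can be sketched in NumPy; the dataset and the 70/30 fraction here are only illustrative:

```python
import numpy as np

# Hypothetical dataset: 100 examples with 2 features (values are made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

# Shuffle the indices, then take the first 70% for training, the rest for testing
perm = rng.permutation(len(X))
split = int(0.7 * len(X))
train_idx, test_idx = perm[:split], perm[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Shuffling first matters when the raw data is sorted (e.g. by label), since a plain head/tail split would then give unrepresentative sets.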
1.1.1 for Linear Regression
- Learn parameter θ from training data
- Compute test set error:
$$J_{test}(\theta)=\frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)})-y_{test}^{(i)}\right)^2$$
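The test cost above translates directly to code; a linear hypothesis $h_\theta(x)=\theta^T x$ is assumed, and the tiny dataset is made up for illustration:

```python
import numpy as np

def j_test(theta, X_test, y_test):
    """Squared-error test cost: 1/(2*m_test) * sum((h_theta(x) - y)^2),
    with the linear hypothesis h_theta(x) = theta^T x."""
    m_test = len(y_test)
    preds = X_test @ theta
    return np.sum((preds - y_test) ** 2) / (2 * m_test)

# Illustrative check: theta reproduces y exactly, so the test cost is 0
X = np.array([[1.0, 1.0], [1.0, 2.0]])  # first column is the bias term
theta = np.array([0.0, 2.0])            # h(x) = 2 * x1
y = np.array([2.0, 4.0])
print(j_test(theta, X, y))  # → 0.0
```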
1.1.2 for Logistic Regression
- Learn parameter θ from training data
- Compute test set error:
Option 1 — logistic (cross-entropy) cost:
$$J_{test}(\theta)=-\frac{1}{m_{test}}\sum_{i=1}^{m_{test}}\left(y_{test}^{(i)}\log h_\theta(x_{test}^{(i)})+(1-y_{test}^{(i)})\log\left(1-h_\theta(x_{test}^{(i)})\right)\right)$$
Option 2 — 0/1 misclassification error:
$$err(h_\theta(x),y)=\begin{cases}1 & \text{if } h_\theta(x)\ge 0.5 \text{ and } y=0,\ \text{or } h_\theta(x)<0.5 \text{ and } y=1\\ 0 & \text{otherwise}\end{cases}$$
$$\text{Test error}=\frac{1}{m_{test}}\sum_{i=1}^{m_{test}} err\left(h_\theta(x_{test}^{(i)}),y_{test}^{(i)}\right)$$
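A minimal sketch of the 0/1 test error, assuming `probs` holds the hypothesis outputs $h_\theta(x_{test}^{(i)})$ (all values here are illustrative, not from the lecture):

```python
import numpy as np

def misclassification_error(probs, y):
    """Fraction of test examples where thresholding h_theta(x) at 0.5
    disagrees with the true label — the averaged 0/1 err above."""
    preds = (probs >= 0.5).astype(int)
    return np.mean(preds != y)

# Illustrative values: examples 2 and 3 are misclassified
probs = np.array([0.9, 0.4, 0.6, 0.2])
y     = np.array([1,   1,   0,   0])
print(misclassification_error(probs, y))  # → 0.5
```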
2 Model Selection and Training / Validation / Test Sets
| Training set | Cross-validation set | Test set |
| --- | --- | --- |
| 60% | 20% | 20% |
| $(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})$ | $(x_{cv}^{(1)},y_{cv}^{(1)}),\dots,(x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})$ | $(x_{test}^{(1)},y_{test}^{(1)}),\dots,(x_{test}^{(m_{test})},y_{test}^{(m_{test})})$ |
- Training error:
$$J_{train}(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
- Cross-validation error:
$$J_{cv}(\theta)=\frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)})-y_{cv}^{(i)}\right)^2$$
- Test error:
$$J_{test}(\theta)=\frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)})-y_{test}^{(i)}\right)^2$$
- Model selection:
1° Train the n candidate models on the training set.
2° Evaluate each of the n models on the cross-validation set to get its cross-validation error (the value of the cost function).
3° Select the model with the lowest cross-validation error.
4° Estimate the generalization error by evaluating the model chosen in step 3° on the test set (again via the cost function).
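The four steps can be sketched with polynomial models of increasing degree standing in for the n candidates; the data, degrees, and use of `np.polyfit`/`np.polyval` as "train"/"evaluate" are all illustrative assumptions:

```python
import numpy as np

# Synthetic data: the true function is quadratic, plus a little noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1 + 2 * x + 3 * x**2 + rng.normal(scale=0.1, size=200)

x_tr, y_tr = x[:120], y[:120]        # 60% training set
x_cv, y_cv = x[120:160], y[120:160]  # 20% cross-validation set
x_te, y_te = x[160:], y[160:]        # 20% test set

def cost(coeffs, xs, ys):
    # Squared-error cost 1/(2m) * sum((h(x) - y)^2)
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2) / 2

# Steps 1°-2°: train each candidate (degree 1..5), score it on the CV set
models = [np.polyfit(x_tr, y_tr, d) for d in range(1, 6)]
cv_errors = [cost(c, x_cv, y_cv) for c in models]

# Step 3°: pick the model with the lowest cross-validation error
best = int(np.argmin(cv_errors))

# Step 4°: report the generalization error on the held-out test set
print("chosen degree:", best + 1, " test error:", cost(models[best], x_te, y_te))
```

The key point is that the test set is touched only once, after the choice is made; reusing the CV error as the generalization estimate would be optimistically biased.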
3 Diagnosing Bias (underfitting) vs. Variance (overfitting)

| High Bias (underfit) | High Variance (overfit) |
| --- | --- |
| $J_{train}(\theta)$ will be high | $J_{train}(\theta)$ will be low |
| $J_{cv}(\theta)\approx J_{train}(\theta)$ | $J_{cv}(\theta)\gg J_{train}(\theta)$ |
3.1 Regularization and Bias / Variance
- Small λ means little penalty: $J_{train}(\theta)$ is low but $J_{cv}(\theta)$ is high (overfitting / high variance). Large λ forces the parameters toward zero: both $J_{train}(\theta)$ and $J_{cv}(\theta)$ are high (underfitting / high bias). Choose the λ that minimizes $J_{cv}(\theta)$.
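One way to choose λ, sketched with regularized linear regression solved via the normal equation; the data is synthetic, the candidate λ values are arbitrary, and (as in the course) the bias term is left unregularized while $J_{cv}$ is computed *without* the regularization term:

```python
import numpy as np

# Synthetic linear data: y = 1 + 2*x + noise
rng = np.random.default_rng(2)
X_tr = np.c_[np.ones(50), rng.normal(size=50)]
y_tr = X_tr @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=50)
X_cv = np.c_[np.ones(20), rng.normal(size=20)]
y_cv = X_cv @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=20)

def fit(lam):
    # theta = (X^T X + lam * L)^-1 X^T y, with L zeroing out the bias entry
    L = np.eye(X_tr.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X_tr.T @ X_tr + lam * L, X_tr.T @ y_tr)

def j(theta, X, y):
    # Cross-validation cost WITHOUT the regularization term
    return np.mean((X @ theta - y) ** 2) / 2

lams = [0.0, 0.01, 0.1, 1.0, 10.0]
cv = [j(fit(l), X_cv, y_cv) for l in lams]
best_lam = lams[int(np.argmin(cv))]
print("best lambda:", best_lam)
```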
3.2 Learning Curves
- A learning curve plots the training error and the cross-validation error as functions of the number of training examples m.
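Computing the two curves might look like this sketch; the data is synthetic and a straight-line hypothesis is assumed, with `np.polyfit` standing in for training:

```python
import numpy as np

# Synthetic data: y = 2*x + noise; last 20 examples form the CV set
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 120)
y = 2 * x + rng.normal(scale=0.2, size=120)
x_tr, y_tr, x_cv, y_cv = x[:100], y[:100], x[100:], y[100:]

def cost(c, xs, ys):
    return np.mean((np.polyval(c, xs) - ys) ** 2) / 2

sizes = list(range(2, 101, 7))
j_train, j_cv = [], []
for m in sizes:
    c = np.polyfit(x_tr[:m], y_tr[:m], 1)   # train on the first m examples only
    j_train.append(cost(c, x_tr[:m], y_tr[:m]))  # error on those same m examples
    j_cv.append(cost(c, x_cv, y_cv))             # error on the full CV set
```

Note that $J_{train}$ is evaluated only on the m examples actually used for training (so it starts near zero), while $J_{cv}$ is always evaluated on the full cross-validation set.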
3.2.1 High Bias
- As m grows, $J_{train}(\theta)$ and $J_{cv}(\theta)$ converge to a similarly high value; getting more training data will not, by itself, help.
3.2.2 High Variance
- $J_{train}(\theta)$ stays low while $J_{cv}(\theta)$ remains much higher, leaving a persistent gap; getting more training data is likely to help.
3.2.3 Solutions
| To fix high bias | To fix high variance |
| --- | --- |
| Try getting additional features | Get more training examples |
| Try adding polynomial features | Try smaller sets of features |
| Try decreasing λ | Try increasing λ |
4 Neural Networks and Overfitting
| "small" neural network | "large" neural network |
| --- | --- |
| fewer parameters | more parameters |
| more prone to underfitting | more prone to overfitting |
| computationally cheaper | computationally more expensive |
|  | use regularization (λ) to address overfitting |
5 Reference
- Andrew Ng, Machine Learning (Coursera)
- 黄海广 (Huang Haiguang), Machine Learning notes