Machine Learning: Class Two: Regression

W and B are the parameters: W(i) is the weight of the i-th feature, and B is the bias.
X(i) are the features (the inputs the model uses).
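The notes name the parts of the model but do not write it out. Assuming this is the linear model used in this class, and writing w_i for W(i), b for B and x_i for X(i), each function in the model set has this form:

```latex
y = b + \sum_i w_i x_i
```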

step 2: Goodness of Function
Loss function L: evaluates how bad a function is.
Input: a function; output: how bad it is.
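Below is a minimal sketch of such a loss. It assumes the squared error that is common for regression. As in the definition, `loss` takes a function as input and returns a single number. The helper `make_linear` and the toy data are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence

def make_linear(w: float, b: float) -> Callable[[float], float]:
    # One candidate function from the linear model y = b + w * x (single feature)
    return lambda x: b + w * x

def loss(f: Callable[[float], float], xs: Sequence[float], ys: Sequence[float]) -> float:
    # Sum of squared differences between the true outputs and f's predictions:
    # the larger the value, the worse the function.
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # hypothetical data generated by y = 2x + 1

print(loss(make_linear(2.0, 1.0), xs, ys))  # → 0.0 (perfect fit)
print(loss(make_linear(0.0, 0.0), xs, ys))  # → 83.0 (poor fit)
```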

step 3: Pick the ‘Best’ Function
The best function is the one that minimizes the loss function L.
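Here is the same idea as a formula. It assumes the squared-error loss and the linear model y = b + Σ w_i x_i, with ŷ^n standing for the true output of the n-th training example:

```latex
f^* = \arg\min_f L(f)
w^*, b^* = \arg\min_{w, b} L(w, b) = \arg\min_{w, b} \sum_n \left( \hat{y}^n - \left( b + \sum_i w_i x_i^n \right) \right)^2
```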

1. Basic Introduction:

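gradient: the vector of partial derivatives of the loss with respect to each parameter, e.g. ∇L = [∂L/∂w, ∂L/∂b]
η: the learning rate; each update moves the parameters by −η∇L, i.e. a step of size η against the gradient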
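Below is a minimal gradient descent sketch for a linear model with a single feature. It assumes the squared-error loss L(w, b) = Σ (y − (b + w·x))². The data, the learning rate `eta` and the number of steps are illustrative assumptions.

```python
def gradient_descent(xs, ys, eta=0.01, steps=5000):
    w, b = 0.0, 0.0  # initial parameters
    for _ in range(steps):
        # Partial derivatives of L(w, b) = sum((y - (b + w * x))**2)
        grad_w = sum(-2.0 * (y - (b + w * x)) * x for x, y in zip(xs, ys))
        grad_b = sum(-2.0 * (y - (b + w * x)) for x, y in zip(xs, ys))
        # Step against the gradient, scaled by the learning rate eta
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(w, b)  # close to 2 and 1
```

Both partial derivatives are computed before either parameter changes, so the update uses the gradient at a single point. With this data, an `eta` above roughly 0.03 makes the updates overshoot and diverge, while a much smaller one makes convergence slow.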

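Gradient descent often gets stuck in a local minimum or at a stationary point. Also, because the gradient rarely becomes exactly 0, in practice we stop at a point where the gradient is close to 0. Sometimes such a point merely has a very small gradient and is nowhere near a local minimum. This is like a plateau: the terrain is flat, but the altitude is still high.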
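The sketch below illustrates the plateau problem on a hypothetical one-dimensional loss, 1 − exp(−w²). Its only minimum is at w = 0, but far from that point it is almost flat, with a loss close to 1. It assumes the common stopping rule "stop when |gradient| < tol". Under that rule, gradient descent that starts on the plateau stops immediately.

```python
import math

def loss(w):
    # Hypothetical 1-D loss: global minimum loss(0) = 0, and an almost flat
    # "plateau" with loss close to 1 when |w| is large.
    return 1.0 - math.exp(-w * w)

def grad(w):
    # d/dw [1 - exp(-w^2)] = 2w * exp(-w^2)
    return 2.0 * w * math.exp(-w * w)

def gradient_descent(w, eta=0.1, tol=1e-6, max_steps=10_000):
    # Stop as soon as the gradient is "close enough" to zero
    steps = 0
    while steps < max_steps and abs(grad(w)) >= tol:
        w -= eta * grad(w)
        steps += 1
    return w, steps

w_flat, n_flat = gradient_descent(5.0)  # start on the plateau
w_near, n_near = gradient_descent(0.5)  # start near the minimum
print(w_flat, n_flat, loss(w_flat))  # stops at once, loss still about 1
print(w_near, n_near, loss(w_near))  # reaches the true minimum
```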

2. Reducing the Error

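There are ways to reduce the training error measured by the loss function, for example by choosing a more complex model. However, a model that performs well on the training set does not necessarily perform better on the test set. This phenomenon is called overfitting.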
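Below is a sketch of the effect. It assumes synthetic data drawn from a noisy straight line and fits polynomial models of increasing degree with NumPy's least-squares `np.polyfit`. Each model contains the previous one, so the training error can only go down as the degree grows. The test error is printed alongside for comparison; for the highest degree it typically goes up.

```python
import numpy as np

# Hypothetical data: a noisy straight line y = 2x + 1
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2.0 * x_test + 1.0 + rng.normal(0.0, 0.2, x_test.size)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial given by coeffs (highest power first)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_err, test_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit on training data only
    train_err[degree] = mse(coeffs, x_train, y_train)
    test_err[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train {train_err[degree]:.4f}, test {test_err[degree]:.4f}")
```

With 10 training points, the degree-9 polynomial can pass through every one of them. That drives its training error to about 0, and it fits the noise rather than the line.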

3. Hidden Factors
In the Pokémon example, a Pokémon's species is also one of the hidden factors.
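The notes do not specify how species enters the model. One common way to add a categorical hidden factor like this to a linear model is indicator features, and this sketch assumes that encoding. For each species s, it adds δ(species = s) and δ(species = s)·x, so a single linear model behaves like a separate line for each species. The species labels, parameters and data are hypothetical.

```python
import numpy as np

SPECIES = ["A", "B", "C"]  # hypothetical species labels

def features(x, species):
    # For each species s: delta(species == s) and delta(species == s) * x,
    # so the model behaves like b_s + w_s * x for the matching species only.
    row = []
    for s in SPECIES:
        d = 1.0 if species == s else 0.0
        row.extend([d, d * x])
    return row

# Hypothetical ground truth: (b_s, w_s) per species
true_params = {"A": (1.0, 2.0), "B": (-3.0, 0.5), "C": (0.0, 4.0)}
samples = [(x, s) for s in SPECIES for x in (1.0, 2.0, 3.0)]
X = np.array([features(x, s) for x, s in samples])
y = np.array([true_params[s][0] + true_params[s][1] * x for x, s in samples])

# Ordinary least squares on the indicator features
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta.reshape(len(SPECIES), 2))  # one row of [b_s, w_s] per species
```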

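Other hidden factors can likewise all be added to the function. However, although the training error becomes smaller, the test error becomes large: the model overfits.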

One remedy for this is regularization.
That is, we modify the original loss function.

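We add the sum of squares of the weights w(i), multiplied by a coefficient λ, to the original loss function. Minimizing this loss then favors a function that not only has a small training error but is also smooth. A function with smaller w(i) is smoother: when an input x(i) changes by Δx(i), the output changes by only w(i)Δx(i), so the function is less sensitive to noise in the inputs.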
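Here is a formula sketch. It assumes a squared-error loss for the linear model y = b + Σ w_i x_i. The first line is the regularized loss, and the second shows why small w_i means a smooth function:

```latex
L = \sum_n \left( \hat{y}^n - \left( b + \sum_i w_i x_i^n \right) \right)^2 + \lambda \sum_i (w_i)^2
b + \sum_i w_i (x_i + \Delta x_i) = y + \sum_i w_i \Delta x_i
```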

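Choosing λ: the larger λ is, the smoother the resulting function; the smaller λ is, the steeper it is. A larger λ puts more weight on smoothness and less on fitting the training data, so the training error grows as λ increases.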

Should b also be regularized? No. b only shifts the function up or down (its vertical position) and has no effect on how smooth the function is.
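Below is a minimal sketch of the regularized fit in closed form, which is ridge regression. It assumes the squared-error loss with the λ penalty described above and leaves b unpenalized. Centering the data removes b from the penalized problem, and b is recovered from the means afterwards. The synthetic data is hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize sum((y - b - X @ w)**2) + lam * sum(w**2).

    Only the weights w are penalized. Centering X and y removes b from the
    problem; b is recovered afterwards from the means.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form minimizer of the centered, regularized least-squares loss
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Hypothetical data: 3 features, true weights [3, -2, 0.5], true bias 4
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + 4.0 + rng.normal(0.0, 0.1, 20)

for lam in (0.0, 1.0, 10.0, 100.0):
    w, b = fit_ridge(X, y, lam)
    print(f"lambda={lam}: |w|={np.linalg.norm(w):.3f}, b={b:.3f}")
```

As λ grows, the weights shrink toward 0, which gives a smoother function. Because b is not penalized, it stays free to match the average output.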

This technique works in most cases.

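Finally, after trying several values, we settle on the λ that gives the smallest error on the testing data. Choosing λ by training error alone would always favor λ = 0, because a larger λ gives more weight to smoothness and less to fitting the training data.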
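Below is a sketch of that selection. It assumes ridge regression (squared error plus λ times the sum of squared weights, with b unpenalized) and hypothetical synthetic data split into training and testing halves. The model is fitted once for each candidate λ, and the λ with the smallest testing error is kept.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Minimize sum((y - b - X @ w)**2) + lam * sum(w**2); b is not penalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def mse(X, y, w, b):
    # Mean squared error of the predictions b + X @ w
    return float(np.mean((y - (X @ w + b)) ** 2))

# Hypothetical data: 8 features, only 15 training examples, noisy outputs
rng = np.random.default_rng(2)
X_all = rng.normal(size=(30, 8))
y_all = X_all @ rng.normal(size=8) + 1.0 + rng.normal(0.0, 1.0, 30)
X_train, y_train = X_all[:15], y_all[:15]
X_test, y_test = X_all[15:], y_all[15:]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {}
for lam in candidates:
    w, b = fit_ridge(X_train, y_train, lam)
    errors[lam] = (mse(X_train, y_train, w, b), mse(X_test, y_test, w, b))
    print(f"lambda={lam}: train {errors[lam][0]:.3f}, test {errors[lam][1]:.3f}")

best_lam = min(candidates, key=lambda lam: errors[lam][1])
print("chosen lambda:", best_lam)
```

The printed training error never decreases as λ grows. That is why the choice is based on the testing error.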