[Machine Learning notebook by NG] Gradient Descent
Linear Regression Model
h_θ(x) = θ_0 + θ_1·x
J(θ_0, θ_1) = (1 / 2m) · ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
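The squared-error cost above can be sketched directly in code (a minimal illustration; the function and variable names are my own, not from the notes):

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(θ0, θ1) = (1/2m) Σ (h(x) - y)²."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)
```

For a perfect fit the cost is zero; otherwise it grows with the squared residuals.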
Gradient Descent algorithm
repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}
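The repeat-until-convergence loop above can be sketched for linear regression as batch gradient descent, using the partial derivatives ∂J/∂θ_0 = (1/m)Σ(h−y) and ∂J/∂θ_1 = (1/m)Σ(h−y)·x (the function name, data, and fixed iteration count are illustrative choices of mine, not from the notes):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Fit θ0, θ1 by batch gradient descent on the squared-error cost."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        # Residuals h(x) - y under the CURRENT parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters from the same gradients (simultaneous update).
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```

On exactly linear data such as y = 2x + 1, the fit converges toward θ_0 = 1, θ_1 = 2.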
α is the learning rate: it controls the size of each gradient step.
Correct: Simultaneous update
temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
θ_0 := temp0
θ_1 := temp1
Note: update θ_0 and θ_1 simultaneously — compute both temp values from the old parameters before assigning either.
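The order matters whenever the partial derivatives couple the parameters. A one-step sketch with the toy cost J(θ_0, θ_1) = (θ_0 + θ_1)², where ∂J/∂θ_0 = ∂J/∂θ_1 = 2(θ_0 + θ_1) (the cost is my own choice to make the coupling visible, not from the notes):

```python
alpha = 0.25
theta0, theta1 = 1.0, 1.0

# Correct: both temps computed from the OLD values, then assigned together.
temp0 = theta0 - alpha * 2 * (theta0 + theta1)
temp1 = theta1 - alpha * 2 * (theta0 + theta1)
s0, s1 = temp0, temp1                         # both reach 0.0

# Incorrect: sequential update reuses the already-updated θ0
# inside the gradient for θ1, giving a different result.
t0 = theta0 - alpha * 2 * (theta0 + theta1)   # 0.0
t1 = theta1 - alpha * 2 * (t0 + theta1)       # 0.5, not 0.0
```

With an uncoupled cost the two orders happen to agree, which is why this bug can hide in simple examples.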
Different starting points may lead to different results (different local minima), as the figures below show.
(Figures: gradient descent from different starting points)
The size of the learning rate α
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
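Both failure modes are easy to see on the one-dimensional toy cost J(θ) = θ², where the update is θ := θ − α·2θ = θ(1 − 2α) (the cost and the specific α values are illustrative choices of mine, not from the notes):

```python
def run(alpha, steps=20, theta=1.0):
    """Run gradient descent on J(θ) = θ², whose gradient is 2θ."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

small = run(0.01)   # |1 - 2α| close to 1: converges, but slowly
good  = run(0.5)    # 1 - 2α = 0: reaches the minimum immediately
large = run(1.1)    # |1 - 2α| > 1: the iterates diverge
```

Each step multiplies θ by (1 − 2α), so convergence requires |1 − 2α| < 1, i.e. 0 < α < 1 for this toy cost.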