[Machine Learning notebook by NG] Gradient Descent
Linear Regression Model
h_θ(x) = θ_0 + θ_1·x
J(θ_0, θ_1) = (1 / 2m) · ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
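The squared-error cost above can be sketched directly in code (a minimal illustration; the function and variable names are my own, not from the notes):

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(θ0, θ1) = (1/2m) Σ (h(x) - y)²."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)
```

For a perfect fit the cost is zero; otherwise it grows with the squared residuals.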
Gradient Descent algorithm
repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}
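The repeat-until-convergence loop above can be sketched for linear regression as batch gradient descent, using the partial derivatives ∂J/∂θ_0 = (1/m)Σ(h−y) and ∂J/∂θ_1 = (1/m)Σ(h−y)·x (the function name, data, and fixed iteration count are illustrative choices of mine, not from the notes):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Fit θ0, θ1 by batch gradient descent on the squared-error cost."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        # Residuals h(x) - y under the CURRENT parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters from the same gradients (simultaneous update).
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```

On exactly linear data such as y = 2x + 1, the fit converges toward θ_0 = 1, θ_1 = 2.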
α is the learning rate: it controls the size of each gradient step.
Correct: Simultaneous update
temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
θ_0 := temp0
θ_1 := temp1
Note: update θ_0 and θ_1 simultaneously — compute both temp values from the old parameters before assigning either.
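The order matters whenever the partial derivatives couple the parameters. A one-step sketch with the toy cost J(θ_0, θ_1) = (θ_0 + θ_1)², where ∂J/∂θ_0 = ∂J/∂θ_1 = 2(θ_0 + θ_1) (the cost is my own choice to make the coupling visible, not from the notes):

```python
alpha = 0.25
theta0, theta1 = 1.0, 1.0

# Correct: both temps computed from the OLD values, then assigned together.
temp0 = theta0 - alpha * 2 * (theta0 + theta1)
temp1 = theta1 - alpha * 2 * (theta0 + theta1)
s0, s1 = temp0, temp1                         # both reach 0.0

# Incorrect: sequential update reuses the already-updated θ0
# inside the gradient for θ1, giving a different result.
t0 = theta0 - alpha * 2 * (theta0 + theta1)   # 0.0
t1 = theta1 - alpha * 2 * (t0 + theta1)       # 0.5, not 0.0
```

With an uncoupled cost the two orders happen to agree, which is why this bug can hide in simple examples.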
Different starting points may lead to different results (different local minima), as the figures below show.
(Figures: gradient descent from different starting points)
The size of the learning rate α
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
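Both failure modes are easy to see on the one-dimensional toy cost J(θ) = θ², where the update is θ := θ − α·2θ = θ(1 − 2α) (the cost and the specific α values are illustrative choices of mine, not from the notes):

```python
def run(alpha, steps=20, theta=1.0):
    """Run gradient descent on J(θ) = θ², whose gradient is 2θ."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

small = run(0.01)   # |1 - 2α| close to 1: converges, but slowly
good  = run(0.5)    # 1 - 2α = 0: reaches the minimum immediately
large = run(1.1)    # |1 - 2α| > 1: the iterates diverge
```

Each step multiplies θ by (1 − 2α), so convergence requires |1 − 2α| < 1, i.e. 0 < α < 1 for this toy cost.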