Gradient Descent in Practice II - Learning Rate

Note: [5:20 - the x-axis label in the right graph should be θ rather than No. of iterations]

Debugging gradient descent. Make a plot with the number of iterations on the x-axis. Now plot the cost function, J(θ), over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
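The debugging recipe above can be sketched as follows. This is a minimal illustration, not the course's code: the tiny linear-regression dataset, α = 0.1, and the iteration count are all illustrative choices, and the recorded history is what you would pass to a plotting library.

```python
import numpy as np

# Batch gradient descent on linear regression, recording J(theta)
# each iteration so it can be plotted against the iteration number.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
y = np.array([1.0, 2.0, 3.0])

def cost(theta):
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

theta = np.zeros(2)
alpha = 0.1                     # illustrative learning rate
J_history = []
for _ in range(200):
    theta -= alpha * X.T @ (X @ theta - y) / len(y)
    J_history.append(cost(theta))

# If J(theta) ever increases between iterations, alpha is too large.
increased = any(b > a for a, b in zip(J_history, J_history[1:]))
print(increased)  # → False
```

Plotting `J_history` against `range(len(J_history))` gives exactly the diagnostic curve described above.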

Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10⁻³. However, in practice it's difficult to choose this threshold value.
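A sketch of that automatic convergence test, using an illustrative one-dimensional quadratic cost J(θ) = (θ − 5)² and E = 10⁻³ (both are assumptions for the demo, not from the lecture):

```python
# Stop when J(theta) decreases by less than E in one iteration.
E = 1e-3      # convergence threshold from the text (10^-3)
alpha = 0.1   # illustrative learning rate

J = lambda t: (t - 5.0) ** 2       # toy cost with minimum at theta = 5
grad = lambda t: 2.0 * (t - 5.0)   # its gradient

theta = 0.0
prev_J = J(theta)
iterations = 0
while True:
    theta -= alpha * grad(theta)
    iterations += 1
    curr_J = J(theta)
    if prev_J - curr_J < E:        # declare convergence
        break
    prev_J = curr_J

print(iterations, theta)
```

Note the practical difficulty mentioned above: a larger E stops early with a cruder θ, while a smaller E may keep iterating long after useful progress has stopped.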


It has been proven that if the learning rate α is sufficiently small, then J(θ) will decrease on every iteration.


To summarize:

If α is too small: slow convergence.

If α is too large: may not decrease on every iteration and thus may not converge.
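Both cases in the summary can be seen on the toy cost J(θ) = θ². The three α values below are illustrative: since the gradient is 2θ, each step multiplies θ by (1 − 2α), so any α > 1 makes that factor exceed 1 in magnitude and J(θ) grows instead of shrinking.

```python
# Compare final cost J(theta) = theta**2 after a fixed number of steps.
def run(alpha, steps=50):
    theta = 1.0
    for _ in range(steps):
        theta -= alpha * 2.0 * theta   # gradient of theta**2 is 2*theta
    return theta ** 2                  # final cost J(theta)

J_small = run(0.01)   # too small: still far from 0 after 50 steps
J_good  = run(0.1)    # reasonable: essentially converged
J_large = run(1.1)    # too large: J(theta) increases every step
print(J_small > J_good, J_large > 1.0)  # → True True
```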

 
