Gradient descent algorithm:
Repeat {
    θ_j := θ_j − α (∂/∂θ_j) J(θ_0, θ_1, …, θ_n)
} (simultaneously update for every j = 0, 1, …, n)
For linear regression this becomes:
θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
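The update rule above can be sketched in NumPy; the data and the values of α and the iteration count are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression.

    X is assumed to already contain a leading column of ones
    (the intercept term theta_0).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # h_theta(x) = X @ theta; all theta_j are updated simultaneously
        gradient = (X.T @ (X @ theta - y)) / m
        theta = theta - alpha * gradient
    return theta

# Toy data generated from y = 1 + 2x
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
theta = gradient_descent(X, y)
```

The vectorized `X.T @ (X @ theta - y)` computes all partial derivatives at once, which is why the simultaneous update falls out naturally.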
Feature Scaling and Mean Normalization
α too small: slow convergence.
α too large: J(θ) may not decrease on every iteration and may not converge.
Try several values of α and plot J(θ) against the number of iterations.
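The diagnostic above can be sketched by recording J(θ) at each iteration for several values of α (the toy data and the specific α values are illustrative):

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum((h_theta(x) - y)^2)."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def run(X, y, alpha, num_iters=50):
    """Run gradient descent, recording J(theta) after every step
    so the J-vs-iterations curve can be plotted."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
        history.append(cost(X, y, theta))
    return history

X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
small = run(X, y, alpha=0.01)  # decreases, but slowly
ok = run(X, y, alpha=0.1)      # decreases quickly
big = run(X, y, alpha=0.5)     # J blows up: alpha is too large
```

Plotting each `history` (e.g. with matplotlib) makes the three regimes in the notes visible at a glance.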
Polynomial regression
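Polynomial regression reduces to linear regression on engineered features x, x², …; a sketch with made-up data (in practice these features then need feature scaling, since x² and x³ have very different ranges):

```python
import numpy as np

def poly_features(x, degree):
    """Build columns [1, x, x^2, ..., x^degree] so that plain
    linear regression can fit a polynomial in x."""
    return np.vander(x, degree + 1, increasing=True)

# Toy data generated from y = 1 + 2x + 3x^2
x = np.linspace(0.0, 2.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x ** 2
X = poly_features(x, degree=2)
# Least-squares fit (equivalent to solving the normal equation)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```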
Normal equation
Set ∂J(θ)/∂θ_j = 0 for every j; solving gives θ = (XᵀX)⁻¹ Xᵀ y
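A direct sketch of the normal equation; `pinv` is used instead of `inv` to cover the case where XᵀX is non-invertible (e.g. redundant features or m ≤ n):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudoinverse
    so singular X^T X is handled gracefully."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Same toy data as before: y = 1 + 2x, X with an intercept column
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
theta = normal_equation(X, y)
```

No α and no iterations are needed, which is exactly the trade-off summarized below.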

Pros and cons of Gradient Descent vs. the Normal Equation:
- Gradient descent: must choose α and run many iterations, but works well even when the number of features n is large.
- Normal equation: no α and no iterations, but computing (XᵀX)⁻¹ costs O(n³), so it becomes slow when n is large.
