Gradient descent algorithm:
Repeat {
    θ_j := θ_j − α (∂/∂θ_j) J(θ_0, θ_1, …, θ_n)
} (simultaneously update for every j = 0, 1, …, n)
For linear regression this becomes:
θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
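The update rule above can be sketched in NumPy; the data and the values of α and the iteration count are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression.

    X is assumed to already contain a leading column of ones
    (the intercept term theta_0).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # h_theta(x) = X @ theta; all theta_j are updated simultaneously
        gradient = (X.T @ (X @ theta - y)) / m
        theta = theta - alpha * gradient
    return theta

# Toy data generated from y = 1 + 2x
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
theta = gradient_descent(X, y)
```

The vectorized `X.T @ (X @ theta - y)` computes all partial derivatives at once, which is why the simultaneous update falls out naturally.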
Feature Scaling and Mean Normalization
α too small: slow convergence.
α too large: J(θ) may not decrease on every iteration and may not converge.
Try several values of α and plot J(θ) against the number of iterations.
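The diagnostic above can be sketched by recording J(θ) at each iteration for several values of α (the toy data and the specific α values are illustrative):

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum((h_theta(x) - y)^2)."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def run(X, y, alpha, num_iters=50):
    """Run gradient descent, recording J(theta) after every step
    so the J-vs-iterations curve can be plotted."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
        history.append(cost(X, y, theta))
    return history

X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
small = run(X, y, alpha=0.01)  # decreases, but slowly
ok = run(X, y, alpha=0.1)      # decreases quickly
big = run(X, y, alpha=0.5)     # J blows up: alpha is too large
```

Plotting each `history` (e.g. with matplotlib) makes the three regimes in the notes visible at a glance.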
Polynomial regression
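Polynomial regression reduces to linear regression on engineered features x, x², …; a sketch with made-up data (in practice these features then need feature scaling, since x² and x³ have very different ranges):

```python
import numpy as np

def poly_features(x, degree):
    """Build columns [1, x, x^2, ..., x^degree] so that plain
    linear regression can fit a polynomial in x."""
    return np.vander(x, degree + 1, increasing=True)

# Toy data generated from y = 1 + 2x + 3x^2
x = np.linspace(0.0, 2.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x ** 2
X = poly_features(x, degree=2)
# Least-squares fit (equivalent to solving the normal equation)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```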
Normal equation
Set ∂J(θ)/∂θ_j = 0 for every j; solving gives θ = (XᵀX)⁻¹ Xᵀ y
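A direct sketch of the normal equation; `pinv` is used instead of `inv` to cover the case where XᵀX is non-invertible (e.g. redundant features or m ≤ n):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudoinverse
    so singular X^T X is handled gracefully."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Same toy data as before: y = 1 + 2x, X with an intercept column
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
theta = normal_equation(X, y)
```

No α and no iterations are needed, which is exactly the trade-off summarized below.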

Pros and cons of Gradient Descent vs. the Normal Equation:
- Gradient descent: must choose α and run many iterations, but works well even when the number of features n is large.
- Normal equation: no α and no iterations, but computing (XᵀX)⁻¹ costs O(n³), so it becomes slow when n is large.
