Linear Regression: Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent

Linear Regression

The univariate case fits a straight line $y = ax+b$ to the data:
[Figure: univariate linear regression fit]
Now consider the case of several variables:

$h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2\\ h_\theta(x) = \sum_{i=0}^n \theta_ix_i = \theta^Tx, \quad x_0 = 1$
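As a minimal sketch of the hypothesis $h_\theta(x) = \theta^Tx$, the snippet below evaluates it for one input. The function name `predict` and the convention of prepending $x_0 = 1$ so that $\theta_0$ acts as the intercept are assumptions made for illustration.

```python
def predict(theta, x):
    """Return h_theta(x) = theta^T x, with x_0 = 1 prepended for the intercept."""
    features = [1.0] + list(x)  # x_0 = 1 pairs with theta_0
    if len(features) != len(theta):
        raise ValueError("theta must have exactly one more entry than x")
    return sum(t * f for t, f in zip(theta, features))


# theta = (1, 2, 3) evaluated at x = (4, 5): 1 + 2*4 + 3*5
print(predict([1.0, 2.0, 3.0], [4.0, 5.0]))  # 24.0
```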
We choose a fairly natural error function (the loss function):
$J(\theta) = \frac{1}{2}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2$
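When the loss function attains a minimum, the corresponding $\theta$ is a locally optimal solution. Because $J(\theta)$ is a convex quadratic function of $\theta$, that minimum is also the global optimum.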
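The loss $J(\theta)$ can be sketched in a few lines. The helper names `predict` and `loss` and the toy data (points on the line $y = 2x + 1$) are hypothetical, chosen only to show that the loss is zero at the true parameters and positive elsewhere.

```python
def predict(theta, x):
    # h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def loss(theta, xs, ys):
    """J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2."""
    return 0.5 * sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys))


xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 3.0, 5.0]  # generated from y = 2x + 1

print(loss([1.0, 2.0], xs, ys))  # 0.0: exact parameters, no residual
print(loss([0.0, 2.0], xs, ys))  # 1.5: every residual is -1, so 0.5 * 3
```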

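The closed-form expression for $\theta$ is derived as follows. Stack the $m$ training inputs as the rows of a matrix $X$ and the targets into a vector $y$, so that
$J(\theta) = \frac{1}{2}(X\theta - y)^T(X\theta - y)$
Taking the gradient with respect to $\theta$ and setting it to zero:
$\nabla_\theta J(\theta) = X^TX\theta - X^Ty = 0$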
This gives the optimal parameters in the least-squares sense:
$\theta = (X^TX)^{-1}X^Ty$
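In particular, when the order of $X^TX$ is too high (that is, when there are very many features), computing the inverse becomes too expensive, and gradient descent is still needed to obtain a numerical solution.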
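For a small problem the closed-form solution can be checked directly. The sketch below is an illustration under stated assumptions: it builds $X^TX$ and $X^Ty$ from toy data and solves $X^TX\theta = X^Ty$ by Gaussian elimination rather than forming the inverse explicitly. The function names `solve` and `normal_equation` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def normal_equation(xs, ys):
    """Return theta satisfying X^T X theta = X^T y."""
    rows = [[1.0] + list(x) for x in xs]  # design matrix X with x_0 = 1
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(xtx, xty)


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
theta = normal_equation(xs, ys)
print(theta)  # close to [1.0, 2.0], up to floating-point rounding
```

Solving the system costs roughly cubic time in the number of features, which is the reason an iterative method becomes attractive as that number grows.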

Gradient Descent

Steps:
1. Initialize $\theta$ (randomly).
2. Iterate to obtain a new $\theta$ that makes $J(\theta)$ smaller.
3. If $J(\theta)$ can still be reduced, return to step 2.
Update rule ($\alpha$ is the learning rate):
$\theta_j :=\theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$
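The three steps and the update rule can be sketched on a simple function whose minimum is known. Everything here is an assumption for illustration: the name `gradient_descent`, the stopping test (stop once the update becomes negligible, used as a practical stand-in for "$J$ can no longer decrease"), and the toy objective $J(\theta) = (\theta_0 - 3)^2 + (\theta_1 + 1)^2$.

```python
def gradient_descent(grad, theta, alpha=0.1, tol=1e-10, max_iters=10_000):
    """Repeat theta_j := theta_j - alpha * dJ/dtheta_j until the step is tiny."""
    for _ in range(max_iters):
        step = [alpha * g for g in grad(theta)]
        theta = [t - s for t, s in zip(theta, step)]
        if sum(s * s for s in step) < tol:  # squared length of the update
            break
    return theta


def toy_grad(theta):
    # Gradient of J(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


print(gradient_descent(toy_grad, [0.0, 0.0]))  # approaches [3.0, -1.0]
```

A learning rate that is too large makes the iterates overshoot and diverge, while one that is too small makes convergence slow, so $\alpha$ usually needs tuning per problem.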
Gradient direction (the partial derivative of $J$ with respect to each $\theta_j$, shown here for a single training example):
$\begin{aligned} \frac{\partial}{\partial\theta_j}J(\theta) &= \frac{\partial}{\partial\theta_j}\frac{1}{2}(h_\theta(x) - y)^2 \\ &= 2\cdot\frac{1}{2}(h_\theta(x) - y)\cdot\frac{\partial}{\partial\theta_j}(h_\theta(x)-y) \\ &= (h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}\left(\sum_{i=0}^n \theta_ix_i - y\right) \\ &= (h_\theta(x)-y)x_j \end{aligned}$
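One way to sanity-check the result $(h_\theta(x)-y)x_j$ is to compare it against a central finite-difference approximation of the single-example loss. This is only a sketch: the function names, the step size `eps`, and the sample values are hypothetical, and `x` is assumed to already contain $x_0 = 1$.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))  # x already includes x_0 = 1


def single_loss(theta, x, y):
    return 0.5 * (predict(theta, x) - y) ** 2


def analytic_grad(theta, x, y):
    # The derived formula: dJ/dtheta_j = (h_theta(x) - y) * x_j
    err = predict(theta, x) - y
    return [err * xj for xj in x]


def numeric_grad(theta, x, y, eps=1e-6):
    # Central difference in each coordinate of theta.
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += eps
        down = list(theta)
        down[j] -= eps
        grad.append((single_loss(up, x, y) - single_loss(down, x, y)) / (2 * eps))
    return grad


theta = [0.5, -1.0, 2.0]
x = [1.0, 3.0, 4.0]
y = 2.0
print(analytic_grad(theta, x, y))  # [3.5, 10.5, 14.0]
print(numeric_grad(theta, x, y))   # agrees up to floating-point error
```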

Batch Gradient Descent

$\text{Repeat until convergence\{}\\ \theta_j := \theta_j + \alpha\sum_{i=1}^m(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)}\\ \text{\}}$
[Figure: illustration of batch gradient descent]
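The batch update can be sketched as follows. Each iteration computes the errors for all $m$ examples with the current $\theta$ and then updates every $\theta_j$ simultaneously. The function name, the zero initialization (used instead of random initialization so the run is reproducible), the fixed iteration count, and the toy data are all assumptions made for illustration.

```python
def batch_gradient_descent(xs, ys, alpha=0.01, iters=5000):
    rows = [[1.0] + list(x) for x in xs]  # prepend x_0 = 1
    theta = [0.0] * len(rows[0])
    for _ in range(iters):
        # y_i - h_theta(x_i) for every example, all using the same theta
        errors = [y - sum(t * f for t, f in zip(theta, r)) for r, y in zip(rows, ys)]
        # theta_j := theta_j + alpha * sum_i (y_i - h_theta(x_i)) * x_j^(i)
        theta = [t + alpha * sum(e * r[j] for e, r in zip(errors, rows))
                 for j, t in enumerate(theta)]
    return theta


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
print(batch_gradient_descent(xs, ys))  # approaches [1.0, 2.0]
```

Because every step touches the whole training set, the cost per update grows linearly with $m$.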

Stochastic Gradient Descent

$\text{Loop\{}\\ \text{for i = 1 to m,\{} \theta_j :=\theta_j + \alpha(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)} \text{\}}\\ \text{\}}$
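A minimal sketch of the stochastic update: $\theta$ is adjusted after every single example instead of after a full pass. The examples are visited in the fixed order $i = 1 \dots m$ as in the loop above; shuffling the order each epoch is common in practice but is not done here. The function name, zero initialization, epoch count, and toy data are assumptions.

```python
def stochastic_gradient_descent(xs, ys, alpha=0.01, epochs=5000):
    rows = [[1.0] + list(x) for x in xs]  # prepend x_0 = 1
    theta = [0.0] * len(rows[0])
    for _ in range(epochs):
        for row, y in zip(rows, ys):
            # Error of the current example only, using the latest theta
            err = y - sum(t * f for t, f in zip(theta, row))
            # theta_j := theta_j + alpha * (y_i - h_theta(x_i)) * x_j^(i)
            theta = [t + alpha * err * f for t, f in zip(theta, row)]
    return theta


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
print(stochastic_gradient_descent(xs, ys))  # approaches [1.0, 2.0]
```

The toy data here is noise-free, so the iterates settle on the exact line. With noisy data and a constant learning rate, the iterates typically keep fluctuating around the minimum rather than converging exactly.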

Mini-Batch Gradient Descent

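Each update uses a mini-batch $B$ of $b$ examples rather than all $m$:
$\text{Repeat until convergence\{}\\ \text{for each mini-batch } B \subset \{1,\dots,m\} \text{ of size } b \text{, \{} \theta_j := \theta_j + \alpha\sum_{i\in B}(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)} \text{\}}\\ \text{\}}$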
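A mini-batch variant can be sketched as a middle ground between the two previous algorithms: each update sums the per-example terms over a small group of examples rather than over one example or over all $m$. The function name, the parameter `batch_size`, the consecutive (unshuffled) batching, the zero initialization, and the toy data are assumptions for illustration.

```python
def minibatch_gradient_descent(xs, ys, batch_size=2, alpha=0.01, epochs=5000):
    rows = [[1.0] + list(x) for x in xs]  # prepend x_0 = 1
    theta = [0.0] * len(rows[0])
    m = len(rows)
    for _ in range(epochs):
        for start in range(0, m, batch_size):
            batch = range(start, min(start + batch_size, m))
            # Errors for the examples in this mini-batch only
            errors = [ys[i] - sum(t * f for t, f in zip(theta, rows[i])) for i in batch]
            # theta_j := theta_j + alpha * sum over the batch of error_i * x_j^(i)
            theta = [t + alpha * sum(e * rows[i][j] for e, i in zip(errors, batch))
                     for j, t in enumerate(theta)]
    return theta


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
print(minibatch_gradient_descent(xs, ys))  # approaches [1.0, 2.0]
```

Setting `batch_size` to 1 recovers the stochastic update, and setting it to `len(xs)` recovers the batch update.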