Deep Learning (2) - 爱码网

1. Origin of neural networks: linear regression

(1) A linear regression problem

Target equation: $\mathrm{y}=\mathrm{ax}_{1}+\mathrm{bx}_{2}+\mathrm{cx}_{3}+\mathrm{d}$
Parameters: $\mathrm{m}=[\mathrm{a}, \mathrm{b}, \mathrm{c}, \mathrm{d}]$
Data:
$\begin{array}{l} {\left[\left(x_{1,1}, x_{2,1}, x_{3,1}\right),\left(x_{1,2}, x_{2,2}, x_{3,2}\right), \ldots\left(x_{1, n}, x_{2, n}, x_{3, n}\right)\right]} \\ {\left[y_{1}, y_{2} \ldots \ldots y_{n}\right]} \end{array}$
Prediction: $\hat{y}_{t}=a x_{1, t}+b x_{2, t}+c x_{3, t}+d$
Objective: minimize the squared error $\left(\hat{y}_{t}-y_{t}\right)^{2}$
Optimization method: gradient descent

(2) Computing the model parameters with gradient descent

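Current parameters: $\mathrm{m}_{0}=\left[\mathrm{a}_{0}, \mathrm{b}_{0}, \mathrm{c}_{0}, \mathrm{d}_{0}\right]$
Gradient computation:
$\operatorname{Loss}=\left(\mathrm{ax}_{1, \mathrm{t}}+\mathrm{bx}_{2, \mathrm{t}}+\mathrm{cx}_{3, \mathrm{t}}+\mathrm{d}-\mathrm{y}_{\mathrm{t}}\right)^{2}$
$\Delta \mathrm{m}=\frac{\partial \operatorname{Loss}}{\partial \mathrm{m}}=2\left(\hat{y}_{t}-y_{t}\right)\left[x_{1, t}, x_{2, t}, x_{3, t}, 1\right]$
Parameter update: $\mathrm{m}:=\mathrm{m}-\eta \Delta \mathrm{m}$, where $\eta$ is the learning rate.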
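Summary of gradient descent:
Randomly initialize the parameters
Start the loop: $t=0,1,2,\cdots$
Feed in the data to compute the prediction $\hat{\mathrm{y}}_{\mathrm{t}}$
Compare with the true value to get the loss, e.g. $\left(\hat{y}_{t}-y_{t}\right)^{2}$
Differentiate with respect to each parameter to get $\Delta \mathrm{m}$
Update m
Stop when the loss is small enough or the loop over t ends.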
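
The loop summarized above can be sketched in plain Python. This is a minimal illustration, not the original author's code: it assumes a mean squared error loss over the whole data set, and the names `predict`, `fit`, `true_m`, the learning rate, the epoch count and the synthetic data are all hypothetical choices made for the example.

```python
import random

def predict(m, x):
    """Linear model y_hat = a*x1 + b*x2 + c*x3 + d with m = [a, b, c, d]."""
    a, b, c, d = m
    return a * x[0] + b * x[1] + c * x[2] + d

def fit(xs, ys, eta=0.1, epochs=5000, tol=1e-12, seed=1):
    """Batch gradient descent on the mean squared error (assumed loss)."""
    rng = random.Random(seed)
    m = [rng.uniform(-1, 1) for _ in range(4)]        # random initialization
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):                           # loop t = 0, 1, 2, ...
        grad = [0.0] * 4
        loss = 0.0
        for x, y in zip(xs, ys):
            err = predict(m, x) - y                   # y_hat - y
            loss += err * err / n
            # d(err^2)/dm = 2 * err * [x1, x2, x3, 1]
            for i, xi in enumerate((x[0], x[1], x[2], 1.0)):
                grad[i] += 2.0 * err * xi / n
        if loss < tol:                                # loss small enough: stop
            break
        m = [mi - eta * gi for mi, gi in zip(m, grad)]  # m := m - eta * grad
    return m, loss

# Synthetic, noise-free data generated from made-up parameters.
data_rng = random.Random(0)
true_m = [2.0, -3.0, 0.5, 1.0]
xs = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
ys = [predict(true_m, x) for x in xs]

m_fit, final_loss = fit(xs, ys)
print(m_fit, final_loss)
```

On noise-free data like this, the fitted parameters should approach `true_m`; with real data the learning rate usually needs tuning.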

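(3) Multi-objective learning

Combining the losses of several tasks generally gives better results than a single model trained on one task alone.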

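Limitations of linear regression:
Linear regression describes or separates linearly distributed data well, but it is weak at describing non-linearly distributed data.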


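2. From linear to non-linear

Non-linear activation
Criteria for judging an activation function:
1. how it transforms the input in the forward pass;
2. how much gradient it loses in the backward pass.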
Common non-linear activation functions

(1) Sigmoid function

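Maps the input to the interval (0, 1). The gradient shrinks markedly in the backward pass: the derivative peaks at 0.25, so at least $75 \%$ of the gradient is lost at each sigmoid.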
$y(x)=\operatorname{sigmoid}(x)=\frac{1}{1+e^{-x}}$
$y^{\prime}(x)=y(x)(1-y(x))$
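
As a quick check of the 0.25 bound, the two formulas can be written directly in Python. This is only a sketch; the function names are chosen for illustration, and `math.exp` will overflow for very large negative inputs, which a production version would guard against.

```python
import math

def sigmoid(x):
    """y(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """y'(x) = y(x) * (1 - y(x)), largest at x = 0"""
    y = sigmoid(x)
    return y * (1.0 - y)

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```

The derivative equals 0.25 at x = 0 and is smaller everywhere else, which is where the "at least 75%" figure comes from.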

(2) tanh function

Maps the input to the interval (-1, 1). The gradient also shrinks markedly for inputs of large magnitude.
$\begin{aligned} &f(x)=\tanh (x)=\frac{2}{1+e^{-2 x}}-1\\ &f^{\prime}(x)=1-f(x)^{2} \end{aligned}$
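
The closed form above can be compared against the standard library's `math.tanh`. This is a small sketch with illustrative function names.

```python
import math

def tanh_from_formula(x):
    """f(x) = 2 / (1 + e^(-2x)) - 1"""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def tanh_grad(x):
    """f'(x) = 1 - f(x)^2"""
    f = tanh_from_formula(x)
    return 1.0 - f * f

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, tanh_from_formula(x), math.tanh(x), tanh_grad(x))
```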

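(3) ReLU function

The forward pass truncates negative values, which discards a large number of features (with enough features this has no real effect), but the backward pass loses no gradient for the units that stay active.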
$\begin{aligned} &f(x)=\left\{\begin{array}{lll} 0 & \text { for } & x<0 \\ x & \text { for } & x \geq 0 \end{array}\right.\\ &f^{\prime}(x)=\left\{\begin{array}{lll} 0 & \text { for } & x<0 \\ 1 & \text { for } & x \geq 0 \end{array}\right. \end{aligned}$

(4) Leaky ReLU function

Keeps more parameters in play: a small gradient is still back-propagated for negative inputs.
$\begin{aligned} &f(x)=\left\{\begin{array}{rll} 0.01 x & \text { for } & x<0 \\ x & \text { for } & x \geq 0 \end{array}\right.\\ &f^{\prime}(x)=\left\{\begin{array}{rll} 0.01 & \text { for } & x<0 \\ 1 & \text { for } & x \geq 0 \end{array}\right. \end{aligned}$
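
The two piecewise definitions (ReLU and Leaky ReLU) translate directly into code. The sketch below uses illustrative names and takes the 0.01 slope from the formula above as a default argument.

```python
def relu(x):
    """0 for x < 0, x for x >= 0"""
    return x if x >= 0 else 0.0

def relu_grad(x):
    """0 for x < 0, 1 for x >= 0"""
    return 1.0 if x >= 0 else 0.0

def leaky_relu(x, slope=0.01):
    """slope * x for x < 0, x for x >= 0"""
    return x if x >= 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    """slope for x < 0, 1 for x >= 0"""
    return 1.0 if x >= 0 else slope

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For a negative input, ReLU passes back a gradient of 0 while Leaky ReLU still passes back 0.01.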
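From neurons to neural networks

Question: is there such a thing as a linear regression network?

No. Without a non-linear activation between them, stacked linear layers collapse into a single linear map: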
$\begin{array}{l} X_{1}=W_{0} \cdot X_{0}, X_{2}=W_{1} \cdot X_{1}, Y=W_{2} \cdot X_{2} \\ Y=W_{2} \cdot W_{1} \cdot W_{0} \cdot X_{0}=W_{3} \cdot X_{0} \end{array}$
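
A tiny numeric check makes the collapse concrete. The matrices below are made-up values and `matmul` is a helper written only for this sketch.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical weights for three purely linear layers and one input column.
W0 = [[1.0, 2.0], [0.0, 1.0]]
W1 = [[0.5, -1.0], [2.0, 0.0]]
W2 = [[1.0, 1.0]]
X0 = [[3.0], [-1.0]]

# Layer by layer: X1 = W0 X0, X2 = W1 X1, Y = W2 X2
X1 = matmul(W0, X0)
X2 = matmul(W1, X1)
Y_layers = matmul(W2, X2)

# Collapsed into one matrix: W3 = W2 W1 W0, Y = W3 X0
W3 = matmul(W2, matmul(W1, W0))
Y_single = matmul(W3, X0)

print(Y_layers, Y_single)
```

Both paths give the same output, so the three-layer linear network has no more expressive power than a single linear layer.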

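3. Building a neural network

(1) Neurons in "parallel" and in "series"

Starting from the first layer of neurons and passing through the network to the final output, the value of each neuron is determined by the values of the neurons in the previous layer, the neuron's parameters W and b, and the activation function. The k-th neuron in layer n+1 is given by: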
$\begin{aligned} &z_{n+1, k}=\sum_{i=1}^{m} W_{n, k, i} \cdot x_{n, i}+b_{n, k}\\ &y_{n+1, k}=\frac{1}{1+e^{-z_{n+1, k}}} \end{aligned}$
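
The forward pass through one such layer might look like the following sketch. The name `layer_forward`, the list-of-lists layout of `W`, and the example numbers are assumptions made for illustration.

```python
import math

def layer_forward(x, W, b):
    """Compute one layer with sigmoid activation.

    x: the m outputs of layer n
    W[k][i]: weight from input i to neuron k of layer n+1
    b[k]: bias of neuron k
    Returns (z, y): pre-activations and activations of layer n+1.
    """
    z = [sum(w_ki * x_i for w_ki, x_i in zip(W_k, x)) + b_k
         for W_k, b_k in zip(W, b)]
    y = [1.0 / (1.0 + math.exp(-z_k)) for z_k in z]
    return z, y

# Two inputs feeding two neurons (made-up values).
x = [1.0, 2.0]
W = [[0.5, -0.25], [1.0, 1.0]]
b = [0.0, -3.0]
z, y = layer_forward(x, W, b)
print(z, y)
```

Neurons within a layer are evaluated side by side ("parallel"); feeding `y` into another call to `layer_forward` chains layers ("series").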

(2) Optimizing the neurons

Chain rule
Computing the gradients:

output -> last layer: $\operatorname{Loss} \rightarrow \Delta \mathrm{y}_{\mathrm{n}}$
layer -> layer: $\Delta \mathrm{y}_{\mathrm{n}} \rightarrow \Delta \mathrm{x}_{\mathrm{n}}$
layer -> parameter: $\Delta \mathrm{y}_{\mathrm{n}} \rightarrow \Delta \mathrm{w}_{\mathrm{n}}$
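
The three steps can be traced for a single sigmoid neuron. The sketch below assumes a squared-error loss and uses made-up inputs; `forward`, `loss` and `backward` are illustrative names, and the finite-difference comparison at the end is only a sanity check.

```python
import math

def forward(x, w, b):
    """y = sigmoid(w . x + b)"""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(x, w, b, target):
    """Assumed loss: squared error (y - target)^2."""
    return (forward(x, w, b) - target) ** 2

def backward(x, w, b, target):
    y = forward(x, w, b)
    dy = 2.0 * (y - target)        # output -> last layer: Loss -> dy
    dz = dy * y * (1.0 - y)        # through the sigmoid: y' = y (1 - y)
    dw = [dz * xi for xi in x]     # layer -> parameter: dy -> dw
    db = dz
    dx = [dz * wi for wi in w]     # layer -> layer: dy -> dx (sent to the previous layer)
    return dw, db, dx

x, w, b, target = [0.5, -1.0], [0.3, 0.8], 0.1, 1.0
dw, db, dx = backward(x, w, b, target)

# Finite-difference estimate of dLoss/dw[0] for comparison.
eps = 1e-6
numeric = (loss(x, [w[0] + eps, w[1]], b, target)
           - loss(x, [w[0] - eps, w[1]], b, target)) / (2 * eps)
print(dw[0], numeric)
```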

4. Components of a neural network

(1) Loss functions

Softmax:
$\sigma(\mathbf{z})_{j}=\frac{e^{z_{j}}}{\sum_{k=1}^{K} e^{z_{k}}} \quad \text { for } j=1, \ldots, K$
Effect on the loss (example):
$[1,2,3,4,1,2,3] \longrightarrow[0.024,0.064,0.175,0.475,0.024,0.064,0.175]$
Benefit of softmax: it makes the predicted class in a classification problem stand out more clearly.
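
The example vector can be reproduced with a few lines of Python. This is a sketch: subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result and is not part of the formula above.

```python
import math

def softmax(z):
    """sigma(z)_j = e^(z_j) / sum_k e^(z_k)"""
    shift = max(z)                          # stability shift, cancels out
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1, 2, 3, 4, 1, 2, 3])
print([round(p, 3) for p in probs])
# prints [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
```

The largest input (4) ends up with almost half of the total probability mass, which is the "more pronounced" effect described above.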
Cross-entropy:
$L(\mathbf{w})=\frac{1}{N} \sum_{n=1}^{N} H\left(p_{n}, q_{n}\right)=-\frac{1}{N} \sum_{n=1}^{N}\left[y_{n} \log \hat{y}_{n}+\left(1-y_{n}\right) \log \left(1-\hat{y}_{n}\right)\right]$
Avoiding the explosion problem (the log blows up when p reaches 0 or 1):
$\text {Loss}=-\Sigma\left(l \cdot \log \left(\frac{p+0.05}{1.05}\right)+(1-l) \cdot \log \left(\frac{1.05-p}{1.05}\right)\right)$
Use cases: regression problems whose targets lie in the interval $[0,1]$, and generation tasks.
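
A minimal sketch of the smoothed loss, written to follow the formula with the 0.05 offset; the function name `smoothed_bce` and the `eps` parameter are illustrative, not from the original.

```python
import math

def smoothed_bce(labels, preds, eps=0.05):
    """-sum( l*log((p+eps)/(1+eps)) + (1-l)*log((1+eps-p)/(1+eps)) )"""
    total = 0.0
    for l, p in zip(labels, preds):
        total += l * math.log((p + eps) / (1.0 + eps))
        total += (1.0 - l) * math.log((1.0 + eps - p) / (1.0 + eps))
    return -total

# Worst case: label 1 predicted as exactly 0. Plain cross-entropy would
# need log(0); the smoothed version stays finite.
print(smoothed_bce([1.0], [0.0]))
# A perfect prediction still gives zero loss.
print(smoothed_bce([1.0, 0.0], [1.0, 0.0]))
```

Because both arguments of the logarithm are rescaled by 1.05, a perfect prediction still maps to log(1) = 0, while the worst case is capped at log(21).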
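Custom loss functions
a. Emphasizing a particular attribute: pull certain predicted values out separately, or give them weights of different sizes;
b. Combining several losses: for multi-objective training tasks, choose a sensible way of combining the losses (any of various operations);
c. Fusing neural networks: merge the losses of different networks and use the shared loss to guide their training.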
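
Items a and b can be illustrated with a short sketch. It assumes a squared-error base loss and a weighted sum as the combination rule; both are just one possible choice, and all names and numbers are hypothetical.

```python
def weighted_squared_error(targets, preds, weights):
    """Item a: per-output weights let selected attributes count more."""
    return sum(w * (p - t) ** 2 for t, p, w in zip(targets, preds, weights))

def combined_loss(losses, coefficients):
    """Item b: merge several task losses with a weighted sum."""
    return sum(c * l for c, l in zip(coefficients, losses))

# The first output is treated as twice as important as the second.
task_a = weighted_squared_error([1.0, 2.0], [2.0, 4.0], [1.0, 0.5])
task_b = 1.0                                  # placeholder loss of a second task
total = combined_loss([task_a, task_b], [1.0, 0.5])
print(task_a, total)
```

For item c, the same `combined_loss` idea applies, with each entry coming from a different network and the total used to update all of them.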