Deep Learning -- Neural Networks 1

[Figure: neural network of the human body]

The figure above shows the neural network of the human body and how it works.

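As neural networks have developed, this biological analogy is no longer used to explain them. Modern neural networks rely on backpropagation, and the neural network of the human body has no such process.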

[Figure: a simple neural network]

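The figure above shows a simple neural network in which each circle represents a neuron. The leftmost layer is the input layer, and its neurons are called input units. The rightmost layer is the output layer, and its neurons are called output units. The layers in between are collectively called hidden layers, and their units are called hidden units. This is a 3-layer neural network (the input layer is not counted toward the number of layers).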

Below, a few simple examples show how a neural network operates. Before that, we need to introduce the notation commonly used for neural networks:

General comments:

superscript $(i)$ will denote the $i^{th}$ training example while superscript $[l]$ will denote the $l^{th}$ layer

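Sizes

$m$ : is the number of examples in the dataset
$n_x$ : is the input size
$n_y$ : is the output size
$n_h^{[l]}$ : is the number of hidden units in the $l^{th}$ layer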

Objects

$X \in \mathbb{R}^{n_x \times m}$ : is the input matrix
$x^{(i)} \in \mathbb{R}^{n_x}$ : is the $i^{th}$ example represented as a column vector
$Y \in \mathbb{R}^{n_y \times m}$ : is the label matrix
$y^{(i)} \in \mathbb{R}^{n_y}$ : is the output label for the $i^{th}$ example
$W^{[l]} \in \mathbb{R}^{n_h^{[l]} \times n_h^{[l-1]}}$ : is the weight matrix in the $l^{th}$ layer
$b^{[l]} \in \mathbb{R}^{n_h^{[l]}}$ : is the bias vector in the $l^{th}$ layer
$\hat y \in \mathbb{R}^{n_y}$ : is the predicted output vector

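[Figure: a network consisting of only an input layer and an output layer]
This network is very simple: it consists of just two layers, the input layer and the output layer.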

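Following the notation introduced above, the dimensions for this network are:
$X \in \mathbb{R}^{12288 \times m}$ (that is, $n_x = 12288$)
$Y \in \mathbb{R}^{1 \times m}$ (that is, $n_y = 1$)
$W^{[1]} \in \mathbb{R}^{1 \times 12288}$
$b^{[1]} \in \mathbb{R}^{1 \times 1}$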

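Next, the parameters $W^{[1]}$ and $b^{[1]}$ need to be initialized. There are many ways to initialize neural network parameters; a later post will cover them in detail.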
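To make the dimensions above concrete, here is a minimal sketch in plain Python (no third-party libraries). The batch size `m = 4`, the helpers `zeros` and `shape`, and the choice of zero initialization are all assumptions made for illustration; the text does not prescribe an initialization method.

```python
# Hypothetical sizes: n_x and n_y match the example above; m is an arbitrary small batch.
n_x, n_y, m = 12288, 1, 4

def zeros(rows, cols):
    """Return a rows x cols matrix (a list of rows) filled with 0.0."""
    return [[0.0] * cols for _ in range(rows)]

def shape(matrix):
    """Return (rows, cols) of a list-of-rows matrix."""
    return (len(matrix), len(matrix[0]))

X = zeros(n_x, m)     # inputs, one column per example
Y = zeros(n_y, m)     # labels, one column per example
W1 = zeros(n_y, n_x)  # weights; zero initialization is an assumption of this sketch
b1 = zeros(n_y, 1)    # bias

# The product W1 X is defined only if W1's column count equals X's row count.
assert shape(W1)[1] == shape(X)[0]

print(shape(X), shape(Y), shape(W1), shape(b1))
# (12288, 4) (1, 4) (1, 12288) (1, 1)
```

Zero initialization is adequate for this network because it has no hidden layer; whether it suits deeper networks is a separate question that this sketch does not address.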

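Activation function:
In a fully connected network, every input unit of an example $x^{(i)}$ is connected to every neuron in the next layer. Each neuron does two things:
– applies a linear function $z$ : $z^{(i)} = w^T x^{(i)} + b$
– applies an activation function $\sigma$ : $a^{(i)} = \sigma(z^{(i)})$
In this example the activation function is the sigmoid, so $\hat y^{(i)} = a^{(i)} = \text{sigmoid}(z^{(i)})$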
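The two steps performed by a neuron can be sketched for a single example as follows. The function names `sigmoid` and `neuron` and the three-feature input are hypothetical, chosen only for illustration; the sigmoid is written in a numerically stable form so that large negative $z$ does not overflow.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(w, x, b):
    """Compute z = w^T x + b, then a = sigmoid(z), for one example x."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return z, sigmoid(z)

# Hypothetical weights, input, and bias (values picked so that z is exactly 0).
w = [0.5, -0.25, 0.25]
x = [1.0, 2.0, 4.0]
b = -1.0

z, a = neuron(w, x, b)
print(z, a)  # 0.0 0.5
```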
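Loss function:
Measures the discrepancy between the prediction and the label for a single example: $L(a^{(i)}, y^{(i)}) = -\left(y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)})\right)$
Cost function:
Averages the loss function over all $m$ examples: $J = \frac{1}{m} \sum_{i=1}^{m} L(a^{(i)}, y^{(i)})$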
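The relationship between the per-example loss and the overall cost can be sketched in a few lines. The function names and the sample predictions below are hypothetical; the sketch assumes every prediction lies strictly between 0 and 1, since `math.log(0)` raises an error.

```python
import math

def loss(a, y):
    """Cross-entropy loss for one example: a is the prediction in (0, 1), y is 0 or 1."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def cost(A, Y):
    """Average the per-example loss over all m examples."""
    m = len(Y)
    return sum(loss(a, y) for a, y in zip(A, Y)) / m

# Hypothetical predictions and labels for m = 3 examples.
A = [0.9, 0.2, 0.5]
Y = [1, 0, 1]

J = cost(A, Y)
print(J)
```

A confident correct prediction (0.9 for label 1) contributes a small loss, while an uninformative prediction (0.5) contributes $\log 2$.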

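Forward propagation:
Its purpose is to pass the input feature matrix $X$ of the $m$ examples through the network layer by layer to obtain the final output $\hat Y$, and then to compute the cost $J$. Concretely, for a network with one hidden layer (in the single-layer example above the computation stops at $\hat Y = A^{[1]}$), forward propagation is:
$Z^{[1]} = W^{[1]}X + b^{[1]}$
$A^{[1]} = \text{sigmoid}(Z^{[1]})$
$Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}$
$A^{[2]} = \text{sigmoid}(Z^{[2]})$
$\hat Y = A^{[2]}$
$J = -\frac{1}{m} \left(Y \log(\hat Y)^T + (1 - Y)\log(1 - \hat Y)^T\right)$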
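The forward pass above can be sketched in plain Python, with matrices stored as lists of rows and one column per example. The helper names (`linear`, `activate`, `forward`), the tiny sizes, and the all-zero parameters are assumptions for illustration. Zero parameters are used only because they make the result easy to check by hand: every activation becomes 0.5 and $J = \log 2$.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(W, A_prev, b):
    """Z = W A_prev + b; b holds one bias per unit and is broadcast over the m columns."""
    m = len(A_prev[0])
    return [[sum(w * A_prev[k][i] for k, w in enumerate(row)) + b_j
             for i in range(m)]
            for row, b_j in zip(W, b)]

def activate(Z):
    """Apply the sigmoid element-wise."""
    return [[sigmoid(z) for z in row] for row in Z]

def forward(X, Y, W1, b1, W2, b2):
    """Return (Y_hat, J) for a network with one hidden layer and n_y = 1."""
    A1 = activate(linear(W1, X, b1))
    A2 = activate(linear(W2, A1, b2))
    Y_hat = A2
    m = len(X[0])
    # With n_y = 1, Y and Y_hat each have a single row.
    J = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for y, p in zip(Y[0], Y_hat[0])) / m
    return Y_hat, J

# Hypothetical data: n_x = 2 features, m = 3 examples, 2 hidden units.
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
Y = [[1, 0, 1]]
W1, b1 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
W2, b2 = [[0.0, 0.0]], [0.0]

Y_hat, J = forward(X, Y, W1, b1, W2, b2)
print(Y_hat)        # [[0.5, 0.5, 0.5]]
print(round(J, 4))  # 0.6931
```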
Backward propagation:
Its purpose is to update the parameters $W^{[1]}$ and $b^{[1]}$ so as to minimize $J$. The minimization is carried out by gradient descent, where $\alpha$ is the learning rate:
$dW^{[1]} = \frac{\partial J}{\partial W^{[1]}}$
$db^{[1]} = \frac{\partial J}{\partial b^{[1]}}$
$W^{[1]} = W^{[1]} - \alpha \text{ } dW^{[1]}$
$b^{[1]} = b^{[1]} - \alpha \text{ } db^{[1]}$
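For the single-layer example with a sigmoid output and the cross-entropy loss, the gradients have a well-known closed form that the text does not spell out: $dZ = A - Y$, $dW^{[1]} = \frac{1}{m} dZ \, X^T$, and $db^{[1]} = \frac{1}{m} \sum_i dZ^{(i)}$. The sketch below assumes that form; the function names, the data, and the learning rate are hypothetical.

```python
def gradients(X, A, Y):
    """Gradients of J for one sigmoid unit with cross-entropy loss.

    X has n_x rows and m columns; A and Y are length-m lists.
    """
    m = len(Y)
    dZ = [a - y for a, y in zip(A, Y)]
    dW = [sum(dz * x for dz, x in zip(dZ, row)) / m for row in X]
    db = sum(dZ) / m
    return dW, db

def update(W, b, dW, db, alpha):
    """One gradient descent step: move each parameter against its gradient."""
    W_new = [w - alpha * dw for w, dw in zip(W, dW)]
    b_new = b - alpha * db
    return W_new, b_new

# Hypothetical data: n_x = 2 features, m = 2 examples.
X = [[1.0, 2.0],
     [0.0, 1.0]]
Y = [1, 0]
A = [0.5, 0.5]       # predictions produced by all-zero parameters
W, b = [0.0, 0.0], 0.0

dW, db = gradients(X, A, Y)
W, b = update(W, b, dW, db, alpha=0.1)
print(dW, db)  # [0.25, 0.25] 0.0
```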

Summary: the main steps for building a neural network are as follows

Define the model structure (such as the number of input features)
Initialize the model’s parameters ( $W^{[1]}, b^{[1]}$ )
Loop:
- Calculate the current loss (forward propagation)
- Calculate the current gradient (backward propagation)
- Update the parameters (gradient descent)
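The steps above can be put together into a minimal training loop for the single-layer network. This is a sketch under stated assumptions, not a reference implementation: the function name `train`, the zero initialization, the toy one-feature dataset, the learning rate, and the iteration count are all hypothetical, and the gradient formulas are the standard closed form for a sigmoid output with cross-entropy loss.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, Y, alpha=0.5, iterations=200):
    """Train one sigmoid unit. X has n_x rows and m columns; Y is a length-m list of 0/1."""
    # Define the model structure.
    n_x, m = len(X), len(Y)
    # Initialize the parameters (zeros: an assumption of this sketch).
    W, b = [0.0] * n_x, 0.0
    costs = []
    for _ in range(iterations):
        # Forward propagation: predictions and current cost.
        A = [sigmoid(sum(W[j] * X[j][i] for j in range(n_x)) + b)
             for i in range(m)]
        J = -sum(y * math.log(a) + (1 - y) * math.log(1 - a)
                 for a, y in zip(A, Y)) / m
        costs.append(J)
        # Backward propagation: gradients of J.
        dZ = [a - y for a, y in zip(A, Y)]
        dW = [sum(dz * x for dz, x in zip(dZ, X[j])) / m for j in range(n_x)]
        db = sum(dZ) / m
        # Gradient descent update.
        W = [w - alpha * dw for w, dw in zip(W, dW)]
        b = b - alpha * db
    return W, b, costs

# Hypothetical toy data: one feature, label 1 when the feature is positive.
X = [[-2.0, -1.0, 1.0, 2.0]]
Y = [0, 0, 1, 1]

W, b, costs = train(X, Y)
print(costs[0], costs[-1])  # the cost should drop from log(2) as training proceeds
```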