Part I :

- Activation Functions

- Data Preprocessing

- Weight Initialization

- Batch Normalization

- Babysitting the Learning Process

- Hyperparameter Optimization 

 

  • Activation Functions

Training Neural Networks, part I

Training Neural Networks, part I

  • Sigmoid

Training Neural Networks, part I

问题:

1.梯度消失

Training Neural Networks, part I

2.非零中心 ------> 梯度更新低效

Training Neural Networks, part I

3.指数函数的计算代价高

  • tanh

Training Neural Networks, part I

以零为中心,但还是没有解决梯度消失的问题

  • ReLU

Training Neural Networks, part I

计算速度快,收敛速度快,在正半轴不会饱和

不是以零为中心,在负半轴饱和(包括0)

Training Neural Networks, part I

dead relu不会被**或更新,导致的原因:

                           (1)权值初始化不好

                           (2)学习率太高 ---> 权值波动大

  • Leaky ReLU

Training Neural Networks, part I

  • PReLU

Training Neural Networks, part I

  • ELU

Training Neural Networks, part I

  • Maxout

Training Neural Networks, part I

  • In practice:

Training Neural Networks, part I

 

  • Data Preprocessing

一般会做零中心化处理

Training Neural Networks, part I

不总是做归一化

Training Neural Networks, part I

仅作中心化的方法:

Training Neural Networks, part I

 

  • Weight Initialization

1. 初始化为0时(相同的值)

神经元会做相同的事情,没有打破参数对称问题

 

2. 初始化为很小的随机数

Training Neural Networks, part I

Training Neural Networks, part I

梯度会趋近于0

 

3. 初始化在高斯分布(-1, 1)中 

Training Neural Networks, part I

会饱和(在tanh中)

Training Neural Networks, part I

 

4. Xavier (tanh)

Training Neural Networks, part I

 

5.  Xavier (ReLU)

Training Neural Networks, part I

 

  • Batch Normalization 

Training Neural Networks, part I

归一化 + 缩放,平移

Training Neural Networks, part ITraining Neural Networks, part I

Training Neural Networks, part I

 

 

  • Babysitting the Learning Process

 

  • Step 1: Preprocess the data 

Training Neural Networks, part I

 

  • Step 2: Choose the architecture
  • Step 3: Double check that the loss is reasonable

 Training Neural Networks, part I

Training Neural Networks, part I

 

  • Step 4: try to train

 1. Make sure that you can overfit very small portion of the training data

Training Neural Networks, part I

Very small loss, train accuracy 1.00, nice!

2. Start with small regularization and find learning rate that makes the loss go down.

Training Neural Networks, part I

loss not going down :  learning rate too low

3. try a big learning rate

 Training Neural Networks, part I

loss exploding : learning rate too high

4. Rough range for learning rate we should be cross-validating is somewhere [1e-3 … 1e-5]

 

  • Hyperparameter Optimization

 Training Neural Networks, part I

Training Neural Networks, part I

Training Neural Networks, part I

问题:不能在空间上充分寻找,寻找范围是有限的

Random Search: More sample

Training Neural Networks, part I

 

Hyperparameters to play with:

- network architecture

- learning rate, its decay schedule, update type

- regularization (L2/Dropout strength)

 

Training Neural Networks, part I

Training Neural Networks, part I

Training Neural Networks, part I

Training Neural Networks, part I

 

  • Summay

Training Neural Networks, part I

相关文章: