Basic Concepts
epoch: one pass over the entire training set is called one epoch.
one epoch = one forward pass and one backward pass of all the training examples
The corresponding parameter in the code is n_epochs.
batch_size: the number of samples in one batch.
A training set usually contains a large number of samples, so memory limits typically make it impossible to load them all at once; splitting also speeds up training. The training set is therefore divided into n_batch groups of batch_size samples each:
train_set = batch_size * n_batch
batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
iterations: one training pass over all the samples in a single batch is called one iteration.
number of iterations = number of passes, each pass using batch_size examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).
n_iterations = n_epochs * n_batch
The overall flow is:
# number of epochs
n_epochs = 100
# total number of samples
numSamples = 100000
# split the samples into n_batch groups
n_batch = 10
# number of samples per batch
batch_size = numSamples // n_batch
# training loop
iterations = 0
for i in range(n_epochs):
    for j in range(n_batch):
        # train on the j-th batch
        train(j)
        # increment the iteration count
        iterations += 1
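In a real framework the batch splitting is handled by a data loader, but the same loop can be sketched in plain NumPy: shuffle once per epoch, slice out batch_size samples per iteration. Here `train_step` is a hypothetical stand-in for one forward/backward pass:

```python
import numpy as np

def train_step(batch):
    # hypothetical stand-in for one forward + backward pass
    return batch.mean()

num_samples = 1000
batch_size = 100
n_batch = num_samples // batch_size
X = np.random.rand(num_samples, 28 * 28).astype('float32')

n_epochs = 3
iterations = 0
for epoch in range(n_epochs):
    # reshuffle the training set at the start of each epoch
    perm = np.random.permutation(num_samples)
    for j in range(n_batch):
        # slice out the j-th batch of batch_size samples
        batch = X[perm[j * batch_size:(j + 1) * batch_size]]
        train_step(batch)
        iterations += 1
```

After the loop, `iterations == n_epochs * n_batch` (here 3 * 10 = 30), matching the relation n_iterations = n_epochs * n_batch above.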
Setting the Hyperparameters
Frankly, n_epochs is a one-way parameter: the more epochs, the longer training takes and the better the result tends to be. Even if the model falls into a local minimum, a few more epochs won't make things worse. n_iterations is a derived quantity.
batch_size is the most critical one, with many software and hardware implications:
- A larger size means more data is loaded into memory and fed to the GPU at once, so hardware utilization is high, and the gradient direction better represents that of the whole dataset; but each gradient update takes longer, so reaching the same accuracy may require a few more epochs. Push it larger still and the hardware may run out of resources and error out, or the model may settle into a poor minimum early and never escape.
- A smaller size wastes hardware resources and makes gradient updates highly random, but it is also flexible: after many rapid gradient updates the result may be a mess, or the chaos may produce a pleasant surprise.
So tuning it is a craft that takes experience and intuition. As an example, many hello-world demos use MNIST classification as a starting point and typically concatenate all the images into one tensor, opening with batch_size = 128; on my RTX 2080 (8 GB), that blows up with a GPU sync failed error after three epochs. Setting batch_size = 32 or batch_size = 64 is a better choice.
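A practical pattern when a chosen batch_size turns out too large for GPU memory is to catch the out-of-memory error and retry with half the size. The sketch below is illustrative only: `run_training` is a hypothetical function that simulates a 64-sample GPU limit, and the real exception type varies by framework (e.g. `tf.errors.ResourceExhaustedError` in TensorFlow):

```python
def run_training(batch_size):
    # hypothetical training call; simulates a GPU that can
    # only fit batches of up to 64 samples
    if batch_size > 64:
        raise MemoryError(f"batch_size={batch_size} does not fit")
    return batch_size

batch_size = 128
while batch_size >= 1:
    try:
        used = run_training(batch_size)
        break
    except MemoryError:
        # halve and retry, trading hardware efficiency for fitting in memory
        batch_size //= 2
```

With the simulated 64-sample limit above, the first attempt at 128 fails and training succeeds at 64.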
Code
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Flatten, MaxPooling2D, Conv2D
from keras.callbacks import TensorBoard
# Load and process the MNIST data
(X_train,y_train), (X_test, y_test) = mnist.load_data("./mnist.npz")
X_train = X_train.reshape(60000,28,28,1).astype('float32')
X_test = X_test.reshape(10000,28,28,1).astype('float32')
X_train /= 255
X_test /= 255
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)
# Create the LeNet-5 neural network architecture
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)) )
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Set log data to feed to TensorBoard for visual analysis
tensor_board = TensorBoard('./logs/LeNet-MNIST-2-BatchSize=16')
# Train the model
model.fit(X_train, y_train, batch_size=16, epochs=15, verbose=1,
validation_data=(X_test,y_test), callbacks=[tensor_board])
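For the 60,000 training images loaded above, the number of gradient updates (iterations) per epoch at each candidate batch size is easy to compute; ceiling division accounts for the final partial batch, which Keras still trains on:

```python
import math

num_train = 60000
steps = {bs: math.ceil(num_train / bs) for bs in (16, 32, 64)}
for bs, n in steps.items():
    print(f"batch_size={bs}: {n} iterations per epoch")
# batch_size=16: 3750 iterations per epoch
# batch_size=32: 1875 iterations per epoch
# batch_size=64: 938 iterations per epoch
```

So halving the batch size doubles the number of gradient updates per epoch, which is why the smaller settings take noticeably longer per epoch.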
- With epochs fixed at 15 and batch_size compared across 64 vs 32 vs 16: the training curves (top figure) show that train-set acc and loss are close for all three, but 64 converges fastest and 16 slowest. The evaluation curves (bottom figure) show that 32 reaches higher accuracy, i.e., the better model. On balance, batch_size = 32 is the best choice.