Basic Concepts
epoch: one pass over the entire training set is called one epoch.
one epoch = one forward pass and one backward pass of all the training examples
The corresponding parameter in the code is n_epochs.
batch_size: the number of samples in one batch.
A training set usually contains a large number of samples, so memory limits typically make it impossible to load them all at once; splitting also speeds up training. The training set is therefore divided into n_batch groups of batch_size samples each:
train_set = batch_size * n_batch
batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
iterations: one training pass over all the samples in a single batch is called one iteration.
number of iterations = number of passes, each pass using batch_size examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).
n_iterations = n_epochs * n_batch
The overall flow is:
# number of epochs
n_epochs = 100
# total number of samples
numSamples = 100000
# split the samples into n_batch groups
n_batch = 10
# number of samples per batch
batch_size = numSamples // n_batch
# training loop
iterations = 0
for i in range(n_epochs):
    for j in range(n_batch):
        # train on the j-th batch
        train(j)
        # increment the iteration count
        iterations += 1
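In a real framework the batch splitting is handled by a data loader, but the same loop can be sketched in plain NumPy: shuffle once per epoch, slice out batch_size samples per iteration. Here `train_step` is a hypothetical stand-in for one forward/backward pass:

```python
import numpy as np

def train_step(batch):
    # hypothetical stand-in for one forward + backward pass
    return batch.mean()

num_samples = 1000
batch_size = 100
n_batch = num_samples // batch_size
X = np.random.rand(num_samples, 28 * 28).astype('float32')

n_epochs = 3
iterations = 0
for epoch in range(n_epochs):
    # reshuffle the training set at the start of each epoch
    perm = np.random.permutation(num_samples)
    for j in range(n_batch):
        # slice out the j-th batch of batch_size samples
        batch = X[perm[j * batch_size:(j + 1) * batch_size]]
        train_step(batch)
        iterations += 1
```

After the loop, `iterations == n_epochs * n_batch` (here 3 * 10 = 30), matching the relation n_iterations = n_epochs * n_batch above.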
Setting the Hyperparameters
Frankly, n_epochs is a one-way parameter: the more epochs, the longer training takes and the better the result tends to be. Even if the model falls into a local minimum, a few more epochs won't make things worse. n_iterations is a derived quantity.
batch_size is the most critical one, with many software and hardware implications:
- A larger size means more data is loaded into memory and fed to the GPU at once, so hardware utilization is high, and the gradient direction better represents that of the whole dataset; but each gradient update takes longer, so reaching the same accuracy may require a few more epochs. Push it larger still and the hardware may run out of resources and error out, or the model may settle into a poor minimum early and never escape.
- A smaller size wastes hardware resources and makes gradient updates highly random, but it is also flexible: after many rapid gradient updates the result may be a mess, or the chaos may produce a pleasant surprise.
So tuning it is a craft that takes experience and intuition. As an example, many hello-world demos use MNIST classification as a starting point and typically concatenate all the images into one tensor, opening with batch_size = 128; on my RTX 2080 (8 GB), that blows up with a GPU sync failed error after three epochs. Setting batch_size = 32 or batch_size = 64 is a better choice.
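A practical pattern when a chosen batch_size turns out too large for GPU memory is to catch the out-of-memory error and retry with half the size. The sketch below is illustrative only: `run_training` is a hypothetical function that simulates a 64-sample GPU limit, and the real exception type varies by framework (e.g. `tf.errors.ResourceExhaustedError` in TensorFlow):

```python
def run_training(batch_size):
    # hypothetical training call; simulates a GPU that can
    # only fit batches of up to 64 samples
    if batch_size > 64:
        raise MemoryError(f"batch_size={batch_size} does not fit")
    return batch_size

batch_size = 128
while batch_size >= 1:
    try:
        used = run_training(batch_size)
        break
    except MemoryError:
        # halve and retry, trading hardware efficiency for fitting in memory
        batch_size //= 2
```

With the simulated 64-sample limit above, the first attempt at 128 fails and training succeeds at 64.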
Code
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Flatten, MaxPooling2D, Conv2D
from keras.callbacks import TensorBoard
# Load and process the MNIST data
(X_train,y_train), (X_test, y_test) = mnist.load_data("./mnist.npz")
X_train = X_train.reshape(60000,28,28,1).astype('float32')
X_test = X_test.reshape(10000,28,28,1).astype('float32')
X_train /= 255
X_test /= 255
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)
# Create the LeNet-5 neural network architecture
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)) )
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Set log data to feed to TensorBoard for visual analysis
tensor_board = TensorBoard('./logs/LeNet-MNIST-2-BatchSize=16')
# Train the model
model.fit(X_train, y_train, batch_size=16, epochs=15, verbose=1,
validation_data=(X_test,y_test), callbacks=[tensor_board])
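For the 60,000 training images loaded above, the number of gradient updates (iterations) per epoch at each candidate batch size is easy to compute; ceiling division accounts for the final partial batch, which Keras still trains on:

```python
import math

num_train = 60000
steps = {bs: math.ceil(num_train / bs) for bs in (16, 32, 64)}
for bs, n in steps.items():
    print(f"batch_size={bs}: {n} iterations per epoch")
# batch_size=16: 3750 iterations per epoch
# batch_size=32: 1875 iterations per epoch
# batch_size=64: 938 iterations per epoch
```

So halving the batch size doubles the number of gradient updates per epoch, which is why the smaller settings take noticeably longer per epoch.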
- With epochs fixed at 15 and batch_size compared across 64 vs 32 vs 16: the training curves (top figure) show that train-set acc and loss are close for all three, but 64 converges fastest and 16 slowest. The evaluation curves (bottom figure) show that 32 reaches higher accuracy, i.e., the better model. On balance, batch_size = 32 is the best choice.