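The neural network we build this time is shown in the figure below:
Its input is a 28*28-pixel grayscale image. The input first passes through a convolutional layer with 5*5 kernels and 32 output channels, which yields 32 feature maps of size 24*24. A pooling layer follows (we chose max pooling), giving data of size 12*12*32. The data then passes through a second convolutional layer with 5*5 kernels and 64 output channels, giving 8*8*64, and a second pooling step reduces this to 4*4*64. Finally, fully connected layers produce the result: in the code below, a 1024-unit layer with dropout followed by a 10-unit output layer.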
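The sizes quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the helper name conv_output_size is ours, and it assumes the convolutions use 'VALID' padding with stride 1 and the pooling layers use a 2*2 window with stride 2 and 'SAME' padding, as in the build_cnn code later in this post.

```python
import math

def conv_output_size(size, kernel, stride=1, padding='VALID'):
    """Output length along one spatial axis (hypothetical helper)."""
    if padding == 'VALID':
        # no padding: count only positions where the window fits entirely
        return (size - kernel) // stride + 1
    if padding == 'SAME':
        # zero padding is added so that output = ceil(input / stride)
        return math.ceil(size / stride)
    raise ValueError('unknown padding: %r' % padding)

sizes = []
size = 28
for kernel, stride, padding in [(5, 1, 'VALID'),   # conv_1
                                (2, 2, 'SAME'),    # max pooling 1
                                (5, 1, 'VALID'),   # conv_2
                                (2, 2, 'SAME')]:   # max pooling 2
    size = conv_output_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)
```

Each entry is the width (and height) after the corresponding layer; the number of channels is set separately by n_output_channels.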

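1. Download and prepare the data
In this part we prepare the dataset. We use MNIST, the most common handwritten digit image set. The load_mnist function reads the MNIST files, and we then print the sizes of the training, validation, and test sets.
The code is as follows: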
import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte'
                               % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte'
                               % kind)
    # label file: 8-byte header (magic number, item count), then uint8 labels
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
        labels = np.fromfile(lbpath,
                             dtype=np.uint8)
    # image file: 16-byte header, then one uint8 per pixel
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII",
                                               imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)
    return images, labels

X_data, y_data = load_mnist('./', kind='train')
print('Rows: %d, Columns: %d' % (X_data.shape[0], X_data.shape[1]))
X_test, y_test = load_mnist('./', kind='t10k')
print('Rows: %d, Columns: %d' % (X_test.shape[0], X_test.shape[1]))

# first 50000 training images for training, the rest for validation
X_train, y_train = X_data[:50000,:], y_data[:50000]
X_valid, y_valid = X_data[50000:,:], y_data[50000:]
print('Training: ', X_train.shape, y_train.shape)
print('Validation: ', X_valid.shape, y_valid.shape)
print('Test Set: ', X_test.shape, y_test.shape)
The output is as follows:

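To see what the struct.unpack calls in load_mnist are doing, the sketch below builds a tiny fake image file in memory and parses it the same way. It is an illustration only: the data is synthetic, and io.BytesIO with np.frombuffer stands in for the real file and np.fromfile. The header layout assumed here (four big-endian 32-bit integers, with magic number 2051 for image files) is that of the MNIST IDX format.

```python
import io
import struct
import numpy as np

# Synthetic stand-in for an IDX3 image file: 2 images of 28x28 pixels.
n_images, rows, cols = 2, 28, 28
pixels = (np.arange(n_images * rows * cols) % 256).astype(np.uint8)
header = struct.pack('>IIII', 2051, n_images, rows, cols)  # big-endian uint32 x 4
buf = io.BytesIO(header + pixels.tobytes())

# Parse it the way load_mnist does: 16-byte header first, raw pixels after.
magic, num, r, c = struct.unpack('>IIII', buf.read(16))
images = np.frombuffer(buf.read(), dtype=np.uint8).reshape(num, r * c)

print(magic, num, r, c, images.shape)
```

The '>' in the format string selects big-endian byte order, which matters because the header would be misread on little-endian machines without it.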
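2. Preparation before building the model
Once the data is loaded, we need to draw fixed-size batches from it, so we define a batch generator function. It yields tuples of (images, labels).
The code is as follows: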
def batch_generator(X, y, batch_size=64,
                    shuffle=False, random_seed=None):
    idx = np.arange(y.shape[0])
    if shuffle:
        rng = np.random.RandomState(random_seed)
        rng.shuffle(idx)
        X = X[idx]
        y = y[idx]
    for i in range(0, X.shape[0], batch_size):
        yield (X[i:i+batch_size, :], y[i:i+batch_size])
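Two properties of this generator are easy to miss: the last batch is smaller when the number of samples is not a multiple of batch_size, and shuffling by a shared index keeps each image paired with its label. The sketch below checks both on toy data; iter_batches is a simplified stand-in we wrote for this illustration, not the function above.

```python
import math
import numpy as np

def iter_batches(X, y, batch_size):
    """Simplified stand-in for batch_generator (no shuffling)."""
    for i in range(0, X.shape[0], batch_size):
        yield X[i:i + batch_size], y[i:i + batch_size]

# Toy data: row k of X_toy starts with 3*k, and its label is k.
X_toy = np.arange(10 * 3).reshape(10, 3)
y_toy = np.arange(10)

batch_sizes = [bx.shape[0] for bx, _ in iter_batches(X_toy, y_toy, batch_size=4)]
print(batch_sizes)

# Shuffling X and y with the same index array keeps rows and labels aligned.
idx = np.random.RandomState(0).permutation(10)
X_shuf, y_shuf = X_toy[idx], y_toy[idx]
still_aligned = bool(np.all(X_shuf[:, 0] == 3 * y_shuf))

# For the 50000 training images and batch_size=64:
n_batches = math.ceil(50000 / 64)
last_batch = 50000 - (n_batches - 1) * 64
print(n_batches, last_batch)
```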
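Next, to make training behave better and converge faster, we normalize the data. We compute the mean of each feature (pixel) and a single standard deviation over all pixels of the training set, and use these training-set statistics to normalize the training, validation, and test sets alike.
The code is as follows: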
mean_vals = np.mean(X_train, axis=0)   # per-pixel mean
std_val = np.std(X_train)              # one scalar over all pixels
X_train_centered = (X_train - mean_vals)/std_val
X_valid_centered = (X_valid - mean_vals)/std_val
X_test_centered = (X_test - mean_vals)/std_val
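One plausible reason for dividing by a single overall standard deviation rather than a per-pixel one is that some pixels never change across the dataset (for example, pixels in the image corners), so their per-pixel standard deviation is zero. The sketch below illustrates this on synthetic data; the array X and its always-zero column are made up for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randint(0, 256, size=(100, 4)).astype(np.float64)
X[:, 0] = 0.0  # a "pixel" that is always black

mean_vals = X.mean(axis=0)        # per-feature mean
per_feature_std = X.std(axis=0)   # zero for column 0
global_std = X.std()              # one scalar, safely nonzero

# Dividing by per_feature_std would divide by zero in column 0;
# dividing by the scalar keeps every value finite.
X_centered = (X - mean_vals) / global_std
print(per_feature_std[0], np.isfinite(X_centered).all())
```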
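3. Building the CNN with TensorFlow's low-level API
We start by defining wrapper functions for the convolutional layer and the fully connected layer, which simplifies building the network.
First, the convolutional layer. Here we define the weights and biases and initialize them. The convolution itself uses tf.nn.conv2d. The weights use Xavier (Glorot) initialization, which tf.get_variable applies by default when no initializer is given; the biases are initialized with tf.zeros; and ReLU serves as the activation function.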
import tensorflow as tf
import numpy as np

## wrapper functions
def conv_layer(input_tensor, name,
               kernel_size, n_output_channels,
               padding_mode='SAME', strides=(1, 1, 1, 1)):
    with tf.variable_scope(name):
        ## get n_input_channels:
        ##   input tensor shape:
        ##   [batch x height x width x channels_in]
        input_shape = input_tensor.get_shape().as_list()
        n_input_channels = input_shape[-1]
        weights_shape = (list(kernel_size) +
                         [n_input_channels, n_output_channels])
        ## no initializer given: the default (Xavier/Glorot) is used
        weights = tf.get_variable(name='_weights',
                                  shape=weights_shape)
        print(weights)
        biases = tf.get_variable(name='_biases',
                                 initializer=tf.zeros(
                                     shape=[n_output_channels]))
        print(biases)
        conv = tf.nn.conv2d(input=input_tensor,
                            filter=weights,
                            strides=strides,
                            padding=padding_mode)
        print(conv)
        conv = tf.nn.bias_add(conv, biases,
                              name='net_pre-activation')
        print(conv)
        conv = tf.nn.relu(conv, name='activation')
        print(conv)
        return conv
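For readers curious what Xavier initialization means numerically, here is a small NumPy sketch. It assumes the uniform variant (weights drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))), which we believe is what tf.get_variable falls back to in TensorFlow 1.x; the helper name glorot_uniform_limit is ours and is not a TensorFlow function.

```python
import numpy as np

def glorot_uniform_limit(shape):
    """Bound of the uniform range for a weight tensor (hypothetical helper).

    Conv kernels have shape (kh, kw, channels_in, channels_out);
    dense weights have shape (units_in, units_out).
    """
    receptive_field = int(np.prod(shape[:-2]))  # kh*kw, or 1 for dense weights
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return np.sqrt(6.0 / (fan_in + fan_out))

conv1_shape = (5, 5, 1, 32)   # first conv layer in this post
limit = glorot_uniform_limit(conv1_shape)

rng = np.random.RandomState(0)
weights = rng.uniform(-limit, limit, size=conv1_shape)
print(limit, weights.shape)
```

The limit shrinks as a layer gets wider, which keeps the variance of the activations roughly constant from layer to layer.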
We test the function with a simple input:
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
    conv_layer(x, name='convtest', kernel_size=(3, 3), n_output_channels=32)
del g, x
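The result is as follows, and the function works as expected:
Next we define the fully connected layer function, fc_layer. As in conv_layer, it creates the weights and biases and initializes them the same way, then uses tf.matmul to perform the matrix multiplication. The function has three required arguments: the input tensor, the name of the layer (used as the variable scope), and the number of output units. An optional fourth argument sets the activation function.
The code is as follows: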
def fc_layer(input_tensor, name,
             n_output_units, activation_fn=None):
    with tf.variable_scope(name):
        input_shape = input_tensor.get_shape().as_list()[1:]
        n_input_units = np.prod(input_shape)
        ## flatten inputs that have more than one non-batch dimension
        if len(input_shape) > 1:
            input_tensor = tf.reshape(input_tensor,
                                      shape=(-1, n_input_units))
        weights_shape = [n_input_units, n_output_units]
        weights = tf.get_variable(name='_weights',
                                  shape=weights_shape)
        print(weights)
        biases = tf.get_variable(name='_biases',
                                 initializer=tf.zeros(
                                     shape=[n_output_units]))
        print(biases)
        layer = tf.matmul(input_tensor, weights)
        print(layer)
        layer = tf.nn.bias_add(layer, biases,
                               name='net_pre-activation')
        print(layer)
        if activation_fn is None:
            return layer
        layer = activation_fn(layer, name='activation')
        print(layer)
        return layer
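The flatten-then-multiply step inside fc_layer can be mimicked in plain NumPy. The sketch below uses made-up zero-filled arrays with the shapes that appear later in this post (a 4*4*64 pooled output feeding 1024 units) simply to show how the shapes and the parameter count work out; it is not part of the model.

```python
import numpy as np

# Stand-in for a batch of 3 outputs from the second pooling layer.
pooled = np.zeros((3, 4, 4, 64), dtype=np.float32)

n_input_units = int(np.prod(pooled.shape[1:]))   # 4 * 4 * 64
flat = pooled.reshape(-1, n_input_units)         # one row per sample

weights = np.zeros((n_input_units, 1024), dtype=np.float32)
biases = np.zeros(1024, dtype=np.float32)

out = flat @ weights + biases                    # what tf.matmul + bias_add compute
n_params = weights.size + biases.size
print(flat.shape, out.shape, n_params)
```

Most of the trainable parameters of this network sit in this one layer, which is part of the motivation for applying dropout to it.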
Again we verify the function with a simple input.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32,
                       shape=[None, 28, 28, 1])
    fc_layer(x, name='fctest', n_output_units=32,
             activation_fn=tf.nn.relu)
del g, x
The output is as follows:

Now for the main event: we are ready to build the CNN itself. We define build_cnn to handle the construction of the CNN model.
The code is as follows:
def build_cnn():
    ## Placeholders for X and y:
    tf_x = tf.placeholder(tf.float32, shape=[None, 784],
                          name='tf_x')
    tf_y = tf.placeholder(tf.int32, shape=[None],
                          name='tf_y')
    # reshape x to a 4D tensor:
    # [batchsize, height, width, 1]
    tf_x_image = tf.reshape(tf_x, shape=[-1, 28, 28, 1],
                            name='tf_x_reshaped')
    ## One-hot encoding:
    tf_y_onehot = tf.one_hot(indices=tf_y, depth=10,
                             dtype=tf.float32,
                             name='tf_y_onehot')
    ## 1st layer: Conv_1
    print('\nBuilding 1st layer: ')
    h1 = conv_layer(tf_x_image, name='conv_1',
                    kernel_size=(5, 5),
                    padding_mode='VALID',
                    n_output_channels=32)
    ## MaxPooling
    h1_pool = tf.nn.max_pool(h1,
                             ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1],
                             padding='SAME')
    ## 2nd layer: Conv_2
    print('\nBuilding 2nd layer: ')
    h2 = conv_layer(h1_pool, name='conv_2',
                    kernel_size=(5,5),
                    padding_mode='VALID',
                    n_output_channels=64)
    ## MaxPooling
    h2_pool = tf.nn.max_pool(h2,
                             ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1],
                             padding='SAME')
    ## 3rd layer: Fully Connected
    print('\nBuilding 3rd layer:')
    h3 = fc_layer(h2_pool, name='fc_3',
                  n_output_units=1024,
                  activation_fn=tf.nn.relu)
    ## Dropout
    keep_prob = tf.placeholder(tf.float32, name='fc_keep_prob')
    h3_drop = tf.nn.dropout(h3, keep_prob=keep_prob,
                            name='dropout_layer')
    ## 4th layer: Fully Connected (linear activation)
    print('\nBuilding 4th layer:')
    h4 = fc_layer(h3_drop, name='fc_4',
                  n_output_units=10,
                  activation_fn=None)
    ## Prediction
    predictions = {
        'probabilities' : tf.nn.softmax(h4, name='probabilities'),
        'labels' : tf.cast(tf.argmax(h4, axis=1), tf.int32,
                           name='labels')
    }
    ## Visualize the graph with TensorBoard (code not shown)
    ## Loss Function and Optimization
    cross_entropy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=h4, labels=tf_y_onehot),
        name='cross_entropy_loss')
    ## Optimizer:
    ## learning_rate is a global that must be defined
    ## before build_cnn() is called
    optimizer = tf.train.AdamOptimizer(learning_rate)
    optimizer = optimizer.minimize(cross_entropy_loss,
                                   name='train_op')
    ## Computing the prediction accuracy
    correct_predictions = tf.equal(
        predictions['labels'],
        tf_y, name='correct_preds')
    accuracy = tf.reduce_mean(
        tf.cast(correct_predictions, tf.float32),
        name='accuracy')
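The loss and accuracy nodes at the end of build_cnn are easier to follow with concrete numbers. The NumPy sketch below recomputes softmax, mean cross-entropy, and accuracy for three made-up samples with three classes; the logits and labels are invented for the illustration and have nothing to do with MNIST.

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1.2, 0.2, 3.0]])
labels = np.array([0, 1, 1])          # the third sample will be misclassified
onehot = np.eye(3)[labels]            # same role as tf.one_hot

probs = softmax(logits)                                   # 'probabilities'
loss = -np.mean(np.sum(onehot * np.log(probs), axis=1))   # mean cross-entropy
pred = probs.argmax(axis=1)                               # 'labels'
accuracy = np.mean(pred == labels)                        # 'accuracy'
print(pred, accuracy, loss)
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits (as build_cnn does) gives the same labels as taking the argmax of the probabilities.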
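The TensorBoard visualization of the graph built by build_cnn is as follows:
Next we define four more functions: save and load, which save and restore checkpoints of the trained model; train, which trains the model on training_set; and predict, which returns either the predicted labels or the class probabilities for the test data.
The code is as follows: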
import os

def save(saver, sess, epoch, path='./model/'):
    if not os.path.isdir(path):
        os.makedirs(path)
    print('Saving model in %s' % path)
    saver.save(sess, os.path.join(path,'cnn-model.ckpt'),
               global_step=epoch)

def load(saver, sess, path, epoch):
    print('Loading model from %s' % path)
    saver.restore(sess, os.path.join(
        path, 'cnn-model.ckpt-%d' % epoch))

def train(sess, training_set, validation_set=None,
          initialize=True, epochs=20, shuffle=True,
          dropout=0.5, random_seed=None):
    X_data = np.array(training_set[0])
    y_data = np.array(training_set[1])
    training_loss = []
    ## initialize variables
    if initialize:
        sess.run(tf.global_variables_initializer())
    ## seed NumPy's global RNG (note: batch_generator
    ## creates its own RandomState when shuffling)
    np.random.seed(random_seed)
    for epoch in range(1, epochs+1):
        batch_gen = batch_generator(
            X_data, y_data,
            shuffle=shuffle)
        avg_loss = 0.0
        for i,(batch_x,batch_y) in enumerate(batch_gen):
            feed = {'tf_x:0': batch_x,
                    'tf_y:0': batch_y,
                    'fc_keep_prob:0': dropout}
            loss, _ = sess.run(
                ['cross_entropy_loss:0', 'train_op'],
                feed_dict=feed)
            avg_loss += loss
        ## avg_loss holds the summed batch losses; divide by
        ## the number of batches to get the average
        training_loss.append(avg_loss / (i+1))
        print('Epoch %02d Training Avg. Loss: %7.3f' % (
            epoch, training_loss[-1]), end=' ')
        if validation_set is not None:
            feed = {'tf_x:0': validation_set[0],
                    'tf_y:0': validation_set[1],
                    'fc_keep_prob:0':1.0}
            valid_acc = sess.run('accuracy:0', feed_dict=feed)
            print(' Validation Acc: %7.3f' % valid_acc)
        else:
            print()

def predict(sess, X_test, return_proba=False):
    feed = {'tf_x:0': X_test,
            'fc_keep_prob:0': 1.0}
    if return_proba:
        return sess.run('probabilities:0', feed_dict=feed)
    else:
        return sess.run('labels:0', feed_dict=feed)
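Note that train feeds fc_keep_prob with the dropout value (0.5 by default), while the validation pass and predict feed 1.0. The NumPy sketch below imitates what we understand tf.nn.dropout to do with keep_prob: each unit is kept with probability keep_prob and the kept units are scaled by 1/keep_prob, so the expected activation is unchanged. The function dropout_sketch is our own illustration, not the TensorFlow implementation.

```python
import numpy as np

def dropout_sketch(x, keep_prob, rng):
    """Inverted dropout on a NumPy array (illustrative only)."""
    if keep_prob >= 1.0:
        return x                      # evaluation: nothing is dropped
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.RandomState(123)
h = np.ones((1000, 100))              # made-up activations, all equal to 1

h_train = dropout_sketch(h, 0.5, rng) # as fed during training
h_eval = dropout_sketch(h, 1.0, rng)  # as fed during validation/prediction

print(np.unique(h_train), h_train.mean(), h_eval.mean())
```

Because of the rescaling during training, no extra correction is needed at prediction time; feeding a keep probability of 1.0 is enough.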
Now we can create a TensorFlow graph object, set the graph-level random seed, and build the CNN model in that graph.
import tensorflow as tf
import numpy as np

## Define hyperparameters
learning_rate = 1e-4
random_seed = 123
np.random.seed(random_seed)

## create a graph
g = tf.Graph()
with g.as_default():
    tf.set_random_seed(random_seed)
    ## build the graph
    build_cnn()
    ## saver:
    saver = tf.train.Saver()
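Next we train the CNN model. We first create a TensorFlow session to launch the graph and then call the train function.
Because this is the first time the network is trained, all variables need to be initialized (initialize=True).
The code is as follows: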
with tf.Session(graph=g) as sess:
    train(sess,
          training_set=(X_train_centered, y_train),
          validation_set=(X_valid_centered, y_valid),
          initialize=True,
          random_seed=123)
    save(saver, sess, epoch=20)
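The result is as follows:
Once the 20 epochs are finished, the trained model is saved. To predict on the test set, we delete graph g, define a new graph g2,
rebuild the model in it, and restore the saved parameters.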
### Calculate prediction accuracy
### on test set
### restoring the saved model
del g

## create a new graph
## and build the model
g2 = tf.Graph()
with g2.as_default():
    tf.set_random_seed(random_seed)
    ## build the graph
    build_cnn()
    ## saver:
    saver = tf.train.Saver()

## create a new session
## and restore the model
with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')
    preds = predict(sess, X_test_centered,
                    return_proba=False)
    print('Test Accuracy: %.3f%%' % (100*
          np.sum(preds == y_test)/len(y_test)))
The result is as follows:
Next, let's look at the predictions for the first 10 test samples.
## run the prediction on
## some test samples
np.set_printoptions(precision=2, suppress=True)
with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')
    print(predict(sess, X_test_centered[:10],
                  return_proba=False))
    print(predict(sess, X_test_centered[:10],
                  return_proba=True))
The result is as follows:

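Next we train for another 20 epochs. This time we set initialize=False to skip the initialization step, so training continues from the restored weights.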
## continue training for 20 more epochs
## without re-initializing :: initialize=False
## create a new session
## and restore the model
with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')
    train(sess,
          training_set=(X_train_centered, y_train),
          validation_set=(X_valid_centered, y_valid),
          initialize=False,
          epochs=20,
          random_seed=123)
    save(saver, sess, epoch=40, path='./model/')
    preds = predict(sess, X_test_centered,
                    return_proba=False)
    print('Test Accuracy: %.3f%%' % (100*
          np.sum(preds == y_test)/len(y_test)))
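The result is as follows:
The results show that the 20 additional epochs of training improved the model slightly, giving a prediction accuracy of 99.37% on the test set.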