Building a CNN with TensorFlow to Classify MNIST

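The network we build in this article has the following structure.

Its input is a 28*28 grayscale image. The data first passes through a convolutional layer with 5*5 kernels and 32 output channels, which yields 32 feature maps of size 24*24. A max-pooling layer then reduces these to 12*12*32. A second convolutional layer with 5*5 kernels and 64 output channels produces 8*8*64, and a second max-pooling layer reduces that to 4*4*64. Finally, a fully connected layer with 1024 units (followed by dropout) and a 10-unit output layer produce the final result.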
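The sizes quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the helper `conv_output_size` is a hypothetical name, and it assumes the settings used later in build_cnn (stride-1 convolutions with 'VALID' padding, 2*2 max pooling with stride 2 and 'SAME' padding).

```python
def conv_output_size(size, kernel, stride=1, padding="VALID"):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return (size - kernel) // stride + 1
    if padding == "SAME":
        # Zero padding: the output size depends only on the stride (ceil division).
        return -(-size // stride)
    raise ValueError("unknown padding mode: %r" % padding)


conv1 = conv_output_size(28, kernel=5)                                # 24
pool1 = conv_output_size(conv1, kernel=2, stride=2, padding="SAME")   # 12
conv2 = conv_output_size(pool1, kernel=5)                             # 8
pool2 = conv_output_size(conv2, kernel=2, stride=2, padding="SAME")   # 4
print(conv1, pool1, conv2, pool2)  # 24 12 8 4
```

With 64 channels after the second pooling step, the fully connected layer therefore receives 4 * 4 * 64 = 1024 input values per image.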

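1. Downloading and preparing the data

In this part we prepare the dataset. We use MNIST, the most common handwritten-digit image set. The load_mnist function reads the MNIST files, and we then print the sizes of the training, validation, and test sets.

The code is as follows: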

import os
import struct

import numpy as np


def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte' % kind)

    # the label file starts with an 8-byte header (magic number, item count)
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
        labels = np.fromfile(lbpath,
                             dtype=np.uint8)

    # the image file starts with a 16-byte header
    # (magic number, image count, rows, columns)
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII",
                                               imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)

    return images, labels


X_data, y_data = load_mnist('./', kind='train')

print('Rows: %d, Columns: %d' % (X_data.shape[0], X_data.shape[1]))
X_test, y_test = load_mnist('./', kind='t10k')

print('Rows: %d, Columns: %d' % (X_test.shape[0], X_test.shape[1]))

X_train, y_train = X_data[:50000,:], y_data[:50000]
X_valid, y_valid = X_data[50000:,:], y_data[50000:]


print('Training: ', X_train.shape, y_train.shape)

print('Validation: ', X_valid.shape, y_valid.shape)

print('Test Set: ', X_test.shape, y_test.shape)

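With the standard MNIST files, the output reports 60,000 rows and 784 columns for the training file and 10,000 rows and 784 columns for the test file, which we split into 50,000 training, 10,000 validation, and 10,000 test samples.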

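2. Preparation before building the model

With the data loaded, we need to draw fixed-size batches from it, so we define a batch generator function. Each step yields a tuple of image data and the matching labels.

The code is as follows: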

def batch_generator(X, y, batch_size=64,
                    shuffle=False, random_seed=None):

    idx = np.arange(y.shape[0])

    if shuffle:
        rng = np.random.RandomState(random_seed)
        rng.shuffle(idx)
        X = X[idx]
        y = y[idx]

    for i in range(0, X.shape[0], batch_size):
        yield (X[i:i+batch_size, :], y[i:i+batch_size])
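To make the batching concrete, the following sketch lists the index ranges that the loop in batch_generator walks over. The helper `batch_slices` is a hypothetical name introduced only for this illustration; it mirrors the `range(0, X.shape[0], batch_size)` loop without touching any data.

```python
def batch_slices(n_samples, batch_size=64):
    """Return the (start, stop) index pairs covered by each batch."""
    return [(i, min(i + batch_size, n_samples))
            for i in range(0, n_samples, batch_size)]


slices = batch_slices(50000, batch_size=64)
print(len(slices))   # 782 batches per epoch
print(slices[-1])    # (49984, 50000): the last batch holds only 16 samples
```

Because 50,000 is not a multiple of 64, the final batch is smaller than the others, which is why the placeholders in the model leave the batch dimension unspecified.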

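Next, to get better behavior and faster convergence, we normalize the data. We subtract the per-feature mean of the training set and divide by the standard deviation computed over all training pixels. The same training-set statistics are applied to the validation and test sets.

The code is as follows: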

mean_vals = np.mean(X_train, axis=0)
std_val = np.std(X_train)

X_train_centered = (X_train - mean_vals)/std_val
X_valid_centered = (X_valid - mean_vals)/std_val
X_test_centered = (X_test - mean_vals)/std_val
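One reason to divide by a single overall standard deviation rather than a per-feature one is that some MNIST pixels (for example at the image border) are constant across the whole dataset. The sketch below illustrates this on made-up data; the array `X` and its shape are assumptions for the example, not the real dataset.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randint(0, 256, size=(100, 4)).astype(np.float64)
X[:, 0] = 0.0                        # a pixel that is always black

mean_vals = np.mean(X, axis=0)       # one mean per feature
per_feature_std = np.std(X, axis=0)  # one standard deviation per feature
overall_std = np.std(X)              # a single scalar for all pixels

print(per_feature_std[0])            # 0.0: dividing by it would produce NaN
X_centered = (X - mean_vals) / overall_std
print(np.isfinite(X_centered).all())
```

Dividing by the scalar keeps every value finite while still bringing the features to a comparable scale.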

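3. Building the CNN with TensorFlow's low-level API

We first define wrapper functions for the convolutional layer and the fully connected layer to simplify building the network.

We start with the convolutional layer. It defines the weights and biases and initializes them. The convolution uses tf.nn.conv2d; the weights use Xavier (Glorot) initialization, which is the default of tf.get_variable when no initializer is given; the biases are initialized with tf.zeros; and ReLU serves as the activation function.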

import tensorflow as tf
import numpy as np


## wrapper functions
## (TensorFlow 1.x API: tf.placeholder, tf.variable_scope, tf.Session)


def conv_layer(input_tensor, name,
               kernel_size, n_output_channels,
               padding_mode='SAME', strides=(1, 1, 1, 1)):
    with tf.variable_scope(name):
        ## get n_input_channels:
        ##   input tensor shape:
        ##   [batch x height x width x channels_in]
        input_shape = input_tensor.get_shape().as_list()
        n_input_channels = input_shape[-1]

        weights_shape = (list(kernel_size) +
                         [n_input_channels, n_output_channels])

        ## no initializer given: tf.get_variable falls back to
        ## Glorot (Xavier) uniform initialization
        weights = tf.get_variable(name='_weights',
                                  shape=weights_shape)
        print(weights)
        biases = tf.get_variable(name='_biases',
                                 initializer=tf.zeros(
                                     shape=[n_output_channels]))
        print(biases)
        conv = tf.nn.conv2d(input=input_tensor,
                            filter=weights,
                            strides=strides,
                            padding=padding_mode)
        print(conv)
        conv = tf.nn.bias_add(conv, biases,
                              name='net_pre-activation')
        print(conv)
        conv = tf.nn.relu(conv, name='activation')
        print(conv)

        return conv
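As a rough illustration of what Xavier (Glorot) uniform initialization means for the weight shapes used here, the sketch below computes the bound of the uniform range from the fan-in and fan-out of a weight tensor. The function name `glorot_uniform_limit` is hypothetical, and the code only reproduces the commonly stated formula sqrt(6 / (fan_in + fan_out)); it does not call TensorFlow.

```python
import math


def glorot_uniform_limit(shape):
    """Bound of the uniform range [-limit, limit] for a weight tensor of `shape`."""
    receptive_field = 1
    for dim in shape[:-2]:       # kernel height and width (empty for dense layers)
        receptive_field *= dim
    fan_in = receptive_field * shape[-2]
    fan_out = receptive_field * shape[-1]
    return math.sqrt(6.0 / (fan_in + fan_out))


conv_limit = glorot_uniform_limit([5, 5, 1, 32])   # a 5x5 kernel, 1 -> 32 channels
dense_limit = glorot_uniform_limit([1024, 10])     # a dense layer, 1024 -> 10 units
print(conv_limit, dense_limit)
```

Layers with more incoming and outgoing connections get a narrower range, which keeps the scale of the activations roughly stable from layer to layer.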

We test the function with a simple input:

g = tf.Graph()

with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
    conv_layer(x, name='convtest', kernel_size=(3, 3), n_output_channels=32)

del g, x

The printed variables and tensors confirm that the function works as expected.

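Next we define the fully connected layer function, fc_layer. As in conv_layer, it creates the weights and biases with tf.get_variable and initializes them in the same way, then performs the matrix multiplication with tf.matmul. The function takes three required arguments: the input tensor, the name of the layer (used as the variable scope), and the number of output units. An optional activation_fn selects the activation function.

The code is as follows: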

def fc_layer(input_tensor, name,
             n_output_units, activation_fn=None):
    with tf.variable_scope(name):
        input_shape = input_tensor.get_shape().as_list()[1:]
        n_input_units = np.prod(input_shape)
        ## flatten inputs with more than one non-batch dimension
        if len(input_shape) > 1:
            input_tensor = tf.reshape(input_tensor,
                                      shape=(-1, n_input_units))

        weights_shape = [n_input_units, n_output_units]

        weights = tf.get_variable(name='_weights',
                                  shape=weights_shape)
        print(weights)
        biases = tf.get_variable(name='_biases',
                                 initializer=tf.zeros(
                                     shape=[n_output_units]))
        print(biases)
        layer = tf.matmul(input_tensor, weights)
        print(layer)
        layer = tf.nn.bias_add(layer, biases,
                               name='net_pre-activation')
        print(layer)
        if activation_fn is None:
            return layer

        layer = activation_fn(layer, name='activation')
        print(layer)
        return layer

We again verify the function with a simple input.

g = tf.Graph()

with g.as_default():
    x = tf.placeholder(tf.float32,
                       shape=[None, 28, 28, 1])
    fc_layer(x, name='fctest', n_output_units=32,
             activation_fn=tf.nn.relu)

del g, x

The call prints the weights, the biases, and the intermediate tensors of the layer.
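The flattening step inside fc_layer can be sketched in plain NumPy. This is only an illustration with zero-filled arrays; the batch size of 3 and the variable names are assumptions chosen for the example.

```python
import numpy as np

# e.g. the output of a pooling layer: 3 images, 4 x 4 pixels, 64 channels
batch = np.zeros((3, 4, 4, 64), dtype=np.float32)

n_input_units = int(np.prod(batch.shape[1:]))   # 4 * 4 * 64 = 1024
flat = batch.reshape(-1, n_input_units)         # one row per image

weights = np.zeros((n_input_units, 10), dtype=np.float32)
biases = np.zeros(10, dtype=np.float32)
out = flat @ weights + biases                   # same role as tf.matmul + bias_add

print(flat.shape, out.shape)   # (3, 1024) (3, 10)
```

The `-1` in the reshape lets the batch dimension stay flexible, just as the `None` dimension does in the placeholders.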

Now for the main part: building the CNN itself. We define build_cnn to handle the construction of the model.

The code is as follows:

def build_cnn():
    ## Placeholders for X and y:
    tf_x = tf.placeholder(tf.float32, shape=[None, 784],
                          name='tf_x')
    tf_y = tf.placeholder(tf.int32, shape=[None],
                          name='tf_y')

    # reshape x to a 4D tensor:
    # [batchsize, height, width, 1]
    tf_x_image = tf.reshape(tf_x, shape=[-1, 28, 28, 1],
                            name='tf_x_reshaped')
    ## One-hot encoding:
    tf_y_onehot = tf.one_hot(indices=tf_y, depth=10,
                             dtype=tf.float32,
                             name='tf_y_onehot')

    ## 1st layer: Conv_1
    print('\nBuilding 1st layer: ')
    h1 = conv_layer(tf_x_image, name='conv_1',
                    kernel_size=(5, 5),
                    padding_mode='VALID',
                    n_output_channels=32)
    ## MaxPooling
    h1_pool = tf.nn.max_pool(h1,
                             ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1],
                             padding='SAME')
    ## 2nd layer: Conv_2
    print('\nBuilding 2nd layer: ')
    h2 = conv_layer(h1_pool, name='conv_2',
                    kernel_size=(5, 5),
                    padding_mode='VALID',
                    n_output_channels=64)
    ## MaxPooling
    h2_pool = tf.nn.max_pool(h2,
                             ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1],
                             padding='SAME')

    ## 3rd layer: Fully Connected
    print('\nBuilding 3rd layer:')
    h3 = fc_layer(h2_pool, name='fc_3',
                  n_output_units=1024,
                  activation_fn=tf.nn.relu)

    ## Dropout
    keep_prob = tf.placeholder(tf.float32, name='fc_keep_prob')
    h3_drop = tf.nn.dropout(h3, keep_prob=keep_prob,
                            name='dropout_layer')

    ## 4th layer: Fully Connected (linear activation)
    print('\nBuilding 4th layer:')
    h4 = fc_layer(h3_drop, name='fc_4',
                  n_output_units=10,
                  activation_fn=None)

    ## Prediction
    predictions = {
        'probabilities': tf.nn.softmax(h4, name='probabilities'),
        'labels': tf.cast(tf.argmax(h4, axis=1), tf.int32,
                          name='labels')
    }

    ## Loss Function and Optimization
    cross_entropy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=h4, labels=tf_y_onehot),
        name='cross_entropy_loss')

    ## Optimizer:
    ## learning_rate is a module-level hyperparameter that must be
    ## defined before build_cnn() is called
    optimizer = tf.train.AdamOptimizer(learning_rate)
    optimizer = optimizer.minimize(cross_entropy_loss,
                                   name='train_op')

    ## Computing the prediction accuracy
    correct_predictions = tf.equal(
        predictions['labels'],
        tf_y, name='correct_preds')

    accuracy = tf.reduce_mean(
        tf.cast(correct_predictions, tf.float32),
        name='accuracy')

The graph built here can be visualized in TensorBoard.
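To get a feel for the size of this model, the number of trainable parameters can be counted by hand from the layer shapes in build_cnn. The helper names `conv_params` and `fc_params` below are hypothetical; each one simply counts the weights plus one bias per output channel or unit.

```python
def conv_params(kernel_h, kernel_w, n_in, n_out):
    """Weights plus biases of a convolutional layer."""
    return kernel_h * kernel_w * n_in * n_out + n_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


layers = {
    "conv_1": conv_params(5, 5, 1, 32),
    "conv_2": conv_params(5, 5, 32, 64),
    "fc_3": fc_params(4 * 4 * 64, 1024),
    "fc_4": fc_params(1024, 10),
}
for name, count in layers.items():
    print(name, count)
print("total", sum(layers.values()))   # total 1111946
```

Almost all of the parameters sit in fc_3, which is one reason dropout is applied to that layer's output.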

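Next we define four more functions: save and load to save and restore checkpoints of the trained model, train to train the model on training_set, and predict to obtain the predicted labels or class probabilities for test data.

The code is as follows: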

import os


def save(saver, sess, epoch, path='./model/'):
    if not os.path.isdir(path):
        os.makedirs(path)
    print('Saving model in %s' % path)
    saver.save(sess, os.path.join(path, 'cnn-model.ckpt'),
               global_step=epoch)


def load(saver, sess, path, epoch):
    print('Loading model from %s' % path)
    saver.restore(sess, os.path.join(
        path, 'cnn-model.ckpt-%d' % epoch))


def train(sess, training_set, validation_set=None,
          initialize=True, epochs=20, shuffle=True,
          dropout=0.5, random_seed=None):

    X_data = np.array(training_set[0])
    y_data = np.array(training_set[1])
    training_loss = []

    ## initialize variables
    if initialize:
        sess.run(tf.global_variables_initializer())

    np.random.seed(random_seed)  # seed NumPy's global random generator
    for epoch in range(1, epochs+1):
        batch_gen = batch_generator(
            X_data, y_data,
            shuffle=shuffle)
        avg_loss = 0.0
        for i, (batch_x, batch_y) in enumerate(batch_gen):
            feed = {'tf_x:0': batch_x,
                    'tf_y:0': batch_y,
                    'fc_keep_prob:0': dropout}
            loss, _ = sess.run(
                ['cross_entropy_loss:0', 'train_op'],
                feed_dict=feed)
            avg_loss += loss

        ## report the mean loss over the batches of this epoch
        training_loss.append(avg_loss / (i+1))
        print('Epoch %02d Training Avg. Loss: %7.3f' % (
            epoch, training_loss[-1]), end=' ')
        if validation_set is not None:
            ## dropout is disabled (keep_prob=1.0) for evaluation
            feed = {'tf_x:0': validation_set[0],
                    'tf_y:0': validation_set[1],
                    'fc_keep_prob:0': 1.0}
            valid_acc = sess.run('accuracy:0', feed_dict=feed)
            print(' Validation Acc: %7.3f' % valid_acc)
        else:
            print()


def predict(sess, X_test, return_proba=False):
    feed = {'tf_x:0': X_test,
            'fc_keep_prob:0': 1.0}
    if return_proba:
        return sess.run('probabilities:0', feed_dict=feed)
    else:
        return sess.run('labels:0', feed_dict=feed)
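The save and load functions have to agree on the checkpoint name: when saver.save is given global_step, it appends the step to the file prefix, and load rebuilds that same prefix by hand. The sketch below shows only this naming convention; `checkpoint_prefix` is a hypothetical helper and no TensorFlow call is made.

```python
import os


def checkpoint_prefix(path, epoch, basename="cnn-model.ckpt"):
    """Prefix that load() passes to saver.restore for a given epoch."""
    return os.path.join(path, "%s-%d" % (basename, epoch))


print(checkpoint_prefix("./model/", 20))   # ./model/cnn-model.ckpt-20
print(checkpoint_prefix("./model/", 40))   # ./model/cnn-model.ckpt-40
```

Passing an epoch that was never saved therefore points restore at a prefix that does not exist, so the epoch given to load must match the one given to save.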

Now we can create a TensorFlow graph object, set the graph-level random seed, and build the CNN model in that graph.

import tensorflow as tf

import numpy as np

## Define hyperparameters
learning_rate = 1e-4
random_seed = 123

np.random.seed(random_seed)


## create a graph
g = tf.Graph()

with g.as_default():
    tf.set_random_seed(random_seed)
    ## build the graph
    build_cnn()

    ## saver:
    saver = tf.train.Saver()

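Next we train the CNN model. We first create a TensorFlow session to launch the graph and then call the train function.

Because this is the first time the network is trained, its variables must be initialized (initialize=True).

The code is as follows: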

with tf.Session(graph=g) as sess:
    train(sess,
          training_set=(X_train_centered, y_train),
          validation_set=(X_valid_centered, y_valid),
          initialize=True,
          random_seed=123)
    save(saver, sess, epoch=20)

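For each epoch, train prints the training loss and the validation accuracy.

Once the 20 epochs are complete, the trained model is saved. To predict on the test set we delete graph g, create a new graph g2, rebuild the model in it, and restore the saved parameters.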

### Calculate prediction accuracy
### on test set
### restoring the saved model


del g

## create a new graph 
## and build the model
g2 = tf.Graph()

with g2.as_default():
    tf.set_random_seed(random_seed)
    ## build the graph
    build_cnn()

    ## saver:
    saver = tf.train.Saver()

## create a new session
## and restore the model

with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')

    preds = predict(sess, X_test_centered,
                    return_proba=False)

    print('Test Accuracy: %.3f%%' % (100*
          np.sum(preds == y_test)/len(y_test)))

This prints the accuracy of the restored model on the test set.

Next we look at the predictions for the first 10 test samples.

## run the prediction on 
## some test samples

np.set_printoptions(precision=2, suppress=True)


with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')

    print(predict(sess, X_test_centered[:10],
                  return_proba=False))

    print(predict(sess, X_test_centered[:10],
                  return_proba=True))

The first call prints the predicted label of each sample, and the second prints the class probabilities.
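The two outputs are related: the predicted label is the index of the largest logit, and softmax does not change which entry is largest. A minimal pure-Python sketch of this relationship follows; the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 3.5, 0.2]      # made-up scores for one sample with three classes
probs = softmax(logits)
label = max(range(len(probs)), key=probs.__getitem__)
print(label)                  # 1
```

This is why build_cnn can take the argmax of the raw logits h4 for the labels while applying softmax separately for the probabilities.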

Next we train for 20 more epochs. This time we set initialize=False to skip the initialization step.

## continue training for 20 more epochs
## without re-initializing :: initialize=False
## create a new session 
## and restore the model

with tf.Session(graph=g2) as sess:
    load(saver, sess,
         epoch=20, path='./model/')

    train(sess,
          training_set=(X_train_centered, y_train),
          validation_set=(X_valid_centered, y_valid),
          initialize=False,
          epochs=20,
          random_seed=123)

    save(saver, sess, epoch=40, path='./model/')

    preds = predict(sess, X_test_centered,
                    return_proba=False)

    print('Test Accuracy: %.3f%%' % (100*
          np.sum(preds == y_test)/len(y_test)))

The 20 additional epochs give a slight improvement: the model reaches a prediction accuracy of 99.37% on the test set.