Using Keras with Tensorflow as backend to train cifar10 using vgg16.py from keras

【Question title】: Using Keras with Tensorflow as backend to train cifar10 using vgg16.py from keras
【Posted】: 2017-02-15 07:41:27
【Question description】:

I used the pre-trained VGG16 model provided by Keras (vgg16.py).
In vgg16.py I changed the minimum input size from 48 to 32 and the default size from 224 to 32. The cifar10 data has shape (nb_samples, 3, 32, 32).

Here is the code:

from keras.datasets import cifar10
from keras.utils import np_utils
from keras.optimizers import SGD
nb_classes = 10
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
print ("Train shape", X_train.shape, Y_train.shape)
print ("Train samples", X_train.shape[0])
print ("Test samples", X_test.shape[0])
Y_train = np_utils.to_categorical(Y_train, nb_classes)
Y_test = np_utils.to_categorical(Y_test, nb_classes)
print ("Train shape", X_train.shape, Y_train.shape)

from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Model
import numpy as np

base_model = VGG16(weights=None, include_top=True, input_shape=X_train.shape[1:], classes=10)
base_model.compile(optimizer=SGD(lr=0.005, decay=1e-6, momentum=0.9, nesterov=True), loss='categorical_crossentropy', metrics=['accuracy'])   
base_model.fit(X_train, Y_train, nb_epoch=10, batch_size=256, verbose=1)
base_model.evaluate(X_test, Y_test, batch_size=256, verbose=1)
#The commented code gives validation accuracy but above code does not.
#base_model.fit(X_train, Y_train,batch_size=256,nb_epoch=10,validation_data=(X_test, Y_test),shuffle=True)

The code above runs, but the weights are randomly initialized. The results are:

Using TensorFlow backend.
('http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', 'cifar-10-batches-py')
('Train shape', (50000, 32, 32, 3), (50000, 1))
('Train samples', 50000)
('Test samples', 10000)
('Train shape', (50000, 32, 32, 3), (50000, 10))
Train on 10000 samples, validate on 5000 samples
Epoch 1/10
50000/50000 [==============================] - 2641s - loss: 2.3138 - acc: 0.111
Epoch 2/10
50000/50000 [==============================] - 2643s - loss: 2.3027 - acc: 0.0974
Epoch 3/10
50000/50000 [==============================] - 2642 - loss: 2.3027 - acc: 0.0987
Epoch 4/10
50000/50000 [==============================] - 2643s - loss: 2.3027 - acc: 0.0986
Epoch 5/10
50000/50000 [==============================] - 2728s - loss: 2.3027 - acc: 0.0966 
Epoch 6/10
50000/50000 [==============================] - 2736s - loss: 2.3027 - acc: 0.0983
Epoch 7/10
50000/50000 [==============================] - 2681s - loss: 2.3027 - acc: 0.0971
Epoch 8/10
50000/50000 [==============================] - 2707s - loss: 2.3027 - acc: 0.0970
Epoch 9/10
50000/50000 [==============================] - 2609s - loss: 2.3027 - acc: 0.0955
Epoch 10/10
50000/50000 [==============================] - 2649s - loss: 2.3027 - acc: 0.0997

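The training seems to be saturated at loss = 2.3027.
The cifar10_cnn.py example from Keras uses real-time data augmentation and takes 351s per epoch instead of the more than 2000s above. Is there any reason for this, and why does its accuracy rise to 80% in later epochs while in the case above it stays constant at about 9%?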
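
As a reading aid for the log above: a loss of 2.3027 is approximately ln(10) ≈ 2.3026, which is the categorical cross-entropy a 10-class model produces when it assigns roughly equal probability to every class. That matches the near-chance accuracy of about 10%. A minimal check, with illustrative variable names that are not part of the original code:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction over num_classes classes:
# the true class always receives probability 1 / num_classes.
uniform_loss = -math.log(1.0 / num_classes)

# Accuracy expected from guessing on a class-balanced dataset.
chance_accuracy = 1.0 / num_classes

print(round(uniform_loss, 4), chance_accuracy)
```

A loss stuck at this value suggests the network is outputting an almost constant distribution rather than learning slowly.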

【Question discussion】:

Tags: python tensorflow keras

【Solution 1】:

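After trying different configurations, I found that the VGG16 architecture is too big for images of size 32x32. I tried using VGG16 up to block3_pool, then added a fully connected Dense layer with 512 units followed by a softmax classifier for 10 classes. Here is the modified code: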

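from keras.layers import Flatten, Dense

# Build VGG16 without the top classifier; weights stay randomly initialized.
base_model = VGG16(weights=None, include_top=False,
             input_shape=X_train.shape[1:], classes=10)
# Truncate the network at the end of the third convolutional block.
x = base_model.get_layer('block3_pool').output
x = Flatten(name='Flatten')(x)
x = Dense(512, activation='relu', name='fc1')(x)
predictions = Dense(nb_classes, activation='softmax')(x)
model = Model(input=base_model.input, output=predictions)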
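
To make the "too big for 32x32" point concrete, here is a small arithmetic sketch of the feature-map size after each VGG16 block. It assumes the standard VGG16 layout ('same'-padded 3x3 convolutions that preserve spatial size, one 2x2 stride-2 max pool at the end of each of the five blocks, and 64/128/256/512/512 filters); the helper name `vgg16_block_shapes` is hypothetical and not part of Keras.

```python
# Filters per block in the standard VGG16 layout.
VGG16_BLOCK_FILTERS = [64, 128, 256, 512, 512]

def vgg16_block_shapes(input_size):
    """Return [(pool_layer_name, side, channels), ...] for a square input."""
    shapes = []
    side = input_size
    for index, filters in enumerate(VGG16_BLOCK_FILTERS, start=1):
        side //= 2  # each pooling layer halves height and width
        shapes.append(("block%d_pool" % index, side, filters))
    return shapes

for name, side, channels in vgg16_block_shapes(32):
    print(name, (side, side, channels), "flattened:", side * side * channels)
```

Under these assumptions a 32x32 input shrinks to 4x4x256 at block3_pool and to a single 1x1x512 position at block5_pool, so the last two blocks have almost no spatial information left to work with.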

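The shortcomings I found in the model from the question, and what I changed:
(i) Reduce the batch size from 256 to 32.
(ii) Normalize the data using the mean.
(iii) Data augmentation should be done with the data_gen Keras function.
(iv) Use a smaller architecture: VGG16 (16 layers) truncated at block3_pool (the first 3 convolutional blocks).
(v) With the architecture above, the time per epoch dropped from 2000s to 900s.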
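
Point (ii) is not shown in code in this answer. A minimal sketch of one common way to do it follows, using NumPy only. The scaling to [0, 1], the per-channel (rather than per-pixel) mean, and the function name `normalize_with_train_mean` are assumptions for illustration, not details taken from the answer.

```python
import numpy as np

def normalize_with_train_mean(x_train, x_test):
    """Scale pixels to [0, 1] and subtract the per-channel training mean."""
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    # Mean over samples, height and width: one value per colour channel.
    channel_mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    # The test set is shifted by the training mean, not its own mean.
    return x_train - channel_mean, x_test - channel_mean, channel_mean

# Random stand-in data with cifar10's channels-last layout (N, 32, 32, 3).
rng = np.random.RandomState(0)
fake_train = rng.randint(0, 256, size=(8, 32, 32, 3)).astype("uint8")
fake_test = rng.randint(0, 256, size=(4, 32, 32, 3)).astype("uint8")

train_norm, test_norm, mean = normalize_with_train_mean(fake_train, fake_test)
print(train_norm.shape, mean.shape)
```

With the real data, `X_train` and `X_test` from `cifar10.load_data()` would take the place of the random arrays before calling `fit`.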

The accuracy achieved is about 75%. Is it possible to further reduce the time on CPU?
After evaluating the model on the 10000 test examples, I got the following result:
['loss', 'acc'] : [2.3033507381439211, 0.10000000000000001].
This is overfitting.

【Discussion】:

The minimum size is 48, not 32. See github.com/fchollet/keras/blob/master/keras/applications/…

【Solution 2】:

I found this GitHub project (github.com/mjiansun/cifar10-vgg16). They use vgg16 on cifar10 with Keras, and their validation accuracy is above 0.97.

More precisely:

They use vgg16 up to the layer 'block3_pool'
They add some fully connected layers

Code for the fully connected layers:

x = Flatten()(last)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
pred = Dense(10, activation='sigmoid')(x)

【Discussion】:

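This link is no longer available. I tried to build a model up to the 'block3_pool' layer, but I get 10% accuracy on the validation set. Can you help me with this? I used padding='SAME' for all layers.
The link has changed: github.com/mjiansun/cifar10-vgg16. I have updated it in my post.
Thanks Oliver, I see there are also some dropout layers in the model. Do you know whether it is possible to get more than 80% accuracy on cifar10 using "only" convolutional and fully connected layers, without enlarging the input size?
Do you know of any paper on this implementation?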