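【Posted】: 2019-04-26 21:07:32
【Problem description】:
This problem has been discussed before, but those discussions generally converged on vanishing gradients as the root cause.
My model, however, has only two hidden layers, so it is unlikely to be stuck on vanishing gradients. The code is shown below: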
from __future__ import print_function
import numpy as np  # needed for np.mean / np.std below
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
batch_size = 128
num_classes = 10
epochs = 20
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Dense(512, activation='relu', kernel_initializer='random_uniform',input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, kernel_initializer='random_uniform',activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, kernel_initializer='random_uniform',activation='softmax'))
weights = model.get_weights()
print(len(weights))
for i in range(6):
    w = weights[i]
    print(str(i), "th layer shape: ", w.shape, len(w), "mean: ", np.mean(w), "std dev: ", np.std(w))
    print("Before Training")
    print(w[0])  # first row of a kernel, or first entry of a bias vector
class LossHistory(keras.callbacks.Callback):
    """Records the training loss after every batch."""
    def on_train_begin(self, logs=None):
        self.losses = []
    def on_batch_end(self, batch, logs=None):
        self.losses.append((logs or {}).get('loss'))
batch_history = LossHistory()
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[batch_history])
weights = model.get_weights()
for i in range(6):
    w = weights[i]
    print(str(i), "th layer shape: ", w.shape, len(w), "mean: ", np.mean(w), "std dev: ", np.std(w))
    print("After Training")
    print(w[0])  # first row of a kernel, or first entry of a bias vector
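I took screenshots of the weights before and after training. In short, the first layer's weights did not change after training, but the second layer's weights did. (Because there are many parameters, I only show part of the first row of each weight matrix.)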
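One thing worth checking, offered here as a possible explanation rather than something the thread confirms: the code prints only row 0 of each kernel, and row 0 of the first layer's kernel is multiplied by input feature 0. If that feature is zero in every training sample (MNIST corner pixels are typically blank, which can be verified with `x_train[:, 0].max()`), the gradient for that row is exactly zero and the row never moves, even while the rest of the matrix trains normally. The sketch below uses made-up toy data and NumPy only; `x` and `delta` are hypothetical stand-ins for a batch of inputs and the upstream gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy batch: 32 samples, 6 input features.
# Feature 0 is forced to zero for every sample (like an always-black pixel).
x = rng.random((32, 6)).astype("float32")
x[:, 0] = 0.0

# Hypothetical upstream gradient w.r.t. the layer's pre-activations
# (32 samples, 4 units).
delta = rng.standard_normal((32, 4)).astype("float32")

# For a dense layer z = x @ W + b, the gradient w.r.t. W is x.T @ delta.
grad_W = x.T @ delta

# Row 0 of the gradient is all zeros, so row 0 of W receives no update.
row0_is_zero = bool(np.all(grad_W[0] == 0.0))
# The rows fed by non-zero features do receive a gradient.
other_rows_nonzero = bool(np.all(np.abs(grad_W[1:]).sum(axis=1) > 0))

print(row0_is_zero, other_rows_nonzero)
```

If this is what is happening, printing a row in the middle of the first kernel (for example one fed by a central pixel) should show values that differ before and after training.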
【Discussion】:
-
Without seeing the rest of the code, it is hard to tell.
-
@StephaneBersier I just updated the question with the rest of the code. I hope this makes it clearer.
-
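Some code is still missing, but I now notice that you use 'random_uniform' to initialize the weights, which is usually not a good initializer. Have you tried He normal initialization?
-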
Actually, in your case it should not matter. Never mind.
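To put rough numbers on the two initializers mentioned above, here is a minimal NumPy sketch for the first layer's shape (784 inputs, 512 units). It assumes Keras' default range of [-0.05, 0.05] for 'random_uniform' and uses a plain normal distribution as a simplification of He normal, which Keras draws from a truncated normal. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 512

# Keras' 'random_uniform' defaults to the range [-0.05, 0.05].
w_uniform = rng.uniform(-0.05, 0.05, size=(fan_in, fan_out))

# He normal targets a standard deviation of sqrt(2 / fan_in).
# Simplification: a plain (untruncated) normal is sampled here.
he_std = np.sqrt(2.0 / fan_in)
w_he = rng.normal(0.0, he_std, size=(fan_in, fan_out))

print("random_uniform std:", w_uniform.std())
print("He normal std:     ", w_he.std())
```

For this layer size the two scales are of the same order (about 0.03 versus about 0.05), which is consistent with the commenter's follow-up that the choice should not matter much here.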
-
Can you post the code that prints the weights before and after training? Also, I would like to know the value of EPOCHS.
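As a sketch of a more complete before/after comparison than printing a single row, the helper below reports the largest absolute change in each weight array. `weight_changes` is a hypothetical name, and the arrays are stand-ins for two snapshots of the model's weights.

```python
import numpy as np

def weight_changes(before, after):
    """Return the max absolute change for each pair of weight arrays."""
    return [float(np.max(np.abs(a - b))) for b, a in zip(before, after)]

# Stand-in snapshots: one 3x2 kernel and one bias vector of length 2.
before = [np.zeros((3, 2)), np.zeros(2)]
after = [before[0].copy(), before[1] + 0.5]
after[0][2, 1] = 0.25  # the kernel changes, but not in row 0

changes = weight_changes(before, after)
print(changes)
```

With the model from the question, the snapshots could be taken as `before = [w.copy() for w in model.get_weights()]` ahead of `model.fit(...)` and `after = model.get_weights()` afterwards. A non-zero value for index 0 would mean the first kernel did train, even if its first row looks unchanged.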
Tags: tensorflow keras training-data