【Question title】: Tensorflow and Keras show slightly different results even though I build exactly the same models using the same layer modules
【Posted】: 2018-10-18 15:09:41
【Question】:

I use both Tensorflow and Keras, and I found that they show different results. There are already similar questions, but my case is a little different.

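In my case, the loss and accuracy differ only slightly, and I use exactly the same "tf.keras.layers" modules. I think the only differences are the Adam optimizer and the way the model is trained.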

tf.train.AdamOptimizer vs. tf.keras.optimizers.Adam
tf.keras.models.Model.fit vs. sess.run(_train_op)
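
One difference between the two training paths that may be worth checking: as far as I know, the training loss and accuracy that Keras fit reports for an epoch are running averages over the batches seen while the weights are still changing, whereas the Tensorflow script here re-evaluates the training set after the epoch with the final weights. The toy sketch below uses plain Python with a hypothetical one-parameter model and made-up data, not the models from this post. It only illustrates how those two numbers can differ even when the training steps are identical.

```python
def run_epoch(w, batches, lr=0.1):
    """One epoch of SGD on the loss (w - x)^2.

    Returns the updated weight and the running mean of the batch losses,
    each measured BEFORE the corresponding update (fit-style reporting).
    """
    seen = []
    for x in batches:
        seen.append((w - x) ** 2)
        w -= lr * 2 * (w - x)  # gradient step on (w - x)^2
    return w, sum(seen) / len(seen)


def evaluate(w, batches):
    """Mean loss over all batches with the weights held fixed."""
    return sum((w - x) ** 2 for x in batches) / len(batches)


data = [1.0] * 20                            # hypothetical toy "batches"
w_end, running_loss = run_epoch(5.0, data)   # number reported during training
post_epoch_loss = evaluate(w_end, data)      # number re-computed after the epoch

print(running_loss, post_epoch_loss)
```

While the loss is still falling, the running average includes the poor early batches, so it comes out higher than the value re-computed after the epoch. Validation metrics are not affected by this, since both approaches compute them after the epoch.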

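I checked that the default values of the two Adam optimizers are the same. I don't think the difference is caused by randomness, because I get similar results when I run the Keras model several times.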
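
Whether the defaults really match may depend on the installed version. In the TF 1.x releases I am aware of, tf.train.AdamOptimizer defaults to epsilon=1e-08, while tf.keras.optimizers.Adam falls back to the Keras backend epsilon (1e-07) when epsilon is left unset. Please verify this against your own version. The sketch below is a scalar re-implementation of the Adam update in plain Python, not the library code, with hypothetical tiny gradients. It shows that the epsilon value alone can change the resulting weights.

```python
import math


def adam_steps(grads, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, w=0.0):
    """Apply scalar Adam updates for a sequence of gradients; return the final weight."""
    m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # bias correction
        w -= lr_t * m / (math.sqrt(v) + epsilon)
    return w


grads = [1e-6, -2e-6, 1.5e-6, 1e-6]        # hypothetical tiny gradients
w_eps_1e8 = adam_steps(grads, epsilon=1e-8)  # assumed tf.train.AdamOptimizer default
w_eps_1e7 = adam_steps(grads, epsilon=1e-7)  # assumed Keras backend epsilon

print(w_eps_1e8, w_eps_1e7)
```

The effect is largest when gradients are small relative to epsilon. Passing an explicit epsilon to both optimizers removes this source of difference.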

Here is my code.

Keras model

# Build model and train
X = tf.keras.layers.Input(shape=(sentence_size,), name='X')

embedded_X = tf.keras.layers.Embedding(voca_size,
                                       embedding_dim,
                                       weights = [embedding_matrix],
                                       input_length = sentence_size,
                                       trainable=True)(X)

hidden_states = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256, return_sequences=True))(embedded_X)
l_pool = tf.keras.layers.GlobalMaxPooling1D()(hidden_states)
preds = tf.keras.layers.Dense(1, activation = 'sigmoid')(l_pool)

model = tf.keras.models.Model(X, preds)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit( tokenized_train, y_train, shuffle=False, epochs=3, batch_size=32, validation_data= (tokenized_val, y_val))

Tensorflow model

# Build model  
X = tf.placeholder(tf.float32, [None, sentence_size])

embedded_X = tf.keras.layers.Embedding(voca_size,
                                       embedding_dim,
                                       weights = [embedding_matrix],
                                       input_length = sentence_size,
                                       trainable=True)(X)

hidden_states = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256, return_sequences=True))(embedded_X)
l_pool = tf.keras.layers.GlobalMaxPooling1D()(hidden_states)
_preds = tf.keras.layers.Dense(1, activation = 'sigmoid')(l_pool)

labels = tf.placeholder(tf.float32, [None, 1])
_loss = tf.reduce_mean( tf.keras.losses.binary_crossentropy(labels, _preds) )
_acc = tf.reduce_mean( tf.cast(tf.equal(labels, tf.round(_preds)), tf.float32) )
_train_op = tf.train.AdamOptimizer().minimize(_loss)

# Hyper parameters and loss_acc print function
from math import ceil

epochs = 3
batch_size = 32
steps_per_epoch = ceil( len(tokenized_train) / batch_size)

def loss_acc(sess, _loss, _preds, inputs, targets):
    batch_size = len(inputs)//100
    steps_per_epoch = ceil( len(inputs) / batch_size )

    data = tf.data.Dataset.from_tensor_slices((inputs, targets)).batch(batch_size).make_one_shot_iterator()
    next_batch = data.get_next()

    acc = 0
    loss = 0

    for batch in range(steps_per_epoch):
        x, y = sess.run(next_batch)
        l, a = sess.run([_loss, _acc], feed_dict={X:x, labels:y})

        acc += a/100
        loss += l/100

    return loss, acc

# Train model
data = tf.data.Dataset.from_tensor_slices((tokenized_train, y_train)).batch(batch_size).repeat().make_one_shot_iterator()
next_batch = data.get_next()
sess.run(tf.global_variables_initializer())

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        x, y = sess.run(next_batch)
        batch_loss, batch_acc, _ = sess.run([_loss, _acc, _train_op], feed_dict={X:x, labels:y})
        if step%125 == 0:
            print('\nBatch: %d' %step)
            print(batch_loss, batch_acc)

    train_loss, train_acc = loss_acc(sess, _loss, _preds, tokenized_train, y_train)
    val_loss, val_acc = loss_acc(sess, _loss, _preds, tokenized_val, y_val)
    print("\nTrain loss: %.4f" %train_loss)
    print("Train acc: %.4f" %train_acc)
    print("Val loss: %.4f" %val_loss)
    print("Val acc: %.4f" %val_acc)
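
A note on the loss_acc helper above: it sets the batch size to len(inputs)//100 and then divides every batch result by a fixed 100. If the dataset size is not a multiple of 100, the loop runs more than 100 steps, so the reported value is inflated. The plain-Python sketch below, with hypothetical helper names and made-up per-example losses, compares that pattern with a sample-weighted mean.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def fixed_divisor_mean(per_example, n_chunks=100):
    """Mimic the loss_acc pattern: sum of batch means divided by a fixed 100."""
    batch_size = len(per_example) // n_chunks
    return sum(sum(b) / len(b) for b in batches(per_example, batch_size)) / n_chunks


def weighted_mean(per_example, n_chunks=100):
    """Weight each batch mean by its size, then divide by the total sample count."""
    batch_size = len(per_example) // n_chunks
    total, count = 0.0, 0
    for b in batches(per_example, batch_size):
        total += (sum(b) / len(b)) * len(b)
        count += len(b)
    return total / count


losses = [0.5] * 250   # hypothetical per-example losses; 250 is not a multiple of 100
print(fixed_divisor_mean(losses), weighted_mean(losses))
```

With 250 examples the batch size is 2, so there are 125 batches, and dividing by 100 overstates the mean by a factor of 1.25. When the dataset size is an exact multiple of 100, both functions agree.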

Results

[Image: Keras result]
[Image: Tensorflow result]

Thank you.

Tags: python tensorflow keras

【Solution 1】:

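As long as the two models do not start from exactly the same initial weights, you cannot rule out randomness as the cause. Your results do not differ by much, and as the loss values show, the starting points are quite different. Also, 3 epochs is not many; try training for more epochs and then compare the results. If your model overfits, add some regularization.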
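
To illustrate the point about initial weights, here is a minimal sketch in plain Python. It uses a hypothetical one-feature logistic regression on toy data, not the model from the question. Runs that share a seed start from the same weights and produce identical losses, while a different seed gives a different starting loss and a different trajectory over 3 epochs. In TF 1.x, the analogous step would be fixing the seed (for example with tf.set_random_seed) or loading the same saved initial weights into both models.

```python
import math
import random


def train(seed, epochs=3, lr=0.1):
    """Train a tiny logistic regression from a seeded random start; return per-epoch losses."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)        # seeded initial weights
    data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]    # hypothetical toy data
    losses = []
    for _ in range(epochs):
        total = grad_w = grad_b = 0.0
        for x, y in data:
            p = 1 / (1 + math.exp(-(w * x + b)))         # sigmoid prediction
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / len(data)                     # full-batch gradient step
        b -= lr * grad_b / len(data)
        losses.append(total / len(data))
    return losses


same_a = train(seed=0)
same_b = train(seed=0)
other = train(seed=1)

print(same_a)
print(other)
```

Comparing same_a with other after only 3 epochs shows a gap that comes purely from the starting point, which is why the two frameworks should be compared from identical initial weights.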
