Restoring a saved TensorFlow model to evaluate on a test set

【Question title】: Restoring saved TensorFlow model to evaluate on test set
【Posted】: 2016-12-22 16:06:50
【Question】:

I've seen some posts on restoring TF models, as well as the Google documentation page on exporting graphs, but I think I'm missing something.

I use the code in this Gist to save the model, together with this utils file, which defines the model.

Now I'd like to restore it and run it on previously unseen test data, as follows:

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    total_loss = 0
    sess = tf.get_default_session()
    acc_steps = len(X_data) // BATCH_SIZE
    for i in range(acc_steps):
        batch_x, batch_y = next_batch(X_val, Y_val, BATCH_SIZE)

        loss, accuracy = sess.run([loss_value, acc], feed_dict={
                images_placeholder: batch_x,
                labels_placeholder: batch_y,
                keep_prob: 0.5
                })
        total_accuracy += (accuracy * len(batch_x))
        total_loss += (loss * len(batch_x))
    return (total_accuracy / num_examples, total_loss / num_examples)

## re-execute the code that defines the model

# Image Tensor
images_placeholder = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x')

gray = tf.image.rgb_to_grayscale(images_placeholder, name='gray')

gray /= 255.

# Label Tensor
labels_placeholder = tf.placeholder(tf.float32, shape=(None, 43), name='y')

# dropout Tensor
keep_prob = tf.placeholder(tf.float32, name='drop')

# construct model
logits = inference(gray, keep_prob)

# calculate loss
loss_value = loss(logits, labels_placeholder)

# training
train_op = training(loss_value, 0.001)

# accuracy
acc = accuracy(logits, labels_placeholder)

with tf.Session() as sess:
    loader = tf.train.import_meta_graph('gtsd.meta')
    loader.restore(sess, tf.train.latest_checkpoint('./'))
    sess.run(tf.initialize_all_variables())   
    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy[0]))

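The test accuracy I get is only 3%. However, if I don't close the notebook and run the test code immediately after training the model, I get 95% accuracy.

This leads me to believe that I'm not loading the model correctly?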

【Question comments】:

Tags: tensorflow conv-neural-network

【Solution 1】:

The problem lies in these two lines:

loader.restore(sess, tf.train.latest_checkpoint('./'))
sess.run(tf.initialize_all_variables())

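The first line loads the saved model from the checkpoint. The second line re-initializes all of the variables in the model (such as the weight matrices, convolutional filters, and bias vectors), usually to random numbers, and overwrites the loaded values.

The solution is simple: delete the second line (sess.run(tf.initialize_all_variables())), and the evaluation will proceed with the trained values loaded from the checkpoint.

PS. There is a good chance that this change will give you an error about "uninitialized variables". In that case, you should run sess.run(tf.initialize_all_variables()) before loader.restore(sess, tf.train.latest_checkpoint('./')), so that any variables not saved in the checkpoint are initialized first.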
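To make the ordering concrete, here is a minimal plain-Python sketch of the effect described above. It is not TensorFlow code: `ToySession`, `initialize_all`, `restore`, the variable names, and the checkpoint values are all hypothetical stand-ins, and the only assumption is that restoring overwrites just the variables present in the checkpoint, while initializing overwrites every variable.

```python
import random


class ToySession:
    """Hypothetical stand-in for a session that holds named variables."""

    def __init__(self, names):
        # None marks a variable that has not been given a value yet.
        self.values = {name: None for name in names}

    def initialize_all(self, rng):
        # Mimics an "initialize all variables" op: every variable gets a fresh random value.
        for name in self.values:
            self.values[name] = rng.random()

    def restore(self, checkpoint):
        # Mimics restoring a checkpoint: only variables stored in it are overwritten.
        for name, value in checkpoint.items():
            if name in self.values:
                self.values[name] = value


# Pretend trained values; "global_step" is deliberately missing from the checkpoint.
checkpoint = {"conv1/weights": 0.25, "fc/bias": -1.5}
names = ["conv1/weights", "fc/bias", "global_step"]

# Order from the question: restore, then initialize. The trained values are lost.
wrong = ToySession(names)
wrong.restore(checkpoint)
wrong.initialize_all(random.Random(0))

# Order from the PS: initialize, then restore. The trained values survive,
# and variables absent from the checkpoint are still initialized.
right = ToySession(names)
right.initialize_all(random.Random(0))
right.restore(checkpoint)

print("restore then init:", wrong.values)
print("init then restore:", right.values)
```

In the second ordering the checkpointed variables keep their saved values, and the one variable that was never saved still ends up initialized, which is the situation the PS is addressing.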

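【Discussion】:

Thanks @mrry, I'll try this now.
As you predicted, TF threw an error about uninitialized variables. When I moved the initialization up as you suggested, I still get only 2% accuracy, so it looks like it is starting from scratch.
Oh, I've noticed another problem! tf.train.import_meta_graph() will load a second copy of the model structure into the current graph. If the code before you create the tf.Session builds a copy of the graph (including all of the weights), those weights will remain uninitialized, and only the weights in the second copy will be restored. There are two ways to handle this: (1) don't use tf.train.import_meta_graph(), and instead create a tf.train.Saver directly and use it to restore the checkpoint into the initial copy of the graph; or...
(2) avoid building the evaluation graph before using tf.train.import_meta_graph(), and instead use introspection methods such as tf.get_default_graph().get_operation_by_name() to find the loss, accuracy, and placeholder tensors in the original graph. Both approaches will probably require some restructuring (basically, you have to make sure that the variable names in the graph and in the checkpoint are the same), but I expect option (1) to involve less work.
@mrry What about dropout? How should it be reset to 1.0 at evaluation time? Would declaring a new tf.placeholder() be simple and effective, or should the placeholder from training be restored?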
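On the dropout question, the following plain-Python sketch shows why the value fed for keep_prob matters at evaluation time. It assumes inverted dropout, where kept units are scaled by 1/keep_prob (the scaling tf.nn.dropout applies); the `dropout` helper and the sample activations are hypothetical and only illustrate the arithmetic.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout on a list of floats.

    Each unit is kept with probability keep_prob and scaled by 1 / keep_prob;
    dropped units become 0.0.
    """
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)
        else:
            out.append(0.0)
    return out


acts = [0.5, -1.0, 2.0, 3.0]

# keep_prob = 1.0: every unit is kept and the scale factor is 1, so the layer is the identity.
eval_out = dropout(acts, 1.0, random.Random(0))

# keep_prob = 0.5: units are randomly zeroed and the survivors are doubled.
train_out = dropout(acts, 0.5, random.Random(0))

print("keep_prob=1.0:", eval_out)
print("keep_prob=0.5:", train_out)
```

With keep_prob = 0.5 the network's activations are perturbed on every call, so an evaluation that feeds 0.5 measures a noisier model than the one that was trained. Feeding 1.0 into the same keep_prob placeholder in the feed_dict would presumably be enough; a separate placeholder should not be needed.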

【Solution 2】:

I had a similar problem, and this worked for me:

with tf.Session() as sess:
    saver=tf.train.Saver(tf.all_variables())
    saver=tf.train.import_meta_graph('model.meta')
    saver.restore(sess,"model")

    test_accuracy = evaluate(X_test, y_test)

【Discussion】:

【Solution 3】:

The answer found here ended up working, as follows:

save_path = saver.save(sess, '/home/ubuntu/gtsd-12-23-16.chkpt')
print("Model saved in file: %s" % save_path)
## later re-run code that creates the model
# Image Tensor
images_placeholder = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x')

gray = tf.image.rgb_to_grayscale(images_placeholder, name='gray')

gray /= 255.

# Label Tensor
labels_placeholder = tf.placeholder(tf.float32, shape=(None, 43), name='y')

# dropout Tensor
keep_prob = tf.placeholder(tf.float32, name='drop')

# construct model
logits = inference(gray, keep_prob)

# calculate loss
loss_value = loss(logits, labels_placeholder)

# training
train_op = training(loss_value, 0.001)

# accuracy
acc = accuracy(logits, labels_placeholder)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, '/home/ubuntu/gtsd-12-23-16.chkpt')
    print("Model restored.")
    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy[0]*100))

【Discussion】:

Didn't you mean to set keep_prob = 1?