【Question title】: What's wrong with my TensorFlow code?
【Posted】: 2017-12-13 06:26:35
【Question description】:

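I have just started learning TensorFlow and want to build a DNN for MNIST. The tutorial has a very simple neural network with 784 input nodes, 10 output nodes, and no hidden nodes. I tried to modify that code to build a DNN. Here is my code. As far as I can tell, all I did was add a hidden layer with 500 nodes between the input and output layers, but the test accuracy is only 10%, which means the network is not training. Do you know what is wrong with my code?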

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import os
os.chdir('../')

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.zeros([784,500]))
B_h1=tf.Variable(tf.zeros([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.zeros([5,5]))
B_h2=tf.Variable(tf.zeros([5]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.zeros([10]))
W_o=tf.Variable(tf.zeros([500,10]))
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

number_steps = 10000
batch_size = 100
for _ in range(number_steps):
  batch_xs, batch_ys = mnist.train.next_batch(batch_size)
  train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

  # Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

OK, following @lejlot's suggestion, I changed the code as follows.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import os
os.chdir('../')

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.random_normal([784,500]))
B_h1=tf.Variable(tf.random_normal([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.random_normal([500,500]))
B_h2=tf.Variable(tf.random_normal([500]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.random_normal([10]))
W_o=tf.Variable(tf.random_normal([500,10]))
y= tf.matmul(h1,W_o)+B_o # notice no activation

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                  reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

number_steps = 10000
batch_size = 100
for i in range(number_steps):
  batch_xs, batch_ys = mnist.train.next_batch(batch_size)
  train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
  if i % 1000==0:
    acc=sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    print('Current loop %d, Accuracy: %g'%(i,acc))



  # Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

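There are two changes:

Initialize W_h1 and B_h1 with tf.random_normal
Change the definitions of y and cross_entropy

The modification does work. But I still don't know what was wrong with my original code. I called tf.global_variables_initializer().run(), and I thought this function would randomize the values of W_h1 and B_h1. Also, if I define y and cross_entropy as follows, it does not work.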

y= tf.nn.softmax(tf.matmul(h1,W_o)+B_o) 
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),reduction_indices=[1]))

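【Question comments】:

What if you lower the learning rate to 0.002 and increase the number of steps? 0.05 is a very high learning rate.

Tags: python tensorflow deep-learning

【Solution 1】:

First of all, this is not a valid classifier model.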

y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

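You are using the explicit cross-entropy equation, which requires y to be a (row-wise) probability distribution, but you produce y by applying relu, which means you are just outputting some non-negative numbers. In fact, if you ever output a zero, your code will produce NaN and fail (because the log of 0 is minus infinity).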
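To make the point above concrete, here is a minimal pure-Python sketch (not TensorFlow code). The helper names `relu`, `softmax`, `ieee_log`, and `cross_entropy` are illustrative stand-ins, and `ieee_log` is an assumption that mimics the behaviour described above, where the log of 0 is minus infinity, because Python's `math.log(0.0)` raises an exception instead.

```python
import math

def relu(v):
    # Element-wise max(0, a): non-negative, but not a probability distribution.
    return [max(0.0, a) for a in v]

def softmax(v):
    # Shift by the maximum before exponentiating to avoid overflow.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def ieee_log(p):
    # Assumed stand-in for a float log that returns -inf at 0
    # (math.log(0.0) would raise ValueError).
    return math.log(p) if p > 0.0 else float("-inf")

def cross_entropy(label, y):
    # Explicit formula from the question: -sum(y_ * log(y)) for one row.
    return -sum(t * ieee_log(p) for t, p in zip(label, y))

logits = [2.0, -1.0, 0.5]   # hypothetical pre-activation outputs
label = [1.0, 0.0, 0.0]     # one-hot target

relu_out = relu(logits)     # contains an exact zero, does not sum to 1
soft_out = softmax(logits)  # strictly positive, sums to 1

relu_loss = cross_entropy(label, relu_out)  # 0 * -inf makes this NaN
soft_loss = cross_entropy(label, soft_out)  # finite

print(relu_out, sum(relu_out), relu_loss)
print(soft_out, sum(soft_out), soft_loss)
```

The zero in the relu output is multiplied by a zero label, and `0 * -inf` is NaN in floating-point arithmetic, so a single zero is enough to poison the whole loss.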

You should use

y = tf.nn.softmax(tf.matmul(h1,W_o)+B_o)

instead. Or even better (for better numerical stability):

y= tf.matmul(h1,W_o)+B_o # notice no activation

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
                  -tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                  reduction_indices=[1]))
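The stability difference can be sketched in pure Python. This is an illustration under assumptions, not TensorFlow's implementation: `softmax`, `log_softmax`, and `ieee_log` are hypothetical helpers, and the logits are chosen to be extreme on purpose.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # Log-sum-exp form: never takes the log of an underflowed probability.
    m = max(logits)
    log_total = math.log(sum(math.exp(a - m) for a in logits))
    return [a - m - log_total for a in logits]

def ieee_log(p):
    # Assumed stand-in for a float log that returns -inf at 0.
    return math.log(p) if p > 0.0 else float("-inf")

logits = [0.0, 1000.0]  # hypothetical, deliberately far apart

two_step = [ieee_log(p) for p in softmax(logits)]  # softmax first, then log
fused = log_softmax(logits)                        # single stable step

print(two_step)  # first entry is -inf because exp(-1000) underflows to 0
print(fused)     # first entry stays finite
```

With widely separated logits, the two-step form loses the small probability to underflow before the log is taken, while the fused form keeps it as a finite log-probability. This may be relevant to the two-step definition mentioned in the question, since unit-variance random weights can produce large logits, but the thread does not confirm that.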

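UPDATE

The second problem is initialization: you cannot initialize neural network weights to zero; they have to be random numbers, typically sampled from a zero-mean Gaussian with low variance. The global initializer does not randomize the weights; it just runs all the initializer ops. If an initializer op is a constant (such as zeros), it simply makes sure those zeros are assigned to the variables and nothing else (so it can be used to reset a network, etc.). Zero initialization works only for convex problems such as logistic regression, not for complex models such as neural networks.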
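Why all-zero weights stall a network with a hidden layer can be checked by hand. The sketch below is plain Python rather than TensorFlow; it assumes a tiny 3-4-2 network with a softmax output (not the relu output of the original code), and `gradients` and `zeros` are hypothetical helpers that backpropagate a single example.

```python
import math

def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def gradients(x, label, W_h, b_h, W_o, b_o):
    """Backprop for one example: x -> ReLU hidden layer -> softmax cross-entropy."""
    n_in, n_hid, n_out = len(x), len(b_h), len(b_o)
    pre = [sum(x[i] * W_h[i][j] for i in range(n_in)) + b_h[j] for j in range(n_hid)]
    h = [max(0.0, p) for p in pre]
    logits = [sum(h[j] * W_o[j][k] for j in range(n_hid)) + b_o[k] for k in range(n_out)]
    probs = softmax(logits)
    # Gradient of softmax cross-entropy with respect to the logits.
    d_logits = [p - t for p, t in zip(probs, label)]
    g_W_o = [[h[j] * d_logits[k] for k in range(n_out)] for j in range(n_hid)]
    d_h = [sum(W_o[j][k] * d_logits[k] for k in range(n_out)) for j in range(n_hid)]
    # ReLU passes gradient only where its input was positive.
    d_pre = [d_h[j] if pre[j] > 0.0 else 0.0 for j in range(n_hid)]
    g_W_h = [[x[i] * d_pre[j] for j in range(n_hid)] for i in range(n_in)]
    return g_W_h, d_pre, g_W_o, d_logits

x = [0.2, 0.7, 0.1]   # hypothetical input
label = [1.0, 0.0]    # one-hot target

g_W_h, g_b_h, g_W_o, g_b_o = gradients(
    x, label, zeros(3, 4), [0.0] * 4, zeros(4, 2), [0.0] * 2)

print(g_W_h)  # all zeros: the hidden weights never move
print(g_W_o)  # all zeros: hidden activations are zero
print(g_b_o)  # the only nonzero gradient
```

In this sketch only the output bias receives a nonzero gradient, so gradient descent can at best learn class frequencies and the input is ignored. That is consistent with the roughly 10% accuracy reported in the question, although the original code may also have failed through the NaN loss discussed earlier.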

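【Comments】:

Thanks for your suggestion, but nothing changed.
There is at least one more problem (addressed in the answer). Once you fix it, please update the code in the question and lower the learning rate; it should work.
When I randomize the initial weights and change cross_entropy, it does work. But I thought calling tf.global_variables_initializer().run() would randomize the weights; the tutorial uses zero initialization and this function to initialize the variables.
No, the global initializer only runs the ops you defined, so if they are constants, the values stay constant. Zero initialization works only for linear models, so I guess the tutorial uses logistic regression?
Yes, I see the problem now. The initializer only executes the variables' assignment statements; it does not randomize anything. The tutorial uses softmax and has no hidden units. Thank you very much. I have one more question. When I define cross_entropy with log_softmax, it works well. But if I define it in two separate steps, that is, first define y with softmax and then define cross_entropy with y_*tf.log(y), it does not work. So what is wrong with that last definition?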