【Question title】: Implement MLP in tensorflow
【Posted】: 2016-01-29 06:14:35
【Question description】:

I want to use tensorflow to implement the MLP model taught in https://www.coursera.org/learn/machine-learning. Here is the implementation.

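# one hidden layer MLP
# Setup assumed here (not shown in the original post): a TensorFlow 0.x/1.x
# install and the MNIST loader used by the official tutorials.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)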

x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.float32, shape=[None, 10])

W_h1 = tf.Variable(tf.random_normal([784, 512]))
h1 = tf.nn.sigmoid(tf.matmul(x, W_h1))

W_out = tf.Variable(tf.random_normal([512, 10]))
y_ = tf.matmul(h1, W_out)

# cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(y_, y)
cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# train
with tf.Session() as s:
    s.run(tf.initialize_all_variables())

    for i in range(10000):
        batch_x, batch_y = mnist.train.next_batch(100)
        s.run(train_step, feed_dict={x: batch_x, y: batch_y})

        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch_x, y: batch_y})
            print('step {0}, training accuracy {1}'.format(i, train_accuracy))

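However, it does not work. I think the layer definitions are correct and the problem lies in cross_entropy. If I use the first version, the one that is commented out, the model converges quickly. If I use the second one, which I think/hope is a translation of the previous equation, the model does not converge.

If you want to see the cost equation, you can find it here.

Update

I have implemented the same MLP model with numpy and scipy, and it works.

In the tensorflow code, I added a print line to the training loop and found that all elements of y_ are nan. I think it is caused by arithmetic overflow or something similar.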
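
One possible reading of the nan values, offered as a sketch rather than a diagnosis: with `tf.random_normal` weights (standard deviation 1 by default) and no sigmoid on the output layer, the raw values in `y_` are rarely inside (0, 1), so `tf.log(y_)` or `tf.log(1 - y_)` is taken of a non-positive number. The pure-Python simulation below mirrors the layer sizes from the question; the helper names `sigmoid` and `raw_outputs` are hypothetical, and uniform random inputs stand in for MNIST pixels.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def raw_outputs(rng, n_in=784, n_hidden=512, n_out=10):
    """One forward pass with N(0, 1) weights and no sigmoid on the output.

    Weights are redrawn on every call, which is enough for looking at the
    spread of the outputs at initialization.
    """
    x = [rng.random() for _ in range(n_in)]  # stand-in for one image in [0, 1]
    h = [sigmoid(sum(xi * rng.gauss(0.0, 1.0) for xi in x))
         for _ in range(n_hidden)]
    return [sum(hj * rng.gauss(0.0, 1.0) for hj in h) for _ in range(n_out)]


rng = random.Random(0)
outputs = [v for _ in range(3) for v in raw_outputs(rng)]
outside = [v for v in outputs if not 0.0 < v < 1.0]
print("raw outputs outside (0, 1):", len(outside), "of", len(outputs))
```

Any value counted in `outside` would make one of the two logarithms in the hand-written loss undefined, and a single nan in the loss is enough to turn the weights into nan after one gradient step.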

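【Comments】:

I think the two cost functions expect a different "y_". The first wants the raw linear output; the second wants that output scaled to lie between 0 and 1, normalized over all classes. The scaling can be done with tf.nn.softmax.
I don't think your first loss is the one you intend to use. The common choice is softmax_cross_entropy_with_logits. Please take some time to read the official tensorflow tutorials tensorflow.org/versions/0.6.0/tutorials/mnist/tf/… or tensorflow.org/versions/0.6.0/tutorials/mnist/beginners/…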
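
To make the softmax suggestion concrete, here is a minimal pure-Python sketch of what scaling the linear output with softmax and then taking the cross-entropy amounts to. It is an illustration only, not TensorFlow's implementation; the function names and the example logits are made up.

```python
import math


def softmax(logits):
    """Scale raw outputs to values in [0, 1] that sum to 1."""
    m = max(logits)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, onehot):
    """-sum(label * log(softmax(logits))), computed via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for v, t in zip(logits, onehot))


logits = [12.0, -30.0, 4.5]  # arbitrary raw outputs for three classes
probs = softmax(logits)
print(probs, sum(probs))
print(softmax_cross_entropy(logits, [1, 0, 0]))
```

Working from the logits directly, as `softmax_cross_entropy` does here, avoids ever taking the log of a probability that has underflowed to 0.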

Tags: machine-learning tensorflow

【Solution 1】:

This is most likely the 0*log(0) problem.

Replace

cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1)

with

cross_entropy = tf.reduce_sum(- y * tf.log(tf.clip_by_value(y_, 1e-10, 1.0)) - (1 - y) * tf.log(tf.clip_by_value(1 - y_, 1e-10, 1.0)), 1)

See Tensorflow NaN bug?.
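
A small pure-Python sketch of why the clipping helps, assuming `y_` already holds probabilities in [0, 1]. In floating-point arithmetic, 0 * log(0) is 0 * (-inf), which is nan; clamping the argument of the log to at least 1e-10 keeps every term finite. The helpers `clip` and `clipped_cross_entropy` are hypothetical stand-ins for `tf.clip_by_value` and the replaced line.

```python
import math


def clip(v, lo, hi):
    """Clamp v to the interval [lo, hi]."""
    return max(lo, min(hi, v))


def clipped_cross_entropy(probs, labels, eps=1e-10):
    """Sum of -t*log(p) - (1-t)*log(1-p) with both log arguments clamped."""
    total = 0.0
    for p, t in zip(probs, labels):
        total += -t * math.log(clip(p, eps, 1.0))
        total += -(1 - t) * math.log(clip(1.0 - p, eps, 1.0))
    return total


# The unclipped failure mode: 0 * (-inf) is nan.
print(0.0 * float("-inf"))

# Probabilities of exactly 0 and 1 no longer break the loss.
print(clipped_cross_entropy([0.0, 1.0, 0.5], [0, 1, 1]))
```

With eps = 1e-10, a single term can contribute at most -log(1e-10), roughly 23.03, so a confidently wrong prediction gives a large but finite loss.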

【Discussion】:

I also felt this was the 0 * log(0) problem. I just couldn't find a way to solve it in TF. Thanks a lot~

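【Solution 2】:

I think the problem is that nn.sigmoid_cross_entropy_with_logits expects unnormalized results, whereas the function you replaced it with, cross_entropy = tf.reduce_sum(- y * tf.log(y_) - (1 - y) * tf.log(1 - y_), 1), expects y_ to be normalized to between 0 and 1 (via a sigmoid).

Try replacing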

y_ = tf.matmul(h1, W_out)

with

y_ = tf.nn.sigmoid(tf.matmul(h1, W_out))
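
To illustrate the difference between the two losses, here is a pure-Python sketch comparing the hand-written formula applied after a sigmoid with the logit-based form max(x, 0) - x*z + log(1 + exp(-|x|)), which is the expression the TensorFlow documentation gives for `sigmoid_cross_entropy_with_logits`. The function names below are hypothetical; `x` is a single raw output and `z` its 0/1 label.

```python
import math


def sigmoid(x):
    """Logistic function, written so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def naive_loss(x, z):
    """Hand-written cross-entropy applied to sigmoid(x)."""
    p = sigmoid(x)
    return -z * math.log(p) - (1 - z) * math.log(1 - p)


def stable_loss(x, z):
    """The same quantity computed directly from the logit x."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


# For moderate logits the two forms agree.
for x, z in [(0.3, 1), (-2.0, 0), (5.0, 1)]:
    print(x, z, naive_loss(x, z), stable_loss(x, z))

# For a large logit, sigmoid(x) rounds to exactly 1.0 and naive_loss would
# take log(0); the logit-based form stays finite.
print(stable_loss(50.0, 0))
```

This suggests why adding the sigmoid makes the hand-written formula valid for moderate outputs, while saturated outputs can still hit log(0) unless the loss is computed from the logits or the log arguments are clipped.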

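【Discussion】:

Sorry, I thought it was worth a try. What does the output look like when you do this?
train_accuracy stays at about 0.1, even after more than 10k iterations.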