Get a simple MLP in TensorFlow to model XOR

【Question title】: Get a simple MLP in TensorFlow to model XOR
【Posted】: 2016-03-02 21:32:35
【Question】:

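I am trying to build a simple MLP with an input layer (2 neurons), a hidden layer (5 neurons), and an output layer (1 neuron). I plan to train it by feeding [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] to get the desired output [0., 1., 1., 0.] (element-wise).

Unfortunately, my code refuses to run. No matter what I try, I keep getting dimension errors. Quite frustrating :/ I think I am missing something, but I cannot figure out what is wrong.

For better readability I have also uploaded the code to a pastebin: code

Any ideas?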

import tensorflow as tf


#####################
# preparation stuff #
#####################

# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [0., 1., 1., 0.]  # XOR output

# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2]
n_input = tf.placeholder(tf.float32, shape=[None, 2])

# number of neurons in the hidden layer
hidden_nodes = 5


################
# hidden layer #
################
b_hidden = tf.Variable(0.1)  # hidden layer's bias neuron
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0))  # hidden layer's weight matrix
                                                                         # initialized with a uniform distribution
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)  # calc hidden layer's activation


################
# output layer #
################
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0))  # output layer's weight matrix
output = tf.sigmoid(tf.matmul(W_output, hidden))  # calc output layer's activation


############
# learning #
############
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(output, n_input)  # calc cross entropy between current
                                                                          # output and desired output
loss = tf.reduce_mean(cross_entropy)  # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.1)  # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss)  # let the optimizer train


####################
# initialize graph #
####################
init = tf.initialize_all_variables()

sess = tf.Session()  # create the session and therefore the graph
sess.run(init)  # initialize all variables

# train the network
for epoch in xrange(0, 201):
    sess.run(train)  # run the training operation
    if epoch % 20 == 0:
        print("step: {:>3} | W: {} | b: {}".format(epoch, sess.run(W_hidden), sess.run(b_hidden)))

EDIT: I am still getting errors :/

hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

produces line 27 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible. Changing the line to:

hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)

seems to work, but then the error appears at:

output = tf.sigmoid(tf.matmul(hidden, W_output))

which tells me: line 34 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible

Turning the statement into:

output = tf.sigmoid(tf.matmul(W_output, hidden))

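also throws an exception: line 34 (...) ValueError: Dimensions Dimension(1) and Dimension(5) are not compatible.

EDIT2: I do not quite understand this. Shouldn't hidden be W_hidden x n_input.T, since in terms of dimensions this would be (5, 2) x (2, 1)? If I transpose n_input, hidden still works (I do not even understand why it works without the transpose). However, output keeps throwing errors, even though in terms of dimensions this operation should be (1, 5) x (5, 1)?!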

【Question discussion】:

Tags: python machine-learning neural-network xor tensorflow

【Solution 1】:

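(0) It would help to include the error output. It is also worth reading closely, because it pinpoints exactly where you are running into shape problems.

(1) The shape errors occur because the arguments are backwards in both of your matmul calls, and the shape in the tf.Variable declaration of W_hidden is backwards as well. The general rule is that the weights of a layer with input_size, output_size should have shape [input_size, output_size], and the matmul should be tf.matmul(input_to_layer, weights_for_layer) (then add a bias of shape [output_size]).

So, with your code,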

W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0))

should be:

W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))

and

hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)

should be tf.matmul(n_input, W_hidden); and

output = tf.sigmoid(tf.matmul(W_output, hidden))

should be tf.matmul(hidden, W_output).
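The [input_size, output_size] rule can be checked on paper before running anything. The following is a minimal plain-Python sketch, not TensorFlow's own shape inference: matmul_shape is a hypothetical helper, and it assumes that an unknown dimension (None, the batch size) is treated as compatible with any size. Under that assumption it also shows why the original hidden line seemed to work while both variants of the output line failed.

```python
def matmul_shape(a, b):
    """Return the shape of a 2-D product a x b, or raise ValueError.

    Shapes are (rows, cols) tuples. None marks an unknown size and is
    treated as compatible with anything (an assumption of this sketch).
    """
    inner_a, inner_b = a[1], b[0]
    if inner_a is not None and inner_b is not None and inner_a != inner_b:
        raise ValueError(
            "Dimensions %s and %s are not compatible" % (inner_a, inner_b))
    return (a[0], b[1])


hidden_nodes = 5
n_input = (None, 2)  # placeholder shape: any batch size, 2 features

# Corrected layout: weights are [input_size, output_size], input goes first.
hidden = matmul_shape(n_input, (2, hidden_nodes))   # (None, 5)
output = matmul_shape(hidden, (hidden_nodes, 1))    # (None, 1)
print("corrected:", hidden, output)

# Original layout: W_hidden is [5, 2] and comes first. The inner pair is
# (2, None), so the check passes and silently yields a (5, 2) "hidden" layer.
bad_hidden = matmul_shape((hidden_nodes, 2), n_input)
print("original hidden:", bad_hidden)

# Both orderings of the original output line then fail on the inner pair.
errors = []
for left, right in [(bad_hidden, (hidden_nodes, 1)),
                    ((hidden_nodes, 1), bad_hidden)]:
    try:
        matmul_shape(left, right)
    except ValueError as err:
        errors.append(str(err))
print(errors)
```

The two messages collected in errors mention the same dimension pairs, (2, 5) and (1, 5), as the exceptions quoted in the question.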

(2) Once those errors are fixed, your run call needs to be given a feed_dict:

sess.run(train)

should be:

sess.run(train, feed_dict={n_input: input_data})

At least, I think that is what you are trying to achieve.
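With the shapes fixed, feeding input_data means the whole 4 x 2 batch passes through both layers in one go. As a rough illustration, here is the same forward pass in plain Python without TensorFlow. The dense helper, the fixed random seed, and the zero b_output are assumptions made for this sketch (the original code has no output bias), so the printed numbers are not what the original network would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(batch, weights, bias):
    """Compute sigmoid(batch x weights + bias) on nested lists.

    batch:   [batch_size][input_size]
    weights: [input_size][output_size]
    bias:    [output_size]
    """
    result = []
    for row in batch:
        result.append([
            sigmoid(sum(x * weights[i][j] for i, x in enumerate(row)) + bias[j])
            for j in range(len(bias))
        ])
    return result


random.seed(0)  # illustrative seed so the run is repeatable
hidden_nodes = 5
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input

# Weights follow the [input_size, output_size] convention from the answer.
W_hidden = [[random.uniform(-1.0, 1.0) for _ in range(hidden_nodes)]
            for _ in range(2)]
b_hidden = [0.1] * hidden_nodes
W_output = [[random.uniform(-1.0, 1.0)] for _ in range(hidden_nodes)]
b_output = [0.0]  # assumption: stands in for "no output bias"

hidden = dense(input_data, W_hidden, b_hidden)  # 4 rows of 5 activations
output = dense(hidden, W_output, b_output)      # 4 rows of 1 activation

print(len(hidden), len(hidden[0]))
print(len(output), len(output[0]))
```

Each of the four input rows ends up with exactly one output value, which is the layout the desired output [0., 1., 1., 0.] would be compared against.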

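【Comments】:

Your EDIT 2 is wrong. I will edit my answer to point out everything that is backwards: your tf.Variable declaration of W_hidden is backwards too. They all have to be of the form [InputLayerSize, OutputLayerSize]. So the first layer should be [2, hidden_nodes]. The matmuls should be input_layer x this_layer_weights.
Thank you very much! Everything is running now. However, the network does not learn properly: all inputs produce an output close to 1.0. I guess something is wrong with my cross entropy..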
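The thread does not resolve this follow-up, so what follows is only a possible direction. Two things in the posted loss line are worth checking: tf.nn.sigmoid_cross_entropy_with_logits expects raw pre-sigmoid logits and applies the sigmoid itself, whereas the code passes the already-activated output; and the second argument is n_input rather than the desired outputs. The plain-Python sketch below uses made-up logit values and a hypothetical helper name to show the effect of the first point only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy_from_logit(logit, label):
    """Numerically stable cross entropy for one raw logit and a 0/1 label.

    Equivalent to -(label*log(p) + (1-label)*log(1-p)) with p = sigmoid(logit).
    """
    return (max(logit, 0.0) - logit * label
            + math.log(1.0 + math.exp(-abs(logit))))


output_data = [0., 1., 1., 0.]     # XOR targets from the question
logits = [-3.0, 2.5, 3.0, -2.0]    # made-up pre-activation values

# Loss computed from raw logits, as the function expects.
from_logits = [sigmoid_cross_entropy_from_logit(z, y)
               for z, y in zip(logits, output_data)]

# Loss computed from sigmoid outputs: the sigmoid is applied twice, so the
# effective probability is squeezed into (0.5, 0.74) and the loss cannot
# get close to zero even when the prediction is right.
from_activations = [sigmoid_cross_entropy_from_logit(sigmoid(z), y)
                    for z, y in zip(logits, output_data)]

mean_from_logits = sum(from_logits) / len(from_logits)
mean_from_activations = sum(from_activations) / len(from_activations)
print(mean_from_logits, mean_from_activations)
```

In this made-up case the predictions agree with the targets, yet the loss computed from activations stays well above the one computed from logits. Whether this accounts for the outputs near 1.0 reported above is not established here.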