Model not learning in tensorflow (answers)

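【Question title】: Model not learning in tensorflow
【Posted】: 2016-05-15 22:08:07
【Question】:

I'm new to tensorflow and neural networks, and I'm trying to create a model that just multiplies two floating-point values.

I wasn't sure how many neurons I would want, so I picked 10 to see where I could go from there. I figured that would probably introduce enough complexity to learn the operation semi-accurately.

Anyway, here is my code: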

import tensorflow as tf
import numpy as np

# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers


sess = tf.InteractiveSession()

# Input data
input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])

# Use 10 neurons--just one layer for now, but it'll be fully connected
weights_1 = tf.Variable(tf.truncated_normal([2, 10], stddev=.1))
bias_1 = tf.Variable(.1)


# Output of this will be a [None, 10]
hidden_output = tf.nn.relu(tf.matmul(input_data, weights_1) + bias_1)

# Weights
weights_2 = tf.Variable(tf.truncated_normal([10, 1], stddev=.1))

bias_2 = tf.Variable(.1)
# Softmax them together--this will be [None, 1]
calculated_output = tf.nn.softmax(tf.matmul(hidden_output, weights_2) + bias_2)

cross_entropy = tf.reduce_mean(correct_answers * tf.log(calculated_output))

optimizer = tf.train.GradientDescentOptimizer(.5).minimize(cross_entropy)

sess.run(tf.initialize_all_variables())

for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})

error = tf.reduce_sum(tf.abs(calculated_output - correct_answers))

x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))

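The error always seems to be around 7522.1, which is very, very bad for only 100 data points, so I assume it isn't learning.

My questions: Is my machine learning? If so, what can I do to make it more accurate? If not, how can I get it to learn?

【Comments】:

Tags: python machine-learning neural-network tensorflow

【Answer 1】:

There are a few major problems with the code. Aaron has already identified some of them, but there is another important one: calculated_output and correct_answers do not have the same shape, so subtracting them broadcasts to a 2D matrix. (calculated_output has shape (100, 1) and correct_answers has shape (100,), so the difference has shape (100, 100).) You therefore need to fix the shapes, for example by applying tf.squeeze to calculated_output.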
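
The shape mismatch can be reproduced without TensorFlow, since TensorFlow follows NumPy-style broadcasting rules. The following is a minimal NumPy sketch, not the asker's model: the all-ones `calculated_output` is an assumed stand-in, chosen because a softmax over a single column always returns 1.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 2))
correct_answers = data[:, 0] * data[:, 1]   # shape (100,)

# Assumed stand-in for the network output, shape (100, 1).
calculated_output = np.ones((100, 1))

# (100, 1) minus (100,) broadcasts to (100, 100): every prediction is
# compared against every answer, so 10,000 terms are summed instead of 100.
diff = calculated_output - correct_answers
total_broadcast = np.abs(diff).sum()
print(diff.shape, total_broadcast)

# Dropping the trailing axis first gives the intended element-wise difference.
fixed = np.squeeze(calculated_output, axis=1) - correct_answers
total_fixed = np.abs(fixed).sum()
print(fixed.shape, total_fixed)
```

With this stand-in, the broadcast total is exactly 100 times the element-wise total and comes out in the thousands, the same order of magnitude as the error reported in the question.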

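This problem also doesn't require any nonlinearity, so you can drop the activation and use a single layer. The following code gets a total error of about 6 (an average error of about 0.06 per test point). Hope that helps!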

import tensorflow as tf
import numpy as np


# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers


sess = tf.InteractiveSession()

input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])

weights_1 = tf.Variable(tf.truncated_normal([2, 1], stddev=.1))
bias_1 = tf.Variable(.0)

output_layer = tf.matmul(input_data, weights_1) + bias_1

mean_squared = tf.reduce_mean(tf.square(correct_answers - tf.squeeze(output_layer)))
optimizer = tf.train.GradientDescentOptimizer(.1).minimize(mean_squared)

sess.run(tf.initialize_all_variables())

for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})

error = tf.reduce_sum(tf.abs(tf.squeeze(output_layer) - correct_answers))

x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))
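
The error of about 0.06 per point can be cross-checked against the best possible linear fit. A purely linear model cannot represent the product exactly, so some residual error remains. The sketch below is an illustration only: it swaps the gradient descent above for a closed-form least-squares solve in NumPy, and the sample size and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100_000, 2))
answers = data[:, 0] * data[:, 1]

# Design matrix with a column of ones so the fit includes a bias term.
design = np.column_stack([data, np.ones(len(data))])
coeffs, *_ = np.linalg.lstsq(design, answers, rcond=None)

# Mean absolute error of the fitted linear model on the same samples.
predictions = design @ coeffs
mean_abs_error = np.abs(predictions - answers).mean()
print("weights and bias:", coeffs)
print("mean absolute error:", mean_abs_error)
```

For inputs drawn uniformly from [0, 1], the fit should land near weights of 0.5 and a bias of -0.25, leaving a residual of (x - 0.5)(y - 0.5) whose mean absolute value is 1/16 = 0.0625. That is consistent with the roughly 0.06 per point quoted above.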

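【Comments】:

Ah, I knew there had to be something else going on!

【Answer 2】:

You're using softmax in a strange way. Softmax is typically used when you want a probability distribution over a set of classes. In your code, you appear to have a one-dimensional output, so softmax isn't going to help you.

The cross-entropy loss is appropriate for classification problems, but you are doing regression. You should try a mean squared error loss instead.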
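
To make the softmax point concrete, here is a small NumPy sketch (the `softmax` helper and the sample values are illustrative assumptions, not code from the question). A softmax normalizes across the last axis, so with a single output column every row normalizes to exactly 1, whatever the logit is.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# One output per example, shape (3, 1), with very different logits.
logits = np.array([[-3.0], [0.0], [5.0]])
probs = softmax(logits)
print(probs.ravel())  # every entry is 1.0

# The question's loss multiplies the targets by log(1) = 0, so it is
# constant and carries no training signal.
targets = np.array([0.1, 0.5, 0.9])
cross_entropy = np.mean(targets * np.log(probs.ravel()))

# Mean squared error on the raw outputs does depend on the predictions.
mean_squared = np.mean((targets - logits.ravel()) ** 2)
print(cross_entropy, mean_squared)
```

Because the softmax output is constant, the loss from the question has zero gradient with respect to the weights, whereas the mean squared error changes with the predictions and can be minimized.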

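【Comments】:

When I replaced softmax with relu, replaced cross_entropy = tf.reduce_mean(correct_answers * tf.log(calculated_output)) with mean_squared = tf.reduce_mean(tf.square(correct_answers - calculated_output)), and minimized that, my error dropped to ~1722.95, which is much better. I assume I need more layers to do better?
Something weird is still going on. Assuming the two numbers you multiply are between 0 and 1, the output will also be between 0 and 1, and so will the squared error. So if your network could learn to output all zeros, its mean squared error should be less than 1.0.
Exactly what I thought... being off by 17 per number is pretty bad.
Actually, I see now that the error you're printing isn't the mean squared error. It's the total absolute error over 100 examples. When I run your code, it looks like it is just learning to always output 0.25, which gives an MSE of about 0.05.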
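
The MSE figures in these comments can be sanity-checked numerically. This is a rough NumPy sketch under the assumption that both inputs are uniform on [0, 1], as in generate_data. The helper name `constant_mse` and the sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1_000_000, 2))
answers = data[:, 0] * data[:, 1]

def constant_mse(constant):
    """MSE of a model that ignores its input and always predicts `constant`."""
    return np.mean((answers - constant) ** 2)

for constant in (0.0, 0.25):
    print(f"always predict {constant}: MSE = {constant_mse(constant):.4f}")
```

Analytically, always predicting 0 gives an MSE of E[(xy)^2] = 1/9, about 0.11, and always predicting the mean 0.25 gives 1/9 - 1/16 = 7/144, about 0.049. Both match the claims above: an all-zero output stays well under 1.0, and a constant 0.25 yields roughly 0.05.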