Difficulty with running gradient descent in eager execution: answers

【Question title】: Difficulty with running gradient descent in eager execution
【Posted】: 2019-01-11 09:40:07
【Question】:

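I built a neural network in Python with TensorFlow, but I can't get it to train under TensorFlow's eager execution. All of the gradients come out as zero, and I'm not sure where my program goes wrong.

Originally I used ReLU and assumed the activation was the problem, so I switched to leaky ReLU. The gradients did not change at all.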

import tensorflow as tf

# enabling eager execution
tf.enable_eager_execution()

# establishing learning rate
LEARNING_RATE = 20
TRAINING_ITERATIONS = 30
LABELS = tf.constant([0.5, 0.7, 1.0])
# print(LABELS)

# input test vector
init = tf.Variable(tf.random_normal([3, 1]))
# print(init)

# declare and initialize all weights
weight1 = tf.Variable(tf.random_normal([2, 3]))
bias1 = tf.Variable(tf.random_normal([2, 1]))
weight2 = tf.Variable(tf.random_normal([3, 2]))
bias2 = tf.Variable(tf.random_normal([3, 1]))
weight3 = tf.Variable(tf.random_normal([2, 3]))
bias3 = tf.Variable(tf.random_normal([2, 1]))
weight4 = tf.Variable(tf.random_normal([3, 2]))
bias4 = tf.Variable(tf.random_normal([3, 1]))
weight5 = tf.Variable(tf.random_normal([3, 3]))
bias5 = tf.Variable(tf.random_normal([3, 1]))

VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
# print(weight1)


def neuralNet(input, y_input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.leaky_relu((tf.matmul(weight1, input) + bias1), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight2, input) + bias2), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight3, input) + bias3), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight4, input) + bias4), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight5, input) + bias5), alpha=0.1)
    prediction = tf.nn.softmax(tf.reshape(layerResult, [-1]))
    return prediction


# print(neuralNet(init, LABELS))
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)

for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape(persistent=True) as tape:  # gradient calculation
        tape.watch(VARIABLES)
        COST = tf.reduce_sum(LABELS - neuralNet(init, LABELS))
    print(COST)
    GRADIENTS = tape.gradient(COST, VARIABLES)
    # print(GRADIENTS)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))

【Question comments】:

Tags: tensorflow gradient-descent eager-execution

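【Solution 1】:

You don't need a persistent GradientTape. Just remove the `persistent=True` argument.

The actual problem is that the derivative of sum(softmax) is always zero, because by definition the softmax outputs always sum to 1. So no matter what you do to the variables, you cannot reduce COST as you have defined it.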
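The point above can be checked numerically without TensorFlow. The following is a minimal sketch in plain Python, not the asker's network: `softmax`, `question_cost`, and `squared_error_cost` are illustrative helper names, the logit vectors are arbitrary, and the squared-error cost is an assumed alternative for contrast that the answer itself does not propose.

```python
import math

LABELS = [0.5, 0.7, 1.0]  # same labels as in the question


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def question_cost(logits):
    """COST as defined in the question: sum(LABELS - softmax(logits))."""
    return sum(y - p for y, p in zip(LABELS, softmax(logits)))


def squared_error_cost(logits):
    """Hypothetical alternative (not from the answer): sum of squared differences."""
    return sum((y - p) ** 2 for y, p in zip(LABELS, softmax(logits)))


# Arbitrary logit vectors standing in for the last layer's output.
trials = [[0.0, 0.0, 0.0], [5.0, -2.0, 1.0], [-3.0, 4.0, 0.5]]
question_costs = [question_cost(t) for t in trials]
squared_costs = [squared_error_cost(t) for t in trials]

for q, s in zip(question_costs, squared_costs):
    print(f"question cost = {q:.6f}   squared-error cost = {s:.6f}")
```

The first column should come out the same for every logit vector, which is why its gradient with respect to any weight or bias is zero, while the second column changes with the logits and so can be driven down by an optimizer.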

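【Comments】:

But I subtract the softmax vector from LABELS before computing the sum. Doesn't that change the sum from 1?
LABELS is constant (i.e. changing the variables does not change LABELS), so COST is not exactly 1, but it is still constant.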
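The exchange above can be written out as a short derivation. The notation is introduced here and is not from the thread: y_i are the entries of LABELS, p_i(θ) are the softmax outputs, and θ stands for any one of the weights or biases.

```latex
\begin{aligned}
\text{COST}(\theta)
  &= \sum_{i=1}^{3} \bigl(y_i - p_i(\theta)\bigr)
   = \sum_{i=1}^{3} y_i - \sum_{i=1}^{3} p_i(\theta)
   = (0.5 + 0.7 + 1.0) - 1 = 1.2, \\
\frac{\partial\,\text{COST}}{\partial \theta} &= 0 .
\end{aligned}
```

Since COST has the same value for every setting of the variables, every gradient returned by the tape is zero and the optimizer has nothing to follow.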