[Question title]: Difficulty running gradient descent in eager execution
[Posted]: 2019-01-11 09:40:07
[Question]:

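I built a neural network in TensorFlow with Python, but I can't get it to train with TensorFlow's eager execution. All of the gradients come out as zero, and I'm not sure where I went wrong in my program.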

Originally I used ReLU. I thought that was the network's problem, so I switched to leaky ReLU, but I didn't see any change in the gradients.

import tensorflow as tf

# enable eager execution
tf.enable_eager_execution()

# establish learning rate and number of training iterations
LEARNING_RATE = 20
TRAINING_ITERATIONS = 30
LABELS = tf.constant([0.5, 0.7, 1.0])
# print(LABELS)

# input test vector
init = tf.Variable(tf.random_normal([3, 1]))
# print(init)

# declare and initialize all weights and biases
weight1 = tf.Variable(tf.random_normal([2, 3]))
bias1 = tf.Variable(tf.random_normal([2, 1]))
weight2 = tf.Variable(tf.random_normal([3, 2]))
bias2 = tf.Variable(tf.random_normal([3, 1]))
weight3 = tf.Variable(tf.random_normal([2, 3]))
bias3 = tf.Variable(tf.random_normal([2, 1]))
weight4 = tf.Variable(tf.random_normal([3, 2]))
bias4 = tf.Variable(tf.random_normal([3, 1]))
weight5 = tf.Variable(tf.random_normal([3, 3]))
bias5 = tf.Variable(tf.random_normal([3, 1]))

VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
# print(weight1)


def neuralNet(input, y_input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.leaky_relu((tf.matmul(weight1, input) + bias1), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight2, input) + bias2), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight3, input) + bias3), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight4, input) + bias4), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight5, input) + bias5), alpha=0.1)
    prediction = tf.nn.softmax(tf.reshape(layerResult, [-1]))
    return prediction


# print(neuralNet(init, LABELS))
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)

for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape(persistent=True) as tape:  # gradient calculation
        tape.watch(VARIABLES)
        COST = tf.reduce_sum(LABELS - neuralNet(init, LABELS))
    print(COST)
    GRADIENTS = tape.gradient(COST, VARIABLES)
    # print(GRADIENTS)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))

[Discussion]:

Tags: tensorflow gradient-descent eager-execution


[Answer 1]:

You don't need a persistent GradientTape. Just remove that argument.

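The actual problem is that the derivative of sum(softmax) is always zero, because by definition the outputs of a softmax always sum to 1. Since LABELS is constant, COST = sum(LABELS) - sum(softmax(...)) = 2.2 - 1 = 1.2 for every possible value of the weights, so no matter what you do to the variables, you cannot lower COST as you have defined it.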
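The answer's argument can be written out with the chain rule. Let $z$ be the logits passed to the softmax (`tf.reshape(layerResult, [-1])` in the question), $p = \mathrm{softmax}(z)$ the prediction, $y$ the constant LABELS, and $\delta_{ij}$ the Kronecker delta:

```latex
\begin{aligned}
p_i &= \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad
  \frac{\partial p_i}{\partial z_j} = p_i\,(\delta_{ij} - p_j) \\
\mathrm{COST} &= \sum_i (y_i - p_i) = \sum_i y_i - \sum_i p_i = 2.2 - 1 = 1.2 \\
\frac{\partial\,\mathrm{COST}}{\partial z_j}
  &= -\sum_i p_i\,(\delta_{ij} - p_j)
   = -\Big(p_j - p_j \sum_i p_i\Big)
   = -(p_j - p_j) = 0
\end{aligned}
```

Every weight and bias affects COST only through $z$, so the chain rule multiplies each of their gradients by this zero vector. That is why every entry of GRADIENTS is zero regardless of whether the layers use ReLU or leaky ReLU.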

[Comments]:

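• But I subtract the softmax vector from LABELS before computing the sum; doesn't that change the sum away from 1?
• LABELS is constant (i.e. changing the variables does not change LABELS), so COST is not exactly 1, but it is still constant.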
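The same conclusion can be reproduced without TensorFlow. The stdlib-only Python sketch below treats the three logits fed into the softmax as free parameters, estimates gradients by central differences, and runs plain gradient descent on them. `numerical_grad`, `descend` and `squared_error` are hypothetical helpers for illustration, and the squared-error loss is just one possible replacement for COST, not something the answer prescribes.

```python
import math

# Target values copied from the question's LABELS constant.
LABELS = [0.5, 0.7, 1.0]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cost_from_question(z):
    """COST as defined in the question: sum(LABELS - softmax(z))."""
    return sum(t - p for t, p in zip(LABELS, softmax(z)))


def squared_error(z):
    """Hypothetical replacement loss: sum((LABELS - softmax(z)) ** 2)."""
    return sum((t - p) ** 2 for t, p in zip(LABELS, softmax(z)))


def numerical_grad(f, z, eps=1e-6):
    """Central-difference estimate of df/dz_j for every logit z_j."""
    grad = []
    for j in range(len(z)):
        up, down = list(z), list(z)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def descend(f, z, lr=0.5, steps=200):
    """Plain gradient descent on the logits using the numerical gradient."""
    z = list(z)
    for _ in range(steps):
        g = numerical_grad(f, z)
        z = [zj - lr * gj for zj, gj in zip(z, g)]
    return z


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    for name, f in (("question COST", cost_from_question),
                    ("squared error", squared_error)):
        end = descend(f, start)
        grad = [round(g, 4) for g in numerical_grad(f, start)]
        print(f"{name}: loss {f(start):.4f} -> {f(end):.4f}, gradient at start {grad}")
```

Running it should show COST stuck at 1.2 with a zero gradient while the squared error decreases. Note that softmax outputs always sum to 1, so they can never match LABELS = [0.5, 0.7, 1.0] (which sums to 2.2) exactly; in the real network you would likely want to either normalize the labels into a probability distribution or drop the softmax, depending on what the outputs are meant to represent.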