What exactly is compute_gradients returning and how does it depend on batch_size? (Answers)

【Question title】: What exactly is compute_gradients returning and how does it depend on batch_size?
【Posted】: 2019-12-23 07:51:52
【Question description】:

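Please forgive my beginner-level understanding of TensorFlow, and thanks in advance for your help!

I am trying to use compute_gradients() to compute gradients with respect to the embedded inputs of a model I have loaded.

My batch_size is 250 and embd_size is 300.

I want to compute the gradients with respect to all inputs for 250 test examples, so predicted_y is a NumPy array of the model's predictions with shape [250, 1], and the x_test I supply in the feed dictionary has shape [250, 300].

I have already looked at this similar question, "What does compute_gradients return in tensorflow", but I did not fully understand the role batch_size plays in compute_gradients().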

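import tensorflow as tf  # TF 1.x API (tf.Session, tf.global_variables_initializer)

# x_test and dropout are used below and are assumed to be defined at module level.
def get_gradients(model, predicted_y):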

    variables_fed = []
    gradients_fed = []
    inputs_here_fed = []

    optimizer_here = model.gradients
    inputs_here = model.inputs
    embedding_here = model.embedding
    cost_here = model.cost

    print(len(predicted_y))


    gradients, variables = zip(*optimizer_here.compute_gradients(cost_here, embedding_here))
    print("gradients object: {}".format(gradients[0]))

    opt = optimizer_here.apply_gradients(list(zip(gradients, variables)))
    # we do not need to run the optimizer op, since we do not want to backpropagate and update weights

    with tf.Session() as sess:

        init = tf.global_variables_initializer()

        sess.run(init)
        test_state = sess.run(model.initial_state)

        feed = {model.inputs: x_test[0:len(predicted_y)], # dims should match predicted_y
                model.labels: predicted_y[:, None], #converting 1d to 2d array
                model.keep_prob: dropout,
                model.initial_state: test_state}

        # test = sess.run(opt, feed_dict=feed)

        gradients_fed = sess.run(gradients, feed_dict=feed)

        # inputs_here_fed = sess.run(inputs_here, feed_dict=feed)
        # variables_fed = sess.run(variables, feed_dict=feed)

    return variables_fed, gradients_fed, inputs_here_fed

def get_gradients_values(gradients): # takes IndexedSlices Object which store gradients as input

    l = gradients[0].values
    print("Shape of gradients list: {}".format(l.shape))

    return l

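After feeding the values into sess.run(gradients, feed), I extract the values of the resulting IndexedSlices object and store them as the list grads. I expected grads to have shape [250, 300], corresponding to the gradients with respect to all inputs for each test example, but I get [50000, 300], which I cannot explain.

I also tried changing batch_size to see what would happen, but that gave me a shape mismatch error on the inputs. I tried to read the compute_gradients() code on GitHub, but it is too opaque for someone with my basic understanding.

How can I get the gradients with respect to all inputs for each test-set example?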

【Question comments】:

Tags: tensorflow

【Solution 1】:

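I found that the gradients returned by compute_gradients() have the correct size given the dimensions of my x_train, which are (250, 200, 300) and not (250, 300) as I had previously thought. The resulting IndexedSlices object has a values attribute, which flattens the output to (250*200, 300), which explains the 50000 rows. It also has an indices attribute with one entry per row of values, giving the index of each word in the vocabulary you used for tokenization.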
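To make the shapes concrete, here is a minimal NumPy sketch of the flattening described above. It uses small stand-in sizes and random data in place of the real IndexedSlices.values and IndexedSlices.indices; the names values, indices, per_example, token_ids and the vocabulary size are illustrative assumptions, not taken from the model. It also assumes the rows are ordered example by example.

```python
import numpy as np

# Small stand-in sizes; in the question these are 250, 200 and 300.
batch_size, seq_len, embd_size = 4, 5, 3
vocab_size = 10  # hypothetical vocabulary size

rng = np.random.default_rng(0)

# Stand-ins for IndexedSlices.values and IndexedSlices.indices:
# one gradient row and one vocabulary index per token in the batch.
values = rng.normal(size=(batch_size * seq_len, embd_size))
indices = rng.integers(0, vocab_size, size=batch_size * seq_len)

# Undo the flattening to get one (seq_len, embd_size) block per example.
per_example = values.reshape(batch_size, seq_len, embd_size)
token_ids = indices.reshape(batch_size, seq_len)

print(values.shape)       # (20, 3)
print(per_example.shape)  # (4, 5, 3)
print(token_ids.shape)    # (4, 5)
```

With the sizes from the question, the same reshape would turn (50000, 300) into (250, 200, 300).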

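Also, when using compute_gradients() with the code above, we need to make sure that batch_size == len(predicted_y) always holds, otherwise a shape mismatch error occurs. To get the gradients for a single example, set batch_size = 1 and make sure predicted_y contains only 1 test example. The gradient array obtained with these settings has shape (200, 300). To get a per-input attribution/gradient measure, we can use np.sum(gradients, axis = 1), which sums across the 300 embedding dimensions and yields one score per input token, i.e. shape (200,).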
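The reduction can be sketched in NumPy as follows. Random data stands in for the real gradients, and the names token_scores, batch and batch_scores are illustrative assumptions; the batch variant assumes the flattened values have already been reshaped to (batch, seq_len, embd_size).

```python
import numpy as np

seq_len, embd_size = 200, 300

# Stand-in for the gradients of a single example (batch_size = 1).
rng = np.random.default_rng(0)
gradients = rng.normal(size=(seq_len, embd_size))

# Sum across the embedding dimension: one score per input token.
token_scores = np.sum(gradients, axis=1)
print(token_scores.shape)  # (200,)

# The same reduction for several examples at once.
batch = rng.normal(size=(3, seq_len, embd_size))
batch_scores = batch.sum(axis=-1)
print(batch_scores.shape)  # (3, 200)
```

Summing signed gradients lets positive and negative components cancel; whether that is the desired attribution measure depends on the use case.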

【Comments】: