Is it possible to train a model in Keras with the gradient of the loss function?

【Question title】: Is it possible to train a model in keras with the gradient of the loss function?
【Posted】: 2023-03-03 16:39:03
【Question description】:

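I have a model for which I know the gradient of the loss function, i.e. dE/dy, where E is the loss function and y is the output. However, this gradient is not integrable, so there is no closed form for the loss function itself. Is there a way to train the model in Keras (perhaps using TensorFlow directly) in this case?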

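【Discussion】:

What exactly do you mean by the gradient of the loss function?
@abdou_dev The gradient of the loss with respect to the output, or simply an update rule for the output.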

Tags: tensorflow keras tf.keras

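【Solution 1】:

Yes.

Suppose you store all the variables x_i of E in a list var_list, and suppose you have a precomputed list of gradient values dE/d(x_i), one per corresponding variable, stored in processed_grads. Under these assumptions, you can apply the gradients to the variables with your optimizer's apply_gradients method (here the optimizer is called opt) using the following command: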

opt.apply_gradients(zip(processed_grads, var_list))
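The command above never evaluates a loss function: the optimizer only consumes (gradient, variable) pairs. A minimal sketch of that, assuming two scalar tf.Variable objects stand in for the x_i and the gradient values are made-up numbers for illustration:

```python
import tensorflow as tf

# Two trainable variables standing in for the x_i of E (hypothetical values).
x1 = tf.Variable(1.0)
x2 = tf.Variable(2.0)
var_list = [x1, x2]

# Precomputed dE/dx_i values (hypothetical); no loss function is ever evaluated.
processed_grads = [tf.constant(0.5), tf.constant(-1.0)]

# Plain SGD without momentum applies x_i <- x_i - learning_rate * dE/dx_i.
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
opt.apply_gradients(zip(processed_grads, var_list))

print(x1.numpy(), x2.numpy())  # approximately 0.95 and 2.1
```

With plain SGD each variable moves by -learning_rate * gradient; optimizers such as Adam rescale the gradients first, so the step would differ, but the call is the same.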

If you're looking for a complete example, here is one based on the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer):

import tensorflow as tf

# Create an optimizer.
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# Hypothetical model and data, used only to make the example runnable.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
x = tf.random.normal((8, 3))
y_true = tf.random.normal((8, 1))

# Compute the gradients for a list of variables.
with tf.GradientTape() as tape:
  loss = tf.reduce_mean(tf.square(model(x) - y_true))
var_list = model.trainable_variables
grads = tape.gradient(loss, var_list)

# Process the gradients, for example cap them, etc.
processed_grads = [tf.clip_by_norm(g, 1.0) for g in grads]

# Ask the optimizer to apply the processed gradients.
opt.apply_gradients(zip(processed_grads, var_list))
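For the situation in the question, where only dE/dy for the model output y is known, the remaining step is turning dE/dy into gradients for the model's weights. One way to do this, sketched here rather than prescribed, is the output_gradients argument of tf.GradientTape.gradient, which computes the vector-Jacobian product (dy/dθ)ᵀ · dE/dy without ever evaluating E. The model, the data and the dE_dy function are hypothetical; dE_dy is deliberately chosen as the derivative of 0.5 · Σ(y − target)² so the result can be checked against an ordinary loss.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical model and data, for illustration only.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
x = tf.random.normal((4, 3))
target = tf.random.normal((4, 2))

def dE_dy(y):
    # Hypothetical rule: only dE/dy is known, E itself is never evaluated.
    # This particular choice equals the derivative of 0.5 * sum((y - target)**2),
    # which makes the result easy to check.
    return y - target

def grads_from_dE_dy(x):
    with tf.GradientTape() as tape:
        y = model(x)
    var_list = model.trainable_variables
    # Vector-Jacobian product: dE/dtheta = (dy/dtheta)^T @ dE/dy.
    grads = tape.gradient(y, var_list, output_gradients=dE_dy(y))
    return grads, var_list

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
for step in range(5):
    grads, var_list = grads_from_dE_dy(x)
    opt.apply_gradients(zip(grads, var_list))
```

In this form var_list is simply model.trainable_variables, and any function of y can be substituted for dE_dy, including one that is not the gradient of any closed-form loss.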

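【Discussion】:

Thanks for the answer. However, suppose I have an update rule for the output, e.g. y = y - eta*y, where y is the output. I want to backpropagate this. How would I implement that in the framework above?
Also, how do I get var_list? In my case, var_list would be the variables of the model's output.
Optimization usually works like this: y = y - e*y', where y' is the gradient. Optimizer parameters (such as "e", the learning rate) are set in the line where we declare opt. var_list is the set of model parameters. If your function is y = f(x) = A*x1 + B*x2 + C with trainable coefficients A, B and C, then var_list is something like [A, B, C] and the gradient list is [g_A, g_B, g_C].
You're welcome to read more about the optimization process and what gradients actually do during backpropagation. I've shared the TensorFlow documentation; taking a look at it will help too. Good luck.
OK, thank you very much.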
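The output update rule from the comments can be turned into the dE/dy the framework needs by reading it as one gradient-descent step on the output. A short derivation, assuming the rule has the form y ← y − η·g(y) (η and g are notation introduced here, not taken from the thread), with θ the model parameters and α the optimizer's learning rate:

```latex
\begin{aligned}
y_{\text{new}} &= y - \eta\,\frac{\partial E}{\partial y}
  && \text{(one gradient step on the output)}\\
\frac{\partial E}{\partial y} &= g(y) = \frac{y - y_{\text{new}}}{\eta}
  && \text{(matching with } y_{\text{new}} = y - \eta\, g(y))\\
g(y) &= y \;\Rightarrow\; E = \tfrac{1}{2}\lVert y\rVert^2 + \text{const}
  && \text{(the rule } y \leftarrow y - \eta y)\\
\frac{\partial E}{\partial \theta} &= \Big(\frac{\partial y}{\partial \theta}\Big)^{\top}\frac{\partial E}{\partial y},
\qquad \theta \leftarrow \theta - \alpha\,\frac{\partial E}{\partial \theta}
  && \text{(chain rule, then the optimizer step)}
\end{aligned}
```

In TensorFlow the product in the last line is what tf.GradientTape.gradient(y, model.trainable_variables, output_gradients=...) computes when g(y) is passed as output_gradients. If g is not the gradient of any scalar function, as the question states, the same product can still be computed and applied; it just no longer corresponds to minimizing a particular E.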