Forward pass calculation on the current batch in the "get_updates" method of the Keras SGD Optimizer (answer)

【Question title】: Forward pass calculation on the current batch in the "get_updates" method of the Keras SGD Optimizer
【Posted】: 2020-09-25 16:45:05
【Question description】:

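I am trying to implement a stochastic Armijo rule in the get_updates method of the Keras SGD optimizer. To do so, I need to run another forward pass to check whether the chosen learning_rate is good. I do not want to compute the gradients again, but I do want to use the updated weights.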

I am using Keras 2.3.1 and TensorFlow 1.14.0.

def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.learning_rate
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                      K.dtype(self.decay))))
        # momentum
        shapes = [K.int_shape(p) for p in params]
        moments = [K.zeros(shape, name='moment_' + str(i))
                   for (i, shape) in enumerate(shapes)]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            v = self.momentum * m - lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - lr * g
            else:
                new_p = p + v

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                 new_p = p.constraint(new_p)

            self.updates.append(K.update(p, new_p))

        ### own changes ###
        if self.armijo:
            inputs = (model._feed_inputs +
                      model._feed_targets +
                      model._feed_sample_weights)
            input_layer = model.layers[0].input
            armijo_function = K.function(inputs=input_layer, outputs=[loss],
                                         updates=self.updates, name='armijo')
            loss_next= armijo_function(inputs)
            [....change updates if learning rate was not good enough...]

        return self.updates

Unfortunately, I do not understand the error message I get when trying to compute "loss_next":

tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.

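I have two questions:

1. How can I access the current batch I am working on? The forward pass should only consider the actual batch, because the gradients also belong only to that batch.
2. Is there a better way to compute the loss for that batch than using K.function to apply the updates and evaluate a forward pass?

Can anyone help? Thanks in advance.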

【Question discussion】:

Tags: tensorflow keras learning-rate stochastic-gradient custom-training

【Solution 1】:

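How can I access the current batch I am working on? The forward pass should only consider the actual batch, because the gradients also belong only to that batch.

To do this, you can set batch_size to the total number of training records in model.fit(), so that each epoch consists of a single forward pass and a single backward pass. You can then analyze the gradients from epoch 1 and modify the learning rate for epoch 2. If you are using a custom training loop, modify the code accordingly.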
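For reference, the stochastic Armijo check that the question describes is commonly written as below. The symbols are assumptions and do not appear in the question: f_B is the loss on the current batch B, w the current weights, g the gradient on that batch, \eta the trial learning rate, c a small constant in (0, 1), and \beta in (0, 1) a shrink factor.

```latex
f_B(w - \eta\, g) \;\le\; f_B(w) - c\,\eta\,\lVert g \rVert^2,
\qquad g = \nabla f_B(w)
% If the inequality fails, shrink the step and test again on the same batch:
% \eta \leftarrow \beta\,\eta
```

Both sides use the same batch B, which is why the extra forward pass has to see exactly the data the gradient was computed on.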

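Is there a better way to compute the loss for that batch than using K.function to apply the updates and evaluate a forward pass?

Apart from using from tensorflow.keras import backend as K in TensorFlow 1.x, I do not recall any other option for evaluating gradients. The best option is to update TensorFlow to 2.2.0 (the latest version at the time of writing) and use tf.GradientTape.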
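As a rough illustration of how tf.GradientTape could serve the Armijo use case in TensorFlow 2.x, here is a minimal sketch of one plain SGD step with backtracking on a single batch. The helper name armijo_sgd_step and the parameters c, beta, and max_backtracks are hypothetical and not part of Keras. Momentum, Nesterov, and constraints from the question's get_updates are left out.

```python
import tensorflow as tf


def armijo_sgd_step(model, loss_fn, x, y, lr=1.0, c=1e-4, beta=0.5,
                    max_backtracks=10):
    """One plain-SGD step on the batch (x, y) with Armijo backtracking.

    Hypothetical helper for illustration; not a Keras API.
    """
    # Single forward/backward pass: the gradients are computed only once.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    variables = model.trainable_variables
    grads = tape.gradient(loss, variables)
    grad_sq_norm = tf.add_n([tf.reduce_sum(tf.square(g)) for g in grads])

    # Snapshot the current weights so every trial step starts from them.
    start = [tf.identity(v) for v in variables]
    for _ in range(max_backtracks):
        for v, w0, g in zip(variables, start, grads):
            v.assign(w0 - lr * g)
        # Extra forward pass only: same batch, updated weights, no gradients.
        loss_next = loss_fn(y, model(x, training=True))
        if loss_next <= loss - c * lr * grad_sq_norm:
            break  # Armijo condition holds, keep this step
        lr *= beta  # step too large, shrink and try again
    # If no trial passed, the weights stay at the last (smallest) trial step.
    return float(loss), float(loss_next), lr
```

Called once per batch from a custom training loop, the function always sees exactly the batch the gradients belong to. Note that layers such as Dropout or BatchNormalization behave stochastically or update statistics when training=True, so the two loss values are only directly comparable for models without such layers.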

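A linked answer shows how to capture gradients in TensorFlow 1.x using from tensorflow.keras import backend as K.

Below is sample code that is close to what you need. I am using TensorFlow 2.2.0. You can build your own solution from this program.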

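The program does the following:

1. It changes the learning rate after every epoch. You can do this with the callbacks argument of model.fit. Here I use tf.keras.callbacks.LearningRateScheduler to increase the learning rate by 0.01 each epoch, and a tf.keras.callbacks.Callback to print it at the start of each epoch.
2. It computes the gradients with tf.GradientTape() at the end of each epoch and appends each epoch's grads to a list.
3. It sets batch_size=len(train_images), as your requirement calls for.

Note: Because of memory constraints, I train on only 500 records from the CIFAR-10 dataset.

Code -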

# Colab-only magic that selects TensorFlow 2.x; omit it outside Colab
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

import os
import numpy as np
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

train_images = train_images[:500]
train_labels = train_labels[:500]

test_images = test_images[:50]
test_labels = test_labels[:50]

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10)
])

lr = 0.01
adam = Adam(lr)

# List that collects the gradients of each epoch, and the loss function
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Callback that computes, applies, and records the gradients at the end of each epoch
class GradientCalcCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    with tf.GradientTape() as tape:
      logits = model(train_images, training=True)
      loss = loss_fn(train_labels, logits)
    grad = tape.gradient(loss, model.trainable_weights)
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    epoch_gradient.append(grad)

gradcalc = GradientCalcCallback()

# Callback that prints the learning rate at the start of each epoch
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))

printlr = printlearningrate() 

def scheduler(epoch):
  optimizer = model.optimizer
  return K.eval(optimizer.lr + 0.01)

updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)

model.compile(optimizer=adam, 
          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
          metrics=['accuracy'])

epochs = 10 

history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images), 
                    validation_data=(test_images, test_labels),
                    callbacks = [printlr,updatelr,gradcalc])

# Convert to a 2-dimensional array of shape (epochs, gradients)
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:",gradient.shape)

Output -

 Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200

 Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300

 Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400

 Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500

 Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600

 Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700

 Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800

 Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900

 Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000

 Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)

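Hope this answers your question. Happy learning.

【Discussion】:

@mreiners - Hope we have answered your question. If you are satisfied with the answer, please accept it and upvote.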