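How do I access the current batch I am processing? The forward pass should only consider the actual batch, since the gradients also belong only to that batch.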
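To do this, you can set batch_size to the total number of training records in model.fit(), so that each epoch has exactly one forward pass and one backward pass. You can then analyze the gradients from epoch 1 and modify the learning rate for epoch 2. If you are using a custom training loop, modify the code accordingly.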
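To make the "one forward pass and one backward pass per epoch" idea concrete, here is a minimal pure-Python sketch on a toy linear model rather than a Keras model. The helper names (full_batch_gradient, train) and the rule for adjusting the learning rate (halve it when the gradient norm grows) are assumptions made for illustration only; they are not part of the original answer.

```python
# Full-batch gradient descent on a toy 1-D linear model (pure Python).
# One epoch == one forward pass + one backward pass over ALL records.

def full_batch_gradient(w, b, xs, ys):
    """Return (loss, dL/dw, dL/db) for mean squared error over the whole dataset."""
    n = len(xs)
    loss = grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        loss += err * err / n
        grad_w += 2.0 * err * x / n
        grad_b += 2.0 * err / n
    return loss, grad_w, grad_b


def train(xs, ys, epochs=50, lr=0.05):
    w, b = 0.0, 0.0
    prev_norm = None
    history = []  # (epoch, loss, gradient norm, learning rate used)
    for epoch in range(epochs):
        loss, gw, gb = full_batch_gradient(w, b, xs, ys)
        norm = (gw * gw + gb * gb) ** 0.5
        # Hypothetical rule: inspect this epoch's gradient and halve the
        # learning rate if the gradient norm grew since the previous epoch.
        if prev_norm is not None and norm > prev_norm:
            lr *= 0.5
        w -= lr * gw
        b -= lr * gb
        prev_norm = norm
        history.append((epoch + 1, loss, norm, lr))
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b, history = train(xs, ys)
print("first epoch loss:", history[0][1])
print("last epoch loss:", history[-1][1])
```

Because every update uses the whole dataset, the gradient recorded for an epoch is exactly the gradient that produced that epoch's update, which is what makes it safe to base the next epoch's learning rate on it.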
Is there a better way, without using K.function, to update and evaluate the forward pass in order to compute the loss for that batch?
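Apart from using from tensorflow.keras import backend as K in TensorFlow 1.x, I don't recall any other option for evaluating gradients. The best option is to update TensorFlow to the latest version, 2.2.0, and use tf.GradientTape.
I would recommend going through this answer to capture gradients using from tensorflow.keras import backend as K in TensorFlow 1.x.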
Below is sample code that is close to your requirement. I am using TensorFlow version 2.2.0. You can build your requirement from this program.
The program does the following -
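- We change the learning rate after each epoch. You can do this with the callbacks argument of model.fit. Here, I use tf.keras.callbacks.LearningRateScheduler to increase the learning rate by 0.01 every epoch, and a tf.keras.callbacks.Callback to display it at the start of each epoch.
- We compute the gradients with tf.GradientTape() at the end of each epoch, and append each epoch's gradients to a list.
- batch_size=len(train_images) is also set, as per your requirement.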
Note: Due to memory constraints, I am training on only 500 records from the CIFAR-10 dataset.
Code -
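As a small illustration of the "increase the learning rate by 0.01 every epoch" step, the schedule can be written as a stateless function. This is a sketch that assumes the schedule(epoch, lr) form, which tf.keras.callbacks.LearningRateScheduler accepts in TensorFlow 2.x. The name linear_increase and the step parameter are hypothetical, and the loop below only imitates what the callback would do once per epoch.

```python
def linear_increase(epoch, lr, step=0.01):
    """Return the learning rate for the coming epoch: the current rate plus `step`."""
    return lr + step


# Imitate ten epochs starting from the initial rate used in the answer (0.01).
lr = 0.01
rates = []
for epoch in range(10):
    lr = linear_increase(epoch, lr)
    rates.append(round(lr, 2))
print(rates)
```

Taking the current rate as an argument means the function does not need to reach into model.optimizer, so it can be checked on its own before being passed to the callback.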
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]
test_images = test_images[:50]
test_labels = test_labels[:50]
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10)
])
lr = 0.01
adam = Adam(lr)
# Define the gradient function
epoch_gradient = []  # one entry per epoch: a list with one gradient per trainable weight
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Callback that computes the full-batch gradients at the end of each epoch
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        with tf.GradientTape() as tape:
            logits = self.model(train_images, training=True)
            loss = loss_fn(train_labels, logits)
        grad = tape.gradient(loss, self.model.trainable_weights)
        # Note: this applies one more update on top of the step model.fit already took
        self.model.optimizer.apply_gradients(zip(grad, self.model.trainable_variables))
        epoch_gradient.append(grad)
gradcalc = GradientCalcCallback()
# Callback that prints the learning rate at the start of each epoch
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))
printlr = printlearningrate()
# Schedule: return the current learning rate plus 0.01
def scheduler(epoch):
    optimizer = model.optimizer
    return K.eval(optimizer.lr + 0.01)
updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.compile(optimizer=adam,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
epochs = 10
# batch_size=len(train_images) gives exactly one training step per epoch
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images),
                    validation_data=(test_images, test_labels),
                    callbacks=[printlr, updatelr, gradcalc])
# Convert to a 2-dimensional array of (epoch, gradients) type
# (the per-weight gradients have different shapes; newer NumPy versions may need dtype=object here)
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:", gradient.shape)
Output -
Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200
Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300
Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400
Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500
Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600
Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700
Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800
Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900
Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000
Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)
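In the log above, the printed "LR" for an epoch is 0.01 lower than the "lr" value Keras reports on the progress line of the same epoch. A likely explanation is callback order: printlr comes before updatelr in the callbacks list, so it reads the rate before the scheduler raises it. The sketch below imitates that ordering in pure Python. The classes Optimizer, PrintLR and IncreaseLR and the function run are hypothetical stand-ins, not Keras APIs.

```python
class Optimizer:
    """Hypothetical stand-in that only stores a learning rate."""
    def __init__(self, lr):
        self.lr = lr


class PrintLR:
    """Records the rate it sees at epoch begin (stand-in for printlearningrate)."""
    def __init__(self, printed):
        self.printed = printed

    def on_epoch_begin(self, epoch, opt):
        self.printed.append(round(opt.lr, 2))


class IncreaseLR:
    """Raises the rate by 0.01 at epoch begin (stand-in for the scheduler)."""
    def on_epoch_begin(self, epoch, opt):
        opt.lr = opt.lr + 0.01


def run(epochs, callbacks, opt):
    logged = []
    for epoch in range(epochs):
        # Callbacks fire in list order, as in the callbacks=[...] argument.
        for cb in callbacks:
            cb.on_epoch_begin(epoch, opt)
        # The training step would happen here, using opt.lr.
        logged.append(round(opt.lr, 2))
    return logged


printed = []
opt = Optimizer(0.01)
logged = run(3, [PrintLR(printed), IncreaseLR()], opt)
print(printed)  # [0.01, 0.02, 0.03]
print(logged)   # [0.02, 0.03, 0.04]
```

Under this reading, the rate actually used for the training step in epoch 1 is 0.02 rather than 0.01. Swapping the two callbacks in the list would make the printed value match the one used for training.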
Hope this answers your question. Happy learning.