Early stopping with multiple conditions: answers

【Question title】: Early stopping with multiple conditions
【Posted】: 2021-02-09 19:57:00
【Question】:

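I'm doing multi-class classification for a recommender system (item recommendation), and I'm currently training my network with the sparse_categorical_crossentropy loss. It is therefore reasonable to perform EarlyStopping by monitoring my validation loss, val_loss, like this: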

tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

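This works as expected. However, the performance of the network (the recommender system) is measured by Average-Precision-at-10, which is tracked during training as the metric average_precision_at_k10. So I can also perform early stopping on this metric: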

tf.keras.callbacks.EarlyStopping(monitor='average_precision_at_k10', patience=10)

This also works as expected.
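One detail worth checking for the second callback: EarlyStopping has to know that a larger average precision is better. In tf.keras 2.x, mode='auto' treats any monitored name that does not end in acc, accuracy or auc as lower-is-better, and newer Keras versions may instead raise an error for a metric whose direction they cannot determine. A minimal sketch that passes mode='max' explicitly; the 'val_' prefix assumes validation data is passed to fit() and that the metric is registered under the name average_precision_at_k10.

```python
import tensorflow as tf

# Higher AP@10 is better, so tell EarlyStopping to maximise it explicitly
# instead of relying on mode='auto' guessing from the metric name.
early_stop_ap = tf.keras.callbacks.EarlyStopping(
    monitor='val_average_precision_at_k10',  # assumed metric name
    mode='max',
    patience=10,
    restore_best_weights=True,  # roll back to the best epoch when stopping
)
```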

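My problem: sometimes the validation loss increases while the average precision at 10 improves, and vice versa. So I need to monitor both and stop early if and only if both are getting worse. What I would like to do: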

tf.keras.callbacks.EarlyStopping(monitor=['val_loss', 'average_precision_at_k10'], patience=10)

This obviously doesn't work. Any ideas how to do this?
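The stopping rule being asked for can be stated independently of Keras: keep a separate best value for each metric, reset the patience counter whenever either one improves, and stop only after `patience` consecutive epochs in which neither improved. A plain-Python sketch; should_stop is a hypothetical helper name, and the inputs are assumed to be per-epoch lists of validation loss and AP@10.

```python
import math


def should_stop(val_losses, ap_at_10, patience):
    """Return True once neither metric has improved for `patience` consecutive epochs.

    val_losses: per-epoch validation losses (lower is better).
    ap_at_10:   per-epoch Average-Precision-at-10 values (higher is better).
    """
    best_loss, best_ap = math.inf, -math.inf
    wait = 0  # epochs since either metric last improved
    for loss, ap in zip(val_losses, ap_at_10):
        loss_improved = loss < best_loss
        ap_improved = ap > best_ap
        best_loss = min(best_loss, loss)
        best_ap = max(best_ap, ap)
        if loss_improved or ap_improved:
            # Either metric improving resets the counter.
            wait = 0
        else:
            # Only an epoch where BOTH metrics stalled counts towards patience.
            wait += 1
            if wait >= patience:
                return True
    return False


# Validation loss rises while AP@10 keeps improving: keep training.
print(should_stop([1.0, 1.1, 1.2, 1.3], [0.10, 0.20, 0.30, 0.40], patience=2))  # False
```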

【Comments】:

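Simple solution/question right away: have you tried creating a third function, e.g. "avg_prc_at_10k_and_val_loss", and doing the early stopping in that method? Something like "if val_loss() is decreasing and avg_precision_at_k10 is decreasing -> early_stop = true"...
I thought about that, but couldn't find enough documentation. I understand that you can create a custom EarlyStopping function as described here: datascience.stackexchange.com/questions/26833/…. It extends the model class so it can set self.model.stop_training, but I don't know how to access the current value of a metric, e.g. val_loss, in a similar way. Do you have any ideas?
The custom callback skeleton in my answer shows how to access these metrics. Given that skeleton, you should be able to write the code you need.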
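On how to read the current metric values: Keras passes them to each callback hook as the `logs` dict, keyed by metric name, with a `val_` prefix for validation metrics. A minimal sketch; PrintMetrics is a hypothetical name, and the hook is called by hand here with made-up numbers only to show the shape of the data (during training, fit() calls it for you).

```python
import tensorflow as tf


class PrintMetrics(tf.keras.callbacks.Callback):
    """Minimal callback that reads per-epoch metrics from the `logs` dict."""

    def __init__(self):
        super().__init__()
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Validation metrics carry a 'val_' prefix when validation data is given to fit().
        v_loss = logs.get('val_loss')
        ap10 = logs.get('val_average_precision_at_k10')  # assumed metric name
        self.history.append((epoch, v_loss, ap10))
        print(f'epoch {epoch}: val_loss={v_loss}, val_AP@10={ap10}')


# Called by hand with a hand-written logs dict just to illustrate the data.
cb = PrintMetrics()
cb.on_epoch_end(0, logs={'val_loss': 0.42, 'val_average_precision_at_k10': 0.31})
```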

Tags: python python-3.x tensorflow keras recommendation-engine

【Answer 1】:

I suggest you create your own callback. Below is a solution that monitors both accuracy and loss. You can replace acc with your own metric:

from tensorflow import keras

class CustomCallback(keras.callbacks.Callback):
    def __init__(self, patience=None):
        super(CustomCallback, self).__init__()
        self.patience = patience
        # Per-instance history (class-level dicts would be shared by every instance)
        self.acc = {}
        self.loss = {}
        self.best_weights = None

    def on_epoch_end(self, epoch, logs=None):
        epoch += 1
        self.loss[epoch] = logs['loss']
        self.acc[epoch] = logs['accuracy']  # requires metrics=['accuracy'] in compile()

        if self.patience and epoch > self.patience:
            # Keep the current weights as the best if the loss is lower than it was
            # `patience` epochs ago AND the accuracy is higher; otherwise stop.
            if self.loss[epoch] < self.loss[epoch - self.patience] and self.acc[epoch] > self.acc[epoch - self.patience]:
                self.best_weights = self.model.get_weights()
            else:
                # Stop training
                self.model.stop_training = True
                # Restore the best weights
                self.model.set_weights(self.best_weights)
        else:
            # Until `patience` epochs have passed, the current weights are the best
            self.best_weights = self.model.get_weights()

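Keep in mind that if you want to control the minimum change in the monitored quantities (a.k.a. min_delta), you have to build that into the code yourself.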
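One way to build min_delta in is a small comparison helper that counts a change as an improvement only if it exceeds min_delta, which roughly mirrors how the built-in EarlyStopping describes that parameter. `improved` is a hypothetical helper name, not part of Keras.

```python
def improved(current, reference, min_delta=0.0, mode='min'):
    """Return True if `current` beats `reference` by more than `min_delta`.

    Use mode='min' for quantities such as loss and mode='max' for accuracy or AP@10.
    """
    if mode == 'min':
        return current < reference - min_delta
    if mode == 'max':
        return current > reference + min_delta
    raise ValueError("mode must be 'min' or 'max'")


print(improved(0.50, 0.52, min_delta=0.01))               # True: loss dropped by 0.02
print(improved(0.785, 0.78, min_delta=0.01, mode='max'))  # False: a gain of 0.005 is too small
```

In the callback above, this would replace the two direct comparisons, e.g. `improved(self.loss[epoch], self.loss[epoch - self.patience], self.min_delta, 'min')`, with `min_delta` stored in `__init__`.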

Here is the documentation on how to write custom callbacks: custom_callback

【Comments】:

Links to external resources are encouraged, but please add context around the link so your fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the external resource is unreachable or goes permanently offline.
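This doesn't really answer the question. It's just boilerplate code from the documentation. It doesn't do what the OP wants...
I believe the documentation I referenced includes an example of early stopping with a custom callback.
@NicolasGervais, I've added a solution.
I'd suggest modifying your custom callback to adjust the learning rate based on the validation loss and the average precision at 10, rather than only stopping early. That may give you a chance at higher model performance. It should be easy to implement in your callback.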
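Combining both metrics for the learning rate needs a custom callback, as suggested. If reacting to a single metric is enough, Keras already ships `tf.keras.callbacks.ReduceLROnPlateau`, which can sit in the same `callbacks` list as an early-stopping callback. The monitored name and numbers below are illustrative assumptions, not tuned values.

```python
import tensorflow as tf

# Halve the learning rate when val_loss has not improved for 3 epochs,
# but never go below 1e-6.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    mode='min',
    factor=0.5,
    patience=3,
    min_lr=1e-6,
)
```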

【Answer 2】:

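You can do this by creating a custom callback. Information on how to do that is here. Below is some code illustrating what you can do in a custom callback. The documentation I referenced shows many other options.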

import tensorflow as tf
from tensorflow import keras

class LRA(keras.callbacks.Callback):  # subclass the Callback class
    # Class variables like these can be accessed in your code outside the class
    # definition as LRA.my_class_variable, LRA.best_weights
    my_class_variable = None  # a class variable (placeholder value)
    best_weights = None       # another class variable, filled in during training

    # Define an initializer with the parameters you want to feed to the callback
    def __init__(self, param1, param2):
        super(LRA, self).__init__()
        self.param1 = param1  # here: a validation-loss threshold
        self.param2 = param2  # here: a learning-rate multiplier
        # write any other initialization code you need here

    def on_epoch_end(self, epoch, logs=None):  # runs at the end of each epoch
        logs = logs or {}
        v_loss = logs.get('val_loss')  # example of reading log data: this epoch's validation loss
        acc = logs.get('accuracy')     # another example of reading log data (unused below)
        LRA.best_weights = self.model.get_weights()  # example of setting a class variable
        print(f'Hello, epoch {epoch} has just ended')  # message at the end of every epoch
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))  # current learning rate
        if v_loss is not None and v_loss > self.param1:
            new_lr = lr * self.param2
            tf.keras.backend.set_value(self.model.optimizer.learning_rate, new_lr)  # set the learning rate in the optimizer
        # write whatever other code you need

【Comments】:

【Answer 3】:

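With Gerry P's guidance above, I managed to create my own custom EarlyStopping callback, and thought I'd post it here in case anyone else wants to implement something similar.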

Training stops early if the validation loss and the average precision at 10 have not both improved for patience epochs.

import numpy as np
from tensorflow import keras

class CustomEarlyStopping(keras.callbacks.Callback):
    def __init__(self, patience=0):
        super(CustomEarlyStopping, self).__init__()
        self.patience = patience
        self.best_weights = None

    def on_train_begin(self, logs=None):
        # The number of epochs waited since the last joint improvement.
        self.wait = 0
        # The epoch the training stops at.
        self.stopped_epoch = 0
        # Initialize the best loss as infinity and the best map10 as 0.
        self.best_v_loss = np.inf
        self.best_map10 = 0

    def on_epoch_end(self, epoch, logs=None):
        v_loss = logs.get('val_loss')
        map10 = logs.get('val_average_precision_at_k10')

        # Reset the patience counter only when the validation loss AND map10 both
        # improve in the same epoch; otherwise count this epoch towards 'patience'.
        if np.less(v_loss, self.best_v_loss) and np.greater(map10, self.best_map10):
            self.best_v_loss = v_loss
            self.best_map10 = map10
            self.wait = 0
            # Record the weights of the best epoch so far.
            self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                if self.best_weights is not None:
                    print("Restoring model weights from the end of the best epoch.")
                    self.model.set_weights(self.best_weights)

    def on_train_end(self, logs=None):
        if self.stopped_epoch > 0:
            print("Epoch %05d: early stopping" % (self.stopped_epoch + 1))

Then use it like this:

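model.fit(
    x_train,
    y_train,
    # Validation data (your held-out x_val, y_val) is required so that 'val_loss'
    # and 'val_average_precision_at_k10' appear in the callback's logs.
    validation_data=(x_val, y_val),
    batch_size=64,
    steps_per_epoch=5,
    epochs=30,
    verbose=0,
    callbacks=[CustomEarlyStopping(patience=10)],
)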
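Note that the callback above resets its patience counter only when both metrics improve in the same epoch. If you want the behaviour described in the question instead, where training continues as long as at least one metric is still improving, a variant can track the two best values separately. This is a sketch, not a tested drop-in replacement; EitherImprovesEarlyStopping is a hypothetical name and the metric key is assumed to match the one used above.

```python
import numpy as np
import tensorflow as tf


class EitherImprovesEarlyStopping(tf.keras.callbacks.Callback):
    """Stop only when NEITHER val_loss nor val AP@10 has improved for `patience` epochs."""

    def __init__(self, patience=0, ap_key='val_average_precision_at_k10'):
        super().__init__()
        self.patience = patience
        self.ap_key = ap_key  # assumed metric name, as in the answer above
        self.best_weights = None

    def on_train_begin(self, logs=None):
        self.wait = 0
        self.stopped_epoch = 0
        self.best_v_loss = np.inf
        self.best_ap = -np.inf

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        v_loss, ap = logs.get('val_loss'), logs.get(self.ap_key)
        if v_loss is None or ap is None:
            return  # metrics missing (e.g. no validation data): do nothing
        improved = False
        if v_loss < self.best_v_loss:
            self.best_v_loss, improved = v_loss, True
        if ap > self.best_ap:
            self.best_ap, improved = ap, True
        if improved:
            # Either metric improving resets patience and snapshots the weights.
            self.wait = 0
            self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                if self.best_weights is not None:
                    self.model.set_weights(self.best_weights)
```

It is used the same way as `CustomEarlyStopping`, e.g. `callbacks=[EitherImprovesEarlyStopping(patience=10)]`.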

【Comments】:

Oops! You should have waited a bit before using my code ;) Good luck.
@Ghanem Thanks! I appreciate the effort and you expanding your first answer!

【Answer 4】:

At this point it is simpler to write a custom training loop and just use an if statement. For example:

def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)

Here is a simple custom training loop that uses this approach:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow_datasets as tfds
import tensorflow as tf

data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)

def preprocessing(inputs, targets):
    scaled = tf.divide(inputs, tf.reduce_max(inputs, axis=0))
    return scaled, targets

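# Shuffle once with a fixed seed and no reshuffling, so that take()/skip()
# below give a stable, non-overlapping train/test split across epochs.
dataset = data.filter(lambda x, y: tf.less_equal(y, 2)).\
    map(preprocessing).\
    shuffle(info.splits['train'].num_examples, seed=42,
            reshuffle_each_iteration=False)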

train_dataset = dataset.take(120).batch(4)
test_dataset = dataset.skip(120).take(30).batch(4)


model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
    ])


# The last layer already applies softmax, so the model outputs probabilities, not logits.
loss_object = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

train_loss = tf.metrics.Mean()
test_loss = tf.metrics.Mean()

train_acc = tf.metrics.SparseCategoricalAccuracy()
test_acc = tf.metrics.SparseCategoricalAccuracy()

topk_acc = tf.metrics.SparseTopKCategoricalAccuracy(k=2)

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)


@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs)
        loss = loss_object(labels, logits)

    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_acc(labels, logits)


@tf.function
def test_step(inputs, labels):
    logits = model(inputs)
    loss = loss_object(labels, logits)

    test_loss.update_state(loss)
    test_acc.update_state(labels, logits)

    topk_acc.update_state(labels, logits)

def fit(epoch):
    template = 'Epoch {:>2} Train Loss {:.3f} Test Loss {:.3f} ' \
               'Train Acc {:.2f} Test Acc {:.2f} Test TopK Acc {:.2f} '

    train_loss.reset_state()
    test_loss.reset_state()
    train_acc.reset_state()
    test_acc.reset_state()

    topk_acc.reset_state()

    for X_train, y_train in train_dataset:
        train_step(X_train, y_train)

    for X_test, y_test in test_dataset:
        test_step(X_test, y_test)

    print(template.format(
        epoch + 1,
        train_loss.result(),
        test_loss.result(),
        train_acc.result(),
        test_acc.result(),
        topk_acc.result()
    ))


def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)
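The threshold check in main() never fires if the targets are out of reach. A hedged sketch of a stopping rule that keeps the same target check but adds a patience fallback in the spirit of the question: stop when neither the test loss nor the test accuracy has improved for `patience` epochs. To keep it runnable on its own, it consumes a precomputed list of per-epoch (test_loss, test_acc, topk_acc) values; run_loop is a hypothetical name.

```python
import math


def run_loop(epoch_metrics, patience=3, acc_goal=0.8, topk_goal=0.9):
    """Return (stop_epoch, reason) for per-epoch (test_loss, test_acc, topk_acc) tuples."""
    best_loss, best_acc = math.inf, -math.inf
    wait = 0  # epochs since test loss or test accuracy last improved
    stop_epoch = -1
    for epoch, (loss, acc, topk) in enumerate(epoch_metrics):
        stop_epoch = epoch
        # Same success condition as main() above.
        if acc > acc_goal and topk > topk_goal:
            return epoch, 'target reached'
        loss_improved = loss < best_loss
        acc_improved = acc > best_acc
        best_loss = min(best_loss, loss)
        best_acc = max(best_acc, acc)
        if loss_improved or acc_improved:
            wait = 0
        else:
            # Neither metric improved this epoch.
            wait += 1
            if wait >= patience:
                return epoch, 'no improvement'
    return stop_epoch, 'max epochs'


print(run_loop([(1.0, 0.5, 0.7), (0.9, 0.85, 0.95)]))  # (1, 'target reached')
```

In the real loop you would call fit(epoch) and feed `float(test_loss.result())`, `float(test_acc.result())` and `float(topk_acc.result())` into the same checks, breaking out of the loop instead of returning.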

【Comments】: