Accessing validation data within a custom callback (answers)

【Question title】: Accessing validation data within a custom callback
【Posted】: 2018-05-20 10:47:53
【Question】:

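I am fitting with a train_generator, and I want to compute custom metrics on my validation_generator through a custom callback. How can I access the parameters validation_steps and validation_data inside a custom callback? They are not in self.params, and I cannot find them in self.model either. Below is what I would like to do. Any different approach is welcome.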

import keras


class CustomMetrics(keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        # validation_steps and validation_data are what I cannot reach here:
        # they are neither in self.params nor in self.model
        for i in range(validation_steps):
            # features, labels = next(validation_data)
            # compute custom metric: f(features, labels)
            pass


model.fit_generator(generator=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    callbacks=[CustomMetrics()])

Keras: 2.1.1

Update

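I managed to pass my validation data to the custom callback's constructor. However, this causes the annoying message "The kernel appears to have died. It will restart automatically." I doubt that this is the right approach. Any suggestions?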

import numpy as np
import keras
from sklearn.metrics import f1_score, precision_score, recall_score


class CustomMetrics(keras.callbacks.Callback):

    def __init__(self, validation_generator, validation_steps):
        super(CustomMetrics, self).__init__()  # let Callback initialise its own attributes
        self.validation_generator = validation_generator
        self.validation_steps = validation_steps

    def on_epoch_end(self, epoch, logs=None):
        # Reset at every epoch, so self.scores only holds the latest epoch
        self.scores = {
            'recall_score': [],
            'precision_score': [],
            'f1_score': []
        }

        for batch_index in range(self.validation_steps):
            # Note: this advances the same generator object that
            # fit_generator also reads for its own validation pass
            features, y_true = next(self.validation_generator)
            y_pred = np.asarray(self.model.predict(features))
            y_pred = y_pred.round().astype(int)
            # Metrics are computed on the first output column only
            self.scores['recall_score'].append(recall_score(y_true[:, 0], y_pred[:, 0]))
            self.scores['precision_score'].append(precision_score(y_true[:, 0], y_pred[:, 0]))
            self.scores['f1_score'].append(f1_score(y_true[:, 0], y_pred[:, 0]))


metrics = CustomMetrics(validation_generator, validation_steps)

model.fit_generator(generator=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    shuffle=True,
                    callbacks=[metrics],
                    verbose=1)
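One plausible problem with the update above (the thread does not confirm it as the cause of the kernel crash) is that the callback and fit_generator both call next() on the same generator object, so each one consumes batches the other expects. A minimal sketch, assuming you can build a fresh validation iterator on demand: the callback takes a factory (the hypothetical make_val_batches), starts a new pass every epoch, and writes its results into logs. To keep it runnable without Keras, the class is duck-typed; in real code you would subclass keras.callbacks.Callback, call its __init__, and let Keras call set_model(). FirstColumnModel is a hypothetical stub model.

```python
from typing import Callable, Iterator, List, Tuple

# One validation batch: (feature rows, binary labels)
Batch = Tuple[List[List[float]], List[int]]


class GeneratorMetrics:
    """Duck-typed stand-in for a keras.callbacks.Callback subclass (assumption)."""

    def __init__(self, make_val_batches: Callable[[], Iterator[Batch]],
                 validation_steps: int, threshold: float = 0.5):
        self.make_val_batches = make_val_batches  # hypothetical factory: fresh iterator per call
        self.validation_steps = validation_steps
        self.threshold = threshold
        self.model = None  # Keras would set this through set_model()

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        tp = fp = fn = 0
        # A new pass every epoch, independent of the iterator fit_generator reads
        batches = self.make_val_batches()
        for _ in range(self.validation_steps):
            features, y_true = next(batches)
            scores = self.model.predict(features)
            for label, score in zip(y_true, scores):
                pred = 1 if score >= self.threshold else 0
                tp += int(label == 1 and pred == 1)
                fp += int(label == 0 and pred == 1)
                fn += int(label == 1 and pred == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Writing into logs lets callbacks that run later (CSVLogger, EarlyStopping) see them
        logs.update(val_precision=precision, val_recall=recall, val_f1=f1)
        return logs  # Keras ignores the return value; returned here only for inspection


class FirstColumnModel:
    """Hypothetical stub model: the predicted score is just the first feature."""

    def predict(self, features):
        return [row[0] for row in features]


def make_val_batches():
    """Hypothetical factory: a finite, fixed-order pass over two batches."""
    yield [[0.9], [0.2]], [1, 0]
    yield [[0.7], [0.4]], [0, 1]


cb = GeneratorMetrics(make_val_batches, validation_steps=2)
cb.set_model(FirstColumnModel())
print(cb.on_epoch_end(0))  # {'val_precision': 0.5, 'val_recall': 0.5, 'val_f1': 0.5}
```

With Keras, the factory could be a lambda that rebuilds your own validation generator. If the validation data is a keras.utils.Sequence, indexing it directly (seq[i] for i in range(len(seq))) avoids shared iterator state altogether.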

【Question comments】:

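I don't think there is a good option. If you look at the code of _fit_loop in Keras, it does not actually pass validation_steps and validation_data to the callbacks.
How about calling next(validation_generator) in on_batch_begin? Would that be better than your approach? I mean, in that case I don't know whether next(val_generator) moves on to the next iteration, or whether it always starts over randomly from the beginning and never covers all of the validation data.
If you look at the Keras TensorBoard callback, there seems to be a way to get the validation data from the model, but I can't find where in the code that happens: github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/…
I provide a possible answer here: stackoverflow.com/a/59697739/880783
Does this answer your question? Create keras callback to save model predictions and targets for each batch during training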
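On the comment asking whether next() restarts: a Python generator resumes exactly where it stopped and never rewinds by itself; whether the order is random depends only on the generator's own code. The sketch below uses a hypothetical Keras-style generator, validation_batches, that loops over a fixed list forever without shuffling. Two consumers sharing it (for example fit_generator's validation pass and a callback) each see only the batches the other did not take.

```python
def validation_batches(data, batch_size):
    """Hypothetical Keras-style generator: cycles over `data` forever, in order."""
    while True:
        for start in range(0, len(data), batch_size):
            yield data[start:start + batch_size]


gen = validation_batches(list(range(6)), batch_size=2)

fit_side = [next(gen), next(gen)]       # e.g. batches taken by fit_generator
callback_side = [next(gen), next(gen)]  # a callback sharing `gen` continues from there

print(fit_side)       # [[0, 1], [2, 3]]
print(callback_side)  # [[4, 5], [0, 1]]  (wrapped around to the start)
```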

Tags: python keras metrics

【Solution 1】:

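I was looking for a solution to the same problem, and I found your solution and another one in the accepted answer here. If the second solution works, I think it is better than iterating over all the validation data again at the end of every epoch.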

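The idea is to store the target and prediction placeholders in variables and update those variables from a custom callback at the end of each batch (on_batch_end).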
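The accumulate-then-compute pattern described above can be sketched independently of how the per-batch targets and predictions are captured (the linked answer does that with variables updated at batch end; that part is not reproduced here). The hypothetical BatchCollector below clears its buffers at epoch start, appends each batch through add_batch, and computes one metric per epoch from everything it collected, so no second prediction pass is needed. Accuracy is used only to keep the example short.

```python
class BatchCollector:
    """Hypothetical helper: buffers per-batch targets and predictions for one epoch."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.targets, self.preds = [], []

    def on_epoch_begin(self, epoch, logs=None):
        # Start every epoch with empty buffers
        self.targets, self.preds = [], []

    def add_batch(self, y_true, y_score):
        # Called once per batch with that batch's labels and predicted scores
        # (how these are captured inside Keras is not shown here)
        self.targets.extend(y_true)
        self.preds.extend(1 if s >= self.threshold else 0 for s in y_score)

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        correct = sum(int(t == p) for t, p in zip(self.targets, self.preds))
        # One metric over everything collected, without re-running prediction
        logs["collected_accuracy"] = correct / len(self.targets) if self.targets else 0.0
        return logs


collector = BatchCollector()
collector.on_epoch_begin(0)
collector.add_batch([1, 0], [0.8, 0.3])
collector.add_batch([1, 1], [0.4, 0.9])
print(collector.on_epoch_end(0))  # {'collected_accuracy': 0.75}
```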

【Comments】:

【Solution 2】:

Here is how:

import keras
from sklearn.metrics import r2_score


class MetricsCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch:  # as written, skips the first epoch (epoch 0)
            # In Keras 2.2.x, self.validation_data is filled in by fit(..., validation_data=...)
            x_test = self.validation_data[0]
            y_test = self.validation_data[1]
            predictions = self.model.predict(x_test)
            # r2_score takes (y_true, y_pred), in that order
            print('r2:', round(r2_score(y_test, predictions), 2))


model.fit(..., callbacks=[MetricsCallback()])

Reference

Keras 2.2.4
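A note on the r2_score call: scikit-learn's signature is r2_score(y_true, y_pred), and R² is not symmetric, because the total sum of squares is measured around the mean of y_true only. The small pure-Python r2 function below implements the same formula, R² = 1 - SS_res / SS_tot (written for illustration, not scikit-learn's code), and shows that swapping the arguments changes the value:

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around mean(y_true)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

print(r2(y_true, y_pred))  # -2.2 (correct order)
print(r2(y_pred, y_true))  # about 0.448 (arguments swapped)
```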

【Comments】:

According to your reference on GitHub, self.validation_data is None, and that issue has not been resolved yet.
@VadymB. - That is because "Unfortunately, since moving from fit to flow_from_directory and fit_generator, this has erred because self.validation_data is None." I am using fit.

【Solution 3】:

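You can iterate over self.validation_data directly to aggregate all the validation data at the end of each epoch. If you want to compute precision, recall and F1 over the entire validation dataset: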

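import numpy as np
from keras import callbacks
from sklearn.metrics import f1_score, precision_score, recall_score

# Validation metrics callback: validation precision, recall and F1
# Some of the code was adapted from https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2
class Metrics(callbacks.Callback):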

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs):
        # 5.4.1 For each validation batch
        for batch_index in range(0, len(self.validation_data)):
            # 5.4.1.1 Get the batch target values
            temp_targ = self.validation_data[batch_index][1]
            # 5.4.1.2 Get the batch prediction values
            temp_predict = (np.asarray(self.model.predict(
                                self.validation_data[batch_index][0]))).round()
            # 5.4.1.3 Append them to the corresponding output objects
            if batch_index == 0:
                val_targ = temp_targ
                val_predict = temp_predict
            else:
                val_targ = np.vstack((val_targ, temp_targ))
                val_predict = np.vstack((val_predict, temp_predict))

        val_f1 = round(f1_score(val_targ, val_predict), 4)
        val_recall = round(recall_score(val_targ, val_predict), 4)
        val_precis = round(precision_score(val_targ, val_predict), 4)

        self.val_f1s.append(val_f1)
        self.val_recalls.append(val_recall)
        self.val_precisions.append(val_precis)

        # Add custom metrics to the logs, so that we can use them with
        # EarlyStop and csvLogger callbacks
        logs["val_f1"] = val_f1
        logs["val_recall"] = val_recall
        logs["val_precis"] = val_precis

        print("— val_f1: {} — val_precis: {} — val_recall {}".format(
                 val_f1, val_precis, val_recall))
        return

valid_metrics = Metrics()

Then you can add valid_metrics to the callbacks argument:

your_model.fit_generator(..., callbacks = [valid_metrics])

Be sure to put it first in the callbacks list if you want other callbacks to use these metrics.
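Why the position matters: as far as I know, Keras 2.x calls each callback's on_epoch_end in list order and hands all of them the same logs dict, so a value written by one callback is only visible to callbacks that run after it. The sketch below imitates that dispatch with a hypothetical run_epoch_end helper (a simplification, not Keras's actual CallbackList code); WritesF1 and ReadsF1 are hypothetical stand-ins for Metrics and for a consumer such as CSVLogger or EarlyStopping.

```python
class WritesF1:
    """Stands in for the Metrics callback: adds a value to the shared logs."""

    def on_epoch_end(self, epoch, logs):
        logs["val_f1"] = 0.8


class ReadsF1:
    """Stands in for a consumer such as CSVLogger or EarlyStopping."""

    def __init__(self):
        self.seen = []

    def on_epoch_end(self, epoch, logs):
        self.seen.append(logs.get("val_f1"))


def run_epoch_end(callbacks, epoch):
    """Simplified dispatch: one dict per epoch, shared by all callbacks, in list order."""
    logs = {}
    for cb in callbacks:
        cb.on_epoch_end(epoch, logs)


reader_after = ReadsF1()
run_epoch_end([WritesF1(), reader_after], epoch=0)

reader_before = ReadsF1()
run_epoch_end([reader_before, WritesF1()], epoch=0)

print(reader_after.seen, reader_before.seen)  # [0.8] [None]
```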

【Comments】:

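Is there a way to use the predictions already made on the validation data instead of recomputing them?
What are the prerequisites for accessing self.validation_data in def on_epoch_end(self, batch, logs)? I always get AttributeError: 'Metrics' object has no attribute 'validation_data'
@vanessaxenia You need to pass validation_data to the Metrics class as an argument.
Your batch_index is actually a direct index into the data, so it yields one training example at a time; you need to slice to get a full batch. More critically, self.validation_data is just a list of 4 elements, so this answer does not work at all.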
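To illustrate the last comment: with Keras 2.x fit(), self.validation_data is typically a flat list such as [x_val, y_val, sample_weights], plus a learning-phase flag in some configurations, so range(len(self.validation_data)) counts those entries, not batches. A pure-Python sketch with made-up data, using a hypothetical iter_batches helper to slice real batches out of the first two entries:

```python
def iter_batches(x_val, y_val, batch_size):
    """Hypothetical helper: yield (x, y) slices of at most `batch_size` rows."""
    for start in range(0, len(x_val), batch_size):
        yield x_val[start:start + batch_size], y_val[start:start + batch_size]


# Made-up stand-in for self.validation_data: [x_val, y_val, sample_weights, learning_phase]
validation_data = [
    [[0.1], [0.9], [0.4], [0.8], [0.6]],  # x_val: 5 samples
    [0, 1, 0, 1, 1],                      # y_val
    [1.0, 1.0, 1.0, 1.0, 1.0],            # sample weights
    0.0,                                  # learning-phase flag
]

print(len(validation_data))  # 4: entries of the list, not batches
batches = list(iter_batches(validation_data[0], validation_data[1], batch_size=2))
print(len(batches))          # 3 batches, of sizes 2, 2 and 1
```

In real code, model.predict(x_val, batch_size=...) already splits its input into batches internally, so a single call on self.validation_data[0] is usually enough.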

【Solution 4】:

Verdant89 made some mistakes and did not implement all the functionality. The code below should work.

import sys

import numpy as np
from keras import callbacks


class Metrics(callbacks.Callback):

    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        # 5.4.1 For each validation sample (index into x_val / y_val)
        for batch_index in range(0, len(self.validation_data[0])):
            # 5.4.1.1 Get the target values
            temp_target = self.validation_data[1][batch_index]
            # 5.4.1.2 Get the prediction values (add a batch axis of size 1)
            temp_predict = (np.asarray(self.model.predict(np.expand_dims(
                                self.validation_data[0][batch_index], axis=0)))).round()
            # 5.4.1.3 Append them to the corresponding output objects
            if batch_index == 0:
                val_target = temp_target
                val_predict = temp_predict
            else:
                val_target = np.vstack((val_target, temp_target))
                val_predict = np.vstack((val_predict, temp_predict))

        tp, tn, fp, fn = self.compute_tptnfpfn(val_target, val_predict)
        val_f1 = round(self.compute_f1(tp, tn, fp, fn), 4)
        val_recall = round(self.compute_recall(tp, tn, fp, fn), 4)
        val_precis = round(self.compute_precision(tp, tn, fp, fn), 4)

        self.val_f1s.append(val_f1)
        self.val_recalls.append(val_recall)
        self.val_precisions.append(val_precis)

        # Add custom metrics to the logs, so that we can use them with
        # EarlyStop and csvLogger callbacks
        logs["val_f1"] = val_f1
        logs["val_recall"] = val_recall
        logs["val_precis"] = val_precis

        print("— val_f1: {} — val_precis: {} — val_recall {}".format(
                 val_f1, val_precis, val_recall))

    def compute_tptnfpfn(self, val_target, val_predict):
        # Cast to boolean so that * acts as logical AND and ~ as logical NOT
        val_target = val_target.astype('bool')
        val_predict = val_predict.astype('bool')

        tp = np.count_nonzero(val_target * val_predict)
        tn = np.count_nonzero(~val_target * ~val_predict)
        fp = np.count_nonzero(~val_target * val_predict)
        fn = np.count_nonzero(val_target * ~val_predict)

        return tp, tn, fp, fn

    def compute_f1(self, tp, tn, fp, fn):
        # epsilon avoids division by zero when there are no positives at all
        f1 = tp * 1. / (tp + 0.5 * (fp + fn) + sys.float_info.epsilon)
        return f1

    def compute_recall(self, tp, tn, fp, fn):
        recall = tp * 1. / (tp + fn + sys.float_info.epsilon)
        return recall

    def compute_precision(self, tp, tn, fp, fn):
        precision = tp * 1. / (tp + fp + sys.float_info.epsilon)
        return precision
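compute_f1 above uses F1 = TP / (TP + ½(FP + FN)) rather than the more familiar 2PR / (P + R). Substituting P = TP/(TP+FP) and R = TP/(TP+FN) gives 2PR/(P+R) = 2TP/(2TP + FP + FN), which is the same expression. A quick numeric check in plain Python (the epsilon terms are left out, and both function names are hypothetical):

```python
import math


def f1_from_counts(tp, fp, fn):
    """F1 written as in compute_f1: TP / (TP + 0.5 * (FP + FN))."""
    denom = tp + 0.5 * (fp + fn)
    return tp / denom if denom else 0.0


def f1_from_precision_recall(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


for tp, fp, fn in [(3, 1, 2), (10, 0, 5), (1, 4, 0)]:
    assert math.isclose(f1_from_counts(tp, fp, fn), f1_from_precision_recall(tp, fp, fn))

print(f1_from_counts(3, 1, 2))  # about 0.667
```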

【Comments】: