Adding custom metric Keras Subclassing API

[Question title]: Adding custom metric Keras Subclassing API
[Posted]: 2020-06-24 21:43:18
[Question description]:

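I am following the section "Losses and Metrics Based on Model Internals" in chapter 12 of "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition" by Aurélien Géron, in which he shows how to add custom losses and metrics that do not depend on labels and predictions.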

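To illustrate this, we add a custom "reconstruction loss" by adding a layer on top of the upper hidden layer that is supposed to reproduce the input. The loss is the mean squared difference between the reconstruction and the inputs.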
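As a rough illustration of that definition, here is a sketch in plain Python. It is not the book's code: the helper name `reconstruction_loss` and the sample values are made up. The reconstruction loss is the mean of the squared element-wise differences over every entry in the batch:

```python
def reconstruction_loss(reconstruction, inputs):
    """Mean of squared element-wise differences over every entry in the batch."""
    squared_diffs = [
        (r - x) ** 2
        for recon_row, input_row in zip(reconstruction, inputs)
        for r, x in zip(recon_row, input_row)
    ]
    return sum(squared_diffs) / len(squared_diffs)


# Hypothetical batch of 2 samples with 2 features each.
inputs = [[1.0, 2.0], [3.0, 4.0]]
reconstruction = [[1.0, 2.0], [3.0, 2.0]]

# Only one of the four entries differs (by 2), so the mean is 4 / 4.
recon_loss = reconstruction_loss(reconstruction, inputs)
```

This mirrors what `tf.reduce_mean(tf.square(reconstruction - inputs))` computes in the code further down, under the assumption that both tensors have the same shape.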

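He shows the code for adding the custom loss, which works fine, but even when I follow his description I cannot add the metric, because it raises a "ValueError". He says: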

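Similarly, you can add a custom metric based on model internals by computing it in any way you want, as long as the result is the output of a metric object. For example, you can create a keras.metrics.Mean object in the constructor, then call it in the call() method, passing it the recon_loss, and finally add it to the model by calling the model's add_metric() method.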

Here is the code (I marked the lines I added myself with #MINE):

import numpy as np  # needed for the dummy data created below
import tensorflow as tf
from tensorflow import keras
class ReconstructingRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        self.reconstruction_mean = keras.metrics.Mean(name="reconstruction_error") #MINE

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)

    def call(self, inputs, training=None):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        if training:                                      #MINE
            result = self.reconstruction_mean(recon_loss) #MINE
        else:                                             #MINE
            result = 0.                                   #MINE, I have also tried different things here, 
                                                          #but the help showed a similar sample to this.
        self.add_metric(result, name="foo")               #MINE
        return self.out(Z)

Then I compile and fit the model:

training_set_size=10
X_dummy = np.random.randn(training_set_size, 8) 
y_dummy = np.random.randn(training_set_size, 1)

model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_dummy, y_dummy, epochs=2)

which throws:


ValueError: in converted code:

    <ipython-input-296-878bdeb30546>:26 call  *
        self.add_metric(result, name="foo")               #MINE
    C:\Users\Kique\Anaconda3\envs\piz3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py:1147 add_metric
        self._symbolic_add_metric(value, aggregation, name)
    C:\Users\Kique\Anaconda3\envs\piz3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py:1867 _symbolic_add_metric
        'We do not support adding an aggregated metric result tensor that '

    ValueError: We do not support adding an aggregated metric result tensor that is not the output of a `tf.keras.metrics.Metric` metric instance. Without having access to the metric instance we cannot reset the state of a metric after every epoch during training. You can create a `tf.keras.metrics.Metric` instance and pass the result here or pass an un-aggregated result with `aggregation` parameter set as `mean`. For example: `self.add_metric(tf.reduce_sum(inputs), name='mean_activation', aggregation='mean')`

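After reading that, I tried something similar to fix the problem, but it only led to a different error. How can I solve this? What is the "correct" way to do it?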

I am using conda on Windows, with tensorflow-gpu 2.1.0 installed.

[Question comments]:

Tags: tensorflow keras deep-learning keras-layer tf.keras

[Solution 1]:

The problem is right here:

def call(self, inputs, training=None):
    Z = inputs
    for layer in self.hidden:
        Z = layer(Z)
    reconstruction = self.reconstruct(Z)
    recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
    self.add_loss(0.05 * recon_loss)
    if training:                                      
        result = self.reconstruction_mean(recon_loss) 
    else:                                             
        result = 0.  # <--- Here!
    self.add_metric(result, name="foo")              
    return self.out(Z)

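The error says that add_metric only accepts a result that comes from a metric derived from tf.keras.metrics.Metric, but 0. is a plain scalar, not the output of a metric.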
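To make the error message more concrete, here is a minimal pure-Python sketch of what a streaming mean metric does. It is not Keras code: the class `StreamingMean` and its method names are hypothetical and only mimic the general shape of a metric object. The point is that the metric holds running state that has to be reset for each epoch, which is the reason the error message gives for needing access to the metric instance.

```python
class StreamingMean:
    """Toy stand-in for a stateful mean metric (hypothetical, not the Keras class)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        # Accumulate one un-aggregated, per-batch value.
        self.total += value
        self.count += 1

    def result(self):
        # Aggregated value over everything seen since the last reset.
        return self.total / self.count if self.count else 0.0

    def reset_states(self):
        # Called between epochs so that each epoch reports its own mean.
        self.total = 0.0
        self.count = 0


metric = StreamingMean()
epoch_means = []
for epoch_batches in ([4.0, 2.0], [1.0, 3.0, 2.0]):  # made-up per-batch losses
    metric.reset_states()
    for batch_loss in epoch_batches:
        metric.update_state(batch_loss)
    epoch_means.append(metric.result())
```

A bare number such as 0. carries no state of this kind, so there is nothing that could be reset between epochs.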

The solution I suggest is simply to do this:

def call(self, inputs, training=None):
    Z = inputs
    for layer in self.hidden:
        Z = layer(Z)
    reconstruction = self.reconstruct(Z)
    recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
    self.add_loss(0.05 * recon_loss)
    if training:                                      
        result = self.reconstruction_mean(recon_loss)                           
        self.add_metric(result, name="foo")              
    return self.out(Z)

This way, your mean reconstruction_error will only be shown during training.

Since this relies on eager mode (add_metric is now called inside an if branch), you should create the model with dynamic=True, as follows:

model = ReconstructingRegressor(1,dynamic=True)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_dummy, y_dummy, epochs=2, batch_size=10)

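P.S. - Note that when calling model.fit or model.evaluate, you should also make sure that the batch size divides the size of your training set (since this is a stateful network). So call these functions like this: model.fit(X_dummy, y_dummy, epochs=2, batch_size=10) or model.evaluate(X_dummy, y_dummy, batch_size=10). Good luck!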
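To check the batch-size condition from the P.S., a small plain-Python sketch can list the batch sizes a dataset would be split into. The helper `batch_sizes` is hypothetical and not part of Keras; it only does the arithmetic. The split is even exactly when the sample count is a multiple of the batch size:

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of consecutive batches when n_samples are split in order."""
    full_batches, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)  # last, smaller batch
    return sizes


even_split = batch_sizes(10, 10)   # the setting used in the answer
uneven_split = batch_sizes(10, 4)  # leaves a smaller final batch
```

With the 10 dummy samples from the question, batch_size=10 gives a single full batch, while a batch size of 4 would leave a final batch of 2.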

[Comments]:

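It gives me another error: "RuntimeError: You are using the method add_metric in a control flow branch in your layer" ... "This is not currently supported. Please move your call to add_metric out of the control flow branch" ... "You can also resolve this by marking your layer as dynamic (eager-only) by passing dynamic=True to the layer constructor."
I tried what they suggest, but it gave me the ValueError mentioned above.
See my answer. dynamic=True does work, and it should be declared when constructing the layer. I tested it again on tensorflow-gpu-2.1.0 and it works fine.
Thanks, it works now! The downside is that setting dynamic=True slows down training and inference considerably. I know this is beyond the scope of the question, but I would appreciate it if you could explain, even roughly, how to overcome this.
Apparently, removing this line: self.add_metric(result, name="foo") also solves the problem, and there is no need to set dynamic=True. The metric shows up during training as desired. I find it strange that merely creating a metric in the model/layer constructor makes it show up during training, since the metric output might be used inside the model for a different purpose.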