When and how does the call function work in Keras model subclassing? (Answers)

【Question title】: When and How the Call function work in Model Subclassing of Keras?
【Posted】: 2021-03-18 19:05:49
【Question】:

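I was reading about building dynamic models with the Subclassing API in Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. It mainly involves writing a subclass with two methods: the constructor and a call function. The constructor is fairly easy to understand. However, I have trouble understanding when and how the call function works while the model is being built.

I took the code from the book and experimented as follows (using the California housing dataset from sklearn):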

from tensorflow import keras

class WideAndDeepModel(keras.Model):
    def __init__(self, units=30, activation='relu', **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(units, activation=activation)
        self.hidden2 = keras.layers.Dense(units, activation=activation)
        self.main_output = keras.layers.Dense(1)
        self.aux_output = keras.layers.Dense(1)
    def call(self, inputs):
        print('call function running')
        input_A, input_B = inputs
        hidden1 = self.hidden1(input_B)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([input_A, hidden2])
        main_output = self.main_output(concat)
        aux_output = self.aux_output(hidden2)
        return main_output, aux_output

model = WideAndDeepModel()
model.compile(loss=['mse','mse'], loss_weights=[0.9,0.1], optimizer='sgd')
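# X_train_A, X_train_B, y_train, X_val_A, X_val_B and y_val are prepared from
# the California housing dataset (preparation not shown)
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20,
                    validation_data=([X_val_A, X_val_B], [y_val, y_val]))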

Here is the output during training:

Epoch 1/20
call function running
call function running
353/363 [============================>.] - ETA: 0s - loss: 1.6398 - output_1_loss: 1.5468 - output_2_loss: 2.4769
call function running
363/363 [==============================] - 1s 1ms/step - loss: 1.6224 - output_1_loss: 1.5296 - output_2_loss: 2.4571 - val_loss: 4.3588 - val_output_1_loss: 4.7174 - val_output_2_loss: 1.1308
Epoch 2/20
363/363 [==============================] - 0s 1ms/step - loss: 0.6073 - output_1_loss: 0.5492 - output_2_loss: 1.1297 - val_loss: 75.1126 - val_output_1_loss: 81.6632 - val_output_2_loss: 16.1572
...

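The call function ran twice at the start of training in the first epoch, and once more near the end of the first epoch. After that it never ran again.

It looks to me that while the layers are instantiated early in the constructor, the connections between the layers (defined in the call function) are established quite late, at the start of training. It also seems to me that there is no logical entity for this so-called connection between layers; the connection is just the process of passing the output of one layer to another in a specific order. Is my understanding correct?

My second question is why the call function ran three times in the early stage of training rather than just once.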

【Comments】:

Does this answer your question? TensorFlow: Difference between functioning of __call__() and call() when designing custom layers?

Tags: python tensorflow keras deep-learning subclassing

【Answer 1】:

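the layers are instantiated early in the constructor

Correct.

the connections between the layers are established quite late

Also correct. The weights are initialized when you call model.build() or when the model is called for the first time, as you can see in this guide on subclassing Keras layers: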

import tensorflow as tf
from tensorflow import keras

class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
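
As a minimal sketch of this lazy behavior, the snippet below checks that a subclassed model holds no weights until it is called for the first time. TinyModel is a hypothetical name used only for illustration, and the input width of 3 is arbitrary.

```python
import tensorflow as tf
from tensorflow import keras


class TinyModel(keras.Model):  # hypothetical model, for illustration only
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # The layer object exists now, but its weights do not yet.
        self.dense = keras.layers.Dense(4)

    def call(self, inputs):
        return self.dense(inputs)


model = TinyModel()
weights_before = len(model.weights)  # nothing has been built yet

# The first call lets Dense see the input shape and create kernel and bias.
_ = model(tf.zeros((1, 3)))
weights_after = len(model.weights)

print("weights before first call:", weights_before)
print("weights after first call:", weights_after)
print("kernel shape:", tuple(model.dense.kernel.shape))
```

The kernel shape depends on the last dimension of the first input, which is why the weights cannot be created in the constructor.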

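why the call function ran three times in the early stage

The first time is probably the first call of the model, which instantiates the weights. The second time is to build the TensorFlow graph, which is the non-Python code that actually runs TensorFlow models. The model is called once to create this graph, and further calls happen outside Python, so your print function is no longer part of them. You can change this behavior with model.compile(..., run_eagerly=True). Finally, the third time would be the first pass over the validation data.

【Comments】: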
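
To see the effect of run_eagerly, here is a rough sketch that counts how often the Python body of call() executes during fit() in each mode. The helper count_python_calls, the CountingModel class and the random data are all made up for this illustration, and the exact count in graph mode can vary between TensorFlow/Keras versions.

```python
import numpy as np
from tensorflow import keras


def count_python_calls(run_eagerly):
    """Train a tiny model and return how often call() ran as Python code."""
    calls = []  # one entry per Python-level execution of call()

    class CountingModel(keras.Model):  # hypothetical model
        def __init__(self):
            super().__init__()
            self.dense = keras.layers.Dense(1)

        def call(self, inputs):
            calls.append(1)  # Python side effect, like the print in the question
            return self.dense(inputs)

    model = CountingModel()
    model.compile(loss="mse", optimizer="sgd", run_eagerly=run_eagerly)

    # Random stand-in data: 64 samples, batch size 8, so 8 steps per epoch.
    x = np.random.rand(64, 3).astype("float32")
    y = np.random.rand(64, 1).astype("float32")
    model.fit(x, y, batch_size=8, epochs=2, verbose=0)
    return len(calls)


graph_calls = count_python_calls(run_eagerly=False)
eager_calls = count_python_calls(run_eagerly=True)
print("graph mode:", graph_calls, "| eager mode:", eager_calls)
```

In graph mode the count should stay at a handful of calls no matter how many batches are processed, while in eager mode call() runs as Python for every batch (16 training steps here).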

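【Answer 2】:

First of all, it is good to know that Python print statements are not a good way to debug a model trained with model.compile() and model.fit(), because TensorFlow runs training through its C++ runtime for speed and parallelism, so these print statements are skipped. But let me get back to your question. It is worth mentioning that TensorFlow and Keras models are lazy: when you instantiate the model with model = WideAndDeepModel(), the model weights are not created yet. They are created either the first time the model is called or when you call model.build(). So it seems your model was called once in Python to create the model weights, once to start the training process and build the C++ object (the graph), and once to start validation. After that, all computations are performed in C++ and you will not see any of the print statements.

Note: if you want to print something in graph mode, you can use tf.print.

【Comments】:

Could you provide a reference for the C++ part? I would like to learn more about it.
I think this official TensorFlow tutorial is a good starting point.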
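
The "runs in Python only while the graph is built" behavior can be reproduced without Keras. This is a small sketch using tf.function directly; the function double and the counter trace_count are illustrative names, not part of the question's code.

```python
import tensorflow as tf

trace_count = 0  # how many times the Python body actually ran


@tf.function
def double(x):
    global trace_count
    trace_count += 1  # plain Python side effect: happens only during tracing
    return x * 2


# Three calls with the same dtype and shape reuse a single traced graph.
double(tf.constant(1.0))
double(tf.constant(2.0))
double(tf.constant(3.0))
traces_same_signature = trace_count

# A new input shape needs a new graph, so the Python body runs once more.
double(tf.constant([1.0, 2.0]))
traces_after_new_shape = trace_count

print(traces_same_signature, traces_after_new_shape)
```

This is consistent with the answer's explanation that training and validation each trigger their own Python-level call, since they are compiled into separate graphs.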
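
A short sketch of the difference, assuming a standalone tf.function rather than the model from the question: the Python print fires only while the graph is traced, whereas tf.print is part of the graph and fires on every execution. The function step and the variable executions are names invented for this example.

```python
import tensorflow as tf

# Counts real executions; updating a variable is a graph op, so it runs every call.
executions = tf.Variable(0, dtype=tf.int32)


@tf.function
def step(x):
    print("Python print: tracing")            # shown only when the graph is traced
    tf.print("tf.print: executing, x =", x)   # shown on every call
    executions.assign_add(1)
    return x + 1.0


for value in (1.0, 2.0, 3.0):
    step(tf.constant(value))

print("graph executed", int(executions.numpy()), "times")
```

Note that tf.print writes to standard error by default, so its output may appear separately from regular print output in some environments.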