【问题标题】:Multioutput-Multiclass Classification in Custom Scratch Training in TF.KerasTF.Keras 自定义 Scratch 训练中的多输出-多类分类
【发布时间】:2020-10-01 12:20:44
【问题描述】:

我想从头开始训练一个多出多类分类模型(使用自定义fit())。我想要一些建议。为了学习机会,在这里我将更详细地展示整个场景。希望它对任何人都有帮助。

数据集和目标

我正在使用来自here 的数据;这是一个孟加拉语手写字符识别挑战,每个样本都有 3 个相互关联的输出以及每个样本的多个类。请看下图:

在上图中,如您所见,ক্ট্রো由3个组件(ক্ট、ো、‍‍্র)组成,即字根元音变音符号和辅音变音符号,它们一起被称为字形Grapheme Root 也有 168 个不同的类别,并且与其他类别相同(117)。增加的复杂性导致 ~13,000 个不同的字形变体(与英语的 250 个字形单位相比)。

目标是对每张图像中的字形组成进行分类。

初步方法(没有问题)

我在here 上实现了一个训练管道,它使用旧的keras(不是tf.keras)进行了演示,它具有方便的功能,例如model.compilecallbacks 等。我定义了一个custom data generator 和定义了一个类似于下面的模型架构。

input_tensor = Input(input_dim)
curr_output = base_model(input_tensor)

oputput1 = Dense(168,  activation='softmax', name='gra') (curr_output)
oputput2 = Dense(11,   activation='softmax', name='vow') (curr_output)
oputput3 = Dense(7,    activation='softmax', name='cons') (curr_output)
output_tensor = [oputput1, oputput2, oputput3]
    
model = Model(input_tensor, output_tensor)

并编译模型如下:

model.compile(

        optimizer = Adam(learning_rate=0.001), 

        loss = {'gra' : 'categorical_crossentropy', 
                'vow' : 'categorical_crossentropy', 
                'cons': 'categorical_crossentropy'},

        loss_weights = {'gra' : 1.0,
                        'vow' : 1.0,
                        'cons': 1.0},

        metrics={'gra' : 'accuracy', 
                 'vow' : 'accuracy', 
                 'cons': 'accuracy'}
    )

如您所见,我可以使用特定的 lossloss_weightsaccuracy 清晰地控制每个输出。而使用.fit()方法,模型可以使用任意callbacks函数。

新方法(以及一些问题)

现在,我想用tf.keras 的新功能重新实现它。例如模型子类化自定义拟合训练。但是,数据加载器没有变化。模型定义如下:

    def __init__(self, dim):
        super(Net, self).__init__()
        self.efnet  = EfficientNetB0(input_shape=dim,
                                     include_top = False, 
                                     weights = 'imagenet')
        self.gap     = KL.GlobalAveragePooling2D()
        self.output1 = KL.Dense(168,  activation='softmax', name='gra')
        self.output2 = KL.Dense(11,   activation='softmax', name='vow') 
        self.output3 = KL.Dense(7,    activation='softmax', name='cons') 
    
    def call(self, inputs, training=False):
        x     = self.efnet(inputs)
        x     = self.gap(x)
        y_gra = self.output1(x)
        y_vow = self.output2(x)
        y_con = self.output3(x)
        return [y_gra, y_vow, y_con]

现在我面临的主要问题是为我的每个输出正确定义metricslossloss_weights 函数。但是,我是这样开始的:

optimizer        = tf.keras.optimizers.Adam(learning_rate=0.05)
loss_fn          = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
train_acc_metric = tf.keras.metrics.Accuracy()

@tf.function
def train_step(x, y):
    with tf.GradientTape(persistent=True) as tape:
        logits = model(x, training=True)  # Logits for this minibatch
        train_loss_value = loss_fn(y, logits)

    grads = tape.gradient(train_loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return train_loss_value


for epoch in range(2):
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_generator):
        train_loss_value = train_step(x_batch_train, y_batch_train)

    # Reset metrics at the end of each epoch
    train_acc_metric.reset_states()

除了上述设置之外,我还尝试了其他许多方法来处理此类问题。例如,我定义了 3 个损失函数和 3 个指标,但事情并没有正常工作。 loss/acc 变成了 nan 类型的东西。

这是我在这种情况下的几个直接查询:

  • 如何定义lossmetricsloss_weights
  • 如何有效利用所有callbacks 功能

而且只是为了学习的机会,如果它还有 regression 类型的输出(连同其余的 3 多输出,那么总共 4);如何在自定义fit 中处理所有这些?我访问过这个SO,给出了一些关于不同类型输出的提示(classification + regression)。

【问题讨论】:

  • 您是否尝试过使用功能 API - 就像您使用标准 keras 所做的那样(而不是模型子类化)?
  • 是的,正如我上面提到的,使用函数式API,这是我的notebook
  • 我的意思是使用 tf.keras 的功能 api - 因为您使用 keras 和功能 API 的方法取得了成功,为什么不继续使用 tf.keras 中的功能 api? (如果我理解正确,你的目标是从keras 移动到tf.keras
  • 哦,对不起。我误解了。并感谢您的提问。是的,我也试过了。它按预期工作。仅供参考,tf.kerasfunctional api + model.fitmodel subclassing + model.fit 工作得很好。我的最终目标是尝试了解tf.keras 中针对此类问题案例的自定义fit 方法。并且也在寻找一种方便的方式来利用callbacks函数的特性。
  • 好吧,如果您不使用.fit 并使用自己的训练循环 - 回调是无关紧要的,因为您现在可以控制训练。关于自定义损失,我相信它是完全相同的 - 您可以从您的 call() 方法返回一个字典,然后在训练步骤中使用 3 种不同的损失来计算梯度并自己应用它们。这也允许您使用loss_weights

标签: python tensorflow machine-learning keras deep-learning


【解决方案1】:

你只需要做一个自定义的训练循环,但一切都需要做 3 次(如果你还有一个连续变量,则 + 1)。这是一个使用四输出架构的示例:

import tensorflow as tf
import numpy as np

(xtrain, train_target), (xtest, test_target) = tf.keras.datasets.mnist.load_data()

# 10 categories, one for each digit
ytrain1 = tf.keras.utils.to_categorical(train_target, num_classes=10)
ytest1 = tf.keras.utils.to_categorical(test_target, num_classes=10)

# 2 categories, if the digit is odd or not
ytrain2 = tf.keras.utils.to_categorical((train_target % 2 == 0).astype(int), 
                                        num_classes=2)
ytest2 = tf.keras.utils.to_categorical((test_target % 2 == 0).astype(int), 
                                       num_classes=2)

# 4 categories, based on the interval of the digit
ytrain3 = tf.keras.utils.to_categorical(np.digitize(train_target, [3, 6, 8]), 
                                        num_classes=4)
ytest3 = tf.keras.utils.to_categorical(np.digitize(test_target, [3, 6, 8]), 
                                       num_classes=4)

# Regression, the square of the digit
ytrain4 = tf.square(tf.cast(train_target, tf.float32))
ytest4 = tf.square(tf.cast(test_target, tf.float32))

# train dataset
train_ds = tf.data.Dataset. \
    from_tensor_slices((xtrain, ytrain1, ytrain2, ytrain3, ytrain4)). \
    shuffle(32). \
    batch(32).map(lambda a, *rest: (tf.divide(a[..., None], 255), rest)). \
    prefetch(tf.data.experimental.AUTOTUNE)

# test dataset
test_ds = tf.data.Dataset. \
    from_tensor_slices((xtest, ytest1, ytest2, ytest3, ytest4)). \
    shuffle(32). \
    batch(32).map(lambda a, *rest: (tf.divide(a[..., None], 255), rest)). \
    prefetch(tf.data.experimental.AUTOTUNE)


# architecture
class Net(tf.keras.Model):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3),
                                            strides=(1, 1), input_shape=(28, 28, 1),
                                            activation='relu')
        self.maxp1 = tf.keras.layers.MaxPool2D(pool_size=(2, 2))
        self.conv2 = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                                            strides=(1, 1),
                                            activation='relu')
        self.maxp2 = tf.keras.layers.MaxPool2D(pool_size=(2, 2))
        self.conv3 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                                            strides=(1, 1),
                                            activation='relu')
        self.maxp3 = tf.keras.layers.MaxPool2D(pool_size=(2, 2))
        self.gap = tf.keras.layers.Flatten()
        self.dense = tf.keras.layers.Dense(64, activation='relu')
        self.output1 = tf.keras.layers.Dense(10, activation='softmax')
        self.output2 = tf.keras.layers.Dense(2, activation='softmax')
        self.output3 = tf.keras.layers.Dense(4, activation='softmax')
        self.output4 = tf.keras.layers.Dense(1, activation='linear')

    def call(self, inputs, training=False, **kwargs):
        x = self.conv1(inputs)
        x = self.maxp1(x)
        x = self.conv2(x)
        x = self.maxp2(x)
        x = self.conv3(x)
        x = self.maxp3(x)
        x = self.gap(x)
        x = self.dense(x)
        out1 = self.output1(x)
        out2 = self.output2(x)
        out3 = self.output3(x)
        out4 = self.output4(x)
        return out1, out2, out3, out4


model = Net()

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# the three losses
loss_1 = tf.losses.CategoricalCrossentropy()
loss_2 = tf.losses.CategoricalCrossentropy()
loss_3 = tf.losses.CategoricalCrossentropy()
loss_4 = tf.losses.MeanAbsoluteError()

# mean object that keeps track of the train losses
loss_1_train = tf.metrics.Mean(name='tr_loss_1')
loss_2_train = tf.metrics.Mean(name='tr_loss_2')
loss_3_train = tf.metrics.Mean(name='tr_loss_3')
loss_4_train = tf.metrics.Mean(name='tr_loss_4')

# mean object that keeps track of the test losses
loss_1_test = tf.metrics.Mean(name='ts_loss_1')
loss_2_test = tf.metrics.Mean(name='ts_loss_2')
loss_3_test = tf.metrics.Mean(name='ts_loss_3')
loss_4_test = tf.metrics.Mean(name='ts_loss_4')

# accuracies for printout
acc_1_train = tf.metrics.CategoricalAccuracy(name='tr_acc_1')
acc_2_train = tf.metrics.CategoricalAccuracy(name='tr_acc_2')
acc_3_train = tf.metrics.CategoricalAccuracy(name='tr_acc_3')

# accuracies for printout
acc_1_test = tf.metrics.CategoricalAccuracy(name='ts_acc_1')
acc_2_test = tf.metrics.CategoricalAccuracy(name='ts_acc_2')
acc_3_test = tf.metrics.CategoricalAccuracy(name='ts_acc_3')


# custom training loop
@tf.function
def train_step(x, y1, y2, y3, y4):
    with tf.GradientTape(persistent=True) as tape:
        out1, out2, out3, out4 = model(x, training=True)
        loss_1_value = loss_1(y1, out1)
        loss_2_value = loss_2(y2, out2)
        loss_3_value = loss_3(y3, out3)
        loss_4_value = loss_4(y4, out4)

    losses = [loss_1_value, loss_2_value, loss_3_value, loss_4_value]

    # a list of losses is passed
    grads = tape.gradient(losses, model.trainable_variables)

    # gradients are applied
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # losses are updated
    loss_1_train(loss_1_value)
    loss_2_train(loss_2_value)
    loss_3_train(loss_3_value)
    loss_4_train(loss_4_value)

    # accuracies are updated
    acc_1_train.update_state(y1, out1)
    acc_2_train.update_state(y2, out2)
    acc_3_train.update_state(y3, out3)


@tf.function
def test_step(x, y1, y2, y3, y4):
    out1, out2, out3, out4 = model(x, training=False)
    loss_1_value = loss_1(y1, out1)
    loss_2_value = loss_2(y2, out2)
    loss_3_value = loss_3(y3, out3)
    loss_4_value = loss_4(y4, out4)

    loss_1_test(loss_1_value)
    loss_2_test(loss_2_value)
    loss_3_test(loss_3_value)
    loss_4_test(loss_4_value)

    acc_1_test.update_state(y1, out1)
    acc_2_test.update_state(y2, out2)
    acc_3_test.update_state(y3, out3)


for epoch in range(5):
    # train step
    for inputs, outputs1, outputs2, outputs3, outputs4 in train_ds:
        train_step(inputs, outputs1, outputs2, outputs3, outputs4)

    # test step
    for inputs, outputs1, outputs2, outputs3, outputs4 in test_ds:
        test_step(inputs, outputs1, outputs2, outputs3, outputs4)

    metrics = [acc_1_train, acc_1_test,
               acc_2_train, acc_2_test,
               acc_3_train, acc_3_test,
               loss_4_train, loss_4_test]

    # printing metrics
    for metric in metrics:
        print(f'{metric.name}:{metric.result():=6.4f}', end=' ')   
    print()

    # resetting the states of the metrics
    loss_1_train.reset_states()
    loss_2_train.reset_states()
    loss_3_train.reset_states()

    loss_1_test.reset_states()
    loss_2_test.reset_states()
    loss_3_test.reset_states()

    acc_1_train.reset_states()
    acc_2_train.reset_states()
    acc_3_train.reset_states()

    acc_1_test.reset_states()
    acc_2_test.reset_states()
    acc_3_test.reset_states()
ts_acc_1:0.9495 ts_acc_2:0.9685 ts_acc_3:0.9589 ts_loss_4:5.5617 
ts_acc_1:0.9628 ts_acc_2:0.9747 ts_acc_3:0.9697 ts_loss_4:4.8953 
ts_acc_1:0.9697 ts_acc_2:0.9758 ts_acc_3:0.9733 ts_loss_4:4.5209 
ts_acc_1:0.9715 ts_acc_2:0.9796 ts_acc_3:0.9745 ts_loss_4:4.2175 
ts_acc_1:0.9742 ts_acc_2:0.9834 ts_acc_3:0.9775 ts_loss_4:3.9825

我不知道如何在自定义训练循环中使用 Keras 回调,most popular question 也不知道这个主题。如果您想使用 EarlyStopping,我 personally use a collections.deque,并在最小损失为倒数第 n 个时中断。这是一个例子:

from collections import deque
import numpy as np

epochs = 100
early_stopping = 5

loss_hist = deque(maxlen=early_stopping)

for epoch in range(epochs):
    loss_value = np.random.rand()
    loss_hist.append(loss_value)

    print('Last 5 values: ', *np.round(loss_hist, 3))

    if len(loss_hist) == early_stopping and loss_hist.popleft() < min(loss_hist):
        print('Early stopping. No loss decrease in %i epochs.\n' % early_stopping)
        break
Last 5 values:  0.456
Last 5 values:  0.456 0.153
Last 5 values:  0.456 0.153 0.2
Last 5 values:  0.456 0.153 0.2 0.433
Last 5 values:  0.456 0.153 0.2 0.433 0.528
Last 5 values:  0.153 0.2 0.433 0.528 0.349
Early stopping. No loss decrease in 5 epochs.

您可以看到最后一次,最内层的值是所有值中最小的,因此验证损失没有增加。这就是停止条件。

【讨论】:

  • 期待你:)。赞成。无论如何,我也尝试过使用上述方法并运行但得到了nan 类型的输出。请让我重新检查一下,我搞砸了。同时,您能否在答案中再添加两个要点:(1)。如果我们有regression 类型的输入和输出呢? (2)。虽然callbacks 功能与直接使用无关,但有没有办法在自定义fit 中使用它?
  • 我的第一个版本有 2-3 个错误,对此感到抱歉。我更正了它们并添加了您要求的详细信息。
猜你喜欢
  • 1970-01-01
  • 2020-04-28
  • 1970-01-01
  • 1970-01-01
  • 2019-05-20
  • 2017-06-08
  • 2014-09-23
  • 1970-01-01
  • 2023-04-09
相关资源
最近更新 更多