【Question title】:Val loss and manually calculated loss produce different values
【Posted】:2022-12-23 23:59:30
【Question】:

I have a CNN classification model that uses binary cross-entropy as its loss:

optimizer_instance = Adam(learning_rate=learning_rate, decay=learning_rate / 200)
model.compile(optimizer=optimizer_instance, loss='binary_crossentropy')

We save only the best model, so the most recently saved model is the one that achieved the best val_loss:

es = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=Config.LearningParameters.Patience)
modelPath = modelFileFolder + Config.LearningParameters.ModelFileName
checkpoint = keras.callbacks.ModelCheckpoint(modelPath , monitor='val_loss',
                                                         save_best_only=True,
                                                         save_weights_only=False, verbose=1)
callbacks = [checkpoint,es]
history = model.fit(x=training_generator,
                    batch_size=Config.LearningParameters.Batch_size,
                    epochs=Config.LearningParameters.Epochs,
                    validation_data=validation_generator,                              
                    callbacks=callbacks,
                    verbose=1)
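
A point worth keeping in mind when reading the logs further down: with save_best_only=True and monitor='val_loss', the file on disk should correspond to the epoch with the lowest val_loss seen so far, not to the final epoch. The sketch below imitates that bookkeeping in plain Python. The function name and the per-epoch values are hypothetical, and it only mirrors a min-mode comparison; it is not Keras's ModelCheckpoint implementation.

```python
def best_only_checkpoints(val_losses):
    """Return the (epoch, val_loss) pairs at which a min-mode,
    best-only checkpoint would overwrite the saved model."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # save only on strict improvement
            best = loss
            saved.append((epoch, loss))
    return saved


# Hypothetical per-epoch val_loss values, for illustration only.
history = [0.70, 0.52, 0.41, 0.48, 0.54]
saved = best_only_checkpoints(history)
last_saved_epoch, last_saved_loss = saved[-1]

print("saves happened at:", saved)
print("model on disk is from epoch", last_saved_epoch, "with val_loss", last_saved_loss)
print("final epoch val_loss:", history[-1])
```

Under this reading, a manually computed loss for the reloaded model would be compared against the best logged val_loss, not against the value printed for the last epoch.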

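During training, the log shows val_loss dropping to 0.41. At the end of training, we load the best model saved during training and predict on the validation dataset. We then compute the BCE manually and get a completely different value, 2.335.

This is the manual loss calculation: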

bce = tf.keras.losses.BinaryCrossentropy()
binaryCSELoss = bce(y_valid, preds)
print("Calculated Val Loss is: " + str(binaryCSELoss ))
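
To cross-check a number like the one printed here, the textbook BCE formula can be evaluated by hand. This is a sketch in plain Python, assuming y_true holds 0/1 labels and y_pred holds probabilities of the same length. The function name, the sample values, and the eps clipping constant are assumptions for illustration; Keras's internal numerics may differ slightly.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels and predictions, for illustration only.
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.2, 0.6, 0.4]
loss = binary_cross_entropy(y_true, y_pred)
print("manual BCE:", loss)
```

The explicit length check is deliberate: if labels and predictions do not line up element by element, any averaged loss computed from them is not comparable to the logged one.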

This is the end of the training log:

10/10 [==============================] - ETA: 0s - loss: 0.0778
Epoch 40: val_loss did not improve from 0.41081
10/10 [==============================] - 4s 399ms/step - loss: 0.0778 - val_loss: 0.5413
% of marked 1 in validation: [0.51580906 0.48419094]
% of marked 1 in Test: [0.51991504 0.480085  ]
---------------------------------
Calculated Val Loss is: 2.3350689765791395

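We thought it might have something to do with the fact that we use data generators, so that the loss is computed separately on each batch, so we added another test that does not use a data generator: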

history = model.fit(x=trainX,y = y_train,
                      epochs=Config.LearningParameters.Epochs,
                      validation_data=(validateion_x,y_valid),
                      callbacks=callbacks,
                      verbose=1)
predictions_cnn = model.predict(validateion_x)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
binaryCSELoss = bce(y_valid, predictions_cnn)
valloss = binaryCSELoss.numpy()
print("binaryCSELoss logits=false on all Val Loss is: " + str(valloss))
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
binaryCSELoss = bce(y_valid, predictions_cnn)
valloss = binaryCSELoss.numpy()
print("binaryCSELoss logits=true on all Val Loss is: " + str(valloss))
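
The per-batch hypothesis that motivated this test can be examined in isolation. The sketch below, in plain Python with hypothetical per-sample losses, shows that averaging batch means without weighting differs from the mean over all samples only when the batches have unequal sizes. It makes no claim about how a particular Keras version aggregates batch losses.

```python
def mean(values):
    return sum(values) / len(values)


def mean_of_batch_means(values, batch_size):
    """Average the per-batch means without weighting by batch size."""
    batches = [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
    return mean([mean(batch) for batch in batches])


# Hypothetical per-sample losses; the last batch holds a single sample.
per_sample_losses = [0.1, 0.1, 0.1, 0.1, 2.5]

global_mean = mean(per_sample_losses)
unweighted = mean_of_batch_means(per_sample_losses, batch_size=4)
equal_batches = mean_of_batch_means(per_sample_losses[:4], batch_size=2)

print("mean over all samples:", global_mean)
print("unweighted mean of batch means:", unweighted)
print("equal-sized batches:", equal_batches)
```

With equal-sized batches the two averages coincide, so batching alone would not account for a large gap in that case.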

This is the end of the training log. Again, the losses are not the same:

54/54 [==============================] - ETA: 0s - loss: 0.5015
Epoch 6: val_loss did not improve from 0.66096
54/54 [==============================] - 8s 144ms/step - loss: 0.5015 - val_loss: 1.9742
% of marked 1 in validation: [0.28723404 0.71276593]
% of marked 1 in Test: [0.52077866 0.47922137]
loading Model: E:\CnnModels\2022-06-03_11-53-53\model.h5
Backend TkAgg is interactive backend. Turning interactive mode on.
binaryCSELoss logits=false on all Val Loss is: 0.6353029
binaryCSELoss logits=true on all Val Loss is: 0.7070135
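
The two numbers above differ because from_logits changes how the predictions are interpreted. As a sketch in plain Python (function names and values are hypothetical, and the eps constant is an assumption), from_logits=True can be thought of as applying a sigmoid to the inputs before the usual formula, so passing values that are already probabilities squashes them a second time:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probs(y_true, probs, eps=1e-7):
    """Mean BCE when the inputs are already probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)


def bce_from_logits(y_true, logits):
    """Mean BCE when the inputs are raw scores: sigmoid first, then BCE."""
    return bce_from_probs(y_true, [sigmoid(z) for z in logits])


# Hypothetical labels and probabilities, for illustration only.
y_true = [1.0, 0.0]
probs = [0.9, 0.2]
true_logits = [math.log(p / (1.0 - p)) for p in probs]  # inverse of sigmoid

as_probs = bce_from_probs(y_true, probs)
as_logits = bce_from_logits(y_true, true_logits)  # same model, raw scores
misread = bce_from_logits(y_true, probs)          # probabilities misread as logits

print("probabilities:", as_probs)
print("matching logits:", as_logits)
print("probabilities treated as logits:", misread)
```

Which setting is appropriate depends on whether the model's last layer already applies a sigmoid, which the question does not show.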

How can this be?

【Discussion】:

    Tags: tensorflow deep-learning conv-neural-network classification loss-function


    【Solution 1】:

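    BCE is binary cross-entropy. For a binary output it is usually understood in terms of [ 1 - p, p ]; that is, an output layer from which the maximum value is taken can be read simply as a representation of [ 1 - p, p ].

    Sample loss function: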
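
The [ 1 - p, p ] reading can be made concrete. The sketch below, in plain Python with hypothetical function names and an arbitrary p, checks that BCE on a single probability p gives the same value as categorical cross-entropy on the two-class distribution [ 1 - p, p ] with a one-hot label.

```python
import math


def binary_ce(y, p):
    """Per-sample binary cross-entropy for label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def categorical_ce(one_hot, dist):
    """Per-sample categorical cross-entropy for a one-hot label."""
    return -sum(t * math.log(q) for t, q in zip(one_hot, dist))


p = 0.8  # hypothetical predicted probability of class 1
for y in (0, 1):
    one_hot = [1 - y, y]
    dist = [1 - p, p]
    print(y, binary_ce(y, p), categorical_ce(one_hot, dist))
```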

    https://towardsdatascience.com/where-did-the-binary-cross-entropy-loss-function-come-from-ac3de349a715

    https://www.tensorflow.org/api_docs/python/tf/keras/losses/Loss

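    The output layer's weight and bias parameters, of shape ( 192, 1 ), can be read as significance values, as in a tree graph, and the change in these values reflects the loss over the course of training. Reading the loss from logs['loss'] is the preferred way to evaluate, but to match what the question asks for, the sample maps the result instead.

    [ Sample ]: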

    import os
    from os.path import exists
    
    import tensorflow as tf
    import tensorflow_io as tfio
    
    import matplotlib.pyplot as plt
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    None
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    physical_devices = tf.config.experimental.list_physical_devices('GPU')
    assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
    config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
    print(physical_devices)
    print(config)
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    Variables
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    PATH = os.path.join(r'F:\datasets\downloads\Actors\train\Pikaploy', '*.tif')
    PATH_2 = os.path.join(r'F:\datasets\downloads\Actors\train\Candidt Kibt', '*.tif')
    files = tf.data.Dataset.list_files(PATH)
    files_2 = tf.data.Dataset.list_files(PATH_2)
    
    list_file = []
    list_file_actual = []
    list_label = []
    list_label_actual = [ 'Pikaploy', 'Pikaploy', 'Pikaploy', 'Pikaploy', 'Pikaploy', 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt' ]
    for file in files.take(5):
        image = tf.io.read_file( file )
        image = tfio.experimental.image.decode_tiff(image, index=0)
        list_file_actual.append(image)
        image = tf.image.resize(image, [32,32], method='nearest')
        list_file.append(image)
        list_label.append(1)
        
    for file in files_2.take(5):
        image = tf.io.read_file( file )
        image = tfio.experimental.image.decode_tiff(image, index=0)
        list_file_actual.append(image)
        image = tf.image.resize(image, [32,32], method='nearest')
        list_file.append(image)
        list_label.append(9)
    
    checkpoint_path = os.path.join(r"F:\models\checkpoint", os.path.basename(__file__).split('.')[0], "TF_DataSets_01.h5")
    checkpoint_dir = os.path.dirname(checkpoint_path)
    
    if not exists(checkpoint_dir) : 
        os.mkdir(checkpoint_dir)
        print("Create directory: " + checkpoint_dir)
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    DataSet
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    dataset = tf.data.Dataset.from_tensor_slices((tf.constant(tf.cast(list_file, dtype=tf.int64), shape=(10, 1, 32, 32, 4), dtype=tf.int64),tf.constant(list_label, shape=(10, 1, 1), dtype=tf.int64)))
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Model Initialize
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=( 32, 32, 4 )),
        tf.keras.layers.Normalization(mean=3., variance=2.),
        tf.keras.layers.Normalization(mean=4., variance=6.),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Reshape((128, 225)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96, return_sequences=True, return_state=False)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(192, activation='relu'),
        tf.keras.layers.Dense(1, name='output'),
    ])
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Callback
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    class custom_callback(tf.keras.callbacks.Callback):
            
        def on_epoch_end(self, epoch, logs=None):
            if( logs['accuracy'] >= 0.97 ):
                self.model.stop_training = True
                
            print( "% of marked 2 in Train:  " + str( self.model.get_layer( name='output' ).get_weights()[0][ tf.math.argmax( self.model.get_layer( name='output' ).get_weights()[0] ).numpy() ][0][0] ) + " " + str( 1 - self.model.get_layer( name='output' ).get_weights()[0][ tf.math.argmax( self.model.get_layer( name='output' ).get_weights()[0] ).numpy() ][0][0] )
            )
    
        def on_test_end(self, logs=None):
            print( "\n" )
            print( "% of marked 1 in Train:  " + str( self.model.get_layer( name='output' ).get_weights()[0][ tf.math.argmax( self.model.get_layer( name='output' ).get_weights()[0] ).numpy() ][0][0] ) + " " + str( 1 - self.model.get_layer( name='output' ).get_weights()[0][ tf.math.argmax( self.model.get_layer( name='output' ).get_weights()[0] ).numpy() ][0][0] )
            )
            # print( "\n" )
        
    custom_callback = custom_callback()
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Optimizer
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    optimizer = tf.keras.optimizers.Nadam(
        learning_rate=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-07,
        name='Nadam'
    )
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Loss Fn
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""                               
    lossfn = tf.keras.losses.BinaryCrossentropy( 
        from_logits=False,
        reduction=tf.keras.losses.Reduction.AUTO,
        name='BinaryCrossentropy' )
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Model Summary
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    model.compile(optimizer=optimizer, loss=lossfn, metrics=['accuracy'])
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : FileWriter
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    if exists(checkpoint_path) :
        model.load_weights(checkpoint_path)
        print("model load: " + checkpoint_path)
        input("Press Any Key!")
    
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    : Training
    """""""""""""""""""""""""""""""""""""""""""""""""""""""""
    history = model.fit( dataset, validation_data=(dataset), batch_size=1, epochs=50, callbacks=[custom_callback] )
    model.save_weights(checkpoint_path)
    
    plt.figure(figsize=(5,2))
    plt.title("Actors recognitions")
    for i in range(len(list_file)):
        img = tf.keras.preprocessing.image.array_to_img(
            list_file[i],
            data_format=None,
            scale=True
        )
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        img_array = tf.expand_dims(img_array, 0)
        predictions = model.predict(img_array)
        score = tf.nn.softmax(predictions[0])
        plt.subplot(5, 2, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(list_file_actual[i])
        plt.xlabel(str(round(score[tf.math.argmax(score).numpy()].numpy(), 2)) + ":" +  str(list_label_actual[tf.math.argmax(score)]))
        
    plt.show()
    
    input('...')
    

    [ Output ]:

    10/10 [==============================] - 1s 56ms/step - loss: -60.9311 - accuracy: 0.5000 - val_loss: -60.9329 - val_accuracy: 0.5000
    Epoch 6/50
     9/10 [==========================>...] - ETA: 0s - loss: -54.1486 - accuracy: 0.5556
    
    % of marked 1 in Train:  0.17788188 0.8221181184053421
    % of marked 2 in Train:  0.17788188 0.8221181184053421
    10/10 [==============================] - 1s 54ms/step - loss: -60.9331 - accuracy: 0.5000 - val_loss: -60.9341 - val_accuracy: 0.5000
    Epoch 7/50
     9/10 [==========================>...] - ETA: 0s - loss: -54.1499 - accuracy: 0.5556
    
    % of marked 1 in Train:  0.17788248 0.8221175223588943
    % of marked 2 in Train:  0.17788248 0.8221175223588943
    10/10 [==============================] - 1s 57ms/step - loss: -60.9343 - accuracy: 0.5000 - val_loss: -60.9351 - val_accuracy: 0.5000
    Epoch 8/50
     9/10 [==========================>...] - ETA: 0s - loss: -54.1509 - accuracy: 0.5556
    
    % of marked 1 in Train:  0.1778828 0.8221171945333481
    % of marked 2 in Train:  0.1778828 0.8221171945333481
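
A note on the negative loss values in this output: BCE cannot be negative when the labels are 0 or 1 and the predictions are probabilities. The sample above labels its second class with 9 (list_label.append(9)) and its output layer has no sigmoid, so one plausible reading is that the out-of-range label drives the value below zero. The sketch below uses the textbook formula in plain Python; the function name, the eps constant, and the prediction value are assumptions, and it does not reproduce the exact numbers in the log.

```python
import math


def binary_ce(y, p, eps=1e-7):
    """Per-sample BCE with the prediction clipped into (0, 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


valid = binary_ce(1, 0.9)    # label in {0, 1}: never negative
invalid = binary_ce(9, 0.9)  # label 9, as used for the second class in the sample

print("label 1:", valid)
print("label 9:", invalid)
```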
    

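    【Discussion】:

    • I'm not sure how this relates to what I asked. If I missed something, I'd appreciate an explanation.
    • The point is that you applied BCE, and the original question was about getting a custom BCE and output values like the ones in your printout, so I answered with the result. ( marked 1 and marked 2 )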