【Question title】: Why is my CNN regression network not learning?
【Posted】: 2019-12-07 15:22:21
【Question】:

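I am running a regression-type convolutional neural network. The network takes a 55x1756 image and outputs another image of size 11x1756. For this reason, the last layer of my architecture (shown below) is a dense layer whose number of units is the product of the output dimensions.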
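The size of that dense layer can be checked with a little arithmetic. The sketch below uses the 11x1756 target size from the post and the flattened feature count (1552) reported in the posted model summary; `dense_param_count` is a hypothetical helper written for illustration, not part of Keras.

```python
# Target image size and flattened feature count, both taken from the post
OUT_HEIGHT, OUT_WIDTH = 11, 1756
FLATTEN_SIZE = 1552  # flatten_1 output in the posted model summary


def dense_param_count(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


units = OUT_HEIGHT * OUT_WIDTH
params = dense_param_count(FLATTEN_SIZE, units)
print(units)   # 19316
print(params)  # 29997748
```

Both numbers agree with the `dense_1` row of the summary, so nearly all of the model's roughly 30 million parameters sit in this one layer.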

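As shown below, I use 'tanh' activation functions and 'adam' as the optimizer. I have trained the network for a while, but the results are almost always the same. Apart from the validation loss being lower than the training loss, which is not ideal, the loss and the root-mean-square error stay constant. The training log and the model summary are attached below.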

Do you have any suggestions on how I could improve it? Thanks in advance!

# Imports were not shown in the original post; these are the assumed Keras imports.
import numpy as np
from keras import backend
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Flatten, MaxPooling2D)


def generator(data_arr, batch_size=10):
    """Yield (inputs, flattened targets) batches forever."""
    # Number of full batches; a trailing partial batch is dropped
    num = len(data_arr) // batch_size

    # Loop forever so the generator never terminates
    while True:
        for offset in range(num):
            batch_samples = data_arr[offset * batch_size:(offset + 1) * batch_size]

            samples = []
            labels = []

            for batch_sample in batch_samples:
                samples.append(batch_sample[0])
                # Flatten each target image into a 1-D vector
                labels.append(batch_sample[1].flatten())

            X_ = np.array(samples)
            Y_ = np.array(labels)

            # Add a trailing channel axis: (batch, 55, 1756) -> (batch, 55, 1756, 1)
            X_ = X_[:, :, :, np.newaxis]

            yield (X_, Y_)


# compile and train the model using the generator function
# (training_data and val_data are defined elsewhere; not shown in the post)
train_generator = generator(training_data, batch_size=10)
validation_generator = generator(val_data, batch_size=10)

model = Sequential()

model.add(Conv2D(4, (2, 2), input_shape=(55, 1756, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(8, (2, 2)))
model.add(Activation('tanh'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(16, (2, 2)))
model.add(Activation('tanh'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(19316))  # 11 * 1756 output values
model.add(Activation('softmax'))


def nrmse(y_true, y_pred):
    # RMSE divided by 2, as in the original post
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true))) / 2


def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true)))


model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=[rmse, nrmse, 'mae'])

model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 27, 878, 4)        20        
_________________________________________________________________
activation_1 (Activation)    (None, 27, 878, 4)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 9, 292, 4)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 9, 292, 4)         16        
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 291, 8)         136       
_________________________________________________________________
activation_2 (Activation)    (None, 8, 291, 8)         0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 97, 8)          0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 2, 97, 8)          32        
_________________________________________________________________
flatten_1 (Flatten)          (None, 1552)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 19316)             29997748  
_________________________________________________________________
activation_3 (Activation)    (None, 19316)             0

=================================================================
Total params: 29,997,952
Trainable params: 29,997,928
Non-trainable params: 24
_________________________________________________________________
Epoch 1/6
6660/6660 [==============================] - 425s 64ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0333 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 2/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 3/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 4/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 5/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 6/6
6660/6660 [==============================] - 421s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.03274
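
One property of the posted model is worth keeping in mind when reading these numbers: the final activation is softmax, whose outputs are non-negative and sum to 1 across all units. With 19316 units, the average output is therefore about 5.2e-5. Whether that scale fits the targets depends on how they are normalized, which the post does not say. A minimal sketch in plain Python, using made-up logits rather than values from the real model:

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


N_UNITS = 19316  # size of the final Dense layer in the post

random.seed(0)
# Hypothetical pre-activation values; not taken from the real model
logits = [random.gauss(0.0, 1.0) for _ in range(N_UNITS)]
outputs = softmax(logits)

print(sum(outputs))            # close to 1.0 regardless of the logits
print(sum(outputs) / N_UNITS)  # mean output, about 1 / 19316
```

If the target pixels of each image do not sum to 1, a linear final activation is the usual choice for regression. That is something to test here, not a confirmed fix for this case.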

【Comments】:

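  • This could mean that the network is too small to learn more abstract features, or that the data is messed up in some way.
  • Is using convolutions mandatory? I would usually try a multilayer perceptron for this kind of problem. Too much complexity sometimes adds to the problem.
  • Thanks for your comment @venkata krishnan! What advantage would a multilayer perceptron bring to this problem? Thanks in advance!
  • Since the data is all numerical series, an MLP can easily learn to express the non-linearity. If you use conv and max pooling, you automatically start losing some information from one layer to the next.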

Tags: python tensorflow keras deep-learning


【Solution 1】:

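If you use an activation function other than ReLU, you may run into the vanishing gradient problem. Try changing the activations to ReLU and see whether it improves.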
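
The vanishing-gradient argument can be illustrated by comparing the derivatives of the two activations. This is a minimal sketch in plain Python, not a measurement of the model in the question; the input values and the depth of 3 are arbitrary choices for illustration.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# tanh saturates for large |x|, so its gradient shrinks toward 0;
# ReLU keeps a gradient of 1 for any positive input
for x in (0.5, 2.0, 5.0):
    print(x, tanh_grad(x), relu_grad(x))

# Chained through several layers, the per-layer factors multiply
depth = 3  # hypothetical number of stacked tanh layers
print(tanh_grad(2.0) ** depth)
```

With only two or three activation layers, as in the posted model, this effect is likely to be modest, which is consistent with the asker's report below that switching to ReLU changed little.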

【Comments】:

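  • I did try that, but the results were almost the same as before. Do you have any other ideas about what could cause this type of behavior? Thanks in advance @新手
  • Have you tried reducing it to a single Conv layer? Your network may be too complex for your data.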