Why is my neural network always giving me the same predictions? (Answers)

【Question title】: Why is my neural network always giving me the same predictions?
【Posted】: 2019-06-08 20:16:55
【Question description】:

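I'm trying to create a sequential neural network whose output is 12 non-exclusive probabilities (probability of A, probability of B, probability of C, ...). My network seems to learn the most common output and then predict it for every input. All of my output values are always either "1" or "0" with nothing in between, and the same positions always hold the same values (details below).

I am far from an ML expert, so the solution could be something very simple.

I've tried different batch sizes (ranging from 8 to 128) and many different loss functions, but nothing seems to help.

How I create the model using Keras: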

model = Sequential()
model.add( Dense( 150, input_dim=9600, activation='relu') )
model.add( LeakyReLU(alpha=.01) )
model.add( Dense( 50, activation='relu') )
model.add( LeakyReLU(alpha=.01) )
model.add( Dense( 12, activation='sigmoid') )

metrics_to_output=[ 'accuracy' ]
# I've tried many loss functions, not just mean_squared_error
model.compile( loss='mean_squared_error', optimizer='adam', metrics=metrics_to_output )

This might not be relevant, but here is how I prepare the data and train the model. I have also tried using train_on_batch:

import numpy

def generate_data_from_files( file1, file2 ):
    input = numpy.load( file1, allow_pickle=True )
    output = numpy.load( file2, allow_pickle=True )

    # The file only has 2 values, and I generate 12 probabilities derived from those 2 values
    transformed_output = output.copy()
    new_shape = ( output.shape[ 0 ], 12 )
    transformed_output.resize( new_shape )

    for x in range( 0, len( output ) ):
        #First 6 probabilities model the value of output[ x ][ 0 ]
        transformed_output[ x ][ 0 ] = 1 if output[ x ][ 0 ] <= -5.0 else 0
        transformed_output[ x ][ 1 ] = 1 if output[ x ][ 0 ] <= -3.0 else 0
        transformed_output[ x ][ 2 ] = 1 if output[ x ][ 0 ] <= -1.0 else 0
        transformed_output[ x ][ 3 ] = 1 if output[ x ][ 0 ] >= 1.0 else 0
        transformed_output[ x ][ 4 ] = 1 if output[ x ][ 0 ] >= 3.0 else 0
        transformed_output[ x ][ 5 ] = 1 if output[ x ][ 0 ] >= 5.0 else 0
        #Second 6 probabilities model the value of output[ x ][ 1 ]
        transformed_output[ x ][ 6 ] = 1 if output[ x ][ 1 ] <= -5.0 else 0
        transformed_output[ x ][ 7 ] = 1 if output[ x ][ 1 ] <= -3.0 else 0
        transformed_output[ x ][ 8 ] = 1 if output[ x ][ 1 ] <= -1.0 else 0
        transformed_output[ x ][ 9 ] = 1 if output[ x ][ 1 ] >= 1.0 else 0
        transformed_output[ x ][ 10] = 1 if output[ x ][ 1 ] >= 3.0 else 0
        transformed_output[ x ][ 11] = 1 if output[ x ][ 1 ] >= 5.0 else 0
    return input, transformed_output


input, output = generate_data_from_files( file1, file2 )
model.fit( x=input, y=output, batch_size=8, epochs=1 )
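The per-row loop above can also be expressed with NumPy broadcasting, which makes it easier to check how often each label is positive. This is only a sketch: the function name `build_labels` and the two sample rows are hypothetical, and the thresholds are copied from the loop in the question.

```python
import numpy as np

# Thresholds taken from the loop above: three "<=" tests, then three ">=" tests.
LOW_THRESHOLDS = np.array([-5.0, -3.0, -1.0])
HIGH_THRESHOLDS = np.array([1.0, 3.0, 5.0])


def build_labels(raw):
    """Turn an (n, 2) array of raw values into an (n, 12) array of 0/1 labels."""
    raw = np.asarray(raw, dtype=np.float64)
    columns = []
    for value in (raw[:, 0], raw[:, 1]):
        # Broadcasting compares every sample against every threshold at once.
        columns.append(value[:, None] <= LOW_THRESHOLDS)
        columns.append(value[:, None] >= HIGH_THRESHOLDS)
    return np.concatenate(columns, axis=1).astype(np.float32)


# Hypothetical sample rows, only to show the shape of the result.
example = np.array([[-4.0, 0.0], [6.0, -1.0]])
labels = build_labels(example)
print(labels)
print(labels.mean(axis=0))  # fraction of positives per label column
```

Printing the per-column mean on the real training labels would show how unbalanced each of the 12 targets is, which is relevant when a network appears to predict only the most common pattern.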

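I expect to get 12 values between 0 and 1, each modeling a probability. However, when I use the network to predict (even on the training data), I always get the same output: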

0 1 1 0 0 0 0 0 0 0 0 0

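This is a reasonable average guess, because the 2nd and 3rd booleans are usually true and all the others are usually false, but I never see any variation in this prediction, even on training data where the expected output is something else. I occasionally see 0.9999999 or 0.000001 in place of a 0 or 1, but even that is rare.

My takeaway is that I have set up my model to always predict the average case. Any feedback or advice would be much appreciated. Thanks in advance!

Edit: Thanks for the suggestions, everyone. After reading more about this, I think what is happening is that my output layer is becoming saturated. I'm switching to softsign instead of sigmoid (and adjusting the logic to expect -1 as the floor instead of 0) in the hope that this helps.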
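The saturation suspected in the edit can be illustrated numerically. The sketch below uses made-up pre-activation values (logits) for a single output unit; it is not a diagnosis of the network in the question. It compares the gradient with respect to the logit for squared error and for binary cross-entropy, up to constant factors, when the sigmoid output is confidently wrong.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical pre-activations (logits) for one output unit.
z = np.array([0.0, 2.0, 10.0, 20.0], dtype=np.float64)
p = sigmoid(z)
y = 0.0  # suppose the true label is 0, so a large z is a confident mistake

# Gradient of each loss with respect to the logit z:
#   squared error (p - y)^2   ->  2 * (p - y) * p * (1 - p)
#   binary cross-entropy      ->  p - y
grad_mse = 2.0 * (p - y) * p * (1.0 - p)
grad_bce = p - y

for zi, pi, gm, gb in zip(z, p, grad_mse, grad_bce):
    print(f"z={zi:5.1f}  p={pi:.9f}  dMSE/dz={gm:.2e}  dBCE/dz={gb:.2e}")
```

With squared error the factor p * (1 - p) shrinks toward zero as the output saturates, so a saturated unit receives almost no corrective signal. With cross-entropy that factor cancels and the gradient stays close to the size of the error.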

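【Comments】:

I think it would be better if you provided the code you use to print this. In my experience, though, this is either because you are predicting on your training data or because of a preprocessing mistake.

Tags: tensorflow keras tf.keras

【Solution 1】:

You are using a sigmoid activation function for the output layer.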

model.add( Dense( 12, activation='sigmoid') )

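Sigmoid outputs either 0 or 1. I think what you are looking for is the softmax activation function, which outputs values between 0 and 1, with all (12) values adding up to 1. You would then take the argmax to find the highest value and use it as your prediction.

Two other things: why are you using two activation functions in the hidden layers? Use one or the other, not both.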

model.add( Dense( 50, activation='relu') )
model.add( LeakyReLU(alpha=.01) )

Mean squared error is meant for regression problems, and from your description this looks like a classification problem.
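Putting these two points together with the asker's requirement of non-exclusive outputs, one possible arrangement is sketched below. It is an assumption rather than something stated in the thread: it keeps the sigmoid output layer, uses a single activation per hidden layer, and swaps in binary cross-entropy, a common loss for independent yes/no labels. The random arrays are stand-ins that only show the shapes lining up.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(9600,)),
    # One activation per hidden layer (ReLU here; LeakyReLU alone would also work).
    keras.layers.Dense(150, activation="relu"),
    keras.layers.Dense(50, activation="relu"),
    # Sigmoid keeps the 12 outputs independent, each between 0 and 1.
    keras.layers.Dense(12, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["binary_accuracy"])

# Random stand-in data, not the asker's files.
rng = np.random.default_rng(0)
x = rng.random((16, 9600)).astype("float32")
y = rng.integers(0, 2, size=(16, 12)).astype("float32")

model.fit(x, y, batch_size=8, epochs=1, verbose=0)
predictions = model.predict(x, verbose=0)
print(predictions.shape)
```

Whether this resolves the constant predictions depends on the data. Input scaling and label imbalance are worth checking separately.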

【Discussion】:

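Thanks for the feedback, especially about the doubled-up activation layers. However, I do not want all of my probabilities to add up to 1 (that is what I was trying to convey with "non-exclusive probabilities", though I can see how that is ambiguous). Can sigmoid not output, say, 0.5? If not, is there something like softmax that does not force the sum to be 1?
The statement "Sigmoid outputs either 0 or 1" is not quite right; Keras's sigmoid actually produces floating-point values between 0 and 1. As for @Mack's question: yes, sigmoid outputs 0.5 when its input is 0.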
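The difference discussed in these comments can be checked with a few lines of NumPy. This is a minimal sketch using hypothetical logits. Sigmoid is applied to each value independently, while softmax normalizes across all of them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    shifted = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()


logits = np.array([0.0, 2.0, -1.0, 2.0])  # hypothetical pre-activations
independent = sigmoid(logits)
exclusive = softmax(logits)

print(independent, independent.sum())  # each value in (0, 1); sum is unconstrained
print(exclusive, exclusive.sum())      # values compete with each other; sum is 1
```

A logit of 0 maps to exactly 0.5 under sigmoid, and two labels can both receive a high probability at once, which is the "non-exclusive" behavior the asker describes.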