Batch normalization seems to not work same in keras and pytorch (answers)

【Question title】: Batch normalization seems to not work same in keras and pytorch
【Posted】: 2019-09-06 08:43:13
【Question description】:

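I have a simple model and am trying to understand how batch normalization behaves when applied after a linear layer. In Keras it does not seem to normalize at all, because by default it is initialized to the identity. After copying the same weights into PyTorch, its batch normalization does change the values. Please see below. Why is that, and what is wrong with the model?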

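Edit: a self-contained example that prints the results from the Keras and PyTorch models for visual comparison. To use the batch normalization layer, uncomment the few lines marked for it, then compare the results again.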

import tensorflow as tf
import numpy as np
from collections import OrderedDict

from tensorflow.python.keras import layers 
from tensorflow.python.keras import models

import torch
from torch import nn
from torch.nn import functional as F


from tensorflow.contrib import eager as tfe
tfe.enable_eager_execution()


class PytorchModel(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels):
        super().__init__()

        self.linear = nn.Linear(in_channels, out_channels, bias=True)
        self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)

    def forward(self, inputs):
        x = self.linear(inputs)
        ## uncomment for batch normalization
        # x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2, 1).contiguous()
        x = F.relu(x)
        return x

class KerasModel(models.Model):
    def __init__(self,
                 num_filters):
        super(KerasModel, self).__init__()

        my_layers = []
        BN = layers.BatchNormalization(name='my_bn', momentum=0.01, epsilon=1e-3)
        LIN = layers.Dense(num_filters, name='my_linear', activation=None, use_bias=True)
        my_layers.append([LIN, BN])
        self.my_layers = my_layers

    def call(self, ins):
        x = self.my_layers[0][0](tf.convert_to_tensor(ins))
        ## uncomment for batch normalization
        # x = self.my_layers[0][1](x)
        x = tf.nn.relu(x)
        return x

if __name__ == '__main__':

    # create dummy input
    np.random.seed(0)
    input_np = np.random.rand(4,5,6)
    filters = 8

    keras_l = KerasModel(num_filters=filters)

    tf_features = keras_l(tf.convert_to_tensor(input_np))

    pytorch_l = PytorchModel(in_channels=6,
                             out_channels=filters)

    # copy weights from keras model to pytorch model
    new_state_dict = OrderedDict()
    new_state_dict['linear.weight'] = torch.from_numpy(np.transpose(keras_l.layers[0].weights[0].numpy(), (1, 0)))
    new_state_dict['linear.bias'] = torch.from_numpy(keras_l.layers[0].bias.numpy())
    ## uncomment for batch normalization
    # new_state_dict['norm.weight'] = torch.from_numpy(keras_l.layers[1].weights[0].numpy())  # gamma
    # new_state_dict['norm.bias'] = torch.from_numpy(keras_l.layers[1].weights[1].numpy())  # beta
    # new_state_dict['norm.running_mean'] = torch.from_numpy(keras_l.layers[1].weights[2].numpy())
    # new_state_dict['norm.running_var'] = torch.from_numpy(keras_l.layers[1].weights[3].numpy())
    pytorch_l.load_state_dict(new_state_dict, strict=False)

    batch_input_voxels_np = torch.from_numpy(input_np).float()
    batch_pytorch_features = pytorch_l.forward(batch_input_voxels_np)

    # => check how results differ, when batch normalization is applied.
    print(tf_features[0, 0, :])
    print(batch_pytorch_features[0, 0, :])

【Comments on the question】:

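How did you determine that normalization is not working, and why did you set the BN momentum to 0.01? You did not do that in PyTorch, and it completely changes the estimates of the running mean and variance.
@MatiasValdenegro Corrected the code. I did set the same values for both BN versions. I determined it by the fact that x == x_bn_k (no matter which axis I try to apply BN on), whereas x != x_bn_py.
Setting the momentum that low is still a problem; why are you doing that? And when I ask how you compared, you should also include the corresponding code.
@MatiasValdenegro, for now I compare with simple print statements, since I expect BN to do the same thing in Keras and PyTorch. Added the results. I saw such low momentum values in some examples and adopted them. I can change it, but I guess it should not cause this kind of difference between the Keras and PyTorch results.
OK, you should make a self-contained code example, because if I run your Keras code I get an error like "Layer dense_1 was called with an input that isn't a symbolic tensor", which makes sense, since you cannot feed non-symbolic inputs to Keras layers. So I cannot reproduce your results. I also looked at the Keras BN implementation, and the default weight initialization makes it behave like an identity transformation, which explains your results.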
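
The last comment's point about the default initialization can be checked without either framework. The following is a minimal NumPy sketch, not code from the thread: `batch_norm` is a hypothetical helper, and the default values (scale 1, offset 0, moving mean 0, moving variance 1) are assumed to match what both libraries use for a freshly created layer. With those defaults, normalizing with the stored moving statistics is almost the identity, while normalizing with the statistics of the current batch changes the values noticeably.

```python
import numpy as np


def batch_norm(x, gamma, beta, mean, var, eps=1e-3):
    """Apply the batch-norm formula per channel (last axis) with the given statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.random((4, 5, 8))  # (batch, steps, channels), shaped like the Dense output above

channels = x.shape[-1]
gamma, beta = np.ones(channels), np.zeros(channels)               # assumed default scale / offset
moving_mean, moving_var = np.zeros(channels), np.ones(channels)   # assumed default moving statistics

# Inference mode: stored moving statistics are used, so y = x / sqrt(1 + eps).
y_infer = batch_norm(x, gamma, beta, moving_mean, moving_var)

# Training mode: mean and variance of the current batch are used instead.
batch_mean = x.mean(axis=(0, 1))
batch_var = x.var(axis=(0, 1))
y_train = batch_norm(x, gamma, beta, batch_mean, batch_var)

print("max |y_infer - x|:", np.abs(y_infer - x).max())  # very small
print("max |y_train - x|:", np.abs(y_train - x).max())  # clearly non-zero
```

So an output that looks unchanged suggests the layer ran with its untouched moving statistics, whereas an output that differs suggests batch statistics were used.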

Tags: tensorflow keras pytorch batch-normalization

【Solution 1】:

Try this.

with torch.no_grad():
    pytorch_l.eval()
    batch_pytorch_features = pytorch_l(batch_input_voxels_np)

【Solution 2】:

Don't forget

pytorch_l.eval()

and

with torch.no_grad():
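
Why `eval()` matters can be seen on a standalone layer. The sketch below is an illustration, not the asker's model: the tensor `x` and the layer `bn` are made up for the demonstration, with `eps` and `momentum` copied from the question. A newly constructed PyTorch module is in training mode, so `BatchNorm1d` normalizes with the statistics of the current batch; after `eval()` it uses its stored running statistics, which for a fresh layer gives nearly the identity.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(4, 8, 5)  # (batch, channels, length), the layout BatchNorm1d expects

bn = nn.BatchNorm1d(8, eps=1e-3, momentum=0.01)
print("training by default:", bn.training)

# Evaluation mode: uses running_mean = 0 and running_var = 1 of the fresh layer.
bn.eval()
with torch.no_grad():  # no autograd bookkeeping needed for a plain comparison
    y_eval = bn(x)

# Training mode: uses the mean/variance of this batch and updates the running statistics.
bn.train()
y_train = bn(x)

print("max |y_eval - x|:", (y_eval - x).abs().max().item())    # very small
print("max |y_train - x|:", (y_train - x).abs().max().item())  # clearly non-zero

# PyTorch update rule: running = (1 - momentum) * running + momentum * batch_statistic
batch_mean = x.mean(dim=(0, 2))
print(torch.allclose(bn.running_mean, 0.01 * batch_mean))
```

One related caveat on the momentum discussed in the comments: as far as the documented update rules go, Keras weights the old moving average by `momentum`, while PyTorch weights the new batch statistic by `momentum`, so Keras `momentum=0.01` would correspond to PyTorch `momentum=0.99` rather than `0.01`. That only affects the running statistics after training steps, not the comparison of a single forward pass in evaluation mode.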
