LayerNorm inside nn.Sequential in torch

【Question title】: LayerNorm inside nn.Sequential in torch
【Posted】: 2021-01-02 23:28:27
【Question description】:

I am trying to use LayerNorm inside nn.Sequential in torch. This is what I am looking for:

import torch.nn as nn

class LayerNormCnn(nn.Module):
    def __init__(self):
        super(LayerNormCnn, self).__init__()
        self.net = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
                nn.LayerNorm(),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.LayerNorm(),
                nn.ReLU(),
            )

    def forward(self, x):
        x = self.net(x)
        return x

Unfortunately, this does not work, because LayerNorm requires normalized_shape as an argument. The code above throws the following exception:

    nn.LayerNorm(),
TypeError: __init__() missing 1 required positional argument: 'normalized_shape'

For now, this is how I have implemented it:

import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormCnn(nn.Module):
    def __init__(self, state_shape):
        super(LayerNormCnn, self).__init__()
        self.conv1 = nn.Conv2d(state_shape[0], 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)

        # compute shape by doing a forward pass
        with torch.no_grad():
            fake_input = torch.randn(1, *state_shape)
            out        = self.conv1(fake_input)
            bn1_size   = out.size()[1:]
            out        = self.conv2(out)
            bn2_size   = out.size()[1:]

        self.bn1 = nn.LayerNorm(bn1_size)
        self.bn2 = nn.LayerNorm(bn2_size)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x

if __name__ == '__main__':
    in_shape   = (3, 128, 128)
    batch_size = 32

    model = LayerNormCnn(in_shape)
    x = torch.randn((batch_size,) + in_shape)
    out = model(x)
    print(out.shape)

Is it possible to use LayerNorm inside nn.Sequential?

【Discussion】:

Tags: python neural-network pytorch

【Solution 1】:

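The original layer normalization paper advised against using layer normalization in CNNs, because receptive fields around the image boundary will have different values from receptive fields inside the actual image content. This issue does not arise with RNNs, which is what layer norm was originally tested on. Are you sure you want to use LayerNorm? If you want to compare a different normalization technique against BatchNorm, consider GroupNorm. It drops the LayerNorm assumption that all channels in a layer contribute equally to a prediction, which is problematic particularly when the layer is convolutional. Instead, the channels are divided into groups and each group is normalized separately, which still allows a GN layer to learn different statistics across channels.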
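
As a rough sketch of the GroupNorm suggestion, here is the conv stack from the question with nn.GroupNorm placed directly inside nn.Sequential. The class name GroupNormCnn and the choice of num_groups=8 are assumptions made for illustration and do not come from the question; the only hard constraint is that each channel count must be divisible by the number of groups.

```python
import torch
import torch.nn as nn


class GroupNormCnn(nn.Module):
    """Conv stack from the question, with GroupNorm in place of LayerNorm.

    The class name and the default num_groups=8 are illustrative assumptions.
    """

    def __init__(self, in_channels=3, num_groups=8):
        super().__init__()
        # nn.GroupNorm(num_groups, num_channels) only needs the channel count,
        # so no dummy forward pass is required to discover spatial sizes.
        # num_channels must be divisible by num_groups (32 and 64 both are).
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(num_groups, 32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.GroupNorm(num_groups, 64),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


if __name__ == "__main__":
    model = GroupNormCnn()
    x = torch.randn(4, 3, 128, 128)
    print(model(x).shape)  # torch.Size([4, 64, 32, 32])
```

If you still want LayerNorm-like behaviour, nn.GroupNorm(1, num_channels) normalizes each sample over all channels and spatial positions, much like nn.LayerNorm over (C, H, W). The difference is that its learnable scale and bias are per channel rather than per element, so it does not depend on the input resolution.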