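The paper Bag of Tricks for Image Classification with Convolutional Neural Networks [1] notes that applying L2 regularization amounts to driving the weights toward 0. For CNNs, it recommends applying L2 (weight decay) only to the weights of convolutional and fully connected layers, and not to the biases. The parameters of Batch Normalization layers are also left unregularized.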
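To make the "driving the weights toward 0" point concrete, here is a minimal sketch in plain Python. It assumes vanilla SGD without momentum, where the L2 penalty adds `weight_decay * w` to the gradient; the helper `sgd_step` is hypothetical and only for illustration. With a zero loss gradient, each step multiplies the weight by `1 - lr * weight_decay`.

```python
def sgd_step(w, grad, lr=0.1, weight_decay=5e-4):
    """One plain SGD step; the L2 penalty contributes weight_decay * w to the gradient."""
    return w - lr * (grad + weight_decay * w)


w = 1.0
for _ in range(1000):
    # Zero loss gradient: the only force acting on w is the decay term.
    w = sgd_step(w, grad=0.0)

# Closed form for the same process: w0 * (1 - lr * weight_decay) ** steps
shrink = (1 - 0.1 * 5e-4) ** 1000
print(w, shrink)
```

The two printed values agree up to floating-point error, and both are below the starting value of 1.0: the weight shrinks geometrically even though the loss itself exerts no pull on it.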

In PyTorch, applying L2 (weight decay) only to the weights of convolutional and fully connected layers looks like this:

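import torch

# `model` is assumed to be an existing nn.Module whose BatchNorm layers have "bn" in their names.
# Decayed: conv / fully connected weights. Not decayed: all biases and BatchNorm parameters.
weight_decay_list = [param for name, param in model.named_parameters() if not name.endswith('bias') and "bn" not in name]
no_decay_list = [param for name, param in model.named_parameters() if name.endswith('bias') or "bn" in name]
parameters = [{'params': weight_decay_list},
              {'params': no_decay_list, 'weight_decay': 0.}]

# The optimizer-level weight_decay applies only to groups that do not override it.
optimizer = torch.optim.SGD(parameters, lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True)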
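
The name-based filter above only works if the BatchNorm layers happen to have "bn" in their parameter names. As a hedged alternative, the sketch below selects parameters by module type instead. The helper `split_decay_params` and the toy `model` are hypothetical and not from the paper; only standard `torch.nn` and `torch.optim` calls are used.

```python
import torch
import torch.nn as nn


def split_decay_params(model):
    """Return (decay, no_decay) parameter lists, chosen by module type and parameter name."""
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False lists only the parameters owned directly by this module.
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            if isinstance(module, norm_types) or name == "bias":
                no_decay.append(param)  # biases and BatchNorm gamma / beta
            else:
                decay.append(param)  # conv / fully connected weights
    return decay, no_decay


# Toy model, used only to exercise the helper.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)

decay, no_decay = split_decay_params(model)
optimizer = torch.optim.SGD(
    [{"params": decay}, {"params": no_decay, "weight_decay": 0.0}],
    lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True,
)
```

For this toy model, `decay` holds the Conv2d and Linear weights, while `no_decay` holds both biases plus the BatchNorm scale and shift. Other normalization layers (for example `nn.LayerNorm`) would need to be added to `norm_types` if the model uses them.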

References

[1] He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., & Li, M. (2019). Bag of Tricks for Image Classification with Convolutional Neural Networks. CVPR. https://dx.doi.org/10.1109/cvpr.2019.00065
