Common Mistakes in Training Neural Networks

This post is based on a tweet by Andrej Karpathy, a former PhD student of Fei-Fei Li.
[Screenshot of the tweet]

Each point is explained below. My own knowledge is limited, so some of the interpretations here may be wrong; corrections are welcome.

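1. you didn’t try to overfit a single batch first.
Start by training on a small dataset that fits in one batch and deliberately try to overfit it. If the model cannot drive the loss close to zero on a handful of samples, there is an obvious bug somewhere, and this check exposes it within minutes.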
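The check can be sketched as follows. This is a minimal illustration, not Karpathy's own code: the random data, the layer sizes, the learning rate, and the step count are all assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy batch: 8 random samples, 4 features, 3 classes.
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

initial_loss = loss_fn(model(x), y).item()

# Train on the same batch over and over; a healthy pipeline should memorize it.
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy on the batch {accuracy:.2f}")
```

If the loss stalls well above zero on such a tiny batch, the data pipeline, the loss, or the optimizer wiring is the first place to look.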

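2. you forgot to toggle train/eval mode for the net.
Forgetting to switch the network between training and evaluation mode.
This matters because some layers behave differently in the two modes. Dropout randomly zeroes activations during training and is disabled during evaluation. BatchNorm normalizes with the statistics of the current batch during training and with its running statistics during evaluation.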
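A small sketch of the effect, using Dropout only. The model and input sizes are arbitrary assumptions made for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(3, 4)

model.train()  # dropout is active: two forward passes use different random masks
train_a, train_b = model(x), model(x)

model.eval()   # dropout is disabled: forward passes are deterministic
with torch.no_grad():
    eval_a, eval_b = model(x), model(x)

train_outputs_differ = not torch.equal(train_a, train_b)
eval_outputs_match = torch.equal(eval_a, eval_b)
print(train_outputs_differ, eval_outputs_match)
```

In a real loop, `model.train()` goes before each training phase and `model.eval()` before validation or inference.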

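3. you forgot to .zero_grad() (in pytorch) before .backward().
Forgetting to call .zero_grad() before .backward() (this one is specific to PyTorch).
Others have run into this: PyTorch accumulates gradients across .backward() calls, so without .zero_grad() every update uses the sum of all earlier gradients and the results are very poor. This is worth checking together with point 1.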
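The accumulation is easy to see on a single tensor. The sketch below resets the gradient by hand with `w.grad.zero_()`; in a training loop `optimizer.zero_grad()` plays that role. The tensor `w` and the function being differentiated are made up for the example.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# The gradient of sum(3 * w) with respect to w is 3 for every element.
(3 * w).sum().backward()
grad_first = w.grad.clone()

# A second backward without zeroing adds to the stored gradient.
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# Reset the stored gradient, then backpropagate again.
w.grad.zero_()
(3 * w).sum().backward()
grad_after_zero = w.grad.clone()

print(grad_first, grad_accumulated, grad_after_zero)
```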

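4. you passed softmaxed outputs to a loss that expects raw logits.
Passing softmaxed outputs to a loss function that expects raw logits. Such a loss applies (log-)softmax internally, so softmax ends up being applied twice.
This one deserves attention; I had not given it much thought before.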
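As an illustration, PyTorch's `nn.CrossEntropyLoss` is one such loss: it expects logits and combines log-softmax with the negative log-likelihood. The logits and target below are made-up values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for one sample, 3 classes
target = torch.tensor([0])
loss_fn = nn.CrossEntropyLoss()

# Correct: pass the raw logits.
correct_loss = loss_fn(logits, target)

# Mistake: softmax is applied here and then again inside the loss.
wrong_loss = loss_fn(F.softmax(logits, dim=1), target)

# The correct value matches log_softmax followed by the negative log-likelihood.
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(correct_loss.item(), wrong_loss.item(), manual_loss.item())
```

The wrong version still runs and still produces a finite number, which is why the mistake is easy to miss.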

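5. you didn’t use bias=False for your Linear/Conv2d layer when using BatchNorm, or conversely forget to include it for the output layer. This one won’t make you silently fail, but they are spurious parameters.
Not setting bias=False on a Linear/Conv2d layer that is used with BatchNorm, or conversely forgetting to include the bias in the output layer. This will not make the model fail, but the extra biases are redundant parameters.
As others have summarized: when a linear or convolutional layer is followed by BatchNorm, it does not need a bias term and only needs the weight W, so set bias=False. BatchNorm subtracts the mean, which cancels any constant offset, and it has its own learnable shift.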
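The redundancy can be checked numerically. In the sketch below (layer sizes and the shift of 5.0 are arbitrary assumptions), changing the bias of a Linear layer leaves the output of a following BatchNorm in training mode essentially unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 4)

linear = nn.Linear(4, 8, bias=True)
bn = nn.BatchNorm1d(8)
bn.train()  # normalize with the statistics of the current batch

out_before = bn(linear(x))
with torch.no_grad():
    linear.bias.add_(5.0)  # shift every pre-normalization activation by a constant
out_after = bn(linear(x))

# The batch mean absorbs the shift, so the normalized output does not change.
bias_is_cancelled = torch.allclose(out_before, out_after, atol=1e-4)
print(bias_is_cancelled)

# One possible layout: no bias before BatchNorm, default bias kept on the output layer.
block = nn.Sequential(
    nn.Linear(4, 8, bias=False),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
```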

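6. thinking view() and permute() are the same thing (& incorrectly using view)
Assuming that view() and permute() do the same thing, and using view() incorrectly as a result. view() keeps the elements in their original memory order and only changes the shape, while permute() reorders the dimensions.
This one is best understood by trying it out.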
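A tiny experiment makes the difference visible. The 2x3 tensor is just an example; both results have shape (3, 2), but the elements land in different places.

```python
import torch

x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# view: same element order in memory, read back with a new shape.
viewed = x.view(3, 2)       # [[0, 1], [2, 3], [4, 5]]

# permute: the axes are swapped, which for a 2-D tensor is a transpose.
permuted = x.permute(1, 0)  # [[0, 3], [1, 4], [2, 5]]

print(viewed.tolist())
print(permuted.tolist())
```

Because the shapes agree, code that uses view() where a transpose was intended keeps running and silently mixes up rows and columns.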

References:
[1] https://twitter.com/karpathy
[2] https://zhuanlan.zhihu.com/p/38937612