Everyone is an adaptive machine.
Life experiences are the training data.
We should learn from them with several loss functions.
A good learning-rate method emphasizes the direction of each update rather than simply a higher learning rate.
“In practice networks that use Batch Normalization are significantly more robust to bad initialization. Additionally, batch normalization can be interpreted as doing preprocessing at every layer of the network, but integrated into the network itself in a differentiable manner. Neat!”
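The "preprocessing at every layer" reading in the quote can be made concrete with a small sketch. The quote itself contains no code, so everything below is an illustrative assumption: the function name `batch_norm_forward`, the parameters `gamma`, `beta`, and `eps`, and the sample values are all hypothetical. It covers only the training-time forward pass for one layer and leaves out running statistics and the backward pass.

```python
import math


def batch_norm_forward(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    batch: list of samples, each a list of feature values.
    gamma, beta: per-feature scale and shift (learned in a real network;
                 fixed here for illustration).
    eps: small constant that keeps the division numerically stable.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased variance (divide by n) over the batch.
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) * inv_std + beta[j]
    return out


# Hypothetical activations with very different scales per feature.
batch = [[1000.0, -5.0], [2000.0, 0.0], [3000.0, 5.0]]
normalized = batch_norm_forward(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` to 0, each feature column of `normalized` has roughly zero mean and unit variance regardless of its original scale. Every step is ordinary arithmetic, so gradients can flow through it, which is what the quote means by "differentiable".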
