Everyone is an adaptive machine.

Everyone is an adaptive machine.
Life experiences are the training data.
We should learn from them with several loss functions.
A good learning rate method emphasizes the directions rather than a higher learning rate.

“In practice networks that use Batch Normalization are significantly more robust to bad initialization. Additionally, batch normalization can be interpreted as doing preprocessing at every layer of the network, but integrated into the network itself in a differentiable manner. Neat!”