Chapter 2: How the backpropagation algorithm works
Key terms: backpropagation (BP), computational graph, chain rule.
Computational Graph
In gradient descent (GD) algorithms, we modify weights and biases by stepping against the gradient of the cost $C$:

$$w \to w' = w - \eta \frac{\partial C}{\partial w}, \qquad b \to b' = b - \eta \frac{\partial C}{\partial b}$$

where $\eta$ is the learning rate. Backpropagation is the algorithm for computing these partial derivatives efficiently.
An example of using a computational graph to compute partial derivatives:
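As a minimal sketch (the concrete graph here is my own choice, not from the original notes), consider the graph $f = (a + b)\,c$. Each node is evaluated forwards, then derivatives are propagated backwards through the graph by the chain rule:

```python
# Reverse-mode differentiation on the tiny computational graph f = (a + b) * c.
a, b, c = 2.0, 3.0, 4.0

# Forward pass: evaluate each node.
u = a + b          # intermediate node u = a + b
f = u * c          # output node f = u * c

# Backward pass: propagate df/d(node) from the output back to the inputs,
# applying the chain rule at each edge.
df_df = 1.0
df_du = df_df * c          # d(u*c)/du = c
df_dc = df_df * u          # d(u*c)/dc = u
df_da = df_du * 1.0        # d(a+b)/da = 1
df_db = df_du * 1.0        # d(a+b)/db = 1

print(df_da, df_db, df_dc)  # 4.0 4.0 5.0
```

The key point: each node's derivative is computed once and reused by everything upstream of it, which is exactly the saving backpropagation exploits.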
Notations
- Elementwise application of functions: $\sigma(v)$ denotes applying $\sigma$ to each component, i.e. $\sigma(v)_j = \sigma(v_j)$.
- Elementwise (Hadamard) product of two vectors of the same shape: $(s \odot t)_j = s_j t_j$.
- $w^l_{jk}$: the weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$.
- $b^l_j$: the bias of neuron $j$ in layer $l$.
- $a^l_j$: the activation of neuron $j$ in layer $l$, $a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)$; in matrix form $a^l = \sigma(z^l)$ with weighted input $z^l = w^l a^{l-1} + b^l$.
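This notation maps directly onto NumPy. A minimal sketch (the 2-3-1 layer sizes and random weights are assumptions for illustration, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # elementwise application of the sigmoid function
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# w[l] has shape (neurons in layer l, neurons in layer l-1),
# so w[l][j, k] is the weight from neuron k in layer l-1 to neuron j in layer l.
w = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
b = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]

a = rng.standard_normal((2, 1))      # input activation a^0
for wl, bl in zip(w, b):
    z = wl @ a + bl                  # weighted input z^l = w^l a^{l-1} + b^l
    a = sigmoid(z)                   # activation a^l = sigma(z^l)

# Hadamard (elementwise) product of two same-shape vectors:
s, t = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(s * t)                         # in NumPy, * is already elementwise
```

Note that `@` is the matrix product and `*` the Hadamard product; mixing the two up is a common source of shape bugs in backprop code.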
Backpropagation
If we computed each partial derivative by a separate forward traversal, we would revisit the same neurons many times. Backpropagation instead works backwards from the output, reusing intermediate results in the style of dynamic programming.
Try to build an intuition with the following equations.
Calculate $\delta^l$

Define the error of neuron $j$ in layer $l$ as $\delta^l_j = \partial C / \partial z^l_j$.
1) For the output layer $L$

Apply the chain rule:

$$\delta^L_j = \frac{\partial C}{\partial z^L_j} = \sum_k \frac{\partial C}{\partial a^L_k} \frac{\partial a^L_k}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j)$$

in shorthand:

$$\delta^L = \nabla_a C \odot \sigma'(z^L)$$
2) For a layer $l$ before the output layer

According to the chain rule, $\delta^l$ can be expressed through $\delta^{l+1}$:

$$\delta^l_j = \sum_k \frac{\partial C}{\partial z^{l+1}_k}\,\frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k w^{l+1}_{kj}\,\delta^{l+1}_k\,\sigma'(z^l_j)$$

in shorthand:

$$\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$$
Calculate biases and weights
1) Biases

$$\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j}\,\frac{\partial z^l_j}{\partial b^l_j} = \delta^l_j$$

in shorthand:

$$\frac{\partial C}{\partial b^l} = \delta^l$$
2) Weights

$$\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial z^l_j}\,\frac{\partial z^l_j}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j$$
Depicted:
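The four equations above can be sketched end to end in NumPy. This is a minimal illustration, not the book's reference implementation: the 2-3-1 architecture and the quadratic cost $C = \tfrac{1}{2}\lVert a^L - y\rVert^2$ (for which $\nabla_a C = a^L - y$) are assumptions made here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
w = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
b = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]
x = rng.standard_normal((2, 1))
y = np.array([[1.0]])

# Forward pass: store every z^l and a^l for reuse in the backward pass.
a, zs, activations = x, [], [x]
for wl, bl in zip(w, b):
    z = wl @ a + bl
    zs.append(z)
    a = sigmoid(z)
    activations.append(a)

# Backward pass: the four equations in order.
grad_w = [np.zeros_like(wl) for wl in w]
grad_b = [np.zeros_like(bl) for bl in b]
delta = (activations[-1] - y) * sigmoid_prime(zs[-1])    # delta^L = grad_a C ⊙ σ'(z^L)
grad_b[-1] = delta                                       # ∂C/∂b^L = δ^L
grad_w[-1] = delta @ activations[-2].T                   # ∂C/∂w^L_{jk} = a^{L-1}_k δ^L_j
for l in range(len(w) - 2, -1, -1):
    delta = (w[l + 1].T @ delta) * sigmoid_prime(zs[l])  # δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l)
    grad_b[l] = delta
    grad_w[l] = delta @ activations[l].T

# Sanity check: compare one weight gradient against a finite difference.
def cost(w0_00):
    w0 = w[0].copy(); w0[0, 0] = w0_00
    a = x
    for wl, bl in zip([w0, w[1]], b):
        a = sigmoid(wl @ a + bl)
    return 0.5 * float(np.sum((a - y) ** 2))

eps = 1e-6
numeric = (cost(w[0][0, 0] + eps) - cost(w[0][0, 0] - eps)) / (2 * eps)
print(abs(numeric - grad_w[0][0, 0]) < 1e-6)  # True
```

The finite-difference check at the end is a standard way to validate a hand-written backward pass: perturb one parameter, recompute the cost, and compare the numerical slope to the analytic gradient.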