Chapter 2: How the backpropagation algorithm works

backpropagation (反向传播): back propagation, or BP
computational graph (计算图)
chain rule (链式法则): $\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z}\frac{\partial z}{\partial x}$

Computational Graph

In gradient descent (GD) algorithms, we update each weight and bias by subtracting η times the corresponding partial derivative of the cost C with respect to it.
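The update rule can be sketched on a toy one-dimensional cost (the cost function and learning rate here are invented for illustration):

```python
# Gradient-descent sketch on a toy cost C(w) = (w - 3)^2,
# whose derivative dC/dw = 2*(w - 3) plays the role of the
# partial derivatives that backpropagation computes for real networks.
eta = 0.1  # learning rate

w = 0.0
for _ in range(100):
    grad = 2 * (w - 3)   # dC/dw
    w -= eta * grad      # w <- w - eta * dC/dw

print(round(w, 4))  # → 3.0, the minimum of C
```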
An example of using computational graphs to solve partial derivatives:
[Figure: a computational graph for computing partial derivatives, from Chapter 2 of Neural Networks and Deep Learning]

Notations

elementwise application of a function: $f(v)$
elementwise (Hadamard) product of two vectors of the same shape: $s \odot v$
a quantity belonging to the $l$th layer (mapping from the $(l-1)$th to the $l$th layer): $v^l$
weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$: $w^l_{jk}$
the weighted input: $z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$
the activation of the $j$th neuron in layer $l$: $a^l_j = \sigma(z^l_j)$
the error of neuron $j$ in layer $l$: $\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}$
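With this notation, a single forward step from layer $l-1$ to layer $l$ can be sketched as follows (the layer sizes and random values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 neurons in layer l-1, 2 neurons in layer l.
rng = np.random.default_rng(0)
a_prev = rng.standard_normal((3, 1))   # a^{l-1}, column vector
W = rng.standard_normal((2, 3))        # w^l, entry (j, k) = w^l_{jk}
b = rng.standard_normal((2, 1))        # b^l

z = W @ a_prev + b   # z^l_j = sum_k w^l_{jk} a^{l-1}_k + b^l_j
a = sigmoid(z)       # a^l = sigma(z^l), applied elementwise
print(a.shape)       # → (2, 1)
```

Storing $W$ with rows indexed by $j$ and columns by $k$ is what lets the sum over $k$ become a plain matrix-vector product.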

Back Propagation

If we compute the gradients by going through every neuron forwards, we may revisit some neurons many times. Backpropagation uses dynamic programming to save time.
Try to build an intuition with the following equations.

Calculate δl

1) For the output layer

Apply the chain rule:

$$\delta^L_j = \frac{\partial C}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j) \tag{1}$$

in shorthand: $\delta^L = \nabla_a C \odot \sigma'(z^L)$

2) For layer l before the output layer

According to the chain rule: $\delta^l_k = \sum_j \delta^{l+1}_j \frac{\partial z^{l+1}_j}{\partial z^l_k} = \sum_j \delta^{l+1}_j w^{l+1}_{jk}\,\sigma'(z^l_k)$, therefore:

$$\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l) \tag{2}$$
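Equations (1) and (2) together give the backward sweep over the errors $\delta$. A minimal sketch, with invented shapes and a quadratic cost (so $\nabla_a C = a^L - y$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
# Pretend a forward pass through a two-layer tail already stored
# the weighted inputs z^{L-1}, z^L and the output activation a^L.
z_prev = rng.standard_normal((2, 1))   # z^{L-1}
z_last = rng.standard_normal((2, 1))   # z^L
a_last = sigmoid(z_last)               # a^L
y = np.array([[0.0], [1.0]])           # target (invented)
W_last = rng.standard_normal((2, 2))   # w^L

# (1): delta^L = nabla_a C ⊙ sigma'(z^L), quadratic cost assumed
delta_last = (a_last - y) * sigmoid_prime(z_last)

# (2): delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
delta_prev = (W_last.T @ delta_last) * sigmoid_prime(z_prev)
print(delta_prev.shape)  # → (2, 1)
```

Note that the transpose $(w^{l+1})^T$ is exactly what routes each $\delta^{l+1}_j$ back along the weights $w^{l+1}_{jk}$ to neuron $k$.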

Calculate biases and weights

1) Biases

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j \tag{3}$$

in shorthand: $\frac{\partial C}{\partial b} = \delta$

2) Weights

$$\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j \tag{4}$$

Depicted:
[Figure: diagram of equation (4), from Chapter 2 of Neural Networks and Deep Learning]
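Once $\delta^l$ is known, equations (3) and (4) read the gradients off directly; a sketch with invented shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
delta = rng.standard_normal((2, 1))    # delta^l, 2 neurons in layer l
a_prev = rng.standard_normal((3, 1))   # a^{l-1}, 3 neurons in layer l-1

grad_b = delta              # (3): dC/db^l_j = delta^l_j
grad_W = delta @ a_prev.T   # (4): dC/dw^l_{jk} = a^{l-1}_k * delta^l_j
print(grad_W.shape)         # → (2, 3), same shape as w^l
```

The outer product `delta @ a_prev.T` produces all the $j,k$ entries of (4) at once.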

A Vanilla Implementation
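A minimal NumPy sketch of the whole algorithm, assuming a fully-connected network with sigmoid activations and the quadratic cost (the class name, layer sizes, and initialization are illustrative, not the book's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

class Network:
    """Tiny fully-connected network with quadratic cost."""

    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
        self.weights = [rng.standard_normal((n, m))
                        for m, n in zip(sizes[:-1], sizes[1:])]

    def backprop(self, x, y):
        """Return (grad_b, grad_w) for a single training example (x, y)."""
        # Forward pass: store every z^l and a^l for reuse.
        a, activations, zs = x, [x], []
        for b, W in zip(self.biases, self.weights):
            z = W @ a + b
            zs.append(z)
            a = sigmoid(z)
            activations.append(a)
        grad_b = [np.zeros_like(b) for b in self.biases]
        grad_w = [np.zeros_like(W) for W in self.weights]
        # (1): delta^L = (a^L - y) ⊙ sigma'(z^L)  (quadratic cost)
        delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        grad_b[-1] = delta                       # (3)
        grad_w[-1] = delta @ activations[-2].T   # (4)
        # Backward sweep: (2) propagates delta layer by layer.
        for l in range(2, len(self.weights) + 1):
            delta = (self.weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
            grad_b[-l] = delta
            grad_w[-l] = delta @ activations[-l - 1].T
        return grad_b, grad_w

net = Network([3, 4, 2])
x = np.ones((3, 1))
y = np.zeros((2, 1))
grad_b, grad_w = net.backprop(x, y)
print([g.shape for g in grad_w])  # → [(4, 3), (2, 4)]
```

Each gradient has the same shape as the parameter it belongs to, so a GD step is just `W -= eta * gW` and `b -= eta * gb` over the two lists.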
