Backpropagation intuition


A simple 2-layer shallow neural network: the first layer's activation function is $\tanh(z)$, and the second layer's is $\mathrm{sigmoid}(z)$. The network architecture is shown below:
(figure: two-layer network architecture)
Represented as a computational graph:

(figure: computational graph of the network)
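The forward pass through this graph can be sketched in numpy. This is a minimal illustration, assuming the layer sizes used throughout these notes (3 inputs, 4 hidden units, 1 output); the variable names `W1`, `b1`, etc. are illustrative, and the random initialization is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes follow the equations below: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3)) * 0.01   # layer-1 weights, (4, 3)
b1 = np.zeros((4, 1))                     # layer-1 bias,    (4, 1)
W2 = rng.standard_normal((1, 4)) * 0.01   # layer-2 weights, (1, 4)
b2 = np.zeros((1, 1))                     # layer-2 bias,    (1, 1)

x = rng.standard_normal((3, 1))           # one training instance

z1 = W1 @ x + b1        # (4, 1)
a1 = np.tanh(z1)        # layer 1: tanh activation
z2 = W2 @ a1 + b2       # (1, 1)
a2 = sigmoid(z2)        # layer 2: sigmoid activation (the prediction)
```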

In the formulas below, $\log a^{[2]}$ means $\ln a^{[2]}$; symbols such as $da^{[2]}$ and $dz^{[2]}$ denote the derivative of the loss with respect to the bracketed quantity. These formulas are for a single instance and have not yet been vectorized.

$$\tag{1.1} \mathcal{L}(a^{[2]},y) = -y\log a^{[2]} - \left(1-y\right)\log\left(1-a^{[2]}\right)$$

$$\tag{1.2} \underset{1\times 1}{da^{[2]}} = \frac{d}{da^{[2]}}\mathcal{L}(a^{[2]},y) = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}$$

$$\tag{1.3} g(z^{[2]}) = \mathrm{sigmoid}(z^{[2]}) = a^{[2]}$$

$$\tag{1.4} \underset{1\times 1}{dz^{[2]}} = \frac{d\mathcal{L}}{da^{[2]}}\cdot\frac{da^{[2]}}{dz^{[2]}} = da^{[2]}\,g'(z^{[2]}) = \left(-\frac{y}{a^{[2]}}+\frac{1-y}{1-a^{[2]}}\right)g(z^{[2]})\left(1-g(z^{[2]})\right) = \left(-\frac{y}{a^{[2]}}+\frac{1-y}{1-a^{[2]}}\right)a^{[2]}\left(1-a^{[2]}\right) = a^{[2]}-y$$
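The cancellation at the end of (1.4) is easy to verify numerically. A standalone sketch (the values of `z2` and `y` are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z2, y = 0.7, 1.0
a2 = sigmoid(z2)
da2 = -y / a2 + (1 - y) / (1 - a2)   # equation (1.2)
dz2 = da2 * a2 * (1 - a2)            # chain rule, using sigmoid'(z) = a(1-a)
assert np.isclose(dz2, a2 - y)       # agrees with the simplified form in (1.4)
```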

$$\tag{1.5} \underset{1\times 4}{dW^{[2]}} = \frac{d\mathcal{L}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{dW^{[2]}} = \underset{1\times 1}{dz^{[2]}}\,\left(\underset{4\times 1}{a^{[1]}}\right)^T$$

$$\tag{1.6} \underset{1\times 1}{db^{[2]}} = \frac{d\mathcal{L}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{db^{[2]}} = \underset{1\times 1}{dz^{[2]}}$$

$$\tag{1.7} \underset{4\times 1}{da^{[1]}} = \frac{d\mathcal{L}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{da^{[1]}} = \left(\underset{1\times 4}{W^{[2]}}\right)^T \underset{1\times 1}{dz^{[2]}}$$

$$\tag{1.8} g(z^{[1]}) = \tanh(z^{[1]}) = a^{[1]}$$

$$\tag{1.9} \underset{4\times 1}{dz^{[1]}} = da^{[1]} * g'(z^{[1]}) = \left(\underset{1\times 4}{W^{[2]}}\right)^T \underset{1\times 1}{dz^{[2]}} * \underset{4\times 1}{g'(z^{[1]})}$$

(where $*$ denotes the element-wise product)

$$\tag{1.10} \underset{4\times 3}{dW^{[1]}} = \underset{4\times 1}{dz^{[1]}}\,x^T = \underset{4\times 1}{dz^{[1]}}\left(\underset{3\times 1}{a^{[0]}}\right)^T$$

$$\tag{1.11} \underset{4\times 1}{db^{[1]}} = \frac{d\mathcal{L}}{dz^{[1]}}\cdot\frac{dz^{[1]}}{db^{[1]}} = \underset{4\times 1}{dz^{[1]}}$$
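Equations (1.4)–(1.11) translate directly into a single-instance backward pass. A minimal numpy sketch, assuming the same 3-4-1 layer sizes; the final lines check one entry of $dW^{[1]}$ against a central-difference numerical gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1; a1 = np.tanh(z1)       # layer 1: tanh
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)      # layer 2: sigmoid
    return a1, a2

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3)); b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4)); b2 = rng.standard_normal((1, 1))
x  = rng.standard_normal((3, 1)); y = 1.0    # one instance, one label

a1, a2 = forward(x, W1, b1, W2, b2)

# Backward pass, equations (1.4)-(1.11)
dz2 = a2 - y                             # (1.4)
dW2 = dz2 @ a1.T                         # (1.5), shape (1, 4)
db2 = dz2                                # (1.6)
dz1 = (W2.T @ dz2) * (1 - a1 ** 2)       # (1.9); tanh'(z) = 1 - tanh(z)^2
dW1 = dz1 @ x.T                          # (1.10), shape (4, 3)
db1 = dz1                                # (1.11)

# Central-difference check of one entry of dW1
loss = lambda a: -y * np.log(a) - (1 - y) * np.log(1 - a)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
num = (loss(forward(x, W1p, b1, W2, b2)[1])
       - loss(forward(x, W1m, b1, W2, b2)[1])) / (2 * eps)
assert np.isclose(num.item(), dW1[0, 0], atol=1e-4)
```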

Below are the vectorized backpropagation formulas:

$$\tag{2.1} \mathcal{L}(A^{[2]},Y) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log A^{[2](i)} + \left(1-y^{(i)}\right)\log\left(1-A^{[2](i)}\right)\right]$$

$$\tag{2.2} \underset{1\times m}{dA^{[2]}} = \left[-\frac{y^{(1)}}{A^{[2](1)}}+\frac{1-y^{(1)}}{1-A^{[2](1)}},\ \dots,\ -\frac{y^{(m)}}{A^{[2](m)}}+\frac{1-y^{(m)}}{1-A^{[2](m)}}\right]$$

$$\tag{2.3} \underset{1\times m}{dZ^{[2]}} = dA^{[2]} * \left[A^{[2](1)}\left(1-A^{[2](1)}\right),\ \dots,\ A^{[2](m)}\left(1-A^{[2](m)}\right)\right] = \left[A^{[2](1)}-y^{(1)},\ \dots,\ A^{[2](m)}-y^{(m)}\right] = A^{[2]}-Y$$

$$\tag{2.4} \underset{1\times 4}{dW^{[2]}} = \frac{1}{m}\,\underset{1\times m}{dZ^{[2]}}\left(\underset{4\times m}{A^{[1]}}\right)^T$$

$$\tag{2.5} \underset{1\times 1}{db^{[2]}} = \frac{1}{m}\,\texttt{np.sum}(dZ^{[2]},\ \texttt{axis=1},\ \texttt{keepdims=True})$$

$$\underset{4\times m}{dZ^{[1]}} = \left(\underset{1\times 4}{W^{[2]}}\right)^T \underset{1\times m}{dZ^{[2]}} * \underset{4\times m}{g^{[1]\prime}(Z^{[1]})}$$

$$\tag{2.6} \underset{4\times 3}{dW^{[1]}} = \frac{1}{m}\,\underset{4\times m}{dZ^{[1]}}\left(\underset{3\times m}{A^{[0]}}\right)^T$$

$$\tag{2.7} \underset{4\times 1}{db^{[1]}} = \frac{1}{m}\,\texttt{np.sum}(\underset{4\times m}{dZ^{[1]}},\ \texttt{axis=1},\ \texttt{keepdims=True})$$
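The vectorized formulas above stack the $m$ instances as columns, so the whole backward pass becomes a handful of matrix products. A minimal numpy sketch, assuming the same 3-4-1 layer sizes and a small illustrative batch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = 5                                        # batch of m instances, as columns
rng = np.random.default_rng(2)
W1 = rng.standard_normal((4, 3)); b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)); b2 = np.zeros((1, 1))
A0 = rng.standard_normal((3, m))             # A[0] = X, one instance per column
Y  = rng.integers(0, 2, size=(1, m)).astype(float)

# Vectorized forward pass
Z1 = W1 @ A0 + b1; A1 = np.tanh(Z1)          # (4, m)
Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)          # (1, m)

# Vectorized backward pass, equations (2.3)-(2.7)
dZ2 = A2 - Y                                    # (2.3), (1, m)
dW2 = dZ2 @ A1.T / m                            # (2.4), (1, 4)
db2 = np.sum(dZ2, axis=1, keepdims=True) / m    # (2.5), (1, 1)
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)              # (4, m); tanh'(Z) = 1 - A1^2
dW1 = dZ1 @ A0.T / m                            # (2.6), (4, 3)
db1 = np.sum(dZ1, axis=1, keepdims=True) / m    # (2.7), (4, 1)
```

Note the `keepdims=True`: it keeps `db1` and `db2` as column vectors so they broadcast correctly against `Z1` and `Z2` in the next forward pass.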

Summary

(figure: backpropagation summary)
