1. Model representation

1.1 Neural network model

A typical biological neuron has input wires, called dendrites, and an output wire, called an axon; the nucleus is considered the computational unit. We can simplify this into the following model: a unit that receives inputs $x_1, x_2, x_3$ and outputs $h_\Theta(x)$.

Terminology: this unit is a neuron, or artificial neuron, with a sigmoid (logistic) activation function.

1.2 Notation for neural networks

Consider a three-layer network:
layer 1: Input layer
layer 2: Hidden layer (every layer between the input and output layers is called a hidden layer)
layer 3: Output layer


  • $x$ is the vector of inputs and $\Theta$ the vector of parameters; $\Theta$ is also called the weights.
  • $\Theta^{(j)}$ = matrix of weights mapping from layer $j$ to layer $j+1$. If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$; a worked example follows this list.
  • $a_i^{(j)}$ = “activation” of unit $i$ in layer $j$.
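
For example, in the three-layer network above, layer 1 has $s_1 = 3$ input units and layer 2 has $s_2 = 3$ hidden units, so $\Theta^{(1)}$ has dimension $3 \times (3 + 1) = 3 \times 4$, matching the twelve weights $\Theta_{10}^{(1)}$ through $\Theta_{33}^{(1)}$ in the equations below.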

$$\begin{aligned} a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \\ a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \\ a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \\ h_\Theta(x) = a_1^{(3)} &= g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \end{aligned}$$

*$g$ is the sigmoid/logistic activation function.

1.3 Forward propagation in a neural network

The process of computing the activations, from the input layer to the hidden layer and then to the output layer, is called forward propagation.

Now we will vectorize the model. We define
$$\begin{aligned} z_1^{(2)} &= \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3 \\ z_2^{(2)} &= \Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3 \\ z_3^{(2)} &= \Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3 \end{aligned}$$
We can rewrite this as $z^{(2)} = [z_1^{(2)}\ z_2^{(2)}\ z_3^{(2)}]^T = \Theta^{(1)} x$. If we treat $x$ as $a^{(1)}$, then $z^{(2)} = \Theta^{(1)} a^{(1)}$.
That is, $z^{(j+1)} = \Theta^{(j)} a^{(j)}$.

And $a_1^{(2)} = g(z_1^{(2)})$, $a_2^{(2)} = g(z_2^{(2)})$, $a_3^{(2)} = g(z_3^{(2)})$ can be written as $a^{(2)} = g(z^{(2)})$.
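
To make the vectorized computation concrete, here is a minimal NumPy sketch of forward propagation through the three-layer network above. The weight values are random placeholders, since the notes do not specify trained weights:

```python
import numpy as np

def g(z):
    """Sigmoid/logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # Theta^(1): 3 hidden units x (3 inputs + bias)
Theta2 = rng.standard_normal((1, 4))  # Theta^(2): 1 output unit x (3 hidden units + bias)

x = np.array([0.5, -1.2, 3.0])  # an arbitrary example input

a1 = np.r_[1, x]      # a^(1): input with bias unit x0 = 1
z2 = Theta1 @ a1      # z^(2) = Theta^(1) a^(1)
a2 = np.r_[1, g(z2)]  # a^(2) = g(z^(2)), with bias unit a0^(2) = 1 prepended
z3 = Theta2 @ a2      # z^(3) = Theta^(2) a^(2)
h = g(z3)             # h_Theta(x) = a^(3)
print(h)
```

Note that the bias unit is prepended at each layer before multiplying by the next weight matrix, which is exactly why $\Theta^{(j)}$ has $s_j + 1$ columns.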

For the neural network model above, if we cover up the input layer, what remains looks just like logistic regression:
Logistic regression: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
The simplified neural network model: $h_\Theta(x) = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$

1.4 Other network architectures

A network is not limited to one hidden layer: architectures with several hidden layers work the same way, applying $z^{(j+1)} = \Theta^{(j)} a^{(j)}$ and $a^{(j+1)} = g(z^{(j+1)})$ layer by layer.

2. How to compute a complex nonlinear function?

Throughout this section the inputs are binary: $x_1, x_2 \in \{0, 1\}$.

2.1 AND

$y = x_1 \text{ AND } x_2$
$\Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \end{bmatrix}$
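
With these weights, $h_\Theta(x) = g(-30 + 20x_1 + 20x_2)$. Checking all four inputs confirms the AND behavior: $g(-30) \approx 0$ for $(0,0)$, $g(-10) \approx 0$ for $(0,1)$ and $(1,0)$, and $g(10) \approx 1$ for $(1,1)$.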

2.2 OR

$y = x_1 \text{ OR } x_2$
$\Theta^{(1)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}$
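
Here $h_\Theta(x) = g(-10 + 20x_1 + 20x_2)$: $g(-10) \approx 0$ for $(0,0)$, and at least $g(10) \approx 1$ whenever either input is 1.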

2.3 NOT

$y = \text{NOT } x_1$
$\Theta^{(1)} = \begin{bmatrix} 10 & -20 \end{bmatrix}$
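
Here $h_\Theta(x) = g(10 - 20x_1)$: $g(10) \approx 1$ for $x_1 = 0$ and $g(-10) \approx 0$ for $x_1 = 1$, so the unit negates its input.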

2.4 (NOT $x_1$) AND (NOT $x_2$)

$\Theta^{(1)} = \begin{bmatrix} 10 & -20 & -20 \end{bmatrix}$
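
Here $h_\Theta(x) = g(10 - 20x_1 - 20x_2)$, which is $g(10) \approx 1$ only for $(0,0)$ and at most $g(-10) \approx 0$ otherwise; this is the NOR function.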

2.5 XNOR

$y = (x_1 \text{ AND } x_2) \text{ OR } ((\text{NOT } x_1) \text{ AND } (\text{NOT } x_2))$
We can build XNOR with a hidden layer containing the AND unit and the (NOT $x_1$) AND (NOT $x_2$) unit from above, followed by an output unit computing OR.

* We are able to put these pieces together to build new, more complex functions, as the sketch below shows.
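
As a quick illustration (not part of the original notes), here is a minimal NumPy sketch that wires the AND, NOR, and OR weight vectors from above into the two-layer XNOR network:

```python
import numpy as np

def g(z):
    """Sigmoid/logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: row 1 is the AND unit, row 2 is the (NOT x1) AND (NOT x2) unit.
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Output layer: the OR unit combining the two hidden activations.
Theta2 = np.array([[-10.0, 20.0, 20.0]])

def xnor(x1, x2):
    a1 = np.r_[1.0, x1, x2]          # input layer with bias unit x0 = 1
    a2 = np.r_[1.0, g(Theta1 @ a1)]  # hidden layer with bias unit a0 = 1
    return g(Theta2 @ a2)[0]         # h_Theta(x)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
# Prints 1 for (0,0) and (1,1), 0 otherwise: the XNOR truth table.
```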


3. Multi-class Classification

To classify an input into one of four classes, the output layer has four units, and the target output $y_i$ will be one of
$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$
depending on which class the corresponding input $X_i$ belongs to. In this way, we can implement multi-class classification.
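
A minimal sketch (the activation values are hypothetical, not from the notes) of how this one-hot encoding is used at prediction time: the predicted class is the index of the largest output-layer activation.

```python
import numpy as np

# Hypothetical output-layer activations h_Theta(x) for one input;
# a trained network would produce these.
h = np.array([0.05, 0.88, 0.03, 0.10])

predicted = int(np.argmax(h))        # class with the largest activation
y = np.eye(4, dtype=int)[predicted]  # the matching one-hot vector
print(predicted, y)                  # 1 [0 1 0 0]
```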
