Contents

1.  Cost Function and Back Propagation

1.1 Cost Function

1.2 Back propagation algorithm

1.3 Back propagation intuition

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

2.2 Gradient checking

2.3 Random Initialization

2.4 Putting it together (summary)

3. Mathematical derivation of the gradient


1.  Cost Function and Back Propagation

1.1 Cost Function

Neural network (classification)

L = total no. of layers in the network

s_l = no. of units (not counting the bias unit) in layer l

K = no. of output units

E.g.

L = 4

s_1 = 3, s_2 = s_3 = 5, s_4 = 4

[Figure 1: the example 4-layer network, with s_1 = 3, s_2 = s_3 = 5, s_4 = 4 (from Coursera week 5, Cost Function)]

Cost Function:

The network outputs h_Θ(x) ∈ R^K, where (h_Θ(x))_k denotes the k-th output. The cost function generalizes the regularized logistic regression cost to K output units:

J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 - y_k^(i)) log(1 - (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1..L-1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} (Θ_ji^(l))^2

Note: the weights on the bias units are not regularized (the inner regularization sum starts at i = 1, skipping i = 0).
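As a concrete illustration, here is a minimal Octave sketch of this cost, assuming forward propagation has already produced H (a K x m matrix of network outputs) and that Y is the labels recoded as a K x m 0/1 matrix; H, Y, m, lambda and the Theta matrices being in scope are assumptions, not course code:

% unregularized cross-entropy term, summed over all K outputs and m examples
J = -(1/m) * sum(sum(Y .* log(H) + (1 - Y) .* log(1 - H)));
% regularization term: all weights except the bias columns (column 1)
J = J + (lambda / (2*m)) * (sum(sum(Theta1(:, 2:end).^2)) + ...
    sum(sum(Theta2(:, 2:end).^2)) + sum(sum(Theta3(:, 2:end).^2)));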

1.2 Back propagation algorithm

Gradient computation:

Given J(Θ), we want to minimize it, which requires the partial derivatives ∂J(Θ)/∂Θ_ij^(l).

Need code to compute:

- J(Θ)

- ∂J(Θ)/∂Θ_ij^(l)

Given one training example (x, y) and the example network above (L = 4, s_1 = 3, s_2 = s_3 = 5, s_4 = 4):

Forward propagation:

a^(1) = x
z^(2) = Θ^(1) a^(1)
a^(2) = g(z^(2))  (add bias unit a_0^(2))
z^(3) = Θ^(2) a^(2)
a^(3) = g(z^(3))  (add bias unit a_0^(3))
z^(4) = Θ^(3) a^(3)
a^(4) = h_Θ(x) = g(z^(4))
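A minimal Octave sketch of these steps for one example x (a column vector); the inline sigmoid g and the Theta matrices being in scope are assumptions:

g = @(z) 1 ./ (1 + exp(-z));  % sigmoid activation
a1 = [1; x];                  % input with bias unit
a2 = [1; g(Theta1 * a1)];     % layer 2 activations plus bias
a3 = [1; g(Theta2 * a2)];     % layer 3 activations plus bias
a4 = g(Theta3 * a3);          % h_Theta(x): the K outputs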

back propagation algorithm:

Intuition: δ_j^(l) = "error" of node j in layer l.

For each output unit (layer L = 4):

δ_j^(4) = a_j^(4) - y_j  (vectorized: δ^(4) = a^(4) - y)

For the hidden layers:

δ^(3) = (Θ^(3))^T δ^(4) .* g'(z^(3))

δ^(2) = (Θ^(2))^T δ^(3) .* g'(z^(2))

where g'(z^(l)) = a^(l) .* (1 - a^(l)). There is no δ^(1): the input layer carries the observed features, so no error is associated with it.

Algorithm (for a training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}):

Set Δ_ij^(l) = 0 for all l, i, j
For i = 1 to m:
    Set a^(1) = x^(i)
    Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
    Using y^(i), compute δ^(L) = a^(L) - y^(i)
    Compute δ^(L-1), δ^(L-2), ..., δ^(2)
    Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T

D_ij^(l) := (1/m) Δ_ij^(l) + λ Θ_ij^(l)  if j ≠ 0
D_ij^(l) := (1/m) Δ_ij^(l)  if j = 0

(from Coursera week 5, Backpropagation Algorithm)

Then the gradient is exactly ∂J(Θ)/∂Θ_ij^(l) = D_ij^(l).
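A minimal Octave sketch of this loop for the 4-layer example; X (m x n, one example per row), Y (K x m recoded labels) and the sigmoid g from the forward-propagation sketch are assumptions:

Delta1 = zeros(size(Theta1));  % gradient accumulators
Delta2 = zeros(size(Theta2));
Delta3 = zeros(size(Theta3));
for i = 1:m
    a1 = [1; X(i, :)'];                        % forward propagation
    a2 = [1; g(Theta1 * a1)];
    a3 = [1; g(Theta2 * a2)];
    a4 = g(Theta3 * a3);
    d4 = a4 - Y(:, i);                         % output-layer error
    d3 = (Theta3' * d4) .* (a3 .* (1 - a3));   % back-propagate the error
    d3 = d3(2:end);                            % drop the bias entry
    d2 = (Theta2' * d3) .* (a2 .* (1 - a2));
    d2 = d2(2:end);
    Delta3 = Delta3 + d4 * a3';                % accumulate
    Delta2 = Delta2 + d3 * a2';
    Delta1 = Delta1 + d2 * a1';
end
D1 = Delta1 / m;  D1(:, 2:end) = D1(:, 2:end) + (lambda/m) * Theta1(:, 2:end);
D2 = Delta2 / m;  D2(:, 2:end) = D2(:, 2:end) + (lambda/m) * Theta2(:, 2:end);
D3 = Delta3 / m;  D3(:, 2:end) = D3(:, 2:end) + (lambda/m) * Theta3(:, 2:end);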

1.3 Back propagation intuition

(Slides from Coursera week 5, Backpropagation Intuition.) The idea: focus on a single example (x^(i), y^(i)) and ignore regularization; then cost(i) is the cross-entropy term for that example, and δ_j^(l) = ∂cost(i)/∂z_j^(l) measures how much the cost would change if the weighted input z_j^(l) were perturbed. Propagating backwards, each δ_j^(l) is a weighted sum of the δ terms in the next layer that node j feeds into, with the corresponding Θ entries as the weights, e.g.

δ_2^(2) = Θ_12^(2) δ_1^(3) + Θ_22^(2) δ_2^(3)
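A tiny worked instance of that recurrence, with made-up numbers: if Θ_12^(2) = 0.5, Θ_22^(2) = -1.0, δ_1^(3) = 0.2 and δ_2^(3) = 0.4, then δ_2^(2) = 0.5 × 0.2 + (-1.0) × 0.4 = -0.3.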

 

2. Back Propagation in practice

2.1 Implementation note: unrolling parameters

Unroll the weight matrices into a single vector: advanced optimizers such as fminunc expect the parameters (and the gradient) as vectors, not matrices.

E.g.

thetaVec = [Theta1(:); Theta2(:); Theta3(:)];  % unroll all matrices into one vector

fminunc(@costFunction, initialTheta, options);
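Inside costFunction, reshape the vector back into matrices. For the example architecture above (Theta1 is 5x4, Theta2 is 5x6, Theta3 is 4x6), a sketch:

Theta1 = reshape(thetaVec(1:20), 5, 4);   % 5*4 = 20 elements
Theta2 = reshape(thetaVec(21:50), 5, 6);  % 5*6 = 30 elements
Theta3 = reshape(thetaVec(51:74), 4, 6);  % 4*6 = 24 elements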

2.2 Gradient checking

Numerical estimation of gradients:

dJ(θ)/dθ ≈ (J(θ + ε) - J(θ - ε)) / (2ε)  (two-sided difference, with a small ε such as 10^-4)

(from Coursera week 5, Gradient Checking)

For a parameter vector θ = [θ_1, θ_2, ..., θ_n]:

∂J/∂θ_j ≈ (J(θ_1, ..., θ_j + ε, ..., θ_n) - J(θ_1, ..., θ_j - ε, ..., θ_n)) / (2ε)

code:

for i = 1:length(theta)
    thetaPlus = theta;  thetaPlus(i) = thetaPlus(i) + EPSILON;
    thetaMinus = theta; thetaMinus(i) = thetaMinus(i) - EPSILON;
    gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end

Check that gradApprox ≈ DVec (the derivatives from backprop); they should agree to several decimal places. Turn the check off before training, because it is far slower than backprop.

2.3 Random Initialization

Initial value of Θ

Zero initialization: after each update, the parameters on the inputs going into each of two hidden units stay identical, so all hidden units in a layer compute the same function and the network learns nothing useful.

Random initialization (symmetry breaking): initialize each Θ_ij^(l) to a random value in [-ε, ε].

E.g. Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;  % uniform in [-INIT_EPSILON, INIT_EPSILON]
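For the running example architecture, a sketch (the INIT_EPSILON value here is an assumption, not from the slides):

INIT_EPSILON = 0.01;
Theta1 = rand(5, 4) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(5, 6) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(4, 6) * (2 * INIT_EPSILON) - INIT_EPSILON;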

2.4 Putting it together (summary)

Training a neural network:

Pick a network architecture.

No. of input units: dimension of the features x^(i)

No. of output units: number of classes (for 2 classes, a single output unit suffices)

Reasonable default: 1 hidden layer; if more than 1 hidden layer, use the same no. of hidden units in every layer (usually, the more units the better). Note: a hidden-layer size of roughly 1x, 2x, 3x or 4x the number of input features is acceptable.

(1) Randomly initialize weights

(2) Implement forward propagation to get h(x)

(3) Implement code to compute cost function

(4) Implement back prop to compute partial derivatives

(5) Use gradient checking to compare the partial derivatives computed by backprop against the numerical estimates. Then disable the gradient-checking code.

(6) Use gradient descent or an advanced optimization method with backpropagation to minimize J(Θ) as a function of the parameters Θ; a skeleton tying these steps together is sketched below.

Note: the neural network cost function is non-convex, so optimization may settle in a local minimum; in practice this usually still yields a good solution.
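A minimal Octave skeleton of the whole pipeline, assuming a helper nnCostFunction(thetaVec, X, Y, lambda) that returns [J, gradVec] using the unrolling, forward propagation and backprop from the sections above (the helper's name and signature are illustrative assumptions):

% (1) random initialization, unrolled into one vector
initialTheta = [Theta1(:); Theta2(:); Theta3(:)];
% (2)-(4) live inside the assumed helper nnCostFunction
costFunc = @(t) nnCostFunction(t, X, Y, lambda);
% (6) minimize J(Theta) with an advanced optimizer
options = optimset('GradObj', 'on', 'MaxIter', 400);
[optTheta, finalCost] = fminunc(costFunc, initialTheta, options);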

 


3. Mathematical derivation of the gradient

Since z_i^(l+1) = Σ_j Θ_ij^(l) a_j^(l), the chain rule gives

∂J/∂Θ_ij^(l) = (∂J/∂z_i^(l+1)) · (∂z_i^(l+1)/∂Θ_ij^(l)) = δ_i^(l+1) a_j^(l)

For a sigmoid output layer with the cross-entropy cost, δ^(L) = a^(L) - y. For a hidden layer, summing the chain rule over the units of layer l+1 gives δ^(l) = (Θ^(l))^T δ^(l+1) .* g'(z^(l)), with g'(z^(l)) = a^(l) .* (1 - a^(l)). These are exactly the quantities computed by the backpropagation algorithm in section 1.2.
