CheeseZH: Stanford University: Machine Learning Ex3: Multiclass Logistic Regression and Neural Network Prediction

Handwritten digit recognition (0-9)

Multi-class Logistic Regression

1. Vectorizing Logistic Regression

(1) Vectorizing the cost function

(2) Vectorizing the gradient

(3) Vectorizing the regularized cost function

(4) Vectorizing the regularized gradient

All four formulas above can be found in the previous blog post.
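For reference, here is a sketch of the four vectorized formulas, written to match what lrCostFunction.m computes. The notation is an assumption based on the usual course conventions: X is the m x (n+1) design matrix with a leading column of ones, y is the m x 1 label vector, g is the sigmoid function, and \tilde{\theta} denotes \theta with its first (bias) component set to 0.

```latex
\begin{aligned}
h &= g(X\theta), \qquad g(z) = \frac{1}{1 + e^{-z}} \\[4pt]
\text{(1)}\quad J(\theta) &= \frac{1}{m}\left[-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right] \\[4pt]
\text{(2)}\quad \nabla J(\theta) &= \frac{1}{m}\,X^{T}(h - y) \\[4pt]
\text{(3)}\quad J_{\text{reg}}(\theta) &= J(\theta) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2} \\[4pt]
\text{(4)}\quad \nabla J_{\text{reg}}(\theta) &= \frac{1}{m}\,X^{T}(h - y) + \frac{\lambda}{m}\,\tilde{\theta}
\end{aligned}
```

The sum in (3) starts at j = 1, so the bias parameter \theta_0 is not penalized; this is why the gradient in (4) uses \tilde{\theta} rather than \theta.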

lrCostFunction.m

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there are many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(X*theta);     % m x 1 vector of predictions

% Copy of theta with the bias term zeroed, so that it is excluded from
% regularization and the caller's theta is left unmodified.
temp = theta;
temp(1) = 0;

reg = lambda/(2*m) * sum(temp.^2);                  % regularization penalty
J = -1/m * (y'*log(hx) + (1-y)'*log(1-hx)) + reg;   % regularized cost
grad = 1/m * X'*(hx - y) + lambda/m * temp;         % regularized gradient

% =============================================================

grad = grad(:);

end
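
One way to sanity-check the function is to compare its analytic gradient with a central-difference approximation on a tiny problem. The script below is only a sketch: the data, the variable names (numgrad, epsilon) and the step size are made up for illustration, and it assumes that lrCostFunction.m and a sigmoid.m from the earlier exercise are on the Octave/MATLAB path.

```octave
% Tiny illustrative problem (hypothetical values, not the exercise data set)
X = [ones(5,1) reshape(1:15, 5, 3)/10];  % 5 examples: bias column + 3 features
y = [1; 0; 1; 0; 1];
theta = [-2; -1; 1; 2];
lambda = 3;

% Analytic cost and gradient from the function above
[J, grad] = lrCostFunction(theta, X, y, lambda);

% Central-difference approximation of each partial derivative
epsilon = 1e-4;
numgrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = epsilon;
    numgrad(j) = (lrCostFunction(theta + e, X, y, lambda) - ...
                  lrCostFunction(theta - e, X, y, lambda)) / (2*epsilon);
end

fprintf('cost: %f\n', J);
fprintf('max |grad - numgrad|: %g\n', max(abs(grad - numgrad)));
```

If the implementation is consistent, the reported difference should be very small compared with the gradient entries themselves. A large difference usually points to a mistake in the regularization term, such as penalizing theta(1).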
