ML: Logistic Regression

Logistic Regression

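First, let us look at what goes wrong when linear regression is used for classification. In the example figure, adding one more training example changes the fitted hypothesis so that our classification becomes wrong. In addition, linear regression often outputs values far below 0 or far above 1, which makes little sense for 0/1 classification. Linear regression is therefore not suitable for classification. The output of logistic regression, introduced below, always lies in (0,1), which suits classification. Despite its name, logistic regression is a classification algorithm.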

2 Hypothesis Representation

The logistic regression hypothesis is:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

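Here $\theta$ is the parameter vector and $x$ is the feature vector. The function $g$ is called the logistic function (also known as the sigmoid function). Its graph is an S-shaped curve with range (0,1): $g(0) = 0.5$, $g(z) \to 0$ as $z \to -\infty$, and $g(z) \to 1$ as $z \to +\infty$.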

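For a classification problem, $h_\theta(x)$ can be read as the probability that an example with features $x$ belongs to class 1, given parameters $\theta$. The probability that it belongs to the other class is then $1 - h_\theta(x)$.

$$h_\theta(x) = P(y = 1 \mid x; \theta), \qquad P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$$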
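As a small illustration of the hypothesis and its probabilistic reading, here is a sketch in Python using only the standard library (the exercise code further down is Octave; Python is used here just to keep the example self-contained). The parameter values and the sample point are made up for illustration and do not come from the exercise data.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x must already contain the bias term x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-3.0, 1.0, 1.0]   # hypothetical parameters, chosen only for illustration
x = [1.0, 2.0, 2.0]        # x_0 = 1, x_1 = 2, x_2 = 2, so theta^T x = 1

p1 = hypothesis(theta, x)  # estimated P(y = 1 | x; theta)
p0 = 1.0 - p1              # estimated P(y = 0 | x; theta)
print(p1, p0)
```

The two branches in `sigmoid` compute the same function; splitting on the sign of `z` only avoids calling `math.exp` on a large positive argument.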

3 Decision Boundary

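From the shape of the logistic function, $h_\theta(x) \ge 0.5$ exactly when $\theta^T x \ge 0$, and $h_\theta(x) < 0.5$ exactly when $\theta^T x < 0$. Using 0.5 as the threshold, we therefore predict:

$$y = 1 \ \text{if } \theta^T x \ge 0, \qquad y = 0 \ \text{if } \theta^T x < 0$$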

The set of points where $\theta^T x = 0$ is called the decision boundary. It separates the region where we predict $y = 1$ from the region where we predict $y = 0$.
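The thresholding rule can be sketched in a few lines of Python. The parameter vector below is hypothetical; with it, the decision boundary is the straight line $x_1 + x_2 = 3$.

```python
def predict(theta, x):
    """Return 1 when theta^T x >= 0 (equivalently h_theta(x) >= 0.5), else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]  # hypothetical parameters; boundary is x_1 + x_2 = 3
points = [(1.0, 1.0), (2.0, 2.0), (1.5, 1.5), (0.0, 4.0)]

# Prepend the bias term x_0 = 1 before applying the rule.
labels = [predict(theta, [1.0, x1, x2]) for x1, x2 in points]
print(labels)  # [0, 1, 1, 1]; the point (1.5, 1.5) lies exactly on the boundary
```

Note that the sigmoid never has to be evaluated for prediction: the sign of $\theta^T x$ is enough.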

By adding polynomial combinations of the features, we can obtain more complex, non-linear decision boundaries.
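A minimal Python sketch of a non-linear boundary, assuming we add the squared features $x_1^2$ and $x_2^2$. The parameters are hypothetical and were picked so that the boundary $\theta^T x = 0$ is the unit circle $x_1^2 + x_2^2 = 1$.

```python
def quadratic_features(x1, x2):
    """Map (x1, x2) to [1, x1, x2, x1^2, x2^2]; the leading 1 is the bias term."""
    return [1.0, x1, x2, x1 * x1, x2 * x2]

def predict(theta, features):
    """Predict 1 when theta^T x >= 0, else 0."""
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0

theta = [-1.0, 0.0, 0.0, 1.0, 1.0]  # hypothetical: boundary is x1^2 + x2^2 = 1

inside = predict(theta, quadratic_features(0.5, 0.5))   # 0.25 + 0.25 - 1 < 0
outside = predict(theta, quadratic_features(1.0, 1.0))  # 1 + 1 - 1 >= 0
print(inside, outside)  # 0 1
```

The classifier is still linear in the mapped features; only the boundary drawn in the original $(x_1, x_2)$ plane is curved.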

4 Cost Function

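The cost function is $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$. In linear regression we used $\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2}(h_\theta(x) - y)^2$. If we used the same squared error in logistic regression, where $h_\theta$ is a sigmoid, $J(\theta)$ would be non-convex, and gradient descent would not be guaranteed to reach the global minimum. We therefore look for a cost that makes $J(\theta)$ convex.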

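Consider the following cost:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$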

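Because $y$ is always 0 or 1, the two cases can be merged into a single formula:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$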
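The merged formula is easy to check numerically. The Python sketch below evaluates it for a few made-up predictions; it assumes every prediction `h` lies strictly between 0 and 1, since the logarithm is undefined at the endpoints.

```python
import math

def cost_example(h, y):
    """Cost of one example: -y*log(h) - (1-y)*log(1-h), for 0 < h < 1 and y in {0, 1}."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def average_cost(hs, ys):
    """J = (1/m) * sum of the per-example costs."""
    m = len(ys)
    return sum(cost_example(h, y) for h, y in zip(hs, ys)) / m

# For a positive example (y = 1), a confident correct prediction is cheap
# and a confident wrong prediction is expensive.
good = cost_example(0.9, 1)  # -log(0.9), roughly 0.105
bad = cost_example(0.1, 1)   # -log(0.1), roughly 2.303
print(good, bad, average_cost([0.9, 0.2], [1, 0]))
```

The cost grows without bound as a prediction approaches the wrong extreme, which is what pushes the model away from confident mistakes.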

5 Gradient Descent

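Substituting the merged cost into $J(\theta)$ gives:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

The cost function takes this form because it follows from maximum likelihood estimation; other forms are possible as well.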

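The gradient descent algorithm:

Repeat {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

} (updating all $\theta_j$ simultaneously)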

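The update rule derived here looks identical to the one for linear regression, but note that $h_\theta(x)$ is now the sigmoid of $\theta^T x$ rather than the linear function $\theta^T x$ itself, so the two algorithms are not the same.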
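The update rule can be sketched as a plain batch gradient descent loop in Python. The six training examples, the learning rate and the iteration count are all made up for illustration; they are not taken from the exercise.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def h(theta, row):
    return sigmoid(sum(t * xi for t, xi in zip(theta, row)))

def cost(theta, X, y):
    """Unregularized logistic cost J(theta)."""
    m = len(y)
    return -sum(yi * math.log(h(theta, r)) + (1 - yi) * math.log(1 - h(theta, r))
                for r, yi in zip(X, y)) / m

def gradient_descent(X, y, alpha, iterations):
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [h(theta, r) - yi for r, yi in zip(X, y)]
        grad = [sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(n)]
        # Build the new vector in one step so every theta_j uses the old values.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data: one feature plus the bias column; the classes overlap slightly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]

initial_cost = cost([0.0, 0.0], X, y)
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
final_cost = cost(theta, X, y)
print(theta, initial_cost, final_cost)
```

The only line that differs from a linear regression implementation is the definition of `h`, which applies the sigmoid to $\theta^T x$.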

6 Multi-class Classification: One-vs-All

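Logistic regression can also be applied to multi-class problems. Suppose there are K classes. We train K classifiers, each of which treats one class as the positive class and all the remaining classes as negative. At prediction time, we assign the input to the class whose classifier outputs the highest probability:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 1, \dots, K, \qquad \text{prediction} = \arg\max_i \, h_\theta^{(i)}(x)$$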
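A compact Python sketch of one-vs-all, under the assumption that each binary classifier is trained with plain batch gradient descent. The three clusters of points, the learning rate and the iteration count are invented for illustration, and classes are numbered from 0 here to match Python indexing.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def train_binary(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for one binary logistic regression classifier."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(dot(theta, r)) - yi for r, yi in zip(X, y)]
        grad = [sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X, labels, num_classes):
    """Train one classifier per class: class k is positive, all others negative."""
    return [train_binary(X, [1.0 if lab == k else 0.0 for lab in labels])
            for k in range(num_classes)]

def predict_one_vs_all(thetas, x):
    """Return the class whose classifier reports the highest probability."""
    probs = [sigmoid(dot(theta, x)) for theta in thetas]
    return max(range(len(probs)), key=lambda k: probs[k])

# Three well-separated toy clusters in the plane (made-up data).
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (2, 5), (3, 5), (2, 6)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
X = [[1.0, float(a), float(b)] for a, b in points]  # prepend bias term

thetas = train_one_vs_all(X, labels, num_classes=3)
print([predict_one_vs_all(thetas, [1.0, a, b])
       for a, b in [(0.3, 0.3), (5.3, 0.3), (2.3, 5.3)]])
```

Each query point above lies inside one of the training clusters, so it should be assigned that cluster's label.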

7 Regularization

7.1 Overfitting

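Overfitting occurs when a model with too many features or parameters fits the training set very closely but fails to generalize to new examples.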

7.2 Cost Function

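We add a regularization term to the cost function. Note that the penalty is computed over $\theta_1, \dots, \theta_n$ only; it does not start from $\theta_0$. Here $\lambda$ is the regularization parameter. If $\lambda$ is too large, $\theta_1, \dots, \theta_n$ are all driven toward 0, so the hypothesis depends almost entirely on $\theta_0$ and becomes nearly constant, which leads to underfitting.

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$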

7.3 Regularized Linear Regression

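Cost function:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$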

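Gradient descent

Repeat {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, \dots, n)$$

}

Rearranging the update for $j \ge 1$ gives

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Here $1 - \alpha \frac{\lambda}{m}$ is a number slightly smaller than 1, so each iteration shrinks $\theta_j$ a little before applying the usual gradient step.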

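For linear regression, besides gradient descent there is also the normal equation. With the regularization term added, the normal equation becomes:

$$\theta = \left( X^T X + \lambda M \right)^{-1} X^T y, \qquad M = \mathrm{diag}(0, 1, 1, \dots, 1) \in \mathbb{R}^{(n+1) \times (n+1)}$$

The leading 0 on the diagonal of $M$ leaves $\theta_0$ unregularized.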
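To make the regularized normal equation concrete, here is a Python sketch that solves $(X^T X + \lambda M)\theta = X^T y$ with a small hand-written Gaussian elimination, so that no external library is needed. The helper names and the four data points (which lie exactly on $y = 1 + 2x$) are assumptions made for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def regularized_normal_equation(X, y, lam):
    """Return theta solving (X^T X + lam * M) theta = X^T y, with M = diag(0, 1, ..., 1)."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    for j in range(1, n):  # skip j = 0: theta_0 is not regularized
        A[j][j] += lam
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    return solve(A, b)

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # bias column plus one feature
y = [1.0, 3.0, 5.0, 7.0]                              # exactly y = 1 + 2x

theta_plain = regularized_normal_equation(X, y, lam=0.0)   # recovers [1, 2]
theta_reg = regularized_normal_equation(X, y, lam=10.0)    # slope shrinks to 2/3
print(theta_plain, theta_reg)
```

With $\lambda = 10$ the slope is pulled from 2 down to 2/3, while the unpenalized intercept moves from 1 to 3 to compensate.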

7.4 Regularized Logistic Regression

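Cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$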

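Gradient descent

Repeat {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, \dots, n)$$

}

The form is the same as for regularized linear regression, except that here $h_\theta(x) = g(\theta^T x)$ is the sigmoid hypothesis.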
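The regularized cost and its gradient can be sketched in Python as follows. The function name `cost_function_reg` and the three-example dataset are chosen for this illustration only. A central finite-difference check is included as one way to gain confidence that the gradient matches the cost.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost_function_reg(theta, X, y, lam):
    """Return (J, grad) for regularized logistic regression; theta[0] is not penalized."""
    m = len(y)
    h = [sigmoid(sum(t * xi for t, xi in zip(theta, r))) for r in X]
    j_data = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
                  for hi, yi in zip(h, y)) / m
    j_reg = lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [sum((hi - yi) * r[j] for hi, yi, r in zip(h, y, X)) / m
            for j in range(len(theta))]
    for j in range(1, len(theta)):  # add the penalty gradient for j >= 1 only
        grad[j] += lam / m * theta[j]
    return j_data + j_reg, grad

def numerical_gradient(theta, X, y, lam, eps=1e-6):
    """Central finite differences, used only to sanity-check the analytic gradient."""
    approx = []
    for j in range(len(theta)):
        up = theta[:]; up[j] += eps
        down = theta[:]; down[j] -= eps
        approx.append((cost_function_reg(up, X, y, lam)[0]
                       - cost_function_reg(down, X, y, lam)[0]) / (2 * eps))
    return approx

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # made-up data with a bias column
y = [0, 0, 1]

J0, grad0 = cost_function_reg([0.0, 0.0], X, y, lam=1.0)   # J0 = log(2)
J1, grad1 = cost_function_reg([0.5, -0.25], X, y, lam=3.0)
check = numerical_gradient([0.5, -0.25], X, y, lam=3.0)
print(J0, grad0, grad1, check)
```

At $\theta = 0$ every prediction is 0.5 and the penalty is zero, so the cost equals $\log 2$ regardless of $\lambda$.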

Exercise Code

Regularized logistic regression

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
hx = sigmoid(X*theta);                               % predictions, m x 1
Jnorm = (-1/m)*(y'*log(hx) + (1-y)'*log(1-hx));      % unregularized cost
theta0 = theta(1);                                   % note: theta0 is not regularized
theta1 = theta(2:end);
Jreg = (lambda/(2*m))*sum(theta1.^2);                % penalty on theta1 only
J = Jnorm + Jreg;

grad0 = (hx-y)'*X(:,1)./m;                           % gradient for theta0, no penalty
grad1 = ((hx-y)'*X(:,2:end)./m)' + (lambda/m).*theta1;
grad = [grad0; grad1];
% =============================================================

end

The sigmoid function

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1./(1 + exp((-1).*z));   % element-wise, so z may be any shape
% =============================================================
end

Feature mapping

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to polynomial features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising
%   X1, X2, X1.^2, X1.*X2, X2.^2, X1.^3, etc., up to the given degree
%
%   Inputs X1, X2 must be the same size
%

degree = 6; % highest polynomial degree
out = ones(size(X1(:,1)));   % first column of ones is the intercept term
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
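To see what the mapping produces, here is a Python sketch of the same double loop applied to a single pair of values. The name `map_feature` and the `degree` keyword argument are conveniences of this sketch; the Octave function above hard-codes the degree to 6.

```python
def map_feature(x1, x2, degree=6):
    """Return [1, x1, x2, x1^2, x1*x2, x2^2, ...] up to the given total degree."""
    out = [1.0]  # intercept term
    for i in range(1, degree + 1):
        for j in range(i + 1):
            out.append((x1 ** (i - j)) * (x2 ** j))
    return out

small = map_feature(2.0, 3.0, degree=2)
print(small)                         # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
print(len(map_feature(0.5, -0.5)))   # 28
```

For degree 6 the two original features expand to 28 columns: the intercept plus $2 + 3 + \dots + 7 = 27$ monomials. Having this many features is the reason regularization matters in the exercise.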

Main script

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the second part
%  of the exercise which covers regularization with logistic regression.
%
%  You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;


%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

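In the experiment, the value of the parameter lambda matters a great deal: if lambda is too small the model may overfit, and if it is too large the model may underfit.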
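The effect of lambda can be seen even on a tiny problem. The Python sketch below trains regularized logistic regression by batch gradient descent for the four values suggested in the script (0, 1, 10, 100) and records the learned slope. The six data points, the learning rate and the iteration count are assumptions of this sketch; the exercise itself uses `fminunc` on `ex2data2.txt`.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_regularized(X, y, lam, alpha=0.05, iterations=10000):
    """Batch gradient descent on the regularized logistic cost; theta[0] is not penalized."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * xi for t, xi in zip(theta, r))) - yi
                  for r, yi in zip(X, y)]
        grad = [sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(n)]
        for j in range(1, n):
            grad[j] += lam / m * theta[j]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Made-up data: one feature plus the bias column, with overlapping classes.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]

lambdas = [0.0, 1.0, 10.0, 100.0]
slopes = [train_regularized(X, y, lam)[1] for lam in lambdas]
for lam, slope in zip(lambdas, slopes):
    print("lambda = %g, theta_1 = %.4f" % (lam, slope))
```

As lambda grows, the slope is pushed toward 0 and the predicted probabilities flatten toward a constant, which is the underfitting regime. On a dataset with many polynomial features, the small-lambda end is where overfitting shows up instead.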