The line of thinking is:
(1) modify the cost function of logistic regression, and then write down the optimization objective.

(2) discuss the properties of this new optimization function.

(3) introduce kernel functions.

1 optimization objective

The optimization objective of the SVM is:

$$\min_{\theta}\; C\sum_{i=1}^{m}\Big[y_i\,\mathrm{cost}_1(\theta^T x_i) + (1-y_i)\,\mathrm{cost}_0(\theta^T x_i)\Big] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$

where:

$C$ plays the role of $1/\lambda$ in regularization.

$\mathrm{cost}_1(z)$ and $\mathrm{cost}_0(z)$ are shown below.

(figure: $\mathrm{cost}_1(z)$, which is zero for $z \ge 1$, and $\mathrm{cost}_0(z)$, which is zero for $z \le -1$)
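A minimal numerical sketch of these two costs (the figures use hinge-style piecewise-linear surrogates; the slope of 1 below is an assumption, since the exact slope is not specified):

```python
import numpy as np

def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1, linear below."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1, linear above."""
    return np.maximum(0.0, 1.0 + z)

z = np.array([-2.0, 0.0, 2.0])
print(cost1(z))  # [3. 1. 0.]
print(cost0(z))  # [0. 1. 3.]
```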

2 nature of this new optimization function

In this function, if $C$ is effectively infinite, the objective reduces to:

$$\min_{\theta}\; \frac{1}{2}\sum_{j=1}^{n}\theta_j^2 \quad \text{subject to} \quad \theta^T x_i \ge 1 \;\text{if}\; y_i = 1, \qquad \theta^T x_i \le -1 \;\text{if}\; y_i = 0$$

We hope that:
For a positive point ($y=1$), $\mathrm{cost}_1(\theta^T x_i) = 0$, which requires $z = \theta^T x_i \ge 1$.
For a negative point ($y=0$), $\mathrm{cost}_0(\theta^T x_i) = 0$, which requires $z = \theta^T x_i \le -1$.

(figure: candidate decision boundaries; the black line keeps the maximum margin from both classes, while the green and pink lines pass close to the data)
In this situation ($C$ effectively infinite), the machine will choose the black line as the decision boundary, because we require $\theta^T x_i$ to reach $1$ or $-1$ at the margin instead of merely crossing $0$.

This maximum-margin classifier provides a more reliable decision boundary. We can imagine that if the boundary were the green or the pink line, a little noise could already cause misclassification.
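This behavior can be sketched with scikit-learn on assumed toy data; with a very large $C$, the fitted linear SVM behaves like a hard-margin, maximum-margin classifier:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data, assumed for illustration).
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves essentially no slack: the margin constraints
# theta^T x >= 1 / <= -1 must hold, giving the maximum-margin boundary.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
pred = clf.predict([[1.2, 1.0], [4.8, 4.0]])
print(pred)  # [0 1]
```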

3 kernel functions

The most commonly used kernel functions are the Gaussian kernel and the linear kernel.

Kernel functions are used to handle non-linear situations. For example:

(figure: a dataset whose classes can only be separated by a non-linear decision boundary)
The Gaussian kernel function is:

$$f_i = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$
We take $x_1 = (3, 5)$ as a landmark, and the kernel centered at it looks as follows:

(figure: surface plot of the Gaussian kernel centered at $x_1 = (3, 5)$, peaking at 1 at the landmark and decaying toward 0 away from it)
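A minimal sketch of this similarity, using the landmark $x_1 = (3, 5)$ from the text:

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma=1.0):
    """f = exp(-||x - l||^2 / (2 sigma^2)): ~1 near the landmark, ~0 far away."""
    diff = np.asarray(x, dtype=float) - np.asarray(landmark, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

x1 = (3.0, 5.0)                           # the landmark from the text
print(gaussian_kernel((3.0, 5.0), x1))    # 1.0 (exactly at the landmark)
print(gaussian_kernel((10.0, 10.0), x1))  # essentially 0 (far away)
```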
We find that if a point is close to $x_1 = (3, 5)$, the kernel value is close to 1; far away, it is close to 0. Using these similarities as features, we define a new optimization objective:
$$\min_{\theta}\; C\sum_{i=1}^{m}\Big[y_i\,\mathrm{cost}_1(\theta^T f_i) + (1-y_i)\,\mathrm{cost}_0(\theta^T f_i)\Big] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
where:
$f_i$ is the vector of Gaussian-kernel similarities between $x_i$ and the landmarks (a common choice is to use every training sample as a landmark).
Minimizing this objective yields $\theta$.
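A sketch of the resulting feature construction on assumed toy data: every training sample serves as a landmark, and each $x$ is mapped to its vector of similarities $f$, on which the objective is then minimized:

```python
import numpy as np

def gaussian_features(X, landmarks, sigma=1.0):
    """F[i, j] = exp(-||X[i] - landmarks[j]||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy samples (assumed)
F = gaussian_features(X, X)   # landmarks = all training samples
print(F.shape)                # (3, 3): one similarity feature per landmark
print(np.diag(F))             # all 1.0: each point is most similar to itself
```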

When $\sigma$ is large, the kernel is wide and the model may underfit; when $\sigma$ is small, the kernel is narrow and the model may overfit.
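This trade-off can be illustrated with scikit-learn, whose RBF kernel uses $\gamma = 1/(2\sigma^2)$, so a small $\sigma$ means a large $\gamma$ (the XOR-like toy data below are assumed for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # XOR-like, non-linear labels

scores = {}
for sigma in (0.05, 1.0, 50.0):
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2)).fit(X, y)
    scores[sigma] = clf.score(X, y)       # training accuracy
print(scores)
# small sigma: near-perfect training accuracy (overfitting tendency);
# very large sigma: the boundary flattens out (underfitting tendency)
```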

If we instead use the plain inner product, $K(x_i, x) = x_i^T x$ (no mapping, $f = x$), we call it the linear kernel.
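That equivalence can be checked directly: scikit-learn's linear kernel gives the same result as feeding the Gram matrix $X X^T$ to a precomputed-kernel SVM (toy data assumed):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 1.0], [1.0, 0.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])

lin = SVC(kernel="linear").fit(X, y)
pre = SVC(kernel="precomputed").fit(X @ X.T, y)   # Gram matrix K = X X^T

X_new = np.array([[0.5, 0.5], [3.5, 3.5]])
print(lin.predict(X_new))           # built-in linear kernel
print(pre.predict(X_new @ X.T))     # explicit inner products: same result
```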

There are several other kernel functions, but they are used less commonly.

Reference

Link: 支持向量机通俗导论(理解SVM的三层境界) (A gentle introduction to support vector machines: understanding SVM at three levels).
