Contents

1. Large margin classification

1.1 Optimization objective

1.2 Large Margin Intuition

1.3 Mathematics Behind Large Margin Classification

2. Kernels

2.1 Kernels I

2.2 Kernels II

3. SVMs in practice


1. Large margin classification

1.1 Optimization objective

Alternative view of logistic regression (LR):

[fig. 1] (from Coursera Week 7, Optimization objective)

If y = 1, we want h(x) ≈ 1, which requires θ^T * x >> 0

If y = 0, we want h(x) ≈ 0, which requires θ^T * x << 0


if y = 1, want θ^T * x >> 0

[fig. 2] (from Coursera Week 7, Optimization objective)

if y = 0, want θ^T * x << 0

[fig. 3] (from Coursera Week 7, Optimization objective)

Support Vector Machine:

LR (regularized cost):

min_θ (1/m) * Σ_{i=1..m} [ -y^(i) * log(h_θ(x^(i))) - (1 - y^(i)) * log(1 - h_θ(x^(i))) ] + (λ/2m) * Σ_{j=1..n} θ_j^2

SVM (replace the two log terms with the piecewise-linear costs cost_1 and cost_0, drop the 1/m factor, and use C in place of 1/λ):

min_θ C * Σ_{i=1..m} [ y^(i) * cost_1(θ^T * x^(i)) + (1 - y^(i)) * cost_0(θ^T * x^(i)) ] + (1/2) * Σ_{j=1..n} θ_j^2

hypothesis:

h_θ(x) = 1 if θ^T * x >= 0, and h_θ(x) = 0 otherwise
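A minimal NumPy sketch of this objective, assuming the usual hinge-style surrogates cost_1(z) = max(0, 1 - z) and cost_0(z) = max(0, 1 + z) for the piecewise-linear costs (function names here are illustrative):

```python
import numpy as np

def cost1(z):
    # piecewise-linear cost for y = 1: zero once z = theta^T x >= 1
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # piecewise-linear cost for y = 0: zero once z = theta^T x <= -1
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """C * sum_i [y_i*cost1(theta^T x_i) + (1-y_i)*cost0(theta^T x_i)]
    + (1/2) * sum_{j>=1} theta_j^2  (theta_0 is not regularized)."""
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term

# toy check: first column of X is the intercept term x_0 = 1
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1, 0])
theta = np.array([0.0, 1.0])
print(svm_cost(theta, X, y, C=1.0))  # 0.5: both margins satisfied, only regularization remains
```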

1.2 Large Margin Intuition

SVM:

[fig. 4] (from Coursera Week 7, Large Margin Intuition)

SVM Decision Boundary:

whenever y^(i) = 1:

θ^T * x^(i) >= 1

whenever y^(i) = 0:

θ^T * x^(i) <= -1

linearly separable case:

[fig. 5] (from Coursera Week 7, Large Margin Intuition)

The SVM will choose the black decision boundary, because it separates the classes more robustly and keeps a large margin from both. For this reason the SVM is sometimes called a large margin classifier.

Large margin classifier in the presence of outliers: with C very large, a single outlier can swing the decision boundary dramatically; with C not too large, the SVM ignores the outlier and keeps the large-margin boundary.

[fig. 6] (from Coursera Week 7, Large Margin Intuition)

1.3 Mathematics Behind Large Margin Classification

Vector Inner Product:

u^T * v = u_1 * v_1 + u_2 * v_2 = p * ||u||, where p is the signed length of the projection of v onto u, and ||u|| = sqrt(u_1^2 + u_2^2)
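A quick NumPy check of this identity (the vectors are illustrative):

```python
import numpy as np

u = np.array([4.0, 2.0])
v = np.array([1.0, 3.0])

inner = u @ v                        # u^T * v = 4*1 + 2*3 = 10
p = inner / np.linalg.norm(u)        # signed projection length of v onto u
print(inner, p * np.linalg.norm(u))  # both print 10.0
```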

SVM Decision Boundary:

min_θ (1/2) * Σ_{j=1..n} θ_j^2

s.t. θ^T * x^(i) >= 1 whenever y^(i) = 1, and θ^T * x^(i) <= -1 whenever y^(i) = 0

simplification: θ_0 = 0, n = 2, so the objective is (1/2) * (θ_1^2 + θ_2^2) = (1/2) * ||θ||^2

[fig. 7] (from Coursera Week 7, Mathematics Behind Large Margin Classification)

By the inner-product identity, θ^T * x^(i) = p^(i) * ||θ||, so the constraints become p^(i) * ||θ|| >= 1 whenever y^(i) = 1, and p^(i) * ||θ|| <= -1 whenever y^(i) = 0,

where p^(i) is the projection of x^(i) onto the vector θ. The simplification θ_0 = 0 means the decision boundary passes through the origin.

E.g.

[fig. 8] (from Coursera Week 7, Mathematics Behind Large Margin Classification)

[fig. 9] (from Coursera Week 7, Mathematics Behind Large Margin Classification)

The SVM chooses the boundary in fig. 9: larger projections p^(i) allow ||θ|| to be smaller while still satisfying p^(i) * ||θ|| >= 1, which is exactly what minimizing (1/2) * ||θ||^2 prefers. A large margin is therefore equivalent to large projections p^(i).

 

2. Kernels

2.1 Kernels I

Non-linear Decision Boundary:

[fig. 10] (from Coursera Week 7, Kernels I)

Predict y = 1 if θ_0 + θ_1 * x_1 + θ_2 * x_2 + θ_3 * x_1 * x_2 + θ_4 * x_1^2 + θ_5 * x_2^2 + ... >= 0, i.e. use high-order polynomial features f_1 = x_1, f_2 = x_2, f_3 = x_1 * x_2, f_4 = x_1^2, ...

Is there a different/better choice of features?

Kernels:

Given x, compute new features depending on proximity to landmarks l^(1), l^(2), l^(3)

[fig. 11] (from Coursera Week 7, Kernels I)

Given x:

f_1 = similarity(x, l^(1)) = exp(-||x - l^(1)||^2 / (2σ^2))

f_2 = similarity(x, l^(2)) = exp(-||x - l^(2)||^2 / (2σ^2))

f_3 = similarity(x, l^(3)) = exp(-||x - l^(3)||^2 / (2σ^2))

This similarity function exp(...) is called the Gaussian kernel,

where ||x - l^(1)||^2 = Σ_{j=1..n} (x_j - l_j^(1))^2

note: if x ≈ l^(1), then f_1 ≈ exp(0) = 1

if x is far from l^(1), then f_1 ≈ exp(-(large number) / (2σ^2)) ≈ 0

[fig. 12] (from Coursera Week 7, Kernels I)
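A small Python sketch of the Gaussian kernel similarity (the landmark and test points are illustrative):

```python
import numpy as np

def gaussian_kernel(x, l, sigma2):
    """Gaussian (RBF) similarity: exp(-||x - l||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma2))

l1 = np.array([3.0, 5.0])       # a landmark
x_near = np.array([3.1, 4.9])   # close to l1 -> similarity near 1
x_far = np.array([10.0, -4.0])  # far from l1 -> similarity near 0

print(gaussian_kernel(x_near, l1, sigma2=1.0))  # ~0.99
print(gaussian_kernel(x_far, l1, sigma2=1.0))   # ~0
```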

2.2 Kernels II

Choose the landmarks:

choose l^(i) = x^(i), i.e. place one landmark at each training example

Given example x:

[fig. 13] (from Coursera Week 7, Kernels II)

given (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))

for a training example (x^(i), y^(i)):

f^(i) = [ f_0^(i), f_1^(i), ..., f_m^(i) ], where f_0^(i) = 1 and f_j^(i) = similarity(x^(i), l^(j))

(note that f_i^(i) = similarity(x^(i), l^(i)) = 1)

Hypothesis: given x, compute features f ∈ R^(m+1); predict "y = 1" if θ^T * f >= 0 (θ ∈ R^(m+1))
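A sketch of this feature mapping with the training set itself as landmarks (toy data; builds f ∈ R^(m+1) for one example):

```python
import numpy as np

def features(x, landmarks, sigma2):
    """Map x to f = [1, sim(x, l^(1)), ..., sim(x, l^(m))]."""
    sims = np.exp(-np.sum((landmarks - x) ** 2, axis=1) / (2 * sigma2))
    return np.concatenate(([1.0], sims))

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # m = 3 training examples
f = features(X[0], landmarks=X, sigma2=1.0)          # landmarks l^(i) = x^(i)
print(f.shape)  # (4,) = m + 1
print(f[1])     # sim(x^(1), l^(1)) = 1.0
```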

Training:

min_θ C * Σ_{i=1..m} [ y^(i) * cost_1(θ^T * f^(i)) + (1 - y^(i)) * cost_0(θ^T * f^(i)) ] + (1/2) * Σ_{j=1..m} θ_j^2

note: the number of terms in the regularization sum changes from n to m, since there is now one parameter θ_j per feature f_j, i.e. per training example

SVM parameters:

C:

large C: lower bias, higher variance (C plays a role similar to 1/λ)

small C: higher bias, lower variance

σ^2:

large σ^2: the features f_i vary more smoothly; higher bias, lower variance

small σ^2: the features f_i vary less smoothly; lower bias, higher variance
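For reference, in scikit-learn's SVC the RBF kernel is exp(-gamma * ||x - x'||^2), so σ^2 maps to gamma = 1/(2σ^2). A sketch on toy data (assumes scikit-learn is available):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([0, 0, 1, 1])

sigma2 = 1.0
clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / (2.0 * sigma2))  # gamma = 1/(2*sigma^2)
clf.fit(X, y)
# larger C or larger gamma (i.e. smaller sigma^2) -> lower bias, higher variance
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # -> [0 1]
```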

 

3. SVMs in practice

Using an SVM:

Need to specify:

- choice of parameter C

- choice of kernel

E.g. no kernel ("linear kernel") ---- suitable when n is large and m is small

Gaussian kernel: need to choose σ^2 ---- suitable when n is small and m is large

note: Do perform feature scaling before using the Gaussian kernel
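As an illustration of this note, a sketch that scales features before an RBF-kernel SVC using scikit-learn's Pipeline (toy data; the pipeline shape is this sketch's choice, not from the lecture):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# features on very different scales, e.g. house size vs. number of bedrooms
X = np.array([[2100.0, 3], [1600.0, 2], [2400.0, 4], [1400.0, 2]])
y = np.array([1, 0, 1, 0])

# StandardScaler keeps the large-scale feature from dominating ||x - l||^2
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict([[2000.0, 3]]))  # -> likely [1]
```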

other choices of kernel:

note: Not every similarity function similarity(x, l) makes a valid kernel. (It needs to satisfy a technical condition called "Mercer's Theorem" so that SVM packages' optimizations run correctly and do not diverge.)

many off-the-shelf kernels are available:

- Polynomial kernel: k(x, l) = (x^T * l + r)^d

- More esoteric: string kernel, chi-square kernel, histogram intersection kernel, ....

Multi-class classification:

[fig. 14] (from Coursera Week 7, SVMs in practice)
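The lecture's approach is one-vs-all: train K binary SVMs, one per class, and pick the class whose classifier gives the largest θ^T x. Many packages do this internally; a hand-rolled sketch with scikit-learn's LinearSVC (toy data):

```python
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0, 0], [1, 0], [5, 5], [6, 5], [0, 6], [1, 6]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])

# one binary SVM per class k, trained on (y == k) vs. the rest
classifiers = [LinearSVC(C=1.0).fit(X, (y == k).astype(int)) for k in range(3)]

# predict the class whose classifier outputs the largest decision value
scores = np.stack([clf.decision_function(X) for clf in classifiers], axis=1)
print(scores.argmax(axis=1))  # ideally recovers [0 0 1 1 2 2]
```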

LR vs. SVMs:

n = # features, m = # training examples

if n is large (relative to m), then use LR or SVM without a kernel ("linear kernel")

if n is small, m is intermediate, then use SVM with Gaussian kernel

if n is small and m is large, then create/add more features, then use LR or SVM without a kernel
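These heuristics as a tiny helper function (the numeric cutoffs are illustrative assumptions, not from the lecture):

```python
def choose_model(n, m):
    """Rough guide from the lecture; the cutoffs here are assumptions."""
    if n >= m:                       # n large relative to m
        return "LR or linear-kernel SVM"
    if n <= 1000 and m <= 50000:     # n small, m intermediate
        return "SVM with Gaussian kernel"
    return "add features, then LR or linear-kernel SVM"  # n small, m large

print(choose_model(n=10000, m=500))    # many features, few examples
print(choose_model(n=100, m=10000))    # few features, moderate examples
print(choose_model(n=100, m=1000000))  # few features, huge training set
```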

