5001: Statistical Machine Learning I 3rd class (20-9-15) Notes

Overview: Statistical Machine Learning

Statistics and machine learning
models
- prediction and inference
- Classification
Model assessment for regression
Model assessment for classification
Validation set approach
- LOOCV
- K-fold CV

Terms and abbreviations
MSE (mean squared error)
$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

MCE (misclassification error)
$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

$\mathrm{Bias}(\hat{f}(X)) = E[\hat{f}(X)] - f(X)$

$\mathrm{Var}(\hat{f}(X)) = E[(\hat{f}(X) - E[\hat{f}(X)])^2]$
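
To make the bias and variance definitions concrete, here is a minimal simulation sketch. It is an illustration only and not from the lecture: the target value f_x0, the shrinkage estimator (shrink times the sample mean of n noisy observations of f(x0)), and all parameter values are hypothetical choices. Under these assumptions the bias should be close to (shrink − 1)·f(x0) and the variance close to shrink²·sigma²/n.

```python
import random
import statistics


def simulate_estimates(f_x0, shrink, n, sigma, reps, seed=0):
    """Draw `reps` independent training sets and return one estimate of f(x0) per set.

    Assumed (hypothetical) estimator: shrink * mean of n noisy observations of f(x0).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys = [f_x0 + rng.gauss(0.0, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(ys))
    return estimates


def bias_and_variance(estimates, f_x0):
    """Monte Carlo versions of Bias = E[f_hat] - f and Var = E[(f_hat - E[f_hat])^2]."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - f_x0
    variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
    return bias, variance


# Illustrative parameter values (assumptions, not from the notes).
estimates = simulate_estimates(f_x0=2.0, shrink=0.8, n=10, sigma=1.0, reps=20000)
bias, variance = bias_and_variance(estimates, f_x0=2.0)

# Theory for this toy setup: bias = (0.8 - 1) * 2.0 = -0.4, variance = 0.8**2 * 1.0 / 10 = 0.064.
print(f"bias ~ {bias:.3f}, variance ~ {variance:.4f}")
```

Shrinking toward zero lowers the variance but introduces bias, which is the trade-off the two definitions are meant to separate.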

Statistics and machine learning

“Different” terminologies:

Machine Learning	Statistics
Supervised learning	Classification/regression
Unsupervised learning	Clustering
Semisupervised learning	Classification/regression with missing responses
Manifold learning	(Nonlinear) dimension reduction

Supervised learning:
Given pairs (x, y) with x ∈ R^p and y ∈ R (x has dimension p),
we train a model to perform classification or regression.

Unsupervised learning:
Given only x with x ∈ R^p (x has dimension p), we train on x alone.
This allows clustering-type tasks.

Semisupervised learning:
Only part of the data set contains the response y;
most of the observations are just x without y.
For example, one can collect a large amount of data with a Python crawler and have a person label only some of it.

Manifold learning:
???

Parametric models	Nonparametric models
Linear/polynomial regression model	Local smoothing
Generalized linear regression model	Smoothing splines
Fisher’s discriminant analysis	Classification and regression trees; random forest; boosting
Logistic regression	Support vector machines
Deep learning

models

prediction and inference

Classification

For the example,
approaches to classification:
1 Linear regression

2 Nearest neighbors
The left panel shows the result of the 15-NN classifier: a few training
points are misclassified, and the decision boundary adapts to the
local density of the classes.

The right panel shows the result of the 1-NN classifier: none of the
training points is misclassified.
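
The contrast between the two classifiers can be reproduced with a small sketch. This is not the data behind the lecture figure: the two Gaussian classes, their centers, and the sample size are made-up assumptions, and `knn_predict` is a plain majority-vote k-NN written for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def training_error(train_x, train_y, k):
    """Fraction of training points that the k-NN rule misclassifies."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y for x, y in zip(train_x, train_y)
    )
    return wrong / len(train_x)


def make_two_class_data(n_per_class, seed=0):
    """Hypothetical toy data: two overlapping Gaussian clouds in the plane."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            ys.append(label)
    return xs, ys


xs, ys = make_two_class_data(100)
err_1nn = training_error(xs, ys, k=1)    # each point is its own nearest neighbor
err_15nn = training_error(xs, ys, k=15)  # neighbors of the other class can outvote a point
print(err_1nn, err_15nn)
```

With k = 1 every training point is at distance zero from itself, so the training error is zero whenever no two points with different labels coincide. That says nothing about how well the rule does on new data.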

Model assessment for regression

MSE (mean squared error)

$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

training error

test error
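
As a hedged illustration of the two quantities: the training error averages the squared loss over the data used to fit the model, and the test error averages it over fresh data from the same distribution. The sketch below assumes a hypothetical true model y = 1 + 2x + noise and fits a straight line by ordinary least squares; none of these choices come from the lecture.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(model, xs, ys):
    """Average squared error of the line `model` = (a, b) on the pairs (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng):
    """Hypothetical data generator: y = 1 + 2x + Gaussian noise with sd 0.5."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = sample(30, rng)     # used to fit the model
test_x, test_y = sample(1000, rng)     # fresh data, never seen during fitting

model = fit_line(train_x, train_y)
train_mse = mse(model, train_x, train_y)
test_mse = mse(model, test_x, test_y)
print(f"training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Because the fitted line minimizes the squared error on the training pairs, the training MSE tends to understate the error on new data.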

Model assessment for classification

MCE (misclassification error)

$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

training error

test error
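
For classification, the empirical version of the MCE is the fraction of observations whose predicted label differs from the true label, computed on the training data or on separate test data. The sketch below uses a hypothetical one-dimensional threshold classifier and made-up toy numbers, purely to show the computation.

```python
def misclassification_rate(y_true, y_pred):
    """Empirical MCE: the average of the indicator I(y != prediction)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def threshold_classifier(x, cutoff=0.5):
    """Hypothetical rule: predict class 1 when x exceeds the cutoff, else class 0."""
    return 1 if x > cutoff else 0


# Made-up toy data, not taken from the lecture.
train_x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
train_y = [0, 0, 1, 1, 1, 0]
test_x = [0.2, 0.7, 0.55, 0.45]
test_y = [0, 1, 0, 1]

train_mce = misclassification_rate(train_y, [threshold_classifier(x) for x in train_x])
test_mce = misclassification_rate(test_y, [threshold_classifier(x) for x in test_x])
print(train_mce, test_mce)  # 2 of 6 training points and 2 of 4 test points are wrong
```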

Validation set approach

If we have a large training set, we can estimate the test error by randomly splitting the data into a training part and a validation part. Use the training part to build the model, then assess the model by applying it to the validation part.
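
A minimal sketch of the split-then-assess procedure follows. The 30% validation fraction, the toy data, and the constant (mean) predictor standing in for a real model are all assumptions made for illustration.

```python
import random


def split_train_validation(pairs, validation_fraction=0.3, seed=0):
    """Randomly split (x, y) pairs into a training part and a validation part."""
    rng = random.Random(seed)
    shuffled = pairs[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fit_mean(train):
    """Placeholder model: predict the mean training response for every x."""
    return sum(y for _, y in train) / len(train)


def validation_mse(prediction, validation):
    """Average squared error of the constant prediction on the validation part."""
    return sum((y - prediction) ** 2 for _, y in validation) / len(validation)


# Hypothetical data: responses scattered around 3.0 with unit-variance noise.
rng = random.Random(1)
pairs = [(x / 10, 3.0 + rng.gauss(0.0, 1.0)) for x in range(50)]

train, validation = split_train_validation(pairs, validation_fraction=0.3)
model = fit_mean(train)                      # built from the training part only
val_mse = validation_mse(model, validation)  # assessed on the held-out part
print(len(train), len(validation), round(val_mse, 3))
```

The estimate depends on which observations land in the validation part, so a different seed gives a different value.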

LOOCV

Split the data set of size n into:
- a training set of size n − 1
- a validation set of size 1
Repeat this process n times, holding out a different observation each time.
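
The procedure above can be sketched as follows for squared-error loss: each observation is held out once, the model is refit on the remaining n − 1 observations, and the n held-out errors are averaged. The `fit`/`predict` callables and the mean predictor used in the example are hypothetical stand-ins for whatever model is being assessed.

```python
def loocv_mse(xs, ys, fit, predict):
    """Leave-one-out cross-validation estimate of the test MSE.

    fit(train_x, train_y) returns a model; predict(model, x) returns a prediction.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Training set of size n - 1: everything except observation i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        # Validation set of size 1: observation i alone.
        total += (ys[i] - predict(model, xs[i])) ** 2
    return total / n


def fit_mean(train_x, train_y):
    """Illustrative model: ignore x and predict the mean training response."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Toy check: held-out errors are (1 - 2.5)^2, (2 - 2)^2, (3 - 1.5)^2, whose average is 1.5.
loocv = loocv_mse([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], fit_mean, predict_mean)
print(loocv)
```

LOOCV refits the model n times, which is the main cost of the method when fitting is expensive.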

K-fold CV
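
In K-fold cross-validation the data are randomly divided into K folds of roughly equal size; each fold serves once as the validation set while the model is fit on the other K − 1 folds, and the K validation errors are averaged. The sketch below is one way to write this for squared-error loss; the helper names and the mean predictor are hypothetical, chosen only to keep the example short.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def k_fold_cv_mse(xs, ys, k, fit, predict, seed=0):
    """Average over the k folds of the validation MSE of a model fit on the other folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        fold_mse = sum((ys[i] - predict(model, xs[i])) ** 2 for i in fold) / len(fold)
        fold_errors.append(fold_mse)
    return sum(fold_errors) / k


def fit_mean(train_x, train_y):
    """Illustrative model: ignore x and predict the mean training response."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


folds = k_fold_indices(10, 3)
# With k equal to n, every fold holds one observation, so this coincides with LOOCV.
cv_as_loocv = k_fold_cv_mse([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], 3, fit_mean, predict_mean)
print([len(f) for f in folds], cv_as_loocv)
```

Choosing K well below n means far fewer refits than LOOCV.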
