Abbreviations and terminology
MSE (mean squared error)
$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

MCE (misclassification error)
$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

$\mathrm{Bias}(\hat{f}(X)) = E[\hat{f}(X)] - f(X)$

$\mathrm{Var}(\hat{f}(X)) = E[(\hat{f}(X) - E[\hat{f}(X)])^2]$
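The bias and variance definitions above are expectations over repeated training sets, so they can be approximated by simulation. The following is a minimal sketch under assumed conditions that are not in the notes: a single point x0 with true value f(x0) = 2, Gaussian noise with standard deviation 1, and a deliberately biased estimator (0.8 times the sample mean of 10 observations). The name `simulate_estimates` is hypothetical.

```python
import random
import statistics

def simulate_estimates(f_x0, shrink, n, sigma, reps, seed=0):
    """Draw `reps` independent training sets and return f_hat(x0) for each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # n noisy observations of the true value f(x0)
        ys = [f_x0 + rng.gauss(0.0, sigma) for _ in range(n)]
        # Assumed estimator: a shrunken sample mean (biased on purpose)
        estimates.append(shrink * statistics.fmean(ys))
    return estimates

f_x0 = 2.0  # assumed true value f(x0)
estimates = simulate_estimates(f_x0, shrink=0.8, n=10, sigma=1.0, reps=20000)

mean_estimate = statistics.fmean(estimates)
bias = mean_estimate - f_x0                    # E[f_hat] - f
variance = statistics.fmean([(e - mean_estimate) ** 2 for e in estimates])

print(f"bias ~ {bias:.3f}, variance ~ {variance:.4f}")
```

Under these assumptions the theoretical values are Bias = 0.8 · 2 − 2 = −0.4 and Var = 0.8² · 1/10 = 0.064, and the simulated values should land close to them.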

Statistics and machine learning

“Different” terminologies:

Machine Learning | Statistics
Supervised learning | Classification/regression
Unsupervised learning | Clustering
Semisupervised learning | Classification/regression with missing responses
Manifold learning | (Nonlinear) dimension reduction

Supervised learning:
Given pairs (x, y) with x ∈ R^p (x has dimension p) and y ∈ R,
we can train a model to perform classification/regression.

Unsupervised learning:
Given only x ∈ R^p (x has dimension p), with no response y,
we can train a model to perform tasks such as clustering.

Semisupervised learning:
Only part of the dataset contains the response y;
most of the data are just x without y.
For example, we collect a large amount of data with a Python crawler and have people label only some of it.

Manifold learning:
???

Parametric models | Nonparametric models
Linear/polynomial regression model | Local smoothing
Generalized linear regression model | Smoothing splines
Fisher’s discriminant analysis | Classification and regression trees; random forest; boosting
Logistic regression | Support vector machines
Deep learning

models

prediction and inference

Classification

Approaches to classification for the example:
1 Linear regression
5001: Statistical Machine Learning I, 3rd class (20-9-15) notes

2 Nearest neighbors
Left panel shows the result of 15-NN classifier; a few training
data are misclassified, and the decision boundary adapts to the
local density of the classes

Right panel shows the result of 1-NN classifier; none of the
training data is misclassified
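The contrast between the two panels can be reproduced on a toy data set. The sketch below is a minimal pure-Python k-NN classifier. The data points, the labels, the function names `knn_predict` and `training_error`, and the use of k = 3 in place of 15 are illustrative assumptions, not taken from the lecture.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

def training_error(train_x, train_y, k):
    """Fraction of training points that the k-NN rule misclassifies."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y
        for x, y in zip(train_x, train_y)
    )
    return wrong / len(train_x)

# Toy data (assumed): one "orange" point sits inside the "blue" cluster.
train_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.15, 0.15),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["blue", "blue", "blue", "orange",
           "orange", "orange", "orange"]

print("1-NN training error:", training_error(train_x, train_y, 1))
print("3-NN training error:", training_error(train_x, train_y, 3))
```

With k = 1 every training point is its own nearest neighbor, so the training error is zero as long as the points are distinct. With k = 3 the isolated orange point is outvoted by its two blue neighbors, so a larger k smooths the boundary at the cost of some training errors.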

Model assessment for regression

MSE (mean squared error)

$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

training error


test error
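In practice the expectation in the MSE is replaced by an average over a sample: the training error averages the squared loss over the data used to fit the model, and the test error averages it over data the model has not seen. A minimal sketch, assuming a simple least-squares line as the model and made-up numbers; `fit_line` and `mse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def mse(predict, xs, ys):
    """Average squared loss (y - f(x))^2 over a sample."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data for illustration only.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]
test_x, test_y = [4.0, 5.0], [4.3, 4.6]

a, b = fit_line(train_x, train_y)

def predict(x):
    return a + b * x

train_mse = mse(predict, train_x, train_y)  # error on the data used for fitting
test_mse = mse(predict, test_x, test_y)     # error on held-out data

print(f"training MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

On this toy sample the test error is larger than the training error, which is the typical pattern because the model was tuned to the training data.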


Model assessment for classification

MCE (misclassification error)

$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

training error


test error
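For classification, the sample version of the MCE is the fraction of observations whose predicted label differs from the true label, computed on the training data or on held-out test data. A minimal sketch with a hypothetical, already fitted threshold rule and made-up labels:

```python
def misclassification_rate(predict, xs, ys):
    """Average of the 0-1 loss I(y != f(x)) over a sample."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)

def threshold_classifier(x):
    """Hypothetical fitted rule: predict class 1 when x exceeds 0.5."""
    return 1 if x > 0.5 else 0

# Made-up data for illustration only.
train_x, train_y = [0.1, 0.3, 0.6, 0.9], [0, 0, 1, 1]
test_x, test_y = [0.2, 0.4, 0.55, 0.7], [0, 1, 0, 1]

train_mce = misclassification_rate(threshold_classifier, train_x, train_y)
test_mce = misclassification_rate(threshold_classifier, test_x, test_y)

print("training MCE:", train_mce)
print("test MCE:", test_mce)
```

Here the rule separates the training sample perfectly but misclassifies two of the four test points.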


Validation set approach

If we have a large training set, we can estimate the test error by randomly splitting the data into a training part and a validation part. Use the training part to build the model, and then assess the model by applying it to the validation part.
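The split described above can be sketched in a few lines. The 30% validation fraction, the synthetic data, and the trivial "predict the training mean" model are all assumptions made for illustration; `train_validation_split` is a hypothetical helper name.

```python
import random

def train_validation_split(data, validation_fraction=0.3, seed=0):
    """Shuffle a copy of `data` and split it into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Synthetic (x, y) pairs, assumed for illustration.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
train, val = train_validation_split(data)

# Build the model on the training part only: here, the mean response.
mean_y = sum(y for _, y in train) / len(train)

# Assess it on the validation part to estimate the test error.
val_mse = sum((y - mean_y) ** 2 for _, y in val) / len(val)

print(len(train), "training points,", len(val), "validation points")
print("validation MSE:", val_mse)
```

The estimate depends on which points fall into the validation part, so a different seed gives a different value.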

LOOCV (leave-one-out cross-validation)

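Split the data set of size n into
a training set of size n − 1 and
a validation set of size 1.
Repeat this process n times, so that each observation is used for validation exactly once, and average the n validation errors.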
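The leave-one-out loop can be written directly from this description. A minimal sketch, assuming squared-error loss and a trivial model that predicts the mean of the training responses; `loocv_mse`, `fit_mean`, and `predict_mean` are hypothetical names, and the data are made up.

```python
def loocv_mse(xs, ys, fit, predict):
    """Leave-one-out CV estimate of the MSE for a given fit/predict pair."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Training set of size n - 1: everything except observation i
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        # Validation set of size 1: observation i
        total += (ys[i] - predict(model, xs[i])) ** 2
    return total / n

def fit_mean(xs, ys):
    """Assumed toy model: ignore x and remember the mean response."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 6.0]
cv_error = loocv_mse(xs, ys, fit_mean, predict_mean)
print("LOOCV MSE:", cv_error)
```

The model is refitted n times, which is cheap here but can be costly for large n or expensive models.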

K-fold CV
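In the usual K-fold procedure, the data are divided into K folds of roughly equal size; each fold serves once as the validation set while the other K − 1 folds are used for training, and the K validation errors are averaged. The sketch below assumes squared-error loss, contiguous unshuffled folds, and a trivial mean-response model; all function names and data are hypothetical.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for j in range(k):
        size = base + (1 if j < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_mse(xs, ys, k, fit, predict):
    """Average validation MSE over k folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in fold]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

def fit_mean(xs, ys):
    """Assumed toy model: ignore x and remember the mean response."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = k_fold_mse(xs, ys, 3, fit_mean, predict_mean)
print("3-fold CV MSE:", cv_error)
```

Setting k equal to the number of observations recovers LOOCV. With real data the observations would normally be shuffled before the folds are formed.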

