Contents
1. Density Estimation
1.1 Problem Motivation
1.2 Gaussian Distribution
1.3 Algorithm
2. Building an Anomaly Detection System
2.1 Developing and Evaluating an Anomaly Detection System
2.2 Anomaly Detection vs. Supervised Learning
2.3 Choose what features to use
3. Multivariate Gaussian Distribution
3.1 Multivariate Gaussian Distribution
3.2 Anomaly detection using the multivariate Gaussian distribution
4. Recommender Systems - Predicting Movie Ratings
4.1 Problem Formulation
4.2 Content-based recommendations
5. Recommender Systems - Collaborative Filtering
5.1 Collaborative Filtering
5.2 Collaborative filtering algorithm
6. Recommender Systems - Low Rank Matrix
6.1 Vectorization: Low Rank Factorization
6.2 Implementational detail: Mean normalization
1. Density Estimation
1.1 Problem Motivation
Anomaly detection example:
Aircraft engine features:
x1 = heat generated
x2 = vibration intensity
Dataset: {x^1, ..., x^m}
New engine: x_test
fig. 1 (from Coursera Week 9, Problem Motivation)
Density estimation:
Dataset: {x^(1), ..., x^(m)}
Is x_test anomalous?
===> Model p(x), where p(x) is a probability distribution model.
===> If p(x_test) < ε, flag x_test as an anomaly.
Examples:
(1) Fraud detection
(2) Manufacturing
(3) Monitoring computers in a data center
1.2 Gaussian Distribution
Parameter estimation:
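For reference, the univariate Gaussian density and its maximum-likelihood parameter estimates (the standard formulas from the lecture):

```latex
p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad
\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2
```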
1.3 Algorithm
Training set: {x^(1), ..., x^(m)}
Each example x ∈ R^n.
Anomaly detection algorithm:
(1) Choose features x_i that you think might be indicative of anomalous examples.
(2) Fit parameters μ_1, ..., μ_n, σ^2_1, ..., σ^2_n.
(3) Given a new example x, compute p(x); flag an anomaly if p(x) < ε.
E.g.
fig. 2 (from Coursera Week 9, Algorithm)
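A minimal NumPy sketch of the three steps above; the toy data and the ε value are illustrative only (in practice ε is chosen on a CV set):

```python
import numpy as np

def fit_gaussian(X):
    """Step (2): fit per-feature mean and variance on the training set (m x n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def p(X, mu, sigma2):
    """p(x) = product over features of the univariate Gaussian densities."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

# toy training set of presumed-normal examples
X_train = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2]])
mu, sigma2 = fit_gaussian(X_train)

epsilon = 1e-3                       # illustrative threshold
x_new = np.array([[5.0, -3.0]])      # far from the training data
is_anomaly = p(x_new, mu, sigma2) < epsilon   # step (3)
```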
2. Building an Anomaly Detection System
2.1 Developing and Evaluating an Anomaly Detection System
The importance of real-number evaluation:
When developing a learning algorithm, making decisions is much easier if we have a way of evaluating our learning algorithm.
Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).
Training set: {x^(1), ..., x^(m)} (generally assumed to be all normal; a few anomalies mixed in are acceptable)
CV set
Test set
Aircraft engines motivating example:
10000 good engines(normal)
20 flawed engines(anomalous)
===> training set: 6000 good engines
CV: 2000 good engines, 10 anomalous
Test: 2000 good engines, 10 anomalous
Algorithm evaluation:
Fit model p(x) on training set {x^1, ..., x^m}
On a CV/test example x, predict
y = 1, if p(x) < ε (anomaly)
y = 0, if p(x) >= ε (normal)
possible evaluation metrics:
- true positive, false positive, true negative, false negative
- Precision / Recall
- F1-score
note: the CV set can also be used to choose ε: pick the value that maximizes F1
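A sketch of choosing ε by scanning thresholds and keeping the one with the best F1 on the CV set; `p_cv` (the model's p(x) on CV examples) and `y_cv` (labels, 1 = anomalous) are assumed toy inputs:

```python
import numpy as np

def select_threshold(y_cv, p_cv):
    """Return the epsilon (and its F1) that best separates anomalies on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = (p_cv < eps).astype(int)          # 1 = flagged anomalous
        tp = np.sum((preds == 1) & (y_cv == 1))   # true positives
        fp = np.sum((preds == 1) & (y_cv == 0))   # false positives
        fn = np.sum((preds == 0) & (y_cv == 1))   # false negatives
        if tp == 0:
            continue                              # F1 undefined without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_eps = f1, eps
    return best_eps, best_f1

# toy CV set: the two anomalies have much smaller p(x)
p_cv = np.array([0.3, 0.25, 0.4, 1e-5, 2e-5])
y_cv = np.array([0, 0, 0, 1, 1])
eps, f1 = select_threshold(y_cv, p_cv)
```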
2.2 Anomaly Detection vs. Supervised Learning
fig. 3 (from Coursera Week 9, Anomaly Detection vs. Supervised Learning)
fig. 4 (from Coursera Week 9, Anomaly Detection vs. Supervised Learning)
2.3 Choose what features to use
Non-Gaussian features: plot the feature's histogram.
If the histogram of x_i looks like this:
fig. 5 (from Coursera Week 9, Choose what features to use)
then the feature is non-Gaussian; a log transform or a power transform x^c usually makes it more Gaussian.
fig. 6 (from Coursera Week 9, Choose what features to use)
Error analysis for anomaly detection:
We want p(x) to be large for normal examples x and small for anomalous examples. The most common problem: p(x) is comparable (and large) for both normal and anomalous examples. In that case, consider adding new features.
E.g.
fig. 7 (from Coursera Week 9, Choose what features to use)
3. Multivariate Gaussian Distribution
3.1 Multivariate Gaussian Distribution
x ∈ R^n. Instead of modeling p(x_1), ..., p(x_n) separately, model p(x) all in one.
Parameters: μ, Σ (covariance matrix)
3.2 Anomaly detection using the multivariate Gaussian distribution
Parameters: μ, Σ (covariance matrix)
Parameter fitting:
Given training set {x^(1), ..., x^(m)}
Anomaly detection with the multivariate Gaussian:
(1) Fit model p(x) by setting μ, Σ.
(2) Given a new example x, compute p(x); flag an anomaly if p(x) < ε.
fig. 8 (from Coursera Week 9, Anomaly detection using the multivariate Gaussian distribution)
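A NumPy sketch of the two steps above. The MLE fit is μ = mean, Σ = (1/m)Σ(x−μ)(x−μ)ᵀ; the correlated toy data and ε are illustrative:

```python
import numpy as np

def fit_multivariate(X):
    """MLE estimates: mean vector and (biased) covariance matrix."""
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / X.shape[0]
    return mu, Sigma

def multivariate_pdf(x, mu, Sigma):
    """Density of the multivariate Gaussian N(mu, Sigma) at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(0)
# strongly correlated features: a per-feature univariate model would miss this
X_train = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
mu, Sigma = fit_multivariate(X_train)

epsilon = 1e-3                      # illustrative; choose via F1 on a CV set
x_new = np.array([2.0, -2.0])       # each coordinate is plausible alone, not jointly
is_anomaly = multivariate_pdf(x_new, mu, Sigma) < epsilon
```

This is exactly the case the lecture's figure illustrates: the multivariate model flags combinations that break the features' correlation, which the axis-aligned model would accept.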
4. Recommender Systems - Predicting Movie Ratings
4.1 Problem Formulation
E.g. predicting movie ratings:
Users rate movies using zero to five stars.
n_u = # of users
n_m = # of movies
r(i, j) = 1 if user j has rated movie i
y^(i, j) = rating given by user j to movie i (defined only if r(i, j) = 1)
4.2 Content-based recommendations
Content-based recommender systems:
Suppose we have two features per movie: x_1 measures the degree to which a movie is a romance, and x_2 the degree to which it is an action movie.
fig. 9 (from Coursera Week 9, Content-based recommendations)
Set x_0 = 1
===> x^(1) = [1; 0.9; 0]
For each user j, learn a parameter vector θ^(j) ∈ R^(n+1). Predict user j's rating of movie i as (θ^(j))^T x^(i) stars.
m^(j) = # of movies rated by user j
To learn θ^(j):
To learn θ^(1), ..., θ^(n_u):
Optimization algorithm: gradient descent update:
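For reference, the content-based objectives and gradient descent updates from the lecture (single user, then all users):

```latex
\min_{\theta^{(j)}} \;
\frac{1}{2}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \;
\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha \sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right) x_k^{(i)}
\quad (k = 0)

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha \left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right) x_k^{(i)}
+ \lambda\,\theta_k^{(j)}\right)
\quad (k \neq 0)
```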
5. Recommender Systems - Collaborative Filtering
5.1 Collaborative Filtering
Optimization algorithm:
Given θ^(1), ..., θ^(n_u), learn x^(i):
Given θ^(1), ..., θ^(n_u), learn x^(1), ..., x^(n_m):
Collaborative filtering:
Given θ^(1), ..., θ^(n_u) we can estimate x^(1), ..., x^(n_m), and vice versa. Randomly initialize θ, then iterate: θ ---> x ---> θ ---> x ---> ...
5.2 Collaborative filtering algorithm
Collaborative filtering optimization objective:
Minimize over x^(1), ..., x^(n_m) and θ^(1), ..., θ^(n_u) simultaneously:
note: no x_0 = 1 convention here (the algorithm learns all features itself)
Collaborative filtering algorithm:
(1) Initialize x^(1), ..., x^(n_m), θ^(1), ..., θ^(n_u) to small random values.
(2) Minimize J using gradient descent.
(3) For a user with parameters θ and a movie with (learned) features x, predict a star rating of θ^T x.
For step (2):
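For reference, the joint objective and the step-(2) gradient descent updates from the lecture:

```latex
J\!\left(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}\right)
= \frac{1}{2}\sum_{(i,j):\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2
+ \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

x_k^{(i)} := x_k^{(i)}
- \alpha\left(\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)\theta_k^{(j)}
+ \lambda\,x_k^{(i)}\right)

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha\left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}
+ \lambda\,\theta_k^{(j)}\right)
```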
6. Recommender Systems - Low Rank Matrix
6.1 Vectorization: Low Rank Factorization
fig. 10 (from Coursera Week 9, Vectorization: Low Rank Factorization)
Predicted ratings = X Θ^T
note: collaborative filtering is also known as low-rank matrix factorization
Finding related movies:
For each product i, we learn a feature vector x^(i) ∈ R^n.
To find movies related to movie i, find the movies j with the smallest ||x^(i) - x^(j)||^2.
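A sketch of the related-movies lookup; the 2-D feature matrix is a toy stand-in for learned features:

```python
import numpy as np

def most_related(X, i, k=3):
    """Indices of the k movies whose feature vectors are closest to movie i."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)    # squared distances ||x(i) - x(j)||^2
    d2[i] = np.inf                          # exclude the movie itself
    return np.argsort(d2)[:k]

# toy learned features (rows = movies, columns = latent features)
X = np.array([[0.9, 0.0],    # romance-heavy
              [1.0, 0.1],    # similar to movie 0
              [0.0, 1.0],    # action-heavy
              [0.1, 0.9]])   # similar to movie 2
related = most_related(X, 0, k=1)
```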
6.2 Implementational detail: Mean normalization
Users who have not rated any movies:
fig. 11 (from Coursera Week 9, Implementational detail: Mean normalization)
===> all of user 5's predicted ratings are 0
Mean normalization:
fig. 12 (from Coursera Week 9, Implementational detail: Mean normalization)
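A sketch of mean normalization: subtract each movie's mean rating (over rated entries only), train on Y_norm, then add μ_i back when predicting, so a user with no ratings is predicted each movie's mean rather than 0. The small Y/R matrices are toy data:

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    mu = np.zeros(Y.shape[0])
    Y_norm = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        rated = R[i] == 1
        if rated.any():
            mu[i] = Y[i, rated].mean()
            Y_norm[i, rated] = Y[i, rated] - mu[i]
    return Y_norm, mu

# rows = movies, cols = users; the last user has rated nothing (like user 5 above)
Y = np.array([[5, 5, 0, 0, 0],
              [5, 4, 0, 0, 0],
              [0, 0, 5, 4, 0],
              [0, 0, 5, 0, 0]], dtype=float)
R = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0]])
Y_norm, mu = mean_normalize(Y, R)
# for the unrated user, theta^T x trains toward 0, so the prediction 0 + mu_i = mu_i
```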