Contents
1. Density Estimation
1.1 Problem Motivation
1.2 Gaussian Distribution
1.3 Algorithm
2. Building an Anomaly Detection System
2.1 Developing and Evaluating an Anomaly Detection System
2.2 Anomaly Detection vs. Supervised Learning
2.3 Choose what features to use
3. Multivariate Gaussian Distribution
3.1 Multivariate Gaussian Distribution
3.2 Anomaly detection using the multivariate Gaussian distribution
4. Recommender Systems - Predicting Movie Ratings
4.1 Problem Formulation
4.2 Content-based recommendations
5. Recommender Systems - Collaborative Filtering
5.1 Collaborative Filtering
5.2 Collaborative filtering algorithm
6. Recommender Systems - Low Rank Matrix
6.1 Vectorization: Low Rank Factorization
6.2 Implementational detail: Mean normalization
1. Density Estimation
1.1 Problem Motivation
Anomaly detection example:
Aircraft engine features:
x1 = heat generated
x2 = vibration intensity
Dataset: {x^1, ..., x^m}
New engine: x_test
fig. 1 (from Coursera Week 9, Problem Motivation)
Density estimation:
Dataset: {x^(1), ..., x^(m)}
Is x_test anomalous?
===> Model p(x), where p(x) is a probability distribution model.
===> If p(x_test) < ε, flag x_test as an anomaly.
Examples:
(1) Fraud detection
(2) Manufacturing
(3) Monitoring computers in a data center
1.2 Gaussian Distribution
Parameter estimation:
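For reference, the univariate Gaussian density and its maximum-likelihood parameter estimates (the standard formulas from the lecture):

```latex
p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad
\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2
```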
1.3 Algorithm
Training set: {x^(1), ..., x^(m)}
Each example x ∈ R^n.
Anomaly detection algorithm:
(1) Choose features x_i that you think might be indicative of anomalous examples.
(2) Fit parameters μ_1, ..., μ_n, σ^2_1, ..., σ^2_n.
(3) Given a new example x, compute p(x); flag an anomaly if p(x) < ε.
E.g.
fig. 2 (from Coursera Week 9, Algorithm)
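A minimal NumPy sketch of the three steps above; the toy data and the ε value are illustrative only (in practice ε is chosen on a CV set):

```python
import numpy as np

def fit_gaussian(X):
    """Step (2): fit per-feature mean and variance on the training set (m x n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)
    return mu, sigma2

def p(X, mu, sigma2):
    """p(x) = product over features of the univariate Gaussian densities."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

# toy training set of presumed-normal examples
X_train = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2]])
mu, sigma2 = fit_gaussian(X_train)

epsilon = 1e-3                       # illustrative threshold
x_new = np.array([[5.0, -3.0]])      # far from the training data
is_anomaly = p(x_new, mu, sigma2) < epsilon   # step (3)
```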
2. Building an Anomaly Detection System
2.1 Developing and Evaluating an Anomaly Detection System
The importance of real-number evaluation:
When developing a learning algorithm, making decisions is much easier if we have a way of evaluating our learning algorithm.
Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).
Training set: {x^(1), ..., x^(m)} (generally assumed to be all normal; a few anomalies mixed in are acceptable)
CV set
Test set
Aircraft engines motivating example:
10000 good engines(normal)
20 flawed engines(anomalous)
===> training set: 6000 good engines
CV: 2000 good engines, 10 anomalous
Test: 2000 good engines, 10 anomalous
Algorithm evaluation:
Fit model p(x) on training set {x^1, ..., x^m}
On a CV/test example x, predict
y = 1, if p(x) < ε (anomaly)
y = 0, if p(x) >= ε (normal)
possible evaluation metrics:
- true positive, false positive, true negative, false negative
- Precision / Recall
- F1-score
note: the CV set can also be used to choose ε: pick the value that maximizes F1
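A sketch of choosing ε by scanning thresholds and keeping the one with the best F1 on the CV set; `p_cv` (the model's p(x) on CV examples) and `y_cv` (labels, 1 = anomalous) are assumed toy inputs:

```python
import numpy as np

def select_threshold(y_cv, p_cv):
    """Return the epsilon (and its F1) that best separates anomalies on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = (p_cv < eps).astype(int)          # 1 = flagged anomalous
        tp = np.sum((preds == 1) & (y_cv == 1))   # true positives
        fp = np.sum((preds == 1) & (y_cv == 0))   # false positives
        fn = np.sum((preds == 0) & (y_cv == 1))   # false negatives
        if tp == 0:
            continue                              # F1 undefined without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_eps = f1, eps
    return best_eps, best_f1

# toy CV set: the two anomalies have much smaller p(x)
p_cv = np.array([0.3, 0.25, 0.4, 1e-5, 2e-5])
y_cv = np.array([0, 0, 0, 1, 1])
eps, f1 = select_threshold(y_cv, p_cv)
```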
2.2 Anomaly Detection vs. Supervised Learning
fig. 3 (from Coursera Week 9, Anomaly Detection vs. Supervised Learning)
fig. 4 (from Coursera Week 9, Anomaly Detection vs. Supervised Learning)
2.3 Choose what features to use
Non-Gaussian features: plot the feature's histogram.
If the histogram of x_i looks like this:
fig. 5 (from Coursera Week 9, Choose what features to use)
then the feature is non-Gaussian; a log transform or a power transform x^c usually makes it more Gaussian.
fig. 6 (from Coursera Week 9, Choose what features to use)
Error analysis for anomaly detection:
We want p(x) to be large for normal examples x and small for anomalous examples. The most common problem: p(x) is comparable (and large) for both normal and anomalous examples. In that case, consider adding new features.
E.g.
fig. 7 (from Coursera Week 9, Choose what features to use)
3. Multivariate Gaussian Distribution
3.1 Multivariate Gaussian Distribution
x ∈ R^n. Instead of modeling p(x_1), ..., p(x_n) separately, model p(x) all in one.
Parameters: μ, Σ (covariance matrix)
3.2 Anomaly detection using the multivariate Gaussian distribution
Parameters: μ, Σ (covariance matrix)
Parameter fitting:
Given training set {x^(1), ..., x^(m)}
Anomaly detection with the multivariate Gaussian:
(1) Fit model p(x) by setting μ, Σ.
(2) Given a new example x, compute p(x); flag an anomaly if p(x) < ε.
fig. 8 (from Coursera Week 9, Anomaly detection using the multivariate Gaussian distribution)
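A NumPy sketch of the two steps above. The MLE fit is μ = mean, Σ = (1/m)Σ(x−μ)(x−μ)ᵀ; the correlated toy data and ε are illustrative:

```python
import numpy as np

def fit_multivariate(X):
    """MLE estimates: mean vector and (biased) covariance matrix."""
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / X.shape[0]
    return mu, Sigma

def multivariate_pdf(x, mu, Sigma):
    """Density of the multivariate Gaussian N(mu, Sigma) at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(0)
# strongly correlated features: a per-feature univariate model would miss this
X_train = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
mu, Sigma = fit_multivariate(X_train)

epsilon = 1e-3                      # illustrative; choose via F1 on a CV set
x_new = np.array([2.0, -2.0])       # each coordinate is plausible alone, not jointly
is_anomaly = multivariate_pdf(x_new, mu, Sigma) < epsilon
```

This is exactly the case the lecture's figure illustrates: the multivariate model flags combinations that break the features' correlation, which the axis-aligned model would accept.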
4. Recommender Systems - Predicting Movie Ratings
4.1 Problem Formulation
E.g. predicting movie ratings:
Users rate movies using zero to five stars.
n_u = # of users
n_m = # of movies
r(i, j) = 1 if user j has rated movie i
y^(i, j) = rating given by user j to movie i (defined only if r(i, j) = 1)
4.2 Content-based recommendations
Content-based recommender systems:
Suppose we have two features per movie: x_1 measures the degree to which a movie is a romance, and x_2 the degree to which it is an action movie.
fig. 9 (from Coursera Week 9, Content-based recommendations)
Set x_0 = 1
===> x^(1) = [1; 0.9; 0]
For each user j, learn a parameter vector θ^(j) ∈ R^(n+1). Predict user j's rating of movie i as (θ^(j))^T x^(i) stars.
m^(j) = # of movies rated by user j
To learn θ^(j):
To learn θ^(1), ..., θ^(n_u):
Optimization algorithm: gradient descent update:
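For reference, the content-based objectives and gradient descent updates from the lecture (single user, then all users):

```latex
\min_{\theta^{(j)}} \;
\frac{1}{2}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \;
\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha \sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right) x_k^{(i)}
\quad (k = 0)

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha \left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right) x_k^{(i)}
+ \lambda\,\theta_k^{(j)}\right)
\quad (k \neq 0)
```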
5. Recommender Systems - Collaborative Filtering
5.1 Collaborative Filtering
Optimization algorithm:
Given θ^(1), ..., θ^(n_u), learn x^(i):
Given θ^(1), ..., θ^(n_u), learn x^(1), ..., x^(n_m):
Collaborative filtering:
Given θ^(1), ..., θ^(n_u) we can estimate x^(1), ..., x^(n_m), and vice versa. Randomly initialize θ, then iterate: θ ---> x ---> θ ---> x ---> ...
5.2 Collaborative filtering algorithm
Collaborative filtering optimization objective:
Minimize over x^(1), ..., x^(n_m) and θ^(1), ..., θ^(n_u) simultaneously:
note: no x_0 = 1 convention here (the algorithm learns all features itself)
Collaborative filtering algorithm:
(1) Initialize x^(1), ..., x^(n_m), θ^(1), ..., θ^(n_u) to small random values.
(2) Minimize J using gradient descent.
(3) For a user with parameters θ and a movie with (learned) features x, predict a star rating of θ^T x.
For step (2):
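For reference, the joint objective and the step-(2) gradient descent updates from the lecture:

```latex
J\!\left(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}\right)
= \frac{1}{2}\sum_{(i,j):\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2
+ \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2
+ \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2

x_k^{(i)} := x_k^{(i)}
- \alpha\left(\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)\theta_k^{(j)}
+ \lambda\,x_k^{(i)}\right)

\theta_k^{(j)} := \theta_k^{(j)}
- \alpha\left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}
+ \lambda\,\theta_k^{(j)}\right)
```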
6. Recommender Systems - Low Rank Matrix
6.1 Vectorization: Low Rank Factorization
fig. 10 (from Coursera Week 9, Vectorization: Low Rank Factorization)
Predicted ratings = X Θ^T
note: collaborative filtering is also known as low-rank matrix factorization
Finding related movies:
For each product i, we learn a feature vector x^(i) ∈ R^n.
To find movies related to movie i, find the movies j with the smallest ||x^(i) - x^(j)||^2.
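A sketch of the related-movies lookup; the 2-D feature matrix is a toy stand-in for learned features:

```python
import numpy as np

def most_related(X, i, k=3):
    """Indices of the k movies whose feature vectors are closest to movie i."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)    # squared distances ||x(i) - x(j)||^2
    d2[i] = np.inf                          # exclude the movie itself
    return np.argsort(d2)[:k]

# toy learned features (rows = movies, columns = latent features)
X = np.array([[0.9, 0.0],    # romance-heavy
              [1.0, 0.1],    # similar to movie 0
              [0.0, 1.0],    # action-heavy
              [0.1, 0.9]])   # similar to movie 2
related = most_related(X, 0, k=1)
```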
6.2 Implementational detail: Mean normalization
Users who have not rated any movies:
fig. 11 (from Coursera Week 9, Implementational detail: Mean normalization)
===> all of user 5's predicted ratings are 0
Mean normalization:
fig. 12 (from Coursera Week 9, Implementational detail: Mean normalization)
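A sketch of mean normalization: subtract each movie's mean rating (over rated entries only), train on Y_norm, then add μ_i back when predicting, so a user with no ratings is predicted each movie's mean rather than 0. The small Y/R matrices are toy data:

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    mu = np.zeros(Y.shape[0])
    Y_norm = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        rated = R[i] == 1
        if rated.any():
            mu[i] = Y[i, rated].mean()
            Y_norm[i, rated] = Y[i, rated] - mu[i]
    return Y_norm, mu

# rows = movies, cols = users; the last user has rated nothing (like user 5 above)
Y = np.array([[5, 5, 0, 0, 0],
              [5, 4, 0, 0, 0],
              [0, 0, 5, 4, 0],
              [0, 0, 5, 0, 0]], dtype=float)
R = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0]])
Y_norm, mu = mean_normalize(Y, R)
# for the unrated user, theta^T x trains toward 0, so the prediction 0 + mu_i = mu_i
```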