0 Introduction

1 Machine Learning
  • History: 1950–1970 logic rules; 1980–1990 knowledge acquisition; 2010– machine learning

  • Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence
  • machine learning
    • uses statistical techniques to “learn” from data
    • extracts features automatically, instead of relying on domain experts
    • learns automatically, instead of being explicitly programmed
  • Big Data + Big Computation + Big Model: why deep learning now
  • usage

2 Probability

  • Bayes’ Theorem

    • p(Y|X) = p(X|Y)p(Y) / p(X), where p(X) = ∑_Y p(X|Y)p(Y)
    • posterior ∝ likelihood × prior
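As a quick numeric check of Bayes’ theorem, here is a minimal sketch; the disease/test numbers are made-up illustration values, not from the notes:

```python
# Bayes' theorem: p(Y|X) = p(X|Y) p(Y) / p(X),
# with the marginal p(X) = sum_Y p(X|Y) p(Y).
# The prior/likelihood values below are illustrative assumptions.

prior = {"disease": 0.01, "healthy": 0.99}       # p(Y)
likelihood = {"disease": 0.95, "healthy": 0.05}  # p(X = positive | Y)

# Marginal p(X = positive), summing over all Y.
p_x = sum(likelihood[y] * prior[y] for y in prior)

# Posterior p(Y = disease | X = positive).
posterior = likelihood["disease"] * prior["disease"] / p_x
print(round(posterior, 4))
```

Even with a 95%-sensitive test, the small prior keeps the posterior low, which is exactly the posterior ∝ likelihood × prior intuition.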
  • variables

    • E[f] := the average value of f(X) under the distribution p(x)
    • E[f] = ∑_x p(x) f(x)
    • V[f], cov[x, y]
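The definitions above can be computed directly for any discrete distribution; a small sketch, using a fair die as an assumed example:

```python
# E[f] = sum_x p(x) f(x) for a discrete distribution p(x),
# and V[f] = E[f^2] - E[f]^2. The fair-die p(x) is an
# illustrative assumption.

p = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die

def f(x):
    return x  # identity, so E[f] is just the mean of the die

e_f = sum(p[x] * f(x) for x in p)                     # expectation
var_f = sum(p[x] * f(x) ** 2 for x in p) - e_f ** 2   # variance
print(e_f, var_f)
```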
  • distributions

    • binomial distribution
    • Bin(m|N, μ) = C(N, m) μ^m (1−μ)^(N−m)
    • E[m] = Nμ, var[m] = Nμ(1−μ)
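The mean and variance formulas can be verified numerically from the pmf itself; a sketch with arbitrarily chosen N and μ:

```python
from math import comb

# Bin(m | N, mu) = C(N, m) mu^m (1-mu)^(N-m).
# The mean and variance computed from this pmf should match
# E[m] = N*mu and var[m] = N*mu*(1-mu). N and mu are arbitrary.

N, mu = 10, 0.3
pmf = [comb(N, m) * mu**m * (1 - mu) ** (N - m) for m in range(N + 1)]

mean = sum(m * p for m, p in enumerate(pmf))
var = sum(m**2 * p for m, p in enumerate(pmf)) - mean**2
print(mean, var)  # should be close to N*mu = 3.0 and N*mu*(1-mu) = 2.1
```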
  • multinomial variables

    • x can take one of K values; e.g. x = (0, 0, 1, 0, 0, 0)^T is the one-hot vector meaning x takes the third of six values

    • μ = (μ_1, μ_2, ..., μ_K)^T, where μ_k is the probability that position k of x is 1

    • so the probability of a particular x is p(x|μ) = ∏_{k=1}^{K} μ_k^{x_k} (i.e. just the μ_k of the active component)

    • E[x|μ] = ∑_x p(x|μ) x = (μ_1, μ_2, ..., μ_K)^T = μ

    • maximum likelihood estimation

    μ_k = m_k / N, where m_k = ∑_{n=1}^{N} x_{nk}
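The MLE above is just counting; a minimal sketch, where the one-hot data (K = 3 categories, N = 5 draws) is an illustrative assumption:

```python
# Maximum-likelihood estimate for a multinomial variable:
# mu_k = m_k / N, where m_k counts how often the one-hot
# vector x_n has a 1 in position k. The data is made up.

data = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]  # each row is one one-hot observation x_n

N, K = len(data), len(data[0])
m = [sum(x[k] for x in data) for k in range(K)]  # counts m_k
mu_hat = [m_k / N for m_k in m]                  # MLE: mu_k = m_k / N
print(mu_hat)
```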

  • univariate Gaussian (normal) distribution

    • multivariate gaussian distribution
    • maximum likelihood estimation
    • mixture of Gaussians: can approximate a wide range of other distributions
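The notes only name the univariate Gaussian and its maximum likelihood estimation; the standard formulas, filled in here for reference, are:

```latex
% Univariate Gaussian density and its ML estimates
% (standard results; the notes only name the topics).
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \hat{\mu}\right)^2
```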
  • gradient descent

    • a way to minimize an objective function J(θ)
    • η: learning rate, which determines the size of the steps we take to reach a local minimum
    • update equation: θ = θ − η ∇_θ J(θ)
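The update equation can be sketched on a toy problem; the objective J(θ) = (θ − 4)², its gradient, and the settings below are illustrative assumptions:

```python
# Gradient descent: theta = theta - eta * grad J(theta),
# minimizing the toy objective J(theta) = (theta - 4)^2,
# whose gradient is 2 * (theta - 4). Settings are arbitrary.

def grad_J(theta):
    return 2.0 * (theta - 4.0)  # dJ/dtheta for J = (theta - 4)^2

theta = 0.0  # initial parameter
eta = 0.1    # learning rate: size of each step toward a local minimum
for _ in range(100):
    theta = theta - eta * grad_J(theta)  # the update equation

print(theta)  # converges toward the minimizer theta = 4
```

Too large an η makes the iterates diverge (here, any η ≥ 1 overshoots), which is why the learning rate matters.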

Course notes from deep learning class, day 1, 2018-07-13.

Related posts: