0 Introduction

1 Machine Learning
  • History: 1950–1970 logic rules; 1980–1990 knowledge acquisition; 2010– machine learning

  • Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence
  • machine learning
    • uses statistical techniques to “learn” from data
    • extracts features automatically, instead of relying on domain experts
    • learns automatically, instead of being explicitly programmed
  • Big Data + Big Computation + Big Model: why deep learning now
  • usage

2 Probability

  • Bayes’ Theorem

    • p(Y|X) = p(X|Y)p(Y) / p(X), where p(X) = ∑_Y p(X|Y)p(Y)
    • posterior ∝ likelihood × prior
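As a quick numeric check of Bayes’ theorem, here is a minimal sketch; the disease/test numbers are made-up illustration values, not from the notes:

```python
# Bayes' theorem: p(Y|X) = p(X|Y) p(Y) / p(X),
# with the marginal p(X) = sum_Y p(X|Y) p(Y).
# The prior/likelihood values below are illustrative assumptions.

prior = {"disease": 0.01, "healthy": 0.99}       # p(Y)
likelihood = {"disease": 0.95, "healthy": 0.05}  # p(X = positive | Y)

# Marginal p(X = positive), summing over all Y.
p_x = sum(likelihood[y] * prior[y] for y in prior)

# Posterior p(Y = disease | X = positive).
posterior = likelihood["disease"] * prior["disease"] / p_x
print(round(posterior, 4))
```

Even with a 95%-sensitive test, the small prior keeps the posterior low, which is exactly the posterior ∝ likelihood × prior intuition.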
  • variables

    • E[f] := the average value of f(X) under the distribution p(x)
    • E[f] = ∑_x p(x) f(x)
    • V[f], cov[x, y]
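The definitions above can be computed directly for any discrete distribution; a small sketch, using a fair die as an assumed example:

```python
# E[f] = sum_x p(x) f(x) for a discrete distribution p(x),
# and V[f] = E[f^2] - E[f]^2. The fair-die p(x) is an
# illustrative assumption.

p = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die

def f(x):
    return x  # identity, so E[f] is just the mean of the die

e_f = sum(p[x] * f(x) for x in p)                     # expectation
var_f = sum(p[x] * f(x) ** 2 for x in p) - e_f ** 2   # variance
print(e_f, var_f)
```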
  • distributions

    • binomial distribution
    • Bin(m|N, μ) = C(N, m) μ^m (1−μ)^(N−m)
    • E[m] = Nμ, var[m] = Nμ(1−μ)
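The mean and variance formulas can be verified numerically from the pmf itself; a sketch with arbitrarily chosen N and μ:

```python
from math import comb

# Bin(m | N, mu) = C(N, m) mu^m (1-mu)^(N-m).
# The mean and variance computed from this pmf should match
# E[m] = N*mu and var[m] = N*mu*(1-mu). N and mu are arbitrary.

N, mu = 10, 0.3
pmf = [comb(N, m) * mu**m * (1 - mu) ** (N - m) for m in range(N + 1)]

mean = sum(m * p for m, p in enumerate(pmf))
var = sum(m**2 * p for m, p in enumerate(pmf)) - mean**2
print(mean, var)  # should be close to N*mu = 3.0 and N*mu*(1-mu) = 2.1
```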
  • multinomial variables

    • x can take one of K values; e.g. x = (0, 0, 1, 0, 0, 0)^T is the one-hot vector meaning x takes the third of six values

    • μ = (μ_1, μ_2, ..., μ_K)^T, where μ_k is the probability that position k of x is 1

    • so the probability of a particular x is p(x|μ) = ∏_{k=1}^{K} μ_k^{x_k} (i.e. just the μ_k of the active component)

    • E[x|μ] = ∑_x p(x|μ) x = (μ_1, μ_2, ..., μ_K)^T = μ

    • maximum likelihood estimation

    μ_k = m_k / N, where m_k = ∑_{n=1}^{N} x_{nk}
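The MLE above is just counting; a minimal sketch, where the one-hot data (K = 3 categories, N = 5 draws) is an illustrative assumption:

```python
# Maximum-likelihood estimate for a multinomial variable:
# mu_k = m_k / N, where m_k counts how often the one-hot
# vector x_n has a 1 in position k. The data is made up.

data = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]  # each row is one one-hot observation x_n

N, K = len(data), len(data[0])
m = [sum(x[k] for x in data) for k in range(K)]  # counts m_k
mu_hat = [m_k / N for m_k in m]                  # MLE: mu_k = m_k / N
print(mu_hat)
```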

  • univariate Gaussian (normal) distribution

    • multivariate gaussian distribution
    • maximum likelihood estimation
    • mixture of Gaussians: can approximate a wide range of other distributions
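The notes only name the univariate Gaussian and its maximum likelihood estimation; the standard formulas, filled in here for reference, are:

```latex
% Univariate Gaussian density and its ML estimates
% (standard results; the notes only name the topics).
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \hat{\mu}\right)^2
```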
  • gradient descent

    • a way to minimize an objective function J(θ)
    • η: learning rate, which determines the size of the steps we take to reach a local minimum
    • update equation: θ = θ − η ∇_θ J(θ)
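The update equation can be sketched on a toy problem; the objective J(θ) = (θ − 4)², its gradient, and the settings below are illustrative assumptions:

```python
# Gradient descent: theta = theta - eta * grad J(theta),
# minimizing the toy objective J(theta) = (theta - 4)^2,
# whose gradient is 2 * (theta - 4). Settings are arbitrary.

def grad_J(theta):
    return 2.0 * (theta - 4.0)  # dJ/dtheta for J = (theta - 4)^2

theta = 0.0  # initial parameter
eta = 0.1    # learning rate: size of each step toward a local minimum
for _ in range(100):
    theta = theta - eta * grad_J(theta)  # the update equation

print(theta)  # converges toward the minimizer theta = 4
```

Too large an η makes the iterates diverge (here, any η ≥ 1 overshoots), which is why the learning rate matters.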

Course notes from deep learning class, day 1, 2018-07-13.

Related posts: