LDA: Let's Learn It Together [Repost]

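First, I previously wrote up a summary of a reading guide by someone at BUPT (Beijing University of Posts and Telecommunications); the link is here.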

You should definitely read Blei's 2003 paper (click to download).

Next, and just as important, are Blei's video lecture and an 80-plus-page set of lecture notes.

Topic Models

Latent Dirichlet Allocation (LDA) [pdf] is an unsupervised learning model, proposed in recent years, that can represent text in terms of its topics.

The key idea: it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics.

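In other words, a document is viewed as a mixture of topics, and each word in it has some probability of being assigned to each topic.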
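To make the "mixture of topics" idea concrete, here is a minimal Python sketch of LDA's generative story: draw a topic mixture theta for each document from a Dirichlet prior, then for every word pick a topic from theta and a word from that topic. The vocabulary, the two hand-made topics in `beta`, and the `alpha` values are hypothetical toy choices, not taken from any paper; the topic-word distributions are held fixed, as in the basic model of Blei et al. 2003.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw: independent Gamma(alpha_k, 1) draws, normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Return index k with probability probs[k].
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, vocab, beta, alpha, seed=0):
    """Toy LDA generative process (illustrative sketch).
    beta[k]: topic k's distribution over vocab (fixed, hypothetical values).
    alpha: Dirichlet prior on each document's topic mixture theta."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)      # per-document topic mixture
        doc = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # choose a topic for this word
            w = sample_categorical(beta[z], rng)  # choose a word from that topic
            doc.append(vocab[w])
        corpus.append(doc)
    return corpus

# Hypothetical toy setup: two topics over a six-word vocabulary.
vocab = ["gene", "dna", "cell", "ball", "team", "score"]
beta = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology"-like topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "sports"-like topic
]
alpha = [0.5, 0.5]

for doc in generate_corpus(3, 8, vocab, beta, alpha):
    print(" ".join(doc))
```

A small `alpha` (below 1) pushes each theta toward one dominant topic, which matches the "small number of topics per document" assumption quoted above.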

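Relation to Probabilistic Latent Semantic Analysis (PLSA): LDA can be viewed as a Bayesian version of PLSA, one that places a Dirichlet prior on each document's topic mixture.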
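The difference shows up directly in the formulas, written here in the notation of Blei et al. 2003 (for one document d with N words). PLSA treats each training document's topic weights p(z | d) as free parameters, while LDA replaces them with a random mixture θ drawn from Dirichlet(α):

```latex
% PLSA: p(z \mid d) is a separate parameter vector for every training document
p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)

% LDA: the topic mixture \theta is drawn from a Dirichlet prior
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

Because θ comes from a prior rather than being fitted per document, LDA gives a well-defined way to assign topic mixtures to documents it has not seen, and its parameter count does not grow with the size of the training corpus.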

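LDA re-expresses each document in terms of topic dimensions instead of the word dimensions of the original vector space model; this dimensionality reduction is very useful.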
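One way to see this word-to-topic mapping in practice is a small collapsed Gibbs sampler, which turns each document (a bag of word ids over a V-word vocabulary) into a K-dimensional topic vector theta. Note that this is a swapped-in inference method: Blei et al. 2003 fit LDA with variational inference, whereas collapsed Gibbs sampling is the approach used by tools such as GibbsLDA++. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are all hypothetical; treat this as a sketch, not a reference implementation.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).
    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns theta: one num_topics-dimensional topic vector per document."""
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []                              # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights, k=1)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimate of each document's topic mixture (the new, low-dimensional vector).
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical toy corpus: ids 0-2 act like "biology" words, 3-5 like "sports" words.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 5], [0, 2, 1, 2], [4, 3, 5, 4]]
theta = lda_gibbs(docs, num_topics=2, vocab_size=6)
for d, vec in enumerate(theta):
    print(d, [round(p, 2) for p in vec])
```

Each document goes from a 6-dimensional word-count vector to a 2-dimensional topic vector; on a real corpus the reduction is from tens of thousands of word dimensions to a few dozen or hundred topics.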

LDA source code in Java, C, Matlab, Python, and other languages:

Code-python

deltaLDA.tgz

Other LDA implementations

lda-c(C)
GibbsLDA++(C++)
Matlab Topic Modeling Toolbox 1.3.2(Matlab/C via MEX)
lda-j(Java)
lda (C and Matlab)

Reposted from: http://www.zhizhihu.com/html/y2010/1465.html