(Source: http://isip.buaa.edu.cn/lichen/?p=376)

The Bernoulli distribution, together with its beta prior, models a binary random variable such as the outcome of a coin toss. But what if the random variable can take one of several mutually exclusive values, say $K$ possible choices? A random variable that can take one of $K$ mutually exclusive states can be represented by a $K$-dimensional vector $\mathbf{x}$ in which one element $x_k$ equals 1 and all remaining elements equal 0. For example, if a variable has $K = 6$ states and a particular observation corresponds to the state $x_3 = 1$, then $\mathbf{x}$ is represented as

$$\mathbf{x} = (0, 0, 1, 0, 0, 0)^{\mathrm{T}} \tag{2.25}$$
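The 1-of-$K$ representation can be sketched in a few lines of Python. The helper name `one_hot` is hypothetical, and the values $K = 6$, $k = 3$ are illustrative choices; indices are 1-based to match the notation $x_k$.

```python
def one_hot(k, K):
    """Return the 1-of-K vector with a 1 in position k (1-based) and 0 elsewhere."""
    if not 1 <= k <= K:
        raise ValueError("k must lie in 1..K")
    return [1 if j == k else 0 for j in range(1, K + 1)]


x = one_hot(3, 6)
print(x)       # [0, 0, 1, 0, 0, 0]
print(sum(x))  # 1, since exactly one element is set
```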

Such vectors satisfy $\sum_{k=1}^{K} x_k = 1$. If we denote the probability of $x_k = 1$ by the parameter $\mu_k$, then the distribution of $\mathbf{x}$ is

$$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k} \tag{2.26}$$

where $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)^{\mathrm{T}}$, and the parameters $\mu_k$ satisfy $\mu_k \geq 0$ and $\sum_{k} \mu_k = 1$, because they represent probabilities. The distribution (2.26) can be viewed as a generalization of the Bernoulli distribution to more than two outcomes. It is normalized:

$$\sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1 \tag{2.27}$$

Its mean is

$$\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu})\, \mathbf{x} = (\mu_1, \dots, \mu_K)^{\mathrm{T}} = \boldsymbol{\mu} \tag{2.28}$$
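Normalization and the mean can be checked by enumerating all $K$ one-hot states. This is a minimal sketch: the parameter values in `mu` and the helper name `prob` are assumptions made for illustration, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical parameters: non-negative and summing to one.
mu = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
K = len(mu)


def prob(x, mu):
    """p(x | mu) = prod_k mu_k ** x_k for a 1-of-K vector x."""
    p = Fraction(1)
    for x_k, mu_k in zip(x, mu):
        p *= mu_k ** x_k
    return p


# The K possible states: one vector per position of the single 1.
states = [[1 if j == k else 0 for j in range(K)] for k in range(K)]

total = sum(prob(x, mu) for x in states)                            # sum_x p(x | mu)
mean = [sum(prob(x, mu) * x[j] for x in states) for j in range(K)]  # E[x | mu]

print(total == 1)  # True
print(mean == mu)  # True
```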

Now consider a data set $\mathcal{D}$ of $N$ independent observations $\mathbf{x}_1, \dots, \mathbf{x}_N$. The corresponding likelihood function is

$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\left(\sum_{n} x_{nk}\right)} = \prod_{k=1}^{K} \mu_k^{m_k} \tag{2.29}$$

The likelihood function depends on the $N$ data points only through the $K$ quantities

$$m_k = \sum_{n} x_{nk} \tag{2.30}$$

each of which counts the observations for which $x_k = 1$. These are called the sufficient statistics for this distribution.
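A short sketch of why the counts suffice: the likelihood computed observation by observation equals the likelihood computed from the counts $m_k$ alone. The data set `data` and the parameters `mu` below are made up for illustration.

```python
import math

# Hypothetical data set: N = 5 one-hot observations over K = 3 states.
data = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
]
mu = [0.5, 0.3, 0.2]  # hypothetical parameters

# m_k = sum_n x_nk, i.e. the column sums of the data matrix.
m = [sum(column) for column in zip(*data)]
print(m)  # [3, 1, 1]

# Likelihood as a product over every observation and every component.
lik_pointwise = math.prod(
    mu_k ** x_nk for x in data for mu_k, x_nk in zip(mu, x)
)

# Likelihood using only the sufficient statistics.
lik_counts = math.prod(mu_k ** m_k for mu_k, m_k in zip(mu, m))

print(math.isclose(lik_pointwise, lik_counts))  # True
```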

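To find the maximum likelihood estimate of $\boldsymbol{\mu}$, we maximize $\ln p(\mathcal{D} \mid \boldsymbol{\mu})$ with respect to $\mu_k$, subject to the constraint that the $\mu_k$ sum to one. Introducing a Lagrange multiplier $\lambda$, we maximize

$$\sum_{k=1}^{K} m_k \ln \mu_k + \lambda \left( \sum_{k=1}^{K} \mu_k - 1 \right) \tag{2.31}$$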

Setting the derivative of (2.31) with respect to $\mu_k$ to zero gives

$$\mu_k = -m_k / \lambda \tag{2.32}$$
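Written out, the differentiation step is as follows. The summation index $j$ is introduced here only to keep it distinct from the component $k$ being differentiated.

$$\frac{\partial}{\partial \mu_k} \left[ \sum_{j=1}^{K} m_j \ln \mu_j + \lambda \left( \sum_{j=1}^{K} \mu_j - 1 \right) \right] = \frac{m_k}{\mu_k} + \lambda = 0 \quad \Longrightarrow \quad \mu_k = -\frac{m_k}{\lambda}$$

Only the $j = k$ term of each sum depends on $\mu_k$, which is why a single fraction and a single $\lambda$ remain.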

Substituting (2.32) into the constraint $\sum_{k} \mu_k = 1$ gives $-\frac{1}{\lambda} \sum_{k} m_k = 1$, and since $\sum_{k} m_k = N$ this yields $\lambda = -N$. The maximum likelihood solution is therefore

$$\mu_k^{\mathrm{ML}} = \frac{m_k}{N} \tag{2.33}$$

This is simply the fraction of the $N$ observations for which $x_k = 1$.
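The estimate $m_k / N$ can be sketched as follows. The observations are stored as hypothetical 0-based state indices rather than one-hot vectors, and the comparison point `other` is an arbitrary feasible alternative chosen only to illustrate that the estimate attains a higher log-likelihood.

```python
import math
from collections import Counter

# Hypothetical observations: the index (0-based) of the state seen each time.
observations = [2, 0, 1, 0, 0, 2, 1, 0]
K = 3
N = len(observations)

counts = Counter(observations)
m = [counts[k] for k in range(K)]  # sufficient statistics m_k
mu_ml = [m_k / N for m_k in m]     # maximum likelihood estimate m_k / N

print(m)      # [4, 2, 2]
print(mu_ml)  # [0.5, 0.25, 0.25]


def log_lik(mu):
    """ln p(D | mu) = sum_k m_k ln mu_k, using the counts m from above."""
    return sum(m_k * math.log(mu_k) for m_k, mu_k in zip(m, mu))


other = [0.4, 0.3, 0.3]  # another valid parameter vector, for comparison
print(log_lik(mu_ml) > log_lik(other))  # True
```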

We can also consider the joint distribution of the quantities $m_1, \dots, m_K$, conditioned on the parameters $\boldsymbol{\mu}$ and on the total number $N$ of observations. From (2.29) this takes the form

$$\mathrm{Mult}(m_1, m_2, \dots, m_K \mid \boldsymbol{\mu}, N) = \binom{N}{m_1\, m_2 \cdots m_K} \prod_{k=1}^{K} \mu_k^{m_k} \tag{2.34}$$

This is the multinomial distribution. The normalization coefficient is the number of ways of partitioning $N$ objects into $K$ groups of sizes $m_1, \dots, m_K$, namely

$$\binom{N}{m_1\, m_2 \cdots m_K} = \frac{N!}{m_1!\, m_2! \cdots m_K!} \tag{2.35}$$

Note that the variables $m_k$ must satisfy the constraint

$$\sum_{k=1}^{K} m_k = N \tag{2.36}$$
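As a rough check of the multinomial distribution, the sketch below evaluates the coefficient $N! / (m_1! \cdots m_K!)$ and confirms that the probabilities sum to one over all count vectors with $\sum_k m_k = N$. The function names and the values of `mu` and `N` are assumptions chosen for illustration.

```python
import math
from itertools import product


def multinomial_coeff(m):
    """Number of ways to split sum(m) objects into groups of sizes m_1..m_K."""
    coeff = math.factorial(sum(m))
    for m_k in m:
        coeff //= math.factorial(m_k)  # each division is exact
    return coeff


def multinomial_pmf(m, mu):
    """Mult(m | mu, N) with N = sum(m)."""
    return multinomial_coeff(m) * math.prod(
        mu_k ** m_k for mu_k, m_k in zip(mu, m)
    )


mu = [0.5, 0.3, 0.2]  # hypothetical parameters
N = 4

# Every count vector (m_1, ..., m_K) that satisfies sum_k m_k = N.
configs = [m for m in product(range(N + 1), repeat=len(mu)) if sum(m) == N]
total = sum(multinomial_pmf(m, mu) for m in configs)

print(multinomial_coeff((2, 1, 1)))  # 12
print(math.isclose(total, 1.0))      # True
```

Enumerating every count vector is only practical for small $N$ and $K$; it is used here purely to make the normalization visible.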
