1.4 Bayesian Estimation

1.4.0 Fundamentals of Parameter Estimation

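In practice, once a signal has been detected, its parameters also need to be measured. Because the signal is contaminated by random noise, its parameters cannot be measured exactly, so statistical estimation methods are used to estimate them as accurately as possible. If the quantity to be estimated is a random variable or a non-random unknown constant, the problem is called signal parameter estimation; if it is a random process or a non-random unknown process, it is called waveform estimation or state estimation. Thus, in parameter estimation the estimated parameter does not change during the observation interval (static estimation), whereas in waveform or state estimation the signal parameters vary with time (dynamic estimation).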

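To estimate the signal parameters, observation data are needed. Let the observation equation be

$$z_k = h_k\,\theta + n_k,\qquad k = 1, 2, \dots, N$$

where $z_k$ is the $k$-th observation, $\theta$ is the quantity to be estimated, $n_k$ is the $k$-th measurement noise, and $h_k$ is a known observation coefficient.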

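The problem is then to use the $N$ observations

$$\mathbf z = (z_1, z_2, \dots, z_N)^T$$

to estimate the parameter $\theta$ according to some optimality criterion, that is, to construct a function of the observations, $\hat\theta = \hat\theta(\mathbf z)$, that serves as the estimate of $\theta$.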
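To make the scalar observation model concrete, here is a minimal Python sketch that simulates $z_k = h_k\theta + n_k$ and applies one possible estimator $\hat\theta(\mathbf z)$. It assumes zero-mean i.i.d. Gaussian noise and made-up coefficients $h_k$. The averaged-ratio rule `ratio_mean_estimator` is a hypothetical, illustrative choice rather than an optimal estimator. It only shows that an estimate is nothing more than a function of the observations.

```python
import random

def simulate_observations(theta, h, noise_std, rng):
    """Generate z_k = h_k * theta + n_k, assuming n_k ~ N(0, noise_std**2) i.i.d."""
    return [h_k * theta + rng.gauss(0.0, noise_std) for h_k in h]

def ratio_mean_estimator(z, h):
    """Hypothetical estimator theta_hat(z): the average of z_k / h_k.
    Any function of z is an estimator; this one is illustrative, not optimal."""
    return sum(z_k / h_k for z_k, h_k in zip(z, h)) / len(z)

rng = random.Random(0)                  # fixed seed for reproducibility
theta_true = 2.0                        # unknown in practice; fixed here to generate data
h = [1.0, 0.5, 2.0, 1.5] * 25           # N = 100 known coefficients (made-up values)
z = simulate_observations(theta_true, h, noise_std=0.1, rng=rng)
theta_hat = ratio_mean_estimator(z, h)
print(f"theta_hat = {theta_hat:.3f}")
```

The optimality criteria in the following sections are what replace this ad hoc rule with a principled one.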

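If the quantity to be estimated is a $p$-dimensional vector $\boldsymbol\theta = (\theta_1, \theta_2, \dots, \theta_p)^T$, the observation equation can generally be written as

$$\mathbf z_k = \mathbf H_k\,\boldsymbol\theta + \mathbf n_k,\qquad k = 1, 2, \dots, N$$

where $\mathbf z_k$ is the $q$-dimensional observation vector of the $k$-th observation, $\boldsymbol\theta$ is the $p$-dimensional vector to be estimated, $\mathbf n_k$ is the $q$-dimensional observation noise vector of the $k$-th observation, and $\mathbf H_k$ is the $q\times p$ observation matrix.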

Figure: Statistical model of signal parameter estimation
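The vector model changes only the shapes involved. The following minimal stdlib-only sketch assumes i.i.d. Gaussian noise components and randomly drawn (hypothetical) observation matrices. It shows that each $\mathbf H_k$ is $q\times p$ and each $\mathbf z_k$ is $q$-dimensional.

```python
import random

def matvec(H, theta):
    """Multiply a q x p matrix (list of rows) by a length-p vector."""
    return [sum(h_ij * t_j for h_ij, t_j in zip(row, theta)) for row in H]

def observe(H_k, theta, noise_std, rng):
    """One q-dimensional observation z_k = H_k theta + n_k (i.i.d. Gaussian noise assumed)."""
    return [c + rng.gauss(0.0, noise_std) for c in matvec(H_k, theta)]

rng = random.Random(1)
p, q, N = 3, 2, 5
theta = [1.0, -2.0, 0.5]   # hypothetical p-dimensional parameter vector
# N observation matrices, each q x p (made-up random entries)
H = [[[rng.uniform(-1, 1) for _ in range(p)] for _ in range(q)] for _ in range(N)]
Z = [observe(H_k, theta, 0.05, rng) for H_k in H]
print(len(Z), len(Z[0]))   # prints: 5 2
```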

1.4.1 Common Cost Functions and Bayesian Estimation

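In signal estimation problems, the quantity to be estimated, $\theta$, and its estimate $\hat\theta$ are continuous random variables, so a cost $C(\theta, \hat\theta)$ is assigned to every pair $(\theta, \hat\theta)$. In general, the cost function $C$ is a function of the two variables $\theta$ and $\hat\theta$.

In practice, however, it is defined as a function of the estimation error $\tilde\theta = \theta - \hat\theta$, i.e. $C(\theta, \hat\theta) = C(\tilde\theta)$, a single-variable function of the error.

The cost function $C(\tilde\theta)$ is typically one of the following: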

1.4.1.1 Squared-Error Cost Function (Quadratic Cost Solution)

(MMSE estimator)

$$C(\tilde\theta) = \tilde\theta^2 = (\theta - \hat\theta)^2$$

1.4.1.2 Absolute-Error Cost Function (Absolute Cost Solution)

(posterior median estimator)

$$C(\tilde\theta) = |\tilde\theta| = |\theta - \hat\theta|$$

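1.4.1.3 Uniform Cost Function (Hit-or-Miss)

(Maximum a Posteriori (MAP) estimator)

$$C(\tilde\theta) = \begin{cases} 0, & |\tilde\theta| \le \Delta/2 \\ 1, & |\tilde\theta| > \Delta/2 \end{cases}$$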

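Other forms of cost function can be chosen besides these three, but any cost function should satisfy two properties: it is non-negative, and it attains its minimum as the error $\tilde\theta$ tends to zero.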
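The three cost functions above can be written as plain Python functions of the error $\tilde\theta$. This sketch uses hypothetical function names and spot-checks the two required properties (non-negativity and a minimum at zero error) on a grid of error values. It is a spot check, not a proof.

```python
def quadratic_cost(err):
    """C(e) = e^2 (leads to the MMSE estimator)."""
    return err * err

def absolute_cost(err):
    """C(e) = |e| (leads to the posterior-median estimator)."""
    return abs(err)

def uniform_cost(err, delta):
    """Hit-or-miss cost: 0 when |e| <= delta/2, 1 otherwise (leads to MAP as delta -> 0)."""
    return 0.0 if abs(err) <= delta / 2 else 1.0

errors = [i / 10 for i in range(-30, 31)]   # error values from -3.0 to 3.0
for cost in (quadratic_cost, absolute_cost, lambda e: uniform_cost(e, 0.5)):
    values = [cost(e) for e in errors]
    assert min(values) >= 0                  # non-negativity
    assert cost(0.0) == min(values)          # minimum at zero error
```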

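1.4.1.4 Bayesian Estimation

The quantity to be estimated, $\theta$, is a random variable with prior probability density $p(\theta)$. The cost $C(\theta, \hat\theta(\mathbf z))$ is then a function of the random parameter $\theta$ and the observation $\mathbf z$, so the average cost is

$$\bar C = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} C(\theta, \hat\theta(\mathbf z))\, p(\mathbf z, \theta)\, d\theta\, d\mathbf z$$

The estimate that minimizes the average cost $\bar C$ is the Bayesian estimate.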

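Using the conditional-probability formula from probability theory,

$$p(\mathbf z, \theta) = p(\theta \mid \mathbf z)\, p(\mathbf z)$$

the average cost can be rewritten as

$$\bar C = \int_{-\infty}^{\infty} p(\mathbf z) \left[\int_{-\infty}^{\infty} C(\theta, \hat\theta(\mathbf z))\, p(\theta \mid \mathbf z)\, d\theta\right] d\mathbf z$$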

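Since $p(\mathbf z) \ge 0$ and the inner integral over $\theta$ is non-negative, minimizing $\bar C$ is equivalent to minimizing the inner integral for each $\mathbf z$, i.e.

$$C(\hat\theta \mid \mathbf z) = \int_{-\infty}^{\infty} C(\theta, \hat\theta)\, p(\theta \mid \mathbf z)\, d\theta$$

$C(\hat\theta \mid \mathbf z)$ is called the conditional average cost. Minimizing it with respect to $\hat\theta$ yields the Bayesian estimate of the parameter $\theta$.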

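Bayesian estimators are defined by a minimization problem that seeks the value of $\hat\theta$ minimizing the average cost:

$$\hat\theta_{\text{B}}(\mathbf z) = \arg\min_{\hat\theta} \int_{-\infty}^{\infty} C(\theta, \hat\theta)\, p(\theta \mid \mathbf z)\, d\theta$$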
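This definition can be checked numerically. The approach is to discretize a posterior, evaluate the conditional average cost $C(\hat\theta\mid\mathbf z)$ on a grid of candidate estimates, and keep the minimizer. The sketch assumes a hypothetical skewed posterior $p(\theta\mid\mathbf z)\propto\theta^2e^{-\theta}$, which is a Gamma(3, 1) density whose mean (3), median (about 2.674) and mode (2) all differ. It also assumes a uniform-cost width $\Delta = 0.25$. Under those assumptions the three costs should approximately pick out the posterior mean, median and mode, matching the estimator names listed in 1.4.1.

```python
import math

# Hypothetical posterior p(theta | z) ∝ theta^2 * exp(-theta), discretized on (0, 15].
step = 0.02
grid = [i * step for i in range(1, 751)]
unnorm = [t * t * math.exp(-t) for t in grid]
total = sum(unnorm)
post = [w / total for w in unnorm]            # normalized posterior weights

def conditional_cost(theta_hat, cost):
    """Discrete version of C(theta_hat | z) = ∫ C(theta - theta_hat) p(theta | z) dtheta."""
    return sum(cost(t - theta_hat) * w for t, w in zip(grid, post))

def bayes_estimate(cost, candidates):
    """Grid search for the theta_hat that minimizes the conditional average cost."""
    return min(candidates, key=lambda th: conditional_cost(th, cost))

candidates = [t for t in grid if 0.5 <= t <= 6.0]
delta = 0.25   # delta/2 deliberately not a multiple of the grid step, avoiding float ties
est_quadratic = bayes_estimate(lambda e: e * e, candidates)
est_absolute = bayes_estimate(abs, candidates)
est_uniform = bayes_estimate(lambda e: 0.0 if abs(e) <= delta / 2 else 1.0, candidates)
print(round(est_quadratic, 2), round(est_absolute, 2), round(est_uniform, 2))
```

Brute-force grid search keeps the sketch transparent. Real problems usually have closed forms (as in the sections below) or need a proper optimizer.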

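1.4.2 Minimum Mean-Square Error Estimation

The Bayesian estimate obtained with the squared-error cost function is the minimum mean-square error (MMSE) estimate.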

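Derivation:

Express the conditional average cost under the squared-error cost in terms of $\hat\theta$:

$$C_{\text{ms}}(\hat\theta \mid \mathbf z) = \int_{-\infty}^{\infty} (\theta - \hat\theta)^2\, p(\theta \mid \mathbf z)\, d\theta$$

A necessary condition for minimizing the conditional average cost is obtained by differentiating this expression with respect to $\hat\theta$ and setting the result to zero, which gives the optimal $\hat\theta$:

$$\frac{\partial C_{\text{ms}}(\hat\theta \mid \mathbf z)}{\partial \hat\theta} = -2\int_{-\infty}^{\infty} \theta\, p(\theta \mid \mathbf z)\, d\theta + 2\hat\theta \int_{-\infty}^{\infty} p(\theta \mid \mathbf z)\, d\theta = 0$$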

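Since

$$\int_{-\infty}^{\infty} p(\theta \mid \mathbf z)\, d\theta = 1$$

we have

$$\hat\theta_{\text{mmse}} = \int_{-\infty}^{\infty} \theta\, p(\theta \mid \mathbf z)\, d\theta = E[\theta \mid \mathbf z]$$

Taking the second derivative:

$$\frac{\partial^2 C_{\text{ms}}(\hat\theta \mid \mathbf z)}{\partial \hat\theta^2} = 2\int_{-\infty}^{\infty} p(\theta \mid \mathbf z)\, d\theta = 2 > 0$$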

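Hence $\hat\theta_{\text{mmse}}$ corresponds to the minimum of the average cost. Because it minimizes the mean-square error, it is called the minimum mean-square error estimate. Since $\hat\theta_{\text{mmse}}$ is also the conditional mean of $\theta$ given $\mathbf z$, the MMSE estimate is also called the conditional-mean estimate.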
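A quick numerical check of this result uses a hypothetical five-point discrete posterior. The squared-error conditional cost decomposes as $C_{\text{ms}}(\hat\theta\mid\mathbf z) = \operatorname{Var}(\theta\mid\mathbf z) + (\hat\theta - E[\theta\mid\mathbf z])^2$. It is therefore smallest at the conditional mean, and its minimum value is the posterior variance.

```python
# Hypothetical discrete posterior p(theta | z) on a few support points.
support = [-1.0, 0.0, 1.0, 2.5, 4.0]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]

def c_ms(theta_hat):
    """Squared-error conditional average cost: sum of (theta - theta_hat)^2 * p(theta | z)."""
    return sum((t - theta_hat) ** 2 * w for t, w in zip(support, weights))

post_mean = sum(t * w for t, w in zip(support, weights))   # E[theta | z]
post_var = c_ms(post_mean)                                  # Var(theta | z)

for theta_hat in (-2.0, 0.0, post_mean, 3.0):
    # C_ms(theta_hat) = Var(theta | z) + (theta_hat - E[theta | z])^2
    assert abs(c_ms(theta_hat) - (post_var + (theta_hat - post_mean) ** 2)) < 1e-12
print(post_mean, post_var)
```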

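1.4.3 Maximum A Posteriori Estimation

For the uniform cost function, the conditional average cost in terms of $\hat\theta$ is

$$C_{\text{unf}}(\hat\theta \mid \mathbf z) = 1 - \int_{\hat\theta - \Delta/2}^{\hat\theta + \Delta/2} p(\theta \mid \mathbf z)\, d\theta$$

where $\hat\theta$ is the estimate that minimizes the conditional cost. Minimizing $C_{\text{unf}}(\hat\theta \mid \mathbf z)$ requires maximizing the integral on the right-hand side. For small $\Delta$ this integral is approximately $\Delta\, p(\hat\theta \mid \mathbf z)$, so $\hat\theta$ should be placed where the posterior density $p(\theta \mid \mathbf z)$ is largest. The resulting estimate is called the maximum a posteriori (MAP) estimate, denoted $\hat\theta_{\text{map}}$.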

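If the maximum lies within the admissible range of $\theta$ and $p(\theta \mid \mathbf z)$ has a continuous first derivative, a necessary condition for the maximum is

$$\left.\frac{\partial p(\theta \mid \mathbf z)}{\partial \theta}\right|_{\theta = \hat\theta_{\text{map}}} = 0$$

Because the natural logarithm is a monotonically increasing function of its argument, this is equivalent to

$$\left.\frac{\partial \ln p(\theta \mid \mathbf z)}{\partial \theta}\right|_{\theta = \hat\theta_{\text{map}}} = 0$$

This equation is called the MAP equation. Whenever it is used, one must check in each case whether the solution obtained is the global (absolute) maximum.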
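As an illustration, take the hypothetical model $z_k = \theta + n_k$ with a Gaussian prior $\theta\sim N(\mu_0, \sigma_\theta^2)$ and i.i.d. noise $n_k\sim N(0, \sigma_n^2)$. Setting the derivative of $\ln p(\theta\mid\mathbf z)$ to zero gives $\hat\theta_{\text{map}} = (\mu_0/\sigma_\theta^2 + \sum_k z_k/\sigma_n^2)/(1/\sigma_\theta^2 + N/\sigma_n^2)$. The sketch compares this root with a brute-force grid search, which is one simple way to carry out the global-maximum check mentioned above. The data values are made up.

```python
def log_posterior(theta, z, mu0, var_theta, var_n):
    """ln p(theta | z) up to an additive constant, for z_k = theta + n_k (Gaussian prior and noise assumed)."""
    log_prior = -(theta - mu0) ** 2 / (2 * var_theta)
    log_lik = -sum((zk - theta) ** 2 for zk in z) / (2 * var_n)
    return log_prior + log_lik

def map_closed_form(z, mu0, var_theta, var_n):
    """Root of the MAP equation d/dtheta ln p(theta | z) = 0 for this Gaussian model."""
    return (mu0 / var_theta + sum(z) / var_n) / (1 / var_theta + len(z) / var_n)

z = [1.8, 2.4, 2.1]                          # hypothetical observations
mu0, var_theta, var_n = 0.0, 1.0, 0.5        # hypothetical prior mean/variance and noise variance
theta_map = map_closed_form(z, mu0, var_theta, var_n)

# Brute-force search over [-5, 5] to confirm the stationary point is the global maximum.
grid = [i / 1000 for i in range(-5000, 5001)]
theta_grid = max(grid, key=lambda t: log_posterior(t, z, mu0, var_theta, var_n))
print(round(theta_map, 3), theta_grid)
```

Here the log-posterior is a concave quadratic, so the stationary point is guaranteed to be the global maximum. Multimodal posteriors are where the check really matters.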

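1.4.4 Maximum Likelihood Estimation

Using Bayes' formula

$$p(\theta \mid \mathbf z) = \frac{p(\mathbf z \mid \theta)\, p(\theta)}{p(\mathbf z)}$$

and noting that $p(\mathbf z)$ does not depend on $\theta$, the MAP equation can be written as

$$\left[\frac{\partial \ln p(\mathbf z \mid \theta)}{\partial \theta} + \frac{\partial \ln p(\theta)}{\partial \theta}\right]_{\theta = \hat\theta_{\text{map}}} = 0$$

When $\theta$ is a random parameter with an unknown prior distribution, or a non-random unknown parameter, the second term involves the unknown $p(\theta)$, so the equation above cannot be used to compute the estimate. In that case only the first term is kept: the value of $\theta$ that maximizes the likelihood function $p(\mathbf z \mid \theta)$ is taken as the estimate. This is called the maximum likelihood (ML) estimate, denoted $\hat\theta_{\text{ml}}$, and it is obtained from

$$\left.\frac{\partial p(\mathbf z \mid \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{\text{ml}}} = 0 \quad\text{or}\quad \left.\frac{\partial \ln p(\mathbf z \mid \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{\text{ml}}} = 0$$

The second (logarithmic) form is called the maximum likelihood equation.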
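For the linear model $z_k = h_k\theta + n_k$, assume i.i.d. Gaussian noise of equal variance. The maximum likelihood equation then becomes $\sum_k h_k(z_k - h_k\theta) = 0$, so $\hat\theta_{\text{ml}} = \sum_k h_k z_k / \sum_k h_k^2$. The sketch below uses hypothetical data to compute this estimate. It then confirms with a central difference that the log-likelihood slope vanishes there.

```python
def log_likelihood(theta, z, h, var_n):
    """ln p(z | theta) up to a constant, for z_k = h_k theta + n_k with i.i.d. N(0, var_n) noise (assumed)."""
    return -sum((zk - hk * theta) ** 2 for zk, hk in zip(z, h)) / (2 * var_n)

def ml_estimate(z, h):
    """Solution of the ML equation sum_k h_k (z_k - h_k theta) = 0."""
    return sum(hk * zk for hk, zk in zip(h, z)) / sum(hk * hk for hk in h)

h = [1.0, 2.0, 0.5, 1.5]   # hypothetical known coefficients
z = [2.1, 3.9, 1.2, 3.0]   # hypothetical observations
theta_ml = ml_estimate(z, h)

# Central-difference check that d ln p(z | theta) / d theta vanishes at theta_ml.
eps = 1e-6
slope = (log_likelihood(theta_ml + eps, z, h, 1.0)
         - log_likelihood(theta_ml - eps, z, h, 1.0)) / (2 * eps)
print(round(theta_ml, 4), abs(slope) < 1e-6)
```

No prior appears anywhere in this computation, which is exactly the distinction drawn above between the ML and MAP equations.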

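Because ML estimation does not (or cannot) use prior knowledge of the parameter, its estimation quality is generally worse than that of Bayesian estimation, i.e., worse than MAP estimation, when a correct prior is available.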
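A small Monte Carlo experiment illustrates this claim under assumed conditions. Take $\theta\sim N(0,1)$, $z_k = \theta + n_k$ with $n_k\sim N(0,1)$, and only $N = 2$ observations. The ML estimate is the sample mean, and the MAP estimate is the posterior mode $(\mu_0/\sigma_\theta^2 + \sum_k z_k/\sigma_n^2)/(1/\sigma_\theta^2 + N/\sigma_n^2)$. For this model the MSEs, averaged over the prior, are $\sigma_n^2/N = 0.5$ for ML and $1/(1/\sigma_\theta^2 + N/\sigma_n^2) = 1/3$ for MAP. The simulation should land close to those values.

```python
import random

def run_trial(rng, n_obs, mu0, var_theta, var_n):
    """Draw theta from its prior, observe z_k = theta + n_k, return (theta, ML estimate, MAP estimate)."""
    theta = rng.gauss(mu0, var_theta ** 0.5)
    z = [theta + rng.gauss(0.0, var_n ** 0.5) for _ in range(n_obs)]
    theta_ml = sum(z) / n_obs                                   # ignores the prior
    theta_map = ((mu0 / var_theta + sum(z) / var_n)
                 / (1 / var_theta + n_obs / var_n))             # uses the prior
    return theta, theta_ml, theta_map

rng = random.Random(42)
trials = [run_trial(rng, n_obs=2, mu0=0.0, var_theta=1.0, var_n=1.0) for _ in range(20000)]
mse_ml = sum((ml - t) ** 2 for t, ml, _ in trials) / len(trials)
mse_map = sum((mp - t) ** 2 for t, _, mp in trials) / len(trials)
print(f"ML MSE ≈ {mse_ml:.3f}, MAP MSE ≈ {mse_map:.3f}")
```

The advantage depends on the prior being correct. If the simulated $\theta$ were drawn from a different distribution than the one the MAP estimator assumes, the gap could shrink or reverse.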

1.4.5 Bayesian inference versus Frequentist inference

MAP estimation can usually be regarded as a Bayesian method, and maximum likelihood as a frequentist method.

Two different interpretations of probability have long existed. In Bayesian inference, prior probabilities are specified, and Bayes' theorem (as in the Bayes formula of 1.4.4) is then used to make probability statements about the parameter. In frequentist inference such prior probabilities are considered nonsensical: the parameter θ is regarded as an unknown constant, not a random variable, and since it is not random, making probability statements about it does not make sense. A counterargument is that even if θ is a constant, it is unknown, so we may treat it as a random variable and regard the uncertainty as randomness: it might be one value, or another, or a third. Such arguments have continued for many years and are very interesting.

If you are only interested in determining θ, Bayesian and frequentist methods both offer promising paths toward a solution. The two methods often produce very similar answers anyway, which makes arguing over which one is better nearly meaningless as far as arriving at the correct value of θ is concerned. In particular, the MSEs of the two methods are often identical or nearly identical.
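Consider an assumed Gaussian model: $z_k = \theta + n_k$ with prior $\theta\sim N(\mu_0, \sigma_\theta^2)$ and noise $n_k\sim N(0, \sigma_n^2)$. Under it, the two MSEs have closed forms: $\sigma_n^2/N$ for ML, and $1/(1/\sigma_\theta^2 + N/\sigma_n^2)$ for MAP averaged over the prior. Their ratio is $N/(N + \sigma_n^2/\sigma_\theta^2)$, which tends to 1 as $N$ grows. The sketch tabulates it for a few hypothetical sample sizes.

```python
def ml_mse(n_obs, var_n):
    """MSE of the sample-mean (ML) estimator for z_k = theta + n_k: var_n / N."""
    return var_n / n_obs

def map_mse(n_obs, var_theta, var_n):
    """Prior-averaged MSE of the MAP (here also posterior-mean) estimator: 1 / (1/var_theta + N/var_n)."""
    return 1.0 / (1.0 / var_theta + n_obs / var_n)

ratios = {n: map_mse(n, 1.0, 1.0) / ml_mse(n, 1.0) for n in (1, 10, 100, 1000)}
for n_obs, ratio in ratios.items():
    print(n_obs, round(ratio, 4))
```

With little data the prior matters a great deal. With many observations the prior's influence fades, and the two estimators become practically indistinguishable in MSE.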

In some problems the frequentist solution (usually maximum likelihood estimation) is easier to follow; in others the Bayesian solution is easier to follow. Knowledge of both methods is therefore useful.

Bayesian inference updates the probability estimate for a hypothesis as additional evidence is acquired. Bayesian inference is explicitly based on the evidence and prior opinion, which allows it to be based on multiple sets of evidence.
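Bayesian updating can be sketched with a conjugate Gaussian model, an assumption made here purely for simplicity. Each new observation $z_k = \theta + n_k$ turns the current Gaussian belief about θ into a new Gaussian posterior, which then serves as the prior for the next observation. The sketch checks that processing the evidence one observation at a time gives the same posterior as using all of it at once. All numbers are hypothetical.

```python
def update(mean, var, z_k, var_n):
    """One Bayesian update of a Gaussian belief N(mean, var) about theta after observing z_k = theta + n_k."""
    post_var = 1.0 / (1.0 / var + 1.0 / var_n)
    post_mean = post_var * (mean / var + z_k / var_n)
    return post_mean, post_var

prior_mean, prior_var, var_n = 0.0, 1.0, 0.5   # hypothetical prior and noise variance
z = [1.8, 2.4, 2.1]                             # evidence arriving one observation at a time

mean, var = prior_mean, prior_var
for z_k in z:
    mean, var = update(mean, var, z_k, var_n)   # the current posterior becomes the next prior

# Batch posterior using all of the evidence at once, for comparison.
batch_var = 1.0 / (1.0 / prior_var + len(z) / var_n)
batch_mean = batch_var * (prior_mean / prior_var + sum(z) / var_n)
print(round(mean, 6), round(var, 6), round(batch_mean, 6), round(batch_var, 6))
```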

Frequentist inference can make operational decisions and estimate parameters with or without confidence intervals. It is based solely on the probability of the observed data, which is often a single set of evidence.

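An example illustrating the difference between MAP (Bayesian estimation with the hit-or-miss cost function) and maximum likelihood

Xiao Ming did not come to school today. There are three possible hypotheses (θ):

Xiao Ming is sick today / U.S. President Trump is meeting Xiao Ming / the Earth has been struck by a meteorite

Maximum likelihood estimation (MLE) gives θ_hat (the estimate of θ) = "the Earth has been struck by a meteorite", because

Likelihood(Xiao Ming did not come to school | the Earth has been struck by a meteorite) = 1

MAP estimation, by contrast, gives "Xiao Ming is sick today", because it takes the prior into account: the prior probabilities of "the Earth being destroyed" and "Trump meeting Xiao Ming" are both far lower than that of "Xiao Ming is sick today".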
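The example can also be reproduced with numbers. Apart from the likelihood of 1 under the meteorite hypothesis, every value below is hypothetical. The values were chosen only so that the qualitative ordering described in the text holds. The posterior is normalized over the three listed hypotheses only.

```python
# Hypothetical values: (prior p(theta), likelihood p(absent from school | theta)).
hypotheses = {
    "sick": (0.05, 0.8),
    "meets Trump": (1e-9, 0.95),
    "meteorite strike": (1e-12, 1.0),   # likelihood of 1, as in the text
}

# ML picks the largest likelihood; MAP picks the largest prior * likelihood (∝ posterior).
theta_ml = max(hypotheses, key=lambda h: hypotheses[h][1])
theta_map = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])

evidence = sum(prior * lik for prior, lik in hypotheses.values())
posterior = {h: prior * lik / evidence for h, (prior, lik) in hypotheses.items()}
print(theta_ml, theta_map)
```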

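Occam's razor explains this: the more complex the model (cosmic model > international-relations model > everyday-life model), the lower its (prior) probability.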