Sample mean of a random variable

Intuition from some examples

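Given a random variable $X$ that follows some distribution, we can estimate its mean or variance by sampling it. Suppose we draw $N$ samples; the sample mean is
$\overline{X}=\frac{1}{N}\sum_{n=1}^N X_n$
As $N$ increases, one would expect $\overline{X}$ to get closer and closer to the true mean $\mathbb{E}[X]$ and eventually equal it. Simulation results, however, indicate that for any finite $N$ it does not.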
Example 1: Bernoulli distribution, $X\sim \text{Bernoulli}(p=0.6)$. As $N$ increases, the sample mean follows the curve shown in the figure below.
(Figure: sample mean of $X\sim \text{Bernoulli}(0.6)$ versus $N$.)
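We can see that the mean does appear to converge to $p=0.6$ in the end. But if we zoom in, the curve is actually still jittering; that is, at any finite $N$ it has not settled on a single point.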

Example 2: Gaussian distribution, $X\sim N(0,1)$. The sample mean curve:
(Figure: sample mean of $X\sim N(0,1)$ versus $N$.)

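We can see that the sample mean is still jittering after $10^6$ samples. The fluctuation is very small, but it is at least not the convergence to a single point that one might imagine.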

The simulations above show that, for any finite $N$, the sample mean does not land exactly on the population mean (i.e., the true mean). It should be close to the population mean, but may not exactly equal it.
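The two experiments can be reproduced with a short simulation. The following is a minimal sketch using only the Python standard library; the seed, the checkpoint values of $N$, and the helper name `running_sample_mean` are illustrative assumptions, not the setup used for the original figures.

```python
import random

def running_sample_mean(draw, n_samples, checkpoints):
    """Return {N: mean of the first N draws} for every N in checkpoints."""
    wanted = set(checkpoints)
    total = 0.0
    means = {}
    for n in range(1, n_samples + 1):
        total += draw()
        if n in wanted:
            means[n] = total / n
    return means

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
checkpoints = [10, 100, 1_000, 10_000, 100_000]

# Example 1: Bernoulli(p=0.6), encoded as 1.0 (success) or 0.0 (failure).
bernoulli = running_sample_mean(
    lambda: 1.0 if rng.random() < 0.6 else 0.0, 100_000, checkpoints
)
# Example 2: standard Gaussian N(0, 1).
gaussian = running_sample_mean(
    lambda: rng.gauss(0.0, 1.0), 100_000, checkpoints
)

for n in checkpoints:
    print(f"N={n:>7}  Bernoulli: {bernoulli[n]:.5f}  Gaussian: {gaussian[n]:.5f}")
```

Printing more digits at the larger checkpoints makes the residual jitter visible: the values hover near $0.6$ and $0$ without sitting exactly on them.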

Main Results

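Put another way: even when $N$ is large, the $\overline{X}$ obtained from each batch of $N$ samples is still not a fixed value; it is itself a random variable with its own distribution.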

Theorem 1 (mean of sample mean). If $\mathbb{E}[X]=\mu$ , then $\mathbb{E}[\overline{X}]=\mu$ .

Theorem 1 is easy to understand: the mean of the sample mean of $X$ is the mean of $X$. It is also easy to verify:

$\mathbb{E}[\overline{X}]= \mathbb{E}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N}\sum_{n=1}^N \mathbb{E}[X_n]=\mathbb{E}[X]=\mu,$

because the samples are i.i.d., so $\mathbb{E}[X_n]=\mathbb{E}[X]$ for every $n$.

Theorem 2 (variance of sample mean). If $\text{var} [X]=\sigma^2$ , then $\text{var}[\overline{X}]=\frac{\sigma^2}{N}$ .

$\text{var}[\overline{X}]= \text{var}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N^2}\sum_{n=1}^N \text{var}[X_n]=\frac{\sigma^2}{N},$

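where the second equality uses the independence of the samples and $\text{var}[aY]=a^2\,\text{var}[Y]$. Theorem 2 also shows that drawing more samples pays off: the larger $N$ is, the smaller the variance of the sample mean, and hence the closer it tends to be to the population mean.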
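Both theorems can be checked empirically by repeating the whole experiment many times and looking at how the resulting sample means are spread. The sketch below assumes $X\sim N(0,1)$ (so $\mu=0$, $\sigma^2=1$); the number of repetitions `trials`, the seed, and the values of $N$ are arbitrary illustrative choices.

```python
import random
import statistics

def sample_mean(rng, n):
    """Mean of n i.i.d. draws from N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n

rng = random.Random(1)  # arbitrary seed for reproducibility
trials = 2_000          # independent repetitions per value of N

results = {}
for n in (10, 100, 400):
    means = [sample_mean(rng, n) for _ in range(trials)]
    mean_of_means = statistics.fmean(means)     # Theorem 1: should be near mu = 0
    var_of_means = statistics.pvariance(means)  # Theorem 2: should be near 1 / n
    results[n] = (mean_of_means, var_of_means)
    print(f"N={n:>4}  mean of sample means={mean_of_means:+.4f}  "
          f"variance={var_of_means:.5f}  sigma^2/N={1 / n:.5f}")
```

The empirical variance is itself an estimate, so it should only roughly match $\sigma^2/N$; the agreement tightens as `trials` grows.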

Conclusion

Overall, the sample mean is not a robust statistic, meaning that it is sensitive to outliers. We can only give a lower bound and an upper bound for the population mean, and state how confident we are (in %) that the population mean lies between them. This range is the confidence interval.

(Figure: confidence interval for the population mean.)

The confidence interval is $\big[ \overline{X}-E, \overline{X}+E\big]$ , where $E$ is called the margin of error and, assuming $\sigma$ is known, is given by
$E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$

$z_{\alpha/2}$ : critical value, which can be computed from the standard normal distribution given $\alpha/2$ .
$\alpha$ : significance level.
$CL=1-\alpha$ : confidence level.

As shown in the figure,

Given $CL=95\%$ ;
Calculate $\alpha = 1-CL = 0.05$ and $\alpha/2 = 0.025$ ;
Check the standard normal distribution table and find $z_{\alpha/2}=z_{0.025}=1.96$ ;
Compute $E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$ , and the confidence interval $\big[ \overline{X}-E, \overline{X}+E\big]$ .
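The steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes $\sigma$ is known; the function name `confidence_interval` and the input numbers (a sample mean of 0.58 from 1000 Bernoulli(0.6) draws) are hypothetical. The critical value comes from `statistics.NormalDist().inv_cdf`, which replaces the table lookup.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Return (lower, upper, z) for a known population standard deviation sigma."""
    alpha = 1.0 - confidence_level
    # z_{alpha/2}: the point with upper-tail probability alpha/2 under N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(n)  # margin of error E
    return sample_mean - margin, sample_mean + margin, z

# Hypothetical inputs: Bernoulli(0.6) has sigma = sqrt(p * (1 - p)).
sigma = math.sqrt(0.6 * 0.4)
low, high, z = confidence_interval(sample_mean=0.58, sigma=sigma, n=1000)
print(f"z = {z:.3f}, 95% confidence interval = [{low:.4f}, {high:.4f}]")
```

Because $E$ scales as $1/\sqrt{N}$, halving the width of the interval requires roughly four times as many samples.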