Intuition from some examples
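Given a random variable that follows some distribution, we can estimate its mean or variance by sampling it. Suppose we draw $N$ samples $X_1, \dots, X_N$; the sample mean is then $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$.
Intuitively, as $N$ increases, $\bar{X}$ should get closer and closer to the true mean and eventually equal it. But simulation results indicate otherwise.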
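Example 1: Bernoulli distribution. As $N$ increases, the sample mean traces the curve shown in the figure below.
At first glance, the sample mean does appear to converge to the true mean. But if we zoom in, the curve is actually still jittering; that is, it has not converged to a single point.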
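Example 2: Gaussian distribution, curve of the sample mean.
Here too, the sample mean is still jittering even at large $N$. The jitter is small, but the curve at least does not converge to a single point the way one might imagine.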
These simulations suggest that, for any finite $N$, the sample mean does not settle exactly on the population mean (i.e., the true mean). It should be close to the population mean, but may not exactly equal the population mean.
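The experiment above can be reproduced with a short sketch like the following. The parameters are assumptions chosen for illustration (Bernoulli with $p = 0.5$, Gaussian with mean 0 and standard deviation 1), since the values behind the original figures are not given, and `running_sample_mean` is a hypothetical helper name.

```python
import random


def running_sample_mean(draw, n, seed=0):
    """Return the sample means after 1, 2, ..., n draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += draw(rng)
        means.append(total / i)
    return means


# Assumed parameters (not taken from the original figures).
def bernoulli(rng, p=0.5):
    return 1.0 if rng.random() < p else 0.0


def gaussian(rng, mu=0.0, sigma=1.0):
    return rng.gauss(mu, sigma)


for name, draw in [("Bernoulli", bernoulli), ("Gaussian", gaussian)]:
    means = running_sample_mean(draw, 100_000)
    tail = means[-1000:]  # "zoom in" on the last 1000 steps
    print(name, "final sample mean:", means[-1])
    print(name, "spread over last 1000 steps:", max(tail) - min(tail))
```

The printed spread plays the role of zooming in on the plot: it is small, but it is not zero, which is the jitter described above.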
Main Results
Put another way: even when $N$ is large, the sample mean $\bar{X}$ computed from $N$ samples is still not a fixed value; it follows a distribution of its own.
Theorem 1 (mean of sample mean). If $X_1, \dots, X_N$ are i.i.d. with $\mathbb{E}[X_i] = \mu$, then $\mathbb{E}[\bar{X}] = \mu$.
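Theorem 1 is easy to interpret: the mean of the sample mean of $X$ is simply the mean of $X$. It is also easy to verify:
$$\mathbb{E}[\bar{X}] = \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}[X_i] = \mu,$$
because the samples are i.i.d., so every $X_i$ has the same mean $\mu = \mathbb{E}[X]$.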
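Theorem 2 (variance of sample mean). If $X_1, \dots, X_N$ are i.i.d. with $\mathrm{Var}(X_i) = \sigma^2$, then $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{N}$.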
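A short derivation may help here. It assumes Theorem 2 is the standard statement that $\mathrm{Var}(\bar{X}) = \sigma^2 / N$ for i.i.d. samples $X_1, \dots, X_N$ with variance $\sigma^2$, where $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$. The second equality uses independence (the variance of a sum of independent variables is the sum of the variances):

$$
\begin{aligned}
\mathrm{Var}(\bar{X})
&= \mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)
= \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}(X_i) \\
&= \frac{1}{N^2}\cdot N\sigma^2
= \frac{\sigma^2}{N}.
\end{aligned}
$$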
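Theorem 2 also shows that drawing more samples pays off: the larger $N$ is, the smaller the variance of the sample mean, and so the closer the sample mean tends to be to the population mean.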
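Both theorems can be checked empirically by repeating the whole experiment many times and looking at how the resulting sample means are distributed. The sketch below is only illustrative: the Gaussian source, its parameters, the number of repetitions, and the helper name `sample_means` are all assumptions, not part of the original notes.

```python
import random
import statistics


def sample_means(n, repeats, mu=0.0, sigma=1.0, seed=0):
    """Draw `repeats` independent sample means, each computed from n Gaussian samples."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]


mu, sigma = 0.0, 1.0  # assumed population parameters
for n in (10, 100, 1000):
    means = sample_means(n, repeats=2000, mu=mu, sigma=sigma)
    print(
        f"N={n}: mean of sample means = {statistics.fmean(means):.4f}, "
        f"variance of sample means = {statistics.variance(means):.6f}, "
        f"sigma^2 / N = {sigma ** 2 / n:.6f}"
    )
```

If the theorems hold, the first number should stay near `mu` for every `N`, while the second should track `sigma^2 / N` and shrink as `N` grows.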
Conclusion
Overall, the sample mean is not a robust statistic, meaning that it is sensitive to outliers. We can only give a lower bound and an upper bound for the population mean, and say how confident we are (in %) that the population mean lies between the lower bound and upper bound of the confidence interval.
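The confidence interval is $\bar{X} \pm E$, where $E$ is called the margin of error and is given by
$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}},$$
with $\sigma$ the population standard deviation and $N$ the number of samples.
$z_{\alpha/2}$: critical value; it can be computed from the standard normal distribution once $\alpha$ is given.
$\alpha$: significance level.
$1 - \alpha$: confidence level.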
As shown in the figure,
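- Given a confidence level $1 - \alpha$;
- Calculate the significance level $\alpha$ and $\alpha/2$;
- Check the standard normal distribution table and find the critical value $z_{\alpha/2}$;
- Compute the margin of error $E$, and the confidence interval.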
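The steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes the population standard deviation is known and that the margin of error is the usual $z_{\alpha/2}\,\sigma/\sqrt{N}$; the function name `confidence_interval` and the example numbers are hypothetical. `statistics.NormalDist().inv_cdf` stands in for the normal distribution table.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for the population mean, assuming sigma is known."""
    alpha = 1.0 - confidence                      # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)             # margin of error
    return sample_mean - margin, sample_mean + margin


# Made-up numbers, for illustration only.
lower, upper = confidence_interval(sample_mean=10.0, sigma=2.0, n=100, confidence=0.95)
print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")
```

When the population standard deviation is not known and has to be estimated from the data, a critical value from Student's t distribution is commonly used in place of $z_{\alpha/2}$; that case is outside this sketch.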