03-Data Resampling

1. Bootstrap

Draw a “bootstrap sample” by sampling n times with replacement from the original sample of n observations.

The bootstrap estimates the sampling variability of a statistic (how much it would change from one sample to the next) and works well for estimating confidence intervals.

A confidence interval provides a range of values which is likely to contain the population parameter of interest.

e.g. "I am 95% confident that the population mean lies in the range (x1, x2)."
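
The bootstrap procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available and using the percentile method to turn the bootstrap statistics into an interval; the function name bootstrap_ci, its parameters, and the simulated data are hypothetical and not part of the original notes.

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    # Each row of idx is one bootstrap sample: n indices drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_stats = np.array([stat(sample[row]) for row in idx])
    # The middle (1 - alpha) share of the bootstrap statistics forms the interval.
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Simulated data, used only for demonstration.
data = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)
x1, x2 = bootstrap_ci(data)
print(f"95% CI for the mean: ({x1:.3f}, {x2:.3f})")
```

Passing a different function as stat (for example np.median) gives an interval for that statistic instead of the mean.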

2. Permutation

Concatenate two datasets A and B, randomly shuffle the pooled data, then split it into a new A and a new B of the original sizes (that is, reassign the points without replacement).

Permutation tests test a specific null hypothesis of exchangeability.
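
As a rough sketch of the permutation procedure, the following Python code tests whether two groups differ in their means. It assumes NumPy, a two-sided test, and the difference in means as the test statistic; the function name permutation_test, its parameters, and the simulated groups are hypothetical choices made for illustration.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in means (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled data and split it back into groups of the original sizes.
        shuffled = rng.permutation(pooled)
        new_a, new_b = shuffled[:a.size], shuffled[a.size:]
        if abs(new_a.mean() - new_b.mean()) >= abs(observed):
            count += 1
    # Add 1 to numerator and denominator so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Simulated groups, used only for demonstration.
rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(1.0, 1.0, size=50)
diff, p = permutation_test(a, b)
print(f"observed difference: {diff:.3f}, p-value: {p:.4f}")
```

A small p-value means that a difference as large as the observed one rarely arises when the labels are exchangeable, which is evidence against the null hypothesis.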


3. Cross-validation

In its leave-one-out form, cross-validation removes one point at a time, fits the model to the remaining points, then checks how well the removed point is predicted.

Cross-validation is primarily a way of measuring the predictive performance of a statistical model.

Cross-validation is used to assess the predictive performance of models and to judge how they perform outside the sample, on a new dataset also known as test data.
The motivation for cross-validation is that when we fit a model, we fit it to a training dataset. Without cross-validation we only know how the model performs on our in-sample data. Ideally we would like to see how accurate the model's predictions are on new data. In science, theories are judged by their predictive performance.
Two common types of cross-validation are leave-one-out and k-fold. In k-fold, the data is split into k parts and each part is held out in turn; leave-one-out is the special case where k equals the number of points.
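
Both variants can be sketched with one small Python function. This is a minimal illustration, assuming NumPy, a polynomial model fitted with np.polyfit, and mean squared error as the performance measure; the function name k_fold_mse, its parameters, and the simulated data are hypothetical and not taken from the original notes.

```python
import numpy as np

def k_fold_mse(x, y, k=5, degree=1, seed=0):
    """Out-of-sample mean squared error estimated by k-fold cross-validation (sketch)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Shuffle the indices, then split them into k roughly equal folds.
    folds = np.array_split(rng.permutation(x.size), k)
    sq_errors = []
    for test_idx in folds:
        train_mask = np.ones(x.size, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training points only, then predict the held-out points.
        coeffs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coeffs, x[test_idx])
        sq_errors.append((y[test_idx] - pred) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

# Simulated data: a straight line plus noise, used only for demonstration.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

mse_kfold = k_fold_mse(x, y, k=5)
mse_loo = k_fold_mse(x, y, k=x.size)  # leave-one-out: every fold holds a single point
print(f"5-fold MSE: {mse_kfold:.3f}, leave-one-out MSE: {mse_loo:.3f}")
```

Comparing these scores across candidate models (for example different values of degree) is one way to pick the model that is expected to predict new data best.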
