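Here is the code I have been using to bootstrap regressions; I modify it as needed.
For the bootstrap to work, the observations you resample must be independent and identically distributed (i.i.d.), and the distribution of your estimator must converge to the corresponding population distribution. In the example below, I estimate a regression model with 20 observations, but every observation is entered twice. In that case I have to bootstrap the original observations, not the individual rows, to obtain correct standard errors.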
set.seed(45)
x <- 2*rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5*x + epsilon # y is generated from the linear model y = 1 - 0.5*x + epsilon
data1 <- data.frame(y=y,x=x,obs.id=1:20)
summary(lm(y~x,data=data1))
# Now the dataset is entered twice, but we know the ids of the original observations
data2 <- rbind(data1,data1)
summary(lm(y~x,data=data2))
# The coefficients are exactly the same, but the estimated standard errors are wrong
# because the dataset is duplicated: the rows are dependent, and the independent
# units of observation are the ids.
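# Worked derivation of how wrong the naive standard errors are (n = 20 here).
# Stacking the data twice gives X2'X2 = 2 X'X and RSS2 = 2 RSS, while the
# residual degrees of freedom grow from n - 2 = 18 to 2n - 2 = 38. Hence
#   vcov(data2) = RSS2/(2n-2) * (X2'X2)^{-1}
#               = 2 RSS/(2n-2) * (1/2) (X'X)^{-1}
#               = (n-2)/(2n-2) * vcov(data1),
# i.e. 9/19 (about 0.47) times the covariance from the 20 original rows, so each
# naive standard error is understated by a factor of about sqrt(9/19) = 0.69.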
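B <- 10000                                # number of bootstrap replications
boot.b <- matrix(NA, nrow = B, ncol = 2)  # one row of (intercept, slope) per replication
# column 1: original id; columns 2-3: the two rows of data2 that hold that id
all.ids <- cbind(id = 1:20, line1 = 1:20, line2 = 21:40)
for (b in 1:B) {
  # resample the 20 independent units (ids) with replacement
  ids.b <- sample(all.ids[, 1], 20, replace = TRUE)
  # keep both copies of every drawn id, so duplicated rows stay together
  lines.b <- c(all.ids[ids.b, 2], all.ids[ids.b, 3])
  data.b <- data2[lines.b, ]
  boot.b[b, ] <- coef(lm(y ~ x, data = data.b))
}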
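colMeans(boot.b)               # mean of the bootstrap coefficients ...
coef(lm(y ~ x, data = data1))  # ... is close to the OLS estimates
var(boot.b)                    # bootstrap covariance of the coefficients ...
vcov(lm(y ~ x, data = data1))  # ... is comparable to the OLS covariance on the 20 original rows
vcov(lm(y ~ x, data = data2))  # naive covariance on the duplicated data is too small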
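The resampling loop above can also be packaged as a reusable function. This is a minimal sketch rather than a library routine: the helper name cluster_boot() and its arguments are hypothetical, and it assumes the data contain an id column marking the independent units (obs.id, as in the example). It resamples ids with replacement and keeps every row belonging to each drawn id, which reproduces the loop without the hard-coded line1/line2 indices. It also checks that duplication shrinks the naive covariance by exactly (n - 2) / (2(n - 1)) = 9/19 for n = 20.

```r
# Hypothetical helper: cluster (block) bootstrap of lm() coefficients.
# Resamples the independent units (ids) with replacement and keeps every
# row that belongs to each drawn id, so duplicated rows stay together.
cluster_boot <- function(formula, data, id, B = 2000) {
  full <- coef(lm(formula, data = data))        # used for names and length
  ids <- unique(data[[id]])
  rows_by_id <- split(seq_len(nrow(data)), data[[id]])  # row indices per id
  out <- matrix(NA_real_, nrow = B, ncol = length(full),
                dimnames = list(NULL, names(full)))
  for (b in seq_len(B)) {
    draw <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
    rows <- unlist(rows_by_id[as.character(draw)], use.names = FALSE)
    out[b, ] <- coef(lm(formula, data = data[rows, ]))
  }
  out
}

# Same simulated data as in the example above
set.seed(45)
x <- 2 * rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5 * x + epsilon
data1 <- data.frame(y = y, x = x, obs.id = 1:20)
data2 <- rbind(data1, data1)

fit1 <- lm(y ~ x, data = data1)
fit2 <- lm(y ~ x, data = data2)
all.equal(coef(fit1), coef(fit2))  # duplication leaves the coefficients unchanged
vcov(fit2) / vcov(fit1)            # every entry equals (n - 2) / (2(n - 1)) = 9/19

boot <- cluster_boot(y ~ x, data2, id = "obs.id", B = 2000)
apply(boot, 2, sd)                 # cluster-bootstrap standard errors
sqrt(diag(vcov(fit1)))             # OLS standard errors on the 20 original rows
sqrt(diag(vcov(fit2)))             # naive standard errors on the duplicated data
```

Because the rows of each id are looked up through split(), the same function also works when units contribute different numbers of rows.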