Bootstrapping a linear regression: answers

【Question title】: bootstrap a linear regression
【Posted】: 2014-05-09 11:47:36
【Question description】:

I am trying to run a bootstrap on a linear regression in R. My code so far is:

hprice<-lm(dat[,1]~dat[,3]+dat[,4]+dat[,5]+dat[,6])
print (hprice)
pricefunc<-function(data,ind) lm(data[ind,1]~data[ind,3]+data[ind,4]+data[ind,5]+data[ind,6])
hpboot<-boot(dat,pricefunc, 1000)

This does not seem to work.

I do not really understand the statistic argument, and I suspect that is where I am going wrong.

Thanks

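【Question comments】:

What exactly does "does not work" mean? You have not provided enough code for anyone else to run it (i.e. there is no sample data). See this question for how to make a reproducible example.
You're right, I did not provide enough code. My biggest problem is that I cannot work out what the "statistic" argument of boot() means. I received an answer that makes it work; now I just need to figure out how. Thanks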

Tags: r statistics-bootstrap

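【Solution 1】:

If you want the coefficient estimates, you have to append $coef to the lm call, so that the function returns the coefficients rather than the whole fitted model: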

pricefunc<-function(data,ind) lm(data[ind,1]~data[ind,3]+data[ind,4]+data[ind,5]+data[ind,6])$coef

Then you can run:

boot(dat,pricefunc, 1000)

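【Comments】:

OK. I tried it and I think it works. Thank you. What does $coef do?
lm returns a list of information about the fitted linear model, but boot needs the statistic function to return only the quantities you want to make inference about, here the model coefficients. $coef extracts just the coefficient estimates from that list.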
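
To make the role of the statistic argument concrete, here is a minimal sketch. The asker's dat was never posted, so it uses the built-in mtcars data instead; the function name coef_stat and the choice of predictors are illustrative assumptions, not part of the original question.

```r
library(boot)  # ships with standard R installations

# boot() calls the statistic as statistic(data, indices):
#   data    - the original data set, passed through unchanged
#   indices - row numbers of one bootstrap resample (drawn with replacement)
# It has to return a numeric vector, which is why the lm fit is reduced
# to its coefficients.
coef_stat <- function(data, ind) {
  coef(lm(mpg ~ wt + hp, data = data[ind, ]))
}

set.seed(1)
b <- boot(mtcars, coef_stat, R = 500)

b$t0       # coefficients from the original data
dim(b$t)   # one row per resample, one column per coefficient

# percentile interval for the second coefficient (wt)
boot.ci(b, type = "perc", index = 2)
```

Subsetting the data frame inside the function and using a formula with data = keeps the coefficient names readable, compared with indexing columns by position as in the question.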

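【Solution 2】:

This is the code I have been using to bootstrap regressions, adapting it where necessary. For the bootstrap to work, it is important that the observations are independent and identically distributed, and that the distribution of your estimates converges to the corresponding population distribution. In the example below I estimate a regression model with 20 observations, but each observation has been entered twice. In that situation I need to bootstrap the original observations, not the duplicated rows, to get appropriate standard errors.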

set.seed(45)
x <- 2*rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5*x + epsilon # y is the response: intercept 1, slope -0.5, plus noise
data1 <- data.frame(y=y,x=x,obs.id=1:20)
summary(lm(y~x,data=data1))

# now the dataset is entered twice but we know the id's of the original observations
data2 <- rbind(data1,data1)
summary(lm(y~x,data=data2))

# the coefficients are exactly the same, but the estimated standard errors are wrong
# because the dataset has been duplicated. The rows are dependent; the independent
# units of observation are the ids
B <- 10000
boot.b <- matrix(NA,nrow=B,ncol=2)
all.ids <- cbind(1:20,line1=1:20,line2=21:40)
for (b in 1:B){
  ids.b <- sample(all.ids[,1],20,replace=TRUE)
  lines.b <- c(all.ids[ids.b,2],all.ids[ids.b,3])
  data.b <- data2[lines.b,]
  boot.b[b,] <- coef(lm(y~x,data=data.b))
}
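# bootstrap means of the coefficients, to compare with the original estimates
colMeans(boot.b)

coef(lm(y~x,data=data1))

# bootstrap covariance matrix of the coefficients, to compare with the
# model-based one computed from the duplicated data
var(boot.b)

vcov(lm(y~x,data=data2))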
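
The same idea of resampling ids rather than rows can also be expressed with boot() by passing the vector of ids as the data. The following is only a sketch under that assumption; the helper name cluster_stat and R = 2000 are illustrative choices, and the data are regenerated with the same seed as above so the block runs on its own.

```r
library(boot)

# rebuild the duplicated data set from the example above
set.seed(45)
x <- 2 * rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5 * x + epsilon
data1 <- data.frame(y = y, x = x, obs.id = 1:20)
data2 <- rbind(data1, data1)

# The "data" handed to boot() is the vector of unique ids, so each
# resample draws whole observations; all rows belonging to a drawn id
# are then pulled from data2 (an id drawn twice contributes its rows twice).
cluster_stat <- function(ids, ind) {
  rows <- unlist(lapply(ids[ind], function(i) which(data2$obs.id == i)))
  coef(lm(y ~ x, data = data2[rows, ]))
}

cl <- boot(unique(data2$obs.id), cluster_stat, R = 2000)

apply(cl$t, 2, sd)                          # bootstrap standard errors
sqrt(diag(vcov(lm(y ~ x, data = data2))))   # naive SEs from duplicated rows
```

Comparing the two printed vectors shows how far the model-based standard errors from the duplicated data are from the ones obtained by resampling the independent units.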

【Comments】:

【Solution 3】:

There is also the model_parameters function from the parameters package, which gives bootstrapped confidence intervals and p-values:

library(parameters)

mod <- lm(formula = wt ~ mpg, data = mtcars)

model_parameters(mod)
#> Parameter   | Coefficient |   SE |         95% CI | t(30) |      p
#> ------------------------------------------------------------------
#> (Intercept) |        6.05 | 0.31 | [ 5.42,  6.68] | 19.59 | < .001
#> mpg         |       -0.14 | 0.01 | [-0.17, -0.11] | -9.56 | < .001

model_parameters(mod, bootstrap = TRUE, iterations = 100)
#> Parameter   | Coefficient |         95% CI |     p
#> --------------------------------------------------
#> (Intercept) |        5.99 | [ 5.36,  6.68] | 0.010
#> mpg         |       -0.14 | [-0.17, -0.11] | 0.010

Created on 2021-03-09 by the reprex package (v1.0.0)
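
For readers who want to see what a bootstrapped confidence interval amounts to, here is a base-R sketch of a percentile interval for the same model. It is not claimed to reproduce how the parameters package computes its intervals; B = 1000 and the seed are arbitrary assumptions.

```r
set.seed(123)
B <- 1000

# refit the model on B resamples of the rows of mtcars (drawn with
# replacement) and collect the coefficients, one row per resample
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(wt ~ mpg, data = mtcars[idx, ]))
}))

# percentile 95% intervals: the 2.5% and 97.5% quantiles of each column
ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
ci
```

Because the resamples are random, the interval endpoints vary somewhat from run to run, and more so when the number of iterations is small.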

【Comments】: