Posted: 2021-01-08 12:34:03
Question:
Let's generate some data:
set.seed(42)
y <- rnorm(125)
x <- data.frame(runif(125), rexp(125))
I want to run 2-fold cross-validation on it, so:
library(caret)
model <- train(y ~ .,
data = cbind(y, x), method = "lm",
trControl = trainControl(method = "cv", number = 2)
)
model
Linear Regression
125 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (2 fold)
Summary of sample sizes: 63, 62
Resampling results:
RMSE Rsquared MAE
1.091108 0.002550859 0.8472947
Tuning parameter 'intercept' was held constant at a value of TRUE
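I want to reproduce the RMSE value above by hand, to make sure I fully understand cross-validation.
What I have done so far
As shown above, the sample is split into two folds of 63 and 62 observations. In my code, rows 1:63 form the first fold and rows 64:125 the second.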
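For reference, the cross-validated RMSE that caret reports is the average of the per-fold RMSEs, which is also what `mean(c(RMSE_1, RMSE_2))` computes in my code. The notation here is introduced only for illustration: $F_k$ is the set of rows in fold $k$, $K$ is the number of folds, and $\hat{y}_i^{(-k)}$ is the prediction for row $i$ from a model fitted on all folds except $k$.

```latex
\mathrm{RMSE}_k = \sqrt{\frac{1}{|F_k|}\sum_{i \in F_k}\left(y_i - \hat{y}_i^{(-k)}\right)^2},
\qquad
\mathrm{RMSE}_{\mathrm{CV}} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{RMSE}_k
```

With $K = 2$, each fold's RMSE must come from the model that was *not* fitted on that fold.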
# Train the first model on the first fold (rows 1:63)
model_1 <- lm(y[1:63] ~ ., data = x[1:63, ])
# RMSE of the first model on the held-out second fold (rows 64:125)
RMSE_1 <- RMSE(y[64:125], predict(model_1, newdata = x[64:125, ]))
# Train the second model on the second fold (rows 64:125)
model_2 <- lm(y[64:125] ~ ., data = x[64:125, ])
# RMSE on the held-out first fold (rows 1:63)
# Note: this line predicts with model_1, which was fitted on these same rows;
# model_2 is presumably what was intended here
RMSE_2 <- RMSE(y[1:63], predict(model_1, newdata = x[1:63, ]))
mean(c(RMSE_1, RMSE_2))
1.023411
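A minimal base-R sketch of the same 2-fold procedure written as a loop, so that each model is always scored on the fold it did not see. Assumptions: folds are assigned by simple random sampling, and `fold_id`, `rmse`, `per_fold` and `cv_rmse` are hypothetical names made up for this sketch. As far as I know, caret builds its folds with its own random assignment (stratified on the outcome, via `createFolds()`), so this will not reproduce caret's 1.091108 exactly; it only illustrates the procedure.

```r
# Sketch: 2-fold cross-validation by hand, with randomly assigned folds.
# Assumption: plain random fold assignment, which differs from caret's own.
set.seed(42)
y <- rnorm(125)
x <- data.frame(runif(125), rexp(125))
dat <- cbind(y, x)  # data frame with columns y, runif.125., rexp.125.

k <- 2
# Randomly assign each row to one of k folds (here 63 and 62 rows)
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Hypothetical helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

per_fold <- numeric(k)
for (i in seq_len(k)) {
  test_rows <- which(fold_id == i)
  # Fit on every fold except fold i ...
  fit <- lm(y ~ ., data = dat[-test_rows, ])
  # ... and score that same model on the held-out fold i
  pred <- predict(fit, newdata = dat[test_rows, ])
  per_fold[i] <- rmse(dat$y[test_rows], pred)
}

print(per_fold)            # one RMSE per held-out fold
cv_rmse <- mean(per_fold)  # average across folds
print(cv_rmse)
```

Replacing the `sample(...)` line with `fold_id <- rep(1:2, c(63, 62))` gives the sequential split used in the question (rows 1:63, then 64:125), which makes it easy to compare the two fold assignments.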
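My question is: why do I get a different RMSE? The difference is too large to be just estimation noise, so caret must be computing it in some other way. Do you know what I am doing differently?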
Tags: r regression linear-regression cross-validation manual-testing