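【Posted】: 2019-06-10 04:13:54
【Question】:
I am writing a function that collects diagnostics and test error from a series of linear regression models.
My input is a list of lists. Each inner list holds the specification of one model.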
model.1 <- list("medv","~.","Boston_Ready")
names(model.1) <- c("response", "input","dataset")
model.2 <- list("medv","~lstat","Boston_Ready")
names(model.2) <- c("response", "input","dataset")
models <- list(model.1,model.2)
Given a list that names a data frame, a response variable, and the inputs, my function computes the regression diagnostics.
TestError <- function(model){
  library('boot')
  df <- get(model$dataset)
  formula <- paste(model$response,model$input)
  response <- model$response
  ##Diagnostics
  fit <- lm(formula,data=df)
  fit_summ <- summary(fit)
  F_Stat <- fit_summ$fstatistic[1]
  Adj_R_Sq <- fit_summ$adj.r.squared
  RSS <- with(fit_summ, df[2] * sigma^2)
  AIC <- AIC(fit)
  BIC <- BIC(fit)
  ##Cross-Validation
  #5-fold cross validation
  glm.fit <- glm(formula, data=df)
  cv.err <- cv.glm(df, glm.fit, K=5)
  Five.Fold_MSE <- cv.err$delta[1]
  #10-fold cross validation
  glm.fit <- glm(formula, data=df)
  cv.err <- cv.glm(df, glm.fit, K=10)
  Ten.Fold_MSE <- cv.err$delta[1]
  #LOOCV
  glm.fit <- glm(formula, data=df)
  cv.err <- cv.glm(df, glm.fit)
  LOOCV_MSE <- cv.err$delta[1]
  #Output
  label <- c("lm","formula =",paste(model$response,model$input), "data= ",model$dataset)
  print(paste(label))
  Results <- (c(LOOCV_MSE,Five.Fold_MSE,Ten.Fold_MSE,F_Stat,Adj_R_Sq, RSS, AIC, BIC))
  names(Results) <- c("LOOCV MSE", "5-Fold MSE", "10-Fold MSE","F-Stat","Adjusted R^2","RSS","AIC","BIC")
  print(Results)
}
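As a side note on the first few lines of the function, here is a minimal sketch of how a model spec becomes a fitted model and how the RSS line can be checked. It uses the built-in mtcars data instead of Boston_Ready, and the names spec, fml, rss_from_summary and rss_direct are hypothetical, chosen only for illustration.

```r
# Hypothetical spec in the same shape as model.1, pointing at the built-in mtcars data
spec <- list(response = "mpg", input = "~ wt", dataset = "mtcars")

df  <- get(spec$dataset)                              # look the data frame up by name
fml <- as.formula(paste(spec$response, spec$input))   # "mpg ~ wt" as a formula object
fit <- lm(fml, data = df)
fit_summ <- summary(fit)

# Residual sum of squares two ways:
# inside with(), df refers to fit_summ$df, whose second element is the residual df
rss_from_summary <- with(fit_summ, df[2] * sigma^2)
rss_direct       <- sum(residuals(fit)^2)
```

Both values should agree up to floating-point error, which suggests the RSS expression in TestError computes what its name says.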
For some reason, the same output is produced twice:
lapply(models,TestError)
> lapply(models,TestError)
[1] "lm" "formula =" "medv ~." "data= " "Boston_Ready"
LOOCV MSE 5-Fold MSE 10-Fold MSE F-Stat Adjusted R^2 RSS AIC BIC
0.3250332 0.3288020 0.3251508 114.3744328 0.6918372 152.5405737 853.2181335 903.9365735
[1] "lm" "formula =" "medv ~lstat" "data= " "Boston_Ready"
LOOCV MSE 5-Fold MSE 10-Fold MSE F-Stat Adjusted R^2 RSS AIC BIC
0.4597660 0.4622565 0.4593045 601.6178711 0.5432418 230.2061197 1043.4596316 1056.1392416
[[1]]
LOOCV MSE 5-Fold MSE 10-Fold MSE F-Stat Adjusted R^2 RSS AIC BIC
0.3250332 0.3288020 0.3251508 114.3744328 0.6918372 152.5405737 853.2181335 903.9365735
[[2]]
LOOCV MSE 5-Fold MSE 10-Fold MSE F-Stat Adjusted R^2 RSS AIC BIC
0.4597660 0.4622565 0.4593045 601.6178711 0.5432418 230.2061197 1043.4596316 1056.1392416
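Is this due to a quirk of lapply()?
【Comments】:
- You only see it twice because you are printing from inside the function. If you assign the lapply call to a variable (i.e. testing <- lapply(models,TestError)), the output should appear only once.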
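The behaviour described in the comment can be reproduced without boot or Boston_Ready. The sketch below uses two hypothetical stand-in functions, print_and_return and return_only, which are not part of the original code. The first copy of the output comes from print() inside the function. Because print() also returns its argument and is the last expression in the function, lapply() collects those values into a list, and the console auto-prints that list when the call is not assigned.

```r
# Hypothetical stand-in for TestError: prints its result, and (because print()
# returns its argument) also returns it as the function value.
print_and_return <- function(x) {
  res <- c(value = x * 2)
  print(res)
}

# Typed at the console without assignment, this would show each result twice:
# once from print() and once when the returned list is auto-printed.
# lapply(1:2, print_and_return)

# Assigning the result suppresses auto-printing, so only the print() output appears.
testing <- lapply(1:2, print_and_return)

# Alternative: do not print inside the function and let the caller decide.
return_only <- function(x) c(value = x * 2)
results <- lapply(1:2, return_only)   # prints nothing
```

Wrapping the call in invisible(), as in invisible(lapply(models, TestError)), is another way to keep the in-function printing while suppressing the auto-printed list.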