【Question title】: How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3
【Posted】: 2021-06-16 04:01:13
【Question】:

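I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to avoid variability in smaller data sets.

In caret, I can do this with "repeatedcv".

Since I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure about the correct way to do this step in mlr3.

Example data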

#library
library(caret)
library(mlr3verse)
library(mlbench)

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# get small training data
train.data <- data[1:60,]

Created on 2021-03-18 by the reprex package (v1.0.0)

caret approach (tuning alpha and lambda) using "cv" and "repeatedcv"


trControlCv <- trainControl("cv",
             number = 5,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
             number = 5,
             repeats= 20,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1

set.seed(23)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2


# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)

model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3


set.seed(55)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4

Showing different coefficients with cross-validation and identical coefficients with repeated cross-validation

# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE

# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE

First mlr3 approach using cv.glmnet (tunes lambda internally)

# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create learner 
learner = as_learner(glmnet_lrn)

# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1

set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2

Showing different coefficients with cross-validation

# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.323460895
#> age          0.005065928
#> glucose      0.019727881
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     0.001290570
#> pressure     .          
#> triceps      0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.146190752
#> age          0.003840963
#> glucose      0.019015433
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     .          
#> pressure     .          
#> triceps      0.018841557

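Update 1: progress so far

Based on the comments below and this comment, I can use rsmp and AutoTuner.

This answer suggests tuning glmnet rather than cv.glmnet (glmnet was not available in mlr3 at that time).

Second mlr3 approach using glmnet (repeated tuning of alpha and lambda)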

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")

# turn to learner
learner = as_learner(glmnet_lrn)

# make search space
search_space = ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 1, upper = 1)
)

# set terminator
terminator = trm("evals", n_evals = 20)

#set tuner
tuner = tnr("grid_search", resolution = 3)

# tune the learner
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("repeated_cv"),  # defaults: 10 folds, 10 repeats
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner=tuner)

at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights

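Open question

How can I show that my second approach is valid and that I get the same or similar coefficients with different seeds? In other words, how do I extract the coefficients of the AutoTuner's final model?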

set.seed(23)
at$train(train.task) -> tune1

set.seed(2323) 
at$train(train.task) -> tune2

【Comments】:

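  • You can do the same thing in mlr3, see mlr3book.mlr-org.com/resampling.html
  • @LarsKotthoff Thank you for your comment. I have adjusted my question accordingly.
  • I am not sure what your question is, or whether it has already been answered. Please try to ask short, concise questions in the future (the reprex is nice, though!). You can also answer your own question and, if in doubt, ask a new one. To answer your question: I have already answered how to tune glmnet with the old mlr [here] stackoverflow.com/questions/50995525/… . Porting that to mlr3 should not be too hard, but I do not have time right now. Does this help?
  • Thank you for the helpful comment. I have tried to state more concisely the progress I made (thanks to the comments) and the points that are still open.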
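
As a rough illustration of the first comment, caret's `trainControl("repeatedcv", number = 5, repeats = 20)` seems to correspond to mlr3's `rsmp("repeated_cv", folds = 5, repeats = 20)`. The sketch below only builds and instantiates the resampling; it uses mlr3's built-in `pima` task as a stand-in for the question's data, which is an assumption made here for brevity.

```r
library(mlr3)

# mlr3 counterpart of caret's trainControl("repeatedcv", number = 5, repeats = 20)
rcv <- rsmp("repeated_cv", folds = 5, repeats = 20)

# Stand-in task (assumption): the Pima data set shipped with mlr3
task <- tsk("pima")

# Fix the train/test splits; they depend on the random seed
set.seed(1)
rcv$instantiate(task)

rcv$iters                # number of train/test splits, i.e. folds * repeats
length(rcv$test_set(1))  # number of rows in the first test fold
```

The same `rcv` object can be passed as the `resampling` argument of `resample()` or of an `AutoTuner`.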

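Tags: r r-caret mlr3


【Solution 1】:

Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done with the SECOND mlr3 approach described above. The coefficients can be extracted with stats::coef, using the tuned values stored in the AutoTuner.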

coef(tune1$model$learner$model, alpha=tune1$tuning_result$alpha,s=tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
coef(tune2$model$learner$model, alpha=tune2$tuning_result$alpha,s=tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
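
One caveat when comparing seeds: `$train()` modifies the AutoTuner in place and returns the object itself, so `tune1` and `tune2` from the question refer to the same R6 object and will always show the same coefficients. A minimal sketch of a comparison on independent fits is given below, assuming the same data and search space as in the question. The helper `fit_with_seed()` and the choice of 5 folds and 20 repeats are illustrative assumptions, not part of the original post.

```r
library(mlr3verse)
library(mlbench)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
train.task <- as_task_classif(train.data, target = "diabetes", id = "train.data")

at <- AutoTuner$new(
  learner = lrn("classif.glmnet", predict_type = "prob"),
  resampling = rsmp("repeated_cv", folds = 5, repeats = 20),  # assumed values
  measure = msr("classif.ce"),
  search_space = ps(
    alpha = p_dbl(lower = 0, upper = 1),
    s = p_dbl(lower = 1, upper = 1)  # lambda fixed at 1, as in the question
  ),
  terminator = trm("evals", n_evals = 20),
  tuner = tnr("grid_search", resolution = 3)
)

# Hypothetical helper: train a deep clone so each seed gets its own object
fit_with_seed <- function(seed) {
  set.seed(seed)
  fit <- at$clone(deep = TRUE)
  fit$train(train.task)
  fit
}

tune1 <- fit_with_seed(23)
tune2 <- fit_with_seed(2323)

# Coefficients of each final model at its tuned lambda
coef1 <- coef(tune1$learner$model, s = tune1$tuning_result$s)
coef2 <- coef(tune2$learner$model, s = tune2$tuning_result$s)

# Tolerant comparison; identical() would also flag tiny numerical differences
isTRUE(all.equal(as.matrix(coef1), as.matrix(coef2), tolerance = 1e-6))
```

If the two seeds select different values of alpha, the comparison returns FALSE, which is itself useful information about how stable the tuning is.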

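【Comments】:

  • I am not sure why you specify s = 1 ( s = p_dbl(lower = 1, upper = 1) ). That way you only consider a single lambda value. Shouldn't this be tuned, or determined with cv.glmnet? You and I may be trying to solve a similar problem!
  • For more information, I like the answer here: stats.stackexchange.com/questions/77546/how-to-interpret-glmnet/…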
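
Following up on the point about `s = 1`, one possible variation is to give `s` a real range so that lambda is tuned together with alpha. The sketch below assumes the range 0.001 to 0.2 and a 5 x 5 grid purely for illustration; suitable bounds depend on the data.

```r
library(mlr3verse)
library(mlbench)

data(PimaIndiansDiabetes, package = "mlbench")
train.task <- as_task_classif(PimaIndiansDiabetes[1:60, ],
                              target = "diabetes", id = "train.data")

# Search over both alpha and lambda (s); the bounds for s are an assumption
search_space <- ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 0.001, upper = 0.2)
)

at <- AutoTuner$new(
  learner = lrn("classif.glmnet", predict_type = "prob"),
  resampling = rsmp("repeated_cv", folds = 5, repeats = 20),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("evals", n_evals = 25),    # enough for the full 5 x 5 grid
  tuner = tnr("grid_search", resolution = 5)
)

set.seed(23)
at$train(train.task)

# Best configuration found and its resampled classification error
at$tuning_result[, c("alpha", "s", "classif.ce")]

# Coefficients of the final model at the tuned lambda
coef(at$learner$model, s = at$tuning_result$s)
```

A log-scaled range for `s` is a common alternative when the plausible lambda values span several orders of magnitude.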