【Question title】: Xgboost Hyperparameter Tuning In R for binary classification
【Posted】: 2020-06-17 16:33:14
【Question】:

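I am new to R and am trying to do hyperparameter tuning for xgboost binary classification, but I am running into an error. I would appreciate it if someone could help me.

Error in as.matrix(cv.res)[, 3] : subscript out of bounds
In addition: Warning message:
'early.stop.round' is deprecated.
Use 'early_stopping_rounds' instead.
See help("Deprecated") and help("xgboost-deprecated").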

Please find the code snippet below.

I would also appreciate it if someone could suggest an alternative to this approach in R.

X_Train <- as(X_train, "dgCMatrix")


GS_LogLoss = data.frame("Rounds" = numeric(), 
                        "Depth" = numeric(),
                        "r_sample" = numeric(),
                        "c_sample" = numeric(), 
                        "minLogLoss" = numeric(),
                        "best_round" = numeric())

for (rounds in seq(50,100, 25)) {
  
  for (depth in c(4, 6, 8, 10)) {
    
    for (r_sample in c(0.5, 0.75, 1)) {
      
      for (c_sample in c(0.4, 0.6, 0.8, 1)) {
        
        for (imb_scale_pos_weight in c(5, 10, 15, 20, 25))	{
          
          for (wt_gamma in c(5, 7, 10)) {
            
            for (wt_max_delta_step in c(5,7,10)) {
              
              for (wt_min_child_weight in c(5,7,10,15))	{
                
                
                set.seed(1024)
                eta_val = 2 / rounds
                cv.res = xgb.cv(data = X_Train, nfold = 2, label = y_train, 
                                nrounds = rounds, 
                                eta = eta_val, 
                                max_depth = depth,
                                subsample = r_sample, 
                                colsample_bytree = c_sample,
                                early.stop.round = 0.5*rounds,
                                scale_pos_weight= imb_scale_pos_weight,
                                max_delta_step = wt_max_delta_step,
                                gamma = wt_gamma,
                                objective='binary:logistic', 
                                eval_metric = 'auc',
                                verbose = FALSE)
                
                print(paste(rounds, depth, r_sample, c_sample, min(as.matrix(cv.res)[,3]) ))
                GS_LogLoss[nrow(GS_LogLoss)+1, ] = c(rounds, 
                                                     depth, 
                                                     r_sample, 
                                                     c_sample, 
                                                     min(as.matrix(cv.res)[,3]), 
                                                     which.min(as.matrix(cv.res)[,3]))
                
              }
            }
          }
        }	
      }
    }
  }	
}

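As a side note on the loops above: the variable wt_min_child_weight is never passed to xgb.cv, so the innermost loop repeats identical runs, and the full grid is very large. The following is a minimal base-R sketch (not part of the original question) that builds the same grid with expand.grid and draws a random subset of it. The budget `n_candidates` and the helper `row_to_params` are hypothetical names introduced only for illustration.

```r
# Same value ranges as the nested loops, using xgboost's own parameter names
grid <- expand.grid(rounds           = seq(50, 100, 25),
                    max_depth        = c(4, 6, 8, 10),
                    subsample        = c(0.5, 0.75, 1),
                    colsample_bytree = c(0.4, 0.6, 0.8, 1),
                    scale_pos_weight = c(5, 10, 15, 20, 25),
                    gamma            = c(5, 7, 10),
                    max_delta_step   = c(5, 7, 10),
                    min_child_weight = c(5, 7, 10, 15))
grid$eta <- 2 / grid$rounds   # same learning-rate rule as in the question

nrow(grid)  # 25920 combinations, each one a separate cross-validation run

# Random search: evaluate only a sample of the grid (hypothetical budget)
set.seed(1024)
n_candidates <- 30
candidates <- grid[sample(nrow(grid), n_candidates), ]

# Hypothetical helper: turn one grid row into a params list for xgb.cv
row_to_params <- function(row) {
  c(as.list(row[setdiff(names(row), "rounds")]),
    list(objective = "binary:logistic", eval_metric = "auc"))
}

params <- row_to_params(candidates[1, ])
str(params)
```

Each row of `candidates` can then be passed to a single xgb.cv call, with `rounds` used as nrounds and `params` as the parameter list.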

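【Comments】:

  • Are you sure the output of xgb.cv is a matrix? I suggest getting it to work with one parameter first. Once that works, you can extend the test to the others.

Tags: r machine-learning xgboost grid-search hyperparameters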
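The check suggested in the comment can be sketched as follows. In the xgboost releases that emit the 'early_stopping_rounds' deprecation warning, xgb.cv returns a list-like object rather than a matrix, and the per-round metrics are stored in its evaluation_log element, which would explain why as.matrix(cv.res)[, 3] is out of bounds. The sketch uses the agaricus data bundled with xgboost as a stand-in for X_Train and y_train (an assumption for illustration), and the column name test_auc_mean should be verified with names(log) on your installed version. Since AUC is better when larger, the best round is taken with which.max rather than which.min.

```r
library(xgboost)

# Stand-in data shipped with xgboost (assumption: replace with X_Train / y_train)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data,
                      label = agaricus.train$label)

set.seed(1024)
cv.res <- xgb.cv(params = list(objective = "binary:logistic",
                               eval_metric = "auc",
                               max_depth = 4,
                               eta = 0.1),
                 data = dtrain,
                 nrounds = 20,
                 nfold = 2,
                 early_stopping_rounds = 10,  # replaces early.stop.round
                 verbose = FALSE)

class(cv.res)  # inspect the returned object: it is not a matrix

# Per-round cross-validation metrics
log <- as.data.frame(cv.res$evaluation_log)
names(log)

# AUC should be maximised, so pick the round with the highest test mean
best_round <- which.max(log$test_auc_mean)
best_auc   <- max(log$test_auc_mean)
```

Once a single run like this works, the same two lines that compute `best_round` and `best_auc` can replace the as.matrix(cv.res)[, 3] expressions inside the loop.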


【Solution 1】:

To select hyperparameters you can use the tidymodels metapackage, in particular the packages parsnip, rsample, yardstick and tune.

A workflow like this would work:

library(tidyverse)
library(tidymodels)

# Specify the model and the parameters to tune (parsnip)
model <-
  boost_tree(tree_depth = tune(), mtry = tune()) %>% 
  set_mode("classification") %>% 
  set_engine("xgboost")

# Specify the resampling method (rsample)
# Assumes X_train is a data frame that also contains the outcome column Y as a factor
splits <- vfold_cv(X_train, v = 2)

# Specify the metrics to optimize (yardstick)
metrics <- metric_set(roc_auc)

# Specify the parameters grid (or you can use dials to automate your grid search)
grid <- expand_grid(tree_depth = c(4, 6, 8, 10),
                    mtry = c(2, 10, 50)) # You can add others

# Run each model (tune)
# Current tune releases take the model spec first, then the formula (preprocessor)
tuned <- tune_grid(model,
                   Y ~ .,
                   resamples = splits,
                   grid = grid,
                   metrics = metrics,
                   control = control_grid(verbose = TRUE))

# Check results
show_best(tuned)
autoplot(tuned)
select_best(tuned)

# Update model
tuned_model <- 
  model %>% 
  finalize_model(select_best(tuned)) %>% 
  fit(Y ~ ., data = X_train)

# Make prediction 
predict(tuned_model, X_train)
predict(tuned_model, X_test)

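Note that the names in the model specification may differ from the original names in xgboost, because parsnip is a unified interface with consistent names across multiple models. See here.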
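To see how the parsnip names map onto xgboost's own argument names, one option is parsnip's translate() function, which prints the fit call template for the chosen engine. This is a small sketch with arbitrary illustrative values. The exact engine argument used for mtry can differ between parsnip versions, so it is best read off the printed template rather than assumed.

```r
library(parsnip)

# Arbitrary values, chosen only to make the mapping visible
spec <- boost_tree(tree_depth = 6, mtry = 10, trees = 100, learn_rate = 0.05) %>%
  set_mode("classification") %>%
  set_engine("xgboost")

# Prints the specification together with the engine-level fit template
translated <- translate(spec)
translated

# Engine-level argument names, e.g. max_depth for tree_depth,
# nrounds for trees and eta for learn_rate
engine_args <- names(translated$method$fit$args)
engine_args
```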

【Comments】:

  • I cannot install the tidymodels package; it reports an error.
  • Then try installing the packages individually: rsample, parsnip, yardstick, tune.