【Question title】: Error in LightGBM algorithm using tidymodels and treesnip package
【Posted】: 2021-06-20 08:30:06
【Question】:

I want to try the LightGBM algorithm using the tidymodels and treesnip packages. Some preprocessing first:

# remotes::install_github("curso-r/treesnip")
# install.packages("titanic")
library(tidymodels)
library(stringr)
library(titanic)
data("titanic_train")

df <- titanic_train %>% as_tibble %>%
  mutate(title=str_extract(Name,"\\w+\\.") %>% str_replace(fixed("."),"")) %>%
  mutate(title=case_when(title %in% c('Mlle','Ms')~'Miss',
                         title=='Mme'~ 'Mrs',
                         title %in% c('Capt','Don','Major','Sir','Jonkheer', 'Col')~'Sir',
                         title %in% c('Dona', 'Lady', 'Countess')~'Lady',
                         TRUE~title)) %>%
  mutate(title=as.factor(title),
         Survived=factor(Survived,levels = c(0,1),labels=c("no","yes")),
         Sex=as.factor(Sex),
         Pclass=factor(Pclass)) %>%
  select(-c(PassengerId,Ticket,Cabin,Name)) %>%
  mutate(Embarked=as.factor(Embarked))
table(df$title,df$Sex)

trnTst <- initial_split(data = df,prop = .8,strata = Survived)

cv.folds <- training(trnTst) %>% 
  vfold_cv(data = .,v = 4,repeats = 1)
cv.folds
rec <- recipe(Survived~.,data = training(trnTst)) %>% 
  step_nzv(all_predictors()) %>%  
  step_knnimpute(Age,neighbors = 3,impute_with = vars(title,Fare,Pclass))

To make sure the problem is not in the data, I successfully tuned a random forest model:

m.rf <- rand_forest(trees = 1000,min_n = tune(),mtry = tune()) %>% 
  set_mode(mode = 'classification') %>% 
  set_engine('ranger')
wf.rf <- workflow() %>% add_recipe(rec) %>% add_model(m.rf)
(cls <- parallel::makeCluster(parallel::detectCores()-1))
doParallel::registerDoParallel(cl = cls)
tn.rf <- tune_grid(wf.rf,resamples = cv.folds,grid = 20,
                    metrics = metric_set(accuracy,roc_auc))
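doParallel::stopImplicitCluster()  # note: this does not stop a cluster made with makeCluster(); cls stays registered (parallel::stopCluster(cls) shuts it down)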
autoplot(tn.rf)
wf.rf <- finalize_workflow(x = wf.rf,parameters = select_best(tn.rf,metric = 'roc_auc'))
res.rf <- fit_resamples(wf.rf,resamples = cv.folds,metrics = metric_set(accuracy,roc_auc))
res.rf %>% collect_metrics()

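But LightGBM throws an error, even without tuning or parallel processing.

According to How to Use Lightgbm with Tidymodels:

Compared to XGBoost, both lightgbm and catboost are very capable of handling categorical variables (factors), so you don't need to convert variables into dummies (one-hot encoding). In fact you shouldn't: it makes everything slower and may give you worse performance.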
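A minimal sketch of what that advice means for the recipe, using a hypothetical toy tibble whose columns only mimic the Titanic data: an xgboost-style recipe expands factor predictors with `step_dummy()`, while a lightgbm-style recipe leaves them as factors. This shows only the preprocessing difference; how treesnip then passes the factor columns to LightGBM is not shown and may depend on the treesnip version.

```r
library(tidymodels)

# Hypothetical toy data: one factor predictor (Pclass) and one numeric predictor (Fare)
toy <- tibble(
  Survived = factor(c("no", "yes", "no", "yes", "no", "yes")),
  Pclass   = factor(c("1", "2", "3", "1", "2", "3")),
  Fare     = c(7.25, 71.3, 7.9, 53.1, 8.05, 51.9)
)

# xgboost-style: factor predictors are replaced by dummy (indicator) columns
rec_xgb <- recipe(Survived ~ ., data = toy) %>%
  step_dummy(all_nominal_predictors())

# lightgbm-style (via treesnip): factor predictors are left untouched
rec_lgbm <- recipe(Survived ~ ., data = toy)

# Compare the column names each recipe produces on the training data
names(bake(prep(rec_xgb), new_data = NULL))
names(bake(prep(rec_lgbm), new_data = NULL))
```

The question's `rec` already follows the lightgbm-style pattern, since it has no `step_dummy()`.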

library(treesnip) # lightgbm & catboost connector
m.lgbm <- boost_tree() %>% #trees = tune(), min_n = tune()) %>% 
  set_mode(mode = 'classification') %>% 
  set_engine('lightgbm')
wf.lgbm <- workflow() %>% add_recipe(rec) %>% add_model(m.lgbm)
res.lgbm <- fit_resamples(wf.lgbm,resamples = cv.folds)
Warning message:
All models failed. See the `.notes` column. 

res.lgbm$.notes[[1]]

internal: Error in pkg_list[[1]]: subscript out of bounds
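One way to narrow the error down, offered as a debugging assumption rather than a known fix: fit the LightGBM workflow once, directly, outside `fit_resamples()`. The stand-in data set `modeldata::two_class_dat` replaces the Titanic data so the block runs on its own. If this direct fit succeeds, the failure is more likely in the resampling machinery (for example, how packages are loaded on parallel workers) than in treesnip's lightgbm engine itself.

```r
library(tidymodels)
library(treesnip)  # registers the 'lightgbm' engine for boost_tree()

# Stand-in data (not the Titanic data): predictors A, B and a two-level factor Class
data(two_class_dat, package = "modeldata")

wf_lgbm <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(
    boost_tree() %>%
      set_mode("classification") %>%
      set_engine("lightgbm")
  )

# A single fit with no resampling and no foreach backend involved
fit_lgbm <- fit(wf_lgbm, data = two_class_dat)

# Class probabilities for the first few rows, as a quick sanity check
predict(fit_lgbm, new_data = head(two_class_dat), type = "prob")
```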

【Discussion】:

  • Same problem here. Any hints?

Tags: r lightgbm tidymodels


【Answer 1】:

Try running tune_grid without doParallel. There seems to be a conflict between LightGBM and tune_grid, since both want to run in parallel.
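A sketch of that workaround, assuming the conflict really is between a registered foreach backend and LightGBM: reset to sequential execution with `foreach::registerDoSEQ()` and also tell `tune_grid()` not to use any parallel backend via `control_grid(allow_par = FALSE)`. The data set `modeldata::two_class_dat` and the small grid are stand-ins for the question's Titanic data and `cv.folds`; whether this also clears the `pkg_list` error depends on the installed tune and treesnip versions.

```r
library(tidymodels)
library(treesnip)  # registers the 'lightgbm' engine for boost_tree()

# Drop any previously registered parallel backend (e.g. a leftover doParallel cluster)
foreach::registerDoSEQ()

# Stand-in data and resamples
data(two_class_dat, package = "modeldata")
folds <- vfold_cv(two_class_dat, v = 4, strata = Class)

m_lgbm <- boost_tree(trees = tune(), min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("lightgbm")

wf_lgbm <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(m_lgbm)

# allow_par = FALSE keeps tune_grid() sequential even if a backend is registered
tn_lgbm <- tune_grid(
  wf_lgbm,
  resamples = folds,
  grid = 5,
  metrics = metric_set(accuracy, roc_auc),
  control = control_grid(allow_par = FALSE)
)

collect_metrics(tn_lgbm)
```

Running sequentially is slower, but it leaves LightGBM free to use its own threads without competing with foreach workers.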
