[Question title]: Get AUC on training data from a fitted workflow in Tidymodels?
[Posted]: 2021-04-12 17:24:45
[Question]:

I'm struggling to work out how to get the AUC from a logistic regression model using tidymodels.

Here is an example using the built-in mpg dataset.

library(tidymodels)
library(tidyverse)

# Use mpg dataset
df <- mpg

# Create an indicator variable for class="suv"
df$is_suv <- as.factor(df$class == "suv")

# Create the split object
df_split <- initial_split(df, prop=1/2)

# Create the training and testing sets
df_train <- training(df_split)
df_test <- testing(df_split)

# Create workflow
rec <-
  recipe(is_suv ~ cty + hwy + cyl, data=df_train)

glm_spec <-
  logistic_reg() %>%
  set_engine(engine = "glm")

glm_wflow <- 
  workflow() %>%
  add_recipe(rec) %>%
  add_model(glm_spec)

# Fit the model
model1 <- fit(glm_wflow, df_train)

# Attach predictions to training dataset
training_results <- bind_cols(df_train, predict(model1, df_train))

# Calculate accuracy
accuracy(training_results, truth = is_suv, estimate = .pred_class)

# Calculate AUC??
roc_auc(training_results, truth = is_suv, estimate = .pred_class)

The last line returns this error:

> roc_auc(training_results, truth = is_suv, estimate = .pred_class)
Error in metric_summarizer(metric_nm = "roc_auc", metric_fn = roc_auc_vec,  : 
  formal argument "estimate" matched by multiple actual arguments

[Comments]:

    Tags: r tidymodels yardstick


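    [Solution 1]:

    Because you are doing binary classification, roc_auc() expects a vector of class probabilities for the "relevant" class, not the predicted classes.

    You can get these with predict(model1, df_train, type = "prob"). Alternatively, if you are using workflows version 0.2.2 or later, you can use augment() to get both the class predictions and the class probabilities without needing bind_cols().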

    library(tidymodels)
    library(tidyverse)
    
    # Use mpg dataset
    df <- mpg
    
    # Create an indicator variable for class="suv"
    df$is_suv <- as.factor(df$class == "suv")
    
    # Create the split object
    df_split <- initial_split(df, prop=1/2)
    
    # Create the training and testing sets
    df_train <- training(df_split)
    df_test <- testing(df_split)
    
    # Create workflow
    rec <-
      recipe(is_suv ~ cty + hwy + cyl, data=df_train)
    
    glm_spec <-
      logistic_reg() %>%
      set_engine(engine = "glm")
    
    glm_wflow <- 
      workflow() %>%
      add_recipe(rec) %>%
      add_model(glm_spec)
    
    # Fit the model
    model1 <- fit(glm_wflow, df_train)
    
    # Attach predictions to training dataset
    training_results <- augment(model1, df_train)
    
    # Calculate accuracy
    accuracy(training_results, truth = is_suv, estimate = .pred_class)
    #> # A tibble: 1 x 3
    #>   .metric  .estimator .estimate
    #>   <chr>    <chr>          <dbl>
    #> 1 accuracy binary         0.795
    
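    # Calculate AUC (the probability column is passed unnamed, via ...;
    # naming it `estimate` is what triggered the error in the question)
    roc_auc(training_results, truth = is_suv, .pred_FALSE)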
    #> # A tibble: 1 x 3
    #>   .metric .estimator .estimate
    #>   <chr>   <chr>          <dbl>
    #> 1 roc_auc binary         0.879
    

    Created on 2021-04-12 by the reprex package (v1.0.0)
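For readers who cannot use augment(), the predict(..., type = "prob") route mentioned above can be sketched as follows. This is a minimal sketch that rebuilds the same workflow; the seed value and the names auc_first and auc_second are illustrative assumptions, not part of the original answer. It also shows the event_level argument of roc_auc(): by default yardstick treats the first factor level ("FALSE" here) as the event, which is why .pred_FALSE is the matching probability column.

```r
library(tidymodels)

# mpg ships with ggplot2, which tidymodels attaches
df <- ggplot2::mpg
df$is_suv <- as.factor(df$class == "suv")  # factor levels: "FALSE", "TRUE"

set.seed(123)  # arbitrary seed (assumption), only so the split is repeatable
df_split <- initial_split(df, prop = 1/2)
df_train <- training(df_split)

glm_wflow <-
  workflow() %>%
  add_recipe(recipe(is_suv ~ cty + hwy + cyl, data = df_train)) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

model1 <- fit(glm_wflow, df_train)

# Hard class predictions and class probabilities come from separate predict() calls
training_results <- bind_cols(
  df_train,
  predict(model1, df_train),                # adds .pred_class
  predict(model1, df_train, type = "prob")  # adds .pred_FALSE and .pred_TRUE
)

# Default: the first factor level ("FALSE") is treated as the event
auc_first <- roc_auc(training_results, truth = is_suv, .pred_FALSE)

# Treat "TRUE" (the car is an SUV) as the event instead
auc_second <- roc_auc(training_results, truth = is_suv, .pred_TRUE,
                      event_level = "second")
```

Both calls should report the same AUC, because the two probability columns sum to one and the event definition is flipped along with the column. Passing .pred_TRUE without event_level = "second" would instead give one minus that value.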

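    [Comments]:

    • I updated to the latest version, but I can't get the augment function to work. Here is the error: > training_results <- augment(model1, df_train) Error: No augment method for objects of class workflow
    • Never mind, I just needed to update my version of workflows.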
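
If augment() fails with the "No augment method" error quoted in the comment above, a quick check of the installed workflows version may help. This is a small sketch; the variable name has_augment is illustrative, and the 0.2.2 threshold is taken from the answer.

```r
# augment() for fitted workflows is available from workflows 0.2.2 (per the answer)
has_augment <- packageVersion("workflows") >= "0.2.2"

if (!has_augment) {
  message('Update with install.packages("workflows"), then restart R.')
}
```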