【Question title】: Plotting issues - Partial dependence plots
【Posted】: 2021-08-14 09:57:46
【Question】:

I created the following explain_tidymodels explainer in order to show partial dependence plots.

explainer <- explain_tidymodels(rf_vi_fit, data = Data_train, y = Data_train$Lead_week)

I am now creating the plot as follows:

model_profile(explainer, variables = c("AC", "Jaar", "Month", "Retentie")) %>% plot()

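This gives me the following image:

There are two problems. First, the text "Created for the workflow model" covers my AC title. Second, I would like to change the color from blue to red. I tried %>% plot(color = "red") and %>% plot(col = "red"), but neither seems to work.

Does anyone know how to fix either of these plotting issues? Thanks in advance!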

【Comments】:

    Tags: r plot tidymodels


    【Answer 1】:

    You can use the as_tibble() function to access the data these plots are built from, and then create the plot in whatever custom way you like:

    library(tidymodels)
    #> Registered S3 method overwritten by 'tune':
    #>   method                   from   
    #>   required_pkgs.model_spec parsnip
    library(DALEXtra)
    #> Loading required package: DALEX
    #> Welcome to DALEX (version: 2.2.0).
    #> Find examples and detailed introduction at: http://ema.drwhy.ai/
    #> Additional features will be available after installation of: ggpubr.
    #> Use 'install_dependencies()' to get all suggested dependencies
    #> 
    #> Attaching package: 'DALEX'
    #> The following object is masked from 'package:dplyr':
    #> 
    #>     explain
    
    data(ames)
    ames_train <- ames %>%
        transmute(Sale_Price = log10(Sale_Price),
                  Gr_Liv_Area = as.numeric(Gr_Liv_Area), 
                  Year_Built, Bldg_Type)
    
    rf_model <- 
        rand_forest(trees = 1000) %>% 
        set_engine("ranger") %>% 
        set_mode("regression")
    
    rf_wflow <- 
        workflow() %>% 
        add_formula(
            Sale_Price ~ Gr_Liv_Area + Year_Built + Bldg_Type) %>% 
        add_model(rf_model) 
    
    rf_fit <- rf_wflow %>% fit(data = ames_train)
    explainer_rf <- explain_tidymodels(
        rf_fit, 
        data = dplyr::select(ames_train, -Sale_Price), 
        y = ames_train$Sale_Price,
        label = "random forest"
    )
    #> Preparation of a new explainer is initiated
    #>   -> model label       :  random forest 
    #>   -> data              :  2930  rows  3  cols 
    #>   -> data              :  tibble converted into a data.frame 
    #>   -> target variable   :  2930  values 
    #>   -> predict function  :  yhat.workflow  will be used ( default )
    #>   -> predicted values  :  No value for predict function target column. ( default )
    #>   -> model_info        :  package tidymodels , ver. 0.1.3 , task regression ( default ) 
    #>   -> predicted values  :  numerical, min =  4.91122 , mean =  5.220561 , max =  5.520101  
    #>   -> residual function :  difference between y and yhat ( default )
    #>   -> residuals         :  numerical, min =  -0.8113628 , mean =  7.953836e-05 , max =  0.3598514  
    #>  A new explainer has been created!
    
    pdp_rf <- model_profile(explainer_rf, N = NULL, 
                            variables = "Gr_Liv_Area", groups = "Bldg_Type")
    
    as_tibble(pdp_rf$agr_profiles) %>%
        mutate(`_label_` = stringr::str_remove(`_label_`, "random forest_")) %>%
        ggplot(aes(`_x_`, `_yhat_`, color = `_label_`)) +
        geom_line(size = 1.2, alpha = 0.8) +
        labs(x = "Gross living area", 
             y = "Sale Price (log)", 
             color = NULL,
             title = "Partial dependence profile for Ames housing sales",
             subtitle = "Predictions from a random forest model")
    

    Created on 2021-05-27 by the reprex package (v2.0.0)

    【Comments】:

    • Thanks Julia, I really appreciate the explanation!
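
    For the two specific requests in the question (a single red line, and no subtitle overlapping the panel title), the same approach can be reduced to a minimal sketch. The data frame below is a hypothetical stand-in for as_tibble(pdp$agr_profiles): its values are made up and only mimic the `_x_`, `_yhat_` and `_label_` columns, so that the block runs with ggplot2 alone. With a real model you would pass the tibble obtained from model_profile() instead.

```r
library(ggplot2)

# Hypothetical stand-in for as_tibble(pdp$agr_profiles); the numbers are
# invented for illustration and do not come from any fitted model.
pdp_data <- data.frame(
  `_x_` = seq(500, 3000, by = 250),
  check.names = FALSE  # keep the non-syntactic column names as they are
)
pdp_data$`_yhat_` <- 5 + 0.0002 * pdp_data$`_x_`
pdp_data$`_label_` <- "random forest"

# Setting color outside aes() draws the whole line in one fixed color.
p <- ggplot(pdp_data, aes(`_x_`, `_yhat_`)) +
  geom_line(color = "red") +
  labs(x = "Gross living area",
       y = "Sale Price (log)",
       title = "Partial dependence profile",
       subtitle = NULL)  # no subtitle, so nothing can overlap the title

print(p)
```

    Because the plot is built by hand, the title and subtitle are whatever you pass to labs(), so the default subtitle text from the built-in plot() method never appears.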