【Question title】: Modifying Objects in R so that they are Compatible for Plotting
【Posted】: 2020-10-19 21:19:45
【Question】:

I found a program in R that draws plots for the observations in a data set.

#source: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf

library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
eta = 0.1, max_depth = 3, subsample = .5,
method = "hist", objective = "binary:logistic", nthread = 2, verbose = 0)

xgb.plot.shap(agaricus.test$data, model = bst, features = "odor=none")

contr <- predict(bst, agaricus.test$data, predcontrib = TRUE)

xgb.plot.shap(agaricus.test$data, contr, model = bst, top_n = 12, n_col = 3)

I would like to modify the plots generated above so that they show only the first observation. I tried modifying the code as follows:

# repeat for just one row: error

b <- (agaricus.test$data)[1,]
b <- as.matrix(b)
contr <- predict(bst, b, predcontrib = TRUE)
xgb.plot.shap(b, contr, model = bst, top_n = 12, n_col = 3)

Error in xgb.plot.shap(b, contr, model = bst, top_n = 12) : 
  shap_contrib is not compatible with the provided data

Can anyone tell me what I am doing wrong? Or is this simply not possible? Thanks.

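【Comments】:

  • I'd like to help, but I need your data to reproduce this.
  • @rodolfoksveiga Thanks for your reply! I believe the data ships with the "xgboost" library. If you copy and paste the first part of the code, everything runs; the second part does not. Let me know if you cannot access the data. Thanks!

Tags: r dataframe object plot xgboost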


【Solution 1】:

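As far as I know, it doesn't really work that way. Here is an example: a binary variable takes a score of 0 or 1. Suppose a score of 0 gives SHAP values between 0.2 and 0.5, while a score of 1 gives SHAP values between 1.2 and 1.5. That is what the plot illustrates: the difference in SHAP values between 0 and 1 for that variable. The "first observation" could be one with a score of 0 or one with a score of 1, so its SHAP value on its own doesn't tell you much about the variable. That is why the SHAP plot needs a matrix with more than one observation (and why your approach doesn't work).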
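Separately from the conceptual point, the shapes in the question's attempt can be checked directly. The sketch below uses a small made-up dense matrix as a stand-in for `agaricus.test$data` (the real object is a sparse matrix with many more columns; the values and the names `x`, `v`, `one_row` are assumptions for illustration). It shows that `[1, ]` followed by `as.matrix()` produces a single-column matrix, whereas `drop = FALSE` keeps one row.

```r
# Toy stand-in for agaricus.test$data: 3 observations x 4 features.
# (Hypothetical values, for illustration only.)
x <- matrix(c(1, 0, 0, 1,
              0, 1, 0, 1,
              1, 1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("f1", "f2", "f3", "f4")))

# Indexing a single row drops the matrix to a plain named vector ...
v <- x[1, ]

# ... and as.matrix() on a vector returns a single-column matrix,
# so the 4 features end up as 4 rows.
dim_wrong <- dim(as.matrix(v))

# drop = FALSE keeps the 1 x n_features shape.
one_row <- x[1, , drop = FALSE]
dim_right <- dim(one_row)

print(dim_wrong)
print(dim_right)
```

This only addresses the shape mismatch between the data and the contribution matrix. Whether `xgb.plot.shap()` then accepts a single row is not confirmed here, and the point above about needing more than one observation still stands.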

Nevertheless, if you want, you can extract the SHAP values for the first n observations and then plot the first observation yourself in ggplot or base R, for example:

library(tidyverse)
library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
               eta = 0.1, max_depth = 3, subsample = .5,
               method = "hist", objective = "binary:logistic",
               nthread = 2, verbose = 0)

xgb.plot.shap(agaricus.test$data, model = bst, features = "odor=none")

contr <- predict(bst, agaricus.test$data, predcontrib = TRUE)

## Use "plot = FALSE" to return the data to "mat", instead of the rendered plot
mat <- xgb.plot.shap(agaricus.test$data[1:2,], contr[1:2,], model = bst,
              top_n = 12, n_col = 3, plot = FALSE)

## Format the data
mat$shap_contrib %>% 
  t() %>%
  as.data.frame() %>% 
  rownames_to_column() %>%
  set_names(c("Variable", "SHAP", "second_observation")) %>% 
## Then plot however you want
  ggplot(aes(y = SHAP, x = "")) +
  geom_point(pch = 3) +
  theme_bw() +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank()) +
  facet_wrap(facets = vars(Variable))
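For the base R route mentioned above, here is a minimal sketch of the same idea without the tidyverse. The matrix `shap` holds made-up numbers, not model output, and the feature names other than "odor=none" are placeholders. In practice it would be replaced by the two-row `mat$shap_contrib` returned with `plot = FALSE`.

```r
# Toy SHAP matrix: 2 observations x 3 features.
# (Made-up numbers for illustration; not produced by any model.)
shap <- matrix(c( 0.40, -0.10, 0.05,
                 -0.30,  0.20, 0.00),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("odor=none", "feature_b", "feature_c")))

# Keep only the first observation and reshape to one row per feature.
first_obs <- data.frame(Variable = colnames(shap),
                        SHAP = unname(shap[1, ]),
                        stringsAsFactors = FALSE)

# Base R dot chart: one point per feature, with a reference line at zero.
dotchart(first_obs$SHAP, labels = first_obs$Variable,
         xlab = "SHAP value (first observation)", pch = 3)
abline(v = 0, lty = 2, col = "grey75")
```

A dot chart puts all features on one axis instead of one facet per feature, which may be easier to read when only a single observation is shown.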

Update per the comments:

library(tidyverse)
library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
               eta = 0.1, max_depth = 3, subsample = .5,
               objective = "binary:logistic",
               nthread = 2, verbose = 0)

xgb.plot.shap(agaricus.test$data, model = bst, features = "odor=none")

contr <- predict(bst, agaricus.test$data, predcontrib = TRUE, approxcontrib = FALSE)
pred <- predict(bst, agaricus.test$data)

## Use "plot = FALSE" to return the data to "mat", instead of the rendered plot
mat <- xgb.plot.shap(agaricus.test$data[1:2,], contr[1:2,], model = bst,
                     top_n = 12, n_col = 3, plot = FALSE)

## Format the data
SHAP <- as.matrix(mat$shap_contrib[1,]) %>%
  as.data.frame() %>% 
  rownames_to_column() %>%
  set_names(c("Variable", "SHAP"))

Score <- as.matrix(mat$data[1,]) %>%
  as.data.frame() %>% 
  rownames_to_column() %>% 
  set_names(c("Variable", "Score"))

Pred <- ifelse(pred[1] <= 0.5, 0, 1)

SHAP_Score <- left_join(SHAP, Score, by = "Variable")

SHAP_Score_Pred <- cbind(SHAP_Score, Pred)

ggplot(SHAP_Score_Pred, aes(y = SHAP, x = Score)) +
  geom_hline(yintercept = 0, lty = 2, col = "grey75") +
  geom_point(pch = 3, cex = 3, col = "red") +
  ggtitle(label = paste("Prediction for this observation =", Pred, sep = " ")) +
  theme_bw(base_size = 12) +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 16)) +
  scale_x_continuous(breaks = c(0,1)) +
  facet_wrap(facets = vars(Variable))
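The title in the plot above is built with `paste()`. As a hedged sketch of how an observation index could be included as well, the helper below (`make_title` is a hypothetical name, not part of xgboost or ggplot2) combines an index, the predicted probability, and the same 0.5 cut-off used above into one string.

```r
# Hypothetical helper: build a plot title from an observation index and
# a predicted probability, using the same 0.5 cut-off as the code above.
make_title <- function(obs_id, prob, cutoff = 0.5) {
  pred_class <- ifelse(prob <= cutoff, 0, 1)
  paste0("Observation ", obs_id, ": predicted class = ", pred_class,
         " (p = ", round(prob, 3), ")")
}

# Example with a made-up probability.
print(make_title(1, 0.0123))
```

Assuming the objects from the code above, it could be used as `ggtitle(label = make_title(1, pred[1]))`, with the same index used to subset `mat$shap_contrib` and `mat$data`.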

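【Comments】:

  • Thank you very much for your answer! Is it possible to add the observation ID to this plot, along with the xgboost model's prediction (1 or 0) for this observation? One more question: I assume the program you (kindly) wrote only works for "xgboost" and not for other models. Am I right? Thanks again for your help!
  • Yes, I believe you can add the observation ID, the prediction, and so on. Try to work it out yourself, then edit your post if you need help. This example is specific to XGBoost, but you can compute SHAP values for other models and plot them with the same kind of approach; see e.g. dalex / shapper (smarterpoland.pl/index.php/2019/03/…)
  • mat
  • This selects the first two observations. The reason you need more than one is that xgb.plot.shap() requires a matrix with more than one observation. The second observation stays in the data frame (i.e. there are three columns, set_names(c("Variable", "SHAP", "second_observation"))), but the third column, "second_observation", is not plotted.
  • Thanks! Do you know how to add an ID for the obs? I tried: id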