【Question title】: How to calculate shap values for an AdaBoost model?
【Posted】: 2020-06-11 11:24:52
【Question description】:

I am running 3 different models (Random Forest, Gradient Boosting, AdaBoost) and an ensemble model built from these 3 models.

I managed to use SHAP for GB and RF, but not for AdaBoost, which fails with the following error:

Exception                                 Traceback (most recent call last)
in engine
----> 1 explainer = shap.TreeExplainer(model,data = explain_data.head(1000), model_output= 'probability')

/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
    110         self.feature_perturbation = feature_perturbation
    111         self.expected_value = None
--> 112         self.model = TreeEnsemble(model, self.data, self.data_missing)
    113 
    114         if feature_perturbation not in feature_perturbation_codes:

/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    752             self.tree_output = "probability"
    753         else:
--> 754             raise Exception("Model type not yet supported by TreeExplainer: " + str(type(model)))
    755 
    756         # build a dense numpy version of all the tree objects

Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>

I found this link on Git, which states:

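TreeExplainer creates a TreeEnsemble object from whatever model type we are trying to explain, and then works with that downstream. So all you would need to do is add another if statement in the TreeEnsemble constructor, similar to the one used for gradient boosting.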
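
The dispatch described in that comment can be pictured with a toy sketch. Everything in it is hypothetical (`pick_branch` and its return labels are invented, and shap's real `TreeEnsemble` builds tree objects instead of returning strings); it only illustrates matching on `str(type(model))` and adding one more `elif`. Both AdaBoost module paths are listed because scikit-learn 0.22 moved the class into the private `_weight_boosting` module, which is the path shown in the traceback above.

```python
# Toy, hypothetical illustration of the dispatch described above: a chain of
# if/elif branches keyed on the model's type string, as in shap's TreeEnsemble.
# pick_branch and its return labels are invented for this example.

# scikit-learn 0.22 moved these classes into private modules, so both paths occur.
ADABOOST_SUFFIXES = (
    "sklearn.ensemble.weight_boosting.AdaBoostClassifier'>",   # before 0.22
    "sklearn.ensemble._weight_boosting.AdaBoostClassifier'>",  # 0.22 and later
)
RANDOM_FOREST_SUFFIXES = (
    "sklearn.ensemble.forest.RandomForestClassifier'>",
    "sklearn.ensemble._forest.RandomForestClassifier'>",
)


def pick_branch(type_string):
    """Return the (hypothetical) branch that would handle a model type string."""
    if type_string.endswith(RANDOM_FOREST_SUFFIXES):
        return "random_forest"
    elif type_string.endswith(ADABOOST_SUFFIXES):
        # the extra elif the GitHub comment suggests adding
        return "adaboost"
    else:
        raise Exception("Model type not yet supported by TreeExplainer: " + type_string)


print(pick_branch("<class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>"))
```

Passing a tuple to `str.endswith` is what lets a single branch cover both scikit-learn module layouts.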

But I really don't know how to implement it, as I am quite new to this.

【Question discussion】:

    Tags: adaboost shap


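    【Solution 1】:

    I had the same problem. What I did was modify the file mentioned in the git issue you are referring to.

    In my case I use Windows, so the file is located in C:\Users\my_user\AppData\Local\Continuum\anaconda3\Lib\site-packages\shap\explainers, but you can also double-click the file path in the error message and the file (tree.py) will open.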
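
If double-clicking the error message is not an option (for example outside a notebook), Python can report where a module's file lives. This is a minimal sketch using the standard `importlib.util.find_spec`; the helper name `module_source_path` is hypothetical. It is demonstrated on a standard-library module so it runs anywhere; with the shap version from the traceback above you would pass `"shap.explainers.tree"` instead.

```python
import importlib.util


def module_source_path(module_name):
    """Return the path of the file that defines module_name.

    For a dotted name, find_spec imports the parent packages first (for
    "shap.explainers.tree" that means importing shap) to search their __path__.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError("cannot locate source for " + module_name)
    return spec.origin


# Standard-library demo; replace with "shap.explainers.tree" to find the file to edit.
print(module_source_path("json.decoder"))
```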

    The next step is to add another elif, as the answer on the git issue says. In my case I did it starting at line 404, as follows:

    1) Modify the source code.

    ... 
        self.objective = objective_name_map.get(model.criterion, None)
        self.tree_output = "probability"
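    elif str(type(model)).endswith(("sklearn.ensemble.weight_boosting.AdaBoostClassifier'>", "sklearn.ensemble._weight_boosting.AdaBoostClassifier'>")): # From this line I have modified the code (the second path covers scikit-learn >= 0.22)
        # NOTE: AdaBoostClassifier does not output a plain average of its trees (SAMME weights the trees'
        # votes by estimator_weights_, SAMME.R combines log-probabilities), so these SHAP values are approximate.
        scaling = 1.0 / len(model.estimators_) # treat the output as an average of the trees
        self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
        self.objective = objective_name_map.get(model.base_estimator_.criterion, None) # decision criterion of the weak learner, e.g. gini (model.estimator_ on scikit-learn >= 1.2)
        self.tree_output = "probability" # This is the last line I added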
    elif str(type(model)).endswith("sklearn.ensemble.forest.ExtraTreesClassifier'>"): # TODO: add unit test for this case
        scaling = 1.0 / len(model.estimators_) # output is average of trees
        self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
    ...
    

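    Note that for the other models, shap's code needs the attribute 'criterion', which the AdaBoost classifier does not expose directly. So in this case the attribute is taken from AdaBoost's "weak" classifier, which is why I added model.base_estimator_.criterion.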
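
The same lookup can be written as a small helper. This is a hypothetical sketch (`adaboost_objective` and the two-entry `OBJECTIVE_NAME_MAP` are invented stand-ins for shap's internal lookup table). It reads the criterion from `model.estimators_[0]`, the first fitted weak learner, which avoids depending on `base_estimator_`; that attribute was deprecated in scikit-learn 1.2 in favour of `estimator_`. Stand-in objects replace a real model so the sketch runs without scikit-learn.

```python
from types import SimpleNamespace

# Hypothetical subset of a criterion -> objective lookup, for illustration only.
OBJECTIVE_NAME_MAP = {"gini": "gini", "entropy": "entropy"}


def adaboost_objective(model):
    """Look up the objective for an AdaBoost model via its fitted weak learners.

    model.estimators_ (the fitted trees) exists in every scikit-learn version,
    unlike base_estimator_, which was deprecated in 1.2 in favour of estimator_.
    """
    if not getattr(model, "estimators_", None):
        raise ValueError("Model has no estimators_; call model.fit first")
    # Every weak learner is a clone of the same base tree, so they share a criterion.
    criterion = model.estimators_[0].criterion
    return OBJECTIVE_NAME_MAP.get(criterion, None)


# Stand-in for a fitted AdaBoostClassifier with its default decision stumps (gini).
fake_model = SimpleNamespace(estimators_=[SimpleNamespace(criterion="gini")])
print(adaboost_objective(fake_model))
```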

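    Finally, you have to import the library again (restart your Python kernel so the modified file is actually loaded), train your model and get the shap values. Here is an example:

    2) Re-import the library and try: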

    from sklearn import datasets
    from sklearn.ensemble import AdaBoostClassifier
    import shap
    
    # import some data to play with
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    ADABoost_model = AdaBoostClassifier()
    ADABoost_model.fit(X, y)
    
    shap_values = shap.TreeExplainer(ADABoost_model).shap_values(X)
    shap.summary_plot(shap_values, X, plot_type="bar")
    

    This produces the following:

    3) Get the new results: [figure: SHAP summary bar plot]

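    【Discussion】:

    • @Shadelex If the answer is good, please consider accepting it.
    【Solution 2】:

    The shap package seems to have been updated, but it still does not include AdaBoostClassifier. Building on the previous answer, I adapted it for lines 598-610 of the shap/explainers/tree.py file: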

    ### Added AdaBoostClassifier based on the outdated StackOverflow response and Github issue here
    ### https://stackoverflow.com/questions/60433389/how-to-calculate-shap-values-for-adaboost-model/61108156#61108156
    ### https://github.com/slundberg/shap/issues/335
    elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier"]):
        assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
        self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
        self.input_dtype = np.float32
        scaling = 1.0 / len(model.estimators_) # approximation: treats the output as a plain average of the trees
        self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
        self.objective = objective_name_map.get(model.base_estimator_.criterion, None) # decision criterion of the weak learner, e.g. gini (model.estimator_ on scikit-learn >= 1.2)
        self.tree_output = "probability" # This is the last line added
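
About the list of class paths in that check: as far as I can tell, shap's `safe_isinstance` resolves each dotted path through already-imported modules in `sys.modules` and skips paths it cannot resolve, so a misspelled entry fails silently instead of raising. The helper below is a hypothetical, simplified analogue written for illustration (not shap's code); it uses `collections.OrderedDict` so it runs without scikit-learn.

```python
import collections
import sys


def safe_isinstance_sketch(obj, class_paths):
    """Hypothetical, simplified analogue of shap's safe_isinstance.

    Returns True if obj is an instance of any class given as a dotted path
    such as "pkg.module.ClassName". Only modules that are already imported
    are consulted, so unknown or misspelled paths are skipped, not errors.
    """
    if isinstance(class_paths, str):
        class_paths = [class_paths]
    for path in class_paths:
        module_name, _, class_name = path.rpartition(".")
        module = sys.modules.get(module_name)
        if module is None:
            continue  # module not imported (or path misspelled): cannot match
        cls = getattr(module, class_name, None)
        if isinstance(cls, type) and isinstance(obj, cls):
            return True
    return False


# The misspelled first path is skipped; the second one matches.
print(safe_isinstance_sketch(collections.OrderedDict(),
                             ["collections.OrderdDict", "collections.OrderedDict"]))
```

Applied to the AdaBoost check, the `sklearn.ensemble.AdaBoostClassifier` entry matches because `sklearn.ensemble` re-exports the class; the second entry matches only if it is spelled `sklearn.ensemble._weight_boosting.AdaBoostClassifier`, the module named in the traceback.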
    

    Still running tests to get this added to the package :)

    【Discussion】:
