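【Question title】: Shap or Lime with TPOT classifier
【Posted】: 2021-07-27 09:06:31
【Question】:

How would you use shap, Lime, or any other model-interpretability tool with a pipeline exported by TPOT? For example, here is some code from the shap library, but you can't pass a TPOT pipeline to it. What would you pass there instead?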

explainer = shap.Explainer(model)
shap_values = explainer(X)

【Comments】:

    Tags: python scikit-learn shap tpot


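    【Answer 1】:

    Solution 1:

    To explain a scikit-learn Pipeline (the model object produced by TPOT's optimization process) with SHAP, point SHAP at the Pipeline's final estimator (the classifier/regressor step), and first transform your data with every Pipeline transformer step that precedes it (i.e. pre-processors or feature selectors) before feeding the data to the SHAP explainer.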

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.datasets import load_iris
    from tpot import TPOTClassifier
    
    # Use the Iris dataset
    
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    # Keep the target 1-D; a single-column DataFrame makes scikit-learn warn about a column-vector y
    y = iris.target
    
    tpot = TPOTClassifier(generations=3, population_size=25, verbosity=3, random_state=42)
    tpot.fit(X, y)
    
    # Inspect the resulting Pipeline. It has two steps: a selector followed by the classifier.
    
    tpot.fitted_pipeline_
    
    # Output:
    # Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.05)),
    #                 ('logisticregression',
    #                  LogisticRegression(C=10.0, random_state=42))])
    
    # Before feeding your data to the explainer, transform it with every Pipeline step that precedes the classifier.
    # Here that is a single step, but there could be more.
    
    selector = tpot.fitted_pipeline_.named_steps["variancethreshold"]
    selected_columns = X.columns[selector.get_support(indices=True)]
    shap_df = pd.DataFrame(selector.transform(X), columns=selected_columns)
    
    # Finally, instruct the SHAP explainer to use the classifier step with the transformed data
    
    shap.initjs()
    explainer = shap.KernelExplainer(tpot.fitted_pipeline_.named_steps["logisticregression"].predict_proba, shap_df)
    shap_values = explainer.shap_values(shap_df)
    
    # Plot a summary of the SHAP values
    shap.summary_plot(shap_values, shap_df)
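
When the pipeline has several steps before the classifier, transforming the data step by step through `named_steps` gets tedious. The following is a minimal sketch, assuming scikit-learn 0.21 or later (where a Pipeline can be sliced) and at least one step before the final estimator: `pipeline[:-1]` gives the preprocessing part and `pipeline[-1]` the final estimator. The pipeline built here is a hypothetical stand-in for `tpot.fitted_pipeline_`, and `split_pipeline` is an illustrative helper, not part of TPOT or SHAP.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def split_pipeline(pipeline):
    """Return (preprocessing sub-pipeline, final estimator) of a fitted Pipeline."""
    # Slicing a Pipeline returns another Pipeline that reuses the fitted steps.
    return pipeline[:-1], pipeline[-1]


X, y = load_iris(return_X_y=True)

# Hypothetical stand-in for tpot.fitted_pipeline_, with two preprocessing steps.
pipeline = Pipeline(steps=[
    ("variancethreshold", VarianceThreshold(threshold=0.05)),
    ("standardscaler", StandardScaler()),
    ("logisticregression", LogisticRegression(max_iter=1000, random_state=42)),
])
pipeline.fit(X, y)

preprocessing, final_estimator = split_pipeline(pipeline)

# Apply every step before the classifier in a single call.
X_transformed = preprocessing.transform(X)

# The final estimator on transformed data should match the full pipeline on raw data.
manual_proba = final_estimator.predict_proba(X_transformed)
pipeline_proba = pipeline.predict_proba(X)
same_output = np.allclose(manual_proba, pipeline_proba)
```

With a TPOT result you would call `split_pipeline(tpot.fitted_pipeline_)` and hand `final_estimator.predict_proba` together with the transformed data to the explainer. The transformed array no longer carries column names, so restore them yourself if the plots should show feature labels.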
    

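    Solution 2:

    The scikit-learn Pipeline predict_proba() function apparently does exactly what Solution 1 describes: it transforms the data and then applies predict_proba with the final estimator.

    In that sense, this should work for you as well: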

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.datasets import load_iris
    from tpot import TPOTClassifier
    
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    # Keep the target 1-D; a single-column DataFrame makes scikit-learn warn about a column-vector y
    y = iris.target
    
    tpot = TPOTClassifier(generations=10, population_size=50, verbosity=3, random_state=42, template='Selector-Transformer-Classifier')
    tpot.fit(X, y)
    
    # Resulting Pipeline:
    # Pipeline(steps=[('variancethreshold', VarianceThreshold(threshold=0.0001)),
    #                 ('rbfsampler', RBFSampler(gamma=0.8, random_state=42)),
    #                 ('randomforestclassifier',
    #                  RandomForestClassifier(bootstrap=False, criterion='entropy',
    #                                         max_features=0.5, min_samples_leaf=10,
    #                                         min_samples_split=12,
    #                                         random_state=42))])
    
    explainer = shap.KernelExplainer(tpot.fitted_pipeline_.predict_proba, X)
    shap_values = explainer.shap_values(X)
    
    shap.summary_plot(shap_values, X)
    

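    Additional notes

    If you are working with a tree-based model, you can use TreeExplainer, which is much faster than the generic KernelExplainer. According to the documentation, LightGBM, CatBoost, Pyspark and most tree-based scikit-learn models are supported.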

    【Comments】:
