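【Posted】: 2021-04-15 10:47:23
【Question】:
As you can see in the SHAP waterfall plot below, every SHAP value is zero. What causes this? Are zero values reasonable?
Here is a link to my data: https://github.com/kilickursat/Tunnelling/blob/main/TBM_Performance.xlsx
Here is my code: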
import numpy as np
import pandas as pd
import lightgbm
from sklearn.metrics import r2_score, mean_squared_error as MSE
from lightgbm import LGBMRegressor
import shap
import io
# Run in Google Colab: `uploaded` is the dict returned by google.colab.files.upload()
df2 = pd.read_excel(io.BytesIO(uploaded['TBM_Performance.xlsx']))
df2["ROCK_PRO"] = df2["UCS(MPa)"] / df2["BTS(MPa)"]
X = df2[["UCS(MPa)", "BTS(MPa)","Fs(m)","Alpha(degree)","PI(kN/mm)","ROCK_PRO"]]
y = df2[["ROP(m/hr)"]]
print(df2)
print(X,y)
hyper_params = {
    'task': 'train',
    'boosting_type': 'goss',
    'objective': 'regression',
    'metric': 'mse',
}
# train a LightGBM model
model = lightgbm.LGBMRegressor(**hyper_params).fit(X, y)
explainer = shap.Explainer(model)
# compute SHAP values for every row of X
shap_values = explainer(X)
# visualize the first prediction's explanation
shap.plots.waterfall(shap_values[0])
[Figure: SHAP waterfall plot for the first prediction; every feature contribution is shown as 0]
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
X = pd.DataFrame(np.c_[df2['PI(kN/mm)'],df2["ROCK_PRO"],df2["BTS(MPa)"]], columns = ['PI(kN/mm)', "ROCK_PRO", "BTS(MPa)"])
y = df2['ROP(m/hr)']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20, random_state=42)
model = LGBMRegressor(
    **hyper_params,
    min_data_in_leaf=0,
    min_sum_hessian_in_leaf=0.0,
).fit(X_train, y_train)
predictions = model.predict(X_test)
r2_score(predictions, y_test).round(2)
# R2 score: 0.96
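On whether all-zero values can be reasonable: a minimal sketch, not taken from the post, that computes exact Shapley values by brute force for two toy models. The helper `exact_shapley`, the toy models, and the background rows are all hypothetical, and this is not how `shap` handles tree models internally (it uses a tree-specific algorithm). The sketch only illustrates that a model whose output never changes gets a contribution of exactly zero for every feature, so the base value alone accounts for the prediction.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Brute-force Shapley values for a single row x (hypothetical helper).

    Features outside a coalition are replaced by the values of each
    background row, and the model output is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


# Made-up background data and row to explain (not the TBM data set)
background = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [0.0, 4.0, 1.0]]
x = [3.0, 0.5, 2.0]


def constant_model(row):
    # Behaves like a tree ensemble that never managed to split
    return 1.7


def linear_model(row):
    # Uses features 0 and 1, ignores feature 2
    return 2.0 * row[0] - row[1]


phi_const = exact_shapley(constant_model, x, background)
phi_linear = exact_shapley(linear_model, x, background)

print("constant model:", phi_const)   # every contribution is zero
print("linear model:  ", phi_linear)  # zero only for the unused feature
```

If the same pattern applies here, the first model may simply be predicting one constant value. That is an untested guess about this data set: LightGBM's default `min_data_in_leaf` is 20, so on a small table no split may satisfy it, which would also be consistent with the second snippet reaching a high score once `min_data_in_leaf=0` is set. Checking whether `model.predict(X)` returns the same number for every row would confirm or rule this out.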
【Discussion】:
-
Thanks to @Flavia Giammarino for the edit.