Posted: 2021-08-17 22:37:12
Question:
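I am trying to create a class for calibrating classifiers. I have been reading resources on probability calibration, and I am a bit confused about which dataset the classifier should be calibrated on. I wrote a class that splits the training set further into a training set and a validation set. The classifier is first fitted on the training set, and uncalibrated probabilities are then predicted on the validation set.
Next, I create `cal_model`, an instance of `CalibratedClassifierCV`, fit it on the validation set, and predict calibrated probabilities on the validation set again.
Could someone look at the code below and correct it for me?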
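For comparison, one common way to keep the datasets separate is a three-way split: the classifier is fitted on one part, a calibrator is fitted on the classifier's predicted probabilities for a second part, and both are evaluated on a third part that neither has seen. The minimal sketch below assumes synthetic data from `make_classification`, a `RandomForestClassifier` as the model, and a hand-rolled isotonic calibrator (`sklearn.isotonic.IsotonicRegression`) in place of `CalibratedClassifierCV`; these are illustrative choices, not the code from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the question's Xtrain / ytrain (assumption).
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Three disjoint sets: 60% fit the model, 20% fit the calibrator, 20% evaluate.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0, stratify=y_tmp)

# 1) Fit the base classifier on the training set only.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 2) Fit an isotonic mapping from the model's validation-set probabilities
#    to the observed validation labels; the model itself is not refit.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(model.predict_proba(X_val)[:, 1], y_val)

# 3) Evaluate both versions on the untouched test set.
uncal_test = model.predict_proba(X_test)[:, 1]
cal_test = iso.predict(uncal_test)
print("Brier score, uncalibrated:", brier_score_loss(y_test, uncal_test))
print("Brier score, calibrated:  ", brier_score_loss(y_test, cal_test))
```

The point of the split is that the calibrator never sees the data the model was trained on, and the reported scores come from data that neither step used.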
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.calibration import CalibratedClassifierCV, calibration_curve


class calibrate_model:
    """
    A class that splits the training dataset into train and validation sets
    and then performs probability calibration.

    model      = classification model
    Xtrain     = independent feature set
    ytrain     = target variable
    cv         = cross-validation splitter (e.g. StratifiedKFold)
    cal_method = 'sigmoid' or 'isotonic'
    seed       = random state for the train/validation split
    """

    def __init__(self, model, Xtrain, ytrain, cv, cal_method, seed=None):
        self.model = model
        # Convert to NumPy arrays so the positional fold indices below also
        # work when pandas objects are passed in.
        self.Xtrain = np.asarray(Xtrain)
        self.ytrain = np.asarray(ytrain)
        self.cv = cv
        self.cal_method = cal_method
        self.seed = seed

    def calibrate_probability(self):
        train_X, val_X, train_y, val_y = train_test_split(
            self.Xtrain, self.ytrain, test_size=0.2, random_state=self.seed)

        # Uncalibrated model: the model is refit on every fold, so only the
        # fit from the last fold is kept when the loop ends.
        for train_index, test_index in self.cv.split(train_X, train_y):
            X_train_kfold, X_val_kfold = train_X[train_index], train_X[test_index]
            y_train_kfold, y_val_kfold = train_y[train_index], train_y[test_index]
            self.model.fit(X_train_kfold, y_train_kfold)
        uc_probs = self.model.predict_proba(val_X)[:, 1]
        # predict_proba already returns values in [0, 1], so the `normalize`
        # argument (deprecated in newer scikit-learn releases) is not needed.
        uc_fop, uc_mpv = calibration_curve(val_y, uc_probs, n_bins=10,
                                           strategy='quantile')

        # Calibrating model: with cv set to a splitter, CalibratedClassifierCV
        # clones self.model and refits it on folds of val_X, so the model
        # fitted in the loop above is not reused here.
        self.cal_model = CalibratedClassifierCV(self.model, method=self.cal_method,
                                                cv=self.cv)
        self.cal_model.fit(val_X, val_y)
        # Predict calibrated probabilities (on the same set used for calibration).
        c_probs = self.cal_model.predict_proba(val_X)[:, 1]
        c_fop, c_mpv = calibration_curve(val_y, c_probs, n_bins=10,
                                         strategy='quantile')

        # Reliability diagram
        # Perfectly calibrated reference line
        plt.plot([0, 1], [0, 1], linestyle='--')
        # Uncalibrated model reliability
        plt.plot(uc_mpv, uc_fop, marker='.', label='Uncalibrated')
        # Calibrated model reliability
        plt.plot(c_mpv, c_fop, marker='.', label='Calibrated')
        plt.title(type(self.model).__name__ + ' ' + self.cal_method)
        plt.ylabel('Fraction of Positives (fop)')
        plt.xlabel('Mean Predicted Value (mpv)')
        plt.legend()
        plt.tight_layout()
        plt.show()
```
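A different arrangement, sketched under the assumption of scikit-learn's documented behavior: when `cv` is an integer or a splitter, `CalibratedClassifierCV` clones the estimator, refits it on each training fold and fits the calibrator on the matching held-out fold. It can therefore be fitted directly on the training split, leaving the validation split free for an honest comparison. The synthetic data, `RandomForestClassifier`, `StratifiedKFold` settings and the `make_model` helper are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)


def make_model():
    # Hypothetical helper: a fresh, unfitted base classifier.
    return RandomForestClassifier(n_estimators=100, random_state=0)


# Uncalibrated reference: a single fit on the whole training split.
uncal_model = make_model().fit(train_X, train_y)
uc_probs = uncal_model.predict_proba(val_X)[:, 1]

# Calibrated: fitted on the training split only; internal 5-fold CV keeps
# the calibrator's data separate from each base model's training data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cal_model = CalibratedClassifierCV(make_model(), method="sigmoid", cv=cv)
cal_model.fit(train_X, train_y)
c_probs = cal_model.predict_proba(val_X)[:, 1]

# Both curves are computed on val_X, which neither model was fitted on.
uc_fop, uc_mpv = calibration_curve(val_y, uc_probs, n_bins=10, strategy="quantile")
c_fop, c_mpv = calibration_curve(val_y, c_probs, n_bins=10, strategy="quantile")
print("Brier score, uncalibrated:", brier_score_loss(val_y, uc_probs))
print("Brier score, calibrated:  ", brier_score_loss(val_y, c_probs))
```

If the model is already fitted and should not be refit, scikit-learn has offered `cv="prefit"` for that case; recent releases deprecate it in favor of wrapping the model in `sklearn.frozen.FrozenEstimator`, so check the documentation for the installed version.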
Tags: python machine-learning calibration