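【Posted】: 2021-10-12 12:47:33
【Problem description】:
I am working with a dataset and trying out different classification algorithms to see which one performs best as a baseline model. The code is as follows: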
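# Trying out different classifiers and selecting the best
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X (a DataFrame), y and preprocessor are assumed to be defined earlier (not shown here)

## Create list of classifiers we're going to loop through
classifiers = [
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
]
classifier_names = [
    'kNN',
    'SVC',
    'DecisionTree',
    'RandomForest',
    'AdaBoost',
    'GradientBoosting'
]
model_scores = []

## Looping through the classifiers
for classifier, name in zip(classifiers, classifier_names):
    pipe = Pipeline(steps=[
        ('preprocessor', preprocessor),
        # k equals the number of input columns, so no features are dropped here
        # (assuming the preprocessor does not change the column count)
        ('selector', SelectKBest(k=len(X.columns))),
        ('classifier', classifier)])
    # Mean accuracy over 5 cross-validation folds, all hyperparameters at defaults
    score = cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()
    model_scores.append(score)
    print("Model score for {}: {}".format(name, score))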
The output is:
Model score for kNN: 0.7472524440239673
Model score for SVC: 0.7896621728161464
Model score for DecisionTree: 0.7302148734267939
Model score for RandomForest: 0.779058799919727
Model score for AdaBoost: 0.7949635904933918
Model score for GradientBoosting: 0.7930712637252372
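The best model turns out to be AdaBoostClassifier(). I would normally take the best baseline model and run GridSearchCV on it to improve on its baseline performance.
But what if the model that performs best as a baseline (AdaBoost in this case) improves by only 1% through hyperparameter tuning, while a model that initially performed worse (for example SVC()) has more "potential" to improve through tuning (say, 4%) and ends up being the better model after tuning?
Is there a way to know this "potential" beforehand, without running GridSearch on all of the classifiers?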
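To make the "potential" concrete: it is the gap between a model's baseline cross-validated score and its best score after tuning. The following is only a minimal sketch of how that gap could be measured for two of the classifiers. It assumes synthetic data from make_classification in place of the real X and y, a StandardScaler in place of preprocessor, and small hypothetical grids (each one includes the library defaults, so the tuned score cannot fall below the baseline). Note that it still runs a grid search per model, which is exactly the cost the question hopes to avoid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real X, y (assumption: not the data from the question)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical candidates and grids; every grid contains the default values
candidates = {
    'AdaBoost': (AdaBoostClassifier(random_state=0),
                 {'classifier__n_estimators': [50, 100],
                  'classifier__learning_rate': [0.5, 1.0]}),
    'SVC': (SVC(),
            {'classifier__C': [0.1, 1.0, 10.0],
             'classifier__gamma': ['scale', 0.01]}),
}

tuning_gains = {}
for name, (clf, grid) in candidates.items():
    # StandardScaler stands in for the question's preprocessor (assumption)
    pipe = Pipeline(steps=[('scaler', StandardScaler()), ('classifier', clf)])

    # Baseline: default hyperparameters, same 5-fold accuracy as in the question
    baseline = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='accuracy').mean()

    # Tuned: best mean cross-validated accuracy found over the grid
    search = GridSearchCV(pipe, grid, cv=5, scoring='accuracy')
    search.fit(X_demo, y_demo)

    tuning_gains[name] = search.best_score_ - baseline
    print("{}: baseline={:.4f}, tuned={:.4f}, gain={:+.4f}".format(
        name, baseline, search.best_score_, tuning_gains[name]))
```

Comparing the values in tuning_gains shows whether the ranking of the models changes after tuning. The numbers depend entirely on the data and the grids chosen, so nothing here says which model gains more in general.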
【Discussion】:
Tags: python scikit-learn hyperparameters