Answers: binary classification with mlxtend EnsembleVoteClassifier with prefitted classifiers

【Question title】: Binary classification with mlxtend EnsembleVoteClassifier with prefitted classifiers
【Posted】: 2018-08-22 21:28:23
【Question】:

I am using mlxtend's EnsembleVoteClassifier with prefitted linear SVCs for binary classification, but I keep running into this error:

ValueError: X.shape[1] = 352 should be equal to 336, the number of features at training time

I load the prefitted classifiers into a list using scikit-learn's joblib. The classifiers are linear-kernel SVCs from sklearn.svm:

List of clfs:

[SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False),
 SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False)]

They are passed to the ensemble vote classifier, which is fitted as usual without any problem:

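from mlxtend.classifier import EnsembleVoteClassifier

# refit=False keeps the prefitted classifiers as they are
ensembleVoting = EnsembleVoteClassifier(clfs=list_of_clfs, refit=False, voting='hard', weights=None)
X = ...
y = ...
ensembleVoting.fit(X, y)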

The error mentioned above is raised at prediction time, even when predicting on the same data that was used for fitting:

predictions = ensembleVoting.predict(X)

【Comments】:

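The only reason you would get this is that one or more of your prefitted classifiers was fitted with a different number of features. How were the classifiers in your clfs list fitted?
Yes, you are right. I am working with time series, and I had not trimmed them to a common frequency to make sure the number of features was the same between fitting and prediction. Thanks!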
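The check suggested in the comment above can be sketched as follows. This is only an illustration on synthetic data: the helper names `expected_n_features` and `find_mismatched` are hypothetical, the arrays stand in for the asker's real features, and reading the feature count from `coef_` assumes linear-kernel SVCs like the ones in the question.

```python
import numpy as np
from sklearn.svm import SVC


def expected_n_features(clf):
    """Number of features a fitted linear-kernel SVC was trained on."""
    # coef_ only exists for kernel='linear'; its second axis is n_features.
    return clf.coef_.shape[1]


def find_mismatched(clfs, X):
    """Return (index, expected_features) for classifiers that do not match X."""
    n_cols = X.shape[1]
    return [
        (i, expected_n_features(clf))
        for i, clf in enumerate(clfs)
        if expected_n_features(clf) != n_cols
    ]


# Synthetic stand-ins for the real data (assumption, not the asker's data).
rng = np.random.RandomState(0)
y = np.array([0, 1] * 10)
X_336 = rng.rand(20, 336)
X_352 = rng.rand(20, 352)

# Two "prefitted" classifiers trained on different feature counts.
clf_a = SVC(C=0.1, kernel="linear").fit(X_336, y)
clf_b = SVC(C=0.1, kernel="linear").fit(X_352, y)

# clf_a expects 336 features, so it is reported when X has 352 columns.
mismatched = find_mismatched([clf_a, clf_b], X_352)
print(mismatched)
```

Running such a check on the loaded list before calling `predict` points at the specific classifier that was trained on a different number of columns.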

Tags: python scikit-learn ensemble-learning mlxtend

【Solution 1】:

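As @ken-syme mentioned in the comments above, the classifiers were fitted with a different number of features than the data passed to the ensemble. In this case it happened because the time series used as data were not all sampled at exactly the same frequency.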
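One way to keep the feature count fixed is to bring every series to a common length before fitting and predicting. The sketch below uses linear interpolation with NumPy; this is an assumption for illustration, not necessarily what the asker did (the comment only mentions trimming to a common frequency), and the helper name `resample_to_length` and the sine-wave inputs are hypothetical.

```python
import numpy as np


def resample_to_length(series, n_features):
    """Linearly interpolate a 1-D series onto n_features evenly spaced points."""
    series = np.asarray(series, dtype=float)
    old_positions = np.linspace(0.0, 1.0, num=len(series))
    new_positions = np.linspace(0.0, 1.0, num=n_features)
    return np.interp(new_positions, old_positions, series)


# Two recordings of the same signal sampled at different rates (synthetic).
raw_series = [
    np.sin(np.linspace(0, 6, 352)),
    np.sin(np.linspace(0, 6, 336)),
]

# Every row now has exactly 336 features, matching the training-time count.
X = np.vstack([resample_to_length(s, 336) for s in raw_series])
print(X.shape)
```

Whether interpolating, truncating, or re-extracting features at a fixed rate is appropriate depends on the data; the point is only that the same transformation has to be applied both when the classifiers are fitted and when the ensemble predicts.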
