【Posted】: 2016-05-09 22:16:37
【Question】:
I am training a classification model with sklearn. The training pipeline and the shape of the data are:
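from sklearn import cross_validation, svm
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, StandardScaler

clf = Pipeline([
    ("imputer", Imputer(missing_values='NaN', strategy="mean", axis=0)),
    ('feature_selection', VarianceThreshold(threshold=(.97 * (1 - .97)))),
    ('scaler', StandardScaler()),
    ('classification', svm.SVC(kernel='linear', C=1))])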
print X.shape, y.shape
(59381, 895) (59381,)
I checked that the feature selection step reduces the feature vector size from 895 to 124:
feature_selection = Pipeline([
    ("imputer", Imputer(missing_values='NaN', strategy="mean", axis=0)),
    ('feature_selection', VarianceThreshold(threshold=(.97 * (1 - .97))))
])
feature_selection.fit_transform(X).shape
(59381, 124)
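The threshold .97 * (1 - .97) is the variance of a 0/1 feature that takes one value in 97% of the samples, so for boolean columns this step drops those that are constant in roughly 97% or more of the rows. The pure-Python sketch below illustrates that rule on hypothetical toy columns (the names and values are made up for illustration and are not taken from the data above). It mimics the comparison that VarianceThreshold performs rather than calling the class itself.

```python
from statistics import pvariance

# Threshold from the question: variance of a Bernoulli(p) feature with p = 0.97.
threshold = 0.97 * (1 - 0.97)  # about 0.0291

# Hypothetical toy columns with 100 samples each (illustration only).
columns = {
    "ones_99pct": [1] * 99 + [0] * 1,   # variance 0.99 * 0.01 = 0.0099
    "ones_95pct": [1] * 95 + [0] * 5,   # variance 0.95 * 0.05 = 0.0475
    "balanced": [1] * 50 + [0] * 50,    # variance 0.5 * 0.5 = 0.25
    "constant": [0] * 100,              # variance 0
}

# Keep a column only if its population variance exceeds the threshold.
kept = [name for name, values in columns.items()
        if pvariance(values) > threshold]
print(kept)  # ['ones_95pct', 'balanced']
```

For non-boolean columns the same number acts as a plain variance cutoff, and because the step runs before StandardScaler it is applied to the raw, unscaled values.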
Then I tried to measure the accuracy as follows:
scores = cross_validation.cross_val_score(clf, X, y)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
However, training is very slow. How can I speed it up in this case? Or is a feature vector of size 124 still too large for an SVM model?
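On the speed question, one commonly suggested direction is to replace svm.SVC(kernel='linear'), which is backed by libsvm and whose fit time scales at least quadratically with the number of samples, with LinearSVC, which is backed by liblinear. Whether this helps on the data above is untested here. The sketch below is only an illustration under stated assumptions: it uses synthetic data from make_classification as a stand-in for the real matrix, and it is written against a recent scikit-learn, where Imputer has become sklearn.impute.SimpleImputer and cross_validation has become sklearn.model_selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): much smaller than the 59381 x 895 matrix.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_redundant=0, random_state=0)

clf = Pipeline([
    ("imputer", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("feature_selection", VarianceThreshold(threshold=0.97 * (1 - 0.97))),
    ("scaler", StandardScaler()),
    # LinearSVC (liblinear) in place of svm.SVC(kernel="linear") (libsvm).
    # dual=False is the usual choice when n_samples > n_features.
    ("classification", LinearSVC(C=1, dual=False)),
])

# 3-fold cross-validated accuracy, reported the same way as in the question.
scores = cross_val_score(clf, X, y, cv=3)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

LinearSVC is not an exact drop-in replacement: by default it minimizes the squared hinge loss and handles multiple classes one-vs-rest, so scores may differ slightly from SVC(kernel='linear'). cross_val_score also accepts an n_jobs argument to run the folds in parallel.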
【Discussion】:
Tags: machine-learning scikit-learn svm libsvm