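【Posted】: 2017-05-01 10:55:36
【Question】:
I slightly modified the tutorial at @987654321@ so that X has one missing value. This does not work with the original svc, so I tried building clf as a pipeline: an imputer followed by an svc. However, I still get a missing-value error. How do you impute when chaining a feature selection method such as RFECV with a classifier in a pipeline?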
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Imputer
# Build a classification task using 3 informative features
X, y = make_classification(n_samples=20, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)
X[1, 8] = np.nan  # plant a missing value
# Create the RFE object and compute a cross-validated score.
svc = SVC(kernel="linear")
clf = make_pipeline(Imputer(), svc)
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
rfecv.fit(X, y)
print("Optimal number of features : %d" % rfecv.n_features_)
# Plot number of features VS. cross-validation scores
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()
【Discussion】:
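One arrangement that may avoid the error is to reverse the nesting: instead of handing RFECV a pipeline, put the imputer in front of RFECV so that RFECV only ever receives an already-imputed matrix and wraps the plain SVC, whose coef_ it can rank directly. The following is a minimal sketch under a few assumptions, not a confirmed fix for the setup in the question: it uses SimpleImputer from sklearn.impute (which replaces the older Imputer in newer scikit-learn releases), mean imputation is an arbitrary choice, and n_samples is raised to 200 purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Same task as in the question, but with more samples (an assumption
# made for this sketch, not part of the original post).
X, y = make_classification(n_samples=200, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)
X[1, 8] = np.nan  # plant a missing value, as in the question

# RFECV wraps the bare linear SVC, so it can read coef_ for ranking.
selector = RFECV(estimator=SVC(kernel="linear"), step=1,
                 cv=StratifiedKFold(2), scoring="accuracy")

# The imputer runs first; RFECV is the final step of the pipeline.
model = make_pipeline(SimpleImputer(strategy="mean"), selector)
model.fit(X, y)

fitted_selector = model.named_steps["rfecv"]
print("Optimal number of features : %d" % fitted_selector.n_features_)
```

Note that in this layout the imputer is fitted once on all rows passed to model.fit, before RFECV splits them into folds, so the imputation statistics are shared across the inner cross-validation folds. If that matters, the whole pipeline can itself be evaluated with an outer cross-validation loop.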
Tags: python numpy scikit-learn