【Question title】: Cross validation with particular dataset lists with Python
【Posted】: 2016-10-03 07:34:52
【Question】:

I know sklearn has a convenient way to get cross-validation scores:

 from sklearn import svm, datasets
 from sklearn.model_selection import cross_val_score
 iris = datasets.load_iris()
 clf = svm.SVC(kernel='linear', C=1)
 scores = cross_val_score(clf, iris.data, iris.target, cv=5)
 scores

I would like to get the scores for specific training and test sets:

train_list = [train1, train2, train3]  # train1, train2, train3 are the training data sets
test_list = [test1, test2, test3]  # test1, test2, test3 are the test data sets
clf = svm.SVC(kernel='linear', C=1)
scores = some_nice_method(clf, train_list, test_list)

Is there a method like this in Python that returns the scores for specific, pre-separated data sets?

【Question comments】:

    Tags: python machine-learning scikit-learn


    【Solution 1】:

    This takes exactly two lines of code:

    # tr and te are index arrays into X and y; SVC is trained with fit(), not train()
    for tr, te in zip(train_list, test_list):
        print(svm.SVC(kernel='linear', C=1).fit(X[tr, :], y[tr]).score(X[te, :], y[te]))
    

    sklearn.svm.SVC.score:

    score(X, y, sample_weight=None)
    

    Returns the mean accuracy on the given test data and labels.
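The loop above treats each entry of train_list and test_list as an index array into a shared X and y. If each entry is instead a separate data set, the same fit/score pattern still applies. The following is a minimal sketch under that assumption: the helper name score_dataset_pairs, the (X, y) tuple layout, and the three random iris splits are all hypothetical and only there to make the example runnable.

```python
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import train_test_split


def score_dataset_pairs(clf, train_list, test_list):
    """Fit a fresh copy of clf on each (X, y) training set and
    return its mean accuracy on the paired (X, y) test set."""
    scores = []
    for (X_tr, y_tr), (X_te, y_te) in zip(train_list, test_list):
        model = clone(clf)  # unfitted copy, so the pairs do not share state
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return scores


# Hypothetical data: three different random train/test splits of iris.
X, y = datasets.load_iris(return_X_y=True)
train_list, test_list = [], []
for seed in range(3):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    train_list.append((X_tr, y_tr))
    test_list.append((X_te, y_te))

scores = score_dataset_pairs(svm.SVC(kernel='linear', C=1), train_list, test_list)
print(scores)
```

Using clone keeps the estimator passed in by the caller unfitted, which is roughly what cross_val_score does internally for each fold.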

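    【Discussion】:

      【Solution 2】:

      My suggestion is to use k-fold cross validation, as shown below. This gives you the train and test indices of each split along with its accuracy score. Note that the API changed in newer versions of sklearn (0.18 and later): KFold now lives in sklearn.model_selection, takes n_splits, and yields the indices through its split() method.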

      from sklearn import svm
      from sklearn import datasets
      from sklearn.model_selection import KFold
      from sklearn.metrics import accuracy_score
      
      iris = datasets.load_iris()
      X = iris.data
      y = iris.target
      
      clf = svm.SVC(kernel='linear', C=1)
      kf = KFold(n_splits=5)
      
      for train_index, test_index in kf.split(X):
          print("TRAIN:", train_index, "TEST:", test_index)
          X_train, X_test = X[train_index], X[test_index]
          y_train, y_test = y[train_index], y[test_index]
          clf.fit(X_train, y_train)
          y_pred = clf.predict(X_test)
          score = accuracy_score(y_test, y_pred)
          print(score)
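
If the splits are already fixed as index arrays, cross_val_score can also consume them directly, because its cv parameter accepts an iterable of (train_indices, test_indices) pairs. The sketch below assumes that setup; the name custom_cv and the three shuffled index partitions are hypothetical stand-ins for the asker's own splits.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)

# Hypothetical splits: shuffle the sample indices and cut them into three parts;
# each part serves once as the test set, the other two as the training set.
rng = np.random.RandomState(0)
parts = np.array_split(rng.permutation(len(X)), 3)
custom_cv = [
    (np.concatenate(parts[:i] + parts[i + 1:]), parts[i])
    for i in range(3)
]

clf = svm.SVC(kernel='linear', C=1)
# One score per (train, test) pair, in the order the pairs are listed.
scores = cross_val_score(clf, X, y, cv=custom_cv)
print(scores)
```

This keeps the one-call convenience from the question while leaving full control over which samples end up in each training and test set.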
      

      【Discussion】:
