【Question title】: Getting Precision and Recall using sklearn
【Posted】: 2018-07-04 05:38:24
【Question】:

Using the code below, I get the accuracy. Now I am trying to

1) find the precision and recall for each fold (10 folds in total)

2) get the mean for precision

3) get the mean for recall

The output could look similar to the print(scores) and print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)) lines below.

Any ideas?

import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
skf = StratifiedKFold(n_splits=10)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=10)
print(scores)  #[ 1. 0.93333333   1.  1. 0.86666667  1.  0.93333333   1.  1.  1.]
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)) # Accuracy: 0.97 (+/- 0.09)

【Comments】:

    Tags: python machine-learning scikit-learn svm cross-validation


    【Answer 1】:

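    This case is slightly different: the plain 'precision' and 'recall' scorers that cross_val_score accepts assume a binary target, so they do not work for a multiclass problem such as iris. One way around this is to use precision_score and recall_score and run the cross-validation manually. The parameter average='micro' computes the global precision/recall.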

    import numpy as np
    from sklearn import datasets
    from sklearn import svm
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import precision_score, recall_score

    iris = datasets.load_iris()
    skf = StratifiedKFold(n_splits=10)
    clf = svm.SVC(kernel='linear', C=1)

    X = iris.data
    y = iris.target
    precision_scores = []
    recall_scores = []
    # Fit on each training split and score the held-out fold
    for train_index, test_index in skf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        y_pred = clf.fit(X_train, y_train).predict(X_test)
        precision_scores.append(precision_score(y_test, y_pred, average='micro'))
        recall_scores.append(recall_score(y_test, y_pred, average='micro'))

    print(precision_scores)
    print("Precision: %0.2f (+/- %0.2f)" % (np.mean(precision_scores), np.std(precision_scores) * 2))
    print(recall_scores)
    print("Recall: %0.2f (+/- %0.2f)" % (np.mean(recall_scores), np.std(recall_scores) * 2))
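As an alternative to the manual loop, here is a minimal sketch that assumes a scikit-learn version providing sklearn.model_selection.cross_validate. It requests the built-in 'precision_macro' and 'recall_macro' scorers, which handle multiclass targets; the choice of macro averaging here is an assumption for illustration, not something the answer prescribes.

```python
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_validate

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
skf = StratifiedKFold(n_splits=10)

# Request several metrics at once; the '_macro' suffix averages the per-class scores
scoring = ['precision_macro', 'recall_macro']
results = cross_validate(clf, iris.data, iris.target, cv=skf, scoring=scoring)

for name in scoring:
    fold_scores = results['test_' + name]  # one value per fold
    print(name, fold_scores)
    print("%s: %0.2f (+/- %0.2f)" % (name, fold_scores.mean(), fold_scores.std() * 2))
```

The per-fold arrays are returned under keys prefixed with 'test_', so the mean and spread can be reported the same way as for accuracy.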
    

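    【Comments】:

    • My precision, recall and accuracy scores are all exactly the same: [1.0, 0.93333333333333335, 1.0, 1.0, 0.8666666666666667, 1.0, 0.93333333333333335, 1.0, 1.0, 1.0], average: 0.97 (+/- 0.09), which should not be the case. Why is this, and how can we fix it?
    • I get the same result. I think it is due to the data: the iris dataset is too small and too simple, so you could try a larger dataset.
    • I tried this code on the actual dataset I am working with (an np.array of shape 2163, 8719), but I still get the same answer for precision, recall and accuracy.
    • Any ideas how to solve this?
    • Changing the setting to average='macro' changed the precision and recall scores, but I am not sure whether it is the appropriate setting.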
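On why the three scores coincide: for single-label multiclass predictions, every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). Micro-averaged precision and recall therefore both reduce to accuracy, whatever the dataset size. The following is a minimal pure-Python sketch with made-up labels (it does not call sklearn) that illustrates this, alongside macro averaging, which takes the mean of the per-class scores and generally gives different numbers.

```python
# Toy 3-class labels, invented purely for illustration
y_true = [0, 0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 1, 2, 2]
classes = sorted(set(y_true))
pairs = list(zip(y_true, y_pred))

def counts(cls):
    """Return (tp, fp, fn) for one class treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    return tp, fp, fn

per_class = [counts(c) for c in classes]
tp_sum = sum(tp for tp, _, _ in per_class)
fp_sum = sum(fp for _, fp, _ in per_class)
fn_sum = sum(fn for _, _, fn in per_class)

# Micro averaging pools the counts over all classes before dividing
micro_precision = tp_sum / (tp_sum + fp_sum)
micro_recall = tp_sum / (tp_sum + fn_sum)
accuracy = sum(t == p for t, p in pairs) / len(pairs)

# Macro averaging computes the score per class, then takes the unweighted mean
# (every class is predicted at least once here, so no division by zero occurs)
macro_precision = sum(tp / (tp + fp) for tp, fp, _ in per_class) / len(classes)
macro_recall = sum(tp / (tp + fn) for tp, _, fn in per_class) / len(classes)

print(micro_precision, micro_recall, accuracy)  # all three are 0.75 for this toy data
print(macro_precision, macro_recall)
```

Whether 'micro' or 'macro' is the appropriate setting depends on whether each sample or each class should carry equal weight in the average.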
    【Answer 2】:
    import pandas as pd
    from sklearn.metrics import (confusion_matrix, recall_score, precision_score,
                                 accuracy_score, f1_score, roc_auc_score)

    def binary_classification_performance(y_test, y_pred):
        # For binary labels, confusion_matrix(...).ravel() returns tn, fp, fn, tp
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
        accuracy = round(accuracy_score(y_pred = y_pred, y_true = y_test),2)
        precision = round(precision_score(y_pred = y_pred, y_true = y_test),2)
        recall = round(recall_score(y_pred = y_pred, y_true = y_test),2)
        # Local name differs from the imported f1_score so the function is not shadowed
        f1 = round(f1_score(y_pred = y_pred, y_true = y_test),2)
        specificity = round(tn/(tn+fp),2)
        npv = round(tn/(tn+fn),2)
        # AUC is computed from hard predictions here; pass scores or probabilities as y_score if available
        auc_roc = round(roc_auc_score(y_score = y_pred, y_true = y_test),2)

        result = pd.DataFrame({'Accuracy' : [accuracy],
                             'Precision (or PPV)' : [precision],
                             'Recall (sensitivity or TPR)' : [recall],
                             'f1 score' : [f1],
                             'AUC_ROC' : [auc_roc],
                             'Specificity (or TNR)': [specificity],
                             'NPV' : [npv],
                             'True Positive' : [tp],
                             'True Negative' : [tn],
                             'False Positive':[fp],
                             'False Negative':[fn]})
        return result


    # y_test and y_pred must hold binary labels and predictions
    binary_classification_performance(y_test, y_pred)
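The specificity and NPV figures depend on the order in which confusion_matrix(...).ravel() returns the four cells. For binary labels 0 and 1, scikit-learn puts true classes in rows and predicted classes in columns, so the flattened order is tn, fp, fn, tp. A small sketch with made-up labels shows this:

```python
from sklearn.metrics import confusion_matrix

# Six negatives followed by four positives (invented data for illustration)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# Rows are true classes, columns are predicted classes: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 4 2 1 3

specificity = tn / (tn + fp)  # share of actual negatives predicted as negative
npv = tn / (tn + fn)          # share of predicted negatives that are truly negative
print(round(specificity, 2), round(npv, 2))
```

Unpacking the cells in any other order silently swaps the counts, which changes specificity and NPV without raising an error.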
    

    【Comments】:
