【Question Title】: True Positive Rate and False Positive Rate (TPR, FPR) for Multi-Class Data in python [duplicate]
【Posted】: 2018-11-12 22:24:32
【Question Description】:

How do I compute the true and false positive rates for a multi-class classification problem? Say,

y_true = [1, -1,  0,  0,  1, -1,  1,  0, -1,  0,  1, -1,  1,  0,  0, -1,  0]
y_prediction = [-1, -1,  1,  0,  0,  0,  0, -1,  1, -1,  1,  1,  0,  0,  1,  1, -1]

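The confusion matrix is computed by metrics.confusion_matrix(y_true, y_prediction), but that just shifts the problem.


Edited after @seralouk's answer. Here, class -1 is to be treated as the negative, while 0 and 1 are variations of the positive.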

【Comments】:

    Tags: python scikit-learn confusion-matrix multiclass-classification


    【Solution 1】:

    Using your data, you can get all the metrics for all the classes at once:

    import numpy as np
    from sklearn.metrics import confusion_matrix
    
    y_true = [1, -1,  0,  0,  1, -1,  1,  0, -1,  0,  1, -1,  1,  0,  0, -1,  0]
    y_prediction = [-1, -1,  1,  0,  0,  0,  0, -1,  1, -1,  1,  1,  0,  0,  1,  1, -1]
    cnf_matrix = confusion_matrix(y_true, y_prediction)
    print(cnf_matrix)
    #[[1 1 3]
    # [3 2 2]
    # [1 3 1]]
    
    FP = cnf_matrix.sum(axis=0) - np.diag(cnf_matrix)  
    FN = cnf_matrix.sum(axis=1) - np.diag(cnf_matrix)
    TP = np.diag(cnf_matrix)
    TN = cnf_matrix.sum() - (FP + FN + TP)
    
    FP = FP.astype(float)
    FN = FN.astype(float)
    TP = TP.astype(float)
    TN = TN.astype(float)
    
    # Sensitivity, hit rate, recall, or true positive rate
    TPR = TP/(TP+FN)
    # Specificity or true negative rate
    TNR = TN/(TN+FP) 
    # Precision or positive predictive value
    PPV = TP/(TP+FP)
    # Negative predictive value
    NPV = TN/(TN+FN)
    # Fall out or false positive rate
    FPR = FP/(FP+TN)
    # False negative rate
    FNR = FN/(TP+FN)
    # False discovery rate
    FDR = FP/(TP+FP)
    # Overall accuracy for each class (one-vs-rest)
    ACC = (TP+TN)/(TP+FP+FN+TN)
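
The matrix arithmetic above amounts to treating each class in turn as "positive" and every other class as "negative" (one-vs-rest). As a rough cross-check, here is a minimal pure-Python sketch that counts the four cells directly from the label lists. The helper name `one_vs_rest_rates` is hypothetical, and returning NaN for an empty denominator is an assumption of this sketch, not something the answer above does.

```python
def one_vs_rest_rates(y_true, y_pred):
    """Return {label: (TPR, FPR)}, treating each label in turn as the positive class."""
    pairs = list(zip(y_true, y_pred))
    rates = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        # Assumption of this sketch: report NaN when a denominator is empty.
        tpr = tp / (tp + fn) if (tp + fn) else float("nan")
        fpr = fp / (fp + tn) if (fp + tn) else float("nan")
        rates[label] = (tpr, fpr)
    return rates


y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]

rates = one_vs_rest_rates(y_true, y_prediction)
for label, (tpr, fpr) in rates.items():
    print(label, round(tpr, 5), round(fpr, 5))
# -1 0.2 0.33333
# 0 0.28571 0.4
# 1 0.2 0.41667
```

The per-class values should agree with the `TPR` and `FPR` arrays computed from `cnf_matrix`, in the label order -1, 0, 1.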
    

    For the general case where we have many classes, these metrics are represented graphically in the following image:

    【Discussion】:

      【Solution 2】:

      Since there are several ways to solve this, and none of them is truly generic (see https://stats.stackexchange.com/questions/202336/true-positive-false-negative-true-negative-false-positive-definitions-for-mul?noredirect=1&lq=1 and https://stats.stackexchange.com/questions/51296/how-do-you-calculate-precision-and-recall-for-multiclass-classification-using-co#51301), here is the solution that seems to be used in the paper which I was unclear about:

      to count confusion between two foreground pages as false positive

      So the solution is to import numpy as np, use y_true and y_prediction as np.array, then:

      FP = np.logical_and(y_true != y_prediction, y_prediction != -1).sum()  # 9
      FN = np.logical_and(y_true != y_prediction, y_prediction == -1).sum()  # 4
      TP = np.logical_and(y_true == y_prediction, y_true != -1).sum()  # 3
      TN = np.logical_and(y_true == y_prediction, y_true == -1).sum()  # 1
      TPR = 1. * TP / (TP + FN)  # 0.42857142857142855
      FPR = 1. * FP / (FP + TN)  # 0.9
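
The same convention can be written without NumPy and with the negative label as a parameter. This is only a sketch: the helper name `collapsed_counts` and its `negative` argument are hypothetical, and it assumes exactly one label plays the role of the negative class.

```python
def collapsed_counts(y_true, y_pred, negative=-1):
    """Count (TP, FP, FN, TN) with one negative label and all other labels positive.

    Follows the convention above: a mix-up between two positive labels
    is counted as a false positive.
    """
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            # Correct prediction: negative label -> TN, any other label -> TP.
            if t == negative:
                tn += 1
            else:
                tp += 1
        elif p == negative:
            # Wrong prediction that says "negative".
            fn += 1
        else:
            # Wrong prediction that says some positive label.
            fp += 1
    return tp, fp, fn, tn


y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]

tp, fp, fn, tn = collapsed_counts(y_true, y_prediction)
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
print(tp, fp, fn, tn)  # 3 9 4 1
print(tpr, fpr)        # 0.42857142857142855 0.9
```

Note that under this convention `TP + FN` (7) and `FP + TN` (10) are not the numbers of actual positives (12) and actual negatives (5), so these rates are not directly comparable with the one-vs-rest values from a confusion matrix.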
      

      【Discussion】:

        【Solution 3】:

        Another simple way is PyCM (by me), which supports multi-class confusion matrix analysis.

        Applied to your problem:

        >>> from pycm import ConfusionMatrix
        >>> y_true = [1, -1,  0,  0,  1, -1,  1,  0, -1,  0,  1, -1,  1,  0,  0, -1,  0]
        >>> y_prediction = [-1, -1,  1,  0,  0,  0,  0, -1,  1, -1,  1,  1,  0,  0,  1,  1, -1]
        >>> cm = ConfusionMatrix(actual_vector=y_true,predict_vector=y_prediction)
        >>> print(cm)
        Predict          -1       0        1        
        Actual
        -1               1        1        3        
        0                3        2        2        
        1                1        3        1        
        
        
        
        
        Overall Statistics : 
        
        95% CI                                                           (0.03365,0.43694)
        Bennett_S                                                        -0.14706
        Chi-Squared                                                      None
        Chi-Squared DF                                                   4
        Conditional Entropy                                              None
        Cramer_V                                                         None
        Cross Entropy                                                    1.57986
        Gwet_AC1                                                         -0.1436
        Joint Entropy                                                    None
        KL Divergence                                                    0.01421
        Kappa                                                            -0.15104
        Kappa 95% CI                                                     (-0.45456,0.15247)
        Kappa No Prevalence                                              -0.52941
        Kappa Standard Error                                             0.15485
        Kappa Unbiased                                                   -0.15405
        Lambda A                                                         0.2
        Lambda B                                                         0.27273
        Mutual Information                                               None
        Overall_ACC                                                      0.23529
        Overall_RACC                                                     0.33564
        Overall_RACCU                                                    0.33737
        PPV_Macro                                                        0.23333
        PPV_Micro                                                        0.23529
        Phi-Squared                                                      None
        Reference Entropy                                                1.56565
        Response Entropy                                                 1.57986
        Scott_PI                                                         -0.15405
        Standard Error                                                   0.10288
        Strength_Of_Agreement(Altman)                                    Poor
        Strength_Of_Agreement(Cicchetti)                                 Poor
        Strength_Of_Agreement(Fleiss)                                    Poor
        Strength_Of_Agreement(Landis and Koch)                           Poor
        TPR_Macro                                                        0.22857
        TPR_Micro                                                        0.23529
        
        Class Statistics :
        
        Classes                                                          -1                      0                       1                       
        ACC(Accuracy)                                                    0.52941                 0.47059                 0.47059                 
        BM(Informedness or bookmaker informedness)                       -0.13333                -0.11429                -0.21667                
        DOR(Diagnostic odds ratio)                                       0.5                     0.6                     0.35                    
        ERR(Error rate)                                                  0.47059                 0.52941                 0.52941                 
        F0.5(F0.5 score)                                                 0.2                     0.32258                 0.17241                 
        F1(F1 score - harmonic mean of precision and sensitivity)        0.2                     0.30769                 0.18182                 
        F2(F2 score)                                                     0.2                     0.29412                 0.19231                 
        FDR(False discovery rate)                                        0.8                     0.66667                 0.83333                 
        FN(False negative/miss/type 2 error)                             4                       5                       4                       
        FNR(Miss rate or false negative rate)                            0.8                     0.71429                 0.8                     
        FOR(False omission rate)                                         0.33333                 0.45455                 0.36364                 
        FP(False positive/type 1 error/false alarm)                      4                       4                       5                       
        FPR(Fall-out or false positive rate)                             0.33333                 0.4                     0.41667                 
        G(G-measure geometric mean of precision and sensitivity)         0.2                     0.30861                 0.18257                 
        LR+(Positive likelihood ratio)                                   0.6                     0.71429                 0.48                    
        LR-(Negative likelihood ratio)                                   1.2                     1.19048                 1.37143                 
        MCC(Matthews correlation coefficient)                            -0.13333                -0.1177                 -0.20658                
        MK(Markedness)                                                   -0.13333                -0.12121                -0.19697                
        N(Condition negative)                                            12                      10                      12                      
        NPV(Negative predictive value)                                   0.66667                 0.54545                 0.63636                 
        P(Condition positive)                                            5                       7                       5                       
        POP(Population)                                                  17                      17                      17                      
        PPV(Precision or positive predictive value)                      0.2                     0.33333                 0.16667                 
        PRE(Prevalence)                                                  0.29412                 0.41176                 0.29412                 
        RACC(Random accuracy)                                            0.08651                 0.14533                 0.10381                 
        RACCU(Random accuracy unbiased)                                  0.08651                 0.14619                 0.10467                 
        TN(True negative/correct rejection)                              8                       6                       7                       
        TNR(Specificity or true negative rate)                           0.66667                 0.6                     0.58333                 
        TON(Test outcome negative)                                       12                      11                      11                      
        TOP(Test outcome positive)                                       5                       6                       6                       
        TP(True positive/hit)                                            1                       2                       1                       
        TPR(Sensitivity, recall, hit rate, or true positive rate)        0.2                     0.28571                 0.2                     
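
The `TPR_Macro` and `TPR_Micro` lines in the overall statistics can be reproduced from the per-class `TP` and `FN` rows. The sketch below assumes the usual definitions (macro: unweighted mean of the per-class rates; micro: rates computed from counts pooled over all classes). That these are the definitions PyCM uses is an assumption here, supported only by the numbers matching the report.

```python
# Per-class counts copied from the TP and FN rows of the report above.
tp = {-1: 1, 0: 2, 1: 1}
fn = {-1: 4, 0: 5, 1: 4}

# Per-class true positive rate: TP / (TP + FN).
per_class_tpr = {c: tp[c] / (tp[c] + fn[c]) for c in tp}

# Macro average: every class contributes equally, regardless of its size.
tpr_macro = sum(per_class_tpr.values()) / len(per_class_tpr)

# Micro average: pool the counts first, then take a single ratio.
tpr_micro = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))

print(round(tpr_macro, 5))  # 0.22857
print(round(tpr_micro, 5))  # 0.23529
```

For single-label multi-class data the pooled `TP + FN` equals the number of samples, which is why the micro-averaged TPR coincides with `Overall_ACC` (0.23529).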
        

        【Discussion】:
