Machine Learning in R - confusion matrix of an ensemble (answer)

【Question title】: Machine Learning in R - confusion matrix of an ensemble
【Posted】: 2019-06-11 22:44:55
【Question】:

I am trying to get the overall accuracy (or the confusion matrix) across several classifiers, but I can't find a way to report it.

What I've tried:

confusionMatrix(fits_predicts,reference=(mnist_27$test$y))

Error in table(data, reference, dnn = dnn, ...) : all arguments must have same length

library(caret)
library(dslabs)
set.seed(1)
data("mnist_27")

models <- c("glm", "lda",  "naive_bayes",  "svmLinear", 
            "gamboost",  "gamLoess", "qda", 
            "knn", "kknn", "loclda", "gam",
            "rf", "ranger",  "wsrf", "Rborist", 
            "avNNet", "mlp", "monmlp",
            "adaboost", "gbm",
            "svmRadial", "svmRadialCost", "svmRadialSigma")

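# Train one model per method with caret's default resampling and tuning
fits <- lapply(models, function(model){ 
  print(model)
  train(y ~ ., method = model, data = mnist_27$train)
}) 

names(fits) <- models

# Predict the test set with every fitted model
fits_predicts <- sapply(fits, function(fit){ predict(fit, mnist_27$test)
  })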

I'd like to report the confusion matrix for the different models.

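【Question discussion】:

You don't seem to have an ensemble, just a bunch of models...
In my view the question is incomplete: it is described as an ensemble, but it lacks a voting system in which the majority vote determines the final prediction.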
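The voting step mentioned in this comment could be sketched as follows. This is a minimal base-R sketch, not caret functionality. `majority_vote()` is a hypothetical helper, the predictions are made-up toy values, and ties are assumed to go to the first class level.

```r
# Hypothetical helper: combine several models' class predictions by majority vote.
# `pred_df` is a data frame (or list) with one column of predictions per model;
# `lev` gives the class levels (passed explicitly in case columns are character).
majority_vote <- function(pred_df, lev = levels(pred_df[[1]])) {
  # Character matrix: one row per observation, one column per model
  votes <- do.call(cbind, lapply(pred_df, as.character))
  # For each observation, count votes per level and keep the most frequent one;
  # which.max() returns the first maximum, so ties go to the earlier level
  winner <- apply(votes, 1, function(row) {
    counts <- table(factor(row, levels = lev))
    names(counts)[which.max(counts)]
  })
  factor(winner, levels = lev)
}

# Toy predictions from three models on five observations (illustrative only)
lev <- c("2", "7")
preds <- data.frame(
  glm = factor(c("2", "7", "2", "2", "7"), levels = lev),
  lda = factor(c("2", "7", "7", "7", "2"), levels = lev),
  knn = factor(c("7", "7", "7", "2", "2"), levels = lev)
)
reference <- factor(c("2", "7", "7", "2", "7"), levels = lev)

ensemble <- majority_vote(preds)
print(table(Prediction = ensemble, Reference = reference))
print(mean(ensemble == reference))  # ensemble accuracy: 0.8
```

Assuming `fits_predicts` is a data frame with one column per model (as built in the answer below), `confusionMatrix(majority_vote(fits_predicts, lev = levels(mnist_27$test$y)), reference = mnist_27$test$y)` should then produce a single confusion matrix for the combined model, because the vote yields exactly one prediction per test observation.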

Tags: r machine-learning r-caret confusion-matrix ensemble-learning

【Solution 1】:

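You are not training any ensemble; you are just training a list of several models without combining them in any way, which is definitely not an ensemble.

Given that, the error you get is not unexpected: confusionMatrix expects a single vector of predictions (which is what you would have if you really had an ensemble), whereas sapply() gives you a matrix with one column per model, so its length does not match that of the reference labels.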
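A small base-R reproduction makes the length mismatch visible. The predictions below are made-up toy values and `preds`/`pred_matrix` are illustrative names; the failing call is `table()`, which is what the question's error message shows confusionMatrix() calling internally.

```r
# Made-up predictions from two models on four observations (illustrative only)
reference <- factor(c("2", "7", "7", "2"))
preds <- list(
  glm = factor(c("2", "7", "2", "2")),
  lda = factor(c("2", "7", "7", "7"))
)

# sapply() simplifies the list into a 4 x 2 matrix: one column per model
pred_matrix <- sapply(preds, identity)
print(dim(pred_matrix))     # 4 2
print(length(pred_matrix))  # 8 elements vs. 4 reference labels

# table() needs arguments of equal length, so this reproduces the error
result <- tryCatch(table(pred_matrix, reference),
                   error = function(e) conditionMessage(e))
print(result)  # "all arguments must have same length"
```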

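For simplicity, keep only the first 4 models in the list, and change your fits_predicts definition slightly so that it returns a data frame with one column per model, i.e.: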

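models <- c("glm", "lda",  "naive_bayes",  "svmLinear")

# rest of your code as-is (train `fits` and name them), then:

# lapply() keeps each model's predictions as a factor; as.data.frame()
# turns the list into a data frame with one factor column per model
fits_predicts <- as.data.frame(lapply(fits, predict, newdata = mnist_27$test))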
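Why build the data frame from a list rather than from sapply()? The sketch below uses made-up toy predictions to compare the two routes. sapply() passes through a matrix, which cannot store factors, so the class labels come back as character. as.data.frame() on a list keeps one factor column per model. The column class on the sapply() route also depends on the R version (R 4.0.0 changed the `stringsAsFactors` default to FALSE), and caret's confusionMatrix() expects a factor of predicted classes.

```r
# Made-up predictions from two models (illustrative only)
preds <- list(
  glm = factor(c("2", "7", "2", "2"), levels = c("2", "7")),
  lda = factor(c("2", "7", "7", "7"), levels = c("2", "7"))
)

# sapply() goes through a matrix, which cannot hold factors: values become character
via_sapply <- as.data.frame(sapply(preds, identity))
print(sapply(via_sapply, class))  # both "character" on R >= 4.0

# lapply() keeps each element as-is, so the data frame gets factor columns
via_lapply <- as.data.frame(lapply(preds, identity))
print(sapply(via_lapply, class))  # both "factor"
```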

Here is how to get the confusion matrix for each model:

cm <- lapply(fits_predicts, function(pred) {
  confusionMatrix(pred, reference = mnist_27$test$y)
})
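Since the question asked for accuracy across classifiers, one way to summarise the per-model results is to compute each column's accuracy and put them side by side. The sketch uses made-up toy predictions in base R. Averaging these numbers describes the individual models; it is not the accuracy of an ensemble.

```r
# Made-up test labels and per-model predictions (illustrative only)
lev <- c("2", "7")
reference <- factor(c("2", "7", "7", "2", "7"), levels = lev)
toy_preds <- data.frame(
  glm = factor(c("2", "7", "2", "2", "7"), levels = lev),
  lda = factor(c("2", "7", "7", "7", "2"), levels = lev),
  knn = factor(c("7", "7", "7", "2", "7"), levels = lev)
)

# Accuracy of each column = share of predictions matching the reference
acc <- sapply(toy_preds, function(pred) mean(pred == reference))
print(acc)        # glm 0.8, lda 0.6, knn 0.8
print(mean(acc))  # average over models, not an ensemble accuracy
```

With the caret results above, the same figures can be read from each object's `overall` element, e.g. `sapply(cm, function(x) x$overall[["Accuracy"]])`.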

which gives:

> cm
$glm
Confusion Matrix and Statistics

          Reference
Prediction  2  7
         2 82 26
         7 24 68

               Accuracy : 0.75           
                 95% CI : (0.684, 0.8084)
    No Information Rate : 0.53           
    P-Value [Acc > NIR] : 1.266e-10      

                  Kappa : 0.4976         
 Mcnemar's Test P-Value : 0.8875         

            Sensitivity : 0.7736         
            Specificity : 0.7234         
         Pos Pred Value : 0.7593         
         Neg Pred Value : 0.7391         
             Prevalence : 0.5300         
         Detection Rate : 0.4100         
   Detection Prevalence : 0.5400         
      Balanced Accuracy : 0.7485         

       'Positive' Class : 2              


$lda
Confusion Matrix and Statistics

          Reference
Prediction  2  7
         2 82 26
         7 24 68

               Accuracy : 0.75           
                 95% CI : (0.684, 0.8084)
    No Information Rate : 0.53           
    P-Value [Acc > NIR] : 1.266e-10      

                  Kappa : 0.4976         
 Mcnemar's Test P-Value : 0.8875         

            Sensitivity : 0.7736         
            Specificity : 0.7234         
         Pos Pred Value : 0.7593         
         Neg Pred Value : 0.7391         
             Prevalence : 0.5300         
         Detection Rate : 0.4100         
   Detection Prevalence : 0.5400         
      Balanced Accuracy : 0.7485         

       'Positive' Class : 2              


$naive_bayes
Confusion Matrix and Statistics

          Reference
Prediction  2  7
         2 88 23
         7 18 71

               Accuracy : 0.795           
                 95% CI : (0.7323, 0.8487)
    No Information Rate : 0.53            
    P-Value [Acc > NIR] : 5.821e-15       

                  Kappa : 0.5873          
 Mcnemar's Test P-Value : 0.5322          

            Sensitivity : 0.8302          
            Specificity : 0.7553          
         Pos Pred Value : 0.7928          
         Neg Pred Value : 0.7978          
             Prevalence : 0.5300          
         Detection Rate : 0.4400          
   Detection Prevalence : 0.5550          
      Balanced Accuracy : 0.7928          

       'Positive' Class : 2               


$svmLinear
Confusion Matrix and Statistics

          Reference
Prediction  2  7
         2 81 24
         7 25 70

               Accuracy : 0.755           
                 95% CI : (0.6894, 0.8129)
    No Information Rate : 0.53            
    P-Value [Acc > NIR] : 4.656e-11       

                  Kappa : 0.5085          
 Mcnemar's Test P-Value : 1               

            Sensitivity : 0.7642          
            Specificity : 0.7447          
         Pos Pred Value : 0.7714          
         Neg Pred Value : 0.7368          
             Prevalence : 0.5300          
         Detection Rate : 0.4050          
   Detection Prevalence : 0.5250          
      Balanced Accuracy : 0.7544          

       'Positive' Class : 2

You can also access the confusion matrix of an individual model, e.g. for lda:

> cm['lda']
$lda
Confusion Matrix and Statistics

          Reference
Prediction  2  7
         2 82 26
         7 24 68

               Accuracy : 0.75           
                 95% CI : (0.684, 0.8084)
    No Information Rate : 0.53           
    P-Value [Acc > NIR] : 1.266e-10      

                  Kappa : 0.4976         
 Mcnemar's Test P-Value : 0.8875         

            Sensitivity : 0.7736         
            Specificity : 0.7234         
         Pos Pred Value : 0.7593         
         Neg Pred Value : 0.7391         
             Prevalence : 0.5300         
         Detection Rate : 0.4100         
   Detection Prevalence : 0.5400         
      Balanced Accuracy : 0.7485         

       'Positive' Class : 2
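The `$lda` line at the top of that output appears because single-bracket indexing on a list returns a one-element sub-list. The toy list below (illustrative values standing in for `cm`) shows the difference. With the real results, `cm[["lda"]]` or `cm$lda` returns the confusionMatrix object itself, so that e.g. `cm[["lda"]]$overall[["Accuracy"]]` extracts a single number.

```r
# A named list standing in for `cm` (illustrative values only)
results <- list(glm = c(Accuracy = 0.75), lda = c(Accuracy = 0.75))

sub <- results["lda"]     # single brackets: a list of length 1, printed with "$lda"
elem <- results[["lda"]]  # double brackets: the element itself
print(class(sub))          # "list"
print(class(elem))         # "numeric"
print(elem[["Accuracy"]])  # 0.75
```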

【Discussion】: