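【Question title】:error with mnLogloss for multinomial classifier using caret/gbm
【Posted】:2020-10-12 20:52:06
【Question description】:

I am trying to fit a multinomial classifier. It seems to work, and I can produce a plot of logLoss against boosting iterations, but I cannot extract the error value. This is the error I get when I run the mnLogLoss function: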

Error in mnLogLoss(predicted, lev = predicted$label) : 
  'data' should have columns consistent with 'lev'
The data has been partitioned into:
- training
- testing

In both, the column "label" contains the ground truth.

library(MLmetrics)
fitControl <- trainControl(method = "repeatedcv", number=10, repeats=3, verboseIter = FALSE,
                           savePredictions = TRUE, classProbs = TRUE, summaryFunction= mnLogLoss)


gbmGrid1 <- expand.grid(.interaction.depth = (1:3), .n.trees = (1:10)*20, .shrinkage = 0.01, .n.minobsinnode = 3)

system.time(
  gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
                   verbose = 1, metric = "logLoss", tuneGrid = gbmGrid1)
)

gbmPredictions <- predict(gbmFit1, testing)
predicted <- cbind(gbmPredictions, testing)

mnLogLoss(predicted, lev = levels(predicted$label))

【Question discussion】:

    Tags: r classification r-caret gbm multinomial


    【Solution 1】:

    For mnLogLoss, the vignette says:

    data: a data frame with columns ‘obs’ and ‘pred’ for the observed
              and predicted outcomes. For metrics that rely on class
              probabilities, such as ‘twoClassSummary’, columns should also
              include predicted probabilities for each class. See the
              ‘classProbs’ argument to ‘trainControl’.
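
    The error in the question presumably comes from this column requirement: every class level passed as lev needs a probability column of the same name in data. Below is a minimal base R sketch of that check. The helper missing_prob_columns() and the toy data are hypothetical, made up for illustration, and are not part of caret.

```r
# Hypothetical helper (not part of caret): list the class levels that have
# no matching probability column in `data`.
missing_prob_columns <- function(data, lev) {
  setdiff(lev, colnames(data))
}

# Toy stand-in for the question's cbind(gbmPredictions, testing):
# hard class predictions plus the raw test columns, no probability columns.
testing <- data.frame(label = factor(c("a", "b", "a")), x1 = c(0.1, 0.5, 0.9))
gbmPredictions <- factor(c("a", "a", "b"), levels = levels(testing$label))
bad <- cbind(gbmPredictions, testing)

# Both levels are reported as missing.
missing_prob_columns(bad, levels(bad$label))

# The same rows in the documented layout: obs, pred, one column per class.
# The probabilities are invented for illustration.
good <- data.frame(obs = testing$label, pred = gbmPredictions,
                   a = c(0.7, 0.6, 0.2), b = c(0.3, 0.4, 0.8))

# Nothing is missing here.
missing_prob_columns(good, levels(good$obs))
```

    A data frame built by binding class predictions onto the test set has no columns named after the class levels, which is consistent with the message 'data' should have columns consistent with 'lev'.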
    

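    So it is not asking for the training or testing data. The data argument here is simply the data frame of observed values, predicted classes, and class probabilities. Since your data was not provided, I use some simulated data: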

    library(caret)
    
    # Simulated data: a two-level factor outcome plus 50 uniform predictors (100 rows)
    df = data.frame(label=factor(sample(c("a","b"),100,replace=TRUE)),
    matrix(runif(100*50),ncol=50))
    training = df[1:50,]
    # Same rows as training; enough to demonstrate the mnLogLoss() call
    testing = df[1:50,]
    
    # classProbs = TRUE is needed so that class probabilities are available to mnLogLoss
    fitControl <- trainControl(method = "repeatedcv", number=10, repeats=3, verboseIter = FALSE,
                               savePredictions = TRUE, classProbs = TRUE, summaryFunction= mnLogLoss)
    
    gbmGrid1 <- expand.grid(.interaction.depth = (1:3), .n.trees = (1:10)*20, .shrinkage = 0.01, .n.minobsinnode = 3)
    
    gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
                     verbose = 1, metric = "logLoss", tuneGrid = gbmGrid1)
    

    We put obs and pred together in one data frame, and the final two columns are the predicted probabilities for each class:

    predicted <- data.frame(obs=testing$label,
    pred=predict(gbmFit1, testing),
    predict(gbmFit1, testing,type="prob"))
    
    head(predicted)
    
      obs pred         a         b
    1   b    a 0.5506054 0.4493946
    2   b    a 0.5107631 0.4892369
    3   a    b 0.4859799 0.5140201
    4   b    a 0.5090264 0.4909736
    5   b    b 0.4545746 0.5454254
    6   a    a 0.6211514 0.3788486
    
    mnLogLoss(predicted, lev = levels(predicted$obs))
      logLoss 
    0.6377392
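
    To make the number less opaque, here is a rough base R sketch of the multinomial log loss computed by hand: the negative mean log of the probability assigned to each row's observed class. The toy probabilities and the clipping bound eps are assumptions for illustration; this is not caret's own code.

```r
# Toy data in the layout mnLogLoss() expects: obs, pred, one column per class.
# The probabilities are made up for illustration.
predicted <- data.frame(
  obs  = factor(c("a", "b", "a", "b"), levels = c("a", "b")),
  pred = factor(c("a", "a", "b", "b"), levels = c("a", "b")),
  a    = c(0.8, 0.6, 0.4, 0.1),
  b    = c(0.2, 0.4, 0.6, 0.9)
)
lev <- levels(predicted$obs)

# Probability matrix with one column per class level.
probs <- as.matrix(predicted[, lev])

# For each row, pick the probability assigned to the observed class.
row_col <- cbind(seq_len(nrow(probs)), match(predicted$obs, lev))
p_true <- probs[row_col]

# Clip away exact 0 and 1 so log() stays finite (the bound is an assumption).
eps <- 1e-15
p_true <- pmin(pmax(p_true, eps), 1 - eps)

# Multinomial log loss: negative mean log-probability of the observed class.
manual_logloss <- -mean(log(p_true))
manual_logloss
```

    With caret loaded, calling mnLogLoss(predicted, lev = lev) on the same data frame should give a matching value, which is a quick way to confirm the input is laid out correctly.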
    

    【Discussion】:
