当您在 trainControl 中指定 classProbs = TRUE 时,Caret 确实支持将类概率传递给自定义摘要函数。在这种情况下,创建自定义汇总函数时的data 参数将有另外两列命名为包含每个类概率的类。这些类的名称将在 lev 参数中,这是一个长度为 2 的向量。
查看示例:
library(caret)
library(mlbench)
data(Sonar)
自定义汇总LogLoss:
LogLoss <- function (data, lev = NULL, model = NULL){
obs <- data[, "obs"] #truth
cls <- levels(obs) #find class names
probs <- data[, cls[2]] #use second class name to extract probs for 2nd clas
probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) #bound probability, this line and bellow is just logloss calculation, irrelevant for your question
logPreds <- log(probs)
log1Preds <- log(1 - probs)
real <- (as.numeric(data$obs) - 1)
out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
names(out) <- c("LogLoss") #important since this is specified in call to train. Output can be a named vector of multiple values.
out
}
fitControl <- trainControl(method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = LogLoss)
fit <- train(Class ~.,
data = Sonar,
method = "rpart",
metric = "LogLoss" ,
tuneLength = 5,
trControl = fitControl,
maximize = FALSE) #important, depending on calculated performance measure
fit
#output
CART
208 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 166, 166, 166, 167, 167
Resampling results across tuning parameters:
cp LogLoss
0.00000000 1.1220902
0.01030928 1.1220902
0.05154639 1.1017268
0.06701031 1.0694052
0.48453608 0.6405134
LogLoss was used to select the optimal model using the smallest value.
The final value used for the model was cp = 0.4845361.
或者使用包含类级别的lev 参数并定义一些错误检查
LogLoss <- function (data, lev = NULL, model = NULL){
if (length(lev) > 2) {
stop(paste("Your outcome has", length(lev), "levels. The LogLoss() function isn't appropriate."))
}
obs <- data[, "obs"] #truth
probs <- data[, lev[2]] #use second class name
probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) #bound probability
logPreds <- log(probs)
log1Preds <- log(1 - probs)
real <- (as.numeric(data$obs) - 1)
out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
names(out) <- c("LogLoss")
out
}
查看插入符号书的这一部分:https://topepo.github.io/caret/model-training-and-tuning.html#metrics
了解更多信息。如果您打算使用插入符号,即使您不擅长阅读,也值得一读。