【Question title】: R caret Error: Something is wrong; all the Accuracy metric values are missing
【Posted】: 2017-09-29 17:35:38
【Question】:

I am trying to apply stacking to my dataset, but I am stuck at this point.

# Load library
library(DJL)
library(caret)
library(caretEnsemble)

# Load data and format the target attribute to avoid clutters
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")

# Run
st.methods <- c("lda", "rpart", "glm", "knn", "svmRadial")
st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3, 
                           savePredictions = TRUE, classProbs = TRUE)
st.models  <- caretList(Type ~ ., data = df, trControl = st.control, methodList = st.methods)

Then I get:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error: Stopping
In addition: There were 18 warnings (use warnings() to see them)

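Can anyone help me resolve this error?

【Comments】:

  • Could you try to make your example reproducible? Otherwise any suggestion is just a guess.
  • Sorry, I forgot to load the "caretEnsemble" package. Now you can reproduce my error. Thanks for pointing that out!
  • Your example is not reproducible without some observations from your input dataset dataset.engine.2015. If your dataset is proprietary, you can anonymize it using the options mentioned in the first comment. Essentially, create a small test dataset on which you get the error and post the output of dput(DF), where DF is the test dataset.
  • I am not sure what you mean by "not reproducible". The dataset (dataset.engine.2015) ships with the "DJL" package, so I believe you can simply run my code to load it and reproduce my problem. If that is not what you are asking for, please let me know.
  • Sorry, please excuse my unfamiliarity with the DJL package and my assumption that dataset.engine.2015 was a user-defined dataset.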

Tags: r r-caret


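【Solution 1】:

In caret, the glm method cannot be used to predict a categorical outcome with more than two classes, and Type has five. Try removing glm from st.methods, or replace glm with, for example, multinom, gbm, or randomForest (caret method name "rf").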
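
The check described above can be sketched in base R. The helper name drop_binary_only and its binary_only argument are hypothetical, introduced only for illustration, and iris (three classes) stands in for the asker's data so the snippet runs without DJL:

```r
# Hypothetical helper (not part of caret or caretEnsemble): remove methods that
# only support two-class outcomes when the outcome factor has more than two levels.
drop_binary_only <- function(methods, outcome, binary_only = c("glm")) {
  if (nlevels(outcome) > 2) {
    setdiff(methods, binary_only)  # setdiff() keeps the order of `methods`
  } else {
    methods
  }
}

st.methods <- c("lda", "rpart", "glm", "knn", "svmRadial")

# Assumption: iris$Species (3 levels) stands in for df$Type (5 levels).
safe.methods <- drop_binary_only(st.methods, iris$Species)
print(safe.methods)
# "lda" "rpart" "knn" "svmRadial"
```

With the original data, the same call would be drop_binary_only(st.methods, df$Type), and the result could be passed as methodList to caretList.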

Here are two useful experiments. First we consider glm only:

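# Start from a clean workspace
rm(list = ls())
library(DJL)
library(caret)
library(caretEnsemble)

# Drop the first two columns and make the class labels valid R names
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")

# 5-fold cross-validation repeated 3 times, keeping predictions and class probabilities
st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3, 
                           savePredictions = TRUE, classProbs = TRUE)

# Fit glm alone to isolate the failure
st.methods <- c("glm")
st.models  <- caretList(Type ~ ., data = df, trControl = st.control, methodList = st.methods)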

This is the error message:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 18 warnings (use warnings() to see them)

Now we replace glm with multinom:

# Multinomial regression handles outcomes with more than two classes
st.methods <- c("multinom")
st.models  <- caretList(Type ~ ., data = df, trControl = st.control, methodList = st.methods)
print(st.models)

The output is:

$multinom
Penalized Multinomial Regression 

1206 samples
   5 predictor
   5 classes: 'NA.D', 'NA.P', 'SC.P', 'TC.D', 'TC.P' 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 964, 965, 965, 965, 965, 964, ... 
Resampling results across tuning parameters:

  decay  Accuracy   Kappa    
  0e+00  0.9306411  0.8518294
  1e-04  0.9300901  0.8506964
  1e-01  0.9328507  0.8564466

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was decay = 0.1.
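
When a longer method list fails like this, one way to find the culprit is to fit the methods one at a time and collect the warnings each one raises. The following is only a sketch under stated assumptions: iris (three classes) stands in for the asker's five-class data, the names candidate.methods, failed, and msgs are illustrative, and the method list is shortened to glm and rpart to keep the run quick.

```r
library(caret)

# Assumption: iris (3 classes) stands in for the asker's 5-class data.
ctrl <- trainControl(method = "cv", number = 3)

candidate.methods <- c("glm", "rpart")
failed <- character(0)

for (m in candidate.methods) {
  msgs <- character(0)  # warnings collected for this method
  fit <- withCallingHandlers(
    tryCatch(
      train(Species ~ ., data = iris, method = m, trControl = ctrl),
      error = function(e) NULL  # train() stops when every resample fails
    ),
    warning = function(w) {
      # Record the warning text, then suppress the normal warning output
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  if (is.null(fit)) {
    failed <- c(failed, m)
    message(m, " failed; first warning: ", msgs[1])
  }
}

print(failed)
```

The collected warning text is the same information that warnings() would show after the original failure, and it usually names the reason the model fit was rejected.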

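【Comments】:

  • Thanks for your solution! It works now, but unfortunately the caretEnsemble package does not provide stacking for multiclass (> 2) problems: when I tried to create an ensemble with the caretStack function, it returned "not yet implemented for multiclass problems". I wish I had known this earlier! I will probably have to find another package for this task. If anyone knows of a better option in R, please comment.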