【Posted】: 2021-01-04 21:07:02
【Question】:
Error: nrow(x) == n is not TRUE
I'm not sure what "n" refers to in this case. Here is the code that throws the error:
# BUILD MODEL
set.seed(9353)
elastic_net_model <- train(x = predictors, y = y,
                           method = "glmnet",
                           family = "binomial",
                           preProcess = c("scale"),
                           tuneLength = 10,
                           metric = "ROC",
                           # metric = "Spec",
                           trControl = train_control)
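For other people who ran into this error, the main problem was that their y variable was not a factor or numeric; they often passed it as a matrix or data frame. I explicitly made my y a factor, as shown below: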
# Make sure that the outcome variable is a two-level factor
dfBlocksAll$trophout1 = as.factor(dfBlocksAll$trophout1)
# Set levels for dfBlocksAll$trophout1
levels(dfBlocksAll$trophout1) <- c("NoTrophy", "Trophy")
# Split the data into training and test set, 70/30 split
set.seed(1934)
index <- createDataPartition(y = dfBlocksAll$trophout1, p = 0.70, list = FALSE)
training <- dfBlocksAll[index, ]
testing <- dfBlocksAll[-index, ]
# This step is the heart of the process
y <- dfBlocksAll$trophout1 # outcome variable - did they get a trophy or not?
predictors <- training[,which(colnames(training) != "trophout1")]
The only other possibly relevant code before the block that throws the error is:
train_control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              # sampling = "down",
                              classProbs = TRUE,
                              summaryFunction = twoClassSummary,
                              allowParallel = TRUE,
                              savePredictions = "final",
                              verboseIter = FALSE)
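Since my y is already a factor, I assume the error is related to x rather than y. As you can see from the code, my x is a data frame called "predictors". It contains 768 obs. of 67 variables and holds both character and numeric columns.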
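A quick way to test the assumption about x versus y is to compare their sizes directly. The sketch below is not part of the original post: it uses a small made-up data frame (`df`, `index`, `y_full`, and `y_train` are hypothetical names) and assumes that the "n" in the message is the number of outcome values, i.e. `length(y)`, which has not been checked against the caret source here.

```r
# Toy stand-in for dfBlocksAll (hypothetical data, not the poster's)
set.seed(1)
df <- data.frame(a = rnorm(10),
                 b = rnorm(10),
                 trophout1 = factor(rep(c("NoTrophy", "Trophy"), 5)))

# Pretend these are the training rows picked by createDataPartition()
index <- c(1, 2, 3, 5, 6, 8, 9)
training <- df[index, ]

predictors <- training[, colnames(training) != "trophout1"]
y_full  <- df$trophout1        # outcome taken from the full data set
y_train <- training$trophout1  # outcome taken from the training split

is.factor(y_full)                     # TRUE: the type of y is fine
nrow(predictors) == length(y_full)    # FALSE: 7 rows vs. 10 outcomes
nrow(predictors) == length(y_train)   # TRUE: both come from the same rows
```

If the same comparison on the real `predictors` and `y` returns FALSE, the two objects were built from different sets of rows, independently of whether y is a factor.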
【Discussion】:
- You do realize that glmnet cannot handle NAs, right? Why not report the results of `dim` and `summary` on the objects predictors and y (or perhaps use `sapply(predictors, function(x){sum(is.na(x))})`)?
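The checks suggested in this comment could look roughly like the following. This is a minimal sketch on a made-up data frame (the values, `na_counts`, and `complete_rows` are illustrative, not taken from the question), using only base R.

```r
# Toy objects standing in for predictors and y (hypothetical values)
predictors <- data.frame(num = c(1.5, NA, 3.2, 4.0),
                         chr = c("a", "b", NA, NA),
                         stringsAsFactors = FALSE)
y <- factor(c("NoTrophy", "Trophy", "Trophy", "NoTrophy"))

# Shape and class balance
dim(predictors)   # rows, columns
length(y)
summary(y)        # counts per factor level

# Missing values per column, as proposed in the comment
na_counts <- sapply(predictors, function(x) sum(is.na(x)))
na_counts

# Rows with no missing value in any column
complete_rows <- sum(complete.cases(predictors))
complete_rows
```

Posting the output of these calls for the real objects would show both whether the row counts of x and y agree and whether any column contains NAs.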
Tags: r machine-learning data-science r-caret