【Question title】: Elastic net issue in R - Error in check_dims(x = x, y = y) : nrow(x) == n is not TRUE
【Posted】: 2021-01-04 21:07:02
【Question】:

Error: nrow(x) == n is not TRUE

I'm not sure what "n" refers to in this case. Here is the code that throws the error:

# BUILD MODEL 
set.seed(9353)
elastic_net_model <- train(x = predictors, y = y,
                           method = "glmnet",
                           family = "binomial",
                           preProcess = c("scale"),
                           tuneLength = 10,
                           metric = "ROC",
                           # metric = "Spec",
                           trControl = train_control)

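For other people who hit this error, the main problem was that their y variable was not a factor or numeric; they often passed it as a matrix or data frame. I explicitly make my y a factor, as shown here: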

# Make sure that the outcome variable is a two-level factor
dfBlocksAll$trophout1 = as.factor(dfBlocksAll$trophout1)

# Set levels for dfBlocksAll$trophout1
levels(dfBlocksAll$trophout1) <- c("NoTrophy", "Trophy")

# Split the data into training and test set, 70/30 split
set.seed(1934)
index <- createDataPartition(y = dfBlocksAll$trophout1, p = 0.70, list = FALSE)
training  <- dfBlocksAll[index, ]
testing <- dfBlocksAll[-index, ]

# This step is the heart of the process
y <- dfBlocksAll$trophout1 # outcome variable - did they get a trophy or not?
predictors <- training[,which(colnames(training) != "trophout1")]

The only other code before the failing block that might be relevant is:

train_control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              # sampling = "down",
                              classProbs = TRUE, 
                              summaryFunction = twoClassSummary,
                              allowParallel = TRUE,
                              savePredictions = "final",
                              verboseIter = FALSE)

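Since my y is already a factor, I assume my error is related to x rather than y. As you can see from the code, my x is a data frame called "predictors". That data frame contains 768 obs. of 67 variables and is populated with characters and numbers.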

【Comments】:

  • You do realize that glmnet cannot handle NAs, right? Why not report the results of dim or summary on the objects predictors and y (or perhaps use `sapply(predictors, function(x){sum(is.na(x))})`)?
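
The checks suggested in this comment can be sketched as follows. This is only an illustration on a tiny made-up data frame (the column names `a`, `b`, `c` and all values are hypothetical, not the asker's data); it uses base R only.

```r
# Hypothetical toy data standing in for the asker's objects
predictors <- data.frame(
  a = c(1, NA, 3),          # numeric column with one NA
  b = c("x", "y", NA),      # character column with one NA
  c = 1:3,                  # complete integer column
  stringsAsFactors = FALSE
)
y <- factor(c("NoTrophy", "Trophy", "NoTrophy", "Trophy"))

# Compare the number of rows in x with the number of outcomes in y
dim(predictors)
length(y)

# Count missing values per predictor column
na_counts <- sapply(predictors, function(col) sum(is.na(col)))
na_counts
```

If the row count of `predictors` and the length of `y` differ, or any entry of `na_counts` is non-zero, that is worth resolving before calling `train()`.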

Tags: r machine-learning data-science r-caret


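【Solution 1】:

Your response variable has to come from training. In your code, y is taken from the full dfBlocksAll while predictors is taken from training, so y has more elements than predictors has rows. Here I use an example dataset: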

dfBlocksAll = data.frame(matrix(runif(1000),ncol=10))
dfBlocksAll$trophout1 = factor(sample(c("NoTrophy", "Trophy"),100,replace=TRUE))

index <- createDataPartition(y = dfBlocksAll$trophout1, p = 0.70, list = FALSE)
training  <- dfBlocksAll[index, ]
testing <- dfBlocksAll[-index, ]

This part should be changed:

y <- training$trophout1 
predictors <- training[,which(colnames(training) != "trophout1")]
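
To see why this change matters, the two versions of y can be compared against the predictor rows. The following is a minimal sketch on random stand-in data; it assumes a plain `sample()` split in place of `createDataPartition` so that it runs with base R alone, and the names `y_wrong` and `y_right` are introduced here only for illustration. The length of y is presumably what "n" in the error message refers to.

```r
# Hypothetical stand-in data; names mirror the question, values are random
set.seed(1)
dfBlocksAll <- data.frame(matrix(runif(1000), ncol = 10))
dfBlocksAll$trophout1 <- factor(sample(c("NoTrophy", "Trophy"), 100, replace = TRUE))

# Plain random 70% split (an assumed base R stand-in for createDataPartition)
index <- sample(seq_len(nrow(dfBlocksAll)), size = 70)
training <- dfBlocksAll[index, ]

# Predictors come from the training rows only
predictors <- training[, colnames(training) != "trophout1"]

y_wrong <- dfBlocksAll$trophout1  # taken from the full data: 100 values
y_right <- training$trophout1     # taken from the training rows: 70 values

nrow(predictors) == length(y_wrong)  # FALSE: row count and outcome length differ
nrow(predictors) == length(y_right)  # TRUE: one outcome per predictor row
```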

The rest runs fine:

elastic_net_model <- train(x = predictors, y = y,
                           method = "glmnet",
                           family = "binomial",
                           preProcess = c("scale"),
                           tuneLength = 10,
                           metric = "ROC",
                           trControl = train_control)

elastic_net_model
glmnet 

71 samples
10 predictors
 2 classes: 'NoTrophy', 'Trophy' 

Pre-processing: scaled (10) 
Resampling: Cross-Validated (10 fold, repeated 10 times) 
Summary of sample sizes: 65, 64, 64, 63, 64, 64, ... 
Resampling results across tuning parameters:

  alpha  lambda        ROC        Sens       Spec      
  0.1    0.0003090198  0.5620833  0.5908333  0.51666667
  0.1    0.0007138758  0.5620833  0.5908333  0.51666667
  0.1    0.0016491457  0.5614583  0.5908333  0.51083333
  0.1    0.0038097407  0.5594444  0.5933333  0.51083333
