【Question title】: Caret Predict Target Variable nrow() is Null
【Posted】: 2020-09-16 21:36:37
【Question】:

df:

library(caret)

a = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
b = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
c = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
d = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
e = c(1, 0, 1, 0, 0, 0, 1, 1, 1, 1)

#df1
df1 = data.frame(a,b,c,d,e)
#df2
df2 = data.frame(a,b,c,d,e)

Caret logistic regression model:

df1$e <- as.factor(df1$e)
df2$e <- as.factor(df2$e)

# define training control
train_control <- trainControl(method = "cv", number = 5)

# train the model on training set
model <- train(e ~ .,
               data = df1,
               trControl = train_control,
               method = "glm",
               family=binomial())

# logistic <- glm(WonLost ~ . -PANum, data=train, family="binomial")
df2$predict <- caret::predict.train(model, newdata=df2,type = "prob")


nrow(df2$predict)
nrow(df2$e)

Why is nrow(df2$e) NULL? I converted the target variable to a factor because of an error I hit earlier, but that seems to have caused my current problem.

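Warning message: 1: In train.default(x, y, weights = w, ...) : You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.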

【Comments】:

    Tags: r r-caret


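    【Solution 1】:

    caret can be picky about the outcome variable: with a numeric 0/1 target, train() cannot tell whether your glm logit model is meant as regression or classification. One suggestion I have learned is to recode the target variable as Yes/No. Also, note that with type = "prob" the caret predictions are stored in df2 as a nested data frame, which is why nrow() works on df2$predict. df2$e, on the other hand, is just a vector with no dim attribute, so nrow() returns NULL and you have to use length() or NROW() instead. Here is the code: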

    library(caret)
    #Vectors
    a = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
    b = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
    c = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
    d = c("aa", "bb", "cc", "aa", "aa", "aa", "bb", "cc", "bb", "bb") 
    e = c(1, 0, 1, 0, 0, 0, 1, 1, 1, 1)
    
    #df1
    df1 = data.frame(a,b,c,d,e)
    #df2
    df2 = data.frame(a,b,c,d,e)
    #Recode the target as a two-level factor (No/Yes)
    df1$e <- factor(ifelse(df1$e == 1, "Yes", "No"), levels = c("No", "Yes"))
    df2$e <- factor(ifelse(df2$e == 1, "Yes", "No"), levels = c("No", "Yes"))
    
    # define training control
    train_control <- trainControl(method = "cv", number = 5)
    
    # train the model on training set
    model <- train(e ~ .,
                   data = df1,
                   trControl = train_control,
                   method = "glm",
                   family=binomial())
    
    #Predict
    df2$predict <- caret::predict.train(model, newdata=df2,type = "prob")
    #Checks
    nrow(df2$predict)
    NROW(df2$e)
    length(df2$e)
    

    Output:

    df2
        a  b  c  d   e   predict.No predict.Yes
    1  aa aa aa aa Yes 7.500000e-01        0.25
    2  bb bb bb bb  No 2.500000e-01        0.75
    3  cc cc cc cc Yes 8.646869e-09        1.00
    4  aa aa aa aa  No 7.500000e-01        0.25
    5  aa aa aa aa  No 7.500000e-01        0.25
    6  aa aa aa aa  No 7.500000e-01        0.25
    7  bb bb bb bb Yes 2.500000e-01        0.75
    8  cc cc cc cc Yes 8.646869e-09        1.00
    9  bb bb bb bb Yes 2.500000e-01        0.75
    10 bb bb bb bb Yes 2.500000e-01        0.75
    
    nrow(df2$predict)
    [1] 10
    NROW(df2$e)
    [1] 10
    length(df2$e)
    [1] 10
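
    The difference between a vector column and a nested data frame column can be checked in base R without caret. The following is only a minimal sketch: the data frame `df`, the `probs` values, and the `flat` alternative are made up for illustration and are not part of the original post.

```r
# Base R only; all values below are hypothetical and chosen for illustration.
e <- factor(c("Yes", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"))
df <- data.frame(e = e)

# A column extracted with $ is a plain vector (here a factor) with no dim attribute
dim(df$e)       # NULL
nrow(df$e)      # NULL, because nrow(x) is dim(x)[1L]
NROW(df$e)      # 10, NROW() treats a vector as a one-column matrix
length(df$e)    # 10

# Assigning a whole data frame to one column nests it inside df,
# which mirrors what happens with type = "prob" predictions
probs <- data.frame(No = rep(0.25, 10), Yes = rep(0.75, 10))  # made-up probabilities
df$predict <- probs
nrow(df$predict)  # 10, because df$predict is itself a data frame
ncol(df)          # 2: "e" and the nested "predict"

# Optional alternative: bind the probabilities as ordinary top-level columns
flat <- cbind(df["e"], probs)
names(flat)       # "e" "No" "Yes"
```

    If the nested column is inconvenient downstream, binding the probabilities with cbind() as in the last step keeps every column at the top level, so nrow(), ncol() and names() behave as usual.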
    

    【Comments】:
