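【Posted】: 2019-05-17 12:23:55
【Problem description】:
I am using the predict() function to predict the Purchase variable in blackFriday_test. When I pass the predictions and the actual values to cor(), I get an "incompatible dimensions" error.
I checked the length of the Purchase variable in blackFriday_test, which is 107516, but the prediction result contains only 32955 values.
The data was downloaded from https://www.kaggle.com/mehdidag/black-friday.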
```r
library(caret)

blackFriday <- read.csv("BlackFriday.csv", stringsAsFactors = TRUE)

# Drop the first two features because they are identifiers
nblackFriday <- blackFriday[, 3:12]

set.seed(189)
train <- sample(nrow(nblackFriday), as.integer(0.8 * nrow(nblackFriday)), replace = FALSE)
blackFriday_train <- nblackFriday[train, ]
blackFriday_test <- nblackFriday[-train, ]

# Replace the NAs in the two variables that contain them with the column mean
nblackFriday$Product_Category_2 <- ifelse(is.na(nblackFriday$Product_Category_2), mean(nblackFriday$Product_Category_2, na.rm = TRUE), nblackFriday$Product_Category_2)
nblackFriday$Product_Category_3 <- ifelse(is.na(nblackFriday$Product_Category_3), mean(nblackFriday$Product_Category_3, na.rm = TRUE), nblackFriday$Product_Category_3)
blackFriday_train$Product_Category_2 <- nblackFriday$Product_Category_2[train]
blackFriday_train$Product_Category_3 <- nblackFriday$Product_Category_3[train]

m <- train(Purchase ~ ., data = blackFriday_train, method = "rpart")
p <- predict(m, blackFriday_test)
cor(p, blackFriday_test$Purchase)
# This is where I get the error
```
I expect the number of predicted values to be the same as the number of rows in blackFriday_test, but it is not.
【Discussion】:
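One plausible cause, offered as a guess rather than a confirmed diagnosis: in the question's code the imputed columns are copied into blackFriday_train only, so blackFriday_test still contains the original NAs in Product_Category_2 and Product_Category_3. As far as I can tell, caret's predict() for train objects defaults to na.action = na.omit, which would silently drop those rows and return fewer predictions than the test set has rows. The sketch below does not use caret or the real CSV. It builds a small synthetic data frame (the name `toy`, its values, and the helper `impute_mean` are all made up for illustration) and shows how the row counts differ depending on whether the split happens before or after imputation.

```r
# Synthetic stand-in for the Black Friday data (hypothetical values only).
set.seed(189)
n <- 1000
toy <- data.frame(
  Product_Category_1 = sample(1:18, n, replace = TRUE),
  Product_Category_2 = sample(2:18, n, replace = TRUE),
  Product_Category_3 = sample(3:18, n, replace = TRUE),
  Purchase           = round(runif(n, 200, 24000))
)
# Knock out a share of values so the two columns contain NAs.
toy$Product_Category_2[runif(n) < 0.3] <- NA
toy$Product_Category_3[runif(n) < 0.6] <- NA

# Same order as in the question: split first, impute afterwards.
idx <- sample(nrow(toy), as.integer(0.8 * nrow(toy)))
toy_test_before <- toy[-idx, ]

# Rows that na.omit would drop from the un-imputed test set.
n_incomplete <- sum(!complete.cases(toy_test_before))
n_kept <- nrow(na.omit(toy_test_before))

# Alternative order: impute on the full data, then split with the same index.
impute_mean <- function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)
toy_imputed <- toy
toy_imputed$Product_Category_2 <- impute_mean(toy_imputed$Product_Category_2)
toy_imputed$Product_Category_3 <- impute_mean(toy_imputed$Product_Category_3)
toy_test_after <- toy_imputed[-idx, ]
n_incomplete_after <- sum(!complete.cases(toy_test_after))

cat("test rows:                           ", nrow(toy_test_before), "\n")
cat("rows with NA (split before imputing):", n_incomplete, "\n")
cat("rows surviving na.omit:              ", n_kept, "\n")
cat("rows with NA (imputed before split): ", n_incomplete_after, "\n")
```

If the same thing is happening with the real data, `sum(complete.cases(blackFriday_test))` should come out as 32955. In that case, moving the two imputation lines above the `sample()` call, or also assigning the imputed columns to blackFriday_test with `[-train]`, should make `length(p)` match `nrow(blackFriday_test)`. A stricter variant would compute the means from the training rows only, so that no information from the test set leaks into the imputation.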