【Posted】: 2021-08-08 12:23:43
【Question】:
I created a training and test set and tested my model. My workflow is as follows:
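library(tidymodels) ## rsample, parsnip, workflows, tune, dials, yardstick
library(doParallel) ## registerDoParallel()
# Test/train
set.seed(2402) ## Fix the RNG seed so the random split is reproducible
splits <- initial_split(Data, prop = 0.7) ## 70% will be training data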
# Create a train and test set
Data_train <- training(splits)
Data_test <- testing(splits)
# Specify the model (mtry and min_n are tuned below)
rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 200) %>%
  set_mode("regression") %>%
  set_engine("ranger", importance = "permutation")
# Create a workflow
# rf_mod_recipe is the recipe object (its definition is not shown in this post)
rf_mod_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(rf_mod_recipe)
rf_mod_workflow
# State our error metrics
class_metrics <- metric_set(rmse, mae)
# Speed up computation with registerDoParallel()
registerDoParallel()
rf_grid <- grid_regular(
  mtry(range = c(5, 15)),
  min_n(range = c(10, 200)),
  levels = 5
)
rf_grid
set.seed(654321)
# cv_folds is the resampling object (its definition is not shown in this post)
rf_tune_res <- tune_grid(
  rf_mod_workflow,
  resamples = cv_folds,
  grid = rf_grid,
  metrics = class_metrics
)
# Select the mtry / min_n combination with the lowest RMSE
best_rmse <- select_best(rf_tune_res, metric = "rmse")
# Finalise the workflow with those values
rf_final_wf <- finalize_workflow(rf_mod_workflow, best_rmse)
rf_final_wf
# Fit on the training set and evaluate on the test set
set.seed(56789)
rf_final_fit <- rf_final_wf %>%
  last_fit(splits, metrics = class_metrics)
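However, I now want to use the model I built to predict on a new dataset. The problem is that this new dataset contains NA values. Is it still possible to predict on a dataset with NA values, or does random forest not allow this? I did something similar with linear regression, where I ignored the NA values and only predicted for the instances without any NAs.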
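For reference, the linear-regression approach I mean looks roughly like the sketch below. It is only an illustration on the built-in `mtcars` data, not my actual data or model. The names `fit`, `new_data`, `ok`, and `preds` are made up for this example.

```r
# Illustrative only: built-in mtcars data stands in for the real dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical "new" data with some missing predictor values
new_data <- mtcars[1:6, c("wt", "hp")]
new_data$wt[2] <- NA
new_data$hp[5] <- NA

# TRUE for rows where every predictor is present
ok <- complete.cases(new_data)

# Predict only for the complete rows; rows with NAs keep an NA prediction
preds <- rep(NA_real_, nrow(new_data))
preds[ok] <- predict(fit, newdata = new_data[ok, ])
preds
```

The result has one entry per row of `new_data`, so predictions stay aligned with the original rows and the incomplete rows are simply left as NA.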
【Comments】:
Tags: r random-forest prediction