How to pass a large number of models to gather_predictions

【Question title】: How to pass a large amount of models to gather_predictions
【Posted】: 2016-12-16 15:07:20
【Question】:

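In the modelr package, the function gather_predictions can be used to add predictions from multiple models to a data frame, but I am not sure how to specify those models in the function call. The help documentation gives the following example: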

df <- tibble::data_frame(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)

m1 <- lm(y ~ x, data = df)
grid <- data.frame(x = seq(0, 1, length = 10))
grid %>% add_predictions(m1)

m2 <- lm(y ~ poly(x, 2), data = df)
grid %>% spread_predictions(m1, m2)
grid %>% gather_predictions(m1, m2)

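Here the models are named explicitly in the function call. That works well when we have a few models we want predictions for, but what if we have a large or unknown number of models? In that case, specifying the models by hand is no longer practical.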

The way the help documentation phrases the arguments section seems to suggest that each model has to be added as a separate argument:

gather_predictions and spread_predictions take multiple models. The name will be taken from either the argument name or the name of the model.

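For example, passing a list of models to gather_predictions does not work.

Is there some easy way to feed a list, or a large number of models, to gather_predictions?

Example with 10 models in a list: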

modelslist <- list()
for (N in 1:10) {
  modelslist[[N]] <- lm(y ~ poly(x, N), data = df)
}

If storing the models in some way other than a list works better, that is fine too.

【Question discussion】:

Tags: r tidyverse

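【Solution 1】:

# assumes modelr is attached and that df and grid from the question exist
# fit each model inside the call and stack the predictions row-wise
m <- grid %>% gather_predictions(lm(y ~ poly(x, 1), data = df))
for (N in 2:10) {
  m <- rbind(m, grid %>% gather_predictions(lm(y ~ poly(x, N), data = df)))
}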

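【Discussion】:

Works for the example, but it does mean you have to create the models as you go and you do not store them. That is fine for linear models, which are quick to train, but for more complex models it may not be ideal.
@MarijnStevering Oh yes, I figured it only stores the results, but I did it that way to keep it concise. To store the models as well, you only need to insert one more line in the loop, the same line as in your question. That said, you cannot use the list data type this way, so you may end up having to fit the models twice, or at least convert them from a list to another type before using them with gather_predictions.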
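For models that are already stored in a list, one possible way to avoid refitting is base R's do.call(), which passes the elements of a list as separate arguments. This is only a sketch, not a documented modelr idiom. It assumes the list is named so that gather_predictions can use the names as model labels (the labels m1 to m5 are made up here), and it builds each formula from a string so that every model carries its own literal degree.

```r
library(modelr)

# Sample data in the spirit of the question (seed chosen arbitrarily)
set.seed(1)
df <- data.frame(x = sort(runif(100)))
df$y <- 5 * df$x + 0.5 * df$x^2 + 3 + rnorm(nrow(df))
grid <- data.frame(x = seq(0, 1, length.out = 10))

# Fit polynomial models of degree 1 to 5 once and keep them in a list
degrees <- 1:5
modelslist <- lapply(degrees, function(d) {
  lm(as.formula(paste0("y ~ poly(x, ", d, ")")), data = df)
})
names(modelslist) <- paste0("m", degrees)  # hypothetical labels

# do.call() spreads the named list over the `...` argument, as if we had
# typed gather_predictions(data = grid, m1 = ..., m2 = ..., and so on)
preds <- do.call(gather_predictions, c(list(data = grid), modelslist))
head(preds)
```

The stored models stay available in modelslist afterwards, so nothing has to be fitted twice.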

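【Solution 2】:

There are a few workarounds for this problem. My approach is:
1. Build a list of models with specific names
2. Use a tweaked version of modelr::gather_predictions() to apply all models in the list to the data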

# prerequisites
library(tidyverse)
set.seed(1363)    

# I'll use the generic name 'data' throughout the code, so you can easily try other datasets.
# For this example I'll use your data frame df
data <- df

# data visualization
ggplot(data, aes(x, y)) + 
        geom_point(size=3)

[Figure: scatter plot of your sample data]

# build a list of models
models <- vector("list", length = 5)
model_names <- vector("character", length = 5)
for (i in 1:5) {
        # build the formula from a string so the degree is stored as a literal
        modelformula <- str_c("y ~ poly(x,", i, ")", sep = "")
        models[[i]] <- lm(as.formula(modelformula), data = data)
        model_names[[i]] <- str_c("model", i) # remember we name the models here sequentially
}

# apply names to the models list
names(models) <- model_names

# this is a modified version of modelr::gather_predictions() that accepts a list of models
gather.predictions <- function (data, models, .pred = "pred", .model = "model") 
{
        df <- map2(models, .pred, modelr::add_predictions, data = data)
        names(df) <- names(models)
        bind_rows(df, .id = .model)
}

# the rest is the same as modelr's function...
grids <- gather.predictions(data = data, models = models, .pred = "y")

ggplot(data, aes(x, y)) + 
        geom_point() +
        geom_line(data = grids, colour = "red") +
        facet_wrap(~ model)

[Figure: polynomial models (degree 1 to 5) applied to your sample data]

Side note: I have good reasons for choosing strings to build the models... happy to discuss.

【Discussion】:
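To make it clearer what the list-based helper does, here is a rough base R sketch of the same idea: predict with each named model, label the rows with the model name, and stack the pieces. The function name gather_list_predictions and the sample data are made up for illustration; this is not part of modelr.

```r
# Sample data in the spirit of the question (seed chosen arbitrarily)
set.seed(1)
df <- data.frame(x = sort(runif(100)))
df$y <- 5 * df$x + 0.5 * df$x^2 + 3 + rnorm(nrow(df))
grid <- data.frame(x = seq(0, 1, length.out = 10))

# Named list of polynomial models, formulas built from strings
models <- lapply(1:5, function(d) {
  lm(as.formula(paste0("y ~ poly(x, ", d, ")")), data = df)
})
names(models) <- paste0("model", 1:5)

# Hypothetical helper: one block of rows per model, stacked into long format
gather_list_predictions <- function(data, models, .pred = "pred", .model = "model") {
  pieces <- lapply(names(models), function(nm) {
    out <- data
    out[[.pred]] <- as.numeric(predict(models[[nm]], newdata = data))
    out[[.model]] <- nm
    # put the model label first, as gather_predictions does
    out[c(.model, setdiff(names(out), .model))]
  })
  do.call(rbind, pieces)
}

long <- gather_list_predictions(grid, models)
head(long)
```

Because the helper takes the list as a single argument, it works the same way for 5 models or 500.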