Using predict() across multiple models to generate confidence intervals in R

【Question title】: Using predict() across multiple models to generate confidence intervals in R
【Posted】: 2021-03-16 22:23:23
【Question description】:

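My goal is to create several models from a single data frame, and then generate confidence intervals around the fitted values for each of those models.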

Load the libraries:

library(purrr)
library(dplyr)
library(modelr)

Assign R's built-in DNase dataset to data_1:

data_1 <- DNase

Fit a separate model for each run:

model_dna <- data_1 %>% group_by(Run) %>% 
  do(model_dna = lm(conc ~ density, data = .)) %>% ungroup()

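Next, I want to predict fitted values and 95% confidence intervals for the same data, but separately for each model. With interval = "confidence", the resulting table should contain a "fit" column of fitted values, plus "upr" and "lwr" columns giving the confidence bounds around them. I tried the following, because spread_predictions has previously helped me spread fitted values across multiple sets of data: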

data_2 <- map(model_dna$model_dna, ~ spread_predictions(data = data_1, models = .x, interval = "confidence"))

However, this produces the following error:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "character"

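Does anyone know the best way to produce these numbers? Do I have to change how this function handles the data? Or is there a better function to use, such as calling predict() directly, which definitely takes interval as an argument (http://www.sthda.com/english/articles/40-regression-analysis/166-predict-in-r-model-predictions-and-confidence-intervals/)?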

【Comments】:

The post below answers my question well: stackoverflow.com/questions/65042240/…

Tags: r model purrr predict modelr

【Solution 1】:

We can use invoke, passing "data_1" as data and the models as a list (the model_dna column of model_dna is a list):

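library(purrr)
library(modelr)  # provides spread_predictions()
# The list of models is spliced into spread_predictions() as its `...` arguments
out <- invoke(spread_predictions, data = data_1, model_dna$model_dna)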

Output:

head(out)
#Grouped Data: density ~ conc | Run
#  Run       conc density       <lm>
#1   1 0.04882812   0.017 -1.5759940
#2   1 0.04882812   0.018 -1.5693389
#3   1 0.19531250   0.121 -0.8838652
#4   1 0.19531250   0.124 -0.8639000
#5   1 0.39062500   0.206 -0.3181831
#6   1 0.39062500   0.215 -0.2582873

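If the goal is to get confidence intervals from the models themselves (here, for the coefficient estimates rather than the fitted values):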

library(broom)
library(tidyr)
data_1 %>%
   nest_by(Run) %>% 
   mutate(model_dna = list(lm(conc ~ density, data = data) %>%
   tidy(., conf.int = TRUE))) %>%
   select(Run, model_dna) %>%
   ungroup %>% 
   unnest(c(model_dna))
# A tibble: 22 x 8
#   Run   term        estimate std.error statistic     p.value conf.low conf.high
#   <ord> <chr>          <dbl>     <dbl>     <dbl>       <dbl>    <dbl>     <dbl>
# 1 10    (Intercept)    -1.69     0.670     -2.52 0.0245         -3.13   -0.251 
# 2 10    density         6.66     0.733      9.08 0.000000306     5.08    8.23  
# 3 11    (Intercept)    -1.58     0.629     -2.51 0.0249         -2.93   -0.231 
# 4 11    density         6.60     0.690      9.57 0.000000161     5.12    8.08  
# 5 9     (Intercept)    -1.47     0.646     -2.28 0.0388         -2.86   -0.0876
# 6 9     density         6.49     0.708      9.16 0.000000272     4.97    8.01  
# 7 1     (Intercept)    -1.30     0.588     -2.21 0.0440         -2.56   -0.0402
# 8 1     density         6.51     0.659      9.88 0.000000108     5.10    7.92  
# 9 4     (Intercept)    -1.22     0.583     -2.09 0.0550         -2.47    0.0300
#10 4     density         6.36     0.645      9.86 0.000000111     4.97    7.74  
# … with 12 more rows
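
For confidence intervals around the fitted values rather than the coefficients, one possible approach is to call predict() on each model directly. The sketch below is not part of the original answer. It uses base R only, and it assumes that each run's model should predict on that run's own rows; the names by_run and ci_list are made up for illustration.

```r
# Base R only; DNase ships with the datasets package.
# as.data.frame() drops the groupedData classes so it behaves as a plain data frame.
data_1 <- as.data.frame(DNase)

# One data frame per run
by_run <- split(data_1, data_1$Run)

# Fit one model per run, then predict on that run's own rows (an assumption)
ci_list <- lapply(by_run, function(d) {
  fit <- lm(conc ~ density, data = d)
  # interval = "confidence" makes predict() return a matrix with
  # columns fit, lwr and upr; level = 0.95 is the default, stated for clarity
  ci <- predict(fit, newdata = d, interval = "confidence", level = 0.95)
  cbind(d, ci)
})

# Stack the per-run results back into a single data frame
data_2 <- do.call(rbind, ci_list)
rownames(data_2) <- NULL
head(data_2)
```

If every model should instead be evaluated on the full data set, passing newdata = data_1 inside the function would do that, at the cost of one block of 176 rows per run.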

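【Discussion】:

The problem is that, although this works very well for generating fitted values, it does not seem to work when interval = 'confidence' is added inside invoke().
@Cameron You can check the usage of that function, spread_predictions(data, ..., type = NULL); there is no 'interval' argument.
OK, @akrun, that makes sense. It told me to refer to the predict() documentation for all the options, so I assumed spread_predictions passed its arguments on to predict(). Do you know how I could use predict() with interval = 'confidence' to extract these data?
@Cameron Your comment under the post suggests you are looking for something similar to the post in the update.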
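
The point made in these comments can be checked directly. The sketch below is an illustration, not something from the thread. It assumes the error in the question arises because interval = "confidence" is swallowed by `...` and treated as one more model, which is consistent with the usage quoted above. The modelr check only runs if that package is installed.

```r
# Handing predict() a string instead of a model reproduces the error text
msg <- tryCatch(
  predict("confidence"),
  error = function(e) conditionMessage(e)
)
print(msg)

# The lm method of predict() does accept an interval argument
lm_args <- names(formals(getS3method("predict", "lm")))
print("interval" %in% lm_args)

# Optional: list the arguments of spread_predictions() when modelr is available
if (requireNamespace("modelr", quietly = TRUE)) {
  print(names(formals(modelr::spread_predictions)))
}
```

Since the lm method accepts interval while spread_predictions() does not list it, calling predict() on each model is the more direct route to the "lwr" and "upr" columns.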