Using broom and tidyverse to summarise R-squared values of GAMs: answers

【Question title】: Using broom and tidyverse to summarise r squared gams
【Posted】: 2018-02-07 08:29:27
【Question】:

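I posted a question here and was able to reproduce Claus' answer, which uses the tidyverse to compute a separate R-squared value for an additive model fitted to each species in the iris data. However, since a package update the R-sq values are no longer computed, and I'm not sure why...
Here is Claus' answer and its output: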

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

#      Species  R.square       df    logLik      AIC      BIC deviance df.residual
# 1     setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460    47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909    47.43638
# 3  virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793    47.72143

But my code produces the following output, where each R.square value is shown as <dbl [1]>:

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

     Species  R.square       df    logLik      AIC      BIC deviance
      <fctr>    <list>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>
1     setosa <dbl [1]> 2.396547 -1.973593 10.74028 17.23456 3.167966
2 versicolor <dbl [1]> 2.317501 -4.021222 14.67745 21.02058 3.438361
3  virginica <dbl [1]> 2.278569 -7.895997 22.34913 28.61783 4.014793

Can anyone offer some insight into why?

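【Comments】:

I was able to get the first output. What are your package versions? I have broom_0.4.3, dplyr_0.7.4, purrr_0.2.4
FWIW, I get the second output, but sessionInfo() says ...broom_0.4.3, dplyr_0.7.4, purrr_0.2.4???? and also mgcv_1.8-23
I think it's the mgcv version. If I simplify to mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = iris) %>% glance, my result has no R-squared. Since @akrun and I have the same broom, maybe the gam model object is structured differently?
I have broom_0.4.3, dplyr_0.7.4, purrr_0.2.4 and mgcv_1.8-23
@akrun Which version of mgcv are you running?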
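In both outputs above, the columns coming from glance() are df, logLik, AIC, BIC, deviance and df.residual; the R.square column is added separately by the summary(.)$r.sq step inside mutate(). A minimal sketch, assuming only mgcv (a recommended package shipped with R) and the built-in iris data, of reading that value straight from summary.gam() for a single pooled model. The names `fit` and `r_sq` are illustrative:

```r
library(mgcv)  # provides gam() and summary.gam()

# One pooled model on all of iris, as in the simplified call in the comments
fit <- gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = iris)

# summary.gam() stores the adjusted R-squared in its r.sq element;
# it is a single double, computed independently of broom::glance()
r_sq <- summary(fit)$r.sq
print(r_sq)
```

Because this value is always one number per model, it can be collected into an ordinary double column rather than a list column.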

Tags: r tidyverse gam mgcv

【Answer 1】:

My sessionInfo is the same as the OP's (see the comments above). I was able to work around this by using map_dbl to force R-squared to be a double. I'm not entirely sure why it works for akrun...?

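library(tidyverse)
library(broom)

iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         # map_dbl() returns a double vector, so R.square is a plain <dbl> column, not a list
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)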

# A tibble: 3 x 8
  Species    R.square    df logLik   AIC   BIC deviance df.residual
  <fct>         <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <dbl>
1 setosa        0.536  2.55  -1.92  10.9  17.7     3.16        47.5
2 versicolor    0.268  2.56  -3.88  14.9  21.7     3.42        47.4
3 virginica     0.191  2.28  -7.90  22.3  28.6     4.01        47.7
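The difference between map() and map_dbl() can be reproduced without purrr. A minimal sketch, assuming only base R and mgcv: lapply() stands in for map() and vapply(..., numeric(1)) stands in for map_dbl(). This is a base-R analogue, not the purrr code itself, and `by_species`, `fits`, `r_sq_list` and `r_sq_dbl` are illustrative names:

```r
library(mgcv)  # provides gam() and summary.gam()

# Fit one GAM per species (base-R stand-in for nest() + map())
by_species <- split(iris, iris$Species)
fits <- lapply(by_species, function(d)
  gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = d))

# lapply(), like purrr::map(), always returns a list: each element is a
# length-1 double, which a tibble column prints as <dbl [1]>
r_sq_list <- lapply(fits, function(f) summary(f)$r.sq)
print(class(r_sq_list))

# vapply(..., numeric(1)), like purrr::map_dbl(), requires each result to be
# a single number and simplifies to an atomic double vector named by species
r_sq_dbl <- vapply(fits, function(f) summary(f)$r.sq, numeric(1))
print(r_sq_dbl)

# An ordinary double column, matching the shape of R.square in the output above
print(data.frame(Species = names(r_sq_dbl), R.square = unname(r_sq_dbl)))
```

unlist(r_sq_list) would also flatten the list after the fact, but the vapply()/map_dbl() form states the expected type up front and raises an error if any model returns something other than a single number.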

【Comments】:

I think map_dbl is the right approach. I don't know why plain map works for me; it really shouldn't. I have dplyr_0.7.4 and purrr_0.2.4.