是否有一种简洁的方法可以用 geom_quantile() 中的方程和其他统计数据来标记 ggplot 图？答案

【问题标题】：Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?是否有一种简洁的方法可以用 geom_quantile() 中的方程和其他统计数据来标记 ggplot 图？
【发布时间】：2021-04-18 01:56:34
【问题描述】：

我想包含来自geom_quantile() 拟合线的相关统计数据，其方式类似于我对geom_smooth(method="lm") 拟合线性回归（我之前使用过 ggpmisc这是awesome)。例如这段代码：

# quantile regression example with ggpmisc equation
# basic quantile code from here:
# https://ggplot2.tidyverse.org/reference/geom_quantile.html

library(tidyverse)
library(ggpmisc)
# see ggpmisc vignette for stat_poly_eq() code below:
# https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html#stat_poly_eq

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# linear ols regression with equation labelled
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

m + 
  geom_smooth(method = "lm", formula = my_formula) +
  stat_poly_eq(aes(label =  paste(stat(eq.label), "*\" with \"*", 
                                  stat(rr.label), "*\", \"*", 
                                  stat(f.value.label), "*\", and \"*",
                                  stat(p.value.label), "*\".\"",
                                  sep = "")),
               formula = my_formula, parse = TRUE, size = 3)

生成：

对于分位数回归，您可以将geom_smooth() 换成geom_quantile() 并绘制一条可爱的分位数回归线（在本例中为中位数）：

# quantile regression - no equation labelling
m + 
  geom_quantile(quantiles = 0.5)

您将如何将汇总统计信息发送到标签，或者在旅途中重新创建它们？（即，除了在调用 ggplot 之前进行回归，然后将其传递给然后进行注释（例如，类似于 here 或 here 的线性回归所做的事情？

【问题讨论】：

您想在绘图上打印哪些具体的“相关统计数据”？当然，最好拟合您自己的模型以获得重要的拟合统计数据，而不是回复绘图功能。这些其他功能基本上只是为您执行此操作的包装器（您最终会多次拟合模型，这有点浪费）。如果您真的想隐藏工作，您可以编写自己的 geom 来执行此操作，但这基本上就是需要发生的事情。
公平通话。我认为可能有一个使用来自 ggpmisc 的stat_fit_glance 的解决方案，尽管我只需要弄清楚它如何与多个分位数一起工作。 cran.r-project.org/web/packages/ggpmisc/vignettes/…
@MarkNeal 我提出了一个问题，提醒自己看看为什么stat_fit_glance() 在实现glance() 时不起作用。（感谢您让我知道您发现 'ggpmisc' 很有用！）
@MarkNeal 新的 'ggpmisc' 0.3.8 现在在 CRAN 中。我已经根据这个版本更新了我的答案。现在可以使用stat_fit_tidy() 添加符合qr() 的模型的方程和p-值。
@MarkNeal 'ggpmisc' 的下一个版本，版本 0.4.0 正在 GitHub 中形成。我添加了stat_quant_eq()。它试图为您的问题提供一个优雅的答案，包括对分位数向量的支持。如果你能找到一些时间，你能检查并提供反馈吗？您需要从 GitHub 包 'ggpp' 和 'gppmisc' 安装：remotes::install_github("aphalo/ggpp") 和 remotes::install_github("aphalo/ggpmisc", ref = "stat.quant.eq")。 ref 参数指向已实现新统计信息的分支。

标签： r ggplot2 label quantile-regression ggpmisc

【解决方案1】：

@mark-neal stat_fit_glance() 可以与 quantreg::rq() 一起使用。然而，使用stat_fit_glance() 更复杂。这个统计数据不“知道”对glance() 的期望，因此必须手动组装label。

人们需要知道有什么可用的。可以在 ggplot 之外运行拟合模型并使用glance() 来找出它返回的列，或者可以在 ggplot 中借助包“gginnards”来执行此操作。我将展示这个替代方案，从上面的代码示例继续。

library(gginnards)

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), geom = "debug")

geom_debug() 默认情况下只是将其输入打印到 R 控制台，它的输入就是统计信息返回的内容。

# A tibble: 1 x 11
   npcx  npcy   tau logLik    AIC    BIC df.residual     x      y PANEL group
  <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>       <int> <dbl>  <dbl> <fct> <int>
1    NA    NA   0.5   816. -1628. -1621.         232  1.87 0.0803 1        -1

我们可以使用after_stat() 访问每一列（早期的化身是stat() 并包含名称...。我们需要使用sprintf() 的编码符号进行格式化。如果在这种情况下我们组装一个需要解析成表达式的字符串，还需要parse = TRUE。

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), 
                  mapping = aes(label = sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
                                                after_stat(tau), after_stat(AIC), after_stat(BIC))),
                  parse = TRUE)

这个例子产生了下面的情节。

对于stat_fit_tidy()，同样的方法应该有效。然而，在 'ggpmisc' (= 0.3.8) 中修复，现在在 CRAN 中。

以下示例仅适用于 'ggpmisc' (>= 0.3.8)

剩下的问题是glance() 或tidy() 返回的tibble 是否包含想要添加到情节中的信息，至少在默认情况下tidy.qr() 似乎并非如此。但是，tidy.rq() 有一个参数se.type，它决定了tibble 中返回的值。修改后的 stat_fit_tidy() 接受要传递给 tidy() 的命名参数，从而使以下操作成为可能。

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_tidy(method = "rq",
                method.args = list(formula = y ~ x), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE)

这个例子产生了下面的情节。

定义一个新的统计数据stat_rq_eq() 会更简单：

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                ...)
}

答案变成：

m + 
  geom_quantile(quantiles = 0.5) +
  stat_rq_eqn(tau = 0.5)

【讨论】：

对于那些刚接触它的人，在stat_fit_tidy() 中添加tau=0.9 会为绘图提供第90 个分位数方程以匹配geom_quantile(quantiles = 0.9)，这将产生直线。如果没有定义 tau 值，则默认为 0.5（即“中值”回归）。
多个分位数方程的“联合”处理可以允许标签垂直“步进”，因此不需要指定手动 y 轴位置以避免重叠。我将考虑合并一些代码来打印P<0.01，这比四舍五入或以科学计数法提供一个可笑的小数字更整洁。无论如何，佩德罗让这项工作发挥作用！
@MarkNeal P 值的代码在stat_poly_eq() 的源代码中，可以从那里复制。鉴于我很可能也会使用它，我将尝试在下一个版本之前添加对多个 tau 值的支持。考虑到我自己将如何使用它，还有一个 stat_quantile_band() 可以绘制分位数，例如 stat_smooth() 绘制置信带。这会有用吗？尽管可以将 ...tau... 映射到颜色或线型，但这可能会干扰这些美学的其他用途。
@MarkNeal 我已经在“ggpmisc”的下一个版本中实现了统计stat_quant_eq()。它支持分位数 (= tau) 的向量。我使用quantiles 而不是 tau 来与stat_quantile() 保持一致。我做了一些测试，它似乎按预期工作。我仍然需要测试构面的使用。由于我还将“ggpmisc”拆分为两个包，因此可以使用以下命令从 GitHub 安装：remotes::install_github("aphalo/ggpp")，然后是 remotes::install_github("aphalo/ggpmisc")。如果可能，请尝试一下，如果它有问题或需要改进，请告诉我。
谢谢！在提交给 CRAN 之前，我会等待您的 cmets 并自己做一些额外的检查。

【解决方案2】：

请将此视为 Pedro 出色答案的附录，他在其中完成了大部分繁重的工作 - 这增加了一些演示调整（颜色和线型）和代码以简化多个分位数，生成以下图表：

library(tidyverse)
library(ggpmisc) #ensure version 0.3.8 or greater
library(quantreg)
library(generics)

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# base plot
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

# function for labelling
# Doesn't neatly handle P values (e.g return "P<0.001 where appropriate)

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, colour = "red", label.y = 0.9, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('italic(tau)~"="~%.3f~";"~y~"="~%.3g~+~%.3g~x*", with "~italic(P)~"="~%.3g',
                                              after_stat(x_tau),
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                colour = colour,
                label.y = label.y,
                ...)
}

# This works, though with double entry of plot specs
# custom colours and linetype
# https://stackoverflow.com/a/44383810/4927395
# https://stackoverflow.com/a/64518380/4927395


m + 
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
                )+
  scale_color_manual(values = c("red","purple","darkgreen"))+
  scale_linetype_manual(values = c("dotted", "dashed", "solid"))+
  stat_rq_eqn(tau = 0.1, colour = "red", label.y = 0.9)+
  stat_rq_eqn(tau = 0.5, colour = "purple", label.y = 0.95)+
  stat_rq_eqn(tau = 0.9, colour = "darkgreen", label.y = 1.0)+
  theme(legend.position = "none") # suppress legend


# not a good habit to have double entry above
# modified with reference to tibble for plot specs, 
# though still a stat_rq_eqn call for each quantile and manual vertical placement
# https://www.r-bloggers.com/2019/06/curly-curly-the-successor-of-bang-bang/

my_tau = c(0.1, 0.5, 0.9)
my_colours = c("red","purple","darkgreen")
my_linetype = c("dotted", "dashed", "solid")

quantile_plot_specs <- tibble(my_tau, my_colours, my_linetype)

m + 
  geom_quantile(quantiles = {{quantile_plot_specs$my_tau}}, 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
  )+
  scale_color_manual(values = {{quantile_plot_specs$my_colours}})+
  scale_linetype_manual(values = {{quantile_plot_specs$my_linetype}})+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[1]}}, colour = {{quantile_plot_specs$my_colours[1]}}, label.y = 0.9)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[2]}}, colour = {{quantile_plot_specs$my_colours[2]}}, label.y = 0.95)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[3]}}, colour = {{quantile_plot_specs$my_colours[3]}}, label.y = 1.0)+
  theme(legend.position = "none")

【讨论】：