【Posted】: 2021-06-10 13:11:31
【Question】:
I started wondering about this while reading Hadley Wickham's *R for Data Science*. In the "Data transformation" chapter, the author uses this example:
library(nycflights13)
library(tidyverse)
by_dest <- group_by(flights, dest)
delay <- summarise(by_dest,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)
#> `summarise()` ungrouping output (override with `.groups` argument)
delay <- filter(delay, count > 20, dest != "HNL")
# It looks like delays increase with distance up to ~750 miles
# and then decrease. Maybe as flights get longer there's more
# ability to make up delays in the air?
ggplot(data = delay, mapping = aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +
  geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
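The resulting plot looks like this: [figure: relationship between delay and flight distance]
So I'm wondering: can we use R to find the extrema of this curve and the corresponding x values?
Starting from this answer, I found that `stat_poly_eq()` from ggpmisc can compute the polynomial regression equation: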
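One direct route, sketched under the assumption that the curve is exactly what the message above reports (`stats::loess()` with formula `y ~ x`, unweighted, default `span = 0.75`): refit that loess model, scan its predictions on a fine grid over the observed distances, then refine the best grid point with `optimize()`. The grid size (1000) and the refinement window (the two grid cells around the best point) are my own choices, and `delay` is rebuilt with dplyr rather than the full tidyverse.

```r
library(nycflights13)
library(dplyr)

# Rebuild the per-destination summary from the book example
delay <- flights %>%
  group_by(dest) %>%
  summarise(
    count = n(),
    dist  = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(count > 20, dest != "HNL")

# Refit the smoother geom_smooth() reported: loess, y ~ x, default span = 0.75
fit <- loess(delay ~ dist, data = delay)

# Fitted delay at a single distance x
f <- function(x) predict(fit, newdata = data.frame(dist = x))

# 1) Coarse search: evaluate the curve on a fine grid over the observed range
grid <- data.frame(dist = seq(min(delay$dist), max(delay$dist), length.out = 1000))
grid$fit <- predict(fit, newdata = grid)
i <- which.max(grid$fit)

# 2) Refine: run optimize() only within the grid cells around the best point
lo <- grid$dist[max(i - 1, 1)]
hi <- grid$dist[min(i + 1, nrow(grid))]
peak <- optimize(f, interval = c(lo, hi), maximum = TRUE)

peak$maximum    # distance (x) at the highest point of the fitted curve
peak$objective  # fitted mean arrival delay at that distance
```

The grid step matters because `optimize()` on its own only guarantees a local optimum; for the minimum, swap `which.max()` for `which.min()` and drop `maximum = TRUE`.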
library(ggpmisc)
formula <- y ~ poly(x, 3, raw = TRUE)
p <- ggplot(data = delay, mapping = aes(x = dist, y = delay))
p <- p + geom_point(aes(size = count), alpha = 1/3)
p <- p + geom_smooth(method = "lm", formula = formula, se = FALSE)
(p1 <- p + stat_poly_eq(formula = formula,
                        aes(label = after_stat(paste(eq.label, rr.label, sep = "~~~"))),
                        parse = TRUE))
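I could then read off that equation and work out the extrema of the curve by hand, but that doesn't seem very clever. So how can I find the extrema of my regression curve and the corresponding x values? (I mean the maximum and minimum of the fitted regression curve, not the maximum and minimum of the raw data.)
【Discussion】: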
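For the cubic fit specifically, the equation never has to be copied by hand: a minimal sketch, assuming the same `y ~ poly(x, 3, raw = TRUE)` model is refitted with `lm()`, takes its coefficients, finds the roots of the derivative with `polyroot()`, keeps the real roots inside the observed distance range, and labels each one with the sign of the second derivative. The imaginary-part tolerance used to decide which roots count as real is an assumption.

```r
library(nycflights13)
library(dplyr)

# Rebuild the per-destination summary from the book example
delay <- flights %>%
  group_by(dest) %>%
  summarise(
    count = n(),
    dist  = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(count > 20, dest != "HNL")

# Same cubic as in the stat_poly_eq() formula, fitted on raw powers of dist
fit3 <- lm(delay ~ poly(dist, 3, raw = TRUE), data = delay)
b <- unname(coef(fit3))  # curve: b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3

# Critical points: roots of the derivative b[2] + 2*b[3]*x + 3*b[4]*x^2
# (polyroot() takes coefficients in increasing order of power)
roots <- polyroot(c(b[2], 2 * b[3], 3 * b[4]))

# Keep numerically real roots that lie inside the observed range of dist
is_real <- abs(Im(roots)) < 1e-6 * pmax(1, abs(Re(roots)))
x_crit <- Re(roots)[is_real]
x_crit <- x_crit[x_crit >= min(delay$dist) & x_crit <= max(delay$dist)]

# Classify with the second derivative 2*b[3] + 6*b[4]*x: negative => maximum
crit <- data.frame(
  dist  = x_crit,
  delay = predict(fit3, newdata = data.frame(dist = x_crit)),
  type  = ifelse(2 * b[3] + 6 * b[4] * x_crit < 0, "maximum", "minimum")
)
crit
```

`raw = TRUE` is what makes this work: with orthogonal polynomials (the `poly()` default) the coefficients would not be on the original distance scale.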
-
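No time to write a full answer right now, but you can get there by running `ggplot_build()` on your ggplot object (assuming you've saved it to a variable), extracting the `$data` component, working out which layer corresponds to the data generated by `geom_smooth()`, and then looking up the maximum of the `y` variable in that data frame. (This will only be the largest y value among the smoothed prediction points, not the true maximum of the curve...)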
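The comment's approach can be sketched as follows, assuming the smoother is the second layer of the plot. `layer_data(p, 2)` is ggplot2's shortcut for `ggplot_build(p)$data[[2]]`; the resulting `x`/`y` columns are in data units here because neither axis is transformed.

```r
library(nycflights13)
library(dplyr)
library(ggplot2)

# Rebuild the per-destination summary from the book example
delay <- flights %>%
  group_by(dest) %>%
  summarise(
    count = n(),
    dist  = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(count > 20, dest != "HNL")

p <- ggplot(delay, aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +  # layer 1
  geom_smooth(se = FALSE)                       # layer 2

# Computed data of the smoother layer (same as ggplot_build(p)$data[[2]])
smooth <- layer_data(p, 2)

# Highest and lowest points among the smoother's evaluation points
smooth[which.max(smooth$y), c("x", "y")]
smooth[which.min(smooth$y), c("x", "y")]
```

As the comment warns, this is only as precise as the grid of points `geom_smooth()` evaluates (80 by default); passing a larger `n`, e.g. `geom_smooth(se = FALSE, n = 1000)`, tightens the approximation.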
Tags: r ggplot2 non-linear-regression