【Question title】: R print equation of linear regression on the plot itself
【Posted】: 2014-08-02 03:11:35
【Question】:

How can we print the equation of a line on a plot?

I have 2 independent variables and would like an equation like this:

y=mx1+bx2+c

where x1 = cost and x2 = targeting

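I can plot the line of best fit, but how do I print the equation on the plot?

Maybe I can't print 2 independent variables in one equation, but how would I at least show y=mx1+c?

Here is my code: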

fit=lm(Signups ~ cost + targeting)
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(lm(Signups ~ cost))

【Comments】:

Tags: r regression


【Solution 1】:

I tried to automate the output a bit:

fit <- lm(mpg ~ cyl + hp, data = mtcars)
summary(fit)
##Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.90833    2.19080  16.847  < 2e-16 ***
## cyl         -2.26469    0.57589  -3.933  0.00048 ***
## hp          -0.01912    0.01500  -1.275  0.21253 


plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])

## rounded coefficients for better output
cf <- round(coef(fit), 2) 

## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("mpg = ", cf[1],
             ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " cyl ",
             ifelse(sign(cf[3])==1, " + ", " - "), abs(cf[3]), " hp")

## printing of the equation
mtext(eq, 3, line=-2)
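
The sign handling above is written out by hand for each of the two predictors. As a minimal sketch, the same idea can be wrapped in a helper that works for any number of predictors. The function name `lm_equation` and its `digits` argument are hypothetical and not part of the original answer, and the sketch assumes the model has an intercept and at least one predictor.

```r
# Hypothetical helper (not from the original answer): build an equation
# string such as "y = b0 + b1 x1 - b2 x2" from a fitted lm object.
# Assumes the model has an intercept and at least one predictor.
lm_equation <- function(fit, digits = 2) {
  cf <- round(coef(fit), digits)
  response <- deparse(formula(fit)[[2]])      # left-hand side of the formula
  slopes <- cf[-1]                            # everything except the intercept
  signs <- ifelse(slopes < 0, " - ", " + ")   # avoid "+ -2.26"
  paste0(response, " = ", cf[1],
         paste0(signs, abs(slopes), " ", names(slopes), collapse = ""))
}

fit <- lm(mpg ~ cyl + hp, data = mtcars)
eq <- lm_equation(fit)
```

The resulting string can be passed to `mtext(eq, 3, line = -2)` exactly as in the code above.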

Hope this helps,

Alex

【Comments】:

    【Solution 2】:

    Use ?text. Also, you should not use abline(lm(Signups ~ cost)), because that is a different model (see my answer on CV, Cross Validated: Is there a difference between 'controling for' and 'ignoring' other variables in multiple regression). Either way, consider:

    set.seed(1)
    Signups   <- rnorm(20)
    cost      <- rnorm(20)
    targeting <- rnorm(20)
    fit       <- lm(Signups ~ cost + targeting)
    
    summary(fit)
    # ...
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)   0.1494     0.2072   0.721    0.481
    # cost         -0.1516     0.2504  -0.605    0.553
    # targeting     0.2894     0.2695   1.074    0.298
    # ...
    
    dev.new(); {  # portable alternative to windows(), which exists only on Windows
      plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
      abline(coef(fit)[1:2])
      text(-2, -2, adj=c(0,0), labels="Signups = .15 -.15cost + .29targeting")
    }
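
To make the "different model" point concrete, and to avoid typing the coefficients into the label by hand, here is a minimal sketch. The names `fit_full`, `fit_simple`, `slopes` and `label` are introduced for illustration only; the data are the same simulated values as above.

```r
set.seed(1)
Signups   <- rnorm(20)
cost      <- rnorm(20)
targeting <- rnorm(20)

fit_full   <- lm(Signups ~ cost + targeting)  # model used in this answer
fit_simple <- lm(Signups ~ cost)              # model implied by abline(lm(Signups ~ cost))

# The slope on cost generally differs between the two models
slopes <- c(full   = unname(coef(fit_full)["cost"]),
            simple = unname(coef(fit_simple)["cost"]))

# Build the label from the fitted coefficients; "%+.2f" always prints a sign
b <- coef(fit_full)
label <- sprintf("Signups = %.2f %+.2f cost %+.2f targeting",
                 b["(Intercept)"], b["cost"], b["targeting"])
```

The string can then replace the hard-coded one, for example `text(-2, -2, adj = c(0, 0), labels = label)`.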
    

    【Comments】:

      【Solution 3】:

      Here is a solution using the tidyverse packages.

      The key is the broom package, which simplifies extracting data from a model. For example:

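      library(tidyverse)  # dplyr, ggplot2 and the %>% pipe
      library(broom)      # tidy(); not attached by library(tidyverse)

      fit1 <- lm(mpg ~ cyl, data = mtcars)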
      summary(fit1)
      
      fit1 %>%
          tidy() %>%
          select(estimate, term)
      

      Result

      # A tibble: 2 x 2
        estimate term       
           <dbl> <chr>      
      1    37.9  (Intercept)
      2    -2.88 cyl 
      

      I wrote a function that uses dplyr to extract and format the information:

      get_formula <- function(object) {
          object %>% 
              tidy() %>% 
              mutate(
                  term = if_else(term == "(Intercept)", "", term),
                  sign = case_when(
                      term == "" ~ "",
                      estimate < 0 ~ "-",
                      estimate >= 0 ~ "+"
                  ),
                  estimate = as.character(round(abs(estimate), digits = 2)),
                  term = if_else(term == "", paste(sign, estimate), paste(sign, estimate, term))
              ) %>%
              summarize(terms = paste(term, collapse = " ")) %>%
              pull(terms)
      }
      
      get_formula(fit1)
      

      Result

      [1] " 37.88 - 2.88 cyl"
      

      Then use ggplot2 to draw the line and add the caption:

      mtcars %>%
          ggplot(mapping = aes(x = cyl, y = mpg)) +
          geom_point() +
          geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
          labs(
              x = "Cylinders", y = "Miles per Gallon", 
              caption = paste("mpg =", get_formula(fit1))
          )
      

      Plot using geom_smooth()

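      This way of drawing the line really only makes sense for visualizing the relationship between two variables. As @Glen_b pointed out in the comments, the slope we get from modelling mpg as a function of cyl (-2.88) does not match the slope we get from modelling mpg as a function of cyl and other variables (-1.29). For example: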

      fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
      summary(fit2)
      
      fit2 %>%
          tidy() %>%
          select(estimate, term)
      

      Result

      # A tibble: 5 x 2
        estimate term       
           <dbl> <chr>      
      1  40.8    (Intercept)
      2  -1.29   cyl        
      3   0.0116 disp       
      4  -3.85   wt         
      5  -0.0205 hp 
      

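      That said, if you want to accurately plot the regression line of a model that includes variables not shown in the plot, use geom_abline() instead, and get the slope and intercept with the broom package functions. As far as I know, a geom_smooth() formula cannot reference variables that have not been mapped as aesthetics.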

      mtcars %>%
          ggplot(mapping = aes(x = cyl, y = mpg)) +
          geom_point() +
          geom_abline(
              slope = fit2 %>% tidy() %>% filter(term == "cyl") %>% pull(estimate),
              intercept = fit2 %>% tidy() %>% filter(term == "(Intercept)") %>% pull(estimate),
              color = "blue"
          ) +
          labs(
              x = "Cylinders", y = "Miles per Gallon", 
              caption = paste("mpg =", get_formula(fit2))
          )
      

      Plot using geom_abline()
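
One thing to keep in mind with this approach: a line drawn from the raw intercept and the cyl slope corresponds to the other predictors (disp, wt, hp) being held at zero, so it may sit well away from the points. A possible variant, sketched here in base R under the assumption that holding the other predictors at their sample means is what you want, shifts the intercept accordingly. The names `others`, `intercept_at_means` and `slope_cyl` are introduced for illustration.

```r
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
b <- coef(fit2)

# Predictors that are in the model but not on the x axis
others <- c("disp", "wt", "hp")

# Intercept of the mpg-vs-cyl line when the other predictors are fixed
# at their sample means instead of at zero
intercept_at_means <- unname(b["(Intercept)"] +
                             sum(b[others] * colMeans(mtcars[others])))
slope_cyl <- unname(b["cyl"])
```

These two values can be passed to `geom_abline(slope = slope_cyl, intercept = intercept_at_means)`, or to `abline(intercept_at_means, slope_cyl)` in base graphics. The slope is unchanged; only the vertical position of the line moves, so that it passes through the point of means of cyl and mpg.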

      【Comments】:
