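【Question title】: Why is my regression line not showing up?
【Posted】: 2021-12-25 13:23:51
【Question】:

I have been working on this code in R and have run into some trouble. My current problem is that the regression line from my code is not showing up. Is this because the x-axis is character rather than numeric or a date? Thanks in advance for your help!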

library(dataRetrieval)
library(plyr)
library(tidyverse)
library(ggpmisc) # for dealing with stat equations
library(ggplot2) # for making plots 
library(lubridate) # for working with dates
library(scales) #for working with date_format
library(tibble)
library(tidyr)

siteNo <- "02197000"
pCode <- "00060"

daily <- readNWISdv(siteNo, pCode, "1900-01-01","2021-09-30", statCd="00003")
daily <- renameNWISColumns(daily)
Date <- format(as.Date(daily$Date), format = "%Y-%m-%d")

Date2=format(as.Date(daily$Date), format = "%Y")
#mean_Flow=format(as.integer())


daily2 = ddply(daily, .(site_no, Date2), summarise,
               mean_Flow = mean(Flow)*(0.0283168))

#check to see if this date is in the data
for (i in 1900:2021){
  #test code to see if its there
  print(any(daily2 == i))
  #add the year if it doesnt exist
  if(any(daily2 == i) == FALSE){
    print(i)
    print("need to add the")
    #how do i add a row for the i
    daily2[nrow(daily2) + 1,] = list(siteNo, i,NA)}}
#add the data frame to the new one



lm_eqn <- function(daily2){
  m <- lm(mean_Flow ~ Date2, daily2);
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
                   list(a = format(unname(coef(m)[1]), digits = 2),
                        b = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq));
}


p1 = ggplot(daily2,aes(Date2,mean_Flow)) +
  geom_line(group = 1) +
  geom_smooth(method = "lm", se=FALSE, color="black", formula = mean_Flow ~ Date2) +
  geom_text(x = 1950, y = 700, label = lm_eqn(daily2), parse = TRUE) +
  theme_classic()+
  labs(x="", y=(expression(Discharge~(m^{3}~s^{-1}))))+
  scale_y_continuous(limits = c(0,800))+
 # scale_x_continuous(limits = c(1900,2021),
   #                  breaks = 5)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p1

【Question comments】:

    Tags: r ggplot2 regression data-retrieval


    【Solution 1】:

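    If you check str(daily2), you will see that Date2 is a character variable. format() always returns character strings, so Date2 is already character when it is created, and it stays that way through the ddply() step: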

    'data.frame':   105 obs. of  3 variables:
     $ site_no  : chr  "02197000" "02197000" "02197000" "02197000" ...
     $ Date2    : chr  "1900" "1901" "1902" "1903" ...
     $ mean_Flow: num  349 465 348 383 157 ...
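
The type issue can be reproduced without downloading any data. The following is a minimal sketch in base R; the `dates`, `year_chr`, and `year_num` names are made up for illustration and do not appear in the original code.

```r
# Three made-up dates standing in for daily$Date
dates <- as.Date(c("1900-03-01", "1901-07-15", "1902-11-30"))

# format() returns character strings, even when the pattern is only a year
year_chr <- format(dates, "%Y")
class(year_chr)

# Converting explicitly gives a number that lm() and ggplot2 treat as continuous
year_num <- as.integer(year_chr)
class(year_num)
```

Doing the conversion at the point where the year is extracted avoids having to patch the column after the summary step.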
    

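    So when you use it in a regression, lm() treats it as a factor. That produces a saturated model with one parameter for every year: the fit is exact, no residual degrees of freedom remain, and no standard errors can be computed: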

    summary(lm(mean_Flow ~ Date2, data = daily2))
    
    # Call:
    # lm(formula = mean_Flow ~ Date2, data = daily2)
    # 
    # Residuals:
    # ALL 104 residuals are 0: no residual degrees of freedom!
    # 
    # Coefficients:
    #              Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  348.5154        NaN     NaN      NaN
    # Date21901    116.6319        NaN     NaN      NaN
    # Date21902     -0.6292        NaN     NaN      NaN
    # [Further output omitted]
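
The same behaviour shows up on a small simulated data set. This is only a sketch: the `toy` data frame and its columns are invented here, and the flow values are random rather than real discharge data.

```r
set.seed(1)

# Ten made-up yearly means, with the year stored as character (as in daily2)
toy <- data.frame(
  year = as.character(2000:2009),
  flow = rnorm(10, mean = 300, sd = 50)
)

# Character predictor: lm() builds one dummy variable per year
fit_chr <- lm(flow ~ year, data = toy)
length(coef(fit_chr))   # intercept plus nine year dummies
df.residual(fit_chr)    # nothing left over to estimate error from

# Numeric predictor: a single slope, which is what a trend line needs
toy$year_num <- as.numeric(toy$year)
fit_num <- lm(flow ~ year_num, data = toy)
length(coef(fit_num))   # intercept and slope
df.residual(fit_num)
```

With ten observations the character version uses up all ten degrees of freedom, while the numeric version leaves eight for the residuals.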
    

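    So if you convert Date2 to numeric (and fix or omit the formula in your geom_smooth() call, which has to be written in terms of the aesthetics x and y rather than the column names), you get what I think is the output you want: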

    daily2$Date2 <- as.numeric(daily2$Date2)
    
    p1 = ggplot(daily2,aes(Date2,mean_Flow)) +
        geom_line(group = 1) +
        geom_smooth(method = "lm", se=FALSE, color="black") +
        geom_text(x = 1950, y = 700, label = lm_eqn(daily2), parse = TRUE) +
        theme_classic()+
        labs(x="", y=(expression(Discharge~(m^{3}~s^{-1}))))+
        scale_y_continuous(limits = c(0,800))+
        # scale_x_continuous(limits = c(1900,2021),
        #                  breaks = 5)+
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
    
    p1
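
If you prefer to keep an explicit formula in geom_smooth(), it can be written as y ~ x. The sketch below uses simulated data; `toy`, `p_toy`, and `smooth_data` are hypothetical names, and the trend in the numbers is invented.

```r
library(ggplot2)

set.seed(42)

# Made-up yearly series with a numeric year column
toy <- data.frame(
  year = 1990:2019,
  flow = 300 + 2 * (0:29) + rnorm(30, sd = 20)
)

# The formula refers to the aesthetics (x, y), not to year and flow directly
p_toy <- ggplot(toy, aes(year, flow)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  theme_classic()

# layer_data() returns the values computed for a layer; layer 2 is the smooth
smooth_data <- layer_data(p_toy, 2)
head(smooth_data)
```

Inspecting the computed layer is a quick way to confirm that the fitted line exists before worrying about axis limits or labels.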
    

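    【Comments】:

    • Thank you so much! This works perfectly. I appreciate your help!
    • @Kelsey Glad it helped! Since it solved your problem, please don't forget to click the check mark next to the answer to toggle it from greyed out to filled in: stackoverflow.com/help/someone-answers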