【问题标题】:Linear regression with intercept forced to zero AND uncertainty on value of slope截距强制为零和斜率值不确定性的线性回归
【发布时间】:2019-02-23 17:51:40
【问题描述】:

我想用 python 做一个线性回归,有两个要求:

  • 拦截强制为零

  • 在输出中我想对斜率参数以及 p 值、r 平方有不确定性...

据我所知,stats.linregress 满足第一个要求,np.linalg.lstsq 满足第二个要求。有人可以帮我找到最简单的方法吗?

非常感谢, 卡米尔

【问题讨论】:

    标签: python-3.x linear-regression


    【解决方案1】:
    import numpy as np
    from scipy.optimize import curve_fit
    
    xdata = np.array([x values])
    ydata = np.array([y values])
    
    def func(x, a):
      return a*x
    
    popt, pcov = curve_fit(func, xdata, ydata)
    
    residuals = ydata- func(xdata, *popt)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((ydata-np.mean(ydata))**2)
    r_squared = 1 - (ss_res / ss_tot)
    
    dgr_free = len(xdata)-1
    chi_sqr = sum([(y-func(x,*popt))**2/func(x,*popt) for x,y in zip(xdata,ydata)])
    
    print(popt) # will print out your varibles in order, in this case just a
    print(r_squared)
    print(chi_sqr,dgr_free) # btw this is chi squared not p
    

    这里的想法是我们在没有 + b 的情况下对 lieaner 函数进行回归,因为 b 上下移动 y 轴截距,因此当它设置为 0 时,我们得到一个线性回归,截距在 (0,0)

    使用 scipy.curve_fit 的另一个好处是您可以对任何公式进行回归 - 尽管 r_squared 在曲线回归中有些多余。

    【讨论】:

      【解决方案2】:

      此示例包含您问题中要求的统计信息,并且还绘制了拟合函数与数据的关系图。

      from scipy.optimize import curve_fit
      import numpy as np
      import scipy.odr
      import scipy.stats
      import numpy, scipy, matplotlib
      import matplotlib.pyplot as plt
      
      xData = np.array([5.357, 5.797, 5.936, 6.161, 6.697, 6.731, 6.775, 8.442, 9.861])
      yData = np.array([0.376, 0.874, 1.049, 1.327, 2.054, 2.077, 2.138, 4.744, 7.104])
      
      def func(x,b0):
          return b0 * x
      
      initialParameters = numpy.array([np.mean(yData) / np.mean(xData)])
      
      def f_wrapper_for_odr(beta, x): # parameter order for odr
          return func(x, *beta)
      
      fittedParameters, cov= curve_fit(func, xData, yData, p0=initialParameters)
      
      model = scipy.odr.odrpack.Model(f_wrapper_for_odr)
      data = scipy.odr.odrpack.Data(xData, yData)
      myodr = scipy.odr.odrpack.ODR(data, model, beta0=fittedParameters,  maxit=0)
      myodr.set_job(fit_type=2)
      fittedParameterstatistics = myodr.run()
      df_e = len(xData) - len(fittedParameters) # degrees of freedom, error
      cov_beta = fittedParameterstatistics.cov_beta # parameter covariance matrix from ODR
      sd_beta = fittedParameterstatistics.sd_beta * fittedParameterstatistics.sd_beta
      ci = []
      t_df = scipy.stats.t.ppf(0.975, df_e)
      ci = []
      for i in range(len(fittedParameters)):
          ci.append([fittedParameters[i] - t_df * fittedParameterstatistics.sd_beta[i], fittedParameters[i] + t_df * fittedParameterstatistics.sd_beta[i]])
      
      tstat_beta = fittedParameters / fittedParameterstatistics.sd_beta # coeff t-statistics
      pstat_beta = (1.0 - scipy.stats.t.cdf(np.abs(tstat_beta), df_e)) * 2.0    # coef. p-values
      
      for i in range(len(fittedParameters)):
          print('parameter:', fittedParameters[i])
          print('   conf interval:', ci[i][0], ci[i][1])
          print('   tstat:', tstat_beta[i])
          print('   pstat:', pstat_beta[i])
          print()
      
      modelPredictions = func(xData, *fittedParameters) 
      
      absError = modelPredictions - yData
      
      SE = numpy.square(absError) # squared errors
      MSE = numpy.mean(SE) # mean squared errors
      RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
      Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
      print('RMSE:', RMSE)
      print('R-squared:', Rsquared)
      
      print()
      
      
      ##########################################################
      # graphics output section
      def ModelAndScatterPlot(graphWidth, graphHeight):
          f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
          axes = f.add_subplot(111)
      
          # first the raw data as a scatter plot
          axes.plot(xData, yData,  'D')
      
          # create data for the fitted equation plot
          xModel = numpy.linspace(min(xData), max(xData))
          yModel = func(xModel, *fittedParameters)
      
          # now the model as a line plot
          axes.plot(xModel, yModel)
      
          axes.set_xlabel('X Data') # X axis data label
          axes.set_ylabel('Y Data') # Y axis data label
      
          plt.show()
          plt.close('all') # clean up after using pyplot
      
      graphWidth = 800
      graphHeight = 600
      ModelAndScatterPlot(graphWidth, graphHeight)
      

      【讨论】:

        猜你喜欢
        • 2018-06-11
        • 2020-12-03
        • 1970-01-01
        • 2015-09-09
        • 2018-07-07
        • 1970-01-01
        • 2014-02-15
        • 1970-01-01
        • 1970-01-01
        相关资源
        最近更新 更多