【Question title】:How to retrieve model estimates from statsmodels?
【Posted】:2018-07-09 09:48:34
【Question】:

Starting from a dataset like this:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x']) 
df = df.set_index(rng)

...and a linear regression model like this:

x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()

...you can easily retrieve the model coefficients this way:

print(model.params)

But I can't figure out how to retrieve all the other values shown in the model summary:

print(str(model.summary()))

I'm particularly interested in R-squared.

In the post How to extract a particular value from the OLS-summary in Pandas?, I learned that print(model.r2) does this there, but that doesn't seem to work with statsmodels.

Any suggestions?

【Comments】:

    Tags: python statsmodels


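    【Answer 1】:

    You can use attributes of the fitted results object, such as:

    • model.f_pvalue for the p-value of the F-statistic
    • model.rsquared for the model's R-squared value, and so on.

    See the documentation at https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.html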

    Example usage: [image: Jupyter screenshot of attributes]

    【Comments】:

      【Answer 2】:

      You can get R-squared like this:

      Code:

      model.rsquared
      

      Test code:

      import pandas as pd
      import numpy as np
      import statsmodels.api as sm
      
      # A dataframe with two variables
      np.random.seed(123)
      rows = 12
      rng = pd.date_range('1/1/2017', periods=rows, freq='D')
      df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
      df = df.set_index(rng)
      
      x = sm.add_constant(df['x'])
      model = sm.OLS(df['y'], x).fit()
      
      print(model.params)
      print(model.rsquared)
      print(str(model.summary()))
      

      Results:

      const    176.636417
      x         -0.357185
      dtype: float64
      
      0.338332793094
      
                                  OLS Regression Results                            
      ==============================================================================
      Dep. Variable:                      y   R-squared:                       0.338
      Model:                            OLS   Adj. R-squared:                  0.272
      Method:                 Least Squares   F-statistic:                     5.113
      Date:                Tue, 30 Jan 2018   Prob (F-statistic):             0.0473
      Time:                        05:36:04   Log-Likelihood:                -41.442
      No. Observations:                  12   AIC:                             86.88
      Df Residuals:                      10   BIC:                             87.85
      Df Model:                           1                                         
      Covariance Type:            nonrobust                                         
      ==============================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
      ------------------------------------------------------------------------------
      const        176.6364     20.546      8.597      0.000     130.858     222.415
      x             -0.3572      0.158     -2.261      0.047      -0.709      -0.005
      ==============================================================================
      Omnibus:                        1.934   Durbin-Watson:                   1.182
      Prob(Omnibus):                  0.380   Jarque-Bera (JB):                1.010
      Skew:                          -0.331   Prob(JB):                        0.603
      Kurtosis:                       1.742   Cond. No.                     1.10e+03
      ==============================================================================
      
      Warnings:
      [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
      [2] The condition number is large, 1.1e+03. This might indicate that there are
      strong multicollinearity or other numerical problems.
      

      Finding all attribute names:

      With a short snippet like this:

      for attr in dir(model):
          if not attr.startswith('_'):
              print(attr)
      

      you can list all of an object's public attributes:

      HC0_se
      HC1_se
      HC2_se
      HC3_se
      aic
      bic
      bse
      centered_tss
      compare_f_test
      compare_lm_test
      compare_lr_test
      condition_number
      conf_int
      conf_int_el
      cov_HC0
      cov_HC1
      cov_HC2
      cov_HC3
      cov_kwds
      cov_params
      cov_type
      df_model
      df_resid
      eigenvals
      el_test
      ess
      f_pvalue
      f_test
      fittedvalues
      fvalue
      get_influence
      get_prediction
      get_robustcov_results
      initialize
      k_constant
      llf
      load
      model
      mse_model
      mse_resid
      mse_total
      nobs
      normalized_cov_params
      outlier_test
      params
      predict
      pvalues
      remove_data
      resid
      resid_pearson
      rsquared
      rsquared_adj
      save
      scale
      ssr
      summary
      summary2
      t_test
      tvalues
      uncentered_tss
      use_t
      wald_test
      wald_test_terms
      wresid
      

      【Comments】:

      • Thanks! The extra info on how to find all the attribute names is great!