Regression Analysis with statsmodels in Python: Answers

【Question title】: Regression Analysis with statsmodels in Python
【Posted】: 2018-04-09 17:03:38
【Question】:

I am new to Python and am learning how to do regression analysis with statsmodels (I am moving from R to Python and still think the R way). My minimal working example is:

Income  =  [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Expend  =  [70,  65,  90,  95, 110, 115, 120, 140, 155, 150]

import pandas as pd
df1 = pd.DataFrame({
    'Income': Income,
    'Expend': Expend,
})

#regression with formula
import statsmodels.formula.api as smf

#instantiation
reg1 = smf.ols('Expend ~ Income', data = df1)

#members of reg object
print(dir(reg1))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_data_attr', '_df_model', '_df_resid', '_fit_ridge', '_get_init_kwds', '_handle_data', '_init_keys', '_setup_score_hess', 'data', 'df_model', 'df_resid', 'endog', 'endog_names', 'exog', 'exog_names', 'fit', 'fit_regularized', 'formula', 'from_formula', 'get_distribution', 'hessian', 'information', 'initialize', 'k_constant', 'loglike', 'nobs', 'predict', 'rank', 'score', 'weights', 'wendog', 'wexog', 'whiten']

#members of the object provided by the modelling.
print(dir(reg1.fit()))

['HC0_se', 'HC1_se', 'HC2_se', 'HC3_se', '_HCCM', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cache', '_data_attr', '_get_robustcov_results', '_is_nested', '_wexog_singular_values', 'aic', 'bic', 'bse', 'centered_tss', 'compare_f_test', 'compare_lm_test', 'compare_lr_test', 'condition_number', 'conf_int', 'conf_int_el', 'cov_HC0', 'cov_HC1', 'cov_HC2', 'cov_HC3', 'cov_kwds', 'cov_params', 'cov_type', 'df_model', 'df_resid', 'eigenvals', 'el_test', 'ess', 'f_pvalue', 'f_test', 'fittedvalues', 'fvalue', 'get_influence', 'get_prediction', 'get_robustcov_results', 'initialize', 'k_constant', 'llf', 'load', 'model', 'mse_model', 'mse_resid', 'mse_total', 'nobs', 'normalized_cov_params', 'outlier_test', 'params', 'predict', 'pvalues', 'remove_data', 'resid', 'resid_pearson', 'rsquared', 'rsquared_adj', 'save', 'scale', 'ssr', 'summary', 'summary2', 't_test', 'tvalues', 'uncentered_tss', 'use_t', 'wald_test', 'wald_test_terms', 'wresid']

I want to understand the output of print(dir(reg1)) and print(dir(reg1.fit())). Where can I find documentation for these components, along with examples of how to use them?

【Comments on the question】:

Tags: python statsmodels

【Answer 1】:

Dude, this is a simple matter of googling and reading the doc pages. What might be confusing is the use of statsmodels.formula.api. It exists to make entering R-style formulas possible.

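The statsmodels documentation is at: StatsModels Index Page. Scroll down until you reach the "Table of Contents". Click on Linear Regression. Scroll down to Module Reference, which has links to Model Classes and Result Classes.

@Bill Bell already pointed out the correct model class: it is OLS. Under methods, you will find the link to the documentation for fit, which states that fit returns a RegressionResults object.

The RegressionResults doc page explains the attributes you are interested in.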
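
As a rough sketch of what those attributes look like in practice, the snippet below fits the question's model and prints a few of the names that appear in the dir(reg1.fit()) listing. It assumes statsmodels and pandas are installed; the variable name res is illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Data from the question.
df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})

# fit() returns the results object whose members the question lists.
res = smf.ols("Expend ~ Income", data=df1).fit()

print(res.params)      # estimated coefficients
print(res.bse)         # standard errors of the coefficients
print(res.rsquared)    # coefficient of determination
print(res.conf_int())  # confidence intervals (95% by default)
print(res.resid)       # residuals, one per observation
```

Calling res.summary() prints all of this in one table, which is probably the closest equivalent to summary(lm(...)) in R.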

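Please note:

Attributes that begin and end with double underscores, such as __class__, are Python special attributes.
You can get help in IPython or Jupyter by appending ?, e.g. by typing reg1? (much like prepending ? in R). In the plain Python interpreter, use help(reg1) instead.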
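
Since the special and private names make up much of the dir() output, it can help to filter them out before reading the list. The following is a small sketch; public_names and the Demo class are hypothetical stand-ins (Demo plays the role of reg1 so the example runs without statsmodels).

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Demo:
    """Hypothetical stand-in for a model object such as reg1."""

    def __init__(self):
        self.nobs = 10    # public attribute
        self._cache = {}  # private by convention, filtered out below

    def fit(self):
        return "fitted"


names = public_names(Demo())
print(names)  # ['fit', 'nobs']
```

The same function can be applied to reg1 or reg1.fit() to get a shorter list of names worth looking up in the documentation.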

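【Comments】:

Thanks @Tw UxTLi51Nus for the very helpful answer. I would highly appreciate it if you could point out where to get the statsmodels documentation in PDF format. Thanks
@MYaseen208 Sorry, I could not find one. There is an open issue for it. If what you really want is "locally available documentation", you can build the docs on your machine. For that, see here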

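【Answer 2】:

dir() lists all the attributes, methods and variables of an object (for example a module), much like methods(class = "merMod") in R after library(lme4). You can also try reg1.__dict__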
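
The two are not interchangeable, which the sketch below tries to show: __dict__ (also available through vars()) holds only what is stored on the instance itself, while dir() also reports names found on the class. The Model class is a hypothetical stand-in for the object in the question.

```python
class Model:
    """Hypothetical stand-in for a model object such as reg1."""

    kind = "ols"  # class attribute, shared by all instances

    def __init__(self, formula):
        self.formula = formula  # instance attribute

    def fit(self):
        return "fitted"


m = Model("Expend ~ Income")

print(vars(m))           # same dict as m.__dict__: {'formula': 'Expend ~ Income'}
print("fit" in vars(m))  # False: methods live on the class
print("fit" in dir(m))   # True: dir() also searches the class
```

So __dict__ is useful for seeing the data an object carries, and dir() for seeing everything that can be accessed on it.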

【Comments】:

Use reg1.__dict__

【Answer 3】:

A few pointers about Python.

Python ships with built-in offline documentation. In the Python interpreter, try the help command:
```
>>> help(dir)
>>> help(help)
```
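If you want to look things up online, you can visit pydocs for general help. For help with a specific package, visit pypi (the Python Package Index).
Now for your question: help for statsmodels redirects to the project Homepage.
Finally, here is a page you might be interested in: Fitting models using R-style formulas.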

【Comments】:

【Answer 4】:

>>> reg1.__module__
'statsmodels.regression.linear_model'

A Google search for that gave me this page, http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html, which includes a link to fit.
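
The same idea can be wrapped in a small helper that builds the full dotted name of an object's class, which is usually a good search term. This is only a sketch: dotted_name is a hypothetical helper, and Fraction is a standard-library stand-in so the example runs without statsmodels.

```python
from fractions import Fraction


def dotted_name(obj):
    """Return 'module.ClassName' for the class of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# Fraction is only a standard-library stand-in for the model object.
print(dotted_name(Fraction(1, 2)))  # fractions.Fraction
```

For reg1 this should give statsmodels.regression.linear_model.OLS, the same name that appears in the URL above.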

I don't know whether this has everything you need. I hope it gives you a leg up.

【Comments】: