【问题标题】:Multi Variable Regression statsmodels.api多变量回归 statsmodels.api
【发布时间】:2020-12-13 03:51:56
【问题描述】:

我查看了文档,但仍然无法弄清楚。我想运行具有多个回归的 WLS。

statsmodels.api 被导入为 sm

单变量示例。

X = Height
Y = Weight

res = sm.OLS(Y,X,).fit() 
res.summary()

假设我也有:

X2 = 年龄

如何将其添加到我的回归中?

【问题讨论】:

  • 注意,statsmodels 不会添加截距,除非使用公式。

标签: python regression statsmodels


【解决方案1】:

我使用二阶多项式来预测身高和年龄如何影响士兵的体重。你可以在我的 GitHub 上获取 ansur_2_m.csv。

 df=pd.read_csv('ANSUR_2_M.csv', encoding = "ISO-8859-1",   usecols=['Weightlbs','Heightin','Age'],  dtype={'Weightlbs':np.integer,'Heightin':np.integer,'Age':np.integer})
 df=df.dropna()
 df.reset_index()
 df['Heightin2']=df['Heightin']**2
 df['Age2']=df['Age']**2

 formula="Weightlbs ~ Heightin+Heightin2+Age+Age2"
 model_ols = smf.ols(formula,data=df).fit()
 minHeight=df['Heightin'].min()
 maxHeight=df['Heightin'].max()
 avgAge = df['Age'].median()
 print(minHeight,maxHeight,avgAge)

 df2=pd.DataFrame()

 df2['Heightin']=np.linspace(60,100,50)
 df2['Heightin2']=df2['Heightin']**2
 df2['Age']=28
 df2['Age2']=df['Age']**2

 df3=pd.DataFrame()
 df3['Heightin']=np.linspace(60,100,50)
 df3['Heightin2']=df2['Heightin']**2
 df3['Age']=45
 df3['Age2']=df['Age']**2

 prediction28=model_ols.predict(df2)
 prediction45=model_ols.predict(df3)

 plt.clf()
 plt.plot(df2['Heightin'],prediction28,label="Age 28")
 plt.plot(df3['Heightin'],prediction45,label="Age 45")
 plt.ylabel="Weight lbs"
 plt.xlabel="Height in"
 plt.legend()
 plt.show()

 print('A 45 year old soldier is more probable to weight more than an 28 year old soldier')

【讨论】:

【解决方案2】:

您可以将它们放入 data.frame 并调用列(这样输出看起来也更好):

import statsmodels.api as sm
import pandas as pd
import numpy as np

Height = np.random.uniform(0,1,100)
Weight = np.random.uniform(0,1,100)
Age = np.random.uniform(0,30,100)

df = pd.DataFrame({'Height':Height,'Weight':Weight,'Age':Age})

res = sm.OLS(df['Height'],df[['Weight','Age']]).fit()

In [10]: res.summary()
Out[10]: 
<class 'statsmodels.iolib.summary.Summary'>
"""
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                 Height   R-squared (uncentered):                   0.700
Model:                            OLS   Adj. R-squared (uncentered):              0.694
Method:                 Least Squares   F-statistic:                              114.3
Date:                Mon, 24 Aug 2020   Prob (F-statistic):                    2.43e-26
Time:                        15:54:30   Log-Likelihood:                         -28.374
No. Observations:                 100   AIC:                                      60.75
Df Residuals:                      98   BIC:                                      65.96
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Weight         0.1787      0.090      1.988      0.050       0.000       0.357
Age            0.0229      0.003      8.235      0.000       0.017       0.028
==============================================================================
Omnibus:                        2.938   Durbin-Watson:                   1.813
Prob(Omnibus):                  0.230   Jarque-Bera (JB):                2.223
Skew:                          -0.211   Prob(JB):                        0.329
Kurtosis:                       2.404   Cond. No.                         49.7
==============================================================================

【讨论】:

    猜你喜欢
    • 2016-05-04
    • 2012-12-18
    • 2020-11-22
    • 1970-01-01
    • 2018-05-03
    • 2016-12-14
    • 1970-01-01
    • 2011-01-06
    • 1970-01-01
    相关资源
    最近更新 更多