Python 中的 OLS Breusch Pagan 测试答案

【问题标题】：OLS Breusch Pagan test in PythonPython 中的 OLS Breusch Pagan 测试
【发布时间】：2015-07-15 15:56:23
【问题描述】：

我使用 statsmodels 包来估计我的 OLS 回归。现在我想要Breusch Pagan test。我在这个测试中使用了pysal 包，但是这个函数返回一个错误：

import statsmodels.api as sm
import pysal

model = sm.OLS(Y,X,missing = 'drop')
rs = model.fit()
pysal.spreg.diagnostics.breusch_pagan(rs)

返回错误：

AttributeError: 'OLSResults' 对象没有属性 'u'

我该怎么办？

【问题讨论】：

我不知道这个例外，但你也可以使用 statsmodels 中的 breushpagan，它采用 OLS 残差和候选作为异方差的解释变量，statsmodels.sourceforge.net/devel/generated/… 此处有示例statsmodels.sourceforge.net/devel/examples/notebooks/generated/…跨度>
谢谢。请添加您的答案。

标签： python statistics canopy pysal

【解决方案1】：

问题是statsmodels的回归结果实例与pysal中的不兼容。

您可以使用来自 statsmodels 的breuschpagan，它采用 OLS 残差和候选作为异方差的解释变量，因此它不依赖于特定模型或模型的实现。

文档： https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html

这里有例子https://www.statsmodels.org/devel/examples/notebooks/generated/regression_diagnostics.html

不知道Breusch-Pagan测试的实现是否有本质区别。

该名称在 statsmodels 中似乎拼写错误。

edit 名称的拼写已在 statsmodels 0.9 版中更正。旧的错误拼写是breushpagan。

【讨论】：

似乎旧链接已损坏。此博客：statology.org/breusch-pagan-test-python 讲述了测试，如何使用 statsmodel 以详细而明确的方式实现它。
我修复了链接并调整为缺少“c”的breuschpagan 的更正拼写

【解决方案2】：

由于我倾向于不使用 statsmodels 库，因此我创建了一个 Python 函数来执行 Breusch-Pagan 测试。它使用来自 SciKit-learn 的多元线性回归。

import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import chisqprob


def breusch_pagan_test(x, y):
    '''
    Breusch-Pagan test for heteroskedasticity in a linear regression model:
    H_0 = No heteroskedasticity.
    H_1 = Heteroskedasticity is present.

    Inputs:
    x = a numpy.ndarray containing the predictor variables. Shape = (nSamples, nPredictors).
    y = a 1D numpy.ndarray containing the response variable. Shape = (nSamples, ).

    Outputs a list containing three elements:
    1. the Breusch-Pagan test statistic.
    2. the p-value for the test.
    3. the test result.
    '''

    if y.ndim != 1:
        raise SystemExit('Error: y has more than 1 dimension.')
    if x.shape[0] != y.shape[0]:
        raise SystemExit('Error: the number of samples differs between x and y.')
    else:
        n_samples = y.shape[0]

    # fit an OLS linear model to y using x:
    lm = LinearRegression()
    lm.fit(x, y)

    # calculate the squared errors:
    err = (y - lm.predict(x))**2

    # fit an auxiliary regression to the squared errors:
    # why?: to estimate the variance in err explained by x
    lm.fit(x, err)
    pred_err = lm.predict(x)
    del lm

    # calculate the coefficient of determination:
    ss_tot = sum((err - np.mean(err))**2)
    ss_res = sum((err - pred_err)**2)
    r2 = 1 - (ss_res / ss_tot)
    del err, pred_err, ss_res, ss_tot

    # calculate the Lagrange multiplier:
    LM = n_samples * r2
    del r2

    # calculate p-value. degrees of freedom = number of predictors.
    # this is equivalent to (p - 1) parameter restrictions in Wikipedia entry.
    pval = chisqprob(LM, x.shape[1])

    if pval < 0.01:
        test_result = 'Heteroskedasticity present at 99% CI.'
    elif pval < 0.05:
        test_result = 'Heteroskedasticity present at 95% CI.'
    else:
        test_result = 'No significant heteroskedasticity.'
    return [LM, pval, test_result]

【讨论】：

请注意，chisqprob 已弃用，scipy.stats.distributions.chi2.sf 是建议的替代方案：docs.scipy.org/doc/scipy-0.19.1/reference/generated/…
如果 x 是由 id、nbr_of_edges、nbr_of_nodes 表示的图的列表，并且 y 是图上的计算索引，如连接索引，......那是有效的输入吗？
嗨 sAguinaga，很抱歉回复晚了。我写这篇文章已经有一段时间了，但是“x”只是一个 numpy 数组，其中包含线性回归中的预测变量。例如，如果您有 100 个训练样本（即唯一数据点）并且每个都有 5 个预测变量，则 x.shape=(100, 5)。
小修正：Breusch-Pagan 检验将归一化误差回归到预测变量上。所以resid**2 / mean(resid**2) 而不仅仅是resid**2。这就是他们在原始论文中所指的g。
感谢指正。我不知道这一点。