【问题标题】:numpy.polyfit has no keyword 'cov'numpy.polyfit 没有关键字“cov”
【发布时间】:2012-04-22 11:16:19
【问题描述】:

我正在尝试使用 polyfit 来找到一组数据的最佳拟合直线,但我还需要知道参数的不确定性,所以我也想要协方差矩阵。在线文档建议我写:

polyfit(x, y, 2, cov=True)

但这给出了错误:

TypeError: polyfit() 得到了一个意外的关键字参数 'cov'

果然 help(polyfit) 没有显示关键字参数 'cov'。

那么在线文档是否引用了 numpy 的先前版本? (我有 1.6.1,最新的)。我可以编写自己的 polyfit 函数,但有人对我的 polyfit 没有协方差选项有什么建议吗?

谢谢

【问题讨论】:

标签: python numpy covariance


【解决方案1】:

对于来自库的解决方案,我发现使用scikits.statsmodels 是一个方便的选择。在 statsmodels 中,回归对象具有返回参数和标准错误的可调用属性。我在下面举了一个例子来说明这将如何为你工作:

# Imports, I assume NumPy for forming your data.
import numpy as np
import scikits.statsmodels.api as sm

# Form the data here
(X, Y) = ....

reg_x_data =  np.ones(X.shape);                      # 0th degree term. 
for ii in range(1,deg+1):
    reg_x_data = np.hstack(( reg_x_data, X**(ii) )); # Append the ii^th degree term.


# Store OLS regression results into `result`
result = sm.OLS(Y,reg_x_data).fit()


# Print the estimated coefficients
print result.params

# Print the basic OLS standard error in the coefficients
print result.bse

# Print the estimated basic OLS covariance matrix
print result.cov_params()   # <-- Notice, this one is a function by convention.

# Print the heteroskedasticity-consistent standard error
print result.HC0_se

# Print the heteroskedasticity-consistent covariance matrix
print result.cov_HC0

result 对象中还有其他强大的协方差属性。您可以通过打印出dir(result) 来查看它们。此外,按照惯例,异方差一致估计量的协方差矩阵仅在您已经调用相应的标准误差之后可用,例如:您必须在result.cov_HC0 之前调用result.HC0_se,因为第一个引用导致计算和存储第二个。

Pandas 是另一个可能为这些操作提供更高级支持的库。

非库函数

当您不想/不能依赖额外的库函数时,这可能很有用。

下面是我编写的用于返回 OLS 回归系数的函数,以及一堆东西。它返回残差、回归方差和标准误差(残差平方的标准误差)、大样本方差的渐近公式、OLS 协方差矩阵、异方差一致的“稳健”协方差估计(即 OLS 协方差但根据残差加权)和“White”或“bias-corrected”异方差一致协方差。

import numpy as np

###
# Regression and standard error estimation functions
###
def ols_linreg(X, Y):
    """ ols_linreg(X,Y) 

        Ordinary least squares regression estimator given explanatory variables 
        matrix X and observations matrix Y.The length of the first dimension of 
        X and Y must be the same (equal to the number of samples in the data set).

        Note: these methods should be adapted if you need to use this for large data.
        This is mostly for illustrating what to do for calculating the different
        classicial standard errors. You would never really want to compute the inverse
        matrices for large problems.

        This was developed with NumPy 1.5.1.
    """
    (N, K) = X.shape
    t1 = np.linalg.inv( (np.transpose(X)).dot(X) )
    t2 = (np.transpose(X)).dot(Y)

    beta = t1.dot(t2)
    residuals = Y - X.dot(beta)
    sig_hat = (1.0/(N-K))*np.sum(residuals**2)
    sig_hat_asymptotic_variance = 2*sig_hat**2/N
    conv_st_err = np.sqrt(sig_hat)

    sum1 = 0.0
    for ii in range(N):
        sum1 = sum1 + np.outer(X[ii,:],X[ii,:])

    sum1 = (1.0/N)*sum1
    ols_cov = (sig_hat/N)*np.linalg.inv(sum1)

    PX = X.dot(  np.linalg.inv(np.transpose(X).dot(X)).dot(np.transpose(X))   )
    robust_se_mat1 = np.linalg.inv(np.transpose(X).dot(X))
    robust_se_mat2 = np.transpose(X).dot(np.diag(residuals[:,0]**(2.0)).dot(X))
    robust_se_mat3 = np.transpose(X).dot(np.diag(residuals[:,0]**(2.0)/(1.0-np.diag(PX))).dot(X))

    v_robust = robust_se_mat1.dot(robust_se_mat2.dot(robust_se_mat1))
    v_modified_robust = robust_se_mat1.dot(robust_se_mat3.dot(robust_se_mat1))

    """ Returns:
        beta -- The vector of coefficient estimates, ordered on the columns on X.
        residuals -- The vector of residuals, Y - X.beta
        sig_hat -- The sample variance of the residuals.
        conv_st_error -- The 'standard error of the regression', sqrt(sig_hat).
        sig_hat_asymptotic_variance -- The analytic formula for the large sample variance
        ols_cov -- The covariance matrix under the basic OLS assumptions.
        v_robust -- The "robust" covariance matrix, weighted to account for the residuals and heteroskedasticity.
        v_modified_robust -- The bias-corrected and heteroskedasticity-consistent covariance matrix.
    """
    return beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust

对于你的问题,你可以这样使用它:

import numpy as np

# Define or load your data:
(Y, X) = ....

# Desired polynomial degree
deg = 2;

reg_x_data =  np.ones(X.shape); # 0th degree term. 
for ii in range(1,deg+1):
    reg_x_data = np.hstack(( reg_x_data, X**(ii) )); # Append the ii^th degree term.


# Get all of the regression data.
beta, residuals, sig_hat, conv_st_error, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust = ols_linreg(reg_x_data,Y)

# Print the covariance matrix:
print ols_cov

如果您在我的计算中发现任何错误(尤其是异方差一致性估计器),请告诉我,我会尽快修复。

【讨论】:

  • 感谢您的帮助,但有几个错误。我假设hastack应该是hstack。我还必须首先将我的 X 和 Y 转换为 2D 数组(即 .shape = (2000,1),而不是 (2000,)。然后我得到问题 't2 = (np.transpose(X)).dot( Y), ValueError: matrices are not aligned',因为 reg_x_data 现在的大小与 Y 不同。
  • 是的,我刚刚更改了hastack 的错字,我不确定您的屏幕是否没有刷新或其他问题。是的,我假设 XY 是二维数组。如果需要,您可以在ols_linreg 的开头添加一些简单的代码来将一维数组重新整形为二维数组。
  • 我添加了一个使用 scikits.statsmodels 库的解决方案,以防您真的需要一个库解决方案。我还修改了我的代码,使其更加高效和矢量化。我已经根据 scikits.statsmodels 回归结果检查了此处发布的函数,我的函数在他们提供的几个测试数据集上将它们的结果匹配到小数点后 6 位。
猜你喜欢
  • 2023-02-17
  • 2011-05-24
  • 2011-11-13
  • 1970-01-01
  • 1970-01-01
  • 2020-03-19
  • 2016-10-10
  • 2020-12-06
  • 1970-01-01
相关资源
最近更新 更多