【问题标题】:despite singular.ok = FALSE, lm() reports a result while solve(t(X) %*% X) %*% (t(X) %*% y) reports (correctly) an error尽管singular.ok = FALSE,lm() 报告结果,而solve(t(X) %*% X) %*% (t(X) %*% y) 报告(正确)错误
【发布时间】:2021-06-01 20:25:15
【问题描述】:

我的测试数据集是

> y
        DLogPrice
 [1,]   3.4232680
 [2,]  -1.0099196
 [3,]   0.7867983
 [4,]  -1.2224441
 [5,]   3.5718083
 [6,]  -0.4550516
 [7,]   1.6733032
 [8,]   1.6540079
 [9,]   0.6122239
[10,]  -1.3530304
[11,] -18.9058749
[12,]  15.6916978
[13,]   1.9088818

> X
      DQ DQInverseSize    DQLogSize  DQSize     DQSize2
 [1,]  2  1.925926e-05  23.10281197  208000  2.1664e+10
 [2,] -2 -1.851852e-05 -23.17977301 -216000 -2.3328e+10
 [3,]  1  9.259259e-06  11.58988651  108000  1.1664e+10
 [4,] -1 -8.000000e-06 -11.73606902 -125000 -1.5625e+10
 [5,]  2  1.600000e-05  23.47213803  250000  3.1250e+10
 [6,] -1 -8.000000e-06 -11.73606902 -125000 -1.5625e+10
 [7,]  1  9.259259e-06  11.58988651  108000  1.1664e+10
 [8,] -2 -1.878307e-05 -23.15160214 -213000 -2.2689e+10
 [9,]  0  0.000000e+00   0.00000000       0  0.0000e+00
[10,]  0 -4.761905e-07   0.04879016    5000  1.0250e+09
[11,]  0  9.090909e-07  -0.09531018  -10000 -2.1000e+09
[12,]  0 -9.090909e-07   0.09531018   10000  2.1000e+09
[13,]  2  1.869565e-05  23.16561287  215000  2.3225e+10

求解y = Xb中的线性回归系数向量b

solve(t(X) %*% X) %*% (t(X) %*% y)

在尝试反转 X'X 时导致奇点错误:

> solve(t(X) %*% X) %*% (t(X) %*% y)
Error in solve.default(t(X) %*% X) : 
  system is computationally singular: reciprocal condition number = 8.6658e-43

我不明白为什么尽管将singularity.ok = FALSE 设置为如下,但 stats::lm() 确实会报告结果

> df <- data.frame(y,X)
> 
> test.lm <- stats::lm(DLogPrice ~ DQ + DQInverseSize + DQLogSize + DQSize + DQSize2 -1, data=df, singular.ok = FALSE)
> summary(test.lm)

Call:
stats::lm(formula = DLogPrice ~ DQ + DQInverseSize + DQLogSize + 
    DQSize + DQSize2 - 1, data = df, singular.ok = FALSE)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2934 -1.6251  0.2413  1.0087  6.0501 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
DQ            -4.492e+07  1.956e+07  -2.297   0.0507 .
DQInverseSize  1.503e+11  6.497e+10   2.314   0.0494 *
DQLogSize      4.038e+06  1.759e+06   2.295   0.0509 .
DQSize        -3.606e+01  1.585e+01  -2.275   0.0524 .
DQSize2        5.353e-05  2.374e-05   2.255   0.0542 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.618 on 8 degrees of freedom
Multiple R-squared:  0.8371,    Adjusted R-squared:  0.7353 
F-statistic: 8.222 on 5 and 8 DF,  p-value: 0.005156

我在这里遗漏/误解了什么? 感谢您的想法。

【问题讨论】:

    标签: r regression singular


    【解决方案1】:

    在底层,R 的 lm.fit() 函数使用 QR 分解,使其能够更可靠地处理类似这样的近乎奇异的情况:

    y <- 
      tibble::tribble(
        ~DLogPrice,
        3.4232680,
        -1.0099196,
        0.7867983,
        -1.2224441,
        3.5718083,
        -0.4550516,
        1.6733032,
        1.6540079,
        0.6122239,
        -1.3530304,
        18.9058749,
        15.6916978,
        1.9088818
      )
    
    x <- tibble::tribble(
      ~DQ,  ~DQInverseSize,    ~DQLogSize,  ~DQSize,     ~DQSize2,
      2,  1.925926e-05,  23.10281197,  208000,  2.1664e+10,
      -2, -1.851852e-05, -23.17977301, -216000, -2.3328e+10,
      1,  9.259259e-06,  11.58988651,  108000,  1.1664e+10,
      -1, -8.000000e-06, -11.73606902, -125000, -1.5625e+10,
      2,  1.600000e-05,  23.47213803,  250000,  3.1250e+10,
      -1, -8.000000e-06, -11.73606902, -125000, -1.5625e+10,
      1,  9.259259e-06,  11.58988651,  108000,  1.1664e+10,
      -2, -1.878307e-05, -23.15160214, -213000, -2.2689e+10,
      0,  0.000000e+00,   0.00000000,       0,  0.0000e+00,
      0, -4.761905e-07,   0.04879016,    5000,  1.0250e+09,
      0,  9.090909e-07,  -0.09531018,  -10000, -2.1000e+09,
      0, -9.090909e-07,   0.09531018,   10000,  2.1000e+09,
      2,  1.869565e-05,  23.16561287,  215000,  2.3225e+10
    )
    
    qr.solve(cbind(1, x), y)
    #>             1            DQ DQInverseSize     DQLogSize        DQSize 
    #>  3.547059e+00 -1.853857e+07  6.137535e+10  1.668251e+06 -1.508519e+01 
    #>       DQSize2 
    #>  2.268808e-05
    coef(lm(as.matrix(y) ~ as.matrix(x)))
    #>               (Intercept)            as.matrix(x)DQ as.matrix(x)DQInverseSize 
    #>              3.547059e+00             -1.853857e+07              6.137535e+10 
    #>     as.matrix(x)DQLogSize        as.matrix(x)DQSize       as.matrix(x)DQSize2 
    #>              1.668251e+06             -1.508519e+01              2.268808e-05
    

    reprex package (v2.0.0) 于 2021-06-01 创建

    OLS solve(t(x) %*% x) %*% t(x) %*% y 的教科书定义在计算上非常低效,不是实现 OLS 的好方法。

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2013-12-17
      • 2013-06-20
      • 2020-09-09
      • 2022-08-16
      • 2013-10-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多