Linear regression in R without column names

【Question title】: Linear Regression in R without names of column
【Posted】: 2017-03-24 16:43:13
【Question】:

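I have a data frame with an unknown number of columns, and each column has its own specific name. I want to run a linear regression of the first column on the other columns without using the column names. Can I do that?

For example, with the mtcars dataset: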

fit1 <- lm(mtcars$mpg ~ ., mtcars)

but I want to do this without writing mtcars$mpg.

Thanks for your help!

【Comments】:

fit1 <- lm(mtcars[,1] ~ ., mtcars) ?
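OK, but what if I want to do fit2
That is a different question. Your question specifically says "between first column and others", and Ryan Morton told you how to do that. You can also edit the data argument to exclude the columns you don't want, e.g. I(fit1$residuals^2) ~ ., data = mtcars[, -1]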
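
A minimal sketch of the positional approach from the comments, with one caveat made explicit: as far as I can tell, in lm(mtcars[,1] ~ ., mtcars) the "." still expands to every column of mtcars, so the response also ends up among the predictors. Dropping the first column from data avoids that. The names dat, y, fit_pos and fit_named are illustrative only, and the reformulate() variant assumes the column names are syntactically valid.

```r
# Regress the first column on all remaining columns, by position only.
dat <- mtcars                      # stand-in for a data frame with unknown names

y <- dat[[1]]                      # first column as the response vector
fit_pos <- lm(y ~ ., data = dat[-1])   # "." now expands to columns 2..n only

# Alternative: build the formula from whatever names the columns happen to have.
f <- reformulate(names(dat)[-1], response = names(dat)[1])
fit_named <- lm(f, data = dat)

# Both fits should give the same coefficients.
all.equal(coef(fit_pos), coef(fit_named))
```

The second variant keeps the real column names in the call and in the model terms, which tends to make predict() on new data easier to use later.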

Tags: r linear-regression lm

【Solution 1】:

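Looking at the source code can be daunting, but it is very informative. After you work through all the model.matrix and other massaging that lm does, you will see that it calls lm.fit. If you look at the help for lm.fit, after the warning that it should usually _not_ be used directly unless by experienced users, you can find out what is going on by debugging it and watching how lm calls lm.fit.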

debugonce(lm.fit)
lm(mtcars$mpg ~ mtcars$cyl, mtcars)
# debugging in: lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)

# Browse[2]> 
head(x)
#                   (Intercept) mtcars$cyl
# Mazda RX4                   1          6
# Mazda RX4 Wag               1          6
# Datsun 710                  1          4
# Hornet 4 Drive              1          6
# Hornet Sportabout           1          8
# Valiant                     1          6

# Browse[2]> 
head(y)
#         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
#              21.0              21.0              22.8              21.4 
# Hornet Sportabout           Valiant 
#              18.7              18.1 

# Browse[2]>
c    
# Call:
# lm(formula = mtcars$mpg ~ mtcars$cyl, data = mtcars)
# Coefficients:
# (Intercept)   mtcars$cyl  
#      37.885       -2.876

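From this, y is just a vector of the response variable (mtcars$mpg), and x is a matrix containing all of the explanatory variables plus a leading column of ones for the intercept. (I'll leave it as an exercise for the reader to add other columns.)

Calling it directly might look like: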

mod <- lm.fit(cbind(Intercept = 1, mtcars$cyl), mtcars$mpg)
str(mod)
# List of 8
#  $ coefficients : Named num [1:2] 37.88 -2.88
#   ..- attr(*, "names")= chr [1:2] "Intercept" ""
#  $ residuals    : num [1:32] 0.37 0.37 -3.58 0.77 3.82 ...
#  $ effects      : Named num [1:32] -113.65 -28.6 -3.7 0.71 3.82 ...
#   ..- attr(*, "names")= chr [1:32] "Intercept" "" "" "" ...
#  $ rank         : int 2
#  $ fitted.values: num [1:32] 20.6 20.6 26.4 20.6 14.9 ...
#  $ assign       : NULL
#  $ qr           :List of 5
#   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
#   .. ..- attr(*, "dimnames")=List of 2
#   .. .. ..$ : NULL
#   .. .. ..$ : chr [1:2] "Intercept" ""
#   ..$ qraux: num [1:2] 1.18 1.02
#   ..$ pivot: int [1:2] 1 2
#   ..$ tol  : num 1e-07
#   ..$ rank : int 2
#   ..- attr(*, "class")= chr "qr"
#  $ df.residual  : int 30
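
One way the "other columns" exercise might go, sketched under the assumption that every column is numeric (factor columns would need model.matrix to build dummy variables first). The names X, y, mod_all and ref are illustrative, and ref uses the column name mpg only to cross-check the result.

```r
# Build the design matrix by position: intercept column plus columns 2..n.
X <- cbind(Intercept = 1, as.matrix(mtcars[-1]))
y <- mtcars[[1]]                   # first column is the response

mod_all <- lm.fit(X, y)

# Cross-check against the ordinary formula interface.
ref <- lm(mpg ~ ., data = mtcars)
all.equal(unname(mod_all$coefficients), unname(coef(ref)))
```

Because as.matrix() keeps the column names, the coefficients in mod_all are labelled, unlike the single-predictor call above where the second name is empty.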

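One difference is that the output of lm() is of class "lm", whereas the output of lm.fit is a plain "list". Again, looking at the source of lm, you can see that it takes the output of lm.fit and applies a class and some other attributes to the list. (These attributes are often used by cosmetic functions such as summary, but may be used by other helper functions as well.)

Simply assigning a class will improve how it prints on the console, but you will need to test whether your follow-on calculations or analysis need those attributes and the class to be applied.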

class(mod) <- "lm"
mod
# Call:
# NULL
# Coefficients:
# Intercept             
#    37.885     -2.876

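Caveat emptor: there are several cleaning and massaging steps in lm that are completely bypassed here. By using this function directly, you are removing the "safety" of the wrapper function that users expect.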
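
A small check of what that bypass costs in practice, as a sketch rather than an exhaustive list. In the R versions I am aware of, summary() on an "lm" object requires a terms component, which lm.fit does not produce. The names mod2, full, missing_parts and res are illustrative.

```r
# Re-create the hand-built fit and give it the "lm" class.
mod2 <- lm.fit(cbind(Intercept = 1, cyl = mtcars$cyl), mtcars$mpg)
class(mod2) <- "lm"

# Components a real lm() result carries that the hand-built object lacks.
full <- lm(mpg ~ cyl, data = mtcars)
missing_parts <- setdiff(names(full), names(mod2))
missing_parts

# summary() is expected to fail, since there is no 'terms' component.
res <- try(summary(mod2), silent = TRUE)
inherits(res, "try-error")
```

If you need standard errors, predictions, or anova tables, going through lm() with a positionally built formula or data argument is likely the less fragile route.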

【Comments】: