循环制作 lm() 模型时公式中的无效功率答案

【问题标题】：Invalid power in formula when looping to make lm() models循环制作 lm() 模型时公式中的无效功率
【发布时间】：2017-09-21 09:56:05
【问题描述】：

我正在尝试使用相同的数据制作不同的 lm 模型，但通过更改指数来显示更高阶的交互。我试图在 for 循环中进行此操作，而不是编写多行。但我收到“公式中的无效幂”错误。

for (m in 1:5){
  assign(paste("lm.", m, sep = ""), lm(paste("Response ~ (Factor1+Factor2+Factor3+Factor4+Factor5)^", m, sep = "")))
}

问：这是什么原因造成的，我该如何解决？

其次，我想粘贴 lm 函数中的因子数（Factor1+Factor2+...），因为因子数会发生变化。

由于上述错误，我也无法查看这是否有效。

我试过了：

for (m in 1:n.factors){
  # Make a string to paste in lm()
  assign(paste("lm.paste.", m, sep = ""), paste("Response ~ (", paste(factors, collapse="+"), ")^", m, sep = ""))

  # Paste in string to lm()
  assign(paste("lm.", m, sep = ""), lm(lm.paste.1, data=data.df))
}

其中“因子”是包含我的因子字符串的向量（“因子 1”、“因子 2”...）。而n.factors是因子的个数。

问：这是正确的做法吗？感觉有点麻烦。

也许有一些方法可以使用

lm(Response ~ ., data = data.df)

然后子集 .?

这里是My Data Frame。

编辑：

dput(head(data.df, 20))

structure(list(Factor1 = c(-1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 
-1, 1, -1, 1, -1, 1, -1, 1, -1, 1), Factor2 = c(-1, -1, 1, 1, 
-1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1), Factor3 = c(-1, 
-1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 
-1), Factor4 = c(-1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 
1, 1, 1, 1, -1, -1, -1, -1), Factor5 = c(-1, -1, -1, -1, -1, 
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1), Response = c(680.45, 
722.48, 702.14, 666.93, 703.67, 642.14, 692.98, 669.26, 491.58, 
475.52, 478.76, 568.23, 444.72, 410.37, 428.51, 491.47, 607.34, 
620.8, 610.55, 638.04), Order = c(17, 30, 14, 8, 32, 20, 26, 
24, 10, 16, 27, 18, 3, 19, 31, 15, 12, 1, 4, 23)), .Names = c("Factor1", 
"Factor2", "Factor3", "Factor4", "Factor5", "Response", "Order"
), row.names = c(NA, 20L), class = "data.frame")

【问题讨论】：

避免使用assign，阅读lapply，将输出保存在列表中。
有关在所有组合上运行 lm 的相关/可能重复的帖子：stackoverflow.com/questions/28606549/…
您可以查看many models section in R for Data Science 以安装多个模型
您应该考虑使用poly() 而不是^。

标签： r lm

【解决方案1】：

您的问题只是 1 不是lm 中的幂的可接受值：

> lm(mpg ~ cyl^1, mtcars)
Error in terms.formula(formula, data = data) : invalid power in formula

如果您将迭代器更改为使用 2:5，那么它将运行：

> for (m in 2:5) {lm(paste0("mpg ~ cyl^",m), mtcars);print("ok")}
[1] "ok"
[1] "ok"
[1] "ok"
[1] "ok"
> for (m in 1:5) tryCatch({lm(paste0("mpg ~ cyl^",m), mtcars);print("ok")}, error = function(e) warning(e))
[1] "ok"
[1] "ok"
[1] "ok"
[1] "ok"
Warning message:
In terms.formula(formula, data = data) : invalid power in formula

您可以编写一些代码，如果值为 1，则返回空字符串，如果值不是 1，则返回“^m”，如果必须在公式中包含 ^1 版本，则将其连接到公式的末尾循环。

为了回答您的第二个问题，我强烈建议从不在此类工作中使用assign。一旦你的 lm 对象在全局环境中，就很难让它们回来用它们做更多的事情。如果你想将你的 m 扩展到 20，或者你想为任何给定的 m 泛化你的代码，这将是非常困难的。

更好的选择是将 lm 对象存储在列表中。您可以通过将 for 循环更改为对 lapply 的调用来非常简单地做到这一点：

> results <- lapply(2:5, function(x) lm(paste0("mpg ~ cyl^",x), mtcars))
> str(results)
List of 4
 $ :List of 12
  ...snip...
 $ :List of 12
  ...snip... 
 $ :List of 12
  ...snip...
 $ :List of 12
  ...snip...

现在您可以遍历您的列表，一次性对每个模型执行“某些操作”：

> purrr::map(results, function(x) x$coefficients)
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[3]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[4]]
(Intercept)         cyl 
   37.88458    -2.87579

这可以推广到您想应用于 lm 对象列表、任何 m 或实际上任何 lm 对象列表甚至任何类型对象列表的任何函数，例如：

> purrr::map(results, broom::tidy)
[[1]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[2]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[3]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

[[4]]
         term estimate std.error statistic      p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2         cyl -2.87579 0.3224089 -8.919699 6.112687e-10

现在您有了一个数据框列表，其中每个数据框都包含有关模型的一些有趣信息。例如，您可以使用purrr::map 依次检查每个数据框中的 p 值并对其进行处理，或者创建 4 个拟合回归线图，或者做任何您喜欢的事情。

【讨论】：

感谢您的清晰解释！我通过添加一个 if() 语句解决了第一个问题，其中 if(m ==1) 将创建一个没有指数的 lm 模型。现在我将看第二部分。再次感谢！