Find pattern only until first occurrence of another pattern (or: how to remove random effects from a formula of mixed effects models): answers

【Question title】: Find pattern only until first occurrence of another pattern (or: how to remove random effects from a formula of mixed effects models)
【Posted】: 2019-04-30 10:15:12
【Question】:

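I would like to extract information from model formulas. In particular, I want to remove the random effects in order to get the "fixed effects part" of a mixed model (lme4 notation).

To do this, I search the formula for the last + that comes before a parenthesis (. Everything before that + must be the "fixed" part of the formula. This works for models that have fixed-effect predictors/variables.

However, for null models (only an intercept in the fixed effects) there may be no +, e.g. if the formula is Reaction ~ (Days | Subject). In that case, I check that there is no + sign. But this does not work for models with multiple random parts. In the examples below, grepl() for f2 should return FALSE, but it returns TRUE, because a + is found before the opening parenthesis of the second random part.

My question: how can I stop checking for a + after the first (, and thereby ignore a possible second or third random-effect term? The goal for the examples below is that the grepl() commands return FALSE, FALSE, TRUE, TRUE.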

f1 <- "Reaction ~ (1 + Days | Subject)"
f2 <- "Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)"
f3 <- "Reaction ~ x1 + x2 + (1 + Days | Subject)"
f4 <- "Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)"

# works!
grepl("\\+(\\s)*\\((.*)\\)", f1) # should return FALSE
#> [1] FALSE

# fails...
grepl("\\+(\\s)*\\((.*)\\)", f2) # should return FALSE
#> [1] TRUE

# works!
grepl("\\+(\\s)*\\((.*)\\)", f3) # should return TRUE
#> [1] TRUE

# works!
grepl("\\+(\\s)*\\((.*)\\)", f4) # should return TRUE
#> [1] TRUE
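
For reference, a minimal sketch of the literal regex request ("only look for a + before the first ("). It assumes the fixed part itself contains no parentheses, so terms such as log(x1) or I(x^2) would break it. The helper names has_fixed and fixed_text are made up for illustration.

```r
f1 <- "Reaction ~ (1 + Days | Subject)"
f2 <- "Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)"
f3 <- "Reaction ~ x1 + x2 + (1 + Days | Subject)"
f4 <- "Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)"

# "^[^(]*\\+" reads: from the start of the string, any run of characters
# that are not "(", followed by a literal "+". A "+" that only occurs
# after the first "(" can therefore never match.
has_fixed <- function(f) grepl("^[^(]*\\+", f)

res <- has_fixed(c(f1, f2, f3, f4))
res
#> [1] FALSE FALSE  TRUE  TRUE

# Hypothetical companion: cut the string at the first "(" and drop the
# "+" (and whitespace) that directly precedes it.
fixed_text <- function(f) trimws(sub("\\s*\\+?\\s*\\(.*$", "", f))
fixed_text(f3)
#> [1] "Reaction ~ x1 + x2"
```

For the null models f1 and f2, fixed_text() leaves just "Reaction ~", so the intercept would still have to be appended by hand.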

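【Comments】:

By the way: can anyone elaborate on why grepl("\\~[:space:]*[:alnum:]+[:space:]*\\+", f3) returns FALSE while stringr::str_detect(f3, "\\~[:space:]*[:alnum:]+[:space:]*\\+") returns TRUE?
@Roman: Try grepl("\\~[[:space:]]*[[:alnum:]]+[[:space:]]*\\+", f3)
FFS. @AkselA Thanks. I thought I was going crazy.
@Roman: Nah, it's just R being its special self. ;)
Since the question ultimately is not about pattern matching, it might be appropriate to change the title so that others who are trying to remove random effects from a formula can find it.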
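
A small sketch of what is going on in the [:space:] exchange above, using base R only. In base R's default regex engine a POSIX class is only recognised inside a bracket expression, so a bare "[:space:]" is read as an ordinary character set. As far as I know, stringr goes through ICU, which accepts the single-bracket form, and that would explain the different result.

```r
f3 <- "Reaction ~ x1 + x2 + (1 + Days | Subject)"

# A bare "[:space:]" is a bracket expression matching any ONE of the
# characters ":", "s", "p", "a", "c", "e", not whitespace.
single <- grepl("[:space:]", "x y")     # "x y" has none of those characters
double <- grepl("[[:space:]]", "x y")   # the blank matches the POSIX class
c(single, double)
#> [1] FALSE  TRUE

# The two patterns from the comments
bad  <- grepl("\\~[:space:]*[:alnum:]+[:space:]*\\+", f3)
good <- grepl("\\~[[:space:]]*[[:alnum:]]+[[:space:]]*\\+", f3)
c(bad, good)
#> [1] FALSE  TRUE
```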

Tags: r regex lme4

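【Solution 1】:

This does not really answer your question from a regex point of view (there may well be an answer), but if your goal is to extract the random-effects and/or fixed-effects formula, you may gain more from looking at the source code of glFormula and lFormula in the lme4 package itself. Since they create the design matrices X and Z for the fixed and random effects respectively, they must extract the respective parts at some point.

For example, to extract the fixed effects, the functions nobars and RHSForm are used: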

library(lme4)
f1 <- Reaction ~ (1 + Days | Subject)
f2 <- Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)
f3 <- Reaction ~ x1 + x2 + (1 + Days | Subject)
f4 <- Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)
(f1FixedEffects <- nobars(lme4:::RHSForm(f1))) # note the triple colon in 'lme4:::'; RHSForm is not exported to the public environment
#> [1] 1
(f2FixedEffects <- nobars(lme4:::RHSForm(f2)))
#> [1] 1
(f3FixedEffects <- nobars(lme4:::RHSForm(f3)))
#> x1 + x2
(f4FixedEffects <- nobars(lme4:::RHSForm(f4)))
#> x1 + x2

If you want to extract the whole formula, you can use

lme4:::RHSForm(f1) <- nobars(lme4:::RHSForm(f1))
f1
#> Reaction ~ 1

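or, equivalently (thanks to AkselA's comment),

nobars(f1)
#> Reaction ~ 1

for the fixed effects.

Note that I converted your string formulas to formulas. This can also be done with 'as.formula()'.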
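
A short sketch of that string-to-formula step, assuming lme4 is installed and that the formulas arrive as character strings as in the question. The variable names f4_chr and fixed are only illustrative.

```r
library(lme4)

# A formula stored as a character string, as in the question
f4_chr <- "Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)"

# Convert the string into a formula object
f4 <- as.formula(f4_chr)
inherits(f4, "formula")

# Strip the random-effect ("bar") terms from the whole formula
fixed <- nobars(f4)
fixed

# Variables that remain after the random effects are removed
all.vars(fixed)

# If a string is needed again afterwards
deparse(fixed)
```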

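【Discussion】:

I think you can use nobars() directly on the formula, without isolating the right-hand side.
Indeed. I just ended up copying from the source without thinking. I'll edit my answer and make a note of it, thanks. :-)
Thanks! I just looked at the code and saw that nobars() simply calls itself again and again with the second part of the formula (taken from what was originally the third part). So what nobars() ends up doing is f4[[3]][[2]][[2]] (with one more [[2]] for each further random effect).
@Daniel: I saw that too, it calls the recursive function nobars_() to scan the tree for |s. Pretty neat.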
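
To make the f4[[3]][[2]][[2]] remark concrete, here is a base R sketch of how a formula can be indexed as a nested call. has_bar() is a hypothetical helper written for this illustration; it is not lme4's nobars_() and only mimics the idea of scanning the tree for |.

```r
f4 <- Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)

# A two-sided formula is a call to `~`:
# [[1]] is `~`, [[2]] the response, [[3]] the right-hand side.
rhs <- f4[[3]]

# `+` is left-associative, so the LAST term sits in [[3]] and
# everything before it in [[2]].
last_term  <- rhs[[3]]        # the (1 | Subject) term
fixed_part <- rhs[[2]][[2]]   # what is left after peeling off two terms
deparse(fixed_part)
#> [1] "x1 + x2"

# Hypothetical helper: TRUE if an expression contains a call to `|`
has_bar <- function(expr) {
  if (!is.call(expr)) return(FALSE)
  if (identical(expr[[1]], as.name("|"))) return(TRUE)
  any(vapply(as.list(expr)[-1], has_bar, logical(1)))
}

has_bar(last_term)
#> [1] TRUE
has_bar(fixed_part)
#> [1] FALSE
```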

【Solution 2】:

Oliver's answer is the right one, especially if you are already using lme4, but base R also has a framework for modifying formulas that can be used here.

# Is read as class formula
f4 <- Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)

# Isolate the terms and find which contains a vertical bar
f4t <- terms(f4)
dr <- grep("|", labels(f4t), fixed=TRUE)

# Drop the term(s) containing a vertical bar
f4td <- drop.terms(f4t, dr)

# Update the old formula with the new set of terms
f4u <- update(f4, f4td)

# Voilà
f4u
# Reaction ~ x1 + x2
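
To see what the grep() step operates on, a small sketch that prints the intermediate objects. Note that fixed=TRUE matters here: as a regex, "|" is an alternation of two empty patterns and would match every label.

```r
f4 <- Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)

# terms() expands the right-hand side; labels() returns one string per term.
# Here that is four labels: x1, x2 and the two random-effect terms.
term_labels <- labels(terms(f4))
term_labels

# Positions of the labels that contain a literal "|"
dr <- grep("|", term_labels, fixed = TRUE)
dr
#> [1] 3 4
```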

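As noted in the comments, this fails in two specific cases: when all effects are random, and when no effect is random. To handle these exceptions properly, I figured it was best to write a proper function while I was at it.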

drop_randfx <- function(form) {
    form.t <- terms(form)
    dr <- grepl("|", labels(form.t), fixed=TRUE)
    if (mean(dr) == 1) {
        form.u <- update(form, . ~ 1)
    } else {
        if (mean(dr) == 0) {
            form.u <- form
        } else {
            form.td <- drop.terms(form.t, which(dr))
            form.u <- update(form, form.td)
        }
    }
    form.u
}

This passes all the tests:

f1 <- Reaction ~ (1 + Days | Subject)
f2 <- Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)
f3 <- Reaction ~ x1 + x2 + (1 + Days | Subject)
f4 <- Reaction ~ x1 * x2 + (1 | mygrp/mysubgrp) + (1 | Subject)
f5 <- Reaction ~ x1 + x2

sapply(list(f1, f2, f3, f4, f5), drop_randfx)

# [[1]]
# Reaction ~ 1
# 
# [[2]]
# Reaction ~ 1
# 
# [[3]]
# Reaction ~ x1 + x2
# 
# [[4]]
# Reaction ~ x1 + x2 + x1:x2
# 
# [[5]]
# Reaction ~ x1 + x2

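【Discussion】:

While I like this answer because it relies on base R, I accepted Oliver's answer because it also works for f1 and f2. drop.terms() fails there; maybe that exception can be caught somehow.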
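
One possible way around the drop.terms() problem, sketched here as an alternative rather than as the answer's method: rebuild the formula from the labels that are kept, using reformulate(). The function name fixed_part is made up, and the sketch assumes a two-sided formula without offsets or an explicitly removed intercept.

```r
# Hypothetical alternative to drop.terms(): keep the labels without "|"
# and rebuild the formula, falling back to an intercept-only model.
fixed_part <- function(form) {
  labs <- labels(terms(form))
  keep <- labs[!grepl("|", labs, fixed = TRUE)]
  if (length(keep) == 0) keep <- "1"       # all effects were random
  reformulate(keep, response = form[[2]])  # form[[2]] is the response
}

f1 <- Reaction ~ (1 + Days | Subject)
f4 <- Reaction ~ x1 * x2 + (1 | mygrp/mysubgrp) + (1 | Subject)
f5 <- Reaction ~ x1 + x2

deparse(fixed_part(f1))
#> [1] "Reaction ~ 1"
deparse(fixed_part(f4))
#> [1] "Reaction ~ x1 + x2 + x1:x2"
deparse(fixed_part(f5))
#> [1] "Reaction ~ x1 + x2"
```

Because no terms are ever dropped from a terms object, the all-random and no-random cases need no special branch beyond the intercept fallback.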