GLMMadaptive (R) 中的零膨胀两部分模型：固定效应零部分的方差分析？答案

【问题标题】：Zero-inflated two-part models in GLMMadaptive (R): anova on fixed effects zero-part?GLMMadaptive (R) 中的零膨胀两部分模型：固定效应零部分的方差分析？
【发布时间】：2021-02-10 18:39:13
【问题描述】：

我正在使用 R 中的 GLMMadaptive 包运行障碍对数正态模型。连续部分和零部分都在固定效应中定义了分类变量。我想对这些分类变量进行方差分析，以检测是否存在主效应。

我已经看到，使用 glmmTMB 包，您可以分别在条件模型和零部分模型上分别运行 ANOVA，如 here 所示。

对于 GLMMadaptive 包是否有类似的策略可用？（据我所知，glmmTMB 不支持障碍对数正态模型）。也许使用 emmeans 包中的 joint_tests 函数？如果是这样，您如何定义要测试零部分模型？因为emmeans::joint_tests(hurdlemodel) 只给出了模型条件部分的 F 检验。

或者作为一种替代方法，您能否将排除感兴趣变量的模型与完整模型的拟合度进行比较，正如this vignette 中随机效应的相关性所证明的那样？

非常感谢！

Russ Lenth 在 cmets 中的建议在下面实现，使用 GLMMadaptive two-part model vignette 中的数据和模型：

library(GLMMadaptive)
library(emmeans)

# data generating code from the vignette:
{
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)

betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.4 # variance of random intercepts zero part

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
}

#create categorical time variable
DF$time_categorical[DF$time<2.5] <- "early"
DF$time_categorical[DF$time>=2.5] <- "late"
DF$time_categorical <- as.factor(DF$time_categorical)

#model with interaction in fixed effects zero part and adding nesting in zero part as in model above
km3 <- mixed_model(y ~ sex * time_categorical, random = ~ 1 | id, data = DF, 
                   family = hurdle.lognormal(), n_phis = 1,
                   zi_fixed = ~ sex * time_categorical, zi_random = ~ 1 | id)

#### ATTEMPT at QDRG function in emmeans ####

coef_zero_part <- fixef(km3, sub_model = "zero_part")
vcov_zero_part <- vcov(km3)[9:12,9:12]

qd_km3 <- emmeans::qdrg(formula = ~ sex * time_categorical, data = DF,
coef = coef_zero_part, vcov = vcov_zero_part)

输出：

> joint_tests(qd_km3)
 model term           df1 df2 F.ratio p.value
 sex                    1 Inf  11.509 0.0007 
 time_categorical       1 Inf   0.488 0.4848 
 sex:time_categorical   1 Inf   1.077 0.2993 

> emmeans(qd_km3, pairwise ~ sex|time_categorical)
$emmeans
time_categorical = early:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.592 0.201 Inf     -1.99    -1.198
 female -1.035 0.187 Inf     -1.40    -0.669

time_categorical = late:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.914 0.247 Inf     -2.40    -1.429
 female -0.972 0.188 Inf     -1.34    -0.605

Confidence level used: 0.95 

$contrasts
time_categorical = early:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.557 0.270 Inf -2.064  0.0390 

time_categorical = late:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.942 0.306 Inf -3.079  0.0021

检查对比是否与零部分固定效果相对应：

> fixef(km3, sub_model = "zero_part")
                   (Intercept)                      sexfemale           time_categoricallate sexfemale:time_categoricallate 
                    -1.5920415                      0.5568072                     -0.3220390                      0.3849780 

> (-1.5920) - (-1.5920 + 0.5568)
[1] -0.5568 #matches contrast within "early" level of "time_categorical"
> (-1.5920 + -0.3220) - (-1.5920 + -0.3220  + 0.5568 + 0.3850)
[1] -0.9418 #matches contrast within "late" level of "time_categorical"

【问题讨论】：

您也许可以使用emmeans::qdrg() 来创建所需的对象。请参阅其文档。您显然无法使用 object 参数。您需要为模型的条件部分或零部分指定数据、固定效应公式，以及相关模型部分的相关回归系数和 vcov 矩阵。对于后者，您可能必须选择系数和协方差矩阵的子集。
感谢您的建议！我一直在尝试，但它似乎还没有工作（可能是由于我缺乏洞察力......）。我在上面的帖子中添加了我的代码。
看起来基本没问题。尝试列出您用于 coef 和 vcov 的内容，并确保其中没有多余的内容。
它仍然不起作用，尽管vcov 元素确实对零部分没有选择性。我现在将其编辑为一个 4x4 矩阵，其中包含行/列“zi_(Intercept)”、“zi_sexfemale”、“zi_time_categoricallate”和“zi_sexfemale:time_categoricallate”。 coef 元素是一个包含 4 列“(Intercept)”、“sexfemale”、“time_categoricallate”和“sexfemale:time_categoricallate”的矩阵，id 的每个嵌套都有一行。列的命名是否重要？还是可能有其他问题？谢谢！
为什么你的 coef 是矩阵？它应该是一个长度为 4 的向量。

标签： r anova mixed-models

【解决方案1】：

函数emmeans::qdrg() 有时可用于为emmeans 不直接支持的模型创建所需的对象。请参阅其文档。在非常简单的模型中（例如，从 lm 继承，提供 object 和 data 参数可能就足够了。

这通常不适用于更复杂的模型，在这种情况下您需要为模型的条件部分或零部分指定data，固定效应formula，以及相关的回归系数（coef）和方差-协方差矩阵（vcov）的部分有问题的模型。通常对于具有多个组件的此类模型，您可能必须选择系数和协方差矩阵的子集。这些都必须符合：coef 的长度必须等于vcov 的行数和列数以及formula 生成的模型矩阵中的列数[可以通过model.matrix(formula, data = data) 进行检查]。

qdrg()不会适用于多变量模型 - 或者至少它很棘手 - 因为隐含模型涉及描述多变量响应水平的其他因素。如果有特殊规定，例如样条平滑，那是 qdrg() 可能无法工作的另一个实例。

一旦qdrg() 实际运行并产生结果，最好使用它来估计模型参数化估计的一些对比。例如，假设模型安装了默认的contr.treatment 对比。然后回归系数可以解释为与作为参考水平的第一水平的比较。因此，如果我们计算了rg <- qdrg(...)，其中一个因素是"treat"，请查看contrast(rg, "trt.vs.ctrl1", simple = "treat")，并检查估计的对比组第一组是否与主效应估计相匹配treat.

我将用一个简单的 lm 模型来说明所有这些，忽略 emmeans 已经支持它的事实。

> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

这是参考网格

> rg <- qdrg(~ wool * tension, coef = coef(warp.lm), vcov = vcov(warp.lm),
+     df = df.residual(warp.lm), data = warpbreaks)

这是一个健全性检查——首先，看一下模型摘要：

> summary(warp.lm)$coef
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     44.55556   3.646761 12.217842 2.425903e-16
woolB          -16.33333   5.157299 -3.167032 2.676803e-03
tensionM       -20.55556   5.157299 -3.985721 2.280796e-04
tensionH       -20.00000   5.157299 -3.877999 3.199282e-04
woolB:tensionM  21.11111   7.293523  2.894501 5.698287e-03
woolB:tensionH  10.55556   7.293523  1.447251 1.543266e-01

其次，看选定的对比：

> contrast(rg, "trt.vs.ctrl1", simple = "wool")
tension = L:
 contrast estimate   SE df t.ratio p.value
 B - A      -16.33 5.16 48 -3.167  0.0027 

tension = M:
 contrast estimate   SE df t.ratio p.value
 B - A        4.78 5.16 48  0.926  0.3589 

tension = H:
 contrast estimate   SE df t.ratio p.value
 B - A       -5.78 5.16 48 -1.120  0.2682 

> contrast(rg, "trt.vs.ctrl1", simple = "tension")
wool = A:
 contrast estimate   SE df t.ratio p.value
 M - L     -20.556 5.16 48 -3.986  0.0005 
 H - L     -20.000 5.16 48 -3.878  0.0006 

wool = B:
 contrast estimate   SE df t.ratio p.value
 M - L       0.556 5.16 48  0.108  0.9863 
 H - L      -9.444 5.16 48 -1.831  0.1338 

P value adjustment: dunnettx method for 2 tests

与回归系数相比，我们确实确认 wool 的第一个对比度估计为 -16.33，与 woolB 的回归系数匹配。此外，tension 的第一组对比估计为 -20.556 和 -20.0，与 tensionM 和 tensionH 的回归系数相匹配。 SE 和 t 比率也匹配。（由于多重性调整，第二组的 P 值不匹配。）

【讨论】：