R: bootstrapped mixed model binary logistic regression

【Question title】: R: bootstrapped mixed model binary logistic regression
【Posted】: 2013-08-27 23:46:53
【Question】:

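I need to bootstrap my mixed-model binary logistic regression. The model itself runs fine (and has been approved and corrected by expert friends), but the bootstrapped version is buggy. The bootstrapped version was also approved earlier by another expert friend (on CrossValidated, but the mods later deleted my post, saying it did not belong on CrossValidated). The very same code does work for a simple fixed-effects multiple logistic regression (although in that case, too, there are many warnings similar to the ones here, except for this single warning, which is specific to the lmer() function: "In mer_finalize(ans) : false convergence (8)").

Could you please tell me where the error is and how to debug it?

Many thanks.

My code is (I have kept the number of replicates very low for now, while I debug the code):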

library(boot)
library(lme4)

mixedGLM <- function(formula, data, indices) {
        d <- data[indices, ]
        (fit <- lmer(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID)
                     , family=binomial(logit), d))
        return(coef(fit))
      }

results <- boot(data=MixedModelData4 , statistic = mixedGLM, R= 2, formula= DV~Demo1 +Demo2 +Demo3 +Demo4 +Trt)

... and the error I get is:

Error in t.star[r, ] <- res[[r]] : 
  incorrect number of subscripts on matrix
In addition: Warning messages:
1: In mer_finalize(ans) : false convergence (8)
2: glm.fit: algorithm did not converge 
3: glm.fit: fitted probabilities numerically 0 or 1 occurred 
4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
5: In mer_finalize(ans) : false convergence (8)
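
A likely cause of the "incorrect number of subscripts on matrix" error, offered as an assumption rather than a confirmed diagnosis: boot() expects the statistic to return a plain numeric vector of fixed length, whereas coef() on a fitted mixed model returns a list-like object of per-group coefficient tables. The sketch below returns fixef(fit) instead and actually uses the formula argument, which mixedGLM above ignores. It assumes lme4 1.0 or later (where binomial models are fitted with glmer() rather than lmer()); the toy data frame and the name fixef_stat are hypothetical stand-ins for MixedModelData4 and mixedGLM.

```r
library(boot)
library(lme4)

set.seed(1)
# Hypothetical toy data: 100 patients, each observed under placebo (0) and drug (1)
n_pat <- 100
toy <- data.frame(
  PatientID = factor(rep(seq_len(n_pat), each = 2)),
  Trt       = rep(c(0, 1), times = n_pat),
  Demo1     = rep(rnorm(n_pat), each = 2)
)
u <- rep(rnorm(n_pat, sd = 0.5), each = 2)  # patient-level random intercepts
toy$DV <- rbinom(nrow(toy), 1, plogis(-0.2 + 0.8 * toy$Trt + 0.5 * toy$Demo1 + u))

# Statistic for boot(): refit on the resampled rows and return a numeric vector
fixef_stat <- function(data, indices, formula) {
  d <- data[indices, ]
  fit <- glmer(formula, data = d, family = binomial(link = "logit"))
  fixef(fit)  # named numeric vector of fixed-effect estimates
}

# R is kept tiny here only so the sketch runs quickly
results <- boot(data = toy, statistic = fixef_stat, R = 5,
                formula = DV ~ Demo1 + Trt + (1 | PatientID))
```

Note that resampling individual rows ignores the grouping by PatientID; for clustered data, resampling whole patients or using a parametric bootstrap may be more appropriate.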

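... Also, can you tell me how to make the boot() function report p-values as well? It only gives the betas, SEs, bias, and CIs, but I need p-values too.

Many thanks.

---------------------------------------- Developing story ----------------------------------------

OK, I was glad to run Henrik's nice code. But the code did not run all the way through. First it gave this error: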

Fitting 17 lmer() models:
[...
Error: pwrssUpdate did not converge in 30 iterations
In addition: Warning message:
In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (1 | PatientID) +  :
  Due to missing values, reduced number of observations to 90
> (results2 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2
+ results3 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2

Then I removed the first parenthesized term, (1 | PatientID), and changed the call to this:

results3 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                 + (0 + Trt | PatientID),
                 family=binomial(logit), data = MixedModelData4,
                 method = "PB", args.test = list(nsim = 2))

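This time it got past the first step (fitting the models) but failed while obtaining the p-values, again giving the same error, plus warnings: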

Fitting 17 lmer() models:
[.................]
Obtaining 16 p-values:
[....
Error: pwrssUpdate did not converge in 30 iterations
In addition: Warning messages:
1: In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (0 + Trt |  :
  Due to missing values, reduced number of observations to 90
2: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
3: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
4: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
5: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
6: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations

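I don't know how to debug this. Or is the problem my data set? I should add that my data set is fully mean-centered (all variables). Only the DV is negated instead (because mean-centering it does not let R work, and negation serves just as well for a binary outcome).

---------------------------------------- Update ----------------------------------------

I changed method from "PB" to "LRT" (as Henrik suggested). The model fitting completed, but the step that obtains the p-values never started: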

> results4 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
+                   + (0 + Trt | PatientID),
+                   family=binomial(logit), data = MixedModelData4,
+                   method = "LRT", args.test = list(nsim = 2))
Fitting 17 lmer() models:
[.................]
Warning message:
In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (0 + Trt |  :
  Due to missing values, reduced number of observations to 90

It turns out that with LRT, the p-values are not obtained by bootstrapping. So the results were already there (though not bootstrapped).

【Discussion】:

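Many thanks. Are you referring to this statement: "Results talkstats.com/showthread.php/…
Note that I don't have true repeated measures, but pseudoreplication. My patients are repeated in the long-format data set. So I don't know whether there is a real correlation between the repeated measures or a 100% pseudo-correlation. My Demo variables are at the patient level, but Trt (treatment) is at the treatment level, repeated for each patient who received both the real drug and the placebo... For details, please check that link.
Hmm, I see, many thanks. I agree that it doesn't define the "formula" argument, but I adapted this function from a website, and it does work for a fixed-effects binary logit. Maybe it doesn't work properly in that case either. But the website is legitimate. I have described the function and the website in detail here: talkstats.com/showthread.php/…
But I will try running the model on the wide version (which I also have, though I don't know exactly how to distinguish treatment from placebo in wide format, because there the two are separate columns rather than two levels of a single variable, treatment).
A few comments: (1) it would be a very good idea to look at the development (soon to be released) version of lme4, which has some built-in facilities [bootMer and confint(..., method="boot")]; (2) the refit() function can speed things up considerably; (3) it is very common to see failures in some bootstrap replicates.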
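
The built-in facilities mentioned in the last comment can be sketched as follows. This is a minimal illustration, assuming lme4 1.0 or later; the toy data and the small nsim are hypothetical choices made only so the example runs quickly, and a real analysis would use far more replicates.

```r
library(lme4)

set.seed(2)
# Hypothetical toy data: 100 patients, each observed under placebo (0) and drug (1)
n_pat <- 100
toy <- data.frame(
  PatientID = factor(rep(seq_len(n_pat), each = 2)),
  Trt       = rep(c(0, 1), times = n_pat),
  Demo1     = rep(rnorm(n_pat), each = 2)
)
u <- rep(rnorm(n_pat, sd = 0.5), each = 2)  # patient-level random intercepts
toy$DV <- rbinom(nrow(toy), 1, plogis(-0.2 + 0.8 * toy$Trt + 0.5 * toy$Demo1 + u))

fit <- glmer(DV ~ Demo1 + Trt + (1 | PatientID),
             data = toy, family = binomial(link = "logit"))

# Parametric bootstrap: simulate responses from the fitted model, refit,
# and collect the fixed effects from each replicate
pb <- bootMer(fit, FUN = fixef, nsim = 20, seed = 101)

# Bootstrap confidence intervals for the fixed effects only ("beta_")
ci <- confint(fit, parm = "beta_", method = "boot", nsim = 20)
```

Here pb$t holds one row of fixed-effect estimates per replicate, and ci is a matrix with one row per fixed effect.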

Tags: r debugging syntax statistics-bootstrap

【Solution 1】:

If you want p-values from a GLMM using the parametric bootstrap, you can use the function mixed from the package afex, which obtains them via pbkrtest::PBmodcomp:

library(afex)
results <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID),
                     family=binomial(logit), data = d,
                     method = "PB", args.test = list(nsim = 1000))

You can even define a local cluster first (i.e., use multiple cores):

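library(parallel)  # provides makeCluster() and stopCluster()
cl <- makeCluster(rep("localhost", 4))  # four local worker processes
results <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID),
                     family=binomial(logit), data = d,
                     method = "PB", args.test = list(nsim = 1000, cl = cl))
stopCluster(cl)  # release the workers when finished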

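It is probably best to install the development versions of all three packages (since the current version of pbkrtest is designed for lme4 1.0, which has not been released yet):

afex from R-Forge: install.packages("afex", repos="http://R-Forge.R-project.org")
pbkrtest from: http://people.math.aau.dk/~sorenh/software/pbkrtest/devel/
lme4 from GitHub: https://github.com/lme4/lme4

【Discussion】:

All I can say is: awesome! :) I am installing R 3 and trying the packages you linked.
This problem is known, and the author of pbkrtest is working on it: thread.gmane.org/gmane.comp.lang.r.lme4.devel/10509/focus=10518 But it does mean that your model/data are badly behaved. Does it fail on one particular model? Alternatively, try method = "LRT" instead of "PB" and use PBmodcomp only on the comparisons of interest.
By the way, method = "LRT" does not compute p-values from the parametric bootstrap but from likelihood ratio tests.
@vic If you use method = "LRT", you are not bootstrapping. Only method = "PB" uses the bootstrap. If you use it with a large enough number of samples (> 1000), that is your bootstrap.
@Vic As I said, the author of pbkrtest knows about the problem and is working on it (but I don't know when there will be a solution). Until then, you can use the old version of lme4 (cran.r-project.org/src/contrib/Archive/pbkrtest). That should work better.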
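
The difference between the two methods discussed in these comments can be illustrated on a single comparison. This is a rough sketch under stated assumptions, not what mixed() does internally: it assumes lme4 1.0 or later and a pbkrtest version compatible with it, the toy data are hypothetical, and nsim is far too small for real use.

```r
library(lme4)

set.seed(3)
# Hypothetical toy data: 100 patients, each observed under placebo (0) and drug (1)
n_pat <- 100
toy <- data.frame(
  PatientID = factor(rep(seq_len(n_pat), each = 2)),
  Trt       = rep(c(0, 1), times = n_pat),
  Demo1     = rep(rnorm(n_pat), each = 2)
)
u <- rep(rnorm(n_pat, sd = 0.5), each = 2)  # patient-level random intercepts
toy$DV <- rbinom(nrow(toy), 1, plogis(-0.2 + 0.8 * toy$Trt + 0.5 * toy$Demo1 + u))

full    <- glmer(DV ~ Demo1 + Trt + (1 | PatientID),
                 data = toy, family = binomial(link = "logit"))
reduced <- update(full, . ~ . - Trt)  # same model without the Trt effect

# Likelihood ratio test: chi-square reference distribution, no resampling
lrt   <- anova(reduced, full)
lrt_p <- lrt[["Pr(>Chisq)"]][2]

# Parametric bootstrap of the same comparison, run only if pbkrtest is installed
if (requireNamespace("pbkrtest", quietly = TRUE)) {
  pb <- pbkrtest::PBmodcomp(full, reduced, nsim = 50)
  print(pb)
}
```

Running PBmodcomp only on the comparisons of interest, as suggested above, keeps the number of bootstrap refits (and therefore the opportunities for convergence failures) small.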