具有 R 的一般混合线性模型中截距的假设检验答案

【问题标题】：Hypothesis test for intercepts in general mixed linear models with R具有 R 的一般混合线性模型中截距的假设检验
【发布时间】：2018-10-18 02:41:56
【问题描述】：

我有固定效应的数据：基因型 = C、E、K、M；年龄 = 30、45、60、75、90 天；随机效应：block = 1、2、3；和变量 = weight_DM。

文件在：https://drive.google.com/open?id=1_H6YZbdesK7pk5H23mZtp5KhVRKz0Ozl

我有每个基因型的年龄线性和二次斜率，但我没有截距和标准误差。 R代码是：

library(nlme)
library(lme4)
library(car)
library(carData)
library(emmeans)
library(ggplot2)
library(Matrix)
library(multcompView)
datos_weight <- read.csv2("D:/investigacion/publicaciones/articulos-escribiendo/pennisetum/pennisetum-agronomicas/data_weight.csv",header=T, sep = ";", dec = ",")

parte_fija_3 <- formula(weight_DM 
                    ~ Genotypes 
                    + Age 
                    + I(Age^2) 
                    + Genotypes*Age 
                    + Genotypes*I(Age^2))
heterocedasticidad_5 <- varComb(varExp(form = ~fitted(.)))
correlacion_4 <- corCompSymm(form = ~ 1|Block/Genotypes)

modelo_43 <- gls(parte_fija_3, 
             weights = heterocedasticidad_5, 
             correlation = correlacion_4, 
             na.action = na.omit, 
             data = datos_weight)
anova(modelo_43)

#response
Denom. DF: 48 
                   numDF  F-value p-value
(Intercept)            1 597.3828  <.0001
Genotypes              3   2.9416  0.0424
Age                    1 471.6933  <.0001
I(Age^2)               1  22.7748  <.0001
Genotypes:Age          3   5.9425  0.0016
Genotypes:I(Age^2)     3   0.7544  0.5253

#################################
#test whether the linear age slopes of each genotype is equal to zero
################################
(tendencias_em_lin <- emtrends(modelo_43,
                           "Genotypes",
                           var = "Age"))

#response
Genotypes Age.trend        SE df lower.CL upper.CL
C          1.693331 0.2218320 48 1.247308 2.139354
E          1.459517 0.2135037 48 1.030239 1.888795
K          2.001097 0.2818587 48 1.434382 2.567811
M          1.050767 0.1301906 48 0.789001 1.312532

Confidence level used: 0.95 

(tendencias_em_lin_prueba <- update(tendencias_em_lin,
                                infer = c(TRUE,TRUE),
                                null = 0))

#response
Genotypes Age.trend        SE df lower.CL upper.CL t.ratio p.value
C          1.693331 0.2218320 48 1.247308 2.139354   7.633  <.0001
E          1.459517 0.2135037 48 1.030239 1.888795   6.836  <.0001
K          2.001097 0.2818587 48 1.434382 2.567811   7.100  <.0001
M          1.050767 0.1301906 48 0.789001 1.312532   8.071  <.0001

Confidence level used: 0.95    

########################################                                
#test differences between slope of the age linear for each genotype
########################################
CLD(tendencias_em_lin, 
    adjust = "bonferroni",
    alpha = 0.05)                                                                         

#response
Genotypes Age.trend        SE df  lower.CL upper.CL .group
M          1.050767 0.1301906 48 0.7128801 1.388653  1    
E          1.459517 0.2135037 48 0.9054057 2.013628  12   
C          1.693331 0.2218320 48 1.1176055 2.269057  12   
K          2.001097 0.2818587 48 1.2695822 2.732611   2   

Confidence level used: 0.95 
Conf-level adjustment: bonferroni method for 4 estimates 
P value adjustment: bonferroni method for 6 tests 
significance level used: alpha = 0.05

问题

如何检验每个基因型的年龄截距是否为零？
如何测试每个基因型的年龄截距之间的差异？
每个基因型截距的标准误是多少？

感谢您的帮助。

【问题讨论】：

标签： r testing regression

【解决方案1】：

您可以使用emmeans() 以与使用emtrends() 类似的方式回答这些问题。

另请查看 summary.emmGrid 的文档，并注意您可以选择是否执行 CI、测试或两者都执行。例如，

emm <- emmeans(...)
summary(emm, infer = c(TRUE,TRUE))
summary(emm, infer = c(TRUE,FALSE))   # or confint(emm)
summary(emm, infer = c(FALSE,TRUE))   # or test(emm)

真正的拦截

如果实际上您想要实际的 y 截距，您可以使用

emm <- emmeans(..., at = list(age = 0))

预测是在 0 岁时进行的，这是每组条件的回归方程中的截距。但是，我想劝阻您不要这样做，因为（a）这些预测是巨大的外推，因此它们的标准误差也很大； (b) 预测 0 岁时的反应没有实际意义。因此，我认为问题 #1 基本上没有意义。

如果您不考虑 at 部分，则 emmeans() 会在数据集中的平均年龄进行预测。这些预测的标准误差将比截距小很多。由于您在模型中有涉及age 的交互，因此每个年龄的预测比较不同。我建议把

emm <- emmeans(..., cov.reduce = FALSE, by = "age")

这相当于使用at 指定数据集中出现的age 值集，并对每个年龄值进行单独比较。

【讨论】：