如何在R中使不在扫帚中的模型的输出保持整洁答案

【问题标题】：How to make outputs of models which are not in broom tidy in R如何在R中使不在扫帚中的模型的输出保持整洁
【发布时间】：2021-06-25 11:25:43
【问题描述】：

我一直在尝试使 wfe 模型的输出整洁，以便我可以轻松地将其合并到 ggplot 等中。这是我在使用扫帚中未包含的其他包和统计模型时遇到的问题.

假设我创建了一个这样的数据集：（取自 wfe 的文件）：

library (wfe)
        ## generate panel data with number of units = N, number of time = Time
    N <- 10 # number of distinct units
    Time <- 15 # number of distinct time
    
    ## treatment effect
    beta <- 1
    
    ## generate treatment variable
    treat <- matrix(rbinom(N*Time, size = 1, 0.25), ncol = N)
    ## make sure at least one observation is treated for each unit
    while ((sum(apply(treat, 2, mean) == 0) > 0) | (sum(apply(treat, 2, mean) == 1) > 0) |
           (sum(apply(treat, 1, mean) == 0) > 0) | (sum(apply(treat, 1, mean) == 1) > 0)) {
      treat <- matrix(rbinom(N*Time, size = 1, 0.25), ncol = N)
    }
    treat.vec <- c(treat)
    
    ## unit fixed effects
    alphai <- rnorm(N, mean = apply(treat, 2, mean))
        ## geneate two random covariates
    x1 <- matrix(rnorm(N*Time, 0.5,1), ncol=N)
    x2 <- matrix(rbeta(N*Time, 5,1), ncol=N)
    x1.vec <- c(x1)
    x2.vec <- c(x2)
    ## generate outcome variable
    y <- matrix(NA, ncol = N, nrow = Time)
    for (i in 1:N) {
      y[, i] <- alphai[i] + treat[, i] + x1[,i] + x2[,i] + rnorm(Time)
    }
    y.vec <- c(y)
    
    ## generate unit and time index
    unit.index <- rep(1:N, each = Time)
    time.index <- rep(1:Time, N)
    
    Data.obs <- as.data.frame(cbind(y.vec, treat.vec, unit.index, time.index, x1.vec, x2.vec))
    colnames(Data.obs) <- c("y", "tr", "unit", "time", "x1", "x2")

现在我从函数 wfe 运行一个模型（同样，代码来自包的帮助文件）：

    mod.did <- wfe(y~ tr+x1+x2, data = Data.obs, treat = "tr",
               unit.index = "unit", time.index = "time", method = "unit",
               qoi = "ate", estimator ="did", hetero.se=TRUE, auto.se=TRUE,
               White = TRUE, White.alpha = 0.05, verbose = TRUE)

## summarize the results
summary(mod.did)

我的问题是如何将此输出转换为我可以绘制的整洁对象。

如果我打电话给tidy(mod.did)，我会收到以下错误：

Error: No tidy method for objects of class wfedid

我明白，但我不确定如何解决。我尝试将各个参数（系数、se 等）映射到一个新的列表对象中，但没有奏效，所以我希望这里有人知道更系统的方法。如果有帮助，这里是输出的 dput：https://pastebin.com/HTkKEUUQ

谢谢！

【问题讨论】：

免责声明 - 我还没有使用 wfe 包！通常，大多数统计软件包都会附带他们的预测方法。这意味着当您使用predict(model) 时，您将获得一个包含绘图所需数据的对象。
像@tjebo 我也不熟悉 wfe 包。提供dput(mod.did) 的输出将帮助他和我这样的人帮助你。我的建议是编写你自己的tidy() 函数（然后也许将它提交给broom 的维护者）。原则上不难。
添加到我之前的评论 - 在这种情况下，wfe 不会为生成的 wfedid 类提供预测方法。
可以直接询问包维护者-github.com/insongkim/wfe/issues
谢谢 - 我为 dput(mod.did) 输出添加了一个 pastebin 链接。也许这有帮助

标签： r ggplot2 tidyr broom

【解决方案1】：

这是tidy 方法的开始：

library(dplyr); library(tibble)

tidy.wfedid <- function(x, conf.int=FALSE, conf.level=0.95, ...) {
   cc <- (coef(summary(x))
      %>% as.data.frame()
      %>% setNames(c("estimate","std.error","statistic","p.value"))
      %>% tibble::rownames_to_column("term")
      %>% as_tibble()
   )
   return(cc)
}

请注意，（1）我没有实现置信区间的东西（你可以通过使用 mutate 添加列 (conf.low, conf.high) = term ± std.error*qnorm((1+conf.level)/2) 来做到这一点；（2）这给出了标准“ tidy" 方法，它给出了一个系数表。如果你想要预测和预测的置信区间，你需要编写一个augment 方法...

【讨论】：

这很棒。更重要的是，当这种情况再次发生时，我也明白问题所在。非常感谢！
（也可以在这里添加：github.com/tidymodels/broom，它可能会帮助其他人）