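This seems overly complicated, but it's what I could come up with. (It would be more efficient to do this without running the linear models as part of the pipeline, i.e. by only identifying which samples were used; that could probably be done with model.frame() and some appropriate joining...)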
library(dplyr)
library(tidyr)   # for pivot_wider()
library(purrr)
library(broom)
library(tibble)
## same as before, but also convert rownames to a column
df <- mtcars %>%
  mutate(disp = replace(disp, c(2, 3), NA),
         wt = replace(wt, c(3, 4, 5), NA)) %>%
  rownames_to_column("model")
## (1) set up vector of vars and give it names (for later .id=)
dd <- c("disp", "wt") %>%
  setNames(c("samp1", "samp2")) %>%
  ## (2) construct formulas for lm
  map(reformulate, response = "mpg") %>%
  ## (3) fit the lm
  map(lm, data = df) %>%
  ## (4) generate fitted values (NA for rows with missing predictors)
  map_dfr(augment, newdata = df, .id = "samp") %>%
  select(samp, model, .fitted) %>%
  ## (5) identify which observations were used (non-NA fitted value)
  mutate(val = !is.na(.fitted)) %>%
  ## (6) pivot from one long column to two half-length columns
  pivot_wider(names_from = samp, values_from = val, id_cols = model) %>%
  ## (7) add to df
  full_join(df, by = "model")
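As a sanity check on the `lm()`-based flags: with the default `na.action` (`na.omit`), a row is used exactly when every variable named in the formula is non-missing, and `complete.cases()` can test that directly. The base-R sketch below swaps in `complete.cases()` plus `all.vars()` for illustration (this is not part of the approach above) and compares the result against the rows `lm()` actually kept. It assumes plain formulas with no `subset` or `weights` arguments; `forms`, `used` and `kept` are hypothetical names.

```r
## Hypothetical base-R rebuild of the example data (no tidyverse needed)
df <- mtcars
df$disp[c(2, 3)] <- NA
df$wt[c(3, 4, 5)] <- NA

forms <- list(samp1 = mpg ~ disp, samp2 = mpg ~ wt)

## Assumption: default na.omit, so a row is used exactly when every
## variable named in the formula is non-missing in that row
used <- lapply(forms, function(f) complete.cases(df[all.vars(f)]))

## Cross-check against the rows lm() actually kept: with na.omit the
## residuals are named by the row names of the retained observations
kept <- lapply(forms, function(f) {
  fit <- lm(f, data = df)
  rownames(df) %in% names(residuals(fit))
})

stopifnot(identical(used, kept))
sapply(used, sum)   # rows used per model: samp1 30, samp2 29
```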
This version does it without running the models at all.
## helper function: returns logical vector of whether observation
## was included in model frame or not
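drop_vec <- function(mf) {
  ## positions of rows dropped for missing values (NULL if none were dropped)
  nn <- attr(mf, "na.action")
  ## one entry per row of the original data
  incl <- rep(TRUE, nrow(mf) + length(nn))
  incl[nn] <- FALSE
  incl
}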
## first few bits are the same as above
dd <- c("disp", "wt") %>%
  setNames(c("samp1", "samp2")) %>%
  map(reformulate, response = "mpg") %>%
  ## only construct model frames - don't run lm()
  map(model.frame, data = df) %>%
  ## apply helper function
  map(drop_vec) %>%
  ## stick them together
  bind_cols(df)
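`drop_vec()` relies on the fact that when `model.frame()` drops incomplete rows, it records their positions in the `"na.action"` attribute. Here is a minimal base-R illustration on a hypothetical five-row data frame `toy`, assuming the default `na.action` option (`na.omit`):

```r
## Hypothetical toy data: x is missing in rows 2 and 4
toy <- data.frame(y = c(1, 2, 3, 4, 5),
                  x = c(1, NA, 3, NA, 5))

## model.frame() applies getOption("na.action") (na.omit by default),
## so the incomplete rows are dropped and their positions recorded
mf <- model.frame(y ~ x, data = toy)
nrow(mf)                  # 3 rows survive
attr(mf, "na.action")     # row positions 2 and 4 (named by row name)

## same helper as above
drop_vec <- function(mf) {
  nn <- attr(mf, "na.action")
  incl <- rep(TRUE, nrow(mf) + length(nn))
  incl[nn] <- FALSE
  incl
}

drop_vec(mf)              # TRUE FALSE TRUE FALSE TRUE
```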
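The only thing I don't like about this solution is that the samp columns end up at the beginning; it would take some fussing to make them the *last* columns of the data frame.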
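With dplyr 1.0.0 or later, `relocate()` should handle this in a single step. The sketch below shows the model-frame version with that step added. It makes two substitutions for illustration that are not part of the original code: it uses base `lapply()` instead of `purrr::map()`, and it converts the flags to a tibble with `as_tibble()` so that `bind_cols()` receives two data frames.

```r
library(dplyr)
library(tibble)

## same toy data as in the answer, with rownames as a column
df <- mtcars %>%
  mutate(disp = replace(disp, c(2, 3), NA),
         wt = replace(wt, c(3, 4, 5), NA)) %>%
  rownames_to_column("model")

## same helper as in the answer
drop_vec <- function(mf) {
  nn <- attr(mf, "na.action")
  incl <- rep(TRUE, nrow(mf) + length(nn))
  incl[nn] <- FALSE
  incl
}

dd <- c(samp1 = "disp", samp2 = "wt") %>%
  lapply(reformulate, response = "mpg") %>%
  lapply(model.frame, data = df) %>%
  lapply(drop_vec) %>%
  ## named list of logical vectors -> tibble with columns samp1, samp2
  as_tibble() %>%
  bind_cols(df) %>%
  ## move the indicator columns to the end
  relocate(starts_with("samp"), .after = last_col())
```

Alternatively, piping into `bind_cols(df, .)` (magrittr's placeholder) should put the columns of `df` first and avoid the `relocate()` step entirely.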