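【Posted】: 2020-03-08 23:02:42
【Problem description】:
I have a data frame df1 in which each person (id) has several questionnaires (measure) that were answered at specific points in time (date). Normally, each person should fill in three questionnaires per session (first, pre, post). Some participants failed to fill in all three and answered only one or two of them. The possible patterns are therefore: complete (participant A), missing "post" (participant B), missing "first" (participant C), missing "pre" (participant D), or only one of the three answered (participants E, F, G).
See df1: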
df1 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 6L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558, 17558, 17559, 17559, 17559, 17559, 17558, 17558, 17558, 17558 ), class = "Date"), result = c(1, 5, 4, 7, 8, 7, 2, 1, 3, 5, 7, 7)), class = "data.frame", row.names = c(NA, -12L))
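Now I would like to add the missing rows to the dataset, with id and measure filled in and NA for the missing date and result. The final data frame should look like df2.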
df2 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558, 17558, 17559, NA, NA, 17559, 17559, 17559, NA, 17558, 17558, NA, NA, NA, 17558, NA, NA, NA, 17558), class = "Date"), result = c(1, 5, 4, 7, 8, NA, NA, 7, 2, 1, NA, 3, 5, NA, NA, NA, 7, NA, NA, NA, 7)), class = "data.frame", row.names = c(NA, -21L))
I tried grouping by the combinations that might be missing and inserting a row, but this did not give the expected result.
require(tidyverse)
final <- df1 %>%
  group_by(id, measure == "first" & lag(measure, 1, default=NA) == "post") %>%
  do(add_row(., measure = "pre", .after = 0)) %>%
  ungroup()
I also tried:
final <- df1 %>% complete(id, nesting(measure, date))
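A possible reason this `complete()` call does not give df2: `nesting(measure, date)` keeps only the (measure, date) pairs that occur in the data, and crossing those pairs with id creates a row for every observed date instead of one row per measure with an NA date. Below is a minimal sketch, assuming that expanding over id and measure alone is what is wanted. The data frame `toy` is a small stand-in for df1, and the names `toy` and `filled` are illustrative only.

```r
library(tidyr)

# Small stand-in for df1 (illustrative): A is complete,
# B lacks "post", E answered only "first".
toy <- data.frame(
  id      = factor(c("A", "A", "A", "B", "B", "E")),
  measure = factor(c("first", "pre", "post", "first", "pre", "first"),
                   levels = c("first", "post", "pre")),
  date    = as.Date(c("2018-01-27", "2018-01-27", "2018-01-27",
                      "2018-01-27", "2018-01-28", "2018-01-27")),
  result  = c(1, 5, 4, 7, 8, 5)
)

# Expand over id x measure only; the columns that are not named
# (date, result) are filled with NA in the newly created rows.
filled <- complete(toy, id, measure)
```

The result is sorted by the factor levels of measure (first, post, pre), so the row order differs from df2 even though the same rows are present.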
To make things more complicated, participants can attend several sessions, so each id may have x * (first, post, pre) rows.
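The multi-session case could be sketched as follows, under a strong assumption: that the data carries a column identifying the session. df1 has no such column, so `session` here is hypothetical, as are the names `toy_sessions` and `filled_sessions`. With that column, `nesting(id, session)` restricts the expansion to the id/session pairs that actually occur, and each pair is then crossed with all three measures.

```r
library(tidyr)

# Illustrative data with a hypothetical `session` column (not in df1):
# A has a complete session 1 and only "first" in session 2,
# B has "first" and "pre" in session 1.
toy_sessions <- data.frame(
  id      = c("A", "A", "A", "A", "B", "B"),
  session = c(1, 1, 1, 2, 1, 1),
  measure = factor(c("first", "pre", "post", "first", "first", "pre"),
                   levels = c("first", "post", "pre")),
  date    = as.Date(c("2018-01-27", "2018-01-27", "2018-01-27",
                      "2018-02-03", "2018-01-27", "2018-01-28")),
  result  = c(1, 5, 4, 2, 7, 8)
)

# Keep only observed (id, session) pairs, cross them with every measure;
# date and result are NA in the rows that had to be added.
filled_sessions <- complete(toy_sessions, nesting(id, session), measure)
```

Without a session identifier, the rows of one session would first have to be grouped in some other way, for example by date, and how to do that depends on the data.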
【Discussion】: