【Question title】: Count missing values under specific date conditions in R
【Posted】: 2021-12-31 00:53:29
【Question】:

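Context

I'm helping a friend who works in education. One of the programs she runs for students during the pandemic is one-on-one tutoring over Zoom. Her staff record attendance in a Google Sheet, but inevitably they sometimes forget to mark it. To help her, I'm trying to work out some R code that will catch the cases where staff forgot to mark attendance. I've googled and tried if_else, pivot_longer, for-loops, and so on, but I'm still a (functional) beginner in R, so much of what I read is over my head. (FYI, R is the only programming language I know, so the solution has to be something I can implement and troubleshoot myself in the future.) That said, apologies in advance for not including reproducible code (I don't even know where to start). For anyone able to help, here is sample data in a publicly viewable Google Sheet.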

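Problem

I need to find and/or count the NA values that occur after a tutee's start_date, but not the NAs that fall in the future. You can usually tell when staff forgot to mark attendance because:

  1. those NAs fall between weeks in which staff did remember to mark attendance, and
  2. those NAs occur in the past but never before the tutoring start_date.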

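To help visualize this, I've highlighted in yellow most (probably all) of the NAs that meet these criteria.

Because this isn't my data or my project, I can't change it just because something is bad practice (for example, the day of the week and the time sharing one cell). But any solution that directly helps with my problem is greatly appreciated.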

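Best,

James

Edit: additional context

To help monitor attendance, I'm building a flexdashboard, and the goal is to include a valueBox (or something similar) populated with the number of tutoring sessions that should not have an NA. If the program manager can see from the dashboard that there are NAs, she can follow up with her staff promptly and have them enter the correct code (X, 1, 2, etc.). The problem she has now is that days or weeks go by and staff may no longer remember whether a session took place or whether someone was late. Thanks!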

dput(fake_tutoring_data)

structure(list(start_date = structure(c(18893, 18897, 18898, 
18900, 18900, 18900, 18901, 18904, 18907, 18911, 18911, 18912, 
18913, 18919, 18919, 18925, 18925, 18933, 18933, 18934, 18935, 
18939, 18939, 18939, 18946, 18964, 18968, 18968), class = "Date"), 
    day_time = c("TH/7pm", "MON/5:30 PM", "TUE/6:15PM", "TH/9am", 
    "TH/6:30 PM", "TH/7pm", "F/5:15pm", "MON/4:30PM", "TH/6 pm", 
    "MON/ 5:00 PM", "MON/6:00 PM", "TUE/6:30 PM", "WED/6pm", 
    "TUE/11:00 AM", "TUE/2pm", "MON 4:45 PM", "TUE/6:00 PM", 
    "TUE/6:00 PM", "TUE/6:15PM", "WED/5:00 PM", "TH/6PM", "MON/5:30PM", 
    "MON/5:30 PM", "MON/6:00 PM", "MON/6:00 PM", "F/12pm", "Tue/ 4:30pm", 
    "Tue/5:00 pm"), `2021-09-20` = c("1", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `2021-09-27` = c(NA, "1", "1", 
    "1", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-04` = c("X", 
    "1", "1", "1", "1", "X", NA, "1", "1", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    `2021-10-11` = c("X", "1", "1", "1", "1", "1", "1", "1", 
    "4", "1", "1", "1", "1", NA, "4", NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `2021-10-18` = c("X", "1", "1", 
    "1", "X", "1", "1", "1", "4", "3", "X", "1", "2", NA, "1", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-25` = c("1", 
    "1", "1", "1", "1", "4/5", "1", "1", "2", "X", "1", "4", 
    "1", "1", "X", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), `2021-11-01` = c("1", "1", "1", "1", "1", NA, 
    "1", "4", "1", "X", "1", "1", NA, "1", "1", "4", "1", "1", 
    "1", "1", "1", NA, NA, NA, NA, NA, NA, NA), `2021-11-08` = c("1", 
    "1", "3", NA, "1", NA, "1", "X", NA, "5", NA, "4", "2", "4", 
    "1", "1", "1", "1", "1", "1", "1", "1", NA, NA, NA, NA, NA, 
    NA), `2021-11-15` = c("1", "1", NA, "1", "1", "1", "1", "3", 
    "2", "3", NA, "2", "4", "2", NA, "2", "1", "1", NA, NA, "1", 
    "2", "1", "1", "1", NA, NA, NA), `2021-11-22` = c(NA, "1", 
    "1", NA, NA, NA, "1", "X", NA, NA, NA, NA, NA, "5", "X", 
    "4", "4", "1", "4", NA, "1", "1", "2", "X", NA, NA, NA, NA
    ), `2021-11-29` = c("1", "1", NA, "1", "1", "X", "1", NA, 
    "2", NA, NA, "1", NA, "1", "2", "X", "1", "1", NA, "1", "4", 
    "1", "1", "1", "1", "4", NA, NA), `2021-12-06` = c("1", NA, 
    "1", NA, NA, NA, "5", NA, "1", NA, NA, "1", "1", NA, NA, 
    "X", "3", "1", NA, "2", "5", NA, "1", "1", "1", "4", "1", 
    "1"), `2021-12-13` = c(NA, "1", NA, NA, "1", NA, NA, NA, 
    NA, NA, NA, "5", "5", NA, NA, NA, NA, "1", "X", "2", NA, 
    NA, "5", "1", "4", NA, NA, "1"), `2021-12-20` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2021-12-27` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-03` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-10` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-17` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-24` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-31` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-07` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-14` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-21` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-28` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-07` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-14` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-21` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-28` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-04` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-11` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-18` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-25` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-02` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-09` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-16` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-23` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-30` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-06-06` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
-28L), class = c("tbl_df", "tbl", "data.frame"))

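【Comments】:

  • It would help if you could edit your question to include the sample data as an R object! The raw data frame (however you've imported it into R) is fine. For example, if you loaded the sheet into an R data frame called attendance, type dput(attendance) in the R console and paste the result here.
  • Thanks for the tip and the kindness. I've added it to the original post.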

Tags: r missing-data lubridate flexdashboard


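【Answer 1】:

Reshaping your data is the way to go in this case. This is how I would do it. First, convert fake_tutoring_data from wide to long format: each row is replaced by one row per date column, keeping the start_date and day_time identifiers. I use a regular expression to identify the date columns; they have the form YYYY-MM-DD. (I also add an id column, which is just the row number; it comes in handy later.)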

library(tidyverse)
library(lubridate)

long_tutoring_data = fake_tutoring_data %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = matches("^[0-9]{4}-[0-9]{2}-[0-9]{2}$"),
               names_to = "attendance_date",
               values_to = "attendance") %>%
  mutate(attendance_date = ymd(attendance_date))
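
To see what the reshape does, here is a minimal sketch on a made-up two-row table. The toy_wide and toy_long names and their values are hypothetical stand-ins, not the real sheet, and as.Date() is used in place of ymd() only to keep the example free of extra packages.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row stand-in for fake_tutoring_data
toy_wide <- tibble(
  start_date   = as.Date(c("2021-09-20", "2021-10-04")),
  day_time     = c("TH/7pm", "MON/5:30 PM"),
  `2021-09-20` = c("1", NA),
  `2021-09-27` = c(NA, NA),
  `2021-10-04` = c("X", "1")
)

# Same steps as above: add a row id, then stack every YYYY-MM-DD column
toy_long <- toy_wide %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = matches("^[0-9]{4}-[0-9]{2}-[0-9]{2}$"),
               names_to = "attendance_date",
               values_to = "attendance") %>%
  mutate(attendance_date = as.Date(attendance_date))

toy_long
```

Each of the two tutees now has one row per week (six rows in total), so a single filter() can compare attendance_date with start_date row by row.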

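Next, find the missing attendance records using three conditions. A record counts as missing if it is

  1. before today,
  2. on or after the corresponding start_date, and
  3. missing (NA).

missing_attendance = long_tutoring_data %>%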
  filter(attendance_date < Sys.Date(),
         attendance_date >= start_date,
         is.na(attendance))
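
As a quick check of how the three conditions interact, here is a small sketch on invented records for one tutee. The toy_long data and the fixed today value are assumptions made for illustration; a fixed date replaces Sys.Date() only so that the result does not change from day to day.

```r
library(dplyr)

# Hypothetical long-format records for one tutee who started on 2021-10-04
toy_long <- tibble(
  id              = 1L,
  start_date      = as.Date("2021-10-04"),
  attendance_date = as.Date(c("2021-09-27", "2021-10-04", "2021-10-11",
                              "2021-10-18", "2021-10-25")),
  attendance      = c(NA, "1", NA, "X", NA)
)

# Fixed stand-in for Sys.Date() so the example is reproducible
today <- as.Date("2021-10-20")

toy_missing <- toy_long %>%
  filter(attendance_date < today,         # 1. in the past
         attendance_date >= start_date,   # 2. not before tutoring started
         is.na(attendance))               # 3. nothing was recorded

toy_missing
```

Only the 2021-10-11 row survives: the 2021-09-27 NA predates the start date and the 2021-10-25 NA is still in the future. One thing to watch in the sample data is that the column dates are Mondays while start_date can fall mid-week (the first row starts on 2021-09-23 but has a mark under 2021-09-20), so the `>=` comparison skips the starting week; comparing against the Monday of the start week would be one way to include it.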

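I haven't checked whether this corresponds exactly to the NAs you highlighted in the spreadsheet, but some spot checks suggest it comes very close.

The alternative criterion you suggested, that a missing record is one that falls chronologically between two non-missing records, is trickier but doable. I add two columns, previous_attendance and next_attendance, holding the previous (or next) non-missing attendance record for this id. We can then find the missing records that have at least one non-missing record both before and after them.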

missing_attendance = long_tutoring_data %>%
  group_by(id) %>%
  arrange(id, attendance_date) %>%
  mutate(previous_attendance = attendance,
         next_attendance = attendance) %>%
  fill(previous_attendance, .direction = "down") %>%
  fill(next_attendance, .direction = "up") %>%
  ungroup() %>%
  filter(is.na(attendance),
         !is.na(previous_attendance),
         !is.na(next_attendance))
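
A small sketch of the fill-down/fill-up idea on invented data may make the behaviour easier to follow. The toy_long and toy_gaps names and values are hypothetical; the pipeline itself is the same as above.

```r
library(dplyr)
library(tidyr)

# Hypothetical records: five consecutive weeks for each of two tutees
toy_long <- tibble(
  id              = rep(1:2, each = 5),
  attendance_date = rep(as.Date("2021-10-04") + 7 * 0:4, times = 2),
  attendance      = c(NA, "1", NA, "X", NA,   # id 1: only the middle NA sits between two marks
                      NA, NA, "1", "1", NA)   # id 2: no NA sits between two marks
)

toy_gaps <- toy_long %>%
  group_by(id) %>%
  arrange(id, attendance_date) %>%
  mutate(previous_attendance = attendance,
         next_attendance = attendance) %>%
  fill(previous_attendance, .direction = "down") %>%  # carry the last mark forward
  fill(next_attendance, .direction = "up") %>%        # carry the next mark backward
  ungroup() %>%
  filter(is.na(attendance),
         !is.na(previous_attendance),
         !is.na(next_attendance))

toy_gaps
```

Only the 2021-10-18 row for id 1 is returned. The trailing NA in each group has a mark before it but none after it, so it is dropped, which is why this criterion catches fewer records than the date-based filter when the most recent weeks are the ones left unmarked.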

This does not identify as many missing records as the first approach; you will be the best judge of which approach better suits your purposes.
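
For the dashboard count mentioned in the question, one possible sketch is to take the number of rows in missing_attendance and, optionally, a per-tutee breakdown. The toy_missing table below is a hypothetical stand-in for the result of either filter, and the valueBox() call is shown only as a comment because it needs to run inside a flexdashboard document.

```r
library(dplyr)

# Hypothetical stand-in for missing_attendance
toy_missing <- tibble(
  id              = c(1L, 1L, 3L),
  attendance_date = as.Date(c("2021-11-08", "2021-11-22", "2021-11-15"))
)

# Total number of unmarked sessions
n_missing <- nrow(toy_missing)

# Unmarked sessions per tutee, largest first, to show who needs a follow-up
missing_by_tutee <- toy_missing %>%
  count(id, name = "n_missing", sort = TRUE)

# Inside a flexdashboard chunk, the total could then be displayed with, e.g.:
# flexdashboard::valueBox(n_missing, caption = "Unmarked sessions")
```

Joining missing_by_tutee back to the original table by id would recover the day_time label for each tutee if the manager needs it.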

【Comments】:

  • Your first solution works great! Thank you so much for your help!
  • In that case, you can accept the answer by clicking the check mark next to it.