Generate a comprehensive report format in R: answers

【Question title】: Generate a comprehensive report format in R
【Posted】: 2020-07-27 22:22:15
【Question description】:

I have pulled some information from a MySQL server into R, and it looks like this in my R data frame:

barcode_no   Inspection_date        current_profile      score    Tag_log   prod_log
12345678     2020-01-15 14:34:13    Large                10       C1        WIP
12345678     2020-01-15 18:33:11    Medium               20       C2        Hold
12345678     2020-01-15 13:23:24    Medium               50       C3        Hold
12345678     2020-01-15 12:12:23    Medium               70                 Shipped
12345678     2020-01-15 11:12:45    Medium               120      C1        Shipped
12345678     2020-01-15 12:22:32    Small                150      C2        Shipped
12345678     2020-01-15 15:23:23    Small                10       C3        WIP
12345678     2020-01-15 16:34:08    Small                20       C2        Hold
12345678     2020-01-15 17:07:13    Small                130      C1        Hold
12345678     2020-01-15 17:09:05    Small                40                 Hold

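The requirement is to fit the details of the data frame above into a comprehensive report structure, one by date and one by month.

comprehensive_df (date): if some or all records for that date are unavailable, the latest date relative to the system date is used, and the comprehensive report df is filled with 0.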

Current_profile     # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] Total    % Total
**Large               01            16.67        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        10.00**
Shipped             0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
Hold                0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
WIP                 01             1.0         1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        100.00
**Small               03            50.00        0  1  1  1      0             0            0   0  0    0     02             66.67        1  1  0   0     5        50.00**
Shipped             0             0            0  0  0  0      0             0            0   0  0    0     01             50.00        0  1  0   0     1        20.00
Hold                02            66.67        0  1  0  1      0             0            0   0  0    0      1             100.00       1  0  0   0     3        60.00
WIP                 01            33.33        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        20.00
**Medium              02            33.33        0  1  1  0      1             100.00       0   0  0    1      1             33.33        1  0  0   0     4        40.00**
Shipped             0              0           0  0  0  0      1             100.00       0   0  0    1      1             100.00       0  0  0   0     2        50.00
Hold                2            100.00        0  1  1  0      0             0            0   0  0    0      0             0            0  0  0   0     2        50.00
WIP                 0            0             0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0
Total               06            0.10         1  0  0  0      1             0            0   0  0    0      3             0            0  0  0   0     1        0.10

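I have divided the comprehensive data frame into sections: columns 2 to 7 hold the counts for scores from 0 to 50, the next block holds the counts for scores from 50 to 100, and columns 14 to 20 hold the counts for scores > 100.

The code I am trying: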

df1<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`current_profile` = n())

df2<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`Tag_log` = n())

df3<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`prod_log` = n())

And so on for each variable. Then I try to full_join all the data frames to get the date-wise and the month-wise comprehensive formats.

comprehensive_df <- df1 %>% full_join(df1, by = 'Month') %>% 
                      full_join(df2, by = 'Month') %>%
                      full_join(df3, by = 'Month')

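【Question discussion】:

What code do you have so far? gt might help.
@alistaire: I am a beginner with R. I have tried the dplyr and tidyverse libraries to build the comprehensive data frame, but so far I can only aggregate counts by Inspection_date and current_profile. I don't know how to aggregate current_profile and prod_log together.
To group by multiple variables: my_df %>% group_by(current_profile, prod_log, Tag_log) %>% summarise(n_records = n()). This won't give you the presentation format you asked for, but it gives you a format that is more useful for further analysis, because it is tidy.
@alistaire: Yes, but it needs to be produced in this format so that the data is more intuitive when presenting the monthly report.
@alistaire: Honestly, this is a manual task I would like to automate, because exporting the data with write.csv() and arranging it in the given format in Excel takes a lot of time.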

Tags: r dplyr tidyverse

【Solution 1】:

I'm not sure I understand what you need, but maybe something like this?

library(magrittr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dat <- tibble::tribble(
  ~barcode_no, ~Inspection_date, ~current_profile, ~score, ~Tag_log, ~prod_log,
    12345678L,     "15/01/2020",          "Large",    10L,     "C1",     "WIP",
    12345678L,     "15/01/2020",         "Medium",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    50L,     "C3",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    70L,       NA, "Shipped",
    12345678L,     "15/01/2020",         "Medium",   120L,     "C1", "Shipped",
    12345678L,     "15/01/2020",          "Small",   150L,     "C2", "Shipped",
    12345678L,     "15/01/2020",          "Small",    10L,     "C3",     "WIP",
    12345678L,     "15/01/2020",          "Small",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",          "Small",   130L,     "C1",    "Hold",
    12345678L,     "15/01/2020",          "Small",    40L,       NA,    "Hold"
  )



dat$Inspection_date = as.Date(dat$Inspection_date,format = "%d/%m/%Y")

today = Sys.Date()

param_date = as.Date("15/01/2020",format = "%d/%m/%Y")

dat$month = format(ymd(dat$Inspection_date),'%b-%Y')

dat$score_group = dplyr::case_when(
  dat$score <= 50 ~ "low",
  dat$score < 100 ~ "med",
  TRUE ~ "high"
)

dat %>% dplyr::filter(Inspection_date >= param_date) %>%
  dplyr::group_by(current_profile, month, score_group, Tag_log,prod_log) %>% 
  dplyr::summarise(count = dplyr::n()) %>% 
  tidyr::pivot_wider(names_from = c("score_group","Tag_log"),
                     values_from = count,
                     values_fill  = list(count = 0)) -> res_dat


knitr::kable(res_dat,format = "markdown")

|current_profile |month    |prod_log | low_C1| high_C1| low_C2| low_C3| med_NA| high_C2| low_NA|
|:---------------|:--------|:--------|------:|-------:|------:|------:|------:|-------:|------:|
|Large           |Jan-2020 |WIP      |      1|       0|      0|      0|      0|       0|      0|
|Medium          |Jan-2020 |Shipped  |      0|       1|      0|      0|      1|       0|      0|
|Medium          |Jan-2020 |Hold     |      0|       0|      1|      1|      0|       0|      0|
|Small           |Jan-2020 |Hold     |      0|       1|      1|      0|      0|       0|      1|
|Small           |Jan-2020 |Shipped  |      0|       0|      0|      0|      0|       1|      0|
|Small           |Jan-2020 |WIP      |      0|       0|      0|      1|      0|       0|      0|

【Discussion】:

The output does not match the expected output, because the expected output also includes rows with a count of 0.
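
One possible way to get the zero-count rows is to expand the counts with tidyr::complete() before pivoting. This is only a minimal sketch, not a reproduction of the full report layout from the question. It assumes the same score bands as the answer above, relabels a missing Tag_log as "Null" (my choice of label), and leaves out the month column because the sample has a single month; the names counts and res_full are illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample rows as in the answer (barcode and date columns dropped for brevity).
dat <- tibble::tribble(
  ~current_profile, ~score, ~Tag_log, ~prod_log,
  "Large",   10L, "C1", "WIP",
  "Medium",  20L, "C2", "Hold",
  "Medium",  50L, "C3", "Hold",
  "Medium",  70L, NA,   "Shipped",
  "Medium", 120L, "C1", "Shipped",
  "Small",  150L, "C2", "Shipped",
  "Small",   10L, "C3", "WIP",
  "Small",   20L, "C2", "Hold",
  "Small",  130L, "C1", "Hold",
  "Small",   40L, NA,   "Hold"
)

counts <- dat %>%
  mutate(
    # Score bands copied from the answer: <= 50 low, < 100 med, otherwise high.
    score_group = case_when(score <= 50 ~ "low", score < 100 ~ "med", TRUE ~ "high"),
    # Assumption: show a missing tag under the label "Null".
    Tag_log = replace_na(Tag_log, "Null")
  ) %>%
  count(current_profile, prod_log, score_group, Tag_log, name = "count") %>%
  # complete() adds every combination that never occurred and sets its count to 0.
  complete(
    current_profile,
    prod_log,
    score_group = c("low", "med", "high"),
    Tag_log = c("C1", "C2", "C3", "Null"),
    fill = list(count = 0L)
  )

# One row per profile/status pair, one column per score band and tag.
res_full <- counts %>%
  pivot_wider(names_from = c(score_group, Tag_log), values_from = count)

knitr::kable(res_full, format = "markdown")
```

With the sample data this should give one row for each of the 3 x 3 profile/status pairs, including pairs such as Large/Hold that never occur. For data spanning several months, month would presumably be added to both count() and complete(). The per-profile subtotal rows and percentage columns from the question would still need to be computed separately.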