【问题标题】:Calculating Average and median on Monthly data and convert in htmltable R计算每月数据的平均值和中位数并在 htmltable R 中转换
【发布时间】:2018-08-01 05:20:03
【问题描述】:

我有一个 DF:

my_data <- read.table(text = 
                      "ID     Date1                     T1     Date2     Val1
                      A-1    '2018-01-10 15:05:24'       A    2018-01-15  10
                      A-2    '2018-01-05 14:15:22'       B    2018-01-14  12
                      A-3    '2018-01-04 13:20:21'       A    2018-01-13  15
                      A-4    '2018-01-01 18:35:45'       B    2018-01-12  22
                      A-5    '2017-12-28 19:45:10'       A    2018-01-11  18
                      A-6    '2017-12-10 08:03:29'       A    2018-01-10  21
                      A-7    '2017-12-06 20:55:55'       A    2018-01-09  28
                      A-8    '2018-01-10 10:02:12'       A    2018-01-15  10
                      A-9    '2018-01-05 17:15:14'       B    2018-01-14  12
                      A-10   '2018-01-04 18:35:58'       A    2018-01-13  15
                      A-11   '2018-01-01 21:09:25'       B    2018-01-12  22
                      A-12   '2017-12-28 02:12:22'       A    2018-01-11  18
                      A-13   '2017-12-10 03:45:44'       A    2018-01-10  21
                      A-14   '2017-12-06 07:15:25'       A    2018-01-09  28 
                      A-18   '2017-10-07 08:02:84        B    2017-11-05  20
                      A-21   '2017-10-01 06:04:04        A    2017-10-20  15
                      A-51   '2017-09-20 08:07:06        A    2017-09-28  10
                      A-35   '2017-09-14 08:02:45        A    2017-09-25  20
                      A-30   '2017-08-10 15:03:08        A    2017-08-30  25", 
header = TRUE, stringsAsFactors = FALSE)

并运行下面提到的代码,我得到如下所示的输出:

    table_2 <- merge(  
  my_data %>% 
    mutate(Date2 = ymd(Date2)) %>% 
    arrange(Date2) %>% 
    mutate(Month = paste(month(ymd_hms(Date1), label = TRUE), year(Date1), sep = "-")) %>% 
    filter(T1 == "A") %>%
    group_by(Month) %>% 
    summarise("# of A" = n(),
              "sum of A" = sum(Val1)) %>%
    mutate("MOM Growth # of A" = round(apply(cbind(`# of A`, lag(- `# of A`)), 
                                       1, sum, na.rm = TRUE) / lag(`# of A`), 2),
           "MOM Growth sum of A" = round(apply(cbind(`sum of A`, lag(- `sum of A`)), 
                                         1, sum, na.rm = TRUE) / lag(`sum of A`) * 100, 2)) %>% 
    select(Month, `# of A`, `MOM Growth # of A`, `sum of A`, `MOM Growth sum of A`),
  my_data %>% 
    mutate(Date2 = ymd(Date2)) %>% 
    arrange(Date2) %>% 
    mutate(Month = paste(month(ymd_hms(Date1), label = TRUE), year(Date1), sep = "-")) %>% 
    filter(T1 == "B") %>%
    group_by(Month) %>% 
    summarise("# of B" = n(),
              "sum of B" = sum(Val1)) %>%
    mutate("MOM Growth # of B" = round(apply(cbind(`# of B`, lag(- `# of B`)), 
                                       1, sum, na.rm = TRUE) / lag(`# of B` * 100), 2),
           "MOM Growth sum of B" = round(apply(cbind(`sum of B`, lag(- `sum of B`)), 
                                         1, sum, na.rm = TRUE) / lag(`sum of B`) * 100), 2) %>%
    select(Month, `# of B`, `MOM Growth # of B`, `sum of B`, `MOM Growth sum of B`),
  by = "Month",
  all = TRUE)

table_2[is.na(table_2)] <- ""

输出(表_2):

现在我想在Status of A 头部和Median of B 下添加两列Median of AAvg Time of A,在Status of B 头部添加Avg Time of B。并将这些输出转换为 htmltable 格式。

只是想知道如何调整 summarise 中的代码来计算每月数据的这些值。

此外,月份应在输出数据中按顺序排列,如果在 Max 月份和 Min 月份之间缺少任何月份,则该月份应具有除 MOM Growth 之外的所有值 0 Status of AStatus of B,因为这应该大于-100%。

【问题讨论】:

    标签: r dataframe ggplot2 html-table dplyr


    【解决方案1】:

    这是一个尝试使用:

    library(dplyr)
    library(lubridate)
    library(tableHTML)
    

    我添加了 AB 的中值和平均时间列,并添加了 MOM Growth 条件,并确保月份是 1) 以正确的顺序和 2) 完整,即使没有数据在特定月份可用。

    为了获取所有月份,请在数据中创建一个从第一个日期到最后一个日期的日期序列(中间间隔 1 个月)。然后确保AB 两个组都有一个日期(因为稍后会有一个过滤器):

    date_range = expand.grid(Date1 = seq(min(ymd_hms(my_data$Date1)), max(ymd_hms(my_data$Date1)), 
                                         by = "1 month"),
                             T1 = c("A", "B"),
                             stringsAsFactors = FALSE)
    
    table_2 <- merge( 
      my_data %>% 
        mutate(Date2 = ymd(Date2),
               Date1 = ymd_hms(Date1)) %>% 
        full_join(date_range, by = c("Date1", "T1")) %>% # join date ranges to table
        arrange(Date1) %>% # sort by date
        mutate(Month = paste(month(Date1, label = TRUE), year(Date1), sep = "-"),
               row_number = row_number(), # create row_numbers to keep up order
               Val1 = coalesce(Val1, 0L)) %>% # replace NA with 0 in Val1
        filter(T1 == "A") %>%
        group_by(Month) %>% 
        summarise("# of A" = n(),
                  "sum of A" = sum(Val1, na.rm = TRUE),
                  "Median of A" = median(Val1, na.rm = TRUE), # compute median
                  "Avg Time of A" = round(mean(difftime(Date2, Date1),
                                               na.rm = TRUE), # compute avg time
                                          2),
                  row_number = min(row_number)) %>% # get min row number
        arrange(row_number) %>% # sort by row number (to sort months)
        mutate("MOM Growth # of A" = round(apply(cbind(`# of A`, lag(- `# of A`)), 
                                                 1, sum, na.rm = TRUE) / lag(`# of A`), 2),
               "MOM Growth sum of A" = round(apply(cbind(`sum of A`, lag(- `sum of A`)), 
                                                   1, sum, na.rm = TRUE) / lag(`sum of A`) * 100, 2)) %>% 
        mutate("MOM Growth # of A" = if_else(is.infinite(`MOM Growth # of A`), 100, `MOM Growth # of A`), # replace Inf with 100
               "MOM Growth sum of A" = if_else(is.infinite(`MOM Growth sum of A`), 100, `MOM Growth sum of A`)) %>% 
        select(Month, `# of A`, `MOM Growth # of A`,
               `sum of A`, `MOM Growth sum of A`,
               `Median of A`, `Avg Time of A`), 
      my_data %>% 
        mutate(Date2 = ymd(Date2),
               Date1 = ymd_hms(Date1)) %>% 
        full_join(date_range, by = c("Date1", "T1")) %>% 
        arrange(Date1) %>% 
        mutate(Month = paste(month(Date1, label = TRUE), year(Date1), sep = "-"),
               row_number = row_number(),
               Val1 = coalesce(Val1, 0L)) %>% 
        filter(T1 == "B") %>%
        group_by(Month) %>% 
        summarise("# of B" = n(),
                  "sum of B" = sum(Val1, na.rm = TRUE),
                  "Median of B" = median(Val1, na.rm = TRUE),
                  "Avg Time of B" = round(mean(difftime(Date2, Date1),
                                               na.rm = TRUE),
                                          2),
                  row_number = min(row_number)) %>%
        arrange(row_number) %>% 
        mutate("MOM Growth # of B" = round(apply(cbind(`# of B`, lag(- `# of B`)), 
                                                 1, sum, na.rm = TRUE) / lag(`# of B`), 2),
               "MOM Growth sum of B" = round(apply(cbind(`sum of B`, lag(- `sum of B`)), 
                                                   1, sum, na.rm = TRUE) / lag(`sum of B`) * 100, 2)) %>% 
        mutate("MOM Growth # of B" = if_else(is.infinite(`MOM Growth # of B`), 100, `MOM Growth # of B`),
               "MOM Growth sum of B" = if_else(is.infinite(`MOM Growth sum of B`), 100, `MOM Growth sum of B`)) %>% 
        select(Month, `# of B`, `MOM Growth # of B`,
               `sum of B`, `MOM Growth sum of B`,
               `Median of B`, `Avg Time of B`), 
      by = "Month", 
      all = TRUE,
      sort = FALSE) # do not sort by ID column to keep month order
    

    “删除”缺失值:

    table_2[is.na(table_2)] = "" 
    

    从该数据创建一个tableHTML

    table_2 %>% tableHTML(rownames = FALSE,
                          widths = rep(100, 13),
                          second_headers = list(c(1, 4, 4),
                                                c("", "Status of A", "Status of B")),
                          caption = "A & B consolidated") %>% 
      add_css_caption(css = list(c("font-weight", "border"), 
                                 c("bold", "1px solid black")))
    

    结果如下:

    【讨论】:

    • 在计算行数时,将n() 替换为sum(!is.na(Date2)) 可能是个好主意,因为您还会计算空行数。
    • 只需添加%&gt;% add_css_row(css = list(c("background-color"), c("lightblue")), rows = 3:6)
    • 这就是标题。改用add_css_caption(css = list(c("background-color"), c("lightblue")))
    • 那是因为即使您没有任何数据,您每个月也至少有一条记录。将"# of A" = n() 更改为"# of A" = sum(!is.na(Date1))"# of A" = sum(!is.na(Date2))(以date_range 中未使用的为准)
    • 查看小插图,这是教程:cran.rstudio.com/web/packages/tableHTML/vignettes/… 如果您想了解更多信息,请使用 vignette(package = "tableHTML")
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2023-01-20
    • 2021-12-18
    • 2019-12-20
    • 2021-10-13
    • 2020-04-10
    • 1970-01-01
    相关资源
    最近更新 更多