【Question title】: Groupby a column and find its sum and count
【Posted】: 2020-07-11 09:34:05
【Question】:

Background: I have a dataset, df:

 Date                           Duration

 1/2/2020 5:00:00 PM            20
 1/2/2020 5:30:01 PM            30
 1/2/2020 6:00:00 PM            10
 1/5/2020 7:00:01 AM            5
 1/6/2020 8:00:00 AM            2
 1/6/2020 9:00:00 AM            8

Desired output:

 Date                 Total_Duration         Count

 1/2/2020             60                     3
 1/5/2020             5                      1
 1/6/2020             10                     2

Input:

 structure(list(Date = structure(1:6, .Label = c("1/2/2020 5:00:00 PM", 
 "1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM", "1/5/2020 7:00:01 AM", 
 "1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"), class = "factor"), 
 Duration = c(20L, 30L, 10L, 5L, 2L, 8L)), class = "data.frame", row.names = c(NA, 
-6L))

What I have tried:

 library(dplyr)
 df %>% group_by(Date)  %>% add_tally() %>%
 summarize(Duration) 

Any guidance would be helpful.

【Question discussion】:

    Tags: r dplyr aggregation lubridate


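    【Solution 1】:

    After converting 'Date' to a date-time with dmy_hms (assuming the format is DD/MM/YYYY HH:MM:SS), we can extract just the date with as.Date, use it as the grouping variable, and get the sum of 'Duration' and the 'Count' as n().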

    library(dplyr)
    library(lubridate)
    df %>%
        group_by(Date = as.Date(dmy_hms(Date))) %>% 
        summarise(Total_Duration = sum(Duration), Count = n())
    # A tibble: 3 x 3
    #  Date       Total_Duration Count
    #  <date>              <int> <int>
    #1 2020-02-01             60     3
    #2 2020-05-01              5     1
    #3 2020-06-01             10     2
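
The sample strings such as 1/2/2020 are ambiguous between day-first and month-first order, and the question does not say which applies. If the data is actually month-first (MM/DD/YYYY), a minimal sketch of the same approach would swap dmy_hms for mdy_hms. This is an assumption about the data, not something the question confirms. The data frame is rebuilt here with a plain character column so the example is self-contained.

```r
library(dplyr)
library(lubridate)

# Sample data from the question, rebuilt with a character Date column
df <- data.frame(
  Date = c("1/2/2020 5:00:00 PM", "1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM",
           "1/5/2020 7:00:01 AM", "1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"),
  Duration = c(20L, 30L, 10L, 5L, 2L, 8L),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the strings are month/day/year, so parse with mdy_hms
result <- df %>%
  group_by(Date = as.Date(mdy_hms(Date))) %>%   # keep only the calendar date
  summarise(Total_Duration = sum(Duration),     # total duration per day
            Count = n())                        # number of rows per day

print(result)
```

Under that assumption the groups would be 2020-01-02, 2020-01-05 and 2020-01-06, with the same totals (60, 5, 10) and counts (3, 1, 2) as above. Either way, the key step is grouping by the date alone: grouping by the full timestamp, as in the question's attempt, puts every row in its own group.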
    

    【Discussion】:
