【Question title】: Creating Subgroups based on Time Period using Lubridate and Dplyr
【Posted】: 2016-10-24 21:44:30
【Question】:

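This should be a quick and easy question. Using the simple data frame below, I want to use dplyr and lubridate to group all clients with an OnsetDate in or after April 2015. That group will be called "NewOnset", and the rest will be "OldOnset".

I'm new to lubridate and am having some trouble.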

City<-c("Toronto", "Toronto", "Montreal","Ottawa","Ottawa",
        "Hamilton","Peterborough","Toronto","Hamilton","Hamilton")

OnsetDate<-c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")

Client<-c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")

DF<- data.frame(Client,City,OnsetDate)

【Question comments】:

  • DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'), 'NewOnset', 'OldOnset')), or replace as.Date with lubridate::mdy
  • Thanks! If you post this as an official answer, I can give you credit.

Tags: r dplyr lubridate


【Solution 1】:

Using dplyr:

library(dplyr)

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'),    # condition
                             'NewOnset',    # return if above (true)
                             'OldOnset'))   # return if below (false)

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>    <chr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset

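Note that the grouping itself changes nothing in the output here; you could do both operations inside mutate. What you do get is a grouped data.frame that is ready for further mutating or summarising.

Another approach is to use cut.Date, which returns a factor: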
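As a minimal sketch of what "further summarising" on the grouped result could look like, the example below counts clients per group. It assumes dplyr is installed and uses a four-row subset of the question's data; the column names `n_clients` and `earliest` are made up for illustration.

```r
library(dplyr)

# Small subset of the question's data (assumption: 4 rows are enough to illustrate)
DF <- data.frame(
  Client    = c("Cl1", "Cl3", "Cl4", "Cl7"),
  OnsetDate = c("11/04/1980", "04/19/2015", "07/10/2015", "09/12/2011"),
  stringsAsFactors = FALSE
)

counts <- DF %>%
  mutate(OnsetDate = as.Date(OnsetDate, "%m/%d/%Y")) %>%
  group_by(group = if_else(OnsetDate >= as.Date("2015-04-01"),
                           "NewOnset", "OldOnset")) %>%
  # one row per group: number of clients and earliest onset date
  summarise(n_clients = n(), earliest = min(OnsetDate))

counts
```

With this subset each group ends up with two clients.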

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = cut(OnsetDate, 
                         breaks = c(min(OnsetDate), as.Date('2015-04-01'), max(OnsetDate)), 
                         labels = c('OldOnset', 'NewOnset'), 
                         include.lowest = TRUE))

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>   <fctr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset
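One detail worth checking with cut on dates is which side of the break is closed. The sketch below uses base R only and, as an assumption for illustration, replaces min/max with fixed outer breaks far outside the data; it suggests that a date falling exactly on the cutoff lands in the later interval, because cut.Date defaults to right = FALSE.

```r
# Three probe dates: the day before the cutoff, the cutoff itself, and a later date
dates <- as.Date(c("2015-03-31", "2015-04-01", "2016-08-08"))

# Outer breaks are arbitrary far-away dates (an assumption, not from the original answer)
grp <- cut(dates,
           breaks = as.Date(c("1900-01-01", "2015-04-01", "2100-01-01")),
           labels = c("OldOnset", "NewOnset"))

as.character(grp)
# "OldOnset" "NewOnset" "NewOnset"
```

Fixed outer breaks avoid depending on the minimum and maximum of the data, at the cost of hard-coding a plausible date range.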

【Comments】:

    【Solution 2】:

    No external packages are needed for this simple task. In base R:

    ## coerce character to a valid date
    DF$OnsetDate <- as.Date(DF$OnsetDate, "%m/%d/%Y")
    ## filter rows with an onset in or after April 2015
    DF[DF$OnsetDate >= as.Date("2015-04-01"), ]
    
    #    Client     City  OnsetDate
    # 3     Cl3 Montreal 2015-04-19
    # 4     Cl4   Ottawa 2015-07-10
    # 6     Cl6 Hamilton 2016-03-11
    # 8     Cl8  Toronto 2015-06-10
    # 10   Cl10 Hamilton 2016-08-08
    

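    【Comments】:

      【Solution 3】:

      You can do this without any dplyr functions. Lubridate's family of parsing functions is named after the order of the components in the object you are converting to a date. In this case you want the mdy function, because the input format is month-day-year.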

      library(lubridate)
      DF$OnsetDate <- mdy(DF$OnsetDate)

      You can then create new data frames by subsetting the rows according to your condition.
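To illustrate the naming scheme, here is a small sketch (assuming the lubridate package is installed) in which three differently ordered strings for the same day are parsed by the matching function. The input strings are made up for the example.

```r
library(lubridate)

d1 <- mdy("04/19/2015")  # month, day, year
d2 <- dmy("19/04/2015")  # day, month, year
d3 <- ymd("2015-04-19")  # year, month, day

# All three should represent the same calendar day
c(d1, d2, d3)
```

Picking the function whose letters match the order of the input is what makes a format string unnecessary here.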

      NewOnset <- DF[DF$OnsetDate >= as.Date("2015-04-01"), ]
      OldOnset <- DF[DF$OnsetDate < as.Date("2015-04-01"), ]
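If both subsets are needed at once, one possible alternative (a sketch, not part of the original answer) is to label each row and let base R's split build a named list of data frames. The two-row data frame and the `cutoff` variable are assumptions for illustration.

```r
# Two rows taken from the question's data, already parsed to Date
DF <- data.frame(
  Client    = c("Cl3", "Cl7"),
  OnsetDate = as.Date(c("2015-04-19", "2011-09-12"))
)

cutoff <- as.Date("2015-04-01")

# Label each row, then split into one data frame per label
DF$group <- ifelse(DF$OnsetDate >= cutoff, "NewOnset", "OldOnset")
parts <- split(DF, DF$group)

parts$NewOnset
parts$OldOnset
```

This keeps the condition in one place, so the two subsets cannot drift apart if the cutoff changes.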
      

      【Comments】:

        【Solution 4】:

        There are a few problems with your code. This should fix it:

        City <- c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa", "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton")
        OnsetDate <- c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")
        Client <- c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")
        
        df <- data.frame(Client, City, OnsetDate)
        
        df$OnsetDate <- as.Date(df$OnsetDate, format = "%m/%d/%Y")    
        
        library(dplyr)  # provides %>% and filter()
        
        # here comes the magic: keep onsets in or after April 2015
        df %>% filter(OnsetDate >= as.Date("04/01/2015", format = "%m/%d/%Y"))
        

        You can use the format argument of as.Date; the lubridate package is not needed here. The code above produces:

          Client     City  OnsetDate
        1    Cl3 Montreal 2015-04-19
        2    Cl4   Ottawa 2015-07-10
        3    Cl6 Hamilton 2016-03-11
        4    Cl8  Toronto 2015-06-10
        5   Cl10 Hamilton 2016-08-08
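A short base R sketch of how the format argument behaves may help here. The strings are made up for illustration; the point is that text which does not match the given format yields NA rather than an error, so a wrong format can silently drop rows in a later filter.

```r
# Month-first text parsed with a month-first format
ok  <- as.Date("04/19/2015", format = "%m/%d/%Y")

# Day-first text read with the same format: 19 is not a valid month, so the result is NA
bad <- as.Date("19/04/2015", format = "%m/%d/%Y")

# ISO 8601 text needs no format argument
iso <- as.Date("2015-04-19")

ok == iso    # TRUE
is.na(bad)   # TRUE
```

Checking for NA values right after parsing (for example with anyNA) is a cheap way to catch a mismatched format.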
        

        【Comments】:
