【Question title】: nested If statement on dates
【Posted】: 2017-01-31 20:09:37
【Question description】:

I have a data frame df that looks like this.

Id     ProcessDate
10     2011-12-29 14:14:00
11     2011-12-29 14:16:00
12     2011-12-29 14:14:00
13     2011-12-29 14:20:00
14     2011-12-29 14:49:00
15     2011-12-29 14:51:00
16     2011-12-29 14:53:00
17     2011-12-29 15:11:00
18     2011-12-29 15:13:00 
19     2011-12-29 15:10:00
20     2011-12-29 15:21:00
21     2011-12-29 14:34:00
22     2011-12-29 15:26:00  

I am trying to create a third column, Status, which will contain one of three values {Before, during, after} based on this condition.

 if  (df$ProcessDate < 2011-12-29 14:48:00)
 then  df$Status = "Before"
 else if (df$ProcessDate > 2011-12-29 14:48:00 & df$ProcessDate < 2011-12-29 15:16:00)
 then  df$Status = "Between"
 else  df$Status = "After"

The final data frame should look like this.

Id     ProcessDate              Status
10     2011-12-29 14:14:00      Before
11     2011-12-29 14:16:00      Before
12     2011-12-29 14:14:00      Before
13     2011-12-29 14:20:00      Before
14     2011-12-29 14:49:00      Between
15     2011-12-29 14:51:00      Between       
16     2011-12-29 14:53:00      Between
17     2011-12-29 15:11:00      Between
18     2011-12-29 15:13:00      Between
19     2011-12-29 15:10:00      Between
20     2011-12-29 15:21:00      After
21     2011-12-29 14:34:00      After
22     2011-12-29 15:26:00      After

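I have tried a few approaches without success; any help with this problem would be much appreciated.

【Question discussion】:

  • a few examples, similar questions - what did you try, and why didn't it work?
  • @KimJenkins I think the status in your second-to-last row should be "Before", right?

Tags: r if-statement dataframe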


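【Solution 1】:

Subset assignment

For this particular case, a very simple way to do this in base R is to set everything to 'Between' and then use subset assignment to change the rows that should be something else: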

df$ProcessDate <- as.POSIXct(df$ProcessDate)    # skip if already parsed to datetime

df$Status <- 'Between'
df$Status[df$ProcessDate < as.POSIXct('2011-12-29 14:48:00')] <- 'Before'
df$Status[df$ProcessDate >= as.POSIXct('2011-12-29 15:16:00')] <- 'After'

df
##    Id         ProcessDate  Status
## 1  10 2011-12-29 14:14:00  Before
## 2  11 2011-12-29 14:16:00  Before
## 3  12 2011-12-29 14:14:00  Before
## 4  13 2011-12-29 14:20:00  Before
## 5  14 2011-12-29 14:49:00 Between
## 6  15 2011-12-29 14:51:00 Between
## 7  16 2011-12-29 14:53:00 Between
## 8  17 2011-12-29 15:11:00 Between
## 9  18 2011-12-29 15:13:00 Between
## 10 19 2011-12-29 15:10:00 Between
## 11 20 2011-12-29 15:21:00   After
## 12 21 2011-12-29 14:34:00  Before
## 13 22 2011-12-29 15:26:00   After

cut

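The purpose-built approach is to use cut, which has a cut.POSIXt method. It needs breaks before and after the data in addition to the ones you already want, but it returns a factor, which suits categorical data well.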

df$Status <- cut(df$ProcessDate, 
                 breaks = c(min(df$ProcessDate), 
                          as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00')), 
                          max(df$ProcessDate) + 1), 
                 labels = c('Before', 'Between', 'After'))

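Nested ifelse calls

The most common and general base R version is nested ifelse calls, which look ugly (especially if there are many of them) but evaluate quickly, because ifelse is vectorized whereas if is not: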

df$Status <- ifelse(df$ProcessDate < as.POSIXct('2011-12-29 14:48:00'), 
                    'Before', 
                    ifelse(df$ProcessDate < as.POSIXct('2011-12-29 15:16:00'), 
                           'Between', 
                           'After'))
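To illustrate the difference between the two, here is a minimal sketch on three of the question's timestamps. The names `x`, `lower` and `upper` and the `tz = 'UTC'` setting are assumptions added for reproducibility; they are not part of the original answer.

```r
# Three timestamps from the question; tz = 'UTC' is an assumption for reproducibility
x <- as.POSIXct(c('2011-12-29 14:14:00', '2011-12-29 14:49:00',
                  '2011-12-29 15:21:00'), tz = 'UTC')
lower <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')
upper <- as.POSIXct('2011-12-29 15:16:00', tz = 'UTC')

# The comparison yields one logical value per element
cond <- x < lower

# if (cond) ... would not work element-wise: if() expects a single TRUE/FALSE,
# and a longer condition gives a warning or an error depending on the R version.

# ifelse() picks a value for every element of the condition in one call
status <- ifelse(x < lower, 'Before',
                 ifelse(x < upper, 'Between', 'After'))
status
```

Because the second condition is only reached when the first one is FALSE, it does not need to repeat the lower bound.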

dplyr

dplyr::case_when is a nice alternative to nested ifelse calls. It evaluates each condition in turn and returns the corresponding value:

library(dplyr)

df %>% mutate(
    ProcessDate = as.POSIXct(ProcessDate),    # skip this line if already datetime
    Status = case_when(
        # if this is true, return "Before"
        ProcessDate < as.POSIXct('2011-12-29 14:48:00') ~ 'Before',
        # for the rest, if this is true, return "Between"
        ProcessDate < as.POSIXct('2011-12-29 15:16:00') ~ 'Between',
        # always true, so make the rest "After"
        TRUE ~ 'After'))

All versions return the same thing, except cut, which returns a factor rather than a character vector.
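As a quick check of that claim for the base R versions, the sketch below rebuilds the question's data and compares the results. The `by_*` names and the `tz = 'UTC'` setting are assumptions made here for reproducibility; the dplyr version is left out so the example needs no extra packages.

```r
# The question's data; tz = 'UTC' is an assumption for reproducibility
df <- data.frame(
  Id = 10:22,
  ProcessDate = as.POSIXct(c(
    '2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00',
    '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00',
    '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00',
    '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00',
    '2011-12-29 15:26:00'), tz = 'UTC'))
lower <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')
upper <- as.POSIXct('2011-12-29 15:16:00', tz = 'UTC')

# 1. Subset assignment
by_subset <- rep('Between', nrow(df))
by_subset[df$ProcessDate < lower] <- 'Before'
by_subset[df$ProcessDate >= upper] <- 'After'

# 2. Nested ifelse
by_ifelse <- ifelse(df$ProcessDate < lower, 'Before',
                    ifelse(df$ProcessDate < upper, 'Between', 'After'))

# 3. cut (returns a factor)
by_cut <- cut(df$ProcessDate,
              breaks = c(min(df$ProcessDate), lower, upper,
                         max(df$ProcessDate) + 1),
              labels = c('Before', 'Between', 'After'))

all(by_subset == by_ifelse)             # same labels from both character versions
is.factor(by_cut)                       # cut gives a factor, not a character vector
all(by_subset == as.character(by_cut))  # same labels once converted
```

If a character column is preferred, wrapping the cut result in as.character() brings it in line with the other versions.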

【Discussion】:

    【Solution 2】:

    Try this:

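    library(data.table)
    DT <- as.data.table(df)    # assumes df$ProcessDate is already POSIXct
    left <- as.POSIXct("12/29/2011 14:48", format = "%m/%d/%Y %H:%M") 
    right <- as.POSIXct("12/29/2011 15:16", format = "%m/%d/%Y %H:%M") 
    # >= so that a time exactly on the upper bound counts as "after", as in the question
    DT[, Status := ifelse(ProcessDate < left, "before", 
                ifelse(ProcessDate >= right, "after", "between"))]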
    

    This gives:

        Id         ProcessDate  Status
     1: 10 2011-12-29 14:14:00  before
     2: 11 2011-12-29 14:16:00  before
     3: 12 2011-12-29 14:14:00  before
     4: 13 2011-12-29 14:20:00  before
     5: 14 2011-12-29 14:49:00 between
     6: 15 2011-12-29 14:51:00 between
     7: 16 2011-12-29 14:53:00 between
     8: 17 2011-12-29 15:11:00 between
     9: 18 2011-12-29 15:13:00 between
    10: 19 2011-12-29 15:10:00 between
    11: 20 2011-12-29 15:21:00   after
    12: 21 2011-12-29 15:34:00   after
    13: 22 2011-12-29 15:26:00   after
    

    The result is the same as above, using the vectorized ifelse() inside data.table.

    【Discussion】:

      【Solution 3】:

      This may be one possible solution

      ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
      date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
      df <- data.frame(Id = ids, 
                       ProcessDate = strptime(date, format = '%Y-%m-%d %H:%M:%S'))
      
      
      date.status.before <- strptime('2011-12-29 14:48:00', format = '%Y-%m-%d %H:%M:%S')
      date.status.after <- strptime('2011-12-29 15:16:00', format = '%Y-%m-%d %H:%M:%S')
      ProcessDateStatus <- function(process.date) {
        # process.date is a single date-time here, so plain if/else is fine
        if  (process.date < date.status.before)
          "Before"
        else if (process.date < date.status.after)
          "Between"
        else 
          "After"  
      }
      # vapply returns a character vector; lapply would create a list column
      df$Status <- vapply(df$ProcessDate, ProcessDateStatus, character(1))
      

      which results in

         Id         ProcessDate  Status
      1  10 2011-12-29 14:14:00  Before
      2  11 2011-12-29 14:16:00  Before
      3  12 2011-12-29 14:14:00  Before
      4  13 2011-12-29 14:20:00  Before
      5  14 2011-12-29 14:49:00 Between
      6  15 2011-12-29 14:51:00 Between
      7  16 2011-12-29 14:53:00 Between
      8  17 2011-12-29 15:11:00 Between
      9  18 2011-12-29 15:13:00 Between
      10 19 2011-12-29 15:10:00 Between
      11 20 2011-12-29 15:21:00   After
      12 21 2011-12-29 14:34:00  Before
      13 22 2011-12-29 15:26:00   After
      

      【Discussion】:

        【Solution 4】:

        One possible solution is to convert your times to epoch values and then compare them. This can be done with as.integer(as.POSIXct("Time")), as shown below

        df = NULL
        df$ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
        df$date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
        df = as.data.frame(df)
        df$date = as.integer(as.POSIXct(df$date))
        
        upper   = as.integer(as.POSIXct('2011-12-29 15:16:00'))
        lower   = as.integer(as.POSIXct('2011-12-29 14:48:00'))
        

        You will then have the converted date column as follows

        > df
            ids       date
        1   10 1325148240
        2   11 1325148360
        3   12 1325148240
        4   13 1325148600
        5   14 1325150340
        6   15 1325150460
        7   16 1325150580
        8   17 1325151660
        9   18 1325151780
        10  19 1325151600
        11  20 1325152260
        12  21 1325149440
        13  22 1325152560
        

        Then you can simply do a numeric comparison

        df$Status = NA_character_    # create the column before filling it row by row
        for(i in seq_len(nrow(df))){
            if(df$date[i] < lower)
                    df$Status[i] = "Before"
            else if(df$date[i] < upper)
                    df$Status[i] = "Between"
            else
                    df$Status[i] = "After"
        }
        

        which produces the output

        > df
            ids       date  Status
        1   10 1325148240  Before
        2   11 1325148360  Before
        3   12 1325148240  Before
        4   13 1325148600  Before
        5   14 1325150340 Between
        6   15 1325150460 Between
        7   16 1325150580 Between
        8   17 1325151660 Between
        9   18 1325151780 Between
        10  19 1325151600 Between
        11  20 1325152260   After
        12  21 1325149440  Before
        13  22 1325152560   After
        

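        【Discussion】:

        • To preserve the date format, you can perform the conversion from date-time to epoch inside the condition for each row.
        • There's no real point in converting to numbers; the comparison operators (< etc.) work directly on POSIX*t date-times, so all you're doing is making the data harder to read.
        • As per my earlier comment, you can keep the data format. The conversion does not need to be explicit. It was done to illustrate the concept of the epoch when comparing dates.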
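
The point raised in these comments can be checked with a small sketch: comparing POSIXct values directly should give the same answer as comparing their epoch seconds. The names `x`, `lower`, `direct` and `via_epoch` and the `tz = 'UTC'` setting are assumptions introduced for this illustration only.

```r
# Three timestamps from the question; tz = 'UTC' is an assumption for reproducibility
x <- as.POSIXct(c('2011-12-29 14:14:00', '2011-12-29 14:49:00',
                  '2011-12-29 15:21:00'), tz = 'UTC')
lower <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')

# Compare the date-times directly; the column stays human-readable
direct <- x < lower

# Compare seconds since 1970-01-01 00:00:00 UTC instead
via_epoch <- as.integer(x) < as.integer(lower)

all(direct == via_epoch)  # both routes agree element by element
```

POSIXct values are stored as seconds since the epoch, so the explicit conversion changes how the column prints rather than how it compares.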