【Question title】: Subset dataset with time condition in R
【Posted】: 2016-08-29 05:04:03
【Question】:

I have a dataset like this, example.txt:

"09/Jan/2016" "05:00:22" "304" 449
"09/Jan/2016" "07:00:12" "304" 449
"09/Jan/2016" "10:00:02" "200" 10575
"09/Jan/2016" "11:00:03" "304" 449
"09/Jan/2016" "13:00:03" "304" 449
"09/Jan/2016" "20:00:03" "304" 449 
"09/Jan/2016" "23:00:03" "304" 450 
"10/Jan/2016" "00:00:03" "304" 449 
"10/Jan/2016" "03:00:03" "304" 449 
"10/Jan/2016" "04:00:03" "304" 449 

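Can I subset my dataset to the range covering the six hours before I run my code in R? For example, if I open and run my code on January 10 at 4:15, I want a subset of my dataset like this: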

"09/Jan/2016" "23:00:03" "304" 450 
"10/Jan/2016" "00:00:03" "304" 449 
"10/Jan/2016" "03:00:03" "304" 449 
"10/Jan/2016" "04:00:03" "304" 449 

What function should I use for this, and how do I use it?

【Question comments】:

    Tags: r time subset


    【Solution 1】:

    Assuming you have 4 columns named V1, V2, V3 and V4, and the data frame is called df.

    You can do this in base R with:

    mergedDateTime <- as.POSIXct(paste(df$V1, df$V2), format = "%d/%b/%Y %H:%M:%S")
    df[(Sys.time() - 6*60*60) <  mergedDateTime & Sys.time() > mergedDateTime, ]
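
The same filter can be wrapped in a small function so that the reference time is read only once and can be overridden. This is a minimal sketch, not part of the original answer: the name `subset_last_hours` and its `now` and `hours` arguments are hypothetical, and it assumes the columns are called V1 and V2 as above and that the session locale recognises English month abbreviations such as "Jan".

```r
# Hypothetical helper (not from the original answer): keep the rows whose
# timestamp falls in the `hours` hours before `now`.
subset_last_hours <- function(df, now = Sys.time(), hours = 6) {
  # Combine the date (V1) and time (V2) columns into one POSIXct vector.
  # "%b" matches abbreviated month names in the current locale.
  stamp <- as.POSIXct(paste(df$V1, df$V2), format = "%d/%b/%Y %H:%M:%S")
  # `now` is evaluated once, so both bounds refer to the same instant.
  lower <- now - hours * 60 * 60
  # Rows whose timestamp failed to parse (NA) are dropped.
  keep <- !is.na(stamp) & stamp > lower & stamp < now
  df[keep, , drop = FALSE]
}

# Usage with the current time (the default) or a fixed reference time:
# subset_last_hours(df)
# subset_last_hours(df, now = as.POSIXct("2016-01-10 04:15:00"))
```

Passing `now` explicitly makes the result reproducible, which helps when checking the filter against old data.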
    

    For the given example, this would work as follows:

    x <- "01/10/2016 04:15:00"
    mergedDateTime <- as.POSIXct(paste(df$V1, df$V2), format = "%d/%b/%Y %H:%M:%S")
    df[(as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S") - 6*60*60) <  mergedDateTime & 
                    as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S") > mergedDateTime, ]
    
    
    #        V1       V2      V3  V4
    #7  09/Jan/2016 23:00:03 304 450
    #8  10/Jan/2016 00:00:03 304 449
    #9  10/Jan/2016 03:00:03 304 449
    #10 10/Jan/2016 04:00:03 304 449
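
To reproduce this result without the file, the sample rows from the question can be read from a string. The sketch below is an illustration rather than the answerer's code: the variable names `sample_text`, `now`, `stamp` and `recent` are made up here, and fixing the time zone to UTC is an assumption so that the outcome does not depend on the local time zone. It also assumes a locale in which "%b" matches "Jan".

```r
# Sample rows copied from the question. read.table() strips the double
# quotes and names the columns V1..V4 by default.
sample_text <- '
"09/Jan/2016" "05:00:22" "304" 449
"09/Jan/2016" "07:00:12" "304" 449
"09/Jan/2016" "10:00:02" "200" 10575
"09/Jan/2016" "11:00:03" "304" 449
"09/Jan/2016" "13:00:03" "304" 449
"09/Jan/2016" "20:00:03" "304" 449
"09/Jan/2016" "23:00:03" "304" 450
"10/Jan/2016" "00:00:03" "304" 449
"10/Jan/2016" "03:00:03" "304" 449
"10/Jan/2016" "04:00:03" "304" 449
'
df <- read.table(text = sample_text, stringsAsFactors = FALSE)

# Fixed reference time standing in for Sys.time() (UTC is an assumption).
now <- as.POSIXct("2016-01-10 04:15:00", tz = "UTC")
stamp <- as.POSIXct(paste(df$V1, df$V2),
                    format = "%d/%b/%Y %H:%M:%S", tz = "UTC")

# Keep rows from the six hours before the reference time.
recent <- df[stamp > now - 6 * 60 * 60 & stamp < now, ]
print(recent)
```

With a real file, `read.table("example.txt", stringsAsFactors = FALSE)` would take the place of the `text =` argument.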
    

    【Comments】:

      【Solution 2】:

      The lubridate and chron packages, used together, are very powerful and expressive for working with dates and times:

      library(readr)
      library(dplyr)      # provides %>%, mutate() and filter() used below
      library(chron)
      library(lubridate)
      
      # read the data in
      df_foo = read_table(file = '"09/Jan/2016" "05:00:22" "304" 449
      "09/Jan/2016" "07:00:12" "304" 449
      "09/Jan/2016" "10:00:02" "200" 10575
      "09/Jan/2016" "11:00:03" "304" 449
      "09/Jan/2016" "13:00:03" "304" 449
      "09/Jan/2016" "20:00:03" "304" 449 
      "09/Jan/2016" "23:00:03" "304" 450 
      "10/Jan/2016" "00:00:03" "304" 449 
      "10/Jan/2016" "03:00:03" "304" 449 
      "10/Jan/2016" "04:00:03" "304" 449', 
                          col_names = c("Date", "Time", "Value1", "Value2"))
      
      # parse dates and times
      df_foo = df_foo %>% 
        mutate(
          # parse the dates
          Date = as.Date(gsub('"', "", Date), format = "%d/%b/%Y"),
          # parse the times
          Time = times(format(gsub('"', "", Time), format = "%H:%M:%S")),
          Value1 = as.integer(gsub('"', "", Value1)),
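          # datetime; chron:: is spelled out because lubridate, loaded later,
          # masks hours(), minutes() and seconds() with period constructors
          Datetime = ISOdatetime(
            year = year(Date),
            month = month(Date),
            day = day(Date),
            hour = chron::hours(Time),
            min = chron::minutes(Time),
            sec = chron::seconds(Time)
          )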
        )
      
      # filter to data within 6 hours of the current time
      df_foo %>% 
        filter(
          Datetime > Sys.time() - dhours(6)
        )
      

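      Obviously, given the data sample you included in your question, this will return nothing, since every timestamp in it is far more than six hours before the current time.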
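
To see the filter return rows for the sample data, the current time can be swapped for a fixed reference time. The following is a sketch under stated assumptions, not the answerer's code: `df_demo`, `ref_time` and `recent` are hypothetical names, only a few rows from the question are included, the time zone is assumed to be UTC, and the datetime is parsed in one step with lubridate's `dmy_hms()` instead of the chron-based construction above.

```r
library(dplyr)
library(lubridate)

# A few rows from the question, already split into date and time strings.
df_demo <- data.frame(
  Date = c("09/Jan/2016", "09/Jan/2016", "10/Jan/2016", "10/Jan/2016"),
  Time = c("20:00:03", "23:00:03", "03:00:03", "04:00:03"),
  Value2 = c(449L, 450L, 449L, 449L),
  stringsAsFactors = FALSE
)

# Hypothetical reference time standing in for Sys.time().
ref_time <- ymd_hms("2016-01-10 04:15:00", tz = "UTC")

recent <- df_demo %>%
  # dmy_hms() accepts abbreviated month names such as "Jan".
  mutate(Datetime = dmy_hms(paste(Date, Time), tz = "UTC")) %>%
  # Keep rows from the six hours up to the reference time.
  filter(Datetime > ref_time - dhours(6), Datetime <= ref_time)

print(recent)
```

The upper bound is included here so that rows stamped after the reference time are excluded, which the lower-bound-only filter would let through.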

      【Comments】:

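      • This can be much simpler in lubridate, e.g. parse_date_time(paste(df_foo$Date, df_foo$Time), orders="dmyHMS"), or even in base R, where it is just one line: as.POSIXct(paste(df_foo$Date, df_foo$Time), format="%d/%b/%Y %H:%M:%S", tz="UTC")
      • @thelatemail Are you referring to the datetime construction? My version is also a single call to ISOdatetime. I fail to see the economy in either of your versions.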