【Question title】: How to convert irregular times into an XTS object using R
【Posted】: 2017-05-22 01:08:42
【Question】:

I have the following data.frame that I want to convert to an xts() object, but I have been struggling to work out how to format the times:

[image: data.frame]

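The data are ordered from most recent (top) to oldest (bottom). The problem is that the rows are not formatted consistently: some carry both a date and a time, others only a time. I am having trouble formatting the column so that every row shows the correct date and time.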

Desired output for the date/time column:

01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM

Data:

data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM", 
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")

【Comments】:

    Tags: r time xts


    【Solution 1】:

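    Use sub to replace a leading digit in Date with NA, followed by a space, followed by that digit. Then use read.table to create a two-column data frame with the date (or NA) in column 1 and the time in column 2. Use na.locf to fill in the NA values, giving DF2. Now cbind DF2 and data[-1], and read the resulting data.frame with read.zoo. Finally, convert the resulting "zoo" object to "xts":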

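    library(xts)  # also attaches zoo, which provides na.locf() and read.zoo()
    # Time-only rows get an "NA " date placeholder, which na.locf() then fills from the row above
    DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
    # Columns 1 and 2 (date, time) together form the index
    z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
    as.xts(z)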
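
To make the first two steps of this answer easier to follow, here is a small sketch of what sub and read.table produce before na.locf is applied. It uses base R only, and the variable names padded and parts are illustrative, not part of the original answer.

```r
# Date column from the question: time-only rows lack a date
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Rows that start with a digit have no date, so prefix them with "NA "
padded <- sub("^(\\d)", "NA \\1", dates)
print(padded)

# Split on whitespace into two columns: V1 = date (or NA), V2 = time
parts <- read.table(text = padded, stringsAsFactors = FALSE)
print(parts)
```

After this step every row has a time in V2, and only the rows that originally carried a date have a non-missing V1, which is exactly the gap that na.locf fills.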
    

    【Comments】:

      【Solution 2】:

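      Assuming there is a typo in the last row of your desired date-time output, and that you meant 12/29/16 07:00 AM, then whenever an element of the Date column is missing its date, take the most recent known date from the rows above and roll it "backward" (the rows run from newest to oldest):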

      library(stringr)
      library(xts)  # provides xts(); also attaches zoo for na.locf()
      
      l_datetime <- str_split(data$Date, " ")
      data$ymd <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[1]], NA)))
      data$time <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[2]], x[[1]])))
      # Roll "backward" the latest known date for elements of column `Date` that have no date part (only a time)
      data$ymd <- na.locf(data$ymd) 
      # Carefully parse the time strings allowing for AM/PM:
      psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")
      
      x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
      # > x_data
      #                                                                                                         News                                  Symbol
      # 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"                                           "ETRM"
      # 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)"                                              "ETRM"
      # 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%"                                    "ETRM"
      # 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
      # 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday"                                                       "ETRM"
      # 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%"                           "ETRM"
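
The same fill-then-parse idea can be sketched in base R, without stringr or zoo, and then rendered in the mm/dd/yy layout the question asks for. This is a minimal sketch under two assumptions: the first row always carries a date, and month and AM/PM names are parsed in an English locale. The names has_date, last_dated, day_part, time_part and stamps are illustrative.

```r
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# TRUE where the entry has both a date and a time (they are separated by a space)
has_date <- grepl(" ", dates, fixed = TRUE)

# For each row, the index of the nearest row at or above it that carries a date
# (assumes the first row has a date; otherwise the index would be 0)
last_dated <- cummax(ifelse(has_date, seq_along(dates), 0L))

day_part  <- sub(" .*$", "", dates[last_dated])  # date taken from the filled row
time_part <- sub("^.* ", "", dates)              # time from the row itself

stamps <- as.POSIXct(paste(day_part, time_part),
                     format = "%b-%d-%y %I:%M%p", tz = "UTC")

# Render in the layout requested in the question
print(format(stamps, "%m/%d/%y %I:%M %p"))
```

The tz = "UTC" argument is only there to make the result reproducible across machines; use tz = "" for the local time zone, as in the first answer.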
      

      【Comments】:

        【Solution 3】:

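        Here is a solution using the tidyquant package, which loads the packages needed to solve this problem. As with the other solutions, you first need to get the dates into a consistent structure, such as: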

        "Jan-05-17 02:55 PM"
        

        With the lubridate package, you can convert to the POSIXct class with the mdy_hm() function, like so:

        "Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
        > "2017-01-05 14:55:00 UTC"
        

        The name lubridate::mdy_hm() stands for month-day-year hour-minute. The output is the date in a proper date-time class.

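        The tidyquant package has a convenience function, as_xts(), with an argument, date_col, that when specified converts the data.frame date column into the xts index (row names). I use the pipe (%>%) to make the code more readable and to show the workflow, and dplyr::mutate() to change the Date column to the POSIXct class with lubridate::mdy_hm(). The final workflow looks like this: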

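        library(tidyquant)  # provides as_xts()
        library(dplyr)      # provides mutate() and %>% in case tidyquant does not attach them

        data %>%
            mutate(Date = lubridate::mdy_hm(Date)) %>%
            as_xts(date_col = Date)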
        

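        Before trying the code snippet, make sure every row of the Date column has a valid format such as "Jan-05-17 02:55 PM"; otherwise you will get parsing failures at lubridate::mdy_hm() (entries that cannot be parsed come back as NA with a warning).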
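
One way to check the column before running the workflow is to parse it once and list the rows that fail. The sketch below uses base R's as.POSIXct as a stand-in for lubridate::mdy_hm() so that it runs without extra packages; the format string, the sample vector and the name bad_rows are assumptions for illustration, and month and AM/PM names are assumed to be in an English locale.

```r
# Hypothetical sample: two well-formed entries and one time-only entry
dates <- c("Jan-05-17 02:55 PM", "11:40 AM", "Dec-29-16 07:00 AM")

# Entries that do not match the full date-time layout parse to NA
parsed <- as.POSIXct(dates, format = "%b-%d-%y %I:%M %p", tz = "UTC")

# Row numbers that still need a date filled in before conversion
bad_rows <- which(is.na(parsed))
print(bad_rows)
print(dates[bad_rows])
```

If bad_rows is empty, the column is consistent and can go straight into the pipeline; otherwise those rows need their dates filled in first, for example with one of the approaches from the other answers.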

        The data I used for testing:

        data <- structure(list(
            Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM",
                     "Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"),
            News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%",
                     "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
                     "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
                     "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
                     "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
                     "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"),
            Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")),
            .Names = c("Date", "News", "Symbol"),
            row.names = c(NA, 6L),
            class = "data.frame")
        

        【Comments】:
