从R中的字符日期中删除分钟和秒答案

【问题标题】：Erase minutes and seconds from character dates in R从R中的字符日期中删除分钟和秒
【发布时间】：2019-03-21 01:02:12
【问题描述】：

我有这个时间戳向量：

c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
"01/09/2019 10:52:20")

我想从字符向量中去掉分钟和秒，这样我就只有 01/09/2019 9 和 01/09/2019 10

最有效的方法是什么？

【问题讨论】：

有趣的公认答案，您对高效的定义是什么？
我非常偏向于tidyverse 包:)
啊我明白了，我不怪你 :) 下次你可能应该把它放在问题中......

标签： r

【解决方案1】：

这是一个。

datevec <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
      "01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
      "01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
      "01/09/2019 10:52:20")

format(as.POSIXct(datevec, format = "%d/%m/%Y %H:%M:%OS"), "%d/%m/%Y %H")

# Result
 [1] "01/09/2019 09" "01/09/2019 09" "01/09/2019 09" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"
 [7] "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"

【讨论】：

【解决方案2】：

您想要的输出类是什么？这个怎么样：

v <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
  "01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
  "01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
  "01/09/2019 10:52:20")


strptime(v, "%m/%d/%Y %H")

【讨论】：

【解决方案3】：

这看起来不错，

unlist(strsplit(mystring, split = ":", fixed=TRUE))[c(TRUE, FALSE,FALSE)]

（在here 的帮助下制作）

可以选择，

sapply(strsplit(mystring, split=':', fixed=TRUE), `[`, 1)

使用 Ronak 的一些基准和最近的 cmets，fixed=TRUE 使方法更快，我们看到方法 4（上述方法）最快，

mystring <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
              "01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
              "01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
              "01/09/2019 10:52:20")

microbenchmark(one = sapply(strsplit(mystring, split=':', fixed=TRUE), `[`, 1),
           two = unlist(lapply(mystring,function(x) strsplit(x,":", fixed=TRUE)[[1]][1])),
           three = strptime(mystring, "%m/%d/%Y %H"),
           four = unlist(strsplit(mystring, split = ":", fixed=TRUE))[c(TRUE, FALSE,FALSE)],
           five = format(as.POSIXct(mystring, format = "%d/%m/%Y %H:%M:%OS"), "%d/%m/%Y %H"), 
           six = gsub("(.*?):.*", "\\1", mystring),
           seven = str_extract(mystring, ".+(?=:.+:)"),
           times = 100000)



    Unit: microseconds
  expr     min      lq      mean  median       uq        max neval
   one  42.792  49.471  85.63742  52.572  57.1310  669280.96 1e+05
   two  64.637  70.618 114.16364  73.252  77.6840  582466.94 1e+05
 three 129.456 134.771 156.82308 136.188 139.2030  339715.94 1e+05
  four  12.860  15.641  22.75699  17.254  18.5440  305703.52 1e+05
  five 482.888 505.647 633.15388 512.880 552.1155  551274.28 1e+05
   six  37.889  43.121  52.79030  45.567  49.1880   32954.59 1e+05
 seven  53.432  59.051  88.05015  62.326  69.9320 1180361.17 1e+05

【讨论】：

是的。我认为fixed = TRUE 让它更快。
很有意思，其实4个，fixed=TRUE最快，改了benchmark来显示这个。

【解决方案4】：

你也可以从stringr使用str_extract：

date_strings <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
"01/09/2019 10:52:20")

str_extract(date_strings, ".+(?=:.+:)")

 [1] "01/09/2019 9"  "01/09/2019 9"  "01/09/2019 9"  "01/09/2019 10"
 [5] "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"
 [9] "01/09/2019 10" "01/09/2019 10"

【讨论】：

【解决方案5】：

另一个：

dates <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04", 
                  "01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36", 
                  "01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39", 
                  "01/09/2019 10:52:20")
unlist(lapply(dates,function(x) strsplit(x,":")[[1]][1]))

给予

 [1] "01/09/2019 9"  "01/09/2019 9"  "01/09/2019 9"  "01/09/2019 10" "01/09/2019 10"
 [6] "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"

【讨论】：

【解决方案6】：

这是另一个使用gsub

通过()和\\1捕获模式来引用捕获的组，需要?使正则表达式变得懒惰，因为有多个:。

gsub("(.*?):.*", "\\1", dates)

【讨论】：