【Question Title】: Convert column values to date in R
【Posted】: 2021-01-03 16:02:16
【Question Description】:

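I have read and applied the following threads, without success:

convert specified columns to dates in R
R convert character vector values to Date values

I have a column named dates that, for some reason, I cannot convert to actual dates with as.Date. It is in a data frame named general.

I tried extracting it into another object, but all I get is a list of values: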

[[1]]
 [1] NA           "43897"      NA           "44004"      "23/05/2020" "25/06/2020" "25/06/2020"
 [8] "43837"      "43989"      "43868"      "43989"      "18/07/2020" NA           "23/06/2020"
[15] "30/06/2020" "21/07/2020" "31/07/2020" "24/06/2020" "28/06/2020" "17/06/2020" "43989"     
[22] "16/06/2020" NA           "43896"      "23/06/2020" "44018"      "31/05/2020" "28/05/2020"
[29] "44081"      "25/06/2020" NA           NA           "27/06/2020" "43926"      "17/05/2020"
[36] NA           "43956"      "20/06/2020" "24/04/2020" "24/03/2020" "22/02/2020" NA          
[43] NA           NA           NA           NA           NA           NA           NA          
[50] NA           NA           NA           "44030"      "43837"      "18/07/2020"

I tried

as.Date(general$dates,"%Y-%m-%d")

which returns

 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[32] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

and even

lapply(general$dates,as.Date,origin = "1970-01-01")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format

Any pointers would be greatly appreciated.

【Question Comments】:

  • What about as.Date(general$dates,"%d/%m/%Y")?
  • @LocoGris I tried that, but it ends up turning some observations into NA.

Tags: r excel dataframe date dplyr


【Solution 1】:

Since there are two formats, we can use grepl to create an index and convert each format separately.

# create an index that flags the numeric-only (serial number) dates
i1 <- grepl('^\\d+$', dates)
dates1 <- as.Date(rep(NA, length(dates)))
# specify the origin: not 1970; assuming the serial numbers come from Excel
# (Windows date system), the conventional origin is 1899-12-30
dates1[i1] <- as.Date(as.numeric(dates[i1]), origin = '1899-12-30')
# convert the remaining day/month/year strings with an explicit format
dates1[!i1] <- as.Date(dates[!i1], "%d/%m/%Y")
dates1
 [1] NA           "2020-03-07" NA           "2020-06-22" "2020-05-23" "2020-06-25" "2020-06-25" "2020-01-07" "2020-06-07"
[10] "2020-02-07" "2020-06-07" "2020-07-18" NA           "2020-06-23" "2020-06-30" "2020-07-21" "2020-07-31" "2020-06-24"
[19] "2020-06-28" "2020-06-17" "2020-06-07" "2020-06-16" NA           "2020-03-06" "2020-06-23" "2020-07-06" "2020-05-31"
[28] "2020-05-28" "2020-09-07" "2020-06-25" NA           NA           "2020-06-27" "2020-04-05" "2020-05-17" NA          
[37] "2020-05-05" "2020-06-20" "2020-04-24" "2020-03-24" "2020-02-22" NA           NA           NA           NA          
[46] NA           NA           NA           NA           NA           NA           NA           "2020-07-18" "2020-01-07"
[55] "2020-07-18"
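
To apply the same two-step idea directly to the data frame column, the logic can be wrapped in a small function. This is only a sketch: parse_mixed_dates is a hypothetical name that does not appear in the question, the origin default assumes the serial numbers come from Excel's Windows date system, and the unlist() call assumes the column may arrive as a list (as the [[1]] in the question's output suggests).

```r
# Hypothetical helper (not from the original answer): converts a vector that
# mixes serial numbers stored as text with d/m/Y strings into Date.
parse_mixed_dates <- function(x, origin = "1899-12-30", format = "%d/%m/%Y") {
  x <- as.character(unlist(x))        # flatten a list column to character
  is_serial <- grepl("^\\d+$", x)     # FALSE for NA and for d/m/Y strings
  out <- as.Date(rep(NA, length(x)))  # start with an all-NA Date vector
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = origin)
  out[!is_serial] <- as.Date(x[!is_serial], format = format)
  out
}

# Small stand-in for the real `general` data frame
general <- data.frame(dates = c(NA, "43897", "23/05/2020", "44030"))
general$dates <- parse_mixed_dates(general$dates)
general$dates
```

If the serial numbers turn out to use a different epoch, only the origin argument needs to change.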

Or, with lubridate and dplyr, it is easier:

library(dplyr)
library(lubridate)
coalesce(as_date(as.numeric(dates)), dmy(dates))

The origin in as_date should be set accordingly, since it defaults to '1970-01-01'.

Data
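
As a sketch of what setting the origin might look like, the call below passes it to as_date explicitly. The value "1899-12-30" is an assumption that the serial numbers were produced by Excel's Windows date system; the short dates vector is a stand-in for the full data, and suppressWarnings and quiet = TRUE are only there to silence the expected coercion and parse messages.

```r
library(dplyr)
library(lubridate)

# Small stand-in for the full `dates` vector
dates <- c(NA, "43897", "23/05/2020")

# d/m/Y strings become NA here; the coercion warning is expected
serial <- suppressWarnings(as.numeric(dates))

# coalesce() keeps the first non-NA value per position, so the serial number
# conversion wins where it exists and dmy() fills in the d/m/Y strings
out <- coalesce(
  as_date(serial, origin = "1899-12-30"),
  dmy(dates, quiet = TRUE)
)
out
```

The order of the arguments matters: putting the serial conversion first means whatever dmy() makes of a bare number such as "43897" is never used.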

dates <- c(NA, "43897", NA, "44004", "23/05/2020", "25/06/2020", "25/06/2020", 
"43837", "43989", "43868", "43989", "18/07/2020", NA, "23/06/2020", 
"30/06/2020", "21/07/2020", "31/07/2020", "24/06/2020", "28/06/2020", 
"17/06/2020", "43989", "16/06/2020", NA, "43896", "23/06/2020", 
"44018", "31/05/2020", "28/05/2020", "44081", "25/06/2020", NA, 
NA, "27/06/2020", "43926", "17/05/2020", NA, "43956", "20/06/2020", 
"24/04/2020", "24/03/2020", "22/02/2020", NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, "44030", "43837", "18/07/2020")

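【Comments】:

  • Hi. It partly works. Some values behave strangely, as shown here: [1] NA "2090-03-09" NA "2090-06-24" "2020-05-23" "2020-06-25" "2020-06-25" [8] "2090-01-08" "2090-06-09" "2090-02-08" "2090-06-09" "2020-07-18" NA It does become a date, but oddly the year is 2090 instead of 2020. Any idea how to fix this?
  • Hi. With the grep approach it works fine. With lubridate it does not: some values behave strangely, with the year converted to 2090 instead of 2020. Any idea why this happens?
  • @dairelix Yes, as I mentioned in the post, origin defaults to '1970-01-01'. You can specify a custom origin, just as with as.Date.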
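
The 70-year shift described in these comments can be reproduced in base R. The snippet is a minimal illustration, assuming the numbers are Excel serial dates with origin 1899-12-30; the variable names are made up for the example.

```r
serial <- 43897

# Counting from the Unix epoch lands in 2090, as seen in the comment above
from_1970 <- as.Date(serial, origin = "1970-01-01")   # "2090-03-09"

# Counting from the assumed Excel origin lands in 2020
from_excel <- as.Date(serial, origin = "1899-12-30")  # "2020-03-07"

# The two origins are 25569 days (roughly 70 years) apart
gap_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
gap_days
```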