How to use lubridate as_datetime function in combination with dplyr mutate and case_when functions? Answers

【Question title】: How to use lubridate as_datetime function in combination with dplyr mutate and case_when functions?
【Posted】: 2019-08-16 13:00:54
【Question】:

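I am trying to manipulate a dttm variable to adjust for different time zones based on a numeric id vector. When the new values are character vectors, I can manipulate the variable based on id without any trouble. However, when I use the as_datetime() function to create the new values, every row receives the result of the first branch in case_when.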

The id vector is numeric; I have also tried converting its class to factor and to character. The problem persists.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df1 <- tibble(
  id = c(1, 2, 3),
  date_time = rep(as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich"), 3)
) %>% 
  mutate(
    date_time2 = case_when(
      id == 1 ~ "one",
      id == 2 ~ "two",
      TRUE ~ "three"
    )
  )


df2 <- tibble(
  id = c(1, 2, 3),
  date_time = rep(as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich"), 3)
) %>% 
  mutate(
    date_time2 = case_when(
      id == 1 ~ as_datetime(date_time, tz = "America/New_York"),
      id == 2 ~ as_datetime(date_time, tz = "Asia/Kolkata"),
      TRUE ~ date_time
    )
  )

df3 <- tibble(
  id = c(1, 2, 3),
  date_time = rep(as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich"), 3)
) %>% 
  mutate(
    date_time2 = case_when(
      id == 1 ~ as_datetime(date_time, tz = "Asia/Kolkata"),
      id == 2 ~ as_datetime(date_time, tz = "America/New_York"),
      TRUE ~ date_time
    )
  )


df1 
#> # A tibble: 3 x 3
#>      id date_time           date_time2
#>   <dbl> <dttm>              <chr>     
#> 1     1 2018-01-01 12:34:56 one       
#> 2     2 2018-01-01 12:34:56 two       
#> 3     3 2018-01-01 12:34:56 three

df2
#> # A tibble: 3 x 3
#>      id date_time           date_time2         
#>   <dbl> <dttm>              <dttm>             
#> 1     1 2018-01-01 12:34:56 2018-01-01 06:34:56
#> 2     2 2018-01-01 12:34:56 2018-01-01 06:34:56
#> 3     3 2018-01-01 12:34:56 2018-01-01 06:34:56

df3
#> # A tibble: 3 x 3
#>      id date_time           date_time2         
#>   <dbl> <dttm>              <dttm>             
#> 1     1 2018-01-01 12:34:56 2018-01-01 17:04:56
#> 2     2 2018-01-01 12:34:56 2018-01-01 17:04:56
#> 3     3 2018-01-01 12:34:56 2018-01-01 17:04:56

Created on 2019-03-26 by the reprex package (v0.2.1)

df1 shows what I expect.

In df2, I expect date_time2 for id == 2 to show "2018-01-01 17:04:56" instead of "2018-01-01 06:34:56".

In df3, I expect date_time2 for id == 3 to show "2018-01-01 12:34:56" instead of "2018-01-01 17:04:56".

【Question discussion】:

Tags: r dplyr lubridate

【Solution 1】:

This looks like a bug (possibly in dplyr, which has had problems with dates before).

Here is one possible workaround (don't ask me why it works :))

tibble(
  id = c(1, 2, 3),
  date_time = rep(as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich"), 3)
) %>% 
  mutate(
    date_time2 = case_when(
      id == 1 ~ as_datetime(as.character(as_datetime(date_time, tz = "America/New_York"))),
      id == 2 ~ as_datetime(as.character(as_datetime(date_time, tz = "Asia/Kolkata"))),
      TRUE ~  as_datetime(as.character(date_time))

    )
  )

# A tibble: 3 x 3
     id date_time           date_time2         
  <dbl> <dttm>              <dttm>             
1     1 2018-01-01 12:34:56 2018-01-01 06:34:56
2     2 2018-01-01 12:34:56 2018-01-01 17:04:56
3     3 2018-01-01 12:34:56 2018-01-01 12:34:56
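
A possible explanation for why the workaround behaves differently, offered as a sketch rather than a confirmed diagnosis: a date-time vector stores instants plus one time zone attribute for the whole vector, and calling as_datetime() on an existing date-time only changes the zone used for display. Every branch of the original case_when therefore holds the same instant, and only one display zone can survive in the combined column. Converting each branch to text and parsing it again turns the clock readings into genuinely different instants, all labelled UTC. The variable names below (x, ny, kol, reparsed) are illustrative, and format() stands in for as.character().

```r
library(lubridate)

x   <- as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich")
ny  <- as_datetime(x, tz = "America/New_York")
kol <- as_datetime(x, tz = "Asia/Kolkata")

# Same instant underneath: only the display time zone differs.
as.numeric(ny) == as.numeric(kol)
#> [1] TRUE

# The printed clock readings do differ.
format(ny)
#> [1] "2018-01-01 06:34:56"
format(kol)
#> [1] "2018-01-01 17:04:56"

# Re-parsing the text yields three distinct instants that share one zone (UTC),
# so they can sit side by side in a single column.
reparsed <- as_datetime(c(format(ny), format(kol), format(x)))
tz(reparsed)
#> [1] "UTC"
```

Keep in mind that the resulting column is labelled UTC even though each value is a local clock reading, so it is suitable for display but not for further time arithmetic across rows.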

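【Discussion】:

【Solution 2】:

We can use force_tzs from the lubridate package and supply the different time zones through its tzones argument. In this case case_when is not needed, provided you know the order of the time zones.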

library(dplyr)
library(lubridate)

df2 %>%
  mutate(date_time2 = force_tzs(date_time, tzones = c("America/New_York", "Asia/Kolkata", "UTC")))
# # A tibble: 3 x 3
#      id date_time           date_time2         
#   <dbl> <dttm>              <dttm>             
# 1     1 2018-01-01 12:34:56 2018-01-01 17:34:56
# 2     2 2018-01-01 12:34:56 2018-01-01 07:04:56
# 3     3 2018-01-01 12:34:56 2018-01-01 12:34:56

df3 %>%
  mutate(date_time2 = force_tzs(date_time, tzones = c("Asia/Kolkata", "America/New_York", "UTC")))
# # A tibble: 3 x 3
#      id date_time           date_time2         
#   <dbl> <dttm>              <dttm>             
# 1     1 2018-01-01 12:34:56 2018-01-01 07:04:56
# 2     2 2018-01-01 12:34:56 2018-01-01 17:34:56
# 3     3 2018-01-01 12:34:56 2018-01-01 12:34:56
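
The values above differ from the ones requested in the question because force_tzs() and with_tz() do different things. As a minimal sketch, assuming the same starting value as in the reprex: with_tz() keeps the instant and changes the clock it is shown on, while force_tzs() keeps the clock reading, treats it as local time in the given zone, and returns the result in UTC (its default tzone_out).

```r
library(lubridate)

x <- as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich")

# with_tz(): same instant, shown on a Kolkata clock.
shown_in_kolkata <- with_tz(x, "Asia/Kolkata")
format(shown_in_kolkata)
#> [1] "2018-01-01 17:04:56"

# force_tzs(): read 12:34:56 as Kolkata local time, return it in UTC.
forced_from_kolkata <- force_tzs(x, tzones = "Asia/Kolkata")
format(forced_from_kolkata)
#> [1] "2018-01-01 07:04:56"
```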

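【Discussion】:

You could index the vector of time zones with id in case there are more values than in the reprex. Do you know why you get 07:04:56 for Kolkata rather than 06:34:56? Is this some kind of daylight saving issue?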
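Thanks. As @kath suggested, I have several hundred ids, and indexing is helpful in that case. However, the other suggested solution also works very well, even though I have a hard time understanding why. India's time zone is 4:30 ahead of Central European Time in winter and 3:30 ahead in summer.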
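
The indexing idea from the comments could look roughly like this. It is a sketch under assumptions: tz_by_id is a hypothetical lookup vector whose position i holds the time zone for id i, and the ids here are deliberately out of order to show that the row order no longer matters.

```r
library(dplyr)
library(lubridate)

# Hypothetical lookup: position i holds the time zone for id i.
tz_by_id <- c("America/New_York", "Asia/Kolkata", "UTC")

df <- tibble(
  id = c(3, 1, 2, 1),
  date_time = rep(as_datetime("2018-01-01 12:34:56", tz = "Europe/Zurich"), 4)
)

# tz_by_id[id] expands the lookup to one time zone per row.
res <- df %>%
  mutate(date_time2 = force_tzs(date_time, tzones = tz_by_id[id]))

format(res$date_time2)
#> [1] "2018-01-01 12:34:56" "2018-01-01 17:34:56" "2018-01-01 07:04:56"
#> [4] "2018-01-01 17:34:56"
```

If the ids are not consecutive integers starting at 1, a named vector indexed with as.character(id) is probably the safer lookup.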