【Question title】:Generate sequence of dates for given frequency as per days of occurrence
【Posted】:2021-06-02 06:00:17
【Question description】:

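I am trying to generate a sequence of dates in R (using lubridate) from a given start date, where the frequency is not a numeric interval but the set of weekdays on which a date may occur.

Given the table below, which defines the group, the start date, the day of the week, and an occurrence flag: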

+-------+------------+-----+-----+
| Group | start_date | Day | Y/N |
+-------+------------+-----+-----+
| foo   | 02-06-2021 | Mon |   0 |
| foo   | 02-06-2021 | Tue |   1 |
| foo   | 02-06-2021 | Wed |   0 |
| foo   | 02-06-2021 | Thu |   1 |
| foo   | 02-06-2021 | Fri |   1 |
| foo   | 02-06-2021 | Sat |   1 |
| foo   | 02-06-2021 | Sun |   0 |
| bar   | 02-06-2021 | Mon |   1 |
| bar   | 02-06-2021 | Tue |   0 |
| bar   | 02-06-2021 | Wed |   0 |
| bar   | 02-06-2021 | Thu |   1 |
| bar   | 02-06-2021 | Fri |   1 |
| bar   | 02-06-2021 | Sat |   0 |
| bar   | 02-06-2021 | Sun |   0 |
+-------+------------+-----+-----+

The required output is as follows.

+-------+------------+---------------------+
| Group | given_date | next_available_date |
+-------+------------+---------------------+
| foo   | 02-06-2021 | 03-06-2021          |
| foo   | 04-06-2021 | 04-06-2021          |
| foo   | 06-06-2021 | 08-06-2021          |
| bar   | 02-06-2021 | 03-06-2021          |
| bar   | 05-06-2021 | 07-06-2021          |
+-------+------------+---------------------+

I have some thoughts about using a while loop, which I feel may get tedious.

for each given_date {
  inputdate = given_date
  while (true) {
    if (group == "foo" & day(inputdate) in ('Tue','Thu','Fri','Sat')) {
      next_available_date = inputdate
      break
    } else {
      inputdate = inputdate + (1 day)   # repeat the loop until the if condition is satisfied
    }
  }
}

The if condition may differ from group to group.

I cannot figure out how to use this uneven frequency to get the next available date.
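The loop idea above can be written as runnable R. This is a minimal sketch in base R, not a definitive implementation: the `allowed` lookup and the `next_available()` helper are hypothetical names introduced here, and the weekday numbers (ISO numbering, 1 = Mon to 7 = Sun) are read off the Y/N flags in the table.

```r
# Allowed weekdays per group as ISO weekday numbers (1 = Mon ... 7 = Sun),
# taken from the Y/N flags in the question's table.
allowed <- list(
  foo = c(2, 4, 5, 6),  # Tue, Thu, Fri, Sat
  bar = c(1, 4, 5)      # Mon, Thu, Fri
)

# Hypothetical helper: step forward one day at a time until an allowed weekday is hit.
next_available <- function(group, given_date) {
  ok <- allowed[[group]]
  stopifnot(length(ok) > 0)  # an unknown or empty group would otherwise loop forever
  d <- given_date
  while (!(as.integer(format(d, "%u")) %in% ok)) {
    d <- d + 1               # adding 1 to a Date moves it forward by one day
  }
  d
}

given <- data.frame(
  Group = c("foo", "foo", "foo", "bar", "bar"),
  given_date = as.Date(c("2021-06-02", "2021-06-04", "2021-06-06",
                         "2021-06-02", "2021-06-05")),
  stringsAsFactors = FALSE
)

# Start from a copy of given_date so the new column is already of class Date.
given$next_available_date <- given$given_date
for (i in seq_len(nrow(given))) {
  given$next_available_date[i] <- next_available(given$Group[i], given$given_date[i])
}
given
```

Each date needs at most six steps, so the loop is cheap; the per-group condition lives in the lookup rather than in a chain of if statements.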

【Question comments】:

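  • Can you explain the output? What does Day represent? What is Y/N? Also, how do you calculate given_date and next_available_date?
  • Y/N is the flag on which we decide whether a date can occur. For example, for foo and the given date 02-06-2021, which is a Wednesday, the next available date cannot be Wed because the Y/N column is 0 for Wed; the next available date will be 03-06-2021, which is a Thursday with Y/N = 1. The same applies to 05-06-2021 for bar: since the Y/N flags for Sat and Sun are both 0, the next available date falls on 07-06-2021.
  • Sorry if this is unclear, I am new to Stack Overflow.
  • Yes, the start date for a group will remain the same.
  • Will definitely do that.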

Tags: r datetime dplyr lubridate


【Solution 1】:

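As discussed earlier in the comments, this works on a larger sample. The strategy followed:

  • Since your Day column always starts at Mon, which is not necessarily the weekday of start_date, the rows need to be aligned with that weekday.
  • So the Day field is converted to an ordered factor, so that it can be manipulated as an integer.
  • The data frame is arranged so that each group starts from the weekday of its start_date. Modulo division %% is used for this.
  • Once arranged, the task is much easier. I created seven dates, one per weekday, for each group and each start_date.
  • Filter out the rows where Y/N is 0.
  • Now you only need the top row of each group, hence slice_head().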
df <- data.frame(
  stringsAsFactors = FALSE,
                   Group = c("foo","foo","foo",
                             "foo","foo","foo","foo","foo","foo","foo",
                             "foo","foo","foo","foo","foo","foo","foo",
                             "foo","foo","foo","foo","bar","bar","bar",
                             "bar","bar","bar","bar","bar","bar","bar","bar",
                             "bar","bar","bar"),
              start_date = c("02-06-2021",
                             "02-06-2021","02-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","02-06-2021","04-06-2021",
                             "04-06-2021","04-06-2021","04-06-2021","04-06-2021",
                             "04-06-2021","04-06-2021","06-06-2021","06-06-2021",
                             "06-06-2021","06-06-2021","06-06-2021",
                             "06-06-2021","06-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","02-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","05-06-2021","05-06-2021",
                             "05-06-2021","05-06-2021","05-06-2021","05-06-2021",
                             "05-06-2021"),
                     Day = c("Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun","Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun","Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun","Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun","Mon","Tue","Wed","Thu",
                             "Fri","Sat","Sun"),
                     y_n = c(0L,1L,0L,1L,1L,
                             1L,0L,0L,1L,0L,1L,1L,1L,0L,0L,1L,0L,1L,
                             1L,1L,0L,1L,0L,0L,1L,1L,0L,0L,1L,0L,
                             0L,1L,1L,0L,0L)
      )

library(lubridate)
library(tidyverse)

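df %>% group_by(Group, start_date) %>%
  # ordered factor, so as.numeric(Day) gives Mon = 1, ..., Sun = 7
  mutate(Day = factor(Day, levels = Day, ordered = TRUE)) %>%
  # rotate the rows so each group starts at the weekday of its start_date
  arrange(Group, (as.numeric(Day) + 7 - wday(dmy(start_date), week_start = 1)) %% 7, .by_group = TRUE) %>%
  # attach the seven consecutive dates beginning at start_date
  mutate(next_available_date = dmy(start_date) + 0:6) %>%
  # keep allowed days only, then take the earliest one per group
  filter(y_n != 0) %>%
  slice_head()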
#> # A tibble: 5 x 5
#> # Groups:   Group, start_date [5]
#>   Group start_date Day     y_n next_available_date
#>   <chr> <chr>      <ord> <int> <date>             
#> 1 bar   02-06-2021 Thu       1 2021-06-03         
#> 2 bar   05-06-2021 Mon       1 2021-06-07         
#> 3 foo   02-06-2021 Thu       1 2021-06-03         
#> 4 foo   04-06-2021 Fri       1 2021-06-04         
#> 5 foo   06-06-2021 Tue       1 2021-06-08
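
To see what the sort key inside arrange() does, the modulo arithmetic can be checked on its own. This is a minimal sketch in base R for the start date 02-06-2021; the names `days`, `day_index`, `start_wday`, `offset` and `rotated` are illustrative and not part of the answer, and `format(x, "%u")` is used here as a stand-in for `wday(dmy(start_date), week_start = 1)` (both number Monday as 1).

```r
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
day_index <- seq_along(days)   # 1..7, what as.numeric() returns for the ordered factor

start <- as.Date("02-06-2021", format = "%d-%m-%Y")
start_wday <- as.integer(format(start, "%u"))   # 3, because 02-06-2021 is a Wednesday

# Number of days from start_date to the next occurrence of each weekday.
offset <- (day_index + 7 - start_wday) %% 7
names(offset) <- days
offset
# Mon 5, Tue 6, Wed 0, Thu 1, Fri 2, Sat 3, Sun 4

# Sorting by the offset rotates the week so that it begins on the start weekday.
rotated <- days[order(offset)]
rotated
# "Wed" "Thu" "Fri" "Sat" "Sun" "Mon" "Tue"
```

After this rotation, row k of a group lines up with start_date + (k - 1), which is why adding 0:6 in the next step produces the right date for every weekday.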

With the data as provided in the question:

df <- data.frame(
  stringsAsFactors = FALSE,
                   Group = c("foo","foo","foo",
                             "foo","foo","foo","foo","bar","bar","bar",
                             "bar","bar","bar","bar"),
              start_date = c("02-06-2021",
                             "02-06-2021","02-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","02-06-2021","02-06-2021","02-06-2021",
                             "02-06-2021","02-06-2021"),
                     Day = c("Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun","Mon","Tue","Wed",
                             "Thu","Fri","Sat","Sun"),
                     y_n = c(0L,1L,0L,1L,1L,
                             1L,0L,1L,0L,0L,1L,1L,0L,0L)
      )

library(lubridate)
library(tidyverse)



df %>% group_by(Group, start_date) %>%
  mutate(Day = factor(Day, levels = Day, ordered = TRUE)) %>%
  arrange(Group, (as.numeric(Day) + 7 - wday(dmy(start_date), week_start = 1)) %% 7, .by_group = TRUE) %>%
  mutate(next_available_date = dmy(start_date) + 0:6) %>%
  filter(y_n !=0) %>%
  slice_head()

#> # A tibble: 2 x 5
#> # Groups:   Group, start_date [2]
#>   Group start_date Day     y_n next_available_date
#>   <chr> <chr>      <ord> <int> <date>             
#> 1 bar   02-06-2021 Thu       1 2021-06-03         
#> 2 foo   02-06-2021 Thu       1 2021-06-03

Created on 2021-06-02 by the reprex package (v2.0.0)
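
As a cross-check, the same dates can be obtained without reordering any rows, by computing each weekday's distance from start_date directly and keeping the smallest distance among the allowed days. This is an alternative sketch in base R, not the method used in the answer; the `flags` and `starts` objects are hypothetical helpers that rebuild the larger sample compactly, and `offset` and `result` are illustrative names.

```r
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Rebuild the larger sample: one weekly Y/N pattern per group, repeated per start date.
flags <- list(foo = c(0, 1, 0, 1, 1, 1, 0), bar = c(1, 0, 0, 1, 1, 0, 0))
starts <- data.frame(
  Group = c("foo", "foo", "foo", "bar", "bar"),
  start_date = c("02-06-2021", "04-06-2021", "06-06-2021", "02-06-2021", "05-06-2021"),
  stringsAsFactors = FALSE
)
df <- data.frame(
  Group = rep(starts$Group, each = 7),
  start_date = rep(starts$start_date, each = 7),
  Day = rep(days, times = nrow(starts)),
  stringsAsFactors = FALSE
)
df$y_n <- unlist(flags[starts$Group], use.names = FALSE)

# Days from start_date to each weekday; %% in R always returns a value in 0..6 here.
df$start <- as.Date(df$start_date, format = "%d-%m-%Y")
df$offset <- (match(df$Day, days) - as.integer(format(df$start, "%u"))) %% 7
df$next_available_date <- df$start + df$offset

# Keep allowed days, sort by distance, and take the first row per Group/start_date.
ok <- df[df$y_n == 1, ]
ok <- ok[order(ok$Group, ok$start, ok$offset), ]
result <- ok[!duplicated(ok[c("Group", "start_date")]),
             c("Group", "start_date", "next_available_date")]
result
```

The five resulting dates should agree with the tibble printed for the larger sample above.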

【Comments】:

  • I am not sure how the code works; if you could explain it briefly, I will explore it further.