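[Question title]: Occurrence of a certain date is more than x times, get next available date
[Posted]: 2021-04-01 11:18:14
[Question description]:

I have a data frame with 15 columns: 1 column is the participant ID, and 14 columns hold each participant's possible appointment dates (holidays and weekends excluded):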

  Included.Participant         V1         V2         V3         V4         V5   V6   V7         V8         V9        V10        V11        V12  V13  V14
1                    1 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
2                    2 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
3                    3 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
4                    4 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
5                    5 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
6                    6 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>

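Each date can take at most 3 participants. Once a date has reached that maximum of 3, the fourth participant should move on to their next available date. So in this example, the desired output is:

  Included.Participant         V1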
1                    1 2021-03-22
2                    2 2021-03-22
3                    3 2021-03-22
4                    4 2021-03-23
5                    5 2021-03-23
6                    6 2021-03-23

If there is no possible date, the V1 column can be left blank.
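
The rule described above (at most three participants per date, otherwise fall through to the participant's next possible date) can be sketched as a greedy loop in base R. This is only one possible reading of the requirement: participants are processed in row order, each takes the earliest of their own dates that still has a free slot, and the column a date sits in is ignored. The helper name `assign_dates()`, its arguments, and the toy data are made up for this illustration.

```r
# Hypothetical helper (not from the original post): greedy assignment of
# one date per participant, with at most `capacity` participants per date.
assign_dates <- function(df, id_col = "Included.Participant", capacity = 3L) {
  date_cols <- setdiff(names(df), id_col)
  # Dates as numeric day counts, one matrix row per participant
  days <- matrix(unlist(lapply(df[date_cols], as.numeric)), nrow = nrow(df))
  taken <- integer(0)                      # named counter: day -> slots used
  picked <- rep(NA_real_, nrow(df))        # stays NA if no date is free
  for (i in seq_len(nrow(df))) {
    candidates <- sort(unique(days[i, ]))  # sort() drops NA, earliest first
    for (d in candidates) {
      key <- as.character(d)
      n <- if (key %in% names(taken)) taken[[key]] else 0L
      if (n < capacity) {                  # free slot: book it and stop
        taken[key] <- n + 1L
        picked[i] <- d
        break
      }
    }
  }
  out <- df[id_col]                        # keep the original IDs
  out$V1 <- as.Date(picked, origin = "1970-01-01")
  out
}

# Toy data shaped like the example: 6 participants, identical possible dates
dates <- as.Date(c("2021-03-22", "2021-03-23", "2021-03-24"))
df <- data.frame(Included.Participant = 1:6)
for (j in seq_along(dates)) df[[paste0("V", j)]] <- rep(dates[j], 6)
df$V4 <- as.Date(NA)                       # an empty column, as on weekends

result <- assign_dates(df)
print(result)
```

On this toy input, participants 1 to 3 get 2021-03-22 and participants 4 to 6 get 2021-03-23, which matches the desired output shown above. A participant whose dates are all NA or all full keeps NA in V1.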

I can't seem to figure out how to get the desired output. I really hope you can help.

Thank you very much!

Input:

structure(list(Included.y = c(1L, 2L, 3L, 4L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L), 
    V1 = structure(c(18870, NA, 18848, NA, NA, NA, NA, NA, 18806, 
    18799, 18835, 18841, NA, NA, 18912, 18954, NA, 18842, NA, 
    NA), class = "Date"), V2 = structure(c(18871, NA, 18849, 
    NA, NA, NA, NA, 18876, 18807, 18800, 18836, 18842, NA, NA, 
    18913, 18955, NA, 18843, NA, NA), class = "Date"), V3 = structure(c(18872, 
    18904, 18850, 18897, 18967, NA, 18883, 18877, 18808, 18801, 
    18837, 18843, 18890, NA, 18914, 18956, 18953, 18844, NA, 
    18869), class = "Date"), V4 = structure(c(NA, 18905, 18851, 
    18898, 18968, 18953, 18884, 18878, 18809, 18802, NA, 18844, 
    18891, 18967, NA, NA, 18954, NA, 18925, 18870), class = "Date"), 
    V5 = structure(c(NA, 18906, NA, 18899, 18969, 18954, 18885, 
    18879, NA, NA, NA, NA, 18892, 18968, NA, NA, 18955, NA, 18926, 
    18871), class = "Date"), V6 = structure(c(NA, 18907, NA, 
    18900, 18970, 18955, 18886, NA, NA, NA, NA, NA, 18893, 18969, 
    NA, NA, 18956, NA, 18927, 18872), class = "Date"), V7 = structure(c(18876, 
    NA, NA, NA, NA, 18956, NA, NA, NA, NA, 18841, NA, NA, 18970, 
    18918, 18960, NA, 18848, 18928, NA), class = "Date"), V8 = structure(c(18877, 
    NA, 18855, NA, NA, NA, NA, NA, 18813, 18806, 18842, 18848, 
    NA, NA, 18919, 18961, NA, 18849, NA, NA), class = "Date"), 
    V9 = structure(c(18878, NA, 18856, NA, NA, NA, NA, 18883, 
    18814, 18807, 18843, 18849, NA, NA, 18920, 18962, NA, 18850, 
    NA, NA), class = "Date"), V10 = structure(c(18879, 18911, 
    18857, 18904, 18974, NA, 18890, 18884, 18815, 18808, 18844, 
    18850, 18897, NA, 18921, 18963, 18960, 18851, NA, 18876), class = "Date"), 
    V11 = structure(c(NA, 18912, 18858, 18905, 18975, 18960, 
    18891, 18885, 18816, 18809, NA, 18851, 18898, 18974, NA, 
    NA, 18961, NA, 18932, 18877), class = "Date"), V12 = structure(c(NA, 
    18913, NA, 18906, 18976, 18961, 18892, 18886, NA, NA, NA, 
    NA, 18899, 18975, NA, NA, 18962, NA, 18933, 18878), class = "Date"), 
    V13 = structure(c(NA, 18914, NA, 18907, 18977, 18962, 18893, 
    NA, NA, NA, NA, NA, 18900, 18976, NA, NA, 18963, NA, 18934, 
    18879), class = "Date"), V14 = structure(c(18883, NA, NA, 
    NA, NA, 18963, NA, NA, NA, NA, 18848, NA, NA, 18977, 18925, 
    18967, NA, 18855, 18935, NA), class = "Date")), row.names = c(NA, 
20L), class = "data.frame")

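[Comments]:

  • It would really help with testing and verifying solutions if you could paste the data into the question as an object: use dput(your_dataframe)
  • Thanks for the tip, I have edited my question to include the dput output
  • Once a date has been filled by three participants, is that same date in another column still available to other participants? For example, 2021-02-23 in V1 and 2021-02-23 in V2.
  • See my comment below your answer. I hope that clears it up
  • Hi, these dates are not preference-based. They are all of the possible appointment dates for that participant, so we do not need to consider which column a date is in. Hope this helps

Tags: r dataframe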


[Solution 1]:

Code updated after the clarification about assigning participants to the same date in different columns:

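library(dplyr)
library(tidyr)
library(tibble)

df1 <-
  df %>%
  # one row per (participant, column, date)
  pivot_longer(-Included.Participant) %>%
  # note: this drops the original participant IDs
  select(-Included.Participant) %>%
  mutate(name = factor(name, levels = paste0("V", 1:14), ordered = TRUE)) %>%
  group_by(value) %>%
  arrange(value, name) %>%
  # keep at most 3 rows (slots) per date
  slice_head(n = 3) %>%
  # drop the grouping before numbering and reshaping
  ungroup() %>%
  # new sequential numbers, stored under the old ID column name
  rowid_to_column(var = "Included.Participant") %>%
  filter(Included.Participant <= 20) %>%
  pivot_wider(names_from = name, values_from = value)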

  • Output
head(df1, 10)
#> # A tibble: 10 x 5
#>    Included.Participant V1         V2         V3         V4        
#>                   <int> <date>     <date>     <date>     <date>    
#>  1                    1 2021-03-22 NA         NA         NA        
#>  2                    2 2021-03-22 NA         NA         NA        
#>  3                    3 2021-03-22 NA         NA         NA        
#>  4                    4 2021-03-23 NA         NA         NA        
#>  5                    5 2021-03-23 NA         NA         NA        
#>  6                    6 2021-03-23 NA         NA         NA        
#>  7                    7 NA         2021-03-24 NA         NA        
#>  8                    8 NA         2021-03-24 NA         NA        
#>  9                    9 NA         2021-03-24 NA         NA        
#> 10                   10 NA         NA         2021-03-25 NA

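[Discussion]:

  • Hi Peter, thanks for your help. For the first 6 participants this looks like the output I want, but from participant 7 onward the next possible date is not selected. In row 7, the next available date is 2021-03-24. Also, if the date 2021-03-23 in the example has already had its 3 "slots" filled (participants 4-6), that date should no longer be available. I have added the dput of my exact output to the original post.
  • Code modified to reflect the clarification.
  • Hi Peter, thank you so much! This is exactly what I needed :D You made my day
  • Hi Peter, I just noticed that the ID column has been replaced with row numbers. Is there any way to keep the ID column unchanged?
  • Which column is the ID column? No column in the original data frame is called "ID". Do you mean the column Included.Participant? Its values are integers and are, in effect, row numbers.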
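
On the ID question: in the pipeline above, select(-Included.Participant) drops the original IDs and rowid_to_column() then writes fresh row numbers under the same name. One possible way to keep the IDs is to carry them through the long format and book the slots there. The sketch below does this in base R so that it runs without extra packages. The toy data, the dates, and the names `long`, `booked`, and `done` are made up for illustration, and the ID column is called Included.y as in the posted dput.

```r
# Toy data with non-consecutive IDs (illustrative values only)
df <- data.frame(
  Included.y = c(1L, 2L, 7L, 8L, 12L),
  V1 = as.Date(c("2021-06-28", "2021-06-28", "2021-06-28", "2021-06-28", NA)),
  V2 = as.Date(c("2021-06-29", NA, "2021-06-29", "2021-06-29", NA))
)
capacity <- 3L

# Wide -> long: one row per (ID, possible date), NA dates dropped
date_cols <- setdiff(names(df), "Included.y")
long <- do.call(rbind, lapply(date_cols, function(col) {
  data.frame(id = df$Included.y, date = df[[col]])
}))
long <- long[!is.na(long$date), ]
# Original participant order first, then earliest date within a participant
long <- long[order(match(long$id, df$Included.y), long$date), ]

# Walk the long table once: accept a row if the participant is still
# unassigned and the date has a free slot
keep <- logical(nrow(long))
booked <- integer(0)       # named counter: date -> participants booked
done <- character(0)       # IDs that already have a date
for (r in seq_len(nrow(long))) {
  id <- as.character(long$id[r])
  key <- as.character(long$date[r])
  n <- if (key %in% names(booked)) booked[[key]] else 0L
  if (!(id %in% done) && n < capacity) {
    keep[r] <- TRUE
    booked[key] <- n + 1L
    done <- c(done, id)
  }
}

# Match back on the original IDs; participants without a free date get NA
assigned <- long[keep, ]
result <- data.frame(
  Included.y = df$Included.y,
  V1 = assigned$date[match(df$Included.y, assigned$id)]
)
print(result)
```

Here IDs 1, 2 and 7 take 2021-06-28, ID 8 finds that date full and moves to 2021-06-29, and ID 12 has no possible date and stays NA, while the Included.y values are left untouched.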