【问题标题】:Identify groups of following dates in a dataframe识别数据框中的以下日期组
【发布时间】:2020-03-03 13:01:12
【问题描述】:

我有一个这样的df:

    mydate daynight myvar
195 2018-11-21  1   64
196 2018-11-21  2    4
197 2018-11-22  1    7
198 2018-11-22  2    2
199 2018-11-23  2    0
200 2018-11-24  2    0
201 2018-11-25  2   64
202 2018-11-26  2    6
203 2018-11-27  2    0
204 2018-11-28  2    0
205 2018-11-29  2   20
206 2018-11-30  2    4
207 2018-12-01  1   64
214 2018-12-04  2    1
215 2018-12-05  1    4
216 2018-12-05  2    0
217 2018-12-08  2   30
218 2018-12-09  2   15
219 2018-12-10  2    0
220 2018-12-11  2   14
221 2018-12-12  1   28

我正在尝试在daynight 列中识别至少有 3 个跟随 2 的组,并且我想知道这些组的开始和结束日期。有了那个df,它会是:

     beg          end         
[1,] "2018-11-22" "2018-11-30"
[2,] "2018-12-05" "2018-12-11"

【问题讨论】:

    标签: r date datetime


    【解决方案1】:

    示例数据:

    df = read.table(text = "
    mydate daynight myvar
    195 2018-11-21  1   64
    196 2018-11-21  2    4
    197 2018-11-22  1    7
    198 2018-11-22  2    2
    199 2018-11-23  2    0
    200 2018-11-24  2    0
    201 2018-11-25  2   64
    202 2018-11-26  2    6
    203 2018-11-27  2    0
    204 2018-11-28  2    0
    205 2018-11-29  2   20
    206 2018-11-30  2    4
    207 2018-12-01  1   64
    214 2018-12-04  2    1
    215 2018-12-05  1    4
    216 2018-12-05  2    0
    217 2018-12-08  2   30
    218 2018-12-09  2   15
    219 2018-12-10  2    0
    220 2018-12-11  2   14
    221 2018-12-12  1   28
    ", header=T)
    

    一种方法是这样的:

    library(dplyr)
    library(data.table)
    
      df %>%
      group_by(group = rleid(daynight)) %>%  # group by consequtive daynight values
      summarise(val = unique(daynight),      # get the daynight value
                Count = n(),                 # count rows of that value
                Beg = first(mydate),         # get first date
                End = last(mydate)) %>%      # get last date
      filter(Count > 2 & val == 2) %>%       # keep only cases where you have 3+ daylight = 2
      select(-group, -val)                   # remove unecessary columns
    
    # # A tibble: 2 x 3
    #    Count Beg        End       
    #    <int> <fct>      <fct>     
    # 1     9 2018-11-22 2018-11-30
    # 2     5 2018-12-05 2018-12-11
    

    请注意,此方法假定您的日期列已排序(如您的示例中所示)。

    【讨论】:

      【解决方案2】:

      这是一个基本的 R 想法,

      i1 <- cumsum(c(TRUE, diff(d2$daynight) != 0))
      
      do.call(rbind, 
       Filter(function(x)!is.null(x), 
                tapply(d2$mydate, i1, FUN = function(i)if(length(i) >= 3){c(i[1], i[length(i)])})))
      

      给出,

               [,1]         [,2]        
      4 "2018-11-22" "2018-11-30"
      8 "2018-12-05" "2018-12-11"
      

      数据

      dput(d2)
      structure(list(mydate = c("2018-11-21", "2018-11-21", "2018-11-22", 
      "2018-11-22", "2018-11-23", "2018-11-24", "2018-11-25", "2018-11-26", 
      "2018-11-27", "2018-11-28", "2018-11-29", "2018-11-30", "2018-12-01", 
      "2018-12-04", "2018-12-05", "2018-12-05", "2018-12-08", "2018-12-09", 
      "2018-12-10", "2018-12-11", "2018-12-12"), daynight = c(1L, 2L, 
      1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 
      2L, 2L, 1L), myvar = c(64L, 4L, 7L, 2L, 0L, 0L, 64L, 6L, 0L, 
      0L, 20L, 4L, 64L, 1L, 4L, 0L, 30L, 15L, 0L, 14L, 28L)), row.names = c("195", 
      "196", "197", "198", "199", "200", "201", "202", "203", "204", 
      "205", "206", "207", "214", "215", "216", "217", "218", "219", 
      "220", "221"), class = "data.frame")
      

      【讨论】:

        猜你喜欢
        • 1970-01-01
        • 2021-06-12
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 2021-12-09
        • 2017-03-22
        • 2021-06-14
        相关资源
        最近更新 更多