【Question title】: Lubridate: How to subtract the last observation of a month [duplicate]
【Posted】: 2016-08-22 16:31:38
【Question】:

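I have a time series and want to extract the last observation of each month. The question is not about generating a new time series, but about finding the last observation of each month in an existing one. The last observation may not fall on the last day of the month. Here is a small example:

library(lubridate)
date <- c(ymd(20010129, 20010228, 20010330, 20010429), ymd(20010501) + days(1:90))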

# "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29" "2001-05-02" "2001-05-03" "2001-05-04" "2001-05-05"
# "2001-05-06" "2001-05-07" "2001-05-08" "2001-05-09" "2001-05-10" "2001-05-11" "2001-05-12" "2001-05-13"
# "2001-05-14" "2001-05-15" "2001-05-16" "2001-05-17" "2001-05-18" "2001-05-19" "2001-05-20" "2001-05-21"
# "2001-05-22" "2001-05-23" "2001-05-24" "2001-05-25" "2001-05-26" "2001-05-27" "2001-05-28" "2001-05-29"
# "2001-05-30" "2001-05-31" "2001-06-01" "2001-06-02" "2001-06-03" "2001-06-04" "2001-06-05" "2001-06-06"
# "2001-06-07" "2001-06-08" "2001-06-09" "2001-06-10" "2001-06-11" "2001-06-12" "2001-06-13" "2001-06-14"
# "2001-06-15" "2001-06-16" "2001-06-17" "2001-06-18" "2001-06-19" "2001-06-20" "2001-06-21" "2001-06-22"
# "2001-06-23" "2001-06-24" "2001-06-25" "2001-06-26" "2001-06-27" "2001-06-28" "2001-06-29" "2001-06-30"
# "2001-07-01" "2001-07-02" "2001-07-03" "2001-07-04" "2001-07-05" "2001-07-06" "2001-07-07" "2001-07-08"
# "2001-07-09" "2001-07-10" "2001-07-11" "2001-07-12" "2001-07-13" "2001-07-14" "2001-07-15" "2001-07-16"
# "2001-07-17" "2001-07-18" "2001-07-19" "2001-07-20" "2001-07-21" "2001-07-22" "2001-07-23" "2001-07-24"
# "2001-07-25" "2001-07-26" "2001-07-27" "2001-07-28" "2001-07-29" "2001-07-30"

I want to keep the observations "2001-01-29", "2001-02-28", "2001-03-30", "2001-04-29", "2001-05-31", "2001-06-30", and "2001-07-30". Is there a way to do this?

【Comments】:

    Tags: r time-series lubridate


    【Solution 1】:

    You can group the dates by month and take the maximum of each group:

    library(lubridate)
    unique(ave(date, month(date), FUN = max))
    
    # [1] "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29"
    # [5] "2001-05-31" "2001-06-30" "2001-07-30"
    

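    【Comments】:

    • But my data set has observations spanning several years, and your answer returns only 12 observations. I need the last observation of every month in every year. Thanks again!
    • You can add year(date) as a grouping variable. Something like: unique(ave(date, month(date), year(date), FUN = max))
    • If you are going to call unique on the result anyway, I guess tapply makes more sense than ave.
    • @Frank tapply seems to drop the class of the original data. But as.Date(tapply(date, month(date), FUN = max)) is still a good option.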
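
The point raised in the comments, that grouping by month alone collapses the same month across years, can be illustrated with a minimal base R sketch. The two-year `dates` vector is made up for illustration, and `format()` stands in for lubridate's `month()` and `year()` so the snippet has no package dependencies.

```r
# Hypothetical two-year series: every 10th day starting 2001-01-05.
dates <- seq(as.Date("2001-01-05"), as.Date("2002-12-31"), by = "10 days")

# Grouping by month alone merges January 2001 with January 2002.
by_month <- unique(ave(dates, format(dates, "%m"), FUN = max))

# Grouping by year and month keeps one result per calendar month.
by_year_month <- unique(ave(dates, format(dates, "%Y"), format(dates, "%m"),
                            FUN = max))

length(by_month)       # 12
length(by_year_month)  # 24
```

With the month-only grouping, every 2001 observation is replaced by the 2002 maximum for the same month, which is why only 12 values survive.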
    【Solution 2】:

    We can use data.table. Convert the 'date' vector to a data.table, group by the year and month of 'date', and take the max of 'date' within each group:

    library(data.table)
    as.data.table(date)[, .(Date = max(date)), .(Year = year(date), Month = month(date))]
    #   Year Month       Date
    #1: 2001     1 2001-01-29
    #2: 2001     2 2001-02-28
    #3: 2001     3 2001-03-30
    #4: 2001     4 2001-04-29
    #5: 2001     5 2001-05-31
    #6: 2001     6 2001-06-30
    #7: 2001     7 2001-07-30
    

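    Or use base R with a simple tapply-based approach, which returns one value per group instead of building a vector as long as the original and then calling unique on it: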

    do.call("c", tapply(date, list(month(date), year(date)), 
                    FUN = function(x) list(max(x))))
    #[1] "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29" "2001-05-31" 
    #[6] "2001-06-30" "2001-07-30"
    

    Or, more compactly:

     unname(as.Date(tapply(date, substr(date, 1,7), FUN = max), origin = "1970-01-01"))
     #[1] "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29" "2001-05-31" 
     #[6] "2001-06-30" "2001-07-30"
    

    We can also get the output without any grouping by comparing adjacent elements (assuming the dates are ordered), which should be very efficient:

    v1 <- substr(date, 1, 7)
    date[c(v1[-1]!= v1[-length(v1)], TRUE)]
    #[1] "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29" "2001-05-31" 
    #[6] "2001-06-30" "2001-07-30"
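
Because the adjacent-element comparison relies on ordered input, a small wrapper that sorts first may be safer for general use. The following is a sketch under that assumption: `last_obs_per_month` and the sample `dates` are hypothetical names introduced here, and `format(x, "%Y-%m")` is used in place of `substr()` to build the year-month key.

```r
# Hypothetical unsorted input spanning two years.
dates <- as.Date(c("2002-01-15", "2001-01-29", "2001-01-03",
                   "2001-02-28", "2002-01-02"))

last_obs_per_month <- function(x) {
  x <- sort(x)                 # the comparison below requires ordered dates
  key <- format(x, "%Y-%m")    # year-month label, e.g. "2001-01"
  # An element is the last of its month when the next key differs;
  # the final element is always the last of its month.
  x[c(key[-1] != key[-length(key)], TRUE)]
}

last_obs_per_month(dates)
# [1] "2001-01-29" "2001-02-28" "2002-01-15"
```

Including the year in the key means the same function handles multi-year series without further changes.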
    

    Benchmarks

    date1 <- c(ymd(20010129, 20010228, 20010330, 20010429), ymd(20010501) + days(1:1e6))
    system.time(as.data.table(date1)[, .(Date = max(date1)), 
          .(Year = year(date1), Month = month(date1))])
    #   user  system elapsed 
    #   5.53    0.05    5.58  
    
    
    system.time({
     v1 <- substr(date1, 1, 7)
     date1[c(v1[-1]!= v1[-length(v1)], TRUE)]
    })
    # user  system elapsed 
    #  10.25    0.23   10.49 
    

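    Based on the timings above, the data.table approach is the fastest (about 5.6 s elapsed), and the base R comparison of adjacent elements is not far behind (about 10.5 s). All that glitters is not gold, though: the ave approach is far slower on the same data.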

    system.time(unique(ave(date1, year(date1), month(date1), FUN = max)))
    #   user  system elapsed 
    # 242.35  120.80  364.55 
    

    【Comments】:

      【Solution 3】:

      endpoints, a function in the xts package, does exactly what its name implies:

      library(xts)
      date[endpoints(date, on = "months")]
      # [1] "2001-01-29" "2001-02-28" "2001-03-30" "2001-04-29" "2001-05-31"
      # [6] "2001-06-30" "2001-07-30"
      

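      Valid values for the on argument include: "us" (microseconds), "microseconds", "ms" (milliseconds), "milliseconds", "secs" (seconds), "seconds", "mins" (minutes), "minutes", "hours", "days", "weeks", "months", "quarters", and "years".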
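
As a rough illustration of a different `on` value, the sketch below asks for quarter ends instead of month ends. It assumes the xts package is installed; the daily `dates` vector is made up for illustration. Note that `endpoints` returns index positions beginning with 0, and indexing with 0 selects nothing, so the result can be used directly as a subscript.

```r
library(xts)

# Hypothetical daily series covering all of 2001.
dates <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")

# Positions of the last observation in each quarter, preceded by 0.
idx <- endpoints(dates, on = "quarters")
idx
# [1]   0  90 181 273 365

dates[idx]
# [1] "2001-03-31" "2001-06-30" "2001-09-30" "2001-12-31"
```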

      【Comments】:
