【Question title】: Find earliest interval start date and extract interval (R, lubridate)
【Posted】: 2019-05-22 13:29:47
【Question description】:

I have a data frame with four interval columns (S4 class `Interval`), as shown below:

 id  int_a           int_b             int_c              int_d
 1   2013--2015      2011--2012        NA--NA             2014--2014

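For each row, I need to sort the intervals by their start date, which can be extracted with int_start(). The interval with the earliest start (or its length) should then be stored as a new variable in the dataset, e.g. first_int, and the same repeated for the second, third and fourth intervals.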

The expected output is:

id  .. first_int      sec_int          third_int          fourth_int
 1  .. 2011--2012     2013--2015       2014--2014         NA--NA

I have added a chunk of my dataset below:

library(lubridate)
so <- structure(
  list(
    int_a = new("Interval",
                .Data = c(24192000, 52704000, 0, 64022400, NA, NA, NA, 0, NA, NA),
                start = structure(c(1286841600, 1327276800, 1157068800, 1370995200, NA, NA, NA, 1296172800, NA, NA),
                                  class = c("POSIXct", "POSIXt"), tzone = "UTC"),
                tzone = "UTC"),
    int_b = new("Interval",
                .Data = c(NA, 2505600, NA, NA, 53222400, 7862400, NA, NA, 0, 116812800),
                start = structure(c(NA, 1402531200, NA, NA, 1397433600, 1307577600, NA, NA, 1366329600, 1320278400),
                                  class = c("POSIXct", "POSIXt"), tzone = "UTC"),
                tzone = "UTC"),
    int_c = new("Interval",
                .Data = c(NA, NA, 19353600, NA, NA, 41472000, 0, NA, NA, NA),
                start = structure(c(NA, NA, 1287446400, NA, NA, 1238025600, 1433203200, NA, NA, NA),
                                  class = c("POSIXct", "POSIXt"), tzone = "UTC"),
                tzone = "UTC"),
    int_d = new("Interval",
                .Data = c(3024000, 9331200, NA, 0, 8899200, 36374400, 0, 3196800, 18748800, 28771200),
                start = structure(c(1316044800, 1396828800, NA, 1466640000, 1457568000, 1290038400, 1444694400, 1321315200, 1381968000, 1438300800),
                                  class = c("POSIXct", "POSIXt"), tzone = "UTC"),
                tzone = "UTC")),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))

Created on 2019-05-22 by the reprex package (v0.3.0)

Thanks a lot!

【Question comments】:

    Tags: r intervals lubridate


    【Solution 1】:

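    I'm not entirely sure what you are looking for here, but I think this may solve your problem. I was a bit confused by this part of your post:

    I can sort by start date, but I can't sort by one feature of the interval and then extract the whole interval.

    Using purrr and lubridate, you can do the following: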

    library(lubridate)
    #> 
    #> Attaching package: 'lubridate'
    #> The following object is masked from 'package:base':
    #> 
    #>     date
    library(purrr)
    
    so <- structure(
      list(
        int_a = new("Interval",
                    .Data = c(24192000, 52704000, 0, 64022400, NA, NA, NA, 0, NA, NA),
                    start = structure(c(1286841600, 1327276800, 1157068800, 1370995200, NA, NA, NA, 1296172800, NA, NA), 
                                      class = c("POSIXct", "POSIXt"), 
                                      tzone = "UTC"), tzone = "UTC"),
    
        int_b = new("Interval", 
                    .Data = c(NA, 2505600, NA, NA, 53222400, 7862400, NA, NA, 0, 116812800), 
                    start = structure(c(NA, 1402531200, NA, NA, 1397433600, 1307577600, NA, NA, 1366329600, 1320278400),
                                      class = c("POSIXct", "POSIXt"),
                                      tzone = "UTC"), tzone = "UTC"),
    
        int_c = new("Interval",
                    .Data = c(NA, NA, 19353600, NA, NA, 41472000, 0, NA, NA, NA),
                    start = structure(c(NA, NA, 1287446400, NA, NA, 1238025600, 1433203200, NA, NA, NA), 
                                      class = c("POSIXct", "POSIXt"), 
                                      tzone = "UTC"),tzone = "UTC"),
    
        int_d = new("Interval",
                    .Data = c(3024000, 9331200, NA, 0, 8899200, 36374400, 0, 3196800, 18748800, 28771200), 
                    start = structure(c(1316044800, 1396828800, NA, 1466640000, 1457568000, 1290038400, 1444694400, 1321315200, 1381968000, 1438300800), 
                                      class = c("POSIXct", "POSIXt"), 
                                      tzone = "UTC"), tzone = "UTC")),
    
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
    
    # Earliest start date of each column (not of each row), ignoring NA values
    earliest_start_interval_list <- map_at(.x = so, .at = 1:ncol(so), ~ min(lubridate::int_start(.x), na.rm = TRUE))
    

    Created on 2019-05-22 by the reprex package (v0.3.0)
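The map_at() call returns the earliest start of each column. For the row-wise ordering the question asks for, a minimal sketch using only base R and lubridate (no purrr) could look like the following. It assumes the four columns are named int_a to int_d and use UTC, as in the reprex; the helper name sort_intervals_by_start and the output names first_int to fourth_int are illustrative, not part of any package. The idea is to turn start and end times into numeric matrices, reorder each row by start (NA intervals last), and rebuild Interval vectors with lubridate's interval():

```r
library(lubridate)

# Hypothetical helper (not a lubridate function): for every row, order the
# intervals in `cols` by start date, with missing (NA) intervals last, and
# return the reordered intervals as a named list of Interval vectors.
# Assumes UTC times, as in the reprex data.
sort_intervals_by_start <- function(x,
                                    cols = c("int_a", "int_b", "int_c", "int_d"),
                                    out = c("first_int", "sec_int", "third_int", "fourth_int")) {
  # Numeric matrices (seconds since 1970-01-01 UTC): one row per observation,
  # one column per input interval
  starts <- do.call(cbind, lapply(cols, function(cl) as.numeric(int_start(x[[cl]]))))
  ends <- do.call(cbind, lapply(cols, function(cl) as.numeric(int_end(x[[cl]]))))

  # Reorder each row by start time; order() puts NA last and keeps ties in
  # their original column order
  for (i in seq_len(nrow(starts))) {
    o <- order(starts[i, ], na.last = TRUE)
    starts[i, ] <- starts[i, o]
    ends[i, ] <- ends[i, o]
  }

  # Rebuild one Interval vector per rank position (first, second, ...)
  to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")
  res <- lapply(seq_along(cols), function(k) {
    interval(to_time(starts[, k]), to_time(ends[, k]))
  })
  names(res) <- out
  res
}
```

With the reprex data, `res <- sort_intervals_by_start(so)` should give four Interval vectors that can then be added to `so` as new columns. If only the lengths are needed, `int_length()` can be applied to each of them.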

    【Comments】:

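    • Thank you very much for your reply! Sorry, I wasn't clear. I need to (a) find the earliest start date among the four intervals in a row, (b) extract the whole interval that starts on that date and store it in a new column, and (c) repeat this for the second, third and fourth (= last) earliest start dates. I have edited the original post for clarity.
    • I tried something like earliest_start_interval_row <- pmap(.l = so, .f = ~min(lubridate::int_start(.x), na.rm = TRUE)) (or with pmin() instead of min()), but that returns NA every time the first value is NA. I couldn't get any further, but maybe someone else can chime in here. What if you compute the minimum from the raw interval start values before putting everything together?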
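Regarding the pmap() attempt: in a purrr formula, `.x` refers only to the first argument of each call (here the int_a value of the row), so the other three intervals are never compared, which is likely why the result depends on int_a alone. A row-wise earliest start can instead be sketched with base R's pmin(), which compares its arguments element by element. The helper name row_earliest_start is hypothetical, and the column names are assumed to be those of the reprex:

```r
library(lubridate)

# Hypothetical helper: earliest start date per row across several Interval columns.
row_earliest_start <- function(x, cols = c("int_a", "int_b", "int_c", "int_d")) {
  # One POSIXct vector of start dates per interval column
  starts <- lapply(cols, function(cl) int_start(x[[cl]]))
  # Parallel (row-wise) minimum; na.rm = TRUE skips missing intervals,
  # so a row is NA only when all of its intervals are NA
  do.call(pmin, c(starts, na.rm = TRUE))
}
```

Comparing this result with `int_start()` of each column would identify which interval comes first in each row.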