【Question title】: data.table: fast calculation of row-time statistics within a bidirectional moving time window
【Posted】: 2018-04-09 21:51:50
【Question】:
library(data.table)
library(lubridate)
df <- data.table(col1 = c('A', 'A', 'A', 'B', 'B', 'B'), col2 = c("2015-03-06 01:37:57", "2015-03-06 01:39:57", "2015-03-06 01:45:28", "2015-03-06 02:31:44", "2015-03-06 03:55:45", "2015-03-06 04:01:40"))

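For each row, I want to compute the standard deviation of the times (col2) of all rows that have the same 'col1' value and whose time falls within a window running from 10 minutes before that row's time to 10 minutes after it (both ends inclusive).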

I tried a fast approach based on the solution to a previous question:

df$col2 <- as_datetime(df$col2)
gap <- 10L
df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2 + gap * 60L)
                  , on = .(col1, col2 >= t1, col2 <= t2)
                  , .(col1, col2 = x.col2, times = as.numeric(col2))
                  ][, .(sd_times = sd(times))
                    , by = .(col1, col2)]$sd_times][]

But I get the following error:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 14 rows; more than 12 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

【Comments】:

  • The solution is probably to set allow.cartesian=TRUE, as the error message says.
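
To see where the 14 in the error message comes from, the matches can be counted per row instead of being materialised. This is only a minimal sketch: it rebuilds the question's data with base `as.POSIXct` (UTC assumed) rather than lubridate, and `w` and `n_matches` are illustrative names that do not appear in the original post.

```r
library(data.table)

# Same data as in the question, parsed with base R (UTC is an assumption)
df <- data.table(
  col1 = c("A", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                      "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                      "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                    tz = "UTC")
)
gap <- 10L  # window half-width in minutes

# One window [t1, t2] per row of df (hypothetical helper table)
w <- df[, .(col1, t1 = col2 - gap * 60, t2 = col2 + gap * 60)]

# by = .EACHI evaluates j once per row of w, so .N is the number of
# rows of df that fall inside that row's window
n_matches <- df[w, on = .(col1, col2 >= t1, col2 <= t2), .N, by = .EACHI]$N

n_matches       # 3 3 3 1 2 2
sum(n_matches)  # 14 rows in the full join result
2L * nrow(df)   # 12 = nrow(x) + nrow(i), the threshold in the error message
```

Every window contains at least its own row and most contain neighbours as well, so the join legitimately produces more rows than nrow(x) + nrow(i); that is the situation allow.cartesian=TRUE is meant to permit.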

Tags: r data.table


【Solution 1】:

I solved my task using Frank's comment above:

df[, feat1 := .SD[.(col1 = col1, t1 = col2 - gap * 60L, t2 = col2 + gap * 60L)
                  , on = .(col1, col2 >= t1, col2 <= t2)
                  , .(col1, col2 = x.col2, times = as.numeric(col2)), allow.cartesian=TRUE
                  ][, .(sd_times = sd(times))
                    , by = .(col1, col2)]$sd_times][]
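
The error message also mentions by=.EACHI as a way to avoid the large allocation. The following is a hedged sketch of that alternative, not the approach used in this answer; it rebuilds the data with base `as.POSIXct` (UTC assumed), and `w` and `res` are illustrative names.

```r
library(data.table)

# Same data as in the question, parsed with base R (UTC is an assumption)
df <- data.table(
  col1 = c("A", "A", "A", "B", "B", "B"),
  col2 = as.POSIXct(c("2015-03-06 01:37:57", "2015-03-06 01:39:57",
                      "2015-03-06 01:45:28", "2015-03-06 02:31:44",
                      "2015-03-06 03:55:45", "2015-03-06 04:01:40"),
                    tz = "UTC")
)
gap <- 10L  # window half-width in minutes

# One window [t1, t2] per row of df (hypothetical helper table)
w <- df[, .(col1, t1 = col2 - gap * 60, t2 = col2 + gap * 60)]

# Non-equi join with by = .EACHI: j runs once per window, and x.col2
# refers to the original times in df rather than the window bounds
res <- df[w, on = .(col1, col2 >= t1, col2 <= t2),
          .(sd_times = sd(as.numeric(x.col2))), by = .EACHI]

# res has exactly one row per row of w, in the same order as df
df[, feat1 := res$sd_times][]
```

Because the result has one row per window, it lines up with `df` even when two rows share the same col1 and col2, a case in which grouping by `.(col1, col2)` would return fewer rows than `df` has. A row whose window contains only itself gets NA, since `sd` of a single value is NA.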

【Comments】:
