【Question title】: Detect Value Changes beyond a threshold in Time Series data in SQL
【Posted】: 2019-08-14 01:00:03
【Question】:

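In PostgreSQL, I am trying to find subjects that have a sequence of values below 60 followed by two consecutive values above 60. I am also interested in the length of time between the first recorded value below 60 and the second value above 60. This event can occur multiple times per subject.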

I am struggling to figure out how to search for an unbounded number of values below 60 followed by two values above 60.

RowID  SubjectID  Value  TimeStamp
1      1          65     2142-04-29 12:00:00
2      1          58     2142-04-30 03:00:00
3      1          55     2142-04-30 04:00:00
4      1          54     2142-04-30 05:00:00
5      1          55     2142-04-30 06:15:00
6      1          56     2142-04-30 06:45:00
7      1          65     2142-04-30 07:00:00
8      1          65     2142-04-30 08:00:00
9      2          48     2142-05-04 03:30:00
10     2          48     2142-05-04 04:00:00
11     2          50     2142-05-04 05:00:00
12     2          69     2142-05-04 06:00:00
13     2          68     2142-05-04 07:00:00
14     2          69     2142-05-04 08:00:00
15     2          50     2142-05-04 09:00:00
16     2          55     2142-05-04 10:00:00
17     2          50     2142-05-04 10:30:00
18     2          67     2142-05-04 11:00:00
19     2          67     2142-05-04 12:00:00

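My current attempt uses the lag and lead functions, but I don't know how to use them when I can't tell how far ahead I need to look. Here is an example that looks one value ahead and one value back. My problem is that I don't know how to partition by subjectID so that I can look at 't' time points, where 't' may differ for each subject.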

select t.subjectId, t.didEventOccur,
       (next_timestamp - timestamp) as duration
from (select t.*,
             lag(t.value) over (partition by t.subjectid order by t.timestamp) as prev_value,
             lead(t.value) over (partition by t.subjectid order by t.timestamp) as next_value,
             lead(t.timestamp) over (partition by t.subjectid order by t.timestamp) as next_timestamp
      from t
     ) t
where value < 60 and next_value < 60 and
      (prev_value is null or prev_value >= 60);

I'm hoping to get output like this:

SubjectID  DidEventOccur Duration 
 1          1             05:00:00
 2          1             03:30:00
 2          1             03:00:00
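Each expected duration runs from the first reading of a below-60 run to the second of the two readings above 60 that follow it. A quick Python check of that arithmetic against the sample rows (Python is used here only for illustration; the RowIDs in the comments refer to the table above):

```python
from datetime import datetime

# Subject 1: RowID 2 (58 at 03:00) to RowID 8 (the second 65, at 08:00)
event_1 = datetime(2142, 4, 30, 8, 0) - datetime(2142, 4, 30, 3, 0)
# Subject 2, first event: RowID 9 (48 at 03:30) to RowID 13 (68 at 07:00)
event_2 = datetime(2142, 5, 4, 7, 0) - datetime(2142, 5, 4, 3, 30)
# Subject 2, second event: RowID 15 (50 at 09:00) to RowID 19 (the second 67, at 12:00)
event_3 = datetime(2142, 5, 4, 12, 0) - datetime(2142, 5, 4, 9, 0)

print(event_1, event_2, event_3)  # 5:00:00 3:30:00 3:00:00
```

RowID 14 (69 at 08:00) is a third consecutive value above 60 and does not affect the result.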

【Question discussion】:

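  • What if more than 2 consecutive rows have values > 60? Ignore the rest? Postgres version?
  • Yes, the remaining values greater than 60 can be ignored. I hope that clarifies it.
  • You seem to assume the event can only occur once per user? Can it occur multiple times, and what should be returned then?
  • Good point! I only talked about a single event, but if it occurs multiple times I should include that case. Thanks!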

Tags: sql postgresql time-series window-functions gaps-and-islands


【Answer 1】:

The pure SQL solution you have been asking for:

SELECT subjectid, start_at, next_end_at - start_at AS duration
FROM  (
   SELECT *
        , lead(end_at) OVER (PARTITION BY subjectid ORDER BY start_at) AS next_end_at
   FROM  (
      SELECT subjectid, grp, big
           , min(ts) AS start_at
           , max(ts) FILTER (WHERE big AND big_rn = 2) AS end_at  -- 2nd timestamp
      FROM  (
         SELECT subjectid, ts, grp, big
              , row_number() OVER (PARTITION BY subjectid, grp, big ORDER BY ts) AS big_rn  -- position within its block
         FROM  (
            SELECT subjectid, ts
                 , row_number() OVER (PARTITION BY subjectid ORDER BY ts)
                 - row_number() OVER (PARTITION BY subjectid, (value > 60) ORDER BY ts) AS grp  -- constant within one unbroken run
                 , (value > 60) AS big
            FROM   tbl
            ) sub1
         ) sub2
      GROUP  BY subjectid, grp, big
      ) sub3
   ) sub4
WHERE  NOT big                 -- identifies block of values <= 60 ...
AND    next_end_at IS NOT NULL -- ...followed by at least 2 values > 60
ORDER  BY subjectid, start_at;

I omitted the useless column DidEventOccur and added start_at instead. Otherwise, this is exactly the result you asked for.
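To see how the nested query above works stage by stage, here is a minimal Python sketch that mirrors it for a single subject, assuming the rows are already sorted by `ts`. The function names `run_keys` and `events_for_subject` are hypothetical. The key is the row_number difference in `sub1`: inside one unbroken run of big (or not-big) rows, both row numbers advance together, so their difference stays constant. `grp` alone can repeat between a big run and the not-big run after it (it is 3 for both in subject 2), which is why the query groups by `subjectid, grp, big`.

```python
from datetime import datetime

BIG = 60  # a row is "big" when value > 60, as with (value > 60) in the query


def run_keys(values):
    """sub1: row_number() over the subject minus row_number() over
    (subject, big). Returns one (big, grp) key per row."""
    rn_per_class = {True: 0, False: 0}  # running row_number() per partition
    keys = []
    for rn_all, value in enumerate(values, start=1):
        big = value > BIG
        rn_per_class[big] += 1
        keys.append((big, rn_all - rn_per_class[big]))
    return keys


def events_for_subject(rows):
    """rows: (ts, value) pairs for ONE subject, sorted by ts.
    Returns (start_at, duration) pairs like the query's final SELECT."""
    keys = run_keys([value for _, value in rows])

    # sub2 + sub3: collect the timestamps of each (big, grp) block in ts order.
    blocks = {}
    for (ts, _), key in zip(rows, keys):
        blocks.setdefault(key, []).append(ts)

    # start_at = min(ts); end_at = 2nd timestamp of a big block (big_rn = 2),
    # None when the block is not big or has only one row.
    summary = sorted(
        ((tss[0], big, tss[1] if big and len(tss) >= 2 else None)
         for (big, _), tss in blocks.items()),
        key=lambda block: block[0],
    )

    # sub4 + final WHERE: lead(end_at) ordered by start_at; keep not-big
    # blocks whose following block supplies a 2nd big timestamp.
    result = []
    for i, (start_at, big, _) in enumerate(summary):
        next_end_at = summary[i + 1][2] if i + 1 < len(summary) else None
        if not big and next_end_at is not None:
            result.append((start_at, next_end_at - start_at))
    return result


# Usage with subject 2 from the question's sample data:
SUBJECT_2 = [(datetime(2142, 5, 4, h, m), v) for h, m, v in [
    (3, 30, 48), (4, 0, 48), (5, 0, 50), (6, 0, 69), (7, 0, 68), (8, 0, 69),
    (9, 0, 50), (10, 0, 55), (10, 30, 50), (11, 0, 67), (12, 0, 67)]]
for start_at, duration in events_for_subject(SUBJECT_2):
    print(start_at, duration)
# 2142-05-04 03:30:00 3:30:00
# 2142-05-04 09:00:00 3:00:00
```

A block with a single value above 60 yields `end_at = None`, so the low run before it is dropped by the final filter, just as `next_end_at IS NOT NULL` does in the query.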

db<>fiddle here

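Consider a procedural solution in plpgsql (or any PL) instead; it should be faster. Simpler? I'd say yes, but that depends on who is judging. See (explanation of the technique and more links):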
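The procedural approach suggested above boils down to one pass over the rows in `subjectid, ts` order with a little state per subject. The sketch below shows that logic in Python for illustration only; a plpgsql function would loop over the ordered rows the same way. `find_events` and its `threshold` parameter are hypothetical names, and the sketch assumes the input is already sorted by subject and timestamp, counting a value of exactly 60 as "not above", matching `(value > 60)` in the query.

```python
from datetime import datetime


def find_events(rows, threshold=60):
    """Single pass over (subject_id, ts, value) rows sorted by subject_id, ts.

    Yields (subject_id, start_at, duration) for every run of values
    <= threshold that is directly followed by at least two values > threshold.
    Further values above the threshold after the 2nd one are ignored.
    """
    current_subject = None
    low_start = None  # ts of the first row of the current low run
    highs = 0         # consecutive high rows seen right after that low run
    for subject, ts, value in rows:
        if subject != current_subject:  # new subject: reset all state
            current_subject, low_start, highs = subject, None, 0
        if value > threshold:
            if low_start is not None:
                highs += 1
                if highs == 2:  # event complete at the 2nd high value
                    yield subject, low_start, ts - low_start
                    low_start, highs = None, 0
        elif low_start is None or highs > 0:
            # A new low run starts: the first low at the start or after an
            # event, or a low that follows only a single high value.
            low_start, highs = ts, 0


# Usage with the question's sample data:
SAMPLE = [(s, datetime.fromisoformat(t), v) for s, t, v in [
    (1, "2142-04-29 12:00", 65), (1, "2142-04-30 03:00", 58),
    (1, "2142-04-30 04:00", 55), (1, "2142-04-30 05:00", 54),
    (1, "2142-04-30 06:15", 55), (1, "2142-04-30 06:45", 56),
    (1, "2142-04-30 07:00", 65), (1, "2142-04-30 08:00", 65),
    (2, "2142-05-04 03:30", 48), (2, "2142-05-04 04:00", 48),
    (2, "2142-05-04 05:00", 50), (2, "2142-05-04 06:00", 69),
    (2, "2142-05-04 07:00", 68), (2, "2142-05-04 08:00", 69),
    (2, "2142-05-04 09:00", 50), (2, "2142-05-04 10:00", 55),
    (2, "2142-05-04 10:30", 50), (2, "2142-05-04 11:00", 67),
    (2, "2142-05-04 12:00", 67),
]]
for subject, start_at, duration in find_events(SAMPLE):
    print(subject, start_at, duration)
# 1 2142-04-30 03:00:00 5:00:00
# 2 2142-05-04 03:30:00 3:30:00
# 2 2142-05-04 09:00:00 3:00:00
```

Because the state is only a start timestamp and a counter, the pass touches each row once, with no self-joins or nested window queries; this is the usual reason a procedural loop can outrun the pure SQL version.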
