【Question title】: Take the first row from every time window
【Posted】: 2020-05-21 12:56:25
【Question】:

I have 2 columns: one is a time column, the other is a boolean-like column:

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:33   0
2010-08-01 10:59:34   1
2010-08-01 10:59:36   0
2010-08-01 10:59:38   1
2010-08-01 10:59:41   1
2010-08-01 10:59:43   0
2010-08-01 10:59:45   1
2010-08-01 10:59:47   0
2010-08-01 10:59:53   1

I want to take the first row from every 10-second window.

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:43   0

How can I do this?

【Question discussion】:

    Tags: sql vertica


    【Solution 1】:

    You can use row_number(), partitioning by a 10-second bucket of the timestamp:

    select t.*
    from (select t.*,
                 row_number() over (partition by date_trunc('minute', gmt), floor(extract(second from gmt) / 10)
                                    order by gmt) as seqnum
          from t
         ) t
    where seqnum = 1;
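To make the partitioning concrete, here is a minimal Python sketch (an illustration, not part of the original answer) that applies the same key, the timestamp truncated to the minute plus floor(second / 10), to the sample rows from the question and keeps the earliest row per key, which is what `row_number() ... = 1` does. The names `ROWS`, `window_key` and `first_per_window` are hypothetical.

```python
from datetime import datetime

# Sample data from the question: (GMT, VAL)
ROWS = [
    ("2010-08-01 10:59:32", 1), ("2010-08-01 10:59:33", 0),
    ("2010-08-01 10:59:34", 1), ("2010-08-01 10:59:36", 0),
    ("2010-08-01 10:59:38", 1), ("2010-08-01 10:59:41", 1),
    ("2010-08-01 10:59:43", 0), ("2010-08-01 10:59:45", 1),
    ("2010-08-01 10:59:47", 0), ("2010-08-01 10:59:53", 1),
]


def window_key(ts: datetime) -> tuple:
    """Mimic partition by date_trunc('minute', gmt), floor(second / 10)."""
    return (ts.replace(second=0, microsecond=0), ts.second // 10)


def first_per_window(rows):
    """Keep the earliest row in each partition (row_number() = 1)."""
    parsed = sorted(
        (datetime.strptime(gmt, "%Y-%m-%d %H:%M:%S"), val) for gmt, val in rows
    )
    firsts = {}
    for ts, val in parsed:  # ordered by gmt, like the window's ORDER BY
        firsts.setdefault(window_key(ts), (ts, val))
    return list(firsts.values())


for ts, val in first_per_window(ROWS):
    print(ts, val)
```

Because these windows are aligned to the clock (:30, :40, :50), the second window starts at 10:59:41 rather than at the 10:59:43 shown in the question; Solution 2 below addresses that.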
    

    Or you can convert the timestamp to a string and keep everything up to the tens-of-seconds digit:

    select t.*
    from (select t.*,
                 row_number() over (partition by left(to_char(gmt, 'YYYYMMDDHH24MISS'), 13)
                                    order by gmt) as seqnum
          from t
         ) t
    where seqnum = 1;
    

    Or use the epoch:

    select t.*
    from (select t.*,
                 row_number() over (partition by floor(extract(epoch from gmt) / 10)
                                    order by gmt) as seqnum
          from t
         ) t
    where seqnum = 1;
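The string and epoch variants are two more ways of computing the same 10-second bucket. A quick Python check (an illustration, assuming whole-second timestamps interpreted as UTC; the `strftime` pattern `%Y%m%d%H%M%S` stands in for Vertica's `'YYYYMMDDHH24MISS'`, and the helper names are hypothetical) shows that the 13-character prefix and `floor(epoch / 10)` split the sample timestamps into the same groups:

```python
from datetime import datetime, timezone
from itertools import groupby

# Sample timestamps from the question (treated as UTC for the epoch variant)
GMT = [
    "2010-08-01 10:59:32", "2010-08-01 10:59:33", "2010-08-01 10:59:34",
    "2010-08-01 10:59:36", "2010-08-01 10:59:38", "2010-08-01 10:59:41",
    "2010-08-01 10:59:43", "2010-08-01 10:59:45", "2010-08-01 10:59:47",
    "2010-08-01 10:59:53",
]
TIMES = [
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    for s in GMT
]


def string_key(ts: datetime) -> str:
    # left(to_char(gmt, 'YYYYMMDDHH24MISS'), 13): drops the last seconds digit
    return ts.strftime("%Y%m%d%H%M%S")[:13]


def epoch_key(ts: datetime) -> int:
    # floor(extract(epoch from gmt) / 10)
    return int(ts.timestamp()) // 10


def groups(key):
    """Group the (already sorted) timestamps by the given bucket key."""
    return [[t.second for t in grp] for _, grp in groupby(TIMES, key=key)]


print(groups(string_key))
print(groups(epoch_key))
```

Both keys put 10:59:32 to 10:59:38 in one bucket, 10:59:41 to 10:59:47 in the next and 10:59:53 in the last, because minutes (and days) are whole multiples of 10 seconds.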
    


      【Solution 2】:

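      This is where Vertica shines. Note, though, that its time series snaps to 10-second boundaries (10:59:30, 10:59:40, ...), so unless you correct for that you get different rows back than the ones you expect.

      If you need the windows to start at the exact original timestamp, add the difference between the smallest time series timestamp and the smallest actual timestamp (2 seconds in this particular case) to the snapped 10-second time slices; see the tb and ts common table expressions below.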
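Put differently, the offset shifts the windows so they start at the first timestamp instead of at :30, :40, :50. A minimal Python sketch of that arithmetic (an illustration of the idea, not Vertica code; `anchored_firsts` and `WINDOW` are hypothetical names) buckets each row by floor((gmt - min(gmt)) / 10 s) and keeps the first row per bucket:

```python
from datetime import datetime, timedelta

# Sample data from the question: (GMT, VAL)
ROWS = [
    ("2010-08-01 10:59:32", 1), ("2010-08-01 10:59:33", 0),
    ("2010-08-01 10:59:34", 1), ("2010-08-01 10:59:36", 0),
    ("2010-08-01 10:59:38", 1), ("2010-08-01 10:59:41", 1),
    ("2010-08-01 10:59:43", 0), ("2010-08-01 10:59:45", 1),
    ("2010-08-01 10:59:47", 0), ("2010-08-01 10:59:53", 1),
]
WINDOW = timedelta(seconds=10)


def anchored_firsts(rows, window=WINDOW):
    """First row of each window, with windows anchored at the earliest timestamp."""
    parsed = sorted(
        (datetime.strptime(gmt, "%Y-%m-%d %H:%M:%S"), val) for gmt, val in rows
    )
    start = parsed[0][0]  # plays the role of MIN(gmt)
    firsts = {}
    for ts, val in parsed:
        bucket = (ts - start) // window  # timedelta // timedelta gives an int
        firsts.setdefault(bucket, (ts, val))
    return list(firsts.values())


for ts, val in anchored_firsts(ROWS):
    print(ts, val)
```

This yields 10:59:32, 10:59:43 and 10:59:53; the first two are the rows the question expects.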

      WITH
      -- your input ...
      input(gmt,val) AS (
                SELECT TIMESTAMP '2010-08-01 10:59:32',1
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:33',0
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:34',1
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:36',0
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:38',1
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:41',1
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:43',0
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:45',1
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:47',0
      UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:53',1
      )
      ,
      -- create the time series; I decided to snap it to exact 10-second time slices
      -- use the Vertica TIME_SLICE function to create the limits of the time series
      tm(tm) AS (
                  SELECT MIN(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
        UNION ALL SELECT MAX(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
      )
      ,
      -- use Vertica's TIMESERIES clause to actually create the time series
      -- which will be snapped to 10-second borders
      tb(tb) AS (
        SELECT tb 
        FROM tm
        TIMESERIES tb AS '10 SECONDS' OVER(ORDER BY tm)
      )
      ,
       -- add the difference between timeseries timestamp and actual timestamp
      ts(ts) AS (
        SELECT 
          tb +( (SELECT MIN(gmt) FROM INPUT) - (SELECT MIN(tb) FROM tb) )
        FROM tb
      )
      -- finally, use the "Event Series Join"
      -- (that is the INTERPOLATE PREVIOUS VALUE predicate)
      -- to apply an outer join
      SELECT
        gmt
      , ts AS control_ts
      , val
      FROM input
      LEFT
      JOIN ts
        ON gmt INTERPOLATE PREVIOUS VALUE ts
      WHERE gmt IS NOT NULL
      -- Vertica's Analytic Limit Clause
      LIMIT 1 OVER(PARTITION BY ts ORDER BY gmt)
      ;
      

      Returns:

               gmt         |     control_ts      | val 
      ---------------------+---------------------+-----
       2010-08-01 10:59:32 | 2010-08-01 10:59:32 |   1
       2010-08-01 10:59:43 | 2010-08-01 10:59:42 |   0
       2010-08-01 10:59:53 | 2010-08-01 10:59:52 |   1
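For readers without Vertica, the event series join can be approximated in Python: build the control timestamps (the ts CTE: 10-second slice starts plus the 2-second offset), match every row to the latest control timestamp at or before it, as INTERPOLATE PREVIOUS VALUE does, and keep the first row per control timestamp, like LIMIT 1 OVER (PARTITION BY ts ORDER BY gmt). This is a sketch using `bisect` that only mimics the semantics on whole-second data like the sample; it is not how Vertica executes the query, and all names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Sample data from the question: (GMT, VAL)
ROWS = [
    ("2010-08-01 10:59:32", 1), ("2010-08-01 10:59:33", 0),
    ("2010-08-01 10:59:34", 1), ("2010-08-01 10:59:36", 0),
    ("2010-08-01 10:59:38", 1), ("2010-08-01 10:59:41", 1),
    ("2010-08-01 10:59:43", 0), ("2010-08-01 10:59:45", 1),
    ("2010-08-01 10:59:47", 0), ("2010-08-01 10:59:53", 1),
]
STEP = timedelta(seconds=10)


def slice_start(ts: datetime) -> datetime:
    # Approximates TIME_SLICE(gmt, 10, 'SECOND', 'START') for whole-second data
    return ts.replace(second=ts.second - ts.second % 10, microsecond=0)


def control_series(times):
    """The tb/ts CTEs: slice starts from min to max, shifted by the offset."""
    first, last = slice_start(min(times)), slice_start(max(times))
    offset = min(times) - first  # 2 seconds for the sample data
    series, t = [], first
    while t <= last:
        series.append(t + offset)
        t += STEP
    return series


def event_series_firsts(rows):
    parsed = sorted(
        (datetime.strptime(gmt, "%Y-%m-%d %H:%M:%S"), val) for gmt, val in rows
    )
    controls = control_series([ts for ts, _ in parsed])
    firsts = {}
    for gmt, val in parsed:
        # Latest control ts <= gmt; the first control ts equals MIN(gmt),
        # so every row finds a match and the index is never negative.
        i = bisect_right(controls, gmt) - 1
        firsts.setdefault(controls[i], (gmt, val))
    return [(gmt, ctl, val) for ctl, (gmt, val) in firsts.items()]


for gmt, ctl, val in event_series_firsts(ROWS):
    print(gmt, ctl, val)
```

On the sample data this prints the same three gmt / control_ts / val triples as the query result above.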
      
      
