【Question title】: Run a SQL query against ten-minute time intervals
【Posted】: 2020-05-05 10:34:15
【Question】:

I have a PostgreSQL table with this schema:

id SERIAL PRIMARY KEY,
traveltime INT,
departuredate TIMESTAMPTZ,
departurehour TIMETZ

Here is some of the data (edited):

  id  | traveltime |     departuredate      | departurehour
------+------------+------------------------+---------------
    1 |         73 | 2019-12-24 00:00:03+01 | 00:00:03+01
    2 |         73 | 2019-12-24 00:12:16+01 | 00:12:16+01
   53 |        115 | 2019-12-24 07:53:44+01 | 07:53:44+01
   54 |        116 | 2019-12-24 07:58:45+01 | 07:58:45+01
   55 |        119 | 2019-12-24 08:03:46+01 | 08:03:46+01
   56 |        120 | 2019-12-24 08:08:47+01 | 08:08:47+01
   57 |        121 | 2019-12-24 08:13:48+01 | 08:13:48+01
   58 |        121 | 2019-12-24 08:18:48+01 | 08:18:48+01
  542 |        112 | 2019-12-26 07:52:41+01 | 07:52:41+01
  543 |        114 | 2019-12-26 07:57:42+01 | 07:57:42+01
  544 |        116 | 2019-12-26 08:02:43+01 | 08:02:43+01
  545 |        116 | 2019-12-26 08:07:44+01 | 08:07:44+01
  546 |        117 | 2019-12-26 08:12:45+01 | 08:12:45+01
  547 |        118 | 2019-12-26 08:17:46+01 | 08:17:46+01
  548 |        118 | 2019-12-26 08:22:48+01 | 08:22:48+01
 1031 |         80 | 2019-12-28 07:50:33+01 | 07:50:33+01
 1032 |         81 | 2019-12-28 07:55:34+01 | 07:55:34+01
 1033 |         81 | 2019-12-28 08:00:35+01 | 08:00:35+01
 1034 |         82 | 2019-12-28 08:05:36+01 | 08:05:36+01
 1035 |         82 | 2019-12-28 08:10:37+01 | 08:10:37+01
 1036 |         83 | 2019-12-28 08:15:38+01 | 08:15:38+01
 1037 |         83 | 2019-12-28 08:20:39+01 | 08:20:39+01

I want the average of all the traveltime values collected in each 10-minute interval of the day, over several weeks.

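Expected result for the data sample: for the 10-minute interval between 8h00 and 8h10, the rows included in the average are those with ids 55, 56, 544, 545, 1033 and 1034, and similarly for every other interval.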
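Worked out from the sample rows only, the expected value for that interval is the plain mean of those six traveltime values (119, 120, 116, 116, 81, 82):

```latex
\bar{t}_{\text{08:00--08:10}} = \frac{119 + 120 + 116 + 116 + 81 + 82}{6} = \frac{634}{6} \approx 105.67
```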

I can get the average for one specific interval:

select avg(traveltime) from belt where departurehour >= '10:40:00+01' and departurehour < '10:50:00+01';

To avoid writing one query per interval, I use this query to get all the 10-minute intervals over the full period covered by the data:

select i from generate_series('2019-11-23', '2020-01-18', '10 minutes'::interval) i;
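For scale, a quick check of what that generate_series call produces. This sketch uses timestamp (without time zone) bounds, an assumption made so that a daylight-saving change in the session time zone cannot alter the count. Each generated value belongs to a single day, so averaging per generated value would not combine the same time of day across days; only the time-of-day part repeats, and it takes just 144 distinct values.

```sql
-- 56 days * 144 ten-minute steps per day = 8064 steps, plus the start value
-- (generate_series also emits the stop value when a step lands on it exactly)
select count(*)                as generated_values,   -- 8065
       count(distinct i::time) as distinct_times      -- 144
  from generate_series(timestamp '2019-11-23',
                       timestamp '2020-01-18',
                       interval '10 minutes') as i;
```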

What I am missing is a way to apply my AVG query to each generated interval. Any direction would help!

【Question comments】:

    Tags: postgresql aggregate


    【Answer 1】:

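    It turns out that generate_series over the date range does not really apply, whatever that range is. The key part is the 144 ten-minute intervals in every day. Unfortunately, Postgres has no built-in range type for times of day (it has tsrange, tstzrange and daterange, but no time range); perhaps creating one would be a useful exercise. But all is not lost: you can simulate one with BETWEEN, you just need to adjust the end of each range so that it stops at the last microsecond before the next interval starts.
    The query below generates these simulated ranges with a recursive CTE, then joins them to your table as before.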
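One reason the end of each range matters: time arithmetic wraps at midnight. This small sketch uses plain time literals (the answer's CTE uses timetz, which wraps the same way) to show that the naive end of the last slot, 23:50 + 10 minutes, is 00:00:00, so a half-open test against it never matches, while BETWEEN with a last-microsecond end still does.

```sql
-- time + interval wraps modulo 24 hours: the "end" of the last slot,
-- 23:50 + 10 minutes, comes out as 00:00:00
select time '23:50' + interval '10 minutes' as naive_end;

-- So a half-open test (t >= start and t < start + 10 minutes) can never
-- match in the 23:50 slot, while BETWEEN with a last-microsecond end can.
select (time '23:55' >= time '23:50'
        and time '23:55' < time '23:50' + interval '10 minutes')      as half_open_match,
       time '23:55' between time '23:50' and time '23:59:59.999999'   as between_match;
```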

    set timezone to '+1';    -- necessary to keep my local offset from affecting results
    -- create table and insert data here
    -- (additional data was added outside the date range, so it should not be included)
    with recursive min_intervals as 
           (select '00:00:00'::timetz        start_10Min   -- start of 1st 10Min interval
                 , '00:09:59.999999'::timetz end_10Min     -- last microsecond in 10Min interval
                 , 1 interval_no
            union all 
            select start_10Min + interval '10 min'        
                 , end_10Min   + interval '10 min'  
                 , interval_no + 1
              from Min_intervals
             where interval_no < 144                   -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
           )  -- select * from min_intervals;
    select start_10Min, end_10Min, avg(traveltime) average_travel_time
      from min_intervals
      join belt  
         on departuredate::time between start_10Min and end_10Min
      where departuredate::date between date '2019-11-23' and date '2020-01-18'  
      group by start_10Min, end_10Min
      order by start_10Min;   
    
    -- check against the 'specified' result: the added rows fall within the 08:00 to 08:10 time frame
    -- but outside the date range, so they are excluded and the average for that slot should be the same from both queries.
     select avg(traveltime) from belt where id in (55, 56, 544, 545, 1033, 1034); 
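A different technique, offered only as a hedged alternative to the join above and not as the answerer's method: compute each row's slot directly from its clock time and group by it. This sketch assumes the belt table from the question and reads the clock time from departurehour::time, a cast that drops the stored +01 offset and keeps the local time, so unlike departuredate::time it does not depend on the session time zone. The temporary table holding a subset of the question's sample rows is only there so the block runs on its own.

```sql
-- Sample rows from the question in a temporary table (it shadows any real
-- "belt" table for this session only); remove this part for real data.
drop table if exists pg_temp.belt;
create temp table belt (
    id            serial primary key,
    traveltime    int,
    departuredate timestamptz,
    departurehour timetz
);
insert into belt values
    (53,   115, '2019-12-24 07:53:44+01', '07:53:44+01'),
    (54,   116, '2019-12-24 07:58:45+01', '07:58:45+01'),
    (55,   119, '2019-12-24 08:03:46+01', '08:03:46+01'),
    (56,   120, '2019-12-24 08:08:47+01', '08:08:47+01'),
    (57,   121, '2019-12-24 08:13:48+01', '08:13:48+01'),
    (544,  116, '2019-12-26 08:02:43+01', '08:02:43+01'),
    (545,  116, '2019-12-26 08:07:44+01', '08:07:44+01'),
    (546,  117, '2019-12-26 08:12:45+01', '08:12:45+01'),
    (1033,  81, '2019-12-28 08:00:35+01', '08:00:35+01'),
    (1034,  82, '2019-12-28 08:05:36+01', '08:05:36+01'),
    (1035,  82, '2019-12-28 08:10:37+01', '08:10:37+01');

-- Slot number = hour * 6 + minute / 10 (integer division), i.e. 0..143;
-- slot start = midnight + slot number * 10 minutes.
select time '00:00'
         + (extract(hour   from departurehour::time)::int * 6
          + extract(minute from departurehour::time)::int / 10)
         * interval '10 minutes'      as slot_start,
       avg(traveltime)                as avg_traveltime,
       count(*)                       as n_rows
  from belt
 where departuredate::date between date '2019-11-23' and date '2020-01-18'
 group by 1
 order by 1;
```

Unlike the join against all 144 generated ranges, slots with no rows simply do not appear in this output.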
    

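    My issue with the above is that the date range is essentially hard-coded (yes, substitution parameters are available) and manual. That is fine in psql or an IDE, but not good for a production environment. If this were to be used there, I would use the following function, which returns the same result as a virtual table.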

     create or replace function travel_average_per_10Min_interval(
                                start_date_in date
                              , end_date_in   date
                              ) 
    returns table (Start_10Min     timetz
                  ,end_10Min       timetz
                  ,avg_travel_time numeric
                  )
    language sql
    as $$
        with recursive min_intervals as 
               (select '00:00:00'::timetz        start_10Min   -- start of 1st 10Min interval
                     , '00:09:59.999999'::timetz end_10Min     -- last microsecond in 10Min interval
                     , 1 interval_no
                union all 
                select start_10Min + interval '10 min'        
                     , end_10Min   + interval '10 min'  
                     , interval_no + 1
                  from Min_intervals
                 where interval_no < 144                        -- 6 10Min intervals/hr * 24 Hr/day = No of 10Min intervals in any day
               )  -- select * from min_intervals;
        select start_10Min, end_10Min, avg(traveltime) average_travel_time
          from min_intervals
          join belt  
            on departuredate::time between start_10Min and end_10Min
         where departuredate::date between start_date_in and end_date_in  
         group by start_10Min, end_10Min
         order by start_10Min;                  
    $$;
    
    -- test 
    select * from travel_average_per_10Min_interval(date '2019-11-23', date '2020-01-18');
    

    【Comments】:

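    • Thanks! But what I actually want is the average for each 10-minute interval, not a global value.
    • Then post some sample data (as text, not images) and the expected result for that data. Please also take a look at How to Ask.
    • I edited the question to add some data (enough for two time intervals). One thing I have already learned from your answer is that I don't need the departurehour column; it was just a "manual" split of departuredate in my CSV!
    • See the revised answer section.
    • It works, great! I just have one last problem with this query, which I tried to understand and solve after your answer, without success: I need the intervals in the same order as in generate_series, but I can't find how to achieve this. Do you think it's possible?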
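If the goal in the last comment is one row per interval in the same chronological order that generate_series produces, one option is to generate the day's 144 slots with generate_series, but over a single day rather than the whole period, and order by the slot. This is a minimal sketch, not the answerer's code: it assumes the belt table from the question, uses departurehour::time for the clock time (the cast ignores the session time zone), and uses a LEFT JOIN so that slots without rows still appear with a NULL average. The date 2000-01-01 is an arbitrary placeholder; only the time part of each generated value is kept. The temporary table with sample rows is only there so the block runs on its own.

```sql
-- Sample rows from the question (temporary; remove this part for real data).
drop table if exists pg_temp.belt;
create temp table belt (
    id            serial primary key,
    traveltime    int,
    departuredate timestamptz,
    departurehour timetz
);
insert into belt values
    (53,   115, '2019-12-24 07:53:44+01', '07:53:44+01'),
    (54,   116, '2019-12-24 07:58:45+01', '07:58:45+01'),
    (55,   119, '2019-12-24 08:03:46+01', '08:03:46+01'),
    (56,   120, '2019-12-24 08:08:47+01', '08:08:47+01'),
    (544,  116, '2019-12-26 08:02:43+01', '08:02:43+01'),
    (545,  116, '2019-12-26 08:07:44+01', '08:07:44+01'),
    (1033,  81, '2019-12-28 08:00:35+01', '08:00:35+01'),
    (1034,  82, '2019-12-28 08:05:36+01', '08:05:36+01');

-- One row per 10-minute slot of the day, generated on a placeholder date
-- and reduced to its time-of-day part.
with slots as (
    select g::time as slot_start
      from generate_series(timestamp '2000-01-01 00:00',
                           timestamp '2000-01-01 23:50',
                           interval '10 minutes') as g
)
select s.slot_start,
       avg(b.traveltime) as avg_traveltime,   -- NULL for empty slots
       count(b.id)       as n_rows
  from slots s
  left join belt b
    on b.departuredate::date between date '2019-11-23' and date '2020-01-18'
   and b.departurehour::time >= s.slot_start
   -- time - time yields an interval; comparing it avoids the midnight wrap
   -- of slot_start + interval '10 minutes' for the 23:50 slot
   and b.departurehour::time - s.slot_start < interval '10 minutes'
 group by s.slot_start
 order by s.slot_start;
```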