【Question title】: Window function to count occurrences in last 10 minutes
【Posted】: 2017-06-29 18:28:21
【Question】:

I can count the occurrences in the last ten minutes with a traditional subquery. For example:

drop table if exists [dbo].[readings]
go

create table [dbo].[readings](
    [server] [int] NOT NULL,
    [sampled] [datetime] NOT NULL
)
go

insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server,sampled,
    (select count(*)
     from readings r2
     where r2.server=r1.server
       and r2.sampled <= r1.sampled
       and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes
from readings r1
order by server,sampled
go

How can I get the same result with a window function? I tried this:

select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled

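But the result is just a running count. Is there a system variable that refers to the current row pointer, something like currentrow.sampled?

【Comments】:

  • Try this: select count(1) from readings r1 where datediff(minute, getdate(), sampled)

Tags: sql-server tsql window-functions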


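【Solution 1】:

As far as I know, there is no simple, exact replacement for your subquery using window functions.

Window functions operate on a set of rows and let you work with them by partition and order. What you are trying to do is not a kind of partitioning that window functions can work with, because every row needs its own 10 minute range. Generating partitions that would let a window function work here would only result in overly complicated code.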
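To see why the attempt in the question degenerates into a running count, here is a small diagnostic sketch, assuming the readings table from the question. The column names PredicateValue and RunningCount are made up for illustration. Inside the aggregate's argument, both sampled and r1.sampled refer to the same row of the frame, so the CASE condition compares each row with itself and is always true.

```sql
-- Diagnostic sketch (illustrative column names, not from the original posts).
select server,
       sampled,
       -- sampled and r1.sampled are the same column of the same row,
       -- so this evaluates to 1 on every row
       PredicateValue = case when sampled <= r1.sampled
                              and sampled > dateadd(minute,-10,r1.sampled)
                             then 1 else 0 end,
       -- counting a value that is never null is just a running count
       RunningCount = count(*) over (partition by server
                                     order by sampled
                                     rows between unbounded preceding and current row)
from readings r1
order by server, sampled;
```

Because the predicate never filters anything, the windowed count(case ...) in the question behaves like the plain count(*) shown here.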

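I suggest cross apply() as an alternative to your subquery.

I am not sure whether you meant to limit the results to within 9 minutes, but with sampled > dateadd(...) that is what happens in your original subquery: a reading taken exactly 10 minutes earlier is not counted.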
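The boundary difference can be checked for a single reading. This is a minimal sketch, assuming the readings table and sample data from the question; the variable @current and the two column names are hypothetical and chosen only for illustration.

```sql
-- Compare the strict and the inclusive lower bound for the 08:40 reading.
declare @current datetime = '20170101 08:40';

select sampled,
       CountedWithGreaterThan    = case when sampled >  dateadd(minute,-10,@current)
                                        then 1 else 0 end,
       CountedWithGreaterOrEqual = case when sampled >= dateadd(minute,-10,@current)
                                        then 1 else 0 end
from readings
where server = 1
  and sampled <= @current
  and sampled >= dateadd(minute,-10,@current)
order by sampled;
```

With the sample data, only the 08:30 reading should differ between the two columns, since it lies exactly 10 minutes before 08:40.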

Here is what a window function could look like when it partitions the samples into 10 minute slices, alongside the cross apply() version.

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+
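The Slices column restarts at 08:40 and 09:11 because its partition expression floors each timestamp to a fixed 10 minute boundary instead of looking back from the current row. A short sketch of that expression on its own, assuming the readings table from the question (the column name SliceStart is illustrative):

```sql
-- datediff(minute,0,sampled) is the number of whole minutes since 1900-01-01 00:00;
-- integer division by 10 followed by *10 rounds that down to a multiple of 10,
-- and dateadd turns it back into a datetime.
select sampled,
       SliceStart = dateadd(minute, datediff(minute,0,sampled)/10*10, 0)
from readings
order by sampled;
```

Readings at 08:37 and 08:40 fall into different slices (08:30 and 08:40) even though they are only three minutes apart, which is why this is not the same thing as a sliding window.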

【Comments】:

    【Solution 2】:

    This is not a very satisfying answer, but one possibility is to first create a helper table containing every minute

    CREATE TABLE #DateTimes(datetime datetime primary key);
    
    WITH E1(N) AS 
    (
        SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                                (1),(1),(1),(1),(1)) V(N)
    )                                       -- 1*10^1 or 10 rows
    , E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
    , E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
    , E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
     ,R(StartRange, EndRange)
     AS (SELECT MIN(sampled),
                MAX(sampled)
         FROM   readings)
     ,N(N)
     AS (SELECT ROW_NUMBER()
                  OVER (
                    ORDER BY (SELECT NULL)) AS N
         FROM   E8)
    INSERT INTO #DateTimes
    SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
    FROM   N,
           R;
    

    Then you can use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW

    WITH T1 AS
    ( SELECT  Server,
                      MIN(sampled) AS StartRange,
                      MAX(sampled) AS EndRange
             FROM     readings
             GROUP BY Server )
    SELECT      Server,
                sampled,
                Cnt
    FROM        T1
    CROSS APPLY
                ( SELECT   r.sampled,
                                    COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                          FROM      #DateTimes N
                          LEFT JOIN readings r
                          ON        r.sampled = N.datetime
                                    AND r.server = T1.server
                          WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
    WHERE       CA.sampled IS NOT NULL
    ORDER BY    sampled
    

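    The above assumes that there is at most one sample per minute and that all times are exact minutes. If that is not true, another table expression is needed to pre-aggregate by datetime rounded to the minute.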
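A minimal sketch of such a pre-aggregation step, assuming the readings table from the question. The CTE name PerMinute and the columns MinuteStart and SamplesInMinute are illustrative and not part of the original answer. Note that the expression truncates to the start of the minute rather than rounding to the nearest one.

```sql
-- Collapse the readings to one row per server and minute.
with PerMinute as
(
    select server,
           MinuteStart     = dateadd(minute, datediff(minute,0,sampled), 0),
           SamplesInMinute = count(*)
    from readings
    group by server,
             dateadd(minute, datediff(minute,0,sampled), 0)
)
select server, MinuteStart, SamplesInMinute
from PerMinute
order by server, MinuteStart;
```

Under this assumption the main query would join #DateTimes to PerMinute on MinuteStart and use SUM(SamplesInMinute) instead of COUNT(r.sampled) in the windowed aggregate. The counts are then per minute bucket, so they only approximate the exact 10 minute window of the original subquery.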

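    【Comments】:

      【Solution 3】:

      Thanks to Martin and SqlZim for your answers. I am going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I think it would lead to much simpler and more natural SQL: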

      select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

      We can already use expressions like this:

      select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

      So it would be great if we could simply reference a column in the current row.

      【Comments】:
