Filter on window function according to hour: answers

【Question title】: Filter on window function according to hour
【Posted】: 2013-02-07 02:35:55
【Question description】:

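I want to use two different (but similar) window functions to compute two values, SUM and COUNT of is_active, over user_id + item, but only over rows up to the current row's time minus 1 hour. My instinct was to use ROWS UNBOUNDED PRECEDING, but then I cannot filter on time: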

COUNT(1) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)
SUM(is_active) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)

However, this does not take the "1 hour ago" cutoff into account.

Consider the following data:

user_id | req_time            | item | is_active
--------+---------------------+------+----------
1       | 2011-01-01 12:00:00 | 1    | 0
1       | 2011-01-01 12:30:00 | 1    | 1
1       | 2011-01-01 15:00:00 | 1    | 1
1       | 2011-01-01 16:00:00 | 1    | 0
1       | 2011-01-01 16:00:00 | 2    | 0
1       | 2011-01-01 16:20:00 | 2    | 1
2       | 2011-02-02 11:00:00 | 1    | 1
2       | 2011-02-02 13:00:00 | 1    | 0
1       | 2011-02-02 16:20:00 | 1    | 0
1       | 2011-02-02 16:30:00 | 2    | 0

I would like to get the following result, where "value 1" is SUM(is_active) and "value 2" is COUNT(1):

user_id | req_time            | item | value 1 | value 2
--------+---------------------+------+---------+--------
1       | 2011-01-01 12:00:00 | 1    | 0       | 0
1       | 2011-01-01 12:30:00 | 1    | 0       | 0
1       | 2011-01-01 15:00:00 | 1    | 1       | 2
1       | 2011-01-01 16:00:00 | 1    | 2       | 3
1       | 2011-01-01 16:00:00 | 2    | 0       | 0
1       | 2011-01-01 16:20:00 | 2    | 0       | 0
2       | 2011-02-02 11:00:00 | 1    | 0       | 0
2       | 2011-02-02 13:00:00 | 1    | 1       | 1
1       | 2011-02-02 16:20:00 | 1    | 2       | 4
1       | 2011-02-02 16:30:00 | 2    | 1       | 2
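The expected values above are consistent with one precise rule: for each row, aggregate over rows with the same user_id and item whose req_time is at or before the row's req_time minus one hour (the comparison is inclusive, as the 16:00 row for item 1 shows). The following is a minimal brute-force sketch of that rule in Python, intended only as a reference to check candidate queries against. The function name `lagged_totals` and the `ROWS` list are illustrative names, not part of the original post.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, req_time, item, is_active).
ROWS = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]


def lagged_totals(rows, lag=timedelta(hours=1)):
    """For each row, return (sum of is_active, row count) over the rows with
    the same user_id and item whose req_time is <= this row's req_time - lag."""
    parsed = [
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), item, active)
        for user, ts, item, active in rows
    ]
    results = []
    for user, ts, item, _ in parsed:
        cutoff = ts - lag
        # Brute force: scan every row and keep those inside the partition and cutoff.
        hits = [
            active
            for user2, ts2, item2, active in parsed
            if user2 == user and item2 == item and ts2 <= cutoff
        ]
        results.append((sum(hits), len(hits)))
    return results


if __name__ == "__main__":
    for row, (value1, value2) in zip(ROWS, lagged_totals(ROWS)):
        print(row[0], row[1], row[2], value1, value2)
```

The scan is quadratic in the number of rows, which is fine for a ten-row check but is not meant as a way to compute this at scale.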

I am using Greenplum 4.21, which is based on PostgreSQL 8.2.15.

Thanks in advance! Gilibi

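【Question discussion】:

PostgreSQL 8.2 does not have window functions. Have you tried your query?
PostgreSQL 8.2 has no window functions, but Greenplum 4.2 does. The query I added to the question runs perfectly, apart from the "1 hour ago" condition.

Tags: sql postgresql greenplum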

【Solution 1】:

I am not sure how to do this with window functions, at least not easily.

The simplest way I know is to use correlated subqueries in the select clause:

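select t.*,
       -- value1: sum of is_active over qualifying earlier rows (0 when there are none)
       (select coalesce(sum(t2.is_active), 0) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              -- <= so that a row exactly one hour earlier is included, as in the expected output
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value1,
       -- value2: number of qualifying earlier rows
       (select count(*) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value2
from t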

You can also do this without correlated subqueries. It is just a bit more cumbersome:

select t.user_id, t.req_time, t.item,
       coalesce(sum(t2.is_active), 0) as value1,
       count(t2.user_id) as value2  -- count(*) would report 1 for rows with no match
from t left outer join
     t t2
     on t.user_id = t2.user_id and
        t.item = t2.item and
        t2.req_time <= t.req_time - interval '1 hour'
group by t.user_id, t.req_time, t.item

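This may be more efficient than the correlated subquery version (since that one runs two correlated subqueries). It should also work in Greenplum. I did not realize that it lacks support for correlated subqueries; that is a major departure from ANSI.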
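A quick way to check the join form against the question's sample data is to run it in a scratch database. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Greenplum, so `datetime(req_time, '-1 hours')` replaces `interval '1 hour'` (an assumption of this sketch, not Greenplum syntax). It counts `t2.user_id` rather than `*`, because in a left join `count(*)` also counts the null-extended row produced when nothing matches, and it compares with `<=` so that the 16:00 row for item 1 picks up the 15:00 row, as in the expected output. The names `JOIN_SQL` and `run_join` are illustrative.

```python
import sqlite3

# Sample rows from the question: (user_id, req_time, item, is_active).
ROWS = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]

JOIN_SQL = """
SELECT t.user_id, t.req_time, t.item,
       COALESCE(SUM(t2.is_active), 0) AS value1,  -- 0 instead of NULL when nothing matches
       COUNT(t2.user_id)              AS value2   -- counts matched rows only
FROM t
LEFT JOIN t AS t2
       ON t2.user_id = t.user_id
      AND t2.item = t.item
      AND t2.req_time <= datetime(t.req_time, '-1 hours')  -- SQLite stand-in for interval '1 hour'
GROUP BY t.user_id, t.req_time, t.item
ORDER BY t.req_time, t.user_id, t.item
"""


def run_join(rows):
    """Load the rows into an in-memory SQLite table and run the self-join."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (user_id INTEGER, req_time TEXT, item INTEGER, is_active INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(JOIN_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run_join(ROWS):
        print(row)
```

Grouping by user_id, req_time and item assumes those three columns identify a row, which holds for the sample data; duplicate rows would be merged into one group.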

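【Discussion】:

This is where CROSS APPLY would shine, if PostgreSQL had it. I mean, you could run one subselect instead of two.
I am afraid Greenplum (and, I believe, most if not all of PostgreSQL) does not have correlated subqueries.
@gilibi PostgreSQL does have correlated subqueries.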

【Solution 2】:

8.3 at SQL Fiddle. This uses only one subselect.

select user_id, req_time, item, v[1] as value1, v[2] as value2
from (
    select t.*,
        (
            select array[
                coalesce(sum(is_active::integer), 0),
                count(*)
                ] as v
            from t s
            where
                user_id = t.user_id
                and item = t.item
                and req_time <= t.req_time - interval '1 hour'
        ) as v
    from t
) s
order by req_time, user_id, item
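For comparison, engines that support a RANGE frame with a value offset can express the question's original window idea directly: the frame ends one hour before the current row instead of at the current row. The sketch below demonstrates this with Python's sqlite3 module (assuming the bundled SQLite is 3.28 or later), ordering by epoch seconds and ending the frame at `3600 PRECEDING`. PostgreSQL 11 and later accept an interval offset in a RANGE frame in a similar way; whether Greenplum 4.2 accepts such a frame is not established in this thread. The names `WINDOW_SQL` and `run_window` are illustrative.

```python
import sqlite3

# Sample rows from the question: (user_id, req_time, item, is_active).
ROWS = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]

WINDOW_SQL = """
SELECT user_id, req_time, item,
       COALESCE(SUM(is_active) OVER w, 0) AS value1,  -- SUM over an empty frame is NULL
       COUNT(*) OVER w                    AS value2   -- COUNT over an empty frame is 0
FROM t
WINDOW w AS (
    PARTITION BY user_id, item
    ORDER BY CAST(strftime('%s', req_time) AS INTEGER)    -- epoch seconds
    RANGE BETWEEN UNBOUNDED PRECEDING AND 3600 PRECEDING  -- frame ends one hour before this row
)
ORDER BY req_time, user_id, item
"""


def run_window(rows):
    """Load the rows into an in-memory SQLite table and run the window query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (user_id INTEGER, req_time TEXT, item INTEGER, is_active INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(WINDOW_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run_window(ROWS):
        print(row)
```

The frame bound is inclusive, so a row exactly 3600 seconds earlier falls inside the frame, which matches the `<=` comparison in the subquery above.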

【Discussion】: