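【Question title】: In PostgreSQL, how to count runs in a sequence across repeating partitions?
【Posted】: 2018-08-10 03:35:37
【Question description】:

I have a user who completes a series of events. I want to record how many times, and in what order, they completed each event.

So for the following table, user_events: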

name  eventname  time    
Ted   a          12:01
Ted   b          12:02
Ted   b          12:03
Ted   b          12:04
Ted   c          12:05
Ted   b          12:06
Ted   b          12:07
Ted   c          12:08
Ted   b          12:09
Ted   b          12:11
Ted   b          12:12

I should get:

name  eventname  event_sequence_number  time_started  frequency
Ted   a          1                      12:01         1
Ted   b          2                      12:02         3
Ted   c          3                      12:05         1
Ted   b          4                      12:06         2
Ted   c          5                      12:08         1
Ted   b          6                      12:09         3

I have been trying rank(), dense_rank(), row_number(), and lag(), but cannot put them together. Any ideas?

【Question discussion】:

    Tags: sql postgresql sequence aggregation gaps-and-islands


    【Solution 1】:

    Try this. It uses the Tabibitosan method (grouping sequence ranges).

    SQL Fiddle

    PostgreSQL 9.6 Schema Setup

    CREATE TABLE user_events
        (user_name varchar(3), eventname varchar(1), event_time time)
    ;
    
    INSERT INTO user_events
        (user_name, eventname, event_time)
    VALUES
        ('Ted', 'a', '12:01'),
        ('Ted', 'b', '12:02'),
        ('Ted', 'b', '12:03'),
        ('Ted', 'b', '12:04'),
        ('Ted', 'c', '12:05'),
        ('Ted', 'b', '12:06'),
        ('Ted', 'b', '12:07'),
        ('Ted', 'c', '12:08'),
        ('Ted', 'b', '12:09'),
        ('Ted', 'b', '12:11'),
        ('Ted', 'b', '12:12')
    ;
    

    Query 1

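    SELECT t.user_name
        ,t.eventname
        -- Number the runs per user in the order in which they started.
        ,row_number() OVER (
            PARTITION BY t.user_name
            ORDER BY MIN(t.event_time)
            ) AS event_sequence_number
        ,MIN(t.event_time) AS time_started
        ,COUNT(*) AS frequency
    FROM (
        SELECT user_name
            ,eventname
            ,event_time
            -- Position among all of the user's events minus position among
            -- the user's events of the same name: constant within one run.
            ,row_number() OVER (
                PARTITION BY user_name ORDER BY event_time
                ) - row_number() OVER (
                PARTITION BY user_name, eventname ORDER BY event_time
                ) AS seq
        FROM user_events
        ) t
    GROUP BY t.user_name
        ,t.eventname
        ,t.seq
    ORDER BY t.user_name
        ,time_started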
    

    Results

    | user_name | eventname | event_sequence_number | time_started | frequency |
    |-----------|-----------|-----------------------|--------------|-----------|
    |       Ted |         a |                     1 |     12:01:00 |         1 |
    |       Ted |         b |                     2 |     12:02:00 |         3 |
    |       Ted |         c |                     3 |     12:05:00 |         1 |
    |       Ted |         b |                     4 |     12:06:00 |         2 |
    |       Ted |         c |                     5 |     12:08:00 |         1 |
    |       Ted |         b |                     6 |     12:09:00 |         3 |
    
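To see why grouping by seq isolates each run, it can help to run the inner subquery on its own and look at the two row numbers side by side. The following is only a sketch: the CTE name sample_events and the column aliases rn_all and rn_event are made up for illustration, the data is inlined so the statement runs without the table, and the PARTITION BY user_name clauses are an assumption for data with more than one user.

```sql
-- sample_events is a hypothetical inline copy of the question's data.
WITH sample_events (user_name, eventname, event_time) AS (
    VALUES
        ('Ted', 'a', TIME '12:01'),  -- the typed first row makes the column a time
        ('Ted', 'b', '12:02'),
        ('Ted', 'b', '12:03'),
        ('Ted', 'b', '12:04'),
        ('Ted', 'c', '12:05'),
        ('Ted', 'b', '12:06'),
        ('Ted', 'b', '12:07'),
        ('Ted', 'c', '12:08'),
        ('Ted', 'b', '12:09'),
        ('Ted', 'b', '12:11'),
        ('Ted', 'b', '12:12')
)
SELECT eventname
    ,event_time
    -- position among all of the user's events
    ,row_number() OVER (PARTITION BY user_name ORDER BY event_time) AS rn_all
    -- position among the user's events with the same name
    ,row_number() OVER (PARTITION BY user_name, eventname ORDER BY event_time) AS rn_event
    -- the difference stays constant while the same event repeats
    ,row_number() OVER (PARTITION BY user_name ORDER BY event_time)
        - row_number() OVER (PARTITION BY user_name, eventname ORDER BY event_time) AS seq
FROM sample_events
ORDER BY event_time;

-- eventname | event_time | rn_all | rn_event | seq
-- a         | 12:01:00   |      1 |        1 |   0
-- b         | 12:02:00   |      2 |        1 |   1
-- b         | 12:03:00   |      3 |        2 |   1
-- b         | 12:04:00   |      4 |        3 |   1
-- c         | 12:05:00   |      5 |        1 |   4
-- b         | 12:06:00   |      6 |        4 |   2
-- b         | 12:07:00   |      7 |        5 |   2
-- c         | 12:08:00   |      8 |        2 |   6
-- b         | 12:09:00   |      9 |        6 |   3
-- b         | 12:11:00   |     10 |        7 |   3
-- b         | 12:12:00   |     11 |        8 |   3
```

Within a run both counters advance by one per row, so their difference does not change; when a different event interrupts the run, only rn_all keeps advancing and the difference jumps. The value of seq is only meaningful together with eventname, which is why Query 1 groups by both.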

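Since the question mentions lag(), here is a sketch of an alternative formulation, not the method used in the answer above: flag each row whose event differs from the previous one, then take a running sum of the flags as the run number. The names sample_events, flagged, numbered, and is_run_start are hypothetical, and the sketch assumes event_time is unique per user.

```sql
-- sample_events is a hypothetical inline copy of the question's data.
WITH sample_events (user_name, eventname, event_time) AS (
    VALUES
        ('Ted', 'a', TIME '12:01'),  -- the typed first row makes the column a time
        ('Ted', 'b', '12:02'),
        ('Ted', 'b', '12:03'),
        ('Ted', 'b', '12:04'),
        ('Ted', 'c', '12:05'),
        ('Ted', 'b', '12:06'),
        ('Ted', 'b', '12:07'),
        ('Ted', 'c', '12:08'),
        ('Ted', 'b', '12:09'),
        ('Ted', 'b', '12:11'),
        ('Ted', 'b', '12:12')
),
flagged AS (
    SELECT user_name
        ,eventname
        ,event_time
        -- 1 when this row starts a new run; IS DISTINCT FROM also treats
        -- the first row (where lag() returns NULL) as a run start
        ,CASE
            WHEN eventname IS DISTINCT FROM
                 lag(eventname) OVER (PARTITION BY user_name ORDER BY event_time)
            THEN 1
            ELSE 0
         END AS is_run_start
    FROM sample_events
),
numbered AS (
    SELECT user_name
        ,eventname
        ,event_time
        -- running count of run starts = sequence number of the current run
        ,sum(is_run_start) OVER (
            PARTITION BY user_name ORDER BY event_time
            ) AS event_sequence_number
    FROM flagged
)
SELECT user_name
    ,eventname
    ,event_sequence_number
    ,min(event_time) AS time_started
    ,count(*) AS frequency
FROM numbered
GROUP BY user_name
    ,eventname
    ,event_sequence_number
ORDER BY user_name
    ,event_sequence_number;
```

On the sample data this should produce the same six rows as the Results table above. The trade-off is one extra pass over the rows in exchange for a run number that is directly readable without the row-number subtraction.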
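    【Discussion】:

    • A nice piece of code. I ended up building something similar. Very useful for creating cleaner path analysis by aggregating the behavior of millions of online users.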