【Question title】: PostgreSQL: getting daily, weekly, and monthly averages of the occurrences of an event in one query
【Posted】: 2016-11-08 16:03:47
【Question】:

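I currently have this fairly large query, which consists of:

  1. Aggregating the daily, weekly, and monthly counts into intermediate tables by taking the count() of events grouped by event name and date.
  2. Selecting the average count from each intermediate table with avg() grouped by event only, then combining the results with UNION. Because I want a separate column for each of daily, weekly, and monthly, I put a filler value of 0 into the otherwise empty columns.
  3. Summing all the columns, where the 0s basically act as a no-op, which gives me one value per event.

The query is large, though, and I feel I am doing a lot of repetitive work. Is there a better way to run this query, or a way to make it smaller? I have not really written queries like this before, so I am not sure.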

WITH monthly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('day', created_at)
),
query as (
  SELECT
    event,
    0 as daily_avg,
    0 as weekly_avg,
    avg(count) as monthly_avg
  FROM monthly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    0 as daily_avg,
    avg(count) as weekly_avg,
    0 as monthly_avg
  FROM weekly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    avg(count) as daily_avg,
    0 as weekly_avg,
    0 as monthly_avg
  FROM daily_counts
  GROUP BY event
)
SELECT
  event,
  sum(daily_avg) as daily_avg,
  sum(weekly_avg) as weekly_avg,
  sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;

【Comments】:

    Tags: sql postgresql query-optimization aggregate analytics


    【Solution 1】:

    I would write the query like this:

    select event, daily_avg, weekly_avg, monthly_avg
    from (
        select event, avg(count) monthly_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('month', created_at)
        ) s
        group by 1
    ) monthly
    join (
        select event, avg(count) weekly_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('week', created_at)
        ) s
        group by 1
    ) weekly using(event)
    join (
        select event, avg(count) daily_avg
        from (
            select event, count(*)
            from tracking_stuff
            where event in ('thing1', 'thing2', 'thing3')
            group by event, date_trunc('day', created_at)
        ) s
        group by 1
    ) daily using(event)
    order by 1;
    

    If the where condition eliminates a significant part of the data (say, more than half), using a CTE can speed up query execution slightly:

    with the_data as (
        select event, created_at
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        )
    
    select event, daily_avg, weekly_avg, monthly_avg
    from (
        select event, avg(count) monthly_avg
        from (
            select event, count(*)
            from the_data
            group by event, date_trunc('month', created_at)
        ) s
        group by 1
    ) monthly
    --  etc ... 
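
    For reference, here is one way the abbreviated CTE variant might look when written out in full. This is only a sketch that follows the same pattern as the first query; the alias cnt is introduced here for readability and is not part of the original answer.

```sql
-- Sketch: filter once in a CTE, then aggregate the filtered rows three times.
with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
    -- average number of events per month, per event
    select event, avg(cnt) as monthly_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by event
) monthly
join (
    -- average number of events per week, per event
    select event, avg(cnt) as weekly_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('week', created_at)
    ) s
    group by event
) weekly using (event)
join (
    -- average number of events per day, per event
    select event, avg(cnt) as daily_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('day', created_at)
    ) s
    group by event
) daily using (event)
order by event;
```

    Because the_data is referenced three times, PostgreSQL computes it once and reuses the result, which is where the saving comes from when the filter discards many rows.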
    

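    Out of curiosity, I ran a test on the following data:

    -- random_int() is not a built-in PostgreSQL function; a user-defined helper is assumed here.
    create table tracking_stuff (event text, created_at timestamp);
    insert into tracking_stuff
        select 'thing' || random_int(9), '2016-01-01'::date + random_int(365)
        from generate_series(1, 1000000);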
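
    The test setup calls random_int(), which PostgreSQL does not ship with, so the answer presumably relied on a user-defined helper. A minimal sketch of such a helper is shown below; the name matches the call above, but the definition and the 1..n range are assumptions, chosen so that random_int(9) yields nine distinct event names.

```sql
-- Hypothetical helper (not from the original answer):
-- returns a uniformly distributed random integer in the range 1..n.
create function random_int(n int) returns int
language sql volatile
as $$
    select 1 + floor(random() * n)::int
$$;

-- Quick sanity check: over many calls the values should stay within 1..9.
select min(random_int(9)) as lowest, max(random_int(9)) as highest
from generate_series(1, 1000);
```

    With this definition, three of the nine generated event names match the filter, which is consistent with the statement that the queries eliminate about 2/3 of the rows.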
    

    In each query I replaced thing with thing1, so the queries eliminate about 2/3 of the rows.

    Average execution time over 10 runs:

    Original query          1106 ms
    My query without cte    1077 ms
    My query with cte        902 ms
    Clodoaldo's query       5187 ms
    

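    【Comments】:

    • Just a really quick question, without having checked any facts: aren't joins more expensive than unions? And is there any reason not to use with, other than preference?
    • The difference between union and join should be imperceptible in this case. A similar remark may apply to using a CTE. I usually use with when I need recursion.
    • A CTE is an optimization fence for the planner. It may or may not make a difference.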
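    【Solution 2】:

    In PostgreSQL 9.5+, use grouping sets. From the documentation:

    The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned.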

    select event,
        avg(total) filter (where day is not null) as avg_day,
        avg(total) filter (where week is not null) as avg_week,
        avg(total) filter (where month is not null) as avg_month    
    from (
        select
            event,
            date_trunc('day', created_at) as day,
            date_trunc('week', created_at) as week,
            date_trunc('month', created_at) as month,
            count(*) as total
        from tracking_stuff
        where event in ('thing','thing2','thing3')
        group by grouping sets ((event, 2), (event, 3), (event, 4))
    ) s
    group by event
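
    To see why the filter clauses in the outer query work, it can help to look at what the inner grouping-sets query returns on its own. The sketch below runs it against four made-up rows (the sample CTE and its values are illustrative only) and spells out the grouping expressions instead of using column positions.

```sql
-- Sketch: inspect the rows produced by grouping sets on a tiny, made-up sample.
with sample(event, created_at) as (
    values
        ('thing', timestamp '2016-01-04 09:00'),
        ('thing', timestamp '2016-01-04 17:30'),
        ('thing', timestamp '2016-01-05 12:00'),
        ('thing', timestamp '2016-02-01 08:00')
)
select
    event,
    date_trunc('day', created_at)   as day,
    date_trunc('week', created_at)  as week,
    date_trunc('month', created_at) as month,
    count(*) as total
from sample
group by grouping sets (
    (event, date_trunc('day', created_at)),   -- one row per event and day
    (event, date_trunc('week', created_at)),  -- one row per event and week
    (event, date_trunc('month', created_at))  -- one row per event and month
)
order by day, week, month;
```

    Each output row belongs to exactly one grouping set, and the columns that are not part of that set come back as NULL. That is why `filter (where day is not null)` selects only the per-day counts, and likewise for week and month.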
    

    【Comments】:

    • That is a very interesting tip! Although my gut tells me this query should be fairly expensive.