【Question title】: Always get latest record for a user when grouping: SQL
【Posted】: 2018-07-23 01:59:19
【Question】:

I have a postgres table that looks like this:

user_id date          val
1       2015-01-01    1
2       2015-01-01    2
1       2015-01-30    7
3       2015-02-01    1
3       2015-02-05    7
3       2015-02-12    3
4       2015-02-10    1
4       2015-02-11    2

I want to get the sum of vals grouped by month, such that only each user's latest value within the month is counted.

Expected output:

date         sum
2015-01-01   9
2015-02-01   5

I would like a flexible approach that lets the same code aggregate in different ways. So if I decide to group by user_id instead:

user_id   sum
1         7
3         3
4         2

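I can think of some complicated SQL joins based on max and so on, but I wonder whether there is something more elegant?

【Comments】:

  • It is the latest value, not the maximum value. So for February the latest value for user_id = 3 is 3, and the latest value for user_id = 4 is 2.

Tags: sql postgresql group-by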


【Solution 1】:

Sum of each user's maximum value per month:

-- sum each user's per-month maximum, grouping by month only
select d, sum(maxV) as sumMaxV
from
(
    SELECT DISTINCT  -- the window value repeats on every row of a partition, so collapse duplicates
    date_trunc('month',dated) as d,
    user_id,         -- keep user_id so two users with equal values are not merged by DISTINCT
    first_value(val) OVER ( -- with ORDER BY val DESC the first row of each partition holds the maximum
        PARTITION BY date_trunc('month',dated), user_id
        ORDER BY val DESC) as maxV
    FROM T
) tmp
group by d

Result:

d                      sumMaxV
2015-01-01T00:00:00Z      9
2015-02-01T00:00:00Z      9

Sum of each user's last value per month:

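-- sum each user's latest per-month value, grouping by month only
select d, sum(lastV) as sumLastV
from
(
    SELECT DISTINCT  -- the window value repeats on every row of a partition, so collapse duplicates
    date_trunc('month',dated) as d,
    user_id,         -- keep user_id so two users with equal values are not merged by DISTINCT
    first_value(val) OVER ( -- with ORDER BY dated DESC the first row of each partition is the newest
        PARTITION BY date_trunc('month',dated), user_id
        ORDER BY dated DESC) as lastV
    FROM T
) tmp
group by d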

Output:

d                      sumlastv
2015-01-01T00:00:00Z      9
2015-02-01T00:00:00Z      5

Data:

CREATE TABLE T  ("user_id" int, "dated" timestamp, "val" int);    
INSERT INTO T   ("user_id", "dated", "val")
VALUES          (1, '2015-01-01 00:00:00', 1),
                (2, '2015-01-01 00:00:00', 2),
                (1, '2015-01-30 00:00:00', 7),
                (3, '2015-02-01 00:00:00', 1),
                (3, '2015-02-05 00:00:00', 7),
                (3, '2015-02-12 00:00:00', 3),
                (4, '2015-02-10 00:00:00', 1),
                (4, '2015-02-11 00:00:00', 2);
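
As a possible alternative to the first_value / DISTINCT combination, PostgreSQL's DISTINCT ON can pick the newest row per (month, user) directly. This is only a sketch against the table T and sample rows defined above; the aliases latest and sum_last_v are made up for illustration.

```sql
-- Sketch: keep one row per (month, user), the one with the newest "dated",
-- then sum those rows per month. Assumes table T as created above.
SELECT d, sum(val) AS sum_last_v
FROM (
    SELECT DISTINCT ON (date_trunc('month', dated), user_id)
           date_trunc('month', dated) AS d,
           user_id,
           val
    FROM T
    -- the leading ORDER BY expressions must match DISTINCT ON;
    -- "dated DESC" decides which row of each group survives
    ORDER BY date_trunc('month', dated), user_id, dated DESC
) latest
GROUP BY d
ORDER BY d;
```

With the sample data this should give the same totals as the query above (9 for January, 5 for February). Replacing d with user_id in the outer SELECT and GROUP BY would aggregate the same inner rows per user instead.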

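【Comments】:

  • Thanks, but I want to group by the latest value, not the maximum.
  • @eljusticiero67 I have added the sum of the last value per month as well.
  • I also switched to date_trunc instead of substr(cast (dated as varchar(20)) ,1 , 7), which should help with index usage.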
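
On the index remark: a plain index on dated would probably not be used for an expression such as date_trunc('month', dated), so an expression index is one way to make that remark hold. The following is a hedged sketch only; the index name is hypothetical, and it relies on dated being timestamp without time zone (as in table T), because date_trunc on timestamptz is not immutable and cannot be indexed this way.

```sql
-- Sketch: expression index matching the PARTITION BY / ORDER BY used above.
-- t_month_user_dated_idx is a made-up name; T is the table from the answer.
CREATE INDEX t_month_user_dated_idx
    ON T (date_trunc('month', dated), user_id, dated DESC);
```

Whether the planner actually picks this index depends on table size and statistics, so checking with EXPLAIN is advisable.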

【Solution 2】:

The following lets you group by either user or date (inspired by @Patrick Artner's solution):

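-- split date to year, month; keep the full date for ordering
with dd as(
select user_id, dates, extract(year from dates) as yyyy,
      extract(month from dates) as mm, val
from mytable),
-- get the latest value per user_id, year and month
aggs as(
select distinct user_id, yyyy, mm,
  -- order by the date itself and widen the frame to the whole partition,
  -- otherwise last_value does not reliably return the newest row
  last_value(val) OVER (PARTITION BY user_id, yyyy, mm ORDER BY dates
                        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as latest
from dd
)
-- group by either user_id or date: swap user_id for the commented-out expression
select user_id, -- concat(yyyy, '-',mm, '-01')::date,
sum(latest) as total
from aggs
group by 1;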
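
For the by-month grouping, a variant based on row_number() may be easier to reason about than last_value, since it does not depend on the window frame. This is a sketch that reuses this answer's assumed names (mytable, dates); ranked, rn and first_of_month are illustrative names, not part of the original.

```sql
-- Sketch: number each user's rows within a month from newest to oldest,
-- keep only the newest (rn = 1), then sum per month.
WITH ranked AS (
    SELECT user_id,
           date_trunc('month', dates) AS month_start,
           val,
           row_number() OVER (
               PARTITION BY user_id, date_trunc('month', dates)
               ORDER BY dates DESC
           ) AS rn
    FROM mytable
)
SELECT month_start::date AS first_of_month, sum(val) AS total
FROM ranked
WHERE rn = 1
GROUP BY month_start
ORDER BY month_start;
```

Grouping by user_id instead of month_start in the final SELECT would sum each user's monthly latest values across all months.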

【Comments】:
