【Question title】: How do I make an aggregate on an integer with a grouped column, for which I only want some included?
【Posted】: 2020-04-17 18:54:43
【Question description】:

I have a table prices that holds all prices for some products:

CREATE TABLE prices (
  id INT,
  product_id INT, /*Foreign key*/
  created_at TIMESTAMP,
  price INT
);

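The first entry for a product_id is its initial sale price. If the product's price is later reduced, a new entry is added.

I would like to find the mean and total price change per day across all products.

Here is some sample data: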

INSERT INTO prices (id, product_id, created_at, price) VALUES (1, 1, '2020-01-01', 11000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (2, 2, '2020-01-01', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (3, 3, '2020-01-01', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (4, 4, '2020-01-01', 2000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (5, 1, '2020-01-02', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (6, 2, '2020-01-02', 2999);    
INSERT INTO prices (id, product_id, created_at, price) VALUES (7, 5, '2020-01-02', 2999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (8, 1, '2020-01-03', 8999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (9, 1, '2020-01-03 10:00:00', 7000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (10, 5, '2020-01-03', 4000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (11, 6, '2020-01-03', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (12, 3, '2020-01-03', 6999);

The expected result should be:

date       mean_price_change    total_price_change
2020-01-01 0                    0
2020-01-02 1000.5               2001
2020-01-03 1666                 4998

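Explanation:

  • The mean price reduction and the total on "2020-01-01" are 0, because all products are new on that date.
  • On "2020-01-02" the mean price change is (11000-9999 + 3999-2999)/2 = 1000.5, because product_id 1 and 2 were both reduced that day, to 9999 and 2999, from previous prices of 11000 and 3999. The total reduction is (11000-9999 + 3999-2999) = 2001.
  • On "2020-01-03" only product_id 1, 3 and 5 changed. Product 1 changed at two different times that day: 9999 => 8999 => 7000 (the last one is governing). Product 3 went from 9999 => 6999, and product 5 went up from 2999 => 4000. This gives a total of (9999-7000 + 9999-6999 + 2999-4000) = 4998 and a mean price reduction for that day of 1666.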
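
The arithmetic in the explanation can be checked with a few lines of Python. This is only a sketch: the (previous, governing) price pairs are copied by hand from the sample data, and the names changes_by_day and summary are made up for illustration.

```python
from statistics import mean

# (previous price, governing price of the day) for each product that changed,
# copied by hand from the sample data above.
changes_by_day = {
    "2020-01-01": [],                                          # every product is new
    "2020-01-02": [(11000, 9999), (3999, 2999)],               # products 1 and 2
    "2020-01-03": [(9999, 7000), (9999, 6999), (2999, 4000)],  # products 1, 3 and 5
}

summary = {}
for day, pairs in changes_by_day.items():
    drops = [prev - new for prev, new in pairs]
    # A day without any change counts as 0 for both figures.
    summary[day] = (mean(drops) if drops else 0, sum(drops))

for day, (mean_change, total_change) in summary.items():
    print(day, mean_change, total_change)
```

Note that a price increase (product 5 on 2020-01-03) enters the sum as a negative reduction.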

I have also added the data here: https://www.db-fiddle.com/f/tJgoKFMJxcyg5gLDZMEP77/1

I have tried playing around with DISTINCT ON, but that does not seem to do it...

【Comments】:

  • Having two prices for the same product on the same date confuses me.

Tags: sql postgresql business-intelligence


【Solution 1】:

You seem to want lag() and aggregation:

select created_at, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
      from prices p
     ) p
group by created_at
order by created_at;

You have two prices for product 1 on 2020-01-03. Once I fix that, I get the same results as in your question. Here is a dbfiddle.

EDIT:

To handle multiple prices per day:

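-- Days on which every product is new yield NULL rather than 0; wrap the
-- aggregates in coalesce(..., 0) if 0 is wanted.
select created_at::date, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
      -- keep only the last (governing) row per product and day;
      -- "created_at desc" makes DISTINCT ON pick that row deterministically
      from (select distinct on (product_id, created_at::date) p.*
            from prices p
            order by product_id, created_at::date, created_at desc
           ) p
     ) p
-- group by the date, not the full timestamp
group by created_at::date
order by created_at::date;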
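
The three steps behind this approach (reduce to one row per product and day, compare with the product's previous row, aggregate per day) can be traced in plain Python. This is a rough sketch rather than a translation of the query: it assumes, as the asker states in the comments, that the last timestamp of a day is the governing price, and the function name daily_price_changes is made up for illustration.

```python
from datetime import datetime

# Sample rows from the question: (id, product_id, created_at, price).
rows = [
    (1, 1, "2020-01-01", 11000), (2, 2, "2020-01-01", 3999),
    (3, 3, "2020-01-01", 9999), (4, 4, "2020-01-01", 2000),
    (5, 1, "2020-01-02", 9999), (6, 2, "2020-01-02", 2999),
    (7, 5, "2020-01-02", 2999), (8, 1, "2020-01-03", 8999),
    (9, 1, "2020-01-03 10:00:00", 7000), (10, 5, "2020-01-03", 4000),
    (11, 6, "2020-01-03", 3999), (12, 3, "2020-01-03", 6999),
]

def daily_price_changes(rows):
    # Step 1 (the DISTINCT ON step): keep only the last row per product and day.
    last_per_day = {}
    for _id, product_id, created_at, price in rows:
        ts = datetime.fromisoformat(created_at)
        key = (product_id, ts.date())
        if key not in last_per_day or ts > last_per_day[key][0]:
            last_per_day[key] = (ts, price)

    # Step 2 (the lag() step): walk each product's days in order and record
    # previous price minus governing price.
    drops = {}
    prev_price = {}
    for (product_id, day), (_ts, price) in sorted(last_per_day.items()):
        day_drops = drops.setdefault(day, [])  # stays empty if all products are new
        if product_id in prev_price:
            day_drops.append(prev_price[product_id] - price)
        prev_price[product_id] = price

    # Step 3 (the GROUP BY step): mean and total per day, 0 when nothing changed.
    return {
        day.isoformat(): (sum(d) / len(d) if d else 0, sum(d))
        for day, d in sorted(drops.items())
    }

result = daily_price_changes(rows)
for day, (mean_change, total_change) in result.items():
    print(day, mean_change, total_change)
```

Unlike avg() and sum() over NULLs in SQL, this sketch reports 0 for a day on which every product is new, which matches the expected result in the question.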

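【Comments】:

  • There can be multiple price changes on the same date. created_at is actually a timestamp, so you could say that the "last" price change (by timestamp) on a given date is the "governing" price change. Can you incorporate that?
  • Awesome answer, Gordon! @NielsKristian You can use another subquery to eliminate the irrelevant rows: dbfiddle.uk/…
  • @NielsKristian . . . Yes. You can keep just one record per day before using lag().
【Solution 2】:

Try this: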

select 
created_at, 
avg(change),
sum(change)
from
(
    with cte as 
    (
    select 
    id, 
    product_id,
    created_at,
    lag(created_at) over(order by product_id, created_at) as last_date,
    price
    from prices
    )
    select
    c.id,
    c.product_id,
    c.created_at,
    c.last_date,
    p.price as last_price,
    c.price,
    COALESCE(p.price - c.price,0) as change
    from cte c
    left join prices p on c.product_id =p.product_id  and c.last_date =p.created_at
  where p.price != c.price or p.price is null
) tmp
group by created_at
order by created_at

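【Comments】:

  • Sorry, I was not explicit enough about the type of the created_at column. It is actually a TIMESTAMP.
  • What's wrong with it?
【Solution 3】:

The query below tracks all price changes. Note that we join current (referred to as today in the query) and earlier rows on these conditions:

  • they belong to the same product
  • earlier really is earlier than current
  • earlier is the latest item before current's date
  • current is the latest item of its own date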

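-- today = the current row, earlier = the governing row of the previous change
select today.product_id,
       today.created_at,
       earlier.price - today.price as difference  -- positive = price drop
from prices today
join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
    select 1
    from prices later
    where later.product_id = today.product_id and
    (
     -- today is superseded by a later row on its own date
     ((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
     -- earlier is not the latest row before today's date
     ((later.created_at > earlier.created_at) and (later.created_at::date < today.created_at::date))
    )
);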
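
The four join conditions listed above can also be written as a predicate over pairs of rows, which makes it easier to see which pairs survive. The sketch below is illustrative only: it uses just the rows of product 1 from the sample data, the helper is_valid_pair is a made-up name, and it assumes no two rows of a product share the exact same timestamp.

```python
from datetime import datetime

# Rows of product 1 from the sample data: (id, product_id, created_at, price).
rows = [
    (1, 1, datetime(2020, 1, 1), 11000),
    (5, 1, datetime(2020, 1, 2), 9999),
    (8, 1, datetime(2020, 1, 3), 8999),
    (9, 1, datetime(2020, 1, 3, 10), 7000),
]

def is_valid_pair(current, earlier, rows):
    """Return True when (current, earlier) satisfies all four conditions."""
    _, c_prod, c_ts, _ = current
    _, e_prod, e_ts, _ = earlier
    # Same product, and earlier lies on a date before current's date.
    if c_prod != e_prod or e_ts.date() >= c_ts.date():
        return False
    for _, l_prod, l_ts, _ in rows:
        if l_prod != c_prod:
            continue
        # A later row on current's own date supersedes current.
        if l_ts.date() == c_ts.date() and l_ts > c_ts:
            return False
        # A row after earlier but before current's date supersedes earlier.
        if e_ts < l_ts and l_ts.date() < c_ts.date():
            return False
    return True

# (current id, earlier id, price drop) for every surviving pair.
pairs = [
    (c[0], e[0], e[3] - c[3])
    for c in rows
    for e in rows
    if is_valid_pair(c, e, rows)
]
print(pairs)
```

Row 8 never appears as current because row 9 supersedes it on the same date, and row 9 is paired with row 5 rather than row 1 because row 5 is the latest row before 2020-01-03.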

Now, let's do some aggregation:

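-- Products without an earlier row contribute NULL, which avg() and sum() ignore;
-- a day on which every product is new therefore yields NULL, not 0.
select today.created_at::date as day,
       avg(earlier.price - today.price) as mean,
       sum(earlier.price - today.price) as total
from prices today
left join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
    select 1
    from prices later
    where later.product_id = today.product_id and
    (
     -- today is superseded by a later row on its own date
     ((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
     -- earlier is not the latest row before today's date
     ((later.created_at > earlier.created_at) and (later.created_at::date < today.created_at::date))
    )
)
group by today.created_at::date
order by today.created_at::date;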

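【Comments】:

  • Hmm, I can't seem to get this to work? I have also updated the question with the correct type for created_at (timestamp). Sorry about that.
  • @NielsKristian It may not work, since it was never tested. Can you point out what the error is? Even better: can you create a SQL Fiddle?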