SQL sum of multiple groups per group: answers

【Question title】: SQL sum of multiple groups per group
【Posted】: 2012-11-17 18:58:20
【Question description】:

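There was a fairly big mistake in my previous question.

The answer from horse_with_no_name returned a perfect result, for which I am very grateful, but my original question was wrong, and I am sorry about that. Consider the following table: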

circuit_uid | customer_name | rack_location | reading_date | reading_time | amps | volts  | kw   | kwh | kva  | pf   | key
------------+---------------+---------------+--------------+--------------+------+--------+------+-----+------+------+------
cu1.cb1.r1  | customer 1    | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 | 87  | 1.03 | 0.85 | 15
cu1.cb1.r1  | customer 1    | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 | 90  | 0.96 | 0.84 | 16
cu1.cb1.r2  | customer 1    | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 | 21  | 1.03 | 0.85 | 15
cu1.cb1.r2  | customer 1    | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 | 23  | 0.96 | 0.84 | 16
cu1.cb1.s2  | customer 2    | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2  | customer 2    | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 | 228.4  | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3  | customer 2    | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3  | customer 2    | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 | 228.4  | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1  | customer 3    | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003
cu1.cb1.r1  | customer 3    | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4  | customer 3    | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003
cu1.cb1.r4  | customer 3    | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004

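As you can see, each customer now has multiple circuits. The result should therefore be the sum, per customer, of the earliest kwh reading of each circuit, so for this table the result would be: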

customer_name | kwh(sum)
--------------+-----------
customer 1    | 108      (the result of 87 + 21)  
customer 2    | 300      (the result of 179 + 121)  
customer 3    | 438      (the result of 223 + 215)

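Each customer will have more than 2 circuits, and the readings may be taken at different times, hence the need for the "earliest" reading.

Does anyone have a suggestion for the revised question?

PostgreSQL 8.4 on CentOS/Red Hat.

【Question comments】: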

Tags: sql postgresql aggregate-functions greatest-n-per-group

【Solution 1】:

SELECT customer_name, sum(kwh) AS kwh_total
FROM  (
    SELECT DISTINCT ON (customer_name, circuit_uid)
           customer_name, circuit_uid, kwh
    FROM   readings
    WHERE  reading_date = '2012-01-02'::date
    ORDER  BY customer_name, circuit_uid, reading_time
    ) x
GROUP  BY 1

Same as before, just pick the earliest reading per (customer_name, circuit_uid).
Then sum per customer_name.
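
To try the query above against the sample data from the question, a minimal setup might look like the following. This is only a sketch: the table name readings comes from the query, but the column types are assumptions, and only the columns the query actually reads are included.

```sql
-- Hypothetical minimal schema: only the columns the query reads.
-- Column types are assumptions, not taken from the question.
CREATE TABLE readings (
    circuit_uid   text,
    customer_name text,
    reading_date  date,
    reading_time  time,
    kwh           integer
);

-- Sample rows copied from the table in the question.
INSERT INTO readings (circuit_uid, customer_name, reading_date, reading_time, kwh) VALUES
    ('cu1.cb1.r1', 'customer 1', '2012-01-02', '00:01:01',  87),
    ('cu1.cb1.r1', 'customer 1', '2012-01-02', '01:01:01',  90),
    ('cu1.cb1.r2', 'customer 1', '2012-01-02', '00:01:01',  21),
    ('cu1.cb1.r2', 'customer 1', '2012-01-02', '01:01:01',  23),
    ('cu1.cb1.s2', 'customer 2', '2012-01-02', '00:01:01', 179),
    ('cu1.cb1.s2', 'customer 2', '2012-01-02', '01:01:01', 182),
    ('cu1.cb1.s3', 'customer 2', '2012-01-02', '00:01:01', 121),
    ('cu1.cb1.s3', 'customer 2', '2012-01-02', '01:01:01', 124),
    ('cu1.cb1.r1', 'customer 3', '2012-01-02', '00:01:01', 223),
    ('cu1.cb1.r1', 'customer 3', '2012-01-02', '01:01:01', 226),
    ('cu1.cb1.r4', 'customer 3', '2012-01-02', '00:01:01', 215),
    ('cu1.cb1.r4', 'customer 3', '2012-01-02', '01:01:01', 217);
```

With this data the query should return the sums listed in the question (108, 300 and 438). Note that circuit cu1.cb1.r1 appears under both customer 1 and customer 3, which is why customer_name has to be part of the DISTINCT ON expression.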

Index

A multi-column index like the following would make this very fast:

CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
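
One way to check whether the planner actually picks up the index is to run the query under EXPLAIN ANALYZE. The sketch below assumes the readings table and the index above already exist. No plan output is shown because it depends on the data volume and the server version, and on a very small table the planner may still prefer a sequential scan.

```sql
-- Refresh planner statistics so the new index is costed realistically.
ANALYZE readings;

-- Execute the query and report the chosen plan and the total runtime.
EXPLAIN ANALYZE
SELECT customer_name, sum(kwh) AS kwh_total
FROM  (
    SELECT DISTINCT ON (customer_name, circuit_uid)
           customer_name, circuit_uid, kwh
    FROM   readings
    WHERE  reading_date = '2012-01-02'::date
    ORDER  BY customer_name, circuit_uid, reading_time
    ) x
GROUP  BY 1;
```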

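【Comments】:

@AlanEnnis: Could you run a quick test with EXPLAIN ANALYZE to see which one is faster? Would be interesting.
Erwin's total runtime: 21.058 ms (8 rows); horse's total runtime: 20.623 ms (10 rows). The date in question has 432 rows in total. You are both great, thank you.
@AlanEnnis: Thanks for the feedback. The DISTINCT ON version probably has more sorting overhead, which is not needed in this case. For simple cases where sorted output is needed, this variant is usually faster. But for a small set like yours it hardly matters, unless you call the query many times. Also: with the index I added to my answer, this will be faster than anything else.
Erwin, after creating the index the results are total runtime: 9.325 ms (10 rows) for your query and total runtime: 9.523 ms (12 rows) for horse's query. In production the number of rows per date will be 20 or 30 times higher, so it will make a difference. Thanks for the index as well; I will add it to the production database.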

【Solution 2】:

This is an extension of your original question:

select customer_name,
       sum(kwh)
from (
   select customer_name,
          kwh,
          reading_time,
          reading_date,
          row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
   from readings
   where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name

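Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question), which now computes the first reading per circuit_uid (rather than the first reading per customer).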
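
To see what the changed partition by does, it can help to run the inner query on its own and look at the rn column. This is just the subquery from above with circuit_uid added to the select list and an ORDER BY added for readability; it assumes the same readings table.

```sql
-- Number the readings of each circuit (per customer) by time of day.
-- The outer query keeps only the rows where rn = 1.
SELECT customer_name,
       circuit_uid,
       reading_time,
       kwh,
       row_number() OVER (PARTITION BY customer_name, circuit_uid
                          ORDER BY reading_time) AS rn
FROM   readings
WHERE  reading_date = date '2012-01-02'
ORDER  BY customer_name, circuit_uid, rn;
```

For the sample data in the question, the rows with rn = 1 are the 00:01:01 readings of each circuit, and those are the values that the outer sum() adds up.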

【Comments】: