generate_series() not working as expected with sum in PostgreSQL

【Question title】: generate_series() not working as expected with sum in PostgreSQL
【Posted】: 2013-04-04 12:50:39
【Question】:

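I have a table named classifications that contains classification_indicator_id.
I need to sum this ID and put the result into a 1-day series.
I need to add about 20 columns (each for another classification_indicator_id).
I modified the answer from my previous question a little: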

select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from 
  generate_series(
    '2013-03-25'::timestamp without time zone,
    '2013-04-01'::timestamp without time zone,
    '1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"

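But it still does not work correctly. The sum in every row is too large, and it grows as I add more columns. In the second table (with 4 columns), Segment1 for 2013-03-26 should be the same as in the first table, and likewise for the other columns.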

 With 3 columns                     With 4 columns
data       | Segment1 | Segment2   data       | Segment1 | Segment2 | Segment3
--------------------------------   -------------------------------------------
2013-03-25 | 12       | 16         2013-03-25 | 12       | 16       | 20
--------------------------------   -------------------------------------------
2013-03-26 | 18       | 24         2013-03-26 | 108      | 144      | 180

【Comments】:

Tags: sql postgresql join sum generate-series

【Solution 1】:

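As commented under your previous answer, you are running into a "proxy cross join": each LEFT JOIN multiplies the rows already matched for the same day, so every sum is computed over the product of the per-segment row counts.
I explain it in more detail in this related answer:
Two SQL LEFT JOINS produce incorrect result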
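
The row multiplication can be reproduced on a tiny data set. The following is only a sketch: the sample rows are hypothetical (2 rows with ID 3, 3 rows with ID 4 and 6 rows with ID 5 on 2013-03-26), chosen so that the inflated totals happen to match the 2013-03-26 row reported in the question. The real data is not known. LATERAL needs PostgreSQL 9.3 or later.

```sql
-- Hypothetical sample data, not the asker's real table contents.
CREATE TEMP TABLE classifications (
    created                     date,
    classification_indicator_id integer
);

-- Insert "copies" rows per ID: 2 x ID 3, 3 x ID 4, 6 x ID 5.
INSERT INTO classifications (created, classification_indicator_id)
SELECT date '2013-03-26', v.id
FROM  (VALUES (3, 2), (4, 3), (5, 6)) AS v(id, copies)
CROSS  JOIN LATERAL generate_series(1, v.copies) AS g;

-- Joining the same table three times on the day pairs every ID-3 row
-- with every ID-4 row and every ID-5 row: 2 * 3 * 6 = 36 joined rows.
SELECT count(*)                            AS joined_rows  -- 36
      ,sum(c3.classification_indicator_id) AS segment1     -- 3 * 36 = 108, not 6
      ,sum(c4.classification_indicator_id) AS segment2     -- 4 * 36 = 144, not 12
      ,sum(c5.classification_indicator_id) AS segment3     -- 5 * 36 = 180, not 30
FROM   classifications c3
JOIN   classifications c4 ON c4.created = c3.created
                         AND c4.classification_indicator_id = 4
JOIN   classifications c5 ON c5.created = c3.created
                         AND c5.classification_indicator_id = 5
WHERE  c3.classification_indicator_id = 3;
```

Aggregating each segment down to one row per day before joining keeps the product at 1 * 1 * 1, which is why the sums stay correct in that form.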

Your query should look like this:

SELECT d.created AS data
      ,c3.segment1
      ,c4.segment2
      ,c5.segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment1
    FROM   classifications
    WHERE  classification_indicator_id = 3
    GROUP  BY 1
    ) c3 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment2
    FROM   classifications
    WHERE  classification_indicator_id = 4
    GROUP  BY 1
    ) c4 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment3
    FROM   classifications
    WHERE  classification_indicator_id = 5
    GROUP  BY 1
    ) c5 USING (created)
ORDER  BY 1;

This assumes that created is a date, not a timestamp.
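
If created were a timestamp instead, one possible adjustment is to cast it to date inside each subquery before grouping. The sketch below shows this for a single segment only. The table classifications_ts, the column alias created_day and the sample rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical table: like classifications, but created is a timestamp.
CREATE TEMP TABLE classifications_ts (
    created                     timestamp,
    classification_indicator_id integer
);

INSERT INTO classifications_ts VALUES
    ('2013-03-25 09:15', 3),
    ('2013-03-25 17:40', 3),
    ('2013-03-26 08:00', 3);

SELECT d.created_day AS data
      ,c3.segment1
FROM  (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created_day
   ) d
LEFT   JOIN (
   SELECT created::date AS created_day   -- reduce to the day before grouping
         ,sum(classification_indicator_id)::integer AS segment1
   FROM   classifications_ts
   WHERE  classification_indicator_id = 3
   GROUP  BY 1
   ) c3 USING (created_day)
ORDER  BY 1;
-- 2013-03-25 yields segment1 = 6, 2013-03-26 yields 3, the remaining days are NULL.
```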

Or, for a faster query, since performance has become a topic:

SELECT d.created AS data
      ,count(classification_indicator_id = 3 OR NULL)::int * 3 AS segment1
      ,count(classification_indicator_id = 4 OR NULL)::int * 4 AS segment2
      ,count(classification_indicator_id = 5 OR NULL)::int * 5 AS segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT   JOIN classifications c USING (created)
GROUP  BY 1
ORDER  BY 1;
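
The expression count(condition OR NULL) works because count() only counts non-null values: "true OR NULL" is true, while "false OR NULL" is NULL. A small self-contained check on inline values (the alias t and the column id are made up for this sketch) compares it with two equivalent spellings. The FILTER clause needs PostgreSQL 9.4 or later.

```sql
SELECT count(id = 3 OR NULL)              AS with_or_null  -- 2
      ,count(*) FILTER (WHERE id = 3)     AS with_filter   -- 2
      ,count(CASE WHEN id = 3 THEN 1 END) AS with_case     -- 2
FROM  (VALUES (3), (3), (4), (5)) AS t(id);
```

Multiplying the count by the constant ID, as the query above does, gives the same total as summing the ID over the matching rows.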

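【Comments】:

Thanks, we are testing this solution now, and I will let you know if it helps us. But we think this is it :)
We will use both solutions, they are both great. From 10000 ms down to 100 ms, that's it!
@infaustus: Cross joins can get very expensive. Since performance has become a topic, I added a variant that is probably faster. Yes, probably. The proof of the pudding is in the testing. ;)
I like the direct cast of the function call, generate_series(...)::date. This will clean up a lot of code.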

【Solution 2】:

There is no need for multiple joins:

select
    data.d::date as "data",
    sum((classification_indicator_id = 3)::integer * classification_indicator_id)::integer as "Segment1",
    sum((classification_indicator_id = 4)::integer * classification_indicator_id)::integer as "Segment2",
    sum((classification_indicator_id = 5)::integer * classification_indicator_id)::integer as "Segment3"
from 
    generate_series(
        '2013-03-25'::timestamp without time zone,
        '2013-04-01'::timestamp without time zone,
        '1 day'::interval
    ) data(d)
    left join
    classifications c on data.d::date = c.created::date
group by "data"
ORDER BY "data"
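
The cast (condition)::integer turns true into 1 and false into 0, so each row contributes its ID only when the condition holds. As a rough illustration on inline values (the alias t and the column id are hypothetical), here it is next to a CASE expression that computes the same total:

```sql
SELECT sum((id = 3)::integer * id)       AS via_cast  -- 3 + 3 + 0 + 0 = 6
      ,sum(CASE WHEN id = 3 THEN id END) AS via_case  -- 3 + 3 = 6, NULLs are ignored
FROM  (VALUES (3), (3), (4), (5)) AS t(id);
```

One difference to keep in mind: when rows exist for a day but none match, the cast form returns 0 while the CASE form returns NULL.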

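【Comments】:

This might be faster than multiple joins. CASE would be faster still.
@Erwin Might??? You must be kidding. Or my English is very shaky and I don't grasp the meaning of "might" :))
I will test this one too and let you know.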