【Question title】: generate_series() not working as expected with sum in PostgreSQL
【Posted】: 2013-04-04 12:50:39
【Question】:

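I have a table named classifications that contains a classification_indicator_id column.
I need to sum these IDs and put them into a series with one row per day.
I need to add about 20 such columns (each for another classification_indicator_id).
I slightly modified the answer from my previous question: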

select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from 
  generate_series(
    '2013-03-25'::timestamp without time zone,
    '2013-04-01'::timestamp without time zone,
    '1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"

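But it still doesn't work correctly. The sum in every row is far too large, and it grows as I add more columns. In the second table (4 columns), Segment1 for 2013-03-26 should have the same value as in the first table, and likewise for the other columns.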

With 3 columns:

data       | Segment1 | Segment2
-----------+----------+---------
2013-03-25 | 12       | 16
2013-03-26 | 18       | 24

With 4 columns:

data       | Segment1 | Segment2 | Segment3
-----------+----------+----------+---------
2013-03-25 | 12       | 16       | 20
2013-03-26 | 108      | 144      | 180
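To see the inflation without a PostgreSQL server, here is a minimal reproduction sketch. It is an approximation under assumptions: it uses Python's built-in sqlite3 module, a recursive CTE instead of generate_series() (which SQLite does not provide by default), created stored as ISO date text, and hypothetical sample rows; make_conn and NAIVE_SQL are illustrative names. The join-then-aggregate shape matches the query above.

```python
import sqlite3

# Hypothetical sample data for 2013-03-26: 2 rows with id 3, 3 with id 4, 2 with id 5.
ROWS = [("2013-03-26", 3)] * 2 + [("2013-03-26", 4)] * 3 + [("2013-03-26", 5)] * 2


def make_conn(rows):
    """Build an in-memory classifications table (created stored as ISO date text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
    )
    conn.executemany("INSERT INTO classifications VALUES (?, ?)", rows)
    return conn


# Same shape as the question's query: three LEFT JOINs first, aggregation afterwards.
# The recursive CTE stands in for generate_series() and yields 2013-03-25 .. 2013-04-01.
NAIVE_SQL = """
WITH RECURSIVE data(d) AS (
    SELECT '2013-03-25'
    UNION ALL
    SELECT date(d, '+1 day') FROM data WHERE d < '2013-04-01'
)
SELECT data.d,
       sum(c.classification_indicator_id),
       sum(c4.classification_indicator_id),
       sum(c5.classification_indicator_id)
FROM data
LEFT JOIN classifications c  ON data.d = c.created  AND c.classification_indicator_id = 3
LEFT JOIN classifications c4 ON data.d = c4.created AND c4.classification_indicator_id = 4
LEFT JOIN classifications c5 ON data.d = c5.created AND c5.classification_indicator_id = 5
GROUP BY data.d
ORDER BY data.d
"""

result = make_conn(ROWS).execute(NAIVE_SQL).fetchall()
# The intended sums for 2013-03-26 are 6, 12, 10; the joined query inflates them.
print(result[1])  # ('2013-03-26', 36, 48, 60)
```

With these rows, every id-3 row appears 3 × 2 = 6 times in the joined result, every id-4 row 2 × 2 = 4 times and every id-5 row 2 × 3 = 6 times, so each sum is multiplied by the row counts of the other joins.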

【Question discussion】:

    Tags: sql postgresql join sum generate-series


    【Solution 1】:

    As commented under your previous answer, you have run into a "proxy cross join".
    I explain it in more detail in this related answer:
    Two SQL LEFT JOINS produce incorrect result
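Put as arithmetic: let n_3, n_4 and n_5 be the number of classifications rows on a given day with classification_indicator_id 3, 4 and 5 (notation introduced here for illustration). Each LEFT JOIN pairs every row produced so far with every matching row of the next alias, or with a single all-NULL row if there is none, so the day's group contains max(n_3, 1) · max(n_4, 1) · max(n_5, 1) rows. For a segment whose own count is nonzero, the question's query therefore returns:

```latex
\begin{aligned}
\text{Segment1} &= 3\,n_3 \cdot \max(n_4,1) \cdot \max(n_5,1) && \text{(intended: } 3\,n_3\text{)}\\
\text{Segment2} &= 4\,n_4 \cdot \max(n_3,1) \cdot \max(n_5,1) && \text{(intended: } 4\,n_4\text{)}\\
\text{Segment3} &= 5\,n_5 \cdot \max(n_3,1) \cdot \max(n_4,1) && \text{(intended: } 5\,n_5\text{)}
\end{aligned}
```

Applied to the question's numbers: adding the third join takes Segment1 for 2013-03-26 from 18 to 108 and Segment2 from 24 to 144, a factor of 6 in both, which under this model would mean six rows with classification_indicator_id = 5 on that day.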

    Your query should look like this:

    SELECT d.created AS data
          ,c3.segment1
          ,c4.segment2
          ,c5.segment3
    FROM (
       SELECT generate_series('2013-03-25'::date
                             ,'2013-04-01'::date
                             ,interval '1 day')::date AS created
        ) d
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment1
        FROM   classifications
        WHERE  classification_indicator_id = 3
        GROUP  BY 1
        ) c3 USING (created)
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment2
        FROM   classifications
        WHERE  classification_indicator_id = 4
        GROUP  BY 1
        ) c4 USING (created)
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment3
        FROM   classifications
        WHERE  classification_indicator_id = 5
        GROUP  BY 1
        ) c5 USING (created)
    ORDER  BY 1;
    

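    This assumes created is of type date, not timestamp. (For a timestamp column, each subquery would need to select created::date AS created instead.)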
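The aggregate-first approach can be checked the same way. A minimal sketch, assuming Python's sqlite3 module as a stand-in for PostgreSQL, a recursive CTE in place of generate_series(), created stored as ISO date text, and hypothetical sample rows for 2013-03-26 (2 with id 3, 3 with id 4, 2 with id 5). The joins use explicit ON conditions instead of USING, which gives the same result here.

```python
import sqlite3

# Hypothetical sample data for 2013-03-26: 2 rows with id 3, 3 with id 4, 2 with id 5.
ROWS = [("2013-03-26", 3)] * 2 + [("2013-03-26", 4)] * 3 + [("2013-03-26", 5)] * 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
conn.executemany("INSERT INTO classifications VALUES (?, ?)", ROWS)

# Aggregate per day and indicator FIRST, so each derived table has at most one
# row per day; the LEFT JOINs then cannot multiply rows.
SQL = """
WITH RECURSIVE d(created) AS (
    SELECT '2013-03-25'
    UNION ALL
    SELECT date(created, '+1 day') FROM d WHERE created < '2013-04-01'
)
SELECT d.created, c3.segment1, c4.segment2, c5.segment3
FROM d
LEFT JOIN (SELECT created, sum(classification_indicator_id) AS segment1
           FROM classifications WHERE classification_indicator_id = 3
           GROUP BY created) c3 ON c3.created = d.created
LEFT JOIN (SELECT created, sum(classification_indicator_id) AS segment2
           FROM classifications WHERE classification_indicator_id = 4
           GROUP BY created) c4 ON c4.created = d.created
LEFT JOIN (SELECT created, sum(classification_indicator_id) AS segment3
           FROM classifications WHERE classification_indicator_id = 5
           GROUP BY created) c5 ON c5.created = d.created
ORDER BY d.created
"""

result = conn.execute(SQL).fetchall()
print(result[1])  # ('2013-03-26', 6, 12, 10)
```

Days without any matching rows come back as NULL (None in Python) for every segment, because the derived tables simply have no row for them.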

    Or, since performance has become a topic, a faster query:

    SELECT d.created AS data
          ,count(classification_indicator_id = 3 OR NULL)::int * 3 AS segment1
          ,count(classification_indicator_id = 4 OR NULL)::int * 4 AS segment2
          ,count(classification_indicator_id = 5 OR NULL)::int * 5 AS segment3
    FROM (
       SELECT generate_series('2013-03-25'::date
                             ,'2013-04-01'::date
                             ,interval '1 day')::date AS created
        ) d
    LEFT   JOIN classifications c USING (created)
    GROUP  BY 1
    ORDER  BY 1;
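A minimal sketch of the conditional-count variant, again assuming Python's sqlite3 module, a recursive CTE instead of generate_series(), created stored as ISO date text, and hypothetical sample rows for 2013-03-26 (2 with id 3, 3 with id 4, 2 with id 5). The single join is written with an explicit ON condition instead of USING, and the PostgreSQL ::int cast is dropped because SQLite's count() already returns an integer.

```python
import sqlite3

# Hypothetical sample data for 2013-03-26: 2 rows with id 3, 3 with id 4, 2 with id 5.
ROWS = [("2013-03-26", 3)] * 2 + [("2013-03-26", 4)] * 3 + [("2013-03-26", 5)] * 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
conn.executemany("INSERT INTO classifications VALUES (?, ?)", ROWS)

# A single LEFT JOIN; each segment is a conditional count times its id.
# "x = 3 OR NULL" is true when x = 3 and NULL otherwise, and count() ignores NULLs.
SQL = """
WITH RECURSIVE d(created) AS (
    SELECT '2013-03-25'
    UNION ALL
    SELECT date(created, '+1 day') FROM d WHERE created < '2013-04-01'
)
SELECT d.created,
       count(classification_indicator_id = 3 OR NULL) * 3 AS segment1,
       count(classification_indicator_id = 4 OR NULL) * 4 AS segment2,
       count(classification_indicator_id = 5 OR NULL) * 5 AS segment3
FROM d
LEFT JOIN classifications c ON c.created = d.created
GROUP BY d.created
ORDER BY d.created
"""

result = conn.execute(SQL).fetchall()
print(result[0])  # ('2013-03-25', 0, 0, 0): count() yields 0, not NULL, on empty days
print(result[1])  # ('2013-03-26', 6, 12, 10)
```

Multiplying the count by the id works only because every row counted in a segment carries that same id, so sum(id) equals count × id. Unlike the aggregate-first query, a day with no matching rows yields 0 instead of NULL.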
    

    【Discussion】:

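    • Thanks, we are testing this solution now and I'll let you know whether it helps. But we think this is it :)
    • We will use both solutions; they are both great. From 10000 ms down to 100 ms, that's it!
    • @infaustus: Cross joins can get very expensive. Since performance has become a topic, I added a variant that might be faster. Yes, might. The proof of the pudding is in the testing. ;)
    • I like casting the result of generate_series(...)::date directly. That will clean up a lot of code.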
    【Solution 2】:

    No need for multiple joins:

    select
        data.d::date as "data",
        sum((classification_indicator_id = 3)::integer * classification_indicator_id)::integer as "Segment1",
        sum((classification_indicator_id = 4)::integer * classification_indicator_id)::integer as "Segment2",
        sum((classification_indicator_id = 5)::integer * classification_indicator_id)::integer as "Segment3"
    from 
        generate_series(
            '2013-03-25'::timestamp without time zone,
            '2013-04-01'::timestamp without time zone,
            '1 day'::interval
        ) data(d)
        left join
        classifications c on data.d::date = c.created::date
    group by "data"
    ORDER BY "data"
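The single-join idea can be checked in the same way. A minimal sketch, assuming Python's sqlite3 module, a recursive CTE instead of generate_series(), ISO date text for created, and hypothetical rows (2013-03-26 has 2 rows with id 3, 3 with id 4, 2 with id 5; 2013-03-27 has a single row with id 4). It compares the boolean-multiplication filter used here with a CASE expression, as mentioned in the comments; the helper names run, MUL_EXPR and CASE_EXPR are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: 2013-03-26 has 2 rows with id 3, 3 with id 4, 2 with id 5;
# 2013-03-27 has a single row with id 4 and nothing else.
ROWS = (
    [("2013-03-26", 3)] * 2
    + [("2013-03-26", 4)] * 3
    + [("2013-03-26", 5)] * 2
    + [("2013-03-27", 4)]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classifications (created TEXT, classification_indicator_id INTEGER)"
)
conn.executemany("INSERT INTO classifications VALUES (?, ?)", ROWS)

# Two ways to keep only one indicator inside sum(); {k} is the indicator id.
# In SQLite a comparison already evaluates to 0 or 1, so no ::integer cast is needed.
MUL_EXPR = "sum((classification_indicator_id = {k}) * classification_indicator_id)"
CASE_EXPR = "sum(CASE WHEN classification_indicator_id = {k} THEN classification_indicator_id END)"


def run(expr):
    """Run the single-join query with one column per indicator id (3, 4, 5)."""
    cols = ", ".join(expr.format(k=k) for k in (3, 4, 5))
    sql = f"""
    WITH RECURSIVE data(d) AS (
        SELECT '2013-03-25'
        UNION ALL
        SELECT date(d, '+1 day') FROM data WHERE d < '2013-04-01'
    )
    SELECT data.d, {cols}
    FROM data
    LEFT JOIN classifications c ON data.d = c.created
    GROUP BY data.d
    ORDER BY data.d
    """
    return conn.execute(sql).fetchall()


mul = run(MUL_EXPR)
case = run(CASE_EXPR)
print(mul[1], case[1])  # both ('2013-03-26', 6, 12, 10)
print(mul[2], case[2])  # ('2013-03-27', 0, 4, 0) vs ('2013-03-27', None, 4, None)
```

Both forms give the same sums when a day has at least one matching row. They differ on a day that has rows but none for a given id: the multiplication form sums zeros and returns 0, while the CASE form sums only NULLs and returns NULL.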
    

    【Discussion】:

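    • This might be faster than multiple joins. CASE would be faster.
    • @Erwin might??? You must be joking. Or my English is shaky and I don't understand what "might" means :))
    • I'll test this one too and let you know.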
    猜你喜欢
    • 2023-03-14
    • 1970-01-01
    • 2018-06-22
    • 2018-12-31
    • 2011-11-19
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2013-05-01
    相关资源
    最近更新 更多