【Question title】: Speed up query where results with count(*) = 0 are included
【Posted】: 2020-08-12 15:53:54
【Question】:

I have a table squitters that includes a column parsed_time. I want to know the number of records per hour for the last two days, and used this query:

SELECT date_trunc('hour', parsed_time) AS hour , count(*) 
FROM squitters 
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day' 
GROUP BY hour 
ORDER BY hour DESC;

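This works, but hours with zero records do not appear in the result. I want hours with zero records to appear in the result as well, with a count of zero, so I wrote this query using the generate_series function: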

SELECT bins.hour, count(squitters.parsed_time)
FROM generate_series(date_trunc('hour', now() - interval '2 day'),  now(), '1 hour') bins(hour)
LEFT OUTER JOIN squitters ON bins.hour = date_trunc('hours', squitters.parsed_time) 
GROUP BY bins.hour
ORDER BY bins.hour DESC;

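This works, and the result includes the hour bins with a count of zero, but it is much slower.

How can I get the count = 0 rows of the second query at the speed of the first query?

(By the way, there is an index on parsed_time.)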

【Comments】:

    Tags: sql postgresql date group-by query-optimization


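    【Answer 1】:

    You could try changing the join condition so that no date function is applied to the parsed_time column, which may allow the index on parsed_time to be used for the range comparison: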

    SELECT b.hour, COUNT(s.parsed_time) cnt
    FROM generate_series(date_trunc('hour', now() - interval '2 day'),  now(), '1 hour') b(hour)
    LEFT OUTER JOIN squitters s
        ON  s.parsed_time >= b.hour
        AND s.parsed_time <  b.hour + interval '1 hour'
    GROUP BY b.hour
    ORDER BY b.hour DESC;
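
    To see whether the range condition actually lets the planner use the index on parsed_time, one option is to run the query under EXPLAIN. This is only a sketch; the plan you get depends on your data, table statistics, and PostgreSQL version.

    -- ANALYZE executes the query and reports actual timings; BUFFERS adds I/O counts
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT b.hour, COUNT(s.parsed_time) cnt
    FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
    LEFT OUTER JOIN squitters s
        ON  s.parsed_time >= b.hour
        AND s.parsed_time <  b.hour + interval '1 hour'
    GROUP BY b.hour
    ORDER BY b.hour DESC;

    Comparing this plan with the plan of the original date_trunc join shows whether squitters is still scanned in full or read through the index.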
    

    Alternatively, you could try a correlated subquery (or a lateral join) instead of the left join; this avoids the need for outer aggregation:

    SELECT 
        b.hour,
        (
            SELECT COUNT(*) 
            FROM squitters s 
            WHERE s.parsed_time >= b.hour AND s.parsed_time <  b.hour + interval '1 hour'
        ) cnt
    FROM generate_series(date_trunc('hour', now() - interval '2 day'),  now(), '1 hour') b(hour)
    ORDER BY b.hour DESC;
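
    The lateral-join form mentioned above could look like the following sketch. It assumes the same squitters table and parsed_time column as the question; the alias c is only illustrative.

    SELECT b.hour, c.cnt
    FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
    CROSS JOIN LATERAL (
        -- An aggregate without GROUP BY always returns exactly one row,
        -- so hours without records still appear, with cnt = 0
        SELECT COUNT(*) AS cnt
        FROM squitters s
        WHERE s.parsed_time >= b.hour
          AND s.parsed_time <  b.hour + interval '1 hour'
    ) c
    ORDER BY b.hour DESC;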
    

    【Comments】:

      【Answer 2】:

      You can use Common Table Expressions to break the problem into smaller pieces:

      WITH cte AS (
          -- First, count the rows per hour in the table
          SELECT date_trunc('hour', parsed_time) AS sq_hour, count(*) AS cnt
          FROM squitters
          WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
          GROUP BY sq_hour
      ), series AS (
          -- Then build the hour series, keeping only the hours missing from the first query
          SELECT
              bins.series_hour,
              0 AS cnt
          FROM
              generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(series_hour)
          WHERE
              bins.series_hour NOT IN (SELECT sq_hour FROM cte)
      )
      -- Combine both results; the two sets are disjoint, so UNION ALL is sufficient
      SELECT * FROM cte
      UNION ALL
      SELECT * FROM series
      ORDER BY 1;
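
      Another way to combine the two queries from the question, offered as a sketch rather than a measured result: aggregate squitters once with the fast query, then left-join that small result to the hour series and turn missing hours into 0 with COALESCE. The names agg and cnt are illustrative and do not come from the original post.

      WITH agg AS (
          -- The fast per-hour aggregate from the question
          SELECT date_trunc('hour', parsed_time) AS hour, count(*) AS cnt
          FROM squitters
          WHERE parsed_time >= date_trunc('hour', now() - interval '2 day')
          GROUP BY 1
      )
      SELECT bins.hour, COALESCE(agg.cnt, 0) AS cnt
      FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(hour)
      LEFT JOIN agg ON agg.hour = bins.hour  -- joins about 49 series rows to at most as many aggregate rows
      ORDER BY bins.hour DESC;

      Here the large table is read only once, through a plain range filter on parsed_time, and the join itself works on a few dozen rows.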
      

      【Comments】:
