How to optimize a (TimescaleDB/PostgreSQL) time-series SQL query: answers

【Question title】: How to optimize (TimescaleDB/PostgreSQL) timeseries SQL query
【Posted】: 2019-10-13 17:25:00
【Question】:

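I have time-series data, and I am trying to make the database structure and the query as efficient as possible.

I have indexed id and datetime on the table, with datetime in descending order.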

SELECT
   table.id,
   To_char(Time_bucket('2 hours', datetime) at time zone 'utc', 'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS time,
   Avg(value) AS value,
   mapping.description 
FROM
   table 
   JOIN
      mapping 
      ON table.id = mapping.id 
WHERE
   table.id IN
   (
      10000,
      10004,
      1001,
      10005
   )
   AND datetime BETWEEN '2019-09-25' AND '2019-09-30' 
GROUP BY
   time,
   table.id,
   mapping.description 
ORDER BY
   time DESC;

The table structure is as follows:

                        Table "public.table"
  Column  |            Type             | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
 datetime | timestamp without time zone |           | not null |
 id       | integer                     |           | not null |
 value    | double precision            |           |          |
Indexes:
    "table_datetime_idx" btree (datetime DESC)
    "table_id_datetime_idx" btree (id, datetime DESC)

The mapping table:

                      Table "public.mapping"
   Column    |       Type        | Collation | Nullable | Default
-------------+-------------------+-----------+----------+---------
 id          | integer           |           | not null |
 tagname     | character varying |           |          |
 description | character varying |           |          |
 unit        | character varying |           |          |
 mineu       | double precision  |           |          |
 maxeu       | double precision  |           |          |

Indexes:
     "mapping_id_idx" btree (id)

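There are no errors, but I still wonder whether this is good or efficient enough. Right now the query takes about 14 seconds to run. What is the simplest way to optimize it?

The EXPLAIN ANALYZE output is below: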

 GroupAggregate  (cost=250964.79..265699.28 rows=453369 width=73) (actual time=10247.641..11501.894 rows=60 loops=1)
   Group Key: (to_char(timezone('utc'::text, time_bucket('02:00:00'::interval, _hyper_1_4_chunk.datetime)), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'::text)), _hyper_1_4_chunk.id, mapping.description
   ->  Sort  (cost=250964.79..252098.21 rows=453369 width=73) (actual time=10237.177..10481.057 rows=421712 loops=1)
         Sort Key: (to_char(timezone('utc'::text, time_bucket('02:00:00'::interval, _hyper_1_4_chunk.datetime)), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'::text)) DESC, _hyper_1_4_chunk.id, mapping.description
         Sort Method: external merge  Disk: 33816kB
         ->  Hash Join  (cost=7228.67..196570.23 rows=453369 width=73) (actual time=81.488..5779.432 rows=421712 loops=1)
               Hash Cond: (_hyper_1_4_chunk.id = mapping.id)
               ->  Append  (cost=7215.89..186363.19 rows=452059 width=20) (actual time=81.299..3680.949 rows=421712 loops=1)
                     ->  Bitmap Heap Scan on _hyper_1_4_chunk  (cost=7215.89..129006.87 rows=363549 width=20) (actual time=81.298..3350.870 rows=336860 loops=1)
                           Recheck Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                           Heap Blocks: exact=61125
                           ->  Bitmap Index Scan on _hyper_1_4_chunk_table_id_datetime_idx  (cost=0.00..7125.00 rows=363549 width=0) (actual time=69.006..69.006 rows=336860 loops=1)
                                 Index Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                     ->  Bitmap Heap Scan on _hyper_1_3_chunk  (cost=1766.52..57356.32 rows=88510 width=20) (actual time=20.876..311.867 rows=84852 loops=1)
                           Recheck Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                           Heap Blocks: exact=16352
                           ->  Bitmap Index Scan on _hyper_1_3_chunk_table_id_datetime_idx  (cost=0.00..1744.39 rows=88510 width=0) (actual time=17.291..17.291 rows=84852 loops=1)
                                 Index Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
               ->  Hash  (cost=8.46..8.46 rows=346 width=33) (actual time=0.163..0.163 rows=346 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 31kB
                     ->  Seq Scan on mapping  (cost=0.00..8.46 rows=346 width=33) (actual time=0.019..0.097 rows=346 loops=1)
 Planning time: 1.008 ms
 Execution time: 11507.606 ms

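【Comments on the question】:

You need to specify the table structure, or at least say where each column comes from.
Can you run EXPLAIN ANALYZE on the query and share the result? That would show how big the chunks are and how good the plan is. Disk speed can also affect query execution time. Are you running TimescaleDB locally or in the cloud?
It is running on a local virtual server. I have edited the EXPLAIN ANALYZE output into the post.
Sort Method: external merge Disk: 33816kB means you are spilling to disk to merge the results. You need to increase the size allocated to shared_buffers so the sort fits in memory. Have you run timescaledb-tune? github.com/timescale/timescaledb-tune
@MikeFreedman: shared_buffers has nothing to do with the memory needed for sorting. That is controlled by work_mem.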

Tags: sql postgresql performance timescaledb

【Solution 1】:

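If you raise work_mem to 100 MB or more, the sort should be performed in memory, which will speed up execution.

If you raise work_mem even further, you may get a faster hash aggregate instead of the group aggregate, which will make the query faster still.

I don't think there is much you can do about the index scan.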
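
As a minimal sketch of the work_mem suggestion, the setting can be tried for the current session only, without touching postgresql.conf. The 100MB figure is the one mentioned above; the right value for a given server is an assumption to be checked against available RAM and the number of concurrent sessions, since each sort or hash operation may use up to work_mem.

```sql
-- Inspect the current per-operation memory budget for sorts and hashes.
SHOW work_mem;

-- Raise it for this session only (100MB is the value suggested above;
-- treat it as a starting point, not a tuned recommendation).
SET work_mem = '100MB';

-- Re-run the original query here, prefixed with EXPLAIN ANALYZE.
-- If the sort now fits in memory, the plan should report
-- "Sort Method: quicksort  Memory: ..." instead of
-- "Sort Method: external merge  Disk: ...".
-- With a larger setting the planner may choose HashAggregate
-- in place of Sort + GroupAggregate.

-- Return to the server default when done experimenting.
RESET work_mem;
```

If the higher value helps, it can be made persistent per role or per database rather than globally, so that other workloads keep the default.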

【Comments】: