【Question title】: Improve speed of complex Postgres query in Rails app
【Posted】: 2017-11-27 10:41:00
【Question】:

I have a view in my app that visualizes a lot of data. On the backend, the data is generated with this query:

DataPoint Load (20394.8ms)  
SELECT communities.id as com, 
       consumers.name as con, 
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, 
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons 
FROM "data_points" 
     INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" 
     INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" 
     INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" 
     INNER JOIN "clusterings" ON "clusterings"."id" = "communities"."clustering_id" 
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) 
   AND "data_points"."interval_id" = $3 
   AND "clusterings"."id" = 1 
GROUP BY communities.id, consumers.id  
[["timestamp", "2015-11-20 09:23:00"], ["timestamp", "2015-11-27 09:23:00"], ["interval_id", 2]]

The query takes about 20 seconds to execute, which seems excessive.

The code that generates the query looks like this:

res = {}
DataPoint.joins(consumer: { communities: :clustering })
         .where('clusterings.id': self,
                timestamp: chart_cookies[:start_date]..chart_cookies[:end_date],
                interval_id: chart_cookies[:interval_id])
         .group('communities.id')
         .group('consumers.id')
         .select('communities.id as com, consumers.name as con',
                 'array_agg(timestamp ORDER BY data_points.timestamp asc) as tims',
                 'array_agg(consumption ORDER BY data_points.timestamp ASC) as cons')
         .each do |d|
  # One row per (community, consumer); tims and cons are both ordered by timestamp
  res[d.com] ||= {}
  res[d.com][d.con] = d.tims.zip(d.cons)
  # Running per-community total, summed by position (assumes identical timestamps per consumer)
  res[d.com]["aggregate"] ||= d.tims.map { |t| [t, 0] }
  res[d.com]["aggregate"] = res[d.com]["aggregate"].zip(d.cons).map { |(t, sum), c| [t, sum + c] }
end
res
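The "aggregate" step above adds each consumer's values to the community total by array position, which would silently misalign if one consumer were missing a data point. Below is a stdlib-only sketch of the same post-processing that sums by timestamp instead. Plain `Struct` rows stand in for the `DataPoint` records (an assumption), and the consumer names and values are hypothetical sample data; the result has the same shape as `res` in the question.

```ruby
# Stand-in for one row of the grouped query: community id, consumer name,
# timestamps (tims) and consumptions (cons), both ordered by timestamp.
Row = Struct.new(:com, :con, :tims, :cons)

def build_chart_data(rows)
  res = Hash.new { |h, com| h[com] = {} }
  totals = Hash.new { |h, com| h[com] = Hash.new(0.0) }

  rows.each do |d|
    res[d.com][d.con] = d.tims.zip(d.cons)
    # Sum per timestamp key, so a consumer with a gap cannot shift the others.
    d.tims.zip(d.cons).each { |t, c| totals[d.com][t] += c }
  end

  # Hash#sort returns [timestamp, total] pairs ordered by timestamp.
  totals.each { |com, by_time| res[com]["aggregate"] = by_time.sort }
  res
end

rows = [
  Row.new(1, "house_a", [10, 20, 30], [1.0, 2.0, 3.0]),
  Row.new(1, "house_b", [10, 30], [0.5, 0.5]) # hypothetical gap at t = 20
]
data = build_chart_data(rows)
p data[1]["aggregate"] # → [[10, 1.5], [20, 2.0], [30, 3.5]]
```

Building the totals after the loop keeps the per-row work to a hash update, at the cost of one extra pass per community.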

The relevant tables from the database schema are:

  create_table "data_points", force: :cascade do |t|
    t.bigint "consumer_id"
    t.bigint "interval_id"
    t.datetime "timestamp"
    t.float "consumption"
    t.float "flexibility"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["consumer_id"], name: "index_data_points_on_consumer_id"
    t.index ["interval_id"], name: "index_data_points_on_interval_id"
    t.index ["timestamp", "consumer_id", "interval_id"], name: "index_data_points_on_timestamp_and_consumer_id_and_interval_id", unique: true
    t.index ["timestamp"], name: "index_data_points_on_timestamp"
  end

  create_table "consumers", force: :cascade do |t|
    t.string "name"
    t.string "location"
    t.string "edms_id"
    t.bigint "building_type_id"
    t.bigint "connection_type_id"
    t.float "location_x"
    t.float "location_y"
    t.string "feeder_id"
    t.bigint "consumer_category_id"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["building_type_id"], name: "index_consumers_on_building_type_id"
    t.index ["connection_type_id"], name: "index_consumers_on_connection_type_id"
    t.index ["consumer_category_id"], name: "index_consumers_on_consumer_category_id"
  end

  create_table "communities_consumers", id: false, force: :cascade do |t|
    t.bigint "consumer_id", null: false
    t.bigint "community_id", null: false
    t.index ["community_id", "consumer_id"], name: "index_communities_consumers_on_community_id_and_consumer_id"
    t.index ["consumer_id", "community_id"], name: "index_communities_consumers_on_consumer_id_and_community_id"
  end

  create_table "communities", force: :cascade do |t|
    t.string "name"
    t.text "description"
    t.bigint "clustering_id"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["clustering_id"], name: "index_communities_on_clustering_id"
  end

  create_table "clusterings", force: :cascade do |t|
    t.string "name"
    t.text "description"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

How can I make the query run faster? Can the query be refactored to simplify it, or can I add indexes to the schema to reduce the query time?

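Interestingly, a slightly simplified version of the query that I use in another view runs much faster: 1161.4 ms for the first request and only 41.6 ms for subsequent requests: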

DataPoint Load (1161.4ms)  
SELECT consumers.name as con, 
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, 
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons 
FROM "data_points" 
    INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" 
    INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" 
    INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" 
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) 
   AND "data_points"."interval_id" = $3 
   AND "communities"."id" = 100 GROUP BY communities.id, consumers.name  
[["timestamp", "2015-11-20 09:23:00"], ["timestamp", "2015-11-27 09:23:00"], ["interval_id", 2]]

Running the query in dbconsole with EXPLAIN (ANALYZE, BUFFERS), I get the following output:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=12.31..7440.69 rows=246 width=57) (actual time=44.139..20474.015 rows=296 loops=1)
   Group Key: communities.id, consumers.id
   Buffers: shared hit=159692 read=6148105 written=209
   ->  Nested Loop  (cost=12.31..7434.54 rows=246 width=57) (actual time=20.944..20436.806 rows=49728 loops=1)
         Buffers: shared hit=159685 read=6148105 written=209
         ->  Nested Loop  (cost=11.88..49.30 rows=1 width=49) (actual time=0.102..6.374 rows=296 loops=1)
               Buffers: shared hit=988 read=208
               ->  Nested Loop  (cost=11.73..41.12 rows=1 width=57) (actual time=0.084..4.443 rows=296 loops=1)
                     Buffers: shared hit=396 read=208
                     ->  Merge Join  (cost=11.58..40.78 rows=1 width=24) (actual time=0.075..1.365 rows=296 loops=1)
                           Merge Cond: (communities_consumers.community_id = communities.id)
                           Buffers: shared hit=5 read=7
                           ->  Index Only Scan using index_communities_consumers_on_community_id_and_consumer_id on communities_consumers  (cost=0.27..28.71 rows=296 width=16) (actual time=0.039..0.446 rows=296 loops=1)
                                 Heap Fetches: 4
                                 Buffers: shared hit=1 read=6
                           ->  Sort  (cost=11.31..11.31 rows=3 width=16) (actual time=0.034..0.213 rows=247 loops=1)
                                 Sort Key: communities.id
                                 Sort Method: quicksort  Memory: 25kB
                                 Buffers: shared hit=4 read=1
                                 ->  Bitmap Heap Scan on communities  (cost=4.17..11.28 rows=3 width=16) (actual time=0.026..0.027 rows=6 loops=1)
                                       Recheck Cond: (clustering_id = 1)
                                       Heap Blocks: exact=1
                                       Buffers: shared hit=4 read=1
                                       ->  Bitmap Index Scan on index_communities_on_clustering_id  (cost=0.00..4.17 rows=3 width=0) (actual time=0.020..0.020 rows=8 loops=1)
                                             Index Cond: (clustering_id = 1)
                                             Buffers: shared hit=3 read=1
                     ->  Index Scan using consumers_pkey on consumers  (cost=0.15..0.33 rows=1 width=33) (actual time=0.007..0.008 rows=1 loops=296)
                           Index Cond: (id = communities_consumers.consumer_id)
                           Buffers: shared hit=391 read=201
               ->  Index Only Scan using clusterings_pkey on clusterings  (cost=0.15..8.17 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=296)
                     Index Cond: (id = 1)
                     Heap Fetches: 296
                     Buffers: shared hit=592
         ->  Index Scan using index_data_points_on_consumer_id on data_points  (cost=0.44..7383.44 rows=180 width=24) (actual time=56.128..68.995 rows=168 loops=296)
               Index Cond: (consumer_id = consumers.id)
               Filter: (("timestamp" >= '2015-11-20 09:23:00'::timestamp without time zone) AND ("timestamp" <= '2015-11-27 09:23:00'::timestamp without time zone) AND (interval_id = 2))
               Rows Removed by Filter: 76610
               Buffers: shared hit=158697 read=6147897 written=209
 Planning time: 1.811 ms
 Execution time: 20474.330 ms
(40 rows)
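Almost all of the 20 seconds is spent in the innermost `Index Scan using index_data_points_on_consumer_id` node. EXPLAIN ANALYZE reports `rows` and `Rows Removed by Filter` as per-loop averages, so multiplying by `loops=296` shows the total work. A small calculation with the figures copied from that node; the 8 kB block size is an assumption (PostgreSQL's default).

```ruby
# Figures copied from the "Index Scan using index_data_points_on_consumer_id" node.
# EXPLAIN ANALYZE reports rows and "Rows Removed by Filter" as per-loop averages.
loops = 296
rows_kept_per_loop = 168
rows_removed_per_loop = 76_610
shared_read_blocks = 6_147_897
block_size = 8192 # assumption: PostgreSQL's default 8 kB block size

rows_returned = loops * rows_kept_per_loop
rows_visited = loops * (rows_kept_per_loop + rows_removed_per_loop)
kept_percent = 100.0 * rows_returned / rows_visited
gib_read = shared_read_blocks * block_size / 1024.0**3

puts rows_returned                  # → 49728 (matches the Nested Loop's rows=49728)
puts rows_visited                   # → 22726288
puts format("%.2f%%", kept_percent) # → 0.22%
puts format("%.1f GiB", gib_read)   # → 46.9 GiB
```

Roughly 0.2% of the rows fetched through that index survive the Filter. The read counter covers block reads across all 296 loops (a block can be read more than once), not 47 GiB of distinct data. An index that turns the `timestamp` and `interval_id` conditions into index conditions instead of a Filter targets exactly this waste, though the gain on this data has to be measured.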

The bullet gem returns the following warnings:

USE eager loading detected
  Community => [:communities_consumers]
  Add to your finder: :includes => [:communities_consumers]

USE eager loading detected
  Community => [:consumers]
  Add to your finder: :includes => [:consumers]

After removing the join with the clusterings table, the new query plan is:

EXPLAIN for: SELECT communities.id as com, consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 AND "communities"."clustering_id" = 1 GROUP BY communities.id, consumers.id [["timestamp", "2015-11-29 20:52:30.926247"], ["timestamp", "2015-12-06 20:52:30.926468"], ["interval_id", 2]]
                                                                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10839.79..10846.42 rows=241 width=57)
   ->  Sort  (cost=10839.79..10840.39 rows=241 width=57)
         Sort Key: communities.id, consumers.id
         ->  Nested Loop  (cost=7643.11..10830.26 rows=241 width=57)
               ->  Nested Loop  (cost=11.47..22.79 rows=1 width=49)
                     ->  Hash Join  (cost=11.32..17.40 rows=1 width=16)
                           Hash Cond: (communities_consumers.community_id = communities.id)
                           ->  Seq Scan on communities_consumers  (cost=0.00..4.96 rows=296 width=16)
                           ->  Hash  (cost=11.28..11.28 rows=3 width=8)
                                 ->  Bitmap Heap Scan on communities  (cost=4.17..11.28 rows=3 width=8)
                                       Recheck Cond: (clustering_id = 1)
                                       ->  Bitmap Index Scan on index_communities_on_clustering_id  (cost=0.00..4.17 rows=3 width=0)
                                             Index Cond: (clustering_id = 1)
                     ->  Index Scan using consumers_pkey on consumers  (cost=0.15..5.38 rows=1 width=33)
                           Index Cond: (id = communities_consumers.consumer_id)
               ->  Bitmap Heap Scan on data_points  (cost=7631.64..10805.72 rows=174 width=24)
                     Recheck Cond: ((consumer_id = consumers.id) AND ("timestamp" >= '2015-11-29 20:52:30.926247'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926468'::timestamp without time zone))
                     Filter: (interval_id = 2::bigint)
                     ->  BitmapAnd  (cost=7631.64..7631.64 rows=861 width=0)
                           ->  Bitmap Index Scan on index_data_points_on_consumer_id  (cost=0.00..1589.92 rows=76778 width=0)
                                 Index Cond: (consumer_id = consumers.id)
                           ->  Bitmap Index Scan on index_data_points_on_timestamp  (cost=0.00..6028.58 rows=254814 width=0)
                                 Index Cond: (("timestamp" >= '2015-11-29 20:52:30.926247'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926468'::timestamp without time zone))
(23 rows)

As requested in the comments, here are the query plans for the simplified query, with and without the restriction on communities.id:

 DataPoint Load (1563.3ms)  SELECT consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 GROUP BY communities.id, consumers.name  [["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
EXPLAIN for: SELECT consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 GROUP BY communities.id, consumers.name [["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
                                                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=140992.34..142405.51 rows=51388 width=49)
   ->  Sort  (cost=140992.34..141120.81 rows=51388 width=49)
         Sort Key: communities.id, consumers.name
         ->  Hash Join  (cost=10135.44..135214.45 rows=51388 width=49)
               Hash Cond: (data_points.consumer_id = consumers.id)
               ->  Bitmap Heap Scan on data_points  (cost=10082.58..134455.00 rows=51388 width=24)
                     Recheck Cond: (("timestamp" >= '2015-11-29 20:52:30.926'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926'::timestamp without time zone) AND (interval_id = 2::bigint))
                     ->  Bitmap Index Scan on index_data_points_on_timestamp_and_consumer_id_and_interval_id  (cost=0.00..10069.74 rows=51388 width=0)
                           Index Cond: (("timestamp" >= '2015-11-29 20:52:30.926'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926'::timestamp without time zone) AND (interval_id = 2::bigint))
               ->  Hash  (cost=49.16..49.16 rows=296 width=49)
                     ->  Hash Join  (cost=33.06..49.16 rows=296 width=49)
                           Hash Cond: (communities_consumers.community_id = communities.id)
                           ->  Hash Join  (cost=8.66..20.69 rows=296 width=49)
                                 Hash Cond: (consumers.id = communities_consumers.consumer_id)
                                 ->  Seq Scan on consumers  (cost=0.00..7.96 rows=296 width=33)
                                 ->  Hash  (cost=4.96..4.96 rows=296 width=16)
                                       ->  Seq Scan on communities_consumers  (cost=0.00..4.96 rows=296 width=16)
                           ->  Hash  (cost=16.40..16.40 rows=640 width=8)
                                 ->  Seq Scan on communities  (cost=0.00..16.40 rows=640 width=8)
(19 rows)

  DataPoint Load (1479.0ms)  SELECT consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 GROUP BY communities.id, consumers.name  [["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
EXPLAIN for: SELECT consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 GROUP BY communities.id, consumers.name [["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
                                                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=140992.34..142405.51 rows=51388 width=49)
   ->  Sort  (cost=140992.34..141120.81 rows=51388 width=49)
         Sort Key: communities.id, consumers.name
         ->  Hash Join  (cost=10135.44..135214.45 rows=51388 width=49)
               Hash Cond: (data_points.consumer_id = consumers.id)
               ->  Bitmap Heap Scan on data_points  (cost=10082.58..134455.00 rows=51388 width=24)
                     Recheck Cond: (("timestamp" >= '2015-11-29 20:52:30.926'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926'::timestamp without time zone) AND (interval_id = 2::bigint))
                     ->  Bitmap Index Scan on index_data_points_on_timestamp_and_consumer_id_and_interval_id  (cost=0.00..10069.74 rows=51388 width=0)
                           Index Cond: (("timestamp" >= '2015-11-29 20:52:30.926'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926'::timestamp without time zone) AND (interval_id = 2::bigint))
               ->  Hash  (cost=49.16..49.16 rows=296 width=49)
                     ->  Hash Join  (cost=33.06..49.16 rows=296 width=49)
                           Hash Cond: (communities_consumers.community_id = communities.id)
                           ->  Hash Join  (cost=8.66..20.69 rows=296 width=49)
                                 Hash Cond: (consumers.id = communities_consumers.consumer_id)
                                 ->  Seq Scan on consumers  (cost=0.00..7.96 rows=296 width=33)
                                 ->  Hash  (cost=4.96..4.96 rows=296 width=16)
                                       ->  Seq Scan on communities_consumers  (cost=0.00..4.96 rows=296 width=16)
                           ->  Hash  (cost=16.40..16.40 rows=640 width=8)
                                 ->  Seq Scan on communities  (cost=0.00..16.40 rows=640 width=8)
(19 rows)

【Comments】:

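  • Please use the bullet gem (or run the query as raw SQL in your SQL client) to see where your code spends its time. Judging from the code, the each block looks heavy; if your query returns a large list, processing it in Rails will be slow.
  • Is this query doing some kind of analysis on your operational system data?
  • @Jin.X: I added the output of EXPLAIN (ANALYZE, BUFFERS) to the question. I don't think Ruby is the bottleneck, because there is no high CPU load while the query runs.
  • @xeon131: Not sure what you mean by "operational system data", but the project is about clustering in energy systems. This query produces an overview of the consumption data for the current clustering, broken down into communities, if that makes sense.
  • @user000001 Use the bullet gem and run the query from the Rails server. Your server log will show the queries executed in the background and their timings, which will help reviewers (and you) find the problem.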

Tags: sql ruby-on-rails postgresql performance


【Answer 1】:

Have you tried adding an index on:

"data_points"."timestamp" + "data_points"."consumer_id"

or on "data_points"."consumer_id" only?
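Why column order matters for such an index can be shown with a toy model: treat a B-tree index as a sorted array of key tuples and find ranges with binary search (`Array#bsearch_index`). This is a simplification, not how PostgreSQL stores indexes, and the `(consumer_id, interval_id, timestamp)` order is an illustrative assumption rather than part of the answer. With the equality columns first, one consumer's time window is a single contiguous slice; with `timestamp` first, the same window interleaves every consumer.

```ruby
# Toy data (hypothetical): 3 consumers, interval_id 2, integer timestamps 1..10.
rows = (1..3).flat_map do |consumer|
  (1..10).map { |ts| { consumer_id: consumer, interval_id: 2, timestamp: ts } }
end

# First position whose key is >= target (bsearch_index in find-minimum mode).
# Array#<=> compares element by element, so a shorter target acts as a prefix.
def lower_bound(index, target)
  index.bsearch_index { |key| (key <=> target) >= 0 } || index.size
end

# Index A: (consumer_id, interval_id, timestamp), equality columns first.
by_consumer = rows.map { |r| r.values_at(:consumer_id, :interval_id, :timestamp) }.sort
from = lower_bound(by_consumer, [2, 2, 4])
to   = lower_bound(by_consumer, [2, 2, 7]) # exclusive end: timestamp < 7
slice = by_consumer[from...to]
p slice # → [[2, 2, 4], [2, 2, 5], [2, 2, 6]]

# Index B: (timestamp, consumer_id, interval_id), like the existing unique index.
by_time = rows.map { |r| r.values_at(:timestamp, :consumer_id, :interval_id) }.sort
scanned = by_time[lower_bound(by_time, [4])...lower_bound(by_time, [7])]
matching = scanned.select { |_ts, consumer, interval| consumer == 2 && interval == 2 }
puts "entries scanned: #{scanned.size}, matching: #{matching.size}" # → entries scanned: 9, matching: 3
```

In the real index B, PostgreSQL can still check `consumer_id` on the index entries themselves, so the extra cost is index reads rather than heap visits; how much that matters depends on how many consumers share the time window.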

【Comments】:

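  • This answer actually helped the most; it brought the query time down to 8 seconds.
  • Hmm, interesting. You already have a composite index on "timestamp", "consumer_id", "interval_id", so adding an index on just "data_points"."timestamp" + "data_points"."consumer_id" should be redundant?
  • The query only uses 2 of those 3 fields, so why would it be redundant? Also sorry to hear it isn't any faster than that, @user000001 :/
  • @nekogami: Yes, I was also surprised that the two-column index is needed even though the three-column index already exists, but it really does seem to be necessary.
  • @user000001 So? Did you find a solution?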
【Answer 2】:

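Which version of Postgres are you using? Postgres 10 introduced native (declarative) table partitioning. If your "data_points" table is very large, this could speed up your query considerably, since you are filtering on a time range: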

WHERE (data_points.TIMESTAMP BETWEEN $1 AND $2) 

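One strategy you could look into is partitioning on the DATE value of the "timestamp" column. Then modify your query to include an extra filter so that partition pruning can kick in: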

WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
   AND (CAST("data_points"."timestamp" AS DATE) BETWEEN CAST($1 AS DATE) AND CAST($2 AS DATE))
   AND "data_points"."interval_id" = $3
   AND "communities"."clustering_id" = 1

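This should help if your "data_points" table is very large and your "timestamp" filter range is small, because whole partitions of rows that don't need processing are skipped quickly.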
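Partition pruning can be pictured as a range-overlap test: only partitions whose date range intersects the query window need to be scanned. A stdlib sketch assuming hypothetical daily partitions named `data_points_YYYYMMDD` covering 2015-11-15 to 2015-11-30; neither the names nor the daily layout come from the answer, and PostgreSQL performs the equivalent check itself when it can, so this only models the idea.

```ruby
require "date"

# Hypothetical daily partitions of data_points, each covering [from, to).
Partition = Struct.new(:name, :from, :to)

partitions = (Date.new(2015, 11, 15)..Date.new(2015, 11, 30)).map do |day|
  Partition.new("data_points_#{day.strftime('%Y%m%d')}", day, day + 1)
end

# Keep only partitions whose [from, to) range overlaps the inclusive date window.
def prune(partitions, start_date, end_date)
  partitions.select { |p| p.from <= end_date && p.to > start_date }
end

# The date window of the question's query: 2015-11-20 .. 2015-11-27.
scanned = prune(partitions, Date.new(2015, 11, 20), Date.new(2015, 11, 27))
puts "#{scanned.size} of #{partitions.size} partitions scanned" # → 8 of 16 partitions scanned
```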

I haven't done this in Postgres myself, so I'm not sure how feasible or helpful it is. But it's worth looking into :)

https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

【Comments】:

  • This looks promising. I'll read through the link and see how to change the table structure to take advantage of partitioning.
【Answer 3】:

Do you have a foreign key on communities.clustering_id? Also, try changing your condition like this:

SELECT communities.id as com, 
       consumers.name as con, 
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, 
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons 
FROM "data_points" 
     INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" 
     INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" 
     INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" 
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) 
   AND "data_points"."interval_id" = $3 
   AND "communities"."clustering_id"  = 1 
GROUP BY communities.id, consumers.id 

【Comments】:

  • This is similar to @EdmundLee's answer, but it doesn't improve the query's performance either.
【Answer 4】:
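  1. You don't need to join clusterings. Remove it from your query and use communities.clustering_id = 1 instead. That should eliminate 3 steps from the query plan, and it should save you the most, because your plan runs an index scan on it repeatedly inside three nested loops.

  2. You could also try changing how you aggregate timestamp. I assume you don't need it aggregated at one-second resolution?

  3. I would also drop the "index_data_points_on_timestamp" index, since your composite index already starts with timestamp, which makes it effectively useless. Dropping it should improve your write performance.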
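Point 3 rests on the leading-prefix rule: a B-tree index on (a, b, c) can also serve lookups on a alone. A small heuristic check over the data_points indexes from the question's schema; it is only a sketch and ignores uniqueness (the composite index is unique and has to stay), partial or expression indexes, and index size.

```ruby
# Index definitions copied from the data_points table in the question's schema.
INDEXES = {
  "index_data_points_on_consumer_id" => %w[consumer_id],
  "index_data_points_on_interval_id" => %w[interval_id],
  "index_data_points_on_timestamp_and_consumer_id_and_interval_id" => %w[timestamp consumer_id interval_id],
  "index_data_points_on_timestamp" => %w[timestamp]
}.freeze

# An index is a removal candidate when its columns are a leading prefix of
# another, longer index: a B-tree on the longer key can serve the same lookups.
def redundant_indexes(indexes)
  indexes.select do |name, cols|
    indexes.any? do |other, other_cols|
      other != name && other_cols.size > cols.size && other_cols.first(cols.size) == cols
    end
  end.keys
end

p redundant_indexes(INDEXES) # → ["index_data_points_on_timestamp"]
```

Note that index_data_points_on_consumer_id is not flagged: consumer_id is the second column of the composite index, so that index cannot replace it.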

【Comments】:

  • Point 1 makes sense, but I tried it and it didn't improve performance. As for point 2, I use the interval_id check, which restricts data_points to the chosen interval (e.g. 15 minutes, hourly, daily).
  • @user000001 Could you post the query planner results after taking out clusterings?
  • Sure, here is the executed query: DataPoint Load (39700.3ms) SELECT communities.id as com, consumers.name as con, array_agg(timestamp ORDER BY data_points.timestamp asc) as tims, array_agg(consumption ORDER BY data_points.timestamp ASC) as cons ...
  • ...FROM "data_points" INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id" INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id" INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id" WHERE ("data_points"."timestamp" BETWEEN $1 AND $2) AND "data_points"."interval_id" = $3 AND "communities"."clustering_id" = 1 GROUP BY communities.id, consumers.id [["timestamp", "2015-11-29 08:00:50.371546"], ["timestamp", "2015-12-06 08:00:50.371951"], ["interval_id", 2]]
  • @user000001 Sorry, I meant the query plan from Postgres. Could you share that?
【Answer 5】:

The index on data_points.timestamp isn't being used, possibly because of the ::timestamp cast.

I wonder whether changing the column data type or creating a functional index would help.

Edit: I guess the datetime in your create_table is just how Rails represents the Postgres timestamp data type, so perhaps no conversion happens at all.

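Still, the index on timestamp isn't being used, but depending on your data distribution that may be a perfectly sensible choice by the optimizer.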

【Comments】:

【Answer 6】:

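So here we have Postgres 9.3 and a long query. Before touching the query, make sure your database settings are tuned for your read/write ratio and disk type (SSD or an old HDD), that you haven't switched off autovacuum, that you've checked tables and indexes for bloat, and that your indexes are selective enough for the planner to build good plans.

Check the column types and the size of the populated rows. Changing column types can also reduce table size and query time.

Once you've made sure of all that, think about how Postgres executes the query and how to reduce the time and effort it spends. An ORM is fine for simple queries, but for complex ones you should execute raw SQL and keep it in query service objects.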
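A minimal sketch of such a query object, assuming the `pg` gem's `PG::Connection#exec_params(sql, params)` (in a Rails app, `ActiveRecord::Base.connection.raw_connection` returns that object for the PostgreSQL adapter). The class name `CommunityConsumptionQuery` is hypothetical. The SQL is the question's query with `communities.clustering_id` in place of the join to clusterings, as other answers suggest, and the connection is injected so that a stub can stand in for it.

```ruby
# Hypothetical query service object that keeps the hand-written SQL out of the model.
class CommunityConsumptionQuery
  SQL = <<~SQL.freeze
    SELECT communities.id AS com,
           consumers.name AS con,
           array_agg(data_points."timestamp" ORDER BY data_points."timestamp") AS tims,
           array_agg(data_points.consumption ORDER BY data_points."timestamp") AS cons
    FROM data_points
    INNER JOIN consumers ON consumers.id = data_points.consumer_id
    INNER JOIN communities_consumers ON communities_consumers.consumer_id = consumers.id
    INNER JOIN communities ON communities.id = communities_consumers.community_id
    WHERE data_points."timestamp" BETWEEN $1 AND $2
      AND data_points.interval_id = $3
      AND communities.clustering_id = $4
    GROUP BY communities.id, consumers.id
  SQL

  # connection: any object responding to exec_params(sql, params), such as a PG::Connection.
  def initialize(connection)
    @connection = connection
  end

  # Returns an array of hashes keyed by column name. With the pg gem's default
  # settings the values arrive as strings (array columns included).
  def call(clustering_id:, start_time:, end_time:, interval_id:)
    @connection.exec_params(SQL, [start_time, end_time, interval_id, clustering_id]).to_a
  end
end
```

Usage in a Rails app might look like `CommunityConsumptionQuery.new(ActiveRecord::Base.connection.raw_connection).call(clustering_id: 1, start_time: "2015-11-20 09:23:00", end_time: "2015-11-27 09:23:00", interval_id: 2)`.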

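Write the query in SQL as simply as possible; Postgres also spends time parsing the query.

Check that every join column is indexed. Use EXPLAIN ANALYZE to check that you now get the best scan methods.

Next point: you are doing 4 joins (5 tables)! To find the optimal plan, Postgres has many join orders to consider (5! = 120 permutations if it tried them all). Consider doing this selection with a subquery or a predefined table instead.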
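To put the join-order argument into numbers, here is a brute-force count of left-deep join orders for the five tables in the question's query, and of those that never need a cross join. This is only an illustration: PostgreSQL's planner does not simply enumerate permutations (it uses dynamic programming below `geqo_threshold`), and for reference the EXPLAIN ANALYZE output in the question reports `Planning time: 1.811 ms`.

```ruby
require "set"

# Tables of the original query in chain order: each joins only to its neighbours.
TABLES = %w[data_points consumers communities_consumers communities clusterings].freeze
JOINS = TABLES.each_cons(2).map(&:sort).to_set

# True when every table after the first joins to a table already in the order,
# i.e. the order never needs a cross join.
def connected_order?(order, joins)
  order.each_with_index.all? do |table, i|
    i.zero? || order.first(i).any? { |prev| joins.include?([prev, table].sort) }
  end
end

all_orders = TABLES.permutation.to_a
connected = all_orders.select { |order| connected_order?(order, JOINS) }

puts all_orders.size # → 120
puts connected.size  # → 16
```

For a chain of n tables the connected count is 2^(n-1), which is why it stays small here.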

1) Do the 4 joins in a separate query or function (or try a subquery):

CREATE TEMP TABLE predefined AS
SELECT communities.id AS community_id,
       consumers.id AS consumer_id,
       consumers.name AS consumer_name,
       "data_points"."timestamp",
       "data_points"."consumption"
FROM "data_points"
     INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
     INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
     INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
     INNER JOIN "clusterings" ON "clusterings"."id" = "communities"."clustering_id"
WHERE "data_points"."interval_id" = 2
   AND "clusterings"."id" = 1;

2) Next step (pass the values in directly instead of using bind variables):

SELECT *
FROM predefined
WHERE predefined."timestamp" BETWEEN '2015-11-20 09:23:00' AND '2015-11-27 09:23:00';

3) You reference data_points three times; you need fewer:

array_agg(timestamp ORDER BY data_points.timestamp asc) as tims
array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)

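Keep in mind that a slow query isn't only about the query itself, but also about the settings, how you use the ORM, the SQL, and how Postgres executes it.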

【Comments】:
