【问题标题】:Index is not used when using Exposed with DataSource with Postgres将 Exposed with DataSource 与 Postgres 一起使用时不使用索引
【发布时间】:2020-06-22 14:39:41
【问题描述】:

在将 Exposed 与 DataSource 一起使用时,我遇到了意外行为(我尝试了 apache DBCP 和 HikariCP)。

设置:带有idflag 字段的单个表(test),索引在flag 上。

查询:SELECT * from test where flag=1 limit 1;

手动运行时,使用索引,查询速度快。当通过暴露重复运行时,经过 9 次调用,性能会下降。该索引不再使用 - 请参阅下面的查询计划。

这里是示例代码:


object TestTable : IntIdTable() {
    val flag = integer("flag").index()
}

fun createNRows(n: Int) = repeat(n) {
    TestTable.insert { it[flag] = 0 }
}

fun main(args: Array<String>) {
    val ds = HikariDataSource(HikariConfig().apply {
        jdbcUrl = "jdbc:postgresql://localhost:5432/testdb"
        username = ...
        password = ...
        setDriverClassName("org.postgresql.Driver")
    })

    Database.connect(ds)


    transaction {
        // only run the first time:
        // SchemaUtils.create(TestTable)
        // createNRows(1000000) 
        println("total ${TestTable.selectAll().count()} elements")
    }

    repeat(10000) {
        transaction {
            val startedAt = System.currentTimeMillis()
            TestTable.select { TestTable.flag.eq(1) }.limit(1).toList()
            println("Query took ${System.currentTimeMillis() - startedAt}")
        }
    }
}

输出:

total 1000000 elements
Query took 6
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 0
Query took 79
Query took 64
Query took 63
Query took 62
Query took 63
....

这里是启用了EXPLAIN (ANALYZE, BUFFERS) 的 postgres 日志:

这是最后一个快速查询:

2020-03-10 23:03:00.596 CET [71012] LOG:  duration: 0.021 ms  bind S_2: 
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.083 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.013 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.025 ms
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.011 ms  bind S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.159 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.028 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.015 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
      Buffers: shared hit=3
      ->  Index Scan using test_flag on test  (cost=0.42..4.44 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=1)
            Index Cond: (flag = 1)
            Buffers: shared hit=3
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.072 ms  bind S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.017 ms
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.022 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.007 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.013 ms

这是第一个“慢”的:

2020-03-10 23:03:01.601 CET [71012] LOG:  duration: 0.022 ms  bind S_2: 
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.052 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.011 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.023 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.012 ms  bind S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.192 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.678 CET [71012] LOG:  duration: 75.889 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 75.868 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.00..0.02 rows=1 width=8) (actual time=75.864..75.864 rows=0 loops=1)
      Buffers: shared hit=96 read=4329
      ->  Seq Scan on test  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=75.862..75.862 rows=0 loops=1)
            Filter: (flag = $1)
            Rows Removed by Filter: 1000000
            Buffers: shared hit=96 read=4329
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.054 ms  bind S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.014 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.025 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.004 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.009 ms

Postgres 版本(自制):

postgres (PostgreSQL) 11.5

客户端版本:


dependencies {
    implementation 'org.jetbrains.exposed:exposed:0.17.7'
    implementation "org.postgresql:postgresql:42.2.8"
    implementation 'org.jetbrains.kotlin:kotlin-stdlib'
    implementation 'com.zaxxer:HikariCP:2.3.2'
}

postgres 配置是默认的(日志是使用自动解释生成的,但没有它问题会重现)

这里是例子的来源:https://github.com/RomanBrodetski/kotlin-exposed-issue

观察:

  • 如果.limit(1) 被删除,则问题不会重现
  • 如果未使用数据源(Database.connect("jdbc:postgresql://localhost:5432/testdb", driver = "org.postgresql.Driver") 而不是 Database.connect(ds)),则不会重现该问题
  • 如果事务中有附加语句,则问题不再重现。

【问题讨论】:

  • 您能否显示EXPLAIN (ANALYZE, BUFFERS) 的输出以显示其中一个较慢的运行?您可以为此使用 auto_explain。
  • @LaurenzAlbe 感谢您的建议,请参阅更新后的帖子
  • 不同之处在于慢查询使用的是带有通用计划的服务器端准备语句。列中的值如何分布? ANALYZE 是否已在桌面上运行?
  • @LaurenzAlbe 没有或很少有带有flag == 1 的元素。我尝试运行ANALYZE,但没有任何改变。
  • 听起来像参数嗅探/绑定变量问题。 SELECT * from test where flag=1 limit 1; - 标志是硬编码的,实际代码 SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1 - 标志是一个参数(计划必须更通用才能满足所有情况)use-the-index-luke.com/blog/2011-07-16/planning-for-reuse

标签: postgresql jdbc datasource kotlin-exposed


【解决方案1】:

完全删除.limit(1) 使其始终使用索引。 问题是在几次(5)次执行后为准备好的语句创建的总体计划是错误的。 limit 1 是它的原因。将1 设为绑定变量可以解决问题。不幸的是,我在 Exposed 库中找不到这样做的方法 - 它将数字内联到准备好的语句中。

由于某种原因,它认为它可以在顺序扫描期间立即找到匹配的行,无论我做什么真空/分析/创建统计信息,我都无法改变它的想法。 (我尝试改变标志值的分布,没有帮助)

从 SQL 重现问题:

create index test_flag_partial_idx on test (flag) include (id) where flag is not null and flag = 1;

vacuum  full  analyse  test;

PREPARE select_with_limit_as_value AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT 1;
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);


PREPARE select_with_limit_as_bind AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT $2;
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);

第一个准备好的语句使用limit作为硬编码值,在几次执行后切换到使用顺序扫描的通用计划。 第二条prepared statement使用limit作为绑定变量,一般plan使用index。

您需要将标志参数硬编码到查询中或将限制设为绑定变量。

在 PostgreSQL 12 中,您可以禁用通用计划,您可以在查询前后对其进行更改:

set plan_cache_mode = force_custom_plan;

所有都在 PostgreSQL 12.2 上尝试过。

【讨论】:

    【解决方案2】:

    尝试在插入数据后收集表统计信息。看起来 CBO 了解表结构的统计信息较少。实际上,postgres使用您创建的索引并不是一个坏主意,因为索引的所有值都是相同的。因此,接下来可以尝试从代码中删除索引,或者创建更好的索引。

    最后它似乎与Exposed 无关,而是与Postgresql 本身有关。

    (我想发表评论,但由于我的SO声誉不可能,因此写了一个答案)

    【讨论】:

      猜你喜欢
      • 2021-09-04
      • 2010-12-04
      • 2018-07-08
      • 1970-01-01
      • 1970-01-01
      • 2018-04-16
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多