Querying case-insensitive columns by SQL in Tarantool

【Question title】: Querying case-insensitive columns by SQL in Tarantool
【Posted】: 2020-10-09 20:53:09
【Question】:

We know that a string index in Tarantool can be made case-insensitive by specifying the collation option: collation = "unicode_ci". For example:

t = box.schema.create_space("test")
t:format({{name = "id", type = "number"}, {name = "col1", type = "string"}})
t:create_index('primary')
t:create_index("col1_idx", {parts = {{field = "col1", type = "string", collation = "unicode_ci"}}})
t:insert{1, "aaa"}
t:insert{2, "bbb"}
t:insert{3, "ccc"}

Now we can run a case-insensitive query:

tarantool> t.index.col1_idx:select("AAA")
---
- - [1, 'aaa']
...

But how do we do the same with SQL? This does not work:

tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...

Neither does this:

tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...

There is a dirty trick with poor performance (a full scan). We don't want that, do we?

tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where upper(\"col1\") = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...

Finally, there is one more workaround:

tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...

But the question is: does it use the index? It works without an index too...

【Comments】:

Tags: sql performance tarantool

【Solution 1】:

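You can inspect the query plan to find out whether a particular index is used. To get the query plan, just prefix the original query with "EXPLAIN QUERY PLAN". For example: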

tarantool> box.execute("explain query plan select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: selectid
    type: integer
  - name: order
    type: integer
  - name: from
    type: integer
  - name: detail
    type: text
  rows:
  - [0, 0, 0, 'SEARCH TABLE test USING COVERING INDEX col1_idx (col1=?) (~1 row)']
...

So the answer is yes: the index is used in this case.
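If this check needs to be repeated for several queries, the plan inspection can be wrapped in a small helper. The following is only a sketch: plan_uses_index is a hypothetical name, and it assumes the result has the shape shown above, with the plan text in the fourth column of each row. It does a plain substring match, so an index whose name contains another index's name would also match.

```lua
-- Hypothetical helper: returns true if any row of an EXPLAIN QUERY PLAN
-- result mentions index_name in its "detail" column (the 4th column).
function plan_uses_index(plan, index_name)
    for _, row in ipairs(plan.rows) do
        local detail = row[4]
        -- Plain (non-pattern) substring search, so "_" and "." are literal.
        if type(detail) == "string" and detail:find(index_name, 1, true) then
            return true
        end
    end
    return false
end

-- Usage inside a Tarantool console (relies on the global `box`):
--   plan = box.execute("explain query plan select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
--   plan_uses_index(plan, "col1_idx")
```

The helper takes the already-fetched result table rather than the SQL text, so it stays independent of how the query is executed.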
As for the other example:

box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")

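Unfortunately, the comparison in this query uses the binary collation, because the index's collation is ignored. In SQL, only the column's collation is taken into account during comparison. This limitation will be resolved once the corresponding issue is closed.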
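Since only the column's collation counts in SQL comparisons, one possible workaround is to declare the collation on the column itself, in the space format. The sketch below assumes a Tarantool version whose space format accepts a collation field. The space name test_ci is made up for illustration, and the behaviour should be verified on your own release.

```lua
-- Sketch: run inside a Tarantool console (relies on the global `box`).
-- "test_ci" is a hypothetical space name used only for this example.
t_ci = box.schema.create_space("test_ci")
t_ci:format({
    {name = "id", type = "number"},
    -- Assumption: the collation is declared on the column, not only on the index.
    {name = "col1", type = "string", collation = "unicode_ci"},
})
t_ci:create_index("primary")
t_ci:create_index("col1_idx", {parts = {{field = "col1", type = "string", collation = "unicode_ci"}}})
t_ci:insert{1, "aaa"}

-- With the column collation in place, a plain equality is expected to
-- compare case-insensitively, without an explicit COLLATE clause.
box.execute([[select * from "test_ci" where "col1" = 'AAA']])
```

Whether the index is actually picked for this query can be confirmed the same way as before, by prefixing it with EXPLAIN QUERY PLAN.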

【Comments】: