Oracle execution plans when using the LIKE operator with a DETERMINISTIC function: answers

[Question title]: Oracle execution plans when using the LIKE operator with a DETERMINISTIC function
[Posted]: 2011-03-17 14:52:20
[Question]:

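I'm currently stuck on a really tricky problem: Oracle produces very poor execution plans when I use a DETERMINISTIC function on the right-hand side of a LIKE operator. Here is my situation:

Situation

I thought it would be sensible to run a query like this (simplified):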

SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter(?)

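I bind ? to something like 'Eder%'. Now, customers and addresses are very large tables, which is why using indexes matters. Of course there is a regular index on addresses.cust_id. I have also created a function-based index on special_char_filter(customers.surname), which works very well.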

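The trouble

The problem is that the query above, with its LIKE clause, produces execution plans with FULL TABLE SCANS on addresses. Something in this query seems to prevent Oracle from using the index on addresses.cust_id.

Workaround

I found that this solves my problem: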

SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like ?

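I removed the (DETERMINISTIC!) function from the right-hand side of the LIKE operator and precomputed the bind value in Java. Now the query is super fast, without any FULL TABLE SCANS. This one is also very fast (though not equivalent):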

SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) = special_char_filter(?)
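A minimal sketch of the Java-side workaround described above, for illustration only. The question does not show special_char_filter, so specialCharFilter below is a hypothetical port that assumes the function upper-cases its input and drops everything except letters, digits and the LIKE wildcards % and _. A real port has to reproduce the PL/SQL function exactly; otherwise the precomputed pattern will not line up with the values stored in the function-based index. The class name SurnameSearch and the selected columns are also assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

class SurnameSearch {

    // HYPOTHETICAL port of the PL/SQL special_char_filter function (not shown in the question).
    // Assumption: it upper-cases the input and keeps only letters and digits.
    // The LIKE wildcards '%' and '_' are kept so the result still works as a pattern.
    static String specialCharFilter(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '%' || c == '_') {
                out.append(c);
            }
        }
        return out.toString();
    }

    // The function appears only on the indexed (left-hand) side of LIKE;
    // the right-hand side is a plain bind variable.
    static final String SQL =
          "SELECT cust.id, cust.surname "
        + "FROM customers cust "
        + "JOIN addresses addr ON addr.cust_id = cust.id "
        + "WHERE special_char_filter(cust.surname) LIKE ?";

    static void search(Connection conn, String userInput) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // Bind the value computed in Java instead of writing special_char_filter(?) in SQL.
            ps.setString(1, specialCharFilter(userInput));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getString(2));
                }
            }
        }
    }
}
```

With this port, the input o'Brien% would be bound as OBRIEN%. special_char_filter now appears only on the indexed side of the predicate, which is the form of the query that the question reports as fast.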

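Confusion

I don't understand this. What is wrong with using a deterministic function on the right-hand side of the LIKE operator? I have observed this in Oracle 11.2.0.1.0.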

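[Discussion]:

The Oracle version matters a lot for this kind of question. Which Oracle RDBMS version are you using?
I have observed this closely in version 11.2.0.1.0. I think it very likely also happens in 10g, but I can't formally confirm that.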

Tags: oracle sql-execution-plan sql-like deterministic

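[Solution 1]:

There may be nothing wrong with the query itself. The cost-based optimizer may simply get confused and conclude that a FULL TABLE SCAN is faster. Have you tried adding a hint to the query to force Oracle to use your index?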

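[Discussion]:

Yes, I've tried almost every hint listed here: psoug.org/reference/hints.html, without any success. The JOIN never used any index. What puzzles me is that Oracle doesn't realise it could solve the problem by evaluating special_char_filter(?) once before computing the execution plan... Technically, special_char_filter(?) is a constant.
@Lukas Eder: Have you tried /*+ use_nl(cust addr) */? None of the index hints helped me, but use_nl worked.
Hmm, I haven't tried that. But why do you think a nested loop (rather than a hash join) would force the use of the index? Note that there is also /*+ use_nl_using_index(...) */, which didn't work for me.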
Hmm, /*+ use_nl */ didn't work either. No nested loop was applied; I still get a hash join with a full table scan... It's really weird.
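To illustrate the nested loops vs. hash join question raised in these comments, here is a toy in-memory model, not Oracle's actual implementation or cost model. The hypothetical JoinSketch class counts how many ADDRESSES rows each join method touches. A nested loops join probes the inner table once per driving row, and that probe is where an index on addresses.cust_id can be used. A hash join instead builds a hash table on one input and reads the other input in full, so the cust_id index plays no part.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: counts ADDRESSES rows touched by each join method.
// It does not reproduce Oracle's optimizer or its cost calculations.
class JoinSketch {

    // Stand-in for the index on addresses.cust_id: cust_id -> row numbers.
    static Map<Integer, List<Integer>> buildIndex(int[] addressCustIds) {
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int row = 0; row < addressCustIds.length; row++) {
            index.computeIfAbsent(addressCustIds[row], k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    // Nested loops: for each customer that passed the LIKE filter, probe the
    // cust_id index. Returns {addressRowsVisited, joinedRows}.
    static int[] nestedLoops(List<Integer> matchingCustIds, Map<Integer, List<Integer>> index) {
        int visited = 0;
        for (int id : matchingCustIds) {
            visited += index.getOrDefault(id, List.of()).size();
        }
        // Every row reached through the index probe is a join match.
        return new int[] {visited, visited};
    }

    // Hash join: hash the customer ids, then read EVERY address row once and
    // probe the hash table. Returns {addressRowsVisited, joinedRows}.
    static int[] hashJoin(List<Integer> matchingCustIds, int[] addressCustIds) {
        Set<Integer> buildSide = new HashSet<>(matchingCustIds);
        int visited = 0;
        int joined = 0;
        for (int custId : addressCustIds) {
            visited++; // full scan: each row is read
            if (buildSide.contains(custId)) {
                joined++;
            }
        }
        return new int[] {visited, joined};
    }
}
```

With 1,000 address rows and a single matching customer, both methods find the same one joined row, but the nested loops version visits 1 address row while the hash join reads all 1,000. This toy model suggests why the optimizer's row estimate for the LIKE predicate matters: if it expects many customers to match, a hash join with a full scan can look cheaper on paper.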

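[Solution 2]:

The problem is that Oracle doesn't know what special_char_filter(?) will return. If it returns '%', using the index would be slow, because everything matches. If it returns 'A%', it would probably also be slow, because (assuming an even distribution of initial letters) about 4% of the rows would match. If it returns '%FRED%', it won't return many rows, but an index range scan would perform poorly: those rows could be at the beginning, middle or end of the index, so it would have to read the whole index.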
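The reasoning above comes down to how much of a sorted index a LIKE pattern lets you skip. Here is a simplified Java model: the hypothetical LikeRangeScan class takes the literal characters before the first wildcard and counts how many sorted keys a range scan would have to read. This is a sketch of the general prefix-range idea, not Oracle's implementation, and it ignores ESCAPE clauses and collation rules.

```java
import java.util.NavigableSet;

// Simplified model of an index range scan driven by a LIKE pattern.
// ESCAPE clauses and collation details are ignored.
class LikeRangeScan {

    // Literal characters before the first LIKE wildcard ('%' or '_').
    static String literalPrefix(String pattern) {
        int i = 0;
        while (i < pattern.length() && pattern.charAt(i) != '%' && pattern.charAt(i) != '_') {
            i++;
        }
        return pattern.substring(0, i);
    }

    // Number of index keys a range scan has to read. Keys sharing the literal
    // prefix are contiguous in sorted order, so the scan starts at the prefix and
    // stops at the first key that no longer starts with it. Every key read still
    // has to be checked against the full pattern afterwards.
    static int keysVisited(NavigableSet<String> sortedKeys, String pattern) {
        String prefix = literalPrefix(pattern);
        if (prefix.isEmpty()) {
            return sortedKeys.size(); // leading wildcard: no range to scan, read everything
        }
        int count = 0;
        for (String key : sortedKeys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }
}
```

With the keys ADAMS, EDER, EDERER, EDWARDS, FRED and MILLER, the pattern EDER% reads 2 keys and E_ER% reads 3 (EDWARDS is read and then rejected by the full pattern). A leading wildcard such as %FRED% leaves no usable range, so all 6 keys are read, which matches the '%FRED%' case described above.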

If you know that special_char_filter will always return a string that starts with at least three "solid" (non-wildcard) characters, you might have better luck with

SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter(?)
AND substr(special_char_filter(cust.surname),1,3) = substr(special_char_filter(?),1,3)

combined with a function-based index (FBI) on substr(special_char_filter(cust.surname),1,3).
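The extra substr(...) predicate is only safe when the first three characters of the filtered pattern are literals. For a pattern such as 'ED%', substr returns 'ED%', which never equals the first three characters of a surname like EDER ('EDE'), so matching rows would be dropped. Since the asker cannot control user input, a minimal guard might add the predicate only when it is safe. The sketch below assumes the value has already been filtered in Java, so it uses substr(?, 1, 3) and binds the same value to both placeholders; PrefixPredicateGuard and its SQL strings are hypothetical.

```java
// Hypothetical helper: adds the three-character prefix predicate only when the
// already-filtered pattern starts with three literal (non-wildcard) characters.
class PrefixPredicateGuard {

    static final String BASE_SQL =
          "SELECT cust.id, cust.surname FROM customers cust "
        + "JOIN addresses addr ON addr.cust_id = cust.id "
        + "WHERE special_char_filter(cust.surname) LIKE ?";

    // Bind the same filtered value to both placeholders.
    static final String PREFIX_SQL = BASE_SQL
        + " AND substr(special_char_filter(cust.surname), 1, 3) = substr(?, 1, 3)";

    // True only if the first three characters contain no '%' or '_'.
    static boolean firstThreeAreLiteral(String filteredPattern) {
        if (filteredPattern.length() < 3) {
            return false;
        }
        for (int i = 0; i < 3; i++) {
            char c = filteredPattern.charAt(i);
            if (c == '%' || c == '_') {
                return false;
            }
        }
        return true;
    }

    // Fall back to the plain query whenever the prefix predicate could drop matches.
    static String chooseSql(String filteredPattern) {
        return firstThreeAreLiteral(filteredPattern) ? PREFIX_SQL : BASE_SQL;
    }
}
```

The guard is deliberately conservative. Short exact patterns without wildcards would also be safe, but falling back to the plain query in those cases costs nothing in correctness.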

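That said, if precomputing the result in Java works, stick with it.

Other than that, I would probably look at Oracle Text for this kind of matching.

[Discussion]:

Thanks for your input. Unfortunately, I don't know the contents of ? in advance, because it's user input. I just assumed Oracle must have some way to evaluate that function before computing the execution plan. Oracle Text is coming soon. I hope it will solve all my problems! :-)
You are using a prepared statement, and those are meant for preparing an execution plan once and running it many times. Have you tried a plain (non-prepared) statement?
@Mat, you're right. That makes sense; I hadn't thought of it. In our architecture a plain statement is not an option, but I tested one directly against the database in Toad (where I run the execution plan analysis). It didn't change anything.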

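[Solution 3]:

The script below shows the steps I used to get an index range scan on the ADDRESSES index. You may want to just run the whole thing before looking at the details. If you don't get two index range scans for the last two queries, our versions, settings, etc. may differ. I'm using 10.2.0.1.0.

If you do see the plans you want, you may want to modify my script step by step so that it reflects your real data more closely, and try to find the exact change that makes it break. Hopefully my setup is at least close to the real thing and doesn't leave out any detail that matters for your exact problem.

This is a strange problem, and I don't understand everything that is going on here. For example, I don't know why use_nl works but the index hints don't.

(Note that my execution times are based on repeated runs. On the first run some queries may be slower because the data isn't cached yet.)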

--create tables
create table customers (id number, surname varchar2(100), other varchar2(100));
create table addresses (cust_id number, other varchar2(100));

--create data and regular indexes
insert into customers select level, 'ASDF'||level, level from dual connect by level <= 1000000;
insert into addresses select level, level from dual connect by level <= 1000000;
create index customers_id on customers(id);
create index addresses_cust_id on addresses(cust_id);

--create function (it must exist before the function-based index that calls it)
create or replace function special_char_filter(surname in varchar2) return varchar2 deterministic is
begin
    return replace(surname, 'bad value!', null);
end;
/

--create function-based index
create index customers_special_char_filter on customers(special_char_filter(surname));

--gather stats
begin
    dbms_stats.gather_table_stats(ownname => user, tabname => 'CUSTOMERS', cascade => true);
    dbms_stats.gather_table_stats(ownname => user, tabname => 'ADDRESSES', cascade => true);
end;
/

set autotrace on;

--Index range scan on CUSTOMERS_SPECIAL_CHAR_FILTER, but full table scan on ADDRESSES
--(0.2 seconds)
SELECT *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');

--This uses the addresses index but it does an index full scan.  Not really what we want.
--I'm not sure why I can't get an index range scan here.
--Various other index hints also failed here.  For example, no_index_ffs won't stop an index full scan.
--(1 second)
SELECT /*+ index(addr addresses_cust_id) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');


--Success!  With this hint both indexes are used and it's super-fast.
--(0.02 seconds)
SELECT /*+ use_nl(cust addr) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');


--But forcing the index won't always be a good idea, for example when the value starts with '%'.
--(1.2 seconds)
SELECT /*+ use_nl(cust addr) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('%ASDF100000bad value!%');

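[Discussion]:

Sorry, I didn't get any notification about your answer. It looks like a good benchmark. On the other hand, you inlined your string literals instead of using bind variables, which may behave differently because of bind variable peeking. I'm still confused by all this...
I checked your script. Interestingly, on my database the last statement is usually the fastest, even with the leading %. On the other hand, with the /*+ use_nl */ hint, the estimated row counts in the execution plan are often far off from the actual row counts...
I'll accept your answer, since it comes closest to a solution.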