MySQL InnoDB FULLTEXT search over JSON generated STORED column is slower than LIKE

【Question title】: MySQL InnoDB FULLTEXT search over JSON generated STORED column is slower than LIKE
【Posted】: 2021-11-04 18:49:54
【Question description】:

Table:

CREATE TABLE `stores` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `slug` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` json DEFAULT NULL,
  `html` mediumtext COLLATE utf8mb4_unicode_ci
       GENERATED ALWAYS AS (json_unquote(json_extract(`value`,'$.html')))
       STORED,
  PRIMARY KEY (`id`),
  KEY `slug` (`slug`),
  FULLTEXT KEY `html` (`html`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Query:

select id from `stores`  where  MATCH(stores.html) AGAINST ('forum*' IN BOOLEAN MODE)  limit 20

takes 0.14 seconds.

EXPLAIN:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	stores	NULL	fulltext	html	html	0	const	1	100.00	Using where; Ft_hints: no_ranking, limit = 20

Whereas the query:

select id from `stores`  where   stores.html like '%forum%' limit 20

takes only 0.003 seconds.

EXPLAIN:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	stores	NULL	ALL	NULL	NULL	NULL	NULL	134101	100.00	Using where

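As I remember it, when I first implemented this generated column on top of the JSON, it seemed faster than LIKE. Now that I have applied it to all fields, the site has become slower. So I started profiling simple queries and found that FULLTEXT is actually much slower!

Adding SQL_NO_CACHE after SELECT makes no difference.

What am I missing? Thanks.

【Comments】:

Remove LOWER; it may be faster.
Do you get the same results?
I started by removing LOWER, fine, but the question remains: why is the full-text search slow? As for the results being the same: with LIKE I get slightly more results, because it matches the string even when it is part of another word.
How long does this take? SELECT MAX(LENGTH(html)) FROM stores where html like '%forum%'; ?
MAX(LENGTH(html)): 75581, 1 row (0.083 s)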

Tags: mysql json full-text-search innodb generated

【Solution 1】:

This query is unusually fast:

select id from `stores`
    where stores.html like '%forum%' limit 20

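because it only scans as many rows as it needs to find 20 that contain the string. I think you will find that the following takes much longer, because it has to check every row: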

select id from `stores`
    where stores.html like '%non-existent-text%' limit 20
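One way to check how many rows a query actually read is to look at the session handler counters. The following is a minimal sketch, not something from the original thread; it assumes the account has the RELOAD privilege needed for FLUSH STATUS, and the counter values will depend on your data.

```sql
-- Reset the session status counters (assumes the RELOAD privilege).
FLUSH STATUS;

-- The query under test: a table scan that can stop after 20 matches.
SELECT id FROM `stores`
    WHERE stores.html LIKE '%forum%' LIMIT 20;

-- Handler_read_rnd_next counts rows read while scanning the table;
-- a value far below the table's row count means the scan stopped early.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Running the same three statements with '%non-existent-text%' should show the counter close to the total number of rows in the table.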

Another likely cause is that MATCH finds hundreds, perhaps thousands, of rows before the LIMIT is applied. So time this:

select id from `stores`  
    where  MATCH(stores.html) AGAINST ('qwertyui' IN BOOLEAN MODE)  limit 20
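To see how much work each predicate has to do before the LIMIT, it may help to count the matches without a LIMIT. This is only a sketch using the table and search term from the question; the counts will usually differ, because LIKE '%forum%' also matches the string inside longer words while 'forum*' only matches words that start with it.

```sql
-- Rows matched by the full-text prefix search, with no LIMIT.
SELECT COUNT(*) FROM `stores`
    WHERE MATCH(stores.html) AGAINST ('forum*' IN BOOLEAN MODE);

-- Rows matched by the substring search, with no LIMIT.
SELECT COUNT(*) FROM `stores`
    WHERE stores.html LIKE '%forum%';
```

If the first count is large, the full-text query collects many matches only to return 20 of them, which is consistent with the explanation above.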

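The bottom line is that you may have to live with this inconsistency. I believe (without hard evidence for your dataset) that MATCH will usually be faster than LIKE. Note the word "usually".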

【Comments】: