[Posted]: 2013-07-26 12:06:27
[Question]:
I have a large table like this:
CREATE TABLE IF NOT EXISTS `object_search` (
`keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
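It holds about 39 million rows (using more than 1 GB of space) of index data for the 1 million records in the object table (which object_id points to).
Right now I search it with a query like this: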
SELECT object_id, COUNT(object_id) AS hits
FROM object_search
WHERE keyword = 'woman' OR keyword = 'house'
GROUP BY object_id
HAVING hits = 2
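This is already much faster than a LIKE search on the combined keywords field in the object table, but it still takes up to 1 minute.
The EXPLAIN output looks like this: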
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | search | ref | PRIMARY | PRIMARY | 42 | const | 345180 | 100.00 | Using where; Using index |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
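For comparison, the "object has both keywords" lookup above can also be written as a self-join instead of GROUP BY ... HAVING. The following is only a sketch under assumptions: it uses a scratch table named object_search_demo (a hypothetical name) with the same two columns and primary key, and the sample rows are made up. Whether it is faster on the real 39-million-row table has not been measured here.

```sql
-- Hypothetical scratch table mirroring object_search (name and rows are illustrative).
CREATE TABLE object_search_demo (
  keyword   VARCHAR(40)  NOT NULL,
  object_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (keyword, object_id)
) ENGINE=InnoDB;

INSERT INTO object_search_demo (keyword, object_id) VALUES
  ('woman', 1), ('house', 1),   -- object 1 has both keywords
  ('woman', 2),                 -- object 2 has only 'woman'
  ('house', 3), ('garden', 3);  -- object 3 has only 'house'

-- One alias per required keyword. Both aliases can be resolved through the
-- primary key (keyword, object_id); the join keeps only the object_ids
-- that appear under both keywords.
SELECT w.object_id
FROM object_search_demo AS w
INNER JOIN object_search_demo AS h
        ON h.object_id = w.object_id
WHERE w.keyword = 'woman'
  AND h.keyword = 'house';
```

With the sample rows, only object_id 1 qualifies. Each additional required keyword would need one more alias and join, so this form is mainly of interest for a small, fixed number of keywords.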
The full EXPLAIN, with joins to the object, object_color and object_locale tables and with the query above running in a subquery to avoid overhead, looks like this:
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 182544 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | search.object_id | 1 | 100.00 | |
| 2 | DERIVED | search | ref | PRIMARY | PRIMARY | 42 | | 345180 | 100.00 | Using where; Using index |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
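My primary goal is to be able to complete the scan within 1 or 2 seconds.
So, are there other techniques that would speed up the keyword search?
Update 2013-08-06:
Applying most of Neville K's suggestions, I now have the following setup: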
CREATE TABLE `object_search_keyword` (
`keyword_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyword` varchar(64) COLLATE latin1_german1_ci NOT NULL,
PRIMARY KEY (`keyword_id`),
FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
CREATE TABLE `object_search` (
`keyword_id` int(10) unsigned NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword_id`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The EXPLAIN for the new query looks like this:
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 2 | DERIVED | <derived4> | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DERIVED | object_keyword | fulltext | PRIMARY,keyword_ft | keyword_ft | 0 | | 1 | 100.00 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | object_search | ref | PRIMARY | PRIMARY | 4 | object_keyword.keyword_id | 2190225 | 100.00 | Using index |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
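The many derived tables come from the keyword comparison subquery being nested inside another subquery that does nothing but count the rows returned: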
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
GROUP BY search.object_id HAVING hits = 2
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS turbo
INNER JOIN object AS object ON (turbo.object_id = object.object_id)
LEFT JOIN object_color AS object_color ON (turbo.object_id = object_color.object_id)
LEFT JOIN object_locale AS locale ON (turbo.object_id = locale.object_id)
ORDER BY timestamp_upload DESC
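The query above actually runs in about 6 seconds, since it searches for two keywords. The more keywords I search for, the further the search time drops.
Is there any way to optimize this further?
Update 2013-08-07
The blocking part almost certainly seems to be the appended ORDER BY clause. Without it, the query executes in less than a second.
So, is there any way to sort the results faster? Any suggestions are welcome, even hacky ones that require post-processing somewhere else.
Update later the same day, 2013-08-07
OK ladies and gentlemen, nesting the WHERE and ORDER BY clauses in another layer of subquery, so that they do not touch tables they do not need, roughly doubled the performance again: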
SELECT wowrapper.*, locale.title
FROM (
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(frau)')
GROUP BY search.object_id HAVING hits = 1
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS search
INNER JOIN object AS object ON (search.object_id = object.object_id)
LEFT JOIN object_color AS color ON (search.object_id = color.object_id)
WHERE 1
ORDER BY object.object_id DESC
) AS wowrapper
LEFT JOIN object_locale AS locale ON (wowrapper.object_id = locale.object_id)
LIMIT 0,48
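The search that took 12 seconds (single keyword, about 200K results) now takes 6 seconds, and the two-keyword search that took 6 seconds (60K results) now takes about 3.5 seconds.
This is already a huge improvement, but is there any chance of pushing it further?
Update earlier in the day, 2013-08-08
I undid the last nested variant of the query, because it actually slowed down other variants of it...
I am now trying something else with a different table layout and a FULLTEXT index on MyISAM, using a dedicated search table with a combined keywords field (comma-separated in a TEXT field).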
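A minimal sketch of what such a dedicated full-text table could look like. The table name object_search_ft, the column name keywords and the sample rows are assumptions for illustration, not the layout that was actually tested.

```sql
-- Hypothetical dedicated search table: one row per object,
-- all of its keywords comma-separated in a single TEXT column.
CREATE TABLE object_search_ft (
  object_id INT UNSIGNED NOT NULL,
  keywords  TEXT NOT NULL,
  PRIMARY KEY (object_id),
  FULLTEXT KEY keywords_ft (keywords)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

-- Illustrative rows only.
INSERT INTO object_search_ft (object_id, keywords) VALUES
  (1, 'woman,house,garden'),
  (2, 'woman,street');

-- Boolean mode: a leading + marks a word as mandatory,
-- so a row must contain both words to match.
SELECT object_id
FROM object_search_ft
WHERE MATCH (keywords) AGAINST ('+woman +house' IN BOOLEAN MODE);
```

In this layout the "all keywords present" condition is expressed in the full-text query itself, so no GROUP BY ... HAVING is needed for it.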
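Update 2013-08-08
Well, a plain full-text index did not really help.
Back to the previous setup, the only thing blocking is the ORDER BY (which uses a temporary table and a filesort). Without it, the search completes in less than a second!
So basically what is left is:
How can I optimize the ORDER BY clause to run faster, possibly by eliminating the use of the temporary table?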
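One direction that might be worth measuring, sketched under assumptions: sort and limit the narrow list of matching object_ids first, then join the wide tables only for the rows that will actually be displayed. This assumes that ordering by object_id is acceptable (as in the ORDER BY object.object_id DESC variant above) and that the total row count can be obtained separately. The alias page is made up, and the title column and the 48-row page size are taken from that earlier variant. Whether this removes the temporary table depends on the optimizer and has not been measured here.

```sql
SELECT object.object_id, locale.title
FROM (
    -- Narrow inner query: only object_id is grouped, sorted and limited,
    -- so any temporary table or filesort works on small rows.
    SELECT search.object_id
    FROM object_keyword AS kwd
    INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
    WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
    GROUP BY search.object_id
    HAVING COUNT(search.object_id) = 2
    ORDER BY search.object_id DESC
    LIMIT 0, 48
) AS page
-- The wide tables are joined for at most 48 rows.
INNER JOIN object AS object ON (page.object_id = object.object_id)
LEFT JOIN object_color AS color ON (page.object_id = color.object_id)
LEFT JOIN object_locale AS locale ON (page.object_id = locale.object_id)
-- Re-state the order, since a derived table's row order is not guaranteed to survive the joins.
ORDER BY object.object_id DESC;
```

The trade-off is that the inner LIMIT discards the @rn row counter used in the earlier variants, so the total hit count would need its own query.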
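[Comments]:
- Can you post the EXPLAIN output for the query?
- I just ran OPTIMIZE TABLE, and now it takes between 10 and 30 seconds.
- It is hard to answer your updated question because you are not really comparing apples to apples: does the query you posted in the original question run faster or slower with the schema change and the full-text search?
- @NevilleK It runs faster, and it still appears as the innermost subquery in the updated question.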
Tags: mysql search indexing keyword-search