MySQL query with ORDER BY takes a long time to execute - answers

[Question title]: MySQL query with ORDER BY takes a long time to execute
[Posted]: 2020-05-05 15:37:47
[Question]:

I have a table named "response_set" with the following indexes (output of "show create table response_set;"):

| response_set | CREATE TABLE `response_set` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `survey_id` int(11) NOT NULL DEFAULT '0',
  `respondent_id` int(11) DEFAULT NULL,
  `ext_ref` varchar(64) DEFAULT NULL,
  `email_addr` varchar(128) DEFAULT NULL,
  `ip` varchar(32) DEFAULT NULL,
  `t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `time_taken` int(11) DEFAULT NULL,
  `category_id` int(11) DEFAULT NULL,
  `duplicate` int(1) DEFAULT '0',
  `email_group` varchar(30) DEFAULT NULL,
  `external_email_id` int(11) DEFAULT NULL,
  `geo_code_country` varchar(64) DEFAULT NULL,
  `geo_code_country_code` varchar(2) DEFAULT NULL,
  `terminated_survey` int(1) DEFAULT NULL,
  `geo_code_region` varchar(128) DEFAULT NULL,
  `geo_code_city` varchar(3) DEFAULT NULL,
  `geo_code_area_code` varchar(3) DEFAULT NULL,
  `geo_code_dma_code` varchar(3) DEFAULT NULL,
  `restart_url` varchar(255) DEFAULT NULL,
  `inset_list` varchar(1024) DEFAULT NULL,
  `custom1` varchar(1024) DEFAULT NULL,
  `custom2` varchar(1024) DEFAULT NULL,
  `custom3` varchar(1024) DEFAULT NULL,
  `custom4` varchar(1024) DEFAULT NULL,
  `panel_member_id` int(11) DEFAULT NULL,
  `external_id` int(11) DEFAULT NULL,
  `weight` float DEFAULT NULL,
  `custom5` varchar(1024) DEFAULT NULL,
  `quota_overlimit` int(1) DEFAULT '0',
  `panel_id` int(11) DEFAULT NULL,
  `referer_url` varchar(255) DEFAULT NULL,
  `referer_domain` varchar(64) DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  `longitude` decimal(15,12) DEFAULT '0.000000000000',
  `latitude` decimal(15,12) DEFAULT '0.000000000000',
  `radius` decimal(7,2) DEFAULT '0.00',
  `cx_business_unit_id` int(11) DEFAULT '0',
  `survey_link_id` int(11) DEFAULT '0',
  `data_quality_flag` int(1) DEFAULT '0',
  `data_quality_score` double DEFAULT '0',
  `extended_info_json` json DEFAULT NULL,
  `updated_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `channel` int(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `panel_member_id` (`panel_member_id`),
  KEY `panel_member_id_2` (`panel_member_id`),
  KEY `email_group` (`email_group`),
  KEY `email_group_2` (`email_group`),
  KEY `survey_timestamp_idx` (`survey_id`,`t`),
  KEY `cx_business_unit_id_idx` (`cx_business_unit_id`),
  KEY `data_quality_flag_idx` (`data_quality_flag`),
  KEY `data_quality_score_idx` (`data_quality_score`),
  KEY `survey_timestamp_terminated_idx` (`survey_id`,`t`,`terminated_survey`),
  KEY `survey_idx` (`survey_id`)
) ENGINE=InnoDB AUTO_INCREMENT=39759 DEFAULT CHARSET=utf8 |

On one page I run the following query to retrieve the response_set rows for a given survey_id, ordered by id:

SELECT * 
FROM response_set a 
WHERE a.survey_id = 1602673827 
ORDER BY a.id limit 100;

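The problem is that the query sometimes takes more than 30 seconds to execute, and the behavior is inconsistent across survey_ids. Sometimes it happens when ordering by a.id and sometimes when ordering by a.id DESC, because users can view the response sets in ascending or descending order on the page.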

The table has about 6.2 million rows, and 45,800 of them belong to the given survey_id (1602673827). Running EXPLAIN SELECT to see the query execution plan gives the following:

+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys                                        | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | index | survey_timestamp_idx,survey_timestamp_terminated_idx | PRIMARY | 4       | NULL | 6863 |     1.46 | Using where |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

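What I can't understand is why MySQL does not use an index and chooses a full table scan instead, even though the indexes survey_timestamp_idx and survey_timestamp_terminated_idx exist. Also, when I modify the query as follows: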

SELECT * 
FROM response_set a USE INDEX (survey_timestamp_idx) 
WHERE a.survey_id = 1602673827 
ORDER BY a.id  limit 100;

the execution time drops to 0.17 seconds. Running EXPLAIN on the modified query gives the following:

+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys        | key                  | key_len | ref   | rows  | filtered | Extra                                 |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
|  1 | SIMPLE      | a     | NULL       | ref  | survey_timestamp_idx | survey_timestamp_idx | 4       | const | 87790 |   100.00 | Using index condition; Using filesort |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

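However, I don't want to put USE INDEX explicitly in the query. The WHERE clause is dynamic, and depending on the filters the user selects, it can contain any of the following combinations: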

1. where survey_id = ?;
2. where survey_id = ? and t = ?; (t is a timestamp)
3. where survey_id = ? and terminated_survey = ?;
4. where survey_id = ? and t = ? and terminated_survey = ?;

Also, if I remove the ORDER BY clause, the query always uses the index and runs very fast.

Is there another way to make the MySQL optimizer choose the correct (faster) execution plan, that is, use the right index, when the query has an ORDER BY clause?

I am using MySQL version 5.7.22.

I have read the official MySQL documentation on ORDER BY optimization (https://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html). I also tried adding composite indexes on (id, survey_id) and on (survey_id, id), but neither helped. Can anyone help?

[Comments]:

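All of the query cases start with the condition WHERE survey_id = ?, so you could use the index hint USE INDEX (survey_idx) to improve performance.
You could also partition the table based on the survey_idx key. dev.mysql.com/doc/mysql-partitioning-excerpt/5.7/en/…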

Tags: mysql database sql-order-by query-performance

[Solution 1]:

where survey_id = ?;
where survey_id = ? and t = ?; (t is a timestamp)
where survey_id = ? and terminated_survey = ?;
where survey_id = ? and t = ? and terminated_survey = ?;

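Assuming you have ORDER BY id ASC (or DESC), you need 4 indexes to handle all of these optimally. Each one should start with the 1, 2 or 3 columns mentioned in the WHERE (in any order) and end with id.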
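Here is a minimal sketch of those four indexes. It assumes every filter is an equality test, as in the four WHERE variants above. The index names are hypothetical. InnoDB secondary indexes already end with the primary key implicitly, so some existing indexes may already behave like these. Checking EXPLAIN before adding all four is worthwhile.

```sql
-- Hypothetical index names. Each index starts with the '=' columns of one
-- WHERE variant and ends with id. For a given set of '=' values, the entries
-- are then already in id order, so MySQL can scan forward (ASC) or backward
-- (DESC) and stop after LIMIT 100 instead of sorting every matching row.
ALTER TABLE response_set
  ADD INDEX survey_id_id_idx      (survey_id, id),                     -- variant 1
  ADD INDEX survey_t_id_idx       (survey_id, t, id),                  -- variant 2
  ADD INDEX survey_term_id_idx    (survey_id, terminated_survey, id),  -- variant 3
  ADD INDEX survey_t_term_id_idx  (survey_id, t, terminated_survey, id); -- variant 4
```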

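I can't explain why KEY survey_idx (survey_id) was not used for the query in question, or why that index does not even appear under "possible_keys" in the EXPLAIN output. It is as if something changed between running the query and posting this question. Please recheck.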
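One way to do that recheck is sketched below. ANALYZE TABLE is my addition, not part of the answer. On InnoDB it recalculates the sampled index statistics that the optimizer uses for its cost estimates.

```sql
-- 1. Confirm which indexes exist right now (survey_idx should be listed).
SHOW INDEX FROM response_set;

-- 2. Refresh the index statistics the optimizer relies on.
ANALYZE TABLE response_set;

-- 3. Re-run the plan and compare possible_keys / key with the question's output.
EXPLAIN SELECT *
FROM response_set a
WHERE a.survey_id = 1602673827
ORDER BY a.id
LIMIT 100;
```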

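By the way, INT(1) still takes 4 bytes. You probably want the 1-byte TINYINT UNSIGNED. Many other columns are also larger than necessary, and row size has at least some impact on performance.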
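Here is a sketch of the TINYINT UNSIGNED change for the INT(1) columns in this table. It assumes every stored value is between 0 and 255. The (1) in INT(1) is only a display width and does not limit the values, so verify the actual data first. Changing a column's data type rewrites the whole table, which matters at 6.2 million rows.

```sql
-- Assumption: all existing values in these columns fit in 0..255.
-- MODIFY replaces the full column definition, so the original defaults are restated.
ALTER TABLE response_set
  MODIFY `duplicate`         TINYINT UNSIGNED DEFAULT 0,
  MODIFY `terminated_survey` TINYINT UNSIGNED DEFAULT NULL,
  MODIFY `quota_overlimit`   TINYINT UNSIGNED DEFAULT 0,
  MODIFY `data_quality_flag` TINYINT UNSIGNED DEFAULT 0,
  MODIFY `channel`           TINYINT UNSIGNED DEFAULT 0;
```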

About the 0.17s: using FORCE INDEX(survey_idx) might be even faster.
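Here is a sketch of the FORCE INDEX variant of the question's query. survey_idx is declared on (survey_id) alone, but an InnoDB secondary index implicitly carries the primary key. Its entries are therefore ordered by (survey_id, id). That is likely why it can beat survey_timestamp_idx, which inserts t before id and so forced the filesort seen above. Whether the filesort actually disappears should be confirmed with EXPLAIN on the real data.

```sql
-- survey_idx entries are effectively (survey_id, id) because InnoDB appends the PK,
-- so for one survey_id the rows come back in id order (ASC: forward scan,
-- DESC: backward scan) and MySQL can stop after 100 rows without a filesort.
SELECT *
FROM response_set a FORCE INDEX (survey_idx)
WHERE a.survey_id = 1602673827
ORDER BY a.id
LIMIT 100;
```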

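Starting an index with the PRIMARY KEY column, as in (id, survey_id), is almost always useless. An index should start with the columns tested with =, then move on to the column tested as a range, by GROUP BY, or (as in your case) by ORDER BY.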

Index cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

[Comments]: