【Posted】: 2013-12-17 11:31:42
【Question】:
I have a simple table that currently holds about 10 million rows. This is its definition:
CREATE TABLE `train_run_messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`train_id` int(10) unsigned NOT NULL,
`customer_id` int(10) unsigned NOT NULL,
`station_id` int(10) unsigned NOT NULL,
`train_run_id` int(10) unsigned NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` tinyint(4) NOT NULL,
`customer_station_track_id` int(10) unsigned DEFAULT NULL,
`lateness_type` tinyint(3) unsigned NOT NULL,
`lateness_amount` mediumint(9) NOT NULL,
`lateness_code` tinyint(3) unsigned DEFAULT '0',
`info_text` varchar(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `timestamp` (`timestamp`),
KEY `lateness_amount` (`lateness_amount`),
KEY `customer_timestamp` (`customer_id`,`timestamp`),
KEY `trm_customer` (`customer_id`),
KEY `trm_train` (`train_id`),
KEY `trm_station` (`station_id`),
KEY `trm_trainrun` (`train_run_id`),
KEY `FI_trm_customer_station_tracks` (`customer_station_track_id`),
CONSTRAINT `FK_trm_customer_station_tracks` FOREIGN KEY (`customer_station_track_id`) REFERENCES `customer_station_tracks` (`id`),
CONSTRAINT `trm_customer` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_station` FOREIGN KEY (`station_id`) REFERENCES `stations` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_train` FOREIGN KEY (`train_id`) REFERENCES `trains` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_trainrun` FOREIGN KEY (`train_run_id`) REFERENCES `train_runs` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=9928724 DEFAULT CHARSET=utf8;
We have a lot of queries that filter by customer_id and timestamp, so we created a composite index for that.
Now I have this simple query:
SELECT * FROM `train_run_messages` WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100
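On our machine, which currently holds about 10M entries, this query takes about 16 seconds. That seems far too long to me, given that there is an index for exactly this kind of query.
Let's look at the EXPLAIN output for this query: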
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
| 1 | SIMPLE | train_run_messages | ref | timestamp,customer_timestamp,trm_customer | customer_timestamp | 4 | const | 551405 | Using where |
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
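So MySQL tells me that it will use the customer_timestamp index, fine! Why does the query still take about 16 seconds? Since I don't always trust the MySQL query analyzer, let's try it with an explicit index hint (USE INDEX):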
SELECT * FROM `train_run_messages` USE INDEX (customer_timestamp) WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100
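Query time: 0.079s!!
Me: puzzled!
So can anybody explain why MySQL is apparently not using the index that the EXPLAIN output says it will use? And is there any way to prove which index it really uses when it executes the actual query?
By the way, here is the output from the slow query log: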
# Time: 131217 11:18:04
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 16.252878 Lock_time: 0.000168 Rows_sent: 100 Rows_examined: 9830711
SET timestamp=1387275484;
SELECT * FROM `train_run_messages` WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100;
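Although the log does not explicitly say that no index is used, Rows_examined (9,830,711, almost every row in the table) suggests that it performs a full table scan.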
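One way to check which access path the real query takes, as opposed to what EXPLAIN predicts, is to compare the session's Handler_read% counters before and after the statement. This is a minimal sketch, assuming a session with the RELOAD privilege (required for FLUSH STATUS); the interpretation in the comments is a general guideline, not output captured from this server.

```sql
-- Reset the session status counters so that only the next statement is measured.
FLUSH STATUS;

-- The slow query, unchanged.
SELECT * FROM `train_run_messages`
WHERE `customer_id` = '5'
  AND `timestamp` >= '2013-12-01 00:00:57'
  AND `timestamp` <= '2013-12-31 23:59:59'
LIMIT 0, 100;

-- Handler_read_key / Handler_read_next increasing: rows were fetched through an index.
-- Handler_read_rnd_next increasing: rows were fetched by scanning the table data.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Running the same three steps with the USE INDEX variant of the query should make the difference between the two executions visible in the counters.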
So can this be fixed without USE INDEX? We use Propel as our ORM, and there is currently no way to emit the MySQL-specific "USE INDEX" without writing the query by hand.
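One thing that can be tried without touching the query text is refreshing the index statistics the optimizer works from and then re-checking the plan. This is only a sketch of a diagnostic step, not a confirmed fix for this case; whether the plan changes depends on the server version and the data.

```sql
-- Recompute the key distribution statistics for the table.
ANALYZE TABLE `train_run_messages`;

-- Inspect the estimated cardinality of each index column.
SHOW INDEX FROM `train_run_messages`;

-- Re-check which plan the optimizer picks for the unhinted query.
EXPLAIN SELECT * FROM `train_run_messages`
WHERE `customer_id` = '5'
  AND `timestamp` >= '2013-12-01 00:00:57'
  AND `timestamp` <= '2013-12-31 23:59:59'
LIMIT 0, 100;
```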
Edit: Here is the EXPLAIN output with USE INDEX:
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
| 1 | SIMPLE | train_run_messages | range | customer_timestamp | customer_timestamp | 8 | NULL | 191264 | Using where |
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
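The two plans name the same index but differ in type and key_len, and that difference is worth reading closely. The comments below are an interpretation of the EXPLAIN output shown in this post, based on the storage sizes of the indexed columns (an unsigned INT and a TIMESTAMP without fractional seconds each take 4 bytes), not something confirmed on this server.

```sql
-- Without a hint the plan reports type=ref, ref=const, key_len=4.
-- 4 bytes covers only customer_id, so the index is used for the equality
-- lookup alone and the timestamp range is applied afterwards ("Using where").
-- With a single customer in the table, that would be consistent with the
-- nearly 10 million rows examined in the slow log.
EXPLAIN SELECT * FROM `train_run_messages`
WHERE `customer_id` = '5'
  AND `timestamp` >= '2013-12-01 00:00:57'
  AND `timestamp` <= '2013-12-31 23:59:59'
LIMIT 0, 100;

-- With the hint the plan reports type=range, key_len=8.
-- 8 bytes covers customer_id (4) plus timestamp (4), so both columns of the
-- composite index are used to bound the scan.
EXPLAIN SELECT * FROM `train_run_messages` USE INDEX (customer_timestamp)
WHERE `customer_id` = '5'
  AND `timestamp` >= '2013-12-01 00:00:57'
  AND `timestamp` <= '2013-12-31 23:59:59'
LIMIT 0, 100;
```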
【Comments】:
-
How many distinct customer IDs are there?
-
The train_run_messages table only contains entries with customer_id 5. (The system is designed for multiple customers, but this database holds only one.)
-
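In that case it will ignore the index on customer_id (as a rule of thumb, an index that does not narrow the records down to less than about 1/3 is ignored). However, I would expect the timestamp to narrow things down.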
-
Yes, the timestamp narrows it down a lot: the 10 million records are spread more or less evenly from 2009 until now.
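The figures discussed in these comments (number of distinct customers, spread over the years, share of rows in the queried range) can be measured directly. This is a sketch using only the table and columns from the question; each statement reads a large part of the table, so on 10 million rows they may take a while.

```sql
-- How selective is customer_id? One distinct value means the column filters nothing.
SELECT COUNT(DISTINCT `customer_id`) AS distinct_customers,
       COUNT(*) AS total_rows
FROM `train_run_messages`;

-- How are the rows distributed over time?
SELECT YEAR(`timestamp`) AS yr, COUNT(*) AS row_count
FROM `train_run_messages`
GROUP BY YEAR(`timestamp`)
ORDER BY yr;

-- What fraction of all rows falls into the queried range?
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT SUM(`timestamp` >= '2013-12-01 00:00:57'
           AND `timestamp` <= '2013-12-31 23:59:59') / COUNT(*) AS fraction_in_range
FROM `train_run_messages`;
```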