[Posted]: 2016-12-05 11:30:33
[Question]:
I have a table, clicks:
CREATE TABLE `clicks` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`link_id` int(11) NOT NULL,
`date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `link_id` (`link_id`),
KEY `date_added` (`date_added`)
) ENGINE=InnoDB AUTO_INCREMENT=90899051 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
It has the following indexes:
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| clicks |          0 | PRIMARY    |            1 | id          | A         |    79808649 |     NULL | NULL   |      | BTREE      |         |               |
| clicks |          1 | link_id    |            1 | link_id     | A         |      276154 |     NULL | NULL   |      | BTREE      |         |               |
| clicks |          1 | date_added |            1 | date_added  | A         |    79808649 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I am trying to run some analytical queries against this table, but they take a very long time to run. Take the following query as an example:
SELECT
DISTINCT(link_id) AS link_id
FROM
clicks
WHERE
date_added >= '2016-11-01 00:00:00'
AND date_added <= '2016-12-05 10:16:00'
This query takes almost a minute to complete. Running EXPLAIN on it showed me that the date_added index is not being used.
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | clicks | index | date_added    | link_id | 4       | NULL | 79786609 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
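I expected the query to use the index on the date_added column to filter the result set, and then extract the distinct link_id values from the result.
Does anyone know why the index is not being used, or what I can do to force it?
Note: this question is part of a larger problem and is closely related to an unresolved question I posted last week: MySQL query with JOIN not using INDEX
Edit
EXPLAIN for my query, without any index hints: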
EXPLAIN SELECT DISTINCT(link_id) FROM clicks WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | clicks | index | date_added    | link_id | 4       | NULL | 79816660 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+----------+-------------+
EXPLAIN for my query, with index hints:
EXPLAIN SELECT DISTINCT(link_id) FROM clicks USE INDEX(date_added) IGNORE INDEX(link_id) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+--------+------+---------------+------+---------+------+----------+------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra                        |
+----+-------------+--------+------+---------------+------+---------+------+----------+------------------------------+
|  1 | SIMPLE      | clicks | ALL  | date_added    | NULL | NULL    | NULL | 79816882 | Using where; Using temporary |
+----+-------------+--------+------+---------------+------+---------+------+----------+------------------------------+
Edit 2
Using FORCE INDEX(date_added) in my query (the query completes faster, in 12.05 seconds):
EXPLAIN SELECT DISTINCT(link_id) FROM clicks FORCE INDEX(date_added) WHERE date_added >= '2016-11-01 00:00:00' AND date_added <= '2016-12-05 23:59:59';
+----+-------------+--------+-------+---------------+------------+---------+------+----------+------------------------------+
| id | select_type | table  | type  | possible_keys | key        | key_len | ref  | rows     | Extra                        |
+----+-------------+--------+-------+---------------+------------+---------+------+----------+------------------------------+
|  1 | SIMPLE      | clicks | range | date_added    | date_added | 4       | NULL | 17277508 | Using where; Using temporary |
+----+-------------+--------+-------+---------------+------------+---------+------+----------+------------------------------+
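[Comments]:
-
Have you analyzed the table?
-
What do you get if you replace DISTINCT(link_id) with count(*)?
-
@DuduMarkovitz The EXPLAIN output then shows "Using index", which is what I want, and I get the result in about 3.5 seconds.
-
"Using index" means a covering index. If you use DISTINCT, you need to create a multi-column index for it. Reason: for count(*) MySQL does not need to retrieve the actual field values; for DISTINCT it does.
-
What percentage of the table falls within the date range? (This relates to one of your questions.)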
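As a rough sketch of what the last two comments point at, assuming the MySQL table from the question: the first statement measures how much of the table falls within the date range, and the second adds a multi-column index so that link_id can be read from the index itself. The index name date_added_link_id is hypothetical, and none of this has been run against the asker's data. Going only by the optimizer's row estimates in the EXPLAIN output above (17277508 of about 79.8 million rows), the range would cover roughly 22% of the table, but those are estimates, not exact counts.

```sql
-- 1) Share of rows inside the date range, as a percentage.
--    Both subqueries can be answered from existing indexes.
SELECT
  (SELECT COUNT(*)
     FROM clicks
    WHERE date_added >= '2016-11-01 00:00:00'
      AND date_added <= '2016-12-05 23:59:59') * 100.0
  / (SELECT COUNT(*) FROM clicks) AS pct_in_range;

-- 2) Multi-column index (hypothetical name). date_added comes first so
--    MySQL can seek to the range; link_id comes second so its values
--    are available without visiting the table rows.
ALTER TABLE clicks
  ADD INDEX date_added_link_id (date_added, link_id);

-- 3) Re-check the plan. If the new index covers the query, the Extra
--    column should include "Using index" (an expectation, not a
--    result verified on this table).
EXPLAIN
SELECT DISTINCT link_id
FROM clicks
WHERE date_added >= '2016-11-01 00:00:00'
  AND date_added <= '2016-12-05 23:59:59';
```

Building an index on a table of this size can take a while, so it may be worth trying on a copy first.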
Tags: mysql sql indexing database-performance query-performance