mysql - group by indexed columns + where by indexed column caused speed decrease (answers)

【Question title】: mysql - group by indexed columns + where by indexed column caused speed decrease
【Posted】: 2012-01-08 14:08:45
【Question】:

I have a table, statistics, with the following structure:

+-------------------+----------------------+------+-----+---------+----------------+
| Field             | Type                 | Null | Key | Default | Extra          |
+-------------------+----------------------+------+-----+---------+----------------+
| id                | int(11)              | NO   | PRI | NULL    | auto_increment |
| created_at        | datetime             | YES  | MUL | NULL    |                |
| year_in_tz        | smallint(5) unsigned | YES  | MUL | NULL    |                |
| month_in_tz       | tinyint(3) unsigned  | YES  | MUL | NULL    |                |
+-------------------+----------------------+------+-----+---------+----------------+

It has indexes on created_at, year_in_tz, month_in_tz, and (year_in_tz, month_in_tz):

 ALTER TABLE `statistics` ADD INDEX created_at (created_at);
 alter table statistics add index year_in_tz (year_in_tz);
 alter table statistics add index month_in_tz (month_in_tz);
 alter table statistics add index year_month_in_tz(year_in_tz,month_in_tz);

Some example queries:

mysql> SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz 
       FROM `statistics` 
       GROUP BY year_in_tz, month_in_tz;
+-----------+------------+-------------+
| count_all | year_in_tz | month_in_tz |
+-----------+------------+-------------+
|    467890 |       2011 |          11 |
|   7339389 |       2011 |          12 |
+-----------+------------+-------------+
2 rows in set (5.04 sec)  

mysql> describe SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz FROM `statistics` GROUP BY year_in_tz, month_in_tz;
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 | id | select_type | table              | type  | possible_keys | key              | key_len | ref  | rows    | Extra       |
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 |  1 | SIMPLE      | statistics         | index | NULL          | year_month_in_tz | 5       | NULL | 7797984 | Using index |
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 1 row in set (0.01 sec)

 mysql> SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz 
        FROM `statistics` 
        WHERE (created_at BETWEEN '2011-10-31 20:00:00' AND '2011-12-31 19:59:59') 
        GROUP BY year_in_tz, month_in_tz;
 +-----------+------------+-------------+
 | count_all | year_in_tz | month_in_tz |
 +-----------+------------+-------------+
 |    467890 |       2011 |          11 |
 |   7339389 |       2011 |          12 |
 +-----------+------------+-------------+
 2 rows in set (1 min 33.46 sec)

 mysql> describe SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz FROM `statistics` WHERE (created_at BETWEEN '2011-10-31 20:00:00' AND '2011-12-31 19:59:59') GROUP BY year_in_tz, month_in_tz;
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 | id | select_type | table              | type  | possible_keys | key              | key_len | ref  | rows    | Extra       |
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 |  1 | SIMPLE      | statistics         | index | created_at    | year_month_in_tz | 5       | NULL | 7797984 | Using where |
 +----+-------------+--------------------+-------+---------------+------------------+---------+------+---------+-------------+
 1 row in set (0.07 sec)

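So if I combine a WHERE clause on an indexed column with a GROUP BY on indexed columns, the query becomes extremely slow: 5.04 sec without the WHERE versus 1 min 33.46 sec with it. Does anyone know how to make the last query faster?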
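The BETWEEN bounds in the slow query line up with the local-time boundaries of November and December 2011 if year_in_tz/month_in_tz are derived from a UTC created_at at a fixed offset of UTC+4. That offset is an assumption inferred from the numbers; the post never names the time zone. If it holds, the WHERE clause keeps exactly the rows the two groups already contain, which is consistent with both queries returning the same 467890 and 7339389 counts. A small Python sketch of the boundary arithmetic (the function name utc_bounds_for_local_months is hypothetical):

```python
from datetime import datetime, timedelta

# ASSUMPTION: created_at is stored in UTC and year_in_tz/month_in_tz use a
# fixed offset of UTC+4. The post does not say which time zone is used.
TZ_OFFSET = timedelta(hours=4)


def utc_bounds_for_local_months(start_year, start_month, end_year, end_month,
                                offset=TZ_OFFSET):
    """Return inclusive (lo, hi) UTC strings covering whole local months."""
    local_start = datetime(start_year, start_month, 1)
    # First instant of the month after end_month, in local time.
    if end_month == 12:
        local_next = datetime(end_year + 1, 1, 1)
    else:
        local_next = datetime(end_year, end_month + 1, 1)
    lo = local_start - offset
    # BETWEEN is inclusive, so stop one second before the next local month.
    hi = local_next - offset - timedelta(seconds=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    return lo.strftime(fmt), hi.strftime(fmt)


print(utc_bounds_for_local_months(2011, 11, 2011, 12))
# ('2011-10-31 20:00:00', '2011-12-31 19:59:59')
```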

P.S. After experimenting with indexes, I found that a new index on (created_at, year_in_tz, month_in_tz) makes the query run faster, but I want each query to take 0-1 seconds, not 10:

alter table statistics add index created_at_with_year_and_month_in_tz (created_at,year_in_tz,month_in_tz);

mysql> describe SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz FROM `statistics` WHERE (created_at BETWEEN '2011-10-31 20:00:00' AND '2011-12-31 19:59:59') GROUP BY year_in_tz, month_in_tz;
+----+-------------+--------------------+-------+-------------------------------------------------+--------------------------------------+---------+------+---------+-----------------------------------------------------------+
| id | select_type | table              | type  | possible_keys                                   | key                                  | key_len | ref  | rows    | Extra                                                     |
+----+-------------+--------------------+-------+-------------------------------------------------+--------------------------------------+---------+------+---------+-----------------------------------------------------------+
|  1 | SIMPLE      | statistics         | range | created_at,created_at_with_year_and_month_in_tz | created_at_with_year_and_month_in_tz | 9       | NULL | 3612208 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+--------------------+-------+-------------------------------------------------+--------------------------------------+---------+------+---------+-----------------------------------------------------------+

1 row in set (0.05 sec)

mysql> SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz
       FROM `statistics`
       WHERE (created_at BETWEEN '2011-10-31 20:00:00' AND '2011-12-31 19:59:59')
       GROUP BY year_in_tz, month_in_tz;
+-----------+------------+-------------+
| count_all | year_in_tz | month_in_tz |
+-----------+------------+-------------+
|    467890 |       2011 |          11 |
|   7339389 |       2011 |          12 |
+-----------+------------+-------------+
2 rows in set (10.62 sec)
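The new index helps because it contains every column the query reads: created_at for the WHERE, and year_in_tz and month_in_tz for the GROUP BY. MySQL can therefore answer the range from the index alone ("Using index"). Because the index is ordered by created_at first, though, it still has to build the groups separately ("Using temporary; Using filesort"). The sketch below reproduces the covering-index part on a toy scale. It uses Python's built-in sqlite3 module as a stand-in for MySQL; SQLite's planner and EXPLAIN QUERY PLAN output differ from MySQL's, and the sample rows are made up.

```python
import sqlite3

# Toy SQLite stand-in for the MySQL table; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE statistics (
    id INTEGER PRIMARY KEY,
    created_at TEXT,          -- 'YYYY-MM-DD HH:MM:SS', so string order = time order
    year_in_tz INTEGER,
    month_in_tz INTEGER)""")
conn.execute("CREATE INDEX created_at_with_year_and_month_in_tz "
             "ON statistics (created_at, year_in_tz, month_in_tz)")
conn.executemany(
    "INSERT INTO statistics (created_at, year_in_tz, month_in_tz) VALUES (?, ?, ?)",
    [("2011-10-31 21:00:00", 2011, 11), ("2011-11-15 10:00:00", 2011, 11),
     ("2011-12-01 08:00:00", 2011, 12), ("2011-12-20 12:00:00", 2011, 12),
     ("2011-12-31 19:00:00", 2011, 12),
     ("2011-12-31 21:00:00", 2012, 1)])  # outside the range below

QUERY = """SELECT COUNT(*) AS count_all, year_in_tz, month_in_tz
           FROM statistics
           WHERE created_at BETWEEN ? AND ?
           GROUP BY year_in_tz, month_in_tz"""
params = ("2011-10-31 20:00:00", "2011-12-31 19:59:59")

# Every column the query touches lives in the index, so SQLite reports a
# COVERING INDEX search: the table rows themselves are never read.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]

# Sort in Python so the SQL stays identical to the question's query.
rows = sorted(conn.execute(QUERY, params).fetchall(), key=lambda r: (r[1], r[2]))
print(plan)
print(rows)  # [(2, 2011, 11), (3, 2011, 12)]
```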

【Discussion】:

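Just curious: since year_in_tz will be the same in your example, what happens if you omit it from the GROUP BY, as in this article?
xQbert, nothing changes, but thanks for a good query-optimization idea (omit year_in_tz from the GROUP BY when the selected range falls within a single year).
It was just an idea from the article above: "You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group." I'm as stumped as you are now.
I just changed the last DESCRIBE to the correct one... the previous one was for a different query.
Can you list the definitions of the keys you created?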
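The idea from the comments: when the WHERE range lies inside a single year, year_in_tz is constant within each group, so grouping on month_in_tz alone produces the same groups. The sketch below checks that equivalence on made-up rows, again with SQLite standing in for MySQL. It selects MIN(year_in_tz) instead of a bare non-grouped column; that is an adjustment, since the bare-column form relies on MySQL's GROUP BY extension described in the quoted article.

```python
import sqlite3

# SQLite stand-in with made-up rows, all inside local year 2011.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics (id INTEGER PRIMARY KEY, created_at TEXT, "
             "year_in_tz INTEGER, month_in_tz INTEGER)")
conn.executemany(
    "INSERT INTO statistics (created_at, year_in_tz, month_in_tz) VALUES (?, ?, ?)",
    [("2011-11-02 09:00:00", 2011, 11), ("2011-11-20 18:00:00", 2011, 11),
     ("2011-12-05 07:00:00", 2011, 12), ("2011-12-28 23:00:00", 2011, 12),
     ("2011-12-29 10:00:00", 2011, 12)])
RANGE = ("2011-10-31 20:00:00", "2011-12-31 19:59:59")

# Original form: group on both columns.
by_year_and_month = conn.execute(
    "SELECT COUNT(*), year_in_tz, month_in_tz FROM statistics "
    "WHERE created_at BETWEEN ? AND ? "
    "GROUP BY year_in_tz, month_in_tz ORDER BY year_in_tz, month_in_tz",
    RANGE).fetchall()

# Month-only grouping; MIN() returns the single year present in each group.
by_month_only = conn.execute(
    "SELECT COUNT(*), MIN(year_in_tz), month_in_tz FROM statistics "
    "WHERE created_at BETWEEN ? AND ? "
    "GROUP BY month_in_tz ORDER BY month_in_tz",
    RANGE).fetchall()

print(by_year_and_month == by_month_only)  # True
```

The equivalence only holds while the range stays within one year; a range spanning two Decembers, for example, would merge them into one month_in_tz group.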

Tags: mysql indexing group-by innodb

【Solution 1】:

Add the id field to your index created_at_with_year_and_month_in_tz, then change your SELECT statement to use

select count(id) ....

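In MySQL 5.6, the ICP (Index Condition Pushdown) feature may help in this case, since all the accessed fields are part of the index. I believe MySQL may read the actual data records when you specify count(*), so it has to read the index file as well as the data file.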
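On the count(id) suggestion: COUNT(expr) counts the rows where expr is not NULL, while COUNT(*) counts all rows. On a NOT NULL primary key such as id the two therefore return the same number and can differ only in how the engine reads the data. (InnoDB secondary indexes already store the primary key columns, so id is available from the index either way.) Below is a sketch on made-up rows, with SQLite as a stand-in for MySQL. It also shows where the two forms do diverge, because created_at is nullable in the posted schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is the primary key (never NULL); created_at is nullable, as in the post.
conn.execute("CREATE TABLE statistics (id INTEGER PRIMARY KEY, created_at TEXT, "
             "year_in_tz INTEGER, month_in_tz INTEGER)")
conn.executemany(
    "INSERT INTO statistics (created_at, year_in_tz, month_in_tz) VALUES (?, ?, ?)",
    [("2011-11-02 09:00:00", 2011, 11),
     (None, 2011, 11),                    # row with a NULL created_at
     ("2011-12-05 07:00:00", 2011, 12)])

count_star, count_id, count_created = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(created_at) FROM statistics").fetchone()
print(count_star, count_id, count_created)  # 3 3 2
```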

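【Discussion】:

Tried it. There was some performance gain (up to 20%), but I can't tell whether it came from adding id to the index or from the current state of the hardware. In any case, with this index count(id) and count(*) take the same time.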

【Solution 2】:

Try this; there is a known MySQL issue with datetime indexes:

    WHERE
        created_at BETWEEN 
               CAST('2011-10-31 20:00:00' AS datetime) AND 
               CAST('2011-12-31 19:59:59'  AS datetime)

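【Discussion】:

Thanks for the answer, but I don't see any change in query performance :(

【Solution 3】:

Slow COUNT(*) queries are a common problem in MySQL and PostgreSQL (and other RDBMSs), because the engine has to scan every matching row (a sequential table or index scan) while executing the query. Consider caching your aggregated data somewhere else: memcached, redis, etc.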

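【Discussion】:

I'm thinking about caching the compiled statistics in another table, but that isn't quick to implement, so I need some ideas for making the current query as fast as possible.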
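Here is a minimal sketch of the "cache the statistics in another table" idea. It assumes hour-aligned query ranges like the one in the question (start at HH:00:00, end at HH:59:59). A trigger keeps a hypothetical hourly_counts table up to date on every insert, so a two-month report sums roughly 1,464 pre-aggregated rows (one per hour) instead of counting millions. The sketch uses SQLite via Python's sqlite3 as a stand-in. In MySQL the trigger body would more typically be a single INSERT ... ON DUPLICATE KEY UPDATE. The table and trigger names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statistics (
    id INTEGER PRIMARY KEY,
    created_at TEXT,              -- UTC, 'YYYY-MM-DD HH:MM:SS'
    year_in_tz INTEGER,
    month_in_tz INTEGER
);

-- Hypothetical summary table: one row per UTC hour and local (year, month).
CREATE TABLE hourly_counts (
    hour_utc TEXT NOT NULL,       -- created_at truncated to the hour
    year_in_tz INTEGER NOT NULL,
    month_in_tz INTEGER NOT NULL,
    count_all INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (hour_utc, year_in_tz, month_in_tz)
);

-- Keep the summary in step with every insert: create the bucket if it is
-- missing, then bump its counter. Rows with a NULL in any key column are
-- not counted (the insert is ignored and the UPDATE matches nothing).
CREATE TRIGGER statistics_after_insert AFTER INSERT ON statistics
BEGIN
    INSERT OR IGNORE INTO hourly_counts (hour_utc, year_in_tz, month_in_tz)
    VALUES (substr(NEW.created_at, 1, 13) || ':00:00', NEW.year_in_tz, NEW.month_in_tz);
    UPDATE hourly_counts SET count_all = count_all + 1
    WHERE hour_utc = substr(NEW.created_at, 1, 13) || ':00:00'
      AND year_in_tz = NEW.year_in_tz
      AND month_in_tz = NEW.month_in_tz;
END;
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO statistics (created_at, year_in_tz, month_in_tz) VALUES (?, ?, ?)",
    [("2011-10-31 19:59:00", 2011, 10),   # just before the range
     ("2011-10-31 20:30:00", 2011, 11),
     ("2011-11-15 10:00:00", 2011, 11),
     ("2011-11-15 10:45:00", 2011, 11),   # same hour bucket as the previous row
     ("2011-12-10 03:00:00", 2011, 12),
     ("2011-12-31 19:59:59", 2011, 12),
     ("2011-12-31 20:00:00", 2012, 1)])   # just after the range

RANGE = ("2011-10-31 20:00:00", "2011-12-31 19:59:59")

# Report from the small summary table (hour-aligned range assumed).
from_summary = conn.execute(
    "SELECT SUM(count_all), year_in_tz, month_in_tz FROM hourly_counts "
    "WHERE hour_utc BETWEEN ? AND ? "
    "GROUP BY year_in_tz, month_in_tz ORDER BY year_in_tz, month_in_tz",
    RANGE).fetchall()

# The question's query against the base table, for comparison.
from_base = conn.execute(
    "SELECT COUNT(*), year_in_tz, month_in_tz FROM statistics "
    "WHERE created_at BETWEEN ? AND ? "
    "GROUP BY year_in_tz, month_in_tz ORDER BY year_in_tz, month_in_tz",
    RANGE).fetchall()

summary_rows = conn.execute("SELECT COUNT(*) FROM hourly_counts").fetchone()[0]
print(from_summary == from_base, summary_rows)  # True 6
```

Per-hour buckets keep the summary small while still answering any hour-aligned range. A per-month summary would be smaller still, but it could only answer ranges that cover whole local months. A trigger adds a little work to every insert; a periodic batch job that rebuilds recent buckets is a common alternative.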