【Question title】: How to avoid "Using temporary" in many-to-many queries?
【Posted】: 2011-07-25 05:47:17
【Question description】:

This query is very simple: all I want is to get every article in a given category, ordered by the last_updated field:

SELECT
    `articles`.*
FROM
    `articles`,
    `articles_to_categories`
WHERE
        `articles`.`id` = `articles_to_categories`.`article_id`
        AND `articles_to_categories`.`category_id` = 1
ORDER BY `articles`.`last_updated` DESC
LIMIT 0, 20;

But it runs very slowly. Here is what EXPLAIN says:

select_type  table                   type     possible_keys           key         key_len  ref                                rows  Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SIMPLE       articles_to_categories  ref      article_id,category_id  article_id  5        const                              5016  Using where; Using temporary; Using filesort
SIMPLE       articles                eq_ref   PRIMARY                 PRIMARY     4        articles_to_categories.article_id  1

Is there a way to rewrite this query, or to add extra logic to my PHP script, so that it avoids Using temporary; Using filesort and runs faster?

Table structure:

*articles*
id | title | content | last_updated

*articles_to_categories*
article_id | category_id

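Update

I have indexed last_updated. I think my case is described in the documentation:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

The key used to fetch the rows is not the same as the one used in the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

But I still don't know how to fix this.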

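【Comments on the question】:

How slow is slow? What engine are you using?
@f00 The query takes 3-5 seconds to run, and I'm using innodb (you can see it in the tags)
Maybe take a look at my example - the important thing is the order of the clustered PK.

Tags: php mysql query-optimization innodb

【Solution 1】:

Here is a simplified example that I put together a while ago for a similar performance-related question. It takes advantage of innodb clustered primary key indexes (obviously only applicable to innodb !!)

You have 3 tables: category, product and product_category, as follows: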

drop table if exists product;
create table product
(
prod_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists product_category;
create table product_category
(
cat_id mediumint unsigned not null,
prod_id int unsigned not null,
primary key (cat_id, prod_id) -- **note the clustered composite index** !!
)
engine = innodb;

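The most important thing is the column order of the product_category clustered composite primary key, because typical queries for this scenario always lead with cat_id = x or cat_id in (x,y,z...).

We have 500K categories, 1 million products and 125 million product categories.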

select count(*) from category;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+

select count(*) from product;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

select count(*) from product_category;
+-----------+
| count(*)  |
+-----------+
| 125611877 |
+-----------+

So let's see how this schema performs for a query similar to yours. All queries are run cold (after a mysql restart) with empty buffers and no query caching.

select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sorry, no date field in this sample table; it won't make any difference though
limit 20;

+---------+----------------+
| prod_id | name           |
+---------+----------------+
|  993561 | Product 993561 |
|  991215 | Product 991215 |
|  989222 | Product 989222 |
|  986589 | Product 986589 |
|  983593 | Product 983593 |
|  982507 | Product 982507 |
|  981505 | Product 981505 |
|  981320 | Product 981320 |
|  978576 | Product 978576 |
|  973428 | Product 973428 |
|  959384 | Product 959384 |
|  954829 | Product 954829 |
|  953369 | Product 953369 |
|  951891 | Product 951891 |
|  949413 | Product 949413 |
|  947855 | Product 947855 |
|  947080 | Product 947080 |
|  945115 | Product 945115 |
|  943833 | Product 943833 |
|  942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec) 

explain
select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sorry, no date field in this sample table; it won't make any difference though
limit 20;

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
|  1 | SIMPLE      | pc    | ref    | PRIMARY       | PRIMARY | 3       | const            |  499 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | vl_db.pc.prod_id |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

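So that is 0.70 seconds cold. Ouch.

Hope this helps :)

EDIT

Having just read your reply to my comment above, it looks like you have a choice between the following two options: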

create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;

or

create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;

Which one to pick depends on your typical queries, since they determine how the clustered PK should be defined.
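
To make "typical queries" concrete, here is a rough sketch of the kind of lookup each layout is meant to serve. The table names are the two from the definitions above; the article id 42 is only a placeholder value, not something from the question.

```sql
-- Leads with category_id: matches the clustered PK (category_id, article_id)
-- of categories_to_articles, so all rows for one category sit together.
SELECT article_id
FROM categories_to_articles
WHERE category_id = 1;

-- Leads with article_id: matches the clustered PK (article_id, category_id)
-- of articles_to_categories. The value 42 is a hypothetical article id.
SELECT category_id
FROM articles_to_categories
WHERE article_id = 42;
```

Since the query in the question filters on category_id = 1, it resembles the first case; running EXPLAIN on your own data is the way to confirm which key actually gets used.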

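【Comments】:

Thank you for such a detailed answer. I have created the index as you suggested, and both PRIMARY keys are now the ones used in the query, just as in your example. Unfortunately, the query still takes 3 seconds and still uses a temporary table.
Do you mean you changed your primary key from article_id, category_id to category_id, article_id? Check my categories_to_articles table in the EDIT. If all else fails, post your table definitions...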

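【Solution 2】:

You should be able to avoid the filesort by adding a key on articles.last_updated. MySQL needs a filesort for the ORDER BY operation, but it can do without one as long as you order by an indexed column (with some limitations).

For more information, see: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

【Comments】:

Actually, I have already indexed last_updated. I don't know why the index isn't used. Maybe MySQL wants to see something like (id, last_updated)?
You are actually right: removing the ORDER BY makes the query very fast. Now I just need to figure out how to make MySQL use the index :)
I have tried creating an (id, last_updated) index, but MySQL still uses the primary index :/
@SilverLight - I don't think you can really get rid of the filesort... Because you have to read rows from articles using the WHERE clause on articles_to_categories.category_id, the order in which they are read is determined by that condition. To drop the filesort, MySQL has to read the records in key order when sorting by a key, so that the result needs no sorting. Not sure whether you can make use of that..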

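【Solution 3】:

I assume you have done the following in your database:

1) articles -> id is the primary key

2) articles_to_categories -> article_id is a foreign key to articles -> id

3) You can create an index on category_id

【Comments】:

According to the EXPLAIN output, category_id is already a possible key.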

【Solution 4】:

ALTER TABLE articles ADD INDEX (last_updated);
ALTER TABLE articles_to_categories ADD INDEX (article_id);

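That should do it. The correct plan is to use the first index to find the first few records and the second index to perform the JOIN. If it doesn't work, try STRAIGHT_JOIN or something else to force the indexes to be used properly.

【Comments】:

Then force their use. However, because of the condition articles_to_categories.category_id = 1, it may not work well. Using temporary and filesort on 5k rows may well be the best option.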
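
As a hedged illustration of the STRAIGHT_JOIN suggestion, the query from the question might be rewritten roughly like this. It assumes the index created by ADD INDEX (last_updated) got MySQL's default name, last_updated. The FORCE INDEX hint is my own guess at what "something else" could mean; the answer does not spell it out.

```sql
-- Sketch only: make MySQL read articles first, in last_updated order,
-- then probe articles_to_categories for each article.
-- Assumes the index on articles.last_updated is named last_updated.
SELECT STRAIGHT_JOIN
    `articles`.*
FROM
    `articles` FORCE INDEX (`last_updated`)
    INNER JOIN `articles_to_categories`
        ON `articles_to_categories`.`article_id` = `articles`.`id`
WHERE
    `articles_to_categories`.`category_id` = 1
ORDER BY `articles`.`last_updated` DESC
LIMIT 0, 20;
```

Whether this beats the original plan depends on how many recently updated articles have to be skipped before 20 from category 1 are found, so compare the EXPLAIN output and timings on real data before keeping it.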