Rewriting mysql select to reduce time and writing tmp to disk: answers

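【Question title】: Rewriting mysql select to reduce time and writing tmp to disk
【Posted】: 2010-08-20 20:23:18
【Question】:

I have a mysql query that takes a few minutes, which is not good because it is used to build a web page.

Three tables are involved: poster_data holds the information about each individual poster. poster_categories lists all the categories (movies, art, etc.), while poster_prodcat lists each posterid number together with the categories it belongs to. One poster will therefore have several rows, e.g. movies, Indiana Jones, Harrison Ford, adventure films, and so on.

This is the slow query: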

select * 
  from poster_prodcat, 
       poster_data, 
       poster_categories 
 where poster_data.apnumber = poster_prodcat.apnumber 
   and poster_categories.apcatnum = poster_prodcat.apcatnum 
   and poster_prodcat.apcatnum='623'  
ORDER BY aptitle ASC 
   LIMIT 0, 32

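According to EXPLAIN:

It takes a few minutes. poster_data has just over 800,000 rows, while poster_prodcat has just over 17 million. Other category queries using this select are barely noticeable, whereas poster_prodcat.apcatnum='623' has about 400,000 results and is writing out to disk.

【Comments】:

Tags: sql mysql

【Solution 1】:

Hope this helps - http://pastie.org/1105206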

drop table if exists poster;
create table poster
(
poster_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 


drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists poster_category;
create table poster_category
(
cat_id mediumint unsigned not null,
poster_id int unsigned not null,
primary key (cat_id, poster_id) -- note the clustered composite index !!
)
engine = innodb;

-- FYI http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

select count(*) from category
count(*)
========
500,000


select count(*) from poster
count(*)
========
1,000,000

select count(*) from poster_category
count(*)
========
125,675,688

select count(*) from poster_category where cat_id = 623
count(*)
========
342,820

explain
select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

id  select_type table   type    possible_keys   key     key_len ref                         rows
==  =========== =====   ====    =============   ===     ======= ===                         ====
1   SIMPLE      c       const   PRIMARY         PRIMARY 3       const                       1   
1   SIMPLE      p       index   PRIMARY         name    257     null                        32  
1   SIMPLE      pc      eq_ref  PRIMARY         PRIMARY 7       const,foo_db.p.poster_id    1   

select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

Statement:21/08/2010 
0:00:00.021: Query OK
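
Applied to the tables in the question, the same clustered-index idea might look roughly like the sketch below. This is only an illustration under assumptions: the table name poster_prodcat_innodb is hypothetical, the column types are guesses (the question does not show the schema), and it assumes poster_prodcat only needs the apcatnum and apnumber columns.

```sql
-- Hypothetical InnoDB copy of poster_prodcat, clustered on (apcatnum, apnumber).
-- Column types are assumptions; adjust them to match the real schema.
create table poster_prodcat_innodb
(
apcatnum mediumint unsigned not null,
apnumber int unsigned not null,
primary key (apcatnum, apnumber) -- category first, so one category's rows are stored together
)
engine = innodb;

-- Copy the existing pairs; distinct guards against duplicate (apcatnum, apnumber) rows
-- that would otherwise violate the primary key.
insert into poster_prodcat_innodb (apcatnum, apnumber)
select distinct apcatnum, apnumber
from poster_prodcat;
```

Putting apcatnum first in the primary key is the design choice that matters here: a lookup by category then reads one contiguous range of the clustered index instead of jumping around the table.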

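【Comments】:

Can I ask why you chose innodb? (I don't know much about the differences.)
Did you check the explain plan and the resulting query speed? In a word: clustered indexes.
You might also want to have a look at this: tag1consulting.com/MySQL_Engines_MyISAM_vs_InnoDB
Just to clarify: 1 million posters, 500K categories, 125 million poster categories, and a 0.02 second runtime for cat_id = 623 (300K+ rows).
I made innodb copies of these tables and tried the query. With the query that has two inner joins, I recorded an empty result set after 3 minutes. Removing the first inner join returned 32 rows in 20.13 seconds. However, running my original query now returns the set in 0.10 seconds, so I'm happy. Just curious why the inner joins take so long. Thanks for your help.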

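【Solution 2】:

Is the query you listed what the final queries look like? (So they all have apcatnum=/ID/ ?)

where poster_data.apnumber=poster_prodcat.apnumber and poster_categories.apcatnum=poster_prodcat.apcatnum and poster_prodcat.apcatnum='623'

poster_prodcat.apcatnum='623' will drastically reduce the data set mysql has to work on, so this should be the first part of the query to be parsed.

Then go on and swap the where-comparisons so that those which minimize the data set the most are parsed first.

You may also want to try subqueries. I'm not sure it will help, but mysql might then not fetch all 3 tables first, and instead run the subquery first and then the other one. That should minimize memory consumption while querying. This is not an option, though, if you really want to select all columns (as you are using * there).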
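
One possible reading of the subquery suggestion is a derived table that filters poster_prodcat before the joins. The sketch below is untested against the real data and assumes that aptitle is a column of poster_data; the aliases pp, pd and pc are made up for the example. Whether it is any faster depends on the MySQL version, since a derived table may be materialized as a temporary table.

```sql
-- Filter poster_prodcat down to one category first, then join the other tables.
select pd.*, pc.*
from (
    select apnumber, apcatnum
    from poster_prodcat
    where apcatnum = '623'
) pp
inner join poster_data pd on pd.apnumber = pp.apnumber
inner join poster_categories pc on pc.apcatnum = pp.apcatnum
order by pd.aptitle asc
limit 0, 32;
```

Running it with explain in front shows whether the derived table is actually evaluated first.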

【Comments】:

OK, just tried it. Strangely, moving poster_prodcat.apcatnum='623' to the first position returned 0 rows, whereas there are 422,777 posters in that category.

【Solution 3】:

You need an index on apnumber in POSTER_DATA. Scanning 841,152 records is killing performance.
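
A quick way to check which indexes already exist before adding one is sketched below. The index name data_apnumber_idx is hypothetical, and the create index statement is only needed if show index does not already list an index on apnumber.

```sql
-- List the existing indexes on both tables used in the join.
show index from poster_data;
show index from poster_prodcat;

-- Only if apnumber is not indexed yet (data_apnumber_idx is a made-up name):
create index data_apnumber_idx on poster_data (apnumber);
```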

【Comments】:

I have an index on that column: Keyname: posterid, Type: UNIQUE, Cardinality: 841152, Field: apnumber

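【Solution 4】:

It looks like the query is using the aptitle index to get the ordering, but it is doing a full scan to filter the results. I think it might help if you had a composite index across both aptitle and apnumber on poster_data. MySQL might then be able to use it for both the sort order and the filter.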

create index data_title_anum_idx on poster_data(aptitle,apnumber);

【Comments】: