Why is MAX() 100 times slower than ORDER BY ... LIMIT 1? Answers

【Question title】: Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
【Posted】: 2012-10-07 11:24:56
【Question description】:

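I have a table foo with (among 20 others) the columns bar, baz and quux, with indexes on baz and quux. The table has about 500k rows.

Why do the following queries differ so much in speed? Query A takes 0.3s, while query B takes 28s.

Query A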

select baz from foo
    where bar = :bar
    and quux = (select quux from foo where bar = :bar order by quux desc limit 1)

EXPLAIN

id  select_type table   type    possible_keys   key     key_len ref     rows    Extra
1   PRIMARY     foo     ref     quuxIdx         quuxIdx 9       const   2       "Using where"
2   SUBQUERY    foo     index   NULL            quuxIdx 9       NULL    1       "Using where"

Query B

select baz from foo
    where bar = :bar
    and quux = (select MAX(quux) from foo where bar = :bar)

EXPLAIN

id  select_type table   type    possible_keys   key     key_len ref     rows    Extra
1   PRIMARY     foo     ref     quuxIdx         quuxIdx 9       const   2       "Using where"
2   SUBQUERY    foo     ALL     NULL            NULL    NULL    NULL    448060  "Using where"

I am using MySQL 5.1.34.

【Discussion】:

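'LIMIT 1' means take 1 row and stop, doesn't it? Query B is O(n*m)
@PaulDinh It seems both queries produce the same result, so this is most likely about the order of operations: in the first case it sorts by quux and then searches for bar in the result (fast); in the second query it searches for bar in the unsorted table (which requires checking the entire table) and then sorts to find the max.
@Viktor, can you show me explain select baz from foo where bar = :bar and quux = (select quux from foo where quux=MAX(quux) and bar = :bar ) and explain select baz from foo where bar = :bar and quux = (select quux from foo where quux=MAX(quux) and bar = :bar limit 1 )
@eicto: +1 for your comment, but I'd like to clarify one thing: finding the maximum of an unindexed column does not require an O(n log(n)) sort. It can be done in O(n) time by scanning the table once and remembering the highest value seen.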

Tags: mysql performance query-optimization

【Solution 1】:

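You should add an index on (bar, quux).

Without this index, MySQL cannot see how to execute the query efficiently, so it has to choose among various inefficient query plans.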

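In the first example, it scans the quux index and, for each row found, looks up the corresponding value of bar in the original table. Checking each row this way takes twice as long, but it gets lucky: a row with the correct value of bar is near the beginning of the scan, so it can stop there. This could be because the value of bar you are searching for occurs frequently, so the chance of getting lucky is very high. As a result, it may only have to examine a handful of rows before it finds a match, and even though each row takes twice as long to check, checking only a few rows gives a massive overall saving. Since there is no index on bar, MySQL does not know in advance that the value :bar occurs frequently, so it cannot know that this query will be fast.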

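In the second example, it uses a different plan that always scans the whole table. Each row is read directly from the table, without using an index. This means each individual row read is fast, but because you have a lot of rows, the query is slow overall. If no rows matched :bar, this would be the faster query plan. But if roughly 1% of rows have the desired value of bar, this query plan will be (very) approximately 100 times slower than the plan above. Since you have no index on bar, MySQL does not know this in advance.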
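To check whether the searched-for value of bar really is common, something like the following could be run. This is only a sketch: the column aliases are made up for illustration, :bar is the same bind parameter used in the question (substitute a literal in the mysql client), and the query itself scans the whole table, so it is a one-off diagnostic rather than something to run routinely.

```sql
-- Sketch: estimate what fraction of rows carry the searched-for bar value.
-- In MySQL a comparison evaluates to 1 or 0 (or NULL), so SUM() counts matches.
-- total_rows, matching_rows and matching_fraction are illustrative aliases.
SELECT COUNT(*)                   AS total_rows,
       SUM(bar = :bar)            AS matching_rows,
       SUM(bar = :bar) / COUNT(*) AS matching_fraction
FROM foo;
```

A matching_fraction around 0.01 would be consistent with the rough 100x estimate above.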

Alternatively, you can simply add the missing index, and both queries will then be much faster.
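A minimal sketch of adding that index and re-checking the plan, under some assumptions: bar_quux_idx is a hypothetical index name, and bar is assumed to be a type that can be indexed without a prefix length (a TEXT column would need one, for example bar(50)).

```sql
-- Add the composite index suggested above (index name is hypothetical).
ALTER TABLE foo ADD INDEX bar_quux_idx (bar, quux);

-- Re-run EXPLAIN on query B; substitute a literal for :bar in the mysql client.
EXPLAIN
SELECT baz FROM foo
WHERE bar = :bar
  AND quux = (SELECT MAX(quux) FROM foo WHERE bar = :bar);
```

With the index in place, the subquery should no longer show type ALL; MySQL can typically resolve MAX(quux) for a fixed bar straight from the index, but the exact plan depends on the server version and data.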

【Discussion】:

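It seems the OP is asking why there is such a huge difference for the same result.
So select quux from foo where quux=MAX(quux) and bar = :bar would be faster than select MAX(quux) from foo where bar = :bar, if quux is indexed as int and bar is text?
nvm, it gives a different result :)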