【Question Title】: Replace "OR" on 2 indexes with a faster solution (UNION?)
【Posted】: 2021-09-22 13:13:00
【Question】:

I'm querying shopping carts in a shop system, for example:

DROP TABLE IF EXISTS c;
CREATE TABLE c (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user` int(10) unsigned DEFAULT NULL,
  `email` VARCHAR(255) NOT NULL DEFAULT '', 
  `number` VARCHAR(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `user`(`user`),
  KEY `email`(`email`),
  UNIQUE KEY `number`(`number`)
) ENGINE=InnoDB;

INSERT INTO c SET user=1, email="test1@example.com", number="00001";
INSERT INTO c SET user=2, email="test2@example.com", number="00002";
INSERT INTO c SET user=3, email="test3@example.com", number="00003";
INSERT INTO c SET user=4, email="test1@example.com", number="00004";
INSERT INTO c SET user=1, email="test1@example.com", number="00005";

I need to query the records of c together with a column showing the number of carts that have the same user or the same email. So I do this:

SELECT c.number, 
       (SELECT COUNT(DISTINCT (id)) FROM c AS c2
                  WHERE c2.email = c.email OR c2.user = c.user
       ) AS ordercount
FROM c;
   

+--------+------------+
| number | ordercount |
+--------+------------+
| 00001  |          3 |
| 00002  |          1 |
| 00003  |          1 |
| 00004  |          3 |
| 00005  |          3 |
+--------+------------+
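To make the definition concrete: ordercount for a cart is the size of the union of two sets, the carts sharing its email and the carts sharing its user. The sketch below restates that in plain Python on the question's five rows. The tuple layout and the helper name ordercount are my own illustration, not from the question; a NULL (None) user is treated as matching nothing, as it would in SQL.

```python
# Sample carts from the question: (id, user, email, number).
CARTS = [
    (1, 1, "test1@example.com", "00001"),
    (2, 2, "test2@example.com", "00002"),
    (3, 3, "test3@example.com", "00003"),
    (4, 4, "test1@example.com", "00004"),
    (5, 1, "test1@example.com", "00005"),
]


def ordercount(cart, carts):
    """Hypothetical helper: carts sharing this cart's email OR its user."""
    _, user, email, _ = cart
    same_email = {cid for cid, _, e, _ in carts if e == email}
    # Mirror SQL semantics: a NULL (None) user never equals anything.
    same_user = {cid for cid, u, _, _ in carts if user is not None and u == user}
    # The OR in the WHERE clause is a set union; the union drops duplicates,
    # which is the job COUNT(DISTINCT id) does in the SQL version.
    return len(same_email | same_user)


result = [(cart[3], ordercount(cart, CARTS)) for cart in CARTS]
print(result)
```

Written as a union, each half is a single-column lookup, which is what makes a UNION-style rewrite attractive: an index on email or on user can serve each half on its own.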

This works, but the problem is that the OR is very slow, because MySQL/MariaDB does not use any key in the subquery:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2
                   WHERE c2.email = c.email OR c2.user = c.user
               ) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL | NULL    | NULL |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ALL  | PRIMARY,number,user,email | NULL | NULL    | NULL |    5 |    36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+

Even forcing the indexes does not make the database use them:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2 FORCE INDEX(email, user)
                  WHERE c2.email = c.email OR c2.user = c.user
               ) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL | NULL    | NULL |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ALL  | PRIMARY,number,user,email | NULL | NULL    | NULL |    5 |    36.00 | Using where |
+----+--------------------+-------+------------+------+---------------------------+------+---------+------+------+----------+-------------+

Using only the "email" column or only the "user" column works, and the key is used:

EXPLAIN SELECT c.number, 
               (SELECT COUNT(DISTINCT (id)) FROM c AS c2 WHERE c2.email = c.email) AS ordercount
        FROM c;

+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys             | key   | key_len | ref          | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL                      | NULL  | NULL    | NULL         |    5 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | c2    | NULL       | ref  | PRIMARY,number,user,email | email | 767     | test.c.email |    3 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------------------+-------+---------+--------------+------+----------+-------------+

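The problem is that the query runs on a large table with about 500,000 rows, so querying a subset of 50 records takes about 30 seconds. Running the query with only the "email" match or only the "user" match takes about 1 second for the same 50 records.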

So I need to optimize the query. I tried changing the OR into a UNION:

SELECT c.number, 
(SELECT COUNT(DISTINCT (id)) FROM 
    ((SELECT u1.id FROM c AS u1 WHERE
     u1.email = c.email
    )
    UNION DISTINCT
    (SELECT u2.id FROM c AS u2 WHERE
    u2.user = c.user
    )) AS u2
) AS ordercount
FROM c;

But I got the error: ERROR 1054 (42S22): Unknown column 'c.email' in 'where clause'

Any idea how to make this query faster by using the indexes?
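The error most likely comes from where the outer reference sits: c.email is used inside a derived table (the parenthesized UNION in the FROM clause), and according to the MySQL 8.0 manual's section on derived tables, a derived table could not contain outer references before MySQL 8.0.14. One way around it, offered as a sketch and not as a measured MySQL fix, is to keep each correlated lookup as its own scalar subquery and add two disjoint counts: carts with the same email, plus carts with the same user but a different email. Because email is NOT NULL, those sets never overlap, so their sum equals the OR count and no DISTINCT is needed. The check below runs in SQLite via Python's sqlite3, so it verifies results only, not MySQL's plans; make_db and the sixth cart 00006 are hypothetical.

```python
import sqlite3


def make_db(rows):
    """In-memory SQLite copy of the question's table, standing in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE c (id INTEGER PRIMARY KEY, user INTEGER,"
        " email TEXT NOT NULL DEFAULT '', number TEXT NOT NULL UNIQUE)"
    )
    conn.execute("CREATE INDEX c_user ON c(user)")
    conn.execute("CREATE INDEX c_email ON c(email)")
    conn.executemany("INSERT INTO c (user, email, number) VALUES (?, ?, ?)", rows)
    return conn


# Reference: the question's original OR query.
OR_QUERY = """
SELECT c.number,
       (SELECT COUNT(*) FROM c AS c2
         WHERE c2.email = c.email OR c2.user = c.user) AS ordercount
FROM c ORDER BY c.number
"""

# Rewrite: two disjoint counts, each with a single indexable equality.
# Same email, plus same user with a different email (email is NOT NULL,
# so the two sets cannot overlap).
SPLIT_QUERY = """
SELECT c.number,
       (SELECT COUNT(*) FROM c AS c2 WHERE c2.email = c.email)
     + (SELECT COUNT(*) FROM c AS c2
         WHERE c2.user = c.user AND c2.email <> c.email) AS ordercount
FROM c ORDER BY c.number
"""

rows = [
    (1, "test1@example.com", "00001"),
    (2, "test2@example.com", "00002"),
    (3, "test3@example.com", "00003"),
    (4, "test1@example.com", "00004"),
    (1, "test1@example.com", "00005"),
    (1, "other@example.com", "00006"),  # hypothetical: same user, new email
]
conn = make_db(rows)
reference = conn.execute(OR_QUERY).fetchall()
rewritten = conn.execute(SPLIT_QUERY).fetchall()
print(rewritten)
```

The SQL in SPLIT_QUERY uses nothing SQLite-specific, so it should carry over to MySQL 5.7 and MariaDB 10.5 as written; whether each subquery actually gets ref access on the email and user indexes is worth confirming with EXPLAIN on the real table.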

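【Question Comments】:

  • I think newer versions of MySQL include an "index OR" operator. Is this MariaDB or MySQL? And which version? Nice question, by the way.
  • Thanks for your reply. I tried it on MySQL 5.7.24 and MariaDB 10.5.6 with the same result. Unfortunately, MySQL 8.x is not available on the server.
  • (You should add another cart with the same user but a different email.)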
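The point of that last comment can be checked directly: in the five sample rows, every pair of carts that shares a user also shares an email, so a query that ignores the user condition entirely still reproduces the expected table. The sketch below (SQLite via Python's sqlite3 as a stand-in for MySQL; the helper counts and the sixth cart 00006 are hypothetical) compares the OR condition with an email-only condition, before and after adding such a cart.

```python
import sqlite3

BASE = [
    (1, "test1@example.com", "00001"),
    (2, "test2@example.com", "00002"),
    (3, "test3@example.com", "00003"),
    (4, "test1@example.com", "00004"),
    (1, "test1@example.com", "00005"),
]
# Hypothetical extra cart: same user as 00001, different email.
EXTRA = BASE + [(1, "other@example.com", "00006")]

OR_COND = "c2.email = c.email OR c2.user = c.user"
EMAIL_ONLY = "c2.email = c.email"


def counts(rows, condition):
    """Per-cart count of matching carts, ordered by number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE c (id INTEGER PRIMARY KEY, user INTEGER,"
        " email TEXT NOT NULL, number TEXT NOT NULL UNIQUE)"
    )
    conn.executemany("INSERT INTO c (user, email, number) VALUES (?, ?, ?)", rows)
    # condition is one of the fixed strings above, never user input.
    sql = (
        "SELECT (SELECT COUNT(*) FROM c AS c2 WHERE " + condition + ")"
        " FROM c ORDER BY number"
    )
    return [n for (n,) in conn.execute(sql)]


print(counts(BASE, OR_COND), counts(BASE, EMAIL_ONLY))
print(counts(EXTRA, OR_COND), counts(EXTRA, EMAIL_ONLY))
```

With only the original rows the two conditions agree, so sample data alone cannot tell a correct rewrite from one that drops the user match; the extra cart makes them diverge.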

Tags: mysql sql mariadb union


【Solution 1】:

Here is an alternative that uses two LEFT JOINs:

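select c.*,
       -- cu only holds carts with a different email than ce, so the two
       -- counts never overlap and can simply be added
       count(distinct ce.id) + count(distinct cu.id) as ordercount
from c left join
     c ce
     on c.email = ce.email left join
     c cu
     on c.user = cu.user and not cu.email <=> ce.email
group by c.id;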

This can use the separate indexes on c(user) and c(email).

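Basically, this joins along two separate dimensions and then combines them with count(distinct). In worse cases, with many matches along both dimensions, the two joins multiply the number of intermediate rows per cart. In many cases, however, this should work well, because it can use the indexes instead of scanning the whole table for every row.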
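The two-join idea can be sanity-checked in SQLite (via Python's sqlite3, standing in for MySQL; MySQL's NOT a <=> b, a null-safe inequality, is written a IS NOT b there). The check uses a hypothetical sixth cart that shares a user but not an email, and compares two ways of aggregating the joined rows against the question's OR query: COUNT(DISTINCT COALESCE(ce.id, cu.id)) and COUNT(DISTINCT ce.id) + COUNT(DISTINCT cu.id). Since email is NOT NULL, ce always matches at least the cart itself, so COALESCE never falls through to cu.id; the sum works because the cu join already excludes carts with the same email, which makes the two sides disjoint.

```python
import sqlite3

rows = [
    (1, "test1@example.com", "00001"),
    (2, "test2@example.com", "00002"),
    (3, "test3@example.com", "00003"),
    (4, "test1@example.com", "00004"),
    (1, "test1@example.com", "00005"),
    (1, "other@example.com", "00006"),  # hypothetical: same user, new email
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c (id INTEGER PRIMARY KEY, user INTEGER,"
    " email TEXT NOT NULL, number TEXT NOT NULL UNIQUE)"
)
conn.executemany("INSERT INTO c (user, email, number) VALUES (?, ?, ?)", rows)

# The two LEFT JOINs from the answer, with <=> translated to SQLite's IS.
JOINS = """
FROM c
LEFT JOIN c AS ce ON c.email = ce.email
LEFT JOIN c AS cu ON c.user = cu.user AND cu.email IS NOT ce.email
GROUP BY c.id
ORDER BY c.number
"""

coalesced = conn.execute(
    "SELECT c.number, COUNT(DISTINCT COALESCE(ce.id, cu.id))" + JOINS
).fetchall()
summed = conn.execute(
    "SELECT c.number, COUNT(DISTINCT ce.id) + COUNT(DISTINCT cu.id)" + JOINS
).fetchall()
reference = conn.execute(
    """SELECT c.number,
              (SELECT COUNT(*) FROM c AS c2
                WHERE c2.email = c.email OR c2.user = c.user)
       FROM c ORDER BY c.number"""
).fetchall()
print(coalesced)
print(summed)
```

Before grouping, each cart still produces (same-email carts) × (same-user, other-email carts) joined rows, which is the multiplication the answer warns about.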

【Comments】:

  • Yes, I think this should work well if the cardinality is low. I had thought of half of it and then got distracted.

【Solution 2】:

(I assume "c" means "cart".)

(Rewritten from scratch)

Since number is UNIQUE, it might as well be the PRIMARY KEY. Get rid of id, too.

CREATE FUNCTION Ct(_user INT UNSIGNED, _email VARCHAR(255))
    RETURNS INT
    READS SQL DATA
RETURN (
    -- one DISTINCT over both single-index lookups
    SELECT COUNT(DISTINCT number)
        FROM (
            ( SELECT number
                FROM c
                WHERE user = _user
            ) UNION ALL
            ( SELECT number
                FROM c
                WHERE email = _email
            )
        ) AS x
);

Then do:

SELECT number, Ct(user, email)
    FROM c;

Note that I avoid a double DISTINCT. And since the PK is an implicit part of every secondary index, the inner SELECTs get a "covering" index once number is the PK.
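SQLite has no stored functions, so the sketch below keeps the body of Ct() as a parameterized query and calls it once per cart from Python; ct is a hypothetical stand-in, not the answer's code, and the check runs in SQLite rather than MySQL. It uses the schema the answer suggests (number as PRIMARY KEY, no id) plus a hypothetical cart 00006 with the same user but a different email. Passing the values in as parameters is also what sidesteps the question's error 1054: inside the function, _user and _email are routine parameters rather than outer column references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as suggested in the answer: number is the PK and id is gone.
conn.execute(
    "CREATE TABLE c (number TEXT PRIMARY KEY, user INTEGER,"
    " email TEXT NOT NULL DEFAULT '')"
)
conn.execute("CREATE INDEX c_user ON c(user)")
conn.execute("CREATE INDEX c_email ON c(email)")
conn.executemany(
    "INSERT INTO c (number, user, email) VALUES (?, ?, ?)",
    [
        ("00001", 1, "test1@example.com"),
        ("00002", 2, "test2@example.com"),
        ("00003", 3, "test3@example.com"),
        ("00004", 4, "test1@example.com"),
        ("00005", 1, "test1@example.com"),
        ("00006", 1, "other@example.com"),  # hypothetical extra cart
    ],
)

# Same shape as the Ct() body: UNION ALL of two single-index lookups,
# then a single DISTINCT. The ? placeholders play the role of _user, _email.
CT_SQL = """
SELECT COUNT(DISTINCT number) FROM (
    SELECT number FROM c WHERE user = ?
    UNION ALL
    SELECT number FROM c WHERE email = ?
)
"""


def ct(conn, user, email):
    """Hypothetical Python stand-in for the stored function Ct()."""
    (n,) = conn.execute(CT_SQL, (user, email)).fetchone()
    return n


carts = conn.execute("SELECT number, user, email FROM c ORDER BY number").fetchall()
result = [(number, ct(conn, user, email)) for number, user, email in carts]
print(result)
```

Calling the function once per row keeps each lookup on a single index, at the cost of one extra query execution per cart.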

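【Comments】:

  • Thanks for your answer! Could you explain it? Unfortunately, running the SELECT on my test data gives me an "Empty set". I have added the expected result for the test data: a result set with the 2 columns "number" and "ordercount".
  • @Werner - Do you want a list of the c.number values that share an email or user? Or should each one be counted only once?
  • @Werner - I started over.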