【Question title】: Counting unique rows with nested select - help me optimise
【Posted】: 2019-07-05 21:52:12
【Question description】:

I have a strange problem with this query:

SELECT t.something_id, t.platform, t.country, SUM(t.amnt) AS amountz
FROM ( SELECT something_id, platform, country, 1 AS amnt
       FROM log_table
       WHERE target_date = '2018-02-09'
       GROUP BY (unique_key) ) t
GROUP BY t.something_id, t.country, t.platform

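The log table holds unique players plus a counter that is updated when a player has more than one session. It relies on a unique index, and a separate row is inserted for each unique user every day so that we can analyse the data. The table has grown a lot by now, and running this query to count yesterday's unique users has become quite expensive.

Running EXPLAIN EXTENDED on the query gives me this result: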

| id | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows      | filtered | Extra                           |
|----|-------------|------------|-------|---------------|------------------|---------|------|-----------|----------|---------------------------------|
| 1  | PRIMARY     | <derived2> | ALL   | NULL          | NULL             | NULL    | NULL | 114441375 | 100.00   | Using temporary; Using filesort |
| 2  | DERIVED     | log_table  | index | NULL          | idx_multi_column | 944     | NULL | 114441375 | 100.00   | Using where; Using index        |

My table structure:

| Name          | Type          |
|-------------  |-------------- |
| stat_id       | int(8)        |
| metric        | tinyint(1)    |
| platform      | tinyint(1)    |
| something_id  | varchar(128)  |
| target_date   | date          |
| country       | varchar(2)    |
| amount        | int(100)      |
| unique_key    | varchar(180)  |
| created       | timestamp     |
| modified      | timestamp     |

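The index I am using: idx_multi_column = unique_key, target_date, country, platform, something_id

I understand that the outer select, which wraps the nested one, uses temporary storage, and that the large number of rows makes this slow. Is there any way to improve it?

【Discussion】:

Tags: mysql sql indexing group-by query-optimization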

【Solution 1】:

It looks like your query can be simplified by using the aggregate function COUNT(DISTINCT ...):

SELECT
    something_id,
    platform,
    country,
    COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform

If a given something_id/platform/country combination never contains duplicate unique_key values, you can drop the DISTINCT keyword; this should improve performance.
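Whether DISTINCT can safely be dropped depends on the data, and nothing in the question guarantees it. As a rough sketch, the first query below looks for duplicates on the day in question; if it returns no rows, the second query should give the same counts as the COUNT(DISTINCT ...) version. The `copies` alias and the LIMIT are illustrative choices, not part of the original answer.

```sql
-- Check: does any unique_key appear more than once within a
-- (something_id, platform, country) group on this date?
SELECT something_id, platform, country, unique_key, COUNT(*) AS copies
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, platform, country, unique_key
HAVING COUNT(*) > 1
LIMIT 10;

-- Only if the check above returns no rows: the same query without DISTINCT.
-- COUNT(unique_key) skips NULLs, just as COUNT(DISTINCT unique_key) does.
SELECT something_id, platform, country,
       COUNT(unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform;
```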

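【Discussion】:

I have edited your query. When I run it with EXPLAIN EXTENDED it still seems to use a filesort - is there any way to prevent that? Filesort is slow, right?
GROUP BY implies ORDER BY. You can add ORDER BY NULL to avoid the sort. An index on (target_date, something_id, country, platform, unique_key) would help this query.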
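To make the ORDER BY NULL suggestion concrete, here is a minimal sketch of the query with the implicit sort suppressed. The implicit GROUP BY sort exists in MySQL 5.7 and earlier; MySQL 8.0 removed it, so the extra clause is only useful on older servers. Whether the filesort actually disappears from the plan also depends on which index the optimizer picks, so re-running EXPLAIN is the way to confirm.

```sql
-- Sketch: same aggregation, but tell older MySQL versions
-- not to sort the grouped result.
SELECT something_id, platform, country,
       COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform
ORDER BY NULL;
```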

【Solution 2】:

I'm fairly sure this is the query you want (as GMB pointed out):

SELECT something_id, platform, country, 
       COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform

For performance, try an index on log_table(target_date, something_id, country, platform, unique_key).
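As a sketch, the suggested index could be created as follows; the name `idx_date_group_key` is made up for illustration. The likely reasoning behind the column order: target_date comes first so the equality filter narrows the scan to one day, the GROUP BY columns follow in grouping order so rows arrive already grouped, and unique_key comes last so the index covers the whole query. The existing idx_multi_column leads with unique_key, which is why the original plan scans the entire index.

```sql
-- idx_date_group_key is a hypothetical name; any name works.
-- Building an index on a table of this size may take a while.
CREATE INDEX idx_date_group_key
    ON log_table (target_date, something_id, country, platform, unique_key);

-- Re-check the plan afterwards; ideally the new index appears in the
-- key column and the rows estimate drops to roughly one day of data.
EXPLAIN
SELECT something_id, platform, country,
       COUNT(DISTINCT unique_key) AS amountz
FROM log_table
WHERE target_date = '2018-02-09'
GROUP BY something_id, country, platform;
```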

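【Discussion】:

This is basically a copy of @GMB's answer; I accepted his version because he answered first. Thanks.
@PaaPs . . . The point is to explain which index works best for the query.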