Improving select speed - mysql - very large tables

【Question title】: Improving select speed - mysql - very large tables
【Posted】: 2014-03-10 03:01:26
【Question description】:

New to MySQL, and to SQL in general - so please be gentle :-)

I have a table with a very large number of rows. The table is:

create table iostat (
pkey     int not null auto_increment,
serverid int not null,
datestr  char(15) default 'NULL',
esttime  int not null default 0,

rs       float not null default 0.0,
ws       float not null default 0.0,
krs      float not null default 0.0,
kws      float not null default 0.0,
wait     float not null default 0.0,
actv     float not null default 0.0,
wsvct    float not null default 0.0,
asvct    float not null default 0.0,
pctw     int not null default 0,
pctb     int not null default 0,
device   varchar(50),
avgread  float not null default 0.0,
avgwrit  float not null default 0.0,

primary key (pkey),

index i_serverid (serverid),
index i_esttime (esttime),
index i_datestr (datestr),
index i_rs (rs),
index i_ws (ws),
index i_krs (krs),
index i_kws (kws),
index i_wait (wait),
index i_actv (actv),
index i_wsvct (wsvct),
index i_asvct (asvct),
index i_pctb (pctb),
index i_device (device),
index i_servdate (serverid, datestr),
index i_servest (serverid, esttime)

)
engine = MyISAM
data directory = '${IOSTATdatadir}'
index directory = '${IOSTATindexdir}'
;

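The table currently has 834,317,203 rows.

Yes - I need all of the data. The highest-level organization of the data is by collection date (datestr). It is a CHAR rather than a date in order to preserve the specific date format I use for various load, extract, and analysis scripts.

About 16,000,000 rows are added each day.

One of the operations I would like to speed up is the following (the limit is usually 50, but ranges from 10 to 250):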

create table TMP_TopLUNsKRead
  select
    krs, device, datestr, esttime
  from
    iostat
  where
    ${WHERECLAUSE}
  order by
    krs desc limit ${Limit};

where ${WHERECLAUSE} is:

serverid = 29 and esttime between X and Y and device like '%t%'

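where X and Y are timestamps spanning anywhere from 4 minutes to 24 hours.

I would prefer not to change the database engine. The current engine (MyISAM) lets me put the data and the indexes on separate drives, which gives me a significant overall performance gain. It is also a total of 1.6 billion rows, and reloading would take an enormous amount of time.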

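【Question discussion】:

What do you get if you add EXPLAIN to the beginning of this query?
The element with the biggest impact on performance in your query is ${WHERECLAUSE}; it would be useful to show us what that variable is set to.
I knew I would forget that: serverid = 29 and esttime between X and Y and device like '%t%', where X and Y are timestamps spanning anywhere from 4 minutes to 24 hours.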
+----+-------------+--------+-------+-----------------------------------------+-------+---------+------+-------+-------------+
| id | select_type | table  | type  | possible_keys                           | key   | key_len | ref  | rows  | Extra       |
+----+-------------+--------+-------+-----------------------------------------+-------+---------+------+-------+-------------+
|  1 | SIMPLE      | iostat | index | i_serverid,i_esttime,i_servdate,i_serve | i_krs | 4       | NULL | 69421 | Using where |
+----+-------------+--------+-------+-----------------------------------------+-------+---------+------+-------+-------------+

Tags: mysql query-performance large-data-volumes

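【Solution 1】:

Without knowing what is in your ${WHERECLAUSE}, it is impossible to help you. You are right, this is a huge table.

But here is an observation that might help:

A compound covering index on

(krs, device, datestr, esttime)

might speed up the ordering and extraction of the subset of data.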
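
As a rough sketch, the suggested index could be added and checked as shown here. The index name i_topkrs and the two timestamp values are placeholders, not taken from the question. Note also that the WHERE clause given in the question discussion filters on serverid, which is not part of this index, so the index alone may not fully cover that query.

```sql
-- Sketch only: i_topkrs is a hypothetical index name.
-- Adding an index to a table of this size may take a long time.
alter table iostat
  add index i_topkrs (krs, device, datestr, esttime);

-- Check whether the optimizer picks the new index for the top-N query.
-- The two esttime bounds stand in for X and Y and are placeholder values.
explain
select krs, device, datestr, esttime
from iostat
where serverid = 29
  and esttime between 1394400000 and 1394403600
  and device like '%t%'
order by krs desc
limit 50;
```

If the key column of the EXPLAIN output still shows a different index, the optimizer has judged another access path cheaper for that particular time range.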

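【Discussion】:

I have added several compound covering indexes based on specific operations like this one, but I removed them because the load time for new data became extremely long. I may put this one back - building this temp table (and 2 others like it) is currently the slowest recurring task I perform on this table.
Now that I have re-read and fully understood your comment... I may do that, but I have no problems with the extraction performance from the TMP_... tables. They are usually a very small subset of the iostat table.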

【Solution 2】:

device like '%t%'

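That is the killer. The leading % means the pattern can match anywhere in the value, so MySQL has to search the entire column (or the entire index, if the column is indexed) rather than do an index lookup. See if you can do without the leading %.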
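
To see the difference, the two patterns can be compared with EXPLAIN. This is only a sketch: the prefix 'c1t' is a made-up example value, and the plans you get depend on your data and MySQL version.

```sql
-- Leading wildcard: the match can start anywhere in the string,
-- so the index on device cannot be used to narrow down a range.
explain select count(*) from iostat where device like '%t%';

-- Constant prefix ('c1t' is a hypothetical value): the optimizer
-- can use i_device for a range scan over the matching prefix.
explain select count(*) from iostat where device like 'c1t%';
```

In the first plan you would expect a scan of the whole table or the whole index; in the second, a range access on i_device with a much smaller row estimate.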

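【Discussion】:

I was afraid of this. The data is in the format cXtYYYYYYYY..., where X varies and is unpredictable. The real fix would be to add a device_type field. The presence of the 't' distinguishes two types of device data.
@opsdog, with gigarow tables it is vital to design the schema to be searchable. You might even consider switching to another RDBMS that offers function indexes and descending indexes, such as PostgreSQL.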
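
The device_type idea mentioned in the discussion might look roughly like this. The column type, the 0/1 encoding, the index name i_serv_type_est, and the timestamp values are all assumptions made for illustration; the thread does not specify any of them.

```sql
-- Sketch only: the column definition and index name are hypothetical.
-- Altering a table of this size may take a long time.
alter table iostat
  add column device_type tinyint not null default 0,
  add index i_serv_type_est (serverid, device_type, esttime);

-- One-time backfill: 1 when the device name contains 't', otherwise 0.
-- if() returns 0 for rows where device is NULL.
update iostat
  set device_type = if(device like '%t%', 1, 0);

-- The wildcard filter then becomes an equality test on an indexed column.
-- The two esttime bounds stand in for X and Y and are placeholder values.
select krs, device, datestr, esttime
from iostat
where serverid = 29
  and device_type = 1
  and esttime between 1394400000 and 1394403600
order by krs desc
limit 50;
```

The load scripts would also need to set device_type for each new row, otherwise newly loaded data would keep the default of 0.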