针对 FAST 计数优化 MySQL 数据库答案

【问题标题】：Optimizing MySQL database for FAST count针对 FAST 计数优化 MySQL 数据库
【发布时间】：2011-12-22 19:39:39
【问题描述】：

我目前正在尝试优化我的数据库。问题如下：我有一张表，目前存储超过 83Mio。与时间相关的值。它们由高分辨率 (ms) 时间戳索引。我需要做的是计算某个值在给定的时间间隔内出现了多少次 - 例如说我想知道值 1.56787 在时间戳 x 到时间戳 y 的间隔中出现了多少次。现在这几乎需要永远。我正在使用 InnoDB，我已经花了很多时间优化配置文件，这极大地提高了速度。

感谢您的任何意见，因为我几乎没有办法解决这个问题。我能想到的唯一解决方法是创建包含固定间隔的预计数值的表，这不会真正令人满意，因为整个事情也应该是完全可更新的（我们正在谈论每隔几毫秒到达的新值）。另一个数据库系统会更适合我的问题吗？

这里是解释输出：

Field   Type    Null    Key Default Extra

timestamp   bigint(20)  NO  PRI NULL     
ask decimal(6,5)    NO      NULL     
bid decimal(6,5)    NO      NULL     
askvolume   decimal(6,5)    NO      NULL     
bidvolume   decimal(6,5)    NO      NULL     

# The MySQL server
[mysqld]
port= 3306
socket= "C:/xampp/mysql/mysql.sock"
basedir="C:/xampp/mysql" 
tmpdir="C:/xampp/tmp" 
datadir="C:/xampp/mysql/data"
pid_file="mysql.pid"
skip-external-locking
key_buffer = 16M
max_allowed_packet = 61M
table_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
log_error="mysql_error.log"
bind-address="192.168.1.2"


# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
# 
# commented in by lampp security
#skip-networking
skip-federated

# Replication Master Server (default)
# binary logging is required for replication
# log-bin deactivated by default since XAMPP 1.4.11
#log-bin=mysql-bin

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id   = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slaver server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin


# Point the following paths to different dedicated disks
#tmpdir = "C:/xampp/tmp"
#log-update = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 4M
#bdb_max_lock = 10000

# Comment the following if you are using InnoDB tables
#skip-innodb
innodb_data_home_dir = "C:/xampp/mysql/data"
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = "C:/xampp/mysql/data"
#innodb_log_arch_dir = "C:/xampp/mysql/data"
## You can set .._buffer_pool_size up to 50 - 80 %
## of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 1024M
innodb_additional_mem_pool_size = 20M
## Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 5M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

哦，这台机器是具有 6GB RAM 的 i7-950，系统+数据库位于 SSD 上。所以我认为这应该不是问题？

感谢您的帮助，我们将不胜感激！

【问题讨论】：

在问题上砸钱不是一种选择吗？您可以做很多事情来提高性能，而更多的钱绝对不是我的首选……但在某些方面，您只需要获得更好的硬件。根据您的描述，您似乎正在接近这一点。话虽如此......我强烈建议给（配置良好的）Postgres 一个机会。
没有看到用于运行 MySQl 实例的表和硬件 - 很难提出任何建议。而且我怀疑 Postgres 是否会对 I/O 绑定系统产生任何影响，尤其是知道 InnoDB 的 B-tree 实现是业内最好的实现之一。从长远来看，交换整个 RDBMS 不是一个可行的选择 IMO。我打赌 MySQL 配置不正确（SHOW VARIABLES LIKE '%innodb%' and output of EXPLAIN...` 不存在）。如果您可以发布这些内容，则更容易分析问题所在。
谢谢大家，我在描述中添加了所有信息......不，如果有必要，投入一些现金应该不是问题，但正如我发布的那样，它是一个带 ssd 和 6GB 的 i7内存...
innodb_buffer_pool_size 在 6 gig 系统上只有 1 GB。将其增加到 5 个演出。您遇到的另一个问题是将时间戳作为 InnoDB 引擎中的主键。如果您的时间戳不是连续的（下一个比上一个大），那么 InnoDB 会对记录进行物理重新排序，这会在插入时由于聚集的主键而降低其性能。在您的 select 语句之前添加 EXPLAIN 关键字，以查看 MySQL 想要检查的行数并发布输出。

标签： mysql optimization

【解决方案1】：

我对索引时间戳值中的值范围没有感觉，但在我看来partitioning 你的表可以帮助你。特别是RANGE partitioning 或HASH partitioning。

这应该会给您带来显着的性能提升。

【讨论】：

哈希分区是不可取的：它确保数据在分区中的均匀分布（作为哈希），但是对于间隔查询（就像他所做的那样）它没有提供特别的好处，而范围partitionig 肯定会的。
谢谢大家的建议，我想我现在有很多东西要研究！

【解决方案2】：

如果时间范围可以表示为一系列范围（月、日、周等），您可能会引入类似日期前缀列的内容，这将显着减少使用 IN() 的检查行数表达。

这里有一篇文章揭露了这个想法：http://www.mysqlperformanceblog.com/2010/01/09/getting-around-optimizer-limitations-with-an-in-list/

【讨论】：

感谢您的回复！我在上面发布了解释输出。我一定会看看你的链接！

【解决方案3】：

第一步：如果您还没有这样做，请使用解释计划来查看查询的瓶颈到底是什么，以及引擎是否正确使用了索引。

第二步：按时间戳范围对表进行分区。我不确定 MySQL/InnoDB 是否具有该功能，但如果没有，您最好更改 DBMS。

在任何情况下，MySQL 都不是高性能的好选择：根据您的需要，使用 Oracle 或 Postgre 甚至内存存储可能会更好（尤其是如果您不太关心安全而不是性能）。

【讨论】：

InnoDB 将工作数据集存储在内存中，并且比内存中特定解决方案的运行速度要快得多。 Postgres 或 Oracle 不会更快。这是一个优化系统 I/O 并了解查询为何需要这么长时间的问题。
我真诚地怀疑 InnoDB 会比 H2 更快（他们说不是，因为它的价值：h2database.com/html/performance.html）。尽管如此，显然切换 RDBMS 是另一回事，而不是最简单的解决方案，但值得一提。
InnoDB @ 5.6 速度很快，您链接的基准表明 MySQL 在他们的案例中每秒执行 140 条语句，这比我的小型测试至强和 7200rpm 驱动器的速度要慢。我不相信针对其他系统的开箱即用基准测试。再说一次，MySQL 有专门的基于列的分析引擎，其执行速度比 Oracle 快。问题是为什么分贝小于 1 亿。记录执行如此缓慢，我知道一个事实是，您可以在 InnoDB 上使用如此多的行获得非常快速的结果，而无需更改数据库系统。