Taking your cities/blocks example, your schema might look like this:
CREATE TABLE cities (
`city_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
`country_id` TINYINT UNSIGNED NOT NULL,
`zip` VARCHAR(50) NOT NULL,
`name` VARCHAR(100) NOT NULL,
PRIMARY KEY (`city_id`)
);
CREATE TABLE blocks (
`block_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
`city_id` SMALLINT UNSIGNED NOT NULL,
`p1` SMALLINT UNSIGNED NOT NULL DEFAULT '0',
`p2` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
PRIMARY KEY (`block_id`),
FOREIGN KEY (`city_id`) REFERENCES `cities` (`city_id`)
);
Your query for a given city (city_id = 123) would be:
Query 1
SELECT AVG(p1/(p1+p2)) AS B
FROM blocks b
WHERE b.city_id = 123;
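Note: AVG(x) = SUM(x) / COUNT(x)
Now, if you are worried about performance, you should define some expected numbers:
- Number of cities
- (Average) number of blocks per city
- Hardware you will/can use
- Queries you will typically run
- Number of queries per hour/minute/second
Once you have defined these numbers, you can generate some dummy/fake data and run performance tests against it.
Here is an example with 1,000 cities and 100K blocks (100 blocks per city on average):
First, create a helper table with 100K sequence numbers: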
CREATE TABLE IF NOT EXISTS seq100k
SELECT NULL AS seq
FROM information_schema.COLUMNS c1
JOIN information_schema.COLUMNS c2
JOIN information_schema.COLUMNS c3
LIMIT 100000;
ALTER TABLE seq100k CHANGE COLUMN seq seq MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY;
With MariaDB you can use the sequence plugin instead.
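As a minimal sketch of that MariaDB alternative: the Sequence storage engine exposes virtual tables named seq_<from>_to_<to>, so the helper table can be filled without the information_schema cross join. This assumes a MariaDB server with the Sequence engine enabled; it will not work on MySQL.

```sql
-- MariaDB only (assumes the Sequence storage engine is available).
-- seq_1_to_100000 is a virtual table with a single column named seq.
CREATE TABLE IF NOT EXISTS seq100k (
    seq MEDIUMINT UNSIGNED NOT NULL PRIMARY KEY
)
SELECT seq
FROM seq_1_to_100000;
```

You could also select from seq_1_to_100000 directly in the data-generation statements and skip the helper table entirely.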
Generate the data:
DROP TABLE IF EXISTS blocks;
DROP TABLE IF EXISTS cities;
CREATE TABLE cities (
`city_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
`country_id` TINYINT UNSIGNED NOT NULL,
`zip` VARCHAR(50) NOT NULL,
`name` VARCHAR(100) NOT NULL,
PRIMARY KEY (`city_id`)
)
SELECT seq AS city_id
, floor(rand(1)*10+1) as country_id
, floor(rand(2)*99999+1) as zip
, rand(3) as name
FROM seq100k
LIMIT 1000;
CREATE TABLE blocks (
`block_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
`city_id` SMALLINT UNSIGNED NOT NULL,
`p1` SMALLINT UNSIGNED NOT NULL DEFAULT '0',
`p2` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
PRIMARY KEY (`block_id`),
FOREIGN KEY (`city_id`) REFERENCES `cities` (`city_id`)
)
SELECT seq AS block_id
, floor(rand(4)*1000+1) as city_id
, floor(rand(5)*11) as p1
, floor(rand(6)*20+1) as p2
FROM seq100k
LIMIT 100000;
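Before timing anything, it may be worth checking that the generated data roughly matches the target numbers. The following is only a sanity-check sketch; the column aliases are made up for illustration.

```sql
-- Summarize how the generated blocks are distributed over the cities.
SELECT COUNT(*) AS cities_with_blocks,
       SUM(cnt) AS total_blocks,
       AVG(cnt) AS avg_blocks_per_city,
       MIN(cnt) AS min_blocks,
       MAX(cnt) AS max_blocks
FROM (
    SELECT city_id, COUNT(*) AS cnt
    FROM blocks
    GROUP BY city_id
) t;
```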
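Now you can run your queries. Note that I will not report exact runtimes; if you need precise measurements, you should use profiling.
Running Query 1, my GUI (HeidiSQL) shows 0.000 sec, which I call "almost instant".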
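If you want more precise numbers than a GUI timer, session profiling is one option. This is a sketch, assuming a server where the profiling session variable is still supported (it is deprecated in recent MySQL versions but available in MariaDB); the query ID 1 assumes Query 1 is the first statement profiled in the session.

```sql
-- Enable profiling for the current session.
SET profiling = 1;

-- Run the statement to be measured (Query 1).
SELECT AVG(p1/(p1+p2)) AS B
FROM blocks b
WHERE b.city_id = 123;

-- List profiled statements with their durations.
SHOW PROFILES;

-- Break down one statement by stage; use the Query_ID reported by SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;

SET profiling = 0;
```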
You might also want to run a query like this:
Query 2
SELECT b.city_id, AVG(p1/(p1+p2)) AS B
FROM blocks b
GROUP BY b.city_id
ORDER BY B DESC
LIMIT 10;
HeidiSQL shows 0.078 sec.
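To see how the server executes Query 2, you can prefix it with EXPLAIN. This is only a sketch; the exact plan depends on your server version and data. When an index covers all columns the query reads, the Extra column typically contains "Using index".

```sql
-- Show the execution plan for Query 2 without running it.
EXPLAIN
SELECT b.city_id, AVG(p1/(p1+p2)) AS B
FROM blocks b
GROUP BY b.city_id
ORDER BY B DESC
LIMIT 10;
```

Running the same EXPLAIN again after changing indexes makes it easy to compare plans.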
With a covering index
ALTER TABLE `blocks`
DROP INDEX `city_id`,
ADD INDEX `city_id` (`city_id`, `p1`, `p2`);
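you can reduce the runtime to 0.031 sec. If that is not fast enough, you should think about a caching strategy. One way (besides caching at the application level) is to use triggers to maintain a new column in the cities table (let's just call it B):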
ALTER TABLE `cities` ADD COLUMN `B` FLOAT NULL DEFAULT NULL AFTER `name`;
Define an update trigger:
DROP TRIGGER IF EXISTS `blocks_after_update`;
DELIMITER //
CREATE TRIGGER `blocks_after_update` AFTER UPDATE ON `blocks` FOR EACH ROW BEGIN
if new.p1 <> old.p1 or new.p2 <> old.p2 then
update cities c
set c.B = (
select avg(p1/(p1+p2))
from blocks b
where b.city_id = new.city_id
)
where c.city_id = new.city_id;
end if;
END//
DELIMITER ;
Update test:
Query 3
UPDATE blocks b SET p2 = p2 + 100 WHERE 1=1;
UPDATE blocks b SET p2 = p2 - 100 WHERE 1=1;
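This query runs in 2.500 sec without the trigger and in 60 sec with the trigger. That may look like a lot of overhead, but consider that we are updating 100K rows twice, which means an average of 60K msec / 200K updates = 0.3 msec/update.
Now you can get the same result as Query 2 with
Query 4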
SELECT c.city_id, c.B
FROM cities c
ORDER BY c.B DESC
LIMIT 10;
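"almost instantly" (0.000 sec).
If needed, you can still optimize the trigger by using an additional column block_count in the cities table (which also needs to be maintained by triggers).
Add the column: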
ALTER TABLE `cities`
ADD COLUMN `block_count` MEDIUMINT UNSIGNED NOT NULL DEFAULT '0' AFTER `B`;
Initialize the data:
UPDATE cities c SET c.block_count = (
SELECT COUNT(*)
FROM blocks b
WHERE b.city_id = c.city_id
)
WHERE 1=1;
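The incremental trigger that follows relies on B already holding the correct average for each city. In this walkthrough B was filled as a side effect of running Query 3 with the first trigger; on a fresh setup you would need to initialize it yourself. One possible sketch, which sets both cached columns in a single pass (the derived-table alias s and its column names are made up for illustration):

```sql
-- Initialize both cached columns from the current contents of blocks.
-- Cities without any blocks get block_count = 0 and B = NULL.
UPDATE cities c
LEFT JOIN (
    SELECT city_id,
           COUNT(*)        AS cnt,
           AVG(p1/(p1+p2)) AS avg_b
    FROM blocks
    GROUP BY city_id
) s ON s.city_id = c.city_id
SET c.block_count = COALESCE(s.cnt, 0),
    c.B = s.avg_b;
```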
Rewrite the trigger:
DROP TRIGGER IF EXISTS `blocks_after_update`;
DELIMITER //
CREATE TRIGGER `blocks_after_update` AFTER UPDATE ON `blocks` FOR EACH ROW BEGIN
declare old_A, new_A double;
if new.p1 <> old.p1 or new.p2 <> old.p2 then
set old_A = old.p1/(old.p1+old.p2);
set new_A = new.p1/(new.p1+new.p2);
update cities c
set c.B = (c.B * c.block_count - old_A + new_A) / c.block_count
where c.city_id = new.city_id;
end if;
END//
DELIMITER ;
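With this trigger, Query 3 now runs in 8.5 sec. That means an overhead of 0.03 msec per update.
Note that you will also need to define INSERT and DELETE triggers, and you will need to add more logic (for example, to handle changes of city_id on update). But it is also possible that you will not need any triggers at all.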
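The INSERT and DELETE triggers mentioned above could look roughly like the following. This is an untested sketch, not a definitive implementation. The trigger names blocks_after_insert and blocks_after_delete are hypothetical, and it assumes that p1 + p2 > 0 for every row and that B and block_count are in sync with the blocks table before the triggers are created.

```sql
DROP TRIGGER IF EXISTS `blocks_after_insert`;
DROP TRIGGER IF EXISTS `blocks_after_delete`;
DELIMITER //
CREATE TRIGGER `blocks_after_insert` AFTER INSERT ON `blocks` FOR EACH ROW BEGIN
    declare new_A double;
    -- Assumes new.p1 + new.p2 > 0; otherwise new_A is NULL and B becomes NULL.
    set new_A = new.p1/(new.p1+new.p2);
    -- MySQL evaluates single-table SET assignments left to right, so B is
    -- computed from the old block_count before block_count is incremented.
    update cities c
    set c.B = (coalesce(c.B, 0) * c.block_count + new_A) / (c.block_count + 1),
        c.block_count = c.block_count + 1
    where c.city_id = new.city_id;
END//
CREATE TRIGGER `blocks_after_delete` AFTER DELETE ON `blocks` FOR EACH ROW BEGIN
    declare old_A double;
    set old_A = old.p1/(old.p1+old.p2);
    -- When the last block of a city is removed, the average is undefined (NULL).
    update cities c
    set c.B = if(c.block_count > 1,
                 (c.B * c.block_count - old_A) / (c.block_count - 1),
                 NULL),
        c.block_count = c.block_count - 1
    where c.city_id = old.city_id;
END//
DELIMITER ;
```

A change of city_id in an UPDATE could be handled along the same lines, by treating it as a delete from the old city followed by an insert into the new one. Because B is a FLOAT that is adjusted incrementally, rounding errors can accumulate over many changes, so an occasional full recalculation may be worthwhile.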