【Posted】: 2018-09-08 18:30:57
【Question】:
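For several months I have been collecting data on football matches from open sources. To do this, I use PHP with XPath to fetch the data from a website URL that shows the data for one specific football match. I then do some data editing to shape it to my needs. The next and final step is to transfer it into several tables in my MySQL database.
As the database has grown quickly, I have slowly run into serious performance problems. Because I do everything locally on my computer, which is no monster of a machine, processing a single match already takes some time. To give you a feeling for how fast this is: in the first days of data mining, one match took about 24 seconds. By now the average has passed the 60-second mark.
So far I have occasionally gone into the PHP code and tried to improve it where possible, because I assumed the main problem was the not-so-clean code snippets. That helped a little, but a few days later the average time had increased further, and recently I began to realise that there must be another time-consuming issue. So I wrote a test PHP script that does some kind of logging while the main code runs.
It showed that some of the SQL queries that insert data into my database tables take a long time on average (I analysed 100 matches here):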
- DB starting squad: 6.44 seconds
- DB substitutes: 8.49 seconds
Checking the queries again, I realised that they are very complex.
These are the tables involved:
tblStartingSquad
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| id | matchID | player1ID | player2ID | player3ID | player4ID | player5ID | player6ID | player7ID | player8ID | player9ID | player10ID | player11ID | clubID |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| 1 | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1 |
| 2 | 1 | 12 | 13 | 14 | 15 | 16 | 17 | 16 | 17 | 18 | 19 | 20 | 2 |
| 3 | 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1 |
| 4 | 2 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 3 |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
tblSubstitutes
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| id | matchID | player12ID | player13ID | player14ID | player15ID | player16ID | player17ID | player18ID | player19ID | player20ID | player21ID | player22ID | player23ID | clubID |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| 1 | 1 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 1 |
| 2 | 1 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 2 |
| 3 | 2 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 1 |
| 4 | 2 | 56 | 57 | 58 | 59 | 60 | 61 | 61 | 62 | 63 | 64 | 65 | 66 | 3 |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
tblMatch
+---------+---------------------+-------------+------------------+
| matchID | date | coach1 | coach2 |
+---------+---------------------+-------------+------------------+
| 1 | 2006-08-19 22:00:00 | Piotr Nowak | Fernando Clavijo |
| 2 | 2006-08-15 21:00:00 | Piotr Nowak | Mustafa Ugur |
+---------+---------------------+-------------+------------------+
tblPlayer
+----------+------------------------+------------------+
| playerID | namePlayer | short |
+----------+------------------------+------------------+
| 1 | Enis Ulusan | enis-ulusan |
| 2 | Grant Robert Murray | grant-murray |
| 3 | Evgeniy Shpedt | evgeniy-shpedt |
| 4 | Mihai Alexandru Costea | mihai-costea |
| 5 | Jan Zolna | jan-zolna |
| 6 | Adrian Gheorghiu | adrian-gheorghiu |
| 7 | Marius Marian Croitoru | marius-croitoru |
| 8 | Jacov Nachtailer | jacov-nachtailer |
| ... | ... | ... |
+----------+------------------------+------------------+
tblClub
+--------+-----------------+
| clubID | nameClub |
+--------+-----------------+
| 1 | D.C. United |
| 2 | Colorado Rapids |
| 3 | Caykur Rizespor |
+--------+-----------------+
These are the queries involved:
SQL query: starting squad
$tblstarting_squad = 'INSERT INTO tblStartingSquad (matchID, player1ID, player2ID, player3ID, player4ID, player5ID, player6ID, player7ID, player8ID, player9ID, player10ID, player11ID, clubID)
SELECT
(SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[0] . '" AND short = "' . $player_short[0] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[1] . '" AND short = "' . $player_short[1] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[2] . '" AND short = "' . $player_short[2] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[3] . '" AND short = "' . $player_short[3] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[4] . '" AND short = "' . $player_short[4] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[5] . '" AND short = "' . $player_short[5] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[6] . '" AND short = "' . $player_short[6] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[7] . '" AND short = "' . $player_short[7] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[8] . '" AND short = "' . $player_short[8] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[9] . '" AND short = "' . $player_short[9] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[10] . '" AND short = "' . $player_short[10] . '" LIMIT 1),
(SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
WHERE NOT EXISTS (
SELECT e.matchID
FROM tblStartingSquad As e
INNER JOIN tblMatch As m
ON e.matchID = m.matchID
WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
);';
if (!mysqli_query($db_connection, $tblstarting_squad)) {
echo("Error description $tblstarting_squad: " . mysqli_error($db_connection) . "<br />");
}
SQL query: substitutes
$tblsubstitutes = 'INSERT INTO tblSubstitutes (matchID, player12ID, player13ID, player14ID, player15ID, player16ID, player17ID, player18ID, player19ID, player20ID, player21ID, player22ID, player23ID, clubID)
SELECT
(SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[11] . '" AND short = "' . $player_short[11] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[12] . '" AND short = "' . $player_short[12] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[13] . '" AND short = "' . $player_short[13] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[14] . '" AND short = "' . $player_short[14] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[15] . '" AND short = "' . $player_short[15] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[16] . '" AND short = "' . $player_short[16] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[17] . '" AND short = "' . $player_short[17] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[18] . '" AND short = "' . $player_short[18] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[19] . '" AND short = "' . $player_short[19] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[20] . '" AND short = "' . $player_short[20] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[21] . '" AND short = "' . $player_short[21] . '" LIMIT 1),
(SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[22] . '" AND short = "' . $player_short[22] . '" LIMIT 1),
(SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
WHERE NOT EXISTS (
SELECT e.matchID
FROM tblSubstitutes As e
INNER JOIN tblMatch As m
ON e.matchID = m.matchID
WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
);';
if (!mysqli_query($db_connection, $tblsubstitutes)) {
echo("Error description $tblsubstitutes: " . mysqli_error($db_connection) . "<br />");
}
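The two queries are practically identical. If there is no existing entry with the same data, they insert the playerIDs of 11 (respectively 12) players into tblStartingSquad (respectively tblSubstitutes). Each playerID has to be looked up in the database beforehand, because the raw data carries no IDs of its own. The lookup selects it from tblPlayer by namePlayer and short.
The tables tblStartingSquad and tblSubstitutes themselves currently contain 110,000 rows (for 55,000 matches), and tblPlayer has 100,000 rows.
I googled for solutions but could not find anything that improves the overall speed. One problem, as I understand it, is that I have to look up every player individually, so I end up with 11 and 12 subqueries. That is not very elegant, but I really don't know how to improve it. Maybe someone on StackOverflow has a suggestion?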
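To make the lookup concrete, this is what a single one of those player subqueries amounts to on its own, together with the kind of composite index that would let MySQL answer it without scanning tblPlayer. The index name is hypothetical, and the question does not say which indexes already exist on these tables, so treat this as a sketch under that assumption rather than a description of the actual schema.

```sql
-- One of the 11 (or 12) per-player lookups, written out on its own.
-- The literal values are taken from the sample rows of tblPlayer above.
SELECT playerID
FROM tblPlayer
WHERE namePlayer = 'Enis Ulusan'
  AND short = 'enis-ulusan'
LIMIT 1;

-- Hypothetical composite index covering both columns of that WHERE clause.
-- Assumption: namePlayer and short are VARCHAR columns short enough to be
-- indexed in full; the index name idx_player_name_short is made up here.
CREATE INDEX idx_player_name_short ON tblPlayer (namePlayer, short);

-- EXPLAIN reports whether the lookup uses an index or scans the whole table.
EXPLAIN
SELECT playerID
FROM tblPlayer
WHERE namePlayer = 'Enis Ulusan'
  AND short = 'enis-ulusan'
LIMIT 1;
```

Running the EXPLAIN statement before and after creating the index is one way to check whether the per-player lookups are where the time goes.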
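As an illustration of the problem described above, the per-player subqueries can in principle be collapsed into a single lookup with a row-constructor IN list. This is only a sketch: it returns the matching rows to the calling PHP code, which would then have to map each name back to its squad position before building the INSERT, and it has not been measured against the data described here. Only three players from the sample rows are shown; a real call would list all 11 or 12.

```sql
-- Resolve several players in one statement instead of one subquery each.
-- The (namePlayer, short) pairs come from the sample rows of tblPlayer.
SELECT playerID, namePlayer, short
FROM tblPlayer
WHERE (namePlayer, short) IN (
    ('Enis Ulusan',         'enis-ulusan'),
    ('Grant Robert Murray', 'grant-murray'),
    ('Evgeniy Shpedt',      'evgeniy-shpedt')
);
```

Unlike the original subqueries, this form has no LIMIT 1 per player, so duplicate (namePlayer, short) pairs in tblPlayer would yield more than one row for the same player and would need handling in the calling code.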
【Comments】:
-
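You have not provided enough information for us to help you. Please read this note about asking good SQL questions, and pay attention to the section on query performance. Then please edit your question.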
-
Added the 5 database tables involved to help in understanding the problem.
-
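@WilsonHauck It is just a local XAMPP server on my computer, not even a Unix system. I am using 8 GB of RAM. For everything else you asked for I cannot give an answer, because I don't know what you mean or what you want to know. I'm just a noob who can do some PHP and SQL and knows how to start XAMPP to get done what I want to do :)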
-
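@s1dy Just looked at your recent comments. You have made a lot of progress since you started. When you are ready to tune your server's performance, get in touch with us, and you will be even more amazed at what we can help you achieve by reducing wait times.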
-
@WilsonHauck Thanks, but no thanks. I don't want to get in contact with you privately. I can get help through this site.
Tags: php mysql sql sqlperformance