【Question title】: Improve performance of complex SQL query
【Posted】: 2018-09-08 18:30:57
【Question】:

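For several months I have been collecting data on football matches from open sources. To do this, I use PHP with XPath to fetch the data from a specific website URL that shows the data for a particular match. I then edit the data to fit my needs. The final step is to transfer it into several tables in my MySQL database.

As the database is growing quickly, I am slowly running into serious performance problems. I do everything locally on my computer, which is no powerhouse, so processing a single match already takes some time. To give a sense of the speed: in the first days of data mining, a match took about 24 seconds. By now the average has passed the 60-second mark.

Until now I have occasionally gone into the PHP code and tried to improve it where possible, because I assumed the main problem was some not-so-clean code snippets. That helped a little, but after a few days the average time rose again, and I recently began to realize that there must be another time-consuming problem. So I wrote a test PHP script that does some logging while the main code runs.

It showed that some of the SQL queries that insert data into the database tables take a long time on average (I analysed 100 matches here):

  • DB starting squad: 6.44 seconds
  • DB substitutes: 8.49 seconds

Looking at the queries again, I realized that they are quite complex.

These are the tables involved: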

tblStartingSquad

+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+  
| id | matchID | player1ID | player2ID | player3ID | player4ID | player5ID | player6ID | player7ID | player8ID | player9ID | player10ID | player11ID | clubID |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| 1  |    1    |     1     |     2     |     3     |     4     |     5     |     6     |     7     |     8     |     9     |     10     |     11     |   1    |
| 2  |    1    |    12     |    13     |    14     |    15     |    16     |    17     |    16     |    17     |    18     |     19     |     20     |   2    |
| 3  |    2    |     1     |     2     |     3     |     4     |     5     |     6     |     7     |     8     |     9     |     10     |     11     |   1    |
| 4  |    2    |    21     |    22     |    23     |    24     |    25     |    26     |    27     |    28     |    29     |     30     |     31     |   3    |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+

tblSubstitutes

+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+  
| id | matchID | player12ID | player13ID | player14ID | player15ID | player16ID | player17ID | player18ID | player19ID | player20ID | player21ID | player22ID | player23ID | clubID |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| 1  |    1    |     32     |     33     |     34     |     35     |     36     |     37     |     38     |     39     |     40     |     41     |     42     |     43     |   1    |
| 2  |    1    |     44     |     45     |     46     |     47     |     48     |     49     |     50     |     51     |     52     |     53     |     54     |     55     |   2    |
| 3  |    2    |     32     |     33     |     34     |     35     |     36     |     37     |     38     |     39     |     40     |     41     |     42     |     43     |   1    |
| 4  |    2    |     56     |     57     |     58     |     59     |     60     |     61     |     61     |     62     |     63     |     64     |     65     |     66     |   3    |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+

tblMatch

+---------+---------------------+-------------+------------------+
| matchID |         date        |    coach1   |      coach2      |
+---------+---------------------+-------------+------------------+
|    1    | 2006-08-19 22:00:00 | Piotr Nowak | Fernando Clavijo |
|    2    | 2006-08-15 21:00:00 | Piotr Nowak |   Mustafa Ugur   |
+---------+---------------------+-------------+------------------+

tblPlayer

+----------+------------------------+------------------+
| playerID |       namePlayer       |      short       |
+----------+------------------------+------------------+
|     1    |       Enis Ulusan      |    enis-ulusan   |
|     2    |   Grant Robert Murray  |   grant-murray   |
|     3    |     Evgeniy Shpedt     |  evgeniy-shpedt  |
|     4    | Mihai Alexandru Costea |   mihai-costea   |
|     5    |       Jan Zolna        |     jan-zolna    |
|     6    |    Adrian Gheorghiu    | adrian-gheorghiu |
|     7    | Marius Marian Croitoru | marius-croitoru  |
|     8    |    Jacov Nachtailer    | jacov-nachtailer |
|    ...   |          ...           |        ...       |
+----------+------------------------+------------------+

tblClub

+--------+-----------------+
| clubID |    nameClub     |
+--------+-----------------+
|    1   |   D.C. United   |
|    2   | Colorado Rapids |
|    3   | Caykur Rizespor |
+--------+-----------------+

These are the queries involved:

SQL query for the starting squad

$tblstarting_squad = 'INSERT INTO tblStartingSquad (matchID, player1ID, player2ID, player3ID, player4ID, player5ID, player6ID, player7ID, player8ID, player9ID, player10ID, player11ID, clubID) 
                    SELECT
                        (SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[0] . '" AND short = "' . $player_short[0] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[1] . '" AND short = "' . $player_short[1] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[2] . '" AND short = "' . $player_short[2] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[3] . '" AND short = "' . $player_short[3] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[4] . '" AND short = "' . $player_short[4] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[5] . '" AND short = "' . $player_short[5] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[6] . '" AND short = "' . $player_short[6] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[7] . '" AND short = "' . $player_short[7] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[8] . '" AND short = "' . $player_short[8] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[9] . '" AND short = "' . $player_short[9] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[10] . '" AND short = "' . $player_short[10] . '" LIMIT 1), 
                        (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
                    WHERE NOT EXISTS (
                        SELECT e.matchID 
                        FROM tblStartingSquad As e
                        INNER JOIN tblMatch As m
                            ON e.matchID = m.matchID
                        WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
                    );';

if (!mysqli_query($db_connection, $tblstarting_squad)) {
                    echo("Error description $tblstarting_squad: " . mysqli_error($db_connection) . "<br />");
                }

SQL query for the substitutes

$tblsubstitutes = 'INSERT INTO tblSubstitutes (matchID, player12ID, player13ID, player14ID, player15ID, player16ID, player17ID, player18ID, player19ID, player20ID, player21ID, player22ID, player23ID, clubID) 
                    SELECT
                        (SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[11] . '" AND short = "' . $player_short[11] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[12] . '" AND short = "' . $player_short[12] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[13] . '" AND short = "' . $player_short[13] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[14] . '" AND short = "' . $player_short[14] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[15] . '" AND short = "' . $player_short[15] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[16] . '" AND short = "' . $player_short[16] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[17] . '" AND short = "' . $player_short[17] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[18] . '" AND short = "' . $player_short[18] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[19] . '" AND short = "' . $player_short[19] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[20] . '" AND short = "' . $player_short[20] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[21] . '" AND short = "' . $player_short[21] . '" LIMIT 1),
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[22] . '" AND short = "' . $player_short[22] . '" LIMIT 1), 
                        (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
                    WHERE NOT EXISTS (
                        SELECT e.matchID 
                        FROM tblSubstitutes As e
                        INNER JOIN tblMatch As m
                            ON e.matchID = m.matchID
                        WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
                    );';

if (!mysqli_query($db_connection, $tblsubstitutes)) {
                    echo("Error description $tblsubstitutes: " . mysqli_error($db_connection) . "<br />");
                }

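The two queries are practically identical. They insert the playerIDs of the 11 (respectively 12) players into tblStartingSquad (respectively tblSubstitutes), provided there is no existing entry with the same data. The playerID has to be looked up in the database first, because the raw data carries no IDs of its own. This is done by selecting it from tblPlayer via namePlayer and short.

The tables tblStartingSquad and tblSubstitutes themselves currently contain 110,000 rows (for 55,000 matches), and tblPlayer contains 100,000 rows.

I googled for solutions but could not find anything that improves the overall speed. One problem I am aware of is that I have to look up each player individually, so I end up with 11 and 12 subqueries. That is not very elegant, but I really don't know how to improve it. Maybe someone on StackOverflow has a suggestion?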

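【Comments】:

  • You have not provided enough information for us to help you. Please read this note about asking good SQL questions, and pay attention to the section on query performance. Then please edit your question.
  • Added the 5 database tables involved to help understand the problem.
  • @WilsonHauck It is just a local XAMPP server on my computer, not even a Unix system. I have 8 GB of RAM. I can't answer everything else you asked for, because I don't know what you mean or what you want to know. I'm just a noob who can do a bit of PHP and SQL and knows how to start XAMPP to get done what I want :)
  • @s1dy I have been following your recent comments. You have made a lot of progress since the start. When you are ready to tune your server's performance, get in touch; you will be even more surprised at what we can help you achieve by reducing wait times.
  • @WilsonHauck Thanks, but no thanks. I don't want to contact you privately. I can get help through this site.

Tags: php mysql sql sqlperformance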


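【Solution 1】:

Reconsider your table design in favor of a long-format design. Number-suffixed columns are never the ideal way to store data. Rows are cheap. Columns are expensive. Joins, aggregation, searching, indexing, and so on are much easier in long format. Otherwise your queries will be complex, as you show with 12 subqueries or even self-joins!

Interestingly, your tblClub and tblPlayer are in long format, but tblStartingSquad and tblSubstitutes are not! Simply collapse all the separate player columns into one, so that each row represents a different player: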

tblStartingSquad

 ID   MatchID   PlayerID    ClubID  
  1         1          5         1
  2         1          8         1
  3         1          9         1
...

tblSubstitutes

 ID   MatchID   PlayerID    ClubID
  1         1          2         1
  2         1         16         1
  3         1          7         1
...

tblMatch (Coach columns renamed for clarity)

ID                    Date      HomeCoach          AwayCoach
 1     2006-08-19 22:00:00    Piotr Nowak   Fernando Clavijo
 2     2006-08-15 21:00:00    Piotr Nowak       Mustafa Ugur
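
As a rough sketch of how the long-format layout above might be created and back-filled from the existing wide table, assuming MySQL with InnoDB: the table name tblStartingSquadLong, the column types, and the secondary index are assumptions for illustration and do not come from the answer. Only three of the eleven unpivot branches are written out.

```sql
-- Hypothetical long-format table: one row per player per match and club.
-- Column types are assumed; match them to the existing ID columns.
CREATE TABLE tblStartingSquadLong (
    ID       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    MatchID  INT UNSIGNED NOT NULL,
    PlayerID INT UNSIGNED NOT NULL,
    ClubID   INT UNSIGNED NOT NULL,
    KEY idx_match_club (MatchID, ClubID)
);

-- Unpivot the wide rows: one SELECT per numbered player column.
-- Rows whose player lookup produced NULL are skipped.
INSERT INTO tblStartingSquadLong (MatchID, PlayerID, ClubID)
SELECT matchID, player1ID, clubID FROM tblStartingSquad WHERE player1ID IS NOT NULL
UNION ALL
SELECT matchID, player2ID, clubID FROM tblStartingSquad WHERE player2ID IS NOT NULL
UNION ALL
SELECT matchID, player3ID, clubID FROM tblStartingSquad WHERE player3ID IS NOT NULL;
-- ...repeat the SELECT for player4ID through player11ID.
```

The same pattern would apply to tblSubstitutes with player12ID through player23ID. Once the copy has been verified, the old table can be renamed or dropped.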

PHP

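With this database design you can run a much simpler parameterized query from PHP, and using PDO (rather than mysqli) makes it easier to bind many parameters from an array.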

// OPEN CONNECTION
$dbconn = new PDO("mysql:host=$servername;dbname=$dbname", $username, $password);
// SET PDO ERROR MODE TO EXCEPTION
$dbconn -> setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PREPARED STATEMENT
$sql = "INSERT INTO tblStartingSquad (`MatchID`, `PlayerID`, `ClubID`)
        SELECT m.MatchID, p1.PlayerID, c.ClubID
        FROM 
           (SELECT p.PlayerID
            FROM tblPlayer p
            WHERE p.namePlayer IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
           ) p1
        INNER JOIN 
           (SELECT p.PlayerID
            FROM tblPlayer p
            WHERE p.short IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
           ) p2 ON p1.PlayerID = p2.PlayerID
        CROSS JOIN
            (SELECT MatchID 
             FROM tblMatch
             WHERE `date` = ? AND HomeCoach = ? AND AwayCoach = ?) m
        CROSS JOIN
            (SELECT ClubID 
             FROM tblClub
             WHERE nameClub = ?) c
        WHERE NOT EXISTS
             (SELECT 1 FROM tblStartingSquad As e
              WHERE e.MatchID = m.MatchID
                AND e.ClubID = c.ClubID)";

try {
     // INITIALIZE STATEMENT
     $stmt = $dbconn->prepare($sql);

     $params = array($player_name[0], $player_name[1], $player_name[2], 
                     $player_name[3], $player_name[4], $player_name[5], 
                     $player_name[6], $player_name[7], $player_name[8], 
                     $player_name[9], $player_name[10],  
                     $player_short[0], $player_short[1], $player_short[2],
                     $player_short[3], $player_short[4], $player_short[5], 
                     $player_short[6], $player_short[7], $player_short[8], 
                     $player_short[9], $player_short[10],         
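                     $match_date, $match_coach_home, $match_coach_away,
                     $match_club[1]);   // club name; the question indexes $match_club as an array

     // EXECUTE ACTION, PASSING ALL VALUES POSITIONALLY (EACH IS BOUND AS A STRING).
     // bindParam() in a foreach binds by reference to the loop variable,
     // so every placeholder would end up with the last value.
     $stmt->execute($params);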

} catch (PDOException $e) {
     echo "Error: " . $e->getMessage();
}

Run a similar call for tblSubstitutes, adjusting the target table of the append query, the WHERE clause, and the parameter values.

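【Comments】:

  • Awesome! I will check this out tomorrow and let you (and all other interested readers) know how much it improves performance!
  • After 5 days of editing and changing the table structure (2,500,000 rows added), I was finally able to test the PHP code and its performance. The first issue I hit was that $stmt->bind_param has to be $stmt->bindParam. The next was that the script ran fine but no entries ended up in the database, and I could not find out why. I changed everything that was necessary. In fact, when I use $stmt->debugDumpParams() to show me the prepared statement, it gives Key: Position #0: paramno=0 name=[0] "" is_param=1 param_type=2 for every key position.
  • No exception raised? If you run PHP on a web server, check the Apache logs. And make sure the number of qmarks, ? (there are a lot of them here), equals the number of items in the $params array. I don't know your setup, so use this answer as a guide for your actual situation and adjust as needed.
  • OK, now for the performance analysis: down from 14.93 s to 2.66 s. Overall, the whole script got 24.8% faster. Not bad! I got the PHP code working by removing foreach($params as $key => $val) { $stmt->bindParam($key+1, $val, PDO::PARAM_STR); } and changing $stmt->execute(); to $stmt->execute($params);. That finally did it. Don't ask me why :)
  • Well, glad to hear it all worked out. The performance gain is probably due to the redesigned table structure. Even with 2.5 million rows, databases scale better long than wide. There may even be another quick solution with LOAD XML, which lets you go directly from XML to the database via PHP. I just answered a PHP-MariaDB question.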
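【Solution 2】:

As O. Jones noted in the comments, seeing more of the (PHP) code would help in judging the performance problem. Besides redesigning your database, another quick suggestion is to run a loop and use a prepared statement for the individual playerID lookups. This may give you a slight performance gain.

To me it seems logical to do more in PHP rather than outsourcing the data-fetching logic to SQL.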
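
To make the prepare-once, execute-many idea concrete, here is a minimal sketch using MySQL's own PREPARE / EXECUTE statements. The statement name find_player and the user variables are hypothetical, and the sample values are taken from the tblPlayer excerpt in the question. In PHP, the equivalent would be a single prepare() call followed by one execute() per player inside the loop.

```sql
-- Prepare the lookup once; the server parses it a single time.
PREPARE find_player FROM
    'SELECT playerID FROM tblPlayer WHERE namePlayer = ? AND short = ? LIMIT 1';

-- Execute it once per player with different values.
SET @name = 'Enis Ulusan', @short = 'enis-ulusan';
EXECUTE find_player USING @name, @short;

SET @name = 'Grant Robert Murray', @short = 'grant-murray';
EXECUTE find_player USING @name, @short;

-- Release the statement when the loop is done.
DEALLOCATE PREPARE find_player;
```

Each execution still performs its own lookup, so the gain depends on tblPlayer having a suitable index on (namePlayer, short).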

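【Comments】:

  • I edited my post with more information. So far the PHP script as a whole has hardly any performance problems; I improved it as far as I could, and the 1,570 lines of code average 3.x seconds overall. As mentioned, I measured explicit tasks in the code, and those values reflect only the time taken by the PHP-SQL code quoted above. So this really is an SQL query problem. While I think prepared statements are an idea I have to try, I don't actually see how they would improve performance, since several queries are still needed.
  • Thanks for the edit! Prepared statements help you save on query parsing, data conversion, and memory usage. As I understand it, you are trying to fetch similar data in one big query. Instead, you send the prepared statement to the database once and inject the different values, e.g. the playerIDs, inside a loop. I have a strong feeling it will improve your performance.
【Solution 3】:

The statements are not complex. They just contain a lot of lookups. So make sure the lookups are fast. You need the following indexes. If you don't have them yet, add them to your database.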

create index idx_find_player on tblPlayer (namePlayer, short, playerID);
create index idx_find_club on tblClub (nameClub, clubID);
create index idx_find_match on tblMatch (date, coach1, coach2, matchID);

create index idx_find_squad1 on tblStartingSquad (matchID, clubID);
create index idx_find_squad2 on tblStartingSquad (clubID, matchID);

create index idx_find_subs1 on tblSubstitutes (matchID, clubID);
create index idx_find_subs2 on tblSubstitutes (clubID, matchID);

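I'm not sure which of the squad indexes is more likely to be used, so create both and see which one the DBMS picks. Then you can drop the other. The same goes for the substitutes indexes.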
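
One way to see which index the DBMS picks is MySQL's EXPLAIN. The sketch below runs it on two of the lookups from the question, with literal sample values from the question's tables substituted for the PHP variables. In the output, the `key` column names the index actually chosen, and a `type` of ALL indicates a full table scan.

```sql
-- Player lookup, as used in the question's subqueries.
EXPLAIN
SELECT playerID
FROM tblPlayer
WHERE namePlayer = 'Enis Ulusan' AND short = 'enis-ulusan'
LIMIT 1;

-- Duplicate check: the `key` column shows which squad index was chosen.
EXPLAIN
SELECT e.matchID
FROM tblStartingSquad AS e
INNER JOIN tblMatch AS m ON e.matchID = m.matchID
WHERE m.date = '2006-08-19 22:00:00'
  AND m.coach1 = 'Piotr Nowak'
  AND m.coach2 = 'Fernando Clavijo'
  AND e.clubID = 1;

-- Afterwards, drop whichever index never shows up in the plans, for example:
-- DROP INDEX idx_find_squad2 ON tblStartingSquad;
```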

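【Comments】:

【Solution 4】:

To give a precise answer we would need to see your execution plan in SQL; post it here so I can help you with the problem.

Until then, I think you are doing this the wrong way: instead of writing one SELECT per row, you can simply define a user-defined table type in the database and pass your values in. Doing so gives you better performance both in SQL and in your server-side code. I assure you this gives the best performance as a first step; beyond that, as I said, the execution plan is needed.

1. Declare your type (don't forget to change the data types to your own)

CREATE TYPE [dbo].[com_ListOfGuid] AS TABLE(NamePlayer NVARCHAR(256) NOT NULL, Short NVARCHAR(256) NOT NULL)

2. Instead of building a SELECT statement in code, just declare a data table, fill it with your data, and pass it like any other parameter (don't forget to give it column names that exactly match your user-defined table type)
3. In your SQL code, just join the user-defined table type with your table and insert the result into your target table (I give you an example that you can adapt as needed)

CREATE PROCEDURE Procedure_Your_Name @ObjectTable Type_Your_Name READONLY
AS
INSERT INTO TARGET_Table ()
SELECT * FROM first_Table F
INNER JOIN @ObjectTable T ON F.NamePlayer = T.NamePlayer AND T.Short = F.Short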
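
The type and procedure above use SQL Server syntax (table-valued parameters), which MySQL does not offer. A loosely equivalent MySQL sketch, which is a substitute technique rather than the answerer's own, stages the scraped names in a temporary table and resolves all playerIDs with one join. The table name tmpLineup and its column sizes are hypothetical, and the sample values come from the tblPlayer excerpt in the question.

```sql
-- Hypothetical staging table, visible only to the current connection.
CREATE TEMPORARY TABLE tmpLineup (
    namePlayer VARCHAR(256) NOT NULL,
    short      VARCHAR(256) NOT NULL
);

-- One multi-row insert per squad.
INSERT INTO tmpLineup (namePlayer, short) VALUES
    ('Enis Ulusan', 'enis-ulusan'),
    ('Grant Robert Murray', 'grant-murray'),
    ('Evgeniy Shpedt', 'evgeniy-shpedt');

-- Resolve every playerID in a single join instead of one subquery per player.
SELECT p.playerID, p.namePlayer
FROM tblPlayer AS p
INNER JOIN tmpLineup AS t
    ON p.namePlayer = t.namePlayer AND p.short = t.short;

DROP TEMPORARY TABLE tmpLineup;
```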

【Comments】:
