【Question title】:Sql query with joins between four tables with millions of rows
【Posted】:2010-10-18 00:23:28
【Question description】:

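We have a transactional SQL statement that queries 4 tables, each with millions of rows.

It takes several minutes, even though it has been optimized with indexes and statistics according to the Tuning Advisor.

The structure of the query is as follows: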

SELECT E.EmployeeName
     , SUM(M.Amount) AS TotalAmount
     , SUM(B.Amount) AS BudgetAmount
     , SUM(T.Hours) AS TotalHours
     , SUM(TB.Hours) AS BudgetHours
     , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName

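Since each transaction table contains millions of rows, I have considered splitting this up into one query per transaction table, using table variables such as @real, @budget and @hours, and then joining them in a final query. In tests, however, this did not seem to speed things up.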
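The split described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs on SQLite through Python's `sqlite3` module rather than on SQL Server, temp tables (`real_totals`, `hour_totals`, both hypothetical names) stand in for the `@real` and `@hours` table variables, only two of the four transaction tables are shown, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
    CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount REAL);
    CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours REAL, Type TEXT);
    INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO MoneyTransactions VALUES (1, 10), (1, 20), (2, 5);
    INSERT INTO TimeTransactions VALUES (1, 2, 'Work'), (1, 3, 'Waste');

    -- One pre-aggregated temp table per transaction table
    -- (assumed stand-ins for the @real / @hours table variables).
    CREATE TEMP TABLE real_totals AS
        SELECT EmployeeID, SUM(Amount) AS TotalAmount
        FROM MoneyTransactions GROUP BY EmployeeID;
    CREATE TEMP TABLE hour_totals AS
        SELECT EmployeeID, SUM(Hours) AS TotalHours
        FROM TimeTransactions GROUP BY EmployeeID;
""")

# Final query: each temp table has at most one row per employee,
# so these joins cannot multiply rows.
rows = conn.execute("""
    SELECT E.EmployeeName, R.TotalAmount, H.TotalHours
    FROM Employees E
    LEFT JOIN real_totals R ON R.EmployeeID = E.EmployeeID
    LEFT JOIN hour_totals H ON H.EmployeeID = E.EmployeeID
    ORDER BY E.EmployeeName
""").fetchall()
print(rows)
```

Whether materializing the intermediate results helps on SQL Server depends on the data and the plan; the sketch only shows the shape of the approach.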

How would you go about speeding this up?

【Question discussion】:

    Tags: sql sql-server join large-data-volumes


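    【Solution 1】:

    I am not sure the query you posted will yield the results you expect.

    For each employee, it will cross join the rows of all the dimension tables (MoneyTransactions etc.), multiplying the results.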
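    A small experiment makes the multiplication visible. This is a sketch on SQLite via Python's `sqlite3` (an assumption made for the sake of a runnable example; the question is about SQL Server), with made-up sample rows and only two of the four transaction tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
    CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount INTEGER);
    CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours INTEGER);
    INSERT INTO Employees VALUES (1, 'Ann');
    INSERT INTO MoneyTransactions VALUES (1, 10), (1, 20);       -- true total: 30
    INSERT INTO TimeTransactions VALUES (1, 1), (1, 2), (1, 3);  -- true total: 6
""")

# Same shape as the query in the question, reduced to two transaction tables.
# The join pairs every money row with every time row of the same employee
# (2 x 3 = 6 rows), so each amount is counted 3 times and each hour twice.
joined = conn.execute("""
    SELECT SUM(M.Amount), SUM(T.Hours)
    FROM Employees E
    LEFT JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID
    LEFT JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID
    GROUP BY E.EmployeeName
""").fetchone()
print(joined)  # inflated sums, not (30, 6)
```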

    Try this:

    -- One correlated subquery per transaction table,
    -- so the transaction tables are never joined to each other.
    SELECT  E.EmployeeName,
            (
            SELECT  SUM(m.Amount)
            FROM    MoneyTransactions m
            WHERE   m.EmployeeID = E.EmployeeID
            ) AS TotalAmount,
            (
            SELECT  SUM(b.Amount)
            FROM    BudgetTransactions b
            WHERE   b.EmployeeID = E.EmployeeID
            ) AS BudgetAmount,
            (
            SELECT  SUM(t.Hours)
            FROM    TimeTransactions t
            WHERE   t.EmployeeID = E.EmployeeID
            ) AS TotalHours,
            (
            SELECT  SUM(tb.Hours)
            FROM    TimeBudgetTransactions tb
            WHERE   tb.EmployeeID = E.EmployeeID
            ) AS BudgetHours
    FROM    Employees E
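    The query above does not return the WastedHours column from the question. One possible way to add it in the same style is another correlated subquery with the question's CASE expression. The sketch below is an assumption, not part of the answer: it runs on SQLite via Python's `sqlite3`, uses made-up rows, and shows only the TimeTransactions columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
    CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours INTEGER, Type TEXT);
    INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO TimeTransactions VALUES
        (1, 4, 'Work'), (1, 3, 'Waste'), (1, 1, 'Waste');
""")

rows = conn.execute("""
    SELECT E.EmployeeName,
           (SELECT SUM(t.Hours)
            FROM   TimeTransactions t
            WHERE  t.EmployeeID = E.EmployeeID) AS TotalHours,
           -- Hypothetical extra column: same CASE expression as in the question.
           (SELECT SUM(CASE WHEN t.Type = 'Waste' THEN t.Hours ELSE 0 END)
            FROM   TimeTransactions t
            WHERE  t.EmployeeID = E.EmployeeID) AS WastedHours
    FROM   Employees E
    ORDER  BY E.EmployeeName
""").fetchall()
print(rows)
```

    Employees without any time rows get NULL in both columns, because SUM over an empty set is NULL.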
    

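    【Discussion】:

    • Hmm... isn't this just SELECT EmployeeID, EmployeeName, SUM(...), SUM(...) FROM Employees GROUP BY EmployeeID, EmployeeName?
    • @Quassnoi: Thanks. I thought nested queries (a SELECT within a SELECT) would be slower than JOINs... I haven't tried your suggestion yet...
    • No, they won't. In fact, the query you posted is slow because it performs a lot of unnecessary joins, and it also yields incorrect results. If each of the four transaction tables has 100 rows per employee, the join produces 100,000,000 (100^4) rows per employee before grouping, which is most probably not what you want.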
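    The growth described in the last comment can be checked on a small scale: with k rows per employee in each of the four transaction tables, the join produces k^4 rows per employee (for k = 100 that is 100^4 = 100,000,000). The sketch below assumes SQLite via Python's `sqlite3` and uses k = 3 with a single hypothetical employee to keep it fast.

```python
import sqlite3

ROWS_PER_EMPLOYEE = 3  # small illustrative value; the comment's scenario uses 100
TABLES = ["MoneyTransactions", "BudgetTransactions",
          "TimeTransactions", "TimeBudgetTransactions"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO Employees VALUES (1)")
for name in TABLES:
    # Simplified schema: one numeric column per transaction table.
    conn.execute(f"CREATE TABLE {name} (EmployeeID INTEGER, Amount INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (1, ?)",
                     [(i,) for i in range(ROWS_PER_EMPLOYEE)])

# Join all four transaction tables to Employees, as the original query does.
joins = " ".join(
    f"LEFT JOIN {name} ON {name}.EmployeeID = E.EmployeeID" for name in TABLES
)
(joined_rows,) = conn.execute(
    f"SELECT COUNT(*) FROM Employees E {joins}"
).fetchone()
print(joined_rows, ROWS_PER_EMPLOYEE ** 4)
```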
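    【Solution 2】:

    I don't know whether your tables have all the indexes that could speed things up, but tables this large can have this kind of effect on query time. I would recommend partitioning the tables if possible. It is more work, but whatever you do now to speed up the query will not be enough after a few million more records.

    【Discussion】: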

      【Solution 3】:

      Try this:

      SELECT E.EmployeeName, TA.TotalAmount, BA.BudgetAmount, TWH.TotalHours, BH.BudgetHours, TWH.WastedHours
      FROM Employees E
      LEFT JOIN 
      (SELECT E.EmployeeID, SUM(M.Amount) AS TotalAmount
      FROM Employees E INNER JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID GROUP BY E.EmployeeID)TA
      ON E.EmployeeID = TA.EmployeeID
      LEFT JOIN 
      (SELECT E.EmployeeID , SUM(B.Amount) AS BudgetAmount
      FROM Employees E INNER JOIN BudgetTransactions B ON E.EmployeeID = B.EmployeeID GROUP BY E.EmployeeID)BA
      ON E.EmployeeID = BA.EmployeeID
      LEFT JOIN 
      (SELECT E.EmployeeID , SUM(T.Hours) AS TotalHours , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
      FROM Employees E INNER JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID GROUP BY E.EmployeeID)TWH
      ON E.EmployeeID = TWH.EmployeeID
      LEFT JOIN 
      (SELECT E.EmployeeID , SUM(TB.Hours) AS BudgetHours
      FROM Employees E INNER JOIN TimeBudgetTransactions TB ON E.EmployeeID = TB.EmployeeID GROUP BY E.EmployeeID)BH
      ON E.EmployeeID = BH.EmployeeID
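      The idea behind this query is to aggregate each transaction table down to one row per employee before joining, so the joins cannot multiply rows. A reduced sketch of that pattern follows, under these assumptions: SQLite via Python's `sqlite3` instead of SQL Server, made-up rows, two of the four tables, and derived tables that group the transaction table directly instead of first joining it to Employees (the outer LEFT JOIN from Employees should discard unmatched EmployeeIDs either way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
    CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount INTEGER);
    CREATE TABLE BudgetTransactions (EmployeeID INTEGER, Amount INTEGER);
    INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO MoneyTransactions VALUES (1, 10), (1, 20), (2, 7);
    INSERT INTO BudgetTransactions VALUES (1, 100), (1, 50), (1, 25);
""")

rows = conn.execute("""
    SELECT E.EmployeeName, TA.TotalAmount, BA.BudgetAmount
    FROM Employees E
    LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS TotalAmount
               FROM MoneyTransactions
               GROUP BY EmployeeID) TA      -- one row per employee
           ON E.EmployeeID = TA.EmployeeID
    LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS BudgetAmount
               FROM BudgetTransactions
               GROUP BY EmployeeID) BA      -- one row per employee
           ON E.EmployeeID = BA.EmployeeID
    ORDER BY E.EmployeeName
""").fetchall()
print(rows)
```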
      

      【Discussion】:
