【问题标题】:How to optimize huge query with repeated subqueries如何使用重复子查询优化大型查询
【发布时间】:2011-05-12 16:15:16
【问题描述】:

我有以下包含重复子查询的巨大查询,对我来说它看起来效率很低。我该如何优化它?

SELECT T2.date1, T2.date2, T2.period, T1.market, T1.ticker, 0 AS scenario
FROM
(SELECT  DISTINCT
        Q.market AS market,
        Q.ticker AS ticker

FROM portfolio.scenario S RIGHT JOIN portfolio.quote Q
ON    S.series =  (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
      Q.market = S.market AND 
      Q.ticker = S.ticker

WHERE  Q.date = '2010-07-01' AND
       S.date1 IS NULL) AS T1

JOIN 

(SELECT DISTINCT S.date1, S.date2, S.period
FROM scenario S
WHERE S.series = (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
      S.date1 >= '2009-09-01' AND
      S.date2 <= '2010-07-01') AS T2

UNION

SELECT  S.date1 AS date1, 
        S.date2 AS date2,
        S.period AS period,
        Q.market AS market,
        Q.ticker AS ticker,
        Q.close * EXP(S.ratio) AS scenario

FROM portfolio.scenario S , portfolio.quote Q

WHERE  S.series = (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
      S.date1 >= '2009-09-01' AND
      S.date2 <= '2010-07-01' AND
      Q.date = '2010-07-01' AND
      Q.market = S.market AND 
      Q.ticker = S.ticker

UNION

SELECT T2.date1, T2.date2, T2.period, T1.market, T1.ticker, 0 AS scenario
FROM
(SELECT  DISTINCT
        Q.market AS market,
        Q.ticker AS ticker

        FROM portfolio.scenario S , portfolio.quote Q
        WHERE  Q.date = '2010-07-01' AND
              Q.market = S.market AND 
              Q.ticker = S.ticker AND
              S.series = (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
              S.date1 >= '2009-09-01' AND
              S.date2 <= '2010-07-01' ) AS T1

JOIN 

(SELECT DISTINCT S.date1, S.date2, S.period
FROM scenario S
WHERE S.series = (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
      S.date1 >= '2009-09-01' AND
      S.date2 <= '2010-07-01') AS T2


WHERE (T2.date1, T2.date2, T2.period, T1.market, T1.ticker)
        NOT IN (SELECT  S.date1 AS date1, 
                S.date2 AS date2,
                S.period AS period,
                Q.market AS market,
                Q.ticker AS ticker

        FROM portfolio.scenario S , portfolio.quote Q
        WHERE  Q.date = '2010-07-01' AND
              Q.market = S.market AND 
              Q.ticker = S.ticker AND
              S.series = (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1) AND
              S.date1 >= '2009-09-01' AND
              S.date2 <= '2010-07-01' )


ORDER BY 
date1,date2,period,market,ticker

在@Bruce 的评论和一些减少子查询的逻辑之后,我现在的查询是:

(SELECT S.date1, 
        S.date2,
        S.period,
        Q.market,
        Q.ticker,
        Q.close * EXP(S.ratio) AS scenario

FROM portfolio.scenario S , portfolio.quote Q

WHERE  
      S.date1 >= (@date1 := '2009-09-01') AND
      S.date2 <= (@date2 := '2010-07-01') AND
      Q.date = (@qdate := '2010-07-01') AND
      S.series = 
      (@series := 
                  (SELECT S.series
                  FROM scenario S
                  WHERE S.date1 >= '2009-09-01' AND
                        S.date2 <= '2010-07-01' AND
                        S.period = 'QUARTER'
                  ORDER BY S.date2
                  LIMIT 1)) AND
      Q.market = S.market AND 
      Q.ticker = S.ticker)

UNION

(SELECT T2.date1, T2.date2, T2.period, T1.market, T1.ticker, 0 AS scenario
FROM
(SELECT Q.market, Q.ticker
 FROM quote Q
 WHERE Q.date = @qdate) AS T1

JOIN 

(SELECT DISTINCT S.date1, S.date2, S.period
FROM scenario S
WHERE S.series = @series AND
      S.date1 >= @date1 AND
      S.date2 <= @date2) AS T2

WHERE (T2.date1, T2.date2, T2.period, T1.market, T1.ticker)
        NOT IN 

        (SELECT  S.date1,
                 S.date2,
                 S.period,
                 Q.market,
                 Q.ticker
        FROM portfolio.scenario S , portfolio.quote Q
        WHERE  Q.date = @qdate AND
               Q.market = S.market AND 
               Q.ticker = S.ticker AND
               S.series = @series AND
               S.date1 >= @date1 AND
               S.date2 <= @date2 ))

但是,如果我改变了

  (@series := 
              (SELECT S.series
              FROM scenario S
              WHERE S.date1 >= '2009-09-01' AND
                    S.date2 <= '2010-07-01' AND
                    S.period = 'QUARTER'
              ORDER BY S.date2
              LIMIT 1))

成为

  (@series := 
              (SELECT S.series
              FROM scenario S
              WHERE S.date1 >= @date1 AND
                    S.date2 <= @date2 AND
                    S.period = 'QUARTER'
              ORDER BY S.date2
              LIMIT 1))

处理它需要太多时间(我在 10 分钟前执行了查询,但仍然没有得到结果),而查询通常在 5 秒内返回。

另外,当我重置变量时,执行结果不正确(可能使用之前执行的变量值)。我如何在不添加 SET 语句的情况下更改它(我希望它是单个查询)

【问题讨论】:

    标签: mysql query-optimization subquery correlated-subquery


    【解决方案1】:

    使用 MySQL 变量:

    SELECT
        @x := ColumnName,
        @y := ColumnName2 + @z,
        @z := (SELECT * FROM SubTable WHERE x = @x),
        (SELECT * FROM Table2 WHERE X = @z),
        (SELECT * FROM Table3 WHERE X = @z)
    FROM Table
    WHERE
        v = @v
    
    • 您可以将子选择和列值分配给 SQL 变量
    • 您可以在语句中的任何位置引用这些变量
    • 变量包含来自前几行的值(如果已设置)
    • 您可以通过这种方式重复使用子选择和其他值

    【讨论】:

    • 感谢您的建议,但在我的查询中使用 @z := (SELECT * FROM SubTable WHERE x = @x) 这样的行大大增加了获取结果集的时间。我已经根据您的评论重新编辑了问题以包含新查询和一些减少子查询的逻辑
    • 节省来自在多个列中使用该变量(我的示例没有显示,所以我更新了它)。我最近将它应用到了一条汇总 SQL,否则它会为多个列重复窗口平均计算(子选择 + 分组函数)。
    • 感谢老兄的反馈,是的,我将这个概念与@series 变量一起使用。我认为在您的示例中 @z := (SELECT * FROM SubTable WHERE x = @x) ,SubTable 只有一列,因为 @z 将无法保存多列的值。
    • 使用临时表(或视图?)可能会更好地处理结果集,具体取决于确定集合的复杂性。
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多