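【Question title】:Correlated Subquery? pulling data from different columns, same table
【Posted】:2019-08-08 07:48:34
【Question】:

I'm trying to pull data from different columns using multiple conditions, but I can't figure out how. I believe a correlated subquery is what I need, and I've tried several different approaches without success.

I'd like to get the Miami Heat's averages in wins for the categories below, plus the New York Knicks' averages in losses for the same categories, and combine them into a single average.

Here is my query for the Heat, which retrieves exactly what I want: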

SELECT
    box_score.team_name, 
    ROUND(AVG(eFG),3) eFG,
    ROUND(AVG(OPP_eFG),3) OPP_eFG,
    ROUND(AVG(TOV_PCT),3) TOV_PCT,
    ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
    ROUND(AVG(ORB_PCT),3) ORB_PCT,
    ROUND(AVG(DRB_PCT),3) DRB_PCT,
    ROUND(AVG(FTA_RATE),3) FTA_RATE,
    ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07' 

I have the same query for the Knicks' losses, which also returns the result I want:

WHERE team_name = 'New York Knicks' AND WIN_LOSS = 'L' AND game_date < '2019-03-07' 

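My problem is combining the two into one query that gives me the average of the Heat's wins and the Knicks' losses. All of this information comes from the same table, and I can identify a team either by ID number or by name. In case it makes a difference, I'm using SQLite.

These are the results of running the queries. Each is one row of averages, which is the shape I'm looking for, but I'd like the Heat-in-wins and Knicks-in-losses numbers averaged together into a single row.

Heat averages in wins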

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.603  0.505    0.14     0.126        0.28     0.77     0.235     0.141

Knicks averages in losses

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.568  0.602    0.146    0.136        0.225    0.787    0.222     0.235

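I want to merge the two into one average per category.

But is there a way to have the average pull its data from separate columns?

In this case I'm interested in the Miami Heat, so I have the averages above. What I want is to average each Heat figure with the Knicks' corresponding opposite figure (the Heat's eFG should be paired with the other team's OPP_eFG, and so on). So basically I'm looking for the average of the following pairs:

Heat eFG and Knicks OPP_eFG

Heat OPP_eFG and Knicks eFG

Heat TOV_PCT and Knicks OPP_TOV_PCT

Heat OPP_TOV_PCT and Knicks TOV_PCT

Heat FTA_RATE and Knicks OPP_FTA_RATE

Heat OPP_FTA_RATE and Knicks FTA_RATE

I'd still like to get one row as the result.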

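【Comments】:

  • WHERE team_name = "New York Knicks"
  • Please don't edit a question post in a way that invalidates reasonable answer posts; restore the old post to the question it originally asked and post a new question. In code questions, please give a minimal reproducible example: cut & paste & runnable code, plus desired output, plus a clear specification and explanation. PS This seems to be a common error where people want to join several aggregations (each possibly involving joins) but erroneously try to do all the joining first and then all the aggregating. See: Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs

Tags: sql sqlite inner-join correlated-subquery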


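【Solution 1】:

This answer assumes you want AVG(heat) - AVG(knicks), as stated in the original post, rather than AVG(heatsX OR knicksY).

I'd like to promote common table expressions (CTEs) for this: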

WITH selector_heat as (
SELECT
    box_score.team_name, 
    ROUND(AVG(eFG),3) eFG,
    ROUND(AVG(OPP_eFG),3) OPP_eFG,
    ROUND(AVG(TOV_PCT),3) TOV_PCT,
    ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
    ROUND(AVG(ORB_PCT),3) ORB_PCT,
    ROUND(AVG(DRB_PCT),3) DRB_PCT,
    ROUND(AVG(FTA_RATE),3) FTA_RATE,
    ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07' 
)
, selector_knicks as (
...
)
select H.eFG - K.OPP_eFG as magic_nbr
from selector_heat H 
join selector_knicks K ON (1=1)

More details on the syntax are here: https://www.sqlite.org/lang_with.html. Ignore the "recursive" parts for now; you don't need them in this case.
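The CTE approach can be tried end to end with Python's built-in sqlite3 module. The following is only a sketch under stated assumptions: the table is cut down to two stat columns, every row is made-up data, and the final SELECT averages the paired columns (which is what the asker says they are after in the comments) instead of subtracting them. The column aliases in the final SELECT are hypothetical names.

```python
import sqlite3

# Hypothetical miniature box_score table; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_score ("
    "team_name TEXT, win_loss TEXT, game_date TEXT, eFG REAL, OPP_eFG REAL)"
)
conn.executemany(
    "INSERT INTO box_score VALUES (?, ?, ?, ?, ?)",
    [
        ("Miami Heat", "W", "2019-01-10", 0.60, 0.50),
        ("Miami Heat", "W", "2019-02-02", 0.62, 0.52),
        ("Miami Heat", "L", "2019-02-20", 0.45, 0.58),
        ("New York Knicks", "L", "2019-01-15", 0.48, 0.58),
        ("New York Knicks", "L", "2019-02-11", 0.50, 0.60),
        ("New York Knicks", "W", "2019-02-25", 0.59, 0.47),
    ],
)

query = """
WITH selector_heat AS (
    SELECT AVG(eFG) AS eFG, AVG(OPP_eFG) AS OPP_eFG
    FROM box_score
    WHERE team_name = 'Miami Heat' AND win_loss = 'W' AND game_date < '2019-03-07'
),
selector_knicks AS (
    SELECT AVG(eFG) AS eFG, AVG(OPP_eFG) AS OPP_eFG
    FROM box_score
    WHERE team_name = 'New York Knicks' AND win_loss = 'L' AND game_date < '2019-03-07'
)
-- Each CTE yields exactly one row, so the cross join yields exactly one row.
SELECT ROUND((H.eFG + K.OPP_eFG) / 2, 3) AS heat_eFG_knicks_OPP_eFG,
       ROUND((H.OPP_eFG + K.eFG) / 2, 3) AS heat_OPP_eFG_knicks_eFG
FROM selector_heat H
CROSS JOIN selector_knicks K
"""
cte_row = conn.execute(query).fetchone()
print(cte_row)
```

Because each side is averaged before the two are combined, both teams carry equal weight here regardless of how many games each side contributes.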

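Alternatively (using a slightly different approach), you can use window functions to aggregate "per team" and then work with the result. More information is here: https://www.sqlite.org/windowfunctions.html#introduction_to_window_functions

Example: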

SELECT  
  team_name, 
  WIN_LOSS,
  ROUND(AVG(eFG) OVER (partition by team_name, win_loss),3) as eFG
  ...
  from box_score
  where game_date < '2019-03-07'

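With this result set you get the averages for every combination of team and win_loss. Note that a window function keeps one output row per input row, so each combination repeats once per game unless you add DISTINCT. Wrap it in a CTE and join it to itself on whatever conditions fit, e.g.

WITH cte as (SELECT ...)
SELECT H.eFG - K.OPP_eFG as magic_nbr
FROM cte H join cte K 
  ON (H.team_name = 'Miami Heat' 
  AND K.team_name = 'New York Knicks'
  AND H.win_loss = 'W'
  AND K.win_loss = 'L')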
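A runnable sketch of the window-function variant follows, again using Python's sqlite3 module with made-up rows and only two stat columns. It assumes a SQLite build with window function support (3.25 or newer), adds DISTINCT so each team/win_loss combination appears once, and averages the paired columns instead of subtracting them; the alias `heat_eFG_knicks_OPP_eFG` is a hypothetical name.

```python
import sqlite3

# Hypothetical miniature box_score table; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_score ("
    "team_name TEXT, win_loss TEXT, game_date TEXT, eFG REAL, OPP_eFG REAL)"
)
conn.executemany(
    "INSERT INTO box_score VALUES (?, ?, ?, ?, ?)",
    [
        ("Miami Heat", "W", "2019-01-10", 0.60, 0.50),
        ("Miami Heat", "W", "2019-02-02", 0.62, 0.52),
        ("Miami Heat", "L", "2019-02-20", 0.45, 0.58),
        ("New York Knicks", "L", "2019-01-15", 0.48, 0.58),
        ("New York Knicks", "L", "2019-02-11", 0.50, 0.60),
        ("New York Knicks", "W", "2019-02-25", 0.59, 0.47),
    ],
)

query = """
WITH cte AS (
    -- Without DISTINCT there would be one row per game, and the self join
    -- below would multiply those duplicates.
    SELECT DISTINCT
           team_name,
           win_loss,
           AVG(eFG)     OVER (PARTITION BY team_name, win_loss) AS eFG,
           AVG(OPP_eFG) OVER (PARTITION BY team_name, win_loss) AS OPP_eFG
    FROM box_score
    WHERE game_date < '2019-03-07'
)
SELECT ROUND((H.eFG + K.OPP_eFG) / 2, 3) AS heat_eFG_knicks_OPP_eFG
FROM cte H
JOIN cte K
  ON  H.team_name = 'Miami Heat'      AND H.win_loss = 'W'
  AND K.team_name = 'New York Knicks' AND K.win_loss = 'L'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The CTE here computes averages for every team and outcome, so it does more work than the two targeted selectors; that trade-off may be worth it when several matchups are read from the same result.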

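【Comments】:

  • Yes, that was a mistake in how I typed it. It reads as if I wanted one average minus the other, but I'm actually looking for the average of the two.
【Solution 2】: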

If you want to calculate the per-team averages first and then average those, you can use two levels of aggregation. The inner query needs a GROUP BY so that it returns one row per team:

SELECT ROUND(AVG(eFG), 3) as eFG,
       ROUND(AVG(OPP_eFG), 3) as OPP_eFG,
       ROUND(AVG(TOV_PCT), 3) as TOV_PCT,
       ROUND(AVG(OPP_TOV_PCT), 3) as OPP_TOV_PCT,
       ROUND(AVG(ORB_PCT), 3) as ORB_PCT,
       ROUND(AVG(DRB_PCT), 3) as DRB_PCT,
       ROUND(AVG(FTA_RATE), 3) as FTA_RATE,
       ROUND(AVG(OPP_FTA_RATE), 3) as OPP_FTA_RATE
FROM (SELECT bs.team_name, 
             AVG(eFG) as eFG,
             AVG(OPP_eFG) as OPP_eFG,
             AVG(TOV_PCT) as TOV_PCT,
             AVG(OPP_TOV_PCT) as OPP_TOV_PCT,
             AVG(ORB_PCT) as ORB_PCT,
             AVG(DRB_PCT) as DRB_PCT,
             AVG(FTA_RATE) as FTA_RATE,
             AVG(OPP_FTA_RATE) as OPP_FTA_RATE
      FROM box_score bs
      WHERE game_date < '2019-03-07' AND
            ( (team_name = 'Miami Heat' AND WIN_LOSS = 'W') OR
              (team_name = 'New York Knicks' AND WIN_LOSS = 'L')
            )
      GROUP BY bs.team_name
     ) bs
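The difference between averaging the per-team averages and averaging all matching rows at once only shows up when the two teams contribute different numbers of games. The sketch below, with made-up rows in Python's sqlite3 module and a single stat column, runs both forms side by side; the derived-table alias `per_team` and the variable names are hypothetical.

```python
import sqlite3

# Hypothetical data: three Heat wins but only one Knicks loss, so the two
# ways of averaging give different answers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_score (team_name TEXT, win_loss TEXT, game_date TEXT, eFG REAL)"
)
conn.executemany(
    "INSERT INTO box_score VALUES (?, ?, ?, ?)",
    [
        ("Miami Heat", "W", "2019-01-10", 0.60),
        ("Miami Heat", "W", "2019-02-02", 0.62),
        ("Miami Heat", "W", "2019-02-20", 0.64),
        ("New York Knicks", "L", "2019-01-15", 0.50),
    ],
)

filter_sql = """
    game_date < '2019-03-07'
    AND ((team_name = 'Miami Heat' AND win_loss = 'W') OR
         (team_name = 'New York Knicks' AND win_loss = 'L'))
"""

# Two levels: one average per team (inner GROUP BY), then the average of those.
avg_of_avgs = conn.execute(f"""
    SELECT ROUND(AVG(eFG), 3)
    FROM (SELECT team_name, AVG(eFG) AS eFG
          FROM box_score
          WHERE {filter_sql}
          GROUP BY team_name) per_team
""").fetchone()[0]

# One level: every matching game counts equally, so the Heat dominate.
pooled = conn.execute(f"""
    SELECT ROUND(AVG(eFG), 3)
    FROM box_score
    WHERE {filter_sql}
""").fetchone()[0]

print(avg_of_avgs, pooled)
```

Which of the two is appropriate depends on whether each team or each game should carry equal weight; the thread does not settle that, so treat the choice as the reader's.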

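【Comments】:

【Solution 3】:

One solution is to use conditional aggregation to perform the whole operation in a single table scan (no joins or subqueries):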

    SELECT  
        box_score.team_name, 
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN eFG          END),3) Heat_eFG,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN eFG          END),3) Knicks_eFG,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_eFG      END),3) Heat_OPP_eFG,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_eFG      END),3) Knicks_OPP_eFG,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN TOV_PCT      END),3) Heat_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN TOV_PCT      END),3) Knicks_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_TOV_PCT  END),3) Heat_OPP_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_TOV_PCT  END),3) Knicks_OPP_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN ORB_PCT      END),3) Heat_ORB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN ORB_PCT      END),3) Knicks_ORB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN DRB_PCT      END),3) Heat_DRB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN DRB_PCT      END),3) Knicks_DRB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN FTA_RATE     END),3) Heat_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN FTA_RATE     END),3) Knicks_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_FTA_RATE END),3) Heat_OPP_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_FTA_RATE END),3) Knicks_OPP_FTA_RATE
    FROM box_score
    WHERE team_name IN ('Miami Heat', 'New York Knicks') AND game_date < '2019-03-07' 
    

If you want averages that combine, for example, eFG from Miami's wins with OPP_eFG from New York's losses in a single column, here is another version of the query. It still relies on conditional aggregation. I also simplified the logic slightly by moving the win/loss conditions to the WHERE clause.

    SELECT  
        box_score.team_name, 
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN eFG 
            WHEN team_name = 'New York Knicks' THEN OPP_eFG 
        END), 3) Heats_eFG_Knicks_OPP_eFG, 
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_eFG 
            WHEN team_name = 'New York Knicks' THEN eFG 
        END), 3) Heats_OPP_eFG_Knicks_eFG,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN TOV_PCT 
            WHEN team_name = 'New York Knicks' THEN OPP_TOV_PCT 
        END), 3) Heats_TOV_PCT_Knicks_OPP_TOV_PCT,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_TOV_PCT 
            WHEN team_name = 'New York Knicks' THEN TOV_PCT 
        END), 3) Heats_OPP_TOV_PCT_Knicks_TOV_PCT,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN FTA_RATE 
            WHEN team_name = 'New York Knicks' THEN OPP_FTA_RATE 
        END), 3) Heats_FTA_RATE_Knicks_OPP_FTA_RATE,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_FTA_RATE 
            WHEN team_name = 'New York Knicks' THEN FTA_RATE 
        END), 3) Heats_OPP_FTA_RATE_Knicks_FTA_RATE
    FROM box_score
    WHERE 
        game_date < '2019-03-07' 
        AND (
               ( team_name = 'Miami Heat'      AND win_loss = 'W' )
            OR ( team_name = 'New York Knicks' AND win_loss = 'L') 
        )
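To see the swapped-column conditional aggregation return a single row, here is a small sketch using Python's sqlite3 module. The data is made up, only the eFG pair is kept, and the result aliases are hypothetical names. Note that this form averages over all matching games at once, so a team with more matching games weighs more heavily.

```python
import sqlite3

# Hypothetical miniature box_score table; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_score ("
    "team_name TEXT, win_loss TEXT, game_date TEXT, eFG REAL, OPP_eFG REAL)"
)
conn.executemany(
    "INSERT INTO box_score VALUES (?, ?, ?, ?, ?)",
    [
        ("Miami Heat", "W", "2019-01-10", 0.60, 0.50),
        ("Miami Heat", "W", "2019-02-02", 0.62, 0.52),
        ("Miami Heat", "L", "2019-02-20", 0.45, 0.58),
        ("New York Knicks", "L", "2019-01-15", 0.48, 0.58),
        ("New York Knicks", "L", "2019-02-11", 0.50, 0.60),
        ("New York Knicks", "W", "2019-02-25", 0.59, 0.47),
    ],
)

query = """
SELECT
    -- Heat rows contribute eFG, Knicks rows contribute OPP_eFG.
    ROUND(AVG(CASE
        WHEN team_name = 'Miami Heat'      THEN eFG
        WHEN team_name = 'New York Knicks' THEN OPP_eFG
    END), 3) AS heat_eFG_knicks_OPP_eFG,
    -- The mirror image: Heat OPP_eFG paired with Knicks eFG.
    ROUND(AVG(CASE
        WHEN team_name = 'Miami Heat'      THEN OPP_eFG
        WHEN team_name = 'New York Knicks' THEN eFG
    END), 3) AS heat_OPP_eFG_knicks_eFG
FROM box_score
WHERE game_date < '2019-03-07'
  AND ((team_name = 'Miami Heat'      AND win_loss = 'W') OR
       (team_name = 'New York Knicks' AND win_loss = 'L'))
"""
swapped_row = conn.execute(query).fetchone()
print(swapped_row)
```

The Heat loss and the Knicks win in the sample data are filtered out by the WHERE clause, so the CASE expressions never see them.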
    

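Note: as wildpasser commented, you will probably want to use single quotes rather than double quotes around literal values (this is the SQL standard). I globally changed all double quotes in the original query to single quotes.

【Comments】:

• If you want to use the same logic for any other teams (i.e. to compare other win/loss combinations), change the CASE expressions to use the win_loss flag. The team names then only need to change in one place: the main WHERE clause.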
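One way to read that suggestion is sketched below with Python's sqlite3 module: the CASE expressions branch on win_loss, and the team names appear only as bound parameters in the WHERE clause. The helper `matchup_average`, its parameters, and the sample rows are all hypothetical, and only the eFG pair is shown.

```python
import sqlite3

# Hypothetical miniature box_score table; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_score ("
    "team_name TEXT, win_loss TEXT, game_date TEXT, eFG REAL, OPP_eFG REAL)"
)
conn.executemany(
    "INSERT INTO box_score VALUES (?, ?, ?, ?, ?)",
    [
        ("Miami Heat", "W", "2019-01-10", 0.60, 0.50),
        ("Miami Heat", "W", "2019-02-02", 0.62, 0.52),
        ("Miami Heat", "L", "2019-02-20", 0.45, 0.58),
        ("New York Knicks", "L", "2019-01-15", 0.48, 0.58),
        ("New York Knicks", "L", "2019-02-11", 0.50, 0.60),
        ("New York Knicks", "W", "2019-02-25", 0.59, 0.47),
    ],
)


def matchup_average(conn, winner, loser, cutoff):
    """Average the winner's stats in wins with the loser's opposite stats in losses."""
    sql = """
    SELECT
        -- 'W' rows belong to the winner, 'L' rows to the loser (see WHERE).
        ROUND(AVG(CASE win_loss WHEN 'W' THEN eFG     WHEN 'L' THEN OPP_eFG END), 3),
        ROUND(AVG(CASE win_loss WHEN 'W' THEN OPP_eFG WHEN 'L' THEN eFG     END), 3)
    FROM box_score
    WHERE game_date < :cutoff
      AND ((team_name = :winner AND win_loss = 'W') OR
           (team_name = :loser  AND win_loss = 'L'))
    """
    params = {"winner": winner, "loser": loser, "cutoff": cutoff}
    return conn.execute(sql, params).fetchone()


heat_over_knicks = matchup_average(conn, "Miami Heat", "New York Knicks", "2019-03-07")
knicks_over_heat = matchup_average(conn, "New York Knicks", "Miami Heat", "2019-03-07")
print(heat_over_knicks, knicks_over_heat)
```

Binding the team names as parameters also sidesteps the single-versus-double-quote issue for literals mentioned in the answer.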