【Question title】: SQL - LAG to get previous value if condition using multiple previous columns satisfied
【Posted】: 2021-01-26 17:06:34
【Question】:

I have a table created by the following:

CREATE TABLE #test_table 
(
id INT
,EventName VARCHAR(50)
,HomeTeam VARCHAR(25)
,Metric INT
)

INSERT INTO #test_table VALUES
(1, 'Team A vs Team B', 'Team A', 5),
(2, 'Team A vs Team B', 'Team A', 7),
(3, 'Team C vs Team D', 'Team C', 6),
(4, 'Team Z vs Team A', 'Team Z', 8),
(5, 'Team A vs Team B', 'Team A', 9),
(6, 'Team C vs Team D', 'Team C', 3),
(7, 'Team C vs Team D', 'Team C', 1),
(8, 'Team E vs Team F', 'Team E', 2)

Which gives:

id  EventName           HomeTeam    Metric
------------------------------------------
1   Team A vs Team B    Team A      5
2   Team A vs Team B    Team A      7
3   Team C vs Team D    Team C      6
4   Team Z vs Team A    Team Z      8
5   Team A vs Team B    Team A      9
6   Team C vs Team D    Team C      3
7   Team C vs Team D    Team C      1
8   Team E vs Team F    Team E      2

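I want to compute a new column PreviousMetricN, where N can be 1, 2, 3, ..., which shows the previous value of Metric, but only from previous events in which the HomeTeam took part (whether at home or away). For example: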

id  EventName           HomeTeam    Metric  PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1   Team A vs Team B    Team A      5       NULL            NULL
2   Team A vs Team B    Team A      7       5               NULL
3   Team C vs Team D    Team C      6       NULL            NULL
4   Team Z vs Team A    Team Z      8       NULL            NULL
5   Team A vs Team B    Team A      9       8               7
6   Team C vs Team D    Team C      3       6               NULL
7   Team C vs Team D    Team C      1       3               6
8   Team E vs Team F    Team E      2       NULL            NULL

I have been trying variations of LAG with a new grouping variable in the PARTITION BY clause, e.g.

LAG(Metric) OVER(Partition by (CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN 1 ELSE 0 END) ORDER BY id)

but without any success. How can this be done?

Edit: I have also asked this question for pandas here: Pandas shift - get previous value if multiple conditions satisfied

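【Comments】:

  • Why shouldn't PreviousMetric1 for id=5 be 7? And what is PreviousMetric2?
  • Please explain in plain language what you are looking for.
  • Please check my answer. It will solve your problem.

Tags: sql sql-server window-functions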


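【Solution 1】:

I don't see an answer here that uses window functions and a single scan of the table. We can do this query in a single scan, as follows:

Assuming you have AwayTeam in another column.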

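If you don't have this yet and want to parse it out of EventName, we can use: SUBSTRING(EventName, CHARINDEX(' vs ', EventName) + 4, LEN(EventName))
I urge you to follow proper normalization and create it as a proper column in the table.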
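
As a rough illustration of the parsing step, assuming every EventName has the exact form '<home team> vs <away team>', the away team could be derived as below. The alias AwayTeam and the use of LEN(EventName) as a generous length argument are choices made for this sketch, not part of the original answer.

```sql
-- Sketch: derive AwayTeam from EventName in the question's #test_table.
-- Assumes the separator ' vs ' appears exactly once in every EventName.
SELECT id
     , EventName
     , HomeTeam
     -- Start right after ' vs ' (4 characters) and take the rest of the string;
     -- SQL Server's SUBSTRING requires a length, so LEN(EventName) acts as an upper bound.
     , AwayTeam = SUBSTRING(EventName, CHARINDEX(' vs ', EventName) + 4, LEN(EventName))
     , Metric
FROM #test_table
ORDER BY id;
```

For 'Team Z vs Team A' this yields AwayTeam = 'Team A', which is what lets row 4 count as a previous event for Team A.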

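Our algorithm runs like this:

  1. Multiply out the two teams as separate rows using CROSS APPLY (an unpivot)
  2. Calculate the previous Metrics using LAG, partitioning by the combined Team column
  3. Filter the doubled-up rows back down, so that we get only one row for each original row
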
SELECT id, HomeTeam, AwayTeam, Metric, Prev1, Prev2, Prev3
FROM (

  SELECT *
    ,Prev1 = LAG(Metric, 1) OVER (PARTITION BY v.Team ORDER BY id)
    ,Prev2 = LAG(Metric, 2) OVER (PARTITION BY v.Team ORDER BY id)
    ,Prev3 = LAG(Metric, 3) OVER (PARTITION BY v.Team ORDER BY id)
    -- more of these ......
  FROM test_table
  CROSS APPLY (VALUES (HomeTeam, 1),(AwayTeam, 0)) AS v(Team,IsHome)
) AS t

WHERE IsHome = 1
-- ORDER BY id  --if necessary

What is nice about this is that we can do it without multiple different sorts or partitions, and without a self-join. Just a single scan.

Result:

id  HomeTeam  AwayTeam  Metric  Prev1   Prev2   Prev3
-----------------------------------------------------
1   Team A    Team B    5       (null)  (null)  (null)
2   Team A    Team B    7       5       (null)  (null)
3   Team C    Team D    6       (null)  (null)  (null)
4   Team Z    Team A    8       (null)  (null)  (null)
5   Team A    Team B    9       8       7       5
6   Team C    Team D    3       6       (null)  (null)
7   Team C    Team D    1       3       6       (null)
8   Team E    Team F    2       (null)  (null)  (null)

【Comments】:

  • This is a sensible approach and definitely the most efficient solution. Best wishes.
【Solution 2】:

The logic seems to be:

lag(metric, <n>) over (partition by hometeam order by id)

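I don't see why eventName is needed.

【Comments】:

  • Ah - I should have explained this more clearly. It is because a HomeTeam can take part in a game without being the HomeTeam in that game. Updating the question now to show this.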
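
To make the point in the comment above concrete, here is a minimal sketch against the question's #test_table. Partitioning by HomeTeam alone only sees rows where the team played at home, so Team A's away game in row 4 is skipped. The column alias NaivePrevious1 is made up for this illustration.

```sql
-- Naive version: only considers rows where the team was the HomeTeam.
SELECT id
     , EventName
     , HomeTeam
     , Metric
     , NaivePrevious1 = LAG(Metric, 1) OVER (PARTITION BY HomeTeam ORDER BY id)
FROM #test_table
ORDER BY id;
-- For id = 5 this returns 7 (taken from id = 2), whereas the question
-- expects 8 (from id = 4, where Team A was the away team).
```
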
【Solution 3】:

Using OUTER APPLY and correlated subqueries:

SELECT *
FROM test_table c
OUTER APPLY (SELECT TOP 1 PreviousMetric1 = c2.Metric  
             FROM test_table c2 
             WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0 
               AND c.id > c2.id
             ORDER BY id DESC) s1
OUTER APPLY (SELECT PreviousMetric2 = c2.Metric  
             FROM test_table c2 
             WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0 
               AND c.id > c2.id
             ORDER BY id DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY) s2            
ORDER BY id;

db<>fiddle demo

Output:

+-----+-------------------+-----------+---------+------------------+-----------------+
| id  |    EventName      | HomeTeam  | Metric  | PreviousMetric1  | PreviousMetric2 |
+-----+-------------------+-----------+---------+------------------+-----------------+
|  1  | Team A vs Team B  | Team A    |      5  |                  |                 |
|  2  | Team A vs Team B  | Team A    |      7  |               5  |                 |
|  3  | Team C vs Team D  | Team C    |      6  |                  |                 |
|  4  | Team Z vs Team A  | Team Z    |      8  |                  |                 |
|  5  | Team A vs Team B  | Team A    |      9  |               8  |               7 |
|  6  | Team C vs Team D  | Team C    |      3  |               6  |                 |
|  7  | Team C vs Team D  | Team C    |      1  |               3  |               6 |
|  8  | Team E vs Team F  | Team E    |      2  |                  |                 |
+-----+-------------------+-----------+---------+------------------+-----------------+

Extending this to PreviousMetricN is a matter of adding the corresponding OUTER APPLY sN with OFFSET N-1 ROWS FETCH ....
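
As a sketch of that extension, a third apply for PreviousMetric3 might look like the following. It is written against the question's #test_table rather than the fiddle's test_table, and the alias s3 simply follows the naming pattern used above; treat it as an illustration rather than the answer's own code.

```sql
-- Sketch: PreviousMetric3 via one more OUTER APPLY, skipping the two most recent matches.
SELECT c.id
     , c.EventName
     , c.HomeTeam
     , c.Metric
     , s3.PreviousMetric3
FROM #test_table AS c
OUTER APPLY (SELECT PreviousMetric3 = c2.Metric
             FROM #test_table AS c2
             WHERE CHARINDEX(c.HomeTeam, c2.EventName) > 0  -- team appears in the earlier event
               AND c2.id < c.id                             -- earlier events only
             ORDER BY c2.id DESC
             OFFSET 2 ROWS FETCH NEXT 1 ROW ONLY) AS s3     -- N - 1 = 2 for N = 3
ORDER BY c.id;
```

With the sample data, only id = 5 has three earlier Team A events, so it is the only row with a non-NULL PreviousMetric3 (the value 5 from id = 1).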

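【Comments】:

【Solution 4】:

First, using a self-join and a common table expression, I ranked all the previous events whose name contains the home team. We can get PreviousMetric1 from the most recent match, and we can use the LEAD() window function to get PreviousMetric2. Please check the following query: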

    with cte as(
    select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,   
    LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
    row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
    from #test_table a
        left join #test_table b
            on charindex(a.hometeam,b.eventname)>0  and a.id>b.id
    )select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2  from cte 
    where rownum=1
    

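You can also get PreviousMetric3 by applying LEAD() with 2 as the second argument. In this way you can have as many previous metrics as you like. It is faster than any other approach.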

    ;with cte as(
        select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,   
        LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
        LEAD(b.metric,2)over (partition by a.id order by b.id desc) PreviousMetric3,
        row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
        from #test_table a
            left join #test_table b
                on charindex(a.hometeam,b.eventname)>0  and a.id>b.id
        )select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2 ,PreviousMetric3 from cte 
    where rownum=1
    

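【Comments】:

  • A nice solution, but I will have to award this to @Charlieface's solution for its efficiency: it scans the table only once and avoids self-joins, multiple sorts, etc.
  • Glad to know you got the solution you wanted. Yes, if you treat the away team as another column, that is the better solution. Best wishes. It was a nice challenge. Your question deserves an upvote.
【Solution 5】:

I believe this is what you are looking for: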

    ;with cte as (
       select id
     , eventname
     , hometeam
     , metric
     , CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(Metric) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous from  #test_table 
    )
    select * ,CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(previous) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous2 
    from cte
    order by 1
    

【Comments】:
