SQL: Get top element directly from join

【Question title】: SQL: Get top element directly from join
【Posted】: 2021-03-06 05:20:01
【Question】:

I am curious whether we can avoid the inner query and instead use a plain join to fetch the top element of another table.

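For example, there is a student table and a student_marks table that holds each student's marks in individual subjects. For every student, I want to list the subject in which they scored their highest marks.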

student
-----------------------------
| student_id | name   | Age |
-----------------------------
| S1         | Biden  | 15  |
| S2         | Jordan | 16  |
-----------------------------

student_marks
-------------------------------------
| student_id | subject      | marks |
-------------------------------------
| S1         | Geology      | 80    |
| S1         | Trigonometry | 90    |
| S2         | Geography    | 70    |
| S2         | Geology      | 75    |
-------------------------------------

The desired result is:

----------------------------------------------------
| student_id | name   | age | subject      | marks |
----------------------------------------------------
| S1         | Biden  | 15  | Trigonometry | 90    |
| S2         | Jordan | 16  | Geology      | 75    |
----------------------------------------------------

Since I am using MySQL, I cannot use the WITH clause.

My attempt is:

select * from
student s
inner join (
    select student_id, max(marks) as marks from student_marks group by student_id
) sm
on s.student_id = sm.student_id;

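PS: The approach above does give the expected result (without the subject column), but it becomes very slow as the data grows. Is there a better way to use a join and pick the top element from the secondary table?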

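【Comments】:

'Since I am using MySQL, I cannot use the WITH clause' - as of version 8 you can. Which version are you on?
I am using 5.17 or 5.2, something like that. Can't really upgrade to 8 :|
@Jake The query you posted does not return the subject column.

Tags: mysql sql inner-join greatest-n-per-group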

【Solution 1】:

You can use row_number():

select *
from student s inner join 
     (select sm.*,
             row_number() over (partition by student_id order by marks desc) as seqnum
      from student_marks sm
     ) sm
     on s.student_id = sm.student_id and sm.seqnum = 1;

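【Discussion】:

The OP said he can't use WITH, so he is probably not on a version that allows window functions (8 or later).
On the other hand, if the OP needs features that have been available in MySQL since 2017, it is reasonable to expect them to upgrade to a version that supports those features.
@Bill Karwin - Are you not prepared to accept that the person asking the question cannot upgrade?

【Solution 2】:

You're almost there: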

select * from
student s
inner join student_marks sm1
  on s.student_id = sm1.student_id
inner join (
    select student_id, max(marks) as marks from student_marks group by student_id
) sm2
on sm1.student_id = sm2.student_id and sm1.marks = sm2.marks;

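You can then use sm1 to get the other columns from the row that holds the maximum marks.

Note that if several rows in student_marks match the maximum marks for a student, this query returns all of the tied rows.

This is why window functions are so useful.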
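
To illustrate the tie case, here is a minimal sketch on hypothetical data in which S1 scores 90 in two subjects. The column types and the tied marks are assumptions made for the example; they are not part of the question.

```sql
-- Hypothetical setup: S1 has two subjects tied at the top mark.
DROP TABLE IF EXISTS student_marks;
DROP TABLE IF EXISTS student;

CREATE TABLE student (
    student_id VARCHAR(10) PRIMARY KEY,
    name       VARCHAR(50),
    age        INT
);

CREATE TABLE student_marks (
    student_id VARCHAR(10),
    subject    VARCHAR(50),
    marks      INT
);

INSERT INTO student VALUES ('S1', 'Biden', 15);
INSERT INTO student_marks VALUES
    ('S1', 'Geology', 90),
    ('S1', 'Trigonometry', 90);

-- Both S1 rows equal the per-student maximum, so the join keeps both:
-- the result contains two rows for S1 instead of one.
SELECT s.student_id, s.name, sm1.subject, sm1.marks
FROM student s
INNER JOIN student_marks sm1
    ON s.student_id = sm1.student_id
INNER JOIN (
    SELECT student_id, MAX(marks) AS marks
    FROM student_marks
    GROUP BY student_id
) sm2
    ON sm1.student_id = sm2.student_id AND sm1.marks = sm2.marks;
```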

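Another solution I used before MySQL 8.0 involves a "tiebreaker" column. Any column that is guaranteed to be distinct among the rows matching the maximum marks will work, but the primary key is the typical choice. Suppose the primary key of this table is id.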

select * from
student s
inner join student_marks sm1 on s.student_id = sm1.student_id
left outer join student_marks sm2 on s.student_id = sm2.student_id 
  and (sm1.marks < sm2.marks or sm1.marks = sm2.marks and sm1.id < sm2.id)
where sm2.student_id IS NULL;

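This works by checking whether a row sm2 exists with greater marks or, if the marks are equal, with a greater id. If no such row exists, sm1 must be the row with the highest marks. That is exactly the case in which the OUTER JOIN returns NULL for the columns of sm2.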
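
To make the pairing concrete, here is a minimal sketch on hypothetical data. The explicit id values and the Algebra row are assumptions for illustration. It runs the same self-join without the WHERE filter, so the sm2 row matched to each sm1 row is visible.

```sql
-- Hypothetical setup: student_marks with an assumed primary key id.
DROP TABLE IF EXISTS student_marks;

CREATE TABLE student_marks (
    id         INT PRIMARY KEY,
    student_id VARCHAR(10),
    subject    VARCHAR(50),
    marks      INT
);

INSERT INTO student_marks (id, student_id, subject, marks) VALUES
    (1, 'S1', 'Geology', 90),
    (2, 'S1', 'Trigonometry', 90),
    (3, 'S1', 'Algebra', 70);

-- List every sm1 row next to each row that "beats" it.
SELECT sm1.id, sm1.subject, sm1.marks, sm2.id AS better_id
FROM student_marks sm1
LEFT OUTER JOIN student_marks sm2
    ON sm1.student_id = sm2.student_id
   AND (sm1.marks < sm2.marks
        OR (sm1.marks = sm2.marks AND sm1.id < sm2.id))
ORDER BY sm1.id, sm2.id;

-- id | subject      | marks | better_id
--  1 | Geology      | 90    | 2     (tie on marks, beaten on id)
--  2 | Trigonometry | 90    | NULL  (nothing beats it)
--  3 | Algebra      | 70    | 1
--  3 | Algebra      | 70    | 2
```

Filtering on the NULL side (WHERE sm2.student_id IS NULL) leaves only the Trigonometry row, which is why the tiebreaker yields exactly one row per student.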

If you don't care about ties, this solution also works; just remove the term for id:

select * from
student s
inner join student_marks sm1 on s.student_id = sm1.student_id
left outer join student_marks sm2 on s.student_id = sm2.student_id 
  and sm1.marks < sm2.marks
where sm2.student_id IS NULL;
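
Since the question is about speed, it may be worth checking whether the self-join can use an index on the join and comparison columns. The statement below is only a sketch: the index name is hypothetical, and it assumes no equivalent index already exists on student_marks.

```sql
-- Hypothetical index name; covers the columns used in the self-join condition.
CREATE INDEX idx_student_marks_student_id_marks
    ON student_marks (student_id, marks);

-- Inspect the plan to see whether the index is actually chosen.
EXPLAIN
SELECT s.student_id, sm1.subject, sm1.marks
FROM student s
INNER JOIN student_marks sm1 ON s.student_id = sm1.student_id
LEFT OUTER JOIN student_marks sm2 ON s.student_id = sm2.student_id
    AND sm1.marks < sm2.marks
WHERE sm2.student_id IS NULL;
```

Whether this helps depends on the data distribution and the server version, so compare the EXPLAIN output and timings before and after.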

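【Discussion】:

Sorry, how can I use s.marks? There is no column named marks in the student table.
Sorry about that mistake. I have edited the query above to show how to fix it.
The attempt I made already gives me the result I want. But it runs slowly because there is a GROUP BY before the join. I am looking for something that does not involve GROUP BY and gets the maximum through the join itself.
I have provided a solution that does not use GROUP BY.

【Solution 3】:

Use NOT EXISTS to get the highest marks for each student: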

select sm.*
from student_marks sm
where not exists (select 1 from student_marks where student_id = sm.student_id and marks > sm.marks)

Then join the student table:

select s.*, t.subject, t.marks
from student s
inner join (
    select sm.*
    from student_marks sm
    where not exists (select 1 from student_marks where student_id = sm.student_id and marks > sm.marks)
) t on t.student_id = s.student_id

See the demo.
Result:

| student_id | name   | Age | subject      | marks |
| ---------- | ------ | --- | ------------ | ----- |
| S1         | Biden  | 15  | Trigonometry | 90    |
| S2         | Jordan | 16  | Geology      | 75    |
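
Like the GROUP BY join, the NOT EXISTS query returns every tied row. As a sketch of one way to get a single row per student, the same tiebreaker idea can be folded into the subquery. This assumes student_marks has a unique id column, which the question does not show; the History row is also hypothetical and is added only to create a tie.

```sql
-- Hypothetical setup: the question's data plus an assumed id column and one tied row.
DROP TABLE IF EXISTS student_marks;
DROP TABLE IF EXISTS student;

CREATE TABLE student (
    student_id VARCHAR(10) PRIMARY KEY,
    name       VARCHAR(50),
    age        INT
);

CREATE TABLE student_marks (
    id         INT PRIMARY KEY,   -- assumed tiebreaker column
    student_id VARCHAR(10),
    subject    VARCHAR(50),
    marks      INT
);

INSERT INTO student VALUES ('S1', 'Biden', 15), ('S2', 'Jordan', 16);
INSERT INTO student_marks (id, student_id, subject, marks) VALUES
    (1, 'S1', 'Geology', 80),
    (2, 'S1', 'Trigonometry', 90),
    (3, 'S2', 'Geography', 70),
    (4, 'S2', 'Geology', 75),
    (5, 'S2', 'History', 75);     -- hypothetical tie with Geology

-- Keep a row only if no other row of the same student has higher marks,
-- or equal marks and a higher id.
SELECT s.*, sm.subject, sm.marks
FROM student s
INNER JOIN student_marks sm
    ON sm.student_id = s.student_id
WHERE NOT EXISTS (
    SELECT 1
    FROM student_marks better
    WHERE better.student_id = sm.student_id
      AND (better.marks > sm.marks
           OR (better.marks = sm.marks AND better.id > sm.id))
);
-- Returns one row per student: S1 with Trigonometry (90), S2 with History (75).
```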

【Discussion】: