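【Posted on】:2021-10-02 20:52:38
【Question】:
If I'm doing an inner join across multiple tables, how can I make sure the result set contains only the most recent timestamp? For example: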
SELECT
e.customer_id AS customer_id,
e.event_id AS event_id,
-- MOST RECENT TIMESTAMP from car.updated_on, motorcycle.updated_on or walk.updated_on
FROM
event_table AS e
INNER JOIN car AS c ON e.customer_id = c.customer_id
INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
e.event_id = c.event_id
AND e.event_id = m.event_id
AND e.event_id = w.event_id
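I have a table that records every event that occurs. For each customer, I only want to pull the most recent timestamp across all three event tables (car, motorcycle, or walk), regardless of the event type.
Sample data:
Events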
| customer_id | event_id |
|---|---|
| 1 | 100 |
| 2 | 101 |
| 3 | 102 |
| 4 | 103 |
| 5 | 104 |
| 6 | 105 |
| 7 | 106 |
| 8 | 107 |
| 9 | 108 |
| 10 | 109 |
Car
| customer_id | event_id | car_id | updated_on |
|---|---|---|---|
| 1 | 100 | 1 | 2021-07-23 10:09:05 |
| 2 | 101 | 1 | 2021-07-23 10:09:05 |
| 3 | 102 | 1 | 2021-07-23 10:09:05 |
| 4 | 103 | 1 | 2021-07-23 10:09:05 |
| 5 | 104 | 1 | 2021-07-23 10:09:05 |
| 6 | 105 | 1 | 2021-07-23 10:09:05 |
| 7 | 106 | 1 | 2021-07-23 10:09:05 |
| 8 | 107 | 1 | 2021-07-23 10:09:05 |
| 9 | 108 | 1 | 2021-07-23 10:09:05 |
| 10 | 109 | 1 | 2021-07-23 10:09:05 |
Motorcycle
| customer_id | event_id | motorcycle_id | updated_on |
|---|---|---|---|
| 1 | 100 | 1 | 2021-07-23 10:09:00 |
| 2 | 101 | 1 | 2021-07-23 10:09:00 |
| 3 | 102 | 1 | 2021-07-23 10:09:00 |
| 4 | 103 | 1 | 2021-07-23 10:09:00 |
| 5 | 104 | 1 | 2021-07-23 10:09:10 |
| 6 | 105 | 1 | 2021-07-23 10:09:10 |
| 7 | 106 | 1 | 2021-07-23 10:09:00 |
| 8 | 107 | 1 | 2021-07-23 10:09:00 |
Walk
| customer_id | event_id | walk_id | updated_on |
|---|---|---|---|
| 1 | 100 | 1 | 2021-07-23 10:09:00 |
| 2 | 101 | 1 | 2021-07-23 10:09:00 |
| 3 | 102 | 1 | 2021-07-23 10:09:00 |
| 4 | 103 | 1 | 2021-07-23 10:09:00 |
| 5 | 104 | 1 | 2021-07-23 10:09:00 |
| 6 | 105 | 1 | 2021-07-23 10:09:00 |
| 7 | 106 | 1 | 2021-07-23 10:09:00 |
| 8 | 107 | 1 | 2021-07-23 10:09:15 |
| 9 | 108 | 1 | 2021-07-23 10:09:15 |
Expected result:
| customer_id | event_id | updated_on | comment |
|---|---|---|---|
| 1 | 100 | 2021-07-23 10:09:05 | TS from car |
| 2 | 101 | 2021-07-23 10:09:05 | TS from car |
| 3 | 102 | 2021-07-23 10:09:05 | TS from car |
| 4 | 103 | 2021-07-23 10:09:05 | TS from car |
| 5 | 104 | 2021-07-23 10:09:10 | TS from motorcycle |
| 6 | 105 | 2021-07-23 10:09:10 | TS from motorcycle |
| 7 | 106 | 2021-07-23 10:09:15 | TS from walk |
| 8 | 107 | 2021-07-23 10:09:15 | TS from walk |
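I don't need comment in the final result set; I only added it for explanation. In practice, I don't care what the event is. I only care about the INNER JOIN across the four tables, so there should be at most 8 records, and I only want the latest (highest) timestamp value. customer_id and event_id must match across all of the INNER JOINs.
For example: customer_id = 1 and event_id = 100; this exists in all 4 tables. It has three values for updated_on (from car, motorcycle, and walk respectively). I want MAX(2021-07-23 10:09:05, 2021-07-23 10:09:00, 2021-07-23 10:09:00), that is, MAX(car.updated_on, motorcycle.updated_on, walk.updated_on).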
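As a minimal sketch of that per-row maximum: in SQL Server, MAX is an aggregate and does not take several arguments, so one common workaround is to turn the three values into rows with a VALUES table constructor and aggregate over them. The literals below are the three updated_on values for customer_id = 1; the alias v and the DATETIME2(0) type are assumptions, since the column type is not stated here.

```sql
-- Row-wise maximum of three values, written as an aggregate over a
-- three-row derived table. DATETIME2(0) is an assumed column type.
SELECT MAX(v.updated_on) AS most_recent
FROM (VALUES
    (CAST('2021-07-23 10:09:05' AS DATETIME2(0))),  -- car.updated_on
    (CAST('2021-07-23 10:09:00' AS DATETIME2(0))),  -- motorcycle.updated_on
    (CAST('2021-07-23 10:09:00' AS DATETIME2(0)))   -- walk.updated_on
) AS v (updated_on);
-- returns 2021-07-23 10:09:05
```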
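Any help would be greatly appreciated, thanks.
Edit: I got the desired result using two queries. I'm hoping to optimize it into a single query.
- Get only the UNIQUE records shared between the three tables and store them in another table named event_joined. This table is fully overwritten each time, not just appended to.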
SELECT
e.customer_id AS customer_id,
e.event_id AS event_id
FROM
event_table AS e
INNER JOIN car AS c ON e.customer_id = c.customer_id
INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
e.event_id = c.event_id
AND e.event_id = m.event_id
AND e.event_id = w.event_id
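- Before doing the UNION, we know that all three tables will contribute the same number of rows, because we previously joined them down to matching records only. Now we simply GROUP BY and take the MAX (most recent) timestamp.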
SELECT event_temp.customer_id, event_temp.event_id, MAX(event_temp.updated_on) AS updated_on
FROM (
SELECT c.customer_id, c.event_id, c.updated_on FROM car AS c INNER JOIN event_joined AS ej ON ej.customer_id = c.customer_id AND ej.event_id = c.event_id
UNION ALL
SELECT m.customer_id, m.event_id, m.updated_on FROM motorcycle AS m INNER JOIN event_joined AS ej ON ej.customer_id = m.customer_id AND ej.event_id = m.event_id
UNION ALL
SELECT w.customer_id, w.event_id, w.updated_on FROM walk AS w INNER JOIN event_joined AS ej ON ej.customer_id = w.customer_id AND ej.event_id = w.event_id
) AS event_temp
GROUP BY event_temp.customer_id, event_temp.event_id;
Is there a way to optimize this into a single query? Thanks.
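One possible single-statement shape, offered as a sketch and not a tuned solution: keep the four-way INNER JOIN on both customer_id and event_id, then use CROSS APPLY over a VALUES constructor to pick the largest of the three updated_on values per joined row. The CTEs below only stand in for the real tables so the statement runs on its own; they hold a subset of the sample rows above. The aliases latest and v and the DATETIME2(0) type are assumptions.

```sql
-- Sample rows copied from the question (customers 1, 5, 8, 9 and 10 only).
-- The CTEs replace the real tables for illustration; against the real
-- schema only the final SELECT would be needed.
WITH event_table (customer_id, event_id) AS (
    SELECT v.customer_id, v.event_id
    FROM (VALUES (1, 100), (5, 104), (8, 107), (9, 108), (10, 109))
         AS v (customer_id, event_id)
),
car (customer_id, event_id, updated_on) AS (
    SELECT v.customer_id, v.event_id, CAST(v.updated_on AS DATETIME2(0))
    FROM (VALUES (1, 100, '2021-07-23 10:09:05'),
                 (5, 104, '2021-07-23 10:09:05'),
                 (8, 107, '2021-07-23 10:09:05'),
                 (9, 108, '2021-07-23 10:09:05'),
                 (10, 109, '2021-07-23 10:09:05'))
         AS v (customer_id, event_id, updated_on)
),
motorcycle (customer_id, event_id, updated_on) AS (
    SELECT v.customer_id, v.event_id, CAST(v.updated_on AS DATETIME2(0))
    FROM (VALUES (1, 100, '2021-07-23 10:09:00'),
                 (5, 104, '2021-07-23 10:09:10'),
                 (8, 107, '2021-07-23 10:09:00'))
         AS v (customer_id, event_id, updated_on)
),
walk (customer_id, event_id, updated_on) AS (
    SELECT v.customer_id, v.event_id, CAST(v.updated_on AS DATETIME2(0))
    FROM (VALUES (1, 100, '2021-07-23 10:09:00'),
                 (5, 104, '2021-07-23 10:09:00'),
                 (8, 107, '2021-07-23 10:09:15'),
                 (9, 108, '2021-07-23 10:09:15'))
         AS v (customer_id, event_id, updated_on)
)
SELECT
    e.customer_id,
    e.event_id,
    latest.updated_on
FROM event_table AS e
INNER JOIN car AS c
    ON c.customer_id = e.customer_id AND c.event_id = e.event_id
INNER JOIN motorcycle AS m
    ON m.customer_id = e.customer_id AND m.event_id = e.event_id
INNER JOIN walk AS w
    ON w.customer_id = e.customer_id AND w.event_id = e.event_id
-- Turn the three timestamps of the current row into rows and keep the largest.
CROSS APPLY (
    SELECT MAX(v.updated_on) AS updated_on
    FROM (VALUES (c.updated_on), (m.updated_on), (w.updated_on)) AS v (updated_on)
) AS latest
ORDER BY e.customer_id;
```

With these rows, customers 1, 5 and 8 come back with 10:09:05, 10:09:10 and 10:09:15 respectively, while 9 and 10 drop out because they are missing from motorcycle, walk, or both. The sketch assumes each table holds at most one row per (customer_id, event_id); if duplicates are possible, grouping by e.customer_id, e.event_id and taking MAX(latest.updated_on) would collapse them.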
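【Comments】:
-
Please provide sample data, desired results, and an appropriate database tag.
-
Sorry to bother you, any idea what I can use to format the tables? It looks fine in the preview, but when I hit save it doesn't render the table correctly.
-
I've updated it; I hope the screenshots are OK. I couldn't figure out the markdown. Looks like a Stack bug :(
-
Why do the car, motorcycle, and walk tables have a customer ID when the event table they link to already has one? Your table structure doesn't make sense. Also, how relevant is this messy table structure to your actual question? Try to reduce it to the bare minimum.
-
Unfortunately, these are technical limitations of a legacy system I can't upgrade; otherwise I would normalize the whole mess. In practice, I know I need to do 4 INNER JOINs, which is fine. I just need to get the latest TIMESTAMP from the other 3 tables.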
Tags: sql sql-server greatest-n-per-group sql-server-2016