【Question title】: SQL query to get date pairs within id
【Posted】: 2020-03-18 20:23:24
【Question description】:

I have a table with the following rows:

    | item_id | change_type | change_date | change_id | other columns...
    | :------ | :---------- | :---------- | :-------- |
    |     123 |         off |  2019-06-04 |       321 |
    |     123 |          on |  2019-07-11 |       741 |
    |     123 |         off |  2019-07-13 |       987 |
    |     123 |          on |  2019-08-01 |       951 |
    |     123 |         off |  2019-08-07 |       357 |
    |     456 |         off |  2019-08-01 |       125 |
    |     456 |          on |  2019-11-18 |       878 |
    |     789 |          on |  2019-12-18 |       373 |
    |     012 |         off |  2019-12-25 |       654 |
    |     698 |         off |  2019-08-01 |       741 |
    |     698 |          on |  2018-01-03 |       147 |

I'm trying to run a query that produces the following result:

    | item_id | on_date    | off_date   | on_id | off_id | other columns...
    | :------ | :--------- | :--------- | :---- | :----- |
    |     123 |            | 2019-06-04 |       |    321 |
    |     123 | 2019-07-11 | 2019-07-13 |   741 |    987 |
    |     123 | 2019-08-01 | 2019-08-07 |   951 |    357 |
    |     456 |            | 2019-08-01 |       |    125 |
    |     456 | 2019-11-18 |            |   878 |        |
    |     789 | 2019-12-18 |            |   373 |        |
    |     012 |            | 2019-12-25 |       |    654 |
    |     698 | 2018-01-03 | 2019-08-01 |   147 |    741 |

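I need the result to be a table in which the "on" and "off" dates are recorded in descending order (grouped by item_id), with each "off" date on the same row as the chronologically preceding "on" date.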
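To pin down the pairing rule, here is a minimal procedural sketch in Python (not SQL) that reproduces the desired result table. It assumes rows arrive as (item_id, change_type, change_date, change_id) tuples with ISO date strings, keeps item_id as a string so "012" keeps its leading zero, and assumes an "off" pairs only with the change immediately before it when that change is an "on". The function name pair_changes is hypothetical; it is only a reference to check SQL output against.

```python
from itertools import groupby

# Sample data from the question (item_id kept as text to preserve "012").
ROWS = [
    ("123", "off", "2019-06-04", 321),
    ("123", "on", "2019-07-11", 741),
    ("123", "off", "2019-07-13", 987),
    ("123", "on", "2019-08-01", 951),
    ("123", "off", "2019-08-07", 357),
    ("456", "off", "2019-08-01", 125),
    ("456", "on", "2019-11-18", 878),
    ("789", "on", "2019-12-18", 373),
    ("012", "off", "2019-12-25", 654),
    ("698", "off", "2019-08-01", 741),
    ("698", "on", "2018-01-03", 147),
]


def pair_changes(rows):
    """Return (item_id, on_date, off_date, on_id, off_id); None marks a missing side."""
    result = []
    # Sort by item_id, then by date, so each item's history is chronological.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for item_id, changes in groupby(ordered, key=lambda r: r[0]):
        open_on = None  # (date, id) of an "on" still waiting for its "off"
        for _, change_type, change_date, change_id in changes:
            if change_type == "on":
                if open_on is not None:
                    # Two "on"s in a row: the earlier one never got an "off".
                    result.append((item_id, open_on[0], None, open_on[1], None))
                open_on = (change_date, change_id)
            elif open_on is not None:
                # An "off" closes the "on" immediately before it.
                result.append((item_id, open_on[0], change_date, open_on[1], change_id))
                open_on = None
            else:
                # An "off" with no open "on" before it.
                result.append((item_id, None, change_date, None, change_id))
        if open_on is not None:
            # A trailing "on" with no later "off".
            result.append((item_id, open_on[0], None, open_on[1], None))
    return result


for row in pair_changes(ROWS):
    print(row)
```

The sketch yields the same eight rows as the desired table (sorted by item_id as text rather than in the question's display order).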

The closest I've gotten are the following variations:

Attempt one:

SELECT
    changes_main.item_id,
    `on_date`,
    `off_date`,
    `on_id`,
    `off_id`
FROM (
    SELECT DISTINCT `item_id`
    FROM item_changes
) AS changes_main
LEFT OUTER JOIN (
    SELECT
        `item_id`, -- for joining purposes only
        `change_date` AS `on_date`,
        `change_id` AS `on_id`
    FROM item_changes
    WHERE `change_type` = 'on'
) AS changes_ons ON changes_ons.item_id = changes_main.item_id
RIGHT OUTER JOIN ( -- although LEFT or RIGHT doesn't seem to matter
    SELECT
        `item_id`, -- for joining purposes only
        `change_date` AS `off_date`,
        `change_id` AS `off_id`
    FROM item_changes
    WHERE `change_type` = 'off'
) AS changes_offs ON changes_offs.item_id = changes_main.item_id
;

However, this essentially produces a CROSS JOIN between on_date and off_date.
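The row multiplication can be reproduced with a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the join semantics involved are the same). Both outer joins are written as LEFT JOIN because older SQLite versions lack RIGHT JOIN; for item 123 the row count is the same either way. Since item_id is the only join condition, every "on" row of an item is paired with every "off" row of that item.

```python
import sqlite3

ROWS = [
    ("123", "off", "2019-06-04", 321),
    ("123", "on", "2019-07-11", 741),
    ("123", "off", "2019-07-13", 987),
    ("123", "on", "2019-08-01", 951),
    ("123", "off", "2019-08-07", 357),
    ("456", "off", "2019-08-01", 125),
    ("456", "on", "2019-11-18", 878),
    ("789", "on", "2019-12-18", 373),
    ("012", "off", "2019-12-25", 654),
    ("698", "off", "2019-08-01", 741),
    ("698", "on", "2018-01-03", 147),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_changes "
    "(item_id TEXT, change_type TEXT, change_date TEXT, change_id INTEGER)"
)
conn.executemany("INSERT INTO item_changes VALUES (?, ?, ?, ?)", ROWS)

# Attempt one, with both outer joins written as LEFT JOIN.
ATTEMPT_ONE = """
SELECT changes_main.item_id, on_date, off_date, on_id, off_id
FROM (SELECT DISTINCT item_id FROM item_changes) AS changes_main
LEFT JOIN (SELECT item_id, change_date AS on_date, change_id AS on_id
           FROM item_changes WHERE change_type = 'on') AS changes_ons
       ON changes_ons.item_id = changes_main.item_id
LEFT JOIN (SELECT item_id, change_date AS off_date, change_id AS off_id
           FROM item_changes WHERE change_type = 'off') AS changes_offs
       ON changes_offs.item_id = changes_main.item_id
"""

rows_123 = [r for r in conn.execute(ATTEMPT_ONE) if r[0] == "123"]
print(len(rows_123))  # 2 "on" rows x 3 "off" rows -> 6
```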

The only change in the second attempt is the addition of a WHERE clause, an idea I got from this question.

Attempt two:

-- Same exact query as the above, however with the following
-- WHERE statement placed where the semicolon is above:
WHERE
    `off_date` = (
        SELECT MIN(offs2.change_date)
        FROM item_changes AS offs2
        WHERE offs2.change_type = 'off' AND
        offs2.change_date > changes_ons.on_date
    )
;

The problem is that if an item_id has an uneven number of "on"/"off" changes, the extra "on" or "off" gets filtered out.
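A sketch along the same lines (again sqlite3 as a stand-in for MySQL, with LEFT JOINs) shows which rows attempt two keeps. Only two of the eight desired rows survive: any row whose on_date or off_date is NULL fails the = comparison, and because the subquery is not restricted to the same item_id, item 698's valid pair is also dropped (the earliest "off" after 2018-01-03 across all items is item 123's 2019-06-04).

```python
import sqlite3

ROWS = [
    ("123", "off", "2019-06-04", 321),
    ("123", "on", "2019-07-11", 741),
    ("123", "off", "2019-07-13", 987),
    ("123", "on", "2019-08-01", 951),
    ("123", "off", "2019-08-07", 357),
    ("456", "off", "2019-08-01", 125),
    ("456", "on", "2019-11-18", 878),
    ("789", "on", "2019-12-18", 373),
    ("012", "off", "2019-12-25", 654),
    ("698", "off", "2019-08-01", 741),
    ("698", "on", "2018-01-03", 147),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_changes "
    "(item_id TEXT, change_type TEXT, change_date TEXT, change_id INTEGER)"
)
conn.executemany("INSERT INTO item_changes VALUES (?, ?, ?, ?)", ROWS)

# Attempt two: attempt one plus the WHERE clause from the question.
ATTEMPT_TWO = """
SELECT changes_main.item_id, on_date, off_date, on_id, off_id
FROM (SELECT DISTINCT item_id FROM item_changes) AS changes_main
LEFT JOIN (SELECT item_id, change_date AS on_date, change_id AS on_id
           FROM item_changes WHERE change_type = 'on') AS changes_ons
       ON changes_ons.item_id = changes_main.item_id
LEFT JOIN (SELECT item_id, change_date AS off_date, change_id AS off_id
           FROM item_changes WHERE change_type = 'off') AS changes_offs
       ON changes_offs.item_id = changes_main.item_id
WHERE off_date = (
    SELECT MIN(offs2.change_date)
    FROM item_changes AS offs2
    WHERE offs2.change_type = 'off'
      AND offs2.change_date > changes_ons.on_date  -- note: no item_id filter
)
"""

kept = conn.execute(ATTEMPT_TWO).fetchall()
print(sorted(kept))
```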

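I've tried variations of the WHERE clause above, including OR off_date IS NULL, OR on_date IS NULL, etc.

Update:

The third attempt uses a UNION and some scalar subqueries. This is the closest I've come to the result I need. However, it still falls short (for example, it doesn't include change_id, and it doesn't produce perfect matches).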

SELECT
    changes_on.item_id,
    changes_on.change_date AS `on_date`,
    (SELECT MIN(offs2.change_date)
        FROM item_changes AS offs2
        WHERE offs2.change_type = 'off' AND
        offs2.change_date > changes_on.change_date
    ) AS `off_date`,
    changes_on.change_id AS `on_id`,
    NULL AS `off_id` -- odd
FROM item_changes AS changes_on
WHERE `change_type` = 'on'

UNION

SELECT
    changes_offs.item_id,
    (SELECT MIN(ons2.change_date)
        FROM item_changes AS ons2
        WHERE ons2.change_type = 'on' AND
        ons2.change_date < changes_offs.change_date
    ) AS `on_date`,
    changes_offs.change_date AS `off_date`,
    NULL AS `on_id`, -- odd
    changes_offs.change_id AS `off_id`
FROM item_changes AS changes_offs
WHERE `change_type` = 'off'
;

Any assistance/input/guidance would be appreciated.

【Question comments】:

  • This is easy with window functions, which are available in MySQL 8.x. Are you using MySQL 5.x or 8.x?
  • Unfortunately 5.x, for now.

Tags: mysql sql join self-join


【Solution 1】:

Assign a group based on the number of "on" changes at or before each row. Then use conditional aggregation:

select item_id,
       max(case when change_type = 'on' then change_date end) as on_date,
       max(case when change_type = 'on' then change_id end) as on_change_id,
       max(case when change_type = 'off' then change_date end) as off_date,
       max(case when change_type = 'off' then change_id end) as off_change_id
from (select t.*,
             sum(case when change_type = 'on' then 1 else 0 end) over (partition by item_id order by change_date) as grp
      from t
     ) t
group by item_id, grp;
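The window-function query can be checked against the sample data with the sketch below. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions) as a stand-in for MySQL 8.x, names the table item_changes instead of t, and adds an ORDER BY for stable output. The running SUM includes the current row, so an "off" lands in the same group as the most recent "on" before it, and an "off" that precedes every "on" falls into group 0 on its own.

```python
import sqlite3

ROWS = [
    ("123", "off", "2019-06-04", 321),
    ("123", "on", "2019-07-11", 741),
    ("123", "off", "2019-07-13", 987),
    ("123", "on", "2019-08-01", 951),
    ("123", "off", "2019-08-07", 357),
    ("456", "off", "2019-08-01", 125),
    ("456", "on", "2019-11-18", 878),
    ("789", "on", "2019-12-18", 373),
    ("012", "off", "2019-12-25", 654),
    ("698", "off", "2019-08-01", 741),
    ("698", "on", "2018-01-03", 147),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_changes "
    "(item_id TEXT, change_type TEXT, change_date TEXT, change_id INTEGER)"
)
conn.executemany("INSERT INTO item_changes VALUES (?, ?, ?, ?)", ROWS)

# grp = running count of "on" rows per item, up to and including the current row.
WINDOW_QUERY = """
SELECT item_id,
       MAX(CASE WHEN change_type = 'on'  THEN change_date END) AS on_date,
       MAX(CASE WHEN change_type = 'on'  THEN change_id   END) AS on_change_id,
       MAX(CASE WHEN change_type = 'off' THEN change_date END) AS off_date,
       MAX(CASE WHEN change_type = 'off' THEN change_id   END) AS off_change_id
FROM (SELECT t.*,
             SUM(CASE WHEN change_type = 'on' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY item_id ORDER BY change_date) AS grp
      FROM item_changes AS t
     ) AS t
GROUP BY item_id, grp
ORDER BY item_id, grp
"""

result = conn.execute(WINDOW_QUERY).fetchall()
for row in result:
    print(row)
```

One behavior to be aware of: if two "off" changes follow a single "on", both share the same grp, so MAX keeps only the later one. The sample data has no such case.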

Edit:

In earlier versions of MySQL, you can express this as:

select item_id,
       max(case when change_type = 'on' then change_date end) as on_date,
       max(case when change_type = 'on' then change_id end) as on_change_id,
       max(case when change_type = 'off' then change_date end) as off_date,
       max(case when change_type = 'off' then change_id end) as off_change_id
from (select t.*,
             (select count(*)
              from t t2
              where t2.item_id = t.item_id and
                    t2.change_date <= t.change_date and
                    t2.change_type = 'on'
            ) as grp
      from t
     ) t
group by item_id, grp;

Performance is not as good as with window functions, but an index on (item_id, change_type, change_date) will help.
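The same check works for the correlated-subquery form, again using sqlite3 as a stand-in for MySQL 5.x (an assumption) and item_changes in place of t. The index name idx_item_type_date is hypothetical; its column order follows the suggestion above, so each correlated COUNT(*) should be able to seek to one (item_id, 'on') range and scan change_date within it instead of reading the whole table.

```python
import sqlite3

ROWS = [
    ("123", "off", "2019-06-04", 321),
    ("123", "on", "2019-07-11", 741),
    ("123", "off", "2019-07-13", 987),
    ("123", "on", "2019-08-01", 951),
    ("123", "off", "2019-08-07", 357),
    ("456", "off", "2019-08-01", 125),
    ("456", "on", "2019-11-18", 878),
    ("789", "on", "2019-12-18", 373),
    ("012", "off", "2019-12-25", 654),
    ("698", "off", "2019-08-01", 741),
    ("698", "on", "2018-01-03", 147),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_changes "
    "(item_id TEXT, change_type TEXT, change_date TEXT, change_id INTEGER)"
)
conn.executemany("INSERT INTO item_changes VALUES (?, ?, ?, ?)", ROWS)
# Hypothetical index name; columns as suggested in the answer.
conn.execute(
    "CREATE INDEX idx_item_type_date ON item_changes (item_id, change_type, change_date)"
)

# grp = number of "on" rows of the same item dated on or before this row.
CORRELATED_QUERY = """
SELECT item_id,
       MAX(CASE WHEN change_type = 'on'  THEN change_date END) AS on_date,
       MAX(CASE WHEN change_type = 'on'  THEN change_id   END) AS on_change_id,
       MAX(CASE WHEN change_type = 'off' THEN change_date END) AS off_date,
       MAX(CASE WHEN change_type = 'off' THEN change_id   END) AS off_change_id
FROM (SELECT t.*,
             (SELECT COUNT(*)
              FROM item_changes AS t2
              WHERE t2.item_id = t.item_id
                AND t2.change_date <= t.change_date
                AND t2.change_type = 'on'
             ) AS grp
      FROM item_changes AS t
     ) AS t
GROUP BY item_id, grp
ORDER BY item_id, grp
"""

result = conn.execute(CORRELATED_QUERY).fetchall()
for row in result:
    print(row)
```

This yields the same eight rows as the window-function version on the sample data; the trade-off is one indexed lookup per row instead of a single pass per item.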

【Discussion】:

  • Awesome! Thanks. I'm using MySQL 5.x, so I can't use OVER, but I modified the grp query slightly into a correlated subquery: (SELECT SUM(CASE WHEN `change_type` = 'on' THEN 1 ELSE 0 END) FROM t AS sub_t WHERE sub_t.item_id = t.item_id AND sub_t.change_date … OVER emulation have any flaws?