如果您阅读了R T Snodgrass 的“Developing Time-Oriented Database Applications in SQL”一书(其 pdf 可从他的网站上的出版物中获得),并在 p165-166 上找到图 6.25,您将找到可以在当前示例中使用的非平凡 SQL 对具有相同 ID 值和连续时间间隔的各种行进行分组。
下面的查询开发接近正确,但在最后发现了一个问题,其根源在于第一个 SELECT 语句。我还没有找到给出错误答案的原因。 [如果有人可以在他们的 DBMS 上测试 SQL 并告诉我第一个查询是否在那里正常工作,那将是一个很大的帮助!] p>
它看起来像:
-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented
-- Database Applications in SQL"
CREATE TABLE Data
(
Start DATE,
Finish DATE,
ID CHAR(2),
Amount INT
);
INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10);
INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20);
INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38);
INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23);
INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14);
INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3);
INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8);
INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19);
SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));
该查询的输出是:
01 2008-10-01 2008-10-02
01 2008-10-03 2008-10-05
02 2008-10-02 2008-10-03
02 2008-10-06 2008-10-08
03 2008-10-05 2008-10-06
03 2008-10-05 2008-11-08
03 2008-10-08 2008-11-08
已编辑:倒数第二行有问题 - 它不应该存在。我还不清楚它是从哪里来的。
现在我们需要将该复杂表达式视为另一个 SELECT 语句的 FROM 子句中的查询表达式,它将给定 ID 的数量值与与上面显示的最大范围重叠的条目相加。
SELECT M.ID, M.Start, M.Finish, SUM(D.Amount)
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
ORDER BY M.ID, M.Start;
这给出了:
ID Start Finish Amount
01 2008-10-01 2008-10-02 10
01 2008-10-03 2008-10-05 61
02 2008-10-02 2008-10-03 20
02 2008-10-06 2008-10-08 11
03 2008-10-05 2008-10-06 14
03 2008-10-05 2008-11-08 33 -- Here be trouble!
03 2008-10-08 2008-11-08 19
已编辑:这几乎是正确的数据集,可以在其上进行原始问题要求的 COUNT 和 SUM 聚合,因此最终答案是:
SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount
FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS
(SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR
(T2.Start <= L.Finish AND L.Finish < T2.Finish)))
) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
) AS I
GROUP BY I.ID
ORDER BY I.ID;
id number amount
01 2 71
02 2 31
03 3 66
评论:
哦! Drat ... 3 的条目的“数量”是它应该有的两倍。以前的“编辑”部分表明事情开始出错的地方。看起来好像第一个查询有细微的错误(可能是针对不同的问题),或者我正在使用的优化器行为不端。不过,应该有一个与此密切相关的答案,它会给出正确的值。
作为记录:在 Solaris 10 上的 IBM Informix Dynamic Server 11.50 上进行了测试。但是,应该可以在任何其他符合标准的 SQL DBMS 上正常工作。