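Because this is SE, we have to use the old-style notation rather than the SQL-92 JOIN notation.
The following four queries are the common basis for the two possible answers: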
SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
FROM tbl AS t1, OUTER tbl AS t2
WHERE t1.tbl_id + 1 = t2.tbl_id
INTO TEMP x1;
SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
FROM tbl AS t1, OUTER tbl AS t2
WHERE t1.tbl_id - 1 = t2.tbl_id
INTO TEMP x2;
SELECT tbl_id AS hi_range
FROM x1
WHERE ind IS NULL
INTO TEMP x3;
SELECT tbl_id AS lo_range
FROM x2
WHERE ind IS NULL
INTO TEMP x4;
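Tables x3 and x4 now contain (respectively) the values of tbl_id that have no immediate successor and no immediate predecessor. Each value in x3 is therefore the end of a contiguous range of tbl_id values, and each value in x4 is the start of one. In IDS rather than SE, you could use the standard SQL OUTER JOIN notation and filter the join result in two queries instead of four; you do not have that luxury in SE.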
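For comparison, here is a rough sketch of what the two-query IDS form mentioned above might look like. It assumes an IDS version that accepts ANSI LEFT OUTER JOIN syntax and reuses the table and column names from the four queries; it is an illustration, not something that will run on SE.

```sql
-- IDS only (not SE): build x3 and x4 directly, one query each.
-- A row of tbl with no match for tbl_id + 1 ends a contiguous range.
SELECT t1.tbl_id AS hi_range
  FROM tbl AS t1 LEFT OUTER JOIN tbl AS t2
    ON t1.tbl_id + 1 = t2.tbl_id
 WHERE t2.tbl_id IS NULL
  INTO TEMP x3;

-- A row of tbl with no match for tbl_id - 1 starts a contiguous range.
SELECT t1.tbl_id AS lo_range
  FROM tbl AS t1 LEFT OUTER JOIN tbl AS t2
    ON t1.tbl_id - 1 = t2.tbl_id
 WHERE t2.tbl_id IS NULL
  INTO TEMP x4;
```

The filter on t2.tbl_id IS NULL sits in the WHERE clause, after the outer join, so it keeps only the unmatched rows; the intermediate tables x1 and x2 are no longer needed.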
Non-solution with quadratic (or worse) behaviour
Now you just have to work out how to combine the two tables:
SELECT t1.lo_range, t2.hi_range
FROM x4 AS t1, x3 AS t2
WHERE t1.lo_range <= t2.hi_range
AND NOT EXISTS
(SELECT t3.lo_range, t4.hi_range
FROM x4 AS t3, x3 AS t4
WHERE t3.lo_range <= t4.hi_range
AND t1.lo_range = t3.lo_range
AND t2.hi_range > t4.hi_range
);
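The main part of this query appears twice, and generates all pairs of rows where the start of the range is less than or equal to the end of the range (the equality allows for 'ranges' that consist of a single value, with the rows on either side deleted). The NOT EXISTS clause ensures that there is no other pair with the same start value and a smaller end value.
If the data has a lot of gaps, the queries on the temporary tables may not be very fast; if there are few gaps, they should be OK.
The last query exhibits quadratic behaviour in the number of ranges. When I had only a dozen or so ranges, it was fine (sub-second response time); when I had 1,200 ranges, it was not fine: it did not complete in a reasonable time.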
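Avoiding quadratic behaviour
Since quadratic behaviour is bad, how can we rewrite the query...
For each low end of a range, find the minimum high end of a range that is greater than or equal to that low end, or, in SQL: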
SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
FROM x4 AS t1, x3 AS t2
WHERE t2.hi_range >= t1.lo_range
GROUP BY t1.lo_range;
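Note that this could easily be incorporated into an ACE report. It gives you the ranges of numbers that are present, not the ranges that are missing. You can work out how to generate the other.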
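As a hint for the missing ranges, one possible approach (a sketch only, not timed or tested on SE) is to swap the roles of the two temporary tables: for each range end in x3, find the smallest range start in x4 that lies beyond it. The column aliases prev_hi and next_lo are made up for this illustration.

```sql
-- For each end of a present range, find the start of the next present range.
-- The missing values are then prev_hi + 1 through next_lo - 1.
-- The end of the last range has no later start, so the join drops it.
SELECT t1.hi_range AS prev_hi, MIN(t2.lo_range) AS next_lo
  FROM x3 AS t1, x4 AS t2
 WHERE t2.lo_range > t1.hi_range
 GROUP BY t1.hi_range;
```

This has the same shape as the MIN/GROUP BY query above, so it should behave similarly, though that is an assumption rather than a measured result.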
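Timing
This performed very acceptably on a table of 22100 rows with 1200 gaps in the data. Using (my) SQLCMD program in its benchmark mode (-B), with the SELECT output sent to /dev/null, and using IDS 11.70.FC1 on MacOS X 10.6.7 (MacBook Pro, Intel Core 2 Duo at 3 GHz, 4 GB RAM), the results were: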
$ sqlcmd -d stores -B -f gaps.sql
+ CLOCK START;
2011-03-31 18:44:39
+ BEGIN;
Time: 0.000588
2011-03-31 18:44:39
+ SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
FROM tbl AS t1, OUTER tbl AS t2
WHERE t1.tbl_id + 1 = t2.tbl_id
INTO TEMP x1;
Time: 0.437521
2011-03-31 18:44:39
+ SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
FROM tbl AS t1, OUTER tbl AS t2
WHERE t1.tbl_id - 1 = t2.tbl_id
INTO TEMP x2;
Time: 0.315050
2011-03-31 18:44:39
+ SELECT tbl_id AS hi_range
FROM x1
WHERE ind IS NULL
INTO TEMP x3;
Time: 0.012510
2011-03-31 18:44:39
+ SELECT tbl_id AS lo_range
FROM x2
WHERE ind IS NULL
INTO TEMP x4;
Time: 0.008754
+ output "/dev/null";
2011-03-31 18:44:39
+ SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
FROM x4 AS t1, x3 AS t2
WHERE t2.hi_range >= t1.lo_range
GROUP BY t1.lo_range;
Time: 0.561935
+ output "/dev/stdout";
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x1;
22100
Time: 0.001171
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x2;
22100
Time: 0.000685
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x3;
1200
Time: 0.000590
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x4;
1200
Time: 0.000768
2011-03-31 18:44:40
+ SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
FROM x4 AS t1, x3 AS t2
WHERE t2.hi_range >= t1.lo_range
GROUP BY t1.lo_range
INTO TEMP x5;
Time: 0.529420
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x5;
1200
Time: 0.001155
2011-03-31 18:44:40
+ ROLLBACK;
Time: 0.329379
+ CLOCK STOP;
Time: 2.202523
$
That will do; the whole run takes a little over two seconds.