【Posted】:2022-01-23 05:54:42
【Question】:
Is there a way in SQL (preferably SQL Server) to select the top N records in a group, exclusive of the other groups?
For example:
DROP TABLE IF EXISTS #DISTANCE
CREATE TABLE #DISTANCE
(
GNAME VARCHAR(3)
, CNAME VARCHAR(3)
, DIST NUMERIC(5,3)
)
INSERT INTO #DISTANCE
VALUES ('E1', 'C1', 1), ('E1','C2',2),
('E2', 'C1', 1.5), ('E2','C2',2.5)
If I'm looking for the first exclusive CNAME for each ENAME (the GNAME column above) by distance ASC, I'd expect output like this:
| Ename | Cname | Dist |
|---|---|---|
| E1 | C1 | 1 |
| E2 | C2 | 2.5 |
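Note that E1|C2 and E2|C1 are omitted, since they would be the second value in that group's ranked results.
I've come up with some SQL that tries to extract this correctly, but it breaks down when I add more groups on ENAME or change my top N value.
If I add complexity: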
TRUNCATE TABLE #DISTANCE
INSERT INTO #DISTANCE
VALUES ('E1', 'C1', 1), ('E1', 'C2', 2),
('E1', 'C3', 3), ('E1', 'C4', 5),
('E2', 'C1', 2.5), ('E2', 'C2', 4),
('E2', 'C3', 3.5), ('E2', 'C4', 6),
('E3', 'C4', 7), ('E3', 'C5', 6),
('E3', 'C6', 4)
The SQL output I'm trying to get looks like this:
| GNAME | CNAME | DIST |
|---|---|---|
| E1 | C1 | 1.000 |
| E1 | C2 | 2.000 |
| E1 | C3 | 3.000 |
| E2 | C4 | 6.000 |
| E3 | C6 | 4.000 |
| E3 | C5 | 6.000 |
I can get it to work for this particular instance using this code:
WITH X AS
(
SELECT *
--, RNK = DENSE_RANK() OVER (ORDER BY DIST ASC)
, CNAME_RNK_BY_DIST = DENSE_RANK() OVER (PARTITION BY CNAME ORDER BY DIST ASC)
FROM #DISTANCE
)
,MINDIST AS ( -- FIRST OCCURRENCE OF CNAME VALUE
SELECT
CNAME
, MIN(DIST) MINDIST
FROM X GROUP BY CNAME
)
-- SELECT * , CALC = SUM(CNAME_RNK_BY_DIST / 4) OVER (PARTITION BY CNAME ORDER BY DIST ASC) FROM X order by CNAME, DIST
, X2 AS (
SELECT *, CALC = SUM(FLOOR(CNAME_RNK_BY_DIST / 4)) OVER (PARTITION BY CNAME ORDER BY DIST ASC) FROM X
)
--SELECT * FROM X2 order by CNAME, dist
, CALC AS (
SELECT CNAME, MAXINC = MAX(CALC) FROM X2 GROUP BY CNAME
)
--SELECT * FROM CALC
, FIRST_OCCURANCE_PAIRS AS (
SELECT A.*
,OCCURANCE = RANK() OVER (PARTITION BY CNAME ORDER BY DIST)
FROM X A
JOIN MINDIST B ON A.CNAME = B.CNAME AND A.DIST = B.MINDIST
)
--SELECT * FROM FIRST_OCCURANCE_PAIRS
,ISO AS
(
SELECT * FROM FIRST_OCCURANCE_PAIRS WHERE OCCURANCE > 3
)
--select * from FIRST_OCCURANCE_PAIRS
-- SELECT * FROM ISO
, NEXT_OCCURANCE AS (
SELECT A.*
FROM X AS A
JOIN CALC ON A.CNAME = CALC.CNAME
JOIN ISO B ON A.CNAME_RNK_BY_DIST = CALC.MAXINC and A.CNAME = B.CNAME
)
--select * from NEXT_OCCURANCE
, FRAME AS (
SELECT
GNAME
, CNAME
, DIST
FROM FIRST_OCCURANCE_PAIRS
--WHERE OCCURANCE <=3
UNION
SELECT
GNAME
, CNAME
, DIST
FROM NEXT_OCCURANCE
)
--select * from FRAME
, FINAL AS (
SELECT * ,FINALRNK = ROW_NUMBER() OVER (PARTITION BY CNAME ORDER BY DIST)
FROM
FRAME
)
SELECT * FROM FINAL WHERE FINALRNK <4
But the logic fails as more records are added. Is there a way to clean up this SQL and get results for an arbitrary number of combinations?
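For reference, both expected outputs above are consistent with one possible reading of the rule: walk the pairs in ascending DIST order and keep a pair only if its CNAME has not been claimed yet and its group still has fewer than N picks. The sketch below spells that reading out as a simple loop. It is an assumption about the intended rule, not a confirmed specification; `@TopN` and `#PICKED` are hypothetical names introduced here, the tie-break on GNAME and CNAME is also an assumption, and the loop is row-by-row rather than set-based, so it is meant to pin down the logic rather than to perform well on large tables.

```sql
-- Sketch only: assumes #DISTANCE exists and is populated as in the question.
-- @TopN and #PICKED are hypothetical names, not part of the original code.
DECLARE @TopN INT = 3;

DROP TABLE IF EXISTS #PICKED;
CREATE TABLE #PICKED
(
      GNAME VARCHAR(3)
    , CNAME VARCHAR(3)
    , DIST  NUMERIC(5,3)
);

WHILE 1 = 1
BEGIN
    -- Pick the closest remaining pair whose CNAME is still unclaimed
    -- and whose group has fewer than @TopN rows picked so far.
    INSERT INTO #PICKED (GNAME, CNAME, DIST)
    SELECT TOP (1) D.GNAME, D.CNAME, D.DIST
    FROM #DISTANCE AS D
    WHERE NOT EXISTS (SELECT 1 FROM #PICKED AS P WHERE P.CNAME = D.CNAME)
      AND (SELECT COUNT(*) FROM #PICKED AS P WHERE P.GNAME = D.GNAME) < @TopN
    ORDER BY D.DIST, D.GNAME, D.CNAME;  -- assumed tie-break on equal DIST

    -- Stop once no eligible pair is left.
    IF @@ROWCOUNT = 0 BREAK;
END;

SELECT GNAME, CNAME, DIST
FROM #PICKED
ORDER BY GNAME, DIST;
```

Traced by hand against the second data set with `@TopN = 3`, the loop keeps E1|C1, E1|C2, E1|C3, E3|C6, E2|C4 and E3|C5, which are the six rows in the expected output. With `@TopN = 1` and the first data set it keeps E1|C1 and E2|C2.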
【Comments】:
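- You seem to want the row with the smallest dist for each cname. If that is the case, your first example is wrong: for C2 you would want ('E1','C2',2) in the result, not ('E2','C2',2.5), since 2 < 2.5.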
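Taken literally, the reading in this comment (the single closest row per cname) could be sketched as below. This only illustrates the commenter's interpretation and assumes the `#DISTANCE` table from the question; the alias `R` and the column `RN` are hypothetical names.

```sql
-- Sketch of the "smallest dist per cname" reading; assumes #DISTANCE from the question.
SELECT R.GNAME, R.CNAME, R.DIST
FROM (
    SELECT GNAME, CNAME, DIST
         , RN = ROW_NUMBER() OVER (PARTITION BY CNAME ORDER BY DIST ASC)
    FROM #DISTANCE
) AS R
WHERE R.RN = 1          -- keep only the closest row for each CNAME
ORDER BY R.CNAME;
```

On the first data set this keeps E1|C1 and E1|C2 and returns nothing for E2, which differs from the output the question asks for.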
Tags: sql sql-server tsql