You can use one of four approaches:
- DISTINCT
- GROUP BY
- Subquery
- Common table expression (CTE) with ROW_NUMBER()
Consider the following example table with test data:
/** Create test table */
CREATE TEMPORARY TABLE dupes(word text, num int, id int);
/** Add test data with duplicates */
INSERT INTO dupes(word, num, id)
VALUES ('aaa', 100, 1)
,('bbb', 200, 2)
,('ccc', 300, 3)
,('bbb', 400, 4)
,('bbb', 200, 5) -- duplicate
,('ccc', 300, 6) -- duplicate
,('ddd', 400, 7)
,('bbb', 400, 8) -- duplicate
,('aaa', 100, 9) -- duplicate
,('ccc', 300, 10); -- duplicate
Option 1: SELECT DISTINCT
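This is the simplest and most direct approach, but also the most limited: it can only return the columns you deduplicate on (here word and num), not id or any other per-row value.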
SELECT DISTINCT word, num
FROM dupes
ORDER BY word, num;
/*
word|num|
----|---|
aaa |100|
bbb |200|
bbb |400|
ccc |300|
ddd |400|
*/
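A minimal sketch for trying this outside PostgreSQL, assuming Python's built-in sqlite3 module with an in-memory SQLite database standing in for the temporary table (ROWS and make_db are hypothetical helper names, not part of the original answer). It reproduces the DISTINCT result and also shows the limitation: once id is added to the select list, every row is distinct and nothing is removed.

```python
import sqlite3

# Test data from the dupes table above (assumption: SQLite stands in for PostgreSQL).
ROWS = [
    ("aaa", 100, 1), ("bbb", 200, 2), ("ccc", 300, 3), ("bbb", 400, 4),
    ("bbb", 200, 5), ("ccc", 300, 6), ("ddd", 400, 7), ("bbb", 400, 8),
    ("aaa", 100, 9), ("ccc", 300, 10),
]


def make_db():
    """Hypothetical helper: build an in-memory copy of the dupes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMPORARY TABLE dupes(word text, num int, id int)")
    conn.executemany("INSERT INTO dupes(word, num, id) VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_db()

# DISTINCT over the deduplication columns collapses the 10 rows to 5 pairs.
pairs = conn.execute(
    "SELECT DISTINCT word, num FROM dupes ORDER BY word, num"
).fetchall()

# Adding id to the select list defeats the deduplication: every
# (word, num, id) combination is unique, so all 10 rows come back.
with_id = conn.execute(
    "SELECT DISTINCT word, num, id FROM dupes ORDER BY word, num, id"
).fetchall()

print(pairs)
print(len(with_id))
```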
Option 2: GROUP BY
Grouping lets you add aggregate data, such as min(id), max(id), count(*), and so on:
SELECT word, num, min(id), max(id), count(*)
FROM dupes
GROUP BY word, num
ORDER BY word, num;
/*
word|num|min|max|count|
----|---|---|---|-----|
aaa |100| 1| 9| 2|
bbb |200| 2| 5| 2|
bbb |400| 4| 8| 2|
ccc |300| 3| 10| 3|
ddd |400| 7| 7| 1|
*/
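A minimal sketch, again assuming Python's sqlite3 module and an in-memory SQLite copy of the table (ROWS and make_db are hypothetical helper names). Besides reproducing the summary above, it adds a HAVING count(*) > 1 filter, a standard companion to GROUP BY that lists only the groups that actually contain duplicates, and collects min(id) as the id to keep from each group.

```python
import sqlite3

# Test data from the dupes table above (assumption: SQLite stands in for PostgreSQL).
ROWS = [
    ("aaa", 100, 1), ("bbb", 200, 2), ("ccc", 300, 3), ("bbb", 400, 4),
    ("bbb", 200, 5), ("ccc", 300, 6), ("ddd", 400, 7), ("bbb", 400, 8),
    ("aaa", 100, 9), ("ccc", 300, 10),
]


def make_db():
    """Hypothetical helper: build an in-memory copy of the dupes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMPORARY TABLE dupes(word text, num int, id int)")
    conn.executemany("INSERT INTO dupes(word, num, id) VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_db()

# Same query as in the answer: one summary row per (word, num) group.
summary = conn.execute("""
    SELECT word, num, min(id), max(id), count(*)
    FROM dupes
    GROUP BY word, num
    ORDER BY word, num
""").fetchall()

# HAVING filters on the aggregate, keeping only groups with real duplicates.
dup_groups = conn.execute("""
    SELECT word, num, count(*)
    FROM dupes
    GROUP BY word, num
    HAVING count(*) > 1
    ORDER BY word, num
""").fetchall()

# min(id) per group is the row each group would keep.
keep_ids = sorted(row[2] for row in summary)

print(summary)
print(dup_groups)
print(keep_ids)
```

Picking min(id) here keeps the same rows that the subquery and ROW_NUMBER() options keep; max(id) would keep the newest row instead.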
Option 3: Subquery
With a subquery, you first identify the duplicate rows to ignore, then filter them out in the outer query with a WHERE id NOT IN (subquery) construct:
/** Find the higher id values of duplicates, distinct only added for clarity */
SELECT distinct d2.id
FROM dupes d1
INNER JOIN dupes d2 ON d2.word=d1.word AND d2.num=d1.num
WHERE d2.id > d1.id;
/*
id|
--|
5|
6|
8|
9|
10|
*/
/** Use the previous query in a subquery to exclude the duplicates with higher id values */
SELECT *
FROM dupes
WHERE id NOT IN (
SELECT d2.id
FROM dupes d1
INNER JOIN dupes d2 ON d2.word=d1.word AND d2.num=d1.num
WHERE d2.id > d1.id
)
ORDER BY word, num;
/*
word|num|id|
----|---|--|
aaa |100| 1|
bbb |200| 2|
bbb |400| 4|
ccc |300| 3|
ddd |400| 7|
*/
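To check the two-step logic, the sketch below (same assumptions: Python's sqlite3 module, an in-memory SQLite copy of the table, hypothetical ROWS and make_db helpers) runs the inner query on its own and then inside NOT IN. The self-join marks every row that has an identical (word, num) row with a smaller id, so the only row left in each group is the one with the lowest id.

```python
import sqlite3

# Test data from the dupes table above (assumption: SQLite stands in for PostgreSQL).
ROWS = [
    ("aaa", 100, 1), ("bbb", 200, 2), ("ccc", 300, 3), ("bbb", 400, 4),
    ("bbb", 200, 5), ("ccc", 300, 6), ("ddd", 400, 7), ("bbb", 400, 8),
    ("aaa", 100, 9), ("ccc", 300, 10),
]


def make_db():
    """Hypothetical helper: build an in-memory copy of the dupes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMPORARY TABLE dupes(word text, num int, id int)")
    conn.executemany("INSERT INTO dupes(word, num, id) VALUES (?, ?, ?)", ROWS)
    return conn


conn = make_db()

# Step 1: ids of rows that duplicate an earlier (smaller-id) row.
dup_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT d2.id
    FROM dupes d1
    INNER JOIN dupes d2 ON d2.word = d1.word AND d2.num = d1.num
    WHERE d2.id > d1.id
    ORDER BY d2.id
""")]

# Step 2: keep every row whose id is not in that list.
kept = conn.execute("""
    SELECT *
    FROM dupes
    WHERE id NOT IN (
        SELECT d2.id
        FROM dupes d1
        INNER JOIN dupes d2 ON d2.word = d1.word AND d2.num = d1.num
        WHERE d2.id > d1.id
    )
    ORDER BY word, num
""").fetchall()

print(dup_ids)
print(kept)
```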
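Option 4: Common table expression with ROW_NUMBER()
In a common table expression (CTE), compute ROW_NUMBER(), partitioned by the grouping columns and ordered so that the row you want to keep comes first in each partition. Then select only the rows where ROW_NUMBER() = 1: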
WITH cte AS (
SELECT *
,row_number() OVER(PARTITION BY word, num ORDER BY id) AS row_num
FROM dupes
)
SELECT word, num, id
FROM cte
WHERE row_num = 1
ORDER BY word, num;
/*
word|num|id|
----|---|--|
aaa |100| 1|
bbb |200| 2|
bbb |400| 4|
ccc |300| 3|
ddd |400| 7|
*/
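A minimal sketch of how the ORDER BY inside OVER() decides which row survives, assuming Python's sqlite3 module backed by SQLite 3.25 or newer (the first version with window functions) and hypothetical ROWS, make_db and keep_one helpers. With ORDER BY id the lowest id per group is kept, matching the output above; with ORDER BY id DESC the highest id is kept instead.

```python
import sqlite3

# Test data from the dupes table above (assumption: SQLite stands in for PostgreSQL).
ROWS = [
    ("aaa", 100, 1), ("bbb", 200, 2), ("ccc", 300, 3), ("bbb", 400, 4),
    ("bbb", 200, 5), ("ccc", 300, 6), ("ddd", 400, 7), ("bbb", 400, 8),
    ("aaa", 100, 9), ("ccc", 300, 10),
]


def make_db():
    """Hypothetical helper: build an in-memory copy of the dupes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMPORARY TABLE dupes(word text, num int, id int)")
    conn.executemany("INSERT INTO dupes(word, num, id) VALUES (?, ?, ?)", ROWS)
    return conn


def keep_one(conn, order):
    """Hypothetical helper: keep the first row per (word, num) under `order`.

    `order` is spliced into the SQL text, so pass only trusted literals
    such as "id" or "id DESC", never user input.
    """
    sql = f"""
        WITH cte AS (
            SELECT *,
                   row_number() OVER (PARTITION BY word, num ORDER BY {order}) AS row_num
            FROM dupes
        )
        SELECT word, num, id
        FROM cte
        WHERE row_num = 1
        ORDER BY word, num
    """
    return conn.execute(sql).fetchall()


conn = make_db()
first = keep_one(conn, "id")       # lowest id in each group
last = keep_one(conn, "id DESC")   # highest id in each group

print(first)
print(last)
```

Only the window's ORDER BY changes between the two calls; the partitioning and the row_num = 1 filter stay the same, which is what makes this option more flexible than DISTINCT.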