How to remove duplicates in a table? Answers

【Question title】: How to remove duplicates in a table?
【Posted】: 2014-03-04 20:06:41
【Question】:

CREATE TABLE join_table (
  id1 integer,
  id2 integer
);

I want to create a UNIQUE constraint on (id1, id2), but I see some bad data, for example:

id1   | id2
------------
1     | 1
1     | 1
1     | 2

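So the record (1,1) is clearly duplicated and would violate the unique constraint. How do I write an SQL query that removes all duplicated records from the table?

Note: I want to delete only one of the duplicated records (keeping the other) so that I can create the unique constraint.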

【Question comments】:

Possible duplicate of Remove duplicate from a table

Tags: postgresql

【Solution 1】:

This will keep one of the duplicates:

delete from join_table
where ctid not in (select min(ctid)
                   from join_table
                   group by id1, id2);

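Your table has no unique identifier that could be used to "pick one survivor". That is where Postgres' ctid comes in handy, because it is an internal identifier that is unique for each row. Note that you should never rely on ctid for more than a single statement: it describes the physical location of a row version and can change, for example when the row is updated or the table is rewritten by VACUUM FULL. It is not a universally unique value, but for the runtime of a single statement it is fine.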

SQLFiddle example: http://sqlfiddle.com/#!15/dabfc/1
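
Once the duplicates are gone, the goal stated in the question is to create the unique constraint. A minimal sketch of that follow-up step is shown here; the constraint name join_table_id1_id2_uniq is a hypothetical choice, not something from the question.

```sql
-- Sanity check: this should return no rows once the duplicates are removed.
SELECT id1, id2, count(*) AS copies
FROM join_table
GROUP BY id1, id2
HAVING count(*) > 1;

-- Create the unique constraint the question asks for.
-- The constraint name is hypothetical; pick any name that fits your schema.
ALTER TABLE join_table
  ADD CONSTRAINT join_table_id1_id2_uniq UNIQUE (id1, id2);
```

If the check query still returns rows, the ALTER TABLE statement will fail with a unique violation instead of silently dropping data.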

If you want to get rid of every row that is duplicated (that is, delete all copies instead of keeping one):

delete from join_table
where (id1, id2) in (select id1, id2
                     from join_table
                     group by id1, id2
                     having count(*) > 1);

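Neither of these solutions will be fast on a large table. If you need to delete a large number of rows from a large table, creating a new table without the duplicates, as jjanes shows, will be much faster.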
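
To make the difference between the two statements concrete, here is a small sketch of the second one run against the sample data from the question. It uses a hypothetical scratch table, join_table_demo, so that it can be tried without touching the real table.

```sql
-- Hypothetical scratch table holding the sample rows from the question.
CREATE TABLE join_table_demo (
  id1 integer,
  id2 integer
);

INSERT INTO join_table_demo (id1, id2)
VALUES (1, 1), (1, 1), (1, 2);

-- Delete every row whose (id1, id2) pair occurs more than once.
DELETE FROM join_table_demo
WHERE (id1, id2) IN (SELECT id1, id2
                     FROM join_table_demo
                     GROUP BY id1, id2
                     HAVING count(*) > 1);

-- Both (1, 1) rows are gone; only the row (1, 2) is left.
SELECT id1, id2 FROM join_table_demo;

DROP TABLE join_table_demo;
```

With the first statement (the one based on ctid), one of the (1, 1) rows would survive alongside (1, 2), which is what the question asks for.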

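【Comments】:

【Solution 2】:

Without a primary key, that will be hard to do.

Is the existing table referenced by FK constraints or the like? If not, just recreate it.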

begin;
create table new_table as select distinct * from join_table;
drop table join_table;
alter table new_table rename TO join_table;
commit;
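
Note that create table ... as select copies only the data: indexes, constraints, defaults and privileges defined on the original table are not carried over to the new one. If those matter, one possible variant (a sketch, not the answer's own method) is to keep the original table and refill it from a temporary copy. The temporary table name join_table_dedup is hypothetical, and, like the statement above, this assumes no foreign keys reference join_table.

```sql
BEGIN;

-- Stage the distinct rows in a temporary table (hypothetical name).
CREATE TEMP TABLE join_table_dedup AS
SELECT DISTINCT * FROM join_table;

-- Empty the original table; TRUNCATE is transactional in Postgres,
-- but it takes an exclusive lock on the table until COMMIT.
TRUNCATE join_table;

-- Put the de-duplicated rows back into the original table.
INSERT INTO join_table
SELECT * FROM join_table_dedup;

DROP TABLE join_table_dedup;

COMMIT;
```

Because the original table is never dropped, its definition and grants stay in place, and the unique constraint on (id1, id2) can be created afterwards.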

【Comments】: