Identifying duplicate combinations for n-field data table on Oracle (answers)

【Question title】: Identifying duplicate combinations for n-field data table on Oracle
【Posted】: 2011-05-09 13:54:40
【Question description】:

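I have managed to write an SQL query that updates duplicate rows containing the same combination to null values for a 2-field table. However, I am stuck on tables with more than 2 fields.

My 2-field solution is:

Insert test data for the combinations table: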

create table combinations as 
select 1 col1, 2 col2 from dual --row1 
union all 
select 2, 1 from dual --row2
union all 
select 1, 3 from dual --row3
union all 
select 1, 4 from dual --row4
;

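In the combinations table, row1 and row2 are duplicates, because the order of the elements does not matter.

Update duplicate combinations to null for 2 fields (this updates row2 to null):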
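To see why row1 and row2 count as the same combination, it can help to look at the order-independent key each row would get. This is a minimal sketch, assuming the combinations table created above; the aliases lo and hi are illustrative names, not part of the original table.

```sql
-- Show each row next to its order-independent key (lo, hi).
-- lo and hi are illustrative aliases introduced for this sketch.
select col1,
       col2,
       least(col1, col2)    as lo,
       greatest(col1, col2) as hi
from combinations;
```

Rows (1, 2) and (2, 1) both get lo = 1 and hi = 2, so any grouping or partitioning on (lo, hi) treats them as one combination.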

update combinations 
set col1=null, col2=null 
where rowid IN(
select x.rid from (
    select 
        rowid rid, 
        col1, 
        col2, 
        row_number() over (partition by least(col1,col2), greatest(col1,col2)
                               order by rownum) duplicate_row 
    from combinations) x 
where duplicate_row > 1);

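My code above relies on the least(,) and greatest(,) functions, which is why it works so neatly for two columns. Any ideas for adapting this code to a 3-field table?

Insert test data for the combinations2 table (3 fields):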

create table combinations2 as
select 1 col1, 2 col2, 3 col3 from dual --row1
union all
select 2, 1, 3 from dual --row2
union all
select 1, 3, 2 from dual --row3
;

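In the 3-field combinations2 table, row1, row2 and row3 are all the same combination. My goal is to update row2 and row3 to null.

【Question comments】:

This looks like the same question as: stackoverflow.com/questions/5924118/…

Tags: sql oracle combinations

【Solution 1】: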
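One possible way to carry the least/greatest idea over to exactly three columns is to derive the middle value as the sum minus the smallest and the largest value. This is only a sketch, assuming all three columns are non-null numbers; the aliases lo, mid and hi are illustrative, and the trick does not extend cleanly beyond three fields.

```sql
-- Order-independent key for three numeric, non-null columns.
-- mid is what remains of the sum once the smallest and largest values are removed.
select col1,
       col2,
       col3,
       least(col1, col2, col3)        as lo,
       col1 + col2 + col3
         - least(col1, col2, col3)
         - greatest(col1, col2, col3) as mid,
       greatest(col1, col2, col3)     as hi
from combinations2;
```

For the three test rows this gives lo = 1, mid = 2, hi = 3 in every case, which confirms that they represent a single combination.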

update combinations2
set col1 = NULL
  , col2 = NULL
  , col3 = NULL
where rowid in (
            select r 
            from
                (
                -- STEP 4
                select r, row_number() over(partition by colls order by colls) duplicate_row
                from
                    (
                    -- STEP 3
                    select r, c1 || '_' || c2 || '_' || c3 colls
                    from
                        (
                        -- STEP 2
                        select r
                              , max(case when rn = 1 then val else null end) c1 
                              , max(case when rn = 2 then val else null end) c2
                              , max(case when rn = 3 then val else null end) c3
                        from 
                            (
                            -- STEP 1
                            select r
                                  , val
                                  , row_number() over(partition by r order by val) rn
                            from
                                (
                                  select rowid as r, col1 as val
                                  from combinations2
                                union all
                                  select rowid, col2
                                  from combinations2
                                union all
                                  select rowid, col3
                                  from combinations2
                                )
                            )
                        group by r
                        )
                    )
                )
            where duplicate_row > 1
            )
;

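STEP 1: sort the values within each row
STEP 2: rebuild each row from its sorted values
STEP 3: concatenate the columns into a string
STEP 4: find the duplicates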
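Steps 1 to 3 can be previewed without changing any data. The following is a sketch rather than the answer's own method: it swaps the row_number/max(case) pivot for LISTAGG, which assumes Oracle 11g Release 2 or later, and reuses the answer's aliases r, val and colls.

```sql
-- Preview the sorted key for every row; nothing is updated.
-- Assumption: Oracle 11g Release 2 or later, where LISTAGG is available.
select r,
       listagg(val, '_') within group (order by val) as colls
from (
      -- Unpivot the three columns into (rowid, value) pairs.
      select rowid as r, col1 as val from combinations2
      union all
      select rowid, col2 from combinations2
      union all
      select rowid, col3 from combinations2
     )
group by r;
```

Rows that share the same colls string are the ones STEP 4 treats as duplicates. Because the key is built by aggregation, adding a fourth column would only need one more union all branch.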

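【Comments】:

Nice, but it needs a small modification: if you add more values to the table, for example 1,4,2 and 4,2,1, it no longer works.
In STEP 4 it should be row_number() over(partition by colls ORDER BY colls)
Thanks for the report. I somehow lost that while formatting. I have edited my answer, and it should work correctly now.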
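The case raised in the comments can be checked directly. This is a minimal sketch, assuming the combinations2 table from the question: it adds the two rows mentioned above and, once the update from Solution 1 has been rerun, counts how many rows still hold values. The aliases total_rows and rows_kept are illustrative names.

```sql
-- Add the extra rows mentioned in the comments.
insert into combinations2 values (1, 4, 2);
insert into combinations2 values (4, 2, 1);

-- Rerun the update statement from Solution 1 at this point, then:

-- count(*) counts every row; count(col1) skips rows whose col1 is null.
select count(*)    as total_rows,
       count(col1) as rows_kept
from combinations2;
```

With the three original test rows plus these two, one row per distinct combination should survive, so rows_kept would be expected to be 2 out of 5 total rows.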