【问题标题】:Merging records by possible match found按找到的可能匹配合并记录
【发布时间】:2020-07-18 03:53:18
【问题描述】:

我遇到了一个问题,我必须更正一些历史数据。它有大量的数据。为了更正这些历史数据,我需要通过找到的可能匹配将它们合并在一起。让我知道这是否与其他任务重复。

这是表结构:

CREATE TABLE Contacts
(
    Id INT PRIMARY KEY, 
    FirstName VARCHAR(50), 
    LastName VARCHAR(50), 
    Email VARCHAR(50), 
    Mobile VARCHAR(50),
    Notes VARCHAR(MAX),
)

合并逻辑如下:

 --When all 4 fields(firstName, lastName, Email, Mobile) are matching for more then one contact, merge them together
 --when one record has all 4 fields, another records has only 3 matching and 4th one as null, merge them,
 --when one record has all 4 fields, another records has only 2 matching and remaining two as null, merge them,
 --when one record has all 4 fields, another records has only 1 matching and remaining three as null, merge them,


 --when one record has 3 fields and 4th field is NULL, another record has exacly same matching records, merge them,
 --when one record has 3 fields and 4th field is NULL, another records has only 1 matching and remaining three as null, merge them,
 --when one record has 3 fields and 4th field is NULL, another records has only 2 matching and remaining two as null, merge them,
 --when one record has 3 fields and 4th field is NULL, another records has only 1 matching and remaining three as null, merge them,

 --when one record has 2 fields and 2 fields as NULL, another record has exacly same matching records, merge them,
 --when one record has 2 fields and 2 fields as NULL, another records has only 1 matching  field and remaining three as null, merge them,

 --when one record has 1 fields and 3 fields as NULL, another record has exacly same matching, merge them,

当我说将它们合并在一起时,这意味着将两个项目合并为一个并删除剩余的一个。我试图通过联系人列表上的光标来做这些事情,但这并没有帮助我完成所有这些组合。

我在这里也找不到任何这样的帖子,我可以从那里获得任何线索。编写查询以执行此操作的任何线索都会有所帮助。

【问题讨论】:

  • 你可以用一个很长而且很复杂的where声明来做到这一点
  • 样本数据和预期结果将有助于说明您的规则。

标签: sql sql-server sql-server-2008 sql-server-2012 ssms


【解决方案1】:

CURSOR 有什么性能问题,考虑到很多情况,您可以使用 CURSOR 尝试以下选项并检查您的要求是否已满 -

DEMO HERE

DECLARE 
    @FirstName VARCHAR(MAX),
    @LastName VARCHAR(MAX),
    @Email VARCHAR(MAX),
    @Mobile  VARCHAR(MAX),

    @FirstName_prev VARCHAR(MAX),
    @LastName_prev VARCHAR(MAX),
    @Email_prev VARCHAR(MAX),
    @Mobile_prev  VARCHAR(MAX),

    @loop_start  INT = 0;

DECLARE @tmp TABLE(
    FirstName VARCHAR(MAX),
    LastName VARCHAR(MAX),
    Email VARCHAR(MAX),
    Mobile  VARCHAR(MAX)
);

DECLARE cursor_Contacts CURSOR
FOR SELECT FirstName,LastName,Email,Mobile
    FROM Contacts
    ORDER BY 
    ISNULL(FirstName,'ZZZZZZZZZZZZZZZ')
    ,ISNULL(LastName,'ZZZZZZZZZZZZZZZ')
    ,ISNULL(Email,'ZZZZZZZZZZZZZZZ')
    ,ISNULL(Mobile,'ZZZZZZZZZZZZZZZ');

OPEN cursor_Contacts;

FETCH NEXT FROM cursor_Contacts INTO 
     @FirstName,@LastName,@Email,@Mobile;

WHILE @@FETCH_STATUS = 0
    BEGIN

        IF @loop_start = 0

        BEGIN

            INSERT INTO @tmp(FirstName,LastName,Email,Mobile)
            VALUES (@FirstName,@LastName,@Email,@Mobile)

            SET @loop_start = 1

        END

        ELSE
        BEGIN

            IF
            (@FirstName_prev = @FirstName OR @FirstName IS NULL) AND
            (@LastName_prev = @LastName OR @LastName IS NULL) AND
            (@Email_prev = @Email OR @Email IS NULL) AND
            (@Mobile_prev = @Mobile OR @Mobile IS NULL)

            BEGIN
                SET @loop_start = 1
            END

            ELSE
            BEGIN
                SET @loop_start = 2

                INSERT INTO @tmp(FirstName,LastName,Email,Mobile)
                VALUES (@FirstName,@LastName,@Email,@Mobile)
            END
        END;


        SET @FirstName_prev = @FirstName
        SET @LastName_prev= @LastName
        SET @Email_prev = @Email
        SET @Mobile_prev= @Mobile;


        FETCH NEXT FROM cursor_Contacts INTO 
            @FirstName,@LastName,@Email,@Mobile;
    END;

CLOSE cursor_Contacts;

SELECT * FROM @tmp;

DEALLOCATE cursor_Contacts;

【讨论】:

  • 谢谢@mkRabbani。我按照光标方法工作。性能问题没问题,因为我只需要这样做一次。
  • 不客气@error_handler。很高兴看到它有帮助:)
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2019-08-10
  • 2016-03-30
  • 1970-01-01
  • 1970-01-01
  • 2023-02-22
  • 2016-10-11
相关资源
最近更新 更多