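【Posted】: 2021-12-24 13:54:28
【Question】:
I have a list of customers, and a bunch of them are duplicates ('Acme Inc', 'Acme, Inc', 'Acme Inc.', 'Acme, Inc.'), each with a different ID. However, every ID also has multiple addresses. For example: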
+-------+---------------+-------------------+-----------+---+-------+
|ID |Name |Address |City |St |Zip |
+-------+---------------+-------------------+-----------+---+-------+
|001 |Acme Inc |123 Address St |Columbus |OH |43081 |
|001 |Acme Inc |321 Street St |Columbus |OH |43081 |
|001 |Acme Inc |456 Blanket Blvd |Columbus |OH |43081 |
|002 |Acme, Inc |123 Babel St |Columbus |OH |43081 |
|002 |Acme, Inc |321 Acorn Rd |Columbus |OH |43081 |
|002 |Acme, Inc |456 Lancer Blvd |Columbus |OH |43081 |
|003 |Baker |456 Blanket Blvd |Columbus |OH |43081 |
|004 |Peterson |456 Blanket Blvd |Columbus |OH |43081 |
|005 |Plumbers Inc |123 Address St |Columbus |OH |43081 |
|006 |Plumbers, LLC |321 Street St |Columbus |OH |43081 |
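|007 |Acme, Inc. |123 Address St |Columbus |OH |43081 |
+-------+---------------+-------------------+-----------+---+-------+
I have a function that normalizes the names, so the first six rows (and the last one, ID 007) all become 'Acme', and the two Plumbers rows both become 'Plumbers'.
What I want is a list of the duplicated IDs and names. The goal is to report records that have distinct IDs but the same normalized name: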
+-------+---------------+
|ID |Name |
+-------+---------------+
|001 |Acme Inc |
|002 |Acme, Inc |
|007 |Acme, Inc. |
|005 |Plumbers Inc |
|006 |Plumbers, LLC |
+-------+---------------+
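One way to get from the sample data to that report can be sketched as follows. This is only a minimal illustration, not a confirmed fix: it runs against an in-memory SQLite database from Python so it can be tried anywhere, the `normalize` function is a hypothetical stand-in for `fn_strNorm` (whose real rules are not shown in the post), and the table is cut down to the three columns that matter. The idea is to first collapse the address rows to one row per (ID, name), then keep only the rows whose normalized name is shared by more than one distinct ID.

```python
import re
import sqlite3


def normalize(name):
    # Hypothetical stand-in for fn_strNorm: lowercase, strip punctuation,
    # and drop the company suffixes seen in the sample data.
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in {"inc", "llc"}]
    return " ".join(words)


conn = sqlite3.connect(":memory:")
conn.create_function("norm", 1, normalize)  # expose normalize() to SQL as norm()
conn.execute(
    "CREATE TABLE Processed_Vendors (VendorID TEXT, Name TEXT, Address TEXT)"
)
rows = [
    ("001", "Acme Inc", "123 Address St"),
    ("001", "Acme Inc", "321 Street St"),
    ("001", "Acme Inc", "456 Blanket Blvd"),
    ("002", "Acme, Inc", "123 Babel St"),
    ("002", "Acme, Inc", "321 Acorn Rd"),
    ("002", "Acme, Inc", "456 Lancer Blvd"),
    ("003", "Baker", "456 Blanket Blvd"),
    ("004", "Peterson", "456 Blanket Blvd"),
    ("005", "Plumbers Inc", "123 Address St"),
    ("006", "Plumbers, LLC", "321 Street St"),
    ("007", "Acme, Inc.", "123 Address St"),
]
conn.executemany("INSERT INTO Processed_Vendors VALUES (?, ?, ?)", rows)

query = """
WITH normalized AS (
    -- One row per (ID, name), regardless of how many addresses the ID has.
    SELECT DISTINCT VendorID, Name, norm(Name) AS NewName
    FROM Processed_Vendors
    WHERE VendorID <> '' AND Name <> ''  -- also filters out NULLs
)
SELECT n.VendorID, n.Name
FROM normalized AS n
WHERE (
    -- Keep a row only if its normalized name belongs to more than one ID.
    SELECT COUNT(DISTINCT d.VendorID)
    FROM normalized AS d
    WHERE d.NewName = n.NewName
) > 1
ORDER BY n.NewName, n.VendorID
"""
duplicates = conn.execute(query).fetchall()
for vendor_id, name in duplicates:
    print(vendor_id, name)
```

With the sample rows this prints IDs 001, 002 and 007 followed by 005 and 006, each once. If the database turns out to be SQL Server (the bracketed identifiers suggest it, but the post does not say), the `norm(Name)` call would presumably be replaced by applying the table-valued `fn_strNorm` with `CROSS APPLY`; that part is an assumption and untested here.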
I tried:
SELECT
DISTINCT [Name],
( SELECT strNew FROM [fn_strNorm](2, [Name]) ) AS [NewName]
FROM [Processed_Vendors]
WHERE
[VendorID] <> '' AND
[VendorID] IS NOT NULL AND
[Name]<> '' AND
[Name] IS NOT NULL
GROUP BY [NewName]
HAVING COUNT(*) > 1
ORDER BY [NewName]
I also tried putting them into a [dupe_names] table and joining the two, but I keep getting multiple records for the same ID:
SELECT
pv.[VendorID],
pv.[Name]
FROM [dupe_names] n
LEFT JOIN [Processed_Vendors] pv
ON pv.[Name] = n.[Name]
ORDER BY pv.[Name]
SELECT
'Name Match' AS [Reason],
pv.[VendorID],
pv.[Name]
FROM [dupe_names] n
LEFT JOIN [Processed_Vendors] pv
ON pv.[Name] = n.[Name]
AND ( SELECT strNew FROM [dbo].[fn_strNorm](2, pv.[Name]) ) = n.[NewName]
ORDER BY pv.[Name]
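A likely reason the join returns the same ID several times is that [Processed_Vendors] holds one row per address, so every address row matches the name and comes back. The sketch below reproduces that on a cut-down copy of the data in an in-memory SQLite database (the table contents and the `NewName` values are made up for illustration) and shows that `SELECT DISTINCT` on the ID and name collapses the result to one row per vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Processed_Vendors (VendorID TEXT, Name TEXT, Address TEXT);
CREATE TABLE dupe_names (Name TEXT, NewName TEXT);
INSERT INTO Processed_Vendors VALUES
    ('001', 'Acme Inc', '123 Address St'),
    ('001', 'Acme Inc', '321 Street St'),
    ('002', 'Acme, Inc', '123 Babel St');
INSERT INTO dupe_names VALUES
    ('Acme Inc', 'Acme'),
    ('Acme, Inc', 'Acme');
""")

# Same join shape as in the post; only the DISTINCT keyword is toggled.
join_sql = """
SELECT {distinct} pv.VendorID, pv.Name
FROM dupe_names AS n
LEFT JOIN Processed_Vendors AS pv ON pv.Name = n.Name
ORDER BY pv.VendorID
"""
per_address = conn.execute(join_sql.format(distinct="")).fetchall()
per_vendor = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(per_address)  # ID 001 appears once per address
print(per_vendor)   # ID 001 appears once
```

Here `per_address` has three rows (two of them for ID 001) while `per_vendor` has two. `DISTINCT` only helps because the address columns are left out of the select list; adding any of them would bring the repeats back.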
I think I'm overthinking this, or my migraine is affecting my thinking. Either way, I appreciate your help.
【Comments】:
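- Which DBMS product are you using? "SQL" is just a query language used by all relational databases, not the name of a specific database product (and your code is non-standard SQL). Please add a tag for the database product you are using. See: Why should I tag my DBMS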
Tags: sql duplicates distinct