【Question Title】: How to format/scrub source data (SSNs and phone numbers) for import into SQL Server
【Posted】: 2020-06-19 17:40:28
【Question】:

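Since my database stores SSNs and phone numbers as formatted values, I need a way to take the incoming data, whatever format it arrives in, and format it to match how my database stores these fields. The data I am migrating is imported ad hoc by end users, through the application, from external applications into a staging table. It is then restructured and manipulated for insertion into my client database.

I am having trouble processing the data without regular expressions. How can this kind of DML task be done in SQL Server? The desired output for my two data types is shown below. I am struggling to convert my source data into these output formats.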

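Required output formats for insertion

SSN: 123-45-6789

SSN: if 8 characters, pad with a leading zero

SSN: if fewer than 8 characters, pad with question marks "?", e.g. ???-??-1234 (don't ask)

Phone: 123-456-7890

Sample code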

WITH fakeCSVData AS
(
    SELECT '111223333' AS SSN, '(444) 4444444'  AS Phone UNION ALL
    SELECT '211222121' AS SSN, '101 232-4545'   AS Phone UNION ALL
    SELECT '12334556'  AS SSN, '(191) 330-4345' AS Phone UNION ALL
    SELECT '41531'     AS SSN, '(039) 084-8309' AS Phone UNION ALL
    SELECT '220981278' AS SSN, '(298) 372-9234' AS Phone UNION ALL
    SELECT '222013450' AS SSN, '(78) 909-7790'  AS Phone UNION ALL
    SELECT '123456789' AS SSN, '(717)_272-7277' AS Phone UNION ALL
    SELECT '113344556' AS SSN, '210-973-2123'   AS Phone UNION ALL
    SELECT '808768252' AS SSN, '(219) 362-1895' AS Phone UNION ALL
    SELECT '3456'      AS SSN, '895 536-5356'   AS Phone UNION ALL
    SELECT '204874556' AS SSN, '(909) 544-9124' AS Phone UNION ALL
    SELECT '80832934'  AS SSN, '0271932132'     AS Phone


)

SELECT 


    CASE WHEN LTRIM(RTRIM(csv.ssn))           LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]' THEN LTRIM(RTRIM(csv.ssn))
            WHEN LTRIM(RTRIM(csv.ssn))          LIKE '[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'      THEN RIGHT( REPLICATE('0', 1) + LTRIM(RTRIM( csv.ssn )), 11)
            WHEN LTRIM(RTRIM(csv.ssn))          LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'   THEN SUBSTRING(LTRIM(RTRIM(csv.ssn)),1,3) + '-' + SUBSTRING(LTRIM(RTRIM(csv.ssn)),4,2) + '-' + SUBSTRING(LTRIM(RTRIM(csv.ssn)),6,4)
            WHEN LTRIM(RTRIM(csv.ssn))          LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'        THEN RIGHT( REPLICATE('0', 1) + LTRIM(RTRIM( SUBSTRING(LTRIM(RTRIM(csv.ssn)),1,2) + '-' + SUBSTRING(LTRIM(RTRIM(csv.ssn)),3,2) + '-' + SUBSTRING(LTRIM(RTRIM(csv.ssn)),5,4) )), 11)
            WHEN RIGHT(LTRIM(RTRIM(csv.ssn)),4) LIKE '%[0-9][0-9][0-9][0-9]'                           THEN '???-??-' + RIGHT(LTRIM(RTRIM(csv.ssn)),4)
      END AS SocSecNo
    , NullIf(LEFT( REPLACE( LTRIM(RTRIM( REPLACE(REPLACE(csv.Phone, ')', ''), '(', '') )), ' ' , '-') , 12), '') AS Phone


FROM fakeCSVData csv

Current output of the sample code

SocSecNo    | Phone
--------------------------
111-22-3333 | 444-4444444
211-22-2121 | 101-232-4545
012-33-4556 | 191-330-4345
???-??-1531 | 039-084-8309
220-98-1278 | 298-372-9234
222-01-3450 | 78-909-7790
123-45-6789 | 717_272-7277
???-??-4556 | 210-973-2123
808-76-8252 | 219-362-1895
???-??-3456 | 895-536-5356
204-87-4556 | 909-544-9124
080-83-2934 | 0271932132

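I keep thinking that if I just had a simple way to first strip all non-numeric characters from the incoming source data, I could then format the string however I need. But I am not finding any native SQL Server function that does this.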
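The "strip first, then format" idea described above can be sketched with built-in functions only. This is a minimal illustration, not a definitive solution: it assumes SQL Server 2017 or later (for TRANSLATE), the punctuation list is a guess at what the feed contains, and the names src, d, SsnDigits and PhoneDigits are made up for the example.

```sql
-- Sketch only. TRANSLATE assumes SQL Server 2017 or later.
-- The punctuation list ' ()-_.' is an assumption about the incoming data;
-- any other non-digit character survives and the row then yields NULL.
WITH src AS
(
    SELECT '12334556' AS SSN, '(717)_272-7277' AS Phone UNION ALL
    SELECT '3456'     AS SSN, '0271932132'     AS Phone
)
SELECT
    CASE
        -- Anything still containing a non-digit is left as NULL for review.
        WHEN d.SsnDigits LIKE '%[^0-9]%'       THEN NULL
        WHEN LEN(d.SsnDigits) = 9              THEN STUFF(STUFF(d.SsnDigits, 6, 0, '-'), 4, 0, '-')
        -- 8 digits: restore the leading zero, then insert the dashes.
        WHEN LEN(d.SsnDigits) = 8              THEN STUFF(STUFF('0' + d.SsnDigits, 6, 0, '-'), 4, 0, '-')
        -- 4 to 7 digits: keep only the last four behind a '?' mask.
        WHEN LEN(d.SsnDigits) BETWEEN 4 AND 7  THEN '???-??-' + RIGHT(d.SsnDigits, 4)
    END AS SocSecNo,
    CASE
        WHEN d.PhoneDigits LIKE '%[^0-9]%'     THEN NULL
        WHEN LEN(d.PhoneDigits) = 10           THEN STUFF(STUFF(d.PhoneDigits, 7, 0, '-'), 4, 0, '-')
    END AS Phone
FROM src
CROSS APPLY (VALUES (
    -- Map each listed character to '~', then remove every '~'.
    REPLACE(TRANSLATE(src.SSN,   ' ()-_.', '~~~~~~'), '~', ''),
    REPLACE(TRANSLATE(src.Phone, ' ()-_.', '~~~~~~'), '~', '')
)) AS d(SsnDigits, PhoneDigits);
```

For the two sample rows this gives 012-33-4556 | 717-272-7277 and ???-??-3456 | 027-193-2132. Values that fit none of the branches (for example a 9-digit phone number) come back as NULL rather than being guessed at, which makes them easy to find in the staging table.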

【Comments】:

    Tags: sql-server data-migration dml


    【Solution 1】:

    It's a bit ugly, but perhaps this will help

    WITH fakeCSVData AS
    (
        SELECT '111223333' AS SSN, '(444) 4444444'  AS Phone UNION ALL
        SELECT '211222121' AS SSN, '101 232-4545'   AS Phone UNION ALL
        SELECT '12334556'  AS SSN, '(191) 330-4345' AS Phone UNION ALL
        SELECT '41531'     AS SSN, '(039) 084-8309' AS Phone UNION ALL
        SELECT '220981278' AS SSN, '(298) 372-9234' AS Phone UNION ALL
        SELECT '222013450' AS SSN, '(78) 909-7790'  AS Phone UNION ALL
        SELECT '123456789' AS SSN, '(717)_272-7277' AS Phone UNION ALL
        SELECT '113344556' AS SSN, '210-973-2123'   AS Phone UNION ALL
        SELECT '808768252' AS SSN, '(219) 362-1895' AS Phone UNION ALL
        SELECT '3456'      AS SSN, '895 536-5356'   AS Phone UNION ALL
        SELECT '204874556' AS SSN, '(909) 544-9124' AS Phone UNION ALL
        SELECT '80832934'  AS SSN, '0271932132'     AS Phone
    )
    
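    -- CHOOSE picks a format mask by the number of characters in the value.
    -- In a FORMAT mask, '0' is a digit placeholder; '?' and '-' are copied through as literals.
    Select NewSSN = format(try_convert(bigint,SSN),choose(len(SSN)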
                                                   ,'???-??-???0'
                                                   ,'???-??-??00'
                                                   ,'???-??-?000'
                                                   ,'???-??-0000'
                                                   ,'???-?0-0000'
                                                   ,'???-00-0000'
                                                   ,'??0-00-0000'
                                                   ,'000-00-0000'   -- 8 characters: pad with a leading zero, per the stated SSN rule
                                                   ,'000-00-0000') )
          ,NewPhn = format(try_convert(bigint,Phn),choose(len(Phn)
                                                   ,'???-???-???0'
                                                   ,'???-???-??00'
                                                   ,'???-???-?000'
                                                   ,'???-???-0000'
                                                   ,'???-??0-0000'
                                                   ,'???-?00-0000'
                                                   ,'???-000-0000'
                                                   ,'??0-000-0000'
                                                   ,'?00-000-0000'
                                                   ,'000-000-0000') )
     From fakeCSVData A
     Cross Apply ( values (  replace(
                             replace(
                             replace(
                             replace(
                             replace(Phone,' ','') 
                             ,'(','')
                             ,')','')
                             ,'-','')
                             ,'_','')
                          )
                 ) B(Phn)
    

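    Returns

    EDIT

    You may notice that the CROSS APPLY cleans up the Phone string. This may need some maintenance as new characters show up, or even a UDF that strips all non-numeric values.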
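A UDF of the kind mentioned above could look roughly like the following. It is only a sketch: dbo.StripNonDigits is a hypothetical name, the varchar(100) size is an arbitrary assumption, and CREATE OR ALTER assumes SQL Server 2016 SP1 or later.

```sql
-- Hypothetical helper: dbo.StripNonDigits is an illustrative name, not a built-in.
-- CREATE OR ALTER assumes SQL Server 2016 SP1 or later.
CREATE OR ALTER FUNCTION dbo.StripNonDigits (@input varchar(100))
RETURNS varchar(100)
AS
BEGIN
    -- Position of the first character that is not 0-9 (0 when none is left).
    DECLARE @pos int = PATINDEX('%[^0-9]%', @input);

    WHILE @pos > 0
    BEGIN
        -- Delete that one character, then look for the next one.
        SET @input = STUFF(@input, @pos, 1, '');
        SET @pos = PATINDEX('%[^0-9]%', @input);
    END;

    RETURN @input;
END;
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must start its own batch

SELECT dbo.StripNonDigits('(717)_272-7277') AS Phn;   -- 7172727277
```

With a helper like this, the nested replace(...) calls in the CROSS APPLY could be reduced to values ( dbo.StripNonDigits(Phone) ), and the same function could be applied to SSN. Scalar UDFs are evaluated once per row, so for large imports it is worth comparing its speed against the nested REPLACE version.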

    【Comments】:
