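【Question title】: Data extraction with multiple delimiters?
【Posted】: 2015-11-24 14:27:49
【Question description】:

I have a legacy data source column that is delimited by both semicolons and commas. Within each entry, semicolons separate three fields: the last name, the first and middle names (or initials), and the type of individual. A comma indicates that a new entry has started. Here is a sample of the data.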

+-------+---------------------------------------------------------------------------------------------------------------------+
|  ID   | SOURCE                                                                                                              |
+-------+---------------------------------------------------------------------------------------------------------------------+
| 62963 | RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR        |
| 62964 | HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED                     |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED                                                                   |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED                                                                   |
| 62966 | KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED                                                                       |
| 62967 | CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED                                                                     |
| 62968 | LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED                                                                    |
| 62969 | GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED |
| 62970 | STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED                                                                       |
| 62971 | HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR                                                                               |
+-------+---------------------------------------------------------------------------------------------------------------------+

I am trying to extract the data into the following form:

+-----------+------------+-------------+-------------------+----------+
|      ID   |   SEQUENCE |    LAST     |    FIRSTMIDDLE    |   TYPE   |
+-----------+------------+-------------+-------------------+----------+
|     62963 |          1 | RENZ        | MICHAEL           | DECEASED |
|     62963 |          2 | WANDER      | MARIA             | MINOR    |
|     62963 |          3 | WANDER      | HENRY RUDOLPH     | MINOR    |
|     62963 |          4 | WANDER      | ROSA              | MINOR    |
|     62963 |          5 | WANDER      | PAUL EMIL         | MINOR    |
|     62964 |          1 | HERNDON     | A C               | ESTATE   |
|     62964 |          2 | BERRING     | A F               | DECEASED |
|     62964 |          3 | BEIRING     | A F               | DECEASED |
|     62964 |          4 | BEIRING     | ANDREAS FREDERICK | DECEASED |
|     62965 |          1 | ZINCH       |                   | ESTATE   |
|     62965 |          2 | ZINTZ       |                   | ESTATE   |
|     62965 |          3 | HAYNES      | HENRY             | DECEASED |
|     62966 |          1 | KRAUS       | JOSEPHINE         | MINOR    |
|     62966 |          2 | KENNEDY     | GEORGE            | DECEASED |
|     62967 |          1 | CAREY       | JAMES             | ESTATE   |
|     62967 |          2 | DE LA GARZA | REFUGIO           | DECEASED |
|     62968 |          1 | LEWIS       | FLORENCE          | ESTATE   |
|     62968 |          2 | LOCKWOOD    | ALBERT A          | DECEASED |
|     62969 |          1 | GLAESER     | EMMA              | MINOR    |
|     62969 |          2 | GLAESER     | HERMAN JR         | MINOR    |
|     62969 |          3 | GLAESER     | HERMAN            | MINOR    |
|     62969 |          4 | RODRIGUEZ   | HILARIO           | DECEASED |
|     62969 |          5 | RODRIGUEZ   | MARIE             | DECEASED |
|     62970 |          1 | STORY       | BETTIE            | ESTATE   |
|     62970 |          2 | EIGENDORFF  | FRANZ             | DECEASED |
|     62971 |          1 | HOWELL      | MAMIE             | MINOR    |
|     62971 |          2 | HOWELL      | ETHEL             | MINOR    |
+-----------+------------+-------------+-------------------+----------+

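This type of data extraction is not something I am very familiar with. I imagine I would need some complex combination of SUBSTRING and CHARINDEX, but because the number of entries in the source column varies, I am not sure how best to approach it. Any guidance on where to start would be very helpful.

【Comments】:

  • Googling "SQL split function" turns up literally thousands of examples.
  • Ideally, the purpose of this code is to fix this badly broken schema. You never want to find yourself storing delimited data in a column.
  • SQL was not designed for parsing strings; it is about the worst tool for the job. Have you considered SQL CLR functions?
  • @RBarryYoung I'll look into that, thanks.
  • @JoelCoehoorn Exactly. I'm trying to convert source data that was handed to us into a format that fits our new schema.

Tags: sql sql-server tsql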


【Solution 1】:

Split the string into one row per entry, then use PARSENAME to pull out the three fields:

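-- #yourtable stands for the table that holds the ID and SOURCE columns
SELECT id,
       Row_number()
         OVER (
           partition BY id
           ORDER BY (SELECT NULL ))AS sequence, -- numbering order is not guaranteed
       -- PARSENAME splits on '.', counting parts from the right
       Parsename(Replace(col3, ';', '.'), 3) as LAST,
       Parsename(Replace(col3, ';', '.'), 2) as FIRSTMIDDLE,
       Parsename(Replace(col3, ';', '.'), 1) as TYPE
FROM   (SELECT id,
               Split.a.value('.', 'VARCHAR(100)') col3
        FROM   (SELECT id,
                       -- turn each comma-separated entry into its own <M> node
                       Cast ('<M>' + Replace([source], ',', '</M><M>')
                             + '</M>' AS XML) AS Data
                FROM   #yourtable) AS A
               CROSS APPLY Data.nodes ('/M') AS Split(a))a 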
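
To see the PARSENAME step in isolation, here is a minimal sketch that applies it to a single entry taken from the sample data. The variable name @entry is an assumption made for illustration. PARSENAME numbers the parts from the right, so with three fields part 3 is the leftmost one.

```sql
-- One entry from the sample data (the variable name is hypothetical)
DECLARE @entry varchar(100) = 'WANDER;HENRY RUDOLPH;MINOR';

-- Swap ';' for '.' so PARSENAME treats the entry as a three-part name,
-- then read the parts from right to left: 1 = type, 2 = first/middle, 3 = last
SELECT PARSENAME(REPLACE(@entry, ';', '.'), 3) AS [last],
       PARSENAME(REPLACE(@entry, ';', '.'), 2) AS [firstmiddle],
       PARSENAME(REPLACE(@entry, ';', '.'), 1) AS [type];
```

PARSENAME accepts at most four parts, which is enough for the three fields in each entry here.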

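【Discussion】:

  • FYI, using PARSENAME is quick and easy, but it will break if the source data contains periods (.). You may need to convert them to an unused character first, and then convert them back to periods after PARSENAME returns its value.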
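
The workaround described in this comment could be sketched roughly as follows. The sample value with a trailing period and the choice of '|' as the placeholder are assumptions for illustration; any character that never occurs in the real data would do.

```sql
-- Hypothetical entry whose first/middle field contains a period
DECLARE @entry varchar(100) = 'GLAESER;HERMAN JR.;MINOR';

-- 1. Hide real periods behind '|' (assumed to be unused in the data)
-- 2. Swap ';' for '.' so PARSENAME can split the entry
-- 3. Restore the periods in each part that PARSENAME returns
SELECT REPLACE(PARSENAME(p.dotted, 3), '|', '.') AS [last],
       REPLACE(PARSENAME(p.dotted, 2), '|', '.') AS [firstmiddle],
       REPLACE(PARSENAME(p.dotted, 1), '|', '.') AS [type]
FROM  (SELECT REPLACE(REPLACE(@entry, '.', '|'), ';', '.') AS dotted) AS p;
```

Computing the substituted string once in the derived table keeps the three PARSENAME calls short; the same pattern can be dropped into the outer SELECT of the answer above.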
【Solution 2】:
create table #temp (id int, [source] nvarchar(4000))

insert #temp (id, [source])
      select 62963, 'RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR'
union select 62964, 'HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED'
union select 62965, 'ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED'
union select 62965, 'ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED'
union select 62966, 'KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED'
union select 62967, 'CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED'
union select 62968, 'LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED'
union select 62969, 'GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED'
union select 62970, 'STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED'
union select 62971, 'HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR'

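select id, 
    -- order by x (the per-person index carried through the pivot); ordering by id alone leaves every row tied
    row_number() over(partition by id order by x) as [sequence],
    [1] as [last], 
    [2] as [firstmiddle], 
    [3] as [type]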
from (
    select id, attributeid, attribute, 
        row_number() over(partition by attributeid order by personid) x
    from (
        select id, 
            personid,
            row_number() over(partition by personid order by personid) attributeid,
            attribute
        from (
            select id, 
                personid, 
                attribute = y.i.value('(./text())[1]', 'nvarchar(4000)')
            from 
            ( 
                select id, personid, x = convert(xml, '<i>' 
                    + replace(person, ';', '</i><i>') 
                    + '</i>').query('.')
                from (
                    select id, 
                        row_number() over (order by id) as personid, 
                        person = y.i.value('(./text())[1]', 'nvarchar(4000)')
                    from ( 
                        select id, x = convert(xml, '<i>' 
                            + replace([source], ',', '</i><i>') 
                            + '</i>').query('.')
                        from #temp
                    ) personxml 
                    cross apply x.nodes('i') AS y(i)
                ) personsplit
            ) attributexml
            cross apply x.nodes('i') AS y(i)
        ) attributesplit
    ) attributes
) as sourcetable
pivot (
    min(attribute)
    for attributeid in ([1],[2],[3])
) as pivottable

【Discussion】: