Split a hyphen-separated string into rows in Oracle: answers

【Question title】: Split hyphen separated string into rows in Oracle
【Posted】: 2020-08-19 11:39:42
【Question description】:

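I have a column in a table that stores data in sequence. Some of the data is hyphen-separated and some is comma-separated. I want to split the data into rows. The catch is that in a comma-separated string every item between commas is a single value, whereas a hyphen denotes a range of values. For example, the string 'A1, A2, A4' holds 3 values and should become 3 rows. A string like 'A1-A4' holds 4 values and should become 4 rows, because the hyphen marks a range with a start value and an end value.

I can convert the comma-separated values, but I am not sure how to split the hyphen-separated ranges in Oracle.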

  SELECT regexp_substr('A1,A2,A4' , '[^,]+', 1, level) as a
  FROM dual
  CONNECT BY regexp_substr('A1,A2,A4', '[^,]+', 1, level) is not null

The query above converts the supplied string into 3 rows, which is fine.

  SELECT regexp_substr('A1-A4' , '[^-]+', 1, level) as a
  FROM dual
  CONNECT BY regexp_substr('A1-A4', '[^-]+', 1, level) is not null

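However, I need the query above to return 4 rows, and I don't know how to achieve that. Any ideas?

【Question discussion】:

'A1-A4' contains only one '-'. So isn't the expected result 2 rows rather than 4?
A1-A4 means the data from A1 to A4, i.e. 'A1, A2, A3, A4'.
That is not how SQL works. 'A1-A4' is just a literal string, and regexp_substr operates exactly on the arguments it is given.
Try running your query with 'A1-A2-A3-A4'.
How is the DBMS supposed to know what A1-A4 means to you? You want it to mean all values from A1 to A4, but how many values is that? Do A1.1, A1.2 and so on exist in that range? You make the rules, so you have to program them. For programming we normally use a programming language. If I were you, I would write a PL/SQL function for this. There you can simply loop through your string and easily resolve expressions such as 'A1-A3,A5,A6-A7,A9'.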
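
A minimal sketch of the PL/SQL approach suggested in the last comment, not code from the thread. The function name expand_ranges is hypothetical, and the sketch assumes every range has the shape <letters><digits>-<letters><digits> with the same prefix on both sides; malformed items are not handled.

```sql
-- Hypothetical helper (not from the thread): expands 'A1-A3,A5' into A1, A2, A3, A5.
CREATE OR REPLACE FUNCTION expand_ranges (p_str IN VARCHAR2)
  RETURN sys.odcivarchar2list PIPELINED
IS
  l_item   VARCHAR2(4000);
  l_prefix VARCHAR2(4000);
  l_from   PLS_INTEGER;
  l_to     PLS_INTEGER;
BEGIN
  -- Walk the comma-separated items one by one (spaces are treated as separators too).
  FOR i IN 1 .. NVL(REGEXP_COUNT(p_str, '[^, ]+'), 0) LOOP
    l_item := REGEXP_SUBSTR(p_str, '[^, ]+', 1, i);

    IF INSTR(l_item, '-') > 0 THEN
      -- Assumed range shape: <letters><digits>-<letters><digits>
      l_prefix := REGEXP_SUBSTR(l_item, '^[[:alpha:]]+');
      l_from   := TO_NUMBER(REGEXP_SUBSTR(l_item, '\d+', 1, 1));
      l_to     := TO_NUMBER(REGEXP_SUBSTR(l_item, '\d+', 1, 2));

      FOR n IN l_from .. l_to LOOP
        PIPE ROW (l_prefix || TO_CHAR(n));
      END LOOP;
    ELSE
      -- Plain single value: return it unchanged.
      PIPE ROW (l_item);
    END IF;
  END LOOP;
  RETURN;
END expand_ranges;
/

-- Usage: one row per expanded value.
SELECT column_value AS val
FROM   TABLE(expand_ranges('A1-A3,A5,A6-A7,A9'));
```

With the input from the comment, this should yield A1, A2, A3, A5, A6, A7 and A9, one per row.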

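Tags: sql oracle split

【Solution 1】:

Assuming the pattern is always a pair of values that share the same prefix (here 'A'), each followed by a number, you can use a different regular expression to extract the prefix, the start number and the end number: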

SELECT
  regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1) as prefix,
  to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)) as start_num,
  to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3)) as end_num
FROM dual

PREFIX  START_NUM    END_NUM
------  ---------  ---------
A               1          4

Then use that in a recursive CTE to generate the values in between:

WITH rcte (prefix, num, end_num) AS (
  SELECT
    regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1),
    to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)),
    to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3))
  FROM dual
  UNION ALL
  SELECT prefix, num + 1, end_num
  FROM rcte
  WHERE num < end_num
)
SELECT prefix || num as result
FROM rcte

RESULT
------
A1
A2
A3
A4

db<>fiddle

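You can combine the two approaches in a single query, further assuming you never mix comma-separated values and ranges in the same string; db<>fiddle demo. If you do have a mix, you can apply them in series: split the comma-separated string into rows, then further process any of the new rows that are actually hyphenated ranges.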
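
The "in series" idea could look roughly like the following. This is a sketch under stated assumptions rather than the author's db<>fiddle demo: the CTE names t and items are illustrative, the input is a single string, and each item is either a single value or a range whose two ends share an alphabetic prefix.

```sql
WITH t (str) AS (
  -- Assumed sample input mixing single values and ranges
  SELECT 'A1-A3,A5,A6-A7,A9' FROM dual
),
items (item) AS (
  -- Step 1: one row per comma-separated item (works as written for a single input row)
  SELECT REGEXP_SUBSTR(str, '[^, ]+', 1, LEVEL)
  FROM   t
  CONNECT BY LEVEL <= REGEXP_COUNT(str, '[^, ]+')
),
rcte (prefix, num, end_num) AS (
  -- Step 2, anchor member: prefix, first number, and last number of each item.
  -- A single value such as A5 has no second number, so it becomes the range 5..5.
  SELECT REGEXP_SUBSTR(item, '^[[:alpha:]]+'),
         TO_NUMBER(REGEXP_SUBSTR(item, '\d+', 1, 1)),
         TO_NUMBER(NVL(REGEXP_SUBSTR(item, '\d+', 1, 2),
                       REGEXP_SUBSTR(item, '\d+', 1, 1)))
  FROM   items
  UNION ALL
  -- Recursive member: step through the range one number at a time
  SELECT prefix, num + 1, end_num
  FROM   rcte
  WHERE  num < end_num
)
SELECT prefix || num AS result
FROM   rcte
ORDER  BY prefix, num;
```

For the sample string this should return A1, A2, A3, A5, A6, A7 and A9. Splitting many table rows at once would need a different row generator (for example a lateral join), because a plain CONNECT BY LEVEL over several rows multiplies them.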

【Discussion】:

【Solution 2】:

A full example with extended sample data:

with t(n, str) as (
select 1,'A1, A2, A4' from dual union all
select 2,'B1, B4, B7-B11' from dual union all
select 3,'C1, C3, C5-C7' from dual union all
select 4,'XY1, XT3, ZZ5-ZZ7' from dual 
)
select *
from t
    ,lateral(
        select level part_n, regexp_substr(str,'[^ ,]+',1,level) part
        from dual 
        connect by level<=regexp_count(str,'[^ ,]+')
     )
    ,lateral(
        select 
           level sub_part_n, 
           nvl(
              regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,1)
              ||
              (regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2) + level -1) 
             ,part
             )
             as subpart
        from dual 
        connect by level<= regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,3)
                         - regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2)
                         + 1
    )

Result:

         N STR                   PART_N PART       SUB_PART_N SUBPART
---------- ----------------- ---------- ---------- ---------- ----------
         1 A1, A2, A4                 1 A1                  1 A1
         1 A1, A2, A4                 2 A2                  1 A2
         1 A1, A2, A4                 3 A4                  1 A4
         2 B1, B4, B7-B11             1 B1                  1 B1
         2 B1, B4, B7-B11             2 B4                  1 B4
         2 B1, B4, B7-B11             3 B7-B11              1 B7
         2 B1, B4, B7-B11             3 B7-B11              2 B8
         2 B1, B4, B7-B11             3 B7-B11              3 B9
         2 B1, B4, B7-B11             3 B7-B11              4 B10
         2 B1, B4, B7-B11             3 B7-B11              5 B11
         3 C1, C3, C5-C7              1 C1                  1 C1
         3 C1, C3, C5-C7              2 C3                  1 C3
         3 C1, C3, C5-C7              3 C5-C7               1 C5
         3 C1, C3, C5-C7              3 C5-C7               2 C6
         3 C1, C3, C5-C7              3 C5-C7               3 C7
         4 XY1, XT3, ZZ5-ZZ7          1 XY1                 1 XY1
         4 XY1, XT3, ZZ5-ZZ7          2 XT3                 1 XT3
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             1 ZZ5
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             2 ZZ6
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             3 ZZ7

【Discussion】:

【Solution 3】:

If this has to be applied to a table with multiple rows, you could try something like this (see the comments in the code):

with test (id, col) as
-- sample data
  (select 1, 'A1,A2,A4' from dual union all
   select 2, 'BX8-BX11' from dual union all
   select 3, 'C1,C4'    from dual union all
   select 4, 'D6-D9'    from dual
  ),
temp as
-- split e.g. "BX8-BX11" to "BX", 8 and 11
  (select id,
          regexp_substr(col, '^[[:alpha:]]+') alp,
          to_number(regexp_substr(col, '\d+', 1, 1)) num1,
          to_number(regexp_substr(col, '\d+', 1, 2)) num2
   from test
   where instr(col, '-') > 0
  )
-- trivial - split comma-separated values to rows
select id,
       regexp_substr(col, '[^,]+', 1, column_value) val
from test cross join table(cast(multiset(select level from dual
                                         connect by level <= regexp_count(col, ',') + 1
                                        ) as sys.odcinumberlist))
where instr(col, '-') = 0
union all
-- create rows for values that are dash-separated
select id,
       alp || to_char(num1 + column_value - 1) val
from temp cross join table(cast(multiset(select level from dual
                                         connect by level <= num2 - num1 + 1
                                        ) as sys.odcinumberlist))
order by id, val;

        ID VAL
---------- ------------------------------------------------
         1 A1
         1 A2
         1 A4
         2 BX10
         2 BX11
         2 BX8
         2 BX9
         3 C1
         3 C4
         4 D6
         4 D7
         4 D8
         4 D9

13 rows selected.
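
Note that the output above is sorted as text, so BX10 and BX11 come before BX8. If numeric order within a prefix matters, one option is to sort on the prefix and the trailing number separately. The sketch below is not part of the original answer; the CTE name vals is hypothetical and simply replays four of the rows above.

```sql
WITH vals (id, val) AS (
  -- Assumed stand-in for the rows produced by the query above
  SELECT 2, 'BX10' FROM dual UNION ALL
  SELECT 2, 'BX11' FROM dual UNION ALL
  SELECT 2, 'BX8'  FROM dual UNION ALL
  SELECT 2, 'BX9'  FROM dual
)
SELECT id, val
FROM   vals
ORDER  BY id,
          REGEXP_SUBSTR(val, '^[[:alpha:]]+'),      -- prefix, compared as text
          TO_NUMBER(REGEXP_SUBSTR(val, '\d+$'));    -- trailing digits, compared as a number
```

This should list BX8, BX9, BX10, BX11 in that order. To apply it to the UNION ALL query, the union would likely need to be wrapped in an inline view first, since Oracle generally rejects arbitrary expressions in the ORDER BY of a compound query.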


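【Discussion】:

【Solution 4】:

Alternatively:

CROSS JOIN your input with a series of consecutive integers, and pick the i-th run of consecutive characters that are neither commas nor hyphens...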

WITH
input(sid,str) AS (
          SELECT 1,'A1,A2,A4' FROM dual
UNION ALL SELECT 2,'ANY-BY-HYPHEN' FROM dual
)
-- a set of running integer variables
,
i(i) AS (
          SELECT 1 FROM dual
UNION ALL SELECT 2 FROM dual
UNION ALL SELECT 3 FROM dual
UNION ALL SELECT 4 FROM dual
UNION ALL SELECT 5 FROM dual
UNION ALL SELECT 6 FROM dual
)
SELECT
  sid
, i
, REGEXP_SUBSTR(str,'[^-,]+',1,i) AS part
FROM input CROSS JOIN i
-- Oracle treats '' as NULL, so "<> ''" is never true; test with IS NOT NULL instead
WHERE REGEXP_SUBSTR(str,'[^-,]+',1,i) IS NOT NULL
ORDER BY sid,i
;
-- out  sid | i |  part  
-- out -----+---+--------
-- out    1 | 1 | A1
-- out    1 | 2 | A2
-- out    1 | 3 | A4
-- out    2 | 1 | ANY
-- out    2 | 2 | BY
-- out    2 | 3 | HYPHEN
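
The hard-coded list of six integers limits the number of parts per string. As a hedged variation that is not from the original answer, the integers can be generated from the data; the CTE name lim is hypothetical, and the sketch relies on lim returning exactly one row.

```sql
WITH input (sid, str) AS (
  SELECT 1, 'A1,A2,A4'      FROM dual UNION ALL
  SELECT 2, 'ANY-BY-HYPHEN' FROM dual
),
lim (n) AS (
  -- Largest number of parts found in any input string
  SELECT MAX(REGEXP_COUNT(str, '[^-,]+')) FROM input
),
i (i) AS (
  -- Integers 1..n, generated from the single row in lim
  SELECT LEVEL FROM lim CONNECT BY LEVEL <= n
)
SELECT sid,
       i,
       REGEXP_SUBSTR(str, '[^-,]+', 1, i) AS part
FROM   input CROSS JOIN i
WHERE  i <= REGEXP_COUNT(str, '[^-,]+')   -- keep only positions that exist in this string
ORDER  BY sid, i;
```

Like the query it is based on, this only tokenizes on commas and hyphens; it does not expand a range such as A1-A4 into the values in between.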

【Discussion】: