Unable to get all matched regexp_substr occurrences in a single row in Oracle

[Question title]: Unable to get all occurrences of matched regexp_substr in a single row (Oracle)
[Posted]: 2020-12-05 06:51:09
[Question]:

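I'm trying to get specific data from a database column in Oracle 11g, but my regex only returns the first occurrence. Any idea how to get all occurrences in the same row, separated by "|"?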

My query:

SELECT regexp_substr(
         '"Error:" {user_1@domain.com}<"User_2" {user_2@domain.com};"Error:" {user_3@domain.com}<"User_4" {user_4@domain.com};',
         'Error:[^<]+<'
       ) AS emails
FROM   DUAL;

My output should be:

Error:" {user_1@domain.com}< Error:" {user_3@domain.com}<

The current output is:

Error:" {user_1@domain.com}<

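For clarity, here is the DML (the INSERT statement) for my table:

-- SQL*Plus / SQL Developer: treat the ampersands in the data as plain text
SET DEFINE OFF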
insert into tests(result) values ('</span></td></tr><tr><td><span class="inputlabel">[14].</span>&nbsp&nbsp<span class="label">ORC|Boston Medical Center|||||||||||||
</span></td></tr><tr><td><span class="inputlabel">[15].</span>&nbsp&nbsp<span class="label">OBR|05-123|LOINC-Lcl-11546-9-1|20050415||||||||||LocalCode: Abscess2||||c|||||
</span></td></tr><tr><td class="errorlabel" nowrap>Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Vocabulary code.</td></tr><tr><td class="errorlabel" nowrap>Message rejected.</td></tr><tr><td><span class="inputlabel">[17].</span>&nbsp&nbsp<span class="label">PID||||||||RecCtl_ID|FORTES|AVERY||||||||||||||||||
</span></td></tr><tr><td><span class="inputlabel">[18].</span>&nbsp&nbsp<span class="label">NK1|NK Last Name|NK First Name||||||||||
</span></td></tr><tr><td><span class="inputlabel">[19].</span>&nbsp&nbsp<span class="label">ORC|Boston Medical Center|||||||||||||
</span></td></tr><tr><td><span class="inputlabel">[20].</span>&nbsp&nbsp<span class="label">OBR|05-123|LOINC-Lcl-11546-9-1|20050415||||||||||Local 128477000||||Report_Status_Code 12345678_30|||||
</span></td></tr><tr><td><span class="inputlabel">[21].</span>&nbsp&nbsp<span class="label">OBX|LOINC-Lcl-11546-9-4|SMED-Lcl-78181009-4|||||F|200504231010|BMC
</span></td></tr><tr><td class="inputlabel" nowrap>Processing Results: 3 Messages Accepted, <span class="errorlabel">1 Messages Rejected.</span></td></tr><tr><td class="inputlabel" nowrap>End Time: 2011-08-07 18:47:47.312</td></tr></table>
</span></td></tr><tr><td class="errorlabel" nowrap>Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Vocabulary code.</td></tr><tr><td class="errorlabel" nowrap>Message rejected.</td></tr><tr><td><span class="inputlabel">[17].</span>&nbsp&nbsp<span class="label">PID||||||||RecCtl_ID|FORTES|AVERY||||||||||||||||||
');

Table creation:

create table TESTS
(
  result CLOB
);

Now I want all the error messages from the HTML above, i.e. the output should look like this:

Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Vocabulary code Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Vocabulary code

Right now I only get one error message.

[Question discussion]:

Tags: sql regex oracle11g regex-lookarounds

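[Solution 1]:

I'm not especially good at this, but see whether it helps.

Lines 1 - 4: sample data.
The TEMP CTE splits the string into rows.
The final SELECT (from line 10 onwards) aggregates the values that contain "Error".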

SQL> select * From v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> with test (col) as
  2    (select '"Error:" {user_1@domain.com}<"User_2" {user_2@domain.com};"Error:" {user_3@domain.com}<"User_4" {user_4@domain.com};'
  3     from dual
  4    ),
  5  temp as
  6    (select regexp_substr(col, '[^<;]+', 1, level) val
  7     from test
  8     connect by level <= regexp_count(col, '<') + 1
  9    )
 10  select listagg(val, '< ') within group (order by val) result
 11  from temp
 12  where instr(val, 'Error') > 0;

RESULT
--------------------------------------------------------------------------------
"Error:" {user_1@domain.com}< "Error:" {user_3@domain.com}

SQL>
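A variant of the same idea, offered only as a sketch and not part of the original answer: instead of splitting on `<` and `;` and then filtering, walk the question's own pattern `Error:[^<]+<` using REGEXP_SUBSTR's fourth argument (the occurrence number) and join the matches with LISTAGG (available from 11g Release 2). The WITH clause only supplies sample data; against a real table, select from that table instead. As written, this form assumes the source has a single row.

```sql
-- Sketch: one generated row per match (LEVEL = 1, 2, ...), then LISTAGG
-- joins the matches in occurrence order, separated by a space.
WITH test (col) AS (
  SELECT '"Error:" {user_1@domain.com}<"User_2" {user_2@domain.com};"Error:" {user_3@domain.com}<"User_4" {user_4@domain.com};'
  FROM   dual
)
SELECT LISTAGG(REGEXP_SUBSTR(col, 'Error:[^<]+<', 1, LEVEL), ' ')
         WITHIN GROUP (ORDER BY LEVEL) AS emails
FROM   test
-- REGEXP_COUNT gives the number of matches, i.e. how many levels to generate
CONNECT BY LEVEL <= REGEXP_COUNT(col, 'Error:[^<]+<');
```

For the sample string this returns `Error:" {user_1@domain.com}< Error:" {user_3@domain.com}<`, which is the output asked for in the question.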

Using the sample data you posted:

SQL> set pagesize 100
SQL> set long 10000
SQL>
SQL> with
  2  temp as
  3    (select regexp_substr(result, '[^<;]+', 1, level) val
  4     from tests
  5     connect by level <= regexp_count(result, '<') + 1
  6    )
  7  select replace(val, 'td class="errorlabel" nowrap>', '') result
  8  from temp
  9  where instr(val, 'Error') > 0;

RESULT
--------------------------------------------------------------------------------
Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Voc
abulary code.

Error: Report Status Code (ReportStatusCode of type ID) value (c) is invalid Voc
abulary code.


SQL>
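The query above applies CONNECT BY LEVEL directly to TESTS, which behaves as expected only while the table holds a single row; with several rows, every level links to every row and the output multiplies. The sketch below (not from the original answer) keeps each row's hierarchy separate with the common `PRIOR ROWID = ROWID` / `PRIOR SYS_GUID() IS NOT NULL` idiom and returns one line per stored row, as the question asks. It assumes every message starts with `Error:` and ends at the next `<` (the closing tag), which holds for the posted sample.

```sql
-- Sketch for TESTS.RESULT (a CLOB): all "Error: ..." messages of a row,
-- joined by a space, one output row per stored row.
SELECT LISTAGG(TO_CHAR(REGEXP_SUBSTR(t.result, 'Error:[^<]+', 1, LEVEL)), ' ')
         WITHIN GROUP (ORDER BY LEVEL) AS errors
FROM   tests t
CONNECT BY LEVEL <= REGEXP_COUNT(t.result, 'Error:[^<]+')
       -- only connect a row to itself, so rows do not multiply each other
       AND PRIOR t.ROWID = t.ROWID
       -- makes each level look distinct, avoiding ORA-01436 (CONNECT BY loop)
       AND PRIOR SYS_GUID() IS NOT NULL
GROUP BY t.ROWID;
```

REGEXP_SUBSTR on a CLOB returns a CLOB, so TO_CHAR converts each match to VARCHAR2 before it is aggregated. LISTAGG returns VARCHAR2 and raises ORA-01489 if the joined text exceeds 4000 bytes. Each message keeps its trailing period, unlike the sample output in the question.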

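[Discussion]:

Actually, my data comes from a table; I can't create a temporary table like this.
If you had posted CREATE TABLE and INSERT INTO statements with sample data, I wouldn't have had to build the test case myself. Besides, I said explicitly that the first few lines represent sample data. What is unclear about that? The code you probably need starts at line 5.
Oh, yes. See my updated SQL*Plus output, which shows the database version. That means you are doing something wrong.
Thanks, but could you look at my sample data and table creation script? We may need to change something else as well.
You're welcome. I added more code to the answer. I removed LISTAGG because your result consists of two rows (not values concatenated in the same row). Please take a look.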

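[Solution 2]:

This answers the original version of the question.

If you want multiple substrings, the right strategy is regexp_replace(). I think this does what you specified: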

SELECT regexp_replace(
         '"Error:" {user_1@domain.com}<"User_2" {user_2@domain.com};"Error:" {user_3@domain.com}<"User_4" {user_4@domain.com};',
         '"(Error:[^<]+<)[^;]+;',
         '\1'
       ) AS emails
FROM   DUAL;

Here is a dbfiddle.
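The `\1` replacement keeps only the captured group, so the two matches come out back to back with no separator. A small variation (a sketch, not from the original answer) appends a space to each kept group and trims the last one:

```sql
-- Sketch: same pattern as the answer; '\1 ' keeps the captured group
-- followed by a space, and RTRIM removes the space after the last match.
SELECT RTRIM(
         regexp_replace(
           '"Error:" {user_1@domain.com}<"User_2" {user_2@domain.com};"Error:" {user_3@domain.com}<"User_4" {user_4@domain.com};',
           '"(Error:[^<]+<)[^;]+;',
           '\1 '
         )
       ) AS emails
FROM   DUAL;
```

For the original sample string this gives `Error:" {user_1@domain.com}< Error:" {user_3@domain.com}<`. Like the answer's query, it only works while the input is a strict sequence of `"Error:" ...<...;` segments: any text the pattern does not match is left in place, which is why this approach does not carry over to the HTML data.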

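[Discussion]:

No - I want something like this: Error:" {user_1@domain.com}
@GucciDeveloper . . . I updated the answer. I don't know why I thought it worked the first time.
Could you look at my edited question? The query above doesn't work with my sample data.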