REGEXP_REPLACE back-reference with function call: answers

【Question title】: REGEXP_REPLACE back-reference with function call
【Posted】: 2019-01-22 21:01:35
【Question】:

Can I apply a function call to a REGEXP_REPLACE back-reference value?

For example, I want to call chr() or any other function on the back-reference value, but this

SELECT REGEXP_REPLACE('a 98 c 100', '(\d+)', ASCII('\1')) FROM dual;

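only returns the ASCII value of '\' (ASCII('\1') is evaluated once, before REGEXP_REPLACE runs, and ASCII returns the code of the first character, the backslash):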

'a 92 c 92'

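I would like the last parameter (the replacement string) to be evaluated for each match, after the back-reference has been substituted, so that with chr() the result would be: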

'a b c d'

【Question discussion】:

Tags: sql oracle regexp-replace

【Solution 1】:

Purely for fun, you could use XPath to do the tokenizing, the number-to-character conversion and the aggregation:

select *
from xmltable(
  'string-join(
    for $t in tokenize($s, " ") 
      return if ($t castable as xs:integer) then codepoints-to-string(xs:integer($t)) else $t,
    " ")'
  passing 'a 98 c 100' as "s"
);

Result Sequence                                                                 
--------------------------------------------------------------------------------
a b c d

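The initial string value is passed in as $s; tokenize() splits it using a space as the delimiter; each resulting $t is checked to see whether it is an integer and, if it is, converted to the equivalent character with codepoints-to-string(), otherwise left alone; finally, all the tokens are put back together with string-join().

If the original has multiple spaces, they will be collapsed into a single space (as with Littlefoot's regex).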

【Comments】:

If I lived to be 1000, I would never be able to write a query like that. I'm almost speechless.

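【Solution 2】:

I'm not clever enough to do it with a single regular expression, but, step by step, something like this might help. It splits the source string into rows, checks whether each part is a number and, if so, selects its CHR. Finally, everything is aggregated back into a single string.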

with test (col) as
  (select 'a 98 c 100' from dual),
inter as
  (select level lvl,
          regexp_substr(col, '[^ ]+', 1, level) c_val
   from test
   connect by level <= regexp_count(col, ' ') + 1
  ),
inter_2 as
  (select lvl,
          case when regexp_like(c_val, '^\d+$') then chr(c_val)
               else c_val
          end c_val_2
   from inter
  )
select listagg(c_val_2, ' ') within group (order by lvl) result
from inter_2;

RESULT
--------------------
a b c d

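It can be shortened by one step (I deliberately left it as is so that you can run one query at a time and check the results, which makes things clearer):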

with test (col) as
  (select 'a 98 c 100' from dual),
inter as
  (select level lvl,
          case when regexp_like(regexp_substr(col, '[^ ]+', 1, level), '^\d+$')
                    then chr(regexp_substr(col, '[^ ]+', 1, level))
               else regexp_substr(col, '[^ ]+', 1, level)
          end c_val
   from test
   connect by level <= regexp_count(col, ' ') + 1
  )
select listagg(c_val, ' ') within group (order by lvl) result
from inter;

RESULT
--------------------
a b c d

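[EDIT: what if the input looks different?]

That's a bit simpler. Use REGEXP_SUBSTR to extract the numbers: ..., 1, 1 returns the first one, ..., 1, 2 returns the second one. A plain REPLACE then replaces the numbers with their CHR values.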

with test (col) as
    (select 'a98c100e' from dual)
select
  replace(replace(col, regexp_substr(col, '\d+', 1, 1), chr(regexp_substr(col, '\d+', 1, 1))),
          regexp_substr(col, '\d+', 1, 2), chr(regexp_substr(col, '\d+', 1, 2))) result
from test;

RESULT
--------------------
abcde
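
The nested REPLACE above assumes exactly two numbers, and because REPLACE substitutes every occurrence it can also go wrong when one number appears inside another: for 'a99c199e', replacing '99' also rewrites the '99' inside '199', so the result is 'acc1ce'. A more general option is sketched below, assuming you are allowed to create a PL/SQL function in your schema; the name numbers_to_chars is hypothetical. It walks the string left to right with REGEXP_INSTR/REGEXP_SUBSTR and converts each run of digits exactly once, by position, so any number of values works.

```sql
-- Hypothetical helper (name and signature are illustrative, not from the thread):
-- replaces each run of digits in p_str with CHR(<that number>), left to right.
create or replace function numbers_to_chars(p_str in varchar2)
  return varchar2
is
  l_out   varchar2(32767);   -- result built so far
  l_pos   pls_integer := 1;  -- where the next search starts in p_str
  l_start pls_integer;       -- start position of the next run of digits
  l_num   varchar2(4000);    -- the run of digits itself
begin
  if p_str is null then
    return null;
  end if;

  loop
    exit when l_pos > length(p_str);
    -- position of the next digit run at or after l_pos; 0 means there is none left
    l_start := regexp_instr(p_str, '\d+', l_pos);
    exit when l_start = 0;
    l_num := regexp_substr(p_str, '\d+', l_pos);
    -- copy the text before the number unchanged, then append the converted character
    l_out := l_out || substr(p_str, l_pos, l_start - l_pos) || chr(to_number(l_num));
    -- resume right after this number, so each number is converted exactly once
    l_pos := l_start + length(l_num);
  end loop;

  -- append whatever follows the last number
  return l_out || substr(p_str, l_pos);
end numbers_to_chars;
/

select numbers_to_chars('a98c100e') as result from dual;  -- returns abcde
```

The same call works for the space-separated input ('a 98 c 100' gives 'a b c d'). Keep in mind that CHR returns the character whose binary value in the database character set equals the number, so what CHR(199) produces for 'a99c199e' depends on that character set.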

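【Comments】:

Thanks for the quick reply. Do you have any suggestions if my input data looks like this: 'a99c199e'?
You're welcome. I edited my message; please have a look.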