How to retain the same case in the text in Oracle? Answers

【Question Title】: How to retain the same case in the text in Oracle?
【Posted】: 2020-01-21 00:54:56
【Question Description】:

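I have a table with a word column and a sentence column. The idea is that if a word from the word column is found in a sentence, that word in the sentence is replaced with a link (which includes the word itself). The query below does the replacement perfectly, but because the link is built from the temp.word column, the word in the sentence takes on the case used in the word column. Is there a way to keep the case that the word has in the sentence itself?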

Create table temp(
  id       NUMBER,
  word     VARCHAR2(1000),
  sentence VARCHAR2(2000)
);

insert into temp
SELECT 1,'automation testing', 'automtestingation Testing is popular kind of testing' FROM DUAL UNION ALL
SELECT 2,'testing','manual testing' FROM DUAL UNION ALL
SELECT 3,'manual testing','this is an old method of testing' FROM DUAL UNION ALL
SELECT 4,'punctuation','automation testing,manual testing,punctuation,automanual testing-testing' FROM DUAL UNION ALL
SELECT 5,'B-number analysis','B-number analysis table' FROM DUAL UNION ALL
SELECT 6,'B-number analysis table','testing B-number analysis' FROM DUAL UNION ALL
SELECT 7,'Not Matched','testing testing testing' FROM DUAL;

with words(id, word, word_length, search1, replace1, search2, replace2) as (
  select id, word, length(word),
  '(^|\W)' || REGEXP_REPLACE(word, '([][)(}{|^$\.*+?])', '\\\1') || '($|\W)',
  '\1{'|| id ||'}\2',
  '{'|| id ||'}',
  'http://localhost/' || id || '/<u>' || word || '</u>'
  FROM temp
)
, joined_data as (
  select w.search1, w.replace1, w.search2, w.replace2,
    s.rowid s_rid, s.sentence,
    row_number() over(partition by s.rowid order by word_length desc) rn
  from words w
  join temp s
  on instr(UPPER(s.sentence), UPPER(w.word)) > 0
  and regexp_like(s.sentence, w.search1)
)
, unpivoted_data as (
  select S_RID, SENTENCE, PHASE, SEARCH_STRING, REPLACE_STRING,
    row_number() over(partition by s_rid order by phase, rn) rn,
    case when row_number() over(partition by s_rid order by phase, rn)
      = count(*) over(partition by s_rid)
      then 1
      else 0
    end is_last
  from joined_data
  unpivot(
    (search_string, replace_string) 
    for phase in ( (search1, replace1) as 1, (search2, replace2) as 2 ))
)
, replaced_data(S_RID, RN, is_last, SENTENCE) as (
  select S_RID, RN, is_last,
    regexp_replace(SENTENCE, search_string, replace_string,1,0,'i')
  from unpivoted_data
  where rn = 1
  union all
  select n.S_RID, n.RN, n.is_last,
    case when n.phase = 1
      then regexp_replace(o.SENTENCE, n.search_string, n.replace_string,1,0,'i')
      else replace(o.SENTENCE, n.search_string, n.replace_string)
    end
  from unpivoted_data n
  join replaced_data o
    on o.s_rid = n.s_rid and n.rn = o.rn + 1  
)
select s_rid, sentence from replaced_data
where is_last = 1
order by s_rid;

For example, for id = 1 the sentence is: automtestingation Testing is popular kind of testing
After the replacement it becomes: automtestingation http://localhost/2/<u>testing</u> is popular kind of http://localhost/2/<u>testing</u>

The word Testing is replaced with testing (taken from the temp.word column).

The expected result is:

automtestingation http://localhost/2/<u>Testing</u> is popular kind of http://localhost/2/<u>testing</u>
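For a single word, one way to keep the original case (a sketch, not part of the original question) is to capture the matched word in its own group and put it back with a backreference, instead of concatenating temp.word into the replacement. With the 'i' match parameter the search is case-insensitive, but \2 inserts the text exactly as it appears in the sentence. The word testing and the id 2 are hard-coded here purely for illustration:

```sql
SELECT REGEXP_REPLACE(
         'automtestingation Testing is popular kind of testing',
         -- \1 = non-word character (or start of string) before the word
         -- \2 = the word as it is actually written in the sentence
         -- \3 = non-word character (or end of string) after the word
         '(^|\W)(testing)($|\W)',
         '\1http://localhost/2/<u>\2</u>\3',
         1,    -- start searching at position 1
         0,    -- replace every occurrence
         'i'   -- case-insensitive matching
       ) AS replaced_sentence
FROM   dual;
```

Because each match consumes the separator characters around it, two occurrences separated by a single space (e.g. testing testing) are not both matched in one pass; the query in the question uses the same (^|\W)...($|\W) pattern and shares this limitation.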

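【Discussion】:

What version of Oracle are you using? It might be easier to "cheat" and answer this with WITH FUNCTION, but that feature was introduced in 12c.
I'm using 11g :-/
Hi Jon, do you have any other way to achieve this? At the moment I can't think of another approach. Any other ideas are much appreciated.
Can you create a regular PL/SQL function?
Yes, I can. What kind of logic should the function implement?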

Tags: oracle replace pattern-matching regexp-replace

【Solution 1】:

Oracle setup:

Create table temp(
  id       NUMBER,
  word     VARCHAR2(1000),
  Sentence VARCHAR2(2000)
);

insert into temp
SELECT 1,'automation testing', 'automtestingation TeStInG TEST is popular kind of testing' FROM DUAL UNION ALL
SELECT 2,'testing','manual testing' FROM DUAL UNION ALL
select 2,'test', 'test' FROM DUAL UNION ALL
SELECT 3,'manual testing','this is an old method of testing' FROM DUAL UNION ALL
SELECT 4,'punctuation','automation testing,manual testing,punctuation,automanual testing-testing' FROM DUAL UNION ALL
SELECT 5,'B-number analysis','B-number analysis table' FROM DUAL UNION ALL
SELECT 6,'B-number analysis table','testing B-number analysis' FROM DUAL UNION ALL
SELECT 7,'Not Matched','testing testing testing' FROM DUAL UNION ALL
SELECT 8,'^[($','testing characters ^[($ that need escaping in a regular expression' FROM DUAL;

SQL types:

CREATE TYPE stringlist IS TABLE OF VARCHAR2(4000);
/
CREATE TYPE intlist IS TABLE OF NUMBER(20,0);
/

PL/SQL function:

CREATE FUNCTION replace_words(
  word_list IN  stringlist,
  id_list   IN  intlist,
  sentence  IN  temp.sentence%TYPE
) RETURN temp.sentence%TYPE
IS
  p_sentence       temp.sentence%TYPE := UPPER( sentence );
  p_pos            PLS_INTEGER := 1;
  p_min_word_index PLS_INTEGER;
  p_word_index     PLS_INTEGER;
  p_start          PLS_INTEGER;
  p_index          PLS_INTEGER;
  o_sentence       temp.sentence%TYPE;
BEGIN
  LOOP
    p_min_word_index := NULL;
    p_index          := NULL;
    FOR i IN 1 .. word_list.COUNT LOOP
      p_word_index := p_pos;
      LOOP
        p_word_index := INSTR( p_sentence, word_list(i), p_word_index );
        EXIT WHEN p_word_index = 0;
        IF (   p_word_index  > 1
           AND REGEXP_LIKE( SUBSTR( p_sentence, p_word_index - 1, 1 ), '\w' )
           )
           OR  REGEXP_LIKE( SUBSTR( p_sentence, p_word_index + LENGTH( word_list(i) ), 1 ), '\w' )
        THEN
           p_word_index := p_word_index + 1;
           CONTINUE;
        END IF;
        IF p_min_word_index IS NULL OR p_word_index < p_min_word_index THEN
          p_min_word_index := p_word_index;
          p_index := i;
        END IF;
        EXIT;
      END LOOP;
    END LOOP;
    IF p_index IS NULL THEN
      o_sentence := o_sentence || SUBSTR( sentence, p_pos );
      EXIT;
    ELSE
      o_sentence := o_sentence
                    || SUBSTR( sentence, p_pos, p_min_word_index - p_pos )
                    || 'http://localhost/'
                    || id_list(p_index)
                    || '/<u>'
                    || SUBSTR( sentence, p_min_word_index, LENGTH( word_list( p_index ) ) )
                    || '</u>';
      p_pos := p_min_word_index + LENGTH( word_list( p_index ) );
    END IF;
  END LOOP;
  RETURN o_sentence;
END;
/
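To check the function on a single sentence without touching the table, it can be called directly with the constructors of the two SQL types. Two things follow from how it is written: it searches an upper-cased copy of the sentence, so the words must already be upper-case (the MERGE below passes UPPER( word )), and when two words match at the same position the one that comes first in the list wins, so longer words should come first. The word/id pairs below are taken from the sample data; the sentence is made up for illustration:

```sql
SELECT replace_words(
         stringlist( 'MANUAL TESTING', 'TESTING' ),  -- upper-case, longest first
         intlist( 3, 2 ),                            -- id for each word, same order
         'automtestingation Testing is popular kind of manual testing'
       ) AS replaced_sentence
FROM   dual;
```

Tracing the function by hand, this should return automtestingation http://localhost/2/<u>Testing</u> is popular kind of http://localhost/3/<u>manual testing</u>: the substring inside automtestingation is skipped, Testing keeps its capital T, and manual testing is linked as a whole phrase rather than linking only testing.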

MERGE:

MERGE INTO temp dst
USING (
  WITH lists ( word_list, id_list ) AS (
    SELECT CAST(
             COLLECT(
               UPPER( word )
               ORDER BY LENGTH( word ) DESC, UPPER( word ) ASC, ROWNUM
             )
             AS stringlist
           ),
           CAST(
             COLLECT(
               id
               ORDER BY LENGTH( word ) DESC, UPPER( word ) ASC, ROWNUM
             )
             AS intlist
           )
    FROM   temp
  )
  SELECT t.ROWID rid,
         replace_words(
           word_list,
           id_list,
           sentence
         ) AS replaced_sentence
  FROM   temp t
         CROSS JOIN lists
) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
  UPDATE SET sentence = src.replaced_sentence;

Output:

SELECT * FROM temp;

ID | WORD                    | SENTENCE
-: | :---------------------- | :-------
 1 | automation testing      | automtestingation http://localhost/2/<u>TeStInG</u> http://localhost/2/<u>TEST</u> is popular kind of http://localhost/2/<u>testing</u>
 2 | testing                 | http://localhost/3/<u>manual testing</u>
 2 | test                    | http://localhost/2/<u>test</u>
 3 | manual testing          | this is an old method of http://localhost/2/<u>testing</u>
 4 | punctuation             | http://localhost/1/<u>automation testing</u>,http://localhost/3/<u>manual testing</u>,http://localhost/4/<u>punctuation</u>,automanual http://localhost/2/<u>testing</u>-http://localhost/2/<u>testing</u>
 5 | B-number analysis       | http://localhost/6/<u>B-number analysis table</u>
 6 | B-number analysis table | http://localhost/2/<u>testing</u> http://localhost/5/<u>B-number analysis</u>
 7 | Not Matched             | http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u> http://localhost/2/<u>testing</u>
 8 | ^[($                    | http://localhost/2/<u>testing</u> characters http://localhost/8/<u>^[($</u> that need escaping in a regular expression

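db<>fiddle here

【Discussion】:

Thanks, MT0. This works perfectly when there are only a few records. But on a table with 50k records it used up all the temp space. I added a new temporary tablespace as well; it consumed more than 32 GB and ran out of temp space again. Is there a way to tune this query to improve performance? Thanks again.
Hi MT0, it would be very helpful if you could help tune this code. The word should not be replaced if it is only a substring. For example, if the word is "way" and the sentence contains "two ways", the replacement should not happen. Likewise when the word is part of a URL (https://***"word"***.com). But when we have way's or way/ or way: or way, or way; etc.,
@Ana Please ask a new question with a minimal reproducible example, including the existing function; DDL and DML statements for sample data that demonstrate all the issues you are trying to cover; and what you have tried to solve them and what is wrong with it. Also note that StackOverflow is not a coding service, and we should not be building your application for you step by step.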

【Solution 2】:

While there is certainly a way to do this in a single SQL statement, I think this problem is better solved with a separate function:

create or replace function replace_words(p_word varchar2, p_sentence varchar2) return varchar2 is
    v_match_position number;
    v_match_count    number := 0;
    v_new_sentence   varchar2(4000) := p_sentence;
begin
    --Find all matches.
    loop
        --Find Nth case-insensitive match
        v_match_count := v_match_count + 1;
        v_match_position := regexp_instr(
            srcstr     => v_new_sentence,
            pattern    => p_word,
            occurrence => v_match_count,
            modifier   => 'i');

        exit when v_match_position = 0;

        --Insert text, instead of replacing, to keep the original case.
        v_new_sentence :=
            substr(v_new_sentence, 1, v_match_position - 1) || 'http://localhost/2/<u>'||
            substr(v_new_sentence, v_match_position, length(p_word)) || '</u>'||
            substr(v_new_sentence, v_match_position + length(p_word));

    end loop;

    return v_new_sentence;
end;
/

The SQL query is then:

select id, word, sentence, replace_words(word, sentence) from temp;
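The function above treats p_word as a regular expression with no word-boundary check (so testing also matches inside automtestingation), and the id 2 in the link is hard-coded. Below is a hedged sketch of a variant, hypothetically named link_word, that takes the id as a parameter, escapes regex metacharacters with the same expression used in the question's query, and keeps the sentence's case through a backreference. Like the function above, it applies one word at a time; linking every word against every sentence, with longer words taking priority, is what Solution 1's replace_words does.

```sql
CREATE OR REPLACE FUNCTION link_word(   -- hypothetical name
  p_id       IN NUMBER,
  p_word     IN VARCHAR2,
  p_sentence IN VARCHAR2
) RETURN VARCHAR2
IS
  -- Escape regex metacharacters so the word is matched literally
  -- (same escaping expression as the query in the question).
  v_escaped VARCHAR2(4000) := REGEXP_REPLACE( p_word, '([][)(}{|^$\.*+?])', '\\\1' );
BEGIN
  -- \1 = non-word character (or start) before the word
  -- \2 = the word exactly as it is written in the sentence
  -- \3 = non-word character (or end) after the word
  RETURN REGEXP_REPLACE(
           p_sentence,
           '(^|\W)(' || v_escaped || ')($|\W)',
           '\1http://localhost/' || p_id || '/<u>\2</u>\3',
           1, 0, 'i'
         );
END;
/

-- Example from the comments: the whole phrase is linked, with its original case.
SELECT link_word( 3, 'Manual testing', 'this is an old method of manual testing' ) FROM dual;
```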

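【Discussion】:

Thanks, Jon. But this function has to look for the exact word in the sentence and replace it. In your example the word is "manual testing", but "manual testing" does not appear in the sentence. If the word were just "testing", the replacement would be fine.
@Ana I modified the function to use only one word. From your original example, I thought the word column contained a list of words. Otherwise, your first example seems incorrect.
We need to check every word in the word column against every sentence. If any word is found in any sentence, we need to replace it with a link. For example: we have the word manual testing in the word column, and we need to replace manual testing in the sentence column. The expected result is localhost/3/manual testing. The expected result for id = 5 is localhost/1/automation testing, localhost/3/manual testing, localhost/4/punctuation, automanual localhost/2/testing-http://localhost/2/…>
@Ana I thought my first version did that? Could you edit the question and list the exact expected result for each value?
Your first version is fine; it's just that we need to look for an exact word match in the sentence. select replace_words('Manual testing', 'this is an old method of manual testing') from dual; returns "this is an old method of localhost/2/manual localhost/2/testing", but the expected result is "this is an old method of localhost/2/manual testing". Also, the number in the link should be the id of the respective word. The table has 50k records. Should we call this function in a loop to check all the words against the sentence?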