How to remove part of string that includes special characters by RegEx or replaceAll? Answers

【Question title】: How to remove part of string that includes special characters by RegEx or replaceAll?
【Posted】: 2016-08-20 15:59:37
【Question description】:

Here are the strings:

1. "AAA BBB  CCCCC CCCCCCC"
2. "  AAA              BBB  DDDD DDDD DDDDD"
3. "    EEE         FFF  GGGGG GGGGG"

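The amount of whitespace at the start of the string and between the first and second words can vary. So I need a regular expression that removes everything before the third word, so that it always returns "CCCCC CCCCCCC", "DDDD DDDD DDDDD", or "GGGGG GGGGG". I assume this can be done with a RegEx rather than by parsing out every word in the string.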

【Question comments】:

regex101.com/r/yU4cS2/1
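You can't just dump your requirements here and have the work done for you. Show your effort.
@rock321987 - post that as an answer. It is exactly what was asked for.
This doesn't quite make sense. If the string has only 2 words, should the result be empty? In any case, just replace ^\s*(?:\S+(?:\s+|$)){2} with an empty string.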
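
The replacement suggested in the last comment could be sketched in Java roughly as follows. This is a minimal illustration, not code from the thread: the class name `StripLeadingWords` and the helper `stripFirstTwoWords` are made up for the example. Because the pattern is anchored with `^`, `replaceFirst` and `replaceAll` should behave the same on a single-line input.

```java
import java.util.regex.Pattern;

class StripLeadingWords {
    // Pattern from the comment: optional leading whitespace, then two runs of
    // non-whitespace characters, each followed by whitespace or end of input.
    private static final Pattern FIRST_TWO_WORDS =
            Pattern.compile("^\\s*(?:\\S+(?:\\s+|$)){2}");

    // Hypothetical helper: removes the first two words and the whitespace around them.
    static String stripFirstTwoWords(String input) {
        return FIRST_TWO_WORDS.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        String[] samples = {
            "AAA BBB  CCCCC CCCCCCC",
            "  AAA              BBB  DDDD DDDD DDDDD",
            "    EEE         FFF  GGGGG GGGGG"
        };
        for (String s : samples) {
            // Brackets make any leftover whitespace visible
            System.out.println("[" + stripFirstTwoWords(s) + "]");
        }
    }
}
```

As the commenter points out, a string with exactly two words comes back empty with this pattern, which may or may not be what the asker wants.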

Tags: java regex

【Solution 1】:

This regex will work:

\s*\w+\s+\w+\s+(.+$)


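Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// (?m) turns on MULTILINE mode so that $ matches at the end of every line
String pattern = "(?m)\\s*\\w+\\s+\\w+\\s+(.+$)";
String line = "AAA BBB  CCCCC CCCCCCC\n  AAA              BBB  DDDD DDDD DDDDD\n    EEE         FFF  GGGGG GGGGG";

Pattern r = Pattern.compile(pattern);

Matcher m = r.matcher(line);
while (m.find()) {
    // group(1) holds everything from the third word to the end of the line
    System.out.println("Found value: " + m.group(1));
}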


【Comments】:

【Solution 2】:

You need to use a capturing group to extract the data you want:

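import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

String result = null;

try {
    // Note: inside a character class, | is a literal pipe, not alternation
    Pattern regex = Pattern.compile("\\s*\\w+\\s*\\w+\\s*([\\w| ]+)");
    Matcher regexMatcher = regex.matcher("  AAA              BBB  DDDD DDDD DDDDD");
    if (regexMatcher.find()) {
        result = regexMatcher.group(1); // result = "DDDD DDDD DDDDD"
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}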

Regex explanation

"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\w" +           // Match a single character that is a “word character” (letters, digits, and underscores)
   "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"\\s" +           // Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   "*" +            // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"(" +            // Match the regular expression below and capture its match into backreference number 1
   "[\\w| ]" +       // Match a single character present in the list below
                       // A word character (letters, digits, and underscores)
                       // One of the characters “| ”
      "+" +            // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
")"

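One caveat about the pattern explained above: because the separators between words use \s* (zero or more) rather than \s+ (one or more), the regex engine is free to split a single word in two when the input has fewer than three words. The sketch below compares the original pattern with a stricter variant; the stricter variant, the class name `WhitespaceQuantifierCheck` and the helper `extract` are assumptions made for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceQuantifierCheck {
    // Pattern exactly as given in the answer
    static final String LENIENT = "\\s*\\w+\\s*\\w+\\s*([\\w| ]+)";
    // Assumed stricter variant: whitespace is required between words,
    // and the literal | is dropped from the character class
    static final String STRICT = "\\s*\\w+\\s+\\w+\\s+([\\w ]+)";

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String threeWords = "  AAA              BBB  DDDD DDDD DDDDD";
        String twoWords = "AB CD";

        System.out.println(extract(LENIENT, threeWords)); // DDDD DDDD DDDDD
        System.out.println(extract(STRICT, threeWords));  // DDDD DDDD DDDDD
        System.out.println(extract(LENIENT, twoWords));   // D (the word "CD" gets split)
        System.out.println(extract(STRICT, twoWords));    // null (no third word, no match)
    }
}
```

Both patterns agree on the asker's sample strings; they only differ on inputs with fewer than three words.
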
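【Comments】:

I also need to remove the first and second words.

【Solution 3】:

Similar to @rock321987's answer, you can modify the regex to use a quantifier so that it skips any number of leading words you don't want.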

\s*(?:\w+\s+){2}(.+$)


Or in Java:

"\\s*(?:\\w+\\s+){2}(.+$)"

?: makes the pattern inside () a non-capturing group. The number in { } is how many words (each followed by whitespace) you want to skip.
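
To show how the number in { } can be varied, here is a minimal sketch that builds the pattern for an arbitrary word count. The class name `SkipLeadingWords` and the helper `skipWords` are hypothetical names chosen for this example; only the regex itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipLeadingWords {
    // Hypothetical helper: returns the text that follows the first n words,
    // or null when the input does not match the pattern.
    static String skipWords(String input, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Same regex as in the answer, with the quantifier filled in from n
        Pattern p = Pattern.compile("\\s*(?:\\w+\\s+){" + n + "}(.+$)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "AAA BBB  CCCCC CCCCCCC";
        System.out.println(skipWords(s, 2)); // CCCCC CCCCCCC
        System.out.println(skipWords(s, 1)); // BBB  CCCCC CCCCCCC
    }
}
```

Compiling the pattern on every call keeps the sketch short; if the word count is fixed, compiling it once into a constant would likely be preferable.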

【Comments】: