Extract the matched string using reg-ex: answers

【Question title】: Extract the matched string using reg-ex
【Posted】: 2016-04-06 00:49:11
【Question】:

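I searched for questions about Java regular expressions and found information on the Pattern and Matcher classes, which give you the groups of text surrounding the part matched by the reg-ex.

My requirement is different, though. I want to extract the actual text that the regular expression represents.

Example: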

Input text: ABC 22. XYZ
Regular expression: (.*) [0-9]* (.*)

Using the Pattern and Matcher classes (or any other way in Java), how can I get the text "22."? That is the text represented by the regular expression.

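【Comments】:

No, it isn't. The regex you provided matches the whole string, not just the ␠22.␠ part of it.
stackoverflow.com/questions/16517689/…
@amrut: Do you mean to ask how to get 22. using the (.*) [0-9]* (.*) pattern with the Matcher/Pattern classes? That is not possible, because the pattern also needs a dot after the digits and another capturing group around that subpattern (see the regex demo and code demo). Note that you can get it with the [0-9]+[.] pattern using Matcher#find().
@WiktorStribiżew, yes. I missed the dot.
Impressive response time, man! :)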
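
The suggestion from the comments can be sketched roughly as follows. This is an illustration rather than code from the thread: the class and helper names are made up, and the second pattern is only one possible way of adding the dot and an extra capturing group to the question's regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindNumberWithDot {
    // Hypothetical helper: first run of digits followed by a dot, or null if none.
    static String firstNumberWithDot(String input) {
        Matcher m = Pattern.compile("[0-9]+[.]").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: the question's pattern with a dot added and a
    // capturing group around the middle part (assumed variant, not from the thread).
    static String middleGroup(String input) {
        Matcher m = Pattern.compile("(.*) ([0-9]*[.]) (.*)").matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumberWithDot("ABC 22. XYZ")); // prints 22.
        System.out.println(middleGroup("ABC 22. XYZ"));        // prints 22.
    }
}
```

find() looks for the pattern anywhere in the input, whereas matches() requires the pattern to cover the whole input, which is why the second variant keeps the surrounding (.*) groups.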

Tags: java regex

【Solution 1】:

You can try the following regex¹:

.*?(\s*\d+\.\s+).*

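Using a graphical tool², you can see where the group sits in the regex:

[Image: diagram of the regex showing group #1]

To extract that group in Java, you can do the following: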

String input = "ABC 22. XYZ";

System.out.println(
    input.replaceAll(".*?(\\s*\\d+\\.\\s+).*", "$1")
);  // prints " 22. "

Here $1 is replaced with the contents of group #1.
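
One caveat with the replaceAll approach: when the pattern does not match, replaceAll returns the input unchanged, so a missing number is indistinguishable from a successful extraction. A minimal sketch of an alternative that makes the no-match case explicit, using the same regex with Matcher; the class name `ExtractGroup` and the method `extract` are hypothetical names chosen for this example:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractGroup {
    // Same regex as in the answer above, compiled once.
    private static final Pattern P = Pattern.compile(".*?(\\s*\\d+\\.\\s+).*");

    // Hypothetical helper: group #1 if the whole input matches, otherwise empty.
    static Optional<String> extract(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("ABC 22. XYZ").isPresent());   // prints true
        System.out.println(extract("no digits here").isPresent()); // prints false
        // replaceAll leaves a non-matching input untouched:
        System.out.println("no digits here".replaceAll(".*?(\\s*\\d+\\.\\s+).*", "$1")); // prints no digits here
    }
}
```

For the sample input, the extracted group is " 22. " including the surrounding spaces; call trim() on the result if only "22." is wanted.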

Notes

¹ Regex explanation:

NODE         EXPLANATION
------------------------------------------------------------------
  .*?        any character except \n (0 or more times
             (matching the least amount possible))
------------------------------------------------------------------
  (          group and capture to \1:
------------------------------------------------------------------
    \s*        whitespace (\n, \r, \t, \f, and " ") (0
               or more times (matching the most amount
               possible))
------------------------------------------------------------------
    \d+        digits (0-9) (1 or more times (matching
               the most amount possible))
------------------------------------------------------------------
    \.         '.'
------------------------------------------------------------------
    \s+        whitespace (\n, \r, \t, \f, and " ") (1
               or more times (matching the most amount
               possible))
------------------------------------------------------------------
  )          end of \1
------------------------------------------------------------------
  .*         any character except \n (0 or more times
             (matching the most amount possible))

² The tool used to produce the screenshot is Regexper.

【Comments】:

【Solution 2】:

Your capturing groups are off.

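import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture one or more digits followed by an optional dot.
Pattern p = Pattern.compile("(\\d+\\.?)");
Matcher m = p.matcher("ABC 22. XYZ");
if (m.find()) {
    System.out.println(m.group(1)); // group 1 is the parenthesized part; prints 22.
}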

With ( and ) you define a capturing group, which you can later retrieve from the matcher by its group index. Group 0 is always the entire match.
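
To make the difference between group 0 and group 1 visible, here is a small sketch with a pattern in which the two differ. The pattern `(\d+)\.` and the names `GroupIndexDemo` and `groups` are assumptions made for this illustration and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: returns {group 0, group 1} for the first match,
    // or an empty array when nothing matches.
    static String[] groups(String input) {
        // Only the digits are inside the parentheses; the dot is outside.
        Matcher m = Pattern.compile("(\\d+)\\.").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("ABC 22. XYZ");
        System.out.println(g[0]); // entire match: prints 22.
        System.out.println(g[1]); // first capturing group: prints 22
    }
}
```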

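【Comments】:

There is no need for a group here. You might as well use "\\d+\\.?" and get the value with m.group(0). So the point is not the capturing group at all, but the dot you chose to include.

【Solution 3】:

Your input has a dot after "22", but your regex does not account for it.

If there is only one number in your input, you can extract it like this: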

String number = input.replaceAll(".*?(\\d+).*", "$1");

This regex matches the (first) number, of any length, anywhere in the input, regardless of what the rest of the input is.
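
If the input may contain more than one number, the replaceAll call above still yields only the first one. A rough sketch of collecting every number with a Matcher#find() loop instead; the class name `AllNumbers`, the method `allNumbers`, and the sample input are made up for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllNumbers {
    // Hypothetical helper: every run of digits in the input, in order of appearance.
    static List<String> allNumbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "ABC 22. XYZ 7.";
        // The replaceAll approach keeps only the first number.
        System.out.println(input.replaceAll(".*?(\\d+).*", "$1")); // prints 22
        System.out.println(allNumbers(input));                     // prints [22, 7]
    }
}
```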

【Comments】: