The Regex Class in Detail

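The regular expression engine in the .NET Framework is represented by the Regex class. This engine is the central component of the .NET Framework regular expression object model.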

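You can use the regular expression engine in either of two ways:

By calling the static methods of the Regex class. The regular expression engine caches regular expressions that are used in static method calls, so repeated calls to static regular expression methods that use the same regular expression offer relatively good performance.
By instantiating a Regex object and passing a regular expression to its constructor. Regular expressions used by Regex instances are not cached, so avoid instantiating a Regex object multiple times with the same regular expression.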
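
As a minimal sketch of the two approaches, the following code validates the same input once through a static method and once through a Regex instance that is created a single time and reused. The class name RegexUsageSketch and its helper methods are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUsageSketch
{
   // Hypothetical field: one Regex instance, constructed once and reused.
   private static readonly Regex SsnRegex = new Regex(@"^\d{3}-\d{2}-\d{4}$");

   // Way 1: static method call; the engine caches the pattern between calls.
   public static bool IsSsnStatic(string value)
   {
      return Regex.IsMatch(value, @"^\d{3}-\d{2}-\d{4}$");
   }

   // Way 2: instance method call on the Regex object created above.
   public static bool IsSsnInstance(string value)
   {
      return SsnRegex.IsMatch(value);
   }

   public static void Main()
   {
      foreach (string value in new[] { "111-22-3333", "111-2-3333" })
         Console.WriteLine("{0}: static={1}, instance={2}",
                           value, IsSsnStatic(value), IsSsnInstance(value));
   }
}
// The sketch displays the following output:
//       111-22-3333: static=True, instance=True
//       111-2-3333: static=False, instance=False
```

Holding the instance in a static readonly field is one way to avoid constructing the same Regex object repeatedly.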

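You can call the methods of the Regex class to perform the following operations:

Determine whether a string matches a regular expression pattern.
Extract a single match or the first match.
Extract all matches.
Replace a matched substring.
Split a single string into an array of strings.

These operations are described in the following sections.

Matching a Regular Expression Pattern

The Regex.IsMatch method returns true if the string matches the pattern and false otherwise. For example, the following code checks whether a string matches the format of a valid United States Social Security number.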

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] values = { "111-22-3333", "111-2-3333" };
      string pattern = @"^\d{3}-\d{2}-\d{4}$";
      foreach (string value in values) {
         if (Regex.IsMatch(value, pattern))
            Console.WriteLine("{0} is a valid SSN.", value);
         else
            Console.WriteLine("{0}: Invalid", value);
      }
   }
}
// The example displays the following output:
//       111-22-3333 is a valid SSN.
//       111-2-3333: Invalid

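The regular expression pattern ^\d{3}-\d{2}-\d{4}$ is interpreted as shown in the following table.

Pattern	Description
^	Match the beginning of the input string.
\d{3}	Match three decimal digits.
-	Match a hyphen.
\d{2}	Match two decimal digits.
-	Match a hyphen.
\d{4}	Match four decimal digits.
$	Match the end of the input string.

Extracting a Single Match or the First Match

The Regex.Match method returns a Match object that describes the first substring that matches a regular expression pattern. If the Match.Success property is true, you can retrieve later matches by calling the Match.NextMatch method until Match.Success returns false. For example, the following code uses Regex.Match to find the first duplicated word in a string, and then calls Match.NextMatch to find any further occurrences.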

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is a a farm that that raises dairy cattle.";
      string pattern = @"\b(\w+)\W+(\1)\b";
      Match match = Regex.Match(input, pattern);
      while (match.Success)
      {
         Console.WriteLine("Duplicate '{0}' found at position {1}.",  
                           match.Groups[1].Value, match.Groups[2].Index);
         match = match.NextMatch();
      }                       
   }
}
// The example displays the following output:
//       Duplicate 'a' found at position 10.
//       Duplicate 'that' found at position 22.

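The regular expression pattern \b(\w+)\W+(\1)\b is interpreted as shown in the following table.

Pattern	Description
\b	Begin the match at a word boundary.
(\w+)	Match one or more word characters. This is the first capturing group.
\W+	Match one or more non-word characters.
(\1)	Match the string captured by the first group. This is the second capturing group.
\b	End the match at a word boundary.

Extracting All Matches

The Regex.Matches method returns a MatchCollection object that contains all the matches the regular expression engine found in the input string. For example, the previous example can be rewritten to call Regex.Matches instead of the Match and NextMatch methods.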

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is a a farm that that raises dairy cattle.";
      string pattern = @"\b(\w+)\W+(\1)\b";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Duplicate '{0}' found at position {1}.",  
                           match.Groups[1].Value, match.Groups[2].Index);
   }
}
// The example displays the following output:
//       Duplicate 'a' found at position 10.
//       Duplicate 'that' found at position 22.

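Replacing a Matched Substring

The Regex.Replace method replaces each substring that matches the regular expression pattern with a specified string or replacement pattern, and returns the entire input string with the replacements applied. For example, the following code adds a U.S. currency symbol before a decimal number in a string.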

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\d+\.\d{2}\b";
      string replacement = "$$$&";
      string input = "Total Cost: 103.64";
      Console.WriteLine(Regex.Replace(input, pattern, replacement));     
   }
}
// The example displays the following output:
//       Total Cost: $103.64

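The regular expression pattern \b\d+\.\d{2}\b is interpreted as shown in the following table.

Pattern	Description
\b	Begin the match at a word boundary.
\d+	Match one or more decimal digits.
\.	Match a period.
\d{2}	Match two decimal digits.
\b	End the match at a word boundary.

The replacement pattern $$$& is interpreted as shown in the following table.

Pattern	Replacement string
$$	The dollar sign ($) character.
$&	The entire matched substring.

Splitting a Single String into an Array of Strings

The Regex.Split method splits the input string at the positions defined by a regular expression match. For example, the following code places the items of a numbered list into a string array.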

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "1. Eggs 2. Bread 3. Milk 4. Coffee 5. Tea";
      string pattern = @"\b\d{1,2}\.\s";
      foreach (string item in Regex.Split(input, pattern))
      {
         if (!String.IsNullOrEmpty(item))
            Console.WriteLine(item);
      }      
   }
}
// The example displays the following output:
//       Eggs
//       Bread
//       Milk
//       Coffee
//       Tea

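The regular expression pattern \b\d{1,2}\.\s is interpreted as shown in the following table.

Pattern	Description
\b	Begin the match at a word boundary.
\d{1,2}	Match one or two decimal digits.
\.	Match a period.
\s	Match a white-space character.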
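
To see why the example filters out empty items, it can help to print the raw array that Regex.Split returns. The sketch below uses a shortened input string, and SplitSketch and SplitList are hypothetical names used only for illustration. Because the first match sits at the very start of the input, the first array element is an empty string. The space that precedes each later number is not part of the match, so it stays attached to the preceding item.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitSketch
{
   // Hypothetical helper: returns the unfiltered result of Regex.Split.
   public static string[] SplitList(string input)
   {
      return Regex.Split(input, @"\b\d{1,2}\.\s");
   }

   public static void Main()
   {
      string[] parts = SplitList("1. Eggs 2. Bread 3. Milk");
      for (int ctr = 0; ctr < parts.Length; ctr++)
         Console.WriteLine("[{0}] '{1}'", ctr, parts[ctr]);
   }
}
// The sketch displays the following output:
//       [0] ''
//       [1] 'Eggs '
//       [2] 'Bread '
//       [3] 'Milk'
```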

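The MatchCollection and Match Objects

Regex methods return two objects that are part of the regular expression object model: the MatchCollection object and the Match object.

The Match Collection

The Regex.Matches method returns a MatchCollection object that contains Match objects for all the matches the engine found, in the order in which they occur in the input string. If there are no matches, the collection has no members. The MatchCollection.Item property gives access to individual members by index, from zero to one less than the value of MatchCollection.Count. Item is the collection's indexer (in C#) and default property (in Visual Basic).

By default, Regex.Matches populates the MatchCollection object lazily. Accessing members that need the fully populated collection, such as MatchCollection.Count and MatchCollection.Item, may carry a performance cost, so prefer enumerating the collection with a language construct such as For Each (in Visual Basic) or foreach (in C#).

The following example uses the Regex.Matches method to populate a MatchCollection object with all the matches found in an input string. It enumerates the collection, copies the matches to a string list, and records the character positions in an integer list.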

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      MatchCollection matches;
      List<string> results = new List<string>();
      List<int> matchposition = new List<int>();

      // Create a new Regex object and define the regular expression.
      Regex r = new Regex("abc");
      // Use the Matches method to find all matches in the input string.
      matches = r.Matches("123abc4abcd");
      // Enumerate the collection to retrieve all matches and positions.
      foreach (Match match in matches)
      {
         // Add the match string to the results list.
         results.Add(match.Value);
         // Record the character position where the match was found.
         matchposition.Add(match.Index);
      }
      // List the results.
      for (int ctr = 0; ctr < results.Count; ctr++)
         Console.WriteLine("'{0}' found at position {1}.",
                           results[ctr], matchposition[ctr]);
   }
}
// The example displays the following output:
//       'abc' found at position 3.
//       'abc' found at position 7.
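
For comparison, the same positions can be read through the collection's indexer and Count property instead of an enumerator. This is a minimal sketch; MatchIndexSketch and Positions are hypothetical names. Reading Count requires the collection to be fully populated first, so this style gives up the benefit of lazy evaluation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchIndexSketch
{
   // Hypothetical helper: collects match positions by index rather than by enumeration.
   public static int[] Positions(string input, string pattern)
   {
      MatchCollection matches = Regex.Matches(input, pattern);
      // Reading Count requires every match to be found up front.
      int[] positions = new int[matches.Count];
      for (int ctr = 0; ctr < matches.Count; ctr++)
         positions[ctr] = matches[ctr].Index;
      return positions;
   }

   public static void Main()
   {
      foreach (int position in Positions("123abc4abcd", "abc"))
         Console.WriteLine("'abc' found at position {0}.", position);
   }
}
// The sketch displays the following output:
//       'abc' found at position 3.
//       'abc' found at position 7.
```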

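The Match Class

The Match class represents the result of a single regular expression match. You can access Match objects in two ways:

By retrieving them from the MatchCollection object returned by the Regex.Matches method. To retrieve individual Match objects, iterate the collection with a foreach (in C#) or For Each...Next (in Visual Basic) construct, or use the MatchCollection.Item property to retrieve a specific Match object by index. Iterating by index from zero to one less than the number of objects also works, but it does not benefit from lazy evaluation because it accesses the MatchCollection.Count property.

The following example retrieves individual Match objects from a MatchCollection object by iterating the collection with foreach. The regular expression simply matches the string "abc" in the input string.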

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "abc";
      string input = "abc123abc456abc789";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("{0} found at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       abc found at position 0.
//       abc found at position 6.
//       abc found at position 12.

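By calling the Regex.Match method, which returns a Match object that represents the first match in a string or a portion of a string. You can determine whether a match was found by checking the Match.Success property. To retrieve later matches, call the Match.NextMatch method repeatedly until the Success property of the returned Match object is false.

The following example uses the Regex.Match and Match.NextMatch methods to match the string "abc" in the input string.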

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "abc";
      string input = "abc123abc456abc789";
      Match match = Regex.Match(input, pattern);
      while (match.Success)
      {
         Console.WriteLine("{0} found at position {1}.", 
                           match.Value, match.Index);
         match = match.NextMatch();                  
      }                     
   }
}
// The example displays the following output:
//       abc found at position 0.
//       abc found at position 6.
//       abc found at position 12.

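Two properties of the Match class return collection objects:

The Match.Groups property returns a GroupCollection object that contains information about the substrings that match capturing groups in the regular expression pattern.
The Match.Captures property returns a CaptureCollection object that is of limited use. For a successful match, it contains a single Capture object that carries the same information as the Match object.

For more information about these objects, see the Group Collection and Capture Collection sections.

The Match.Value property returns the substring of the input string that matches the regular expression pattern, and the Match.Index property returns the zero-based starting position of the matched string in the input string.

The Match class also has two pattern-matching methods:

The Match.NextMatch method finds the match that follows the one represented by the current Match object and returns a Match object for it.
The Match.Result method performs a specified replacement operation on the matched string and returns the result.

The following example uses the Match.Result method to prepend a $ symbol and a space to every number that has two fractional digits.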

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\d+(,\d{3})*\.\d{2}\b";
      string input = "16.32\n194.03\n1,903,672.08";

      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Result("$$ $&"));
   }
}
// The example displays the following output:
//       $ 16.32
//       $ 194.03
//       $ 1,903,672.08

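The regular expression pattern \b\d+(,\d{3})*\.\d{2}\b is defined as shown in the following table.

Pattern	Description
\b	Begin the match at a word boundary.
\d+	Match one or more decimal digits.
(,\d{3})*	Match zero or more occurrences of a comma followed by three decimal digits.
\.	Match the decimal point character.
\d{2}	Match two decimal digits.
\b	End the match at a word boundary.

The replacement pattern $$ $& indicates that the matched substring should be replaced by a dollar sign ($) (the $$ pattern), a space, and the value of the match (the $& pattern).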

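The Group Collection

The Match.Groups property returns a GroupCollection object that contains Group objects representing the captured groups in a single match. The first Group object in the collection (at index 0) represents the entire match. Each object that follows represents the result of a single capturing group.

You can retrieve individual Group objects with the GroupCollection.Item property: unnamed groups by their ordinal position, and named groups by either name or ordinal position. Unnamed captures come first and are indexed from left to right in the order in which they appear in the regular expression pattern. Named captures are indexed after unnamed captures, from left to right in the order in which they appear in the pattern.

GroupCollection.Item is the collection's indexer in C# and its default property in Visual Basic, so individual Group objects can be accessed by index (or by name, for named groups) as follows:

Group group = match.Groups[ctr];

The following example defines a regular expression that uses grouping constructs to capture the month, day, and year of a date.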

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\w+)\s(\d{1,2}),\s(\d{4})\b";
      string input = "Born: July 28, 1989";
      Match match = Regex.Match(input, pattern);
      if (match.Success)
         for (int ctr = 0; ctr < match.Groups.Count; ctr++)
            Console.WriteLine("Group {0}: {1}", ctr, match.Groups[ctr].Value);
   }
}
// The example displays the following output:
//       Group 0: July 28, 1989
//       Group 1: July
//       Group 2: 28
//       Group 3: 1989

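The regular expression pattern \b(\w+)\s(\d{1,2}),\s(\d{4})\b is defined as shown in the following table.

Pattern	Description
\b	Begin the match at a word boundary.
(\w+)	Match one or more word characters. This is the first capturing group.
\s	Match a white-space character.
(\d{1,2})	Match one or two decimal digits. This is the second capturing group.
,	Match a comma.
\s	Match a white-space character.
(\d{4})	Match four decimal digits. This is the third capturing group.
\b	End the match at a word boundary.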
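
The ordering rule for named and unnamed groups can be checked with a variation of the date pattern in which only the day group is given a name. This is a sketch under that assumption: the group name day and the names GroupNumberingSketch and Describe are illustrative and not part of the original example. Because unnamed groups are numbered first, the year group becomes group 2 and the named day group becomes group 3, even though the day appears earlier in the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupNumberingSketch
{
   // The date pattern with the day group named "day" (an illustrative change).
   private static readonly Regex DateRegex =
      new Regex(@"\b(\w+)\s(?<day>\d{1,2}),\s(\d{4})\b");

   // Hypothetical helper: lists each group's number, name, and captured value.
   public static string Describe(string input)
   {
      Match match = DateRegex.Match(input);
      List<string> lines = new List<string>();
      foreach (int number in DateRegex.GetGroupNumbers())
         lines.Add(String.Format("Group {0} ({1}): {2}", number,
                                 DateRegex.GroupNameFromNumber(number),
                                 match.Groups[number].Value));
      return String.Join("\n", lines.ToArray());
   }

   public static void Main()
   {
      Console.WriteLine(Describe("Born: July 28, 1989"));
   }
}
// The sketch displays the following output:
//       Group 0 (0): July 28, 1989
//       Group 1 (1): July
//       Group 2 (2): 1989
//       Group 3 (day): 28
```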

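The Captured Group

The Group class represents the result of a single capturing group. Group objects for the capturing groups defined in a regular expression are returned by the Item property of the GroupCollection object that the Match.Groups property returns. You can also retrieve individual members by iterating the collection with the foreach or For Each construct. For an example, see the previous section.

The following example uses nested grouping constructs to capture substrings into groups. The regular expression pattern (a(b))c matches the string "abc". It assigns the substring "ab" to the first capturing group and the substring "b" to the second capturing group.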

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      List<int> matchposition = new List<int>();
      List<string> results = new List<string>();
      // Define substrings abc, ab, b.
      Regex r = new Regex("(a(b))c");
      Match m = r.Match("abdabc");
      // Walk the groups by count instead of stopping at the first empty value.
      for (int i = 0; i < m.Groups.Count; i++)
      {
         // Add groups to the results list.
         results.Add(m.Groups[i].Value);
         // Record character position.
         matchposition.Add(m.Groups[i].Index);
      }

      // Display the capture groups.
      for (int ctr = 0; ctr < results.Count; ctr++)
         Console.WriteLine("{0} at position {1}",
                           results[ctr], matchposition[ctr]);
   }
}
// The example displays the following output:
//       abc at position 3
//       ab at position 3
//       b at position 4

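The following example uses named grouping constructs to capture substrings from a string that contains data in the format "DATANAME:VALUE", which the regular expression splits at the colon (:).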

Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
Match m = r.Match("Section1:119900");
Console.WriteLine(m.Groups["name"].Value);
Console.WriteLine(m.Groups["value"].Value);
// The example displays the following output:
//       Section1
//       119900

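The regular expression pattern ^(?<name>\w+):(?<value>\w+) is defined as shown in the following table.

Pattern	Description
^	Begin the match at the beginning of the input string.
(?<name>\w+)	Match one or more word characters. The name of this capturing group is name.
:	Match a colon.
(?<value>\w+)	Match one or more word characters. The name of this capturing group is value.

The properties of the Group class provide information about the captured group: Group.Value contains the captured substring, Group.Index indicates the starting position of the captured group in the input text, Group.Length contains the length of the captured text, and Group.Success indicates whether a substring matched the pattern defined by the capturing group.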
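
As a small sketch, the Success, Value, Index, and Length properties can be read for the value group of the "Section1:119900" input from the preceding example. GroupPropertiesSketch and FindValue are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupPropertiesSketch
{
   // Hypothetical helper: returns the Group captured by the named group "value".
   public static Group FindValue(string input)
   {
      Match match = Regex.Match(input, @"^(?<name>\w+):(?<value>\w+)");
      return match.Groups["value"];
   }

   public static void Main()
   {
      Group value = FindValue("Section1:119900");
      Console.WriteLine("Success: {0}", value.Success);
      Console.WriteLine("Value:   {0}", value.Value);
      Console.WriteLine("Index:   {0}", value.Index);
      Console.WriteLine("Length:  {0}", value.Length);
   }
}
// The sketch displays the following output:
//       Success: True
//       Value:   119900
//       Index:   9
//       Length:  6
```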

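Applying quantifiers to a group changes the relationship of one capture per capturing group in two ways:

If the * or *? quantifier (which specifies zero or more matches) is applied to a group, the capturing group may have no match in the input string. When there is no captured text, the properties of the Group object are set as shown in the following table.

Group property	Value
Success	false
Value	String.Empty
Length	0

The following example illustrates this. In the regular expression pattern aaa(bbb)*ccc, the first capturing group (the substring "bbb") can be matched zero or more times. Because the input string "aaaccc" matches the pattern without containing "bbb", the capturing group has no match.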

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "aaa(bbb)*ccc";
      string input = "aaaccc";
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Match value: {0}", match.Value);
      if (match.Groups[1].Success)
         Console.WriteLine("Group 1 value: {0}", match.Groups[1].Value);
      else
         Console.WriteLine("The first capturing group has no match.");
   }
}
// The example displays the following output:
//       Match value: aaaccc
//       The first capturing group has no match.

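Quantifiers can match multiple occurrences of the pattern defined by a capturing group. In that case, the Value and Length properties of the Group object hold information only about the last captured substring. For example, the following regular expression matches a single sentence that ends in a period. It uses two grouping constructs: the first captures individual words along with a white-space character, and the second captures individual words. As the output shows, although the regular expression succeeds in capturing the entire sentence, the second capturing group captures only the last word.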

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b((\w+)\s?)+\.";
      string input = "This is a sentence. This is another sentence.";
      Match match = Regex.Match(input, pattern);
      if (match.Success)
      {
         Console.WriteLine("Match: " + match.Value);
         Console.WriteLine("Group 2: " + match.Groups[2].Value);
      }   
   }
}
// The example displays the following output:
//       Match: This is a sentence.
//       Group 2: sentence

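The Capture Collection

The Group object contains information only about the last capture. The full set of captures made by a capturing group is still available from the CaptureCollection object returned by the Group.Captures property. Each member of the collection is a Capture object that represents one capture made by that group, in the order in which the captures were made. You can retrieve individual Capture objects from the collection in either of two ways:

By iterating through the collection with a construct such as foreach (in C#) or For Each (in Visual Basic).
By using the CaptureCollection.Item property to retrieve a specific object by index. The Item property is the CaptureCollection object's default property (in Visual Basic) or indexer (in C#).

If no quantifier is applied to a capturing group, the CaptureCollection object contains a single Capture object that represents the same capture as its Group object. If a quantifier is applied to a capturing group, the CaptureCollection object contains all the captures made by the group, and its last member represents the same capture as the Group object.

For example, if you use the regular expression pattern ((a(b))c)+ (where the + quantifier specifies one or more matches) to capture matches from the string "abcabcabc", the CaptureCollection object for each capturing group contains three members.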

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "((a(b))c)+";
      string input = "abcabcabc";

      Match match = Regex.Match(input, pattern);
      if (match.Success)
      {
         Console.WriteLine("Match: '{0}' at position {1}",
                           match.Value, match.Index);
         GroupCollection groups = match.Groups;
         for (int ctr = 0; ctr < groups.Count; ctr++) {
            Console.WriteLine("   Group {0}: '{1}' at position {2}",
                              ctr, groups[ctr].Value, groups[ctr].Index);
            CaptureCollection captures = groups[ctr].Captures;
            for (int ctr2 = 0; ctr2 < captures.Count; ctr2++) {
               Console.WriteLine("      Capture {0}: '{1}' at position {2}", 
                                 ctr2, captures[ctr2].Value, captures[ctr2].Index);
            }                     
         }
      }
   }
}
// The example displays the following output:
//       Match: 'abcabcabc' at position 0
//          Group 0: 'abcabcabc' at position 0
//             Capture 0: 'abcabcabc' at position 0
//          Group 1: 'abc' at position 6
//             Capture 0: 'abc' at position 0
//             Capture 1: 'abc' at position 3
//             Capture 2: 'abc' at position 6
//          Group 2: 'ab' at position 6
//             Capture 0: 'ab' at position 0
//             Capture 1: 'ab' at position 3
//             Capture 2: 'ab' at position 6
//          Group 3: 'b' at position 7
//             Capture 0: 'b' at position 1
//             Capture 1: 'b' at position 4
//             Capture 2: 'b' at position 7

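The following example uses the regular expression (Abc)+ to find one or more consecutive runs of the string "Abc" in the string "XYZAbcAbcAbcXYZAbcAb". It illustrates using the Group.Captures property to return multiple groups of captured substrings.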

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      int counter;
      Match m;
      CaptureCollection cc;
      GroupCollection gc;

      // Look for groupings of "Abc".
      Regex r = new Regex("(Abc)+");
      // Define the string to search.
      m = r.Match("XYZAbcAbcAbcXYZAbcAb");
      gc = m.Groups;

      // Display the number of groups.
      Console.WriteLine("Captured groups = " + gc.Count.ToString());

      // Loop through each group.
      for (int i = 0; i < gc.Count; i++)
      {
         cc = gc[i].Captures;
         counter = cc.Count;

         // Display the number of captures in this group.
         Console.WriteLine("Captures count = " + counter.ToString());

         // Loop through each capture in the group.
         for (int ii = 0; ii < counter; ii++)
         {
            // Display the capture and its position.
            Console.WriteLine(cc[ii] + "   Starts at character " +
                 cc[ii].Index);
         }
      }
   }
}
// The example displays the following output:
//       Captured groups = 2
//       Captures count = 1
//       AbcAbcAbc   Starts at character 3
//       Captures count = 3
//       Abc   Starts at character 3
//       Abc   Starts at character 6
//       Abc   Starts at character 9

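The Individual Capture

The Capture class contains the result of a single subexpression capture. The Capture.Value property contains the matched text, and the Capture.Index property indicates the zero-based position in the input string at which the matched substring begins.

The following example parses an input string for the temperatures of selected cities. A comma (",") separates a city from its temperature, and a semicolon (";") separates each city's data. The entire input string represents a single match. In the regular expression pattern ((\w+(\s\w+)*),(\d+);)+ used to parse the string, the city name is assigned to the second capturing group and the temperature to the fourth capturing group.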

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "Miami,78;Chicago,62;New York,67;San Francisco,59;Seattle,58;";
      string pattern = @"((\w+(\s\w+)*),(\d+);)+";
      Match match = Regex.Match(input, pattern);
      if (match.Success)
      {
         Console.WriteLine("Current temperatures:");
         for (int ctr = 0; ctr < match.Groups[2].Captures.Count; ctr++)
            Console.WriteLine("{0,-20} {1,3}", match.Groups[2].Captures[ctr].Value, 
                              match.Groups[4].Captures[ctr].Value);
      }
   }
}
// The example displays the following output:
//       Current temperatures:
//       Miami                 78
//       Chicago               62
//       New York              67
//       San Francisco         59
//       Seattle               58

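The regular expression is defined as shown in the following table.

Pattern	Description
\w+	Match one or more word characters.
(\s\w+)*	Match zero or more occurrences of a white-space character followed by one or more word characters. This pattern matches multi-word city names. This is the third capturing group.
(\w+(\s\w+)*)	Match one or more word characters followed by zero or more occurrences of a white-space character and one or more word characters. This is the second capturing group.
,	Match a comma.
(\d+)	Match one or more digits. This is the fourth capturing group.
;	Match a semicolon.
((\w+(\s\w+)*),(\d+);)+	Match a word followed by any additional words, a comma, one or more digits, and a semicolon, one or more times. This is the first capturing group.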