Regex to match the <Key>....<Value> pattern: answers

【Question title】: regex to match <Key>....<Value> pattern
【Posted】: 2010-07-06 07:36:02
【Question】:

I have the following data, sent by an external system, which needs to be parsed for a specific key:

<ContextDetails>
<Context><Key>ID</Key><Value>100</Value></Context>
<Context><Key>Name</Key><Value>MyName</Value></Context>
</ContextDetails>

I tried to parse it with the following regular expression to get the value for the key Name:

<Context><Key>Name</Key><Value>.</Value></Context>

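But the result is blank.

What do I need to change to fix this regular expression?

【Comments】:

You shouldn't use a regular expression for this.
That doesn't look like a regex to me. Which language's regex flavor are you using? Java? .NET? JavaScript? Perl? Ruby? Something else?
Looks like a perfect job for an XML parser.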

Tags: .net xml regex parsing

【Solution 1】:

If this is XML, load it into an XDocument and query it.

See @Jens's answer for details on how to do this.

【Comments】:

【Solution 2】:

To expand on Oded's answer, this is how you should do it:

using System.Linq;
using System.Xml.Linq;

XDocument doc = XDocument.Parse(@"<ContextDetails>
<Context><Key>ID</Key><Value>100</Value></Context>
<Context><Key>Name</Key><Value>MyName</Value></Context>
</ContextDetails>");

// Single() throws unless exactly one <Context> has the key "Name".
string name = doc.Root.Elements("Context")
                      .Where(xe => xe.Element("Key").Value == "Name")
                      .Single()
                      .Element("Value").Value;

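【Comments】:

【Solution 3】:

It seems to me you are going about this the wrong way. You should use an XML parser. http://www.tutorialspoint.com/ruby/ruby_xml_xslt.htm is only a guide, but it may help.

【Comments】:

【Solution 4】:

I think the regular expression that matches all key-value pairs is: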

<Context>\s*?<Key>(.*?)\</Key>\s*?<Value>(.*?)</Value>\s*?</Context>

Explanation:

// <Context>\s*?<Key>(.*?)\</Key>\s*?<Value>(.*?)</Value>\s*?</Context>
// 
// Match the characters "<Context>" literally «<Context>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "<Key>" literally «<Key>»
// Match the regular expression below and capture its match into backreference number 1 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the character "<" literally «\<»
// Match the characters "/Key>" literally «/Key>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "<Value>" literally «<Value>»
// Match the regular expression below and capture its match into backreference number 2 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "</Value>" literally «</Value>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "</Context>" literally «</Context>»

Usage:

using System;
using System.Text.RegularExpressions;

public static class Snippet
{
    public static void RunSnippet()
    {
        // Group 1 captures the key, group 2 captures the value.
        Regex regexObj = new Regex("<Context>\\s*?<Key>(.*?)\\</Key>\\s*?<Value>(.*?)</Value>\\s*?</Context>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        Match matchResults = regexObj.Match(@"<ContextDetails>
            <Context><Key>ID</Key><Value>100</Value></Context>
            <Context><Key>Name</Key>   <Value>MyName</Value></Context>
            </ContextDetails>
            ");
        // Walk through every match in the input.
        while (matchResults.Success)
        {
            Console.WriteLine("Key: " + matchResults.Groups[1].Value);
            Console.WriteLine("Value: " + matchResults.Groups[2].Value);
            Console.WriteLine("----");
            matchResults = matchResults.NextMatch();
        }
    }
    /*
    Output:

        Key: ID
        Value: 100
        ----
        Key: Name
        Value: MyName
        ----
    */
}

A regular expression that extracts only the value for the key "Name":

<Context>\s*?<Key>Name</Key>\s*?<Value>(.*?)</Value>\s*?</Context>

Explanation:

// <Context>\s*?<Key>Name</Key>\s*?<Value>(.*?)</Value>\s*?</Context>
// 
// Match the characters "<Context>" literally «<Context>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "<Key>Name</Key>" literally «<Key>Name</Key>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "<Value>" literally «<Value>»
// Match the regular expression below and capture its match into backreference number 1 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "</Value>" literally «</Value>»
// Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s*?»
//    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the characters "</Context>" literally «</Context>»

Usage:

string SubjectString = @"<ContextDetails>
            <Context><Key>ID</Key><Value>100</Value></Context>
            <Context><Key>Name</Key>   <Value>MyName</Value></Context>
            </ContextDetails>
            ";
    Console.WriteLine( Regex.Match(SubjectString, "<Context>\\s*?<Key>Name</Key>\\s*?<Value>(.*?)</Value>\\s*?</Context>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline).Groups[1].Value );

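【Comments】:

Wow, that's quite an explanation! =) May I ask whether you used a generator to do this for you? That would come in handy!
RegExBuddy generated the explanation. It is a regular expression editor with a debugger. (URL: regexbuddy.com)

【Solution 5】:

Can you use an XML parser? If so, use it; it is the right tool for the job.

If all you have is a text editor and you are willing to check each match by hand, you can use a regex. The error in your regex is that . matches exactly one character (any character except a line break). So you need to replace it with .*? (which matches any number of characters, but as few as possible) or, better, [^<]*.

The latter means "zero or more characters other than <" (which is the delimiter). Of course, this only works if the value you are looking for contains no <.

Your regex also assumes that the whole match is on one line with no whitespace between the tags, so it fails in all other cases.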
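
As a rough illustration of the points above, the following sketch runs the pattern from the question and a corrected one against the sample data. The class name ValueRegexDemo and the helper ExtractName are made up for this example, and the \s* between tags is an added assumption to tolerate whitespace; it is not part of the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValueRegexDemo
{
    // Sample input copied from the question.
    public const string Input =
        "<ContextDetails>\n" +
        "<Context><Key>ID</Key><Value>100</Value></Context>\n" +
        "<Context><Key>Name</Key><Value>MyName</Value></Context>\n" +
        "</ContextDetails>";

    // Pattern from the question: "." consumes exactly one character,
    // so the six-character value "MyName" can never match.
    public const string SingleDot =
        "<Context><Key>Name</Key><Value>.</Value></Context>";

    // Suggested fix: [^<]* takes everything up to the next "<".
    // The \s* parts (an assumption of this sketch) allow whitespace between tags.
    public const string Fixed =
        @"<Context>\s*<Key>Name</Key>\s*<Value>([^<]*)</Value>\s*</Context>";

    // Hypothetical helper: returns capture group 1, or null if there is no match.
    public static string ExtractName(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success && m.Groups.Count > 1 ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch(Input, SingleDot)); // False
        Console.WriteLine(ExtractName(Input, Fixed));       // MyName
    }
}
```

As noted above, [^<]* stops at the first <, so this approach breaks as soon as a value contains markup or an escaped entity that an XML parser would handle correctly.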

Update: I just saw your edit. So you do have access to an XML parser; use Oded's answer.

【Comments】: