Non-greedy regex: answers

【Question title】: Non greedy regex
【Posted】: 2013-02-15 12:11:40
【Question description】:

I need to get the values inside some tags that sit in a commented-out section of a PHP file, like this:

php code
/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-!
*/
some php code

As you can see, the file has line breaks and repeated tags such as <funcion></funcion>, and I need to get every one of those tags, so I tried something like this:

preg_match_all("/(<funcion>)(.*)(<\/funcion>)/s",$file,$matches);

This example handles the line breaks, but it is greedy, so I searched around and found these two solutions:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s",$file,$matches);
preg_match_all("/(<funcion>)(.*)(<\/funcion>)/sU",$file,$matches);

But neither of them works for me, and I don't know why.

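【Comments】:

Just parse the XML.
@Blender It is not really XML; it is supposed to live inside a comment in a PHP file. I will edit the question to make that clearer.
I wrote an answer, but I just realized that the first example you posted (actually the second one) works perfectly here.
@Raphael_ It doesn't work for me, lol.
It works fine in this codepad.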

Tags: php regex regex-greedy non-greedy

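【Solution 1】:

If the structure is always exactly like that (the content is always indented inside the tag), you can easily match it with /\n[\s]+([^\n]+(\n[\s]+)*)\n/.

I always tend to avoid the "lazy" ("non-greedy") modifier. It just looks like a hack, and it is not available everywhere, nor implemented the same way everywhere. Since you don't seem to need it in this case, I suggest you don't use it.

Try this: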

// The "/" in the closing tag must be escaped because "/" is the pattern delimiter.
$regexp = '/<funcion>\n[\s]+([^\n]+(\n[\s]+)*)\n<\/funcion>/';
$works = preg_match_all($regexp, $file, $matches);
echo '<pre>';
print_r($matches);

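The "$matches[1]" array will hold the contents of each "funcion" tag.

Of course, it is better to pre-filter the input and apply the regex only to the contents of the comment, to avoid any false matches.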
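The pre-filtering step could look something like the sketch below. It assumes the block is always delimited by the "!-" and "-!" markers shown in the question and that lines end in "\n"; the $source sample and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample standing in for the contents of the PHP file.
$source = <<<'TXT'
php code
/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
-!
*/
some php code
TXT;

$contents = [];
// Step 1: isolate the text between the "!-" and "-!" markers.
if (preg_match('/!-(.*?)-!/s', $source, $block)) {
    // Step 2: apply the strict pattern only to that isolated block.
    $regexp = '/<funcion>\n[\s]+([^\n]+(\n[\s]+)*)\n<\/funcion>/';
    preg_match_all($regexp, $block[1], $m);
    $contents = $m[1]; // one entry per <funcion> block, already trimmed
}
print_r($contents);
```

Note that step 1 uses a lazy quantifier purely for brevity; a stricter marker pattern would fit the spirit of this answer better.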

Have fun.

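【Discussion】:

@Raphael_ I have learned to make regexes as strict as possible so that they do not match inconsistent data (which also makes such data easier to detect). It is not about over-complicating things, just about being a bit stricter with the pattern. My example only works with correctly indented content, and it also returns cleaner results (already "trimmed").

【Solution 2】:

Try this: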

 /<funcion>((.|\n)*?)<\/funcion>/i

For example:

$string = "<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>";

$result=preg_match_all('/<funcion>((.|\n)*?)<\/funcion>/i', $string,$m);
print_r($m[0]);

This outputs the following (as displayed in a browser, which hides the tags themselves):

Array
(
    [0] => 
   esta es la descripcion de la funcion 6

    [1] => 
   esta es la descripcion de la funcion 7

)

DEMO
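To see the difference between the greedy and the lazy quantifier on this kind of input, here is a small sketch. The $text sample is a shortened, made-up version of the data in the question, and it uses the /s modifier from the question rather than the (.|\n) alternation.

```php
<?php
// Shortened, hypothetical version of the sample text from the question.
$text = "<funcion>\n   <descripcion>funcion 6</descripcion>\n</funcion>\n"
      . "<funcion>\n   <descripcion>funcion 7</descripcion>\n</funcion>\n";

// Greedy: .* runs to the LAST </funcion>, so both blocks land in one match.
$greedyCount = preg_match_all('/<funcion>(.*)<\/funcion>/s', $text, $greedy);

// Lazy: .*? stops at the FIRST </funcion> after each opening tag.
$lazyCount = preg_match_all('/<funcion>(.*?)<\/funcion>/s', $text, $lazy);

// trim() strips the newline and indentation around each captured body.
$bodies = array_map('trim', $lazy[1]);

echo "greedy: $greedyCount, lazy: $lazyCount\n"; // greedy: 1, lazy: 2
print_r($bodies);
```

If the lazy version also returns a single match on your real input, the problem is probably in the input (for example $file holding a filename rather than the file contents), not in the pattern.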

【Discussion】:

【Solution 3】:

This expression from your question:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", $file, $matches);
print_r($matches);

will work, but only if $file is a string that contains the XML; if it is a filename, you have to get the contents first:

preg_match_all("/(<funcion>)(.*?)(<\/funcion>)/s", file_get_contents($file), $matches);

Also, keep in mind that PCRE has backtrack limitations when you use non-greedy patterns.
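One way to notice when such a limit has been hit is to check the return value of preg_match_all() and then ask preg_last_error() for the reason. The sketch below is only an illustration; the helper name extractFunciones is made up and is not part of the original answer.

```php
<?php
// Hypothetical helper: returns the bodies of all <funcion> blocks,
// or throws if PCRE aborted the match.
function extractFunciones(string $text): array
{
    $count = preg_match_all('/<funcion>(.*?)<\/funcion>/s', $text, $matches);

    // preg_match_all() returns false on failure; preg_last_error() says why.
    if ($count === false) {
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted'
            : "PCRE error code $code";
        throw new RuntimeException("Match failed: $reason");
    }

    return $matches[1];
}

// Example call on a tiny made-up input.
$bodies = extractFunciones('<funcion>a</funcion><funcion>b</funcion>');
print_r($bodies);
```

The limit itself is controlled by the pcre.backtrack_limit ini setting, so a failure on a very large file may be a matter of configuration rather than of the pattern.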

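【Discussion】:

【Solution 4】:

Try using [\s\S], which matches any whitespace or non-whitespace character (that is, any character at all, including newlines), instead of the dot. Also, there is no need to put <funcion> and </funcion> in capture groups.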

/<funcion>([\s\S]*?)<\/funcion>/s

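Also, keep in mind that the best approach is to parse XML with an XML parser. Even if the file is not an XML document, as you mentioned in the comments, extract the part that should be parsed and run it through an XML parser.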
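A minimal sketch of that idea with SimpleXML follows. It assumes the fragment has already been cut out of the comment and is well-formed apart from lacking a single root element; the <root> wrapper and the variable names are made up for illustration.

```php
<?php
// Fragment assumed to be already extracted from between the "!-" and "-!" markers.
$fragment = <<<'XML'
<titulo>titulo3</titulo>
<funcion>
   <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
   <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
   <descripcion>comentario de otros 2a hoja</descripcion>
</otros>
XML;

// The fragment has several top-level elements, so wrap it in a made-up
// <root> element to turn it into a well-formed XML document.
$xml = simplexml_load_string("<root>$fragment</root>");

$descriptions = [];
if ($xml !== false) {
    // Iterate over every <funcion> child and read its <descripcion> text.
    foreach ($xml->funcion as $funcion) {
        $descriptions[] = (string) $funcion->descripcion;
    }
}
print_r($descriptions);
```

This way the <descripcion> inside <otros> is skipped without any extra pattern work, because only the <funcion> children are visited.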

【Discussion】:

The problem is that my code does not seem to work when I use *?, and I don't know why.