What does this Perl regex mean: m/(.*?):(.*?)$/g?

【Question title】: What does this Perl regex mean: m/(.*?):(.*?)$/g?
【Posted】: 2011-04-15 20:20:33
【Question】:

I am editing a Perl file, but I don't understand this regex comparison. Can someone explain it to me?

if ($lines =~ m/(.*?):(.*?)$/g) { } ..

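What is happening here? $lines is a line from a text file.

【Comments】:

It looks like the first (.*?) will always match the empty string.
Not always. It will match everything up to the first colon.

Tags: regex perl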

【Answer 1】:

Break it up into parts:

$lines =~ m/ (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $1.
             :          # Match a colon.
             (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $2.
             $          # Match the end of the line.
           /gx;

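So this will match a string like ":" (zero characters, then a colon, then zero characters before the end of the line; $1 and $2 are both empty strings), or "abc:" ($1 = "abc", $2 is an empty string), or "abc:def:ghi" ($1 = "abc" and $2 = "def:ghi").

If you pass in a line that doesn't match (which appears to be the case when the string contains no colon), the code inside the braces is not executed. If it does match, the code inside the braces can use the special variables $1 and $2 (at least until another regex match inside the braces, if there is one, overwrites them).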
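The three example strings above can be checked directly. This is a minimal sketch: the fourth sample string and the @results array are added here for illustration only, and /g is left out because one match per line is all the test needs.

```perl
use strict;
use warnings;

# Sample lines: the first three come from the answer above,
# the fourth is an illustrative line without a colon.
my @results;
for my $line (":", "abc:", "abc:def:ghi", "no colon here") {
    if ($line =~ m/(.*?):(.*?)$/) {
        # $1 is everything before the first colon, $2 everything after it.
        push @results, sprintf("[%s][%s]", $1, $2);
    }
    else {
        push @results, "no match";
    }
}

print "$_\n" for @results;
# [][]
# [abc][]
# [abc][def:ghi]
# no match
```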

【Answer 2】:

There is a tool that helps with understanding regular expressions: YAPE::Regex::Explain.

Ignoring the g modifier, which is not needed here:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $re = qr/(.*?):(.*?)$/;
print YAPE::Regex::Explain->new($re)->explain();

__END__

The regular expression:

(?-imsx:(.*?):(.*?)$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

See also perldoc perlre.

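【Answer 3】:

It was written by someone who either knows too much about regular expressions or not enough about the $' and $` variables.

This could have been written as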

if ($lines =~ /:/) {
    ... # use $` ($PREMATCH)  instead of $1
    ... # use $' ($POSTMATCH) instead of $2
}

or

if ( ($var1,$var2) = split /:/, $lines, 2 and defined($var2) ) {
    ... # use $var1, $var2 instead of $1,$2
}

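【Comments】:

If you want to use /:/, use the /p flag from Perl 5.10 together with the ${^PREMATCH} and ${^POSTMATCH} variables. I prefer the split, though, because that is what is actually going on.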
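A minimal sketch of the /p variant from the comment above. The input string and the variable names ($before, $after, $first, $second) are illustrative. One difference worth checking against your own data: if the line still ends with a newline, ${^POSTMATCH} keeps it, whereas the original (.*?)$ capture stops before it.

```perl
use strict;
use warnings;
use 5.010;   # /p and ${^PREMATCH}/${^POSTMATCH} need Perl 5.10 or later

# Illustrative input: a line as read from a file, newline still attached.
my $line = "abc:def:ghi\n";

my ($before, $after);
if ($line =~ /:/p) {
    # Text before and after the first colon.
    ($before, $after) = (${^PREMATCH}, ${^POSTMATCH});
}

# For comparison, the captures produced by the original pattern.
my ($first, $second) = $line =~ m/(.*?):(.*?)$/;

# Make the trailing newline visible when printing.
(my $visible = $after) =~ s/\n/\\n/g;
print "PREMATCH=$before POSTMATCH=$visible\n";   # PREMATCH=abc POSTMATCH=def:ghi\n
print "\$1=$first \$2=$second\n";                # $1=abc $2=def:ghi
```

Calling chomp on the line first removes that difference.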

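【Answer 4】:

(.*?) captures any characters, but as few as possible.

So it looks for a pattern like <something>:<somethingelse><end of line>, and if the string contains more than one :, the first one is used as the separator between <something> and <somethingelse>.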
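To see why the first colon becomes the separator, it helps to compare the non-greedy pattern with a greedy one. This is a small sketch; the greedy pattern m/(.*):(.*)$/ is not from the original code and is shown only for contrast.

```perl
use strict;
use warnings;

my $line = "abc:def:ghi";   # illustrative input with two colons

# Non-greedy: the first group stops at the first colon.
my @lazy = $line =~ m/(.*?):(.*?)$/;

# Greedy (for contrast only): the first group runs up to the last colon.
my @greedy = $line =~ m/(.*):(.*)$/;

print "lazy:   $lazy[0] | $lazy[1]\n";       # lazy:   abc | def:ghi
print "greedy: $greedy[0] | $greedy[1]\n";   # greedy: abc:def | ghi
```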

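【Answer 5】:

This line performs a regex match on $lines using the pattern m/(.*?):(.*?)$/g. It effectively returns true if a match is found in $lines and false if no match is found.

An explanation of the =~ operator:

Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or transliteration. The left argument is what is supposed to be searched, substituted, or transliterated instead of the default $_. When used in scalar context, the return value generally indicates the success of the operation.

The regex itself is: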

m/    #Perform a "match" operation
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
:     #Match a literal colon character
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
$     #Match the end of string
/g    #Global modifier; in scalar context (as in this if) each evaluation finds only the next match in $lines

So if $lines matches that regex, execution enters the conditional block; otherwise the condition is false and the block is skipped.
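One side effect of /g inside an if is easy to miss: in scalar context it remembers where the last match ended (see pos) and continues from there the next time the same string is tested. The sketch below assumes the same string is matched three times without being reassigned; the loop and the @outcomes array are illustrative and not part of the original code.

```perl
use strict;
use warnings;

my $lines = "abc:def";   # illustrative input, never reassigned in the loop
my @outcomes;

for my $attempt (1 .. 3) {
    # With /g in scalar context, each test resumes at pos($lines).
    if ($lines =~ m/(.*?):(.*?)$/g) {
        push @outcomes, "match";      # pos($lines) is now at the end of the string
    }
    else {
        push @outcomes, "no match";   # a failed match resets pos($lines)
    }
}

print join(", ", @outcomes), "\n";   # match, no match, match
```

If $lines is assigned a fresh line before every test, as is likely when reading a file line by line, the position is reset each time and the /g is harmless, which fits the remark in another answer that it is not needed here.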
