【Question Title】: cryptic perl expression
【Posted】: 2012-09-03 15:29:39
【Question Description】:

I found the following statement in a perl (actually PDL) program:

/\/([\w]+)$/i;

Can someone decode this for me, an apprentice in perl programming?

【Question Discussion】:

Tags: regex perl pdl

【Solution 1】:

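Sure, I'll explain it from the inside out:

\w - matches a single character that can be used in a word (alphanumeric, plus '_')

[...] - matches a single character from within the brackets

[\w] - matches a single character that can be used in a word (the brackets are redundant here)

+ - matches the preceding character repeated as many times as possible, but it must occur at least once.

[\w]+ - matches a run of one or more word characters. This finds a word.

(...) - grouping. Remembers this group of characters for later use.

([\w]+) - matches a word and remembers it

$ - end-of-line. Matches at the end of the line (strictly, at the end of the string or just before a trailing newline).

([\w]+)$ - matches the last word on a line and remembers it for later use

\/ - a slash character "/". It must be escaped with a backslash because the slash is special here: it is the delimiter of the match operator.

\/([\w]+)$ - matches the last word on a line, following a slash "/", and remembers that word for later use. This is probably grabbing a directory or file name from a path.

/.../ - the match syntax

/.../i - the i means case-insensitive.

Now all together:

/\/([\w]+)$/i; - matches the last word on a line and remembers it for later use; the word must come right after a slash. Basically, it grabs the file name from an absolute path. The case-insensitive part is irrelevant, since \w already matches both cases.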
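As a quick check of the walkthrough above, here is a minimal sketch that applies the same pattern to a few made-up paths. The helper name `last_path_word` and the sample paths are assumptions for illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the word after the last slash,
# or undef when the pattern does not match.
sub last_path_word {
    my ($path) = @_;
    return $path =~ /\/([\w]+)$/i ? $1 : undef;
}

for my $path ('/usr/local/bin', 'data/run_42', '/tmp/file.txt', 'no_slash') {
    my $word = last_path_word($path);
    printf "%-16s => %s\n", $path, defined $word ? $word : '(no match)';
}
```

The first two paths yield `bin` and `run_42`. The last two do not match: `no_slash` contains no slash, and in `/tmp/file.txt` the dot is not a word character, so there is no run of `\w` characters between a slash and the end of the string. The pattern therefore only picks up names made entirely of letters, digits and underscores.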

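More details on Perl regular expressions are here: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

As JRFerguson points out, YAPE::Regex::Explain can be used to tokenize a regular expression and explain its individual parts.

【Discussion】:

Thanks Tim... an excellent description, and it fits the context of the code exactly. After 40+ years of programming, it seems I need more regex experience!

【Solution 2】:

You will find the YAPE::Regex::Explain module worth installing.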

#!/usr/bin/env perl
use YAPE::Regex::Explain;
#...may need to single quote $ARGV[0] for the shell...
print YAPE::Regex::Explain->new( $ARGV[0] )->explain;

Assuming this script is named 'rexplain', run:

$ ./rexplain '/\/([\w]+)$/i'

...to obtain:

The regular expression:

(?-imsx:/\/([\w]+)$/i)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  \/                       '/'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [\w]+                    any character of: word characters (a-z,
                             A-Z, 0-9, _) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
  /i                       '/i'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
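The `$` entry in the output above ("before an optional \n, and the end of the string") can be seen in a small sketch. The helper names and the sample line are assumptions for illustration; `\z` is used only as a contrast, since it matches at the very end of the string and nowhere else.

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the captured word, or undef on no match.
sub capture_with_dollar {
    my ($s) = @_;
    my ($word) = $s =~ /\/([\w]+)$/;    # `$`: end of string, or just before a final "\n"
    return $word;
}

sub capture_with_z {
    my ($s) = @_;
    my ($word) = $s =~ /\/([\w]+)\z/;   # `\z`: only the very end of the string
    return $word;
}

my $line = "/home/user/notes\n";        # e.g. a line read from a file and not chomped

my $with_dollar = capture_with_dollar($line);
my $with_z      = capture_with_z($line);

print 'with $:  ', (defined $with_dollar ? $with_dollar : '(no match)'), "\n";
print 'with \z: ', (defined $with_z      ? $with_z      : '(no match)'), "\n";
```

With `$` the capture is `notes` and does not include the newline. With `\z` the trailing newline prevents a match.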

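Update:

See also: https://stackoverflow.com/a/12359682/1015385. As noted there and in the module documentation:

Regular expression syntax added after Perl version 5.6 is not supported, in particular any constructs added in 5.10.

【Discussion】:

【Solution 3】: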

/\/([\w]+)$/i;

It is a regular expression, and when it stands alone as a complete statement it is applied to the $_ variable, like this:

$_ =~ /\/([\w]+)$/i;

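It looks for a slash \/, followed by an alphanumeric string \w+, followed by the end of the line $. It also captures () the alphanumeric string, which ends up in the variable $1. The /i at the end makes the match case-insensitive, which has no effect in this case.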
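The equivalence described above can be sketched as follows. The helper name `compare_forms` and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the bare form, the explicit $_ form and a
# variant without /i, and returns what each one captured.
sub compare_forms {
    my ($s) = @_;
    for ($s) {    # alias $_ to $s for the duration of this block
        my $implicit = /\/([\w]+)$/i       ? $1 : '(no match)';
        my $explicit = $_ =~ /\/([\w]+)$/i ? $1 : '(no match)';
        # The pattern contains no literal letters and \w already covers
        # both cases, so dropping /i does not change the result.
        my $no_flag  = /\/([\w]+)$/        ? $1 : '(no match)';
        return ($implicit, $explicit, $no_flag);
    }
}

for my $path ('/opt/PDL', 'lib/perl5', 'plain') {
    print "$path: ", join(' | ', compare_forms($path)), "\n";
}
```

All three forms give the same answer for every input: `PDL`, `perl5`, and no match for `plain`.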

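【Discussion】:

【Solution 4】:

While it doesn't help to "explain" the regex, once you have a test case, Damian's new Regexp::Debugger is a cool utility for watching what actually happens during the match. Install it, run rxrx on the command line to start the debugger, then enter /\/([\w]+)$/ and '/r' (for example), and finally m to start matching. You can then step through the match by pressing Enter repeatedly. Really cool!

【Discussion】:

【Solution 5】:

This matches $_ against a slash followed by one or more word characters at the end of the string (case-insensitively), and stores those characters in $1: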

$_ value     then     $1 value 
------------------------------
"/abcdes"     |       "abcdes"
"foo/bar2"    |       "bar2"
"foobar"      |       undef      # no slash so doesn't match

【Discussion】:

$1 is undef unless there was an earlier successful match, in which case its value is left unchanged.
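The point in this comment can be shown with a minimal sketch. The helper names `stale_demo` and `safe_capture` are assumptions for illustration. The first helper ignores the result of the match and reads $1 anyway; the second assigns the match in list context, which yields an empty list on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: matches twice without checking the result,
# then reports whatever $1 holds.
sub stale_demo {
    my ($first, $second) = @_;
    $first  =~ /\/([\w]+)$/i;    # assumed to succeed and set $1
    $second =~ /\/([\w]+)$/i;    # if this fails, $1 keeps its old value
    my $seen = $1;
    return $seen;
}

# Hypothetical helper: a failed match leaves $word undef, so no stale
# value can leak through.
sub safe_capture {
    my ($s) = @_;
    my ($word) = $s =~ /\/([\w]+)$/i;
    return $word;
}

print "stale: ", stale_demo('/abcdes', 'foobar'), "\n";
my $safe = safe_capture('foobar');
print "safe:  ", (defined $safe ? $safe : 'undef'), "\n";
```

Here `stale_demo` reports `abcdes` even though the second string has no slash, while `safe_capture` returns undef for the same input. Testing the return value of the match before using $1 avoids the problem as well.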

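【Solution 6】:

The Online Regex Analyzer deserves a mention. Here is a link that explains what your regex means; the explanation is pasted here for the record.

Sequence: match all of the following in order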

/                                                  (slash)
                                               --+
Repeat                                           | (in GroupNumber:1)
   AnyCharIn[ WordCharacter] one or more times   |
                                               --+
EndOfLine

【Discussion】: