Parsing text using regular expressions: answers

【Question title】: Parsing text using regular expression
【Posted】: 2016-04-27 13:43:59
【Question description】:

I have some text from which I want to extract the lines that have the following pattern.

string1(string2,string3,int)

I am using Perl for the parsing, but so far I have only managed to write code for the simpler pattern string1(string2):

#!/usr/bin/perl
$txt='A (A, B, 49997 )';

$re1='((?:[a-z][a-z]+))';   # Word 1
$re2='.*?'; # Non-greedy match on filler
$re3='(\\(.*\\))';  # Round Braces 1

$re=$re1.$re2.$re3;
if ($txt =~ m/$re/is)
{
    $word1=$1;
    $rbraces1=$2;
    print "($word1) ($rbraces1) \n";
}

【Comments】:

What is your question?

Tags: perl parsing

【Solution 1】:

If the number of elements inside the parentheses is known in advance, you can simply match each element:

#!/usr/bin/perl
use warnings;
use strict;

my $txt='A (B, C, 49997 )';

my $id  = qr/([a-z]+)/i;
my $int = qr/([1-9][0-9]*)/;


if (my @matches = $txt =~ /$id \s* \(  $id , \s* $id , \s* $int \s* \)/x ) {
    print "($_)\n" for @matches;
}
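
Since the question asks for extracting matching lines from a larger text, the same fixed-count pattern can be applied line by line. The following is only a minimal sketch and is not part of the original answer: the helper name parse_line and the sample lines are hypothetical, and the pattern is anchored with ^ and $ on the assumption that each record occupies a whole line.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $id  = qr/([a-z]+)/i;
my $int = qr/([1-9][0-9]*)/;

# Hypothetical helper: returns the four captured fields in list context,
# or an empty list when the line is not of the form
# string1(string2,string3,int).
sub parse_line {
    my ($line) = @_;
    return $line =~ /^ \s* $id \s* \( \s* $id \s* , \s* $id \s* , \s* $int \s* \) \s* $/x;
}

# Made-up sample input; only the first and third lines fit the pattern.
my @lines = (
    'A (B, C, 49997 )',
    'this line is not a record',
    'foo(bar,baz,12)',
);

for my $line (@lines) {
    # List assignment in boolean context yields the number of fields,
    # so non-matching lines are skipped.
    my @fields = parse_line($line) or next;
    print join('|', @fields), "\n";
}
```

With the sample lines above this prints A|B|C|49997 and foo|bar|baz|12; the middle line is skipped. If records may appear in the middle of a line, the ^ and $ anchors would have to be dropped.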

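If the identifier can be repeated an arbitrary number of times, you can still match with ( $id , \s* )+, but the capture group then returns only the value from its last repetition. In that case, capture the whole list and break it apart with split /,\s*/.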

#!/usr/bin/perl
use warnings;
use strict;

my $txt='A (B, C, D, E, F, 49997 )';

my $id   = qr/[a-z]+/i;
my $int  = qr/[1-9][0-9]*/;
my $list = qr/$id (?: , \s* $id)*/x;


if (my @matches = $txt =~ /($id) \s* \( ($list) , \s* ($int) \s* \)/x ) {
    splice @matches, 1, 1, split /,\s*/, $matches[1];
    print "($_)\n" for @matches;
}
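
The remark about the repeated capture group can be checked with a short experiment. This is an illustrative sketch rather than part of the original answer; the variable names $last, $run and @ids and the shortened sample string are chosen here only for the demonstration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $txt = 'A (B, C, D, 49997 )';
my $id  = qr/[a-z]+/i;

# A capture group inside a quantified group is overwritten on every
# repetition, so only the value from the final repetition survives.
my ($last) = $txt =~ /\( (?: ($id) , \s* )+ /x;
print "repeated group kept: $last\n";

# Capturing the whole comma-separated run and splitting it keeps all items.
my ($run) = $txt =~ /\( ( $id (?: , \s* $id )* ) ,/x;
my @ids = split /,\s*/, $run;
print "split kept: @ids\n";
```

Here the repeated group ends up holding only D, while the capture-then-split approach yields B, C and D.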

【Discussion】: