C++ regex: Which group matched? Answers

[Question title]: C++ regex: Which group matched?
[Posted]: 2017-07-06 13:05:01
[Question]:

I have a regular expression made of several subgroups joined by alternation ("or"):

([[:alpha:]]+)|([[:digit:]]+)

When I match the string "1 a 2", I get three matches: 1, a and 2.

Is there a way in C++ to determine which subpattern matched?

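[Discussion]:

Create a minimal runnable example and mention which regex library you are using. As asked, the answer is simply "yes"; no more detail can be given than that.
Why do you need to know which group matched?
If you use std::regex, you can pass a match_results<...> object when matching. It holds an array of sub_match<...> objects that you can iterate over to find the one that matched.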

Tags: c++ regex

[Solution 1]:

Not directly.

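With the std::regex library, sub-matches are handled by the match_results class. It has a method, std::match_results::size, that tells you how many numbered sub-matches are available after a match.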

For example:

#include <iostream>
#include <regex>
#include <string>

std::string str( "one two three four five" );
std::regex rx( "(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

std::cout << mr.size() << '\n'; // 6

The output here is 6 rather than 5 because the whole match itself is also counted. You can access each one with the .str( number ) method or with operator[].

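So, since sub-matches are numbered from left to right, you can use the value returned by size to work out which group is which. Keep in mind that after a successful search size() always equals the number of capture groups plus one, whether or not every group took part; to see whether a particular group actually participated, check its matched member (for example mr[ 2 ].matched).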

If you change rx to "(\\w+)(\\d+)(\\w+)", the search fails (the string contains no digits), so size is 0.

If you change rx to "(\\w+).+", size is 2. This means you have one full match and one sub-match.

For example:

std::string str( "one two three four five" );
std::regex rx( "(\\w+).+" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

std::cout << mr.str( 1 ) << '\n'; // one
std::cout << mr[ 1 ] << '\n';     // one

The output of both is: one

If you only want to print the sub-matches, you can use a simple indexed loop whose index starts at 1, not 0 (index 0 is the whole match).

For example:

std::string str( "one two three four five" );
std::regex rx( "(\\w+) \\w+ (\\w+) \\w+ (\\w+)" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

for( std::size_t index = 1; index < mr.size(); ++index ){
    std::cout << mr[ index ] << '\n';
}

The output is:

one
three
five

If by "determine which subpattern matched" you mean specifying which sub-match the search should return, then the answer is yes: you can do that with std::regex_token_iterator.

For example (iterating over the second sub-match of each match):

std::string str( "How are you today ? I am fine . How about you ?" );
std::regex rx( "(\\w+) (\\w+) ?" );

std::regex_token_iterator< std::string::const_iterator > first( str.begin(), str.end(), rx, 2 ), last;

while( first != last ){
    std::cout << first->str() << '\n';
    ++first;
}

The last argument is 2: in ( str.begin(), str.end(), rx, 2 ) it means you only want the second sub-match. So the output is:

are
today
am
about

[Discussion]: