【Question title】: Using C++ regex and getting results too slowly
【Posted】: 2016-02-22 06:04:58
【Question】:

Speed is a very, very important criterion for me. Here is my string:

DataInfoBlock1 {

    Some description
}

DataInfoBlock2 {

    Some other informations
}

Block3 {

    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
    Anything = vl;
}

I want to get each block's name and inner content. I am using C++ regex, and this is my regex and search code:

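smatch Matches;
// The pattern must be a regex object; regex::optimize is a regex flag, not a string argument.
regex Rgx("[[:blank:]]*([^]*?)[[:blank:]]*[{][[:blank:]]*([^]*?)[[:blank:]]*[}]", regex::optimize);

while(regex_search(str, Matches, Rgx))
{
    // Matches[1] is the block name, Matches[2] is the inner content.
    str = Matches.suffix().str(); // advance past this match, otherwise the loop never terminates
}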

My result is:

Array[1]
    Array[0] => DataInfoBlock1 {

        Some description
    }
    Array[1] => DataInfoBlock1
    Array[2] => Some description
,
...
...

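I get the full match, the block name, and the inner data, but the speed is very, very bad: it takes about a second to get the matches! What should I do to get the result faster? Is this a matter of regex optimization? What regex should I use to get the same result in much less time?

【Comments】: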

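  • Is [^] valid? You might improve performance by being more explicit. Also use + instead of *. I think. I don't know regex that well.
  • Everything works fine, but my problem is the speed (the time it takes to finish).
  • You tagged the question boost: are you using boost regex or the C++11+ regex library? You also tagged it pcre, but this is not pcre syntax: boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/…
  • As I said in the question, I am using plain C++ regex. I added the boost and pcre tags so that, if pcre or boost offers something better, you could point me to it.
  • @user216086: That's not what tags are for, don't do that.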

Tags: c++ regex


【Answer 1】:

You could try this:

#include <regex>
#include <string>
#include <iostream>

const std::string s = R"~(
DataInfoBlock1 {

    Some description
}

DataInfoBlock2 {

    Some other informations
}

Block3 {

    Anything = vl;
    Anything = vl;
}
)~";

int main()
{
    // (\w+) captures the block name; ([^}]+) captures everything up to the closing brace.
    std::regex e(R"~((\w+)\s+\{([^}]+)\})~", std::regex::optimize);

    std::sregex_iterator iter(s.begin(), s.end(), e);
    std::sregex_iterator end;

    for(; iter != end; ++iter)
    {
        std::cout << "-----------------------------" << '\n';
        std::cout << "name: " << iter->str(1) << '\n';
        std::cout << "contents: " << iter->str(2) << '\n';
    }
}

Output:

-----------------------------
name: DataInfoBlock1
contents: 

    Some description

-----------------------------
name: DataInfoBlock2
contents: 

    Some other informations

-----------------------------
name: Block3
contents: 

    Anything = vl;
    Anything = vl;

Tested Here

【Comments】:

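  • Awesome!!! Great... but I am still waiting for a better, faster regex... What are those (~) and (R)?
  • @user216086 R"~()~" is all part of a so-called raw string literal. You can choose whatever delimiter you like (or none at all, as in R"()"). The important thing is to pick something the string itself is unlikely to contain. Ordinary string literals need special characters escaped (including the escape character itself: "\\" for a single backslash). Raw strings do not process escape sequences, so a single backslash is no longer an escape code.
  • @user216086 This is useful for regex because the escaping can get confusing (a regex has its own escapes on top of C++'s). Also, raw strings can span multiple lines (without "\n" end-of-line escape codes). A new line really is a new line.
  • Can you tell me what I should do to skip any comments in my content, such as /* anything here */ or // on one line? I mean that blocks and contents should not be picked up when they are inside /* anything here */ or after //. What regex should I use to skip such comments?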