【Question title】: Searching for Holy Grail of search and replace in C++
【Posted】: 2015-12-21 18:55:29
【Question】:

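Recently I was looking for a way to replace tokens in a string. This is essentially find and replace (although there is at least one other approach to the problem), and it looks like a fairly banal task. I came up with several possible implementations, but none of them was satisfying from a performance point of view. The best result was about 50 µs per iteration. The case was ideal: the string never grew in size, and initially I omitted the requirement to be case insensitive.
Here is the code on Coliru.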

Results on my machine:
Boost.Spirit symbols result: 3421?=3421
100000 cycles took 6060ms.
Boyer-Moore results:3421?=3421
100000 cycles took 5959ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5008ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 12451ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 5532ms.
Boost replace_all result:3421?=3421
100000 cycles took 4860ms.
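
Timings of the form "N cycles took X ms" can be collected with a loop like the one below. This is only a minimal sketch, assuming std::chrono::steady_clock; the helper name time_cycles is hypothetical, and the actual harness lives in the Coliru code, which is not reproduced here.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls `work` exactly `cycles` times and returns the
// elapsed wall-clock time in whole milliseconds.
template <typename F>
long long time_cycles(std::size_t cycles, F&& work)
{
    using clock = std::chrono::steady_clock; // monotonic, suitable for intervals
    auto const start = clock::now();
    for (std::size_t i = 0; i < cycles; ++i)
        work();
    auto const stop = clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

When measuring, the workload should produce a result that is actually used (for example, compared against the expected string), otherwise an optimizing compiler may remove part of the work.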

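So the question is: what takes so long in such a simple task? One can say, fine, it is a simple task, go ahead and implement it better. But the reality is that a 15-year-old naive MFC implementation does the task an order of magnitude faster: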

CString FillTokenParams(const CString& input, const std::unordered_map<std::string, std::string>& tokens)
{
    CString tmpInput = input;
    for(const auto& token : tokens)
    {
        int pos = 0;
        while(pos != -1)
        {
            pos = tmpInput.Find(token.first.c_str(), pos);
            if(pos != -1)
            {
                int tokenLength = token.first.size();
                tmpInput.Delete(pos, tokenLength);
                tmpInput.Insert(pos, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

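Result:
MFC naive search and replace result: 3421?=3421
100000 cycles took 516ms.
Why does this clumsy code outperform modern C++? Why are the other implementations so slow? Am I missing something fundamental?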

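EDIT001: I have invested in this question; the code has been profiled and triple-checked. You may be dissatisfied with this or that, but std::string::replace is not what takes the time. In every STL implementation, search is what takes most of the time; Boost.Spirit wastes time on allocation of the tst (a test node in the evaluation tree, I guess). I don't expect anyone to point at a line in a function with "this is your problem" and, poof, the problem is gone. The question is how MFC manages to do the same work 10 times faster.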
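
For readers comparing the search primitives discussed in this thread, the sketch below puts std::strstr, std::search and std::string::find side by side behind the same interface. The helper names are hypothetical and are not taken from the Coliru code; the sketch assumes a non-empty needle and from <= haystack.size(). Note that strstr stops at the first embedded NUL character, which the other two do not.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

using size_type = std::string::size_type;

// C library search on the underlying NUL-terminated buffer.
size_type find_with_strstr(const std::string& haystack, const std::string& needle, size_type from)
{
    const char* base = haystack.c_str();
    const char* hit = std::strstr(base + from, needle.c_str());
    return hit ? static_cast<size_type>(hit - base) : std::string::npos;
}

// Generic iterator-based search from <algorithm>.
size_type find_with_std_search(const std::string& haystack, const std::string& needle, size_type from)
{
    auto first = haystack.begin() + static_cast<std::ptrdiff_t>(from);
    auto hit = std::search(first, haystack.end(), needle.begin(), needle.end());
    return hit == haystack.end() ? std::string::npos
                                 : static_cast<size_type>(hit - haystack.begin());
}

// std::string member function.
size_type find_with_member_find(const std::string& haystack, const std::string& needle, size_type from)
{
    return haystack.find(needle, from);
}
```

All three return the same offsets for ordinary text, so any timing difference between them comes from the implementation rather than from the result.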

EDIT002: I just drilled down into the MFC implementation of Find and wrote a function that mimics it:

namespace mfc
{
std::string::size_type Find(const std::string& input, const std::string& subString, std::string::size_type start)
{
    if(subString.empty())
    {
        return std::string::npos;
    }

    if(start < 0 || start > input.size())
    {
        return std::string::npos;
    }

    auto found = strstr(input.c_str() + start, subString.c_str());
    return ((found == nullptr) ? std::string::npos : std::string::size_type(found - input.c_str()));
}
}

std::string MFCMimicking(const std::string& input, const std::unordered_map<std::string, std::string>& tokens)
{
    auto tmpInput = input;
    for(const auto& token : tokens)
    {
        std::string::size_type pos = 0; // was `auto pos = 0` (an int), which only compared equal to npos by accident
        while(pos != std::string::npos)
        {
            pos = mfc::Find(tmpInput, token.first, pos);
            if(pos != std::string::npos)
            {
                auto tokenLength = token.first.size();
                tmpInput.replace(pos, tokenLength, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

Result:
MFC mimicking expand result: 3421?=3421
100000 cycles took 411ms.
That is about 4 µs per call (411 ms / 100000 cycles); go beat that C strstr.

EDIT003: Compiling and running with -Ox

MFC mimicking expand result: 3421?=3421
100000 cycles took 660ms.
MFC naive search and replace result: 3421?=3421
100000 cycles took 856ms.
Manual expand result: 3421?=3421
100000 cycles took 1995ms.
Boyer-Moore results:3421?=3421
100000 cycles took 6911ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5670ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 13825ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 9531ms.
Boost replace_all result:3421?=3421
100000 cycles took 8996ms.

Running with -O2 (as in the original measurements) but with 10k cycles

MFC mimicking expand result: 3421?=3421
10000 cycles took 104ms.
MFC naive search and replace result: 3421?=3421
10000 cycles took 105ms.
Manual expand result: 3421?=3421
10000 cycles took 356ms.
Boyer-Moore results:3421?=3421
10000 cycles took 1355ms.
Boyer Moore Hospool result:3421?=3421
10000 cycles took 1101ms.
Knuth Morris Pratt result: 3421?=3421
10000 cycles took 1973ms.
Naive STL search and replace result: 3421?=3421
10000 cycles took 923ms.
Boost replace_all result:3421?=3421
10000 cycles took 880ms.

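【Comments】:

  • What is the definition of CString? Is it your own class? It is not a standard C++ class.
  • Please clarify "modern C++". Are you talking about C++14? Are you talking about Microsoft's CLI?
  • Please show the code that performs the benchmark (the measurement). Put the code in your question.
  • CString is MFC, Microsoft Foundation Classes, ancient lore that vanished centuries ago :) By modern I mean now, not something implemented two decades ago; be it C++11, 14 or 17, and yes, C++03 can be modern too :)
  • @ThomasMatthews Pasting 200 lines of code into the question is not a good idea, right? That is what Coliru is for.

Tags: c++ algorithm boost mfc boost-spirit-qi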


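【Answer 1】:

So, I have some observations about the Qi version.

I also created an X3 version.

Finally, I wrote a manual expansion function that beats all the other candidates (I expect it to be faster than MFC, because it doesn't bother with repeated deletes/inserts).

Skip to the benchmark charts if you want.

About the Qi version

  1. Yes, symbol tables suffer from the locality issues of node-based containers. They may not be the best match for what you need here.
  2. There is no need to rebuild the symbols on each loop iteration.
  3. Instead of skipping non-symbols character by character, scan until the next symbol: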

    +(bsq::char_ - symbols)
    
inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
{
    std::string retVal;
    retVal.reserve(input.size() * 2);

    auto beg = input.cbegin();
    auto end = input.cend();

    if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
        retVal = input;

    return retVal;
}

This is already considerably faster. But:

Manual loop

In this trivial example, why not do the parsing manually?

inline std::string manual_expand(const std::string& input, TokenMap const& tokens)
{
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);
        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);

    builder.str("");
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f) now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip ')'
                    }
                    break; // without this, the case falls through and copies the next character blindly
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

This performs very well on my machine.
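
As a variation on the manual loop, the same single-pass idea can be written against std::string alone, appending literal runs in bulk instead of character by character. This is only a sketch under assumptions: the name expand_tokens and the Tokens alias are hypothetical, std::unordered_map stands in for the TokenMap used here, and this variant was not part of the benchmark. Unlike the loop above, it leaves an unterminated "$(" untouched.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for the TokenMap of the benchmark: maps "key" to its replacement.
using Tokens = std::unordered_map<std::string, std::string>;

// Single pass over `input`: text between tokens is appended in bulk and each
// "$(key)" is looked up exactly once.
std::string expand_tokens(const std::string& input, const Tokens& tokens)
{
    std::string out;
    out.reserve(input.size() * 2);

    std::string::size_type pos = 0;
    while (pos < input.size()) {
        auto open = input.find("$(", pos);
        if (open == std::string::npos) {
            out.append(input, pos, std::string::npos); // no more tokens: copy the tail
            break;
        }
        out.append(input, pos, open - pos); // literal text before the token

        auto close = input.find(')', open + 2);
        if (close == std::string::npos) {
            out.append(input, open, std::string::npos); // unterminated "$(": keep verbatim
            break;
        }

        auto match = tokens.find(input.substr(open + 2, close - open - 2));
        if (match != tokens.end())
            out += match->second;
        else
            out.append(input, open, close - open + 1); // unknown key: keep "$(key)" verbatim
        pos = close + 1;
    }
    return out;
}
```

The design choice is the same as in the manual loop: the input is scanned once and the output is built from pieces, so no characters are shifted around inside an existing string.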

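Other ideas

You could look at Boost Spirit Lex, potentially with statically generated token tables: http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_static_model.html. I'm not particularly fond of Lex, though.

Comparison:

See the Interactive Chart

This uses Nonius for the benchmarking statistics.

Full benchmark code: http://paste.ubuntu.com/14133072/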

#include <boost/container/flat_map.hpp>

#define USE_X3
#ifdef USE_X3
#   include <boost/spirit/home/x3.hpp>
#else
#   include <boost/spirit/include/qi.hpp>
#endif

#include <boost/algorithm/string.hpp>
#include <boost/algorithm/searching/boyer_moore.hpp>
#include <boost/algorithm/searching/boyer_moore_horspool.hpp>
#include <boost/algorithm/searching/knuth_morris_pratt.hpp>
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_map>
#include <iostream>
#include <fstream>
#include <nonius/benchmark.h++>
#include <nonius/main.h++>

using TokenMap = boost::container::flat_map<std::string, std::string>;

#ifdef USE_X3
    namespace x3  = boost::spirit::x3;

    struct append {
        std::string& out;
        void do_append(char const ch) const                       { out += ch;                      } 
        void do_append(std::string const& s)  const               { out += s;                       } 
        template<typename It>
        void do_append(boost::iterator_range<It> const& r)  const { out.append(r.begin(), r.end()); } 
        template<typename Ctx>
        void operator()(Ctx& ctx) const                           { do_append(_attr(ctx));          } 
    };

    inline std::string spirit_x3(const std::string& input, x3::symbols<char const*> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        append appender { retVal };

        auto beg = input.cbegin();
        auto end = input.cend();

        auto rule = *(symbols[appender] | x3::char_ [appender]);

        if(!x3::parse(beg, end, rule))
            retVal = input;

        return retVal;
    }
#else
    namespace bsq = boost::spirit::qi;

    inline std::string spirit_qi_old(const std::string& input, TokenMap const& tokens)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        bsq::symbols<char const, char const*> symbols;
        for(const auto& token : tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | bsq::char_), retVal))
            retVal = input;

        return retVal;
    }

    inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
            retVal = input;

        return retVal;
    }
#endif

inline std::string manual_expand(const std::string& input, TokenMap const& tokens) {
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);

        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f) now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip ')'
                    }
                    break; // without this, the case falls through and copies the next character blindly
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

inline std::string boost_replace_all(const std::string& input, TokenMap const& tokens)
{
    std::string retVal(input);
    retVal.reserve(input.size() * 2);

    for(const auto& token : tokens)
    {
        boost::replace_all(retVal, token.first, token.second);
    }
    return retVal;
}

inline void naive_stl(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        }
    }
}

inline void boyer_more(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next =
            boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

inline void bmh_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::boyer_moore_horspool_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                  token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_horspool_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                 token.first.end());
        }
    }
}

inline void kmp_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::knuth_morris_pratt_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::knuth_morris_pratt_search(input.cbegin(), input.cend(), token.first.begin(),
                                                               token.first.end());
        }
    }
}

namespace testdata {
    std::string const expected =
        "Five and Seven said nothing, but looked at Two. Two began in a low voice, 'Why the fact is, you see, Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our heads cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the Queen.First came ten soldiers carrying clubs; these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with diamonds, and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the Queen said severely 'Who is this?' She said it to the Knave of Hearts, who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the Queen, pointing to the three gardeners who were lying round the rosetree; for, you "
        "see, as they were lying on their faces, and the pattern on their backs was the same as the rest of the pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said Alice, very loudly and decidedly, and the Queen was silent.The King laid his hand upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The Queen turned angrily away from him, "
        "and said to the Knave 'Turn them over!'The Knave did so, very carefully, with one foot.'Get up!' said the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";
    std::string const inputWithtokens =
        "Five and Seven said nothing, but looked at $(Two). $(Two) began in a low voice, 'Why the fact is, you see, "
        "Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our $(heads) cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the $(Queen).First came ten soldiers carrying clubs; "
        "these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with $(diamonds), and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the $(Queen) said severely 'Who is this?' She said it to the Knave of Hearts, "
        "who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the $(Queen), pointing to the three gardeners who were lying round the rosetree; for, "
        "you "
        "see, as they were lying on their faces, and the $(pattern) on their backs was the same as the rest of the "
        "pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said $(Alice), very loudly and decidedly, and the Queen was silent.The $(King) laid his hand "
        "upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The $(Queen) turned angrily away from him, "
        "and said to the $(Knave) 'Turn them over!'The $(Knave) did so, very carefully, with one foot.'Get up!' said "
        "the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";

    static TokenMap const raw_tokens {
        {"Two", "Two"},           {"heads", "heads"},
        {"diamonds", "diamonds"}, {"Queen", "Queen"},
        {"pattern", "pattern"},   {"Alice", "Alice"},
        {"King", "King"},         {"Knave", "Knave"},
        {"Why", "Why"},           {"glaring", "glaring"},
        {"name", "name"},         {"know", "know"},
        {"Idiot", "Idiot"},       {"children", "children"},
        {"Nonsense", "Nonsense"}, {"procession", "procession"},
    };

    static TokenMap const tokens {
        {"$(Two)", "Two"},           {"$(heads)", "heads"},
        {"$(diamonds)", "diamonds"}, {"$(Queen)", "Queen"},
        {"$(pattern)", "pattern"},   {"$(Alice)", "Alice"},
        {"$(King)", "King"},         {"$(Knave)", "Knave"},
        {"$(Why)", "Why"},           {"$(glaring)", "glaring"},
        {"$(name)", "name"},         {"$(know)", "know"},
        {"$(Idiot)", "Idiot"},       {"$(children)", "children"},
        {"$(Nonsense)", "Nonsense"}, {"$(procession)", "procession"},
    };

}

NONIUS_BENCHMARK("manual_expand", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;
    auto& tokens = testdata::raw_tokens;

    std::string result;
    cm.measure([&](int) {
        result = manual_expand(tmp, tokens);
    });
    assert(result == testdata::expected);
})

#ifdef USE_X3
NONIUS_BENCHMARK("spirit_x3", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        x3::symbols<char const*> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_x3(testdata::inputWithtokens, symbols);
        });
    //std::cout << "====\n" << result << "\n====\n";
    assert(testdata::expected == result);
})
#else
NONIUS_BENCHMARK("spirit_qi", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        bsq::symbols<char, std::string> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_qi(testdata::inputWithtokens, symbols);
        });
    assert(testdata::expected == result);
})

NONIUS_BENCHMARK("spirit_qi_old", [](nonius::chronometer cm) {
    std::string result;
    cm.measure([&](int) {
            result = spirit_qi_old(testdata::inputWithtokens, testdata::tokens);
        });
    assert(testdata::expected == result);
})
#endif

NONIUS_BENCHMARK("boyer_more", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        boyer_more(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("bmh_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        bmh_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("kmp_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        kmp_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("naive_stl", [](nonius::chronometer cm) {
    cm.measure([&](int) {
            std::string tmp = testdata::inputWithtokens;
            naive_stl(tmp, testdata::tokens);
            assert(tmp == testdata::expected);
        });
})

NONIUS_BENCHMARK("boost_replace_all", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;

    std::string result;
    cm.measure([&](int) {
        result = boost_replace_all(testdata::inputWithtokens, testdata::tokens);
    });
    assert(result == testdata::expected);
})

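【Comments】:

  • Added full comparative benchmarks. You can see that nothing beats my manual approach (8µs), with Spirit X3 coming second (22µs). As expected, the replace-based approaches are slower. Recorded live streams: part 1, part 2
  • I doubt it is faster, but x3::raw[x3::seek[&symbols|x3::eoi]][appender] % symbols[appender]; also works.
  • @cv_and_he Oh, I had still forgotten that; let me compare. Wow! It is as fast as the hand-coded one (code). I was so close; thanks for fixing it with eoi :)
  • @kreuzerkrieg If you are interested in a Spirit Qi version of either of the two X3 approaches, you can try this. I think they should be similar, but it may (or may not) have different performance characteristics (due to the internals of the two libraries).
  • @cv_and_he Meanwhile, I can't compile sehe's example with VS2015. I checked on the Boost.Spirit mailing list, and it looks like it does not support VS.
【Answer 2】: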

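On EDIT2, for MFCMimicking: Well, from your code it is obvious why the MFC version is faster: it does not search the whole string on each replacement the way some of your other versions do (I still can't explain boost::spirit). Once it does a replacement, it continues searching from the replacement point rather than from the beginning of the string, so it is obviously going to be faster.

EDIT: After doing some more research and seeing (Algorithm to find multiple string matches), it seems that using a good single-string matching algorithm to look for multiple search terms is the actual problem here. Probably your best bet is to use an appropriate algorithm (a few are mentioned in that question).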
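
To illustrate what matching all tokens in one pass can look like, here is a minimal sketch that stores the tokens in a small trie and takes the longest match at each input position. It is deliberately simpler than the algorithms named in the linked question: there are no failure links as in Aho-Corasick, so the worst case is proportional to the input length times the longest token. The names TrieNode and TokenTrie are hypothetical and not taken from any library.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper types for illustration only.
struct TrieNode {
    std::map<char, std::size_t> next; // child node index per character
    bool terminal = false;            // true if a token ends at this node
    std::string replacement;          // only meaningful when terminal
};

class TokenTrie {
public:
    TokenTrie() : nodes_(1) {} // node 0 is the root

    void add(const std::string& token, const std::string& replacement)
    {
        if (token.empty())
            return;
        std::size_t node = 0;
        for (char ch : token) {
            auto found = nodes_[node].next.find(ch);
            if (found != nodes_[node].next.end()) {
                node = found->second;
            } else {
                std::size_t const child = nodes_.size();
                nodes_.emplace_back();          // may reallocate, so index again below
                nodes_[node].next[ch] = child;
                node = child;
            }
        }
        nodes_[node].terminal = true;
        nodes_[node].replacement = replacement;
    }

    // One left-to-right pass; at each position the longest matching token wins.
    std::string replace_all(const std::string& input) const
    {
        std::string out;
        out.reserve(input.size());
        std::size_t pos = 0;
        while (pos < input.size()) {
            std::size_t node = 0, best_len = 0;
            const std::string* best = nullptr;
            for (std::size_t i = pos; i < input.size(); ++i) {
                auto found = nodes_[node].next.find(input[i]);
                if (found == nodes_[node].next.end())
                    break;
                node = found->second;
                if (nodes_[node].terminal) {
                    best = &nodes_[node].replacement;
                    best_len = i - pos + 1;
                }
            }
            if (best) {
                out += *best;
                pos += best_len;
            } else {
                out += input[pos++];
            }
        }
        return out;
    }

private:
    std::vector<TrieNode> nodes_;
};
```

The trie is built once and reused, so the per-call cost no longer grows with one full scan of the input per token.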

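As for why MFC is faster? I'd suggest distilling that into a separate question, "Why are delete and insert in CString so much faster than in std::string" or something similar, and make sure to tag it C++ and MFC so that people with the right expertise can help (I have experience with standard C++, but can't help with which optimizations VC++ put into CString).

Original answer: OK, because of the amount of code I only looked at expandTokens3, but I think all the versions have the same problem. Your code has two potentially significant performance problems:

  • You search the whole string each time you do a replacement. If you have ten variables to replace in a string, it will take around ten times longer than needed.

  • You perform each replacement in place in the input string rather than building the result string from the pieces. This can result in a memory allocation and copy for every replacement you make, again potentially increasing the runtime significantly.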
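
The two points above can be sketched for a single token as follows: the search resumes after the previous match instead of restarting at the beginning, and the result is built from pieces instead of being edited in place. The function name replace_all_copy is hypothetical and chosen for this illustration; it is not the Boost function of the same name.

```cpp
#include <string>

// Returns a copy of `input` with every occurrence of `token` replaced.
// Each character of `input` is searched and copied at most once.
std::string replace_all_copy(const std::string& input,
                             const std::string& token,
                             const std::string& replacement)
{
    if (token.empty())
        return input; // nothing sensible to replace

    std::string out;
    out.reserve(input.size());

    std::string::size_type pos = 0;
    for (;;) {
        auto hit = input.find(token, pos); // resume after the previous match
        if (hit == std::string::npos)
            break;
        out.append(input, pos, hit - pos); // untouched text before the match
        out += replacement;
        pos = hit + token.size();
    }
    out.append(input, pos, std::string::npos); // remaining tail
    return out;
}
```

Because the output is a separate string, a replacement that itself contains the token is never rescanned, which also rules out the endless loop that an in-place version restarting from the beginning could run into.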

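【Comments】:

  • Agree with the first one: you could try to find a token in the input string and then match it in the token map. Strongly disagree with the second one. I have both versions, and there is no big change. If you don't know the size of the result, you can't avoid allocation; in this particular case no allocation takes place, because the resulting string is always smaller than the input. And of course I have profiled it; I wouldn't say most of the time is spent on allocation. The point is, how can ancient, horrible, disgusting MFC beat everything?
  • @kreuzerkrieg You can minimize reallocations with a basic heuristic estimate of how much the replacements grow the string overall.
  • You can; for simplicity I left that out of the equation (it is mentioned in the original question).
  • On your edit: you are missing a very important point; replacing does not take most of the time, searching does. So if a short question were distilled, it should be something like "what did they do to CString::Find to make it work so fast". As for tags: this is the inconvenient part of SO; I had to remove the boost tag in order to add MFC. And thanks for the link, I will look into the grep stuff.
  • Re.: EDIT2 My findings clearly show that the bottleneck is strstr vs. std::search or even std::find. It is completely unrelated to whether the "re"search starts from a known offset.
【Answer 3】: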

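So the question is: what takes so long in such a simple task? One can say, fine, it is a simple task, go ahead and implement it better. But the reality is that a 15-year-old naive MFC implementation does the task an order of magnitude faster

The answer is actually simple.

First I compiled your code on my macbook pro using apple clang 7.0: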

$ cc --version
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

The results seem to match the OP's...

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 8906ms.
Boyer-Moore results:3425?=3425
10000 cycles took 2891ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 2392ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 4363ms.
Naive STL search and replace result: 3425?=3425
10000 cycles took 4333ms.
Boost replace_all result:3425?=3425
10000 cycles took 23284ms.
MFCMimicking result:3425?=3425
10000 cycles took 426ms.    <-- seemingly outstanding, no?

Then I added the -O3 flag:

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 675ms.
Boyer-Moore results:3425?=3425
10000 cycles took 788ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 623ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 1623ms.

Naive STL search and replace result: 3425?=3425
10000 cycles took 562ms.                    <-- pretty good!!!

Boost replace_all result:3425?=3425
10000 cycles took 748ms.
MFCMimicking result:3425?=3425
10000 cycles took 431ms.                    <-- awesome but not as outstanding as it was!

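Now the results are the same order of magnitude as the MFC CString results.

Why?

Because when you compile against BOOST and/or the STL, you are expanding templates, and the library code takes on the same optimization settings as your compilation unit.

When you link against MFC, you are linking against a shared library that was compiled with optimizations turned on.

When you use strstr, you are calling into the C library, which is precompiled, optimized and, in some parts, hand-written. Of course it is going to be fast!

Solved :)

10000 cycles rather than 100000, different machine...

For reference, here are the results of the 100,000-cycle version, run on a laptop on battery power. Full optimization (-O3):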

Boost.Spirit symbols result: 3425?=3425
100000 cycles took 6712ms.
Boyer-Moore results:3425?=3425
100000 cycles took 7923ms.
Boyer Moore Hospool result:3425?=3425
100000 cycles took 6091ms.
Knuth Morris Pratt result: 3425?=3425
100000 cycles took 16330ms.

Naive STL search and replace result: 3425?=3425
100000 cycles took 6719ms.

Boost replace_all result:3425?=3425
100000 cycles took 7353ms.

MFCMimicking result:3425?=3425
100000 cycles took 4076ms.

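【Comments】:

  • Clang 7.0? Also, this is completely in line with my findings. Clang 3.7 and GCC 5.2 did not make Spirit Qi any faster. Also, this ignores the fact that the manual approach in my answer, or the MFCMimicking version the OP added last, is still a lot faster.
  • @sehe Apple have forked off the main clang repository (idiots!) and have different version numbers.
  • @RichardHodges This clearly shows the superiority of Clang, Apple's fine-tuned high-end hardware and the engineering marvel that is OSX. Just kidding, of course. You ran 10k cycles while I ran 100k cycles; a zero is easy to miss, hence the tenfold "improvement" ;) Leaving aside that you (of course) did not run the MFC version, so you are not comparing apples to apples: these results come from different reference machines.
  • @kreuzerkrieg Added the MFC mimicking for comparison. I think you will find it proves the point.
  • @kreuzerkrieg The code you linked on Coliru has this line: size_t cycles = 10'000;, which, if I'm not mistaken, is ten thousand. Are you sure you tried it on MFC with 100,000? A difference of an order of magnitude is consistent with that mistake having been made here.
【Answer 4】: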

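Just some updates. I ran the original STL code (using search) against the MFC-inspired code. With optimization (-O2), the STL-based version gives 228ms while the MFC-like one gives 285ms. Without optimization I get something like 7284ms vs. 310ms. I did this on a macbook2016Pro with an i7-6700HQ CPU @ 2.60GHz. So basically the code using strstr could not be optimized, while the STL code was heavily optimized.

Then I ran the final version of the naiveSTL code, which uses find instead of search, and it gave me 28ms. So it is definitely the winner. I added the code below in case @kreuzerkrieg's link becomes invalid one day.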

inline void naiveSTL(std::string& input, const TokenMap& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = input.find(token.first);
        while(next != std::string::npos)
        {
            input.replace(next, token.first.size(), token.second);
            next = input.find(token.first, next + 1);
        }
    }
}

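【Comments】:

  • Indeed, things have changed. I have put all the code together and pushed it to a github repository. The Nonius results are available online. The answer has been updated with the links. The bottom line, however, has not changed :)
【Answer 5】: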

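OK, this is going to be a long story. Just to remind you of the questions asked:

  1. Why is search and replace in C++ (various approaches) so slow?
  2. Why is MFC search and replace so fast?

Surprisingly, both questions have the same answer: C++ overhead. Yes, our shiny modern C++ has an overhead that we mostly dismiss in exchange for flexibility and elegance.

However, when it comes to sub-microsecond resolution (not that C++ is incapable of doing things at nanosecond resolution), the overhead becomes more prominent.

Let me demonstrate with the same code I posted in the question, made more consistent about what is done in each function.

Live On Coliru

It uses the aforementioned Nonius (thanks to @sehe), and the interactive results are here.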

You can click the legend to show/hide particular series.

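Conclusion

There are two outstanding results:

  • the MFC mimicking function, and
  • my own manual replace

These functions are at least an order of magnitude faster than the rest, so what is the difference?

All the "slow" functions are written in C++, while the fast ones are written in C (not pure C; I was too lazy to deal with malloc/realloc of the output buffer when the output grows in size). Well, I guess it is clear that sometimes there is no choice but to resort to pure C. I personally am against using C, for security reasons and because of its lack of type safety. Besides, writing high-quality C code simply requires more expertise and attention.

I won't mark this as the answer for now; I'm waiting for comments on this conclusion.

I want to thank everyone who actively participated in the discussion, brought up ideas and pointed out inconsistencies in my example.

2019 update
Just to preserve the code: https://github.com/kreuzerkrieg/string_search_replace
Nonius results: https://kreuzerkrieg.github.io/string_search_replace/

Run with gcc-9 on Ubuntu 18.04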

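【Comments】:

  • I think you forgot to test your kreuzerkriegManual function. The result is this. Also, mine is not "reversed"; you handicapped it a bit by manipulating the keys every time. Can you update the comparison with the bugs fixed?
  • I did test it; then I changed it a bit and most probably introduced some bug there. I will fix it later; I have no access to my machine today.
  • @sehe, just tested it, and it works as expected. I also checked the version of the code I posted; it is the same as the one I run locally. Strange story...
  • Not strange at all. It turns out you simply used the wrong token map: side-by-side.cpp (moral: assert is useless if you force NDEBUG). Luckily, the performance difference is in favor of the correct version: interactive graph
  • Ah, damn! The moral is: don't run #define-dependent code and expect the same results! And thanks for the help!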
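
The moral from these comments can be sketched as a check that stays active regardless of NDEBUG. The helper name require_equal is hypothetical; it is not part of Nonius or of the benchmark code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: unlike assert(), this comparison is not compiled out
// when NDEBUG is defined, so an optimized benchmark build still verifies results.
inline void require_equal(const std::string& actual,
                          const std::string& expected,
                          const char* what)
{
    if (actual != expected)
        throw std::runtime_error(std::string("result mismatch in ") + what);
}
```

Calling something like this once after the measured loop keeps the verification out of the timed region while still failing loudly when the wrong token map is used.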
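【Answer 6】:

Your questionable use of std::string::replace is so pointlessly slow that nothing else in the code matters.

【Comments】:

  • This pointless replace beats the cool kid, boost.spirit.
  • I'm not aware of Boost.Spirit making any performance claims. Its purpose is ease of use, not fast string-matching performance.
  • @MaxLybbert: Boost.Spirit does occasionally claim to be very fast, but I don't think ease of use is its strong suit. Maybe "easy to use, for those who are already experts" :) boost-spirit.com/home/2014/09/03/… I think it is pretty hard to compete on low-level string-matching algorithms anyway; there is only so much you can do.
  • @MaxLybbert Added comparative benchmarks with charts to my answer, in case you want to take a look.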