The best way to find the number of occurrences of a word in a string (C++, no spaces in the string): answers

【Question title】: The best way to find number of occurrences of word in string (C++, no spaces in string)
【Posted】: 2019-05-16 12:01:02
【Question】:

Suppose you are given the string "banana".

You want to know how many times "ana" can be found in "banana". The occurrences may overlap, so the answer is 2.

string s = "banana";
int num = 0,pos = 0;
pos = s.find("ana");
while(pos!=string::npos) {
    num++;
    pos = s.find("ana",pos+1);
}
cout<<num<<endl;

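The problem is that I would like to write shorter code for this. Which functions could I use? I tried search(), but it is not what I want. count() only works for single characters.

Are there other functions that could help me do this? (Boost is not allowed in the competition, so not this time.)

【Comments】:

This is about as good as what you are asking for. I suppose you could use a regex to drop a line or two of code, but regexes can be quite heavyweight.
A competition :) Which competition? Where? If you tell me, I might not give you the answer and instead submit it to the competition myself :) Also, shorter code does not necessarily mean a faster program!
The competition is a national-level competition for students :/
Shouldn't it be pos = s.find("ana", pos + 2); since you are looking for "ana" and not "nan"?
I would be surprised if the competition allowed contestants to get outside help.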

Tags: c++ string

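【Solution 1】:

Assuming you eventually also intend to count the replications of any possible phrase, here are 3 generic function templates that work with almost any container or pointer/array, provided they are given the correct first/last iterators rather than the container itself!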

#include <iostream>
#include <iterator>
#include <algorithm>
#include <map>
#include <vector>

//finding the occurrence for an specific case within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type>
dif_t__ occurrence(const rng_t__ rng_fst_, const rng_t__ rng_lst_, const rng_t__ schd_fst_, const rng_t__ schd_lst){
    dif_t__ counter = 0;
    // Step one past each match start; the iterator is never incremented beyond rng_lst_.
    for (rng_t__ it = rng_fst_; (it = std::search(it, rng_lst_, schd_fst_, schd_lst)) != rng_lst_; ++it, ++counter);
    return counter;
}
//finding the replications for all subsets with certain length within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type, typename val_t__ = typename std::iterator_traits<rng_t__>::value_type>
dif_t__ replications(const rng_t__ rng_fst_, const rng_t__ rng_lst_, dif_t__ lnt_){
    if(!lnt_ or lnt_ >= std::distance(rng_fst_, rng_lst_)) return 0;
    rng_t__ it_lst = rng_fst_;
    for (--lnt_; lnt_--; ++it_lst);
    std::map<std::vector<val_t__>, dif_t__> cases;
    for (rng_t__ it_fst = rng_fst_; it_lst != rng_lst_; ++it_fst){
        ++it_lst; // advance before use so the iterator never moves past rng_lst_
        auto it_rslt_pair = cases.insert({{it_fst, it_lst}, 0});
        if(! it_rslt_pair.second) ++(it_rslt_pair.first->second);
    }
    dif_t__ counter = 0;
    for (const auto& a_case : cases) counter += a_case.second;
    return counter;
}
//finding the replications for all subsets with all possible lengths within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type, typename val_t__ = typename std::iterator_traits<rng_t__>::value_type>
dif_t__ replications(const rng_t__ rng_fst_, const rng_t__ rng_lst_){
    const dif_t__ rng_lnt = std::distance(rng_fst_, rng_lst_);
    dif_t__ counter = 0;
    for (dif_t__ a_lnt = 0; ++a_lnt < rng_lnt; counter += replications(rng_fst_, rng_lst_, a_lnt));
    return counter;
}

#include <string>

int main(int argc, char** argv) {

    std::string range = "banana", searched = "ana";
    // Named arrays: two separate "banana" literals are not guaranteed to be the same object.
    const char text[] = "banana", word[] = "ana";

    std::cout<< "total occurrence for the ana" << std::endl;
    std::cout<< occurrence(text, text + 6, word, word + 3) << std::endl;
    std::cout<< occurrence(range.begin(), range.end(), searched.begin(), searched.end()) << std::endl;

    std::cout<< "total replications for every phrase from banana with length of 3" << std::endl;
    std::cout<< replications(text, text + 6, 3) << std::endl;
    std::cout<< replications(range.begin(), range.end(), 3) << std::endl;

    std::cout<< "total replications for every phrase from banana with every possible length" << std::endl;
    std::cout<< replications(text, text + 6) << std::endl;
    std::cout<< replications(range.begin(), range.end()) << std::endl;

    return 0;
}

Possible output:

total occurrence for the ana
2
2
total replications for every phrase from banana with length of 3
1
1
total replications for every phrase from banana with every possible length
6
6

Good luck with your competition!

【Comments】:

【Solution 2】:

I would like to write shorter code for this

You could use while (true) and break:

std::string s{"banana"};
std::string::size_type pos{0};
int num{0};
while (true) {
    pos = s.find("ana", pos);
    if (pos == std::string::npos) break;
    pos += 2; // next possible place for an "ana";
    ++num;
} 
std::cout << num << "\n";

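【Comments】:

Is using while (true) and break really faster?
@AKL - The only way to tell is to inspect the assembly or measure the duration.
@2785528 Thanks for your wise, broad and typical answer! What I really meant is: why should it be faster?
@AKL The OP did not ask for faster code; the OP asked for shorter code. This way needs only one line with find.
Why should it be faster? I do not know that it should. My experience is that your desktop (and mine) is a complex system. The size or nature of a "code change" says nothing about the "performance change". The system is nonlinear.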

【Solution 3】:

You can use a for loop for this:

string s = "banana";
int num = 0;

for (auto pos = s.find("ana"); pos != string::npos; pos = s.find("ana",pos+1))
    num++;
cout << num << endl;

As you can see, I used auto for the type of pos, so it gets the correct type, std::string::size_type.

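【Comments】:

Thanks for repeating my answer! Are you also willing to take @AlanBirtles's downvote? :) First, mine is shorter (I am talking about the code)! Second, in the find call and the equality check you spend half the number of letters you saved by using auto and repeating pos!
Also, why does everybody use post-increment num++ instead of pre-increment ++num, which is faster and has less overhead?
@AKL In my opinion this is the best solution in terms of readability and conciseness of the code. Also, using num++; or ++num; here produces the same assembly: godbolt.org/z/GLwgEj
1- Since the word is spelled "readability", not the way you wrote it, I am not sure you can judge how good readability is! 2- Readability is not what the asker asked for. 3- You people have become lazy from relying too much on assemblers and optimizers. Although it may not matter in this case, nothing beats building a good habit. 4- Please go and ask AlanBirtles to remove his downvote :)
@AKL This is different from your answer and does not have the same bug.

【Solution 4】:

Without sacrificing readability, the simplest code is probably something like this: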

std::string s = "banana";
int num = 0;
size_t pos = 0;

while ((pos = s.find("ana", pos)) != std::string::npos)
{
    num++;
    pos++;
}
std::cout << num << "\n";

Or as a for loop:

std::string s = "banana";
int num = 0;
for (size_t pos = 0; (pos = s.find("ana", pos)) != std::string::npos; num++, pos++)
{
}
std::cout << num << "\n";

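【Comments】:

Dude, you are so obsessed with testing the "anana" case that you wrote it into the for-loop version. Since the change is fewer than 6 characters, I cannot edit it for you!
Speaking of different cases: it is possible (however unlikely) that the type of std::string::npos is an integer type smaller or larger than size_t. Since npos is usually filled with -1, if either string::size_type or size_t is unsigned and one of them is smaller than int while the other is not, your code could also contain an infinite loop. There is a reason size_type is defined in string. Should I also start a post about when to use the post-increment operator?
@AKL std::string::size_type is always std::size_t. For integer types the post-increment operator is usually no more expensive; it is only (possibly) more expensive for class types such as iterators, and the compiler optimizes the difference away.
As far as I have read about size_type in string, it is defined by the Allocator's size_type, and one may choose an allocator other than the default std::allocator. But maybe I am wrong. It is not that I doubt you; I would just like to be able to prove it to others if it ever comes up, so I would appreciate a reference for "std::string::size_type is always std::size_t".
I agree with you about post-increment; it is just about building a good habit, avoiding double standards, and code readability. Because what I read was not authoritative material, for a long time I did not even know the pre-increment operator existed!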