Why is my binary search so slow compared to an iterative search? Answers

【Question title】: Why is my binary search so insanely slow compared to an iterative?
【Posted】: 2013-09-26 23:22:30
【Question description】:

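I am writing an autocomplete program that, given a dictionary file and an input file, finds all possible matches for a letter or a set of characters. I have just finished a version that uses a binary search in place of the iterative search, expecting it to improve the program's overall performance.

The problem is that the binary search is almost 9 times slower than the iterative search. What gives? I thought that using a binary search instead of iterating would improve performance.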

[Figure: run times, with the binary search version on the left]

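Here are the important parts of each version; the full code can be found on my github and can be built and run with cmake.

Binary search function (called while looping over the given input):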

bool search(std::vector<std::string>& dict, std::string in,
        std::queue<std::string>& out)
{
    //tick makes sure the loop found at least one thing. if not then break the function
    bool tick = false;  
    bool running = true;
    while(running) {
        //for each element in the input vector
        //find all possible word matches and push onto the queue
        int first=0, last= dict.size() -1;
        while(first <= last)
        {
            tick = false;
            int middle = (first+last)/2;
            std::string sub = (dict.at(middle)).substr(0,in.length());
            int comp = in.compare(sub);
            //if comp returns 0(found word matching case)
            if(comp == 0) {
                tick = true;
                out.push(dict.at(middle));
                dict.erase(dict.begin() + middle);      
            }
            //if not, take top half
            else if (comp > 0)
                first = middle + 1;
            //else go with the lower half
            else
                last = middle - 1;
        }
        if(tick==false)
            running = false;
    }
    return true;
}

Iterative search (inside the main loop):

for(int k = 0; k < input.size(); ++k) {
        int len = (input.at(k)).length();
        // truth false variable to end out while loop
        bool found = false;
        // create an iterator pointing to the first element of the dictionary
        vecIter i = dictionary.begin();
        // this while loop is not complete, a condition needs to be made
        while(!found && i != dictionary.end()) {
            // take a substring the dictionary word(the length is dependent on
            // the input value) and compare
            if( (*i).substr(0,len) == input.at(k) ) {
                // so a word is found! push onto the queue
                matchingCase.push(*i);
            }
            // move iterator to next element of data
            ++i;    
        }

    }

Sample input file:

z
be
int
nor
tes
terr
on

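【Question comments】:

Does "a letter or set of letters" mean that they are at the start of the word being searched for?
@SJuan76 I edited in a sample file that includes the search terms used in the picture, and yes, the letters are at the beginning of the word.
Erasing items from a vector is expensive.
Don't use substr; try std::strncmp instead.
Do they give exactly the same results? Have you counted how many operations you perform in each loop and compared them?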
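
The suggestion above to drop substr could look roughly like the sketch below. The helper name compare_prefix is made up for illustration and is not part of the original program; it assumes the strings contain no embedded null characters. Unlike substr, it does not allocate a temporary string for every comparison.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not from the original code): compares the first
// prefix.size() characters of `word` against `prefix` without building
// a temporary substring.
// Returns 0 when `word` starts with `prefix`, a negative value when the
// leading part of `word` sorts before `prefix`, and a positive value otherwise.
int compare_prefix(const std::string& word, const std::string& prefix)
{
    return std::strncmp(word.c_str(), prefix.c_str(), prefix.size());
}
```

Note that the sign is the reverse of in.compare(sub) in the question, because here the dictionary word is on the left-hand side.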

Tags: c++ performance search time binary-search

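【Solution 1】:

Instead of erasing elements from the middle of the vector (which is very expensive) and then restarting the search, just compare the elements before and after the found item (since they should all be adjacent to each other) until you have found all the matching items.

Or use std::equal_range, which does exactly that.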
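
A minimal sketch of the std::equal_range approach, assuming the dictionary is sorted with the default std::string ordering. The names prefix_range and DictIter are invented for this example. The comparator looks only at the first prefix.size() characters of each string, so every word that starts with the prefix counts as equivalent to it.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using DictIter = std::vector<std::string>::const_iterator;

// Hypothetical helper: returns the half-open range [first, last) of words
// in a sorted dictionary that start with `prefix`. The dictionary is not
// modified.
std::pair<DictIter, DictIter> prefix_range(const std::vector<std::string>& dict,
                                           const std::string& prefix)
{
    const std::size_t n = prefix.size();
    // Order strings by their first n characters only. Truncation keeps the
    // sorted order intact, so the vector is still partitioned as
    // std::equal_range requires.
    auto prefix_less = [n](const std::string& a, const std::string& b) {
        return a.compare(0, n, b, 0, n) < 0;
    };
    return std::equal_range(dict.begin(), dict.end(), prefix, prefix_less);
}
```

The caller can then push every element of the returned range onto the queue. The search itself is logarithmic in the dictionary size, and nothing is erased.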

【Solution 2】:

This is the culprit:

dict.erase(dict.begin() + middle);

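You are repeatedly removing items from the dictionary in order to naively use binary search to find all valid prefixes. This adds enormous complexity, and it is unnecessary.

Instead, once you have found a match, step backwards until you find the first match, then step forwards, adding every match to your queue. Remember that because your dictionary is sorted and you are only using the prefix, all valid matches will appear consecutively.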
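
The step-back-then-forward idea might be sketched as follows. This is an illustrative rewrite under assumptions, not the asker's final code: the name search_prefix is invented, the dictionary is taken by const reference so that it is never modified, and the return value reports whether anything matched (the original function always returned true).

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical replacement for the question's search(): one binary search
// to land on any matching word, then a linear walk over its neighbours.
bool search_prefix(const std::vector<std::string>& dict, const std::string& in,
                   std::queue<std::string>& out)
{
    // True when dict[i] starts with `in`; no temporary substring is created.
    auto starts_with = [&](std::size_t i) {
        return dict[i].compare(0, in.size(), in) == 0;
    };

    std::size_t first = 0, last = dict.size();  // half-open range [first, last)
    while (first < last) {
        const std::size_t middle = first + (last - first) / 2;
        const int comp = dict[middle].compare(0, in.size(), in);
        if (comp == 0) {
            // Step back to the first match...
            std::size_t start = middle;
            while (start > 0 && starts_with(start - 1))
                --start;
            // ...then walk forward, pushing every consecutive match.
            for (std::size_t i = start; i < dict.size() && starts_with(i); ++i)
                out.push(dict[i]);
            return true;
        }
        if (comp < 0)
            first = middle + 1;  // dictionary word sorts before the prefix
        else
            last = middle;       // dictionary word sorts after the prefix
    }
    return false;  // no word starts with `in`
}
```

The walk is linear only in the number of matches, which have to be visited anyway in order to be pushed onto the queue.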

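【Solution 3】:

The dict.erase operation is linear in the size of dict: it shifts every element after the erased position one slot toward the beginning of the array. This makes the "binary search" algorithm potentially quadratic in the length of dict, with O(N^2) expensive memory copy operations.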
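
To make the quadratic growth concrete, here is a rough cost model rather than a measurement. The function moves_for_erases is hypothetical; it only counts how many elements a vector has to shift when the same index is erased repeatedly, as happens when the matches sit next to each other.

```cpp
#include <cstddef>

// Hypothetical cost model: the number of element shifts caused by erasing
// index `pos` up to `count` times from a vector that starts with `size`
// elements. Each erase shifts every element after `pos` down by one slot.
std::size_t moves_for_erases(std::size_t size, std::size_t pos, std::size_t count)
{
    std::size_t moves = 0;
    for (std::size_t i = 0; i < count && pos < size; ++i) {
        moves += size - pos - 1;  // elements that follow the erased one
        --size;                   // the vector is now one element shorter
    }
    return moves;
}
```

Under this model a single erase in the middle of a 100000-word dictionary shifts 49999 strings, and erasing all N elements from the front adds up to N(N-1)/2 shifts. Each erase in the question is also followed by a fresh binary search.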