Why is a homemade binary search algorithm slower than std::binary_search? Answers

【Question title】: Why is a homemade binary search algorithm slower than std::binary_search?
【Posted】: 2014-04-08 10:45:43
【Question】:

A simple homemade binary search algorithm is beaten by std::binary_search (again):

// gcc version 4.8.2 X86_64

#ifndef EXAMPLE_COMPARE_VERSION
# define EXAMPLE_COMPARE_VERSION 0
#endif

static const long long LOOPS = 0x1fffffff;

#include <cassert>
#include <cstdlib>
#include <ctime>
#include <cstdio>

#if EXAMPLE_COMPARE_VERSION
#include <algorithm>

inline bool stl_compare(const int l, const int r) {
  return l < r;
}

#else

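// Note: treats [beg, end] as a closed range. main() passes arr + len, which
// stays in bounds only because every searched value is present in arr.
inline bool compare(const int *beg, const int *end, int v) {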
  for (const int *p; beg <= end;) {
    p = beg + (end - beg) / 2;
    if (*p < v) beg = p + 1;
    else if (*p > v) end = p - 1;
    else return true;
  }
  return false;
}
#endif

int main() {
  const int arr[] = {
    1784, 1785, 1787, 1789, 1794, 1796, 1797, 1801, 
    1802, 1805, 1808, 1809, 1912, 1916, 1918, 1919, 
    1920, 1924, 1925, 1926, 1929, 1930, 2040, 2044, 
    2047, 2055, 2057, 2058, 2060, 2061, 2064, 2168, 
    2172, 2189, 2193, 2300, 2307, 2309, 2310, 2314, 
    2315, 2316, 2424, 2429, 2432, 2433, 2438, 2441, 
    2448, 2552, 2555, 2563, 2565, 2572, 2573, 2680, 
    2684, 2688, 2694, 2697, 2699, 2700, 2704, 2705, 
    2808, 2811, 2813, 2814, 2816, 2818, 2822, 2826, 
    2827, 2828, 2936, 2957, 3064, 3070, 3072, 3073, 
    3074, 3075, 3076, 3077, 3078, 3081, 3082, 3084, 
    3085, 3086, 3088, 3192, 3196, 3198, 3200, 3205, 
    3206, 3211, 3212, 3213, 3326, 3327, 3328, 3330, 
    3331, 3333, 3337, 3338, 3339, 3344, 3448, 3449, 
    3451, 3452, 3454, 3459, 3461, 3462, 3465, 3469, 
    3472, 3578, 3585, 3588, 3593, 3594, 3704, 3712, 
    3715, 3722, 3723, 3852, 3972, 3973, 3974, 3980, 
    3982, 4088, 4090, 4091, 4092, 4094, 4096, 4098, 
    4099, 4100, 4101, 4102, 4103, 4105, 4106, 4107, 
    4108, 4109, 4110, 4216, 4220, 4222, 4223, 4224, 
    4226, 4227, 4229, 4230, 4233, 4234, 4235, 4238, 
    4240, 4350, 4354, 4361, 4369, 4476, 4480, 4486, 
    4600, 4614, 4735, 4864, 4870, 4984, 4991, 5004, 
  };
  clock_t t = clock();
  const size_t len = sizeof(arr) / sizeof(arr[0]);
  for (long long i = 0; i < LOOPS; i++) {
    int v = arr[rand() % len];
#if EXAMPLE_COMPARE_VERSION >= 2
    assert(std::binary_search(arr, arr + len, v, stl_compare));
#elif EXAMPLE_COMPARE_VERSION
    assert(std::binary_search(arr, arr + len, v));
#else 
    assert(compare(arr, arr + len, v));
#endif
  }
  // clock_t is not size_t, so cast explicitly instead of printing with %zu
  printf("compare version: %d\ttime: %ld\n",
      EXAMPLE_COMPARE_VERSION, (long)((clock() - t) / 10000));
}

Compile the file (assuming it is saved as `t.cc`):

g++ t.cc -O3 -DEXAMPLE_COMPARE_VERSION=0 -o t0
g++ t.cc -O3 -DEXAMPLE_COMPARE_VERSION=1 -o t1
g++ t.cc -O3 -DEXAMPLE_COMPARE_VERSION=2 -o t2

Test:

./t2 ; ./t0 ; ./t1

Output on my machine (lower time is faster):

compare version: 2      time: 3533
compare version: 0      time: 4074
compare version: 1      time: 3968

When EXAMPLE_COMPARE_VERSION is set to 0, we use the homemade binary search algorithm:

inline bool compare(const int *beg, const int *end, int v) {
  for (const int *p; beg <= end;) {
    p = beg + (end - beg) / 2;
    if (*p < v) beg = p + 1;
    else if (*p > v) end = p - 1;
    else return true;
  }
  return false;
}

When EXAMPLE_COMPARE_VERSION is set to 1, we use:

template <class ForwardIterator, class T>
  bool binary_search (ForwardIterator first, ForwardIterator last,
                      const T& val);

When EXAMPLE_COMPARE_VERSION is set to 2, we use:

template <class ForwardIterator, class T, class Compare>
  bool binary_search (ForwardIterator first, ForwardIterator last,
                      const T& val, Compare comp);

// the Compare function:
inline bool stl_compare(const int l, const int r) {
  return l < r;
}

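Both std::binary_search functions are defined in bits/stl_algo.h in the gcc header directory.

Questions

Why is std::binary_search with a comparison function (t2) so much faster than the version without one (t1)?
Sometimes t1 is even faster than the homemade binary search (t0). Why is t0 so slow, and how can it be sped up?

Update:

Replaced random() with rand(); see also What difference between rand() and random() functions?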

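【Comments】:

The time difference is not large. Try enlarging arr (maybe... tenfold?). Maybe you should also use <random>.
Effective STL by Scott Meyers. Read from Item 43 onward; those items deal specifically with all the issues you mention. I could have posted all his answers, but reading them in his book will be clearer.
Compare their timing with yours. Also inspect the generated assembly for hints.
@DumbCoder Do you mean Item 12, "Have realistic expectations about the thread safety of STL containers"? I'll read it. Thanks :)
@MapX - No, I meant from Item 43 onward.

Tags: c++ algorithm stl profiling binary-search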

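【Answer 1】:

Because the benchmark is flawed.

You call random inside the loop (and inside the timed region): not only does its own running time count (and affect the benchmark), it also means you may not be measuring the same run from one benchmark to the next.
Since the time spent depends on the random output, which statistical method do you use to be as fair as possible? The average of 5 interleaved runs? The best of 5 interleaved runs? ...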
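One way to address both points is sketched below: the query values are generated once, outside the timed region, from a fixed seed, and each candidate search is timed several times with the best run reported. The helper names `make_queries` and `best_of` are hypothetical, and the use of <random> and std::chrono is an assumption rather than part of the original benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: build the query values once, outside any timed
// region, so every candidate is measured on identical input.
std::vector<int> make_queries(const std::vector<int>& sorted,
                              std::size_t count, unsigned seed) {
  assert(!sorted.empty());
  std::mt19937 gen(seed);  // fixed seed: the sequence is reproducible
  std::uniform_int_distribution<std::size_t> pick(0, sorted.size() - 1);
  std::vector<int> queries(count);
  for (int& q : queries) q = sorted[pick(gen)];
  return queries;
}

// Hypothetical helper: run `search` over all queries `runs` times and return
// the fastest run in nanoseconds. `hits` receives the number of successful
// lookups of the last run, so the searches cannot be optimized away.
template <class Search>
long long best_of(int runs, const std::vector<int>& sorted,
                  const std::vector<int>& queries, Search search,
                  std::size_t& hits) {
  using clock = std::chrono::steady_clock;
  const int* const first = sorted.data();
  const int* const last = first + sorted.size();
  long long best = std::numeric_limits<long long>::max();
  for (int r = 0; r < runs; ++r) {
    hits = 0;
    const auto start = clock::now();
    for (int q : queries) {
      if (search(first, last, q)) ++hits;
    }
    const auto stop = clock::now();
    const long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count();
    best = std::min(best, ns);
  }
  return best;
}
```

To interleave candidates, call `best_of` with `runs` set to 1 for each candidate in turn and repeat the whole round several times, keeping the minimum per candidate.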

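Now, even after clearing away the rubble, you may well end up with the standard algorithm being faster than your homemade solution. At this point, think of the C++ philosophy: you don't pay for what you don't need. So, optimizations aside, the standard implementations are probably at least lean enough to be as fast as the naive approach: if they weren't, they would have been patched a long time ago!

So, in the end, you need to examine the differences. At this point you need to dive into the code and see how it is mapped. I suggest doing this exploration with the source code, the LLVM IR, or the assembly (feel free to ask if you don't understand some transformation).

Maybe there is some unrolling? Maybe the tests are better hinted? Who knows; after decades of existence, you might find a pearl.

Note: to get the LLVM IR on http://coliru.stacked-crooked.com, use the following command line: clang -O3 -S -emit-llvm -o main.ll main.cpp && cat main.ll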

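【Comments】:

Without a call to srandom(), random() outputs the same sequence every time.
@MapX: Do you mean srand and rand? I personally don't know of a function called random, and neither does cppreference, so I assumed it was a custom function.
random() is not part of the standard library. But rand() and random() behave the same way for the purposes of this question.
@MapX: I edited my answer to take into account that the random call may generate the same sequence of numbers. That does not noticeably change the time spent in the call, nor the question of which statistical method is used to gather the measurements.
@MapX Either way, I would try to take the rand() call out of the timing: either generate the random numbers up front (only do this if there are just a few of them, otherwise it is expensive), or generate a random number, start the timer, search, stop the timer, add to the total time, and loop.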
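The second option in the last comment can be sketched as follows. The function name `time_searches_only` is hypothetical and std::chrono::steady_clock is an assumed choice of timer. Keep in mind that reading the clock twice per iteration has a cost of its own, which may be comparable to one search of a small array, so the totals are better suited to comparing candidates than to reading as absolute numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: accumulate only the time spent inside the search.
// rand() runs before the timer starts, so it is not part of the total.
long long time_searches_only(const int* arr, std::size_t len, long loops,
                             long& hits) {
  assert(len > 0);
  using clock = std::chrono::steady_clock;
  std::chrono::nanoseconds total(0);
  hits = 0;
  for (long i = 0; i < loops; ++i) {
    const int v = arr[std::rand() % len];  // generate: not timed
    const auto start = clock::now();       // start the timer
    const bool found = std::binary_search(arr, arr + len, v);  // search
    const auto stop = clock::now();        // stop the timer
    total += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    if (found) ++hits;  // use the result so the search is not discarded
  }
  return total.count();  // total nanoseconds spent searching
}
```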

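【Answer 2】:

The compiler will inline the comparison function anyway, and STL implementations are usually written by gurus.

【Comments】:

Their being gurus does not explain the difference between their own versions.

【Answer 3】:

I modified Mohit Jain's first answer. I have 2 versions:

Version 1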

inline bool compare(const int *beg, const int *end, int v) {
  while (beg <= end) {
    const int* const p = beg + ((end - beg) >> 1);
    const int z = *p;
    if(z != v){
        beg = z > v ? beg : p + 1;
        end = z < v ? end : p - 1;
    }
    else {
        return true;
    }
  }
  return false;
}

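Version 2

// Caution: the bounds only move to p, so end must point one past the last
// element, and the loop does not terminate when v is absent from the range.
inline bool compare(const int *beg, const int *end, int v) {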
  while (beg <= end) {
    const int* const p = beg + ((end - beg) >> 1);
    const int z = *p;
    if(z != v){
        beg = z > v ? beg : p;
        end = z < v ? end : p;
    }
    else {
        return true;
    }
  }
  return false;
}

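Running times

Original version: 2642

Mohit's version: 2435

My version 1: 2413

My version 2: 2366

t1: 2606

t2: 2508

The gcc version I used is 4.7.2.

The results for t0, t1 and t2 are consistent with the OP's results.

To my surprise, my version 2 is the fastest. This may have happened by chance, through overfitting (i.e. optimizing the code for this particular test set). Also, I am not sure why my version 1 is faster than Mohit's version.

I tested various things and figured I should post the fast versions. To determine why a particular version is faster, one should inspect the assembly code.

【Comments】:

Besides the assembly instruction count, instruction pipelining, prefetch strategy and so on may play an important role in the timing. But good job.
I agree, I should have mentioned that as well. I did not check the assembly output, but my versions may be slightly faster because the assignments to beg/end use conditional move operations and the first if is almost always entered (good branch prediction).
@George Maybe your second version is slightly faster because the compiler can optimize a plain pointer assignment better than one involving pointer arithmetic: beg = z > v ? beg : p; versus beg = z > v ? beg : p + 1;
That could be it too! However, I am not sure whether this holds in general or only for this particular test set. Version 2 should take more time, because it excludes fewer elements at each step.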
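The "excludes fewer elements" intuition can be checked independently of timing by counting how many elements each update rule probes. The sketch below is an instrumented rewrite, not the answer's code: the names `probes_v1`, `probes_v2` and `total_probes` are hypothetical, and the Version 2 variant adds a guard (an assumption introduced here) so that it stops instead of spinning when the value is absent.

```cpp
#include <cassert>
#include <cstddef>

// Version 1 update rule on the closed range [beg, end].
// Returns the number of elements probed, or -1 if v is absent.
long probes_v1(const int* beg, const int* end, int v) {
  long probes = 0;
  while (beg <= end) {
    const int* const p = beg + ((end - beg) >> 1);
    const int z = *p;
    ++probes;
    if (z == v) return probes;
    beg = z > v ? beg : p + 1;
    end = z < v ? end : p - 1;
  }
  return -1;
}

// Version 2 update rule; end must point one past the last element.
// Added assumption: a guard that stops once the range cannot shrink further.
long probes_v2(const int* beg, const int* end, int v) {
  long probes = 0;
  while (beg < end) {
    const int* const p = beg + ((end - beg) >> 1);
    const int z = *p;
    ++probes;
    if (z == v) return probes;
    if (end - beg <= 1) return -1;  // no further progress is possible
    beg = z > v ? beg : p;
    end = z < v ? end : p;
  }
  return -1;
}

// Sum the probes needed to find every element of a sorted array.
long total_probes(const int* arr, std::size_t len, bool use_v2) {
  assert(len > 0);
  long total = 0;
  for (std::size_t i = 0; i < len; ++i) {
    total += use_v2 ? probes_v2(arr, arr + len, arr[i])
                    : probes_v1(arr, arr + len - 1, arr[i]);
  }
  return total;
}
```

Comparing `total_probes(arr, len, false)` with `total_probes(arr, len, true)` on the benchmark array shows whether the two rules actually differ in work done; any remaining timing gap would then have to come from code generation rather than from the number of steps.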

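【Answer 4】:

There is no definitive answer to this, but I can try to offer a few points.

If you specify stl_compare, std::binary_search first calls your stl_compare and then the actual operator
Your algorithm has room for improvement. For example, you dereference p twice when comparing. You could save *p in a const, register, or const register variable to speed things up.

Could you please modify your compare function as follows and give it a try: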

inline bool compare(const int *beg, const int *end, int v) {
  while (beg <= end) {
    const int *const p = beg + ((end - beg) >> 1);
    const int z = *p;
    if (z < v) beg = p + 1;
    else if (z > v) end = p - 1;
    else return true;
  }
  return false;
}

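On my machine it shows better results than the STL binary search (gcc 4.6.3).

Edit

With full respect to Matthieu M's comments, I tried to reproduce your search recipe. In my setup I still get results comparable to the STL.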

$ ./t0
compare version: 0      time: 3088
$ ./t1
compare version: 1      time: 3113
$ ./t2
compare version: 2      time: 3115
$ ./t0;./t1;./t2
compare version: 0      time: 3082
compare version: 1      time: 3116
compare version: 2      time: 3042

Here is the modified function I used:

inline bool compare(const int *beg, const int *end, int v) {
  if(end <= beg) return false; // Comment this line if you are sure end is always greater than beg
  int count = end - beg;
  while (count > 0) {
    const int half = count >> 1;
    const int *const p = beg + half;
    if (*p < v) {
      beg = p + 1;
      count -= half + 1;
    } else {
      count = half;
    }
  }
  // beg can reach end when v is greater than every element, so check first
  return beg != end && *beg == v;
}
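The count-based loop above has the same shape as a lower-bound search followed by an equality check. As a point of comparison, here is a minimal sketch that expresses the same membership test through std::lower_bound; the name `contains_sorted` is hypothetical, and this is an illustration rather than the library's actual implementation of std::binary_search.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: membership test over the half-open range [beg, end).
inline bool contains_sorted(const int* beg, const int* end, int v) {
  assert(beg <= end);  // precondition: a valid (possibly empty) range
  const int* const it = std::lower_bound(beg, end, v);  // first element >= v
  return it != end && *it == v;  // check the bound before dereferencing
}
```

Because the range is half-open, passing `arr` and `arr + len` is correct here, including for values that are absent or larger than every element.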

【Comments】:

Thanks. I made the change and found little difference. t0 is still a lot slower than t2, and a bit slower than t1 (gcc 4.8.2).
