查找幻数 C++答案

【问题标题】：Find Magic Numbers C++查找幻数 C++
【发布时间】：2017-01-18 00:27:53
【问题描述】：

幻数

一个正整数是“神奇的”当且仅当它可以通过重复将它除以 2（如果它是偶数）或将它乘以 3（如果它是奇数）再加 1 来减少为 1。因此，例如，3 是神奇的，因为 3 首先减少到 10 (3*3+1)，然后减少到 5 (10/2)，然后减少到 16 (5*3+1)，然后减少到 8 (16/2) ，然后到 4 (8/2)，然后到 2 (4/2)，最后到 1 (2/2)。魔数假设指出所有正整数都是魔数，或者，正式地：∀x ∈ Z, MAGIC(x) 其中 MAGIC(x) 是谓词“x 是魔数”。

我们应该开发一个 C++ 程序来查找从 1 到 50 亿的“幻数”。如果正确完成，应该需要 80 秒或更短的时间。我的大约需要 2 个小时，而我们给出的示例程序需要 14 天。这是我的代码，我该怎么做才能加快速度？我是否遗漏了明显的优化问题？

#include <iostream>
using namespace std;

bool checkMagic(unsigned long number);

int main()
{
  for(unsigned long i = 1; i <= 5000000000; i++)
  {
    if(i % 1000000 == 0)
    {
      //only print out every 1000000 numbers to keep track, but slow it down 
      //by printing out every single number
      cout<<i<<endl;
    }

    if(!checkMagic(i))
    {
      //not magic
      cout<<i<<" not a magic number"<<endl;
      break;
    }
  }
}

bool checkMagic(unsigned long number)
{
  if(number % 2 == 0)
  {
    number = number / 2;
  }
  else
  {
    number = (number * 3) + 1;
  }

  if(number !=  1)
  {
    checkMagic(number);
  }

  return 1;
}

【问题讨论】：

请在SE Code Review 询问以改进工作代码。
使程序快速运行的诀窍是记住您之前找到的所有幻数。如果在乘除的过程中遇到了之前见过的幻数，可以立即停止。这称为 memoization。
好吧，一方面我会摆脱递归。编译器可能正在优化它，但是通过使用循环，您知道您有一个循环，并且您不会进行所有的函数调用。
您的checkMagic 总是返回1。你甚至没有使用递归的结果。
您想找出从 1 到 50 亿的数字中哪些是“神奇的”？容易：所有这些。查找Collatz Conjecture。

标签： c++ magic-numbers

【解决方案1】：

这个问题基本上要求验证Collatz Conjecture最高5B。

我认为这里的关键是，对于我们正在检查的每个数字n，查看乐观情景和悲观情景，并在恢复悲观情景之前检查乐观情景。

在乐观的情况下，我们根据 n/2 修改 n ； 3n + 1 规则，数字序列将：

在有限的步数内变得小于 n（在这种情况下，我们可以检查我们对这个较小数字的了解）。
不会导致步骤溢出。

（正如 TonyK 正确指出的那样，我们不能依赖它（甚至不能依赖第一个）。

所以，对于乐观的场景，我们可以使用如下函数：

#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>

using namespace std;

using found_set = unordered_set<size_t>;

bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
    size_t tries = 0;
    size_t n = i;
    while(n != 1) {
        if(++tries == max_tries ) 
            return false;

        if(n < i)
            return found.empty() || found.find(i) == found.end();
        if(n % 2 == 0)
            n /= 2;
        else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
            return false;
    }   

    return true;
}

注意以下几点：

该函数仅尝试验证它收到的数字的猜想。如果返回true，则已通过验证。如果它返回false，它只是表示它尚未被验证（即，它并不意味着它已被反驳）。
它接受一个参数max_tries，并且只验证到这个数量的步骤。如果超过了这个数字，它不会尝试识别这是否是无限循环的一部分 - 它只是返回验证失败。
它需要一个已知失败数的unordered_set（当然，如果 Collatz 猜想为真，此集合将始终为空）。
它通过__builtin_*_overflow 检测溢出。（不幸的是，这是特定于 gcc 的。不同的平台可能需要一组不同的此类函数。）

现在是缓慢但确定的功能。此函数使用GNU MP multi-precision arithmetic library。它通过维护它已经遇到的数字序列来检查无限循环。如果该数字的猜想已被证明，则此函数返回true，如果该数字的猜想已被证伪，则该函数返回false（注意与之前快速验证的区别）。

bool slow_check(size_t i) {
    mpz_t n_; 
    mpz_init(n_);

    mpz_t rem_;
    mpz_init(rem_);

    mpz_t i_; 
    mpz_init(i_);

    mpz_set_ui(i_, i); 
    mpz_set_ui(n_, i); 

    list<mpz_t> seen;

    while(mpz_cmp_ui(n_, 1) != 0) {
        if(mpz_cmp_ui(n_, i) < 0)
            return true;
        mpz_mod_ui(rem_, n_, 2); 
        if(mpz_cmp_ui(rem_, 0) == 0) {
            mpz_div_ui(n_, n_, 2); 
        }   
        else {
            mpz_mul_ui(n_, n_, 3); 
            mpz_add_ui(n_, n_, 1); 
       }   
       seen.emplace_back(n_);
       for(const auto &e0: seen)
           for(const auto &e1: seen)
               if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
                   return false;
   }   

   return true;
}

最后，main 维护了一个被证明的数字的unordered_set。对于每个数字，它都乐观地尝试验证猜想，然后，如果失败（溢出或超过迭代次数），则使用慢方法：

int main()
{
    const auto max_num = 5000000000;
    found_set found;

    for(size_t i = 1; i <= max_num; i++) {
        if(i % 1000000000 == 0)
            cout << "iteration " << i << endl;

        auto const f = fast_verify(i, max_num, found);
        if(!f && !slow_check(i))
            found.insert(i);
    }

    for(auto e: found)
        cout << e << endl;
}

运行完整代码（如下）给出：

$ g++ -O3 --std=c++11 magic2.cpp -lgmp && time ./a.out
iteration 1000000000
iteration 2000000000
iteration 3000000000
iteration 4000000000
iteration 5000000000

real    1m3.933s
user    1m3.924s
sys 0m0.000s

$ uname -a
Linux atavory 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ sudo lshw | grep -i cpu
      *-cpu
           description: CPU
           product: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
           bus info: cpu@0
           version: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
           capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts cpufreq

即，没有发现被证实的数字，运行时间约为 64 秒。

完整代码：

#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>

using namespace std;

using found_set = unordered_set<size_t>;

bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
    size_t tries = 0;
    size_t n = i;
    while(n != 1) {
        if(++tries == max_tries )
            return false;

        if(n < i)
            return found.empty() || found.find(i) == found.end();
        if(n % 2 == 0)
            n /= 2;
        else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
            return false;
    }   

    return true;
}

bool slow_check(size_t i) {
    mpz_t n_; 
    mpz_init(n_);

    mpz_t rem_;
    mpz_init(rem_);

    mpz_t i_; 
    mpz_init(i_);

    mpz_set_ui(i_, i); 
    mpz_set_ui(n_, i); 

    list<mpz_t> seen;

    while(mpz_cmp_ui(n_, 1) != 0) {
        if(mpz_cmp_ui(n_, i) < 0)
            return true;
        mpz_mod_ui(rem_, n_, 2); 
        if(mpz_cmp_ui(rem_, 0) == 0) {
            mpz_div_ui(n_, n_, 2); 
        }   
        else {
            mpz_mul_ui(n_, n_, 3); 
            mpz_add_ui(n_, n_, 1); 
       }   
       seen.emplace_back(n_);
       for(const auto &e0: seen)
           for(const auto &e1: seen)
               if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
                   return false;
   }   

   return true;
}


int main()
{
    const auto max_num = 5000000000;
    found_set found;

    for(size_t i = 1; i <= max_num; i++) {
        if(i % 1000000000 == 0)
            cout << "iteration " << i << endl;

        auto const f = fast_verify(i, max_num, found);
        if(!f && !slow_check(i))
            found.insert(i);
    }

    for(auto e: found)
        cout << e << endl;

    return 0;
}

【讨论】：

有两个nums 有点混乱。
@MarkSetchell 你是绝对正确的 - 非常感谢。已更正。
这实际上不起作用。考虑一下如果在max_num 下方有一个非幻数会发生什么：你将如何检测到这一点？你不会，因为任何一个 (i) num 都会溢出，可能是一个神奇的数字；或 (ii) 程序将进入无限循环。我想你可以说你在情况 (ii) 中检测到了它，但肯定不是在情况 (i) 中。
由于 while 循环没有副作用，编译器可能会完全忽略该循环。
@Jarod42 您的评论是正确的，但我随后添加了副作用，这几乎不会影响时间。无论如何，我随后使用先乐观后悲观的方法完全重写了解决 TonyK 正确 cmets 的答案。这将运行时间增加了约 20 秒。如果您能再看一下答案，我将不胜感激。谢谢。

【解决方案2】：

您的代码和 Ami Tavory 的回答完全忽略了溢出问题。如果在您的范围中有一个非幻数，那么 Collatz 迭代将进入一个循环，或者无限制地增加。如果它无限制地增加，它肯定会溢出，无论您的unsigned long 的大小如何；如果它进入一个循环，它很可能在到达之前溢出。所以你必须在里面放一个溢出检查：

#include <limits.h>
...
if (number > (ULONG_MAX - 1) / 3)
    PrintAnApologyAndDie() ;
number = (number * 3) + 1;

【讨论】：

【解决方案3】：

我可以做些什么来加快速度？我是否遗漏了明显的优化问题？

for(unsigned long i = 1; i

for 循环并不是调用方法 5B 次最快的编码风格。

在我的代码中，请参见 unwind100() 和 innerLoop()。我编码的展开消除了 99% 的 innerLoop 开销，节省了超过 5 秒（在我的桌面 DD5 上）。您的储蓄可能会有所不同。 Wikipedia 上有一篇关于展开循环的文章，并简要说明了节省的潜力。

如果（i % 1000000 == 0）

此代码显然会生成进度报告。耗资50亿比较，它对幻数检查没有任何作用。

查看我的代码outerLoop，它调用了innerLoop 10 次。这提供了一个每个外循环 1 个进度报告的便利位置。考虑拆分你的循环进入一个'innerLoop'和一个'outerLoop'。测试 50 亿次是比向 innerLoops 添加 10 次调用更昂贵。

与 cmets 一样，无论测试结果如何，您的 checkMagic() 都会返回“1”。一个简单的错误，但我不知道这个错误是否会影响性能。

我认为您的 checkMagic() 没有实现尾递归，这确实会减慢您的代码速度。

查看我的方法 checkMagicR()，它有尾递归，并返回一个 true 或假的。

此外，在您的 CheckMagic() 中，在奇数输入的子句中，您忽略了当 n 为正且奇数时 (3n+1) 必须是偶数的想法。我的代码执行“early-div-by-2”，从而尽可能减少未来的工作量。

在我的方法 unwind10() 中，请注意我的代码通过使用 checkMagicRO()（用于奇数输入）和 checkMagicRE()（甚至）。

在我的早期版本中，代码浪费了时间来检测已知的偶数输入是否为偶数，然后将其除以 2，然后返回 true。 checkMagicRE() 跳过测试和除法，节省时间。

CheckMagicRO() 跳过奇数测试，然后执行奇数并继续进入 checkMagicR()。

if(number != 1) { checkMagic(number); }

您的递归一直持续到 1 ... 没有必要，并且可能对整体性能造成最大影响。

查看我的 checkMagicR() “递归终止子句”。我的代码在 ((n/2)

结果：

我的桌面 (DD5) 较旧、速度较慢，并且运行的是 Ubuntu 15.10 (64)。此代码使用 g++ v5.2.1 开发，仅使用 -O3 无超时断言完成。

DD5 显然比 Amy Tavory 使用的机器慢 2 倍（比较他的代码在 DD5 上的运行方式）。

在 DD5 上，我的代码在模式 1 下大约 44 秒完成，在模式 2 下完成 22 秒。

更快的机器可能会在 11 秒内完成此代码（在模式 2 中）。

完整代码：

// V9

#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>

// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock  Clock_t;


class T431_t
{

private:
   uint64_t           m_N;             // start-of-current-recursion
   const uint64_t     m_upperLimitN;   // wrap-around avoidance / prevention

   std::stringstream  m_ssMagic;       // dual threads use separate out streams

   uint64_t           m_MaxRcrsDpth;   // diag only
   uint64_t           m_Max_n;         // diag only
   uint64_t           m_TotalRecurse;  // diag only

   Clock_t::time_point  m_startMS;     // Derived req - progress info

   std::stringstream  m_ssDuration;

   std::ostream*      m_so;            // dual threads - std::cout or m_ssMagic

   uint64_t           m_blk;

   const int64_t      m_MaxThreadDuration; // single Max80KMs, dual Max40KMs


   // enum of known type (my preference for 'internal' constants)
   enum T431_Constraints : uint64_t
   {
      ZERO     = 0,
      B00      = 1,    // least-significant-bit is bit 0
      ONE      = 1,    // lsbit set
      TWO      = 2,
      THREE    = 3,
      //
      K        = 1000, // instead of 1024,
      //
      BlkSize  = ((K * K * K) / 2),  // 0.5 billion
      //
      MaxSingleCoreMS = (80 * K), // single thread max
      MaxDualCoreMS   = (40 * K)  // dual thread max
   };

   // convenience method:  show both hex and decimal on one line (see dtor)
   inline std::string showHexDec(const std::string& label, uint64_t val) {
      std::stringstream ss;
      ss << "\n" << label
         << " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
         << "  ("   << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
      return (ss.str());
   }

   // duration: now() - m_startMS
   std::chrono::milliseconds  getChronoMillisecond() {
      return (std::chrono::duration_cast<std::chrono::milliseconds>
              (Clock_t::now() - m_startMS));
   }

   // reports milliseconds since m_startMS time
   void ssDurationPut (const std::string& label)
      {
         m_ssDuration.str(std::string());  // empty string
         m_ssDuration.clear();             // clear ss flags

         std::chrono::milliseconds  durationMS = getChronoMillisecond();

         uint64_t  durationSec = (durationMS.count() / 1000);
         uint64_t durationMSec = (durationMS.count() % 1000);

         m_ssDuration
            << label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
            << "."   << std::dec << std::setfill('0') << std::setw(3) << durationMSec
            << " sec";
      }

   inline void reportProgress()
      {
         std::chrono::milliseconds  durationMS = getChronoMillisecond();

         assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
         //
         uint64_t  durationSec = (durationMS.count() / 1000);
         uint64_t durationMSec = (durationMS.count() % 1000);

         *m_so << " m_blk = " << m_blk
               << "  m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
               << "  " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
               << "."  << std::dec << std::setfill('0') << std::setw(3) << durationMSec
               << " sec  (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
               << std::endl;
      }


public:

   T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
      m_N                 (TWO),  // start val of current recursion
      m_upperLimitN       (initWrapAroundLimit()),
      // m_ssMagic        // default ctor ok - empty
      m_MaxRcrsDpth       (ZERO),
      m_Max_n             (TWO),
      m_TotalRecurse      (ZERO),
      m_startMS           (Clock_t::now()),
      // m_ssDuration     // default ctor ok - empty
      m_so                (&std::cout), // default
      m_blk               (ZERO),
      m_MaxThreadDuration (maxDuration)
      {
         m_ssMagic.str().reserve(1024*2);
         m_ssDuration.str().reserve(256);
      }

   ~T431_t()
      {
         // m_MaxThreadDuration
         // m_blk
         // m_so
         // m_ssDuration
         // m_startMS
         // m_Max_n
         // m_TotalRecurse
         // m_MaxRcrsDpth
         if (m_MaxRcrsDpth) // diag only
         {
            ssDurationPut ("\n      duration = " );

            std::cerr << "\n T431() dtor        "
               //
                      << showHexDec("         u64MAX",
                                    std::numeric_limits<uint64_t>::max()) << "\n"
               //
                      << showHexDec("        m_Max_n", m_Max_n)
                      << showHexDec("  (3*m_Max_n)+1", ((3*m_Max_n)+1))
                      << showHexDec("    upperLimitN", m_upperLimitN)
               //                        ^^^^^^^^^^^ no wrap-around possible when n is less
               //
                      << "\n"
                      << showHexDec("  m_MaxRcrsDpth",  m_MaxRcrsDpth)
                      << "\n"
                      << showHexDec(" m_TotalRecurse",  m_TotalRecurse)
               //
                      << m_ssDuration.str() << std::endl;
         }
         // m_ssMagic
         // m_upperLimitN
         // m_N
         if(m_MaxThreadDuration == MaxSingleCoreMS)
            std::cerr << "\n  " << __FILE__ << " by Douglas O. Moen     Compiled on = "
                      << __DATE__ << "  " << __TIME__ << "\n";
      }

private:

   inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
      {
         return(true);  // for unit test, comment out this line to run the asserts

         // unit test - enable both asserts
         assert(nE == m_N);             // confirm recursion start value
         assert(ZERO == (nE & B00));    // confirm even
         return(true);                  // nothing more to do
      }

   inline bool checkMagicRO(uint64_t nO) // n is known odd,  and the recursion starting point
      {
         // unit test - uncomment the following line to measure checkMagicRE() performance
         // return (true);  // when both methods do nothing -  0.044

         // unit test - enable both these asserts
         // assert(nO == m_N);  // confirm recursion start value
         // assert(nO & B00);   // confirm odd
         uint64_t nxt;
         {
            assert(nO < m_upperLimitN);  // assert that NO wrap-around occurs
            // NOTE: the sum of 4 odd uint's is even, and is destined to be
            // divided by 2 in the next recursive invocation.
            // we need not wait to divide by 2
            nxt = ((nO+nO+nO+ONE)  >> ONE); // combine the arithmetic
            //      ||||||||||||   ||||||
            //      sum is even    unknown after sum divided by two
         }
         return (checkMagicR(nxt));
      }


   inline void unwind8()
      {
         // skip 0, 1
         assert(checkMagicRE (m_N)); m_N++;  // 2
         assert(checkMagicRO (m_N)); m_N++;  // 3
         assert(checkMagicRE (m_N)); m_N++;  // 4
         assert(checkMagicRO (m_N)); m_N++;  // 5
         assert(checkMagicRE (m_N)); m_N++;  // 6
         assert(checkMagicRO (m_N)); m_N++;  // 7
         assert(checkMagicRE (m_N)); m_N++;  // 8
         assert(checkMagicRO (m_N)); m_N++;  // 9
      }

   inline void unwind10()
      {
         assert(checkMagicRE (m_N)); m_N++;  // 0
         assert(checkMagicRO (m_N)); m_N++;  // 1
         assert(checkMagicRE (m_N)); m_N++;  // 2
         assert(checkMagicRO (m_N)); m_N++;  // 3
         assert(checkMagicRE (m_N)); m_N++;  // 4
         assert(checkMagicRO (m_N)); m_N++;  // 5
         assert(checkMagicRE (m_N)); m_N++;  // 6
         assert(checkMagicRO (m_N)); m_N++;  // 7
         assert(checkMagicRE (m_N)); m_N++;  // 8
         assert(checkMagicRO (m_N)); m_N++;  // 9
      }

   inline void unwind98()
      {
         unwind8();
         unwind10();  // 1
         unwind10();  // 2
         unwind10();  // 3
         unwind10();  // 4
         unwind10();  // 5
         unwind10();  // 6
         unwind10();  // 7
         unwind10();  // 8
         unwind10();  // 9
      }

   inline void unwind100()
      {
         unwind10();  // 0
         unwind10();  // 1
         unwind10();  // 2
         unwind10();  // 3
         unwind10();  // 4
         unwind10();  // 5
         unwind10();  // 6
         unwind10();  // 7
         unwind10();  // 8
         unwind10();  // 9
      }

   inline void innerLoop (uint64_t start_N, uint64_t end_N)
      {
         for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
         {
            unwind100();
         }

         reportProgress();
      }

   inline void outerLoop(const std::vector<uint64_t>&  start_Ns,
                         const std::vector<uint64_t>&  end_Ns)
      {
         *m_so << "\n outerLoop \n  m_upperLimitN = 0x"
               << std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
               << "\n            m_N = 0x"      << std::setw(16) << start_Ns[0]
               << "  " << std::dec << std::setfill(' ') << std::setw(3) << 0
               << "."  << std::dec << std::setfill('0') << std::setw(3) << 0
               << " sec  (" << std::dec << std::setfill(' ') << std::setw(10)
               << start_Ns[0] << ")" << std::endl;

         size_t szStart = start_Ns.size();
         size_t szEnd   = end_Ns.size();
         assert(szEnd == szStart);

         m_blk = 0;

         {  // when blk 0 starts at offset 2 (to skip values 0 and 1)
            if(TWO == start_Ns[0])
            {
               m_N = TWO;  // set m_N start

               unwind98(); // skip 0 and 1

               assert(100 == m_N); // confirm

               innerLoop (100, end_Ns[m_blk]);

               m_blk += 1;
            }
            else // thread 2 blk 0 starts in the middle of 5 billion range
            {
               m_N = start_Ns[m_blk];  // set m_N start, do full block
            }
         }

         for (; m_blk < szStart;  ++m_blk)
         {
            innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
         }
      }


   // 1 == argc
   void  exec1 (void) // no parameters --> check all values
      {
         const std::vector<uint64_t> start_Ns =
            {         TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
              (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9)  };
         //    blk 0        blk 1        blk 2        blk 3        blk 4
         //    blk 5        blk 6        blk 7        blk 8        blk 9
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
              (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };

         m_startMS = Clock_t::now();

         // outerLoop :  10 blocks generates 10 progressReports()
         outerLoop( start_Ns, end_Ns);

         ssDurationPut("");

         std::cout << "   execDuration = " << m_ssDuration.str() << " " << std::endl;
      } // void  exec1 (void)


   void exec2a ()  //  thread entry a
      {
         //lower half of test range
         const std::vector<uint64_t> start_Ns =
            {        TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
         //    blk 0        blk 1        blk 2        blk 3        blk 4
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };

         m_startMS = Clock_t::now();

         // m_so to std::cout
         // uses 5 blocks to gen 5 progress indicators
         outerLoop (start_Ns, end_Ns);

         ssDurationPut("");

         std::cout << "   execDuration =  " << m_ssDuration.str() << " (a)   " << std::endl;
      } // exec2a (void)


   void exec2b ()  //  thread entry b
      {
         // upper half of test range
         const std::vector<uint64_t> start_Ns =
            { (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9)  };
         //    blk 5        blk 6        blk 7        blk 8        blk 9
         const std::vector<uint64_t> end_Ns =
            { (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };

         m_startMS = Clock_t::now();

         m_so = &m_ssMagic;
         // uses 5 blocks to gen 5 progress indicators
         outerLoop(start_Ns, end_Ns);

         ssDurationPut("");

         m_ssMagic << "   execDuration =  " << m_ssDuration.str() << " (b)  " << std::endl;
      } // exec2b (void)


   // 3 == argc
   int exec3 (std::string argv1,  // recursion start value
              std::string argv2)  // values count
      {
         uint64_t start_N = std::stoull(argv1);
         uint64_t length  = std::stoull(argv2);

         // range checks
         {
            std::string note1;
            switch (start_N) // limit check
            {
            case 0: { start_N = 3; length -= 3;
                  note1 = "... 0 is below test range (change start to 3 ";
            } break;

            case 1: { start_N = 3; length -= 2;
                  note1 = "... 1 is defined magic (change start to 3";
            } break;

            case 2: { start_N = 3; length -= 1;
                  note1 = "... 2 is trivially magic (change start to 3";
            } break;

            default:
            {  // when start_N from user is even
               if (ZERO == (start_N & B00))
               {
                  start_N -= 1;
                  note1 = "... even is not an interesting starting point";
                  length  += 1;
               }
            }
            break;

            } // switch (start_N)

            // start_N not restrained ... allow any manual search
            //    but uin64_t wrap-around still preempted

            std::cout << "\n start_N: " << start_N << "  " << note1
                      << "\n  length: " << length  << "  " << std::endl;
         }

         m_startMS = Clock_t::now();

         for (m_N = start_N;   m_N < (start_N + length);   ++m_N)
         {
            bool magicNumberFound = checkMagicRD (m_N, 1);

            std::cout << m_ssMagic.str() << std::dec << m_N
                      << "    m_MaxRcrsDpth: "  << m_MaxRcrsDpth << ";"
                      << "    m_Max_n: "  << m_Max_n << ";" << std::endl;

            m_ssMagic.str(std::string()); // empty string
            m_ssMagic.clear();            // clear flags)
            if (!magicNumberFound)
            {
               std::cerr << m_N << " not a magic number!" << std::endl;
               break;
            }

         } // for (uint64_t k = kStart; k < kEnd; ++k)

         ssDurationPut("");

         std::cout << "   execDuration =  " << m_ssDuration.str() << "    " << std::endl;

         return(0);
      } // int exec3 (std::string argv1,  std::string argv2)


   // conversion
   std::string ssMagicShow() { return (m_ssMagic.str()); }


   // D3: use recursion for closer match to definition
   bool checkMagicR(uint64_t n)   // recursive Performance
      {
         uint64_t nxt;
         {
            if (n & B00) // n-odd
            {
               if (n >= m_upperLimitN) // wraparound imminent abort
                  return (false);    // recursion termination clause

               // NOTE: the sum of 4 odd uint's is even, and will be halved
               // we need not wait for the next recursion to divide-by-2
               nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
               //     known even
            }
            else // n-even
            {
               nxt = (n >> ONE);   // bit shift divide by 2

               if (nxt < m_N)      // nxt < m_N start of recursion
                  return ( true ); // recursion termination clause
               // We need not de-curse to 1 because all
               // (n < m_N) are asserted as magic
            }
         }
         return (checkMagicR(nxt));  // tail recursion
      } // bool  checkMagicR(uint64_t n)


   // D4: diagnostic text output
   bool checkMagicRD (uint64_t n, uint64_t rd)  // rd: recursion depth
      {
         if (n > m_Max_n) m_Max_n = n;
         m_TotalRecurse += 1;

         m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
                   << " checkMagicRD (" << m_N << ", " << n << ")";

         uint64_t nxt;
         {
            if (n & B00) // n-odd
            {
               if (n >= m_upperLimitN) // wraparound imminent abort
                  return ( terminateCheckMagic (nxt, rd, false) );

               // NOTE: the sum of 4 odd uint's is even, and will be divided by 2
               // we need not wait for the next recursion to divide by 2
               nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
               //      |||||||||   ||||||
               //    sum is even   unknown after divide by two
            }
            else // n-even
            {
               nxt = (n >> ONE);     // bit shift divide by 2

               if (nxt < m_N)      // nxt < m_N start of recursion
                  return ( terminateCheckMagic (nxt, rd, true) );
               // We need not de-curse to 1 because all
               // (n < m_N) are asserted as magic
            }
         }
         return (checkMagicRD (nxt, (rd+1))); // tail recursion
      } //  bool checkMagicRD(uint64_t, uint64_t rd)


   bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
      {
         if (rd > m_MaxRcrsDpth) // diag only
         {
            std::chrono::milliseconds  durationMS = getChronoMillisecond();

            uint64_t durationSec  = durationMS.count() / 1000;
            uint64_t durationMSec = durationMS.count() % 1000;

            m_MaxRcrsDpth = rd;
            // generate noise each update - this does not happen often
            static uint64_t count = 0;
            std::cerr << "\n\n" << std::setfill(' ')
                      << std::setw(30) << "["
                      << std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
                      << durationMSec << std::setfill(' ')
                      << " sec]   m_N: " << std::setw(10) << m_N
                      << "    n: " << std::setw(10) << n
                      << "    MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
                      << "    Max_n:  " << std::setw(16) << m_Max_n
                      << "   (" << count << ")" << std::flush;
            count += 1; // how many times MaxRcrsDpth updated
         }

         m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
                   << "  " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";

         return (rslt);
      }

   uint64_t initWrapAroundLimit()
      {
         uint64_t upLimit = 0;
         uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
         // there exist a max number,  m_upperLimitN
         // where 3n+1 < u64MAX, i.e.
         upLimit = ((u64MAX - 1) / 3);

         return (upLimit);
      } // uint64_t initWrapAroundLimit()

public:

   int exec (int argc, char* argv[])
      {
         int retVal = -1;

         { time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
         // result: consistent run time

         switch (argc)
         {

         case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
         {
            exec1();
            retVal = 0;
         }
         break;

         case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
         {  //                     two new instances,  < 40 sec
            // 2 new objects
            T431_t  t431a(MaxDualCoreMS); // lower test sequence
            T431_t  t431b(MaxDualCoreMS); // upper test sequence

            // 2 threads
            std::thread tA (&T431_t::exec2a, &t431a);
            std::thread tB (&T431_t::exec2b, &t431b);

            // 2 join's
            tA.join();
            tB.join();

            // lower test sequence (tA) output directly to std::cout
            // upper test sequence (tB) captured in T431_t::m_ssMagic.  send it now
            std::cout << t431b.ssMagicShow() << std::endl;

            retVal = 0;
         }
         break;


         case 3: // argv[1] -> start address,  argv[2] -> length,
         {
            retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
         } break;


         default :
         {
            std::cerr
               << "\nUsage: "
               << "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
               << "\n   1  (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
               //
               << "\n   2  (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks,  < 40 sec\n"
               //
               << "\n   3  (two parameters): verbose mode: start argv[1], test count argv[2] \n"
               //
               << "\n   3.1: \"./dumy431V4 3 1000  1> /tmp/dumy431V4.out2   2> /tmp/dumy431V4.out2a  "
               << "\n                     3  1 K    std::cout redirected     std::cerr redirected \n"
               //
               << "\n   3.2: \"./dumy431V4 3 1000000000  1>  /dev/null      2> /tmp/dumy431V4.out2b   "
               << "\n                     3  1 B       discard std::cout  std::cerr redirected \n"
               //
               << "\n   3.3: \"./dumy431V4 15 100   "
               << "\n                     15 100   (cout and cerr to console) \n"
               << std::endl;
            retVal = 0;
         }

         } // switch

         return(retVal);
      } // int exec(int argc, char* argv[])

};  // class T431


int main (int argc, char* argv[])
{
   std::cout << "argc: " << argc << std::endl;
   for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
   std::cout << std::endl;

   setlocale(LC_ALL, "");
   std::ios::sync_with_stdio(false);

   int retVal;
   {
      T431_t  t431;
      retVal = t431.exec(argc, argv);
   }

   std::cout << "\nFINI " << std::endl;
   return(retVal);
}

提交的代码：

a) 使用 assert(x) 进行所有检查（因为 assert 快速且具有优化器不能忽略的副作用）。

b) 通过使用持续时间的断言来检测任何无限循环。

c) 断言以防止任何回绕。

当任何断言发生时，程序都不会正常终止。

当这段代码正常结束时，表示：

 a) no disproven numbers found, and
 b) the running time < 80 seconds, so no endless loop occurred      and
 c) no wrap-around occurred

关于递归 checkMagicR(uint64_t n) 的注意事项：

n 的 3 种形式在递归中可用：

形式参数 n - 递归调用的典型
auto var nxt - 为下一个接收 n 的计算值递归调用
数据属性：m_N——当前运行的起点递归努力

【讨论】：

尾递归演示：使用 -O0 构建时，代码耗时超过 80 秒并断言。进度报告显示 ~18 秒（使用 -O0）vs ~4.3 秒（使用 -O3）。
仅供参考：我使用 make。它生成的编译命令是：g++-5 -m64 -O3 -ggdb -std=c++14 -Wall -Wextra -Wshadow -Wnon-virtual-dtor -pedantic -Wcast-align -Wcast-qual -Wconversion -Wpointer- arith -Wunused -Woverloaded-virtual -O3 dumy431V9.cc -o dumy431V9 -pthread。该代码还使用 -std=C++11 编译和运行（全速）。
我认为“平均”（44 / 5B）小得令人痛苦……我认为优化器可能仍然让我感到惊讶。（想法？）
琐事 1：最大递归深度（提交代码的）是 447 次调用，在测试输入 2,788,008,987 处……令人惊讶地接近 5,000,000,000 测试范围的中间值。该递归中的 Max_n 为 3,562,942,561,397,226,080。

【解决方案4】：

我在猜测优化器可能对这个特定的尾递归做出了哪些改变。

所以我从我的 V9 代码中创建了一个迭代答案（在我的另一个答案中）。

更改自：

bool checkMagicR(uint64_t n) 
   {
      //.. replace all 
   }

更改为：

// ITERATIVE VERSION 
bool checkMagicR(uint64_t n)   // ITERATIVE Performance
   {
      do
      {  uint64_t nxt;

         if (n & B00) // n-odd
         {
            if (n >= m_upperLimitN) // wraparound imminent abort
               return (false);    // recursion termination clause.

            // NOTE: the sum of 4 odd uint's is even, and will be halved
            // we need not wait for the next recursion to divide-by-2
            nxt = ((n+n+n+ONE)  >> ONE); // combine the arithmetic
            //     known even
         }
         else // n-even
         {
            nxt = (n >> ONE);   // bit shift divide by 2

            if (nxt < m_N)      // nxt < m_N start of recursion
               return ( true );     // recursion termination clause.
            // We need not de-curse to 1 because all
            // (n < m_N) are asserted as magic
         }

         n = nxt;
      } while(true);  // NO  recursion
   }

没有修改名字。没有在此方法或其他方法中修复 cmets 或其他提及“递归”的问题。我在这个方法的顶部和底部更改了一些 cmets。

结果：

性能相同。（44 秒）
模式 1 和模式 2 都是迭代的。
模式 3（仍然）是递归的。

【讨论】：