我可以做些什么来加快速度?我是否遗漏了明显的优化问题?
- for(unsigned long i = 1; i
for 循环并不是调用方法 5B 次最快的编码风格。
在我的代码中,请参见 unwind100() 和 innerLoop()。我编码的展开消除了 99% 的 innerLoop 开销,节省了超过 5 秒(在我的桌面 DD5 上)。您的储蓄可能会有所不同。 Wikipedia 上有一篇关于展开循环的文章,并简要说明了节省的潜力。
此代码显然会生成进度报告。耗资50亿
比较,它对幻数检查没有任何作用。
查看我的代码outerLoop,它调用了innerLoop 10 次。这提供了一个
每个外循环 1 个进度报告的便利位置。考虑拆分
你的循环进入一个'innerLoop'和一个'outerLoop'。测试 50 亿次是
比向 innerLoops 添加 10 次调用更昂贵。
- 与 cmets 一样,无论测试结果如何,您的 checkMagic() 都会返回“1”。一个简单的错误,但我不知道这个错误是否会影响性能。
我认为您的 checkMagic() 没有实现尾递归,这确实会减慢您的代码速度。
查看我的方法 checkMagicR(),它有尾递归,并返回一个 true 或
假的。
此外,在您的 CheckMagic() 中,在奇数输入的子句中,您忽略了当 n 为正且奇数时 (3n+1) 必须是偶数的想法。我的代码执行“early-div-by-2”,从而尽可能减少未来的工作量。
- 在我的方法 unwind10() 中,请注意我的代码通过使用 checkMagicRO()(用于奇数输入)和 checkMagicRE()(甚至)。
在我的早期版本中,代码浪费了时间来检测已知的偶数输入是否为偶数,然后将其除以 2,然后返回 true。 checkMagicRE() 跳过测试和除法,节省时间。
CheckMagicRO() 跳过奇数测试,然后执行奇数并继续进入 checkMagicR()。
- if(number != 1) { checkMagic(number); }
您的递归一直持续到 1 ... 没有必要,并且可能对整体性能造成最大影响。
查看我的 checkMagicR() “递归终止子句”。我的代码在 ((n/2)
结果:
我的桌面 (DD5) 较旧、速度较慢,并且运行的是 Ubuntu 15.10 (64)。此代码使用 g++ v5.2.1 开发,仅使用 -O3 无超时断言完成。
DD5 显然比 Amy Tavory 使用的机器慢 2 倍(比较他的代码在 DD5 上的运行方式)。
在 DD5 上,我的代码在模式 1 下大约 44 秒完成,在模式 2 下完成 22 秒。
更快的机器可能会在 11 秒内完成此代码(在模式 2 中)。
完整代码:
// V9
#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>
// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock Clock_t;
class T431_t
{
private:
uint64_t m_N; // start-of-current-recursion
const uint64_t m_upperLimitN; // wrap-around avoidance / prevention
std::stringstream m_ssMagic; // dual threads use separate out streams
uint64_t m_MaxRcrsDpth; // diag only
uint64_t m_Max_n; // diag only
uint64_t m_TotalRecurse; // diag only
Clock_t::time_point m_startMS; // Derived req - progress info
std::stringstream m_ssDuration;
std::ostream* m_so; // dual threads - std::cout or m_ssMagic
uint64_t m_blk;
const int64_t m_MaxThreadDuration; // single Max80KMs, dual Max40KMs
// enum of known type (my preference for 'internal' constants)
enum T431_Constraints : uint64_t
{
ZERO = 0,
B00 = 1, // least-significant-bit is bit 0
ONE = 1, // lsbit set
TWO = 2,
THREE = 3,
//
K = 1000, // instead of 1024,
//
BlkSize = ((K * K * K) / 2), // 0.5 billion
//
MaxSingleCoreMS = (80 * K), // single thread max
MaxDualCoreMS = (40 * K) // dual thread max
};
// convenience method: show both hex and decimal on one line (see dtor)
inline std::string showHexDec(const std::string& label, uint64_t val) {
std::stringstream ss;
ss << "\n" << label
<< " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
<< " (" << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
return (ss.str());
}
// duration: now() - m_startMS
std::chrono::milliseconds getChronoMillisecond() {
return (std::chrono::duration_cast<std::chrono::milliseconds>
(Clock_t::now() - m_startMS));
}
// reports milliseconds since m_startMS time
void ssDurationPut (const std::string& label)
{
m_ssDuration.str(std::string()); // empty string
m_ssDuration.clear(); // clear ss flags
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
m_ssDuration
<< label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec";
}
inline void reportProgress()
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
//
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
<< std::endl;
}
public:
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
m_N (TWO), // start val of current recursion
m_upperLimitN (initWrapAroundLimit()),
// m_ssMagic // default ctor ok - empty
m_MaxRcrsDpth (ZERO),
m_Max_n (TWO),
m_TotalRecurse (ZERO),
m_startMS (Clock_t::now()),
// m_ssDuration // default ctor ok - empty
m_so (&std::cout), // default
m_blk (ZERO),
m_MaxThreadDuration (maxDuration)
{
m_ssMagic.str().reserve(1024*2);
m_ssDuration.str().reserve(256);
}
~T431_t()
{
// m_MaxThreadDuration
// m_blk
// m_so
// m_ssDuration
// m_startMS
// m_Max_n
// m_TotalRecurse
// m_MaxRcrsDpth
if (m_MaxRcrsDpth) // diag only
{
ssDurationPut ("\n duration = " );
std::cerr << "\n T431() dtor "
//
<< showHexDec(" u64MAX",
std::numeric_limits<uint64_t>::max()) << "\n"
//
<< showHexDec(" m_Max_n", m_Max_n)
<< showHexDec(" (3*m_Max_n)+1", ((3*m_Max_n)+1))
<< showHexDec(" upperLimitN", m_upperLimitN)
// ^^^^^^^^^^^ no wrap-around possible when n is less
//
<< "\n"
<< showHexDec(" m_MaxRcrsDpth", m_MaxRcrsDpth)
<< "\n"
<< showHexDec(" m_TotalRecurse", m_TotalRecurse)
//
<< m_ssDuration.str() << std::endl;
}
// m_ssMagic
// m_upperLimitN
// m_N
if(m_MaxThreadDuration == MaxSingleCoreMS)
std::cerr << "\n " << __FILE__ << " by Douglas O. Moen Compiled on = "
<< __DATE__ << " " << __TIME__ << "\n";
}
private:
inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
{
return(true); // for unit test, comment out this line to run the asserts
// unit test - enable both asserts
assert(nE == m_N); // confirm recursion start value
assert(ZERO == (nE & B00)); // confirm even
return(true); // nothing more to do
}
inline bool checkMagicRO(uint64_t nO) // n is known odd, and the recursion starting point
{
// unit test - uncomment the following line to measure checkMagicRE() performance
// return (true); // when both methods do nothing - 0.044
// unit test - enable both these asserts
// assert(nO == m_N); // confirm recursion start value
// assert(nO & B00); // confirm odd
uint64_t nxt;
{
assert(nO < m_upperLimitN); // assert that NO wrap-around occurs
// NOTE: the sum of 4 odd uint's is even, and is destined to be
// divided by 2 in the next recursive invocation.
// we need not wait to divide by 2
nxt = ((nO+nO+nO+ONE) >> ONE); // combine the arithmetic
// |||||||||||| ||||||
// sum is even unknown after sum divided by two
}
return (checkMagicR(nxt));
}
inline void unwind8()
{
// skip 0, 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind10()
{
assert(checkMagicRE (m_N)); m_N++; // 0
assert(checkMagicRO (m_N)); m_N++; // 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind98()
{
unwind8();
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void unwind100()
{
unwind10(); // 0
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void innerLoop (uint64_t start_N, uint64_t end_N)
{
for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
{
unwind100();
}
reportProgress();
}
inline void outerLoop(const std::vector<uint64_t>& start_Ns,
const std::vector<uint64_t>& end_Ns)
{
*m_so << "\n outerLoop \n m_upperLimitN = 0x"
<< std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
<< "\n m_N = 0x" << std::setw(16) << start_Ns[0]
<< " " << std::dec << std::setfill(' ') << std::setw(3) << 0
<< "." << std::dec << std::setfill('0') << std::setw(3) << 0
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10)
<< start_Ns[0] << ")" << std::endl;
size_t szStart = start_Ns.size();
size_t szEnd = end_Ns.size();
assert(szEnd == szStart);
m_blk = 0;
{ // when blk 0 starts at offset 2 (to skip values 0 and 1)
if(TWO == start_Ns[0])
{
m_N = TWO; // set m_N start
unwind98(); // skip 0 and 1
assert(100 == m_N); // confirm
innerLoop (100, end_Ns[m_blk]);
m_blk += 1;
}
else // thread 2 blk 0 starts in the middle of 5 billion range
{
m_N = start_Ns[m_blk]; // set m_N start, do full block
}
}
for (; m_blk < szStart; ++m_blk)
{
innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
}
}
// 1 == argc
void exec1 (void) // no parameters --> check all values
{
const std::vector<uint64_t> start_Ns =
{ TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
(BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 0 blk 1 blk 2 blk 3 blk 4
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
(BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
// outerLoop : 10 blocks generates 10 progressReports()
outerLoop( start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
} // void exec1 (void)
void exec2a () // thread entry a
{
//lower half of test range
const std::vector<uint64_t> start_Ns =
{ TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
// blk 0 blk 1 blk 2 blk 3 blk 4
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };
m_startMS = Clock_t::now();
// m_so to std::cout
// uses 5 blocks to gen 5 progress indicators
outerLoop (start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " (a) " << std::endl;
} // exec2a (void)
void exec2b () // thread entry b
{
// upper half of test range
const std::vector<uint64_t> start_Ns =
{ (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
m_so = &m_ssMagic;
// uses 5 blocks to gen 5 progress indicators
outerLoop(start_Ns, end_Ns);
ssDurationPut("");
m_ssMagic << " execDuration = " << m_ssDuration.str() << " (b) " << std::endl;
} // exec2b (void)
// 3 == argc
int exec3 (std::string argv1, // recursion start value
std::string argv2) // values count
{
uint64_t start_N = std::stoull(argv1);
uint64_t length = std::stoull(argv2);
// range checks
{
std::string note1;
switch (start_N) // limit check
{
case 0: { start_N = 3; length -= 3;
note1 = "... 0 is below test range (change start to 3 ";
} break;
case 1: { start_N = 3; length -= 2;
note1 = "... 1 is defined magic (change start to 3";
} break;
case 2: { start_N = 3; length -= 1;
note1 = "... 2 is trivially magic (change start to 3";
} break;
default:
{ // when start_N from user is even
if (ZERO == (start_N & B00))
{
start_N -= 1;
note1 = "... even is not an interesting starting point";
length += 1;
}
}
break;
} // switch (start_N)
// start_N not restrained ... allow any manual search
// but uin64_t wrap-around still preempted
std::cout << "\n start_N: " << start_N << " " << note1
<< "\n length: " << length << " " << std::endl;
}
m_startMS = Clock_t::now();
for (m_N = start_N; m_N < (start_N + length); ++m_N)
{
bool magicNumberFound = checkMagicRD (m_N, 1);
std::cout << m_ssMagic.str() << std::dec << m_N
<< " m_MaxRcrsDpth: " << m_MaxRcrsDpth << ";"
<< " m_Max_n: " << m_Max_n << ";" << std::endl;
m_ssMagic.str(std::string()); // empty string
m_ssMagic.clear(); // clear flags)
if (!magicNumberFound)
{
std::cerr << m_N << " not a magic number!" << std::endl;
break;
}
} // for (uint64_t k = kStart; k < kEnd; ++k)
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
return(0);
} // int exec3 (std::string argv1, std::string argv2)
// conversion
std::string ssMagicShow() { return (m_ssMagic.str()); }
// D3: use recursion for closer match to definition
bool checkMagicR(uint64_t n) // recursive Performance
{
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicR(nxt)); // tail recursion
} // bool checkMagicR(uint64_t n)
// D4: diagnostic text output
bool checkMagicRD (uint64_t n, uint64_t rd) // rd: recursion depth
{
if (n > m_Max_n) m_Max_n = n;
m_TotalRecurse += 1;
m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
<< " checkMagicRD (" << m_N << ", " << n << ")";
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return ( terminateCheckMagic (nxt, rd, false) );
// NOTE: the sum of 4 odd uint's is even, and will be divided by 2
// we need not wait for the next recursion to divide by 2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// ||||||||| ||||||
// sum is even unknown after divide by two
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( terminateCheckMagic (nxt, rd, true) );
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicRD (nxt, (rd+1))); // tail recursion
} // bool checkMagicRD(uint64_t, uint64_t rd)
bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
{
if (rd > m_MaxRcrsDpth) // diag only
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = durationMS.count() / 1000;
uint64_t durationMSec = durationMS.count() % 1000;
m_MaxRcrsDpth = rd;
// generate noise each update - this does not happen often
static uint64_t count = 0;
std::cerr << "\n\n" << std::setfill(' ')
<< std::setw(30) << "["
<< std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
<< durationMSec << std::setfill(' ')
<< " sec] m_N: " << std::setw(10) << m_N
<< " n: " << std::setw(10) << n
<< " MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
<< " Max_n: " << std::setw(16) << m_Max_n
<< " (" << count << ")" << std::flush;
count += 1; // how many times MaxRcrsDpth updated
}
m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
<< " " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";
return (rslt);
}
uint64_t initWrapAroundLimit()
{
uint64_t upLimit = 0;
uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
// there exist a max number, m_upperLimitN
// where 3n+1 < u64MAX, i.e.
upLimit = ((u64MAX - 1) / 3);
return (upLimit);
} // uint64_t initWrapAroundLimit()
public:
int exec (int argc, char* argv[])
{
int retVal = -1;
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
// result: consistent run time
switch (argc)
{
case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
{
exec1();
retVal = 0;
}
break;
case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
{ // two new instances, < 40 sec
// 2 new objects
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 threads
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's
tA.join();
tB.join();
// lower test sequence (tA) output directly to std::cout
// upper test sequence (tB) captured in T431_t::m_ssMagic. send it now
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
}
break;
case 3: // argv[1] -> start address, argv[2] -> length,
{
retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
} break;
default :
{
std::cerr
<< "\nUsage: "
<< "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
<< "\n 1 (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
//
<< "\n 2 (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks, < 40 sec\n"
//
<< "\n 3 (two parameters): verbose mode: start argv[1], test count argv[2] \n"
//
<< "\n 3.1: \"./dumy431V4 3 1000 1> /tmp/dumy431V4.out2 2> /tmp/dumy431V4.out2a "
<< "\n 3 1 K std::cout redirected std::cerr redirected \n"
//
<< "\n 3.2: \"./dumy431V4 3 1000000000 1> /dev/null 2> /tmp/dumy431V4.out2b "
<< "\n 3 1 B discard std::cout std::cerr redirected \n"
//
<< "\n 3.3: \"./dumy431V4 15 100 "
<< "\n 15 100 (cout and cerr to console) \n"
<< std::endl;
retVal = 0;
}
} // switch
return(retVal);
} // int exec(int argc, char* argv[])
}; // class T431
int main (int argc, char* argv[])
{
std::cout << "argc: " << argc << std::endl;
for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
int retVal;
{
T431_t t431;
retVal = t431.exec(argc, argv);
}
std::cout << "\nFINI " << std::endl;
return(retVal);
}
提交的代码:
a) 使用 assert(x) 进行所有检查(因为 assert 快速且具有
优化器不能忽略的副作用)。
b) 通过使用持续时间的断言来检测任何无限循环。
c) 断言以防止任何回绕。
当任何断言发生时,程序都不会正常终止。
当这段代码正常结束时,表示:
a) no disproven numbers found, and
b) the running time < 80 seconds, so no endless loop occurred and
c) no wrap-around occurred
关于递归 checkMagicR(uint64_t n) 的注意事项:
n 的 3 种形式在递归中可用: