C++ threads: shared memory not updated despite absence of race

[Question title]: C++ threads: shared memory not updated despite absence of race
[Posted]: 2014-06-26 20:01:58
[Question]:

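I am experimenting with C++ standard threads. I wrote a small benchmark to measure performance overhead and overall throughput. It works by running a loop of one billion iterations in one or several threads, pausing briefly from time to time.

In the first version, I used counters in shared memory (i.e. ordinary variables). I expected the following output: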

Sequential      1e+009 loops    4703 ms 212630 loops/ms
2 thrds:t1      1e+009 loops    4734 ms 211238 loops/ms
2 thrds:t2      1e+009 loops    4734 ms 211238 loops/ms
2 thrds:tt      2e+009 loops    4734 ms 422476 loops/ms
manythrd tn     1e+009 loops    7094 ms 140964 loops/ms
...  
manythrd tt     6e+009 loops    7094 ms 845785 loops/ms

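Unfortunately, the display showed some of the counters as if they had never been initialized!

I could work around this by storing each counter's final value in an atomic<> for later display. But I do not understand why the version based on plain shared memory does not work: each thread uses its own counter, so there is no race condition. Even the displaying thread accesses the counters only after the counting threads have finished. Using volatile did not help either.

Can anyone explain this strange behavior (as if memory were not updated) and tell me whether I am missing something?

Here are the shared variables: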

const int maxthread = 6;
atomic<bool> other_finished = false;
atomic<long> acounter[maxthread];

Here is the code of the thread function:

void foo(long& count, int ic, long maxcount)   
{
    count = 0;  
    while (count < maxcount) {
        count++;
        if (count % 10000000 == 0)
            this_thread::sleep_for(chrono::microseconds(1));
    }
    other_finished = true;      // atomic: announce work is finished
    acounter[ic] = count;       // atomic: share result 
}

Here is an example of how I call the threaded benchmark:

mytimer.on();                 // second run, two threads
thread t1(foo, counter[0], 0, maxcount);  // additional thread
foo(counter[1], 1, maxcount);         // main thread
t1.join();                    // wait end of additional thread
perf = mytimer.off();     
display_perf("2 thrds:t1", counter[0], perf);  // non atomic version of code
display_perf("2 thrds:t2", counter[1], perf);
display_perf("2 thrds:tt", counter[0] + counter[1], perf);

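[Comments]:

Yes! Sorry: MSVC 2013 on Win 8.1, with an Intel i7.
Most likely unrelated to the problem. Regarding performance, however, you should look at false sharing: different threads should not write to variables located on the same cache line, which in your case applies to counter.
Very interesting article about false sharing. I suspected a caching problem. However, following your std::ref() solution, I created a variant of my program that uses a global array and passes no reference. It works fine, which confirms that the problem was not the cache but the reference.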
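
As a rough illustration of the false-sharing remark above, each counter could be padded to its own cache line. This is only a sketch: the 64-byte line size is an assumption (typical for x86, not guaranteed), and the names cache_line_size, PaddedCounter and counters are hypothetical, not taken from the question's code.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines, which is common on x86 but not guaranteed.
constexpr std::size_t cache_line_size = 64;

// Each counter is aligned to, and therefore occupies, a full cache line,
// so two threads writing to neighbouring counters do not share a line.
struct alignas(cache_line_size) PaddedCounter {
    long value = 0;
};

// One padded counter per thread (6 matches maxthread in the question).
PaddedCounter counters[6];
```

A thread would then write to counters[i].value instead of counter[i]. This only affects performance; it does not change which object the thread function writes to.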

Tags: c++ multithreading c++11 benchmarking stdthread

[Answer 1]:

Here is a simplified version that reproduces the problem:

#include <thread>

void deep_thought(int& value) { value = 6 * 9; }

int main()
{
    int answer = 42;
    std::thread{deep_thought, answer}.join();  // answer is copied here
    return answer; // 42
}

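It looks as if a reference to answer is passed to the worker function, and 6 * 9 is assigned through that reference and therefore to answer. However, the constructor of std::thread makes a copy of answer and passes a reference to that copy to the worker function, so the variable answer in the main thread never changes.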
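
One way to observe the copy directly is to compare addresses inside the worker. The following is a minimal sketch; probe and worker_sees_original are hypothetical names introduced only for this illustration. The worker takes const int& so that the sketch also compiles on conforming compilers, unlike the deep_thought program.

```cpp
#include <thread>

// Hypothetical probe: reports whether the parameter it received is the
// caller's own object by comparing addresses.
void probe(const int& value, const int* original, bool* same_object)
{
    *same_object = (&value == original);
}

// Returns true only if the worker was handed the caller's variable itself.
bool worker_sees_original()
{
    int answer = 42;
    bool same_object = true;
    // answer is passed without std::ref, so std::thread stores its own copy
    // and the worker's reference binds to that copy.
    std::thread{probe, answer, &answer, &same_object}.join();
    return same_object;
}
```

Calling worker_sees_original() should yield false, since value refers to the thread's internal copy rather than to answer.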

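GCC-4.9 and Clang-3.5 both reject the deep_thought program above, because std::thread hands its copy to the worker function as an rvalue, which cannot bind to the non-const lvalue reference parameter. You can solve the problem by passing the variable with std::ref: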

    std::thread{deep_thought, std::ref(answer)}.join();
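
Applied to the benchmark in the question, the same fix might look like the sketch below. It is a simplified stand-in, not the original program: count_to and run_two_threads are hypothetical names, and the timing, sleeping and atomic bookkeeping are left out.

```cpp
#include <functional>
#include <thread>

// Simplified worker: counts up to maxcount directly in the caller's counter.
void count_to(long& count, long maxcount)
{
    count = 0;
    while (count < maxcount)
        ++count;
}

// Runs the worker on one additional thread and on the calling thread,
// each with its own counter, and returns the sum of both counters.
long run_two_threads(long maxcount)
{
    long counter[2] = {-1, -1};  // sentinel values make a missed update visible
    // std::ref makes the thread write to counter[0] itself, not to a copy.
    std::thread t1(count_to, std::ref(counter[0]), maxcount);
    count_to(counter[1], maxcount);  // calling thread uses its own counter
    t1.join();  // after join(), counter[0] can be read safely
    return counter[0] + counter[1];
}
```

With std::ref in place, both counters are updated by the time join() returns, so no atomic copy of the result is needed for display.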

[Comments]:

It might be worth mentioning that the solution uses std::reference_wrapper.
@T.C.: Thanks for the hint. I have updated the answer.