Intel TBB: get progress of work

【Question title】: Intel TBB get progress of work
【Posted】: 2013-12-06 16:53:08
【Question】:

How can I get progress information from a TBB parallel_for?

tbb::parallel_for(tbb::blocked_range<size_t>(0,1000),classA);

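【Comments】:

You may be looking for the flow graph in the TBB library rather than just a loop...
I would turn this around... how would you get progress from a serial for loop?
@Rick asks the right question. With a serial loop you report progress from the single main thread through a simple (serial) callback. You can do the same here, but attach the thread ID (the sender), compute each thread's progress from the number of jobs scheduled to it (easier with static scheduling :) and perform a "reduce" in the callback (progress += one thread's progress / total number of threads). Note, however, that the progress variable is shared and must be protected...
That is my point. In your example, a concurrent_unordered_map with thread ID keys and values holding the counts would work very well.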
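
The per-thread scheme described in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the C++ standard library (std::thread, std::atomic, and a mutex-guarded std::unordered_map) rather than TBB's concurrent_unordered_map, and the names PerThreadProgress, Local, Total, and CountWithPerThreadCounters are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper: one counter per thread, keyed by std::thread::id.
class PerThreadProgress {
public:
    using Counter = std::atomic<std::size_t>;

    // Returns the calling thread's counter, creating it on first use.
    // Each counter is heap-allocated, so the reference stays valid
    // even if the map rehashes later.
    Counter& Local() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<Counter>& slot = counters_[std::this_thread::get_id()];
        if (!slot) slot.reset(new Counter(0));
        return *slot;
    }

    // The "reduce" step: sums all per-thread counters.
    // May be called while workers are still running.
    std::size_t Total() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t sum = 0;
        for (const auto& entry : counters_)
            sum += entry.second->load(std::memory_order_relaxed);
        return sum;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::thread::id, std::unique_ptr<Counter>> counters_;
};

// Runs `total` iterations split across `num_threads` threads (assumed >= 1)
// and returns the reduced count once all workers have finished.
std::size_t CountWithPerThreadCounters(std::size_t total, unsigned num_threads) {
    PerThreadProgress progress;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&progress, total, num_threads, t] {
            // One locked lookup per thread, then lock-free increments.
            PerThreadProgress::Counter& mine = progress.Local();
            for (std::size_t i = t; i < total; i += num_threads)
                mine.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return progress.Total();
}
```

Each worker writes only to its own counter, so the lock is taken once per thread and once per progress query, not once per iteration.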

Tags: c++ tbb

【Solution 1】:

If all you need is the number of iterations executed so far, a simple solution might be a global atomic counter:

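#include <tbb/tbb.h>

void Foo(size_t i);  // per-iteration work, defined elsewhere

// Note: tbb::atomic was removed in oneTBB; std::atomic<size_t> is the replacement there.
tbb::atomic<size_t> atomic_progress_counter;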

void ParallelFoo() {
    tbb::parallel_for( tbb::blocked_range<size_t>(0, 1000),
        [&]( tbb::blocked_range<size_t> r ) {
            for( size_t i=r.begin(); i!=r.end(); ++i ) {
                Foo(i);
                ++atomic_progress_counter;
            }
        }
    );
}

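However, if each iteration does little work and hardware concurrency is high, atomic increments of a shared variable can add significant overhead. For example, I would use this approach with caution on Intel's Xeon Phi coprocessors.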
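
One way to reduce that overhead might be to count iterations in a thread-private variable and publish them to the shared counter once per chunk instead of once per iteration. The sketch below illustrates the idea with the C++ standard library only (std::thread and std::atomic in place of tbb::parallel_for and tbb::atomic). The names ProcessChunk and ParallelFooBatched are hypothetical, and Foo is an empty stand-in for the real work.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

std::atomic<std::size_t> progress_counter{0};

// Hypothetical stand-in for the per-iteration work.
void Foo(std::size_t) {}

// Processes [begin, end) and publishes progress once for the whole chunk.
void ProcessChunk(std::size_t begin, std::size_t end) {
    std::size_t done = 0;  // thread-private, so incrementing it causes no contention
    for (std::size_t i = begin; i != end; ++i) {
        Foo(i);
        ++done;
    }
    // A single atomic update per chunk instead of one per iteration.
    progress_counter.fetch_add(done, std::memory_order_relaxed);
}

// Splits [0, n) into one contiguous chunk per thread (num_threads assumed >= 1).
void ParallelFooBatched(std::size_t n, unsigned num_threads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(ProcessChunk, begin, end);
    }
    for (auto& w : workers) w.join();
}
```

With tbb::parallel_for the same idea would presumably amount to adding r.size() to the counter after the inner loop. The trade-off is coarser progress: the counter only advances when a whole subrange completes.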

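【Comments】:

【Solution 2】:

Rick's suggestion to use concurrent_unordered_map is a good one. Here is another way that is essentially the same idea at a high level, but uses other TBB mechanisms to avoid dealing with explicit thread IDs.

The zero_allocator is required here to close a timing hole between the allocation and the initialization of an element in the concurrent_vector.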

#include <tbb/tbb.h>

typedef size_t ProgressType;
typedef tbb::atomic<ProgressType> ProgressCounter;
tbb::enumerable_thread_specific<ProgressCounter> LocalCounters;

// zero_allocator is essential here.
tbb::concurrent_vector<ProgressCounter*, tbb::zero_allocator<ProgressCounter*> > LocalCounterPointers;

void AddToProgress(ProgressType delta) {
    bool exists;
    auto& i = LocalCounters.local(exists);
    i += delta;
    if( !exists )
        // First time we've seen this local counter.
        LocalCounterPointers.push_back(&i);
}

ProgressType GetProgress() {
    ProgressType sum = 0;
    size_t n = LocalCounterPointers.size();
    for( size_t i=0; i<n; ++i )
        // "if" deals with timing hole where slot in LocalCounterPointers was allocated but not initialized.
        if( auto* j = LocalCounterPointers[i] )
            sum += *j;
    return sum;
}

// Can be called asynchronously.
void ClearProgress() {
    size_t n = LocalCounterPointers.size();
    for( size_t i=0; i<n; ++i )
        // "if" deals with timing hole where slot in LocalCounterPointers was allocated but not initialized.
        if( auto* j = LocalCounterPointers[i] )
            *j = 0;

}

// Demo code
#include <iostream>

int main() {
    ClearProgress();
    tbb::parallel_for( tbb::blocked_range<int>(0, 1000),
                      [&]( tbb::blocked_range<int> r ) {
                         for( int i=r.begin(); i!=r.end(); ++i ) {
                             AddToProgress(1);
                             std::cout << "progress = " << GetProgress() << std::endl;
                         }
                      }
    );
}
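
The demo above prints from inside the loop body, which serializes the workers on std::cout. In practice one might prefer to sample the progress from a separate thread while the workers run. The following is a minimal sketch of that pattern using only the C++ standard library. RunWithMonitor is a hypothetical name, a single std::atomic counter stands in for GetProgress(), and the 1 ms polling interval is an arbitrary choice.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Runs `total` trivial work items on `num_workers` threads (assumed >= 1)
// while a monitor thread samples the progress counter.
// Returns every sample the monitor took; the last one is taken after all work is done.
std::vector<std::size_t> RunWithMonitor(std::size_t total, unsigned num_workers) {
    std::atomic<std::size_t> progress{0};
    std::atomic<bool> finished{false};
    std::vector<std::size_t> samples;  // touched only by the monitor until it is joined

    std::thread monitor([&] {
        while (!finished.load(std::memory_order_acquire)) {
            samples.push_back(progress.load(std::memory_order_relaxed));
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // Final sample, taken after the workers have been joined.
        samples.push_back(progress.load(std::memory_order_relaxed));
    });

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_workers; ++t) {
        workers.emplace_back([&progress, total, num_workers, t] {
            for (std::size_t i = t; i < total; i += num_workers)
                progress.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    finished.store(true, std::memory_order_release);
    monitor.join();
    return samples;
}
```

In the TBB version, the monitor thread would call GetProgress() where this sketch reads the atomic counter, and the parallel_for call would replace the worker threads.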

【Comments】: