OpenMP share file handler: answers

【Question title】: OpenMP share file handler
【Posted】: 2013-04-18 19:33:57
【Question description】:

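I have a loop that I parallelize with OpenMP. In this loop, I read a triangle from a file and perform some operations on that data. These operations are independent from one triangle to the next, so I thought this would be easy to parallelize, as long as I kept the actual reading of the file in a critical section.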

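The order in which the triangles are read is not important
Some triangles are read and discarded quickly, others need more algorithmic work (bbox construction, ...)
I'm doing binary I/O
I'm using a C++ ifstream, tri_data
I'm testing this on an SSD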

readTriangle calls file.read() and reads 12 floats from the ifstream.

#pragma omp parallel for shared (tri_data)
for(int i = 0; i < ntriangles ; i++) {
    vec3 v0,v1,v2,normal;
#pragma omp critical
    {
        readTriangle(tri_data,v0,v1,v2,normal);
    }
    // ... work with the triangle here ...
}

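Now, the behaviour I observe is that with OpenMP enabled, the whole process gets slower. I added some timers to my code to track the time spent in the I/O method and the time spent in the loop itself.

Without OpenMP: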

Total IO IN time       : 41.836 s.
Total algorithm time   : 15.495 s.

With OpenMP:

Total IO IN time       : 48.959 s.
Total algorithm time   : 44.61 s.

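My guess is that, since the read is in a critical section, the threads are just waiting for each other to finish using the file handle, which results in longer waiting times.

Any pointers on how to solve this? My program would really benefit from being able to process the triangles it reads in parallel. I've tried playing with thread scheduling and related settings, but that doesn't seem to help much in this case.

Since I'm working on an out-of-core algorithm, introducing buffers to hold large numbers of triangles is not really an option.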

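【Question discussion】:

How about having one thread do all the IO and put the triangles into a queue or something similar, while the other threads fetch triangles and process them?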

Tags: c++ performance io openmp

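【Solution 1】:

So, the solution I'd propose is based on a master/slave strategy, where:

the master (thread 0) performs all the I/O
the slaves do some work on the retrieved data

The pseudocode looks like this: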

#include <omp.h>
#include <vector>
using std::vector;

vector<vec3> v0;
vector<vec3> v1;
vector<vec3> v2;
vector<vec3> normal;

vector<int> tdone;

int nthreads;
int triangles_read = 0;

/* ... */

#pragma omp parallel shared(tri_data)
{
  int id = omp_get_thread_num();
  /*
   * Initialize all the buffers in one thread (single ends with an implicit barrier).
   * Notice that the size in memory is similar to your example.
   */
#pragma omp single
  {
    nthreads = omp_get_num_threads();
    v0.resize(nthreads);
    v1.resize(nthreads);
    v2.resize(nthreads);
    normal.resize(nthreads);
    tdone.resize(nthreads,1);
  }

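  if ( id == 0 ) { // Producer thread (needs at least one consumer, i.e. nthreads >= 2)

    int next = 1;
    while( triangles_read != ntriangles ) {
#pragma omp flush // Pick up the flags the consumers have set
      if ( tdone[next] ) { // If the next thread is free
        readTriangle(tri_data,v0[next],v1[next],v2[next],normal[next]); // Read data and fill the correct buffer
        tdone[next] = 0; // Set a flag for thread next to start working
#pragma omp flush // Publish the buffer and the flag before the counter
        triangles_read++;
#pragma omp flush // Publish the counter
      }
      next = next%(nthreads - 1) + 1; // Cycle through consumers 1..nthreads-1
    } // while

  } else { // Consumer threads

    while( true ) { // Wait for work
#pragma omp flush // Pick up the producer's latest writes
      bool all_read = ( triangles_read == ntriangles ); // Sample the counter before the flag
#pragma omp flush
      if( tdone[id] == 0 ) {
        /* ... do work here on v0[id], v1[id], v2[id], normal[id] ... */
        tdone[id] = 1; // Mark this buffer as free again
#pragma omp flush // Make the flag visible to the producer
      } else if ( all_read ) break; // Nothing pending and nothing left to read
    }

  }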
#pragma omp barrier

}

I'm not sure whether this is still useful to you, but it was a nice brain teaser anyhow!

【Discussion】:

Thanks, man. I'll try it out soon and mark it as the answer if it meets my needs.