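【Posted】:2019-10-15 17:15:31
【Question】:
Below is a function I am trying to optimize with OpenMP and loop tiling (also known as loop blocking). However, after applying the loop tiling shown below, my out value comes back wrong. Could someone look over my code and point out why it goes wrong? Many thanks.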
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#include "utils.h"

const long BLOCK_SIZE = 8*DIM;
int i, j, k, ii, jj, kk, dim = DIM-1;
long compute, out = 1.0, we_need, gimmie;

void work_it_par(long *old, long *new)
{
    we_need = need_func();
    gimmie = gimmie_func();

    #pragma omp parallel for private(i,j,k,ii,jj,kk, compute) firstprivate(we_need, gimmie, dim,old,BLOCK_SIZE) reduction(+:out) num_threads(omp_get_num_procs())
    for (ii=1; ii<dim-BLOCK_SIZE; ii+=BLOCK_SIZE) {
        for (jj=1; jj<dim-BLOCK_SIZE; jj+=BLOCK_SIZE) {
            for (kk=1; kk<dim-BLOCK_SIZE; kk+=BLOCK_SIZE) {
                for (i=ii; i<ii+BLOCK_SIZE; i++) {
                    for (j=jj; j<jj+BLOCK_SIZE; j++) {
                        for (k=kk; k<kk+BLOCK_SIZE; k++) {
                            //int temp = i*DIM*DIM+j*DIM+k;
                            compute = old[i*DIM*DIM+j*DIM+k] * we_need;
                            out += compute / gimmie;
                        }
                    }
                }
            }
        }
    }
    printf("AGGR:%ld\n",out);
}
【Discussion】:
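Two things in the tiled loops of the question look like likely causes of the wrong result. First, BLOCK_SIZE is 8*DIM, so dim-BLOCK_SIZE is negative for any positive DIM; the ii loop condition fails on its first check and out is never updated. Second, even with a smaller block, the bound ii<dim-BLOCK_SIZE stops the tile loops early, so elements near the upper edge are skipped. The sketch below is one way to tile the same sum so that every element is visited exactly once: tile origins run up to dim, and each inner bound is clamped to dim. It is not the original assignment code. DIM, TILE, min_long, sum_plain, sum_tiled and tiled_matches_plain are hypothetical names and values chosen for illustration, and the untiled range 1 .. DIM-2 is an assumption inferred from dim = DIM-1 and the loops starting at 1.

```c
#include <assert.h>
#include <stdlib.h>

#define DIM  20   /* hypothetical size; the real DIM comes from utils.h */
#define TILE 8    /* hypothetical tile edge; deliberately does not divide DIM-2 */

static long min_long(long a, long b) { return a < b ? a : b; }

/* Untiled reference over the assumed interior range 1 .. DIM-2. */
long sum_plain(const long *old, long we_need, long gimmie)
{
    const long dim = DIM - 1;
    long out = 0;
    for (long i = 1; i < dim; i++)
        for (long j = 1; j < dim; j++)
            for (long k = 1; k < dim; k++)
                out += old[i*DIM*DIM + j*DIM + k] * we_need / gimmie;
    return out;
}

/* Tiled version: tile origins run up to dim, and every inner bound is
   clamped so the last (partial) tile stops at dim instead of being skipped. */
long sum_tiled(const long *old, long we_need, long gimmie)
{
    const long dim = DIM - 1;
    long out = 0;
    assert(TILE > 0);  /* a zero step would never advance the tile loops */

    /* Loop variables are declared inside the loops, so each thread gets
       its own copy without needing a private(...) clause. */
    #pragma omp parallel for reduction(+:out)
    for (long ii = 1; ii < dim; ii += TILE)
        for (long jj = 1; jj < dim; jj += TILE)
            for (long kk = 1; kk < dim; kk += TILE) {
                const long i_end = min_long(ii + TILE, dim);
                const long j_end = min_long(jj + TILE, dim);
                const long k_end = min_long(kk + TILE, dim);
                for (long i = ii; i < i_end; i++)
                    for (long j = jj; j < j_end; j++)
                        for (long k = kk; k < k_end; k++) {
                            long compute = old[i*DIM*DIM + j*DIM + k] * we_need;
                            out += compute / gimmie;
                        }
            }
    return out;
}

/* Builds a small demo array and returns 1 if both sums agree, 0 otherwise
   (0 is also returned if the allocation fails). */
int tiled_matches_plain(void)
{
    const size_t n = (size_t)DIM * DIM * DIM;
    long *old = malloc(n * sizeof *old);
    if (old == NULL)
        return 0;
    for (size_t idx = 0; idx < n; idx++)
        old[idx] = (long)(idx % 7) + 1;   /* arbitrary demo data */
    const int same = sum_tiled(old, 3, 2) == sum_plain(old, 3, 2);
    free(old);
    return same;
}
```

With GCC or Clang, building with -fopenmp enables the pragma; without that flag the pragma is ignored and sum_tiled simply runs serially. Because each term is an integer computed independently of the others, the order in which tiles are visited should not change the total, so comparing against the untiled sum is a reasonable sanity check.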
Tags: c optimization openmp