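All analysis tools depend on the debug information that the compiler generates during the build. As long as that debug information captures the optimizations (inlining in particular), the analysis tools can map them to the correct source location. For ICC, use the compiler option "-debug inline-debug-info" when you build your code with optimization enabled. If your function is inlined, this option ensures that optimizations are reported both at the call site and at the callee site (where the function is defined). Here is a simple example that illustrates this: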
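#include <iostream>
#include <tbb/tbb.h>
#include <tbb/parallel_for.h>
#include <cstdlib>
using namespace std;
using namespace tbb;
long len = 0;
float *__restrict__ a;
float *__restrict__ b;
float *__restrict__ c;
class Test {
public:
    void operator()( const blocked_range<size_t>& x ) const {
        for (long i=x.begin(); i!=x.end(); ++i ) {  // the loop the vectorizer reports on
            c[i] = (a[i] * b[i]) + b[i];
        }
    }
};
int main(int argc, char* argv[]) {
    if (argc < 2) {  // the array length is required as the first argument
        cerr << "usage: " << argv[0] << " <length>" << endl;
        return 1;
    }
    len = atol(argv[1]);
    cout << len << endl;
    a = new float[len]();  // value-initialized to 0.0f
    b = new float[len]();
    c = new float[len]();
    // Split [0, len) into chunks of at least 100 iterations and run Test on each chunk.
    parallel_for(blocked_range<size_t>(0,len, 100), Test() );
    delete[] a;
    delete[] b;
    delete[] c;
    return 0;
}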
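Building the code above with the following compiler options emits a vectorization report whose remarks are not mapped to the correct source lines: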
$ icpc testdebug.cc -c -vec-report2 -O3
tbb/parallel_for.h(127): (col. 22) remark: loop was not vectorized: unsupported loop structure
tbb/parallel_for.h(127): (col. 22) remark: LOOP WAS VECTORIZED
tbb/parallel_for.h(127): (col. 22) remark: loop was not vectorized: unsupported loop structure
tbb/parallel_for.h(127): (col. 22) remark: LOOP WAS VECTORIZED
tbb/parallel_for.h(127): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
tbb/parallel_for.h(127): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
tbb/partitioner.h(164): (col. 9) remark: loop was not vectorized: existence of vector dependence
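The report above contains two "LOOP WAS VECTORIZED" messages, but both are mapped to the TBB header parallel_for.h. No remark corresponds to the functor in our program. Because the functor is invoked inside parallel_for, its definition is inlined into parallel_for.h, and the remarks are attributed to that call site.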
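The call-site/callee-site relationship can be sketched without TBB. In the sketch below, `run_range`, `MulAdd`, and `mul_add` are hypothetical names introduced only for illustration; `run_range` stands in for a library template such as parallel_for, and `MulAdd` plays the role of the `Test` functor. It is not how TBB is implemented, only a minimal picture of where the loop is defined versus where it ends up after inlining.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library template such as tbb::parallel_for.
// Imagine that it lives in a separate library header.
template <typename Body>
inline void run_range(std::size_t begin, std::size_t end, const Body& body) {
    body(begin, end);  // call site: the functor's loop is inlined here
}

// User functor, analogous to Test in the example above.
struct MulAdd {
    const float* a;
    const float* b;
    float* c;
    void operator()(std::size_t begin, std::size_t end) const {
        // callee site: this is where the loop is actually written
        for (std::size_t i = begin; i != end; ++i) {
            c[i] = (a[i] * b[i]) + b[i];
        }
    }
};

// Computes c[i] = a[i] * b[i] + b[i] through the library-style template.
// Assumes a and b have the same length.
std::vector<float> mul_add(const std::vector<float>& a,
                           const std::vector<float>& b) {
    std::vector<float> c(a.size());
    run_range(0, a.size(), MulAdd{a.data(), b.data(), c.data()});
    return c;
}
```

Once `operator()` is inlined into `run_range`, the vectorized loop exists only inside the instantiated template. Without inline debug information, a report can only point at the line of the call inside the template, which is the situation shown in the report above.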
To capture that information, add the -debug inline-debug-info compiler option to the build. The resulting vectorization report looks like this:
$ icpc testdebug.cc -c -vec-report2 -O3 -debug inline-debug-info
tbb/partitioner.h(171): (col. 9) remark: loop was not vectorized: unsupported loop structure
testdebug.cc(14): (col. 37) remark: LOOP WAS VECTORIZED
tbb/partitioner.h(164): (col. 9) remark: loop was not vectorized: unsupported loop structure
testdebug.cc(14): (col. 37) remark: LOOP WAS VECTORIZED
tbb/partitioner.h(245): (col. 33) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
tbb/partitioner.h(265): (col. 52) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
tbb/partitioner.h(164): (col. 9) remark: loop was not vectorized: existence of vector dependence
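In this report, the "LOOP WAS VECTORIZED" remarks are clearly mapped to testdebug.cc(14), that is, to the functor in our program rather than to the TBB header.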
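For longer reports, checking the mapping by eye gets tedious. The following is a minimal sketch of a helper that counts "LOOP WAS VECTORIZED" remarks per source file. The function name `count_vectorized_by_file` is hypothetical, and the parser assumes every remark has the form `file(line): (col. N) remark: message`, as in the two reports shown above; other compiler versions may format their reports differently.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Counts "LOOP WAS VECTORIZED" remarks per source file.
// Assumption: each remark starts with "file(line):", so the file name is
// everything before the first '(' on the line.
std::map<std::string, int> count_vectorized_by_file(const std::string& report) {
    std::map<std::string, int> counts;
    std::istringstream in(report);
    std::string line;
    while (std::getline(in, line)) {
        // Skip remarks about loops that were not vectorized.
        if (line.find("LOOP WAS VECTORIZED") == std::string::npos) continue;
        std::size_t paren = line.find('(');
        if (paren == std::string::npos) continue;  // not in the expected format
        ++counts[line.substr(0, paren)];
    }
    return counts;
}
```

Fed the first report, such a helper would attribute both vectorized loops to tbb/parallel_for.h; fed the second, to testdebug.cc.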