Posted: 2016-02-12 21:58:32
Question:
I want to profile several different functions, so I started with two simple ones:
void expVec(const int a_len, const double *a, double *b)
{
vdExp(a_len, a, b);//Using the vectorial exp-function from the mkl library
}
and
void expVec_sample(const int len, const double *a, double *b)
{
for(int i = 0; i < len; i++)//Serial execution
b[i] = exp(a[i]);
}
I call them with
for(int i = 0; i < 10000; i++)
{
expVec(len, a, c);
expVec_sample(len, a, b);
}
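The loop stores the MKL result in c and the serial result in b, so it may be worth confirming that both versions compute the same values before comparing their speed. The helper below is a minimal sketch. The name max_rel_diff is hypothetical and is not part of MKL or the question's code. It returns the largest element-wise relative difference, falling back to the absolute difference where the reference value is zero, and it deliberately avoids libm calls.

```c
/* Hypothetical helper (not part of MKL): largest relative difference
   |x[i] - y[i]| / |y[i]| over the first len elements, with y as the
   reference. Where y[i] == 0 the absolute difference is used instead. */
double max_rel_diff(int len, const double *x, const double *y)
{
    double worst = 0.0;
    for (int i = 0; i < len; i++) {
        double diff = x[i] - y[i];
        double mag = y[i];
        if (diff < 0.0) diff = -diff;   /* |x[i] - y[i]| without fabs() */
        if (mag < 0.0) mag = -mag;      /* |y[i]| */
        double rel = (mag > 0.0) ? diff / mag : diff;
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

For example, calling max_rel_diff(len, c, b) after the loop and printing the result with %g gives a single number; a value on the order of double-precision machine epsilon (about 2.2e-16) would suggest the two implementations agree to within rounding.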
and compile the program with
gcc-5 main.c speed_test.c -o main -I/opt/intel/mkl/include -pg -g -DMKL_ILP64 -mavx -msse4.2 -msse3 -msse2 -m64 -flto -march=native -funroll-loops -std=gnu99 -Wl,--start-group /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl -lfftw3 -lfftw3_threads
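Most of these switches are there because I copied and pasted the command from another project (the one in which I will eventually use the profiling results).
When I try to measure the performance of these two functions, gprof gives me: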
index % time self children called name
<spontaneous>
[1] 35.6 2.45 0.00 mkl_vml_kernel_dError [1]
-----------------------------------------------
<spontaneous>
[2] 33.4 2.29 0.00 mkl_vml_kernel_dExp_E9HAynn [2]
-----------------------------------------------
<spontaneous>
[3] 12.9 0.89 0.00 mkl_vml_kernel_GetMode [3]
-----------------------------------------------
0.64 0.00 10000/10000 main [5]
[4] 9.3 0.64 0.00 10000 expVec_sample [4]
-----------------------------------------------
<spontaneous>
[5] 9.3 0.00 0.64 main [5]
0.64 0.00 10000/10000 expVec_sample [4]
0.00 0.00 10000/10000 expVec [9]
0.00 0.00 1/1 fill [10]
-----------------------------------------------
<spontaneous>
[6] 7.0 0.48 0.00 vdexp_cout_rare [6]
-----------------------------------------------
<spontaneous>
[7] 1.4 0.10 0.00 mkl_vml_kernel_SetMode [7]
-----------------------------------------------
<spontaneous>
[8] 0.4 0.03 0.00 mkl_vml_kernel_zError [8]
-----------------------------------------------
0.00 0.00 10000/10000 main [5]
[9] 0.0 0.00 0.00 10000 expVec [9]
-----------------------------------------------
0.00 0.00 1/1 main [5]
[10] 0.0 0.00 0.00 1 fill [10]
-----------------------------------------------
This already suggests that MKL's error handling takes up a large share of the time. When I try to profile the program with valgrind instead, I get:
==29598== valgrind: Unrecognised instruction at address 0x40ea75.
==29598== at 0x40EA75: mkl_vml_kernel_dExp_E9HAynn (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598== by 0x40220D: vdExp (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598== by 0x401D0B: expVec (speed_test.c:21)
==29598== by 0x401BC5: main (main.c:18)
==29598== Your program just tried to execute an instruction that Valgrind
==29598== did not recognise. There are two possible reasons for this.
==29598== 1. Your program has a bug and erroneously jumped to a non-code
==29598== location. If you are running Memcheck and you just saw a
==29598== warning about a bad jump, it's probably your program's fault.
==29598== 2. The instruction is legitimate but Valgrind doesn't handle it,
==29598== i.e. it's Valgrind's fault. If you think this is the case or
==29598== you are not sure, please let us know and we'll try to fix it.
==29598== Either way, Valgrind will now raise a SIGILL signal which will
==29598== probably kill your program.
==29598==
==29598== Process terminating with default action of signal 4 (SIGILL)
==29598== Illegal opcode at address 0x40EA75
==29598== at 0x40EA75: mkl_vml_kernel_dExp_E9HAynn (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598== by 0x40220D: vdExp (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598== by 0x401D0B: expVec (speed_test.c:21)
==29598== by 0x401BC5: main (main.c:18)
==29598==
==29598== Events : Ir
==29598== Collected : 530647
==29598==
==29598== I refs: 530,647
Illegal instruction (core dumped)
What causes these errors, and how can I profile these two functions in a meaningful way?
Discussion:
Tags: c profiling valgrind intel-mkl gprof