【Question title】: Profiling Intel MKL functions with valgrind and gprof
【Posted】: 2016-02-12 21:58:32
【Question】:

I want to profile several different functions, so I started with two simple ones:

void expVec(const int a_len, const double *a, double *b)
{
    vdExp(a_len, a, b);//Using the vectorial exp-function from the mkl library
}

void expVec_sample(const int len, const double *a, double *b)
{
    for(int i = 0; i < len; i++)//Serial execution
        b[i] = exp(a[i]);
}

I call them with:

for(int i = 0; i < 10000; i++)
{
    expVec(len, a, c);
    expVec_sample(len, a, b);
}

and compile the program with:

gcc-5 main.c speed_test.c -o main -I/opt/intel/mkl/include -pg -g -DMKL_ILP64 -mavx -msse4.2 -msse3 -msse2 -m64 -flto -march=native -funroll-loops -std=gnu99 -Wl,--start-group /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl -lfftw3 -lfftw3_threads

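Most of these switches are there because I copied and pasted them from another project (where I will eventually use the results of this analysis).
When I try to measure the performance of the two functions, gprof gives me the following result: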

index % time    self  children    called     name
                                                 <spontaneous>
[1]     35.6    2.45    0.00                 mkl_vml_kernel_dError [1]
-----------------------------------------------
                                                 <spontaneous>
[2]     33.4    2.29    0.00                 mkl_vml_kernel_dExp_E9HAynn [2]
-----------------------------------------------
                                                 <spontaneous>
[3]     12.9    0.89    0.00                 mkl_vml_kernel_GetMode [3]
-----------------------------------------------
                0.64    0.00   10000/10000       main [5]
[4]      9.3    0.64    0.00   10000         expVec_sample [4]
-----------------------------------------------
                                                 <spontaneous>
[5]      9.3    0.00    0.64                 main [5]
                0.64    0.00   10000/10000       expVec_sample [4]
                0.00    0.00   10000/10000       expVec [9]
                0.00    0.00       1/1           fill [10]
-----------------------------------------------
                                                 <spontaneous>
[6]      7.0    0.48    0.00                 vdexp_cout_rare [6]
-----------------------------------------------
                                                 <spontaneous>
[7]      1.4    0.10    0.00                 mkl_vml_kernel_SetMode [7]
-----------------------------------------------
                                                 <spontaneous>
[8]      0.4    0.03    0.00                 mkl_vml_kernel_zError [8]
-----------------------------------------------
                0.00    0.00   10000/10000       main [5]
[9]      0.0    0.00    0.00   10000         expVec [9]
-----------------------------------------------
                0.00    0.00       1/1           main [5]
[10]     0.0    0.00    0.00       1         fill [10]
-----------------------------------------------

This already suggests that error handling in MKL takes a lot of time. When I try to profile the program with valgrind, I get:

==29598== valgrind: Unrecognised instruction at address 0x40ea75.
==29598==    at 0x40EA75: mkl_vml_kernel_dExp_E9HAynn (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598==    by 0x40220D: vdExp (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598==    by 0x401D0B: expVec (speed_test.c:21)
==29598==    by 0x401BC5: main (main.c:18)
==29598== Your program just tried to execute an instruction that Valgrind
==29598== did not recognise.  There are two possible reasons for this.
==29598== 1. Your program has a bug and erroneously jumped to a non-code
==29598==    location.  If you are running Memcheck and you just saw a
==29598==    warning about a bad jump, it's probably your program's fault.
==29598== 2. The instruction is legitimate but Valgrind doesn't handle it,
==29598==    i.e. it's Valgrind's fault.  If you think this is the case or
==29598==    you are not sure, please let us know and we'll try to fix it.
==29598== Either way, Valgrind will now raise a SIGILL signal which will
==29598== probably kill your program.
==29598== 
==29598== Process terminating with default action of signal 4 (SIGILL)
==29598==  Illegal opcode at address 0x40EA75
==29598==    at 0x40EA75: mkl_vml_kernel_dExp_E9HAynn (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598==    by 0x40220D: vdExp (in /home/roland/Dokumente/MA/gnlse/c-ext_test/speed_test/main)
==29598==    by 0x401D0B: expVec (speed_test.c:21)
==29598==    by 0x401BC5: main (main.c:18)
==29598== 
==29598== Events    : Ir
==29598== Collected : 530647
==29598== 
==29598== I   refs:      530,647
Unvalid machine command (Memory dump written)

What causes these errors, and how can I profile these two functions in a useful way?

【Comments】:

    Tags: c profiling valgrind intel-mkl gprof


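    【Answer 1】:

    I don't know about valgrind, but as you can see, gprof is not giving you useful information. For example, it looks like most of the time is spent in error handling. Doing what? For what purpose?

    Maybe it just goes in and comes right back out, and your vectors are so small that processing them takes very little time, about as much as that error routine takes to do nothing.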
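
    One way to check whether the vectors are simply too small is to time the two functions directly instead of relying on gprof. The following is a minimal sketch, not the answerer's method: `exp_fn`, `now_seconds` and `time_per_call` are hypothetical names introduced here, and it assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available. Only `expVec_sample` is taken from the question.

    ```c
    #define _POSIX_C_SOURCE 199309L
    #include <assert.h>
    #include <math.h>
    #include <time.h>

    /* Signature shared by expVec and expVec_sample in the question. */
    typedef void (*exp_fn)(const int len, const double *a, double *b);

    /* Scalar reference version, as in the question. */
    void expVec_sample(const int len, const double *a, double *b)
    {
        for (int i = 0; i < len; i++)
            b[i] = exp(a[i]);
    }

    /* Hypothetical helper: wall-clock seconds from a monotonic clock. */
    static double now_seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
    }

    /* Hypothetical helper: average seconds per call of f over reps calls. */
    double time_per_call(exp_fn f, int len, const double *a, double *b, int reps)
    {
        assert(reps > 0);
        const double start = now_seconds();
        for (int r = 0; r < reps; r++)
            f(len, a, b);
        return (now_seconds() - start) / reps;
    }
    ```

    Passing `expVec` (the MKL wrapper from the question) and `expVec_sample` to `time_per_call` for several values of `len` should show whether the per-call overhead dominates at small sizes. The numbers depend on the machine and compiler flags, so treat them as a rough comparison only.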

    Here is a ton of discussion about things like that, and what to do about it.
