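【Posted】: 2016-08-26 23:27:25
【Question】:
I am using Vampir on a cluster to visualize MPI communication. Because the cluster lacks an MPI-3 implementation, I installed OpenMPI 2.0.0 in my home directory (with no configure flags other than --prefix); it works fine without Vampir. Now I don't know how to correctly combine my local MPI-3 install with Vampir to build my program (fetchAndOpTest.f90). I tried the following: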
vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -o fetchAndOpTestF90.x fetchAndOpTest.f90
(I don't know whether it matters, but this produces the following warning: /usr/bin/ld: warning: libmpi.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../lib/libmpi_f77.so, may conflict with libmpi.so.20)
Running my program with ~/OpenMPI2/bin/mpirun -np 2 fetchAndOpTestF90.x results in:
fetchAndOpTestF90.x: error while loading shared libraries: libvt-mpi.so.0: cannot open shared object file: No such file or directory [...]
So I also tried vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -L/opt/vampirtrace/5.14.4/lib -o fetchAndOpTestF90.x fetchAndOpTest.f90, but that did not change the ldd output.
Edit: I edited LD_LIBRARY_PATH as suggested by @Harald.
> ldd fetchAndOpTestF90.x
linux-vdso.so.1 => (0x00007ffc6ada9000)
libmpi_f77.so.1 => /usr/lib/libmpi_f77.so.1 (0x00007ff8fdf2e000)
libvt-mpi.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi.so.0 (0x00007ff8fdca3000)
libvt-mpi-unify.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi-unify.so.0 (0x00007ff8fda18000)
libotfaux.so.0 => /opt/vampirtrace/5.14.4/lib/libotfaux.so.0 (0x00007ff8fd810000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff8fd50c000)
libopen-trace-format.so.1 => /opt/vampirtrace/5.14.4/lib/libopen-trace-format.so.1 (0x00007ff8fd2c4000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff8fd0ab000)
libpapi.so.5.3 => /usr/lib/x86_64-linux-gnu/libpapi.so.5.3 (0x00007ff8fce57000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff8fcc53000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff8fc939000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff8fc633000)
libmpi_usempi.so.20 => /home/USER/OpenMPI2/lib/libmpi_usempi.so.20 (0x00007ff8fc430000)
libmpi_mpifh.so.20 => /home/USER/OpenMPI2/lib/libmpi_mpifh.so.20 (0x00007ff8fc1df000)
libmpi.so.20 => /home/USER/OpenMPI2/lib/libmpi.so.20 (0x00007ff8fbefb000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff8fbce5000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff8fbaa9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff8fb88b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff8fb4c6000)
libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007ff8fb145000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff8fe162000)
libpfm.so.4 => /usr/lib/x86_64-linux-gnu/libpfm.so.4 (0x00007ff8fadff000)
libopen-pal.so.20 => /home/USER/OpenMPI2/lib/libopen-pal.so.20 (0x00007ff8fab09000)
libopen-rte.so.20 => /home/USER/OpenMPI2/lib/libopen-rte.so.20 (0x00007ff8fa887000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007ff8fa684000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007ff8fa43b000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007ff8fa231000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007ff8fa026000)
libpciaccess.so.0 => /usr/lib/x86_64-linux-gnu/libpciaccess.so.0 (0x00007ff8f9e1d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff8f9c15000)
Execution now fails with the error: mpirun noticed that process rank 0 with PID 0 on node cluster exited on signal 11 (Segmentation fault). (The program itself is correct; it builds and runs fine against the local MPI-3 installation without Vampir.)
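The ldd output above resolves both libmpi.so.1 (from /usr/lib) and libmpi.so.20 (from the local OpenMPI2 install), which matches the linker warning; whether that mix is what triggers the segmentation fault is not confirmed here. A minimal sketch for pulling just those entries out of an ldd listing follows. The helper name list_libmpi is hypothetical and is not part of VampirTrace or Open MPI.

```shell
# Hypothetical helper (not from the original post): reads `ldd` output on
# stdin and prints each distinct libmpi.so.<N> soname with the path it
# resolved to. More than one line of output means two MPI builds are loaded.
list_libmpi() {
    awk '$1 ~ /^libmpi\.so\.[0-9]+$/ && $2 == "=>" { print $1, $3 }' \
        | LC_ALL=C sort -u
}

# Usage, with the binary name from the question:
#   ldd fetchAndOpTestF90.x | list_libmpi
```

If two sonames are listed, the executable is linked against both the system MPI and the local one, so the next thing to check would be which component (here possibly libmpi_f77.so.1 or the VampirTrace libraries) pulls in the system copy.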
【Discussion】: