【Question title】: OpenMPI debugging with Valgrind and suppressions in OS X
【Posted】: 2011-10-27 03:54:01
【Question description】:

I am writing parallel code in C++ on my OS X (Snow Leopard) laptop and am trying to debug it with memchecker. I have successfully built OpenMPI with Valgrind support: configure --prefix=/opt/openmpi-1.4.3/ --enable-debug --enable-memchecker --with-valgrind=/opt/valgrind-3.6.0/ FFLAGS=-m64 F90FLAGS=-m64 (ignore the Fortran flags; they are there because my Fortran compiler comes from GCC).

When I run my application with

mpirun -np 2 valgrind --suppressions=/opt/openmpi-1.4.3/share/openmpi/openmpi-valgrind.supp --leak-check=yes --dsymutil=yes ./program

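I get a lot of warnings from Valgrind (most of them from the heap summary at the end). I have included a small snippet of the warnings below. What I gather from them is that Valgrind detects memory leaks and uninitialised values inside the MPI library, which I am not interested in. I want to see the warnings that come from the code I have written. I already run Valgrind with the suppression file provided by OpenMPI, but apparently that is not enough. How can I easily ignore all the other warnings detected in the OpenMPI distribution? Is it possible to find a suppression file for OpenMPI debugging with Valgrind on OS X, or do you know any cunning tricks?

The first warning is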

 ==1531==    Syscall param writev(vector[...]) points to uninitialised byte(s)
 ==1531==    at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
 ==1531==    by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so) 
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)
 ==1531==  Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
 ==1531==    at 0x10002DB2D: realloc (vg_replace_malloc.c:525)
 ==1531==    by 0x1012240B6: opal_dss_buffer_extend (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101225CF7: opal_dss_copy_payload (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101B347CA: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)

A small snippet of the heap summary printed after execution looks like this

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)

...

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,952 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065ACFE6: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100179985: mca_io_base_file_select (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100089D55: ompi_file_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000E1ED1: MPI_File_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531== 
 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,953 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065A6210: ???
 ==1531==    by 0x106597149: ???
 ==1531==    by 0x106596AAB: ???
 ==1531==    by 0x1065AD14C: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)

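【Comments on the question】:

  • To clarify: I don't know whether this is specific to OS X or also happens on Linux, because I have not tested it on a Linux system.
  • To format code samples or program output, please do not use the > symbol. Select the code sample and press the "{}" button in the top panel. It will add 4 space characters (0x20) at the beginning of each line.

Tags: macos mpi valgrind openmpi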


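【Solution 1】:

I can't speak to how Open MPI behaves under Valgrind, but MPICH2 should fare better in this respect. If you don't specifically need Open MPI as your MPI implementation, you can easily configure MPICH2 to avoid problems with Valgrind.

【Comments】:

  • I will try looking into that; so far I have only used OpenMPI. In any case, I suppose I could debug my code with MPICH2 and use OpenMPI for other purposes.
【Solution 2】:

You can add further suppressions for Valgrind yourself. These will take care of the first set of warnings you posted: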

{
  ORTE OOB suppression rule
  Memcheck:Param
  writev(vector[...])
  fun:writev
  ...
  fun:mca_oob_tcp_peer_send
  fun:mca_oob_tcp_send_nb
  fun:orte_rml_oob_send
  fun:orte_rml_oob_send_buffer
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:malloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:realloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:calloc
  ...
  fun:ompi_mpi_init
}
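
The three leak rules above share one shape: an allocator frame, a `...` wildcard, and an anchor frame further down the stack. As a minimal sketch, a short Python script could print rules of that shape for any anchor. The helper names `leak_rule` and `leak_rules` are hypothetical, and `MPI_File_open` as a second anchor is an assumption drawn from the heap summary in the question, where some records pass through `MPI_File_open` rather than `ompi_mpi_init`. Stacks that Valgrind cuts off before the anchor frame may only match if `--num-callers` is raised.

```python
# Sketch: generate Memcheck leak suppressions of the form
#   allocator -> ... -> anchor
# The helper names and the anchors used below are illustrative assumptions.

ALLOCATORS = ("malloc", "realloc", "calloc")


def leak_rule(allocator: str, anchor: str) -> str:
    """Return one suppression entry for leaks allocated by `allocator`
    somewhere beneath the frame `anchor`."""
    return "\n".join([
        "{",
        f"  OMPI leak ({allocator} under {anchor})",  # free-form rule name
        "  Memcheck:Leak",
        f"  fun:{allocator}",
        "  ...",                                       # any frames in between
        f"  fun:{anchor}",
        "}",
    ])


def leak_rules(anchors) -> str:
    """Return entries for every allocator/anchor pair, blank-line separated."""
    return "\n\n".join(
        leak_rule(alloc, anchor) for anchor in anchors for alloc in ALLOCATORS
    )


if __name__ == "__main__":
    # ompi_mpi_init comes from the rules above; MPI_File_open is an assumed
    # extra anchor based on the traces shown in the question.
    print(leak_rules(["ompi_mpi_init", "MPI_File_open"]))
```

The output can be redirected to a file and passed to Valgrind with a second `--suppressions=` option next to the one that points at `openmpi-valgrind.supp`.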

