【发布时间】:2019-12-04 18:47:20
【问题描述】:
我有一个包含已编译 C++ MPI Hello World 代码的共享对象文件。当我尝试使用 ctypes 从 Python 调用它时,我得到了一个相当无用的错误列表。
mpiHello.cpp:
#include <mpi.h>
extern "C"
void mpiHello() {
int rank, size;
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
std::cout << "Hello world! I am " << rank << " of " << size << std::endl;
MPI_Finalize();
}
编译命令:
mpic++ -fPIC -o mpi.so mpiHello.cpp -shared
Python 调用:
def __init__(self):
self.dll = None
_DIRNAME = os.path.dirname(__file__)
try: # Windows
self.dll = ctypes.CDLL(os.path.join(_DIRNAME, "mpi.dll"))
except OSError: # Linux
self.dll = ctypes.cdll.LoadLibrary(os.path.join(_DIRNAME, "mpi.so"))
finally:
self.dll.mpiHello.argtypes = []
def execute(self):
self.dll.mpiHello()
_mpi = mpi()
_mpi.execute()
[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[<user>-OptiPlex-7050:9468] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Process finished with exit code 1
我希望代码显示 4 行“Hello world!我是......但我只是收到错误。任何帮助将不胜感激!
【问题讨论】:
-
您可能需要考虑使用已经解决此问题的
mpi4py。如果没有,请查看代码。对于 Open MPI,除非您使用--disable-dlopen配置它,否则您必须使用dlopen(‘libmpi.so’, RTLD_GLOBAL)iirc。 -
顺便说一句,你用
mpirun启动你的python脚本对吧? -
不幸的是,mpi4py 在这里不是一个选项。我已经尝试了您建议的其他两个选项,但我仍然无法使其正常工作。
-
您是否尝试在
extern "C"之后添加括号并在文件末尾关闭它们?mpi.h里面也是一样的。我认为extern "C"的范围没有考虑到您的功能,因为没有...范围{}