【Question title】: How can I call an MPI .so file from Python?
【Posted】: 2019-12-04 18:47:20
【Question】:

I have a shared object file containing compiled C++ MPI Hello World code. When I try to call it from Python using ctypes, I get a rather unhelpful list of errors.

mpiHello.cpp:

#include <iostream>
#include <mpi.h>

extern "C"
void mpiHello() {

    int rank, size;

    MPI_Init(NULL, NULL);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::cout << "Hello world! I am " << rank << " of " << size << std::endl;

    MPI_Finalize();

}

Compile command: mpic++ -fPIC -o mpi.so mpiHello.cpp -shared

Python call:

import ctypes
import os

class mpi:
    def __init__(self):
        self.dll = None
        _DIRNAME = os.path.dirname(__file__)
        try:  # Windows
            self.dll = ctypes.CDLL(os.path.join(_DIRNAME, "mpi.dll"))
        except OSError:  # Linux
            self.dll = ctypes.cdll.LoadLibrary(os.path.join(_DIRNAME, "mpi.so"))
        finally:
            self.dll.mpiHello.argtypes = []

    def execute(self):
        self.dll.mpiHello()

_mpi = mpi()
_mpi.execute()

Output:

[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[<user>-OptiPlex-7050:09468] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[<user>-OptiPlex-7050:9468] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Process finished with exit code 1

I expected the code to print four lines of "Hello world! I am ...", but I only get errors. Any help would be greatly appreciated!

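【Comments】:

  • You might want to consider using mpi4py, which has already solved this problem. If not, have a look at its code. With Open MPI, unless you configured it with --disable-dlopen, you have to dlopen('libmpi.so', RTLD_GLOBAL), iirc.
  • By the way, you are launching your Python script with mpirun, right?
  • Unfortunately, mpi4py is not an option here. I have tried the other two things you suggested, but I still cannot get it to work.
  • Have you tried adding braces after extern "C" and closing them at the end of the file? The same goes for inside mpi.h. I think the extern "C" scope does not take your function into account because there is no ... scope {}

Tags: python c++ mpi ctypes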
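The dlopen suggestion from the first comment can be sketched with ctypes roughly as follows. This is a minimal sketch, not a confirmed fix: the asker reports having tried the suggestion without success. The helper name load_mpi_hello and its parameters are hypothetical; "libmpi.so" is the name quoted in the comment, and "mpi.so" and mpiHello come from the question.

```python
import ctypes
import os


def load_mpi_hello(directory, mpi_lib="libmpi.so", wrapper="mpi.so"):
    """Preload the MPI runtime with RTLD_GLOBAL, then load the wrapper library.

    Hypothetical helper: the default library names are assumptions taken
    from the comment and the question, and may differ on your system.
    """
    # RTLD_GLOBAL makes the MPI library's symbols visible to libraries
    # loaded afterwards (for example, plugins Open MPI opens itself).
    ctypes.CDLL(mpi_lib, mode=ctypes.RTLD_GLOBAL)

    dll = ctypes.CDLL(os.path.join(directory, wrapper))
    dll.mpiHello.argtypes = []
    dll.mpiHello.restype = None  # mpiHello returns void
    return dll
```

To get one line of output per rank, the script would still be started through the MPI launcher, for example mpirun -n 4 python script.py, where script.py is a placeholder name.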


【Answer 1】:

There is very little to do on Linux; in fact, your error is the missing braces.

#include "mpiHello.h" 

extern "C" {
    void mpiHello() {
        // Code
    }
}

The same goes for the mpiHello.h file:

#pragma once
#include <mpi.h>

extern "C" {
    void mpiHello();
}
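ctypes looks functions up by their exact exported symbol name, so one way to check what the library actually exports is to probe for the name from Python. The sketch below is an illustration only: has_symbol is a hypothetical helper, and the C runtime is used as a stand-in library so the code runs without MPI installed.

```python
import ctypes
import ctypes.util


def has_symbol(dll, name):
    """Return True if `name` can be resolved in the loaded library."""
    try:
        getattr(dll, name)
    except AttributeError:
        # ctypes raises AttributeError when the symbol is not exported.
        return False
    return True


# Stand-in library (assumption: a POSIX system). If find_library returns
# None, CDLL(None) exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

print(has_symbol(libc, "strlen"))        # a plain C name
print(has_symbol(libc, "_Z8mpiHellov"))  # a C++-mangled name, absent here
```

Applied to the library from the question, has_symbol(dll, "mpiHello") should be true when the function has C linkage, while a name such as _Z8mpiHellov (the GCC/Clang mangling of void mpiHello()) would indicate that it was exported with C++ linkage instead.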

To work with shared libraries from Python, I usually use a macro to handle the export scope; it works for both Windows (.dll) and Linux (.so).

mpi.cpp:

#include "mpiHello.h"

extern "C" {
    MYEXPORT void mpiHello() {
        // Code
    }
}

mpiHello.h:

#pragma once
#include <mpi.h>
#ifdef _WIN32
#define MYEXPORT __declspec(dllexport)
#else
#define MYEXPORT 
#endif

extern "C" {
    MYEXPORT void mpiHello();
}

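Be careful if you want to create a new function with a return value or parameters: you need to declare them in the Python file.

If you have a new function int myFunction(char* str1, char* str2, int pos), you will have the following declarations in Python:

dll.myFunction.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
dll.myFunction.restype = ctypes.c_int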
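As a runnable illustration of the same pattern, the sketch below declares argtypes and restype for the standard C function strncmp, which has a similar shape (two strings and a length). strncmp and the helper same_prefix are stand-ins chosen for this example, not part of the answer; it assumes a POSIX system where the C runtime can be loaded this way.

```python
import ctypes
import ctypes.util

# Stand-in library: the C runtime. If find_library returns None,
# CDLL(None) exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# int strncmp(const char *s1, const char *s2, size_t n);
libc.strncmp.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strncmp.restype = ctypes.c_int


def same_prefix(a: bytes, b: bytes, n: int) -> bool:
    """Return True if the first n bytes of a and b compare equal."""
    return libc.strncmp(a, b, n) == 0


print(same_prefix(b"hello", b"help", 3))
print(same_prefix(b"hello", b"help", 4))
```

With argtypes declared, ctypes checks arguments before the call: passing a str where c_char_p is expected raises ctypes.ArgumentError instead of handing a wrong value to C.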

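In addition, you must pass C-typed arguments to your new function, so you have to convert Python values to C values. Here is an example that converts a Python string and a Python int to ctypes:

import ctypes

python_string = "Hello World"
python_int = 42

# In Python 3, encode the str to bytes before filling the char array;
# the extra element stays zero and acts as the NUL terminator.
python_bytes = python_string.encode("utf-8")
c_string = (ctypes.c_char * (len(python_bytes) + 1))(*python_bytes)
c_int = ctypes.c_int(python_int)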
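An alternative for the string conversion is ctypes.create_string_buffer, which allocates a mutable, NUL-terminated char buffer from a bytes object. The sketch below is one possible way to wrap the conversion; to_c_values is a hypothetical helper name and UTF-8 is an assumed encoding.

```python
import ctypes


def to_c_values(text, number):
    """Convert a Python str and int into ctypes objects (hypothetical helper)."""
    # create_string_buffer copies the bytes and appends a NUL terminator.
    c_buffer = ctypes.create_string_buffer(text.encode("utf-8"))
    c_number = ctypes.c_int(number)
    return c_buffer, c_number


c_buffer, c_number = to_c_values("Hello World", 42)

# .value converts back: bytes up to the first NUL, and a Python int.
round_trip_text = c_buffer.value.decode("utf-8")
round_trip_number = c_number.value
print(round_trip_text, round_trip_number, len(c_buffer))
```

The buffer is one element longer than the encoded text because of the terminator, and it can be passed directly to a parameter declared as ctypes.c_char_p.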
