【Published】: 2016-07-23 08:24:11
【Question】:
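I recently installed Open MPI on two Ubuntu 14.04 hosts and am testing it with the two bundled example programs, hello_c and ring_c. The hosts are called "hermes" and "zeus". Both have a user "mpiuser" that logs in non-interactively (via ssh-agent).
Both mpirun hello_c and mpirun --host hermes,zeus hello_c work correctly.
Running ring_c locally with mpirun --host zeus ring_c also works. The output is the same on hermes and on zeus: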
mpiuser@zeus:/opt/openmpi-1.6.5/examples$ mpirun --host zeus ring_c
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sent to 0
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
However, running ring_c across both hosts with mpirun --host hermes,zeus ring_c fails with the following output:
mpiuser@zeus:/opt/openmpi-1.6.5/examples$ mpirun --host hermes,zeus ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
[zeus:2930] *** An error occurred in MPI_Recv
[zeus:2930] *** on communicator MPI_COMM_WORLD
[zeus:2930] *** MPI_ERR_TRUNCATE: message truncated
[zeus:2930] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Process 0 sent to 1
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 2930 on
node zeus exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
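I couldn't find any documentation on how to solve this kind of problem, and the error output doesn't tell me where to start looking. How can I fix this?
【Discussion】:
Tags: testing installation openmpi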