Your standard error (where the odls framework prints the binding information) is probably being redirected somewhere. It works for me with Open MPI 1.5.3:
$ mpiexec --report-bindings -rankfile rank_file -hostfile host_file \
-n 4 hostname
host
[host:18314] [[35226,0],0] odls:default:fork binding child [[35226,1],0] to slot_list 0
[host:18314] [[35226,0],0] odls:default:fork binding child [[35226,1],1] to slot_list 1
[host:18314] [[35226,0],0] odls:default:fork binding child [[35226,1],2] to slot_list 0
[host:18314] [[35226,0],0] odls:default:fork binding child [[35226,1],3] to slot_list 1
host
host
host
If the library fails to report the bindings for some reason, you can use a simple script to check whether binding actually takes place:
#!/bin/sh
cpuset=$(awk '/Cpus_allowed_list/ {print $2}' /proc/self/status)
echo "Rank $OMPI_COMM_WORLD_RANK bound to core(s) $cpuset"
Save it as report_bindings, make it executable, and run it through mpiexec:
$ mpiexec --report-bindings -rankfile rank_file -hostfile host_file \
-n 4 report_bindings
Rank 1 bound to core(s) 8
Rank 0 bound to core(s) 0
Rank 3 bound to core(s) 8
Rank 2 bound to core(s) 0
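
With several hosts in the hostfile, the output above does not say which node each rank landed on. A minimal sketch of a variant that also prints the host name is shown here. The helper name cpus_allowed_list and its optional path argument are assumptions made for illustration (the argument only exists so the parsing can be checked against a saved status file), and the "?" fallback is just a placeholder for when the script is started outside mpiexec and OMPI_COMM_WORLD_RANK is unset.

```sh
#!/bin/sh
# Sketch of a report_bindings variant that also prints the host name.

# Print the Cpus_allowed_list value from a /proc status file.
# The optional path argument is a hypothetical convenience for testing;
# by default the status of the reading process itself is used, which
# inherits the CPU affinity of this script.
cpus_allowed_list() {
    awk '/^Cpus_allowed_list:/ {print $2}' "${1:-/proc/self/status}"
}

# OMPI_COMM_WORLD_RANK is set by Open MPI for each launched process;
# "?" is printed instead when the variable is not set.
echo "Rank ${OMPI_COMM_WORLD_RANK:-?} on $(hostname) bound to core(s) $(cpus_allowed_list)"
```

It is launched the same way as report_bindings, as the program argument to mpiexec.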