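Here is working code that uses both point-to-point and collective communication (the collective version is commented out below, but it works fine). You need to define a vector type that describes the non-contiguous data on the receiving side, at the master. To use a collective gather, you also need to resize the extent of this vector type (here to sizeof(int)) so that the gather puts every piece in the right place, and you need to use the gatherv version, MPI_Gatherv.
Array indices are easy to get wrong, so for generality I used a 2x3 process grid on a 6x12 matrix, so that things are deliberately not square.
Apologies for the messy indentation. I seem to have a tab/space problem that I really should sort out in future!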
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define M 6
#define N 12
#define MP 2
#define NP 3
#define MLOCAL (M/MP)
#define NLOCAL (N/NP)
#define TAG 0

int main(void)
{
  int master[M][N];
  int local[MLOCAL][NLOCAL];
  MPI_Comm comm = MPI_COMM_WORLD;
  int rank, size, src;
  int i, j;
  int istart, jstart;
  int displs[MP*NP], counts[MP*NP];
  MPI_Status status;
  MPI_Request request;
  MPI_Datatype block, blockresized;

  MPI_Init(NULL, NULL);
  MPI_Comm_size(comm, &size);
  MPI_Comm_rank(comm, &rank);

  if (size != MP*NP)
    {
      if (rank == 0) printf("Size %d not equal to MP*NP = %d\n", size, MP*NP);
      MPI_Finalize();
      return 1;
    }

  // Fill master with the rank so that gathered values (rank+1) stand out
  for (i=0; i < M; i++)
    {
      for (j=0; j < N; j++)
        {
          master[i][j] = rank;
        }
    }

  for (i=0; i < MLOCAL; i++)
    {
      for (j=0; j < NLOCAL; j++)
        {
          local[i][j] = rank+1;
        }
    }

  // Define vector type appropriate for subsections of master array:
  // MLOCAL rows of NLOCAL ints, with a stride of N ints between rows
  MPI_Type_vector(MLOCAL, NLOCAL, N, MPI_INT, &block);
  MPI_Type_commit(&block);

  // Non-blocking send to avoid deadlock with rank 0 sending to itself
  // (MPI_INT is the C datatype; MPI_INTEGER is the Fortran one)
  MPI_Isend(local, MLOCAL*NLOCAL, MPI_INT, 0, TAG, comm, &request);

  // Receive from all the workers
  if (rank == 0)
    {
      for (src=0; src < size; src++)
        {
          // Find out where this block should go
          istart = (src/NP) * MLOCAL;
          jstart = (src%NP) * NLOCAL;

          // receive a single block
          MPI_Recv(&master[istart][jstart], 1, block, src, TAG, comm, &status);
        }
    }

  // Wait for send to complete
  MPI_Wait(&request, &status);

  /* comment out collective

  // Using collectives -- currently commented out!
  MPI_Type_create_resized(block, 0, sizeof(int), &blockresized);
  MPI_Type_commit(&blockresized);

  // Work out displacements in master in counts of integers
  for (src=0; src < size; src++)
    {
      istart = (src/NP) * MLOCAL;
      jstart = (src%NP) * NLOCAL;
      displs[src] = istart*N + jstart;
      counts[src] = 1;
    }

  // Call collective
  MPI_Gatherv(local, MLOCAL*NLOCAL, MPI_INT,
              master, counts, displs, blockresized,
              0, comm);

  */

  // Print out
  if (rank == 0)
    {
      for (i=0; i < M; i++)
        {
          for (j=0; j < N; j++)
            {
              printf("%d ", master[i][j]);
            }
          printf("\n");
        }
    }

  MPI_Finalize();
  return 0;
}
It seems to work fine on 6 processes:
mpiexec -n 6 ./arraygather
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3
4 4 4 4 5 5 5 5 6 6 6 6
4 4 4 4 5 5 5 5 6 6 6 6
4 4 4 4 5 5 5 5 6 6 6 6
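If you want to check the index arithmetic without launching MPI, the placement can be emulated serially. The following is only a sketch for illustration and is not part of the program above: the names block_displacement, place_block and fill_master are hypothetical, and the plain loops stand in for what the strided vector type does on the receiving side (rows of NLOCAL ints, N ints apart, starting at the displacement for each source rank).

```c
#include <assert.h>

/* Same sizes as in the MPI program: 6x12 matrix on a 2x3 process grid. */
enum { M = 6, N = 12, MP = 2, NP = 3, MLOCAL = M / MP, NLOCAL = N / NP };

/* Offset, counted in ints, of the first element of rank src's block
   inside the flattened M*N master array (hypothetical helper). */
int block_displacement(int src)
{
    assert(src >= 0 && src < MP * NP);
    int istart = (src / NP) * MLOCAL;  /* first row of the block    */
    int jstart = (src % NP) * NLOCAL;  /* first column of the block */
    return istart * N + jstart;
}

/* Serial stand-in for receiving one block: write the value src+1 into
   an MLOCAL x NLOCAL patch, stepping N ints from one row to the next. */
void place_block(int *master, int src)
{
    int base = block_displacement(src);
    for (int i = 0; i < MLOCAL; i++)
        for (int j = 0; j < NLOCAL; j++)
            master[base + i * N + j] = src + 1;
}

/* Place the blocks of all MP*NP ranks, as the gather at rank 0 would. */
void fill_master(int *master)
{
    for (int src = 0; src < MP * NP; src++)
        place_block(master, src);
}
```

Calling fill_master on an array of M*N ints and printing it N values per row should reproduce the matrix shown in the run above.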
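This should work in any situation where the matrix decomposes exactly onto the process grid. It will be a bit more complicated if the processes do not all have submatrices of exactly the same size.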
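As a rough idea of what the uneven case involves, here is a sketch of the bookkeeping for one dimension. It assumes one common convention that the code above does not use: when n items do not divide evenly over p parts, the first n % p parts get one extra item. The helper name block_range is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: split n items over p parts as evenly as possible.
   Assumed convention: the first n % p parts hold one extra item.
   On return, *start is the first index owned by part idx and *len is
   the number of items it owns. */
void block_range(int n, int p, int idx, int *start, int *len)
{
    assert(n >= 0 && p > 0 && idx >= 0 && idx < p);
    int base = n / p;   /* minimum items per part          */
    int extra = n % p;  /* number of parts with one more   */
    *len = base + (idx < extra ? 1 : 0);
    *start = idx * base + (idx < extra ? idx : extra);
}
```

You would apply this once to the rows (with src/NP as idx) and once to the columns (with src%NP) to get istart, jstart and the block shape for each source. Because the blocks then differ in shape, a single receive type no longer describes every source, so one option would be to stay with the point-to-point version and build a vector type per source.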