Fortran MPI allgatherv 具有二维数组的派生类型答案

【问题标题】：Fortran MPI allgatherv with derived type for 2d arrayFortran MPI allgatherv 具有二维数组的派生类型
【发布时间】：2017-08-12 10:23:27
【问题描述】：

需要有关此 Fortran MPI 问题的帮助。试图从二维数组的不同列中收集数据。问题是没有使用每一行的所有数据，并且每个进程分配的列可能不相等。所有进程都从一个等效的全局数据视图开始，每个进程都应在特定列上执行工作，最后交换信息，以便所有进程再次共享公共视图。问题类似于MPI partition and gather 2D array in Fortran和Sending 2D arrays in Fortran with MPI_Gather

绘制示例：使用 3 MPI 进程的 data(8,4)

---------------------
| a1 | b1 | c1 | d1 |
| a2 | b2 | c2 | d2 |
| a3 | b3 | c3 | d3 |
| a4 | b4 | c4 | d4 |
| a5 | b5 | c5 | d5 |
| a6 | b6 | c6 | d6 |
| a7 | b7 | c7 | d7 |
| a8 | b8 | c8 | d8 |
---------------------

进程 1 将得到 2 列，进程 2 得到 1 列，进程 3 得到 1 列。

-----------  ------  ------
| a1 | b1 |  | c1 |  | d1 |
| a2 | b2 |  | c2 |  | d2 |
| a3 | b3 |  | c3 |  | d3 |
| a4 | b4 |  | c4 |  | d4 |
| a5 | b5 |  | c5 |  | d5 |
| a6 | b6 |  | c6 |  | d6 |
| a7 | b7 |  | c7 |  | d7 |
| a8 | b8 |  | c8 |  | d8 |
-----------  ------  ------

在实际问题中，实际大小是 data(200000,59)。这是一块预先分配的内存，我只使用了每列的一部分（总是从索引 1 开始）。例如，我只需要每列中的前 3 个值。

-----------  ------  ------
| a1 | b1 |  | c1 |  | d1 |
| a2 | b2 |  | c2 |  | d2 |
| a3 | b3 |  | c3 |  | d3 |
| == | == |  | == |  | == |
| a4 | b4 |  | c4 |  | d4 |
| a5 | b5 |  | c5 |  | d5 |
| a6 | b6 |  | c6 |  | d6 |
| a7 | b7 |  | c7 |  | d7 |
| a8 | b8 |  | c8 |  | d8 |
-----------  ------  ------

我正在尝试创建可用于完成此任务的发送和接收数据类型。到目前为止，我最好的猜测是使用 MPI_TYPE_VECTOR。 MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR)

为此将使用 MPI_TYPE_VECTOR(1, 3, 8, MPI_DOUBLE, newtype, ierr)。这应该允许每个进程发送最少的信息。有了这个，我想我应该可以用 ALLGATHERV 发送信息了。

MPI_ALLGATHERV（SENDBUF、SENDCOUNT、SENDTYPE、RECVBUF、RECVCOUNT、DISPLS、RECVTYPE、COMM、IERROR）我在哪里使用 MPI_ALLGATHERV(data(1,my_first_col), num_cols_to_be_sent, newtype, data, RECVCOUNT[], DISPLS[], newtype, COMM, IERROR)

据我所知，这是每个进程应该发送的信息。

Process 1: [a1,a2,a3,b1,b2,b3]
Process 2: [c1,c2,c3]
Process 3: [d1,d2,d3]

我见过的例子要么使用整个数据列，要么位移自然是所需子数组的倍数。我无法将其解压缩到正确的列中。它不应该能够做到这一点，因为接收端了解类型的大小/范围。当然，我对整个范围的事情感到非常困惑。任何帮助，将不胜感激。真正的代码在工作，但这里是查看和 cmets 的快速娱乐（可能无法编译，只是快速制作）。

  MODULE PARALLEL
    INTEGER iproc, nproc, rank, ierr
    INTEGER mylow, myhigh, mysize, ichunk, irem
    INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
    DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
    INTEGER newtype
  END MODULE


  PROGRAM MAIN
  USE PARALLEL
  IMPLICIT NONE
  INCLUDE 'mpif.f'

c   **temp variables
  integer i, j
  integer num_rows,num_cols
  integer used_rows

c    ----setup MPI----
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
  iproc = rank+1  !rank is base 0, rest of fortran base 1

c   ----setup initial data      
  integer num_rows=20  !contiguous in memory over rows (ie single column)
  integer num_cols=11  !noncontiguous memory between different columns
  integer 

  ALLOCATE (isize(nproc))
  ALLOCATE (idisp(nproc))
  ALLOCATE (ilow(nproc))
  ALLOCATE (ishigh(nproc))      
  ALLOCATE (glob_val(num_rows,num_cols))

  glob_val = 1.0*iproc   !sent all glob values to process id
  do i=1,num_cols
    do j=1,used_rows
      glob_val(j,i) = iproc+.01*j  !add refernce index to used data
    end do
  end do

c   ---setup exchange information
  ichunk = num_cols/nproc
  irem = num_cols -(ichunk*nproc)
  mysize=ichunk
  if(iproc.le.irem) mysize=mysize+1

  mylow=0
  myhigh=0

  do i=1,nproc   !establish global understanding of processes
    mylow=myhigh+1
    myhigh=mylow+ichunk
    if(i.le.irem) myhigh=myhigh+1

    isize(i)=myhigh-mylow+1
    idisp(i)=(mylow-1)    !based on receiving type size/extent
    ilow(i)=mylow
    ihigh(i)=myhigh
  end do
  mylow=ilow(iproc)
  myhigh=ihigh(iproc)

  call MPI_TYPE_VECTOR(1,used_rows,num_rows,MPI_DOUBLE,
 &                     newtype,ierr)
  call MPI_TYPE_COMMIT(newtype,ierr)

c   --- perform exchange based on 'newtype'      
      !MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
      !               RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
      !               COMM, IERROR)
  call MPI_ALLGATHERV(glob_val(1,mylow),mysize,newtype
 &                    glob_val,isize,iproc,newtype,
 &                    MPI_COMM_WORLD,ierr)      

c   ---print out global results of process 2
  if(iproc.eq.2) then      
    do i=1,num_rows
      write(*,*) (glob_val(i,j),j=1,num_cols) 
    end do
  end if

  END program

【问题讨论】：

你的数据不就是double precision吗？为什么要创建自定义 MPI 类型？
数据是双精度的。自定义类型用于创建仅包含我需要的数组数量的单一类型。例如，数组列中的 8 个值中的 3 个成为单个实体。之后，每个进程都被发送若干列。

标签： fortran 2d mpi derived-types

【解决方案1】：

好的，我通过以下方式完成了这项工作：

1) myhigh=mylow + ichunk - 1 不是 myhigh = mylow + ichunk

2) used_rows 必须在赋值循环之前设置

3) 更明确地定义实际的缓冲区，试试

call MPI_ALLGATHERV(glob_val(:,mylow:myhigh), mysize, newtype,   &
                    glob_val(1:used_rows,:), isize, idisp, newtype, &
                    MPI_COMM_WORLD, ierr)

使用 gfortran 和 openmpi 的完整代码：

  MODULE PARALLEL
    INTEGER iproc, nproc, rank, ierr
    INTEGER mylow, myhigh, mysize, ichunk, irem
    INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
    DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
    INTEGER newtype
  END MODULE


  PROGRAM MAIN
  USE PARALLEL
  use mpi
  IMPLICIT NONE
  ! INCLUDE 'mpif.f'

!   **temp variables
  integer i, j
  integer num_rows,num_cols
  integer used_rows

!    ----setup MPI----
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
  iproc = rank+1  !rank is base 0, rest of fortran base 1

!   ----setup initial data      
  num_rows=8  !contiguous in memory over rows (ie single column)
  num_cols=4  !noncontiguous memory between different columns
  used_rows = 3

  ALLOCATE (isize(nproc))
  ALLOCATE (idisp(nproc))
  ALLOCATE (ilow(nproc))
  ALLOCATE (ihigh(nproc))      
  ALLOCATE (glob_val(num_rows,num_cols))

!  glob_val = 1.0*iproc   !sent all glob values to process id
  glob_val = -1.0 * iproc  
  do i=1,num_cols
    do j=1,used_rows
      glob_val(j,i) = (1.0*iproc)+(.01*j)  !add refernce index to used data
    end do
  end do

!   ---setup exchange information
  ichunk = num_cols/nproc
  irem = num_cols -(ichunk*nproc)
  mysize=ichunk
  if(iproc.le.irem) mysize=mysize+1

  mylow=0
  myhigh=0

  do i=1,nproc   !establish global understanding of processes
    mylow=myhigh+1
    myhigh=mylow+ichunk-1
    if(i.le.irem) myhigh=myhigh+1

    isize(i)=myhigh-mylow+1
    idisp(i)=(mylow-1)    !based on receiving type size/extent
    ilow(i)=mylow
    ihigh(i)=myhigh
  end do
  mylow=ilow(iproc)
  myhigh=ihigh(iproc)

  call MPI_TYPE_VECTOR(1,used_rows,num_rows,MPI_DOUBLE, &
                      newtype,ierr)
  call MPI_TYPE_COMMIT(newtype,ierr)

  write(*,*) rank, idisp
  write(*,*) rank, isize
!   --- perform exchange based on 'newtype'      
      !MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
      !               RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
      !               COMM, IERROR)
  call MPI_ALLGATHERV(glob_val(:,mylow:myhigh),mysize,newtype, &
                     glob_val(1:used_rows,:),isize,idisp,newtype, &
                     MPI_COMM_WORLD,ierr)      

!   ---print out global results of process 2
  if(iproc.eq.2) then      
    do i=1,num_rows
      write(*,*) (glob_val(i,j),j=1,num_cols) 
    end do
  end if

  call MPI_Finalize(ierr)

  END program

【讨论】：

抱歉，我花了一段时间才回到这个问题。我运行了你的代码。我明白你想用它做什么。但是，我没有得到正确的答复。为了获得更好的测试结果，我将数据初始化行更改为： 'code' glob_val(j,i) = (100.0*iproc)+(.01*j)+i 您的代码导致： 101.01 -1.00 203.01 -2.00 101.02 - 1.00 203.02 -2.00 101.03 -1.00 203.03 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00 -2.00
我明白了！从使用 MPI_TYPE_VECTOR 更改为 MPI_TYPE_CONTIGUOUS(used_rows, MPI_DOUBLE, newtype, ierr) 并更改了 ALLGATHER(glob_val(1:used_rows, mylow:myhigh), ...) 没有您的帮助就无法做到！谢谢塔博士。 @塔博士
很高兴为您提供帮助:)