Allocatable arrays in CUDA-Fortran device data structures: answers

【Question title】: Allocatable arrays in CUDA-Fortran device data structures
【Posted】: 2017-08-10 13:18:09
【Question】:

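I am trying to use allocatable arrays inside a "device" data structure that resides in GPU memory. The code (pasted below) compiles but gives a segmentation fault. Am I doing something obviously wrong?

The module file is named "gpu_modules.F90" and is as follows: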

!=============
! This module contains definitions for data structures and the data
! stored on the device
!=============

   module GPU_variables
   use cudafor

   type :: data_str_def

!=============
! single number quantities
!=============

      integer                       :: i, j 
      real(kind=8)                  :: a 

!=============
! Arrays
!=============

      real(kind=8),   allocatable   :: b(:)
      real(kind=8),   allocatable   :: c(:,:)
      real(kind=8),   allocatable   :: d(:,:,:)
      real(kind=8),   allocatable   :: e(:,:,:,:)

   end type data_str_def

!=============
! Actual data is here
!=============

   type(data_str_def), device, allocatable   :: data_str(:)

   contains

!=============
! subroutine to allocate memory
!=============

      subroutine allocate_mem(n1)
      implicit none 
      integer, intent(in)  :: n1 

      call deallocate_mem()

      write(*,*) 'works here'
      allocate(data_str(n1))

      write(*,*) 'what about allocating memory?'
      allocate(data_str(n1) % b(10))
      write(*,*) 'success!'

      return
      end subroutine allocate_mem

!=============
! subroutine to deallocate memory
!=============

      subroutine deallocate_mem()
      implicit none
      if(allocated(data_str)) deallocate(data_str)
      return 
      end subroutine deallocate_mem

   end module GPU_variables

The main program is "gpu_test.F90" and is as follows:

!=============
! main program 
!=============

    program gpu_test
    use gpu_variables
    implicit none

!=============
! local variables
!=============

    integer             :: i, j, n

!=============
! allocate data
!=============

    n       = 2                 ! number of data structures

    call allocate_mem(n)

!=============
! deallocate device data structures and exit
!=============

    call deallocate_mem()
    end program

The compile command (run from the current folder) is:

pgfortran -Mcuda=cc5x *.F90

Terminal output:

$ ./a.out 
 works here
 what about allocating memory?
Segmentation fault (core dumped)

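Any help, insight and solutions would be much appreciated. And no, using pointers is not a viable option.

EDIT: another possibly relevant detail: I am using pgfortran version 16.10.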

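【Question comments】:

Note that using kind=8 is ugly and not portable (although it does not cause this error). Also, a return before every end is completely superfluous.
Also note that allocate(data_str(n1) % b(10)) only allocates the b component of the n1-th element of data_str. But that may be what you intended in this simple example.
Possible duplicate of stackoverflow.com/questions/44680150/…
Hi Vladimir: thanks for the reply. kind=8 is just there to make things explicit. Yes, I only want to allocate one component of the derived type without touching the rest. I also looked at the "pointers in data structures" question first. The solution suggested there is to make a host-side copy of the data structure and copy the whole thing to the device... I will try it and post back.
Just tried it... no luck making a host copy and transferring it to the device.

Tags: cuda fortran allocatable-array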

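【Solution 1】:

The segmentation fault occurs because the host has to access the memory of data_str in order to allocate data_str(n1)%b. Since data_str lives in device memory, not host memory, that access faults. In theory the compiler could create a host temporary, allocate into it, and then copy it into the descriptor of data_str(n1)%b, but that is not part of CUDA Fortran today.

You can work around this by creating the temporary yourself: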

      subroutine allocate_mem(n1)
      implicit none
      integer, intent(in)  :: n1
      type(data_str_def) :: data_str_h

      call deallocate_mem()

      write(*,*) 'works here'
      allocate(data_str(n1))

      write(*,*) 'what about allocating memory?'
      allocate(data_str_h% b(10))
      data_str(n1) = data_str_h
      write(*,*) 'success!'

      return
      end subroutine allocate_mem

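By the way, do you intend the components b, c, d and e to be allocated in host memory or in device memory? I don't see the device attribute on them, so in the code above they will go to host memory.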
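If the intent is for the component data itself to live in device memory, one possible arrangement is sketched below. It is an assumption about what might suit the asker, not something confirmed in this thread: the array of structures is kept on the host, and only the allocatable component carries the device attribute, so the host can reach the descriptor when it runs allocate. The module name gpu_components_sketch is hypothetical, and only component b is shown.

```fortran
! Sketch only, not the asker's code: module name is hypothetical.
module gpu_components_sketch
   use cudafor
   implicit none

   type :: data_str_def
      integer      :: i, j
      real(kind=8) :: a
      ! "device" on an allocatable component puts its storage in GPU memory
      real(kind=8), allocatable, device :: b(:)
   end type data_str_def

   ! No "device" here: the array of structures, including each component
   ! descriptor, stays in host memory, so the host can allocate into it.
   type(data_str_def), allocatable :: data_str(:)

contains

   subroutine allocate_mem(n1)
      integer, intent(in) :: n1
      integer :: k

      call deallocate_mem()
      allocate(data_str(n1))
      do k = 1, n1
         allocate(data_str(k)%b(10))   ! device allocation issued from the host
      end do
   end subroutine allocate_mem

   subroutine deallocate_mem()
      integer :: k

      if (allocated(data_str)) then
         do k = 1, size(data_str)
            if (allocated(data_str(k)%b)) deallocate(data_str(k)%b)
         end do
         deallocate(data_str)
      end if
   end subroutine deallocate_mem

end module gpu_components_sketch
```

With this layout a kernel would presumably receive data_str(k)%b as an ordinary device array argument rather than the structure itself, which may or may not fit the original design.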

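【Comments】:

Hi Rafik, thanks for the suggestion. The data structure "data_str" in the module has the device attribute, so it resides in GPU memory. Any entry in "data_str" also inherits this attribute.
@ansri "Any entry in "data_str" also inherits this attribute." That is not true! See the link I have already shown you: stackoverflow.com/questions/44680150/…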

【Solution 2】:

So I posted this question on the PGI forum, and someone from PGI confirmed that the feature is not supported in the way I am trying to use it.

http://www.pgroup.com/userforum/viewtopic.php?t=5661

His suggestion was to use the "managed" attribute or to use fixed-size arrays in the data structure.
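As a rough illustration of the fixed-size option only, the module from the question could be reshaped along these lines. This is a sketch under assumptions, not the code from the PGI forum reply: the module name gpu_fixed_sketch and the parameter nb are hypothetical, and the extent of b is assumed to be known at compile time.

```fortran
! Sketch only: module name and nb are hypothetical.
module gpu_fixed_sketch
   use cudafor
   implicit none

   integer, parameter :: nb = 10   ! assumed compile-time extent of b

   type :: data_str_def
      integer      :: i, j
      real(kind=8) :: a
      real(kind=8) :: b(nb)        ! fixed size: stored inline, no descriptor
   end type data_str_def

   ! The whole array of structures lives in device memory
   type(data_str_def), device, allocatable :: data_str(:)

contains

   subroutine allocate_mem(n1)
      integer, intent(in) :: n1

      call deallocate_mem()
      allocate(data_str(n1))       ! one device allocation covers every b
   end subroutine allocate_mem

   subroutine deallocate_mem()
      if (allocated(data_str)) deallocate(data_str)
   end subroutine deallocate_mem

end module gpu_fixed_sketch
```

Because b no longer has a descriptor that the host must write, the only allocation is the one for data_str itself, which is the step that already worked in the question. The cost is that every element reserves nb values whether or not it needs them.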

【Comments】: