【Question title】: Sequential dot_product in OpenACC Fortran loop
【Posted】: 2021-05-06 12:03:22
【Question】:

In a Fortran program, I have a large loop that makes several dot_product calls on small vectors generated inside the loop:

program test
        implicit none

        real :: array1(2, 2), array2(2, 2), res(2)
        real :: subarray1(2), subarray2(2)
        integer :: i

        array1 = 1
        array2 = 2

        !$acc data copyin(array1, array2) copyout(res)
        !$acc kernels
        !$acc loop independent private(subarray1, subarray2)
        do i = 1, 2
                subarray1(:) = array1(:, i)
                subarray2(:) = array2(:, i)
                res(i) = dot_product(subarray1, subarray2)
        enddo
        !$acc end kernels
        !$acc end data

        print "(2(g0, x))", res
endprogram

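When compiled with the PGI compiler, the accelerated implementation of dot_product appears to use an accelerated loop of its own, which prevents the main loop from being accelerated more effectively (across both gang and vector):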

test:
     11, Generating copyin(array1(:,:)) [if not already present]
         Generating copyout(res(:)) [if not already present]
         Generating copyin(array2(:,:)) [if not already present]
     14, Loop is parallelizable
         Generating Tesla code
         14, !$acc loop gang ! blockidx%x
         15, !$acc loop vector(32) ! threadidx%x
         17, !$acc loop vector(32) ! threadidx%x
             Generating implicit reduction(+:subarray1$r)
     14, CUDA shared memory used for subarray2,subarray1
     15, Loop is parallelizable
     17, Loop is parallelizable

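As the log shows, it uses an implicit reduction and CUDA shared memory for the loop-private vectors.

Is there a way to force dot_product to run sequentially?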

【Discussion】:

    Tags: fortran openacc pgi-accelerator


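    【Answer 1】:

    "Is there a way to force dot_product to run sequentially?"

    As long as you don't mind the array syntax also running sequentially, just add "gang vector" to the loop directive. With both gang and vector assigned to the outer loop, the compiler schedules the inner loops (the array assignments and the dot_product) as "loop seq" and places the private arrays in local memory instead of CUDA shared memory, as the compiler output below shows.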

    % cat test.f90
    program test
            implicit none
    
            real :: array1(2, 2), array2(2, 2), res(2)
            real :: subarray1(2), subarray2(2)
            integer :: i
    
            array1 = 1
            array2 = 2
    
            !$acc data copyin(array1, array2) copyout(res)
            !$acc kernels loop gang vector private(subarray1, subarray2)
            do i = 1, 2
                    subarray1(:) = array1(:, i)
                    subarray2(:) = array2(:, i)
                    res(i) = dot_product(subarray1, subarray2)
            enddo
            !$acc end data
    
            print "(2(g0, x))", res
    endprogram
    % nvfortran -acc -Minfo=accel test.f90
    test:
         11, Generating copyin(array1(:,:)) [if not already present]
             Generating copyout(res(:)) [if not already present]
             Generating copyin(array2(:,:)) [if not already present]
         13, Loop is parallelizable
             Generating Tesla code
             13, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
             14, !$acc loop seq
             16, !$acc loop seq
         13, Local memory used for subarray2,subarray1
         14, Loop is parallelizable
         16, Loop is parallelizable
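
A related option, not part of the original answer, is to write the dot product as an explicit inner loop and mark it `loop seq`, so that the sequential scheduling is stated in the source rather than left to the compiler. The sketch below assumes the same 2x2 data as the question; the names `n`, `k`, and `acc` are hypothetical and introduced only for illustration, and `parallel loop` is used in place of `kernels loop`.

```fortran
program test_seq_dot
        implicit none

        integer, parameter :: n = 2   ! assumed size, matching the 2x2 arrays in the question
        real :: array1(n, n), array2(n, n), res(n)
        real :: acc                   ! hypothetical scalar accumulator replacing dot_product
        integer :: i, k

        array1 = 1
        array2 = 2

        !$acc data copyin(array1, array2) copyout(res)
        ! Outer loop takes both gang and vector parallelism.
        !$acc parallel loop gang vector private(acc)
        do i = 1, n
                acc = 0.0
                ! Inner loop is explicitly sequential within each thread.
                !$acc loop seq
                do k = 1, n
                        acc = acc + array1(k, i) * array2(k, i)
                enddo
                res(i) = acc
        enddo
        !$acc end data

        ! Self-check: each column gives 1*2 + 1*2 = 4.
        if (any(abs(res - 4.0) > 1.0e-6)) error stop "unexpected dot product"
        print *, res
endprogram
```

Because the accumulator is a scalar, this version also avoids the private temporary arrays. Whether it performs better than the "gang vector" form above is not something the original thread measures, so treat it as an alternative to try, not a recommendation.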
    
