What are the key differences between the Repa 2 and 3 APIs?

【Question title】: What are the key differences between the Repa 2 and 3 APIs?
【Posted】: 2012-05-25 00:34:58
【Question】:

To be more specific, I have the following seemingly innocuous little Repa 3 program:

{-# LANGUAGE QuasiQuotes #-}

import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2

main = do
  [s] <- getArgs
  img <- runIL $ readImage s

  let out = output x where RGB x = img
  runIL . writeImage "out.bmp" . Grey =<< computeP out

output img = map cast . blur . blur $ blur grey
  where
    grey              = traverse img to2D luminance
    cast n            = floor n :: Word8
    to2D (Z:.i:.j:._) = Z:.i:.j

---------------------------------------------------------------

luminance f (Z:.i:.j)   = 0.21*r + 0.71*g + 0.07*b :: Float
  where
    (r,g,b) = rgb (fromIntegral . f) i j

blur = map (/ 9) . convolve kernel
  where
    kernel = [stencil2| 1 1 1
                        1 1 1
                        1 1 1 |]

convolve = mapStencil2 BoundClamp

rgb f i j = (r,g,b)
  where
    r = f $ Z:.i:.j:.0
    g = f $ Z:.i:.j:.1
    b = f $ Z:.i:.j:.2

Processing a 640x420 image on my 2 GHz Core 2 Duo laptop takes this long:

real    2m32.572s
user    4m57.324s
sys     0m1.870s

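I know something must be wrong, because I have obtained much better performance on far more complex algorithms using Repa 2. Under that API, the big improvement I found came from adding a call to 'force' before every array transformation (which I understand to mean every call to map, convolve, traverse, etc.). I cannot quite work out what the analogous thing to do is in Repa 3. In fact, I thought the new representation type parameters were supposed to ensure that there is no ambiguity about when an array needs to be forced? And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that are little discussed online, AFAIK.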

More simply, is there a minimally invasive way to fix the efficiency of the program above?

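【Comments】:

Tags: haskell image-processing parallel-processing repa data-parallel-haskell

【Answer 1】:

The new representation type parameters don't automatically force when needed (doing that well is probably a hard problem); you still have to force manually. In Repa 3 this is done with the computeP function: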

computeP
  :: (Monad m, Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> m (Array r2 sh e)

I personally don't really understand why it is monadic, because you can just as well use the Identity monad:

import Control.Monad.Identity (runIdentity)
force
  :: (Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> Array r2 sh e
force = runIdentity . computeP

So now your output function can be rewritten with the appropriate forcing:

output img = map cast . f . blur . f . blur . f . blur . f $ grey
  where ...

where the abbreviation f uses a helper function u to aid type inference:

u :: Array U sh e -> Array U sh e
u = id
f = u . force

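With these changes the speedup is quite dramatic, which is to be expected: without the intermediate forcing, each output pixel ends up evaluating far more than is necessary (intermediate values aren't shared).

Your original code: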

real    0m25.339s
user    1m35.354s
sys     0m1.760s

With forcing:

real    0m0.130s
user    0m0.320s
sys     0m0.028s

Tested with a 600x400 png. The output files were identical.
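To make the "intermediate values aren't shared" point concrete, here is a rough toy model in plain Haskell. It is not Repa code: Delayed, delay, blur, force and totalReads are hypothetical names invented for this sketch, and it uses a 1-D, 3-point blur instead of the 3x3 stencil. It only counts how many source reads are needed when three blurs are stacked without forcing versus forcing after each one.

```haskell
import Data.Array (Array, listArray, (!))

-- Toy delayed array (hypothetical, not Repa's representation): a length plus
-- an indexing function. The Int paired with each value counts how many
-- source reads were needed to produce that value.
data Delayed = Delayed Int (Int -> (Int, Double))

-- View a manifest array as a delayed one; each element costs one read.
delay :: Array Int Double -> Int -> Delayed
delay arr n = Delayed n (\i -> (1, arr ! i))

-- 3-point box blur with clamped borders. Nothing is computed until indexed.
blur :: Delayed -> Delayed
blur (Delayed n ix) = Delayed n $ \i ->
  let clamp      = max 0 . min (n - 1)
      neighbours = [ix (clamp (i + d)) | d <- [-1, 0, 1]]
  in (sum (map fst neighbours), sum (map snd neighbours) / 3)

-- Materialise every element once. Returns the total number of reads spent
-- and a fresh delayed view over the now-manifest data.
force :: Delayed -> (Int, Delayed)
force (Delayed n ix) =
  let results = map ix [0 .. n - 1]
      arr     = listArray (0, n - 1) (map snd results)
  in (sum (map fst results), delay arr n)

-- Total reads needed to evaluate every element of a delayed array.
totalReads :: Delayed -> Int
totalReads = fst . force

main :: IO ()
main = do
  let n        = 100
      src      = delay (listArray (0, n - 1) (map fromIntegral [0 .. n - 1])) n
      (c1, s1) = force (blur src)
      (c2, s2) = force (blur s1)
      (c3, _)  = force (blur s2)
  print (totalReads (blur (blur (blur src))))  -- no forcing: 27 reads per element
  print (c1 + c2 + c3)                         -- forcing each stage: 9 reads per element
```

In this model the unforced pipeline needs 3^3 = 27 reads per element against 3 * 3 = 9 with forcing. By the same counting, three stacked 3x3 stencils would need 9^3 = 729 neighbour reads per output pixel unforced, against 3 * 9 = 27 forced, which is at least consistent with the size of the gap in the timings above.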

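【Comments】:

This is a great answer! I knew computeP was the replacement for 'force', but hadn't thought of using it with the identity monad. Thanks for your help.
I believe the reason for the monadic return type is that the idea of forcing something is closely tied to the forcing happening in sequence. There is a better explanation in cse.unsw.edu.au/~chak/papers/LCKP12.html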

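【Answer 2】:

computeP is the new force.

In Repa 3 you need to use computeP everywhere you would have used force in Repa 2.

The Laplace example from repa-examples is similar to what you're doing. You should also use cmap instead of plain map in your blur function. There will be a paper explaining why on my homepage early next week.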
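As a minimal sketch of what "computeP wherever you used force" can look like with the monadic interface, the following assumes the repa package (version 3.x) is installed. The names stage and pipeline are hypothetical, and stage is just a cheap map standing in for a real transformation such as blur. The exact class constraints on computeP differ between Repa 3 releases, but the call pattern should be the same.

```haskell
import Data.Array.Repa as R

-- Hypothetical stage standing in for something expensive like blur.
-- R.map returns a delayed (D) array; nothing is computed yet.
stage :: Array U DIM1 Double -> Array D DIM1 Double
stage = R.map (* 0.5)

-- Force after every stage. Each computeP materialises a manifest unboxed (U)
-- array, so the next stage reads stored values instead of recomputing them.
pipeline :: Monad m => Array U DIM1 Double -> m (Array U DIM1 Double)
pipeline arr = do
  a1 <- computeP (stage arr)
  a2 <- computeP (stage a1)
  computeP (stage a2)

main :: IO ()
main = do
  let input = fromListUnboxed (Z :. 4) [8, 16, 24, 32] :: Array U DIM1 Double
  out <- pipeline input
  print (toList out)  -- [1.0,2.0,3.0,4.0]
```

Sequencing the computeP calls in one do-block keeps each parallel computation finished before the next one starts, which is the ordering the monadic return type is meant to encourage.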

【Comments】:

The great thing about the Haskell community: feedback from the library developers themselves :) I eagerly await your paper.