GHC 8.0.x 中的 Foldl 内存性能答案

【问题标题】：Foldl memory performance in GHC 8.0.xGHC 8.0.x 中的 Foldl 内存性能
【发布时间】：2017-03-12 12:05:35
【问题描述】：

我在检查我正在处理的一些代码的内存使用情况时遇到了一个奇怪的问题。

使用foldl 对一个非常大的列表的元素求和，我得到一个恒定的内存使用量。

使用foldl' 我也得到了恒定的内存使用量（如预期的那样）。

使用foldr 内存增长并使我的系统崩溃（没有我期望的堆栈溢出异常）。

触发它所需的最少代码是： main = print $ foldx (+) 0 [1..100000000000000000000]

其中 foldx 是 foldl、foldr 或 foldl'

我的印象（根据Foldr Foldl Foldl'）恰恰相反。

我使用上述代码设置了一个 repo： https://github.com/framp/hs-fold-perf-test

这里发生了什么？ GHC 8.0.x 是不是太聪明了？我在 macOS Sierra 上

谢谢

【问题讨论】：

在这种特定情况下，GHC 可能正在优化foldl，使其更接近foldl'。除此之外，其余的应该是预期的行为：foldl' 应该在恒定空间中运行，而foldr 应该分配大量的 thunk，因为(+) 是严格的。

标签： performance haskell fold

【解决方案1】：

`foldl` 和 `foldl'`

在这种情况下，GHC 发现foldl 可以变得严格，并从本质上重写它以利用foldl'。请看下文 GHC 如何优化 foldl 构造。

请注意，这仅适用于您使用优化-O 进行编译。如果没有优化，foldl 程序会消耗我所有的内存并崩溃。

查看ghc -O -fforce-recomp -ddump-simpl foldl.hs的输出我们可以看到GHC完全消除了使用的巨大列表并将表达式优化为尾递归函数：

Rec {
-- RHS size: {terms: 20, types: 5, coercions: 0, joins: 0/0}
Main.main_go [Occ=LoopBreaker] :: Integer -> Integer -> Integer
[GblId, Arity=2, Str=<S,U><S,1*U>]
Main.main_go
  = \ (x_a36m :: Integer) (eta_B1 :: Integer) ->
      case integer-gmp-1.0.0.1:GHC.Integer.Type.gtInteger#
             x_a36m lim_r4Yv
      of wild_a36n
      { __DEFAULT ->
      case GHC.Prim.tagToEnum# @ Bool wild_a36n of {
        False ->
          Main.main_go
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
               x_a36m 1)
            (integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger eta_B1 x_a36m);
        True -> eta_B1
      }
      }
end Rec }

这解释了为什么它以恒定的内存使用运行。

为什么`foldr` 需要这么多内存？

foldr 构建了很多 thunk，它们本质上是未完成的计算，最终将保持正确的值。本质上，当尝试评估 foldr 表达式时，会发生这种情况：

foldr (+) 0 [1..100]
== (+) 1 $ foldr 0 [2..100]
== (+) 1 $ (+) 2 $ foldr [3..100]
...
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ (+) 100 0 -- at this point there are 100
== (+) 1 $ (+) 2 $ .. $ (+) 99 $ 100       -- unevaluated computations, which
== (+) 1 $ (+) 2 $ .. $ (+) 199            -- take up a lot of memory
...
== (+) 1 $ 5049
== 5050

100000000000000000000 的限制对于 thunk 占用的空间比您的 RAM 更多并且您的程序崩溃而言太大了。

【讨论】：

感谢 ThreeFx（和 chi），这很有意义

foldl 和 foldl'

为什么foldr 需要这么多内存？

`foldl` 和 `foldl'`

为什么`foldr` 需要这么多内存？