Why does X.dot(X.T) require so much memory in numpy? (Answers)

【Question title】: Why does X.dot(X.T) require so much memory in numpy?
【Posted】: 2014-02-08 02:42:30
【Question description】:

X is an n x p matrix where p is much larger than n. Let's say n = 1000 and p = 500000. When I run:

import numpy as np

X = np.random.randn(1000, 500000)
S = X.dot(X.T)

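Even though the result is only 1000 x 1000, this operation ends up using a great deal of memory. Once the operation finishes, memory usage drops back down. Is there a way around this?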
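To put the sizes in the question in perspective, here is a rough back-of-the-envelope calculation. It assumes float64 data (what np.random.randn returns) and ignores any allocator or BLAS overhead; the variable names are illustrative only.

```python
import numpy as np

n, p = 1000, 500000
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

x_bytes = n * p * itemsize  # the input X
s_bytes = n * n * itemsize  # the result S = X.dot(X.T)

print(f"X: {x_bytes / 1e9:.1f} GB")                    # X: 4.0 GB
print(f"S: {s_bytes / 1e6:.1f} MB")                    # S: 8.0 MB
print(f"one extra copy of X: {x_bytes / 1e9:.1f} GB")  # one extra copy of X: 4.0 GB
```

So the result is negligible next to the input, and a single temporary copy of X would be enough to roughly double the peak memory use.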

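【Comments】:

Interesting. Apart from the space for X and S, this should be achievable in very small constant space: take a transposed view of X, allocate the space for S, and compute each element of S directly. From what I know of NumPy, something like that should fall out of this combination of operations automatically...
I'm not sure of the details, but numpy will try to use BLAS routines whenever possible, which I think would be sgemm for matrix multiplication. I'd bet that these highly optimized routines require contiguous data (possibly in Fortran order), so copies have to be made in your case.
numpy >= 1.8 should help with many cases like this.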

Tags: python numpy scipy linear-algebra

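【Solution 1】:

The problem isn't that X and X.T are views of the same memory per se, but rather that X.T is F-contiguous rather than C-contiguous. Of course, this must necessarily be true for at least one of the input arrays whenever you multiply an array by a view of its own transpose.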
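The contiguity claim can be checked directly from the array flags. The following is a minimal sketch on a small stand-in array (the shape is arbitrary and chosen only to keep it cheap):

```python
import numpy as np

X = np.random.randn(4, 6)  # small stand-in for the large matrix
XT = X.T                   # a view, no data is copied here

print(X.flags['C_CONTIGUOUS'], X.flags['F_CONTIGUOUS'])    # True False
print(XT.flags['C_CONTIGUOUS'], XT.flags['F_CONTIGUOUS'])  # False True
print(np.shares_memory(X, XT))                             # True
```

The transpose shares its buffer with X but reports the opposite memory order, which is the property that matters here.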

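In numpy versions before 1.8, np.dot will create a C-ordered copy of any F-ordered input array, not just ones that happen to be views onto the same block of memory.

For example: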

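# %memit is an IPython magic from the memory_profiler extension
# (%load_ext memory_profiler)
import numpy as np

X = np.random.randn(1000, 50000)
Y = np.random.randn(50000, 100)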

# X and Y are both C-order, no copy
%memit np.dot(X, Y)
# maximum of 1: 485.554688 MB per loop

# make X Fortran order and Y C-order, now the larger array (X) gets
# copied
X = np.asfortranarray(X)
%memit np.dot(X, Y)
# maximum of 1: 867.070312 MB per loop

# make X C-order and Y Fortran order, now the smaller array (Y) gets
# copied
X = np.ascontiguousarray(X)
Y = np.asfortranarray(Y)
%memit np.dot(X, Y)
# maximum of 1: 523.792969 MB per loop

# make both of them F-ordered, both get copied!
X = np.asfortranarray(X)
%memit np.dot(X, Y)
# maximum of 1: 905.093750 MB per loop

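What can you do if copying is a problem (e.g. when X is very large)?

The best option would probably be to upgrade to a newer version of numpy: as @perimosocordiae points out, this performance issue was addressed in a pull request (linked in the discussion below).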
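To get a rough idea of whether a given numpy installation still makes the extra copy, one option is to watch peak allocations with the standard-library tracemalloc module. This is only a sketch: it relies on numpy reporting its array allocations to tracemalloc (recent versions do), it does not see memory allocated inside the BLAS library itself, and the array size is a small stand-in chosen for speed.

```python
import tracemalloc
import numpy as np

X = np.random.randn(200, 5000)  # 8 MB stand-in for the large matrix

tracemalloc.start()
S = X.dot(X.T)                  # allocations made here are traced
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("numpy", np.__version__)
print(f"X: {X.nbytes / 1e6:.1f} MB, peak traced during dot: {peak / 1e6:.1f} MB")
```

If the traced peak is close to the size of the result alone, no temporary copy of X was reported; a peak on the order of X.nbytes would suggest that one was made.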

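If for whatever reason you can't upgrade numpy, there is also a trick that lets you perform a fast, BLAS-based dot product without forcing a copy, by calling the relevant BLAS function directly through scipy.linalg.blas (shamelessly stolen from this answer):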

import numpy as np
from scipy.linalg import blas

X = np.random.randn(1000, 50000)

%memit res1 = np.dot(X, X.T)
# maximum of 1: 845.367188 MB per loop

%memit res2 = blas.dgemm(alpha=1., a=X.T, b=X.T, trans_a=True)
# maximum of 1: 471.656250 MB per loop

print(np.allclose(res1, res2))
# True
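The dgemm call above can be wrapped in a small helper. This is a sketch only: gram_via_dgemm is a hypothetical name, not part of numpy or scipy, and it assumes a C-contiguous float64 input so that X.T is an F-contiguous view that the BLAS wrapper can use as-is.

```python
import numpy as np
from scipy.linalg import blas


def gram_via_dgemm(X):
    """Return X @ X.T via scipy's dgemm wrapper (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)  # copies only if the dtype differs
    if not X.flags['C_CONTIGUOUS']:
        raise ValueError("expected a C-contiguous array")
    # X.T is an F-contiguous view of X; trans_a=True makes dgemm
    # use (X.T).T = X as the left operand, so the product is X @ X.T.
    return blas.dgemm(alpha=1.0, a=X.T, b=X.T, trans_a=True)


X = np.random.randn(50, 2000)  # small stand-in for the large matrix
S = gram_via_dgemm(X)
print(S.shape)                     # (50, 50)
print(np.allclose(S, X.dot(X.T)))  # True
```

Rejecting non-contiguous input instead of converting it silently is a deliberate choice here: a hidden conversion would bring back the copy that the trick is meant to avoid.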

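【Discussion】:

It was mentioned above, but numpy >= 1.8 solves this for you: github.com/numpy/numpy/pull/2730
@perimosocordiae Yes, but some people may find upgrading awkward for whatever reason (broken dependencies etc.), so I thought I'd offer a solution that works with 1.7.1. I'll edit my answer to make this clearer.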