【Posted】: 2018-10-17 14:45:44
【Question】:
I set up two identical tests of broadcast matrix multiplication, one in MATLAB and one in Python. For Python I used NumPy; for MATLAB I used the mtimesx library, which uses BLAS.
MATLAB
close all; clear;
N = 1000 + 100; % a few initial runs to be trimmed off at the end
a = 100;
b = 30;
c = 40;
d = 50;
A = rand(b, c, a);
B = rand(c, d, a);
C = zeros(b, d, a);
times = zeros(1, N);
for ii = 1:N
    tic
    C = mtimesx(A,B);
    times(ii) = toc;
end
times = times(101:end) * 1e3;
plot(times);
grid on;
title(median(times));
Python
import timeit
import numpy as np
import matplotlib.pyplot as plt
N = 1000 + 100 # a few initial runs to be trimmed off at the end
a = 100
b = 30
c = 40
d = 50
A = np.arange(a * b * c).reshape([a, b, c])
B = np.arange(a * c * d).reshape([a, c, d])
C = np.empty(a * b * d).reshape([a, b, d])
times = np.empty(N)
for i in range(N):
    start = timeit.default_timer()
    C = A @ B
    times[i] = timeit.default_timer() - start
times = times[100:] * 1e3  # drop the first 100 warm-up runs, matching times(101:end) in MATLAB
plt.plot(times, linewidth=0.5)
plt.grid()
plt.title(np.median(times))
plt.show()
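- My NumPy is the default build installed from pip, which uses OpenBLAS.
- I am running on an Intel NUC i3.
The MATLAB code runs in about 1 ms, while the Python code takes about 5.8 ms. I can't figure out why, since both appear to use BLAS.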
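One detail worth checking in the Python test: np.arange with integer arguments returns integer arrays, whereas MATLAB's rand returns doubles. BLAS only provides floating-point and complex routines, so NumPy cannot hand an integer matmul to BLAS. The following is a minimal sketch, assuming the same shapes as in the question, that times both dtypes side by side; median_ms is a hypothetical helper introduced here for illustration, and the numbers it prints will depend on the machine and BLAS build.

```python
import timeit
import numpy as np

a, b, c, d = 100, 30, 40, 50

# Integer inputs, built the same way as in the question.
A_int = np.arange(a * b * c).reshape(a, b, c)
B_int = np.arange(a * c * d).reshape(a, c, d)

# Floating-point inputs, analogous to MATLAB's rand (float64).
A_flt = np.random.rand(a, b, c)
B_flt = np.random.rand(a, c, d)

def median_ms(x, y, repeats=50):
    """Median wall-clock time of x @ y in milliseconds (hypothetical helper)."""
    samples = []
    for _ in range(repeats):
        start = timeit.default_timer()
        x @ y
        samples.append(timeit.default_timer() - start)
    return np.median(samples) * 1e3

print(A_int.dtype, "->", median_ms(A_int, B_int), "ms")
print(A_flt.dtype, "->", median_ms(A_flt, B_flt), "ms")
```

If the float64 timing comes out much closer to the MATLAB figure, the gap is probably a dtype mismatch between the two tests rather than a difference between BLAS libraries.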
Edit
From Anaconda:
In [7]: np.__config__.show()
mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
blas_mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
blas_opt_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
lapack_mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
lapack_opt_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
From NumPy installed with pip:
In [2]: np.__config__.show()
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
Edit 2
I tried replacing C = A @ B with np.matmul(A, B, out=C) and got times about 2x worse, roughly 11 ms. This is really strange.
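A possible contributor to the out=C slowdown, offered as a guess and not a confirmed diagnosis: C is created with np.empty and no dtype, so it is float64, while A and B come from np.arange and are integers. Writing an integer product into a float64 buffer may add a conversion step. A minimal sketch, assuming the question's shapes, that shows the mismatch and allocates a buffer with a matching dtype (C_matched is a name introduced here for illustration):

```python
import numpy as np

a, b, c, d = 100, 30, 40, 50

A = np.arange(a * b * c).reshape(a, b, c)
B = np.arange(a * c * d).reshape(a, c, d)

# As in the question: np.empty without a dtype gives float64.
C = np.empty(a * b * d).reshape(a, b, d)
print(A.dtype, B.dtype, C.dtype)

# Output buffer whose dtype matches the result type of the inputs.
C_matched = np.empty((a, b, d), dtype=np.result_type(A, B))
np.matmul(A, B, out=C_matched)
```

Timing np.matmul with out=C against out=C_matched would show whether the dtype mismatch accounts for the extra cost on a given setup.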
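【Discussion】:
- @etmuse Thanks, I have already seen that. My point is that MATLAB (or mtimesx) and NumPy both use BLAS, so I don't see why there would be any difference.
- @galah92 Which BLAS? The top-voted answer in that post mentions that MATLAB uses Intel MKL, which is very fast (at least on Intel hardware). You can check what your NumPy distribution uses with np.show_config(); in my case it is OpenBLAS. The difference between the two is significant.
- By the way, this is why intel-numpy exists, as part of the Intel Distribution for Python.
- To reiterate the above: please show the output of np.show_config().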
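For reference, the check requested in the comments can be run as a short script. This is a minimal sketch; the exact layout of the output varies between NumPy versions and builds.

```python
import numpy as np

# Report the NumPy version, then the BLAS/LAPACK libraries it was built against.
print(np.__version__)
np.show_config()
```

In the output, look for entries naming mkl or openblas to see which BLAS the installed build links to.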
Tags: python matlab performance numpy