TensorFlow matrix multiplication with batching and broadcasting: answers

【Question title】: tensorflow matrix multiplication with batching and broadcasting
【Posted】: 2019-06-19 22:23:38
【Question】:

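I can use tf.matmul(A, B) for batched matrix multiplication when:

A.shape == (..., a, b) and
B.shape == (..., b, c),

where the leading ... dimensions are the same.

But I want an additional broadcast:

A.shape == (a, b, 2, d) and
B.shape == (a, 1, d, c)
result.shape == (a, b, 2, c)

I want the result to be a batch of a x b matrix multiplications, each between a (2, d) matrix and a (d, c) matrix.

How can I do this?

Test code: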

import tensorflow as tf
import numpy as np

a = 3
b = 4
c = 5
d = 6

x_shape = (a, b, 2, d)
y_shape = (a, d, c)
z_shape = (a, b, 2, c)

x = np.random.uniform(0, 1, x_shape)
y = np.random.uniform(0, 1, y_shape)
z = np.empty(z_shape)

with tf.Session() as sess:
    for i in range(b):
        x_now = x[:, i, :, :]
        z[:, i, :, :] = sess.run(
            tf.matmul(x_now, y)
        )

print(z)

【Comments】:

Do B and y have different shapes? I don't know tf, but in numpy A @ B works.
Yes. In numpy, x @ y[:, np.newaxis, :, :] works. It also works in TensorFlow. I don't know how efficient @ is in TensorFlow on a GPU.
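
The broadcasting described in the comment above can be checked in NumPy. This is a minimal sketch using the shapes from the question; the helper name broadcast_matmul is hypothetical and exists only for illustration. It compares the broadcast product against the per-slice loop from the question's test code.

```python
import numpy as np


def broadcast_matmul(x, y):
    """Multiply x of shape (a, b, 2, d) by y of shape (a, d, c).

    A length-1 axis is inserted into y so that its batch dimensions (a, 1)
    broadcast against the batch dimensions (a, b) of x.
    """
    return x @ y[:, np.newaxis, :, :]


a, b, c, d = 3, 4, 5, 6
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (a, b, 2, d))
y = rng.uniform(0, 1, (a, d, c))

z = broadcast_matmul(x, y)

# Reference: the per-slice loop over axis b, as in the question's test code.
z_loop = np.empty((a, b, 2, c))
for i in range(b):
    z_loop[:, i, :, :] = x[:, i, :, :] @ y

print(z.shape)
print(np.allclose(z, z_loop))
```

Both versions should agree; the broadcast form simply avoids the Python-level loop.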

Tags: python numpy tensorflow matrix-multiplication

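【Solution 1】:

tf.einsum, a generalized contraction between tensors of arbitrary dimension, is your friend in problems like this. See the TensorFlow documentation for tf.einsum.

There is a great tutorial on Stack Overflow: (Understanding NumPy's einsum).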


import tensorflow as tf
import numpy as np

a = 3
b = 4
c = 5
d = 6

x_shape = (a, b, 2, d)
y_shape = (a, d, c)
z_shape = (a, b, 2, c)

x = tf.constant(np.random.uniform(0, 1, x_shape))
y = tf.constant(np.random.uniform(0, 1, y_shape))
z = tf.constant(np.empty(z_shape))

v = tf.einsum('abzd,adc->abzc', x, y)
print(z.shape, v.shape)

with tf.Session() as sess:
  print(sess.run(v))


RESULT:

(3, 4, 2, 5) (3, 4, 2, 5)
[[[[ 1.8353901   1.29175219  1.49873967  1.78156638  0.79548786]
   [ 2.32836196  2.01395003  1.53038244  2.51846521  1.65700572]]

  [[ 1.76139921  1.78029925  1.22302866  2.18659201  1.51694413]
   [ 2.32021949  1.98895703  1.7098903   2.21515966  1.33412172]]

  [[ 2.13246675  1.63539287  1.64610271  2.16745158  1.02269943]
   [ 1.75559616  1.6715972   1.26049591  2.14399714  1.34957603]]

  [[ 1.80167636  1.91194534  1.3438773   1.9659323   1.25718317]
   [ 1.4379158   1.31033243  0.71024123  1.62527415  1.31030634]]]


 [[[ 2.04902039  1.59019464  1.32415689  1.59438659  2.02918951]
   [ 2.23684642  1.27256603  1.63474052  1.73646679  2.42958829]]
  ....
  ....
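
To make the subscript string 'abzd,adc->abzc' concrete, here is a small NumPy sketch; np.einsum accepts the same subscripts for this case. The explicit loops are only a reference for checking the contraction, not something to use in practice, and the variable names are chosen for illustration.

```python
import numpy as np

a, b, c, d = 3, 4, 5, 6
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (a, b, 2, d))
y = rng.uniform(0, 1, (a, d, c))

# 'abzd,adc->abzc': d appears in both inputs but not in the output, so it
# is summed over; a is a shared batch axis; b and z come only from x.
v = np.einsum('abzd,adc->abzc', x, y)

# The same contraction written out: one (2, d) x (d, c) product
# for every (a, b) index pair.
v_ref = np.empty((a, b, 2, c))
for i in range(a):
    for j in range(b):
        v_ref[i, j] = x[i, j] @ y[i]

print(v.shape)
print(np.allclose(v, v_ref))
```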

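【Comments】:

Thanks for the answer. If I run TensorFlow's einsum implementation on a GPU, how fast and memory-efficient is it? What about TensorFlow's @? I recall that numpy can use parallel MKL routines for dot and tensordot, but not for einsum.
Not sure about GPU. On CPU they are probably the same: stackoverflow.com/questions/43100679/…. Also, on TPU, in my personal experience einsum is faster (5%-10%).
The speed of einsum in tf depends on optimizations from the opt_einsum package: github.com/tensorflow/tensorflow/issues/16835. The @ operator uses a different code path: github.com/tensorflow/tensorflow/issues/1062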

【Solution 2】:

Only tf.reshape and tf.matmul are needed. No transpose is required.

import tensorflow as tf
import numpy as np

jit_scope = tf.contrib.compiler.jit.experimental_jit_scope

a = 3
b = 4
c = 5
d = 6

x_shape = (a, b, 2, d)
y_shape = (a, d, c)

x = tf.constant(np.random.uniform(0, 1, x_shape))
y = tf.constant(np.random.uniform(0, 1, y_shape))

x2 = tf.reshape(x, (a, b * 2, d))

with jit_scope():
    z = tf.reshape(tf.matmul(x2, y), (a, b, 2, c))
    z2 = x @ (y[:, np.newaxis, :, :])
    z3 = tf.einsum('abzd, adc -> abzc', x, y)

with tf.Session() as sess:
    z_, z2_, z3_ = sess.run([z, z2, z3])

assert np.allclose(z_, z2_)
assert np.allclose(z_, z3_)
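
The reason no transpose is needed can be seen in a NumPy sketch of the same reshape trick, assuming the shapes used above. The axes b and 2 are adjacent in x, so merging them into a single axis of length b * 2 only regroups rows without reordering any data.

```python
import numpy as np

a, b, c, d = 3, 4, 5, 6
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (a, b, 2, d))
y = rng.uniform(0, 1, (a, d, c))

# Merge the adjacent axes (b, 2) into one axis of length b * 2.
x2 = x.reshape(a, b * 2, d)

# Ordinary batched matmul over the leading axis a: (a, b*2, d) @ (a, d, c),
# then split the merged row axis back into (b, 2).
z = (x2 @ y).reshape(a, b, 2, c)

# Reference: the broadcast form with an explicit length-1 axis in y.
z_ref = x @ y[:, np.newaxis, :, :]

print(z.shape)
print(np.allclose(z, z_ref))
```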

【Comments】: