TensorFlow: one network, two GPUs? Answers

【Question title】: TensorFlow: one network, two GPUs?
【Posted】: 2016-03-06 02:07:38
【Question】:

I have a convolutional neural network with two different output streams:

                         input
                           |
                         (...) <-- several convolutional layers
                           |
                       _________
    (several layers)   |       |    (several layers)
    fully-connected    |       |    fully-connected
    output stream 1 -> |       | <- output stream 2

I want to compute stream 1 on /gpu:0 and stream 2 on /gpu:1. Unfortunately, I have not been able to set this up correctly.

This attempt:

...placeholders...
...conv layers...

with tf.device("/gpu:0"):
    ...stream 1 layers...
    nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    ...stream 2 layers...
    nn_out_2 = tf.matmul(...)

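runs very slowly (slower than training on a single GPU) and sometimes produces NaN values in the output. I thought this might be because the with statements were not being synchronized properly, so I added control_dependencies and explicitly placed the conv layers on /gpu:0: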

...placeholders...  # x -> input, y -> labels

with tf.device("/gpu:0"):
    with tf.control_dependencies([x, y]):
        ...conv layers...
        h_conv_flat = tf.reshape(h_conv_last, ...)

with tf.device("/gpu:0"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 1 layers...
        nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 2 layers...
        nn_out_2 = tf.matmul(...)

...but with this approach the network does not even run. No matter what I tried, it complained that the input was not initialized:

tensorflow.python.framework.errors.InvalidArgumentError:
    You must feed a value for placeholder tensor 'x'
    with dtype float
    [[Node: x = Placeholder[dtype=DT_FLOAT, shape=[],
    _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

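Without the with statements, the network trains only on /gpu:0 and runs fine: it learns something reasonable and produces no errors.

What am I doing wrong? Is TensorFlow unable to split different streams of layers in one network across different GPUs? Do I always have to split the complete network into separate towers?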

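【Comments】:

It could depend on many different factors. Are the GPUs the same? How big is your data?
Yes, the two GPUs are identical and sit on one card. It is a dual-GPU Tesla K80 card from NVIDIA. It has 24 GB of VRAM, and the data fits entirely into the VRAM of one GPU (12 GB).
Are you sure the bottleneck is the GPU's compute speed? The bottleneck is often the bandwidth to and from the GPU rather than the computation itself; if you send a large tensor to another GPU, that only makes things worse.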
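The bandwidth concern in the last comment can be checked with a back-of-the-envelope estimate before restructuring the graph. The sketch below is plain Python arithmetic; the batch size, feature-map shape, and the 10 GB/s sustained rate are all hypothetical values chosen for illustration, not measurements from the asker's K80, and per-transfer latency is ignored.

```python
def transfer_seconds(num_elements, bytes_per_element, bandwidth_gb_per_s):
    """Time to move a tensor between devices at a given sustained bandwidth."""
    total_bytes = num_elements * bytes_per_element
    return total_bytes / (bandwidth_gb_per_s * 1e9)


# Hypothetical numbers, chosen only to illustrate the arithmetic.
batch_size = 128
flat_features = 64 * 7 * 7   # e.g. 64 feature maps of 7x7 after the conv layers
bandwidth = 10.0             # GB/s, an assumed sustained inter-GPU rate

elements = batch_size * flat_features
forward = transfer_seconds(elements, 4, bandwidth)  # float32 activations
round_trip = 2 * forward                            # activations out, gradients back

print("elements per step: %d" % elements)
print("one-way transfer:  %.3f ms" % (forward * 1e3))
print("round trip:        %.3f ms" % (round_trip * 1e3))
```

If the round-trip figure is comparable to the time the second stream takes to compute on a single GPU, moving that stream to another device is unlikely to pay off.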

Tags: python machine-learning neural-network tensorflow

【Answer 1】:

There is an example of how to use multiple GPUs for one network: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py. Maybe you can adapt that code. You can also do something like this:

import tensorflow as tf

# Creates a graph.
# Device names as in the TensorFlow guide; on a two-GPU machine
# use '/gpu:0' and '/gpu:1' instead.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
# Sum the per-GPU results on the CPU ('total' avoids shadowing the builtin sum).
with tf.device('/cpu:0'):
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))

See: https://www.tensorflow.org/versions/r0.7/how_tos/using_gpu/index.html#using-multiple-gpus
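Applied to the layout in the question, the same idea might look like the sketch below. It is not the asker's network: the dense layers stand in for the conv trunk and the two streams, and the names `stream_devices` and `run_two_stream_demo` and all layer sizes are hypothetical. It assumes a TensorFlow release that ships `tf.compat.v1` (1.13 or later, as far as I know). `allow_soft_placement=True` lets TensorFlow fall back to another device when a requested GPU is not present. No `tf.control_dependencies` is used, because both streams take `h_flat` as a data input and so cannot run before it has been computed.

```python
try:
    import tensorflow as tf  # optional: only needed by run_two_stream_demo()
except ImportError:
    tf = None


def stream_devices(num_gpus, num_streams=2):
    """Return one device string per stream, cycling over the available GPUs."""
    if num_gpus < 1:
        return ["/cpu:0"] * num_streams
    return ["/gpu:%d" % (i % num_gpus) for i in range(num_streams)]


def run_two_stream_demo(num_gpus=2, batch=4, features=32, hidden=16):
    """Build a shared trunk with two output streams and run one forward pass."""
    if tf is None:
        raise RuntimeError("TensorFlow is required for this demo")
    tf1 = tf.compat.v1
    devices = stream_devices(num_gpus)

    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, features], name="x")

        # Stand-in for the shared conv layers (hypothetical sizes),
        # placed on the same device as stream 1, as in the question.
        with tf.device(devices[0]):
            w_shared = tf.Variable(
                tf1.random_normal([features, hidden], stddev=0.1))
            h_flat = tf.nn.relu(tf.matmul(x, w_shared))

        # One output stream per device; each consumes h_flat directly.
        outputs = []
        for dev in devices:
            with tf.device(dev):
                w = tf.Variable(tf1.random_normal([hidden, 10], stddev=0.1))
                outputs.append(tf.matmul(h_flat, w))

        init = tf1.global_variables_initializer()

    config = tf1.ConfigProto(allow_soft_placement=True,
                             log_device_placement=True)
    with tf1.Session(graph=graph, config=config) as sess:
        sess.run(init)
        feed = {x: [[0.0] * features] * batch}  # every placeholder is fed
        return sess.run(outputs, feed_dict=feed)
```

Calling `run_two_stream_demo(num_gpus=2)` returns two arrays, one per stream, and the placement log shows which device each op was assigned to. Whether the split is faster than a single GPU depends on how much data has to cross between devices at each step.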

Best regards

【Comments】: