A TensorFlow eager GPU error

【Question Title】: A tensorflow eager gpu error
【Posted】: 2018-04-02 18:18:47
【Question Description】:

I am working through the TensorFlow Eager Execution demo. When I run the "GPU usage" cell (see below), I get an error saying the variable is not placed on the GPU.

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()
A = tf.constant([[2.0, 0.0], [0.0, 3.0]])
if tf.test.is_gpu_available() > 0:
    with tf.device(tf.test.gpu_device_name()):
        print(tf.matmul(A, A))

The full error message:

Traceback (most recent call last):

File "", line 4, in print(tf.matmul(A, A))

File "c:\python\python35_64\lib\site-packages\tensorflow\python\ops\math_ops.py", line 2108, in matmul a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)

File "c:\python\python35_64\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4517, in mat_mul _six.raise_from(_core._status_to_exception(e.code, message), None)

File "", line 3, in raise_from

InvalidArgumentError: Tensors on conflicting devices: cannot compute MatMul as input #0 was expected to be on /job:localhost/replica:0/task:0/device:GPU:0 but is actually on /job:localhost/replica:0/task:0/device:CPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0) Tensors can be copied explicitly using .gpu() or .cpu(), or transparently copied by using tfe.enable_eager_execution(tfe.DEVICE_PLACEMENT_SILENT). Copying tensors between devices may slow down your model [Op:MatMul] name: MatMul/

Following that hint, I tried tfe.enable_eager_execution(tfe.DEVICE_PLACEMENT_SILENT), but it raised a different error (the value of tfe.DEVICE_PLACEMENT_SILENT is 2):

Traceback (most recent call last):

File "", line 1, in tfe.enable_eager_execution(tfe.DEVICE_PLACEMENT_SILENT)

File "c:\python\python35_64\lib\site-packages\tensorflow\python\framework\ops.py", line 5229, in enable_eager_execution "config must be a tf.ConfigProto, but got %s" % type(config))

TypeError: config must be a tf.ConfigProto, but got

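How can I fix this error? I also don't understand how "Tensors can be copied explicitly using .gpu() or .cpu()" works.

Thanks.

Thanks to @ash, the modified code works (the notebook must be restarted first).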

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution(device_policy=tfe.DEVICE_PLACEMENT_SILENT)
A = tf.constant([[2.0, 0.0], [0.0, 3.0]])
if tf.test.is_gpu_available() > 0:
    with tf.device(tf.test.gpu_device_name()):
        print(tf.matmul(A, A))

Alternatively (again, the notebook must be restarted first):

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
A = tf.constant([[2.0, 0.0], [0.0, 3.0]])
if tf.test.is_gpu_available() > 0:
    with tf.device(tf.test.gpu_device_name()):
        A = A.gpu()
        print(tf.matmul(A, A))

【Question Discussion】:

Tags: tensorflow

【Solution 1】:

The fix described in the error message needs a small tweak. Try:

tfe.enable_eager_execution(device_policy=tfe.DEVICE_PLACEMENT_SILENT)

instead (note the device_policy keyword argument).
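To illustrate why the positional call in the question fails, here is a minimal stand-in written in plain Python. It is not TensorFlow's code: it assumes the signature enable_eager_execution(config=None, device_policy=None), which is consistent with the TypeError quoted in the question, and it uses a hypothetical placeholder ConfigProto class. The constant is set to 2, the value reported in the question.

```python
class ConfigProto:
    """Hypothetical placeholder for tf.ConfigProto (not the real class)."""


DEVICE_PLACEMENT_SILENT = 2  # value reported in the question


def enable_eager_execution(config=None, device_policy=None):
    """Stand-in with an ASSUMED signature; it only mimics the type check."""
    if config is not None and not isinstance(config, ConfigProto):
        raise TypeError(
            "config must be a tf.ConfigProto, but got %s" % type(config))
    return {"config": config, "device_policy": device_policy}


# Positional call: the constant binds to `config`, the first parameter.
positional_error = None
try:
    enable_eager_execution(DEVICE_PLACEMENT_SILENT)
except TypeError as err:
    positional_error = str(err)

# Keyword call: the constant reaches `device_policy` as intended.
settings = enable_eager_execution(device_policy=DEVICE_PLACEMENT_SILENT)

print(positional_error)
print(settings)
```

Under that assumed signature, an integer passed positionally is checked as a config object and rejected, while the keyword form leaves config unset.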

The other suggestion is to use the .cpu() or .gpu() methods, for example:

A = A.gpu()
print(tf.matmul(A, A))

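This looks like a bug in the demo. What happens here is that the tensor A is placed in CPU memory, while we ask for the matrix multiplication to run on the GPU. The tensor A therefore has to be copied from CPU (aka "host") memory to GPU (aka "device") memory. This can be done explicitly, or the TensorFlow runtime can be told to copy tensors between devices silently when needed by passing the device_policy argument to enable_eager_execution().

Hope that helps.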
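The placement rule described above can be sketched with a toy model in plain Python. This is not how TensorFlow is implemented: ToyTensor, PlacementError, the matmul helper and its silent_copy flag are all hypothetical names invented for illustration, and "copying" here only changes a device label.

```python
class PlacementError(Exception):
    """Toy counterpart of the InvalidArgumentError in the question."""


class ToyTensor:
    """Hypothetical tensor that only remembers which device holds its data."""

    def __init__(self, values, device="CPU:0"):
        self.values = values
        self.device = device

    def gpu(self):
        # Explicit copy: return a new tensor labelled as living on the GPU.
        return ToyTensor(self.values, "GPU:0")

    def cpu(self):
        # Explicit copy back to host memory.
        return ToyTensor(self.values, "CPU:0")


def matmul(a, b, op_device, silent_copy=False):
    """Multiply two 2-D tensors 'on' op_device, enforcing the placement rule."""
    inputs = []
    for i, t in enumerate((a, b)):
        if t.device != op_device:
            if not silent_copy:
                raise PlacementError(
                    "input #%d expected on %s but is on %s"
                    % (i, op_device, t.device))
            # Silent policy: copy the input to the op's device without asking.
            t = ToyTensor(t.values, op_device)
        inputs.append(t)
    x, y = inputs[0].values, inputs[1].values
    product = [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
               for row in x]
    return ToyTensor(product, op_device)


A = ToyTensor([[2.0, 0.0], [0.0, 3.0]])  # created on the CPU by default

# Default policy: CPU inputs to a GPU op are rejected.
try:
    matmul(A, A, "GPU:0")
    failed = False
except PlacementError:
    failed = True

explicit = matmul(A.gpu(), A.gpu(), "GPU:0")      # explicit copies
silent = matmul(A, A, "GPU:0", silent_copy=True)  # runtime copies for us
```

Both working variants give the same product. The difference is only who performs the host-to-device copy: the caller, or the runtime acting on the configured policy.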

【Discussion】: