【Title】: Cannot assign a device for operation 'Variable_4/Adam_1'
【Posted】: 2019-09-17 20:55:37
【Question】:

I am trying to run the script "train.py" from a clone of the following GitHub repository:

https://github.com/xiaojunxu/dnn-binary-code-similarity

After installing everything the repository requires (requirements.txt), I ran "train.py" and got the following error, for which I cannot find a solution:

2019-09-17 20:43:51.186970: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
  Traceback (most recent call last):
    File "train.py", line 124, in <module>
      gnn.init(LOAD_PATH, LOG_PATH)
    File "/ws/Gemini/graphnnSiamese.py", line 120, in init
      sess.run(tf.global_variables_initializer())
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
      run_metadata_ptr)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
      feed_dict_tensor, options, run_metadata)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
      options, run_metadata)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
      raise type(e)(node_def, op, message)
  tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]

  Caused by op u'Variable_4/Adam_1', defined at:
    File "train.py", line 122, in <module>
      lr = LEARNING_RATE
    File "/ws/Gemini/graphnnSiamese.py", line 93, in __init__
      optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 353, in minimize
      name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 474, in apply_gradients
      self._create_slots([_get_variable_for(v) for v in var_list])
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 137, in _create_slots
      self._zeros_slot(v, "v", self._name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 796, in _zeros_slot
      named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
      colocate_with_primary=colocate_with_primary)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 148, in create_slot_with_initializer
      dtype)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 67, in _create_slot_var
      validate_shape=validate_shape)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
      use_resource=use_resource, constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 805, in _get_single_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 213, in __init__
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 309, in _init_from_args
      name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 133, in variable_op_v2
      shared_name=shared_name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 927, in _variable_v2
      shared_name=shared_name, name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
      op_def=op_def)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
      op_def=op_def)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
      self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

  InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]

One suggestion I found was to change the following setting to "0":

os.environ["CUDA_VISIBLE_DEVICES"]= "0"

But that did not work for me.

I would be grateful if someone could help me solve this problem. Thanks.

【Question discussion】:

    Tags: python-2.7 tensorflow


    【Solution 1】:

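    Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.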

    Did you install tensorflow or tensorflow-gpu? If you want to use the GPU, the latter is the one you want.
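
One way to answer that question from Python is sketched below. It assumes Python 3.8+ (for `importlib.metadata`), which is newer than the Python 2.7 setup in the traceback, and the helper names `installed_tensorflow_packages` and `built_with_cuda` are made up for this illustration. In TensorFlow 1.x the CPU-only and GPU builds shipped as separate pip packages, so seeing only `tensorflow` in the result would be consistent with the CPU-only device list in the error.

```python
import importlib.metadata as md  # standard library, Python 3.8+


def installed_tensorflow_packages(
    names=("tensorflow", "tensorflow-gpu", "tensorflow-cpu"),
):
    """Return {distribution name: version} for the listed packages that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = md.version(name)
        except md.PackageNotFoundError:
            pass  # this distribution is not installed in the current environment
    return found


def built_with_cuda():
    """Return True/False as reported by TensorFlow, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # False means the installed binary has no CUDA support compiled in,
    # so no GPU device can ever be registered, whatever the driver state.
    return bool(tf.test.is_built_with_cuda())
```

Calling `installed_tensorflow_packages()` and then `built_with_cuda()` in the same environment that runs train.py shows which wheel is actually being imported.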

    Run the following code to verify that a GPU is available:

    import tensorflow as tf
    tf.config.list_physical_devices('GPU')
    

    If a GPU is available, the command above returns output similar to mine, shown below:

    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
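
To my knowledge `tf.config.list_physical_devices` is only present in TensorFlow 2.1 and later, and the version used in the question is not stated. The sketch below is a version-tolerant variant under that assumption: it falls back to `device_lib.list_local_devices()` on older releases. The function name `list_gpus` is hypothetical.

```python
def list_gpus():
    """Return the names of visible GPU devices, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        # Newer releases: query physical devices without creating a session.
        return [d.name for d in config.list_physical_devices("GPU")]

    # Older releases: enumerate local devices and keep only the GPUs.
    # Note that this call initializes the devices it finds.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
```

An empty list means TensorFlow was imported but sees no GPU, which matches the situation described by the error message.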
    

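    It could also be a version compatibility problem. First, check that your NVIDIA driver is installed by running nvidia-smi; you should get something like this: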

    Wed Jun 10 15:13:03 2020       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   54C    P0    36W / 250W |   1573MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+
    

    After that, check which CUDA version you have with nvcc --version. Example:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Sun_Jul_28_19:07:16_PDT_2019
    Cuda compilation tools, release 10.1, V10.1.243
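
The two command-line checks above can also be scripted. This is a minimal sketch using only the standard library (Python 3.7+); the helper names `run_tool` and `parse_nvcc_release` are invented for the example, and the regular expression assumes the "release X.Y" wording seen in the sample output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the CUDA release (such as '10.1') from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def run_tool(*cmd):
    """Run a command and return its stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None  # executable is not on PATH
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return done.stdout if done.returncode == 0 else None


if __name__ == "__main__":
    # None here suggests the driver utilities are not installed or not on PATH.
    print("nvidia-smi found:", run_tool("nvidia-smi") is not None)
    print("CUDA release:", parse_nvcc_release(run_tool("nvcc", "--version") or ""))
```

A result of `None` for either check points at the driver or toolkit installation rather than at TensorFlow itself.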
    

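    Finally, check that the Python, TensorFlow, and CUDA versions you installed are compatible with each other. For that, using this as a reference seems to work for most people.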

    Don't forget to reboot after installing the driver!
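
If the machine has no usable GPU and running on the CPU is acceptable, soft device placement is one possible workaround: it lets TensorFlow pick an available device when the requested one does not exist. The sketch below only builds the session config; where the session is created in graphnnSiamese.py is not shown in the question, so wiring it in is left as an assumption, and the function name `soft_placement_config` is hypothetical.

```python
def soft_placement_config():
    """Build a session config with soft placement enabled.

    Returns None if TensorFlow is not importable.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.ConfigProto exists in 1.x; 2.x exposes it as tf.compat.v1.ConfigProto.
    proto_cls = getattr(tf, "ConfigProto", None) or tf.compat.v1.ConfigProto
    # With soft placement, an op pinned to a missing device (here /device:GPU:0)
    # is placed on an existing device instead of raising InvalidArgumentError.
    return proto_cls(allow_soft_placement=True)
```

In TensorFlow 1.x code the result would typically be passed when the session is created, for example `tf.Session(config=soft_placement_config())`. Training then runs on the CPU, which can be much slower than on a GPU.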

    【Comments】:

    • @Paria, if you are satisfied with the answer, please accept it and upvote it. Thanks!