【Question title】: Cannot start Tensorboard when tensorflow is running
【Posted】: 2018-06-02 13:31:23
【Question】:

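I cannot start a tensorboard instance while tensorflow is already running and using the GPU. The error is shown below. Apparently, Tensorflow claims all GPU memory at startup, regardless of how much it actually needs. Is there a way to start tensorboard while a tensorflow process is running, or do I always have to start tensorboard first?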

totalMemory: 5,93GiB freeMemory: 41,56MiB
2018-06-02 15:28:11.053634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-02 15:28:11.321850: E tensorflow/core/common_runtime/direct_session.cc:154] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Traceback (most recent call last):
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/bin/tensorboard", line 11, in <module>
    sys.exit(run_main())
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/main.py", line 36, in run_main
    tf.app.run(main)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/main.py", line 45, in main
    default.get_assets_zip_provider())
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/program.py", line 166, in main
    tb = create_tb_app(plugins, assets_zip_provider)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/program.py", line 201, in create_tb_app
    flags=FLAGS)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 126, in standard_tensorboard_wsgi
    plugin_instances = [constructor(context) for constructor in plugins]
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 126, in <listcomp>
    plugin_instances = [constructor(context) for constructor in plugins]
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/beholder_plugin.py", line 47, in __init__
    self.most_recent_frame = im_util.get_image_relative_to_script('no-data.png')
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 254, in get_image_relative_to_script
    return read_image(filename)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 242, in read_image
    return np.array(decode_png(image_file.read()))
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 159, in __call__
    self._lazily_initialize()
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 137, in _lazily_initialize
    self._session = tf.Session(graph=graph, config=config)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1560, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 633, in __init__
    self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

【Comments】:

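  • Tensorboard does not need a GPU. Its main job is to parse the event pb files and display them in the browser. There is no heavy computation involved, so why would it need a GPU? How did you install tensorflow?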

Tags: python tensorflow tensorboard


【Answer 1】:

Tensorboard 1.7.0 appears to take up about 150MB on the GPU. See this open Tensorboard issue. It looks like it is being worked on.

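As a temporary workaround, you can limit the fraction of GPU memory that Tensorflow is allowed to preallocate per process; details are in this answer. That way, you keep a share of the GPU's memory free for other tasks you may want to run during training.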

import tensorflow as tf

# Cap this process at 80% of GPU memory so other tasks can use the rest.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
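If the training process cannot be restarted with a lower memory fraction, another option that may help is to hide the GPU from the Tensorboard process, since Tensorboard itself does not need one. The sketch below is not part of the original answer: it assumes a CUDA build of Tensorflow, where an empty `CUDA_VISIBLE_DEVICES` variable hides all GPUs from the child process. The helper names (`cpu_only_env`, `tensorboard_command`, `launch_tensorboard`) and the log directory are hypothetical.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: an empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA
    # in the process that receives this environment.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def tensorboard_command(logdir):
    """Build the argument list for a Tensorboard launch (logdir is a placeholder)."""
    return ["tensorboard", "--logdir", logdir]


def launch_tensorboard(logdir):
    """Start Tensorboard as a child process that cannot see the GPU."""
    return subprocess.Popen(tensorboard_command(logdir), env=cpu_only_env())
```

Calling `launch_tensorboard("./logs")` should be roughly equivalent to running `CUDA_VISIBLE_DEVICES="" tensorboard --logdir ./logs` in a shell; the training process keeps its own environment and is unaffected.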

【Discussion】:
