【Question title】: Running a queue in the background in TensorFlow causes strange exceptions
【Posted】: 2016-06-13 19:32:19
【Question】:

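I am implementing the following graph in TensorFlow: there is a queue Q, and a background thread enqueues tensors into it. In the main thread, I dequeue elements from Q one at a time.

My code can be simplified to the following: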

import time
import threading
import tensorflow as tf

sess = tf.InteractiveSession()
coord = tf.train.Coordinator()

q = tf.FIFOQueue(32, dtypes=tf.int32)

def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")

        for i in range(20):
            if coord.should_stop():
                return

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]

sess.run(tf.initialize_all_variables())

for t in threads: t.start()

# If I sleep for 1 second here, it works fine!
# time.sleep(1)

print(sess.run(q.dequeue()))

coord.request_stop()
coord.join(threads)

sess.close()

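As the comment in the code says, if I sleep for 1 second before running the dequeue op, everything works. If I run it immediately, however, the following exception is raised: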

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

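Can anyone help? Thanks a lot!

Update

I am using TensorFlow 0.9.0rc0.

My real use case is a bit more complicated. The tensor being enqueued is different on every iteration, for example: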

def loop(g):
    with g.as_default():
        for i in range(20):
            if coord.should_stop():
                return

            # Look here!
            enqueue_op = q.enqueue(i, name="example_enqueue")

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

So moving the enqueue op to the main thread is not straightforward :( I don't know how to do it. Please help :)

【Comments】:

    Tags: machine-learning tensorflow


    【Answer 1】:

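    This is an issue with old (pre-0.9) versions of TensorFlow that was fixed in the 0.9 release. The problem is that adding nodes to the graph (i.e. in your calls to q.dequeue() and q.enqueue()) is not thread-safe while other threads (i.e. your loop() thread) are using the graph.

    You need to fix two things to avoid the race condition (in pre-0.9 versions):

    1. Don't call q.enqueue() in the loop() thread. Instead, create the op in the main thread. For example: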

      q = tf.FIFOQueue(32, dtypes=tf.int32)
      enqueue_op = q.enqueue(1, name="example_enqueue")
      
      def loop(g):
          for i in range(20):
              if coord.should_stop():
                  return
              try:
                  sess.run(enqueue_op)
              except tf.errors.CancelledError:
                  print("enqueue cancelled")
      
    2. Move the call to q.dequeue() (which adds a node to the graph) to before the loop() thread is started:

      dequeued_t = q.dequeue()
      
      for t in threads: t.start()
      
      print(sess.run(dequeued_t))
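
    One way to catch this kind of mistake early is to freeze the graph once every op has been created, so that a stray q.enqueue() or q.dequeue() call in a worker thread fails loudly instead of racing. The sketch below is not part of the answer above; it uses Graph.finalize(), and it assumes a TensorFlow install that ships the tensorflow.compat.v1 module (on a 0.9-era install the plain `import tensorflow as tf` from the question would be used instead). The names bad_loop and errors are made up for the illustration.

```python
import threading

# Assumption: a TensorFlow version that provides the compat.v1 module.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Build every op up front, in the main thread.
    q = tf.FIFOQueue(32, dtypes=tf.int32)
    enqueue_op = q.enqueue(1, name="example_enqueue")
    dequeued_t = q.dequeue()

# Freeze the graph: any later attempt to add a node raises RuntimeError.
graph.finalize()

errors = []  # hypothetical holder for errors seen in the worker thread


def bad_loop():
    # Mimics the question's loop(): it tries to create an op inside the thread.
    with graph.as_default():
        try:
            q.enqueue(2)  # would add new nodes to the finalized graph
        except RuntimeError as e:
            errors.append(e)


t = threading.Thread(target=bad_loop)
t.start()
t.join()

# Ops created before finalize() still run normally.
with tf.Session(graph=graph) as sess:
    sess.run(enqueue_op)
    first = sess.run(dequeued_t)

print(len(errors), first)
```

    With the graph finalized, the op construction inside the thread is rejected, while the ops built beforehand remain usable from any thread.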
      

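    【Discussion】:

    • Isn't it also not thread-safe for concurrent writes? q.dequeue() and q.enqueue() run in parallel and both modify the graph.
    • Ah, good point, it is more broken than I thought. I have updated the answer to suggest two fixes.
    • Hi! Thanks! In fact, I am using 0.9.0rc0. After applying your second suggestion, the problem did go away! However, in my real use case I don't seem to be able to apply your first modification. Please see my update to the question :)
    • For your updated question, you could create a single enqueue op that takes a tf.placeholder() as input, and feed it a different value each time. Does that make sense?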
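
    The placeholder suggestion in the last comment could look roughly like the sketch below. It is an illustration rather than the commenter's own code: the placeholder name value_ph, the constant NUM_ITEMS, and the results list are made up here, and the import assumes a TensorFlow install that provides tensorflow.compat.v1 (with the 0.9-era API from the question, `import tensorflow as tf` would be used and the disable_eager_execution() call dropped). All ops, including the dequeue, are created in the main thread before the worker starts; the worker only calls sess.run().

```python
import threading

# Assumption: a TensorFlow version that provides the compat.v1 module.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

NUM_ITEMS = 20  # hypothetical count, matching range(20) in the question

# Build the whole graph in the main thread, before any thread starts.
q = tf.FIFOQueue(32, dtypes=tf.int32)
value_ph = tf.placeholder(tf.int32, shape=[])  # fed with a new value per run
enqueue_op = q.enqueue(value_ph, name="example_enqueue")
dequeued_t = q.dequeue()

sess = tf.Session()
coord = tf.train.Coordinator()


def loop():
    # The thread never creates ops; it only runs the prebuilt enqueue op.
    for i in range(NUM_ITEMS):
        if coord.should_stop():
            return
        try:
            sess.run(enqueue_op, feed_dict={value_ph: i})
        except tf.errors.CancelledError:
            print("enqueue cancelled")
            return


thread = threading.Thread(target=loop)
thread.start()

# Dequeue in the main thread; each run blocks until an element is available.
results = [sess.run(dequeued_t) for _ in range(NUM_ITEMS)]

coord.request_stop()
coord.join([thread])
sess.close()

print(results)
```

    Because a single thread enqueues into a FIFO queue, the dequeued values should come back in the order they were fed. The queue capacity of 32 is larger than the number of items, so the enqueue never blocks in this sketch.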