【Question title】: Why does a Keras Lambda layer cause a problem in Mask_RCNN?
【Posted】: 2021-03-12 08:48:41
【Question】:

I am using the Mask_RCNN package from this repo: https://github.com/matterport/Mask_RCNN

I tried to train on my own dataset with this package, but it fails with an error right at startup.

2020-11-30 12:13:16.577252: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-30 12:13:16.587017: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-11-30 12:13:16.587075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (7612ade969e5): /proc/driver/nvidia/version does not exist
2020-11-30 12:13:16.587479: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-30 12:13:16.593569: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
2020-11-30 12:13:16.593811: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1b2aa00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-30 12:13:16.593846: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "machines.py", line 345, in <module>
    model_dir=args.logs)
  File "/content/Mask_RCNN/mrcnn/model.py", line 1837, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/content/Mask_RCNN/mrcnn/model.py", line 1934, in build
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 926, in __call__
    input_list)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1117, in _functional_construction_call
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 904, in call
    self._check_variables(created_variables, tape.watched_variables())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 931, in _check_variables
    raise ValueError(error_str)
ValueError: 
The following Variables were created within a Lambda layer (anchors)
but are not tracked by said layer:
  <tf.Variable 'anchors/Variable:0' shape=(1, 261888, 4) dtype=float32>
The layer cannot safely ensure proper Variable reuse across multiple
calls, and consquently this behavior is disallowed for safety. Lambda
layers are not well suited to stateful computation; instead, writing a
subclassed Layer is the recommend way to define layers with
Variables.

I tracked down the code that causes the problem (in the repo, file: /mrcnn/model.py, line: 1935):

    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)

If anyone knows how to fix this, or has already fixed it, please share the solution.

【Comments】:

    Tags: tensorflow machine-learning keras keras-layer


    【Solution 1】:

    Open mrcnn/model.py and add:

    class AnchorsLayer(KL.Layer):
        def __init__(self, anchors, name="anchors", **kwargs):
            super(AnchorsLayer, self).__init__(name=name, **kwargs)
            self.anchors = tf.Variable(anchors)
    
        def call(self, dummy):
            return self.anchors
    
        def get_config(self):
            config = super(AnchorsLayer, self).get_config()
            return config
    

    Then find the line:

    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
    

    and replace it with:

    anchors = AnchorsLayer(anchors, name="anchors")(input_image)
    

    Works like a charm in TF 2.4!
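As a quick sanity check of the approach above, the following minimal sketch calls the layer eagerly outside of Mask_RCNN. It is only an illustration under a few assumptions: it uses `tf.keras.layers.Layer` instead of the repo's `KL` alias, `toy_anchors` and its shape `(1, 6, 4)` are made-up stand-ins for the real anchors, and the dummy input shape is arbitrary.

```python
import numpy as np
import tensorflow as tf


class AnchorsLayer(tf.keras.layers.Layer):
    """Same idea as the answer's layer: hold the anchors as layer state."""

    def __init__(self, anchors, name="anchors", **kwargs):
        super().__init__(name=name, **kwargs)
        # The variable is created once, as an attribute of the layer,
        # instead of being created inside a Lambda function.
        self.anchors = tf.Variable(anchors)

    def call(self, dummy):
        # The input is ignored; the layer always returns the stored anchors.
        return self.anchors


# Hypothetical toy anchors (the real ones come from the Mask_RCNN config).
toy_anchors = np.zeros((1, 6, 4), dtype=np.float32)

layer = AnchorsLayer(toy_anchors)
out = layer(tf.zeros((1, 8, 8, 3)))  # arbitrary dummy image batch
out_shape = tuple(out.shape)
print(out_shape)
```

Whatever input is passed, the output has the shape of the stored anchors, which is what the replaced Lambda call was meant to produce.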

    【Comments】:

      【Solution 2】:

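      Root cause: The behavior of the Keras Lambda layer changed between TensorFlow 1.X and TensorFlow 2.X. In TensorFlow 1.X Keras, every variable created with tf.Variable or tf.get_variable was automatically tracked in layer.weights through the variable creator context, so it automatically received gradients and was trainable. That approach caused problems for autograph compilation, which converts Python code into an execution graph in TensorFlow 2.X, so it was removed. The Lambda layer now contains code that checks for variable creation and raises an error, as you have seen. In short, a Lambda layer in TensorFlow 2.X must be stateless. If you want to create variables, the correct approach in TensorFlow 2.X is to subclass the Layer class and add the trainable weights as class members.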
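The rule described above can be sketched in a few lines: stateless computation stays in a Lambda layer, while anything that owns a variable goes into a subclassed Layer. This is an illustrative example only, not code from Mask_RCNN; `ScaleLayer` and the `double` Lambda are hypothetical names invented for the sketch.

```python
import tensorflow as tf


class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical stateful layer that owns one trainable weight per feature."""

    def build(self, input_shape):
        # add_weight registers the variable with the layer, so it shows up
        # in layer.weights and can receive gradients during training.
        self.scale = self.add_weight(
            name="scale",
            shape=(input_shape[-1],),
            initializer="ones",
            trainable=True,
        )

    def call(self, inputs):
        return inputs * self.scale


# Stateless work is still fine in a Lambda layer: no variable is created here.
double = tf.keras.layers.Lambda(lambda x: x * 2.0)

x = tf.ones((2, 3))
scale_layer = ScaleLayer()
y = scale_layer(double(x))

print(tuple(y.shape), len(scale_layer.trainable_weights))
```

The weight lives on `scale_layer` and is reused on every call, which is exactly the guarantee a Lambda layer cannot give for variables created inside its function.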

      Solution: there are two options.

      1. Switch to TensorFlow 1.X, which does not raise this error.

      2. Replace the Lambda layer with a subclass of the Keras Layer class:

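      import tensorflow as tf

      class AnchorsLayer(tf.keras.layers.Layer):

          def __init__(self, anchors, **kwargs):
              super(AnchorsLayer, self).__init__(**kwargs)
              # Create the variable once, as an attribute of the layer.
              self.anchors_v = tf.Variable(anchors)

          def call(self, inputs):
              # The input is ignored; Keras expects the layer to be called on one.
              return self.anchors_v

      # Then replace the Lambda call with this:

      anchors_layer = AnchorsLayer(anchors, name="anchors")
      anchors = anchors_layer(input_image)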
      

      【Comments】:
