【Question title】: TensorFlow Keras CuDNNGRU to GRU conversion
【Posted】: 2019-11-11 19:25:44
【Question description】:

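I have a trained model that was built in TensorFlow 1.14 with the (now deprecated) tf.keras.layers.CuDNNGRU layer (available under tf.compat.v1 in TensorFlow 2.0). I am trying to port the old layer's weights to a new TensorFlow 2.0 model built with tf.keras.layers.GRU, so that the two models are equivalent.

One motivation is to be able to run inference on the CPU (the tf.compat.v1.keras.layers.CuDNNGRU layer runs only on the GPU). Another is to future-proof the model.

Question

How do I convert a trained tf.compat.v1.keras.layers.CuDNNGRU layer into an equivalent tf.keras.layers.GRU layer?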

【Comments】:

    Tags: python tensorflow machine-learning deep-learning tf.keras


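    【Solution 1】:

    The following private helper function from tensorflow.python.keras.saving.hdf5_format seems to do the trick. It performs the more general task of converting weights between the CuDNNGRU/GRU and CuDNNLSTM/LSTM formats, so it is useful beyond my use case. The function appears to originate from this pull request in standalone Keras.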

    import numpy as np
    
    
    def _convert_rnn_weights(layer, weights):
      """Converts weights for RNN layers between native and CuDNN format.
    
      Input kernels for each gate are transposed and converted between Fortran
      and C layout, recurrent kernels are transposed. For LSTM biases are summed/
      split in half, for GRU biases are reshaped.
    
      Weights can be converted in both directions between `LSTM` and `CuDNNLSTM`
      and between `CuDNNGRU` and `GRU(reset_after=True)`. Default `GRU` is not
      compatible with `CuDNNGRU`.
    
      For missing biases in `LSTM`/`GRU` (`use_bias=False`) no conversion is made.
    
      Arguments:
          layer: Target layer instance.
          weights: List of source weights values (input kernels, recurrent
              kernels, [biases]) (Numpy arrays).
    
      Returns:
          A list of converted weights values (Numpy arrays).
    
      Raises:
          ValueError: for incompatible GRU layer/weights or incompatible biases
      """
    
    
      def transform_kernels(kernels, func, n_gates):
        """Transforms kernel for each gate separately using given function.
    
        Arguments:
            kernels: Stacked array of kernels for individual gates.
            func: Function applied to kernel of each gate.
            n_gates: Number of gates (4 for LSTM, 3 for GRU).
    
        Returns:
            Stacked array of transformed kernels.
        """
        return np.hstack([func(k) for k in np.hsplit(kernels, n_gates)])
    
    
      def transpose_input(from_cudnn):
        """Makes a function that transforms input kernels from/to CuDNN format.
    
        It keeps the shape, but changes between the layout (Fortran/C). Eg.:
    
        ```
        Keras                 CuDNN
        [[0, 1, 2],  <--->  [[0, 2, 4],
         [3, 4, 5]]          [1, 3, 5]]
        ```
    
        It can be passed to `transform_kernels()`.
    
        Arguments:
            from_cudnn: `True` if source weights are in CuDNN format, `False`
                if they're in plain Keras format.
    
        Returns:
            Function that converts input kernel to the other format.
        """
        order = 'F' if from_cudnn else 'C'
    
    
        def transform(kernel):
          return kernel.T.reshape(kernel.shape, order=order)
    
    
        return transform
    
    
      target_class = layer.__class__.__name__
    
    
      # convert the weights between CuDNNLSTM and LSTM
      if target_class in ['LSTM', 'CuDNNLSTM'] and len(weights) == 3:
        # determine if we're loading a CuDNNLSTM layer
        # from the number of bias weights:
        # CuDNNLSTM has (units * 8) weights; while LSTM has (units * 4)
        # if there's no bias weight in the file, skip this conversion
        units = weights[1].shape[0]
        bias_shape = weights[2].shape
        n_gates = 4
    
    
        if bias_shape == (2 * units * n_gates,):
          source = 'CuDNNLSTM'
        elif bias_shape == (units * n_gates,):
          source = 'LSTM'
        else:
          raise ValueError('Invalid bias shape: ' + str(bias_shape))
    
    
        def convert_lstm_weights(weights, from_cudnn=True):
          """Converts the weights between CuDNNLSTM and LSTM.
    
          Arguments:
            weights: Original weights.
            from_cudnn: Indicates whether original weights are from CuDNN layer.
    
          Returns:
            Updated weights compatible with LSTM.
          """
    
    
          # Transpose (and reshape) input and recurrent kernels
          kernels = transform_kernels(weights[0], transpose_input(from_cudnn),
                                      n_gates)
          recurrent_kernels = transform_kernels(weights[1], lambda k: k.T, n_gates)
          if from_cudnn:
            # merge input and recurrent biases into a single set
            biases = np.sum(np.split(weights[2], 2, axis=0), axis=0)
          else:
            # Split single set of biases evenly to two sets. The way of
            # splitting doesn't matter as long as the two sets sum is kept.
            biases = np.tile(0.5 * weights[2], 2)
          return [kernels, recurrent_kernels, biases]
    
    
        if source != target_class:
          weights = convert_lstm_weights(weights, from_cudnn=source == 'CuDNNLSTM')
    
    
      # convert the weights between CuDNNGRU and GRU(reset_after=True)
      if target_class in ['GRU', 'CuDNNGRU'] and len(weights) == 3:
        # We can determine the source of the weights from the shape of the bias.
        # If there is no bias we skip the conversion since
        # CuDNNGRU always has biases.
    
    
        units = weights[1].shape[0]
        bias_shape = weights[2].shape
        n_gates = 3
    
    
        def convert_gru_weights(weights, from_cudnn=True):
          """Converts the weights between CuDNNGRU and GRU.
    
          Arguments:
            weights: Original weights.
            from_cudnn: Indicates whether original weights are from CuDNN layer.
    
          Returns:
            Updated weights compatible with GRU.
          """
    
    
          kernels = transform_kernels(weights[0], transpose_input(from_cudnn),
                                      n_gates)
          recurrent_kernels = transform_kernels(weights[1], lambda k: k.T, n_gates)
          biases = np.array(weights[2]).reshape((2, -1) if from_cudnn else -1)
          return [kernels, recurrent_kernels, biases]
    
    
        if bias_shape == (2 * units * n_gates,):
          source = 'CuDNNGRU'
        elif bias_shape == (2, units * n_gates):
          source = 'GRU(reset_after=True)'
        elif bias_shape == (units * n_gates,):
          source = 'GRU(reset_after=False)'
        else:
          raise ValueError('Invalid bias shape: ' + str(bias_shape))
    
    
        if target_class == 'CuDNNGRU':
          target = 'CuDNNGRU'
        elif layer.reset_after:
          target = 'GRU(reset_after=True)'
        else:
          target = 'GRU(reset_after=False)'
    
    
        # only convert between different types
        if source != target:
          types = (source, target)
          if 'GRU(reset_after=False)' in types:
            raise ValueError('%s is not compatible with %s' % types)
          if source == 'CuDNNGRU':
            weights = convert_gru_weights(weights, from_cudnn=True)
          elif source == 'GRU(reset_after=True)':
            weights = convert_gru_weights(weights, from_cudnn=False)
    
    
      return weights
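
The layout swap described in the `transpose_input` docstring can be checked in isolation with plain NumPy. The sketch below copies the two inner helpers out of the function above and runs them on the 2x3 matrices from the docstring diagram, then on a toy GRU-sized kernel. The toy dimensions (input_dim=2, units=3) are assumptions chosen only for illustration.

```python
import numpy as np


def transpose_input(from_cudnn):
    """Same layout swap as in the helper above: keep the shape, change C/Fortran order."""
    order = 'F' if from_cudnn else 'C'

    def transform(kernel):
        return kernel.T.reshape(kernel.shape, order=order)

    return transform


def transform_kernels(kernels, func, n_gates):
    """Apply func to each gate's block of columns, then stack the blocks again."""
    return np.hstack([func(k) for k in np.hsplit(kernels, n_gates)])


# The two 2x3 matrices from the docstring diagram.
a = np.array([[0, 1, 2], [3, 4, 5]])
b = np.array([[0, 2, 4], [1, 3, 5]])

a_to_b = transpose_input(True)(a)    # uses order='F'
b_to_a = transpose_input(False)(b)   # uses order='C'

# Toy GRU input kernel (assumed sizes): input_dim=2, units=3, n_gates=3 -> shape (2, 9).
kernel = np.arange(18).reshape(2, 9)
there = transform_kernels(kernel, transpose_input(True), n_gates=3)
back = transform_kernels(there, transpose_input(False), n_gates=3)

print(a_to_b)
print(np.array_equal(back, kernel))
```

The two directions are inverses of each other, which is why the same helper can convert weights both to and from the CuDNN format without changing any shapes.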
    

    For my use case (loading CuDNNGRU weights into a GRU), a solution using this function looks like this:

    # cudnn_gru and gru are built CuDNNGRU and GRU layers, respectively
    kernel, recurrent_kernel, bias = _convert_rnn_weights(
        layer=gru,
        weights=[
            cudnn_gru.kernel.numpy(),
            cudnn_gru.recurrent_kernel.numpy(),
            cudnn_gru.bias.numpy(),
        ],
    )
    gru.cell.kernel.assign(kernel)
    gru.cell.recurrent_kernel.assign(recurrent_kernel)
    gru.cell.bias.assign(bias)
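
The bias is the one weight whose shape changes in this conversion, which is also how the helper tells the source formats apart. Below is a minimal NumPy sketch of that part only. `detect_gru_source` is a hypothetical name that mirrors the shape checks in the helper, and the array is a stand-in for `cudnn_gru.bias.numpy()` with an assumed `units=4`.

```python
import numpy as np

units, n_gates = 4, 3  # assumed toy size; a GRU has 3 gates

# Stand-in for cudnn_gru.bias.numpy(): one flat vector of length 2 * units * n_gates.
cudnn_bias = np.arange(2 * units * n_gates, dtype=np.float32)


def detect_gru_source(bias_shape, units, n_gates=3):
    """Hypothetical helper mirroring the bias-shape checks in _convert_rnn_weights."""
    if bias_shape == (2 * units * n_gates,):
        return 'CuDNNGRU'
    if bias_shape == (2, units * n_gates):
        return 'GRU(reset_after=True)'
    if bias_shape == (units * n_gates,):
        return 'GRU(reset_after=False)'
    raise ValueError('Invalid bias shape: ' + str(bias_shape))


# CuDNNGRU -> GRU(reset_after=True): the reshape puts the first half of the
# flat vector in row 0 and the second half in row 1.
gru_bias = cudnn_bias.reshape((2, -1))

# GRU(reset_after=True) -> CuDNNGRU: flatten the two rows back into one vector.
roundtrip = gru_bias.reshape(-1)

print(detect_gru_source(cudnn_bias.shape, units), '->',
      detect_gru_source(gru_bias.shape, units))
print(gru_bias.shape)
```

A bias of shape `(units * n_gates,)` belongs to `GRU(reset_after=False)`, which the helper refuses to convert, so checking the bias shape of the target layer before assigning is a cheap sanity test.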
    

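    Note that to use the cuDNN-compatible implementation of tf.keras.layers.GRU, you must use a specific combination of parameters (in particular, use_bias=True). The function above also refuses to convert to or from GRU(reset_after=False), so the target GRU has to be built with reset_after=True.

    【Comments】:

      【Solution 2】: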

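      I know this thread is a bit old, but I can add how to convert CuDNNGRU/CuDNNLSTM to GRU/LSTM in Keras/TF 2.6 (the accepted answer did not work for me because the gru.cell attribute seems to have changed).

      Background: I wanted to train a CuDNNGRU on the GPU (which is faster than training a standard GRU on the GPU) and convert it to a standard GRU for CPU inference.

      This solution comes from a GitHub user named bzamecnik:

      1. Create a model containing the CuDNNGRU layer gru_cudnn (or a CuDNNLSTM), train it, and save its weights: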

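        import tensorflow as tf

        # CuDNNGRU lives under tf.compat.v1 in TensorFlow 2.x
        gru_cudnn = tf.compat.v1.keras.layers.CuDNNGRU(n_units)
        model = ...  # build the model around gru_cudnn
        model.fit(...)
        model.save_weights('weights_cudnn.h5')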
        
      2. Create a model with the same architecture, using a standard GRU layer gru (or LSTM) in place of the CuDNNGRU (or CuDNNLSTM), and load the CuDNN weights saved in step 1:

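        import tensorflow as tf

        # reset_after=True is required: the converter rejects GRU(reset_after=False)
        gru = tf.keras.layers.GRU(n_units, reset_after=True, recurrent_activation='sigmoid')
        model = ...  # build the same architecture around gru
        model.load_weights('weights_cudnn.h5')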
        

      I hope this helps anyone who stumbles upon this thread later.

      【Comments】:
