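Custom Attention Layer in Keras: Answers

【Question title】: Custom Attention Layer using in Keras
【Posted】: 2020-05-20 15:58:10
【Question】:

I want to create a custom attention layer that, for the input at each time step, returns the weighted mean of the inputs at all time steps.

For example, I want an input tensor of shape [32,100,2048] to go into the layer and get back a tensor of shape [32,100,2048]. I wrote the layer as follows: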

import tensorflow as tf

from keras.layers import Layer, Dense

#or

from tensorflow.keras.layers import Layer, Dense


class Attention(Layer):

  def __init__(self, units_att):

     self.units_att = units_att
     self.W = Dense(units_att)
     self.V = Dense(1)
     super().__init__()

  def __call__(self, values):

      t = tf.constant(0, dtype= tf.int32)    
      time_steps = tf.shape(values)[1]
      initial_outputs = tf.TensorArray(dtype=tf.float32, size=time_steps)
      initial_att =  tf.TensorArray(dtype=tf.float32, size=time_steps)

      def should_continue(t, *args):
          return t < time_steps

      def iteration(t, values, outputs, atts):

        score = self.V(tf.nn.tanh(self.W(values)))

        # attention_weights shape == (batch_size, time_step, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        outputs = outputs.write(t, context_vector)
        atts = atts.write(t, attention_weights)
        return t + 1, values, outputs, atts

      t, values, outputs, atts = tf.while_loop(should_continue, iteration,
                                  [t, values, initial_outputs, initial_att])

      outputs = outputs.stack()
      outputs = tf.transpose(outputs, [1,0,2])

      atts = atts.stack()
      atts = tf.squeeze(atts, -1)
      atts = tf.transpose(atts, [1,0,2])
      return t, values, outputs, atts

For input = tf.constant(2, shape=[32, 100, 2048], dtype=tf.float32), I get an output of shape [32,100,2048] in tf2 and [32,None,2048] in tf1.

For input = Input(shape=(None, 2048)), I get an output of shape [None, None, 2048] in tf1, but in tf2 I get the error

TypeError: 'Tensor' object cannot be interpreted as an integer

Finally, in both cases I cannot use this layer in my model, because my model input is Input(shape=(None, 2048)). In tf1 I get the error

AttributeError: 'NoneType' object has no attribute '_inbound_nodes'

and in tf2 I get the same TypeError as above. I build the model with the Keras functional API.

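【Comments】:

A simple way to add attention is described here: stackoverflow.com/a/62949137/10375049

Tags: tensorflow keras deep-learning

【Solution 1】:

From the code you shared, it looks like you want to implement Bahdanau's attention layer. You want to attend to all the "values" (the previous layer's output, i.e. all of its hidden states), and your "query" would be the last hidden state of the decoder. Your code should actually be very simple and look like this: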

        import tensorflow as tf

        class Bahdanau(tf.keras.layers.Layer):
            def __init__(self, n):
                super(Bahdanau, self).__init__()
                self.w = tf.keras.layers.Dense(n)  # projects the query
                self.u = tf.keras.layers.Dense(n)  # projects the values
                self.v = tf.keras.layers.Dense(1)  # one score per time step

            def call(self, query, values):
                # query: (batch, hidden), values: (batch, timesteps, hidden)
                query = tf.expand_dims(query, 1)  # (batch, 1, hidden)
                e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))
                a = tf.nn.softmax(e, axis=1)  # (batch, timesteps, 1)
                c = a * values
                c = tf.reduce_sum(c, axis=1)  # (batch, hidden)
                return a, c

        # Say we want 10 units in the single-layer MLP determining w, u
        attentionlayer = Bahdanau(10)
        # Call with inputs: decoder state at t-1 and all encoder hidden states
        # (stminus1 and hj are tensors you supply)
        a, c = attentionlayer(stminus1, hj)

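We do not specify tensor shapes anywhere in the code. This code returns a context tensor the same size as "stminus1", which is the "query". It does this after attending to all the "values" (all the encoder hidden states, per the comment in the code) using Bahdanau's attention mechanism.

So, assuming your batch size is 32, timesteps=100 and embedding dimension=2048, stminus1 should have shape (32,2048) and hj should have shape (32,100,2048). The output context has shape (32,2048). We also return the 100 attention weights per example, with shape (32,100,1), in case you want to route them to a nice display.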
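To make the shape bookkeeping concrete, here is a minimal NumPy sketch of the same arithmetic. It is an illustration only: the random matrices `w`, `u` and `v` are assumptions standing in for the kernels of the three Dense layers (biases are omitted), and `stminus1` and `hj` are filled with random numbers rather than real decoder or encoder states.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, dim, units = 32, 100, 2048, 10

# Placeholder inputs with the shapes discussed above.
stminus1 = rng.standard_normal((batch, dim)).astype(np.float32)       # query
hj = rng.standard_normal((batch, timesteps, dim)).astype(np.float32)  # values

# Random stand-ins for the Dense kernels (assumption: no bias terms).
w = rng.standard_normal((dim, units)).astype(np.float32) * 0.01
u = rng.standard_normal((dim, units)).astype(np.float32) * 0.01
v = rng.standard_normal((units, 1)).astype(np.float32)


def additive_attention(query, values):
    q = query[:, None, :]                        # (batch, 1, dim)
    e = np.tanh(q @ w + values @ u) @ v          # (batch, timesteps, 1)
    e = e - e.max(axis=1, keepdims=True)         # stabilise the softmax
    a = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    c = (a * values).sum(axis=1)                 # (batch, dim)
    return a, c


a, c = additive_attention(stminus1, hj)
print(a.shape, c.shape)  # (32, 100, 1) (32, 2048)
```

The softmax runs over axis 1, the time axis, so the 100 weights of each example sum to 1 and the context is a weighted mean of the 100 value vectors.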

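This is the simplest version of "attention". If you have any other intent, let me know and I will reformat my answer. For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e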
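If the goal is the one stated in the question, an output of shape [32,100,2048] for an input of shape [32,100,2048], one possible variant is to let every time step act as its own query. The sketch below is not part of the original answer: the class name `SelfAdditiveAttention` is hypothetical, and the sizes are reduced to keep the example light. It overrides `call` rather than `__call__`, which is the method Keras expects subclasses to implement and is a likely reason the question's layer could not be wired into a functional model. It also needs no `tf.while_loop`, because broadcasting scores all pairs of time steps at once.

```python
import tensorflow as tf


class SelfAdditiveAttention(tf.keras.layers.Layer):
    """Additive attention in which every time step is its own query."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.w = tf.keras.layers.Dense(units)  # projects the queries
        self.u = tf.keras.layers.Dense(units)  # projects the values
        self.v = tf.keras.layers.Dense(1)      # scores each (query, value) pair

    def call(self, values):
        # values: (batch, T, features); T may be unknown at graph-build time.
        q = tf.expand_dims(self.w(values), 2)          # (batch, T, 1, units)
        k = tf.expand_dims(self.u(values), 1)          # (batch, 1, T, units)
        e = tf.squeeze(self.v(tf.nn.tanh(q + k)), -1)  # (batch, T, T)
        a = tf.nn.softmax(e, axis=-1)                  # weights over source steps
        c = tf.matmul(a, values)                       # (batch, T, features)
        return c, a


# Eager call on a small random tensor.
x = tf.random.normal([4, 7, 16])
context, weights = SelfAdditiveAttention(10)(x)
print(context.shape, weights.shape)

# Functional API with an unknown number of time steps.
inputs = tf.keras.Input(shape=(None, 16))
ctx, att = SelfAdditiveAttention(10)(inputs)
model = tf.keras.Model(inputs, [ctx, att])
```

Note that the intermediate tensor has shape (batch, T, T, units), so memory grows quadratically with the number of time steps. For long sequences a dot-product score would be cheaper, at the cost of no longer being Bahdanau-style additive attention.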

【Comments】: