【Question Title】: Sequence to Sequence - for time series prediction
【Posted】: 2020-08-28 15:12:10
【Question】:

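I am trying to build a sequence-to-sequence model that predicts a sensor signal over time from its first few inputs (see the figure below).

The model works OK, but I want to "spice things up" and try adding an attention layer between the two LSTM layers.

Model code: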

def train_model(x_train, y_train, n_units=32, n_steps=20, epochs=200,
                n_steps_out=1):

    filters = 250
    kernel_size = 3

    logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)

    # get number of features from input data
    n_features = x_train.shape[2]
    # setup network
    # (feel free to use other combination of layers and parameters here)
    model = keras.models.Sequential()
    model.add(keras.layers.LSTM(n_units, activation='relu',
                                return_sequences=True,
                                input_shape=(n_steps, n_features)))
    model.add(keras.layers.LSTM(n_units, activation='relu'))
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(n_steps_out))
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    # train network
    history = model.fit(x_train, y_train, epochs=epochs,
                        validation_split=0.1, verbose=1, callbacks=[tensorboard_callback])
    return model, history

I have looked at the documentation, but I am a bit lost. Any help adding the attention layer, or comments on the current model, would be appreciated.


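Update: After googling around, I started to think I had this all wrong, so I rewrote my code.

I am trying to migrate a seq2seq model that I found in this GitHub repository. In the repository code, the demonstrated problem is predicting a randomly generated sine wave from some earlier samples.

I have a similar problem, and I am trying to change the code to fit my needs.

Differences:

  • My training data has shape (439, 5, 20): 439 different signals, 5 time steps, each with 20 features
  • I am not using fit_generator when fitting my data

Hyperparameters: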

layers = [35, 35] # Number of hidden neurons in each layer of the encoder and decoder

learning_rate = 0.01
decay = 0 # Learning rate decay
optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay) # Other possible optimiser "sgd" (Stochastic Gradient Descent)

num_input_features = train_x.shape[2] # The dimensionality of the input at each time step. In this case a 1D signal.
num_output_features = 1 # The dimensionality of the output at each time step. In this case a 1D signal.
# There is no reason for the input sequence to be of same dimension as the output sequence.
# For instance, using 3 input signals: consumer confidence, inflation and house prices to predict the future house prices.

loss = "mse" # Other loss functions are possible, see Keras documentation.

# Regularisation isn't really needed for this application
lambda_regulariser = 0.000001 # Will not be used if regulariser is None
regulariser = None # Possible regulariser: keras.regularizers.l2(lambda_regulariser)

batch_size = 128
steps_per_epoch = 200 # batch_size * steps_per_epoch = total number of training examples
epochs = 100

input_sequence_length = n_steps # Length of the sequence used by the encoder
target_sequence_length = 31 - n_steps # Length of the sequence predicted by the decoder
num_steps_to_predict = 20 # Length to use when testing the model

Encoder code:

# Define an input sequence.

encoder_inputs = keras.layers.Input(shape=(None, num_input_features), name='encoder_input')

# Create a list of RNN Cells, these are then concatenated into a single layer
# with the RNN layer.
encoder_cells = []
for hidden_neurons in layers:
    encoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                              kernel_regularizer=regulariser,
                                              recurrent_regularizer=regulariser,
                                              bias_regularizer=regulariser))

encoder = keras.layers.RNN(encoder_cells, return_state=True, name='encoder_layer')

encoder_outputs_and_states = encoder(encoder_inputs)

# Discard encoder outputs and only keep the states.
# The outputs are of no interest to us, the encoder's
# job is to create a state describing the input sequence.
encoder_states = encoder_outputs_and_states[1:]

Decoder code:

# The decoder input will be set to zero (see random_sine function of the utils module).
# Do not worry about the input size being 1, I will explain that in the next cell.
decoder_inputs = keras.layers.Input(shape=(None, 20), name='decoder_input')

decoder_cells = []
for hidden_neurons in layers:
    decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                              kernel_regularizer=regulariser,
                                              recurrent_regularizer=regulariser,
                                              bias_regularizer=regulariser))

decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True, name='decoder_layer')

# Set the initial state of the decoder to be the output state of the encoder.
# This is the fundamental part of the encoder-decoder.
decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)

# Only select the output of the decoder (not the states)
decoder_outputs = decoder_outputs_and_states[0]

# Apply a dense layer with linear activation to set output to correct dimension
# and scale (tanh is default activation for GRU in Keras, our output sine function can be larger than 1)
decoder_dense = keras.layers.Dense(num_output_features,
                                   activation='linear',
                                   kernel_regularizer=regulariser,
                                   bias_regularizer=regulariser)

decoder_outputs = decoder_dense(decoder_outputs)

Model summary:

model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], 
outputs=decoder_outputs)
model.compile(optimizer=optimiser, loss=loss)
model.summary()

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
encoder_input (InputLayer)      (None, None, 20)     0                                            
__________________________________________________________________________________________________
decoder_input (InputLayer)      (None, None, 20)     0                                            
__________________________________________________________________________________________________
encoder_layer (RNN)             [(None, 35), (None,  13335       encoder_input[0][0]              
__________________________________________________________________________________________________
decoder_layer (RNN)             [(None, None, 35), ( 13335       decoder_input[0][0]              
                                                                 encoder_layer[0][1]              
                                                                 encoder_layer[0][2]              
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, None, 1)      36          decoder_layer[0][0]              
==================================================================================================
Total params: 26,706
Trainable params: 26,706
Non-trainable params: 0
__________________________________________________________________________________________________

When trying to fit the model:

history = model.fit([train_x, decoder_inputs],train_y, epochs=epochs,
                        validation_split=0.3, verbose=1)

I get the following error:

When feeding symbolic tensors to a model, we expect the tensors to have a static batch size. Got tensor with shape: (None, None, 20)

What am I doing wrong?

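【Question Discussion】:

    Tags: tensorflow machine-learning keras attention-model sequence-to-sequence


    【Solution 1】:

    This is an answer to the edited question.

    First, when you call fit, decoder_inputs is a symbolic tensor, and you cannot use it to fit your model. The author of the code you cited feeds an array of zeros, so you have to do the same (I do this in the dummy example below).

    Second, look at the output layer in the model summary: it is 3D, so you have to provide your target as a 3D array.

    Third, the decoder input must have 1 feature dimension, not the 20 you reported.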
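The three points above can be checked on the data side before any model is built. The following is only a minimal NumPy sketch: the helper name `make_decoder_zero_inputs` and the 26-step target length are assumptions made for illustration and do not come from the question or the cited repository.

```python
import numpy as np


def make_decoder_zero_inputs(targets):
    """Return (targets_3d, zeros) for a seq2seq model with a 1-feature decoder.

    targets: array of shape (samples, steps) or (samples, steps, 1).
    """
    targets = np.asarray(targets, dtype="float32")
    if targets.ndim == 2:
        # Add the trailing feature axis so the target matches the 3D model output.
        targets = targets[..., np.newaxis]
    if targets.ndim != 3 or targets.shape[-1] != 1:
        raise ValueError(f"expected (samples, steps[, 1]), got {targets.shape}")
    # A concrete NumPy array of zeros, not the symbolic keras Input tensor.
    zeros = np.zeros_like(targets)
    return targets, zeros


# Hypothetical 2D target: 439 signals, 26 future steps each.
y_2d = np.random.uniform(0, 1, (439, 26))
train_y, decoder_zero_inputs = make_decoder_zero_inputs(y_2d)
print(train_y.shape, decoder_zero_inputs.shape)
```

The arrays returned here are what would be passed to fit in place of the `decoder_inputs` tensor: the zeros as the second model input and the 3D array as the target.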

    Set the initial parameters

    layers = [35, 35]
    learning_rate = 0.01
    decay = 0 
    optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay)
    
    num_input_features = 20
    num_output_features = 1
    loss = "mse"
    
    lambda_regulariser = 0.000001
    regulariser = None
    
    batch_size = 128
    steps_per_epoch = 200
    epochs = 100
    

    Define the encoder

    encoder_inputs = keras.layers.Input(shape=(None, num_input_features), name='encoder_input')
    
    encoder_cells = []
    for hidden_neurons in layers:
        encoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    
    encoder = keras.layers.RNN(encoder_cells, return_state=True, name='encoder_layer')
    encoder_outputs_and_states = encoder(encoder_inputs)
    encoder_states = encoder_outputs_and_states[1:] # only keep the states
    

    Define the decoder (1 feature dimension input!)

    decoder_inputs = keras.layers.Input(shape=(None, 1), name='decoder_input') #### <=== must be 1
    
    decoder_cells = []
    for hidden_neurons in layers:
        decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    
    decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True, name='decoder_layer')
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
    
    decoder_outputs = decoder_outputs_and_states[0] # only keep the output sequence
    decoder_dense = keras.layers.Dense(num_output_features,
                                       activation='linear',
                                       kernel_regularizer=regulariser,
                                       bias_regularizer=regulariser)
    
    decoder_outputs = decoder_dense(decoder_outputs)
    

    Define the model

    model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
    model.compile(optimizer=optimiser, loss=loss)
    model.summary()
    
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    encoder_input (InputLayer)      (None, None, 20)     0                                            
    __________________________________________________________________________________________________
    decoder_input (InputLayer)      (None, None, 1)      0                                            
    __________________________________________________________________________________________________
    encoder_layer (RNN)             [(None, 35), (None,  13335       encoder_input[0][0]              
    __________________________________________________________________________________________________
    decoder_layer (RNN)             [(None, None, 35), ( 11340       decoder_input[0][0]              
                                                                     encoder_layer[0][1]              
                                                                     encoder_layer[0][2]              
    __________________________________________________________________________________________________
    dense_4 (Dense)                 (None, None, 1)      36          decoder_layer[0][0]              
    ==================================================================================================
    

    This is my dummy data, with the same shapes as yours. Note decoder_zero_inputs: it has the same dimensions as your y, but it is an array of zeros.

    train_x = np.random.uniform(0,1, (439, 5, 20))
    train_y = np.random.uniform(0,1, (439, 56, 1))
    validation_x = np.random.uniform(0,1, (10, 5, 20))
    validation_y = np.random.uniform(0,1, (10, 56, 1))
    decoder_zero_inputs = np.zeros((439, 56, 1)) ### <=== attention
    

    Fit

    history = model.fit([train_x, decoder_zero_inputs],train_y, epochs=epochs,
                         validation_split=0.3, verbose=1)
    
    Epoch 1/100
    307/307 [==============================] - 2s 8ms/step - loss: 0.1038 - val_loss: 0.0845
    Epoch 2/100
    307/307 [==============================] - 1s 2ms/step - loss: 0.0851 - val_loss: 0.0832
    Epoch 3/100
    307/307 [==============================] - 1s 2ms/step - loss: 0.0842 - val_loss: 0.0828
    

    Predict on the validation data

    pred_validation = model.predict([validation_x, np.zeros((10,56,1))])
    

    【Discussion】:

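      【Solution 2】:

      The attention layer in Keras is not a trainable layer (unless we use the scale parameter). It only computes matrix operations. In my opinion, this layer may lead to some mistakes if applied directly to time series, but let's proceed in order...

      The most natural choice for replicating the attention mechanism on our time-series problem is to adopt the solution presented here and explained again here. It is the classical application of attention in the enc-dec structure in NLP.

      Following the TF implementation, our attention layer needs query, value, and key tensors in 3D format. We obtain these values directly from our recurrent layer. More specifically, we use the sequence output and the hidden state. These are all we need to build an attention mechanism.

      query is the output sequence [batch_dim, time_step, features]

      value is the hidden state [batch_dim, features], to which we add a temporal dimension for the matrix operations [batch_dim, 1, features]

      As the key, we use the hidden state as before, so key = value

      In the above definition and implementation I found 2 problems:

      • The scores are calculated with softmax(dot(sequence, hidden)). The dot is OK, but the softmax, following the Keras implementation, is calculated on the last dimension and not on the temporal dimension. This implies that the scores are all 1, so they are useless
      • The output attention is dot(scores, hidden) and not dot(scores, sequence), which is what we need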
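Both points follow from the shapes alone. As a short derivation, write $q_t$ for the sequence output at time step $t$ and $h$ for the hidden state (symbols introduced here for illustration only, not taken from the Keras source):

```latex
\begin{align}
s_t &= q_t^{\top} h \in \mathbb{R}, \qquad t = 1, \dots, T
  && \text{(score tensor of shape } [\text{batch}, T, 1]\text{)} \\
\operatorname{softmax}_{\text{last axis}}(s_t) &= \frac{e^{s_t}}{e^{s_t}} = 1
  && \text{(the last axis has a single entry)} \\
\text{out}_t &= 1 \cdot h = h
  && \text{(every time step returns the hidden state)} \\[4pt]
\alpha_t &= \frac{e^{s_t}}{\sum_{t'=1}^{T} e^{s_{t'}}}, \qquad
\text{out}_t = \alpha_t \, q_t
  && \text{(softmax over time, applied to the sequence)}
\end{align}
```

The first three lines describe the behaviour criticised in the two points above; the last line is the time-axis alternative that the answer goes on to propose.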

      Example:

      import numpy as np
      import tensorflow as tf

      def attention_keras(query_value):
      
          query, value = query_value # key == value
          score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
          score = tf.nn.softmax(score) # softmax on -1 axis ==> score always = 1 !!!
          print((score.numpy()!=1).any()) # False ==> score always = 1 !!!
          score = tf.matmul(score, value) # (batch, timestamp, feat)
          return score
      
      np.random.seed(33)
      time_steps = 20
      features = 50
      sample = 5
      
      X = np.random.uniform(0,5, (sample,time_steps,features))
      state = np.random.uniform(0,5, (sample,features))
      attention_keras([X,tf.expand_dims(state,1)]) # ==> the same as Attention(dtype='float64')([X,tf.expand_dims(state,1)])
      

      So, for this reason, I propose this solution for attention on time series

      def attention_seq(query_value, scale):
      
          query, value = query_value
          score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
          score = scale*score # scale with a fixed number (it can be finetuned or learned during train)
          score = tf.nn.softmax(score, axis=1) # softmax on timestamp axis
          score = score*query # (batch, timestamp, feat)
          return score
      
      np.random.seed(33)
      time_steps = 20
      features = 50
      sample = 5
      
      X = np.random.uniform(0,5, (sample,time_steps,features))
      state = np.random.uniform(0,5, (sample,features))
      attention_seq([X,tf.expand_dims(state,1)], scale=0.05)
      

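      query is the output sequence [batch_dim, time_step, features]

      value is the hidden state [batch_dim, features], to which we add a temporal dimension for the matrix operations [batch_dim, 1, features]

      The weights are calculated with softmax(scale*dot(sequence, hidden)). The scale parameter is a scalar value that can be used to scale the weights before applying the softmax operation. The softmax is now calculated correctly on the time dimension. The attention output is the weighted product of the input sequence and the scores. I use the scalar parameter as a fixed value, but it can be tuned or inserted as a learnable weight in a custom layer (like the scale parameter in Keras attention).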
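The claims in the paragraph above can be checked without TensorFlow. The following is a NumPy-only sketch that mirrors attention_seq; the names `softmax` and `attention_seq_np` are hypothetical helpers written for this check, and returning the weights alongside the output is an addition made only so they can be inspected.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_seq_np(query, state, scale):
    """NumPy mirror of attention_seq.

    query: (batch, time, feat) sequence output.
    state: (batch, feat) hidden state.
    Returns (weighted sequence, attention weights).
    """
    value = state[:, np.newaxis, :]             # (batch, 1, feat)
    score = query @ np.swapaxes(value, 1, 2)    # (batch, time, 1)
    weights = softmax(scale * score, axis=1)    # normalise over the time axis
    return weights * query, weights             # weights broadcast over feat


rng = np.random.default_rng(33)
X = rng.uniform(0, 5, (5, 20, 50))
state = rng.uniform(0, 5, (5, 50))

out, w = attention_seq_np(X, state, scale=0.05)
# Same scores, but softmax over the last (size-1) axis, as in the first example.
last_axis = softmax(X @ state[:, :, np.newaxis], axis=-1)

print(out.shape)
print(np.allclose(w.sum(axis=1), 1.0))   # weights sum to 1 over time
print(np.allclose(last_axis, 1.0))       # softmax over a size-1 axis is all ones
```

With the softmax taken over axis 1, each sample gets one weight per time step and the weights sum to 1; taken over the last axis, every entry collapses to 1, which is the problem described earlier in this answer.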

      In terms of network implementation, there are two possibilities available:

      ######### KERAS #########
      inp = Input((time_steps,features))
      seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
      att = Attention()([seq, tf.expand_dims(state,1)])
      
      ######### CUSTOM #########
      inp = Input((time_steps,features))
      seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
      att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])
      

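      Conclusion

      I don't know how much added value an attention layer brings to a simple problem. If you have short sequences, I suggest you leave everything as it is. What I reported here is an answer in which I express my considerations; I welcome comments or remarks about possible errors or misunderstandings.


      In your model, these solutions can be embedded in this way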

      ######### KERAS #########
      inp = Input((n_steps, n_features))  # (time steps, features), matching input_shape in the question
      seq, state = GRU(n_units, activation='relu',
                       return_state=True, return_sequences=True)(inp)
      att = Attention()([seq, tf.expand_dims(state,1)])
      x = GRU(n_units, activation='relu')(att)
      x = Dense(64, activation='relu')(x)
      x = Dropout(0.5)(x)
      out = Dense(n_steps_out)(x)
      
      model = Model(inp, out)
      model.compile(optimizer='adam', loss='mse', metrics=['mse'])
      model.summary()
      
      ######### CUSTOM #########
      inp = Input((n_steps, n_features))  # (time steps, features), matching input_shape in the question
      seq, state = GRU(n_units, activation='relu',
                       return_state=True, return_sequences=True)(inp)
      att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])
      x = GRU(n_units, activation='relu')(att)
      x = Dense(64, activation='relu')(x)
      x = Dropout(0.5)(x)
      out = Dense(n_steps_out)(x)
      
      model = Model(inp, out)
      model.compile(optimizer='adam', loss='mse', metrics=['mse'])
      model.summary()
      

      【Discussion】:
