【Question title】: keras (lstm) - necessary shape when using return_sequences=True
【Posted】: 2017-11-12 16:19:24
【Question】:

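I am trying to fit an LSTM network to a sine function. As far as I understand Keras, my code currently predicts only the next value. According to this link, Many to one and many to many LSTM examples in Keras, that makes it a many-to-one model. My goal, however, is a many-to-many model: I want to predict 10 values at a time. When I try return_sequences=True (see the model.add(..) line), which should be the solution, I get the following error: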

ValueError: Error when checking target: expected lstm_8 to have 3 dimensions, but got array with shape (689, 1)

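Unfortunately, I have no idea why this happens. Is there a general rule for the required input shape when using return_sequences=True? And what exactly do I need to change? Thanks for your help.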

import pandas
import numpy as np
import matplotlib.pylab as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import sklearn

from keras.models import Sequential
from keras.layers import Activation, LSTM
from keras import optimizers
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

#generate sin function with noise
x = np.arange(0, 100, 0.1)
noise = np.random.uniform(-0.1, 0.1, size=(1000,))
Y = np.sin(x) + noise

# Perform feature scaling
scaler = MinMaxScaler()
Y = scaler.fit_transform(Y.reshape(-1, 1))

# split in train and test
train_size = int(len(Y) * 0.7)
test_size = len(Y) - train_size
train, test = Y[0:train_size,:], Y[train_size:len(Y),:]

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
         a = dataset[i:(i+look_back), 0]
         dataX.append(a)
         dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

# reshape into X=t and Y=t+1
look_back = 10
X_train, y_train = create_dataset(train, look_back)
X_test, y_test = create_dataset(test, look_back)

# LSTM network expects the input data in form of [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
np.set_printoptions(threshold=np.inf)

# compile model
model = Sequential()
model.add(LSTM(1, input_shape=(look_back, 1)))#, return_sequences=True))  <== uncomment this
model.compile(loss='mean_squared_error', optimizer='adam')
SVG(model_to_dot(model).create(prog='dot', format='svg'))

model.fit(X_train, y_train, validation_data=(X_test, y_test), 
batch_size=10, epochs=10, verbose=2)
prediction = model.predict(X_test, batch_size=1, verbose=0)
# Compute the error while y_test and prediction are both still in the scaled range
error = mean_squared_error(y_test, prediction)
print(error)
#Transform back to original representation
Y = scaler.inverse_transform(Y)
prediction = scaler.inverse_transform(prediction)
plt.plot(np.arange(0,Y.shape[0]), Y)
plt.plot(np.arange(Y.shape[0] - X_test.shape[0] , Y.shape[0]), prediction, 'red')
plt.show()

【Comments】:

    Tags: keras lstm keras-layer


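    【Answer 1】:

    The problem is not the input but the output. The error says "Error when checking target", and the targets are y_train and y_test.

    Because your LSTM returns a sequence (return_sequences=True), the output shape will be (n_batch, look_back, 1).

    You can verify this with model.summary():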

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    lstm_1 (LSTM)                (None, 10, 1)             12        
    =================================================================
    Total params: 12
    Trainable params: 12
    Non-trainable params: 0
    _________________________________________________________________
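The Param # of 12 in the summary above can be checked by hand. A minimal sketch, assuming the standard LSTM formulation with four gates, each holding an input kernel, a recurrent kernel, and a bias; `lstm_param_count` is a hypothetical helper written for this check, not a Keras function:

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of a standard LSTM layer (4 gates, with biases)."""
    per_gate = units * input_dim + units * units + units
    return 4 * per_gate

print(lstm_param_count(units=1, input_dim=1))   # 12, as in the summary above
```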
    

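    You will need to change your create_dataset function so that each ground-truth target has shape (look_back, 1).

    What you probably want to do:
    for each sequence x in the training set, its y is the same window shifted one step ahead.
    For example, suppose we want to learn something simpler, a sequence in which each number is the previous one plus 1 --> 1,2,3,4,5,6,7,8,9,10. With look_back = 4: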

    X_train[0] = 1,2,3,4   
    y_train[0] will be: 2,3,4,5  
    X_train[1] = 2,3,4,5  
    y_train[1] will be: 3,4,5,6  
    and so on...
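
The windowing described above can be sketched as a variant of create_dataset. This is one possible version, not the original poster's final code; the name `create_seq2seq_dataset` and the exact loop bounds are assumptions made for illustration.

```python
import numpy as np

def create_seq2seq_dataset(dataset, look_back=1):
    """Build (X, y) pairs in which y is the input window shifted one step ahead.

    dataset is expected to have shape (n_samples, 1).
    Both outputs have shape (n_windows, look_back, 1).
    """
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i + look_back, 0])
        dataY.append(dataset[i + 1:i + 1 + look_back, 0])
    X = np.array(dataX).reshape(-1, look_back, 1)
    y = np.array(dataY).reshape(-1, look_back, 1)
    return X, y

series = np.arange(1, 11).reshape(-1, 1)   # 1, 2, ..., 10
X, y = create_seq2seq_dataset(series, look_back=4)
print(X.shape, y.shape)   # (6, 4, 1) (6, 4, 1)
print(X[0].ravel(), y[0].ravel())
```

With targets in this shape, y has the three dimensions that an LSTM layer with return_sequences=True expects.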
    

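    【Comments】:

    • Thanks for your reply, but I don't really get it. Would changing the line to dataY.append(dataset[i+1:(i+1+look_back), 0]) produce the output you mention?
    • It depends on the desired output, but yes: dataY.append(dataset[i+1:(i+1+look_back), 0]) should make each label a (look_back, 1) vector.
    • That's what I thought, but it doesn't seem to make a difference. At least, I still get a ValueError: Error when checking target: expected lstm_1 to have 3 dimensions, but got array with shape (164, 10)
    • You're almost there. The target just needs the shape (164, 10, 1). dataY.append(dataset[i+1:(i+1+look_back), 0]) produces an array of shape (164, 10), but you need to tell the model that each time step is a scalar. So once dataY is a NumPy array, do np.array(dataY).reshape(-1, look_back, 1) or np.expand_dims(np.array(dataY), axis=-1), and make sure the resulting shape is (164, 10, 1).
    • I know this is an old question, but I simulated data as @DvirSamuel suggested. I'm posting the code as an answer because there isn't enough room here. Note that the FNN performs as well as the LSTM.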
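
The two ways of adding the trailing feature axis mentioned in the comments give the same result. A small check, using a zero-filled placeholder array with the (164, 10) shape from the error message rather than the actual data:

```python
import numpy as np

look_back = 10
dataY = np.zeros((164, look_back))        # placeholder with the shape from the error

y_reshaped = dataY.reshape(-1, look_back, 1)
y_expanded = np.expand_dims(dataY, axis=-1)

print(y_reshaped.shape, y_expanded.shape)  # (164, 10, 1) (164, 10, 1)
```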
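    【Answer 2】:

    I simulated the data as @DvirSamuel suggested and provide code for both an LSTM and an FNN. Note that for the LSTM, if return_sequences = True is set on the preceding layer, network_lstm.add(layers.Dense(1, activation = None)) is needed so that the 64 LSTM outputs at each time step are mapped to a single value, matching the (10, 1) shape of each target.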
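
Why the extra Dense(1) layer fits here: with return_sequences = True the LSTM emits one 64-dimensional vector per time step, and a Keras Dense layer applied to a 3D input acts on the last axis only. The NumPy sketch below mimics that shape arithmetic with random placeholder weights. It illustrates the shapes only and is not Keras code; the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, time_steps, units = 32, 10, 64
lstm_out = rng.normal(size=(batch, time_steps, units))  # stand-in for the LSTM output

# Dense(1) holds a (units, 1) kernel and a (1,) bias shared across all time steps.
kernel = rng.normal(size=(units, 1))
bias = np.zeros(1)

dense_out = lstm_out @ kernel + bias
print(dense_out.shape)   # (32, 10, 1)
```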

    import numpy as np
    import matplotlib.pyplot as plt
    from keras import models, layers
    
    ## Simulate data.
    
    np.random.seed(20180826)
    
    Z = np.random.randint(0, 10, size = (11000, 1))
    
    for i in range(10):
    
         Z = np.concatenate((Z, (Z[:, -1].reshape(Z.shape[0], 1) + 1)), axis = 1)
    
    X = Z[:, :-1]
    
    Y = Z[:,  1:]
    
    print(X.shape)
    
    print(Y.shape)
    
    ## Training and validation data.
    
    split = 10000
    
    X_train = X[:split, :]
    X_valid = X[split:, :]
    
    Y_train = Y[:split, :]
    Y_valid = Y[split:, :]
    
    print(X_train.shape)
    print(Y_train.shape)
    print(X_valid.shape)
    print(Y_valid.shape)
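
The simulated sequences are simply consecutive integers starting from a random digit, so the same arrays can also be built without the loop. A vectorized sketch, assuming the same seed and sizes as above (the name `start` is introduced here for illustration):

```python
import numpy as np

np.random.seed(20180826)

start = np.random.randint(0, 10, size=(11000, 1))   # first element of each row
Z = start + np.arange(11)                            # broadcast to shape (11000, 11)

X = Z[:, :-1]
Y = Z[:, 1:]

print(X.shape, Y.shape)          # (11000, 10) (11000, 10)
print(np.array_equal(Y, X + 1))  # True
```

Every target is its input plus 1 at each position, which is the relationship both models below are trained to learn.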
    

    Code for the LSTM model:

    ## LSTM model.
    
    X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)
    
    Y_lstm_train = Y_train.reshape(Y_train.shape[0], Y_train.shape[1], 1)
    Y_lstm_valid = Y_valid.reshape(Y_valid.shape[0], Y_valid.shape[1], 1)
    
    # Define model.
    
    network_lstm = models.Sequential()
    network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1),
        return_sequences = True))
    network_lstm.add(layers.Dense(1, activation = None))
    
    network_lstm.summary()
    
    # Compile model.
    
    network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
    
    # Fit model.
    
    history_lstm = network_lstm.fit(X_lstm_train, Y_lstm_train, epochs = 5, batch_size = 32, verbose = True,
        validation_data = (X_lstm_valid, Y_lstm_valid))
    
    ## Extract loss over epochs and predict.
    
    # Extract loss.
    
    loss_lstm = history_lstm.history['loss']
    val_loss_lstm = history_lstm.history['val_loss']
    epochs_lstm = range(1, len(loss_lstm) + 1)
    
    plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
    plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
    plt.title('LSTM: Training and Validation Loss')
    plt.legend()
    plt.show()
    
    plt.title('First in Sequence')
    
    plt.scatter(Y_train[:, 0], network_lstm.predict(X_lstm_train)[:, 0], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.scatter(Y_valid[:, 0], network_lstm.predict(X_lstm_valid)[:, 0], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.title('Last in Sequence')
    
    plt.scatter(Y_train[:, -1], network_lstm.predict(X_lstm_train)[:, -1], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.scatter(Y_valid[:, -1], network_lstm.predict(X_lstm_valid)[:, -1], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    

    Code for the FNN model:

    ## FNN model.
    
    # Define model.
    
    network_fnn = models.Sequential()
    network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
    network_fnn.add(layers.Dense(10, activation = None))
    
    network_fnn.summary()
    
    # Compile model.
    
    network_fnn.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
    
    # Fit model.
    
    history_fnn = network_fnn.fit(X_train, Y_train, epochs = 5, batch_size = 32, verbose = True,
        validation_data = (X_valid, Y_valid))
    
    ## Extract loss over epochs.
    
    # Extract loss.
    
    loss_fnn = history_fnn.history['loss']
    val_loss_fnn = history_fnn.history['val_loss']
    epochs_fnn = range(1, len(loss_fnn) + 1)
    
    plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
    plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
    plt.title('FNN: Training and Validation Loss')
    plt.legend()
    plt.show()
    
    plt.title('First in Sequence')
    
    # Column 0 is the first element of each sequence (column 1 would be the second)
    plt.scatter(Y_train[:, 0], network_fnn.predict(X_train)[:, 0], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.scatter(Y_valid[:, 0], network_fnn.predict(X_valid)[:, 0], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.title('Last in Sequence')
    
    plt.scatter(Y_train[:, -1], network_fnn.predict(X_train)[:, -1], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    
    plt.scatter(Y_valid[:, -1], network_fnn.predict(X_valid)[:, -1], alpha = 0.1)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.show()
    

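    【Comments】:

    • Awesome! :) Does it work? Did you train the model? Can the model learn the logic?
    • Yes! It works for me, anyway. Give it a try @DvirSamuel, the code is right there.
    【Answer 3】:

    Shouldn't this: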

    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

    be like this:

    X_train = np.reshape((X_train.shape[0], X_train.shape[1], 1))

    X_test = np.reshape((X_test.shape[0], X_test.shape[1], 1))

    Could this be your problem? (xD, one year later)

    【Comments】:
