How to configure a very simple LSTM with Keras / Theano for regression: answers

【Question title】: How to configure a very simple LSTM with Keras / Theano for Regression
【Posted】: 2016-09-16 04:12:06
【Question】:

I am struggling to configure a Keras LSTM for a simple regression task. The official page has some very basic explanations: Keras RNN documentation

But to fully understand it, an example configuration with sample data would be very helpful.

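I have found hardly any examples of regression with a Keras LSTM. Most examples are about classification (text or images). I studied the LSTM examples that ship with the Keras distribution and one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, although the author admits that the approach is very memory-inefficient, because the data samples have to be stored very redundantly.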

Although a commenter (Taha) proposed an improvement, the data storage is still redundant, and I doubt that this is what the Keras developers intended.

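I downloaded some simple sequential example data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data.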

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

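The table contains more than 8900 rows of Apple stock data like this. There are 7 columns per day (= data point). The value to predict is "Adj Close", the value at the end of the day.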
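For reference, this layout can be loaded with the standard library alone. This is a minimal sketch, assuming the four sample rows shown above (with the alignment spaces removed) stand in for the full table; the helper name `load_rows` is hypothetical. Because Yahoo Finance lists the newest day first, the rows are sorted by date so that the oldest day comes first, and Open, High, Low, Close and Volume are kept as the five input features while Adj Close is the target, which is the same column split the question's code below uses.

```python
import csv
import io

# The four sample rows from the table above (newest day first, as Yahoo lists them).
RAW = """Date,Open,High,Low,Close,Volume,Adj Close
2016-05-18,94.160004,95.209999,93.889999,94.559998,41923100,94.559998
2016-05-17,94.550003,94.699997,93.010002,93.489998,46507400,93.489998
2016-05-16,92.389999,94.389999,91.650002,93.879997,61140600,93.879997
2016-05-13,90.00,91.669998,90.00,90.519997,44188200,90.519997
"""

FEATURE_COLUMNS = ("Open", "High", "Low", "Close", "Volume")


def load_rows(text):
    """Hypothetical helper: return (dates, features, adj_close), oldest day first."""
    reader = csv.DictReader(io.StringIO(text))
    rows = sorted(reader, key=lambda r: r["Date"])  # ISO dates sort chronologically
    dates = [r["Date"] for r in rows]
    # Five input features per day; "Adj Close" is kept separately as the target.
    features = [[float(r[c]) for c in FEATURE_COLUMNS] for r in rows]
    adj_close = [float(r["Adj Close"]) for r in rows]
    return dates, features, adj_close


dates, features, adj_close = load_rows(RAW)
print(dates[0], adj_close[0])  # 2016-05-13 90.519997
```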

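So the goal is to predict the next day's Adj Close from the sequence of previous days. (This is probably close to impossible, but it is always good to see how the tools perform under challenging conditions.)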

I think this should be a very standard LSTM prediction/regression case that transfers easily to other problem domains.

So how should the data (X_train, y_train) be formatted to keep redundancy minimal, and how do I initialize a Sequential model with only one LSTM layer and a few hidden neurons?

Kind regards, Theo

PS: I started coding this:

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

So far, the data is ready and no redundancy has been introduced. The question now is how to describe a Keras LSTM model and training process on this data.

Edit 3:

Here is the updated code, which includes the 3D data structure that recurrent networks require (see Lorrit's answer). However, it does not work.

Edit 4: Removed the superfluous comma after Activation('sigmoid') and shaped Y_train the right way. Still the same error.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

According to Keras, there still seems to be a problem with the data:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

【Comments】:

stackoverflow.com/a/62570576/10375049

Tags: regression theano keras lstm

【Answer 1】:

In your model definition, you put a Dense layer before the LSTM layer. You need to wrap that Dense layer in a TimeDistributed layer.

Try changing

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

to

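from keras.layers import TimeDistributed

# The activation keyword is lowercase "activation"; the 3D input shape
# (timesteps, features) is declared on the TimeDistributed wrapper.
model = Sequential([
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])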
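Conceptually, `TimeDistributed(Dense(...))` applies one shared Dense layer to every timestep separately, so a `(samples, timesteps, features)` input stays 3-D and the LSTM receives the `ndim=3` tensor it expects. A bare `Dense(8, input_dim=nb_features)` as the first layer instead declares a 2-D input `(samples, features)` and emits a 2-D output, which is what triggers the error in the question. The NumPy sketch below only illustrates that shape behaviour and is not Keras code; the sizes and random weights are made up for illustration.

```python
import numpy as np

# Illustrative sizes: 3 samples, 4 timesteps, 5 features, Dense(8).
nb_samples, nb_timesteps, nb_features, units = 3, 4, 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(nb_samples, nb_timesteps, nb_features))
W = rng.normal(size=(nb_features, units))  # one weight matrix shared by all timesteps
b = rng.normal(size=units)                 # one shared bias vector


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def time_distributed_dense(x, W, b):
    """Apply the same Dense(W, b) + softmax to each timestep independently."""
    # matmul contracts only the last axis, so the sample and time axes are preserved
    return softmax(x @ W + b)


out = time_distributed_dense(x, W, b)
print(out.shape)  # (3, 4, 8): still 3-D, so an LSTM layer can consume it
```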

【Comments】:

【Answer 2】:

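You are still missing a preprocessing step before feeding the data into the LSTM. You have to decide how many previous data samples (previous days) to include when computing the current day's AdjClose. See my answer here for how to do that. Your data should then have a 3-dimensional shape (nb_samples, nb_included_previous_days, features).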
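The preprocessing step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: rows are sorted oldest first, `features` has shape `(days, features)`, `target` holds the Adj Close series, and there are more days than `nb_previous_days`. The function name `make_windows` and the toy data are hypothetical. Each sample pairs `nb_previous_days` consecutive rows with the target of the following day, so there is exactly one output per sample.

```python
import numpy as np


def make_windows(features, target, nb_previous_days):
    """Hypothetical helper building (X, y) for next-day regression.

    X[i] holds rows i .. i + nb_previous_days - 1 of `features`;
    y[i] is target[i + nb_previous_days], the value on the following day.
    Assumes len(features) > nb_previous_days and rows sorted oldest first.
    """
    features = np.asarray(features, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    nb_samples = len(features) - nb_previous_days
    # Stacking copies each window, which is where the redundancy comes from.
    X = np.stack([features[i:i + nb_previous_days] for i in range(nb_samples)])
    # One target per sample, shaped (nb_samples, 1) for a Dense(1) output.
    y = target[nb_previous_days:nb_previous_days + nb_samples].reshape(-1, 1)
    return X, y


# Toy data: 10 days with 5 features each, plus a made-up target series.
feats = np.arange(50, dtype=np.float32).reshape(10, 5)
adj = np.arange(10, dtype=np.float32) + 100.0
X, y = make_windows(feats, adj, nb_previous_days=4)
print(X.shape, y.shape)  # (6, 4, 5) (6, 1)
```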

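You can then feed this 3D input into a standard LSTM layer with one output. Compare that value with y_train and try to minimize the error. Remember to choose a loss function suited to regression, such as mean squared error.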
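As a quick illustration of why the loss and the output activation matter here: the question's model ends in `Activation('sigmoid')`, whose output lies strictly between 0 and 1, while `Y_train` holds unscaled prices around 90. The sketch below computes mean squared error by hand and shows that a lower bound on the sigmoid's error is already in the thousands on those prices, whereas targets min-max scaled to [0, 1] (as the sine-wave example further down does with `MinMaxScaler`) are within reach; dropping the sigmoid for a linear output is the other common option. The four prices are the rounded Adj Close values from the sample table, and the helper names are hypothetical. The question's `metrics=['accuracy']` is likewise a classification metric and says little about a regression fit.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, a standard regression loss."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))


# Rounded Adj Close values from the sample table (unscaled prices).
y_true = np.array([90.52, 93.88, 93.49, 94.56])

# A sigmoid output is always below 1, so predicting exactly 1 everywhere is a
# lower bound on the error any sigmoid output can reach on these targets.
sigmoid_lower_bound = mse(y_true, np.ones_like(y_true))

# Min-max scaling maps the targets into [0, 1], a range a sigmoid can approach.
lo, hi = y_true.min(), y_true.max()
y_scaled = (y_true - lo) / (hi - lo)

print(sigmoid_lower_bound > 8000, mse(y_scaled, y_scaled))  # True 0.0
```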

【Comments】:

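So, compared to the linked example, the number of samples in my case is 1. If I only want to use the last 5 inputs/timesteps to predict the output, my data should have the shape (8900, 5, 6). Doesn't that mean a redundancy factor of almost 5 in the stored data?!
Yes, it does. In this example the redundancy should not be a problem, since an (8900, 5, 6) float dataset only takes up about 1 MB of RAM. When working with larger datasets (especially ones with more features), you may want to consider storing the actual values in a lookup table and only referencing them in the LSTM's input. The Keras Embedding layer can help you with that.
Well, if the redundancy can't be avoided, so be it. I have now used your code to preprocess X_train and Y_train. Each sample of X_train is now a sequence of 4 timesteps with 5 features. Correspondingly, there are 4 Y values (as sample output, one per timestep). Please have a look at the updated code. It doesn't work. Keras says "Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2". By the way, the data is freely available from Yahoo Finance: real-chart.finance.yahoo.com/table.csv?s=AAPL&a=11&b=12&c=1980&d=04&e=23&f=2016&g=d&ignore=.csv
Each sample should contain 4 timesteps with 5 features each as input, and only 1 output, which should be the next day's AdjClose. After all, that is what you want to predict.
As for your error, you probably made a mistake while reshaping the input data. Check X_train.shape before calling fit() to make sure it has the shape (nb_training_samples, nb_included_previous_days, features).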
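To put numbers on the redundancy discussed in these comments: copying every window multiplies storage by roughly the window length, while NumPy can expose the same windows as a strided view that shares memory with the original array. The sketch below uses zeros as placeholder data in the `(8900, 6)` shape from the discussion and assumes NumPy 1.20 or newer for `sliding_window_view`. This is a different technique from the Embedding-based lookup suggested above, and whether a training framework later copies the view into contiguous batches is a separate matter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

nb_days, nb_previous_days, nb_features = 8900, 5, 6
data = np.zeros((nb_days, nb_features), dtype=np.float32)  # placeholder values
nb_windows = nb_days - nb_previous_days + 1

# Materialised copy: about nb_previous_days times the size of `data`.
copied = np.stack([data[i:i + nb_previous_days] for i in range(nb_windows)])
print(copied.shape, copied.nbytes)  # (8896, 5, 6) 1067520

# Zero-copy alternative: a read-only strided view onto `data`.
# The window axis is appended last, so swap it in front of the feature axis.
windows = sliding_window_view(data, window_shape=nb_previous_days, axis=0)
windows = windows.transpose(0, 2, 1)  # -> (windows, timesteps, features)
print(windows.shape, np.shares_memory(windows, data))  # (8896, 5, 6) True
```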

【Answer 3】:

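Not sure whether this is still relevant, but Dr. Jason Brownlee's blog has a good example here of how to predict time series with an LSTM network.

I prepared an example with three noisy, phase-shifted sine curves of different amplitudes. It is not market data, but I assume you are working on the premise that one stock tells you something about another.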

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Reshape
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# generate a noisy sine wave
def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
    x = numpy.arange(_start, _stop, step = _step)
    noise = numpy.random.uniform(-0.1, 0.1, size = len(x))
    y = gain*0.5*numpy.sin(x+_phase_shift)
    y = numpy.add(noise, y)
    return x, y
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, look_ahead=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        b = dataset[(i + look_back):(i + look_back + look_ahead), :]
        dataY.append(b)
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# generate sine wave
# use 1.0/24 so the step is not truncated to 0 under Python 2 integer division
x1, y1 = make_sine_with_noise(0, 200, 1.0/24, 0, 1)
x2, y2 = make_sine_with_noise(0, 200, 1.0/24, math.pi/4, 3)
x3, y3 = make_sine_with_noise(0, 200, 1.0/24, math.pi/2, 20)
# plt.plot(x1, y1)
# plt.plot(x2, y2)
# plt.plot(x3, y3)
# plt.show()
#transform to pandas dataframe
dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
#split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 10
look_ahead = 5
trainX, trainY = create_dataset(train, look_back, look_ahead)
testX, testY = create_dataset(test, look_back, look_ahead)
print(trainX.shape)
print(trainY.shape)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], trainX.shape[2]))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], testX.shape[2]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
model.add(LSTM(look_ahead))  # input shape is inferred from the previous LSTM layer
model.add(Dense(trainY.shape[1]*trainY.shape[2]))
model.add(Reshape((trainY.shape[1], trainY.shape[2])))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)
# make prediction
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

#save model
model.save('my_sin_prediction_model.h5')

trainPredictPlottable = trainPredict[::look_ahead]
trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))
# create a single testPredict array by concatenating every 'look_ahead'-th prediction array
testPredictPlottable = testPredict[::look_ahead]
testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
# testPredictPlottable = testPredictPlottable[:-look_ahead]
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredictPlottable)+look_back, :] = trainPredictPlottable
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(dataset)-len(testPredictPlottable):len(dataset), :] = testPredictPlottable
# plot baseline and predictions
dataset = scaler.inverse_transform(dataset)
plt.plot(dataset, color='k')
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
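One detail in `create_dataset` above: a window starting at index `i` only needs `i + look_back + look_ahead <= len(dataset)`, so there are `len(dataset) - look_back - look_ahead + 1` complete windows, but the loop bound `len(dataset) - look_back - look_ahead - 1` stops two short of that. The sketch below, on a small made-up array, compares the answer's function with a hypothetical variant `create_dataset_all` (not part of the answer) that keeps every complete window. With thousands of samples the difference is negligible; it only matters if you need to know exactly which targets are covered.

```python
import numpy


def create_dataset(dataset, look_back=1, look_ahead=1):
    # Copied from the answer above, including its "- 1" in the loop bound.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        dataX.append(dataset[i:(i + look_back), :])
        dataY.append(dataset[(i + look_back):(i + look_back + look_ahead), :])
    return numpy.array(dataX), numpy.array(dataY)


def create_dataset_all(dataset, look_back=1, look_ahead=1):
    # Hypothetical variant: iterate over every complete (input, target) window.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead + 1):
        dataX.append(dataset[i:(i + look_back), :])
        dataY.append(dataset[(i + look_back):(i + look_back + look_ahead), :])
    return numpy.array(dataX), numpy.array(dataY)


data = numpy.arange(60, dtype='float32').reshape(20, 3)  # 20 steps, 3 series
X1, Y1 = create_dataset(data, look_back=10, look_ahead=5)
X2, Y2 = create_dataset_all(data, look_back=10, look_ahead=5)
print(X1.shape, Y1.shape)  # (4, 10, 3) (4, 5, 3)
print(X2.shape, Y2.shape)  # (6, 10, 3) (6, 5, 3)
```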

【Comments】: