【Posted】: 2021-01-15 10:03:20
【Question】:
I need to forecast the workload of a data center with N virtual machines. The data is structured as follows:
id,date,hour,dayofweek,cpu,ram,ram_tot,users,id_vm
5fff03b99b56dba65a873e2a,2020-12-14,00:00,1,2,820,8000,10,1
5fff03ba9b56dba65a873e2c,2020-12-14,00:00,1,2,2458,16000,1,2
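The fields are: id, date, hour, day of week (1-7), number of CPUs of the VM, RAM used, total RAM, number of users connected to the VM, and VM id (1 or 2). The data is loaded into a pandas DataFrame.
In the DataFrame I built a column named peak whose value is 1 when the VM is under heavy load (the percentage of RAM in use is very high, >80%) and 0 otherwise. I then built a time-series dataset and normalized it.
I built an LSTM network, with a training and a test phase, to predict whether a workload peak will occur (the target variable is peak). The validation results are very poor: the predicted values are much lower than the actual ones. I would expect that, if the network predicted peaks well, the predicted values for peaks would be close to 1.
Here is my code: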
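# imports assumed by the code below (not shown in the original post)
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# read data from MongoDB (list_cur) into a pandas DataFrame
df = DataFrame(list_cur)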
# calc for %mem used
df['pmem'] = (df['ram']/df['ram_tot'])*100
conditions = [(df['pmem'] <= 80), (df['pmem'] > 80)] #80
values = [0, 1]
df['peak'] = np.select(conditions, values)
df['datetime'] = df['data'] + ' ' + df['ora']
# extract hour and minutes to build 2 new columns
df[['hh','mm']] = df.ora.str.split(":", expand=True,)
# dataset with 6 features and 1 label
# every row of the dataset = 1 observation
dataset = df[['hh', 'mm', 'dayofweek', 'users', 'pmem', 'id_app', 'peak']]
# normalization of the dataset
sc = MinMaxScaler(feature_range = (0, 1))
dfn = sc.fit_transform(dataset)
# build temporal series
x = []
y = []
n_steps = 192
for i in range(len(dfn)):
    # find the end of this pattern
    end_ix = i + n_steps
    # check if we are beyond the sequence
    if end_ix > len(dfn)-1:
        break
    # gather input and output parts of the pattern
    seq_x, seq_y = dfn[i:end_ix, 0:5], dfn[end_ix, 6]
    x.append(seq_x)
    y.append(seq_y)
# splitting dataset in train and test
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
# convert in arrays
X_train = np.asarray(X_train, dtype=np.float32)
X_test = np.asarray(X_test, dtype=np.float32)
y_train = np.asarray(y_train, dtype=np.float32)
y_test = np.asarray(y_test, dtype=np.float32)
# LSTM neural network model
model = Sequential()
#Adding the first LSTM layer and some Dropout regularisation
model.add(LSTM(units = 6, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
model.add(LSTM(units = 32, return_sequences = True))
model.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
model.add(LSTM(units = 64, return_sequences = True))
model.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
model.add(LSTM(units = 32))
model.add(Dropout(0.2))
# Adding the output layer
model.add(Dense(units = 1))
model.summary()
# Compiling the LSTM
model.compile(loss = 'categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Fitting the LSTM to the Training set
history = model.fit(X_train, y_train, epochs = 5, batch_size = 32, validation_data=(X_test, y_test))
# keep the evaluation result so the printed values are the test loss and accuracy
results = model.evaluate(X_test, y_test, verbose=1, return_dict=True)
print("test loss, test acc:", results)
print("Generate predictions for all samples")
yhat = model.predict(X_test, verbose=1)
plt.figure(figsize=(20, 10))
y1 = np.array(y_test)
y2 = np.array(yhat[:, 0])
plt.plot(y1, label = "Test", marker="o", linewidth=0)
plt.plot(y2, label = "Previsto", marker="x",)
plt.xlabel('x - axis')
# Set the y axis label of the current axis.
plt.ylabel('y - axis')
# Set a title of the current axes.
plt.title('Two or more lines on same plot with suitable legends ')
# show a legend on the plot
plt.legend()
# Display a figure.
plt.show()
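The windowing loop in the code above can also be written as a small helper, which makes the resulting shapes easier to check. This is only a sketch, not the poster's code: `make_windows`, `feature_cols`, and `label_col` are hypothetical names, and the toy array stands in for `dfn`.

```python
import numpy as np

def make_windows(data, n_steps, feature_cols, label_col):
    """Build (samples, n_steps, features) inputs and next-step labels."""
    x, y = [], []
    for i in range(len(data) - n_steps):
        end_ix = i + n_steps
        x.append(data[i:end_ix, feature_cols])  # the n_steps rows before end_ix
        y.append(data[end_ix, label_col])       # label of the row that follows
    return np.asarray(x, dtype=np.float32), np.asarray(y, dtype=np.float32)

# Toy data: 10 rows, 7 columns (6 feature columns + 1 label column), as in the post.
toy = np.arange(70, dtype=np.float32).reshape(10, 7)
X, y = make_windows(toy, n_steps=3, feature_cols=slice(0, 5), label_col=6)
print(X.shape, y.shape)
```

Here `slice(0, 5)` mirrors the `0:5` used in the post and selects five columns; if the sixth feature column (`id_app`) is meant to be an input as well, the slice would need to be `slice(0, 6)`.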
These are my results.
What is wrong?
【Comments】:
- Try a sigmoid activation on the last layer and a binary cross-entropy loss.
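The suggestion above can be sketched as follows. This is a minimal, hedged example rather than a verified fix for the post's data: the helper name `build_peak_classifier`, the layer sizes, the toy random arrays, and the 0.5 threshold are all assumptions made for illustration. The idea is that a single output unit with a sigmoid yields a value in (0, 1) that can be read as a peak probability, and `binary_crossentropy` matches a single 0/1 label, whereas `categorical_crossentropy` expects one output per class and so gives little or no training signal with one unit.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

def build_peak_classifier(n_steps, n_features):
    """Small LSTM binary classifier; layer sizes are illustrative, not tuned."""
    model = Sequential([
        Input(shape=(n_steps, n_features)),
        LSTM(32, return_sequences=True),
        Dropout(0.2),
        LSTM(32),
        Dropout(0.2),
        # Sigmoid squashes the single output into (0, 1): a peak probability.
        Dense(1, activation="sigmoid"),
    ])
    # Binary cross-entropy matches one 0/1 label per window.
    model.compile(loss="binary_crossentropy", optimizer="rmsprop",
                  metrics=["accuracy"])
    return model

# Toy random data standing in for X_train / y_train (hypothetical shapes).
rng = np.random.default_rng(0)
X_demo = rng.random((16, 12, 5)).astype(np.float32)
y_demo = rng.integers(0, 2, size=16).astype(np.float32)

model = build_peak_classifier(n_steps=12, n_features=5)
model.fit(X_demo, y_demo, epochs=1, batch_size=8, verbose=0)

probs = model.predict(X_demo, verbose=0)[:, 0]  # values in [0, 1]
labels = (probs > 0.5).astype(int)              # 0.5 threshold is an assumption
```

With the post's data, `n_steps` and `n_features` would be `X_train.shape[1]` and `X_train.shape[2]`, and `probs` is what would be plotted against `y_test`.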
Tags: python tensorflow lstm forecasting