【Posted】: 2020-10-06 19:42:34
【Question】:
import pandas as pd
import numpy as np
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps, n_test):
    X, y = list(), list()
    for i in range(0,len(sequences),100):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if i!=0 and end_ix > len(sequences):
            break
        sequences[i:end_ix,0]=np.insert(np.diff(sequences[i:end_ix,0]),0,0)
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix-n_test], sequences[end_ix-n_test:end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
df = pd.read_csv('time-series-19-covid-combined.csv')
df = df.drop(['Lat','Long'], axis = 1)
df.columns = ['day','country', 'territory','confirmed','recovered','deaths']
data=df[df.country.isin(['Australia','Costa Rica','Greece','Hungary','Israel'])][['confirmed','recovered','deaths']]
is_brazil = (df['country']=='Brazil')
data2=df[(is_brazil)][['confirmed','recovered','deaths']]
date=df[(is_brazil)][['day','confirmed']]
date.day = pd.to_datetime(date.day,format='%Y%m%d', errors='ignore')
date.set_index('day', inplace=True)
n_features = data.shape[1] # this is number of parallel inputs
n_timesteps = date.shape[0] # this is number of timesteps
n_test = int(n_timesteps*0.25)
X, Y = split_sequences(data.values, n_timesteps, n_test)
#normalization#####################################################
alld=np.concatenate((X,Y),1)
alld=alld.reshape(alld.shape[0]*alld.shape[1],alld.shape[2])
scaler = MinMaxScaler()
scaler.fit(alld)
X=[scaler.transform(x) for x in X]
y=[scaler.transform(y) for y in Y]
X=np.array(X)
y=np.array(y)[:,:,0]
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_timesteps - n_test, n_features)))
model.add(Dense(y.shape[1]))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=1)
# evaluation
data2x=data2
truth = data2
data2x.values[0:len(data2x),0]=np.insert(np.diff(data2x.values[0:len(data2x),0]),0,0)
data2x=scaler.transform(data2x)
X_test = np.expand_dims(data2x, axis=0)
yhat = model.predict(X_test[:,-n_timesteps + n_test:,:], verbose=0)
print (data2x[-n_timesteps + n_test:,0], yhat)
actual_predictions = scaler.inverse_transform(np.tile(yhat, (1, 1, 3))[0])[:,0]
Sizes and values:
X: float64 array (16, 108, 3)
X_test: float64 array (1, 144, 3)
Y: float64 array (16, 36, 3)
alld: float64 array (2304, 3)
data: DataFrame (1728, 3)
data2: DataFrame (144, 3)
data2x: float64 array (144, 3)
date: DataFrame (144, 1)
df: DataFrame (38448, 6)
is_brazil: Series (38448,)
n_features: 3 (int)
n_test: 36 (int)
n_timesteps: 144 (int)
truth: DataFrame (144, 3)
y: float64 array (16, 36)
yhat: float32 array (1, 36)
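What I intend to do in my project is train an LSTM on the confirmed cases, recovered patients and deaths from certain countries/regions, and then try to predict the case counts of another country. For example: train the LSTM on data from Australia, Costa Rica, Greece, Hungary and Israel, and try to predict the number of cases in Brazil.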
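I found the original code here and tried to port it to Keras, but on the last line of the code above, where I try to undo the normalization, I get the error: ValueError: operands could not be broadcast together with shapes (1,108) (3,) (1,108)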
I don't know what can be done to fix this. I searched other threads, without success. Any solution would be greatly appreciated.
Best regards,
海高.
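The error comes from the shapes on the last line. yhat has shape (1, 36): one 36-step forecast for column 0 (confirmed) only. np.tile(yhat, (1, 1, 3)) has shape (1, 1, 108), so [0] passes inverse_transform a single row with 108 columns, while the scaler was fitted on 3 columns; that is the (1,108) vs (3,) mismatch. A minimal sketch of one way around it, assuming only column 0 needs to be recovered: put the 36 predictions in column 0 of a (36, 3) array with placeholder values in the other columns, invert, and keep column 0. inverse_first_column is a hypothetical helper name, and the arrays below are random stand-ins with the question's sizes, not the COVID data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def inverse_first_column(scaler, yhat, n_features):
    """Undo MinMax scaling for a forecast of feature 0 only.

    yhat: shape (1, n_steps), one scaled forecast for column 0.
    The scaler was fitted on n_features columns, so inverse_transform needs
    a (n_samples, n_features) array: one row per forecast step.
    """
    steps = np.asarray(yhat, dtype=float).reshape(-1)  # (n_steps,)
    padded = np.zeros((steps.shape[0], n_features))
    padded[:, 0] = steps  # columns 1.. are placeholders and are discarded
    return scaler.inverse_transform(padded)[:, 0]


# Random stand-ins with the question's sizes (alld: (2304, 3), yhat: (1, 36)).
rng = np.random.default_rng(0)
alld = rng.uniform(0, 1000, size=(2304, 3))
scaler = MinMaxScaler().fit(alld)
yhat = rng.uniform(0, 1, size=(1, 36)).astype(np.float32)

print(np.tile(yhat, (1, 1, 3))[0].shape)  # (1, 108): the shape that fails

actual_predictions = inverse_first_column(scaler, yhat, n_features=3)
print(actual_predictions.shape)  # (36,)

# Equivalent without padding: MinMaxScaler inverts each column as
# (x - min_) / scale_, so column 0 only needs min_[0] and scale_[0].
manual = (yhat.reshape(-1).astype(float) - scaler.min_[0]) / scaler.scale_[0]
print(np.allclose(actual_predictions, manual))  # True
```

Either form returns column 0 in its original units. Since the question differences column 0 (np.diff) before scaling, these values would be daily increments; if cumulative totals are wanted, a np.cumsum starting from a known count would still be needed.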
【Discussion】:
Tags: python arrays pandas numpy keras