【Question title】: How to loop through various train and test splits
【Posted】: 2020-12-21 22:36:27
【Question】:

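I created several train and test splits with TimeSeriesSplit(). My data has 377 observations, with 6 input variables and 1 target variable.

I split the data into train and test sets with the following code: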

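from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)

# inputs (X): split() yields index arrays, so len() gives the size of each split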
i=0
for X_train, X_test in tscv.split(data):
    i=i+1
    print ("No of observations under train%s=%s"%(i,len(X_train)))
    print ("No of observations under test%s=%s" % (i, len(X_test)))

X_train1, X_test1 = data[:67, :-1],  data[67:129,:-1]
X_train2, X_test2 = data[:129,:-1], data[129:191,:-1]
X_train3, X_test3 = data[:191,:-1], data[191:253,:-1]
X_train4, X_test4 = data[:253,:-1], data[253:315,:-1]
X_train5, X_test5 = data[:315,:-1], data[315:377,:-1]

# targets (y): same splits, taken from the last column of data
i=0
for y_train, y_test in tscv.split(data):
    i=i+1
    print ("No of observations under train%s=%s"%(i,len(y_train)))
    print ("No of observations under test%s=%s" % (i, len(y_test)))

y_train1, y_test1 = data[:67, -1], data[67:129 ,-1]
y_train2, y_test2 = data[:129,-1], data[129:191,-1]
y_train3, y_test3 = data[:191,-1], data[191:253,-1]
y_train4, y_test4 = data[:253,-1], data[253:315,-1]
y_train5, y_test5 = data[:315,-1], data[315:377,-1]

So I have 5 splits in total. I want to train my LSTM model on each of these splits, but I'm not sure of the best way to do it. Here is the code for my LSTM:

# split into input and outputs
train_X, train_y = X_train, y_train
test_X, test_y = X_test, y_test

#reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,LSTM, Flatten
import matplotlib.pyplot as pyplot
# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
history = model.fit(train_X, train_y, epochs=700
                    , batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

#predictions
y_lstm = model.predict(test_X)

#metrics for test set
mse_lstm = mean_squared_error(y_test, y_lstm)
rmse_lstm = np.sqrt(mse_lstm)
r2_lstm = r2_score(y_test, y_lstm)
mae_lstm = mean_absolute_error(y_test, y_lstm)

#train metrics
train     = model.predict(train_X)
msetrain  = mean_squared_error(y_train, train)
rmsetrain = np.sqrt(msetrain)
r2train   = r2_score(y_train, train)

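How can I loop through all the different splits with the code above and store the results in a list or a dataframe?

I would also like to plot the predictions as in the figure below.

This is the plot based on @Ashraful's answer.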

【Comments】:

    Tags: python pandas keras


    【Solution 1】:

    Replace your last code block with this:

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    import numpy as np
    import pandas as pd   # needed for pd.read_csv at the end
    import csv
    
    Round = 3      # define the number of digits after decimal point you want 
    
    fields = ['Fold_No', 'mse_lstm', 'rmse_lstm', 'r2_lstm','mae_lstm']  
    csvfile = open('Summary.csv', 'w', newline='')   # newline='' avoids blank rows on Windows
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    
    
    for fold in range(1,6):
        print(f'Running fold {fold}')
        # split into input and outputs
        train_X, train_y = eval(f'X_train{fold}'),eval(f'y_train{fold}')
        test_X, test_y = eval(f'X_test{fold}'),eval(f'y_test{fold}')
        print(train_X.shape)
    
    
    
        #reshape input to be 3D [samples, timesteps, features]
        train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
        test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Dense,LSTM, Flatten
        import matplotlib.pyplot as pyplot
        # design network
        model = Sequential()
        model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
        model.add(Dense(1))
        model.compile(loss='mae', optimizer='adam')
        # only 2 epochs here; the question used epochs=700
        history = model.fit(train_X, train_y, epochs=2,
                            batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
    
        # plot history
        pyplot.plot(history.history['loss'], label='train')
        pyplot.plot(history.history['val_loss'], label='test')
        pyplot.legend()
        pyplot.show()
    
        #predictions
        train_output =  model.predict(train_X)
        y_lstm = model.predict(test_X)
    
        pyplot.plot(train_output, label='Training output')
        pyplot.plot(train_y, label='Observed Training Target')
        # pyplot.plot(train_y, label='Training value')
        pyplot.plot(test_y, label='Observed Test Target')
        pyplot.plot(y_lstm, label='Predicted Output')
        pyplot.legend(loc='upper right')
        # pyplot.legend()
        pyplot.show()
        
        #metrics for test set
        # compare against the current fold's targets (test_y), not always fold 1
        mse_lstm = mean_squared_error(test_y, y_lstm)
        rmse_lstm = np.sqrt(mse_lstm)
        r2_lstm = r2_score(test_y, y_lstm)
        mae_lstm = mean_absolute_error(test_y, y_lstm)
    
        csvwriter.writerow([f'Fold_{fold}',round(mse_lstm,Round), round(rmse_lstm,Round), round(r2_lstm,Round),round(mae_lstm,Round)]) 
    
    
    csvfile.close()
    
    #read stored CSV file
    summary= pd.read_csv('Summary.csv')
    
    print(summary)
    

    You can also find my implementation in a Colab notebook here
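The same loop can also be written without `eval` and without the numbered `X_train1` ... `y_test5` variables, by slicing with the index arrays that `tscv.split` yields and collecting one row of metrics per fold in a DataFrame. What follows is only a sketch under stated assumptions: `evaluate_splits`, `make_model`, and `demo_data` are hypothetical names, the data is random, and scikit-learn's `LinearRegression` stands in for the LSTM so the example runs without TensorFlow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import TimeSeriesSplit


def evaluate_splits(data, make_model, n_splits=5):
    """Fit a fresh model on each fold and return one row of test metrics per fold.

    `data` is a 2D array whose last column is the target.
    `make_model` is a hypothetical factory returning an object with fit/predict.
    """
    X, y = data[:, :-1], data[:, -1]
    rows = []
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
        model = make_model()  # new model per fold, so folds do not share weights
        model.fit(X[train_idx], y[train_idx])
        y_pred = np.ravel(model.predict(X[test_idx]))
        mse = mean_squared_error(y[test_idx], y_pred)
        rows.append({
            "fold": fold,
            "n_train": len(train_idx),
            "n_test": len(test_idx),
            "mse": mse,
            "rmse": np.sqrt(mse),
            "r2": r2_score(y[test_idx], y_pred),
            "mae": mean_absolute_error(y[test_idx], y_pred),
        })
    return pd.DataFrame(rows)


# Random stand-in for the question's array: 377 rows, 6 inputs + 1 target.
rng = np.random.default_rng(0)
demo_data = rng.normal(size=(377, 7))
summary = evaluate_splits(demo_data, LinearRegression)
print(summary.round(3))
```

To use this with the LSTM, `make_model` would have to return an object whose `fit` and `predict` reshape the inputs to 3D and wrap the Keras model; that wrapper is not shown here.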

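    【Comments】:

    • This works well, thank you. But how would I plot the predictions for all values, so I can see how the model is predicting? I have edited the question to show an example plot.
    • I updated the question to show the plot I get after adding the update. I get that plot after every fold, but I want to draw the plot once at the end, like the image shown in the question.
    • Is there a way to save the train predictions from each fold into a dataframe? For example, for fold 1 I take the train predictions and save them, for fold 2 I append to the train-prediction dataframe, for fold 3 I append again, and so on.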
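One possible way to address the two follow-up comments above is to append each fold's train and test predictions to a single DataFrame inside the loop, and to draw the figure only once after the loop has finished. This is a sketch, not the answerer's code: `collect_predictions`, `plot_test_predictions`, `make_model`, and `demo_data` are hypothetical names, the data is random, and `LinearRegression` is again a stand-in for the LSTM.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit


def collect_predictions(data, make_model, n_splits=5):
    """Return one long DataFrame with the train and test predictions of every fold."""
    X, y = data[:, :-1], data[:, -1]
    frames = []
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        for part, idx in (("train", train_idx), ("test", test_idx)):
            frames.append(pd.DataFrame({
                "fold": fold,
                "part": part,
                "row": idx,  # position of the observation in the original data
                "observed": y[idx],
                "predicted": np.ravel(model.predict(X[idx])),
            }))
    return pd.concat(frames, ignore_index=True)


def plot_test_predictions(pred_df):
    """Draw observed vs. predicted test values of all folds in a single figure."""
    import matplotlib.pyplot as pyplot  # imported here so the rest runs without it

    test = pred_df[pred_df["part"] == "test"].sort_values("row")
    pyplot.plot(test["row"], test["observed"], label="Observed")
    pyplot.plot(test["row"], test["predicted"], label="Predicted")
    pyplot.xlabel("Row index")
    pyplot.legend(loc="upper right")
    pyplot.show()


# Random stand-in for the question's array: 377 rows, 6 inputs + 1 target.
rng = np.random.default_rng(0)
demo_data = rng.normal(size=(377, 7))
pred_df = collect_predictions(demo_data, LinearRegression)
print(pred_df.groupby(["fold", "part"]).size())
# plot_test_predictions(pred_df)  # uncomment to draw one figure after all folds
```

Because the test windows of `TimeSeriesSplit` do not overlap, the test rows form one continuous series from row 67 to row 376, which is what makes a single end-of-run plot possible. The train rows do overlap across folds, so filter on `fold` before plotting them.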