TensorFlow - How to read the prediction

【Question title】: TensorFlow - How to read the prediction
【Posted】: 2018-11-20 04:10:35
【Question description】:

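I have a problem. For a school project I built a recurrent neural network (RNN) that should predict whether a stock price will go up or down. The data comes from a CSV file. Training went well, so I am ready to run predictions on some test data. The RNN returns many results, because it makes multiple predictions over the week. Here is my code: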

import io
import requests
import os
import time
import random

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from sklearn.metrics import mean_absolute_error
from sklearn import preprocessing
from collections import deque

#Constant Variables
SEQ_LEN = 30
FUTURE_PERIOD_PREDICT = 3
RATIO_TO_PREDICT = "LTC-USD"
BATCH_SIZE = 64
NAME = str(RATIO_TO_PREDICT) + "-" + str(SEQ_LEN) + "-SEQ-" + str(FUTURE_PERIOD_PREDICT) + "-PRED-" + str(int(time.time()))
ACTIONS = ["Sell", "Buy"]

def classify(current, future):
    if float(future) > float(current):
        return 1
    else:
        return 0

def preprocess_df(df):
    # Use the columns keyword; the positional axis argument was removed in pandas 2.0
    df = df.drop(columns='future')

    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            #df[col] = preprocessing.scale(df[col].values)

    df.dropna(inplace=True)

    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)



    for i in df.values:
        prev_days.append([n for n in i[:-1]])
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[-1]])

    buys = []
    sells = []

    for seq, target in sequential_data:
        if target == 0:
            sells.append([seq, target])
        elif target == 1:
            buys.append([seq, target])


    random.shuffle(buys)
    random.shuffle(sells)

    lower = min(len(buys), len(sells))


    buys = buys[:lower]
    sells = sells[:lower]


    sequential_data = buys+sells

    x = []
    y = []

    for seq, target in sequential_data:
        x.append(seq)
        y.append(target)

    return np.array(x), y




main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "ETH-USD"]
for ratio in ratios:


    url="https://www.test.nl/get_csv_data_onscreen.php?method=test&ratio=" + str(ratio)
    dataset = requests.get(url, verify=False).content
    df = pd.read_csv(io.StringIO(dataset.decode('utf-8')), names=["time", "low", "high", "open", "close", "volume", "rsi14", "ma5", "ema5", "ema12", "ema20", "macd", "signal"])

    df.rename(columns={"close": str(ratio)+"_close", "volume": str(ratio) + "_volume", "rsi14": str(ratio) + "_rsi14", "ma5": str(ratio) + "_ma5", "ema5": str(ratio) + "_ema5", "ema12": str(ratio) + "_ema12", "ema20": str(ratio) + "_ema20", "macd": str(ratio) + "_macd", "signal": str(ratio) + "_signal"}, inplace=True)

    df.set_index("time", inplace=True)
    df = df[[str(ratio) + "_close", str(ratio) + "_volume", str(ratio) + "_rsi14", str(ratio) + "_ma5", str(ratio) + "_ema5", str(ratio) + "_ema12", str(ratio) + "_ema20", str(ratio) + "_macd", str(ratio) + "_signal"]]

    if len(main_df) == 0:
        main_df = df
    else:
        main_df = main_df.join(df)


main_df['future'] = main_df[str(RATIO_TO_PREDICT) + "_close"].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[str(RATIO_TO_PREDICT) + "_close"], main_df["future"]))
#print(main_df[[str(RATIO_TO_PREDICT) + "_close", "future", "target"]].head(10))


times = sorted(main_df.index.values)
last_5pct = times[-int(0.05*len(times))]

validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]

test_x, test_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df)

# Raw string so the backslash in the Windows path is not treated as an escape
model = tf.keras.models.load_model(r"models\Crypto_Model_0.6337.h5")

predictions = model.predict(test_x)
print(predictions)
# Round instead of truncating: int() alone maps every value below 1.0 to 0 ("Sell")
print(ACTIONS[int(round(float(predictions[0][0])))])

So when I print the predictions, I get a list of numbers between 0 and 1. Here is a shortened version of the result:

[[ 0.61009574]
 [ 0.5243717 ]
 [ 0.56290686]
 [ 0.49165   ]
 [ 0.50527   ]
 [ 0.77428705]
 [ 0.62151164]
 [ 0.55098933]
 [ 0.45642132]
 [ 0.61239064]
 [ 0.69220203]
 [ 0.3707057 ]
 [ 0.5335519 ]
 [ 0.43078205]
 [ 0.57520276]
 [ 0.46626005]
 [ 0.37625414]
 [ 0.56013215]]
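
Values like these are typically read as the probability of class 1 ("Buy"). The sketch below assumes the model ends in a single sigmoid unit, which the question does not show, and uses a few of the printed values to turn probabilities into actions with a 0.5 threshold.

```python
import numpy as np

ACTIONS = ["Sell", "Buy"]

# A few of the values printed above, shape (n, 1) as returned by model.predict
predictions = np.array([[0.61009574], [0.5243717], [0.49165], [0.3707057]])

# Assumption: each value is P(target == 1). Threshold at 0.5 to get a class index.
labels = (predictions[:, 0] >= 0.5).astype(int)
actions = [ACTIONS[i] for i in labels]
print(actions)  # ['Buy', 'Buy', 'Sell', 'Sell']
```

The 0.5 cut-off is only a default; values close to it (such as 0.5053) carry very little signal either way.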

But which one belongs to the latest data point? For example, this is part of the list I uploaded:

1535782500,63.41,63.63,63.47,63.52,83505,55.104896,63.574000,63.586200,63.61220000,63.454000,0.31080000,0.44500684
1535783400,63.44,63.74,63.52,63.62,95980,56.921744,63.578000,63.597500,63.61340000,63.469800,0.28840000,0.41370000
1535784300,63.62,63.86,63.64,63.81,71996,60.216065,63.616000,63.668300,63.64360000,63.502200,0.28270000,0.38750000
1535785200,63.71,64.00,63.83,63.82,101652,60.387764,63.644000,63.718900,63.67070000,63.532500,0.27580000,0.36520000
1535786100,63.64,63.87,63.82,63.84,78686,60.752590,63.722000,63.759300,63.69670000,63.561800,0.26880000,0.34590000
1535787000,63.76,63.88,63.84,63.84,82486,60.752590,63.786000,63.786200,63.71870000,63.588300,0.26030000,0.32880000
1535787900,63.70,63.89,63.84,63.72,71654,57.093572,63.806000,63.764100,63.71890000,63.600800,0.24110000,0.31130000
1535788800,63.69,63.87,63.73,63.76,88931,58.001593,63.796000,63.762700,63.72520000,63.616000,0.22650000,0.29430000
1535789700,63.71,63.86,63.79,63.82,87103,59.389894,63.796000,63.781800,63.73980000,63.635400,0.21730000,0.27890000
1535790600,63.76,63.97,63.77,63.89,102919,61.009256,63.806000,63.817900,63.76290000,63.659600,0.21320000,0.26580000

I fed in 1 week of 15-minute data, which is 672 rows. So just to be clear...

Is the last value in the predictions the prediction for the last row of the CSV file?

【Comments】:

Does anyone else know the answer?

Tags: python tensorflow

【Solution 1】:

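Why are you shuffling sequential time data? The date/time index should be present in every row and tell you which date each prediction is for. Shuffling is not recommended for RNNs or LSTMs. It also looks like you are trying to apply reinforcement learning, which I would always advise against during training: you can get a few lucky actions, and the model ends up memorizing data points instead of learning something that generalizes.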
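
To make that concrete, here is a minimal sketch of building the windows in time order while remembering the timestamp of each window's last row. The helper name `build_ordered_windows` and the toy data are illustrative assumptions, not part of the original code; the point is that without shuffling, the last window (and therefore the last prediction) ends on the last row.

```python
from collections import deque

import numpy as np

SEQ_LEN = 3  # the question uses 30; kept small for the toy data


def build_ordered_windows(times, features, seq_len=SEQ_LEN):
    """Return (x, end_times) with one window per row once seq_len rows were seen."""
    window = deque(maxlen=seq_len)
    x, end_times = [], []
    for t, row in zip(times, features):
        window.append(row)
        if len(window) == seq_len:
            x.append(np.array(window))
            end_times.append(t)  # timestamp of the last row in this window
    return np.array(x), np.array(end_times)


# Toy data (made up): 6 rows at 15-minute (900 s) spacing, 2 feature columns
times = np.arange(1535782500, 1535782500 + 6 * 900, 900)
features = np.arange(12, dtype=float).reshape(6, 2)

x, end_times = build_ordered_windows(times, features)
print(x.shape)                     # (4, 3, 2)
print(end_times[-1] == times[-1])  # True
```

Passing `x` to `model.predict` in this order would give one prediction per entry of `end_times`. Rows removed earlier by `dropna` have to be removed from the timestamps as well so that both stay aligned.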

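【Discussion】:

The variable ACTIONS = ["Sell", "Buy"] made me think so. What is the purpose of the actions? To record backtesting possibilities? Before taking an action-based approach, you should validate your model with more statistical methods.
No, that is for the last part, which I had not added yet: print(ACTIONS[int(prediction[0][0])]). So if the price is going to be higher it is a Buy, otherwise a Sell.
But when I print the predictions I cannot print the time, so I do not know which one is the last prediction! Can you help me with that?
Remove this line, random.shuffle(sequential_data), to actually train on the series in order. Isn't the first column in the output data the prediction date?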
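
One way to print the time next to each prediction is to put both in a DataFrame. This is only a sketch: `end_times` is a hypothetical array holding the Unix timestamp of the last row of each input window, kept in the same (unshuffled) order as the predictions, and the values are copied from the samples in the question purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: timestamps of each window's last row, in order,
# and the matching model outputs of shape (n, 1)
end_times = np.array([1535788800, 1535789700, 1535790600])
predictions = np.array([[0.46626005], [0.37625414], [0.56013215]])

result = pd.DataFrame(
    {"probability_up": predictions[:, 0]},
    index=pd.to_datetime(end_times, unit="s"),  # Unix seconds -> datetime
)
print(result)

# The most recent prediction is simply the last row
print(result.index[-1], result["probability_up"].iloc[-1])
```

This only holds if the order of the sequences was never changed between reading the CSV and calling `model.predict`.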