【问题标题】:Tensorflow - How to read the predictionTensorflow - 如何阅读预测
【发布时间】:2018-11-20 04:10:35
【问题描述】:

我有问题。 对于一个学校项目,我创建了一个循环神经网络 (RNN),我想在其中预测股票价格是上涨还是下跌。我也有一些来自 CSV 文件的数据。训练进行得很顺利,所以我准备好预测一些测试。从 RNN 我得到了一些结果,因为它在一周内有多个预测。 这是我的代码:

import io
import requests
import os
import time
import random

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from sklearn.metrics import mean_absolute_error
from sklearn import preprocessing
from collections import deque

#Constant Variables
SEQ_LEN = 30
FUTURE_PERIOD_PREDICT = 3
RATIO_TO_PREDICT = "LTC-USD"
BATCH_SIZE = 64
NAME = str(RATIO_TO_PREDICT) + "-" + str(SEQ_LEN) + "-SEQ-" + str(FUTURE_PERIOD_PREDICT) + "-PRED-" + str(int(time.time()))
ACTIONS = ["Sell", "Buy"]

def classify(current, future):
    if float(future) > float(current):
        return 1
    else:
        return 0

def preprocess_df(df):
    df = df.drop('future', 1)

    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            #df[col] = preprocessing.scale(df[col].values)

    df.dropna(inplace=True)

    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)



    for i in df.values:
        prev_days.append([n for n in i[:-1]])
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[-1]])

    buys = []
    sells = []

    for seq, target in sequential_data:
        if target == 0:
            sells.append([seq, target])
        elif target == 1:
            buys.append([seq, target])


    random.shuffle(buys)
    random.shuffle(sells)

    lower = min(len(buys), len(sells))


    buys = buys[:lower]
    sells = sells[:lower]


    sequential_data = buys+sells

    x = []
    y = []

    for seq, target in sequential_data:
        x.append(seq)
        y.append(target)

    return np.array(x), y




main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "ETH-USD"]
for ratio in ratios:


    url="https://www.test.nl/get_csv_data_onscreen.php?method=test&ratio=" + str(ratio)
    dataset = requests.get(url, verify=False).content
    df = pd.read_csv(io.StringIO(dataset.decode('utf-8')), names=["time", "low", "high", "open", "close", "volume", "rsi14", "ma5", "ema5", "ema12", "ema20", "macd", "signal"])

    df.rename(columns={"close": str(ratio)+"_close", "volume": str(ratio) + "_volume", "rsi14": str(ratio) + "_rsi14", "ma5": str(ratio) + "_ma5", "ema5": str(ratio) + "_ema5", "ema12": str(ratio) + "_ema12", "ema20": str(ratio) + "_ema20", "macd": str(ratio) + "_macd", "signal": str(ratio) + "_signal"}, inplace=True)

    df.set_index("time", inplace=True)
    df = df[[str(ratio) + "_close", str(ratio) + "_volume", str(ratio) + "_rsi14", str(ratio) + "_ma5", str(ratio) + "_ema5", str(ratio) + "_ema12", str(ratio) + "_ema20", str(ratio) + "_macd", str(ratio) + "_signal"]]

    if len(main_df) == 0:
        main_df = df
    else:
        main_df = main_df.join(df)


main_df['future'] = main_df[str(RATIO_TO_PREDICT) + "_close"].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[str(RATIO_TO_PREDICT) + "_close"], main_df["future"]))
#print(main_df[[str(RATIO_TO_PREDICT) + "_close", "future", "target"]].head(10))


times = sorted(main_df.index.values)
last_5pct = times[-int(0.05*len(times))]

validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]

test_x, test_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df)

model = tf.keras.models.load_model("models\Crypto_Model_0.6337.h5")

predictions = model.predict(test_x)
print(predictions)
print(ACTIONS[int(prediction[0][0])])

所以当我打印预测时,我会得到一个围绕 0 和 1 的数字列表。这是结果的简短版本:

[[ 0.61009574]
 [ 0.5243717 ]
 [ 0.56290686]
 [ 0.49165   ]
 [ 0.50527   ]
 [ 0.77428705]
 [ 0.62151164]
 [ 0.55098933]
 [ 0.45642132]
 [ 0.61239064]
 [ 0.69220203]
 [ 0.3707057 ]
 [ 0.5335519 ]
 [ 0.43078205]
 [ 0.57520276]
 [ 0.46626005]
 [ 0.37625414]
 [ 0.56013215]]

但是最新的数据点是什么。例如,这是我上传的列表的一部分:

1535782500,63.41,63.63,63.47,63.52,83505,55.104896,63.574000,63.586200,63.61220000,63.454000,0.31080000,0.44500684
1535783400,63.44,63.74,63.52,63.62,95980,56.921744,63.578000,63.597500,63.61340000,63.469800,0.28840000,0.41370000
1535784300,63.62,63.86,63.64,63.81,71996,60.216065,63.616000,63.668300,63.64360000,63.502200,0.28270000,0.38750000
1535785200,63.71,64.00,63.83,63.82,101652,60.387764,63.644000,63.718900,63.67070000,63.532500,0.27580000,0.36520000
1535786100,63.64,63.87,63.82,63.84,78686,60.752590,63.722000,63.759300,63.69670000,63.561800,0.26880000,0.34590000
1535787000,63.76,63.88,63.84,63.84,82486,60.752590,63.786000,63.786200,63.71870000,63.588300,0.26030000,0.32880000
1535787900,63.70,63.89,63.84,63.72,71654,57.093572,63.806000,63.764100,63.71890000,63.600800,0.24110000,0.31130000
1535788800,63.69,63.87,63.73,63.76,88931,58.001593,63.796000,63.762700,63.72520000,63.616000,0.22650000,0.29430000
1535789700,63.71,63.86,63.79,63.82,87103,59.389894,63.796000,63.781800,63.73980000,63.635400,0.21730000,0.27890000
1535790600,63.76,63.97,63.77,63.89,102919,61.009256,63.806000,63.817900,63.76290000,63.659600,0.21320000,0.26580000

我输入了 1 周 15 分钟的数据,即 672 行。所以只是为了清楚......

预测的最后一个值是 csv 文件中最后一行的预测吗?

【问题讨论】:

  • 还有其他人知道这个遮阳篷吗?

标签: python tensorflow


【解决方案1】:

为什么要打乱顺序时间数据?日期/时间索引应该在每一行中,并告诉您它预测的日期。不强烈建议对 RNN 或 LSTM 进行洗牌。看起来你也在尝试应用强化学习,我总是建议避免在训练中使用,你可以获得一些幸运的动作,并且模型只会记住数据点,而不是泛化算法。

【讨论】:

  • ACTIONS = ["Sell", "Buy"] 这个变量让我这么想。行动的目的是什么?记录回测的可能性?在基于操作的方法之前,您应该使用更多统计方法来验证您的模型。
  • 不,那是我没有添加的最后一部分:print(ACTIONS[int(prediction[0][0])])。因此,如果价格会更高,则为买入,否则为卖出
  • 但是,如果我打印预测,我无法打印时间,所以我不知道哪个是最后一个预测!你能帮我解决这个问题吗?
  • 删除此行 random.shuffle(sequential_data) 以实际获得该系列的顺序训练。输出数据中的第一列不是预测日期吗?
猜你喜欢
  • 2016-05-06
  • 1970-01-01
  • 1970-01-01
  • 2020-09-14
  • 2018-02-04
  • 2020-08-18
  • 2017-02-21
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多