【Question title】: Failure to predict with ETS
【Posted】: 2021-09-18 14:03:44
【Question】:

Good morning, everyone. I am trying to forecast with ETS.

I have the following code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sktime.forecasting.ets import AutoETS


datos = [21.5294, 21.5228, 21.5289, 21.5096, 21.506, 21.5119, 21.5173, 21.5308, 21.5355, 21.5181, 21.5, 21.4972, 21.5067, 21.5149, 21.4994, 21.4967, 21.4774, 21.4662, 21.4752, 21.4858, 21.4581, 21.4398, 21.4385, 21.4471, 21.4399, 21.444, 21.4555, 21.4366, 21.4402, 21.4371, 21.4317, 21.4342, 21.411, 21.4174, 21.4149, 21.4151, 21.4186, 21.4411, 21.4569, 21.4628, 21.448, 21.4468, 21.4357, 21.4329, 21.4543, 21.4429, 21.4478, 21.4423, 21.4536, 21.4416, 21.4384, 21.4378, 21.4622, 21.4413, 21.4315, 21.4419, 21.4323, 21.429, 21.4103, 21.4194, 21.4364, 21.4245, 21.4348, 21.4276, 21.4113, 21.4235, 21.407, 21.412, 21.4263, 21.431, 21.4362, 21.432, 21.4445, 21.4487, 21.4623, 21.4766, 21.4785, 21.4891, 21.4869, 21.4903, 21.4839, 21.4856, 21.4909, 21.5048, 21.5005, 21.4905, 21.4906, 21.4914, 21.5052, 21.4898, 21.5232, 21.5234, 21.5086, 21.5108, 21.5017, 21.5141, 21.5055, 21.4953, 21.4618, 21.4504, 21.4667, 21.4602, 21.453, 21.4497, 21.4446, 21.4308, 21.4347, 21.4512, 21.4675, 21.4675, 21.465, 21.4624, 21.4682, 21.472, 21.4632, 21.4644, 21.4615, 21.4604, 21.4679, 21.4672]
indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")

datos = pd.Series(data=datos, index=indice)

datos = datos.asfreq(freq='T')


pasado = datos[:100]
futuro = datos[100:]


model_auto = AutoETS(auto=True, initialization_method='heuristic', allow_multiplicative_trend=True, n_jobs=-1, sp=10)
model_auto.fit(pasado)


# Forecasting horizon: steps 1 to 20 ahead of the training data
lista = list(range(1, 21))
prediccion = model_auto.predict(lista)

#print(pasado)
#print(futuro)
#print(prediccion)

pasado.plot()
futuro.plot()
prediccion.plot()
plt.show()

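The result is the following:

[Figure: Predict]

The blue line is the data I used to train the model.

The orange line is the "future" data.

The green line is the forecast, which should be close to the orange line.

I don't understand why the forecast is always the same value.

I'd like to hear your thoughts. Do you know why the forecast behaves this way, and how can I fix it?

Thanks.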

【Comments】:

    Tags: python python-3.x time-series prediction ets


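    【Answer 1】:

    This is not an error... I'm not an expert on the subject, but the short answer is: "it's because of the data set you have".

    The long answer is better given with an example... Imagine you had a different data set. If you agree, it could be: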

    datos = [
        30.05251300, 19.14849600, 25.31769200, 27.59143700,
        32.07645600, 23.48796100, 28.47594000, 35.12375300,
        36.83848500, 25.00701700, 30.72223000, 28.69375900,
        36.64098600, 23.82460900, 29.31168300, 31.77030900,
        35.17787700, 19.77524400, 29.60175000, 34.53884200,
        41.27359900, 26.65586200, 28.27985900, 35.19115300,
        42.20566386, 24.64917133, 32.66733514, 37.25735401,
        45.24246027, 29.35048127, 36.34420728, 41.78208136,
        49.27659843, 31.27540139, 37.85062549, 38.83704413,
        51.23690034, 31.83855162, 41.32342126, 42.79900337,
        55.70835836, 33.40714492, 42.31663797, 45.15712257,
        59.57607996, 34.83733016, 44.84168072, 46.97124960,
        60.01903094, 38.37117851, 46.97586413, 50.73379646,
        61.64687319, 39.29956937, 52.67120908, 54.33231689,
        66.83435838, 40.87118847, 51.82853579, 57.49190993,
        65.25146985, 43.06120822, 54.76075713, 59.83447494,
        73.25702747, 47.69662373, 61.09776802, 66.05576122]
    
    indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")
    
    datos = pd.Series(data=datos, index=indice)
            
    datos = datos.asfreq(freq='T')
    

    That way you would end up with code like this:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.exponential_smoothing.ets import ETSModel
        
    datos = [
            30.05251300, 19.14849600, 25.31769200, 27.59143700,
            32.07645600, 23.48796100, 28.47594000, 35.12375300,
            36.83848500, 25.00701700, 30.72223000, 28.69375900,
            36.64098600, 23.82460900, 29.31168300, 31.77030900,
            35.17787700, 19.77524400, 29.60175000, 34.53884200,
            41.27359900, 26.65586200, 28.27985900, 35.19115300,
            42.20566386, 24.64917133, 32.66733514, 37.25735401,
            45.24246027, 29.35048127, 36.34420728, 41.78208136,
            49.27659843, 31.27540139, 37.85062549, 38.83704413,
            51.23690034, 31.83855162, 41.32342126, 42.79900337,
            55.70835836, 33.40714492, 42.31663797, 45.15712257,
            59.57607996, 34.83733016, 44.84168072, 46.97124960,
            60.01903094, 38.37117851, 46.97586413, 50.73379646,
            61.64687319, 39.29956937, 52.67120908, 54.33231689,
            66.83435838, 40.87118847, 51.82853579, 57.49190993,
            65.25146985, 43.06120822, 54.76075713, 59.83447494,
            73.25702747, 47.69662373, 61.09776802, 66.05576122]
        
    indice = pd.date_range("2020-10-31 23:57:00", periods=len(datos), freq="T")
        
    datos = pd.Series(data=datos, index=indice)
        
    datos = datos.asfreq(freq='T')
              
              
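    # Split used for reference; the two slices overlap by one point (index 47).
    pasado = datos[:48]
    futuro = datos[47:]

    # Note: the model is fitted on the full series (datos), not only on pasado,
    # so the prediction window below lies inside the fitted sample.
    modelo = ETSModel(datos, error="add", trend="add", seasonal="add",
                        damped_trend=True, seasonal_periods=4)
    fit = modelo.fit()

    print(fit.summary())

    # Mean prediction and prediction interval for the last 21 observations.
    pred = fit.get_prediction(start='2020-11-01 00:44:00', end='2020-11-01 01:04:00')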
        
    df = pred.summary_frame(alpha=0.05)
        
        
    simulated = fit.simulate(anchor="end", nsimulations=10, repetitions=100)
    
    for i in range(simulated.shape[1]):
      simulated.iloc[:,i].plot(label='_', color='gray', alpha=0.1)
          
    df["mean"].plot(label='mean prediction')
    df["pi_lower"].plot(linestyle='--', color='tab:cyan', label='95% interval')
    df["pi_upper"].plot(linestyle='--', color='tab:cyan', label='_')
    
    pred.endog.plot(label='data')
    plt.legend()
    plt.show()
    

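    You will get a result like this:

    Your data is shown in orange. The ETS model estimates the mean of the data, shown in blue, along with the range within which the data may vary around that mean (the dashed cyan lines). Then, for the forecast, the model runs simulations that try to predict 10 steps ahead, 100 times (the gray lines).

    In this particular case the model fits the data very well... but of course! It is a textbook example, so it works perfectly. In day-to-day practice things differ from theory.

    Although you are using a different library, this generally explains why you get the result you do.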

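    The ETS model offers several methods for prediction:

    • forecast: out-of-sample forecasts
    • predict: in-sample and out-of-sample predictions
    • simulate: runs simulations of the state-space model
    • get_prediction: in-sample and out-of-sample predictions, together with prediction intervals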

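    In your case the data looks random (for lack of a better word) in the eyes of the model, and this particular model has a hard time deciding where the data may go in the future, so it estimates a mean together with an upper and a lower bound on where the data might be.

    Let's use the same code, changing only the data, and you get this result: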
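
    One way to see how a forecast can come out as a flat line: if the fitted model has neither a trend nor a seasonal component (simple exponential smoothing), the forecast for every future step is just the last smoothed level. The sketch below implements that case by hand. Whether AutoETS actually selected such a model for the data in the question is an assumption, and the values and the smoothing parameter here are made up.

```python
def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing with no trend and no seasonality.

    Returns the forecasts for steps 1..horizon after the last observation.
    """
    level = values[0]  # initialise the level with the first observation
    for y in values[1:]:
        # New level = weighted average of the new observation and the old level
        level = alpha * y + (1 - alpha) * level
    # Without trend or seasonal terms, every future step gets the same value.
    return [level] * horizon


history = [21.46, 21.47, 21.45, 21.46, 21.47, 21.46]  # made-up values
forecasts = ses_forecast(history, alpha=0.5, horizon=5)
print(forecasts)
```

    All five forecasts are identical, which is the same shape as the green line in the question.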

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.exponential_smoothing.ets import ETSModel
    
        
    # datos is the 120-point series from the question (1-minute frequency).
    # Split for plotting; the two slices overlap by one point (index 99).
    pasado = datos[:100]
    futuro = datos[99:]
    print(futuro)

    # As before, the model is fitted on the full series, not only on pasado.
    modelo = ETSModel(datos, error="add", trend="add", seasonal="add",
                  damped_trend=True, seasonal_periods=4)
    fit = modelo.fit()

    print(fit.summary())

    # Mean prediction and prediction interval for the last 21 observations (the futuro window).
    pred = fit.get_prediction(start='2020-11-01 01:36:00', end='2020-11-01 01:56:00')
    
    df = pred.summary_frame(alpha=0.05)
    
    
    
    
    simulated = fit.simulate(anchor="end", nsimulations=20, repetitions=100)
    for i in range(simulated.shape[1]):
      simulated.iloc[:,i].plot(label='_', color='gray', alpha=0.1)
    
    
    df["mean"].plot(label='mean prediction')
    df["pi_lower"].plot(linestyle='--', color='tab:cyan', label='95% interval')
    df["pi_upper"].plot(linestyle='--', color='tab:cyan', label='_')
    pred.endog.plot(label='data')
    
    pasado.plot(label='Pasado')
    futuro.plot(label='Futuro')
    
    
    
    plt.legend()
    plt.show()
    

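    After the training data (green) there is a kind of bubble (the region enclosed by the dashed cyan lines), which is the model's estimate of where the data might lie in the future. The line that shows up at roughly the same value is therefore the estimated mean of the future values predicted by the model. In other words, given this data, the model cannot fit closely the data you have in the futuro variable.

    A model that might (surely... maybe) fit the data better is SARIMA or SARIMAX. In that case it is best to search automatically for suitable values of order = (p, d, q) and seasonal_order = (P, D, Q, s), although the computational cost can start to climb.

    Of course there are many more models. Mathematica has a function (I cannot remember its name right now) that searches for the model and parameter set that best fit the data. Maybe Python has something similar somewhere; if so, I would love to hear about it.

    【Comments】: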
