ExponentialSmoothing - What prediction method to use for this date plot? (Answers)

【Question title】: ExponentialSmoothing - What prediction method to use for this date plot?
【Posted】: 2020-06-18 17:53:18
【Question description】:

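I currently have data points of dates against a cumulative sum. I want to use Python to predict the cumulative sum for future dates. Which prediction method should I use?

My date series is in this format: ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27', '2020-01-30', '2020-01-31'] dtype='datetime64[ns]'

I tried a spline, but it seems a spline cannot handle a datetime series.

I tried exponential smoothing for time series forecasting, but the results are incorrect. I don't understand what predict(3) means or why it returns predicted sums for dates I already have. I copied this code from an example. Here is my exponential smoothing code: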

fit1 = ExponentialSmoothing(date_cumsum_df).fit(smoothing_level=0.3,optimized=False)

fcast1 = fit1.predict(3)

fcast1



2020-01-27       1.810000
2020-01-30       2.467000
2020-01-31       3.826900
2020-02-01       5.978830
2020-02-02       7.785181
2020-02-04       9.949627
2020-02-05      11.764739
2020-02-06      14.535317
2020-02-09      17.374722
2020-02-10      20.262305
2020-02-16      22.583614
2020-02-18      24.808530
2020-02-19      29.065971
2020-02-20      39.846180
2020-02-21      58.792326
2020-02-22     102.054628
2020-02-23     201.038240
2020-02-24     321.026768
2020-02-25     474.318737
2020-02-26     624.523116
2020-02-27     815.166181
2020-02-28    1100.116327
2020-02-29    1470.881429
2020-03-01    1974.317000
2020-03-02    2645.321900
2020-03-03    3295.025330
2020-03-04    3904.617731

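Which method is best for forecasting sum values that appear to grow exponentially? Also, I'm fairly new to data science with Python, so please go easy on me. Thanks.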

【Question comments】:

Tags: python data-science prediction

【Solution 1】:

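Exponential smoothing only works on data that has no missing values in the time series. For the three methods you mentioned, I'll show you forecasts for the next 5 days:

Exponential fit (your guess that the data "appears to grow exponentially")
Spline interpolation
Exponential smoothing

Note: I obtained your data by digitizing it from your plot, saving the dates to dates and the data values to values.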

import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.optimize import curve_fit
from scipy.interpolate import splrep, splev

df = pd.DataFrame()
# mdates.date2num allows functions like curve_fit and spline to digest time series data
df['dates'] = mdates.date2num(dates)
df['values'] = values 

# Exponential fit function
def exponential_func(x, a, b, c, d):
    return a*np.exp(b*(x-c))+d

# Spline interpolation
def spline_interp(x, y, x_new):
    tck = splrep(x, y)
    return splev(x_new, tck)

# define forecast timerange (forecasting 5 days into future)
dates_forecast = np.linspace(df['dates'].min(), df['dates'].max() + 5, 100)
dd = mdates.num2date(dates_forecast)

# Doing exponential fit
popt, pcov = curve_fit(exponential_func, df['dates'], df['values'], 
                       p0=(1, 1e-2, df['dates'][0], 1))

# Doing spline interpolation
yy = spline_interp(df['dates'], df['values'], dates_forecast)

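So far this is straightforward (apart from the mdates.date2num function). Because your series has missing dates, you have to apply the spline interpolation to the actual data to fill the missing time points with interpolated values.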
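As a side note, the gap-filling step can also be sketched with pandas alone. The example below is a minimal illustration that swaps the spline for linear interpolation in time, which is a different technique from the one used in this answer; the sample values are made up and are not your data.

```python
import pandas as pd

# Hypothetical cumulative sums observed on irregular dates (illustrative only).
observed = pd.Series(
    [1.0, 2.0, 4.0, 7.0],
    index=pd.to_datetime(["2020-01-20", "2020-01-24", "2020-01-26", "2020-01-27"]),
)

# Reindex onto a gap-free daily calendar; days without an observation become NaN.
daily = observed.asfreq("D")

# Fill the NaN gaps by interpolating linearly with respect to time.
daily_filled = daily.interpolate(method="time")

print(daily_filled)
```

The result has one row per calendar day, which is the shape ExponentialSmoothing expects. Linear filling is cruder than a spline for strongly curved data, so treat it as a quick first pass.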

# Interpolating data for exponential smoothing (no missing data in time series allowed)
df_interp = pd.DataFrame()
df_interp['dates'] = np.arange(dates[0], dates[-1] + 1, dtype='datetime64[D]')
df_interp['values'] = spline_interp(df['dates'], df['values'], 
                                    mdates.date2num(df_interp['dates']))
series_interp = pd.Series(df_interp['values'].values, 
                          pd.date_range(start='2020-01-19', end='2020-03-04', freq='D'))

# Now the exponential smoothing works fine, provide the `trend` argument given your data 
# has a clear (kind of exponential) trend
fit1 = ExponentialSmoothing(series_interp, trend='mul').fit(optimized=True)

You can plot all three methods to see how each one forecasts the next five days.

# Plot data
plt.plot(mdates.num2date(df['dates']), df['values'], 'o')
# Plot exponential function fit
plt.plot(dd, exponential_func(dates_forecast, *popt))
# Plot interpolated values
plt.plot(dd, yy)
# Plot Exponential smoothing prediction using function `forecast`
plt.plot(np.concatenate([series_interp.index.values, fit1.forecast(5).index.values]),
     np.concatenate([series_interp.values, fit1.forecast(5).values]))

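Comparing all three methods suggests you were right to choose exponential smoothing. It appears to forecast the next five days better than the other two methods.

Regarding your other question:

I don't understand what predict(3) means or why it returns predicted sums for dates I already have.

ExponentialSmoothing.fit() returns a statsmodels.tsa.holtwinters.HoltWintersResults object, which has two methods you can use to predict/forecast values: predict and forecast.

predict takes a start and an end observation of your data and applies the ExponentialSmoothing model to the corresponding dates. To predict future values, you have to specify an end argument that lies in the future.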

>> fit1.predict(start=np.datetime64('2020-03-01'), end=np.datetime64('2020-03-09'))
2020-03-01    4240.649526
2020-03-02    5631.207307
2020-03-03    5508.614325
2020-03-04    5898.717779
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

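In your example, predict(3) (equivalent to predict(start=3)) returns the model's values for dates you already have, starting at the observation with zero-based index 3 (2020-01-27 in your output), and does not forecast anything.

forecast() only produces out-of-sample forecasts. You simply pass it the number of future observations you want to forecast.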

>> fit1.forecast(5)
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

Since both methods are based on the same fitted ExponentialSmoothing model, their values are equal for the same dates.
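To make the in-sample versus out-of-sample distinction concrete, here is a hand-rolled sketch of simple exponential smoothing without a trend. It is not the statsmodels implementation: the function name, the choice to initialise the level with the first observation, and the sample data are all assumptions made for illustration.

```python
def simple_exp_smoothing(y, alpha):
    """Return (in-sample one-step-ahead predictions, final level)."""
    level = y[0]  # assumption: start the level at the first observation
    fitted = []
    for obs in y:
        fitted.append(level)                       # prediction made before seeing obs
        level = alpha * obs + (1 - alpha) * level  # update the level after seeing obs
    return fitted, level


y = [1.0, 2.0, 4.0, 8.0, 16.0]  # made-up observations
fitted, last_level = simple_exp_smoothing(y, alpha=0.5)

# Roughly what predict(start=3) gives: model values for observations you already have.
in_sample_from_3 = fitted[3:]

# Roughly what forecast(2) gives: values beyond the last observation.
# Without a trend component the forecast is flat at the final level.
out_of_sample = [last_level] * 2

print(in_sample_from_3, out_of_sample)
```

The in-sample list has one entry per existing observation from index 3 onward, while the out-of-sample list only covers new steps. The flat forecast also shows why passing trend to the model matters for data that keeps growing.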

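【Comments】:

Thank you very much! I'm going to try exponential smoothing now. Can you tell me what the "3" in fit1.predict(3) is? If you want to predict the next 5 days, do you just pass 5 to predict()? Thanks!
@ShafinM You're welcome. Sorry, I forgot to paste the code for plotting the data/models, which may already have answered your question. I have edited the answer and added a brief explanation of how to use fit.predict() and fit.forecast().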