2019-10-04 09:47:05

This blog is from: https://towardsdatascience.com/an-overview-of-time-series-forecasting-models-a2fa7a358fcb 

 

What is this article about?

The code used to produce the results in this article is available here.

Time series forecasting is a hot topic which has many possible applications, such as stock prices forecasting, weather forecasting, business planning, resources allocation and many others. Even though forecasting can be considered as a subset of supervised regression problems, some specific tools are necessary due to the temporal nature of observations.

What is a time series?

A time series is a sequence of observations, each one being recorded at a specific time t.

How to validate and test a time series model?

Due to the temporal dependencies in time series data, we cannot rely on the usual validation techniques. To avoid biased evaluations we must ensure that training sets contain only observations that occurred prior to the ones in validation sets.

This approach is called time series cross-validation and it is summarised in the following picture, in which the blue points represent the training sets in each “fold” and the red points represent the corresponding validation sets.

 
[Figure: Time series cross-validation. Credits to Rob J Hyndman]

In each fold, forecasts are computed 1, 2, …, n steps ahead of the training set. In this way we can also compare the goodness of the forecasts for different time horizons.

The final model is then evaluated on a test set subsequent in time. The performance estimate can be done by using the same sliding window technique used for cross-validation, but without re-estimating the model parameters.
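The sliding-window scheme above can be sketched in a few lines. This is an illustrative implementation, not the article's code; the function and parameter names are mine.

```python
def rolling_origin_splits(n_obs, min_train, horizon):
    """Yield (train, validation) index pairs where every validation
    point occurs strictly after its training window."""
    splits = []
    for end in range(min_train, n_obs - horizon + 1):
        train = list(range(0, end))              # observations 0 .. end-1
        valid = list(range(end, end + horizon))  # the next `horizon` points
        splits.append((train, valid))
    return splits

# 10 observations, at least 6 used for training, 2-step-ahead validation.
splits = rolling_origin_splits(10, min_train=6, horizon=2)
print(len(splits))   # 3 folds
print(splits[0][1])  # first validation window: [6, 7]
```

Each fold grows the training set by one observation, so no validation point ever precedes its training data.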

Short data exploration

The data we will use is the monthly industrial production index of electrical equipment manufactured in the Euro area.

The data is available in the fpp2 package in R. To make it available outside R you can simply run the following code in an R environment.

library(fpp2)
write.csv(elecequip, file = "elecequip.csv", row.names = FALSE)

We hold out the data for the last 2 years for testing purposes.

 
[Figure: Monthly industrial production index for electrical equipment in the Euro area]

The time series has a peak at the end of 2000 and another one during 2007. The huge decrease that we observe at the end of 2008 is probably due to the global financial crisis which occurred during that year.

There seems to be a yearly seasonal pattern. To better visualise this, we show data for each year separately in both original and polar coordinates.

 
[Figure: Year-by-year seasonal plot of the data]
[Figure: Polar seasonal plot of the data]

We observe a strong seasonal pattern. In particular there is a huge decline in production in August due to the summer holidays.

Time series forecasting models

We will consider the following models:

  1. Naïve, SNaïve
  2. Seasonal decomposition (+ any model)
  3. Exponential smoothing
  4. ARIMA, SARIMA
  5. GARCH
  6. Dynamic linear models
  7. TBATS
  8. Prophet
  9. NNETAR
  10. LSTM

For each model we will forecast one year ahead, i.e. the time steps t+1, …, t+12.

We will use the Mean Absolute Error (MAE) to assess the performance of the models.
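For reference, the MAE is simply the average absolute deviation between forecasts and observed values; a minimal sketch:

```python
def mae(actual, forecast):
    """Mean Absolute Error: average absolute deviation of the forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

print(mae([100, 102, 98], [101, 102, 95]))  # (1 + 0 + 3) / 3 = 1.333...
```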

1) Naïve, SNaïve

The Naïve model simply forecasts every future value to be equal to the last observed value.

Ŷ(t+h|t) = Y(t)

This kind of forecast assumes that the stochastic model generating the time series is a random walk.

For a time series with seasonal period T, the forecasts given by the SNaïve (Seasonal Naïve) model are:

Ŷ(t+h|t) = Y(t+h-T)

In other words, the SNaïve model assumes that the time series repeats itself every T time steps. In our application, the SNaïve forecast for the next year is equal to the last year’s observations.
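The two benchmark forecasts above can be sketched directly from their formulas (illustrative names, not the forecast package's API):

```python
def naive_forecast(y, h):
    """Ŷ(t+h|t) = Y(t): repeat the last observed value h times."""
    return [y[-1]] * h

def snaive_forecast(y, h, period):
    """Ŷ(t+h|t) = Y(t+h-T): repeat the last full seasonal cycle."""
    return [y[len(y) - period + (i % period)] for i in range(h)]

monthly = list(range(1, 25))             # two "years" of toy monthly data
print(naive_forecast(monthly, 3))        # [24, 24, 24]
print(snaive_forecast(monthly, 12, 12))  # last year's observations again
```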

These simple models are often used as benchmark models. The following plots show the predictions obtained with the two models for the year 2007.

 
[Figure: Naïve forecasts for 2007]
[Figure: SNaïve forecasts for 2007]

The models were fitted by using the forecast R package.

2) Seasonal decomposition (+ any model)

If data shows some seasonality (e.g. daily, weekly, quarterly, yearly) it may be useful to decompose the original time series into the sum of three components:

Y(t) = S(t) + T(t) + R(t)

where S(t) is the seasonal component, T(t) is the trend component and R(t) is the remainder component.

The simplest approach is called classical decomposition and it consists in:

  1. Estimating the trend T(t) through a rolling mean
  2. Computing the seasonal component S(t) as the average of the detrended values Y(t)-T(t) for each season (e.g. for each month)
  3. Computing the remainder as R(t)=Y(t)-T(t)-S(t)
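The three steps can be sketched as follows. This is a deliberately simplified illustration with names of my choosing: a full classical decomposition uses a 2×m moving average for even periods and normalises the seasonal averages to sum to zero, both of which are skipped here.

```python
def classical_decompose(y, period):
    """Simplified classical additive decomposition: trend, seasonal, remainder."""
    n = len(y)
    half = period // 2
    # 1. Trend via a centred rolling mean (undefined at the edges)
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        trend[t] = sum(window) / len(window)
    # 2. Seasonal component: average detrended value for each season
    detrended = {s: [] for s in range(period)}
    for t in range(half, n - half):
        detrended[t % period].append(y[t] - trend[t])
    seasonal = [sum(v) / len(v) for _, v in sorted(detrended.items())]
    # 3. Remainder: what trend and seasonality do not explain
    remainder = [None if trend[t] is None
                 else y[t] - trend[t] - seasonal[t % period]
                 for t in range(n)]
    return trend, seasonal, remainder
```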

Classical decomposition has been extended in several ways. Its extensions allow us to:

  • have a non-constant seasonality
  • compute initial and last values of the decomposition
  • avoid over-smoothing

In this article we use the STL decomposition, which is known to be versatile and robust.

 
[Figure: STL decomposition on industrial production index data]

One way to use the decomposition for forecasting purposes is the following:

  1. Decompose the time series: Y(t) = S(t)+T(t)+R(t).
  2. Fit any model you like to forecast the evolution of the seasonally adjusted time series Y(t)-S(t).
  3. Add the seasonal component back to the forecasts (e.g. by reusing the values of S(t) for the last year).
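Steps 2 and 3 can be sketched with a naive model on the seasonally adjusted series (illustrative names; `seasonal` is assumed to hold one seasonal value per observation of `y`):

```python
def stl_naive_forecast(y, seasonal, h, period):
    """Naive-forecast the seasonally adjusted series, then reseasonalise."""
    adjusted = [y[t] - seasonal[t] for t in range(len(y))]
    base = adjusted[-1]              # naive forecast of the adjusted series
    last_cycle = seasonal[-period:]  # reuse the last seasonal cycle
    return [base + last_cycle[i % period] for i in range(h)]
```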

In the following picture we show the seasonally adjusted industrial production index time series.

 
[Figure: Seasonally adjusted industrial production index]

The following plot shows the predictions obtained for the year 2007 by using the STL decomposition and the naïve model to fit the seasonally adjusted time series.

 
[Figure: STL decomposition + naïve forecasts for 2007]

The decomposition was computed by using the stats R package.

3) Exponential smoothing

The simplest of these models is called simple exponential smoothing and its forecasts are given by:

Ŷ(t+h|t) = ⍺y(t) + ⍺(1-⍺)y(t-1) + ⍺(1-⍺)²y(t-2) + …

where 0<⍺<1 is a smoothing parameter.

Forecasts are weighted averages of past observations, where the corresponding weights decrease exponentially as we go back in time.
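The weighted sum above is usually computed recursively by updating a single level. A minimal sketch (initialising the level at the first observation is one common convention; the alpha values are arbitrary):

```python
def ses_forecast(y, alpha):
    """One-step-ahead forecast Ŷ(t+1|t) of simple exponential smoothing."""
    level = y[0]                               # initialise at the first value
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

print(ses_forecast([3.0, 5.0, 4.0], alpha=1.0))  # 4.0 — collapses to the naive forecast
```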

More details about exponential smoothing models can be found here.

The following plots show the predictions obtained for the year 2007 by using exponential smoothing models (automatically selected) to fit both the original and the seasonally adjusted time series.

 
[Figure: Exponential smoothing forecasts for 2007 on the original series]
[Figure: Exponential smoothing forecasts for 2007 on the seasonally adjusted series]

The models were fitted by using the forecast R package.

4) ARIMA, SARIMA

ARIMA models are among the most widely used approaches for time series forecasting. The name is an acronym for AutoRegressive Integrated Moving Average.

In an AutoRegressive model the forecasts correspond to a linear combination of past values of the variable, while in a Moving Average model the forecasts correspond to a linear combination of past forecast errors.

Since these models require the time series to be stationary, differencing (Integrating) the time series may be a necessary step, i.e. considering the time series of the differences instead of the original one.
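Differencing itself is a one-liner; a sketch with illustrative names:

```python
def difference(y, lag=1):
    """Y'(t) = Y(t) - Y(t-lag); lag=1 removes a linear trend, lag=T a seasonal pattern."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

print(difference([2, 4, 6, 8]))                  # [2, 2, 2]: the trend is gone
print(difference([1, 2, 3, 11, 12, 13], lag=3))  # [10, 10, 10]
```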

The SARIMA model (Seasonal ARIMA) extends the ARIMA model by adding a linear combination of seasonal past values and/or forecast errors.

More details about ARIMA models can be found here.

The following plots show the predictions obtained for the year 2007 by using a SARIMA model and an ARIMA model on the seasonally adjusted time series.

 
[Figure: SARIMA forecasts for 2007]
[Figure: ARIMA forecasts for 2007 on the seasonally adjusted series]

The models were fitted by using the forecast R package.

5) GARCH

The models considered so far assume the error terms to be homoskedastic, i.e. with constant variance.

GARCH model assumes that the variance of the error terms follows an AutoRegressive Moving Average (ARMA) process, therefore allowing it to change in time. It is particularly useful for modelling financial time series whose volatility changes across time. The name is an acronym for Generalised Autoregressive Conditional Heteroskedasticity.

More details about GARCH models can be found here.

The following plot shows the predictions obtained for the year 2007 by using a GARCH model to fit the seasonally adjusted time series.

 
[Figure: GARCH forecasts for 2007 on the seasonally adjusted series]

The model was fitted by using the rugarch R package.

6) Dynamic linear models

Dynamic linear models are regression models whose coefficients change in time. An example of dynamic linear model is given below.

y(t) = ⍺(t) + tβ(t) + w(t)

⍺(t) = ⍺(t-1) + m(t)

β(t) = β(t-1) + r(t)

w(t)~N(0,W) , m(t)~N(0,M) , r(t)~N(0,R)

In this model, both the intercept ⍺(t) and the slope β(t) follow a random walk process.
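To make the equations concrete, here is a sketch that simulates one path of the model above (function name, seed, and variance values are mine):

```python
import random

def simulate_local_linear_trend(n, W, M, R, seed=42):
    """Draw one path of y(t) = ⍺(t) + t·β(t) + w(t) with random-walk ⍺ and β."""
    rng = random.Random(seed)
    alpha, beta, path = 0.0, 0.0, []
    for t in range(1, n + 1):
        alpha += rng.gauss(0.0, M ** 0.5)  # ⍺(t) = ⍺(t-1) + m(t)
        beta += rng.gauss(0.0, R ** 0.5)   # β(t) = β(t-1) + r(t)
        path.append(alpha + t * beta + rng.gauss(0.0, W ** 0.5))
    return path

# With all variances at zero the path collapses to a flat line at zero.
print(simulate_local_linear_trend(3, W=0.0, M=0.0, R=0.0))  # [0.0, 0.0, 0.0]
```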

More details about dynamic linear models can be found here.

Due to heavy computational costs I had to keep the model extremely simple, which resulted in poor forecasts.

 
[Figure: Dynamic linear model forecasts for 2007]

The model was fitted by using the dlm R package.

7) TBATS

The TBATS model is a forecasting model based on exponential smoothing. The name is an acronym for Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal components.

Its main advantage is the capability to deal with multiple seasonalities by modelling each seasonality with a trigonometric representation based on Fourier series. A classic example of complex seasonality is given by daily observations of sales volumes, which often have both weekly and yearly seasonality.
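The trigonometric idea is simply to represent a seasonal pattern of period T with K pairs of Fourier terms; a sketch (K and the names are illustrative):

```python
import math

def fourier_terms(t, period, K):
    """Pairs sin/cos of harmonics k = 1..K evaluated at time t."""
    terms = []
    for k in range(1, K + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

print(fourier_terms(0, 12, 2))  # [0.0, 1.0, 0.0, 1.0]
```

A few such terms can approximate a smooth seasonal pattern far more compactly than one coefficient per season, which is what makes long or multiple seasonal periods tractable.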

More details about the TBATS model can be found here.

The following plot shows the predictions obtained for the year 2007 by using a TBATS model to fit the time series.

 
[Figure: TBATS forecasts for 2007]

The model was fitted by using the forecast R package.

8) Prophet

Prophet is another forecasting model, released by Facebook’s Core Data Science team.

The Prophet model assumes that the time series can be decomposed as follows:

y(t) = g(t) + s(t) + h(t) + ε(t)

The terms g(t), s(t) and h(t) correspond respectively to trend, seasonality and holidays. The last term is the error term.

The model fitting is framed as a curve-fitting exercise, therefore it does not explicitly take into account the temporal dependence structure in the data. This also allows us to have irregularly spaced observations.

Holidays and other special events can be easily incorporated into the model.

The model is estimated within a Bayesian framework, which allows us to make full posterior inference to include model parameter uncertainty in the forecast uncertainty.

More details about Prophet can be found here.

The following plot shows the predictions obtained for the year 2007 by using a Prophet model to fit the time series.

 
[Figure: Prophet forecasts for 2007]

The model was fitted by using the prophet R package.

9) NNETAR

The NNETAR model is a fully connected neural network. The acronym stands for Neural NETwork AutoRegression.

Lagged values of the time series are used as inputs to forecast the value at time t+1. To perform multi-step forecasts the network is applied iteratively.
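The iterative multi-step scheme works the same way for any one-step model. A sketch where a toy linear AR(2) rule stands in for the neural network (all names are illustrative):

```python
def iterate_forecast(one_step_model, history, h, n_lags):
    """Apply a one-step-ahead model h times, feeding forecasts back in."""
    hist = list(history)
    forecasts = []
    for _ in range(h):
        y_next = one_step_model(hist[-n_lags:])  # forecast t+1 from the last lags
        forecasts.append(y_next)
        hist.append(y_next)                      # treat it as an observation
    return forecasts

toy_ar2 = lambda lags: 0.5 * lags[-1] + 0.5 * lags[-2]
print(iterate_forecast(toy_ar2, [2.0, 4.0], h=3, n_lags=2))  # [3.0, 3.5, 3.25]
```

Note that forecast errors compound: each step consumes earlier forecasts as if they were observations.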

More details about the NNETAR model can be found here.

The following plots show the predictions obtained for the year 2007 by using a NNETAR model with seasonally lagged inputs and a NNETAR model on the seasonally adjusted time series.

 
[Figure: NNETAR forecasts for 2007 with seasonally lagged inputs]
[Figure: NNETAR forecasts for 2007 on the seasonally adjusted series]

The models were fitted by using the forecast R package.

10) LSTM

LSTM models can be used to forecast time series (as can other Recurrent Neural Networks). LSTM is an acronym that stands for Long Short-Term Memory.

Their key feature is the cell state, which allows the network to keep track of dependencies between new observations and past ones (even very distant ones).

A gentle introduction to LSTM networks can be found here. However, they are mostly used with unstructured data (e.g. audio, text, video).

The details of the implementation can be found here.

The following plot shows the predictions for the first year in the test set obtained by fitting an LSTM model on the seasonally adjusted time series.

 
[Figure: LSTM forecasts on the seasonally adjusted series]

The model was implemented by using the Keras framework in Python.

Evaluation

We estimated model performance by using the cross-validation procedure described previously. We didn’t compute it for the dynamic linear model and the LSTM model due to their high computational cost and poor performance.

In the following picture we show the cross-validated MAE for each model and for each time horizon.

 
[Figure: Cross-validated MAE for each model and time horizon]

We can see that, for time horizons greater than 4, the NNETAR model on the seasonally adjusted data performed better than the others. Let’s check the overall MAE computed by averaging over different time horizons.

 
[Figure: Cross-validated MAE (averaged over time horizons)]

The NNETAR model on the seasonally adjusted data is therefore the best model for this application, since it corresponded to the lowest cross-validated MAE.

The MAE estimated on the test set for the best model is 5.24. In the following picture we can see the MAE estimated on the test set for each time horizon.

 
[Figure: MAE estimated on the test set for each time horizon]

How to further improve performance

Other techniques to increase model performance could be:

  • Using different models for different time horizons
  • Combining multiple forecasts (e.g. considering the average prediction)
  • Bootstrap Aggregating

The last technique can be summarised as follows:

  1. Decompose the original time series (e.g. by using STL)
  2. Generate a set of similar time series by randomly shuffling chunks of the Remainder component
  3. Fit a model on each time series
  4. Average forecasts of every model
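Step 2 of the recipe can be sketched as follows; the chunk size, seed, and function name are arbitrary choices for illustration.

```python
import random

def bootstrap_series(seasonal_plus_trend, remainder, chunk_size, seed=1):
    """Shuffle fixed-size chunks of the remainder, then recombine."""
    rng = random.Random(seed)
    chunks = [remainder[i:i + chunk_size]
              for i in range(0, len(remainder), chunk_size)]
    rng.shuffle(chunks)                      # permute chunks, keep them intact
    shuffled = [r for chunk in chunks for r in chunk]
    return [base + r for base, r in zip(seasonal_plus_trend, shuffled)]

base = [10.0] * 6
series = bootstrap_series(base, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], chunk_size=2)
```

Shuffling contiguous chunks (rather than single points) preserves some of the short-range dependence in the remainder, which is the point of this style of bootstrap.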

More details about this technique can be found here.

Other models

Other models not included in this list are, for instance:

  • Any standard regression model, taking time as input (and/or other features)
  • Encoder-Decoder models, which are typically used in NLP tasks (e.g. translation)
  • Attention Networks, which are typically applied to unstructured data (e.g. Text-to-Speech)

Final remarks

All the models considered use only the past values of the time series. Performance may be improved by including external predictors, which requires careful feature selection.

When using external predictors we must be careful not to use information from the future; this can be ensured by forecasting the predictors or by using their lagged versions.

 
