Fundamentals
1. What is Time Series Forecasting?
2. Promise of Deep Learning for Time Series Forecasting
3. Taxonomy of Time Series Forecasting Problems
4. How to Develop a Skillful Forecasting Model
5. Time Series as Supervised Learning
6. Review of Simple and Classical Forecasting Methods
7. Classical Time Series Forecasting Methods in Python
◦ Time series problems are often neglected because it is the time component that makes them more difficult to handle
◦ Standard definitions of time series, time series analysis, and time series forecasting
◦ The important components to consider in time series data
◦ Examples of time series to make your understanding concrete
◦ A time series is an ordered sequence of observations: observation #1, observation #2, observation #3, and so on
◦ Nomenclature
◦ t-n: A prior or lag time (e.g. t-1 for the previous time)
◦ t: A current time and point of reference
◦ t+n: A future or forecast time (e.g. t+1 for the next time)
◦ In descriptive modeling, or time series analysis, a time series is modeled to determine its components in terms of
seasonal patterns, trends, relation to external factors, and the like
◦ In time series forecasting, the information in a time series (perhaps with additional information) is used to
forecast future values of that series
◦ The quality of a descriptive model is determined by how well it describes all available data and the interpretation
it provides to better inform the problem domain
◦ The primary objective of time series analysis is to develop mathematical models that provide plausible
descriptions from sample data
◦ An important distinction in forecasting is that the future is completely unavailable and must only be estimated
from what has already happened
◦ The skill of a time series forecasting model is determined by its performance at predicting the future
◦ This is often at the expense of being able to explain why a specific prediction was made, of confidence intervals, and even of a better understanding of the underlying causes behind the problem
◦ All time series have a level, most have noise, and the trend and seasonality are optional
◦ The main features of many time series are trends and seasonal variations; another important feature of most time series is that observations close together in time tend to be correlated (serially dependent)
◦ Outliers
◦ Perhaps there are corrupt or extreme outlier values that need to be identified and handled
◦ Missing
◦ Perhaps there are gaps or missing data that need to be interpolated or imputed
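Gap handling can be sketched in plain Python; the helper name and the use of None to mark missing values are illustrative assumptions, not from any particular library.

```python
# A minimal sketch of handling gaps in a time series: linear interpolation
# of missing values (represented here as None). Leading/trailing gaps are
# out of scope for this sketch.
def interpolate_missing(series):
    """Fill None gaps by interpolating linearly between known neighbours."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None:
            # find the nearest known values on either side of the gap
            lo = max(j for j in range(i) if filled[j] is not None)
            hi = min(j for j in range(i + 1, len(series)) if series[j] is not None)
            weight = (i - lo) / (hi - lo)
            filled[i] = filled[lo] + weight * (series[hi] - filled[lo])
    return filled

series = [100.0, 110.0, None, 115.0, 120.0]
print(interpolate_missing(series))  # the gap becomes 112.5
```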
◦ Discover the promised capabilities of deep learning neural networks for time series forecasting
◦ The focus and implicit, if not explicit, limitations on classical time series forecasting methods
◦ The general capabilities of Multilayer Perceptrons and how they may be harnessed for time series forecasting
◦ The added capabilities of feature learning and native support for sequences provided by Convolutional Neural Networks
and Recurrent Neural Networks
◦ Traditional time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems
◦ This expectation of a learnable mapping function also makes one of the limitations clear: the mapping function is
fixed or static
◦ Fixed Inputs. The number of lag input variables is fixed, in the same way as traditional time series forecasting methods
◦ Fixed Outputs. The number of output variables is also fixed; although a more subtle issue, it means that for each input
pattern, one output must be produced
◦ Operating directly on raw data, such as raw pixel values, instead of domain-specific or handcrafted features
derived from the raw data
◦ The model then learns how to automatically extract the features from the raw data that are directly useful for the problem
being addressed
◦ This is called representation learning and the CNN achieves this in such a way that the features are extracted regardless of
how they occur in the data, so-called transform or distortion invariance
◦ This capability of CNNs has been demonstrated to great effect on time series classification tasks such as automatically detecting human activities based on raw accelerometer sensor data from fitness devices and smartphones
◦ CNNs get the benefits of Multilayer Perceptrons for time series forecasting, namely support for multivariate
input, multivariate output and learning arbitrary but complex functional relationships, but do not require that the
model learn directly from lag observations
◦ The model can learn a representation from a large input sequence that is most relevant for the prediction problem
◦ This capability of LSTMs has been used to great effect in complex natural language processing problems such as neural machine translation, where the model must learn the complex inter-relationships between words both within a given language and across languages in translating from one language to another
◦ This capability can be used in time series forecasting
◦ Recurrent neural networks can also automatically learn the temporal dependence from the data
◦ The most relevant context of input observations to the expected output is learned and can change dynamically
◦ A framework to quickly understand and frame the time series forecasting problem
◦ A structured way of thinking about time series forecasting problems
◦ A framework to uncover the characteristics of a given time series forecasting problem
◦ A suite of specific questions, the answers to which will help to define the forecasting problem
◦ Inputs
◦ Historical data provided to the model in order to make a single forecast
◦ Not the data used to train the model
◦ The data used to make one forecast, for example the last seven days of sales data to forecast the next one day of sales
data
◦ You may not be able to be specific about the input data; for example, you may not know whether one or multiple prior time steps are required to make a forecast
◦ Outputs
◦ Prediction or forecast for a future time step beyond the data provided as input
Unstructured vs. Structured
◦ The number of variables may differ between the inputs and outputs, e.g. the data may not be symmetrical
Univariate and Multivariate Inputs vs. Univariate and Multivariate Outputs
One-step vs. Multi-step
• One-step: Forecast the next time step
• Multi-step: Forecast more than one future time step
◦ The more time steps to be projected into the future, the more challenging the problem given the compounding
nature of the uncertainty on each forecasted time step
Static vs. Dynamic
• Static: A forecast model is fit once and used to make predictions; the model is not updated or changed between forecasts
• Dynamic: A forecast model is fit on newly available data prior to each prediction; a new model is fit, or the existing model is updated, after new observations are received and prior to making a subsequent forecast
Contiguous vs. Discontiguous
• Contiguous: A time series where the observations are uniform over time, e.g. one observation each hour, day, month or year
• Discontiguous: A time series where the observations are not uniform over time
◦ The lack of uniformity of the observations may be caused by missing or corrupt values
◦ It may also be a feature of the problem where observations are only made available sporadically or at increasingly or
decreasingly spaced time intervals
◦ In the case of non-uniform observations, specific data formatting may be required when fitting some models to make the
observations uniform over time
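The re-formatting step for non-uniform observations can be sketched in plain Python; the function name, grid step, and carry-forward rule are illustrative assumptions, not a prescribed method.

```python
# A minimal sketch of making discontiguous observations uniform: snap each
# (time, value) pair onto a regular grid, carrying the last observed value
# forward into grid points that have no observation of their own.
def resample_uniform(observations, step):
    """observations: list of (time, value) pairs sorted by time."""
    start, end = observations[0][0], observations[-1][0]
    uniform = []
    idx = 0
    last = observations[0][1]
    t = start
    while t <= end:
        # consume every raw observation at or before this grid point
        while idx < len(observations) and observations[idx][0] <= t:
            last = observations[idx][1]
            idx += 1
        uniform.append((t, last))
        t += step
    return uniform

obs = [(0, 10.0), (1, 12.0), (4, 13.0), (5, 15.0)]
print(resample_uniform(obs, step=1))
# [(0, 10.0), (1, 12.0), (2, 12.0), (3, 12.0), (4, 13.0), (5, 15.0)]
```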
• Select a standard time series dataset and work through the questions in the taxonomy
to learn more about the dataset
Standard Form
• Transform the taxonomy into a form or spreadsheet that you can re-use on new time series forecasting projects going forward
Additional Characteristic
• Brainstorm and list at least one additional characteristic of a time series forecasting problem and a question that might be used to identify it
◦ A specific and actionable procedure that you can use to work through the time series forecasting problem and get better than average performance from the model
◦ A systematic four-step process that you can use to work through any time series forecasting problem
◦ A list of models to evaluate and the order in which to evaluate them
◦ A methodology that allows the choice of final model to be defensible with empirical evidence, rather than whim or fashion
1. Define Problem
2. Design Test Harness
3. Test Models
4. Finalize Model
◦ The process is different from a classical linear work-through of a predictive modeling problem
◦ It is designed to get a working forecast model fast and then slow down and see if you can get a better model
◦ It is recommended to have separate code for each experiment that can be re-run at any time
◦ This is important so that you can circle back when you discover a bug, fix the code, and re-run an experiment
◦ Running experiments and iterating quickly is valuable, but if you are sloppy, you cannot trust any of your results
◦ This is especially important when it comes to the design of the test harness for evaluating candidate models
5. Univariate vs. Multivariate: Are you working on a univariate or multivariate time series problem?
6. Single-step vs. Multi-step: Do you require a single-step or a multi-step forecast?
7. Static vs. Dynamic: Do you require a static or a dynamically updated model?
8. Contiguous vs. Discontiguous: Are your observations contiguous or discontiguous?
◦ A common time series forecasting model evaluation scheme, if you are looking for ideas:
◦ Split the dataset into a train and test set
◦ Fit a candidate approach on the training dataset
◦ Make predictions on the test set directly or using walk-forward validation
◦ Calculate a metric that compares the predictions to the expected values
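The scheme above can be sketched in plain Python; the naive forecast and RMSE metric are illustrative stand-ins for any candidate model and metric.

```python
from math import sqrt

# A minimal sketch of the evaluation scheme: split into train/test, make
# predictions with walk-forward validation, and score them. The naive
# forecast (predict the last observed value) stands in for a real model.
def walk_forward_validation(series, n_test):
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    predictions = []
    for actual in test:
        yhat = history[-1]          # naive forecast from current history
        predictions.append(yhat)
        history.append(actual)      # walk forward: reveal the true value
    # metric comparing predictions to expected values (RMSE)
    rmse = sqrt(sum((p - a) ** 2 for p, a in zip(predictions, test)) / n_test)
    return predictions, rmse

series = [10.0, 20.0, 30.0, 40.0, 50.0]
preds, score = walk_forward_validation(series, n_test=2)
print(preds, score)  # [30.0, 40.0] 10.0
```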
◦ The test harness must be robust and you must have complete trust in the results it provides
◦ Ensure that any coefficients used for data preparation are estimated from the training dataset only and then applied to the test set
◦ This includes the mean and standard deviation in the case of data standardization
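The rule above can be sketched in plain Python: the standardization coefficients are estimated on the training split only and then applied to both splits. The helper name is illustrative.

```python
from math import sqrt

# A minimal sketch of leakage-free standardization: the mean and standard
# deviation come from the training data only, never from the test data.
def standardize(train, test):
    mean = sum(train) / len(train)
    std = sqrt(sum((x - mean) ** 2 for x in train) / len(train))

    def scale(values):
        return [(x - mean) / std for x in values]

    return scale(train), scale(test)

train, test = [10.0, 20.0, 30.0], [40.0]
train_s, test_s = standardize(train, test)
print(train_s)  # roughly [-1.22, 0.0, 1.22]
print(test_s)   # the test value is scaled with the *training* mean/std
```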
◦ Involves training a new final model on all available historical data (train and test)
◦ How you can re-frame the time series problem as a supervised learning problem for machine learning
◦ What supervised learning is and how it is the foundation for all predictive modeling machine learning algorithms
◦ The sliding window method for framing a time series dataset and how to use it
◦ How to use the sliding window for multivariate data and multi-step forecasting
Y = f(X)

X, y
5, 0.9
4, 0.8
5, 1.0
3, 0.7
4, 0.9

Each row has input variables (X) and one output variable to be predicted (y)
Classification vs. Regression
• A classification problem is when the output variable is a category, such as red and blue or disease and no disease
• A regression problem is when the output variable is a real value, such as dollars or weight
time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

Use previous time steps as input variables and use the next time step as the output variable:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

◦ The previous time step is the input (X) and the next time step is the output (y)
◦ The order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model
◦ There is no previous value that can be used to predict the first value in the sequence; delete this row, as it cannot be used
◦ There is no known next value to predict for the last value in the sequence; delete this row as well when training the supervised model
◦ Notice:
◦ Turn a time series into either a regression or a classification supervised learning problem for real-valued or labeled time
series values
◦ The standard linear and nonlinear machine learning algorithms may be applied
◦ The width of the sliding window can be increased to include more previous time steps
◦ The sliding window approach can be used on a time series that has more than one value, or so-called multivariate time
series
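The sliding window method above can be sketched as a small helper; the function name and `width` parameter are illustrative.

```python
# A minimal sketch of the sliding window method: turn a univariate series
# into (X, y) pairs, where X is the previous `width` time steps and y is
# the next time step. Rows without a full history or a known next value
# are implicitly dropped by the loop bounds.
def sliding_window(series, width=1):
    X, y = [], []
    for i in range(width, len(series)):
        X.append(series[i - width:i])  # previous time steps as inputs
        y.append(series[i])            # next time step as the output
    return X, y

measure = [100, 110, 108, 115, 120]
X, y = sliding_window(measure, width=1)
print(X)  # [[100], [110], [108], [115]]
print(y)  # [110, 108, 115, 120]
```

Increasing `width` widens the window: `sliding_window(measure, width=2)` gives two lag values per input row.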
◦ There are a number of ways to model multi-step forecasting as a supervised learning problem
◦ Framing a multi-step forecast using the sliding window method:

X1, y1, y2
?, 100, 110
100, 110, 108
110, 108, 115
108, 115, 120
115, 120, ?
120, ?, ?

◦ A supervised model only has X1 to work with in order to predict both y1 and y2
◦ The first and last rows contain unknown values and cannot be used to train a supervised model
◦ Careful thought and experimentation are needed on the problem to find a window width that results in acceptable model performance
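The multi-step framing above can be sketched as a variant of the sliding window helper; the function name and parameters are illustrative.

```python
# A minimal sketch of the multi-step sliding window framing: `n_in` lag
# inputs (X1, ...) are used to predict the next `n_out` values (y1, y2, ...).
# Rows with unknown values at either end are dropped by the loop bounds.
def multi_step_window(series, n_in=1, n_out=2):
    X, y = [], []
    for i in range(n_in, len(series) - n_out + 1):
        X.append(series[i - n_in:i])   # lag inputs
        y.append(series[i:i + n_out])  # the next n_out outputs
    return X, y

measure = [100, 110, 108, 115, 120]
X, y = multi_step_window(measure, n_in=1, n_out=2)
print(X)  # [[100], [110], [108]]
print(y)  # [[110, 108], [108, 115], [115, 120]]
```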
◦ Naive, or using observation values directly (e.g. persisting the last observed value)
◦ Average, or using a statistic calculated on previous observations (e.g. their mean)
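These two simple baselines can be sketched in a few lines of Python; the function names are illustrative.

```python
# A minimal sketch of the two simple forecasting baselines: a naive
# forecast that uses the last observation directly, and an average
# forecast that uses a statistic (the mean) of previous observations.
def naive_forecast(history):
    return history[-1]

def average_forecast(history):
    return sum(history) / len(history)

history = [100, 110, 108, 115, 120]
print(naive_forecast(history))    # 120
print(average_forecast(history))  # 110.6
```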
◦ Seasonal Autoregressive Integrated Moving Average (SARIMA): An extension to ARIMA that supports the direct
modeling of the seasonal component of the series
◦ Discover the SARIMA method for time series forecasting with univariate data containing trends and seasonality
◦ This acronym is descriptive, capturing the key aspects of the model itself
◦ AR: Autoregression – A model that uses the dependent relationship between an observation and some number of lagged
observations
◦ I: Integrated – The use of differencing of raw observations (e.g. subtracting an observation from an observation at the
previous time step) in order to make the time series stationary
◦ MA: Moving Average – A model that uses the dependency between an observation and a residual error from a moving
average model applied to lagged observations
◦ Time series methods like the Box-Jenkins ARIMA family of methods develop a model where the prediction is a weighted linear sum of recent past observations or lags
◦ Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past
observations
◦ The model explicitly uses an exponentially decreasing weight for past observations
◦ Past observations are weighted with a geometrically decreasing ratio
◦ A suite of classical methods for time series forecasting that you can test on your forecasting problem prior to exploring machine learning methods
• Seasonal Autoregressive Integrated Moving-Average (SARIMA)
• Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
• Vector Autoregression (VAR)
• Vector Autoregression Moving-Average (VARMA)
• Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
• Simple Exponential Smoothing (SES)
• Holt Winter's Exponential Smoothing (HWES)
◦ The method is suitable for univariate time series without trend and seasonal components
statsmodels.tsa.ar_model.AutoReg
statsmodels.tsa.ar_model.AutoRegResults
from random import random
from statsmodels.tsa.ar_model import AutoReg
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model (MA only: order=(0, 0, 1))
model = ARIMA(data, order=(0, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset
data = [random() for x in range(1, 100)]
# fit model (ARMA: order=(2, 0, 1))
model = ARIMA(data, order=(2, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and without seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset with trend
data = [x + random() for x in range(1, 100)]
# fit model (d=1 differencing handles the trend)
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components
from random import random
from statsmodels.tsa.statespace.sarimax import SARIMAX
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(start=len(data), end=len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components and exogenous
variables
from random import random
from statsmodels.tsa.statespace.sarimax import SARIMAX
# contrived dataset
data1 = [x + random() for x in range(1, 100)]
data2 = [x + random() for x in range(101, 200)]
# fit model
model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction with an out-of-sample exogenous value
exog2 = [200 + random()]
yhat = model_fit.predict(len(data1), len(data1), exog=[exog2])
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.vector_ar.var_model import VAR
# contrived dataset with two dependent series
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    data.append([v1, v2])
# fit model
model = VAR(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.forecast(model_fit.endog, steps=1)
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.statespace.varmax import VARMAX
# contrived dataset with two dependent series
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    data.append([v1, v2])
# fit model
model = VARMAX(data, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components with exogenous
variables
from random import random
from statsmodels.tsa.statespace.varmax import VARMAX
# contrived dataset with two dependent series and an exogenous variable
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    data.append([v1, v2])
data_exog = [x + random() for x in range(100)]
# fit model
model = VARMAX(data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction with an out-of-sample exogenous value
data_exog2 = [[100]]
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components
from random import random
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = ExponentialSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)