
1.

Fundamentals

Badan Meteorologi, Klimatologi, dan Geofisika


July – August 2022
Fundamentals

1. What is Time Series Forecasting?
2. Promise of Deep Learning for Time Series Forecasting
3. Taxonomy of Time Series Forecasting Problems
4. How to Develop a Skillful Forecasting Model
5. Time Series as Supervised Learning
6. Review of Simple and Classical Forecasting Methods
7. Classical Time Series Forecasting Methods in Python

Workshop on Deep Learning Time Series by adhi@sci.ui.ac.id July 2022 13


What is Time Series Forecasting?

◦ Time series forecasting is an important area of machine learning that is often neglected
◦ It is important because there are so many prediction problems that involve a time component

◦ These problems are neglected because it is this time component that makes time series problems more difficult
to handle
◦ Standard definitions of time series, time series analysis, and time series forecasting
◦ The important components to consider in time series data
◦ Examples of time series to make your understanding concrete



Time Series Phenomena
◦ A timeseries can be any data obtained via measurements at regular intervals, like the daily price of a stock, the
hourly electricity consumption of a city, or the weekly sales of a store
◦ Timeseries are everywhere
◦ Natural phenomena
◦ Seismic activity
◦ The evolution of fish populations in a river
◦ The weather at a location
◦ Human activity patterns
◦ Visitors to a website
◦ A country’s GDP
◦ Credit card transactions
◦ Working with timeseries involves understanding the dynamics of a system
◦ Its periodic cycles, how it trends over time, its regular regime and its sudden spikes



Wide Range of Timeseries Tasks
◦ Classification
◦ Assign one or more categorical labels to a timeseries
◦ Given the timeseries of the activity of a visitor on a website, classify whether the visitor is a bot or a human
◦ Event detection
◦ Identify the occurrence of a specific expected event within a continuous data stream
◦ A particularly useful application is “hotword detection,” where a model monitors an audio stream and detects utterances
like “Ok Google” or “Hey Alexa”
◦ Anomaly detection
◦ Detect anything unusual happening within a continuous datastream
◦ Unusual activity on your corporate network? Might be an attacker
◦ Unusual readings on a manufacturing line? Time for a human to go take a look
◦ Anomaly detection is typically done via unsupervised learning, because you often don’t know what kind of anomaly you’re
looking for, so you can’t train on specific anomaly examples



Time Series Datasets
◦ Example of a normal time series dataset

observation #1
observation #2
observation #3

◦ Time does play a role in normal machine learning datasets


◦ Predictions are made for new data when the actual outcome may not be known until some future date
◦ The future is being predicted, but all prior observations are treated equally
◦ At most, minor temporal dynamics are handled to counter concept drift, for example by using only the last year of
observations rather than all available data



Time Series Datasets
◦ A time series dataset is different
◦ Time series adds an explicit order dependence between observations: a time dimension
◦ This additional dimension is both a constraint and a structure that provides a source of additional information
◦ A time series is a sequence of observations taken sequentially in time

Time #1, observation
Time #2, observation
Time #3, observation



Time Series Nomenclature
◦ The current time is defined as t, an observation at the current time is defined as obs(t)
◦ Times in the past are negative relative to the current time
◦ Times in the future are what we are interested in forecasting and are positive relative to the current time

◦ Nomenclature
◦ t-n: A prior or lag time (e.g. t-1 for the previous time)
◦ t: A current time and point of reference
◦ t+n: A future or forecast time (e.g. t+1 for the next time)
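
The notation above maps directly onto list indexing. The following is a minimal sketch, assuming a small made-up series of readings and a hypothetical helper named `lag`; neither comes from the workshop material.

```python
# Hypothetical readings, oldest first (illustrative values only).
obs = [20.1, 20.4, 21.0, 21.7, 22.3]

t = 3  # index treated as the "current time"


def lag(series, t, n):
    """Return obs(t-n), the observation n steps before time t."""
    if t - n < 0:
        raise IndexError("lag reaches before the start of the series")
    return series[t - n]


current = obs[t]           # obs(t)
previous = lag(obs, t, 1)  # obs(t-1), a lag observation
target = obs[t + 1]        # obs(t+1), available here only because the data is historical

print(previous, current, target)
```

In a real forecasting setting obs(t+1) is not yet known at time t; it can be read here only because the whole series has already been recorded.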



Describing vs. Predicting
◦ Understanding a dataset, called time series analysis, can help to make better predictions, but is not required and
can result in a large technical investment in time and expertise not directly aligned with the desired outcome,
which is forecasting the future

◦ In descriptive modeling, or time series analysis, a time series is modeled to determine its components in terms of
seasonal patterns, trends, relation to external factors, and the like

◦ In time series forecasting, the information in a time series (perhaps with additional information) is used to
forecast future values of that series



Time Series Analysis
◦ Time series analysis involves developing models that best capture or describe an observed time series in order to
understand the underlying causes
◦ The why behind a time series dataset
◦ This involves making assumptions about the form of the data and decomposing the time series into its constituent components

◦ The quality of a descriptive model is determined by how well it describes all available data and the interpretation
it provides to better inform the problem domain

◦ The primary objective of time series analysis is to develop mathematical models that provide plausible
descriptions from sample data



Time Series Forecasting
◦ Making predictions about the future is called extrapolation in the classical statistical handling of time series data
◦ It is also referred to as time series forecasting
◦ Forecasting involves taking models fit on historical data and using them to predict future observations
◦ Descriptive models can borrow from the future (e.g. to smooth or remove noise) because they only seek to best describe the data

◦ An important distinction in forecasting is that the future is completely unavailable and must only be estimated
from what has already happened
◦ The skill of a time series forecasting model is determined by its performance at predicting the future
◦ This is often at the expense of being able to explain why a specific prediction was made, of confidence intervals, and even
of a better understanding of the underlying causes behind the problem
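
Because the future must be estimated only from what has already happened, forecast skill is usually measured on observations that come after the training data. The sketch below shows one simple way to do that, assuming a hypothetical helper `temporal_split` that holds out the most recent observations without shuffling.

```python
def temporal_split(series, n_test):
    """Split a series so that the last n_test observations form the test set.

    The order is preserved and nothing is shuffled, so the model never
    sees observations that lie in its own future.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


series = list(range(10))  # stand-in for ten ordered observations
train, test = temporal_split(series, n_test=3)
print(train, test)
```

A random train/test split, common in other machine learning tasks, would let information from later time steps leak into training.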



Components of Time Series
◦ Level: The baseline value for the series if it were a straight line
◦ Trend: The optional and often linear increasing or decreasing behavior of the series over time
◦ Seasonality: The optional repeating patterns or cycles of behavior over time
◦ Noise: The optional variability in the observations that cannot be explained by the model

◦ All time series have a level, most have noise, and the trend and seasonality are optional
◦ The main features of many time series are trends and seasonal variations; another important feature of most time series is
that observations close together in time tend to be correlated (serially dependent)
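
To make the four components concrete, a synthetic series can be built by adding them together. This is only a sketch: the additive combination, the sinusoidal seasonality, and every parameter value below are assumptions chosen for illustration, and `make_series` is a hypothetical helper.

```python
import math
import random


def make_series(n, level=10.0, slope=0.5, period=12, amplitude=3.0,
                noise_sd=0.0, seed=0):
    """Build level + trend + seasonality + noise for n time steps."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        trend = slope * t                                            # linear trend
        seasonality = amplitude * math.sin(2 * math.pi * t / period)  # repeating cycle
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0     # unexplained part
        series.append(level + trend + seasonality + noise)
    return series


clean = make_series(24)                # level, trend and seasonality only
noisy = make_series(24, noise_sd=1.0)  # same structure plus Gaussian noise
print(round(clean[0], 2), round(clean[3], 2), round(clean[12], 2))
```

Setting `slope=0` or `amplitude=0` switches off the trend or the seasonality, which mirrors the point that only the level is always present.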



Concerns of Forecasting
◦ How much data do you have available and are you able to gather it all together?
◦ More data is often more helpful, offering greater opportunity for exploratory data analysis, model testing and tuning, and
model fidelity
◦ What is the time horizon of predictions that is required? Short, medium or long term?
◦ Shorter time horizons are often easier to predict with higher confidence
◦ Can forecasts be updated frequently over time or must they be made once and remain static?
◦ Updating forecasts as new information becomes available often results in more accurate predictions
◦ At what temporal frequency are forecasts required?
◦ Forecasts can often be made at lower or higher frequencies, allowing you to harness down-sampling and up-sampling of the
data, which in turn can offer benefits while modeling



Cleaning, Scaling, and Transformation
◦ Frequency
◦ Perhaps data is provided at a frequency that is too high to model or is unevenly spaced through time requiring resampling
for use in some models

◦ Outliers
◦ Perhaps there are corrupt or extreme outlier values that need to be identified and handled

◦ Missing
◦ Perhaps there are gaps or missing data that need to be interpolated or imputed
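
Two of these preparation steps can be sketched in plain Python. The helpers `interpolate_missing` and `downsample_mean` are hypothetical names, and linear interpolation and block averaging are just one possible choice for filling gaps and lowering the frequency.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Gaps at the very start or end have a neighbour on one side only and are
    left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


def downsample_mean(values, factor):
    """Average consecutive blocks of `factor` observations.

    An incomplete final block is dropped.
    """
    n_blocks = len(values) // factor
    return [sum(values[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]


raw = [1.0, None, 3.0, 4.0, None, None, 7.0, 8.0]  # made-up data with gaps
filled = interpolate_missing(raw)
coarse = downsample_mean(filled, 2)  # e.g. hourly values averaged to 2-hourly
print(filled)
print(coarse)
```

Outlier handling is left out on purpose, since what counts as an outlier depends on the problem domain.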



Examples of Time Series Forecasting
◦ Forecasting the corn yield in tons by state each year
◦ Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or not
◦ Forecasting the closing price of a stock each day
◦ Forecasting the birth rate at all hospitals in a city each year
◦ Forecasting product sales in units sold each day for a store
◦ Forecasting the number of passengers through a train station each day
◦ Forecasting unemployment for a state each quarter
◦ Forecasting utilization demand on a server each hour
◦ Forecasting the size of the rabbit population in a state each breeding season
◦ Forecasting the average price of gasoline in a city each day



Promise of Deep Learning for
Time Series Forecasting

Deep Learning Neural Networks
◦ Deep learning neural networks are able to automatically learn arbitrarily complex mappings from inputs to outputs
and support multiple inputs and outputs
◦ These are powerful features that offer a lot of promise for time series forecasting, particularly on problems with complex
nonlinear dependencies, multivariate inputs, and multi-step forecasting
◦ More modern neural networks may offer further promise, such as the automatic feature learning provided by convolutional
neural networks and the native support for sequence data in recurrent neural networks

◦ Discover the promised capabilities of deep learning neural networks for time series forecasting
◦ The focus and implicit, if not explicit, limitations on classical time series forecasting methods
◦ The general capabilities of Multilayer Perceptrons and how they may be harnessed for time series forecasting
◦ The added capabilities of feature learning and native support for sequences provided by Convolutional Neural Networks
and Recurrent Neural Networks



Time Series Forecasting
◦ Time series forecasting is difficult
◦ Unlike the simpler problems of classification and regression, time series problems add the complexity of order or temporal
dependence between observations
◦ This can be difficult as specialized handling of the data is required when fitting and evaluating models
◦ This temporal structure can also aid in modeling, providing additional structure like trends and seasonality that can be
leveraged to improve model skill

◦ Traditional time series forecasting has been dominated by linear methods like ARIMA because they are well understood
and effective on many problems



Classical Methods Limitations
◦ Focus on complete data: missing or corrupt data is generally unsupported
◦ Focus on linear relationships: assuming a linear relationship excludes more complex joint distributions
◦ Focus on fixed temporal dependence: the relationship between observations at different times, and in turn the
number of lag observations provided as input, must be diagnosed and specified
◦ Focus on univariate data: many real-world problems have multiple input variables
◦ Focus on one-step forecasts: many real-world problems require forecasts with a long time horizon



Machine Learning Methods
◦ Machine learning methods can be effective on more complex time series forecasting problems with multiple
input variables, complex nonlinear relationships, and missing data
◦ In order to perform well, these methods often require hand-engineered features prepared by either domain
experts or practitioners with a background in signal processing
◦ Classical techniques often depended on hand-crafted features that were expensive to create and required expert
knowledge of the field



Multilayer Perceptrons for Time Series
◦ Simpler neural networks such as the Multilayer Perceptron or MLP approximate a mapping function from input
variables to output variables

◦ MLP capability is valuable for time series


◦ Robust to Noise. Neural networks are robust to noise in input data and in the mapping function and can even support
learning and prediction in the presence of missing values
◦ Nonlinear. Neural networks do not make strong assumptions about the mapping function and readily learn linear and
nonlinear relationships



Neural Networks for Time Series
◦ Neural networks can be configured to support an arbitrarily defined but fixed number of inputs and outputs in the
mapping function
◦ Multivariate Inputs. An arbitrary number of input features can be specified, providing direct support for multivariate
forecasting
◦ Multi-step Forecasts. An arbitrary number of output values can be specified, providing direct support for multi-step and
even multivariate forecasting
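
A fixed number of inputs and outputs means the series has to be cut into samples of a fixed shape before training. The sketch below shows one common way to do this with a sliding window; `make_windows`, `n_in` and `n_out` are hypothetical names introduced here for illustration.

```python
def make_windows(series, n_in, n_out):
    """Slice a series into (input, output) pairs of fixed length.

    Each input holds n_in consecutive lag observations and each output
    holds the n_out observations that immediately follow them.
    """
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y


series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, n_in=3, n_out=2)  # 3 lags in, 2-step forecast out
for inputs, outputs in zip(X, y):
    print(inputs, "->", outputs)
```

With `n_out=1` the same function produces one-step samples, so the choice between single-step and multi-step forecasting becomes a data-shaping decision.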



Feedforward Neural Networks
◦ Feedforward neural networks may be useful for time series forecasting
◦ Implicit in the usage of neural networks is the requirement that there is indeed a meaningful mapping from inputs to
outputs to learn
◦ Modeling a mapping of a random walk will perform no better than a persistence model (e.g. using the last seen
observation as the forecast)

◦ This expectation of a learnable mapping function also makes one of the limitations clear: the mapping function is
fixed or static
◦ Fixed Inputs. The number of lag input variables is fixed, in the same way as traditional time series forecasting methods
◦ Fixed Outputs. The number of output variables is also fixed; although a more subtle issue, it means that for each input
pattern, one output must be produced
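
The persistence model mentioned above is simple enough to write out in full. The sketch below is illustrative only: it assumes a simulated random walk with unit-variance Gaussian steps, and the helper names (`random_walk`, `persistence_forecast`, `walk_forward_rmse`) are hypothetical.

```python
import random


def random_walk(n, seed=1):
    """Simulate a random walk: each value is the previous one plus Gaussian noise."""
    rng = random.Random(seed)
    value, walk = 0.0, []
    for _ in range(n):
        value += rng.gauss(0.0, 1.0)
        walk.append(value)
    return walk


def persistence_forecast(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def walk_forward_rmse(series, forecaster, n_test):
    """Score one-step forecasts on the last n_test points, using only prior data each time."""
    squared_errors = []
    for i in range(len(series) - n_test, len(series)):
        prediction = forecaster(series[:i])
        squared_errors.append((series[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


walk = random_walk(500)
rmse = walk_forward_rmse(walk, persistence_forecast, n_test=100)
print(round(rmse, 3))
```

The resulting RMSE is close to the standard deviation of the random steps, which is the floor any model faces on this kind of series; a learned model that cannot beat this score on a real dataset has not found a useful mapping.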



Feedforward Neural Networks
◦ Feedforward neural networks do offer great capability but still suffer from this key limitation of having to specify
the temporal dependence upfront in the design of the model
◦ This dependence is almost always unknown and must be discovered and teased out from detailed analysis in a fixed form



Convolutional Neural Networks for Time Series
◦ Convolutional Neural Networks or CNNs are a type of neural network that was designed to efficiently handle
image data
◦ Proven effective on challenging computer vision problems both achieving state-of-the-art results on tasks like image
classification and providing a component in hybrid models for entirely new problems such as object localization, image
captioning and more

◦ Operating directly on raw data, such as raw pixel values, instead of domain-specific or handcrafted features
derived from the raw data
◦ The model then learns how to automatically extract the features from the raw data that are directly useful for the problem
being addressed
◦ This is called representation learning and the CNN achieves this in such a way that the features are extracted regardless of
how they occur in the data, so-called transform or distortion invariance



Convolutional Neural Networks for Time Series
◦ The ability of CNNs to learn and automatically extract features from raw input data can be applied to time series
forecasting problems
◦ A sequence of observations can be treated like a one-dimensional image that a CNN model can read and distill into the
most salient elements

◦ This capability of CNNs has been demonstrated to great effect on time series classification tasks such as
automatically detecting human activities based on raw accelerometer sensor data from fitness devices and
smartphones

◦ CNNs get the benefits of Multilayer Perceptrons for time series forecasting, namely support for multivariate
input, multivariate output and learning arbitrary but complex functional relationships, but do not require that the
model learn directly from lag observations
◦ The model can learn a representation from a large input sequence that is most relevant for the prediction problem
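
The idea of reading a sequence like a one-dimensional image can be illustrated with the sliding-filter operation at the heart of a convolutional layer. This is a sketch in plain Python rather than a trained CNN: `conv1d` is a hypothetical helper and the kernel is hand-picked, whereas a real network would learn its kernel weights from data.

```python
def conv1d(sequence, kernel):
    """Slide a kernel along a sequence and return the weighted sums.

    This is a 'valid' cross-correlation with stride 1, so the output is
    shorter than the input by len(kernel) - 1.
    """
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


sequence = [1, 2, 4, 7, 11, 16]
difference_kernel = [-1, 1]  # hand-picked: responds to the change between neighbours
feature_map = conv1d(sequence, difference_kernel)
print(feature_map)
```

The same kernel is applied at every position, which is why a feature can be detected wherever it occurs in the input window.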



Long Short-Term Memory (LSTM)
◦ LSTM is able to solve many time series tasks unsolvable by feedforward networks using fixed size time windows

◦ This capability of LSTMs has been used to great effect in complex natural language processing problems such
as neural machine translation where the model must learn the complex inter-relationships between words both
within a given language and across languages in translating from one language to another
◦ This capability can be used in time series forecasting

◦ Recurrent neural networks can also automatically learn the temporal dependence from the data
◦ The most relevant context of input observations to the expected output is learned and can change dynamically



Recurrent Neural Networks for Time Series
◦ Recurrent neural networks like the Long Short-Term Memory network or LSTM add the explicit handling of order
between observations when learning a mapping function from inputs to outputs, not offered by MLPs or CNNs
◦ They are a type of neural network that adds native support for input data comprised of sequences of observations

◦ Native Support for Sequences


◦ Recurrent neural networks directly add support for input sequence data

◦ The addition of sequence is a new dimension to the function being approximated


◦ Instead of mapping inputs to outputs alone, the network is capable of learning a mapping function for the inputs over time
to an output
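
The way a recurrent network carries information across time steps can be sketched with a single scalar unit. Note the assumptions: this is a plain recurrent cell with a tanh activation, not an LSTM, and the weights are fixed by hand rather than learned; `rnn_forward` is a hypothetical name.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state.

    Each state depends on the current input and on the previous state,
    so the order of the inputs matters.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same three values presented in a different order.
states_a = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
states_b = rnn_forward([0.0, 0.0, 1.0], w_x=0.5, w_h=0.8, b=0.0)
print(round(states_a[-1], 4), round(states_b[-1], 4))
```

The two final states differ even though the inputs contain the same values, which is the order sensitivity that an MLP fed an unordered feature vector does not have by construction.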



Promise of Deep Learning 1/2
The capabilities of deep learning neural networks suggest a good fit for time series forecasting
1. Neural networks learn arbitrary mapping functions
◦ Neural networks should be able to subsume the capabilities of classical linear forecasting methods given their ability to learn arbitrarily
complex mappings from inputs to outputs
2. Neural networks may not require a scaled or stationary time series as input
◦ It is good practice to manually identify and remove systematic structures from time series data to make the problem easier to model
(e.g. make the series stationary)
◦ May still be a best practice when using recurrent neural networks
◦ The available context of the sequence provided as input may allow neural network models to learn both trend and seasonality directly
3. Neural networks support multivariate inputs
◦ Each of the three classes of neural network models discussed, MLPs, CNNs and RNNs offer capabilities that are challenging for
classical time series forecasting methods
4. Neural networks support multi-step outputs



Promise of Deep Learning 2/2
5. Convolutional neural networks support efficient feature learning
◦ CNNs offer efficiency and much greater performance at automatically learning to identify, extract and distill useful features from raw
data
6. LSTM networks support efficient learning of temporal dependencies
◦ The explicit addition of support for input sequences in RNNs offers efficiency and greater performance for automatically learning the
temporal dependencies both within the input sequence and from the input sequence to the output
7. Hybrid models efficiently combine the diverse capabilities of different architectures
◦ Hybrid models like CNN-LSTMs and ConvLSTMs seek to harness the capabilities of all three model types



Taxonomy of Time Series
Forecasting Problems

Taxonomy of Time Series Forecasting Problems
◦ There are many things to consider with a new time series forecasting problem
◦ The choices you make directly impact each step of the project, from the design of a test harness to evaluate forecast models to
the fundamental difficulty of the forecast problem
◦ It is possible to very quickly narrow down the options by working through a series of questions about your time series
forecasting problem
◦ Considering a few themes and questions to narrow down the type of problem, test harness, and even choice of algorithms
for the project

◦ A framework to quickly understand and frame the time series forecasting problem
◦ A structured way of thinking about time series forecasting problems
◦ A framework to uncover the characteristics of a given time series forecasting problem
◦ A suite of specific questions, the answers to which will help to define the forecasting problem



Framework Overview
◦ Time series forecasting involves developing and using a predictive model on data where there is an ordered
relationship between observations
◦ Answering a few questions can greatly improve your understanding of the structure of the forecast problem, the
structure of the model required, and how to evaluate it

1. Inputs vs. Outputs
2. Endogenous vs. Exogenous
3. Unstructured vs. Structured
4. Regression vs. Classification
5. Univariate vs. Multivariate
6. Single-step vs. Multi-step
7. Static vs. Dynamic
8. Contiguous vs. Discontiguous



Inputs vs. Outputs
◦ A prediction problem involves using past observations to predict or forecast one or more possible future
observations
◦ The goal is to guess about what might happen in the future

◦ Inputs
◦ Historical data provided to the model in order to make a single forecast
◦ Not the data used to train the model
◦ The data used to make one forecast, for example the last seven days of sales data to forecast the next one day of sales
data
◦ May not be able to be specific when it comes to input data, for example may not know whether one or multiple prior time
steps are required to make a forecast

◦ Outputs
◦ Prediction or forecast for a future time step beyond the data provided as input



Endogenous vs. Exogenous
◦ The input data can be further subdivided in order to better understand its relationship to the output variable
◦ Endogenous
◦ An input variable is affected by other variables in the system and the output variable depends on it
◦ The observations for an input variable depend upon one another
◦ The observation at time t is dependent upon the observation at t – 1
◦ The t – 1 may depend on t – 2, and so on
◦ Exogenous
◦ An input variable is independent of other variables in the system and the output variable depends upon it
◦ Endogenous variables are influenced by other variables in the system (including themselves), whereas
exogenous variables are not and are considered to be outside the system
◦ A time series forecasting problem has endogenous variables and may or may not have exogenous variables
◦ The output is a function of some number of prior time steps
◦ Exogenous variables are often ignored given the strong focus on the time series itself



Regression vs. Classification
◦ Regression predictive modeling problems are those where a quantity is predicted
◦ A quantity is a numerical value; for example a price, a count, a volume, and so on
◦ A time series forecasting problem in which you want to predict one or more future numerical values is a regression type
predictive modeling problem

◦ Classification predictive modeling problems are those where a category is predicted


◦ Classify as one of two or more labels, for example hot, cold, up, down, and buy, sell
◦ A time series forecasting problem in which you want to classify input time series data is a classification type predictive
modeling problem
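
The same series can be framed either way, depending on what is asked of it. The sketch below uses made-up prices and two hypothetical helpers: one produces numeric targets (regression) and the other produces direction labels (classification).

```python
def regression_targets(series):
    """Regression framing: the target for each step is the next numeric value."""
    return series[1:]


def direction_labels(series):
    """Classification framing: label each step by the direction of the next move.

    A move that is not strictly upward (including no change) is labelled "down"
    here; how to treat ties is a modelling choice.
    """
    return ["up" if nxt > cur else "down" for cur, nxt in zip(series, series[1:])]


prices = [100, 102, 101, 105, 105]  # illustrative values only
print(regression_targets(prices))
print(direction_labels(prices))
```

Reframing a regression problem as classification discards the size of each move, which can make the problem easier to learn but less informative.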



Unstructured vs. Structured
◦ It is useful to plot each variable in a time series and inspect the plot looking for possible patterns
◦ A time series for a single variable may not have any obvious pattern
◦ A series with no pattern can be described as unstructured: there is no discernible time-dependent structure
◦ A time series may have obvious patterns
◦ A series with a trend or seasonal cycles can be described as structured
◦ The modeling process can often be simplified by identifying and removing the obvious structures from the data, such as an increasing
trend or repeating cycle
◦ Some classical methods even allow you to specify parameters to handle these systematic structures directly

Unstructured

• No obvious systematic time-dependent pattern in a time series variable

Structured

• Systematic time-dependent patterns in a time series variable (e.g. trend and/or seasonality)



Univariate vs. Multivariate
◦ A single variable measured over time is referred to as a univariate time series
◦ Univariate means one variate or one variable
◦ Multiple variables measured over time are referred to as a multivariate time series: multiple variates or multiple
variables
Univariate

• One variable measured over time

Multivariate

• Multiple variables measured over time

◦ The number of variables may differ between the inputs and outputs, i.e. the data may not be symmetrical

Univariate and Multivariate Inputs

• One or multiple input variables measured over time

Univariate and Multivariate Outputs

• One or multiple output variables to be predicted



Single-step vs. Multi-step
◦ One-step forecast: a forecast problem that requires a prediction of the next time step
◦ Multi-step forecast: a forecast problem that requires a prediction of more than one time step

One-step

• Forecast the next time step

Multi-step

• Forecast more than one future time step

◦ The more time steps to be projected into the future, the more challenging the problem given the compounding
nature of the uncertainty on each forecasted time step
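
A small simulation can make the compounding concrete. The sketch below assumes a random walk with unit-variance Gaussian steps and a persistence forecast, both chosen purely for illustration; `horizon_rmse` is a hypothetical helper. For this process the forecast error is known to grow roughly with the square root of the horizon.

```python
import random


def horizon_rmse(n_paths, max_horizon, seed=0):
    """Estimate persistence-forecast RMSE at each horizon on simulated random walks.

    Every path starts at the last observed value, taken as 0, so the
    persistence forecast is 0 at every horizon and the error equals the
    position of the walk.
    """
    rng = random.Random(seed)
    squared = [0.0] * max_horizon
    for _ in range(n_paths):
        position = 0.0
        for h in range(max_horizon):
            position += rng.gauss(0.0, 1.0)
            squared[h] += position ** 2
    return [(total / n_paths) ** 0.5 for total in squared]


rmse_by_horizon = horizon_rmse(n_paths=2000, max_horizon=10)
for h, value in enumerate(rmse_by_horizon, start=1):
    print(f"t+{h}: RMSE ~ {value:.2f}")
```

Other processes accumulate error at different rates, but the general pattern of wider uncertainty at longer horizons is what makes multi-step forecasting harder.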



Static vs. Dynamic
◦ It is possible to develop a model once and use it repeatedly to make predictions

Static

• A forecast model is fit once and used to make predictions
• The model is not updated or changed between forecasts

Dynamic

• A forecast model is fit on newly available data prior to each prediction
• A new model is developed, or the existing model is updated, after new observations are received and before a subsequent forecast is made
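
The difference between the two modes shows up in where the fitting step sits relative to the forecasting loop. In the sketch below the "model" is deliberately trivial (the mean of the training data) so that the control flow stays visible; all function names are hypothetical.

```python
def fit_mean(history):
    """'Fit' a trivial model whose single parameter is the mean of the training data."""
    return sum(history) / len(history)


def static_forecasts(series, n_train):
    """Fit once on the first n_train points and reuse that model for every forecast."""
    model = fit_mean(series[:n_train])
    return [model for _ in series[n_train:]]


def dynamic_forecasts(series, n_train):
    """Refit on all data available so far before each forecast."""
    return [fit_mean(series[:i]) for i in range(n_train, len(series))]


series = [2.0, 4.0, 6.0, 8.0, 10.0]
static = static_forecasts(series, n_train=2)
dynamic = dynamic_forecasts(series, n_train=2)
print(static)
print(dynamic)
```

On this upward-trending toy series the dynamic forecasts track the data more closely, at the cost of one refit per prediction.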



Contiguous vs. Discontiguous
Contiguous

• A time series where the observations are uniform over time
• One observation each hour, day, month or year

Discontiguous

• A time series where the observations are not uniform over time

◦ The lack of uniformity of the observations may be caused by missing or corrupt values
◦ It may also be a feature of the problem where observations are only made available sporadically or at increasingly or
decreasingly spaced time intervals
◦ In the case of non-uniform observations, specific data formatting may be required when fitting some models to make the
observations uniform over time
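
Whether a series is contiguous can be checked from its timestamps before any modeling starts. The sketch below assumes sorted timestamps that fall on a regular grid when present; `is_contiguous` and `missing_timestamps` are hypothetical helpers and the dates are made up.

```python
from datetime import datetime, timedelta


def is_contiguous(timestamps):
    """Return True if every pair of consecutive timestamps is the same distance apart."""
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(steps) <= 1


def missing_timestamps(timestamps, step):
    """List the expected timestamps, spaced by `step`, that are absent from the data."""
    present = set(timestamps)
    expected, missing = timestamps[0], []
    while expected <= timestamps[-1]:
        if expected not in present:
            missing.append(expected)
        expected += step
    return missing


start = datetime(2022, 7, 1)
hourly = [start + timedelta(hours=h) for h in (0, 1, 2, 4, 5)]  # hour 3 is absent

print(is_contiguous(hourly))
print(missing_timestamps(hourly, timedelta(hours=1)))
```

Once the gaps are located, they can be filled (for example by interpolation) or the model can be chosen to tolerate irregular spacing.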



Framework Review
1. Inputs vs. Outputs:

• What are the inputs and outputs for a forecast?

2. Endogenous vs. Exogenous:

• What are the endogenous and exogenous variables?

3. Unstructured vs. Structured:

• Are the time series variables unstructured or structured?

4. Regression vs. Classification:

• Are you working on a regression or classification predictive modeling problem?


• What are some alternate ways to frame your time series forecasting problem?



Framework Review
5. Univariate vs. Multivariate:

• Are you working on a univariate or multivariate time series problem?

6. Single-step vs. Multi-step:

• Do you require a single-step or a multi-step forecast?

7. Static vs. Dynamic:

• Do you require a static or a dynamically updated model?

8. Contiguous vs. Discontiguous:

• Are your observations contiguous or discontiguous?



Some Ideas for Extending
Apply Taxonomy

• Select a standard time series dataset and work through the questions in the taxonomy
to learn more about the dataset

Standard Form

• Transform the taxonomy into a form or spreadsheet that you can reuse on new time series
forecasting projects going forward

Additional Characteristic

• Brainstorm and list at least one additional characteristic of a time series forecasting
problem and a question that might be used to identify it
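
One possible way to turn the taxonomy into a reusable form is a small record type with one field per answer. This is only a sketch: the class name `ForecastProblem`, its field names, and the example answers are assumptions made for illustration (the seven-days-in, one-day-out sales framing echoes the earlier inputs example).

```python
from dataclasses import dataclass, asdict


@dataclass
class ForecastProblem:
    """Answers to the taxonomy questions for one forecasting project."""
    inputs: str          # what is fed to the model for a single forecast
    outputs: str         # what is predicted
    endogenous: list     # variables influenced by the system itself
    exogenous: list      # variables considered outside the system
    structured: bool     # obvious trend and/or seasonality?
    task: str            # "regression" or "classification"
    multivariate: bool   # more than one variable measured over time?
    multi_step: bool     # more than one future time step required?
    dynamic: bool        # model refit as new observations arrive?
    contiguous: bool     # observations uniform over time?


example = ForecastProblem(
    inputs="last 7 days of sales",
    outputs="next day of sales",
    endogenous=["sales"],
    exogenous=[],
    structured=True,
    task="regression",
    multivariate=False,
    multi_step=False,
    dynamic=False,
    contiguous=True,
)

for question, answer in asdict(example).items():
    print(f"{question}: {answer}")
```

A new characteristic from the brainstorming exercise can be added as one more field.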



How to Develop a Skillful
Forecasting Model

Skillful Forecasting Model
◦ “Here is a dataset, now develop a forecast”
◦ This is the normal situation that most practitioners find themselves in when getting started on a new time series
forecasting problem

◦ A specific and actionable procedure that you can use to work through a time series forecasting problem and get
better than average performance from the model
◦ A systematic four-step process that you can use to work through any time series forecasting problem
◦ A list of models to evaluate and the order in which to evaluate them
◦ A methodology that allows the choice of final model to be defensible with empirical evidence, rather than whim or fashion



Situation
◦ A common situation when you are asked to develop a forecast model
◦ Perhaps you were sent a CSV file
◦ Perhaps you were given access to a database
◦ Perhaps you are starting a competition

◦ The problem can be reasonably well defined:


◦ You have or can access historical time series data
◦ You know or can find out what needs to be forecasted
◦ You know or can find out what is most important in evaluating a candidate model



Situation
◦ So how do you tackle this problem? Unless you have been through this trial by fire, you may struggle
◦ You may struggle because you are new to the fields of machine learning and time series
◦ You may struggle even if you have machine learning experience, because time series data is different
◦ You may struggle even if you have a background in time series forecasting, because machine learning methods may
outperform the classical approaches on the data



Process Overview
◦ The goal of this process is to get a good enough forecast model as fast as possible
◦ This process may or may not deliver the best possible model, but it will deliver a good model: a model that is better than a baseline prediction
◦ As a rule of thumb, the process delivers a model that reaches about 80% to 90% of what can be achieved on the problem

◦ The process is fast
◦ It focuses on automation
◦ Hyperparameters are searched rather than specified based on careful analysis
◦ You are encouraged to test suites of models in parallel, rapidly getting an idea of what works and what doesn’t
◦ Nevertheless, the process is flexible, allowing you to circle back or go as deep as you like on a given step if you have the time and resources



Process Overview
◦ The process is divided into four parts

1. Define Problem → 2. Design Test Harness → 3. Test Models → 4. Finalize Model

◦ The process is different from a classical linear work-through of a predictive modeling problem
◦ It is designed to get a working forecast model fast and then slow down and see if you can get a better model



How to Use This Process
◦ The biggest mistake is skipping steps
◦ The mistake that almost all beginners make is going straight to modeling without a strong idea of what problem is being
solved or how to robustly evaluate candidate solutions
◦ This almost always results in a lot of wasted time
◦ Slow down, follow the process, and complete each step

◦ Keep separate code for each experiment so that it can be re-run at any time
◦ This is important so that you can circle back when you discover a bug, fix the code, and re-run an experiment
◦ You are running experiments and iterating quickly, but if you are sloppy, you cannot trust any of your results
◦ This is especially important when it comes to the design of the test harness for evaluating candidate models



Step 1: Define Problem
1. Inputs vs. Outputs
◦ What are the inputs and outputs for a forecast?
2. Endogenous vs. Exogenous
◦ What are the endogenous and exogenous variables?
3. Unstructured vs. Structured
◦ Are the time series variables unstructured or structured?
4. Regression vs. Classification
◦ Are you working on a regression or classification predictive modeling problem?
◦ What are some alternate ways to frame your time series forecasting problem?
5. Univariate vs. Multivariate
◦ Are you working on a univariate or multivariate time series problem?
6. Single-step vs. Multi-step
◦ Do you require a single-step or a multi-step forecast?
7. Static vs. Dynamic
◦ Do you require a static or a dynamically updated model?
8. Contiguous vs. Discontiguous
◦ Are your observations contiguous or discontiguous?



Step 1: Define Problem
◦ Some useful tools to help get answers include:
◦ Data visualizations (e.g. line plots, etc.)
◦ Statistical analysis (e.g. ACF/PACF plots, etc.)
◦ Domain experts
◦ Project stakeholders
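
As an illustration of the statistical analysis mentioned above, the following is a minimal sketch of the sample autocorrelation that an ACF plot displays. The function name `autocorrelation` and the contrived seasonal series are assumptions made for this example; in practice a library routine would normally be used.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    numerator = sum(
        (series[t] - mu) * (series[t - lag] - mu)
        for t in range(lag, len(series))
    )
    return numerator / denominator

# contrived series with a 3-step cycle (hypothetical data)
series = [1, 2, 3, 1, 2, 3, 1, 2, 3]
for lag in range(0, 4):
    print(lag, round(autocorrelation(series, lag), 3))
```

A clearly higher value at lag 3 than at lags 1 and 2 hints at a 3-step seasonal cycle, which helps answer the framing questions of Step 1.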



Step 2: Design Test Harness
◦ Design a test harness that you can use to evaluate candidate models
◦ It includes both the method used to estimate model skill and the metric used to evaluate predictions

◦ A common time series forecasting model evaluation scheme, if you are looking for ideas:
◦ Split the dataset into a train and test set
◦ Fit a candidate approach on the training dataset
◦ Make predictions on the test set directly or using walk-forward validation
◦ Calculate a metric that compares the predictions to the expected values

◦ The test harness must be robust, and you must have complete trust in the results it provides
◦ Ensure that any coefficients used for data preparation are estimated from the training dataset only and then applied on the test set
◦ This includes the mean and standard deviation in the case of data standardization
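
The evaluation scheme above can be sketched as follows. This is a minimal illustration, not a complete harness: the function names, the RMSE metric, and the contrived series are assumptions chosen for the example, and the persistence forecast stands in for any candidate model.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def persistence(history):
    """Candidate model: forecast the most recent observation."""
    return history[-1]

def walk_forward_validation(series, n_test, forecast_fn):
    """Forecast each test point in turn, then reveal the true value."""
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(forecast_fn(history))
        history.append(observed)  # the true value becomes available
    return rmse(test, predictions)

# contrived series (hypothetical data)
series = [100, 110, 108, 115, 120, 118, 125, 130]
score = walk_forward_validation(series, n_test=3, forecast_fn=persistence)
print(round(score, 3))
```

Any other model can be evaluated by passing a different `forecast_fn`, which keeps the split and the metric identical across candidates.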



Step 3: Test Models
◦ Test many models using the test harness
◦ Carefully design experiments to test a suite of configurations for standard models, then let them run
◦ Each experiment can record results to a file, so that you can quickly discover the top three to five most skillful configurations from each run
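
A small sketch of such an experiment is shown below, under stated assumptions: the model is a simple average of the last n observations, the configurations are four hypothetical window sizes, and results are kept in a list rather than written to a file to keep the example short.

```python
from math import sqrt

def average_forecast(history, n):
    """Forecast the mean of the last n observations."""
    window = history[-n:]
    return sum(window) / len(window)

def evaluate(series, n_test, n):
    """Walk-forward RMSE of the average forecast with window size n."""
    history = list(series[:-n_test])
    squared_errors = []
    for observed in series[-n_test:]:
        yhat = average_forecast(history, n)
        squared_errors.append((observed - yhat) ** 2)
        history.append(observed)
    return sqrt(sum(squared_errors) / len(squared_errors))

# contrived trending data (hypothetical)
series = [float(x) for x in range(1, 21)]

# evaluate every configuration, then rank by error (lower is better)
results = [(evaluate(series, 5, n), n) for n in (1, 2, 3, 4)]
results.sort()
top_three = results[:3]
for score, n in top_three:
    print(f"n={n} rmse={score:.3f}")
```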



Step 3: Test Models
◦ Some common classes of methods that you can design experiments around include the following:

1. Baseline
◦ Simple forecasting methods such as persistence and averages
2. Autoregression
◦ The Box-Jenkins process and methods such as SARIMA
3. Exponential Smoothing
◦ Single, double and triple exponential smoothing methods
4. Linear Machine Learning
◦ Linear regression methods and variants such as regularization
5. Nonlinear Machine Learning
◦ kNN, decision trees, support vector regression and more
6. Ensemble Machine Learning
◦ Random forest, gradient boosting, stacking and more
7. Deep Learning
◦ MLPs, CNNs, LSTMs, and hybrid models



Step 3: Test Models
◦ Configurations evaluation
◦ Search model configurations at a finer resolution around a configuration known to already perform well
◦ Search more model hyperparameter configurations
◦ Use analysis to set better bounds on model hyperparameters to be searched
◦ Use domain knowledge to better prepare data or engineer input features
◦ Explore different, potentially more complex methods
◦ Explore ensembles of well performing base models



Step 3: Test Models
◦ Data preparation schemes:
◦ Differencing to remove a trend
◦ Seasonal differencing to remove seasonality
◦ Standardize to center
◦ Normalize to rescale
◦ Power transform to make the distribution more Gaussian
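
A few of these preparation schemes can be sketched in plain Python. The helper names and example values are assumptions made for illustration; note that the standardization coefficients are estimated on the training split only and then reused on the test split.

```python
from statistics import mean, pstdev

def difference(series, interval=1):
    """Subtract the value `interval` steps back (1 = trend, m = seasonal)."""
    return [series[i] - series[i - interval] for i in range(interval, len(series))]

def fit_standardizer(train):
    """Estimate mean and standard deviation from the training data only."""
    return mean(train), pstdev(train)

def standardize(values, mu, sigma):
    """Apply previously estimated coefficients to any split."""
    return [(v - mu) / sigma for v in values]

# differencing to remove a trend (hypothetical data)
trend_series = [10, 12, 15, 19, 24]
print(difference(trend_series))

# seasonal differencing with a 3-step cycle (hypothetical data)
seasonal_series = [1, 2, 3, 2, 3, 4, 3, 4, 5]
print(difference(seasonal_series, interval=3))

# standardize the test split with coefficients from the training split
train, test = [2.0, 4.0, 6.0], [4.0, 8.0]
mu, sigma = fit_standardizer(train)
print(standardize(test, mu, sigma))
```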



Step 3: Test Models
◦ Ways to speed up the evaluation of models include:
◦ Use multiple machines in parallel via cloud hardware (such as Amazon EC2)
◦ Reduce the size of the train or test dataset to make the evaluation process faster
◦ Use a coarser grid of hyperparameters and circle back later if you have time
◦ Perhaps do not refit a model for each step in walk-forward validation



Step 4: Finalize Model
◦ At this point you know whether the time series is predictable
◦ If it is predictable, you have a list of the top 5 to 10 candidate models that are skillful on the problem
◦ Pick one or multiple models and finalize them

◦ Finalizing involves training a new final model on all available historical data (train and test)

◦ The model is then ready for use:
◦ Make a prediction for the future
◦ Save the model to file for later use in making predictions
◦ Incorporate the model into software for making predictions



Time Series as Supervised
Learning

Time Series as Supervised Learning
◦ Time series forecasting can be framed as a supervised learning problem
◦ This re-framing gives access to the suite of standard linear and nonlinear machine learning algorithms on any time series problem

◦ How to re-frame a time series problem as a supervised learning problem for machine learning
◦ What supervised learning is and how it is the foundation for all predictive modeling machine learning algorithms
◦ The sliding window method for framing a time series dataset and how to use it
◦ How to use the sliding window for multivariate data and multi-step forecasting



Supervised Machine Learning
◦ The majority of practical machine learning uses supervised learning
◦ You have input variables (X) and an output variable (y)
◦ You use an algorithm to learn the mapping function from the input to the output

y = f(X)



Supervised Machine Learning
◦ The goal is to approximate the real underlying mapping so well that, given new input data (X), the output variable (y) for that data can be predicted

◦ A contrived example of a supervised learning dataset with one input variable (X) and one output variable to be predicted (y)

X, y
5, 0.9
4, 0.8
5, 1.0
3, 0.7
4, 0.9

◦ Each row is an observation



Supervised Machine Learning
◦ It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process
◦ The algorithm iteratively makes predictions on the training data and is corrected by making updates
◦ Learning stops when the algorithm achieves an acceptable level of performance
◦ Supervised learning problems can be further grouped into regression and classification problems
◦ Classification: the output variable is a category, such as red and blue or disease and no disease
◦ Regression: the output variable is a real value, such as dollars or weight



Sliding Window
◦ Time series data can be phrased as supervised learning
◦ Given a sequence of numbers for a time series dataset, you can restructure the data to look like a supervised learning problem
◦ Use previous time steps as input variables
◦ Use the next time step as the output variable

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120



Sliding Window
◦ Restructure the time series dataset as a supervised learning problem
◦ Use the value at the previous time step to predict the value at the next time-step
◦ Re-organizing the time series dataset

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

◦ The previous time step is the input (X) and the next time step is the output (y)
◦ The order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model
◦ There is no previous value that can be used to predict the first value in the sequence
◦ Delete this row, as it cannot be used
◦ There is no known next value to predict for the last value in the sequence
◦ Delete this row as well when training the supervised model



Sliding Window
◦ The sliding window method (or window method): the use of prior time steps to predict the next time step
◦ In statistics and time series analysis, this is called a lag or lag method
◦ The number of previous time steps is called the window width or size of the lag
◦ The sliding window is the basis for turning any time series dataset into a supervised learning problem

◦ Notice:
◦ A time series can be turned into either a regression or a classification supervised learning problem, for real-valued or labeled time series values respectively
◦ The standard linear and nonlinear machine learning algorithms may be applied
◦ The width of the sliding window can be increased to include more previous time steps
◦ The sliding window approach can be used on a time series that has more than one value per time step, a so-called multivariate time series
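
The sliding window framing can be sketched in a few lines of Python. The function name `sliding_window` and its `width` parameter are assumptions made for this illustration; rows at the edges that would contain unknown values are simply never generated.

```python
def sliding_window(series, width=1):
    """Frame a univariate series as (X, y) rows using `width` lagged values."""
    X, y = [], []
    for i in range(width, len(series)):
        X.append(series[i - width:i])  # previous `width` observations
        y.append(series[i])            # the next observation
    return X, y

series = [100, 110, 108, 115, 120]

X, y = sliding_window(series, width=1)
for inputs, output in zip(X, y):
    print(inputs, output)

# a wider window uses more prior time steps per row
X2, y2 = sliding_window(series, width=2)
print(X2, y2)
```

Row order is kept as in the original series, which matters when the framed data is later split into train and test sets.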



Sliding Window With Multivariates
◦ Univariate Time Series
◦ These are datasets where only a single variable is observed at each time, such as temperature each hour
◦ The simplest to understand and work with

◦ Multivariate Time Series


◦ These are datasets where two or more variables are observed at each time
◦ They are harder to model, and many of the classical methods often do not perform well on them
◦ Multivariate analysis is much more complicated than univariate time series analysis



Sliding Window With Multivariates
◦ Example of a small contrived multivariate time series dataset with two observations at each time step

time, measure1, measure2
1, 0.2, 88
2, 0.5, 89
3, 0.7, 87
4, 0.4, 88
5, 1.0, 90

◦ Re-frame this time series dataset as a supervised learning problem with a window width of one
◦ Use the previous time step values of measure1 and measure2
◦ Have available the next time step value for measure1
◦ Then predict the next time step value of measure2
◦ This gives 3 input features and one output value to predict for each training pattern

X1, X2, X3, y
?, ?, 0.2, 88 (need to remove)
0.2, 88, 0.5, 89
0.5, 89, 0.7, 87
0.7, 87, 0.4, 88
0.4, 88, 1.0, 90
1.0, 90, ?, ? (need to remove)
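
The multivariate framing with a window width of one can be sketched as below. The function name `frame_multivariate` is an assumption for this example; it pairs each row with its successor so that the incomplete first and last rows never appear.

```python
# (measure1, measure2) at each time step, taken from the contrived dataset
rows = [(0.2, 88), (0.5, 89), (0.7, 87), (0.4, 88), (1.0, 90)]

def frame_multivariate(rows):
    """Inputs: previous measure1, previous measure2, current measure1.
    Output: current measure2."""
    X, y = [], []
    for previous, current in zip(rows, rows[1:]):
        X.append([previous[0], previous[1], current[0]])
        y.append(current[1])
    return X, y

X, y = frame_multivariate(rows)
for inputs, output in zip(X, y):
    print(inputs, output)
```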



Sliding Window With Multivariates
◦ How to predict both measure1 and measure2 for the next time step?
◦ The sliding window approach can be used
◦ Phrase it as a supervised learning problem to predict both measure1 and measure2 with the same window width of one

X1, X2, y1, y2
?, ?, 0.2, 88
0.2, 88, 0.5, 89
0.5, 89, 0.7, 87
0.7, 87, 0.4, 88
0.4, 88, 1.0, 90
1.0, 90, ?, ?

◦ Not many supervised learning methods can handle the prediction of multiple output values without modification
◦ Predicting two different output variables
◦ Multi-step forecasting
◦ Predicting multiple time steps ahead of one output variable



Sliding Window With Multiple Steps
◦ The number of time-steps to forecast:
◦ One-step Forecast: The next time step (t+1) is predicted
◦ Multi-step Forecast: Two or more future time steps are to be predicted

◦ There are a number of ways to model multi-step forecasting as a supervised learning problem
◦ Framing multi-step forecast using the sliding window method



Framing Multi-step Forecast
◦ Example of a small contrived time series dataset

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

◦ Frame this time series as a two-step forecasting dataset for supervised learning with a window width of one

X1, y1, y2
?, 100, 110 (cannot be used to train)
100, 110, 108
110, 108, 115
108, 115, 120
115, 120, ? (cannot be used to train)
120, ?, ? (cannot be used to train)

◦ A supervised model only has X1 to work with in order to predict both y1 and y2
◦ Careful thought and experimentation are needed on the problem to find a window width that results in acceptable model performance
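
The two-step framing can be sketched as a small generalization of the sliding window. The function name `frame_multistep` and the parameters `width` and `n_out` are assumptions for this illustration; only rows with complete inputs and outputs are produced.

```python
def frame_multistep(series, width=1, n_out=2):
    """Frame a series with `width` lagged inputs and `n_out` future outputs."""
    X, y = [], []
    for i in range(width, len(series) - n_out + 1):
        X.append(series[i - width:i])   # lagged inputs
        y.append(series[i:i + n_out])   # the next n_out observations
    return X, y

series = [100, 110, 108, 115, 120]
X, y = frame_multistep(series, width=1, n_out=2)
for inputs, outputs in zip(X, y):
    print(inputs, outputs)
```

Only three of the six candidate rows survive, which shows how quickly usable training data shrinks as the forecast horizon grows on a short series.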



Review of Simple and Classical
Forecasting Methods

Simple and Classical Forecasting Methods
◦ Machine Learning and Deep Learning methods can achieve impressive results on challenging time series
forecasting problems
◦ There are many forecasting problems where classical methods such as SARIMA and exponential smoothing readily
outperform more sophisticated methods
◦ It is important to both understand how classical time series forecasting methods work and to evaluate them prior to
exploring more advanced methods

◦ Naive and classical methods for time series forecasting:


◦ How to develop simple forecasts for time series forecasting problems that provide a baseline for estimating model skill
◦ How to develop autoregressive models for time series forecasting
◦ How to develop exponential smoothing methods for time series forecasting



Simple Forecasting Methods
◦ Establishing a baseline is essential on any time series forecasting problem
◦ A baseline in performance gives an idea of how well all other models will actually perform on the problem

◦ How to develop simple forecasting methods
◦ These can be used to calculate a baseline level of performance on the time series forecasting problem



Forecast Performance Baseline
◦ A baseline in forecast performance provides a point of comparison
◦ It is a point of reference for all other modeling techniques on the problem
◦ If a model achieves performance at or below the baseline, the technique should be fixed or abandoned
◦ The technique used to generate a forecast to calculate the baseline performance must be easy to implement and naive of
problem-specific details
◦ The goal is to get a baseline performance on the time series forecast problem quickly, so that effort can move on to better understanding the dataset and developing more advanced models

◦ A good technique for making a naive forecast has three properties:
◦ Simple: A method that requires little or no training or intelligence
◦ Fast: A method that is fast to implement and computationally trivial to make a prediction
◦ Repeatable: A method that is deterministic, meaning that it produces an expected output given the same input



Forecast Strategies
◦ Simple forecast strategies are those that assume little or nothing about the nature of the forecast problem and
are fast to implement and calculate
◦ If a model can perform better than a simple forecast strategy, it can be said to be skillful

◦ Two main simple forecast strategies:
◦ Naive, or using observation values directly
◦ Average, or using a statistic calculated on previous observations



Naive Forecasting Strategy
◦ A naive forecast involves using the previous observation directly as the forecast without any change
◦ It is often called the persistence forecast as the prior observation is persisted
◦ This simple approach can be adjusted slightly for seasonal data
◦ The observation at the same time in the previous cycle may be persisted instead
◦ This can be further generalized to testing each possible offset into the historical data that could be used to persist a value
for a forecast
◦ Example of a univariate time series
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Persist the last observation (relative index -1) as the value 9

Persist the second last prior observation (relative index -2) as 8
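
The persistence strategy with a configurable offset can be sketched as follows. The function name `naive_forecast` and the `offset` parameter are assumptions for this example; the seasonal series is contrived.

```python
def naive_forecast(history, offset=1):
    """Persist the observation `offset` steps back from the end of the history."""
    return history[-offset]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(naive_forecast(data, offset=1))  # last observation
print(naive_forecast(data, offset=2))  # second-to-last observation

# for seasonal data, an offset equal to the cycle length persists
# the observation from the same point in the previous cycle
seasonal = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(naive_forecast(seasonal, offset=3))
```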



Average Forecast Strategy
◦ One step above the naive forecast is the strategy of averaging prior values
◦ All prior observations are collected and averaged, either using the mean or the median, with no other treatment to the data
◦ You may want to shorten the history used in the average calculation to the last few observations
◦ This can be generalized to testing each possible set of n prior observations to include in the average calculation
◦ Example of a univariate time series
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Average the last one observation (9)

Average the last two observations (8, 9)

◦ Example of a univariate time series with seasonal structure


[1, 2, 3, 1, 2, 3, 1, 2, 3]

The series with a 3-step cycle
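
The average strategy, including a seasonal variant, can be sketched as below. The function name and the parameters `n`, `offset`, and `avg_type` are assumptions for this illustration; with `offset` set to the cycle length, only observations from the same point in earlier cycles are averaged.

```python
from statistics import mean, median

def average_forecast(history, n, offset=1, avg_type="mean"):
    """Average n prior observations spaced `offset` steps apart."""
    values = [history[-i * offset] for i in range(1, n + 1)]
    return mean(values) if avg_type == "mean" else median(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(average_forecast(data, n=1))  # average of the last one observation
print(average_forecast(data, n=2))  # average of the last two observations

# seasonal data with a 3-step cycle: average the values one and two cycles back
seasonal = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(average_forecast(seasonal, n=2, offset=3))
```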



Autoregressive Methods
◦ Autoregressive Integrated Moving Average (ARIMA) is one of the most widely used forecasting methods for univariate time series data
◦ Although the method can handle data with a trend, it does not support time series with a seasonal component

◦ Seasonal Autoregressive Integrated Moving Average (SARIMA): An extension to ARIMA that supports the direct
modeling of the seasonal component of the series
◦ Discover the SARIMA method for time series forecasting with univariate data containing trends and seasonality



Autoregressive Integrated Moving Average Model
◦ An ARIMA model is a class of statistical models for analyzing and forecasting time series data
◦ It explicitly caters to a suite of standard structures in time series data, and as such provides a simple yet powerful method
for making skillful time series forecasts
◦ It is a generalization of the simpler AutoRegressive Moving Average or ARMA and adds the notion of integration

◦ This acronym is descriptive, capturing the key aspects of the model itself
◦ AR: Autoregression – A model that uses the dependent relationship between an observation and some number of lagged
observations
◦ I: Integrated – The use of differencing of raw observations (e.g. subtracting the observation at the previous time step from the current observation) in order to make the time series stationary
◦ MA: Moving Average – A model that uses the dependency between an observation and a residual error from a moving
average model applied to lagged observations



Autoregressive Integrated Moving Average Model
◦ The standard notation is ARIMA(p, d, q), where p is the number of lag observations (AR order), d is the degree of differencing, and q is the order of the moving average
◦ The parameters are substituted with integer values to quickly indicate the specific ARIMA model being used
◦ ARIMA does not support seasonal data, that is, a time series with a repeating cycle
◦ ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing



Exponential Smoothing Methods
◦ Exponential smoothing: A time series forecasting method for univariate data that can be extended to support
data with a systematic trend or seasonal component
◦ It is a powerful forecasting method that may be used as an alternative to the popular Box-Jenkins ARIMA family of
methods

◦ Time series methods like the Box-Jenkins ARIMA family of methods develop a model
◦ The prediction is a weighted linear sum of recent past observations or lags

◦ Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past
observations
◦ The model explicitly uses an exponentially decreasing weight for past observations
◦ Past observations are weighted with a geometrically decreasing ratio
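
The geometrically decreasing weights can be made concrete with a short sketch of simple exponential smoothing. The function names, the smoothing factor `alpha`, and the choice to initialize the level with the first observation are assumptions for this illustration, not a reproduction of any particular library.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step forecast: recursively blend each observation into the level."""
    level = series[0]  # assumption: start from the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def smoothing_weights(alpha, n):
    """Weights given to the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]

print(smoothing_weights(0.5, 3))                      # each weight is half the previous one
print(simple_exponential_smoothing([1, 2, 3], 0.5))
```

An `alpha` close to 1 makes the forecast follow recent observations closely, while a value close to 0 spreads the weight over a longer history.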



Classical Time Series Forecasting
Methods in Python

Classical Time Series Forecasting Methods
◦ Before exploring machine learning methods for time series, it is a good idea to ensure you have exhausted classical linear time series forecasting methods
◦ Classical methods are focused on linear relationships
◦ They are nevertheless sophisticated and perform well on a wide range of problems
◦ This assumes that the data is suitably prepared and the method is well configured

◦ This section presents a suite of classical methods for time series forecasting that you can test on a forecasting problem prior to exploring machine learning methods



Classical Time Series Forecasting Methods
◦ Autoregression (AR)
◦ Moving Average (MA)
◦ Autoregressive Moving Average (ARMA)
◦ Autoregressive Integrated Moving Average (ARIMA)
◦ Seasonal Autoregressive Integrated Moving-Average (SARIMA)
◦ Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
◦ Vector Autoregression (VAR)
◦ Vector Autoregression Moving-Average (VARMA)
◦ Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
◦ Simple Exponential Smoothing (SES)
◦ Holt Winter’s Exponential Smoothing (HWES)



Classical Time Series Forecasting Methods
◦ All code examples are in Python and use the statsmodels library
◦ Each code example is demonstrated on a simple contrived dataset that may or may not be appropriate for the method
◦ Replace the contrived dataset with your own data in order to test the method
◦ Install the statsmodels library first (for example with pip install statsmodels)



statsmodels
◦ statsmodels is a Python module
◦ It provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically

◦ An extensive list of result statistics is available for each estimator
◦ The results are tested against existing statistical packages to ensure that they are correct
◦ The package is released under the open source Modified BSD (3-clause) license
◦ The online documentation is hosted at statsmodels.org



AutoRegression (AR)
◦ AR method models the next step in the sequence as a linear function of the observations at prior time steps
◦ The notation for the model involves specifying the order of the model p as a parameter to the AR function, e.g. AR(p)
◦ AR(1) is a first-order autoregression model

◦ The method is suitable for univariate time series without trend and seasonal components

statsmodels.tsa.ar_model.AutoReg

statsmodels.tsa.ar_model.AutoRegResults



Autoregression (AR)
◦ Autoregression Example
# AR example
from statsmodels.tsa.ar_model import AutoReg
from random import random

# contrived dataset
data = [x + random() for x in range(1, 100)]

# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()

# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)



Autoregression (AR)
◦ Running the example prepares the data, fits the model, and makes a prediction
[100.54323284]



Moving Average (MA)
◦ MA method models the next step in the sequence as a linear function of the residual errors from a mean process
at prior time steps
◦ A moving average model is different from calculating the moving average of the time series
◦ The notation for the model involves specifying the order of the model q as a parameter to the MA function, e.g. MA(q)
◦ MA(1) is a first-order moving average model

◦ The method is suitable for univariate time series without trend and seasonal components



Moving Average (MA)
◦ MA Example
# MA example
from statsmodels.tsa.arima.model import ARIMA
from random import random

# contrived dataset
data = [x + random() for x in range(1, 100)]

# fit model
model = ARIMA(data, order=(0, 0, 1))
model_fit = model.fit()

# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)



Moving Average (MA)
◦ Running the example prepares the data, fits the model, and makes a prediction
[72.23938335]



Autoregressive Moving Average (ARMA)
◦ ARMA method models the next step in the sequence as a linear function of the observations and residual errors
at prior time steps
◦ It combines both Autoregression (AR) and Moving Average (MA) models
◦ The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to an ARMA
function, e.g. ARMA(p, q)
◦ An ARIMA model can be used to develop AR or MA models

◦ The method is suitable for univariate time series without trend and seasonal components



Autoregressive Moving Average (ARMA)
◦ ARMA Example
# ARMA example
from statsmodels.tsa.arima.model import ARIMA
from random import random

# contrived dataset
data = [random() for x in range(1, 100)]

# fit model
model = ARIMA(data, order=(2, 0, 1))
model_fit = model.fit()

# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)



Autoregressive Moving Average (ARMA)
◦ Running the example prepares the data, fits the model, and makes a prediction
[0.48535413]



Autoregressive Integrated Moving Average (ARIMA)
◦ ARIMA method models the next step in the sequence as a linear function of the differenced observations and
residual errors at prior time steps
◦ It combines both Autoregression (AR) and Moving Average (MA) models as well as a differencing pre-processing step of
the sequence to make the sequence stationary, called integration (I)
◦ The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA
function, e.g. ARIMA(p, d, q)
◦ An ARIMA model can also be used to develop AR, MA, and ARMA models

◦ The method is suitable for univariate time series with trend and without seasonal components



Autoregressive Integrated Moving Average (ARIMA)
◦ ARIMA Example
# ARIMA example
from statsmodels.tsa.arima.model import ARIMA
from random import random

# contrived dataset with a linear trend
data = [x + random() for x in range(1, 100)]

# fit model (d=1 applies first-order differencing)
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()

# make a one-step, out-of-sample prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)


Autoregressive Integrated Moving Average (ARIMA)
◦ Running the example prepares the data, fits the model, and prints a one-step forecast
◦ The exact value changes from run to run because the data is random; for this series, which rises to about 100, it should be close to 100



Seasonal Autoregressive Integrated Moving-Average (SARIMA)
◦ SARIMA method models the next step in the sequence as a linear function of the differenced observations,
errors, differenced seasonal observations, and seasonal errors at prior time steps
◦ It combines the ARIMA model with the ability to perform the same autoregression, differencing, and moving average
modeling at the seasonal level
◦ The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA
function and AR(P), I(D), MA(Q) and m parameters at the seasonal level, e.g. SARIMA(p, d, q)(P, D, Q)m where “m” is the
number of time steps in each season (the seasonal period)
◦ A SARIMA model can be used to develop AR, MA, ARMA and ARIMA models

◦ The method is suitable for univariate time series with trend and/or seasonal components



Seasonal Autoregressive Integrated Moving-Average (SARIMA)
◦ SARIMA Example
# SARIMA example
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random

# contrived dataset
data = [x + random() for x in range(1, 100)]

# fit model
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)

# make prediction
yhat = model_fit.predict(start=len(data), end=len(data))
print(yhat)



Seasonal Autoregressive Integrated Moving-Average (SARIMA)
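◦ Running the example prepares the data, fits the model, and prints a one-step forecast
◦ The exact value changes from run to run because the data is random; for this series, which rises to about 100, it should be close to 100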



Seasonal Autoregressive Integrated Moving-Average with
Exogenous Regressors (SARIMAX)
◦ SARIMAX is an extension of the SARIMA model that also includes the modeling of exogenous variables
◦ Exogenous variables are covariates and can be thought of as parallel input sequences that have observations at the same
time steps as the original series
◦ The primary series may be referred to as endogenous data to contrast it with the exogenous sequence(s)
◦ The observations for exogenous variables are included in the model directly at each time step and are not modeled in the
same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process)
◦ The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX,
ARMAX, and ARIMAX

◦ The method is suitable for univariate time series with trend and/or seasonal components and exogenous
variables



Seasonal Autoregressive Integrated Moving-Average with
Exogenous Regressors (SARIMAX)
◦ SARIMAX Example
# SARIMAX example
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random

# contrived dataset
data1 = [x + random() for x in range(1, 100)]
data2 = [x + random() for x in range(101, 200)]

# fit model
model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)

# make prediction
exog2 = [200 + random()]
yhat = model_fit.predict(len(data1), len(data1), exog=[exog2])
print(yhat)



Seasonal Autoregressive Integrated Moving-Average with
Exogenous Regressors (SARIMAX)
◦ Running the example prepares the data, fits the model, and makes a prediction
[100.13132921]



Vector Autoregression (VAR)
◦ The VAR method models the next step in each time series using an AR model in which each series depends on the lagged values of all the series
◦ It is the generalization of AR to multiple parallel time series, i.e. multivariate time series
◦ The notation for the model involves specifying the order for the AR(p) model as parameters to a VAR function, e.g. VAR(p)

◦ The method is suitable for multivariate time series without trend and seasonal components
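
◦ To make the idea concrete, a VAR(1) model can be estimated by hand with ordinary least squares: each series is regressed on the previous values of all series. The sketch below uses only NumPy. The coefficient matrix A, the noise scale, and the omission of an intercept are assumptions made for the illustration.

```python
# Hand-rolled VAR(1) sketch (assumed: zero-mean data, no intercept)
import numpy as np

# simulate two dependent series: y[t] = A @ y[t-1] + noise
rng = np.random.default_rng(42)
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])  # hypothetical, stable coefficient matrix
n = 2000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

# least squares: regress each row y[t] on the previous row y[t-1]
X, Y = y[:-1], y[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T  # estimated coefficient matrix

# one-step forecast for both series from the last observation
yhat = A_hat @ y[-1]
print(A_hat)
print(yhat)
```

◦ The off-diagonal entries of the estimated matrix capture how each series is influenced by the other, which is what separates VAR from fitting an independent AR model per series.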



Vector Autoregression (VAR)
◦ VAR Example
# VAR example
from statsmodels.tsa.vector_ar.var_model import VAR
from random import random

# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)

# fit model
model = VAR(data)
model_fit = model.fit()

# make prediction
yhat = model_fit.forecast(model_fit.endog, steps=1)
print(yhat)



Vector Autoregression (VAR)
◦ Running the example prepares the data, fits the model, and makes a prediction
[[100.26399189 100.85867797]]



Vector Autoregression Moving-Average (VARMA)
◦ The VARMA method models the next step in each time series using an ARMA model
◦ It is the generalization of ARMA to multiple parallel time series, i.e. multivariate time series
◦ The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to a VARMA
function, e.g. VARMA(p, q)
◦ A VARMA model can also be used to develop VAR or VMA models

◦ The method is suitable for multivariate time series without trend and seasonal components
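
◦ As a reference for the VARMA(p, q) notation, one common way to write the model is given below. The symbols are assumptions introduced here: y_t is the vector of observations at time t, c a constant vector, A_i and M_j coefficient matrices, and ε_t a white-noise vector.

```latex
y_t = c + \sum_{i=1}^{p} A_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} M_j \, \varepsilon_{t-j}
```

◦ Setting q = 0 leaves only the autoregressive sum, giving VAR(p); setting p = 0 leaves only the moving-average sum, giving VMA(q).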



Vector Autoregression Moving-Average (VARMA)
◦ VARMA Example
# VARMA example
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random

# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)

# fit model
model = VARMAX(data, order=(1, 1))
model_fit = model.fit(disp=False)

# make prediction
yhat = model_fit.forecast()
print(yhat)



Vector Autoregression Moving-Average (VARMA)
◦ Running the example prepares the data, fits the model, and makes a prediction
[[0.36700568 0.70765394]]



Vector Autoregression Moving-Average with Exogenous Regressors
(VARMAX)
◦ VARMAX is an extension of the VARMA model that also includes the modeling of exogenous variables
◦ It is a multivariate version of the ARMAX method
◦ Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations
at the same time steps as the original series
◦ The primary series are referred to as endogenous data to contrast them with the exogenous sequence(s)
◦ The observations for exogenous variables are included in the model directly at each time step and are not modeled in the
same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process)
◦ The VARMAX method can also be used to model the subsumed models with exogenous variables, such as VARX and
VMAX

◦ The method is suitable for multivariate time series without trend and seasonal components and with exogenous
variables



Vector Autoregression Moving-Average with Exogenous Regressors
(VARMAX)
◦ VARMAX Example
# VARMAX example
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random

# contrived dataset with dependency
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
data_exog = [x + random() for x in range(100)]

# fit model
model = VARMAX(data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)

# make prediction
data_exog2 = [[100]]
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)



Vector Autoregression Moving-Average with Exogenous Regressors
(VARMAX)
◦ Running the example prepares the data, fits the model, and makes a prediction
[[0.44761352 0.84834508]]



Simple Exponential Smoothing (SES)
◦ The SES method models the next time step as an exponentially weighted linear function of observations at prior time
steps, so that recent observations receive more weight than older ones

◦ The method is suitable for univariate time series without trend and seasonal components
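
◦ The exponential weighting can be written as a short recursion: the new level is a blend of the latest observation and the previous level. The sketch below is a plain-Python illustration, not the statsmodels implementation. The function name, the smoothing factor alpha=0.5, and the choice to initialise the level with the first observation are assumptions.

```python
# Hand-rolled SES sketch (assumed: level initialised with the first value)
def simple_exp_smoothing(series, alpha):
    """Return the one-step-ahead SES forecast for a sequence of numbers."""
    level = series[0]
    for obs in series[1:]:
        # blend the new observation with the running level
        level = alpha * obs + (1 - alpha) * level
    return level

data = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exp_smoothing(data, alpha=0.5)
print(forecast)  # 12.0
```

◦ An alpha close to 1 makes the forecast follow the most recent observation; an alpha close to 0 makes it change slowly and average over a long history.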



Simple Exponential Smoothing (SES)
◦ SES Example
# SES example
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from random import random

# contrived dataset
data = [x + random() for x in range(1, 100)]

# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()

# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)



Simple Exponential Smoothing (SES)
◦ Running the example prepares the data, fits the model, and makes a prediction
[99.45127609]



Holt-Winters Exponential Smoothing (HWES)
◦ HWES, also called the Triple Exponential Smoothing method, models the next time step as an exponentially weighted
linear function of observations at prior time steps, taking trends and seasonality into account

◦ The method is suitable for univariate time series with trend and/or seasonal components



Holt-Winters Exponential Smoothing (HWES)
◦ HWES Example
# HWES example
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from random import random

# contrived dataset
data = [x + random() for x in range(1, 100)]

# fit model
model = ExponentialSmoothing(data)
model_fit = model.fit()

# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)



Holt-Winters Exponential Smoothing (HWES)
◦ Running the example prepares the data, fits the model, and makes a prediction
[99.91972499]
