Fundamentals
1. What is Time Series Forecasting?
2. Promise of Deep Learning for Time Series Forecasting
3. Taxonomy of Time Series Forecasting Problems
4. How to Develop a Skillful Forecasting Model
5. Time Series as Supervised Learning
6. Review of Simple and Classical Forecasting Methods
7. Classical Time Series Forecasting Methods in Python
◦ Time series problems are often neglected because it is the time component that makes them more difficult to handle
◦ Standard definitions of time series, time series analysis, and time series forecasting
◦ The important components to consider in time series data
◦ Examples of time series to make your understanding concrete
◦ A time series is an ordered sequence of observations: observation #1, observation #2, observation #3, and so on
◦ Nomenclature
◦ t-n: A prior or lag time (e.g. t-1 for the previous time)
◦ t: A current time and point of reference
◦ t+n: A future or forecast time (e.g. t+1 for the next time)
◦ In descriptive modeling, or time series analysis, a time series is modeled to determine its components in terms of
seasonal patterns, trends, relation to external factors, and the like
◦ In time series forecasting, the information in a time series (perhaps with additional information) is used to
forecast future values of that series
◦ The quality of a descriptive model is determined by how well it describes all available data and the interpretation
it provides to better inform the problem domain
◦ The primary objective of time series analysis is to develop mathematical models that provide plausible
descriptions from sample data
◦ An important distinction in forecasting is that the future is completely unavailable and must only be estimated
from what has already happened
◦ The skill of a time series forecasting model is determined by its performance at predicting the future
◦ This is often at the expense of being able to explain why a specific prediction was made, of confidence intervals, and even of a better understanding of the underlying causes behind the problem
◦ All time series have a level, most have noise, and the trend and seasonality are optional
◦ The main features of many time series are trends and seasonal variations; another important feature of most time series is that observations close together in time tend to be correlated (serially dependent)
◦ Outliers
◦ Perhaps there are corrupt or extreme outlier values that need to be identified and handled
◦ Missing
◦ Perhaps there are gaps or missing data that need to be interpolated or imputed
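Gap handling can be sketched in plain Python; the helper name and the use of None to mark missing values are illustrative assumptions, not from any particular library.

```python
# A minimal sketch of handling gaps in a time series: linear interpolation
# of missing values (represented here as None). Leading/trailing gaps are
# out of scope for this sketch.
def interpolate_missing(series):
    """Fill None gaps by interpolating linearly between known neighbours."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None:
            # find the nearest known values on either side of the gap
            lo = max(j for j in range(i) if filled[j] is not None)
            hi = min(j for j in range(i + 1, len(series)) if series[j] is not None)
            weight = (i - lo) / (hi - lo)
            filled[i] = filled[lo] + weight * (series[hi] - filled[lo])
    return filled

series = [100.0, 110.0, None, 115.0, 120.0]
print(interpolate_missing(series))  # the gap becomes 112.5
```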
◦ Discover the promised capabilities of deep learning neural networks for time series forecasting
◦ The focus and implicit, if not explicit, limitations on classical time series forecasting methods
◦ The general capabilities of Multilayer Perceptrons and how they may be harnessed for time series forecasting
◦ The added capabilities of feature learning and native support for sequences provided by Convolutional Neural Networks
and Recurrent Neural Networks
◦ Traditional time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems
◦ This expectation of a learnable mapping function also makes one of the limitations clear: the mapping function is
fixed or static
◦ Fixed Inputs. The number of lag input variables is fixed, in the same way as traditional time series forecasting methods
◦ Fixed Outputs. The number of output variables is also fixed; although a more subtle issue, it means that for each input
pattern, one output must be produced
◦ Operating directly on raw data, such as raw pixel values, instead of domain-specific or handcrafted features
derived from the raw data
◦ The model then learns how to automatically extract the features from the raw data that are directly useful for the problem
being addressed
◦ This is called representation learning and the CNN achieves this in such a way that the features are extracted regardless of
how they occur in the data, so-called transform or distortion invariance
◦ This capability of CNNs has been demonstrated to great effect on time series classification tasks such as automatically detecting human activities based on raw accelerometer sensor data from fitness devices and smartphones
◦ CNNs get the benefits of Multilayer Perceptrons for time series forecasting, namely support for multivariate
input, multivariate output and learning arbitrary but complex functional relationships, but do not require that the
model learn directly from lag observations
◦ The model can learn a representation from a large input sequence that is most relevant for the prediction problem
◦ This capability of LSTMs has been used to great effect in complex natural language processing problems such as neural machine translation, where the model must learn the complex inter-relationships between words both within a given language and across languages in translating from one language to another
◦ This capability can be used in time series forecasting
◦ Recurrent neural networks can also automatically learn the temporal dependence from the data
◦ The most relevant context of input observations to the expected output is learned and can change dynamically
◦ A framework to quickly understand and frame the time series forecasting problem
◦ A structured way of thinking about time series forecasting problems
◦ A framework to uncover the characteristics of a given time series forecasting problem
◦ A suite of specific questions, the answers to which will help to define the forecasting problem
◦ Inputs
◦ Historical data provided to the model in order to make a single forecast
◦ Not the data used to train the model
◦ The data used to make one forecast, for example the last seven days of sales data to forecast the next one day of sales
data
◦ You may not be able to be specific about the input data; for example, you may not know whether one or multiple prior time steps are required to make a forecast
◦ Outputs
◦ Prediction or forecast for a future time step beyond the data provided as input
Unstructured vs. Structured
◦ The number of variables may differ between the inputs and outputs, e.g. the data may not be symmetrical
Univariate and Multivariate Inputs vs. Univariate and Multivariate Outputs
One-step vs. Multi-step
• One-step: Forecast the next time step
• Multi-step: Forecast more than one future time step
◦ The more time steps to be projected into the future, the more challenging the problem given the compounding
nature of the uncertainty on each forecasted time step
Static vs. Dynamic
• Static: A forecast model is fit once and used to make predictions; the model is not updated or changed between forecasts
• Dynamic: A forecast model is fit on newly available data prior to each prediction; a new model is fit, or the existing model is updated, after new observations are received and prior to making a subsequent forecast
Contiguous vs. Discontiguous
• Contiguous: A time series where the observations are uniform over time, e.g. one observation each hour, day, month or year
• Discontiguous: A time series where the observations are not uniform over time
◦ The lack of uniformity of the observations may be caused by missing or corrupt values
◦ It may also be a feature of the problem where observations are only made available sporadically or at increasingly or
decreasingly spaced time intervals
◦ In the case of non-uniform observations, specific data formatting may be required when fitting some models to make the
observations uniform over time
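The re-formatting step for non-uniform observations can be sketched in plain Python; the function name, grid step, and carry-forward rule are illustrative assumptions, not a prescribed method.

```python
# A minimal sketch of making discontiguous observations uniform: snap each
# (time, value) pair onto a regular grid, carrying the last observed value
# forward into grid points that have no observation of their own.
def resample_uniform(observations, step):
    """observations: list of (time, value) pairs sorted by time."""
    start, end = observations[0][0], observations[-1][0]
    uniform = []
    idx = 0
    last = observations[0][1]
    t = start
    while t <= end:
        # consume every raw observation at or before this grid point
        while idx < len(observations) and observations[idx][0] <= t:
            last = observations[idx][1]
            idx += 1
        uniform.append((t, last))
        t += step
    return uniform

obs = [(0, 10.0), (1, 12.0), (4, 13.0), (5, 15.0)]
print(resample_uniform(obs, step=1))
# [(0, 10.0), (1, 12.0), (2, 12.0), (3, 12.0), (4, 13.0), (5, 15.0)]
```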
• Select a standard time series dataset and work through the questions in the taxonomy
to learn more about the dataset
Standard Form
• Transform the taxonomy into a form or spreadsheet that you can re-use on new time series forecasting projects going forward
Additional Characteristic
• Brainstorm and list at least one additional characteristic of a time series forecasting problem and a question that might be used to identify it
◦ A specific and actionable procedure that you can use to work through the time series forecasting problem and get better than average performance from the model
◦ A systematic four-step process that you can use to work through any time series forecasting problem
◦ A list of models to evaluate and the order in which to evaluate them
◦ A methodology that allows the choice of final model to be defensible with empirical evidence, rather than whim or fashion
1. Define Problem
2. Design Test Harness
3. Test Models
4. Finalize Model
◦ The process is different from a classical linear work-through of a predictive modeling problem
◦ It is designed to get a working forecast model fast and then slow down and see if you can get a better model
◦ It is recommended to have separate code for each experiment that can be re-run at any time
◦ This is important so that you can circle back when you discover a bug, fix the code, and re-run an experiment
◦ Running experiments and iterating quickly is valuable, but if you are sloppy, you cannot trust any of your results
◦ This is especially important when it comes to the design of the test harness for evaluating candidate models
5. Univariate vs. Multivariate: Are you working on a univariate or multivariate time series problem?
6. Single-step vs. Multi-step: Do you require a single-step or a multi-step forecast?
7. Static vs. Dynamic: Do you require a static or a dynamically updated model?
8. Contiguous vs. Discontiguous: Are your observations contiguous or discontiguous?
◦ A common time series forecasting model evaluation scheme, if you are looking for ideas:
◦ Split the dataset into a train and test set
◦ Fit a candidate approach on the training dataset
◦ Make predictions on the test set directly or using walk-forward validation
◦ Calculate a metric that compares the predictions to the expected values
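The scheme above can be sketched in plain Python; the naive forecast and RMSE metric are illustrative stand-ins for any candidate model and metric.

```python
from math import sqrt

# A minimal sketch of the evaluation scheme: split into train/test, make
# predictions with walk-forward validation, and score them. The naive
# forecast (predict the last observed value) stands in for a real model.
def walk_forward_validation(series, n_test):
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    predictions = []
    for actual in test:
        yhat = history[-1]          # naive forecast from current history
        predictions.append(yhat)
        history.append(actual)      # walk forward: reveal the true value
    # metric comparing predictions to expected values (RMSE)
    rmse = sqrt(sum((p - a) ** 2 for p, a in zip(predictions, test)) / n_test)
    return predictions, rmse

series = [10.0, 20.0, 30.0, 40.0, 50.0]
preds, score = walk_forward_validation(series, n_test=2)
print(preds, score)  # [30.0, 40.0] 10.0
```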
◦ The test harness must be robust and you must have complete trust in the results it provides
◦ Ensure that any coefficients used for data preparation are estimated from the training dataset only and then applied to the test set
◦ This includes the mean and standard deviation in the case of data standardization
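The rule above can be sketched in plain Python: the standardization coefficients are estimated on the training split only and then applied to both splits. The helper name is illustrative.

```python
from math import sqrt

# A minimal sketch of leakage-free standardization: the mean and standard
# deviation come from the training data only, never from the test data.
def standardize(train, test):
    mean = sum(train) / len(train)
    std = sqrt(sum((x - mean) ** 2 for x in train) / len(train))

    def scale(values):
        return [(x - mean) / std for x in values]

    return scale(train), scale(test)

train, test = [10.0, 20.0, 30.0], [40.0]
train_s, test_s = standardize(train, test)
print(train_s)  # roughly [-1.22, 0.0, 1.22]
print(test_s)   # the test value is scaled with the *training* mean/std
```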
◦ Involves training a new final model on all available historical data (train and test)
◦ How you can re-frame the time series problem as a supervised learning problem for machine learning
◦ What supervised learning is and how it is the foundation for all predictive modeling machine learning algorithms
◦ The sliding window method for framing a time series dataset and how to use it
◦ How to use the sliding window for multivariate data and multi-step forecasting
Y = f(X)

X, y
5, 0.9
4, 0.8
5, 1.0
3, 0.7
4, 0.9

Each row has input variables (X) and one output variable to be predicted (y)
Classification vs. Regression
• A classification problem is when the output variable is a category, such as red and blue or disease and no disease
• A regression problem is when the output variable is a real value, such as dollars or weight
time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

Use previous time steps as input variables and use the next time step as the output variable:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

◦ The previous time step is the input (X) and the next time step is the output (y)
◦ The order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model
◦ There is no previous value that can be used to predict the first value in the sequence; delete this row, as it cannot be used
◦ There is no known next value to predict for the last value in the sequence; delete this row as well when training the supervised model
◦ Notice:
◦ Turn a time series into either a regression or a classification supervised learning problem for real-valued or labeled time
series values
◦ The standard linear and nonlinear machine learning algorithms may be applied
◦ The width of the sliding window can be increased to include more previous time steps
◦ The sliding window approach can be used on a time series that has more than one value, or so-called multivariate time
series
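The sliding window method above can be sketched as a small helper; the function name and `width` parameter are illustrative.

```python
# A minimal sketch of the sliding window method: turn a univariate series
# into (X, y) pairs, where X is the previous `width` time steps and y is
# the next time step. Rows without a full history or a known next value
# are implicitly dropped by the loop bounds.
def sliding_window(series, width=1):
    X, y = [], []
    for i in range(width, len(series)):
        X.append(series[i - width:i])  # previous time steps as inputs
        y.append(series[i])            # next time step as the output
    return X, y

measure = [100, 110, 108, 115, 120]
X, y = sliding_window(measure, width=1)
print(X)  # [[100], [110], [108], [115]]
print(y)  # [110, 108, 115, 120]
```

Increasing `width` widens the window: `sliding_window(measure, width=2)` gives two lag values per input row.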
◦ There are a number of ways to model multi-step forecasting as a supervised learning problem
◦ Framing a multi-step forecast using the sliding window method:

X1, y1, y2
?, 100, 110
100, 110, 108
110, 108, 115
108, 115, 120
115, 120, ?
120, ?, ?

◦ A supervised model only has X1 to work with in order to predict both y1 and y2
◦ The first and last rows contain unknown values and cannot be used to train a supervised model
◦ Careful thought and experimentation are needed on the problem to find a window width that results in acceptable model performance
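The multi-step framing above can be sketched as a variant of the sliding window helper; the function name and parameters are illustrative.

```python
# A minimal sketch of the multi-step sliding window framing: `n_in` lag
# inputs (X1, ...) are used to predict the next `n_out` values (y1, y2, ...).
# Rows with unknown values at either end are dropped by the loop bounds.
def multi_step_window(series, n_in=1, n_out=2):
    X, y = [], []
    for i in range(n_in, len(series) - n_out + 1):
        X.append(series[i - n_in:i])   # lag inputs
        y.append(series[i:i + n_out])  # the next n_out outputs
    return X, y

measure = [100, 110, 108, 115, 120]
X, y = multi_step_window(measure, n_in=1, n_out=2)
print(X)  # [[100], [110], [108]]
print(y)  # [[110, 108], [108, 115], [115, 120]]
```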
◦ Naive, or using observation values directly (e.g. persisting the last observed value)
◦ Average, or using a statistic calculated on previous observations (e.g. their mean)
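These two simple baselines can be sketched in a few lines of Python; the function names are illustrative.

```python
# A minimal sketch of the two simple forecasting baselines: a naive
# forecast that uses the last observation directly, and an average
# forecast that uses a statistic (the mean) of previous observations.
def naive_forecast(history):
    return history[-1]

def average_forecast(history):
    return sum(history) / len(history)

history = [100, 110, 108, 115, 120]
print(naive_forecast(history))    # 120
print(average_forecast(history))  # 110.6
```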
◦ Seasonal Autoregressive Integrated Moving Average (SARIMA): An extension to ARIMA that supports the direct
modeling of the seasonal component of the series
◦ Discover the SARIMA method for time series forecasting with univariate data containing trends and seasonality
◦ This acronym is descriptive, capturing the key aspects of the model itself
◦ AR: Autoregression – A model that uses the dependent relationship between an observation and some number of lagged
observations
◦ I: Integrated – The use of differencing of raw observations (e.g. subtracting an observation from an observation at the
previous time step) in order to make the time series stationary
◦ MA: Moving Average – A model that uses the dependency between an observation and a residual error from a moving
average model applied to lagged observations
◦ Time series methods like the Box-Jenkins ARIMA family of methods develop a model where the prediction is a weighted linear sum of recent past observations or lags
◦ Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past
observations
◦ The model explicitly uses an exponentially decreasing weight for past observations
◦ Past observations are weighted with a geometrically decreasing ratio
◦ A suite of classical methods for time series forecasting that you can test on your forecasting problem prior to exploring machine learning methods
• Seasonal Autoregressive Integrated Moving-Average (SARIMA)
• Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
• Vector Autoregression (VAR)
• Vector Autoregression Moving-Average (VARMA)
• Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
• Simple Exponential Smoothing (SES)
• Holt Winter's Exponential Smoothing (HWES)
◦ The method is suitable for univariate time series without trend and seasonal components
statsmodels.tsa.ar_model.AutoReg
statsmodels.tsa.ar_model.AutoRegResults
from random import random
from statsmodels.tsa.ar_model import AutoReg
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model (MA only: order=(0, 0, 1))
model = ARIMA(data, order=(0, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset
data = [random() for x in range(1, 100)]
# fit model (ARMA: order=(2, 0, 1))
model = ARIMA(data, order=(2, 0, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and without seasonal components
from random import random
from statsmodels.tsa.arima.model import ARIMA
# contrived dataset with trend
data = [x + random() for x in range(1, 100)]
# fit model (d=1 differencing handles the trend)
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components
from random import random
from statsmodels.tsa.statespace.sarimax import SARIMAX
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(start=len(data), end=len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components and exogenous
variables
from random import random
from statsmodels.tsa.statespace.sarimax import SARIMAX
# contrived dataset
data1 = [x + random() for x in range(1, 100)]
data2 = [x + random() for x in range(101, 200)]
# fit model
model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction with an out-of-sample exogenous value
exog2 = [200 + random()]
yhat = model_fit.predict(len(data1), len(data1), exog=[exog2])
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.vector_ar.var_model import VAR
# contrived dataset with two dependent series
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    data.append([v1, v2])
# fit model
model = VAR(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.forecast(model_fit.endog, steps=1)
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.statespace.varmax import VARMAX
# contrived dataset with two dependent series
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    data.append([v1, v2])
# fit model
model = VARMAX(data, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)
◦ The method is suitable for multivariate time series without trend and seasonal components with exogenous
variables
from random import random
from statsmodels.tsa.statespace.varmax import VARMAX
# contrived dataset with two dependent series and an exogenous variable
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    data.append([v1, v2])
data_exog = [x + random() for x in range(100)]
# fit model
model = VARMAX(data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction with an out-of-sample exogenous value
data_exog2 = [[100]]
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)
◦ The method is suitable for univariate time series without trend and seasonal components
from random import random
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)
◦ The method is suitable for univariate time series with trend and/or seasonal components
from random import random
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# contrived dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = ExponentialSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)