Datasets in R
Quantitative Forecasting Methods
Data analysis: What are we looking for?
Seasonality
Repetitive data patterns
Stationarity
Mean and variance are constant over time
Trends
Positive or negative slope
asantand@uninorte.edu.co
Universidad del Norte
1 - Identify the dataset location on your computer
Week Demand Week Demand Week Demand Week Demand Week Demand
1 48084 17 51794 33 50373 49 49604 65 52727
2 50464 18 51650 34 52085 50 48261 66 49626
3 48823 19 49389 35 50231 51 46489 67 46206
4 49398 20 50707 36 49803 52 52418 68 47842
5 52538 21 49382 37 50671 53 52397 69 50042
6 50818 22 53548 38 51454 54 49316 70 50597
7 50296 23 46409 39 49722 55 46310 71 51307
8 51440 24 49431 40 47570 56 50380 72 48105
9 51033 25 53011 41 49152 57 55074 73 49009
10 51561 26 49791 42 50926 58 50085 74 50210
11 49532 27 49264 43 46769 59 52782 75 50352
12 47341 28 51218 44 47194 60 49583 76 54427
13 48121 29 47416 45 53536 61 48879 77 46294
14 51865 30 50618 46 49701 62 46359 78 48602
15 51280 31 48947 47 49798 63 52092 79 53989
16 54595 32 47853 48 50706 64 47897 80 49911
As usual, the demand is a random process, but there are some things we can do to estimate it.
Quantitative Forecasting Methods
The Augmented Dickey-Fuller test (ADF test) is a common statistical test used to check whether a given time series is stationary or not.
Unlike the basic Dickey-Fuller test, the augmented version allows testing for higher-order autoregressive processes.
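For reference, the regression usually fitted by the ADF test can be written as below. This is the standard textbook form, not an equation recovered from the slides; the symbols $\alpha$, $\beta$, $\gamma$, $\delta_i$ and the lag order $p$ are assumptions introduced here.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root, non-stationary)}, \quad
H_1:\ \gamma < 0 \ \text{(stationary)}
```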
Since the p-value is less than 0.05, the null hypothesis (the series has a unit root, i.e. it is non-stationary) is rejected; consequently, stationary behavior is assumed.
$F_t = \frac{1}{n} \sum_{i=t-n}^{t-1} D_i$
What is the optimal value of n? For example, let's use n = 2.
TTR is one of the libraries containing MA(n) functions (SMA: Simple Moving Averages).
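A minimal sketch of the moving-average forecast with n = 2, written in plain Python rather than with TTR. The function name `moving_average_forecasts` is hypothetical; the data are weeks 1 to 6 of the demand table.

```python
def moving_average_forecasts(demand, n):
    """Return forecasts F_t for periods n+1 .. len(demand) (1-based).

    Each forecast is the mean of the n observations immediately before it.
    """
    if n < 1 or n > len(demand):
        raise ValueError("n must be between 1 and len(demand)")
    return [sum(demand[t - n:t]) / n for t in range(n, len(demand))]


# Weeks 1-6 of the demand table.
demand = [48084, 50464, 48823, 49398, 52538, 50818]
forecasts = moving_average_forecasts(demand, n=2)
for week, f in enumerate(forecasts, start=3):
    print(f"week {week}: forecast = {f:.1f}")
```

The first forecast is for week 3, because two earlier observations are needed before the average can be formed.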
Stationary Forecasting Methods: Moving Averages
Now, separating the forecasts corresponding to the training and testing datasets…
$e_t = F_t - D_t$
Stationary Forecasting Methods: Moving Averages
Calculating the MAD for the training and testing datasets…
Are they statistically different?
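The error and MAD calculation could be sketched as follows. The split point (`split = 10`) and the helper names are assumptions made for illustration; the slides do not state where the training set ends. Errors follow the sign convention error = forecast minus demand.

```python
def forecast_errors(forecasts, demand):
    """Forecast minus demand for aligned forecast/demand pairs."""
    return [f - d for f, d in zip(forecasts, demand)]


def mad(errors):
    """Mean absolute deviation of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Weeks 1-16 of the demand table; n = 2 as in the slides.
demand = [48084, 50464, 48823, 49398, 52538, 50818, 50296, 51440,
          51033, 51561, 49532, 47341, 48121, 51865, 51280, 54595]
n = 2
# Forecast for each week from n+1 on: mean of the previous n weeks.
forecasts = [sum(demand[t - n:t]) / n for t in range(n, len(demand))]
errors = forecast_errors(forecasts, demand[n:])

split = 10  # ASSUMPTION: first 10 errors are "training", the rest "testing"
mad_train = mad(errors[:split])
mad_test = mad(errors[split:])
print(f"MAD training: {mad_train:.1f}, MAD testing: {mad_test:.1f}")
```

Comparing the two MAD values gives a first impression only; deciding whether they are statistically different would need a formal test on the two sets of absolute errors.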
Stationary Forecasting Methods: Moving Averages
Let’s validate if n = 2 is a good value to parametrize the forecast model
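One possible way to check n = 2 is to compare the MAD of several window sizes on the same periods. This is a sketch under assumptions: the candidate list and the function names are hypothetical, and MAD is used as the criterion only because it is the measure shown earlier.

```python
def mad_for_window(demand, n, start):
    """MAD of n-period moving-average forecasts for indices start..end (0-based)."""
    errors = [sum(demand[t - n:t]) / n - demand[t]
              for t in range(start, len(demand))]
    return sum(abs(e) for e in errors) / len(errors)


def best_window(demand, candidates):
    """Return (n, MAD) with the smallest MAD among the candidate window sizes."""
    start = max(candidates)  # score every n on the same periods
    scores = {n: mad_for_window(demand, n, start) for n in candidates}
    n_best = min(scores, key=scores.get)
    return n_best, scores[n_best]


# Weeks 1-16 of the demand table.
demand = [48084, 50464, 48823, 49398, 52538, 50818, 50296, 51440,
          51033, 51561, 49532, 47341, 48121, 51865, 51280, 54595]
n_best, score = best_window(demand, candidates=[2, 3, 4, 5, 6])
print(f"best n = {n_best}, MAD = {score:.1f}")
```

Starting every candidate at the same period keeps the comparison fair, since a larger n would otherwise be scored on fewer forecasts.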
Quantitative Forecasting Methods
AR(p) stands for the autoregressive process; the p parameter is an integer that determines how many lagged values of the series are used to forecast.
MA(q) stands for the moving average model; q is the number of lagged forecast error terms in the prediction equation.
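Combining both parts gives the usual ARMA(p, q) prediction equation. The form below is the standard textbook one, not taken from the slides; $c$, $\phi_i$, $\theta_j$ and $\varepsilon_t$ are symbols assumed here.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
```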
Regression analysis and Holt's method are used when the series shows a trend.
Let ( x1 , y1 ), ( x2 , y2 ), …, ( xn , yn ) be the n paired data points for x and y.
The optimal values of the parameters are chosen so that the sum of the squared distances between the regression
line and the data points is minimized.
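The least-squares criterion has a closed-form solution for a straight line, sketched below. The names `fit_line`, `a` (intercept) and `b` (slope) are assumptions, and the demand values are hypothetical, not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical demand with a clear upward trend.
periods = [1, 2, 3, 4, 5]
demand = [12, 15, 19, 20, 24]
a, b = fit_line(periods, demand)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"forecast for period 6: {a + b * 6:.2f}")
```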
What is the best value of 𝛽?
Forecasting Methods: Trend based methods
Double Exponential Smoothing: addresses the issue of changes in the slope over time
Holt's Method requires two smoothing constants α and β.
$S_t = \alpha D_t + (1 - \alpha)(S_{t-1} + G_{t-1})$
$F_{t,t+\tau} = S_t + \tau G_t$
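A minimal sketch of Holt's method. The level and forecast equations are the ones on the slide; the slope update $G_t = \beta (S_t - S_{t-1}) + (1 - \beta) G_{t-1}$ is the standard form and is an assumption here, since it does not appear in the extracted text. Function names, starting values and data are hypothetical.

```python
def holt(demand, alpha, beta, s0, g0):
    """Run Holt's method over `demand`; return the final (S_t, G_t).

    S_t = alpha*D_t + (1 - alpha)*(S_{t-1} + G_{t-1})
    G_t = beta*(S_t - S_{t-1}) + (1 - beta)*G_{t-1}   (standard form, assumed)
    """
    s, g = s0, g0
    for d in demand:
        s_new = alpha * d + (1 - alpha) * (s + g)
        g = beta * (s_new - s) + (1 - beta) * g
        s = s_new
    return s, g


def holt_forecast(s, g, tau):
    """Forecast tau periods ahead: S_t + tau*G_t."""
    return s + tau * g


# Hypothetical data: a series growing by exactly 2 per period.
demand = [12, 14, 16, 18]
s, g = holt(demand, alpha=0.5, beta=0.5, s0=10, g0=2)
print(holt_forecast(s, g, tau=1))
```

With a perfectly linear series and matching starting values, the level tracks the data and the slope stays at 2, which is a quick sanity check on the update equations.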
Forecasting Methods: Trend based methods
Double Exponential Smoothing:
$\bar{D}_1 = \frac{1}{12} \sum_{i=1}^{12} D_i = 156.08$
$\bar{D}_2 = \frac{1}{12} \sum_{i=13}^{24} D_i = 222.24$
$\bar{D} = \frac{1}{24} \sum_{i=1}^{24} D_i = 189.16$
Average growth = 222.24 - 156.08 = 66.17
Average growth per month = 66.17/12 = 5.51
Forecasting Methods: Trend based methods
Double Exponential Smoothing:
If the demand for the 25th month is already known: D25= 259
Assume values for α and β → α = β = 0.1
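A single Holt update for month 25 might look like this. The values of α, β, D25 and the monthly growth come from the slides; the starting level `s24` is an assumption (the second-year mean is used as a stand-in, because the slides' own initial level is not recoverable), and the slope update is the standard form.

```python
alpha = beta = 0.1      # from the slides
d25 = 259               # from the slides
g24 = 5.51              # average growth per month, from the slides
s24 = 222.24            # ASSUMPTION: second-year mean used as a stand-in level

s25 = alpha * d25 + (1 - alpha) * (s24 + g24)
g25 = beta * (s25 - s24) + (1 - beta) * g24   # standard Holt slope update (assumed)
f26 = s25 + 1 * g25                           # one-step-ahead forecast, tau = 1

print(f"S25 = {s25:.2f}, G25 = {g25:.2f}, forecast for month 26 = {f26:.2f}")
```

A different starting level changes the numbers but not the mechanics of the update.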
Forecasting Methods: Trend based methods
Double Exponential Smoothing: Example
Step 1. Compute the sample mean
The sample mean is useful to know the central tendency of the demand process.
Step 2. Divide each observation by the sample mean
This will provide an understanding of the variance of the demand process.
Step 3. Average the factors corresponding to each period of the season
The seasonal factor allows the demand forecast to be calculated for each forthcoming period.
Step 4. Understand that each period (hour, day, month, semester, year…) has its own dynamic
It is important to update parameters after a new demand value is available.
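Steps 1 to 3 can be sketched as below. The function name `seasonal_factors` and the counts are hypothetical (two seasons of four periods, not the bridge data, which the slides do not include).

```python
def seasonal_factors(demand, season_length):
    """Steps 1-3: mean, ratios to the mean, and per-period average factor."""
    if len(demand) % season_length != 0:
        raise ValueError("demand must cover whole seasons")
    mean = sum(demand) / len(demand)                      # Step 1
    ratios = [d / mean for d in demand]                   # Step 2
    seasons = len(demand) // season_length
    return [                                              # Step 3
        sum(ratios[p + k * season_length] for k in range(seasons)) / seasons
        for p in range(season_length)
    ]


# Hypothetical counts for two seasons of four periods.
demand = [10, 20, 30, 40, 14, 22, 26, 38]
factors = seasonal_factors(demand, season_length=4)
mean = sum(demand) / len(demand)
forecast_next_season = [mean * c for c in factors]
print(factors)
print(forecast_next_season)
```

The forecast for each period of the next season is the sample mean multiplied by that period's factor; Step 4 corresponds to recomputing the factors whenever a new observation arrives.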
Example:
The transportation department wants to estimate the number of cars crossing a bridge to schedule workers at the tollbooths.
Note:
Since this method has no set of parameters to tune, there is no need to divide the dataset into training and testing.
$F_t = (\mu + t\,G_t)\,c_t + e_t$
[Figure: plot with periods 0 to 12 on the horizontal axis and values 0 to 500 on the vertical axis]
Forecasting: Seasonal and trend-based methods
Holt-Winters: The model requires 3 parameters α, β and γ, the length of a season, and the number of periods in a season.
Forecasting: Seasonal and trend-based methods
Initialization dataset: A set of data used to initialize model parameters
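$\bar{d}_{2011} = \frac{60+234+163+252}{4} = 177.25$
$\bar{d}_{2012} = \frac{69+266+188+278}{4} = 200.25$
$S_t = \bar{d} + (T - \bar{t})\,G_t$
$C_1 = \frac{d_1}{F_1} = \frac{60}{168.625} = 0.35582$
$C_2 = \frac{d_2}{F_2} = \frac{234}{174.375} = 1.34194$
$C_4 = \frac{d_4}{F_4} = \frac{252}{185.875} = 1.35575$
$C_6 = \frac{d_6}{F_6} = \frac{266}{197.375} = 1.34769$
$C_8 = \frac{d_8}{F_8} = \frac{278}{208.875} = 1.33094$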
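The initialization can be reproduced with a few lines of Python. The demand values are the ones shown on the slide; the slope formula (difference of the yearly means divided by the periods per season) is an assumption, chosen because it reproduces the trend values 168.625, 174.375, … used in the denominators.

```python
demand = [60, 234, 163, 252,   # 2011, quarters 1-4
          69, 266, 188, 278]   # 2012, quarters 1-4
periods_per_season = 4

d_2011 = sum(demand[:4]) / 4
d_2012 = sum(demand[4:]) / 4
# ASSUMPTION: slope = change in yearly mean divided by periods per season.
g = (d_2012 - d_2011) / periods_per_season

d_bar = sum(demand) / len(demand)
t_bar = (len(demand) + 1) / 2            # mean of t = 1..8

# Trend-line value for each period, then the seasonal factor C_t = d_t / F_t.
trend = [d_bar + (t - t_bar) * g for t in range(1, len(demand) + 1)]
factors = [d / f for d, f in zip(demand, trend)]

for t, (f, c) in enumerate(zip(trend, factors), start=1):
    print(f"t={t}: F={f:.3f}, C={c:.5f}")
```

Factors for the same quarter in both years (for example C2 and C6) can then be averaged to obtain one initial seasonal factor per quarter.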
Forecasting: Seasonal and trend-based methods
Initialization dataset: A set of data used to initialize model parameters
Error minimization is more flexible in optimization-based tools, but in statistical packages it is not simple to solve for MAD or MAPE (R and other tools are set to minimize the MSE).
Non-Stationary Time Series: ARIMA
Forecasting: Seasonal and trend-based methods: ARIMA
Autoregressive integrated moving average (ARIMA)
ARIMA models are applied in cases where the data show evidence of non-stationary behavior.
ARIMA (p,d,q)
In ARIMA models, d represents the degree of differencing, that is, the number of times the series is differenced to make it stationary.
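Differencing itself is a simple operation, sketched below. The function name `difference` and the example series are hypothetical.

```python
def difference(series, d=1):
    """Apply first differencing d times (each pass: x_t minus x_{t-1})."""
    if d < 0:
        raise ValueError("d must be non-negative")
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Hypothetical series with a quadratic trend (t^2 for t = 1..6).
series = [1, 4, 9, 16, 25, 36]
print(difference(series, d=1))  # a linear trend remains
print(difference(series, d=2))  # constant: the trend is removed
```

Each pass shortens the series by one observation, so a model with d = 2 is fitted on two fewer points than the original data.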
Questions
Alcides R. Santander M. Ph.D.
email: asantand@uninorte.edu.co
Website: www.uninorte.edu.co