cm1 Anglais Annoté PDF

Module MP(C)
Methods for predicting numerical variables

Simon Malinowski
M1 Miage - M1 Data Science, Univ. Rennes 1
Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 1 / 55

Schedule of this course
1 Part 1 : Time series prediction

I 5h CM
I 8h TD/TP (including a graded work)
2 Part 2 : Prediction using other variables (regression models)
I 5h CM
I 8h TD/TP
I 4h Project (graded)
→ Kaggle competition : Lien

Time series prediction
Simon Malinowski
M1 Miage, Univ. Rennes 1

1 Introduction : context, definitions
2 Time series decomposition
3 Time series prediction

Time series examples
electricity consumption of users every minute or hour

number of people in the public transport every day
average daily temperature in a given city

Our goal: predict the next value(s)

Our goal: predict the next value(s)
pollution
355
350
345
340
335
330
325
20 22 24 26 28
Time

What is a time series?
Definition
A time series is a finite series of numerical data points that represents
the evolution of a certain quantity over reguar time stamps
Also called sometimes chronolgical series

Denoted by X (t) = x1 , x2 , . . . , xn , or also X (t) = x(1), x(2), . . . , x(n)

Time series
The frequency of the observations (time stamps) can be:

hourly
daily
weekly
monthly
quarterly
yearly
whatever, provided that it is regular
The measured quantity is numerical

Graphical representations
1) Global representation : the points (t, xt ) are depicted on a plot

Example 1 : number of unemployed people registered to ”Pôle Emploi” in
France every month from 2001 to 2014
5000
4500
Nombre de chômeurs
4000
3500
3000
2000 2005 2010 2015
Time


Example 2 : Monthly average temperature in Nottingham
Period 1920-1940
65
60
55
Temperature(F)
50
45
40
35
30
1920 1925 1930 1935 1940
Time

Example 3 : Average monthly air trafic
Period 1949-1961
600
500
Traffic aérien
400
300
200
100
1950 1952 1954 1956 1958 1960
Time

2) By-period representation : superimposed periods
Unemployment data
6000
1996
1997
2001
2007
5500
2012
2013
2014
5000
chom[1:12]
4500
4000
3500
3000
2 4 6 8 10 12
Index

2) By-period representation : superimposed periods
1964
1968
70
1972
1975
60
Temperature(F)
50
40
30
20
10
2 4 6 8 10 12
Index


Analyzing and understanding a time series
Time series often represent a complex phenomenon, difficult to analyze in

a straightforward manner
The methods that we will use to predict a time series are based on the
decomposition of a TS into elements that are simpler to
handle with
control
The elements that are considered are :
1 the trend Tt
2 the seasonal component St
3 the residual component Et

Decomposition models
Additive model
We assume that Xt can be written as
Xt = Tt + St + Et
Multiplicative model
We assume that Xt can be written as
Xt = (Tt ∗ St ) + Et
Xt = Tt ∗ St ∗ Et

The trend
Definition
This component represents the global evolution of the considered
phemomenon
Assumptions about this component

slowly varying
deterministic
can be estimated as a (simple) mathematical function

Trend or not trend ?
600
500
AirPassengers
400
300
200
100
1950 1952 1954 1956 1958 1960
Time

600
500
AirPassengers
400
300
200
100
1950 1952 1954 1956 1958 1960
Time

582
581
580
LakeHuron
579
578
577
576
1880 1900 1920 1940 1960
Time

582
581
580
LakeHuron
579
578
577
576
1880 1900 1920 1940 1960
Time

Kinds of trends
1 Linear: Tt = a + b × t
2 Polynomial : Tt = a0 + a1 × t + · · · + an × t n
3 Logarithmic : Tt = a + b × log t
4 Exponential : Tt = a × exp(bt)

The seasonal component
Definition
The seasonal component represents periodic fluctuations around the mean
inside a period (of length p) and that occur almost identically from period
to period
Example : Temperature in Nottingham

65
60
55
Temperature(F)
50
45
40
35
30
1920 1925 1930 1935 1940
Time

Definition
The seasonal component represents periodic fluctuations around the mean
inside a period (of length p) and that occur almost identically from period
to period
Example : Temperature in Nottingham

60
55
Température moy
50
45
40
2 4 6 8 10 12
Mois

This component is completetly defined by p seasonal coefficients

s1 , . . . sp that represent the seasonal profile.
We have St = s1 , . . . , sp , s1 , . . . , sp , . . .
Hence, the seasonal component St is the repetition of these p coefficients
over each period.

The residual component Et
This component represents irregular and ”unpredictable” fluctuations.

It is a random component, considered often as a zero-mean random
variable

Determining the composition of a time series
All time series are not composed of both a trend and a seasonal component
The typical kinds of time series are
completely random : Xt = Et
with a trend component only : Xt = Tt + Et
with a seasonal component only : Xt = St + Et
with both a trend and a seasonal component : Xt = Tt + St + Et
Next step is a tool to help you determine in which case we are

Autocorrelation function
Let Xt = x1 , . . . , xn be a time series, et k an integer ∈ [0, n − 1].

The autocorrelation function of Xt is defined as :
n
X
(xt − x)(xt−k − x)
cov (xt , xt−k ) t=k+1
ρ(k) = = n
var (xt ) X
(xt − x)2
t=1
Properties :
ρ(0) = 1
ρ(k) ≤ 1, ∀k > 0

For a completely random time series (Xt = Et ),
p
|ρ(l)| < 1.96 ∗ 1/n, ∀l > 0
3
2
1
bbg
0
-1
-2
-3
0 200 400 600 800 1000
Index
Series bbg
1.0
0.8
0.6
ACF
0.4
0.2
0.0
0 5 10 15 20 25 30
Lag

For a time series with a trend (Xt = Tt + Et ), we have:
ρ(l) → 1, ∀l > 0
2.0
1.8
1.6
yy
1.4
1.2
1.0
0 200 400 600 800 1000
xx
Series yy
1.0
0.8
0.6
ACF
0.4
0.2
0.0
0 5 10 15 20 25 30
Lag

For a time series with a trend (Xt = Tt + Et ), we have:
ρ(l) → 1, ∀l > 0
20
15
yy
10
5
0
0 50 100 150 200
xx
Series yy
1.0
0.8
0.6
ACF
0.4
0.2
0.0
0 5 10 15 20
Lag

For a time series of the kind xt = cos( 2Tπ t) (with a seasonal component),
we have :
2π
ρ(l) → cos( t), ∀l > 0
T
0.0 0.5 1.0
yy
-1.0
0 50 100 150 200
xx
Series yy
0.0 0.5 1.0
ACF
-1.0
0 5 10 15 20
Lag

Example for a time series with a period of length 4

80
60
yy
40
20
0 20 40 60 80
Index
Series yy
1.0
0.5
ACF
0.0
-0.5
0 5 10 15
Lag

And for a series with both a trend and a seasonal component:
2π
xt = a + b × t + cos( t)
T
25
20
15
zz
10
5
0
0 50 100 150 200
xx
Series zz
1.0
0.8
0.6
ACF
0.4
0.2
0.0
0 5 10 15 20
Lag

Detection of trend and/or seasonality
Summary:
1 Intuition about the phenomenon
2 Graphical representations of the time series
3 Analysis of the autocorrelation function
I slow decrease of the coefficients → trend
I periodicity of the coefficients → seasonality
I combination of both : trend + seasonality

Seasonal component: additive or multiplicative ?
For an additive model, the time series stays inside a constant ”strip”
around the trend

Seasonal component: additive or multiplicative ?
For a multiplicative model, the ”strip” arounf the mean is not

constant over time, i.e. the seasonal components depend on the trend


Series witout trend nor seasonality

Example

Prediction of a series witout trend nor seasonality
It is the simplest case, rarely used in practice. But it allows to understand

and apprehend some important things that will be used later.
Hypothesis: the time series takes its values around a stable level a , it has
no trend an no seasonal component.
Xt = a + et ,
with et a residual component.
Problem : how to estimate a in order to predict the future values of Xt ?

Let Xt = x1 , . . . , xn a series witout trend nor seasonality. We aim at

predicting the next value : x̂n+1 (and why not the next ones).
1 Naı̈ve : the last value xn is used as the prediction at time n + 1
x̂n+1 = xn
+ next value is likely to be close to the previous one

− only the previous value is used to predict (the past information is lost)
On our example:

2 prediction is the mean of all past values :

n
1X
x̂n+1 = xt
n
t=1
+ all past values are used for prediction

−
On our example:

Prediction using the mean of past values
Prévisions des ventes par moyenne de toutes observations passées
Ventes réelles
Prévision
350
ventes
300
250
200
5 10 15
Index

3 prediction is the median of all past values

+ all available data is taken into account
+ extreme values have less influence
− the close past is as important than the far past
On our example:

Prediction using the median of past values
Ventes réelles
Prévision par moyenne
Prévision par médiane
350
ventes
300
250
200
5 10 15
Index

Prévision d’une série sans tendance, ni saisonnalité
4 Weighted sum of all past values

Pn
ωi x i
x̂n+1 = Pi=1
n
i=1 ωi
+ we can decide to give more influence to some past values

+ all available data is used
− choice of ωi is tedious
On our example:

5 Simple exponential smoothing

+ more influence to recent values
+ all available data is used
+ flexible and easy to apply
Principle:

Simple exponential smoothing : analysis
x̂n+1 = α xn + (1 − α) x̂n

x̂n+1 = α xn + (1 − α) x̂n
= α xn + (1 − α) ( α xn−1 + (1 − α) x̂n−1 )

x̂n+1 = α xn + (1 − α) x̂n
= α xn + (1 − α) ( α xn−1 + (1 − α) x̂n−1 )
= α xn + α(1 − α) xn−1 + (1 − α)2 (αxn−2 + (1 − α)x̂n−2 )

x̂n+1 = α xn + (1 − α) x̂n
= α xn + (1 − α) ( α xn−1 + (1 − α) x̂n−1 )
= α xn + α(1 − α) xn−1 + (1 − α)2 (αxn−2 + (1 − α)x̂n−2 )
= ...
= α xn + α(1 − α) xn−1 + α(1 − α)2 xn−2 + · · · +
α(1 − α)k xn−k + α(1 − α)n−1 x1 + (1 − α)n x̂1

alpha = 0.9
alpha = 0.7
alpha = 0.5
0.8
alpha = 0.3
alpha = 0.1
Poids associé pour la prévision
0.6
0.4
0.2
0.0
2 4 6 8 10 12 14
Indice observation passée

Influence of the parameter α
if α is big (close to 1), we give a lot of influence to recent values. At

the limit (α = 1), I predict the last observed value
→ predictions are fluctuating; quick reaction to changes in the data
if α is small (close to 0), more influence is given to past values (tends

toward the mean of all past values)
→ predictions are stable but slow reactions to changes in the data

Simple Exponential Smoothing
Ventes réelles
Prévision Lissage expo simple alpha = 0.7
Prévision Lissage expo simple alpha = 0.3
350
ventes
300
250
200
5 10 15
Index

Simple Exponential smoothing : example

Choice of α and x̂0

Auto-regressive models (AR)
Principle :
p
X
x̂n = β0 + βk xn−k
k=1
p is the order of the model : AR(p)

the coefficients βj are estimated based on the original series

Auto-regressive models : example

How to choose the order p ?
For a given p, when a model is fitted (coefficients are estimated), we can

compute the BIC score (Bayesian Information Criterion).
It is a tradeoff between accuracy of the intermediate predictions and the
complexity of the model (number of coefficients)
The best value of p for a given time series can be chosen according to this
score (the lower BIC, the better)

Summary

Sélection de la meilleure méthode

cm1 Anglais Annoté PDF

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

cm1 Anglais Annoté PDF

Uploaded by

Copyright:

Available Formats

Module MP(C)

Methods for predicting numerical variables

M1 Miage - M1 Data Science, Univ. Rennes 1

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 1 / 55

1 Part 1 : Time series prediction

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 2 / 55

M1 Miage, Univ. Rennes 1

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 3 / 55

2 Time series decomposition

3 Time series prediction

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 4 / 55

electricity consumption of users every minute or hour

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 5 / 55

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 6 / 55

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 7 / 55

Also called sometimes chronolgical series

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 8 / 55

The frequency of the observations (time stamps) can be:

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 9 / 55

1) Global representation : the points (t, xt ) are depicted on a plot

2000 2005 2010 2015

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 10 / 55

1) Global representation : the points (t, xt ) are depicted on a plot

1920 1925 1930 1935 1940

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 11 / 55

1950 1952 1954 1956 1958 1960

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 12 / 55

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 13 / 55

2) By-period representation : superimposed periods

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 13 / 55

2 Time series decomposition

3 Time series prediction

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 14 / 55

Time series often represent a complex phenomenon, difficult to analyze in

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 15 / 55

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 16 / 55

Assumptions about this component

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 17 / 55

1950 1952 1954 1956 1958 1960

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 18 / 55

1950 1952 1954 1956 1958 1960

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 19 / 55

1880 1900 1920 1940 1960

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 20 / 55

1880 1900 1920 1940 1960

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 21 / 55

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 22 / 55

Example : Temperature in Nottingham

1920 1925 1930 1935 1940

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 23 / 55

Example : Temperature in Nottingham

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 23 / 55

This component is completetly defined by p seasonal coefficients

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 24 / 55

This component represents irregular and ”unpredictable” fluctuations.

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 25 / 55

Next step is a tool to help you determine in which case we are

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 26 / 55

Let Xt = x1 , . . . , xn be a time series, et k an integer ∈ [0, n − 1].

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 27 / 55

0 200 400 600 800 1000

Simon Malinowski (Univ. Rennes 1/IRISA) January 28, 2021 28 / 55