
Hierarchical Mixtures of AR Models

for Financial Time Series Analysis

Carmen Vidal(1) & Alberto Suárez (1,2)


(1) Computer Science Dpt., Escuela Politécnica Superior
(2) Risklab Madrid
Universidad Autónoma de Madrid (Spain)

alberto.suarez@ii.uam.es
Financial time series

- Time series of assets are highly irregular.
- If the market-efficiency hypothesis is correct, they are also unpredictable.
- Time series of assets are non-stationary. They are usually transformed into log-returns or, for short periods of time, into relative returns.
- Asset returns exhibit deviations from normality:
  - Leptokurtic: heavy tails
  - Heteroskedastic: volatility clustering
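The log-return and relative-return transforms mentioned above can be sketched in a few lines of Python. The price series here is synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily price series -- a stand-in for real asset prices.
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=1000)))

# Log-returns: r_t = log(X_t / X_{t-1})
log_returns = np.diff(np.log(prices))

# Relative returns: (X_t - X_{t-1}) / X_{t-1}
rel_returns = np.diff(prices) / prices[:-1]

# Over short periods the two transforms nearly coincide.
max_gap = float(np.max(np.abs(log_returns - rel_returns)))
```

For small daily moves, log(1 + r) ≈ r, which is why the two definitions are interchangeable over short horizons.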
Financial time series modelling / analysis

Modelling financial time series is not easy.
- Natural sciences:
  - Not reproducible
  - Underlying model?
- Inductive / statistical learning:
  - Small data sets
  - Complex data:
    • Non-linear
    • Non-stationary
    • Non-Gaussian
    • Heteroskedastic

[Figure: index price level over roughly 3000 trading days]
Two stylized facts (Timo Teräsvirta)

Returns exhibit two empirically observed features:
- Correlations
  - Short term for the returns
  - Medium term for absolute values of returns
- Leptokurtosis
  - Heavy tails
  - Extreme events
An example: IBEX35

Daily returns: IBEX35 (5 years)

Daily-returns distribution
Black-Scholes theory

In theory: markets are efficient.
- Absence of arbitrage opportunities.
- No systematic trends.
- Very short-term memory.

Model: Black-Scholes.
- Logs of daily returns of an asset are distributed according to a normal distribution.
- Two parameters:
  • Risk-free interest rate.
  • Volatility [free parameter]
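The pricing side of the model can be made concrete with the standard Black-Scholes European call formula; a self-contained sketch (the parameter values are arbitrary, for illustration only):

```python
from math import erf, exp, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S: float, K: float, r: float, sigma: float, T: float) -> float:
    """Black-Scholes price of a European call.

    S: spot, K: strike, r: risk-free rate, sigma: volatility, T: years to expiry.
    """
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# An at-the-money call; the price grows with the single free parameter, sigma.
price = bs_call(S=100.0, K=100.0, r=0.03, sigma=0.20, T=1.0)
```

Inverting this formula against observed market prices is what produces the implied volatilities discussed on the next slide.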
Is Black-Scholes a good model?

Advantages:
- Simple minimal model with only one free parameter, the volatility.
- Good pricing accuracy for at-the-money (European call) options.
- Analytic pricing formulas for simple derivatives.

Drawbacks: incorrect pricing formulas for:
- Deep in-the-money or out-of-the-money options.
- Short-term (less than a month) options.
- Options on an underlying with very low or very high volatility.

This is reflected in the fact that implied volatility is not constant [volatility smile].

[Figure: volatility smile; implied volatility (0.24-0.258) vs. strike (80-110)]
Beyond Black-Scholes

In practice, markets are:
- Not efficient: memory effects (short / long term?).
- Very unpredictable (at least sometimes).

Extreme events are more frequent than the Black-Scholes model predicts:
- Occurrence of crashes.
- Changes in economic paradigm.
- Market friction: transaction costs, lack of liquidity, dividends, etc.
- Heteroskedasticity + heavy tails.

More sophisticated models are needed:
- Parametric models: generalizations of Black-Scholes.
- Non-parametric models: neural networks, mixture models.
Memory effects (IBEX 35)

Failure of normal model: Heavy tails
Empirical evidence for leptokurtosis

- Volatility smiles and smirks: Black-Scholes is insufficient to account for the time evolution of the underlying.
- Increased risk: multiplicative factor in market risk estimates (Basel Accord 1988, 1996 amendment).

[Figure: implied-volatility smirk; implied volatility (0.205-0.24) vs. strike (80-120)]
Time series analysis

Consider the time series

X_1, X_2, \ldots, X_t, \ldots, X_T

Time series analysis:
- Forecasting: \hat{X}_{t+d} = F(X_t, X_{t-1}, \ldots; \theta_t)
- Classification: \mathrm{Class} = F(X_t, X_{t-1}, \ldots; \theta_t)
- Modelling: P(X_{t+d} \mid X_t, X_{t-1}, \ldots; \theta_t)

These problems are closely related to each other:

X_{t+d} = F(X_t, X_{t-1}, \ldots; \theta_t) + \varepsilon_{t+d}; \qquad P(\varepsilon_{t+d} \mid X_t, X_{t-1}, \ldots; \theta_t)
Time series prediction: a learning view

Network model for time-series prediction:

[Diagram: inputs 1 (bias), X_{t-1}, X_{t-2}, \ldots, X_{t-p} feed a learning device that outputs \hat{X}_t]
Tasks in time series analysis

Obtaining data:
- Selection of attributes: choose relevant indicators.
- Data collection:
  - Discrete data: grouping / averaging in a time window.
  - Continuous data: importance of the sampling frequency.

Preprocessing data:
- Clean data: missing data, outliers.
- Normalization of data: \frac{X_t - \mu}{\sigma}; \quad \frac{X_t - \mathrm{median}}{\mathrm{iq}}; \quad \frac{2X_t - (X_{\max} + X_{\min})}{X_{\max} - X_{\min}}
- Eliminate trends / seasonality, handle a-priori information explicitly, make the data stationary: X_t - X_{t-1}; \quad \frac{X_t - X_{t-1}}{X_{t-1}}; \quad \log\frac{X_t}{X_{t-1}}
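The three normalizations above (z-score, median/interquartile, and min-max centering) are one-liners in numpy; a sketch on a toy series:

```python
import numpy as np

def zscore(x):
    """(X_t - mu) / sigma"""
    return (x - x.mean()) / x.std()

def robust_scale(x):
    """(X_t - median) / iq, with iq the interquartile range."""
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)

def minmax_center(x):
    """(2 X_t - (X_max + X_min)) / (X_max - X_min); maps the series onto [-1, 1]."""
    return (2.0 * x - (x.max() + x.min())) / (x.max() - x.min())

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # toy series with an outlier
z, r, m = zscore(x), robust_scale(x), minmax_center(x)
```

The median/interquartile variant is the robust choice when outliers (as in the toy series) would distort the mean and standard deviation.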
Parametric / non-parametric data analysis

Parametric:
- Formulate (restrictive) hypotheses dependent on a set of parameters.
- Find parameters by data-driven optimization [training set]:
  - Sensitivity analysis
  - Uncertainty in estimated parameters
  - Robustness
- Validation of models [test set].

Non-parametric:
- Consider a family of universal approximants.
- Fix architecture / parameters by data-driven optimization [training set]:
  - Sensitivity analysis
  - Robustness
  - Uncertainty
  - Intelligibility
- Validation of models [test set].
Classical models in time-series analysis

Consider the time series

X_0, X_1, X_2, \ldots, X_{t-1}, X_t, \ldots, X_T

The series exhibits randomness. The process is covariance-stationary when:
- The mean is time independent: E[X_t] = \mu
- The autocovariance is independent of time translations: E[(X_{t+\tau} - \mu)(X_t - \mu)] = \gamma_\tau
Autoregressive + moving average models

Autoregressive model for a time series:

X_t = f(X_t^{[p]}, u_t^{[q]}; \theta) + u_t

Vectors of delayed values:

X_t^{[m]} = [X_{t-1}\; X_{t-2}\; \cdots\; X_{t-m}]^{\top}
u_t^{[m]} = [u_{t-1}\; u_{t-2}\; \cdots\; u_{t-m}]^{\top}

- The systematic term \hat{X}_t = f(X_t^{[p]}, u_t^{[q]}; \theta) reflects trends.
- The innovations u_t are uncorrelated noise.
- Maximization of the likelihood function yields estimates of the model parameters.
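For the purely autoregressive Gaussian case, maximizing the conditional likelihood reduces to least squares on the delayed values; a minimal AR(1) sketch (the coefficient 0.6 is an arbitrary choice for the simulation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) process X_t = phi * X_{t-1} + u_t.
phi_true, T = 0.6, 5000
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + rng.normal()

# With Gaussian innovations, conditional maximum likelihood for the AR
# coefficient is exactly least squares of X_t on X_{t-1}.
x_lag, x_now = x[:-1], x[1:]
phi_hat = float(x_lag @ x_now / (x_lag @ x_lag))
```

The estimate phi_hat converges to the true coefficient as T grows, at rate roughly 1/sqrt(T).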
Autoregressive (feedforward) MLP

[Diagram: input layer (bias 1 and delays x_{t-1}, \ldots, x_{t-D}), hidden layer(s) with weights w_{jd}^{(1)}, output layer with weights w_j^{(2)} producing \hat{x}(t)]

\hat{x}_t = \sum_{j=1}^{J} w_j^{(2)} \left[ f\!\left( \sum_{d=1}^{D} w_{jd}^{(1)} x_{t-d} + w_{j0}^{(1)}; \theta_j \right) - c_j \right]

Activation functions:
- Sigmoidal (logistic): f(x) = \frac{1}{1 + e^{-x}}
- Hyperbolic tangent: f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
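The forecast equation above is straightforward to evaluate; a minimal numpy forward pass with randomly drawn, purely illustrative weights:

```python
import numpy as np

def logistic(x):
    """Sigmoidal activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def ar_mlp_forecast(x_delays, W1, b1, w2, c):
    """One-step forecast from D delayed values through J hidden units.

    x_delays: (D,)  delayed values x_{t-1}, ..., x_{t-D}
    W1: (J, D), b1: (J,)  first-layer weights and biases
    w2: (J,), c: (J,)     output weights w_j^(2) and offsets c_j
    """
    hidden = logistic(W1 @ x_delays + b1)   # f(sum_d w_jd x_{t-d} + w_j0)
    return float(w2 @ (hidden - c))         # sum_j w_j^(2) (f(...) - c_j)

rng = np.random.default_rng(2)
D, J = 3, 4
x_hat = ar_mlp_forecast(rng.normal(size=D), rng.normal(size=(J, D)),
                        rng.normal(size=J), rng.normal(size=J),
                        rng.normal(size=J))
```

In practice the weights would be fitted by minimizing squared forecast error on a training set, as discussed on the parametric / non-parametric slide.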
ARMA(p,q) MLP

[Diagram: AR inputs x_{t-1}, \ldots, x_{t-p} with weights w^{AR} and delayed residuals x_{t-d} - \hat{x}_{t-d} (i.e. u_{t-1}, \ldots, u_{t-q}) with weights w^{MA} feed the hidden layer; output \hat{x}(t)]

\hat{x}_t = \sum_{j=1}^{J} w_j^{(2)} f\!\left( \sum_{d=1}^{p} w_{jd}^{AR} x_{t-d} + \sum_{d=1}^{q} w_{jd}^{MA} (x_{t-d} - \hat{x}_{t-d}) + \theta_j \right)
Mixture model

[Diagram: the delayed values X_{t-1}, \ldots, X_{t-s} (and variances \sigma_{t-1}^2, \ldots, \sigma_{t-s}^2, plus a bias input 1) feed J models and a gating network; the gate outputs g_1, g_2, \ldots, g_J weight the model outputs, which are combined (\Sigma) into \hat{X}_t and \hat{\sigma}_t^2]
Gating network

h_i = \exp\!\left[ b_i \left( \hat{X}_{t-1} + \sum_{k=1}^{r-1} a_{ik} \hat{X}_{t-k-1} - c_i \right) \right]

[Diagram: inputs 1 (bias, offset -c_i), \hat{X}_{t-1}, \ldots, \hat{X}_{t-r} with coefficients a_1, \ldots, a_{r-1}; activations h_1, h_2, \ldots, h_{J-1}]

Probabilities:

g_i = \frac{h_i}{1 + \sum_{j=1}^{J-1} h_j}, \quad i = 1, 2, \ldots, J-1; \qquad g_J = 1 - \sum_{j=1}^{J-1} g_j
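The normalization above can be sketched directly; note that g_J = 1 - \sum g_i simplifies to 1 / (1 + \sum h_j), which the code exploits (the h values are arbitrary illustrations):

```python
import numpy as np

def gating_probabilities(h):
    """Turn positive activations h_1, ..., h_{J-1} into J mixture probabilities:

    g_i = h_i / (1 + sum_j h_j) for i < J, and g_J = 1 - sum_{i<J} g_i.
    """
    h = np.asarray(h, dtype=float)
    denom = 1.0 + h.sum()
    return np.append(h / denom, 1.0 / denom)   # last entry equals g_J

g = gating_probabilities([2.0, 1.0])           # J = 3 experts
```

This is the softmax construction with one activation pinned to 1, which removes the redundant degree of freedom among the J probabilities.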
Hierarchical mixtures

Input = vector of delayed values.

At the root node:

\mu_1 = \frac{\exp\!\left[ b_1 \left( X_{t-1} + \sum_{k=1}^{r-1} a_{1k} X_{t-k-1} - c_1 \right) \right]}{1 + \exp\!\left[ b_1 \left( X_{t-1} + \sum_{k=1}^{r-1} a_{1k} X_{t-k-1} - c_1 \right) \right]}; \qquad \mu_2 = 1 - \mu_1

At the inner node:

\mu_{1|1} = \frac{\exp\!\left[ b_2 \left( X_{t-1} + \sum_{k=1}^{r-1} a_{2k} X_{t-k-1} - c_2 \right) \right]}{1 + \exp\!\left[ b_2 \left( X_{t-1} + \sum_{k=1}^{r-1} a_{2k} X_{t-k-1} - c_2 \right) \right]}; \qquad \mu_{2|1} = 1 - \mu_{1|1}

Prior probabilities of the experts:
- Model 1: \mu_{11} = \mu_{1|1}\, \mu_1
- Model 2: \mu_{12} = \mu_{2|1}\, \mu_1
- Model 3: \mu_2

[Diagram: two-level tree; the root splits with \mu_1, \mu_2; the left child splits with \mu_{1|1}, \mu_{2|1} over Models 1 and 2; Model 3 hangs from the right branch]
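The tree of logistic splits above composes by multiplication along each root-to-leaf path; a sketch with a scalar gating input and arbitrary (b, c) parameters, both invented for illustration:

```python
import numpy as np

def branch_prob(x, b, c):
    """Logistic split exp(b(x - c)) / (1 + exp(b(x - c)))."""
    return 1.0 / (1.0 + np.exp(-b * (x - c)))

x = 0.3                                  # scalar gating input, for illustration
mu1 = branch_prob(x, b=2.0, c=0.0)       # root: towards the inner node
mu1_1 = branch_prob(x, b=1.0, c=0.5)     # inner node: Model 1 vs Model 2

priors = np.array([
    mu1_1 * mu1,            # Model 1: mu_{1|1} * mu_1
    (1.0 - mu1_1) * mu1,    # Model 2: mu_{2|1} * mu_1
    1.0 - mu1,              # Model 3: mu_2
])
```

Because each split is a proper two-way probability, the leaf priors automatically sum to one for any tree depth.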
Mixture of Gaussians for a t-independent pdf

Empirical sample: X_1, X_2, \ldots, X_N

Model pdf:

P(x) = \sum_{k=1}^{K} p_k\, N(x; \mu_k, \sigma_k)

Two steps:
- Toss a loaded K-sided die to choose a component.
- Extract a value from the selected model.

Advantages:
- Close to the normal world.
- Accounts for the leptokurtosis of empirical unconditional distributions in finance.
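The two-step sampling scheme above, and the leptokurtosis it produces, can be checked directly; the component weights and volatilities here are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two components with equal means but very different volatilities.
p = np.array([0.9, 0.1])
sigma = np.array([1.0, 3.0])

# Step 1: toss the loaded K-sided die; step 2: draw from the chosen Gaussian.
comp = rng.choice(2, size=200_000, p=p)
x = rng.normal(0.0, sigma[comp])

# Excess kurtosis is 0 for a single Gaussian, positive for this mixture.
z = (x - x.mean()) / x.std()
excess_kurtosis = float(np.mean(z**4) - 3.0)
```

Even with both components Gaussian, mixing volatilities fattens the tails: for these weights the theoretical excess kurtosis is above 5.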
Mixture of Gaussians

Intuition:
- Implicitly, market forecasts are made in terms of scenarios.
- Each of these scenarios is characterized by an expected return and a volatility.
- Markets assign a different probability to each scenario.

Dynamical picture?
- Direct time aggregation of the process yields a normal model (by the Central Limit Theorem).
- It is possible to construct a discontinuous jump process maintaining the mixture form. Not realistic.
Mixture of AR processes

Mixtures of Gaussians + autoregressive dynamics:
- Input: vector of delays (used in the gating network + AR models).
- Output: next value in the time series.

[Diagrams: flat mixture (no hierarchy) vs. tree hierarchy]
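A (non-hierarchical) mixture of AR(1) experts can be simulated by letting a logistic gate pick an expert at every step; all coefficients below are invented for illustration, not the paper's fitted values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two AR(1) experts, each with its own coefficient and noise level.
experts = [(0.2, 0.5), (0.7, 2.0)]   # (phi, sigma) pairs, invented values

def gate(x_prev, b=1.5, c=0.0):
    """Probability that expert 0 generates the next value."""
    return 1.0 / (1.0 + np.exp(-b * (x_prev - c)))

T = 2000
x = np.zeros(T)
chosen = np.zeros(T, dtype=int)
for t in range(1, T):
    i = 0 if rng.random() < gate(x[t - 1]) else 1
    phi, sigma = experts[i]
    x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    chosen[t] = i
```

Because the gate depends on the delayed value, the series switches between a quiet regime and a volatile, more persistent one, mimicking volatility clustering.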
Synthetic data: Example 1

Time series generated by a hierarchical mixture of 3 AR(1) experts.

[Figures: contribution of each expert (E1, E2, E3) and histogram of the unconditional pdf]
Fitting to a mixture of 2 AR(1) experts (wrong type of model!)

              LL Train   LL Test   K-S Test   ECM Test
Model 1 fit   -17967     -18009    0          0.4645

[Figures: contribution of each expert (g1, g2), histogram, and percentile (quantile-quantile) plot]
Fitting to a mixture of 3 AR(1) experts (learnable model)

              LL Train   LL Test   K-S Test   ECM Test
Model 2 fit   -16675     -16755    0.9666     0.3164

[Figures: contribution of each expert (E1, E2, E3), histogram, and percentile (quantile-quantile) plot]
AR(1) fit for Ibex35 (1200 + 712 days)

MIX 2 AR(1) fit for Ibex35

MIX 3 AR(1) fit for Ibex35

Hierarchical MIX 3 AR(1) fit for Ibex35
Conclusions and perspectives

- Mixtures of AR(1) models improve on the results of single AR(1) models in financial-returns time series.
- Mixtures of 2 / 3 experts seem to be sufficient to model leptokurtosis and dynamics.
- The introduction of hierarchy in the structure of the mixture may significantly improve the statistical description of financial time-series data.

To do:
- Heteroskedasticity
- Calibration of models to market
Mixture of ARCH processes

MixARCH:

X_t = \phi_{[i]}^{\top} X_t^{[m]} + u_{[i]}(t), \quad \text{with probability } g_{[i]}(X_t^{[r]}, \theta_{[i]})

The model for the residuals is

u_{[i]}(t) = \sigma_{[i]}(t)\, Z_t
\sigma_{[i]}^2(t) = \kappa_i + \alpha_i^{\top} [u_{[i]}^2]^{[q]}(t)

The quantities Z_t are assumed to be N(0,1).
Mixture of GARCH processes

MixGARCH:

\hat{X}_t = \phi_{[i]}^{\top} \hat{X}_t^{[m]} + u_{[i]}(t), \quad \text{with probability } g_{[i]}(X_t^{[r]}, \theta_{[i]})

The model for the residuals is

u_{[i]}(t) = \sigma_{[i]}(t)\, Z_t
\sigma_{[i]}^2(t) = \kappa_i + \alpha_i^{\top} [u_{[i]}^2]^{[q]}(t) + \beta_i^{\top} [\sigma_{[i]}^2]^{[p]}(t)

The quantities Z_t are assumed to be N(0,1).
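The variance recursion above can be simulated directly for a single GARCH(1,1) expert; the coefficients are arbitrary illustrations chosen with kappa > 0 and alpha + beta < 1 (so the process is covariance-stationary):

```python
import numpy as np

rng = np.random.default_rng(5)

# GARCH(1,1) coefficients (arbitrary, stationary choice).
kappa, alpha, beta = 0.05, 0.10, 0.85

T = 5000
u = np.zeros(T)
sigma2 = np.full(T, kappa / (1.0 - alpha - beta))  # start at stationary variance

for t in range(1, T):
    # sigma^2(t) = kappa + alpha * u(t-1)^2 + beta * sigma^2(t-1)
    sigma2[t] = kappa + alpha * u[t - 1] ** 2 + beta * sigma2[t - 1]
    u[t] = np.sqrt(sigma2[t]) * rng.normal()       # u(t) = sigma(t) Z_t, Z ~ N(0,1)
```

Setting beta = 0 recovers the ARCH(q=1) recursion of the previous slide; the large beta here is what produces long-lived volatility clusters.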
AR(1) / ARCH(1) for IBEX35

The maximum-likelihood fit of the IBEX35 time series yields the model

\hat{X}_t = 0.1129\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 0.9097 + 0.1118\, (\hat{X}_{t-1} - 0.1129\, \hat{X}_{t-2})^2

The quantities Z_t are assumed to follow a N(0,1) distribution.
Residual correlations: ARCH(1)
Normality hypothesis: ARCH(1)

[Figures: quantile-quantile plot (X Quantiles vs. Y Quantiles) and histogram of the normalized residuals]

KS Test = 0.12
MIXARCH for IBEX35

The mixture model is

Model 1:
\hat{X}_t = 0.0559\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 2.2194 + 0.1976\, (\hat{X}_{t-1} - 0.0559\, \hat{X}_{t-2})^2

Model 2:
\hat{X}_t = 0.1380\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 0.6820 + 0.03821\, (\hat{X}_{t-1} - 0.1380\, \hat{X}_{t-2})^2

The probabilities for the mixture are

g_{[1]}(X_{t-1}) = \frac{1}{1 + \exp\{-0.6839\, (X_{t-1} - 2.5155)\}}; \qquad g_{[2]}(X_{t-1}) = 1 - g_{[1]}(X_{t-1})
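Plugging the fitted coefficients above into the MixARCH recursion gives a one-step sampler; this is a paraphrase of the slide's equations, not the authors' code, and the conditioning values x1, x2 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)

def mixarch_step(x1, x2):
    """One draw of X_t given X_{t-1} = x1, X_{t-2} = x2 (fitted coefficients)."""
    g1 = 1.0 / (1.0 + np.exp(-0.6839 * (x1 - 2.5155)))   # gate for Model 1
    if rng.random() < g1:
        phi, kappa, alpha = 0.0559, 2.2194, 0.1976       # Model 1 (high variance)
    else:
        phi, kappa, alpha = 0.1380, 0.6820, 0.03821      # Model 2 (low variance)
    sigma2 = kappa + alpha * (x1 - phi * x2) ** 2
    return phi * x1 + np.sqrt(sigma2) * rng.normal()

draws = np.array([mixarch_step(0.5, -0.2) for _ in range(10_000)])
```

Repeating the draw many times at fixed conditioning values traces out the (leptokurtic) conditional distribution the mixture implies.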
Residual correlations: MIXARCH
Normality hypothesis: MixARCH(1)

[Figures: quantile-quantile plot and histogram of the normalized residuals]

KS Test = 0.83
MIXARCH Model fit
AR(1) / GARCH(1,1) for IBEX35

The maximum-likelihood fit of the IBEX35 time series yields the model

\hat{X}_t = 0.1358\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 0.0527 + 0.0755\, (\hat{X}_{t-1} - 0.1358\, \hat{X}_{t-2})^2 + 0.8733\, \sigma_{t-1}^2

The quantities Z_t are assumed to follow a N(0,1) distribution.
Residual correlations: GARCH

[Figures: magnitude of the autocorrelations of the residuals and of abs(residuals), delays 0-30]
Normality hypothesis: GARCH(1,1)

[Figures: quantile-quantile plot and histogram of the normalized residuals]

KS Test = 0.56
Test Data

[Figures: volatility over the test period, histogram and quantile-quantile plot of the residuals, and autocorrelations of the residuals and of abs(residuals), delays 0-30]

KS = 0.33
MIXGARCH for IBEX35

The mixture model is

Model 1:
\hat{X}_t = 0.1255\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 0.0156 + 0.0778\, (\hat{X}_{t-1} - 0.1255\, \hat{X}_{t-2})^2 + 0.8937\, \sigma_{t-1}^2

Model 2:
\hat{X}_t = 0.3314\, \hat{X}_{t-1} + \sigma_t Z_t
\sigma_t^2 = 2.6230 + 0.0000\, (\hat{X}_{t-1} - 0.3314\, \hat{X}_{t-2})^2 + 0.0285\, \sigma_{t-1}^2

The probabilities for the mixture are

g_{[1]}(X_{t-1}) = \frac{1}{1 + \exp\{0.5418\, (\hat{X}_{t-1} - 4.8710)\}}; \qquad g_{[2]}(X_{t-1}) = 1 - g_{[1]}(X_{t-1})
Residual correlations: MIXGARCH

[Figures: magnitude of the autocorrelations of the residuals and of abs(residuals), delays 0-30]
Normality hypothesis: MIXGARCH

[Figures: histogram and quantile-quantile plot of the normalized residuals]

KS test = 0.95
MIXGARCH Model fit

[Figures: entropy of the gating probabilities, probabilities of Model 1 and Model 2, and volatility, all over time (days 200-1200)]
Test Data

[Figures: volatility over the test period, histogram and quantile-quantile plot of the residuals, and autocorrelations of the residuals and of abs(residuals), delays 0-30]

KS = 0.25
