1. Introduction
Financial time series often exhibit volatility clustering, where periods of high
volatility are followed by further periods of high volatility. Modeling and under-
standing this volatility is crucial for risk management, option pricing, and portfolio
optimization. One powerful class of models designed to capture volatility patterns
is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) family
of models.
However, as the financial domain becomes increasingly nuanced, the limitations
of traditional GARCH models in fully capturing the complexities of volatility
dynamics have led to the emergence of more sophisticated frameworks. The
Markov-Switching GARCH (MSGARCH) model extends traditional GARCH models
by introducing regime switching governed by a Markov chain, acknowledging that
market conditions and volatility regimes can change abruptly over time.
Covariance stationarity requires that both the unconditional mean and the uncondi-
tional variance are finite and do not change with time. These conditions apply only
to unconditional moments, not conditional moments, so a covariance-stationary
process may still have a time-varying, predictable conditional mean and conditional
variance.
3. GARCH Theoretical Background
Building upon the pivotal ARCH model of Engle (1982), the standard (single-
regime) GARCH(1,1) model is defined by Bollerslev (1986); Bollerslev et al. (1994)
as

y_t = μ + ε_t    (2)
σ_t² = ω + α ε²_{t−1} + β σ²_{t−1}    (3)
ε_t = σ_t e_t    (4)
e_t ~ iid D(0, 1, ζ)    (5)
where y_t is the time series of interest, usually a financial return series; μ represents
the conditional mean of y_t; and ε_t is the error term, built from the conditional
variance process σ_t² and a sequence of i.i.d. innovations/shocks e_t, with σ_t²
conditionally independent of e_t. Here e_t follows D(0, 1, ζ), a continuous
distribution for the innovation with zero mean, unit variance, and shape parameter ζ
(D(·) is typically assumed to be a Normal distribution). The coefficient α represents
the magnitude of a unit shock's immediate impact on the next period's variance,
while the coefficient β indicates the memory of the variance process.
Conditioning on I_{t−1} = {y_{t−h}, h > 0}, the information set of the time series up
to time t − 1, the conditional mean and conditional variance are E[y_t | I_{t−1}] = μ
and Var[y_t | I_{t−1}] = σ_t². Equivalently,

ε_t | I_{t−1} ~ D(0, σ_t², ζ)    (9)
The equation of interest is equation (3), which states that the conditional variance,
or volatility process, σ_t² at time t depends both on the past shock, captured by the
squared error term ε²_{t−1}, and on its own lagged value σ²_{t−1}. All of the
right-hand-side variables that determine σ_t² are known at time t − 1, so σ_t² is in
the information set I_{t−1}. To ensure positivity of the conditional variance, the
restrictions ω > 0, α ⩾ 0, and β ⩾ 0 are imposed.
When expectations are taken over all time, the conditional moments become their
unconditional counterparts. Taking the expectation of both sides of equation (3)
gives the unconditional (long-run) variance:

σ̄² = E[σ_t²] = ω + α E[ε²_{t−1}] + β E[σ²_{t−1}] = ω + (α + β) σ̄²,
so that σ̄² = ω / (1 − α − β),

where the second equality follows from the Law of Iterated Expectations:
E[ε²_{t−1}] = E[σ²_{t−1} e²_{t−1}] = E[E[σ²_{t−1} e²_{t−1} | I_{t−2}]]
= E[σ²_{t−1} E[e²_{t−1} | I_{t−2}]] = E[σ²_{t−1} · 1] = Var[y_{t−1}] = σ̄².
The sum α + β measures the persistence of a shock to the conditional variance in
equation (3) and must be less than 1 to ensure covariance stationarity of the process.
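The long-run variance formula can be checked by simulation. Below is a minimal sketch (not from the paper): it simulates a Gaussian GARCH(1,1) and compares the sample variance of the simulated shocks with ω / (1 − α − β). All parameter values are illustrative.

```python
import numpy as np

# Simulate a Gaussian GARCH(1,1) with illustrative parameters and compare
# the sample variance of the shocks with omega / (1 - alpha - beta).
rng = np.random.default_rng(0)
omega, alpha, beta = 0.1, 0.08, 0.90   # alpha + beta = 0.98 < 1: covariance stationary
T = 200_000
sigma2 = omega / (1 - alpha - beta)    # start at the long-run variance
eps = np.empty(T)
for t in range(T):
    eps[t] = np.sqrt(sigma2) * rng.standard_normal()      # eps_t = sigma_t * e_t
    sigma2 = omega + alpha * eps[t] ** 2 + beta * sigma2  # eq. (3)

print(np.var(eps))                 # close to the long-run variance
print(omega / (1 - alpha - beta))  # theoretical value: 5.0
```

With persistence α + β = 0.98 the sample variance converges slowly, which previews the estimation difficulties discussed below.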
When a GARCH model is estimated on daily or higher-frequency data, or on data
suffering from structural breaks, the estimate of the sum α + β tends to be biased
toward (or equal to) one, indicating that the volatility process is highly persistent.
This leads to poor volatility predictions or violates the covariance stationarity
condition.
Therefore its transition probability matrix P over all possible states in the state
space S, with p_ij = P(s_t = j | s_{t−1} = i), is:

P = [ p_11  p_12  ···  p_1K
      p_21  p_22  ···  p_2K
       ⋮     ⋮    ⋱    ⋮
      p_K1  p_K2  ···  p_KK ]    (12)
The staying probability is given as p_kk = P(s_t = k | s_{t−1} = k), which represents
the persistence of "staying" within a state. If the diagonal entries of the transition
matrix P are large relative to the off-diagonal entries, the chain exhibits strong
persistence within states.
The t-step transition probability that the chain, having started in state i at time 0,
is in state j after t steps is obtained by conditioning on the state s_{t−1} = k:

P(s_t = j | s_0 = i) = Σ_{k=1}^K P(s_t = j | s_{t−1} = k, s_0 = i) P(s_{t−1} = k | s_0 = i)
                     = Σ_{k=1}^K P(s_t = j | s_{t−1} = k) P(s_{t−1} = k | s_0 = i)    (Markov property)
                     = Σ_{k=1}^K p_kj P(s_{t−1} = k | s_0 = i)
                     = (P^t)_{ij} = p_ij^{(t)}    (13)

Recursively substituting, conditioning each time on the most recent state s_{t−l−1}
(l ⩾ 1), demonstrates that P(s_t = j | s_0 = i) is the ij-th entry of the matrix P^t
(the transition matrix P raised to the power t).
Denote by π_0 = [P(s_0 = 1), P(s_0 = 2), ..., P(s_0 = K)] the row vector giving the
probability distribution of the Markov chain at time t = 0. The t-step state
probability distribution is then:

π_t = π_0 P^t    (14)
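The propagation in equation (14) can be sketched numerically. The two-state transition matrix below is made up for illustration; the point is only that π_0 P^t and t successive one-step updates agree.

```python
import numpy as np

# Propagate a state distribution with pi_t = pi_0 P^t (illustrative matrix).
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])     # row-stochastic: row i holds p_i1, p_i2
pi0 = np.array([1.0, 0.0])       # start in state 1 with probability 1

pi10 = pi0 @ np.linalg.matrix_power(P, 10)   # pi_10 = pi_0 P^10
pi_step = pi0.copy()
for _ in range(10):                          # ten one-step updates give the same result
    pi_step = pi_step @ P
print(pi10, pi_step)
```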
For a finite discrete Markov Chain to be ergodic, two conditions must be met:
irreducibility and aperiodicity.
A Markov chain is irreducible if all states communicate with each other: for every
i, j ∈ S there exist positive integers m > 0, n > 0 such that p_ij^{(n)} > 0 and
p_ji^{(m)} > 0. Intuitively, this means that it is possible to go from state i to
state j after n steps, and vice versa after m steps, for all states in the state
space S.
The periodicity of a state in a Markov chain is related to the number of steps it
takes for the chain to return to that state. A Markov chain is aperiodic if, for
every state k in the state space S, the greatest common divisor of the possible
return times, d(k) = gcd{n > 0 : p_kk^{(n)} > 0}, equals 1. In simpler terms, an
aperiodic Markov chain can return to any state at any time step.
Under these conditions of ergodicity, namely irreducibility and aperiodicity, there
exists a unique stationary or long-run distribution π = (π_1, π_2, ..., π_K) with the
property that

π = πP = lim_{t→∞} π_0 P^t    (15)
The unconditional regime probabilities from the stationarity of the Markov chain
are defined as π_k = P(s_t = k) for all k ∈ S, which represents the probability that
the Markov chain is in regime k in the long run.
For the two-regime case K = 2, the time-invariance of the stationary unconditional
regime probabilities implies π_k = P(s_t = k) = P(s_{t−1} = k) for k = 1, 2 and all t.
The unconditional probability of the first state, π_1, obtained by conditioning on
the previous state s_{t−1}, is:

π_1 = P(s_t = 1)
    = P(s_{t−1} = 1)P(s_t = 1 | s_{t−1} = 1) + P(s_{t−1} = 2)P(s_t = 1 | s_{t−1} = 2)
    = π_1 p_11 + π_2 p_21
    = π_1 p_11 + (1 − π_1)(1 − p_22)

Solving for π_1 gives π_1 = (1 − p_22) / (2 − p_11 − p_22).
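The two-regime stationary probability can be verified in a few lines. The staying probabilities p_11 and p_22 below are illustrative; the closed form π_1 = (1 − p_22)/(2 − p_11 − p_22) is checked against the long-run limit of P^t.

```python
import numpy as np

# Two-regime stationary probability: closed form vs. the limit of P^t.
p11, p22 = 0.95, 0.90               # illustrative staying probabilities
P = np.array([[p11, 1 - p11],
              [1 - p22, p22]])

pi1 = (1 - p22) / (2 - p11 - p22)                  # solves pi1 = pi1*p11 + (1-pi1)*(1-p22)
pi_longrun = np.linalg.matrix_power(P, 10_000)[0]  # any row of P^t converges to pi
print(pi1, pi_longrun[0])
```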
Following the specification of Bauwens et al. (2010) and Haas et al. (2004), the
general MSGARCH(1,1) model is given by

y_t = μ_{s_t} + ε_t    (16)
σ_t² = ω_{s_t} + α_{s_t} ε²_{t−1} + β_{s_t} σ²_{t−1}    (17)
ε_t = σ_t e_t    (18)
e_t ~ iid D(0, 1, ζ)    (19)
where the parameters switch according to a homogeneous, stationary, ergodic Markov
chain s_t with finite state space, independent of the i.i.d. sequence of shocks e_t.
The regimes of the MSGARCH model are driven by a latent, ergodic, homogeneous
Markov chain with K states, following Hamilton and Susmel (1994); latent refers to
the inability to observe regimes or regime shifts directly. The parameters must
satisfy ω_{s_t} > 0, α_{s_t} ⩾ 0, and β_{s_t} ⩾ 0 for s_t ∈ S = {1, ..., K} so that
the conditional variance is positive, and α_{s_t} + β_{s_t} < 1 for covariance
stationarity in every possible state s_t.
5.1. Path Dependence Problem
Hamilton and Susmel (1994) argued that regime-switching GARCH models are
essentially intractable and infeasible to estimate, because the conditional variance
in a GARCH model depends on the entire past history of the data.
Given an initial variance σ_0², Haas et al. (2004) demonstrated that successive
recursion of equation (17) yields:

σ_t² = Σ_{i=0}^{t−1} ( Π_{j=0}^{i−1} β_{s_{t−j}} ) ( ω_{s_{t−i}} + α_{s_{t−i}} ε²_{t−1−i} ) + σ_0² Π_{i=0}^{t−1} β_{s_{t−i}}    (20)
This recursive equation shows that the conditional variance σ_t² at time t depends
on the entire history of regimes up to time t.
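As a numerical check, the expanded sum of equation (20) must agree with the direct recursion of equation (17) on any regime path. The parameters, regime path, and shocks below are illustrative, not the paper's estimates.

```python
import numpy as np

# Verify eq. (20) against the direct recursion
# sigma_t^2 = omega_{s_t} + alpha_{s_t} eps_{t-1}^2 + beta_{s_t} sigma_{t-1}^2.
rng = np.random.default_rng(1)
omega = np.array([0.05, 0.20])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])
T = 50
s = rng.integers(0, 2, size=T + 1)   # regime path, s[t] used for t = 1..T
eps = rng.standard_normal(T + 1)     # arbitrary shock path for the check
sigma2_0 = 1.0

# Direct recursion.
sig2 = sigma2_0
for t in range(1, T + 1):
    sig2 = omega[s[t]] + alpha[s[t]] * eps[t - 1] ** 2 + beta[s[t]] * sig2

# Expanded sum, eq. (20).
expanded = sigma2_0 * np.prod([beta[s[T - i]] for i in range(T)])
for i in range(T):
    prod_beta = np.prod([beta[s[T - j]] for j in range(i)])  # empty product = 1
    expanded += prod_beta * (omega[s[T - i]] + alpha[s[T - i]] * eps[T - 1 - i] ** 2)

print(sig2, expanded)   # equal up to floating-point error
```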
According to Gray (1996), the distribution of y_t at time t, conditional on the
regime s_t and on the available information I_{t−1}, depends directly on s_t and
also indirectly on s_{t−1}, s_{t−2}, ..., due to the path dependence problem in
MSGARCH models. This is because the conditional variance σ_t² depends on the
conditional variance at time t − 1, which depends on the regime at time t − 1 and
on the conditional variance at time t − 2, and so on. Consequently, the conditional
variance at time t depends on the entire sequence of regimes up to time t.
Gray (1996) showed that evaluating the log-likelihood function for a sample of T
observations requires integration over all K^T possible (unobserved) regime paths,
which makes numerical optimization of the likelihood function break down as T grows.
ε_t = σ_{s_t,t} e_t    (21)
Haas et al. (2004) originally specified the innovation e_t as in equation (21);
a more general specification was adopted by Ardia et al. (2019), who let the
innovation depend on the state s_t = k, giving e_{k,t} as a generalization of e_t.
The shock is then:

e_{k,t} ~ iid D(0, 1, ζ_k)    (22)
Conditioning on the realization of the state s_t = k and the available information
set I_{t−1}, the return distribution in state k is:

y_t | (s_t = k, I_{t−1}) ~ D(μ_k, σ²_{k,t}, ζ_k)    (24)
where the conditional variance is:

Var[y_t | s_t = k, I_{t−1}] = Var[μ_k | s_t = k, I_{t−1}] + Var[ε_t | s_t = k, I_{t−1}]
= Var[μ_k | s_t = k, I_{t−1}] + Var[σ_{k,t} e_{k,t} | s_t = k, I_{t−1}]
= 0 + E[σ²_{k,t} e²_{k,t} | s_t = k, I_{t−1}] − (E[σ_{k,t} e_{k,t} | s_t = k, I_{t−1}])²
= E[σ²_{k,t} | s_t = k, I_{t−1}] E[e²_{k,t} | s_t = k, I_{t−1}]
= σ²_{k,t}    (25)
Given the formulation of D(·) in equation (24), σ²_{k,t} is the variance of y_t
conditional on the realization of s_t = k and the information set I_{t−1}.
Therefore, the conditional variance equation conditioned on state s_t = k is:

σ²_{k,t} = ω_k + α_k ε²_{t−1} + β_k σ²_{k,t−1}    (26)
The MSGARCH(1,1) model proposed by Haas et al. (2004) can also be written in
matrix form for the conditional variance, stacking equation (26) across the K
regimes:

σ_t^(2) = ω + α ε²_{t−1} + B σ_{t−1}^(2)

where σ_t^(2) = (σ²_{1,t}, ..., σ²_{K,t})′, ω = (ω_1, ..., ω_K)′,
α = (α_1, ..., α_K)′, and B = diag(β_1, ..., β_K).
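The parallel updating of equation (26) can be sketched as a single vectorized step; all K regime variances are advanced together, sharing the common squared shock. The parameter values and the shock eps_prev are illustrative.

```python
import numpy as np

# One parallel step of the Haas et al. variance recursion, eq. (26):
# sigma2_{k,t} = omega_k + alpha_k * eps_{t-1}^2 + beta_k * sigma2_{k,t-1}.
omega = np.array([0.05, 0.20])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])

def update_variances(sigma2_prev, eps_prev):
    """Advance all K conditional variances with the shared squared shock."""
    return omega + alpha * eps_prev ** 2 + beta * sigma2_prev

sigma2 = omega / (1 - alpha - beta)    # start each regime at its long-run level
sigma2 = update_variances(sigma2, eps_prev=0.7)
print(sigma2)   # [0.9745, 3.4735]
```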
5.3. Methods of Estimation
Under the model specification of Haas et al. (2004), the parameters of the
MSGARCH model can be estimated either by a frequentist approach via Maximum
Likelihood or by a Bayesian approach via MCMC, as developed by Ardia et al. (2019).
Let θ_i be the vector of GARCH parameters of state i ∈ S: θ_i = (μ_i, ω_i, α_i, β_i).
Then Ψ = (θ_1, ..., θ_K, ζ_1, ..., ζ_K, P) is the vector collecting all model
parameters.
Hamilton (1989) proposed an algorithm to estimate the parameters of a switching
process when the true state of the system at any given time is unobservable.
Hamilton's filter jointly estimates (i) the parameters of the model conditional on
being in state k and (ii) the probability of being in state k at a particular time t.
The filter infers each state's unknown probability using all data up to time t, and
then applies the state transition probabilities to construct the log-likelihood
function through repeated prediction and updating steps. Given the observation of
y_t, the inference about the value of s_t takes the form of the filtered probability
ξ_{j,t} = P(s_t = j | I_t, Ψ), derived below.
The conditional density of y_t given all available information I_{t−1} and all
model parameters Ψ is:

f(y_t | I_{t−1}, Ψ) = Σ_{j=1}^K f(y_t, s_t = j | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) P(s_t = j | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j, s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i, I_{t−1}, Ψ) P(s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K Σ_{i=1}^K P(s_{t−1} = i | I_{t−1}, Ψ) P(s_t = j | s_{t−1} = i) f(y_t | s_t = j, I_{t−1}, Ψ)
= Σ_{j=1}^K Σ_{i=1}^K ξ_{i,t−1} p_ij η_{j,t}    (31)
Here P(s_{t−1} = i | I_{t−1}, Ψ) P(s_t = j | s_{t−1} = i) represents the observer's
prior belief about the state at time t, and f(y_t | s_t = j, I_{t−1}, Ψ) is the
density of the observed data in state s_t = j. The conditional density is therefore
the product of the prior and the data density, summed over the states.
In this derivation, the predicted state probability P(s_t = j | I_{t−1}, Ψ)
decomposes into the past filtered probabilities ξ_{i,t−1} and the transition
probabilities p_ij:

P(s_t = j | I_{t−1}, Ψ) = Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ)    (32)
Consequently, the filtered probability for the current state, ξ_{j,t}, can be
obtained via:

ξ_{j,t} = P(s_t = j | I_t, Ψ)
= f(y_t, s_t = j | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) P(s_t = j | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i, I_{t−1}, Ψ) P(s_{t−1} = i | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= Σ_{i=1}^K p_ij ξ_{i,t−1} η_{j,t} / f(y_t | I_{t−1}, Ψ)    (33)
This recursion shows that the current filtered probability ξ_{j,t} can be computed
as long as the previous filtered probabilities ξ_{i,t−1} are known. As a result, the
conditional density f(y_t | I_{t−1}, Ψ) in equation (31) can be evaluated
iteratively for t = 1, 2, ..., T, with ξ_{i,t−1} as input and ξ_{j,t} as output at
each step.
Several options are available for the starting values ξ_{i,0}. If the Markov chain
is presumed ergodic, Hamilton (1989) suggested using the unconditional
probabilities P(s_0 = i). Another alternative is setting ξ_{i,0} = 1/K, where K is
the total number of Markov states in S.
Equivalently, the likelihood function to be maximized is the product of the
conditional densities given all available information I_{t−1} and all model
parameters Ψ:

L(Ψ | I_T) = Π_{t=1}^T f(y_t | I_{t−1}, Ψ)    (34)

log L(Ψ | I_T) = Σ_{t=1}^T log f(y_t | I_{t−1}, Ψ)    (35)
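The filtering recursion in equations (31)-(35), combined with the parallel variance updates of equation (26) and the Normal density of equation (37), fits in a short function. The sketch below is illustrative: the data and all parameter values are made up, not the paper's VNI estimates.

```python
import numpy as np

# Hamilton's filter for a two-state Gaussian MSGARCH(1,1) in the Haas et al.
# parallel-variance form, returning the log-likelihood of eq. (35).
def msgarch_loglik(y, mu, omega, alpha, beta, P, xi0):
    sigma2 = omega / (1 - alpha - beta)   # per-regime starting variances
    xi = xi0.copy()                       # filtered probabilities at t-1
    loglik = 0.0
    for yt in y:
        eta = np.exp(-0.5 * (yt - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)  # eq. (37)
        pred = xi @ P                     # P(s_t = j | I_{t-1}), eq. (32)
        f_t = pred @ eta                  # f(y_t | I_{t-1}), eq. (31)
        xi = pred * eta / f_t             # filtered update, eq. (33)
        loglik += np.log(f_t)             # eq. (35)
        sigma2 = omega + alpha * (yt - mu) ** 2 + beta * sigma2  # eq. (26), in parallel
    return loglik

rng = np.random.default_rng(2)
y = 0.01 * rng.standard_normal(500)       # stand-in daily return series
mu = np.array([0.0, 0.0])
omega = np.array([1e-6, 5e-6])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
xi0 = np.array([0.5, 0.5])                # or the chain's stationary distribution
print(msgarch_loglik(y, mu, omega, alpha, beta, P, xi0))
```

Maximizing this function over Ψ (e.g., with a quasi-Newton optimizer) yields the maximum-likelihood estimates.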
The following numerical example assumes two volatility states, a low-volatility
state (s_t = i) and a high-volatility state (s_t = j), with

y_t = μ_{s_t} + ε_t    (36)

The density under regime j at time t, assuming a Normal distribution, is:

η_{j,t} = f(y_t | s_t = j, I_{t−1}, Ψ) = (1 / √(2π σ²_{j,t})) exp(−(y_t − μ_j)² / (2σ²_{j,t}))    (37)
6. Empirical Results
The MSGARCH model was tested on daily data for the VNI stock index over the period
1/6/2012 to 11/6/2023. An MSGARCH(1,1) with a two-state regime was implemented to
test whether there is evidence of switching between a low-volatility state and a
high-volatility state in the VNI index. For daily or higher-frequency financial
data, the average log return is close to 0; as a result, the conditional mean
μ_{s_t} is set to 0 in both states. The log-return series is then
y_t = log(P_t) − log(P_{t−1}), where P_t is the daily closing price.
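The return construction is a one-liner; the price values below are made up, not VNI data.

```python
import numpy as np

# Log returns y_t = log(P_t) - log(P_{t-1}) from daily closing prices.
P = np.array([100.0, 101.5, 100.8, 102.3])   # illustrative closing prices
y = np.diff(np.log(P))                       # log-return series
print(y)
```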
The logarithmic price returns of VNI reveal a discernible pattern of volatility
clustering: periods of high volatility today tend to be followed by similar periods
of high volatility. This clustering indicates serial dependence in the volatility
structure, implying persistence in the level of market risk over consecutive time
intervals.
A two-regime switching GARCH model was estimated assuming that the conditional
log-return follows a Normal distribution. The estimates show that the two regimes
react differently to past returns/shocks: α_1 ≈ 0.10 vs. α_2 ≈ 0.17. Shock
persistence also differs across the two regimes: the first regime reports
α_1 + β_1 ≈ 0.9347, while the second reports α_2 + β_2 ≈ 0.9811. However, the
statistical significance of these coefficients varies between the regimes. Notably,
the coefficient α_2 = 0.1726 in the second regime does not reach statistical
significance (p-value 0.2663), so caution should be exercised when interpreting it:
the coefficient may not differ significantly from zero in the second regime. A
conservative interpretation is therefore that state 1 is the high-volatility regime
and state 2 is the low-volatility regime.
             s_{t+1} = 1    s_{t+1} = 2
s_t = 1        0.9212         0.0788
s_t = 2        0.8981         0.1019
The pronounced value of the first diagonal element of the transition matrix (p_11)
implies a notable propensity for persistence in state 1 within the Markov chain.
Specifically, the high probability of transitioning into the first state,
P(s_{t+1} = 1 | s_t = k), regardless of the current state, suggests a strong
pattern where the system tends to move to the first state.
Conversely, the staying probability of state 2, p_22, is low compared with p_11,
indicating weak persistence of the Markov chain in state 2. Additionally, the
transition probability from state 1 to state 2 (p_12) is low, so the system is
unlikely to switch from state 1 to state 2. While the state-2 probabilities are
smaller than those of state 1, they still contribute to the dynamic behavior of the
Markov chain.
The first column of the transition matrix thus reveals a notable inclination for
the Markov chain to persist in, and shift toward, the high-volatility regime, in
contrast to the probabilities associated with the low-volatility regime in the
second column. This suggests a pronounced tendency for the financial system to
remain in a state of heightened volatility and, when transitioning, to move
preferentially toward the regime characterized by greater market turbulence.
             s_{t+10} = 1    s_{t+10} = 2
s_t = 1       0.9193785       0.08062147
s_t = 2       0.9193785       0.08062147

The 10-step-ahead transition matrix implies that the Markov system tends to persist
in the high-volatility state 1 even after ten time steps, with only a small
probability of transitioning to the low-volatility state.
Table 4: Stationary Regime Probabilities of the Markov Chain
State 1 State 2
0.9194 0.0806
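Both the 10-step matrix and the stationary probabilities follow directly from the estimated one-step transition matrix reported above; small discrepancies in the last digits reflect rounding of the printed matrix.

```python
import numpy as np

# 10-step transition matrix and stationary probabilities from the
# estimated one-step matrix.
P = np.array([[0.9212, 0.0788],
              [0.8981, 0.1019]])

P10 = np.linalg.matrix_power(P, 10)   # both rows converge toward [0.9194, 0.0806]
pi1 = P[1, 0] / (P[0, 1] + P[1, 0])   # two-state closed form for pi_1
print(P10)
print(pi1, 1 - pi1)
```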
(a) Hamilton’s Filter for High-volatility State 1 (b) Hamilton’s Filter for Low-volatility State 2
Figure 3: In-sample Volatility
Statistic Value
Log-Likelihood 9375.7056
AIC -18735.4112
BIC -18687.4893
The log-likelihood value is plotted against the number of iterations of the
Quasi-Newton optimization method L-BFGS of Liu and Nocedal (1989). The relatively
low number of iterations (66) implies that L-BFGS optimized the model parameters
efficiently, providing a well-fitted solution to the likelihood maximization
problem.
7. Conclusion
The MSGARCH model has provided valuable insights into the dynamic characteristics
of the financial time series under consideration. By employing a Markov-switching
framework for volatility, the model allows the identification of distinct regimes
characterized by varying levels of market turbulence. The estimation results
illuminate regime-specific reactions to past returns, with nuanced differences in
the response coefficients and volatility persistence across the identified regimes.
References
Ardia, D., Bluteau, K., Boudt, K., Catania, L., and Trottier, D.-A. (2019). Markov-
switching GARCH models in R: The MSGARCH package. Journal of Statistical Software,
91(4):1–38.
Bauwens, L., Preminger, A., and Rombouts, J. V. (2010). Theory and inference for a
Markov switching GARCH model. LIDAM Reprints CORE 2303, Université catholique de
Louvain, Center for Operations Research and Econometrics (CORE).
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity.
Journal of Econometrics, 31(3):307–327.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). Chapter 49: ARCH models.
In Handbook of Econometrics, volume 4, pages 2959–3038. Elsevier.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates
of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007.
Gray, S. F. (1996). Modeling the conditional distribution of interest rates as a
regime-switching process. Journal of Financial Economics, 42(1):27–62.
Haas, M., Mittnik, S., and Paolella, M. S. (2004). A new approach to Markov-
switching GARCH models. Journal of Financial Econometrics, 2(4):493–530.
Hamilton, J. and Susmel, R. (1994). Autoregressive conditional heteroskedasticity
and changes in regime. Journal of Econometrics, 64(1-2):307–333.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary
time series and the business cycle. Econometrica, 57(2):357–384.
Liu, D. and Nocedal, J. (1989). On the limited memory BFGS method for large scale
optimization. Mathematical Programming, 45(1-3):503–528.