1. Introduction
Financial time series often exhibit volatility clustering, where periods of high
volatility are followed by further periods of high volatility. Modeling and under-
standing this volatility is crucial for risk management, option pricing, and portfolio
optimization. One powerful class of models designed to capture volatility patterns
is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) family
of models.
However, as the financial domain becomes increasingly nuanced, the limitations
of traditional GARCH models in fully capturing the complexities of volatility
dynamics have led to the emergence of more sophisticated frameworks. The
Markov-Switching GARCH (MSGARCH) model extends traditional GARCH models
by introducing regime switching governed by a Markov chain, acknowledging that
market conditions and volatility regimes can change abruptly over time.
Covariance stationarity requires that both the unconditional mean and the uncondi-
tional variance are finite and do not change with time. These conditions apply only
to unconditional moments, not conditional moments, so a covariance-stationary
process may still have a time-varying, predictable conditional mean and conditional
variance.
3. GARCH Theoretical Background
Building upon the pivotal ARCH model of Engle (1982), the standard (single-
regime) GARCH(1,1) model is defined by Bollerslev (1986); Bollerslev et al. (1994)
as

y_t = μ + ε_t    (2)
σ_t² = ω + α ε²_{t−1} + β σ²_{t−1}    (3)
ε_t = σ_t e_t    (4)
e_t ~ iid D(0, 1, ζ)    (5)
where y_t is the time series of interest, usually a financial return series; μ represents
the conditional mean of y_t; and ε_t is the error term, built from the conditional
variance process σ_t² and a sequence of i.i.d. innovations/shocks e_t, with σ_t²
conditionally independent of e_t. Here e_t follows D(0, 1, ζ), a continuous
distribution for the innovation with zero mean, unit variance, and shape parameter ζ
(D(·) is typically assumed to be a Normal distribution). The coefficient α represents
the magnitude of a unit shock's immediate impact on the next period's variance,
while the coefficient β indicates the memory of the variance process.
Conditioning on I_{t−1} = {y_{t−h}, h > 0}, the information set of the time series up
to time t − 1, the conditional mean and conditional variance are E[y_t | I_{t−1}] = μ
and Var[y_t | I_{t−1}] = σ_t². Equivalently,

ε_t | I_{t−1} ~ D(0, σ_t², ζ)    (9)
The equation of interest is equation (3), which states that the conditional variance,
or volatility process, σ_t² at time t depends both on the past shock, captured by the
squared error term ε²_{t−1}, and on its own lagged value σ²_{t−1}. All of the
right-hand-side variables that determine σ_t² are known at time t − 1, so σ_t² is in
the information set I_{t−1}. To ensure positivity of the conditional variance, the
restrictions ω > 0, α ⩾ 0, and β ⩾ 0 are imposed.
When expectations are taken over all time, the conditional moments become their
unconditional counterparts. Taking the expectation of both sides of equation (3)
gives the unconditional (long-run) variance:

σ̄² = E[σ_t²] = ω + α E[ε²_{t−1}] + β E[σ²_{t−1}] = ω + (α + β) σ̄²,
so that σ̄² = ω / (1 − α − β),

where the second equality follows from the Law of Iterated Expectations:
E[ε²_{t−1}] = E[σ²_{t−1} e²_{t−1}] = E[E[σ²_{t−1} e²_{t−1} | I_{t−2}]]
= E[σ²_{t−1} E[e²_{t−1} | I_{t−2}]] = E[σ²_{t−1} · 1] = Var[y_{t−1}] = σ̄².
The sum α + β measures the persistence of a shock to the conditional variance in
equation (3) and must be less than 1 to ensure covariance stationarity of the process.
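The long-run variance formula can be checked by simulation. Below is a minimal sketch (not from the paper): it simulates a Gaussian GARCH(1,1) and compares the sample variance of the simulated shocks with ω / (1 − α − β). All parameter values are illustrative.

```python
import numpy as np

# Simulate a Gaussian GARCH(1,1) with illustrative parameters and compare
# the sample variance of the shocks with omega / (1 - alpha - beta).
rng = np.random.default_rng(0)
omega, alpha, beta = 0.1, 0.08, 0.90   # alpha + beta = 0.98 < 1: covariance stationary
T = 200_000
sigma2 = omega / (1 - alpha - beta)    # start at the long-run variance
eps = np.empty(T)
for t in range(T):
    eps[t] = np.sqrt(sigma2) * rng.standard_normal()      # eps_t = sigma_t * e_t
    sigma2 = omega + alpha * eps[t] ** 2 + beta * sigma2  # eq. (3)

print(np.var(eps))                 # close to the long-run variance
print(omega / (1 - alpha - beta))  # theoretical value: 5.0
```

With persistence α + β = 0.98 the sample variance converges slowly, which previews the estimation difficulties discussed below.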
When a GARCH model is estimated on daily or higher-frequency data, or on data
suffering from structural breaks, the estimate of the sum α + β tends to be biased
toward (or equal to) one, indicating that the volatility process is highly persistent.
This leads to poor volatility predictions or violates the covariance stationarity
condition.
Therefore its transition probability matrix P over all possible states in the state
space S, with p_ij = P(s_t = j | s_{t−1} = i), is:

P = [ p_11  p_12  ···  p_1K
      p_21  p_22  ···  p_2K
       ⋮     ⋮    ⋱    ⋮
      p_K1  p_K2  ···  p_KK ]    (12)
The staying probability is given as p_kk = P(s_t = k | s_{t−1} = k), which represents
the persistence of "staying" within a state. If the diagonal entries of the transition
matrix P are large relative to the off-diagonal entries, the chain exhibits strong
persistence within states.
The t-step transition probability that the chain, having started in state i at time 0,
is in state j after t steps is obtained by conditioning on the state s_{t−1} = k:

P(s_t = j | s_0 = i) = Σ_{k=1}^K P(s_t = j | s_{t−1} = k, s_0 = i) P(s_{t−1} = k | s_0 = i)
                     = Σ_{k=1}^K P(s_t = j | s_{t−1} = k) P(s_{t−1} = k | s_0 = i)    (Markov property)
                     = Σ_{k=1}^K p_kj P(s_{t−1} = k | s_0 = i)
                     = (P^t)_{ij} = p_ij^{(t)}    (13)

Recursively substituting, conditioning each time on the most recent state s_{t−l−1}
(l ⩾ 1), demonstrates that P(s_t = j | s_0 = i) is the ij-th entry of the matrix P^t
(the transition matrix P raised to the power t).
Denote by π_0 = [P(s_0 = 1), P(s_0 = 2), ..., P(s_0 = K)] the row vector giving the
probability distribution of the Markov chain at time t = 0. The t-step state
probability distribution is then:

π_t = π_0 P^t    (14)
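The propagation in equation (14) can be sketched numerically. The two-state transition matrix below is made up for illustration; the point is only that π_0 P^t and t successive one-step updates agree.

```python
import numpy as np

# Propagate a state distribution with pi_t = pi_0 P^t (illustrative matrix).
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])     # row-stochastic: row i holds p_i1, p_i2
pi0 = np.array([1.0, 0.0])       # start in state 1 with probability 1

pi10 = pi0 @ np.linalg.matrix_power(P, 10)   # pi_10 = pi_0 P^10
pi_step = pi0.copy()
for _ in range(10):                          # ten one-step updates give the same result
    pi_step = pi_step @ P
print(pi10, pi_step)
```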
For a finite discrete Markov Chain to be ergodic, two conditions must be met:
irreducibility and aperiodicity.
A Markov chain is irreducible if all states communicate with each other: for every
i, j ∈ S there exist positive integers m > 0, n > 0 such that p_ij^{(n)} > 0 and
p_ji^{(m)} > 0. Intuitively, this means that it is possible to go from state i to
state j after n steps, and vice versa after m steps, for all states in the state
space S.
The periodicity of a state in a Markov chain is related to the number of steps it
takes for the chain to return to that state. A Markov chain is aperiodic if, for
every state k in the state space S, the greatest common divisor of the possible
return times, d(k) = gcd{n > 0 : p_kk^{(n)} > 0}, equals 1. In simpler terms, an
aperiodic Markov chain can return to any state at any time step.
Under these conditions of ergodicity, namely irreducibility and aperiodicity, there
exists a unique stationary or long-run distribution π = (π_1, π_2, ..., π_K) with the
property that

π = πP = lim_{t→∞} π_0 P^t    (15)
The unconditional regime probabilities from the stationarity of the Markov chain
are defined as π_k = P(s_t = k) for all k ∈ S, which represents the probability that
the Markov chain is in regime k in the long run.
For the two-regime case K = 2, the time-invariance of the stationary unconditional
regime probabilities implies π_k = P(s_t = k) = P(s_{t−1} = k) for k = 1, 2 and all t.
The unconditional probability of the first state, π_1, obtained by conditioning on
the previous state s_{t−1}, is:

π_1 = P(s_t = 1)
    = P(s_{t−1} = 1)P(s_t = 1 | s_{t−1} = 1) + P(s_{t−1} = 2)P(s_t = 1 | s_{t−1} = 2)
    = π_1 p_11 + π_2 p_21
    = π_1 p_11 + (1 − π_1)(1 − p_22)

Solving for π_1 gives π_1 = (1 − p_22) / (2 − p_11 − p_22).
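The two-regime stationary probability can be verified in a few lines. The staying probabilities p_11 and p_22 below are illustrative; the closed form π_1 = (1 − p_22)/(2 − p_11 − p_22) is checked against the long-run limit of P^t.

```python
import numpy as np

# Two-regime stationary probability: closed form vs. the limit of P^t.
p11, p22 = 0.95, 0.90               # illustrative staying probabilities
P = np.array([[p11, 1 - p11],
              [1 - p22, p22]])

pi1 = (1 - p22) / (2 - p11 - p22)                  # solves pi1 = pi1*p11 + (1-pi1)*(1-p22)
pi_longrun = np.linalg.matrix_power(P, 10_000)[0]  # any row of P^t converges to pi
print(pi1, pi_longrun[0])
```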
Following the specification of Bauwens et al. (2010) and Haas et al. (2004), the
general MSGARCH(1,1) model is given by

y_t = μ_{s_t} + ε_t    (16)
σ_t² = ω_{s_t} + α_{s_t} ε²_{t−1} + β_{s_t} σ²_{t−1}    (17)
ε_t = σ_t e_t    (18)
e_t ~ iid D(0, 1, ζ)    (19)
where the parameters switch according to a homogeneous, stationary, ergodic Markov
chain s_t with finite state space, independent of the i.i.d. sequence of shocks e_t.
The regimes of the MSGARCH model are driven by a latent, ergodic, homogeneous
Markov chain with K states, following Hamilton and Susmel (1994); latent refers to
the inability to observe regimes or regime shifts directly. The parameters must
satisfy ω_{s_t} > 0, α_{s_t} ⩾ 0, and β_{s_t} ⩾ 0 for s_t ∈ S = {1, ..., K} so that
the conditional variance is positive, and α_{s_t} + β_{s_t} < 1 for covariance
stationarity in every possible state s_t.
5.1. Path Dependence Problem
Hamilton and Susmel (1994) argued that regime-switching GARCH models are
essentially intractable and infeasible to estimate, because the conditional variance
in a GARCH model depends on the entire past history of the data.
Given an initial variance σ_0², Haas et al. (2004) demonstrated that successive
recursion of equation (17) yields:

σ_t² = Σ_{i=0}^{t−1} ( Π_{j=0}^{i−1} β_{s_{t−j}} ) ( ω_{s_{t−i}} + α_{s_{t−i}} ε²_{t−1−i} ) + σ_0² Π_{i=0}^{t−1} β_{s_{t−i}}    (20)
This recursive equation shows that the conditional variance σ_t² at time t depends
on the entire history of regimes up to time t.
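As a numerical check, the expanded sum of equation (20) must agree with the direct recursion of equation (17) on any regime path. The parameters, regime path, and shocks below are illustrative, not the paper's estimates.

```python
import numpy as np

# Verify eq. (20) against the direct recursion
# sigma_t^2 = omega_{s_t} + alpha_{s_t} eps_{t-1}^2 + beta_{s_t} sigma_{t-1}^2.
rng = np.random.default_rng(1)
omega = np.array([0.05, 0.20])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])
T = 50
s = rng.integers(0, 2, size=T + 1)   # regime path, s[t] used for t = 1..T
eps = rng.standard_normal(T + 1)     # arbitrary shock path for the check
sigma2_0 = 1.0

# Direct recursion.
sig2 = sigma2_0
for t in range(1, T + 1):
    sig2 = omega[s[t]] + alpha[s[t]] * eps[t - 1] ** 2 + beta[s[t]] * sig2

# Expanded sum, eq. (20).
expanded = sigma2_0 * np.prod([beta[s[T - i]] for i in range(T)])
for i in range(T):
    prod_beta = np.prod([beta[s[T - j]] for j in range(i)])  # empty product = 1
    expanded += prod_beta * (omega[s[T - i]] + alpha[s[T - i]] * eps[T - 1 - i] ** 2)

print(sig2, expanded)   # equal up to floating-point error
```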
According to Gray (1996), the distribution of y_t at time t, conditional on the
regime s_t and on the available information I_{t−1}, depends directly on s_t and
also indirectly on s_{t−1}, s_{t−2}, ..., due to the path dependence problem in
MSGARCH models. This is because the conditional variance σ_t² depends on the
conditional variance at time t − 1, which depends on the regime at time t − 1 and
on the conditional variance at time t − 2, and so on. Consequently, the conditional
variance at time t depends on the entire sequence of regimes up to time t.
Gray (1996) showed that evaluating the log-likelihood function for a sample of T
observations requires integration over all K^T possible (unobserved) regime paths,
which makes numerical optimization of the likelihood function break down as T grows.
ε_t = σ_{s_t,t} e_t    (21)
Haas et al. (2004) originally specified the innovation e_t as in equation (21);
a more general specification was adopted by Ardia et al. (2019), who let the
innovation depend on the state s_t = k, giving e_{k,t} as a generalization of e_t.
The shock is then:

e_{k,t} ~ iid D(0, 1, ζ_k)    (22)
Conditioning on the realization of the state s_t = k and the available information
set I_{t−1}, the return distribution in state k is:

y_t | (s_t = k, I_{t−1}) ~ D(μ_k, σ²_{k,t}, ζ_k)    (24)
where the conditional variance is:

Var[y_t | s_t = k, I_{t−1}] = Var[μ_k | s_t = k, I_{t−1}] + Var[ε_t | s_t = k, I_{t−1}]
= Var[μ_k | s_t = k, I_{t−1}] + Var[σ_{k,t} e_{k,t} | s_t = k, I_{t−1}]
= 0 + E[σ²_{k,t} e²_{k,t} | s_t = k, I_{t−1}] − (E[σ_{k,t} e_{k,t} | s_t = k, I_{t−1}])²
= E[σ²_{k,t} | s_t = k, I_{t−1}] E[e²_{k,t} | s_t = k, I_{t−1}]
= σ²_{k,t}    (25)
Given the formulation of D(·) in equation (24), σ²_{k,t} is the variance of y_t
conditional on the realization of s_t = k and the information set I_{t−1}.
Therefore, the conditional variance equation conditioned on state s_t = k is:

σ²_{k,t} = ω_k + α_k ε²_{t−1} + β_k σ²_{k,t−1}    (26)
The MSGARCH(1,1) model proposed by Haas et al. (2004) can also be written in
matrix form for the conditional variance, stacking equation (26) across the K
regimes:

σ_t^(2) = ω + α ε²_{t−1} + B σ_{t−1}^(2)

where σ_t^(2) = (σ²_{1,t}, ..., σ²_{K,t})′, ω = (ω_1, ..., ω_K)′,
α = (α_1, ..., α_K)′, and B = diag(β_1, ..., β_K).
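The parallel updating of equation (26) can be sketched as a single vectorized step; all K regime variances are advanced together, sharing the common squared shock. The parameter values and the shock eps_prev are illustrative.

```python
import numpy as np

# One parallel step of the Haas et al. variance recursion, eq. (26):
# sigma2_{k,t} = omega_k + alpha_k * eps_{t-1}^2 + beta_k * sigma2_{k,t-1}.
omega = np.array([0.05, 0.20])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])

def update_variances(sigma2_prev, eps_prev):
    """Advance all K conditional variances with the shared squared shock."""
    return omega + alpha * eps_prev ** 2 + beta * sigma2_prev

sigma2 = omega / (1 - alpha - beta)    # start each regime at its long-run level
sigma2 = update_variances(sigma2, eps_prev=0.7)
print(sigma2)   # [0.9745, 3.4735]
```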
5.3. Methods of Estimation
Under the model specification of Haas et al. (2004), the parameters of the
MSGARCH model can be estimated either by a frequentist approach via Maximum
Likelihood or by a Bayesian approach via MCMC, as developed by Ardia et al. (2019).
Let θ_i be the vector of GARCH parameters of state i ∈ S: θ_i = (μ_i, ω_i, α_i, β_i).
Then Ψ = (θ_1, ..., θ_K, ζ_1, ..., ζ_K, P) is the vector collecting all model
parameters.
Hamilton (1989) proposed an algorithm to estimate the parameters of a switching
process when the true state of the system at any given time is unobservable.
Hamilton's filter jointly estimates (i) the parameters of the model conditional on
being in state k and (ii) the probability of being in state k at a particular time t.
The filter infers each state's unknown probability using all data up to time t, and
then applies the state transition probabilities to construct the log-likelihood
function through repeated prediction and updating steps. Given the observation of
y_t, the inference about the value of s_t takes the form of the filtered probability
ξ_{j,t} = P(s_t = j | I_t, Ψ), derived below.
The conditional density of y_t given all available information I_{t−1} and all
model parameters Ψ is:

f(y_t | I_{t−1}, Ψ) = Σ_{j=1}^K f(y_t, s_t = j | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) P(s_t = j | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j, s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i, I_{t−1}, Ψ) P(s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ)
= Σ_{j=1}^K Σ_{i=1}^K P(s_{t−1} = i | I_{t−1}, Ψ) P(s_t = j | s_{t−1} = i) f(y_t | s_t = j, I_{t−1}, Ψ)
= Σ_{j=1}^K Σ_{i=1}^K ξ_{i,t−1} p_ij η_{j,t}    (31)
Here P(s_{t−1} = i | I_{t−1}, Ψ) P(s_t = j | s_{t−1} = i) represents the observer's
prior belief about the state at time t, and f(y_t | s_t = j, I_{t−1}, Ψ) is the
density of the observed data in state s_t = j. The conditional density is therefore
the product of the prior and the data density, summed over the states.
In this derivation, the predicted state probability P(s_t = j | I_{t−1}, Ψ)
decomposes into the past filtered probabilities ξ_{i,t−1} and the transition
probabilities p_ij:

P(s_t = j | I_{t−1}, Ψ) = Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ)    (32)
Consequently, the filtered probability for the current state, ξ_{j,t}, can be
obtained via:

ξ_{j,t} = P(s_t = j | I_t, Ψ)
= f(y_t, s_t = j | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) P(s_t = j | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i, I_{t−1}, Ψ) P(s_{t−1} = i | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= f(y_t | s_t = j, I_{t−1}, Ψ) Σ_{i=1}^K P(s_t = j | s_{t−1} = i) P(s_{t−1} = i | I_{t−1}, Ψ) / f(y_t | I_{t−1}, Ψ)
= Σ_{i=1}^K p_ij ξ_{i,t−1} η_{j,t} / f(y_t | I_{t−1}, Ψ)    (33)
This recursion shows that the current filtered probability ξ_{j,t} can be computed
as long as the previous filtered probabilities ξ_{i,t−1} are known. As a result, the
conditional density f(y_t | I_{t−1}, Ψ) in equation (31) can be evaluated
iteratively for t = 1, 2, ..., T, with ξ_{i,t−1} as input and ξ_{j,t} as output at
each step.
Several options are available for the starting values ξ_{i,0}. If the Markov chain
is presumed ergodic, Hamilton (1989) suggested using the unconditional
probabilities P(s_0 = i). Another alternative is setting ξ_{i,0} = 1/K, where K is
the total number of Markov states in S.
Equivalently, the likelihood function to be maximized is the product of the
conditional densities given all available information I_{t−1} and all model
parameters Ψ:

L(Ψ | I_T) = Π_{t=1}^T f(y_t | I_{t−1}, Ψ)    (34)

log L(Ψ | I_T) = Σ_{t=1}^T log f(y_t | I_{t−1}, Ψ)    (35)
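The filtering recursion in equations (31)-(35), combined with the parallel variance updates of equation (26) and the Normal density of equation (37), fits in a short function. The sketch below is illustrative: the data and all parameter values are made up, not the paper's VNI estimates.

```python
import numpy as np

# Hamilton's filter for a two-state Gaussian MSGARCH(1,1) in the Haas et al.
# parallel-variance form, returning the log-likelihood of eq. (35).
def msgarch_loglik(y, mu, omega, alpha, beta, P, xi0):
    sigma2 = omega / (1 - alpha - beta)   # per-regime starting variances
    xi = xi0.copy()                       # filtered probabilities at t-1
    loglik = 0.0
    for yt in y:
        eta = np.exp(-0.5 * (yt - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)  # eq. (37)
        pred = xi @ P                     # P(s_t = j | I_{t-1}), eq. (32)
        f_t = pred @ eta                  # f(y_t | I_{t-1}), eq. (31)
        xi = pred * eta / f_t             # filtered update, eq. (33)
        loglik += np.log(f_t)             # eq. (35)
        sigma2 = omega + alpha * (yt - mu) ** 2 + beta * sigma2  # eq. (26), in parallel
    return loglik

rng = np.random.default_rng(2)
y = 0.01 * rng.standard_normal(500)       # stand-in daily return series
mu = np.array([0.0, 0.0])
omega = np.array([1e-6, 5e-6])
alpha = np.array([0.05, 0.15])
beta = np.array([0.90, 0.80])
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
xi0 = np.array([0.5, 0.5])                # or the chain's stationary distribution
print(msgarch_loglik(y, mu, omega, alpha, beta, P, xi0))
```

Maximizing this function over Ψ (e.g., with a quasi-Newton optimizer) yields the maximum-likelihood estimates.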
The following numerical example assumes two volatility states, a low-volatility
state (s_t = i) and a high-volatility state (s_t = j), with

y_t = μ_{s_t} + ε_t    (36)

The density under regime j at time t, assuming a Normal distribution, is:

η_{j,t} = f(y_t | s_t = j, I_{t−1}, Ψ) = (1 / √(2π σ²_{j,t})) exp(−(y_t − μ_j)² / (2σ²_{j,t}))    (37)
6. Empirical Results
The MSGARCH model was tested on daily data for the VNI stock index over the period
1/6/2012 to 11/6/2023. An MSGARCH(1,1) with a two-state regime was implemented to
test whether there is evidence of switching between a low-volatility state and a
high-volatility state in the VNI index. For daily or higher-frequency financial
data, the average log return is close to 0; as a result, the conditional mean
μ_{s_t} is set to 0 in both states. The log-return series is then
y_t = log(P_t) − log(P_{t−1}), where P_t is the daily closing price.
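The return construction is a one-liner; the price values below are made up, not VNI data.

```python
import numpy as np

# Log returns y_t = log(P_t) - log(P_{t-1}) from daily closing prices.
P = np.array([100.0, 101.5, 100.8, 102.3])   # illustrative closing prices
y = np.diff(np.log(P))                       # log-return series
print(y)
```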
The logarithmic price returns of VNI reveal a discernible pattern of volatility
clustering: periods of high volatility today tend to be followed by similar periods
of high volatility. This clustering indicates serial dependence in the volatility
structure, implying persistence in the level of market risk over consecutive time
intervals.
A two-regime switching GARCH model was estimated assuming that the conditional
log-return follows a Normal distribution. The estimates show that the two regimes
react differently to past returns/shocks: α_1 ≈ 0.10 vs. α_2 ≈ 0.17. Shock
persistence also differs across the two regimes: the first regime reports
α_1 + β_1 ≈ 0.9347, while the second reports α_2 + β_2 ≈ 0.9811. However, the
statistical significance of these coefficients varies between the regimes. Notably,
the coefficient α_2 = 0.1726 in the second regime does not reach statistical
significance (p-value 0.2663), so caution should be exercised when interpreting it:
the coefficient may not differ significantly from zero in the second regime. A
conservative interpretation is therefore that state 1 is the high-volatility regime
and state 2 is the low-volatility regime.
             s_{t+1} = 1    s_{t+1} = 2
s_t = 1        0.9212         0.0788
s_t = 2        0.8981         0.1019
The pronounced value of the first diagonal element of the transition matrix (p_11)
implies a notable propensity for persistence in state 1 within the Markov chain.
Specifically, the high probability of transitioning into the first state,
P(s_{t+1} = 1 | s_t = k), regardless of the current state, suggests a strong
pattern where the system tends to move to the first state.
Conversely, the staying probability of state 2, p_22, is low compared with p_11,
indicating weak persistence of the Markov chain in state 2. Additionally, the
transition probability from state 1 to state 2 (p_12) is low, so the system is
unlikely to switch from state 1 to state 2. While the state-2 probabilities are
smaller than those of state 1, they still contribute to the dynamic behavior of the
Markov chain.
The first column of the transition matrix thus reveals a notable inclination for
the Markov chain to persist in, and shift toward, the high-volatility regime, in
contrast to the probabilities associated with the low-volatility regime in the
second column. This suggests a pronounced tendency for the financial system to
remain in a state of heightened volatility and, when transitioning, to move
preferentially toward the regime characterized by greater market turbulence.
             s_{t+10} = 1    s_{t+10} = 2
s_t = 1       0.9193785       0.08062147
s_t = 2       0.9193785       0.08062147

The 10-step-ahead transition matrix implies that the Markov system tends to persist
in the high-volatility state 1 even after ten time steps, with only a small
probability of transitioning to the low-volatility state.
Table 4: Stationary Regime Probabilities of the Markov Chain
State 1 State 2
0.9194 0.0806
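Both the 10-step matrix and the stationary probabilities follow directly from the estimated one-step transition matrix reported above; small discrepancies in the last digits reflect rounding of the printed matrix.

```python
import numpy as np

# 10-step transition matrix and stationary probabilities from the
# estimated one-step matrix.
P = np.array([[0.9212, 0.0788],
              [0.8981, 0.1019]])

P10 = np.linalg.matrix_power(P, 10)   # both rows converge toward [0.9194, 0.0806]
pi1 = P[1, 0] / (P[0, 1] + P[1, 0])   # two-state closed form for pi_1
print(P10)
print(pi1, 1 - pi1)
```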
(a) Hamilton’s Filter for High-volatility State 1 (b) Hamilton’s Filter for Low-volatility State 2
Figure 3: In-sample Volatility
Statistic Value
Log-Likelihood 9375.7056
AIC -18735.4112
BIC -18687.4893
The log-likelihood value is plotted against the number of iterations of the
Quasi-Newton optimization method L-BFGS of Liu and Nocedal (1989). The relatively
low number of iterations (66) implies that L-BFGS optimized the model parameters
efficiently, providing a well-fitted solution to the likelihood maximization
problem.
7. Conclusion
The MSGARCH model has provided valuable insights into the dynamic characteristics
of the financial time series under consideration. By employing a Markov-switching
framework for volatility, the model allows the identification of distinct regimes
characterized by varying levels of market turbulence. The estimation results
illuminate regime-specific reactions to past returns, with nuanced differences in
the response coefficients and volatility persistence across the identified regimes.
References
Ardia, D., Bluteau, K., Boudt, K., Catania, L., and Trottier, D.-A. (2019). Markov-
switching GARCH models in R: The MSGARCH package. Journal of Statistical Software,
91(4):1–38.
Bauwens, L., Preminger, A., and Rombouts, J. V. (2010). Theory and inference for a
Markov switching GARCH model. LIDAM Reprints CORE 2303, Université catholique de
Louvain, Center for Operations Research and Econometrics (CORE).
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity.
Journal of Econometrics, 31(3):307–327.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). Chapter 49: ARCH models.
In Handbook of Econometrics, volume 4, pages 2959–3038. Elsevier.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates
of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007.
Gray, S. F. (1996). Modeling the conditional distribution of interest rates as a
regime-switching process. Journal of Financial Economics, 42(1):27–62.
Haas, M., Mittnik, S., and Paolella, M. S. (2004). A new approach to Markov-
switching GARCH models. Journal of Financial Econometrics, 2(4):493–530.
Hamilton, J. and Susmel, R. (1994). Autoregressive conditional heteroskedasticity
and changes in regime. Journal of Econometrics, 64(1-2):307–333.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary
time series and the business cycle. Econometrica, 57(2):357–384.
Liu, D. and Nocedal, J. (1989). On the limited memory BFGS method for large scale
optimization. Mathematical Programming, 45(1-3):503–528.