Momentum and Mean-Reversion in a Semi-Markov Model for Stock Returns
Javier Giner¹ and Valeriy Zakamulin²
¹ Department of Economics, Accounting and Finance, University of La Laguna, Camino La Hornera s/n, 38071, Santa Cruz de Tenerife, Spain, E-mail: jginer@ull.edu.es
² Corresponding author, School of Business and Law, University of Agder, Service Box 422, 4604 Kristiansand, Norway, E-mail: valeriz@uia.no
December 30, 2021
Abstract
A vast body of empirical literature documents the existence of short-term momentum and medium-term mean reversion in various financial markets. By contrast, there is still a great shortage of theoretical models that explain the presence of these two common phenomena. We develop a semi-Markov model in which the return process randomly switches between bull and bear states. In our model, the state duration times are governed by a negative binomial distribution that exhibits positive duration dependence. We demonstrate that this model induces return momentum at short lags and reversal at subsequent lags. We calibrate our model to empirical data and show that the model-implied autocorrelation function fits the empirically estimated autocorrelation function reasonably well.
Key words: time-series momentum; mean reversion; bull and bear markets; duration
dependence; semi-Markov model
JEL classification: C1, G10

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3997837
1 Introduction
Momentum and mean reversion are two pervasive phenomena in financial markets, documented in numerous empirical studies. Momentum denotes the tendency of a stock price to continue moving in the same direction in the short run. For instance, if stock returns have been high in the recent past, they are likely to remain high in the near future. Mean reversion refers to the tendency of a stock price to revert to a trend path in the medium run. For example, if stock returns have been unusually high (low) in the past, they are likely to be unusually low (high) in the future.
Momentum and mean reversion come in two flavors: cross-sectional and time-series. Cross-sectional momentum (discovered by Jegadeesh and Titman (1993)) and cross-sectional mean reversion (first reported by De Bondt and Thaler (1985)) concern the relative performance of stocks in the cross-section, whereas time-series momentum and mean reversion concern exclusively a financial asset's own performance. In this paper, we deal only with time-series momentum and mean reversion. In particular, we focus on the time-series dependence in the returns on a single stock or a stock market index. In this case, short-term momentum and medium-term mean reversion materialize as a positive return autocorrelation at short lags and a negative autocorrelation at longer lags.
Early studies on time-series momentum and mean reversion in stock prices include Summers (1986), Fama and French (1988), Lo and MacKinlay (1988), Poterba and Summers (1988), and Jegadeesh (1991). For example, Poterba and Summers (1988) document that stock returns exhibit a positive autocorrelation over periods shorter than one year and a negative autocorrelation over longer periods. Fama and French (1988) find a negative autocorrelation in returns aggregated over periods from three to five years. Later studies on time-series momentum and mean reversion focus on how one can exploit these phenomena to beat the market. For example, Moskowitz, Ooi, and Pedersen (2012) document that a strategy that buys assets with positive returns over the past 12 months and sells assets with negative returns delivers superior performance in various financial markets. Subsequently, similar findings are reported by Georgopoulou and Wang (2016), Hurst, Ooi, and Pedersen (2017), Lim, Wang, and Yao (2018), and many others. Balvers, Wu, and Gilliland (2000) not only present evidence of momentum and mean reversion but also show how mean reversion can be exploited to generate abnormal returns. Similar results are presented by Balvers and Wu (2006) and Balvers, Hu, and Huang (2012).
Momentum and mean reversion are considered anomalies within traditional asset-pricing models that assume unbounded investor rationality. Behavioral theories explain these phenomena by challenging the assumption of strict rationality. In particular, these theories presume that investors are subject to several cognitive and emotional biases. The behavioral explanation for short-term momentum followed by medium-term mean reversion rests upon two assumptions: investors underreact to news in the short run and overreact in the medium run (see Hong and Stein (1999) and references therein).
There is a vast body of empirical literature on momentum and mean reversion in various financial markets. By contrast, there is still a great shortage of theoretical models that explain the presence of short-term momentum and subsequent mean reversion. Almost exclusively, these models are equilibrium models (some examples are Hong and Stein (1999), Barberis and Shleifer (2003), and He and Li (2015)) that assume the existence of several types of traders in a financial market: rational traders, noise (irrational) traders, momentum traders (trend-followers), and contrarian traders. These models are elaborate and hard, or even impossible, to calibrate to empirical data. Besides, they are difficult to solve analytically, so researchers have to resort to numerical solutions.
This paper is the first to entertain a fundamentally different approach to the theoretical modeling of momentum and mean reversion in financial markets. Whereas an equilibrium model produces (as an output) a return process that exhibits momentum and subsequent mean reversion, we directly model a return process that induces momentum and mean reversion. In particular, the model proposed in this paper rests upon a simple, plausible, and easy-to-understand assumption: the return process randomly switches between two possible states, commonly referred to as bull and bear markets. Besides simplicity, other important advantages of our model are parsimony and ease of calibration to empirical data.
In principle, the idea of modeling financial returns by a process that switches between bull and bear states is not new. On the contrary, Markov-switching models (MSM) of stock returns are well known. In an MSM, although the returns are independently distributed within each state of the market, they exhibit a positive autocorrelation that decreases as the lag length increases (see Timmermann (2000) and Frühwirth-Schnatter (2006)). That is, an MSM can explain short-term momentum. Unfortunately, an MSM cannot explain medium-term mean reversion.
A severe limitation of an MSM is that the state duration times are governed by a geometric distribution, which is memoryless. As a result, there is no duration dependence: the state termination probability does not depend on the time already spent in that state. By contrast, many empirical studies document that stock market states exhibit positive duration dependence (see, among others, Cochran and Defina (1995), Ohn, Taylor, and Pagan (2004), and Harman and Zuehlke (2007)). Positive duration dependence means that the longer a bull (bear) market lasts, the higher its probability of ending. Consequently, an MSM does not provide a correct representation of bull and bear market duration times.
The primary approach to incorporating duration dependence in a regime-switching model is to replace an MSM with a semi-Markov switching model (SMSM). An SMSM generalizes the MSM by allowing the state duration time to follow any probability distribution. However, a serious disadvantage of an SMSM is its lack of analytical tractability. Besides, all numerical computations rely on complicated recursive algorithms. Alternatively, an SMSM can be realized as an expanded-state MSM (ESMSM), in which several Markovian states represent one semi-Markovian state. Our choice is an ESMSM with a specific topology in which the state duration times are governed by a negative binomial distribution. A negative binomial distribution exhibits positive duration dependence and reduces to a geometric distribution under particular parameter constraints. Compared with the original SMSM formulation, an ESMSM formulation lacks flexibility but presents two crucial advantages. First, an ESMSM provides some degree of analytical tractability. Second, this formulation enables us to apply all well-established methods available for Markov models.
Our main contribution is a theoretical construction of an ESMSM in which the return process randomly switches between bull and bear states. For the simplest case, where two Markovian states represent each semi-Markovian state, we derive analytical solutions for the return autocorrelation function. We demonstrate that the return autocorrelation function exhibits both short-term momentum and medium-term mean reversion. Under realistic model parameters, the autocorrelation function has the shape of a damped cosine wave that decays rather fast. Qualitatively, the shape of the return autocorrelation function remains the same in the general case, where many Markovian states represent each semi-Markovian state; in the general case, the return autocorrelation function can be computed using simple numerical methods. We demonstrate the applicability of our theoretical results in an empirical application, in which we calibrate our model to the monthly returns on the Dow Jones index and the Standard and Poor's Composite index. Using these two indices, we explore how well the model-implied return autocorrelation function fits the empirically estimated autocorrelation function. We show that the fit is reasonably good. In particular, our model correctly captures the duration of the short-term momentum, which lasts about 10-12 months and subsequently reverses.
The rest of the paper is organized as follows. Section 2 describes how the return autocorrelation function is computed in a two-state regime-switching model. For the sake of completeness and comparability, Section 3 presents a conventional MSM and the return autocorrelation function in this model. Section 4 explains the construction of our ESMSM, offers the analytical solution for the return autocorrelation function in the simplest case, and demonstrates that the return autocorrelation function exhibits both short-term momentum and medium-term mean reversion. Section 5 calibrates our model to empirical data and illustrates the goodness of fit. Finally, Section 6 concludes the paper.

2 Return Autocorrelation in a Regime-Switching Model
Denote by $X_t$ the period-$t$ log return on a financial asset. We assume that $X_t$ is a discrete-time stochastic process that randomly switches between two states (regimes): A and B. Formally, the state space of the process is $S_t \in \{A, B\}$. The return distribution depends on the state $S_t$ in the following manner:
$$X_t = \begin{cases} \mu_A + \sigma_A z_t & \text{if } S_t = A, \\ \mu_B + \sigma_B z_t & \text{if } S_t = B, \end{cases}$$
where $\mu_A$ and $\sigma_A$ are the mean and standard deviation of returns in state A, $\mu_B$ and $\sigma_B$ are the mean and standard deviation of returns in state B, and $z_t$ is a random variable with zero mean and unit variance that is independently and identically distributed over time.
Throughout the paper, we assume that state A is a bull state of the market, while state
B is a bear state of the market. A bull market is typically a high-return low-volatility state,
whereas a bear market is a low-return high-volatility state.
The conditional probabilities $\text{Prob}(S_{t+n} = J \mid S_t = I) = p_{IJ}(n)$ are called the multi-period transition probabilities. In words, $p_{IJ}(n)$ is the probability that the process transits from state $I$ to state $J$ over $n$ periods. The $n$-period transition probability distribution of the process can be represented by a $2 \times 2$ transition probability matrix $\mathbf{P}(n)$:
$$\mathbf{P}(n) = \begin{pmatrix} p_{AA}(n) & p_{AB}(n) \\ p_{BA}(n) & p_{BB}(n) \end{pmatrix}.$$
Denote by $\pi = [\pi_A, \pi_B]$ the vector of the steady-state (stationary or ergodic) probabilities. Specifically,
$$\pi_A = \text{Prob}(S_t = A), \qquad \pi_B = \text{Prob}(S_t = B).$$
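The return autocorrelation function $\rho_n$ is defined by (see Timmermann (2000) and Frühwirth-Schnatter (2006, Chapter 10))
$$\rho_n = \frac{E[X_t X_{t+n}] - \mu^2}{\sigma^2},$$
where
$$\mu = E[X_t] = \pi_A \mu_A + \pi_B \mu_B, \qquad \sigma^2 = \text{Var}[X_t] = \pi_A \sigma_A^2 + \pi_B \sigma_B^2 + \pi_A \pi_B (\mu_A - \mu_B)^2,$$
$$E[X_t X_{t+n}] = \pi_A \mu_A \big(p_{AA}(n)\mu_A + p_{AB}(n)\mu_B\big) + \pi_B \mu_B \big(p_{BA}(n)\mu_A + p_{BB}(n)\mu_B\big), \qquad n \geq 1.$$
Using $p_{AA}(n) = 1 - p_{AB}(n)$ and $p_{BB}(n) = 1 - p_{BA}(n)$, the expression for the lag-$n$ autocorrelation can be rewritten in the following form:
$$\rho_n = \frac{\pi_A \pi_B (\mu_A - \mu_B)^2 - (\mu_A - \mu_B)\big(\pi_A p_{AB}(n)\mu_A - \pi_B p_{BA}(n)\mu_B\big)}{\sigma^2}. \tag{1}$$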
It is important to note that the return autocorrelation function depends on $n$ only through the transition probabilities $p_{AB}(n)$ and $p_{BA}(n)$. How these $n$-period transition probabilities are computed depends largely on whether the regime-switching model is a Markov or a semi-Markov model. We discuss the computation of the transition probabilities in the subsequent sections.
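As a minimal sketch, equation (1) can be evaluated directly once $p_{AB}(n)$ and $p_{BA}(n)$ are available, whichever model produces them. The function name `regime_acf` and the parameter values are hypothetical and serve only as an illustration. The example also confirms that, once the state is fully forgotten ($p_{AB}(n) = \pi_B$, $p_{BA}(n) = \pi_A$), the autocorrelation vanishes.

```python
import numpy as np


def regime_acf(mu_a, mu_b, sigma_a, sigma_b, pi_a, p_ab_n, p_ba_n):
    """Lag-n return autocorrelation from equation (1), for n >= 1.

    p_ab_n and p_ba_n are the n-period macro-state transition probabilities
    p_AB(n) and p_BA(n); scalars or NumPy arrays are accepted.
    """
    pi_b = 1.0 - pi_a
    diff = mu_a - mu_b
    # Unconditional variance: within-state variances plus the extra
    # variance created by switching between the two state means.
    var = pi_a * sigma_a**2 + pi_b * sigma_b**2 + pi_a * pi_b * diff**2
    p_ab_n = np.asarray(p_ab_n, dtype=float)
    p_ba_n = np.asarray(p_ba_n, dtype=float)
    num = pi_a * pi_b * diff**2 - diff * (pi_a * p_ab_n * mu_a - pi_b * p_ba_n * mu_b)
    return num / var


# Hypothetical monthly parameters (not taken from the paper).
mu_a, mu_b, sigma_a, sigma_b, pi_a = 0.015, -0.02, 0.04, 0.06, 0.6

# Long-run limit: p_AB(n) -> pi_B and p_BA(n) -> pi_A, so rho_n -> 0.
rho_long_run = regime_acf(mu_a, mu_b, sigma_a, sigma_b, pi_a, 1 - pi_a, pi_a)
```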
Typically, the return autocorrelation at any single lag is very weak and escapes detection in empirical studies (because it is usually statistically insignificant). Therefore, reliable detection of return autocorrelation is only possible using returns aggregated over multiple periods. This idea was put forward by Fama and French (1988), who suggest using the first-order autocorrelation of $k$-period returns:
$$AC1(k) = \text{Cor}(X_{t+k,t+1}, X_{t,t-k+1}), \tag{2}$$
where
$$X_{t+k,t+1} = \sum_{i=1}^{k} X_{t+i}, \qquad X_{t,t-k+1} = \sum_{i=1}^{k} X_{t-k+i}.$$
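Proposition 1. The first-order autocorrelation of $k$-period returns is given by
$$AC1(k) = \frac{\mathbf{1}'\mathbf{U}\mathbf{1}}{\mathbf{1}'\mathbf{R}\mathbf{1}}, \tag{3}$$
where $\mathbf{1}$ is the $k \times 1$ vector of ones, and $\mathbf{R}$ and $\mathbf{U}$ are the $k \times k$ matrices given by
$$\mathbf{R} = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{k-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1 \end{pmatrix}, \qquad \mathbf{U} = \begin{pmatrix} \rho_k & \rho_{k+1} & \rho_{k+2} & \cdots & \rho_{2k-1} \\ \rho_{k-1} & \rho_k & \rho_{k+1} & \cdots & \rho_{2k-2} \\ \rho_{k-2} & \rho_{k-1} & \rho_k & \cdots & \rho_{2k-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_1 & \rho_2 & \rho_3 & \cdots & \rho_k \end{pmatrix}, \tag{4}$$
where $\rho_i$ is the lag-$i$ autocorrelation of $X_t$.
The proof is given in the Appendix.
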
Note that the first-order autocorrelation of $k$-period returns is fully determined by the return autocorrelation function $\rho_n$. In fact, $AC1(k)$ equals the sum of all elements of the matrix $\mathbf{U}$ divided by the sum of all elements of the matrix $\mathbf{R}$.
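Proposition 1 translates into a few lines of NumPy. In this sketch (the function name `ac1_k_period` is hypothetical), $\mathbf{R}$ and $\mathbf{U}$ are built with index arithmetic, $R_{ij} = \rho_{|i-j|}$ and $U_{ij} = \rho_{k+j-i}$, which reproduces the layout in (4), and the ratio of their element sums gives (3).

```python
import numpy as np


def ac1_k_period(rho, k):
    """First-order autocorrelation of k-period returns, equations (3)-(4).

    rho[m] must hold the lag-m autocorrelation for m = 0, ..., 2k - 1,
    with rho[0] = 1.
    """
    rho = np.asarray(rho, dtype=float)
    if rho.size < 2 * k:
        raise ValueError("need autocorrelations up to lag 2k - 1")
    i = np.arange(k)[:, None]        # row index
    j = np.arange(k)[None, :]        # column index
    R = rho[np.abs(i - j)]           # Toeplitz matrix R[i, j] = rho_|i-j|
    U = rho[k + j - i]               # U[i, j] = rho_{k+j-i}: rho_k ... rho_{2k-1} in row 0
    return U.sum() / R.sum()         # 1'U1 / 1'R1
```

For $k = 1$ the ratio collapses to $\rho_1$, as it should, because one-period returns are just $X_t$.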

3 Return Autocorrelation in a Markov Model

In an MSM, the state process satisfies the Markov property (memorylessness):
$$\text{Prob}(S_{t+1} \mid S_t, \ldots, S_0) = \text{Prob}(S_{t+1} \mid S_t). \tag{5}$$
In words, the conditional probability distribution of the state process at a future time $t + 1$ depends only on the present state at time $t$ and not on the past states at times $t - i$ for all $i \geq 1$.
Assume the following one-period transition probability matrix:
$$\mathbf{P} = \begin{pmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{pmatrix} = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}. \tag{6}$$
For instance, if the process is in state A, then over a single period the process transits to state B with probability $p_{AB}$ or remains in state A with probability $p_{AA} = 1 - p_{AB}$. The probability $p_{AA}$ is called the self-transition probability of state A. Figure 1 illustrates an MSM specified by the transition probability matrix in (6).
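[Figure 1 diagram: states A and B with self-transitions $p_{AA}$ and $p_{BB}$ and cross-transitions $p_{AB}$ (A to B) and $p_{BA}$ (B to A).]
Figure 1: A two-state Markov switching model. $p_{IJ}$ denotes the conditional probability that the process transits from state $I$ to state $J$ over a single period.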
The state duration time is the random time that the process stays in a particular state. In an MSM, the duration time $d_I$ of state $I$ follows the geometric distribution
$$\text{Prob}(d_I = n) = p_{II}^{\,n-1}(1 - p_{II}), \qquad n \geq 1,$$
where $p_{II}$ is the self-transition probability of state $I$ and $n$ is the number of periods. Since $\text{Prob}(d_I > n) = p_{II}^{\,n}$, in a Markov process there is no duration dependence in the sense that
$$\text{Prob}(d_I > s + n \mid d_I > s) = \text{Prob}(d_I > n) \qquad \forall\, n, s \geq 1.$$
The intuition behind this property is as follows. If the process is in state $I$ at time $t$, then the remaining state duration time does not depend on the time already spent in that state.
When the state duration times are geometrically distributed, the mean state duration times are given by
$$E[d_A] = \frac{1}{1 - p_{AA}} = \frac{1}{\alpha}, \qquad E[d_B] = \frac{1}{1 - p_{BB}} = \frac{1}{\beta}, \tag{7}$$
where $E[d_A]$ and $E[d_B]$ denote the mean state A and B duration times, respectively.
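A quick numerical check of the geometric duration law, written as a sketch with a hypothetical self-transition probability $p_{II} = 0.95$ (so $\alpha = 0.05$): the truncated mean matches (7), and the conditional survival probability does not depend on the time already spent in the state.

```python
import numpy as np


def geometric_pmf(n, p_self):
    """Prob(d_I = n) = p_self**(n - 1) * (1 - p_self) for n >= 1."""
    return p_self ** (np.asarray(n) - 1) * (1.0 - p_self)


def geometric_survival(n, p_self):
    """Prob(d_I > n) = p_self**n."""
    return p_self ** np.asarray(n)


p = 0.95                                          # hypothetical p_II, i.e. alpha = 0.05
n = np.arange(1, 5001)                            # truncate the infinite support
mean_duration = np.sum(n * geometric_pmf(n, p))   # close to 1 / (1 - p) = 20

# Memorylessness: Prob(d > s + m | d > s) equals Prob(d > m).
s, m = 7, 5
conditional = geometric_survival(s + m, p) / geometric_survival(s, p)
```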
Assuming that the transition probability matrix $\mathbf{P}$ is the same after each period, the $n$-period transition probability matrix can be computed as $\mathbf{P}(n) = \mathbf{P}^n$. The elements of the transition probability matrix $\mathbf{P}(n)$ are given by (see, for example, Hamilton (1994, Chapter 22))
$$\mathbf{P}(n) = \begin{pmatrix} p_{AA}(n) & p_{AB}(n) \\ p_{BA}(n) & p_{BB}(n) \end{pmatrix} = \begin{pmatrix} \pi_A + \pi_B (1 - \alpha - \beta)^n & \pi_B - \pi_B (1 - \alpha - \beta)^n \\ \pi_A - \pi_A (1 - \alpha - \beta)^n & \pi_B + \pi_A (1 - \alpha - \beta)^n \end{pmatrix}. \tag{8}$$
Note that in the limit as $n \to \infty$, the matrix $\mathbf{P}(n)$ converges to a matrix that contains the stationary probabilities in each row. Assuming that $\alpha + \beta < 1$, the transition probability $p_{AA}(n)$ ($p_{BB}(n)$) monotonically decreases from 1 (when $n = 0$) to $\pi_A$ ($\pi_B$). In contrast, the transition probability $p_{AB}(n)$ ($p_{BA}(n)$) monotonically increases from 0 to $\pi_B$ ($\pi_A$). The stationary probabilities satisfy the condition $\pi\mathbf{P} = \pi$, which can be used to find the expression for the stationary probabilities:
$$\pi_A = \frac{\beta}{\alpha + \beta}, \qquad \pi_B = \frac{\alpha}{\alpha + \beta}. \tag{9}$$
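The closed form (8) and the stationary probabilities (9) can be checked against a brute-force matrix power. In this sketch the values $\alpha = 1/20$ and $\beta = 1/10$ (mean durations of 20 and 10 periods) and the helper names are illustrative assumptions.

```python
import numpy as np


def msm_transition(alpha, beta):
    """One-period transition matrix (6) of the two-state MSM."""
    return np.array([[1 - alpha, alpha], [beta, 1 - beta]])


def msm_n_period_closed_form(alpha, beta, n):
    """n-period transition matrix from equation (8)."""
    pi_a, pi_b = beta / (alpha + beta), alpha / (alpha + beta)   # equation (9)
    lam = (1 - alpha - beta) ** n
    return np.array([[pi_a + pi_b * lam, pi_b - pi_b * lam],
                     [pi_a - pi_a * lam, pi_b + pi_a * lam]])


alpha, beta = 1 / 20, 1 / 10          # hypothetical values
P = msm_transition(alpha, beta)
P12 = np.linalg.matrix_power(P, 12)   # brute force P^12
P12_closed = msm_n_period_closed_form(alpha, beta, 12)
pi = np.array([beta, alpha]) / (alpha + beta)
```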
In an MSM, substituting $p_{AB}(n)$ and $p_{BA}(n)$ from (8) into equation (1), the return autocorrelation function reduces to
$$\rho_n = \frac{\pi_A \pi_B (\mu_A - \mu_B)^2}{\sigma^2} (1 - \alpha - \beta)^n. \tag{10}$$
It is essential to note that the lag-$n$ autocorrelation is always non-negative, $\rho_n \geq 0$, given that $\alpha + \beta < 1$. Additionally, if $\mu_B \neq \mu_A$, then the autocorrelation is strictly positive. The autocorrelation decreases exponentially towards zero as $n$ increases. Consequently, in an MSM, the return process exhibits short-term momentum.
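To see that (10) is simply (1) evaluated with the Markov transition probabilities, the sketch below computes the autocorrelations both ways for hypothetical parameters (not taken from the paper) and confirms that they agree, are positive, and decay geometrically at rate $1 - \alpha - \beta$.

```python
import numpy as np

# Hypothetical monthly parameters (not taken from the paper).
alpha, beta = 0.1, 0.3
mu_a, mu_b = 0.01, -0.02
sig_a, sig_b = 0.04, 0.06

pi_a, pi_b = beta / (alpha + beta), alpha / (alpha + beta)   # equation (9)
d = mu_a - mu_b
var = pi_a * sig_a**2 + pi_b * sig_b**2 + pi_a * pi_b * d**2

lags = np.arange(1, 25)
# Equation (10): geometric decay at rate 1 - alpha - beta.
rho_closed = pi_a * pi_b * d**2 / var * (1 - alpha - beta) ** lags

# Equation (1), with p_AB(n) and p_BA(n) read off matrix powers of (6).
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
rho_general = np.empty(lags.size)
for idx, n in enumerate(lags):
    Pn = np.linalg.matrix_power(P, n)
    p_ab, p_ba = Pn[0, 1], Pn[1, 0]
    rho_general[idx] = (pi_a * pi_b * d**2
                        - d * (pi_a * p_ab * mu_a - pi_b * p_ba * mu_b)) / var
```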

Figure 2 illustrates the return autocorrelations in an MSM for monthly returns. Specifically, the red line with points plots the month-$k$ return autocorrelation, whereas the blue line with points plots the first-order autocorrelation of $k$-month returns. The annualized mean state returns are $\mu_A = 20\%$ and $\mu_B = -30\%$. The annualized standard deviations of state returns are $\sigma_A = \sigma_B = 20\%$. The mean state A (bull market) duration time equals 20 months, whereas the mean state B (bear market) duration time equals 10 months. Note that the month-$k$ return autocorrelation decreases exponentially towards zero as $k$ increases. In contrast, the first-order autocorrelation of $k$-month returns quickly increases and then, after reaching its maximum, gradually decreases towards zero. It is worth emphasizing that the first-order autocorrelation of $k$-month returns is substantially larger than the month-$k$ return autocorrelation for $k > 1$.
[Figure 2 plot: Autocorrelation against Months, k (0 to 40); series ACF(k) and AC1(k).]
Figure 2: The return autocorrelations in a Markov switching model. $ACF(k)$ denotes the month-$k$ return autocorrelation $\rho_k$. $AC1(k)$ denotes the first-order autocorrelation of $k$-month returns. The annualized mean state returns are $\mu_A = 20\%$ and $\mu_B = -30\%$. The annualized standard deviations of state returns are $\sigma_A = \sigma_B = 20\%$. The mean state A duration time equals 20 months, whereas the mean state B duration time equals 10 months.
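The curves in Figure 2 can be reproduced approximately with equations (3), (4), and (10). This sketch rests on one explicit assumption: the text reports annualized parameters only, so the monthly values below divide means by 12 and volatilities by $\sqrt{12}$; the paper's own conversion may differ. The helper `ac1_k_period` is a hypothetical name. The checks mirror the qualitative description above: $AC1(1) = \rho_1$, $AC1(k) > \rho_k$ for $k > 1$, and $AC1(k)$ peaks at an interior $k$.

```python
import numpy as np

# Figure 2 parameters; the monthly conversion (mean / 12, volatility / sqrt(12))
# is an assumption, since the text reports annualized values only.
alpha, beta = 1 / 20, 1 / 10          # mean durations of 20 (bull) and 10 (bear) months
mu_a, mu_b = 0.20 / 12, -0.30 / 12
sig_a = sig_b = 0.20 / np.sqrt(12)
pi_a, pi_b = beta / (alpha + beta), alpha / (alpha + beta)
d = mu_a - mu_b
var = pi_a * sig_a**2 + pi_b * sig_b**2 + pi_a * pi_b * d**2

max_k = 40
lags = np.arange(0, 2 * max_k)                               # lags 0, ..., 2*max_k - 1
rho = pi_a * pi_b * d**2 / var * (1 - alpha - beta) ** lags  # equation (10)
rho[0] = 1.0                                                 # lag-0 autocorrelation


def ac1_k_period(rho, k):
    """Equation (3): 1'U1 / 1'R1 with R and U laid out as in (4)."""
    i = np.arange(k)[:, None]
    j = np.arange(k)[None, :]
    return rho[k + j - i].sum() / rho[np.abs(i - j)].sum()


ks = np.arange(1, max_k + 1)
acf = rho[ks]                                        # ACF(k) = rho_k
ac1 = np.array([ac1_k_period(rho, k) for k in ks])   # AC1(k)
```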
4 Return Autocorrelation in a Semi-Markov Model

4.1 Preliminaries
A serious limitation of a conventional Markov model is that the state duration times are geometrically distributed. Consequently, the probability of a state change does not depend on the amount of time that has passed since the process entered the state. This behavior is not a reasonable representation of many real-world processes. For example, the majority of empirical studies document that stock market cycles exhibit duration dependence (see, among others, Cochran and Defina (1995), Maheu and McCurdy (2000), Lunde and Timmermann (2004), Ohn et al. (2004), and Harman and Zuehlke (2007)). Most often, researchers find positive duration dependence in both bull and bear market states. Positive duration dependence means that the longer a state lasts, the higher its probability of ending. The main approach to incorporating duration dependence in a regime-switching model is to replace a Markov switching model with a semi-Markov switching model (SMSM).
An SMSM generalizes an MSM by allowing the state duration time to follow any probability distribution. Unlike a Markov model, an SMSM does not have the Markov property at each time $t$; the Markov property is satisfied only when the process changes state. That is, when a process enters state $I$, it determines the next state $J$ according to the transition probability $p_{IJ}$. However, after $J$ has been selected, but before making the transition, the process stays in state $I$ for a random time $d_{IJ}$. Generally, a two-state SMSM is specified by a $2 \times 2$ transition probability matrix and four probability mass functions that determine the distribution of the state duration times when the process transits from state $I$ to state $J$.
A great advantage of an SMSM is that it is very flexible and can incorporate any duration distribution. However, a serious disadvantage of an SMSM is that there are no analytical solutions for the state transition probabilities. Moreover, the state transition probabilities must be computed using complicated recursive numerical algorithms (see, for example, Howard (1971) and Barbu and Limnios (2008)).
Ferguson (1980) was the first to note that an SMSM can be realized as an expanded-state MSM (ESMSM). Russell and Cook (1987), Johnson (2005), Guédon (2005), and Langrock and Zucchini (2011, Chapter 12) present overviews of various approaches to constructing an ESMSM. In an ESMSM, each individual state $I$ of the process is represented by $q$ sub-states in the conventional Markov model: $\{i_1, i_2, \ldots, i_q\}$. The state process is in macro-state $I$, $S_t = I$, as long as it is in the set of $q$ sub-states¹ $S_t \in \{i_1, i_2, \ldots, i_q\}$. Compared with a genuine SMSM, an ESMSM is not flexible with respect to the distribution of the state duration times. However, a major advantage of the ESMSM formulation is that it enables one to apply all well-established methods available for Markov models. For example, instead of implementing a complicated recursive numerical algorithm, one can compute the $n$-period transition probability matrix as a matrix power.

¹ Note that in our definitions, a macro-state is a semi-Markovian state, while a sub-state is a Markovian state. That is, in an ESMSM, a semi-Markovian state consists of several Markovian states.
In an ESMSM, the state duration distributions depend on the chosen topology. Our choice is an ESMSM with a specific topology in which the state duration times follow a negative binomial distribution. In this topology, each of the $q$ sub-states of a macro-state has a self-transition, the process moves through the sub-states in order, and the transition to the next macro-state is possible only from the last ($q$th) sub-state. We assume that the self-transition probability $p_{ii}$ is the same in each sub-state $1, 2, \ldots, q$ of state $I$. Under this assumption, the macro-state $I$ duration time follows a negative binomial distribution $d_I \sim NB(q)$ (for references, see Johnson (2005), Guédon (2005), Zhu, Wang, Yang, and Song (2006), and Tejedor, Gómez, and Pacheco (2015)). The probability mass function of the $NB(q)$ distribution is given by

$$f(n, q, p_{ii}) = \text{Prob}(d_I = n) = \binom{n-1}{n-q} (1 - p_{ii})^q \, p_{ii}^{\,n-q}, \qquad n \geq q. \tag{11}$$
The transition probability $1 - p_{ii}$ can be interpreted as the probability of success in one Bernoulli trial. Thus, the negative binomial distribution gives the probability that the $q$th success occurs on the $n$th Bernoulli trial. The expected number of trials needed to obtain $q$ successes (or, equivalently, the mean duration time of macro-state $I$) equals $E[d_I] = q/(1 - p_{ii})$. The geometric distribution is the special case of the negative binomial distribution with $q = 1$. Consequently, when $q = 1$ for all states, an ESMSM reduces to a conventional MSM.
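Equation (11) and the mean $q/(1 - p_{ii})$ are easy to verify numerically. This is a sketch with hypothetical values $q = 3$ and $p_{ii} = 0.85$ (mean duration 20), truncating the support at 2000 periods; Python's `math.comb` supplies the binomial coefficient, and `nb_pmf` is an illustrative name.

```python
from math import comb

import numpy as np


def nb_pmf(n, q, p_self):
    """Equation (11): Prob(d_I = n) for a macro-state with q sub-states."""
    if n < q:
        return 0.0                      # the state cannot end before q periods
    return comb(n - 1, n - q) * (1 - p_self) ** q * p_self ** (n - q)


q, p_self = 3, 0.85                     # hypothetical; mean = 3 / 0.15 = 20
ns = np.arange(1, 2001)                 # truncated support
pmf = np.array([nb_pmf(int(n), q, p_self) for n in ns])
total = pmf.sum()                       # close to 1
mean = np.dot(ns, pmf)                  # close to q / (1 - p_self)
```

With `q = 1` the same function returns the geometric probabilities of Section 3.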

A very convenient function for duration analysis is the hazard function
$$h(n) = \frac{f(n)}{1 - F(n-1)},$$
where $f(n)$ is the probability mass function of the state durations and $F(n)$ is the corresponding cumulative distribution function, so that $1 - F(n-1) = \text{Prob}(d_I \geq n)$. The hazard function gives a conditional failure rate. Specifically, the hazard function is the probability that the market state ends at time $n$, given that the market state has lasted until time $n$. When the hazard function is constant, there is no duration dependence: at any time, the probability that a market state ends does not depend on how long the state has lasted. If the hazard function is an increasing function of time, there is positive duration dependence. In this case, the longer a market state lasts, the higher the probability that it ends.
Figure 3 illustrates various shapes of the negative binomial distribution $NB(q)$ for $q \in \{1, 2, 3, 4\}$ and the shapes of the corresponding hazard functions. For all $q$, the mean state duration time is 20. Only for $q = 1$ is the distribution of the state duration memoryless. For $q > 1$, the state duration distribution exhibits positive duration dependence; in particular, the probability that a state terminates increases as the state ages. Many researchers report that the negative binomial distribution describes many empirical phenomena much better than the geometric distribution (see, for example, Levinson (1986), Burshtein (1996), and Johnson (2005)).
[Figure 3 plots: left panel "Probability mass function" (Probability against State duration, n, 0 to 50); right panel "Hazard rate" (Rate against State duration, n, 0 to 50); curves NB(1), NB(2), NB(3), NB(4).]
Figure 3: The left panel shows the probability mass function of state durations for the negative binomial distribution $NB(q)$ for various $q$. The right panel shows the hazard rate for the negative binomial distribution $NB(q)$ for various $q$. For all $q$, the mean state duration time is 20.
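The right panel of Figure 3 can be approximated with the discrete hazard, computed in this sketch as $f(n)/\text{Prob}(d_I \geq n)$, that is, the probability that a state ends at $n$ given that it has lasted until $n$. As in the figure, the mean duration is held at 20, so $p_{ii} = 1 - q/20$; the function names are hypothetical. For $q = 1$ the hazard is flat at $0.05$, while for $q > 1$ it rises towards $1 - p_{ii}$.

```python
from math import comb

import numpy as np


def nb_pmf(n, q, p_self):
    """Equation (11); zero for durations shorter than q."""
    if n < q:
        return 0.0
    return comb(n - 1, n - q) * (1 - p_self) ** q * p_self ** (n - q)


def nb_hazard(q, p_self, n_max=50):
    """Discrete hazard f(n) / Prob(d >= n) for n = 1, ..., n_max."""
    f = np.array([nb_pmf(n, q, p_self) for n in range(1, n_max + 1)])
    # Prob(d >= n) = 1 - F(n - 1), with F(0) = 0.
    at_risk = 1.0 - np.concatenate(([0.0], np.cumsum(f)[:-1]))
    return f / at_risk


# As in Figure 3, the mean duration q / (1 - p_self) is fixed at 20.
hazards = {q: nb_hazard(q, 1 - q / 20) for q in (1, 2, 3, 4)}
```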
4.2 Topology of ESMSM and Functional Form of the Solution
We now present the topology of an ESMSM in which the state duration times follow a negative binomial distribution. We begin with the simplest case, depicted in Figure 4, where each macro-state A and B is represented by two sub-states. Specifically, macro-state A consists of sub-states 1 and 2, while macro-state B consists of sub-states 3 and 4. This ESMSM extends the conventional two-state MSM depicted in Figure 1. In this ESMSM, the duration times of states A and B follow the $NB(2)$ distribution.
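[Figure 4 diagram: State A contains sub-states 1 and 2, each with self-transition probability $1 - 2\alpha$; State B contains sub-states 3 and 4, each with self-transition probability $1 - 2\beta$; the transitions 1 to 2 and 2 to 3 occur with probability $2\alpha$, and the transitions 3 to 4 and 4 to 1 occur with probability $2\beta$.]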
Figure 4: An ESMSM with two sub-states for each macro-state A and B. Specifically, macro-state A consists of sub-states 1 and 2, while macro-state B is represented by sub-states 3 and 4.
Before proceeding further, it is essential to clarify our notation for a general state transition probability. States written in capital letters, $I$ and $J$, denote macro-states (or semi-Markovian states), while states written in lower-case letters, $i$ and $j$, or as integers denote sub-states (or Markovian states). Therefore, $p_{IJ}$ denotes the transition probability between two macro-states, whereas $p_{ij}$ (or, for instance, $p_{12}$) denotes the transition probability between two sub-states.
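The one-period transition probability matrix for the ESMSM in Figure 4 is given by:
$$\mathbf{P} = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix} = \begin{pmatrix} 1 - 2\alpha & 2\alpha & 0 & 0 \\ 0 & 1 - 2\alpha & 2\alpha & 0 \\ 0 & 0 & 1 - 2\beta & 2\beta \\ 2\beta & 0 & 0 & 1 - 2\beta \end{pmatrix}. \tag{12}$$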
Each element $p_{ij}$ of the transition probability matrix is defined in the usual manner: $p_{ij} = \text{Prob}(S_{t+1} = j \mid S_t = i)$. Note that the self-transition probabilities of sub-states 1 and 2 (3 and 4) are the same, $p_{11} = p_{22}$ ($p_{33} = p_{44}$). As a result, the probabilities of moving from a sub-state of macro-state A (B) either to the next sub-state or to the other macro-state are also the same, $p_{12} = p_{23}$ ($p_{34} = p_{41}$).
Provided that both $\alpha < 1/2$ and $\beta < 1/2$, it is easy to check that the matrix $\mathbf{P}$ is indeed a stochastic matrix whose entries are non-negative and whose rows all sum to 1. Our ESMSM is constructed to reproduce the mean state duration times of the conventional two-state MSM in Figure 1. For example, the mean state A duration time in the ESMSM is $2/(1 - p_{11}) = 2/(2\alpha) = 1/\alpha$, which is the same as the mean state A duration time in the conventional MSM. As a result, in our ESMSM, the one-period transition probabilities $p_{IJ}$ are the same as in the corresponding traditional MSM (see below). Additionally, our ESMSM has the same stationary probabilities $\pi_A$ and $\pi_B$. All these features make the ESMSM and the corresponding MSM directly comparable.
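As a sketch with the hypothetical values $\alpha = 1/20$ and $\beta = 1/10$, the matrix (12) can be built and checked for the three properties just claimed: it is stochastic, the mean state A duration is $1/\alpha$, and the sub-state stationary distribution aggregates to the MSM probabilities (9). The stationary vector is obtained here by solving $\pi\mathbf{P} = \pi$ together with $\sum_i \pi_i = 1$ by least squares, which is one standard numerical route rather than the paper's own procedure; `esmsm_nb2` is an illustrative name.

```python
import numpy as np


def esmsm_nb2(alpha, beta):
    """Matrix (12): sub-states 1, 2 form state A; sub-states 3, 4 form state B."""
    a, b = 2 * alpha, 2 * beta
    return np.array([[1 - a, a, 0.0, 0.0],
                     [0.0, 1 - a, a, 0.0],
                     [0.0, 0.0, 1 - b, b],
                     [b, 0.0, 0.0, 1 - b]])


alpha, beta = 1 / 20, 1 / 10            # hypothetical; requires alpha, beta < 1/2
P = esmsm_nb2(alpha, beta)

# Stationary distribution: stack (P' - I) pi = 0 with the row sum(pi) = 1 and
# solve the consistent overdetermined system by least squares.
k = P.shape[0]
A = np.vstack([P.T - np.eye(k), np.ones((1, k))])
rhs = np.concatenate([np.zeros(k), [1.0]])
pi_sub = np.linalg.lstsq(A, rhs, rcond=None)[0]
pi_a, pi_b = pi_sub[:2].sum(), pi_sub[2:].sum()

mean_dur_a = 2 / (1 - P[0, 0])          # NB(2) mean duration of state A
```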
In the ESMSM specified by the transition probability matrix in (12), the self-transition probability of macro-state A is computed as follows. If we know that the process is in macro-state A, then the process is equally likely² to be in sub-state 1 or in sub-state 2. If the process is in sub-state 1, then the probability of remaining in macro-state A is $p_{11} + p_{12}$. If the process is in sub-state 2, then the probability of remaining in macro-state A is $p_{21} + p_{22}$. Consequently, the probability $p_{AA}$ is computed as $(p_{11} + p_{12})/2 + (p_{21} + p_{22})/2$. All other transition probabilities are computed in the same manner:
$$p_{AA} = \frac{p_{11} + p_{12} + p_{21} + p_{22}}{2}, \qquad p_{BA} = \frac{p_{31} + p_{32} + p_{41} + p_{42}}{2},$$
$$p_{AB} = \frac{p_{13} + p_{14} + p_{23} + p_{24}}{2}, \qquad p_{BB} = \frac{p_{33} + p_{34} + p_{43} + p_{44}}{2}. \tag{13}$$
It is easy to check that the ESMSM and the MSM have the same one-period transition probabilities for states A and B. For example, $p_{AA} = 1 - \alpha$ in both the ESMSM and the MSM. However, the multi-period transition probabilities are different.

² It is worth noting that, when we observe that the process is in macro-state A, there is nothing that distinguishes the sub-states in A from each other.

The $n$-period transition probability matrix in the ESMSM is given by
$$\mathbf{P}(n) = \mathbf{P}^n = \begin{pmatrix} p_{11}(n) & p_{12}(n) & p_{13}(n) & p_{14}(n) \\ p_{21}(n) & p_{22}(n) & p_{23}(n) & p_{24}(n) \\ p_{31}(n) & p_{32}(n) & p_{33}(n) & p_{34}(n) \\ p_{41}(n) & p_{42}(n) & p_{43}(n) & p_{44}(n) \end{pmatrix}. \tag{14}$$
The $n$-period transition probabilities of macro-states A and B are computed similarly to (13). For example, the $n$-period self-transition probability of state A is computed as $p_{AA}(n) = (p_{11}(n) + p_{12}(n) + p_{21}(n) + p_{22}(n))/2$.
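Equation (13) and its $n$-period analogue amount to averaging rows over the origin sub-states (with equal weights, as in footnote 2) and summing columns over the destination sub-states. The sketch below, with the hypothetical helper `macro_probs` and the same illustrative $\alpha = 1/20$, $\beta = 1/10$, confirms that the one-period macro-state probabilities coincide with the MSM values, while the two-period ones already differ: $p_{AB}(2) = 0.10$ in the ESMSM versus $\alpha(2 - \alpha - \beta) = 0.0925$ in the MSM.

```python
import numpy as np


def macro_probs(Pn, n_a):
    """Hypothetical helper: aggregate a sub-state matrix into macro-state
    probabilities as in (13). The first n_a sub-states belong to state A;
    rows are averaged over origin sub-states, columns summed over
    destination sub-states."""
    blocks = {("A", "A"): Pn[:n_a, :n_a], ("A", "B"): Pn[:n_a, n_a:],
              ("B", "A"): Pn[n_a:, :n_a], ("B", "B"): Pn[n_a:, n_a:]}
    return {key: blk.sum() / blk.shape[0] for key, blk in blocks.items()}


alpha, beta = 1 / 20, 1 / 10            # hypothetical values
a, b = 2 * alpha, 2 * beta
P_es = np.array([[1 - a, a, 0, 0], [0, 1 - a, a, 0],
                 [0, 0, 1 - b, b], [b, 0, 0, 1 - b]])    # matrix (12)
P_ms = np.array([[1 - alpha, alpha], [beta, 1 - beta]])  # matrix (6)

one = macro_probs(P_es, 2)                               # one period
two = macro_probs(np.linalg.matrix_power(P_es, 2), 2)    # two periods
p_ab_msm_2 = np.linalg.matrix_power(P_ms, 2)[0, 1]
```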
Now consider the general case in which macro-state A is represented by $g$ sub-states, while macro-state B consists of $p$ sub-states. In the general case, the state A duration time follows the $NB(g)$ distribution, while the state B duration time follows the $NB(p)$ distribution. In this case, the one-period $(g+p) \times (g+p)$ transition probability matrix $\mathbf{P}$ is given by the following partitioned matrix
$$\mathbf{P} = \begin{pmatrix} \mathbf{P}_{AA} & \mathbf{P}_{AB} \\ \mathbf{P}_{BA} & \mathbf{P}_{BB} \end{pmatrix},$$
where $\mathbf{P}_{AA}$ is the $g \times g$ sub-matrix, $\mathbf{P}_{AB}$ is the $g \times p$ sub-matrix, $\mathbf{P}_{BA}$ is the $p \times g$ sub-matrix, and $\mathbf{P}_{BB}$ is the $p \times p$ sub-matrix.³ The self-transition probability $p_{AA}$ ($p_{BB}$) of macro-state A (B) is computed by summing all elements of sub-matrix $\mathbf{P}_{AA}$ ($\mathbf{P}_{BB}$) and dividing the result by $g$ ($p$). Then, the complementary probability $p_{AB}$ ($p_{BA}$) can be calculated as $p_{AB} = 1 - p_{AA}$ ($p_{BA} = 1 - p_{BB}$).
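The general partitioned matrix can be assembled programmatically. The construction below is an assumption extrapolated from (12) and footnote 3: every sub-state of A leaves with probability $g\alpha$ (and every sub-state of B with probability $p\beta$), so each macro-state keeps the MSM mean duration $1/\alpha$ ($1/\beta$). The function `esmsm_nb` is hypothetical. For $g = p = 2$ it reproduces (12), and the macro-state probabilities follow the summing-and-dividing rule just described.

```python
import numpy as np


def esmsm_nb(alpha, beta, g, p):
    """Hypothetical builder for an ESMSM with g sub-states in A and p in B.

    Assumption: each sub-state of A (B) stays put with probability
    1 - g*alpha (1 - p*beta) and otherwise moves to the next sub-state;
    the last sub-state of A feeds the first of B, and the last of B
    feeds the first of A.
    """
    if not (0 < g * alpha < 1 and 0 < p * beta < 1):
        raise ValueError("need 0 < alpha < 1/g and 0 < beta < 1/p")
    m = g + p
    P = np.zeros((m, m))
    for s in range(m):
        step = g * alpha if s < g else p * beta
        P[s, s] = 1.0 - step
        P[s, (s + 1) % m] = step        # index m - 1 wraps around to sub-state 0
    return P


g, p = 3, 2
alpha, beta = 1 / 20, 1 / 10            # hypothetical values
P = esmsm_nb(alpha, beta, g, p)
p_aa = P[:g, :g].sum() / g              # macro self-transition probability of A
p_bb = P[g:, g:].sum() / p              # macro self-transition probability of B
p_ab, p_ba = 1 - p_aa, 1 - p_bb
```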
The $n$-period transition probability matrix is computed in the usual manner as $\mathbf{P}(n) = \mathbf{P}^n$. What does the analytical solution for the elements of $\mathbf{P}^n$ look like? Assuming that the matrix $\mathbf{P}$ is diagonalizable over the field of complex numbers,⁴ one can find the analytical solution through the diagonalization of $\mathbf{P}$. The objective of this method is to find a diagonal matrix $\mathbf{D}$ that allows us to express $\mathbf{P}$ as $\mathbf{P} = \mathbf{Q}\mathbf{D}\mathbf{Q}^{-1}$. Then, the $n$th power of $\mathbf{P}$ can be computed as $\mathbf{P}^n = \mathbf{Q}\mathbf{D}^n\mathbf{Q}^{-1}$.

The diagonalization procedure consists of the following steps. First, one finds the eigenvalues $\lambda_i$, $i \in \{1, 2, \ldots, g+p\}$, of $\mathbf{P}$. The eigenvalues are the values of $\lambda$ that satisfy the equation $|\mathbf{P} - \lambda\mathbf{I}| = 0$. Second, one finds the eigenvector corresponding to each eigenvalue: the eigenvector $v_i$ is found by solving the equation $(\mathbf{P} - \lambda_i\mathbf{I})v_i = 0$. The diagonal matrix $\mathbf{D}$ contains the eigenvalues along the main diagonal, $\mathbf{D} = \text{diag}(\lambda_1, \ldots, \lambda_{g+p})$, and the matrix $\mathbf{Q}$ is composed of the eigenvectors, $\mathbf{Q} = [v_1, \ldots, v_{g+p}]$. Third, one finds the inverse matrix $\mathbf{Q}^{-1}$. Finally, one performs the matrix multiplication
$$\mathbf{P}(n) = \mathbf{Q} \begin{pmatrix} \lambda_1^n & & & \\ & \lambda_2^n & & \\ & & \ddots & \\ & & & \lambda_{g+p}^n \end{pmatrix} \mathbf{Q}^{-1}. \tag{15}$$

³ A limitation of our model is that the following conditions must be satisfied: $\alpha < 1/g$ and $\beta < 1/p$. These conditions are typically met by real-world processes when $g$ and $p$ are relatively small.
⁴ The matrix $\mathbf{P}$ defined by (12) is diagonalizable because it has four distinct eigenvalues; see the proof of Proposition 2 in the subsequent section.
vie
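The steps of the diagonalization method map directly onto NumPy. The sketch below assumes the cyclic ESMSM layout with q sub-states per macro-state (q = 2 reproduces the 4 × 4 case) and compares $QD^nQ^{-1}$ with direct matrix powering; it also prints the eigenvalues, which illustrates that $\lambda_1 = 1$, that the remaining eigenvalues are smaller than 1 in modulus, and that a complex-conjugate pair can occur. The helper names are hypothetical.

```python
import numpy as np


def esmsm_matrix(alpha, beta, q=2):
    """2q x 2q ESMSM with q sub-states per macro-state (assumed cyclic layout)."""
    m = 2 * q
    P = np.zeros((m, m))
    for i in range(m):
        e = q * (alpha if i < q else beta)   # sub-state exit probability
        P[i, i], P[i, (i + 1) % m] = 1 - e, e
    return P


def power_by_diagonalization(P, n):
    """P^n = Q D^n Q^{-1}, assuming P is diagonalizable (e.g., distinct eigenvalues)."""
    lam, Q = np.linalg.eig(P)               # columns of Q are the eigenvectors v_i
    Pn = Q @ np.diag(lam ** n) @ np.linalg.inv(Q)
    return Pn.real                           # imaginary parts cancel up to rounding


P = esmsm_matrix(0.05, 0.10)
print(np.round(np.linalg.eigvals(P), 4))    # 1, a real eigenvalue, and a complex pair
print(np.allclose(power_by_diagonalization(P, 12), np.linalg.matrix_power(P, 12)))
```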
The general case is not analytically tractable. However, the computational method (15) for the elements of the n-period transition probability matrix allows us to deduce what the solution for the state transition probabilities looks like. For example, the solution for the n-period transition probabilities $p_{AB}(n)$ and $p_{BA}(n)$ (used to compute the autocorrelation function given by equation (1)) has the following functional form
$$ p_{IJ}(n) = c_{1,IJ}\lambda_1^n + c_{2,IJ}\lambda_2^n + \cdots + c_{g+p,IJ}\lambda_{g+p}^n, $$
where $c_{s,IJ}$, $s \in \{1, 2, \ldots, g + p\}$, are some functions of the one-period state transition probabilities in the corresponding conventional Markov model, $c_{s,IJ} = c_{s,IJ}(\alpha, \beta)$, and the eigenvalues are ordered by decreasing absolute value: $\lambda_1$ is the largest eigenvalue, $\lambda_2$ is the second-largest eigenvalue, and so on. Hence, the first conclusion is that the solution is a sum of exponential functions of n.
The largest eigenvalue of a stochastic matrix is 1, that is, $\lambda_1 = 1$. Because the ESMSM chain is irreducible and every sub-state has a positive self-transition probability (given the conditions in footnote [3]), the chain is also aperiodic, so all other eigenvalues are smaller than 1 in absolute value. Therefore, each exponential function $c_{s,IJ}\lambda_s^n$, s > 1, approaches zero as n increases. We know that, as n increases, the n-period transition probability $p_{AB}(n)$ ($p_{BA}(n)$) approaches the stationary probability $\pi_B$ ($\pi_A$). Consequently, the second conclusion is that $c_{1,IJ} = \pi_J$, and the above equation can be rewritten as
$$ p_{IJ}(n) = \pi_J + c_{2,IJ}\lambda_2^n + \cdots + c_{g+p,IJ}\lambda_{g+p}^n. $$
A transition probability matrix may have complex eigenvalues. Because P is a real matrix, these eigenvalues always occur in complex conjugate pairs, whose terms combine into damped cosine waves; hence, the transition probability $p_{IJ}(n)$ approaches $\pi_J$ in an oscillating manner. When all eigenvalues are real, the transition probability $p_{IJ}(n)$ approaches $\pi_J$ in a non-oscillating manner. Thus, the third conclusion is that the transition probability $p_{IJ}(n)$ can approach the stationary probability in two fundamental manners: either oscillating or non-oscillating.
4.3 Analytical Solutions When a Macro-State Has Two Sub-States
In some simple cases, the diagonalization method allows one to derive analytical solutions for the n-period state transition probabilities. The two-state conventional Markov model is obviously the simplest case, where the second-largest eigenvalue is $\lambda_2 = 1 - \alpha - \beta$ and $c_{2,IJ} = -\pi_J$ for I ≠ J; see equation (8). In the ESMSM, the matrix P is sparse: it contains many zero elements. Therefore, when two sub-states represent each macro-state, the analytical solutions for the n-period state transition probabilities can be derived, although the derivation is tedious.
Proposition 2. The solutions for the n-period state transition probabilities of macro-states A and B, with two sub-states for each macro-state, are given by
$$ p_{AB}(n) = \pi_B - \frac{1}{4\beta}\psi(n), \qquad p_{BA}(n) = \pi_A - \frac{1}{4\alpha}\psi(n), \qquad (16) $$
where function ψ(n) is given by
$$ \psi(n) = \frac{(\delta + C)^2}{4C}\lambda_3^n - \frac{(\delta - C)^2}{4C}\lambda_4^n - \frac{(\alpha - \beta)^2}{\delta}(1 - 2\delta)^n, \qquad (17) $$
$\pi_A$ and $\pi_B$ are the stationary probabilities given by equation (9), $\delta = \alpha + \beta$, $\lambda_3 = 1 - \delta - C$, $\lambda_4 = 1 - \delta + C$, and $C = \sqrt{\alpha^2 + \beta^2 - 6\alpha\beta}$, assuming that $C \neq 0$.
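Proposition 2 can be checked against direct matrix powering. The Python sketch below evaluates (16)-(17) with complex arithmetic, so the same code covers both real and complex C (C ≠ 0), and compares the result with $p_{AB}(n)$ aggregated from $P^n$. It assumes the 4 × 4 cyclic layout with sub-state transition probabilities 2α and 2β, and that (9) gives the usual $\pi_B = \alpha/(\alpha + \beta)$.

```python
import cmath

import numpy as np


def psi_closed_form(n, alpha, beta):
    """psi(n) from equation (17); complex arithmetic covers real and complex C (C != 0)."""
    d = alpha + beta
    C = cmath.sqrt(alpha**2 + beta**2 - 6 * alpha * beta)
    lam3, lam4 = 1 - d - C, 1 - d + C
    value = (((d + C) ** 2 / (4 * C)) * lam3**n
             - ((d - C) ** 2 / (4 * C)) * lam4**n
             - (alpha - beta) ** 2 / d * (1 - 2 * d) ** n)
    return value.real


def p_AB_numeric(n, alpha, beta):
    """p_AB(n) from the n-th power of the 4x4 ESMSM matrix, aggregated as in (13)."""
    a, b = 2 * alpha, 2 * beta
    P = np.array([[1 - a, a, 0, 0], [0, 1 - a, a, 0],
                  [0, 0, 1 - b, b], [b, 0, 0, 1 - b]])
    return np.linalg.matrix_power(P, n)[:2, 2:].sum() / 2


alpha, beta = 0.05, 0.10
pi_B = alpha / (alpha + beta)          # assumed form of the stationary probability in (9)
for n in (1, 5, 20):
    closed = pi_B - psi_closed_form(n, alpha, beta) / (4 * beta)
    print(n, round(closed, 6), round(p_AB_numeric(n, alpha, beta), 6))
```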
Therefore, in the ESMSM presented in Figure 4, the lag-n autocorrelation is given by
$$ \rho_n = \frac{(\mu_A - \mu_B)^2}{4\sigma^2(\alpha + \beta)}\psi(n). \qquad (18) $$
Function ψ(n) determines the functional form of the lag-n autocorrelation in the ESMSM. This function is the sum of three exponential functions, the first two of which depend on C. Note that C can be a non-zero real number, zero, or a complex number, depending on the sign of $\alpha^2 + \beta^2 - 6\alpha\beta$. In particular:
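$$ C \text{ is } \begin{cases} \text{a complex number} & \text{if } (3 - \sqrt{8})\beta < \alpha < (3 + \sqrt{8})\beta, \\ 0 & \text{if } \alpha = (3 \pm \sqrt{8})\beta, \\ \text{a real number} & \text{if } \alpha < (3 - \sqrt{8})\beta \text{ or } \alpha > (3 + \sqrt{8})\beta. \end{cases} \qquad (19) $$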
One can easily deduce that C is a real number when the mean duration of one state is more than about six times ($3 + \sqrt{8} \approx 5.83$) the mean duration of the other state. We do not observe such a notable difference between the mean durations of bull and bear markets. Consequently, in the context of stock market cycles, we expect C to be a complex number. In this case, $\lambda_3$ and $\lambda_4$ are a complex conjugate pair, and the analytical solution for function ψ(n) is provided by the following proposition.
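The case distinction (19) reduces to the sign of $C^2 = \alpha^2 + \beta^2 - 6\alpha\beta$. This small Python helper (hypothetical name, tolerance chosen arbitrarily) classifies parameter pairs and prints the boundaries of the complex region for β = 0.1; because the mean durations are 1/α and 1/β, the boundary duration ratio is $3 + \sqrt{8}$.

```python
import math


def c_regime(alpha, beta, tol=1e-12):
    """Classify C = sqrt(alpha^2 + beta^2 - 6*alpha*beta) according to (19)."""
    c_squared = alpha**2 + beta**2 - 6 * alpha * beta
    if abs(c_squared) < tol:
        return "zero"
    return "complex" if c_squared < 0 else "real"


beta = 0.1
lower, upper = (3 - math.sqrt(8)) * beta, (3 + math.sqrt(8)) * beta
print(f"complex region: {lower:.5f} < alpha < {upper:.5f}")
for alpha in (0.01, lower, 0.05, 0.7):
    print(round(alpha, 5), c_regime(alpha, beta))
# Mean durations are 1/alpha and 1/beta, so the boundary ratio is 3 + sqrt(8), about 5.83
print(round(3 + math.sqrt(8), 2))
```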

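Proposition 3. If C is a complex number, then function ψ(n) given by equation (17) can be rewritten in the following form:
$$ \psi(n) = R\lambda^n\cos(n\varphi + \theta) - \frac{(\alpha - \beta)^2}{\delta}(1 - 2\delta)^n, \qquad (20) $$
where
$$ \lambda = \sqrt{1 - 2\delta + 8\alpha\beta}, \qquad \varphi = \arctan\left(\frac{\sqrt{6\alpha\beta - \alpha^2 - \beta^2}}{1 - \delta}\right), \qquad (21) $$
$$ R = \sqrt{\delta^2 + \frac{(\alpha - \beta)^4}{6\alpha\beta - \alpha^2 - \beta^2}}, \qquad \theta = \arctan\left(\frac{(\alpha - \beta)^2}{\delta\sqrt{6\alpha\beta - \alpha^2 - \beta^2}}\right). \qquad (22) $$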
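A numerical check that the damped-cosine form (20)-(22) coincides with the exponential form (17) whenever C is complex is sketched below in Python. The function names are ours; the check assumes 1 − δ > 0 so that the arctangent in (21) returns the correct branch.

```python
import cmath
import math


def psi_eq17(n, a, b):
    """psi(n) in the exponential form (17), evaluated with complex arithmetic."""
    d = a + b
    C = cmath.sqrt(a * a + b * b - 6 * a * b)
    return (((d + C) ** 2 / (4 * C)) * (1 - d - C) ** n
            - ((d - C) ** 2 / (4 * C)) * (1 - d + C) ** n
            - (a - b) ** 2 / d * (1 - 2 * d) ** n).real


def psi_eq20(n, a, b):
    """psi(n) in the damped-cosine form (20)-(22); valid only when C is complex."""
    d = a + b
    s2 = 6 * a * b - a * a - b * b          # s2 = -C^2 > 0 in the complex case
    if s2 <= 0:
        raise ValueError("C is not complex for these parameters")
    s = math.sqrt(s2)
    lam = math.sqrt(1 - 2 * d + 8 * a * b)
    phi = math.atan(s / (1 - d))            # assumes 1 - delta > 0
    R = math.sqrt(d * d + (a - b) ** 4 / s2)
    theta = math.atan((a - b) ** 2 / (d * s))
    return R * lam**n * math.cos(n * phi + theta) - (a - b) ** 2 / d * (1 - 2 * d) ** n


for n in (0, 5, 15, 30):
    print(n, round(psi_eq17(n, 0.05, 0.10), 8), round(psi_eq20(n, 0.05, 0.10), 8))
```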
Consequently, if C is a complex number, the expression for the n-period state transition probabilities is the difference between two components. The first component is a damped cosine wave with a phase shift, while the second is an exponential decay. Therefore, $\rho_n$ approaches zero in an oscillating manner as n increases. To gain further insight into the behavior of the lag-n autocorrelation, let us assume that α = β. In this case, the expression for $\rho_n$ simplifies to[5]
$$ \rho_n = \frac{(\mu_A - \mu_B)^2}{4\sigma^2}\lambda^n\cos(n\varphi). \qquad (23) $$

[5] If α = β, then R = α + β and θ = 0. Moreover, the second term on the right-hand side of equation (20) vanishes. Finally, $\pi_A = \pi_B = 0.5$.
Under this simplifying assumption, it is clear that the lag-n autocorrelation has the shape of a damped cosine function without a phase shift. In particular, $\rho_n$ periodically changes sign, starting from a positive value.[6] Typically, because the cosine wave decays rather fast, the full oscillating behavior is hard to notice. However, one can clearly see a positive autocorrelation over the short run and a subsequent negative autocorrelation over the medium run. That is, the return process exhibits both short-term momentum and medium-term mean reversion.
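With α = β, (21) simplifies to $\lambda = \sqrt{1 - 4\alpha + 8\alpha^2}$ and $\varphi = \arctan(2\alpha/(1 - 2\alpha))$, and the sign changes of (23) occur where $n\varphi = \pi/2 + k\pi$. The Python sketch below computes these crossings and the shape $\lambda^n\cos(n\varphi)$ for a hypothetical α = β = 0.05 (the positive prefactor of (23) is omitted because it does not affect the sign).

```python
import math


def rho_shape(n, alpha):
    """lambda^n * cos(n*phi) from (23) when alpha = beta (positive prefactor omitted)."""
    lam = math.sqrt(1 - 4 * alpha + 8 * alpha**2)      # (21) with beta = alpha
    phi = math.atan(2 * alpha / (1 - 2 * alpha))       # (21) with beta = alpha
    return lam**n * math.cos(n * phi)


def zero_crossings(alpha, count=2):
    """Real-valued n at which cos(n*phi) = 0, i.e. n*phi = pi/2 + k*pi, k = 0, 1, ..."""
    phi = math.atan(2 * alpha / (1 - 2 * alpha))
    return [(math.pi / 2 + k * math.pi) / phi for k in range(count)]


alpha = 0.05                      # hypothetical value with alpha = beta
print([round(x, 2) for x in zero_crossings(alpha)])
print([round(rho_shape(n, alpha), 4) for n in (1, 10, 14, 15, 25, 45)])
```

For this parameter choice the autocorrelation is positive up to lag 14, negative from lag 15, and turns positive again after about lag 42, although by then it is very small.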
Consider the case where C is a non-zero real number. In that case, $\lambda_3$ and $\lambda_4$ are also real numbers, and function ψ(n) is the sum of three real-valued exponential functions. As in the preceding case, the lag-n autocorrelation approaches zero as n increases, though in a non-oscillating manner. Finally, consider the last case, where C → 0. The following proposition (whose proof is given in the Appendix) provides the solution for the n-period state transition probabilities in that case.
Proposition 4. As C → 0, function ψ(n) given by equation (17) converges to
$$ \lim_{C \to 0}\psi(n) = \left(\delta - \frac{(\alpha - \beta)^2}{1 - \delta}\,n\right)(1 - \delta)^n - \frac{(\alpha - \beta)^2}{\delta}(1 - 2\delta)^n. \qquad (24) $$
Consequently, in this case the return autocorrelation $\rho_n$ also decreases towards zero in a non-oscillating manner as n increases.
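The limit (24) can be illustrated by evaluating (17) just inside the complex region, at α = (3 − √8)β + ε, and letting ε shrink. In this Python sketch (function names are ours) the maximal gap over n = 0, ..., 40 between (17) and (24) should shrink roughly in proportion to ε.

```python
import cmath
import math


def psi_eq17(n, a, b):
    """psi(n) from (17), valid for C != 0 (complex arithmetic)."""
    d = a + b
    C = cmath.sqrt(a * a + b * b - 6 * a * b)
    return (((d + C) ** 2 / (4 * C)) * (1 - d - C) ** n
            - ((d - C) ** 2 / (4 * C)) * (1 - d + C) ** n
            - (a - b) ** 2 / d * (1 - 2 * d) ** n).real


def psi_eq24(n, a, b):
    """Limit of psi(n) as C -> 0, equation (24)."""
    d = a + b
    return ((d - (a - b) ** 2 / (1 - d) * n) * (1 - d) ** n
            - (a - b) ** 2 / d * (1 - 2 * d) ** n)


beta = 0.1
a0 = (3 - math.sqrt(8)) * beta            # boundary value where C = 0
for eps in (1e-3, 1e-4, 1e-5):
    gap = max(abs(psi_eq17(n, a0 + eps, beta) - psi_eq24(n, a0, beta)) for n in range(41))
    print(eps, gap)
```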
We finish this section by presenting some illustrations in Figure 5. This figure shows three examples of the return autocorrelation functions in an ESMSM with two sub-states for each macro-state. The top panel plots the month-k return autocorrelation function, while the bottom panel displays the first-order autocorrelation function of k-month returns. In all plots, the annualized mean state returns are µA = 20% and µB = −30%, and the annualized standard deviations of state returns are σA = σB = 20%. The one-period transition probability from a bear to a bull state of the market is β = 0.1. The one-period transition probability from a bull to a bear state of the market takes three alternative values, α ∈ {0.01, (3 − √8)β, 0.05}. In the case α = 0.05 (α = 0.01), the month-k return autocorrelation approaches zero in an oscillating (non-oscillating) manner. The case α = (3 − √8)β ≈ 0.017 is the border between the oscillatory and non-oscillatory behavior.

[6] This function crosses zero each time nϕ = π/2 + kπ radians, where k is a non-negative integer.
[Figure 5. Top panel: "Autocorrelation ACF(k)" versus months, k. Bottom panel: "First-order autocorrelation of multiperiod returns AC1(k)" versus months, k. Each panel shows curves for α = 0.01, α = (3 − √8)β, and α = 0.05.]
Figure 5: The return autocorrelations in an ESMSM with two sub-states for each macro-state. ACF(k) denotes the month-k return autocorrelation ρk. AC1(k) denotes the first-order autocorrelation of k-month returns. The annualized mean state returns are µA = 20% and µB = −30%. The annualized standard deviations of state returns are σA = σB = 20%. The one-period transition probability is β = 0.1. The one-period transition probability α takes three alternative values: 0.01, (3 − √8)β, and 0.05.
In all three cases, the month-k return autocorrelation changes sign from positive to negative at least once. The shape of the first-order autocorrelation of k-month returns is qualitatively the same in all three cases: it quickly increases and then, after reaching its maximum, gradually decreases below zero. These examples suggest that the return process in our ESMSM exhibits both short-term momentum and subsequent medium-term mean reversion.
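The sign pattern described for Figure 5 can be reproduced without the closed forms. By (16), ψ(n) = 4β(π_B − p_AB(n)), and by (18) the sign of ρ_n equals the sign of ψ(n), so the Python sketch below obtains ψ(n) from matrix powers of the assumed 4 × 4 cyclic ESMSM and counts sign changes for n = 1, ..., 40 (the same β = 0.1 and three values of α as in the figure).

```python
import math

import numpy as np


def psi_numeric(n, alpha, beta):
    """psi(n) = 4*beta*(pi_B - p_AB(n)) from (16), with p_AB(n) taken from the matrix power."""
    a, b = 2 * alpha, 2 * beta
    P = np.array([[1 - a, a, 0, 0], [0, 1 - a, a, 0],
                  [0, 0, 1 - b, b], [b, 0, 0, 1 - b]])
    p_AB = np.linalg.matrix_power(P, n)[:2, 2:].sum() / 2
    return 4 * beta * (alpha / (alpha + beta) - p_AB)


def sign_changes(alpha, beta, max_lag=40):
    """Number of sign changes of rho_n (same sign as psi(n), see (18)) for n = 1..max_lag."""
    signs = np.sign([psi_numeric(n, alpha, beta) for n in range(1, max_lag + 1)])
    return int(np.sum(signs[1:] != signs[:-1]))


beta = 0.1
for alpha in (0.01, (3 - math.sqrt(8)) * beta, 0.05):
    first_negative = next(n for n in range(1, 41) if psi_numeric(n, alpha, beta) < 0)
    print(f"alpha={alpha:.4f}: sign changes={sign_changes(alpha, beta)}, "
          f"first negative lag={first_negative}")
```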
4.4 Numerical Solutions For the General Case
If a macro-state in an ESMSM is represented by more than two sub-states, the n-period transition probabilities can be computed using the matrix multiplication routines available in many mathematical software programs. All that is needed is to define the one-period transition probability matrix of the ESMSM. For example, in an ESMSM where each macro-state is represented by three sub-states, the one-period transition probability matrix is given by
$$ P = \begin{pmatrix} 1-3\alpha & 3\alpha & 0 & 0 & 0 & 0 \\ 0 & 1-3\alpha & 3\alpha & 0 & 0 & 0 \\ 0 & 0 & 1-3\alpha & 3\alpha & 0 & 0 \\ 0 & 0 & 0 & 1-3\beta & 3\beta & 0 \\ 0 & 0 & 0 & 0 & 1-3\beta & 3\beta \\ 3\beta & 0 & 0 & 0 & 0 & 1-3\beta \end{pmatrix}. \qquad (25) $$
We remind the reader that, under our convention, the mean state duration times in an ESMSM are the same as in the corresponding conventional MSM. Therefore, for example, the transition probability from one sub-state to the next sub-state of macro-state A equals 3α. This choice ensures that the mean state A duration time equals 3 × 1/(3α) = 1/α. However, whereas the state duration times in a conventional MSM follow the geometric distribution, the state duration times in an ESMSM specified by the transition probability matrix in (25) are governed by the NB(3) distribution.
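Matrix (25) generalizes to any number q of sub-states per macro-state. The Python sketch below assumes the same cyclic layout as (25) and verifies the mean-duration convention with the fundamental matrix $N = (I - P_{AA})^{-1}$ of the A-block: the first row sum of N is the expected number of periods spent in macro-state A after entering its first sub-state, which should equal 1/α for every q. Function names are illustrative.

```python
import numpy as np


def esmsm_q(alpha, beta, q):
    """2q x 2q ESMSM generalizing (25): q sub-states per macro-state, visited cyclically."""
    if not (q * alpha < 1 and q * beta < 1):
        raise ValueError("need alpha < 1/q and beta < 1/q")
    m = 2 * q
    P = np.zeros((m, m))
    for i in range(m):
        e = q * (alpha if i < q else beta)   # sub-state exit probability q*alpha or q*beta
        P[i, i] = 1 - e
        P[i, (i + 1) % m] = e
    return P


def mean_duration_A(P, q):
    """Expected periods in macro-state A after entering sub-state 1.

    Uses the fundamental matrix N = (I - P_AA)^{-1}; the row sum of N is the
    expected number of periods before the chain leaves the A-block.
    """
    N = np.linalg.inv(np.eye(q) - P[:q, :q])
    return N[0].sum()


alpha, beta = 1 / 20, 1 / 10          # mean durations of 20 and 10 periods
print(np.round(esmsm_q(alpha, beta, 3), 2))
for q in (1, 2, 3, 4):
    print(q, round(mean_duration_A(esmsm_q(alpha, beta, q), q), 6))
```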
Our numerical experiments reveal that, under realistic model parameters, the return autocorrelation function ρn in an ESMSM with more than two sub-states for each macro-state is qualitatively similar to that in an ESMSM with two sub-states for each macro-state. For the sake of illustration, Figure 6 shows the return autocorrelations in an ESMSM with three sub-states for each macro-state. As in the preceding section, this ESMSM assumes monthly returns. In the figure, the red line with points plots the month-k return autocorrelation, whereas the blue line with points plots the first-order autocorrelation of k-month returns. Except for the number of sub-states for each macro-state, the model parameters are the same as those in Figure 2. Specifically, the annualized mean state returns are µA = 20% and µB = −30%, the annualized standard deviations of state returns are σA = σB = 20%, the mean state A (bull market) duration time equals 20 months, and the mean state B (bear market) duration time equals 10 months.
[Figure 6. Month-k return autocorrelation ACF(k) and first-order autocorrelation of k-month returns AC1(k) (vertical axis: autocorrelation) versus months, k.]
Figure 6: Return autocorrelations in an expanded-state Markov switching model with three sub-states for each macro-state. ACF(k) denotes the month-k return autocorrelation ρk. AC1(k) denotes the first-order autocorrelation of k-month returns. The annualized mean state returns are µA = 20% and µB = −30%. The annualized standard deviations of state returns are σA = σB = 20%. The mean state A duration time equals 20 months, whereas the mean state B duration time equals 10 months.
It is instructive to compare the shapes of the autocorrelation functions in the conventional MSM depicted in Figure 2 with those in the ESMSM presented in Figure 6. Whereas the month-k return autocorrelation decreases exponentially towards zero in the MSM, it exhibits a damped oscillating behavior around zero in the ESMSM. While the first-order autocorrelation of k-month returns is always positive in the MSM, in the ESMSM it is positive initially and subsequently changes sign to negative. Again, it is worth noting that the first-order autocorrelation of k-month returns is notably larger in absolute value than the month-k return autocorrelation for k > 1.
In concluding this section, it should be emphasized that a semi-Markov model is able to reproduce both short-term momentum and medium-term mean reversion under the condition of positive duration dependence. In other words, the state termination probability must increase with the state age. Most empirical studies document positive duration dependence in bull and bear market states; thus, this condition is satisfied. Why does positive duration dependence induce medium-term mean reversion? The negative binomial distribution provides an answer. As illustrated by Figure 3, the larger the value of q in the NB(q) distribution, the lower (higher) the probability of state termination if the state is young (old). Therefore, the larger the value of q, the lower the uncertainty in the state duration time. Thus, positive duration dependence induces some regularity in stock market cycles.
Additionally, the higher the degree of positive duration dependence, the more regular the stock market cycles and the more pronounced the mean-reverting behavior. This relationship is depicted in Figure 7, which plots the month-k return autocorrelation in the ESMSM where each macro-state is represented by q ∈ {1, 2, 3, 4} sub-states. For all q, the mean state A (bull market) duration time is 20 months, while the mean state B (bear market) duration time is 10 months. For each q, the state A and B duration times follow the NB(q) distribution. The curves in the figure clearly illustrate that the larger the value of q, the stronger the mean-reverting behavior.
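The experiment behind Figures 6 and 7 can be approximated with the standard hidden-state autocovariance $\mathrm{Cov}(r_t, r_{t+k}) = \sum_{i,j}\pi_i\mu_i(P^k)_{ij}\mu_j - \bar{\mu}^2$. This Python sketch is not the authors' code: it assumes returns are the sub-state mean plus i.i.d. noise, that annual figures convert to monthly ones by dividing means by 12 and volatilities by √12, and the cyclic sub-state layout of (25). Only the sign pattern (always positive for q = 1, eventually negative for q ≥ 2) and the equality of the lag-1 values across q are checked.

```python
import numpy as np


def esmsm_q(alpha, beta, q):
    """2q x 2q ESMSM with q sub-states per macro-state (assumed cyclic layout)."""
    m = 2 * q
    P = np.zeros((m, m))
    for i in range(m):
        e = q * (alpha if i < q else beta)
        P[i, i], P[i, (i + 1) % m] = 1 - e, e
    return P


def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()


def acf(P, mu, sigma, max_lag):
    """rho_k = Cov(r_t, r_{t+k}) / Var(r_t) for r_t = mu[s_t] + sigma[s_t] * eps_t,
    where s_t is the hidden sub-state and eps_t is i.i.d. noise with unit variance."""
    pi = stationary(P)
    mean = pi @ mu
    var = pi @ (sigma**2 + mu**2) - mean**2
    rho, Pk = [], np.eye(len(pi))
    for _ in range(max_lag):
        Pk = Pk @ P
        rho.append(((pi * mu) @ Pk @ mu - mean**2) / var)
    return np.array(rho)


alpha, beta = 1 / 20, 1 / 10                      # mean durations 20 and 10 months
mu_A, mu_B = 0.20 / 12, -0.30 / 12                # assumed monthly means
sig = 0.20 / np.sqrt(12)                          # assumed monthly volatility
for q in (1, 2, 3, 4):
    mu = np.r_[np.full(q, mu_A), np.full(q, mu_B)]
    rho = acf(esmsm_q(alpha, beta, q), mu, np.full(2 * q, sig), 40)
    print(f"NB({q}): rho_1={rho[0]:.3f}, min rho={rho.min():.3f} at lag {rho.argmin() + 1}")
```

The lag-1 autocorrelation is the same for every q because the one-period macro-state transition probabilities do not depend on q; the differences appear only at longer lags.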
One can further formalize the discussion in the preceding paragraph using the following mathematical arguments. The goal is to demonstrate that the uncertainty in the state duration time decreases as q increases. Under our construction, each of the q sub-states of macro-state I is left with probability $q\,p_{IJ}$ per period, where $p_{IJ}$ is the probability of transiting from macro-state I to macro-state J over one period. Hence, the mean macro-state I duration time is constant:
$$ E[d_I] = \frac{q}{q\,p_{IJ}} = \frac{1}{p_{IJ}}. $$
However, the variance of the state duration time equals
$$ \mathrm{Var}[d_I] = \frac{q(1 - q\,p_{IJ})}{(q\,p_{IJ})^2} = \frac{1}{q\,p_{IJ}^2} - \frac{1}{p_{IJ}}. $$
Consequently, as q increases, the mean macro-state I duration time remains the same, but the variance of the macro-state I duration time decreases. As a result, the probability distribution of the state duration concentrates more and more around the mean. As q increases, the market states therefore alternate with higher regularity, which materializes in negative medium-term autocorrelations. That is, medium-term mean reversion is the manifestation of some regularity in stock market cycles.
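The two moment formulas can be confirmed by brute force. The Python sketch below sums the NB(q) probability mass function (trials until the q-th success, per-trial success probability $s = q\,p_{IJ}$) for a hypothetical $p_{IJ} = 1/20$ and compares the results with $1/p_{IJ}$ and $1/(q\,p_{IJ}^2) - 1/p_{IJ}$; the truncation point `n_max` is an arbitrary choice that makes the neglected tail negligible here.

```python
from math import comb


def nb_moments(q, p_IJ, n_max=5000):
    """Mean and variance of the NB(q) duration (trials until the q-th success) with
    per-trial success probability s = q * p_IJ, by direct summation of the pmf."""
    s = q * p_IJ
    mean = second = 0.0
    for n in range(q, n_max):
        f = comb(n - 1, q - 1) * s**q * (1 - s) ** (n - q)
        mean += n * f
        second += n * n * f
    return mean, second - mean**2


p_AB = 1 / 20                      # bull-to-bear probability for a 20-month mean duration
for q in (1, 2, 3, 4):
    mean, var = nb_moments(q, p_AB)
    print(q, round(mean, 6), round(var, 4), round(1 / (q * p_AB**2) - 1 / p_AB, 4))
```

The mean stays at 20 months for every q, while the variance falls from 380 (q = 1) to 80 (q = 4).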
[Figure 7. Month-k return autocorrelation ACF(k) versus months, k, for NB(1), NB(2), NB(3), and NB(4) state durations.]
Figure 7: The return autocorrelation ρn in expanded-state Markov switching models with q sub-states for each macro-state, for q ∈ {1, 2, 3, 4}. The annualized mean state returns are µA = 20% and µB = −30%. The annualized standard deviations of state returns are σA = σB = 20%. The mean state A duration time equals 20 months, whereas the mean state B duration time equals 10 months.
The intuition that higher regularity in state changes strengthens mean reversion can be reinforced as follows. Consider what happens to the return autocorrelation function in the ESMSM with two sub-states for each macro-state when the state duration times become certain. Specifically, consider the case where
$$ \alpha \to 1/2 \quad \text{and} \quad \beta \to 1/2. \qquad (26) $$
In this limiting case, each sub-state is left with probability 2α = 2β → 1, so every macro-state lasts exactly two periods. The variance of the macro-state I duration time is then zero, α = β, and, therefore, the return autocorrelation function is given by equation (23). Moreover, it is easy to check that under conditions (26) we get λ = 1 and ϕ = π/2. Therefore, the return autocorrelation function reduces to
$$ \rho_n = \frac{(\mu_A - \mu_B)^2}{4\sigma^2}\cos(n\pi/2). \qquad (27) $$
The conclusion is that, with deterministic state duration times, the lag-n autocorrelation has the shape of a cosine function without a phase shift and without damping (if we extend n to real numbers). Put differently, when the variance of the state duration time approaches zero, the market states start to alternate with perfect regularity.
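The limiting case (26)-(27) can be reproduced exactly. At α = β = 1/2 the 4 × 4 ESMSM becomes a permutation matrix that cycles A, A, B, B. The Python sketch below uses hypothetical centred state means µA = 1 and µB = −1 with no within-state noise, so that, assuming σ² in (27) denotes the unconditional return variance, σ² = (µA − µB)²/4 = 1 and (27) predicts ρ_n = cos(nπ/2). It also evaluates λ and ϕ from (21), using `atan2` because 1 − δ = 0.

```python
import math

import numpy as np

# alpha = beta = 1/2 with two sub-states per macro-state: every sub-state is left
# with probability 2 * 1/2 = 1, so the chain cycles deterministically A, A, B, B, ...
P = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]], dtype=float)
pi = np.full(4, 0.25)                       # uniform stationary distribution of the cycle
mu = np.array([1.0, 1.0, -1.0, -1.0])       # hypothetical state means, mu_A - mu_B = 2
# Mean return is zero and Var = 1, so the autocovariance equals the autocorrelation.
rho = [float((pi * mu) @ np.linalg.matrix_power(P, n) @ mu) for n in range(1, 9)]
print(rho)
print([round(math.cos(n * math.pi / 2), 12) for n in range(1, 9)])

# lambda and phi from (21) at alpha = beta = 1/2 (atan2 handles 1 - delta = 0)
a = b = 0.5
d = a + b
lam = math.sqrt(1 - 2 * d + 8 * a * b)
phi = math.atan2(math.sqrt(6 * a * b - a * a - b * b), 1 - d)
print(lam, phi, math.pi / 2)
```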
5 Empirical Application
5.1 Data and Descriptive Statistics of Bull and Bear Markets
Our empirical application uses data on two famous stock market indices: the Standard and Poor's (S&P) Composite index and the Dow Jones Industrial Average (DJIA) index. All data come at the monthly frequency and represent capital gain returns. Our sample period begins in January 1897 and ends in December 2020 (124 full years), giving 1488 monthly observations.

The data on the S&P Composite index are collected from two sources. In particular, the index returns over the period from January 1897 to December 1925 are provided by William Schwert (schwert.ssb.rochester.edu). The index returns for this period are constructed using a collection of early stock market indices for the US; the construction methodology is described in detail in Schwert (1990). From January 1926 to February 1957, the index returns are the returns on the S&P 90 stock market index. Beginning in March 1957, the index returns are the returns on the S&P 500 stock market index. The index returns over the period from January 1926 to December 2020 are provided by Amit Goyal (www.hec.unil.ch/agoyal/). The data on the DJIA index over the total sample period are provided by S&P Dow Jones Indices LLC (www.spglobal.com).
Using the capital gain returns, we reconstruct each stock index value. The bull and bear market turning points are identified using the method proposed by Pagan and Sossounov (2003). This method appears to be the most widely accepted method among researchers for such purposes (notable examples are Gonzalez, Powell, Shi, and Wilson (2005), Kaminsky and Schmukler (2007), and Claessens, Kose, and Terrones (2012)). In brief, this method adopts, with minor modifications, the dating algorithm developed by Bry and Boschan (1971) to identify the US business cycle turning points using the GDP data. By and large, this algorithm is a rule-based pattern recognition algorithm. First, the algorithm finds peaks and troughs in a data series. Second, the algorithm performs several censoring operations to ensure that a complete stock market cycle lasts at least 16 months and that a market state lasts at least 5 months, unless a rise or fall in the stock price exceeds 20%.
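The first step of such a dating rule (locating alternating peaks and troughs) can be sketched as follows. This is a deliberately simplified illustration, not the Pagan and Sossounov (2003) procedure: the `window` parameter is a hypothetical choice, and the censoring rules on minimum phase and cycle length and the 20% exception are not implemented. The test series is a synthetic 24-month sine cycle.

```python
import math


def turning_points(prices, window):
    """Simplified first stage of a Bry-Boschan-type dating rule.

    A peak (trough) is a strict maximum (minimum) of the price within +/- window
    periods. Consecutive points of the same type are resolved by keeping the more
    extreme one, so peaks and troughs alternate. The censoring rules on minimum
    phase and cycle length (and the 20% exception) are NOT implemented here.
    """
    candidates = []
    for t in range(window, len(prices) - window):
        seg = prices[t - window:t + window + 1]
        if seg.count(prices[t]) != 1:
            continue                      # skip ties (flat segments)
        if prices[t] == max(seg):
            candidates.append((t, "P"))
        elif prices[t] == min(seg):
            candidates.append((t, "T"))
    points = []
    for t, kind in candidates:
        if points and points[-1][1] == kind:
            prev = points[-1][0]
            more_extreme = prices[t] > prices[prev] if kind == "P" else prices[t] < prices[prev]
            if more_extreme:
                points[-1] = (t, kind)
        else:
            points.append((t, kind))
    return points


# Hypothetical index path (a Python list): a deterministic 24-month cycle
prices = [100 + 10 * math.sin(2 * math.pi * t / 24) for t in range(96)]
print(turning_points(prices, window=6))
```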
For each stock market index, Table 1 presents the summary statistics of the bull and bear markets.

                      S&P Composite index          Dow Jones index
Statistic             Bull markets  Bear markets   Bull markets  Bear markets
Number of states      34            33             38            37
Minimum duration      4             3              7             3
Mean duration         29.03         14.52          27.11         13.00
Maximum duration      74            40             73            34
Mean return           23.02         -27.33         23.92         -28.87
Standard deviation    15.52         18.47          15.95         19.08

Table 1: Summary statistics of the bull and bear market states. Duration is measured in months. Mean returns and standard deviations are annualized and reported in percentages.

Even though there are some differences between the descriptive statistics of the two market indices, they share many similarities. The mean return is about 23% (−28%) in a bull (bear) state of the market, while the standard deviation of returns is about 16% (19%) in a bull (bear) state of the market. The difference between the mean returns in the bull and bear states of the market is substantial. By contrast, the difference between the standard deviations of returns is comparatively small. These observations suggest that the market states differ mainly in their mean returns, not in their standard deviations. The mean duration of a bull (bear) market is about 28 (14) months. The variable of primary interest is the discrepancy between the mean durations of the bull and bear market states. Regardless of the choice of stock market index, the mean bull market duration is approximately twice the mean bear market duration. Consequently, we expect the autocorrelation function of returns on each index to exhibit a damped oscillatory behavior.
5.2 Fitting Statistical Distributions to Bull and Bear Duration Data
In our semi-Markov model, the state duration times follow the negative binomial NB(q) distribution. The probability mass function of the NB(q) distribution is given by equation (11). The probability mass function f(n, q, p) describes the probability that the qth success occurs in the nth Bernoulli trial (the parameter p is the probability of success in a single trial). We remind the reader that the NB(1) distribution is equivalent to the geometric distribution. This section fits the NB(q) distribution to the state duration data to determine which q fits the data best.

In fitting the distributions, we rely on the method of maximum likelihood. The standard procedure is to find the pair of parameters (q, p) that maximizes the log-likelihood function. A complication is that q is usually extended to real numbers, but our model assumes that q is an integer. To tackle this problem, we assume that q is known and find the maximum likelihood estimator for p only. We do this sequentially for the integer values q ∈ {1, ..., 6} and select the value of q that maximizes the log-likelihood. Additionally, we conduct the Kolmogorov-Smirnov test to formally evaluate the goodness-of-fit of the negative binomial distribution to the state duration data. The Kolmogorov-Smirnov test is a nonparametric test of the equality between two distributions. In our case, we test the equality between the fitted NB(q) distribution and the empirical distribution.
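For known integer q, the maximum likelihood estimator of p has a closed form, so the profile search over q is short. Two parameterizations of the negative binomial are common. Under the trial-count form described for (11), the MLE is $\hat{p} = q/\bar{d}$, and that form assigns zero probability to durations shorter than q. Many libraries (for example, SciPy's `scipy.stats.nbinom` and R's `dnbinom`) instead count failures before the q-th success, for which the MLE is $\hat{p} = q/(q + \bar{d})$. As a numerical observation only, the latter, evaluated at the Table 1 mean durations, reproduces the S&P Composite p columns of Table 2 to three decimals. The Python sketch below uses the failures-count form with hypothetical durations; the Kolmogorov-Smirnov step is omitted.

```python
import math


def nb_loglik_failures(durations, q, p):
    """Log-likelihood when P(d) = C(d+q-1, d) p^q (1-p)^d, d = 0, 1, 2, ..."""
    return sum(math.log(math.comb(d + q - 1, d)) + q * math.log(p) + d * math.log(1 - p)
               for d in durations)


def mle_p_failures(durations, q):
    """Closed-form MLE of p for known integer q: q / (q + mean duration)."""
    mean_d = sum(durations) / len(durations)
    return q / (q + mean_d)


def select_q(durations, qs=range(1, 7)):
    """Fit p for each q and return (best q, {q: (p_hat, log-likelihood)})."""
    fits = {}
    for q in qs:
        p = mle_p_failures(durations, q)
        fits[q] = (p, nb_loglik_failures(durations, q, p))
    best = max(fits, key=lambda q: fits[q][1])
    return best, fits


durations = [8, 15, 22, 30, 41, 12, 27, 35, 19, 50]   # hypothetical bull durations (months)
best_q, fits = select_q(durations)
for q, (p, ll) in fits.items():
    print(q, round(p, 3), round(ll, 2))
print("best q:", best_q)
# Mean S&P Composite bull-market duration from Table 1: 29.03 months
print([round(q / (q + 29.03), 3) for q in range(1, 7)])
```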
          Bull markets                         Bear markets
q    p       Log-likelihood  P-value    p       Log-likelihood  P-value
Panel A: S&P Composite index
1    0.033   -140.33         0.03       0.064   -114.98         0.01
2    0.064   -133.38         0.30       0.121   -108.82         0.32
3    0.094   -131.45         0.63       0.171   -107.21         0.82
4    0.121   -131.27         0.88       0.216   -107.05         0.93
5    0.147   -131.89         0.97       0.256   -107.50         0.86
6    0.171   -132.94         0.87       0.292   -108.23         0.73
Panel B: Dow Jones index
1    0.036   -150.26         0.01       0.071   -122.93         0.00
2    0.070   -142.83         0.41       0.132   -117.23         0.12
3    0.102   -140.90         0.95       0.185   -116.36         0.49
4    0.131   -140.88         0.86       0.233   -116.95         0.27
5    0.159   -141.74         0.59       0.275   -118.11         0.14
6    0.185   -143.05         0.39       0.313   -119.50         0.08

Table 2: The results of estimations and tests. p is the probability of success in one Bernoulli trial in the NB(q) distribution. Log-likelihood is the maximized value of the log-likelihood function in the estimation of p for each q ∈ {1, ..., 6} in the NB(q) distribution. P-value is the p-value of the Kolmogorov-Smirnov test of the equality between the empirical distribution of state durations and the fitted NB(q) distribution.
For each stock market index and market state, Table 2 reports the estimated p and the maximized log-likelihood values for q ∈ {1, ..., 6}. In addition, the table reports the p-value of the Kolmogorov-Smirnov test of the equality between the empirical distribution of state durations and the fitted negative binomial distribution.

Our first observation is that, at the 5% level, the Kolmogorov-Smirnov test rejects the equality between the empirical distribution of state durations and the fitted negative binomial distribution only for q = 1. That is, we have evidence that none of the state duration times follow the geometric distribution. Consequently, a conventional Markov model cannot be used to model the bull-bear dynamics of the selected stock market indices. However, a semi-Markov model where the state duration times follow a negative binomial distribution with q ∈ {2, ..., 6} represents a reasonable model. Our second observation is that the negative binomial distribution with q = 4 maximizes the log-likelihood function for three of the four combinations of stock market index and market state (for Dow Jones bear markets, q = 3 attains a marginally higher log-likelihood). Therefore, for the sake of uniformity, we assume that the state duration times are governed by the NB(4) distribution in the rest of the paper.
The top panels in Figure 8 plot the histograms of the bull and bear market durations for the S&P Composite index, and the bottom panels plot the corresponding histograms for the Dow Jones index. In each panel, the lines with blue points plot the fitted geometric distribution, while the lines with red points plot the fitted NB(4) distribution. Visual inspection of these panels reinforces the evidence that the negative binomial distribution fits the state duration data substantially better than the geometric distribution.
5.3 Model Calibration and Results
In this section, we estimate the empirical lag-n return autocorrelation ρn and the first-order autocorrelation of k-period returns AC1(k). Subsequently, we compute the model-implied ρn and AC1(k) in our ESMSM and the conventional MSM using the fitted model parameters. Finally, we compare the empirical autocorrelations with the model-implied autocorrelations.

All autocorrelations in our study are estimated using a highly robust covariance (and correlation) estimation method suggested by Rousseeuw (1984) and further developed by Rousseeuw (1985). The covariance is estimated using the minimum covariance determinant (MCD) method, which is highly resistant to outliers.[7] The problem is that the exact MCD method is extremely time-consuming. In our study, we rely on the FAST-MCD method developed by Rousseeuw and Driessen (1999).
We intend to estimate the first-order autocorrelation of k-period returns for k ∈ {1, ..., 30} months. The fundamental problem with these estimations is that we have only a relatively small number of non-overlapping intervals of length 30 months. Therefore, as in Fama and French (1988), to increase the number of observations of k-month returns, we employ overlapping intervals of k months.[8]

[7] In contrast, Fama and French (1988) estimate the first-order autocorrelation of multi-period returns using a standard OLS regression. However, a few outliers in the return data may strongly influence covariance and correlation estimates, biasing them away from values representative of most of the sample.

[Figure 8. Panels: bull market duration, S&P Composite index; bear market duration, S&P Composite index; bull market duration, Dow Jones index; bear market duration, Dow Jones index. Each panel shows density versus duration in months, with the fitted geometric and negative binomial distributions.]
Figure 8: The histograms of the bull and bear market durations. The lines with blue points plot the fitted geometric distribution, while the lines with red points plot the fitted NB(4) distribution.
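Overlapping k-month returns and their first-order autocorrelation can be computed from cumulative log returns, as in the Python sketch below. It pairs each past k-month return $R_{t-k,t}$ with the adjacent future return $R_{t,t+k}$ at every month t. Note that it uses the ordinary Pearson correlation for simplicity, not the robust FAST-MCD estimator employed in the paper, and the return series is simulated (hypothetical).

```python
import numpy as np


def k_period_sums(log_returns, k):
    """Overlapping k-period log returns: element i is r[i] + ... + r[i+k-1]."""
    cum = np.concatenate(([0.0], np.cumsum(log_returns)))
    return cum[k:] - cum[:-k]


def ac1_overlapping(log_returns, k):
    """First-order autocorrelation of overlapping k-period returns,
    corr(R[t-k, t], R[t, t+k]), using the ordinary Pearson estimator."""
    Rk = k_period_sums(np.asarray(log_returns, dtype=float), k)
    return np.corrcoef(Rk[:-k], Rk[k:])[0, 1]


rng = np.random.default_rng(0)
r = rng.normal(0.005, 0.045, size=1488)    # hypothetical i.i.d. monthly log returns
print([round(ac1_overlapping(r, k), 3) for k in (1, 6, 12, 30)])
```

For k = 1 the estimator reduces to the usual lag-1 autocorrelation; for k = 30 and 1488 months it still uses 1429 overlapping pairs.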
We wish to estimate the empirical autocorrelations and test whether the estimated autocorrelations are statistically significantly different from zero. That is, under the null hypothesis, all autocorrelations are zero. By and large, this null hypothesis is equivalent to the presumption that the returns are independent and identically distributed. We employ the randomization method to test the null hypothesis. In essence, randomization consists of reshuffling the data and then recalculating the test statistics for each reshuffling to estimate their distribution under the null hypothesis.

[8] It is known that estimates obtained using overlapping blocks of data are biased in short samples (see Fama and French (1988), Kim, Nelson, and Startz (1991), and Nelson and Kim (1993), among others). However, our sample is not short because it contains 1488 monthly observations. Our extensive simulation experiments confirm that the bias in estimating the first-order autocorrelation of k-period returns is negligibly small.
To be more specific, we randomize the return series 1,000 times, each time obtaining new estimates ρ*n and AC1(k)*.[9] Then, for example, the collection of all estimates AC1(k)* constitutes the probability distribution of AC1(k) under the null hypothesis. We compute the 90% confidence interval for AC1(k) under the null hypothesis using this probability distribution. If the estimated value of AC1(k) lies outside of the 90% confidence interval, this value is statistically significantly different from zero at the 5% level in a one-tailed test.
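The randomization test can be sketched in a few lines of Python. Each permutation destroys any serial dependence while keeping the marginal distribution of returns, so the 5% and 95% quantiles of the reshuffled statistics form the 90% interval under the i.i.d. null. The sketch uses a plain Pearson lag-1 autocorrelation and simulated (hypothetical) returns; any other statistic, such as AC1(k), can be passed in the same way.

```python
import numpy as np


def lag_autocorr(r, n):
    """Lag-n Pearson autocorrelation of a return series."""
    return np.corrcoef(r[:-n], r[n:])[0, 1]


def randomization_interval(r, statistic, n_perm=1000, level=0.90, seed=0):
    """Central `level` interval of `statistic` under the i.i.d. null, obtained by reshuffling r."""
    rng = np.random.default_rng(seed)
    draws = np.array([statistic(rng.permutation(r)) for _ in range(n_perm)])
    tail = (1 - level) / 2
    return np.quantile(draws, [tail, 1 - tail])


rng = np.random.default_rng(42)
r = rng.normal(0.005, 0.045, size=1488)            # hypothetical monthly log returns
estimate = lag_autocorr(r, 1)
lo, hi = randomization_interval(r, lambda x: lag_autocorr(x, 1))
print(f"rho_1 = {estimate:.4f}, 90% null interval = [{lo:.4f}, {hi:.4f}]")
print("significant at 5% (one-tailed):", not (lo <= estimate <= hi))
```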
The ESMSM and the corresponding MSM are calibrated to empirical data using the following methodology. The idea behind our procedure is to ensure that the mean state duration times and stationary state probabilities are the same in the ESMSM and in the corresponding MSM. In the geometric distribution, the mean state duration times are given by equation (7). Therefore, in the MSM, the one-period transition probability from state A (bull market) to state B (bear market) equals α = 1/E[dA], where E[dA] is the mean state A duration time. Similarly, the one-period transition probability from state B to state A is given by β = 1/E[dB], where E[dB] is the mean state B duration time. Under our theoretical construction, in the ESMSM with four sub-states for each macro-state, the one-period transition probability from one sub-state of macro-state A (B) to the next sub-state equals 4α (4β). The one-period transition probabilities computed in this manner are only marginally different from the probabilities reported in Table 2.
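The calibration rule amounts to simple arithmetic on the mean durations. This Python sketch applies it to the Table 1 mean durations with q = 4 sub-states. It assumes that (9) is the standard two-state result $\pi_A = \beta/(\alpha + \beta)$ and checks the condition from footnote [3]; the function and key names are illustrative.

```python
def calibrate(mean_dur_A, mean_dur_B, q=4):
    """MSM and ESMSM transition probabilities from mean state durations (in months)."""
    alpha, beta = 1 / mean_dur_A, 1 / mean_dur_B
    if not (alpha < 1 / q and beta < 1 / q):          # condition from footnote [3]
        raise ValueError("mean durations too short for q sub-states")
    return {
        "alpha": alpha, "beta": beta,
        "sub_alpha": q * alpha, "sub_beta": q * beta,   # sub-state transition probabilities
        "pi_A": beta / (alpha + beta),                   # stationary bull-market probability
    }


table1 = {"S&P Composite": (29.03, 14.52), "Dow Jones": (27.11, 13.00)}
for name, (dA, dB) in table1.items():
    c = calibrate(dA, dB)
    print(name, {k: round(v, 4) for k, v in c.items()})
```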
The left panels in Figure 9 plot the lag-n autocorrelation of log returns, ρₙ, while the right panels plot the first-order autocorrelation of k-period log returns, AC1(k). The black lines with points show the empirically estimated autocorrelations. The shaded areas indicate the 90% confidence interval for the estimated autocorrelations under the null hypothesis of i.i.d. returns. The blue lines with points depict the autocorrelations implied by the fitted conventional Markov model. The red lines with points depict the autocorrelations implied by the fitted semi-Markov model, where four Markovian states represent one semi-Markovian state. The top panels in Figure 9 plot the results of estimations and calibrations for the S&P Composite index, while the bottom panels show the results for the Dow Jones index.

⁹ An asterisk indicates that the estimate is calculated on a randomized sample.
[Figure 9 image: four panels. Top row: S&P Composite index; bottom row: Dow Jones index. Left panels: autocorrelation function, ACF(n), against lag n, months (0 to 30). Right panels: autocorrelation of k-month returns, AC1(k), against number of months, k (0 to 30). Legend: Empirical, Semi-Markov model, Markov model.]

Figure 9: The results of estimations and calibrations. The left panels plot the lag-n autocorrelation of log returns (ACF(n), ρₙ). The right panels plot the first-order autocorrelation of k-period log returns (AC1(k)). The black lines with points show the empirically estimated autocorrelations. The shaded areas indicate the 90% confidence interval for the estimated autocorrelations under the null hypothesis of i.i.d. returns. The blue lines with points depict the autocorrelations implied by the fitted conventional Markov model. The red lines with points depict the autocorrelations implied by the fitted semi-Markov model, where four Markovian states represent one semi-Markovian state. The top panels plot the results of estimations and calibrations for the S&P Composite index, while the bottom panels show the results for the Dow Jones index.
First and foremost, our results present convincing evidence of both short-term momentum and medium-term mean reversion in the returns on the two stock market indices. This evidence comes mainly from comparing the empirically estimated first-order autocorrelation of k-period returns with the boundaries of the 90% confidence interval under the null hypothesis of i.i.d. returns. The evidence is stronger for the S&P Composite index. In particular, for this index, the estimated values of AC1(k) are statistically significantly above zero for horizons from 3 to 9 months and statistically significantly below zero for horizons from 14 to 18 months. For the Dow Jones index, the values of AC1(k) are statistically significantly positive (negative) for horizons from 4 to 6 months (from 15 to 16 months).
For both stock market indices, most of the estimated lag-n autocorrelations lie inside the 90% confidence interval. For both indices, the lag-5 (lag-22) autocorrelation is statistically significantly above (below) zero at the 5% level. Additionally, for the S&P Composite (Dow Jones) index, the lag-11 (lag-18) autocorrelation is statistically significantly above (below) zero. Therefore, judged by the estimated lag-n autocorrelations, the evidence of short-term momentum and medium-term mean reversion is weaker. Moreover, because only a few values are statistically significant, drawing inference from the lag-n autocorrelations poses another problem: owing to the multiple-testing issue, some of the estimated lag-n autocorrelations may be statistically significant purely by chance.
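A back-of-the-envelope calculation shows how large the multiple-testing problem can be. It assumes 30 lags (the horizontal range in Figure 9), a 10% rejection rate per lag under the null (either tail of a 90% interval), and independent tests; the last point is an idealization, since sample autocorrelations of i.i.d. data are only approximately independent.

```python
# Idealized multiple-testing arithmetic: treats the lag tests as independent
n_lags = 30        # assumed number of lags examined (the range shown in Figure 9)
p_reject = 0.10    # both tails of a 90% interval under the null of i.i.d. returns

expected_false = n_lags * p_reject                 # expected spurious rejections
p_at_least_one = 1.0 - (1.0 - p_reject) ** n_lags  # family-wise error rate
print(f"expected spurious rejections: {expected_false:.1f}")  # 3.0
print(f"P(at least one): {p_at_least_one:.3f}")               # 0.958
```

Under these assumptions, about three of 30 lag-n autocorrelations would fall outside the 90% interval by chance alone, the same order as the three significant lags reported above for each index.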
Second, and no less important, our results present convincing evidence that the semi-Markov model explains the shape of the empirically estimated AC1(k) function much better than the conventional Markov model does. Specifically, the fitted conventional Markov model implies only short-term momentum, which should be strong enough to produce statistically significant values of AC1(k) over horizons from 1 to 20 months. In contrast, the fitted semi-Markov model predicts short-term momentum that should generate statistically significant values of AC1(k) over horizons from 1 to 8 months. Beyond horizons of 11–12 months, the fitted semi-Markov model predicts negative values of AC1(k) that should not be statistically significant.
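Model-implied lag-n autocorrelations can be computed numerically from any transition matrix. The sketch below is not the paper's own formula: it assumes returns X_t = μ_{S_t} + σ_{S_t}ε_t with i.i.d. standard normal ε_t, for which Cov(X_t, X_{t+n}) = Σ_i π_i μ_i Σ_j [Pⁿ]_{ij} μ_j − μ̄² for n ≥ 1. The helper `esmsm_matrix` (each sub-state advances with probability qα or qβ) and all parameter values are hypothetical.

```python
import numpy as np


def esmsm_matrix(alpha, beta, q):
    """Hypothetical ESMSM: sub-state i advances to i+1 with prob q*alpha (A) or q*beta (B)."""
    m = 2 * q
    P = np.zeros((m, m))
    for i in range(m):
        p = q * alpha if i < q else q * beta
        P[i, i] = 1.0 - p
        P[i, (i + 1) % m] = p
    return P


def model_acf(P, mu, sigma, max_lag):
    """Lags 1..max_lag autocorrelation of X_t = mu[S_t] + sigma[S_t] * eps_t."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]      # stationary distribution
    mean = pi @ mu
    var = pi @ (sigma**2 + mu**2) - mean**2         # variance of the mixture
    acf = np.empty(max_lag)
    Pn = np.eye(m)
    for n in range(1, max_lag + 1):
        Pn = Pn @ P                                  # n-step transition matrix
        acf[n - 1] = (pi * mu) @ Pn @ mu - mean**2   # Cov(X_t, X_{t+n})
    return acf / var


alpha, beta, q = 1 / 24, 1 / 12, 4                   # hypothetical calibration
mu_A, mu_B, sig_A, sig_B = 0.012, -0.015, 0.035, 0.055

rho_msm = model_acf(esmsm_matrix(alpha, beta, 1),
                    np.array([mu_A, mu_B]), np.array([sig_A, sig_B]), 30)
rho_smm = model_acf(esmsm_matrix(alpha, beta, q),
                    np.repeat([mu_A, mu_B], q), np.repeat([sig_A, sig_B], q), 30)

negative = np.flatnonzero(rho_smm < 0)
print("MSM minimum rho_n:", rho_msm.min())
print("ESMSM first negative lag:", int(negative[0]) + 1 if negative.size else None)
```

For the two-state MSM this reproduces the closed form πAπB(μA − μB)²(1 − α − β)ⁿ divided by the return variance, which is non-negative whenever α + β < 1.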
Purely qualitatively, the shape of the semi-Markov model-implied AC1(k) resembles that of the empirically estimated AC1(k). The semi-Markov model correctly captures the duration of the short-term momentum, which lasts about 10–12 months and subsequently reverses. Quantitatively, though, the model-implied momentum is stronger than the estimated momentum; the difference is especially noticeable for horizons from 1 to 5 months. In contrast, the model-implied mean reversion is weaker than the estimated mean reversion; here the difference is noticeable for horizons from 14 to 17 months.

5.4 Discussion
The results reported in the preceding section demonstrate that the fitted semi-Markov model generates a first-order autocorrelation function of multi-period returns, AC1(k), whose shape is qualitatively similar to the empirically estimated one. However, the match between the empirical and model-implied AC1(k) is far from perfect. In this section, we discuss some potential explanations for the observed mismatch.
Ultimately, any discrepancy between model predictions and empirical data stems from model misspecification. One apparent problem with our model is that the return process in real markets may have more than two regimes. Even though researchers often assume only two states in the stock market, several studies extend the number of market states. For example, Dias, Vermunt, and Ramos (2015) and Liu and Wang (2017) employ three-state regime-switching models. Maheu, McCurdy, and Song (2012) and Jiang and Fang (2015) use four-state regime-switching models. Finally, De Angelis and Paas (2013) estimate a seven-state model. The presence of more than two regimes in the stock market could account for the observed mismatch between the predictions of our semi-Markov model and the empirical estimates.

In a two-state model, a bull state is a low-volatility, high-return state, whereas a bear state is a high-volatility, low-return state. In contrast, models that employ more than two states typically have several types of bull (bear) market states. For example, these models distinguish between low-volatility and high-volatility bull (bear) markets. Bull (bear) markets may also have different mean returns. Additionally, some of these models allow for sideways-trending markets. Maheu et al. (2012) consider a further possibility: shorter-term mean reversions within both bull and bear states of the market. In particular, these authors assume that bull markets contain periods of bull market corrections, whereas bear markets contain periods of bear market rallies.
The assumptions behind the model suggested by Maheu et al. (2012) can be justified by the Dow Theory developed at the end of the 19th century (see Brown, Goetzmann, and Kumar (1998) and references therein). Among other things, this theory postulates the existence of several types of trends in financial markets. The primary trend dominates all other types of trends. Primary trends are classified as bull and bear markets that last from a few months to a few years. Secondary trends, which may last from a few weeks to a few months, move opposite to the primary trend. Finally, minor trends, which last from a few hours to a few weeks, can move with or against the primary trend.
By and large, the Dow Theory presupposes the simultaneous existence of stock market cycles of different durations. Our semi-Markov model focuses exclusively on primary stock market trends. The presence of secondary trends can significantly alter the behavior of the return autocorrelation function at shorter lags and may explain the observed discrepancy between our model predictions and the empirical estimates at the first 4 lags. Specifically, while our model predicts large, positive, and statistically significant return autocorrelations at shorter lags, the empirical data reveal either small or no autocorrelations at these lags. We conjecture that the main explanation for this mismatch lies in the presence of secondary trends in stock markets. The existence of secondary trends in various financial markets has recently been documented by Zaremba, Long, and Karathanasopoulos (2019).
Finally, in our model, the state duration times are governed exclusively by the negative binomial distribution. Even though statistical tests cannot reject the assumption that the state duration times follow the negative binomial distribution, in reality their distributions are likely to depart from it. Such deviations from the negative binomial distribution may be responsible for some of the discrepancies between our model predictions and the empirical data.
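Positive duration dependence can be made concrete with the hazard rate h(n) = P(d = n | d ≥ n), the probability that a state ends at duration n given that it has lasted that long. The pure-Python sketch below assumes the NB(q) duration counts the periods until the q-th sub-state exit when each sub-state is left with probability p, so that P(d = n) = binom(n − 1, q − 1)p^q(1 − p)^(n−q) for n ≥ q; q = 1 gives the memoryless geometric case. Function names and the value of p are hypothetical.

```python
from math import comb


def nb_pmf(n: int, q: int, p: float) -> float:
    """P(d = n): the q-th exit happens in period n (zero for n < q)."""
    if n < q:
        return 0.0
    return comb(n - 1, q - 1) * p**q * (1 - p) ** (n - q)


def nb_survival(n: int, q: int, p: float) -> float:
    """P(d >= n): fewer than q exits during the first n - 1 periods."""
    return sum(comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j) for j in range(min(q, n)))


def hazard(n: int, q: int, p: float) -> float:
    """h(n) = P(d = n | d >= n)."""
    return nb_pmf(n, q, p) / nb_survival(n, q, p)


p = 1 / 6                      # hypothetical exit probability per sub-state
for q in (1, 4):               # q = 1 is the geometric (memoryless) case
    h = [hazard(n, q, p) for n in range(1, 61)]
    print(q, [round(x, 4) for x in h[:8]], round(h[-1], 4))
```

With q = 1 the hazard equals p at every duration, while with q = 4 it starts at zero and rises toward p: the longer a state has lasted, the more likely it is to end.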

6 Conclusions
We present a semi-Markov model in which the return process randomly switches between bull and bear states. Our semi-Markov model is realized as an expanded-state Markov model where several Markovian states represent one semi-Markovian state. In our model, the state duration times are governed by a negative binomial distribution that exhibits a positive duration dependence. We offer analytical solutions for the return autocorrelation function in the simplest case, where two Markovian states represent each semi-Markovian state. In the general case, the return autocorrelation function can be computed using simple numerical methods. We demonstrate that the return process in our model induces both short-term momentum and medium-term mean reversion. Under realistic model parameters, the autocorrelation function has the shape of a damped cosine wave that decays rather fast.
Positive autocorrelations at shorter lags arise because, most often, the return process is more likely to remain in the same state than to switch to the other state. The intuition behind the negative autocorrelations at longer lags is as follows. In the absence of duration dependence, the switching between the two states is entirely irregular, and the return process exhibits only short-term momentum. When both states exhibit a positive duration dependence,¹⁰ some regularity in the state changes emerges. As a result, the return process shows both short-term momentum and medium-term mean reversion.
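The Monte Carlo approach mentioned in footnote 10 for arbitrary duration distributions can be sketched in a few lines: simulate alternating bull and bear spells, generate returns, and compute sample autocorrelations. This is an illustrative sketch, not the authors' simulation code. It assumes returns X_t = μ_{S_t} + σ_{S_t}ε_t with i.i.d. standard normal ε_t; spell lengths are drawn from NB(4) with NumPy's `Generator.negative_binomial`, which counts failures before the q-th success, so q is added to obtain a duration of at least q periods. All parameter values and function names are hypothetical.

```python
import numpy as np


def simulate_returns(T, draw_A, draw_B, mu, sigma, rng):
    """Simulate T returns from a two-state semi-Markov model with arbitrary spell lengths."""
    states = np.empty(T, dtype=int)
    t, s = 0, int(rng.integers(2))        # random initial state: 0 = bull, 1 = bear
    while t < T:
        d = draw_A() if s == 0 else draw_B()
        states[t:t + d] = s              # slice is truncated at the end of the sample
        t += d
        s = 1 - s                        # two states: always switch to the other one
    return mu[states] + sigma[states] * rng.standard_normal(T)


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = x - x.mean()
    denom = x @ x
    return np.array([x[:-n] @ x[n:] / denom for n in range(1, max_lag + 1)])


rng = np.random.default_rng(7)
q, p_A, p_B = 4, 1 / 6, 1 / 3            # hypothetical NB(4) spells: means 24 and 12 months


def draw_A():
    # negative_binomial counts failures before the q-th success; add q for total periods
    return q + int(rng.negative_binomial(q, p_A))


def draw_B():
    return q + int(rng.negative_binomial(q, p_B))


mu = np.array([0.012, -0.015])           # hypothetical monthly means (bull, bear)
sigma = np.array([0.035, 0.055])         # hypothetical monthly volatilities
x = simulate_returns(200_000, draw_A, draw_B, mu, sigma, rng)
rho = sample_acf(x, 30)
print(np.round(rho, 3))
```

Replacing the two draw functions with geometric draws, for example `rng.geometric(p_A)`, gives the memoryless benchmark for comparison.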
Our model is easy to fit to empirical data. We calibrate it to monthly returns on the Dow Jones and Standard and Poor’s Composite indices and demonstrate that the fit is reasonably good. In particular, our model correctly captures the duration of short-term momentum, which lasts about 10–12 months and subsequently reverses. The largest discrepancy between the model-implied and the empirically estimated autocorrelations is observed at the shortest lags. We conjecture that the main reason for this discrepancy is the presence of higher-frequency regimes in the return process.
All in all, our model is a parsimonious, simple-to-compute, and easy-to-calibrate regime-switching model for stock returns. It explains both the short-term momentum and the medium-term mean reversion documented by numerous empirical studies.
¹⁰ We strongly believe that positive duration dependence is the primary explanation for the presence of mean reversion. Our theoretical model employs only one particular duration distribution because it provides some analytical tractability and allows fast and straightforward numerical computations. Alternatively, the return autocorrelation function in a semi-Markov model with an arbitrary duration distribution can be studied using Monte Carlo simulation. This method is simple but computationally intensive. Our extensive simulations (not reported in this paper), which use various duration distributions with positive duration dependence, confirm the relation between the regularity of state changes and the strength of mean reversion.

References
Balvers, R., Wu, Y., and Gilliland, E. (2000). “Mean Reversion across National Stock Markets and Parametric Contrarian Investment Strategies”, Journal of Finance, 55 (2), 745–772.
Balvers, R. J., Hu, O., and Huang, D. (2012). “Transitory Market States and the Joint Occurrence of Momentum and Mean Reversion”, Journal of Financial Research, 35 (4), 471–495.
Balvers, R. J. and Wu, Y. (2006). “Momentum and Mean Reversion Across National Equity Markets”, Journal of Empirical Finance, 13 (1), 24–48.
Barberis, N. and Shleifer, A. (2003). “Style Investing”, Journal of Financial Economics, 68 (2), 161–199.
Barbu, V. S. and Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models toward Applications: Their Use in Reliability and DNA Analysis. Springer-Verlag, New York.
Brown, S. J., Goetzmann, W. N., and Kumar, A. (1998). “The Dow Theory: William Peter Hamilton’s Track Record Reconsidered”, Journal of Finance, 53 (4), 1311–1333.
Bry, G. and Boschan, C. (1971). Cyclical Analysis of Time Series: Selected Procedures and Computer Programs. NBER.
Burshtein, D. (1996). “Robust Parametric Modeling of Durations in Hidden Markov Models”, IEEE Transactions on Speech and Audio Processing, 4 (3), 240–242.
Claessens, S., Kose, M. A., and Terrones, M. E. (2012). “How Do Business and Financial Cycles Interact?”, Journal of International Economics, 87 (1), 178–190.
Cochran, S. J. and Defina, R. H. (1995). “Duration Dependence in the US Stock Market Cycle: A Parametric Approach”, Applied Financial Economics, 5 (5), 309–318.
De Angelis, L. and Paas, L. J. (2013). “A Dynamic Analysis of Stock Markets Using a Hidden Markov Model”, Journal of Applied Statistics, 40 (8), 1682–1700.
De Bondt, W. F. M. and Thaler, R. (1985). “Does the Stock Market Overreact?”, Journal of Finance, 40 (3), 793–805.
Dias, J. G., Vermunt, J. K., and Ramos, S. (2015). “Clustering Financial Time Series: New Insights From an Extended Hidden Markov Model”, European Journal of Operational Research, 243 (3), 852–864.
Fama, E. F. and French, K. R. (1988). “Permanent and Temporary Components of Stock Prices”, Journal of Political Economy, 96 (2), 246–273.
Ferguson, J. D. (1980). “Variable Duration Models for Speech”, In Ferguson, J. D. (Ed.), Proceedings of the Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143–179. Princeton, New Jersey.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, New York.
Georgopoulou, A. and Wang, J. G. (2016). “The Trend Is Your Friend: Time-Series Momentum Strategies across Equity and Commodity Markets”, Review of Finance, 21 (4), 1557–1592.
Gonzalez, L., Powell, J. G., Shi, J., and Wilson, A. (2005). “Two Centuries of Bull and Bear Market Cycles”, International Review of Economics and Finance, 14 (4), 469–486.
Guédon, Y. (2005). “Hidden Hybrid Markov/Semi-Markov Chains”, Computational Statistics & Data Analysis, 49 (3), 663–688.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.
Harman, Y. S. and Zuehlke, T. W. (2007). “Nonlinear Duration Dependence in Stock Market Cycles”, Review of Financial Economics, 16 (4), 350–362.
He, X.-Z. and Li, K. (2015). “Profitability of Time Series Momentum”, Journal of Banking & Finance, 53, 140–157.
Hong, H. and Stein, J. C. (1999). “A Unified Theory of Underreaction, Momentum Trading, and Overreaction in Asset Markets”, Journal of Finance, 54 (6), 2143–2184.
Howard, R. A. (1971). Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes. John Wiley & Sons, Inc., New York.
Hurst, B., Ooi, Y. H., and Pedersen, L. H. (2017). “A Century of Evidence on Trend-Following Investing”, Journal of Portfolio Management, 44 (1), 15–29.
Jegadeesh, N. (1991). “Seasonality in Stock Price Mean Reversion: Evidence from the U.S. and the U.K.”, Journal of Finance, 46 (4), 1427–1444.
Jegadeesh, N. and Titman, S. (1993). “Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency”, Journal of Finance, 48 (1), 65–91.
Jiang, Y. and Fang, X. (2015). “Bull, Bear or Any Other States in US Stock Market?”, Economic Modelling, 44, 54–58.
Johnson, M. T. (2005). “Capacity and Complexity of HMM Duration Modeling Techniques”, IEEE Signal Processing Letters, 12 (5), 407–410.
Kaminsky, G. L. and Schmukler, S. (2007). “Short-Run Pain, Long-Run Gain: Financial Liberalization and Stock Market Cycles”, Review of Finance, 12, 253–292.
Kim, M. J., Nelson, C. R., and Startz, R. (1991). “Mean Reversion in Stock Prices? A Reappraisal of the Empirical Evidence”, Review of Economic Studies, 58 (3), 515–528.
Langrock, R. and Zucchini, W. (2011). “Hidden Markov Models With Arbitrary State Dwell-Time Distributions”, Computational Statistics & Data Analysis, 55 (1), 715–724.
Levinson, S. (1986). “Continuously Variable Duration Hidden Markov Models for Speech Analysis”, In ICASSP ’86. IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 11, pp. 1241–1244.
Lim, B. Y., Wang, J. G., and Yao, Y. (2018). “Time-Series Momentum in Nearly 100 Years of Stock Returns”, Journal of Banking & Finance, 97, 283–296.
Liu, Z. and Wang, S. (2017). “Decoding Chinese Stock Market Returns: Three-State Hidden Semi-Markov Model”, Pacific-Basin Finance Journal, 44, 127–149.
Lo, A. W. and MacKinlay, A. C. (1988). “Stock Market Prices do not Follow Random Walks: Evidence from a Simple Specification Test”, Review of Financial Studies, 1 (1), 41–66.
Lunde, A. and Timmermann, A. (2004). “Duration Dependence in Stock Prices: An Analysis of Bull and Bear Markets”, Journal of Business and Economic Statistics, 22 (3), 253–273.
Maheu, J. M. and McCurdy, T. H. (2000). “Identifying Bull and Bear Markets in Stock Returns”, Journal of Business and Economic Statistics, 18 (1), 100–112.
Maheu, J. M., McCurdy, T. H., and Song, Y. (2012). “Components of Bull and Bear Markets: Bull Corrections and Bear Rallies”, Journal of Business and Economic Statistics, 30 (3), 391–403.
Moskowitz, T. J., Ooi, Y. H., and Pedersen, L. H. (2012). “Time Series Momentum”, Journal of Financial Economics, 104 (2), 228–250.
Nelson, C. R. and Kim, M. J. (1993). “Predictable Stock Returns: The Role of Small Sample Bias”, Journal of Finance, 48 (2), 641–661.
Ohn, J., Taylor, L. W., and Pagan, A. (2004). “Testing for Duration Dependence in Economic Cycles”, Econometrics Journal, 7 (2), 528–549.
Pagan, A. R. and Sossounov, K. A. (2003). “A Simple Framework for Analysing Bull and Bear Markets”, Journal of Applied Econometrics, 18 (1), 23–46.
Poterba, J. M. and Summers, L. H. (1988). “Mean Reversion in Stock Prices: Evidence and Implications”, Journal of Financial Economics, 22, 27–59.
Rousseeuw, P. J. (1984). “Least Median of Squares Regression”, Journal of the American Statistical Association, 79 (388), 871–880.
Rousseeuw, P. J. (1985). “Multivariate Estimation With High Breakdown Point”, In Grossmann, W., Pflug, G., Vincze, I., and Wertz, W. (Eds.), Mathematical Statistics and Applications, Vol. B, pp. 283–297. Reidel Publishing Company, Dordrecht, Netherlands.
Rousseeuw, P. J. and Van Driessen, K. (1999). “A Fast Algorithm for the Minimum Covariance Determinant Estimator”, Technometrics, 41 (3), 212–223.
Russell, M. and Cook, A. (1987). “Experimental Evaluation of Duration Modelling Techniques for Automatic Speech Recognition”, In ICASSP ’87. IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 12, pp. 2376–2379.
Schwert, G. W. (1990). “Indexes of U.S. Stock Prices from 1802 to 1987”, Journal of Business, 63 (3), 399–426.
Summers, L. H. (1986). “Does the Stock Market Rationally Reflect Fundamental Values?”, Journal of Finance, 41 (3), 591–601.
Tejedor, A., Gómez, J., and Pacheco, A. (2015). “The Negative Binomial Distribution as a Renewal Model for the Recurrence of Large Earthquakes”, Pure and Applied Geophysics, 172, 23–31.
Timmermann, A. (2000). “Moments of Markov Switching Models”, Journal of Econometrics, 96 (1), 75–111.
Zaremba, A., Long, H., and Karathanasopoulos, A. (2019). “Short-Term Momentum (Almost) Everywhere”, Journal of International Financial Markets, Institutions and Money, 63, 101–140.
Zhu, H., Wang, J., Yang, Z., and Song, Y. (2006). “A Method to Design Standard HMMs with Desired Length Distribution for Biological Sequence Analysis”, In Bücher, P. and Moret, B. M. E. (Eds.), Algorithms in Bioinformatics, pp. 24–31. Springer, Berlin, Heidelberg.
Appendix
Proof of Proposition 1
We suppose that $X_t$ is a wide-sense stationary process, that is, a process whose unconditional mean and autocovariances do not vary with time: $E[X_t] = \mu$ and $E[(X_t - \mu)(X_{t-k} - \mu)] = \mathrm{Cov}(X_t, X_{t-k}) = \gamma_k$ for any $t$ and $k$.

By definition,
$$\mathrm{Cor}(X_{t+k,t+1}, X_{t,t-k+1}) = \frac{\mathrm{Cov}(X_{t+k,t+1}, X_{t,t-k+1})}{\mathrm{Var}(X_{t,t-k+1})}, \tag{28}$$
where $\mathrm{Cov}(X_{t+k,t+1}, X_{t,t-k+1})$ is the covariance between $X_{t+k,t+1}$ and $X_{t,t-k+1}$. Note that, because of the stationarity assumption, the variance of $X_{t+k,t+1}$ equals that of $X_{t,t-k+1}$, which is why a single variance appears in the denominator of (28). The variance of $X_{t,t-k+1}$ is given by
$$\mathrm{Var}(X_{t,t-k+1}) = \mathrm{Var}\left(\sum_{i=1}^{k} X_{t-k+i}\right) = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \mathrm{Cov}(X_{t-i}, X_{t-j}).$$
By definition, $\mathrm{Cov}(X_{t-i}, X_{t-j}) = \rho_{|i-j|}\gamma_0 = \rho_{|i-j|}\sigma^2$, where $\rho_m$ denotes the lag-$m$ autocorrelation of $X_t$ (with $\rho_0 = 1$) and $\sigma^2 = \gamma_0$ denotes the variance of $X_t$. Consequently, the variance can be written as
$$\mathrm{Var}(X_{t,t-k+1}) = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \rho_{|i-j|}\,\sigma^2 = \mathbf{1}'R\mathbf{1}\,\sigma^2.$$
By similar reasoning, the covariance between $X_{t+k,t+1}$ and $X_{t,t-k+1}$ is given by
$$\mathrm{Cov}(X_{t+k,t+1}, X_{t,t-k+1}) = \mathrm{Cov}\left(\sum_{i=1}^{k} X_{t+i}, \sum_{j=1}^{k} X_{t-k+j}\right) = \sum_{i=1}^{k}\sum_{j=1}^{k} \mathrm{Cov}(X_{t+i}, X_{t-k+j}) = \sum_{i=1}^{k}\sum_{j=1}^{k} \rho_{|k-j+i|}\,\sigma^2 = \mathbf{1}'U\mathbf{1}\,\sigma^2.$$
Inserting the expressions for $\mathrm{Cov}(X_{t+k,t+1}, X_{t,t-k+1})$ and $\mathrm{Var}(X_{t,t-k+1})$ into equation (28) completes the proof.
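Proposition 1 translates directly into a few lines of NumPy: build R and U from the lag autocorrelations and divide the sum of the elements of U by the sum of the elements of R. A minimal sketch; the function name `ac1_from_acf` and its input convention (`rho[m-1]` holds ρ_m for m = 1 to 2k − 1, the largest lag that U needs) are assumptions for illustration.

```python
import numpy as np


def ac1_from_acf(rho, k):
    """First-order autocorrelation of k-period sums from lag autocorrelations.

    rho[m - 1] must hold rho_m for m = 1, ..., 2k - 1; rho_0 = 1 is implied.
    """
    r = np.concatenate(([1.0], np.asarray(rho, dtype=float)))
    if r.size < 2 * k:
        raise ValueError("need autocorrelations up to lag 2k - 1")
    i = np.arange(1, k + 1)[:, None]
    j = np.arange(1, k + 1)[None, :]
    R = r[np.abs(i - j)]          # R_ij = rho_|i-j|
    U = r[k - j + i]              # U_ij = rho_(k-j+i); the index is always >= 1
    return float(U.sum() / R.sum())


# Example with a hypothetical geometric ACF, rho_m = 0.2**m
rho = 0.2 ** np.arange(1, 40)
print([round(ac1_from_acf(rho, k), 4) for k in (1, 2, 6, 12)])
```

For k = 1 the function returns ρ_1, as it should, since a one-period return is just X_t.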
The detailed proof of this proposition is very lengthy. Below, we present only a sketch of the proof; full details are available from the authors upon request.

First, we find the eigenvalues $\lambda_i$, $i \in \{1, \dots, 4\}$, of $P$ by solving the equation $|P - \lambda I| = 0$. This gives the following four eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 1 - 2\delta$, $\lambda_3 = 1 - \delta - C$, and $\lambda_4 = 1 - \delta + C$. The diagonal matrix $D$ contains the eigenvalues $\{\lambda_1, \dots, \lambda_4\}$ along its main diagonal.

Second, we find the eigenvector $v_i$ corresponding to each eigenvalue by solving $(P - \lambda_i I)v_i = 0$ for each $\lambda_i$. These eigenvectors constitute the columns of matrix $Q$. After some tedious but straightforward computations, we obtain the following matrices $Q$ and $Q^{-1}$:
$$
Q = \begin{pmatrix}
1 & -\frac{\alpha}{\beta} & \frac{d}{2\beta} & \frac{c}{2\beta} \\
1 & 1 & -1 & -1 \\
1 & -\frac{\beta}{\alpha} & \frac{c}{2\alpha} & \frac{d}{2\alpha} \\
1 & 1 & 1 & 1
\end{pmatrix}, \qquad
Q^{-1} = \begin{pmatrix}
\frac{\beta}{2\delta} & \frac{\beta}{2\delta} & \frac{\alpha}{2\delta} & \frac{\alpha}{2\delta} \\
-\frac{\beta}{2\delta} & \frac{\alpha}{2\delta} & -\frac{\alpha}{2\delta} & \frac{\beta}{2\delta} \\
-\frac{\beta}{2C} & \frac{d}{4C} & \frac{\alpha}{2C} & \frac{c}{4C} \\
\frac{\beta}{2C} & -\frac{c}{4C} & -\frac{\alpha}{2C} & -\frac{d}{4C}
\end{pmatrix},
$$
where $c = \beta - \alpha + C$ and $d = \beta - \alpha - C$ are two constants introduced to shorten the expressions.
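Finally, we derive the expressions for the elements of matrix $P(n) = P^n = QD^nQ^{-1}$. Using four $2 \times 2$ sub-matrices, matrix $P(n)$ can be written as
$$P(n) = \begin{pmatrix} P_{AA}(n) & P_{AB}(n) \\ P_{BA}(n) & P_{BB}(n) \end{pmatrix}.$$
To compute the transition probabilities, we need the elements of sub-matrices $P_{AB}(n)$ and $P_{BA}(n)$. The sub-matrix $P_{AB}(n)$ is
$$P_{AB}(n) = \begin{pmatrix} p_{13}(n) & p_{14}(n) \\ p_{23}(n) & p_{24}(n) \end{pmatrix} = \begin{pmatrix}
\pi_3 + \pi_3\frac{\alpha}{\beta}\lambda_2^n + \frac{\alpha d}{4\beta C}\lambda_3^n - \frac{\alpha c}{4\beta C}\lambda_4^n &
\pi_3 - \pi_1\frac{\alpha}{\beta}\lambda_2^n + \frac{cd}{8\beta C}\lambda_3^n - \frac{cd}{8\beta C}\lambda_4^n \\
\pi_3 - \pi_3\lambda_2^n - \frac{\alpha}{2C}\lambda_3^n + \frac{\alpha}{2C}\lambda_4^n &
\pi_3 + \pi_1\lambda_2^n - \frac{c}{4C}\lambda_3^n + \frac{d}{4C}\lambda_4^n
\end{pmatrix},$$
where $\pi_1 = \frac{\beta}{2\delta} = \frac{\pi_A}{2}$ and $\pi_3 = \frac{\alpha}{2\delta} = \frac{\pi_B}{2}$ are the stationary probabilities of sub-states 1 and 3. The probability $p_{AB}(n)$ is computed as one-half of the sum of these four probabilities (see equation (13)):
$$p_{AB}(n) = \pi_B + \frac{(\alpha-\beta)^2}{4\beta\delta}\lambda_2^n - \frac{(\delta+C)^2}{16\beta C}\lambda_3^n + \frac{(\delta-C)^2}{16\beta C}\lambda_4^n.$$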
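The sub-matrix $P_{BA}(n)$ is
$$P_{BA}(n) = \begin{pmatrix} p_{31}(n) & p_{32}(n) \\ p_{41}(n) & p_{42}(n) \end{pmatrix} = \begin{pmatrix}
\pi_1 + \pi_1\frac{\beta}{\alpha}\lambda_2^n - \frac{\beta c}{4\alpha C}\lambda_3^n + \frac{\beta d}{4\alpha C}\lambda_4^n &
\pi_1 - \pi_3\frac{\beta}{\alpha}\lambda_2^n + \frac{cd}{8\alpha C}\lambda_3^n - \frac{cd}{8\alpha C}\lambda_4^n \\
\pi_1 - \pi_1\lambda_2^n - \frac{\beta}{2C}\lambda_3^n + \frac{\beta}{2C}\lambda_4^n &
\pi_1 + \pi_3\lambda_2^n + \frac{d}{4C}\lambda_3^n - \frac{c}{4C}\lambda_4^n
\end{pmatrix}.$$
The probability $p_{BA}(n)$ is computed as one-half of the sum of these four probabilities:
$$p_{BA}(n) = \pi_A + \frac{(\alpha-\beta)^2}{4\alpha\delta}\lambda_2^n - \frac{(\delta+C)^2}{16\alpha C}\lambda_3^n + \frac{(\delta-C)^2}{16\alpha C}\lambda_4^n.$$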
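The closed-form expressions can be checked against a direct matrix power. This NumPy sketch assumes the q = 2 transition matrix in which sub-states 1 and 2 form macro-state A, sub-states 3 and 4 form macro-state B, and the process moves 1 → 2 → 3 → 4 → 1 with probabilities 2α, 2α, 2β, 2β. It also takes δ = α + β and C = √(α² + β² − 6αβ), the definitions consistent with the eigenvalues listed above; C is computed with `cmath.sqrt`, so the case of complex C is covered. Function names and parameter values are hypothetical.

```python
import cmath

import numpy as np


def p_matrix(alpha, beta):
    """Assumed q = 2 ESMSM: 1 -> 2 -> 3 -> 4 -> 1 with probabilities 2a, 2a, 2b, 2b."""
    a, b = 2 * alpha, 2 * beta
    return np.array([[1 - a, a, 0, 0],
                     [0, 1 - a, a, 0],
                     [0, 0, 1 - b, b],
                     [b, 0, 0, 1 - b]])


def closed_forms(alpha, beta, n):
    """Eigenvalues, Q, Q^{-1}, p_AB(n) and p_BA(n) from the Appendix formulas."""
    delta = alpha + beta
    C = cmath.sqrt(alpha**2 + beta**2 - 6 * alpha * beta)   # may be purely imaginary
    c, d = beta - alpha + C, beta - alpha - C
    lam = np.array([1, 1 - 2 * delta, 1 - delta - C, 1 - delta + C], dtype=complex)
    Q = np.array([[1, -alpha / beta, d / (2 * beta), c / (2 * beta)],
                  [1, 1, -1, -1],
                  [1, -beta / alpha, c / (2 * alpha), d / (2 * alpha)],
                  [1, 1, 1, 1]], dtype=complex)
    Qinv = np.array([[beta / (2 * delta), beta / (2 * delta), alpha / (2 * delta), alpha / (2 * delta)],
                     [-beta / (2 * delta), alpha / (2 * delta), -alpha / (2 * delta), beta / (2 * delta)],
                     [-beta / (2 * C), d / (4 * C), alpha / (2 * C), c / (4 * C)],
                     [beta / (2 * C), -c / (4 * C), -alpha / (2 * C), -d / (4 * C)]], dtype=complex)
    l2, l3, l4 = lam[1] ** n, lam[2] ** n, lam[3] ** n
    g = (alpha - beta) ** 2
    p_ab = (alpha / delta + g / (4 * beta * delta) * l2
            - (delta + C) ** 2 / (16 * beta * C) * l3 + (delta - C) ** 2 / (16 * beta * C) * l4)
    p_ba = (beta / delta + g / (4 * alpha * delta) * l2
            - (delta + C) ** 2 / (16 * alpha * C) * l3 + (delta - C) ** 2 / (16 * alpha * C) * l4)
    return lam, Q, Qinv, p_ab, p_ba


alpha, beta, n = 1 / 24, 1 / 12, 7          # hypothetical values; here C is imaginary
P = p_matrix(alpha, beta)
lam, Q, Qinv, p_ab, p_ba = closed_forms(alpha, beta, n)
Pn = np.linalg.matrix_power(P, n)
# p_AB(n): average over the two (equally likely) A sub-states of the chance of being in B
print(p_ab.real, 0.5 * Pn[0:2, 2:4].sum())
print(p_ba.real, 0.5 * Pn[2:4, 0:2].sum())
```

The imaginary parts of `p_ab` and `p_ba` vanish up to rounding, as they must for probabilities, even when C is complex.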
A Useful Property
Property 1. Suppose that $C_3 = u - iv$ and $C_4 = u + iv$ are a complex conjugate pair. Additionally, suppose that $\lambda_3 = x - iy$ and $\lambda_4 = x + iy$ are another complex conjugate pair. Then the following result holds:
$$C_3\lambda_3^n + C_4\lambda_4^n = 2\lambda^n R\cos(n\phi + \theta), \tag{29}$$
where $\lambda = \sqrt{x^2 + y^2}$, $\phi = \arctan(y/x)$, $R = \sqrt{u^2 + v^2}$, and $\theta = \arctan(v/u)$.

Proof: Using De Moivre’s formula, we obtain
$$\lambda_3^n = (x - iy)^n = \lambda^n e^{-in\phi}, \qquad \lambda_4^n = (x + iy)^n = \lambda^n e^{in\phi}.$$
Euler’s formula implies that
$$2\cos(n\phi) = e^{in\phi} + e^{-in\phi}, \qquad 2i\sin(n\phi) = e^{in\phi} - e^{-in\phi}.$$
Therefore,
$$C_3\lambda_3^n + C_4\lambda_4^n = (u - iv)(x - iy)^n + (u + iv)(x + iy)^n = \lambda^n\left[(u - iv)e^{-in\phi} + (u + iv)e^{in\phi}\right] = 2\lambda^n\left(u\cos(n\phi) - v\sin(n\phi)\right). \tag{30}$$
Finally, a linear combination of cosine and sine waves is equivalent to a single cosine wave with a phase shift and rescaled amplitude; since $\cos\theta = u/R$ and $\sin\theta = v/R$,
$$u\cos(n\phi) - v\sin(n\phi) = R\left(\frac{u}{R}\cos(n\phi) - \frac{v}{R}\sin(n\phi)\right) = R\cos(n\phi + \theta).$$
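Equation (29) is easy to confirm numerically with Python's built-in complex arithmetic. The sketch uses `math.atan2` instead of arctan so that ϕ and θ are correct in every quadrant (the two coincide when x > 0 and u > 0); the function names and test values are hypothetical.

```python
import math


def lhs(u, v, x, y, n):
    """C3*lam3**n + C4*lam4**n with C3 = u - iv, C4 = u + iv, lam3 = x - iy, lam4 = x + iy."""
    C3, C4 = complex(u, -v), complex(u, v)
    lam3, lam4 = complex(x, -y), complex(x, y)
    return C3 * lam3**n + C4 * lam4**n


def rhs(u, v, x, y, n):
    """Right-hand side of equation (29): 2 * lam**n * R * cos(n*phi + theta)."""
    lam, phi = math.hypot(x, y), math.atan2(y, x)
    R, theta = math.hypot(u, v), math.atan2(v, u)
    return 2 * lam**n * R * math.cos(n * phi + theta)


u, v, x, y = 0.06, -0.25, 0.88, 0.11   # hypothetical conjugate-pair parameters
for n in (0, 1, 6, 12, 24):
    z = lhs(u, v, x, y, n)
    print(n, round(z.real, 10), round(z.imag, 12), round(rhs(u, v, x, y, n), 10))
```

The imaginary part of the left-hand side is zero up to rounding, and its real part matches the damped cosine on the right.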
If $C$ is a complex number, then $C = i|C|$, where $|C| = \sqrt{|\alpha^2 + \beta^2 - 6\alpha\beta|}$. In this case, the eigenvalues $\lambda_3$ and $\lambda_4$ can be written in the following form:
$$\lambda_3 = 1 - \delta - i|C|, \qquad \lambda_4 = 1 - \delta + i|C|. \tag{31}$$
Evidently, $\lambda_3$ and $\lambda_4$ are a pair of complex conjugates. Consider now the coefficients in front of $\lambda_3$ and $\lambda_4$ in equation (17):
$$C_3 = \frac{(\delta + C)^2}{4C} = \frac{\delta}{2} - i\frac{(\alpha - \beta)^2}{2|C|}, \qquad C_4 = -\frac{(\delta - C)^2}{4C} = \frac{\delta}{2} + i\frac{(\alpha - \beta)^2}{2|C|}. \tag{32}$$
Consequently, $C_3$ and $C_4$ are also a pair of complex conjugates, and the final result follows from Property 1.
When $C \to 0$, the expression for $\psi(n)$ gives rise to an indeterminate 0/0 form, which we evaluate using l’Hôpital’s rule. We consider the situation where $C$ is a complex number that approaches zero.

The expressions for $\lambda_3$ and $\lambda_4$ are given by equations (31), whereas the coefficients in front of $\lambda_3$ and $\lambda_4$ in equation (17) are given by equations (32). Using equation (30), the expression for $\psi(n)$ becomes
$$\psi(n) = \lambda^n\left[\delta\cos(n\phi(C)) - \frac{(\alpha - \beta)^2}{|C|}\sin(n\phi(C))\right] - \frac{(\alpha - \beta)^2}{\delta}(1 - 2\delta)^n, \tag{33}$$
where
$$\lambda = \sqrt{(1 - \delta)^2 + |C|^2}, \qquad \phi(C) = \arctan\frac{|C|}{1 - \delta},$$
and the notation $\phi(C)$ emphasizes that $\phi$ is a function of $C$.

As $C \to 0$, $\lambda \to 1 - \delta$ and $\phi(C) \to 0$; hence $\cos(n\phi(C)) \to 1$ while $\sin(n\phi(C)) \to 0$. One term in (33) has an indeterminate 0/0 form, namely $\sin(n\phi(C))/|C|$. Applying l’Hôpital’s rule (to shorten the notation, we replace $|C|$ by $c$), we obtain
$$\lim_{c \to 0}\frac{\sin(n\phi(c))}{c} \overset{\text{l'H}}{=} \lim_{c \to 0}\frac{n\cos(n\phi(c))\,\phi'(c)}{1} = \lim_{c \to 0} n\cos(n\phi(c))\frac{1 - \delta}{(1 - \delta)^2 + c^2} = \frac{n}{1 - \delta}.$$
Consequently,
$$\lim_{C \to 0}\psi(n) = \left[\delta - n\frac{(\alpha - \beta)^2}{1 - \delta}\right](1 - \delta)^n - \frac{(\alpha - \beta)^2}{\delta}(1 - 2\delta)^n.$$
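The l’Hôpital step can be checked numerically by evaluating (33) with |C| replaced by a small positive c, holding α and β fixed as in the derivation, and comparing the result with the closed-form limit. A minimal sketch assuming δ = α + β; the function names and parameter values are hypothetical.

```python
import math


def psi_33(n, alpha, beta, c):
    """Equation (33) with |C| replaced by a free parameter c > 0 (delta = alpha + beta assumed)."""
    delta = alpha + beta
    g = (alpha - beta) ** 2
    lam = math.sqrt((1 - delta) ** 2 + c**2)
    phi = math.atan(c / (1 - delta))
    return (lam**n * (delta * math.cos(n * phi) - g / c * math.sin(n * phi))
            - g / delta * (1 - 2 * delta) ** n)


def psi_limit(n, alpha, beta):
    """Closed-form limit of psi(n) as C -> 0."""
    delta = alpha + beta
    g = (alpha - beta) ** 2
    return (delta - n * g / (1 - delta)) * (1 - delta) ** n - g / delta * (1 - 2 * delta) ** n


alpha, beta = 0.03, 0.09   # hypothetical transition probabilities
for n in (1, 5, 12, 24):
    print(n, psi_33(n, alpha, beta, 1e-6), psi_limit(n, alpha, beta))
```

The gap between the two columns shrinks like c², which is the behavior expected once the 0/0 term has been resolved correctly.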