
Financial Crash Identification

for Trading strategies and Consultancy services

by

Virgile Troude

Submitted to the Department of Management, Technology and


Economics in partial fulfillment of graduation requirements for the
degree of
Master of Science

ETH Zurich

September 2020

Supervisor: Jan-Christian Gerlach Student: Virgile Troude

Signature: Signature:

Professor: Didier Sornette

Signature:
Abstract
Based on the Log-Periodic Power-Law Singularity (LPPLS) model and on geometric growth dynamics, combined with different error models (standard Gaussian distribution, Brownian motion and Ornstein-Uhlenbeck process), we construct model-based trading strategies that trade systematically on a weekly basis, with positions derived from the Kelly criterion. We postulate the following hypothesis: if a model-based strategy is successful over a time period, then the model characterizes well the price time series over this time period. The success of an LPPLS-based trading strategy then implies that the price series is in a bubble regime, while the success of a geometric-based strategy implies that the time series is in a non-bubble regime. The success of a strategy is measured with different metrics (Sharpe ratio, Value at Risk, Accuracy, etc.), where each metric quantifies the trading performance of a strategy. When a metric is in an ε-drawup, it gives a success point to the strategy, and the success of a strategy is the average success point over the different metrics. The success value of a strategy is computed in a backtest study to determine the state of a price series (bubble or non-bubble). Based on the results obtained on synthetic data when calibrating the models, and on those obtained after the backtest study, we develop a current-time estimation of the bubble state and of the critical time, in order to provide consultancy on financial markets (by estimating the state of a time series, bubble or not bubble, and when the bubble will end) and to optimize trading strategies. We then define different trading strategies based on all the models at the same time. The goal is to obtain a strategy which performs well both in and out of a bubble and which is robust to crashes. Based on the estimations of the critical time, we define breaking times of a strategy such that, when it seems that a crash will occur in the next ten days, we exit the market and get back in once it seems that the crash has occurred. Most of the strategies developed in this paper do not succeed in beating the buy and hold (B&H) on all the metrics on average. The breaking times based on the estimations of the critical time of a bubble do not optimize the strategy. The best strategy is a Kelly strategy based on the expected growth obtained from the fits of all the models. Compared to all the other strategies, this strategy succeeds in dividing the Value at Risk by ten on average. The critical breaking times succeed in slightly optimizing the Value at Risk of a strategy: they do not change the median but they give a wider range of small values for the Value at Risk. We also tried to apply supervised networks to predict features of a time series which estimate whether we are in a bubble regime or not, whether the bubble will end soon, and the best strategy to adopt. Unfortunately, after optimization the networks converge to outputs which oscillate around a value close to the mean value of the targets given during the training phase.

Acknowledgements
I want to thank Jan-Christian Gerlach, who helped me all along this project. He gave me ideas and advice to help me get through it, and he took a lot of his time to help me with the writing of this paper.
I also want to thank Professor Sornette, who gave us his time during meetings. He helped me to put words on concepts that I tried to develop and, at a few moments, helped me to give a deeper perspective on what I tried to construct.
Thank you to Jean Herskovits, who does not know it, but some conversations that we had gave me ideas to use for this project.
Finally, I want to thank my family, who always encouraged me during my studies and all along this project.

Contents
1 Introduction 1

2 Models & Calibration 3


2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Modelling Financial Asset Price Dynamics . . . . . . . . . . . . . 3
2.2.1 The Log-Periodic Power-Law Singularity (LPPLS) Model 3
2.2.2 Error Models . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Summary of the Error Models . . . . . . . . . . . . . . . . 7
2.2.4 DS LPPLS Confidence Bubble Indicator (CI) . . . . . . . 8
2.3 Model Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Maximum Log-Likelihood . . . . . . . . . . . . . . . . . . 11
2.3.2 Optimization Procedure . . . . . . . . . . . . . . . . . . . 12
2.3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Synthetic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Motivation: Estimator Validation . . . . . . . . . . . . . . 18
2.4.2 Synthetic Data Generation . . . . . . . . . . . . . . . . . 19
2.4.3 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Trading Strategies for Regime Classification 25


3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 The Kelly Criterion . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Individual Strategy Simulation . . . . . . . . . . . . . . . . . . . 29
3.4 NASDAQ Dotcom Bubble Example Run . . . . . . . . . . . . . . 33
3.5 Identification of Bubble and Crash Episodes . . . . . . . . . . . . 37
3.5.1 Outline of the Procedure . . . . . . . . . . . . . . . . . . 37
3.5.2 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.3 Success Weights . . . . . . . . . . . . . . . . . . . . . . . 45
3.5.4 Regime Classification . . . . . . . . . . . . . . . . . . . . 46
3.5.5 Bubble Start and End Times . . . . . . . . . . . . . . . . 48
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Current Time Estimation 54


4.1 Estimation from Past Results . . . . . . . . . . . . . . . . . . . . 54
4.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 The Fitting Probability Density . . . . . . . . . . . . . . . 54
4.1.3 Current Estimation of the Critical Time and State of a
Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.4 Completion of the Quenched and Mixed Strategies . . . . 58
4.1.5 Application of the Critical Time to Trading Strategies . . 60
4.2 Prediction of Structural Breaks / Phase Transitions . . . . . . . 64
4.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.2 The Inputs Set . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.3 The Targets . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.4 The Multi-Branches Convolutional Neural Network (MCNN) 70
4.2.5 Network Applications . . . . . . . . . . . . . . . . . . . . 75
4.2.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Discussion 82
5.1 Models and Optimizations Procedures . . . . . . . . . . . . . . . 82
5.2 Trading Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3 The Success Value . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4 The Global Phase Labelling Procedure . . . . . . . . . . . . . . . 86
5.5 Predictive Learning of the Critical Dynamic . . . . . . . . . . . . 87

6 Conclusion 89

A Martingale Representation Theorem 90

B Computation of the Ornstein-Uhlenbeck Process 90

C From Return to Log-Return 90

D Log-Likelihood 91

E Observables 91

F Spin Glass and Quenched Kelly Weight 92

G ε-Drawup/down Algorithm 93

H Code Python: Model classes 94

I Results on Synthetic Data 123

J Application of the Global Phase Labelling Procedure on the NASDAQ 100 137

K Supervised Learning Results 193

L Trading Strategies Results 325

1 Introduction
The biggest fear of all investors is crashes. A model able to characterize stable growths, bubbles, crashes and recovery periods equally well is a dream for most financial agents.
At this time there exist models to characterize different states of a time series, but none of them is able to combine all the possible features in one model. For example, efficient models to characterize price growths are given by geometric models. The issue with such models is that they do not consider the possibility of observing a crash. There exist models to characterize bubble dynamics, but once the bubble bursts the model is no longer valid and we arrive in a grey zone where we do not know what really happens. The uncertainty of the phenomena once the crash occurs is due to the different actors (companies, banks, states, ...) which could act on the market to stabilize it or not.
The goal of this paper is to develop techniques and procedures to identify the current state of a time series and to predict regime changes. The idea is to develop trading strategies based on different models which characterize stable regimes or bubble dynamics. The first objective of the model-based trading strategies is to develop indicators to identify and classify different market regimes. We base this study on the hypothesis that if a model-based trading strategy performs well during a period, then the time series can be well characterized by this model. This first step can be summarized by saying that we use trading performances to make observations for consultancy services (by consultancy services we mean being able to estimate features of a price time series, such as whether the price series is in a bubble or not). A second step is to use these observations to optimize trading performances by combining the different trading strategies. The new trading strategies combine the different model-based strategies, which is equivalent to defining meta-models combining the different models that characterize the different regimes of a time series. Thus the first goal of this project is to obtain trading strategies which can adapt smoothly to the current and next state of the time series. The second one is to identify and predict phase transitions for consultancy services.
The first step in this project (see Figure 1 for a map of the whole project) is to define the different models that characterize price growths and bubble dynamics. Next, an optimization procedure to calibrate each model to a given price time series is presented. Before applying the models to real market data, the correctness and accuracy of the employed optimization procedure are tested on synthetic time series. This is presented in the following section.
The whole study is based on different models to define the different regimes. The bubble regime is described by the Log-Periodic Power-Law Singularity (LPPLS) model, whereas a price growth with no risk of crash (non-bubble regime) is characterized by a geometric model. For each type of regime it is possible to consider different stochastic processes around the price trend, called the error models.
Once the models have been defined and calibrated, it will be time to apply them to trading strategies. The trading strategies will be based on portfolio optimization under the hypothesis that the price dynamics are characterized by one of the models considered. We will then backtest the trading strategies and, based on their successes under different metrics, procedures will be developed to characterize the current state of a price series and the starting and ending times of bubble regimes.
The last step is to use the knowledge from the labelling procedure to characterize in current time the state of a time series and to optimize the trading strategies developed in the previous part. To estimate in current time, two techniques have been developed:
• We will use the past results to characterize in current time different features of the time series.
• The other technique is to predict the different features by using supervised learning based on neural networks. This is a model-free approach ("black box").

[Figure 1 (project map): "Financial Crash Identification for Trading Strategies and Consultancy Services". The models (LPPLS and geometric trends combined with the GN, BM and OU error models, i.e. GBM and MGBM) and their optimization procedure are tested on synthetic data and then fitted to real price data through the fitting routine; Kelly-weight strategies (quenched & mixed) are measured with performance metrics (Sharpe ratio, VaR, CAGR, Accuracy, Average return); a labelling procedure combines strategy successes and phases; current-time estimation is finally done from past results (delay) or by using neural networks (prediction).]

Figure 1: Map representation of the project.
2 Models & Calibration
2.1 Motivation
In this first section we present the different models used in this paper to characterize the different regimes of a time series, bubble or non-bubble.
The model used to characterize the bubble regime is the Log-Periodic Power-Law Singularity (LPPLS) model [1], whereas the "non-bubble" regime is characterized by a geometric model (see Section 2.2.1).
In this paper, these two models characterize the trend of the log-price dynamics and their parameters are considered as predictable. In a second part we introduce error models which embed the stochastic part of the log-price process. We consider three different error models:
• Standard Gaussian distribution;

• Wiener process;
• Ornstein-Uhlenbeck process.
To be able to fit real data with respect to each model, we define optimization procedures based on the maximization of the log-likelihood (Appendix D, p∼91). To test the optimization procedures and to estimate how well we are able to recover the model parameters, we apply them to a set of synthetic time series.
The goal of this section is thus first to introduce the different models, then to define optimization procedures to fit these models to time series, and finally to test the optimization procedures on synthetic data.
We will use the results obtained on synthetic data to estimate an empirical limit of the optimization procedures in recovering the LPPLS parameters.

2.2 Modelling Financial Asset Price Dynamics


In this first subsection we introduce the different models. We then define the domains of definition of the model parameters in order to define a confidence indicator which characterizes the "truthiness" of a set of fits over a time series.

2.2.1 The Log-Periodic Power-Law Singularity (LPPLS) Model


The LPPLS model was originally presented as the Johansen-Ledoit-Sornette
(JLS) model [1] in 1996. It was further developed by Sornette and embedded
in a rational expectations framework [2], in order to formulate a version of
the model applicable to financial market price dynamics, more specifically the
dynamics of prices during the formation of financial bubbles, which are ended
by crashes. In the following, the LPPLS formula will be derived
and the fundamental ideas and assumptions forming the theoretical framework
will be explained.
Consider an asset with price pt which pays no dividends; for simplicity we ignore the interest rate, risk aversion, information asymmetry and the market clearing condition [2]. Under these assumptions the price follows the martingale hypothesis (Eq.1):

∀s > t :  Et[ps] = pt ,     (1)


where Et [.] denotes the expectation conditional on the data up to time t.
Assuming the jump process is the unique stochastic part in the price dynamic, the infinitesimal return on the asset before the crash is (Eq.2):

dpt / pt = µt dt − k djt ,     (2)
where µt is the time-dependent drift, dj is a jump process and k ∈ (0, 1) the
percentage price drop during the crash. The idea of the LPPLS is to predict the
time of the crash, which can be assimilated to a jump process jt whose value is
zero before the crash and one afterwards.
The conditional expectation at time t of the differential of the jump process is Et[djs] = hs ds for s > t, where ht is the hazard rate of the crash, which is the probability per unit of time that the crash will happen in the next instant [t, t + dt]. By using the martingale hypothesis on the price (Eq.1, i.e. Et[dps] = 0 for t ≤ s) one obtains µt = k ht. Thus, solving for the log-price yields (Eq.3):

ln(pt / p0) = k ∫_0^t hs ds ,     (3)

before the crash. From [1], the solution for the hazard rate up to a first-order expansion is given by (Eq.4):

ht ≈ (tc − t)^{m−1} (B0 + C0 cos[ω ln(tc − t) + φ]) .     (4)


After integration, this yields the general solution of the LPPLS model (Eq.5):

ln pt = lt = A + |tc − t|^m (B + C cos[ω ln |tc − t| + φ]) ,     (5)


where B = −mkB0 and C = −k(m + ω)C0 . This model comprises seven
parameters. The central parameter tc is the critical time, i.e. the time at
which the crash occurs. A = ln ptc , B is the amplitude of the super-exponential
power-law part and m ∈ (0, 1) is the critical exponent. On top of the power law,
log-periodic, accelerating oscillations with amplitude C are added. The angular
log-frequency of the oscillations is given by ω and φ the phase.
As visible, when 0 < m < 1, the price dynamics before the crash, i.e. during the bubble regime, are super-exponential due to the power-law term |tc − t|^m in the LPPLS formula. The formula is not valid for t > tc, as visible from the argument of the logarithm inside the periodic part of the model.
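As a small illustration, a direct implementation of the deterministic LPPLS trend lt of Equation (5) could look as follows (a minimal Python sketch, assuming an array of times t evaluated before tc so that the formula stays valid):

import numpy as np

def lppls_trend(t, A, B, C, m, tc, omega, phi):
    # Deterministic LPPLS log-price trend l_t of Eq. (5), for t != tc
    dt = np.abs(tc - t)
    return A + dt**m * (B + C * np.cos(omega * np.log(dt) + phi))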
We say that the dynamic is super-exponential in comparison to a geometric dynamic, for which the predictive part takes the following form (Eq.6):

dpt = µ pt dt  ⇒  pt = p0 e^{µt} ,     (6)


for which the price follows an exponential dynamic. This dynamic is said to be geometric because, if we consider that the asset has a constant interest rate r over a time period ∆t, the price at time t + ∆t is (Eq.7):

pt+∆t = (1 + r) pt ,     (7)

so the price dynamic is defined by a geometric series. If we set n = t/∆t, t0 = 0 and the infinitesimal return r = µ∆t, then the price dynamic is (Eq.8):

pt = p0 (1 + µ∆t)^{t/∆t}  ⇒  lim_{∆t→0} pt = p0 e^{µt} ,     (8)

which shows that an exponential dynamic is the continuous-time limit of a geometric series. This is the reason why we say that a price follows a geometric dynamic if it is characterized by Equation (6).
For the LPPLS case the infinitesimal return is given by the hazard rate and the amplitude of the jump: µt = k ht. Since we assumed that m < 1 from [2], when we get closer to tc, ht starts to diverge, and at tc the infinitesimal return of the asset is infinite. This is the reason why we say that the dynamics are super-exponential due to the power-law part. In our case the geometric model characterizes non-bubble dynamics, whereas the LPPLS model characterizes bubble dynamics.
The geometric model is not realistic, since it does not respect the arbitrage-free hypothesis (Eq.1) and it is not an adapted process. Likewise, if we consider that every parameter of the LPPLS model is predictable, it would mean that there is no stochasticity in the process, which is not what we observe in real life. To be able to estimate a truthful model for a price dynamic, we therefore have to consider the notion of error models to complete the geometric and LPPLS ones.

2.2.2 Error Models


Calibration of a model like the LPPLS to market data inherently requires not
only the formulation of the dynamic model itself, but also of an underlying error
model for the residuals of the calibration. Fitting with Ordinary Least Squares
(OLS) for instance would assume the residuals to be normally distributed and in-
dependently and identically distributed, thus uncorrelated and stationary. The
Gaussian noise model is the first and most commonly used error model that
we assume in this study. Besides this most basic model specification, we can
formulate arbitrary statistical models for the noise process. In the following,
a variety of reasonable noise models are derived. Together with the LPPLS
formula, each noise model will form an instance of the LPPLS model. It is the goal throughout this thesis to compare the performance of these different versions of the LPPLS model to each other and to competing models such as Geometric Brownian Motion (GBM), which assumes an entirely different formulation of the price process in the first place.
Assume the log-price obeys the Martingale Representation Theorem (see Appendix A, p∼90), so the general formula for the log-price is (Eq.9):

ln pt = E[ln pt] + ∫_0^t φs dWs = rt + νt ,     (9)

where φt is an adapted process and Wt a Wiener stochastic process. Then, we can write the log-price as a predictable process rt plus a stochastic one νt which
models the error model part of the process, i.e. the deviations of a random price
time series from the underlying true / expected model dynamics. As stated, in
this thesis, the predictable process rt will be given by the LPPLS model or a
geometric growth.
As mentioned above, the first model for the residuals is just a white noise model (Eq.10):

νt = σ εt ,     (10)

where σ is the standard deviation and εt ∼ N(0,1) follows a standard Gaussian distribution. Thus we have iid noise instances.
An alternative would be to model the noise process as a Wiener process (Eq.11):

νt = σ Wt ,  where dWt ∼ N(0, √dt) ,     (11)

where σ is the volatility of the log-price. In this case the standard deviation increases with time (∼ √t) since for t > s, Wt − Ws ∼ N(0, √(t−s)). This error model is equivalent to the case when the adapted process φt is constant in Equation (9). This noise instance will be our second model to consider for the errors.
To be able to constrain the standard deviation to finite values, it is possible to add a mean-reversal term. This is also known as the Ornstein-Uhlenbeck process and yields our third model for the errors (Eq.12):

dνt = −θ νt dt + σ dWt ,     (12)

where θ = τ^{−1} is the reversing rate and τ = θ^{−1} is the characteristic time of the mean-reversion. The solution of Equation (12) is given by (Eq.13; the demonstration is given in Appendix B, p∼90):

νt = Ut = σ (e^{−θt} / √(2θ)) W(e^{2θt} − 1) .     (13)

In this case the standard deviation is (Eq.14):

σt = σ √((1 − e^{−2θt}) / (2θ)) ,     (14)

and so the deviation of the error model is bounded. In the limit θ ≪ 1 the Ornstein-Uhlenbeck process is a Brownian motion since 1 − e^{−2θt} ≈ 2θt.
The auto-correlation between t and t + ∆t is (Eq.15):

corr(νt, νt+∆t) = σ² e^{−θ∆t} ,     (15)

so if θ^{−1} < ∆t, then corr(νt, νt+∆t) ≈ 0, which means that after a time step ∆t the process is uncorrelated. In this study the time step between data points of a time series will be one trading day, so for θ^{−1} ≤ 1 day the error model simplifies to iid noise.
To visualize the typical behavior of each noise model added on top of the LPPLS formula, the 0.8-quantiles of a process νt ∼ N(0, σt) are plotted for each error model in Figure 2, p∼7, where σt is the standard deviation of each error model.

The dark line represents the 0.8-quantile for a Brownian motion (BM); it grows as ∼ √t. The dashed red line is the quantile of the standard Gaussian distribution (GN) and is constant over time. The quantiles (magenta: θ = 1; cyan: θ = 10−1; green: θ = 10−2; blue: θ = 10−3) of the Ornstein-Uhlenbeck (OU) process converge close to t = τ = θ^{−1}, the characteristic time of the process, and before that they grow approximately as ∼ √t.
One can observe that when θ = 1 the Ornstein-Uhlenbeck process is equivalent to Gaussian iid noise and when θ = 0 it becomes a Wiener process.

Figure 2: 0.8-quantile as a function of time for a standard Gaussian distribution (red dashed line; Eq.10), a Brownian motion (black line; Eq.11) and an Ornstein-Uhlenbeck process (θ = 10−3, blue; θ = 10−2, green; θ = 10−1, cyan; θ = 1, magenta; Eq.13), each with σ = 1. Since all the processes are symmetric, the 0.2-quantile is the opposite of the 0.8-quantile plotted here. The Ornstein-Uhlenbeck process has a converging standard deviation after τ = θ^{−1}, which is the characteristic time of the mean-reversing effect.
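The qualitative behaviour shown in Figure 2 can be reproduced directly from the standard deviations of Equations (10), (11) and (14); a minimal sketch, assuming σ = 1 and one trading day per step:

import numpy as np
from scipy.stats import norm

t = np.arange(1, 501)            # time in trading days
z = norm.ppf(0.8)                # 0.8-quantile of the standard normal
sigma = 1.0

q_gn = sigma * z * np.ones_like(t, dtype=float)   # GN: constant (Eq. 10)
q_bm = sigma * np.sqrt(t) * z                     # BM: grows as sqrt(t) (Eq. 11)
for theta in (1e-3, 1e-2, 1e-1, 1.0):             # OU: saturates after tau = 1/theta (Eq. 14)
    q_ou = sigma * np.sqrt((1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)) * z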

2.2.3 Summary of the Error Models


In the previous subsections two predictive models have been defined:
• The LPPLS model which characterizes bubble regimes.
• The geometric model which characterizes growths with no risk of crashes.
Furthermore, a number of different error models, each of which can be overlaid with the two models, were presented in the previous subsection. The combinations are presented in Table 1. Based on the resulting formulas, as these models are simply time series models, synthetic price trajectories can easily be created on a computer using random generation, once values for the involved parameters of each model are set.
The standard Gaussian distribution is not used as a possible error model for the geometric dynamics because it is not realistic. It is used for the LPPLS because the first idea to fit the data is to minimize the OLS, which is equivalent to assuming that the model errors are iid standard Gaussian. In practice nobody uses iid noise instances for a geometric model, whereas the LPPLS+GN model is the one used by the Financial Crash Observatory (FCO) to characterize bubbles.

State | Model name | Model
LPPLS (Bubble) | LPPLS with Standard Gaussian Distribution (LPPLS+GN) | ln pt = lt + σ εt
LPPLS (Bubble) | LPPLS with Brownian Motion (LPPLS+BM) | d ln pt = dlt + σ dWt
LPPLS (Bubble) | Mean-reversal LPPLS (LPPLS+OU) | d ln pt = dlt − θ(ln pt − lt) dt + σ dWt
Geometric (Non-Bubble) | Geometric Brownian Motion (GBM) | d ln pt = µ dt + σ dWt
Geometric (Non-Bubble) | Mean-reversal GBM (MGBM) | d ln pt = µ dt − θ(ln pt − µt) dt + σ dWt

Table 1: The different models used in this paper to describe the log-price dynamic of an asset, where lt = A + |t − tc|^m [B + C cos(ω ln |t − tc| + φ)] is the LPPLS function, εt ∼ N(0,1) are iid noise instances and dWt ∼ N(0, √dt) is the differential of a Wiener process.
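As stated above, synthetic trajectories of the models in Table 1 are straightforward to generate. The following is a minimal Euler-type sketch with dt = 1 trading day; the function name and arguments are illustrative and not the implementation used later in the paper:

import numpy as np

rng = np.random.default_rng(0)

def simulate(model, n, sigma, mu=0.0, theta=0.0, l=None):
    # l: deterministic LPPLS trend l_t (array of length n) for the LPPLS variants
    if model == "LPPLS+GN":                       # ln p_t = l_t + sigma * eps_t
        return l + sigma * rng.standard_normal(n)
    y = np.zeros(n)
    y[0] = l[0] if l is not None else 0.0
    for i in range(1, n):
        dW = rng.standard_normal()                # dW_t ~ N(0, sqrt(dt)) with dt = 1
        if model == "LPPLS+BM":
            y[i] = y[i - 1] + (l[i] - l[i - 1]) + sigma * dW
        elif model == "LPPLS+OU":
            y[i] = y[i - 1] + (l[i] - l[i - 1]) - theta * (y[i - 1] - l[i - 1]) + sigma * dW
        elif model == "GBM":
            y[i] = y[i - 1] + mu + sigma * dW
        elif model == "MGBM":
            y[i] = y[i - 1] + mu - theta * (y[i - 1] - mu * (i - 1)) + sigma * dW
    return y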

2.2.4 DS LPPLS Confidence Bubble Indicator (CI)


A main objective of this paper is to fit the various models and instances presented in the previous sections to price time series, compare their goodness of fit, perform model selection to apply the correct model and thereby ultimately classify time series into different regimes: bubble, crash or non-bubble. For
instance, if the LPPLS model matches best, the corresponding window of price
data that the model was fitted to would likely be a bubble regime. If instead
GBM was the best fit to the price data, the corresponding period would rather
be classified as a regime of ”normal” price growth. In the end, the procedures
and entire framework developed throughout this thesis shall accomplish a reli-
able categorization of the different price regimes by application of the presented
models and other meta-algorithms for model selection and regime prediction.
In order for an LPPLS fit to represent true super-exponential price dynamics, the calibrated values of the parameters need to be in certain ranges: for instance, m ≥ 1 would not result in super-exponential price dynamics and m < 0 gives a super-exponential explosion with diverging price. In order to have super-exponential growth without divergences, we assume m ∈ (0, 1). Therefore, after performing a fit, the resulting parameter values need to be filtered, at first, in order to see whether the fit corresponds to true super-exponential (SE) price dynamics or is instead just a spurious result. Table 2 shows the filter constraints for the different
parameters of the LPPLS model used in this study. As said, since m is the critical exponent, it should be in (0, 1): for m ≥ 1 the singularity in the first derivative disappears and so the phase transition is lost, while for m < 0 the LPPLS function would become singular at tc, which seems unrealistic because the log-price should stay finite. The critical time value tc should be close to the ending time of the fitting window, t2: it can lie up to 3 months in the past or a year in the future (here the time is expressed in trading days). From [4], the angular log-frequency is linked to the fractal structure of a bubble.
Due to the log-periodic oscillations, local maxima appear in the model, separated by shrinking time intervals. Consider two successive local maxima tm2 > tm1 such that ∆t1 > ∆t2, where ∆ti = tc − tmi for i = 1, 2. Then from Equation (5) we know that (Eq.16):

λ ≡ e^{2π/ω} = ∆t1 / ∆t2     (16)
is the ratio of consecutive time intervals. We thus get closer to tc by a factor λ^{−1} after each oscillation. If we set the boundaries of the log-frequency to ω ∈ [4, 25], we obtain 0.2 < λ^{−1} < 0.8. The distance to tc then evolves as follows in the two extreme cases:
• λ^{−1} = 0.2: if we assume that tc = 500 days, where t0 = 0 days is the first local maximum, then after three oscillations we are at ∆t3 = λ^{−3} tc ≈ 4 days before tc. After one more oscillation we are within the 24 hours before the crash occurs.
• λ^{−1} = 0.8: using the same example, after three oscillations we are still 256 days before the crash, and about 15 more oscillations are needed before being within the last ten days before the crash.
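These distances follow directly from Equation (16); a quick numerical check:

import numpy as np

tc = 500.0                                    # critical time in days, first local maximum at t0 = 0
for omega in (4.0, 25.0):
    lam_inv = np.exp(-2.0 * np.pi / omega)    # lambda^{-1} from Eq. (16)
    d3 = lam_inv**3 * tc                      # distance to tc after three oscillations
    print(f"omega = {omega:4.0f}: lambda^-1 = {lam_inv:.2f}, {d3:.0f} days from tc after 3 oscillations")
# omega = 4  -> lambda^-1 ~ 0.21, about 4 days from tc
# omega = 25 -> lambda^-1 ~ 0.78, about 235 days from tc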
To keep the hazard rate positive during a positive bubble (B < 0) or negative
during a negative one (B > 0), the damping D = m|B|/ω|C| is introduced such
that: D ≥ 0.5. The number of oscillations O is defined such that: O ≥ 2.5
if |C|/|B| ≥ 0.05, which means that there should be at least 2.5 oscillations
in the observed fit window to consider the fit as significantly different from a
random motion. The condition |C|/|B| ≥ 0.05 is to ensure that the magnitude
of the considered oscillations is significantly high in relation to the amplitude of
the power law. Otherwise, their frequency would not matter, as price behaviour
would be almost purely dominated by a clear power law pattern (which obviously
is a possibility).
Based on the filtering conditions, we accept valid (reject false) LPPLS fits
as ’qualified’ (’not qualified’), when their parameters lie all inside (at least one
outside) the allowed filtering ranges in Table 2. Let the following be the quali-
fication function for a fit calibrated to price data in the time window [t1 , t2 ]:
qual^LPPLS(t1, t2) = −sign(B)  if all the parameters are within their boundaries,  and 0 otherwise,     (17)

where the sign of qual^LPPLS(t1, t2) is the sign of the observed bubble.
Consider a set of fits¹ with fixed fit window ending time t2 and various fit window starting times t1 ∈ F(t2), where F(t2) is the set of starting times considered for an ending time t2.
Then the set of qualified fits at time t2 is (Eq.18):

Q^LPPLS_{t2} = {qual^LPPLS(t1, t2), ∀t1 ∈ F(t2)} .     (18)
Then, the DS LPPLS Confidence Indicator (CI), developed by Sornette et al. [7], at time t2 is defined by (Eq.19):

CI^LPPLS(t2) = (1 / |Q^LPPLS_{t2}|) Σ_{q ∈ Q^LPPLS_{t2}} q ,     (19)

where the sum is taken over the elements q ∈ Q^LPPLS_{t2}.
In the same spirit, it is possible to compute the Confidence Indicator also for the geometric models (GBM and MGBM) by (Eq.20):

qual^G(t1, t2) = sign µ(t1, t2) ,     (20)

where µ(t1, t2) is the trend fitted to the time interval [t1, t2], and the set of all qualifications of a geometric model at time t2 is Q^G_{t2}. It is then possible to define the CI for the geometric models from Equation (19) and Equation (20) such that (Eq.21):

CI^G(t2) = (1 / |Q^G_{t2}|) Σ_{q ∈ Q^G_{t2}} q ,     (21)

where the sum is taken over the elements q ∈ Q^G_{t2}.
For the geometric case there are no filtering conditions. The idea of these confidence indicators is that if most of the fits are polarised in the same direction, a tendency is observed; otherwise it is too uncertain.
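A minimal sketch of how the qualification of Equation (17) and the confidence indicator of Equation (19) could be computed from a set of calibrated fits; the dictionary keys used to store the fitted parameters are hypothetical:

import numpy as np

def qualify_lppls(fit, t1, t2):
    # Eq. (17): -sign(B) if all filter conditions of Table 2 hold, 0 otherwise.
    # `fit` is assumed to map parameter names to calibrated values (hypothetical keys).
    m, tc, w = fit["m"], fit["tc"], fit["omega"]
    B, C = fit["B"], fit["C"]
    N = t2 - t1 + 1
    ok = (0 < m < 1
          and t2 + max(-60, -N / 2) <= tc <= t2 + min(252, N / 2)
          and 4 <= w <= 25
          and m * abs(B) / (w * abs(C)) >= 0.5)
    if abs(C) / abs(B) >= 0.05:   # only then does the number of oscillations matter
        ok = ok and w / (2 * np.pi) * abs(np.log(abs((tc - t1) / (tc - t2)))) >= 2.5
    return -np.sign(B) if ok else 0.0

def confidence_indicator(fits_by_t1, t2):
    # Eq. (19): average qualification over all starting times t1 in F(t2)
    quals = [qualify_lppls(fit, t1, t2) for t1, fit in fits_by_t1.items()]
    return float(np.mean(quals)) if quals else 0.0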

Parameter | Boundaries | Description | Pre-condition
m | (0, 1) | Power-Law Exponent |
tc | [t2 + δt1, t2 + δt2] | Critical Time |
ω | [4, 25] | Log-Periodic Frequency |
D | [0.5, ∞) | Damping, D = m|B| / (ω|C|) |
O | [2.5, ∞) | Number of Oscillations, O = (ω/2π) ln |(tc − t1)/(tc − t2)| | if |C|/|B| ≥ 0.05

Table 2: Domain of definition of the parameters of the LPPLS model, where t ∈ [t1, t2] for each data point, δt1 = max{−60, −N/2}, δt2 = min{252, N/2}, N is the number of data points fitted and the LPPLS function is defined by lt = A + |t − tc|^m [B + C cos(ω ln |t − tc| + φ)].
¹ The fitting procedure is based on the maximization of the log-likelihood with respect to each error model; it will be explained in more detail in the next section.

2.3 Model Calibration
In this part we construct optimization procedures based on each error model (GN, BM, OU). Such procedures are needed for the LPPLS model since it is a non-linear model with seven parameters, which makes it difficult to calibrate.
To facilitate future studies by future students, a general implementation of all the models used in this paper and of the corresponding optimization procedures has been written.

2.3.1 Maximum Log-Likelihood


Based on the different price and error models, it is possible to fit each model
instance by maximizing the log-likelihood. Thus, at first, the log-likelihood
needs to be derived and then its maximum needs to be computed. As this is
typically an optimization problem that does not have an easy analytical solu-
tion, especially for noise models beyond the simple Gaussian iid formulation, a
numerical optimization procedure for maximization of the log-likelihood has to
be chosen. These steps are described below, starting with the derivation of the
log-likelihood for the different models.
Since each error model represents different stochastic processes, the log-
likelihood (defined in the Appendix D, p∼91) will be different for each model.
From the specifications in Equation (9) and Table 1, we can link the structure
of the residuals for each model to a different normal distribution (Eq.22):

GN: yt − rt ∼ N(0,σ)
BM: d(yt − rt ) ∼ N(0,σdt) (22)
OU: d(yt − rt ) + θ(yt − rt )dt ∼ N(0,σdt) ,

where yt = ln pt. Based on these distribution formulations, the log-likelihood is then derived.
Here rt is the underlying model (LPPLS or geometric) that we are going to fit (see Eq.9) with respect to each error model. The seven LPPLS parameters, or the trend of the geometric model, are thus contained in rt, and Equation (22) shows how we access these values by optimization.
Assume a sample of log-prices yi = y_{ti} with ti+1 − ti = ∆t, ∆t = 1 and i = 1, .., N. The cost function (the value of σ² which maximizes the log-likelihood) for each error model is then given in the following (Eq.23):

GN: σ̂²(Ψ) = (1/N) Σ_{i=1}^{N} (yi − ri(Ψ))² = SSE(Ψ)

BM: σ̂²(Ψ) = (1/(N−1)) Σ_{i=2}^{N} (∆yi − ∆ri(Ψ))²

OU: σ̂²(Ψ; θ) = (θ²/(N−1)) Σ_{i=2}^{N} (yi − ri(Ψ))² + (2θ/(N−1)) Σ_{i=2}^{N} (yi − ri(Ψ))(∆yi − ∆ri(Ψ)) + (1/(N−1)) Σ_{i=2}^{N} (∆yi − ∆ri(Ψ))² ,     (23)

where ∆yi = yi − yi−1, ∆ri = ri − ri−1 and Ψ is the set of parameters of the predictive part rt of the log-price. For the special case of a standard Gaussian distribution, the cost function is the Sum of Squared Errors (SSE). In each case the log-likelihood is defined by L ∝ −ln σ̂² (see Appendix D, p∼91).
For the geometric dynamics (rt = µt and Ψ = {µ}), the maximization of the log-likelihood reduces to solving a set of linear equations in the GBM case and to finding the minimum of a polynomial of degree 4 in the MGBM case.
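To make Equation (23) concrete, the three cost functions can be written compactly as follows (a minimal sketch, with y the log-price and r the evaluated predictive model over the same window):

import numpy as np

def cost_gn(y, r):
    # Eq. (23), GN: mean squared residual, i.e. the SSE cost
    return np.mean((y - r) ** 2)

def cost_bm(y, r):
    # Eq. (23), BM: mean squared increment of the residual
    d = np.diff(y) - np.diff(r)
    return np.mean(d ** 2)

def cost_ou(y, r, theta):
    # Eq. (23), OU: quadratic form in theta built from residuals and their increments
    e = (y - r)[1:]
    d = np.diff(y) - np.diff(r)
    return np.mean(theta**2 * e**2 + 2.0 * theta * e * d + d**2)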

2.3.2 Optimization Procedure


For the LPPLS model (rt = lt ), the maximization of the log-likelihood is a com-
plex task due to the non-linearity of the model and resulting cost function. Thus,
in this section, the typical approach of calibrating the parameters by splitting
the optimization problem into a linear and a non-linear part, is presented.
The set of parameters of the LPPLS model Ψ = {A, B, C, m, tc , ω, φ} can be
decomposed into two subsets:
• Linear parameters: Ψl = (A, B, C1, C2) (where C1 = C cos φ and C2 = −C sin φ in Equation (5));
• Non-linear parameters: Ψn = (m, tc, ω).
So it is possible to write the LPPLS model as (Eq.24):

lt(Ψl, Ψn) = Lt(Ψn) · Ψl ,     (24)

where a · b denotes the Euclidean scalar product.


The main idea used by the Financial Crash Observatory (FCO) is to split the
optimization of the cost function into a two-step procedure, where based on
different initial values of the nonlinear parameters, the values of the linear pa-
rameters are found using the solution of a linear set of equations. Different
initial values for the three nonlinear parameters are supplied in a grid search.
Starting from a pre-selected set of points on the grid, a nonlinear optimizer finds
the optimal solution for the nonlinear parameters by iteratively computing the

solution for the linear parameters. This procedure is outlined in more detail
below.
Denote by G = M × Tc × Ω the allowed search domains for each of the nonlinear parameters Ψn (given in Table 2, p∼10). Then, the search grid of initial values is defined as (Eq.25):

G_N := M_N × Tc_N × Ω_N ,     (25)
where the index N denotes that the spaces have been discretized into N linearly
spaced values (in this paper we used N = 10).
Given a set of values for the nonlinear parameters, the goal of the linear optimization is to solve (Eq.26):

∀Ψn^i ∈ G_N :  Ψl^i = argmin_{Ψl} σ̂²(Ψl, Ψn^i) ,     (26)

for each grid point Ψn^i ∈ G_N. The non-linear optimization is then done on all the points of the grid, where the non-linear parameters Ψn^i are used as guesses and the linear ones are kept constant. The final solution is the one which maximizes the log-likelihood.
The approach used in this paper is slightly different. To optimize the speed of the procedure and to be able to use a finer grid, the non-linear optimization is not done on all the guesses obtained after the linear one.
The first step, the non-linear grid search, stays the same. After the linear optimization we have a set {Ψ* = (Ψl, Ψn) | ∀Ψn ∈ G_N}. We then construct the map F : G_N → [0, +∞) defined by (Eq.27):

∀Ψn ∈ G_N :  F(Ψn) = min_{Ψl} σ̂²(Ψl, Ψn) .     (27)

The goal at this point is to extract what we will call the five "best guesses" (the number of best guesses is a hyper-parameter of the procedure, but in this paper we always search for the five best). The idea is not simply to take the five arguments in G_N which minimize F, because the five best minima could be neighbours, which would lead the non-linear optimization to search for solutions in the same neighbourhood. The goal is to give a small number of guesses to the non-linear optimization, to obtain a faster procedure (since the non-linear optimization is the most time-consuming part), but to obtain a good sample of guesses they need to lie in different regions of the grid.
To obtain good local minima of F, the scipy.signal.argrelextrema method is used. The idea is as follows (a minimal sketch of this extraction is given after this paragraph):
1. Use scipy.signal.argrelextrema to obtain a first set G_N^0 ⊂ G_N of local minima of F.
2. While |G_N^n| > 5:
(a) Define a new map F^n which is the same as F but restricted to G_N^n.
(b) Apply step 1 to F^n to obtain the subset G_N^{n+1} ⊂ G_N^n ⊂ G_N.
3. Once a set G_N^n has been found:
(a) If |G_N^n| = 5: return G_N^n.
(b) Else: complete G_N^n with the number of missing minima (5 − |G_N^n|), which are the best minima in G_N^{n−1} \ G_N^n, and return the completed set G_N^n.
The non-linear optimization is then no longer done on G_N but on G_N^n, and the set of guesses given to the non-linear optimization is {Ψ* = (Ψl, Ψn) | ∀Ψn ∈ G_N^n}. From the non-linear optimization we obtain five new non-linear guesses. For each of these new non-linear guesses we apply the linear optimization again, and we keep only the parameter set which maximizes the log-likelihood.
During the construction of this new procedure, it was compared to the original one and both gave the same kind of results, but the new optimization is faster since it only applies the non-linear optimization five times.
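A minimal one-dimensional sketch of this iterative extraction of the five best guesses (the actual grid is three-dimensional over (m, tc, ω)):

import numpy as np
from scipy.signal import argrelextrema

def best_guesses(F_values, grid_points, n_keep=5):
    # Thin the grid-search cost values with argrelextrema until at most n_keep
    # local minima remain, then pad with the best points of the previous set.
    idx = np.arange(len(F_values))
    prev = idx
    while len(idx) > n_keep:
        local = argrelextrema(F_values[idx], np.less_equal)[0]
        if len(local) == 0 or len(local) == len(idx):
            break
        prev, idx = idx, idx[local]
    if len(idx) < n_keep:
        rest = np.setdiff1d(prev, idx)
        extra = rest[np.argsort(F_values[rest])][: n_keep - len(idx)]
        idx = np.concatenate([idx, extra])
    return [grid_points[i] for i in idx[:n_keep]]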
To optimize the linear parameters, we use the chain rule to decompose the problem, such that the first-order condition on the cost function takes the form (Eq.28):

∂σ̂²/∂Ψl |_{Ψl = Ψ̂l} = Σ_i (∂σ̂²/∂li)(∂li/∂Ψl) |_{Ψl = Ψ̂l} = 0  ⇒  Σ_i (∂σ̂²/∂li) |_{Ψl = Ψ̂l} Li(Ψn) = 0 .     (28)

Equation (28) reduces to a linear system for the GN and BM error models, but in the OU case, due to the terms in θ and θ² in Equation (23), the problem reduces to the resolution of a system of polynomials of degree 3.
So OU is a special case compared to GN and BM. The first idea was to apply a non-linear optimization to all the parameters at the same time, but it gave bad results. We then kept the decomposition into two different optimizations, where the linear parameters and θ were optimized together non-linearly, but this did not work well either. The final idea, and the one used in this paper to optimize the linear parameters and θ, is the following (a minimal sketch is given after this list):
1. Solve the linear system to minimize the SSE and obtain a guess Ψl^0.
2. Solve the polynomial of degree 2 in θ (Eq.23) to get a guess θ*.
3. Consider θ = θ* constant and solve the linear system in Ψl to obtain the solution for the linear parameters (Ψl*, θ*).
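A minimal sketch of these three steps, assuming the linear LPPLS terms are stored in a design matrix L (the closed form for θ* in step 2 follows from setting the θ-derivative of Equation (23) to zero):

import numpy as np

def linear_and_theta(y, L):
    # 1. OLS guess for the linear parameters (minimizes the SSE / GN cost)
    psi_l, *_ = np.linalg.lstsq(L, y, rcond=None)
    # 2. The OU cost (Eq. 23) is quadratic in theta, so its minimizer is explicit
    e = (y - L @ psi_l)[1:]
    d = np.diff(y) - np.diff(L @ psi_l)
    theta = -np.sum(e * d) / np.sum(e**2)
    # 3. Keep theta fixed and re-solve for psi_l from the OU residual
    #    d(y - r) + theta * (y - r) dt, which is again linear in psi_l
    z = np.diff(y) + theta * y[1:]
    M = np.diff(L, axis=0) + theta * L[1:]
    psi_l, *_ = np.linalg.lstsq(M, z, rcond=None)
    return psi_l, theta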

To optimize the non-linear parameters, the Newton conjugate-gradient method is used to maximize the log-likelihood, and Lagrange regularization is applied to keep Ψn ∈ G. The Python implementation of the non-linear optimization is based on the scipy.optimize library.

2.3.3 Implementation
In Section 2.2.1 we defined the LPPLS and the geometric models of price growth, and in Section 2.2.2 we defined three error models (GN, BM and OU, see Table 1, p∼8). Based on each error model we defined an optimization procedure. Here, we outline the actual computer implementation of the optimization procedure.
The goal of the implementation was to be as general as possible. The main objective was to write code which can be used easily by future students. We know that many people working on the model always use their own code; to facilitate future studies, the code has been written such that everyone can quickly start experimenting with it.
The main idea was to deconstruct the models into four main categories:
• The predictive models: they characterize the predictive part of a model (see Eq.9), such as the LPPLS model.
• The optimization procedure: here the grid search has been implemented such that it is possible to use the original grid-search method or the one used in this paper (see Section 2.3.2, p∼12).
• The cost functions: they define the different cost functions (Eq.23).
• The stochastic processes: they define the different error models.

[Figure 3 (class-hierarchy diagram): the abstract parent class Model has the sub-classes Optimization and Stochastic. Optimization is the parent of the predictive models (Predictive, PL, LPPLS) and of the cost functions and fitting methods (GN_cost, BM_cost, OU_cost, LogLik_GN, LogLik_BM, LogLik_OU); Stochastic is the parent of the error models (GN, BM, OU). A predictive model combined with a cost function and an error model through Ito_model gives a full model with its optimization procedure.]
Figure 3: Map of the class hierarchies of the implementation of the different models and the corresponding optimization procedures. Model is the parent class of all the others. Optimization is the parent class of the optimization procedures and cost functions (blue part) and of the different predictive models (red part). To obtain a complete predictive model, a blue class and a red one are combined. The stochastic classes represent the different error models. A predictive model (purple) is combined with a stochastic one (green part) to obtain a full model with a predictive and a stochastic part and a corresponding optimization procedure.

The goal of such a construction is to make it possible to extend the code easily by implementing new models (for example variants of the LPPLS model) without needing to re-implement the optimization procedure and the error models. Likewise, it is possible to implement new error models with the associated cost functions without rewriting the optimization procedure and the predictive model. It is also possible to adapt the optimization procedure through sub-classes.
The idea is to assign the different parts of the optimization problem to different programming classes that are all sub-classes of an abstract class Model. Figure 3 shows the class hierarchies used to construct a model. Appendix H contains Table 9, p∼95, which summarizes the functions of each class, together with a copy of the code.
In Figure 3 it is possible to observe that there are three direct sub-classes of the abstract class Model:
• Optimization: the parent class of two different parts of the problem:
– Predictive: defines the grid-search procedure and combines the linear and non-linear optimizations. It is also the parent class of a predictive model. For this project the only predictive model implemented and used is the LPPLS model, but the power law (PL) has been implemented as the mother class of the LPPLS one.
– GN_cost: defines the OLS cost function and the general structure of the linear optimization part. It is the mother class of all the other cost functions.
• Stochastic: defines the general form of an error model; all its sub-classes are the error models. Each error model has been implemented such that it can consider a linear drift, which leads to the GBM model for the BM error model and to the MGBM for the OU one.
• Ito_model: combines a sub-class which is a combination of a predictive model and a cost function with an error model to give a full stochastic model.
Table 9, p∼95, goes deeper into the subtleties of each class, such as the main methods that they define.
All the classes are abstract except GN, BM, OU and Ito_model. To construct a non-abstract Predictive class X for a model Y (for example the LPPLS model), Y needs to be combined with a cost class Z (for example GN_cost) by defining class X(Y, Z), so that X is a subclass of both Y and Z.
Using these classes, each model of Table 1 can be defined with the following steps:

from model import LPPLS, LogLik_GN, LogLik_BM, LogLik_OU, GN, BM, OU, Ito_model

class LPPLS_GN(LPPLS, LogLik_GN): pass   # LPPLS model with GN optimization class
class LPPLS_BM(LPPLS, LogLik_BM): pass   # LPPLS model with BM optimization class
class LPPLS_OU(LPPLS, LogLik_OU): pass   # LPPLS model with OU optimization class

lppls_gn = Ito_model(LPPLS_GN(), GN())   # LPPLS+GN model instance
lppls_bm = Ito_model(LPPLS_BM(), BM())   # LPPLS+BM model instance
lppls_ou = Ito_model(LPPLS_OU(), OU())   # LPPLS+OU model instance
gbm = BM()                               # GBM model instance
mgbm = OU()                              # MGBM model instance

The five model instances are thus programmed. To fit a log-price time series ln p, different hyper-parameters are considered. For a Predictive instance (or an Ito_model one) there are seven hyper-parameters:
• non_linear_bounds: the boundaries of the non-linear parameters of a Predictive model (it can be None).
• method: defines the optimization method to use. Although everything has been implemented to use the Newton conjugate-gradient method, other optimization methods can be used (for example the Nelder-Mead method and the other methods pre-implemented in the scipy.optimize.minimize method).
• tol / options: see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
• show: a boolean to choose whether the final results are printed on the terminal at the end of the procedure or not.
• n_to_extract: the number of guesses to extract after the grid search. If it is set to None, the optimization procedure is the one from the FCO.
• measure_time: a boolean to choose whether the time used by the optimization procedure is measured or not.
The following code shows how to fit a time series y, with corresponding time t, under an Ito_model and under the GBM model:

# y: the time series
# t: the time
# ito: an Ito_model
# gbm: the GBM model
# bnds: boundaries of the non-linear parameters of the Ito_model

''' Ito_model fitting application '''
arg = {'non_linear_bounds': bnds, 'method': 'Newton-CG', 'measure_time': True}
linear, non_linear, para, log_lik, tau = ito.fit(y, t, arg)
# arg: hyper-parameters of the optimization procedure
# linear, non_linear: parameters of the predictable model
# para: parameters of the error model
# log_lik: maximum log-likelihood obtained after optimization
# tau: time used by the procedure

''' GBM fitting application '''
para, log_lik = gbm.fit(y, t, parameters=[None, None])
# para: parameters of the GBM model
# log_lik: maximum log-likelihood after optimization

The reason for the argument parameters = [None, None] in the GBM case is the difference between a pure BM and the GBM model. If the parameters are set to parameters = [0, None], the fit is not done with respect to the GBM model but with respect to a pure BM. This is because the argument parameters represents the parameters to keep constant during the optimization; if an entry is set to None, the corresponding parameter is optimized. For BM the convention is parameters = [µ, σ] and for OU it is parameters = [µ, θ, σ] (see Table 1).
The code of the model classes is presented in Appendix H, p∼94.
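For instance, following this convention, the same series can be fitted as a pure Brownian motion (drift held fixed at zero) by re-using the gbm instance from the listing above:

''' pure BM fitting application (µ fixed to 0) '''
para, log_lik = gbm.fit(y, t, parameters=[0, None])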

Five different models have thus been defined and an optimization procedure has been developed for each of them. Before applying the models to real time series, let us first test the models and the optimization procedures on synthetic data.

2.4 Synthetic Data


2.4.1 Motivation: Estimator Validation
As an initial test, the implementation of the optimization procedure was applied to randomly generated synthetic price trajectories. This is a com-
mon way to test the correctness of employed estimation procedures and features
of the optimizer, before proceeding to real world data, where the underlying pa-
rameters are unknown. More specifically, as in the synthetic data setting, we
have the benefit of knowing the true values of involved model parameters be-
forehand, we can test how well it is possible to recover these parameters by
means of the implemented optimization procedure. Furthermore, another goal
of this section will be to generate LPPLS and OU instances separately and find
out how accurately the optimization algorithm can recover the LPPLS signal
when we increase the amplitude (σ) of the error part of the model.
In order to evaluate the quality of the optimization procedure, we consider
several important metrics:
• The R-squared of the power-law in a log-log plot (R2 ). The goal is to
observe how well we are able to recover the power-law part of the LPPLS
model (see Appendix E, p∼91).
• The maximum log-likelihood of the fits. The goal is to observe the trend of the log-likelihood as a function of the amplitude of the noise (σ) and of the number of data points fitted (N), to compare it with the theory (see
Appendix D, p∼91).
• The entropy of the log-periodic part in a log-scale (see Appendix E, p∼91).
The goal of this test is to see if we are able to characterize how well we
recover the log-periodicity of the model.
• The SSE (which has been defined in Section 2.3.1, p∼11) to compare the
different optimization procedures based on each error model.

• The absolute relative error of each parameter ψ, |(ψ − ψ0)/ψ0|, where ψ0 is the generated parameter. This test will help us to estimate how well we are able to recover each parameter of the LPPLS model.

• The deviation between the observed log-price five days in the future and its prediction, |(ln p̂i+5 − ln pi+5)/ln pi+5|, where i is the index of the last data point of the fitting window, ln p̂i+5 is the prediction for the fifth data point in the future and ln pi+5 is the observed log-price. This last test observes how well each model is able to predict the data point five days in the future. We chose the fifth point in the future because, when we apply the models to trading strategies, we will adapt our strategy every five trading days.
Each metric is going to be computed as a function of three different parameters:

• The distance from the original critical time (tc0 − t2 ).


• The number of data points fitted (N = t2 − t1 + 1).
• The amplitude of the noise (σ).

2.4.2 Synthetic Data Generation


In the following, the procedure of generating artificial, random price trajectories
based on given parameter values for each of the model variants (see Section 2.2,
p∼3) is described. Three main hyper-parameters determine the procedure:
• The sample size N. The sample size also determines the starting and ending times of the simulation window, tstart = 0 and tend = N − 1 (not to be confused with t1 and t2, which are respectively the starting and ending times of the fitting windows, whereas tstart and tend are respectively the starting and ending times of the generated time series). Since N = tend + 1, we scanned over:

tend = 60, 100, 250, 375, 500 days.

• The rate of mean-reversion θ. For θ = 0, the OU process simplifies to a BM and for θ ≥ 1 it becomes GN (see Section 2.2.2, p∼5). Using this result, we only simulate the error models using the OU process, since BM and GN can be seen as special cases of the OU. We scanned over:

θ = 0, 0.03, 0.1, 1, 1.5 days−1.

• The volatility σ, such that:

σ = 0.001, 0.003, 0.01, 0.03, 0.1 days−1/2 .

We thus have 5 different window sizes N, 5 different mean-reversion rates θ and 5 different values of the volatility σ. Combining all possible values of the hyper-parameters into triplets yields 5 × 5 × 5 = 125 different configurations. For each configuration, 5 pure LPPLS trends are generated and each one is combined with
Parameter | Boundaries
A | [0, 6]
B | [−6 min{tc, t2}^{−m}, 6 min{tc, t2}^{−m}]
C | [−(2m/ω)|B|, (2m/ω)|B|]
m | [0, 1]
tc | [max{50, 3t2/4}, 5t2/4]
ω | [ω0, 25]

Table 3: Boundaries of the LPPLS parameters (where the LPPLS function is defined by lt = A + |t − tc|^m [B + C1 cos(ω ln |t − tc|) + C2 sin(ω ln |t − tc|)]) used for the experiments. ω0 = 4 if tc ≤ t2, otherwise ω0 = min{25, 5π/ ln(tc/(tc − t2))}, and C² = C1² + C2². The boundaries have been chosen to get a synthetic log-price in [−3, 9] and to respect the domain of definition of each parameter (Table 2).

five different OU instances. This results in 5 × 5 = 25 different time series for a single configuration (N, θ, σ), which yields 25 × 125 = 3125 different time series generated in total.
Thus, we choose the sample size, as well as the noise-model parameters, as hyper-parameters of the procedure. The actual model parameters of the LPPLS trajectories are generated randomly within the allowed boundaries given in Table 3, p∼20.
The idea is to fit each of these 3125 time series with respect to LPPLS+GN, LPPLS+BM and LPPLS+OU; the goal is to compare the different optimization procedures.
In addition to the hyper-parameters, further studies are done based on the windows used to fit the data:

• t2 are the different ending times considered by the optimization procedure: integers linearly spaced between (5/6) tend and tend, where the number of linearly spaced t2 chosen for a given time series is Nt2 = 5.
• Nwindow = 5, which means that five different values of t1, linearly spaced between 0 and t2 − 20, are used, such that the optimization procedure is applied to the time series on the time interval [t1, t2].

So for each time series 25 fits are done, which leads to a total of 25×3125 = 78125
fits to test the model.
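As a sanity check on these counts, the full experimental design can be enumerated directly (a minimal sketch; the variable names are illustrative):

import itertools
import numpy as np

t_ends = [60, 100, 250, 375, 500]           # simulation ending times (N = t_end + 1)
thetas = [0, 0.03, 0.1, 1, 1.5]             # mean-reversion rates
sigmas = [0.001, 0.003, 0.01, 0.03, 0.1]    # noise amplitudes

n_series, n_fits = 0, 0
for t_end, theta, sigma in itertools.product(t_ends, thetas, sigmas):
    for _ in range(5 * 5):                  # 5 LPPLS trends x 5 OU instances per configuration
        n_series += 1
        for t2 in np.linspace(5 / 6 * t_end, t_end, 5).astype(int):
            for t1 in np.linspace(0, t2 - 20, 5).astype(int):
                n_fits += 1                 # one fit on the window [t1, t2]

print(n_series, n_fits)                     # 3125 series and 78125 fit windows in total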

2.4.3 Test Results


The main results are presented in Appendix I, p∼123 (from Figure 23 to
Figure 36). Each figure considers one metric. All the figures are
composed of three panels:
• The upper one shows the metric as a function of the noise amplitude σ.
• The center one shows the metric as a function of the number of data points fitted (N).
• The lower panel plots the metric as a function of the distance between
the ending time of the fitting window and the true critical time (tc0 − t2 ).
They are also composed of three columns for the results of the different error
models (LPPLS+GN, LPPLS+BM, LPPLS+OU). From Figure 23 to Figure

29, the results for each metric as a function of each parameter are plotted for
θ = 0.03. The shaded area represents the interval between the 0.2-quantile and
the 0.8-quantile, and the dark line inside it is the median. Similarly, Figure 30 to
Figure 36 plot the 0.2-quantiles of the metrics for the different values of θ.
Below, we describe and comment on the results of the different tests. Each
metric is discussed separately here, and the main observations are then
summarized in a conclusion (see Section 2.4.4, p∼23).

Linear parameters
Figure 23 shows how the relative error for the parameter A changes as a function
of the noise amplitude σ, the number of data points fitted N and the
distance to the true critical time. The median and the 0.2-quantile for
each case (GN, BM and OU) are similar as a function of σ. So on average they
all recover the parameter equally well about half of the time, but the BM
optimization gives rise to much larger deviations. In the best case, for σ = 10−2,
only 20% of the results for all the models give an absolute relative error
below 0.1 (and for σ = 0.001 it goes down to 0.03). The results as a function of the
number of data points are similar for each model and seem independent
of N, even if the error seems to increase for N < 100 data points. When the ending
time of the fitting window approaches the critical time, the relative error increases
suddenly, and the best moment to obtain a good estimate of the parameter
A is between 1 and 3 days before tc . Figure 30 is equivalent to Figure 23, but it
shows only the 0.2-quantiles for different values of the mean-reverting rate θ. In
general, GN gives a better estimation of A for θ ≥ 1, which was predictable since
for θ ≥ 1 the error model is equivalent to a standard Gaussian distribution.
On the other hand, BM gives the opposite result: it is for θ < 1 that it gives
a better estimation of A. Surprisingly, the 0.2-quantile of the relative error
seems to decrease with the number of data points for each model. θ seems
not to affect the results as a function of the distance to the true tc . The results
for the other linear parameters (B, C1 and C2 ) are equivalent to those for A and give
rise to the same observations.

Non-linear parameters
Figure 24 shows the absolute relative error on tc for each test. The results
show that each model recovers the critical time equally well. For all σ, 20% of the
results give tc with a relative error smaller than 0.1. The results seem to be
invariant with respect to the number of data points. The error decreases when t2 gets
closer than about three days to the true tc and increases when t2 reaches tc . Figure
31 shows the 0.2-quantiles for each test for the different mean-reversion rates
θ. The results are similar to those for the linear parameter A, but here the relative
error does not seem to be influenced by the number of data points, and the
0.2-quantiles decrease when t2 gets closer to tc .

Log-likelihood
In Figure 25 we observe that the maximum log-likelihood seems
to be linear as a function of ln σ for each model, as predicted by the
formula of the log-likelihood (see Appendix D, p∼91), since L ∝ −N ln σ.
When σ decreases, the maximum log-likelihood seems to saturate, which shows a
tendency to overestimate the noise amplitude when it is low (i.e. σ < 0.01).
It also increases linearly in a log-log plot as a function of the number of data points, which is
again predicted by the formula. These two results show that the optimization
procedure is well implemented, because the maximum log-likelihood evolves
as predicted with N and σ. The maximum log-likelihood decreases when
t2 gets closer to tc , which could seem strange but shows that the critical
behavior of the model is the most difficult one to recover. Figure 32 shows that
the 0.2-quantile of the log-likelihood is independent of θ when plotted as a
function of N and tc0 − t2 ; as a function of σ, however, GN recovers tc better for θ ≥ 1,
BM for θ < 1, and OU is essentially invariant under θ. This demonstrates
the usefulness of considering different error models, since each catches the right
features in the special case where it characterizes the stochastic part of the
time series (GN: θ ≥ 1, BM: θ < 1 and OU: for all θ).

SSE
Figure 26 shows that for GN the SSE is linear with respect to σ² when
σ ≥ 0.01, as predicted by the theory (see Eq.23). For σ < 0.01, the
median tends to be constant and the quantiles move away from the median.
This shows that the GN optimization has a tendency to overestimate σ
when it gets lower than 0.01. The results are equivalent for all the models,
even though the cost functions of the BM and OU optimizations are different;
all the optimization procedures thus effectively minimize the SSE. In
Figure 33, the SSE is independent of θ for all the tests, which is not a good
sign because it indicates that the optimization procedure has a tendency to overfit
the data. For the 0.2-quantile of the OU optimization, however, for small noise
amplitudes (σ ≤ 10−2 ) the results differ across θ: the SSE is smaller for
large θ and gets bigger when θ decreases, which is a good thing since for θ ≥ 1
the main part of the cost function of the OU optimization is the SSE (see Eq.23).

R2
The R² is a good indicator of whether we overfit the data, since for a
high value of σ the LPPLS signal should be lost, the power-law should
disappear and the R² should drop to a value close to zero. Figure 27 shows
that GN has a tendency to overfit the data because for σ = 0.1 it keeps an
R² > 0.8 in 20% of the cases, and the same holds for OU. For BM, 50% of the results give an
R² > 0.9 for σ = 10−2 , and as σ increases the power-law is lost. The R² is
independent of the number of data points, but BM shows larger deviations, also
for the test as a function of tc0 − t2 . In Figure 34, we observe that
for θ ≥ 1, GN does not recover a power-law when σ is large enough (i.e. σ = 0.1),
which means that it does not overfit the data, but for θ < 1 it does. The BM
optimization does not recover the power-law at all for θ ≥ 1, which is a good
thing since it should only work when the error model generates a deviation from
the trend. For θ < 1 it succeeds in recovering the power-law for σ = 10−2 and
loses the signal when σ increases. So when the mean-reversion rate is
low enough, the BM optimization succeeds in recovering the power-law part of the
LPPLS signal. The OU optimization recovers the power-law for all values of
θ when σ = 10−2 , and the 0.2-quantile of the R² tends to zero when σ increases.

Entropy
Based on Figure 28, the entropy is invariant as a function of σ and increases
with the number of data points. When t2 gets closer to tc it decreases and
reaches its best value between one and three days before tc . From Figure
35, the GN optimization has a lower entropy when θ ≥ 1, for all σ. For the BM
one, the results are the same when σ ≤ 0.01, but when σ > 0.01 and θ ≥ 1
the entropy increases, whereas it stays constant in the other cases. The OU
optimization gives the same results as GN for θ < 1 and the same as BM
for θ ≥ 1.

Deviation
Figure 29 shows that the deviation between the five-day-ahead prediction
and the observed value is larger for the OU optimization than for GN and BM
by one order of magnitude, whereas BM and GN give equivalent results. The reason
why the OU optimization gives such bad predictions could be that the measured
LPPLS trend lies below (or above) the time series where it should be the opposite,
so that it predicts a mean-reversion effect in the opposite direction to the one
observed.

2.4.4 Conclusion
In the previous section we observed differences and similarities between the
different optimization methods (GN, BM and OU). Here we summarize the
main results:
• On average, at least 20% of the fits recover the parameters with
a relative error smaller than 10−1 ;
• All the models give equivalent results for the relative errors;
• The best moment to estimate the critical time is between 1 and 3 days
before tc ;
• The optimization procedure is well implemented, since the results for the
maximum log-likelihood coincide with the theory;
• The SSE and the R² show that we are not protected against overfitting the
data, but each error model catches different features of the time series, and
within its domain of validity it is less prone to overfit;
• The entropy does not capture well the periodicity of the log-periodic part
on a log-scale, but on average it decreases when t2 gets closer to tc ;
• On average, all models give a good prediction five days into the future.
The worst prediction is given by the OU optimization, which is an order
of magnitude above the other models.
Overall, from the tests and insights above, an important conclusion can be
drawn. Given the assumption that the synthetic data trajectories used in the
experiments of this chapter appropriately reflect the statistical properties and
stylized features of real-world data, we come to the following conclusion. Fitting
the LPPLS to a real price time series that is in a bubble state, i.e. obeys LPPLS
dynamics, we should expect to obtain a small set of qualified fits far from tc and
the size of this set should increase when we get closer to tc . Nevertheless, when

calibrating the different LPPLS error models, we will often obtain similarly
good results in terms of log-likelihood and the other evaluated metrics. Thus,
distinguishing between the exact type of model may become quite difficult, as
the fits are prone to overfitting. Nevertheless, we can still differentiate between
GBM and LPPLS movements sufficiently reliably. These tests helped greatly
to be aware of these risks in the development of the further content and the
trading strategy later on in this work, as well as to check the correctness and
accuracy of the implemented procedures.

3 Trading Strategies for Regime Classification
3.1 Motivation
In the previous chapters, we defined various models for price dynamics, derived
calibration procedures to fit them to real market data and performed an insight-
ful test on synthetic data to check different properties of the data and involved
procedures. Next, we develop trading strategies based on the various models. A
main question that we study and follow throughout this thesis is to determine
whether the success of each trading strategy, measured by various strategy per-
formance metrics, can help in validation or rejection of the model underlying
the strategy. Therefore, the idea is to directly relate the performance of a strat-
egy to the correctness of the underlying model; if a model-based strategy has
outperformed many strategies based on other models throughout a certain time
frame, it must have accurately described and predicted the market dynamics
during this period of time. Given such a relation exists, as a consequence, we
will be able to determine what model dynamics govern the real price time series
over different periods / regimes. This fulfills an important task; identification
and classification of different market regimes, as well as the detection of regime
changes. To outline the big picture of this thesis: while at first, the model-
based trading strategies will be simulated over various price series to classify
and label them into different regimes; later on, the labelled price series will
be used to train a map (such as a neural network) which will then learn to pre-
dict regimes and regime changes based on a model-free (in the sense that the
employed machine learning models are seen as ”black boxes”) view. Overall,
we thus have two ”agents”; one model-based agent, which classifies states of
the world according to underlying theoretical assumptions (such as the LPPLS
model). We refer to this agent as the ”consultant”, while the second agent is the
model-free predictive agent, which we will refer to as the ”trader”, as he will be
able to foresee imminent regime changes and identify the current (hidden) state
of the market which allows him to make informed trading decisions based on
these predictions. In the following, the step-by-step development of the various
trading strategies is presented.
Trading strategies fundamentally aim at transforming available raw data into
applicable trading signals via some nonlinear operation, with the goal of max-
imizing the trading performance that results from making trading decisions
based on the generated trading signals. Essentially, a complete trading signal /
decision should cover the following information:
• Type: buy / sell / hold
• Time: time of execution
• Size: transaction volume
Obviously, as there are many different order types and mechanics, the list above
is more complex in reality, but here, only the most basic ”ingredients” of a
trading signal are covered.
In the following section, we will derive a way to select the order size for different
strategies at any point in time. The choice of this so-called ”bet size” is based
on the Kelly Criterion [5] which provides an optimal, dominant strategy for
selecting the bet size that maximizes expected long-term growth, given the

underlying assumptions are met. The approach will enable us to compute the
optimal fraction of capital allocated to an asset investment at any point in
time. As we proceed in discrete intervals of time of fixed size, the trading times
are automatically constrained by construction. Furthermore, the transaction
volume is determined by the change of the bet size from one time interval to
the next one. Similarly, the type of the order is determined by the sign of the
change of the bet size. Thus, a great, inherent advantage of the usage of the
Kelly method is already evident; the repetitive computation of a single number
already completely provides all minimum information about the trading signal
(according to the list above) that is required to simulate the strategy. In the
following sections, firstly, the classical Kelly strategy results are derived. Then,
a new concept that we call the ”quenched Kelly criterion” is introduced and the
strategies are derived.

3.2 The Kelly Criterion


In this section we will define the notion of Kelly criterion and a general mathe-
matical framework to optimize a trading strategy when we assume that the price
follows a predefined process.
In a two-component portfolio, consisting of a risk-free (such as a bond) and a
risky asset (such as a stock), the bet size or investment fraction λt at time t is
defined as the fraction of total available capital (i.e. total portfolio value) cur-
rently invested in the risky asset [5]. It is this quantity that, as explained above,
determines the trading decision at time t. A series of values for λt estimated
over time thus produces a trading strategy that can be simply represented by
the corresponding series of investment decisions (Eq.29):

ΛT := {λt }T , where: T := [ts , te ] (29)


is the time window of simulation of the strategy, where ts is the starting time
of the simulation and te its ending time. The exact procedure for repetitively
estimating the values of λt over time is described in detail in the next section,
while here, at first, we derive a mathematical expression for λt based on the
Kelly criterion.
Let us consider a portfolio with discounted value V at time t, its value at time
t0 = t + ∆t > t is given by (Eq.30):

Vt+∆t = Vt (1 + λt rt + (1 − λt )rf t ) , (30)


where rt is the return of the risky asset on [t, t + ∆t] and rf t the return of the
risk-free asset (such as a government bond) over the same interval. In other
words, λt is the weight to invest in the asset from t to t + ∆t. In a self-financing
strategy, λt ∈ [−1, 1] (see below for further details). From here on, we set
rf t = 0 for all time t, because a non-zero return of the risk-free asset would
simply bias the computations, and instead, we would like to obtain results that
isolate the pure trading performance based on investing in the risky asset.
Let P (rt ) be a probability measure that provides the cumulative distribution
function (CDF) of the return r of the asset such that r is a random variable
defined on the probability space (Ω, F, P ) (it could be empirical or theoretical).
Given that rt , the potential return (or loss) of investing into the risky asset, is
a random variable, the question that Kelly sought to answer is whether there

is an optimal fraction of money (the bet size) to invest into the risky asset (the
bet) such that the expected long-term growth / gain of repetitively playing the
bet is optimal, i.e. maximized (Eq.31):

λt = arg max_λ EP [ln(1 + rt λt )] ,
where: EP [ln(1 + rt λt )] = ∫ ln(1 + rt λt ) dP (rt ) .    (31)

This is the Kelly criterion, which states the fundamental optimization problem
that must be solved, in order to obtain the optimal investment fraction λ.
The derivation of λt is outlined below for each of the models of Section 2.2,
p∼3, ranging from the classical solution for the asset price following a Geometric
Brownian Motion to the new solution for the various LPPLS model variants.
Essentially, as will be shown, the basic expression for the optimal investment
will have the same general form for all models, but the outcome for λt differs
fundamentally in the computation / estimation of two essential variables: the
drift and the variance of the asset, which are dependent on the underlying model
that is used to describe the asset price dynamics.
Let the price of the risky asset be Pt and the risk-free asset price be Rt . The
value of a portfolio which follows the strategy {φt , ψt } at time t is defined by
(Eq.32):

Vt = φt Rt + ψt Pt . (32)
The variation of the value over an infinitesimal time step dt is defined by

dVt := Vt+dt − Vt . (33)

We consider a self-financing portfolio, i.e. a portfolio that (i) only varies in value
due to changes of the prices of the underlying investments

dVt = φt dRt + ψt dPt (34)


and (ii) can only be changed in composition by financing the purchase of shares
of the risky asset from the sale of units of the risk-free asset and vice versa:

Rt dφt + Pt dψt = 0 . (35)


Since λt is defined as the fraction of available capital (portfolio value) to invest
in the risky asset at time t, we can express it as the total value of shares of the
risky asset in the portfolio divided by the portfolio value at time t (Eq.36):
λt = ψt Pt / Vt . (36)
Complementarily, plugging this into the self-financing property (i) (Eq.34) of the
portfolio leads to (Eq.37):

1 − λt = φt Rt / Vt . (37)
Thus, rearranging Expressions (37) and (36) yields expressions for the numbers
of shares (Eq.38):

ψt = λt Vt / Pt ,    φt = (1 − λt ) Vt / Rt . (38)
Inserting these expressions into Equation (34) results in (Eq.39):
dVt / Vt = λt dPt / Pt + (1 − λt ) dRt / Rt . (39)
Thus, the relative variation of the portfolio is nothing but the λ-weighted sum
of the infinitesimal price changes (returns) of the underlying assets from time t
to time t + dt.
Next, very generally, assume that the price dynamics of the risky assets are of
the form (Eq.40):
dPt / Pt = µ(Pt , t)dt + σ(Pt , t)dWt . (40)
Furthermore, assume that (Eq.41):

dRt = Rt rf t dt , (41)

where rf t is the compound return of the risk-free asset.


Inserting these definitions into Equation (39) leads to (Eq.42):
dVt / Vt = λt (µ(Pt , t) − rf t )dt + λt σ(Pt , t)dWt + rf t dt . (42)
Integration and application of Itô’s lemma allows writing the differential of the
log of the portfolio value (Eq.43):

d ln Vt = [λt (µ(Pt , t) − rf t ) − (λt² / 2) σ²(Pt , t) + rf t ] dt + λt σ(Pt , t) dWt . (43)

This quantity is often referred to as the logarithmic growth rate (LGR) G of


the portfolio (Eq.44):

dGt := d ln Vt . (44)
The LGR depends of course on the random variation of the underlying risky
asset, thus on rt in the Kelly criterion optimization problem above (Eq.31) as
well as on the composition of the portfolio, represented by λt .
According to the Kelly criterion, it is the expected LGR that we wish to max-
imize. In order to achieve this, at first, discretize such that λt and the other
involved quantities are constant over [t, t + ∆t]. Next, take the expectation,
which leads to (Eq.45):

Et [∆Gt ] = Et [ln(Vt+∆t / Vt )] = λt (µ̂t − r̂f t ) − (λt² / 2) Σ̂t + r̂f t , (45)

where ∆Gt = Gt+∆t − Gt and (Eq.46):
µ̂t = Et [∫_t^{t+∆t} µ(Ps , s) ds] is the expected return of the risky asset,
Σ̂t = Et [∫_t^{t+∆t} σ²(Ps , s) ds] is the expected cumulative variance,
r̂f t = ∫_t^{t+∆t} rf s ds is the expected return of the risk-free asset.    (46)
The drift and variance of the risky asset in (46) are estimated depending on the
underlying model of the price dynamics by means of the maximum likelihood es-
timation procedure that is described in Section 2.3.1. Furthermore, throughout
this study, rf,t = 0, for all t, such that r̂f t = 0 always. Then, it is Expression
(45) that remains to be maximized with respect to λt , given our estimates of
the quantities in Equations (46) over the next (future) interval of time [t, t + ∆t]
(Eq.47):

λ̂t = arg max_λ Et [∆Gt ] . (47)

The hat-sign here indicates that, like the quantities in Equations (46), the quan-
tity λ̂t is just the estimate of the fraction of money to invest into the risky
asset over the next future interval that will lead to maximum expected loga-
rithmic growth. Note that, as in Equation 45, conditional on information up to
time t, Vt is already known, the quantity to maximize is just Et [ln Vt+∆t ] which
is why the Kelly method is often also referred to as the maximization of the
expected logarithm of wealth. The stated maximization problem has the simple
analytic solution for the optimal investment fraction (Eq.48):
λ̂t = (µ̂t − r̂f t ) / Σ̂t . (48)
Summarizing, in this section, a model-based approach to determine the optimal
investment decision for the next time interval by means of the Kelly Criterion
was derived. In the following section, it will be explained, how this is wrapped
into trading strategies for the various models, thus, how to obtain the series ΛT
for each strategy in the simulation window T .
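To make Expression (48) concrete, the following small Python sketch (our illustration, with made-up numbers; the drift and cumulative variance would in practice come from the model fits) computes the Kelly fraction, clipping it to the self-financing range [−1, 1] mentioned earlier.

# Illustrative sketch of the Kelly fraction of Eq. (48): lambda = (mu_hat - rf_hat) / Sigma_hat.
def kelly_fraction(mu_hat, sigma2_hat, rf_hat=0.0, cap=1.0):
    """Optimal fraction of capital to invest over the next interval,
    clipped to [-cap, cap] as for a self-financing strategy."""
    lam = (mu_hat - rf_hat) / sigma2_hat
    return max(-cap, min(cap, lam))

# Example: expected growth of 0.3% over the next interval with cumulative variance 0.02
print(kelly_fraction(mu_hat=0.003, sigma2_hat=0.02))   # -> 0.15, i.e. invest 15% of capital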

3.3 Individual Strategy Simulation


Based on the different models presented and on the Kelly criterion defined in
the previous section, we will construct model-based trading strategies.
Recall that in Section 2.2 we have defined a set M of five different models (see
Table 1, p∼8):

M := {LPPLS+GN, LPPLS+BM, LPPLS+OU, GBM, MGBM} . (49)

Now, sequentially, for each time step t2 in the overall simulation window T :=
[ts , te ], each model m ∈ M is calibrated on the price series in different time
windows (Eq.50):

T (t2 ) := {[t1 , t2 ]|t1 ∈ F (t2 )} , (50)
where the set of starting times

F (t2 ) := {t1 := t2 − N + 1|N ∈ N } (51)


and the set of window lengths

N := {20, 21, ..., 504} (expressed in trading days). (52)


Thus, for each ending time / time step in the simulation window, each model
is fitted |N | = 484 times to the price series. For our simulations, we choose a
step size for the fit window end times t2 incremented throughout the simulation
window T:

∆t = 5 days. (53)
Thus, the portfolio is regularly re-balanced, starting from the first simulation
time t2 = ts and then proceeding in steps of 5 days until t2 ≥ te is reached.
Then the set of all fitting window ending time is (Eq.54):

T2 := {ts , ts + ∆t, ..., ts + k∆t} , such that: ts + (k − 1)∆t < te ≤ ts + k∆t . (54)
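A minimal sketch of this simulation grid (illustrative Python under the stated choices ∆t = 5 days and window lengths N ∈ {20, ..., 504}; ts and te are assumed inputs) is given below.

# Minimal sketch of the simulation grid of Eqs. (50)-(54): rebalancing times and
# fit-window start times. Not the thesis implementation.
def rebalancing_times(ts, te, dt=5):
    """Fit-window end times t2 = ts, ts + dt, ... until te is covered (Eq. 54)."""
    times, t2 = [], ts
    while t2 < te:
        times.append(t2)
        t2 += dt
    times.append(t2)          # last step satisfies ts + (k-1)dt < te <= ts + k dt
    return times

def fit_start_times(t2, lengths=range(20, 505)):
    """Set F(t2) of start times t1 = t2 - N + 1 for the window lengths N (Eqs. 51-52)."""
    return [t2 - N + 1 for N in lengths]

t2_grid = rebalancing_times(ts=0, te=1000)
print(len(fit_start_times(t2_grid[-1])))   # number of fit windows per model and per t2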
At a given t2 , for each model, each fit is then evaluated to obtain different
estimates of the model parameters. Thus, for each model m ∈ M, we obtain a
set of parameters (Eq.55):

P^m_{t2} = {Ψ^m(t1 , t2 ) | t1 ∈ F (t2 )} , (55)


where Ψ^m(t1 , t2 ) is the set of parameters obtained by calibration of the model
m on the time window [t1 , t2 ].
Each set P^m_{t2} yields various estimates of the expected growth µ̂_{t2} (see Expressions
(46) in the previous section) over the same, next interval [t2 , t2 + ∆t] (Eq.56):

µ̂^{m,t1}_{t2} := µ̂^m(Ψ^m(t1 , t2 ))  ∀t1 ∈ F (t2 ) . (56)

Define the set of expected growth estimates resulting from calibration of model
m at time t2 over different fit windows T (t2 ) as (Eq.57):

M^m(t2 ) := {µ̂^{m,t1}_{t2} , ∀t1 ∈ F (t2 )} . (57)

Analogously, the expected cumulative variance estimates are defined as (Eq.58):

Σ̂^{m,t1}_{t2} := Σ̂^m(Ψ^m(t1 , t2 ))  ∀t1 ∈ F (t2 ) , (58)

which leads to the set (Eq.59):

S^m(t2 ) := {Σ̂^{m,t1}_{t2} , ∀t1 ∈ F (t2 )} . (59)
From the expected growths and expected cumulative variances, which are the
fundamental quantities at the core of our model-based Kelly strategy, the Kelly
weights λ̂^{m,t1}_{t2} are computed. Let the set of all Kelly weights for a model m at
time t2 be (Eq.60):

Λ^m_{t2} := {λ̂^{m,t1}_{t2} | t1 ∈ F (t2 )} , (60)

with

λ̂^{m,t1}_{t2} := µ̂^{m,t1}_{t2} / Σ̂^{m,t1}_{t2} . (61)

Finally, the entire set of investment fraction estimates throughout the entire
window of simulation collects Set (60) over all t2 :

Λ^m_T := {λ̂^{m,t1}_{t2} | t1 ∈ F (t2 ), t2 ∈ T2 } . (62)
The proposed backtest simulation algorithm produces several estimates of the
investment fraction for each rebalancing time. However only a single estimate
is required to simulate the actual portfolio evolution from each time step to the
next one. Thus, the natural question arises of how to deal with multiple estimates of λ
at each t2 . In this study, two ways of combining the multiple λ-estimates
into a single forecast and giving rise to corresponding trading strategies are
developed:

• The Quenched Kelly Weight:


Let us assume that at time t ∈ T2 (see Eq.54) a model m ∈ M is fitted over
the time windows T (t) (see Eq.50). This gives rise to a set P^m_t (see Eq.55) of
parameters obtained after fitting and leads to the set of expected growths
at time t + ∆t, M^m(t) (see Eq.57).
Based on all the expected growths in M^m(t), it is possible to construct
an empirical probability space (Ω, P, F) for the growth of the asset, such
that the Kelly weight can be computed by Equation 31. This approach
assumes that all the sets of parameters in P^m_t are relevant, whereas a goal
of this study is to estimate the most relevant subset P^m_t′ ⊆ P^m_t , which
could be empty (this represents the case where the model is completely
rejected). Then, if a fit is irrelevant and gives rise to an extreme
value µ ∈ M^m(t), it will bias the Kelly weight.
It is with this in mind that the quenched Kelly weight has been developed. The
idea is to construct an empirical probability space (ΩΛ , PΛ , FΛ ) based on
the observations Λ^m(t). Then the quenched Kelly weight can be computed
by (Eq.63):

λ̂^m(t) := E_{PΛ}[λ] . (63)

The goal of the quenched Kelly strategy based on a model m ∈ M is to
consider that each fit could give rise to a strategy by itself; the quenched
weight is then the average over all the possible strategies given by
a model and its different fits.
Thus, the quenched weight is the average over all the forecasts produced
by a given model and the corresponding fits. In case of LPPLS model
forecasts, only the forecasts corresponding to qualified fits, as defined in
Section 2.2.4, p∼8, are considered in the computation of the expectation.

The difference between the original Kelly strategy and the quenched one
lies in the definition of the LGR, G(λt , t). For the first one, it is
seen as a function of the return of the asset as in Equation (31) (i.e.
G(λt , rt ) = ln(1 + λt rt )). For the quenched one, it is seen as a function
of the parameters Ψ of the model considered; it is then computed for
each fit and the expected value is taken.
The success of the quenched Kelly strategy means that most of the fits
give rise to a set of Kelly weights polarized in the right direction, which
means that the time series is strongly characterized by the model.
The name of the quenched Kelly weight comes from physics, particu-
larly from the study of spin glasses (see Appendix F, p∼92). ”Quenched”
refers to a system in which some parameters defining its behavior are ran-
dom variables (here, the parameters of the model) which do not evolve
with time, whereas the opposite is ”annealed”, in which the random variables
are allowed to evolve themselves. The assumption needed to validate a model
is that only a subset P^m_t′ ⊆ P^m_t characterizes the dynamics for all time
windows in T (t).
• The Mixed Kelly Weight:
The hypothesis behind this new strategy is to consider that all the models
could be valid simultaneously, since they all give different information
about the time series. Under such an assumption, we construct the
set of all the expected growths (Eq.64):
M (t) = ∪_{m∈M} M^m(t) , (64)

where we only consider qualified fits (see Section 2.2.4, p∼8).
Since we do not have more information, we assume for now that each
µ ∈ M (t) is as relevant as the others. We can then construct a uniform
density over M (t), such that the mixed Kelly weight at time t is defined
by (Eq.65):

λt = E[µ|M (t)] / Var(µ|M (t)) , (65)

where E[.|M (t)] and Var(.|M (t)) are respectively the expected value and
the variance over the probability measure generated by the set M (t). In
this case we assumed a uniform law over M (t).
The mixed Kelly strategy is a first attempt to construct a strategy based
on all the models, such that the strategy is able to adapt itself to the current
state of a time series (a minimal numerical sketch of the quenched and mixed
weights is given after this list).
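The following Python sketch (an assumed illustration with toy numbers, not the thesis code) shows how the quenched weight of Eq. (63) and the mixed weight of Eq. (65) can be computed once the per-window estimates of expected growth and cumulative variance are available; both are clipped to [−1, 1] as for a self-financing strategy.

# Minimal sketch of the quenched (Eq. 63) and mixed (Eq. 65) Kelly weights.
import numpy as np

def quenched_kelly_weight(mu_hats, sigma2_hats, cap=1.0):
    """Average of the per-fit Kelly weights lambda = mu / Sigma under an
    empirical (here uniform) measure over the qualified fits of one model."""
    lams = np.asarray(mu_hats, dtype=float) / np.asarray(sigma2_hats, dtype=float)
    return float(np.clip(lams.mean(), -cap, cap))

def mixed_kelly_weight(mu_hats_all_models, cap=1.0):
    """Pooled expected growths over all models under a uniform law on M(t)."""
    mu = np.concatenate([np.asarray(m, dtype=float) for m in mu_hats_all_models])
    return float(np.clip(mu.mean() / mu.var(), -cap, cap))

# toy numbers for illustration only
print(quenched_kelly_weight([0.002, 0.004, -0.001], [0.01, 0.02, 0.015]))
print(mixed_kelly_weight([[0.002, 0.004], [0.001, -0.002, 0.003]]))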

Note that, as the computation of the Kelly fraction is based on predicted growth
µ̂, the corresponding LPPLS strategy does not actually make use of the cali-
brated value of the critical time, as is often done, as the critical time is seen
as a central model parameter of interest. Instead, here, the predicted critical
time only indirectly influences the strategy through the value of the drift. A
reason not to use the critical time value directly is that it is the least accurate

parameter of the LPPLS since its profile in the log-likelihood is the one with
the least curvature [6]. Therefore, as the Kelly strategy combines all LPPLS
parameters (i.e. the sloppy critical time and the more stable, other parameters)
into a single figure, it might yield more stable and reliable results. So a direct
application of the critical time in a trading strategy is not trivial. That is why
obtaining a probability measure defining the trustworthiness of each fit is so
important: it could help us to get a good estimation of the critical time. If we
succeed in estimating in real time whether we are in a bubble, it will help to predict
when the bubble will end.
The goals of these two strategies are different. The quenched strategy will be used
to characterize the state of a time series, whereas the mixed one combines all the
models into a single strategy.
We will use the information obtained from the sets Λ^m_T and M^m(t), for all
m ∈ M, to construct new probability densities over the set M (t) to complete
the mixed strategy.
Based on these series, different portfolios corresponding to each strategy will be
simulated. This will be the topic of the next section.

3.4 NASDAQ Dotcom Bubble Example Run


In this part we will run the different model-based quenched Kelly strategies and
the mixed Kelly strategy on the NASDAQ 100 from 1985 to 2020. The goal is
to use this example to extract some features about the strategies as a first start
for future studies.
Since five models are used in this paper (see Table 1), five quenched Kelly
strategies are going to be computed. Additionally, the mixed Kelly method will
provide a further strategy combining all the results.
Figure 4 is an example of the results obtained on the NASDAQ 100, when the
observation is made on 1995-06-14 and the prediction is given for 1995-
06-21. This example was chosen at random among all the decisions taken on
the NASDAQ 100 from 1985-08-30 to 2020-02-28. It shows the CDF of the
Kelly weight, the expected growth and the variance for the LPPLS+OU and
GBM models. The red line in the graph of the expected growths is the growth
observed from 1995-06-14 to 1995-06-21, and the dark lines on all the graphs are
the expected values.
The LPPLS model gives a wide range of expected growths in [-20,10], but 60%
of the results are in [-2.5, 2.5]. This is due to fits which estimate that the critical
time is close to 1995-06-21 and thus predict an explosive behavior between
1995-06-14 and 1995-06-21. On the other hand, the GBM predicts a growth of
the same amplitude as the actual mean growth observed but fails to correctly
estimate its sign.
These features motivate the construction of the mixed Kelly strategy introduced
above. The LPPLS variants (LPPLS+GN, LPPLS+BM, LPPLS+OU) give rise
to a few extreme values for the expected growths, whereas the geometric models
(GBM, MGBM) give growths of the same magnitude as the observed one.
When the extreme LPPLS values come from qualified fits in the sense of Section 2.2.4,
p∼8, the mixed weight takes these extreme values into account. So the mixed
strategy relies on the geometric models when the LPPLS fits are not qualified, and
when the LPPLS fits are qualified, it takes into account the possibility of observing
a bubble followed by a crash.

Figure 4: Example of results obtained for the LPPLS+OU (upper figures) and
the GBM (lower figures) on the NASDAQ 100. Each subfigure plots the
CDF (dark points) of: the weight invested (right column), the expected growth
(central column) and the cumulative variance (left column). The vertical
black line in each figure is the corresponding expected value, and the red line in
the graph of the expected growths is the growth realized from 1995-06-14 to
1995-06-21. The LPPLS model gives a wider range of expected growths (from
-20 to 10). This is due to fits which estimate that the critical time is close to
1995-06-21 and thus predict an explosive behavior between 1995-06-14 and
1995-06-21. On the other hand, the GBM predicted a growth in the correct
range of magnitude, but failed to estimate the sign of the growth.

Figure 5 shows the results for each quenched strategy and for the mixed one
on the NASDAQ 100 from 1985-08-30 to 2020-02-28. The upper panel of the
figure shows the portfolio value of each strategy and of the buy and hold (B&H)
strategy (we buy the asset at the initial time and keep it until the end). The
middle panel shows the weight of the portfolio value invested in the asset from
time t to t + ∆t (where ∆t = 5 trading days). The last panel is the absolute
value of the confidence indicator (CI; see Section 2.2.4, p∼8) for each model.
The LPPLS-based strategies are uncertain and unstable, at least for this specific
run on the NASDAQ time series. The computed values of the Kelly weight

oscillate unsteadily between −1 and 1. Nevertheless, we observe that the large values
of the weight close to the peak are the consequence of higher predicted returns
near the peak of the Dotcom bubble. This indicates that the LPPLS dynamics
emerging during the bubble were valid and correctly recognized in the final
phase of strong index growth. The geometric-based strategies (GBM, MGBM)
are much more stable than the one based on the LPPLS, as they produce more
steadily fluctuating portfolios that even manage, with a short delay after the
peak, to circumvent the Dotcom crash. Finally, the mixed strategy (green)
combining information from the LPPLS and GBM strategies together is the
most stable one, as the computed values of the optimal investment fraction
are mostly much smaller than 1, in contrast to the more ”aggressive”, other
strategies for which λ on average takes larger values.
The CI of the LPPLS models seems to be correlated to the CI of the geometric
models. The CI developed in this paper for the geometric models is based on
the average sign of the trend observed by each fit. A low CI therefore means
that the fits of the geometric models do not polarize in one direction. When a
bubble bursts and a drawup/drawdown is observed, the CI of the geometric models
drops, since over the past few days the trend is positive/negative, but over
a longer run it is the opposite due to the drawup/drawdown. Similarly, after
the critical time the CI of the LPPLS models drops. When a strong trend
is observed (i.e. the CI of a geometric model is close to one), after a while
the CI of the LPPLS models increases. This would mean that when a clear
trend is observed, most of the time a transition occurs from stable growth to
a bubble.
The goal in the next section is to propose an empirical formalism, based on
the success of each quenched Kelly strategy, to define the bubble periods (their
starting and ending times) and the probability of trustworthiness of each fit, in order
to define a better probability measure on ∆Pt that catches the main features
of a time series.
A more elaborate study of the different strategies will be done in Section 5.2,
p∼83. First we will apply the strategy results to identify bubble episodes.

Figure 5: Application of the quenched and the mixed strategies based on dif-
ferent models on the NASDAQ 100 from 1985-08-30 to 2020-02-28. The upper
panel plots the value of the portfolio obtained for the different strategies. The
second panel is the portfolio weight invested for the different strategies. The
lower panel is the absolute CI (Section 2.2.4, p∼8) for the different models. The
GBM-based quenched Kelly strategy (continuous blue line) is the one which
gives the best results, even if it does not succeed in predicting the crash in 2000;
it does succeed in estimating the negative trend a bit later. The LPPLS-based
strategies (LPPLS+GN: red dots; LPPLS+BM: dashed red line; LPPLS+OU:
continuous red line) are much more unstable, since they search for super-
exponential growth with log-periodic oscillations; if they estimate a critical
time close to the decision horizon, they polarise the weight to
invest. This is the reason why the LPPLS weights oscillate all along from
1 to -1. The mixed strategy (green line) is the most stable strategy and it is
able to polarize in the right direction to catch drawups.

3.5 Identification of Bubble and Crash Episodes
3.5.1 Outline of the Procedure
The goal of this section is to define the global success of strategies, in order to
construct a procedure that labels, over a period, the global phases of a time series
as bubble or non-bubble.
The quenched Kelly strategy for each model is the basis of the next
study. The goal is to define a measure of the success of a strategy. This means
that we will try to define the moments when a strategy outperforms itself,
based on different metrics which quantify the success of a strategy (Tab.4,
p∼38).
A direct comparison of the strategies to each other would not make sense
if the objective of this study is to determine the right model characterizing
the time series, since all strategies are based on different models and so the time
series features that they highlight are not the same. This is the reason
why, before comparing the different strategies with each other, we first compare
each strategy to itself. To be able to define the success value of a strategy
versus itself, we need to define different financial metrics, which can be seen as
observables characterizing the performance of a strategy trajectory.
The idea is to apply the ε-drawup/drawdown method (see Appendix G, p∼93)
to each metric (Tab.4) to determine whether the trend of a metric is increasing
or decreasing.
We then consider that if a strategy is in a drawup under a metric, the strategy
is successful under this metric. This means that if on average a metric is
increasing, it gives a success point to the strategy, such that the success value
of a strategy is the average of the success points over all the metrics considered.
Then, to define the state of a time series, the success values of the strategies are
compared to define the success weight of each strategy versus the others. Based
on these success weights, a global phase labelling procedure is constructed.

3.5.2 Metrics
In this part we present the different metrics, summarized in Table 4, that will
be used to assess strategy performance. Based on these metrics we define
a set of observables for each strategy at each time t. These sets will be used to
characterize the success of a strategy at each instant in a backtest study.
The metrics presented in Table 4 are collected in

O := {V, Sr, CAGR, VaR, acc, R, CI} . (66)


For each metric o ∈ O (except for the portfolio value and the confidence indi-
cator), its value at time t is computed based on portfolio data in a lookback
window of size T and denoted by:

o^T_t := O(t | D^T_t) , (67)

where O(·) is the operation required to compute the corresponding metric over
the required data D^T_t in the window [t − T, t].

Metric                 Definition                                                     Formula

Portfolio Value        The portfolio value at the current time t.                     Vt

Sharpe ratio           The ratio of the mean to the standard deviation                Sr^T_t = ⟨µt⟩|_T / ∆µt|_T
                       of the compound return.

CAGR                   The Compound Annual Growth Rate.                               CAGR^T_t = (Vt / Vt−T)^{1/T} − 1

VaR_0.05               The Value-at-Risk at the 0.05-quantile of the                  VaR^T_t = Q({µs}_{s=t−T}^{t}, 0.05), where Q(x, q) is the
                       compound return.                                               function yielding the q-quantile of x.

Accuracy               The strategy accuracy as the fraction of intervals             acc^T_t = (1/T) Σ_{s=t−T}^{t} 1{sgn(λs) = sgn(µs)}, where
                       where the predicted sign of λ was the same as the              1{.} is equal to one if the argument in it is true and zero
                       sign of the price change over that interval.                   otherwise.

Average return         The average return of the portfolio over the period            R^T_t = acc^T_t ⟨µt⟩|^+_T + (1 − acc^T_t) ⟨µt⟩|^−_T
                       of time [t − T, t].

Confidence Indicator   The confidence indicator (Section 2.2.4, p∼8) based            CIt
                       on the model used by the quenched Kelly strategy.

Table 4: The different metrics considered to estimate the success of a strategy.


Here µt = (Vt − Vt−1 )/Vt−1 is the compound return at time t, ⟨µt⟩|_T and ∆µt|_T
denote respectively the mean value and the standard deviation over the time
interval [t − T, t], Q(., q) is the q-quantile of a given set, λt is the portfolio
weight invested on the time interval [t, t + ∆t], and ⟨µt⟩|^+_T (resp. ⟨µt⟩|^−_T) is the
average over the positive (resp. negative) values.
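As an illustration of how these quantities can be computed in practice, the short Python sketch below (our own simplified re-implementation, not the thesis code; here T counts rebalancing steps inside the lookback window) evaluates the Sharpe ratio, CAGR, VaR, accuracy and average return of Table 4 from a portfolio value series and the corresponding invested weights.

# Minimal sketch of the rolling performance metrics of Table 4.
import numpy as np

def metrics(values, weights, T):
    """`values` are portfolio values; `weights[i]` is the fraction invested over the
    interval producing the return values[i+1]/values[i] - 1."""
    v = np.asarray(values, dtype=float)
    mu = np.diff(v) / v[:-1]                          # compound returns
    mu_T, w_T = mu[-T:], np.asarray(weights, dtype=float)[-T:]
    sharpe = mu_T.mean() / mu_T.std()
    cagr = (v[-1] / v[-T - 1]) ** (1.0 / T) - 1.0
    var05 = np.quantile(mu_T, 0.05)
    acc = float(np.mean(np.sign(w_T) == np.sign(mu_T)))   # predicted vs realized sign
    pos, neg = mu_T[mu_T > 0], mu_T[mu_T <= 0]
    avg_ret = acc * (pos.mean() if pos.size else 0.0) + (1 - acc) * (neg.mean() if neg.size else 0.0)
    return dict(Sr=sharpe, CAGR=cagr, VaR=var05, acc=acc, R=avg_ret)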

The lookback period T is of course an arbitrary meta-parameter of the proce-


dure. Thus, computed values for the metrics may vary greatly depending on
its choice. In order to ”average out” the influence of T, each individual metric
is thus computed over many differently sized lookback windows at each time t,
similarly to calibrating the models to windows of different sizes in the previous
chapters. Let

õ^T_t = {o^τ_t , τ = 3, ..., T } (68)
be the set of values computed for a specific metric o ∈ O for the different time
windows of length τ ∈ [3, 4, .., T − 1, T ].
Next, the q-quantile of Set (68) is denoted as:

Q(õ^T_t , q) . (69)
Then let:

O^T_t = {Q(õ^T_t , q), ∀o ∈ O, ∀q ∈ Q}   (70)

be the set of all rolling q-quantiles (q ∈ Q; in this paper we used Q =
{0.1, 0.5, 0.9}) over [t − T, t]. Then the set O^T_t defines all the rolling quan-
tiles based on each metric for a strategy at time t, over the past period T.
Finally, let us consider a set of different time periods T = {Ti , i = 1, ..., NT } (in
this paper we used T = {100, 200, 300}, in days); in this case the set of all the
rolling quantiles of a metric for a strategy at time t, over the past periods T ∈ T,
is:

Õ^T_t = ∪_{T∈T} O^T_t . (71)

Note that the set Õ^T_t also contains Vt and CIt . For the portfolio value (V) and
the confidence indicator (CI), the values are taken directly, without looking back
over the results and without taking into account the quantiles.
The motivation for the choice of the set T = {100, 200, 300} (in days) comes
from the fact that we take a decision every 5 trading days; at this step size,
approximately 13 rebalancing times occur per 100 days. Thus, for each metric
o ∈ O, there will be 13 data points for the last 100 days. So if we use a lookback
period below a hundred days, we will not have enough points to obtain a good value
for the different quantiles. Also, the observation over the past 100 days is equivalent
to observing over the past 3-4 months, the one over the past 300 days is close to
observing over the past year, and the observation over the past 200 days gives a
result in between. Since there are fewer data points for a smaller lookback period,
it is more sensitive to the new data point taken into account at time t. Big changes
from t − ∆t to t will thus be visible in the observation over the past 100 days, but
it will not capture longer-range changes and it will give results more sensitive to
the noise. On the other hand, the observation over the past 300 days will characterize
the long-term changes in the strategy performances but will be less sensitive to the
new data point.
We have thus constructed the set Õ^T_t , which corresponds to all the observations
at time t on the strategy, based on the rolling quantiles of the different metrics
over different lookback periods. We call an element of Õ^T_t an observable,
such that Vt , CIt and the q-quantiles (q ∈ Q) over the past periods T ∈ T
(Q(õ^T_t , q)) represent all the observables at time t. We have defined five
metrics (without counting V and CI), for which we computed the rolling 0.1,
0.5 and 0.9-quantiles over the past 100, 200 and 300 days. This leads to
3 × 3 × 5 + 2 = 47 observables for each strategy.
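A minimal sketch of how these 47 observables of Eq. (71) can be assembled is given below (illustrative Python, assuming the per-lookback metric histories have already been computed; names and structure are our own).

# Minimal sketch of the observable set: rolling q-quantiles of each metric over
# several lookback periods, plus the portfolio value and the confidence indicator.
# `metric_history[name][T]` is assumed to hold the rolling values of metric `name`
# over the lookback T (name in {"Sr", "CAGR", "VaR", "acc", "R"}).
import numpy as np

QUANTILES = (0.1, 0.5, 0.9)
LOOKBACKS = (100, 200, 300)          # in days, as in the text

def observables(metric_history, V_t, CI_t):
    obs = {"V": V_t, "CI": CI_t}
    for name, values_by_T in metric_history.items():
        for T in LOOKBACKS:
            for q in QUANTILES:
                obs[f"{name}_{T}_{q}"] = np.quantile(values_by_T[T], q)
    return obs                        # 3 * 3 * 5 + 2 = 47 observables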

In the next section we will develop a mathematical formulation of the success of
a strategy. We will use the ε-drawup/drawdown algorithm on each observable
in Õ^T_t , such that if an observable at time t, ot ∈ Õ^T_t , is in a drawup, it gives
a success point to the strategy at time t (Eq.72, p∼45). Then its success value
is the average at time t of the success points over all observables (Eq.73,
p∼45).
Figures 6 to 9 give examples of different rolling metrics on the
NASDAQ 100 for the different strategies.
Figure 6 and Figure 7 show respectively the results for the rolling quantiles
for the Sharpe ratio and the Value-at-Risk over the past 300 days for the buy
and hold strategy and the quenched Kelly strategies based on the GBM and
LPPLS+BM models. They show that each strategy gives different results: for
example, around 2002-02-02 the Sharpe ratios of the buy and hold and the
LPPLS+BM strategies decreased on average, whereas the GBM one had a peak,
while all the Value-at-Risk values increased around this date. The conclusion would
then be that the GBM model characterizes the time series better than the
LPPLS+BM model around 2002-02-02. This is just an example, but this form of
argument is the basis for the idea that is going to be developed in the next part
to compare strategies.
Figure 8 shows the rolling Value at Risk of the LPPLS+BM quenched Kelly
strategy on the NASDAQ 100 over the past 100, 200 and 300 days. We observe
that increasing the size of the time window smooths the metric, but it also means
that the extreme values reached when the strategy performs well or badly over a
short period are hidden in the rolling metric. This is an example of the reason
why we used different sizes of time windows.
Figure 9 shows the rolling Sharpe ratio, CAGR and Value-at-Risk over the past
300 days of the GBM quenched Kelly strategy. Observe that around 1999 the
Sharpe ratio and the CAGR show drawups, whereas the Value at Risk shows on
average a drawdown, although its 0.9-quantile has a drawup. The idea for estimating
the success of a strategy developed in the next part is based on this: when
a metric is in a drawup, it increases the success value of the strategy.

Figure 6: Sharpe ratio for the LPPLS+BM (upper), GBM (center) quenched
Kelly strategies and the buy and hold (B&H) strategy (lower) on the NAS-
DAQ 100 from 1985-08-30 to 2020-02-28. The dark shade represents the interval
from the 0.1-quantile to the 0.9-quantile and the dark line inside it is the median.
Each strategy exhibits different results, which is going to be used to compare
strategy successes.

Figure 7: VaR for the LPPLS+BM (upper), GBM (center) quenched Kelly
strategies and the buy and hold (B&H) strategy (lower) on the NASDAQ 100
from 1985-08-30 to 2020-02-28. The dark shade represents the interval from the
0.1-quantile to the 0.9-quantile and the dark line inside it is the median. Each
strategy exhibits different results, which is going to be used to compare strategy
successes.

Figure 8: The rolling VaR for the LPPLS+BM quenched Kelly strategy for
different window sizes T: T = 100 days (upper), T = 200 days (center) and T =
300 days (lower). The dark shade represents the interval from the 0.1-quantile to
the 0.9-quantile and the dark line inside it is the median. When the time window T
increases, the rolling VaR gets smoother, but peaks which could characterize
a sudden success disappear. This is the reason why we consider different
time windows to compute the rolling metrics and measure different quantiles
over each time window.

Figure 9: The rolling metrics for the GBM quenched Kelly strategy on the time
windows T = 300 days: Sharpe ratio (upper), CAGR (center), VaR (lower).
The dark shade represents the interval from the 0.1-quantile to the 0.9-quantile
and the dark line inside it is the median. In this example, when the VaR goes down
the CAGR goes up, which is what we are going to use to characterize the success
of a strategy, because even if a strategy exhibits high returns it should also
have a low risk.

3.5.3 Success Weights
The goal of this section is to quantify the success of a strategy in order to be able
to compare the different strategies and finally estimate the regime of a time series
at time t.
To qualify the success of a strategy, all the observables in Õ^T_t are considered.
The idea is to determine when the strategy performs on average better than over
the past period. We say that a strategy is performing better with respect to an
observable in Õ^T_t if this observable is increasing on average.
To achieve this objective, the drawups and drawdowns of each observable are
computed following the ε-drawup/drawdown algorithm (we used 100 values of ε
linearly spaced from 0.1 to 5 and 10 different time windows linearly spaced
from the past 10 data points to the past 60 data points; see Appendix G, p∼93).
Then it is possible to define a measure G for all ot ∈ Õ^T_t such that (Eq.72):

G(ot ) = 1 if o is in a drawup at time t, and 0 otherwise,  ∀ot ∈ Õ^T_t . (72)

So the success value α of a strategy at time t is defined by (Eq.73):

αt = (1 / |Õ^T_t|) Σ_{ot ∈ Õ^T_t} G(ot ) . (73)

The success value αt can be seen as a probability that the strategy
succeeds in increasing the level of the specific metrics. Since the maximal
performance of a strategy depends on the time series to which it is applied,
we decided in this first step to compare the strategy with itself locally. We
use the hypothesis that if on average most of the observables are increasing, the
time series is more and more characterized by the model on which the quenched
strategy is based.
The next step is to compare the success of each strategy. Let us denote by α^m_t
the success value of the quenched Kelly strategy based on model m ∈ M. The success
weight ω^m_t of a strategy compared to the other strategies is (Eq.74):

ω^m_t = α^m_t / Σ_{s∈M} α^s_t . (74)
The issue with Equation (74) is that it does not take into account the possibility
that none of the models characterizes the time series at time t. So, to be able to
account for the case where the true model is something else, we introduce a threshold
such that if a model success value is under the threshold, it is set to zero in
Equation (74).
Equation (74) has been adapted to obtain Equation (75):

ω^m_t = α^m_t Θ(α^m_t − δ) / Σ_{s∈M} α^s_t Θ(α^s_t − δ) , (75)

where Θ(.) is the Heaviside function and δ is a threshold. For this study the
threshold has been set to δ = 1/3, because there are three time scales for the
rolling quantiles and we measured the 0.1, 0.5 and 0.9-quantiles for each time
scale; the idea is to consider that if, on average for one time scale, all the
observables are in a drawup, the success of the strategy is high enough to compare
it to the others. Another way to put it is that the threshold is the minimal success
value such that the success of a strategy is significant enough to be considered.
We thus chose δ = 1/3, saying that if a third of the metrics are increasing, the
model could characterize the time series.
Then the probability that none of the models works is defined by (Eq.76):

ω^o_t = 1 if ∀m ∈ M, α^m_t < δ, and 0 otherwise. (76)
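The following Python sketch (an illustration under the definitions above, not the thesis implementation) shows how the success value of Eq. (73) and the thresholded success weights of Eqs. (75)-(76) can be computed once the drawup flags of the observables are available.

# Minimal sketch of Eqs. (73)-(76).
# `drawup_flags[m]` is assumed to be a list of 0/1 flags, one per observable of model m.
import numpy as np

def success_value(flags):
    """Eq. (73): fraction of observables currently in an (epsilon-)drawup."""
    return float(np.mean(flags))

def success_weights(drawup_flags, delta=1 / 3):
    """Eqs. (75)-(76): normalized success weights, zeroing models below the threshold."""
    alpha = {m: success_value(f) for m, f in drawup_flags.items()}
    kept = {m: a for m, a in alpha.items() if a >= delta}
    if not kept:                       # omega^o = 1: no model characterizes the series
        return {m: 0.0 for m in alpha}, 1.0
    norm = sum(kept.values())
    return {m: kept.get(m, 0.0) / norm for m in alpha}, 0.0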

The main issue with the success value is that α increases for a strategy if and
only if several metrics are increasing over time, which would require the metrics
to grow forever if we wanted to maximize α at all times. In the examples given
in Figures 6 to 9, none of the metrics stays constant; they all vary.
So this theoretical issue with the success value does not occur in practice,
because the quenched strategies vary all the time.
In the next section, we will define a global phase labelling procedure based on
the success weight of each strategy to define when we are in a bubble or not.

3.5.4 Regime Classification


The goal of this section is to define a procedure to classify phases of a time
series over a period into two categories:
• Bubble regime: characterized by the LPPLS models;
• Non-Bubble regime: characterized by the geometric models.
The main idea is to simulate in parallel the performance of all strategies over
time. Then, the price dynamics are classified as emerging from the bubble or
non-bubble model that currently has the highest success weight (see previous
section), or in other words, that currently performs best in terms of the performance
of the Kelly strategy corresponding to the model.
As there are several models representing bubble and non-bubble dynamics, the
corresponding models are grouped in two sets, the bubble set and the non-
bubble set. Bubble dynamics are characterized by LPPLS signatures. Thus, if
any of the following LPPLS model variants is the model yielding best strategy
performance in terms of success weight during a specific period of time of the
asset price, then the corresponding period is labelled as bubble / ”critical”. The
set of these ”critical” models is thus given by:

Mc = {LPPLS+GN, LPPLS+BM, LPPLS+OU} ⊂ M . (77)


The models characterizing non-bubble growth are the ones based on the Ge-
ometric Brownian motion assumption and are grouped into the non-bubble /
”stable” set:

Ms = {GBM, MGBM} ⊂ M . (78)


In order to compute the success weight of the entire sets of models, the individual
success weights of the single models are grouped into two corresponding sets.

Based on these values, the empirical probability Ω^p_t that at time t the time series
can be classified as class p is computed as (Eq.79):

Ω^p_t = ω^p_t / (ω^s_t + ω^c_t) , where: ω^o_t = 0,
and: ω^p_t = (1 / |M_p|) Σ_{m∈M_p} ω^m_t , where: p = s, c,   (79)

where ω^o_t is the weight that none of the models characterizes the time series at
time t (Eq.76, p∼46). Thus, at first, the average success weight of the models
corresponding to the bubble / ”critical” and non-bubble / ”stable” classes is
computed and then they are combined. The first step is required for a fair
comparison, as there are not the same number of LPPLS-based models (three)
as geometric ones (two).
As, at time t, the estimates Ω^p_t are just based on a few values, a more robust
estimate is obtained by applying a centered moving-average filter to the series
{Ω^c_k}_{k=t−T,...,t+T} , such that

Ω^c_t → (1 / (2T + 1)) Σ_{k=−T}^{T} Ω^c_{t+k} , (80)

where T = 22 trading days.
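As an illustration of Eqs. (79)-(80), the sketch below (our own simplified Python, assuming per-model success weights have already been simulated) computes the bubble probability and applies the centered smoothing; as discussed next, this smoothing is only valid for ex-post labelling.

# Minimal sketch of Eqs. (79)-(81): bubble probability and centered smoothing.
import numpy as np

def bubble_probability(w_lppls, w_geom):
    """Eq. (79): average the success weights within each class, then normalize."""
    w_c, w_s = np.mean(w_lppls), np.mean(w_geom)
    return w_c / (w_s + w_c) if (w_s + w_c) > 0 else 0.0

def smooth_centered(series, half_window=22):
    """Eq. (80): centered moving average over 2T + 1 points (ex-post only)."""
    x = np.asarray(series, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        lo, hi = max(0, t - half_window), min(len(x), t + half_window + 1)
        out[t] = x[lo:hi].mean()
    return out              # Omega^s_t = 1 - Omega^c_t then follows from Eq. (81)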


This smoothed estimate of Ωct is regarded as the probability that the price at
time t belongs to the class bubble. The complementary probability

Ωst = 1 − Ωct where: ωto = 0, (81)


of the smoothed estimate yields the probability that the price at time t be-
longs to the class of non-bubble price growth models. There are two important
points. Note that Eq.(81) is computed as the complementary probability, as taking two
smoothed estimates according to Eq.(80) would no longer guarantee that the numbers
sum up to 1.
tered mean, in order to obtain a non-lagging estimate of the T -window median,
as this procedure is allowed to look into the future after the series of strat-
egy success weights have been simulated without forward-looking; the current
chapter does not deal with ex-ante prediction of trading decisions, but instead
compares simulated in-sample strategy performances to find the best suiting
class of models. This results in an ”ex-post” classification / labelling of price
data sets into different regimes. Finally, later on, these labelled datasets will be
used to train supervised models that may predict regime changes, trading decisions
and other related quantities of interest out-of-sample. The importance of
this approach lies in the fact that, before for instance testing the predictive
power of the LPPLS model, we must ensure that LPPLS predictions are only
”sent out” in the periods of time that really correspond to LPPLS dy-
namics; there would be no point in applying a model to a time series that does
not obey the assumed model dynamics. Instead, beforehand, the suitability of
the model has to be predicted, or in other words, model selection has to be
carried out. This is what this section solves, by proposing a labelling procedure
for model selection. The procedure will be explained in the next section.
The probability of being in a bubble state at time t, given by Equation (80) re-
minds us strongly of the DS LPPLS Confidence Indicator, with the fundamental

difference that, here, the validity of the LPPLS model is not only confirmed by
looking at the qualified model parameters, but also by comparing the suitability
of the model to other models that may possibly describe the price dynamics.
Thus, Ω^c_t theoretically brings us a step closer to the idea of giving a ”probability
of being in a bubble state”. Therefore, Ω^c_t is proposed as a new metric to
evaluate when classifying the state of a price series.
In the next part we will try to estimate, in an ex-post classification based on
Ω^c, the starting and ending times of bubbles.

3.5.5 Bubble Start and End Times


In the previous section, we derived the empirical probability of a price series
being in a critical phase or not at time t. The next step is to construct a
procedure to extract the starting and ending times of bubbles from this probability
and from the drawups/drawdowns of the price time series. We will call the
intervals delimited by these starting and ending times global phases, such that the
phase between a starting and an ending time is LPPLS/bubble and the one between
an ending time and the next starting time is geometric.
The reason why we are searching for the starting and ending times of bubbles is to
use them in a later study (see Section 4.2, p∼64) to train a neural network to
predict when we enter a bubble and when we are going to exit it. The goal of that
study will be to obtain a new type of estimation of the critical time and also
of the starting time of the bubble. From Figure 5 we see that the LPPLS CI
surges close to the point where the bubble bursts. Thus, if we succeed in estimating
the starting times of bubbles in a backtest study, we can use them to train a
model-free (”black box”) network to predict/estimate when we will enter or exit a bubble.
Let us define:

τ_2 = {t | Ω^c_t = 1} , (82)


which is the set of times where all strategies based on a geometric growth failed
and the strategies based on the LPPLS performed well enough.
In the synthetic data part of this thesis (and many other studies by Sornette et
al.), it was found, that the LPPLS model often accurately fits price dynamics
and the LPPLS CI surges, when t is close to the actual critical time tc of the
crash. Therefore, the set τ2 which collects all times where the LPPLS model
strategies dominate provides a collection of times that likely corresponded to
actual critical times. The reasoning behind this is simply that since it is known
that the LPPLS model performs well close to the critical time, when we find a
high performance of the LPPLS model at time t, compared to other models, as
characterized by the metric Ωct , it is likely that time t is or is close to the critical
time tc .
Once we have obtained candidates τ_2 for the ending time of a bubble, the next step
is to obtain a set of candidates for the starting time. The idea is to search
for transitions in Ω^c_t. By transition we mean the instant just before Ω^c_t changes
suddenly. A first idea could be to search for the extrema of its first derivative,
but these extrema characterize the moment where Ω^c_t changes suddenly, and we
want the instant just before. This instant corresponds to the moment where the
curvature is maximal, which is characterized by the Laplacian (second derivative)
of the function. This idea came from image processing, where it is used for edge
detection in an image.

The absolute Laplacian of Ω^c_t is computed by:

∆Ω^c_t = |Ω^c_{t+1} − 2Ω^c_t + Ω^c_{t−1}| , (83)

then, to estimate possible phase transitions, the local maxima of ∆Ω^c_t are
regrouped in the set τ_1.
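A minimal sketch of this edge-detection step, assuming Ω^c_t is available as a NumPy array (illustrative names only):

```python
import numpy as np

def transition_candidates(omega_c):
    """Return tau_1: the indices of local maxima of the absolute discrete
    Laplacian of Omega^c_t (Eq. 83), i.e. the candidate bubble starting times."""
    lap = np.abs(omega_c[2:] - 2 * omega_c[1:-1] + omega_c[:-2])
    # keep points strictly larger than both neighbours
    peaks = np.where((lap[1:-1] > lap[:-2]) & (lap[1:-1] > lap[2:]))[0] + 2
    return peaks  # indices into the original omega_c series
```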
Once the possible ending (τ_2) and starting (τ_1) times of a bubble are obtained, the
goal is to link them to the last points of drawups (τ_up) and drawdowns (τ_down)
of the log-price of the asset (or the index): τ = τ_up ∪ τ_down.
The first step is to link elements of τ to elements of τ_2. The idea is to consider
that the last point of a drawup/drawdown is a possible end of a bubble: τ represents
possible ending times of a bubble, and we stated that close to the critical time of
a bubble Ω^c_t = 1. Then the candidates for the ending time of
a bubble are (Eq.84):

τ_end = {t ∈ τ | ∃ t' ∈ τ_2, |t − t'| < ∆t} , (84)


where the set τ_end contains the ending points of drawups/drawdowns of the log-price
time series which are close to points where the critical probability is one, and ∆t
defines how close the points of the two sets should be. In this paper we arbitrarily
used ∆t = 40 days.
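The selection of Eq. (84) amounts to a simple proximity test; a minimal sketch with illustrative function and variable names:

```python
import numpy as np

def ending_time_candidates(tau, tau2, dt=40):
    """tau: last points of the eps-drawups/drawdowns of the log-price;
    tau2: times where Omega^c_t == 1.  A time in tau is kept as a bubble
    ending candidate if it lies within dt days of some time in tau2 (Eq. 84)."""
    tau2 = np.asarray(tau2)
    return [t for t in tau if np.any(np.abs(tau2 - t) < dt)]
```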
We succeeded in defining possible ending times of a bubble (τ_end), but nothing tells
us that the elements of τ_end are true ending times. What we know is that
they are ending points of drawups/drawdowns of the log-price series and that, close to
these points, the LPPLS-based quenched Kelly strategies are successful. We will
therefore take this last statement as the definition of the ending time of a bubble.
To obtain the set of true bubble starting times τstart , the following algorithm
has been constructed:
1. Entries: τend , τ1 , τ , pt the price time series and ∆t > 0.
2. Initialization:
(a) Sort τ , τend and τ1 .
(b) For the first element t2 ∈ τ_end, take:

t1 = argmax_{t ∈ τ_1, t < t2 − ∆t} (p_{t2} − p_t), if p_{t2} is in an ε-drawup,
t1 = argmax_{t ∈ τ_1, t < t2 − ∆t} (p_t − p_{t2}), if p_{t2} is in an ε-drawdown.

(When the ending time is in a drawup, the bubble is considered to be
positive, otherwise it is a negative bubble.)
3. Add t1 to τstart .
4. For all the other t2 ∈ τend :
(a) Take T = {t ∈ τend |t < t2 − 3∆t}.
(b) If |T | > 0:
• Take: t0 = max T

• Set T such that:

T = τ_1 ∩ [t0 + ∆t, t2 − ∆t], if τ_1 ∩ [t0 + ∆t, t2 − ∆t] ≠ ∅,
T = [t0 + ∆t, t2 − ∆t], otherwise.

• Take t1 ∈ T such that:

t1 = argmax_{t ∈ T} (p_{t2} − p_t), if p_{t2} is in an ε-drawup,
t1 = argmax_{t ∈ T} (p_t − p_{t2}), if p_{t2} is in an ε-drawdown.

• If the bubble (t1, t2) overlaps other bubbles, compute the price
variation for all the possible configurations, take (t1, t2) such
that the variation of price is maximal, and erase the other bubbles
from τ_start and τ_end.
(c) Else, take: t1 = max {τend ∩ {t < t2 }}.
(d) Add t1 to τstart
5. Exit: τstart and τend
Here ∆t is the smallest distance between two consecutive starting and ending times
(the smallest observable bubble size). In this paper we used ∆t = 60 days.
The idea behind this algorithm is to search, for each possible ending time, the
starting time that maximizes the price change. If elements of τ_1 lie in the
time window considered, we concentrate the search on these points; otherwise
we search over the whole time window.
At each step we are sure to find a bubble, but bubbles can overlap. When two
or more bubbles overlap, we search, among all their starting and ending times,
for the bubble which maximizes the price change, keeping in mind that an
ending time in a drawup is assumed to belong to a positive bubble. In that case
we maximize the growth, otherwise we minimize it.
The ”if-else” condition (4.b, 4.c in the algorithm) is there for the first few points,
because it is possible that the distance between the first ending times in τ_end is
smaller than 3∆t. In this case we take the previous ending time as a starting
time. A simplified sketch of this pairing step is given below.
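The sketch below gives a deliberately simplified version of the pairing loop: it ignores the overlap resolution of step 4.b and the special case 4.c, treats times as integer indices into the log-price series, and assumes the drawup/drawdown flag of each ending time is available. It is an illustration, not the implementation used in this work.

```python
def pair_starting_times(tau_end, tau1, log_price, is_drawup, dt=60):
    """For each ending time t2, pick the candidate starting time t1 (taken from
    tau1 when possible, otherwise from a fallback window) that maximizes the
    price change of the bubble, with the sign set by the drawup/drawdown flag."""
    tau_start = []
    for t2 in sorted(tau_end):
        cands = [t for t in tau1 if t < t2 - dt]
        if not cands:                               # fallback search window
            cands = list(range(max(0, t2 - 3 * dt), t2 - dt))
        if not cands:
            continue
        sign = 1.0 if is_drawup[t2] else -1.0       # positive vs negative bubble
        t1 = max(cands, key=lambda t: sign * (log_price[t2] - log_price[t]))
        tau_start.append(t1)
    return tau_start
```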
Figure 10 shows an example of the regime classification applied to NASDAQ
100 from 1985-08-30 to 2020-02-28.
The upper panel shows the portfolio value for each quenched strategy and the
price of the index. In the background there is a color gradient which classifies
the current regime. Blue corresponds to the case Ω^s_t > Ω^c_t, which means that
the growth follows a geometric trend and so the time series is in a stable phase.
Likewise, the color red indicates the case Ω^s_t < Ω^c_t, which means that the
corresponding time is classified as a critical / bubble phase. When the classification
is uncertain, i.e. Ω^s_t ≈ Ω^c_t, the color is green. The special case when none of
the models is successful enough is represented by the gray color; this case
corresponds to ω^o_t = 1 (Eq.76, p∼46).
The vertical dark lines represent the phase transitions (for example, passing from
a bubble to another bubble or to a stable phase).
Most of the time, the classification is uncertain (green), as both LPPLS and
GBM strategies possibly characterize parts of the price dynamics.

There are only a few moments where we observe that the price dynamics are strongly
characterized by a single model.
The second panel shows the global phases over the time period. The bubble regime
is more present than the geometric dynamic. This is due to the way the labelling
procedure works: the procedure is biased, since the bubble phases are searched for
in the first place and the geometric ones then fill the gaps between bubbles.
In this paper the procedure has been designed to overestimate the bubble phases,
because it seems preferable to obtain false positives rather than false negatives.
We prefer to exit the market, even if no crash occurs, rather than staying in the
game and suffering from a crash.
The third and fourth panels show respectively the phase probabilities and the success
values. It is possible to observe that the success value of each model oscillates
between zero and one. The geometric models seldom reached a class probability of
one, whereas the LPPLS ones reached it often. This is because the LPPLS models
never all obtained a success value below 0.3 at the same time (where 0.3 is the
threshold used in Eq.75), whereas the success values corresponding to the geometric
models often drop suddenly. The reason why this happens for the GBM models could
be the oscillations or crashes characterizing a bubble (under the hypothesis that
the LPPLS model describes bubble behavior well).
The Appendix shows a zoom on each of the episodes in Figure 10.

Figure 10: Global phase labelling on the NASDAQ 100 from 1985-08-30 to
2020-02-28. The first panel shows the price of the index and the value of the
portfolio obtained from the different model-based quenched Kelly strategies. The
second panel shows the global phases over the period: blue means that the
time series is out of a bubble; red means that the time series is in a bubble.
The third panel shows the phase probabilities of being in or out of a bubble, which
are based on the success values of each strategy shown in the last panel. On
average the phase probabilities of having a geometric growth or of being in a bubble
are close to 0.5, but there are moments where the probability of being in a bubble
rises to 1. This means that the geometric-based strategies suddenly lose while the
LPPLS-based ones are sufficiently successful. It is around these instants that we
search for possible bubble ending times.

3.6 Conclusion
In order to determine the current state of a time series and the starting and ending
times of bubbles, we constructed model-based quenched Kelly strategies. We
formulated the hypothesis that if a model-based quenched Kelly strategy is successful,
then the model characterizes the price series well at that instant. Since each
model extracts different features of a time series, we defined a strategy as
successful if it succeeds in increasing, on average, the value of different metrics.
This hypothesis gives rise to the notion of the success value. Based on the success
value of each model-based quenched Kelly strategy, we defined a success weight
for each model, which can be seen as an empirical probability density of being in
one model or another. To characterize the state (bubble or non-bubble) of the price
time series, we constructed the critical probability Ω^c_t of being in a bubble. Then,
from Ω^c_t and the ε-drawups/drawdowns of the log-price, we extracted the starting
and ending times of the different bubbles. We constructed the procedure for searching
the starting and ending times of a bubble such that we are more likely to find
bubbles than non-bubbles; in this paper we preferred to obtain false positives rather
than false negatives.
In the next section, we will apply the different studies done (on the synthetic
data and on the quenched Kelly strategies) to estimate in current time the state
of a time series and the critical time of a bubble, and to optimize the different
strategies presented (the quenched and the mixed Kelly strategies).

4 Current Time Estimation
From the tests on synthetic data, the success value and the labelling procedure,
a current time estimation of the state of a time series will be developed. The
goal is to be able to define the current state of a time series, to obtain a good
estimation of the critical time and to find good strategies based on the different
estimations.
Two techniques have been used:
• a first one based on past results, which implies a delay;
• a second one based on supervised learning, which will try to predict the
state and the best strategy to adopt.

4.1 Estimation from Past Results


4.1.1 Motivation
In the previous section we defined a global phase labelling procedure based on
the success value of the quenched Kelly strategies.
In this section we will try to define in current time the state of a time series,
the best strategy to adopt and the possible critical time. This study will be
based on the success value defined in the last section. We will introduce another
estimation of the best fit to complete the mixed strategy and to find a
better estimation of the critical time.
This new estimation of the best fit will be based on the capacity of a fit to
predict the trend of the price series. The goal of this new estimation is to
extract, in an ex-ante study, a probability measure of the truthfulness of each fit,
which will be called the fitting probability. This probability measure will
be used to give an estimation of the current state of a time series, an estimation
of the critical time and trading decisions.
Since the success value of a strategy has been estimated in an ex-post study, to
use it in current time we will rely on past results, which implies a delay between
the estimation and its application.
The main objective of this section is to compare the results based on the fitting
probability with the ones based on the past success values.

4.1.2 The Fitting Probability Density


In this part we will define a probability density over the different fits done on a
time series, based on the capacity of each fit to predict the growth.
In Section 3.3 (p∼29) we defined the mixed Kelly strategy based on all the
qualified fits of all the models. We made the assumption that the probability
measure on the expected growth of each fit followed a uniform law. In this
part we will try to refine this idea.
The hypothesis that we are going to use is that if, at time t − ∆t, a fit of a model
was qualified and gave an accurate prediction of the price growth, then it is more
likely to give an accurate expectation of the price at time t. Following this idea, the
first thing to do is to define the accuracy of a fit.
Let us consider that at time t we obtained from all the fits the following set:
A(t) = {(µ^{m,t'}_t, Σ^{m,t'}_t, q^{m,t'}_t) | ∀t' ∈ F(t), ∀m ∈ M} . (85)

Here M is the set of all models considered (Eq.49, p∼29), F(t) is the set of
starting times of the fitting windows at time t (Eq.51), µ^{m,t'}_t (Eq.56) and
Σ^{m,t'}_t (Eq.58) are respectively the expected growth and the cumulative variance
obtained from the fit of model m on the time window [t', t], and q^{m,t'}_t is the
qualification of this fit (see Section 2.2.4, p∼8).
The growth observed at time t + ∆t is µ_t; it is then possible to define the
deviation of the expectations of the different fits from the observed growth
as (Eq.86):
d^{m,t'}_t = |µ^{m,t'}_t − µ_t| / Σ^{m,t'}_t , ∀t' ∈ F(t), ∀m ∈ M . (86)
The deviation is an estimate of how well a fit succeeds in predicting the price
dynamics. From the deviations observed at time t + ∆t, it is possible to construct
an empirical probability density which defines the capacity of a fit to predict
the growth from time t to t + ∆t. Let us consider the weight (Eq.87):
w^{m,t'}_t = 1_{{q^{m,t'}_t ≠ 0, q^{m,t'}_{t+∆t} ≠ 0}} e^{−d^{m,t'}_t} , (87)

where 1_{.} is equal to one if the conditions inside it are true and zero otherwise,
and d^{m,t'}_t is the deviation obtained from Equation (86). To obtain a non-zero
weight, a fit needs to be qualified at time t and at t + ∆t.
Since the deviations are positive, the weights increase when the deviations
decrease. We then define the fitting probability density by (Eq.88):
p^{m,t'}_t = w^{m,t'}_t / Σ_{n ∈ M, s ∈ F(t)} w^{n,s}_t . (88)
The fitting probability density is an estimation of the quality of each fit versus
the others. It is based on the estimations obtained at time t and the observed
price growth at time t + ∆t.
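A compact sketch of Eqs. (86)-(88), assuming the per-fit expectations, variances and qualification flags are stored in flat NumPy arrays (one entry per (model, window) pair); the names are illustrative:

```python
import numpy as np

def fitting_probability(mu_hat, sigma_hat, qual_t, qual_next, mu_obs):
    """mu_hat, sigma_hat: expected growth and cumulative std of every fit;
    qual_t, qual_next: qualification flags at t and t + dt; mu_obs: growth
    realized over [t, t + dt].  Returns the fitting probability of each fit."""
    dev = np.abs(mu_hat - mu_obs) / sigma_hat                          # Eq. (86)
    w = np.where((qual_t != 0) & (qual_next != 0), np.exp(-dev), 0.0)  # Eq. (87)
    total = w.sum()
    return w / total if total > 0 else w                               # Eq. (88)
```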
From the fitting probability density it is possible to estimate the probability
of being in a critical phase or not at time t. In Section 3.5.4 (p∼46)
we defined the sets of critical models M_c and stable models M_s. Following the idea
developed in Equation (79), p∼47, the probability of being in a critical phase is
(Eq.89):
Q^c_t = f^c_t / (f^c_t + f^s_t),
where: f^j_t = (1/|M_j|) Σ_{m ∈ M_j, k ∈ F(t−∆t)} p^{m,k}_t , and: j = s, c . (89)

This is an observation, made at time t + ∆t, of the phase at time t, since we need to
observe the growth from t to t + ∆t to be able to construct f^j_t, j = s, c.
In this study the time between two observations is ∆t = 5 trading days.
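Given the fitting probabilities of all fits and the class of their model, Q^c_t follows directly; a minimal sketch, where the boolean flag is an assumption and the model counts |M_c| = 3, |M_s| = 2 come from the definitions above:

```python
import numpy as np

def critical_phase_probability(p_fit, is_critical, n_crit=3, n_stab=2):
    """p_fit: fitting probabilities of all fits (Eq. 88); is_critical: boolean
    array flagging fits of LPPLS-based models.  Implements Eq. (89)."""
    is_critical = np.asarray(is_critical)
    f_c = p_fit[is_critical].sum() / n_crit    # average over the critical set
    f_s = p_fit[~is_critical].sum() / n_stab   # average over the stable set
    den = f_c + f_s
    return f_c / den if den > 0 else 0.0
```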
In the next sections we will use this new empirical density, which we called
the fitting density, to complete the mixed strategy, estimate the state of a time
series and obtain a good estimation of the critical time.

4.1.3 Current Estimation of the Critical Time and State of a Time
Series
In this chapter we will try to estimate the critical time and the current state
of a time series. To estimate the state of a time series we will compare two
techniques:
• The first one is based on the fitting probability density and is characterized
by Equation (89). Since the fitting probability of the estimations made at time t
is only computed at time t + ∆t, there will be a delay. We will therefore consider
that if, at time t + ∆t, we obtain p^{m,t'}_t for a model m, based on its fit over
the time window [t', t], then the probability that the fit done over the time window
[t' + ∆t, t + ∆t] is accurate is given by p^{m,t'}_t.
• The second one is based on the success value defined in Section 3.5.3. The
success value is obtained from backtesting the quenched Kelly strategies, so the
current success value at time t cannot be used, since the ε-drawup/drawdown
algorithm was used to estimate it. To estimate the state in current time we
therefore need to use the past success values. The success weights are then
computed (see Eq.75, p∼45), and from the success weights the critical probability
Ω^c_t (see Eq.79, p∼47) is computed.
Both Q^c_t and Ω^c_t can be seen as the probability of being in a bubble state. To
smooth the results, the rolling average over the last ten data points is taken
for the success values and the deviations.
To estimate the critical time we consider a probability density over the fits.
We are going to consider two hypotheses for this density:
• It could be a uniform density over the qualified fits of the LPPLS-based
models. This means that each fit gives an equally likely estimation of the
critical time; from the tests on synthetic data we know that this is not
true.
• We can use the fitting probability over the fits of the LPPLS-based models.

The expected value and the standard deviation of the critical time are computed
under each of these two densities.
Figure 11 shows an example of results on the NASDAQ 100 from 1985-08-30 to
2020-02-28.
The first panel shows the price of the index.
The second one shows the results obtained with the fitting probability. The
dark shade represents the standard deviation of the distance at time t to the
critical time, and the black line within it is its expected value. The red line is the
fitting critical probability Q^c_t, plotted at time t + ∆t since it is estimated
at that time. To make the fitting critical probability visible, it has been
plotted on a log scale. The LPPLS models are not accurate most of the time, since
there are only a few moments where Q^c_t > 0.1. But we know from the study on
synthetic data that only a small number of fits based on the LPPLS model
are accurate, even at predicting the LPPLS model itself. Thus, on a real time series
the fitting probability of being in a critical phase should be close to zero most of
the time, which is what we observe. Even when the time series is characterized
by the LPPLS model, we may observe a low value of Q^c_t. Let us consider the

NASDAQ 100 peak around 2000, which corresponds to the burst of the
dotcom bubble: Q^c_t < 10^{−13} around this period, but near the peak it jumps
from 10^{−20} to 10^{−14}, which means that a few LPPLS fits suddenly succeeded
in predicting the dynamics. Another nice example is the super-exponential dynamic
around 1995-1996, where there is a peak above 10^{−1} when the dynamic changes
to a flat behavior.
The third panel shows the same results, but the expected value and standard deviation
of the critical time are based on a uniform law and the critical probability is based
on the past success values. This other critical probability always stays within the
same order of magnitude (i.e. in [0.1, 1]). It gives a peak at the burst of the dotcom
bubble, but it also gives a peak during the flat phase of the bubble in 1995-1996.
Visually, the success critical probability gives less sharp peaks and oscillates
much more than the one based on the fitting probability.
We call Figure 11 a ”consultancy test”, because it shows indicators of being
in a bubble regime and estimations of the critical time, from which it is possible to
give advice on the financial market. For instance, if the fitting critical weight is
increasing over time, we can conclude that we are in a bubble regime.
We can then take the estimation of the critical time into account to give an
estimation of the risk over time, which could help traders and other agents
to take decisions on the financial market.
The next step is to apply the knowledge accumulated from the different tests to
combine the quenched Kelly strategies and to complete the mixed one.

Figure 11: Consultancy test on the NASDAQ 100. The dark lines in the two
lower panels are the expected value of the critical time under: the fitting probability
density (center), a uniform law over the qualified fits (lower). The grey
shades are the standard deviation around the expected critical time. There are
points where there is no expected value for the critical time; this corresponds to the
cases when none of the fits of any LPPLS-based model is qualified. The red
lines are the critical weights based on: the fitting probability (center panel); the
success values (lower panel). ”Consultancy test” means that we can use these
results to give advice on the financial market. For instance, when the fitting
critical weight is increasing we can conclude that we are in a bubble, and we can
then use the estimation of the critical time to give an approximation of when
the bubble will burst. The fitting critical weight seems to give better results in
current time than the one based on the past success values, since it gives well
defined peaks when ”obvious” bubbles burst (∼1995 and ∼2000), whereas the
success critical weight oscillates around 0.5 constantly.

4.1.4 Completion of the Quenched and Mixed Strategies


We will complete the mixed Kelly strategy (Eq.65, p∼32) by considering the
fitting probability (Eq.88, p∼55) instead of a uniform law. We will also
combine the quenched Kelly strategies based on the past success values. The
goal of these new strategies is to construct trading strategies based on
different models, in order to obtain a strategy that is robust during a geometric
growth, during a bubble and around a crash.
From the past success weights (ω^m_t, for all m ∈ M) computed to obtain the

critical weight, it is possible to combine the different quenched Kelly strategies
λ^m_t (Eq.63, p∼31), such that the new strategy is (Eq.90):

λ^c_t = Σ_{m ∈ M} ω^m_t λ^m_t . (90)
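In code, Eq. (90) is simply a success-weighted average of the individual Kelly positions; a minimal sketch with illustrative names:

```python
def combined_quenched_weight(success_weights, kelly_weights):
    """success_weights: dict model -> past success weight omega^m_t;
    kelly_weights: dict model -> quenched Kelly fraction lambda^m_t (Eq. 63).
    Returns the combined position lambda^c_t of Eq. (90)."""
    return sum(success_weights[m] * kelly_weights[m] for m in kelly_weights)
```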

Similarly, it is possible to define a new mixed strategy λ^f_t, where the probability
density is no longer a uniform law but the fitting probability density.
Figure 12 shows an example of results on the NASDAQ 100 for the combined
quenched Kelly strategy and the new mixed strategy based on the fitting probability.
The upper panel shows the portfolio value for the B&H (black), the GBM-based
quenched Kelly (blue), the uniformly mixed (the original mixed; green)
Kelly, the combined quenched Kelly (red) and the fitting mixed Kelly (purple)
strategies.
The lower panel represents the portfolio weight invested.
The fitting mixed strategy takes most of the time the same decisions as the GBM one,
but a few times it suddenly changes its position. Compared to the uniformly
mixed strategy, it takes much more risk, which is reflected by the losses during the
burst of the dotcom bubble.
The combined quenched strategy did not succeed in improving on the GBM quenched
strategy.
A more detailed study of all the strategies presented in this paper will be done
in Section 5.2, p∼83.
We tried to optimize the mixed and quenched strategies, but until now we
never directly applied the estimations of the critical time to trading strategies.
This will be the topic of the next section.

Figure 12: Test of different strategies on the NASDAQ 100. GBM (blue) is the
quenched Kelly strategy based on the GBM model, mixed (green) is the mixed
Kelly strategy based on all the models under the hypothesis that they are all
equally likely, the alpha strategy (red) is the combined quenched Kelly strategy
based on the past success value of each model, and the fitted strategy (purple) is
a mixed Kelly strategy where the probability density used is based on the past
deviation of the expectations from the observed realizations. The best strategy in
terms of return remains the GBM-based strategy and the best strategy in terms of
risk is the uniformly mixed strategy. The combined quenched strategy (alpha,
in red) improves on the LPPLS-based strategies but it did not perform better
than the GBM-based one: it did not succeed in catching the new trend after the
crash of the dotcom bubble (∼2000) fast enough, contrary to the GBM-based one.
The fitted mixed strategy has a higher return than the uniformly
mixed one but it also has a higher volatility. By increasing the returns it
also increased the risks, such as the drawdown observed during the crash of the
dotcom bubble.

4.1.5 Application of the Critical Time to Trading Strategies


Until now we did not apply the estimation of the critical time to trading strategies.
In this section we will develop strategies based only on the critical time. We will
then apply these estimations of the critical time to add breaking times into a
strategy. By breaking time we mean a moment when we leave the market: when
we consider that the critical time is too close, we leave the market.
From the tests on synthetic data we know that on average 20% of the fits done
with the LPPLS model give accurate results on its parameters, and that it is
between 10 and 1 days before the critical time that we obtain accurate results
on it.
In Figure 11 we see that the expected critical time does not often reach a
value such that |t − tc| ≤ 10, for either of the two probability densities considered
(uniform, fitting).
Based on these observations we are going to construct a trading strategy based on the
expected critical time. Let us denote by t̂c(t) the expected value of the critical
time at time t. If none of the fits is qualified, we assume that t̂c(t) = 0.
The new strategy is then as follows (a minimal sketch of this rule is given after the list):
• If |t̂c(t) − t| > 100, long or short the asset based on the expected qualification
under the probability density considered (the fitting probability
or a uniform law over the LPPLS qualified fits).
• When 10 ≥ |t̂c(t) − t|, quit the position.
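The sketch below illustrates this rule; what to do between 10 and 100 days is not specified above, so keeping the current position in that range is an assumption of the sketch, not a statement of the method:

```python
def critical_time_position(t, tc_hat, base_position, far=100, near=10):
    """tc_hat: expected critical time at time t (0 if no qualified fit);
    base_position: long/short weight implied by the expected LPPLS sign."""
    if tc_hat == 0 or abs(tc_hat - t) > far:
        return base_position            # critical time far away: stay invested
    if abs(tc_hat - t) <= near:
        return 0.0                      # critical time close: quit the position
    return base_position                # assumed behaviour in between
```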
Figure 13 shows an example of results on the NASDAQ 100 for the two tc
strategies considered.
The upper panel shows the portfolio value for the uniform critical time (purple)
and the fitting critical time (green) strategies.
The two lower panels are the same as in Figure 11.
It is possible to observe that the strategy based on a uniform law is less efficient
than the one based on the fitting probability density. Near the peak of the
dotcom bubble (around 2000), the fitting critical time strategy succeeded in exploiting
the bubble behavior but did not succeed in leaving the market before the first
crash, whereas the one based on a uniform law left the market before the first
peak but did not catch the drawup, and a drawdown occurs when the price
starts to decrease after the peak.

Figure 13: Test of the critical time strategies based on: the fitting density of
probability (center), a uniform law on the qualified fits (lower). The dark lines in
the two lower panels are the expected value of the critical time under: the fitting
density of probability (center), a uniform law over the qualified fits (lower). The
grey shades are the standard deviation around the expected critical time. There
are points where there is no expected value for the critical time; this corresponds to the
cases when none of the fits of any LPPLS-based model is qualified. The red
lines are the critical weights based on: the fitting probability (center panel); the
success values (lower panel). The uniform critical time strategy (purple) is less
efficient than the one based on the fitting probability (green). When we assume
that all estimations from all the qualified fits are equally likely (lower panel),
the expected value of the critical time gets close to the current time (i.e. |tc − t| ≤ 10)
more often than the one based on the fitting probability, so the strategy
based on a uniform law gets stopped more often.

The weight to invest is based on the expected qualification of the LPPLS fits.
This works under the hypothesis that the sign of the parameter B of the LPPLS
always correctly reflects the direction in which the price will go. Since the
LPPLS is often not valid, this hypothesis may not hold.
To define this strategy we only used the observed critical time, the qualification
of the fits and the two probability densities over the fits. An idea would be to combine
the new strategies developed in this chapter with the critical time strategies,
such that each strategy is given breaking times to exit the market close to
the critical time. Following this idea, a breaking time strategy is defined by:

• If |t̂c(t) − t| > 100, start to use the strategy.
• When 10 ≥ |t̂c(t) − t|, quit the position until you reach a time t such
that |t̂c(t) − t| > 100.
It is possible to apply these critical time breaks to all the different strategies
developed in this paper.
Figure 14 shows an example of results on the NASDAQ 100. The study and
comparison of all the strategies will be done in a later chapter (Section 5.2, p∼83).

Figure 14: Test of the application of the critical time to: the combined quenched
strategy with a uniform breaking time (purple line); the mixed strategy based
on the fitting probability density with the fitting breaking time (green line). If
we compare this figure with Figure 12, we see that the critical time breaks do not
improve these strategies (particularly during the crash of the dotcom bubble
after 2000, because they stop before the end of the drawup of the bubble).

We developed current time methods to adapt and combine the different
strategies and to estimate the phase of a time series. We based this study on
past results and used the hypothesis that past results reflect the future behavior
of the time series. In the next section we will apply supervised learning to predict
the fitting probability, the success values of the quenched strategies, the current
state of a time series and when a crash will occur.

4.2 Prediction of Structural Breaks / Phase Transitions
4.2.1 Motivation
In the previous part we tried to estimate different features of a time series from
past results.
The goal of this section is to apply supervised learning to the data collected
during this study. The main objective is to estimate when we enter and exit
a bubble, and the best financial strategy to apply as a function of the current
state of the time series.
To achieve this objective, the set of inputs to feed the supervised
learning algorithm will be defined, different targets will be constructed (the
best strategy to adopt, the success value of a model, the bubble characteristic
times, ...) and different neural networks will be developed to find a
map between the set of inputs and each target.

4.2.2 The Inputs Set


The idea is to predict, five trading days into the future, different values based on
the data collected by the fitting procedure for each model, the labelling procedure
and the price time series.
At time t there is a matrix of parameters Ψ^m_t, where m ∈ M is a model in the set
of all models considered, such that each row is the vector of parameters Ψ^m(t', t)
obtained by fitting model m over the time window [t', t], for each t' ∈ F(t). Since
the fits at time t are done over different time windows (from 20 to 504 past trading
days), there are 484 fits per model. Then, for each model m ∈ M, the matrix of
data reads (Eq.91):
 0
t1 Ψm (t01 , t)

 . ..
Ψmt =  ..  , (91)

.
t0K Ψm (t0K , t)
where K = 484 is the number of fitting windows, such that: F (t) = {t01 , ..., t0K }.
The row of parameters per fit for each model is given in Table 5.

Model Parameters
LPPLS+GN Ψ = {A, B, C1 , C2 , m, tc , ω, σ}
LPPLS+BM Ψ = {A, B, C1 , C2 , m, tc , ω, σ}
LPPLS+OU Ψ = {A, B, C1 , C2 , m, tc , ω, σ, θ}
GBM Ψ = {µ, σ}
MGBM Ψ = {µ, σ, θ}

Table 5: Summary of the parameters of each model defined in Table 1, p∼8. All
the parameters come respectively from Equations 5, 6, 9, 10, 11 and 12 in
Section 2.2, p∼3.

The last inputs taken into account are the last 500 log-prices of the asset at
time t, p⃗_t = (ln p_{t−499}, ..., ln p_t), and T_t = t + ∆t, the time targeted for the
prediction (where ∆t = 5 trading days).

So the set of data used as input at time t is (Eq.92):

D_t = {Ψ^m_t, ∀m ∈ M} ∪ {p⃗_t, T_t} . (92)

We are going to construct different neural networks to predict different features,
but the input layer of each network will have the same topology and the
input set D_t will be the same for all networks.

4.2.3 The Targets


The different outputs targeted by the networks are:
• the best strategy to adopt, λ^s_t;
• the success value of each model m at time T_t (α^m_{T_t} for all m ∈ M, see
Eq.73, p∼45);
• different signals to estimate the starting and ending times of a bubble, and
another to estimate whether the critical phenomenon has already occurred at time
T_t;
• the probability that a fit of model m done on [t', t] is trustworthy for predicting
the price at T_t.
Each of these targets is described below.

The best trading strategy


The best trading strategy at time t is the fraction of the portfolio value to invest
from t to T_t. To train the network, the idea is to take

λ^s_t = sign(p_{T_t} − p_t) , (93)

as the target. Why are we trying to predict the best trading strategy?
It is to observe whether a network is able to find a better strategy than the ones
developed in the previous sections (Sections 3.3 & 4.1.4) and in the next
sections (see Section 4.2.5, p∼75). This target can be seen as the most direct
way to get an ”optimal” trading strategy.

The success value


A success value for each strategy has been obtained from backtesting them
(Eq.73, p∼45). The goal here is to be able to predict the success value of a
strategy from time t to Tt .

The bubble characteristic times


Again, from the labelling procedure, the starting and ending times of bubbles
have been estimated in an empirical study (see Section 3.5.4, p∼46). The
goal here is to predict, at time t, a signal that at time T_t we will enter (q^s_t) or
exit (q^e_t) a bubble. Let the sets of starting and ending times of bubbles be
τ_start and τ_end respectively. A first idea could be to consider that the
signals to target take the forms (Eq.94):

q^s_t = 1_{{T_t ∈ τ_start}}
q^e_t = 1_{{T_t ∈ τ_end}} , (94)

where 1_{.} is equal to one if the argument inside it is true and zero otherwise. The
main problem with these functions is that if a striking time is arbitrarily close to
a time in τ_start ∪ τ_end but never touches one because of the discretization, the functions
will always be zero and we will remain blind to the signal. Moreover, even if a
few striking times fall exactly on the bubble characteristic times, the number of times
the functions equal 1 is much lower than the number of times they equal
zero. To illustrate this, let us consider the example of the NASDAQ 100. Over
the past 30 years of backtesting, we took close to 2000 decisions, each spaced
by 5 trading days, and approximately 40 bubble episodes have been found. This
means that the maximum number of times that each function would raise a value
of one is 40, out of about 2000 training samples, so that the ratio of times
with a signal would be about 1/50. The bubble signal would thus be drowned.
The approach used to overcome this problem is to construct a probability density
around each time in τ_start ∪ τ_end. The first idea would be to use a Gaussian
distribution, but the issue is that the signal would then be symmetric around each time:
looking forward or backward along time we would observe the same distribution.
The second idea, and the one used in this paper, is to consider an asymmetric
Gaussian distribution whose density is (Eq.95):

q(x; µ, σ, δ) = (1/(σπ)) e^{−(x−µ)²/(2σ²)} ∫_{−∞}^{δ(x−µ)/σ} e^{−y²/2} dy , (95)

where µ is the position, σ the scale and δ the asymmetry parameter. Figure 15
shows an example of asymmetric Gaussian distribution.

Figure 15: The asymmetric Gaussian distribution with µ = 0, σ = 1 and
δ = ±10. For δ > 0 (continuous line) the tail is on the right and for δ < 0
(dashed line) it is on the left side.
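With the reconstruction of Eq. (95) given above, the density is the standard skew-normal form, i.e. a Gaussian kernel multiplied by a Gaussian cdf; a small illustrative sketch (not the thesis code):

```python
from math import erf, exp, pi, sqrt

def asym_gaussian(x, mu, sigma, delta):
    """Asymmetric (skew) Gaussian density of Eq. (95): position mu, scale sigma,
    asymmetry delta (delta > 0 puts the tail on the right, delta < 0 on the left)."""
    z = (x - mu) / sigma
    gauss = exp(-0.5 * z ** 2) / (sigma * sqrt(2.0 * pi))
    cdf_term = 0.5 * (1.0 + erf(delta * z / sqrt(2.0)))
    return 2.0 * gauss * cdf_term
```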

Using the asymmetric Gaussian distribution, three different targets are going to
be constructed:
• q^s, the starting signal of a bubble;
• q^e, the ending signal of a bubble;
• q^c, the signal of the dissipation of the critical event after the bubble.
Consider the set of bubbles:

B = {(ts , te )| for all bubble episodes} , (96)


and let us denote the bubble duration by:

δt_se = t_e − t_s , ∀(t_s, t_e) ∈ B , (97)

and the bubble interval by:

T_se = [t_s − 0.1 δt_se, t_e + 0.1 δt_se] . (98)

Then we construct the targeted signals following Equation (99):

q^s_t = Σ_{(t_s,t_e) ∈ B} 1_{{t ∈ T_se}} q(T_t; t_s, √(δt_se/2), 10)
q^e_t = Σ_{(t_s,t_e) ∈ B} 1_{{t ∈ T_se}} q(T_t; t_e, √(δt_se/2), −10) . (99)
q^c_t = Σ_{(t_s,t_e) ∈ B} 1_{{t ∈ T_se}} q(T_t; t_e, √(δt_se/2), 10)

Where Tt is the striking time (the time at which we want to make a prediction).
Note that the targets q_t are not probability distributions; they represent
the bubble signals. An example is given in Figure 16. Once a peak is observed
in q^s_t, the time series has entered a bubble regime. While q^e_t is low, it is possible
to learn some features about the bubble, such as whether it is a positive or a negative
one, the strength of the oscillations, its critical exponent, etc. When q^e_t starts
to increase, this means that the critical event is close, and when it reaches a
maximum it is possible to take a decision based on the knowledge gathered during
the bubble (to short, long or quit the market safely). Once
a peak in q^c_t has been observed and it has started to decrease, it means that we
should have passed the critical event (such as a crash) and a strategy which
works well outside of a bubble (such as the GBM quenched Kelly strategy) can
be applied again. A sketch of the construction of these signals is given after Figure 16.
To each signal a Gaussian noise N(0, 0.05) is added and, on each bubble
interval T_se, the signals q_t are normalized to lie in [0, 1] such that they equal
one at each peak.

Figure 16: An example of bubble signals with t_s = 0 and t_e = 100. a) is the
pure signal and b) is the signal with a Gaussian noise N(0, 0.05) added.
When a peak in q^s_t (black) is observed, it means that we enter a bubble, and
when the signal decreases, it means that we get deeper into
the bubble. When q^e_t (red) starts to increase, the risk of a possible crash
increases with it. q^c_t represents the dissipation of the critical phenomenon:
when it decreases, the risk of a possible crash decreases with it.
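Under the reconstruction of Eq. (99) above, the raw (un-normalized, noise-free) signals can be sketched as follows, reusing the asym_gaussian helper sketched after Figure 15; names and looping style are illustrative assumptions:

```python
import numpy as np

def bubble_signals(times, bubbles, delta=10):
    """times: striking times T_t; bubbles: list of (t_s, t_e) episodes.
    Returns the raw signals q^s, q^e, q^c of Eq. (99); Gaussian noise and the
    per-interval normalization are applied afterwards, as described above."""
    qs, qe, qc = (np.zeros(len(times)) for _ in range(3))
    for ts, te in bubbles:
        dur = te - ts
        lo, hi = ts - 0.1 * dur, te + 0.1 * dur          # bubble interval, Eq. (98)
        scale = np.sqrt(dur / 2.0)
        for i, T in enumerate(times):
            if lo <= T <= hi:
                qs[i] += asym_gaussian(T, ts, scale, delta)
                qe[i] += asym_gaussian(T, te, scale, -delta)
                qc[i] += asym_gaussian(T, te, scale, delta)
    return qs, qe, qc
```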

Fits truthiness
The goal here is to establish whether a fit is going to give a good prediction of the
asset growth at time T_t. This is equivalent to predicting the fitting probability
developed in Section 4.1.2 (p∼54). In the previous part we used past results to
define the fitting probability p^{m,t'}_t (Eq.88), but here we will try to use a neural
network to predict it.

The list of targets from t to Tt is given in Table 6.

Names Symbols
strategy λ^s_t
success values {α^m_{T_t}, ∀m ∈ M}
bubble signals {q^s_t, q^e_t, q^c_t}
fits truthiness {p^{m,t'}_t, ∀m ∈ M and t' ∈ F(t)}

Table 6: The values targeted by the neural networks at time t for T_t, where the
fits of each model m ∈ M at time t are done over different time windows
[t', t] for all t' ∈ F(t).

4.2.4 The Multi-Branches Convolutional Neural Network (MCNN)


The goal of the neural network is to find a map:

φ : Dt → Ot , (100)
where Dt is the set of all the data available at time t (Eq.92) and Ot is the set
of data targeted (Tab.6).
Since we have a different matrix Ψ^m_t ∈ D_t for each model, the idea is to use a
network with several branches: one branch per model, one branch for the
past log-prices and a last one for the striking time T_t. Convolutional networks
are used for the branch of each model and for the price data, and a dense network
for T_t. The convolutional branches all have the same structure: a 1-dimensional
convolutional layer, followed by a maximum pooling layer and ending with another
1-dimensional convolutional layer. The last layers of the branches are then flattened
and concatenated with each other, and a final dense layer produces the output.
Each branch starts at the bottom with a batch normalization layer (see Figures 17, 18 & 19).
The idea behind this network structure is that the convolutional branches will try
to find the correlations within the fits of a model and reduce the set of data for each
model, and similarly for the log-price data. All these reduced data sets are then
flattened and concatenated, and two dense layers, i.e. a Multi-Layer Perceptron
(MLP) with one hidden layer, are applied. Once the main features of each sub data
set are extracted by the CNN branches, the MLP compares the results of each model
with the log-price data and the striking time to finally give the outputs. We did not
use a very deep network here, because features of the price time series are already
extracted by the fits and the goal of the network is to find correlations between these
data. A sketch of such a multi-branch architecture is given below.
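The following Keras sketch illustrates a multi-branch convolutional architecture of this kind; the filter counts, kernel sizes and dense widths are illustrative assumptions, not the hyper-parameters used in this work, and the single output head stands in for any of the targets above.

```python
from tensorflow.keras import layers, Model

def conv_branch(n_steps, n_channels):
    """One branch: batch norm, Conv1D, max pooling, Conv1D, flatten."""
    inp = layers.Input(shape=(n_steps, n_channels))
    x = layers.BatchNormalization()(inp)
    x = layers.Conv1D(16, 5, activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(8, 3, activation="relu")(x)
    return inp, layers.Flatten()(x)

# one branch per model (number of parameters from Table 5), one for the
# 500 past log-prices and a dense branch for the striking time T_t
model_params = {"LPPLS+GN": 8, "LPPLS+BM": 8, "LPPLS+OU": 9, "GBM": 2, "MGBM": 3}
inputs, flats = [], []
for n_p in model_params.values():
    inp, flat = conv_branch(484, n_p)
    inputs.append(inp); flats.append(flat)
price_in, price_flat = conv_branch(500, 1)
time_in = layers.Input(shape=(1,))
time_feat = layers.Dense(4, activation="relu")(layers.BatchNormalization()(time_in))
inputs += [price_in, time_in]; flats += [price_flat, time_feat]

x = layers.Concatenate()(flats)
x = layers.Dense(64, activation="relu")(x)     # MLP with one hidden layer
out = layers.Dense(1, activation="tanh")(x)    # e.g. the trading target in [-1, 1]
mcnn = Model(inputs=inputs, outputs=out)
mcnn.compile(optimizer="adam", loss="mse")
```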
Three different networks are going to be implemented to predict the three main
categories of targets (Tab.6):

• The trading target: this network tries to predict the best strategy to
adopt from t to T_t; let us call it the trading network (Fig.19).
• The fit targets: this network tries to predict the probability that a fit
is going to give the best expected growth of the asset; let us call it the
fitting network (Fig.18).
• Finally, the last network tries to predict the values coming
from the labelling procedure developed in a previous part. This network
is a bit different from the others since it has two
output branches, one for the success values and another for the bubble
signals; since the bubble identification procedure is based on
the success values, the idea is that the correlations between
the parameters of each model and the price time series should be the
same for the success values and the bubble signals. Let us call this network
the consulting network (Fig.17).

Figure 17: Consulting network, where the input layers 1 to 7 are respectively
the striking time T_t, the parameters of each of the 5 models and the last
log-prices.

Figure 18: Fitting network, where the input layers 8 to 14 are respectively the
striking time T_t, the parameters of each of the 5 models and the last log-prices.

Figure 19: Trading network, where the input layers 15 to 21 are respectively
the striking time T_t, the parameters of each of the 5 models and the last
log-prices.

4.2.5 Network Applications
Three different networks (the trading, consulting and fitting networks)
have been implemented and the input and target sets have been defined. In this
section we will define strategies based on the outputs of each network. The
consulting and fitting networks can both be used for consultancy, to estimate
the state of a time series at time t and to obtain a prediction of the new state
at time T_t.
All three networks can be used to define a trading strategy:

• For the trading network this is obvious, since it is its only vocation.
• For the fitting network, the idea is to complete the mixed strategy elaborated
in a previous part (Section 3.3, p∼29), as was done in
Section 4.1.4, p∼58. Let us denote this new strategy by λ^f_t.
• For the consulting network, it is possible to define the portfolio weight λ^c_t
to invest in the asset from the success values. The idea is to use the same
concept developed in Section 4.1.4, p∼58, but here the success values used
are the predicted ones and not the ones from a month ago.
We then have three new strategies λ^s_t, λ^f_t and λ^c_t to compare with the other
strategies developed in this paper (Table 8, p∼83, summarizes all the strategies
developed in this paper).
Figure 20 shows the targets of the training set on the NASDAQ 100 and the
application of the three new strategies. These plots correspond to the case
where we predict the different targets exactly.
The first panel represents the application of each network (the consulting network
in blue, the fitting network in orange and the trading network in green) to
a trading strategy, the price of the asset in black, and, in black dashes, the median
and 0.25,0.75-quantiles of the expected price at T_t based on the expected
growth of each fit of each model under the fitting probability. Here, the trading
network strategy is the best strategy to adopt on this time series. The fitting
network strategy is close to the optimal strategy, so we can conclude that
the different models used are able to describe the time series at each instant, but
that the difficulty lies in the estimation of the best fits, which is precisely the goal
of the fitting network. The consulting network strategy is not as efficient as the
fitting network one, but if the network gives us a good estimation of the success
values, it will be able to improve the combined quenched strategy (Section
4.1.4, p∼58).
The second panel shows the success values that we would like to predict (the
same as in Figure 10, p∼52).
Likewise, the third and fourth panels represent the bubble signals and the fits
truthiness that we would like to predict. The fitting probability of each model
oscillates between zero and one.

Figure 20: Training set for the NASDAQ 100 with the application of the networks.
This figure represents the values targeted by the neural networks, on
which they are trained. On the upper panel the different strategies are plotted.
The best fitted mixed strategy (orange) is close to the optimal strategy
(green). The best strategy based on the success values (blue), used to combine all the
quenched strategies, is not as efficient as the others, and a small drawdown is
observed around the peak of the dotcom bubble (near 2000). The second panel
shows the success values for all model-based quenched strategies. The third one
shows the different signals for the starting time (black), ending time (red) and
dissipation (blue) of a bubble. The last panel shows the fits truthiness of each
model based on its capacity in current time to predict the trend. Based on
the fits truthiness, the median and 0.25,0.75-quantiles estimated from the fits
are computed (black dashed lines in the upper panel).

To train each network we considered different time series, which are indices or
shares (see Table 7). The training set corresponds to all the price time series
presented in Table 7 up to 2016-10. After 2016-10, the data are used to test
the networks.
Name Symbol From To
Western Union WU 2006-9-20 2020-04-20
TransDigm Group TDG 2006-03-15 2020-04-20
Assurant, Inc. AIZ 2004-02-05 2020-04-20
Garmin GRMN 2000-12-08 2020-04-20
Concho Resources CXO 2007-08-03 2020-04-20
MarketAxess Holding, Inc. MKTX 2004-11-05 2020-04-20
MSCI MSCI 2007-11-15 2020-04-20
NASDAQ NDAQ 2002-07-01 2020-04-20
NASDAQ 100 NDX 1985-08-30 2020-02-28
Edwards Lifesciences EW 2000-03-28 2020-04-20
Equinix EQIX 2000-08-11 2020-04-20
Extra Space Storage EXR 2004-08-12 2020-04-20
CME group CME 2002-12-06 2020-04-20
Alphabet Inc Class A GOOGL 2004-08-19 2020-04-20
Swiss Market Index SWISSMI 1988-06-30 2020-04-20
Wynn Resorts WYNN 2002-10-28 2020-04-20
Digital Realty Trust, Inc. DLR 2004-10-29 2020-04-20
Las Vegas Sands LVS 2004-12-15 2020-04-20
Seagate Technology STX 2002-12-11 2020-04-20
Illumina Inc. ILMN 2000-07-28 2020-04-20
Tapestry TPR 2000-10-05 2020-04-20
IntercontinentalExchange ICE 2005-11-16 2020-04-20
Under Armour Inc Class A UAA 2005-11-18 2020-04-20
Leidos LDOS 2006-10-13 2020-04-20
Devon Energy DVN 1985-07-22 2020-04-20
LKQ Corporation LKQ 2003-10-03 2020-04-20
NVR, Inc. NVR 1993-10-01 2020-04-20
Live Nation Entertainment LYV 2005-12-14 2020-04-20
Packaging Corporation of America PKG 2000-01-28 2020-04-20
Akamai Technologies AKAM 1999-10-29 2020-04-20
United Parcel Service UPS 1999-11-10 2020-04-20
NRG Energy NRG 2003-12-03 2020-04-20
Salesforce CRM 2004-06-23 2020-04-20

Table 7: Names and dates of the price time series used in this paper to train
and test the neural networks.

4.2.6 Results
Figure 21 shows the outputs obtained by using the networks on the testing set
of the NASDAQ 100. The structure of the figure is the same as for Figure 20.
Unfortunately, the results obtained are far from the targeted ones. Each output
tends to the mean value of the training set and appears constant over time; if we
zoom in on each curve, we observe that there are oscillations and the results are
not completely flat.
To try to extract information from the outputs of each network, an idea is to normalize
the results. Two types of normalization are used:
• The rolling extremum normalization (Eq.101):

x_i → (x_i − x^min_i) / (x^max_i − x^min_i),
x^max_i = max_{j=i−4,...,i} x_j , x^min_i = min_{j=i−4,...,i} x_j . (101)

• The rolling standard normalization (Eq.102):

x_i → (x_i − x̄_i) / ∆x_i,
x̄_i = (1/5) Σ_{j=i−4}^{i} x_j , ∆x_i = √( (1/4) Σ_{j=i−4}^{i} (x_j − x̄_i)² ) . (102)
Here x_i is the i-th data point of the output x of a network. The rolling normalization
is done over the 5 last data points because the data points are spaced by
5 trading days and the smallest bubble size labelled in this paper is 60
days (not trading days); 5 data points thus represent the 25 past trading days,
which is about half of the smallest possible bubble size. The idea behind the
rolling normalization is to catch the changes in the trend of each output.
A sketch of both normalizations is given below.
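Both normalizations are straightforward to implement; a minimal sketch with illustrative names and the five-point window used above:

```python
import numpy as np

def rolling_extremum_norm(x, window=5):
    """Eq. (101): rescale each point by the min/max of the last `window` points."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        seg = x[i - window + 1:i + 1]
        lo, hi = seg.min(), seg.max()
        out[i] = (x[i] - lo) / (hi - lo) if hi > lo else 0.0
    return out

def rolling_standard_norm(x, window=5):
    """Eq. (102): center and scale each point by the rolling mean and std."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        seg = x[i - window + 1:i + 1]
        std = seg.std(ddof=1)
        out[i] = (x[i] - seg.mean()) / std if std > 0 else 0.0
    return out
```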
Figure 22 shows the results obtained after normalization. All the results are
shown in Appendix K, p∼193.
It is possible to observe that, after normalization of each target, the results are
not better at all, and they are equivalent for both types of normalization.

Figure 21: Testing set for the NASDAQ 100 with the application of the networks.
This figure represents the values obtained on the testing set by the networks
after optimizing them. The upper panel shows the results for the application
of the different trading strategies: the best strategy predicted (green);
the fitted mixed strategy (orange); the combined quenched strategy (blue). Each
strategy is based on the prediction given by the corresponding neural network.
The fitted mixed strategy is based on the Kelly weight obtained by combining
the expectations based on all the fits of all the models considered, under the fitting
probability predicted by the network (lower panel). The success values used to
combine the quenched strategies are the ones predicted by the corresponding
network (second panel). After optimization, the networks tend to give as outputs
the mean values of the training sets.

(a) Rolling extremum normalization (b) Rolling standard normalization

Figure 22: Testing set for the NASDAQ 100 with the application of the networks.
This figure represents the values obtained on the testing set by the networks
after optimizing them. Each output (the supervised strategy λ^s in green, the
success values α on the second panel, the bubble signals q on the third panel
and the fits truthiness on the lower panel) has been normalized using: the
extremum normalization (left), the standard normalization (right). The rolling
normalization has been done over the five past data points. These two types of
normalization did not help to get better results.

4.2.7 Discussion
The results obtained from the neural networks are poor. They do not succeed in
capturing the main features of the training targets. In general, the networks tend to
generate outputs which are close to the mean value of the training set. Different
reasons could be the cause:
• The hyper-parameters of the networks (number of hidden layers, learning
rates, ...) or the network structures are not optimal, and a grid search to
optimize them could be done.
• The size of the training set is too small (here ∼20000 sets of data have been
used to train the networks). When we increased the size of the training
set, the accuracy of the networks after training increased, but visually
the results did not change much.
• Other types of data could be added to the input set, such as the volume
exchanged each trading day, the metrics of each model-based quenched
Kelly strategy (Table 4, p∼38), etc.
• All the targets are values that we want to predict, and to construct them we
assumed that we were able to look into the future. One may then ask whether
it is possible at all to predict these values using neural networks.

5 Discussion
In this paper we tried to develop different tools and techniques to construct
trading strategies that are robust against crashes, and indicators to define, in current
time or in a backtest study, the state of a time series (geometric growth / non-bubble;
LPPLS / bubble).
The goal of this section is to come back to the different results obtained
and to the different concepts and techniques elaborated and used in this paper,
to highlight their successes and/or failures, to give the pros and cons of the
different methods developed, and to indicate what could be changed and improved
in possible future studies.

5.1 Models and Optimizations Procedures


From the tests on synthetic data (Section 2.4.4, p∼23), we observed that it is
difficult to fit data with respect to the LPPLS model. We have seen in Section
2.4.3, p∼20, that on average only 20% of the fits are significant when tested
on synthetic data, even for a small noise amplitude (σ = 10⁻² days⁻¹/²). We
based the optimization procedure on the maximization of the log-likelihood
with respect to each error model considered (see Section 2.2.2, p∼5).
It could be a good study to use other metrics to find different features of
the LPPLS model. For example, one could construct a method that finds the
power-law on one side and the log-periodicity on the other, and feed them as
guesses into a non-linear optimization. Since the log-periodicity contains
information about the critical time, it could be worthwhile to develop an
optimization based on it. We have seen in Section 2.2.4, p∼8, that if we observe
three oscillations, we can estimate tc directly from the distances between the
three local maxima, and this concept is not used at all in the optimization
procedure constructed in this paper; a minimal sketch of this estimation is given
below.
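To make this last point concrete, here is a minimal sketch (not part of the thesis code) of how tc could be recovered from three successive local maxima t1 < t2 < t3 of the log-periodic oscillation: since the oscillation is periodic in ln(tc − t), the maxima satisfy (tc − t1)(tc − t3) = (tc − t2)², which gives tc = (t2² − t1 t3)/(2 t2 − t1 − t3).

import numpy as np

def tc_from_maxima(t1, t2, t3):
    # Estimate the critical time t_c from three successive local maxima
    # t1 < t2 < t3 of the log-periodic oscillation: since the oscillation is
    # periodic in ln(t_c - t), the maxima satisfy
    # (t_c - t1) * (t_c - t3) = (t_c - t2)^2, which is solved for t_c below.
    return (t2 * t2 - t1 * t3) / (2.0 * t2 - t1 - t3)

# Example: maxima generated with t_c = 100 and a log-frequency omega = 8,
# t_n = t_c - exp(-2*pi*n/omega), n = 0, 1, 2
tc, omega = 100.0, 8.0
t1, t2, t3 = (tc - np.exp(-2.0 * np.pi * n / omega) for n in range(3))
print(tc_from_maxima(t1, t2, t3))   # recovers t_c = 100.0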
We tried to construct a generic code to be able to fit the different models to a
time series easily and quickly, but this code could be improved. The
implementations of the models and of the optimization procedures could be written
in a better way. For this paper we implemented the error models (GN, BM, OU) on
one side and the corresponding cost functions (Eq.27, p∼13) on the other. It
would be nicer if the cost function based on an error model were already
implemented into the error model itself, so that when a predictive model and an
error model are combined, the cost function would be adapted directly. Currently,
one has to combine a predictive model with cost functions to obtain a
non-abstract class, and then combine it again with an error model to obtain the
complete stochastic model. A minimal sketch of such a design is given below.
Another interesting extension of the code would be to generalize the combination
of different models, such that if we have two models X and Y, we would be able to
construct directly the models X + Y and X·Y. Currently the only way to combine
two models is if one is a predictive model and the other a stochastic one; then
they can be combined into an Ito model instance. This construction could be
generalized and improved.
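As an illustration of this suggestion, here is a minimal sketch (all class and method names are hypothetical, not the ones used in the thesis code) of a design in which each error model carries its own cost function, so that combining a predictive model with an error model directly yields the complete stochastic model:

from abc import ABC, abstractmethod
import numpy as np

class ErrorModel(ABC):
    @abstractmethod
    def cost(self, residuals, time):
        # Cost associated with the residuals under this error model.
        pass

class GaussianNoise(ErrorModel):
    def cost(self, residuals, time):
        return residuals.dot(residuals) / len(residuals)      # SSE of the residuals

class BrownianNoise(ErrorModel):
    def cost(self, residuals, time):
        d, dt = np.diff(residuals), np.diff(time)
        return d.dot(d / dt) / len(d)                         # SSE of the increments

class ItoModel:
    # Combines a predictive model (a callable (time, *params) -> prediction)
    # with an error model; the cost function comes from the error model itself.
    def __init__(self, predictive, error_model):
        self.predictive, self.error_model = predictive, error_model

    def cost(self, objective, time, *params):
        residuals = objective - self.predictive(time, *params)
        return self.error_model.cost(residuals, time)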

5.2 Trading Strategies
Model-based Quenched Kelly Strategy: defined in Section 3.3, p∼29, and
characterized by Equation 63, p∼31. These strategies are named after the model
they are based on (see Table 1, p∼8).
Uniformly Mixed Kelly Strategy: defined in Section 3.3, p∼29, and characterized
by Equation 65, p∼32. It is the first attempt to combine the different models
together, under the hypothesis that every qualified fit is equally likely.
Fitting Mixed Kelly Strategy: defined in Section 4.1.4, p∼58. It is a mixed
Kelly strategy based on the hypothesis that the best fits at the previous step
will be the best fits at the current one. The probability density is then no
longer uniform but is defined by Equation 88, p∼55.
Combining Quenched Strategy: defined in Section 4.1.4, p∼58, and characterized
by Equation 90, p∼59. It is another attempt to combine the different models.
Based on the past successes of each quenched strategy, a probability density of
success of each strategy is computed in a way equivalent to the one used to
obtain the success weights (see Equation 74, p∼45).
Uniform Critical Time: defined in Section 4.1.5, p∼60. This strategy is based on
the hypothesis that all the LPPLS qualified fits are equally likely, and so are
their corresponding critical times. We take a position depending on the expected
sign of the bubble and quit the position when we get too close to tc.
Fitting Critical Time: defined in Section 4.1.5, p∼60. This strategy is based on
the same hypothesis as the fitting mixed Kelly strategy (i.e. if a fit is
accurate at the previous step, it will be accurate at the current one). The
expected critical time is then based on the fitting probability (Eq.88, p∼55).
We take a position depending on the expected sign of the bubble and quit the
position when we get too close to tc.
Strategy with Uniform Breaking Time: defined in Section 4.1.5, p∼60. It is the
combination of any Kelly-based strategy with the uniform critical time one. The
idea is to use the critical time as a breaking time of the strategy (i.e. we use
the strategy until we are too close to the critical time, then exit the market
until the LPPLS model is no longer qualified or a new critical time is observed
far away).
Strategy with Fitting Breaking Time: defined in Section 4.1.5, p∼60. It is the
same as the strategy with uniform breaking time, but based on the fitting
probability.

Table 8: Summary of all the strategies developed in this paper.

In this paper we have developed different trading strategies based on the
different models and on the Kelly criterion (see Table 8). We will not discuss
here the strategies obtained from the application of the neural networks, due to
the poor results that we obtained, and instead concentrate on the strategies
presented in Table 8.
To be able to compare the different strategies, they have all been backtested
on all the time series given in Table 7, p∼77. The metrics presented in Table 4,
p∼38, have then been computed for each time series, at each decision time, over
all possible time windows. They have been plotted as a function of the window
size on which the metric has been computed, from Figure 169 to Figure 178, which
cover all possible combinations of the strategies presented in Table 8. For each
strategy we have plotted each metric as a function of the size of the time
window T (see Table 4, p∼38). The dark shade represents the interval between the
0.1-quantile and the 0.9-quantile and the dark line is the median. The results
are plotted in log-log scale.
From Figure 169 to Figure 173, we plotted the results for the B&H strategy and
for the uniform and fitting critical time ones.
From Figure 174 to Figure 178, we plotted the results for the other strategies,
such that each row corresponds to one strategy. There are three columns:
• The first one shows the results for the pure strategies (with no breaking
time).
• The second one shows the results for the same strategy, to which we added
a uniform breaking time.
• The third one is the same as the second column, but it is based on the
fitting breaking time.
Sharpe ratio
Figure 169 and Figure 174 show statistics of the Sharpe ratio over different time
lengths T. The median of most of the strategies tends to one when we increase the
size of the time window, which is the same result as for the buy and hold (B&H)
strategy. In terms of Sharpe ratio, all the strategies are at most as efficient
as the B&H one. Most of the strategies give a wider range of Sharpe ratio values
below one. Only three strategies seem as successful as the B&H one:
• The LPPLS+GN quenched Kelly strategy;

• The MGBM quenched Kelly strategy;


• The Fitting Mixed Kelly strategy with fitting Breaking Time.
since their medians all converge well to one compared to the others.

CAGR
Figure 170 and Figure 175 show statistics of the CAGR over different time
lengths T. In Figure 170, the B&H strategy has a CAGR which tends to 0.1 when T
increases. The medians of all the other strategies also tend to 0.1. So in terms
of CAGR all the strategies are as efficient as each other in the long run (ten
years and more). Only the uniformly mixed Kelly strategy gives a CAGR slightly
below the other strategies in terms of median.

VaR
Figure 171 and Figure 176 show statistics of the VaR over different time lengths
T. In Figure 171, the VaR of the B&H is close to −10⁻¹ at any time scale and it
does not deviate much from the median. On the other hand, the median of the
uniform critical time strategy tends to the same value as the B&H one, but its
quantiles deviate much more from the median. Moreover, for a time period smaller
than a year, it gives a 0.9-quantile equal to zero; this is because this strategy
is cautious and does not trade much. The results of the fitting critical time
strategy lie between those of the two previous strategies. In terms of VaR, there
are strategies that perform as well as or even better than the B&H one:
• The LPPLS+GN quenched Kelly, the MGBM, the fitting mixed and the
combining quenched strategies give results equivalent to the B&H one.
When they are combined with a breaking time, they do not perform better
and the deviation from the median increases.
• The uniformly mixed strategy is the one with the best VaR. The median
stays at −10⁻² at any time scale, and when we combine it with a breaking
time it does not give worse results while the 0.9-quantile increases, which
means that we are less at risk most of the time.
Accuracy
Figure 172 and Figure 177 show statistics of each strategy's portfolio accuracy
over different time lengths T. Each pure strategy gives an accuracy slightly
below 40%. When they are combined with a breaking time, the results decrease and
the deviation from the median gets wider. Also, for a small enough time period
their accuracy can be 0%: for the uniform breaking time, the accuracy can be 0%
for a time period equal to or smaller than about a thousand days; for the fitting
breaking time, it is closer to a hundred days. The uniform breaking time
decreases the accuracy significantly: on average it divides it by three most of
the time, where the fitting one seems to divide it by two.

Average Return
Figure 173 and Figure 178 show statistics of the average return over different
time lengths T. Figure 173 shows that the average return of the B&H strategy
tends to 10⁻³ and the deviation from the median gets thinner, but not by much.
The critical time strategies give average returns which decrease over time and
tend to become negative. Figure 178 shows the average return for all the
other strategies. After ten thousand days the LPPLS+BM quenched strategy
performs better than the B&H one. When it is combined with a breaking time,
the frequency of negative average returns decreases, and especially when it is
combined with a uniform breaking time it is mostly positive. All the other
strategies give worse results than the B&H one, or at most equivalent ones.

In conclusion, when we study all the strategies, each strategy performs
differently, but on average no strategy beats the B&H one on all the metrics in
the long run. The only one which gives quite good results is the uniformly mixed
Kelly strategy, since it is the one which gives the best VaR and a CAGR
equivalent to the B&H strategy. We can therefore conclude that using the past
fit accuracy to define a probability measure on all the fits is not relevant to
predict the future dynamic of a time series, since the fitting mixed Kelly
strategy does not beat the uniformly mixed one and the fitting breaking time does
not optimize the different strategies significantly. As an example, the combining
quenched strategy does not perform much better than the GBM one. The past success
values of the quenched strategies are therefore not enough to define a
probability measure on the different models. Following the approach presented in
this paper, there is not enough information in the past data to define an
underlying probability measure which would characterize both the state of a time
series and the best fits to use.
When we tried to apply neural networks to predict these features, we observed in
Figure 20 that if we were able to estimate the fitting probability and the
success value perfectly, we would succeed in constructing trading strategies
robust against crashes, and even better, we could be close to the optimal
strategy. Unfortunately, the network application did not work.
It is possible to complete the breaking time by considering the probability of
being in a critical phase (see Section 4.1.3, p∼56). We have defined one such
probability based on the accuracy of the past fits and another one based on the
success of the quenched strategies. A future study could integrate these
probabilities to estimate whether the expectation of the critical time at time t
is relevant or not.

5.3 The Success Value


The success value constructed in this paper (Eq.73, p∼45) could be used to
characterize performance increases for any strategy during a backtest. The main
issue with this estimator is that if a strategy reached its maximal potential
under a metric and then remains constant over a period, it would give no success
point to the strategy, even if we have seen that in practice, under a metric, a
strategy's performance is never constant and there are always oscillations. If
we had a strategy that succeeded in giving a constant and positive Sharpe ratio,
CAGR, etc., or a VaR small in amplitude and constant, it would be the best
strategy ever in terms of risk, but such a strategy does not exist.
To improve the computation of the success value we can adapt the measure
(Eq.72, p∼45) for each metric in Table 4. For example, one can consider that
if the Sharpe ratio of a strategy is above one, the strategy is performing well
enough regardless of whether it is in a drawup or a drawdown, and/or we can
consider other metrics such as the maximum drawdown over the past period.
The success value has been introduced in the first place in a backtest study.
It could be interesting to adapt the success value so that it can be computed in
current time without considering a delay (as we did in Section 4.1.3, p∼56). An
idea could be to use moving averages instead of the ε-drawups/drawdowns, for all
the observables considered (Eq.71, p∼39). In this case we would consider two time
scales T2 > T1, such that when the moving average over the past period T1 is
above the one over the past period T2, a success point is given to the trading
strategy; a minimal sketch of this idea is given below.
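A minimal sketch of this moving-average variant, assuming the metric (for example a rolling Sharpe ratio) is already available as a series; the function name and window lengths are illustrative:

import numpy as np

def ma_success_point(metric, T1=20, T2=100):
    # Success point at the last date of `metric`: 1 if the short moving
    # average (past T1 points) is above the long one (past T2 points), else 0.
    # Requires len(metric) >= T2 > T1.
    short = np.mean(metric[-T1:])
    long_ = np.mean(metric[-T2:])
    return int(short > long_)

# Example on a synthetic, slowly improving metric
rng = np.random.default_rng(0)
metric = np.cumsum(rng.normal(0.01, 1.0, 500))
print(ma_success_point(metric))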

5.4 The Global Phase Labelling Procedure


From the success values of the quenched Kelly strategies based on the different
models, we developed a global phase labelling procedure to estimate bubble and
non-bubble episodes in a backtest study. The main issue with this procedure is
that it overestimates the bubble episodes, because we first searched for the
possible ending times of a bubble and then tried to link them to
drawups/drawdowns of the time series. In a second step we searched for the
starting time in a predefined set such that it maximizes the growth during a
bubble. If a bubble is found inside another one, we merge the two into the bigger
one (the one which maximizes the price change). Once we obtained all the bubble
episodes, we filled the gaps by assuming they are geometric growths. We based
this procedure on the critical probability defined by Equation (79). We made the
assumption that the ends of drawups/drawdowns near times t such that Ω_t^c = 1
are possible ending times of a bubble. This criterion can seem quite strict. It
is possible to consider a threshold δ ∈ [0, 1] such that the new condition for a
possible bubble ending time is:

Ω_t^c ≥ δ .    (103)
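A minimal sketch of this relaxed criterion, assuming the critical probability Ω_t^c is available as an array with one value per decision time (names and threshold value are illustrative):

import numpy as np

def candidate_ending_times(omega_c, delta=0.9):
    # Indices t that qualify as possible bubble ending times under the relaxed
    # criterion Omega_t^c >= delta (Eq. 103), instead of requiring
    # Omega_t^c == 1 exactly.
    return np.where(np.asarray(omega_c) >= delta)[0]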
The algorithm presented in Section 3.5.5, p∼48, is constructed such that for each
possible bubble ending time we are sure to find a corresponding starting time. It
is only when two or more bubbles overlap that we delete the starting and ending
times which do not maximize the price growth. The procedure could be adapted to
be more restrictive on the starting time of a bubble. In step 4.(b) of the
algorithm we defined the set of possible starting times T for an ending time;
this set is constructed such that it is never empty. It could be interesting to
observe what kind of results we would get if we were more restrictive, such that
if there is no possible starting time for an ending time, the corresponding
ending time is discarded.
During this procedure we never directly take into account the CI and the
estimation of the critical time. An idea to complete this procedure in a future
study would be to cross-validate the estimations of the ending and starting times
of a bubble with the fit estimations. For example, once we have obtained an
estimation of the ending time tc0 of a bubble, we fit the time series over
different time windows [t, tc0] to find the time t which maximizes the
log-likelihood and estimates a critical time close to tc0. It is possible to
improve this procedure further by using the fact that it is three days before the
crash that we have the best estimations of the LPPLS parameters (see Section
2.4.3, p∼20); we can then fit over [t, tc0 − 3] instead of [t, tc0].
This procedure does not take into account that a bubble could be hidden inside a
bigger one. We can therefore improve the labelling procedure by considering
different time scales (2-6 months, 6 months - 1 year, 1 year - 2 years, etc.), so
that we could classify bubbles into short, medium and long bubble episodes, where
a long episode could itself be composed of different non-bubble and bubble
episodes.

5.5 Predictive Learning of the Critical Dynamic


In Section 4.1.3 we tried to apply supervised learning to predict a probability
density over the fits, the starting and ending times of a bubble and the success
values of the quenched Kelly strategies. In Section 4.2.7, p∼80, we already
discussed that the method developed in this paper has been unsuccessful. It could
be due to the size of the training set, to the topology of the networks or to
other hyper-parameters not being well calibrated.
It is also possible that we tried to predict too much with this network and that
it is not well adapted to this kind of task. To go a bit further, it could be a
nice study to try to use reinforcement learning to train an artificial agent. The
objective of the agent could be to find the best trading strategy at each
decision time, to estimate the fitting probability, etc.
We already tried to develop an artificial trader based on the Double Q-learning
algorithm. The implementation of the trader worked, but when we tried to define,
based on the same techniques, an artificial consultant which would try to
estimate the fitting probability, a problem arose. We based the Q-learning
algorithm on an action space, which represents the different actions that an
agent can take. For the trader this represents the portfolio weight λ to invest
in the asset. For the consultant this would be the fitting probability
{p_t^{m,t'}, ∀m ∈ M, t' ∈ F(t)} (see Section 4.1.2, p∼54). But to construct the
action space, we had to discretize it and to enumerate all the possible actions.
For example, for the trader it could be an array of N values linearly spaced
between -1 and 1, to obtain the weight to invest for a self-financing strategy;
we would then have N possible decisions at each instant. The problem that arises
for the consultant is that we have 484 fits per model, so each possible action is
an array of 2420 elements. If we decide to discretize the probability in N bins
such that the variation in the probability is δ (i.e. Nδ = 1, so that we can only
distribute the probability mass in units of δ), then the size of the action space
is:

A = (MK)^N ,    (104)

where M = |M| = 5 is the number of models used, K = |F(t)| = 484 for all times t
is the number of fits done at each decision time for each model, and N is the
number of bins in which the fitting probability is discretized. Since MK = 2420,
even for N = 2 the size of the action space is A = 5 856 400 actions. To estimate
the Q-values and take the "best" decision, the agent applies its network to all
the possible actions and takes the one which maximizes the Q-value. N can be seen
as the number of fits that could be relevant at the same time, so if we only
considered that 10 fits give an accurate prediction of the expected growth, we
would obtain A ≈ 6.89 × 10³³ actions. This means that we need to find another
approach to construct the consultant agent.
An idea would be to construct a probability density over the growths based on
all the fits. Then for each fit, based on this density, we can estimate whether
the probability of observing the growth that it expects is high or low, and
validate each fit on that basis. For example, the probability density could be a
sum of standard Gaussian distributions, which would mean that there are different
possible clusters of expected growth; a minimal sketch of this idea is given
below.
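A minimal sketch of this idea (the library choice, the number of clusters and the validation threshold are assumptions, not part of the thesis):

import numpy as np
from sklearn.mixture import GaussianMixture

def validate_fits(expected_growths, n_clusters=3, quantile=0.25):
    # Fit a Gaussian mixture to the expected growths of all the fits, then keep
    # only the fits whose expected growth has a high enough density under it.
    g = np.asarray(expected_growths).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(g)
    log_density = gmm.score_samples(g)               # log p(growth) for each fit
    threshold = np.quantile(log_density, quantile)   # discard the least likely fits
    return log_density >= threshold                  # boolean mask over the fits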

6 Conclusion
In conclusion, we succeeded in implementing a complete code to easily and quickly
fit time series with respect to the LPPLS and GBM models. There is no need to
re-implement an optimization procedure if someone wants to fit other models: they
only need to implement the model as a subclass of a predictive model and to
implement methods to compute its Jacobian and Hessian matrices. This could save
time for future students and researchers.
Based on the different models, we succeeded in constructing a labelling procedure
to estimate bubble phases in time series during a backtest. There are still a few
issues with this procedure: it overestimates the bubble episodes and it does not
consider different time scales at the same time. Also, it only uses the results
from the model-based quenched Kelly strategies and does not use, for example, the
log-likelihood of the fits, their CI and the estimations of the critical time.
In this paper we developed Kelly strategies based on all the models at the same
time (the uniformly mixed Kelly strategy and the fitting mixed Kelly strategy).
The uniformly mixed strategy shows interesting features, since its VaR is an
order of magnitude smaller than the others (it is around 10⁻², whereas for the
other strategies developed in this paper it is closer to 10⁻¹). But no strategy
succeeded in beating the B&H one on all the metrics considered.
We tried to apply estimations of the critical time as breaking times in the
different strategies, but it did not optimize them.
We constructed current-time estimations, based on the past results, of the
probability of being in a critical phase. The first one is based on the accuracy
of the fits based on the LPPLS model versus the fits based on a geometric growth,
whereas the second uses the past success of the model-based quenched Kelly
strategies. A deeper study could be done to estimate the capacity of each
estimator to predict the burst of a bubble. We quickly observed that the one
based on the fitting probability gave peaks when a bubble bursts, whereas the one
based on the success values oscillated much more, seemed to have a delay and did
not succeed in predicting bubble bursts.
The final part of this project was about predictions based on the optimization of
neural networks as a model-free approach ("black box"). The results were bad.
After optimization, the network outputs converged to a value close to the mean
values of the training sets. Oscillations in the outputs have been observed and
we tried to extract some features from them, but it did not work. This could be
due to the network structure, to hyper-parameters not being well calibrated, to
the size of the training set being too small, or, more fundamentally, to the fact
that the data given as inputs do not contain enough information to predict the
different targets.

A Martingale Representation Theorem
Let W_t be a Brownian motion (Wiener process) on a standard filtered probability
space (Ω, F, F_t, P) and let 𝒲_t be the augmented filtration generated by W_t.
If S is a square integrable random variable measurable with respect to 𝒲_∞,
then there exists a predictable process φ which is adapted with respect to 𝒲_t,
such that:

S = E[S] + ∫₀^∞ φ_s dW_s  ⇒  E[S | 𝒲_t] = E[S] + ∫₀^t φ_s dW_s ,    (105)

where E[.] is the expected value, E[.|𝒲_t] is the expected value conditional on
the filtration 𝒲_t, and E[∫₀^∞ φ_s dW_s | 𝒲_t] = ∫₀^t φ_s dW_s is the
realization of the process ∫₀^· φ_s dW_s at time t.

B Computation of the Ornstein-Uhlenbeck Process

Let us consider dν_t = −θ ν_t dt + σ dW_t and f(ν_t, t) = ν_t e^{θt}.

⇒  df(ν_t, t) = e^{θt}(θ ν_t dt + dν_t) = σ e^{θt} dW_t    (106)

⇒  ν_t = e^{−θt} ν_0 + σ e^{−θt} ∫₀^t e^{θs} dW_s
       = e^{−θt} ν_0 + σ (e^{−θt}/√(2θ)) W_{∫₀^t 2θ e^{2θs} ds}    (107)

⇒  ν_t = e^{−θt} ν_0 + σ (e^{−θt}/√(2θ)) W_{e^{2θt}−1} ,    (108)

where the scaling property W_t ≍ c W_{t/c²} (equality in distribution) and
N(0,σ₁) + N(0,σ₂) ∼ N(0, √(σ₁² + σ₂²)) have been used.
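As a sanity check of Equation (108), a minimal simulation sketch (not part of the thesis code): the stationary variance of the simulated OU process should approach σ²/(2θ), which is the variance implied by the closed-form solution for large t.

import numpy as np

rng = np.random.default_rng(1)
theta, sigma, dt, n_steps, n_paths = 2.0, 0.5, 1e-3, 5000, 2000
nu = np.zeros(n_paths)
for _ in range(n_steps):
    # Euler-Maruyama step of d nu = -theta * nu * dt + sigma * dW
    nu += -theta * nu * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)

print(nu.var(), sigma**2 / (2.0 * theta))   # both close to 0.0625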

C From Return to Log-Return


Let us consider that the log-price dynamic is described by:

dy_t = d ln p_t = µ(y_t, t) dt + σ dW_t .    (109)

The price dynamic is given by:

p_t / p_0 = e^{∫₀^t dy_s}  ⇒  p(y_t) = p_0 e^{y_t} .    (110)

From Itô's calculus, the price differential is given by:

dp_t = (∂p_t/∂y_t) dy_t + (1/2)(∂²p_t/∂y_t²) d⟨y⟩_t  ⇒  dp_t = p_t (dy_t + (1/2) d⟨y⟩_t) ,    (111)

where d⟨y⟩_t is the infinitesimal quadratic variation. It can be computed as
d⟨y⟩_t = (dy_t)², keeping only the terms up to order dt and using dW_t ≡ √dt
(since dW_t ∼ N(0,dt), so that d⟨W⟩_t = dt). This leads to the relation between
the log-return and the compound return:

dp_t / p_t = d ln p_t + (σ²/2) dt .    (112)
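A minimal numerical check of Equation (112) (not part of the thesis code): over many small steps, the mean compound return exceeds the mean log-return by approximately σ² dt / 2.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, dt, n = 0.05, 0.2, 1e-3, 10**6
dlogp = mu * dt + sigma * np.sqrt(dt) * rng.normal(size=n)   # d ln p
ret = np.expm1(dlogp)                                        # dp / p
print((ret - dlogp).mean(), 0.5 * sigma**2 * dt)             # both close to 2e-5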

D Log-Likelihood
Let us consider a random variable X which we assume follows a process given by
the probability density p(x; Φ), where x ∈ Ω is the value observed and Φ is the
set of hyper-parameters. Assume a sample set ~x = (x₁, ..., x_N); then the
likelihood is defined by:

l(Φ) = ∏_{i=1}^N p(x_i; Φ) ,    (113)

so the log-likelihood is defined by:

L(Φ) = ln l(Φ) = ∑_{i=1}^N ln p(x_i; Φ) .    (114)

If X ∼ N(0,σ), p(x; σ) = e^{−x²/(2σ²)} / √(2πσ²), and so the solution of
Equation (114) is:

L(σ) = −(1/(2σ²)) ∑_{i=1}^N x_i² − (N/2) ln σ² − (N/2) ln 2π .    (115)

To maximize it, one has to solve ∂_σ L(σ)|_{σ̂} = 0, where σ̂ is the standard
deviation which maximizes the log-likelihood, which leads to:

L(σ̂) = −N/2 − (N/2) ln σ̂² − (N/2) ln 2π ,   where:  σ̂² = (1/N) ∑_{i=1}^N x_i² .    (116)

If the data x_i depend on another set of parameters Ψ, one can define the cost
function:

f(Ψ) = σ̂(Ψ)² = (1/N) ∑_{i=1}^N x_i(Ψ)² .    (117)
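A minimal sketch of Equations (116)-(117) (not the thesis implementation): the maximized Gaussian log-likelihood only depends on the empirical variance of the residuals.

import numpy as np

def max_log_likelihood(x):
    # Gaussian log-likelihood at the maximum-likelihood standard deviation
    # sigma_hat (Eq. 116): L(sigma_hat) = -N/2 * (1 + ln sigma_hat^2 + ln 2 pi).
    x = np.asarray(x)
    n = len(x)
    sigma2_hat = np.mean(x**2)          # Eq. 116, the empirical variance
    return -0.5 * n * (1.0 + np.log(sigma2_hat) + np.log(2.0 * np.pi))

x = np.random.default_rng(3).normal(0.0, 0.3, 1000)
print(max_log_likelihood(x))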

E Observables
• The R² of the power-law: The main hypothesis is ln p_t = L_t + ν_t, where L_t
is the LPPLS function and ν_t the error model concerned. The power-law can be
approximated by:

pl_t ≈ (ln p_t − A) / (B + C cos(ω ln |t_c − t| + φ)) .    (118)

Then, in a log-log plot, it is possible to compute the R² of
ln pl_t ≈ m ln |t_c − t|. In the general case where you have a set of data
points {y_i}_{i=1}^N and the set {ŷ_i}_{i=1}^N of points obtained after fitting,
the R² is computed by:

R² = 1 − ∑_i (y_i − ŷ_i)² / ∑_i (y_i − ȳ)² ,   where:  ȳ = (1/N) ∑_{i=1}^N y_i .    (119)

• Entropy of the log-periodic part: Since we do not have a periodic part in
time but a log-periodic one, let us consider x_t = ω ln |t_c − t| + φ. The
log-periodic part in t is then periodic as a function of x_t and can be
approximated by:

cos x_t ≈ (ln p_t − A − B|t_c − t|^m) / (C|t_c − t|^m) .    (120)

The entropy of a periodic function f is defined as follows (see the sketch after
this list):
– Compute the discrete Fourier transform (DFT) of the signal, denoted DFT[f](k).
– Compute the power density spectrum (PDS) of the function:

p_k = |DFT[f](k)|² / ∑_l |DFT[f](l)|² .    (121)

– Then the entropy of the function f is:

S[f] = − ∑_k p_k ln p_k .    (122)

The entropy is a measure of the uncertainty of the periodicity of the signal.
The smaller it is, the less uncertain the periodicity.
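A minimal sketch of the entropy computation of Equations (121)-(122) (not the thesis implementation), applied here to a pure cosine and to white noise:

import numpy as np

def spectral_entropy(signal):
    # Entropy of the normalized power spectrum of `signal` (Eq. 122). A pure
    # sinusoid gives a low entropy; white noise gives a high one.
    power = np.abs(np.fft.rfft(signal))**2
    p = power / power.sum()              # Eq. 121, power density spectrum
    p = p[p > 0]                         # avoid log(0)
    return -(p * np.log(p)).sum()

x = np.linspace(0.0, 20.0 * np.pi, 1024)
print(spectral_entropy(np.cos(x)))                                     # low: clear periodicity
print(spectral_entropy(np.random.default_rng(4).normal(size=1024)))    # high: no periodicity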

F Spin Glass and Quenched Kelly Weight


Spin glasses are characterized by the Hamiltonian of the Ising model:

H_Λ(s_Λ, J_Λ) = − ∑_{i≠j∈Λ} J_ij s_i s_j ,    (123)

where Λ is a d-dimensional lattice, s = ±1 are the spins and J_ij is the exchange
energy between spins i and j. In the case of a spin glass, each J_ij is a random
variable.
The partition function is defined by:

Z(β, J_Λ) = ⟨e^{−βH_Λ}⟩_s = ∑_{s_Λ ∈ ⟨s⟩_Λ} e^{−βH_Λ(s_Λ, J_Λ)} ,    (124)

where ⟨s⟩_Λ is the set of all possible configurations of spins on the lattice Λ.
From the partition function one can define the annealed and the quenched free
energy:

annealed free energy:  F^a(β) = λ^{−1} ln⟨Z(β, J_Λ)⟩_J ,
quenched free energy:  F^q(β) = λ^{−1} ⟨ln Z(β, J_Λ)⟩_J ,    (125)

where λ = |Λ| is the number of spins on the lattice. The quenched average refers
to a random behavior which does not evolve with time; the variables are said to
be "frozen", as opposed to the annealed case, where the random variables are
allowed to evolve.
These characteristics are the same as those used to define the quenched Kelly
weight (Section 3.3, p∼29).
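A small numerical illustration (not from the thesis) of the quenched/annealed distinction of Equation (125), for a toy system of two coupled spins: by Jensen's inequality, the quenched average ⟨ln Z⟩_J is always below the annealed ln⟨Z⟩_J.

import numpy as np
from itertools import product

rng = np.random.default_rng(5)
beta, n_disorder = 1.0, 10000

def log_Z(J):
    # Two spins, Hamiltonian H = -J * s1 * s2, partition sum over s = +/-1
    energies = [-J * s1 * s2 for s1, s2 in product((-1, 1), repeat=2)]
    return np.log(np.sum(np.exp(-beta * np.array(energies))))

J_samples = rng.normal(0.0, 1.0, n_disorder)          # random exchange energies
logZ = np.array([log_Z(J) for J in J_samples])
print(logZ.mean())                                    # quenched: <ln Z>_J
print(np.log(np.exp(logZ).mean()))                    # annealed: ln <Z>_J (larger)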

G ε-Drawup/down Algorithm
1. Entries: a time series x_i = x_{t_i} with t_{i+1} = t_i + Δt_i, a set of
thresholds E and a set of windows K.
2. For each ε ∈ E and k ∈ K:
(a) Set i_0 = k.
(b) For each i > i_0:
• Compute Δx_{i_0,i} = x_i − x_{i_0} and the standard deviation σ(k)_i of
the time series Δx_{i−1,i} over the past k days.
• Compute the largest deviation δ_{i_0,i}:

δ_{i_0,i} = max_{i_0 ≤ j < i}{Δx_{i_0,j}} − Δx_{i_0,i}   if i_0 − 1 is a drawdown,
δ_{i_0,i} = Δx_{i_0,i} − min_{i_0 ≤ j < i}{Δx_{i_0,j}}   if i_0 − 1 is a drawup.

• Stop when δ_{i_0,i} > ε σ(k)_i.
• Set the drawup/down index to i_1 such that:

i_1 = arg max_{i_0 ≤ j < i}{Δx_{i_0,j}}   if i_0 − 1 is a drawdown,
i_1 = arg min_{i_0 ≤ j < i}{Δx_{i_0,j}}   if i_0 − 1 is a drawup.

• Set i_0 = i_1 + 1 and register t_{i_1} in T_{ε,k}.
3. Compute the set of all peak times T = ∪_{ε∈E, k∈K} T_{ε,k}.
4. Compute the number of times N_{t_p} each peak t_p ∈ T occurred over all
trials:

N_{t_p} = ∑_{ε∈E, k∈K} 1{t_p ∈ T_{ε,k}} .

5. Regroup the number of appearances of each peak in the set:

𝒩 = { n_{t_p} = N_{t_p} / N , ∀ t_p ∈ T } ,

where N = |E||K| is the total number of tests.
6. Compute the set:

T′ = {t_p ∈ T | n_{t_p} ≥ δn} .

7. Exit: T′.

In this paper: δn = 0.65, E = {ε_0 + nΔε, n = 0, ..., 99} with ε_0 = 0.1 and
Δε = (5 − 0.1)/99, and K = {k_0 + nΔk, n = 0, ..., 10} with k_0 = 10 and
Δk = (60 − 10)/10.
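A simplified sketch of this procedure (not the thesis implementation), for a single threshold ε and a single window k; the full procedure repeats it over all ε ∈ E and k ∈ K and keeps the peaks found in at least a fraction δn of the trials:

import numpy as np

def epsilon_extrema(x, eps=1.0, k=20):
    # x: 1-d array of (log-)prices. Returns the indices of the alternating
    # drawup/drawdown turning points for one threshold eps and one window k.
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    peaks = []
    i0 = k
    up = x[i0 + 1] > x[i0]                         # direction of the first move
    for i in range(i0 + 1, len(x)):
        sigma = dx[max(0, i - k):i].std()          # rolling std of the changes
        window = x[i0:i + 1] - x[i0]               # Delta x_{i0,j}, j = i0..i
        if up:                                     # currently in a drawup
            delta = window.max() - window[-1]      # give-back since the running maximum
            i1 = i0 + int(window.argmax())
        else:                                      # currently in a drawdown
            delta = window[-1] - window.min()      # rebound since the running minimum
            i1 = i0 + int(window.argmin())
        if delta > eps * sigma:                    # the current trend is broken
            peaks.append(i1)                       # register the turning point
            i0, up = i1 + 1, not up
    return peaks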

H Code Python: Model classes
Model: defines the general structure for a model, which is characterized by a
method model which defines the model and fit which defines the optimization
procedure. Abstract methods: model, fit.

Optimization: defines models which base the optimization procedure on a linear
system resolution and a non-linear optimization based on the Newton
Conjugate-Gradient method; it therefore forces a model to define the Jacobian and
Hessian matrices of the model and of the cost function with respect to its
parameters. Inherits: Model. Abstract methods: to_minimize, build_linear_system,
grad_model, jac, hess_model, hess.

GN_cost: overrides the method to be able to fit and defines the cost function
(which here is the SSE, see Equation 23), its Jacobian and its Hessian matrix.
Inherits: Optimization. Overrides: build_linear_system, to_minimize, jac, hess.
Defines: cost, linear_optimization, grad_cost, hess_cost.

BM_cost / OU_cost: both adapt the methods linked to the cost function to be able
to optimize with respect to each error model. Inherit: GN_cost. Override: cost,
linear_optimization, grad_cost, hess_cost.

LogLik_GN / LogLik_BM / LogLik_OU: for each error model they adapt respectively
the methods linked to the cost functions to optimize the log-likelihood.
Inherit: GN_cost / BM_cost / OU_cost. Override: cost, grad_cost, hess_cost.
Define: log_likelihood.

Predictive: overrides and defines the methods to construct the complete
optimization procedure (from the grid search to the non-linear optimization).
Inherits: Optimization. Abstract method: get_grid. Overrides: fit.
Defines: grid_search, non_linear_optimization.

PL / LPPLS: PL defines the power-law model and LPPLS defines the Log-Periodic
Power-Law Singularity model. LPPLS inherits from PL to simplify the methods
linked to the power-law part of the model. Inherit: Predictive / PL.
Override: model, get_grid, grad_model, hess_model.

Stochastic: defines a stochastic model which is characterized by a probability
density and different expectations which take a set of information as argument.
Inherits: Model. Abstract methods: probability, expectation, volatility,
expected_growth.

GN / BM / OU: define respectively the standard Gaussian distribution, the
Brownian Motion and the Ornstein-Uhlenbeck process. They are also constructed to
be able to consider a linear drift, which leads to the GBM for the BM case and to
the MGBM for the OU one. Inherit: Stochastic. Override: model, fit, probability,
expectation, volatility, expected_growth.

Ito_model: this model is a combination of a predictive and a stochastic one.
Inherits: Model. Overrides: model, fit, probability, expectation, volatility,
expected_growth.

Table 9: The class definitions used to implement the models and the optimization procedures.

from abc import ABCMeta, abstractmethod
from scipy.optimize import minimize
import numpy as np
import time as tm
import pylab as pl
from scipy.signal import argrelextrema


class Model:
    '''Abstract class: Define a model'''

    __metaclass__ = ABCMeta

    def __init__(self):

        self.name = 'model'

    @abstractmethod
    def model(self, time): pass

    @abstractmethod
    def print_parameter(self, linear=None, non_linear=None, sse=None): pass

    @abstractmethod
    def built_dico(self): pass

    @abstractmethod
    def fit(self, objective, time, arg=None): pass

    def in_bounds(self, para, bounds):

        for n in range(len(para)):
            if para[n] < bounds[n][0] or para[n] > bounds[n][1]:
                return False

        return True

############################################################################################
'''Abstract classes: Define optimization methods to fit to a Predictive/Stochastic model'''

class Optimization(Model):
    '''Abstract class: Define the methods to use to optimize'''

    __metaclass__ = ABCMeta

    @abstractmethod
    def model(self, time, linear=None, non_linear=None): pass

    @abstractmethod
    def build_linear_system(self, objective, time, linear=None, non_linear=None): pass

    @abstractmethod
    def to_minimize(self, non_linear, objective, time, linear=None): pass

    @abstractmethod
    def grad_model(self, time, linear=None, non_linear=None): pass

    @abstractmethod
    def jac(self, non_linear, objective, time, linear=None): pass

    @abstractmethod
    def hess_model(self, time, linear=None, non_linear=None): pass

    @abstractmethod
    def hess(self, non_linear, objective, time, linear=None): pass

    def correct_model_parameters(self, linear, non_linear):
        return linear, non_linear

##############################################################

class GN_cost(Optimization):
    '''Abstract Class: Minimize the SSE'''

    __metaclass__ = ABCMeta

    ################################################################
    '''Methods which define the cost function'''

    def delta(self, objective, time, linear=None, non_linear=None):

        return objective - self.model(time, linear, non_linear)

    def cost(self, objective, time, linear=None, non_linear=None):

        delta = self.delta(objective, time, linear, non_linear)

        return delta.dot(delta) / len(objective)

    def to_minimize(self, non_linear, objective, time, linear=None):

        # Function to minimize to fit to the model
        return self.cost(objective, time, linear, non_linear)

    #################################################################
    '''Define the linear system to obtain the linear parameters'''

    def build_linear_system(self, objective, time, non_linear=[1., 0., 0.]):
        grad = self.linear_grad(time, non_linear)

        A = grad.dot(grad.T)
        b = grad.dot(objective)

        return A, b

    ##################################################################################
    '''Linear optimization methods'''

    def linear_optimization(self, objective, time, non_linear=None, other_guess=None):

        # Return the solution of the linear system
        A, b = self.build_linear_system(objective, time, non_linear)
        try:
            return np.linalg.solve(A, b).reshape(-1)
        except:
            return np.ones(len(b)) * np.nan

    ##################################################################
    '''Gradient of the function to minimize'''

    def grad_cost(self, objective, time, linear=None, non_linear=None):

        # Return the gradient of the SSE
        delta = 2. * self.delta(objective, time, linear, non_linear)
        N = len(objective)

        return -delta.dot(self.grad_model(time, linear, non_linear)) / N

    def jac(self, non_linear, objective, time, linear=None):

        # Jacobian of the function to minimize
        return self.grad_cost(objective, time, linear, non_linear)

    ####################################################################
    '''Hessian Matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        # Return hessian matrix of the SSE
        jac_model = self.grad_model(time, linear, non_linear)
        hess_model = self.hess_model(time, linear, non_linear)
        delta = 2. * self.delta(objective, time, linear, non_linear)
        N = len(objective)

        return 2. * np.dot(jac_model.T, jac_model) / N - hess_model.dot(delta) / N

    def hess(self, non_linear, objective, time, linear=None):

        # Hessian matrix of the function to minimize
        return self.hess_cost(objective, time, linear, non_linear)

#######################################################################

class BM_cost(GN_cost):

    __metaclass__ = ABCMeta

    ################################################################
    '''Methods which define the cost function'''

    def cost(self, objective, time, linear=None, non_linear=None):

        delta = self.delta(objective, time, linear, non_linear)

        dt = time[1:] - time[:-1]
        delta = delta[1:] - delta[:-1]

        return delta.dot(delta / dt) / len(delta)

    #################################################################
    '''Define the linear system to obtain the linear parameters'''

    def build_linear_system(self, objective, time, non_linear=[1., 0., 0.]):
        grad = self.linear_grad(time, non_linear)
        grad = grad[1:, 1:] - grad[1:, :-1]

        objective = objective[1:] - objective[:-1]

        dt = time[1:] - time[:-1]

        A = grad.dot(grad.T / dt.reshape(-1, 1))
        b = grad.dot(objective / dt)

        return A, b

    ##################################################################################
    '''Linear optimization methods'''

    def linear_optimization(self, objective, time, non_linear=None, other_guess=None):

        linear = super().linear_optimization(objective, time, non_linear, other_guess)
        grad_0 = self.linear_grad(time[0:1], non_linear)[:, 0]
        A = (objective[0] - grad_0[1:].dot(linear)) / grad_0[0]

        return np.concatenate(([A], linear))

    ##################################################################
    '''Gradient of the function to minimize'''

    def grad_cost(self, objective, time, linear=None, non_linear=None):

        delta = 2. * self.delta(objective, time, linear, non_linear)
        grad = self.grad_model(time, linear, non_linear)

        dt = time[1:] - time[:-1]
        delta = (delta[1:] - delta[:-1]) / dt
        N = len(dt)

        return (delta.dot(grad[:-1]) - delta.dot(grad[1:])) / N

    ####################################################################
    '''Hessian Matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        delta = 2. * self.delta(objective, time, linear, non_linear)
        grad = self.grad_model(time, linear, non_linear)
        hess = self.hess_model(time, linear, non_linear)

        dt = time[1:] - time[:-1]
        delta = (delta[1:] - delta[:-1]) / dt
        N = len(dt)

        hess_1 = (hess[:, :, :-1].dot(delta) - hess[:, :, 1:].dot(delta)) / N

        dt = dt.reshape(-1, 1)
        hess_2 = grad[:-1, :].T.dot(grad[:-1, :] / dt)
        hess_2 = hess_2 + grad[1:, :].T.dot(grad[1:, :] / dt)
        hess_2 = hess_2 - grad[:-1, :].T.dot(grad[1:, :] / dt)
        hess_2 = hess_2 - grad[1:, :].T.dot(grad[:-1, :] / dt)

        return hess_1 + hess_2 / N

#####################################################################

class OU_cost(GN_cost):

    __metaclass__ = ABCMeta

    def init_procedure(self, objective, time, linear=None, non_linear=None):

        d_nu = objective[1:] - objective[:-1]
        nu = objective[:-1]
        dt = time[1:] - time[:-1]

        return nu, d_nu, dt, len(dt)

    def init(self, objective, time, linear=None, non_linear=None):

        delta = self.delta(objective, time, linear[1:], non_linear)

        return self.init_procedure(delta, time)

    def correct_model_parameters(self, linear, non_linear):
        return linear[1:], non_linear

    ################################################################
    '''Methods which define the cost function'''

    def get_variation(self, objective, time, linear=None, non_linear=None):

        delta = self.delta(objective, time, linear[1:], non_linear)
        dt = time[1:] - time[:-1]
        d = delta[1:] - delta[:-1]
        delta = delta[:-1]

        return d, delta, dt

    def form_cost(self, delta, objective, dt):
        return delta.dot(delta * dt), objective.dot(delta), objective.dot(objective / dt)

    def cost(self, objective, time, linear=None, non_linear=None):

        # OU cost: theta^2 * sum(nu^2 dt) + 2 theta * sum(nu dnu) + sum(dnu^2 / dt),
        # with theta = linear[0] and nu the residuals of the predictive model
        d, delta, dt = self.get_variation(objective, time, linear, non_linear)
        a, b, c = self.form_cost(delta, d, dt)

        return (linear[0] * linear[0] * a + 2. * linear[0] * b + c) / len(delta)

    #################################################################
    '''Define a system to obtain the linear parameters'''

    def build_system(self, objective, time, non_linear=[1., 0., 0.]):
        grad = self.linear_grad(time, non_linear)
        nu, d_nu, dt, N = self.init_procedure(objective, time)

        d_grad = (grad[:, 1:] - grad[:, :-1])
        grad = grad[:, :-1]

        A2 = grad.dot((dt * grad).T) / N
        b2 = -2. * grad.dot(dt * nu) / N
        c2 = nu.dot(dt * nu) / N

        A1 = grad.dot(d_grad.T) / N
        b1 = -(d_grad.dot(nu) + grad.dot(d_nu)) / N
        c1 = d_nu.dot(nu) / N

        A0 = (d_grad / dt).dot(d_grad.T) / N
        b0 = -2. * d_grad.dot(d_nu / dt) / N
        c0 = d_nu.dot((d_nu / dt).T) / N

        return [A0, A1, A2], [b0, b1, b2], [c0, c1, c2]

    def build_linear_system(self, theta, A, b=None, c=None):

        A = theta * theta * self.hess_quad(A[2]) + theta * self.hess_quad(A[1]) + self.hess_quad(A[0])
        b = -theta * theta * b[2] - theta * b[1] - b[0]

        return A, b

    def quad(self, linear, A, b, c):
        return linear.dot(A.dot(linear.T) + b.T) + c

    def grad_quad(self, linear, A, b):
        return (A + A.T).dot(linear) + b

    def hess_quad(self, A):
        return A + A.T

    def quadratic_part(self, linear, objective, A, b, c):
        cost_2 = linear[0] * linear[0] * self.quad(linear[1:], A[2], b[2], c[2])
        cost_1 = linear[0] * self.quad(linear[1:], A[1], b[1], c[1])
        cost_0 = self.quad(linear[1:], A[0], b[0], c[0])

        return cost_2 + cost_1 + cost_0

    def grad_quadratic(self, linear, objective, A, b, c):
        grad_theta = 2. * linear[0] * self.quad(linear[1:], A[2], b[2], c[2]) + self.quad(linear[1:], A[1], b[1], c[1])
        grad_2 = linear[0] * linear[0] * self.grad_quad(linear[1:], A[2], b[2])
        grad_1 = linear[0] * self.grad_quad(linear[1:], A[1], b[1])
        grad_0 = self.grad_quad(linear[1:], A[0], b[0])

        return np.concatenate(([grad_theta], grad_2 + grad_1 + grad_0))

    def hess_quadratic(self, linear, objective, A, b, c):
        hess_theta = 2. * self.quad(linear[1:], A[2], b[2], c[2])

        hess_theta_lin = 2. * linear[0] * self.grad_quad(linear[1:], A[2], b[2]) + self.grad_quad(linear[1:], A[1], b[1])

        hess = linear[0] * linear[0] * self.hess_quad(A[2]) + linear[0] * self.hess_quad(A[1]) + self.hess_quad(A[0])

        hess = np.concatenate((hess_theta_lin.reshape(1, -1), hess), axis=0)
        hess_theta = np.concatenate(([hess_theta], hess_theta_lin))

        return np.concatenate((hess_theta.reshape(-1, 1), hess), axis=1)

    ##################################################################################
    '''Linear optimization methods'''

    def get_theta(self, objective, time, linear=None, non_linear=None):

        model = self.model(time, linear, non_linear)

        objective = objective - model
        d_nu = objective[1:] - objective[:-1]
        nu = objective[:-1]
        dt = time[1:] - time[:-1]

        return -nu.dot(d_nu.T) / nu.dot((nu * dt).T)

    def get_guesses(self, objective, time, non_linear=None, other_guess=None, A=None, b=None, c=None):

        if other_guess is None:
            try:
                A0, b0 = super().build_linear_system(objective, time, non_linear)
                linear = np.linalg.solve(A0, b0).reshape(-1)
            except:
                linear = np.ones(len(b)) * np.nan

            theta = self.get_theta(objective, time, linear, non_linear)

            if theta < 0 or theta > 2:
                try:
                    linear[1:] = np.linalg.solve(A[0][1:, 1:], -b[0][1:]).reshape(-1)
                    linear[0] = -(b[2][0] + A[2][0, 1:].dot(linear[1:])) / A[2][0, 0]
                except:
                    linear = np.ones(len(b)) * np.nan

                theta = self.get_theta(objective, time, linear, non_linear)

            A, b = self.build_linear_system(theta, A, b, c)
            try:
                linear = np.linalg.solve(A, b).reshape(-1)
            except:
                linear = np.ones(len(b)) * np.nan
                theta = np.nan

            theta = [theta]
        else:
            theta = other_guess
            A, b = self.build_linear_system(theta, A, b, c)
            try:
                linear = np.linalg.solve(A, b).reshape(-1)
            except:
                linear = np.ones(len(b)) * np.nan

            theta = [theta]

        return np.concatenate((theta, linear))

    def linear_optimization(self, objective, time, non_linear=None, other_guess=None, use_opti=True):

        # Return the solution of the linear system
        A, b, c = self.build_system(objective, time, non_linear)
        guesses = self.get_guesses(objective, time, non_linear, other_guess, A, b, c)
        bnds = ((0., 1.), (None, None), (None, None), (None, None), (None, None))

        if use_opti:
            return minimize(self.quadratic_part, guesses, args=(objective, A, b, c),
        else:
            return guesses

    #####################################################################
    '''Gradient of the function to minimize'''

    def init_grad(self, objective, time, linear=None, non_linear=None):
        delta, d, dt, N = self.init(objective, time, linear, non_linear)

        grad = self.grad_model(time, linear[1:], non_linear)

        d_grad = grad[1:] - grad[:-1]
        grad = grad[:-1]

        return grad, d_grad, delta, d, dt, N

    def grad_cost(self, objective, time, linear=None, non_linear=None):

        grad, d_grad, delta, d, dt, N = self.init_grad(objective, time, linear, non_linear)

        grad_2 = -2. * linear[0] * linear[0] * (dt * delta).dot(grad)
        grad_1 = -2. * linear[0] * (delta.dot(d_grad) + d.dot(grad))
        grad_0 = -2. * (d / dt).dot(d_grad)

        return (grad_2 + grad_1 + grad_0) / N

    #####################################################################
    '''Hessian matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        grad, d_grad, delta, d, dt, N = self.init_grad(objective, time, linear, non_linear)

        hess = self.hess_model(time, linear[1:], non_linear)
        d_hess = hess[:, :, 1:] - hess[:, :, :-1]
        hess = hess[:, :, :-1]

        hess_2 = -2. * linear[0] * linear[0] * (hess.dot(delta * dt) - grad.T.dot(grad * dt.reshape(-1, 1)))

        hess_1 = -2. * linear[0] * (grad.T.dot(d_grad) + d_grad.T.dot(grad))
        hess_0 = -2. * (d_hess.dot(d / dt) - d_grad.T.dot(d_grad / dt.reshape(-1, 1)))

        return (hess_2 + hess_1 + hess_0) / N

####################################################################

class LogLik_GN(GN_cost):
    '''Abstract Class: Minimize the Log(SSE)'''

    __metaclass__ = ABCMeta

    ################################################################
    '''Methods which define the cost function'''

    def cost(self, objective, time, linear=None, non_linear=None):

        return np.log(super().cost(objective, time, linear, non_linear))

    def log_likelihood(self, objective, time, linear=None, non_linear=None):
        return -len(objective) * 0.5 * (1. + self.cost(objective, time, linear, non_linear) + np.log(2. * np.pi))

    ##################################################################
    '''Gradient of the function to minimize'''
    def grad_cost(self, objective, time, linear=None, non_linear=None):
        return super().grad_cost(objective, time, linear, non_linear) / super().cost(objective, time, linear, non_linear)

    ##################################################################
    '''Hessian matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        # Return hessian matrix of the log-likelihood
        cost = super().cost(objective, time, linear, non_linear)
        grad_cost = super().grad_cost(objective, time, linear, non_linear)
        hess_cost = super().hess_cost(objective, time, linear, non_linear)

        return hess_cost / cost - grad_cost.dot(grad_cost) / cost / cost

#######################################################################

class LogLik_BM(BM_cost):
    '''Abstract Class: Minimize the Log(SSE)'''

    __metaclass__ = ABCMeta

    ################################################################
    '''Methods which define the cost function'''

    def cost(self, objective, time, linear=None, non_linear=None):

        return np.log(super().cost(objective, time, linear, non_linear))

    def log_likelihood(self, objective, time, linear=None, non_linear=None):

        return -len(objective) * 0.5 * (1. + self.cost(objective, time, linear, non_linear) + np.log(2. * np.pi))

    ##################################################################
    '''Gradient of the function to minimize'''
    def grad_cost(self, objective, time, linear=None, non_linear=None):
        return super().grad_cost(objective, time, linear, non_linear) / super().cost(objective, time, linear, non_linear)

    ##################################################################
    '''Hessian matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        cost = super().cost(objective, time, linear, non_linear)
        grad_cost = super().grad_cost(objective, time, linear, non_linear)
        hess_cost = super().hess_cost(objective, time, linear, non_linear)

        return hess_cost / cost - grad_cost.dot(grad_cost) / cost / cost

#######################################################################

class LogLik_OU(OU_cost):
    '''Abstract Class: Minimize the Log(SSE)'''

    __metaclass__ = ABCMeta

    ################################################################
    '''Methods which define the cost function'''

    def cost(self, objective, time, linear=None, non_linear=None):

        return np.log(super().cost(objective, time, linear, non_linear))

    def log_likelihood(self, objective, time, linear=None, non_linear=None):

        return -len(objective) * 0.5 * (1. + self.cost(objective, time, linear, non_linear) + np.log(2. * np.pi))

    ##################################################################
    '''Gradient of the function to minimize'''
    def grad_cost(self, objective, time, linear=None, non_linear=None):
        return super().grad_cost(objective, time, linear, non_linear) / super().cost(objective, time, linear, non_linear)

    ##################################################################
    '''Hessian matrix of the function to minimize'''

    def hess_cost(self, objective, time, linear=None, non_linear=None):

        cost = super().cost(objective, time, linear, non_linear)
        grad_cost = super().grad_cost(objective, time, linear, non_linear)
        hess_cost = super().hess_cost(objective, time, linear, non_linear)

        return hess_cost / cost - grad_cost.dot(grad_cost) / cost / cost

#########################################################################

class Predictive(Optimization):
    '''Abstract class: Define an optimization procedure for a non-stochastic model'''

    __metaclass__ = ABCMeta

    def convert_argument(self, argument=None):
        default = [None, 'Nelder-Mead', None, None, False, 5, False]
        if argument is not None:
            # the end of this list is cut off in the source; the last entries are
            # inferred from the keyword arguments of fit below
            names = ['non_linear_bounds', 'method', 'tol', 'options', 'show',
                     'n_to_extract', 'measure_time']
            index = np.where(np.isin(names, list(argument.keys())) == True)[0]
            for i in index:
                default[i] = argument[names[i]]

        return default

    #################################################################################
    '''Grid search for linear optimization'''

    @abstractmethod
    def get_grid(self, non_linear_bounds, step=10): pass

    def get_grid_result(self, objective, time, non_linear_bounds, step=10, other_guess=None):
        # Define a grid search on the non-linear parameters to optimize the linear ones
        non_linear = self.get_grid(non_linear_bounds, step)
        N_point = len(non_linear)
        linear = []
        cost = np.zeros(N_point)
        for n in range(N_point):
            linear.append(self.linear_optimization(objective, time, non_linear[n], other_guess))
            cost[n] = self.cost(objective, time, linear[-1], non_linear[n])

        return cost, linear, non_linear

    def get_local_minima(self, cost, linear, non_linear, n_to_extract=None):
        if n_to_extract is None:
            argmin = np.argmin(cost)
            return linear[argmin], non_linear[argmin]
        else:
            Linear, Non_linear = [], []
            x = argrelextrema(cost, np.less)[0]
            while len(x) > n_to_extract:
                y = argrelextrema(cost[x], np.less)[0]
                if len(y) < n_to_extract:
                    break
                x = x[y]
            if len(x) > n_to_extract:
                eps = len(x) - n_to_extract
                for n in range(eps):
                    x = np.delete(x, np.argmax(cost[x]))
            for i in x:
                Linear.append(linear[i])
                Non_linear.append(non_linear[i])

            return Linear, Non_linear

    def grid_search(self, objective, time, non_linear_bounds, step=10, n_to_extract=None, other_guess=None):
        # Extract the linear & non-linear parameters that minimize the SSE
        cost, linear, non_linear = self.get_grid_result(objective, time, non_linear_bounds, step, other_guess)

        return self.get_local_minima(cost, linear, non_linear, n_to_extract)

    ################################################################################
    '''Non-linear optimization methods'''

    def non_linear_optimization(self, objective, time, linear, non_linear_guess, bounds,
                                method='Nelder-Mead', tol=None, options=None):
        # the keyword arguments of minimize are cut off in the source; they are completed
        # with the options carried through by fit below
        return minimize(self.to_minimize, non_linear_guess, args=(objective, time, linear),
                        method=method, bounds=bounds, tol=tol, options=options).x

    def non_linear_search(self, objective, time, linear, non_linear_guess, bounds,
                          method='Nelder-Mead', tol=None, options=None, n_to_extract=None):
        if n_to_extract is None:
            non_linear = minimize(self.to_minimize, non_linear_guess, args=(objective, time, linear),
                                  method=method, bounds=bounds, tol=tol, options=options).x
            linear = self.linear_optimization(objective, time, non_linear)

            return linear, non_linear, self.cost(objective, time, linear, non_linear)
        else:
            Linear, Non_linear, cost = [], [], []
            for lin, non_lin in zip(linear, non_linear_guess):
                Non_linear.append(minimize(self.to_minimize, non_lin, args=(objective, time, lin),
                                           method=method, bounds=bounds, tol=tol, options=options).x)
                Linear.append(self.linear_optimization(objective, time, Non_linear[-1]))
                cost.append(self.cost(objective, time, Linear[-1], Non_linear[-1]))

            arg = np.argmin(cost)

            return Linear[arg], Non_linear[arg], cost[arg]

    ###############################################################################
    '''Optimization procedure'''

    def fit(self, objective, time, non_linear_bounds=None, method='Nelder-Mead', tol=None, options=None,
            show=False, n_to_extract=None, measure_time=False, non_linear_guess=None):
        # the tail of this signature is cut off in the source; it is completed with the
        # arguments listed in convert_argument plus the non_linear_guess used below

        if measure_time:
            t = tm.time()

        if non_linear_bounds is None:
            linear = self.linear_optimization(objective, time)
            cost = self.cost(objective, time, linear)

            if show:
                self.print_parameter(linear, None, cost)   # 'sse' in the source is undefined here

            if measure_time:
                return linear, None, self.cost(objective, time, linear), tm.time() - t
            else:
                return linear, None, self.cost(objective, time, linear), None
        else:
            if non_linear_guess is None:
                linear, non_linear_guess = self.grid_search(objective, time, non_linear_bounds,
                                                            n_to_extract=n_to_extract)
                linear, non_linear, cost = self.non_linear_search(objective, time, linear, non_linear_guess,
                                                                  non_linear_bounds, method, tol, options,
                                                                  n_to_extract)
            else:
                linear = self.linear_optimization(objective, time, non_linear_guess)
                non_linear = self.non_linear_optimization(objective, time, linear, non_linear_guess,
                                                          non_linear_bounds, method, tol, options)
                linear = self.linear_optimization(objective, time, non_linear)

            if show:
                if not self.in_bounds(non_linear, non_linear_bounds):
                    print('\t---> Out of bounds')
                self.print_parameter(linear, non_linear, self.cost(objective, time, linear, non_linear))

            if measure_time:
                return linear, non_linear, self.cost(objective, time, linear, non_linear), tm.time() - t
            else:
                return linear, non_linear, self.cost(objective, time, linear, non_linear), None
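
The calibration is thus a two-stage procedure: the non-linear parameters are first scanned on a grid and then refined with a Nelder-Mead search, while for every candidate the linear parameters are recovered in closed form by linear_optimization, which is defined earlier in the listing. Assuming it implements the usual least-squares profiling, the linear sub-problem solved at each grid point reads

\[
\hat{\beta}(\theta_{\mathrm{nl}}) \;=\; \arg\min_{\beta}\,\big\lVert y - X(\theta_{\mathrm{nl}})\,\beta\big\rVert^{2}
\;=\; \big(X^{\top}X\big)^{-1}X^{\top}y ,
\]

where y is the objective series and the columns of X(\theta_{\mathrm{nl}}) are the basis functions returned by linear_grad for the fixed non-linear parameters \theta_{\mathrm{nl}} (for the LPPLS model defined below: 1, |t_c - t|^m, |t_c - t|^m \cos(\omega \ln|t_c - t|) and |t_c - t|^m \sin(\omega \ln|t_c - t|)).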

#############################

class Stochastic(Model):
    '''Abstract class: Define an optimization procedure for a stochastic model'''

    __metaclass__ = ABCMeta

    def __init__(self):
        self.n_para = 0

    ##############################################################################
    '''Methods which define the model and the cost function'''

    @abstractmethod
    def model(self, time, parameters=None, init=None): pass

    @abstractmethod
    def probability(self, future_objective, future_time, parameters=None, data=None, time_data=None): pass

    @abstractmethod
    def expectation(self, time, info): pass

    @abstractmethod
    def volatility(self, time, info): pass

    @abstractmethod
    def expected_growth(self, time, info): pass

    def log_likelihood(self, objective, time, parameters=None, data=None, time_data=None):
        return np.sum(np.log(self.probability(objective, time, parameters, data, time_data)))

    ##############################################################################
    '''Additional methods'''

    def combine_to_predictive(self, time, parameters, model, linear, non_linear):
        return [model.model(time, linear, non_linear)] + parameters

    def separate_from_predictive(self, parameters):
        return list(np.delete(parameters, 0))

    def update_info(self, model=None, linear=None, non_linear=None, info=None):
        return info

    def qual(self, time, info):
        return np.sign(info[0][0])

#######################

class Ito_model(Model):
    '''Combine a predictive and stochastic model (Model = Pred + Stoc)'''

    def __init__(self, predict, stoch):
        self.predict = predict
        self.stoch = stoch
        self.name = self.predict.name + ' ' + self.stoch.name

    def built_dico(self, linear, non_linear, parameters):
        dico_1 = self.predict.built_dico(linear, non_linear)
        dico_2 = self.stoch.built_dico([None] + parameters)

        return {**dico_1, **dico_2}

    def print_parameter(self, linear=None, non_linear=None, parameters=None, sse=None, loglik=None):
        print('\tPredictive part')
        self.predict.print_parameter(linear, non_linear, sse)
        print('\tStochastic part')
        self.stoch.print_parameter([None] + parameters, loglik)

    ##############################################################################
    '''Methods which define the model and the cost function'''

    def model(self, time, linear=None, non_linear=None, parameters=None, init=None):
        model = self.predict.model(time, linear, non_linear)
        parameters = [0.] + parameters
        if init is None:
            init = 0.
        else:
            init = init - model[0]
        noise = self.stoch.model(time, parameters, init=init)
        return model + noise

    def probability(self, future_objective, future_time, linear=None, non_linear=None,
                    parameters=None, data=None, time_data=None, time=None):
        # the end of this signature and of the two calls below is cut off in the source;
        # the completions are plausible but not certain
        pred = self.predict.model(time, linear, non_linear)
        new_para = self.stoch.combine_to_predictive(time, parameters, data, time_data, self.predict)

        return self.stoch.probability(future_objective, future_time, new_para, data, time_data)

    def expectation(self, time, info):
        # info = [linear, non_linear, information to give to the stoch part]
        exp = self.predict.model(time, info[0], info[1])
        info = self.stoch.update_info(self.predict, info[0], info[1], info[2])
        return exp + self.stoch.expectation(time, info)

    def volatility(self, time, info):
        return self.stoch.volatility(time, info[2])

    def expected_growth(self, time, info):
        exp = self.predict.model(time, info[0], info[1]) - self.predict.model(info[-1][-1], info[0], info[1])
        info = self.stoch.update_info(self.predict, info[0], info[1], info[2])
        return exp + self.stoch.expected_growth(time, info)

    def fit(self, objective, time, argument=None, parameters=[None, None], non_linear_guess=None):
        arg = self.predict.convert_argument(argument)
        # the argument lists of the two fit calls below are cut off in the source
        linear, non_linear, error, tau = self.predict.fit(objective, time, arg[0], arg[1], arg[2], arg[3],
                                                          arg[4], arg[5], arg[6], non_linear_guess)
        linear, non_linear = self.predict.correct_model_parameters(linear, non_linear)
        parameters, loglik = self.stoch.fit(objective - self.predict.model(time, linear, non_linear), time,
                                            parameters=parameters)
        parameters = self.stoch.separate_from_predictive(parameters)

        return linear, non_linear, parameters, loglik, tau

    def qual(self, time, info):
        return self.predict.qual(float(info[-1][-1]), info[0], info[1])

##################################################################################################
'''Define different Predictive models'''

class PL(Predictive):
    '''Define the Power-Law model'''

    def __init__(self):
        self.name = 'PL'

    def built_dico(self, linear=None, non_linear=None):
        return {'A': [linear[0]], 'B': [linear[1]], 'm': [non_linear[0]], 'tc': [non_linear[1]]}

    ########################################################################################
    '''Personalized methods'''

    def power_law(self, t, m, tc):
        return np.abs(t - tc)**m

    def tc_bounds(self, t):
        if type(t) == int or type(t) == float:
            return t - 60, t + 252
        else:
            N = len(t)
            tc0 = np.maximum(-60, -N*0.5) + t[-1]
            tc1 = np.minimum(252, N*0.5) + t[-1]

            return tc0, tc1

    def qual(self, time, linear, non_linear):
        if None in linear or None in non_linear or np.nan in linear or np.nan in non_linear:
            return 0.
        else:
            m = (non_linear[0] > 0. and non_linear[0] < 1.)
            tc0, tc1 = self.tc_bounds(time)
            tc = (non_linear[1] >= tc0 and non_linear[1] <= tc1)

            if m and tc:
                return -np.sign(linear[1])
            else:
                return 0.

    ##################################################################################
    '''Override model abstract method'''

    def model(self, time, linear=[0., 1.], non_linear=[1., 0.]):
        # Define PL model with linear=[A, B] and non-linear=[m, tc]
        # PL = A + B*|tc - t|**m
        return linear[0] + linear[1]*self.power_law(time, non_linear[0], non_linear[1])

    def linear_grad(self, time, non_linear=[1., 0.]):
        i = np.ones(len(time))
        f = self.power_law(time, non_linear[0], non_linear[1])

        return np.array([i, f])

    def get_grid(self, non_linear_bounds, step=10, eps=10.**(-6)):
        # Define a grid on the non-linear parameters for the grid search
        # non_linear_bounds = ((m0, m1), (tc0, tc1))
        m = np.linspace(non_linear_bounds[0][0] + eps, non_linear_bounds[0][1] - eps, step)
        tc = np.linspace(non_linear_bounds[1][0] + eps, non_linear_bounds[1][1] - eps, step)

        return np.array(np.meshgrid(m, tc)).T.reshape(-1, 2)

    def grad_model(self, time, linear=[0., 1.], non_linear=[1., 0.], give_all=False):
        T = (non_linear[1] - time).reshape(-1, 1)
        log = np.log(np.abs(T)).reshape(-1, 1)
        f = self.model(time, linear=[0., linear[-1]], non_linear=non_linear).reshape(-1, 1)

        dm = log*f
        dtc = non_linear[0]*f/T

        if give_all:
            return np.concatenate((dm, dtc), axis=1), T, log
        else:
            return np.concatenate((dm, dtc), axis=1)

    def hess_model(self, time, linear=[0., 1.], non_linear=[1., 0.], give_all=False):
        jac, T, log = self.grad_model(time, linear, non_linear, True)

        d2m = (log*jac[:, 0].reshape(-1, 1)).reshape(-1)
        dmdtc = (log*jac[:, 1].reshape(-1, 1)).reshape(-1)
        d2tc = ((non_linear[0] - 1.)*jac[:, 1].reshape(-1, 1)/T).reshape(-1)

        if give_all:
            return np.array([[d2m, dmdtc], [dmdtc, d2tc]]), T
        else:
            return np.array([[d2m, dmdtc], [dmdtc, d2tc]])

    def print_parameter(self, linear=None, non_linear=None, sse=None):
        print('A = ', linear[0], '\tB = ', linear[1])
        print('m = ', non_linear[0], '\ttc = ', non_linear[1])
        if sse is not None:
            print('SSE = ', sse)
        print()
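
For readability, the Jacobian assembled in grad_model is simply the derivative of the power law with respect to its two non-linear parameters (a standard computation restated here, not an addition to the model):

\[
f(t) \;=\; |t_c - t|^{m}, \qquad
\frac{\partial f}{\partial m} \;=\; \ln|t_c - t|\; f(t), \qquad
\frac{\partial f}{\partial t_c} \;=\; \frac{m\, f(t)}{t_c - t},
\]

which corresponds to dm = log*f and dtc = m*f/T in the code above, with T = t_c - t and the amplitude B carried inside f through the call to model.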

##########################################

class LPPLS(PL):
    '''Define LPPLS model'''

    def __init__(self):
        self.name = 'LPPLS'

    def built_dico(self, linear=None, non_linear=None):
        dico_1 = super().built_dico(linear[:2], non_linear[:2])
        dico_2 = {'C1': [linear[2]], 'C2': [linear[3]], 'w': [non_linear[2]]}

        return {**dico_1, **dico_2}

    ########################################################################################
    '''Personalized methods'''

    def log_angle(self, t, tc, w):
        return w*np.log(np.abs(t - tc))

    def C(self, linear):
        return np.sqrt(linear[2]*linear[2] + linear[3]*linear[3])

    def damping(self, linear, non_linear):
        return non_linear[0]/non_linear[2]*np.abs(linear[1]/self.C(linear))

    def oscillation(self, time, non_linear):
        return non_linear[2]/2/np.pi*np.log(np.abs((non_linear[1] - time[0])/(non_linear[1] - time[-1])))

    def qual(self, time, linear, non_linear, w0=2., w1=50., D0=0.5, O0=2.5):
        qual = super().qual(time, linear, non_linear)
        if qual != 0:
            w = (non_linear[2] >= w0 and non_linear[2] <= w1)
            D = self.damping(linear, non_linear)
            D = (D >= D0)
            C = self.C(linear)
            O = True
            if len(np.array([time])) > 1:
                if np.abs(C/linear[1]) >= 0.05 and non_linear[1] > time:
                    O = self.oscillation(time, non_linear)
                    O = (O >= O0)

            if not (w and D and O):
                qual = 0.

        return qual

    ########################################################################################
    '''Override Model abstract methods'''

    def model(self, time, linear=[0., 1., 0., 0.], non_linear=[1., 0., 0.]):
        # Define LPPLS model with linear=[A, B, C1, C2] and non-linear=[m, tc, w]
        # LPPLS = A + |tc - t|**m * (B + C1*cos(w*ln|tc - t|) + C2*sin(w*ln|tc - t|))
        T = self.power_law(time, non_linear[0], non_linear[1])
        phi = self.log_angle(time, non_linear[1], non_linear[2])

        return linear[0] + T*(linear[1] + linear[2]*np.cos(phi) + linear[3]*np.sin(phi))

    def linear_grad(self, time, non_linear=[1., 0., 0.]):
        phi = self.log_angle(time, non_linear[1], non_linear[2])
        i = np.ones(len(time))
        f = self.power_law(time, non_linear[0], non_linear[1])

        return np.array([i, f, f*np.cos(phi), f*np.sin(phi)])

    def get_grid(self, non_linear_bounds, step=10, eps=10.**(-6)):
        # Define a grid on the non-linear parameters for the grid search
        # non_linear_bounds = ((m0, m1), (tc0, tc1), (w0, w1))
        m = np.linspace(non_linear_bounds[0][0] + eps, non_linear_bounds[0][1] - eps, step)
        tc = np.linspace(non_linear_bounds[1][0] + eps, non_linear_bounds[1][1] - eps, step)
        w = np.linspace(non_linear_bounds[2][0] + eps, non_linear_bounds[2][1] - eps, step)

        return np.array(np.meshgrid(m, tc, w)).T.reshape(-1, 3)

    def grad_model(self, time, linear=[0., 1., 0., 0.], non_linear=[1., 0., 0.], give_all=False):
        f1 = self.model(time, [0., linear[1], linear[2], linear[3]], non_linear).reshape(-1, 1)
        f2 = self.model(time, [0., 0., linear[3], -linear[2]], non_linear).reshape(-1, 1)
        T = (non_linear[1] - time).reshape(-1, 1)
        log = np.log(np.abs(T)).reshape(-1, 1)

        dm = log*f1
        dtc = non_linear[0]*f1/T + non_linear[2]*f2/T
        dw = log*f2

        if give_all:
            return np.concatenate((dm, dtc, dw), axis=1), f1, f2, T, log
        else:
            return np.concatenate((dm, dtc, dw), axis=1)

    def hess_model(self, time, linear=[0., 1., 0., 0.], non_linear=[1., 0., 0.], give_all=False):
        jac, f1, f2, T, log = self.grad_model(time, linear, non_linear, True)
        f3 = self.model(time, [0., 0., linear[2], linear[3]], non_linear).reshape(-1, 1)

        d2m = (log*jac[:, 0].reshape(-1, 1)).reshape(-1)
        dmdtc = (log*jac[:, 1].reshape(-1, 1) + f1/T).reshape(-1)
        dmdw = (log*jac[:, 2].reshape(-1, 1)).reshape(-1)
        d2tc = ((1 + non_linear[0])*jac[:, 1].reshape(-1, 1)/T - non_linear[2]*f3/T/T).reshape(-1)
        dtcdw = (non_linear[0]*jac[:, 2].reshape(-1, 1)/T - non_linear[2]*log*f3/T + f2/T).reshape(-1)
        d2w = (-log*log*f3).reshape(-1)

        if give_all:
            return np.array([[d2m, dmdtc, dmdw], [dmdtc, d2tc, dtcdw], [dmdw, dtcdw, d2w]]), T
        else:
            return np.array([[d2m, dmdtc, dmdw], [dmdtc, d2tc, dtcdw], [dmdw, dtcdw, d2w]])

    def print_parameter(self, linear=None, non_linear=None, sse=None):
        print('A = ', linear[0], '\tB = ', linear[1])
        print('C1 = ', linear[2], '\tC2 = ', linear[3])
        print('m = ', non_linear[0], '\ttc = ', non_linear[1], '\tw = ', non_linear[2])
        if sse is not None:
            print('SSE = ', sse)
        print()
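
As an illustration of how the pieces above fit together, the following sketch calibrates the LPPLS model on a synthetic bubble. It is not part of the thesis code: it assumes that the Optimization base class (defined earlier in the listing) provides linear_optimization, cost and to_minimize, and all numerical values are purely illustrative.

import numpy as np

t = np.arange(500.)                                   # trading days
true_linear = [1., -0.5, 0.05, 0.05]                  # [A, B, C1, C2] (illustrative)
true_non_linear = [0.5, 520., 8.]                     # [m, tc, w] (illustrative)

lppls = LPPLS()
y = lppls.model(t, true_linear, true_non_linear) + 0.02*np.random.normal(size=len(t))

tc0, tc1 = lppls.tc_bounds(t)                         # admissible window for the critical time
bounds = ((0.1, 0.9), (tc0, tc1), (2., 50.))          # search box for (m, tc, w)

linear, non_linear, cost, _ = lppls.fit(y, t, non_linear_bounds=bounds)
print('fitted critical time:', non_linear[1])
print('qualified fit:', lppls.qual(float(t[-1]), linear, non_linear))

The qual flag combines the filter conditions on m, tc, the angular log-frequency, the damping and the number of oscillations defined above, and returns the sign of the expected crash (minus the sign of B) when the fit qualifies.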

###########################################################################################
'''Define different stochastic models'''

class GN(Stochastic):
    '''Define a Gaussian noise'''

    def __init__(self):
        self.n_para = 2
        self.name = 'GN'

    def print_parameter(self, parameters=None, loglik=None):
        print('mu = ', parameters[0], '\tsigma = ', parameters[1])
        if loglik is not None:
            print('Log-Likelihood = ', loglik)
        print()

    def built_dico(self, parameters=None):
        if parameters[0] is None:
            return {'sigma': [parameters[1]]}
        else:
            return {'mu': [parameters[0]], 'sigma': [parameters[1]]}

    ##############################################################################
    '''Methods which define the model and the cost function'''

    def model(self, time, parameters=None, init=None):
        return parameters[0] + parameters[1]*np.random.normal(size=len(time))

    def probability(self, future_objective, future_time, parameters=None):
        # the assignments on the next two lines are garbled in the source and are restored here
        future_objective = future_objective - parameters[0]
        sigma = parameters[1]
        return np.exp(-0.5/sigma/sigma*(future_objective)*(future_objective))/sigma/np.sqrt(2.*np.pi)

    def expectation(self, time, para):
        return para[0]

    def volatility(self, time, para):
        return para[1]*para[1]

    def expected_growth(self, time, para):
        return 0.

    ################################################################################
    '''Method to maximize the Log-likelihood'''

    def fit(self, data, time, model=None, linear=None, non_linear=None, parameters=[None, None]):
        if model is not None:
            data = data - model.model(time, linear, non_linear)
            parameters[0] = 0.

        if parameters[0] is None:
            parameters[0] = self.get_mu(data)

        data = data - parameters[0]
        sigma = self.get_sigma(data)
        parameters[1] = np.sqrt(sigma)
        loglik = self.get_loglikelihood(sigma, len(data))

        return parameters, loglik

    #############################################################################
    '''Additional method'''

    def get_mu(self, data):
        return np.mean(data)

    def get_sigma(self, data):
        return data.dot(data)/len(data)

    def get_loglikelihood(self, sigma, N):
        return -N*0.5*(1. + np.log(sigma) + np.log(2.*np.pi))

#########################

class BM(Stochastic):
    '''Define the Brownian Motion'''

    def __init__(self):
        self.n_para = 2
        self.name = 'BM'

    def print_parameter(self, parameters=None, loglik=None):
        print('mu = ', parameters[0], '\tsigma = ', parameters[1])
        if loglik is not None:
            print('Log-Likelihood = ', loglik)
        print()

    def built_dico(self, parameters=None):
        if parameters[0] is None:
            return {'sigma': [parameters[1]]}
        else:
            return {'mu': [parameters[0]], 'sigma': [parameters[1]]}

    def get_drift(self, parameters):
        return parameters[0] - parameters[1]*parameters[1]*0.5

    def update_info(self, model, linear, non_linear, info):
        b = info[1] - model.model(info[2], linear, non_linear)

        return [info[0], b, info[2]]

    ##############################################################################
    '''Methods which define the model and the cost function'''

    def model(self, time, parameters=None, init=0.):
        dt = time[1:] - time[:-1]
        bm = [init]
        drift = parameters[0]*(time - time[0])

        for i in range(len(dt)):
            bm.append(bm[-1] + parameters[1]*np.sqrt(dt[i])*np.random.normal(0., 1.))

        return np.array(bm) + drift

    def probability(self, future_objective, future_time, data=None, time=None, parameters=None):
        drift = self.get_drift(parameters)

        dt = future_time - time
        dy = future_objective - data

        # the normalization factor at the end of this line is cut off in the source
        return np.exp(-0.5/parameters[2]/parameters[2]/dt*(dy - drift*dt)*(dy - drift*dt))

    def expectation(self, time, info):
        # info = [para, last data point, time of the last data point]
        return info[1] + self.expected_growth(time, info)

    def volatility(self, time, info):
        return info[0][1]*info[0][1]*(time - info[2])

    def expected_growth(self, time, info):
        return info[0][0]*(time - info[2])

    ################################################################################
    '''Method to maximize the Log-likelihood'''

    def fit(self, data, time, model=None, linear=None, non_linear=None, parameters=[None, None]):
        if model is not None:
            data = data - model.model(time, linear, non_linear)
            parameters[0] = 0.

        dy = data[1:] - data[:-1]
        dt = time[1:] - time[:-1]
        N = len(dy)

        a = time[-1] - time[0]
        b = data[-1] - data[0]
        c = dy.dot(dy/dt)
        d = 0.5*N*np.log(2.*np.pi) + 0.5*np.log(dt).sum()
        if parameters[0] is None:
            parameters[0] = b/a

        f = a*parameters[0]*parameters[0] - 2.*parameters[0]*b + c

        parameters[1] = f/N

        loglik = -0.5*N*(1. + np.log(parameters[1])) - d

        parameters[1] = np.sqrt(parameters[1])

        return parameters, loglik
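
These closed-form expressions are the maximum-likelihood estimators of an arithmetic Brownian motion observed on a (possibly irregular) grid of steps \Delta t_i, restated here for readability. With a = t_N - t_0, b = y_N - y_0 and c = \sum_i \Delta y_i^2/\Delta t_i as in the code,

\[
\hat{\mu} \;=\; \frac{y_{N}-y_{0}}{t_{N}-t_{0}}, \qquad
\hat{\sigma}^{2} \;=\; \frac{1}{N}\sum_{i=1}^{N}\frac{\big(\Delta y_{i}-\hat{\mu}\,\Delta t_{i}\big)^{2}}{\Delta t_{i}}
\;=\; \frac{a\hat{\mu}^{2}-2b\hat{\mu}+c}{N},
\]

and the returned loglik is the Gaussian transition log-likelihood evaluated at these estimators,
\(-\tfrac{N}{2}\big(1 + \log\hat{\sigma}^{2} + \log 2\pi\big) - \tfrac{1}{2}\sum_i \log\Delta t_i\).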

#############################

class OU(Stochastic):
    '''Define the Ornstein-Uhlenbeck process'''

    def __init__(self):
        self.n_para = 3
        self.name = 'OU'

    def print_parameter(self, parameters=None, loglik=None):
        print('mu = ', parameters[0], '\ttheta = ', parameters[1])
        print('sigma = ', parameters[2])
        if loglik is not None:
            print('Log-Likelihood = ', loglik)
        print()

    def built_dico(self, parameters=None):
        if parameters[0] is None:
            return {'theta': [parameters[1]], 'sigma': [parameters[2]]}
        else:
            return {'mu': [parameters[0]], 'theta': [parameters[1]], 'sigma': [parameters[2]]}

    def update_info(self, model, linear, non_linear, info):
        b = info[1] - model.model(info[2], linear, non_linear)

        return [info[0], b, info[1]]

    ##############################################################################
    '''Methods which define the model and the cost function'''

    def model(self, time, parameters=None, init=0.):
        dt = time[1:] - time[:-1]
        nu = [init]
        t = time - time[0]

        for i in range(len(dt)):
            # the end of this line is cut off in the source; the mean-reverting update is
            # completed assuming d(nu) = mu*dt + theta*(mu*t - nu)*dt + sigma*sqrt(dt)*dW
            nu.append(nu[-1] + parameters[0]*dt[i] + parameters[1]*(parameters[0]*t[i] - nu[-1])*dt[i]
                      + parameters[2]*np.sqrt(dt[i])*np.random.normal(0., 1.))

        return np.array(nu)

    def probability(self, future_objective, future_time, data=None, time=None, parameters=None):
        dt = future_time - time
        dy = (future_objective - data)/dt
        y = parameters[0] - parameters[1]*data

        # the rest of the normalization at the end of this line is cut off in the source
        return np.exp(-0.5/parameters[2]/parameters[2]*(dy + y)*(dy + y))/parameters[2]/dt

    def expectation(self, time, info):
        # info = [para, last data point, time of the last data point]
        return info[1] + self.expected_growth(time, info)

    def volatility(self, time, info):
        return info[0][2]*info[0][2]*(time - info[2])

    def expected_growth(self, time, info):
        return info[0][0]*(time - info[2]) + info[0][1]*(info[0][0]*info[2] - info[1])*(time - info[2])

    ################################################################################
    '''Method to maximize the Log-likelihood'''

    def cost(self, data, time, para, get_all=False):
        A = self.generate_system(data, time)
        mu = np.array([para[0]*para[0], para[0], 1.])
        theta = np.array([para[1]*para[1], para[1], 1.])

        if get_all:
            return theta.dot(A.dot(mu)), A, theta, mu
        else:
            return theta.dot(A.dot(mu))

    def to_minimize(self, para, data, time):
        return np.log(self.cost(data, time, para))

    def jac(self, para, data, time):
        cost, A, theta, mu = self.cost(data, time, para, True)

        d_mu = theta.dot(A[:, :-1].dot(np.array([2.*para[0], 1.])))
        d_theta = np.array([2.*para[1], 1.]).dot(A[:-1, :].dot(mu))

        return np.array([d_mu, d_theta])/cost

    def hess(self, para, data, time):
        cost, A, theta, mu = self.cost(data, time, para, True)

        d_mu2 = 2.*theta.dot(A[:, 0])
        d_mu_theta = np.array([2.*para[1], 1.]).dot(A[:-1, :-1].dot(np.array([2.*para[0], 1.])))
        d_theta2 = 2.*A[0].dot(mu)

        return np.array([[d_mu2, d_mu_theta], [d_mu_theta, d_theta2]])/cost

    def get_guesses(self, data, time):
        nu, d_nu, N, dt = self.init(data, time)
        T = time[-1] - time[0]
        Y = data[-1] - data[0]

        mu = Y/T
        a, b, _ = self.get_cost_para(nu, d_nu, dt, time[:-1], mu)
        theta = -b/a*0.5

        return np.array([mu, theta])

    def fit(self, data, time, model=None, linear=None, non_linear=None, parameters=[None, None, None]):
        if model is not None:
            data = data - model.model(time, linear, non_linear)
            parameters[0] = 0.

        if parameters[0] is None:
            # the end of this call is cut off in the source; only the arguments visible there are kept
            para = minimize(self.to_minimize, self.get_guesses(data, time), args=(data, time)).x
            parameters[0] = para[0]
            parameters[1] = para[1]
            parameters[2] = np.sqrt(self.cost(data, time, para[:2]))
            dt = time[1:] - time[:-1]
            N = len(dt)
        else:
            data = data - parameters[0]*time
            nu, d_nu, N, dt = self.init(data, time)
            a, b, c = self.get_cost_para(nu, d_nu, dt, time[:-1], parameters[0])
            parameters[1] = -b/a*0.5
            parameters[2] = np.sqrt((c - b*b/a*0.5)/N)

        loglik = -N*0.5*(1. + 2.*np.log(parameters[2]) + np.log(2.*np.pi)) - 0.5*np.log(dt).sum()

        return parameters, loglik

    ##################################################################################
    '''Additional methods'''

    def generate_system(self, data, time):
        nu, d_nu, N, dt = self.init(data, time)
        T = time[-1] - time[0]
        Y = data[-1] - data[0]
        t = time[:-1]

        a = dt.dot(t*t)
        b = -2.*dt.dot(nu*t)
        c = dt.dot(nu*nu)
        d = 2.*t.dot(dt)
        e = -2.*(t.dot(d_nu) + dt.dot(nu))
        f = 2.*nu.dot(d_nu)
        g = d_nu.dot(d_nu/dt)

        return np.array([[a, b, c], [d, e, f], [T, Y, g]])/N

    def init(self, data, time):
        # (named 'init' as printed; any underscores in the original name are lost in the extraction)
        d_nu = data[1:] - data[:-1]
        dt = time[1:] - time[:-1]
        d_nu = d_nu/dt
        nu = data[:-1]
        N = len(nu)

        return nu, d_nu, N, dt

    def get_cost_para(self, nu, d_nu, dt, t, mu):
        a = mu*mu*t.dot(t*dt) - 2.*mu*nu.dot(t*dt) + nu.dot(nu*dt)
        b = 2.*mu*mu*t.dot(dt) - 2.*mu*(d_nu.dot(t) + nu.dot(dt)) + 2.*nu.dot(d_nu)
        c = mu*mu*(dt[-1] + t[-1] - t[0]) - 2.*mu*(d_nu[-1] + nu[-1] - nu[0]) + d_nu.dot(d_nu/dt)
        return a, b, c
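
Finally, a minimal sketch of how a predictive and a stochastic part are combined through Ito_model. It is not part of the thesis code: it reuses the synthetic series y, t from the LPPLS sketch above, assumes the Model and Optimization base classes (and correct_model_parameters) defined earlier in the listing, and all numerical values are illustrative only.

import numpy as np

model = Ito_model(LPPLS(), OU())                          # name: 'LPPLS OU'
argument = {'non_linear_bounds': ((0.1, 0.9), (400., 700.), (2., 50.)),   # (m, tc, w)
            'n_to_extract': None}                         # single refinement of the best grid point

# the OU part has three parameters (mu, theta, sigma); three None entries let all of
# them be estimated from the LPPLS residuals
linear, non_linear, parameters, loglik, _ = model.fit(y, t, argument,
                                                      parameters=[None, None, None])

model.print_parameter(linear, non_linear, parameters, sse=None, loglik=loglik)
print(model.built_dico(linear, non_linear, parameters))   # parameter dictionary of the combined model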

I Results on Synthetic Data

Figure 23: Absolute relative error of A as a function of: the noise amplitude σ
(upper); the number of data points fitted N (center); the distance to the true
critical time tc0 − t2 (lower). The shaded part represents the interval between
the 0.2-quantile and the 0.8-quantile of the results, and the dark line inside it
is the median.

Figure 24: Absolute relative error of tc as a function of: the noise amplitude σ
(upper); the number of data points fitted N (center); the distance to the true
critical time tc0 − t2 (lower). The shaded part represents the interval between
the 0.2-quantile and the 0.8-quantile of the results, and the dark line inside it
is the median.

Figure 25: Log-likelihood as a function of: the noise amplitude σ (upper); the
number of data points fitted N (center); the distance to the true critical
time tc0 − t2 (lower). The shaded part represents the interval between the
0.2-quantile and the 0.8-quantile of the results, and the dark line inside it is
the median.

Figure 26: SSE as a function of: the noise amplitude σ (upper); the number of
data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). The shaded part represents the interval between the 0.2-quantile and
the 0.8-quantile of the results, and the dark line inside it is the median.

Figure 27: R2 as a function of: the noise amplitude σ (upper); the number of
data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). The shaded part represents the interval between the 0.2-quantile and
the 0.8-quantile of the results, and the dark line inside it is the median.

Figure 28: Entropy as a function of: the noise amplitude σ (upper); the number
of data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). The shaded part represents the interval between the 0.2-quantile and
the 0.8-quantile of the results, and the dark line inside it is the median.

Figure 29: Deviation as a function of: the noise amplitude σ (upper); the number
of data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). The shaded part represents the interval between the 0.2-quantile and
the 0.8-quantile of the results, and the dark line inside it is the median.
Figure 30: 0.2-quantiles of the absolute relative error of A for different mean-
reverting rates θ (θ = 0: blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red;
θ = 1.5: purple) as a function of: the noise amplitude σ (upper); the number of
data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). σ has been fixed to 0.01 for the second and third panels.

Figure 31: 0.2-quantiles of the absolute relative error of tc for different mean-
reverting rates θ (θ = 0: blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red;
θ = 1.5: purple) as a function of: the noise amplitude σ (upper); the number of
data points fitted N (center); the distance to the true critical time tc0 − t2
(lower). σ has been fixed to 0.01 for the second and third panels.

Figure 32: 0.2-quantiles of the log-likelihood for different mean-reverting rates
θ (θ = 0: blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red; θ = 1.5: purple)
as a function of: the noise amplitude σ (upper); the number of data points fitted
N (center); the distance to the true critical time tc0 − t2 (lower). σ has been
fixed to 0.01 for the second and third panels.

Figure 33: 0.2-quantiles of the SSE for different mean-reverting rates θ (θ = 0:
blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red; θ = 1.5: purple) as a function
of: the noise amplitude σ (upper); the number of data points fitted N (center);
the distance to the true critical time tc0 − t2 (lower). σ has been fixed to 0.01
for the second and third panels.

Figure 34: 0.2-quantiles of the R2 for different mean-reverting rates θ (θ = 0:
blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red; θ = 1.5: purple) as a function
of: the noise amplitude σ (upper); the number of data points fitted N (center);
the distance to the true critical time tc0 − t2 (lower). σ has been fixed to 0.01
for the second and third panels.

Figure 35: 0.2-quantiles of the entropy for different mean-reverting rates θ
(θ = 0: blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red; θ = 1.5: purple)
as a function of: the noise amplitude σ (upper); the number of data points fitted
N (center); the distance to the true critical time tc0 − t2 (lower). σ has been
fixed to 0.01 for the second and third panels.

Figure 36: 0.2-quantiles of the deviation for different mean-reverting rates θ
(θ = 0: blue; θ = 0.03: orange; θ = 0.1: green; θ = 1: red; θ = 1.5: purple)
as a function of: the noise amplitude σ (upper); the number of data points fitted
N (center); the distance to the true critical time tc0 − t2 (lower). σ has been
fixed to 0.01 for the second and third panels.
J Application of the Global Phase Labelling
Procedure on the NASDAQ 100

K Supervised Learning Results

Figure 37: Training set for the WU with the application of the networks.
This figure represents the values targeted by the neural networks, on which they
were trained. In the upper panel the different strategies are plotted. The best
fitted mixed strategy (orange) is close to the optimal strategy (green). The
best strategy based on the success values (blue) used to combine all the quenched
strategies is not as efficient as the others. The second panel shows the success
values for all model-based quenched strategies. The third one shows the different
signals for the starting time (black), ending time (red) and dissipation (blue)
of a bubble. The last panel shows the fits truthiness for each model, based on
its capacity in current time to predict the trend. Based on the fits truthiness,
the median and the 0.25- and 0.75-quantiles estimated from the fits are computed
(black dashed lines in the upper panel).

Figure 38: Testing set for the WU with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 39: Testing set for the WU with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
extremum normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 40: Testing set for the WU with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
standard normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).
Figure 41: Training set for the TDG with the application of the networks.
This figure represents the values targeted by the neural networks, on which they
were trained. In the upper panel the different strategies are plotted. The best
fitted mixed strategy (orange) is close to the optimal strategy (green). The
best strategy based on the success values (blue) used to combine all the quenched
strategies is not as efficient as the others. The second panel shows the success
values for all model-based quenched strategies. The third one shows the different
signals for the starting time (black), ending time (red) and dissipation (blue)
of a bubble. The last panel shows the fits truthiness for each model, based on
its capacity in current time to predict the trend. Based on the fits truthiness,
the median and the 0.25- and 0.75-quantiles estimated from the fits are computed
(black dashed lines in the upper panel).

Figure 42: Testing set for the TDG with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 43: Testing set for the TDG with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
extremum normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 44: Testing set for the TDG with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
standard normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).
Figure 45: Training set for the AIZ with the application of the networks.
This figure represents the values targeted by the neural networks, on which they
were trained. In the upper panel the different strategies are plotted. The best
fitted mixed strategy (orange) is close to the optimal strategy (green). The
best strategy based on the success values (blue) used to combine all the quenched
strategies is not as efficient as the others. The second panel shows the success
values for all model-based quenched strategies. The third one shows the different
signals for the starting time (black), ending time (red) and dissipation (blue)
of a bubble. The last panel shows the fits truthiness for each model, based on
its capacity in current time to predict the trend. Based on the fits truthiness,
the median and the 0.25- and 0.75-quantiles estimated from the fits are computed
(black dashed lines in the upper panel).

Figure 46: Testing set for the AIZ with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 47: Testing set for the AIZ with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
extremum normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 48: Testing set for the AIZ with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
standard normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).
Figure 49: Training set for the GRMN with the application of the networks.
This figure represents the values targeted by the neural networks, on which they
were trained. In the upper panel the different strategies are plotted. The best
fitted mixed strategy (orange) is close to the optimal strategy (green). The
best strategy based on the success values (blue) used to combine all the quenched
strategies is not as efficient as the others. The second panel shows the success
values for all model-based quenched strategies. The third one shows the different
signals for the starting time (black), ending time (red) and dissipation (blue)
of a bubble. The last panel shows the fits truthiness for each model, based on
its capacity in current time to predict the trend. Based on the fits truthiness,
the median and the 0.25- and 0.75-quantiles estimated from the fits are computed
(black dashed lines in the upper panel).

Figure 50: Testing set for the GRMN with the application of the networks.
This figure represents the values obtained on the testing set by the networks
after optimizing them. The upper panel shows the results of applying the
different trading strategies: the best strategy predicted (green); the fitted
mixed strategy (orange); the combined quenched strategy (blue). Each strategy is
based on the prediction given by the corresponding neural network. The fitted
mixed strategy is based on the Kelly weight obtained by combining the expectations
from all the fits of all the models considered under the fitting probability
predicted by the network (lower panel). The success values used to combine the
quenched strategies are the ones predicted by the corresponding network (second
panel).

Figure 51: Testing set for the GRMN with the application of the networks.
This figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
extremum normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).

Figure 52: Testing set for the GRMN with the application of the networks.
This figure represents the values obtained on the testing set by the networks after
optimizing them. Each output (the supervised strategy λs, the success values
α, the bubble signals q and the fits truthiness) has been normalized using the
standard normalization. The rolling normalization has been done on the five
past data points. The upper panel shows the results of applying the different
trading strategies: the best strategy predicted (green); the fitted mixed strategy
(orange); the combined quenched strategy (blue). Each strategy is based on the
prediction given by the corresponding neural network. The fitted mixed strategy
is based on the Kelly weight obtained by combining the expectations from all the
fits of all the models considered under the fitting probability predicted by the
network (lower panel). The success values used to combine the quenched strategies
are the ones predicted by the corresponding network (second panel).
Figure 53: Training set for the CXO with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 54: Testing set for the CXO with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 55: Testing set for the CXO with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 56: Testing set for the CXO with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 57: Training set for the MKTX with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 58: Testing set for the MKTX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 59: Testing set for the MKTX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 60: Testing set for the MKTX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 61: Training set for the MSCI with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 62: Testing set for the MSCI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 63: Testing set for the MSCI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 64: Testing set for the MSCI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 65: Training set for the NDAQ with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 66: Testing set for the NDAQ with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 67: Testing set for the NDAQ with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 68: Testing set for the NDAQ with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 69: Training set for the NASDAQ 100 with the application of the networks.
This figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 70: Testing set for the NASDAQ 100 with the application of the networks.
This figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 71: Testing set for the NASDAQ 100 with the application of the networks.
This figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 72: Testing set for the NASDAQ 100 with the application of the networks.
This figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 73: Training set for the EW with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 74: Testing set for the EW with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 75: Testing set for the EW with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 76: Testing set for the EW with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 77: Training set for the EQIX with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 78: Testing set for the EQIX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 79: Testing set for the EQIX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 80: Testing set for the EQIX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 81: Training set for the EXR with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 82: Testing set for the EXR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 83: Testing set for the EXR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 84: Testing set for the EXR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 85: Training set for the CME with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 86: Testing set for the CME with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 87: Testing set for the CME with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 88: Testing set for the CME with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 89: Training set for the GOOGL with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 90: Testing set for the GOOGL with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 91: Testing set for the GOOGL with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 92: Testing set for the GOOGL with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 93: Training set for the SWISSMI with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 94: Testing set for the SWISSMI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 95: Testing set for the SWISSMI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 96: Testing set for the SWISSMI with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 97: Training set for the WYN with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 98: Testing set for the WYN with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 99: Testing set for the WYN with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 100: Testing set for the WYN with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 101: Training set for the DLR with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 102: Testing set for the DLR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 103: Testing set for the DLR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 104: Testing set for the DLR with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 105: Training set for the LVS with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 106: Testing set for the LVS with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 107: Testing set for the LVS with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 108: Testing set for the LVS with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the standard
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 109: Training set for the STX with the application of the networks. This
figure shows the target values on which the neural networks were trained. The
upper panel plots the different strategies. The best fitted mixed strategy (orange)
is close to the optimal strategy (green). The best strategy based on the success
values (blue), which combines all the quenched strategies, is not as efficient as
the others. The second panel shows the success values of all model-based quenched
strategies. The third panel shows the different signals for the starting time
(black), ending time (red) and dissipation (blue) of a bubble. The last panel shows
the fits truthiness of each model, based on its capacity to predict the trend at
the current time. From the fits truthiness, the median and the 0.25- and
0.75-quantiles of the fits are computed (black dashed lines in the upper panel).

Figure 110: Testing set for the STX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).

Figure 111: Testing set for the STX with the application of the networks. This
figure shows the values obtained on the testing set by the networks after
optimization. Each output (the supervised strategy λs, the success values α, the
bubble signals q and the fits truthiness) has been normalized using the extremum
normalization. The rolling normalization has been performed over the five past
data points. The upper panel shows the results of applying the different trading
strategies: the predicted best strategy (green); the fitted mixed strategy (orange);
the combined quenched strategy (blue). Each strategy is based on the prediction
given by the corresponding neural network. The fitted mixed strategy is based on
the Kelly weight obtained by combining the expectations from all the fits of all
the models considered, under the fitting probability predicted by the network
(lower panel). The success values used to combine the quenched strategies are the
ones predicted by the corresponding network (second panel).
Figure 112: Testing set for the STX with the application of the networks. This
figure represents the values obtained on the testing set by the networks after
optimizing them. Each outputs (the supervised strategy λs , the success values
α, the bubble signals q and the fits truthiness) have been normalized using the
standard normalization. The rolling normalization have been done on the five
past data points. The upper panels shows the different results for the application
of the different trading strategies: the best strategy predicted (green); the fitted
mixed strategy (orange); the combined quenched strategy (blue). Each strategy
is based on the prediction given by the corresponding neural network. The
fitted mixed strategy is based on the Kelly weight obtained by combining the
expectation based on all the fits of all the model considered under the fitting
probability predicted by the network (lower panel). The success values used
to combine the quenched strategies is268 the ones predicted by the corresponding
network (second panel).
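
Analogously, the rolling standard normalization can be read as a z-score over the same five-point window; the zero-variance fallback below is an assumption of this sketch.

import numpy as np

def rolling_standard_normalization(x, window=5):
    # Z-score each point with the mean and standard deviation of the `window`
    # most recent observations (including the current one).
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    for t in range(window - 1, len(x)):
        w = x[t - window + 1 : t + 1]
        sd = w.std()
        out[t] = 0.0 if sd == 0 else (x[t] - w.mean()) / sd
    return out

print(rolling_standard_normalization([1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5]))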

Figure 113: Training set for ILMN with the application of the networks; same panels as in Figure 109, for ILMN.

Figure 114: Testing set for ILMN with the application of the networks; same strategies and panels as in Figure 110, for ILMN.

Figure 115: Testing set for ILMN with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 116: Testing set for ILMN with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 117: Training set for TPR with the application of the networks; same panels as in Figure 109, for TPR.

Figure 118: Testing set for TPR with the application of the networks; same strategies and panels as in Figure 110, for TPR.

Figure 119: Testing set for TPR with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 120: Testing set for TPR with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 121: Training set for ICE with the application of the networks; same panels as in Figure 109, for ICE.

Figure 122: Testing set for ICE with the application of the networks; same strategies and panels as in Figure 110, for ICE.

Figure 123: Testing set for ICE with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 124: Testing set for ICE with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 125: Training set for UAA with the application of the networks; same panels as in Figure 109, for UAA.

Figure 126: Testing set for UAA with the application of the networks; same strategies and panels as in Figure 110, for UAA.

Figure 127: Testing set for UAA with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 128: Testing set for UAA with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 129: Training set for LDOS with the application of the networks; same panels as in Figure 109, for LDOS.

Figure 130: Testing set for LDOS with the application of the networks; same strategies and panels as in Figure 110, for LDOS.

Figure 131: Testing set for LDOS with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 132: Testing set for LDOS with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 133: Training set for DVN with the application of the networks; same panels as in Figure 109, for DVN.

Figure 134: Testing set for DVN with the application of the networks; same strategies and panels as in Figure 110, for DVN.

Figure 135: Testing set for DVN with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 136: Testing set for DVN with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 137: Training set for LKQ with the application of the networks; same panels as in Figure 109, for LKQ.

Figure 138: Testing set for LKQ with the application of the networks; same strategies and panels as in Figure 110, for LKQ.

Figure 139: Testing set for LKQ with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 140: Testing set for LKQ with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 141: Training set for NVR with the application of the networks; same panels as in Figure 109, for NVR.

Figure 142: Testing set for NVR with the application of the networks; same strategies and panels as in Figure 110, for NVR.

Figure 143: Testing set for NVR with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 144: Testing set for NVR with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 145: Training set for LYV with the application of the networks; same panels as in Figure 109, for LYV.

Figure 146: Testing set for LYV with the application of the networks; same strategies and panels as in Figure 110, for LYV.

Figure 147: Testing set for LYV with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 148: Testing set for LYV with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 149: Training set for PKG with the application of the networks; same panels as in Figure 109, for PKG.

Figure 150: Testing set for PKG with the application of the networks; same strategies and panels as in Figure 110, for PKG.

Figure 151: Testing set for PKG with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 152: Testing set for PKG with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 153: Training set for AKAM with the application of the networks; same panels as in Figure 109, for AKAM.

Figure 154: Testing set for AKAM with the application of the networks; same strategies and panels as in Figure 110, for AKAM.

Figure 155: Testing set for AKAM with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 156: Testing set for AKAM with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 157: Training set for UPS with the application of the networks; same panels as in Figure 109, for UPS.

Figure 158: Testing set for UPS with the application of the networks; same strategies and panels as in Figure 110, for UPS.

Figure 159: Testing set for UPS with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 160: Testing set for UPS with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 161: Training set for NRG with the application of the networks; same panels as in Figure 109, for NRG.

Figure 162: Testing set for NRG with the application of the networks; same strategies and panels as in Figure 110, for NRG.

Figure 163: Testing set for NRG with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 164: Testing set for NRG with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
Figure 165: Training set for CRM with the application of the networks; same panels as in Figure 109, for CRM.

Figure 166: Testing set for CRM with the application of the networks; same strategies and panels as in Figure 110, for CRM.

Figure 167: Testing set for CRM with the application of the networks, with every network output normalized by the rolling extremum normalization over the five past data points; same panels as in Figure 111.

Figure 168: Testing set for CRM with the application of the networks, with every network output normalized by the rolling standard normalization over the five past data points; same panels as in Figure 112.
L Trading Strategies Results

Figure 169: Sharpe ratio as a function of the time period T for the BH (left),
the uniform critical time (center) and the fitting critical time (right) strategies.
The dark shade represents the interval from the 0.1-quantile to the 0.9-quantile
and the dark line within it is the median.

Figure 170: Same as Figure 169, but for the CAGR.

Figure 171: Same as Figure 169, but for the VaR.

Figure 172: Same as Figure 169, but for the accuracy.

Figure 173: Same as Figure 169, but for the average return.
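
For reference, the performance metrics reported in Figures 169-178 (Sharpe ratio, CAGR, VaR, accuracy and average return) can be computed from a weekly return series roughly as sketched below; the annualization factor, the zero risk-free rate, the 5% VaR level and the accuracy definition are assumptions of this example.

import numpy as np

def strategy_metrics(returns, periods_per_year=52, var_level=0.05):
    # Performance metrics for a (weekly) return series of a trading strategy.
    r = np.asarray(returns, dtype=float)
    sharpe = np.sqrt(periods_per_year) * r.mean() / r.std()   # zero risk-free rate assumed
    cagr = np.prod(1.0 + r) ** (periods_per_year / len(r)) - 1.0
    var = -np.quantile(r, var_level)       # loss exceeded with probability var_level
    accuracy = np.mean(r >= 0)             # fraction of non-losing periods (assumed definition)
    avg_return = r.mean()
    return dict(sharpe=sharpe, cagr=cagr, var=var, accuracy=accuracy, avg_return=avg_return)

# Hypothetical weekly returns of a strategy.
rng = np.random.default_rng(0)
weekly_returns = rng.normal(0.002, 0.02, size=200)
print(strategy_metrics(weekly_returns))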

Figure 174: Sharpe ratio as a function of the time period T for the quenched Kelly
strategies (based on the models presented in Table 1, p. 8), the uniformly mixed and
fitting mixed Kelly strategies and the combined quenched strategy (from top to bottom).
Each row corresponds to one strategy; the first column is the pure strategy, the second
the strategy with the uniform breaking time and the last the strategy with the fitting
breaking time. The dark shade represents the interval from the 0.1-quantile to the
0.9-quantile and the dark line within it is the median.
Figure 175: Same as Figure 174, but for the CAGR.

Figure 176: Same as Figure 174, but for the VaR.

Figure 177: Same as Figure 174, but for the accuracy.

Figure 178: Same as Figure 174, but for the average return.