
The Black Litterman Model: A non-normal approach

Carlos Felipe Bedoya Riveros

University Sergio Arboleda


School of Exact Sciences and Engineering, Department of Mathematics
Bogota, Colombia
2021
The Black Litterman Model: A non-normal approach

Carlos Felipe Bedoya Riveros

Senior thesis submitted in order to qualify for the title of:


Professional in Mathematics

Adviser:
Ph.D. Martha Corrales

University Sergio Arboleda


School of Exact Sciences and Engineering, Department of Mathematics
Bogota, Colombia
2022
Contents

1 Introduction 6

2 Justification 7

3 Objectives 8

3.1 General Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Specific Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4 Theoretical Framework 9

5 Methodology 11

5.1 A brief description of Bayesian Statistics . . . . . . . . . . . . . . . . . . 11

5.2 Monte Carlo Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.2.1 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . 12

5.2.2 Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.3 The Metropolis-Hastings Algorithm . . . . . . . . . . . . . . . . . . . . 15

5.4 The Markowitz Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5.5 The Black Litterman Model . . . . . . . . . . . . . . . . . . . . . . . . . 18

5.5.1 The views of the stake-holders . . . . . . . . . . . . . . . . . . . . 19

5.5.2 Expected returns according to the Black Litterman Model . . . . 20

5.5.3 The Bayesian elements of the model . . . . . . . . . . . . . . . . . 20

5.6 Skewed Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5.6.1 The Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . 22

5.6.2 The Skew-Normal Distribution . . . . . . . . . . . . . . . . . . . 23

5.7 The model for skewed innovations . . . . . . . . . . . . . . . . . . . . . . 23

5.7.1 Gamma Regression Model . . . . . . . . . . . . . . . . . . . . . . 24

5.7.2 Skew-Normal Regression Model . . . . . . . . . . . . . . . . . . . 26

6 Analysis of the Results 28

6.1 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.2 Bayesian Regression and Final Results . . . . . . . . . . . . . . . . . . . 29

7 Conclusions 31

8 Appendix 32

8.1 The Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

8.2 The Skew Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . 33

8.3 Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

8.4 Estimation of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Abstract

The Black Litterman Model (BLM) is a widely used tool for estimating the expected
excess return of a given portfolio. It belongs to the field of Bayesian Statistics,
and assumptions are made on the prior and likelihood distributions. This work relaxes
the normality assumptions on the likelihood and prior distributions and explores two
scenarios, using a Gamma and a Skew-Normal distribution. The posterior distribution is
determined up to a constant, and data is then sampled through the Metropolis-Hastings
algorithm. A specific exercise is executed and its results are shown at the end of the
document.

Key words

Black Litterman Model, Bayesian linear regression, Skew-Normal regression, Gamma
Distribution, Markov Chains, Metropolis-Hastings.

1 Introduction

The BLM makes it possible to calculate the expected excess return and the variance of a
portfolio. This is accomplished by establishing both the prior and the likelihood
distributions to be normal, leading to a posterior of the same type, with a specific
value for the mean given by the BLM formula. This approach is adequate when dealing
with normally distributed data. However, erroneous results may arise if the likelihood
or prior distribution is skewed. Thus, another approach is indeed necessary. This paper
explores two alternatives: the Gamma and Skew-Normal distributions; the latter is used
for an application with real data.

As explained in the following sections, Bayesian statistics updates a prior
distribution into a posterior one, from which samples are then drawn. The most
convenient setting is to choose the prior so that it is conjugate to the likelihood,
in which case the posterior belongs to the same family as the prior. If that is not the
case but the exact form of the posterior distribution is known, the sampling process
may still be simple, at least from a computational point of view. Otherwise, other
methods are needed. Monte Carlo Markov Chains (MCMC) are a broad family of algorithms
that simulate data converging to the target, posterior distribution by using key
elements of Markov Chains. In particular, the Metropolis-Hastings algorithm proposes a
practical way of sampling from the posterior distribution. This paper applies
Metropolis-Hastings to sample from the posterior and estimate the mean associated to
that distribution. An example is proposed and solved using R.

This paper also offers a concise explanation of its most important theoretical
elements. It contains a brief introduction to Bayesian statistics, highlighting
important features such as conjugate distributions, and the main idea of Bayesian
statistics from a pragmatic and operational point of view. Then, the Markowitz model is
presented, which is important both as the context in which the BLM originated and
because this work can be easily extended into portfolio optimization. Next, the paper
explores the main characteristics of Markov Chains, including key points such as the
existence of stationary distributions, as well as a general intuition for the
importance of Monte Carlo simulations, leading to the Metropolis-Hastings algorithm,
which is explained in general terms. It is also important to be familiar with the
distributions used in this paper; some of their general features are presented in the
Appendix. Finally, we explain the particular characteristics of the algorithm
implemented in this document and show the results. The proposed application focuses on
the returns of companies in the banking, mining and electronics industries.

2 Justification

Portfolio optimization is a fundamental process in which the investor aims at either
maximizing the expected return or minimizing variability. Both the mean and variance of
a given portfolio are key elements in such a process. There are different ways of
approaching this task; a common one is to assume equilibrium returns, as is the case
for the Capital Asset Pricing Model (CAPM). However, to account for the expected return
of a portfolio or its variance, it is necessary to take into account several factors,
not exclusively the equilibrium of the market. As an innovation, Black and Litterman
(1990) [7] propose a novel way of calculating both the expected return and the
variance. They achieve this goal by introducing into portfolio theory a versatile
branch of Statistics, namely, Bayesian Statistics. The BLM weights both the returns
associated to the market equilibrium and the analyst's views, which leads to a more
realistic and useful result.

The BLM has been incorporated into portfolio optimization processes since its way of
calculating the expected returns allows the introduction of views on a given pool of
assets. Nevertheless, the original model, now viewed as a Bayesian model, assumes both
distributions (prior and likelihood) to be normal, which is not necessarily true for
several applications, since it is possible to encounter data with nonzero skewness.
Thus, a more realistic approximation to the process must take into account the
non-normality of several data sets. A version of the BLM that takes into account the
skewness of data is relevant since it allows the optimization of a given portfolio of
assets, taking into account the subjective views of experts, while using non-normal
data adequately.

Papers such as Theodossiou and Savva (2016) [30] show clear evidence of negative
skewness in the distribution of portfolio excess returns based upon data from the
one-month Treasury bill and the NYSE, AMEX, and NASDAQ. In fact, the presence of
skewness in financial data is so evident as to inspire diverse literature on the topic,
including Adcock et al (2015) [1], which shows that when skewness is present in asset
returns, the skew-normal and skew-Student distributions are viable candidates in both
theoretical and empirical terms, given their tractability and the clear interpretations
that can be made. However, this is not only applicable to asset allocation. According
to that work, in actuarial science, the presence of skewness in insurance claims data
is the principal motivation for using the skew-normal distribution and some natural
extensions. Therefore, skewed distributions are of paramount importance when modelling
random variables observed in the financial market as a whole. Thus, taking into account
the skewness of data in the calculation of the expected excess return and variance of a
given portfolio leads to more accurate results. Furthermore, linear regression is
fundamental for a given section of the BLM, where the error vector is usually assumed
to be normally distributed. This paper explores other scenarios where that vector is
assumed to be skewed in order to model the market in a more realistic way.

3 Objectives

3.1 General Objective


• To sample from the posterior distribution of the BLM when the likelihood and prior
are assumed to be Skew-Normal.

3.2 Specific Objectives


• To perform a Bayesian regression assuming that innovations follow Skew-Normal
distributions.

• To determine the form of the posterior distribution of the BLM, up to a constant.

• To sample from the posterior distribution by using the Metropolis-Hastings algorithm.

4 Theoretical Framework

The Black Litterman Model belongs to the branch of Bayesian Statistics, a successful
approach to statistics since it allows a given belief to be established and then
updated through data. The implementation of this model has led to several works on
applications and on the true meaning of the mathematical features and assumptions
behind it. He and Litterman (2002) [20] show that the optimal portfolio in this asset
allocation model has a very simple, intuitive property: the unconstrained optimal
portfolio in the Black-Litterman model is the market equilibrium portfolio plus a
weighted sum of portfolios representing the investor's views, diving into the intuition
behind this model. These types of works are frequent in modern finance literature since
the original paper, Black and Litterman (1990) [7], does not get too technical, leaving
some remaining points to be clarified in subsequent works such as Satchell and
Scowcroft (2000) [26]. Several applications have been executed and published; for
instance, Bertsimas et al (2012) [5] focuses on inverse optimization, an approach that
expands the scope and applicability of the model and also incorporates investor
information on volatility and market dynamics. On the other hand, Beach and Orlov
(2007) [4] use GARCH-derived views as an input into the model, showing that the returns
on the chosen portfolio surpass those of portfolios that rely on market equilibrium
weights or Markowitz-optimal allocations.

The approach of the present document is to relax the normality assumption on a specific
distribution. Since it is often the case that the resulting posterior distribution is
not fully known, we sample data based upon Monte Carlo Markov Chains (MCMC). Such an
approach, or family of methodologies, is applicable in a wide range of fields. Books
such as Dagpunar (2007) [10] show a wide range of applications of MCMC in finance,
including Brownian motion, asset pricing, etc.

On the other hand, highly sophisticated papers such as Calderhead (2011) [11] offer a
methodology that exploits the natural representation of a statistical model as a
Riemannian manifold. The methods developed in that paper provide generalisations of the
Metropolis-adjusted Langevin algorithm and the Hybrid Monte Carlo algorithm for
Bayesian inference. The performance of the Riemannian manifold-oriented algorithms is
assessed by performing Bayesian inference on logistic regression models, log-Gaussian
Cox point process models, stochastic volatility models, and both parameter and model
level inference of dynamical systems described by nonlinear differential equations.
Metropolis-Hastings is also applied in different circumstances. In an integrated work,
Schamberger et al (2017) [27] develop Bayesian inference for a latent factor copula
model, which utilizes a pair copula construction to couple the variables with the
latent factor. Adaptive rejection Metropolis sampling (ARMS) is then used within Gibbs
sampling for posterior simulation. The resulting simulation shows favorable performance
of the proposed approach both in terms of sampling efficiency and accuracy. Finally, an
extensive application using historical data on European financial stocks that forecasts
portfolio Value at Risk (VaR)

and Expected Shortfall (ES) is shown. Another interesting application is developed in
Roberts and Stramer (2001) [25], where a new Markov chain Monte Carlo approach to
Bayesian analysis of discretely observed diffusion processes is introduced, as the
authors treat the paths between any two data points as missing data. The paper offers a
transformation of the diffusion which breaks down the dependency between the
transformed missing paths and the volatility of the diffusion. Finally, two efficient
Markov chain Monte Carlo algorithms are proposed in order to sample from the posterior
distribution of the transformed missing observations and the parameters of the
diffusion. The applications include both simulated data and Eurodollar short-rate data.
Another application can be found in Miao (2019) [24], which is oriented towards natural
language. The paper proposes CGMH, a novel approach using Metropolis-Hastings sampling
for constrained sentence generation. CGMH allows complicated constraints, such as the
occurrence of multiple keywords in the target sentences, which cannot be handled by
traditional RNN-based approaches.

Finally, both Bayesian and skew-normal regressions are fundamental for the variation of
the BLM used in this paper. A thorough exposition and its connection to the Relevance
Vector Machine is presented in Bishop and Tipping (2003) [6], a tutorial that offers an
overview of the Bayesian approach to pattern recognition in the context of simple
regression and classification problems, which is a common use for Bayesian regression.
An advanced application to finance is found in Gerlach et al (2011) [17] as well,
focused on Value-at-Risk forecasting. Some well-known dynamic conditional
autoregressive quantile models are generalized to a fully nonlinear family. The
Bayesian solution to the general quantile regression problem, via the Skewed-Laplace
distribution, is adapted and designed for parameter estimation in this model family via
an adaptive Markov chain Monte Carlo sampling scheme. Regarding skew-normal
regressions, the literature is abundant, and relevant works such as Cancho (2010) [12]
discuss inference aspects of skew-normal nonlinear regression models following both
classical and Bayesian approaches, extending the usual normal nonlinear regression
models. Skew-normal modelling is also applied in Durante (2019) [16] to probit
regressions, where, due to the lack of a tractable posterior, Markov chain Monte Carlo
routines and algorithms are employed. Additionally, Xie et al (2009) [32] use this type
of regression as a natural extension of the usual regression and perform a critical
analysis of the resulting variance.

In general, the BLM has been extensively studied due to its relevance in portfolio
theory. The main research focuses on either theoretical extensions or a thorough
explanation of each component of the model. Also, core topics such as MCMC, including
Metropolis-Hastings, and non-normal regressions are widely applied in several fields,
from biological sciences to financial analysis. They are not only versatile in terms of
applications; they also fit very well with an ample range of complementary topics such
as time series, copulas, differential geometry, etc.

5 Methodology

The Black Litterman Model makes it possible to calculate the expected value of a
portfolio's excess return. It is relatively easy to use and a versatile option for
investment management, and it is a model within the Bayesian statistics framework. The
present section of the document contains a brief description of Bayesian statistics.
The simplest case considered occurs when the posterior distribution belongs to a
specific family, for example, a multivariate normal distribution. The more general
case, where the exact form of the posterior distribution is unknown, is also considered
through the use of the Metropolis-Hastings method. Finally, the Black Litterman Model
is described from a Bayesian perspective, together with what the model looks like when
the normality assumption on the prior distribution is relaxed.

5.1 A brief description of Bayesian Statistics

The Bayesian approach to statistics is an important part of today's statistical
practice in several fields. It has important differences with respect to the more
traditional, frequentist approach; however, in the end they both allow for statistical
inference and prediction. The Bayesian approach is based upon the premise that beliefs
about a given parameter are updated through the collection and processing of data. That
is to say, a prior distribution is set, data is gathered, and then the distribution is
updated by taking into account both the previous beliefs and the information extracted
from the collected data. In the Bayesian approach, parameters are not considered fixed
but unknown; as shown in Bolstad (2016) [9], they are taken as realizations of random
variables.

Bayes' Theorem is fundamental to understanding the main ideas of Bayesian statistics.
This theorem states that given events A and B, it follows that:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

If A is thought of as an event related to the parameter and B as one related to the
collected data, then the probability of the parameter given the data (posterior
distribution) is proportional to the product of the likelihood and the probability of
the parameter (prior distribution). That is the mathematical mechanism behind the
belief-updating process.

There are several ways to harness the main ideas behind Bayesian statistics and take
advantage of them. One of the most useful strategies is to choose the likelihood and
the prior in a way such that their product leads to the same type of distribution as
the prior. That is to say, the features of both the posterior and prior distributions
are the same; the only aspect that changes is the parameter vector. When this occurs,
the two distributions are said to be conjugate to each other. That is indeed the case
for the BLM. If the distributions are not conjugate but the exact form of the posterior
distribution is known, then the quantile transformation can be used to sample from it.
If it is known only up to a constant, which is usually the case, then samples can be
gathered through methods such as the accept-reject algorithm. For the latter case, it
is also useful to use Monte Carlo Markov Chains (MCMC) such as the Metropolis-Hastings
algorithm.
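As a minimal illustration of conjugacy, consider a Beta prior with a binomial
likelihood (chosen here only for its simplicity; it is not the normal-normal pair used
by the BLM). The posterior stays in the Beta family and only the parameter vector
changes:

```python
# Conjugate updating: Beta prior + binomial likelihood -> Beta posterior.
# Illustrative only; the BLM instead relies on the normal-normal conjugate pair.

def beta_binomial_update(a, b, successes, failures):
    """Return the Beta posterior parameters after observing the data."""
    return a + successes, b + failures

# Prior belief: theta ~ Beta(2, 2); observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(2, 2, 7, 3)
print(a_post, b_post)              # posterior is Beta(9, 5)
print(a_post / (a_post + b_post))  # posterior mean = 9/14
```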

5.2 Monte Carlo Markov Chain

Monte Carlo Markov Chain (MCMC) refers to a widely used family of methods that simulate
data from a given distribution by combining Monte Carlo methods and Markov Chains. It
basically simulates data that converge to the target distribution by exploiting the
concept of the stationary distribution associated to some Markov Chains.

5.2.1 Monte Carlo Simulation

Monte Carlo methods are widely used in an ample variety of cases. Consider the
following well-known problem. Assume that a person is throwing darts at a square of
side length equal to 2 meters, with a circle of radius 1 meter inscribed in the square.
Suppose that all darts hit the square. The natural question that arises is the
probability that a dart hits either the circumference or its interior. A simple
computation shows that it is π/4. The alternative approach is to simulate a high number
of random points inside the square and calculate the proportion of points that are
located within or on the circle. Both approaches should lead to fairly close values for
large samples.

In the previous example, the analytic method is simple enough. However, in many
applications, the calculation of the probability of an event by using analytic
techniques can be next to impossible. For those cases, simulating a large sample and
calculating the proportion of values that satisfy a given event can be an effective
approach. That approach may be enough for problems where the main interest is the
result of the calculation itself.
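The dart-throwing estimate above can be sketched in a few lines (a simple sketch; the
estimate is random, so any single run only approximates the true value π/4):

```python
import random

# Estimate the probability that a uniform point in the 2x2 square
# lands inside the inscribed unit circle; the exact value is pi/4.
def hit_proportion(n_darts, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        x = rng.uniform(-1, 1)   # square centered at the origin
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:   # inside or on the circle
            hits += 1
    return hits / n_darts

print(hit_proportion(100_000))   # close to pi/4 ~ 0.7854
```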

5.2.2 Markov Chain

Markov Chains are a specific type of stochastic process. They are extensively treated
in several sources, regarding both theory and practice, such as Brooks (2011) [10] and
Gilks et al (1995) [18], among others.

Markov Chains are applied in several fields. In order to make sense of them, assume
that there is a particle that may be in one of n states.¹ The particle can move from
one state to another at any given period. The main feature of a Markov Chain is that it
satisfies:

P(X_{n+1} = i_{n+1} \mid X_n = i_n) = P(X_{n+1} = i_{n+1} \mid X_n = i_n, \ldots, X_{n-k} = i_{n-k}, \ldots)

where i_k is the state of the particle at the k-th period. This equality does not mean
that the states are independent from one another. Rather, it may be roughly stated as
the future being independent from the past, conditional on the present.²

There are several important elements when studying a Markov Chain, the most important
being the transition matrix Q = (q_{ij}), where q_{ij} is the probability of moving
from state i to state j. Consider the following example:

Q = \begin{pmatrix} 0.2 & 0.4 & 0.4 \\ 0.3 & 0.6 & 0.1 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}

The matrix above shows, for instance, that if a particle is currently at state 1, the
probability of moving to state 3 in the next period is 0.4. Notice that the entries of
each row sum to 1, meaning that the particle must either remain where it is or move to
a different state. Another important feature is that the (i, j) entry of Q^k represents
the probability of going from state i to state j in k steps.

There is another important element: the initial probability distribution of the
particle, given by a row vector s. The j-th entry of the vector represents the
probability that the particle is at state j initially. Furthermore, the j-th value of
sQ^k gives the probability that the particle is at state j after k steps.
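These computations can be sketched with the example matrix above (NumPy assumed; the
initial distribution s is an arbitrary choice for illustration):

```python
import numpy as np

# Transition matrix from the example above; each row sums to 1.
Q = np.array([[0.2, 0.4, 0.4],
              [0.3, 0.6, 0.1],
              [0.2, 0.5, 0.3]])

# The (i, j) entry of Q^k is the probability of going from i to j in k steps.
Q2 = np.linalg.matrix_power(Q, 2)

# An illustrative initial distribution; s @ Q^k gives the distribution
# of the particle after k steps.
s = np.array([1.0, 0.0, 0.0])     # start at state 1 with certainty
after_5 = s @ np.linalg.matrix_power(Q, 5)

print(Q2[0, 2])    # two-step probability of going from state 1 to state 3
print(after_5)     # distribution after 5 steps; its entries sum to 1
```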

The previous elements are of paramount importance and fully describe how the particle
behaves. However, the most relevant feature of Markov Chains regarding sampling is the
existence of stationary distributions. A row vector distribution s is called stationary
if it satisfies:

s = sQ

¹ The number of states may actually be infinite, but for the sake of simplicity, a
finite number n is assumed.
² Think of it as a particle whose current state determines its future state; how the
particle got to where it is turns out to be completely irrelevant.

This equation basically means that from that point on the distribution will remain the
same, hence the name 'stationary'. Notice that s can be found by solving a linear
system and by applying eigenvalue and eigenvector techniques. Nevertheless, for certain
types of Markov Chains, there are far easier ways to compute s.

A Markov Chain is called irreducible if it is possible for the particle to move from
any state i to any state j in a finite number of steps. For such Markov Chains, the
existence of a unique stationary distribution is guaranteed. Also, if all the values of
Q^m are positive for some m, then:

s_i = \lim_{n \to \infty} P(X_n = i)

That is to say, no matter what the initial point is, the distribution tends to the
stationary one.

There is a particular type of Markov Chain for which computations may be executed in a
more efficient fashion. They are called reversible Markov Chains and they satisfy:

s_i Q_{ij} = s_j Q_{ji}    (1)

Notice that this condition implies stationarity, since

\sum_i s_i Q_{ij} = \sum_i s_j Q_{ji} = s_j \sum_i Q_{ji} = s_j

The first term in the equality is, by definition, the j-th value of sQ. Thus, s is
stationary.

To see how (1) allows one to bypass potentially long calculations, consider the
following situation. Let G be an undirected graph with vertices {1, ..., n}, each one
representing a possible state for the particle. For the sake of simplicity, assume that
from any given node all available moves are equally likely. For example, if node 2 is
linked to nodes 3 and 4 and to itself by an edge, then the probability of either
remaining at 2, or going to 3 or to 4, is 1/3. The previous argument immediately shows
that the probability of going from i to j in one step is zero if there is no edge
connecting them, and 1/d_i otherwise, where d_i is the degree of i. Thus, the chain is
reversible since:

d_i Q_{ij} = d_i \frac{1}{d_i} = 1 = d_j \frac{1}{d_j} = d_j Q_{ji}

Therefore, the vector of degrees is proportional to the stationary distribution. It now
suffices to normalize it by setting:

s_i = \frac{d_i}{\sum_j d_j}
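This fact can be checked on a small toy graph (an arbitrary 4-node example chosen for
illustration; NumPy assumed):

```python
import numpy as np

# Adjacency matrix of a small undirected graph (a self-loop at node 2).
A = np.array([[0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

deg = A.sum(axis=1)            # degrees d_i
Q = A / deg[:, None]           # Q_ij = 1/d_i if an edge i-j exists, else 0

s = deg / deg.sum()            # candidate stationary distribution

# Detailed balance s_i Q_ij = s_j Q_ji holds entrywise, hence s = sQ.
assert np.allclose(s[:, None] * Q, (s[:, None] * Q).T)
assert np.allclose(s @ Q, s)
print(s)                       # normalized degree vector
```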

5.3 The Metropolis-Hastings Algorithm

As previously stated, Bayesian Statistics revolves around the idea of updating beliefs
about a given parameter θ. The prior distribution, which encompasses the beliefs³ held
about θ, is represented by Π(θ). After collecting n observations, the beliefs about θ
are represented by Π(θ | X_1, ..., X_n), and the following relation holds:

\Pi(\theta \mid X_1, \ldots, X_n) = \frac{p(X_1, \ldots, X_n \mid \theta)\,\Pi(\theta)}{p(X_1, \ldots, X_n)}

where

p(X_1, \ldots, X_n) = \int_\Theta p(X_1, \ldots, X_n \mid \theta)\,\Pi(\theta)\,d\theta

Thus, Π(θ | X_1, ..., X_n) is known up to a constant; such a constant, given by the
integral shown before, may be extremely difficult to compute. Therefore, there is a
clear need for an algorithm that allows sampling from a distribution when it is known
only up to such a constant. One possible solution is to apply an 'accept-reject'
method, which basically states that if a probability distribution p(X) is proportional
to f(X), then we can sample from p(X) by choosing a convenient and known distribution
g(X) and a constant M such that f(X) < M g(X) over the entire domain. That method will
indeed lead to sampling from p(X); however, it can be highly inefficient since the vast
majority of the proposed samples may be rejected.
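A minimal sketch of the accept-reject idea (the target f and the uniform proposal g
below are arbitrary choices for illustration):

```python
import random

# Target known only up to a constant: f(x) on [0, 1].
def f(x):
    return x ** 2 * (1 - x)          # proportional to a Beta(3, 2) density

# Proposal g: uniform on [0, 1]; pick M so that f(x) < M * g(x) everywhere.
M = 0.2                               # max of f on [0, 1] is 4/27 < 0.2

def accept_reject(n, seed=0):
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.uniform(0, 1)         # draw a candidate from g
        u = rng.uniform(0, 1)
        if u <= f(x) / M:             # accept with probability f(x) / (M g(x))
            samples.append(x)
        # otherwise the draw is discarded: the source of inefficiency
    return samples

xs = accept_reject(10_000)
print(sum(xs) / len(xs))              # close to the Beta(3, 2) mean, 3/5
```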

The Metropolis-Hastings algorithm offers a far more efficient solution, through Monte
Carlo simulation of Markov Chains that converge to the target distribution. The
algorithm exploits the main advantage of a Markov Chain, namely, the existence of a
stationary distribution to which the distribution at each step converges. The Markov
Chain that is going to be created is irreducible and, in particular, reversible. It is
highly expected that after, say, 10,000 iterations, the distribution associated to each
step is the stationary one. Thus, these first 10,000⁴ draws are informally called
'burn-in' samples, whereas all the following samples come from the actual distribution
and are the ones that are collected. On the other hand, we know that the hallmark of a
Markov Chain is that the current state is what leads to the future one. Consider the
following example: start by choosing a distribution g(X) that is convenient enough and
has the same domain as the target distribution. Let x_n be the sample collected at the
n-th step. Then, x_{n+1} may be sampled from a distribution g(X) whose mean is x_n.
Now, we need to calculate the probability of going from state x_n to state x_{n+1}.
Nevertheless, it is necessary to guarantee the detailed balance condition, meaning that
the chain is actually reversible, which is to say that for all states a, b it must hold
that:

³ A belief can be thought of as stating that θ should be, say, between 0.4 and 0.5 with
a 95% confidence.
⁴ This number is by no means a definitive bound; it is just a useful rule of thumb.

p(a)Q(a|b) = p(b)Q(b|a)

where p(a) is the probability associated to state a and Q(a|b) is the probability of
going from a to b in one step; the right-hand side is analogous. Writing the transition
probability as the product of the proposal density g and an acceptance probability A,
and p(x) = f(x)/N, the former equality is equivalent to

\frac{f(a)}{N}\,g(b|a)\,A(b|a) = \frac{f(b)}{N}\,g(a|b)\,A(a|b)

where N is the normalizing constant and A(b|a) is the acceptance probability of a
proposed move from a to b. It then holds that

\frac{A(b|a)}{A(a|b)} = \frac{f(b)}{f(a)}\,\frac{g(a|b)}{g(b|a)}

Let r_f = \frac{f(b)}{f(a)} and r_g = \frac{g(a|b)}{g(b|a)}. If r_f r_g < 1, let
A(b|a) = r_f r_g and A(a|b) = 1. Else, let A(b|a) = 1 and A(a|b) = \frac{1}{r_f r_g}.
This is simply

A(b|a) = \min\{r_f r_g, 1\}

It is clear that A(·) is the acceptance probability of the new sample. The final step
consists in drawing a random number from the uniform distribution on [0, 1]. If A is
greater than or equal to that random value, the sample is accepted; otherwise, the
proposal is rejected and the chain remains at its current state.

In order to dive into the intuition of the algorithm, let us analyze the Metropolis
scenario, where g(X) is symmetric and therefore r_g = 1. In that case, it is true that

A(b|a) = \min\{r_f, 1\}

which is equivalent to the fact that if p(b) > p(a) then A(b|a) = 1; meaning, in the
next step we are moving towards a more likely state, which implies that most of the
samples come from high-density areas of the distribution. The reader can think of this
algorithm as a way of 'discovering' where the high-density areas are and staying around
them as much as possible, which is a desirable feature when sampling from a
distribution. A more detailed explanation of the algorithm can be found in Chib and
Greenberg (1995) [13].

In the present document, we specify the posterior distribution up to a normalizing
constant and use the Metropolis-Hastings algorithm to sample from such a distribution.
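The steps above can be sketched as a compact random-walk Metropolis sampler (symmetric
Gaussian proposal, so r_g = 1; the unnormalized target below is an arbitrary choice for
illustration, not the posterior used later in this document):

```python
import math, random

# Unnormalized target f(x): a standard normal density without its constant.
def f(x):
    return math.exp(-0.5 * x * x)

def metropolis(n_samples, burn_in=10_000, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0                              # arbitrary starting state
    out = []
    for i in range(burn_in + n_samples):
        prop = rng.gauss(x, step)        # symmetric proposal centered at x
        a = min(f(prop) / f(x), 1.0)     # acceptance probability min{r_f, 1}
        if rng.random() <= a:
            x = prop                     # accept the proposed move
        # on rejection the chain stays at x (the current state repeats)
        if i >= burn_in:                 # discard the burn-in samples
            out.append(x)
    return out

xs = metropolis(50_000)
print(sum(xs) / len(xs))                 # close to the target mean, 0
```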

5.4 The Markowitz Model

The Markowitz Model is extremely important in Portfolio Theory, and a significant
amount of research is devoted to it, for example Markowitz (1991) [22]. The model
claims that investors are interested in obtaining optimal results in terms of expected
return and variance. There are two main variants: to maximize the expected return
subject to a given variance, or to minimize the variance subject to a given expected
return.

Given a collection of n assets, the model takes their expected returns and the
variance-covariance matrix as inputs. The investor can minimize the volatility or
variance of a given portfolio by solving

\min_w \; \sigma_p^2 = w'\Sigma w
subject to
R_p = w'R
w'\hat{1} = 1

where w is a vector such that w_i represents the proportion of the total wealth
invested in the i-th asset, Σ is the variance-covariance matrix, and R_p is the
expected return of the portfolio, which is simply a weighted average of the expected
returns of the n assets. The restrictions state the desired level of expected return
and that all the wealth is invested in the n assets. Solving this problem links each
level of expected return to an optimal variance and vice versa, where the possible
pairs come from several different portfolios. Therefore, the remaining question
revolves around how to choose one such portfolio. This question was studied in Sharpe
[29] by means of the reward-to-volatility ratio introduced by William Sharpe:

SR = \frac{r_p - r_f}{\sigma_p}

where r_f is the risk-free rate, r_p is the return of the portfolio and σ_p is its
standard deviation. Thus, maximizing SR is indeed a way of finding the optimal
portfolio according to this model. Therefore, the problem at hand is to solve

\max_w \; \frac{w'R - r_f}{(w'\Sigma w)^{1/2}}
subject to
w'\hat{1} = 1

whose solution is

w = \frac{\Sigma^{-1}(R - r_f \hat{1})}{\hat{1}'\Sigma^{-1}(R - r_f \hat{1})}

Notice that the expected returns and the variance-covariance matrix are the inputs of the optimal portfolio. The Black Litterman Model (BLM) allows the views of the investor to be included in the calculation of these values, making the portfolio optimization process more versatile. This paper does not aim at finding the optimal portfolio by using the modified version of the BLM, but this can be done in follow-up research.
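As an illustration, the closed-form maximum Sharpe ratio weights above can be computed directly. The following sketch uses NumPy with purely hypothetical values for R, Σ and r_f; none of them come from this paper's data.

```python
import numpy as np

# Hypothetical inputs (for illustration only): expected returns R,
# covariance matrix Sigma and risk-free rate r_f for three assets.
R = np.array([0.08, 0.05, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.03, 0.01],
                  [0.00, 0.01, 0.02]])
r_f = 0.02

# Closed-form maximum Sharpe ratio weights:
# w = Sigma^{-1}(R - r_f 1) / (1' Sigma^{-1}(R - r_f 1))
ones = np.ones(len(R))
z = np.linalg.solve(Sigma, R - r_f * ones)
w = z / (ones @ z)
print(w)  # the weights sum to 1 by construction
```

A simple sanity check is that the Sharpe ratio of w is not below that of any other fully invested portfolio, for instance the equally weighted one.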

5.5 The Black Litterman Model

The Black Litterman Model is widely used in portfolio management and asset allocation. It was introduced in the early 1990s and has been used extensively since the publication of Black and Litterman (1990) [7]. It takes into account two important elements: the equilibrium excess return of a given asset and the particular views of analysts.5 The BLM is based upon fundamental ideas of Bayesian Statistics, namely, how a given set of beliefs is updated with data. Specifically, the BLM calculates the expected value of the excess return of a given portfolio. From a general perspective, the model computes a weighted average of the equilibrium expected returns and the returns implied by the analyst's views. The BLM is closely related to the CAPM, since both are equivalent in the absence of views.

Consider a set of n assets indexed by i ∈ {1, . . . , n}. Let R be the vector of excess returns6 with
$$R \sim N(\mu, \Sigma)$$
where µ is modelled as a random variable, as shown in Walters (2014) [31]. This vector of expected values plays an important part in the model. It is unknown and must therefore be modelled. Likewise, µ is deeply linked to the expected returns observed under equilibrium in the markets, Π, leading to
$$\mu \sim N(\Pi, \tau\Sigma)$$
where τ is a scalar that reflects the uncertainty regarding the calculation of Π.7 It is generally accepted that 0.01 < τ < 0.05 according to Idzorek (2007) [21].
5 This is extremely important in capital markets. Even though technical analysis is of paramount importance, fundamental analysis, that is, the inclusion of the intuition gained after years of closely observing a given market, is remarkably useful.
6 Given by the difference between the return rate of the asset and the risk-free rate.
7 This is calculated through models such as the Capital Asset Pricing Model.

The above describes how the assets behave under normality and equilibrium assumptions. Let us now study how the views are taken into account within the model.

5.5.1 The views of the stake-holders

The analyst has k views. Each view states that the portfolio p_k is normally distributed with mean q_k and variance ω_k. This can be represented by
$$P = [p_1, \dots, p_k], \qquad Q = [q_1, \dots, q_k]$$
where P can be seen as the matrix that contains the views and Q as the vector that contains the expected values of each p_k. Thus, the views can be expressed as
$$P\mu = Q + \varepsilon \tag{2}$$
where ε is a normally distributed random vector with mean zero and covariance matrix Ω, which leads to
$$P\mu \sim N(Q, \Omega)$$
where Ω is a matrix whose diagonal elements ω_ii are positive and whose remaining entries are identically zero. This encompasses the assumption that the views are independent from one another.

An important element when considering views is how confident the analyst is about them (He and Litterman (2002) [20]). This is directly expressed in Ω, since ω_ii is inversely proportional to the confidence the analyst has in the i-th view. Ω can be calculated as
$$\Omega = \operatorname{diag}\left(P(\tau\Sigma)P'\right)$$
Notice that this is completely coherent with the intuition about τ and Σ: the former represents the uncertainty regarding the calculation of the equilibrium rates, and the latter is the covariance matrix of R. Thus, both elements control the variability of the resulting returns.

As an example, consider a simple situation: assume there are four assets A, B, C and D. The analyst expects A to have an 8% return, and expects C to outperform D by 6%. The former view is absolute, whereas the latter is relative, the latter type being more common in practice. Therefore, the matrix P and the vector Q are the following:

 
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0.08 \\ 0.06 \end{pmatrix}$$

Notice that (2) implies that µ_1 = 0.08 + ε_1 and that µ_3 − µ_4 = 0.06 + ε_2, which expresses the content of the views in mathematical terms.
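The construction Ω = diag(P(τΣ)P') described above can be sketched for this four-asset example. The values of τ and Σ below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# The two views of the example: an absolute view on A and a relative
# view on C versus D. tau and Sigma are hypothetical values.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
tau = 0.025                                   # within the accepted 0.01-0.05 range
Sigma = np.diag([0.04, 0.03, 0.05, 0.02])

# Omega = diag(P (tau Sigma) P'): one variance per view, zero elsewhere,
# reflecting the independence assumption between views.
Omega = np.diag(np.diag(P @ (tau * Sigma) @ P.T))
print(Omega)
```

The diagonal of Ω inherits the scale of τΣ, matching the intuition that more equilibrium uncertainty means less confident views.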

5.5.2 Expected returns according to the Black Litterman Model

As previously stated, the BLM is a weighted average, according to


$$\mu_{BL} = \left[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}Q\right] \tag{3}$$
That is to say, (3) depicts a weighted average between Π and Q, namely, the equilibrium excess returns determined by the market and the particular views provided by the analyst.

A more realistic version is as follows: rather than providing an exact value for Q, we can consider
$$\hat{\mu} = (P'P)^{-1}P'Q$$
Therefore, the equivalent version of the model is
$$\mu_{BL} = \left[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}P\hat{\mu}\right] \tag{4}$$
Notice that a linear regression structure is introduced, where the dependent variable is Q and P is the design matrix.
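The posterior mean (3) can be sketched numerically for the four-asset example. Π, Σ and τ below are hypothetical values, not taken from the text.

```python
import numpy as np

# Hypothetical four-asset example: equilibrium returns Pi, diagonal
# covariance Sigma and tau are illustrative assumptions.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
Q = np.array([0.08, 0.06])
tau = 0.025
Pi = np.array([0.05, 0.03, 0.045, 0.04])
Sigma = np.diag([0.04, 0.03, 0.05, 0.02])
Omega = np.diag(np.diag(P @ (tau * Sigma) @ P.T))

# mu_BL = [(tau Sigma)^-1 + P' Om^-1 P]^-1 [(tau Sigma)^-1 Pi + P' Om^-1 Q]
tS_inv = np.linalg.inv(tau * Sigma)
O_inv = np.linalg.inv(Omega)
mu_BL = np.linalg.solve(tS_inv + P.T @ O_inv @ P,
                        tS_inv @ Pi + P.T @ O_inv @ Q)
print(mu_BL)  # asset B carries no view, so its entry stays at Pi[1]
```

The entry for the asset without views is untouched, and the entry for asset A lands between its equilibrium return and the analyst's view, as the weighted-average reading of (3) suggests.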

5.5.3 The Bayesian elements of the model

It is usually stated that the BLM belongs to the realm of Bayesian statistics. Let us show what the particular components are; that is, let us identify the prior, likelihood and posterior distributions.

In order to show how the prior turns into the posterior, note that if X is a random variable with multivariate normal density p(x) and
$$-2\log(p(x)) = x'Hx - 2\eta'x + M \tag{5}$$
where M does not depend on x, then Var(x) = H^{-1} and E(x) = H^{-1}η. This expression is quite important, since it allows us to find an explicit form of the posterior distribution, up to a normalization constant.

The prior distribution is
$$\mu \sim N(\Pi, \tau\Sigma) \tag{6}$$
whereas the likelihood distribution is
$$P\mu \sim N(Q, \Omega) \tag{7}$$

By (6) and (7) we have that the posterior is proportional to
$$\exp\left[-\frac{1}{2}(\mu - \Pi)'(\tau\Sigma)^{-1}(\mu - \Pi) - \frac{1}{2}(P\mu - Q)'\Omega^{-1}(P\mu - Q)\right]$$
leading to, up to an additive constant,
$$-2\log p(\mu|Q) = \mu'\left[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\right]\mu - 2\mu'\left[(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}Q\right] + \Pi'(\tau\Sigma)^{-1}\Pi + Q'\Omega^{-1}Q$$


  

Thus, the posterior distribution is normal with
$$E(\mu) = \left[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}Q\right] \tag{8}$$
$$\operatorname{Var}(\mu) = \left[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\right]^{-1} \tag{9}$$

Notice that (3) is the expected value of the posterior distribution. It is important to stress that the Black-Litterman formula only applies when assuming normal distributions for both the likelihood and the prior. However, the regression structure embedded in the likelihood distribution may have non-normal errors, so other scenarios may be considered as well. Furthermore, since the market might present non-normal returns, other distributions may be considered for the prior as well.

5.6 Skewed Distributions

In this paper, the prior and likelihood distributions are not assumed to be normal. It is useful to consider distributions that are skewed, for instance, the Gamma8 Distribution and the Skew-Normal Distribution.

Some commonly used distributions, such as the Normal and Student's t, are symmetric around their means, which is a useful property for performing calculations. However, in several applications the data comes from a distribution that is asymmetrical with respect to the mean. This is quantified by the skewness of a distribution, given by
$$E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$
where X is a random variable and µ, σ are its mean and standard deviation, respectively. A negative skewness implies that the left tail is longer, so most of the mass is concentrated on the right, whereas a positive skewness means the opposite situation.
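As a quick numerical illustration of this definition, the sample analogue of the skewness can be computed from simulated draws. The exponential distribution is used here because its true skewness, 2, is known.

```python
import numpy as np

# Sample analogue of E[((X - mu)/sigma)^3] on exponential draws,
# whose true skewness is 2 (positive: longer right tail).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)
skew = np.mean(((x - x.mean()) / x.std()) ** 3)
print(skew)  # should be near 2 for this sample size
```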

5.6.1 The Gamma Distribution

A random variable X is said to have a Gamma Distribution if its density function is
$$f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$$
where Γ(·) is the Gamma Function and α, β are called the shape and rate parameters, respectively. This distribution applies only to positive values.9 Gamma distributions are useful in several and varied applications and include famous particular cases, such as the Exponential Distribution and the Chi-Squared Distribution. Other details about the Gamma Distribution can be found in the Appendix.
8 This case only applies when modelling excess returns that are greater than a constant ξ.
9 When centered at zero; if centered at ξ, it applies only for x > ξ.

5.6.2 The Skew-Normal Distribution

Let X be a random variable. It follows a Skew-Normal Distribution (SN) if its density function is
$$f(x) = 2\phi(x)\Phi(\alpha x)$$
where φ(x) and Φ(x) are the standard normal density and cumulative distribution functions, respectively. The parameter α controls the skewness of the distribution, and it is readily seen that α = 0 leads to the standard normal distribution.

The SN is particularly useful when working with data whose underlying distribution seems not to be symmetric about the mean and whose tails look different. The SN is not the only known skewed distribution, but it is an option to consider when skewness is observed or suspected.

For location, scale and shape parameters (ξ, σ, δ), the density function is as follows:
$$f(x; \xi, \sigma, \delta) = \frac{2}{\sigma}\,\phi\left(\frac{x-\xi}{\sigma}\right)\Phi\left(\delta\,\frac{x-\xi}{\sigma}\right)$$

Naturally, the multivariate case can be extended from marginally skew-normal distributed random variables. The special bivariate case is shown in Azzalini (1996) [3].
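A minimal sketch of the standard density f(x) = 2φ(x)Φ(αx), using only the standard library; the checks confirm that α = 0 recovers the normal density and that f integrates to one.

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sn_pdf(x, alpha):
    # skew-normal density f(x) = 2 * phi(x) * Phi(alpha * x)
    return 2.0 * phi(x) * Phi(alpha * x)

# alpha = 0 recovers the standard normal density
assert abs(sn_pdf(1.0, 0.0) - phi(1.0)) < 1e-12

# the density integrates to 1 (crude Riemann sum on [-8, 8])
h = 16.0 / 4000
total = sum(sn_pdf(-8.0 + i * h, 3.0) for i in range(4001)) * h
print(total)  # approximately 1
```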

5.7 The model for skewed innovations

Recall that the likelihood distribution of the BLM is given by the linear regression
$$Q = P\mu + \varepsilon$$
where P is the design matrix and ε is the error vector. The original model assumes
$$\varepsilon \sim N(0, \Omega)$$
from which it follows that
$$P\mu \sim N(Q, \Omega) \tag{10}$$

The original model assumes a linear regression for the views. However, ε may be a vector that clearly comes from a skewed distribution, so the normality assumption may not be satisfied. Two alternatives are proposed for the distribution of ε: a skew-normal distribution with parameters (0, ω, α) and a gamma distribution with parameters (α, β). The skew-normal case is also considered in Blasi (2009) [8], Meucci (2005) [23] and others.

5.7.1 Gamma Regression Model

As mentioned before, the assumption on the error vector of a given regression is fundamental, since the dependent variable inherits the main features of the distribution of that vector. Cepeda (2001) [15] offers a comprehensive treatment of Generalized Linear Models, providing a way to model one or more parameters of a regression where a variable Y follows a distribution that belongs to the biparametric exponential family. Such distributions are characterized by
$$f(y|\theta, \tau) = b(y)\exp\left[\theta y + \tau T(y) - \rho(\theta, \tau)\right]$$
where ρ(·) is a scalar map that satisfies regularity conditions. The parameters can be estimated by the Maximum Likelihood Method or in a Bayesian fashion. We use the latter in this paper.

Let γ and β be the parameters of interest. As shown in Cepeda (2001) [15], we can assume the prior distribution p(β, γ)
$$\begin{pmatrix} \beta \\ \gamma \end{pmatrix} \sim N\left(\begin{pmatrix} b_0 \\ g_0 \end{pmatrix}, \begin{pmatrix} B_0 & C \\ C' & G_0 \end{pmatrix}\right)$$
It follows that
$$\pi(\beta, \gamma) \propto L(\beta, \gamma)\,p(\beta, \gamma)$$
where π(·) and L(·) are the posterior and likelihood functions, respectively. In general, both parameters β and γ can be modelled by
$$h(\mu) = x'\beta, \qquad g(\tau) = z'\gamma$$
In this paper we model one of the parameters and the other is estimated.

On the other hand, the conditional prior distribution is
$$\beta|\gamma \sim N(b, B)$$
where
$$b = b_0 - CG_0^{-1}(\gamma - g_0), \qquad B = B_0 - CG_0^{-1}C'$$

Furthermore, the conditional posterior distribution is10
$$q(\beta|\hat{\beta}, \hat{\gamma}) = N(b^*, B^*)$$
where
$$b^* = B^*\left(B^{-1}b + X'\Sigma^{-1}\tilde{Y}\right), \qquad B^* = \left(B^{-1} + X'\Sigma^{-1}X\right)^{-1}$$
where Ỹ is a working variable whose specific form depends upon the distribution of Y, and Σ is a diagonal matrix.
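The update b* = B*(B⁻¹b + X'Σ⁻¹Ỹ), B* = (B⁻¹ + X'Σ⁻¹X)⁻¹ can be sketched directly; the prior, design matrix and working variable below are hypothetical values chosen only to exercise the formulas.

```python
import numpy as np

# Hypothetical prior (b, B), design matrix X, diagonal Sigma and working
# variable Ytilde; none of these values come from the paper.
b = np.zeros(2)
B = 10.0 * np.eye(2)                         # vague prior covariance
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
Sigma_inv = np.linalg.inv(np.diag([0.5, 0.5, 0.5]))
Ytilde = np.array([1.0, 2.0, 3.0])

# B* = (B^-1 + X' Sigma^-1 X)^-1 ; b* = B*(B^-1 b + X' Sigma^-1 Ytilde)
B_star = np.linalg.inv(np.linalg.inv(B) + X.T @ Sigma_inv @ X)
b_star = B_star @ (np.linalg.inv(B) @ b + X.T @ Sigma_inv @ Ytilde)
print(b_star)
```

Because the prior is vague, b* lands close to the least-squares fit of the three points, here roughly (0.5, 1), with a slight shrinkage towards the prior mean.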
Additionally, if Y ∼ G(α, λ) with µ = E(Y) = α/λ, then the working variable ỹ is defined as
$$\tilde{y} = x'\beta - \left[\frac{1}{\alpha} - \frac{d^2\log\Gamma(\alpha)}{d\alpha^2}\right]^{-1}\left[\frac{d\log\Gamma(\alpha)}{d\alpha} - \log\left(\frac{\alpha y}{\mu}\right) - 1 + \frac{y}{\mu}\right]$$

10 A similar process leads to the estimation of γ, as shown in Cepeda (2001) [15].

5.7.2 Skew-Normal Regression Model

Let
$$Q_i = P_i'\mu + \varepsilon_i$$
follow the structure of a Bayesian linear regression, where
$$\varepsilon_i \sim SN(0, \sigma_i, \delta_i)$$
It is clear that a great part of the model comes down to estimating µ. In Corrales and Cepeda (2021) [14], a Bayesian method to fit skew-normal regressions is proposed. Let b = Π, the CAPM expected returns, and B = τΣ; then the posterior distribution is
$$\pi(\mu|q, \sigma, \delta) \propto \phi_p(\mu; b^*, B^*)\,\Phi_n(D_\delta(q - P\mu); 0; D_\sigma)$$
where D_σ = σI and D_δ = δI.

The former can also be represented by the Closed Skew-Normal Distribution described in González-Farías (2004) [19], leading to11
$$\pi(\mu|q, \sigma, \delta) \sim CSN_{p,n}(b^*, B^*, D_\delta P, -D_\sigma(Q - \mu b^*))$$

Let us now show how b* and B* are obtained, since they play a critical role in the computations. As shown in [14], the regression parameter µ may be estimated in a Bayesian way by assuming a prior distribution. Let
$$f(\xi_i) = P_i'\mu, \qquad \sigma_i^2 \equiv \sigma^2, \qquad \delta_i \equiv \delta$$
where f(·) is a real and differentiable function, leading to
$$\xi_i^c = f^{-1}(P_i'\mu)$$
11 Assuming that the prior distribution is normal.

where ξ_i^c is the current value of ξ_i. Regarding the kernel transition function for the location parameter, it is necessary to find a random variable T_i such that E(T_i) = ξ_i and
$$\operatorname{Var}(T_i) = \sigma_i^2\left(1 - \frac{2}{\pi}\tau_i^2\right)$$
Thus
$$T_i = Q_i - \sigma\tau\sqrt{\frac{2}{\pi}}$$
The first-order Taylor approximation of f(T_i) around ξ_i is
$$f(T_i) \approx \tilde{Q}_i = P_i'\mu + f'\left(f^{-1}(P_i'\mu)\right)\left[T_i - f^{-1}(P_i'\mu)\right]$$

which means that

 
$$\tilde{Q}_i \sim N\left(P_i'\mu^c,\; \left[f'\left(f^{-1}(P_i'\mu^c)\right)\right]^2 (\sigma_i^2)^c \left(1 - \frac{2}{\pi}(\tau_i^c)^2\right)\right)$$

where the kernel transition for µ^c is
$$q_\mu = N(b^*, B^*)$$
and
$$b^* = B^*\left(B^{-1}b + P'(\tau\Sigma)^{-1}\tilde{Q}\right), \qquad B^* = \left(B^{-1} + P'(\tau\Sigma)^{-1}P\right)^{-1}$$


If the regression is heteroscedastic, then σ_i can be treated similarly, as shown in Corrales and Cepeda (2021) [14]. However, this paper focuses only on the homoscedastic case; both σ and δ are modeled within the main algorithm and are estimated through random walks.12

12 The random walk used to estimate both σ and δ consists in adding a random value to the current value; the acceptance-rejection process is then identical to the one described in the Metropolis-Hastings section.
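A random-walk Metropolis sampler of the kind described here can be sketched in one dimension. The skew-normal target (α = 3, known only up to a constant) and all tuning values below are illustrative, not the paper's actual configuration.

```python
import math
import random

random.seed(1)

def log_target(x, alpha=3.0):
    # log of 2*phi(x)*Phi(alpha*x), up to an additive constant
    Phi = 0.5 * (1.0 + math.erf(alpha * x / math.sqrt(2.0)))
    return -0.5 * x * x + math.log(max(Phi, 1e-300))

x, samples = 0.0, []
for _ in range(50_000):
    prop = x + random.gauss(0.0, 1.0)              # random-walk proposal
    accept_prob = math.exp(min(0.0, log_target(prop) - log_target(x)))
    if random.random() < accept_prob:              # Metropolis accept/reject
        x = prop
    samples.append(x)

burned = samples[10_000:]                          # discard the burn-in
mean = sum(burned) / len(burned)
print(mean)  # should approach the SN mean tau*sqrt(2/pi), tau = 3/sqrt(10)
```

Only log-density differences are needed, so the normalization constant never has to be computed; this is exactly what makes the method attractive for the non-normal posteriors considered in this paper.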

6 Analysis of the Results

6.1 Application

The implementation of the algorithm is a two-layer process. First, we establish the prior distribution that reflects the behavior of the market. Then, we estimate the likelihood distribution by performing a linear regression, where the innovation vector follows a given, known distribution. This regression is performed in a Bayesian way, since the design matrix P is too small compared to the views vector Q. When the estimation of the regressors is complete, we can set a precise distribution (up to a constant) that serves as the main input for the Metropolis-Hastings algorithm, given that the prior distribution is also assumed to be skew-normal. Both parts were coded in R, although this could also be done in Python.

Let the portfolio be composed of 5 assets: Avnet, Banco de Bogotá, Kodiak Copper Corp, Grupo Aval and Almacenes Éxito. The data shows the return of each during the period 2018-02-18 to 2020-01-30, with 478 observations per asset. These companies belong to sectors such as banking, retail, mining and electronics manufacturing. The correlation between any given pair of assets is almost zero; the highest absolute correlation between two of the stocks is 0.13 (Banco de Bogotá and Grupo Aval). For the sake of simplicity, we assume that the selected assets are independent from one another. Therefore, the prior distribution of the model is the product of the five univariate distributions that best fit the data.13

The histograms of the asset returns can be found in the appendix. The package 'sn' is used to find the best-fitting skew-normal distribution for each asset's data. For example, the parameters (ξ, σ, α) that best fit the returns of Kodiak Copper Corp are (0.003315668, 0.005471094, −0.661901371). The same process was performed for each of the remaining four assets; the specific values are shown in the Appendix. A similar process can be followed if a prior gamma distribution is considered to be the best fit for the given data.
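The paper fits these parameters with the R package 'sn'. As a rough, self-contained alternative, the following sketch fits (ξ, σ, α) by the method of moments on simulated (not the actual) return data, inverting the skewness formula given in the Appendix; all numeric choices are illustrative.

```python
import math
import random

random.seed(2)

def sn_sample(xi, sigma, alpha, n):
    # draw via the representation t*|U0| + sqrt(1 - t^2)*U1 ~ SN(alpha),
    # with t = alpha / sqrt(1 + alpha^2)
    t = alpha / math.sqrt(1.0 + alpha * alpha)
    s = math.sqrt(1.0 - t * t)
    return [xi + sigma * (t * abs(random.gauss(0, 1)) + s * random.gauss(0, 1))
            for _ in range(n)]

def fit_sn_moments(data):
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / n
    g = sum(((x - m) / math.sqrt(v)) ** 3 for x in data) / n

    def skew_of(t):  # SN skewness as a function of tau = delta/sqrt(1+delta^2)
        return (4.0 - math.pi) / 2.0 * (t * math.sqrt(2.0 / math.pi)) ** 3 \
               / (1.0 - 2.0 * t * t / math.pi) ** 1.5

    lo, hi = 0.0, 0.995              # |skewness| of a SN stays below ~0.995
    g_abs = min(abs(g), skew_of(hi))
    for _ in range(80):              # bisection for tau
        mid = 0.5 * (lo + hi)
        if skew_of(mid) < g_abs:
            lo = mid
        else:
            hi = mid
    t = math.copysign(0.5 * (lo + hi), g)
    alpha = t / math.sqrt(1.0 - t * t)
    sigma = math.sqrt(v / (1.0 - 2.0 * t * t / math.pi))
    xi = m - sigma * t * math.sqrt(2.0 / math.pi)
    return xi, sigma, alpha

data = sn_sample(0.003, 0.005, -3.0, 50_000)
xi_hat, sigma_hat, alpha_hat = fit_sn_moments(data)
print(xi_hat, sigma_hat, alpha_hat)  # should land near (0.003, 0.005, -3.0)
```

Moment matching is noisier than the maximum likelihood fit performed by 'sn', especially for small |α|, but it shows the mapping between the three sample moments and (ξ, σ, α).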

13 The non-independent case can also be addressed based on the marginal distribution parameters and a covariance matrix.

6.2 Bayesian Regression and Final Results

Let us consider the design matrix
$$P = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \end{pmatrix}$$
and the vector
$$Q = \begin{pmatrix} 0.02 \\ 0 \end{pmatrix}$$

The previous matrix P and vector Q represent an absolute and a relative view. Assume that the price of copper is rising due to intrinsic factors within capital markets. In that case, the return of Kodiak Copper Corp stock may be expected to rise; let us assume that the analyst's reasoning leads to an expected 2% return. On the other hand, the difference in the returns of Banco de Bogotá and Grupo Aval may be expected to be zero, since they belong to the same sector and are currently in a standoff where neither is getting ahead of the other. This part of the model is completely flexible: the stakeholders may establish any relative or absolute view on the assets that fits the fundamental, market-driven factors and/or the intuition gained after years of closely following the behavior of the assets involved. It is also discretionary whether to perform a gamma or a skew-normal Bayesian regression; this paper performs the latter. The algorithm for that regression is described in the previous section.

The algorithm is built in two sections: the Bayesian skew-normal regression and the Metropolis-Hastings algorithm. The previous section shows how the prior distribution was obtained. The likelihood distribution is uniquely determined by the views. Therefore, since the views are also considered independent, the likelihood distribution is the product of two univariate skew-normal distributions, the location of the former being µ_DVI and the location of the latter µ_GAA − µ_BBO.14 Both views are assumed to have the same scale and skewness parameters.

The specifics of the regression are similar to those in Corrales and Cepeda (2021) [14]. The iterative algorithm for the regression is executed 10,000 times; the resulting location parameters for (Q_1, Q_2) are (−0.18499312, 0.02987677), whereas the scale and skewness parameters are 0.14142178 and 0.36800619, respectively. At this point both the prior and likelihood distributions are fully determined. The next part is the Metropolis-Hastings algorithm itself: samples are obtained from a multivariate normal distribution, the number of iterations is 20,000, and the first 10,000 are discarded. The final 10,000 samples give the expected returns for the five assets,15 the average representing what is expected from each of them. The vector of expected returns is16
$$(0.049, 0.083, 0.089, 0.038, -0.011)$$
As a final remark, recall that the result is a weighted average according to the likelihood of each sample; the prior distribution heavily influences the final result, since the final expected average is close to zero.

14 DVI is the code for the stock of Kodiak Copper Corp; GAA and BBO refer to Grupo Aval and Banco de Bogotá, respectively.
15 Avnet, Banco de Bogotá, Kodiak Copper Corp, Grupo Aval and Almacenes Éxito.
16 The final returns are shown as percentages; for example, 0.19 represents 0.19%, not 19%.

7 Conclusions

The BLM is pivotal in portfolio theory. It is flexible enough to combine the views of experts with a prior distribution of the assets' excess returns. The idea of mixing what is observed in the market with what is expected by the analyst makes it a powerful tool for business decisions. The standard version of the model is simple thanks to its normality assumptions. However, when data is clearly non-normal, this approach may lead to fallacious expectations.

Bayesian inference allows parameters of interest to be estimated in a flexible way. By using algorithms such as Metropolis-Hastings, precise estimations of means and variances can be performed without having a closed-form distribution to sample from. This feature is useful when the important result is the estimation itself. Such versatility allows the idea of the BLM to be extended to virtually any pair of known distributions that fit the nature of the selected data.

On the other hand, the regression structure within the likelihood distribution of the BLM makes the model robust yet tractable, since by estimating the regressors the Metropolis-Hastings algorithm can be executed in a simple way. The design matrix P and the views vector Q should be chosen by carefully observing the market, whereas the prior distribution encompasses how the market is initially expected to behave. The more precise the assumptions and beliefs, the more exact the results. Thus, the implementation of this methodology is by no means a one-time exercise; rather, it should be repeated and corrected when applied to real data as part of a complex portfolio optimization process.

8 Appendix

8.1 The Gamma Distribution

As shown in [28], the Gamma Distribution comes from the Gamma Function, which is defined as
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}\,dx$$
for α > 0. As an example, consider
$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1$$
By using integration by parts it can be shown that Γ(α) = (α − 1)Γ(α − 1). For n a natural number, it follows that Γ(n) = (n − 1)!
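These properties can be checked quickly with the standard library's gamma function:

```python
import math

# Gamma(1) = 1, Gamma(n) = (n-1)!, and the recurrence
# Gamma(a) = (a-1) * Gamma(a-1) for a non-integer value
assert abs(math.gamma(1.0) - 1.0) < 1e-12
assert abs(math.gamma(5.0) - math.factorial(4)) < 1e-9
a = 3.7
assert abs(math.gamma(a) - (a - 1.0) * math.gamma(a - 1.0)) < 1e-9
print(math.gamma(0.5) ** 2)  # equals pi, since Gamma(1/2) = sqrt(pi)
```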

For α > 0 and β > 0 it follows that
$$\int_0^\infty x^{\alpha-1}e^{-\beta x}\,dx = \frac{\Gamma(\alpha)}{\beta^\alpha}$$
from which the density function is
$$f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x}, \qquad x > 0$$

On the other hand, its Moment Generating Function is
$$M_X(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha$$
Thus, its expected value and variance are
$$E(X) = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha}{\beta^2}$$

8.2 The Skew Normal Distribution

As stated before, the Skew-Normal Distribution generalizes the normal distribution [2], which is recovered by setting δ = 0. It is a widely used distribution when data seems to come from a skewed distribution. It possesses three parameters related to its location, scale and shape, (ξ, σ, δ). If δ < 0 the left tail is longer, and vice versa. Its mean and variance are, respectively,
$$E(X) = \xi + \sigma\tau\left(\frac{2}{\pi}\right)^{1/2}$$
$$\operatorname{Var}(X) = \sigma^2\left(1 - \frac{2\tau^2}{\pi}\right)$$
where
$$\tau = \frac{\delta}{\sqrt{1 + \delta^2}}$$

Finally, its skewness is quantified by
$$\frac{4 - \pi}{2}\;\frac{\left(\tau\sqrt{2/\pi}\right)^3}{\left(1 - \frac{2\tau^2}{\pi}\right)^{3/2}}$$
Both the mean and the variance can be easily derived from its Moment Generating Function, which is
$$M_X(t) = 2\exp\left(\xi t + \frac{\sigma^2 t^2}{2}\right)\Phi(\sigma\tau t)$$
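The mean and variance formulas above can be verified by numerically integrating the density; the parameters (ξ, σ, δ) below are illustrative.

```python
import math

# Illustrative parameters; tau = delta / sqrt(1 + delta^2)
xi, sigma, delta = 1.0, 2.0, 3.0
tau = delta / math.sqrt(1.0 + delta * delta)

def pdf(x):
    # skew-normal density with location xi, scale sigma, shape delta
    z = (x - xi) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(delta * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * Phi

# first and second moments by a fine Riemann sum over [xi - 10s, xi + 10s]
h = 0.001
xs = [xi - 10.0 * sigma + i * h for i in range(int(20.0 * sigma / h) + 1)]
m1 = sum(x * pdf(x) for x in xs) * h
m2 = sum(x * x * pdf(x) for x in xs) * h

mean_formula = xi + sigma * tau * math.sqrt(2.0 / math.pi)
var_formula = sigma ** 2 * (1.0 - 2.0 * tau ** 2 / math.pi)
print(m1 - mean_formula, (m2 - m1 * m1) - var_formula)  # both near zero
```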

8.3 Figures

[Histograms of the returns of the five assets.]
8.4 Estimation of Parameters

The parameters θ = (ξ, σ, α) that best fit the data are:

θ_Avnet = (−0.01008352, 0.01551082, 1.30703386)
θ_Banco de Bogotá = (−0.002207541, 0.005037007, 1.047920455)
θ_Kodiak Copper Corp = (0.003315668, 0.005471094, −0.661901371)
θ_Grupo Aval = (0.003627705, 0.005700025, −1.057545857)
θ_Almacenes Éxito = (0.003659209, 0.007430643, −0.949700778)

References
[1] Christopher Adcock, Martin Eling, and Nicola Loperfido. “Skewed distributions
in finance and actuarial science: a review”. In: The European Journal of Finance
21.13-14 (2015), pp. 1253–1281.
[2] Adelchi Azzalini. “A class of distributions which includes the normal ones”. In:
Scandinavian journal of statistics (1985), pp. 171–178.
[3] Adelchi Azzalini and A Dalla Valle. “The multivariate skew-normal distribution”.
In: Biometrika 83.4 (1996), pp. 715–726.
[4] Steven L Beach and Alexei G Orlov. “An application of the Black–Litterman model
with EGARCH-M-derived views for international portfolio management”. In: Fi-
nancial Markets and Portfolio Management 21.2 (2007), pp. 147–166.
[5] Dimitris Bertsimas, Vishal Gupta, and Ioannis Ch Paschalidis. “Inverse optimiza-
tion: A new perspective on the Black-Litterman model”. In: Operations research
60.6 (2012), pp. 1389–1403.
[6] Christopher M Bishop and Michael E Tipping. “Bayesian regression and classifica-
tion”. In: Nato Science Series sub Series III Computer And Systems Sciences 190
(2003), pp. 267–288.
[7] Fischer Black and Robert Litterman. “Asset allocation: combining investor views
with market equilibrium”. In: Goldman Sachs Fixed Income Research 115 (1990).
[8] Francesco Simone Blasi. “Black Litterman model in a Skew Normal Market”. In:
S. Co. 2009. Sixth Conference. Complex Data Modeling and Computationally In-
tensive Statistical Methods for Estimation and Prediction. Maggioli Editore. 2009,
p. 79.
[9] William M Bolstad and James M Curran. Introduction to Bayesian statistics. John
Wiley & Sons, 2016.
[10] Steve Brooks et al. Handbook of markov chain monte carlo. CRC press, 2011.
[11] Ben Calderhead. “Differential geometric MCMC methods and applications”. PhD
thesis. University of Glasgow, 2011.
[12] Vicente G Cancho, Víctor H Lachos, and Edwin MM Ortega. “A nonlinear regression model with skew-normal errors”. In: Statistical papers 51.3 (2010), pp. 547–558.
[13] Siddhartha Chib and Edward Greenberg. “Understanding the metropolis-hastings
algorithm”. In: The american statistician 49.4 (1995), pp. 327–335.
[14] Martha Lucía Corrales and Edilberto Cepeda-Cuervo. “Bayesian modeling of location, scale, and shape parameters in skew-normal regression models”. In: Statistical Analysis and Data Mining: The ASA Data Science Journal ().

[15] Edilberto Cepeda. “Modelado de variabilidad en modelos lineales generalizados” [Modelling variability in generalized linear models]. PhD thesis.
[16] Daniele Durante. “Conjugate Bayes for probit regression via unified skew-normal
distributions”. In: Biometrika 106.4 (2019), pp. 765–779.
[17] Richard H Gerlach, Cathy WS Chen, and Nancy YC Chan. “Bayesian time-varying
quantile forecasting for value-at-risk in financial markets”. In: Journal of Business
& Economic Statistics 29.4 (2011), pp. 481–492.
[18] Walter R Gilks, Sylvia Richardson, and David Spiegelhalter. Markov chain Monte
Carlo in practice. CRC press, 1995.
[19] G González-Farías, JA Domínguez-Molina, and Arjun K Gupta. “The closed skew-normal distribution”. In: Skew-elliptical distributions and their applications: a journey beyond normality (2004), pp. 25–42.
[20] Guangliang He and Robert Litterman. “The intuition behind Black-Litterman
model portfolios”. In: Available at SSRN 334304 (2002).
[21] Thomas Idzorek. “A step-by-step guide to the Black-Litterman model: Incorporat-
ing user-specified confidence levels”. In: Forecasting expected returns in the financial
markets. Elsevier, 2007, pp. 17–38.
[22] Harry M Markowitz. “Foundations of portfolio theory”. In: The journal of finance
46.2 (1991), pp. 469–477.
[23] Attilio Meucci. “Beyond Black-Litterman: Views on non-normal markets”. In: Avail-
able at SSRN 848407 (2005).
[24] Ning Miao et al. “Cgmh: Constrained sentence generation by metropolis-hastings
sampling”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33.
01. 2019, pp. 6834–6842.
[25] Gareth O Roberts and Osnat Stramer. “On inference for partially observed non-
linear diffusion models using the Metropolis–Hastings algorithm”. In: Biometrika
88.3 (2001), pp. 603–621.
[26] Stephen Satchell and Alan Scowcroft. “A demystification of the Black–Litterman
model: Managing quantitative and traditional portfolio construction”. In: Journal
of Asset Management 1.2 (2000), pp. 138–150.
[27] Benedikt Schamberger, Lutz F Gruber, and Claudia Czado. “Bayesian inference for
latent factor copulas and application to financial risk forecasting”. In: Econometrics
5.2 (2017), p. 21.
[28] Mark J Schervish and Morris H DeGroot. Probability and statistics. Pearson Edu-
cation, 2014.
[29] William F Sharpe. “Mutual fund performance”. In: The Journal of business 39.1
(1966), pp. 119–138.

[30] Panayiotis Theodossiou and Christos S Savva. “Skewness and the relation between
risk and return”. In: Management Science 62.6 (2016), pp. 1598–1609.
[31] CFA Walters et al. “The Black-Litterman model in detail”. In: Available at SSRN
1314585 (2014).
[32] Feng-Chang Xie, Bo-Cheng Wei, and Jin-Guan Lin. “Homogeneity diagnostics for
skew-normal nonlinear regression models”. In: Statistics & Probability Letters 79.6
(2009), pp. 821–827.
