
Extreme Value Theory and financial risk management


Mandira Sarma

March 21, 2002


An earlier draft of this paper was presented at the Fifth Capital Markets Conference at UTI Institute
of Capital Markets, Mumbai, India. I thank the participants of the conference for helpful comments and
suggestions. All errors are mine.

Address: Indira Gandhi Institute of Development Research, Goregaon (East), Mumbai 400 065. Phone:
+91-22-840-0919. Fax: +91-22-840-2752. Email: mandira@igidr.ac.in

Abstract

Financial risk management is about understanding the large movements in the market
values of asset portfolios. The conventional approach to estimating market risk measures
assumes a Gaussian distribution for the innovations of the return series. This
approach may lead to faulty risk measures if the innovation distribution is non-Gaussian,
which is often the case for financial series. This paper uses extreme value theory to
explicitly model the tail regions of the innovation distribution of the return series of a
prominent Indian equity index, the S & P CNX Nifty. We find that the lower tail of the
Nifty innovations behaves very much like the lower tail of the standard Gaussian curve,
while the upper tail has significant “tail thickness”. This inherent asymmetry and the
existence of tail thickness can provide valuable information to the risk manager. The
EVT-based tail quantiles have been used to make daily Value-at-Risk forecasts at the
95% and 99% levels for a long and a short position in the
Nifty portfolio. These forecasts are found to provide statistically sound risk measures
for the portfolio under consideration.

KEY WORDS Extreme value theory; Value-at-Risk; pseudo maximum likelihood estima-

tion; correct conditional coverage

JEL Classification: C10, C13, C22, G10

1 Introduction

Value-at-Risk (VaR) is widely used as a tool for measuring the market risk of asset portfolios.
It quantifies in monetary terms the exposure of a portfolio to market fluctuations. It is
defined as the maximum monetary loss of a portfolio such that the likelihood of experiencing
a loss exceeding that amount, due to its exposure to market movements, over a specified
risk horizon is equal to a pre-specified tolerance level.
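In symbols, writing L for the loss on the portfolio over the risk horizon and α for the pre-specified tolerance level, this definition can be formalised (with α = 1 − p in the notation of Section 5.4) as

VaRα = inf{ l ∈ R : Pr(L > l) ≤ α }

i.e. VaRα is the smallest loss level whose exceedance probability does not exceed α.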

Extreme value theory (EVT) deals with the study of the asymptotic behaviour of extreme
(maxima and minima) observations of a random variable. Financial risk management is all
about understanding the large movements in the values of asset portfolios. It essentially
deals with the analysis of the tail regions of the distribution of changes in the market value of
the portfolio. Extreme value theory, by dealing only with extreme observations, can provide
a better treatment of the estimation of tail quantiles like VaR. In conventional techniques
of measuring risk, inferences about the tail region are made after estimating the entire return
distribution. In such an approach, the observations in the interior of the distribution dominate
the estimation process; since extreme observations constitute only a small part of the data,
their contribution to the estimation is smaller than that of the observations in the central
part of the distribution. Therefore, in such an approach the tail regions are not accurately
estimated.

Extreme value theory, on the other hand, focuses primarily on analysing the extreme
observations rather than the observations in the central region of the distribution. The theory
provides robust tools for estimating only the tails from the available data. Tail
quantiles like VaR can therefore be estimated more accurately using EVT than with the
conventional approaches.

Another appealing aspect of EVT is that it does not require an a priori assumption
about the return distribution. The fundamental result of extreme value theory, known as the
“extremal types theorem”, identifies the possible classes of distributions for the
extreme returns irrespective of the actual underlying return distribution. This extremely
powerful result makes the VaR estimation process free from any
a priori assumption about the portfolio return distribution. Moreover, EVT-based methods
inherently incorporate separate estimation of the upper and the lower tails, and thereby
emphasise the necessity of treating the two tails separately due to the possible existence of
asymmetry in the return series. This becomes important when estimating VaR measures for
long and short positions. Conventional models of VaR estimation treat both tails
symmetrically, and hence the VaR measures for the long and short positions are assumed to
be equal in magnitude.

This paper uses the recent developments of extreme value theory to analyse the tails of
the innovation distribution of the Nifty returns. Using the “Peaks-Over-Threshold” (POT)
model (McNeil and Frey, 1999) we estimate the tail regions of the innovation distribution of
the Nifty returns. We find that the lower tail of the Nifty innovations behaves very much like
a Gaussian tail, whereas the upper tail behaves significantly differently from a Gaussian
tail. The upper tail is found to exhibit significant “tail thickness”, which indicates the
existence of asymmetry in the innovation distribution. The existence of asymmetry and tail
thickness in the innovation distribution provides valuable information when estimating risk
measures based on tail quantiles.

The rest of this paper is organised as follows. Sections 2 and 3 present an overview of extreme
value theory and its application in financial risk management. Section 4 describes the “Peaks-Over-Threshold (POT)” model used in this paper. Section 5 provides an empirical analysis
of the tails of the Nifty innovations. In Section 5.4, daily 99% and 95% VaR forecasts for a
short and a long Nifty position are estimated using the estimated tail quantiles.
These measures are tested for the existence of “correct conditional coverage” in Section 6.1.
Section 7 concludes the paper.

2 An overview of Extreme Value Theory

The classical Extreme Value theory (EVT) deals with the study of the asymptotic behaviour

of extreme observations (maxima or minima of n random realisations).

Suppose that X ∈ (l, u) is a random variable with density f and cdf F. Let X1, X2, …, Xn be
n independent realisations of the random variable X. Define the extreme observations as

Yn = max{X1, X2, …, Xn}

Zn = min{X1, X2, …, Xn}

The extreme value theory deals with the distributional properties of Yn and Zn as n becomes

large.

It can easily be shown that the exact distributions of the extremes are degenerate
in the limit. In order to obtain a non-degenerate distribution of interest, the extrema
Yn and Zn are standardised with a location parameter bn ∈ R and a scale parameter an (> 0),
such that the distributions of the standardised extrema

(Yn − bn)/an  and  (Zn − bn)/an

are non-degenerate.

The two extremes, the maximum and the minimum, are related by the following identity:

min{X1, X2, …, Xn} = −max{−X1, −X2, …, −Xn}

Therefore, every result for the distribution of maxima leads to an analogous result for the
distribution of minima and vice versa. We will discuss the results for maxima only and omit
those for the minima1.


1
A brief description about the minima can be found in Leadbetter et al. (1983)

2.1 The Fisher-Tippett Theorem

The Fisher-Tippett theorem (1928) is a fundamental result in EVT. The importance of this
result is that it exhibits the possible limiting forms for the distribution of Yn under linear
transformations, even without exact knowledge of the underlying distribution F. The
“Fisher-Tippett theorem”, also known as the “extremal types theorem”, states:

If ∃ constants an (> 0) and bn ∈ R such that

(Yn − bn)/an →d H as n → ∞

for some non-degenerate distribution H, then H must be one of only three possible ‘extreme
value distributions’.

In that case, X (and the underlying distribution F ) is said to belong to the (maximum)

domain of attraction of the extreme value distribution H. It is denoted by X ∈ DA(H).

More specifically, this basic result states that if there exist suitable normalising constants
an (> 0) and bn such that the transformed maxima (Yn − bn)/an have a non-degenerate
limiting distribution function H(x), then H must have one of only three possible “forms”.
The limit laws for maxima were derived by Fisher and Tippett (1928). A first rigorous proof
is due to Gnedenko (1943). De Haan (1970) subsequently provided a simpler proof, and
Weissman (1977) gave a simpler version of de Haan’s proof.

The three possible probability laws for suitably normalised extrema are the Gumbel (or Type
I) distribution, the Fréchet (or Type II) distribution and the Weibull (or Type III)
distribution2. The Gumbel distribution is the limit law for thin-tailed distributions such as
the normal or log-normal distributions. The Fréchet distribution is obtained as the limiting
distribution for fat-tailed distributions such as the Student’s t or the stable Paretian
distributions. The marginal distribution of a stationary GARCH process is also in the domain
of attraction of the Fréchet family. Finally, the Weibull distribution is obtained when the
distribution of returns has a finite right endpoint and, in this sense, no tail.


2
Details about these distributions can be found in Leadbetter et al. (1983) and Embrechts et al. (1997)

Von Mises (1976) gives necessary and sufficient conditions for a distribution F to belong to
the domain of attraction of a particular extreme value distribution. Using these conditions,
it can be established that suitable normalising constants exist for certain well-known
distributions to belong to the domain of attraction of a unique extreme value distribution3.
For example,

• Normal, exponential, lognormal (and other monotone transformations of the normal
distribution) ∈ DA(Gumbel)

• Pareto, Cauchy, Student’s t and other fat-tailed distributions ∈ DA(Fréchet)

• Uniform, beta ∈ DA(Weibull)

• Poisson, geometric ∉ any domain of attraction

2.2 The Generalized Extreme Value Distribution

The three families of extreme value distributions, viz. the Gumbel, the Fréchet and the

Weibull, can be nested into a single parametric representation, as shown by Jenkinson and Von

Mises. This representation is known as the “Generalised Extreme Value” (GEV) distribution,

and given by

Hξ(x) = exp{ −(1 + ξx)^(−1/ξ) }     (1)

where 1 + ξx > 0, with the ξ = 0 case interpreted as the limit Hξ(x) = exp{−e^(−x)}. The
support of x is

x > −1/ξ  if ξ > 0
x < −1/ξ  if ξ < 0
x ∈ R     if ξ = 0

3
Leadbetter et al. (1983) (Chap 1) and Embrechts et al. (1997) (Chap 3) discuss the Von Mises conditions
and derive norming constants for specific distributions to belong to a particular domain of attraction.

The parameter ξ, called the tail index, governs the tail behaviour of the distribution. Each of
the three extreme value distributions is obtained as a special case of the GEV distribution:
when ξ > 0 we get the Fréchet distribution, when ξ < 0 we get the Weibull distribution, and
ξ = 0 is the case of the Gumbel distribution.

These results imply that essentially all the common, continuous distributions of statistics

belong to the domain of attraction of a single family Hξ , the extreme value distributions

being differentiated only by the value of ξ. This shows the generality of the extremal types

theorem.
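To make the nesting concrete, the following minimal Python sketch (not part of the original estimation) evaluates the GEV cdf of equation (1) for the three cases via scipy; note that scipy.stats.genextreme parameterises the shape as c = −ξ relative to our notation.

```python
# A minimal sketch of the GEV family H_xi of eq. (1) using scipy.
# scipy.stats.genextreme uses the shape convention c = -xi, so
# xi > 0 (Frechet) maps to c < 0 and xi < 0 (Weibull) to c > 0.
import numpy as np
from scipy.stats import genextreme

def gev_cdf(x, xi):
    """H_xi(x) = exp{-(1 + xi*x)^(-1/xi)}; xi = 0 is the Gumbel limit."""
    return genextreme.cdf(x, c=-xi)

x = np.linspace(-1.0, 4.0, 6)
for xi, name in [(0.3, "Frechet (xi > 0)"),
                 (0.0, "Gumbel  (xi = 0)"),
                 (-0.3, "Weibull (xi < 0)")]:
    print(name, np.round(gev_cdf(x, xi), 4))
```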

2.3 The Pickands-Balkema-de Haan Theorem

Suppose that X1, X2, …, Xn are n independent realisations of a random variable X with
distribution function F(x). Let u be the finite or infinite right endpoint of the distribution
F. The distribution function of the excesses over a certain (high) threshold k is given by

Fk(x) = Pr{X − k ≤ x | X > k} = [F(x + k) − F(k)] / [1 − F(k)]

for 0 ≤ x < u − k.

The Pickands-Balkema-de Haan theorem (Balkema & de Haan 1974; Pickands 1975) states

that if the distribution function F ∈ DA(Hξ ) then ∃ a positive measurable function σ(k) such

that

lim (k→u)  sup (0 ≤ x < u−k)  |Fk(x) − Gξ,σ(k)(x)| = 0

and vice versa, where Gξ,σ(k)(x) denotes the Generalised Pareto distribution.

The above theorem states that as the threshold k approaches the right endpoint u, the
distribution of the excesses over the threshold tends to the Generalised Pareto distribution,
provided the underlying distribution F belongs to the domain of attraction of the Generalised
Extreme Value distribution.

2.4 The Generalised Pareto Distribution (GPD)

The GPD is given by

Gξ,σ(x) = 1 − (1 + ξx/σ)^(−1/ξ)  if ξ ≠ 0
Gξ,σ(x) = 1 − exp(−x/σ)          if ξ = 0     (2)

where σ > 0, and the support of x is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −σ/ξ when ξ < 0.
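As a quick reference implementation (a sketch, not the paper's code), equation (2) corresponds directly to scipy.stats.genpareto, whose shape parameter follows the same sign convention as ξ here:

```python
# A minimal sketch of the GPD G_{xi,sigma} of eq. (2) using scipy.
# scipy.stats.genpareto takes the shape c = xi directly:
# cdf(x) = 1 - (1 + c*x/scale)^(-1/c), reducing to 1 - exp(-x/scale) at c = 0.
import numpy as np
from scipy.stats import genpareto

def gpd_cdf(x, xi, sigma):
    return genpareto.cdf(x, c=xi, scale=sigma)

x = np.linspace(0.0, 3.0, 7)
print(np.round(gpd_cdf(x, 0.2, 0.5), 4))   # a fat-tailed case (xi > 0)
print(np.round(gpd_cdf(x, 0.0, 0.5), 4))   # the exponential case (xi = 0)
```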

3 Extreme Value theory in risk management

There are two broad categories of approaches which use the results of extreme value
theory for estimating the market risk of financial assets. The first, known as the
‘Block Maxima Model (BMM)’, utilises the ‘extremal types theorem’ to model the distribution
of extreme (largest or smallest) observations collected from non-overlapping blocks of fixed
size in the data. The ‘generalised extreme value’ distribution is then fitted to these block
extrema. This distribution reflects the behaviour of very high profits (in the case of maxima)
or very high losses (in the case of minima) on the portfolio.

For example, suppose the data consist of the daily returns of a particular portfolio and we
are interested in analysing the lower tail of the portfolio return distribution. In this case, the
BMM approach would involve fitting the GEV distribution Hξ(x) to the minimum
observations collected from non-overlapping blocks over the entire sample. If the block size
is 25 (a month), the 5th percentile of this distribution gives the magnitude of the
daily loss that can be expected with probability 0.05, i.e. the daily loss level that one can
expect to face once in 20 months. Such a value is known as the ‘stress loss’ with probability
0.05.
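The sketch below illustrates this BMM procedure in Python on simulated data; the Student-t returns, block size and seed are illustrative assumptions, not the data analysed later in this paper.

```python
# Sketch of the Block Maxima Model for the lower tail: collect block
# minima, use min{X} = -max{-X} to fit a GEV to the negated minima,
# and read off the 5% 'stress loss'. All inputs are simulated stand-ins.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)          # stand-in daily returns
block = 25                                         # block size of one 'month'
n_blocks = returns.size // block
minima = returns[:n_blocks * block].reshape(n_blocks, block).min(axis=1)

# Fit the GEV to the block maxima of the negated series
c, loc, scale = genextreme.fit(-minima)

# 5th percentile of the minima = -(95th percentile of the negated maxima)
stress_loss = -genextreme.ppf(0.95, c, loc=loc, scale=scale)
print("stress loss with probability 0.05:", round(stress_loss, 3))
```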

Longin (2000) develops a method for VaR estimation using the BMM approach. He derives
a formula for VaR by relating the distribution of the extremes to the distribution
of the underlying returns in terms of the parameters of the ‘generalised extreme value’
distribution. This approach can be used even for stationary non-iid time series by estimating
an additional parameter called the ‘extremal index’4.

The above measures of market risk are essentially unconditional. These measures are constant

over the forecast period, and do not incorporate the changing time series dynamics of the

underlying returns.

The second approach, known as the ‘Peaks-Over-Threshold (POT)’ model, attempts to
estimate the tails of the underlying return distribution, instead of modelling the distribution
of extremes as in the BMM approach.

In the POT model, a certain threshold is identified to define the start of the tail of the return
distribution. Then the distribution of the ‘excesses’ over the threshold point is estimated.
There are two approaches to estimating the ‘excess’ distribution, viz. the semi-parametric
models based on the Hill estimator (Danielsson and de Vries, 1997) and the fully parametric
model based on the Generalised Pareto distribution (GPD) (McNeil and Frey, 1999). The
Hill-estimator-based approach is limited in its application as it requires the assumption of fat
tails for the underlying return distribution. The GPD version, on the other hand, is applicable
to any kind of distribution, fat-tailed or not. This approach utilises the Pickands-Balkema-de
Haan theorem to fit a generalised Pareto distribution to the excesses over a specific threshold.
The following section describes the GPD approach of the POT model in detail.

4 The Peaks-over-Threshold Model: the GPD approach

The POT model provides a framework for estimating the tails (positive or negative) of
the return distribution by estimating what is known as the distribution of excesses over a
certain threshold point which identifies the start of the tail.

The distribution of excesses over a high threshold k on the portfolio’s loss distribution F is
defined by

Φk(y) = Pr{X − k ≤ y | X > k}


4
To know more about this methodology, refer to Longin (2000).

In terms of the underlying loss distribution F,

Φk(y) = [F(y + k) − F(k)] / [1 − F(k)]     (3)

The Pickands-Balkema-de Haan theorem says that

Φk(y) → Gξ,β(k)(y)  as  k → u     (4)

Thus, using the Pickands-Balkema-de Haan theorem, one can model the distribution of the

excesses over the threshold k as a GPD, provided the threshold is sufficiently high.

Setting x = k + y and using (3) and (4), we can rewrite F as

F(x) = (1 − F(k)) Gξ,β(x − k) + F(k)     (5)

for x > k.

Using the historical-simulation (HS) estimate (N − Nk)/N for F(k), where N is the sample
size and Nk the number of observations beyond the threshold k, and the ML estimates of the
GPD parameters, gives rise to the following tail estimator formula:

F̂(x) = 1 − (Nk/N) (1 + ξ̂ (x − k)/β̂)^(−1/ξ̂)     (6)

For a given probability p > F(k), a tail quantile is estimated by inverting the tail estimator
formula (6):

q̂p = k + (β̂/ξ̂) [ ((N/Nk)(1 − p))^(−ξ̂) − 1 ]     (7)
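Formulae (6) and (7) translate directly into code; the sketch below (with my own parameter names, and assuming ξ̂ ≠ 0) shows the round trip between the two:

```python
# Sketch of the tail estimator (6) and the tail quantile formula (7).
# k: threshold; N: sample size; Nk: number of exceedances of k;
# xi_hat, beta_hat: GPD parameters fitted to the excesses (xi_hat != 0).

def tail_cdf(x, k, N, Nk, xi_hat, beta_hat):
    """F_hat(x) of eq. (6), valid for x > k."""
    return 1.0 - (Nk / N) * (1.0 + xi_hat * (x - k) / beta_hat) ** (-1.0 / xi_hat)

def tail_quantile(p, k, N, Nk, xi_hat, beta_hat):
    """q_hat_p of eq. (7), valid for p > F(k)."""
    return k + (beta_hat / xi_hat) * (((N / Nk) * (1.0 - p)) ** (-xi_hat) - 1.0)

# Round-trip check with illustrative values of the order reported in Table 4:
q = tail_quantile(0.99, k=1.6494, N=1250, Nk=66, xi_hat=0.2, beta_hat=0.65)
print(q, tail_cdf(q, k=1.6494, N=1250, Nk=66, xi_hat=0.2, beta_hat=0.65))  # ~0.99
```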

5 A POT analysis of the Nifty tails

In this section, we present a POT analysis of the innovation distribution of the returns of a
prominent Indian equity portfolio, the S & P CNX Nifty.

Section 5.1 describes the data. Section 5.2 describes the estimation procedure in detail.
Section 5.4 provides an example of VaR estimation using the two-stage methodology of
McNeil and Frey (1999) described in Section 5.5.

5.1 Data

The data consist of 2,697 daily logarithmic returns of the Nifty portfolio (from 3 July 1990
to 15 March 2002). The first 1,250 observations (from 3 July 1990 to 7 May 1996) comprise
the estimation window, and the remaining 1,446 observations (from 8 May 1996 to 15 March
2002) are used for making rolling-window “out-of-sample” VaR forecasts.

5.2 Estimation procedure

5.2.1 Time series model for the Nifty return series

As the POT model can be used only for iid data, we need to remove the time series dynamics
from the return series and obtain an (approximately) iid residual series.

First, we ascertain the time series structure of the Nifty return series. A specification
search in terms of the AIC and SBC criteria leads us to choose the AR(1)-GARCH(1,1) model
as the best model5.

Table 1 presents the estimated parameters of the mean and volatility equations of the Nifty
returns. The constant term in the mean equation is found to be insignificant, although the
AR(1) coefficient is significant.


5
Values of the AIC and SBC criteria for various specifications of the time series can be obtained on request.

The parameters in the volatility equation, viz. the constant, the ARCH(1) parameter and
the GARCH(1) parameter, are all found to be significant.

We extract the standard residuals from the estimated model and investigate how the moments
of the residual series change once the time series dynamics of the Nifty returns are removed.
Table 2 presents the values of the first four unconditional moments of the Nifty series and of
the standard residuals obtained from the AR(1)-GARCH(1,1) specification of the Nifty
returns. These descriptive statistics indicate the existence of positive skewness and
leptokurtosis in both the raw returns and the standard residuals.

Table 3 presents the values of the test statistics for testing the significance of skewness,
excess kurtosis and autocorrelation (up to lag 35), along with their respective p-values, for
the original return series and the residual series. The results in Table 3 indicate that the
return series has significant skewness, excess kurtosis and autocorrelation. The residual series
is found to have significant skewness and excess kurtosis, but it does not possess significant
autocorrelation. Thus, neither the return series nor the residual series can be considered
normally distributed, since both series have significant positive skewness and excess kurtosis.

As Table 3 indicates, the residual series is found to be free from autocorrelation, and hence
we can apply the results of EVT to the residual series.
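A hedged sketch of this estimation step using the Python arch package follows; the data-loading line and file name are assumptions, and the paper's own estimation may differ in implementation details.

```python
# Sketch: fit an AR(1)-GARCH(1,1) model by (pseudo-)maximum likelihood and
# extract the standard residuals. 'nifty.csv' with a 'ret' column of daily
# log returns is an assumed input, not data distributed with the paper.
import pandas as pd
from arch import arch_model

returns = pd.read_csv("nifty.csv", index_col=0, parse_dates=True)["ret"]

model = arch_model(returns, mean="AR", lags=1, vol="GARCH", p=1, q=1,
                   dist="normal")
res = model.fit(disp="off")
print(res.summary())

# Standard residuals z_t = (x_t - mu_t) / sigma_t, to which EVT is applied
z = res.std_resid.dropna()
```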

5.2.2 Modelling peaks-over-thresholds

The Pickands-Balkema-de Haan theorem offers the generalised Pareto distribution as a
natural choice for the distribution of excesses (peaks) over sufficiently high thresholds.
However, in choosing an appropriate threshold one faces a trade-off between bias and
variance. Theoretical considerations suggest that the threshold should be as high as possible
for the Pickands-Balkema-de Haan theorem to hold, but in practice too high a threshold might
leave very few observations above it for estimating the GPD parameters6.

The GPD estimators are unbiased only in the limit k → u, i.e. when the threshold is
sufficiently high. However, if the threshold is chosen very high, there may be very few
observations left for estimating the GPD parameters of the tail, leading to statistical
imprecision and very high variance of the estimates.

6
For more on this issue, see McNeil and Frey (1999).

There is no uniquely correct choice of the threshold level. While McNeil and Frey (1999),
McNeil (1996) and McNeil (1999) use the “mean-excess plot” as a tool for choosing the
optimal threshold level7, Gavin (2000) uses an arbitrary threshold at the 90% level (i.e. the
largest 10% of the positive and of the negative returns are treated as the extreme
observations).

In this paper we follow Neftci (2000) and choose the threshold level as 1.65 times the
unconditional standard deviation of the residuals8. Under normality, this cuts off the 5%
most extreme movements in each tail. On both tails, observations lying beyond 1.65 standard
deviations are treated as extremes9.
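In code, the threshold choice and the fit of the GPD to the excesses might be sketched as follows (continuing from the residual series z of the previous sketch; the use of scipy's genpareto with the location fixed at zero is my assumption about a reasonable implementation):

```python
# Sketch: Neftci-style threshold at 1.65 x standard deviation of the
# residuals, then GPD fits to the excesses over the threshold on each tail.
import numpy as np
from scipy.stats import genpareto

z = np.asarray(z)               # standard residuals from the AR-GARCH fit
u = 1.65 * z.std()              # threshold (z has variance close to 1)

upper_exc = z[z > u] - u        # peaks over the upper threshold
lower_exc = -z[z < -u] - u      # lower tail via negation (minima -> maxima)

# Fix loc = 0 so that only the shape xi and the scale beta are estimated
xi_up, _, beta_up = genpareto.fit(upper_exc, floc=0)
xi_lo, _, beta_lo = genpareto.fit(lower_exc, floc=0)
print("upper tail: xi = %.4f, beta = %.4f, Nk = %d" % (xi_up, beta_up, upper_exc.size))
print("lower tail: xi = %.4f, beta = %.4f, Nk = %d" % (xi_lo, beta_lo, lower_exc.size))
```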

Table 4 presents the estimated threshold point, the number of extreme observations beyond
the threshold, and the results of the maximum likelihood estimation of the GPD fitted to the
excesses (peaks) over the chosen threshold, for the lower and the upper tail respectively of
the iid residual series obtained by fitting the AR(1)-GARCH(1,1) model to the Nifty returns.
The first column of this table gives the estimated threshold points corresponding to the
chosen level of threshold. For the lower tail the threshold point is -1.6493 and for the upper
tail it is 1.6494.

The second column of Table 4 gives the number of extreme observations beyond the
thresholds on both tails, the third column gives the estimated cdf at the thresholds, and the
fourth and fifth columns present the ML estimates of the GPD parameters (with standard
errors in parentheses) fitted to the peaks over the thresholds.

The estimates of the GPD parameters can be used in the tail quantile estimation formula (7)
to estimate the tails of the distribution.

Table 5 presents some of the estimated quantiles on the lower tail and the upper tail, along
with the empirical quantiles and the corresponding quantiles of the standard normal curve10.

7
Details on mean-excess plots can be found in McNeil and Frey (1999) and Embrechts et al. (1997).
8
We tried the mean-excess plots but did not obtain a well-behaved linear mean-excess plot.
9
For the analysis of the lower tail, i.e. the minima, we use the negated returns and then apply the results
for maxima.

The first column of this table indicates the probability levels, and the second, third and
fourth columns give the corresponding quantiles. The quantiles in the second column are
estimated from the GPD approximation, the quantiles in the third column are empirically
observed quantiles, and the ones in the fourth column are the corresponding quantiles of the
standard normal curve. Panel A of Table 5 provides the quantiles on the lower tail and
Panel B reports the quantiles on the upper tail.

This table shows that the estimated tail quantiles are closer to the empirical quantiles than
those of a normal approximation. This implies that a normal approximation of the underlying
DGP would lead to misleading risk estimates. Particularly on the upper tail, the normal
quantiles underestimate the empirical distribution, while the EVT-based quantiles estimate
it more precisely. This indicates the presence of the so-called ‘fat-tailed’ behaviour of
financial series. The existence of ‘fat tails’ for this particular data set is also established by
the significant excess kurtosis shown in Table 2. Approximating a fat-tailed distribution by
a normal distribution leads to underestimation of risk.

It is interesting that the discrepancy between the normal quantiles and the empirical
quantiles is smaller for the lower tail than for the upper tail.

Another point worth noting is that the tail quantiles indicate the existence of asymmetry, as

the pth quantile is not equal in absolute value to the (1 − p)th quantile.

These aspects show that the assumption of normality does not reflect the actual riskiness of
the portfolio.
10
These quantiles can be considered as the unconditional VaR measures for the i.i.d. residual series obtained
from three models – the EVT-based GPD approach, the historical simulation and the Normal distribution
model.

5.3 The KS test for discrepancy

To test whether the difference between the estimated and empirical tails is significant, and
whether the discrepancy between the empirical tails and the normal approximation is
statistically significant, we carry out non-parametric Kolmogorov-Smirnov tests.

Suppose that F (x), G(x) and φ(x) denote the empirical, the estimated and the normal dis-

tribution functions.

First we test the following hypotheses:

H0 : F (x) = G(x)

against the alternative hypothesis

H1 : F(x) ≠ G(x)

This hypothesis tests whether the estimated quantiles are significantly different from the
empirical quantiles. The testing is done separately for the lower and the upper tail. We use
a two-sided Kolmogorov-Smirnov test for this hypothesis.

The second hypothesis tests whether the tails of the empirical distribution are significantly
higher than those of the normal distribution. The null hypothesis is

H0′ : F(x) = φ(x)

and the alternative hypothesis is

H1′ : F(x) > φ(x)

We carry out a one-sided Kolmogorov-Smirnov test for this.

The KS statistics for the above two tests are, respectively,

D = sup_x |F(x) − G(x)|

and

D+ = sup_x (F(x) − φ(x))
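A self-contained sketch of these computations is given below. Reading F, G and φ as the tail-probability curves plotted in Figure 1 is my interpretation, and all inputs (simulated residuals, threshold, GPD fit) are illustrative assumptions.

```python
# Sketch of the two KS statistics on the upper tail: D compares the
# empirical tail with the GPD-based tail of eq. (6); D+ compares the
# empirical tail with the normal tail. All inputs are simulated stand-ins.
import numpy as np
from scipy.stats import genpareto, norm

rng = np.random.default_rng(1)
z = rng.standard_t(df=6, size=1250)            # stand-in iid residuals
u = 1.65 * z.std()
N = z.size

x = np.sort(z[z > u])                          # tail evaluation points
Nk = x.size
xi, _, beta = genpareto.fit(x - u, floc=0)     # GPD fit to the excesses

F_emp = (z[:, None] > x).mean(axis=0)                   # empirical P(Z > x)
G_fit = (Nk / N) * genpareto.sf(x - u, xi, scale=beta)  # GPD tail, eq. (6)
phi = norm.sf(x)                                        # normal P(Z > x)

D = np.abs(F_emp - G_fit).max()     # two-sided statistic for H0: F = G
D_plus = (F_emp - phi).max()        # one-sided statistic for H0': F = phi
print(round(D, 4), round(D_plus, 4))
```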

Table 6 provides the estimated KS statistics for these hypotheses for the lower and the upper
tails. The discrepancy between the estimated and empirical tails is found to be insignificant
at the 0.05 level of significance.

Significance of D+ for the upper tail indicates that the empirical quantiles on the upper
tail are significantly higher than the normal quantiles. This establishes the existence of
significant “tail thickness” on the upper tail. However, the normal approximation does not
lead to significant underestimation of the lower tail quantiles. This implies that the lower
tail of the empirical distribution behaves much like the normal tail, while the upper tail
displays ‘fat-tailed’ behaviour.

Figure 1 gives plots of the estimated lower and upper tails of the Nifty innovation
distribution, along with the empirical and the normal tails.

For the lower tail, both the GPD fit and the normal approximation fit the data well: there is
little discrepancy between the empirical tail and either of them. It is also clear that the GPD
fit is good only beyond the threshold -1.6493, where the lower tail is taken to begin; the GPD
does not fit the data inside the threshold, in the middle part of the distribution.

For the upper tail, the tail fatness appears very prominently. In this case, the GPD
approximation estimates the tail very well while the normal approximation underestimates
it. Surprisingly, the GPD fits the distribution to a great extent even inside the threshold.
While the normal approximation fails to capture the tail behaviour of the data on the upper
tail, the extreme value theory model is able to capture both tails precisely.

The tail quantile estimates thus obtained can be translated back into the original return
series, given estimates of the time-dependent mean and volatility. This idea is developed in
McNeil and Frey (1999) in their two-stage VaR estimation approach, as described below.

5.4 Estimating Value-at-Risk

VaR is a measure of extreme risk in terms of the unknown loss distribution F(x) of the
portfolio under consideration. VaR is the pth quantile of the distribution F (where p is very
high and pre-specified), given by

VaRp = F^(−1)(p)

Although EVT primarily deals with iid random variables, the recent work of McNeil and
Frey (1999) develops a procedure for applying this approach to stationary time series
processes for conditional VaR estimation. This approach to VaR estimation is explained in
Section 5.5.

5.5 A two-stage approach to VaR estimation

Let {Xt} be a strictly stationary time series whose dynamics are given by

Xt = µt + σt Zt     (8)

where µt is the mean process and σt the volatility dynamics of Xt, and Zt ~ fZ(z), where
fZ(z) is the density of a strict white noise (iid) innovation process.

The pth quantile of the distribution of Xt at time t can be obtained from that of Zt as

x_p^t = µt + σt zp     (9)

where zp is the pth quantile of the distribution of Zt, which, by assumption, is iid.

McNeil and Frey (1999) propose the following approach to estimating VaR for financial
returns:

1. Fit a time series model to the return series using a pseudo-maximum likelihood (PML)
estimator based on normality for fZ(z). Estimate µt and σt from the fitted model and
extract the residuals Zt.

The use of the PML approach to estimate the parameters of the time series model using
the normal distribution for fZ(z) does not imply an assumption of normality for fZ(z).
Under standard regularity conditions (Gourieroux, 1997; Gourieroux et al., 1984), the
use of the normal distribution yields consistent estimates even if the underlying
distribution is not normal. That is, the consistency of the PML estimator does not
depend on the distribution used to build the likelihood function, provided it belongs
to the quadratic exponential family of distributions (as the normal distribution does).
Moreover, this estimator is asymptotically normal11.

2. If the residual series Zt is found to be strict white noise, EVT can be applied to
model the tail of the white noise fZ(z). The EVT-based VaR formula (7) can be used
to estimate VaR for the Zt series, say VaR_Z^t.

Given the estimates of µt, σt and VaR_Z^t, the VaR for the return series can be estimated
as

VaR_X^t = µ̂t + σ̂t VaR_Z^t     (10)
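A sketch of this two-stage computation is given below, using the arch package for stage 1; the input series, the threshold rule and the single (non-rolling) forecast are simplifying assumptions relative to the rolling scheme of Section 6.

```python
# Sketch of the McNeil-Frey two-stage conditional VaR of eq. (10):
# stage 1 fits AR(1)-GARCH(1,1) by PML; stage 2 applies the EVT tail
# quantile of eq. (7) to the standard residuals. 'returns' is an
# assumed pandas Series of daily returns.
import numpy as np
from arch import arch_model
from scipy.stats import genpareto

def evt_lower_quantile(p, z):
    """p-th (p close to 1) EVT quantile on the lower tail of z, via negation."""
    u = 1.65 * z.std()
    exc = -z[z < -u] - u
    xi, _, beta = genpareto.fit(exc, floc=0)
    N, Nk = z.size, exc.size
    return -(u + (beta / xi) * (((N / Nk) * (1 - p)) ** (-xi) - 1.0))

res = arch_model(returns, mean="AR", lags=1, vol="GARCH", p=1, q=1).fit(disp="off")
z = res.std_resid.dropna().to_numpy()

fc = res.forecast(horizon=1)                  # one-day-ahead mu and sigma^2
mu_next = fc.mean.iloc[-1, 0]
sigma_next = np.sqrt(fc.variance.iloc[-1, 0])

z_p = evt_lower_quantile(0.99, z)             # 1% innovation quantile (long side)
var_next = mu_next + sigma_next * z_p         # eq. (10) for a long position
print("one-day 99% VaR (in return units):", round(var_next, 4))
```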

6 Conditional VaR estimation

The estimated quantiles on the innovation series can be used to make daily VaR forecasts for
the underlying returns. In order to do so, we dynamically generate µ̂t and σ̂t forecasts for
the “out-of-sample” period using a rolling window of size 1,250. These daily forecasts

and the VaR for the residual series are used in formula (10) to make one-day-ahead VaR
forecasts for the underlying data.

11
Pseudo-maximum-likelihood (PML) estimators are obtained by maximising the likelihood function
associated with a family of probability distributions which does not necessarily contain the true pdf of the
underlying random variable. Gourieroux et al. (1984) establish that the PML estimators of the first two
moments of the unknown underlying distribution, based on the linear and quadratic exponential families, are
consistent and asymptotically normally distributed regardless of the exact form of the true unknown
distribution. The normal distribution, being a quadratic exponential family, can provide consistent estimators
of the first two moments.

Figure 2 depicts the forecasted 95% and 99% VaR vis-à-vis the actually observed portfolio
returns over the forecast window. VaR forecasts are estimated for a long and a short position,
each of Rs. 100, in the Nifty portfolio.

The VaR plots on the negative side of the graph are for the long position and the plots on
the positive side are for the short Nifty position.

The graphs indicate that the VaR forecasts are able to capture the volatility dynamics of
the underlying return series.

6.1 Testing for statistical accuracy of the risk measures

A correctly specified VaR model should generate the pre-specified failure rate conditionally
at every point in time. This is known as the property of “conditional coverage” of the VaR
model. The basic feature of a 99% VaR is that it should be exceeded 1% of the time, and that
the probability of the VaR being exceeded at time t + 1 remains 1% even after conditioning
on all information known at time t. This implies that the VaR should be low in times of
low volatility and high in times of high volatility, so that the events where the loss exceeds
the forecasted VaR measure are spread over the entire sample period and do not come in
clusters. A model which fails to capture the volatility dynamics of the underlying return
distribution will exhibit the symptom of clustering of failures, even if (on average) it
produces the correct unconditional coverage.

Consider a sequence of one-period-ahead VaR forecasts {v_{t|t−1}}, t = 1, …, T, estimated at
a confidence level 1 − p. These forecasts can be considered as one-sided interval forecasts
(−∞, v_{t|t−1}] with coverage probability 1 − p. Given the realisations rt of the return series
and the ex-ante VaR forecasts, the following indicator variable may be defined:

It = 1  if rt < vt for a long position, or rt > vt for a short position
It = 0  otherwise

where rt is the observed return and vt the forecasted VaR measure on day t.

The stochastic process {It} is called the “failure process”. The VaR forecasts are said to be
conditionally efficient if they display “correct conditional coverage”, i.e., if E[It | Ωt−1] = p
∀ t, where Ωt−1 denotes the information available at time t − 1. This is equivalent to saying
that the {It} series is iid with mean p.

Christoffersen and Diebold (2000) and Clements and Taylor (2000) suggest that a regression
of the It series on its own lagged values and some other variables of interest, such as
day-dummies or the lagged observed returns, can be used to test for the existence of various
forms of dependence structure that may be present in the {It} series. Under this framework,
conditional efficiency of the It process can be tested through the joint hypothesis

H : Φ = 0, α0 = p     (11)

where

Φ = [α1, …, αS, µ1, …, µS−1]′

in the regression

It = α0 + α1 It−1 + … + αS It−S + µ1 D1,t + … + µS−1 DS−1,t + εt,   t = S + 1, S + 2, …, T     (12)

where the Ds,t are explanatory variables such as day-of-week dummies.

The hypothesis (11) can be tested by using an F-statistic in the usual OLS framework.12

To test for the property of correct conditional coverage, we perform an OLS regression of the
It series on its five lagged values and five day-dummies representing the trading days in a
week.12 Significance of the F-statistic of this OLS leads to rejection of a model; otherwise the
model is not rejected. It should be noted that non-significance of the F-statistic does not
necessarily imply non-significance of the t-statistics corresponding to the individual
regressors in the OLS. We follow Hayashi (2000) and adopt the policy of preferring the
F-statistic over the t-statistic(s) in the case of a conflict. Therefore a model is not rejected if
the F-statistic is insignificant, even though some individual t-statistic(s) may turn out to be
significant.

12
In view of the fact that the It series is binary, a more appropriate approach would be a binary regression
rather than OLS. However, there is a technical problem in implementing a binary regression here: more than
90% of the It values are zero and only a few are unity. This asymmetry in the data results in singular Hessian
matrices in the estimation process, and the maximum likelihood estimation fails as a result. This problem is
more severe in the case of the 99% VaR models. We therefore resort to an OLS regression, which is
asymptotically equivalent to a binary regression.
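A sketch of this test with statsmodels follows; the failure series hits (a 0/1 pandas Series with a business-day DatetimeIndex) is an assumed input, and dropping one day-dummy to avoid collinearity with the constant is my implementation choice.

```python
# Sketch of the conditional-coverage test: OLS of the failure indicator on
# five of its lags and day-of-week dummies, followed by an F-test of the
# joint hypothesis that the constant equals p and all other coefficients
# are zero. 'hits' is an assumed 0/1 pandas Series of VaR violations.
import numpy as np
import pandas as pd
import statsmodels.api as sm

p = 0.05                                   # tolerance level of the 95% VaR
df = pd.DataFrame({"I": hits.astype(float)})
for s in range(1, 6):                      # five lagged values of I_t
    df[f"lag{s}"] = df["I"].shift(s)

dummies = pd.get_dummies(hits.index.dayofweek, prefix="day", drop_first=True)
dummies.index = hits.index
df = df.join(dummies.astype(float)).dropna()

X = sm.add_constant(df.drop(columns="I"))
res = sm.OLS(df["I"], X).fit()

# Joint hypothesis (11): const = p and every other coefficient = 0
R = np.eye(X.shape[1])
q = np.zeros(X.shape[1])
q[0] = p
print(res.f_test((R, q)))
```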

Table 7 presents the results of the test of correct conditional coverage applied to the
EVT-based VaR measures. Panel A of the table deals with the 95% VaR estimates and Panel
B with the 99% VaR estimates. It is found that for both the long and the short Nifty
positions, the VaR measures generate the correct conditional coverage, at both the 95% and
99% levels. The estimated failure probabilities are not significantly different from the
pre-specified levels, and the independence of the failure series is indicated by the
insignificance of the F-statistics in the OLS regression of the failure series on its own past
values and the day-dummies.

7 Conclusion

This paper carries out an analysis of the tail behaviour of the Nifty innovation distribution
using extreme value theory. We find that the essential features of the innovation distribution
are very different from those of the normal distribution. The right tail of the innovation
distribution displays significant ‘tail fatness’, while the left tail behaves much like the normal
distribution. This asymmetry in the innovation distribution necessitates treating the left tail
and the right tail separately when estimating risk measures like VaR.

We see that the extreme value theory based GPD model of tail estimation is able to capture
these features of the innovation distribution. By estimating the upper and lower tails
separately, this approach takes care of the inherent asymmetry present in the data. It also
gives a better fit to both tails of the innovation distribution than the normal distribution.

The quantiles estimated using the GPD model are used to compute 95% and 99%
Value-at-Risk measures for a short and a long position in the Nifty portfolio. The tests of
“correct conditional coverage” confirm that the VaR measures possess correct conditional
coverage, a condition for the statistical precision of the VaR measures.

References

Christoffersen PF, Diebold FX, 2000. How Relevant is Volatility Forecasting for Financial

Risk Management. Review of Economics and Statistics 82:1–11.

Clements MP, Taylor N, 2000. Evaluating interval forecasts of high-frequency financial data.

Manuscript, University of Warwick.

Danielsson J, de Vries CG, 1997. Value-at-Risk and extreme returns. Manuscript, London

School of Economics.

Embrechts P, Klüppelberg C, Mikosch T, 1997. Modelling Extremal Events for Insurance and
Finance. Springer-Verlag, Berlin Heidelberg.

Gavin J, 2000. Extreme value theory: an empirical analysis of equity risk. UBS Warburg
working paper.

Gourieroux C, 1997. ARCH Models and Financial Applications. Springer.

Gourieroux C, Monfort A, Trognon A, 1984. Pseudo Maximum Likelihood Methods: Theory.

Econometrica 52:681–700.

Hayashi F, 2000. Econometrics. Princeton Univ Press.

Leadbetter MR, Lindgren G, Rootzen H, 1983. Extremes and Related Properties of Random
Sequences and Processes. Springer-Verlag New York Inc.

Longin F, 2000. From Value at Risk to Stress Testing: The Extreme Value Approach.
Journal of Banking and Finance 24(7):1097–1130.

McNeil AJ, 1996. Estimating the tails of loss severity distributions using extreme value theory.

Manuscript, Department Mathematik, ETH Zentrum, Zurich.

McNeil AJ, 1999. Extreme value theory for risk managers. Manuscript, Department Mathe-

matik, ETH Zentrum, Zurich.

McNeil AJ, Frey R, 1999. Estimation of tail-related risk measures for heteroscedastic financial

time series: an extreme value approach. Manuscript, Federal Institute of Technology.

Neftci SN, 2000. Value at Risk Calculations, Extreme Events, and Tail Estimation. The

Journal of Derivatives Spring:1–15.

Table 1 Estimation of the AR(1)-GARCH(1,1) model

Parameter    Estimate    SE       Confidence bounds

The mean equation:
Constant     -0.036      0.040    (-0.114, 0.043)
AR(1)         0.225      0.029    (0.167, 0.282)

The variance equation:
Constant      0.039      0.015    (0.010, 0.067)
ARCH(1)       0.101      0.016    (0.069, 0.132)
GARCH(1)      0.893      0.015    (0.863, 0.923)

Table 2 Descriptive statistics: Returns and Standard residuals

This table presents the values of the first four unconditional moments of the raw Nifty series and the standard
residuals extracted from an AR(1)-GARCH(1,1) specification of the Nifty returns.

Mean Variance Skewness Kurtosis


Raw returns 0.1094 4.1675 0.0758 8.6126
Standard residuals 0.0268 1.0033 0.2223 4.3623

Table 3 Tests for skewness, kurtosis and auto correlation

This table presents the values of the test statistics and the corresponding p-values of the tests of skewness,
kurtosis and autocorrelation for the raw data (Panel A) and the standard residuals (Panel B).

                  statistic    p-value

Panel A: The return series (rt)
skewness           15.7839∗    0.000
kurtosis          292.3220∗    0.000
H.C. Ljung-Box     57.2202∗    0.01

Panel B: The residual series (zt)
skewness           46.2049∗    0.000
kurtosis           70.78355∗   0.000
H.C. Ljung-Box     47.7039     0.074

∗ indicates significance at the 0.05 level.

Figure 1 The tails of the innovation distribution

This figure provides the plots of the estimated tails of the Nifty innovation distribution. The graphs provide
the empirical, the gpd approximation and the normal approximation to the lower and the upper tails. The
threshold levels from where the tail starts are shown by a vertical line. For example, for the lower tail, the
threshold level is -1.6493 and for the upper tail it is 1.6494.

[Two panels: “The lower tail of the innovation distribution” and “The upper tail of the innovation distribution”. Each panel plots probability (y-axis) against x, showing the tail fitted using the GPD, the empirical tail and the Gaussian tail.]

Figure 2 95% and 99% VaR measures estimated with the POT model

This graph shows the dynamic VaR forecasts for a long and a short position of Rs. 100 in the Nifty portfolio
by using the pot approach based on extreme value theory. The graph depicts two sets of VaR forecasts, one
for the long position and the other for the short position. The graph on the negative side of the observed
returns is for the long position and the one on the positive side is for the short position in Nifty.

[Plot: returns (%) and the 95% and 99% VaR forecasts against forecast dates (08/05/96 to 01/03/01), with observed returns shown alongside the VaR bands for the long (negative side) and short (positive side) positions.]


Table 4 Results of the GPD estimation

This table provides the results of the estimated GPD parameters fitted to the excesses over the chosen threshold.
The first column gives the threshold points on both the left and the right tails corresponding to the 1.65σ
level of threshold. The second column presents the number of observations beyond the threshold level and the
third column gives the estimated cdf of the tails at the respective threshold points. The fourth and the fifth
columns present the Pseudo-maximum-likelihood estimation of the GPD parameters fitted to the excesses over
the thresholds, along with the standard errors of estimation within parenthesis.

u            Nu    F(u)      ξ̂                   σ̂

Left tail    -1.6493   50   0.0401    0.2027 (0.2095)    0.4099 (0.1030)
Right tail    1.6494   66   0.9471   -0.0064 (0.2736)    0.6460 (0.1950)

Figures in parentheses indicate standard errors.

Table 5 Estimated quantiles on the i.i.d. residuals

This table provides some of the estimated quantiles on the tails along with the empirical quantiles as well as
the corresponding quantiles on the standard normal distribution. Panel A deals with the lower tail and Panel
B deals with the upper tail. The column 1 gives the probability level. Columns 2, 3 and 4 give the estimated,
empirical and standard normal distribution quantiles corresponding to these probability levels.

p EVT Empirical Normal

Panel A: Quantiles on the lower tail


0.06 -1.4907 -1.4164 -1.5548
0.05 -1.5608 -1.5092 -1.6449
0.04 -1.6503 -1.6358 -1.7507
0.03 -1.7717 -1.7618 -1.8808
0.02 -1.9555 -1.8128 -2.0537
0.01 -2.3067 -2.3615 -2.3263
0.009 -2.3646 -2.3704 -2.3656
0.008 -2.4307 -2.4348 -2.4082
0.007 -2.5076 -2.4637 -2.4573
0.006 -2.5991 -2.7965 -2.5121
0.005 -2.7109 -2.9345 -2.5758
0.004 -2.8526 -3.0110 -2.6521
0.003 -3.0474 -3.0858 -2.7478
0.002 -3.3404 -3.2280 -2.8782
0.001 -3.8005 -3.4330 -3.0902
Panel B: Quantiles on the upper tail
0.94 1.5684 1.5550 1.5548
0.95 1.6862 1.6728 1.6449
0.96 1.8302 1.8310 1.7507
0.97 2.0155 2.0034 1.8808
0.98 2.2761 2.1447 2.0537
0.99 2.7202 2.8785 2.3263
0.991 2.7875 2.8839 2.3656
0.992 2.8627 2.9422 2.4082
0.993 2.9479 3.2272 2.4573
0.994 3.0461 3.2469 2.5121
0.995 3.1622 3.2476 2.5758
0.996 3.3041 3.5057 2.6521
0.997 3.4867 3.5153 2.7478
0.998 3.7436 3.6180 2.8782
0.999 3.8535 3.6394 3.0902

Table 6 Results of the Kolmogorov-Smirnov tests

This table provides the values of the Kolmogorov-Smirnov test statistics for testing H0: F(x) = G(x)
against H1: F(x) ≠ G(x) (statistic D) and H0′: F(x) = φ(x) against H1′: F(x) > φ(x) (statistic D+).

Upper tail Lower tail

D 0.2794 0.1574
D+ 0.8536∗ 0.3589
Critical value of D at 0.05 level of significance = 0.467
Critical value of D+ at 0.05 level of significance = 0.400

Table 7 Results of the test of “conditional coverage”

This table presents the results of the test of autoregressive and periodic dependence in the failure series
generated by the EVT-based VaR forecasts for a long and a short position in the Nifty portfolio, estimated by
the GPD approach based on extreme value theory. The first column gives the position, the second column
gives the estimated failure probability with the corresponding p-value in parentheses, and the third column
reports the estimated F-statistic for the hypothesis of no dependence in the failure series, with the
corresponding p-value (in parentheses). Panel A of this table deals with 95% VaR estimation and Panel B
deals with 99% VaR estimation.

p̂ (p-value) F-stat (p-value)

Panel A: 95% VaR estimation


Short Nifty 0.0481 (0.5824) 1.3190 (0.2148)
Long Nifty 0.0495 (0.5198) 1.7404 (0.0673)

Panel B: 99% VaR estimation


Short Nifty 0.0056 (0.8870) 0.7807 (0.6476)
Long Nifty 0.0142 (0.1959) 0.7438 (0.6834)
Figures in parentheses indicate p-values

