
Extreme Value Theory and financial risk management


Mandira Sarma

March 21, 2002


An earlier draft of this paper was presented at the Fifth Capital Markets Conference at UTI Institute
of Capital Markets, Mumbai, India. I thank the participants of the conference for helpful comments and
suggestions. All errors are mine.

Address: Indira Gandhi Institute of Development Research, Goregaon (East), Mumbai 400 065. Phone:
+91-22-840-0919. Fax: +91-22-840-2752. Email: mandira@igidr.ac.in

Abstract

Financial risk management is about understanding the large movements in the market
values of asset portfolios. The conventional approach to estimating market risk measures
assumes a Gaussian distribution for the innovations of the return series. This
approach may lead to faulty risk measures if the innovation distribution is non-Gaussian,
which is often the case for financial series. This paper uses extreme value theory to
explicitly model the tail regions of the innovation distribution of the return series of a
prominent Indian equity index, the S & P CNX Nifty. We find that the lower tail of the
Nifty innovations behaves very much like the lower tail of the standard Gaussian curve,
while the upper tail has significant “tail thickness”. This inherent asymmetry and the
existence of tail thickness can provide valuable information to the risk manager. The
EVT-based tail quantiles have been used to make daily Value-at-Risk forecasts at the
95% and 99% levels for a long and a short position in the
Nifty portfolio. These forecasts are found to provide statistically sound risk measures
for the portfolio under consideration.

KEY WORDS Extreme value theory; Value-at-Risk; pseudo maximum likelihood estima-

tion; correct conditional coverage

JEL Classification: C10, C13, C22, G10

1 Introduction

Value-at-Risk (VaR) is widely used as a tool for measuring the market risk of asset portfolios.
It quantifies in monetary terms the exposure of a portfolio to market fluctuations. It is
defined as the maximum monetary loss of a portfolio such that the likelihood of experiencing
a loss exceeding that amount, due to its exposure to market movements, over a specified
risk horizon is equal to a pre-specified tolerance level.
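In symbols, writing L for the loss on the portfolio over the risk horizon and α for the pre-specified tolerance level, this definition can be formalised (with α = 1 − p in the notation of Section 5.4) as

VaRα = inf{ l ∈ R : Pr(L > l) ≤ α }

i.e. VaRα is the smallest loss level whose exceedance probability does not exceed α.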

Extreme value theory (EVT) deals with the study of the asymptotic behaviour of extreme
(maxima and minima) observations of a random variable. Financial risk management is all
about understanding the large movements in the values of asset portfolios. It essentially
deals with the analysis of the tail regions of the distribution of changes in the market value of
the portfolio. Extreme value theory, by dealing only with extreme observations, can provide
a better treatment of the estimation of tail quantiles like VaR. In conventional techniques
of measuring risk, inferences about the tail region are made after estimating the entire return
distribution. In such an approach, the observations in the interior of the distribution dominate
the estimation process; since extreme observations constitute only a small part of the data,
their contribution to the estimation is smaller than that of the observations in the central
part of the distribution. Therefore, in such an approach the tail regions are not accurately
estimated.

Extreme value theory, on the other hand, focuses primarily on analysing the extreme
observations rather than the observations in the central region of the distribution. The theory
provides robust tools for estimating only the tails from the available data. Tail
quantiles like VaR can therefore be estimated more accurately using EVT than with the
conventional approaches.

Another appealing aspect of EVT is that it does not require an a priori assumption
about the return distribution. The fundamental result of extreme value theory, known as the
“extremal types theorem”, identifies the possible classes of distributions for the
extreme returns irrespective of the actual underlying return distribution. This extremely
powerful result makes the VaR estimation process free from any
a priori assumption about the portfolio return distribution. Moreover, EVT-based methods
inherently incorporate separate estimation of the upper and the lower tails, and thereby
emphasise the necessity of treating the two tails separately due to the possible existence of
asymmetry in the return series. This becomes important when estimating VaR measures for
long and short positions. Conventional models of VaR estimation treat both tails
symmetrically, and hence the VaR measures for the long and short positions are assumed to
be equal in magnitude.

This paper uses the recent developments of extreme value theory to analyse the tails of
the innovation distribution of the Nifty returns. Using the “Peaks-Over-Threshold” (POT)
model (McNeil and Frey, 1999) we estimate the tail regions of the innovation distribution of
the Nifty returns. We find that the lower tail of the Nifty innovations behaves very much like
a Gaussian tail, whereas the upper tail behaves significantly differently from a Gaussian
tail. The upper tail is found to exhibit significant “tail thickness”, which indicates the
existence of asymmetry in the innovation distribution. The existence of asymmetry and tail
thickness in the innovation distribution provides valuable information when estimating risk
measures based on tail quantiles.

The rest of this paper is organised as follows. Sections 2 and 3 present an overview of extreme
value theory and its application in financial risk management. Section 4 describes the “Peaks-Over-Threshold (POT)” model used in this paper. Section 5 provides an empirical analysis
of the tails of the Nifty innovations. In Section 5.4, daily 99% and 95% VaR forecasts for a
short and a long Nifty position are estimated using the estimated tail quantiles.
These measures are tested for the existence of “correct conditional coverage” in Section 6.1.
Section 7 concludes the paper.

2 An overview of Extreme Value Theory

The classical Extreme Value theory (EVT) deals with the study of the asymptotic behaviour

of extreme observations (maxima or minima of n random realisations).

Suppose that X ∈ (l, u) is a random variable with density f and cdf F. Let X1, X2, …, Xn be
n independent realisations of the random variable X. Define the extreme observations as

Yn = max{X1, X2, …, Xn}

Zn = min{X1, X2, …, Xn}

The extreme value theory deals with the distributional properties of Yn and Zn as n becomes

large.

It can easily be shown that the exact distributions of the extremes are degenerate
in the limit. In order to obtain a non-degenerate distribution of interest, the extrema
Yn and Zn are standardised with a location parameter bn ∈ R and a scale parameter an (> 0),
such that the distributions of the standardised extrema

(Yn − bn)/an  and  (Zn − bn)/an

are non-degenerate.

The two extremes, the maximum and the minimum, are related by the following identity:

min{X1, X2, …, Xn} = −max{−X1, −X2, …, −Xn}

Therefore, every result for the distribution of maxima leads to an analogous result for the
distribution of minima and vice versa. We will discuss the results for maxima only and omit
those for the minima1.


1
A brief description about the minima can be found in Leadbetter et al. (1983)

2.1 The Fisher-Tippett Theorem

The Fisher-Tippett theorem (1928) is a fundamental result in EVT. The importance of this
result is that it exhibits the possible limiting forms for the distribution of Yn under linear
transformations, even without exact knowledge of the underlying distribution F. The
“Fisher-Tippett theorem”, also known as the “extremal types theorem”, states:

If ∃ constants an (> 0) and bn ∈ R such that

(Yn − bn)/an →d H as n → ∞

for some non-degenerate distribution H, then H must be one of only three possible ‘extreme
value distributions’.

In that case, X (and the underlying distribution F ) is said to belong to the (maximum)

domain of attraction of the extreme value distribution H. It is denoted by X ∈ DA(H).

More specifically, this basic result states that if there exist suitable normalising constants
an (> 0) and bn such that the transformed maxima (Yn − bn)/an have a non-degenerate
limiting distribution function H(x), then H must have one of only three possible “forms”.
The limit laws for maxima were derived by Fisher and Tippett (1928). A first rigorous proof
is due to Gnedenko (1943). De Haan (1970) subsequently provided a simpler proof, and
Weissman (1977) gave a simpler version of de Haan’s proof.

The three possible probability laws for suitably normalised extrema are the Gumbel (or Type
I) distribution, the Fréchet (or Type II) distribution and the Weibull (or Type III)
distribution2. The Gumbel distribution is the limit law for thin-tailed distributions such as
the normal or log-normal distributions. The Fréchet distribution is obtained as the limiting
distribution for fat-tailed distributions such as the Student’s t or the stable Paretian
distributions. The marginal distribution of a stationary GARCH process is also in the domain
of attraction of the Fréchet family. Finally, the Weibull distribution is obtained when the
distribution of returns has a finite right endpoint and, in this sense, no tail.


2
Details about these distributions can be found in Leadbetter et al. (1983) and Embrechts et al. (1997)

Von Mises (1976) gives necessary and sufficient conditions for a distribution F to belong to
the domain of attraction of a particular extreme value distribution. Using these conditions,
it can be established that suitable normalising constants exist for certain well-known
distributions to belong to the domain of attraction of a unique extreme value distribution3.
For example,

• Normal, exponential, lognormal (and other monotone transformations of the normal
distribution) ∈ DA(Gumbel)

• Pareto, Cauchy, Student’s t and other fat-tailed distributions ∈ DA(Fréchet)

• Uniform, beta ∈ DA(Weibull)

• Poisson, geometric ∉ any domain of attraction

2.2 The Generalized Extreme Value Distribution

The three families of extreme value distributions, viz. the Gumbel, the Fréchet and the

Weibull, can be nested into a single parametric representation, as shown by Jenkinson and Von

Mises. This representation is known as the “Generalised Extreme Value” (GEV) distribution,

and given by

Hξ(x) = exp{ −(1 + ξx)^(−1/ξ) }     (1)

where 1 + ξx > 0, with the ξ = 0 case interpreted as the limit Hξ(x) = exp{−e^(−x)}. The
support of x is

x > −1/ξ  if ξ > 0
x < −1/ξ  if ξ < 0
x ∈ R     if ξ = 0

3
Leadbetter et al. (1983) (Chap 1) and Embrechts et al. (1997) (Chap 3) discuss the Von Mises conditions
and derive norming constants for specific distributions to belong to a particular domain of attraction.

The parameter ξ, called the tail index, governs the tail behaviour of the distribution. Each of
the three extreme value distributions is obtained as a special case of the GEV distribution:
when ξ > 0 we get the Fréchet distribution, when ξ < 0 we get the Weibull distribution, and
ξ = 0 is the case of the Gumbel distribution.

These results imply that essentially all the common, continuous distributions of statistics

belong to the domain of attraction of a single family Hξ , the extreme value distributions

being differentiated only by the value of ξ. This shows the generality of the extremal types

theorem.
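To make the nesting concrete, the following minimal Python sketch (not part of the original estimation) evaluates the GEV cdf of equation (1) for the three cases via scipy; note that scipy.stats.genextreme parameterises the shape as c = −ξ relative to our notation.

```python
# A minimal sketch of the GEV family H_xi of eq. (1) using scipy.
# scipy.stats.genextreme uses the shape convention c = -xi, so
# xi > 0 (Frechet) maps to c < 0 and xi < 0 (Weibull) to c > 0.
import numpy as np
from scipy.stats import genextreme

def gev_cdf(x, xi):
    """H_xi(x) = exp{-(1 + xi*x)^(-1/xi)}; xi = 0 is the Gumbel limit."""
    return genextreme.cdf(x, c=-xi)

x = np.linspace(-1.0, 4.0, 6)
for xi, name in [(0.3, "Frechet (xi > 0)"),
                 (0.0, "Gumbel  (xi = 0)"),
                 (-0.3, "Weibull (xi < 0)")]:
    print(name, np.round(gev_cdf(x, xi), 4))
```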

2.3 The Pickands-Balkema-de Haan Theorem

Suppose that X1, X2, …, Xn are n independent realisations of a random variable X with
distribution function F(x). Let u be the finite or infinite right endpoint of the distribution
F. The distribution function of the excesses over a certain (high) threshold k is given by

Fk(x) = Pr{X − k ≤ x | X > k} = [F(x + k) − F(k)] / [1 − F(k)]

for 0 ≤ x < u − k.

The Pickands-Balkema-de Haan theorem (Balkema & de Haan 1974; Pickands 1975) states

that if the distribution function F ∈ DA(Hξ ) then ∃ a positive measurable function σ(k) such

that

lim (k→u)  sup (0 ≤ x < u−k)  |Fk(x) − Gξ,σ(k)(x)| = 0

and vice versa, where Gξ,σ(k)(x) denotes the Generalised Pareto distribution.

The above theorem states that as the threshold k approaches the right endpoint u, the
distribution of the excesses over the threshold tends to the Generalised Pareto distribution,
provided the underlying distribution F belongs to the domain of attraction of the Generalised
Extreme Value distribution.

2.4 The Generalised Pareto Distribution (GPD)

The GPD is given by

Gξ,σ(x) = 1 − (1 + ξx/σ)^(−1/ξ)  if ξ ≠ 0
Gξ,σ(x) = 1 − exp(−x/σ)          if ξ = 0     (2)

where σ > 0, and the support of x is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −σ/ξ when ξ < 0.
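As a quick reference implementation (a sketch, not the paper's code), equation (2) corresponds directly to scipy.stats.genpareto, whose shape parameter follows the same sign convention as ξ here:

```python
# A minimal sketch of the GPD G_{xi,sigma} of eq. (2) using scipy.
# scipy.stats.genpareto takes the shape c = xi directly:
# cdf(x) = 1 - (1 + c*x/scale)^(-1/c), reducing to 1 - exp(-x/scale) at c = 0.
import numpy as np
from scipy.stats import genpareto

def gpd_cdf(x, xi, sigma):
    return genpareto.cdf(x, c=xi, scale=sigma)

x = np.linspace(0.0, 3.0, 7)
print(np.round(gpd_cdf(x, 0.2, 0.5), 4))   # a fat-tailed case (xi > 0)
print(np.round(gpd_cdf(x, 0.0, 0.5), 4))   # the exponential case (xi = 0)
```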

3 Extreme Value theory in risk management

There are two broad categories of approaches which use the results of extreme value
theory for estimating the market risk of financial assets. The first, known as the
‘Block Maxima Model (BMM)’, utilises the ‘extremal types theorem’ to model the distribution
of extreme (largest or smallest) observations collected from non-overlapping blocks of fixed
size in the data. The ‘generalised extreme value’ distribution is then fitted to these block
extrema. This distribution reflects the behaviour of very high profits (in the case of maxima)
or very high losses (in the case of minima) on the portfolio.

For example, suppose the data consist of the daily returns of a particular portfolio and we
are interested in analysing the lower tail of the portfolio return distribution. In this case, the
BMM approach would involve fitting the GEV distribution Hξ(x) to the minimum
observations collected from non-overlapping blocks over the entire sample. If the block size
is 25 (a month), the 5th percentile of this distribution gives the magnitude of the
daily loss that can be expected with probability 0.05, i.e. the daily loss level that one can
expect to face once in 20 months. Such a value is known as the ‘stress loss’ with probability
0.05.
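The sketch below illustrates this BMM procedure in Python on simulated data; the Student-t returns, block size and seed are illustrative assumptions, not the data analysed later in this paper.

```python
# Sketch of the Block Maxima Model for the lower tail: collect block
# minima, use min{X} = -max{-X} to fit a GEV to the negated minima,
# and read off the 5% 'stress loss'. All inputs are simulated stand-ins.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)          # stand-in daily returns
block = 25                                         # block size of one 'month'
n_blocks = returns.size // block
minima = returns[:n_blocks * block].reshape(n_blocks, block).min(axis=1)

# Fit the GEV to the block maxima of the negated series
c, loc, scale = genextreme.fit(-minima)

# 5th percentile of the minima = -(95th percentile of the negated maxima)
stress_loss = -genextreme.ppf(0.95, c, loc=loc, scale=scale)
print("stress loss with probability 0.05:", round(stress_loss, 3))
```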

Longin (2000) develops a method for VaR estimation using the BMM approach. He derives
a formula for VaR by relating the distribution of the extremes to the distribution
of the underlying returns in terms of the parameters of the ‘generalised extreme value’
distribution. This approach can be used even for stationary non-iid time series by estimating
an additional parameter called the ‘extremal index’4.

The above measures of market risk are essentially unconditional. These measures are constant

over the forecast period, and do not incorporate the changing time series dynamics of the

underlying returns.

The second approach, known as the ‘Peaks-Over-Threshold (POT)’ model, attempts to
estimate the tails of the underlying return distribution, instead of modelling the distribution
of extremes as in the BMM approach.

In the POT model, a certain threshold is identified to define the start of the tail of the return
distribution. Then the distribution of the ‘excesses’ over the threshold point is estimated.
There are two approaches to estimating the ‘excess’ distribution, viz. the semi-parametric
models based on the Hill estimator (Danielsson and de Vries, 1997) and the fully parametric
model based on the Generalised Pareto distribution (GPD) (McNeil and Frey, 1999). The
Hill-estimator-based approach is limited in its application as it requires the assumption of fat
tails for the underlying return distribution. The GPD version, on the other hand, is applicable
to any kind of distribution, fat-tailed or not. This approach utilises the Pickands-Balkema-de
Haan theorem to fit a generalised Pareto distribution to the excesses over a specific threshold.
The following section describes the GPD approach of the POT model in detail.

4 The Peaks-over-Threshold Model: the GPD approach

The POT model provides a framework for estimating the tails (positive or negative) of
the return distribution by estimating what is known as the distribution of excesses over a
certain threshold point which identifies the start of the tail.

The distribution of excesses over a high threshold k on the portfolio’s loss distribution F is
defined by

Φk(y) = Pr{X − k ≤ y | X > k}


4
To know more about this methodology, refer to Longin (2000).

In terms of the underlying loss distribution F,

Φk(y) = [F(y + k) − F(k)] / [1 − F(k)]     (3)

The Pickands-Balkema-de Haan theorem says that

Φk(y) → Gξ,β(k)(y)  as  k → u     (4)

Thus, using the Pickands-Balkema-de Haan theorem, one can model the distribution of the

excesses over the threshold k as a GPD, provided the threshold is sufficiently high.

Setting x = k + y and using (3) and (4), we can rewrite F as

F(x) = (1 − F(k)) Gξ,β(x − k) + F(k)     (5)

for x > k.

Using the historical-simulation (HS) estimate (N − Nk)/N for F(k), where N is the sample
size and Nk the number of observations beyond the threshold k, and the ML estimates of the
GPD parameters, gives rise to the following tail estimator formula:

F̂(x) = 1 − (Nk/N) (1 + ξ̂ (x − k)/β̂)^(−1/ξ̂)     (6)

For a given probability p > F(k), a tail quantile is estimated by inverting the tail estimator
formula (6):

q̂p = k + (β̂/ξ̂) [ ((N/Nk)(1 − p))^(−ξ̂) − 1 ]     (7)
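Formulae (6) and (7) translate directly into code; the sketch below (with my own parameter names, and assuming ξ̂ ≠ 0) shows the round trip between the two:

```python
# Sketch of the tail estimator (6) and the tail quantile formula (7).
# k: threshold; N: sample size; Nk: number of exceedances of k;
# xi_hat, beta_hat: GPD parameters fitted to the excesses (xi_hat != 0).

def tail_cdf(x, k, N, Nk, xi_hat, beta_hat):
    """F_hat(x) of eq. (6), valid for x > k."""
    return 1.0 - (Nk / N) * (1.0 + xi_hat * (x - k) / beta_hat) ** (-1.0 / xi_hat)

def tail_quantile(p, k, N, Nk, xi_hat, beta_hat):
    """q_hat_p of eq. (7), valid for p > F(k)."""
    return k + (beta_hat / xi_hat) * (((N / Nk) * (1.0 - p)) ** (-xi_hat) - 1.0)

# Round-trip check with illustrative values of the order reported in Table 4:
q = tail_quantile(0.99, k=1.6494, N=1250, Nk=66, xi_hat=0.2, beta_hat=0.65)
print(q, tail_cdf(q, k=1.6494, N=1250, Nk=66, xi_hat=0.2, beta_hat=0.65))  # ~0.99
```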

5 A POT analysis of the Nifty tails

In this section, we present a POT analysis of the innovation distribution of the returns of a
prominent Indian equity portfolio, the S & P CNX Nifty.

Section 5.1 describes the data. Section 5.2 describes the estimation procedure in detail.
Section 5.4 provides an example of VaR estimation using the two-stage methodology of
McNeil and Frey (1999) described in Section 5.5.

5.1 Data

The data consist of 2,697 daily logarithmic returns of the Nifty portfolio (from 3 July 1990
to 15 March 2002). The first 1,250 observations (from 3 July 1990 to 7 May 1996) comprise
the estimation window, and the remaining 1,446 observations (from 8 May 1996 to 15 March
2002) are used for making rolling-window “out-of-sample” VaR forecasts.

5.2 Estimation procedure

5.2.1 Time series model for the Nifty return series

As the POT model can be used only for iid data, we need to remove the time series dynamics
from the return series and obtain an (approximately) iid residual series.

First, we ascertain the time series structure of the Nifty return series. A specification
search in terms of the AIC and SBC criteria leads us to choose the AR(1)-GARCH(1,1) model
as the best model5.

Table 1 presents the estimated parameters of the mean and volatility equations of the Nifty
returns. The constant term in the mean equation is found to be insignificant, although the
AR(1) coefficient is significant.


5
Values of the AIC and SBC criteria for various specifications of the time series can be obtained on request.

The parameters in the volatility equation, viz. the constant, the ARCH(1) parameter and
the GARCH(1) parameter, are all found to be significant.

We extract the standard residuals from the estimated model and investigate how the moments
of the residual series change once the time series dynamics of the Nifty returns are removed.
Table 2 presents the values of the first four unconditional moments of the Nifty series and of
the standard residuals obtained from the AR(1)-GARCH(1,1) specification of the Nifty
returns. These descriptive statistics indicate the existence of positive skewness and
leptokurtosis in both the raw returns and the standard residuals.

Table 3 presents the values of the test statistics for testing the significance of skewness,
excess kurtosis and autocorrelation (up to lag 35), along with their respective p-values, for
the original return series and the residual series. The results in Table 3 indicate that the
return series has significant skewness, excess kurtosis and autocorrelation. The residual series
is found to have significant skewness and excess kurtosis, but it does not possess significant
autocorrelation. Thus, neither the return series nor the residual series can be considered
normally distributed, since both series have significant positive skewness and excess kurtosis.

As Table 3 indicates, the residual series is found to be free from autocorrelation, and hence
we can apply the results of EVT to the residual series.
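A hedged sketch of this estimation step using the Python arch package follows; the data-loading line and file name are assumptions, and the paper's own estimation may differ in implementation details.

```python
# Sketch: fit an AR(1)-GARCH(1,1) model by (pseudo-)maximum likelihood and
# extract the standard residuals. 'nifty.csv' with a 'ret' column of daily
# log returns is an assumed input, not data distributed with the paper.
import pandas as pd
from arch import arch_model

returns = pd.read_csv("nifty.csv", index_col=0, parse_dates=True)["ret"]

model = arch_model(returns, mean="AR", lags=1, vol="GARCH", p=1, q=1,
                   dist="normal")
res = model.fit(disp="off")
print(res.summary())

# Standard residuals z_t = (x_t - mu_t) / sigma_t, to which EVT is applied
z = res.std_resid.dropna()
```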

5.2.2 Modelling peaks-over-thresholds

The Pickands-Balkema-de Haan theorem offers the generalised Pareto distribution as a
natural choice for the distribution of excesses (peaks) over sufficiently high thresholds.
However, in choosing an appropriate threshold one faces a trade-off between bias and
variance. Theoretical considerations suggest that the threshold should be as high as possible
for the Pickands-Balkema-de Haan theorem to hold, but in practice too high a threshold might
leave very few observations above it for estimating the GPD parameters6.

The GPD estimators are unbiased only in the limit k → u, i.e. when the threshold is
sufficiently high. However, if the threshold is chosen very high, there may be very few
observations left for estimating the GPD parameters of the tail, leading to statistical
imprecision and very high variance of the estimates.

6
For more on this issue, see McNeil and Frey (1999).

There is no uniquely correct choice of the threshold level. While McNeil and Frey (1999),
McNeil (1996) and McNeil (1999) use the “mean-excess plot” as a tool for choosing the
optimal threshold level7, Gavin (2000) uses an arbitrary threshold at the 90% level (i.e. the
largest 10% of the positive and of the negative returns are treated as the extreme
observations).

In this paper we follow Neftci (2000) and choose the threshold level as 1.65 times the
unconditional standard deviation of the residuals8. Under normality, this cuts off the 5%
most extreme movements in each tail. On both tails, observations lying beyond 1.65 standard
deviations are treated as extremes9.
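In code, the threshold choice and the fit of the GPD to the excesses might be sketched as follows (continuing from the residual series z of the previous sketch; the use of scipy's genpareto with the location fixed at zero is my assumption about a reasonable implementation):

```python
# Sketch: Neftci-style threshold at 1.65 x standard deviation of the
# residuals, then GPD fits to the excesses over the threshold on each tail.
import numpy as np
from scipy.stats import genpareto

z = np.asarray(z)               # standard residuals from the AR-GARCH fit
u = 1.65 * z.std()              # threshold (z has variance close to 1)

upper_exc = z[z > u] - u        # peaks over the upper threshold
lower_exc = -z[z < -u] - u      # lower tail via negation (minima -> maxima)

# Fix loc = 0 so that only the shape xi and the scale beta are estimated
xi_up, _, beta_up = genpareto.fit(upper_exc, floc=0)
xi_lo, _, beta_lo = genpareto.fit(lower_exc, floc=0)
print("upper tail: xi = %.4f, beta = %.4f, Nk = %d" % (xi_up, beta_up, upper_exc.size))
print("lower tail: xi = %.4f, beta = %.4f, Nk = %d" % (xi_lo, beta_lo, lower_exc.size))
```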

Table 4 presents the estimated threshold point, the number of extreme observations beyond
the threshold, and the results of the maximum likelihood estimation of the GPD fitted to the
excesses (peaks) over the chosen threshold, for the lower and the upper tail respectively of
the iid residual series obtained by fitting the AR(1)-GARCH(1,1) model to the Nifty returns.
The first column of this table gives the estimated threshold points corresponding to the
chosen level of threshold. For the lower tail the threshold point is -1.6493 and for the upper
tail it is 1.6494.

The second column of Table 4 gives the number of extreme observations beyond the
thresholds on both tails, the third column gives the estimated cdf at the thresholds, and the
fourth and fifth columns present the ML estimates of the GPD parameters (with standard
errors in parentheses) fitted to the peaks over the thresholds.

The estimates of the GPD parameters can be used in the tail quantile estimation formula (7)
to estimate the tails of the distribution.

Table 5 presents some of the estimated quantiles on the lower tail and the upper tail, along
with the empirical quantiles and the corresponding quantiles of the standard normal curve10.

7
Details on mean-excess plots can be found in McNeil and Frey (1999) and Embrechts et al. (1997).
8
We tried the mean-excess plots but did not obtain a well-behaved linear mean-excess plot.
9
For the analysis of the lower tail, i.e. the minima, we use the negated returns and then apply the results
for maxima.

The first column of this table indicates the probability levels, and the second, third and
fourth columns give the corresponding quantiles. The quantiles in the second column are
estimated from the GPD approximation, the quantiles in the third column are empirically
observed quantiles, and the ones in the fourth column are the corresponding quantiles of the
standard normal curve. Panel A of Table 5 provides the quantiles on the lower tail and
Panel B reports the quantiles on the upper tail.

This table shows that the estimated tail quantiles are closer to the empirical quantiles than
those of a normal approximation. This implies that a normal approximation of the underlying
DGP would lead to misleading risk estimates. Particularly on the upper tail, the normal
quantiles underestimate the empirical distribution, while the EVT-based quantiles estimate
it more precisely. This indicates the presence of the so-called ‘fat-tailed’ behaviour of
financial series. The existence of ‘fat tails’ for this particular data set is also established by
the significant excess kurtosis shown in Table 2. Approximating a fat-tailed distribution by
a normal distribution leads to underestimation of risk.

It is interesting that the discrepancy between the normal quantiles and the empirical
quantiles is smaller for the lower tail than for the upper tail.

Another point worth noting is that the tail quantiles indicate the existence of asymmetry, as

the pth quantile is not equal in absolute value to the (1 − p)th quantile.

These aspects show that the assumption of normality does not reflect the actual riskiness of
the portfolio.
10
These quantiles can be considered as the unconditional VaR measures for the i.i.d. residual series obtained
from three models – the EVT-based GPD approach, the historical simulation and the Normal distribution
model.

5.3 The KS test for discrepancy

To test whether the difference between the estimated and empirical tails is significant, and
whether the discrepancy between the empirical tails and the normal approximation is
statistically significant, we carry out non-parametric Kolmogorov-Smirnov tests.

Suppose that F (x), G(x) and φ(x) denote the empirical, the estimated and the normal dis-

tribution functions.

First we test the following hypotheses:

H0 : F (x) = G(x)

against the alternative hypothesis

H1 : F(x) ≠ G(x)

This hypothesis tests whether the estimated quantiles are significantly different from the
empirical quantiles. The testing is done separately for the lower and the upper tail. We use
a two-sided Kolmogorov-Smirnov test for this hypothesis.

The second hypothesis tests whether the tails of the empirical distribution are significantly
higher than those of the normal distribution. The null hypothesis is

H0′ : F(x) = φ(x)

and the alternative hypothesis is

H1′ : F(x) > φ(x)

We carry out a one-sided Kolmogorov-Smirnov test for this.

The KS statistics for the above two tests are, respectively,

D = sup_x |F(x) − G(x)|

and

D+ = sup_x (F(x) − φ(x))
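A self-contained sketch of these computations is given below. Reading F, G and φ as the tail-probability curves plotted in Figure 1 is my interpretation, and all inputs (simulated residuals, threshold, GPD fit) are illustrative assumptions.

```python
# Sketch of the two KS statistics on the upper tail: D compares the
# empirical tail with the GPD-based tail of eq. (6); D+ compares the
# empirical tail with the normal tail. All inputs are simulated stand-ins.
import numpy as np
from scipy.stats import genpareto, norm

rng = np.random.default_rng(1)
z = rng.standard_t(df=6, size=1250)            # stand-in iid residuals
u = 1.65 * z.std()
N = z.size

x = np.sort(z[z > u])                          # tail evaluation points
Nk = x.size
xi, _, beta = genpareto.fit(x - u, floc=0)     # GPD fit to the excesses

F_emp = (z[:, None] > x).mean(axis=0)                   # empirical P(Z > x)
G_fit = (Nk / N) * genpareto.sf(x - u, xi, scale=beta)  # GPD tail, eq. (6)
phi = norm.sf(x)                                        # normal P(Z > x)

D = np.abs(F_emp - G_fit).max()     # two-sided statistic for H0: F = G
D_plus = (F_emp - phi).max()        # one-sided statistic for H0': F = phi
print(round(D, 4), round(D_plus, 4))
```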

Table 6 provides the estimated KS statistics for these hypotheses for the lower and the upper
tails. The discrepancy between the estimated and empirical tails is found to be insignificant
at the 0.05 level of significance.

Significance of D+ for the upper tail indicates that the empirical quantiles on the upper
tail are significantly higher than the normal quantiles. This establishes the existence of
significant “tail thickness” on the upper tail. However, the normal approximation does not
lead to significant underestimation of the lower tail quantiles. This implies that the lower
tail of the empirical distribution behaves much like the normal tail, while the upper tail
displays ‘fat-tailed’ behaviour.

Figure 1 gives plots of the estimated lower and upper tails of the Nifty innovation
distribution, along with the empirical and the normal tails.

For the lower tail, both the GPD fit and the normal approximation fit the data well: there is
little discrepancy between the empirical tail and either of them. It is also clear that the GPD
fit is good only beyond the threshold -1.6493, where the lower tail is taken to begin; the GPD
does not fit the data inside the threshold, in the middle part of the distribution.

For the upper tail, the tail fatness appears very prominently. In this case, the GPD
approximation estimates the tail very well while the normal approximation underestimates
it. Surprisingly, the GPD fits the distribution to a great extent even inside the threshold.
While the normal approximation fails to capture the tail behaviour of the data on the upper
tail, the extreme value theory model is able to capture both tails precisely.

The tail quantile estimates thus obtained can be translated back into the original return
series, given estimates of the time-dependent mean and volatility. This idea is developed in
McNeil and Frey (1999) in their two-stage VaR estimation approach, as described below.

5.4 Estimating Value-at-Risk

VaR is a measure of extreme risk in terms of the unknown loss distribution F(x) of the
portfolio under consideration. VaR is the pth quantile of the distribution F (where p is very
high and pre-specified), given by

VaRp = F^(−1)(p)

Although EVT primarily deals with iid random variables, the recent work of McNeil and
Frey (1999) develops a procedure for applying this approach to stationary time series
processes for conditional VaR estimation. This approach to VaR estimation is explained in
Section 5.5.

5.5 A two-stage approach to VaR estimation

Let {Xt} be a strictly stationary time series whose dynamics are given by

Xt = µt + σt Zt     (8)

where µt is the mean process and σt the volatility dynamics of Xt, and Zt ~ fZ(z), where
fZ(z) is the density of a strict white noise (iid) innovation process.

The pth quantile of the distribution of Xt at time t can be obtained from that of Zt as

x_p^t = µt + σt zp     (9)

where zp is the pth quantile of the distribution of Zt, which, by assumption, is iid.

McNeil and Frey (1999) propose the following approach to estimating VaR for financial
returns:

1. Fit a time series model to the return series using a pseudo-maximum likelihood (PML)
estimator based on normality for fZ(z). Estimate µt and σt from the fitted model and
extract the residuals Zt.

The use of the PML approach to estimate the parameters of the time series model using
the normal distribution for fZ(z) does not imply an assumption of normality for fZ(z).
Under standard regularity conditions (Gourieroux, 1997; Gourieroux et al., 1984), the
use of the normal distribution yields consistent estimates even if the underlying
distribution is not normal. That is, the consistency of the PML estimator does not
depend on the distribution used to build the likelihood function, provided it belongs
to the quadratic exponential family of distributions (as the normal distribution does).
Moreover, this estimator is asymptotically normal11.

2. If the residual series Zt is found to be strict white noise, EVT can be applied to
model the tail of the white noise fZ(z). The EVT-based VaR formula (7) can be used
to estimate VaR for the Zt series, say VaR_Z^t.

Given the estimates of µt, σt and VaR_Z^t, the VaR for the return series can be estimated
as

VaR_X^t = µ̂t + σ̂t VaR_Z^t     (10)
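A sketch of this two-stage computation is given below, using the arch package for stage 1; the input series, the threshold rule and the single (non-rolling) forecast are simplifying assumptions relative to the rolling scheme of Section 6.

```python
# Sketch of the McNeil-Frey two-stage conditional VaR of eq. (10):
# stage 1 fits AR(1)-GARCH(1,1) by PML; stage 2 applies the EVT tail
# quantile of eq. (7) to the standard residuals. 'returns' is an
# assumed pandas Series of daily returns.
import numpy as np
from arch import arch_model
from scipy.stats import genpareto

def evt_lower_quantile(p, z):
    """p-th (p close to 1) EVT quantile on the lower tail of z, via negation."""
    u = 1.65 * z.std()
    exc = -z[z < -u] - u
    xi, _, beta = genpareto.fit(exc, floc=0)
    N, Nk = z.size, exc.size
    return -(u + (beta / xi) * (((N / Nk) * (1 - p)) ** (-xi) - 1.0))

res = arch_model(returns, mean="AR", lags=1, vol="GARCH", p=1, q=1).fit(disp="off")
z = res.std_resid.dropna().to_numpy()

fc = res.forecast(horizon=1)                  # one-day-ahead mu and sigma^2
mu_next = fc.mean.iloc[-1, 0]
sigma_next = np.sqrt(fc.variance.iloc[-1, 0])

z_p = evt_lower_quantile(0.99, z)             # 1% innovation quantile (long side)
var_next = mu_next + sigma_next * z_p         # eq. (10) for a long position
print("one-day 99% VaR (in return units):", round(var_next, 4))
```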

6 Conditional VaR estimation

The estimated quantiles on the innovation series can be used to make daily VaR forecasts for
the underlying returns. In order to do so, we dynamically generate µ̂t and σ̂t forecasts for
the “out-of-sample” period using a rolling window of size 1,250. These daily forecasts

and the VaR for the residual series are used in formula (10) to make one-day-ahead VaR
forecasts for the underlying data.

11
Pseudo-maximum-likelihood (PML) estimators are obtained by maximising the likelihood function
associated with a family of probability distributions which does not necessarily contain the true pdf of the
underlying random variable. Gourieroux et al. (1984) establish that the PML estimators of the first two
moments of the unknown underlying distribution, based on the linear and quadratic exponential families, are
consistent and asymptotically normally distributed regardless of the exact form of the true unknown
distribution. The normal distribution, being a quadratic exponential family, can provide consistent estimators
of the first two moments.

Figure 2 depicts the forecasted 95% and 99% VaR vis-à-vis the actually observed portfolio
returns over the forecast window. VaR forecasts are estimated for a long and a short position,
each of Rs. 100, in the Nifty portfolio.

The VaR plots on the negative side of the graph are for the long position and the plots on
the positive side are for the short Nifty position.

The graphs indicate that the VaR forecasts are able to capture the volatility dynamics of
the underlying return series.

6.1 Testing for statistical accuracy of the risk measures

A correctly specified VaR model should generate the pre-specified failure rate conditionally
at every point in time. This is known as the property of “conditional coverage” of the VaR
model. The basic feature of a 99% VaR is that it should be exceeded 1% of the time, and that
the probability of the VaR being exceeded at time t + 1 remains 1% even after conditioning
on all information known at time t. This implies that the VaR should be low in times of
low volatility and high in times of high volatility, so that the events where the loss exceeds
the forecasted VaR measure are spread over the entire sample period and do not come in
clusters. A model which fails to capture the volatility dynamics of the underlying return
distribution will exhibit the symptom of clustering of failures, even if (on average) it
produces the correct unconditional coverage.

Consider a sequence of one-period-ahead VaR forecasts {v_{t|t−1}}, t = 1, …, T, estimated at
a confidence level 1 − p. These forecasts can be considered as one-sided interval forecasts
(−∞, v_{t|t−1}] with coverage probability 1 − p. Given the realisations rt of the return series
and the ex-ante VaR forecasts, the following indicator variable may be defined:

It = 1  if rt < vt for a long position, or rt > vt for a short position
It = 0  otherwise

where rt is the observed return and vt the forecasted VaR measure on day t.

The stochastic process {It} is called the “failure process”. The VaR forecasts are said to be
conditionally efficient if they display “correct conditional coverage”, i.e., if E[It | Ωt−1] = p
∀ t, where Ωt−1 denotes the information available at time t − 1. This is equivalent to saying
that the {It} series is iid with mean p.

Christoffersen and Diebold (2000) and Clements and Taylor (2000) suggest that a regression
of the It series on its own lagged values and some other variables of interest, such as
day-dummies or the lagged observed returns, can be used to test for the existence of various
forms of dependence structure that may be present in the {It} series. Under this framework,
conditional efficiency of the It process can be tested through the joint hypothesis

H : Φ = 0, α0 = p     (11)

where

Φ = [α1, …, αS, µ1, …, µS−1]′

in the regression

It = α0 + α1 It−1 + … + αS It−S + µ1 D1,t + … + µS−1 DS−1,t + εt,   t = S + 1, S + 2, …, T     (12)

where the Ds,t are explanatory variables such as day-of-week dummies.

The hypothesis (11) can be tested by using an F-statistic in the usual OLS framework.12

To test for the property of correct conditional coverage, we perform an OLS regression of the
It series on its five lagged values and five day-dummies representing the trading days in a
week.12 Significance of the F-statistic of this OLS leads to rejection of a model; otherwise the
model is not rejected. It should be noted that non-significance of the F-statistic does not
necessarily imply non-significance of the t-statistics corresponding to the individual
regressors in the OLS. We follow Hayashi (2000) and adopt the policy of preferring the
F-statistic over the t-statistic(s) in the case of a conflict. Therefore a model is not rejected if
the F-statistic is insignificant, even though some individual t-statistic(s) may turn out to be
significant.

12
In view of the fact that the It series is binary, a more appropriate approach would be a binary regression
rather than OLS. However, there is a technical problem in implementing a binary regression here: more than
90% of the It values are zero and only a few are unity. This asymmetry in the data results in singular Hessian
matrices in the estimation process, and the maximum likelihood estimation fails as a result. This problem is
more severe in the case of the 99% VaR models. We therefore resort to an OLS regression, which is
asymptotically equivalent to a binary regression.
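A sketch of this test with statsmodels follows; the failure series hits (a 0/1 pandas Series with a business-day DatetimeIndex) is an assumed input, and dropping one day-dummy to avoid collinearity with the constant is my implementation choice.

```python
# Sketch of the conditional-coverage test: OLS of the failure indicator on
# five of its lags and day-of-week dummies, followed by an F-test of the
# joint hypothesis that the constant equals p and all other coefficients
# are zero. 'hits' is an assumed 0/1 pandas Series of VaR violations.
import numpy as np
import pandas as pd
import statsmodels.api as sm

p = 0.05                                   # tolerance level of the 95% VaR
df = pd.DataFrame({"I": hits.astype(float)})
for s in range(1, 6):                      # five lagged values of I_t
    df[f"lag{s}"] = df["I"].shift(s)

dummies = pd.get_dummies(hits.index.dayofweek, prefix="day", drop_first=True)
dummies.index = hits.index
df = df.join(dummies.astype(float)).dropna()

X = sm.add_constant(df.drop(columns="I"))
res = sm.OLS(df["I"], X).fit()

# Joint hypothesis (11): const = p and every other coefficient = 0
R = np.eye(X.shape[1])
q = np.zeros(X.shape[1])
q[0] = p
print(res.f_test((R, q)))
```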

Table 7 presents the results of the test of correct conditional coverage applied to the
EVT-based VaR measures. Panel A of the table deals with the 95% VaR estimates and Panel
B with the 99% VaR estimates. It is found that for both the long and the short Nifty
positions, the VaR measures generate the correct conditional coverage, at both the 95% and
99% levels. The estimated failure probabilities are not significantly different from the
pre-specified levels, and the independence of the failure series is indicated by the
insignificance of the F-statistics in the OLS regression of the failure series on its own past
values and the day-dummies.

7 Conclusion

This paper carries out an analysis of the tail behaviour of the Nifty innovation distribution
using extreme value theory. We find that the essential features of the innovation distribution
are very different from those of the normal distribution. The right tail of the innovation
distribution displays significant ‘tail fatness’, while the left tail behaves much like the normal
distribution. This asymmetry in the innovation distribution necessitates treating the left tail
and the right tail separately when estimating risk measures like VaR.

We see that the extreme value theory based GPD model of tail estimation is able to capture
these features of the innovation distribution. By estimating the upper and lower tails
separately, this approach takes care of the inherent asymmetry present in the data. It also
gives a better fit to both tails of the innovation distribution than the normal distribution.

The quantiles estimated using the GPD model are used to compute 95% and 99%
Value-at-Risk measures for a short and a long position in the Nifty portfolio. The tests of
“correct conditional coverage” confirm that the VaR measures possess correct conditional
coverage, a condition for the statistical precision of the VaR measures.

References

Christoffersen PF, Diebold FX, 2000. How Relevant is Volatility Forecasting for Financial

Risk Management. Review of Economics and Statistics 82:1–11.

Clements MP, Taylor N, 2000. Evaluating interval forecasts of high-frequency financial data.

Manuscript, University of Warwick.

Danielsson J, de Vries CG, 1997. Value-at-Risk and extreme returns. Manuscript, London

School of Economics.

Embrechts P, Klüppelberg C, Mikosch T, 1997. Modelling Extremal Events for Insurance and
Finance. Springer-Verlag, Berlin Heidelberg.

Gavin J, 2000. Extreme value theory: an empirical analysis of equity risk. UBS Warburg
working paper.

Gourieroux C, 1997. ARCH Models and Financial Applications. Springer.

Gourieroux C, Monfort A, Trognon A, 1984. Pseudo Maximum Likelihood Methods: Theory.

Econometrica 52:681–700.

Hayashi F, 2000. Econometrics. Princeton Univ Press.

Leadbetter MR, Lindgren G, Rootzen H, 1983. Extremes and Related Properties of Random
Sequences and Processes. Springer-Verlag New York Inc.

Longin F, 2000. From Value at Risk to Stress Testing: The Extreme Value Approach.
Journal of Banking and Finance 24(7):1097–1130.

McNeil AJ, 1996. Estimating the tails of loss severity distributions using extreme value theory.

Manuscript, Department Mathematik, ETH Zentrum, Zurich.

McNeil AJ, 1999. Extreme value theory for risk managers. Manuscript, Department Mathe-

matik, ETH Zentrum, Zurich.

McNeil AJ, Frey R, 1999. Estimation of tail-related risk measures for heteroscedastic financial

time series: an extreme value approach. Manuscript, Federal Institute of Technology.

Neftci SN, 2000. Value at Risk Calculations, Extreme Events, and Tail Estimation. The

Journal of Derivatives Spring:1–15.

Table 1 Estimation of the AR(1)-GARCH(1,1) model

Parameter    Estimate    SE       Confidence bounds

The mean equation:
Constant     -0.036      0.040    (-0.114, 0.043)
AR(1)         0.225      0.029    (0.167, 0.282)

The variance equation:
Constant      0.039      0.015    (0.010, 0.067)
ARCH(1)       0.101      0.016    (0.069, 0.132)
GARCH(1)      0.893      0.015    (0.863, 0.923)

Table 2 Descriptive statistics: Returns and Standard residuals

This table presents the values of the first four unconditional moments of the raw Nifty series and the standard
residuals extracted from an AR(1)-GARCH(1,1) specification of the Nifty returns.

Mean Variance Skewness Kurtosis


Raw returns 0.1094 4.1675 0.0758 8.6126
Standard residuals 0.0268 1.0033 0.2223 4.3623

Table 3 Tests for skewness, kurtosis and auto correlation

This table presents the values of the test statistics and the corresponding p-values of the tests of skewness,
kurtosis and autocorrelation for the raw data (Panel A) and the standard residuals (Panel B).

                  statistic    p-value

Panel A: The return series (rt)
skewness           15.7839∗    0.000
kurtosis          292.3220∗    0.000
H.C. Ljung-Box     57.2202∗    0.01

Panel B: The residual series (zt)
skewness           46.2049∗    0.000
kurtosis           70.78355∗   0.000
H.C. Ljung-Box     47.7039     0.074

∗ indicates significance at the 0.05 level.

Figure 1 The tails of the innovation distribution

This figure provides the plots of the estimated tails of the Nifty innovation distribution. The graphs provide
the empirical, the gpd approximation and the normal approximation to the lower and the upper tails. The
threshold levels from where the tail starts are shown by a vertical line. For example, for the lower tail, the
threshold level is -1.6493 and for the upper tail it is 1.6494.

[Two panels: “The lower tail of the innovation distribution” and “The upper tail of the innovation distribution”. Each panel plots probability (y-axis) against x, showing the tail fitted using the GPD, the empirical tail and the Gaussian tail.]

Figure 2 95% and 99% VaR measures estimated with the POT model

This graph shows the dynamic VaR forecasts for a long and a short position of Rs. 100 in the Nifty portfolio
by using the pot approach based on extreme value theory. The graph depicts two sets of VaR forecasts, one
for the long position and the other for the short position. The graph on the negative side of the observed
returns is for the long position and the one on the positive side is for the short position in Nifty.

[Plot: returns (%) and the 95% and 99% VaR forecasts against forecast dates (08/05/96 to 01/03/01), with observed returns shown alongside the VaR bands for the long (negative side) and short (positive side) positions.]


Table 4 Results of the GPD estimation

This table provides the results of the estimated GPD parameters fitted to the excesses over the chosen threshold.
The first column gives the threshold points on both the left and the right tails corresponding to the 1.65σ
level of threshold. The second column presents the number of observations beyond the threshold level and the
third column gives the estimated cdf of the tails at the respective threshold points. The fourth and the fifth
columns present the Pseudo-maximum-likelihood estimation of the GPD parameters fitted to the excesses over
the thresholds, along with the standard errors of estimation within parenthesis.

u            Nu    F(u)      ξ̂                   σ̂

Left tail    -1.6493   50   0.0401    0.2027 (0.2095)    0.4099 (0.1030)
Right tail    1.6494   66   0.9471   -0.0064 (0.2736)    0.6460 (0.1950)

Figures in parentheses indicate standard errors.

Table 5 Estimated quantiles on the i.i.d. residuals

This table provides some of the estimated quantiles on the tails along with the empirical quantiles as well as
the corresponding quantiles on the standard normal distribution. Panel A deals with the lower tail and Panel
B deals with the upper tail. The column 1 gives the probability level. Columns 2, 3 and 4 give the estimated,
empirical and standard normal distribution quantiles corresponding to these probability levels.

p EVT Empirical Normal

Panel A: Quantiles on the lower tail


0.06 -1.4907 -1.4164 -1.5548
0.05 -1.5608 -1.5092 -1.6449
0.04 -1.6503 -1.6358 -1.7507
0.03 -1.7717 -1.7618 -1.8808
0.02 -1.9555 -1.8128 -2.0537
0.01 -2.3067 -2.3615 -2.3263
0.009 -2.3646 -2.3704 -2.3656
0.008 -2.4307 -2.4348 -2.4082
0.007 -2.5076 -2.4637 -2.4573
0.006 -2.5991 -2.7965 -2.5121
0.005 -2.7109 -2.9345 -2.5758
0.004 -2.8526 -3.0110 -2.6521
0.003 -3.0474 -3.0858 -2.7478
0.002 -3.3404 -3.2280 -2.8782
0.001 -3.8005 -3.4330 -3.0902
Panel B: Quantiles on the upper tail
0.94 1.5684 1.5550 1.5548
0.95 1.6862 1.6728 1.6449
0.96 1.8302 1.8310 1.7507
0.97 2.0155 2.0034 1.8808
0.98 2.2761 2.1447 2.0537
0.99 2.7202 2.8785 2.3263
0.991 2.7875 2.8839 2.3656
0.992 2.8627 2.9422 2.4082
0.993 2.9479 3.2272 2.4573
0.994 3.0461 3.2469 2.5121
0.995 3.1622 3.2476 2.5758
0.996 3.3041 3.5057 2.6521
0.997 3.4867 3.5153 2.7478
0.998 3.7436 3.6180 2.8782
0.999 3.8535 3.6394 3.0902

Table 6 Results of the Kolmogorov-Smirnov tests

This table provides the values of the Kolmogorov-Smirnov test statistics for testing H0: F(x) = G(x)
against H1: F(x) ≠ G(x) (statistic D) and H0′: F(x) = φ(x) against H1′: F(x) > φ(x) (statistic D+).

Upper tail Lower tail

D 0.2794 0.1574
D+ 0.8536∗ 0.3589
Critical value of D at 0.05 level of significance = 0.467
Critical value of D+ at 0.05 level of significance = 0.400

Table 7 Results of the test of “conditional coverage”

This table presents the results of the test of autoregressive and periodic dependence in the failure series
generated by the EVT-based VaR forecasts for a long and a short position in the Nifty portfolio, estimated by
the GPD approach based on extreme value theory. The first column gives the position, the second column
gives the estimated failure probability with the corresponding p-value in parentheses, and the third column
reports the estimated F-statistic for the hypothesis of no dependence in the failure series, with the
corresponding p-value (in parentheses). Panel A of this table deals with 95% VaR estimation and Panel B
deals with 99% VaR estimation.

p̂ (p-value) F-stat (p-value)

Panel A: 95% VaR estimation


Short Nifty 0.0481 (0.5824) 1.3190 (0.2148)
Long Nifty 0.0495 (0.5198) 1.7404 (0.0673)

Panel B: 99% VaR estimation


Short Nifty 0.0056 (0.8870) 0.7807 (0.6476)
Long Nifty 0.0142 (0.1959) 0.7438 (0.6834)
Figures in parentheses indicate p-values

