
The Geometric-VaR Backtesting Method

Denis Pelletier∗

Wei Wei†

February 6, 2015

Abstract

This paper develops a new test to evaluate Value at Risk (VaR) forecasts. VaR is a standard risk measure widely utilized by financial institutions and regulators, yet estimating VaR is a challenging problem, and popular VaR forecasting methods rely on unrealistic assumptions. Hence, assessing the performance of VaR is of great importance. We propose the geometric-VaR test, which utilizes the durations between VaR violations as well as the value of VaR. We conduct a Monte Carlo study based on desk-level data and find that our test has high power against various alternatives.

JEL Codes: C12, G21, G32

Keywords: Risk Management, Backtesting, Volatility, Duration, Value at Risk.


∗ Department of Economics, North Carolina State University, Raleigh, NC 27695, USA, email: denis_pelletier@ncsu.edu

† CREATES, Department of Economics and Business, Aarhus University, 8210 Aarhus V, Denmark, email: wwei@econ.au.dk

1 Introduction

The importance of risk management has long been recognized by financial market par-
ticipants. One essential element of managing risk is measuring risk. The 1996 Market
Risk Amendment to the Basel Accord established Value-at-Risk (VaR) as the basis for
determining market risk capital requirements. Since then, VaR has become a standard
tool to measure risk (see Jorion, 2006, and Berkowitz and O’Brien, 2002). VaR is the
maximum expected loss for a given time horizon and confidence level. For example, a
one-day VaR with a coverage rate of 5% is the value such that the loss on the next day
would be smaller than this value with 95% probability. In other words, the probability
that the loss exceeds VaR is 5%.
VaR summarizes risk in a single number, and it is easily communicated among finan-
cial institutions and their regulators. Statistically, VaR is a quantile of the conditional
distribution of returns. Despite the simplicity of the concept, the estimation of VaR is a
challenging problem, because the conditional distribution of returns is generally unknown.
Although several methods have been developed to estimate VaR, arguably the most
popular approach is Historical Simulation. Pérignon and Smith (2010) find that 73%
of banks that disclose their VaR method use Historical Simulation. Historical
Simulation approximates the conditional distribution of returns using a rolling window
of typically one or two years of past returns. It is nonparametric and easy to implement,
and hence favored by practitioners, but it relies on an unrealistic assumption: that
returns are independent and identically distributed. It ignores stylized facts such as
volatility clustering and leverage effects, and the time-varying dynamics of returns are only
accounted for by the rolling window. Moreover, Historical Simulation is under-responsive
to changes in conditional risk; see Pritsker (2006).
Given the popularity of VaR and the shortcomings of popular implementations, evaluating the
performance of the VaR measure is of great importance. In practice, the evaluation of

VaR is usually carried out through backtesting, which compares the VaR forecasts with
realized returns. If the ex-post loss exceeds the ex-ante VaR forecast, it is referred to
as a violation. Define a hit sequence, {It }, where It = 1 indicates there is a violation;
if the VaR measure with coverage rate p is correctly specified, the hit sequence must be
i.i.d. Bernoulli with parameter p. Christoffersen (1998) builds a Likelihood Ratio (LR)
framework for the conditional evaluation of an interval forecast such as VaR. He first
formulates an LR test for the unconditional coverage of VaR, which amounts to testing
if the average number of violations corresponds to its expected value, i.e., if E(I_t) = p.
The unconditional coverage test does not take into consideration higher-order dynamics:
it assumes that the observations are independent. Violations could have correct nom-
inal coverage while exhibiting time dependence, in particular violation clustering. The
clustering in violations indicates closely grouped large losses and misspecified VaR. To ex-
amine the independence hypothesis, Christoffersen (1998) specifies a first-order Markov
chain alternative for the hit sequence. Finally, he develops the test for correct condi-
tional coverage, which is a joint test of the unconditional coverage hypothesis and the
independence hypothesis.
The first-order Markov chain alternative has limited power against general forms of
time-dependence in violations. Christoffersen and Pelletier (2004) develop a duration-
based approach for backtesting VaR. The intuition is that if VaR is correctly specified
with coverage rate p, the hit sequence should be i.i.d. Bernoulli with parameter p, and the
duration between hits should have no memory and mean equal to 1/p. The distribution
of durations under the null hypothesis is approximated by the exponential distribution,
since it is the only continuous distribution with a constant hazard rate. For the alternative
hypothesis they consider a Weibull distribution with a decreasing hazard rate. Their
test can also be decomposed into an unconditional coverage test and an independence
test, where the unconditional coverage test checks if the mean of the durations equals
1/p, and the independence test checks if the hazard rate is constant. They also consider
an autoregressive model for the expected conditional duration.
It is also possible to specify a discrete distribution for the durations. Haas (2005)
finds that discrete distributions have better power against violation clustering than con-
tinuous distributions. Candelon, Colletaz, Hurlin, and Tokpavi (2011) propose a GMM
test for duration-based backtesting. They find that discrete distributions perform as
well as continuous distributions within the GMM approach. Berkowitz, Christoffersen,
and Pelletier (2011) implement a discrete duration test under the LR framework, which
they refer to as the geometric test. Under the null hypothesis that durations have no
memory, discrete durations follow the geometric distribution, hence the name geometric
test. Monte Carlo simulations suggest that the geometric test is more powerful than the
continuous-distribution-based Weibull test.
Engle and Manganelli (2004) argue that requiring the hit sequence to be i.i.d. is a
necessary but not sufficient condition for a correctly specified VaR; if the VaR forecast
is a valid quantile measure, the expectation of the hit sequence conditional on the
information set at time t − 1 must equal the coverage rate. In other words, the violation
I_t should be unbiased and uncorrelated with any information up to t − 1. They
propose a dynamic quantile (DQ) test for backtesting VaR. In particular, they regress
the hit sequence on a set of explanatory variables that includes the VaR forecasts and
the first four lagged hits. Dumitrescu, Hurlin, and Pham (2012) extend this approach
to a dynamic binary choice model which allows for non-linear dependence between the
probability of violations and the explanatory variables. Gaglianone, Lima, Linton, and
Smith (2011) develop evaluation methods based on quantile regressions, which have
better small-sample properties than the DQ test.
Berkowitz, Christoffersen, and Pelletier (2011) provide a unified framework for back-
testing VaR by noting that violations form a martingale difference sequence. They com-

pare the power of existing backtesting methods using data generating processes that
resemble actual Profits and Losses (P/L) from four business lines. They find that the
DQ test performs the best overall but that the geometric test also performs well in many
cases.
In this paper, we propose a new duration-based test by incorporating VaR forecasts into
the geometric test, hereafter the geometric-VaR test. The insight is twofold: first, duration-
based approaches can capture general forms of time-dependence in violations. Specifi-
cally, we choose the geometric test over the Weibull test because discrete distributions
have better power against violation clustering. Second, if VaR is not correctly specified,
the probability of observing a violation would depend on past information, and including
the information in the conditional distribution of durations should improve the power of
a duration-based test. We focus on including VaR forecasts because misspecified VaRs
would react too slowly to changing market conditions and hence be informative about
the probability of getting a violation.
The geometric-VaR test can be decomposed into three individual tests: the first
test focuses on correct unconditional coverage; the second test considers the dependence
structure in durations; the third test examines whether the probability of getting a vio-
lation depends on the VaR forecasts. We compare through simulations the power of the
geometric-VaR test and the related duration-based tests using the same data generating
process based on business line data from Berkowitz, Christoffersen, and Pelletier (2011).
We find that the geometric-VaR test has better power than other duration-based tests or
regression-based tests, and it has power against various forms of misspecifications. We
also assess the performance of actual business line VaR forecasts of Berkowitz, Christof-
fersen, and Pelletier (2011) using the geometric-VaR test and its component tests. Our
framework not only tests whether the VaR forecast is misspecified, but also helps un-
derstand how the VaR forecast is misspecified by examining the individual hypotheses

separately.
The rest of the paper is organized as follows. Section 2 reviews the geometric test and
presents the geometric-VaR test. Section 3 discusses how to implement the tests. Section
4 compares the power of the newly proposed test with existing methods of backtesting
through Monte Carlo simulations. Section 5 applies the geometric-VaR test to the actual
P/L and VaR forecasts provided by the bank. Section 6 concludes.

2 Duration-based Backtesting

VaR is defined with a promised coverage rate over a given time horizon. We focus on
the one-day VaR horizon in the paper. A one-day VaR with coverage rate p is the value
such that the loss next day would exceed VaR with probability p. In particular, we say
that the VaR forecast VaRt (p) is efficient with respect to information set Ωt−1 if

\[ \Pr(r_t < -\mathrm{VaR}_t(p) \mid \Omega_{t-1}) = p. \tag{1} \]

Here we follow the convention of reporting VaR as a positive number. Given a VaR
forecast VaRt conditional on information up to time t − 1 and the realized return rt at
time t, we can obtain the hit sequence {It } by comparing the ex-post return rt and the
ex-ante forecast VaRt . A violation or a hit refers to the event that the loss exceeds the
VaR forecast. Let It be an indicator function such that It = 1 when there is a violation,
that is,

\[ I_t = \begin{cases} 1, & \text{if } r_t < -\mathrm{VaR}_t(p), \\ 0, & \text{otherwise}. \end{cases} \tag{2} \]

If the VaR forecast is efficient with respect to information set Ωt−1 , the distribution of
the hit sequence should be i.i.d. Bernoulli with parameter p:

\[ \Pr(I_t = 1 \mid \Omega_{t-1}) = p. \tag{3} \]

Let t_i denote the day of the i-th violation/hit; the no-hit duration D_i is then constructed as
D_i = t_i − t_{i−1}. The hit sequence is thus transformed into a duration sequence, and we can
use duration modeling techniques to explore the data.¹

¹ See Kiefer (1988) for an extensive review of duration modeling.
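To illustrate the construction, a minimal sketch in Python that converts a hit sequence into durations and censoring indicators might look as follows (the function and variable names are our own, not from the paper):

    import numpy as np

    def hits_to_durations(hits):
        # Convert a 0/1 hit sequence into no-hit durations D_i and censoring
        # indicators C_i (C_i = 1 when the spell is censored). A minimal sketch.
        hits = np.asarray(hits)
        days = np.flatnonzero(hits) + 1              # 1-based days of violations
        if len(days) == 0:                           # no violations: one censored spell
            return np.array([len(hits)]), np.array([1])
        durations, censored = [], []
        if days[0] > 1:                              # does not start with a hit
            durations.append(days[0])                # left-censored first spell
            censored.append(1)
        durations.extend(np.diff(days).tolist())     # complete durations between hits
        censored.extend([0] * (len(days) - 1))
        if days[-1] < len(hits):                     # does not end with a hit
            durations.append(len(hits) - days[-1])   # right-censored last spell
            censored.append(1)
        return np.array(durations), np.array(censored)

Censoring is discussed further in Section 3.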
If the hit sequence is i.i.d. Bernoulli, Di measures the number of Bernoulli trials
needed to get one hit. Hence, under the null hypothesis that VaR is correctly specified,
Di follows a geometric distribution with parameter p:

\[ \Pr(D_i = d) = p(1-p)^{d-1}. \tag{4} \]
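As a quick check (standard algebra for the geometric distribution), the implied mean duration is indeed 1/p:

\[ E[D_i] = \sum_{d=1}^{\infty} d\, p(1-p)^{d-1} = \frac{p}{\left(1-(1-p)\right)^2} = \frac{1}{p}. \]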

The geometric distribution is characterized by a flat hazard function. The hazard function
λ^i_d is defined as the ratio of the probability mass function f^i(d) to the survival function S^i(d):

\[ \lambda_d^i = \frac{f^i(d)}{S^i(d)}. \tag{5} \]

For a discrete distribution, we can write

\[ f^i(d) = \Pr(D_i = d) = \lambda_d^i (1-\lambda_{d-1}^i) \cdots (1-\lambda_1^i), \]
\[ S^i(d) = \Pr(D_i \ge d) = (1-\lambda_{d-1}^i) \cdots (1-\lambda_1^i), \tag{6} \]

where f^i(d) is the probability of D_i equaling d, and the survival function S^i(d) is the probability
of the duration D_i being at least d. Hence, λ^i_d measures the probability of getting a hit (a
failure) on day d given that the no-hit duration has survived for d − 1 days. In other words,

\[ \lambda_d^i = \Pr(I_{t_i+d} = 1 \mid I_{t_i+d-1} = 0, \ldots, I_{t_i+1} = 0, I_{t_i} = 1, \Omega_{t_i+d-1}). \tag{7} \]

If the VaR forecast is efficient, the probability of getting a hit does not depend on any
past information, so the hazard function must be a constant. Furthermore, if the VaR
forecast has correct unconditional coverage, the hazard function must equal the coverage
rate p.
On the other hand, if VaR is not efficient with respect to past information, the hazard
function is no longer the constant p. The probability of getting a hit can depend on how
long the no-hit sequence has lasted as well as past information. Berkowitz, Christoffersen,
and Pelletier (2011) propose the geometric test, in which the hazard function decreases
with duration under the alternative hypothesis. We review the geometric test in
Section 2.1. Section 2.2 introduces how to incorporate explanatory variables such as the
VaR forecasts in our duration-based framework. Finally, we present the geometric-VaR
test in Section 2.3.

2.1 Geometric Test

The geometric test allows for time dependence in the hit sequence by specifying the following
hazard function for durations:

\[ \lambda_d = a d^{b-1}, \tag{8} \]

with 0 ≤ a < 1 and 0 ≤ b ≤ 1.² The ordering of durations does not play a role here, so we
omit the superscript i.

² This corresponds to the discrete Weibull distribution specified in Stein and Dattero (1984). Note that if b > 1, the distribution has finite support.

We can decompose the geometric test into a test of unconditional coverage and a
test of duration independence. The null hypothesis of correct unconditional coverage
corresponds to a = p, since the percentage of violations equals the coverage rate p and
the average duration equals 1/p. As in Kupiec (1995) and Christoffersen (1998), we
conduct the unconditional coverage test assuming independence. However, if violations
are clustered together, the risk of bankruptcy would increase even if violations have
correct unconditional coverage. Hence, it is important to examine the time dependence
of violations. Under the null hypothesis, durations have "no memory" and the hazard
function is flat, i.e., b = 1. Under the alternative hypothesis, violations are clustered, and
we would observe an excessive number of short durations and long durations compared
to the geometric distribution. This corresponds to a decreasing hazard function:³ if
the no-hit duration has not lasted for long, the probability of getting a hit is high,
hence the excessive number of short durations; if the no-hit duration has survived for
a long time, the probability of getting a hit is low, hence the excessive number of long
durations. In other words, the alternative hypothesis is specified as b < 1. By testing
the unconditional coverage and duration independence hypotheses jointly, we obtain the
geometric conditional coverage test.

³ A decreasing hazard function is also referred to as negative duration dependence.

2.2 VaR Test

If the VaR forecast VaR_t is efficient with respect to the information set Ω_{t−1}, the
probability of getting a violation should be independent of any information known at t − 1.
Engle and Manganelli (2004) suggest regressing I_t on a vector of explanatory variables
x_{t−1}, which belongs to the information set Ω_{t−1}. However, since I_t is binary, the residuals
of this linear regression are discrete and heteroskedastic, as noted in Berkowitz,
Christoffersen, and Pelletier (2011) and Dumitrescu, Hurlin, and Pham (2012). Instead, we can
resort to the class of binary choice models.

Let y_t^* denote a latent variable linking the explanatory variables x_{t−1} to I_t. Specifically,

\[ y_t^* = x_{t-1}'\beta + u_t^*, \tag{9} \]

and we observe I_t = 1 when y_t^* > 0. Using this latent variable representation, the
probability of getting a violation becomes

\[ \Pr(I_t = 1) = \Pr(y_t^* > 0) = \Pr(u_t^* > -x_{t-1}'\beta) = 1 - F(-x_{t-1}'\beta), \tag{10} \]

where F(·) is the c.d.f. of u_t^*. If we specify the standard logistic distribution for u_t^*, then
F(u) = e^{u}/(1 + e^{u}) and Pr(I_t = 1) = e^{x_{t-1}'\beta}/(1 + e^{x_{t-1}'\beta}).
The vector of regressors x_{t−1} can include any variable that belongs to the information
set Ω_{t−1}. For example, one can use the GARCH estimate of volatility σ_t², the lagged
violations I_{t−1}, ..., I_{t−k}, or any VaR forecast that is known at t − 1, such as VaR_t. A
parsimonious yet powerful choice in practice is to set x_{t−1} = (1, VaR_t)', so that
x_{t−1}'β = β_0 − c VaR_t. If the forecasted VaR_t reacts too slowly to changing market conditions
(e.g., time-varying and persistent volatility combined with Historical Simulation), it will be
informative about the probability of getting a hit. Moreover, we are more (less) likely to
get a violation when VaR_t takes a small (large) absolute value, hence c > 0 under the
alternative hypothesis.

If we use a distribution with positive support for u_t^*, such as the exponential
distribution, instead of the logistic, we obtain Pr(I_t = 1) = 1 − F(−x_{t−1}'β) = e^{x_{t−1}'β}. Compared
with the logistic error specification, we gain the benefit that β_0 is directly linked
to the unconditional coverage a: β_0 = log a. Also, c measures the percentage change
induced by a unit change in VaR_t. This binary choice model can be easily translated
into a discrete-time duration model. If we assume duration independence, the hazard
function is simply given by

\[ \lambda_d^i = \Pr(I_{t_i+d} = 1 \mid \Omega_{t_i+d-1}) = a e^{-c \mathrm{VaR}_{t_i+d}}, \tag{11} \]

with 0 ≤ a < 1 and c ≥ 0.⁴ We can test the hypotheses of correct unconditional coverage
and VaR independence jointly by testing a = p and c = 0. We refer to this test as the
VaR test.

⁴ This specification corresponds to a proportional hazard model with an exponential baseline hazard in the continuous-time setting. If we want to include general explanatory variables other than VaR_t, we can simply adopt the continuous-time duration model, or specify a different error distribution such that 0 ≤ λ ≤ 1. Also, for small samples, we find that (VaR_t − min VaR) may be a better choice of explanatory variable.
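For intuition, the binary choice model with logistic errors can be estimated with off-the-shelf tools. The sketch below fits a logit of I_t on (1, VaR_t) using simulated stand-in data; all names and numbers are illustrative assumptions, not estimates from the paper:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    var_forecast = rng.uniform(3.0, 8.0, size=1000)   # stand-in VaR forecasts
    p_hit = np.exp(-1.5 - 0.2 * var_forecast)         # hit probability decreasing in VaR
    hits = rng.binomial(1, p_hit)                     # simulated violations

    # Logit of I_t on (1, VaR_t): under the null, the VaR coefficient is zero
    X = sm.add_constant(var_forecast)
    res = sm.Logit(hits, X).fit(disp=0)
    print(res.params, res.pvalues)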

2.3 Geometric-VaR Test

A misspecified VaR forecast may lead to violation clustering, as well as to dependence between
the conditional probability of getting a violation and the VaR forecast. To capture both
types of dependence, we combine the geometric test with the VaR test and specify the
following hazard function:

\[ \lambda_d^i = a d^{b-1} e^{-c \mathrm{VaR}_{t_i+d}}, \tag{12} \]

where 0 ≤ a < 1, 0 ≤ b ≤ 1 and c ≥ 0. Under the null hypothesis that VaR is
correctly specified, durations follow a geometric distribution with parameter p, so the
null corresponds to a = p, b = 1 and c = 0.
The parameter a in the hazard rate captures the unconditional coverage. The second
part of the hazard function, d^{b−1}, describes duration dependence, or the time-dependence
in violations. Under the alternative hypothesis that violations are clustered, we would
observe negative duration dependence, and b < 1. Another way to account for the
time-dependence is to use lagged violations, I_{t−1}, ..., I_{t−k}, as in Engle and Manganelli

(2004). However, that approach has limited power against higher-order dependence,
while our duration-based test can capture general dependence structures. The last part
of the hazard function examines the impact of VaR_{t_i+d} on the probability of getting a
hit. Our methodology can easily be extended to include additional explanatory variables.
However, as the effective sample sizes available to practitioners and regulators are quite
small, we only adopt the most parsimonious version.
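To make the specification concrete, a minimal sketch of evaluating this hazard and the implied duration probabilities of equation (6) is given below (our own helper names; a sketch under the paper's specification, not the authors' code):

    import numpy as np

    def hazard_gv(d, var_d, a, b, c):
        # Hazard of eq. (12): lambda_d = a * d**(b-1) * exp(-c * VaR_{t_i+d}).
        # d is the day within the no-hit spell (1, 2, ...); var_d is that day's VaR.
        return a * d ** (b - 1.0) * np.exp(-c * var_d)

    def spell_pmf_and_survival(lams):
        # Given hazards lambda_1, ..., lambda_D of one spell, return (f(D), S(D))
        # from eq. (6): S(D) = prod_{j<D} (1 - lambda_j) and f(D) = lambda_D * S(D).
        lams = np.asarray(lams)
        surv = np.prod(1.0 - lams[:-1])
        return lams[-1] * surv, surv

Setting b = 1 and c = 0 recovers the geometric case with constant hazard a.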
The geometric-VaR test can be decomposed into three tests of individual hypotheses
under one unified framework. Specifically, we consider a test of unconditional coverage, a
test of duration independence and a test of VaR independence. For comparison, we also
consider the geometric test in section 2.1 and the VaR test in section 2.2. In summary,
we will examine three individual tests and three joint tests:

1. Unconditional coverage test (under the maintained assumption that b = 1 and c = 0):

   H_0: a = p
   H_a: a ≠ p

2. Duration independence test (under the maintained assumption that c = 0):

   H_0: b = 1
   H_a: b < 1

3. VaR independence test:

   H_0: c = 0
   H_a: c > 0

4. Geometric test: unconditional coverage and duration independence (under the maintained assumption that c = 0):

   H_0: a = p and b = 1
   H_a: a ≠ p or b < 1

5. VaR test: unconditional coverage and VaR independence (under the maintained assumption that b = 1):

   H_0: a = p and c = 0
   H_a: a ≠ p or c > 0

6. Geometric-VaR test: unconditional coverage, duration independence and VaR independence:

   H_0: a = p, b = 1 and c = 0
   H_a: a ≠ p or b < 1 or c > 0

This unified framework allows us not only to test whether VaR forecasts are misspecified
overall, but also to understand how they are misspecified by looking into which individual
hypothesis is rejected. The three individual hypotheses have different economic impacts:
financial institutions focus on the unconditional coverage since regulators use the number
of violations to determine the penalties that banks might incur (see Annex 10a in Basel
Committee on Banking Supervision, 2006). The duration independence test considers
whether VaR is capturing the time-varying nature of risk, in particular closely grouped
high risk, since that might increase the probability of bankruptcy. The rejection of the VaR
independence hypothesis indicates that the VaR forecast does not fully represent the
return dynamics. Note that banks use an internal VaR model to determine their capital
requirements. Practitioners face a trade-off between a correctly specified VaR and smooth
capital requirements, and examining the three hypotheses separately can help internal
model builders decide on that trade-off.

3 Test Implementation

If the hit sequence does not start with a violation, the first duration measures the number
of days with no violation rather than the number of days between two violations. In other
words, the first duration is left-censored. Similarly, if the hit sequence does not end with
a violation, the last duration is right-censored. To implement the test, we generate a
binary series {C_i}_{i=1}^N along with the duration series {D_i}_{i=1}^N, where C_i = 1 indicates that
D_i is censored. If a duration D_i is not censored (for the first duration, this means the
hit sequence starts with a violation), its contribution to the likelihood is the probability
f^i(D_i). On the other hand, if a duration is censored or incomplete, we only know that
the duration has lasted for at least D_i days; hence its contribution to the likelihood is the
survival function S^i(D_i). When the hit sequence is converted to the duration sequence,
only the first and the last durations might be censored. The log-likelihood function that
takes censoring into consideration is then given by

\[ \log L(D \mid \Theta) = C_1 \log S^1(D_1) + (1 - C_1) \log f^1(D_1) + \sum_{i=2}^{N-1} \log f^i(D_i) + C_N \log S^N(D_N) + (1 - C_N) \log f^N(D_N). \]
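As an illustration, this censored log-likelihood can be evaluated directly from the hazard in (12); the sketch below assumes `var_series` stacks the daily VaR forecasts spell by spell (our own naming and conventions, not the authors' implementation):

    import numpy as np

    def loglik_gv(durations, censored, var_series, a, b, c):
        # Censored log-likelihood: complete spells contribute log f(D_i),
        # censored spells contribute log S(D_i), following the expression above.
        ll, pos = 0.0, 0
        for D, cen in zip(durations, censored):
            days = np.arange(1, D + 1)                   # day index within the spell
            lam = a * days ** (b - 1.0) * np.exp(-c * var_series[pos:pos + D])
            lam = np.clip(lam, 1e-12, 1.0 - 1e-12)       # keep the logs finite
            log_surv = np.sum(np.log(1.0 - lam[:-1]))    # log S(D), eq. (6)
            ll += log_surv if cen else log_surv + np.log(lam[-1])
            pos += D
        return ll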

We estimate Θ using maximum likelihood. Following Christoffersen (1998), we utilize


likelihood ratio tests so that the individual hypothesis testing and joint hypothesis testing
can be conveniently implemented under a unified framework. The standard LR statistic
can be formulated as LR = −2(log L(D|Θ̂_R) − log L(D|Θ̂_UR)) for each of the tests in
Section 2, where Θ̂_R denotes the maximum likelihood estimate of Θ when the parameters
are restricted by the null hypothesis, and Θ̂_UR is the unrestricted maximum likelihood
estimate (although the parameter space can still be restricted by the maintained assumptions).
Specifically, the LR test statistic for the unconditional coverage test is given by

\[ LR_{UC} = -2\left[\log L(D \mid a = p, b = 1, c = 0) - \log L(D \mid \hat{a}, b = 1, c = 0)\right]. \tag{13} \]

For the test of duration independence, we have

\[ LR_{Dind} = -2\left[\log L(D \mid \hat{a}, b = 1, c = 0) - \log L(D \mid \hat{a}, \hat{b}, c = 0)\right]. \tag{14} \]

The duration independence test does not depend on the true coverage p; it captures the
time-dependence of violations while maintaining the assumption of VaR independence
(by imposing c = 0). Next, for the VaR independence test, the LR test statistic is
formulated after taking duration dependence into consideration. In other words, the
VaR independence test captures whether the probability of violation depends on the
VaR forecast when we allow for the time-dependence of violations. Specifically,

\[ LR_{Vind} = -2\left[\log L(D \mid \hat{a}, \hat{b}, c = 0) - \log L(D \mid \hat{a}, \hat{b}, \hat{c})\right]. \tag{15} \]

The geometric test jointly tests the unconditional coverage and duration independence;
the test statistic is given by:

\[ LR^{G} = -2\left[\log L(D \mid a = p, b = 1, c = 0) - \log L(D \mid \hat{a}, \hat{b}, c = 0)\right]. \tag{16} \]

Note that LR^G equals the sum of LR_{UC} and LR_{Dind}. Next, the VaR test jointly tests
the unconditional coverage and VaR independence. Since duration dependence and VaR

dependence might be capturing similar dynamics in the data, we assume duration inde-
pendence (b = 1) when forming the test statistic:

\[ LR^{V} = -2\left[\log L(D \mid a = p, b = 1, c = 0) - \log L(D \mid \hat{a}, b = 1, \hat{c})\right]. \tag{17} \]

Last, for the geometric-VaR test,

\[ LR^{GV} = -2\left[\log L(D \mid a = p, b = 1, c = 0) - \log L(D \mid \hat{a}, \hat{b}, \hat{c})\right]. \tag{18} \]

The geometric-VaR test can be decomposed into three individual tests: the test of uncon-
ditional coverage, the test of duration independence and the test of VaR independence.
In particular, LRGV is equal to the sum of the three test statistics:

\[ LR^{GV} = LR_{UC} + LR_{Dind} + LR_{Vind}. \tag{19} \]
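Putting the pieces together, each LR statistic can be obtained by maximizing the censored log-likelihood under the relevant restrictions; a hedged sketch using the `loglik_gv` function sketched above (helper names are our own):

    import numpy as np
    from scipy.optimize import minimize

    def max_loglik(durations, censored, var_series, free=("a", "b", "c"), p=0.01):
        # Maximize the censored log-likelihood over the parameters in `free`,
        # holding the others at their null values (a = p, b = 1, c = 0).
        names = ("a", "b", "c")
        null = {"a": p, "b": 1.0, "c": 0.0}
        bounds = {"a": (1e-6, 1 - 1e-6), "b": (1e-6, 1.0), "c": (0.0, None)}
        x0 = [null[n] for n in names if n in free]

        def neg_ll(x):
            vals, it = dict(null), iter(x)
            for n in names:
                if n in free:
                    vals[n] = next(it)
            return -loglik_gv(durations, censored, var_series,
                              vals["a"], vals["b"], vals["c"])

        res = minimize(neg_ll, x0, method="L-BFGS-B",
                       bounds=[bounds[n] for n in names if n in free])
        return -res.fun

    # For example, LR^GV of eq. (18):
    # ll_null = loglik_gv(durations, censored, var_series, p, 1.0, 0.0)
    # lr_gv = -2.0 * (ll_null - max_loglik(durations, censored, var_series))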

Once we obtain the test statistics from the sample, we could use their asymptotic
distribution to calculate the p-value. However, the sample size of the duration sequence
is usually small. For example, if we have one year of VaR forecasts with a 1% coverage rate,
the expected number of hits is 2.5. Even with 10 years of data, the average sample size for
durations is 25. The small sample size also results in a nontrivial sample selection issue,
since we cannot run the backtest unless a minimum number of violations is observed.
Another complication is that we are testing parameter values at the boundary of the
parameter space, and this might affect the asymptotic distribution. Also, since we are
working with binary data, the distribution of the LR statistics is not continuous. For all
these reasons, we use the Monte Carlo technique of Dufour (2006) to obtain the simulated
distribution of the test statistics and reliable p-values, as in Christoffersen and Pelletier
(2004).

To implement the Monte Carlo technique, we first generate N realizations of data
under the null hypothesis. For each test described above, we obtain the test statistic
LR_i for the i-th realization of the data. Then LR_i, i = 1, ..., N, form the simulated
distribution of test statistics under the null. Let LR_0 denote the test statistic from the
sample. We can obtain p-values by comparing LR_0 to its simulated distribution under
the null. Since we are working with discrete distributions, LR_0 can be equal to some
LR_i. To break the ties, we draw U_i from a uniform distribution on [0, 1] for each test
statistic LR_i (and U_0 for LR_0). Then the Monte Carlo p-value is given by

\[ \hat{p}_N(LR_0) = \frac{N \hat{G}_N(LR_0) + 1}{N + 1}, \tag{20} \]

where

\[ \hat{G}_N(LR_0) = 1 - \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}(LR_i \le LR_0) + \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}(LR_i = LR_0)\,\mathbf{1}(U_i \ge U_0). \]
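The tie-broken p-value in (20) is straightforward to compute once the simulated statistics are available; a direct transcription (a sketch, not the authors' code):

    import numpy as np

    def mc_pvalue(lr0, lr_sim, seed=0):
        # Monte Carlo p-value of eq. (20) with uniform tie-breaking:
        # compare LR_0 with the N simulated statistics LR_1, ..., LR_N.
        rng = np.random.default_rng(seed)
        lr_sim = np.asarray(lr_sim)
        n = len(lr_sim)
        u0, u = rng.uniform(), rng.uniform(size=n)
        g_hat = (1.0 - np.mean(lr_sim <= lr0)
                 + np.mean((lr_sim == lr0) & (u >= u0)))
        return (n * g_hat + 1.0) / (n + 1.0)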

4 Simulation Studies

4.1 Data Generating Process

The most prominent characteristics of financial returns include volatility clustering,
leverage effects and fat tails. Volatility clustering refers to the observation that volatility is
persistent: periods of high volatility tend to cluster together. The leverage effect concerns
the asymmetric response to innovations: in financial data, a negative innovation tends to
have a different impact on future volatility than a positive innovation. In particular, a
negative shock tends to increase future volatility more than a positive shock of the same
size. Fat tails account for large returns that would be very unlikely under a normal
distribution.
The most popular approach to characterizing these features is to use GARCH-type models,
which allow volatility to depend on past returns and other observables. We adopt the
nonlinear asymmetric GARCH (NGARCH) process with Student-t innovations. The
NGARCH model accounts for leverage effects by allowing the impact of a negative shock
to depend on the past conditional volatility (see Engle and Ng, 1993). Specifically, we
simulate from the following process:

\[ R_{t+1} = \sigma_{t+1} \left(\frac{d-2}{d}\right)^{1/2} z_{t+1}, \]
\[ \sigma_{t+1}^2 = \omega + \alpha \sigma_t^2 \left( \left(\frac{d-2}{d}\right)^{1/2} z_t - \theta \right)^2 + \beta \sigma_t^2, \tag{21} \]

where z_t is drawn from a Student t(d) distribution. The leverage effect is represented by θ:
if θ is positive, volatility tends to increase more after a large negative shock than after a
large positive shock. The unconditional variance of returns is given by ω(1 − α(1 + θ²) − β)^{−1},
while α(1 + θ²) + β measures the volatility persistence.
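For reference, a sketch of simulating from this process (our own transcription of (21); parameter names follow Table 1 below):

    import numpy as np

    def simulate_ngarch_t(T, d, theta, beta, alpha, omega, seed=0):
        # Simulate T returns from the NGARCH-t(d) process in eq. (21),
        # starting the recursion at the unconditional variance.
        rng = np.random.default_rng(seed)
        scale = np.sqrt((d - 2.0) / d)       # makes the t(d) shock unit-variance
        sig2 = omega / (1.0 - alpha * (1.0 + theta ** 2) - beta)
        r = np.empty(T)
        for t in range(T):
            z = scale * rng.standard_t(d)    # standardized t(d) innovation
            r[t] = np.sqrt(sig2) * z
            sig2 = omega + alpha * sig2 * (z - theta) ** 2 + beta * sig2
        return r

For example, business line 2 corresponds to simulate_ngarch_t(T, 3.318, 0.503, 0.928, 0.052, 0.215), using the estimates in Table 1.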
To choose realistic parameters for their simulations, Berkowitz, Christoffersen, and
Pelletier (2011) estimate the volatility model in (21) with desk-level P/Ls from four
business lines of a large commercial bank. Table 1 reproduces their parameter estimates.
The four business lines display very different dynamics. The negative θ in business lines
1 and 3 indicates that positive returns have a larger impact on volatility than negative
returns, the opposite of the usual leverage effect. Business line 2 is characterized by
highly persistent volatility and fat tails. Business line 4 has very high unconditional
volatility. Using the parameter estimates in Table 1, we can generate returns that have
dynamics similar to the P/L series from the four business lines.

Table 1: Parameter Estimates of NGARCH-t(d) Model for Four Business Lines
Business Line 1 Business Line 2 Business Line 3 Business Line 4
d 3.808 3.318 6.912 4.702
θ -0.245 0.503 -0.962 0.093
β 0.749 0.928 0.873 0.915
α 0.155 0.052 0.026 0.072
ω 0.550 0.215 0.213 1.653

4.2 Historical Simulation

In general, VaR with coverage rate p is the p-th quantile of the conditional distribution
of the return R_{t+1}:

\[ \mathrm{VaR}_{t+1}(p) = F_{t+1}^{-1}(p), \tag{22} \]

where F_{t+1} denotes the conditional distribution of R_{t+1}. We compute VaR forecasts using
the Historical Simulation approach, which is widely adopted by financial institutions
because it is nonparametric and easy to implement. Historical Simulation uses the past
one or two years of returns to construct an empirical estimate of F_{t+1}:

\[ \mathrm{VaR}^{HS}_{t+1}(p) = \mathrm{percentile}\left(\{R_s\}_{s=t-T_e+1}^{t},\; p\right), \tag{23} \]

where T_e is the size of the rolling window that is used to approximate the conditional
distribution. Following industry practice, we choose T_e to be 250, which roughly corresponds
to one year of trading days.
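A rolling-window implementation of (23) is short; a sketch (the sign and interpolation conventions for the percentile vary across implementations):

    import numpy as np

    def hs_var(returns, p=0.01, window=250):
        # Historical Simulation: the p-th percentile of the previous `window`
        # returns; a violation occurs when the realized return falls below it.
        returns = np.asarray(returns)
        return np.array([np.percentile(returns[t - window:t], 100.0 * p)
                         for t in range(window, len(returns))])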
Despite its popularity, Historical Simulation is problematic since it assumes that
returns from the past T_e days are independently and identically distributed. It does not
take into consideration the predictability of volatility; the time-varying nature of volatility
is only reflected through the rolling window. Moreover, Pritsker (2006) points out that
Historical Simulation is under-responsive to changes in risk. For example, suppose that
the market crashed yesterday. The VaR forecast for today should reflect this large
increase in risk. However, when computing VaR with Historical Simulation, the magnitude
of this very negative return does not directly impact the forecasts.
Historical Simulation reacts slowly to market dynamics such as the ones described
by NGARCH models, and we can use simulations to assess the impact on violation
probabilities. Specifically, we generate a sequence of P/Ls with T = 100,000 from (21) using
the parameters estimated from the four business lines; then we compute VaR forecasts using
(23). By comparing the ex-post returns with the ex-ante VaRs we obtain the sequence of
violations.
Figure 1 demonstrates how the inadequacy of Historical Simulation results in both
duration dependence and dependence between the violation probability and the VaR
forecasts. In the left panel, we sort VaR forecasts into 30 equally spaced bins and plot the
fraction of observations where I_t = 1 against the mean value of each bin (red dots). The
size of each dot is proportional to the number of VaR forecasts that fall into that bin. The
fraction of violations, which measures the empirical hazard rate assuming duration
independence, decreases as VaR increases, as expected. We also fit the hazard function
specified for the VaR test, λ^i_d = a e^{−c VaR_{t_i+d}}, and plot it as blue lines. The fitted
hazard rates correspond quite well to the empirical ones in all business lines,⁵ suggesting
that our specification is well suited for capturing VaR dependence. In the right panel, we
plot the empirical hazard rate assuming VaR independence, λ̂(d) = m(d)/M(d), where
m(d) = Σ_i 1{D_i = d} is the number of durations equal to d, and M(d) = Σ_i 1{D_i ≥ d}
is the number of durations that are "at risk". The size of each dot corresponds to m(d).
The fitted hazard function from the geometric test, λ_d = a d^{b−1}, is plotted as blue lines.
These figures illustrate the importance of not restricting the dependence structure to the
first few lags. They also suggest that the power-law decay we adopted for the hazard
function adequately describes the dynamics of violations in different realistic settings.

⁵ The blue line roughly corresponds to a weighted non-linear regression on the red dots, where the sizes of the dots indicate the weights.
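The empirical hazard used in the right panels is easy to compute from the durations; a sketch (censoring ignored for simplicity):

    import numpy as np

    def empirical_hazard(durations):
        # lambda_hat(d) = m(d) / M(d), where m(d) counts durations equal to d
        # and M(d) counts durations of at least d (the spells still "at risk").
        durations = np.asarray(durations)
        dmax = int(durations.max())
        m = np.array([np.sum(durations == d) for d in range(1, dmax + 1)])
        M = np.array([np.sum(durations >= d) for d in range(1, dmax + 1)])
        return m / M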

4.3 Size of the Duration-based Tests

In this subsection, we illustrate the need for using Dufour’s (2006) Monte Carlo
method described in Section 3 to control the size of the duration-based tests. To do so,
we generate returns from an NGARCH-t(d) process and then compute VaRs using the true
conditional distribution of returns. Specifically, VaR_t(p) = F_t^{−1}(p), where F_t is the t(d)
distribution with variance equal to σ_t², and the coverage rate p is set to 1% and 5% as
commonly reported in practice. The hit sequence is obtained by comparing returns with
VaRs, and the test statistics LR_0 for each test are computed as in Section 3. This gives us
an i.i.d. hit sequence as well as realistic values for the VaR. The sample size T varies from
250 to 1500, which roughly corresponds to one through six years of trading days. The
parameters of the model are {d, θ, β, α, ω} = {10, 0, 0.93, 0.05, 0.21}.
Table 2 reports the size of the tests from 10,000 replications for 5% VaR and a desired
significance level of 10%. The results for 1% VaR are similar and are available upon
request. We first report the size based on chi-squared asymptotic distributions, which
do not take into account the effect of testing parameter values at the boundary of the
parameter space. We can see that in this case, most tests are undersized, except for the
unconditional coverage, VaR independence and VaR tests, which are oversized for the
smallest sample sizes.
In the second part of Table 2 we report the size based on asymptotic critical values
that take into account testing parameters on the boundary. We obtain these asymptotic
critical values by simulations: for a very large sample size (T = 50, 000) we simulate data
as in the first part of Table 2 and for each sample (we repeat 10,000 times) we evaluate
the test statistics. The 10% critical value is taken as the 90th percentile of the simulated
test statistics, and a test rejects if the statistic is larger than the critical value.


Figure 1: Empirical Hazard Rate. The left panels plot the violation probability against VaRs.
The right panels plot the empirical hazard rate against durations.

At the bottom of Table 2 we report these simulated critical values; they are indeed
different from the ones we would obtain from a chi-squared distribution.
Using simulated asymptotic critical values, the size of all tests increases compared to
when we use critical values from a chi-squared distribution. As a result, some tests, like VaR
independence, go from having the correct size to being oversized, while others, like geometric-
VaR, go from undersized to oversized. We also see that the duration independence test
remains undersized with the simulated critical values.
Table 2 illustrates that the test statistics are not chi-squared distributed under the
null, and their finite sample properties differ from the asymptotic ones. A simple solution
to this problem is to use Dufour’s (2006) Monte Carlo method described in Section 3.
It works for finite samples, accommodates testing parameters on the boundary of the
parameter space and takes into account the sample selection.

4.4 Finite Sample Power of the Tests

We are interested in the finite sample power of the tests when VaR is misspecified. To
mimic real-world settings, we generate P/Ls with sample size T + T_e using parameters
from Table 1, where T_e = 250 and T varies from 250 to 1500. The VaR forecasts are
obtained from Historical Simulation with a rolling window of T_e days. After computing
test statistics LR_0, we obtain the p-values using Dufour’s (2006) Monte Carlo technique.
Specifically, we generate a hit sequence of the same sample size T from a Bernoulli
distribution and generate VaRs from an independent NGARCH process, and then we
compute LR_i, i = 1, ..., N, for each test. We choose N to be 9,999 for the simulated
distribution. The p-value is obtained by comparing LR_0 to LR_i, i = 1, ..., N, using
equation (20).
The power of each test is computed as the rejection frequency from 5,000 replications
of the backtesting procedure described above. The significance level is chosen to be 10%
for all tests; that is, a hypothesis is rejected if the p-value computed using the Monte
Carlo technique is smaller than 10%.

Table 2: Size of 10% Duration-based Tests applied to 5% VaR

Sample Size   UC   Duration ind.   VaR ind.   Geometric   VaR   GV
Using chi-squared asymptotic critical values
250 0.144 0.029 0.126 0.057 0.122 0.085
500 0.120 0.033 0.105 0.067 0.106 0.072
750 0.126 0.035 0.093 0.063 0.096 0.067
1000 0.119 0.037 0.094 0.061 0.091 0.064
1250 0.112 0.035 0.093 0.060 0.085 0.065
1500 0.103 0.040 0.085 0.060 0.086 0.062
Using simulated test statistics for sample size 50,000
250 0.144 0.063 0.215 0.103 0.176 0.149
500 0.120 0.069 0.181 0.109 0.157 0.135
750 0.126 0.075 0.168 0.099 0.135 0.126
1000 0.119 0.079 0.170 0.101 0.142 0.128
1250 0.112 0.081 0.169 0.100 0.129 0.120
1500 0.103 0.079 0.156 0.097 0.126 0.119
Simulated Critical Value
2.719 1.606 1.663 3.761 3.819 4.791
Chi-squared Critical Value
2.706 2.706 2.706 4.605 4.605 6.251

Notes (1): For “chi-squared asymptotic critical values”, we generate an i.i.d. Bernoulli hit sequence and
VaR regressors that are independent of the simulated hit sequence to assess the size properties. The
asymptotic critical values are computed from the chi-squared distribution. See the text for details on
each test.
Notes (2): For “sample size 50,000”, we generate returns and VaRs from the same NGARCH-t(d)
process and obtain the hit sequence by comparing returns to VaRs. The empirical size of each test is the
rejection frequency from 10,000 replications. The asymptotic distributions of the test statistics are computed
from 10,000 simulations with sample size 50,000. UC stands for unconditional coverage test and GV
stands for geometric-VaR test.

We conduct the power exercise using both 1% and 5% VaR,
and the simulated power of each test is reported in Table 3 and Table 4. The different
business lines in Table 3 and Table 4 indicate that the simulated P/Ls are generated
using parameters estimated from the corresponding business line.
To compare our test to the regression-based approach, we also present in the tables the
power of a CaViaR test taken from Berkowitz, Christoffersen, and Pelletier (2011). The
CaViaR test estimates the binary choice model in (9) assuming that the error term has
a logistic distribution. The regressors are chosen to be I_{t−1}, VaR_t, and a constant. The
test then examines whether the coefficients on I_{t−1} and VaR_t are statistically significant
and whether Pr(I_t = 1) = p using likelihood ratios.
The power of the unconditional coverage test is generally low for all business lines,
since the unconditional coverage test does not consider higher-order dynamics of
violations. Also, we expect the duration-based unconditional coverage test to have
performance similar to the unconditional test of Kupiec, which compares the actual number
of violations with the expected number (see Kupiec, 1995). The Kupiec test has been
documented to have low power; see, for example, Pritsker (2006), Pérignon and Smith
(2008) and Berkowitz, Christoffersen, and Pelletier (2011). Moreover, Escanciano and
Pei (2012) show that the unconditional test is always inconsistent for backtesting
Historical Simulation models, and this also applies to our duration-based test. They also
find that the inconsistency is more severe when the volatility is less persistent, which is
in line with what we observe in business lines 1 and 3; in both cases the negative θ and
relatively small volatility persistence lead to a less dependent left tail and hence the very
low power of the unconditional coverage test.
The duration independence test performs well except for business line 3. Business
line 3 is characterized by a large negative θ, so a negative shock tends to be followed

Table 3: Power of 10% Duration-based Tests on 1% VaR in Four Business Lines
Sample Size   UC   Duration ind.   VaR ind.   Geometric   VaR   GV   CaViaR
Business Line 1
250 0.112 0.387 0.291 0.345 0.302 0.441 0.420
500 0.063 0.428 0.534 0.225 0.320 0.595 0.429
750 0.041 0.518 0.668 0.335 0.401 0.619 0.539
1000 0.029 0.580 0.754 0.328 0.475 0.693 0.618
1250 0.022 0.639 0.837 0.384 0.608 0.782 0.682
1500 0.018 0.668 0.887 0.416 0.673 0.834 0.737
Business Line 2
250 0.188 0.337 0.288 0.338 0.344 0.438 0.451
500 0.104 0.442 0.508 0.264 0.357 0.594 0.430
750 0.070 0.546 0.613 0.361 0.395 0.608 0.480
1000 0.071 0.609 0.683 0.357 0.450 0.666 0.532
1250 0.070 0.669 0.746 0.415 0.549 0.748 0.580
1500 0.058 0.724 0.801 0.476 0.594 0.808 0.617
Business Line 3
250 0.070 0.145 0.305 0.118 0.282 0.246 0.333
500 0.046 0.147 0.567 0.066 0.311 0.442 0.329
750 0.032 0.154 0.699 0.062 0.399 0.440 0.410
1000 0.012 0.142 0.798 0.039 0.505 0.514 0.526
1250 0.012 0.142 0.888 0.038 0.648 0.631 0.611
1500 0.010 0.146 0.931 0.035 0.739 0.719 0.686
Business Line 4
250 0.200 0.385 0.289 0.379 0.364 0.475 0.471
500 0.111 0.496 0.492 0.291 0.353 0.613 0.452
750 0.076 0.612 0.597 0.424 0.397 0.638 0.510
1000 0.081 0.693 0.672 0.448 0.472 0.724 0.574
1250 0.080 0.751 0.752 0.513 0.563 0.798 0.612
1500 0.082 0.803 0.801 0.579 0.603 0.850 0.655
Notes: We simulate P/Ls using NGARCH-t(d) models that have the same parameters as the four business
lines and then compute VaR using Historical Simulation with a rolling window of size 250. The simulated
power of each test is the rejection frequency from 5,000 replications. UC stands for unconditional coverage
test and GV stands for geometric-VaR test. CaViaR is a regression-based test. See the text for details
on each test.

Table 4: Power of 10% Duration-based Tests on 5% VaR in Four Business Lines
Sample Size   UC   Duration ind.   VaR ind.   Geometric   VaR   GV   CaViaR
Business Line 1
250 0.156 0.471 0.533 0.349 0.425 0.571 0.447
500 0.063 0.679 0.708 0.476 0.527 0.737 0.517
750 0.028 0.794 0.804 0.619 0.611 0.850 0.611
1000 0.016 0.865 0.885 0.694 0.725 0.923 0.692
1250 0.015 0.908 0.917 0.772 0.778 0.956 0.761
1500 0.008 0.938 0.950 0.839 0.857 0.977 0.851
Business Line 2
250 0.345 0.478 0.548 0.493 0.610 0.686 0.583
500 0.249 0.716 0.669 0.632 0.659 0.808 0.617
750 0.170 0.827 0.735 0.735 0.699 0.882 0.662
1000 0.144 0.898 0.788 0.808 0.760 0.934 0.702
1250 0.122 0.937 0.813 0.859 0.789 0.962 0.741
1500 0.123 0.965 0.838 0.908 0.833 0.982 0.819
Business Line 3
250 0.054 0.130 0.557 0.069 0.348 0.345 0.299
500 0.019 0.149 0.742 0.044 0.456 0.431 0.351
750 0.005 0.149 0.853 0.057 0.559 0.540 0.430
1000 0.002 0.156 0.927 0.044 0.693 0.653 0.511
1250 0.002 0.167 0.962 0.047 0.793 0.761 0.583
1500 0.002 0.179 0.978 0.054 0.855 0.822 0.713
Business Line 4
250 0.330 0.516 0.537 0.520 0.609 0.698 0.590
500 0.223 0.752 0.670 0.659 0.671 0.832 0.625
750 0.155 0.884 0.737 0.791 0.714 0.910 0.672
1000 0.135 0.932 0.783 0.856 0.769 0.952 0.729
1250 0.118 0.959 0.833 0.901 0.807 0.971 0.768
1500 0.111 0.979 0.842 0.938 0.848 0.986 0.837
Notes: We simulate P/Ls using NGARCH-t(d) models that have the same parameters as the four business
lines and then compute VaR using Historical Simulation with a rolling window of size 250. The simulated
power of each test is the rejection frequency from 5,000 replications. UC stands for unconditional coverage
test and GV stands for geometric-VaR test. CaViaR is a regression-based test. See the text for details
on each test.

Table 5: Fraction of Samples Where the Test Is Feasible (1% and 5% VaR)
VaR coverage   Sample Size   Geometric   GV
Business Line 1
1%   250   0.7169   0.4690
1%   500   0.9815   0.9271
1%   750   0.9998   0.9968
5%   250   1.0000   0.9972
Business Line 2
1%   250   0.6820   0.4831
1%   500   0.9696   0.9027
1%   750   0.9982   0.9903
5%   250   0.9862   0.9760
Business Line 3
1%   250   0.7332   0.4319
1%   500   0.9913   0.9475
1%   750   1.0000   0.9992
5%   250   1.0000   1.0000
Business Line 4
1%   250   0.6892   0.4974
1%   500   0.9684   0.9048
1%   750   0.9990   0.9948
5%   250   0.9919   0.9792
Notes: We report the fraction of simulations where we can compute the geometric and geometric-VaR (GV)
tests.

by low volatility, and violation clustering is less likely to happen. Since the duration
independence test deals with the time dependence of violations, it is not surprising that
the test has low power against this type of misspecification (opposite leverage effects), while
the VaR independence test still has good power properties for this business line. The VaR
independence test picks up the dependence between violations and VaR forecasts after the
time dependence is accounted for. Its power is lower than that of the duration independence
test in business lines 1, 2 and 4. The duration independence test has the highest power in
these business lines, but it does not consider unconditional coverage, which determines
the penalty for banks under current regulation.
The geometric test is a combination of the unconditional coverage and duration
independence tests; its likelihood ratio test statistic is the sum of the two individual test
statistics, and its power is closely related to the power of the two individual tests. The VaR
test jointly tests the unconditional coverage hypothesis and the VaR independence hypothesis
while assuming duration independence. The VaR test has lower power than the geometric
test in business lines 1, 2 and 4, but it has better power against the misspecifications in
business line 3. The CaViaR test utilizes the lagged hit sequence in addition to VaR, and it
performs better than the VaR test for the 1% coverage rate. With the 5% coverage rate, the
VaR test has higher power, possibly due to the error term specification.
Finally, the geometric-VaR test has the highest power in most cases when unconditional
coverage is considered, outperforming the regression-based CaViaR test and the other
duration-based tests. It combines the strengths of the three individual hypothesis tests,
namely the unconditional coverage test, the duration independence test and the VaR
independence test, and it has good power properties against various types of misspecification.
To conduct the geometric-VaR test, we need to have more than three durations, or
three durations with at least one of them uncensored. Roughly speaking, this selection
criterion corresponds to having at least three violations.⁶ The VaR independence test
also requires the estimation of all three parameters, hence it has the same feasible fraction
as the geometric-VaR test. The duration independence test, geometric test and VaR test
all require the estimation of two parameters, hence we need three durations, or two
durations with at least one of them uncensored. Table 5 reports the fraction of samples
where the test is feasible. The feasible fraction affects the effective power of these tests.
For example, we could compute the effective power as the product of the rejection
frequency in Tables 3 and 4 with the fractions in Table 5. This measures the overall fraction
of samples for which we can reject the null.

⁶ Unless both the first and the last observations of the hit sequence are violations, in which case three violations correspond to two uncensored durations.

5 Empirical Results

We apply our tests to the actual business line P/Ls and VaR data from Berkowitz,
Christoffersen, and Pelletier (2011) to assess the performance of the VaR forecasts
provided by the bank. The left panel of Figure 2 displays the time series of P/Ls and VaR
forecasts for the four business lines. The right panel of Figure 2 displays the time series
of violations, where the magnitude shown is the loss on the day of the violation. Table 6
reports descriptive statistics for the four business lines. Business line 3 has only one
violation, so none of the duration-based tests could be conducted.
The test statistics are computed using equations (13) through (18). To get reliable
p-values, we utilize the Monte Carlo technique described above; both the test statistics
and the p-values are reported in Table 7.
Our geometric-VaR test rejects the VaR models in both business line 2 and business
line 4 at a 5% significance level. The unconditional coverage test fails to reject any of the
business lines considered here, while the null of duration independence is rejected for all
three business lines at a 10% significance level. This is consistent with casual observation
of Figure 2: for business lines 1, 2 and 4, there are signs of violation clustering, yet the
average number of violations is not far from the expected value.


Figure 2: Time Series of P/Ls, VaRs and Violations from Four Business Lines. The left
panels display P/Ls (dashed lines) and one-day 1% VaRs (solid lines). The right panels
display violations, where the magnitude shown is the P/L on the day of the violation.

Table 6: P/Ls and VaRs for Four Business Lines: Descriptive Statistics
                 Business Line 1   Business Line 2   Business Line 3   Business Line 4
P/Ls
Number of obs.   873               811               623               623
Mean             0.1922            1.5578            1.8740            3.1562
Std. Dev.        2.6777            5.2536            1.6706            9.2443
VaRs
Number of obs.   873               811               623               623
Mean             -7.2822           -16.3449          -3.2922           -24.8487
Std. Dev.        3.1321            10.5446           1.1901            6.6729
Observed number of hits   9        5                 1                 4
Expected number of hits   9        8                 6                 6

Table 7: Backtesting Actual VaRs from Four Business Lines


UC   Duration ind.   VaR ind.   Geometric   VaR   GV   CaViaR
Business Line 1
Test Value 0.062 1.236 0.403 1.297 1.265 1.701 3.227
p-value 0.830 0.081 0.289 0.369 0.444 0.452 0.278
Business Line 2
Test Value 2.576 1.277 5.003 3.853 8.856 8.856 4.856
p-value 0.112 0.074 0.016 0.126 0.019 0.013 0.131
Business Line 4
Test Value 2.082 2.930 1.271 5.012 5.079 6.283 4.104
p-value 0.269 0.021 0.159 0.060 0.088 0.037 0.177
Notes: We report the test statistics and p-value from actual P/Ls and VaRs. The bold numbers are
p-values less than 10%.

Business line 1 has exactly
the expected number of violations, which explains why all the joint tests fail to reject.
For business line 2, the null of VaR independence is rejected at a 5% significance level
(p-value 0.016), as is the joint test of unconditional coverage and VaR independence
(p-value 0.019). Business line 4 shows evidence of violation clustering, as both the duration
independence test and the geometric test reject the null that VaR is correctly specified
at a 10% significance level.
The empirical results show that the geometric-VaR (GV) test has power against
different forms of alternatives. There are cases where the geometric test performs well
but the VaR test fails to reject, and other cases where the opposite holds. The
geometric-VaR test combines the strengths of the two tests and does well throughout,
without losing much power from estimating an additional parameter.

6 Conclusions

VaR has become the standard risk measure for financial institutions and regulators. The
estimation of VaR is challenging, and popular implementations of VaR estimation are
based on unrealistic assumptions. We propose the geometric-VaR test to evaluate the
performance of VaR forecasts. This test draws strength from a duration-based approach,
which captures general forms of time dependence in violations, as well as from a
regression-based approach, which captures the dependence between VaR forecasts and
violations. It can be decomposed into a test of unconditional coverage, a test of duration
independence and a test of VaR independence.
We conduct a Monte Carlo study to assess the power of the geometric-VaR test as
well as its duration-based components. The dynamics of the P/L series are based on
actual desk-level data. We find that the geometric-VaR test has higher power than other
duration-based tests or regression-based approaches. We apply the tests to the actual
business line P/Ls and VaRs. The geometric-VaR test is able to reject the efficiency of
the VaR forecasts in two of the three business lines we consider. In particular, the
geometric-VaR test is able to detect VaR dependence when the geometric test fails to
reject in business line 2, and it is able to capture duration dependence when the VaR
test fails to reject in business line 4. Our findings suggest that the geometric-VaR test
offers a good alternative for detecting various forms of VaR misspecification.
This paper mainly focuses on backtesting from the standpoint of regulators, whose
information set contains only returns and VaR forecasts. As a natural extension of
this paper, it would be interesting to apply the geometric-VaR test to investigate the
strengths and shortcomings of current VaR forecasting techniques and to help risk managers
select a VaR model (see, e.g., Alexander (2009) for an extensive review of VaR models).
In connection with this topic, one may consider the impact of estimation risk on the
backtesting procedure (see, e.g., Escanciano and Olmo (2010, 2011)) or how to render these
tests insensitive to estimation risk (see, e.g., Bontemps (2014)). We could also explore the
inclusion of other explanatory variables in the hazard function. For example, one may
consider the difference between returns and VaRs, which can be linked to the literature
on testing the magnitude of exceedances; see, e.g., Lopez (1999), Berkowitz (2001) and
Colletaz, Hurlin, and Pérignon (2013). We leave these topics to future research.

References

Alexander, C. (2009). Market Risk Analysis, Value at Risk Models, vol. 4. John Wiley
& Sons.

Basel Committee on Banking Supervision (2006). Basel II: International convergence of
capital measurement and capital standards: A revised framework. Bank for International
Settlements.

Berkowitz, J. (2001). ‘Testing Density Forecasts, with Applications to Risk Management’,
Journal of Business & Economic Statistics, 19(4): 465–74.

Berkowitz, J., Christoffersen, P., and Pelletier, D. (2011). ‘Evaluating value-at-risk models
with desk-level data’, Management Science, 57(12): 2213–2227.

Berkowitz, J., and O’Brien, J. (2002). ‘How Accurate Are Value-at-Risk Models at Com-
mercial Banks?’, Journal of Finance, 57(3): 1093–1111.

Bontemps, C. (2014). ‘Simple moment-based tests for value-at-risk models and discrete
distributions’, Working paper.

Candelon, B., Colletaz, G., Hurlin, C., and Tokpavi, S. (2011). ‘Backtesting Value-at-
Risk: A GMM Duration-Based Test’, Journal of Financial Econometrics, 9(2): 314–
343.

Christoffersen, P., and Pelletier, D. (2004). ‘Backtesting Value-at-Risk: A Duration-Based
Approach’, Journal of Financial Econometrics, 2(1): 84–108.

Christoffersen, P. F. (1998). ‘Evaluating Interval Forecasts’, International Economic
Review, 39(4): 841–62.

Colletaz, G., Hurlin, C., and Pérignon, C. (2013). ‘The Risk Map: A new tool for vali-
dating risk models’, Journal of Banking & Finance, 37(10): 3843–3854.

Dufour, J.-M. (2006). ‘Monte Carlo tests with nuisance parameters: A general approach
to finite-sample inference and nonstandard asymptotics’, Journal of Econometrics,
133(2): 443–477.

Dumitrescu, E.-I., Hurlin, C., and Pham, V. (2012). ‘Backtesting Value-at-Risk: From
Dynamic Quantile to Dynamic Binary Tests’, Working Papers halshs-00671658, HAL.

Engle, R. F., and Manganelli, S. (2004). ‘CAViaR: Conditional Autoregressive Value at
Risk by Regression Quantiles’, Journal of Business & Economic Statistics, 22: 367–
381.

Engle, R. F., and Ng, V. K. (1993). ‘Measuring and Testing the Impact of News on
Volatility’, Journal of Finance, 48(5): 1749–78.

Escanciano, J. C., and Olmo, J. (2010). ‘Backtesting Parametric Value-at-Risk With
Estimation Risk’, Journal of Business & Economic Statistics, 28(1): 36–51.

Escanciano, J. C., and Olmo, J. (2011). ‘Robust Backtesting Tests for Value-at-risk
Models’, Journal of Financial Econometrics, 9(1): 132–161.

Escanciano, J. C., and Pei, P. (2012). ‘Pitfalls in backtesting Historical Simulation VaR
models’, Journal of Banking & Finance, 36(8): 2233–2244.

Gaglianone, W. P., Lima, L. R., Linton, O., and Smith, D. R. (2011). ‘Evaluating Value-
at-Risk Models via Quantile Regression’, Journal of Business & Economic Statistics,
29(1): 150–160.

Haas, M. (2005). ‘Improved Duration-Based Backtesting of Value-at-Risk’, Journal of


Risk, 8(2): 17–38.

Jorion, P. (2006). Value at Risk: The New Benchmark for Managing Financial Risk.
McGraw-Hill.

Kiefer, N. M. (1988). ‘Economic Duration Data and Hazard Functions’, Journal of Eco-
nomic Literature, 26(2): 646–79.

Kupiec, P. (1995). ‘Techniques for verifying the accuracy of risk measurement models’,
The Journal of Derivatives, 3(2).

Lopez, J. A. (1999). ‘Regulatory evaluation of value-at-risk models’, Journal of Risk,
1(2): 37–64.

Pérignon, C., and Smith, D. R. (2008). ‘A New Approach to Comparing VaR Estimation
Methods’, The Journal of Derivatives, 16(2): 54–66.

Pérignon, C., and Smith, D. R. (2010). ‘The level and quality of Value-at-Risk disclosure
by commercial banks’, Journal of Banking & Finance, 34(2): 362–377.

Pritsker, M. (2006). ‘The hidden dangers of historical simulation’, Journal of Banking &
Finance, 30(2): 561–582.

Stein, W. E., and Dattero, R. (1984). ‘A New Discrete Weibull Distribution’, IEEE
Transactions on Reliability, R-33(2): 196–197.

