IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 4, JULY 2001

Forecasting Volatility with Neural Regression:


A Contribution to Model Adequacy
Apostolos-Paul N. Refenes and Will T. Holt

Abstract—In recent years neural networks have reportedly achieved considerable successes in a variety of forecasting applications. Although the results are usually accompanied by extensive empirical validation, practitioners and statisticians still remain skeptical: the curse of overfitting is compounded by the lack of rigorous procedures for model identification, selection, and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin–Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection, and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy, with the results confirming the presence of nonlinear relationships in implied volatility innovations.

Index Terms—Autocorrelation, Durbin–Watson statistic, neural networks, residual diagnostics, volatility forecasting.

I. INTRODUCTION

IN recent years an impressive array of publications has appeared claiming considerable success of neural networks in forecasting applications in business and finance. Some have provided empirical results to support these claims,¹ but many fell short of this, prompting skeptical practitioners and statisticians to raise the question of whether neural networks really are "a major breakthrough or just a passing fad." A number of factors justify this skepticism: first, neural networks are universal approximators and are thus capable of providing a model which fits any data with an arbitrary degree of accuracy; second, the lack of procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated, makes it difficult to assess a model's significance and raises the possibility that any short-term successes that are reported might be due to "data mining."

The results of many applications are perfectly plausible, but in the absence of a structured procedure for misspecification testing, models are exposed to justifiable criticism. Recent years have seen the emergence of a more structured approach to model building for neural networks. Reference [30] developed a test for model parameter significance which was the first important contribution to statistical identification for neural models. Reference [1] extended this work and provided procedures for variable significance testing. Reference [24] provided expressions of neural model complexity by generalizing the linear equivalents. Reference [25] examined issues of architecture selection and simple metrics for variable relevance. Reference [28] gave a comprehensive examination of variable relevance metrics and assessed methods of deriving the distribution of the metrics. However, all work to date has been concerned with variable and parameter testing—the problem of model adequacy has yet to be addressed. Conventional econometric model selection techniques require the residuals to be examined for structure as a way of ensuring the model is not misspecified. This is particularly important for estimation procedures which are susceptible to partial convergence (e.g., local minima, flat valleys).

In this paper we present a methodology for model misspecification testing, based on analysis of the model residuals, which is applicable to all types of "neural regression." Such tests are a necessary, but not sufficient, condition for model adequacy. We demonstrate its usefulness in a real-world application by developing a neural predictor of implied volatility innovations for options pricing within a robust statistical framework. We choose input variables with tests of significance and introduce a much needed residual diagnostic test procedure for autocorrelation in the residuals. We derive the distribution of the statistic so that a formal statistical test may be performed. The methodology for misspecification testing can be deployed by other researchers who wish to apply neural modeling to forecasting applications.

In Section II of this paper, we discuss the general issues involved in misspecification testing, and particularly the issues arising for neural regression. We also introduce a new residual diagnostic test procedure for autocorrelation based on the Durbin–Watson statistic and assess its power with a Monte Carlo simulation. Section III presents a case study of volatility forecasting for options pricing using neural regression. Statistics are used to test every model assumption; we examine the significance of individual model variables, select the architecture in a principled way, and perform residual diagnostics to test the assumptions of the residual errors. We compare our neural model to univariate time-series and multivariate linear regression techniques. Section IV concludes the paper and the Appendix contains the mathematical derivations.

Manuscript received August 9, 2000; revised February 9, 2001 and March 28, 2001. This work was supported by ROPA under Grant RO22250057, BT Laboratories, the U.K. Economic and Social Research Council (E.S.R.C.), and the Ph.D. Programme, London Business School. The authors are with the Department of Decision Sciences, London Business School, London NW1 4SA, U.K. Publisher Item Identifier S 1045-9227(01)05012-3.

¹Traditional model validation using out-of-sample (test set) performance can be seriously misleading when performed on a small sample and may not be possible if we do not have enough data. In contrast, linear regression models have a wide variety of statistical tests for variable and parameter significance to enable parsimonious models to be produced. Statistics are available to test every model assumption; the F-test may be used to assess the whole model, t-tests examine the significance of model parameters, and residual diagnostics (e.g., [8], [9], [13]) test the assumptions of the residual errors. These tests are particularly important in situations where model selection is based on small data sets, without a hold-out sample.
II. MISSPECIFICATION TESTING FOR NEURAL REGRESSION

A. Neural Regression

Consider the neural regression model described by

$y = g(X; w) + \varepsilon$   (1)

where
$y$  $(n \times 1)$ vector of observations on the dependent variable;
$X$  nonstochastic $(n \times k)$ matrix of independent variables;
$g$  nonlinear function;
$\varepsilon$  $(n \times 1)$ vector of residual errors.

The regressor is defined by the functional form $\hat y = g(X; \hat w)$, where $\hat w$ is a vector of model parameters (weights). The regressor may take many forms, from linear regression to nonparametric nonlinear neural regression. For any given regressor $g(X; w)$, the accuracy to which a given set of parameters, $\hat w$, fits the data is measured by an error function $E(\hat w)$. The error function is associated with the specific objective of the modeling process (e.g., conditional mean, conditional median, conditional variance, etc.). The general requirement of any error function is that it must provide a monotonic measure of discrepancy between model and objective. Typically, the error function is given by the mean squared error

$E(w) = \frac{1}{n}\sum_{t=1}^{n} e_t^2$   (2)

where $e_t = y_t - g(x_t; w)$.

The objective of "neural learning" is to estimate the parameters $\hat w$ which minimize the empirical error function

$\hat w = \operatorname{argmin}_w E(w)$, where $E(w) = \frac{1}{n}\sum_{t=1}^{n} \big(y_t - g(x_t; w)\big)^2$   (3)

This is achieved using an iterative minimization procedure which produces a converged least squares estimator. The fact that neural networks are able to fit any continuous and differentiable function leads to the "bias-variance" dilemma. It may be easily shown that the empirical risk (mean squared error) may be decomposed into two components

$E\big[(g(X;\hat w) - E[y \mid X])^2\big] = \underbrace{\big(E[g(X;\hat w)] - E[y \mid X]\big)^2}_{\text{bias}^2} + \underbrace{E\big[(g(X;\hat w) - E[g(X;\hat w)])^2\big]}_{\text{variance}}$   (4)

As sample size increases, both bias and variance should decrease to zero. Factors including architecture and estimation parameters (e.g., convergence criteria, learning rate) define the complexity of the model. Reducing the complexity is analogous to making assumptions in the modeling process. With too few hidden units (i.e., parameters in $w$), the fitted function will on average be different from the true regression. However, with fewer assumptions about the data generating process (i.e., a large network) there is a serious risk that noise in the data will be fitted as well as any deterministic component. With a complex network, spurious relationships may be found that degrade the performance of the system. Neural modeling must therefore trade off variance for bias; for example, an assumption may be made that the network should "fit a smooth function."

Reference [4] places neural networks in context with the other major forecasting methodologies, observing that neural models offer a freedom from statistical assumptions, resilience to missing observations, and an ability to incorporate nonlinear relationships. The price of this freedom is a reliance on empirical performance for validation, due to the lack of statistical diagnostics and easily understandable model structure. Bunn states, "it may be that even a large preponderance of empirical evidence in its favor will not be sufficient to create confidence in the technology without more research into explainability and robustness diagnostics." Much recent work has therefore concentrated on strengthening the statistical foundations of neural model identification procedures. One area that has been neglected is that of model misspecification testing. It is not generally appreciated that the specification of a model includes assumptions about the residual errors, usually the following.

1) $E(\varepsilon_t) = 0$: the error term has a mean of zero, i.e., the model is unbiased.
2) $\mathrm{Var}(\varepsilon_t) = \sigma^2$: the error term has constant variance.
3) $E(\varepsilon_t \varepsilon_s) = 0$, $t \neq s$: error terms at different points in time are uncorrelated.

In regression modeling it is vital that these assumptions are not violated for the parameter estimates to be optimal (i.e., unbiased and efficient). In general, three types of test may therefore be performed on the residuals.

• Autocorrelation: the error in one time period should not be affected by the error in another time period, e.g., [8].
• Heteroscedasticity: the errors should have a constant variance, e.g., [13].
• Normality: the errors should be characterized by a normal distribution, e.g., [33].

In earlier work [5], we introduced a test for heteroscedasticity that is applicable to neural models. In the next sections we examine autocorrelation in the residuals and present a generalization of the Durbin–Watson statistic for neural regression models. These tests provide a "toolkit" to test neural models for misspecification and complete the modeling methodology of [28].
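As a concrete illustration of (1)–(3), the sketch below fits a single-hidden-layer network by nonlinear least squares on synthetic data. The architecture, the data generating process, and the use of SciPy's least-squares solver are illustrative assumptions of ours rather than the authors' implementation.

```python
# Minimal sketch of the neural regression estimator (1)-(3): a network
# g(X; w) with sigmoidal hidden units and a linear output node, fitted
# by nonlinear least squares. Architecture and data are illustrative.
import numpy as np
from scipy.optimize import least_squares

def mlp(w, X, n_hidden):
    """g(X; w): one hidden layer of sigmoids, linear output node."""
    n_in = X.shape[1]
    W1 = w[:n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = w[n_in * n_hidden:n_in * n_hidden + n_hidden]
    W2 = w[n_in * n_hidden + n_hidden:-1]
    b2 = w[-1]
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden units
    return h @ W2 + b2                          # linear output

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)

n_hidden = 2
n_in = X.shape[1]
n_params = n_in * n_hidden + n_hidden + n_hidden + 1
w0 = rng.uniform(-0.5, 0.5, size=n_params)

# w_hat = argmin_w sum_t (y_t - g(x_t; w))^2, cf. (3)
fit = least_squares(lambda w: y - mlp(w, X, n_hidden), w0)
e = y - mlp(fit.x, X, n_hidden)   # residuals for the diagnostics below
```

The residual vector produced by such a converged least squares fit is the raw material for the misspecification tests developed in the next section.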

B. Autocorrelation and the Generalized Durbin–Watson Test in Neural Regression

Probably the most important assumption concerning the errors is that of independence. Autocorrelated errors lead to serious underestimation of the standard errors of the regression coefficients (inefficiency) and prediction intervals that are excessively wide.
At its most basic, a significant autocorrelation in the residuals represents a deterministic error component that has not been captured by the model. Violating the no-autocorrelation assumption for residuals may invalidate other model specification tests, for example, the significance of independent variables or model parameters. With linear regression there are several major sources of misspecification that lead to autocorrelated errors, namely omission of a relevant variable and incorrect functional form of the regression model. Although neural models are capable of universal approximation, this problem can arise if the network has insufficient complexity to model the true data generating process.

The Durbin–Watson statistic is the most commonly used test for autocorrelation in regression models and has a solid body of theory for linear regression, which we shall extend to apply to neural regression. The test considers autocorrelation in the first lag of the residuals for two reasons. First, if autocorrelation is present it is often strongest at the first lag. Secondly, it makes tractable the problem of estimating the large number of off-diagonal elements of the error variance-covariance matrix. However, as with all statistical tests, the Durbin–Watson test is not without its weaknesses. It is only valid when a standard set of assumptions concerning the model and data are met (see, e.g., [12]). It is common in econometric modeling to violate one or more of the assumptions due to undesirable properties of the data (e.g., skewness or nonstationarity). The errors of the model may also be non-Gaussian or heteroscedastic. There has therefore been considerable interest in the behavior of the statistic when these assumptions are violated. Relevant references include [19], [18], [26], [7], [29], [21]. Clearly, these problems also apply to neural regression. Nevertheless, provided one is aware of the above, it is still one of the most powerful tools for the detection of model misspecification (see, for example, [31] and our own comparative results for neural regression in Section II-D). Let us derive the "neural" equivalent of the test.

The Durbin–Watson statistic [8]–[10] is given by

$d = \frac{\sum_{t=\ell+1}^{n}(e_t - e_{t-\ell})^2}{\sum_{t=1}^{n} e_t^2} = \frac{e^{\top} A e}{e^{\top} e}$   (5)

where $A$ is the symmetric matrix

$A = \begin{bmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{bmatrix}$   (6)

(shown here for lag $\ell = 1$), in which $\ell$ controls the lag at which the test is performed. Let us define the generalized projection matrix, $P$, by $e = P\varepsilon$, so that the Durbin–Watson test may be written in terms of the true errors as

$d = \frac{\varepsilon^{\top} P^{\top} A P \varepsilon}{\varepsilon^{\top} P^{\top} P \varepsilon}$   (7)

The projection matrix can be expressed in terms of the generalized influence matrix, $M$, as follows:

$P = I - M$   (8)

where
$I$  identity matrix of order $n$;
$M$  influence matrix;
$P$  projection matrix.

We may now use the result in Proposition 1 to compute the empirical distribution of the Durbin–Watson statistic [16] to test the null hypothesis of no autocorrelation.

Proposition 1: For the neural regression described by

$y = g(X; w) + \varepsilon$   (9)

with $\hat w$ being the converged least squares estimates

$\hat w = \operatorname{argmin}_w E(w)$, where $E(w) = \sum_{t=1}^{n}\big(y_t - g(x_t; w)\big)^2$   (10)

and

$d = \frac{e^{\top} A e}{e^{\top} e}$, with $e = y - g(X; \hat w)$   (11)

the generalized influence matrix, $M$, is given by

$M = F\Big(F^{\top} F - \sum_{t=1}^{n} e_t H_t\Big)^{-1} F^{\top}$   (12)

where $F$ is the derivative of $g$ with respect to $w$ evaluated at $\hat w$ and $X$, and $H_t$ is the matrix of second-order derivatives of $g(x_t; w)$ with regard to $w$.

For the derivation, see Appendix A.

It is relatively easy to show that in the case of linear regression

$M = X(X^{\top} X)^{-1} X^{\top}$   (13)

which may be used to compute the empirical distribution of the statistic [16] or to compute bounds [9] to test the null hypothesis of the test. The trace of the matrix $M$ is our estimate of the degrees of freedom in the neural model, $\hat p = \mathrm{tr}(M)$ (see [6] for a complete exposition).

C. The Distribution of the Durbin–Watson Statistic

In this section we present a procedure for calculating the distribution of $d$. The exact distribution of the statistic may be computed using an algorithm such as the Imhof [16] method. Alternatively, a beta distribution approximation may be used. Both of these methods (especially Imhof) require reasonably complicated computational procedures to be calculated for each regression matrix to be tested. For this reason, Durbin and Watson derived tables, based on the fact that the true value of $d$ can be shown to lie between two beta distributions independently of the particular set of regressors. However, this method has a region where the test gives an inconclusive result, where we are incapable of accepting or rejecting the null hypothesis. In this situation, three options are available: 1) compute the exact distribution using the Imhof method; 2) use the beta approximation to provide the significance points; and 3) use an alternative test (e.g., the runs test).
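In code, the statistic (5), the lag-$\ell$ matrix $A$ of (6), and the influence matrix of (12) can be sketched as follows. The helper names are ours, and the Jacobian $F$ and per-observation Hessians $H_t$ are assumed to be supplied (e.g., by automatic differentiation or finite differences).

```python
# Sketch of the generalized Durbin-Watson machinery of Section II-B.
import numpy as np

def dw_matrix(n, lag=1):
    """Matrix A of (6): d = e'Ae / e'e tests autocorrelation at `lag`."""
    D = np.zeros((n - lag, n))
    for i in range(n - lag):
        D[i, i], D[i, i + lag] = 1.0, -1.0
    return D.T @ D              # symmetric banded matrix

def durbin_watson(e, lag=1):
    A = dw_matrix(len(e), lag)
    return float(e @ A @ e / (e @ e))

def influence_matrix(F, H, e):
    """M = F (F'F - sum_t e_t H_t)^(-1) F', cf. (12).

    F: (n, p) Jacobian dg/dw at w_hat; H: (n, p, p) per-observation
    Hessians d2g/(dw dw'); e: (n,) residual vector.
    """
    S = F.T @ F - np.einsum("t,tij->ij", e, H)
    return F @ np.linalg.solve(S, F.T)

# Effective number of parameters (degrees of freedom): p_hat = tr(M).
# Linear case: F = X and H = 0, so M = X(X'X)^(-1)X' as in (13).
```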
We examine the exact distribution calculated by the Imhof method. A new error term is defined [9] as $\xi = Q^{\top}\varepsilon$, which are independent and distributed $N(0, \sigma^2 I)$, as $Q$ is an orthonormal matrix. The matrix $Q$ diagonalizes both quadratic forms $P^{\top} A P$ and $P^{\top} P$ (see [8, p. 412]). The statistic can therefore be written as

$d = \frac{\sum_j \nu_j \xi_j^2}{\sum_j \xi_j^2}$   (14)

where $\nu_j$ are the nonzero eigenvalues of $P^{\top} A P$. Using the fact that $\Pr(d < d^*) = \Pr(u < 0)$, where

$u = \sum_j (\nu_j - d^*)\,\xi_j^2$   (15)

and $d^*$ is a particular calculated value of $d$, [20] showed that the distribution of $u$ may be represented in the following way. The characteristic function of $u$ is given by

$\phi(t) = \prod_j \big(1 - 2it\lambda_j\big)^{-1/2}$   (16)

where $\lambda_j = \nu_j - d^*$ for the nonzero eigenvalues and $i = \sqrt{-1}$. The cumulative distribution function is given by

$\Pr(u < 0) = \frac{1}{2} - \frac{1}{\pi}\int_0^{\infty} \frac{\sin\theta(t)}{t\,\rho(t)}\,dt$, $\theta(t) = \frac{1}{2}\sum_j \arctan(\lambda_j t)$, $\rho(t) = \prod_j (1 + \lambda_j^2 t^2)^{1/4}$   (17)

Thus a p-value may be obtained for a particular regression matrix. Reference [10] presents another method based on the Fourier inversion integral. The p-value may be represented as

$\Pr(u < 0) = \frac{1}{2} - \frac{1}{\pi}\int_0^{\infty} \frac{\operatorname{Im}[\phi(t)]}{t}\,dt$   (18)

where

$\operatorname{Im}[\phi(t)] = r(t)\sin\vartheta(t)$   (19)

and

$r(t) = \prod_j \big(1 + 4\lambda_j^2 t^2\big)^{-1/4}$, $\vartheta(t) = \frac{1}{2}\sum_j \arctan(2\lambda_j t)$   (20)

With both of these procedures, problems include the choice of step size and the truncation of the range of the integral. They are also potentially computationally expensive, especially for large $n$ and when the step size of integration is small. For a discussion of numerical integration and associated problems, see [27].
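The following sketch evaluates the Imhof integral (17) by numerical quadrature, taking the nonzero eigenvalues $\nu$ of $P^{\top}AP$ and the observed value $d^*$. The truncation point and the tiny lower limit are practical choices of ours, in line with the step-size and truncation caveats above.

```python
# Sketch of the Imhof [16] p-value Pr(d < d*) from (14)-(17).
import numpy as np
from scipy.integrate import quad

def imhof_p_value(nu, d_star, upper=200.0):
    lam = np.asarray(nu) - d_star            # lambda_j = nu_j - d*

    def integrand(t):
        theta = 0.5 * np.sum(np.arctan(lam * t))
        rho = np.prod((1.0 + lam**2 * t**2) ** 0.25)
        return np.sin(theta) / (t * rho)

    integral, _ = quad(integrand, 1e-10, upper, limit=500)
    return 0.5 - integral / np.pi            # Pr(u < 0), cf. (15) and (17)
```

A small p-value indicates that the observed $d$ would be unlikely under the null hypothesis, matching the rejection rule used in the Monte Carlo experiment below.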
D. Examining the Power of the Test

It is usual in statistics to examine the power of a new test by performing a Monte Carlo simulation. In Table I we present the results of a Monte Carlo simulation which compares the power of the Durbin–Watson test against a number of other statistics in the case of neural regression, and also other parametric nonlinear regression models. We begin by reproducing the study of [31] on the performance of the Durbin–Watson test for parametric nonlinear regression models. White used the result of [17] that the projection matrix of parametric nonlinear regression may be written as

$P = I - Z(Z^{\top} Z)^{-1} Z^{\top}$   (21)

where

$Z = \frac{\partial f(X; \theta)}{\partial \theta}\Big|_{\hat\theta}$   (22)

White compared the power of the Durbin–Watson test to two commonly used asymptotic tests, $W_1$ and $W_2$ (23), which are functions of the estimated first-order residual autocorrelation coefficient

$\hat\rho = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$   (24)

and concluded that the Durbin–Watson test has greater power than all other tests. This is shown in Table I. The first column shows the power of the Durbin–Watson test for increasing levels of autocorrelation. Likewise, columns 2 and 3 show the power for the asymptotic tests $W_1$ and $W_2$. Column 4 shows the power of the [22] statistic evaluated at the first lag for parametric nonlinear regression. As expected, the power of each test increases as the autocorrelation becomes more severe.

TABLE I
THE POWER OF COMMON TESTS FOR AUTOCORRELATION. ON THE LEFT OF THE TABLE WE SEE THE POWER OF A NUMBER OF TESTS FOR A PARAMETRIC NONLINEAR REGRESSION MODEL [31]. WE SEE THAT THE DURBIN–WATSON STATISTIC OUTPERFORMS BOTH THE ASYMPTOTIC TESTS (W1 AND W2) AND THE BOX–LJUNG STATISTIC (PERFORMED AT THE FIRST LAG). THE MIDDLE SECTION OF THE TABLE SHOWS THE RESULTS OF A MONTE CARLO SIMULATION USING A SIMPLE 2-2-1 NEURAL MODEL. THE DURBIN–WATSON TEST OUTPERFORMS ALL OTHERS. THE FINAL SECTION OF THE TABLE SHOWS THE POWER OF THE TESTS FOR A COMPLEX 7-5-1 NEURAL ARCHITECTURE. AGAIN, THE DURBIN–WATSON STATISTIC IS THE MOST POWERFUL, BUT ALL TESTS SHOW REDUCED PERFORMANCE

Let us now compare the power of these tests (i.e., $d$, $W_1$, $W_2$, and the Box–Ljung statistic) for neural models. This is shown in the second part of Table I for a neural architecture 2-2-1 (i.e., two inputs, two sigmoidal hidden units, and a single linear output node). Column 5 of the table shows the power of the Durbin–Watson test. Columns 6 and 7 show the asymptotic tests, and column 8 gives the power of the Box–Ljung test.
Again, the Durbin–Watson test is consistently more powerful than the other statistics. The Box–Ljung statistic has essentially the same power as the asymptotic tests. A further comparison with a more complex 7-5-1 architecture is shown in the last part of the table. As expected, the power of all the tests drops off as model complexity increases.² The Durbin–Watson test is again the most powerful. Our results present the worst case power of each test, as we begin each simulation iteration from a completely random set of values.

The simulation with the neural models was performed as follows. We use a multilayer perceptron network with two inputs, two hidden sigmoid units, and a single linear output node.

1) Fix the weight matrices at random values in the range [−0.5, 0.5].
2) Generate the independent observations $X$. Both variables were drawn from $N(3.8, 0.4)$.
3) Generate the vector of true disturbances $\varepsilon$ from a normal distribution with zero mean and constant variance, with autocorrelation fixed at $\rho$, where $\rho = 0.0, 0.1, \ldots, 0.9$.
4) Calculate $y$ from the neural regression model, using $X$, $w$, and $\varepsilon$.
5) Estimate the model using nonlinear least squares, checking for convergence by ensuring that the error Hessian is positive semidefinite. We ensure a locally unique minimum by testing the distribution of the weights using a Kolmogorov–Smirnov statistic at the 10% significance level.
6) Estimate the model residuals $e$.
7) Calculate the statistic $d$.
8) Compute the p-value using the Imhof algorithm.
9) Reject the null hypothesis of no autocorrelation if the p-value < 0.05.
10) Repeat Steps 5) to 9) 200 times.

Thus, the performance of the statistic is assessed by counting the proportion of times the test is able to correctly identify the autocorrelation. Due to the high computational demands of performing a Monte Carlo experiment with a neural model, we used a dataset of 500 observations.

²One potential reason for this is the greater probability of falling into a different minimum when using a complex architecture. Resampling locally about a converged minimum would increase the reported power of the test. In this paper we simply test for convergence and uniqueness (see Step 5) of the Monte Carlo algorithm).
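A condensed version of steps 1)–10) is sketched below. It assumes mlp, dw_matrix, durbin_watson, and imhof_p_value from the earlier sketches are in scope, approximates the Jacobian by finite differences, and uses the first-order (Gauss–Newton) influence matrix; the convergence and uniqueness checks of step 5) are elided for brevity.

```python
# Sketch of the Monte Carlo power experiment of Section II-D.
import numpy as np
from scipy.optimize import least_squares

def num_jacobian(w, X, n_hidden, h=1e-6):
    """Finite-difference Jacobian F = dg/dw, shape (n, p)."""
    cols = []
    for j in range(len(w)):
        dw = np.zeros_like(w)
        dw[j] = h
        cols.append((mlp(w + dw, X, n_hidden) - mlp(w - dw, X, n_hidden)) / (2 * h))
    return np.column_stack(cols)

def simulate_power(rho, n=500, n_reps=200, alpha=0.05, n_hidden=2):
    rng = np.random.default_rng(1)
    p_dim = 2 * n_hidden + 2 * n_hidden + 1        # two-input 2-2-1 network
    rejections = 0
    for _ in range(n_reps):
        w_true = rng.uniform(-0.5, 0.5, size=p_dim)        # step 1
        X = rng.normal(3.8, 0.4, size=(n, 2))              # step 2
        eps = np.zeros(n)                                  # step 3: AR(1) errors
        z = rng.normal(size=n)
        for t in range(1, n):
            eps[t] = rho * eps[t - 1] + z[t]
        y = mlp(w_true, X, n_hidden) + eps                 # step 4
        w0 = rng.uniform(-0.5, 0.5, size=p_dim)            # random start
        w_hat = least_squares(lambda w: y - mlp(w, X, n_hidden), w0).x  # step 5
        e = y - mlp(w_hat, X, n_hidden)                    # step 6
        d = durbin_watson(e)                               # step 7
        F = num_jacobian(w_hat, X, n_hidden)
        P = np.eye(n) - F @ np.linalg.solve(F.T @ F, F.T)  # Gauss-Newton P = I - M
        nu = np.linalg.eigvalsh(P @ dw_matrix(n) @ P)
        nu = nu[np.abs(nu) > 1e-8]                         # nonzero eigenvalues
        rejections += imhof_p_value(nu, d) < alpha         # steps 8-9
    return rejections / n_reps                             # step 10: empirical power
```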
III. FORECASTING VOLATILITY WITH NEURAL REGRESSION

Volatility is the most important variable in options pricing models. Changes in volatility, especially as an option nears maturity, can have a profound effect upon the value of the options contract. The implied volatility of the underlying asset can be obtained by solving the pricing model for the volatility which equates the model and market prices. Thus, the implied volatility represents the market's expectation of future volatility [15]. Other methods of volatility forecasting come from the time-series literature, where one may fit ARCH models to the past data and then compute the expected value of future variance [11]. Such methods may be limited as they do not consider exogenous variables that may contain incremental information relative to univariate models. It may be the case that useful insights into volatility may be found from an examination of exogenous factors such as trading effects, maturity effects, market spreads, etc. These variables are usually added to the model through a conventional linear regression framework.

In this study, we extend the literature on volatility forecasting by assessing whether nonlinear relationships exist between the exogenous variables and implied volatility using a neural regression model. Due to the inherent problem of overfitting with neural models, we construct our estimator using a rigorous statistical model building procedure and test the model for misspecification with residuals analysis. Significant variables in the neural model are examined using sensitivity analysis in order to identify the type of nonlinearity present. We proceed with a three-step analysis.

• Univariate analysis: identify time structure in the implied volatility series using ARIMA modeling.
• Multivariate linear regression: building on the results of the univariate analysis, identify exogenous influences by building a multivariate linear regression model.
• Multivariate neural regression: identify nonlinearities and interactions in the data using a neural regression model.

A. The Data

We examine intraday movements for short-maturity, close-to-the-money call options of 60 minute frequency on the IBEX-35. The data covers a six month sample period between November 1992 and April 1993. The index contains the 35 most liquid stocks that trade on the Spanish Stock Exchange through its electronic CATS system. This dataset has been provided by the research department of MEFF (Mercado Espanol de Futuros Financieros), who provide high quality and precise real time information from electronically recorded trades. Options on IBEX-35 are European style, have a monthly expiration cycle, and at every moment the three closest correlative contracts are quoted (i.e., in March 93, the March, April, and May contracts will be quoted). The trading hours are set from 11:00 to 17:00, local time.

The implied volatility series from short maturity European style call options is obtained using the Newton–Raphson method (see, e.g., [14]). Computing the implied volatility of an option requires three types of information:
1) an option valuation model,
2) the values of the model's parameters (except for volatility), and
3) an observed option price with respect to the model.

The computation of the implied volatility is based on [2]. Within this Black–Scholes model, option value is determined by the current index value, the option's exercise price, time to expiration, and the riskless interest rate. The interest rate used is the current yield of the Treasury bill whose maturity most closely matches the option expiration. Of the three remaining option determinants, the exercise price and time to expiration are known. MEFF automatically records into its database the index level whenever an option trades.
Finally, the third type of information required to compute the implied volatility is the observed option prices. Again, MEFF automatically records the actual transaction price whenever an option trades.
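A sketch of the Newton–Raphson inversion is given below. The Black–Scholes call formula follows [2], while the starting value, tolerance, and parameter names are our own choices.

```python
# Back out implied volatility from an observed call price by
# Newton-Raphson on the Black-Scholes formula (see [2], [14]).
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, sigma0=0.2, tol=1e-8, max_iter=100):
    sigma = sigma0
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
        vega = S * math.exp(-0.5 * d1**2) * math.sqrt(T) / math.sqrt(2.0 * math.pi)
        sigma -= diff / vega          # Newton-Raphson step
    return sigma
```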
Fig. 1. Time series of implied volatility innovations of IBEX-35 index options prices.

Fig. 1 shows the time series of changes in the implied volatility series given by

$\Delta\sigma_t = \sigma_t - \sigma_{t-1}$   (25)

which is analyzed in this study. We consider volatility innovations, as most practitioners and academics are interested in how changes in the expected volatility affect changes in security valuation. With visual inspection, there does not appear to be any obvious structure to the series of IBEX-35 implied volatility. However, implied volatility levels have a first-order autocorrelation of 90%, indicating that although a unit root can be rejected for the series, such high autocorrelation may affect inference in finite samples.

In order to explain movements in implied volatility we introduce a number of explanatory variables, as described in Table II. The choice of this universe of variables is motivated as follows. Due to the quotation system, which gives us the three closest correlative contracts (i.e., expiration months of March, April, and May), we obtain implied volatility series derived from different maturity contracts. Although we expect that the implied volatilities obtained from different maturity contracts will be very similar, because we are dealing with short maturity contracts (less than three months), we introduce a time to maturity variable.

TABLE II
SUMMARY OF VARIABLES

To ensure that we account for trading effects in our input variables used to explain hourly implied volatility changes, we include two further variables: velocity of trades and volume. The velocity of trades variable represents the number of trades or contracts effectively traded in an hour. This number is the sum of total trades before the end of the hour. Frequent trading is an indication of movement and possible volatile situations in the market. The volume variable is the number of contracts traded in an hour.

We use a further variable, average spread, to account for information arrivals in the market. This variable represents the average difference between the bid-ask volatility prices within an hour. The data conversion method is the same as for volume, but in this case we average the sum with the number of trades in the hour. The first difference of spot prices is used as an input variable to account for differences between the spot price of the underlying asset (futures on the IBEX-35) at the end of every hour.

In order to capture any seasonality in changes of implied volatility (see [23] for discussion), a further dummy variable, day effect, is introduced. Similarly, a time of trade effect variable is incorporated as an input into the models. Trades registered in the first hour are coded 1, the rest are coded 0 (i.e., overnight new information can affect the behavior of the market at the opening hour).

The standard deviation of past IBEX-35 index returns is also used. The historic volatility is computed with a sample horizon of 25 days. We use an interest rate variable as an input. The interest rate is the current yield of the T-bill whose maturity most closely matches the option expiration. Finally, to account for the different situation of the option with respect to strike prices, we introduce a so-called moneyness variable (see [23]).
TABLE III
SUMMARY STATISTICS OF THE DATA

Table III shows a number of descriptive statistics for the data.

B. Assessing Time Structure—Univariate Analysis

Our first task is to identify time structure in the implied volatility series. This is achieved by examining the autocorrelation function, ACF (Fig. 2), and the partial autocorrelation function, PACF (Fig. 3). From the partial autocorrelation function we observe that the first four lags are significant, suggesting an AR(4) model.

Fig. 2. Autocorrelation function for implied volatility innovations.

Fig. 3. Partial autocorrelation function for implied volatility innovations.

We may confirm the presence of time structure using other nonparametric tests of randomness, which are summarized in Table IV. The first (runs) test counts the number of times the sequence was above or below the median. The number of such runs is 413, as compared to an expected value of 318.95 if the sequence were random. Since the P-value for this test is less than 0.01, we can reject the hypothesis that the series is random at the 99% confidence level. The second (runs up & down) test counts the number of times the sequence rose or fell in value. The number of such runs equals 465, as compared to an expected value of 435.67 if the sequence were random. Since the P-value for this test is less than 0.01, we can reject the hypothesis that the series is random at the 99% confidence level. The third (Box-Pierce) test is based on the sum of squares of the first 24 autocorrelation coefficients. Since the P-value for this test is less than 0.01, we can reject the hypothesis that the series is random at the 99% confidence level. Since the three tests are sensitive to different types of departures from random behavior, failure to pass any test suggests that the time series may not be completely random.

Clearly, these tests support our discovery of structure in the series using the ACF and PACF. We therefore proceed by estimating the AR(4) model described in Table V. All variables are strongly significant at the 5% level and the performance of the model is surprisingly good, with an R² of 34%. The residuals of the model pass the Durbin–Watson test for autocorrelation.

The next step of our procedure is to assess whether there is any gain to be made by modeling implied volatility innovations using exogenous variables.
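The univariate toolkit used here is standard; the sketch below reproduces the flavor of the analysis with statsmodels, where dvol is assumed to hold the innovations series of (25).

```python
# Sketch of the univariate analysis of Section III-B.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.diagnostic import acorr_ljungbox

# ACF/PACF as in Figs. 2 and 3; values outside +/- 2/sqrt(n) are
# significant at roughly the 5% level.
rho = acf(dvol, nlags=24)
phi = pacf(dvol, nlags=24)

# Box-Pierce statistic over the first 24 autocorrelations (Table IV).
bp = acorr_ljungbox(dvol, lags=[24], boxpierce=True)

# AR(4) model suggested by the four significant PACF lags (Table V).
ar4 = AutoReg(dvol, lags=4).fit()
print(ar4.summary())
```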
C. Introducing Exogenous Variables—Linear Regression Analysis

In order to assess whether any of the exogenous variables influence implied volatility, we begin by performing a multiple linear regression with all variables included, shown in Table VI. We can see that most of the variables are not significant at the 5% level. The explanatory power of the model is extremely low, with an adjusted R² of only 6.36%. The residuals do not pass the Durbin–Watson test for autocorrelation, indicating model misspecification. However, from the t-ratios of the variables, it is clear that some variables are indeed significant at the 5% level.

A stepwise backward selection (starting with the highest p-value, we remove insignificant variables individually until all variables have p-values below 0.05) yields the final model shown in Table VII, which includes the lagged variables found to be significant in the univariate analysis of implied volatility. (See Table VIII.)
TABLE IV
NONPARAMETRIC TESTS FOR RANDOMNESS OF IMPLIED VOLATILITY

TABLE V
LINEAR AUTOREGRESSION MODEL FOR IMPLIED VOLATILITY INNOVATIONS USING THE LAGS IDENTIFIED FROM THE PACF

TABLE VI
MULTIVARIATE LINEAR REGRESSION MODEL WITH ALL AVAILABLE VARIABLES

TABLE VII
FINAL MODEL FROM STEPWISE REGRESSION. METHOD: BACKWARD SELECTION; F-TO-ENTER: 4.0, F-TO-REMOVE: 4.0

TABLE VIII
STATISTICS FOR THE FULL NEURAL MODEL (15 INDEPENDENT VARIABLES, FIVE HIDDEN UNITS, k = 86 CONNECTIONS). THE EFFECTIVE NUMBER OF PARAMETERS IS GIVEN BY p̂ = tr(M)

The introduction of exogenous variables has demonstrated a clear relationship between the implied volatility and change in spot, maturity effect, and moneyness. This subset of variables makes sense from a financial perspective. The implied volatility should have a close relationship to changes in the underlying index, as the changes in the spot price drive changes in the options price. It is well known that as an option approaches maturity the volatility of the underlying asset plays a great role in the price of options. Similarly, there is usually a heavy amount of trading in options that are near-the-money or out-of-the-money, as these will show the greatest swings in value for changes in the underlying asset. For a discussion of these issues see [15].

From a statistical perspective, the addition of these variables gives an increase in explanatory power as measured by the adjusted R² of 41.17%. Fig. 4 shows the model residuals, which reveal no obvious time structure with visual inspection. This is confirmed by the autocorrelation and partial autocorrelation functions (Figs. 5 and 6), which show no significant correlations at the 5% significance level. The Durbin–Watson value of 2.38 means we do not reject the null hypothesis of no autocorrelation.
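The backward selection loop is mechanical and easy to automate; a sketch with statsmodels OLS follows, where X is assumed to be a DataFrame of the candidate regressors (including the significant lags) and y the innovations series.

```python
# Sketch of the backward stepwise selection of Section III-C: refit,
# drop the least significant regressor, repeat until all p-values are
# below the threshold.
import statsmodels.api as sm

def backward_select(y, X, threshold=0.05):
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= threshold:
            return model, cols            # all survivors significant
        cols.remove(worst)                # drop least significant variable
    return None, []
```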
D. Detecting Nonlinearities—Neural Network Analysis

A small but significant increase in explanatory power was obtained by adding exogenous variables to our model in a linear fashion. A logical step to follow this analysis is to use a neural network to establish whether nonlinear dependencies are present between volatility and any variables from our universe. We do this by constructing well-specified neural estimators using variable significance and misspecification test procedures.
Fig. 4. Linear regression model residuals.

Fig. 5. Autocorrelation function of linear model residuals.

Fig. 6. Partial autocorrelation function of linear model residuals.

We use a standard multilayer perceptron network architecture with a single layer of sigmoidal hidden units and a linear output node. We split the data into three sub-samples: the first of 540 points for training, the second 59-point set for cross-validation, and the final 55-point set for out-of-sample performance measurement using a simple trading rule. The neural model is trained until convergence³ (i.e., the error Hessian is positive semidefinite). We use a simple model selection procedure whereby the number of hidden units is chosen by training a number of networks and measuring the empirical loss (training error on 540 data points) and the prediction error (cross-validation error on 59 points). These results are shown in Fig. 7. As expected, the in-sample error continuously decreases as the model increases in complexity, with the out-of-sample error showing a minimum at five hidden units (beyond this point the generalization properties of the network suffer as we have overfitted the in-sample data). We therefore chose the final model architecture of five hidden units.

³This is important as estimates of degrees of freedom, as well as the distribution of the various statistics derived from the weights, assume convergence.
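A sketch of this selection loop is given below, using scikit-learn's MLPRegressor as a stand-in for the authors' estimator; X_train, y_train, X_val, and y_val are assumed to hold the 540-point and 59-point splits.

```python
# Choose the number of hidden units by cross-validation error (Fig. 7).
import numpy as np
from sklearn.neural_network import MLPRegressor

val_error = {}
for h in range(1, 11):
    net = MLPRegressor(hidden_layer_sizes=(h,), activation="logistic",
                       solver="lbfgs", max_iter=5000, random_state=0)
    net.fit(X_train, y_train)
    val_error[h] = np.mean((net.predict(X_val) - y_val) ** 2)

best_h = min(val_error, key=val_error.get)   # five units in the paper
```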
As with linear regression, we must now check that the included variables are statistically significant. For this we require a variable relevance metric (see, e.g., [28]) and its standard error. For measuring the relevance of a variable we use the MFS criterion of [25], which measures the relevance of a variable in terms of the sensitivity of the goodness of fit to the presence of the variable in the model. This is defined as

$\mathrm{MFS}_i = E(\bar x_i) - E(x_i)$   (26)

where $E(\bar x_i)$ is the empirical loss evaluated with the $i$th input replaced by its unconditional mean $\bar x_i$, and $E(x_i)$ is the empirical loss of the full model. The distribution of this measure is obtained empirically by a local bootstrap. For each variable we test the null hypothesis

$H_0\colon \mathrm{MFS}_i = 0$
against the alternative

$H_1\colon \mathrm{MFS}_i \neq 0$

Fig. 7. Empirical loss (training error) and estimated prediction risk (projected out-of-sample error) versus the number of hidden units for a single hidden layer network.

We do not make any assumptions about the small sample distribution of the MFS measure. Instead, we use parametric resampling to obtain the empirical distribution, and therefore the significance points, of the test. Briefly, the asymptotic distribution of the standardized quantity $\sqrt{n}(\hat w - w_0)$ can be shown to be multivariate normal with a mean of zero and covariance matrix $\hat A^{-1}\hat B\hat A^{-1}$, where $\hat w$ is the estimated and $w_0$ the true parameter vector. The error Hessian is $\hat A = \nabla^2 E(\hat w)$, with $\hat B$ the outer product of the per-observation gradients of the error function, computed over a matrix that includes the dependent variable and the regressors $X$. We sample from the asymptotic distribution of the parameters at each stage of the selection procedure.

As with the linear framework, at each step of the selection procedure we removed the variable with the largest p-value greater than 0.05. After a variable was removed, the model was reestimated and the MFS, its distribution, and the prediction risk were computed. At each stage, if the prediction risk increased by more than 5%, the variable was reinserted into the model. This allowed for functional dependencies between the variables to be accounted for. When the explanatory variables are not truly independent, this can affect the estimation of the MFS measure and the associated p-values. The results of this procedure are shown in Table IX. We can see that the variables selected are the same as in the linear regression, but the weightings are different.

TABLE IX
STATISTICS FOR THE FINAL NEURAL MODEL (7 INDEPENDENT VARIABLES, 5 HIDDEN UNITS, 46 CONNECTIONS)

One further method of obtaining insight into the relationship between independent variables and implied volatility is to examine the sensitivity of the output with respect to each input. This is achieved by varying the variable of interest over a suitable range of values and observing the impact on the dependent variable, while keeping all other variables fixed at their mean value. In this case we examine a range of ±3 standard deviations around the mean. This is shown for moneyness in Fig. 8. The function appears quadratic in nature. We see the familiar volatility smile [15], with a minimum near the forward price of the underlying asset.

Fig. 8. Sensitivity of network output with respect to moneyness.
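The sensitivity sweep is straightforward to implement; in the sketch below, predict is the fitted model's prediction function and X the input matrix, both assumed given.

```python
# Sensitivity of the output to one input, all others held at their means
# (the procedure behind Figs. 8 and 9).
import numpy as np

def sensitivity_curve(predict, X, var_idx, n_points=50):
    mu, sd = X.mean(axis=0), X.std(axis=0)
    grid = np.linspace(mu[var_idx] - 3 * sd[var_idx],
                       mu[var_idx] + 3 * sd[var_idx], n_points)
    probe = np.tile(mu, (n_points, 1))    # all variables at their means
    probe[:, var_idx] = grid              # sweep the variable of interest
    return grid, predict(probe)
```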
Fig. 9. Sensitivity of network output with respect to change in spot price.

Fig. 9 shows the model sensitivity to change in spot price. Again, the relationship seems quadratic. Large changes in spot price produce large deviations in implied volatility. However, for small changes in spot price, we observe little change in implied volatility. These nonlinearities could explain the outperformance of the neural regression over the other model classes.

We assess the fit of the model by plotting the actual values of implied volatility against those predicted by the model (Fig. 10). As the points are spread around the diagonal and there are no obvious outliers, we see that the model does indeed fit the data accurately.

The final test for the model is to examine the residuals, shown in Fig. 11. Visually, the residuals show no obvious time structure. We test for autocorrelation by examining the autocorrelation function (Fig. 12) and the partial autocorrelations (Fig. 13). It is clear that there are no significant correlations at any lag. We verify this by performing a number of statistical tests, which are presented in Table X.
The p-value of our generalized Durbin–Watson test is well above the critical value of 0.05. Additionally, we perform the two tests $W_1$ and $W_2$ from [17], both of which are well below the critical value for rejection of the null hypothesis. We therefore conclude that there is no autocorrelation in the residuals.

E. Trading Performance

We have shown that the final model is satisfactory from a specification viewpoint, but an important issue for practitioners is whether the model performs well in a true out-of-sample trading situation. We therefore compare the final linear and neural models in a simplified trading simulation. Assume that the implied volatility option prices are computed by the trivial model

Option price ∝ Volatility   (27)

where the initial price is 50 pesetas and the initial volatility is 20%. We test the models by applying a trading rule such that we go long (buy) one options contract if the predicted change in volatility is positive, and short (sell) one contract if the predicted change in volatility is negative. Fig. 14 shows the forecasted values from the neural model with the actual implied volatilities on the out-of-sample data set. On the whole, the model accurately forecasts the direction of future volatility changes. The cumulative profits for the autoregressive, linear, and neural models are shown in Fig. 15. While each method appears to be profitable, a real trading model must demonstrate stability and risk control. We therefore present an analysis of the cumulative profit in Table XI. The autoregressive model gives a mean profit of 1.84 per day but suffers from a maximum drawdown⁴ of 26%. Drawdowns of over 10% will not be tolerated in a real trading environment. The multiple linear regression model has a greater mean profit, but has a drawdown of 99% at time period 9 (thus almost completely wiping out all trading capital). The model is therefore inappropriate for trading without investigation. The neural model has a higher mean profit than the other models and a lower drawdown at 16%, and therefore represents the best of the three techniques examined.

⁴The drawdown is the percentage of capital lost over a particular day.
Fig. 10. Predicted versus actual values of implied volatility for the final neural model.

Fig. 11. Residual errors from the neural model.
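The simulation logic reduces to a few lines; the sketch below is our reading of the rule and of the proportional pricing model (27), with the mean profit and a peak-to-trough maximum drawdown as summary statistics.

```python
# Sketch of the trading simulation of Section III-E.
import numpy as np

def backtest(pred_dvol, actual_dvol, price_per_vol_point=50.0 / 20.0):
    positions = np.sign(pred_dvol)            # +1 long, -1 short, cf. the rule
    pnl = positions * actual_dvol * price_per_vol_point
    equity = np.cumsum(pnl)
    peak = np.maximum.accumulate(equity)
    max_drawdown = np.max((peak - equity) / np.where(peak > 0, peak, 1.0))
    return pnl.mean(), equity[-1], max_drawdown
```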
IV. CONCLUSION

We have discussed the importance of misspecification analysis for neural regression models. To test for first-order linear autocorrelation we presented a generalization of the Durbin–Watson test statistic which may be used with neural networks. The use of such diagnostic statistics is essential in a robust model development process for neural regression models. The statistic was shown to be of comparable power with common tests used in linear regression modeling through a Monte Carlo simulation for a simple network architecture and our final architecture of 7-5-1.

We have described a volatility predictor based on a neural model that outperforms other commonly used techniques both in terms of statistical performance metrics (e.g., R²) and also in a simple trading simulation. We can have further confidence in the model as each parameter has been tested for statistical significance and the model has been shown to be correctly specified. The use of adjusted performance metrics allows for direct comparison between neural networks and other model classes. Residual diagnostic testing is widespread in linear modeling and is a necessary, but not sufficient, condition for model adequacy. It is hoped that the use of these statistics will lead to both greater care in the specification of neural models and aid their acceptance by forecasters.
Fig. 12. Estimated autocorrelation function of neural model residuals.

Fig. 13. Estimated partial autocorrelation function of neural model residuals.

TABLE X
RESIDUAL DIAGNOSTIC STATISTICS FOR THE NEURAL MODEL

APPENDIX A
DERIVATION OF PROPOSITION ONE

1) Proposition: For the neural regression described by

$y = g(X; w) + \varepsilon$   (28)

with

$\hat w = \operatorname{argmin}_w E(w)$, where $E(w) = \sum_{t=1}^{n}\big(y_t - g(x_t; w)\big)^2$   (29)

and

$e = y - g(X; \hat w)$   (30)

the generalized influence matrix, $M$, is given by

$M = F\Big(F^{\top} F - \sum_{t=1}^{n} e_t H_t\Big)^{-1} F^{\top}$   (31)

where $F$ is the derivative of $g$ with respect to $w$ evaluated at $\hat w$ and $X$, and $H_t$ is the matrix of second-order derivatives of $g(x_t; w)$ with regard to $w$.

Proof: We begin with the well known relationship (e.g., [12])

$\hat y = M y$   (32)

i.e., the influence matrix gives the sensitivities of the estimated dependent variable with respect to the observed variable

$M_{ij} = \frac{\partial \hat y_i}{\partial y_j}$   (33)

We consider each partial derivative in turn

$\frac{\partial \hat y}{\partial y} = \frac{\partial \hat y}{\partial \hat w}\,\frac{\partial \hat w}{\partial y} = F\,\frac{\partial \hat w}{\partial y}$   (34)

Now for $\partial \hat w / \partial y$. We denote $E(w, y)$ as $E$ and perform a Taylor series expansion to second order of $E$ around the converged unbiased solution at $(w^*, y^*)$

$E \approx E^* + E_w^*(w - w^*) + E_y^*(y - y^*) + \tfrac{1}{2}(w - w^*)^{\top} E_{ww}^*(w - w^*) + (y - y^*)^{\top} E_{yw}^*(w - w^*) + \tfrac{1}{2}(y - y^*)^{\top} E_{yy}^*(y - y^*)$   (35)

where $*$ represents substitution at the converged unbiased solution and the subscripts indicate derivatives with respect to $w$ and $y$. We are therefore assuming that higher order derivatives evaluated at the optimum are negligible. We drop the asterisks for notational convenience. We should note that $E_w = 0$ since $\hat w$ is defined to minimize $E$. We minimize by setting

$\frac{\partial E}{\partial w} = 0$   (36)

The first order conditions are

$E_{ww}(w - w^*) + E_{wy}(y - y^*) = 0$   (37)

Solving for the parameter gives

$w = w^* - E_{ww}^{-1} E_{wy}(y - y^*)$   (38)

$\frac{\partial \hat w}{\partial y} = -E_{ww}^{-1} E_{wy}$   (39)

and, differentiating the error function in (29),

$E_{ww} = 2\Big(F^{\top} F - \sum_{t=1}^{n} e_t H_t\Big)$   (40)

$E_{wy} = -2F^{\top}$   (41)

So

$\frac{\partial \hat w}{\partial y} = \Big(F^{\top} F - \sum_{t=1}^{n} e_t H_t\Big)^{-1} F^{\top}$   (42)

and

$M = F\Big(F^{\top} F - \sum_{t=1}^{n} e_t H_t\Big)^{-1} F^{\top}$   (43)

This relationship holds when the second-order derivatives are continuous and the network has converged to a locally unique minimum. For a complete discussion of degrees of freedom in all classes of regression model see [6].
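Proposition 1 can be checked numerically: perturb each observation, refit, and compare the finite-difference sensitivities $\partial \hat y_i / \partial y_j$ of (33) with the analytic $M$. The sketch below reuses mlp from the earlier sketches and warm-starts each refit at $\hat w$.

```python
# Finite-difference check of the generalized influence matrix (31)/(43).
import numpy as np
from scipy.optimize import least_squares

def numeric_influence(y, X, w_hat, n_hidden, h=1e-5):
    n = len(y)
    M = np.zeros((n, n))
    base = mlp(least_squares(lambda w: y - mlp(w, X, n_hidden), w_hat).x,
               X, n_hidden)
    for j in range(n):
        yp = y.copy()
        yp[j] += h                                       # perturb y_j
        w_j = least_squares(lambda w: yp - mlp(w, X, n_hidden), w_hat).x
        M[:, j] = (mlp(w_j, X, n_hidden) - base) / h     # column d y_hat / d y_j
    return M
```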
Fig. 14. Predicted vs actual for out-of-sample data from the neural model. Note that this dataset has not been used in any stage of the model specification, estimation, or statistical testing of the models.

Fig. 15. Simple trading rule performance.

TABLE XI
PERFORMANCE METRICS FOR TRADING MODELS

REFERENCES

[1] W. Baxt and H. White, "Bootstrapping confidence intervals for clinical input variable effects in a neural network trained to identify the presence of acute myocardial infarction," Neural Comput., vol. 7, pp. 624–638, 1995.
[2] F. Black and M. Scholes, "The pricing of options and corporate liabilities," J. Political Economy, vol. 81, pp. 637–659, 1973.
[3] G. Box and D. Cox, "An analysis of transformations," J. Roy. Statist. Soc., ser. B, pp. 211–264, 1964.
[4] D. Bunn, "Nontraditional methods of forecasting," European J. Operational Res., vol. 92, pp. 528–536, 1996.
[5] M. Carapeto and W. Holt, "Testing for heteroscedasticity in regression models," J. Appl. Statist., 2001, to be published.
[6] M. Carapeto, W. Holt, and A.-P. N. Refenes, "Degrees of freedom in regression models," Working Paper, Dept. Decision Sci., London Business School, 2000.
[7] J. Durbin, "Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables," Econometrica, vol. 38, pp. 410–421, 1970.
[8] J. Durbin and G. Watson, "Testing for serial correlation in least squares regression," Biometrika, vol. 37, pp. 409–428, 1950.
[9] ——, "Testing for serial correlation in least squares regression," Biometrika, vol. 38, pp. 159–178, 1951.
[10] ——, "Testing for serial correlation in least squares regression," Biometrika, vol. 58, pp. 1–19, 1971.
[11] R. Engle, "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, pp. 987–1001, 1982.
[12] W. Greene, Econometric Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[13] S. Goldfeld and E. Quandt, "Some tests for homoscedasticity," J. Amer. Statist. Assoc., vol. 60, pp. 539–547, 1965.
[14] E. Haug, The Complete Guide to Options Pricing Formulas. New York: McGraw-Hill, 1998.
[15] J. Hull, Options, Futures, and Other Derivative Securities. Prentice-Hall, 1993.
[16] J. Imhof, "Computing the distribution of quadratic forms in normal variables," Biometrika, vol. 48, pp. 419–426, 1961.
[17] G. G. Judge, R. C. Hill, W. Griffiths, H. Lutkepohl, and T.-C. Lee, Introduction to the Theory and Practice of Econometrics, 2nd ed. New York: Wiley, 1988.
[18] M. King, "The Durbin-Watson bounds test and regressions without an intercept," Australian Economic Papers, vol. 20, pp. 161–170, 1981.
[19] G. Kramer, "On the Durbin-Watson bounds in the case of regression through the origin," Jahrbücher für Nationalökonomie und Statistik, vol. 185, 1971.
[20] W. L'Esperance, D. Chall, and D. Taylor, "An algorithm for determining the distribution function of the Durbin-Watson test statistic," Econometrica, vol. 44, pp. 1325–1326, 1976.
[21] W. L'Esperance and D. Taylor, "The power of four tests of autocorrelation in the linear regression model," J. Econometrics, vol. 3, pp. 1–21, 1975.
[22] G. Ljung and G. Box, "On a measure of lack of fit in time series models," Biometrika, vol. 66, pp. 265–270, 1979.
[23] A. Miranda and H. Gonzalez, "IBEX options historical database manual," Internal Discussion Document, MEFF options exchange, Spain, 1994.
[24] J. Moody, "The effective number of parameters: An analysis of generalization and processing in nonlinear learning systems," in Advances in Neural Information Processing Systems 4. San Mateo, CA: Morgan Kaufmann, 1992.
[25] J. Moody and J. Utans, "Architecture selection for neural networks: Application to bond rating prediction," in Neural Networks in the Capital Markets, A. N. Refenes, Ed. New York: Wiley, 1995.
[26] M. Nerlove and K. Wallis, "Use of the Durbin-Watson statistic in inappropriate situations," Econometrica, vol. 34, pp. 235–238, 1966.
[27] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge Univ. Press, 1992.
[28] A.-P. N. Refenes and A. Zapranis, "Neural model identification, variable selection and model adequacy," J. Forecasting, vol. 18, pp. 299–333, 1999.
[29] K. Wallis, "Testing for fourth-order autocorrelation in quarterly regression equations," Econometrica, vol. 40, pp. 617–636, 1972.
[30] H. White, "Neural networks: A statistical perspective," Neural Comput., vol. 1, pp. 425–464, 1989.
[31] K. White, "The Durbin-Watson test for autocorrelation in nonlinear models," Rev. Economics Statist., vol. 74, pp. 362–365, 1992.
[32] H. White, Estimation, Inference and Specification Analysis, Econometric Society Monographs. Cambridge Univ. Press, 1995.
[33] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Trans. Amer. Math. Soc., vol. 54, pp. 426–482, 1943.

Apostolos-Paul N. Refenes received the B.Sc. degree in mathematics and computing in 1984 and the Ph.D. degree in computing in 1987. He is Professor of Financial Engineering at Athens University of Business and Economics and visiting Professor of Decision Science at London Business School. He has held previous appointments at London Business School, University College London, University of Athens, and the DTI. He is the author of over 100 papers and editor of four books on the subjects of neural computing and financial engineering applications. His research interests include neural-network design methodology, model identification and estimation procedures, and applied research on tactical asset allocation, factor models for equity investment, dynamic risk management, nonlinear cointegration, exchange risk management, etc.

Dr. Refenes founded the international conference on Neural Networks in the Capital Markets (NnCM) and served as general chair for NnCM-93, NnCM-95, and Computational Finance 1997, as international chair for the Joint IEEE/IAFE Conference on Computational Intelligence in Financial Engineering (CIFEr), and for several other international conferences. His research on neural networks, financial engineering, and computational finance is supported by the ESRC, the DTI, ESPRIT, VALUE, and privately by several companies in the finance sector.

Will T. Holt received the B.Eng. degree in electronics from the University of Liverpool, U.K., and the Ph.D. degree in decision science from London Business School. He has recently joined Goldman Sachs & Co. as a senior financial engineer for European Statistical Arbitrage Trading. The topic of his thesis is misspecification tests for neural regression models. His research interests include neural networks, nonlinear dynamics, statistics, and financial econometrics.