Abstract—In recent years neural networks have reportedly achieved considerable successes in a variety of forecasting applications. Although the results are usually accompanied by extensive empirical validation, practitioners and statisticians still remain skeptical: the curse of overfitting is compounded by the lack of rigorous procedures for model identification, selection, and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin–Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection, and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy, with the results confirming the presence of nonlinear relationships in implied volatility innovations.

Index Terms—Autocorrelation, Durbin–Watson statistic, neural networks, residual diagnostics, volatility forecasting.

Manuscript received August 9, 2000; revised February 9, 2001 and March 28, 2001. This work was supported by ROPA under Grant RO22250057, BT Laboratories, the U.K. Economic and Social Research Council (E.S.R.C.), and the Ph.D. Programme, London Business School.
The authors are with the Department of Decision Sciences, London Business School, London NW1 4SA, U.K.
Publisher Item Identifier S 1045-9227(01)05012-3.

I. INTRODUCTION

IN recent years an impressive array of publications has appeared claiming considerable success of neural networks in forecasting applications in business and finance. Some have provided empirical results to support these claims,1 but many fell short of this, prompting skeptical practitioners and statisticians to raise the question of whether neural networks really are "a major breakthrough or just a passing fad." A number of factors justify this skepticism: first, neural networks are universal approximators and are thus capable of providing a model which fits any data with an arbitrary degree of accuracy; second, the lack of procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated, makes it difficult to assess the model's significance and the possibility that any short-term successes that are reported might be due to "data mining."

1Traditional model validation using out-of-sample (test set) performance can be seriously misleading when performed on a small sample and may not be possible if we do not have enough data. In contrast, linear regression models have a wide variety of statistical tests for variable and parameter significance to enable parsimonious models to be produced. Statistics are available to test every model assumption; the F-test may be used to assess the whole model, t-tests examine the significance of model parameters, and residual diagnostics (e.g., [8], [9], [13]) test the assumptions of the residual errors. These tests are particularly important in situations where model selection is based on small data sets, without a hold-out sample.

The results of many applications are perfectly plausible, but in the absence of a structured procedure for misspecification testing, models are exposed to justifiable criticism. Recent years have seen the emergence of a more structured approach to model building for neural networks. Reference [30] developed a test for model parameter significance which was the first important contribution to statistical identification for neural models. Reference [1] extended this work and provided procedures for variable significance testing. Reference [24] provided expressions of neural model complexity by generalizing the linear equivalents. Reference [25] examined issues of architecture selection and simple metrics for variable relevance. Reference [28] gave a comprehensive examination of variable relevance metrics and assessed methods of deriving the distribution of the metrics. However, all work to date has been concerned with variable and parameter testing; the problem of model adequacy has yet to be addressed. Conventional econometric model selection techniques require the residuals to be examined for structure as a way of ensuring the model is not misspecified. This is particularly important for estimation procedures which are susceptible to partial convergence (e.g., local minima, flat valleys).

In this paper we present a methodology for model misspecification testing, applicable to all types of "neural regression," based on analysis of the model residuals. Such tests are a necessary, but not sufficient, condition for model adequacy. We demonstrate its usefulness in a real-world application by developing a neural predictor of implied volatility innovations for options pricing within a robust statistical framework. We choose input variables with tests of significance and introduce a much needed residual diagnostic test procedure for autocorrelation in the residuals. We derive the distribution for the statistic so that a formal statistical test may be performed. The methodology for misspecification testing can be deployed by other researchers who wish to apply neural modeling to forecasting applications.

In Section II of this paper, we discuss the general issues involved in misspecification testing, particularly the issues arising for neural regression. We also introduce a new residual diagnostic test procedure for autocorrelation based on the Durbin–Watson statistic and assess its power with a Monte Carlo simulation.
Section III presents a case study of volatility forecasting for options pricing using neural regression. Statistics are used to test every model assumption; we examine the significance of individual model variables, select the architecture in a principled way, and perform residual diagnostics to test the assumptions of the residual errors. We compare our neural model to univariate time-series and multivariate linear regression techniques. Section IV concludes the paper and the Appendix contains mathematical derivations.

II. MISSPECIFICATION TESTING FOR NEURAL REGRESSION

A. Neural Regression

Consider the neural regression model described by

y = g(X) + ε   (1)

where y is the vector of observations on the dependent variable, X is the nonstochastic matrix of independent variables, g is a nonlinear function, and ε is the vector of residual errors. The regressor ŷ is defined by the functional form ŷ = g(X; w), where w is a vector of model parameters (weights). The regressor may take many forms, from linear regression to nonparametric nonlinear neural regression. For any given regressor the accuracy to which a given set of parameters, w, fits the data is measured by an error function E(w). The error function is associated with the specific objective of the modeling process (e.g., conditional mean, conditional median, conditional variance, etc.). The general requirement of any error function is that it must provide a monotonic measure of discrepancy between model and objective. Typically, the error function is given by the mean squared error

E(w) = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²   (2)

where ŷ_i = g(x_i; w). The objective of "neural learning" is to estimate the parameters which minimize the empirical error function

ŵ = arg min_w E(w)   (3)

This is achieved using an iterative minimization procedure which produces a converged least squares estimator. The fact that neural networks are able to fit any continuous and differentiable function leads to the "bias-variance" dilemma. It may easily be shown that the empirical risk (mean squared error) may be decomposed into two components, bias and variance. The number of hidden units and the training parameters (e.g., convergence criteria, learning rate) define the complexity of the model. Reducing the complexity is analogous to making assumptions in the modeling process. With too few hidden units (i.e., parameters in w), the fitted function will on average be different from the true regression. However, with fewer assumptions about the data generating process (i.e., a large network) there is a serious risk that noise in the data will be fitted as well as any deterministic component. With a complex network, spurious relationships may be found that degrade the performance of the system. Neural modeling must therefore trade off variance for bias; for example, an assumption may be made that the network should "fit a smooth function."

Reference [4] places neural networks in context with the other major forecasting methodologies, observing that neural models offer a freedom from statistical assumptions, resilience to missing observations, and an ability to incorporate nonlinear relationships. The price of this freedom is a reliance on empirical performance for validation due to the lack of statistical diagnostics and easily understandable model structure. Bunn states, "it may be that even a large preponderance of empirical evidence in its favor will not be sufficient to create confidence in the technology without more research into explainability and robustness diagnostics." Much recent work has therefore concentrated on strengthening the statistical foundations of neural model identification procedures. One area that has been neglected is that of model misspecification testing. It is not generally appreciated that the specification of a model includes assumptions about the residual errors, usually the following.

1) E(ε_t) = 0: the error term has a mean of zero, i.e., the model is unbiased.
2) E(ε_t²) = σ²: the error term has constant variance.
3) E(ε_t ε_s) = 0, t ≠ s: error terms at different points in time are uncorrelated.

In regression modeling it is vital that these assumptions are not violated for the parameter estimates to be optimal (i.e., unbiased and efficient). In general, three types of test may therefore be performed on the residuals.

• Autocorrelation: the error in one time period should not be affected by the error in another time period, e.g., [8].
• Heteroscedasticity: the errors should have a constant variance, e.g., [13].
• Normality: the errors should be characterized by a normal distribution, e.g., [33].

In earlier work [5], we introduced a test for heteroscedasticity that is applicable to neural models. In the next sections we examine autocorrelation in the residuals and present a generalization of the Durbin–Watson statistic for neural regression models. These tests provide a "toolkit" to test neural models for misspecification and complete the modeling methodology of [28].
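To make the notation concrete, the following Python sketch implements a small instance of (1)–(3): a single-hidden-layer regressor g(X; w), the mean-squared-error function E(w), and a least squares estimate obtained by numerical minimization. The architecture, data, and optimizer are illustrative assumptions, not the configuration used later in the paper.

```python
# Minimal sketch of (1)-(3): a "neural regressor" g(X; w), the MSE error
# function E(w), and w_hat = argmin_w E(w).  All sizes and data are illustrative.
import numpy as np
from scipy.optimize import minimize

def g(w, X, n_hidden=3):
    """y_hat = g(X; w): tanh hidden layer, linear output."""
    d = X.shape[1]
    W1 = w[: n_hidden * (d + 1)].reshape(n_hidden, d + 1)   # hidden weights + biases
    w2 = w[n_hidden * (d + 1):]                              # output weights + bias
    Z = np.hstack([X, np.ones((len(X), 1))])
    H = np.tanh(Z @ W1.T)
    return np.hstack([H, np.ones((len(X), 1))]) @ w2

def E(w, X, y):
    """Empirical error (2): mean squared discrepancy between y and g(X; w)."""
    return np.mean((y - g(w, X)) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

n_params = 3 * (2 + 1) + 3 + 1                                # hidden + output parameters
w_hat = minimize(E, rng.normal(scale=0.5, size=n_params), args=(X, y)).x   # (3)
residuals = y - g(w_hat, X)                                   # used by the residual tests below
print("training MSE:", round(E(w_hat, X, y), 4))
```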
Autocorrelated errors lead to unreliable estimates of the standard errors of the regression coefficients (inefficiency) and prediction intervals that are excessively wide. At its most basic, a significant autocorrelation in the residuals represents a deterministic error component that has not been captured by the model. Violating the no-autocorrelation assumption for residuals may invalidate other model specification tests, for example, the significance of independent variables or model parameters. With linear regression there are several major sources of misspecification that lead to autocorrelated errors, namely omission of a relevant variable and incorrect functional form of the regression model. Although neural models are capable of universal approximation, this problem can arise if the network has insufficient complexity to model the true data generating process.

The Durbin–Watson statistic is the most commonly used test for autocorrelation in regression models and has a solid body of theory for linear regression which we shall extend to apply to neural regression. The test considers autocorrelation in the first lag of the residuals for two reasons. First, if autocorrelation is present it is often strongest at the first lag. Second, it makes tractable the problem of estimating the large number of off-diagonal elements of the error variance-covariance matrix. However, as with all statistical tests, the Durbin–Watson test is not without its weaknesses. It is only valid when a standard set of assumptions concerning the model and data are met (see, e.g., [12]). It is common in econometric modeling to violate one or more of the assumptions due to undesirable properties of the data (e.g., skewness or nonstationarity). The errors of the model may also be non-Gaussian or heteroscedastic. There has therefore been considerable interest in the behavior of the statistic when these assumptions are violated. Relevant references include [19], [18], [26], [7], [29], [21]. Clearly, these problems also apply to neural regression. Nevertheless, provided one is aware of the above, it is still one of the most powerful tools for the detection of model misspecification (see, for example, [31] and our own comparative results for neural regression in Section II-D). Let us derive the "neural" equivalent of the test.

The Durbin–Watson statistic [8]–[10] is given by

DW = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²   (5)

where e_t is the residual at time t. The statistic is a ratio of quadratic forms in the residual vector, and the residuals are related to the observations through the projection matrix of the regression. The projection matrix M can be expressed in terms of the generalized influence matrix S as follows:

M = I − S   (8)

where
I   identity matrix of order n;
S   influence matrix;
M   projection matrix.

We may now use the result in Proposition 1 to compute the empirical distribution of the Durbin–Watson statistic [16] to test the null hypothesis of no autocorrelation.

Proposition 1: For the neural regression described by

y = g(X; w) + ε   (9)

with ŵ being the converged least squares estimates

ŵ = arg min_w E(w)   (10)

and

e = y − g(X; ŵ)   (11)

the generalized influence matrix S is given by (12), in terms of the first derivatives of g with respect to w, evaluated at ŵ, and the second-order derivatives of g with respect to w. For the derivation, see Appendix A.

It is relatively easy to show that in the case of linear regression

S = X(XᵀX)⁻¹Xᵀ   (13)

which may be used to compute the empirical distribution of the statistic [16] or compute bounds [9] to test the null hypothesis of the test. The trace of the matrix S is our estimate of the degrees of freedom in the neural model, the effective number of parameters p (see [6] for a complete exposition).
Thus a p-value may be obtained for a particular regression matrix. Reference [10] presents another method based on the Fourier inversion integral. The p-value may be represented as in (18), with the components of the integrand defined in (19) and (20). With both of these procedures, problems include the choice of step size and the truncation of the range of the integral. They are also potentially computationally expensive, especially for large n and when the step size of integration is small. For a discussion of numerical integration and associated problems, see [27].

D. Examining the Power of the Test

It is usual in statistics to examine the power of a new test by performing a Monte Carlo simulation. In Table I we present the results of a Monte Carlo simulation which compares the power of the tests for linear and neural regressors.
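Central to both the test and the simulation is the p-value of an observed Durbin–Watson value, obtained as the probability of a weighted sum of chi-squared variables. The Python sketch below illustrates one way to compute it for the linear-regression case, using the hat matrix (13) and Imhof's inversion integral [16]; the toy data and tolerances are illustrative assumptions, and for a neural model the hat matrix would be replaced by the generalized influence matrix of Proposition 1.

```python
# Hedged sketch: p-value of an observed Durbin-Watson statistic for the
# linear-regression case (13), via Imhof's [16] inversion integral.
import numpy as np
from scipy.integrate import quad

def dw_statistic(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def dw_pvalue_linear(X, e):
    """P(DW <= observed) under the null of no autocorrelation."""
    n = len(e)
    d = dw_statistic(e)
    # First-difference matrix A such that DW = e'Ae / e'e.
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    # Influence (hat) matrix S and projection matrix M = I - S, cf. (8), (13).
    S = X @ np.linalg.solve(X.T @ X, X.T)
    M = np.eye(n) - S
    # DW <= d  iff  eps' M (A - d I) M eps <= 0: a weighted sum of chi-squares
    # with weights equal to the nonzero eigenvalues of M (A - d I) M.
    lam = np.linalg.eigvalsh(M @ (A - d * np.eye(n)) @ M)
    lam = lam[np.abs(lam) > 1e-10]

    def integrand(u):
        theta = 0.5 * np.sum(np.arctan(lam * u))
        log_rho = 0.25 * np.sum(np.log1p((lam * u) ** 2))
        return np.sin(theta) * np.exp(-log_rho) / u

    upper_tail = 0.5 + quad(integrand, 1e-8, np.inf, limit=200)[0] / np.pi
    return min(1.0, max(0.0, 1.0 - upper_tail))   # clamp small numerical error

# Toy usage: AR(1) errors should produce a small p-value.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = np.empty(n); eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = 0.3 * eps[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0, -1.0]) + eps
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
print("DW =", round(dw_statistic(e), 3), " p-value =", round(dw_pvalue_linear(X, e), 4))
```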
White [31] compared the power of the Durbin–Watson test to two commonly used asymptotic tests, given by (23) and (24), and concluded that the Durbin–Watson test has greater power than all other tests. This is shown in Table I. The first column shows the power of the Durbin–Watson test for increasing levels of autocorrelation. Likewise, columns 2 and 3 show the power for the two asymptotic tests. Column 4 shows the power of the Box–Ljung [22] statistic evaluated at the first lag for parametric nonlinear regression. As expected, the power of each test increases as the autocorrelation becomes more severe.

Let us now compare the power of these tests (i.e., the Durbin–Watson test, the two asymptotic tests, and the Box–Ljung statistic) for neural models. This is shown in the second part of Table I for a neural architecture 2-2-1 (i.e., two inputs, two sigmoidal hidden units, and a single linear output node). Column 5 of the table shows the power of the Durbin–Watson test, columns 6 and 7 show the asymptotic tests, and column 8 gives the power of the Box–Ljung test. Again, the Durbin–Watson test is consistently more powerful than the other statistics. The Box–Ljung statistic has essentially the same power as the asymptotic tests. A further comparison with a more complex 7-5-1 architecture is shown in the last part of the table. As expected, the power of all the tests drops off as model complexity increases.2 The Durbin–Watson test is again the most powerful. Our results present the worst case power of each test as we begin each simulation iteration from a completely random set of values.

The simulation with the neural models was performed as follows. We use a multilayer perceptron network with two inputs, two hidden sigmoid units, and a single linear output node.
1) Fix the weight matrices at random values in the range [−0.5, 0.5].
2) Generate the vector of independent observations X. Both variables were drawn from N(3.8, 0.4).
3) Generate the vector of true disturbances from a normal distribution with zero mean and constant variance, with autocorrelation fixed at ρ, where ρ = 0.0, 0.1, …, 0.9.
4) Calculate y from the neural regression model, using X, w, and ε.
5) Estimate the model using nonlinear least squares, checking for convergence by ensuring that the error Hessian is positive semidefinite. We ensure a locally unique minimum by testing the distribution of the weights using a Kolmogorov–Smirnov statistic at the 10% significance level.
6) Estimate the model residuals e.
7) Calculate the Durbin–Watson statistic.
8) Compute the p-value using the Imhof algorithm.
9) Reject the null hypothesis of no autocorrelation if the p-value is less than 0.05.
10) Repeat Steps 5) to 9) 200 times.
Thus, the performance of the statistic is assessed by counting the proportion of times the test is able to correctly identify the autocorrelation. Due to the high computational demands of performing a Monte Carlo experiment with a neural model we used a dataset of 500 observations.

2One potential reason for this is the greater probability of falling into a different minimum when using a complex architecture. Resampling locally about a converged minimum would increase the reported power of the test. In this paper we simply test for convergence and uniqueness (see Step 5 of the Monte Carlo algorithm).
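A compressed version of this procedure can be expressed in Python. The sketch below is illustrative rather than a reproduction of the experiment: the weight range, sample size, and repetition count follow the steps above, but the convergence and uniqueness checks of Step 5 are omitted and, in place of the Imhof p-value of Step 8, the 5% critical value of the statistic is calibrated by simulating under the null (ρ = 0).

```python
# Illustrative power simulation for the Durbin-Watson test with a 2-2-1 network.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n, n_rep = 500, 200                                   # observations, repetitions

def mlp_2_2_1(w, X):
    """2-2-1 network: two sigmoid hidden units, linear output."""
    W1 = w[:6].reshape(2, 3)                          # hidden weights + biases
    w2 = w[6:9]                                       # output weights + bias
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    H = 1.0 / (1.0 + np.exp(-Z @ W1.T))
    return np.hstack([H, np.ones((X.shape[0], 1))]) @ w2

def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def simulate_dw(rho, w_true):
    X = rng.normal(3.8, 0.4, size=(n, 2))             # step 2
    eps = np.empty(n); eps[0] = rng.normal()          # step 3: AR(1) disturbances
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + rng.normal()
    y = mlp_2_2_1(w_true, X) + eps                    # step 4
    fit = least_squares(lambda w: y - mlp_2_2_1(w, X),
                        x0=rng.uniform(-0.5, 0.5, 9)) # step 5 (no convergence checks here)
    return durbin_watson(y - mlp_2_2_1(fit.x, X))     # steps 6-7

w_true = rng.uniform(-0.5, 0.5, 9)                    # step 1
crit = np.quantile([simulate_dw(0.0, w_true) for _ in range(n_rep)], 0.05)
for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    power = np.mean([simulate_dw(rho, w_true) < crit for _ in range(n_rep)])
    print(f"rho = {rho:.1f}: rejection rate = {power:.2f}")
```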
III. FORECASTING VOLATILITY WITH NEURAL REGRESSION

Volatility is the most important variable in options pricing models. Changes in volatility, especially as an option nears maturity, can have a profound effect upon the value of the options contract. The implied volatility of the underlying asset can be obtained by solving the pricing model for the volatility which equates the model and market prices. Thus, the implied volatility represents the market's expectation of future volatility [15]. Other methods of volatility forecasting come from the time-series literature, where one may fit ARCH models to the past data and then compute the expected value of future variance [11]. Such methods may be limited as they do not consider exogenous variables that may contain incremental information relative to univariate models. It may be the case that useful insights into volatility may be found from an examination of exogenous factors such as trading effects, maturity effects, market spreads, etc. These variables are usually added to the model through a conventional linear regression framework.

In this study, we extend the literature on volatility forecasting by assessing whether nonlinear relationships exist between the exogenous variables and implied volatility using a neural regression model. Due to the inherent problem of overfitting with neural models, we construct our estimator using a rigorous statistical model building procedure and test the model for misspecification with residuals analysis. Significant variables in the neural model are examined using sensitivity analysis in order to identify the type of nonlinearity present. We proceed with a three-step analysis.

• Univariate analysis: identify time structure in the implied volatility series using ARIMA modeling.
• Multivariate linear regression: building on the results of the univariate analysis, identify exogenous influences by building a multivariate linear regression model.
• Multivariate neural regression: identify nonlinearities and interactions in the data using a neural regression model.

A. The Data

We examine intraday movements for short-maturity, close-to-the-money call options of 60-minute frequency on the IBEX-35. The data covers a six-month sample period between November 1992 and April 1993. The index contains the 35 most liquid stocks that trade on the Spanish Stock Exchange through its electronic CATS system. This dataset has been provided by the research department of MEFF (Mercado Español de Futuros Financieros), who provide high-quality and precise real-time information from electronically recorded trades. Options on the IBEX-35 are European style, have a monthly expiration cycle, and at every moment the three nearest expiration contracts are quoted (i.e., in March 1993, the March, April, and May contracts are quoted). The trading hours are from 11:00 to 17:00, local time.

The implied volatility series from short-maturity European-style call options is obtained using the Newton–Raphson method (see, e.g., [14]). Computing the implied volatility of an option requires three types of information:
1) an option valuation model;
2) the values of the model's parameters (except for volatility); and
3) an observed option price with respect to the model.
The computation of the implied volatility is based on [2]. Within this Black–Scholes model, option value is determined by the current index value, the option's exercise price, time to expiration, and the riskless interest rate. The interest rate used is the current yield of the Treasury bill whose maturity most closely matches the option expiration. Of the three remaining option determinants, the exercise price and time to expiration are known. MEFF automatically records into its database the index level …
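As an illustration of the procedure just described, the following sketch backs out an implied volatility from a Black–Scholes [2] call price by Newton–Raphson iteration (see, e.g., [14]). The numerical inputs are made up for the example and are not taken from the MEFF data.

```python
# Implied volatility via Newton-Raphson on the Black-Scholes call formula.
import math
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes value of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r, sigma0=0.2, tol=1e-8, max_iter=50):
    """Newton-Raphson: update sigma by (model price - market price) / vega."""
    sigma = sigma0
    for _ in range(max_iter):
        d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
        vega = S * norm.pdf(d1) * math.sqrt(T)
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        sigma -= diff / vega
    return sigma

# Close-to-the-money call, one month to expiration (illustrative numbers only).
print(round(implied_vol(price=75.0, S=2900.0, K=2900.0, T=1 / 12, r=0.10), 4))
```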
Fig. 1. Time series of implied volatility innovations of IBEX35 index options prices.
TABLE III. Summary statistics of the data.
TABLE V. Linear autoregression model for implied volatility innovations using the lags identified from the PACF.
TABLE VI. Multivariate linear regression model with all available variables.
TABLE VIII. Statistics for the full neural model (15 independent variables, five hidden units, k = 86 connections); the effective number of parameters is given by p.
Fig. 5. Autocorrelation function of linear model residuals.
Fig. 6. Partial autocorrelation function of linear model residuals.
Fig. 7. Empirical loss (training error) and estimated prediction risk (projected out-of-sample error) versus the number of hidden units for a single hidden layer network.
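For readers who wish to reproduce residual diagnostics of the kind summarized in Figs. 5 and 6, the sketch below computes sample autocorrelations of a residual series together with the Box–Ljung statistic [22]; the residual series used here is synthetic and merely stands in for actual model residuals.

```python
# Sample autocorrelations and the Box-Ljung (Ljung-Box) statistic of residuals.
import numpy as np
from scipy.stats import chi2

def sample_acf(e, nlags):
    e = e - e.mean()
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, nlags + 1)])

def ljung_box(e, nlags=10):
    n = len(e)
    r = sample_acf(e, nlags)
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, nlags + 1)))
    return q, chi2.sf(q, df=nlags)          # statistic and p-value

rng = np.random.default_rng(3)
resid = rng.normal(size=500)                # replace with fitted-model residuals
q, p = ljung_box(resid, nlags=10)
print("ACF(1..3):", np.round(sample_acf(resid, 3), 3), " Q =", round(q, 2), " p =", round(p, 3))
```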
The results of the residual diagnostic tests are presented in Table X. The p-value of our generalized Durbin–Watson test is well below the critical value of 0.05. Additionally, we perform the two asymptotic tests from [17], both of which are well below the critical value for rejection of the null hypothesis. We therefore conclude that there is no autocorrelation in the residuals.

E. Trading Performance

We have shown that the final model is satisfactory from a specification viewpoint, but an important issue for practitioners is whether the model performs well in a true out-of-sample trading situation. We therefore compare the final linear and neural models in a simplified trading simulation. Assume that the implied volatility option prices are computed by the trivial model (27) relating option price to volatility, where the initial price is 50 pesetas and the initial volatility is 20%. We test the models by applying a trading rule such that we go long (buy) one options contract if the predicted change in volatility is positive and short (sell) one contract if the predicted change in volatility is negative. Fig. 14 shows the forecasted values from the neural model with the actual implied volatilities on the out-of-sample data set. On the whole, the model accurately forecasts the direction of future volatility changes. The cumulative profits for the autoregressive, linear, and neural models are shown in Fig. 15. While each method appears to be profitable, a real trading model must demonstrate stability and risk control. We therefore present an analysis of the cumulative profit in Table XI. The autoregressive model gives a mean profit of 1.84 per day but suffers from a maximum drawdown4 of 26%. Drawdowns of over 10% will not be tolerated in a real trading environment. The multiple linear regression model has a greater mean profit, but has a drawdown of 99% at time period 9 (thus almost completely wiping out all trading capital). The model is therefore inappropriate for trading without further investigation. The neural model has a higher mean profit than the other models and a lower drawdown at 16%, and therefore represents the best of the three techniques examined.

4The drawdown is the percentage of capital lost over a particular day.
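The sketch below illustrates the trading rule and the drawdown calculation under stated assumptions: an assumed constant mapping from volatility changes to option-price changes stands in for the trivial pricing model (27), the starting capital is arbitrary, and the forecast and volatility series are synthetic placeholders rather than the paper's data.

```python
# Trading rule (long if predicted volatility change > 0, short otherwise) and
# the largest one-day percentage loss of capital (footnote 4's drawdown).
import numpy as np

def trade(pred_change, actual_change, price_per_vol_point=2.5):
    """Daily P&L of a +/-1 contract position; the price mapping is an assumption."""
    position = np.sign(pred_change)                   # +1 long, -1 short
    return position * actual_change * price_per_vol_point

def max_drawdown(pnl, initial_capital=100.0):
    """Largest one-day percentage loss of capital."""
    equity = initial_capital + np.concatenate([[0.0], np.cumsum(pnl)])
    daily_loss_pct = -np.diff(equity) / equity[:-1]
    return max(daily_loss_pct.max(), 0.0)

rng = np.random.default_rng(2)
actual = rng.normal(0, 1.0, size=100)                 # implied-volatility innovations
predicted = 0.4 * actual + rng.normal(0, 1.0, size=100)   # imperfect forecasts
pnl = trade(predicted, actual)
print("mean daily profit:", round(pnl.mean(), 3),
      " max drawdown:", f"{max_drawdown(pnl):.0%}")
```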
Fig. 10. Predicted versus actual values of implied volatility for the final neural model.

IV. CONCLUSION

We have discussed the importance of misspecification analysis for neural regression models. To test for first-order linear autocorrelation we presented a generalization of the Durbin–Watson test statistic which may be used with neural networks. The use of such diagnostic statistics is essential in a robust model development process for neural regression models. The statistic was shown to be of comparable power with common tests used in linear regression modeling through a Monte Carlo simulation for a simple network architecture and our final architecture of 7-5-1.

We have described a volatility predictor based on a neural model that outperforms other commonly used techniques, both in terms of statistical performance metrics and in a simple trading simulation. We can have further confidence in the model as each parameter has been tested for statistical significance and the model has been shown to be correctly specified. The use of adjusted performance metrics allows for direct comparison between neural networks and other model classes. Residual diagnostic testing is widespread in linear modeling and is a necessary, but not sufficient, condition for model adequacy. It is hoped that the use of these statistics will lead to greater care in the specification of neural models and aid their acceptance by forecasters.
APPENDIX A
DERIVATION OF PROPOSITION ONE

1) Proposition: For the neural regression described by (28), with the converged least squares estimates and the corresponding residual vector, …

Fig. 14. Predicted versus actual values for the out-of-sample data from the neural model. Note that this dataset has not been used in any stage of the model specification, estimation, or statistical testing of the models.

TABLE XI. Performance metrics for trading models.
… minimum. For a complete discussion of degrees of freedom in all classes of regression model, see [6].

REFERENCES

[1] W. Baxt and H. White, "Bootstrapping confidence intervals for clinical input variable effects in a neural network trained to identify the presence of acute myocardial infarction," Neural Comput., vol. 7, pp. 624–638, 1995.
[2] F. Black and M. Scholes, "The pricing of options and corporate liabilities," J. Political Economy, vol. 81, pp. 637–659, 1973.
[3] G. Box and D. Cox, "An analysis of transformations," J. Roy. Statist. Soc., ser. B, pp. 211–264, 1964.
[4] D. Bunn, "Nontraditional methods of forecasting," European J. Operational Res., vol. 92, pp. 528–536, 1996.
[5] M. Carapeto and W. Holt, "Testing for heteroscedasticity in regression models," J. Appl. Statist., 2001, to be published.
[6] M. Carapeto, W. Holt, and A.-P. N. Refenes, "Degrees of freedom in regression models," Working Paper, Dept. Decision Sci., London Business School, 2000.
[7] J. Durbin, "Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables," Econometrica, vol. 38, pp. 410–421, 1970.
[8] J. Durbin and G. Watson, "Testing for serial correlation in least squares regression," Biometrika, vol. 37, pp. 409–428, 1950.
[9] J. Durbin and G. Watson, "Testing for serial correlation in least squares regression," Biometrika, vol. 38, pp. 159–178, 1951.
[10] J. Durbin and G. Watson, "Testing for serial correlation in least squares regression," Biometrika, vol. 58, pp. 1–19, 1971.
[11] R. Engle, "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, pp. 987–1001, 1982.
[12] W. Greene, Econometric Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[13] S. Goldfeld and E. Quandt, "Some tests for homoscedasticity," J. Amer. Statist. Assoc., vol. 60, pp. 539–547, 1965.
[14] E. Haug, The Complete Guide to Options Pricing Formulas. New York: McGraw-Hill, 1998.
[15] J. Hull, Options, Futures, and Other Derivative Securities. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[16] J. Imhof, "Computing the distribution of quadratic forms in normal variables," Biometrika, vol. 48, pp. 419–426, 1961.
[17] G. G. Judge, R. C. Hill, W. Griffiths, H. Lutkepohl, and T.-C. Lee, Introduction to the Theory and Practice of Econometrics, 2nd ed. New York: Wiley, 1988.
[18] M. King, "The Durbin-Watson bounds test and regressions without an intercept," Australian Economic Papers, vol. 20, pp. 161–170, 1981.
[19] G. Kramer, "On the Durbin-Watson bounds in the case of regression through the origin," Jahrbücher für Nationalökonomie und Statistik, vol. 185, 1971.
[20] W. L'Esperance, D. Chall, and D. Taylor, "An algorithm for determining the distribution function of the Durbin-Watson test statistic," Econometrica, vol. 44, pp. 1325–1326, 1976.
[21] W. L'Esperance and D. Taylor, "The power of four tests of autocorrelation in the linear regression model," J. Econometrics, vol. 3, pp. 1–21, 1975.
[22] G. Ljung and G. Box, "On a measure of lack of fit in time series models," Biometrika, vol. 66, pp. 265–270, 1979.
[23] A. Miranda and H. Gonzalez, "IBEX options historical database manual," Internal Discussion Document, MEFF options exchange, Spain, 1994.
[24] J. Moody, "The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems," in Advances in Neural Information Processing Systems 4. San Mateo, CA: Morgan Kaufmann, 1992.
[25] J. Moody and J. Utans, "Architecture selection for neural networks: Application to bond rating prediction," in Neural Networks in the Capital Markets, A. N. Refenes, Ed. New York: Wiley, 1995.
[26] M. Nerlove and K. Wallis, "Use of the Durbin-Watson statistic in inappropriate situations," Econometrica, vol. 34, pp. 235–238, 1966.
[27] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[28] A.-P. N. Refenes and A. Zapranis, "Neural model identification, variable selection and model adequacy," J. Forecasting, vol. 18, pp. 299–333, 1999.
[29] K. Wallis, "Testing for fourth-order autocorrelation in quarterly regression equations," Econometrica, vol. 40, pp. 617–636, 1972.
[30] H. White, "Neural networks: A statistical perspective," Neural Comput., vol. 1, pp. 425–464, 1989.
[31] K. White, "The Durbin-Watson test for autocorrelation in nonlinear models," Rev. Economics Statist., vol. 74, pp. 362–365, 1992.
[32] H. White, Estimation, Inference and Specification Analysis, Econometric Society Monographs. Cambridge, U.K.: Cambridge Univ. Press, 1995.
[33] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Trans. Amer. Math. Soc., vol. 54, pp. 426–482, 1943.

Apostolos-Paul N. Refenes received the B.Sc. degree in mathematics and computing in 1984 and the Ph.D. degree in computing in 1987.
He is Professor of Financial Engineering at the Athens University of Economics and Business and visiting Professor of Decision Science at London Business School. He has held previous appointments at London Business School, University College London, University of Athens, and the DTI. He is the author of over 100 papers and editor of four books on the subjects of neural computing and financial engineering applications. His research interests include neural-network design methodology, model identification and estimation procedures, and applied research on tactical asset allocation, factor models for equity investment, dynamic risk management, nonlinear cointegration, exchange risk management, etc.
Dr. Refenes founded the international conference on Neural Networks in the Capital Markets (NnCM) and served as general chair for NnCM-93 and NnCM-95, Computational Finance 1997, International Chair for the Joint IEEE/IAFE conference on Computational Intelligence in Financial Engineering (CIFEr), and several other international conferences. Research on neural networks, financial engineering, and computational finance is supported by the ESRC, the DTI, ESPRIT, VALUE, and privately by several companies in the finance sector.

Will T. Holt received the B.Eng. degree in electronics from the University of Liverpool, U.K., and the Ph.D. degree in decision science from London Business School.
He has recently joined Goldman Sachs & Co. as a senior financial engineer for European Statistical Arbitrage Trading. The topic of his thesis is misspecification tests for neural regression models. His research interests include neural networks, nonlinear dynamics, statistics, and financial econometrics.