An Artificial Neural Networks Primer with Financial Applications: Examples in Financial Distress Predictions and Foreign Exchange Hybrid Trading System
by
URL: http://w3.to/ctan/
E-mail: ctan@computer.org
School of Information Technology, Bond University, Gold Coast, QLD 4229,
Australia
Table of Contents
1. INTRODUCTION TO ARTIFICIAL INTELLIGENCE AND ARTIFICIAL NEURAL NETWORKS
1.1 INTRODUCTION
1.2 ARTIFICIAL INTELLIGENCE
1.3 ARTIFICIAL INTELLIGENCE IN FINANCE
1.3.1 Expert System
1.3.2 Artificial Neural Networks in Finance
1.4 ARTIFICIAL NEURAL NETWORKS
1.5 APPLICATIONS OF ANNS
1.6 REFERENCES
2. AN ARTIFICIAL NEURAL NETWORKS’ PRIMER
2.1 CHRONICLE OF ARTIFICIAL NEURAL NETWORKS DEVELOPMENT
2.2 BIOLOGICAL BACKGROUND
2.3 COMPARISON TO CONVENTIONAL COMPUTATIONAL TECHNIQUES
2.4 ANN STRENGTHS AND WEAKNESSES
2.5 BASIC STRUCTURE OF AN ANN
2.6 CONSTRUCTING THE ANN
2.7 A BRIEF DESCRIPTION OF THE ANN PARAMETERS
2.7.1 Learning Rate
2.7.2 Momentum
2.7.3 Input Noise
2.7.4 Training and Testing Tolerances
2.8 DETERMINING AN EVALUATION CRITERIA
2.9 REFERENCES
3. THE TECHNICAL AND STATISTICAL ASPECTS OF ARTIFICIAL NEURAL NETWORKS
3.1 ARTIFICIAL NEURAL NETWORK MODELS
3.2 NEURODYNAMICS
3.2.1 Inputs
3.2.2 Outputs
3.2.3 Transfer (Activation) Functions
3.2.4 Weighting Schemes and Learning Algorithms
3.3 NEURAL NETWORKS ARCHITECTURE
3.3.1 Types of Interconnections between Neurons
3.3.2 The Number of Hidden Neurons
3.3.3 The Number of Hidden Layers
3.3.4 The Perceptron
3.3.5 Linear Separability and the XOR Problem
3.3.6 The Multilayer Perceptron
3.4 LEARNING
3.4.1 Learning Algorithms
3.5 STATISTICAL ASPECTS OF ARTIFICIAL NEURAL NETWORKS
3.5.1 Comparison of ANNs to Statistical Analysis
3.5.2 ANNs and Statistical Terminology
3.5.3 Similarity of ANN Models to Statistical Models
3.5.4 ANNs vs. Statistics
3.5.5 Conclusion of ANNs and Statistics
3.6 REFERENCES
4. USING ARTIFICIAL NEURAL NETWORKS TO DEVELOP AN EARLY WARNING PREDICTOR FOR CREDIT UNION FINANCIAL DISTRESS
Chapter 1: Introduction to Artificial Intelligence and
Artificial Neural Networks
1.1 Introduction
There can be little doubt that the greatest challenge facing managers and researchers in
the field of finance is the presence of uncertainty. Indeed risk, which arises from
uncertainty, is fundamental to modern finance theory and, since its emergence as a
separate discipline, much of the intellectual resources of the field have been devoted
to risk analysis. The presence of risk, however, not only complicates financial decision making, it also creates opportunities for reward for those who can analyze and manage risk effectively.
By and large, the evolution of commercial risk management technology has been
characterized by computer technology lagging behind the theoretical advances of the
field. As computers have become more powerful, they have permitted better testing
and application of financial concepts. Large-scale implementation of Markowitz’s
seminal ideas on portfolio management, for example, was held up for almost twenty
years until sufficient computational speed and capacity were developed. Similarly,
despite the overwhelming need from a conceptual viewpoint, daily marking to market
of investment portfolios has only become a feature of professional funds management
in the past decade or so, following advances in computer hardware and software.
Recent years have seen a broadening of the array of computer technologies applied to
finance. One of the most exciting of these in terms of the potential for analyzing risk
is Artificial Intelligence (AI). One of the contemporary methods of AI, Artificial
Neural Networks (ANNs), in combination with other techniques, has recently begun
to gain prominence as a potential tool in solving a wide variety of complex tasks.
ANN-based commercial applications have been successfully implemented in fields
ranging from medicine to space exploration. In finance, reported application areas include:
• Financial Simulation
• Predicting Investor’s Behavior
• Evaluation
• Credit Approval
• Security and/or Asset Portfolio Management
• Pricing Initial Public Offerings
• Determining Optimal Capital Structure
Trippi and Turban [1996] noted in the preface of their book that financial organizations are now second only to the US Department of Defense in sponsoring research on neural network applications.
¹ At the time of writing, there is still no standard terminology in the Connectionist field. The neuron has also been called the following in the Connectionist literature: processing elements, neurodes, processors, units, etc.
characteristics. They typically use cross-sectional data. Solving these problems entails ‘learning’ patterns in a data set and constructing a model that can recognize these patterns. Commercial artificial neural network applications of this nature include:
• Credit card fraud detection, reportedly in use at Eurocard Nederland, Mellon Bank, First USA Bank, etc. [Bylinsky 1993];
• Optical character recognition (OCR), utilized by fax software such as Calera Recognition System’s FaxGrabber and Caere Corporation’s Anyfax OCR engine, which is licensed to other products such as the popular WinFax Pro and FaxMaster [Widrow et al. 1993];
• Cursive handwriting recognition, used by Lexicus² Corporation’s Longhand program, which runs on existing notepads such as the NEC Versapad, Toshiba Dynapad, etc. [Bylinsky 1993];
• A cervical (Papanicolaou or ‘Pap’) smear screening system called Papnet³, developed by Neuromedical Systems Inc. and currently used by the US Food and Drug Administration to help cytotechnologists spot cancerous cells [Schwartz 1995, Dybowski et al. 1995, Mango 1994, Boon and Kok 1995, Boon and Kok 1993, Rosenthal et al. 1993];
• Petroleum exploration, used by Texaco and Arco to determine the locations of underground oil and gas deposits [Widrow et al. 1993]; and
• Detection of bombs in suitcases, using a neural network approach called Thermal Neutron Analysis (TNA), or more commonly SNOOPE, developed by Science Applications International Corporation (SAIC) [Nelson and Illingworth 1991, Johnson 1989, Doherty 1989 and Schwartz 1989].
In time-series problems, the ANN is required to build a forecasting model from the historical data set to predict future data points. These problems require relatively sophisticated ANN techniques, since the sequence of the input data is important in determining the relationship of one pattern of data to the next. This is known as the temporal effect, and more advanced techniques, such as finite impulse response (FIR) ANNs and recurrent ANNs, are being developed and explored to deal specifically with this type of problem.
Real-world examples of time-series problems using ANNs include:
² Motorola bought Lexicus in 1993 for an estimated US$7 million, and the focus of Lexicus is now on developing Chinese writing recognition [Hitheesing 1996].
³ The company has since listed on the US stock exchange (NASDAQ:PPNT) under the trading name of PAPNET of Ohio. The PAPNET diagnosis program has recently been made available in Australia.
⁴ LBS Capital Management Inc. is a Clearwater, Florida, firm that uses Artificial Neural Networks and Artificial Intelligence to invest US$600 million, half of which is pension assets. It has reported no losing year in stocks and bonds since the strategy was launched in 1986, and its mid-cap returns have ranged from 14.53% in 1993 to 95.60% in 1991, compared to the S&P 400 (sic?), which returned 13.95% and 50.10% respectively [Elgin 1994].
⁵ The basic objective of cluster analysis is to discover the natural groupings of items (or variables); clustering algorithms are used to search for good, but not necessarily the best, groupings. They are widely used in understanding the complex nature of multivariate relationships (Johnson and Wichern 1988).
1.6 References
1. “Tilting at Chaos”, The Economist, p. 70, August 15, 1992.
2. Anderer P, et al., “Discrimination between demented patients and normals based
on topographic EEG slow wave activity: comparison between z statistics,
discriminant analysis and artificial neural network classifiers”,
Electroencephalogr Clin Neurophysiol, No. 91 (2), pp. 108-17, 1994.
3. Bankman IN, et al., “Feature-based detection of the K-complex wave in the human
electroencephalogram using neural networks”, IEEE Trans Biomed Eng,; No. 39,
pp.1305-10, 1992.
4. Baxt, W.G. and Skora, J., “Prospective validation of artificial neural network
trained to identify acute myocardial infarction”, The Lancet, v347 n8993, p12(4),
Jan 6, 1996.
5. Baxt, W.G., “Application of Artificial Neural Networks to Clinical Medicine”,
The Lancet, v346 n8983, p1135(4), Oct. 28, 1995.
6. Blue, T., “Computers Trade Places in Tomorrow’s World”, The Australian,
August 21, 1993.
7. Boon ME, Kok LP, Beck S., “Histological validation of neural-network assisted
cervical screening: comparison with the conventional approach”, Cell Vision, vol.
2, pp. 23-27, 1995.
8. Boon ME, Kok LP., “Neural network processing can provide means to catch
errors that slip through human screening of smears”, Diag Cytopathol, No. 9, pp.
411-416. 1993.
9. Bortolan G, Willems JL., “Diagnostic ECG classification based on neural
networks” Journal of Electrocardiology, No. 26, pp. 75-79, 1993.
10. Bylinsky, G., “Computers That Learn by Doing”, Fortune, pp. 96-102, September
6, 1993.
11. Colin, A, “Exchange Rate Forecasting at Citibank London”, Proceedings, Neural
Computing 1991, London, 1991.
12. Colin, A. M., “Neural Networks and Genetic Algorithms for Exchange Rate
Forecasting”, Proceedings of International Joint Conference on Neural Networks,
Beijing, China, November 1-5, 1992, 1992.
13. Devine B, Macfarlane PW, “Detection of electrocardiographic `left ventricular
strain’ using neural nets”, Med Biol Eng Comput; No. 31, pp. 343-48, 1993.
14. Doherty, R., “FAA Adds 40 Sniffers”, Electronic Engineering Times, issue 554,
September 4, 1989.
15. Dybowski, R. and Gant, V., “Artificial neural networks in pathology and medical
laboratories”, The Lancet, v346 n8984, p1203(5), Nov. 4, 1995.
16. Edenbrandt L, Devine B, Macfarlane PW., “Neural networks for classification of
ECG ST-T segments”, Journal of Electrocardiology; No. 25, pp. 167-73, 1992.
35. Medsker, L., Turban, E. and R. Trippi, “Neural Network Fundamentals for
Financial Analysts”, Neural Networks in Finance and Investing edited by Trippi
and Turban, Irwin, USA, Chapter 1, pp. 329-365, ISBN 1-55738-919-6, 1996.
36. Mehta, A., “Nations Unite for Electronic Brain”, Computer Weekly, issue 1148,
January 11, 1988.
37. Milton, R., “Neural Niches”, Computing, p. 30(2), Sept. 23, 1993.
38. Nelson, M. M. & Illingworth, W. T., A Practical Guide to Neural Nets, Addison-
Wesley Publishing Company, Inc., USA, 1991.
39. Newquist III, H. P., “Parlez-Vous Intelligence Artificielle?”, AI Expert, vol. 4, no.
9, p. 60, September 1989.
40. Pal, S. K. and Srimani, P. K., “Neurocomputing: Motivation, Models, and
Hybridization”, Computer, ISSN 0018-9162, Vol. 29 No. 3, IEEE Computer
Society, NY, USA, pp. 24-28, March 1996.
41. Penrose, P., “Star Dealer who works in the dark”, The London Times, p. 28, Feb.
26, 1993.
42. Rich, E. & Knight, K., Artificial Intelligence, Second Edition, McGraw Hill, pp. 4-
6, 1991.
43. Rosenthal DL, Mango LJ, Acosta DA and Peters RD., “‘Negative’ pap smears preceding carcinoma of the cervix: rescreening with the PAPNET system”, American Journal of Clinical Pathology, No. 100, pp. 331, 1993.
44. Schwartz, T. J., “IJCNN ‘89”, IEEE Expert, vol. 4 no. 3, pp. 77-78, Fall 1989.
45. Schwartz, T., “Applications on Parade”, Electronic Design, v43 n16, p68(1),
August 7, 1995.
46. Shandle, J., “Neural Networks are Ready for Prime Time”, Electronic Design,
v.41 n.4, p51(6), Feb. 18, 1993.
47. Takita, H., “Pennies from Heaven: selling accurate weather predictions”, Today
(Japan), v63 n7, p14(3), July 1995.
48. Trippi and Turban, Neural Networks in Finance and Investing, 2nd Edition, Irwin, USA, ISBN 1-55738-919-6, 1996.
49. Widrow, B., Rumelhart, D. E., Lehr, M. A., Neural Networks: Applications in
Industry, Business and Science, Journal A, vol. 35, No. 2, pp. 17-27, July 1994.
50. Winston, P., Artificial Intelligence, Third Edition, Addison-Wesley, 1992.
51. Zahedi, F., Intelligent Systems for Business: Expert Systems with Neural
Networks, Wadsworth Publishing Company, Belmont, USA, pp. 10-11, 1993.
“There is no expedient to which a man will not go to avoid the real labor of
thinking”
⁶ According to Eberhart and Dobbins [1990], James was considered by many to be the greatest American.
Rosenblatt incorporated learning based on the Hebbian Learning Rule into the McCulloch-Pitts neural model. The tasks that he used the perceptron to solve were simple pattern recognition problems, such as differentiating sets of geometric patterns and alphabets. The Artificial Intelligence community was excited by the initial success of the perceptron, and expectations were generally very high, with the perception⁷ of the perceptron being the panacea for all the known computer problems of that time. Bernard Widrow and Marcian Hoff contributed to this optimism when they published a paper [Widrow and Hoff 1960] on ANNs from the engineering perspective and introduced a single-neuron model called ADALINE, which became the first ANN to be used in a commercial application. It has been used ever since as an adaptive filter in telecommunications to cancel out echoes on phone lines. The ADALINE used a learning algorithm that became known as the delta rule⁸, which involves an error reduction method known as gradient descent or steepest descent.
However, in 1969, Marvin Minsky and Seymour Papert, two renowned researchers in the Artificial Intelligence field, published a book entitled ‘Perceptrons’ [Minsky and Papert 1969] criticizing the perceptron model and concluding that it (and ANNs as a whole) could not solve any real problems of interest. They proved that the perceptron model, being a simple linear model with no hidden layers, could only solve the class of problems known as linearly separable problems. One example of a non-linearly separable problem that they proved the perceptron model incapable of solving is the now infamous exclusive-or⁹ and its generalization, the parity detection problem. Rosenblatt did consider multilayer perceptron models, but at that time a learning algorithm to train such models was not available.
This critique, coupled with the death of Rosenblatt in a boating accident in 1971 [Masters 1993], cast doubt in the minds of research sponsors and researchers alike on the viability of developing practical applications from Artificial Neural Networks. Funds for ANN research dried up, and many researchers moved on to pursue more conventional Artificial Intelligence technology. In the prologue of the recent reprint of ‘Perceptrons’, Minsky and Papert [1988, pp. vii-xv]¹⁰ justified their criticism of the perceptron model and their pessimism about the ANN field at that time by claiming that the redirection of research was “no arbitrary diversion but a necessary interlude”. They felt that more time was needed to develop adequate ideas about the representation of knowledge before the field could progress further. They further claimed that this diversion of resources brought about many new and powerful ideas in symbolic AI, such as relational databases, frames and production systems, which in turn benefited many other research areas in psychology, brain science, and applied expert systems. They hailed the 1970s as a golden age of a new field of research into the representation of knowledge. Ironically, this signaled the end of the second period of ANN development and the beginning of the Dark Ages for ANN research.
⁷ Pardon the pun!
⁸ This algorithm is also known as the Widrow-Hoff or Least Mean Squares method. An extension of this algorithm is used today in the back-propagation algorithm.
⁹ The exclusive-or (XOR) problem and the linear separability issue are discussed in more detail in Chapter 3.
¹⁰ Interestingly, the reprint of ‘Perceptrons’ was dedicated by the authors to the memory of Frank Rosenblatt.
Neurons communicate with each other through synapses, which are gaps or junctions between the connections. The transmitting side of the synapse releases neurotransmitters, which are paired to the neuroreceptors on the receiving side. Learning is usually done by adjusting existing synapses, though some learning and memory functions are carried out by creating new synapses. In the human brain, neurons are organized in clusters, and only several thousand, or hundreds of thousands, participate in any given task. Figure 2-1 shows a sample neurobiological structure of a neuron and its connections.
The axon of a neuron is the output path of a neuron that branches out through axon
collaterals which in turn connect to the dendrites or input paths of neurons through a
junction or a gap known as the synapse. It is through these synapses that most learning is
carried out by either exciting or inhibiting their associated neuron activity. However, not
all neurons are adaptive or plastic. Synapses contain neurotransmitters that are released
according to the incoming signals. The synapses excite or inhibit their associated neuron
activity depending on the neurotransmitters released. A biological neuron will add up all
the activating signals and subtract all the inhibiting signals from all of its synapses. It will
only send out a signal to its axon if the difference is higher than its threshold of activation.
The processing in the biological brain is highly parallel and is also very fault tolerant. The
fault tolerance characteristic is a result of the neural pathways being very redundant and
information being spread throughout synapses in the brain. This wide distribution of
information also allows the neural pathways to deal well with noisy data.
A biological neuron is so complex that current supercomputers cannot even model a single neuron. Researchers have therefore simplified neuron models in designing ANNs.
Figure 2-1: A typical biological neuron, showing the axon, dendrites, cell body, and synapses.
With an ANN, the system builder does not need to know a priori the rules or models required to perform the desired task. Instead, the builder trains the ANN to ‘learn’ from previous samples of data, in much the same way that a teacher would teach a child to recognize shapes, colors, alphabets, etc. The ANN builds an internal representation of the data and by doing so ‘creates’ an internal model that can be used with new data it has not seen before.
Existing computers process information in a serial fashion while ANNs process information in parallel. This is why, even though a human brain neuron transfers information in the millisecond (10⁻³ s) range while current computer logic gates operate in the nanosecond (10⁻⁹ s) range, about a million times faster, a human brain can still process a pattern recognition task much faster and more efficiently than the fastest currently available computer. The brain has approximately 10¹¹ neurons, each of which acts as a simple processor that processes data concurrently, i.e. in parallel.
Tasks such as walking and cycling seem to be easy to humans once they have learned them
and certainly not much thought is needed to perform these tasks once they are learnt.
However, writing a conventional computer program to allow a robot to perform these tasks
is very complex. This is due to the enormous quantity of data that must be processed in
order to cope with the constantly changing surrounding environment. These changes
require frequent computation and dynamic real time processing. A human child learns
these tasks by trial and error. For example, in learning to walk, a child gets up, staggers
and falls, and keeps repeating the actions over and over until he/she has learned to walk.
The child effectively ‘models’ the walking task in the human brain through constant
adjustments of the synaptic strengths or weights until a stable model is achieved.
Humans (and neural networks) are very good at pattern recognition tasks. This explains why one can usually guess a tune from hearing just a few bars of it, or how a letter carrier can read a wide variety of handwritten addresses without much difficulty. In fact, people tend to associate their senses with their experiences. For example, in the ‘Wheel of Fortune’ game show, the contestants and viewers are usually able to guess a phrase correctly from only a few visible letters. The eyes take in the whole phrase, leaving the brain to fill in the missing letters and associate the result with a known phrase. If we were to process this information sequentially like a serial computer, i.e. look at one visible character at a time and try to work out the phrase, it would be very difficult. This suggests that pattern recognition tasks are easier to perform by looking at a whole pattern (which is more akin to a neural network’s parallel processing) than in a sequential manner (as in a conventional computer’s serial processing).
In contrast, tasks that involve many numerical computations are still done faster by computers, because most numerical computations can be reduced to binary representations that allow fast serial processing. Most of today’s ANN programs are simulated on serial computers, which is why speed, specifically training time, is still a major issue for ANNs. A growing number of ANN hardware products¹¹ are available in the market today, including personal computer-based ones like Intel’s Ni1000 and Electronically Trainable Artificial Neural Network (ETANN), IBM’s ZISC/ISA Accelerator for PC and the Brainmaker Professional CNAPS Accelerator System. These ANN hardware products process information in parallel, but the costs and the learning curves required to use them are still quite prohibitive. Most researchers are of the view that in the near future, a special ANN chip will sit next to the more familiar CPU chip in personal computers, performing pattern recognition tasks such as voice and optical character recognition.
¹¹ See Lindsey and Lindblad [1994, 1995] and Lindsey et al. [1996] for a comprehensive listing of commercial ANN hardware.
¹² Serial computers are also called Von Neumann computers in the computer literature.
¹³ The old adage of garbage in, garbage out holds especially true for ANN modeling. A well-known case in which an ANN learned an incorrect model involved the identification of a person’s sex from a picture of his/her face. The ANN application was trained to identify a person as either male or female by being shown various pictures of different persons’ faces. At first, researchers thought that the ANN had learnt to differentiate the face of a male from a female by identifying the visual features of a person’s face. However, it was later discovered that in the pictures used as input data, all the male persons’ heads were nearer to the top edge of the pictures, presumably because the males in the data were on average taller than the females. The ANN model had therefore learned to differentiate the sex of a person by the distance of his/her head from the top edge of the picture rather than by identifying visual facial features.
The major weakness of ANNs is their lack of explanation for the models that they create. Research is currently being conducted to unravel the complex network structures created by ANNs. Even though ANNs are easy to construct, finding a good ANN structure, as well as pre-processing and post-processing the data, is a very time-consuming process. Ripley [1993] states that ‘the design and learning for feed-forward networks are hard’. He further quoted research by Judd [1990] and Blum and Rivest [1992] showing this problem to be NP-complete¹⁴.
Figure 2-2: An artificial neuron j. Inputs x₁, x₂, ..., xᵢ arrive on connections with weights w₁ⱼ, w₂ⱼ, ..., wᵢⱼ; the neuron forms the weighted sum hⱼ = Σᵢ wᵢⱼxᵢ and produces the output Oⱼ = g(hⱼ).
In the human brain, neurons communicate by sending signals to each other through
complex connections. ANNs are based on the same principle in an attempt to simulate the
learning process of the human brain by using complex algorithms. Every connection has a
weight attached which may have either a positive or a negative value associated with it.
Positive weights activate the neuron while negative weights inhibit it. Figure 2-2 shows a network structure with inputs (x₁, x₂, ..., xᵢ) connected to neuron j with weights (w₁ⱼ, w₂ⱼ, ..., wᵢⱼ) on each connection. The neuron sums all the signals it receives, with each signal multiplied by the weight on its connection.
¹⁴ NP (Non-Polynomial)-complete problems, as mentioned in Chapter 1, are a set of very difficult problems.
¹⁵ There is no standardization of terminology in the artificial neural network field, although the Institute of Electrical and Electronics Engineers currently has a committee looking into it. Other terminology that has been used to describe the artificial neuron includes processing elements, nodes, neurodes, units, etc.
¹⁶ In some ANN literature the layers are also called slabs.
This output (hⱼ) is then passed through a transfer (activation) function, g(h), that is normally non-linear, to give the final output Oⱼ. The most commonly used function is the sigmoid (logistic) function, because of its easily differentiable properties¹⁷, which is very convenient when the back-propagation algorithm is applied. The whole process is discussed in more detail in chapter 3.
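To make the neuron computation concrete, here is a minimal Python sketch of a single neuron with a logistic transfer function; the input and weight values are purely illustrative and not taken from the text.

```python
import math

def neuron_output(inputs, weights, threshold=0.0):
    """O_j = g(h_j): the weighted sum of the inputs, h_j, passed
    through the logistic (sigmoid) transfer function g."""
    h = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return 1.0 / (1.0 + math.exp(-h))

# Illustrative values: two inputs feeding neuron j.
x = [0.5, 1.0]    # inputs x1, x2
w = [0.8, -0.4]   # weights w1j, w2j
print(neuron_output(x, w))  # h = 0.5*0.8 + 1.0*(-0.4) = 0, so output = 0.5
```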
The back-propagation ANN is a feed-forward neural network structure that takes the input
to the network and multiplies it by the weights on the connections between neurons or
nodes; summing their products before passing it through a threshold function to produce
an output. The back-propagation algorithm works by minimizing the error between the
output and the target (actual) by propagating the error back into the network. The weights
on each of the connections between the neurons are changed according to the size of the
initial error. The input data are then fed forward again, producing a new output and error.
The process is reiterated until an acceptable minimized error is obtained. Each of the
neurons uses a transfer function¹⁸ and is fully connected to nodes on the next layer. Once
the error reaches an acceptable value, the training is halted. The resulting model is a
function that is an internal representation of the output in terms of the inputs at that point.
A more detailed discussion of the back-propagation algorithm is given in chapter 3.
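The training cycle just described can be sketched in a few lines of Python. This is a minimal illustration under assumed settings, not the book's implementation: a one-hidden-layer network with logistic activations, trained by gradient descent on the XOR problem (discussed in chapter 3), with an illustrative learning rate and stopping tolerance.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # targets (XOR)

# Small random initial weights; b1 and b2 play the role of thresholds.
W1, b1 = rng.uniform(-0.5, 0.5, (2, 4)), np.zeros(4)
W2, b2 = rng.uniform(-0.5, 0.5, (4, 1)), np.zeros(1)
eta = 0.5  # learning rate (illustrative)

for epoch in range(50000):
    # Feed forward: weighted sums passed through the transfer function.
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    error = T - O
    if np.mean(error ** 2) < 1e-3:   # acceptable minimized error
        break
    # Propagate the error back and adjust the weights.
    d_out = error * O * (1 - O)            # logistic derivative O(1 - O)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 += eta * H.T @ d_out;  b2 += eta * d_out.sum(axis=0)
    W1 += eta * X.T @ d_hid;  b1 += eta * d_hid.sum(axis=0)

print(O.round(2))  # outputs should approach [0, 1, 1, 0]
```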
¹⁷ The sigmoid (logistic) function is defined as $O_{pj} = \frac{1}{1 + e^{-net_{pj}}}$, where, in the ANN context, $O_{pj}$ is the output of neuron j given an input pattern p and $net_{pj}$ is the total input to the neuron. The derivative of the output with respect to the total input is required to update the weights in the back-propagation algorithm. Thus we have $\frac{\partial O_{pj}}{\partial net_{pj}} = O_{pj}(1 - O_{pj})$, a trivial derivation. A more detailed discussion of the back-propagation algorithm is given in chapter 3.
The builder must then ‘train’ the ANN by adjusting its weights to minimize the difference between the current ANN output and the desired output.
Finally, an evaluation process has to be conducted to determine if the ANN has ‘learned’
to solve the task at hand. This evaluation process may involve periodically halting the
training process and testing its performance until an acceptable result is obtained. When an
acceptable result is obtained, the ANN is then deemed to have been trained and ready to be
used.
As there are no fixed rules in determining the ANN structure or its parameter values, a
large number of ANNs may have to be constructed with different structures and
parameters before determining an acceptable model. The trial and error process can be
tedious and the experience of the ANN user in constructing the networks is invaluable in
the search for a good model.
Determining when the training process should be halted is of vital importance in obtaining a good model. If an ANN is overtrained, a curve-fitting problem may occur whereby the ANN starts to fit itself to the training set instead of creating a generalized model. This typically results in poor predictions on the test and validation data sets. On the other hand, if the ANN is not trained for long enough, it may settle at a local minimum rather than the global minimum solution, which typically produces a sub-optimal model. By periodically testing the ANN on the test set and recording the results for both the training and test data sets, the number of iterations that produces the best model can be obtained. All that is then needed is to reset the ANN and train the network up to that number of iterations.
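As a sketch of this procedure, the following Python fragment records the test-set error at regular intervals and reports the iteration count at which it was lowest. The `net` object with `train_one_epoch` and `error` methods is hypothetical; any trainable model exposing equivalent operations would do.

```python
def find_best_iteration(net, train_data, test_data,
                        max_epochs=1000, test_every=10):
    """Train while periodically recording the test-set error, then
    return the epoch at which the test error was lowest.  The network
    can afterwards be reset and retrained up to that epoch count."""
    history = []
    for epoch in range(1, max_epochs + 1):
        net.train_one_epoch(train_data)          # hypothetical method
        if epoch % test_every == 0:
            history.append((net.error(test_data), epoch))
    best_error, best_epoch = min(history)
    return best_epoch, best_error
```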
2.9 References
1. Blum, A. L. and Rivest, R. L., “Training a 3-node Neural Network is NP-complete”, Neural Networks 5, pp. 117-127, 1992.
2. Bryson, A. E. and Ho, Y.-C., Applied Optimal Control, Blaisdell, 1969.
3. Cowan, J. D. and Sharp, D. H., “Neural Nets and Artificial Intelligence”, Daedalus,
117(1), pp. 85-121, 1988.
4. Fischler and Firschein, Intelligence: The Eye, the Brain, and the Computer, Reading,
MA, Addison-Wesley, p. 23, April 1987.
5. Hebb, D. O., The Organization of Behavior, John Wiley, New York, 1949.
6. James, W., Psychology (Briefer Course), Holt, New York, 1890.
7. Judd, J. S., Neural Network Design and Complexity of Learning, MIT Press, USA,
1990.
8. Lindsey, C. S. and Lindblad, T., “Review of Hardware Neural Networks: A User’s Perspective”, Proceedings of ELBA94, 1994.
9. Lindsey, C. S. and Lindblad, T., “Survey of Neural Network Hardware”, Proceedings of SPIE95, 1995.
10. Lindsey, C. S., Denby, B. and Lindblad, T., June 11, 1996, Neural Network Hardware,
[Online], Artificial Neural Networks in High Energy Physics,
Available: http://www1.cern.ch/NeuralNets/nnwInHepHard.html, [1996, August 30].
11. Masters, T., Practical Neural Network Recipes in C++, Academic Press Inc., San
Diego, CA., USA, ISBN: 0-12-479040-2, p.6, 1993.
12. McCartor, H., “Back Propagation Implementation on the Adaptive Solutions CNAPS
Neurocomputer”, Advances in Neural Information Processing Systems 3, USA, 1991.
13. McCulloch, W. S. and Pitts, W., “A Logical Calculus of Ideas Immanent in Nervous Activity”, Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
14. Minsky, M. and Papert, S. A., Perceptrons, MIT Press, Cambridge, MA, USA,1969.
15. Minsky, M. and Papert, S. A., Perceptrons. Expanded Edition, MIT Press, Cambridge,
MA, USA, ISBN: 0-262-63111-3, 1988.
16. Nelson, M. M. and Illingworth, W. T., A Practical Guide to Neural Nets, Addison-
Wesley Publishing Company, Inc., ISBN: 0-201-52376-0/0-201-56309-6, USA, 1991.
17. Neural Computing: NeuralWorks Professional II/Plus and NeuralWorks Explorer,
NeuralWare Inc. Technical Publishing group, Pittsburgh, PA, USA, 1991.
18. Ripley, B. D., “Statistical Aspects of Neural Networks”, Networks and Chaos:
Statistical and Probabilistic Aspects edited by Barndoff-Nielsen, O. E., Jensen, J.L. and
Kendall, W.S., Chapman and Hall, London, United Kingdom, 1993.
19. Rosenblatt, F., “The perceptron: a probabilistic model for information storage and
organization in the brain”, Psychological Review, 65:pp.386-408, 1958.
20. Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning Internal
Representations by Back-Propagating Errors”, Nature, No. 323: pp.533-536, 1986a.
21. Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning Internal
Representations by Error Propagation”, Parallel Distributed Processing: Explorations
in the microstructure of Cognition edited by Rumelhart, McClelland and the PDP
Research Groups Vol.1, pp. 216-271, MIT Press, Cambridge Mass., USA, ISBN: 0-
262-18120-7, 1986b.
22. Sejnowski, T. J. and Rosenburg, C. R., “Parallel Networks that Learn to Pronounce English Text”, Complex Systems, No. 1, pp. 145-168, 1987.
23. Shih, Y., Neuralyst User’s Guide, Cheshire Engineering Corporation, USA, p. 21,
1994.
24. Werbos, P., Beyond Regression: New Tools for Prediction and Analysis in the
Behavioral Sciences, Ph.D. thesis, Harvard University, 1974.
25. Widrow, B. and Hoff, M. D., “Adaptive Switching Circuits”, 1960 IRE WESCON
Convention Record, Part 4, pp. 96-104, 1960.
“The real problem is not whether machines think but whether men do.”
B. F. Skinner, Contingencies of Reinforcement, 1969
“There are two kinds of statistics, the kind you look up and the kind you make up.”
Rex Stout (1886-1975), Death of a Doxy, 1966
3.2 Neurodynamics
3.2.1 Inputs
The input layer of an ANN typically functions as a buffer for the inputs, transferring the
data to the next layer. Preprocessing the inputs may be required as ANNs deal only with
numeric data. This may involve scaling the input data and converting or encoding the input
data to a numerical form that can be used by the ANN. For example, in an ANN real estate price simulator described in a paper by Haynes and Tan [1993], qualitative data pertaining to the availability of certain features of a residential property were given a binary representation: features like a swimming pool, a granny flat or a waterfront location were represented with a binary value of ‘1’ if the feature was available and ‘0’ if it was not.
image to be presented to an ANN can be converted into binary values of zeroes and ones.
For example, the character ‘T’ can be represented as shown in Figure 3-1.
¹⁹ As mentioned earlier, they are also called processing elements, neurodes, nodes, units, etc.
Figure 3-1: The binary representation for the letter ‘T’
1111111
0001000
0001000
0001000
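A small Python sketch of both encodings described above; the feature names follow the real-estate example, and the bitmap is the letter ‘T’ from Figure 3-1.

```python
# Qualitative property features encoded as binary inputs (1 = available).
features = {"swimming_pool": True, "granny_flat": False, "waterfront": True}
property_inputs = [1 if available else 0 for available in features.values()]
# -> [1, 0, 1]

# The 4x7 bitmap of the letter 'T' flattened into a 28-element input vector.
bitmap_T = ["1111111",
            "0001000",
            "0001000",
            "0001000"]
letter_inputs = [int(bit) for row in bitmap_T for bit in row]
```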
3.2.2 Outputs
The output layer of an ANN functions in a similar fashion to the input layer except that it
transfers the information from the network to the outside world. Post-processing of the
output data is often required to convert the information to a comprehensible and usable
form outside the network. The post-processing may be as simple as a scaling of the outputs, or as elaborate as the processing performed in hybrid systems.
For example, in chapter 4 of this book, on the prediction of financial distress in credit
unions, the post-processing is relatively simple. It only requires the continuous output
values from the ANN to be converted into a binary form of ‘1’ (indicating a credit union in
distress) or ‘0’ (indicating a credit union is not in distress). However, in the foreign
exchange trading system application in chapter 5, the post-processing of the network
output is more complex. The ANN output is the predicted exchange rate but the trading
system requires a trading signal to be generated from the ANN output. Thus, the ANN output has to pass through a set of rules to produce a trading signal of ‘Buy’, ‘Sell’ or ‘Do Nothing’.
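A minimal sketch of such post-processing in Python. The no-trade band and threshold logic here are illustrative assumptions only; the actual rule set used in chapter 5 is not reproduced.

```python
def trading_signal(predicted_rate, current_rate, band=0.005):
    """Convert the ANN's predicted exchange rate into a trading signal.
    `band` is an illustrative no-trade band, not the book's actual rule."""
    change = (predicted_rate - current_rate) / current_rate
    if change > band:
        return "Buy"
    if change < -band:
        return "Sell"
    return "Do Nothing"

print(trading_signal(0.7580, 0.7510))  # ~0.9% predicted rise -> "Buy"
```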
3.2.3 Transfer (Activation) Functions
The transfer or activation function is a function that determines the output from a
summation of the weighted inputs of a neuron. The transfer functions for neurons in the
hidden layer are often nonlinear and they provide the nonlinearities for the network.
For the example in Figure 3-2, the output of neuron j, after the summation of its weighted inputs from neurons 1 to i has been mapped by the transfer function f, is:

$$O_j = f\left(\sum_i w_{ij} x_i\right)$$

Equation 3-1
Figure 3-2: Diagram of the neurodynamics of neuron j. Inputs x₁ to xᵢ, weighted by w₁ⱼ to wᵢⱼ, are summed to give hⱼ = Σᵢ wᵢⱼxᵢ, which the transfer function f maps to the output Oⱼ = f(hⱼ).
A transfer function maps any real number into a domain normally bounded by 0 to 1 or -1 to 1. Bounded activation functions are often called squashing functions [Sarle 1994]. Early ANN models, like the perceptron, used a simple threshold function (also known as a step function, hard-limiting activation or Heaviside function):

threshold: $f(x) = 0$ if $x < 0$, $1$ otherwise

The most common transfer functions used in current ANN models are the sigmoid (S-shaped) functions. Masters [1993] loosely defined a sigmoid function as a ‘continuous, real-valued function whose domain is the reals, whose derivative is always positive, and whose range is bounded’. Examples of sigmoid functions are:

logistic: $f(x) = \dfrac{1}{1 + e^{-x}}$
hyperbolic tangent: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

The logistic function remains the most commonly applied in ANN models due to the ease of computing its derivative:

$f'(x) = f(x)(1 - f(x))$
The output Oⱼ of neuron j in the earlier example of Figure 3-2, if the function f is a logistic function, becomes:

$$O_j = \frac{1}{1 + e^{-\sum_i w_{ij} x_i - \theta_j}}$$

Equation 3-2

where θⱼ is the threshold on unit j.
However, Kalman and Kwasny [1992] argue that the hyperbolic tangent function is the
ideal transfer function. According to Masters [1993], the shape of the function has little
effect on a network although it can have a significant impact on the training speed. Other
common transfer functions include:
linear or identity: $f(x) = x$ (normally used in the input and/or output layer)

Gaussian: $f(x) = e^{-x^2/2}$
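For reference, the transfer functions listed above are straightforward to express in Python; this is a plain transcription of the definitions, not any particular package's API.

```python
import math

def threshold(x):   # step / hard-limiting / Heaviside
    return 0.0 if x < 0 else 1.0

def logistic(x):    # sigmoid, range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):        # hyperbolic tangent, range (-1, 1)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def linear(x):      # identity, common in input/output layers
    return x

def gaussian(x):    # bell-shaped, range (0, 1]
    return math.exp(-x * x / 2.0)

def logistic_derivative(x):  # f'(x) = f(x)(1 - f(x)), used by back-propagation
    fx = logistic(x)
    return fx * (1.0 - fx)
```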
Sigmoid functions can never reach their theoretical limit values, and it is futile to try to train an ANN to achieve these extreme values. Values that are close to the limits should be considered as having reached those values. For example, with a logistic function, whose limits are 0 and 1, a neuron should be considered fully activated at values around 0.9 and turned off at values around 0.1. This is another reason why ANNs cannot do numerical computation as well or as accurately as simple serial computers, i.e. a calculator. Thus an ANN is not a suitable tool for balancing check books!
3.2.4 Weighting Schemes and Learning Algorithms
The initial weights of an ANN are often selected randomly or by an algorithm. The
learning algorithm determines how the weights are changed, normally depending on the
size of the error in the network output to the desired output. The objective of the learning
algorithm is to minimize this error to an acceptable value. The back-propagation algorithm
is by far the most popular learning algorithm for multilayer networks and will be discussed
in more detail in section 3.4.1.2.
require only a single pass to obtain a solution. According to Nelson and Illingworth [1991], recurrent networks are used to perform functions like automatic gain control or energy normalization and selecting a maximum in complex systems.
Most ANN books, however, classify networks into two categories only: feedforward
networks and recurrent networks. This is done by classifying all networks with feedback
connections or loops as recurrent networks. Fully connected feedforward networks are
often called multi-layer perceptrons (MLPs) and they are by far the most commonly used
ANNs. All the ANNs used in this book are MLPs. They will be discussed in more detail in
section 3.3.6.
3.3.2 The Number of Hidden Neurons
Hidden neurons are required to compute difficult functions known as nonseparable
functions, which are discussed in section 3.3.5. The number of input and output neurons is determined by the application at hand. However, there are no standard rules or theories for determining the number of neurons in the hidden layers, although various ANN researchers have suggested rules of thumb:
• Shih [1994] suggested that the network topology should have a pyramidal shape;
that is to have the greatest number of neurons in the initial layers and have fewer
neurons in the later layers. He suggested the number of neurons in each layer should
be a number from mid-way between the previous and succeeding layers to twice the
number of the preceding layer. The examples given suggest that a network with 12
neurons in its previous layer and 3 neurons in the succeeding layer should have 6 to
24 neurons in the intermediate layer.
• According to Azoff [1994], a rough guideline based on theoretical conditions of what is known as the Vapnik-Chervonenkis dimension²⁰ recommends that the number of training data points should be at least ten times the number of weights. He also quoted a theorem due to Kolmogorov [Hecht-Nielsen 1990 and Lippman 1987] that suggests a network with one hidden layer and 2N+1 hidden neurons is sufficient for N inputs.
• Lawrence [1994, p. 237] gives the following formula for determining the number of hidden neurons required in a network: number of hidden neurons = training facts × error tolerance, where ‘training facts’ refers to the in-sample data and the error tolerance refers to the level of accuracy desired or the acceptable error range.
• Baum and Haussler [1988] suggest that the number of neurons in the hidden layer should be calculated as $j = \frac{me}{n+z}$, where j is the number of neurons in the hidden layer, m is the number of data points in the training set, e is the error tolerance, n is the number of inputs and z is the number of outputs.
The latter two rules of thumb are very similar, and they may not be meaningful in cases where the error tolerance is significantly smaller than the number of training facts. For example, if the number of training facts is 100 and the error tolerance is 0.001, the number of hidden neurons under Lawrence’s proposal would be 0.1 (meaningless!), while Baum and Haussler’s proposal would result in an even lower value; a comparison of the two is sketched below. Most statisticians are not convinced that rules of thumb are of any use. They argue that there is no way to determine a good network topology from just the number of inputs and outputs [Neural Network FAQ 1996].
²⁰ Azoff referred to an article by Hush and Horne [1993].
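The degenerate case described above is easy to check numerically. A sketch of the two formulas in Python, with an illustrative four-input, one-output network assumed for the Baum-Haussler rule:

```python
def lawrence_hidden(training_facts, error_tolerance):
    # Lawrence [1994]: hidden neurons = training facts * error tolerance.
    return training_facts * error_tolerance

def baum_haussler_hidden(m, e, n, z):
    # Baum and Haussler [1988]: j = m*e / (n + z).
    return m * e / (n + z)

# The text's example: 100 training facts with an error tolerance of 0.001.
print(lawrence_hidden(100, 0.001))             # 0.1 hidden neurons (meaningless)
print(baum_haussler_hidden(100, 0.001, 4, 1))  # 0.02 (even lower)
```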
The Neural Network FAQ [1996] suggests a method called early stopping or stopped training, whereby a large number of hidden neurons is used with a very slow learning rate and small random initial weight values. The out-of-sample error rate is computed periodically during training, and training is halted when the error rate on the out-of-sample data starts to increase. A similar method to early stopping is used in the development of the ANN applications for the financial distress and foreign exchange trading problems of this book. However, those ANNs do not use ‘lots of hidden units’ as suggested by the article. Instead, they start with small numbers of hidden neurons, and the numbers are increased gradually only if the ANNs do not seem to ‘learn’. In this way, the problem of overfitting or curve-fitting, which can occur when there are more weights (parameters) than sample data, can be avoided. However, a recent report by Lawrence et al. [1996] suggests that using “oversize” networks can reduce both training and generalization error.
3.3.3 The Number of Hidden Layers
According to the Neural Network FAQ [1996], hidden layers may not be required at all. It cites McCullagh and Nelder [1989] in support of this view: they found linear and generalized linear models to be useful in a wide variety of applications, and they suggest that even if the function to be learned is mildly non-linear, a simple linear model may still perform better than a complicated nonlinear model if there is insufficient data or too much noise to estimate the nonlinearities accurately.
MLPs that use the step/threshold/Heaviside transfer functions need two hidden layers for
full generality [Sontag 1992], while an MLP that uses any of a wide variety of continuous
nonlinear hidden-layer transfer functions requires just one hidden layer with ‘an arbitrarily
large number of hidden neurons’ to achieve the ‘universal approximation’ property
described by Hornik et al. [1989] and Hornik [1993].
3.3.4 The Perceptron
The perceptron model, as mentioned in earlier chapters, was proposed by Frank Rosenblatt
in the mid 1960s. According to Carling [1992], the model was inspired by the discovery of
Hubel and Wiesel [1962] of the existence of some mechanism in the eye of a cat that can
determine line directions. Rosenblatt developed the perceptron learning theorem (that was
subsequently proved by Arbib [1989]) which states that if a set of patterns is learnable by a
perceptron, then the perceptron is guaranteed to find the appropriate weight set.
Essentially, Rosenblatt’s perceptron model was an ANN model consisting of only an input
layer and an output layer with no hidden layer. The input and output layers can have one or
more neurons. Rosenblatt’s model uses a threshold function as a transfer function although
the perceptron model can use any of the transfer functions discussed in section 3.2.3.
Therefore, if the sum of the inputs is greater than its threshold value, the output neuron will assume the value of 1, or else a value of 0. Fu [1994] states that, in terms of classification, an object will be classified by neuron j into Class A if

$$\sum_i w_{ij} x_i > \theta$$

Equation 3-4

where wᵢⱼ is the weight from neuron i to neuron j, xᵢ is the input from neuron i, and θ is the threshold on neuron j. If not, the object will be classified as Class B.
The weights on a perceptron model like the one shown in Figure 3-3 are adjusted by

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$$

Equation 3-5

where wᵢⱼ(t) is the weight from neuron i to neuron j at time t (the tth iteration) and Δwᵢⱼ is the weight adjustment. The weight change is computed using the delta rule:

$$\Delta w_{ij} = \eta \delta_j x_i$$

Equation 3-6

where η is the learning rate (0 < η < 1) and δⱼ is the error at neuron j:

$$\delta_j = T_j - O_j$$

Equation 3-7

where Tⱼ is the target output value and Oⱼ is the actual output of the network at neuron j. The process is repeated iteratively until convergence is achieved, i.e. until the errors are minimized to an acceptable level. The delta rule is discussed in more detail in section 3.4.1.1.
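A compact Python sketch of this training procedure, applying Equations 3-5 to 3-7 with a step-function output. The AND function and the learning rate are illustrative choices; by the perceptron learning theorem, any linearly separable pattern set would converge.

```python
def train_perceptron(samples, eta=0.1, max_epochs=100):
    """Adjust weights by w_ij(t+1) = w_ij(t) + eta * (T_j - O_j) * x_i
    until every sample is classified correctly (convergence)."""
    n_inputs = len(samples[0][0])
    w, theta = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0
            delta = target - o                    # Equation 3-7
            if delta != 0:
                errors += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
                theta -= eta * delta              # the threshold adapts too
        if errors == 0:
            break
    return w, theta

# Learns the linearly separable AND function.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(samples))
```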
Ripley [1993] claims that the number of random patterns a perceptron with N inputs can
classify without error is finite, since the patterns must be linearly separable. This is
irrespective of the existence of an algorithm to learn the patterns. He states that Cover
[1965] showed the asymptotic answer is 2N patterns. Ripley also proves the theorem in his
paper.
Initially there was widespread optimism, as the perceptron could compute a number of simple binary Boolean (logic) functions, i.e. AND, OR and NOT. The caveat, however, is that the only patterns a perceptron can learn are linear patterns, which severely limits the type of problems it can solve. This was the main criticism of Minsky and Papert [1969], leading them to conclude that the perceptron could not solve any ‘interesting problems’. One example of a relatively simple problem that they showed the perceptron could not solve is the exclusive-or (XOR) problem, which is discussed in the next section.
3.3.5 Linear Separability and the XOR Problem
Linear separability refers to the case when a linear hyperplane exists to separate all
instances of one class from another. A single plane can separate three-dimensional space
into two distinct regions. Thus by extension, if there were n inputs where n > 2, then
Equation 3-4 becomes:
n
∑w x
i =1
ij j = θj
Equation 3-8
forming a hyperplane of n-1 dimension in the n-dimensional space (also called
hyperspace), dividing the space into two halves. According to Freeman and Skapura
[1991, pp. 24-30], many real life problems require the separation of regions of points in
hyperspace into individual categories, or classes, which must be distinguished from other
classes. This type of problem is also known as a classification problem. Classification
problems can be solved by finding suitable arrangements of hyperplanes that can partition
n-dimensional space into various distinct regions. Although this task is very difficult for
n>2 dimensions, certain ANNs (e.g. MLPs) can learn the proper partitioning by
themselves.
As mentioned in the last section, the perceptron can solve most binary Boolean functions. In fact, all but two of the sixteen possible binary Boolean functions, the XOR and its complement, are linearly separable and can be solved by the perceptron. The XOR is a function that outputs a 1 if and only if its two inputs are not the same; otherwise the output is 0. The truth table for the XOR function is shown in Table 3-1.
Gallant [1993] showed that a perceptron model (which he called a single-cell linear discriminant model) can easily compute the AND, OR and NOT functions. He therefore defined a Boolean function to be a separable function if it can be computed by a single-cell linear discriminant model; otherwise it is classified as a nonseparable function. He further states that the XOR is the simplest nonseparable function, in that there is no nonseparable function with fewer inputs.
Application of the perceptron model of Figure 3-3 to the XOR problem yields:

$$O_j = f(h_j) = f(w_{1j}x_1 + w_{2j}x_2; \theta) = \begin{cases} 1, & w_{1j}x_1 + w_{2j}x_2 \geq \theta \\ 0, & w_{1j}x_1 + w_{2j}x_2 < \theta \end{cases}$$

Equation 3-9

where wᵢⱼ is the weight on the connection from neuron i to j, xᵢ is the input from neuron i, hⱼ is neuron j’s activation value and θ is the threshold value of the threshold function f. A set of weight values must be found that achieves the proper output for every input pair. We will show that this cannot be done.
From Equation 3-9, a line in the x₁ and x₂ plane is obtained:

$$\theta = w_{1j}x_1 + w_{2j}x_2$$

Equation 3-10

By plotting the XOR function and this line for some values of θ, w₁ⱼ and w₂ⱼ in the x₁ and x₂ plane (Figure 3-4), we can see that it is impossible to draw a single line that separates the 1s (represented by the squares) from the 0s (represented by the circles).
The next section will demonstrate how a multilayer perceptron (MLP) can be used to solve
this problem.
Figure 3-3: A simple perceptron model. Inputs x₁ and x₂ feed neuron j through weights w₁ⱼ and w₂ⱼ, producing the output Oⱼ = f(hⱼ, θⱼ).
Figure 3-4: A plot of the exclusive-or function showing that the two groups of inputs (represented by squares and circles) cannot be separated by the single line θ = w₁x₁ + w₂x₂.

Table 3-1: Truth Table for the Exclusive-Or Function

X1  X2  Output
0   0   0
0   1   1
1   0   1
1   1   0
3.3.6 The Multilayer Perceptron
As mentioned in earlier sections, an MLP (also called a multilayer feedforward network) is
an extension of the perceptron model with the addition of hidden layer(s) that have
nonlinear transfer functions in the hidden neurons. We have also mentioned that an MLP
having one hidden layer is a universal approximator, capable of learning any function that is continuous and defined on a compact domain²¹, as well as functions that consist of a finite collection of points. According to Masters [1993, pp. 85-90], MLPs can also learn many functions that do not meet these criteria; specifically, discontinuities can be theoretically tolerated, and functions that do not have compact support (such as normally distributed random variables) can be learned by a network with one hidden layer under some conditions²². Masters states that in practice a second hidden layer is only required for functions that contain a few discontinuities. He further states that the most common reason for an MLP to fail to learn is violation of the compact domain assumption, i.e. the inputs are not bounded. He concludes that if an MLP has a problem learning, it is not due to the model itself but to insufficient training, an insufficient number of neurons, an insufficient number of training samples, or an attempt to learn a supposed function that is not deterministic.
²¹ A compact domain means that the inputs have definite bounds, rather than having no limits on what they can be.
²² Kurkova [1995] has since proven this theoretical assumption.
Figure 3-5: A multilayer perceptron model that solves the XOR problem (adapted from Freeman and Skapura 1991, p. 29). Inputs x₁ and x₂ feed two hidden units with thresholds θ = 0.5 and θ = 1.5; the hidden outputs reach the output unit (threshold θ = 0.5) through weights of 0.6 and -0.2 respectively, giving the output Oⱼ = f(hⱼ, θⱼ).
Figure 3-6: A possible solution to the XOR problem, using two lines to separate the plane into three regions: the middle region gives output = 1 and the two outer regions give output = 0.
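The network of Figure 3-5 can be verified directly. In this Python sketch each input is assumed to reach both hidden units with a weight of 1 (the figure's input-to-hidden weights were lost in extraction), so the θ = 0.5 hidden unit acts as an OR and the θ = 1.5 unit as an AND; their outputs reach the θ = 0.5 output unit through weights of 0.6 and -0.2.

```python
def step(x, theta):
    return 1 if x >= theta else 0

def xor_mlp(x1, x2):
    """Figure 3-5's network, assuming unit input-to-hidden weights."""
    h_or = step(x1 + x2, 0.5)    # hidden unit with theta = 0.5 (OR)
    h_and = step(x1 + x2, 1.5)   # hidden unit with theta = 1.5 (AND)
    return step(0.6 * h_or - 0.2 * h_and, 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))   # reproduces Table 3-1: 0, 1, 1, 0
```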
3.4 Learning
Learning is the weight modification process of an ANN in response to external input.
There are three types of learning:
1. Supervised learning
It is by far the most common type of learning in ANNs. It requires many samples to
serve as exemplars. Each sample of this training set contains input values with
corresponding desired output values (also called target values). The network will
then attempt to compute the desired output from the set of given inputs of each
sample by minimizing the error of the model output to the desired output. It
attempts to do this by continuously adjusting the weights of its connection through
an iterative learning process called training. As mentioned in earlier sections, the
most common learning algorithm for training the network is the back-propagation
algorithm.
2. Unsupervised learning
3. Reinforcement learning
It is a hybrid learning method in that no desired outputs are given to the network,
but the network is told if the computed output is going in the correct direction or
not. It is not used in this book and hence will not be considered further.
‘biases’ {θj} which is usually taken to be one [Ripley 1993] by minimizing the total
squared error, E:
$E = \frac{1}{2} \sum_p \left( t_p - o_p \right)^2$

Equation 3-11
where op is the output for input xp, tp is the target output and p indexes the patterns in the training set. Both the delta rule and the back-propagation algorithm are forms of the gradient descent rule, a mathematical approach to minimizing the error between the actual and desired outputs. They do this by modifying the weights by an amount proportional to the first derivative of the error with respect to the weight. Gradient descent is akin to trying to move down to the lowest value of an error surface from the top of a hill without falling into any ravine.
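To make the descent idea concrete, the following sketch (not from the original text; the toy data and learning rate are illustrative assumptions) applies Equation 3-14 to a single linear neuron, o = wx + b, minimizing the squared error of Equation 3-11:

```python
# Gradient-descent sketch for one linear neuron minimizing
# E = 0.5 * sum_p (t_p - o_p)^2 over a toy training set (an assumption).

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # (input x_p, target t_p)
w, b = 0.0, 0.0
eta = 0.05  # learning rate, 0 < eta < 1

for epoch in range(2000):
    dw = db = 0.0
    for x, t in data:
        o = w * x + b
        dw += -(t - o) * x   # dE/dw contribution of this pattern
        db += -(t - o)       # dE/db contribution of this pattern
    w -= eta * dw            # Equation 3-14: move against the gradient
    b -= eta * db

print(round(w, 3), round(b, 3))  # approaches w = 2, b = 1 for this toy data
```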
3.4.1.1 The Delta Rule/ Least Mean Squares (LMS) (Widrow-Hoff)
The Least Mean Square (LMS) algorithm was first proposed by Widrow and Hoff (hence,
it is also called the Widrow-Hoff Rule) in 1960 when they introduced the ADALINE
(Adaptive Linear), an ANN model that was similar to the perceptron model except that it
only has a single output neuron and the output activation is a discrete bipolar function 23
that produces a value of 1 or -1. The LMS algorithm was superior to Rosenblatt’s
perceptron learning algorithm in terms of speed but it also could not be used on networks
with hidden layers.
Most literature claims the Delta Rule and the LMS Rule are one and the same [Freeman and Skapura 1991, p. 96; Nelson and Illingworth 1991, p. 137; Carling 1992, p. 74; Hecht-Nielsen 1990, p. 61]. They are the same in terms of the weight-change formula, ∆wij, given in Equation 3-6:

$\Delta w_{ij} = \eta \delta_j x_i$

Equation 3-6
where η is the learning rate (0<η<1) and δj is the error at neuron j. However, Fu [1994, p.
30] states that the Widrow-Hoff (LMS) Rule differs from the Delta Rule employed by the
perceptron model in the way the error is calculated for weight updating.
From Equation 3-6, the delta rule error is:

$\delta_j = T_j - O_j$

Equation 3-12
The LMS rule can be shown to be a gradient descent rule.
From Equation 3-11, if we substitute the output op with XpWp:

$E = \frac{1}{2} \sum_p \left( t_p - X_p W_p \right)^2$

Equation 3-13

where Xp is an input vector and Wp the weights vector.

23
This is also the reason why it does not work in networks with hidden layers.
Then, the gradient descent technique minimizes the error by adjusting the weights:

$\Delta W = -\eta \frac{\partial E}{\partial W}$

Equation 3-14

where η is the learning rate. From Equation 3-13 and Equation 3-14, the LMS rule can be rewritten as:

$\Delta W = \eta \left( t_p - X_p W_p \right) X_p$

Equation 3-15
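A minimal sketch of the per-pattern LMS update of Equation 3-15; the two-input toy patterns and the learning rate are illustrative assumptions:

```python
# LMS (Widrow-Hoff) sketch: per-pattern weight update of Equation 3-15.
# The toy patterns encode the target function t = x1 - x2 (an assumption).

patterns = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
W = [0.0, 0.0]
eta = 0.1

for epoch in range(200):
    for X, t in patterns:
        o = sum(x * w for x, w in zip(X, W))             # linear output X_p W_p
        delta = t - o                                     # LMS error term
        W = [w + eta * delta * x for x, w in zip(X, W)]   # Equation 3-15

print([round(w, 3) for w in W])  # approaches W = [1, -1] for this toy data
```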
3.4.1.2 The Back-propagation (BP)/Generalized Delta Rule
The back-propagation (BP) algorithm is a generalization of the delta rule that works for
networks with hidden layers. It is by far the most popular and most widely used learning
algorithm by ANN researchers. Its popularity is due to its simplicity in design and
implementation.
Figure 3-7
This is similar to Figure 2-2 in chapter 2. Back-propagation of errors for a single neuron j.
[Figure: inputs x1, x2, ..., xi reach neuron j through weights w1j, w2j, ..., wij; the neuron sums them into the activation hj, applies the transfer function to give Oj = g(hj), and the error ej = dj - oj against the desired output dj is propagated back.]
The single neuron model of Figure 3-7 will be used to explain the BP algorithm. The BP
algorithm is used mainly with MLP but a single neuron model is used here for clarity. The
methodology remains the same for all models.
The BP algorithm involves a two-stage learning process using two passes: a forward pass
and a backward pass. In the forward pass, the output Oj is computed from the set of input
patterns, Xi:
$O_j = g(h_j) = f(h_j, \theta_j)$, where $h_j = \sum_{i} w_{ij} x_i$

Therefore, $O_j = f\left( \sum_{i} w_{ij} x_i, \theta_j \right)$

Equation 3-16
where f is a nonlinear transfer function, e.g. sigmoid function, θj is the threshold value for
neuron j, xi is the input from neuron i and wij is the weights associated with the connection
from neuron i to neuron j.
After the output of the network has been computed, the learning algorithm is then applied
from the output neurons back through the network, adjusting all the necessary weights on
the connections in turn. The weight adjustment, ∆wij, is as in the LMS Rule, Equation 3-6:

$\Delta w_{ij} = \eta \delta_j x_i$

Equation 3-6

where η is the learning rate (0 < η < 1) and δj is the error at neuron j:

$\delta_j = e_j O_j (1 - O_j)$ for the output neuron, and $\delta_j = \left( \sum_k \delta_k w_{jk} \right) O_j (1 - O_j)$

Equation 3-17
for hidden neurons where k is the neuron receiving output from the hidden neuron.
The adjustments are then added to the previous values:
New Weight Value: wij = w’ij + ∆wij
Equation 3-18
where w’ij is the previous weight term.
The gradient descent method is susceptible to falling into a chasm and becoming trapped in local minima. If the error surface is a bowl, imagine the gradient descent algorithm as a marble rolling from the top of the bowl trying to reach the bottom (the global minimum of the error term, i.e. the solution). If the marble rolls too fast, it will overshoot the bottom and swing to the opposite side of the bowl. The speed of the descent can be controlled with the learning rate term, η. On the other hand, if the learning rate is set to a very small value, the marble will descend very slowly, which translates to longer training time. The error surface of a typical problem is normally not a smooth bowl but may contain ravines and chasms into which the marble could fall. A momentum term is thus often added to the basic method to prevent the model's search direction from swinging back and forth wildly.
The weight adjustment term of Equation 3-6 then becomes:

$\Delta w_{ij} = \eta \delta_j x_i + M \left( w'_{ij} - w''_{ij} \right)$

Equation 3-19

where M is the momentum term and w″ij is the weight before the previous weight w′ij. The momentum term allows a weight change to persist for a number of adjustment cycles. Notice that if M is set to zero, the equation reverts to Equation 3-6.
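Putting the forward pass (Equation 3-16), the error terms (Equation 3-17) and the momentum update (Equation 3-19) together, the sketch below trains a small 2-2-1 sigmoid network on XOR by back-propagation. The topology, learning rate and momentum value are illustrative assumptions, and the thresholds are folded into bias inputs rather than treated as separate θj terms:

```python
# Back-propagation with momentum for a 2-2-1 sigmoid network on XOR.
# Topology, eta and M are illustrative assumptions, not settings from the text.
import math, random

random.seed(0)
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
eta, M = 0.5, 0.9  # learning rate and momentum term

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

# Weights: w_h[j][i] connects input i to hidden j (index 2 is a bias input);
# w_o connects the hidden layer (plus bias) to the single output neuron.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
prev_h = [[0.0] * 3 for _ in range(2)]  # previous weight changes (momentum)
prev_o = [0.0] * 3

for epoch in range(10000):
    for x, t in DATA:
        xb = x + [1.0]                                    # append bias input
        hid = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h]
        hb = hid + [1.0]
        o = sigmoid(sum(w * v for w, v in zip(w_o, hb)))  # forward pass
        d_o = (t - o) * o * (1 - o)                       # Equation 3-17, output
        d_h = [d_o * w_o[j] * hid[j] * (1 - hid[j]) for j in range(2)]  # hidden
        for k in range(3):                                # Equation 3-19 updates
            dw = eta * d_o * hb[k] + M * prev_o[k]
            w_o[k] += dw
            prev_o[k] = dw
        for j in range(2):
            for k in range(3):
                dw = eta * d_h[j] * xb[k] + M * prev_h[j][k]
                w_h[j][k] += dw
                prev_h[j][k] = dw

for x, t in DATA:
    hid = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_h]
    o = sigmoid(sum(w * v for w, v in zip(w_o, hid + [1.0])))
    # Outputs should approach 0, 1, 1, 0; a different seed may be needed if
    # training stalls in a local minimum (see the discussion below).
    print(x, t, round(o, 2))
```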
Random noise is often added to the network to alleviate the local minima problem. The objective of the noise term is to 'jolt' the model out of a local minimum. Fahlman [1992] states that BP networks can and do fall into local minima, but these are often the very minima needed to solve the particular problem. In other words, local minima solutions may suffice for some problems and there is no need to seek the global minimum24.
There are many other variations to the BP algorithm but by far, BP still proves to be the
most popular and is implemented in almost all commercial ANN software packages.
24
This assumes that the global minimum is not very far from the local minima.
gives a list of statistics terminology that has equivalents in the ANN literature. Some of the more common ones are listed in Table 3-2.

Table 3-2
Statistical and ANN Terminology

Statistics      ANN
variables       features
residuals       errors
intercept       bias
forecasting     prediction
techniques; or they fail to compare their results to traditional statistical analysis and by not
doing so, invalidate any claims of a breakthrough.
Before the popularity of ANNs, few financial institutions used any form of statistical
methods (except for technical analysis, which some may claim to be pseudo-statistics) for
financial trading, and even fewer had a dedicated quantitative analysis unit for financial
analysis which is now a common sight in most major banks’ dealing rooms. As mentioned
in chapter 1, financial institutions are second only to the US Department of Defense in
sponsoring research into ANNs [Trippi and Turban 1996].
3.5.5 Conclusion of ANNs and Statistics
Sarle [1994] concludes it is unlikely that ANNs will supersede statistical methodology, as he believes that applied statistics is highly unlikely to be reduced to an automatic process or 'expert system'. He claims that statisticians depend on human intelligence to understand
the process under study and an applied statistician may spend more time defining a
problem and determining what questions to ask than on statistical computation. He does,
however, concede that several ANNs models are useful for statistical applications and that
better communication between the two fields would be beneficial. White [1992, p. 81]
agrees that statistical methods can offer insight into the properties, advantages and
disadvantages of the ANN learning paradigm, and conversely ANN learning methods have
much to offer in the field of statistics. For example, statistical methods such as Bayes
analysis and regression analysis have been used in generating forecasts with confidence
intervals that have deeper theoretical roots in statistical inference and data generating
processes. ANNs are superior for pattern recognition and are able to deal with almost any model, whereas statistical methods require assumptions of randomness.
ANNs have contributed more to statistics than statisticians would care to admit. They have
enabled researchers from different disciplines and backgrounds to use modeling tools that
were once only available to statisticians due to the complexities and restrictive conditions
imposed by statistical models. By making modeling more accessible (and more interesting
perhaps), ANNs researchers without statistical background are beginning to gain an
appreciation of statistical methodologies due to the inevitable crossing of paths between
ANNs and statistics.
There are definitely more visible ANN commercial applications than statistical applications, even though some of the ANN methodologies have, it is claimed, 'been known for decades if not centuries in statistical and mathematical literature' [Sarle 94].
3.6 References
1. Aharonian, G., Comments on comp.ai.neural-nets, Items 2311 and 2386 [Internet
Newsgroup].
2. Ameniya, T., “Qualitative Response Models: A Survey”, Journal of Economic
Literature, No. 19, pp. 1483-1536, 1981.
3. Ameniya, T., Advance Econometrics, Cambridge, Harvard University Press, 1985.
4. Azoff, E. M., Neural Network Time Series Forecasting of Financial Markets, John
Wiley & Sons, pp. 50-51, England, ISBN: 0-471-94356-8, 1994.
5. Baum, E. B. and Haussler, D., Neural Computation 1, 1988, 151–160.
6. Carling, A., Introducing Neural Networks, Sigma Press, ISBN: 1-85058-174-6,
England, 1992.
7. Cover, T. M., “Geometrical and statistical properties of systems of linear inequalities
with application in pattern recognition”, IEEE Trans. Elect. Comp., No. 14, pp. 326-
334, 1965.
8. Fahlman, S. E., Comments on comp.ai.neural-nets, Item 2198 [Internet Newsgroup].
9. Freeman, J. A. and Skapura, D. M., Neural Networks: algorithms, applications, and
programming techniques, Addison-Wesley, ISBN 0-201-51376-5, October 1991.
10. Hand, D. J., Discrimination and Classification, John Wiley & Sons, New York, 1981.
11. Hardle, W., Applied Nonparametric Regression, Cambridge University Press,
Cambridge, UK, 1990.
12. Haynes, J. and Tan, C.N.W., “An Artificial Neural Network Real Estate Price
Simulator”, The First New Zealand International Two Stream Conference on Artificial
Neural Networks and Expert Systems (ANNES) (Addendum), University of Otago,
Dunedin, New Zealand, November 24-26, 1993, IEEE Computer Society Press, ISBN
0-8186-4260-2, 1993.
13. Hecht-Nielsen, R., Neurocomputing, Addison-Wesley, Menlo Park, CA, USA, ISBN: 0-
201-09355-3, 1990.
14. Hornik, K., “Some new results on neural network approximation”, Neural Networks,
6, 1069-1072, 1993.
15. Hornik, K., Stinchcombe, M. and White, H., “Multilayer feedforward networks are
universal approximators”, Neural Networks, 2, 359-366, 1989.
16. Hosmer, D. W. and Lemeshow, S., Applied Logistic Regression, John Wiley & Sons,
New York, 1989.
17. Hubel, D. H. and Wiesel, T. N., “Receptive fields, binocular and functional
architecture in the cat’s visual cortex”, J. Physiol., 160: 106-154, 1962.
18. Hush, D. R. and Horne, B. G., “Progress in Supervised Neural Networks”, IEEE Signal
Processing Magazine, vol. 10, no. 1, pp. 8-39, January 1993.
19. Kalman, B. L, and Kwasny, S. C., “Why Tanh? Choosing a Sigmoidal Function”,
International Joint Conference on Neural Networks, Baltimore, MD, USA, 1992.
20. Kuan, C.-M. and White, H., “Artificial Neural Networks: An Econometric
Perspective”, Econometric Reviews, vol. 13, No. 1, pp. 1-91, 1994.
21. Lawrence, J., Introduction to Neural Networks: Design, Theory, and Applications 6th
edition, edited by Luedeking, S., ISBN 1-883157-00-5, California Scientific Software,
California, USA, July 1994.
22. Lawrence, S., Giles, C. L., and Tsoi, A. C., “What Size Neural Networks Gives
Optimal Generalization? Convergence Properties of Backpropagation”, Technical
Report UMIACS-TR-96-22 and CS-TR-3617, Institute of Advanced Computer Studies,
University of Maryland, College Park, MD 20742, 1996.
23. Lippmann, R. P., “An Introduction to Computing with Neural Nets”, IEEE ASSP
Magazine, pp. 4-23, April 1987.
24. Masters, T., Practical Neural Network Recipes in C++, Academic Press Inc., San
Diego, CA., USA, ISBN: 0-12-479040-2, p.6, 1993.
25. McCullagh, P. and Nelder, J. A. Generalized Linear Models, 2nd ed., Chapman &
Hall, London, UK, 1989.
26. McCulloch, W. S. and Pitts, W., “A Logical Calculus of Ideas Immanent in Nervous
Activity”, Bulletin of Mathematical Biophysics, pp. 5:115-33, 1943.
27. McLachlan, G. J., Discriminant Analysis and Statistical Pattern Recognition, John
Wiley & Sons, New York, 1992.
28. Minsky, M. and Papert, S. A., Perceptrons. Expanded Edition, MIT Press, Cambridge,
MA, USA, ISBN: 0-262-63111-3, 1988.
29. Nelson, M. M. and Illingworth, W. T., A Practical Guide to Neural Nets, Addison-
Wesley Publishing Company, Inc., USA, ISBN: 0-201-52376-0/0-201-56309-6, 1991.
30. Neural Network FAQ, Maintainer: Sarle, W. S., “How Many Hidden Units Should I
Use?”, July 27, 1996, Neural Network FAQ Part 1-7, [Online], Archive-name:ai-
faq/neural-nets/part3, Available: ftp://ftp.sas.com/pub/neural/FAQ3.html, [1996,
August 30].
31. Ripley, B. D., “Statistical Aspects of Neural Networks”, Networks and Chaos:
Statistical and Probabilistic Aspects edited by Barndoff-Nielsen, O. E., Jensen, J.L. and
Kendall, W.S., Chapman and Hall, London, United Kingdom, 1993.
32. Robbins, H. and Monro, S., “A stochastic approximation method”, Annals of
Mathematical Statistics, No. 25, p. 737-44, 1951.
33. Sarle, W. S., “Neural Networks and Statistical Models”, Proceedings of the Nineteenth
Annual SAS Users Group International Conference, Cary, NC: SAS Institute, USA,
pp. 1538-1550, 1994.
34. Sarle, W. S., “Neural Network and Statistical Jargon?”, April 29, 1996, [Online],
Archive-name:ai-faq/neural-nets/part3, Available: ftp://ftp.sas.com/pub/neural/jargon,
[1996, August 24].
35. Shih, Y., Neuralyst User’s Guide, Cheshire Engineering Corporation, USA, pp.74,
1994.
36. Sontag, E. D., “Feedback stabilization using two-hidden-layer nets”, IEEE
Transactions on Neural Networks, 3, 981-990, 1992.
37. Weiss, S. M., and Kulikowski, C. A., Computer Systems That Learn, Morgan
Kauffman, San Mateo, CA, 1991.
38. White, H., Artificial Neural Networks: Approximation and Learning Theory,
Blackwell Publishers, Oxford, UK, ISBN: 1-55786-329-6, 1992.
39. White, H., “Some asymptotic results for learning in single hidden layer feedforward
network models”, Journal of American Statistical Association, No. 84, p. 1008-13,
1989.
40. Widrow, B. and Hoff, M. D., “Adaptive Switching Circuits”, 1960 IRE WESCON
Convention Record, Part 4, pp. 96-104, 1960.
“Economic distress will teach men, if anything can, that realities are less
dangerous than fancies, that fact-finding is more effective than fault-finding”
Carl Becker (1873-1945), Progress and Power
25
Part of this chapter has been published in Neural Networks in Finance and Investing edited by
Trippi and Turban, Irwin, USA, Chapter 15 pp. 329-365, ISBN 1-55738-919-6, 1996
Chapter 4: Using Artificial Neural Networks to Develop An Early Warning Predictor for Credit Union
Financial Distress
4.1 Introduction
Since Beaver’s [1966] pioneering work in the late 1960s there has been considerable
interest in using financial ratios to predict financial failure26. The upsurge in interest
followed the seminal work by Altman [1968] in which he combines five financial ratios
into a single predictor (which he calls factor Z) of corporate bankruptcy27. An attractive
feature of Altman’s methodology is that it provides a standard benchmark for comparison
of companies in similar industries. It also enables a single indicator of financial strength to
be constructed from a company’s financial accounts. While the methodology is widely
appealing, it has limitations. In particular, Gibson and Frishkoff [1986] point out that
ratios can differ greatly across industrial sectors and accounting methods28.
These limitations are nowhere more evident than in using financial indicators to predict
financial distress among financial institutions. The naturally high leverage of financial
institutions means that models developed for the corporate sector are not readily
transportable to the financial sector. The approach has nonetheless gained acceptance in its
application to financial institutions by treating them as a unique class of companies.
Recent examples in Australia include unpublished analyses of financial distress among
non-bank financial institutions by Hall and Byron [1992] and McLachlan [1993]. Both of
these studies use a Probit model to deal with the limited dependent variable nature of
financial distress data.
This study examines the viability of an alternative methodology for the analysis of
financial distress based on artificial neural networks (ANNs). In particular, it focuses on
the applicability of ANNs as an early warning predictor of financial distress among credit
unions. The ANN-based model developed in this chapter is compared with the Probit
model results of Hall and Byron. In particular, this study is based on the same data set used
by Hall and Byron. This facilitates an unbiased comparison of the two methodologies. The
results reported in the paper indicate that the ANN approach is marginally superior to the
Probit model over the same data set. The paper also considers ways in which the model
design can be altered to improve the ANN’s performance as an early warning predictor.
26
See, for example, Beaver [1966], Ohlson [1980], Frydman Altman and Kao [1985], Casey and Bartczak
[1985] and McKinley et al. [1983] and the works cited in these studies.
27
The function is Z = 0.012X1 + 0.014X2 + 0.033X3 + 0.006X4 + 0.999X5, where X1 = Working capital/Total Assets (%), X2 = Total retained earnings/total assets (%), X3 = Earnings before interest and taxes (EBIT)/total assets (%), X4 = Market value of equity/book value of total debt (%) and X5 = Sales/total assets. Rowe et al. [1994, p. 373] state that in some cases, the Z-factor can be approximated with the simplified equation: Z ≈ sales/total assets.
28
These cautions are reinforced by Horrigan [1968] and Levy and Sarnat [1988].
29
See, for example, Deakin [1972], Libby [1975ab], Schipper [1977], Altman, Haldeman and Narayanan
[1977], Dambolena and Khoury [1980], Gombola and Ketz [1983], Casey and Bartzak [1985], Gentry,
Newbold and Whitford [1985a] and Sinkey [1975].
30
Other studies that have used binary choice analysis in financial distress prediction include Ohlson [1980],
Gentry, Newbold and Whitford [1985b], Casey and Bartzak [1985] and Zavgren [1985].
Using both Probit and multiple discriminant models to correct these problems, they found
that neither the multiple discriminant model nor the Probit model outperformed a naive
model which assumed all firms to be non-bankrupt.
The study that is used as the basis for comparison in this chapter is that by Hall and Byron.
Hall and Byron use a Probit model with thirteen basic financial ratios to predict financial
distress among credit unions in New South Wales. Of the thirteen ratios, four were found
to make a significant contribution to predicting financial distress. The significant ratios
were:
RA: Required Doubtful Debt Provision
RB: Permanent Share Capital + Reserves + Overprovision for Doubtful Debt to
Total Assets (%)
RC: Operating Surplus to Total Assets (%)
RG: Operating Expenses to Total Assets (%)
Their estimated index function, Y, was:
Y = 0.330RA - 0.230RB - 0.671RC + 0.162RG - 1.174 - 0.507Q1 - 0.868Q2 + 0.498Q3
where the variables Q1 to Q3 are seasonal dummy variables to capture any seasonal effects
in the data.
A conditional probability of financial distress is obtained by referring to the cumulative normal statistical tables. Any Credit Union with a conditional probability greater than one-half was classified by Hall and Byron as being in ‘distress’.
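As an illustration, the sketch below evaluates Hall and Byron's index function and then the standard normal cumulative distribution function in place of statistical tables; the ratio values supplied are hypothetical:

```python
# Probit index -> conditional probability of distress via the standard
# normal CDF. The example ratio values are hypothetical placeholders.
from math import erf, sqrt

def probit_probability(RA, RB, RC, RG, Q1=0, Q2=0, Q3=0):
    # Hall and Byron's estimated index function Y.
    Y = (0.330 * RA - 0.230 * RB - 0.671 * RC + 0.162 * RG
         - 1.174 - 0.507 * Q1 - 0.868 * Q2 + 0.498 * Q3)
    # Phi(Y): cumulative standard normal, replacing the statistical tables.
    return 0.5 * (1.0 + erf(Y / sqrt(2.0)))

# Hypothetical credit union observed in the first quarter (Q1 = 1):
print(round(probit_probability(RA=1.2, RB=8.0, RC=0.4, RG=4.2, Q1=1), 3))
```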
31
The results were subsequently published by Bell et al. [1990].
The data set was divided into two separate sets. Data for all quarters of 1989 to 1990 were
used as the training set (in-sample data) to build the early warning predictor, while data for
all quarters of 1991 were used as the validation set (out-of-sample data). The training set contained a total of 1449 observations with 46 credit unions in the distress category. The
validation set contained a total of 695 observations with 20 credit unions classified as in
distress.
4.4.2 Input (Independent) Variables
The inputs used in the ANN are the same variables used by Hall and Byron. They consider
thirteen financial ratios to reflect the stability, profitability and liquidity of a Credit Union
plus four dummy variables to indicate the quarters in a year (see Table 4-1 below). Hall
and Byron argue that the quarterly seasonal dummies are needed to adjust for the
seasonality in some of the ratios. They also conducted a statistical analysis on the ratios to
determine their significance to credit unions in distress.
Hall and Byron find only four of the thirteen ratios and three of the four quarterly dummy
variables statistically significant as independent variables and thus incorporated only those
variables in their final model. Under the ANN methodology, the ANN is allowed to determine the significance of the variables itself, by incorporating all the available information as inputs to the model. The reason for this is that ANNs are very good at dealing with large noisy data sets and, in their learning processes, eliminate inputs of little significance by placing little or no weight on the connections from the input nodes of those variables. The tradeoff is that larger networks require larger amounts of training time.
The financial ratios and Hall and Byron’s comments on their significance are reproduced
in Table 4-1.
Application of the ratios in Table 4-1 leads to an input layer of the ANN consisting of 17
neurons with each neuron representing one of the above input variables. The output layer
consists of only one output, indicating the status of the Credit Union as either distressed or
not. The objective is for the ANN to predict the binary status of the Credit Unions, with 1 indicating that the Credit Union is in distress and 0 indicating that it is not. The output values of the ANN are continuous with upper and lower bounds
of 0 and 1. Therefore, even though the objective or target values themselves are discrete,
probability theory can be used to interpret the output values.
32
Incidentally, the ANN that gave the minimum Type I errors in the in-sample data set also gave the
minimum Type I errors for the combined in-sample and the out-of-sample data sets. See Chart 4-1.
Chart 4-1
[Chart (referenced in footnote 32): only the axis values survive extraction, 0 to 50 on the vertical axis and 50 to 5000 on the horizontal axis.]
Table 4-2 Artificial Neural Networks Parameters
Network Parameters
Learning rate 0.05
Momentum 0.1
Input Noise 0
Training Tolerance 0.9
Testing Tolerance 0.9
Each of the parameters is briefly described below:
4.5.1 Learning Rate
The learning rate determines the amount of correction term that is applied to adjust the
neuron weights during training. The learning rate of the neural net was tested with values
ranging from 0.05 to 0.1.
Small values of the learning rate increase learning time but tend to decrease the chance of
overshooting the optimal solution. At the same time, they increase the likelihood of
becoming stuck at local minima. Large values of the learning rate may train the network
faster, but may result in no learning occurring at all. Small values are used so as to avoid
missing the optimal solution. The final model uses 0.05; the lowest learning rate in the
range.
4.5.2 Momentum
The momentum value determines how much of the previous corrective term should be
remembered and carried on in the current training. The larger the momentum value, the greater the emphasis placed on the previous correction terms and the less on the current term.
The momentum value serves as a smoothing process that ‘brakes’ the learning process
from heading in an undesirable direction.
Figure 4-1
The ANN Topology
[Figure: input layer (15 neurodes), hidden layer (5 neurodes), output layer (one neurode).]
4.6 Results
A summary of the overall accuracy of both models on the training (in-sample) data set and the validation (out-of-sample) data set, as well as on selected Credit Unions, is displayed in a similar fashion to Hall and Byron's paper so as to allow a direct comparison of the two models. The full results for all the Credit Unions (except for Credit Unions numbered 1058, 1093, 1148 and 1158, which were too small) from both models are in Appendix B of this research.
In the tables below, the Type I errors are highlighted by box shading and the Type II errors are highlighted by a plain background box. The accuracy of the models is computed as the percentage of correct classifications in both categories out of the total number in both categories:
$\text{Accuracy} = \frac{\sum \text{Distress CUs classified as Distress} + \sum \text{Non-Distress CUs classified as Non-Distress}}{\sum \text{CUs}}$

where CUs = Credit Unions.

Equation 4-1
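A sketch of this computation from a two-by-two classification table, alongside the Type I and Type II error counts discussed below (the counts used here are hypothetical placeholders, not the study's results):

```python
# Accuracy per Equation 4-1 from a 2x2 classification table.
# The counts are hypothetical placeholders, not results from the study.

def classification_summary(dd, dn, nd, nn):
    """dd: distress classified as distress; dn: distress classified as
    non-distress (Type I error); nd: non-distress classified as distress
    (Type II error); nn: non-distress classified as non-distress."""
    total = dd + dn + nd + nn
    accuracy = (dd + nn) / total          # Equation 4-1
    return {"accuracy": accuracy, "type_I": dn, "type_II": nd}

print(classification_summary(dd=40, dn=6, nd=55, nn=1348))
```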
model. The Type II error in using the ANN model is a little over half a percent higher than
the Probit model. However, there are no statistical differences in the results at α = 0.05 in
all cases.
The ANN model is marginally superior (7.5% better) to the Probit scores method in terms of the number of Type I errors committed, while its Type II errors are only 1.8% worse. The tradeoff in using the ANN model over the Probit model may therefore be worthwhile.
4.8 Conclusion
The ANN model has been demonstrated to perform as well as, and in some cases better than, the Probit model as an early warning model for predicting Credit Unions in distress. The overall accuracy of the ANN model vs. the Probit model is almost the same, at around 90% for the in-sample data and 92% for the out-of-sample data. The results of the two models are not statistically significantly different at α = 0.05.
However, care should be taken in interpreting the accuracy results: as explained in earlier sections, the Type II errors (predicting a Credit Union in distress when it is not) may actually be early warning indicators of problems that do not surface until later quarters. The results from the models may therefore be better than those reflected in the overall accuracy. A better benchmark would be the model with the fewest Type I errors.
The models provided early warning signals in many of the credit unions that eventually
were in financial distress but were unjustifiably penalized with Type II errors due to the
classification technique employed by Hall and Byron. In their technique, a credit union was classified as in distress only after it had been put under direction or under notice of direction. This has a severe effect on ANNs: they learn from their mistakes, and during training they are told that predicting distress for a credit union that the supervisors have not yet put under direction or notice of direction is wrong, even if that credit union actually goes into financial distress in the near future. As a result, the ANN will build a suboptimal model that cannot, by design, provide early warning. This may hold true for the Probit model too. The data set
needs to be reconstructed in future studies so that credit unions that failed in n number of
quarters will be classified as in potential distress in order to allow for n number of quarters
forecast.
One of the elements that seems to be vital to this type of research but missing from the
models in this study is the temporal effect of the independent variables. The temporal
effect of the financial data time series was ignored because this study is meant to be a
comparison with Hall and Byron’s work which did not use any time-dependent variables.
They state in their paper that they find no significance in the one period change of any of
the financial ratios. The models thus constructed are severely restricted in their forecast time horizon: they are only able to predict financial distress for the quarter in which the financial ratios are obtained. This seems contrary to the objective of achieving an early warning predictor system.
to be considered with the potential gain in terms of both monetary gain (from prevention
of a credit union going under) and public confidence. The cost of extra resources required
to implement the system will have to be justified. The personnel resources required include
a team of system builders, system maintenance personnel and additional monitoring and
auditing staff (in anticipation of an increase in the number of credit union audits due to
Type II errors). Other resources required will include computer equipment, the design and
drafting of new compliance rules for the credit unions, staff training, integrating the system
with existing information systems and a facility to house the new department.
A prototype of the system may need to be constructed to demonstrate to the management
the tangible benefits that can be derived from full implementation of the system, and to
convince them to commit resources to the project. It is important to gain acceptance from
management and also the people who will be working with it. Constructing a prototype
will also provide the system builders with experience that will be valuable in the actual full
implementation of the project.
The ten largest conditional probabilities of both models for each quarter of 1991 are
provided in Appendix A. The only credit union from the Hall and Byron study that was
missing from the table is Credit Union number 1148 which was omitted from this study
due to its small size. The appendix will be used in future research to analyze the
relationship of the ratios to the ANN model output.
Since one of the major weaknesses of ANNs is the difficulty in explaining the model,
future research will concentrate on studying the interaction of the input variables in
relation to the outputs as well as the associated weights of the networks’ structures. The
ANN parametric effects on the result will be studied in a similar method used by Tan and
Wittig [1993] in their parametric study of a stock market prediction model. Sensitivity
analysis on input variables, similar to those performed by Poh [1994], can be conducted to
determine the effect each of the financial ratios have on the financial health of the credit
unions.
Different types of artificial neural networks such as the Kohonen type of network will be
constructed to see if the results can be improved. The Kohonen network has been used by
Prof. A. C. Tsoi of the University of Queensland quite successfully in predicting medical
claims fraud.
The utilization of genetic algorithms to select the optimal ANN topology and parameter settings will be explored in future research. Hybrid types of models discussed by Wong and Tan [1994], incorporating ANNs with fuzzy logic and/or expert systems, will also be constructed in the future to see if the results can be improved. The benefits of
incorporating ANNs with rule-based expert systems as proposed by Tan [1993a] for a
trading system will be examined to see if the same concept can be implemented in the
context of financial distress prediction of credit unions.
4.11 References
2. Altman, E., Haldeman, R. and Narayanan, P., “Zeta Analysis”, Journal of Banking and
Finance, pp. 29-54, June 1977.
3. Altman, E., “Financial Ratios, Discriminant Analysis and the Prediction of Corporate
Bankruptcy”, Journal of Finance, pp. 589-609, September 1968.
4. Back, B., Laitinen, T. and Sere, K., “Neural Networks and Bankruptcy Prediction:
Fund Flows, Accrual Ratios, and Accounting Data”, Advances In Accounting, ISBN: 0-
7623-0161-9, Vol. 14, pp. 23-37, 1996.
6. Bell, T. B., G. S. Ribar and J. R. Verchio, “Neural Networks vs. Logistic Regression:
A Comparison of Each Model’s Ability to Predict Commercial Bank Failures”,
Deloitte & Touche/University of Kansas Auditing Symposium, May 1990.
7. Brockett, P. L., Cooper, W. W., Golden, L. L. and Pitakong, U., “A Neural Network
Method for Obtaining an Early Warning of Insurer Insolvency”, Journal of Risk and
Insurance, Vol. 61, No. 3, pp. 402-424, 1994.
10. Dambolena, I. and Khoury, S., “Ratio Stability and Corporate Failure”, Journal of
Finance, pp. 1017-26, September 1980.
11. Gentry, J., Newbold, P. and Whitford, D., “Bankruptcy: If Cash Flow’s Not the Bottom
Line, What Is?”, Financial Analysts Journal (September/October), pp. 17-56, 1985b.
12. Gentry, J., Newbold, P., Whitford, D., “Classifying Bankrupt Firms with Fund Flow
Components”, Journal of Accounting Research (Spring), pp. 146-59, 1985a.
13. Gibson, C. H., Frishkoff, P. A., Financial Statement Analysis: Using Financial
Accounting Information, 3rd. ed., Kent Publishing Company, Boston, 1986.
14. Gombola, M., and Ketz, J., “Note on Cash Flow and Classification Patterns of
Financial Ratios”, The Accounting Review, pp. 105-114, January 1983.
15. Hall, A. D. and Byron, R., “An Early Warning Predictor for Credit Union Financial
Distress”, Unpublished Manuscript for the Australian Financial Institution
Commission.
16. Horrigan, J. O., “A Short History of Financial Ratio Analysis”, The Accounting
Review, vol. 43, pp. 284-294, April 1968.
18. Levy, H. and Sarnat, M., “Caveat Emptor: Limitations of Ratio Analysis”, Principles
of Financial Management, Prentice Hall International, pp. 76-77, 1989.
19. Libby, R., “Accounting ratios and the Prediction of Failure: Some Behavioral
Evidence”, Journal of Accounting Research (Spring), pp. 150-61, 1975.
21. Martin, D., “Early Warning of Bank Failure-A Logit Regression Approach”, Journal of
Banking and Finance, vol. 1, pp. 249-276, 1977.
22. McKinley, J. E., R. L Johnson, G.R Downey Jr., C. S. Zimmerman and M. D. Bloom,
“Analyzing Financial Statements”, American Bankers Association, Washington, 1983.
23. Odom, M. D., & Ramesh Sharda, “A Neural Network Model for Bankruptcy
Prediction”, Proceedings of the IEEE International Conference on Neural Networks,
pp. II163-II168, San Diego, CA, USA, June 1990.
24. Ohlson, J., “Financial ratios and Probabilistic Prediction of Bankruptcy”, Journal of
Accounting Research (Spring), pp. 109-31, 1980.
25. Pacey, J., Pham, T., “The Predictiveness of Bankruptcy Models: Methodological
Problems and Evidence”, Journal of Management, 15, 2, pp. 315-337, December 1990.
Poh, H. L., “A Neural Network Approach for Decision Support”, International Journal of
Applied Expert Systems, Vol. 2, No. 3, 1994.
26. Rowe, A. J., Mason, R. O., Dickel, K. E., Mann, R. B., and Mockler, R. J., Strategic
Management: A Methodological Approach 4th Edition, Addison-Wesley Publishing
Company, USA, 1994.
27. Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning Internal
Representations by Error Propagation”, Parallel Distributed Processing, Vol. 1, MIT
Press, Cambridge Mass., 1986.
28. Salchenberger, L. M., E. M. Cinar and N. A. Lash, “Neural Networks: A New Tool For
Predicting Thrift Failures”, Decision Sciences, Vol. 23, No. 4, pp. 899-916,
July/August 1992.
31. Tam, K. Y. and M. Y. Kiang, “Managerial Applications of Neural Networks: The Case
of Bank Failure Predictions”, Management Science, Vol. 38, No. 7, pp. 926-947, July
1992.
32. Tan, C.N.W. and Wittig, G. E., “Parametric Variation Experimentation on a Back-
propagation Stock Price Prediction Model”, The First Australia and New Zealand
Intelligent Information System (ANZIIS) Conference, University of Western Australia,
Perth, Australia, December 1-3, 1993, IEEE Western Australia Press, 1993.
33. Tan, C.N.W., “Incorporating Artificial Neural Network into a Rule-based Financial
Trading System”, The First New Zealand International Two Stream Conference on
Artificial Neural Networks and Expert Systems (ANNES), University of Otago,
Dunedin, New Zealand, November 24-26, 1993, IEEE Computer Society Press, ISBN
0-8186-4260-2, 1993a.
34. Tan, C.N.W., “Trading a NYSE-Stock with a Simple Artificial Neural Network-based
Financial Trading System”, The First New Zealand International Two Stream Conference on
Artificial Neural Networks and Expert Systems (ANNES), University of Otago, Dunedin, New
Zealand, November 24-26, 1993, IEEE Computer Society Press, ISBN 0-8186-4260-2, 1993b
35. Whitred, G. and Zimmer, I., “The Implications of Distress Prediction Models for
Corporate Lending”, Accounting and Finance, 25, pp. 1-13, 1985.
36. Wong, F. and Tan, C., “Hybrid Neural, Genetic and Fuzzy Systems”, Trading On The
Edge: Neural Genetic and Fuzzy Systems for Chaotic Financial Markets, John Wiley
and Sons Inc., pp. 243-261, 1994.
Table 4-6
The 10 largest predicted conditional probabilities
5.1 Introduction
This chapter focuses on the application of Artificial Neural Networks (ANNs) to financial
trading systems. A growing number of studies have reported success in using ANNs in
financial forecasting and trading33. In many cases, however, transaction costs and, in the
case of foreign exchange, interest differentials, have not been taken into account. Attempts
are made to address some of these shortcomings by adding the interest differentials and
transaction costs to the trading system in order to produce a more realistic simulation.
The complexity and problems encountered in designing and testing ANN-based foreign
exchange trading systems as well as the performance metrics used in the comparison of
profitable trading systems are discussed. The idea of incorporating ANNs into a rule-based
trading system has been raised in earlier work by the author [Tan 1993a, Wong and Tan
1994]. The particular trading system used in this chapter is based on an earlier model
constructed by the author and published in the proceedings of the ANNES ‘93 conference
[Tan 1993b]. The system uses ANN models to forecast the weekly closing Australian/US
dollar exchange rate from a given set of weekly data. The forecasts are then passed through
a rule-based system to determine the trading signal. The model generates a signal of either
‘buy’, ‘sell’ or ‘do nothing’ and the weekly profit or loss is computed from the simulated
trading based on the signals. The various attitudes towards risk is also approximated by
applying a range of simple filter rules.
An appendix is provided at the end of this chapter which discusses the different foreign
exchange trading techniques in use, including technical analysis, fundamental analysis and
trading systems.
This chapter builds on another earlier study by the author that was reported at the
TIMS/INFORMS ‘95 conference at Singapore in June 1995 [Tan 1995a] and the Ph.D.
Economics Conference at Perth in December 1995 [Tan 1995b]. It introduces the idea of a
simple hybrid Australian/US dollar exchange rate forecasting/trading model, the
ANNWAR, that incorporates an ANN model with the output from an autoregressive (AR)
model. In the earlier study, initial tests find that a simple ANN-based trading system for the Australian/US dollar exchange rate market fails to outperform an AR-based trading system, thus prompting the development of the hybrid ANNWAR model. The initial ANNWAR model results indicate that the ANNWAR-based trading system is more robust than either of the independent trading systems (which utilize the ANN and the AR models on their own). The earlier study, however, uses a smaller data set, and the advantage of the ANNWAR model over the simple ANN model in terms of returns alone is quite marginal.
Furthermore, the best ANNWAR and ANN models were simple linear models, thus casting doubt on the usefulness of ANNs' ability to solve non-linear problems. One of the reasons for that result may have been the nature of the out-of-sample (validation) data
33
See, for example, Widrow et al. [1994] and Trippi and Turban [1996].
set which was clearly in a linear downward trend. This property of the data may also have
explained why the AR model outperformed the ANN model in the earlier study. In this
chapter, the tests are repeated with additional data. The results from the larger data set
show that the ANN and ANNWAR models clearly outperform the AR model. The
ANNWAR model also clearly outperforms the ANN model.
For the rest of this book, the ANN model with the AR input is referred to as an ANNWAR
while the ANN model without the AR is referred to as an ANNOAR. I will refer to both
the ANNWAR and ANNOAR models collectively as ANN models in general if no
differentiation is needed.
Studies from the Post-Float period generally provide a better test of the efficiency of the
foreign exchange market. These studies also benefited from advances in econometric
methodology. Using the weekly spot rates and forward rates of three maturities in the
period up to January 1986, Tease [1988] finds the market to be less efficient subsequent to
the depreciation in February 1985. Kearney and MacDonald [1991] conduct a similar test
on changes in the exchange rate, using data from January 1984 to March 1987. They
conclude that the change in the spot foreign exchange rate does not follow a random
walk34 and that there was strong evidence for the existence of a time-varying risk
premium.
Sheen [1989] estimates a structural model of the Australian dollar/US Dollar exchange rate and finds some support for the argument that structural models are better predictors than a simple random walk model, using weekly data from the first two years of the float. There
have been other studies that apply multivariate cointegration techniques to the exchange
rate markets, reporting results that do not support the efficient market hypothesis [see for
example, Karfakis and Parikh 1994]. However, the results of these studies may be flawed
as it has been shown that cointegration does not mean efficiency and vice-versa [See
Dwyer and Wallace 1992 and Engel 1996].
5.2.3 Literature Review on Trading Systems and ANNs in Foreign Exchange
Despite the disappointing result of White's [1988] initial seminal work in using ANNs for financial forecasting with a share price example, research in this field has generated growing interest. Notwithstanding the increase in research activity in this area, however, there are very few detailed publications of practical trading models. In part, this may be due to the fierce competition among financial trading houses, for whom marginal improvements in trading strategies can translate into huge profits, and their consequent reluctance to reveal their trading systems and activities.
This reluctance notwithstanding, as reported by Dacorogna et al. [1994], a number of
academicians have published papers on profitable trading strategies even when including
transaction costs. These include studies by Brock et al. [1992], LeBaron [1992], Taylor
and Allen [1992], Surajaras and Sweeney [1992] and Levitch and Thomas [1993].
From the ANN literature, studies by Refenes et al. [1995], Abu-Mostafa [1995], Steiner et al. [1995], Freisleben [1992], Kimoto et al. [1990] and Schoneburg [1990] all support the proposition that ANNs can outperform conventional statistical approaches. Weigend et al. [1992] find the predictions of their ANN model for forecasting the weekly Deutschmark/US Dollar closing exchange rate to be significantly better than chance. Pictet et al. [1992] report that their real-time trading models for foreign exchange rates returned close to 18% per annum with unleveraged positions and excluding any interest gains. Colin [1991]
reports that Citibank’s proprietary ANN-based foreign exchange trading models for the US
Dollar/Yen and US Dollar/Swiss Franc foreign exchange market achieved simulated
34
The random walk hypothesis states that the market is so efficient that any predictable fluctuations of price are eliminated, thus making all price changes random. Malkiel [1973] defines the broad form of the random-walk theory as “Fundamental analysis of publicly available information cannot produce investment recommendations that will enable an investor consistently to outperform a buy-and-hold strategy in managing a portfolio. The random-walk theory does not, as some critics have claimed, state that stock prices move aimlessly and erratically and are insensitive to changes in fundamental information. On the contrary, the point of the random-walk theory is just the opposite: the market is so efficient, and prices move so quickly when new information does arise, that no one can consistently buy or sell quickly enough to benefit”.
trading profits in excess of 30% per annum and an actual trading success rate of about 60% on a trade-by-trade basis. These studies add to the body of evidence contradicting the EMH.
35
The term pip is used to describe the smallest unit quoted in the foreign exchange market for a particular
currency. For example a pip in the US Dollar is equivalent to US$0.0001 or .01 of a cent while a pip in Yen
is 0.01 Yen.
⇒ 0.1507 − 0.0707 = 0.0800

The profit rate is 0.08/100 per week, or an effective annualized interest rate of 8.32%.
If the interest rate differential is wider, with the Australian interest rate at iA = 12% and the
US interest rate at iUS = 5%, the result from this transaction will be a loss:
[Foreign Exchange Profit/Loss] − [Net Funding Cost]

$\Rightarrow \left[ 100 - \frac{100 \times 0.7947}{0.7959} \right] - \left[ \frac{12.02}{52 \times 100} \times 100 - \frac{4.98}{52 \times 100} \times (100 \times 0.7947) \right]$

$\Rightarrow 0.1507 - 0.1550 = -0.0043$
Clearly, a correct forecast of depreciation is a necessary, but not a sufficient, condition for the speculator to profit. The following factors will affect the profitability of a transaction (a sketch of the calculation follows the list):
As transaction costs increase, profits will decrease, and vice-versa.
As the interest differential widens, the net funding cost will be higher and profits will decrease.
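A minimal sketch of the weekly profit-and-loss arithmetic above; reading 12.02% and 4.98% as the spread-adjusted borrowing and deposit rates is an assumption based on the quoted money-market spread:

```python
# Weekly FX speculation P/L per the worked example: the foreign exchange
# profit term minus the net funding cost term. Interpreting 12.02/4.98 as
# spread-adjusted borrow/deposit rates is an assumption.

def weekly_pl(amount, rate_open, rate_close, borrow_pct, deposit_pct):
    converted = amount * rate_open            # funds converted at the opening rate
    fx_pl = amount - converted / rate_close   # FX profit/loss term from the text
    funding_cost = (borrow_pct / (52 * 100)) * amount      # one week's borrowing
    earnings = (deposit_pct / (52 * 100)) * converted      # one week's deposit
    return fx_pl - (funding_cost - earnings)

print(round(weekly_pl(100, 0.7947, 0.7959, 12.02, 4.98), 4))  # -0.0043, a loss
```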
5.4 Data
5.4.1 ANN Data Sets: Training, Testing and Validation
The data used in this study were provided by the Reserve Bank of Australia. The data
consist of the weekly closing price of the US dollar/Australian Dollar exchange rate in
Sydney, the weekly Australian closing cash rate in Sydney and the weekly closing US Fed
Fund rate in New York from 1 Jan 1986 to 14 June 1995 (495 observations).
In the earlier study reported by the author [Tan 1995ab], the data used extend only from 1
January 1986 to 16 September 1994 (453 observations). The additional data used in this
study are indicated to the right of the vertical line in Chart 1. This additional data has
significantly improved the ANNs but has resulted in dismal performance by the AR. This
is probably due to the higher volatility in this new data set and the less linear nature of the
out-of-sample data. The previous out-of-sample data were clearly on a strong downtrend.
Of the total 495 observations for this study, the last 21 observations (27 January 1995 to
14 June 1995) are retained as out-of-sample data with the remaining 474 observations used
as the in-sample data set for both the ANN and the AR models36.
36
Note that in constructing the models, the last observation (14 June 1995) was used only in forecast
comparison as a one step forecast. In addition, the first two observations were only used to generate lagged
inputs before the first forecast on 15 January 1986.
In the case of the ANN, the observations in the in-sample data set are divided again into
training and testing data sets. The first 469 observations are used as the training set to
build the ANN model; the remaining 5 observations are used to determine a valid ANN
model and to decide when to halt the training of the ANN37. Statistical, mathematical and
technical analysis indicators such as the logarithmic values, stochastic oscillators, relative
strength index and interest differentials, are derived from the original data set and used as
additional inputs into the ANNs. Interestingly, the final ANN model disregards all the
additional variables; the best ANN model uses only the closing price of the exchange rate
and the AR output as input variables with a time window size of three periods.
Chart 1: The Australian Dollar/US Dollar Weekly Exchange Rate Data from 1 Jan 1986 to 14 June 1995
[Chart: the A$/US$ exchange rate, Sydney weekly close, plotted from 1 Jan 1986 to 14 June 1995 on a scale of 1.0000 to 1.7000; the new data and the out-of-sample data are marked at the right of the series.]
37
The number of observations for the test set may seem small but this book uses an additional 21
observations for the out-of-sample data set for validation of the model. The purpose of this test set is mainly
to determine when to stop the training of the ANN. This limitation will be alleviated as more observations are
obtained. However, at the time of research for this book, the amount of data available was limited to the 495
observations.
38
Neuralyst 1.4 is a neural network program that runs as a Microsoft Excel macro. The company responsible
for the program, Cheshire Engineering Corporation, can be contacted at 650 Sierra Madre Villa Avenue,
Suite 201, Pasadena, CA 91107, USA.
39
Note that this is just one of the many ways the rule can be constructed; e.g. a buy signal may be generated
if the closing price has been declining for the past three periods.
Figure 5-1
Artificial Neural Net-AR-based Trading System
[Figure: flow chart. The Level 1 rules test whether the forecast change (x*t+1 − xt) exceeds the upper arbitrage boundary ∆R (buy side) or falls below the lower boundary (sell side); if so, Rule 3 tests whether |x*t+1 − xt| > Filter before a trading signal is generated.]
$f^{u}_{t,T} = S_t (1 + i_t - y_t) + \tau_t$

Equation 5-2

$f^{l}_{t,T} = S_t (1 + i_t - y_t) - \tau_t$

Equation 5-3
where f is the futures contract price, S is the spot price at the trading time, t is the
trading date, T is the future date, i is the funding cost, y is the earnings from placement,
u is the upper bound, l is the lower bound and τ is the transaction costs.
When the actual futures price lies above the upper bound, arbitrageurs can make risk-free profits by buying spot, funding the position and selling futures. When the actual futures price lies below the lower bound, profits can be made by selling spot and buying futures. A similar logic applies to speculative positions, where the forecast future spot rate replaces the theoretical futures price. The fundamental difference between the two structures is that arbitrage involves risk-free profits. In contrast, speculation involves profits subject to all the attendant uncertainties.
It is not possible to define a universal trading rule. Ultimately, attitudes towards risk
govern the choice of a trading rule. A risk-neutral speculator will undertake any trade for
which the forecast exchange rate lies outside the boundaries implied by arbitrage pricing.
A risk-averse speculator will require a greater spread between the forecast rate and the
arbitrage boundaries, where the size of the spread will depend on the degree of risk
aversion; a higher spread is consistent with a higher expected return on the transaction. For example, a highly risk-averse speculator might only trade when the forecast rate lies outside the arbitrage range at a very high level of statistical significance. Increasing the level of
significance reduces the number of trades, but increases the probability of profit on the
trades undertaken. The following section describes the calculation of the arbitrage
boundaries as though the investor is risk neutral. Section 5.5.4 below defines the filter
rules.
5.5.2 Calculating the Arbitrage Boundaries
The trading system assumes that when a trade is transacted, the transaction is funded
through borrowing, in either the domestic (in the case of buying foreign currency), or
foreign money market (in the case of selling the foreign currency), and placing the
transacted funds in the appropriate money markets at the prevailing rates. For example,
when a buy signal is generated, it is assumed that the trader will buy the foreign asset by
borrowing local currency (A$) funds from the domestic market (in this case the Australian
money market) at the weekly domestic cash rate (in this case the Australian Weekly Cash
rate), purchase the foreign currency (US Dollar), and invest it for one week at the foreign
money market rate (US Fed Funds weekly rate). In the case of a sell signal, the trader will
borrow from the foreign money market at the prevailing rate (US Fed Funds), sell the
foreign currency (US Dollar) for domestic currency (A$), and invest the proceeds for one
week in the domestic money market at the prevailing rate (Australian Cash rate).
Transaction costs are important in short-term transactions of this type. Bid-offer spreads in
the professional AUD/USD market are normally around 7 basis points. Spreads in short-
term money markets are usually around 2 basis points. Since the data available for these
rates are mid rates, a transaction cost of 7 basis points is assumed for a two-way foreign
exchange transaction while a transaction cost of 2 basis points is assumed in the money
market transactions.
Since these are the normal interbank spreads, sensitivity analysis of the spread is not carried
out. However, sensitivity analysis of the different filter values is performed, and this is
similar to performing a sensitivity analysis on the foreign exchange rate spread.
The interest differential and the spread in both the money market and foreign exchange
transactions represent the cost of funds for performing such transactions. Thus, a risk-
averse investor will trade if the forecast exchange rate change lies outside the band set by
the interest differentials and transaction costs. As noted in the previous sections, the limits
of this band are referred to as the arbitrage boundaries, since they correspond to the
arbitrage boundaries for futures pricing.
The formula for computing the interest differential in terms of foreign exchange points in
deciding whether to buy foreign currency (US dollar) is as follows:
Interest Differential = Foreign Asset Deposit Interest − Local Funding Cost

$= \frac{1}{x_t + \frac{fxspread}{2}} \times \left( 1 + \frac{foreign\_interest\_rate - \frac{intspread}{2}}{52} \right) \times \left( x^*_{t+1} - \frac{fxspread}{2} \right) - \left( 1 + \frac{local\_interest\_rate + \frac{intspread}{2}}{52} \right)$

Equation 5-4

and the formula for selling foreign currency is as follows:

Interest Differential = Domestic Asset Deposit Interest − Foreign Funding Cost

$= \left( 1 + \frac{local\_interest\_rate - \frac{intspread}{2}}{52} \right) - \frac{1}{x_t - \frac{fxspread}{2}} \times \left( 1 + \frac{foreign\_interest\_rate + \frac{intspread}{2}}{52} \right) \times \left( x^*_{t+1} + \frac{fxspread}{2} \right)$

Equation 5-5
where xt is the current closing exchange rate expressed as units of domestic currency per foreign currency unit, x*t+1 is the forecast of the following week's closing rate, fxspread is 7 basis points or 0.0007, representing the foreign exchange transaction cost, intspread is 2 basis points or 0.02%, representing the money market transaction cost, foreign_interest_rate is the US Fed Fund rate in percentage points and local_interest_rate is the Australian cash rate in percentage points.
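A sketch transcribing Equations 5-4 and 5-5; percentage rates are converted to weekly decimal fractions by dividing by 52 × 100, consistent with the earlier worked example, and the quotes fed in are hypothetical:

```python
# Arbitrage-boundary interest differentials of Equations 5-4 and 5-5,
# expressed in FX points. Percentage rates are converted to weekly decimals
# (dividing by 52 * 100, as in the worked example); inputs are hypothetical.

FXSPREAD = 0.0007   # 7 basis points: two-way FX transaction cost
INTSPREAD = 0.02    # 0.02 percentage points: money market transaction cost

def weekly(rate_pct):
    return rate_pct / (52 * 100)

def buy_differential(x_t, x_fcst, foreign_pct, local_pct):
    """Equation 5-4: foreign asset deposit interest minus local funding cost."""
    deposit = (1 / (x_t + FXSPREAD / 2)) \
        * (1 + weekly(foreign_pct - INTSPREAD / 2)) * (x_fcst - FXSPREAD / 2)
    funding = 1 + weekly(local_pct + INTSPREAD / 2)
    return deposit - funding

def sell_differential(x_t, x_fcst, foreign_pct, local_pct):
    """Equation 5-5: domestic asset deposit interest minus foreign funding cost."""
    deposit = 1 + weekly(local_pct - INTSPREAD / 2)
    funding = (1 / (x_t - FXSPREAD / 2)) \
        * (1 + weekly(foreign_pct + INTSPREAD / 2)) * (x_fcst + FXSPREAD / 2)
    return deposit - funding

# Hypothetical quotes: current rate 1.3500 A$/US$, forecast 1.3600.
print(round(buy_differential(1.35, 1.36, 5.0, 7.5), 5))  # positive: move clears costs
```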
5.5.3 Rules Structure
The first level rules check if the difference between the forecast and the current closing
rate (x*t+1 - xt) lies outside the arbitrage boundaries set by ∆R. The second level rules are
the filter rules discussed in the next section. A ‘buy’ signal is generated if the difference is
beyond the upper boundary and the filter value. Likewise, a ‘sell’ signal is generated if it is
beyond the lower boundary and passes the filter rule. In all other cases, a ‘do nothing’
signal is generated.
The signals in summary are:

For x*t+1 − xt > Upper Boundary + Filter Value: Buy
For x*t+1 − xt < Lower Boundary − Filter Value: Sell
Otherwise: Do Nothing

Equation 5-6
In calculating the profitability of the trades, the model assumes that all trades can be
transacted at the week's closing exchange rates and interest rates.
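A sketch of this two-level rule structure follows, assuming the arbitrage boundaries have already been derived from Equations 5-4 and 5-5; the function and parameter names are illustrative.

```python
def trade_signal(x_t, x_fcst, upper_boundary, lower_boundary, filter_value):
    """Equation 5-6: combine the arbitrage-boundary and filter rules."""
    move = x_fcst - x_t               # forecast change in the exchange rate
    if move > upper_boundary + filter_value:
        return "buy"                  # forecast rise clears costs and filter
    if move < lower_boundary - filter_value:
        return "sell"                 # forecast fall clears costs and filter
    return "do nothing"
```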
5.5.4 Filter Rules
The idea of a filter rule is to eliminate unprofitable trades by filtering out the small moves
forecast in the exchange rate. The reason for this is that most whipsaw losses in trend
following trading systems occur when a market is in a non-trending phase. The filter rule
values determine how big a forecast move should be before a trading signal is generated.
Obviously, small filter values will increase the number of trades while large values will
limit the number of trades. If the filter value is too large, there may be no trade signals
generated at all.
This research uses filter values ranging from zero up to each model's threshold value, the
filter value beyond which all trades are eliminated (filtered out). The filter rules are linear
in nature, but their relationship to the profit results is nonlinear, as can be observed in the
results discussed in later sections. More rules can of course be added to the system; these
additional rules could be the existing rules of technical analysis indicator-based trading
systems, or econometric models based on fundamental information. However, it is
necessary to determine whether additional rules will actually enhance the trading system.
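To make the idea of a threshold value concrete, the following sketch sweeps the filter upward until every signal becomes 'do nothing', reusing the trade_signal function sketched earlier. The 0.0005 step mirrors the increments used in the tables below; the function name is illustrative.

```python
def find_threshold(actual_rates, forecast_rates, upper, lower, step=0.0005):
    """Return the smallest filter value that eliminates all trade signals."""
    filter_value = 0.0
    while True:
        signals = [trade_signal(x, f, upper, lower, filter_value)
                   for x, f in zip(actual_rates, forecast_rates)]
        if all(s == "do nothing" for s in signals):
            return filter_value       # the threshold value for this model
        filter_value += step
```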
Table 5-1
Summary of the ANN Parameter Settings

Network Parameters
Learning rate          0.07
Momentum               0.1
Input Noise            0.1
Training Tolerance     0.01
Testing Tolerance      0.01

Network Architecture
Input Layer            6 Neurodes
Hidden Layer           3 Neurodes
Output Layer           1 Neurode
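The networks were trained with backpropagation using the parameters above. The following numpy sketch shows how those parameters might enter one training step of a 6-3-1 network; it is illustrative only (biases and the training/testing tolerance logic are omitted, and the weight initialisation is an assumption, not a documented choice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture and parameters from Table 5-1.
N_INPUT, N_HIDDEN, N_OUTPUT = 6, 3, 1
LEARNING_RATE, MOMENTUM, INPUT_NOISE = 0.07, 0.1, 0.1

# Small random starting weights (an assumed initialisation).
W1 = rng.normal(scale=0.5, size=(N_INPUT, N_HIDDEN))
W2 = rng.normal(scale=0.5, size=(N_HIDDEN, N_OUTPUT))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)  # previous updates (momentum)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One backpropagation step with momentum and input noise."""
    global W1, W2, dW1, dW2
    x = x + rng.normal(scale=INPUT_NOISE, size=x.shape)  # jitter the inputs
    h = sigmoid(x @ W1)                     # hidden-layer activations
    y = sigmoid(h @ W2)                     # network output
    err = target - y
    delta2 = err * y * (1 - y)              # output-layer error term
    delta1 = (delta2 @ W2.T) * h * (1 - h)  # hidden-layer error term
    dW2 = LEARNING_RATE * np.outer(h, delta2) + MOMENTUM * dW2
    dW1 = LEARNING_RATE * np.outer(x, delta1) + MOMENTUM * dW1
    W2 += dW2
    W1 += dW1
    return float(np.sum(err ** 2))          # squared error for this pattern
```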
5.8 Results
The results are reported in terms of the profitability of the trading systems. Earlier studies have
shown that the direction of the forecast is more important than the actual forecast itself in
determining the profitability of a model [Tsoi, Tan and Lawrence 1993ab, Sinha and Tan
1994]. Only the out-of-sample results (the last 21 observations) are reported, as this is the
only data set that provides an informative and fair comparison of the models. The
profits and losses are given in terms of foreign exchange points in local currency terms
(Australian dollars); for example, a profit of 0.0500 points is equivalent to 5 cents for every
Australian dollar traded, or 5% of the traded amount. The Mean Square Errors (MSE) of the
models' forecasts are reported in Table 5-4, together with a brief analysis of the forecast
results in section 5.8.8.
5.8.1 Perfect Foresight Benchmark Comparison
The three models are compared against a "perfect foresight" (PF) model benchmark. This
model assumes that every single trade is correctly executed by the trading system, given
perfect foresight of the actual closing exchange rates for the following week. Under this
model, all profitable trades are executed.
5.8.2 Performance Metrics
The results are reported for different filter values. Table 5-3 breaks the results down into
different trading performance metrics to help assess the impact of the filter values. A set of
performance metrics is used to provide a more detailed analysis of the trading patterns
generated by each model. They are as follows (a computational sketch follows the list):
i. Total gain
This is the total profit or loss generated by each model for each of the different filter
values.
ii. Average profit per trade
This is the total gain divided by the number of trades executed.
iii. Largest gain per trade
This is the profit from the single most profitable trade executed.
iv. Largest loss per trade
This is the loss from the single least profitable trade executed.
v. Winning trades
This is the number of trades that generated a profit.
vi. Percentage of winning trades
This is the number of winning trades as a proportion of all trades executed.
vii. Percentage of correct trades to PF
This is the proportion of the model's signals that agree with those of the perfect
foresight model.
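A hedged sketch of how these metrics might be computed from a backtest, assuming per-trade profit/loss values in exchange-rate points and aligned signal lists; the data structures are illustrative assumptions, not the study's code.

```python
def performance_metrics(trade_pnl, model_signals, pf_signals):
    """trade_pnl: P/L of each executed trade; signals: 'buy'/'sell'/'do nothing'."""
    wins = [p for p in trade_pnl if p > 0]
    n = len(trade_pnl)
    agree = sum(m == p for m, p in zip(model_signals, pf_signals))
    return {
        "total_gain": sum(trade_pnl),
        "average_profit_per_trade": sum(trade_pnl) / n if n else 0.0,
        "largest_gain_per_trade": max(trade_pnl, default=0.0),
        "largest_loss_per_trade": min(trade_pnl, default=0.0),
        "winning_trades": len(wins),
        "pct_winning_trades": 100.0 * len(wins) / n if n else 0.0,
        "pct_correct_trades_to_pf": 100.0 * agree / len(pf_signals),
    }
```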
40
A threshold filter value is the limit value for the filter before all trading signals are eliminated.
The threshold filter values for each of the three models are:
i. AR: 0.0325
ii. ANNOAR: 0.0195
iii. ANNWAR: 0.0162
The filter rules obviously have little or no impact on the PF model.
Charts 5-7, 5-8 and 5-9 compare the actual A$/US$ exchange rate against the forecasts of
the three models, AR, ANNOAR and ANNWAR respectively. Chart 5-10 compares the
ANNOAR's forecast against the ANNWAR's forecast, with the actual exchange rates as a
benchmark. This comparison is made because the ANNOAR and ANNWAR graphs seem
very similar, yet their profitability performances are quite different.
5.8.4 PF Model’s Profitability Performance
The PF model, as mentioned earlier, serves as a benchmark and reflects the ideal model. From
Table 5-3a&b and Chart 5-1, an increase in filter values results in a decrease in total profits:
from 0.1792 at a zero filter value to 0.1297 at a filter value of 0.0100. There is also an
increase in the average profit per trade, revealing that only small profitable trades are
filtered out. The average profit per trade increases from 0.0090 at a zero filter value to
0.0162 at a filter value of 0.0100. Increasing the filter values reduces the overall total
number of trades. The largest gain per trade is 0.0326 and is not filtered out by the range of
filter values used in the test. The PF model's profit remains steady at 0.0326 from a filter
value of 0.0210 before finally being filtered out at 0.0326. This steady state profit is
derived from just one trade, the trade with the largest gain.
5.8.5 AR Model’s Profitability Performance
The AR model's profitability performance is erratic, as observed in Chart 5-2 and Tables 5-
3a&b. They show that the AR model is quite sensitive to the filter values; a change in filter
values by a mere 0.0005 can reverse profits to losses and vice-versa. From Table 5-3a&b,
the highest total gain achieved by the AR model is 0.0193, using filter values of 0.0030 to
0.0040, while the biggest loss is -0.0184 at a filter value of 0.0015. Filter values of 0.0100
and 0.0145 are the only other values that give significantly profitable performance, with
profits of 0.0175 and 0.0192 respectively. Chart 5-2 shows that a constant profit of 0.0036
is achieved from 0.0200 to the threshold value of 0.0325. The AR model has the largest
threshold value. This is in contrast to the previous study by the author [Tan 1995ab], where
the AR model achieved significant profits and outperformed a simple ANN (called an
ANNOAR in this study) but had all its trades filtered out from 0.0010.
The AR model's worst average loss per trade is -0.0011 at a filter value of 0.0015, while its
best average profit per trade is 0.0018 at a filter value of 0.0100. The average profit/loss
per trade fluctuates over the different filter values. The AR model's largest gain per trade
is 0.0207, while its largest loss per trade is -0.0347 at filter values of 0.0000 to 0.0015.
This loss reduces to -0.0104 at a filter value of 0.0100. The AR model did not manage to
capture the single biggest possible gain per trade of 0.0326 indicated by the PF model in
Table 5-3a&b.
The percentage of winning trades for the AR model does not improve significantly with
increments in the filter values. From Table 5-3a&b, the highest percentage of winning
trades is 60% at a filter value of 0.0100, while the lowest is 46.15% in the filter value range
of 0.0050 to 0.0065. The percentage of correct trades to PF never exceeds 47.62% in the
range of filter values (0.0000 to 0.0100) tested. It does not seem to have a clear positive
correlation with total profit; in some cases, filter values with higher profits actually have
lower percentages of correct trades to PF. For example, the filter value of 0.0030
corresponds to the highest total profit (0.0193) but also to the second lowest percentage of
correct trades to PF (38.10%).
In the earlier study [Tan 1995ab], by contrast, the filter value increment from 0.0000 to
0.0005 improved the percentage of winning trades from 72.73% to 100%, though the
percentage of correct trades to PF decreased from 45% to 35%.
5.8.6 ANNOAR Model’s Profitability Performance
The ANNOAR model in this study significantly outperforms the AR model but fails to
achieve the standard of the ANNWAR model. This is in contrast to the earlier study by the
author [Tan 1995ab], where the AR model outperformed the simple stand-alone ANN
model (referred to as ANNOAR in this study). In fact, that result was the main motivation
for the experimentation with hybrid ANN models, which subsequently resulted in the
development of the ANNWAR model.
Tables 5-2 and 5-3c&d show that the ANNOAR model's highest profit of 0.0546 is
achieved without any filter value. Incrementing the filter value to 0.0005 reduces the profit
by more than half, to 0.0220. The profit is halved again, to 0.0110, when the filter value is
incremented to 0.0010. However, the total profit gradually increases again to a maximum
of 0.0225 before decreasing to a stable state profit of 0.0036 at the filter value of 0.0095.
Chart 5-3 indicates that the profitability of the model remains constant at this level for all
subsequent filter values up to the threshold value of 0.0195.
The ANNOAR model's highest average profit per trade is 0.0089 at a filter value of
0.0090, while its lowest is 0.0006 at filter values of 0.0010 to 0.0015. The largest gain per
trade is 0.0326 at a zero filter value, but this trade is filtered out when the filter value is
incremented by 0.0005. The largest loss per trade is -0.0122. However, when the filter
value is increased to 0.0070, the largest loss per trade is reduced to -0.0049, and further
increments to 0.0090 and beyond eliminate all unprofitable trades.
The lowest percentage of winning trades is 47.06% at filter values of 0.0010 and 0.0015.
The highest percentage of winning trades is 100%, once the largest loss per trade is
reduced to zero, from a filter value of 0.0090 up to the threshold value. The highest
percentage of correct trades to PF is 61.90% at the filter value of 0.0090, while the lowest
is 23.81% at the filter value of 0.0040. Generally, the higher filter values (from 0.0080)
improve the percentage of correct trades to PF, though not as significantly as they improve
the percentage of winning trades. This means that some profitable trades are eliminated
together with the unprofitable trades; i.e. some 'buy' or 'sell' signals in the PF model are
incorrectly filtered out, resulting instead in a 'Do Nothing' signal.
5.8.7 ANNWAR Model’s Profitability Performance
Chart 5-5 and Chart 5-6 suggest that the ANNWAR model is the best of the three models in
terms of overall profitability. The total profit gained by the model was significantly higher
than that of the AR and ANNOAR models at all filter values tested up to 0.0075. The
ANNOAR only outperformed the ANNWAR at filter values of 0.0085 to 0.0090. This is
because both the ANNOAR and ANNWAR share the same stable state profit value of
0.0036, but the ANNWAR reaches it earlier, at a filter value of 0.0090.
Table 5-2 and Table 5-3c&d show a total profit of 0.0685 at filter values of zero to 0.0005.
The total profit dips slightly to 0.0551 when the filter value is set to 0.0010 and falls by
more than half, to 0.0225, at filter values of 0.0015 to 0.0025. Total profits gradually
increase again to 0.0409 when the filter value is incremented to 0.0035, drop to 0.0258 at a
filter value of 0.0040, then increase to 0.0445 and remain steady there for filter values of
0.0045 to 0.0050. Total profits dip again to 0.0286 at 0.0055 but recover to 0.0334 at filter
values of 0.0060 to 0.0065. From that value onward, total profits gradually decrease to the
stable state profit value of 0.0036 at a filter value of 0.0090, where they remain until the
threshold value of 0.0162 is reached. All trades beyond that value are eliminated.
The ANNWAR model's average profit per trade ranges from 0.0015 (at filter values of
0.0015 to 0.0025) to 0.0084 (at filter values of 0.0060 to 0.0065). The largest gain per
trade achieved by this model is 0.0326, the same value achieved by the ANNOAR model.
However, unlike in the ANNOAR, this highly profitable trade is not eliminated at 0.0005;
it is only eliminated at a filter value of 0.0015. The largest loss per trade is -0.0122 at filter
values of zero to 0.0030. It is eliminated at a filter value of 0.0035, leaving the next largest
loss per trade of -0.0104. This trade is in turn eliminated when the filter value is set to
0.0045, which reduces the largest loss per trade to -0.0049. At a filter value of 0.0060, all
unprofitable trades are eliminated.
The percentage of winning trades decreases from 58.82% at filter values of 0.0000 to
0.0005 to 53.33% at filter values of 0.0015 to 0.0025. It gradually improves from 53.85%
at a filter value of 0.0030 to 100% from a filter value of 0.0060. The percentage of correct
trades to PF initially increases from 47.62% at a zero filter value to 52.38% before
decreasing to a low of 28.57% at a filter value of 0.0040. However, from a filter value of
0.0045, it gradually increases to 57.14% at 0.0090. Interestingly, the filter values that give
the highest average profit per trade and 100% winning trades do not correspond to the
highest percentage of correct trades to PF. This indicates that some profitable trades are
eliminated at those filter values, but the majority of the remaining trades are highly
profitable.
Table 5-2
Summary of the Models’ Profitability: Perfect Foresight (PF), Autoregressive (AR),
Artificial Neural Networks with no AR (ANNOAR) and Artificial Neural Networks with AR
(ANNWAR)
Chart 5-1
Effect of Filter Values on the Profitability of the PF Model
[Line chart: PF P/L (A$) plotted against filter values from 0.0000 to 0.0350.]
Chart 5-2
Effect of Filter Values on the Profitability of the AR Model
[Line chart: AR P/L (A$) plotted against filter values from 0.0000 to 0.0350.]
Chart 5-3
Effect of Filter Values on the Profitability of the ANNOAR Model
[Line chart: ANNOAR P/L (A$) plotted against filter values from 0.0000 to 0.0200.]
Chart 5-4
Effect of Filter Values on the Profitability of the ANNWAR Model
[Line chart: ANNWAR P/L (A$) plotted against filter values from 0.0000 to 0.0200.]
Chart 5-5
Comparison of the Effect of Filter Values on the Profitability of the AR, ANNWAR and
ANNOAR Models
[Line chart: P/L (A$) of each model against filter values from 0.0000 to 0.0200.]
Chart 5-6
Comparison of the Effect of Filter Values on the Profitability of the ANNOAR and
ANNWAR Models
[Line chart: ANNOAR and ANNWAR P/L (A$) against filter values from 0.0000 to 0.0200.]
Table 5-3a
Detailed trading comparison of the PF and AR with filter values varied from 0.0000 to 0.0050 basis points.
Filter 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035 0.0040 0.0045 0.0050
Perfect Foresight
Average profit per trade 0.0090 0.0090 0.0099 0.0104 0.0104 0.0104 0.0114 0.0119 0.0126 0.0133 0.0133
Largest loss per trade 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Largest gain per trade 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326
Total gain 0.1792 0.1792 0.1776 0.1763 0.1763 0.1763 0.1706 0.1671 0.1635 0.1592 0.1592
Percentage winning trades 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
% of correct trades to PF 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
No of winning trades 20 20 18 17 17 17 15 14 13 12 12
Buy 12 12 11 11 11 11 9 8 8 8 8
Sell 8 8 7 6 6 6 6 6 5 4 4
Do Nothing 1 1 3 4 4 4 6 7 8 9 9
AR
Average profit per trade -0.0004 -0.0004 -0.0004 -0.0011 0.0010 0.0010 0.0013 0.0013 0.0013 0.0011 -0.0004
Largest loss per trade -0.0347 -0.0347 -0.0347 -0.0347 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122
Largest gain per trade 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0150
Total gain -0.0074 -0.0074 -0.0074 -0.0184 0.0163 0.0163 0.0193 0.0193 0.0193 0.0150 -0.0056
Percentage winning trades 50.00% 50.00% 50.00% 47.06% 50.00% 50.00% 53.33% 53.33% 53.33% 50.00% 46.15%
% of correct trades to PF 42.86% 42.86% 47.62% 38.10% 38.10% 38.10% 38.10% 42.86% 38.10% 38.10% 33.33%
No of winning trades 9 9 9 8 8 8 8 8 8 7 6
Buy 6 6 6 6 6 6 5 5 5 5 5
Sell 12 12 12 11 10 10 10 10 10 9 8
Do Nothing 3 3 3 4 5 5 6 6 6 7 8
Table 5-3b
Detailed trading comparison of the PF and AR with filter values varied from 0.0055 to 0.0100 basis points.
Filter 0.0055 0.0060 0.0065 0.0070 0.0075 0.0080 0.0085 0.0090 0.0095 0.0100
Perfect Foresight
Average profit per trade 0.0133 0.0133 0.0139 0.0146 0.0146 0.0146 0.0162 0.0162 0.0162 0.0162
Largest loss per trade 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Largest gain per trade 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326 0.0326
Total gain 0.1592 0.1592 0.1530 0.1465 0.1465 0.1465 0.1297 0.1297 0.1297 0.1297
Percentage winning trades 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
% of correct trades to PF 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
No of winning trades 12 12 11 10 10 10 8 8 8 8
Buy 8 8 8 7 7 7 5 5 5 5
Sell 4 4 3 3 3 3 3 3 3 3
Do Nothing 9 9 10 11 11 11 13 13 13 13
AR
Average profit per trade -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 -0.0004 0.0005 0.0005 0.0005 0.0018
Largest loss per trade -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0104
Largest gain per trade 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150 0.0150
Total gain -0.0056 -0.0056 -0.0056 -0.0052 -0.0052 -0.0052 0.0053 0.0053 0.0053 0.0175
Percentage winning trades 46.15% 46.15% 46.15% 50.00% 50.00% 50.00% 54.55% 54.55% 54.55% 60.00%
% of correct trades to PF 33.33% 33.33% 33.33% 38.10% 38.10% 38.10% 42.86% 42.86% 42.86% 42.86%
No of winning trades 6 6 6 6 6 6 6 6 6 6
Buy 5 5 5 4 4 4 4 4 4 4
Sell 8 8 8 8 8 8 7 7 7 6
Do Nothing 8 8 8 9 9 9 10 10 10 11
Table 5-3c
Detailed trading comparison of the ANNOAR and ANNWAR with filter values varied from 0.0000 to 0.0050 basis points.
Filter 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035 0.0040 0.0045 0.0050
ANN without AR (ANNOAR)
Average profit per trade 0.0029 0.0012 0.0006 0.0006 0.0009 0.0015 0.0013 0.0013 0.0010 0.0010 0.0010
Largest loss per trade -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122
Largest gain per trade 0.0326 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207 0.0207
Total gain 0.0546 0.0220 0.0110 0.0110 0.0138 0.0225 0.0182 0.0182 0.0115 0.0115 0.0115
Percentage winning trades 52.63% 50.00% 47.06% 47.06% 50.00% 53.33% 50.00% 50.00% 50.00% 50.00% 50.00%
% of correct trades to PF 47.62% 42.86% 42.86% 38.10% 42.86% 42.86% 33.33% 33.33% 23.81% 28.57% 28.57%
No of winning trades 10 9 8 8 8 8 7 7 6 6 6
Buy 6 5 5 5 5 5 5 5 3 3 3
Sell 13 13 12 12 11 10 9 9 9 9 9
Do Nothing 2 3 4 4 5 6 7 7 9 9 9
ANN With AR (ANNWAR)
Average profit per trade 0.0040 0.0040 0.0034 0.0015 0.0015 0.0015 0.0022 0.0034 0.0026 0.0056 0.0056
Largest loss per trade -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0122 -0.0104 -0.0104 -0.0049 -0.0049
Largest gain per trade 0.0326 0.0326 0.0326 0.0207 0.0207 0.0207 0.0207 0.0207 0.0150 0.0150 0.0150
Total gain 0.0685 0.0685 0.0551 0.0225 0.0225 0.0225 0.0288 0.0409 0.0258 0.0445 0.0445
Percentage winning trades 58.82% 58.82% 56.25% 53.33% 53.33% 53.33% 53.85% 58.33% 60.00% 75.00% 75.00%
% of correct trades to PF 47.62% 47.62% 52.38% 42.86% 42.86% 42.86% 33.33% 33.33% 28.57% 33.33% 33.33%
No of winning trades 10 10 9 8 8 8 7 7 6 6 6
Buy 7 7 6 5 5 5 5 5 5 4 4
Sell 10 10 10 10 10 10 8 7 5 4 4
Do Nothing 4 4 5 6 6 6 8 9 11 13 13
Table 5-3d
Detailed trading comparison of the ANNOAR and ANNWAR with filter values varied from 0.0055 to 0.0100 basis points.
Filter 0.0055 0.0060 0.0065 0.0070 0.0075 0.0080 0.0085 0.0090 0.0095 0.0100
ANN without AR (ANNOAR)
Average profit per trade 0.0011 0.0017 0.0009 0.0034 0.0034 0.0034 0.0040 0.0089 0.0036 0.0036
Largest loss per trade -0.0122 -0.0122 -0.0122 -0.0049 -0.0049 -0.0049 -0.0049 0.0000 0.0000 0.0000
Largest gain per trade 0.0207 0.0207 0.0142 0.0142 0.0142 0.0142 0.0142 0.0142 0.0036 0.0036
Total gain 0.0119 0.0174 0.0071 0.0171 0.0171 0.0171 0.0158 0.0178 0.0036 0.0036
Percentage winning trades 54.55% 60.00% 62.50% 80.00% 80.00% 80.00% 75.00% 100.00% 100.00% 100.00%
% of correct trades to PF 33.33% 38.10% 38.10% 38.10% 38.10% 38.10% 52.38% 61.90% 57.14% 57.14%
No of winning trades 6 6 5 4 4 4 3 2 1 1
Buy 2 2 2 1 1 1 1 0 0 0
Sell 9 8 6 4 4 4 3 2 1 1
Do Nothing 10 11 13 16 16 16 17 19 20 20
ANN With AR (ANNWAR)
Average profit per trade 0.0057 0.0084 0.0084 0.0064 0.0064 0.0064 0.0032 0.0036 0.0036 0.0036
Largest loss per trade -0.0049 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Largest gain per trade 0.0142 0.0142 0.0142 0.0128 0.0128 0.0128 0.0036 0.0036 0.0036 0.0036
Total gain 0.0286 0.0334 0.0334 0.0192 0.0192 0.0192 0.0065 0.0036 0.0036 0.0036
Percentage winning trades 80.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
% of correct trades to PF 38.10% 42.86% 47.62% 47.62% 47.62% 47.62% 52.38% 57.14% 57.14% 57.14%
No of winning trades 4 4 4 3 3 3 2 1 1 1
Buy 2 2 2 2 2 2 1 0 0 0
Sell 3 2 2 1 1 1 1 1 1 1
Do Nothing 16 17 17 18 18 18 19 20 20 20
Chart 5-7
Comparison of the Actual Vs the AR Forecast on Out-of-sample Data
[Line chart: actual A$/US$ rate and AR forecast, weekly, 20-Jan-95 to 9-Jun-95.]
Chart 5-8
Comparison of the Actual Vs the ANNOAR Forecast on Out-of-sample Data
[Line chart: actual A$/US$ rate and ANNOAR forecast, weekly, out-of-sample period.]
Chart 5-9
Comparison of the Actual Vs the ANNWAR Forecast on Out-of-sample Data
[Line chart: actual A$/US$ rate and ANNWAR forecast, weekly, 20-Jan-95 to 9-Jun-95.]
Chart 5-10
Comparison of the Actual Vs the ANNWAR and ANNOAR Forecasts on Out-of-sample Data
[Line chart: actual A$/US$ rate, ANNWAR and ANNOAR forecasts, weekly, out-of-sample period.]
Table 5-4
Comparison of Mean Square Error (MSE) of the Different Models on the Out-of-sample
Data
Models MSE
AR 0.000516
Random Walk Theory (RWT) 0.000278
ANNOAR 0.000317
ANNWAR 0.000266
Table 5-4 shows that the model with the lowest Mean Square Error (MSE) on the out-of-
sample data is the ANNWAR model. It even outperforms the Random Walk Theory
(RWT) model, which is marginally the second best. The RWT model uses the last known
exchange rate as the forecast for the next rate, and thus generates no trades. The ANNOAR
model is the third best forecasting model, while the AR model is the poorest.
However, as discussed earlier, a reduction in forecast errors does not necessarily translate
into better profits. In this case, though, it does: the ANNWAR performs best both in terms
of exchange rate forecasting and in terms of trading profitability.
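For reference, MSE figures of this kind can be reproduced from aligned forecast and actual series as follows; the RWT benchmark simply lags the actual series by one week. A minimal sketch, with illustrative names:

```python
import numpy as np

def mse(actual, forecast):
    """Mean square error between aligned actual and forecast series."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.mean((actual - forecast) ** 2))

def rwt_mse(rates):
    """RWT forecasts next week's rate as this week's rate (a one-week lag)."""
    rates = np.asarray(rates)
    return mse(rates[1:], rates[:-1])
```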
5.10 Conclusion
The results reported in this chapter suggest that the ANNWAR model, which incorporates the
AR output into an ANN, can improve the robustness and profitability of trading
systems relative to those based on AR models or ANNs in isolation. The results of this
experiment indicate that the ANNWAR model is a more profitable and robust trading
system, in that it performs better, and over a wider range of filter values, than the other
models.
There appear to be opportunities to exploit some inefficiency in the Australian/US dollar
foreign exchange market, as all models return profits after taking account of interest
differentials and transaction costs. This concurs with studies that have found abnormal
profits can be obtained from technical trading and filter rules [Sweeney 1986, Brock et al.
1992, LeBaron 1992]. Combining the ANN with other established technical trading
rules may improve profitability further.
The AR model has been shown to perform poorly in this book. The results of the AR
model in this study differ quite significantly from the earlier study [Tan 1995ab]. In that
study, the AR model by itself seemed ideal for the risk-averse trader, as it generated a
smaller number of trades and, in conjunction with an appropriate filter value, gave the best
average profit per trade as well as the highest number of winning trades. However, its
sensitivity to the filter values, with all trades filtered out at a mere 5 basis points, calls into
question its reliability and stability for use in a real-life trading environment.
In this study, however, the AR is unprofitable at most filter values and does not perform
well on any of the profitability metrics, yet it is quite insensitive to the filter values, being
the model with the highest threshold value. A reason for this could be the more linear
nature of the out-of-sample data in the earlier study, which allowed the AR model to
perform better; the out-of-sample data in this study is more volatile, with no clear trend.
In the earlier study, the best ANN architecture was one with no hidden layer. This was the
architecture used for the ANNWAR and the stand-alone ANN models in that research,
which suggests that the best model then may have been a linear forecasting model. It is
therefore surprising that the AR model, which is by definition a linear best-fit method,
could be improved upon by incorporating the AR output into the ANN. Many studies have
suggested that most financial markets are nonlinear in nature, so the results from that time
series are quite interesting, as they seem to contradict this view. One explanation could be
that the filter rules added a nonlinear dimension to the trading system's performance as
measured by profitability.
In this study, the best ANN architecture was a network with one hidden layer. The
additional data may have helped the ANN to pick up the nonlinear nature of the exchange
rate market. Indeed, Hsieh [1989] and Steurer [1995] have shown that there is considerable
evidence of nonlinear structure in the Deutschmark/US Dollar (DEM/USD) exchange rate.
Steurer's study suggests that there is low-dimensional chaos in the DEM/USD exchange
rate and that nonlinear, nonparametric techniques can produce significantly better results.
Artificial Neural Networks have been shown to 'capture chaos because they learn the
dynamical invariants of a chaotic dynamical system' [Deco et al. 1995].
The accuracy of the ANNWAR model, as measured by the percentage of winning trades,
reached 100%. A level above 60% is sufficient for a market maker with low transaction
costs to run a profitable foreign exchange desk [Orlin Grabbe 1986]. However, this high
percentage of winning trades requires relatively high filter values; by eliminating all of the
unprofitable trades, many profitable trades are also eliminated, reducing the total profit.
The more risk-averse trader may choose to accept a lower total return by using higher
filter values to minimize the possibility of any trading loss, while a more speculative trader
may be willing to accept the risk of some unprofitable trades in expectation of a higher
return. Further research should investigate whether the returns are commensurate with the
additional risk.
This study also confirms the robustness of the ANNWAR model that was introduced in my
earlier work. The ANNWAR model in this study significantly outperformed the other
models not only in terms of profitability but also in terms of exchange rate forecasting, as
measured by the MSE.
The system should also be monitored continuously to ensure that the models in use are
performing well. The ANN models may need to be retrained should the system start
showing signs of diverging from its profitability targets.
Resources required for implementing the system include a reliable data source, a computer
system, personnel to ensure that compliance and risk management controls are in place,
maintenance of the database, operational staff to execute the trades, and training for the
personnel who will use the system.
5.13 References
1. Abu Mostafa, Y. S., “Financial Market Applications of Learning Hints”, Neural
Networks in the Capital Market edited by Refenes, A., ISBN 0-471-94364-9, John
Wiley & Sons Ltd., England, pp. 220-232, 1995.
2. Bourke, L., “The Efficiency of Australia’s Foreign Exchange Market in the Post-Float
Period”, Bond University School of Business Honours Dissertation, Australia,
September 1993.
3. Brock, W. A., Lakonishok, J. and LeBaron, B., “Simple Technical Trading Rules and
the Stochastic Properties of Stock Returns”, The Journal of Finance, 47:1731:1764,
USA, 1992.
4. [BIS95], Central Bank Survey of Foreign Exchange Market Activity in April 1995,
Bank for International Settlements Press Communiqué, Basel, October 1995.
5. Colin, A., "Exchange Rate Forecasting at Citibank London", Proceedings, Neural
Computing 1991, London, 1991.
6. Colin, A. M., “Neural Networks and Genetic Algorithms for Exchange Rate
Forecasting”, Proceedings of International Joint Conference on Neural Networks,
Beijing, China, November 1-5, 1992.
7. Dacorogna, M. M., Muller, U. A., Jost, C., Pictet, O. V., Olsen R. B. and Ward, J. R.,
“Heterogeneous Real-Time Trading Strategies in the Foreign Exchange Market”,
Preprint by O & A Research Group MMD.1993-12-01, Olsen & Associates,
Seefeldstrasse 233, 8008 Zurich, Switzerland, 1994.
8. Davidson, C., June 1995, Development in FX Markets [Online], Olsen and Associates:
Professional Library,
Available: http://www.olsen.ch/library/prof/dev_fx.html, [1996, August 5].
9. Deboeck, G. J., Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic
Financial Markets, ISBN 0-471-31100-6, John Wiley and Sons Inc., USA, 1994.
10. Deco, G., Schuermann, B. and Trippi, R., "Neural Learning of Chaotic Time Series
Invariants", Chaos and Nonlinear Dynamics in the Financial Markets edited by Trippi,
R., Irwin, USA, ISBN 1-55738-857-1, pp. 467-488, 1995.
11. Dwyer, G. P. and Wallace, M. S., "Cointegration and Market Efficiency", Journal of
International Money and Finance, Vol. 11, pp. 318-327.
12. Engel, C., “A Note on Cointegration and International Capital Market Efficiency”,
Journal of International Money and Finance, Vol. 15, No. 4, pp. 657-660, 1996.
13. Fishman, M., Barr, D. S. and Heaver, E., A New Perspective on Conflict Resolution in
Market Forecasting, Proceedings The 1st International Conference on Artificial
Intelligence Applications on Wall Street, NY, pp. 97–102, 1991.
14. Fishman, M., Barr, D. S. and Loick, W. J., Using Neural Nets in Market Analysis,
Technical Analysis of Stocks & Commodities, pp. 18–20, April 1991
15. Freedman, R. S., AI on Wall Street, IEEE Expert, pp. 3–9, April 1991.
16. Freisleben, B., Stock Market Prediction with Backpropagation Networks, Industrial
and Engineering Applications of Artificial Intelligence and Expert Systems 5th
34. Reserve Bank of Australia Bulletin, “Australian Financial Markets”, ISSN 0725-0320,
Australia, May 1996.
35. Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning Internal
Representations by Error Propagation”, Parallel Distributed Processing, Vol. 1, MIT
Press, Cambridge Mass., 1986.
36. Schwartz, T. J., AI Applications on Wall Street, IEEE Expert, pp. 69–70, Feb. 1992.
37. Schoneburg, E., Stock Prediction Using Neural Networks: A Project Report,
Neurocomputing Vol. 2, No. 1, 17–27, June 1990.
38. Sheen, J., “Modeling the Floating Australian Dollar: Can the Random Walk be
Encompassed by a Model Using a Permanent Decomposition of Money and Output?”,
Journal of International Money and Finance, vol. 8, pp. 253-276, 1989.
39. Sinha, T. and Tan, C., "Using Artificial Neural Networks for Profitable Share
Trading", JASSA: Journal of the Securities Institute of Australia, Australia, September
1994.
40. Steiner, M. and Wittkemper, H., “Neural Networks as an Alternative Stock Market
Model”, Neural Networks in the Capital Market edited by Refenes, A., ISBN 0-471-
94364-9, John Wiley & Sons Ltd., England, pp. 137-148, 1995.
41. Steurer, E., “Nonlinear Modeling of the DEM/USD Exchange Rate”, Neural Networks
in Capital Markets edited by Refenes, A., John Wiley and Sons, England, ISBN 0-471-
94364-9, pp. 199-212, 1995.
42. Surajaras, P. and Sweeney, R. J., Profit-Making Speculation in Foreign Exchange
Markets, The Political Economy of Global Interdependence, Westview Press, Boulder,
1992.
43. Sweeney, R. J., "Beating the Foreign Exchange Market", The Journal of Finance,
Vol. XLI, No. 1, pp. 163-182, USA, March 1986.
44. Tan, C. N. W., “Incorporating Artificial Neural Network into a Rule-based Financial
Trading System”, The First New Zealand International Two Stream Conference on
Artificial Neural Networks and Expert Systems (ANNES), University of Otago,
Dunedin, New Zealand, November 24-26, 1993, IEEE Computer Society Press, ISBN
0-8186-4260-2, 1993a.
45. Tan, C. N. W., “Trading a NYSE-Stock with a Simple Artificial Neural Network-based
Financial Trading System”, The First New Zealand International Two Stream
Conference on Artificial Neural Networks and Expert Systems (ANNES), University of
Otago, Dunedin, New Zealand, November 24-26, 1993, IEEE Computer Society Press,
ISBN 0-8186-4260-2, 1993b.
46. Tan, C.N.W., Wittig, G. E., A Study of the Parameters of a Backpropagation Stock
Price Prediction Model, The First New Zealand International Two Stream Conference
on Artificial Neural Networks and Expert Systems (ANNES), University of Otago,
Dunedin, New Zealand, November 24-26, 1993, IEEE Computer Society Press, ISBN
0-8186-4260-2, 1993a.
47. Tan, C.N.W., Wittig, G. E., Parametric Variation Experimentation on a
Backpropagation Stock Price Prediction Model, The First Australia and New Zealand
41
Although the term "price action" is more commonly used, Murphy [1986] feels that the term is
too restrictive for commodity traders, who have access to additional information besides price. As
his book focuses more on charting techniques for the commodity futures market, he uses the term
"market action" to include price, volume and open interest, and it is used interchangeably with
"price action" throughout the book.
Appendix C: Introduction to Foreign Exchange Trading Techniques
Murphy [1986] summarizes the basis for technical analysis into the following three
premises:
Market action discounts everything. The assumption here is that the price action
reflects the shifts in demand and supply that are the basis of all economic and
fundamental analysis, and that everything that affects the market price is ultimately
reflected in the market price itself. Technical analysis does not concern itself with
studying the reasons for the price action, focusing instead on the study of the
price action itself.
Prices move in trends. This assumption is the foundation of almost all technical
systems, which try to identify trends and trade in the direction of the trend. The
underlying premise is that a trend in motion is more likely to continue than to
reverse.
History repeats itself. This premise is derived from the study of human psychology,
which tends not to change over time. This view of behavior leads to the
identification of chart patterns that are observed to recur over time, revealing traits
of a bullish or a bearish market psychology.
5.14.2 Fundamental Analysis
Fundamental analysis studies the effect of supply and demand on price. All
relevant factors that affect the price of a security are analyzed to determine the
intrinsic value of the security. If the market price is below its intrinsic value then
the market is viewed as undervalued and the security should be bought. If the
market price is above its intrinsic value, then it should be sold.
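This decision rule is simple enough to state directly in code. A minimal sketch, with illustrative names (estimating the intrinsic value itself is the hard part and is assumed given):

```python
def fundamental_signal(market_price, intrinsic_value):
    """Buy when undervalued, sell when overvalued, per the rule above."""
    if market_price < intrinsic_value:
        return "buy"     # market price below intrinsic value: undervalued
    if market_price > intrinsic_value:
        return "sell"    # market price above intrinsic value: overvalued
    return "hold"
```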
Examples of the relevant factors analyzed are financial ratios such as Price to
Earnings and Debt to Equity, and economic indicators such as Industrial Production
Indices, GNP, and the CPI.
Fundamental analysis studies the causes of market movements, in contrast to
technical analysis, which studies the effect of market movements. Interest Rate
Parity Theory and Purchasing Power Parity Theory are examples of the theories
used in forecasting price movements using fundamental analysis.
The problem with fundamental analysis theories is that they are generally relevant
only in predicting longer-term trends. Fundamental factors themselves tend to lag
market prices, which explains why market prices sometimes move without apparent
causal factors, the fundamental reasons only becoming apparent later on. Another
factor to consider in fundamental analysis is the reliability of the economic data.
Due to the complexity of today's global economy, economic data are often revised
in subsequent periods, posing a threat to the accuracy of any fundamental
economic forecast whose model is based on that data. The frequency of the data
also limits the predictive horizon of the model.
5.14.3 ANNs and Trading Systems
Today there are many trading systems in use in the financial trading arena, all with
a single objective in mind: to make money. Many of the trading systems
currently in use are entirely rule-based, utilizing buy/sell rules that incorporate
trading signals generated from technical/statistical indicators such as moving
averages, momentum, stochastics, and the relative strength index, or from chart
pattern formations such as head and shoulders, trend lines, triangles, wedges, and
double tops/bottoms.
42
It is interesting that some recent studies have linked the neurons in the brain to activities in the
stomach. Therefore, the term ‘gut feel’ may be more than just a metaphor!
43
A knowledge engineer is an expert system computer programmer, whose job is to translate the
knowledge gathered from a human expert into the computer programs of an expert system.
44
The inference engine is a computer module where the rules of an expert system are stored and
used.
45
A system is said to be curve fitting if it obtains excellent results only on the data set for which its
parameters have been optimized, but is unable to repeat those results on other data sets.
The basic rules of such a trading system are:
Opening a position:
a. Buy rule
b. Sell rule
Closing a position:
a. Stop/Take Profit rule
According to R. S. Freedman [Freedman 1991], the two general trading rules for
profiting from trading in securities markets are:
i. Buy low and sell high.
ii. Do it before anyone else.
Most trading systems are trend-following systems, e.g. moving average and
momentum systems. These systems work on the principle that the best profits are
made in trending markets and that markets will follow a given direction for a
period of time. Systems of this type fail in non-trending markets. Some systems
also incorporate trend reversal strategies, attempting to pick tops or bottoms
through indicators that signal potential market reversals. A good system needs
tight control over its exit rules, minimizing losses while maximizing gains.
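As an example of such a trend-following rule, the following sketch implements a dual moving-average crossover; the window lengths are arbitrary illustrative choices, not values used in this study.

```python
def moving_average(series, window):
    """Simple moving average; the last element uses the latest observations."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def crossover_signal(prices, short_window=5, long_window=20):
    """'buy' when the short MA crosses above the long MA,
    'sell' when it crosses below, else 'do nothing'."""
    if len(prices) < long_window + 1:
        return "do nothing"
    short = moving_average(prices, short_window)
    long_ = moving_average(prices, long_window)
    # Both lists end at the latest observation, so [-1]/[-2] are aligned.
    prev_diff = short[-2] - long_[-2]
    curr_diff = short[-1] - long_[-1]
    if prev_diff <= 0 < curr_diff:
        return "buy"
    if prev_diff >= 0 > curr_diff:
        return "sell"
    return "do nothing"
```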
5.14.4.1 Opening Position rules
Only one of the following rules can be executed for a specific security at any one
time, thereby creating an open position. Neither rule can be executed for a
security that already has an open position. A position is opened if there is a high
probability of the security price trending, and a position is said to be open once
either a buy or a sell rule is triggered.
a. Buy Rule
This rule is generated when the indicators show a high probability of an increase in
the price of the security being analyzed. Profit can be made by buying the security
at this point in time and selling it later after the security price rises. Buying a
security opens a long position.
b. Sell Rule
This rule is generated when the indicators show a high probability of a drop in
the price of the security being analyzed. Profit can be made by selling the security
at this point in time and buying it back later, after the security price declines.
Selling a security opens a short position.
5.14.4.2 Closing Position rules
A position can only be closed if there is an open position. A position is closed if
there is a high probability of the trend reversing or ending.
a. Stop/Take Profit rule
This rule can only be generated when a position (either long or short) has been
opened. It is generated when indicators show a high probability of a trend
reversal, or of a movement of the security price contrary to the open position. It
can also be generated if the price of the security hits a level that triggers the
trader's threshold of loss tolerance.
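A minimal sketch of this open/close rule structure: at most one open position per security, opened by a buy or sell rule and closed by a stop/take-profit rule. The class and signal names are illustrative assumptions.

```python
class PositionManager:
    def __init__(self):
        self.position = None  # None, 'long' or 'short'

    def on_signal(self, signal):
        """Apply an opening signal ('buy'/'sell') or a closing signal
        ('stop_take_profit'); signals that do not apply are ignored."""
        if self.position is None:
            if signal == "buy":
                self.position = "long"    # buying opens a long position
            elif signal == "sell":
                self.position = "short"   # selling opens a short position
        elif signal == "stop_take_profit":
            self.position = None          # close the open position
        return self.position
```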
Chart 5-11
A Typical Technical Price Chart
[Bar chart of high, low and close prices; price axis from 10 to 35.]