
Recognizing Financial Distress Patterns Using a Neural Network Tool

Author(s): Pamela K. Coats and L. Franklin Fant
Source: Financial Management, Vol. 22, No. 3 (Autumn, 1993), pp. 142-155
Published by: Wiley on behalf of the Financial Management Association International
Stable URL: http://www.jstor.org/stable/3665934
Accessed: 02/07/2013 00:35

This content downloaded from 121.52.158.245 on Tue, 2 Jul 2013 00:35:28 AM All use subject to JSTOR Terms and Conditions

Recognizing Financial Distress Patterns Using a Neural Network Tool

Pamela K. Coats and L. Franklin Fant

Pamela K. Coats is a Professor of Finance and L. Franklin Fant is a Doctoral Candidate in Finance, both at the College of Business, Florida State University, Tallahassee, Florida.

The traditional approach and present standard for predicting financial distress uses multiple discriminant analysis (MDA) to weight the relative value of information provided by a combination of financial ratios. But MDA has been sharply criticized because the validity of its results hinges on restrictive assumptions (Werbos [37], Eisenbeis [11], Altman and Eisenbeis [3], Scott [29], Tollefson and Joy [32], Sheth [31], Ohlson [26], Pinches [27], Zmijewski [41], Zavgren [39], Karels and Prakash [17], and Odom and Sharda [25]). Two assumptions are particularly problematic for ratio analysis. First, MDA requires that the decision set used to distinguish between distressed and viable firms must be linearly separable. For a single ratio, this means that a value above or below a given threshold point must always signal either distress or good health. In the instance where two ratios are considered together, the threshold separating the classification regions is a line; with more than two ratios, a plane. Second, MDA does not allow for a ratio's signal to vacillate depending on its relationship to another ratio or set of ratios. In other words, ratios are treated as completely independent.

Unfortunately, these restrictions violate common sense. In practice, a ratio may signal distress both when it is higher than normal and when it is lower than normal, or a ratio's value may be considered acceptable under some conditions, yet risky under others. These problems and others (e.g., bias of extreme data points; multivariate normality assumption; and equal group covariances assumption) make MDA incompatible with the complex nature, boundaries, and interrelationships of financial ratios. The

The authors recognize and thank R.C. Lacher and S.C. Sharma for their contributions, and I. Locke, D. Pagach, and three anonymous FM reviewers for their insightful comments.

1While other studies have suggested alternatives to MDA, including logit (McFadden [22], Goss et al [15]), probit (McFadden [22]), recursive partitioning (Breiman et al [6], Frydman, Altman, and Kao [14]), expert systems (Elmer and Borowski [12]), and nonparametric models (Barniv and Raveh [5]), none of these approaches has replaced MDA as the standard for comparison.


COATS & FANT / RECOGNIZING FINANCIAL DISTRESS PATTERNS

143

power of MDA for financial ratio analysis is compromised and the results may be erroneous (Karels and Prakash [17]).

Our research is motivated by the fact that a "neural network" ("NN") analysis of the same ratios used by MDA, for the same objective, is possible without any of the circumscription that binds MDA.2 Moreover, studies indicate that neural network models are at least as successful as MDA in terms of overall accuracy (Williams [38], Cottrel, Munro, and Zipser [9], Odom and Sharda [25], Webb and Lowe [36], and Utans and Moody [33]).

The question asked by our study is: How successfully can neural networks discern patterns or trends in financial data and use them as early warning signals of distressful conditions in currently viable firms? Being able to form highly reliable early forecasts of the future health of firms is, of course, critically important to bank lending officers, investors, market analysts, portfolio managers, auditors, insurers, and many others in the field of finance.

Our approach creates NN models which glean and learn relationships in raw data from processing examples of conclusions reached by experts (auditors) who have analyzed the same data. The experts, in making their assessments, have implicitly imposed their insights and intuition cultivated over years of on-the-job experience. Our research objective is to formalize this ingrained, unarticulated knowledge of the experts by uncovering consistencies between the experts' conclusions and the recurring patterns in the financial data. To evaluate our results, we measure our neural networks' success in using a limited number of financial ratios to duplicate the going-concern determinations rendered by auditors.3,4

The test results in this study suggest that the NN approach is more effective than MDA for the early detection of financial distress developing in firms. The NN models consistently correctly predict auditors' findings of distress
2Since neural networks do not impose a linearity constraint, nonnumeric data which denote, for example, a firm's nationality, region of operation, market segments, employee satisfaction, severity of strikes, or spurious market behaviors can be dealt with in the same fashion as numeric input.

3Part of the independent auditors' responsibility in a corporate audit is to assess the capability of the firm to continue in existence, i.e., as a going-concern, through the following fiscal year end. If the auditors have sufficient doubt as to this capability, they will issue a disclaimer report to that effect. This report is frequently referred to as a "going-concern opinion."

4No published research to date has sought to fit a model to emulate the past performance of auditors in assessing financial distress. Selfridge and Biggs [30] have, however, examined the nature of auditors' knowledge and have proposed a cognitive model which identifies the types and relationships of the knowledge involved.

at least 80% of the time over an effective lead time of up to four years. A statistical comparison of results shows that the neural networks are always better than the MDA models for identifying firms which eventually receive going-concern opinions.

The neural network we use is a mathematical algorithm for creating a perfect mapping between the input and output values for a set of training data. The NN training process incrementally captures knowledge about the relationship between the output and the patterns in the input in order to correctly categorize the training situations. Once training is complete, the patterns found by the NN can be used to forecast situations where the outcome is unknown.

MDA can be considered equivalent to a special case of NN, and the two approaches give identical results when the input variables are linearly separable. However, the NN model is not subject to MDA's constraining assumptions, such as linear separability and independence of the predictive variables. This allows a neural network to achieve better results than MDA when patterns are complex.

The remainder of this paper is organized as follows. Section I provides an overview of the particular type of neural network used for our research, namely Cascade-Correlation, and details its method of training. Section II presents the research design, which focuses on neural networks and MDA comparison models, and describes the collection of the data and the selection of the samples used by the neural networks and MDA models for training and testing. Section III presents the neural network results, makes comparisons with the test results for MDA models, and offers interpretations. Section IV summarizes our findings and the direction of further research.

I. Overview of Neural Network Methodology


Artificial neural networks are inspired by neurobiological systems. Robert Hecht-Nielsen, inventor of one of the earliest neurocomputers, defines a neural network as a computing system made up of a number of simple, highly interconnected processing elements which process information by their dynamic state responses to external inputs (Caudill [8]). This definition brings out the two key elements in a neural network: "processing elements" and "interconnections."

Each processing element receives and combines input signals and transforms them into a single output signal. Each output signal, in turn, is sent (from its processing element) as an input signal to many other processing


elements (and possibly back to itself). Signals are passed around the network via weighted interconnections (links) between processing elements. Network knowledge is stored both in the way the processing elements connect in order to transfer signals and in the nature and strength of the interconnections.

Many types of neural networks exist, with certain types more suited to particular problems than others. We use a learning paradigm called "Cascade-Correlation" (or "Cascor") (Fahlman and Lebiere [13]). It is a mathematical algorithm for training a network to detect relationships between input data and output values in order to correctly categorize situations. Cascor overcomes several limitations of the more common back propagation approach.5
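The computation performed by one processing element can be sketched in a few lines. This is an illustrative example, not from the paper: the weighted-sum combination is standard, but the specific activation (squashing) function is an assumption; tanh is used here because it produces output in (-1, +1), matching the -1/+1 coding the paper later uses for distressed/healthy.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """One processing element: combine weighted input signals
    into a single output signal via a squashing function.
    The tanh activation is an illustrative assumption."""
    s = bias + sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(s)  # output lies in (-1, 1)

# A node receiving three input signals (e.g., three financial ratios):
signal = node_output([0.2, -0.5, 1.1], [0.8, 0.3, -0.4])
```

The output `signal` would itself be passed, via weighted links, as an input to other processing elements in the network.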

Exhibit 1. Simple Six-Node Cascade-Correlation Neural Network

[Figure: an input layer of three nodes at the bottom, two hidden nodes in the middle, and a one-node output layer at the top, joined by weighted connections.]
A. Structure
Exhibit 1 shows a diagram of a simple Cascade-Correlation neural network consisting of six processing elements, called "nodes," and their weighted interconnections. This elementary network connects an input layer of three nodes, an output layer of one node, and two hidden (or internal) nodes.

The input layer is composed of pieces of input data which describe the situation being studied. For example, each input node may refer to a particular financial ratio. Taken together, the values for these three input nodes represent one pattern to be studied by the network. These node values provide the initial signals to the NN. Since neural networks do not impose a linearity constraint, qualitative data (denoting, for example, a firm's nationality, region of operation, market segments, employee satisfaction, severity of strikes, or spurious market behaviors) can be dealt with in the same fashion as numeric input.

The output layer is composed of a single response or condition node which reflects the situation's known outcome. As an example, the output node may be used to denote a firm as being either healthy or distressed. Neural systems generate either categorical (e.g., groups A, B, C, or D; small, medium, or large) or relational (e.g., better than/worse than; greater than/less than) output.

There can be any number of hidden nodes, depending on (and increasing with) the complexity of the pattern in the input data. Each hidden node is fully connected from
5Cascor has been shown to have several advantages over back propagation and other existing neural network algorithms: the Cascor network learns very quickly, it determines its own size and design, it retains the structure it has built even if the training set changes, and it requires no back propagation of error signals through the connections of the network (Fahlman and Lebiere [13]).

Notes: Lines denote numeric connection weights between nodes. Circles denote nodes. Processing is directional from bottom to top.

all input nodes and previously installed hidden nodes and to all output nodes. Cascor begins with no hidden nodes, and then incrementally creates and installs hidden nodes (one at a time) to improve the network's ability to categorize. It is the hidden nodes, and their manner of connection with every input and output node and to each other, which make a Cascor NN capable of elaborating on hidden structures in the data. One of the advantages of Cascade-Correlation over previous neural network designs is that Cascor automatically self-determines the number of hidden nodes necessary to detect all of the features of the pattern (see Section I.C., Training). With other NN methods, extensive human trial and error is usually needed to discover the number of hidden nodes which best enables good predictions.6

The numeric weight assigned to the connection of any two nodes reflects the direction (positive or negative) and relative strength of the relationship between the nodes. Determining these weights is the focus of the neural network's computational process. In essence, the network's knowledge about one node's influence on another is encoded in the connection weights.

6This task of reducing the weights to avoid "overfitting" is described by Utans and Moody [33]. See also footnote 17 hereof.


B. Classification
What type of patterns can Cascor neural nets represent? This is equivalent to asking: How many distinct regions can be formed within a decision space and how are they separated? To help the reader visualize the classification abilities of Cascor neural nets, several nets are portrayed graphically in Exhibit 2 for the two-dimensional, i.e., two-input, case.

A neural net with no hidden nodes is equivalent to a multiple discriminant analysis. As with MDA, the decision (or categorization) regions can only be separated by a single straight line, as shown in Panel A of Exhibit 2. As noted earlier, when discriminating variables are not linearly separable, MDA (as well as a neural net with no hidden nodes) may not be appropriate.

A Cascor net with one hidden node has the ability to separate the decision (or categorization) space into regions by an angular surface (either open or closed), rather than a straight line (see Panel B of Exhibit 2). Because of the hidden node, individual input nodes can interact (pass information and influence each other's output signals). This facilitates a flexible mapping.

As more hidden nodes are added, the net becomes completely general and can separate the decision space into an arbitrary number of regions defined by complex boundaries, as the examples in Panel C of Exhibit 2 indicate (Lacher [19]).
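The value of a single hidden node can be demonstrated with a small sketch. The XOR-style pattern below is hypothetical and not from the paper: distress (+1) is signaled when exactly one of two rescaled ratios is high, a pattern that no single straight line can separate, but that a network with one hidden node plus direct input-to-output connections (the Cascade-Correlation topology) classifies perfectly. The specific weights are hand-set for illustration.

```python
import itertools

# Hypothetical pattern: distress (+1) when exactly one input is high.
pts = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

def linear_class(w1, w2, b, x, y):
    """A no-hidden-node net (equivalent to MDA): one linear threshold."""
    return 1 if w1 * x + w2 * y + b > 0 else -1

# A coarse search over many candidate lines finds none that fits all four points.
grid = [i / 4 for i in range(-8, 9)]
any_line = any(all(linear_class(w1, w2, b, x, y) == t for (x, y), t in pts)
               for w1, w2, b in itertools.product(grid, repeat=3))

def one_hidden_net(x, y):
    """One hidden node plus direct input-to-output links (Cascor-style)."""
    h = 1.0 if x + y - 1.5 > 0 else 0.0   # hidden node fires only when both inputs are high
    s = x + y - 2.0 * h - 0.5             # output sees the raw inputs and the hidden node
    return 1 if s > 0 else -1
```

The hidden node detects the "both ratios high" interaction and cancels it out of the linear score, carving the open angular region that Panel B of Exhibit 2 illustrates.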

Exhibit 2. Cascor Decision Regions

Panel A. Cascor Decision Regions With No Hidden Nodes

[Figure panel: decision space plotted as Ratio 1 versus Ratio 2, divided by a single straight-line boundary.]

Panel B. Cascor Decision Regions With One or More Hidden Nodes

[Figure panel: decision space plotted as Ratio 1 versus Ratio 2, divided by an angular open or closed boundary.]
C. Training
The objective of the training process is to autonomously learn7 the relationship between the output and patterns in the input, and to incrementally capture that knowledge in a unique structure of hidden nodes and connection weights which produce correct categorizations. This process of working toward accurate mappings is called "convergence." According to a mathematical theorem proved by Kolmogorov in the 1950s (Lorentz [21]) (and restated for
7"The learning process is actually a training process. An animal can be trained by rewarding desired responses and punishing undesired responses. The (neural network) training process can also be thought of as involving rewards and punishments. When the system responds correctly to an input, the 'reward' consists of a strengthening of the current ... model weights. This makes it more likely that a similar response will be produced by similar inputs in the future. When the system responds incorrectly, the 'punishment' calls for the adjustment of the model weights based on the particular learning algorithm employed, so that the system will respond differently when it encounters similar inputs again. Desirable actions are thus progressively reinforced, while undesirable actions are progressively inhibited." (Hawley, Johnson, and Raina [16]).
Panel C. Cascor Decision Regions With Many Hidden Nodes

[Figure panel: decision space plotted as Ratio 1 versus Ratio 2, divided into many regions with complex boundaries.]

back propagation neural networks by Hecht-Nielsen in the late 1980s (Caudill [7]) and for Cascor neural networks by Lacher in the early 1990s (Lacher [19])), the network will always eventually figure out how to make perfect mappings of the data on which the NN is being trained.8

Initially, the network is organized into two layers of nodes. The input layer presents the variety of patterns. In our research experiments, for example, each pattern represents a different firm's set of five financial ratios for a given year of operation, i.e., five input nodes. The output layer is a one-node response (either -1 for distressed or +1 for healthy) associated with a given input pattern. It is the association of pattern and response that the network attempts to learn through its internal nodes. The actual number of hidden nodes and the connection weights are determined by the neural net.

As the training process begins, the network is furnished with the entire set of training patterns. At this point, the NN has no hidden nodes and seeks to determine a set of connection weights which, when combined with the values of their respective input nodes,9 provide a linear mapping to the associated output node. The NN works by minimizing the average residual forecast error for the training set. In the initial network with no hidden nodes, the system of input nodes, output nodes, and connection weights is precisely equivalent to the system of coefficients and independent variables in a discriminant analysis (Webb and Lowe [36]), i.e., the solution forms a linear frontier as in Panel A of Exhibit 2, and the classification accuracy of the NN and the MDA are identical.

Once the network has looked at all of the patterns in the training sample (one pass through the entire set of training examples is called an "epoch"), the network starts over again and cycles through the set.
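The initial no-hidden-node stage can be sketched as a single linear output node trained over repeated epochs to reduce average squared forecast error. This is a minimal illustration under stated assumptions: the four training patterns are hypothetical two-ratio vectors with -1/+1 targets, and the learning rate and epoch count are arbitrary; the paper's software (NeuralWorks Professional II/PLUS) is not being reproduced here.

```python
# Hypothetical patterns: (ratio vector, target), target +1 = healthy, -1 = distressed.
patterns = [([0.5, 0.4], 1), ([0.6, -0.3], 1), ([-0.5, 0.2], -1), ([-0.6, -0.4], -1)]
w = [0.0, 0.0]
bias = 0.0
rate = 0.1

for epoch in range(200):                  # one pass over all patterns = one "epoch"
    for x, target in patterns:
        out = bias + sum(wi * xi for wi, xi in zip(w, x))
        err = out - target                # residual forecast error on this pattern
        bias -= rate * err                # steepest-descent weight updates
        w = [wi - rate * err * xi for wi, xi in zip(w, x)]

# Classify by the sign of the linear score, as an MDA-like frontier would.
correct = sum((1 if bias + sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1) == t
              for x, t in patterns)
```

Because these hypothetical patterns are linearly separable, the trained linear node classifies all four correctly; when no line exists, this is the point at which Cascor would begin installing hidden nodes.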
When further adjustment of the weights produces no significant error reduction after a certain number of training cycles (set by the user10), training of the connection weights stops. The network runs one more time over the training set to measure the error. If the user is satisfied with the network's performance,11
8A network exists which can produce a mapping between inputs and outputs consistent with any underlying functional relationship. In addition, the inputs need not be real values (Lorentz [21]).

9See Appendix A for a numerical example of how the connection weights and input values are combined to form a forecast of the output.

10The user sets parameters for the error reduction threshold and the number of training cycles.

11Performance satisfaction is a judgment by the user.

processing stops. If the residual forecast error is unacceptable, Cascor starts creating hidden nodes one at a time.

To create a new hidden node, Cascor starts with a "candidate node" not yet permanently connected to the network. The candidate node receives trainable input connections from all of the nodes in the network's input layer and from all preexisting hidden nodes. Cascor runs several passes over the training patterns, adjusting the candidate node's input weights, i.e., connections, after each pass. The goal of the adjustment is to maximize the magnitude of the correlation between the candidate node's output and the residual forecast error that the network is trying to eliminate. Thus, the new hidden node is intended to cancel a portion of the forecast error when the node is installed permanently into the network. The new node's input weights are frozen, and all of the output weights are once again trained to the error remaining at the output layer, using a predetermined training, i.e., learning, algorithm for adjusting weights.

This cycle of adding hidden nodes one by one repeats until a steady state is reached; that is, the process continues either until the model can fit all of the sample patterns, until further addition of hidden nodes no longer reduces forecast error, or until the forecast error is within a tolerable range determined by the user. Thus, "training" is the process of creating and installing new hidden nodes until the residual forecast error is eliminated or tolerable. To implement the Cascor neural network, the software NeuralWorks Professional II/PLUS (Neural Computing [24]) was used. For a listing of the specific C computer code, see Crowder [10].
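The quantity a candidate node is trained to maximize, the magnitude of the correlation between its output and the residual error, can be sketched as follows. This is an illustrative single-output simplification (the covariance-magnitude form of the Fahlman and Lebiere score); the residual errors and candidate outputs below are hypothetical.

```python
def candidate_score(candidate_outputs, residual_errors):
    """Magnitude of the covariance between a candidate node's output and
    the network's residual forecast error over the training patterns.
    Cascor keeps the candidate whose trained score is largest."""
    n = len(candidate_outputs)
    v_bar = sum(candidate_outputs) / n
    e_bar = sum(residual_errors) / n
    return abs(sum((v - v_bar) * (e - e_bar)
                   for v, e in zip(candidate_outputs, residual_errors)))

# Hypothetical residual errors over four training patterns:
errors = [0.8, -0.6, 0.7, -0.9]
cand_a = [0.9, -0.8, 0.8, -0.9]   # tracks the error closely, so a high score
cand_b = [0.1, 0.2, 0.1, 0.2]     # nearly constant, so a low score
```

A candidate whose output moves with the residual error (like `cand_a`) can cancel a portion of that error once installed, which is why the score, not the error itself, drives candidate training.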
Although a detailed explanation of the mathematics of Cascor's learning process is beyond the scope of this paper, a general, somewhat intuitive explanation is certainly desirable. When training, the network is presented with one observation from the data at a time. The network generates an output value based on that data (in our case, makes a classification decision) and compares this output to the true, correct output value. The difference between the output of the network and the correct output is the error. In essence, the training set presents a hyperdimensional surface to the network, which is the prediction error. The neural network's task in training is to seek out the global minimum of this error function. By using the input data to calculate an output value, and comparing this to the true value, the network generates a point on the surface of the error function. The algorithm adjusts itself by moving in the direction of steepest descent, which is the negative of the gradient at that point. This adjustment is accomplished


by updating the coefficients in the function generating the output (which involves gradient ascent and the chain rule in calculus). Those familiar with estimation of nonlinear econometric models will see that their optimization is similar to the optimization described here.12

12This is an intuitive representation of the Cascor training process, adapted from Fahlman and Lebiere [13]. For a mathematical presentation of the process, see Fahlman and Lebiere [13] and Lacher [19]. For further details on neural networks in general, see Minsky and Papert [23], Rumelhart and McClelland [28], Lippman [20], Wasserman and Schwartz [34], [35], Knight [18], and Lacher [19].

II. Research Design

A. Models

Our research builds and tests four Cascor models for predicting financial distress. The models are trained on historic financial data for many firms, some of which were cited by their auditors as being likely candidates for financial distress and the remainder being free from such designation. The four models represent four different lead times, i.e., the year for which distressful conditions in a firm are reported by auditors, and the one, two, and three years prior.

The financial information we chose to describe each firm is the set of five ratios from Altman's Z score model (Altman [1], [2]). We also built four MDA models, again based on Altman's Z score ratios and using the same data as the Cascor models, to serve as benchmarks to compare with our neural network approach to the pattern-classification of healthy and distressed firms. We use Altman's ratios throughout because Altman's findings have been the most widely and consistently referenced and used to date by both researchers and practitioners.

The five ratios chosen by Altman to explain business viability are:

Z = f(x1, x2, x3, x4, x5),    (1)

where

x1 = Working capital/Total assets;
x2 = Retained earnings/Total assets;
x3 = Earnings before interest and taxes/Total assets;
x4 = Market value of equity/Book value of total debt;
x5 = Sales/Total assets; and
Z = Overall index (known as Altman's Z score).

B. Criterion for Distress

To identify financially troubled firms, we used auditors' reports rather than the traditional bankruptcy filings, because auditors' reports offer an earlier warning of "developing" distressful conditions (Altman and McGough [4]). An underlying assumption of financial statements examined by auditors is that the firm has the ability and intent to continue as a viable operating entity, i.e., as a going-concern, for the indefinite future. If facts uncovered during an audit cast substantial doubt on the firm's ability to continue as a going-concern (for one year following the date of the financial statements), the firm's independent public accountants must consider modifying their report which accompanies the financial statements. During the period our study covers, this modification generally took the form of a disclaimer report, informally referred to as a "going-concern opinion."13

Our decision to use the auditors' report rather than a bankruptcy filing as our indicator of financial distress was based on a desire to capture the "practical" relevance of the predicted event. Bankruptcy of a firm may occur after a prolonged period of financial distress. At this point, there is little practical use for a predictive algorithm since the distressed nature of the firm is obvious to virtually all of the firm's stakeholders, i.e., shareholders, employees, vendors, etc. On the other hand, the issuance of a going-concern opinion is expected to precede bankruptcy, perhaps quite substantially, considering the sequences of events which may lead a firm from financial viability into bankruptcy. Another reason that issuance of a going-concern opinion is preferable to a bankruptcy filing as the predicted event is that bankruptcy is only one outcome of financial distress. Others include reorganization, liquidation, and acquisition by a viable firm. Regardless of the eventual outcome, losses and downside risks preceding the final resolution are likely to be incurred by stakeholders, considering pre-distress asset values and risk. Bad audit reports can cause bond ratings to be lowered, lines of credit to dry up, and other business relationships to be disrupted. Thus, use of the disclaimer report as the prediction criterion covers a broader range of events than does a bankruptcy filing and should have more relevance to decision-makers.

13Issuance of a going-concern opinion does not mean that the auditors predict failure within the coming year. Nor does the issuance of an unqualified opinion (sometimes referred to as a "clean" opinion) represent any guarantee that the firm will not fail during the coming year. A clean opinion simply means that nothing came to the auditors' attention during the course of their work which brought the going-concern assumption into question, whereas a going-concern opinion indicates the finding of facts which conflict with that assumption.
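The five Altman ratios that serve as inputs to both the NN and MDA models can be computed directly from financial statement items. The sketch below is illustrative: the function name, argument names, and the sample figures are hypothetical, but each ratio matches the definitions of x1 through x5 given in Section II.A.

```python
def altman_inputs(wc, re, ebit, mve, debt_bv, sales, ta):
    """The five Altman ratios used as model inputs. Arguments (hypothetical
    names): working capital, retained earnings, EBIT, market value of equity,
    book value of total debt, sales, and total assets."""
    return {
        "x1": wc / ta,        # working capital / total assets
        "x2": re / ta,        # retained earnings / total assets
        "x3": ebit / ta,      # EBIT / total assets
        "x4": mve / debt_bv,  # market value of equity / book value of total debt
        "x5": sales / ta,     # sales / total assets
    }

# Illustrative figures (in, say, millions of dollars):
ratios = altman_inputs(wc=50, re=120, ebit=30, mve=400, debt_bv=200, sales=900, ta=600)
```

The resulting five-element vector is exactly one input pattern for the networks described in Section I: one input node per ratio.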

C. Data
The data were collected from the Standard & Poor's COMPUSTAT financial database covering the period 1970-1989. Two groups of firms were drawn: a group identified as financially distressed companies and a group identified as financially viable companies, named the "distressed group" and the "viable group," respectively. The distressed group was culled from the Industrial Research File, which only lists firms that have ceased operations. To be in the distressed group, a firm must have received a going-concern opinion and passed a screening process which posed certain "sanity checks" designed to weed out erroneous data and make sure that all of the information necessary to calculate the five ratios for Altman's model for the year of the going-concern opinion and the three years prior thereto was available.14 There were 94 suitable distressed firms after the screening process.15

To serve as counterexamples to the distressed firms, a group of 188 viable firms (two viable firms for every one distressed firm) was chosen randomly at large from the Full Coverage File. These firms were matched 2-to-1 to the distressed group only with regard to each distressed firm's year of going-concern opinion. To be considered viable, a firm must not have received a going-concern opinion for the year of match nor for any of the previous three years. And to ensure that firms on the brink of distress were not selected as viable, any firm which received a going-concern opinion within two years after the going-concern opinion for its match in the distressed group was rejected from consideration.16 The viable group also had
14Screening involved checking for blank or zero fields and testing for ratio values that were so unreasonably large or small as to suggest an error in the COMPUSTAT database. Also, pre-1980 data were adjusted to capitalize financial leases (Frydman et al [14, p. 279]).

15These 94 distressed firms were reported as ceasing operations for the following reasons: acquisition or merger (15), bankruptcy (13), liquidation (7), reverse acquisition (1), privatization (6), and other (52).

16The practice of removing unusual and potentially misleading observations from the training data may seem improper to readers accustomed to performing statistical analysis and hypotheses testing. Readers must keep in mind, however, that the Cascor algorithm is not a statistical tool. One does not make statements of probability based on the output of the network. It may help to think of the Cascor approach (and indeed most other neural network architectures) as being more related to polynomial interpolation than to a statistical model such as linear regression. In fact, one of the uses of neural networks is as a sophisticated approach to fitting a function to a set of ill-behaved, multidimensional data. The fitted function can then be analyzed for its properties.

to pass the same screening process as the distressed group. The viable firms were drawn entirely from the manufacturing sector, but data relating to distressed firms in our database run the gamut of industries and services. We felt that the number of distressed firms (51) available from COMPUSTAT's manufacturing sector alone was insufficient to adequately train and test the neural network,17 so 46% of the distressed group comes from outside the manufacturing sector. This does not bias the comparison of NN versus MDA since the methods use identical data.

After the sample of firms was selected, data from the two sample groups were randomized and recombined to form eight nonoverlapping sets. There were two sets for each of four years. Each set contained one year of data for 47 distressed firms and 94 viable firms. The data represent the year of the going-concern opinion (y0), the year before the going-concern opinion (y-1), two years before the going-concern opinion (y-2), and three years before the going-concern opinion (y-3). Half of the randomly selected sets were used to train the NN to recognize patterns that explained auditors' going-concern opinions. The remaining sets were treated as holdout samples for testing the networks' predictive ability. MDA models for each of the four years were built using the same data on which the NN trained, and were tested on the identical holdout samples used by the NN.

Exhibit 3 gives a summary table of univariate statistics describing the data. There are distinct differences between the distributions of the ratios for the distressed and viable firms. With respect to the distressed firms, there is a marked deterioration in the means and medians for ratios x1, x2, and x3 as the year of the going-concern opinion (y0) approaches. The standard deviations of ratios x1, x2, and x3 consistently increase over time, and there is also a clear tendency for these ratios to become increasingly negatively skewed with time. Changes in the ratios x4 and x5 over time are not as consistent or striking.
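The randomized split into training and holdout halves can be sketched as below. This is a hypothetical reconstruction, not the authors' code: firm identifiers and the seed are invented, and only the set sizes (47 distressed and 94 viable firms per set, drawn from 94 and 188 total) come from the text.

```python
import random

def build_sets(distressed_ids, viable_ids, seed=0):
    """Randomly split the distressed and viable firms into a training half
    and a holdout half, each with 47 distressed and 94 viable firms."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    d = distressed_ids[:]
    v = viable_ids[:]
    rng.shuffle(d)
    rng.shuffle(v)
    train = d[:47] + v[:94]
    test = d[47:] + v[94:]
    return train, test

# Hypothetical firm identifiers standing in for the COMPUSTAT sample:
train, test = build_sets([f"D{i}" for i in range(94)], [f"V{i}" for i in range(188)])
```

Repeating such a split for each of the four lead-time years yields the eight nonoverlapping sets described above, with the MDA benchmarks fit and tested on the identical partitions.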
Comparing distressed firms in the training sample to those in the testing sample reveals that there are some differences in the two sets of statistics, but these are simply due to the random assignment of the distressed firms to the two samples. In contrast, statistics for the viable firms do not exhibit any significant changes across time. The ratio x4 is heavily skewed to the right, much more so than it is for the
17A small number of training observations leads to overfitting. In this situation, the NN learns to recognize individual cases rather than generalizing. The experiences of previous NN categorization studies indicate that the training sample size of any given category should not be less than 30. For examples, see Odom and Sharda [25] and Utans and Moody [33].


COATS & FANT / RECOGNIZING FINANCIAL DISTRESS PATTERNS    149

Exhibit 3. Univariate Statistics for Ratios Used to Distinguish Distressed and Viable Firms

Panel A. Distressed Firms

                       Training Set Ratios                               Test Set Ratios
Year               x1       x2       x3       x4       x5       x1       x2       x3       x4       x5
y0   Mean        -0.72    -2.30    -0.56     1.97     2.09    -0.39    -1.99    -0.30     1.60     1.62
     Std. dev.    1.48     4.39     0.84     4.47     1.88     0.73     3.93     0.61     2.29     1.34
     Maximum      0.58     0.38     0.36    28.53     9.10     0.64     0.44     0.51    12.22     5.55
     Median      -0.45    -0.66    -0.25     0.46     1.67    -0.19    -0.57    -0.15     0.80     1.30
     Minimum     -9.47   -22.64    -4.55     0.01     0.02    -3.16   -22.42    -2.64     0.03     0.07
y-1  Mean        -0.04    -0.98    -0.26     1.74     1.46    -0.04    -1.27    -0.17     1.92     1.58
     Std. dev.    0.50     2.84     0.96     3.95     0.90     0.50     3.16     0.44     4.74     1.13
     Maximum      0.63     0.40     0.17    25.70     4.45     0.63     0.50     0.37    32.39     5.30
     Median       0.06    -0.17    -0.02     0.42     1.40     0.05    -0.27    -0.02     0.64     1.54
     Minimum     -2.00   -18.44    -6.21     0.01     0.14    -2.30   -19.46    -1.89     0.01     0.05
y-2  Mean         0.11    -0.40    -0.03     1.51     1.36     0.15    -0.64    -0.12     1.62     1.58
     Std. dev.    0.31     0.93     0.28     3.33     0.72     0.28     1.59     0.40     2.50     1.23
     Maximum      0.81     0.55     0.74    21.52     3.59     0.77     0.49     0.15    15.26     5.77
     Median       0.15    -0.03     0.01     0.38     1.32     0.14    -0.20     0.00     0.88     1.38
     Minimum     -0.64    -4.55    -0.95     0.01     0.24    -1.14    -9.50    -1.88     0.04     0.03
y-3  Mean         0.16    -0.27     0.00     1.36     1.34     0.22    -0.38    -0.05     1.87     1.57
     Std. dev.    0.30     0.68     0.24     3.07     0.63     0.26     0.89     0.28     3.96     1.20
     Maximum      0.85     0.54     0.48    18.47     2.55     0.68     0.51     0.79    26.19     5.78
     Median       0.17     0.03     0.05     0.43     1.26     0.22    -0.08     0.01     0.61     1.42
     Minimum     -0.73    -2.01    -0.71     0.01     0.16    -0.65    -3.80    -0.93     0.03     0.07

Panel B. Viable Firms

                       Training Set Ratios                               Test Set Ratios
Year               x1       x2       x3       x4       x5       x1       x2       x3       x4       x5
y0   Mean         0.34     0.43     0.16    15.83     1.48     0.35     0.39     0.17     9.91     1.42
     Std. dev.    0.16     0.17     0.08    34.75     0.48     0.17     0.19     0.08    13.46     0.42
     Maximum      0.73     0.81     0.45   290.75     3.94     0.69     0.85     0.63    61.13     2.76
     Median       0.36     0.43     0.15     4.97     1.44     0.36     0.38     0.17     4.18     1.38
     Minimum      0.02    -0.02     0.00     0.15     0.45     0.01    -0.37     0.00     0.02     0.57
y-1  Mean         0.35     0.43     0.19    13.12     1.50     0.35     0.38     0.17     9.68     1.44
     Std. dev.    0.16     0.16     0.14    20.62     0.46     0.17     0.18     0.07    13.09     0.44
     Maximum      0.74     0.83     1.30   111.54     3.88     0.70     0.85     0.34    58.97     2.74
     Median       0.37     0.42     0.17     4.63     1.48     0.37     0.36     0.16     4.08     1.40
     Minimum     -0.02     0.02     0.03     0.21     0.53     0.00    -0.22     0.01     0.01     0.58
y-2  Mean         0.36     0.43     0.17    12.32     1.52     0.35     0.37     0.17     8.76     1.41
     Std. dev.    0.16     0.16     0.08    20.03     0.44     0.16     0.19     0.08    13.08     0.42
     Maximum      0.75     0.81     0.46   133.44     3.25     0.70     0.84     0.47    62.40     2.48
     Median       0.37     0.41     0.17     3.90     1.51     0.37     0.36     0.16     3.37     1.39
     Minimum      0.01    -0.15     0.00     0.19     0.53     0.01    -0.28     0.02     0.01     0.34
y-3  Mean         0.36     0.41     0.17    11.04     1.53     0.35     0.34     0.17     9.06     1.45
     Std. dev.    0.16     0.16     0.08    18.02     0.43     0.15     0.26     0.09    15.33     0.42
     Maximum      0.71     0.78     0.48    96.09     2.98     0.71     0.81     0.61    70.24     2.70
     Median       0.37     0.40     0.17     3.80     1.55     0.37     0.36     0.16     2.96     1.36
     Minimum     -0.07    -0.07     0.01     0.11     0.51     0.04    -1.28     0.05     0.02     0.42


Exhibit 4. Coefficients for MDA Models

Year      x1        x2        x3        x4        x5        m^a
y0      -0.733    -0.198    -2.259    -0.018    -0.154     0.35
y-1     -5.168    -0.849     2.594    -0.057    -0.409    -1.71
y-2     -5.752    -1.958    -3.396    -0.061    -0.540    -2.84
y-3     -4.056    -2.884    -2.768    -0.058    -0.292    -2.26

Note: aThis statistic represents the estimated midpoint between the two populations (distressed and viable). The discriminant function is: y = Xδ, where X is a 1 by 5 vector of the ratios x1 through x5, and δ is the 5 by 1 vector of estimated coefficients. The allocation rule was to classify firm i as failed if yi ≥ m.
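The allocation rule of Exhibit 4 can be sketched in a few lines; a minimal illustration in Python using the year-y0 coefficients and midpoint (the two ratio vectors below are hypothetical, chosen near the group means of Exhibit 3 rather than taken from the paper's sample):

```python
# Estimated MDA coefficients (delta) and midpoint m for year y0, from Exhibit 4.
DELTA = [-0.733, -0.198, -2.259, -0.018, -0.154]
M = 0.35

def classify(ratios):
    """Discriminant score y = X * delta; allocate the firm to 'distressed' if y >= m."""
    y = sum(x * d for x, d in zip(ratios, DELTA))
    return ("distressed" if y >= M else "viable"), y

# Hypothetical ratio vectors x1..x5 near the viable and distressed group means:
label_v, score_v = classify([0.34, 0.43, 0.16, 15.83, 1.48])
label_d, score_d = classify([-0.72, -2.30, -0.56, 1.97, 2.09])
print(label_v, round(score_v, 3))  # viable -1.209
print(label_d, round(score_d, 3))  # distressed 1.891
```

Note how the uniformly negative coefficients make deteriorating (more negative) ratios push the score upward, past the midpoint m.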

distressed firms. For the viable firms, this is the only ratio which displays much asymmetry. Comparing the statistics for the distressed firms to those for the viable firms, it can be seen that the mean and median of x1 through x4 are consistently smaller for the distressed firms than for the viable firms. In addition, the standard deviations for all ratios except x4 are consistently larger for the distressed firms than for the viable firms.

III. Prediction Results

A. Training Results


Using the training sets, the Cascor software determined a unique NN for each year. Each NN was trained until it delivered 100% classification accuracy for its year. In other words, each trained network eventually correctly identified all 47 of the distressed firms as being distressed, and all 94 of the healthy firms as being healthy. Some networks required up to 1,400 training cycles and installed as many as eight hidden nodes.

Statistics for the MDA models for each of the four years appear in Exhibit 4. The table shows, by year, the coefficients for each ratio and the midpoint score used for predicting viability or distress. We used the software Mathcad (Version 3.0) to implement Fisher's linear discriminant.

B. Test Results and Implications

Holdout samples were used to test the robustness of the neural networks for prediction. We collected statistics on the percentages of accurate forecasts made by the trained networks. The MDA models were also tested on the holdout samples and the results were used as benchmarks to assess the performance of the neural networks.

Panels A and B of Exhibit 5 compare the classification test results of Cascor and MDA. A "type I hit" is one in which a distressed firm is correctly classified and a "type II hit" is the correct classification of a viable firm. Conversely, a "type I error" is one in which a distressed firm is misclassified by the predictor as viable and a "type II error" is the misclassification of a viable firm as distressed. An "overall hit" refers to the total correct classifications for the set, regardless of type. Rate, in each case, is the ratio of the number of hits for a given classification over the total number of actual patterns in that classification.

We note that MDA has good success with type II hits, although not with type I hits. Type II hit rates for MDA were all above 90% over the four-year test horizon; type I hit rates for MDA range between 63.8% and 70.2%. On the other hand, Cascor achieved quite high scores for type II hits while, in addition, showing consistently high type I hit rates, consistently high overall hit rates, and robustness in terms of a longer effective lead time for all predictions.

Tests were performed to determine if the two population proportions were equal. Specifically, the null hypothesis was that the proportion of hits, i.e., accuracy, for the MDA model of a given year is greater than or equal to the proportion of hits for the neural network model of the same year:

    H0: pMDA ≥ pNN                                          (2)

where p is the percentage of hits for either MDA or the Cascor neural network (NN). The following test statistic (which is normally distributed) is appropriate:

    z = (p̂NN - p̂MDA) / √[(p̂NN·q̂NN + p̂MDA·q̂MDA) / n]         (3)

where p̂ refers to the sample percentage of hits, q̂ refers to the sample percentage of misses (thus p̂ + q̂ = 1), and n is the number of firms. The results appear in Exhibit 6. From the p-values of this one-tailed test, it can be seen that the null hypothesis is rejected (at the five percent level of significance) for all four tests concerning the group of distressed firms. That is, the neural network outpredicts MDA when the firm in question has or will receive a
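As an arithmetic check, the test statistic can be reproduced directly from the reported hit rates; a minimal Python sketch using the year-y0 distressed-firm figures from Exhibit 6 (Cascor 89.4%, MDA 63.8%, n = 47):

```python
import math

def z_stat(p_nn, p_mda, n):
    """Normal test statistic for H0: p_MDA >= p_NN, per Equation (3)."""
    q_nn, q_mda = 1.0 - p_nn, 1.0 - p_mda
    se = math.sqrt((p_nn * q_nn + p_mda * q_mda) / n)
    return (p_nn - p_mda) / se

# Year y0, distressed firms: Cascor 89.4% vs. MDA 63.8%, n = 47
z = z_stat(0.894, 0.638, 47)
print(round(z, 2))  # 3.08, matching Exhibit 6
```

The same function reproduces the viable-firm statistics as well, e.g. z_stat(0.979, 1.0, 94) gives roughly -1.42 for year y0.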


Exhibit 5. Classification Hit Rates and Error Rates for Cascor vs. MDA Test Results

Panel A. Classification Hit Rates for Cascor vs. MDA Test Results

                   Cascor                            MDA
Year     Type I   Type II   Overall     Type I   Type II   Overall
y0        89.4     97.9      95.0        63.8    100.0      87.9
y-1       83.0     97.9      92.9        68.1     90.4      83.0
y-2       89.4     83.0      86.2        70.2     90.4      83.7
y-3       80.9     83.0      81.9        66.0     92.6      83.7

Panel B. Classification Error Rates for Cascor vs. MDA Test Results

                   Cascor                            MDA
Year     Type I   Type II   Overall     Type I   Type II   Overall
y0        10.6      2.1       5.0        36.2      0.0      12.1
y-1       17.0      2.1       7.1        31.9      9.6      17.0
y-2       10.6     17.0      13.8        29.8      9.6      16.3
y-3       19.1     17.0      18.1        34.0      7.4      16.3

Note: Numbers are expressed as percentages.

Exhibit 6. Tests for Differences Between Proportions

H0: pMDA ≥ pNN versus Ha: pMDA < pNN

Percentage of Hits for Distressed Firms
Year     n     Neural Network (Cascor)     MDA       z       p-Value
y0      47             89.4                63.8     3.08*     0.001
y-1     47             83.0                68.1     1.71*     0.044
y-2     47             89.4                70.2     2.39*     0.008
y-3     47             80.9                66.0     1.66*     0.049

Percentage of Hits for Viable Firms
Year     n     Neural Network (Cascor)     MDA       z       p-Value
y0      94             97.9               100.0    -1.42      0.922
y-1     94             97.9                90.4     2.22*     0.013
y-2     94             83.0                90.4    -1.50      0.934
y-3     94             83.0                92.6    -2.03      0.979

Notes: p is the percentage of hits for either multiple discriminant analysis (MDA) or the Cascor neural network (NN).
*Denotes significance at the 5% level.

going-concern opinion. As regards classification of financially healthy firms, the null hypothesis is rejected only once out of four tests. Therefore, we do not claim any better prediction ability for neural networks over MDA for the group of viable firms alone.

What do these results imply about the relative contribution of neural networks versus MDA? If the user is simply

concerned with making a correct classification, then either tool suffices. But, since being wrong is not a problem when the cost is small, decision-makers must also consider the magnitude of costs and benefits of their decisions. Once the relative costs of misclassification for each group are considered, the results of our analysis indicate that Cascor makes a notable contribution over MDA as a practitioner's


decision-making tool. We expect that the costs of misclassifying a viable firm as distressed (type II error) are typically small. Consider a financial analyst comparing potential investments or a bank loan officer reviewing loan applications. In an environment where there is a reasonably large number of viable firms from which to choose, there is little cost to misclassifying a viable firm. There are many other firms with which to transact. On the other hand, the cost of misclassifying a distressed firm (type I error) can be substantial. To be more specific, the decision-making framework we envision as most appropriate for Cascor is a form of Bayesian hypothesis testing,18 where costs of errors and probabilities of population membership both explicitly enter into the final classification decision.
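A decision framework in which error costs and membership probabilities both enter the classification can be made concrete with a stylized sketch; the costs and probabilities below are entirely hypothetical and only illustrate the expected-cost logic (see Zellner [40] for the formal Bayesian treatment):

```python
def expected_cost_decision(p_distressed, cost_type1, cost_type2):
    """Classify a firm to minimize expected misclassification cost.

    Labeling the firm 'viable' risks a type I error (missing a distressed firm);
    labeling it 'distressed' risks a type II error (rejecting a viable firm).
    """
    cost_if_called_viable = p_distressed * cost_type1
    cost_if_called_distressed = (1.0 - p_distressed) * cost_type2
    return "distressed" if cost_if_called_distressed <= cost_if_called_viable else "viable"

# Hypothetical: with a type I error 10x as costly as a type II error,
# even a 20% distress probability warrants treating the firm as distressed.
print(expected_cost_decision(0.20, cost_type1=10.0, cost_type2=1.0))  # distressed
```

With asymmetric costs like these, a classifier with strong type I hit rates (such as Cascor's) is the more valuable input to the decision.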

IV. Concluding Comments

The spirit of this research is discovery. We compared the results of the established MDA approach, which makes a priori assumptions about the discriminating variables, against a new, more robust neural network approach. Our objective was to showcase the advantages of the NN for recognizing complex patterns in data.

Test results suggest that the NN approach is more effective than MDA for pattern classification. The MDA model produces excellent results for the year of the going-concern opinion. However, Cascor does better by comparison in the earlier years' classifications, sustaining reliability even as we move away from y0. Although the MDA method produces high type II hit rates through y-3, the type I hit rates are notably less reliable. We believe that Cascor's robust prediction of distress, in the shape of type I hit rates consistently above 80% over a four-year effective lead time, is worth pursuing.

Our plans for further studies in this area include using quarterly data in place of annual data and using several time periods in conjunction to look for time series patterns in the magnitude of changes (or combinations of changes) in ratios. Also, the ratios we used in our current tests were based on Altman's 1968 bankruptcy study, and different ratios may perform better today or may be more appropriate for going-concern opinions. Finally, we believe that Cascor's already high hit rates may be improved by further experimentation with the parameters, such as the number of input and output nodes, the training algorithm, and the activation function.19 The choice of activation function is of particular interest because it appears to be the key to enabling Cascor networks to generate precise analog forecasts rather than only binary classifications (Fahlman and Lebiere [13]). This capability is experimental at present, but has the potential for providing a forecast of gradations (or degrees or categories) of financial health, rather than just the bipolar choices of viable or distressed.

18See Zellner [40, Ch. 10] for a review of this procedure.
19See Appendix A concerning the role of the activation function.

References

1. E.I. Altman, "Financial Ratios, Discriminant Analysis, and the Prediction of Corporate Bankruptcy," Journal of Finance (September 1968), pp. 589-609.
2. E.I. Altman, Corporate Distress: A Complete Guide to Predicting, Avoiding, and Dealing with Bankruptcy, New York, NY, John Wiley & Sons, 1983.
3. E.I. Altman and R.A. Eisenbeis, "Financial Applications of Discriminant Analysis: A Clarification," Journal of Financial and Quantitative Analysis (March 1978), pp. 185-195.
4. E.I. Altman and T.P. McGough, "Evaluation of a Company as a Going Concern," Journal of Accountancy (December 1974), pp. 50-57.
5. R. Barniv and A. Raveh, "Identifying Financial Distress: A New Nonparametric Approach," Journal of Business Finance and Accounting (Summer 1989), pp. 361-383.
6. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Belmont, Wadsworth, 1984.
7. M. Caudill, "Neural Networks Primer, Part III," AI Expert (June 1988), pp. 53-59.
8. M. Caudill, "Neural Networks Primer," AI Expert: The Magazine of Artificial Intelligence in Practice, Miller Freeman Publications, 1989.
9. G.W. Cottrell, A. Munro, and D. Zipser, "Learning Internal Representations from Gray-Scale Images: An Example of Extensional Programming," Proceedings of the 9th Annual Conference of the Cognitive Science Society, 1987, pp. 461-473.
10. R.S. Crowder, "Software Implementing the Cascade-Correlation Learning Algorithm in C," Carnegie Mellon University, June 1990.
11. R.A. Eisenbeis, "Pitfalls in the Application of Discriminant Analysis in Business, Finance and Economics," Journal of Finance (June 1977), pp. 875-900.
12. P.J. Elmer and D.M. Borowski, "An Expert System Approach to Financial Analysis: The Case of S&L Bankruptcy," Financial Management (Autumn 1988), pp. 66-76.
13. S.E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, Technical Report CMU-CS-90-100, Carnegie Mellon University, February 1990.
14. H. Frydman, E.I. Altman, and D. Kao, "Introducing Recursive Partitioning for Financial Classification: The Case of Financial Distress," Journal of Finance (March 1985), pp. 269-291.
15. E. Goss, A. Whitten, and V. Sundaraiyer, "Forecasting with Lotus-based Logit Regression Models," Journal of Business Forecasting (Spring 1991), pp. 19-22.
16. D.D. Hawley, J.D. Johnson, and D. Raina, "Artificial Neural Systems: A New Tool for Financial Decision-Making," Financial Analysts Journal (November/December 1990), pp. 63-72.
17. G.V. Karels and A. Prakash, "Multivariate Normality and Forecasting of Business Bankruptcy," Journal of Business Finance and Accounting (Winter 1987), pp. 573-593.
18. K. Knight, "Connectionist Ideas and Algorithms," Communications of the ACM (November 1990), pp. 59-74.

19. R.C. Lacher, "Artificial Neural Networks: An Introduction to the Theory and Practice," Monograph in progress, Department of Computer Science, Florida State University, 1992.
20. R.P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine (April 1987), pp. 4-22.
21. G.G. Lorentz, "The 13th Problem of Hilbert," in Mathematical Developments Arising from Hilbert Problems, F.E. Browder (ed.), Providence, RI, American Mathematical Society, 1976.
22. D. McFadden, "A Comment on Discriminant Analysis Versus Logit Analysis," Annals of Economic and Social Measurement (1976), pp. 511-523.
23. M. Minsky and S. Papert, Perceptrons, Cambridge, MA, MIT Press, 1969.
24. Neural Computing, Users manual to accompany NeuralWorks Professional II/PLUS neural network software, Pittsburgh, PA, NeuralWare, Inc., 1990.
25. M.D. Odom and R. Sharda, "A Neural Network Model for Bankruptcy Prediction," in Proceedings of the International Joint Conference on Neural Networks, 1990.
26. J.A. Ohlson, "Financial Ratios and the Probabilistic Prediction of Bankruptcy," Journal of Accounting Research (Spring 1980), pp. 109-131.
27. G.E. Pinches, "Factors Influencing Classification Results from Multiple Discriminant Analysis," Journal of Business Research (December 1980), pp. 429-456.
28. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Cambridge, MA and London, England, MIT Press, 1986.
29. E. Scott, "On the Financial Applications of Discriminant Analysis: Comment," Journal of Financial and Quantitative Analysis (March 1978), pp. 201-205.
30. M. Selfridge and F. Biggs, "The Architecture of Expertise: The Auditor's Going-Concern Judgment," Expert Systems Review (Vol. 2, No. 3, 1990), pp. 3-18.
31. J.N. Sheth, "How to Get the Most Out of Multivariate Methods," in Multivariate Data Analysis, Hair, Anderson, Tatham, and Grablowsky (eds.), Tulsa, OK, Petroleum Publishing Company, 1979, Ch. 1.
32. J.O. Tollefson and O.M. Joy, "Some Clarifying Comments on Discriminant Analysis," Journal of Financial and Quantitative Analysis (March 1978), pp. 197-200.
33. J. Utans and J. Moody, "Selecting Neural Network Architectures Via the Prediction Risk: Application to Corporate Bond Rating Prediction," IEEE (June 1991), pp. 35-41.
34. P.D. Wasserman and T. Schwartz, "Neural Networks, Part 1," IEEE Expert (Winter 1987), pp. 10-12.
35. P.D. Wasserman and T. Schwartz, "Neural Networks, Part 2," IEEE Expert (Spring 1988), pp. 10-15.
36. A.R. Webb and D. Lowe, "The Optimized Internal Representation of Multilayer Classifier Networks Performs Nonlinear Discriminant Analysis," Neural Networks (Vol. 3, 1990), pp. 367-375.
37. P.J. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences," Ph.D. Dissertation, Harvard University, 1974.
38. R.J. Williams, Learning Internal Representations by Error Propagation, Institute for Cognitive Science Report 8506, San Diego, University of California, 1985.
39. C.V. Zavgren, "The Prediction of Corporate Failure: The State of the Art," Journal of Accounting Literature (Spring 1983), pp. 1-38.
40. A. Zellner, An Introduction to Bayesian Inference in Econometrics, New York, NY, John Wiley & Sons, 1971.
41. M.E. Zmijewski, "Methodological Issues Related to the Estimation of Financial Distress Prediction Models," Journal of Accounting Research (Supplement 1984), pp. 59-82.

Appendix A. Example of Cascor Prediction Computations


Each hidden node is an information processor having n inputs, n input connection weights, and a single output. Each node forms a weighted sum of its n inputs, and each weighted sum is transformed by a nonlinear activation function to compute the output. In other words, the output is a nonlinear function of a scalar product of the input values. This transformation may be as simple as providing an "ON" indication if the sum exceeds a certain threshold, or may be more complicated, such as a sigmoidal, Gaussian, or exponential function. Just as the activation function is not limited to a simple on/off, inputs may be analog, Boolean, or discrete.

Exhibit A-1 is an abbreviated example of corporate financial input data in the format used in our Cascor trials. Each row of the table is one pattern which represents financial data relating to one firm, namely:

* One set of the five ratios for the Altman Z score model; and
* An actual output of -1 for a financially distressed firm and +1 for a viable firm, as determined by the auditors' going-concern opinion.

Exhibit A-2 displays the nodal architecture for predicting the output for Pattern 1 in the example data file. The five ratios and the bias20 node are nodes No. 1 to No. 6. When the hidden node is added, it becomes node No. 7. The output is represented by node No. 8. The computations of the value of the hidden node and the value of the output node are shown in Exhibit A-3. We used the symmetric sigmoidal activation function for computing the value of the hidden node and the value of the output node. The equation is:

    Activation value of node = 1 / (1 + e^(-SUM)) - 0.5

where SUM is the weighted sum of the inputs to the node. Cascor's prediction of an output of +1 for Pattern 1 is correct according to the actual output for this viable firm.

20For Cascor networks, one of the n inputs (known as the "bias") must always be a constant term permanently set to +1 (Fahlman and Lebiere [13]).


Exhibit A-1. Example Data File

                        Financial Ratios
Pattern      x1       x2       x3       x4       x5      Actual Outcome
1           0.56     0.40     0.09     1.48     1.77          +1
2           0.26     0.40     0.11    24.76     1.20          +1
3          -0.52    -0.44    -0.25     0.38     1.35          -1

Notes: +1: viable; -1: distressed.

Exhibit A-2. Cascor Architecture for Pattern 1 of Example Data File

Input Node (Nos. 1-6)     Weight to Hidden Node (No. 7)     Weight to Output Node (No. 8)
(Bias) +1                          -7.497735                         6.148956
x1 = 0.56                          -1.533165                        21.346317
x2 = 0.40                          -2.242788                        -0.170404
x3 = 0.09                         -25.111189                       169.550812
x4 = 1.48                           3.410274                         0.133075
x5 = 1.77                          -0.696259                        -0.361141
Hidden Node (No. 7)                                                  1.409957

Notes: In the original diagram, the weights into the hidden node are frozen connection weights, while the weights into the output node are connection weights which are trained repeatedly. Vertical lines sum all incoming activation.

Exhibit A-3. Cascor Execution Worksheet for Pattern 1 of Example Data File

Computed Activation Value of Hidden Node (No. 7)

Input Nodes        Connection Weights
   +1      x          -7.4977350      =      -7.497735
   0.56    x          -1.5331650      =      -0.858572
   0.40    x          -2.2427880      =      -0.897115
   0.09    x         -25.1111890      =      -2.260007
   1.48    x           3.4102740      =       5.047206
   1.77    x          -0.6962590      =      -1.232378
                                    SUM =    -7.698601

Hidden Node's Activation Value = 1 / (1 + e^(-(-7.698601))) - 0.5 ≈ -0.50

Exhibit A-3. Cascor Execution Worksheet for Pattern 1 of Example Data File - Continued

Computed Activation Value of Output Node (No. 8)

Input Nodes        Connection Weights
   +1      x           6.148956       =       6.148956
   0.56    x          21.346317       =      11.953937
   0.40    x          -0.170404       =      -0.068161
   0.09    x         169.550812       =      15.259572
   1.48    x           0.133075       =       0.196951
   1.77    x          -0.361141       =      -0.639220
  -0.50    x           1.409957       =      -0.704339
                                    SUM =    32.147696

Output Node's Activation Value = 2 x [1 / (1 + e^(-32.147696)) - 0.5] ≈ +1

Note: Just as the output node has a user-determined range of -1 to +1, there is a user-determined range of -0.5 to +0.5 for the hidden node(s). In the example above, 0.5 is subtracted from the hidden node's activation value and the output node's activation value, and the output node's activation value is multiplied by 2. These are simply scale adjustments to keep the values within their prescribed boundaries.
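The worksheet computations above can be replayed in a few lines of Python; a minimal sketch of the forward pass using the Pattern 1 inputs and the connection weights of Exhibit A-2 (the symmetric sigmoid and the x2 output rescaling follow the note above):

```python
import math

def symmetric_sigmoid(s):
    """Activation in (-0.5, +0.5), as used for the Cascor hidden and output nodes."""
    return 1.0 / (1.0 + math.exp(-s)) - 0.5

# Pattern 1 inputs: bias, then ratios x1..x5 (Exhibit A-1)
inputs = [1.0, 0.56, 0.40, 0.09, 1.48, 1.77]
hidden_w = [-7.497735, -1.533165, -2.242788, -25.111189, 3.410274, -0.696259]
output_w = [6.148956, 21.346317, -0.170404, 169.550812, 0.133075, -0.361141]
hidden_to_output_w = 1.409957

# Hidden node: weighted sum of inputs, then symmetric sigmoid
hidden = symmetric_sigmoid(sum(i * w for i, w in zip(inputs, hidden_w)))

# Output node: inputs plus the hidden node's activation feed in;
# multiply by 2 to rescale to the (-1, +1) output range
out_sum = sum(i * w for i, w in zip(inputs, output_w)) + hidden * hidden_to_output_w
prediction = 2.0 * symmetric_sigmoid(out_sum)

print(round(hidden, 2))      # -0.5, as in Exhibit A-3
print(round(prediction, 2))  # 1.0: Cascor classifies Pattern 1 as viable
```

Because the output SUM (about 32.15) sits deep in the saturated region of the sigmoid, the prediction is effectively +1, matching the actual outcome for this viable firm.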
