Artificial Neural Networks: A Tutorial

Anil K. Jain, Michigan State University
Jianchang Mao and K.M. Mohiuddin, IBM Almaden Research Center

These massively parallel systems with large numbers of interconnected simple processors may solve a variety of challenging computational problems. This tutorial provides the background and the basics.

Numerous advances have been made in developing intelligent systems, some inspired by biological neural networks. Researchers from many scientific disciplines are designing artificial neural networks (ANNs) to solve a variety of problems in pattern recognition, prediction, optimization, associative memory, and control (see the "Challenging problems" sidebar).

Conventional approaches have been proposed for solving these problems. Although successful applications can be found in certain well-constrained environments, none is flexible enough to perform well outside its domain. ANNs provide exciting alternatives, and many applications could benefit from using them.

This article is for those readers with little or no knowledge of ANNs to help them understand the other articles in this issue of Computer. We discuss the motivation behind the development of ANNs, describe the basic biological neuron and the artificial computational model, outline network architectures and learning processes, and present some of the most commonly used ANN models. We conclude with character recognition, a successful ANN application.

WHY ARTIFICIAL NEURAL NETWORKS?

The long course of evolution has given the human brain many desirable characteristics not present in von Neumann or modern parallel computers. These include

- massive parallelism,
- distributed representation and computation,
- learning ability,
- generalization ability,
- adaptivity,
- inherent contextual information processing,
- fault tolerance, and
- low energy consumption.

It is hoped that devices based on biological neural networks will possess some of these desirable characteristics. Modern digital computers outperform humans in the domain of numeric computation and related symbol manipulation. However, humans can effortlessly solve complex perceptual problems (like recognizing a man in a crowd from a mere glimpse of his face) at a speed and to an extent that dwarf the world's fastest computer. Why is there such a remarkable difference in their performance?

The output of the perceptron is y = 1 if the weighted sum of its inputs exceeds the threshold, and 0 otherwise. In a two-class classification problem, the perceptron assigns an input pattern to one class if y = 1, and to the other class if y = 0. The linear equation

\sum_{i=1}^{n} w_i x_i - \theta = 0

defines the decision boundary (a hyperplane in the n-dimensional input space) that halves the space. Rosenblatt developed a learning procedure to determine the weights and threshold in a perceptron, given a set of training patterns (see the "Perceptron learning algorithm" sidebar; a brief code sketch of this rule appears at the end of this section). Note that learning occurs only when the perceptron makes an error. Rosenblatt proved that when training patterns are drawn from two linearly separable classes, the perceptron learning procedure converges after a finite number of iterations.

Boltzmann learning is a stochastic learning rule derived from information-theoretic and thermodynamic principles. The objective of Boltzmann learning is to adjust the connection weights so that the states of the visible units satisfy a particular desired probability distribution. According to the Boltzmann learning rule, the change in the connection weight w_{ij} is given by

\Delta w_{ij} = \eta (\bar{\rho}_{ij} - \rho_{ij}),

where \eta is the learning rate and \bar{\rho}_{ij} and \rho_{ij} are the correlations between the states of units i and j when the network operates in the clamped mode and in the free-running mode, respectively. The values of \bar{\rho}_{ij} and \rho_{ij} are usually estimated from Monte Carlo experiments, which are extremely slow. Boltzmann learning can be viewed as a special case of error-correction learning in which the error is measured not as the direct difference between desired and actual outputs, but as the difference between the correlations of the states of two units under the clamped and free-running conditions.

In competitive learning, a common practice is to let the learning rate decrease gradually towards 0 as learning proceeds. However, this artificial freezing of learning causes another problem related to plasticity, which is the ability to adapt to new data. This is known as Grossberg's stability-plasticity dilemma in competitive learning.
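To make the perceptron rule above concrete, here is a minimal sketch (ours, not the article's "Perceptron learning algorithm" sidebar) of the thresholded decision rule and the error-driven weight update. The learning rate, epoch limit, and the toy AND data are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, d, eta=0.1, max_epochs=100):
    """Single perceptron: y = 1 if w.x - theta > 0, else 0 (theta folded into a bias weight)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augment with a constant 1 for the bias
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xa, d):
            y = 1 if np.dot(w, x) > 0 else 0        # thresholding activation
            if y != target:                          # learning occurs only on an error
                w += eta * (target - y) * x          # error-correction update
                errors += 1
        if errors == 0:                              # converged (linearly separable case)
            break
    return w

# Toy linearly separable problem (illustrative data, not from the article): logical AND
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
d = np.array([0, 0, 0, 1])
w = perceptron_train(X, d)
print([1 if np.dot(w, np.append(x, 1.0)) > 0 else 0 for x in X])   # expect [0, 0, 0, 1]
```

For linearly separable data such as this, the loop stops once a full pass makes no errors, which reflects the finite-convergence behavior Rosenblatt proved.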
The four basic types of learning rules (error-correction, Boltzmann, Hebbian, and competitive learning) are associated with well-known learning algorithms, which Table 2 summarizes. However, each learning algorithm is designed for training a specific architecture. Therefore, when we discuss a learning algorithm, a particular network architecture association is implied. Each algorithm can perform only a few tasks well; the last column of Table 2 lists the tasks that each algorithm can perform. Due to space limitations, we do not discuss some other algorithms, including Adaline, Madaline, linear discriminant analysis, Sammon's projection, and principal component analysis. Interested readers can consult the corresponding references (this article does not always cite the first paper proposing the particular algorithms).

Table 2. Well-known learning algorithms.

| Paradigm | Learning rule | Architecture | Learning algorithm | Task |
|---|---|---|---|---|
| Supervised | Error-correction | Single- or multilayer perceptron | Perceptron learning algorithms, back-propagation, Adaline and Madaline | Pattern classification, function approximation, prediction, control |
| Supervised | Boltzmann | Recurrent | Boltzmann learning algorithm | Pattern classification |
| Supervised | Hebbian | Multilayer feed-forward | Linear discriminant analysis | Data analysis, pattern classification |
| Supervised | Competitive | Competitive | Learning vector quantization | Within-class categorization, data compression |
| Supervised | Competitive | ART network | ARTMap | Pattern classification, within-class categorization |
| Unsupervised | Error-correction | Multilayer feed-forward | Sammon's projection | Data analysis |
| Unsupervised | Hebbian | Feed-forward or competitive | Principal component analysis | Data analysis, data compression |
| Unsupervised | Hebbian | Hopfield network | Associative memory learning | Associative memory |
| Unsupervised | Competitive | Competitive | Vector quantization | Categorization, data compression |
| Unsupervised | Competitive | Kohonen's SOM | Kohonen's SOM | Categorization, data analysis |
| Unsupervised | Competitive | ART networks | ART1, ART2 | Categorization |
| Hybrid | Error-correction and competitive | RBF network | RBF learning algorithm | Pattern classification, function approximation, prediction, control |

Figure 8. A geometric interpretation of the role of hidden units in a two-dimensional input space. (The figure compares, for the exclusive-OR problem and for classes with meshed regions, the decision regions each structure can form: a single-layer network yields a half plane bounded by a hyperplane, while two- and three-layer networks yield arbitrary regions whose complexity is limited by the number of hidden units.)

MULTILAYER FEED-FORWARD NETWORKS

Figure 7 shows a typical three-layer perceptron. In general, we adopt the convention that the input nodes are not counted as a layer.

Multilayer perceptron

The most popular class of multilayer feed-forward networks is the multilayer perceptron, in which each computational unit employs either the thresholding function or the sigmoid function. Multilayer perceptrons can form arbitrarily complex decision boundaries and represent any Boolean function. The development of the back-propagation learning algorithm for determining weights in a multilayer perceptron has made these networks the most popular among researchers and users of neural networks.

Let {(x^{(1)}, d^{(1)}), (x^{(2)}, d^{(2)}), \ldots, (x^{(p)}, d^{(p)})} be a set of p training patterns (input-output pairs), where x^{(i)} \in R^n is the input vector in the n-dimensional pattern space and d^{(i)} \in [0, 1]^m, an m-dimensional hypercube; for classification purposes, m is the number of classes. The squared-error cost function most frequently used in the ANN literature is defined as

E = \frac{1}{2} \sum_{i=1}^{p} \left\| y^{(i)} - d^{(i)} \right\|^2,   (2)

where y^{(i)} is the output vector the network produces for input x^{(i)}. The back-propagation algorithm is a gradient-descent method to minimize the squared-error cost function in Equation 2 (see the "Back-propagation algorithm" sidebar); its final step repeats the presentation of training patterns until the error in the output layer is below a prespecified threshold or a maximum number of iterations is reached.
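As a rough illustration of this gradient-descent idea (our own sketch, not the article's "Back-propagation algorithm" sidebar), the following trains a one-hidden-layer perceptron with sigmoid units by minimizing the squared-error cost of Equation 2. The layer width, learning rate, epoch count, and the exclusive-OR data are illustrative assumptions, and results depend on the random initialization.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, D, n_hidden=8, eta=0.5, epochs=10000, seed=0):
    """Batch gradient descent on E = 1/2 * sum ||y - d||^2 (back-propagation of errors)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    m = D.shape[1]
    W1 = rng.normal(scale=0.5, size=(n + 1, n_hidden))   # input  -> hidden (with bias row)
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, m))   # hidden -> output (with bias row)
    Xa = np.hstack([X, np.ones((p, 1))])
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(Xa @ W1)
        Ha = np.hstack([H, np.ones((p, 1))])
        Y = sigmoid(Ha @ W2)
        # Backward pass: deltas for the squared-error cost with sigmoid units
        delta_out = (Y - D) * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2[:-1].T) * H * (1.0 - H)
        # Gradient-descent weight updates
        W2 -= eta * Ha.T @ delta_out
        W1 -= eta * Xa.T @ delta_hid
    return W1, W2

# Exclusive-OR is not linearly separable, so hidden units are essential (compare Figure 8)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_mlp(X, D)
H = sigmoid(np.hstack([X, np.ones((4, 1))]) @ W1)
print(sigmoid(np.hstack([H, np.ones((4, 1))]) @ W2).round(2))   # outputs near [0, 1, 1, 0]
```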
A geometric interpretation (adopted and modified from Lippmann), shown in Figure 8, can help explicate the role of hidden units (with the threshold activation function). Each unit in the first hidden layer forms a hyperplane in the pattern space; boundaries between pattern classes can be approximated by hyperplanes. A unit in the second hidden layer forms a hyperregion from the outputs of the first-layer units; a decision region is obtained by performing an AND operation on the hyperplanes. The output-layer units combine the decision regions made by the units in the second hidden layer by performing logical OR operations. Remember that this scenario is depicted only to explain the role of hidden units; their actual behavior, after the network is trained, could differ.

A two-layer network can form more complex decision boundaries than those shown in Figure 8. Moreover, multilayer perceptrons with sigmoid activation functions can form smooth decision boundaries rather than piecewise linear ones.

Radial Basis Function network

The Radial Basis Function (RBF) network, which has two layers, is a special class of multilayer feed-forward networks. Each unit in the hidden layer employs a radial basis function, such as a Gaussian kernel, as the activation function. The radial basis function (or kernel function) is centered at the point specified by the weight vector associated with the unit. Both the positions and the widths of these kernels must be learned from training patterns. There are usually many fewer kernels in the RBF network than there are training patterns. Each output unit implements a linear combination of these radial basis functions. From the point of view of function approximation, the hidden units provide a set of functions that constitute a basis set for representing input patterns in the space spanned by the hidden units.

There are a variety of learning algorithms for the RBF network. The basic one employs a two-step learning strategy, or hybrid learning. It estimates kernel positions and kernel widths using an unsupervised clustering algorithm, followed by a supervised least mean square (LMS) algorithm to determine the connection weights between the hidden layer and the output layer. Because the output units are linear, a noniterative algorithm can be used. After this initial solution is obtained, a supervised gradient-based algorithm can be used to refine the network parameters (a brief code sketch of this two-step strategy appears after the design issues below).

This hybrid learning algorithm for training the RBF network converges much faster than the back-propagation algorithm for training multilayer perceptrons. However, for many problems, the RBF network often involves a larger number of hidden units. This implies that the run-time (after training) speed of the RBF network is often slower than the run-time speed of a multilayer perceptron. The efficiencies (error versus network size) of the RBF network and the multilayer perceptron are, however, problem-dependent. It has been shown that the RBF network has the same asymptotic approximation power as a multilayer perceptron.

Issues

There are many issues in designing feed-forward networks, including

- how many layers are needed for a given task,
- how many units are needed per layer,
- how well the network will perform on data not included in the training set (generalization ability), and
- how large the training set should be for "good" generalization.

Although multilayer feed-forward networks using back-propagation have been widely employed for classification and function approximation, many design parameters still must be determined by trial and error. Existing theoretical results provide only very loose guidelines for selecting these parameters in practice.
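The two-step hybrid strategy referenced above can be sketched as follows (our illustration, not the article's prescription): k-means places the Gaussian kernel centers (the unsupervised step), a shared kernel width is set by a simple heuristic, and the linear output weights are then obtained noniteratively by least squares. The number of kernels, the width heuristic, and the sine-wave data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means clustering to position the kernel centers (unsupervised step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def rbf_design(X, centers, sigma):
    """Gaussian kernel activations plus a bias column."""
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.hstack([Phi, np.ones((len(X), 1))])

def rbf_fit(X, D, k=10, seed=0):
    """Two-step RBF training: unsupervised centers, then a noniterative linear solve."""
    centers = kmeans(X, k, seed=seed)
    dists = np.sqrt(((centers[:, None] - centers[None]) ** 2).sum(-1))
    sigma = dists.max() / np.sqrt(2.0 * k)               # shared width heuristic (assumption)
    W, *_ = np.linalg.lstsq(rbf_design(X, centers, sigma), D, rcond=None)
    return centers, sigma, W

# Toy function-approximation problem (illustrative): fit y = sin(x) on [0, 2*pi]
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
D = np.sin(X)
centers, sigma, W = rbf_fit(X, D, k=10)
print(np.abs(rbf_design(X, centers, sigma) @ W - D).max())   # maximum training error
```

The linear solve replaces the LMS iteration mentioned in the text; both yield the same least-squares output weights when the hidden-layer responses are fixed.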
KOHONEN'S SELF-ORGANIZING MAPS

The self-organizing map (SOM) has the desirable property of topology preservation, which captures an important aspect of the feature maps in the cortex of highly developed animal brains. In a topology-preserving mapping, nearby input patterns should activate nearby output units on the map. Figure 9 shows the basic network architecture of Kohonen's SOM. It basically consists of a two-dimensional array of units, each connected to all n input nodes. Let w_{ij} denote the n-dimensional vector associated with the unit at location (i, j) of the 2D array. Each neuron computes the Euclidean distance between the input vector x and its stored weight vector w_{ij}.

This SOM is a special type of competitive learning network that defines a spatial neighborhood for each output unit. The shape of the local neighborhood can be square, rectangular, or circular. The initial neighborhood size is often set to one half to two thirds of the network size, and it shrinks over time according to a schedule (for example, an exponentially decreasing function). During competitive learning, all the weight vectors associated with the winner and its neighboring units are updated (see the "SOM learning algorithm" sidebar; a compact code sketch of these steps appears below).

SOM learning algorithm

1. Initialize weights to small random numbers; set the initial learning rate and neighborhood.
2. Present a pattern x and evaluate the network outputs.
3. Select the unit (c_i, c_j) with the minimum output:

\| x - w_{c_i c_j} \| = \min_{(i, j)} \| x - w_{ij} \|

4. Update all weights according to the following learning rule:

w_{ij}(t + 1) = w_{ij}(t) + \alpha(t) [x - w_{ij}(t)], if (i, j) \in N_{c_i c_j}(t);
w_{ij}(t + 1) = w_{ij}(t), otherwise,

where N_{c_i c_j}(t) is the neighborhood of the unit (c_i, c_j) at time t and \alpha(t) is the learning rate.
5. Decrease the value of \alpha(t) and shrink the neighborhood N_{c_i c_j}(t).
6. Repeat steps 2 through 5 until the change in weight values is less than a prespecified threshold or a maximum number of iterations is reached.

Kohonen's SOM can be used for projection of multivariate data, density approximation, and clustering. It has been successfully applied in the areas of speech recognition, image processing, robotics, and process control. The design parameters include the dimensionality of the neuron array, the number of neurons in each dimension, the shape of the neighborhood, the shrinking schedule of the neighborhood, and the learning rate.

ADAPTIVE RESONANCE THEORY MODELS

Recall that the stability-plasticity dilemma is an important issue in competitive learning. How do we learn new things (plasticity) and yet retain the stability to ensure that existing knowledge is not erased or corrupted? Carpenter and Grossberg's Adaptive Resonance Theory models (ART1, ART2, and ARTMap) were developed in an attempt to overcome this dilemma. The network has a sufficient supply of output units, but they are not used until deemed necessary. A unit is said to be committed (uncommitted) if it is (is not) being used. The learning algorithm updates the stored prototypes of a category only if the input vector is sufficiently similar to them. An input vector and a stored prototype are said to resonate when they are sufficiently similar. The extent of similarity is controlled by a vigilance parameter, \rho, with 0 < \rho \le 1.
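Returning to Kohonen's SOM, the six steps in the sidebar above can be written as a compact sketch. The map size, the exponential schedules for the learning rate and the neighborhood radius, and the clustered 2D data are illustrative assumptions rather than the article's settings.

```python
import numpy as np

def train_som(X, rows=10, cols=10, iters=2000, alpha0=0.5, seed=0):
    """Kohonen SOM: pick the winning unit, then update it and its map-space neighbors."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((rows, cols, n))                          # step 1: random initial weights
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    radius0 = max(rows, cols) / 2.0                          # initial neighborhood size
    for t in range(iters):
        x = X[rng.integers(len(X))]                          # step 2: present a pattern
        dist = np.linalg.norm(W - x, axis=-1)                # Euclidean distance to x
        ci, cj = np.unravel_index(np.argmin(dist), dist.shape)   # step 3: winning unit
        alpha = alpha0 * np.exp(-t / iters)                  # step 5: learning rate and
        radius = radius0 * np.exp(-t / iters)                #         neighborhood shrink over time
        grid_dist = np.linalg.norm(grid - np.array([ci, cj]), axis=-1)
        in_hood = (grid_dist <= radius)[..., None]           # units inside the neighborhood
        W += alpha * in_hood * (x - W)                       # step 4: update winner and neighbors
    return W

# Illustrative data (not from the article): 2D points drawn from three clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(100, 2)) for c in ([0, 0], [1, 0], [0, 1])])
W = train_som(X)
print(W.shape)   # (10, 10, 2): each map unit stores an n-dimensional prototype vector
```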

Le Cun et al. developed such a network for zip code recognition. A 16 x 16 normalized gray-level image is presented to a feed-forward network with three hidden layers. The units in the first layer are locally connected to the units in the input layer, forming a set of local feature maps. The second hidden layer is constructed in a similar way; each unit in the second layer also combines local information coming from feature maps in the first layer.

The activation level of an output unit can be interpreted as an approximation of the a posteriori probability of the input pattern's belonging to a particular class. The output categories are ordered according to activation levels and passed to the post-processing stage. In this stage, contextual information is exploited to update the classifier's output. This could, for example, involve looking up a dictionary of admissible words, or utilizing syntactic constraints present, for example, in phone or social security numbers.

Figure 10. A sample set of characters in the NIST database.

Figure 11. Two schemes for using ANNs in an OCR system.

Results

ANNs work very well in the OCR application. However, there is no conclusive evidence about their superiority over conventional statistical pattern classifiers. At the First Census Optical Character Recognition System Conference held in 1992, more than 40 different handwritten character recognition systems were evaluated based on their performance on a common database. The top 10 performers used either some type of multilayer feed-forward network or a nearest neighbor-based classifier. ANNs tend to be superior in terms of speed and memory requirements compared to nearest neighbor methods. Unlike the nearest neighbor methods, classification speed using ANNs is independent of the size of the training set. The recognition accuracies of the top OCR systems on the NIST isolated (presegmented) character data were above 98 percent for digits, 96 percent for uppercase characters, and 87 percent for lowercase characters. (Low recognition accuracy for lowercase characters was largely due to the fact that the test data differed significantly from the training data, as well as being due to "ground truth" errors.) One conclusion drawn from the tests is that OCR system performance on isolated characters compares well with human performance. However, humans still outperform OCR systems on unconstrained and cursive handwritten documents.

DEVELOPMENTS IN ANNS HAVE STIMULATED a lot of enthusiasm and criticism. Some comparative studies are optimistic, while others are pessimistic. For many tasks, such as pattern recognition, no one approach dominates the others. The choice of the best technique should be driven by the given application's nature. We should try to understand the capacities, assumptions, and applicability of various approaches and maximally exploit their complementary advantages to develop better intelligent systems. Such an effort may lead to a synergistic approach that combines the strengths of ANNs with other technologies to achieve significantly better performance for challenging problems. As Minsky recently observed, the time has come to build systems out of diverse components. Individual modules are important, but we also need a good methodology for integration. It is clear that communication and cooperative work between researchers working in ANNs and other disciplines will not only avoid repetitious work but (and more important) will stimulate and benefit individual disciplines.
Acknowledgments

We thank Richard Casey (IBM Almaden); Pat Flynn (Washington State University); William Punch, Chitra Dorai, and Kalle Karu (Michigan State University); Ali Khotanzad (Southern Methodist University); and Ishwar Sethi (Wayne State University) for their many useful suggestions.

References

1. DARPA Neural Network Study, AFCEA Int'l Press, Fairfax, Va., 1988.
2. J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, Mass., 1991.
3. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Co., New York, 1994.
4. W.S. McCulloch and W. Pitts, "A Logical Calculus of Ideas Immanent in Nervous Activity," Bull. Mathematical Biophysics, Vol. 5, 1943, pp. 115-133.
5. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1962.
6. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, Mass., 1969.
7. J.J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Nat'l Academy of Sciences, USA 79, 1982, pp. 2,554-2,558.
8. P. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences," PhD thesis, Dept. of Applied Mathematics, Harvard University, Cambridge, Mass., 1974.
9. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, Mass., 1986.
10. J.A. Anderson and E. Rosenfeld, Neurocomputing: Foundations of Research, MIT Press, Cambridge, Mass., 1988.
11. S. Brunak and B. Lautrup, Neural Networks, Computers with Intuition, World Scientific, Singapore, 1990.
12. J.A. Feldman, M.A. Fanty, and N.H. Goddard, "Computing with Structured Neural Networks," Computer, Vol. 21, No. 3, Mar. 1988, pp. 91-103.
13. D.O. Hebb, The Organization of Behavior, John Wiley & Sons, New York, 1949.
14. R.P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, Vol. 4, No. 2, Apr. 1987, pp. 4-22.
15. A.K. Jain and J. Mao, "Neural Networks and Pattern Recognition," in Computational Intelligence: Imitating Life, J.M. Zurada, R.J. Marks II, and C.J. Robinson, eds., IEEE Press, Piscataway, N.J., 1994, pp. 194-212.
16. T. Kohonen, Self-Organization and Associative Memory, third edition, Springer-Verlag, New York, 1989.
17. G.A. Carpenter and S. Grossberg, Pattern Recognition by Self-Organizing Neural Networks, MIT Press, Cambridge, Mass., 1991.
18. "The First Census Optical Character Recognition System Conference," R.A. Wilkinson et al., eds., Tech. Report NISTIR 4912, US Dept. of Commerce, NIST, Gaithersburg, Md., 1992.
19. K. Mohiuddin and J. Mao, "A Comparative Study of Different Classifiers for Handprinted Character Recognition," in Pattern Recognition in Practice IV, E.S. Gelsema and L.N. Kanal, eds., Elsevier Science, The Netherlands, 1994, pp. 437-448.
20. Y. Le Cun et al., "Back-Propagation Applied to Handwritten Zip Code Recognition," Neural Computation, Vol. 1, 1989, pp. 541-551.
21. M. Minsky, "Logical Versus Analogical or Symbolic Versus Connectionist or Neat Versus Scruffy," AI Magazine, Vol. 12, No. 2, 1991, pp. 34-51.

Anil K. Jain is a University Distinguished Professor and the chair of the Department of Computer Science at Michigan State University.
His interests include statistical pattern recognition, exploratory pattern analysis, neural networks, Markov random fields, texture analysis, remote sensing, interpretation of range images, and 3D object recognition. Jain served as editor-in-chief of IEEE Transactions on Pattern Analysis and Machine Intelligence from 1991 to 1994, and currently serves on the editorial boards of Pattern Recognition, Pattern Recognition Letters, Journal of Mathematical Imaging, Journal of Applied Intelligence, and IEEE Transactions on Neural Networks. He has coauthored, edited, and coedited numerous books in the field. Jain is a fellow of the IEEE and a speaker in the IEEE Computer Society's Distinguished Visitors Program for the Asia-Pacific region. He is a member of the IEEE Computer Society.

Jianchang Mao is a research staff member at the IBM Almaden Research Center. His interests include pattern recognition, neural networks, document image analysis, image processing, computer vision, and parallel computing. Mao received the BS degree in physics in 1983 and the MS degree in electrical engineering in 1986 from East China Normal University in Shanghai. He received the PhD in computer science from Michigan State University in 1994. Mao is the abstracts editor of IEEE Transactions on Neural Networks. He is a member of the IEEE and the IEEE Computer Society.

K.M. Mohiuddin is the manager of the Document Image Analysis and Recognition project in the Computer Science Department at the IBM Almaden Research Center. He has led IBM projects on high-speed reconfigurable machines for industrial machine vision, parallel processing for scientific computing, and document imaging systems. His interests include document image analysis, handwriting recognition, OCR, data compression, and computer architecture. Mohiuddin received the MS and PhD degrees in electrical engineering from Stanford University in 1977 and 1982, respectively. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He served on Computer's editorial board from 1984 to 1989, and is a senior member of the IEEE and a member of the IEEE Computer Society.

Readers can contact Anil Jain at the Department of Computer Science, Michigan State University, A714 Wells Hall, East Lansing, MI 48824; jain@cps.msu.edu.
