
Fundamentals of Artificial Neural Networks

Mohamad H. Hassoun

A Bradford Book
The MIT Press
Cambridge, Massachusetts / London, England

Contents
Preface
Acknowledgments
Abbreviations
Symbols

1 Threshold Gates
1.1 Threshold Gates
    1.1.1 Linear Threshold Gates
    1.1.2 Quadratic Threshold Gates
    1.1.3 Polynomial Threshold Gates
1.2 Computational Capabilities of Polynomial Threshold Gates
1.3 General Position and the Function Counting Theorem
    1.3.1 Weierstrass's Approximation Theorem
    1.3.2 Points in General Position
    1.3.3 Function Counting Theorem
    1.3.4 Separability in φ-Space
1.4 Minimal PTG Realization of Arbitrary Switching Functions
1.5 Ambiguity and Generalization
1.6 Extreme Points
1.7 Summary
Problems

2 Computational Capabilities of Artificial Neural Networks
2.1 Some Preliminary Results on Neural Network Mapping Capabilities
    2.1.1 Network Realization of Boolean Functions
    2.1.2 Bounds on the Number of Functions Realizable by a Feedforward Network of LTGs
2.2 Necessary Lower Bounds on the Size of LTG Networks
    2.2.1 Two-Layer Feedforward Networks
    2.2.2 Three-Layer Feedforward Networks
    2.2.3 Generally Interconnected Networks with No Feedback
2.3 Approximation Capabilities of Feedforward Neural Networks for Continuous Functions
    2.3.1 Kolmogorov's Theorem
    2.3.2 Single-Hidden-Layer Neural Networks Are Universal Approximators
    2.3.3 Single-Hidden-Layer Neural Networks Are Universal Classifiers


2.4 Computational Effectiveness of Neural Networks
    2.4.1 Algorithmic Complexity
    2.4.2 Computational Energy
2.5 Summary
Problems

3 Learning Rules
3.1 Supervised Learning in a Single-Unit Setting
    3.1.1 Error Correction Rules
    3.1.2 Other Gradient-Descent-Based Learning Rules
    3.1.3 Extension of the µ-LMS Rule to Units with Differentiable Activation Functions: Delta Rule
    3.1.4 Adaptive Ho-Kashyap (AHK) Learning Rules
    3.1.5 Other Criterion Functions
    3.1.6 Extension of Gradient-Descent-Based Learning to Stochastic Units
3.2 Reinforcement Learning
    3.2.1 Associative Reward-Penalty Reinforcement Learning Rule
3.3 Unsupervised Learning
    3.3.1 Hebbian Learning
    3.3.2 Oja's Rule
    3.3.3 Yuille et al. Rule
    3.3.4 Linsker's Rule
    3.3.5 Hebbian Learning in a Network Setting: Principal-Component Analysis (PCA)
    3.3.6 Nonlinear PCA
3.4 Competitive Learning
    3.4.1 Simple Competitive Learning
    3.4.2 Vector Quantization
3.5 Self-Organizing Feature Maps: Topology-Preserving Competitive Learning
    3.5.1 Kohonen's SOFM
    3.5.2 Examples of SOFMs
3.6 Summary
Problems


4 Mathematical Theory of Neural Learning
4.1 Learning as a Search/Approximation Mechanism
4.2 Mathematical Theory of Learning in a Single-Unit Setting
    4.2.1 General Learning Equation
    4.2.2 Analysis of the Learning Equation
    4.2.3 Analysis of Some Basic Learning Rules
4.3 Characterization of Additional Learning Rules
    4.3.1 Simple Hebbian Learning
    4.3.2 Improved Hebbian Learning
    4.3.3 Oja's Rule
    4.3.4 Yuille et al. Rule
    4.3.5 Hassoun's Rule
4.4 Principal-Component Analysis (PCA)
4.5 Theory of Reinforcement Learning
4.6 Theory of Simple Competitive Learning
    4.6.1 Deterministic Analysis
    4.6.2 Stochastic Analysis
4.7 Theory of Feature Mapping
    4.7.1 Characterization of Kohonen's Feature Map
    4.7.2 Self-Organizing Neural Fields
4.8 Generalization
    4.8.1 Generalization Capabilities of Deterministic Networks
    4.8.2 Generalization in Stochastic Networks
4.9 Complexity of Learning
4.10 Summary
Problems

5 Adaptive Multilayer Neural Networks I
5.1 Learning Rule for Multilayer Feedforward Neural Networks
    5.1.1 Error Backpropagation Learning Rule
    5.1.2 Global-Descent-Based Error Backpropagation
5.2 Backprop Enhancements and Variations
    5.2.1 Weights Initialization
    5.2.2 Learning Rate
    5.2.3 Momentum

    5.2.4 Activation Function
    5.2.5 Weight Decay, Weight Elimination, and Unit Elimination
    5.2.6 Cross-Validation
    5.2.7 Criterion Functions
5.3 Applications
    5.3.1 NETtalk
    5.3.2 Glove-Talk
    5.3.3 Handwritten ZIP Code Recognition
    5.3.4 ALVINN: A Trainable Autonomous Land Vehicle
    5.3.5 Medical Diagnosis Expert Net
    5.3.6 Image Compression and Dimensionality Reduction
5.4 Extensions of Backprop for Temporal Learning
    5.4.1 Time-Delay Neural Networks
    5.4.2 Backpropagation Through Time
    5.4.3 Recurrent Backpropagation
    5.4.4 Time-Dependent Recurrent Backpropagation
    5.4.5 Real-Time Recurrent Learning
5.5 Summary
Problems

6 Adaptive Multilayer Neural Networks II
6.1 Radial Basis Function (RBF) Networks
    6.1.1 RBF Networks versus Backprop Networks
    6.1.2 RBF Network Variations
6.2 Cerebellar Model Articulation Controller (CMAC)
    6.2.1 CMAC Relation to Rosenblatt's Perceptron and Other Models
6.3 Unit-Allocating Adaptive Networks
    6.3.1 Hyperspherical Classifiers
    6.3.2 Cascade-Correlation Network
6.4 Clustering Networks
    6.4.1 Adaptive Resonance Theory (ART) Networks
    6.4.2 Autoassociative Clustering Network
6.5 Summary
Problems


7 Associative Neural Memories
7.1 Basic Associative Neural Memory Models
    7.1.1 Simple Associative Memories and Their Associated Recording Recipes
    7.1.2 Dynamic Associative Memories (DAMs)
7.2 DAM Capacity and Retrieval Dynamics
    7.2.1 Correlation DAMs
    7.2.2 Projection DAMs
7.3 Characteristics of High-Performance DAMs
7.4 Other DAM Models
    7.4.1 Brain-State-in-a-Box (BSB) DAM
    7.4.2 Nonmonotonic Activations DAM
    7.4.3 Hysteretic Activations DAM
    7.4.4 Exponential-Capacity DAM
    7.4.5 Sequence-Generator DAM
    7.4.6 Heteroassociative DAM
7.5 The DAM as a Gradient Net and Its Application to Combinatorial Optimization
7.6 Summary
Problems

8 Global Search Methods for Neural Networks
8.1 Local versus Global Search
    8.1.1 A Gradient Descent/Ascent Search Strategy
    8.1.2 Stochastic Gradient Search: Global Search via Diffusion
8.2 Simulated Annealing-Based Global Search
8.3 Simulated Annealing for Stochastic Neural Networks
    8.3.1 Global Convergence in a Stochastic Recurrent Neural Net: The Boltzmann Machine
    8.3.2 Learning in Boltzmann Machines
8.4 Mean-Field Annealing and Deterministic Boltzmann Machines
    8.4.1 Mean-Field Retrieval
    8.4.2 Mean-Field Learning
8.5 Genetic Algorithms in Neural Network Optimization
    8.5.1 Fundamentals of Genetic Algorithms
    8.5.2 Application of Genetic Algorithms to Neural Networks


8.6 Genetic Algorithm-Assisted Supervised Learning
    8.6.1 Hybrid GA/Gradient-Descent Method for Feedforward Multilayer Net Training
    8.6.2 Simulations
8.7 Summary
Problems

References
Index