Modeling and Control of Complex Systems

AUTOMATION AND CONTROL ENGINEERING
A Series of Reference Books and Textbooks

Editor
FRANK L. LEWIS, PH.D.
Professor, Automation and Robotics Research Institute, The University of Texas at Arlington

Co-Editor
SHUZHI SAM GE, PH.D.
The National University of Singapore

1. Nonlinear Control of Electric Machinery, Darren M. Dawson, Jun Hu, and Timothy C. Burg
2. Computational Intelligence in Control Engineering, Robert E. King
3. Quantitative Feedback Theory: Fundamentals and Applications, Constantine H. Houpis and Steven J. Rasmussen
4. Self-Learning Control of Finite Markov Chains, A. S. Poznyak, K. Najim, and E. Gómez-Ramírez
5. Robust Control and Filtering for Time-Delay Systems, Magdi S. Mahmoud
6. Classical Feedback Control: With MATLAB®, Boris J. Lurie and Paul J. Enright
7. Optimal Control of Singularly Perturbed Linear Systems and Applications: High-Accuracy Techniques, Zoran Gajić and Myo-Taeg Lim
8. Engineering System Dynamics: A Unified Graph-Centered Approach, Forbes T. Brown
9. Advanced Process Identification and Control, Enso Ikonen and Kaddour Najim
10. Modern Control Engineering, P. N. Paraskevopoulos
11. Sliding Mode Control in Engineering, edited by Wilfrid Perruquetti and Jean-Pierre Barbot
12. Actuator Saturation Control, edited by Vikram Kapila and Karolos M. Grigoriadis
13. Nonlinear Control Systems, Zoran Vukić, Ljubomir Kuljača, Dali Donlagić, and Sejid Tesnjak
14. Linear Control System Analysis & Design: Fifth Edition, John D'Azzo, Constantine H. Houpis, and Stuart Sheldon
15. Robot Manipulator Control: Theory & Practice, Second Edition, Frank L. Lewis, Darren M. Dawson, and Chaouki Abdallah
16. Robust Control System Design: Advanced State Space Techniques, Second Edition, Chia-Chi Tsui
17. Differentially Flat Systems, Hebertt Sira-Ramirez and Sunil Kumar Agrawal

18. Chaos in Automatic Control, edited by Wilfrid Perruquetti and Jean-Pierre Barbot
19. Fuzzy Controller Design: Theory and Applications, Zdenko Kovacic and Stjepan Bogdan
20. Quantitative Feedback Theory: Fundamentals and Applications, Second Edition, Constantine H. Houpis, Steven J. Rasmussen, and Mario Garcia-Sanz
21. Neural Network Control of Nonlinear Discrete-Time Systems, Jagannathan Sarangapani
22. Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications, edited by Shuzhi Sam Ge and Frank L. Lewis
23. Hard Disk Drive: Mechatronics and Control, Abdullah Al Mamun, GuoXiao Guo, and Chao Bi
24. Stochastic Hybrid Systems, edited by Christos G. Cassandras and John Lygeros
25. Wireless Ad Hoc and Sensor Networks: Protocols, Performance, and Control, Jagannathan Sarangapani
26. Modeling and Control of Complex Systems, edited by Petros A. Ioannou and Andreas Pitsillides

Modeling and Control of Complex Systems

Edited by

Petros A. Ioannou
University of Southern California
Los Angeles, California, U.S.A.

Andreas Pitsillides
University of Cyprus
Nicosia, Cyprus

Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2008 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-0-8493-7985-7 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

1 Introduction to Modeling and Control of Complex Systems . . . . . . . . 1
Petros Ioannou and Andreas Pitsillides

2 Control of Complex Systems Using Neural Networks . . . . . . . . . . . . . 13
Kumpati S. Narendra, Matthias J. Feiler, and Zhiling Tian

3 Modeling and Control Problems in Building Structures and Bridges . . . . . . . . 99
Sami F. Masri and Anastasios G. Chassiakos

4 Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems . . . . . . . . 131
Asma Al-Tamimi, Murad Abu-Khalaf, and Frank L. Lewis

5 Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks . . . . . . . . 161
Avinash Sridharan and Bhaskar Krishnamachari

6 Optimization Problems in the Deployment of Sensor Networks . . . . . . . . 179
Christos G. Cassandras and Wei Li

7 Congestion Control in Computer Networks . . . . . . . . . . . . . . . . . . . . . . 203
Marios Lestas, Andreas Pitsillides, and Petros Ioannou

8 Persistent Autonomous Formations and Cohesive Motion Control . . . . . . . . 247
Barış Fidan, Brian D. O. Anderson, Changbin Yu, and Julien M. Hendrickx

9 Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions . . . . . . . . 277
George Vachtsevanos, Panos Antsaklis, and Kimon P. Valavanis

10 A Framework for Large-Scale Multi-Robot Teams . . . . . . . . . . . . . . . . 297
Andrew Drenner and Nikolaos Papanikolopoulos

11 Modeling and Control in Cancer Genomics . . . . . . . . . . . . . . . . . . . . . . . 339
Aniruddha Datta, Ashish Choudhary, Michael L. Bittner, and Edward R. Dougherty

12 Modeling and Estimation Problems in the Visuomotor Pathway . . . . . . . . 367
Bijoy K. Ghosh, Wenxue Wang, and Zachary V. Freudenburg

13 Modeling, Simulation, and Control of Transportation Systems . . . 407
Petros Ioannou, Yun Wang, and Hwan Chang

14 Backstepping Controllers for Stabilization of Turbulent Flow PDEs . . . . . . . . 439
Miroslav Krstic, Jennie Cochran, and Rafael Vazquez

15 An Approach to Home Automation by Means of MAS Theory . . . . . . . . 461
Giuseppe Conte and David Scaradozzi

16 Multi-Robot Social Group-Based Search Algorithms . . . . . . . . . . . . . 485
Bud Fox, Wee Tiong Ong, Heow Pueh Lee, and Albert Y. Zomaya

Index . . . . . . . . 509

Preface

Broadly speaking, a complex system consists of a large number of interacting components, which may include molecules, cells, bacteria, electronic chips, computers, routers, automobiles, even people or business firms. Interactions among the elements of such systems are often nonlinear and lead to rich dynamics, with patterns and fluctuations on many scales of space and time. They are often hard to understand, model, and control using traditional approaches.

Recent developments in the areas of electronics, computational speed, and sensor and communication technologies, and advances in areas such as microelectromechanical systems (MEMS), nanotechnology, and quantum electronics, open the way for new approaches in dealing with systems far more complex than one could imagine a few years ago. System theory can play a significant role in understanding, modeling, and controlling such complex systems. There is a general understanding that complex system theory, together with technological advances in materials, electronics, and sensors, will help solve new nontraditional problems in addition to the traditional ones, push the performance envelope further, and open the way for new products and more efficient operations.

As complex system and feedback control concepts penetrate different disciplines, new notation is generated and new techniques are developed, leading to many publications, with results and products scattered in different journals, books, conference proceedings, and so on. Given the multidisciplinary nature of complex systems, the scattering of information across different areas creates a chaotic situation for the reader who is interested in understanding the complexity and possible solutions as they apply to different areas and applications.

The purpose of this book is to bring together a number of research experts working in different areas or disciplines to present some of their latest approaches and future research directions in the area of modeling and control of complex systems in a language that can be understood easily by system theorists. By bringing together different experts with different views and application areas, the book provides a better picture of the issues involved in dealing with the modeling and control of complex systems in completely different areas. What works in one area may fail in another, and an acceptable approach in one area may produce revolutionary results in another.

The book contains sixteen chapters covering an indicative spectrum of the different areas and disciplines that can be classed as complex systems. These include neural networks for modeling and control, modeling and control of civil structures, transportation systems, sensor networks, genomics, computer networks, unmanned air vehicles, robots, biomedical systems, fluid flow systems, home automation systems, and so on.

The focus is not only on the theoretical treatment of the topics but also on the applications and future directions. Readers from different disciplines with an interest in modeling and control of complex systems will benefit from the book, as they will learn how complexity is dealt with in different disciplines by researchers of different backgrounds using different approaches. This feature of the book is very educational and will help researchers learn about methodologies in other areas that may be applicable to their own. In addition, it will enable people to shift to other research areas within complex systems where their approach and methodology will lead to new solutions.

The book is intended for people who are interested in the theory and application of a system approach to handle complex systems in a very wide range of areas. Possible solutions to the modeling and control of complex systems may include, in addition to theory and simulation tools, the use of advanced sensor and communication technologies for implementation. This mix of theory, simulation, and technology becomes a strong educational vehicle for enlarging knowledge beyond the bounds of the specific topics in which most researchers are often trapped. It encourages a multidisciplinary approach to dealing with complexity, which has the potential of leading to new breakthroughs and advances.

We wish to thank all the authors for their valuable time and efforts in putting together this book, for their hard work, and for sharing their experiences so readily. We also thank the reviewers for their valuable comments in enhancing the contents of this book. Last, but not least, we would like to thank Frank Lewis, the series editor, B. J. Clark, Helen Redshaw, Nora Konopka, Catherine Giacari, Jessica Vakili, and the rest of the staff at CRC for their understanding, patience, and unwavering support in materializing this book.

We hope this book will be a useful reference and a source of inspiration for all readers in this important and growing field of research, and will contribute to the effective modeling and design of complex systems, which form the pillar of today's society.

Petros Ioannou
Andreas Pitsillides

The Editors

Dr. Petros Ioannou is a professor in the Department of Electrical Engineering-Systems, University of Southern California, and the director of the Center of Advanced Transportation Technologies. He also holds a courtesy appointment with the Department of Aerospace and Mechanical Engineering. His research interests are in the areas of adaptive control, neural networks, nonlinear systems, vehicle dynamics and control, vehicle automation, intelligent transportation systems, and marine transportation. Dr. Ioannou is a fellow of IEEE, a fellow of the International Federation of Automatic Control (IFAC), and the author or coauthor of 8 books and over 200 research papers in the areas of controls, neural networks, nonlinear dynamical systems, and intelligent transportation systems.

Andreas Pitsillides (IEEE M'89, SM'2005) received a B.Sc. (Honors) degree from the University of Manchester Institute of Science and Technology (UMIST) and a Ph.D. from Swinburne University of Technology, Melbourne, Australia, in 1980 and 1993, respectively. He is an associate professor in the Department of Computer Science, University of Cyprus, and heads the Networks Research Laboratory (NetRL). Prior to that he worked in industry for six years (Siemens, 1980–1983; Asea-Brown Boveri, 1983–1986), and from 1987 to 1994 was with the Swinburne University of Technology (lecturer; senior lecturer, 1990–1994; and foundation associate director of the Swinburne Laboratory for Telecommunications Research, 1992–1994). In 1992, he spent a six-month period as an academic visitor at the Telstra (Australia) Telecom Research Labs (TRL). Andreas is also a founding member and, since its establishment in 2000, chairman and scientific director of the Cyprus Academic and Research Network (CYNET).

Andreas's research interests include fixed and wireless networks (ad hoc and sensor networks, TCP/IP, WLANs, UMTS third-generation mobile networks and beyond), flow and congestion control, resource allocation and radio resource management, and Internet technologies and their application in mobile e-services, for example, in tele-healthcare and security issues. He has a particular interest in adapting tools from various fields of applied mathematics, such as nonlinear control theory and computational intelligence, to solve problems in computer networks. Andreas has published over 170 research papers and book chapters, has presented invited lectures at major research organizations, and has given short courses at international conferences and short courses to industry.

His work has been funded by the European Commission IST program, the Cyprus National Research Promotion Foundation (RPF), the University of Cyprus, the Cambridge Microsoft Research Labs, the Swinburne University of Technology, and the Australian government research grants board, with total funding exceeding 9 million Euro. Current research projects include IST FP6 M-POWER, IST FP6 GEANT, IST FP6 C-MOBILE, IST FP6 MOTIVE, IST e-TEN FP6 HEALTHSERVICE24, IST e-TEN LINKCARE, RPF VIDEO, UCY ADAVIDEO, ISYC, and ICT. Andreas is also a member of the editorial board of the Computer Networks (COMNET) Journal. He is a member of the International Federation of Automatic Control (IFAC) Technical Committees TC 1.5 on Networked Systems and TC 7.3 on Transportation Systems, and of the International Federation of Information Processing (IFIP) working group WG 6.3: Performance of Communications Systems. Andreas serves or has served on the executive committees of major conferences, such as INFOCOM, WiOpt, and MCCS.

Contributors

Murad Abu-Khalaf, Control & Estimation Group, The MathWorks, Inc., Natick, Massachusetts
Asma Al-Tamimi, Mechatronics Engineering Department, Hashemite University, Zarqa, Jordan
Brian D. O. Anderson, Research School of Information Sciences and Engineering, Australian National University, and National ICT Australia, Canberra, Australia
Panos Antsaklis, Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana
Michael L. Bittner, Translational Genomics Research Institute, Phoenix, Arizona
Christos G. Cassandras, Department of Manufacturing Engineering, Center for Information and Systems Engineering, Boston University, Brookline, Massachusetts
Hwan Chang, Department of Electrical Engineering Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California
Anastasios G. Chassiakos, Department of Electrical Engineering, California State University, Long Beach, California
Ashish Choudhary, Department of Electrical Engineering, Texas A&M University, College Station, Texas
Jennie Cochran, Department of Mechanical and Aerospace Engineering, University of California, San Diego, California
Giuseppe Conte, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Università Politecnica delle Marche, Ancona, Italy
Aniruddha Datta, Department of Electrical Engineering, Texas A&M University, College Station, Texas
Edward R. Dougherty, Department of Electrical Engineering, Texas A&M University, College Station, Texas, and Translational Genomics Research Institute, Phoenix, Arizona
Andrew Drenner, Department of Computer Science and Engineering, Center for Distributed Robotics, University of Minnesota, Minneapolis, Minnesota
Matthias J. Feiler, Systems Design, ETH Zürich, Zürich, Switzerland
Barış Fidan, Research School of Information Sciences and Engineering, Australian National University, and National ICT Australia, Canberra, Australia
Bud Fox, Institute of High Performance Computing, Singapore
Zachary V. Freudenburg, Department of Computer Science and Engineering, Washington University, Saint Louis, Missouri
Bijoy K. Ghosh, Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas
Julien M. Hendrickx, Department of Mathematical Engineering, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Petros Ioannou, Department of Electrical Engineering Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California
Bhaskar Krishnamachari, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
Miroslav Krstic, Department of Mechanical and Aerospace Engineering, University of California, San Diego, California
Heow Pueh Lee, Institute of High Performance Computing, Singapore, and Department of Mechanical Engineering, National University of Singapore, Singapore
Marios Lestas, Department of Computer Science, Networks Research Lab (NetRL), University of Cyprus, Nicosia, Cyprus
Frank L. Lewis, Automation & Robotics Research Institute, The University of Texas at Arlington, Fort Worth, Texas
Wei Li, The MathWorks, Inc., Natick, Massachusetts
Sami F. Masri, Civil and Environmental Engineering, University of Southern California, Los Angeles, California
Kumpati S. Narendra, Department of Electrical Engineering, Center for Systems Science, Yale University, New Haven, Connecticut
Wee Tiong Ong, Department of Mechanical Engineering, National University of Singapore, Singapore
Nikolaos Papanikolopoulos, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota
Andreas Pitsillides, Department of Computer Science, Networks Research Lab (NetRL), University of Cyprus, Nicosia, Cyprus
David Scaradozzi, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Università Politecnica delle Marche, Ancona, Italy
Avinash Sridharan, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
Zhiling Tian, Department of Electrical Engineering, Center for Systems Science, Yale University, New Haven, Connecticut
George Vachtsevanos, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia
Kimon P. Valavanis, Department of Computer Science and Engineering, University of South Florida, Tampa, Florida
Rafael Vazquez, Department of Aerospace Engineering, Escuela Superior de Ingenieros, University of Seville, Seville, Spain
Wenxue Wang, Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas
Yun Wang, Department of Electrical Engineering Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California
Changbin Yu, Research School of Information Sciences and Engineering, Australian National University, and National ICT Australia, Canberra, Australia
Albert Y. Zomaya, CISCO Systems Chair Professor of Internetworking, School of Information Technologies, The University of Sydney, Sydney, Australia

1
Introduction to Modeling and Control of Complex Systems

Petros Ioannou and Andreas Pitsillides

CONTENTS
1.1 Chapter 2: Control of Complex Systems Using Neural Networks . . . . . 3
1.2 Chapter 3: Modeling and Control Problems in Building Structures and Bridges . . . . . 4
1.3 Chapter 4: Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems . . . . . 5
1.4 Chapter 5: Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks . . . . . 5
1.5 Chapter 6: Optimization Problems in the Deployment of Sensor Networks . . . . . 6
1.6 Chapter 7: Congestion Control in Computer Networks . . . . . 6
1.7 Chapter 8: Persistent Autonomous Formations and Cohesive Motion Control . . . . . 7
1.8 Chapter 9: Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions . . . . . 7
1.9 Chapter 10: A Framework for Large-Scale Autonomous Multi-Robot Teams . . . . . 8
1.10 Chapter 11: Modeling and Control in Cancer Genomics . . . . . 8
1.11 Chapter 12: Modeling and Estimation Problems in the Visuomotor Pathway . . . . . 9
1.12 Chapter 13: Modeling, Simulation, and Control of Transportation Systems . . . . . 9
1.13 Chapter 14: Backstepping Controllers for Stabilization of Turbulent Flow PDEs . . . . . 10
1.14 Chapter 15: An Approach to Home Automation by Means of MAS Theory . . . . . 10
1.15 Chapter 16: Multi-Robot Social Group-Based Search Algorithms . . . . . 11

The modeling of complex dynamic systems has always been a challenging research topic due to the fact that mathematical models cannot accurately describe nature. A real system is often nonlinear, with infinite dimensions, noise, external disturbances, and characteristics that can vary with time. It is impossible to describe these dynamic characteristics with mathematical equations and achieve a high level of accuracy in the sense that for the same inputs the outputs of the model match those of the real system over the whole frequency spectrum. What is possible, however, and useful for all practical purposes, is to achieve model/system output matching over the frequency range of interest, which is often the low-frequency range. Models can be developed using physical laws as well as experiments and processing of data. Modeling is therefore not only a mathematical exercise but involves a good understanding of the system and its functionality. Once a model is developed it has to be validated using real data over the frequency spectrum of interest.

Complex models may be developed in an effort to understand the system, for diagnostic purposes, or to be used for control design. It is often the case that the model of the system is so complex that it is difficult, if at all possible, to use existing control tools to design a control scheme to meet the performance requirements. For example, a high-order model could lead to a high-order controller that cannot be implemented due to lack of adequate computer memory and computational speed. In such a case, the model would be simplified by reducing its order so that a simplified control design can be developed and implemented using available computational tools. In this case modeling is even more challenging, as decisions have to be made as to which phenomena and dynamics are neglected and which ones are modeled. The options in this case are to develop simplified models that accurately describe the dynamic characteristics of the system over the frequency range of interest, for which control design tools are available, or to develop new control tools applicable to the complex system under consideration. The first option is a characteristic of the traditional approaches to control design, when electronics and computational tools were not as advanced as they are today. The dramatic development of computers and microelectronics, with the simultaneous reduction in implementation costs, opened the way to designing complex control designs based on far more complex system models. In addition, the performance envelope is no longer restricted by the computational constraints of the past, and new control problems and areas emerged, bringing new challenges and the need for new, nontraditional control techniques.

The traditional modeling of a practical system as a linear, time-invariant system in the state-space form

    ẋ = Ax + Bu
    y = Cx + Du

or the input–output transfer function form

    y = G(s)u

served the needs of a wide class of control problems and continues to do so.
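The two forms are related by G(s) = C(sI − A)⁻¹B + D. A minimal numerical sketch of this relationship is given below; the second-order system used is invented purely for illustration.

```python
# Minimal sketch (not from the book): the two classical LTI model forms
# are related by G(s) = C (sI - A)^{-1} B + D.
import numpy as np

# A hypothetical second-order system (all values illustrative).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

def transfer_function_at(s: complex) -> complex:
    """Evaluate G(s) = C (sI - A)^{-1} B + D at one complex frequency."""
    n = A.shape[0]
    return (C @ np.linalg.solve(s * np.eye(n) - A, B) + D)[0, 0]

# The same input-output map, evaluated pointwise on the imaginary axis.
for w in (0.1, 1.0, 10.0):
    print(f"|G(j{w})| = {abs(transfer_function_at(1j * w)):.4f}")
```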

Therefore. and so on. manufacturing systems. there is a strong need for new modeling techniques and control designs to cope with the plethora of inputs and outputs involved in such complex systems. 1. A plethora of results and applications .Introduction to Modeling and Control of Complex Systems 3 served the needs of a wide class of control problems and continues to do so. Systems that fall into this category include biological systems. The reason is that many electromechanical systems are designed to behave as linear. and evolves over time. Below we present a brief summary of the different areas covered in the chapters to follow. These subsystems include continuous time as well as discrete time parts in a noisy environment. and computational speed. computers. transportation and computer networks. which involves modeling and control. Modeling complexity and controlling complex systems became an emerging area of interest for research. Complex systems include systems in which many subsystems and constituent elements interact with each other as well as with their environment in the presence of uncertainties and unpredictable phenomena. The purpose of this book is to bring together a number of experts dealing with the modeling and control of complex systems to present up-to-date progress as well as future directions. robotic systems. Most of the control problems that arise in these applications are nontraditional as complexity cannot be reduced as much as in traditional mechanical or electrical systems. sensor networks. The availability of these technologies opens the way to apply system theory. teaching. and applications. But from this mass of interactions patterns emerge so that the overall system performs in a satisfactory manner. the above classical model formulation is no longer adequate or applicable. As the class of modeling and control problems expands to include nonclassical systems or the need for expanding the performance envelope in an effort to reduce cost or squeeze more out of a system arises.1 Chapter 2: Control of Complex Systems Using Neural Networks Neural networks have been motivated from biological systems as a way of dealing with complex nonlinear systems. The goal of using neural networks or artificial neural networks to build systems with the ability to learn and control in a way very similar to biological systems has not yet been achieved the way it was promised. Instead neural networks have been used as nonlinear function approximators either off-line or online using adaptive control techniques to update the weights of the network. as well as to meet the challenges of new systems and performance requirements by taking advantage of the dramatic advances in sensors. information technologies. to many nontraditional electromechanical systems and networks of systems more complex than in the past. The diversity of the topic areas has the common theme of control and modeling as it is viewed and treated by experts with different backgrounds and approaches. time-invariant systems over the frequency range of interest. unmanned air vehicles. embedded systems.

A plethora of results and applications of neural network techniques have appeared in many areas of engineering dealing with modeling, function or mapping identification, and control, as well as optimization and optimal control over a finite time using neural networks. Although the results as presented in the literature are impressive, theoretical justifications are scarce. In this chapter the authors discuss issues related to neurocontrol that have arisen during the past fifteen years. The evolution of the field during this period and the principal ideas motivating them are also discussed. They provide a background on some mathematical preliminaries, results from linear and nonlinear control, and off-line and online training of neural networks for successful practical controllers. Because neural network-based control naturally leads to nonlinear control, and to nonlinear adaptive control when system characteristics are unknown, many of the current research problems are related to these areas. Concepts and structures suggested by classical (linear) adaptive control, results in nonlinear control theory (especially with respect to parameter convergence and identifiability), and the approximating capabilities of neural networks are judiciously combined to deal with the nonlinear adaptive control problems that arise in complex systems. Appropriate assumptions that have to be made at every stage, both to have well-posed problems and to make them mathematically tractable, are discussed extensively, and critical comments concerning methods currently in vogue in the control literature are provided. The authors briefly address global and stabilizability questions. Finally, the current status of industrial applications is described, with details related to the choice of the neural networks, the structures of identifiers and controllers, as well as the methods used to update their parameters.

1.2 Chapter 3: Modeling and Control Problems in Building Structures and Bridges

Future large building structures and bridges could employ a large number of sensors for diagnostic purposes as well as for active control in case of earthquakes and other external forces. Understanding and identifying the dynamics of these systems and finding ways to control them is an active area of research. In this chapter the authors address the modeling of realistic structural dynamic systems for purposes of simulation, active control, or structural health-monitoring applications. They provide a state-of-the-art approach, incorporating parametric as well as nonparametric system identification methods, for developing parsimonious nonlinear models of arbitrary structural systems, to represent many challenging types of stationary as well as nonstationary nonlinearities. The models developed can be used in a variety of applications, spanning the range from micro-electromechanical systems (MEMS) devices, to aerospace structures, to dispersed civil infrastructure systems. A wide variety of case studies is provided to illustrate the use of the modeling tools for online or off-line identification of nonlinearities, using experimental measurements as well as simulation results.
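The generic flavor of parametric identification can be shown in a few lines. The sketch below is ordinary least-squares (ARX) fitting, offered only as background rather than as the chapter's actual method; the "true" system, excitation, and noise level are invented.

```python
# Generic least-squares (ARX) identification sketch -- background material,
# not the chapter's method. The "true" system below is invented.
import numpy as np

rng = np.random.default_rng(3)
N = 500
u = rng.normal(size=N)                       # excitation input
y = np.zeros(N)
for k in range(2, N):                        # true system: y_k = 1.5 y_{k-1} - 0.7 y_{k-2} + u_{k-1}
    y[k] = 1.5 * y[k-1] - 0.7 * y[k-2] + u[k-1] + 0.05 * rng.normal()

Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])   # regressors for y[2:]
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print("estimated [a1, a2, b1]:", np.round(theta, 3))  # close to [1.5, -0.7, 1.0]
```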

1.3 Chapter 4: Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems

In this chapter the authors address the design of optimal H-infinity controllers for discrete-time systems by solving the linear quadratic zero-sum games that appear in the optimal control problem of complex linear systems. The method used to obtain the optimal controller is the approximate dynamic programming (ADP) technique. Two methods are presented to obtain the optimal controller, and both yield online algorithms. The first algorithm, heuristic dynamic programming (HDP), is used to find the optimal controller forward in time; in this algorithm the system model is assumed to be known. The second algorithm, referred to as action-dependent heuristic dynamic programming (ADHDP) or Q-learning, is an improved version of the first algorithm in the sense that knowledge of the system model is not needed. This leads to a model-free optimal controller design, which is in fact an adaptive control design that converges to the optimal H-infinity solution. To the best of the authors' knowledge, Q-learning provides the first direct adaptive control technique that converges to an H-infinity controller. The authors present a technique for online implementation, as well as convergence proofs of ADP methods for H-infinity discrete-time control.

1.4 Chapter 5: Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks

Sensor networks are a rather recent area of research that involves complex modeling, communication, control, and network problems. Despite the numerous research efforts in the field, there is still a large gap between theory and practice, especially when it comes to the design of higher-layer network protocols. The prevailing methodology for protocol design in this context is a bottom-up intuitive engineering approach, not a top-down process guided by solid mathematical understanding. In this chapter the authors present an illustrative case study showing how a distributed convex optimization framework can be used to design a rate control protocol for fair data gathering in wireless sensor networks. A distributed dual-based gradient search algorithm is proposed and illustrated. They believe that this kind of systematic modeling and optimization framework represents the future of protocol design in complex wireless networks such as sensor networks.
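The general dual-based gradient idea can be sketched briefly. The example below is generic network utility maximization, not the chapter's specific protocol: sources with log utilities react to per-link "prices," and each link adjusts its price by projected gradient ascent on the dual; the topology, capacities, and step size are invented.

```python
# Minimal sketch of dual-decomposition rate control (generic NUM example,
# not the chapter's protocol). Sources s maximize sum_s log(x_s) subject to
# link capacities; each link updates a dual "price", and each source sets
# x_s = 1 / (sum of prices on its path). Topology values are illustrative.
import numpy as np

R = np.array([[1, 1, 0],      # routing matrix: R[l, s] = 1 if source s uses link l
              [0, 1, 1]], dtype=float)
c = np.array([1.0, 2.0])      # link capacities
lam = np.ones(2)              # link prices (dual variables)
step = 0.05

for _ in range(5000):
    x = 1.0 / (R.T @ lam)                               # optimal rate given prices
    lam = np.maximum(1e-6, lam + step * (R @ x - c))    # projected price update

print("rates:", np.round(x, 3), " link loads:", np.round(R @ x, 3))
```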

1.5 Chapter 6: Optimization Problems in the Deployment of Sensor Networks

Chapter 6 addresses optimization problems in the deployment of sensor networks. The performance of a sensor network is sensitive to the location of its nodes in the mission space. This leads to the basic problem of deploying sensors in order to meet overall system objectives. This chapter describes system deployment problems for sensor networks viewed as complex dynamic systems. Initially, a deployment setting where data sources are known is taken into consideration. The main aim is to determine the locations of a given number of relay nodes and the corresponding link flows in order to minimize the total communication cost, which is transformed through a suitable representation into a nonlinear programming problem. Next, a deployment setting where data sources are unknown is taken into account. In this case, the sensing field is modeled by a density function representing the probability that specific events take place, while mobile nodes having limited range are introduced, and cooperative control comes into play so as to meet specific mission objectives. Taking into consideration the distributed communication and computation structure of sensor networks, a distributed deployment algorithm is applied at each mobile node so that it maximizes the joint detection probabilities of random events. Under dynamically changing sensing fields, the adaptive relocation behavior naturally follows from the optimal coverage formulation. Finally, communication cost is incorporated into the coverage control problem, which trades off sensing coverage and communication cost. The relevant cost functions serve as Lyapunov functions for the derived algorithms, thus demonstrating how local dynamics are coupled to achieve a global objective.
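As a rough illustration of the coverage formulation (a toy, centralized model, not the chapter's distributed algorithm; the density, sensing model, and step size are invented), the sketch below moves a few nodes by gradient ascent on the expected joint detection probability over a discretized mission space.

```python
# Illustrative sketch of coverage-driven deployment (toy model only):
# mobile nodes climb the gradient of the expected event-detection
# probability over a discretized mission space.
import numpy as np

rng = np.random.default_rng(1)
pts = np.stack(np.meshgrid(np.linspace(0, 1, 21), np.linspace(0, 1, 21)), -1).reshape(-1, 2)
w = np.exp(-8 * np.sum((pts - [0.7, 0.6]) ** 2, axis=1))   # event density (toy)
w /= w.sum()

def detect_prob(nodes):
    # p_i(p): per-node detection decays with distance; joint = 1 - prod(1 - p_i)
    d2 = ((pts[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    return 1.0 - np.prod(1.0 - np.exp(-20 * d2), axis=1)

def objective(nodes):
    return float(w @ detect_prob(nodes))

nodes = rng.uniform(0, 0.3, size=(4, 2))                   # 4 nodes, poor start
for _ in range(300):                                       # numerical gradient ascent
    g, eps, base = np.zeros_like(nodes), 1e-4, objective(nodes)
    for i in np.ndindex(*nodes.shape):
        n2 = nodes.copy(); n2[i] += eps
        g[i] = (objective(n2) - base) / eps
    nodes += 0.2 * g

print("expected detection probability:", round(objective(nodes), 3))
```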

1.6 Chapter 7: Congestion Control in Computer Networks

Congestion control of computer networks is another problem that deviates from the classical control problems of electromechanical systems. The lack of measurements and adequate local control actions makes both the modeling and control of traffic very challenging. Protocols have traditionally been designed using intuition and ad hoc techniques, and this approach has failed to produce protocols that satisfy all the design requirements. In this chapter the authors provide a survey of recent theoretical and practical developments in the design of Internet congestion control protocols based on the resource allocation view. In this framework the congestion control problem is viewed as a resource allocation problem. A theoretical framework is presented that has been used extensively in the last few years to design congestion control protocols with verifiable properties. Many of the algorithms derived have been shown to have globally stable equilibrium points in the presence of delays. However, for max-min congestion controllers the problem of asymptotic stability in the presence of delays still remains open. The performance of these algorithms in networks of arbitrary topology has been demonstrated through simulations and practical implementation. Finally, the authors present a number of global stability results that guide the proposal of a new adaptive congestion control protocol, which is shown through simulations to outperform previous proposals and to work effectively in a number of representative scenarios.

1.7 Chapter 8: Persistent Autonomous Formations and Cohesive Motion Control

The modeling and control of formations of agents, such as flying objects or robots, in order to follow certain trajectories as a single body with the ability to split and reconfigure, is another area of recent research activity. In this chapter the authors present autonomous multiagent formations in the framework of graph rigidity and persistence. They give useful characteristics of rigid and persistent graphs and their implications for the control of persistent formations. They also present some operational criteria to check the persistence of a given formation. Based on these characteristics and criteria they analyze certain persistence acquisition and maintenance tasks, which again deviate from the traditional control problem formulations. They also analyze cohesive motion of persistent autonomous formations and present a set of distributed control schemes to cohesively move a given persistent formation with specified initial position and orientation to an arbitrary desired final position and orientation.
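For background, graph rigidity can be tested numerically: a two-dimensional framework on n vertices is generically rigid exactly when its rigidity matrix has rank 2n − 3. The sketch below is standard material rather than the chapter's algorithms, and the vertex coordinates are arbitrary generic values.

```python
# Standard rigidity check (background, not the chapter's algorithms):
# a 2-D framework on n vertices is generically rigid iff its rigidity
# matrix has rank 2n - 3.
import numpy as np

def rigidity_rank(p, edges):
    """p: (n, 2) vertex positions; edges: list of (i, j) pairs."""
    n = p.shape[0]
    R = np.zeros((len(edges), 2 * n))
    for row, (i, j) in enumerate(edges):
        R[row, 2*i:2*i+2] = p[i] - p[j]     # edge-length gradient w.r.t. vertex i
        R[row, 2*j:2*j+2] = p[j] - p[i]     # and w.r.t. vertex j
    return np.linalg.matrix_rank(R)

rng = np.random.default_rng(2)
p = rng.random((4, 2))                      # generic positions of 4 vertices
square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: flexible (rank 4 < 5)
braced = square + [(0, 2)]                  # diagonal brace: minimally rigid
for name, E in [("square", square), ("braced", braced)]:
    print(name, "rank", rigidity_rank(p, E), "needed", 2 * 4 - 3)
```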

1.8 Chapter 9: Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions

Chapter 9 reviews unmanned aerial vehicle (UAV) technologies, including the system architecture, formation control, communications, networking, and computing technologies for the coordinated/collaborative control of UAV swarms. The assembly of multiple and heterogeneous vehicles is viewed as a "system of systems" where individual UAVs function as sensors or agents. Both current developments and future directions are addressed to improve the autonomy and reliability of UAVs. New modeling, communication, and computing technologies must be developed and validated if such complex unmanned systems are to perform effectively and efficiently, in conjunction with manned systems, in a variety of application domains.

1.9 Chapter 10: A Framework for Large-Scale Autonomous Multi-Robot Teams

The control of individual robots involves the dynamic characteristics of the electromechanical parts and the position and tracking accuracy required by the application. The control of multiple robots, to achieve a much wider class of tasks via coordination and interaction with each other, goes beyond the classical control techniques for individual robots. Chapter 10 discusses robotic teams comprised of heterogeneous members, which offer increased performance and redundancy in complex scenarios. The effectiveness of these teams requires that the team members take advantage of the strengths of one another to overcome individual limitations. Some of these strengths and deficiencies come in the form of locomotion, sensing, communication, processing, and available power. Many times larger robots can be used to transport, deploy, and recover smaller deployable robots, which have many unique capabilities, making them suitable for operation in scenarios that may be hazardous or harmful to human responders. There has been some work in the area of marsupial systems, but in general marsupial systems represent teams of two or three robots, and the basic design of the majority of marsupial systems does not have the scalability to handle larger-scale teams. The work presented in this chapter deals with the modeling of a much larger-scale robotic team that utilizes principles of marsupial systems. Each deployable member of the distributed robotic team has a specific model comprised of a series of behaviors dictating the actions of the robot. The transitions between these behaviors are used to guide the actions of both the deployable robots and the mobile docking stations that resupply them. Specifically, the power consumption of a large-scale robotic team is modeled and used to optimize the location of mobile resupply stations. Results from preliminary simulation are presented.

1.10 Chapter 11: Modeling and Control in Cancer Genomics

Systems biology and genomics is another important emerging area with many challenging modeling and control problems whose solution will have a tremendous impact in the field. Genomics study is important because cellular control and its failure in disease result from multivariate activity among cohorts of genes. Very recent research indicates that engineering approaches for prediction, signal processing, and control are quite well suited for studying this kind of multivariate interaction. In Chapter 11 the authors present an overview of the research accomplished thus far in the interdisciplinary field of cancer genomics, point out some of the research challenges that remain, and propose possible solutions to these challenges.

The authors model genetic regulatory networks using probabilistic Boolean networks (PBNs) whose state transition probabilities depend on an external (control) variable, and consider the issue of choosing the sequence of control actions to minimize a given performance index over a finite number of steps, an approach with direct application to cancer therapy. They illustrate these ideas for the real-life example of a melanoma cell line.

1.11 Chapter 12: Modeling and Estimation Problems in the Visuomotor Pathway

In Chapter 12 the authors describe modeling and estimation problems that arise in the animal visuomotor pathway. The pathway is particularly adept at tracking targets that are moving in space, acquiring and internally representing images of the target, and finally actuating a suitable motor action, such as capturing the target. The authors describe how a population of neurons models the dynamic activity of a suitable region of the turtle visual cortex, responding to a class of visual inputs. Using the model cortex, they show that the representations of the activity waves, viewed as "beta strands," are sufficiently different from each other to allow for alternative locations of point targets in the visual space. The representation is carried out first in the spatial domain and subsequently in the temporal domain over a sequence of sliding windows, and shows how the model cortex is able to discriminate the location of the target in the visual space. The discrimination is carried out using two separate algorithms. The first method utilizes statistical detection, wherein the activity waves generated by the visual cortex are encoded using principal components analysis; discrimination is carried out assuming that the noise is additive and Gaussian. In the second method, the beta strands are discriminated using a nonlinear dynamic system with multiple regions of attraction. Each beta strand corresponds to a suitable initialization of the dynamic system, and the states of attraction correspond to various target locations. The chapter concludes with a discussion of the motor control problem and how the cortical waves play a leading role in actuating movements that would track a moving target with some level of evasive maneuvers.

1.12 Chapter 13: Modeling, Simulation, and Control of Transportation Systems

Transportation networks are classical examples of complex dynamic systems, with many challenging problems related to modeling their behavior and controlling their dynamics. The use of advanced technologies for data collection and control makes the development of validated models and control feasible in dealing with such complex systems on the local and network levels.
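The finite-horizon intervention problem described for Chapter 11 has the classical dynamic-programming structure. The sketch below is not the chapter's model; the matrices, costs, and horizon are invented. It runs backward induction on a two-state controlled Markov chain whose transition matrix is switched by a binary control, loosely mimicking a PBN with an external control input.

```python
# Hedged sketch of finite-horizon control of a controlled Markov chain
# (invented numbers, loosely mimicking a PBN with a binary control input).
import numpy as np

P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),   # transition matrix, control off
     1: np.array([[0.6, 0.4], [0.1, 0.9]])}   # transition matrix, control on
cost = {0: np.array([0.0, 1.0]),              # state cost (state 1 undesirable)
        1: np.array([0.2, 1.2])}              # control adds a treatment cost
T = 5                                          # horizon (number of steps)

V = np.zeros(2)                                # terminal cost-to-go
policy = []
for t in range(T):                             # backward induction
    Q = np.stack([cost[u] + P[u] @ V for u in (0, 1)])   # Q[u, state]
    policy.append(Q.argmin(axis=0))            # best control in each state
    V = Q.min(axis=0)
policy.reverse()                               # policy[t][state] for t = 0..T-1

print("optimal expected cost from each state:", np.round(V, 3))
print("stage-0 policy (control in states 0, 1):", policy[0])
```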

Traffic flow and congestion phenomena are so complex that modeling techniques and computers are used to generate simulation models that describe the dynamic behavior of traffic networks. This chapter presents an overview of traffic flow modeling at the microscopic and macroscopic levels, a review of current traffic simulation software, and several methods for managing and controlling the various transportation system modes. Ramp metering and speed limit control techniques for current and future transportation systems are also discussed.

1.13 Chapter 14: Backstepping Controllers for Stabilization of Turbulent Flow PDEs

This chapter presents a backstepping approach to the control of the benchmark three-dimensional channel flow, where the actuation is the velocity components at one wall. This complex system is modeled by the Navier–Stokes equations. The model is linearized about a prescribed equilibrium, and the resulting linear model is described by a set of partial differential equations (PDEs). After a two-dimensional Fourier transform and an invertible change of variables, a continuum of uncoupled one-dimensional PDEs is derived to model the flow. Each one-dimensional model consists of a spatially noncausal subsystem, which is transformed into a causal subsystem via feedback. The backstepping approach is then used to develop the feedback controllers to decouple and stabilize the flow. Advantages of this method include: no spatial or temporal approximations are needed, and the "gains" can be precomputed because they are explicit functions of the Reynolds number and wave numbers (thus there is no need to solve high-dimensional Riccati equations).

1.14 Chapter 15: An Approach to Home Automation by Means of MAS Theory

In Chapter 15 the authors analyze and study home automation systems using a multi-agent system (MAS) framework. The appliances and devices in modern houses can be viewed as components that are essentially autonomous, possess a certain degree of intelligence, share resources and some common goals, and communicate among themselves. The problem of conceiving and developing efficient systems for home automation presents several difficult aspects, due to a number of factors such as distributed control structures, interoperability between components of different brands, hybrid time-driven/event-driven behaviors, and requirements of safe and efficient interaction with human users, which, all together, generate complexity.

The formalism derived from the MAS theory can, in principle, respond to these needs, providing a powerful conceptual framework and a number of appropriate methodological tools for coping with complexity, which arises mainly from the interaction between different components.

1.15 Chapter 16: Multi-Robot Social Group-Based Search Algorithms

In Chapter 16 the authors use various ideas from traditional search and rescue (SAR) theory, as practiced by ships and aircraft, and merge them with a more heuristic social group-based oriented search mechanism. They develop a multi-robot social group-based search algorithm, and simulate a group of robots detecting and tracking a target moving in a linear, nonlinear, and random walk manner, in both cooperative and noncooperative search scenarios. The cooperative searches involve both parties trying to locate each other, as in a SAR situation, and the noncooperative searches are typical in warfare environments where both parties search for each other but attempt to avoid detection. The robots are divided into two social groups: a faster moving group and a more energy-conserving group. Three algorithms are pursued: a robot search algorithm, a standard search algorithm using a multiradial search function and dispersion behavior, and a Voronoi search algorithm using a Voronoi decomposition of the search space prior to the commencement of multiradial search. The aim is to determine the effectiveness of the detection and the tracking ability of a moving target by groups of robots. The work is designed to lay the foundations of future studies in planar and three-dimensional submarine detection and tracking.
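The Voronoi decomposition step can be illustrated generically (this is not the chapter's implementation; the robot positions and grid are invented, and SciPy is assumed to be available): each cell of a gridded search area is assigned to the nearest robot, which then conducts its multiradial search within its own region.

```python
# Generic Voronoi-style partition of a search area among robots:
# each grid cell is owned by (assigned to) its nearest robot.
import numpy as np
from scipy.spatial import cKDTree

robots = np.array([[0.2, 0.3], [0.8, 0.7], [0.5, 0.1], [0.3, 0.9]])
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
cells = np.column_stack([gx.ravel(), gy.ravel()])

_, owner = cKDTree(robots).query(cells)       # Voronoi region = nearest robot
areas = np.bincount(owner, minlength=len(robots)) / len(cells)
for i, a in enumerate(areas):
    print(f"robot {i} searches ~{a:.0%} of the area")
```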


2
Control of Complex Systems Using Neural Networks

Kumpati S. Narendra, Matthias J. Feiler, and Zhiling Tian

CONTENTS
2.1 Introduction . . . . . 15
  2.1.1 Historical Background . . . . . 15
  2.1.2 Artificial Neural Networks (ANNs) . . . . . 16
    2.1.2.1 Feedforward Networks . . . . . 16
    2.1.2.2 Recurrent Networks . . . . . 17
  2.1.3 ANNs for Control . . . . . 18
  2.1.4 Objectives of Chapter . . . . . 19
  2.1.5 Organization of Chapter . . . . . 19
2.2 Mathematical Preliminaries . . . . . 20
  2.2.1 Linear Time-Invariant Systems . . . . . 20
    2.2.1.1 Controllability, Observability, and Stability . . . . . 21
    2.2.1.2 ARMA Model . . . . . 22
    2.2.1.3 Minimum Phase Systems . . . . . 23
  2.2.2 Nonlinear Systems . . . . . 23
    2.2.2.1 Controllability, Observability, and Stability . . . . . 25
  2.2.3 System Approximation . . . . . 25
  2.2.4 Problem Statement . . . . . 26
2.3 Adaptive Systems: Theoretical Considerations . . . . . 27
  2.3.1 Linear Control and Linear Adaptive Control . . . . . 27
  2.3.2 Nonlinear Adaptive Control: Stability and Design . . . . . 29
  2.3.3 Assumptions, Adaptive Laws, and Stability . . . . . 30
    2.3.3.1 Plant . . . . . 30
    2.3.3.2 Stable Adaptive Laws: Error Models . . . . . 31
    2.3.3.3 Error Models for Nonlinear Systems . . . . . 32
  2.3.4 An Area for Future Research . . . . . 34
2.4 Identification and Control Methods . . . . . 35
  2.4.1 Function Approximation Using Neural Networks . . . . . 35
    2.4.1.1 Higher-Order Functions . . . . . 36
    2.4.1.2 Radial Basis Function Network . . . . . 36
  2.4.2 Parameter Optimization . . . . . 37
    2.4.2.1 Gradient-Based Methods and Stability . . . . . 39
  2.4.3 Adjustment of Parameters: Feedforward and Recurrent Networks . . . . . 40
    2.4.3.1 Back Propagation through Time . . . . . 41
    2.4.3.2 Dynamic Back Propagation . . . . . 41
    2.4.3.3 Real-Time Recurrent Learning . . . . . 42
  2.4.4 Interconnection of LTI Systems and Neural Networks . . . . . 43
  2.4.5 Theoretical and Practical Stability Issues . . . . . 43
    2.4.5.1 Modeled Disturbances and Multiple Models for Rapidly Varying Parameters . . . . . 44
  2.4.6 Identification and Control Based on Linearization . . . . . 45
    2.4.6.1 Practical Design of Identifiers and Controllers (Linearization) . . . . . 48
    2.4.6.2 System Representation . . . . . 50
    2.4.6.3 Model . . . . . 53
  2.4.7 Linear Adaptive Control . . . . . 54
  2.4.8 Nonlinear Adaptive Control . . . . . 55
    2.4.8.1 Nonlinear Adaptive Control Using Linearization Methods . . . . . 59
    2.4.8.2 Discrete Time (System Unknown) . . . . . 59
  2.4.9 Control of Nonlinear Multivariable Systems . . . . . 60
  2.4.10 Related Current Research . . . . . 60
2.5 Global Control Design . . . . . 64
  2.5.1 System Theoretic Properties . . . . . 64
    2.5.1.1 Global Observability . . . . . 66
    2.5.1.2 Dynamics on Manifolds . . . . . 71
    2.5.1.3 Global Controllability and Stabilization . . . . . 71
  2.5.2 Interconnected Systems . . . . . 72
2.6 Optimization and Optimal Control Using Neural Networks . . . . . 72
  2.6.1 Neural Networks for Optimal Control . . . . . 73
  2.6.2 Dynamic Programming in Continuous and Discrete Time . . . . . 74
    2.6.2.1 Continuous Time (No Uncertainty) . . . . . 78
    2.6.2.2 Discrete Time (No Uncertainty) . . . . . 78
  2.6.3 Other Formulations . . . . . 79
  2.6.4 Computational Advantage . . . . . 80
2.7 Applications of Neural Networks to Control Problems . . . . . 81
  2.7.1 Application 1: Controller in a Hard Disk Drive . . . . . 85
    2.7.1.1 Objective . . . . . 85
  2.7.2 Application 2: Lean Combustion in Spark Ignition Engines . . . . . 85
    2.7.2.1 Objective . . . . . 86
    2.7.2.2 Method . . . . . 86
  2.7.3 Application 3: MIMO Furnace Control . . . . . 86
  2.7.4 Application 4: Fed-Batch Fermentation Processes . . . . . 86
  2.7.5 Application 5: Automotive Control Systems . . . . . 87
  2.7.6 Application 6: Biped Walking Robot . . . . . 87
  2.7.7 Application 7: Real-Time Predictive Control in the Manufacturing Industry . . . . . 87
    2.7.7.1 Objective . . . . . 87
    2.7.7.2 Method . . . . . 88
    2.7.7.3 Adaptive Laws and Control Law . . . . . 89
  2.7.8 Application 8: Multiple-Models: Switching and Tuning . . . . . 90
  2.7.9 Application 9: Biological Control Structures for Engineering Systems . . . . . 90
  2.7.10 Application 10: Preconscious Attention Control . . . . . 91
2.8 Comments and Conclusions . . . . . 92
Acknowledgments . . . . . 93
References . . . . . 94

2.1 Introduction

2.1.1 Historical Background

The term artificial neural network (ANN) has come to mean any computer architecture that has massively parallel interconnections of simple processing elements. As an area of research it is of great interest due to its potential for providing insights into the kind of highly parallel computation performed by physiological nervous systems. Research in the area of artificial neural networks has had a long and interesting history, marked by periods of great activity followed by years of fading interest and revival due to new engineering insights [1]–[8], technological developments, and advances in biology. The latest period of explosive growth in pure and applied research in both real and artificial neural networks started in the 1980s, when investigators from across the scientific spectrum were attracted to the field by the prospect of drawing ideas and perspectives from many different disciplines. Many of them also believed that an integration of the knowledge acquired in the different areas was possible. Among these were control theorists like the first author, who were inspired by the ability of biological systems to retrieve contextually important information from memory and process such information to interact efficiently with uncertain environments. They came to the field with expectations of building controllers based on artificial neural networks with similar information processing capabilities. At the same time they were also convinced that the design of such controllers should be rooted in the theoretical research in

2.1.2 Artificial Neural Networks (ANNs)

From the point of view of systems theory, an artificial neural network (ANN, henceforth referred to as a neural network) can be regarded as a finitely parameterized, efficiently computable, and practically implementable family of transformations. The fact that they are universal approximators, involve parallel distributed processing, can be implemented in hardware, are capable of adaptation online, and are easily applied to multivariable systems made them attractive as components and subsystems in various applications.

In the early 1980s, extensive computer simulation studies were carried out to demonstrate that such networks could approximate very well nearly all functions encountered in practical applications, and vast amounts of empirical evidence began to accumulate. Since approximation theory is at the core of many systems-related disciplines, these claims led Hornik, Stinchcombe, and White [11] to raise the question whether these were merely flukes, or whether the observed successes were reflective of some deep and fundamental approximating capabilities. Following this, as a result of the work of numerous authors [9]–[11], it was shown conclusively that neural networks are universal approximators in a very precise and satisfactory sense; as stated in their seminal paper, this provided mathematical justification for them. With this, the study of neural networks left its empirical origins and became a mathematical discipline. During the late 1980s, the new results found wide application in such areas as pattern recognition, identification, and optimization.

2.1.3 ANNs for Control

Even as the above ground-breaking developments in static optimization were taking place, it was suggested in 1990 [12] that feedforward neural networks could also be used as components in feedback systems, because the approximation capabilities of such networks could be used in the design of identifiers and controllers for unknown or partially known dynamic systems. This, in turn, gave rise to a frenzy of activity in the neural network control community, and numerous heuristic methods were proposed in the following years for the control of nonlinear processes. As in the past, it once again became evident that more formal methods, grounded in mathematical systems theory, would have to be developed to quantitatively assess the capabilities as well as limitations of neurocontrol.

2.1.3.1 Linear Control and Linear Adaptive Control

The objective of control is to influence the behavior of dynamic systems. This includes maintaining the outputs of the system at constant values (regulation), or forcing them to follow prescribed time functions (tracking). The control problem is to use all available data at every instant and determine the control inputs to the system.

Achieving fast and accurate control, while assuring stability and robustness in the presence of perturbations, is the aim of all control design.

The best developed part of control theory deals with linear systems. Starting with the state description of linear systems, system theoretic properties such as controllability, observability, stabilizability, and detectability were investigated in the 1960s and 1970s, and the results were used to stabilize and control such systems using state feedback. Later, through the use of observers, the methods were extended to control both single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems in which all the state variables are not accessible. Many of the concepts and methods developed for the control of a single dynamic system were also extended to larger classes of systems where two or more subsystems are interconnected to achieve different objectives. In fact, current research in control theory includes many problems related to the decentralized control of linear systems using incomplete information about their interconnections.

Classical adaptive control deals with the control of linear, time-invariant dynamic systems when some of the parameters are unknown. Because the same control problems were attempted in the modified context, the evolution of the field of adaptive control closely paralleled that of linear control theory. Hence, all the theoretical advances in linear control theory were directly relevant to its development. However, as adaptive systems are invariably nonlinear, the principal difficulties encountered were significantly different. During the period 1970 to 1980 the emphasis was on generating adaptive laws that would assure the stability of the overall system, and the asymptotic convergence of the performance of the adaptive system to that predicted by linear theory. The extensive results in linear control theory, and subsequently in linear adaptive control theory, have strongly influenced the evolution of neural network-based control, and we recapitulate briefly in this section some of the principal concepts and results. Some further mathematical details concerning these are included in the following section.

The focus of the chapter is on theoretical developments, primarily on the methods for generating appropriate control inputs. Although the authors have carried out extensive simulation studies during the past fifteen years to test many of these methods, no simulation results are included here.

2.1.3.2 Control of Complex Systems

Our objective, as indicated by the title of the chapter, is to control complex systems using neural networks. In spite of numerous efforts on the part of researchers in the past, there is currently no universally accepted definition of a “complex system.” Like many other terms in control theory, it is multifaceted, and its definition cannot be compressed into a simple statement. At the same time, most researchers would agree on many of the characteristics that would make a system complex. Among these, the presence of nonlinear dynamics in the plant (or process) to be controlled would be included as one of the more significant ones. An additional source of complexity is the high dimensionality of the state and parameter spaces. This means that the system cannot be modeled using “representative” variables of reduced dimensionality. Other characteristics of complex systems would include uncertainties or time variations in system behavior, and operation of the system far from equilibrium.

Complex systems are typically composed of many interconnected subsystems which mutually influence the evolution of their state variables. Such problems have been studied extensively in the context of linear systems by means of matrix theory, and notions such as diagonal dominance have been coined to quantify the strength of the interconnections. The principal difficulty is, again, that the couplings may be nonlinear. In some cases, the effect of the coupling dominates the dynamics of the system. This is sometimes referred to as emergent behavior and is one of the manifestations of complexity, as it cannot be explained by simply aggregating the behaviors of the constituent systems. In a large set of interconnected systems, the role of each individual system may be small, but together they constitute a powerful whole. The neural network itself is a prime example of such an interconnected system, capable of realizing higher-order functionality at the network level.

In the following sections, neural networks are used primarily as controllers in dynamic systems to cope with either known or unknown nonlinearities, making the overall system under consideration both nonlinear and adaptive. The identification and control of an isolated nonlinear plant should therefore fall within the ambit of our investigations.

2.1.3.3 Nonlinear Adaptive Control: Stability and Design

It is a truism in control practice that efficient controllers for specific classes of dynamic systems can be designed only when their stability properties are theoretically well understood. In spite of advances in stability theory for over two centuries, our knowledge of the stability of general nonlinear systems is quite limited. Even if a stable equilibrium exists, the system may be prevented from approaching it by external disturbances or input signals in general. This explains why controllers can be designed with confidence at the present time for both linear, time-invariant (LTI) plants with known parameters and those with unknown but constant parameters (i.e., linear adaptive systems).

The advantages of using neural networks as components in dynamic systems were stated earlier in this section. Although the qualitative statements made in that context are for the most part valid and make neural networks attractive in static situations such as pattern recognition and optimization, their use in dynamic contexts involving control raises a host of problems that need to be resolved before they can be used with confidence. This makes the stability of nonlinear adaptive systems containing neural networks a truly formidable problem. Hence, while discussing the use of neural networks for identification and control, it is incumbent upon the authors to state precisely the class of plants considered, the prior information available to the designer concerning the system, the domain of interest in the state space, the external perturbations that may be present, and the manner in which new information concerning the unknown plant is acquired (i.e., online or off-line) and utilized.

2.1.3.4 Assumptions

Because the primary difficulty in most of the problems mentioned earlier arises due to the presence of nonlinearities in the representation of the system, it is not surprising that a wide spectrum of methods have been proposed in the literature by many authors making different assumptions. These assumptions determine the mathematical tractability of the problems, but at the same time also determine the extent to which the procedures developed will prove practically feasible. As is well known to experienced researchers, and succinctly stated by Feldkamp et al. [13], apparently difficult problems can be made almost trivial by unreasonably optimistic assumptions.

2.1.4 Objectives of Chapter

The first objective of the chapter is to discuss in detail the methods that are currently available for the control of a nonlinear plant with unknown characteristics, using neural networks. In particular the chapter will examine the efforts made by different investigators to extend principles of linear control and linear adaptive control to such problems. It will examine the assumptions they have made, the corresponding approaches they have proposed, the theoretical justification they provide for stability and robustness, and the conditions under which the results are valid. In this context we also include our own approach to the same adaptive control problems.

At the present time, there is considerable research activity in the use of neural networks in optimization and optimal control problems in the presence of uncertainty, and we believe that it would be a great omission on our part if we failed to comment on it. We therefore devote a section to this important topic.

When the identifier and controller for a nonlinear plant are neural networks, we have the beginnings of interconnected neural networks. When many such are interconnected as described earlier, we have a network of neural networks. As we believe that this is the direction in which the field is bound to evolve in the future, we include a typical problem for future investigation, merely to clarify the principal concepts involved.

Finally, our objective is to present and comment on some successful applications of the theory in practical control problems, as well as briefly touch upon some not so conventional applications that are currently under investigation which address the same issues mentioned earlier and, if successful, will provide greater motivation for the use of neural networks in control. We conclude with a statement concerning our position regarding the current status of the field of neurocontrol.

2.1.5 Organization of Chapter

In this chapter we attempt to discuss many of the issues related to neurocontrol that have arisen during the past fifteen years. Section 2.2 is devoted to mathematical preliminaries and includes results from linear and nonlinear control, as well as concepts for adaptive control that are useful for later discussions. The section concludes with a statement of the problems discussed in the chapter. Section 2.3 introduces feedforward and recurrent networks used to practically realize the control laws, and the methods used to update their parameters. Because neural network-based control naturally leads to nonlinear control, results in nonlinear control theory, concepts and structures suggested by classical (linear) adaptive control, and the approximating capabilities of neural networks have to be judiciously combined to deal with the nonlinear adaptive control problems that arise in complex systems. Appropriate assumptions have to be made at every stage, both to have well-posed problems and to make them mathematically tractable. These are contained in Section 2.4, which concludes with some critical comments concerning methods currently in vogue in the neurocontrol literature. In Section 2.5, global stabilizability questions are discussed, because the authors believe that such concepts are essential for our understanding of the nonlinear domain and will be encountered increasingly in neurocontrol in the future. Section 2.6 is devoted to optimization and optimal control over a finite time using neural networks; many of the current research problems are related to these areas. Finally, the current status of applications is discussed in Section 2.7.

2.2 Mathematical Preliminaries

Well-known results from linear and nonlinear control that are used throughout the chapter are presented in a condensed form in this section for easy reference. The section concludes with the statement of the identification and control problems that are investigated in the following sections.

2.2.1 Linear Time-Invariant Systems

A general multiple-input multiple-output (MIMO), linear, time-invariant, continuous-time system Σc (discrete-time system Σd) is described by the vector differential (difference) equation:

    Σc: ẋ(t) = Ax(t) + Bu(t),    y(t) = Cx(t)
    Σd: x(k + 1) = Ax(k) + Bu(k),    y(k) = Cx(k)        (2.1)

where u(t) ∈ R^r, y(t) ∈ R^m, and x(t) ∈ R^n, and A, B, and C are constant matrices with A ∈ R^(n×n), B ∈ R^(n×r), and C ∈ R^(m×n), respectively; u(t), y(t), and x(t) are, respectively, the input, the output, and the state of the system at time t.
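As a minimal illustration of Equation (2.1), the following Python sketch (using NumPy) simulates a few steps of the discrete-time system Σd; the matrices are arbitrary placeholder values, not drawn from the chapter.

    import numpy as np

    # Example discrete-time LTI system, Equation (2.1): arbitrary stable matrices
    A = np.array([[0.9, 0.1],
                  [0.0, 0.7]])
    B = np.array([[0.0],
                  [1.0]])
    C = np.array([[1.0, 0.0]])

    x = np.zeros((2, 1))
    for k in range(5):
        u = np.array([[1.0]])      # constant input u(k)
        y = C @ x                  # y(k) = C x(k)
        x = A @ x + B @ u          # x(k+1) = A x(k) + B u(k)
        print(k, float(y))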

2.2.1.1 Controllability, Observability, and Stability

Controllability, observability, and stability are system theoretic properties that play important roles in systems-related problems. The following definitions and basic results can be found in standard textbooks on linear systems [14]. In the discussions that follow, we will deal with single-input single-output (SISO) systems (where r = m = 1) for clarity, and extend the results to the MIMO case. The SISO system is then described by the equation:

    Σc: ẋ(t) = Ax(t) + bu(t),    y(t) = cx(t)
    Σd: x(k + 1) = Ax(k) + bu(k),    y(k) = cx(k)        (2.2)

where b and c^T are constant vectors in R^n.

A system is said to be controllable if any initial state can be transferred to any final state by the application of a suitable control input. The SISO system Σc (Σd) described in Equation (2.2) is controllable if the matrix

    Wc = [b, Ab, A²b, ..., A^(n−1)b]        (2.3)

is nonsingular. The MIMO system Σc (Σd) in Equation (2.1) is controllable if the (n × nr) matrix [B, AB, ..., A^(n−1)B] is of rank n.

The dual concept of controllability is observability. A system is said to be observable if the initial state (and hence all subsequent states) of the system can be determined by observing the system output y(·) over a finite interval of time. For a SISO system (2.2), the condition for observability is that the matrix

    Wo = [c^T, A^T c^T, ..., A^((n−1)T) c^T]        (2.4)

be nonsingular. For MIMO systems, the condition is that the (n × mn) matrix [C^T, A^T C^T, ..., A^((n−1)T) C^T] is of rank n.

The third system theoretic property that is crucial to all control systems is stability, and it depends on the matrix A. Σc is stable if the eigenvalues of A lie in the open left half plane (Σd is stable if the eigenvalues of A lie in the interior of the unit circle).

Controllability and stability: For LTI systems (2.2) it is known that if the pair (A, b) is controllable, the system can be stabilized by state feedback, that is, u = k^T x.

Estimation and control: When Σc (Σd) is represented by the triple (c, A, b), which is controllable and observable, an important result derived in the 1970s assures the existence of a control input that can stabilize the system. The state x of the system is estimated as x̂ and used to determine the stabilizing input u = k^T x̂.

2.2.1.2 ARMA Model

The proper representation of a discrete-time LTI system Σd in terms of only inputs and outputs is an important consideration in the mathematical tractability of many control problems.
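The rank and eigenvalue tests above are easy to carry out numerically. The following sketch checks controllability, observability, and stability of a discrete-time triple (c, A, b); the matrices are arbitrary illustrative values.

    import numpy as np

    A = np.array([[0.5, 1.0],
                  [0.0, 0.8]])
    b = np.array([[0.0],
                  [1.0]])
    c = np.array([[1.0, 0.0]])
    n = A.shape[0]

    # Controllability matrix Wc = [b, Ab, ..., A^(n-1) b], Equation (2.3)
    Wc = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    # Observability matrix stacked row-wise, the transpose of Equation (2.4)
    Wo = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])

    controllable = np.linalg.matrix_rank(Wc) == n
    observable = np.linalg.matrix_rank(Wo) == n
    # Sigma_d is stable if all eigenvalues of A lie inside the unit circle
    stable = np.all(np.abs(np.linalg.eigvals(A)) < 1.0)
    print(controllable, observable, stable)   # -> True True True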

From Equation (2.2), the following input-output relation can be obtained:

    y(k + n) = cA^n x(k) + Σ_{i=0}^{n−1} cA^(n−1−i) b u(k + i)        (2.5)

From the above equation the following ARMA (autoregressive moving average) representation can be derived for SISO systems:

    y(k + 1) = Σ_{i=0}^{n−1} ᾱᵢ y(k − i) + Σ_{j=0}^{n−1} β̄ⱼ u(k − j)        (2.6)

where ᾱᵢ and β̄ⱼ are constants (i, j = 1, 2, ..., n). The same also applies to MIMO systems, where

    y(k + 1) = Σ_{i=0}^{n−1} Aᵢ y(k − i) + Σ_{j=0}^{n−1} Bⱼ u(k − j)        (2.7)

and Aᵢ and Bⱼ are constant (m × m and m × r) matrices.

If, in the SISO system (2.2), cb, cAb, ..., cA^(d−2)b are zero, but cA^(d−1)b ≠ 0, the system is said to have a relative degree d. Hence, the input u(k) at time k affects the output at time (k + d) but not earlier. For LTI systems this is merely the delay through the system. From Equation (2.5), it can be shown that the system has a representation:

    y(k + d) = Σ_{i=0}^{n−1} αᵢ y(k − i) + Σ_{j=0}^{n−1} βⱼ u(k − j)        (2.8)

where αᵢ and βⱼ are constants.

For MIMO systems with m inputs and m outputs (r = m), each output yᵢ(·) has a relative degree dᵢⱼ to the jth input uⱼ. The relative degree dᵢ is then defined as dᵢ = min_j {dᵢⱼ}, and represents the smallest time in which some input can affect the ith output. Hence, each of the m outputs has a clearly assigned relative degree denoted by the elements of the vector d = [d₁, d₂, ..., d_m]^T. Using the same procedure as in the SISO case, we obtain the following input-output relation for the MIMO system:

    Y(k + d) = [y₁(k + d₁), y₂(k + d₂), ..., y_m(k + d_m)]^T
             = Σ_{i=0}^{n−1} Aᵢ y(k − i) + Σ_{j=0}^{n−1} Bⱼ u(k − j)        (2.9)

where Aᵢ and Bⱼ are matrices of appropriate dimensions.

2.2.1.3 Minimum Phase Systems

A question that arises in control theory is whether or not internal signals in the system can become unbounded while the observed outputs remain bounded.
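The ARMA recursion (2.8) is straightforward to realize in code. The sketch below runs the recursion for a step input; the coefficients, orders, and input are arbitrary placeholder choices, not values from the text.

    import numpy as np

    # ARMA realization of Equation (2.8) with n = 2, d = 1
    n = 2
    alpha = np.array([0.6, -0.1])    # alpha_0 ... alpha_{n-1}
    beta = np.array([1.0, 0.5])      # beta_0 ... beta_{n-1}

    def arma_step(y_past, u_past):
        """y_past = [y(k), ..., y(k-n+1)], u_past = [u(k), ..., u(k-n+1)]."""
        return alpha @ y_past + beta @ u_past

    y_hist, u_hist = np.zeros(n), np.zeros(n)
    for k in range(20):
        u = 1.0                                  # step input u(k)
        u_hist = np.r_[u, u_hist[:-1]]
        y_next = arma_step(y_hist, u_hist)       # y(k+1)
        y_hist = np.r_[y_next, y_hist[:-1]]
    print(y_hist[0])                             # settles near the steady state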

In terms of Equation (2.8), the question can be posed as follows: Is it possible for lim_{k→∞} y(k) to be zero while the input u(k) grows in an unbounded fashion? Obviously such a situation is possible if

    β₀u(k) + β₁u(k − 1) + ··· + βₙu(k − n) = 0        (2.10)

has unbounded solutions. It can be shown that this is equivalent to the equation:

    β₀z^n + β₁z^(n−1) + ··· + βₙ = 0        (2.11)

having at least one root outside the unit circle. Alternatively, a necessary and sufficient condition for the question to have a negative answer is that all the roots of Equation (2.11) (representing the zeros of the transfer function of the SISO system) lie inside the unit circle. We refer to such a system as a “minimum phase system.”

2.2.2 Nonlinear Systems

Finite-dimensional, continuous-time, and discrete-time nonlinear systems can be described by state equations of the form:

    Σc: ẋ(t) = F(x(t), u(t)),    y(t) = H(x(t))
    Σd: x(k + 1) = F[x(k), u(k)],    y(k) = H[x(k)]        (2.12)

Work in the area of nonlinear control has been in progress for many decades, and numerous attempts have been made to obtain results that parallel those in linear theory (refer to Section 2.4). We include here well-established results concerning such systems which are related to their linearizations (refer to Section 2.5).

2.2.2.1 Controllability, Observability, and Stability

The definitions of controllability, observability, and stability in the nonlinear case are identical to those in the linear case. However, obtaining general conditions to assure these properties in a domain D in the state space is substantially more complex.

2.2.2.1.1 Controllability

If the state x(0) of the discrete-time system Σd in Equation (2.12) is to be transferred to the state x(n) by the application of a suitable input u, the following equation has to be satisfied:

    x(n) = F[··· F[F[x(0), u(0)], u(1)] ···, u(n − 1)] = Φ[x(0), Un(0)]        (2.13)

where Un(0) = {u(0), u(1), ..., u(n − 1)} is an input sequence of length n. The problem of controllability at time k = 0 is evidently one of determining the existence of Un(0) that will satisfy Equation (2.13) for any specified x(0) and x(n).
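Returning to the minimum-phase condition of Equations (2.10) and (2.11), the test reduces to locating the zeros of the numerator polynomial. A minimal sketch, with arbitrary illustrative coefficients:

    import numpy as np

    # Minimum-phase test: all roots of beta_0 z^n + ... + beta_n, Equation (2.11),
    # must lie strictly inside the unit circle.
    beta = [1.0, -0.4, 0.03]          # beta_0, beta_1, beta_2 (placeholders)

    zeros = np.roots(beta)
    is_minimum_phase = np.all(np.abs(zeros) < 1.0)
    print(zeros, is_minimum_phase)    # roots 0.3 and 0.1 -> True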

2.2.2.1.2 Observability

Similarly, observability can be defined by considering the equations:

    y(k) = H[x(k)] = Φ₁[x(k)]
    y(k + 1) = H[x(k + 1)] = H[F[x(k), u(k)]] = Φ₂[x(k), u(k)]
    ···
    y(k + n − 1) = H[x(k + n − 1)] = Φₙ[x(k), u(k), u(k + 1), ..., u(k + n − 2)]        (2.14)

Given the sequence Yn(k) = {y(k), y(k + 1), ..., y(k + n − 1)} and the input sequence Un(k) = {u(k), u(k + 1), ..., u(k + n − 1)}, observability implies that the state x(k), and hence x(k + 1), ..., x(k + n), can be determined.

2.2.2.1.3 Inverse Function Theorem and the Implicit Function Theorem

Both controllability and observability consequently involve the solutions of nonlinear algebraic equations. Two fundamental theorems of analysis that are useful in this context are the inverse function theorem and the implicit function theorem.

Inverse function theorem: Let U be an open set in R^n and let f: U → R^n be a C^k function with k ≥ 1. If a point x̄ ∈ U is such that the matrix Df(x̄) is invertible, then there exists an open neighborhood V of x̄ in U such that f: V → f[V] is invertible with a C^k inverse. By the inverse function theorem, if x̄ is the solution of the vector equation f(x) = c, then the equation can also be solved in the neighborhood of x̄ if the Jacobian matrix Df(x)|_{x=x̄} is nonsingular.

The implicit function theorem extends this result to equations that are functions of x and y, where the solution y is desired as a unique function of x.

Implicit function theorem: Let U be an open set in R^m × R^n and let f: U → R^n be a C^k function with k ≥ 1. Let (x̄, ȳ) ∈ U, where x̄ ∈ R^m and ȳ ∈ R^n, with f(x̄, ȳ) = c. If the (n × n) matrix D_y f(x̄, ȳ) of partial derivatives is invertible, then there are open sets V_m ⊂ R^m and V_n ⊂ R^n with (x̄, ȳ) ∈ V_m × V_n ⊂ U and a unique C^k function φ: V_m → V_n such that f(x, φ(x)) = c for all x ∈ V_m. Moreover, f(x, y) = c if (x, y) ∈ V_m × V_n and y = φ(x).

The following important theorem, derived using the implicit function theorem, is the starting point of all the local results derived for nonlinear control, and is stated without proof.

THEOREM
Let the linearized equations of (2.12) around the origin be

    z(k + 1) = Az(k) + bu(k)
    w(k) = cz(k)        (2.15)

where A = ∂F/∂x |_{x=0,u=0}, b = ∂F/∂u |_{x=0,u=0}, and c = ∂H/∂x |_{x=0,u=0}. If the linearized system (2.15) is controllable (observable), the nonlinear system (2.12) is controllable (observable) in some neighborhood of the origin.
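The theorem suggests a simple numerical recipe: linearize F around the origin and test the linearization as in Section 2.2.1.1. A minimal sketch, using finite differences and an arbitrary illustrative map F (not one from the chapter):

    import numpy as np

    def F(x, u):
        # placeholder nonlinear map with F(0, 0) = 0
        return np.array([x[1] + 0.1 * np.sin(x[0]),
                         -0.2 * x[0] + u + x[1] ** 2])

    def jacobians(F, n, eps=1e-6):
        A, b = np.zeros((n, n)), np.zeros((n, 1))
        x0, u0 = np.zeros(n), 0.0
        f0 = F(x0, u0)
        for i in range(n):
            dx = np.zeros(n); dx[i] = eps
            A[:, i] = (F(x0 + dx, u0) - f0) / eps   # A = dF/dx at the origin
        b[:, 0] = (F(x0, eps) - f0) / eps           # b = dF/du at the origin
        return A, b

    A, b = jacobians(F, 2)
    Wc = np.hstack([b, A @ b])                      # [b, Ab] for n = 2
    print(np.linalg.matrix_rank(Wc) == 2)           # True -> locally controllable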

The controllability and observability of the linearized system are merely sufficient conditions for the corresponding properties to hold for (2.12), and are not necessary. Yet the theorem is important, as according to it the nonlinear system is well behaved in a neighborhood of the origin if the linearized system is well behaved. The relevance of these comments will be made clear in Sections 2.4 and 2.5.

2.2.2.1.4 Stability

If A in Equation (2.15) is a stable matrix, from Lyapunov's works it is well known that the nonlinear system Σd is asymptotically stable in a neighborhood of the origin.

2.2.3 Adaptive Systems: Theoretical Considerations

All the problems discussed in this chapter can be considered as infinite-time or finite-time problems in nonlinear adaptive control. The term “adaptive control” refers to the control of partially known systems. Linear adaptive control deals with the identification and control of LTI systems with unknown parameters [15]. The class of nonlinear adaptive control problems of interest in this chapter are those in which the nonlinear functions F(·) and H(·) in the description of the controlled plant (2.12) are unknown or partially known. Obviously, this represents a very large class of systems for which general analytic methods are hard to develop. Many subclasses may have to be defined and suitable assumptions may have to be made to render them analytically tractable.

In spite of the complexity of nonlinear adaptive systems, many of the questions that they give rise to are closely related to those in the linear case. Because the statement of the problems in the linear case, the assumptions made, and the reasons for the difficulties encountered are all directly relevant for the issues discussed in this chapter, we provide a brief introduction to them in this section. Although the latter seem simple (in hindsight), it is worth stressing that linear adaptive control gave rise to a multitude of difficult questions for a period of forty years and that some of them have not been completely answered thus far.

The plant to be controlled is described by the linear state Equations (2.1) or (2.2), depending upon whether the system is MIMO or SISO. If only the inputs and outputs are accessible, the ARMA representations (2.5) and (2.6) are used. In all cases, the parameters of the plant are assumed to be unknown.

Identification and control: Identification involves the estimation of the unknown parameters of the plant using either historical input-output data or online input-output measurements. Indirect adaptive control involves the adjustment of controller parameters based on the estimates of the plant parameters.

Comment 1
Parameters that are adjusted online become state variables. This also makes (linear) adaptive systems nonlinear. Hence, all system characteristics have to be discussed in a higher-dimensional state space. For example, the stability analysis of a linear plant of dimension n has to be discussed in a 3n-dimensional space (2n corresponding to the unknown parameters of the plant).

Direct and indirect adaptive control: In direct adaptive control the control input is determined directly from a knowledge of the output (control) error. In this case, the error models relating parametric errors to output errors are linear (refer to Section 2.3.2). To assure boundedness of all the signals, strong assumptions such as positive realness of the plant transfer function have to be made. If indirect control is used, the controller parameters are adjusted based on the estimates of the plant parameters.

Algebraic and analytic parts: All conventional adaptive methods invariably consist of two stages. Demonstrating the existence of a controller constitutes the first stage, which is algebraic; it must first be demonstrated that a controller structure exists that can result in the convergence of the output error to zero. Determining adaptive laws for adjusting the controller parameters so that the overall system is stable and the output error tends to zero constitutes the second stage and is the analytical part. Both stages are relevant for all the problems treated in this chapter.

Comment 2
Existence questions in linear adaptive systems lead to linear algebraic equations (e.g., the Diophantine equation). In nonlinear systems, the corresponding equations would be nonlinear.

Comment 3
The resolution of the above problem in linear adaptive control in the late 1970s took several years. This was in spite of the advantage that could be taken of many of the subsystems being linear. Since this advantage is lost in nonlinear adaptive control, the problem is significantly more complex.

The reference model: Controller parameters can be adjusted either to minimize a specified performance criterion, or to track the output of a reference model (model reference adaptive control). In the latter case, the reference model has to be chosen properly so that the problem is well posed. Choosing a linear reference model to possess desired characteristics is relatively straightforward. Choosing a nonlinear reference model is considerably more difficult and requires a detailed knowledge of the plant. Hence, it is not surprising that in most applications linear models are chosen.

Nonlinear plants with a triangular structure: The class of nonlinear plants, mentioned earlier, whose adaptive control has been investigated rigorously in the literature, are those that are in canonical form with constant unknown parameters. A more general class of nonlinear systems is that of plants with a triangular structure. Two types of plants that have been analyzed are defined below [16, 17].

DEFINITION
A system is said to be in parametric pure-feedback (PPF) form if

    żᵢ = zᵢ₊₁ + θᵀγᵢ(z₁, z₂, ..., zᵢ₊₁),  i = 1, 2, ..., n − 1
    żₙ = γ₀(z) + θᵀγₙ(z) + [β₀(z) + θᵀβ(z)]u        (2.16)

DEFINITION
A system is said to be in parametric strict-feedback (PSF) form if

    żᵢ = zᵢ₊₁ + θᵀγᵢ(z₁, ..., zᵢ)
    żₙ = γ₀(z) + θᵀγₙ(z) + β₀(z)u        (2.17)

where z = [z₁, ..., zₙ] and θ ∈ R^p is a vector of unknown parameters.

Stabilizing adaptive controllers have been developed for systems in both PPF and PSF forms, where the result is local in nature in the former and global in the latter.

Comment 4
The proof of stability given in References [16, 17] for the above problems is strongly dependent on the fact that the nonlinear functions γ₀(·), γᵢ(·), β₀(·), and β(·) are known and smooth (so their derivatives can also be used in the control laws), and the only unknown in the system is the constant vector θ. Naturally, the proofs are no longer valid if any of the above assumptions do not hold (see comments in Section 2.4).

2.2.4 Problem Statement

As stated in the introduction, the historical developments in the field of neurocontrol have traversed the same paths as those of linear adaptive control, even as the latter have closely paralleled those of linear control theory. In this section we consequently confine our attention to the same sequence of problems that were resolved in the two preceding fields in the past four decades. These are concerned with the identification and control of nonlinear dynamical systems.

2.2.4.1 Plant

We assume that the plant (or process) to be controlled is described by the discrete-time state Equations (2.12):

    Σ: x(k + 1) = F[x(k), u(k)],   F(0, 0) = 0
       y(k) = H[x(k)],   H(0) = 0        (2.18)

where the functions F and H are smooth. If F and H are known, the problem belongs to the domain of nonlinear control. If F and H are unknown or partially known, as in the case of linear adaptive control, the problem is one of nonlinear adaptive control. In the following sections we will be interested first in the questions that arise in the control problem when F and H are known.

A number of factors influence both the problems posed and the methods used for resolving them. Among these the most important are the assumptions about the functions F(·) and H(·), the stability of the system Σ, and the accessibility of its state variables. In the first two problems the plant is assumed to be stable. In the third problem, the plant is assumed to be unstable, and identification and control proceed concurrently to make the overall system stable (this corresponds to the major stability problem of adaptive control resolved in 1980). In all cases, it is assumed that external inputs are bounded with known bounds, and that the region in the state space in which the trajectories of Σ should lie is also specified. Three problems are presented below.

PROBLEM 1 (Identification)
The discrete-time plant Σ is described by Equations (2.12), where F(·) and H(·) are unknown. The state x(k) of Σ is accessible at every time instant k. The input u(·) satisfies the condition ||u(t)|| ≤ c_u, and Σ is BIBO (bounded-input, bounded-output) stable, so that ||x(t)|| ≤ c_x and ||y(t)|| < c_y, where c_u, c_x, and c_y are known constants.

1. Determine a suitable representation for a model Σ̂ of the plant whose output x̂(·) satisfies the condition lim_{k→∞} ||x(k) − x̂(k)|| < ε₁, where ε₁ is a prescribed constant.

2. Assuming that only the input and output of Σ are accessible, determine an input-output model Σ̂_I/O of the system such that the output ŷ(k) of the model satisfies lim_{k→∞} ||y(k) − ŷ(k)|| ≤ ε₂ for the set of input-output pairs provided, where ε₂ is a prescribed constant.

PROBLEM 2 (Control of a stable plant)
Assuming that Σ is stable and that models Σ̂ and Σ̂_I/O satisfying the conditions of Problem 1 have been determined, the following control problems may be stated:

1. Determine a feedback control law u(k) = γ(x(k)), where γ(·) is a smooth function, such that every initial condition x₀ in a neighborhood of the origin is transferred to the equilibrium state in a finite number of steps.

2. If y(k) but not x(k) is accessible, determine a control law such that x(k) tends to the equilibrium state in a finite time. If control has to be carried out using only the inputs and outputs of Σ, a different representation of the system will be needed; this would naturally call for a modification of the methods used for identification and control.

3. (Set point regulation) In problems (1) and (2), determine a control law such that the output y(·) of Σ is regulated around a constant value.

4. (Tracking) Given a stable reference model Σm defined by:

    Σm: xm(k + 1) = Fm(xm(k), r(k))
        ym(k) = Hm(xm(k))        (2.19)

where xm(k) ∈ R^n and ym(k) ∈ R^m, and Fm and Hm are known, determine a feedback control law such that

    lim_{k→∞} |y(k) − ym(k)| < ε        (2.20)

where ε is a specified constant.

PROBLEM 3 (Control of an unstable plant)
In this case identification and control of the plant (which is assumed to be unstable) have to be carried out simultaneously. All four cases stated in Problem 2 can also be considered in this case.

Comment 5
As in classical adaptive control we will be interested in both the algebraic and the analytical parts of the solution.

All the problems stated above can be addressed either from a strictly theoretical point of view or as those that arise in the design of identifiers and controllers in real applications. In the latter case, the prior information that is available concerning the plant as well as mathematical tractability will dictate to a large extent the models used for both identification and control. Some of the questions that arise in this context are listed below:

1. Structures of identifiers and controllers and the use of feedforward networks and recurrent networks to realize them
2. The algorithms used to adjust the parameters of the neural networks
3. The questions of stability that arise in the various cases

These are discussed in Section 2.3.

2.2.4.2 An Area for Future Research

In control theory as well as in adaptive control, interest invariably shifted to problems in which multiple dynamic systems are involved, after problems involving isolated systems had been addressed. Decentralized control, distributed control, and hierarchical control come under this category. More recently, there has been a great deal of research activity in multiagent systems in which many dynamic systems interact. Also, interest in distributed architectures has increased, since researchers in control theory and computer science believe that they would enhance our ability to solve complex problems. The above comments indicate that interaction of dynamical systems can arise due to a variety of factors ranging from practical physical considerations to the desire for increased efficiency. When dealing with interacting or interconnected systems, neural networks play a critical role similar to that in the problems described earlier. In this chapter, we are interested in both classes of problems. A generic problem of interconnected nonlinear systems may be stated as follows.

The overall system Σ consists of a set of N subsystems Σᵢ (i = 1, 2, ..., N):

    Σ₁: x₁(k + 1) = f₁(x₁(k), h₁[x̄₁(k)], u₁(k))
    Σ₂: x₂(k + 1) = f₂(x₂(k), h₂[x̄₂(k)], u₂(k))
    ···
    Σ_N: x_N(k + 1) = f_N(x_N(k), h_N[x̄_N(k)], u_N(k))        (2.21)

where N̄ = Σ_{i=1}^{N} nᵢ is the dimension of the state space of the overall system, xᵢ ∈ R^(nᵢ) is the state of subsystem Σᵢ, and x̄ᵢ ∈ R^(N̄−nᵢ) denotes the states of the remaining N − 1 systems. Each system Σᵢ is affected by the other subsystems through an unknown smooth function hᵢ(·). Depending upon the nature of the problem, the different subsystems may compete or cooperate with each other to realize their overall objectives. How the various systems identify their dynamics in the presence of uncertainty, how they acquire their information, and whether communication is permitted between them constitute different aspects of the problems that arise. For mathematical tractability, much of the research in progress on problems of the type described above is restricted to linear systems. However, because most real systems are in fact nonlinear, it is only reasonable to expect increased interest in the future in nonlinearly interconnected systems. In Section 2.4, one such problem dealing with decentralized adaptive control is discussed.

2.3 Neural Networks, Adaptive Laws, and Stability

In the following sections neural networks are used as identifiers and controllers in dynamic systems. The type of networks to be used, how they acquire their information, the adaptive laws for adjusting the parameters of the networks based on available data, and the stability and robustness issues that have to be addressed are all important considerations in their design. In this section we comment briefly on each of the above aspects.

2.3.1 Neural Networks

Although numerous network architectures have been proposed in the literature, we will be concerned mainly with two broad classes of networks in this chapter: (1) feedforward networks and (2) recurrent networks. The former are static maps whereas the latter are dynamic maps. Even though both of them have been studied extensively in the literature, for the sake of continuity we provide brief introductions to both of them.
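A minimal simulation of the interconnected structure (2.21) is sketched below for three scalar subsystems; the maps fᵢ and the couplings hᵢ are arbitrary illustrative choices, not models taken from the chapter.

    import numpy as np

    N = 3

    def f(xi, hi, ui):
        return 0.5 * np.tanh(xi) + 0.3 * hi + ui      # f_i(x_i, h_i, u_i)

    def h(x_others):
        return 0.1 * np.sum(np.sin(x_others))         # unknown smooth coupling h_i

    x = np.ones(N)
    for k in range(50):
        u = np.zeros(N)                               # zero inputs for illustration
        x_next = np.empty(N)
        for i in range(N):
            others = np.delete(x, i)                  # states of the remaining systems
            x_next[i] = f(x[i], h(others), u[i])
        x = x_next
    print(x)                                          # coupled states near equilibrium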

2.3.1.1 Feedforward Networks

The most commonly used feedforward networks are the multilayer perceptron network (MPN) and the radial basis function network (RBFN). A three-layer network is shown in Figure 2.1a.

[FIGURE 2.1 Neural networks: (a) a multilayer perceptron network (MPN); (b) a radial basis function (RBF) network.]

An N-layer MPN with input u ∈ R^n and output y ∈ R^n can be described by the equation:

    y = W_N Γ[W_(N−1) Γ[··· Γ[W₁u + b₁] + b₂] + ··· + b_(N−1)] + b_N        (2.22)

where Wᵢ is the weight matrix associated with the ith layer, the vectors bᵢ (i = 1, ..., N) represent the threshold values for each node in the ith layer, and Γ is a static nonlinear operator with an output vector [γ(x₁), γ(x₂), ..., γ(xₙ)]ᵀ corresponding to an input [x₁, x₂, ..., xₙ]ᵀ, where γ: R → [−1, 1] is a smooth function. It is seen that each layer of the network consists of multiplications by constants (elements of the weight matrix), summation, and the use of a single nonlinear map γ.

Radial basis function networks, which are an alternative to the MPN, represent the output y as a weighted sum of basis (or activation) functions Rᵢ: R^n → R. An RBFN is shown in Figure 2.1b. If y ∈ R, the RBFN is described by y = WᵀR(u) + W₀, where W = [W₁, W₂, ..., W_N]ᵀ is a weight vector multiplying the N basis functions, u = [u₁, u₂, ..., uₙ]ᵀ is the input, and W₀ is an offset weight. Quite often Gaussian functions are used as radial basis functions, so that

    Rᵢ(u) = exp( − Σ_{j=1}^{n} (uⱼ − cᵢⱼ)² / σᵢⱼ² )

where cᵢ = [cᵢ₁, cᵢ₂, ..., cᵢₙ] is the center of the ith receptive field and σᵢⱼ is referred to as the width of the Gaussian function. Since the functions R(u) are predetermined, the output is a linear function of the elements of W.

For the purposes of this chapter, both the MPN and the RBFN enjoy two important characteristics. The first is their ability to approximate nonlinear maps. The second is the fact that for such networks, different methods of adjusting their parameters have been developed and are generally known. These methods will be discussed in Section 2.3.3.
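The two forward maps are easy to state in code. The sketch below implements a three-layer MPN in the form of Equation (2.22) and a Gaussian RBFN; all sizes and weights are random placeholders standing in for trained values.

    import numpy as np

    rng = np.random.default_rng(0)

    def mpn_forward(u, weights, biases):
        """Three-layer MPN: y = W3 g(W2 g(W1 u + b1) + b2) + b3."""
        v = u
        for W, b in zip(weights[:-1], biases[:-1]):
            v = np.tanh(W @ v + b)            # gamma: smooth map into [-1, 1]
        return weights[-1] @ v + biases[-1]   # linear output layer

    def rbfn_forward(u, centers, widths, W, W0):
        """RBFN: y = W^T R(u) + W0 with Gaussian basis functions."""
        R = np.exp(-np.sum((u - centers) ** 2 / widths ** 2, axis=1))
        return W @ R + W0

    n, p, q, m = 3, 5, 4, 1
    weights = [rng.normal(size=(p, n)), rng.normal(size=(q, p)), rng.normal(size=(m, q))]
    biases = [rng.normal(size=p), rng.normal(size=q), rng.normal(size=m)]
    centers, widths = rng.normal(size=(6, n)), np.ones((6, n))
    Wr = rng.normal(size=6)

    u = rng.normal(size=n)
    print(mpn_forward(u, weights, biases), rbfn_forward(u, centers, widths, Wr, 0.0))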

2.3.1.2 Recurrent Networks

In contrast to the feedforward networks considered thus far, which are static maps from one finite-dimensional vector space to another, recurrent networks are dynamic maps that map input time signals into output time signals. This is accomplished by providing them with memory and introducing time delays and feedback. It is well known that any LTI discrete-time system can be realized using only the operations of multiplication by a constant, summation, and a unit delay. A static feedforward network, on the other hand, includes multiplication by a constant, summation, and a single appropriate nonlinear function (e.g., the sigmoid). Recurrent networks, which are nonlinear dynamic maps, can be generated using all four operations described above.

Delays can be introduced anywhere in a feedforward network to make it dynamic. The number of such delays can vary from unity (if only the output signal is fed back to the input) to N̄ = N², where N represents the sum of the input, hidden, and output nodes (and delays exist between every node and all the other nodes in the system). For practical as well as theoretical reasons, compromises have to be made on the total number of delays used. Many different structures have been suggested in the neural network literature by both computer scientists and engineers.

It was argued by Williams [18] in 1990 that recurrent neural networks can be designed to have significantly new capabilities, all of which thus far have not been exploited in control theory. Since then, it has been shown that recurrent networks can serve as sequence recognition systems, generators of sequential patterns, and nonlinear filters. They can also be used to transform one input sequence into another. At the same time, interest in recurrent neural networks also grew from successes in practical applications. Through considerable experience, people in industry became convinced of the usefulness of such networks for the modeling and control of dynamic systems. It was also found that recurrent networks used as controllers are significantly more robust than feedforward controllers to changes in plant characteristics. As shown later, recurrent networks provide a natural way of modeling nonlinear dynamic systems.

We present only two structures, which will be needed for addressing the problems stated earlier. Both of them use the universal approximating property of multilayer neural networks. Consider first the general state Equation (2.18) representing an nth-order nonlinear system. F: R^n × R^r → R^n can be approximated using a multilayer neural network with (n + r) inputs and n outputs. Similarly, the function H: R^n → R^m can be approximated using a separate multilayer network. The representation of the dynamic system given by Equation (2.18), where F̂ and Ĥ represent the approximations of F and H, respectively, is shown in Figure 2.2.

[FIGURE 2.2 State vector model.]

If the state variables are not accessible and an input-output model Σ_I/O of the system (with relative degree d) is needed, it has been shown [19, 20] that a SISO system can be described by the equation:

    y(k + d) = f[y(k), y(k − 1), ..., y(k − n + 1), u(k), u(k − 1), ..., u(k − n + 1)]        (2.23)

in a neighborhood of the equilibrium state. These are referred to as NARMA (nonlinear ARMA) models. In Figure 2.3, the realization of a SISO system is shown using tapped delay lines, where f̂ represents an approximation of f in Equation (2.23).

[FIGURE 2.3 Input-output model.]

Similarly, for a multivariable system with r inputs (u(k) ∈ R^r) and m outputs (y(k) ∈ R^m), and relative degree dᵢ for the ith output, it has been shown that a representation of the form:

    y₁(k + d₁) = f₁[y(k), y(k − 1), ..., y(k − ν + 1), u(k), u(k − 1), ..., u(k − ν + 1)]
    ···
    y_m(k + d_m) = f_m[y(k), y(k − 1), ..., y(k − ν + 1), u(k), u(k − 1), ..., u(k − ν + 1)]        (2.24)

exists in a neighborhood of the equilibrium state. The multivariable system (2.24) can also be realized in a similar fashion. The recurrent network models shown in Figures 2.2 and 2.3 can be used either as identifiers or controllers in the problems stated earlier.

2.3.1.3 System Approximation

The two principal methods for approximating a system described by a recursive equation can be illustrated by considering the estimation of the parameters of a linear system, described by Equation (2.8):

    y(k + d) = Σ_{i=0}^{n−1} αᵢ y(k − i) + Σ_{j=0}^{n−1} βⱼ u(k − j)        (2.25)

where αᵢ and βⱼ are unknown and need to be estimated. A series-parallel identification model has the form:

    ŷ(k + d) = Σ_{i=0}^{n−1} α̂ᵢ(k) y(k − i) + Σ_{j=0}^{n−1} β̂ⱼ(k) u(k − j)        (2.26)

where α̂ᵢ(k) and β̂ⱼ(k) are the parameter estimates at time k. The output error equation has the simple form:

    ỹ(k + d) = Σ_{i=0}^{n−1} α̃ᵢ(k) y(k − i) + Σ_{j=0}^{n−1} β̃ⱼ(k) u(k − j)        (2.27)

where ỹ, α̃ᵢ, and β̃ⱼ represent the output and parameter errors at time k. Since this has the standard form of error model 1 (described in Section 2.3.2), stable adaptive laws for adjusting α̂ᵢ(k) and β̂ⱼ(k) can be determined by inspection.

In contrast to this, if a recurrent (or parallel) identification model is used, as shown below:

    ŷ(k + d) = Σ_{i=0}^{n−1} α̂ᵢ(k) ŷ(k − i) + Σ_{j=0}^{n−1} β̂ⱼ(k) u(k − j)        (2.28)

the estimate ŷ(k + d) at time k + d depends upon the past estimates ŷ(k), ŷ(k − 1), ..., ŷ(k − n + 1). The determination of stable adaptive laws for adjusting α̂ᵢ(k) and β̂ⱼ(k) is substantially more complex in this case. In fact, such adaptive laws are not available and only approximate methods are currently known.

Comment 6
The following points are worth emphasizing. The series-parallel model is not truly a model but merely a predictor. In contrast, the recurrent model is a true model of the system, with all the advantages of such a model (e.g., control strategies can be tried out on the model rather than the plant). However, the equation describing the recurrent model is a difference equation, and is no longer simple. If an efficient predictor is adequate for control purposes (as has been demonstrated in linear adaptive control), the simplicity of the series-parallel model may outweigh the theoretical advantages of the recurrent model in some applications.
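As a concrete illustration of the series-parallel approach, the sketch below identifies the linear plant (2.25) with the model (2.26), using tapped delay lines for the regressor and a normalized gradient law of the error-model-1 type. The true parameters, adaptive gain, and input are arbitrary placeholder choices, not values from the chapter.

    import numpy as np

    n = 2
    alpha_true = np.array([0.3, 0.1]); beta_true = np.array([1.0, 0.4])
    theta_true = np.r_[alpha_true, beta_true]
    theta_hat = np.zeros(2 * n)                    # [alpha_hat, beta_hat]

    rng = np.random.default_rng(2)
    y_taps, u_taps = np.zeros(n), np.zeros(n)
    for k in range(2000):
        u = rng.uniform(-1, 1)                     # persistently exciting input
        u_taps = np.r_[u, u_taps[:-1]]
        phi = np.r_[y_taps, u_taps]                # regressor of measured past I/O
        y_next = theta_true @ phi                  # plant output y(k+1)
        y_hat = theta_hat @ phi                    # series-parallel prediction
        e = y_hat - y_next                         # output error, Equation (2.27)
        theta_hat -= 0.5 * e * phi / (1 + phi @ phi)   # normalized gradient step
        y_taps = np.r_[y_next, y_taps[:-1]]
    print(np.round(theta_hat, 3))                  # approaches theta_true

A parallel model in the sense of (2.28) would be obtained by feeding ŷ, rather than the measured y, back into the regressor; as noted above, no exact stable adaptive law is then available.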

2.3.2 Stable Adaptive Laws: Error Models

The laws for adjusting the parameters derived in classical adaptive control are based on simple linear models, known as error models, which are independent of specific applications. These relate the parameter errors φ(t) to the output error e(t) ∈ R (R^m for MIMO) between the actual output of the plant and a desired output (generally the output of a reference model). The objective is to determine φ̇(t), using all the information available at time t, to make the system stable, so that e(t) tends to zero. The study of error models is rendered attractive by the fact that by analyzing these models it is possible to obtain insights into the behavior of a large class of adaptive systems. The error models [15] are shown in Figure 2.4. The equations describing the models, the adaptive laws proposed, and the Lyapunov functions which assure stability in each case are given below.

[FIGURE 2.4 Error models. Error model 1: φᵀu = e₁ with adaptive law φ̇ = −e₁u. Error model 2: stable plant ė = Ae + bφᵀu with φ̇ = −eᵀPbu. Error model 3: ė = Ae + bφᵀu with SPR output e₁ = ce and φ̇ = −e₁u.]

Error Model 1
φ(t), u(t) ∈ R^n and φᵀ(t)u(t) = e₁(t).
Adaptive law: φ̇(t) = −e₁(t)u(t) if the input u(t) is bounded, and

    φ̇(t) = −e₁(t)u(t) / (1 + uᵀ(t)u(t))

when it is not known a priori that u(t) is bounded. In both cases V(φ) = (1/2)φᵀ(t)φ(t), and V̇(t) = −e₁²(t) or −e₁²(t)/(1 + uᵀ(t)u(t)) ≤ 0, respectively.

Error Model 2

    ė(t) = Ae(t) + bφᵀ(t)u(t)        (2.29)

where the matrix A and vector b are known, A is stable, and (A, b) is controllable.
Adaptive law:

    φ̇(t) = −eᵀ(t)Pbu(t)  or  φ̇(t) = −eᵀ(t)Pbu(t) / (1 + uᵀ(t)u(t))        (2.30)

where P is the symmetric positive definite solution of AᵀP + PA = −Q < 0.
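A discrete-time sketch of error model 1 with the normalized adaptive law is given below; the unknown parameter vector and gain are arbitrary placeholders.

    import numpy as np

    # Error model 1: e1 = phi^T u, with the normalized update
    # phi <- phi - gamma * e1 * u / (1 + u^T u)
    rng = np.random.default_rng(3)
    theta_true = np.array([1.0, -2.0, 0.5])        # unknown parameters
    theta_hat = np.zeros(3)

    for k in range(5000):
        u = rng.normal(size=3)
        e1 = (theta_hat - theta_true) @ u          # phi^T u
        theta_hat -= 0.5 * e1 * u / (1 + u @ u)    # adaptive law (gamma = 0.5)
    print(np.round(theta_hat, 3))                  # converges to theta_true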

Error Model 3
In error model 2, e(t) is not accessible, but ce(t) = e₁(t) is accessible and c[sI − A]⁻¹b is strictly positive real. By the Kalman–Yakubovich lemma [21], a matrix P exists which simultaneously satisfies the equations AᵀP + PA = −Q and Pb = cᵀ.
Adaptive law: φ̇(t) = −e₁(t)u(t) or φ̇(t) = −e₁(t)u(t)/(1 + uᵀ(t)u(t)), with

    V(e, φ) = eᵀPe + φᵀφ  and  V̇(t) = −eᵀQe        (2.31)

Hence, it follows that e and φ are bounded, and the output e₁ tends to zero. It is worth emphasizing that when the input u(·) is not known a priori to be bounded, the adaptive laws have to be suitably normalized.

2.3.2.1 Error Models for Nonlinear Systems

In some simple nonlinear adaptive control problems, as well as in more complex problems in which appropriate simplifying assumptions are made, it may be possible to obtain an error model of the form shown in Figure 2.5. This is the same as error model 3 in which u = N(x), where x is the state of the unknown plant and N(·) is a continuous nonlinear function.

[FIGURE 2.5 Error models for nonlinear systems: error model 3 driven by u = N(x) through an SPR block with output e₁.]

If the same adaptive law as in error model 3, φ̇(t) = −e₁(t)N(x(t)), is used, then by the same arguments as before it follows that e and φ are bounded. This ensures that x, and consequently N(x), are bounded.

Comment 7
Approximations of the type described in this model have been made widely in the neurocontrol literature without adequate justification. We shall comment on these at the end of Section 2.4.

2.3.2.2 Gradient-Based Methods and Stability

During the early days of adaptive control in the 1960s, the adjustment of the parameters of an adaptive system was made to improve performance. Extremum seeking methods and sensitivity methods that are gradient based were among the most popular methods used, and in both cases the parameters were adjusted online. Once the adjustment was completed, the stability of the overall system was analyzed, and conditions were established for local stability.

Thus, optimization of performance preceded stability analysis in such cases. In 1966 Parks [22], in a paper of great historical significance in adaptive control, conclusively demonstrated using a specific example that gradient methods for adjusting the parameters in an adaptive system can result in instability. At the same time he also showed that the system could be made globally stable using a design procedure based on Lyapunov's method. This clear demonstration that gradient-based methods could become unstable tolled the death knell of such systems, and witnessed a shift in the next decade to design based on the stability methods described earlier. In the following forty years adaptive control aimed at first stabilizing the overall system using stable adaptive laws, and later adjusting the fixed controller parameters of the system to improve performance within a stability framework.

When neural networks are used in a system for identification and control, the overall system is nonlinear, and it is very hard to derive globally stable adaptive laws for adjusting parameters. However, numerous authors have continued to formulate problems in such a fashion that stable adaptive laws can still be determined. In view of the difficulties encountered in generating stable adaptive laws, it becomes necessary to examine the reasons for discarding gradient methods in the past and explore ways of reconciling them with stability and robustness of the overall system. In the example described by Parks in his classic paper [22], it is the speed of adaptation that causes instability. If the frequency of adjustment of the parameters is such that the output error has a phase shift greater than π, adaptation may proceed exactly in a direction opposite to what is desired. Motivated by such considerations, researchers have been reexamining gradient-based methods, which are both theoretically and practically attractive; we note that in all the error models discussed earlier φ̇ → 0. In the following sections we shall assume that gradient methods result in stability if the operating point is stable and the adjustments are slow compared to the dynamics of the system.

2.3.3 Adjustment of Parameters: Feedforward and Recurrent Networks

The simplicity of the adaptive laws for adjusting the parameters of the series-parallel model described earlier resulted from the fact that the error equation (relating parameter errors to output error) was linear. In view of the difficulties just described, most of the methods currently used for adjusting the parameters of a neural network are related to back propagation, and are gradient based as in classical adaptive control of the 1960s.

When the neural networks shown in Figure 2.1 are used to approximate a nonlinear function, the parameters of the networks are adjusted to minimize some specified norm ||y − y_d||, where y is the output of the neural network and y_d is the output of the given nonlinear function when both have the same input u. For radial basis function networks, if wᵢ is adjusted, ∂e/∂wᵢ is merely the output of the ith radial basis function. For multilayer networks, the element θᵢ of the parameter vector θ (the elements of the matrices Wᵢ and vectors bᵢ in Figure 2.1a) is in general nonlinearly related to the output error.
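Because the RBFN output is linear in its weights, the gradient adjustment just described takes a particularly simple form. A minimal sketch, with fixed centers and widths, an arbitrary target function sin(u), and an assumed step size:

    import numpy as np

    centers = np.linspace(-3, 3, 9)                # fixed receptive-field centers
    width = 1.0
    W = np.zeros(9)

    def basis(u):
        return np.exp(-((u - centers) ** 2) / width ** 2)

    rng = np.random.default_rng(4)
    for step in range(20000):
        u = rng.uniform(-3, 3)
        R = basis(u)
        e = W @ R - np.sin(u)                      # output error y - y_d
        W -= 0.1 * e * R                           # gradient step: de/dW_i = R_i(u)
    errs = [abs(W @ basis(u) - np.sin(u)) for u in np.linspace(-3, 3, 61)]
    print(max(errs))                               # small residual approximation error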

6. and the latter quite often involves the cascading of several networks. . . . b 2 ) and (W1 . . While the output error depends linearly on the weights (W3 . In problems that do not involve feedback (such as function approximation and pattern recognition) it is simple to apply and does not pose any stability questions. . one is forced to resort to gradient methods for adjusting these parameters. This is shown in Figure 2.38 Modeling and Control of Complex Systems the parameter vector θ (the elements of the matrices Wi and vectors b i in Figure 2. is a dynamic map and the inputs and outputs are not vectors in a finite-dimensional 1 = u0 t = {ti } u1 1 1 1 = v0 v1 t = 2 2 {tk } 1 = z0 z1 z2 t3 = {t3} l y1 Σ Σ γ γ Σ Σ γ γ Σ Σ γ γ u2 v2 . . . In Reference [12].6 is found to be very useful in practical applications. If it is used in control problems. . b 1 ). a convenient ∂w ∂ method of computing ∂θe is needed. the adjustment of the parameters must be slow compared to the dynamics of the controlled system. the architecture in Figure 2. as described later. . an architecture was proposed by which the partial derivatives of e with respect to parameters in an arbitrary number of layers can be realized.1a) are in general nonlinearly related to the output error. Hence. In contrast to the above. Since our interest is in the control of complex systems using neural networks. the recurrent network. zn y2 un W1 = {w1 } ij vn . Σ γ’ δ1 δ2 δp γ W = {wki2} 2 Σ γ’ δ1 δ2 δq 1 γ W3 = {wlk3} Σ γ’ π γ yn π π π Σ Σ Σ W2T = {wik2} π π π Σ Σ Σ W3T = {wkl3} δ1 δ2 δm π π e1 e2 em 1 U П px (n+1) multiplications 1 qx (p+1) multiplications Z mx (q+1) multiplications V FIGURE 2. Hence. Back propagation has been perhaps the most diversely used adaptive architecture in practical applications. . b 3 ) in the output layer [if the activation functions γ (·) are omitted at the output]. as stated earlier. . If w is a typical weight the derivative of e 2 with 2 ∂e ∂e respect to w is ∂e = 2e ∂w so that ∂w has to be computed. “Back propagation” is such a method i and is currently well established.6 Architecture for back propagation. they depend nonlinearly on (W2 .

and desired outputs is specified. In fact. θ) (2. A brief description of the principal ideas contained in each of the approaches is given below.1 Back Propagation through Time The basic idea of this approach is that corresponding to every recurrent network it is possible to construct a feedforward network with identical behavior over a finite number of steps.” and Williams and Zipser as “real-time recurrent learning. and unsupervised learning based on the statistical features of the input signal. reinforcement learning based on a scalar reward signal. and we refer the reader to papers by Pearlmutter [24] and Rumelhart et al. Back propagation through time was first described in 1974 by Werbos and was rediscovered independently by Rumelhart et al.” In spite of the different origins and terminologies. Narendra and Parthasarathy [27] and Williams and Zipser [28] in the early 1990s. u(k). The problem of adjusting the parameters of a recurrent network for approximating a desired behavior has been addressed using supervised learning based on an output error signal. Even though the mathematical tools for dealing with such networks are not sufficiently well developed. The finiteness of the time interval permits the neural network to be unfolded into a multilayer network.3. Mathematically they are substantially more complex than feedforward networks. it has been shown that a recurrent network with enough units can approximate any dynamic system [23]. In the engineering literature. it is not surprising that they are being widely studied at the present time and there is every indication that their use in the control of complex systems is bound to increase in the future. Numerous authors in engineering and other fields (including computer science and computational neuroscience) have proposed different algorithms for adjusting the parameters of recurrent networks. and it is the gradient of this function with respect to the parameters that is desired. Werbos refers to it as “back propagation through time. the problem was considered independently by Werbos [26]. Our interest here is in supervised learning algorithms. inputs. The principal idea of the method is best described by considering two steps of a recursive equation: x(k + 1) = N(x(k). the basic ideas are essentially the same and involve the computation of partial derivatives of time functions with respect to parameters in dynamic systems.3.” Narendra and Parthasarathy as “dynamic back propagation. An error function is defined. 2. (1986).Control of Complex Systems Using Neural Networks 39 vector space but time sequences. so that standard back propagation methods can be used for updating the parameters. and the reader is referred to the source papers for further details. [8. 25] on this subject. A network with a fixed structure (such as the ones described earlier) and a set of parameters.32) . but it is their very complexity that makes them attractive because they permit a wide variety of dynamic behaviors to be programmed by the proper choice of weights.

1. the states are computed for the specified values of the inputs and the gradients computed using back propagation.6. as well as multivariable systems). The main idea is best illustrated by a simple example of a continuous-time system (similar results can be readily obtained for discrete-time systems. If the desired states xd (1) and xd (2) are specified.7. The system. over the instants 0. is a convenient aid for computing the necessary gradient while using this method.3. 2. Comment 8 The architecture for back propagation shown in Figure 2. the interval over which optimization is carried out must be chosen judiciously from practical considerations. the parameter θ can be adjusted to decrease the error function over the interval. u(0). The above method can be readily extended to a finite number of steps.” Determining gradients of error functions with respect to adjustable parameters and adjusting the latter along the negative gradients was well known in the 1960s. If u(0) and u(1) are known and x(0) is specified. θ] x(1) xd(1) N[x(1).40 u(1) u(0) Modeling and Control of Complex Systems Z–1 N[x(0). This was extended to time-varying systems in the above paper and it was shown that it led to a whole gamut of optimization procedures ranging from self-optimizing (quasi) stationary systems to optimal programming and optimal control. and state information that has to be stored grows linearly with time. θ] xd(2) x(2) e(1) e(2) FIGURE 2. which applies to an arbitrary number of layers. output. where θ is an adjustable parameter vector. .2 Dynamic Back Propagation The origin of real-time recurrent learning may be traced back to a paper written in 1965 by McBride and Narendra [29] on “Optimization of timevarying systems. A procedure for determining the gradient of a performance index with respect to the parameters of a nonlinear dynamical system was proposed by Narendra and Parthasarathy in 1991 [27]. This work was naturally strongly influenced by numerous papers written in the 1960s by Narendra and McBride on gradient-based methods for the optimization of linear systems.2. Since the input. u(1). static back propagation over two steps can be carried out to determine the gradient of an error function with respect to θ. Having chosen the interval. can be represented as shown in Figure 2. the states x(1) and x(2) can be computed.7 Back propagation through time. and based on that.3.

α) − yd (τ )]2 dτ = 1 T T 0 e 2 (τ. y) + y = u y(0) = y1 y (0) = y2 ¨ ˙ The objective is to determine the value of α which minimizes J (α) = 1 T T 0 41 (2.α) ∂α z+z=− ˙ ∂F ∂α (2. α)dτ (2.3. T].35) If a time-varying sensitivity model described by Equation (2. 2.Control of Complex Systems Using Neural Networks Let a second-order system be described by the equation: y + F (α. so that the desired gradient z and the ∂y ˙ ∂α change in the parameter α can be computed at every instant of time.34) (= ∂ y(t.α) = z. ∂α Differentiating Equation (2. Since the methods used are identical in the two cases. z (the desired gradient) can be generated as its output.33) [y(τ. ∂α ∂y (α. α) is the error between y(t.3 Interconnection of LTI Systems and Neural Networks The method described above for determining the partial derivatives of the error functions with respect to the adjustable parameters in a recurrent network was used widely in the 1960s for optimizing LTI systems. ˙ ˙ we have (using ∂ F∂α y) = ∂ F + ∂ F ∂α ) ∂α ∂y ˙ z+ ¨ ∂F ∂y ˙ where e(t. the desired partial derivatives ∂ F and ∂ F are known signals. α) and a specified desired time function yd (t).35) can be constructed.4 Real-Time Recurrent Learning In 1989 Williams and Zipser [28] suggested a method very closely related to dynamic back propagation described earlier for discrete-time recurrent networks. A gradient-based method for adjusting α can be implemented if ∂e(t. When the parameters of a neural network need to be adjusted.α) ) can be computed over the interval [0. Comment 9 The adjustment of α is assumed to be slow compared to the dynamics of the system and the methods were referred to as quasi-stationary methods in the 1960s.33) with respect to α and denoting ∂ y(t. Narendra and Parthasarathy [27] suggested the use of dynamic back propagation for use in complex systems in which LTI systems and static multilayer networks are interconnected in arbitrary configurations. In all cases the method calls for the construction of a dynamic sensitivity model whose outputs are the desired partial derivatives.3.3. . 2.3.

.i j u j (n) ⎠ (2. . This yields: ⎡ ⎤ n ∂xi (n + 1) ∂ x j (n) = γ ( yi (n)) ⎣ wi j + δik zl (n) ⎦ (i = 1. The objective is to determine the weights wi j so that the state (x1 (n). 2. 2. .i j x j (n) + j=1 j=1 w2. n) (2.36) or equivalently by the equation: ⎛ xi (n + 1) = γ ⎝ where zj = xj 1≤ j ≤n u j−n j > n n+r j=1 ⎞ wi j z j (n) ⎠ (2.37) and γ is a squashing function. . The term δik zl (n) represents the explicit effect of the weight wkl on the state xi . . we consider several different methods which have been proposed for addressing identification and control problems of the form posed in Section 2. . . . To adjust the weights. 2. . Equation (2.38) ∂wkl ∂wkl j=1 n+m where yi (n) = j=1 wi j z j (n) and δik is Kronecker’s delta (δik = 1. . The first method is based on linearization and represents the only . xn (n)) follows a desired trajectory (x1d (n). 2. If wkl is a typical weight. . x2 (n). i = k and 0 otherwise). . xnd (n)).42 Modeling and Control of Complex Systems Let a recurrent network consist of r inputs ui (·) (i = 1. the partial derivatives of xi (n) with respect to the weights have to be determined. and the first term in the brackets the implicit effect on the state due to network dynamics.2. . . r ) and n state variables xi (i = 1. the effect of a change in it on the network dynamics can be determined by taking partial derivatives of both sides of Equation (2. .36) is a linear time-varying difference equation (which corresponds to the sensitivity network described in dynamic back propagation) which can be used to generate the required partial derivatives in real time. x2d (n). n) which are related by the state equations: ⎛ ⎞ xi (n + 1) = γ ⎝ n r w1. . . . .4 Identification and Control Methods In this section. .36).

12). We now rewrite the . to obtain a more accurate representation of the dynamic system in a neighborhood of the equilibrium state. The latter are included in the list of references contained at the end of the chapter. 2. We comment on and raise both theoretical and practical issues related to these methods.u=0) . The fact that linear systems analysis and synthesis have proved extremely useful in a large number of practical applications attests to the fact that the neighborhoods in which the linearizations are good approximations of the dynamic systems are not always small. We discuss the approach in detail and comment on its practical implementation as well as its limitations. Such representations enjoy many of the theoretical advantages of linear systems. and C = ∂ H |(x=0) are. In this section we attempt. using similar methods.u=0) . These make different assumptions concerning the representation of the plant. The local results for are based on the properties of its linearized system L described by: L : x(k + 1) = Ax(k) + Bu(k) y(k) = C x(k) (2.Control of Complex Systems Using Neural Networks 43 one that we endorse without reservation. These comments are based on extensive simulation studies carried out in numerous doctoral theses at Yale and the Technical University of Munich [30]–[35] under the direction of the first author. 2. The introductory chapters of most textbooks on these subjects emphasize the fact that almost all physical systems are nonlinear and that linear systems are obtained by linearizing the equations describing the system around an equilibrium state (LTI system) or around a nominal time-varying trajectory (linear time-varying system.1 Identification and Control Based on Linearization A vast body of literature currently exists on linear systems and linear control systems. A nonlinear system is described by Equation (2. respectively. but are not included here due to space limitations. LTV) so that they are valid in some neighborhood of the equilibrium state or trajectory.1. alternate methods proposed by other researchers are included in this section. B = ∂ F |(x=0. ∂x ∂u ∂x the Jacobian matrices of F and H with respect to x and u. In addition to the above.39) where A = ∂ F |(x=0. Even though only some methods are presented here. they subsume to a large extent most of the approaches that have appeared in the neurocontrol literature.1 System Representation The developments in this section essentially follow those reported in Reference [36] by Chen and Narendra.4. while at the same time permitting improvements in performance using neural networks to compensate for nonlinearity.4. These are used to derive the main results developed at Yale during the period 1990 to 2004. by including nonlinear terms that are small compared to the linear terms.

and suggests (as shown in this section) how methods proposed for linear systems can be modified for identifying and controlling the nonlinear plant. Using the approach proposed. If F1 .4.” as functions of their argument. If F1 ∈ H and F2 (0) = 0 and is continuously differentiable. 2. If A is a constant matrix and F (·) ∈ H. F1 F2 (·) ∈ H. F2 ∈ H then F1 F2 . The representation shown in Figure 2.2 in the context of the nonlinear system . F1 + F2 ∈ H. Since this is in general agreement with procedures followed by the neurocontrol community. and composition of “higher-order functions.8 actual system Equations (2. u(k)) y(k) = C x(k) + h(x(k)) (2. We denote this class by H. 3. multiplication. any smooth function can be expressed as the sum of a linear function and a higher-order function. Because the problems invariably lead to the addition. then AF (·) ∈ H.44 u Modeling and Control of Complex Systems Unit Delay A + + y b c h( ) f() FIGURE 2.8.2 Higher-Order Functions We shall address all the problems stated in Section 2. In all cases we use the inverse function theorem and the implicit function theorem as the principal tools and indicate how the assumptions concerning the higher-order function permit the application of the theorems. linear identifiers and controllers are first designed before an attempt is made to include nonlinear terms.8 highlights the role played by the linear system in the synthesis of controllers.40) is seen in Figure 2. The following properties of functions in H are found to be useful and can be verified in a straightforward fashion: 1. 2. ∂x Thus.40) where f and h are called “higher-order functions.” A block diagram representation of system (2. DEFINITION A continuously differentiable function G(x) : Rn → Rm is called a “higherorder function” if G(0) = 0 and ∂G |x=0 = 0.12) as: : x(k + 1) = Ax(k) + Bu(k) + f (x(k). . in some neighborhood of the equilibrium state.1. the approach has both pedagogical and practical interest. we formally define them as follows.

y) = Ax + By + f (x. observable. x ∈ Rn 45 (2. and stable. there exists a neighborhood U ⊂ U1 . then the system is locally controllable. and for all x ∈ U x = A−1 y + g( y) g(·) ∈ H. that is.3 System Theoretic Properties In Section 2.1.9 and Figure 2. y). y) with x ∈ Rn and y ∈ Rk . 2. then is also controllable. A is nonsingular.4. it can be shown that if U1 ⊂ Rn+k is an open set containing the origin.43) satisfies the equation F (x.Control of Complex Systems Using Neural Networks If in some neighborhood U1 ⊂ Rn of the origin the equation Ax + f (x) = y A ∈ Rn×n . observable. that is.10). where A is nonsingular. Local controllability: If the system L is controllable. and stable.41) is defined. and f (·) ∈ H is a function from U1 to Rn . The above two results can be shown in block diagram form as in Reference [36] (Figure 2. there exists a neighborhood c of the origin such that for any states x1 . x2 ∈ c there is a finite sequence of inputs that transfer Ax + – x A f() y y A–1 – + x f (x) g( ) FIGURE 2.2 it was shown that if the linearized system L of is controllable. It is the existence of functions g(·) in the neighborhoods around the origin that can be used in the inverse operation that provides an analytical basis for all the results that have been derived in Reference [36]. We refer the reader to that paper for further details. an element of U1 is denoted by (x. and F (x. From the above it is seen that when the functions involved in the equations are either linear or belong to H inverses can be obtained in some neighborhood of the origin. y) = 0. The following discussions indicate how the additional nonlinear terms are determined in each case. containing the origin such that V = AU + f (U) is open. (2. then by the implicit function theorem there exists an open set U ⊂ Rk containing the origin such that: x = A−1 By + g( y) y∈U g(· ∈ H) (2.9 . by the inverse function theorem. It is this fact that is exploited throughout this section.42) Similarly. there exist neighborhoods of the origin where these properties hold.

45) Local observability: Similarly.n) = [y(0). x(n)) g(·) ∈ H (2. . . then the nonlinear system: x(k + 1) = Ax(k) + η(x(k)) η(·) ∈ H (2.46 x y Ax Modeling and Control of Complex Systems A B y Σ A–1 B + + x g( ) f() FIGURE 2. If the input-output sequences are defined as: Y(0. For the nonlinear system .n (2. u(n − 2)]T L (2.n−1) ] − η[Y(0. This can be shown by demonstrating that a function V(x) = x T P x which is a Lyapunov function for Equation (2. . .n−1) ] while the application of the inverse function theorem yields: x(0) = Wo−1 [Y(0.46) −1 yields x(0) = W0 [Y(0.n−1) . When the nonlinear system is . Stability and stabilizability: It is well known that if the linear system described by: x(k + 1) = Ax(k) is asymptotically stable. . .n) and U(0. It is also well known that if the system x(k + 1) = Ax(k) + bu(k) is controllable. y(n − 1)]T U(0. u(n − 1)]T . we also have the result that if L is observable. . For the linearized system we have the equation: x(n) = An x(0) + Wc U0. u(1).48) is also a Lyapunov function for Equation (2. From the above it follows that there exists a neighborhood o of the origin in which the state x(0) can be reconstructed by observing the finite sequence Y(0.n) . using the implicit function theorem it is shown in Reference [36] that: U0. u(1).49) in a neighborhood of the origin.n = Wc−1 [x(n) − An x(0)] + g(x(0).44) where Wc is the controllability matrix and U0.n−1) ] (2. y(1). .49) (2.48) is (locally) asymptotically stable.n = [u(0). .n) − PU(0. . U(0. is locally observable.n) − PU(0. then it can be stabilized by a feedback control law of the form u(k) = x(k) where is a constant row vector.n−1) = [u(0). .47) where η ∈ H.10 x1 to x2 . and P is a known matrix. .

. the same input yields a constant state x ∗ asymptotically. For L . and the zero dynamics of the linearized system L is asymptotically stable. It has also been shown that there exists a nonlinear feedback controller u(k) = x(k) + γ (x(k)) which stabilizes the system in a finite number (≤ n) of steps (i. y∗ (k + d)] exist where g(·) ∈ H. as well as the principal references [19. 32] contained in it. (2. We merely provide the main ideas.. We merely state the results when the state of is accessible. the results presented thus far merely make precise in the three specific contexts considered that design based on linearization works locally for nonlinear systems.e. normal form. can be used to regulate the output around a constant value if the transfer function c[zI − A]−1 b does not have a zero at z = 1. State Vector Accessible) If the nonlinear system has a well-defined relative degree. which are beyond the scope of this chapter. and the corresponding ARMA representation for its linearization L .3. Set-point regulation and tracking: The same procedures used thus far can also be used to demonstrate that well-defined solutions exist for the set-point regulation and tracking problems stated in Section 2. for further details. using the implicit function theorem it can be shown that v can be expressed explicitly in terms of r . Neural networks can consequently be used to practically realize these nonlinear functions. then a neighborhood of the origin. x ∗ (k) ∈ . In each case the existence of a nonlinear function in H assures controllability. g(x(k))] and it follows that this is also asymptotically stable in some neighborhood of the origin s . we need concepts such as relative degree.Control of Complex Systems Using Neural Networks 47 considered. and refer the reader to the source paper [37]. and in the latter case use the NARMA representation for . observability. . THEOREM 1 (Set-Point Regulation) The output y of a nonlinear system (whose state vector is accessible) can be regulated around a constant value r in a neighborhood of the origin if this can be achieved for the linearized system L . we have the equation x(k+1) = Ax(k)+bu(k)+ f [x(k).51) follows asymptotically the . Tracking an arbitrary signal y∗ (k): To pose the problem of tracking an arbitrary signal y∗ (k) precisely.50) Because r = cx ∗ + h(x ∗ ). For the nonlinear system . and when it is not. where x ∗ = [A + b ]x ∗ + bv + f [x ∗ . x ∗ + v] (2. This consists of the sum of the input used in the linear case (i. where v is a constant. and zero dynamics of the nonlinear system.e. THEOREM 2 (Tracking. or stability. such that the output y(k) of desired output y∗ (k) provided x(k). r/c[I − A]−1 b. transfers any state x0 ∈ s to the origin in a finite number [≤ n] of steps). and a control law of the form: u(k) = (c Ad−1 b) −1 [y∗ (k + d) − P x(k)] + g[x(k). A = A+ b ) together with γ (r ) where γ ∈ H. In summary. an input u = x + v.

Hence. u(k). y∗ (k + d). containing the origin. instability is possible. u(k − n + 1))] (2. we proceed to consider the application of these results to Problems 1 to 3 stated in Section 1. Because the plant is known to be stable. . . If the zero dynamics of is asymptotically stable. no stability questions arise in this case. the network NH can be trained by standard methods from a knowledge of y(k) and the estimate y(k)(= NH (x(k))). PROBLEM 4 (Identification) A stable nonlinear plant is operating in a domain D ⊂ X. . 2. y(k − 1). . . Inputs and Outputs Accessible). u(k). . y(k − 1). as stated in the previous sections. . y(k − n + 1).4. and x(k) is bounded. as shown in Figure 2. respectively. u(k − n + 1)) (2. Two networks NF and NH are used to identify the functions F and H. . We shall use multilayer networks and radial basis function networks.2 Practical Design of Identifiers and Controllers (Linearization) Having established the existence of the appropriate functions for the identification and control of nonlinear plants. . Since the input set is compact.53) such that y(k) will follow any reference signal y∗ (k) with a sufficiently small amplitude. ˆ NF (k) can be identified using a series-parallel method. or off-line using data collected as the system is in operation. If a parallel model is used to identify the system.2. as well as recurrent networks for both identification and control. the state x and the output y of also belong to compact sets. Comment 10 Before neural networks are used in any system. then there exists a control law of the form u(k) = 1 ∗ [y (k + d) − α0 y(k) − α1 y(k − 1) − · · · − αn−1 y(k − n + 1) β0 − β1 u(k − 1) − · · · − βn−‘ u(k − n + 1) + g( y(k). training of NF should be carried out . .52) where β0 = 0 and ω(·) ∈ H. Because the state x(k) is accessible.11(b). . The objective is to determine a model ˆ of the plant using neural networks. Identification can be carried out online. · · · y(k − n + 1). as shown in Figure 2. the existence of the appropriate input-output maps must be established.48 Modeling and Control of Complex Systems THEOREM 3 (Tracking. depending upon the available prior information and the objectives. Theorems 1 to 3 establish the existence of such maps. and the practical realization of neural networks. while all the signals in the system remain in a neighborhood of the origin. Let have a well-defined relative degree and an input-output representation of the form: y(k + d) = α0 y(k) + · · · + αn−1 y(k − n + 1) + β0 u(k) + · · · + βn−1 u(k − n + 1) + ω ( y(k). .11(a).

the stability question is invariably present. and that any increase in the nonlinear component of the control input has to be gradual.52) and (2.12 shows the structure of the indirect controller proposed in Reference [12]. Once the model is structurally stable. . parameter adjustments can be carried out online.3. Hence. PROBLEM 6 (Plant unstable) At present we have no methods available for controlling an unknown. the region in the state space where x(t) lies has to be limited and a linear controller designed first before the nonlinear component is added to it. Adjustment of network parameters has to be carried out using dynamic back propagation. unstable. Comment 11 It may also be preferable to identify F (x. u) to distinguish between the contributions of the linear and nonlinear parts. as a first step.11 Identification. provided the adjustment is slow compared to the dynamics of . nonlinear plant with arbitrary initial conditions. standard adaptive techniques can be used to stabilize the system.53) have to be used. 49 x (k+1) x (k+1) x(k) F z –1 e (k+1) ˆ x (k+1) ˆ x (k+1) ˆ x(k) z–1 (b) Parallel off-line using the weights obtained by the series-parallel method as initial values. Figure 2. If only the inputs and outputs of the plant are accessible in Problems 2 and 3. Increasing the domain of stability is a slow process and involves small changes in the nonlinear components of the input. However. if the system is operating in the domain where linearization is valid. PROBLEM 5 (Control) Whether a multilayer network or a recurrent network is used to control the unknown plant. the NARMA model and the corresponding controller given by Equations (2. The direction in which the parameter vector of the neural network is to be adjusted must be determined using one of the methods (of dynamic back propagation) described in Section 2.Control of Complex Systems Using Neural Networks u(k) u(k) F x(k) e (k+1) z–1 NF NF (a) Series-Parallel FIGURE 2. u) as Ax+ Bu+ f (x. It is well known in industry that great caution is necessary while introducing nonlinear control.

4. So the control has to be evaluated in the presence of external and internal perturbations.40) is comparable to the linear components] we do not have an adequate theory at the present time. external disturbances are invariably present.1 Modeled Disturbances and Multiple Models for Rapidly Varying Parameters In the three problems stated in Section 2. numerous methods such as the use of a dead zone. and improves performance using neural networks and a nonlinear component of the input that is small compared to the linear component. Comment 12 The preceding sections indicate that the method of linearization is conservative but rigorous. If the trajectories of the plant lie in a large domain where the above assumptions are not satisfied [i. f (x. 2.2. the plant generally contains dynamics not included in the identification model. However. These methods have also been suitably modified for use in neurocontrol. From a theoretical standpoint..2 and discussed above there were no external disturbances. these methods can be rigorously .5. When small external perturbations and slow parameter variations affect the dynamics of the system. In addition. it was also assumed that the systems to be controlled are autonomous (or the plant parameters are constant) so that the plant can track a desired signal exactly as t → ∞. σ -modification and | |-modification have been suggested in the literature to assure robustness. This underscores the importance of Section 2. The reader is referred to the comprehensive volume on robust control by Ioannou and Sun [38] and the monograph [39] by Tsakalis and Ioannou for details concerning such problems. and the characteristics of the plant may change with time.50 Modeling and Control of Complex Systems Desired output ym – ei Neural Network Ni TDL Neural Network Nc TDL u Nonlinear Plant TDL Z–1 ˆ yp + – TDL yp + Σ ei(t) Σ ec(t) Reference Model r(t) FIGURE 2. u) in Equation (2.e. assures stability using only the linearized equations.12 Indirect control. understanding of which is essential to formulate precisely adaptive control problems in regions where nonlinear effects are predominant.

Instead. Due to space limitations we do not consider them here.55) where xv (k) ∈ Rn . .4. In classical adaptive control theory. They have discussed the approach in detail in Reference [40]. It can be shown that this results in y(k) [and hence y(k)] tracking ˆ y∗ (k) exactly asymptotically in time.. and a few of these are reported in Reference [40]. in the The NARMA models 1 . The plant switches slowly and randomly between the . A large number of simulation studies have been successfully carried out on very different classes of nonlinear systems.4. N approximate the behavior of different environments. and (2) where the parameters vary rapidly and multiple models are used. . The state x(k) of as well as the disturbance are not accessible and it is desired to track a desired output y∗ (k) exactly (as k → ∞) even in the presence of the disturbance.. 2.Control of Complex Systems Using Neural Networks 51 justified if control based on linearization is used.2. when both the plant and the disturbance model are linear. ˆ · · · . u(k − 1). y(k − 1).54) where x(k) and u(k) are the same as before and v(k) is a disturbance that is the output of an unforced stable disturbance model v where v : xv (k + 1) = g[xv (k)] v(k) = d[xv (k)] (2.2. it is known that exact tracking can be achieved by increasing the dimension of the controller.56) to identify the given system as an (n + m) th -order system and control it as in Problem 3. u(k − (n + m) + 1)] (2. The following NARMA identification model is proposed by the authors: y(k + 1) = N[y(k).2 Multiple Models The authors of this chapter are currently actively working in this area and without going into much detail we state the problem qualitatively in its simplest form and merely comment on the principal concepts involved.1. In the simplest case the N models are assumed to be known so that N controllers (each corresponding to one of the models) can be designed. u(k). A plant can operate in any one of N “environments” at any instant.1. u(k). we discuss two cases: (1) where the disturbances are large and can be modeled as the outputs of unforced difference equations. · · · y(k − (n + m) + 1). 2 . 2.1 External Disturbances [40] A SISO system is described by the equations: : x(k + 1) = F [x(k). Mukhopadhyay and Narendra have used this concept for disturbance rejection in nonlinear dynamic systems. v(k)] y(k) = h[x(k)] (2.

13. When the plant characteristics vary. where yi (k) is the output of the ith model] are computed. However. convergence of the model to the plant has been demonstrated in the linear case [41]. and all the trajectories lie in a neighborhood of the origin. Desired Output y* N environments. This has been referred to as the “switching and tuning” approach [42] and is widely used in many different fields (refer to Section 2. the model that corresponds to the smallest error according to an error criterion is chosen at that instant. If the plant comes to rest after a finite number of switchings. In substantially more interesting problems.52 Modeling and Control of Complex Systems ˆ yN + – Model Σ1 uN u u1 Plant y ˆ y1 + – + – Σ Control Error ec Σ e1 eN Model ΣN Σ Controller C1 Controller CN FIGURE 2. all the systems i share the same equilibrium state. The objective is to detect the change in the plant based on the available output data and use the appropriate controller corresponding to the existing model. and the controller corresponding to it is used. It has also been shown that the same is true for nonlinear systems. Simulation studies have been very successful and the transient performance of the system is substantially improved. they may not correspond exactly to one of the predetermined models assumed in the previous case. In the cases discussed thus far. transitions between equilibrium states may involve regions . at every instant.13 Multiple models. As shown in Figure 2. In such situations tuning of both the model i and the corresponding controller Ci may be needed online.7). The results obtained for deterministic systems have also been extended to stochastic systems [43]. the models have different equilibrium states and operate in neighborhoods corresponding to those equilibrium states. N errors [e i (k) (e i (k) = yi (k) − y(k)). provided all of them satisfy the linearization conditions stated earlier.

j = i. . in its simplest form it implies that each output is controlled by one input. . The desirability of decoupling in some practical applications is obvious. .2. 2. . The representation. as described in Section 2. Qualitatively. .Control of Complex Systems Using Neural Networks 53 where the nonlinear terms are dominant.1 Tracking Given a controllable and observable nonlinear dynamic system with m inputs ui (·) and m outputs yi (·) i = 1. . m. due to the couplings as well as the delays that exist between the inputs and outputs. . and the applicability of neural networks as practical adaptive controllers will be judged by their performance in such contexts. r (k). u(k − 1). . the system has a NARMA representation of the form (2. c (·) can be approximated using a neural network Nc [or as the sum of a linear function of u(k − i) and y(k − i) and a nonlinear component that belongs to H].23). It is these equations that are used to determine controllers for tracking a desired output y∗ (k). y(k) = C x(k).2 Decoupling An important practical question that arises in multivariable control is decoupling. .2. 2.5. .2. Stability is guaranteed by the linear part while exact tracking is achieved using the nonlinear component.58) As stated earlier in this section.2. . m (2. it can be shown that it can be represented in a neighborhood of the origin by the equations: yi (k + di ) = i [x(k). . .2. with well-defined relative degrees di . Based on the existence of control inputs for LTI systems. We comment briefly here on two questions that are relevant to the objectives of this chapter. y(k − n + 1) and u(k). and is not invariant with respect to ui . u(k − n + 1)] (2. .57) Because x(k) (by the assumptions made) can be expressed in terms of the outputs and inputs y(k). 2. . y(k − 1). . . y(k − n + 1).4. . . They are (1) the problem of tracking in multivariable control and (2) decoupling of multivariable systems.2 Control of Nonlinear Multivariable Systems [44] Most practical systems have multiple inputs and multiple outputs. Switching to assure that the system transitions from one neighborhood to another raises questions that require concepts of nonlinear control that are contained in Section 2. .2.4. and control of nonlinear multivariable systems are rendered very complex. For details concerning these problems the reader is referred to Reference [44]. In mathematical terms the output yi is invariant under inputs u j . identification. it can be shown that the desired input u(k) can be generated as the output of a multivariable nonlinear system: u(k) = c [y(k). u(k − n + 1). . u(k)] i = 1.4. 2. . For linear multivariable systems described by the equation x(k + 1) = Ax(k) + Bu(k). 2. . . .

⎤ ⎥ ⎥ ⎥ x(k) + E −1r (k) ⎦ (2. . . Each system j affects the input to the system i by a signal a i j x j where i has no knowledge of either a i j or x j (t).4. 2 . it can be shown using the same arguments as before that in a neighborhood of the origin. 47]. each subsystem attempts to cancel the perturbing signals ˆ ˆ from the other subsystems (e. the different dynamic systems are assumed to be linear.60) While approximate decoupling can be achieved using linear state feedback. As stated earlier..14 consists of N subsystems 1 .2. It was shown that the overall system would be asymptotically stable and that all the errors would tend to zero. 2. . Similar arguments can also be given for the case when decoupling has to be achieved using only inputs and outputs.. exact decoupling can be achieved by approximating g(·) using neural networks. h i j x j ) by using h i j xmj in place of h i j x j . N .2. The above problem was answered affirmatively in Reference [46. In adaptive contexts. r (k)) g ∈ H (2. exact decoupling is possible using a controller of the form: u(k) = Gx(k) + F r (k) + g(x(k). i is linear and time invariant. Since the solution of the linear problem is the first approximation for nonlinear control problems that are addressed using linearization methods. ..g. . to make the problems analytically tractable.59) c m Adm = Gx(k) + F r (k) (E is the matrix whose ith row is c i Adi −1 B). A system shown in Figure 2. whose objective is to choose a control input ui (·) such that its state xi (t) tracks a desired state xmi (t).54 Modeling and Control of Complex Systems it is well known that the system can be decoupled if a matrix E ∈ Rm×m is nonsingular [45] and ⎡ ⎢ ⎢ u(k) = E −1 ⎢ ⎣ c 1 Ad1 c 2 Ad2 . we present here an important result that was obtained recently and that may have interesting implications for decentralized nonlinear control of the type stated in Section 2. If it can be assumed that the desired outputs xmi (t) of the N subsystems are common knowledge. If the linearization of the given nonlinear system can be decoupled. and ˆ adapting h i j (t).3 Interconnected Systems All the problems considered thus far in this section are related to the identification and control of isolated nonlinear systems. The question that is raised is whether all the subsystem i can follow their reference inputs without having knowledge of the state vectors x j (t) of the other systems. interest in the control field is shifting to problems that arise when two or more systems are interconnected to interact in some sense.

Control of Complex Systems Using Neural Networks y*1 u1(t) Σ a1NxN a12x2 u2(t) Σ A1. BN. The authors claim that the NL q system form that they introduce [alternating sequences of nonlinear elements (N). 2.5). the dynamical systems satisfy the condition discussed earlier.61) ··· + Dq u(k)] · · · D2 u(k)] + D1 u(k)] (2. B1. C2 y2 Σ y*2 e2 y1 Σ e1 55 xN uN(t) Σ AN. Decentralized nonlinear control with nonlinear interactive terms between subsystems stated in Section 2. where the trajectories lie in larger domains in the state space.14 Interconnected system.62) . C1 x1 x2 A2. introduced by well-known researchers. we present in this subsection some representative samples of methods.3 Related Current Research The literature on neural network-based control is truly vast. The plant and controller models (represented by Mi and Ci ) have the general form: x(k + 1) = y(k) = 1 [A1 2 [A2 1 [c 1 2 [c 2 ··· q [Aq x(k)(k)] · · · q [c q x(k) + B2 u(k)] + B1 u(k)] (2. more advanced methods will be needed (refer to Section 2. provided that in the region of interest is in the state space. B2. and there is also intense activity in the area at the present time. For more general nonlinear interconnections of the type stated in Section 2. Because our main objective is to examine and present the basic principles used in simple terms.4. CN yN y*N eN Σ FIGURE 2. Dealing with all the ongoing work in any detail is beyond the scope of this chapter. linear gains (L) having q layers] represents a large class of dynamic systems that can be used as identifiers and controllers.2 is obviously the next class of problems of interest and can be attempted using neural networks.2. One general method for representing nonlinear systems for control purposes was introduced by Suykens and coworkers [48].

Similar laws are also derived for recurrent networks. and γ (·) are known nonlinear functions. Identifiers using the approach in Reference [49] are represented by recurrent networks described by ˆ ˆ x (k + 1) = W1 (k)σ ( x (k)) + W2 (k)φ( x (k))γ (u(k)) ˆ ˆ ˆ (2. y(k) ∈ Rm is the output. V2 ) = ˜T ˜ ˜T ˜ ˜ ˜ ˜ ˜ e T Pe + 1 Tr [W1 W1 + W2 W2 + V1T V1 + V2T V2 ] is a Lyapunov function. The same structure is used for both controllers and identifiers so that stability questions of the subsystems as well as the overall system are the same. A general structure was also introduced by Poznyak and coworkers [49] in which identification. V1 . by making several assumptions concerning the boundedness of various ˜ ˜ ˜ ˜ signals in the system. and γ are assumed to be bounded functions. For seriesparallel models [where the arguments of σ and φ are x(k) rather than x (k)] ˆ standard adaptive laws can be derived directly from those found in Reference [15]. . the authors demonstrate that V(e. which results in a modified back-propagation scheme. However. To approximate the plant dynamics the parameters of the matrices are adjusted using dynamic back propagation. σ . It is assumed that a general nonlinear system can be represented by the difference equation: x(k + 1) = Ax(k) + W1 σ (x(k)) + W2 φ(x(k))γ (u(k)) (2. φ. deriving adaptive laws for such models was known to be a very difficult problem and was never resolved for the deterministic case. and tracking are treated. The model is asymptotically stable if there exist diagonal matrices Di such that: −1 ||Di Ai Di+1 || ≤ 1.63) The above condition assures the existence of a Lyapunov function of the form V(x) = ||D1 x||. Taking advantage of the structure of the models and using Lure theory it is shown that sufficient conditions can be derived for asymptotic stability. The procedure used to control the plant is based on indirect adaptive control. estimation.64) where W1 and W2 are weight matrices and σ (·). . ∀ i (2. These are expressed in terms of the matrices [ Ai (i = 1. In classical adaptive control. . To assure stability (satisfying inequalities) it is shown that the adjustment of Ai can be realized by solving a nonlinear constrained optimization problem. . W2 . which 2 assures the convergence of the state error e to zero. However. φ(·). sliding mode control. W1 .56 Modeling and Control of Complex Systems where x(k) ∈ Rn is the state. and u(k) ∈ Rr is the input of the recurrent network where the complexity increases with q .65) ˆ ˆ and the adaptive laws for adjusting W1 (k) and W2 (k) are derived. The objective is consequently to estimate W1 and W2 and then to determine the control input u(·). 2 . Hence W1 and W2 determine a family of maps to which any given nonlinear system belongs. The emphasis of the book is on recurrent networks and we briefly outline below the approach proposed for the identification problem. q )]. Both σ and φ contain additional parameter vectors V1 and V2 corresponding to hidden layers.

Comment 13 The system is stable and gradient methods are used to approximate unknown functions. assumptions concerning the plant to be controlled are invariably made to have analytically tractable problems. In Section 2. If z1 = x1 .4 the authors provide their own view on this subject. Neural networks approximate f 1 and g1 as f 1 and g1 x and the latter are used to determine the control input u. θ ) and α(x1 . Using similar methods. In the rest of this section the evolution of these assumptions to their present form is traced briefly by examining a sequence of typical papers that have appeared in the literature. it was only natural that different assumptions were made.2. In Section 2. then ∂α ˙ u = −z1 − z2 + ∂x (x2 + f + θ T ζ ) + ∂α θ ∂θ 1 ˙ ˆ θ= ζ ∂α z1 − z2 ∂x 1 ˆ − σ ( θ − θ0 ) . (2.67) It is assumed that f (·) is known while φ(·) is unknown and that the state variables are accessible. With this assumption the problem becomes a nonlinear (second-order) version of the problems described in Section 2. in the next section. The input and output are related by the equation y = f 1 (x) + g1 (x)u where f 1 (x) = h T f 0 (x) ˙ x ˆ ˆ and g1 (x) = h T g0 (x). The objective is to regulate the system around the equilibrium state x1 = x2 = 0. it was stated that assumptions can make complex problems almost trivial.3. The authors believe that the neurocontrol community should reexamine the conditions that are currently assumed as almost self-evident. θ ) = −x1 − θ T ζ (x1 ).1. starting around the early 1990s.Control of Complex Systems Using Neural Networks 57 ASSUMPTIONS As has been stated several times in the preceding sections. and h are smooth.4. To make the problem tractable it is assumed that φ(x) = θ T ζ (x1 ).66) where x(t) ∈ Rn can be measured and f 0 . Chen and Liu [50] consider the problem of tracking in a nonlinear system (1994). Polycarpou [51] deals with a second-order system (1996): x1 = f (x1 ) + φ(x1 ) + x2 ˙ x2 = u ˙ (2. z2 = x2 − α(x1 . an adaptive ˆ law for obtaining an estimate θ of θ . g0 .68) . In fact. Because nonlinear adaptive control is very complex. and a control law u are derived and are ˆ ˆ ˆ shown below. Gradually. where ζ (x1 ) is a vector of known basis functions. At the present time they have become an integral part of the thinking in the field. The system is described by the equation: x = f 0 (x) + g0 (x)u ˙ y = h(x) (2. these became accepted in the field and researchers began to apply them to more complex systems. prior information that is assumed in linear adaptive control is indicated.

Based on these assumptions it is shown that a control input u(x) = α(x) + uc (x) can be determined to stabilize the system. . . is justifiable in this case. We discuss this further in the next section. xi ) + G i (x1 . and Fi s are unknown. x2 .. . The control used is seen to depend upon the partial derivatives of α which is estimated online. . u) is an external disturbance that is bounded and unknown. x2 . . xn )u ˙ (2. In the absence of the disturbance a control α(x) stabilizes the system. ω(x. . . we consider a representative sample of four papers here. . The objective is to regulate the system close to the equilibrium state. . xi )xi+1 i = 1. . 2. 1. x2 . i ˙ xn = f n (x1 . . . x2 . Out of a very large collection of papers published in the literature [54]–[60]. . u(x)) = W T S(x) for any control law. . u(x)) is approximated here using basis functions. . 2. . which is typical of backstepping. 2. xi )xi+1 i = 1. . x2 . Comment 14 As a mathematical problem the above is precisely stated. . In [54] Kwan and Lewis (2000) consider the tracking problem in a system described by: xi = Fi (x1 . x2 . an arbitrary function ω(x. in our opinion.70) where the state variables are accessible. because it is a function of only one variable. xi ) + gi (x1 . . the resulting control is found to be quite complex.69) where ω(x. x2 . . i ˙ xn = Fn (x1 . The following two assumptions are made: 1.e. xn ) + gn (x1 . . . x2 . The objective is to determine a control input such that x1 (t) tracks a desired output asymptotically. . G i s are known and sign definite. . i. . xn )u ˙ (2. . Rovithakis [52] describes a system (1999) by the equation: x = f (x) + g(x)u + ω(x. which are based on backstepping procedures and utilize basis functions. Comment 15 It is assumed that the nominal system (without the disturbance) is stable and that an explicit Lyapunov function V(x) is known for it. . . u(x)) lies in the range of the basis functions S(x). . an expanded version of this paper was presented in Reference [53]. 2. Ge and Wang (2002) consider a similar problem in Reference [58] in which the system is described by the equations: xi = f i (x1 .58 Modeling and Control of Complex Systems In spite of the assumption and the low order of the system. . Also while φ(x1 ) in (d) was a function of a single variable. u) x(t) ∈ Rn ˙ (2. . In 2004. . The assumption that φ can be approximated using basis functions. . . . xn ) + G n (x1 . . . . . ω(x. and a Lyapunov function for the nonlinear system is V(x). .71) .

. . xi ) where ξi are basis vectors. If the plant is to be identified (i. . . It is further assumed that |gi | ≤ gid i = 1. we shall raise several questions and provide brief comments regarding each one of them to help us to critically evaluate the contributions made by the different authors. Because the approach based on linearization described in Sections 2.4. . Following this. . xi ) + xi+1 i = 1. xi ) in Equation (2.4. .3 we discussed some typical methods that are representative of a large number of others that have also been proposed and share common features with them.4 Theoretical and Practical Stability Issues In Section 2. . . .72) can be expressed as θiT ξi (x1 . Most of them claim that their approaches result in global asymptotic stability of the overall system and robustness under perturbations with suitable modifications of the adaptive laws. .2 is closest to classical adaptive control.1 Linear Adaptive Control The theoretical study of linear adaptive control starts with the assumptions made concerning the plant to be controlled.1 and 2. The above typical papers clearly indicate the thrust of the research in the community and the emphasis on basis functions. n − 1 ˙ xn = f n (x1 . in one way or another.4. we shall briefly comment on that.e. Zhuang. . . attempt to emulate classical adaptive control. The plant is assumed to be linear and time invariant (LTI) of order n.Control of Complex Systems Using Neural Networks As in (1) it is assumed that gi are bounded away from 0. . zn related to the unknown functions f i and gi lies in the span of a set of known basis functions. . Li.4. the overall system is . 2. . ˙ (2. . ˙ 4.. . An upper bound on n is known. . xn ) + u. If the plant is to be controlled. . Qiang. x2 . . parameters are to estimated) it is normal to assume that it is stable with bounded inputs and bounded outputs. 2. n and that a complex unknown ˙ function of z1 . . All the zeros of the plant transfer function are assumed to lie in the open left half of the complex plane (minimum phase). 2. we shall start with the latter to provide a benchmark for examining and comparing the different methods. .72) 59 Tracking of a desired signal is achieved by computing a control input assuming that each element f i (x1 .4. As all of them. .4. . it is generally assumed to be unstable and whatever adaptive scheme is proposed is expected to stabilize it. z2 . . x2 . . and Kaynak (2004) [59] consider the same system as in (2) and attempt the same problem with the same assumptions on gi and gi but use two different sets of basis functions. x2 . with 2n unknown parameters. Because the controller parameters that are adjusted become state variables. 2. Wang and Huang (2005) [60] consider the system described by: xi = f i (x1 . . 3.

2.60 Modeling and Control of Complex Systems nonlinear with an extended state space. x2 .4. Hence all the results of linear adaptive control carry over.4. this is what is meant by the proof of stability in adaptive control.4.1. THE PROBLEM: A system is described by the differential equation: x1 = x2 ˙ x2 = x3 ˙ ··· xn = f (x1 .2 Nonlinear Adaptive Control Using Linearization Methods As shown in Section 2. as the time ˙ derivative V of V is always negative semidefinite and not negative definite. and the input u is to be chosen so that the output x1 tracks a desired output y1d which is the . and interconnected systems.4. Theoretically. By a suitable choice of a Lyapunov function V and corresponding adaptive laws. This is the same procedure adopted for multivariable control. If only linear adaptive control is used. the neural networks are invariably adjusted on a slower time scale. However. It is next shown that the control error tends to zero. the state variables xi are accessible. in the method based on linearization. .73) f (·) : Rn → R is smooth. it is first shown that the system is stable and that all signals and parameters are bounded. It is in this extended state space that all properties of the overall system are studied. but are valid only in this domain.4. . Also. because the overall nonlinear effect (due to plant and controller) is small compared to the linear terms.3 Nonlinear Adaptive Control The following simple adaptive control problem is a convenient vehicle for discussing many of the questions related to purely nonlinear adaptive control. Additional conditions on the reference input (persistent excitation) are needed to show that the parameter errors tend to zero (asymptotic stability). the errors do not tend to zero due to the presence of the nonlinear terms. 2. xn ) + u ˙ (2. . Another important consequence of the linearity of the plant is that all the results are global. control using multiple models. to assure that the approximation is sufficiently accurate. Asymptotic stability never follows directly. It is at this stage that neural networks are needed to compensate for the nonlinear terms and make the control error tend to zero. adaptation of both linear and nonlinear terms can be fast. . we are operating in a domain where the linear terms dominate the nonlinear terms.

75) the error e = ( y − yd ) satisfies the same stable homogeneous differential equation and tends to zero asymptotically. • Suggesting that f (x). This solution was known in the adaptive control field thirty years ago. The theoretical solution is obtained by using the control input n u=− i=1 a i (t)xi − α T η(x) + r ˆ (2. the choice of u(·) is simple.Control of Complex Systems Using Neural Networks output of a stable nth-order differential equation: n−1 (n) yd + i=0 (i) αi+1 yd = r 61 (2. Instability may be caused by the residual error between f (x) and α T η(x).76) and adjusting α using an adaptive law derived from the error model ˆ in Figure 2. Consequently. . this assumption is not valid since stabilizing the system is one of the main objectives of adaptive control. Obviously. • To measure the inputs and the outputs of to estimate f (·). x ∈ Rn .74) where r is a known bounded reference input. information concerning the function f (·) in the strict adaptive control problem can be obtained only from the inputs and outputs of .4. n = 1 may be an exception). the problem is trivial from a theoretical standpoint. and strictly speaking one for which a solution does not exist unless some assumptions are made concerning f (·). • If it is assumed that f (x) = α T η(x) where η(x) is a known vector function of x and α ∈ Rn an unknown parameter vector. can be approximated by α T η(x) by choosing ηi (x) as basis functions and N sufficiently large is impractical even for small values of n (as stated in Polycarpou [51]. it is with these assumptions that we are concerned here. If n u=− i=1 a i xi − f (x) + r (2. f (·) unknown: This is an adaptive control problem. The prior information concerning f (·) determines the nature and complexity of the adaptive control problem. ˆ • If f (·) is a part of a nonlinear dynamic system to be controlled. it must be assumed that the system is stable. f (·) known: In the deterministic case where f is known.

If truncated models are used to identify a nonlinear system. it is not clear how the approximation error scales with the dimension n. we will assume that the plant is stable only if identification is of interest. is it a continually evolving process (such as an aircraft or a chemical process) or can it be stopped or interrupted and reinitiated (like a broom balancer or a biped robot [as in Section 2. In our opinion. In the twentieth century. The many theoretical proofs given in the literature are consequently not acceptable in their present forms. If the plant is unstable. Below we abstract from the above sample problem a set of general questions that we feel have to be answered before a nonlinear adaptive control problem is addressed. 4. 3. For example.7])? These two represent very different classes of problems. the effect of making the model more complex is not clear. from both theoretical and practical standpoints f (x) = α T η(x) is not a satisfactory parameterization of the approximator (though it may be convenient to derive adaptive laws). a large number of papers dealing with very complex nonlinear adaptive control problems with multiple unknown nonlinearities and highdimensional state space have appeared in the literature. Are basis functions a generally acceptable way to approximate a nonlinear function? Although it is true that any continuous function f (x). Hence. 1. only the method based on linearization can be used to stabilize an unknown unstable system. following adaptive control. the problem has not been solved rigorously thus far. and unstable if the principal objective is stability. The number of basis functions increases dramatically with the dimension of the space over which the function is defined. Online adaptive control refers only to the first class. The authors have considerable experience with approximation methods and have carried out extensive numerical identification for many years using different approaches. the magnitudes of the residuals are evident in such cases. Is the plant stable or unstable? As stated earlier. numerous representations for nonlinear systems have been proposed. Repetitive . One extreme assumption would be to assume that the plant is linear! But this will limit the class of plants to which the methods can be applied. These have shown that N must be very large even for simple functions. 2. With the models proposed in References [48] and [49] this is not the case. including neural networks. x ∈ Rn can be approximated as f (x) ≈ α T η(x) α ∈ R N using a sufficiently large number N of basis function (ηi (x)). However. at present. In all cases the nonlinearities are approximated as α T η(x) where η(x) is known.62 Modeling and Control of Complex Systems If f (·) is unknown. starting with the work of Volterra. Is the representation of the plant sufficiently general? It is clear that assumptions have to be made to render the problem tractable.

Control of Complex Systems Using Neural Networks learning. and the choice of the basis functions dictated by it. To the authors’ knowledge such investigations have not been carried out thus far.e. Stability questions also arise in computer simulations but very little cost is attached to them. This has not impeded the use of neural networks in practice to improve the performance of systems that have already been stabilized using linear controllers. However. The latter operate in real time (i. the plant trajectories should also lie in S. In the case of online adaptive control. Once again this demonstrates that successful applications do not necessarily imply sound theory. the behavior of the system outside S being unknown. 7. the dynamics of adaptation is as fast as the dynamics of the plant). a neural network can be trained (slowly off-line or online) to identify the system in S. If the plant is stable. Obviously. almost all of them can be shown to work satisfactorily if the plant is stable and the adjustments are sufficiently slow.. and they can be reinitiated. is not online adaptive control but is important both practically and theoretically. The reference trajectory should therefore lie inside S and during the adaptation process. This accounts for the great care taken in industry while trying new control methods. as seen from the next question. is the region S in the state space in which the trajectories lie known? We shall assume that this is indeed the case (though it is not a simple assumption in higher dimensions). as in the second case. regions outside S can be explored and a feedback controller can be designed through repetitive learning. However. stability questions arise only in the latter.7 attest to this. Much greater emphasis has to be placed in the future on the prior information assumed about f (x). The solutions for the most part are not mathematically precise. 63 In the authors’ opinion. Therefore. 6. This accounts for neural networks performing very well in practical applications. If a process can be interrupted. any input that drives the system outside S can result in instability. If this is all the prior information available. No gradient method in real time has been demonstrated to be stable. such a method operating in a slow time scale cannot stabilize an unstable plant. better theoretical formulations of problems are needed. there have been very few real theoretical results in nonlinear adaptive control using neural networks. Are gradient methods or stability-based methods used in the adjustment of the controller? The essential difference between the two is in time scales. Are the controllers to be designed off-line or online? As seen from earlier comments. The applications described in Section 2. . regions outside S can be explored incrementally using continuity arguments and approximating properties of neural networks. 5.

2.64 Modeling and Control of Complex Systems 2. Nevertheless. The mathematical machinery used in the study of global system theory consists of differential geometric methods. Our objective in this section is to point the reader to the excellent and insightful body of literature that exists on the subject as well as to convey the intuition behind the principal ideas involved. the characterization of observability.5 Global Control Design A basic problem in control is the stabilization of an equilibrium state. An obvious question of both practical and theoretical importance is whether or not the region of validity can be extended to larger domains of the state space. is curved and is defined as the manifold M. and the theory of topological groups. We begin by characterizing the natural state space of a nonlinear dynamic system. Whereas the manifold is an abstract geometric object. The first question that arises naturally is the essential difference between linear and nonlinear systems. It is important to keep this in mind when designing global nonlinear controllers. and realization “is not more difficult than linear systems of the usual type in Rn ”[62] p.1 Dynamics on Manifolds An excellent textbook on the subject is Boothby [63]. the development of adequate mathematical tools has always been guided by linear intuition and aimed at finding analogies of the concepts developed in linear systems theory. The space of nonlinear systems. In other words. the coordinate system is the physical handle on that object through which we have to interact with the system when we control it. where a point p ∈ M if there exists an open neighborhood U of p and a homeomorphic map ϕ : U → ϕ(U) ⊂ Rn . As Brockett put it. from a geometric viewpoint.5. called the (local) coordinate chart of M. The study of the global properties of nonlinear systems and their control is much more complicated than the local approaches employed so far. 1. The state space Rn of a linear system is “flat” in the sense that it expands to infinity along the direction given by the vectors of a basis of Rn . nonlinear stabilizers were developed which are valid in the neighborhood of such a point. the theory of foliations. This simple fact has an important consequence: many coordinate systems may be needed to describe the global evolution of a nonlinear dynamic system. the manifold “looks” locally like Rn . controllability. This will permit the neural network community to formulate well-posed problems in the design of nonlinear controllers using neural networks as well as to chart future directions for research. on the other hand. In Section 2.4. One important idea is the following: The flow of a system is a C 1 -map φ : R × U → M sending an initial value p ∈ U defined on some open neighborhood U ⊂ M to a value . even as the state space of a system becomes a differentiable manifold.

The mappings are denoted by: Xp : C ∞ ( p) → R (2. But what happens to the tangent vectors attached to p ? We define a map (called the tangent map of F at p) F∗ : Tp M → TF ( p) N as follows: F∗ X p (h) = X p (h ◦ F ) (2. Geometrically.81) where h is (again) a smooth scalar valued function h : M → R. that is. we have to find a new set of local coordinates. that is. It defines a vector field X p as follows: Xp = d φ(t.79) which is called the directional (Lie-) derivative of h along f . f (x)). p) ∈ M (at time t ∈ [t0 . The tangent map ϕ∗ : Tp M → Tϕ( p) Rn is used to define local representatives of the tangent vectors X p at p ∈ M. Once the solution leaves the neighborhood U in which the representation is valid. we obtain the usual differential equation: f (x) = x ˙ (2. ϕ∗ ( X p )) =: (x. It is easily checked that F∗ X p is indeed an element of TF ( p) N.77) on the smooth manifold M.Control of Complex Systems Using Neural Networks φ(t. Hence the argument of X p in Equation (2. it is composed of the point p on M and the corresponding tangent vector X p ∈ Tp M attached at that point. Furthermore ϕ( p) = x for all p ∈ U. ϕ). . the tangent space to N at F ( p) (see Figure 2.15). The tangent bundle defined as: TM = p∈M Tp M (2. Any vector field defines a linear operator assigning to any h ∈ C ∞ (x): n (L f h)(x) = i=1 f i (x) ∂h ∂xi x (2.82) is 2n-dimensional with natural coordinates (ϕ( p). Denote by C ∞ (x) the set of smooth functions defined on a neighborhood of x ∈ Rn . p) dt 65 (2. h ◦ F denotes the composition of h and F (at p).78) Notice that f (x) is merely the local representative of the vector field X p defined in Equation (2. t1 ]). p) ∈ M at t = 0.81) is simply the function h(·) evaluated at the point F ( p). Given a coordinate chart (U. the vector field assigns to each p ∈ M a tangent vector given as an element of a linear space of mappings called tangent space Tp M to the manifold M at p.77) t=t0 X p is the tangent vector to the curve t → φ(t.80) Given a smooth map F : M → N it is clear that for any point p ∈ M we have F ( p) ∈ N.

We are interested in designing a globally stabilizing feedback controller for a general nonlinear system of the form (2.g. We are now ready to define a (smooth) vector field as the mapping: X : M → TM (2.5. the solution for Equation (2. u(x)). u∗ = u(x ∗ )] is an asymptotically stable fixed point of the closed-loop system f (x. we set M = Rn in this section.66 Modeling and Control of Complex Systems F TpM p Xp F*Xp F(p) N M TF(p)N FIGURE 2. We assume that the solution for Equation (2. the region of attraction of x ∗ is a compact subset K ⊂ Rn . We restrict ourselves to the case of semi-global stabilization.84) where x ∈ Rn . Reference [64] and references therein) we select one that closely builds upon the results obtained in Chapter 4..84) exists up to infinite time for any u ∈ V ⊂ Rr fixed. a smooth connected manifold. 2.2 Global Controllability and Stabilization Keeping in mind that the representation of a dynamic system on curved spaces requires many local coordinate systems. Among the many ideas developed for nonlinear feedback control (see. Notice that the tangent space Tx Rn to the Euclidean space at any point x is actually equivalent to Rn . This is fundamental if neural networks . The vector field is parameterized by the controls u ∈ V ⊂ Rr . in fact it will allow us to extend the results in the very direct meaning of the word (see Reference [65]). In view of the above a nonlinear control system of the general type can be defined in local coordinates as follows: : x = f (x.84) where ϕ( p) = x ∈ Rn and ϕ∗ ( X p (u)) = f (x.83) assigning to every p ∈ M a tangent vector X p ∈ Tp M (in a smooth way).84) exists up to infinite time for any fixed u. u) ˙ (2. that is. u) ∈ Tx Rn are both defined in a neighborhood U of M. e.84). Given the control system (2. A first question is whether a smooth feedback can be found that stabilizes the system. The problem is to find a feedback control u ∈ V such that [x ∗ .15 Tangent maps.

. A fundamental property of discontinuous controls is that it may generate additional directions (i. u) = 0} 67 (2.85) denote the equilibrium set of the control system.86) is not smoothly stabilizable in the large. Equivalently. u) ∈ Rn × Rr | f (x. Moreover. ui ) where i = 1. It turns out that for some point [x ∗ . As an example [65].88) where φu (t. Example (adapted from Reference [64]) Consider a kinematic model of a car (front axis) with position [x1 . the system: x1 = u1 ˙ x2 = u2 ˙ x3 = x2 u1 − x1 u2 ˙ (2. If Equation (2. The conditions motivated the introduction of discontinuous feedback laws. because no point of the form x = [0 0 ε]T is in the image of f (x. x ∗ ∈ Rn u(t) : [0. u = 0 is bounded. we are interested in a special kind of controllability (discussed in Section 2. a general smooth system defined on a compact set K ⊂ Rn is never globally smoothly stabilizable because its equilibrium set is evidently bounded. T] → V ⊂ Rr . V ⊂ Rr is a finite set. Another necessary condition for C ∞ stabilizability obtained by Brockett [66] is that f (x. the state space of the car is given . N. that is. as its equilibrium set defined by 2 2 x1 + x2 = 1. It is clear that the controls ui ∈ V = {u1 .Control of Complex Systems Using Neural Networks are to be employed to approximate and implement the control law. As an example.. u) : Rn × Rm → Rn maps every neighborhood of (x ∗ . u N } generate different vector fields f i = f (x. the system 2 2 x1 = x1 + x2 − 1 ˙ x2 = u ˙ (2. Let f −1 (0) = {(x. Accessibility in general is the property that the above holds for arbitrary u : [0. T] → V ⊂ Rr . .87) does not have a continuous stabilizing feedback. Reference other than f i ) where the system may evolve. controllability means that every point x0 ∈ K can be steered into x ∈ Rn . x2 ] ∈ R2 and the angle of rotation x3 ∈ S1 . u).e. x0 ) is the flow of f (x. . . . Every vector field when applied to the system will cause the state variable x to evolve in the direction tangent to it. u(x ∗ )] ∈ f −1 (0) to be smoothly stabilizable f −1 (0) must be an unbounded set. u piecewise constant such that φu (T. x0 ) = x ∗ (2. u∗ ) onto a neighborhood of zero.88) holds for every point x0 ∈ K then x ∗ is said to be piecewise constantly accessible from the set K . .2): a point x0 ∈ K is piecewise constantly steered into a point x ∈ Rn if: ∃ T ∈ R. T > 0 for x0 ∈ K . . . To this end. u) on Rn with initial value x0 .

We know that if the linearized system L = ∂f ∂x x ∗ . It can be shown that the system moves infinitesimally in the direction orthogonal to the drive direction. u∗ ) over Nx∗ . w ∗ ) ∈ f −1 (0). as it implies that the points attainable from some point x0 by the vector fields f i lie not only in the directions given by linear combinations of f i but also in the direction of the (iterated) Lie brackets of f i .89) f 1 = ⎝ cos x3 ⎠ “drive 1 0 As the experienced driver knows. (2. The system must be locally stabilized at the point (x ∗ . 68]). For the interested reader some of the more formal mathematical statements regarding Lie brackets are included at the end of the section (see also Reference [67. that is. the Lie bracket writes [ f 1 . in order to park the car. Then w = w(x) has a piecewise smoothly stabilizing extension u = u(x) : Rn → V over K if and only if Nx∗ is p. The two applicable events are “drive” and “rotate” that corresponds to the two vector fields: ⎛ ⎛ ⎞ ⎞ sin x3 0 f 2 = ⎝ 0 ⎠ “rotate (2. where Nx∗ is an open neighborhood of x ∗ such that its closure Nx∗ is an invariant set of the closed loop system x = f (x. the resulting flow is obtained as the composition of the flows induced by the fields f 1 . Re s ≥ 0 (2. Let K be a compact set and w ∈ Int V where V is the set of admissible controls. ˙ Let us highlight the conditions of the theorem. B) is stabilizable.90) In the above example we obtain [ f 1 . u∗ ). It is required that 1. that is. In order to verify (1) and realize the local stabilizing controller the methods described in Chapter 4 are used.68 Modeling and Control of Complex Systems by M = R2 × S1 . we state the main result of the section. 2. − f 1 . one has to use the switching sequence given by “roll – rotate – roll back – rotate back”.91) rank(s I − A. constantly accessible from K .u∗ f . The point x ∗ must be piecewise constantly accessible from a compact set K . which is due to Reference [65]. u∗ ) ∈ f −1 (0). w(x)) and w = w(x) smoothly stabilizes in (x ∗ .u∗ := ( A. f 2 . . The new direction corresponds to the Lie bracket of the vector fields f 1 and f 2 . ∂u ∂ x ∗ . and − f 2 . THEOREM Let w = w(x) be a smooth feedback that locally stabilizes at (x ∗ . The Lie bracket is instrumental in understanding nonlinear control systems. The Lie bracket of two vector fields is another vector field that measures the noncommutativeness of the flows induced by both vector fields. f 2 ]x = ∂ f2 ∂x x f 1 (x) − ∂ f1 ∂x x f 2 (x).w. At present. B) = n whenever then the original system is locally C ∞ stabilizable at (x ∗ . In local coordinates. f 2 ] = [− cos x3 sin x3 0]T .

88). Based on this existence theorem. N} and a switching sequence σ (x) : Rn → that decides which control law is to be used depending on the state of the system.96) . We define a linear feedback: u = −x2 (2. . Assuming global accessibility of the origin (the question is addressed later) we define the piecewise smooth feedback: 2 u = −sign(x1 ) 1 + x1 − sign(x2 ) (2. i = 1.94) which steers the system state to the region where the linear stabilizing control law is valid. The theorem states that the semiglobally stabilizing control law will be given by a set of smooth controls ui . . Notice that one of the controllers ui corresponds to the local stabilizer at x ∗ while the others serve to enlarge the region of attraction of x ∗ in the closed-loop system. This can be verified using the strict Lyapunov function: 2 V = |x1 | + 0. u) | u ∈ V ⊂ Rr } generates a Lie algebra of differentiations of C ∞ (Rn ). i ∈ = {1. N + 1 neural networks can be used to implement ui . . .5 x2 (2.Control of Complex Systems Using Neural Networks 69 Condition (2) refers to our above discussion and Equation (2. Consider the control system: x = f (x) + g1 (x)u1 + . Example Find a global controller that stabilizes the origin of the system: 2 x1 = −x1 + x2 1 + x1 ˙ x2 = u ˙ (2. Controllability of a nonlinear system depends upon the way the family of vector fields FV = { f (x. N and the switching function.93) to stabilize the nonlinear system (2. Figure 2. . . .92) We verify condition (1) and find that the controllability matrix of the linearized system (at zero) has full rank. The first N networks play the role of function approximators while the ( N + 1)th network is used as a classifier of the state space of the system. This is possible since the system is in fact semiglobally controllable.16 displays the local stability region and a sample trajectory which starts from outside this region and is steered to the origin using a piecewise smooth control.92) in a neighborhood of the origin. ˙ drift vf control vectorfield (2. It is clear that the control law cannot be extended to an arbitrary compact domain 2 K ⊂ R2 . + gr (x)ur . .95) ˙ The time derivative along the trajectories of the closed-loop system is V = −|x1 | − |x2 | < 0.92). . because the quadratic term x1 will eventually dominate the stable part “−x1 ” on the right-hand side of Equation (2. .

. The controller consists of a piecewise smooth feedback of the form (2. The set of points reachable from x0 ∈ K lie on the integral manifold of x . Frobenius’ theorem states that is integrable if and only if it is of constant rank and involutive.16 Piecewise smooth stabilization in the large. f 2 ] ∈ at every point x ∈ Rn . The set of vector fields of Equation (2.70 Modeling and Control of Complex Systems 3 2 1 0 −1 −2 −3 −3 −2 −1 0 1 2 3 FIGURE 2. [ f. Given an initial point x(0). This is achieved by successively including “new directions” to obtained by forming higherorder Lie brackets of the vector fields in FV . .97) We form a distribution of vector fields. is controllable if it is possible to construct an integrable distribution of dimension n.94) (outside the local stability region) and (2. 2 1 + x1 0 (2. x is called a distribution of vector fields at x and is a subspace of the tangent space Tx Rn of Rn at x. that is.98) We include only those Lie brackets that are not already contained in lowerorder Lie brackets of vector fields in FV . .93) (in a neighborhood of the origin). g]} = span 0 1 . x = span{g. we wish to determine the set of points that can be reached from x(0) in finite time by a suitable choice of the input functions u1 . Lie x FV is the linear space spanned by the tangent vectors of Lie FV at that point. Thus. f 2 ∈ ⇒ [ f 1 . In the above example we have f (x) = 2 −x1 + x2 1 + x1 0 and g(x) = 0 1 (2. closed under Lie brackets: f 1 . Given any point x ∈ Rn . . ur . .92) that can be obtained by applying different controls u spans a linear space x at any point x ∈ Rn .

Stabilization of a nonlinear system is more difficult if only a function of the state is available. see Equation (2. His proof makes use of a “generic transversality” theorem of differential topology to characterize observable flows. and control theory which addresses the questions of global nonlinear control.Control of Complex Systems Using Neural Networks 71 The dimension of that space is called the rank of the Lie algebra Lie FV at the point x ∈ Rn . . The system defined in Equation (2.2. conditions under which strong observability holds have been investigated and it is shown how this can be used to construct global input-output models of the nonlinear system. observability. .5. The construction of the distribution enables us to identify the set of reachable states without specifying the control input u1 .3 Global Observability In the above.96). it was assumed that the state x of the system is accessible.98). Comment 16 In this section we found that the extension of familiar concepts such as controllability. An extension of this result to “universal observability” has been given in Reference [70] in which the output function is only continuous. A weaker form is generic observability.99) The condition is evidently fulfilled in our example. ur .100) where h : Rn → R is a C ∞ function representing some measuring device. This is referred to as strong observability. Constructive methods for actually realizing global controllers based on geometric control theory are only sparely available and often . In a paper by the first author [19]. one can distinguish these initial states by observing the values of the “output” function h(x) for any input sequence of length l. given two initial states x1 and x2 . The question is whether. Unlike the linear case it is impossible to pass directly from controllability conditions to observability because in the nonlinear domain there is no clear notion of duality. topology. As is well known from control theory and has also been stressed in Section 2. in the way introduced earlier: y = h(x) (2.84) is globally controllable provided that: rank Lie x FV = dim Rn ∀ x ∈ Rn (2. However. which requires that almost any input sequence of length greater or equal to l will uniquely determine the state. and stabilization to the nonlinear domain requires new mathematical tools and insights. The set depends exclusively on geometrical properties of the system (2. this literature has not yet entered the engineering literature. the critical property in this case is observability. Over the last thirty years a rich body of literature has been created at the intersection of differential geometry. 2. In Aeyels [69] it is demonstrated that almost all smooth output functions pair with an almost arbitrarily chosen smooth vector field to form a globally observable system provided that l = 2n + 1 samples are taken.

Hence it addresses problems of optimization under uncertainty and bears the same relation to Section 2. unlike the adaptive control problems treated earlier. In this section. in other cases solutions may have to be computed online. Even though the authors are not actively involved in research in this area at present. These are problems in which optimization is carried out over a finite time. we wish to discuss this topic briefly and clarify the concepts involved. We have not reached this stage yet in the nonlinear domain but the material presented in this section is meant to be the first step in that direction. The fact that switching may be involved in order to overcome the topological obstruction to global stabilization must become common knowledge in much the same way as state space properties are in linear systems.6.1 Neural Networks for Optimal Control A question that arises in all decision making in both biological and engineering systems concerns the extent to which decisions should be based on memory and online computation. in some cases retrieval of stored answers may be preferable.72 Modeling and Control of Complex Systems involve a high level of mathematical formalism which is hard to grasp intuitively.2 the principal mathematical vehicle is dynamic programming. our ideas for successful and ingenious design come from the “feel” we have about the effect of the chosen design on the system behavior. . Optimal control is the principal tool. In Section 2.4 bear to feedback control theory. 2. the system to be controlled is either unknown or partially known. based on data obtained at that instant. and is used to determine optimal control inputs as functions of time. as well as the analytical difficulties encountered. a few years ago the first author had an active program in the area of optimal control using neural networks. In Section 2.4. the system to be optimized is assumed to be completely known.1. there is currently a great deal of interest in the use of neural networks in optimization and optimal control problems. For example.6. The multiplemodel approach described in Section 2.6 Optimization and Optimal Control Using Neural Networks As stated in Section 2.6. More importantly.6. The information collected is used to design neural networks to act as feedback controllers.1 that adaptive control problems discussed in Section 2. much of the motivation for using different approximation schemes. Finally. Hence. we describe methods proposed in the last decade that utilize the above concepts for solving optimal control problems using neural networks. 2. although the problems treated in this section involve optimization over a finite time interval. In practice. are very similar.1. for the sake of completeness.4 provides the architecture for orchestrating the action of the neural networks involved in global nonlinear control. and consequently has some familiarity with such problems.

1 Function Approximation Using Neural Networks The gradual evolution of the “solve and store” approach from function approximation to optimal feedback control is best illustrated by considering several simple examples. Substantial progress was made. The problem was solved 1000 times for randomly chosen values of x1 and x2 in a compact set and a two-input two-output neural network shown in Figure 2. controllers of the form u(x)] are required.1.2. where feedback controllers [i.. PROBLEM 7 Consider the network shown in Figure 2. Because the concepts may prove attractive to future researchers in the field.e. which makes them nonrobust in practical applications.. 2.17 Estimation of voltages in a time-varying electrical system. The network outputs were compared Rx1 R(x2 – x1) R(1 – x2) x1 Vs R1 R2 R3 x2 (a) FIGURE 2. while three other resistances depend on variables x1 and x2 . the research was terminated.17a representing two electrical vehicles operating on a track. x2) Neural Network V2(x1. and R3 are fixed resistances. However. R2 . This is a standard application of a neural network as a function approximator. V1(x1.Control of Complex Systems Using Neural Networks 73 Theoretical methods such as Pontryagin’s maximum principle and Bellman’s dynamic programming exist for determining optimal controls for nonlinear dynamic systems with state and control constraints. they are discussed briefly here. R1 . solutions for specific problems can rarely be determined online in practical problems. u(t)]. The objective is to estimate V1 and V2 for given values of x1 and x2 .e. The approach itself was motivated by a problem in a transportation system involving N electrical vehicles whose positions and velocities determine voltages which are critical variables of interest.6. The authors suggested that open-loop solutions of optimal control problems computed off-line could be used to train neural networks as online feedback controllers. In optimal control theory the solutions are obtained as functions of time [i. x2) (b) . The voltages V1 and V2 across R1 and R2 are nonlinear functions of x1 and x2 and are the values of interest. A simplified version is given in Problem 7. During the period 1994 to 2000 very promising methods were proposed by Narendra and Brown [71] for circumventing these difficulties.6. However. and the authors succeeded in proposing and realizing solutions to relatively complex optimal control problems. for a variety of reasons. and also as an introduction to Section 2. even as efforts to improve the scope of the approach were proving successful.17b was trained.

18. α) = (1 − α)[10 + 0. from which the optimal values x1 (0. α) = 0 are shown for two typical values of α in Figure 2. PROBLEM 9 A function f (x. PROBLEM 10 (Dynamic Optimization) Problem 9 sets the stage for dynamic optimization in which neural networks can be used effectively as feedback controllers in optimal control problems.1(x1 + 10α − 5) 3 + α(−3x1 − 10) − x2 ] = 0 x2 Plots of f (x. 2 2 f (x.5)x1 + (1+α)2 − 1+α x1 x2 2 (2. where available data have to be processed at many levels to obtain the relevant inputs and outputs necessary to train neural networks. α) has to be minimized subject to an inequality constraint g(x.6. x2 (0. 2) substantially more complex. α) = (α + 0.8.2 Parameter Optimization The next stage in the evolution of the method was concerned with static optimization in which parameter values have to be determined that optimize a performance criterion in the presence of equality or inequality constraints. A system is described by the differential equation: x = f [α(t). x2 (0. The general statement of such problems is first given and two examples are included to illustrate the different specific forms it can take. As shown below.74 Modeling and Control of Complex Systems with true values for test inputs and had standard deviations of 0. x2 ]T ∈ R2 .1] and a neural network has to be trained to obtain the corresponding optimal values of x1 (α) and x2 (α). α = 0. α can assume different values in an interval [0. The constrained parameter optimization problem was solved 100 times for 100 values of α to train the network to obtain optimal solutions for values of α not used before.2) and x1 (0. α) = c (a constant) and g(x.1. This problem introduces the principal difficulty encountered in many of the problems that follow. PROBLEM 8 All the resistances in Problem 7 were linear. respectively.8) can be computed. It is seen from Figure 2. making the computations of currents and voltages simple for any choice of x1 and x2 . For every choice of x1 and x2 the problem had to be solved iteratively and the results used to train the neural network. α) = 0.102) .2 and α = 0.2).19 that the optimal values of x1 and x2 are discontinuous functions of α. In Problem 8.0917 and 0. a number of optimization problems have to be solved to obtain the information to train a neural network. and α is a parameter. where x = [x1 . that is.8). 2. one of the resistances was made nonlinear.101) g(x.0786. x2 ) (i = 1. u(t)] ˙ x(t0 ) = x0 (2. making the computation of Vi (x1 .

. ||ui (t) ≤ 1 i = 1.Control of Complex Systems Using Neural Networks 10 8 6 4 2 0 –2 –4 –6 –8 –10 –10 –8 –6 –4 –2 0 x1 2 4 6 8 10 x2 g (x. α) = c.18 Contour plots of f (x. 2. and constraint curves g(x. α) = 0 2 4 6 8 10 where f (·) satisfies conditions to assure the existence and uniqueness of solutions in an interval [0.2 10 8 6 4 2 x2 0 –2 –4 –6 –8 –10 –10 –8 –6 –4 –2 0 x1 (b) α = 0. α) = 0 f (x. r ]. .. α) = c g (x. The input u(·) is amplitude constrained and must lie in a unit cube c ⊂ Rr [i. T]. The initial state x0 ∈ S0 and the objective is to determine a control input that transfers x0 to xT and . . α) = e 75 (a) α = 0.8 FIGURE 2. . f (x. x(t) ∈ Rn and u(t) ∈ Rr .e. where c is a constant. α) = 0 for two values of α.

2 0. Once u∗ (t) is known as a function of x ∗ (t) and λ(t). Precomputed solutions coincid with neural network approximations. λ(t) and the corresponding u∗ (t) as functions of time. .5 α 0.7 0.1 0.9 1 FIGURE 2.76 3 2 1 0 –1 –2 –3 0 4 2 x2 0 –2 –4 –6 0 0. λ.4 0. λ(t).4 0. This yields x ∗ (t). The above problem reduces to the solution of 2n differential equations of the form: x (t) = Hλ [x(t).8 0.9 1 0.5 α 0.7 0. u(t)] x(T) = xT and the optimal input u∗ (t) is determined from the optimality condition: I nf u(t)∈C H[x ∗ . Following the procedure we have adopted thus far. In the problems that we shall consider. u(t)] x(0) = x0 ˙ ˙ λ(t) = −Hx [x(t).3 Modeling and Control of Complex Systems x1 0. λ.2 0.6 0.3 0. minimizes a performance criterion: J [u] = 0 T L[x(t).105) (2. u] = H[x ∗ . the above problem must be solved for numerous values of x0 ∈ S0 to obtain the necessary information. u(t)]dt (2.19 Optimal values of x1 and x2 as functions of α. it will be unique.104) correspond to a two-point boundary value problem (TPBVP) that can be solved off-line through successive approximations.6 0.103) The optimal input and optimal trajectory are denoted by u∗ (t) and x ∗ (t). λ(t). Comment 17 Our interest is in training neural networks as feedback controllers for the above problem. respectively. u∗ ] (2.8 0.104) This necessary condition confines the optimal solution to a small set of candidates. Equations (2.1 0.

the optimal control from t1 to T is merely u∗ (t +t1 ) over the remaining T −t1 units of time.. xT the final state. If a family of optimal controls can be generated off-line for different values of the state.Control of Complex Systems Using Neural Networks 77 By Bellman’s principle of optimality.20. given the optimal trajectory from x0 to xT . It can be shown that the solution to the above problem (1) does not exist. The ˙ ˙ scalar input u(·) satisfies the amplitude constraint |u(t)| ≤ 1. T − t). A specified initial state x0 must be transferred to a final state x f in minimal time. Such a neural network will have for its inputs x(t) the state of the system. and only brief descriptions of the problems and the corresponding solutions are presented here. or (2) is bang bang.e. The optimal trajectory in the state space and the corresponding input for a typical pair of states x0 and x f (generated by a neural network) is shown in Figure 2. x1 Constrained Region x0 xf x2 +1 –1 FIGURE 2. and the trajectories should lie outside a circular region in the state space. PROBLEM 11 (Minimum Time) A second-order system is described by the differential equations x1 = x2 . PROBLEM 12 (Minimum Energy) Another problem that is generally included in standard textbooks on control systems deals with minimum energy control. or (3) is piecewise continuous.20 . x2 = u. for which a closed form solution exists. if x ∗ (t1 ) is on this trajectory. The following two problems are considered in Reference [71]. and Tr the time to go (i. they can be stored and used to train a neural network.

22. the basic philosophy has been to solve the optimal control problem off-line a number of times to obtain optimal solutions. also performed remarkably accurately. 600 ms).27 ms vs.107) e −Aτ bb T e −A τ dτ. 2 ∗ The open-loop optimal control u (t) is given by u∗ (t) = −b T W(Tr ) −1 [x(t) − e −ATr x f ] where Tr is the remaining time. T − t and W(Tr ) = 0 Tr (2. and T). Many other formulations were suggested in the late 1990s to reduce the computational effort.3 Computational Advantage A comparison of the computational times required to evaluate the Grammian matrix W. when applied to a slowly varying terminal state. a neural network having the structure shown in Figure 2. A system is described by the linear state equation: x = Ax(t) + bu(t) ˙ (2. T (2. b) is controllable. Comment 18 The method proposed here.6.106) where x(t) ∈ Rn . 2.1.78 u(t) Modeling and Control of Complex Systems Plant . A ∈ Rn×n and b ∈ Rn . and (A. reveals that the latter has a significant advantage (0.21 can be trained as a feedback controller. The objective is to transfer an initial state x0 at time t = 0 to a final state x(T) = xT T where T is specified. The idea is .6.4 Other Formulations Thus far. that is. 2.108) Assuming that the optimal control input and the corresponding response x ∗ (t) have been generated for a number of triples (x0 . x = Ax + bu x(t) Neural Network x0 xT Tr =T – t FIGURE 2. u(t) ∈ R. using standard techniques and a neural network. and use the information at different levels to train a neural network to act as a feedback controller.1. Two of them are shown in Figure 2. xT . while minimizing the energy E = 1 0 u2 (τ )dτ .21 Our interest is in determining a feedback controller for such a problem using a neural network and comparing the computational effort involved in the two cases.

u) x(t) u = G(x. yielding x(t) and λ(t) simultaneously. 2. if given x0 . x. the initial value of λ(t). x f .104) can be integrated forward in time. xt f and Tt f (= t f − t)). x = f (x. u) x(t) λ(t) u = G(x. In particular.6. we review briefly in this section the problem to be addressed and the principal concepts involved to aid us in the discussions that follow involving approximate methods that have been proposed in the literature. which is substantially simpler to solve. the 2n Equations (2. λ) λ(t) = N3(x. xt f and tf a neural network can map them on to λ0 .22 to convert the TPBVP into an initial value problem. u) .6. t] is defined as: J [x(t). The state x(t) ∈ Rn of the system is governed by the .2.1 Continuous Time (No Uncertainty) A performance function J [x. λ = Λ( λ.109) and the objective is to determine a control law that minimizes the performance function. 2.2 Dynamic Programming in Continuous and Discrete Time Although optimization using dynamic programming is well known.Tf ) x(t) xf Tf FIGURE 2. t f ) λ0 u(t) . xf .22a. u(t). λ) u(t) . τ ]dτ (2. A more robust method is shown in Figure 2.22b where a neural network yields λ(t) corresponding to the triple (xt . u. t] = φ(x(t f )) + tf t0 L[x(τ ). x = f (x. We first state the problem in continuous time and later switch to discrete-time systems for the practical realization of the solutions. u(τ ).Control of Complex Systems Using Neural Networks x0 xf tf 79 λ0 = N2 (x 0 . This is shown in Figure 2.

since they are universal approximators. t] ˙ (2. k) (2. t) (2. u(k). k] = φ(x(T)) + k=k0 L(x(k). This is why researchers became interested in neural networks.110) with quadratic performance criteria.2 Discrete Time (No Uncertainty) In discrete time the procedure is substantially more transparent. τ ))dτ ) (2. Therefore assumptions about the performance measure J and the system (2.80 differential equation: Modeling and Control of Complex Systems x (t) = f [x(t). the sum of two integrals t second integral must be optimal independent of the value of the first integral.110) and u(t) ∈ Rr denotes the control input.114) . it is important to bear in mind that. u(τ ). t]T f (x(t). t] ∂t = − min L(x(t). and they permit the problem to be stated as a parameter optimization problem. t] = min t0 ≤τ ≤t f φ(x(t f ) + tf t0 L(x(τ.113) subject to x(k + 1) = f [x(k). u(t). k] x(0) = x0 (2. the above problem admits only a nonsmooth viscosity solution. The analogous problem can now be stated as follows: k=T J T = J [t0 . Comment 19 Before we discuss the use of neural networks to carry out the computation. u(k). The initial and final times t0 and t f as well as the initial and final states x(t0 ) = x0 and x(T) = xT are assumed to be specified (in which case φ in Equation (2. we obtain the Hamilton–Jacobi– Bellman equation ∂ J ∗ [x(t). 2.109) can be omitted) or the cost of the final state can be explicitly included in φ. u(k). T] [x(k).6. L : Rn × Rr × R → R is the instantaneous cost as in standard optimal control theory. by the principle of optimality.2.111) is over the interval [t. Extending this argument to the case t → 0. t) + u(t) ∂ ∂x J ∗ [x(t).112) are in general extremely hard to solve and exact solutions have been derived mainly for linear systems (2.111) If the integral in Equation (2.112) Partial differential equations of the form given in Equation (2. u(t). If J ∗ represents the optimum value of J we have J ∗ [x(t). u(t). t f ] expressed as the t+ t tf Ldτ + t+ t Ldτ . in general.110) have to be carefully examined before attempting the use of neural networks.

Control of Complex Systems Using Neural Networks

81

As J [T, T] [x(k), u(k), k] = φ(x(T)), we start by considering the transfer from k = T − 1 to k = T:
∗ J [T−1, T] = J [T, T] [ f (x, u, T − 1)] + L(x, u, T − 1)

(2.115)

where (x, u) implies [x(T − 1), u(T − 1)]. This is a one-step optimization problem whose solution yields the optimal u(T − 1) for any initial state x(T − 1). The expression for J [T−2, T] is similar to Equation (2.115), except that the optimal cost from T − 1 to T has to be used in place of J [T, T] . We therefore have
∗ ∗ J [T−2, T] = min [L(x, u, T − 2) + J T−1 (x, T − 1)] u(T−2)

(2.116)

Proceeding backwards in time, we have the general equation
∗ J [T−k, T] [x, T − k] = min ∗ L(x, u, T − k) + J [T−k+1, T] [ f (x, u, T − k)]

u(T−k)

(2.117)

where k = 1, 2, . . . , T is the stage number. Because the procedure involves computations backwards in time (as generally in optimal control), the computations have to be carried out off-line. At state T − k, one obtains the optimal control law u∗ (T − k), by the minimization of the function in Equation (2.113) as function g(x(T − k), T − k) of the state and time. This in turn is used to ∗ compute the optimal performance J [T−k, T] [x, T − k], which in the next step ∗ is used to derive u (T − k − 1). As stated earlier, except for the LQ (linearquadratic) problem which has been studied extensively, an explicit expression of g : Rn × R → Rm is very hard to obtain. In such a case the state space is discretized and an exhaustive search over the space of admissible solutions is performed to find the optimum. The number of computations grows exponentially with the dimension of the state space, and this is generally referred to as the curse of dimensionality. 2.6.2.3 Discrete Time (System Unknown) In the problems of interest to us in this chapter, accurate knowledge of complex nonlinear systems is not available. When f (·) in Equation (2.117) is unknown or partially unknown, the methods described thus far cannot be applied and incremental minimization methods of the type treated in previous sections become necessary. These are referred to collectively as approximate dynamic programming. In this section we confine our attention to discrete-time methods, in which most of the current work is being carried out. In these methods, which have been strongly influenced by reinforcement learning, one proceeds forward in time to estimate future rewards based on state and action transitions. Incremental dynamic programming is combined with a parametric structure (using neural networks) to reduce the computational complexity of estimating cost.

82

Modeling and Control of Complex Systems

2.6.2.3.1 Adaptive Critics In view of the complexity of the problems and the computations they entail, it is not surprising that a variety of methods have been proposed, which include heuristic dynamic programming (HDP), dual heuristic programming (DHP), globalized dual heuristic programming (GDHP), and the so-called action-dependent programming (AP) variants of HDP and DHP. All of them involve the use of neural networks to approximate the value function, the decision law, and the system dynamics, so that the problem is reduced to one of parameter optimization. As in previous sections we shall try to address the basic ideas of the different methods. Following this, we shall briefly outline the differences between the various methods proposed and proceed to comment on convergence and stability questions they give rise to [72]. Unlike classical dynamic programming, we proceed forward in time but nevertheless determine the optimal ∗ control law by using the same recurrence Equation (2.117) where J [T−k+1, T] is ˆ∗ replaced by an estimate of the optimal cost-to-go J [T−k+1, T] . So, at k = T we solve:
∗ ˆ∗ ˆ J [0, T] = min{L(x, u, 0) + J [1, T] [ f (x, u, 0)]} u(0)

(2.118)

ˆ where f is the estimate of f at time T − k = 0. At step 1 the procedure is repeated, that is,
∗ ˆ∗ ˆ J [1, T] = min{L(x, u, 1) + J [2, T] [ f (x, u, 1)]} u(1)

(2.119)

∗ ˆ∗ Again, J [2, T] is used instead of J [2, T] . The estimate of the optimal cost J ∗ as a function of the state has been updated using the cost that was actually caused by the control u(0) in the previous instant. Repeating this process at every stage k = T, . . . , 1, the estimate of the optimal policy u∗ (x), the estimate ˆ ˆ of the plant dynamics f (x, u, k) and the estimate of the optimal cost-to-go ˆ∗ J [T−k, T] are evaluated. The evaluation of all three functions at any stage k based on x(k) is carried out iteratively over l cycles (so that k denotes the time instant, while l denotes the number of iterations at that time instant). It is claimed that this procedure will result in the convergence of u∗ (x, k) to ˆ ˆ u∗ (k) and J ∗ to J ∗ . Although the optimization procedure described above contains the essential ideas of the four methods mentioned earlier, they differ in the exact nature in which the various steps are executed (e.g., the nature of the functionals or their derivatives that they attempt to realize) as shown below. Heuristic dynamic programming (HDP) is essentially the procedure outlined above, and represents conceptually the simplest form of design. It uses two neural networks to approximate the value function and the decision law. In dual heuristic programming (DHP) neural networks are used to approximate the derivatives of the value function with respect to the state variables (used in the computation of the control law). Empirically, the resulting updating laws have been shown to converge more rapidly than HDP, although at the

Control of Complex Systems Using Neural Networks

83

cost of more complexity, as a vector function is approximated rather than a scalar. This also results in the relationship between the updatings of the value function and the control law becoming more complicated. The globalized dual heuristic programming attempts to combine the fast convergence of DHP with the easier implementation of HDP. Action-dependent (AD) methods modify the above three methods by using a value function V(x, α) in place of V(x), where α is the control action. Once again empirically this has been shown to result in an improved convergence rate. The reader is referred to Reference [73] for details concerning all four methods. Comment 20 (Convergence) The control law is determined as the one that ∗ minimizes the instantaneous cost L(x, u(x), T−k) in the value function J [T−k, T] . ˆ ∗ ˆ At every stage, only an estimate J [T−k, T] is available and hence the quality of the resulting control depends on how close the estimate is to the actual optimal trajectory. Hence, u(x) is suboptimal in general. But then, if it is applied ˆ in the next step of the recursive procedure, it does not really minimize the instantaneous cost L(x, u(x), T − k). Hence, L does not contribute to the cost ˆ in the same way that the optimal cost L ∗ would, and this, in turn, may distort the estimate of the value function. A word must be added regarding the logic ˆ∗ behind the forward programming scheme. The improvement of J T−k, T is of no use for determining the next control input. As expected, the procedure approximates the cost only in hindsight, that is, after the control has been applied. However, in the procedures proposed many iterations are carried out for the same time instant. Comment 21 (Stability) Because u(x) applied to the system is suboptimal ˆ we have to assume that it does not destabilize the system. ˆ Comment 22 (Stability) If J ∗ is seen as the output of a system and J ∗ is its estimate, then clearly, by testing the system and comparing the output to the estimate, one gains information that can be used to adjust the estimate at the next instant of time. This is similar to the viewpoint adopted in adaptive control where the adjustment is performed on a suitably parameterized system model. Accepting this analogy temporarily, we recall that a series of questions had to be answered in order to prove stability of the adaptive process. These questions are concerned with the way in which the adjustment of parameters and the application of the control are to be interwoven so as to keep all the signals in the system bounded. A similar issue arises in the present case, since the system is controlled at the same time that an estimate of the cost to go is generated.

2.6.2.3.2 Conclusion 1. At the present time, we do not believe that the methods described in Section 2.6.2 can be used to stabilize an unknown nonlinear dynamical system online while optimizing a performance function

84

Modeling and Control of Complex Systems over a finite interval. However, as in Section 2.4, stability may be arrived at provided that the uncertainty regarding the plant is sufficiently small and initial choices are sufficiently close to the optimal values. It should also be mentioned that a large body of empirical evidence exists which demonstrates the success of the method in practical applications. 2. To the authors’ knowledge the stability of even a linear adaptive system optimized over a finite interval has not been resolved thus far. As it is likely that conditions for the latter can be derived, we believe that it should be undertaken first so that we have a better appreciation of the difficulties encountered in the nonlinear case. 3. In problems discussed in Section 2.6.1 it was assumed that the dynamics of the plant were known. In light of the discussions in Section 2.6.2, it may be worth attempting the same problem (i.e., with plant uncertainty) using optimal control methods described in Section 2.6.1. By providing an alternative viewpoint it would complement much of the work that is currently in progress using dynamic programming methods. 4. Finally, the authors also believe that the multiple-model-based approach [41] (which is now used extensively in many different fields) may have much to offer to the problems discussed in this section. Although it is bound to be computationally intensive, the use of multiple trajectories at each stage would increase the probability of convergence. The reader is referred to References [41]–[43] where a switching scheme is proposed to achieve both stability and accuracy.

2.7 Applications of Neural Networks to Control Problems
A computer search carried out by the first author about ten years ago revealed that 9,955 articles with the title “neural networks” were published in the engineering literature over a five-year period, of which over 8,000 dealt with problems related to function approximation. Of the remaining 1,500, approximately 350 were related to applications, which were concerned with theory, experiments in the laboratory, and primarily simulation studies. Only 14 of the roughly 10,000 articles dealt with real applications. The authors have once again searched the engineering literature and have concluded that matters are not very different today. For a comprehensive and systematic classification of neural network-based control applications they refer the reader to the paper by Agarwal [74]. They are also aware that a number of exciting applications of neural networks have existed in the industrial world during the entire period, and that many of them do not appear in

Control of Complex Systems Using Neural Networks

85

journal articles for proprietary reasons. In this section we present a few carefully chosen applications. These have been collected from books and technical articles, and from friends in industry through private communication. These fall into three distinct categories. The first consists of applications that are in one way or another related to the issues raised in Sections 2.3 and 2.4, and indicate the extent to which theory plays a role in design. Sufficient details are provided about these problems. In the second category are included those applications that emphasize the practical considerations that determine the choices that have to be made. The third category consists of novel applications where the emphasis is not on mathematical techniques but on the ingenuity and creativity of some of our colleagues. 2.7.1 Application 1: Controller in a Hard Disk Drive [75] This concerns a high-performance servo controller for data acquisition in a hard disk drive using an electromechanical voice-coil motor (VCM) actuator. Such high-performance controllers are needed due to the rapid increase in data storage density. 2.7.1.1 Model If q is the position of the actuator tip and q is its velocity, the model of such a ˙ system is described by the equation: M¨ + F (q , q ) = u q ˙ (2.120)

where M is the system inertia and is unknown, and the smooth function F (·) is also unknown but bounded with a known bound K F , that is, |F | ≤ K F . Further, (q , q ) ∈ S where S is a compact set and is known. ˙ 2.7.1.2 Radial Basis Function Network Since the domain over which F needs to be approximated is compact and F is bounded, the approximation can be achieved using a radial basis function network such that ˙ F (q , q ) = θ T R(q , q ) + E F ˙ (2.121)

where R is a vector of radial basis functions, θ is an unknown constant vector and |E F | ≤ K E is an error term. Since the state is bounded, the principal concern is accuracy. 2.7.1.3 Objective ˆ ˆ If M and θ are estimates of M and θ, the objective is to determine adaptive laws for updating the estimates and at every instant use the estimates to determine the control input to transfer the initial state to the desired final state.

86

Modeling and Control of Complex Systems

2.7.1.4 Adaptive Laws and Control Law The following adaptive laws for determining the parameter estimates and control law for generating the input u were used. ˙ ˆ M = −γ qr r ¨ ˙ ˆ θ = − φ(q , q )r ˙ ˆq ˆ u = M¨ r + θ T φ(q , q ) − ( K d r + K r + K i ˙ ˙ ˙ ˙ where = q − q d , qr = q d − λ , are error signals.

(2.122)
t 0

r dτ ) − K E sgn(r ) λ>0

and r = q − qr = ˙ + λ ˙ ˙

Comment 23 This is a well-defined problem for which neural networks can be designed, as the compact region in which the trajectories lie is known a priori, and all radial basis functions were carefully chosen to cover the operational range of the controller.

2.7.2 Application 2: Lean Combustion in Spark Ignition Engines [76] This problem discussed by He and Jagannathan is a very interesting application of neural networks in control. At the same time it also raises theoretical questions related to those discussed in Section 2.4. It deals with spark ignition (SI) engines at extreme lean conditions. The control of engines at lean operating conditions is desirable to reduce emissions and to improve fuel efficiency. However, the engine exhibits strong cyclic variations in heat release, which is undesirable from the point of view of stability. The objective of the design is consequently to reduce cyclic variations in heat release at lean engine operation. The system to be controlled is described by the equations: x1 (k + 1) = f 1 (x1 (k), x2 (k)) + g1 (x1 (k), x2 (k))x2 (k) + d1 (k) x2 (k + 1) = f 2 (x1 (k), x2 (k)) + g2 (x1 (k), x2 (k))u(k) + d2 (k)

(2.123)

where x2 (k) is the mass of fuel before the kth burn, x1 (k) is the mass of air before the kth burn, and u, the control variable, is the change of mass of fuel per cycle. f 1 , f 2 , g1 , and g2 are smooth unknown functions of their arguments and gi (k) are known to lie in the intervals [0, gmi ], i = 1, 2. d1 (·) and d(·) are external disturbances. 2.7.2.1 Objective The objective is to maintain x1 (k) at a constant value (Problem 2) and reduce the variations in the ratio x2 (k)/x1 (k) over a cycle, by using u as the control variable.

Control of Complex Systems Using Neural Networks

87

2.7.2.2 Method The authors use a nonlinear version of backstepping (which we have discussed in Section 2.4) and use x2 (k) as the virtual control input and u(k) as the control input. The former requires the adjustment of the weight vector w1 of a neural network, and the latter that of the weight vector w2 of a second network. The adaptive laws are
T w1 (k + 1) = w1 (k) − α1 φ(z1 (k)) w1 (k)φ(z1 (k)) + k1 e 1 (k) α2 T w2 (k + 1) = w2 (k) − σ (x2 (k)) z1 (k)σ (z2 (k)) + k2 e 2 (k) k2

(2.124)

where z1 (k) = [x1 (k) x2 (k) x1d ]T and z2 (k) = [x1 (k) x2 (k) w1 (k)]T . The authors claim that the objectives set forth earlier are achieved and that the performance is highly satisfactory. Comment 24 Although this is a very ingenious application, we do not agree with the theoretical arguments used to justify the structure of the controllers. Significantly more information about the manner in which the basis functions were chosen, and the conditions under which the neural networks were trained, need to be known before questions of theoretical correctness can be argued. 2.7.3 Application 3: MIMO Furnace Control [77] An application of neural networks for temperature control was developed by Omron Inc. in Japan over ten years ago. It is included here because it exemplifies the set-point regulation problem of a nonlinear system posed in Section 2.3 and discussed in Section 2.4. It deals with an MIMO temperature control system. The range of the temperatures and the accuracy of the setpoint control affect the final products in an industrial process. The objective is to regulate temperature in three channels by bringing up the set point during startup as quickly as possible while avoiding overshoots. The system is open-loop stable so that neural networks can be trained to obtain both forward and inverse models of the three channels of interest. The identification models, in turn, are used to train the controllers. A comparison of the proposed scheme with a self-tuning controller and a proportionalintegral–derivative (PID) controller demonstrated that neurocontrollers are considerably more robust than the other two, and can also cope with changes in the dynamics of the plant. 2.7.4 Application 4: Fed-Batch Fermentation Processes [78] Biofermentation processes, in which microorganisms grown on a suitable substrate synthesize a desired substance, are used widely to produce a large number of useful products. The application of neural network-based controllers discussed by Boskovi´ and Narendra in 1995 for the control of fed-batch ferˇ c mentation processes [78] is an appropriate one for examination in the present

It reveals clearly that the efficacy of a control strategy in a practical application depends upon a number of factors, which include the prior information needed to implement the controller, the difficulty encountered in choosing design parameters, stability and robustness issues, the effect of measurement noise, and the computational aspects involved in the implementation of the control algorithm.

The paper deals with the control of a distinctly nonlinear system whose dynamics are not known precisely, whose initial conditions and parameters can vary, and whose inputs have saturation constraints. The system is open-loop stable and all its five state variables are accessible. The two inputs of the system, u1(·) and u2(·), are to be determined to maximize the production of microorganisms [i.e., x1(k)] in the interval [0, T], while assuring that one of the state variables, x4(k) (ethanol), remains small. On the basis of the results presented in the paper it became clear that the method to be used for controlling a fermentation process would depend upon several factors, including the extent to which parameter values and initial conditions may vary, the error the designer would be willing to tolerate, and prior knowledge concerning nonlinearities. It was concluded that linear adaptive controllers and neurocontrollers were the only two viable alternatives. Even among these, if accuracy and robustness are critical issues, neural network-based control is distinctly preferable.

2.7.5 Application 5: Automotive Control Systems
The research team at Ford Research Laboratory, under the leadership of Feldkamp, has been investigating for about fifteen years the efficacy of neural network techniques for addressing different problems that arise in automotive systems. The experiments that they have carried out under carefully controlled conditions, the meticulous manner in which they have examined various issues, as well as their candid comments concerning the outcomes, have had a great impact on the confidence of the neural network community in general, and the authors in particular, regarding the practical applicability of neural networks. The reader is referred to References [13], [79]–[82], which are papers presented by Feldkamp and coworkers at the Yale Workshops during the period 1994 to 2005.

Idle speed control implies holding the engine speed at or near an externally supplied target speed in the presence of disturbances. The latter include load from the air conditioning system, load from electrical accessories such as windows, and load from power steering. Some of these loads may be large enough to stall a poorly controlled engine. The controls to the system are throttle position and spark advance. As these have different dynamics and control authority, an interesting aspect of the control problem is in coordinating the control actions effectively.

In Reference [83] an attempt was made to develop a recurrent neural network idle speed controller for a four-cylinder engine. An identification


stage preceded the design of a controller that was trained online. Training was carried out with no disturbance, a single disturbance, and combinations of disturbances, during which engine speed measurements were compared to the target speed, and an extended Kalman filter rather than simple gradient updates was used. The latter involved truncated backpropagation through time. A variation of the above procedure was to identify the system using a recurrent network and use it off-line to train a controller. Over the 1990s, the same approach was used to obtain nonlinear models for active suspension and antilock braking. In all cases the best estimates of noise, parameter uncertainty, measurement error, and actuator delays were used. Of current interest to the group is a setting in which a nominal model is given and the objective is to develop a controller with an adjustable tradeoff between performance and robustness.

Feldkamp has been arguing strongly for a long time for the use of recurrent networks as controllers. Training such networks off-line eliminates stability issues, provided an initial set of values can be found that will eventually yield a satisfactory reduction in the chosen cost function. Based on this, the first author is starting a program at Yale to study the theoretical properties of recurrent neural networks.

2.7.6 Application 6: Biped Walking Robot [84]
An exciting, ambitious, and very complex application, whose development over many years was followed by the first author, was the biped walking robot designed, implemented, and tested by Kun and Miller at the University of New Hampshire. It encompassed almost all the difficulties enumerated in Section 2.4, including unstable nonlinear dynamics, time delays in the control loops, nonlinear kinematics that are difficult to model accurately, and noisy sensors. In spite of these difficulties, walking control strategies were tested and extended in studies over generations of bipeds. From the point of view of this chapter, it brings into focus some of the theoretical questions concerning stability raised in Section 2.4.

Dynamic walking promises higher speeds, improved walking structures, and greater efficiency. However, this also implies exploring new regions in the state space to learn better strategies, which automatically brings in its wake stability questions. Three neural networks controlled front-to-back balance, side-to-side balance, and good foot contact. The first network controlled the instantaneous front/back position of the hips relative to the feet, and the second network predicted the correct amplitude and velocity of side-to-side lean during each step. The third network was used to learn kinematically consistent postures. All three networks have to operate in new regions to learn the proper strategies, and in all cases instability is a distinct possibility. Frequent human support was needed to keep the biped from falling during the learning process, when it was learning to perform satisfactorily in unfamiliar regions.


2.7.7 Application 7: Real-Time Predictive Control in the Manufacturing Industry
The following is a recent application from industry, but for proprietary reasons details concerning the process controlled are not provided. In a manufacturing plant, the sensors used to measure the relevant outputs do not report actual readings frequently enough for effective feedback control. Further, the plant has unmodeled dynamics and consists of several parts that are very hard to model from first principles. Hence, the system presented the right opportunity to develop a dynamic model using the neural network methods described in Section 2.4. The neural network model was used as a virtual sensor, and based on the estimate of the state provided by the model, an intelligent set-point generator was developed to determine set points for two control variables. A 32% improvement in performance of the overall system was achieved.

2.7.8 Application 8: Multiple Models: Switching and Tuning
The multiple-model switching and tuning method proposed in Reference [41] is currently widely used in very different areas. Brief descriptions of two applications are given below, followed by a toy sketch of the switching logic.

1. Anomaly detection in finance. The stock markets in the United States employ the industry's most sophisticated real-time surveillance systems to ensure investor protection and a fair and competitive trading environment. They have designed and developed methods for detecting unusual real-time market activity. The "switch and tune" approach has been used to develop piecewise models of various regions in the underlying operating space. New regions are explored and flagged by the use of anomaly detection methods using neural networks, while well-traversed spaces are learned using a local classifier which classifies activity as an anomaly or not.

2. Reconfigurable control of space structural platforms. In a broad spectrum of aerospace applications, achieving acceptable performance over an extended operating range may be difficult due to a variety of factors such as high dimensionality, multiple inputs and outputs, complex performance criteria, and operational constraints. The multiple-model-based "switching and tuning" philosophy described in Section 2.4 is ideally suited for such problems. One example of such a system is described in Reference [85]. The system considered is a flexible structure with nonlinear and time-varying dynamics. An adaptive radial basis function network is used to identify most of the spatiotemporal interaction among the structure members. A fault diagnosis system provides neurocontrollers with various failure scenarios, and an associative memory compensates for catastrophic changes of structural parameters by providing

a continuous solution space of acceptable control configurations. As stated in Section 2.4, the latter are designed a priori.


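The toy sketch below illustrates only the generic "switching and tuning" idea: several candidate models predict the output, a running identification error acts as the switching index, and the selected model is tuned. It is not the algorithm of Reference [41]; the model class, gains, and data are all synthetic assumptions.

```python
import numpy as np

class Model:
    """Candidate model y(k+1) = a*y(k) + u(k) with one tunable parameter."""
    def __init__(self, a):
        self.a = a
    def predict(self, y, u):
        return self.a * y + u

models = [Model(a) for a in (0.2, 0.5, 0.8)]   # fixed initial model set
errors = np.zeros(len(models))                 # running identification errors
lam = 0.9                                      # forgetting factor

rng = np.random.default_rng(0)
y, u, true_a = 1.0, 0.0, 0.55                  # synthetic "plant"
for k in range(50):
    y_next = true_a * y + u + 0.01 * rng.standard_normal()
    for i, m in enumerate(models):
        e = y_next - m.predict(y, u)
        errors[i] = lam * errors[i] + e**2     # switching index per model
    best = int(np.argmin(errors))              # switch to the best model ...
    e_best = y_next - models[best].predict(y, u)
    models[best].a += 0.1 * e_best * y         # ... and tune it (LMS step)
    y = y_next

print("selected model parameter:", models[int(np.argmin(errors))].a)
```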
2.7.9 Application 9: Biological Control Structures for Engineering Systems
In the introduction it was stated that control theorists were attracted to the new field of neurocontrol in the 1980s, inspired by the ability of biological systems to interact with uncertain complex environments. Thus far, most of the systems that we have discussed are engineering systems. The following interesting application by Doyle et al. [86] is a notable exception. In it, the authors attempt reverse engineering, by using biological structures for applications in control systems.

The baroreceptor vagal reflex is responsible for short-term blood pressure control. The authors make a very good case that it provides an excellent biological paradigm for the development of control strategies for multiple-input single-output (MISO) processes. The components of the system have well-defined control analogs. The central nervous system is the "controller." The transducers in the major blood vessels (baroreceptors) are the "sensors," and the sympathetic and vagal postganglionic motor neurons in the heart and vessels are "actuators." Demand (eating, exercise), external inputs (cold weather), emotional state (joy, anger), and anticipated action (postural adjustment) correspond to "time-varying environments." Maintaining blood pressure around a set point dictated by cardiovascular demands is the objective. The control system performs a variety of tasks, which include integration of multiple inputs, noise filtering, compensation for nonlinear features of cardiovascular function, and generation of a robust control input.

The primary objective of the authors is to understand the above system and then to mimic its functions for process control applications. The MISO control architecture employed in the baroreceptor reflex consists of two parallel controllers in the central nervous system: the sympathetic and the parasympathetic systems. Whereas the response of the latter is relatively fast (2–4.5 sec), the response of the former is slow (10–80 sec). However, the faster control is "expensive," whereas the slower control is acceptable. The brain coordinates the use of the two controllers to provide effective blood pressure control while minimizing the long-term cost of the control actions. This is one of many functions discussed by the authors which is used for reverse engineering. We consider this a truly noteworthy application.

2.7.10 Application 10: Preconscious Attention Control
An application that is distinctly different from those discussed thus far, which combines engineering and biology, and which was brought to the attention of the first author, concerns preconscious attention control. Automobile collision avoidance is an example of a problem where improved recognition results in reduced reaction time, but where visual distraction decreases overall performance. In this application, the authors [87] seek to aid an operator in recognizing an object requiring attention by first presenting an object image

to the preconscious region of visual awareness (the region in which neural activity registers without the user's awareness). Preconscious exposure results in subsequent processing of identical images to require less neural activity in identification (known as visual priming). Visual priming results from the plasticity of the brain and is known to facilitate recognition, as recognition becomes a function of memory as well as awareness. Because each individual is assumed to have unique visual sensitivities that evolve with all prior experience, a neural network is used to derive an operator-specific sensitivity to images presented outside of awareness. The effect of visual priming is found by comparing an operator's reaction time with that of a non-primed operator. The observed reduction in reaction time is directly proportional to the reduction in neural activity. For further details concerning this application the reader is referred to Reference [87].

2.8 Comments and Conclusions
The chapter examines the current status of neurocontrol and the methods available for infinite-time and finite-time adaptive control of nonlinear systems. The emphasis of the chapter is on the simple ideas that underlie the different methods. Spread throughout the chapter are comments to relate the procedures proposed to well-established concepts in adaptive control.

Section 2.2 presents many of the basic results from control theory and adaptive control theory for easy reference, and concludes with a statement of problems for investigation in the following sections. Section 2.4 deals with the methods that have been proposed to identify and control nonlinear dynamic systems. As neural networks are the principal components used to cope with nonlinearities, their structures and the adaptive laws used to train them are discussed in Section 2.3. The theoretical basis for their design, the determination of the existence of appropriate maps, and the realization of identifiers and controllers when such maps exist are discussed.

The authors believe that the method based on linearization is one that is general and can be rigorously justified. It is valid in a neighborhood of the equilibrium state where the linear terms dominate the nonlinear components, and extensive simulation results have shown that this neighborhood need not be small. In such cases, stability can be assured using linear methods and fast adaptation is possible. Slow adjustment of the parameters of the neural networks can be used, without affecting stability, to achieve greater accuracy. From an engineering point of view, using the inverse function theorem and the implicit function theorem, all the problems investigated in the past in adaptive control theory can be revisited after including nonlinear terms in the description of the plant. The interesting feature of the indirect control procedure is that the output of the reference model is a function of the state of the plant.

They also believe that a vast number of contributions made in recent years extending the well-known backstepping procedure to general nonlinear systems cannot be theoretically justified. In particular, the authors believe that the assumptions regarding the existence of basis functions need considerably more justification before they can be accepted as being theoretically rigorous.

Although the emphasis in Section 2.4 is on linearization-based methods, several others proposed by different authors are also discussed, and the assumptions made by them concerning the dynamics of the plant are critically examined. An introduction to nonlinear control concepts is included in Section 2.5. These become relevant when the nonlinear terms dominate the linear terms in regions of the state space far from equilibrium, and will be needed when attempts are made in the future to control dynamic systems in those regions. Although these concepts have not become part of the mainstream thinking in the neural network community, the authors believe that the latter will take to them enthusiastically in the future, when their power and scope become evident as experience with distinctly nonlinear phenomena increases.

Finite time optimization and optimal control is the topic of Section 2.6. Optimal control theory and dynamic programming are the principal mathematical tools used here. In the first part of that section, the dynamics of the plant are assumed to be known and optimal control theory is used to design feedback controllers. In the second part, the dynamics of the plant are assumed to be unknown, and approximate dynamic programming methods are proposed. The same questions that arise in nonlinear adaptive control also arise in this case, and the authors are of the opinion that the arguments used can be rigorously justified only when the initial trajectories are in the neighborhoods of the optimal trajectories.

The chapter concludes with a section on applications. Some of the applications are based on the analytical principles developed in the chapter, whereas others have only a tenuous relation to them. However, most of them are both novel and creative and, like many other similar applications in the history of control, call for theoretical explanations and thereby catalyze research. Nonlinear control using neural networks is still in its infancy, but there is little doubt that great opportunities abound both in theory and in practice.

Acknowledgments
The first author would like to thank Kannan Parthasarathy, Santosh Ananthram, Asriel Levin, Snehasis Mukhopadhyay, João Cabrera, Sai-Ming Li, Osvaldo Driollet, and Lingji Chen, who were his former graduate students, and Jovan Bosković, his former postdoctoral fellow, all of whom collaborated with him on different aspects of the research reported here. In particular, he would like to thank Snehasis, João, and Lingji for many insightful discussions in recent years. He would also like to acknowledge the help received from Lee Feldkamp, Snehasis Mukhopadhyay, and Alex Parlos concerning recent applications, and

the generous support from the National Science Foundation (through Paul Werbos) over a period of fifteen years when much of the work on linearization was done. Finally, the authors would like to thank the editors Andreas Pitsillides and Petros Ioannou for their invitation to contribute to this book and for their patience and support.

References
1. W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115–133, 1943.
2. D. O. Hebb, Organization of Behavior: A Neuropsychological Theory, New York: John Wiley & Sons, 1949.
3. N. Wiener, Cybernetics, Cambridge, MA: MIT Press, 1948.
4. F. Rosenblatt, The Perceptron: A Perceiving and Recognizing Automaton, Technical Report 85-460-1, Buffalo, NY: Cornell Aeronautical Laboratory, 1957.
5. B. Widrow, "Generalization and information storage in networks of adaline 'neurons'," in Self-Organizing Systems, M. C. Yovitz, G. T. Jacobi, and G. Goldstein, Editors, Washington, D.C.: Spartan Books, 1962.
6. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, Cambridge, MA: MIT Press, 1969.
7. P. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. thesis, Cambridge, MA: Harvard University, 1974.
8. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, D. E. Rumelhart and J. L. McClelland, Editors, Cambridge, MA: MIT Press, pp. 318–362, 1986.
9. K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, pp. 4–27, 1990.
10. G. Cybenko, "Approximation by superposition of a sigmoidal function," Mathematics of Control, Signals, and Systems, vol. 2, pp. 303–314, 1989.
11. K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183–192, 1989.
12. K. Hornik, M. Stinchcombe, and H. White, "Multi-layer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359–366, 1989.
13. L. A. Feldkamp, G. V. Puskorius, L. I. Davis, and F. Yuan, "Enabling concepts for applications of neurocontrol," in Proceedings of the Eighth Yale Workshop (New Haven, CT), pp. 168–173, 1994.
14. K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems, Englewood Cliffs, NJ: Prentice-Hall, 1989 (Dover, 2005).
15. M. Krstic, I. Kanellakopoulos, and P. Kokotovic, Nonlinear and Adaptive Control Design, New York: John Wiley & Sons, 1995.
16. D. Seto, A. M. Annaswamy, and J. Baillieul, "Adaptive control of nonlinear systems with triangular structure," IEEE Transactions on Automatic Control, vol. 39, pp. 1411–1428, 1994.
17. W. Rugh, Linear System Theory, Englewood Cliffs, NJ: Prentice-Hall, 1995.

18. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by backpropagating errors," Nature, vol. 323, pp. 533–536, 1986.
19. P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, vol. 78, pp. 1550–1560, 1990.
20. R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, vol. 1, pp. 270–280, 1989.
21. B. A. Pearlmutter, "Gradient calculations for dynamic recurrent neural networks: A survey," IEEE Transactions on Neural Networks, vol. 6, pp. 1212–1228, 1995.
22. K. S. Narendra and K. Parthasarathy, "Gradient methods for the optimization of dynamical systems containing neural networks," IEEE Transactions on Neural Networks, vol. 2, pp. 252–262, 1991.
23. W. T. Miller, R. S. Sutton, and P. J. Werbos, Editors, Neural Networks for Control, Cambridge: MIT Press/Bradford Books, 1990.
24. R. J. Williams, "Adaptive state representation and estimation using recurrent connectionist networks," in Neural Networks for Control, Cambridge: MIT Press/Bradford Books, 1990.
25. L. E. McBride and K. S. Narendra, "Optimization of time-varying systems," IEEE Transactions of the Professional Group on Automatic Control, 1965.
26. P. C. Parks, "Lyapunov redesign of model reference adaptive control systems," IEEE Transactions on Automatic Control, vol. 11, pp. 362–367, 1966.
27. K. S. Narendra and J. Taylor, Frequency Domain Criteria for Absolute Stability, New York: Academic Press, 1973.
28. P. A. Ioannou and J. Sun, Robust Adaptive Control, Upper Saddle River, NJ: Prentice-Hall, 1996.
29. I. J. Leontaritis and S. A. Billings, "Input-output parametric models for nonlinear systems. Part I: Deterministic non-linear systems," International Journal of Control, vol. 41, pp. 303–328, 1985.
30. A. U. Levin and K. S. Narendra, "Control of nonlinear dynamical systems using neural networks: Observability, identification, and control," IEEE Transactions on Neural Networks, vol. 7, pp. 30–42, 1996.
31. A. U. Levin, Neural Networks in Dynamical Systems, Doctoral Dissertation, New Haven, CT: Yale University, 1992.
32. S. Mukhopadhyay, Synthesis of Nonlinear Control Systems Using Neural Networks, Doctoral Dissertation, New Haven, CT: Yale University, 1994.
33. J. B. D. Cabrera and K. S. Narendra, "Issues in the application of neural networks for tracking based on inverse control," IEEE Transactions on Automatic Control, vol. 44(11), pp. 2007–2027, 1999.
34. L. Chen and K. S. Narendra, "Nonlinear adaptive control using neural networks and multiple models," Automatica, vol. 37, pp. 1245–1255, 2001.
35. L. Chen, Nonlinear Adaptive Control of Discrete-Time Systems Using Neural Networks and Multiple Models, Doctoral Dissertation, New Haven, CT: Yale University, 2001.
36. M. Feiler, Adaptive Control in the Presence of Disturbances, Doctoral Dissertation, Technical University of Munich, 2004.
37. K. Funahashi and Y. Nakamura, "Approximation of dynamical systems by continuous time recurrent neural networks," Neural Networks, vol. 6, pp. 801–806, 1993.
38. L. Chen and K. S. Narendra, "Identification and control of a nonlinear discrete-time system based on its linearization: A unified framework," IEEE Transactions on Neural Networks, vol. 15(3), pp. 663–673, 2004.

39. K. S. Narendra and S. Mukhopadhyay, "Adaptive control of nonlinear multivariable systems using neural networks," Neural Networks, vol. 7, pp. 737–752, 1994.
40. S. Mukhopadhyay and K. S. Narendra, "Disturbance rejection in nonlinear systems using neural networks," IEEE Transactions on Neural Networks, vol. 4, pp. 63–72, 1993.
41. K. S. Narendra and J. Balakrishnan, "Adaptive control using multiple models," IEEE Transactions on Automatic Control, vol. 42, pp. 171–187, 1997.
42. K. S. Narendra and O. A. Driollet, "Stochastic adaptive control using multiple models for improved performance in the presence of random disturbances," International Journal of Adaptive Control and Signal Processing, vol. 15, pp. 287–317, 2001.
43. K. S. Narendra, O. A. Driollet, M. Feiler, and K. George, "Adaptive control using multiple models, switching and tuning," International Journal of Adaptive Control and Signal Processing, vol. 17, pp. 87–102, 2003.
44. K. S. Narendra and N. O. Oleng, "Exact output tracking in decentralized adaptive control systems," IEEE Transactions on Automatic Control, vol. 47, pp. 390–395, 2002.
45. B. M. Mirkin, "A new decentralized model reference adaptive control scheme for large scale systems," in Proceedings of the 4th IFAC Int. Symp. Adaptive Systems Control Signal Processing (Grenoble, France), 1992.
46. K. S. Tsakalis and P. A. Ioannou, Linear Time Varying Systems: Control and Adaptation, Upper Saddle River, NJ: Prentice-Hall, 1993.
47. F.-C. Chen and C.-C. Liu, "Adaptively controlling nonlinear continuous-time systems using multilayer neural networks," IEEE Transactions on Automatic Control, vol. 39, pp. 1306–1310, 1994.
48. M. M. Polycarpou, "Stable adaptive neural control scheme for nonlinear systems," IEEE Transactions on Automatic Control, vol. 41, pp. 447–451, 1996.
49. G. A. Rovithakis, "Robustifying nonlinear systems using high order neural network controllers," IEEE Transactions on Automatic Control, vol. 44, pp. 104–107, 1999.
50. G. A. Rovithakis, "Robust redesign of a neural network controller in the presence of unmodeled dynamics," IEEE Transactions on Neural Networks, vol. 15, pp. 1482–1490, 2004.
51. C. Kwan and F. L. Lewis, "Robust backstepping control of nonlinear systems using neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 30, pp. 753–765, 2000.
52. A. S. Poznyak, E. N. Sanchez, and W. Yu, Differential Neural Networks for Robust Nonlinear Control: Identification, State Estimation and Trajectory Tracking, Singapore: World Scientific, 2001.
53. J. A. K. Suykens, J. Vandewalle, and B. DeMoor, Artificial Neural Networks for Modeling and Control of Non-Linear Systems, Norwell, MA: Kluwer Academic, 1996.
54. S. S. Ge, T. H. Lee, and C. J. Harris, Adaptive Neural Network Control of Robotic Manipulators, London: World Scientific, 1998.
55. S. S. Ge, C. C. Hang, T. H. Lee, and T. Zhang, Stable Adaptive Neural Network Control, Norwell, MA: Kluwer Academic, 2002.
56. Y. Li, S. Qiang, X. Zhuang, and O. Kaynak, "Robust and adaptive backstepping control for nonlinear systems using RBF neural networks," IEEE Transactions on Neural Networks, vol. 15, pp. 693–701, 2004.

57. K. S. Narendra, Y.-H. Lin, and L. S. Valavani, "Stable adaptive controller design, Part II: Proof of stability," IEEE Transactions on Automatic Control, vol. 25, pp. 440–448, 1980.
58. W. M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry, New York: Academic Press, 1975.
59. R. W. Brockett, "System theory on group manifolds and coset spaces," SIAM Journal on Control, vol. 10, pp. 265–284, 1972.
60. H. J. Sussmann, "Orbits of families of vector fields and integrability of distributions," AMS Transactions, vol. 180, pp. 171–188, 1973.
61. R. W. Brockett, "Asymptotic stability and feedback stabilization," in Differential Geometric Control Theory, R. W. Brockett, R. S. Millman, and H. J. Sussmann, Editors, Boston: Birkhauser, 1983.
62. E. D. Sontag, "Stability and stabilization: Discontinuities and the effect of disturbances," in Nonlinear Analysis, Differential Equations, and Control, F. H. Clarke and R. J. Stern, Editors, Dordrecht, NL: Kluwer Academic Publishers, 1999.
63. D. Aeyels, "Generic observability of differentiable systems," SIAM Journal on Control and Optimization, vol. 19, pp. 596–603, 1981.
64. M. G. Nerurkar, "Observability and topological dynamics," Journal of Dynamics and Differential Equations, vol. 3, pp. 273–287, 1991.
65. S. Nikitin, Global Controllability and Stabilization of Nonlinear Systems, London: World Scientific, 1994.
66. A. Isidori, Nonlinear Control Systems, London: Springer, 1995.
67. D. A. White and D. A. Sofge, Editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, New York: Van Nostrand Reinhold, 1992.
68. P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. A. White and D. A. Sofge, Editors, New York: Van Nostrand Reinhold, 1992.
69. S. Ferrari and R. F. Stengel, "Model-based adaptive critic designs," in Learning and Approximate Dynamic Programming: Scaling Up to the Real World, J. Si, A. Barto, W. Powell, and D. Wunsch, Editors, New York: IEEE Press and John Wiley & Sons, 2004.
70. K. S. Narendra and S.-M. Li, Neural Networks in Optimal Control: Part I, Technical Report 9703, Center for Systems Science, New Haven, CT: Yale University, 1997.
71. M. Agarwal, "A systematic classification of neural-network-based control," IEEE Control Systems Magazine, vol. 15, pp. 75–93, 1995.
72. S. S. Ge and C. Wang, "Adaptive NN control of uncertain nonlinear pure-feedback systems," Automatica, vol. 38, pp. 671–682, 2002.
73. D. Wang and J. Huang, "Neural network-based adaptive dynamic surface control for a class of uncertain nonlinear systems in strict-feedback form," IEEE Transactions on Neural Networks, vol. 16, pp. 195–202, 2005.
74. F. L. Lewis, J. Campos, and R. Selmic, Neuro-Fuzzy Control of Industrial Systems with Actuator Nonlinearities, Philadelphia: Society of Industrial and Applied Mathematics Press, 2002.
75. G. Herrmann, S. S. Ge, and G. Guo, "Practical implementation of a neural network controller in a hard disk drive," IEEE Transactions on Control Systems Technology, vol. 13, pp. 146–154, 2005.

76. P. He and S. Jagannathan, "Neuro-controller for reducing cyclic variation in lean combustion spark ignition engines," Automatica, vol. 41, pp. 1133–1142, 2005.
77. M. Khalid, S. Omatu, and R. Yusof, "MIMO furnace control with neural networks," IEEE Transactions on Control Systems Technology, vol. 1, pp. 238–245, 1993.
78. J. D. Bosković and K. S. Narendra, "Comparison of linear, nonlinear and neural-network-based adaptive controllers for a class of fed-batch fermentation processes," Automatica, vol. 31, pp. 814–840, 1995.
79. L. A. Feldkamp, G. V. Puskorius, K. A. Marko, J. V. James, T. M. Feldkamp, and G. Jesion, "Unravelling dynamics with recurrent networks: Application to engine diagnostics," in Proceedings of the Ninth Yale Workshop (New Haven, CT), pp. 59–64, 1996.
80. L. A. Feldkamp and G. V. Puskorius, "A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification," in Proceedings of the Tenth Yale Workshop (New Haven, CT), pp. 77–84, 1998.
81. L. A. Feldkamp, D. V. Prokhorov, and G. V. Puskorius, "Conditioned adaptive behavior from a fixed neural network," in Proceedings of the Eleventh Yale Workshop (New Haven, CT), pp. 78–85, 2001.
82. L. A. Feldkamp and D. V. Prokhorov, "Recurrent neural networks for state estimation," in Proceedings of the Twelfth Yale Workshop (New Haven, CT), pp. 17–22, 2003.
83. D. V. Prokhorov and L. A. Feldkamp, "Bayesian regularization in extended Kalman filter training of neural networks," pp. 651–659.
84. A. L. Kun and W. T. Miller, "Adaptive dynamic balance of a biped robot using neural networks," in Proceedings of the 1996 IEEE International Conference on Robotics and Automation (Minneapolis, MN), 1996.
85. G. G. Yen, "Reconfigurable neural control in precision space structural platforms," in Neural Systems for Control, O. Omidvar and D. L. Elliott, Editors, New York: Academic Press, pp. 289–316, 1997.
86. F. J. Doyle, M. A. Henson, B. A. Ogunnaike, J. S. Schwaber, and I. Rybak, "Neuronal modeling of the baroreceptor reflex with applications in process modeling and control," in Neural Systems for Control, O. Omidvar and D. L. Elliott, Editors, New York: Academic Press, pp. 89–130, 1997.
87. V. Subramanian and D. Gerrity, US Patents 6967594 and 6650251, and US Patent Application 60840623.

3
Modeling and Control Problems in Building Structures and Bridges

Sami F. Masri and Anastasios G. Chassiakos

CONTENTS
3.1 Introduction ................................................................ 100
    3.1.1 Background ........................................................ 100
    3.1.2 Overview of Structural Control of Civil Infrastructure Systems ... 100
    3.1.3 Identification of Structural Systems .............................. 101
    3.1.4 Classification of Identification Techniques for Structural Systems ... 102
    3.1.5 Uncertainty in Identification of Structural Systems ............... 103
    3.1.6 Scope ............................................................. 103
3.2 Hybrid Approach for the Identification of Nonlinear Structural Systems ... 104
    3.2.1 Formulation of Hybrid Parametric/Nonparametric Approach ........... 104
    3.2.2 Identification of Parametric Linear Part .......................... 106
    3.2.3 Identification of Parametric Nonlinear Part ....................... 107
        3.2.3.1 Identification of Hysteretic Systems ........................ 108
        3.2.3.2 Problem Formulation ......................................... 108
        3.2.3.3 Online Identification Algorithm ............................. 109
    3.2.4 Identification of the Nonparametric Nonlinear Part ................ 111
        3.2.4.1 Nonlinear Nonparametric Terms ............................... 111
        3.2.4.2 Orthogonal Series Expansion ................................. 112
        3.2.4.3 Nonlinear Forces Representation by Chebyshev Series ......... 113
        3.2.4.4 Neural Network Approach ..................................... 114
        3.2.4.5 Nonparametric Identification through Volterra–Wiener Neural Networks ... 117
3.3 Examples and Case Studies .................................................. 117
    3.3.1 Modeling of the Vincent Thomas Bridge Using Earthquake Response Measurements ... 118
    3.3.2 Online Identification of Hysteretic Joints from Experimental Measurements ... 118
        3.3.2.1 Identification Results for Full-Scale Structural Steel Subassembly ... 120
        3.3.2.2 Identification Results for a Structural Reinforced Concrete Subassembly ... 121
    3.3.3 Models of Nonlinear Viscous Dampers from Experimental Measurements ... 123
    3.3.4 Further Examples of Parametric Identification of MDOF Systems ..... 125
3.4 Conclusions ................................................................ 128
References ..................................................................... 129

3.1 Introduction

3.1.1 Background
Recent developments in the broad field of structural control of civil infrastructure systems have led to increased emphasis on procedures to obtain models, of various types and formats, for representing complex nonlinear systems encountered in the modeling, monitoring, and control of civil structures. The needs that are fueling these developments include (1) the increasing emphasis on high-fidelity simulation models that can be relied on to reduce the need for physical tests; (2) the widespread use of structural health monitoring (SHM) approaches based on the use of vibration response measures; and (3) the need to have robust models of civil structures whose nonlinear motions under strong nonstationary loads are to be actively controlled. This chapter provides a synopsis of recent developments in the modeling and control of complex civil infrastructure components.

3.1.2 Overview of Structural Control of Civil Infrastructure Systems
The general field of structural control deals with monitoring and controlling the motion of civil infrastructure systems under the action of dynamic environments. The nature of control theory and practice is largely determined by the particular application under consideration, and the subject of structural control has distinctive features that govern the direction of research. For example, it focuses on the performance of relatively massive structures; it

involves excitations whose precise properties are not known beforehand; it may involve the application of relatively large counterforces; it may require dissipation of large amounts of kinetic energy; and it is concerned with only relatively low accuracy of control. The excitations of main concern are earthquake, wind, and man-made forces. There are a variety of different approaches to controlling, but not necessarily eliminating, unwanted motions of structures: passive methods such as base-isolation or tuned-mass dampers, active methods in which counterforces are applied to reduce the motions, combined active and passive methods of control, methods of varying structural stiffness, controlled damping, and so on. The objective is to utilize the most effective combination of these methods to provide integrated control of structural vibrations.

Two different goals of structural control must be given consideration: the utilization of control in the design of new structures and the utilization of control to improve the seismic or wind resistance of existing structures. The problems posed by existing structures differ from the problems of designing new structures because of various constraints imposed by the fact that the building already exists. The ultimate goal of research on structural control is the practical application to real structures. It has the potential for important benefits to the economy and to public safety. When considering practical applications, related subjects must also be examined, for example: motions and strains must be monitored, sensors must be developed of various types, relevant material properties must be studied, damage detection and health monitoring methods must be developed, problems in system identification must be overcome, and so on. Further details concerning these issues are available in the work of Housner et al. (1997).

3.1.3 Identification of Structural Systems
Structural identification provides a means of utilizing data from laboratory and field testing to improve dynamic modeling capabilities for structural systems. By systematically utilizing dynamic test data from a structure, rather than relying on theory alone, models can be derived that provide more accurate response predictions for dynamic loads on the structure produced by wind, earthquakes, or man-created forces. Identification of structural systems plays a very important role in both SHM and structural control applications. In the case of structural health monitoring, identification techniques are used to determine any changes in the building's or bridge's characteristics, especially after a major event, such as an earthquake, with an acceptable cost. In this context, system identification provides an additional tool for defect identification or damage assessment of civil structures. In the case of structural control, identification techniques are used to determine accurate low-order dynamic models of the structure, to be used in the design of vibration control systems.

Some early publications on the subject are available in the work of Masri and Caughey (1979), Beck and Jennings (1980), and Masri et al. (1987a, 1987b). A recent and updated overview of the field can be found in Kerschen et al. (2006) and the references therein.

3.1.4 Classification of Identification Techniques for Structural Systems
Identification techniques for structural systems can be classified into two broad categories, parametric and nonparametric, depending on the level of prior knowledge of the system structure required for their implementation. Parametric methods assume that the mathematical form of the model is known from theoretical considerations, apart from some unknown parameters. Identification then consists primarily of estimating values for these parameters from experimental data, although this step should be followed by an assessment of the assumed mathematical form for the model in light of how well it fits the data. If a poor match is observed, it may be necessary to modify this form by improving on the assumptions or approximations that were used and then to repeat the parameter estimation so that a model that gives a better fit to the data can be derived.

Parametric methods can be used, for example, to estimate equivalent viscous damping values for the modes of vibration of a structure, which are difficult to derive from theory. They may also be used to assess, or to improve, a linear finite-element model used to predict the dynamic behavior of a structure. Parametric methods also allow feedback to the design process by assessing the accuracy of assumptions used to derive theoretical models that are needed during design to predict what the response of a proposed structure would be under dynamic loadings, such as those arising from wind or earthquakes.

In many practical dynamic problems, however, the mathematical form of the model is not clear, so an increasing amount of attention has been devoted to nonparametric methods. Nonparametric methods refer to techniques that require little or no theoretical knowledge of the structure. Instead, they take a "black-box" approach and fit the input-output relation of the structure by some general form; for example, representations involving orthogonal functions or functionals can be used. Parameter estimation is still required to find coefficients in the function expansions, but because these coefficients are not structural parameters directly related to physical properties, these methods are commonly called "nonparametric."

For example, traditional seismic design of a structure requires that the structure behave in a ductile inelastic fashion in severe earthquakes, so the modeling of this behavior is important; but despite these valuable efforts, there is no well-accepted mathematical form that can be used to model such behavior with confidence. Development of inelastic constitutive models for large-scale structures is an area of much research in earthquake engineering.

3.1.5 Uncertainty in the Identification of Structural Systems
In the usual identification approach, a general mathematical form is chosen to specify a class of parametric or nonparametric models describing the input-output relation of a specific structure, but there are free parameters that must be assigned values to choose a model from the class that "best" describes the behavior of the structure. This gives rise to two types of uncertainty. The first type, "parameter uncertainty," arises simply because the "best" values for the free parameters are not known a priori. This includes the possibility that there may be multiple solutions for the "best" values. The other type of uncertainty, "modeling error," arises because the accuracy of each model in the class, and in particular that of the "best" models, is not known a priori. One of the main aspects of system identification procedures is the criterion for choosing the "best" models. Related work on the subject can be found in Beck (1990), which shows that a Bayesian probabilistic formulation not only leads to a natural choice for this criterion, but also provides an integrated framework to handle nonuniqueness, modeling error, and measurement noise.

The avoidance of a theoretical structural model may be an advantage in deriving a model for use in vibration control or response predictions for an existing structure, which can be dynamically tested. Even when a reliable constitutive model is available at the element or member level, the sheer number of components in a structure and the complexity of their interactions may make it difficult to build up a mathematical structure for a model based on theory. Traditional nonparametric identification methods do have their own problems, however. These include restrictions on the type of input signals that can be used and restrictions on the nature of the dynamic systems to be identified; for example, some methods are inapplicable to nonlinearities involving infinite memory, which can occur if the structural behavior includes plastic hysteresis. Furthermore, nonparametric techniques may require a prohibitive amount of computational effort, coupled with very demanding storage requirements. This limits the usefulness of nonparametric methods in improving the analytical modeling capabilities required in dynamic design.

3.1.6 Scope
In the sequel, we present an overview of a systematic, hybrid approach for the identification of nonlinear systems in which a parametric model is used to describe the linear part as well as known nonlinear features of the structural model, and a nonparametric approach is used to describe the model-unknown nonlinear part. A review of methods for identification of the parametric part

is given, which is followed by an examination of several nonparametric approaches for modeling the nonparametric nonlinear part, including orthogonal series expansions and neural networks. Illustrative case studies and further examples are presented to demonstrate the utility of the proposed identification approach.

3.2 Hybrid Approach for the Identification of Nonlinear Structural Systems

3.2.1 Formulation of the Hybrid Parametric/Nonparametric Approach
This section presents a general formulation of the dynamics of a broad class of typical nonlinear multidegree-of-freedom (MDOF) structural systems, which leads to the hybrid parametric/nonparametric identification and modeling of such systems. Consider the equation of motion of a discrete MDOF system subjected to directly applied excitation forces:

M ẍ(t) + f_R(x(t), ẋ(t), t) = f(t)    (3.1)

where M is a constant matrix characterizing the inertia properties of the system, x is the displacement vector, f_R is a vector valued function representing the system's restoring forces, and f is the vector of excitation forces. The vector of restoring forces f_R may depend on its arguments in a linear or nonlinear fashion. Typically, vector f_R is represented as a combination of a linear part f_L(t) and an additive nonlinear part f_N(t):

f_R = f_L + f_N    (3.2)

In the case of structural systems, external influences can act on the structure directly, as represented by the vector f(t) in Equation (3.1), or they may enter the structure indirectly through the motion of the structure's supports. When we include separate support motions and directly applied external forces, Equation (3.1) will be modified as follows:

M^e_11 ẍ_1(t) + C^e_11 ẋ_1(t) + K^e_11 x_1(t) + M^e_10 ẍ_0(t) + C^e_10 ẋ_0(t) + K^e_10 x_0(t) + f_N(t) = f(t)    (3.3)

where:
f(t) = an n_1 column vector of directly applied forces;
x(t) = (x_1(t), x_0(t))ᵀ = system displacement vector of dimension (n_1 + n_0);
x_1(t) = internal degree-of-freedom (DOF) displacement vector of dimension n_1;
x_0(t) = prescribed support (boundary) displacement vector of dimension n_0;
M^e_11, C^e_11, K^e_11 = constant matrices that characterize the inertia, linearized damping, and linearized stiffness forces, respectively, associated with the unconstrained DOF of the system, each of dimension n_1 × n_1;
M^e_10, C^e_10, K^e_10 = constant matrices that characterize the inertia, linearized damping, and linearized stiffness forces, respectively, associated with the support motions, each of dimension n_1 × n_0;
f_N(t) = an n_1 column vector of nonlinear nonconservative forces involving x_1(t) as well as x_0(t).

It is noted that the linear component f_L of the restoring forces vector in Equation (3.2) is now represented by the following terms:

f_L = C^e_11 ẋ_1 + K^e_11 x_1 + M^e_10 ẍ_0 + C^e_10 ẋ_0 + K^e_10 x_0    (3.4)

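As a concrete illustration of Equation (3.4), the following minimal Python sketch evaluates the equivalent linear force f_L for a two-DOF system with one support motion. All matrix and signal values below are illustrative placeholders, not identified values from any structure.

```python
import numpy as np

n1, n0 = 2, 1                                   # internal DOF and support DOF
C11 = np.array([[0.4, -0.1], [-0.1, 0.3]])      # linearized damping, n1 x n1
K11 = np.array([[50.0, -20.0], [-20.0, 40.0]])  # linearized stiffness, n1 x n1
M10 = np.zeros((n1, n0))                        # inertia coupling to support
C10 = np.zeros((n1, n0))                        # damping coupling to support
K10 = np.array([[30.0], [0.0]])                 # stiffness coupling to support

def f_L(x1, x1dot, x0, x0dot, x0ddot):
    """Equivalent linear restoring force per Equation (3.4)."""
    return (C11 @ x1dot + K11 @ x1 +
            M10 @ x0ddot + C10 @ x0dot + K10 @ x0)

x1 = np.array([0.01, 0.02]); x1dot = np.zeros(2)
x0 = np.array([0.005]); x0dot = np.zeros(1); x0ddot = np.array([0.1])
print(f_L(x1, x1dot, x0, x0dot, x0ddot))
```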
First the linear parametric part f_L of the class of models defined above is identified by using a time-domain method for the system matrices appearing in Equation (3.3), as will be described later. Next the nonlinear forces f_N acting on the system are identified. Based on the nature of the problem under consideration, it is often reasonable to postulate a parametric form of a simplified nonlinear model that represents the physical phenomena being analyzed. Because in this case the form of the nonlinearity has already been chosen, the identification problem is reduced to determining the optimum values of the model parameters. Examples of such nonlinearities appearing frequently in structural modeling are polynomial spring nonlinearities (such as the Duffing oscillator, whose dynamics for a single DOF system are given by m ẍ + c_1 ẋ + c_2 x + c_3 x³ = f), or systems with additional polynomial cross terms (such as the Duffing–Van der Pol oscillator, given for a single DOF system as m ẍ + c_1 ẋ + c_2 x + c_3 x³ + c_4 x² ẋ = f). Systems with hysteretic properties can fit this category in certain applications as well.

Furthermore, the nonlinear part f_N of the restoring forces can be modeled as consisting of two additive components: a parametric component, f_p, whose functional form is known, and a nonparametric component, f_np, which does not have a known functional form:

f_N = f_p + f_np    (3.5)

These two components will be identified by parametric and nonparametric identification methods, respectively. The identification of these components is the subject of the current and the next sections.

3.2.2 Identification of the Parametric Linear Part
Consider the equivalent linear system of the system in Equation (3.3) rewritten in the following form:

M^e_11 ẍ_1(t) + C^e_11 ẋ_1(t) + K^e_11 x_1(t) + M^e_10 ẍ_0(t) + C^e_10 ẋ_0(t) + K^e_10 x_0(t) = f(t) + δ(t)    (3.6)

The term δ(t) contains the nonlinear restoring forces f_N, plus additional modeling errors and measurement noise. In the equation-error approach, the linearized system matrices are estimated by minimizing a norm of the error δ(t) over a specified time interval to give the "best" linear model.

Let the response vector r(t) of dimension 3(n_1 + n_0) be defined as:

r(t) = [ẍ_1(t)ᵀ, ẋ_1(t)ᵀ, x_1(t)ᵀ, ẍ_0(t)ᵀ, ẋ_0(t)ᵀ, x_0(t)ᵀ]ᵀ    (3.7)

For clarity of presentation, let the six matrices appearing in Equation (3.6) be denoted by 1A, 2A, ..., 6A, let <jA_i> = ith row of a generic matrix jA, and introduce the parameter vector α_i:

α_i = (<1A_i>, <2A_i>, <3A_i>, <4A_i>, <5A_i>, <6A_i>)ᵀ    (3.8)

Suppose that the excitation and the response of the system governed by Equation (3.6) are measured at times t_1, t_2, ..., t_N. Then at every t_k,

1A ẍ_1(t_k) + 2A ẋ_1(t_k) + 3A x_1(t_k) + 4A ẍ_0(t_k) + 5A ẋ_0(t_k) + 6A x_0(t_k) = f(t_k) + δ(t_k),   k = 1, 2, ..., N    (3.9)

Introducing the matrix R,

R = [r ᵀ(t_1); r ᵀ(t_2); ...; r ᵀ(t_N)]    (3.10)

and using the notation above, the grouping of the measurements can be expressed concisely as:

R̂ α̂ = b̂ + δ̂    (3.11)

where R̂ is a block diagonal matrix whose diagonal elements are equal to R, α̂ = (α_1ᵀ, α_2ᵀ, ..., α_{n_1}ᵀ)ᵀ, and b̂ and δ̂ are the corresponding vectors of excitation measurements and equation errors.

Consider the general case where the measurements associated with certain degrees of freedom are more reliable than others, or measurements accumulated over certain time periods are to be emphasized differently from the others. Under these conditions, in the weighted least-squares equation-error method we minimize the cost function given by:

J(α̂) = δ̂ᵀ W δ̂    (3.12)

where W is the inverse of the covariance matrix of δ̂. W is usually chosen subjectively and is often taken as a diagonal matrix. By substituting Equation (3.11) into Equation (3.12) and performing the minimization of J(α̂), we find that the optimal parameters are given by:

α̂ = (R̂ᵀ W R̂)⁻¹ R̂ᵀ W b̂    (3.13)

Keeping in mind that R̂ is of dimensions m × n, where m = N n_1 and n = 3 n_1(n_1 + n_0), then if a sufficient number of measurements are taken, this will result in m > n, and least-squares procedures can be used to solve for all the system parameters that constitute the entries in α̂.

Solving Equation (3.3) for the nonlinear force vector f_N(t), and using the definition of f_L from Equation (3.4), results in:

f_N(t) = f(t) − [M^e_11 ẍ_1(t) + f_L(t)]    (3.14)

Because all the terms appearing on the right-hand side of Equation (3.14) are available from measurements or have been previously identified, the time history of f_N can be determined. Note from Equation (3.14) that f_N(t) can be interpreted as the residual force vector corresponding to the difference between the excitation vector f(t) and the equivalent linear force vector composed of the inertia, damping, and stiffness terms.

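To make the estimation step concrete, the following minimal sketch carries out the weighted least-squares solution of Equation (3.13) on synthetic data. The regressor matrix, the weighting, and the noise level are arbitrary assumptions chosen only to exercise the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 6                         # m measurements, n unknown parameters
R_hat = rng.standard_normal((m, n))   # synthetic regressor matrix
alpha_true = rng.standard_normal(n)   # "true" parameters to be recovered
b = R_hat @ alpha_true + 0.05 * rng.standard_normal(m)  # noisy excitation data

W = np.diag(np.ones(m))               # diagonal weighting (often subjective)
# alpha = (R^T W R)^{-1} R^T W b, Equation (3.13)
alpha_hat = np.linalg.solve(R_hat.T @ W @ R_hat, R_hat.T @ W @ b)
print(np.abs(alpha_hat - alpha_true).max())   # residual parameter error
```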
3.2.3 Identification of the Parametric Nonlinear Part
As has been described already, the residual nonlinear forces f_N(t) can be assumed to consist of a parametric component f_p, whose functional form is known, and of a nonparametric component f_np with unknown functional form. Some parametric forms of typical structural nonlinearities were described before (for example, the Duffing oscillator). The parameters of these models are estimated using standard least-squares techniques. A different class of nonlinearities that are of particular interest in structural identification are hysteretic nonlinearities. These nonlinearities differ from the simple polynomial nonlinearities because they exhibit hereditary behavior: the nonlinear forces cannot be expressed in the form of an algebraic function involving the instantaneous values of the state variables of the system. In the remaining part of this section we will focus on modeling and identifying hysteretic behavior in the context of parametric nonlinear identification.

3.2.3.1 Identification of Hysteretic Systems
Problems involving the identification of structural systems exhibiting inelastic restoring forces with hereditary characteristics are widely encountered in the applied mechanics field. Representative examples involve buildings under strong earthquake excitations or aerospace structures incorporating joints. Due to the hysteretic nature of the restoring forces in such situations, much effort has been devoted by numerous investigators to develop models of hysteretic restoring forces and techniques to identify such systems. Some early contributions in this area have been made by Caughey (1960), Bouc (1967), Masri and Caughey (1979), Baber and Wen (1982), and Wen (1989). One of the challenges in actively controlling the nonlinear dynamic response of structural systems undergoing hysteretic deformations is the need for rapid identification of the nonlinear restoring force, so that the information can be utilized by online control algorithms. Consequently, the availability of a method for the online identification of hysteretic restoring forces is crucial for the practical implementation of structural control concepts. Details of the formulation and applications of this approach, using dynamic neural networks, can be found in Chassiakos et al. (1998). The method can easily be expanded to MDOF systems.

3.2.3.2 Problem Formulation
In this section we present the modeling and formulation of the hysteretic identification problem for a single DOF system. The motion of the single DOF system to be identified is governed by:

m ẍ(t) + Q(x(t), ẋ(t)) = u(t)    (3.15)

where x(t) is the system displacement, Q(x(t), ẋ(t)) is the restoring force, and u(t) is the system's external excitation. The symbol Q is used here to represent hysteretic restoring forces, in order to distinguish them from the more general restoring forces f_R of the previous sections. The mass m of the system is assumed to be known or already estimated, and measurements of u(t) and ẍ(t) are assumed to be available at times t_k, k = 1, 2, .... The values of ẋ(t) and x(t) are available either by direct measurements at times t_k, or by integration of the signal ẍ(t).

If the restoring force Q(x, ẋ) has hysteretic characteristics, a model for such a force can be given by the following nonlinear differential equation, the Bouc–Wen model (Wen, 1989):

Q(x, ẋ) = z,  with  ż = (1/η)[A ẋ − ν(β |ẋ| |z|^(n−1) z − γ ẋ |z|^n)]    (3.16)

Different combinations of the parameters η, ν, A, β, γ, and n will produce smooth hysteretic loops of various hardening or softening characteristics, with different amplitudes and shapes.

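To make the hereditary character of Equation (3.16) concrete, the following is a minimal sketch that integrates the Bouc–Wen equation with a simple Euler scheme under a prescribed sinusoidal displacement history; all parameter values are illustrative assumptions, not values from the text.

import numpy as np

# Euler integration of the Bouc-Wen model of Equation (3.16):
#   z_dot = (1/eta)*(A*xd - nu*(beta*|xd|*|z|^(n-1)*z - gamma*xd*|z|^n))
# The parameter values below are hypothetical and chosen so z saturates.
eta, A, nu, beta, gamma, n = 1.0, 1.0, 1.0, 0.5, -0.5, 3
dt, T = 1e-3, 10.0
t = np.arange(0.0, T, dt)
x = np.sin(2 * np.pi * 0.5 * t)      # prescribed displacement history
xd = np.gradient(x, dt)              # its velocity

z = np.zeros_like(t)
for k in range(1, t.size):
    zp = z[k - 1]
    zdot = (1.0 / eta) * (A * xd[k - 1]
            - nu * (beta * abs(xd[k - 1]) * abs(zp) ** (n - 1) * zp
                    - gamma * xd[k - 1] * abs(zp) ** n))
    z[k] = zp + dt * zdot

# Plotting z against x traces a hysteresis loop: z depends on the time
# history of x, not merely on its instantaneous value.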
Different combinations of the parameters \(\eta\), \(\nu\), \(A\), \(\beta\), \(\gamma\), and \(n\) will produce smooth hysteretic loops of various hardening or softening characteristics, with different amplitudes and shapes. The model is parameterized linearly with respect to the coefficients \((1/\eta)A\), \((1/\eta)\nu\beta\), and \((1/\eta)\nu\gamma\), but nonlinearly with respect to the power \(n\). It is, however, desirable to use a linearly parameterized estimator for the online estimation of hysteretic behavior; hence the following modification of the model expressed by Equation (3.16) will be used:

\[ \dot{z} = (1/\eta)\left[ A\dot{x} - \sum_{n=1}^{N} a_n \nu\left( \beta |\dot{x}| |z|^{n-1} z - \gamma \dot{x} |z|^{n} \right) \right] \qquad (3.18) \]

where the value of the coefficient \(a_n\) determines the contribution of power \(n\) to the hysteresis, and \(N\) is a large enough integer. For example, if the value of the power \(n\) in model (3.16) is \(n = 3\), then the coefficients \(a_i\) in Equation (3.18) will be \(a_1 = 0\), \(a_2 = 0\), \(a_3 = 1\). Different combinations of the \(a_n\) allow hysteretic loops of different shapes to be represented within one linearly parameterized structure.

3.2.3.3 Online Identification Algorithm

The hysteretic model obeys the nonlinear differential Equation (3.16). Let \(u(k) = u(t_k)\), \(x(k) = x(t_k)\), \(\dot{x}(k) = \dot{x}(t_k)\), \(\ddot{x}(k) = \ddot{x}(t_k)\), and \(z(k) = z(t_k)\). The system equation of motion (3.15) is rewritten as:

\[ Q(k) = z(k) = u(k) - m \ddot{x}(k) \qquad (3.17) \]

hence the values of \(z\) at time \(t_k\) are available, and the identification problem can be stated as: given the mass \(m\) and using the online measurements of \(x\), \(\dot{x}\), \(\ddot{x}\), and \(u\), make online estimates of the unknown parameters of the hysteretic model expressed by Equation (3.16). Because measurements are usually taken at discrete time intervals \(\Delta t\), a discrete-time version of the model defined by Equation (3.18) will be used:

\[ z(k) = z(k-1) + \Delta t (1/\eta) A \dot{x}(k-1) + \Delta t \sum_{n=1}^{N} \left( -a_n (1/\eta) \nu \beta |\dot{x}(k-1)| |z(k-1)|^{n-1} z(k-1) + a_n (1/\eta) \nu \gamma \dot{x}(k-1) |z(k-1)|^{n} \right) \qquad (3.19) \]

This discrete-time model gives rise to the following discrete-time linearly parameterized estimator:

\[ \hat{Q}(k) = z(k-1) + \theta_0(k) \dot{x}(k-1) + \sum_{n=1}^{N} \left( \theta_{2n-1}(k) |\dot{x}(k-1)| |z(k-1)|^{n-1} z(k-1) + \theta_{2n}(k) \dot{x}(k-1) |z(k-1)|^{n} \right) \qquad (3.20) \]

where the coefficients \(\theta_i(k)\), \(i = 0, \ldots, 2N\), are estimates at time \(t_k\) of the corresponding coefficients from Equation (3.19): \(\theta_0(k)\) is an estimate of \(\Delta t (1/\eta) A\), \(\theta_{2n-1}(k)\) is an estimate of \(-\Delta t\, a_n (1/\eta) \nu \beta\), and \(\theta_{2n}(k)\) is an estimate of \(\Delta t\, a_n (1/\eta) \nu \gamma\).

Let \(\theta(k) = [\theta_0(k), \theta_1(k), \theta_2(k), \ldots, \theta_{2N}(k)]^T\) be the vector containing the parameter estimates at time \(t_k\) and \(\theta^* = [\theta_0^*, \theta_1^*, \theta_2^*, \ldots, \theta_{2N}^*]^T\) the vector containing the true values of the parameters. Also let:

\[ \xi(k-1) = \left[ \dot{x}(k-1),\; |\dot{x}(k-1)||z(k-1)|^{0} z(k-1),\; \dot{x}(k-1)|z(k-1)|^{1},\; |\dot{x}(k-1)||z(k-1)|^{1} z(k-1),\; \dot{x}(k-1)|z(k-1)|^{2},\; \ldots,\; |\dot{x}(k-1)||z(k-1)|^{N-1} z(k-1),\; \dot{x}(k-1)|z(k-1)|^{N} \right]^T \qquad (3.21) \]

be a vector containing the system measurements at time \(t_k\). Estimator (3.20) is then expressed as:

\[ \hat{Q}(k) = z(k-1) + \xi^T(k-1)\, \theta(k) \qquad (3.22) \]

and the estimation error will be:

\[ e(k) = Q(k) - \hat{Q}(k) = \xi^T(k)\theta^* - \xi^T(k)\theta(k) = \xi^T(k)\phi(k) \qquad (3.23) \]

where \(\phi(k) = \theta^* - \theta(k)\) is the \(([2N+1] \times 1)\) vector of parameter errors between the actual and estimated values \(\theta_i\). Based on Equation (3.22), and using standard techniques found in the adaptive estimation and adaptive control literature (Ioannou and Datta, 1991), the following adaptation law is designed for updating the estimates \(\theta(k)\) online:

\[ \theta(k) = \begin{cases} \mu(k), & \text{if } \|\mu(k)\| \le M_\theta \\ \left( M_\theta / \|\mu(k)\| \right) \mu(k), & \text{if } \|\mu(k)\| > M_\theta \end{cases} \qquad (3.24) \]

with

\[ \mu(k) = \theta(k-1) + \frac{\gamma_0}{\beta_0 + \|\xi(k-1)\|^2}\, e(k)\, \xi(k-1) \qquad (3.25) \]

where \(\gamma_0 > 0\) is the learning rate of the algorithm and \(\beta_0 > 0\) is a design constant. The norms \(\|\xi(k)\|\) and \(\|\mu(k)\|\) are the usual Euclidean vector norms. The number \(M_\theta\) is an upper bound on the norm \(\|\theta^*\|\); such an upper bound can easily be found if some information about the order of magnitude of the elements of \(\theta^*\) is available a priori. This can also be determined by simulation. The adaptive law expressed by Equations (3.24)–(3.25) guarantees that all the signals will remain bounded and that, if the model of Equation (3.16) is a good representation of the unknown system, the error \(e(k) \rightarrow 0\) as \(k \rightarrow \infty\).

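The following is a minimal sketch of the estimator (3.20)–(3.25): a function that builds the regressor \(\xi(k-1)\) of Equation (3.21) and a function performing one step of the projection-based adaptation law. The gains and bound are illustrative assumptions.

import numpy as np

def regressor(xd_prev, z_prev, N):
    """Build xi(k-1) of Equation (3.21) from x_dot(k-1) and z(k-1)."""
    xi = [xd_prev]
    for n in range(1, N + 1):
        xi.append(abs(xd_prev) * abs(z_prev) ** (n - 1) * z_prev)
        xi.append(xd_prev * abs(z_prev) ** n)
    return np.array(xi)

def theta_update(theta_prev, xi_prev, e_k, gamma0=0.5, beta0=1.0, M_theta=10.0):
    """One step of the adaptation law (3.24)-(3.25): normalized gradient
    followed by projection onto the ball ||theta|| <= M_theta."""
    mu = theta_prev + gamma0 * e_k * xi_prev / (beta0 + xi_prev @ xi_prev)
    norm_mu = np.linalg.norm(mu)
    return mu if norm_mu <= M_theta else (M_theta / norm_mu) * mu

In use, at each sample one would form Q(k) from Equation (3.17), the estimate from Equation (3.22), the error e(k) = Q(k) − Q̂(k), and then call theta_update.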
3.2.4 Identification of the Nonparametric Nonlinear Part

The nonparametric nonlinear forces \(f_{np}\) do not have a known functional representation. Under the assumption of additive parametric and nonparametric nonlinearities, the nonparametric part is given from Equation (3.5) as \(f_{np} = f_N - f_p\); because the time history of \(f_N\) is known from Equation (3.14) and \(f_p\) has been identified by one of the methods of the previous section, the residual restoring forces \(f_{np}\) will be identified by a nonparametric method.

3.2.4.1 Nonlinear Nonparametric Terms

Let \(h_i(t)\) represent the \(i\)th component of the nonlinear residual force vector \(f_{np}\). In general, the vector \(h\) depends simultaneously on all the components of the system acceleration, velocity, and displacement vectors associated with the \(n_1\) internal DOF as well as the \(n_0\) support components:

\[ h(t) = h(x, \dot{x}, \ddot{x}). \qquad (3.26) \]

The central idea of the present method (Masri and Caughey, 1979) is that, in the case of nonlinear dynamic systems commonly encountered in the structural mechanics field, a judicious assumption is that each component of \(h\) can be expressed in terms of a series of the form:

\[ h_i(x, \dot{x}, \ddot{x}) \approx \sum_{j=1}^{J_{max_i}} \hat{h}_i^{(j)}\left( v_{1i}^{(j)}, v_{2i}^{(j)} \right) \qquad (3.27) \]

where the \(v_1\)s and \(v_2\)s are suitable generalized coordinates which, in turn, are linear combinations of the physical displacements, velocities, and accelerations. The approximation indicated in Equation (3.27) is that each component \(h_i\) of the nonlinear force vector \(h\) can be adequately estimated by a collection of terms \(\hat{h}_i^{(j)}\), each one of which involves a pair of generalized coordinates. The particular choice of combinations and permutations of the generalized coordinates and the number of terms \(J_{max_i}\) needed for a given \(h_i\) depend on the nature and extent of the nonlinearity of the system and its effects on the specific DOF \(i\). Because \(h_i(t)\) is chosen as the \(i\)th component of \(f_{np}(t)\), the procedure expressed by Equation (3.27) will directly estimate the corresponding component of the unknown nonlinear force.

For certain structural configurations (e.g., localized nonlinearities) and relatively low-order systems, the choice of suitable generalized coordinates for the series in Equation (3.27) is a relatively straightforward task. However, in many practical cases involving distributed nonlinearities coupled with a relatively high-order system, an improved rate of convergence of the series in Equation (3.27) can be achieved by performing the least-squares fit of the nonlinear forces in the "modal" domain as outlined below.

Experience with typical structural nonlinearities has shown that the main reason for this improvement is the fact that relatively few "modes" dominate the transformed nonlinear forces. Using the identification results for the linear part, the eigenvalue problem associated with \(M_{11}^{-1} K_{11}\) is solved, where \(\Lambda\) is a diagonal matrix containing the squares of the natural frequencies on the diagonal and \(\Phi\) is the eigenvector matrix, or modal matrix. The physical displacements are transformed into the corresponding vector of generalized coordinates \(u\):

\[ u(t) = \Phi^{-1} x(t). \qquad (3.28) \]

For simplicity of notation, from here on we use the symbol \(h\) to denote the vector of transformed nonlinear residual forces \((\Phi^T f_{np})\) instead of \(f_{np}\):

\[ h(u, \dot{u}, \ddot{u}) = \Phi^T f_{np}(t). \qquad (3.29) \]

3.2.4.2 Orthogonal Series Expansion

The individual terms appearing in the series expansion of Equation (3.27) may be evaluated by using the least-squares approach to determine the optimum fit for the time history of each \(\hat{h}_i\). Thus, \(\hat{h}_i^{(1)}\) may be expressed as a double series involving a suitable choice of generalized coordinates:

\[ \hat{h}_i^{(1)}\left( v_{1i}^{(1)}, v_{2i}^{(1)} \right) \equiv \sum_k \sum_\ell C_{k\ell}^{(i)}\, T_k\left( v_{1i}^{(1)} \right) T_\ell\left( v_{2i}^{(1)} \right) \qquad (3.30) \]

where the \(C_{k\ell}\)s are a set of undetermined constants and the \(T_k(.)\) and \(T_\ell(.)\) are suitable basis functions, such as orthogonal polynomials, for example the Chebyshev polynomials. Equation (3.30) accounts for the contribution to the nonlinear force \(h_i\) of the generalized coordinates \(v_{1i}^{(1)}\) and \(v_{2i}^{(1)}\) appearing in the form \((v_{1i}^{(1)})^k (v_{2i}^{(1)})^\ell\). As an example, if the \(T_k(.)\) are the Chebyshev polynomials, then for \(k = 4\) the 4th Chebyshev polynomial will be \(T_4(v_{1i}^{(1)}) = 8 (v_{1i}^{(1)})^4 - 8 (v_{1i}^{(1)})^2 + 1\).

Let \(h_i^{(2)}\) be the deviation (residual) error between \(h_i\) and its first estimate \(\hat{h}_i^{(1)}\), that is:

\[ h_i^{(2)}(x, \dot{x}, \ddot{x}) = h_i(x_1, \dot{x}_1, \ddot{x}_1) - \hat{h}_i^{(1)}\left( v_{1i}^{(1)}, v_{2i}^{(1)} \right). \qquad (3.31) \]

The residual error as defined by Equation (3.31) can be further reduced by fitting \(h_i^{(2)}\) by a similar double series involving variables \(v_{1i}^{(2)}\) and \(v_{2i}^{(2)}\):

\[ h_i^{(2)}(x, \dot{x}, \ddot{x}) \approx \hat{h}_i^{(2)}\left( v_{1i}^{(2)}, v_{2i}^{(2)} \right) \qquad (3.32) \]

where:

\[ \hat{h}_i^{(2)}\left( v_{1i}^{(2)}, v_{2i}^{(2)} \right) \equiv \sum_k \sum_\ell C_{k\ell}^{(i)}\, T_k\left( v_{1i}^{(2)} \right) T_\ell\left( v_{2i}^{(2)} \right). \qquad (3.33) \]

This procedure is extended to account for all DOFs that have significant interaction with DOF \(i\). In this iterative process, the \(j\)th residual error is given by:

\[ h_i^{(j)}(x, \dot{x}, \ddot{x}) = h_i^{(j-1)}(x, \dot{x}, \ddot{x}) - \hat{h}_i^{(j-1)}\left( v_1^{(j-1)}, v_2^{(j-1)} \right), \quad j = 1, 2, \ldots, J_{max_i} \qquad (3.34) \]

where

\[ h_i^{(1)}(x, \dot{x}, \ddot{x}) \equiv h_i(x, \dot{x}, \ddot{x}) \qquad (3.35) \]

and

\[ \hat{h}_i^{(j)}\left( v_{1i}^{(j)}, v_{2i}^{(j)} \right) \equiv \sum_k \sum_\ell C_{k\ell}^{(i)}\, T_k\left( v_{1i}^{(j)} \right) T_\ell\left( v_{2i}^{(j)} \right). \qquad (3.36) \]

The terms \(\hat{h}_i^{(j)}(v_{1i}^{(j)}, v_{2i}^{(j)})\) in Equation (3.27) are estimates of \(h_i^{(j)}(x, \dot{x}, \ddot{x})\). Note that, in general, the range of the summation indices \(k\) and \(\ell\) appearing in Equation (3.36) may vary with the series index \(j\) and DOF index \(i\). Similarly, the total number of series terms needed to achieve a given level of accuracy in fitting the nonlinear force time history depends on the DOF index \(i\). Note that in the special case in which no cross-product terms are involved in any of the series terms, the function \(h\) can be expressed as the sum of two one-dimensional orthogonal polynomial series instead of a single two-dimensional series of the type under discussion.

3.2.4.3 Nonlinear Forces Representation by Chebyshev Series

Using orthogonal polynomials \(T_k(.)\), estimate each \(h_i(x, \dot{x}, \ddot{x})\) by a series of \(\hat{h}_i^{(j)}\) of the form indicated in Equation (3.36). The numerical value of the \(C_{k\ell}\) coefficients of the approximating functions can be determined by invoking the applicable orthogonality conditions for the chosen polynomials. Although there is a wide choice of suitable basis functions for least-squares application, the orthogonal nature of the Chebyshev polynomials and their "equal-ripple" characteristics make them convenient to use in the present work. The \(n\)th Chebyshev polynomial is defined by the identity \(T_n(\cos\theta) = \cos(n\theta)\), or equivalently by:

\[ T_n(\xi) = \cos\left( n \cos^{-1} \xi \right), \quad -1 < \xi < 1 \qquad (3.37) \]

and satisfies the weighted orthogonality property

\[ \int_{-1}^{1} w(\xi)\, T_n(\xi)\, T_m(\xi)\, d\xi = \begin{cases} \pi, & n = m = 0 \\ (\pi/2)\, \delta_{nm}, & \text{otherwise} \end{cases} \qquad (3.38) \]

where \(w(\xi) = (1 - \xi^2)^{-1/2}\) is the weighting function and \(\delta_{nm}\) is the Kronecker delta. Further details regarding this approach and a demonstration of the utility of this procedure are available in the works of Masri et al. (1987a, 1987b, 2006).

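As a minimal sketch of the double-series fit of Equation (3.36), the following fits Chebyshev coefficients \(C_{k\ell}\) by least squares to samples of a residual force; the force below and the series orders are hypothetical, and the generalized coordinates are assumed already scaled into \((-1, 1)\).

import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
v1 = rng.uniform(-1, 1, 500)          # normalized generalized coordinate (e.g., displacement)
v2 = rng.uniform(-1, 1, 500)          # normalized generalized coordinate (e.g., velocity)
h = 2.0 * v1 ** 3 + 0.5 * v2          # hypothetical residual nonlinear force

K = L = 4                             # series orders in k and l (illustrative)
# Design matrix whose columns are the products T_k(v1) * T_l(v2)
Phi = np.column_stack([
    C.chebval(v1, np.eye(K + 1)[k]) * C.chebval(v2, np.eye(L + 1)[l])
    for k in range(K + 1) for l in range(L + 1)
])
coeff, *_ = np.linalg.lstsq(Phi, h, rcond=None)
Ckl = coeff.reshape(K + 1, L + 1)     # estimated C_kl of the double series
print(np.round(Ckl, 3))               # dominant entries: T3(v1) and T1(v2)

Because \(2v^3 = 0.5\,T_3(v) + 1.5\,T_1(v)\), the fitted table recovers exactly those coefficients, illustrating how the orthogonal expansion concentrates the nonlinearity into few terms.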
3.2.4.4 Neural Network Approach

A different approach for identifying the nonparametric nonlinear part of the restoring forces is the use of artificial neural networks. The standard multilayer feedforward neural networks have been shown to perform very well in identifying a broad class of structural nonlinearities. Training algorithms such as standard back propagation can be used to train the network, which learns to approximate the unknown nonlinearity within the range of the data provided in the training set. Detailed discussions on the approximating properties of this type of network for structural systems are provided in Chassiakos and Masri (1991) and Masri et al. (1992, 1993).

The multilayer feedforward networks will work well if the unknown nonlinearities are static, that is, if the output of the nonlinearity depends only on the instantaneous value of its input. When the nonlinearity is of a dynamic nature, such as a hysteretic element, which possesses memory characteristics, the simple multilayer feedforward networks cannot capture the input-output nonlinear dynamics. A more general form of Equation (3.16) for the restoring force is given by:

\[ Q(x, \dot{x}) = z \quad \text{with} \quad \dot{z} = G(z, x, \dot{x}, f) \qquad (3.39) \]

where \(G\) is a continuous nonlinear function, and \(x\) and \(f\) are the displacement and excitation forces, respectively. The main challenge in designing adaptive algorithms for estimating the accelerations/displacements of system (3.39) is the differential Equation (3.16) for \(z\), which depends nonlinearly and dynamically on \(x\), \(\dot{x}\), and \(f\), without assuming any knowledge of the functional form of the nonlinearity.

A network architecture that has been shown to approximate nonlinear dynamic systems, capable of capturing nonlinear hysteretic effects, is the class of Volterra–Wiener neural networks (VWNN) (Kosmatopoulos et al., 2001). The VWNN, as shown in Figure 3.1, consists of a linear multi-input multi-output (MIMO) stable dynamic system connected in cascade with a linear-in-the-weights neural network. The dynamics of the linear MIMO system are given as follows:

\[ \chi_1 = \zeta, \quad \chi_2 = H_1(s)\chi_1, \quad \chi_3 = H_2(s)\chi_2, \quad \ldots, \quad \chi_{p+1} = H_p(s)\chi_p, \qquad \xi = \left[ \chi_1^T, \chi_2^T, \ldots, \chi_{p+1}^T \right]^T \qquad (3.40) \]

where \(\zeta\) is the input vector to the VWNN, \(\xi\) is the output of the linear MIMO system, and the \(H_i(s)\) are stable transfer function matrices available for design (here, \(s\) denotes the Laplace operator). The vector \(\zeta\) contains all the signals that are available for measurement; for example, the vector \(\zeta\) could contain the acceleration signals \(\ddot{x}\) as well as the input excitation forces \(f\), which are

[FIGURE 3.1 Block diagram of the Volterra–Wiener neural network: the measured signal vector ζ drives the linear MIMO filter H(s), whose output ξ feeds the linear-in-the-weights neural network Wᵀφ(·) to produce the output y.]

assumed to be available for measurement. The linear-in-the-weights neural network is described as follows:

\[ y = W^T \phi(\xi) \qquad (3.41) \]

where \(y\) is the output of the neural network, \(W\) denotes the matrix of the synaptic weights of the neural network, and \(\phi\) is a vector of the nonlinear activation functions of the neural network. Here \(y\) denotes the estimate of the vector \(\ddot{x}(t+1)\) at the next time sample in the case where accelerations are estimated, or the estimate of the displacement vector \(x(t+1)\) at the next time sample in the case where displacements are estimated. It is noted that in the VWNN (3.40) and (3.41), the "learning" capabilities are due to the synaptic weights \(W\), which are adjusted based on certain input-output measurements. All other parameters, that is, the linear filters \(H_i(s)\) and the nonlinear activation functions \(\phi(\xi)\), are fixed and not adjusted during training. Note that from Equation (3.40) we know that the linear MIMO system used in the VWNN dynamics consists of a cascade of linear stable filters \(H_i(s)\), where the output of each filter is fed as input to the next filter.

Although different parameter estimation algorithms exist that can be used for the adjustment of \(W\), we will use a normalized gradient adaptive law with projection (Ioannou and Datta, 1991). Such an adaptive law keeps the parameter estimates bounded regardless of the boundedness properties of the signals \(x\), \(\dot{x}\), \(\ddot{x}\), and \(f\). The adaptive law is summarized as follows:

Estimation model:
\[ \hat{y} = W^T \phi(\xi) \qquad (3.42) \]

Normalized estimation error:
\[ \epsilon = (y - \hat{y})/\eta^2, \qquad \eta^2 = 1 + \phi^T \phi \qquad (3.43) \]

Adaptive law:
\[ \dot{W} = \begin{cases} \gamma\, \phi\, \epsilon^T, & \text{if } |W| < M, \text{ or if } |W| = M \text{ and } \operatorname{tr}\{(\gamma \phi \epsilon^T)^T W\} \le 0 \\ \left( I - \dfrac{W W^T}{W^T W} \right) \gamma\, \phi\, \epsilon^T, & \text{otherwise} \end{cases} \qquad (3.44) \]

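The following is a minimal discrete-time sketch of one step of the law (3.42)–(3.44); the learning gain, bound, and the use of the Frobenius norm for \(|W|\) are assumptions made for illustration.

import numpy as np

def vwnn_step(W, phi, y, gamma=0.1, M=50.0):
    """One discrete step of the normalized gradient law with projection."""
    y_hat = W.T @ phi                          # estimation model (3.42)
    eps = (y - y_hat) / (1.0 + phi @ phi)      # normalized error (3.43)
    dW = gamma * np.outer(phi, eps)            # unconstrained gradient direction
    normW = np.linalg.norm(W)
    if normW < M or np.sum(dW * W) <= 0.0:     # inside the ball, or moving inward
        return W + dW
    # On the boundary and moving outward: remove the radial component so
    # the weights stay on the ball ||W|| = M (projection).
    return W + dW - (np.sum(dW * W) / normW ** 2) * W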
The scalars \(\gamma\) and \(M\) are positive design constants: \(\gamma\) is the adaptive gain, and \(M\) is a large positive constant bounding \(W\) such that \(|W| < M\). The adaptive law (3.42)–(3.44) guarantees the following properties:

1. The parameter matrix \(W(t)\) remains bounded for all \(t\), provided that \(|W(0)| < M\).
2. The normalized estimation error converges to a residual set whose radius can be made arbitrarily small by increasing accordingly the dimensions of \(\xi\) and the regressor vector \(\phi\).

The vector \(y\) corresponds to either \(\ddot{x}(t+1)\) or \(x(t+1)\) depending on whether we estimate accelerations or displacements. In the case where displacements are estimated, we assume that during training the adaptive algorithm is provided with the actual node displacements.

The role of the VWNN filter \(H(s)\) can be understood by considering the discrete-time analog of the estimation process. Because the structure is a dynamic system, the future value of any of its states depends not only on the current value of the states and inputs but also on their past values; that is, the structure dynamics possess "memory." Therefore, a discrete-time estimation scheme should use not only the current values \(\ddot{x}(t)\) and \(f(t)\) as inputs but also their past values \(\ddot{x}(t-1), f(t-1), \ldots, \ddot{x}(t-p), f(t-p)\), where \(p\) denotes the memory of the estimator. The continuous-time analog of the memory in the VWNN estimator is the cascade of stable linear filters, where the output of each filter is fed to the next one. The output of the first filter can be thought of as the analog of \(\ddot{x}(t-1), f(t-1)\) in discrete time; the output of the second filter is the analog of \(\ddot{x}(t-2), f(t-2)\); and so on.

There are two design issues for the filters \(H_1(s), \ldots, H_p(s)\). The first is the choice of each filter. Although there are many different approaches that can be used, the simplest one is to choose the \(H_i(s)\) to be low-pass first-order filters of the form \(H_i(s) = 1/(s + \alpha)\), where \(\alpha\) is a positive design parameter. The cut-off frequency of these filters (alternatively, the choice of \(\alpha\)) must be such that the filter "passes" all the signal energy; that is, the cut-off frequency must be large enough so that there is no loss of information during filtering of the input signal.

The second issue is the number \(p\) of filters used (i.e., the memory of the estimator). The value of \(p\) should be large enough to ensure that the memory of the estimator is larger than the memory of the actual system; that is, if we obtain a state representation of Equation (3.40), the dimension of this representation should be greater than or equal to the dimension of a state space representation of the system. On the other hand, for practical reasons, we want to keep \(p\) as small as possible. A good "rule of thumb" is to choose \(p\) according to this dimension requirement; the number \(p\) obtained in this manner is then further increased or decreased, by trial and error, until a good estimator is obtained. A filter-bank sketch is given after this paragraph.

The role of the regressor vector \(\phi\) is to capture the nonlinear characteristics of the structure dynamics. There are no general design methodologies for

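The following is a minimal sketch of the filter cascade of Equation (3.40) with first-order low-pass filters \(H_i(s) = 1/(s + \alpha)\), discretized by forward Euler; the values of \(\alpha\), \(p\), and the step size are illustrative design assumptions.

import numpy as np

def vwnn_filter_bank(zeta, dt=1e-3, alpha=20.0, p=3):
    """zeta: (T, m) array of measured signals.
    Returns xi of shape (T, (p+1)*m): the stacked input and p filter outputs."""
    T, m = zeta.shape
    chi = np.zeros((p + 1, m))        # chi[0] = zeta(t); chi[i] = output of H_i
    xi = np.zeros((T, (p + 1) * m))
    for t in range(T):
        chi[0] = zeta[t]
        for i in range(1, p + 1):     # Euler step of chi_i_dot = -alpha*chi_i + chi_{i-1}
            chi[i] += dt * (-alpha * chi[i] + chi[i - 1])
        xi[t] = chi.reshape(-1)       # xi = [chi_1^T, ..., chi_{p+1}^T]^T
    return xi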
choosing the regressor terms for complicated high-dimensional systems such as MDOF structures. A good "rule of thumb" is letting the regressor vector \(\phi\) be formed as the output of a high-order neural network (HONN). The output of a HONN is a combination of nonlinear transformations of its input signals: first the input signal \(\xi\) is passed through a sigmoidal; the resulting signals form the first entries of \(\phi\) (which are called first-order terms). If we need more approximation power, we augment \(\phi\) by adding all the possible products between two entries of \(\phi\) (this is similar to a multidimensional second-order Taylor expansion); the new terms added are referred to as second-order terms. If further approximation power is needed, we augment \(\phi\) by including third-order terms, fourth-order terms, and so on. The reason we use a sigmoidal is to make sure that all regressor signals are bounded and normalized; the sigmoidal should not be very steep, to make sure that it does not get saturated for all possible values of the input variables. Details on the formulation, theoretical analysis, and approximation properties of the HONN can be found in Kosmatopoulos et al. (1995).

3.3 Examples and Case Studies

In this section we apply the identification methodologies to several case studies and we present experimental results from the following systems: (1) the Vincent Thomas Bridge; (2) steel and concrete nonlinear joints; (3) nonlinear viscous dampers. We also present representative simulation results from various nonlinear systems, illustrating the applicability of the developed techniques.

3.3.1 Modeling of the Vincent Thomas Bridge Using Earthquake Response Measurements

The Vincent Thomas Bridge in the Los Angeles metropolitan area is a critical artery for commercial traffic flow in and out of the Los Angeles harbor, and is at risk in the seismically active southern California region, particularly because it straddles the Palos Verdes fault zone. The bridge has been instrumented with 26 accelerometers (16 on the bridge structure, and 10 at various locations on its base). A combination of linear and nonlinear system identification techniques was used in the work of Smyth et al. (2003) to obtain a complete reduced-order, equivalent linear, multi-input multi-output (MIMO) dynamic model of the bridge based on the dynamic response of the structure to the 1987 Whittier and 1994 Northridge earthquakes. Starting with the available acceleration measurements, the methodology of Section 3.2 is applied to the data set to develop a reduced-order, equivalent linear, MDOF model. The linear system identification method is combined with a nonparametric identification technique, as presented in Section 3.2.4, to generate a reduced-order nonlinear mathematical model suitable for use in subsequent studies to predict, with good fidelity, the total response of the bridge under arbitrary dynamic environments.

Results of this study yield measurements of the equivalent linear modal properties (frequencies, mode shapes, and nonproportional damping) as well as quantitative measures of the extent and nature of nonlinear interaction forces arising from strong ground shaking. It is shown that, for the particular subset of observations used in the identification procedure, the apparent nonlinearities in the system restoring forces are quite significant, and they contribute substantially to the improved fidelity of the model. The study also shows the potential of the presented identification techniques to detect slight changes in the structure's influence coefficients, which may be indicators of damage and degradation in the structure being monitored.

Figure 3.2 shows a set of representative time-domain plots of the nonlinear residual fitting. The representative results are from a single accelerometer station (station 7, located on a side span of the bridge and measuring lateral acceleration). The top plot shows the measured acceleration history. The second plot shows the linear, time-invariant model estimate. The third plot shows the residual (i.e., the difference of the previous two signals). In the fourth plot the nonparametrically modeled residual is given, and finally at the bottom the remaining total error is shown. For ease of comparison, identical scales are used for all plots.

3.3.2 Online Identification of Hysteretic Joints from Experimental Measurements

In this section we present experimental results, using the techniques of Section 3.2.3 for identifying the parametric nonlinear part of two systems: (1) a full-scale structural steel subassembly and (2) a structural reinforced concrete subassembly.

3.3.2.1 Identification Results for Full-Scale Structural Steel Subassembly

The experiments were conducted by means of a full-scale structural steel subassembly, made of ASTM A36 steel and consisting of a W16X40 wide-flange beam framing into an 11-inch square box column. Because the behavior of the column wall has an important effect on the overall behavior of the connection, an axial load was applied to the column to simulate the dead and live loads in an actual building column. Hydraulic actuators were used to impose the vertical loads as well as the induced moment at the connection. The applied tip loads and beam displacements were monitored by force and displacement sensors. The experimental measurements were processed to extract the value of the applied moment and the corresponding joint rotation, which were subsequently used to develop the hysteretic characteristics of the connection.

Following the development of Section 3.2.3, the model of Equation (3.16) was assumed to represent the nonlinear hysteretic behavior of the system. A value of N = 3 was chosen for the number of terms in the sum of Equation (3.18). Figure 3.3(a) shows the phase-plane plots (restoring force versus displacement) of the measured (solid) and identified (dashed) restoring forces. The agreement is seen to be extremely good.

[FIGURE 3.2 Sample nonlinear identification result from the hybrid identification of a multi-input multi-output nonlinear model of the Vincent Thomas Bridge under earthquake excitation. Five stacked acceleration time histories (cm/sec², 0 to 80 sec): measured acceleration; equivalent linear estimate; nonlinear residual; nonparametric model estimate of the nonlinear residual; error in fitting of the nonlinear residual.]

Figure 3.3(b) shows the convergence of the (2N + 1) parameter clusters of vector θ. One can see that the system is degrading, as is evident from the evolution of the parameter θ₀. The term θ₀, which basically represents the stiffness of the system, can be seen to be steadily decreasing in Figure 3.3(b); this decrease in stiffness accounts for the slight clockwise rotation of the hysteretic loops in Figure 3.3(a).

: z ____ : z Modeling and Control of Complex Systems 0. Details of the test article and a photograph of the fabricated specimen and test apparatus are available in the work of Masri et al.2 .120 1 ˆ -----.4 0. Again. multistory frame joint prototype.16) was assumed to represent the nonlinear behavior of the system.4(a) shows the phase plots of the measured concrete restoring force (solid curve) and its estimate (dashed curve).3 Adaptive identification of structural steel subassembly undergoing cyclic testing.2 Identification Results for a Structural Reinforced Concrete Subassembly The concrete specimen was a one-third scale model of a reinforced concrete. exhibits dead-space nonlinearities . (1994).8 0. Figure 3. in addition to its hysteretic characteristics. The concrete specimen was tested by means of a servohydraulic device which imposed a prescribed dynamic motion at the specimen boundary.2.6 0. 3.6 −0.4 −0.2 −0.6 −0. following the development of Section 3.8 1 FIGURE 3.6 0. (b) evolution of the parameter estimates.8 −1 −1 −0.3. (a) Phase-plane plot of restoring force prediction (dashed) compared to measured force (solid). It is seen that the system. r(x. the model of Equation (3.2.4 0.3.4 −0.2 0.2 0 x (a) 0. x) 0 −0.8 −0.

[FIGURE 3.3 (Continued) (b) Evolution of the parameter estimates θ₀ through θ₆ over 1000 samples.]

Figure 3.4(b) shows the evolution of the estimated parameters. It is seen that the identified model approximates very accurately the characteristics of the structure, even though the restoring force incorporates features associated with highly nonlinear behavior exhibited by hysteretic as well as dead-space-type nonlinearities.

3.3.3 Models of Nonlinear Viscous Dampers from Experimental Measurements

Nonlinear viscous dampers are increasingly incorporated into retrofit and new design strategies of large civil structures to dissipate energy from strong dynamic loads, such as those resulting from wind and seismic activity. An experimental data set from a full-scale nonlinear viscous damper under dynamic load was used to develop different types of parametric and nonparametric models, as presented in Section 3.2. Assuming a parametric model commonly used in the design of nonlinear viscous dampers (called the simple design model [SDM] here), an adaptive least-squares identification approach was used to identify the model's parameters.

[FIGURE 3.4 Adaptive identification of structural reinforced concrete subassembly undergoing cyclic testing (least-squares with forgetting factor). (a) Phase-plane plot of restoring force prediction versus exact measured force. (b) Evolution of the estimated parameters θ₀ through θ₆ over 300 samples.]

The results of the parametric modeling are shown in Figure 3.5 (a, d), whereas Figure 3.5 (b, e) shows the results using the nonparametric restoring force method presented in Sections 3.2.4.1 to 3.2.4.3, and Figure 3.5 (c, f) shows the results using nonparametric neural networks as discussed in Section 3.2.4.4. The phase plots show approximately a one-cycle period of the damper response (solid line for measured, and dashed for identified forces). In the figure, the first row of plots shows the relationship between displacement and force, and the second row shows the relationship between velocity and force, for each investigated identification method. Details regarding this study are available in the work of Yun et al. (2007).

3.3.4 Further Examples of Parametric Identification of MDOF Systems

In this section we present simulation results from a three-DOF structure. The system is modeled as a three-story building, consisting of three masses connected in a chain-like topology (i.e., the structure support is connected to mass 1, mass 1 is connected to mass 2, and mass 2 is connected to mass 3). No other connections are present in the model. The three elements connecting the three masses are assumed to have unknown restoring force characteristics. Because the restoring forces are unknown, the more general model of Equation (3.18) was used. The model was developed to identify hysteretic elements, but it can also identify, as special cases, linear elements and polynomial nonlinearities.

Results of the application of the online parametric identification method of Section 3.2.3 are shown in Figure 3.6. The left-hand panels in Figure 3.6 correspond to the three phase diagrams in which each element restoring force is plotted versus the corresponding interstory displacement. The right-hand panels show the evolution of the θ parameters corresponding to each of the three elements. The model correctly identified all three elements: the top (third) element was identified as a polynomial-type nonlinearity (damped Duffing oscillator), the middle element was identified as a linear spring-damper connection, and the bottom (first story) connection was identified as a hysteretic element.

Figure 3.7 shows an important illustration of the application of the parametric identification approach in the structural health monitoring field. Synthetic data were generated to simulate a situation in which a nonlinear SDOF system had its stiffness suddenly reduced from a value of 5 to 3, thus simulating an abrupt damage to the system. It is clear from the phase-domain plot on the left-hand side of Figure 3.7, as well as the time-history plots of the evolution of the system parameters, that the online monitoring approach can accurately detect the incipient damage state, as well as track its changing magnitude.

[FIGURE 3.5 Sample identification results of nonlinear viscous damper models for a representative experimental data set; force (kN) versus displacement (mm) in the top row and versus velocity (mm/sec) in the bottom row. (a, d) Identification using the parametric simple design model (SDM). (b, e) Identification using the nonparametric restoring force method (RFM). (c, f) Identification using nonparametric artificial neural networks (ANN).]

[FIGURE 3.6 Parametric identification of three elements in a nonlinear three-degree-of-freedom system: restoring force versus interstory displacement phase plots for Element #3 (Duffing nonlinearity), Element #2 (Bouc–Wen nonlinearity), and Element #1 (linear).]

3.3.5 Nonparametric Identification through Volterra–Wiener Neural Networks

In this section the Volterra–Wiener neural network of Section 3.2.4.4 is used to identify the restoring forces of a three-DOF system, similar to the one of Section 3.3.4, representing a three-story building. Using a wideband random signal as the base excitation, the system was simulated for 40 sec. The neural estimator was also running for the entire 40-sec duration, and the network weights were allowed to adapt during this period. Figure 3.8 presents plots of the restoring forces (solid curves) and their estimates (dashed curves) produced by the neural estimator. The neural network weights are initially set to small random values.

[FIGURE 3.6 (Continued) Evolution of the θ parameter estimates for the three elements over 20 sec.]

The adaptation is on from time t = 0, and it is seen from the figures that it takes about 15 sec (a few response cycles) for the network weights to adapt and to estimate the restoring forces exactly. It is seen that the agreement is excellent.

Next, in order to validate the approximation capabilities of the VWNN, the weights of the neural network are fixed to the values already obtained, and a different base excitation is used. Figure 3.9 presents the restoring forces (solid curves) and their estimates (dashed curves) produced by the fixed VWNN. The agreement remains excellent, although the network has never been trained on the responses to the specific base excitation.

[FIGURE 3.7 Detection of change in nonlinear element when, at time t = 5 sec, the stiffness changes abruptly from 5 to 3. Top: interstory displacement for Element #1 and identification of Element #1 (actual versus estimated restoring force). Bottom: θ parameters for Element #1, with θ₀ tracking the stiffness shift from 5 to 3 at t = 5 sec.]

[FIGURE 3.8 Nonparametric identification using the VWNN of three elements in a nonlinear three-degree-of-freedom system: time histories of the restoring forces and their estimates for the 3rd, 2nd, and 1st stories, 0 to 45 sec.]

3.4 Conclusions

The great variety of challenging situations encountered in the modeling of realistic structural dynamic systems for purposes of simulation, active control, or structural health monitoring applications requires the availability of a toolkit of methods and approaches for developing robust mathematical models of varying levels of sophistication and format, to capture the underlying complex physical phenomena embedded within the structural systems of interest, spanning the range from micro-electro-mechanical (MEMS) devices, to aerospace structures, to dispersed civil infrastructure systems. This chapter provides a state-of-the-art approach, incorporating parametric as well as nonparametric system identification methods, for developing parsimonious nonlinear (i.e., not-necessarily-linear) models of arbitrary structural systems. The models can be used in a variety of applications. A wide variety of case studies is provided to illustrate the use of the modeling tools for online or off-line identification situations, using experimental measurements and simulation results, to represent many challenging types of stationary as well as nonstationary nonlinearities such as polynomial-type, hysteresis, limited-slip, and so on.

[FIGURE 3.9 Time history of restoring forces and their estimates after training using the VWNN, but with a different base excitation: 3rd, 2nd, and 1st stories, 0 to 45 sec.]

References

Baber, T. T. and Wen, Y. K. (1982). "Stochastic Response of Multistory Yielding Frames," Earthquake Eng. Struct. Dyn. 10, 403–416.
Beck, J. L. (1990). "Statistical System Identification of Structures," Proc. 5th Int. Conf. Structural Safety and Reliability, ASCE, New York.
Beck, J. L. and Jennings, P. C. (1980). "Structural Identification Using Linear Models and Earthquake Records," Earthquake Eng. Struct. Dyn. 8, 145–160.
Bouc, R. (1967). "Forced Vibration of Mechanical Systems with Hysteresis," Abstract, Proc. 4th Conf. Nonlinear Oscillation, Prague, Czechoslovakia.
Caughey, T. K. (1960). "Random Excitation of a System with Bilinear Hysteresis," J. Appl. Mech., Trans. ASME 27, 649–652.

Chassiakos, A. G. and Masri, S. F. (1991). "Identification of the Internal Forces of Structural Systems Using Feedforward Multilayer Networks," Computing Systems Eng. 2(1), 100–110.
Chassiakos, A. G., Masri, S. F., Smyth, A. W., and Caughey, T. K. (1998). "On-Line Identification of Hysteretic Systems," Trans. ASME J. Appl. Mech. 65(1), 194–203.
Housner, G. W., Bergman, L. A., Caughey, T. K., Chassiakos, A. G., Claus, R. O., Masri, S. F., Skelton, R. E., Soong, T. T., Spencer, B. F., and Yao, J. T. P. (1997). "Structural Control: Past, Present, and Future," ASCE J. Eng. Mech. 123(9), 897–971.
Ioannou, P. A. and Datta, A. (1991). "Robust Adaptive Control: A Unified Approach," Proc. IEEE 79, 1736–1768.
Kerschen, G., Worden, K., Vakakis, A. F., and Golinval, J.-C. (2006). "Past, Present and Future of Nonlinear System Identification in Structural Dynamics," Mech. Systems Signal Proc. 20(3), 505–592.
Kosmatopoulos, E. B., Polycarpou, M. M., Christodoulou, M. A., and Ioannou, P. A. (1995). "High-Order Neural Network Structures for Identification of Dynamical Systems," IEEE Trans. Neural Networks 6(2), 422–431.
Kosmatopoulos, E. B., Smyth, A. W., Masri, S. F., and Chassiakos, A. G. (2001). "Robust Adaptive Neural Estimation of Restoring Forces in Nonlinear Structures," Trans. ASME J. Appl. Mech. 68(6), 880–893.
Masri, S. F. and Caughey, T. K. (1979). "A Nonparametric Identification Technique for Nonlinear Dynamic Problems," Trans. ASME J. Appl. Mech. 46(2), 433–447.
Masri, S. F., Miller, R. K., Saud, A. F., and Caughey, T. K. (1987a). "Identification of Nonlinear Vibrating Structures. I: Formulation," Trans. ASME J. Appl. Mech. 109, 918–922.
Masri, S. F., Miller, R. K., Saud, A. F., and Caughey, T. K. (1987b). "Identification of Nonlinear Vibrating Structures. II: Applications," Trans. ASME J. Appl. Mech. 109, 923–929.
Masri, S. F., Chassiakos, A. G., and Caughey, T. K. (1992). "Structure-Unknown Nonlinear Dynamic Systems: Identification through Neural Networks," Smart Materials & Structures 1(1), 45–56.
Masri, S. F., Chassiakos, A. G., and Caughey, T. K. (1993). "Identification of Nonlinear Dynamic Systems Using Neural Networks," Trans. ASME J. Appl. Mech. 60, 123–133.
Masri, S. F., Agbabian, M. S., Abdel-Ghaffar, A. M., Highazy, M., Claus, R. O., and de Vries, M. (1994). "An Experimental Study of Embedded Fiber-Optic Strain Gauges in Concrete Structures," ASCE J. Eng. Mech. 120(8), 1696–1717.
Masri, S. F., Caffrey, J. P., Caughey, T. K., Smyth, A. W., and Chassiakos, A. G. (2006). "Data-Based Model-Free Representation of Complex Hysteretic MDOF Systems," Struct. Control Health Monitoring 13(1), 365–387.
Smyth, A. W., Pei, J.-S., and Masri, S. F. (2003). "System Identification of the Vincent Thomas Suspension Bridge Using Earthquakes Records," Earthquake Eng. Struct. Dyn. 33, 339–367.
Wen, Y. K. (1989). "Methods of Random Vibration for Inelastic Structures," Appl. Mech. Rev. 42(2), 39–52.
Yun, H.-B., Tasbihgoo, F., Masri, S. F., Caffrey, J. P., Wolfe, R. W., Makris, N., and Black, C. (2007). "Comparison of Modeling Approaches for Full-Scale Nonlinear Viscous Dampers," J. Vibration Control (in press).

4
Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems

Asma Al-Tamimi, Murad Abu-Khalaf, and Frank L. Lewis

CONTENTS
4.1 Introduction .......................................................... 131
4.2 Discrete-Time State Feedback Control for Zero-Sum Games ............ 132
4.3 Heuristic Dynamic Programming (HDP) ................................ 137
    4.3.1 Derivation of HDP for Zero-Sum Games ........................ 137
    4.3.2 Online Implementation of HDP Algorithm ...................... 139
    4.3.3 Convergence of Zero-Sum Game HDP ............................ 140
4.4 Action-Dependent Heuristic Dynamic Programming (ADHDP): Q-Learning ... 142
    4.4.1 Derivation of Model-Free Online Tuning Based on the Q-Learning Algorithm (ADHDP) ... 143
    4.4.2 Online Implementation of Q-Learning Algorithm ............... 146
    4.4.3 Convergence of Zero-Sum Game Q-Learning ..................... 148
4.5 Online ADP H∞ Autopilot Controller Design for an F-16 Aircraft ..... 150
    4.5.1 HDP-Based H∞ Autopilot Controller Design .................... 151
    4.5.2 Q-Learning-Based H∞ Autopilot Controller Design ............. 153
4.6 Conclusion ........................................................... 156
References ............................................................... 158

4.1 Introduction

In this chapter the design of optimal controllers for the discrete-time linear quadratic zero-sum games that appear in the H∞ optimal control problem is addressed. The method used to obtain the optimal controller is the approximate dynamic programming (ADP) technique.

In this chapter two methods are presented to obtain the optimal controller: heuristic dynamic programming (HDP) and action-dependent heuristic dynamic programming (ADHDP), also known as Q-learning; both yield online algorithms. The first algorithm, HDP, is a method to find the optimal controller forward in time; in this algorithm the system model is needed. The second algorithm, ADHDP or Q-learning, is an improvement on the first one, as the system model is not needed. This leads to a model-free optimal controller design, which is in fact an adaptive control design that converges to the optimal H∞ solution. To our knowledge, Q-learning provides the first direct adaptive control technique that converges to an H∞ controller.

ADP was proposed by Werbos [14], Barto et al. [1], Widrow et al. [24], Howard [15], Watkins [12], Bertsekas and Tsitsiklis [21], Prokhorov and Wunsch [20], and others to solve optimal control problems forward in time. This overcomes the computational complexity associated with dynamic programming, which is an off-line technique that requires a backward-in-time solution procedure [11]. In these works, the policies for each of the two players, control and disturbance, and the value function are modeled as parametric structures, neural networks: an action network and a critic network. This is combined with incremental optimization, such as reinforcement learning, to tune and improve both networks forward in time; these are ADP algorithms that create agents that learn to coexist, and hence can be implemented in actual control systems. Werbos [16] classified ADP approaches into four main schemes: heuristic dynamic programming (HDP), dual heuristic dynamic programming (DHP), action-dependent heuristic dynamic programming (ADHDP), and action-dependent dual heuristic dynamic programming (ADDHP). Hagen and Krose [3], Bradtke et al. [6], and Landelius [9] applied ADP techniques to the discrete-time linear quadratic optimal control problem; the connection with algebraic Riccati equations was emphasized in Reference [9]. The current status on ADP is given in Reference [2].

The organization of this chapter is as follows. In Section 4.2, the solution of the zero-sum game of a linear discrete-time system with quadratic cost is derived under a state feedback information structure. In Section 4.3, an HDP algorithm is proposed to solve the zero-sum game forward in time. Section 4.4 extends the results to the ADHDP case. We present a technique for online implementation as well as, for the first time, convergence proofs of ADP methods for H∞ discrete-time control. Finally, an H∞ control example for an F-16 aircraft autopilot design example is given to show the practical effectiveness of the two ADP techniques.

4.2 Discrete-Time State Feedback Control for Zero-Sum Games

In this section, zero-sum games for a discrete-time linear system with quadratic infinite-horizon cost are revisited. The policies for each of the two players, control and disturbance, are derived with the associated Riccati equation. Specific forms for both the Riccati equation and the control and disturbance policies are derived that are required for applications in ADP.

Consider the following discrete-time linear system:

\[ x_{k+1} = A x_k + B u_k + E w_k, \qquad y_k = x_k \qquad (4.1) \]

where \(x \in \mathbb{R}^n\), \(y \in \mathbb{R}^p\), \(u_k \in \mathbb{R}^{m_1}\) is the control input, and \(w_k \in \mathbb{R}^{m_2}\) is the disturbance input. Consider the infinite-horizon value function:

\[ V(x_k) = \sum_{i=k}^{\infty} \left( x_i^T R x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right) \qquad (4.2) \]

In the H-infinity control problem, \(\gamma\) is the desired \(L_2\) gain for disturbance attenuation. It is desired to find the optimal control \(u_k^*\) for the worst-case disturbance \(w_k^*\), in which the infinite-horizon cost is to be minimized by player 1, \(u_k\), and maximized by player 2, \(w_k\), for a prescribed fixed value of \(\gamma\). Here the class of strictly feedback stabilizing policies is considered.

For any stabilizing sequence of policies \(u_k\) and \(w_k\), one can write the infinite-horizon cost-to-go as:

\[ V(x_k) = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + \sum_{i=k+1}^{\infty} \left( x_i^T R x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right) = r(x_k, u_k, w_k) + V(x_{k+1}). \qquad (4.3) \]

For any stabilizing policies, the cost function is a finite quadratic function of the states and can be written as:

\[ V(x_k) = x_k^T M x_k \qquad (4.4) \]

for some symmetric positive semidefinite matrix \(M\). Therefore, Equation (4.3) can be written as:

\[ x_k^T M x_k = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T M x_{k+1}. \qquad (4.5) \]

Using the dynamic programming principle, the optimization problem in Equations (4.1) and (4.2) can be written as:

\[ V^*(x) = \min_{u} \max_{w} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right) = \max_{w} \min_{u} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right). \]

If we assume that there exists a solution to the game algebraic Riccati equation (GARE) as in Equation (4.9) that is strictly feedback stabilizing, then it is known that the policies are in saddle-point equilibrium; that is, minimax is equal to maximin in the restricted class of feedback stabilizing policies under

which \(x_k \rightarrow 0\) as \(k \rightarrow \infty\) for all \(x_0 \in \mathbb{R}^n\). Assuming that the game has a value and is solvable, in order to have a unique feedback saddle point in the class of strictly feedback stabilizing policies, the inequalities in Equations (4.6) and (4.7) should be satisfied:

\[ I - \gamma^{-2} E^T P E > 0 \qquad (4.6) \]
\[ I + B^T P B > 0 \qquad (4.7) \]

where \(P \ge 0\) is such that

\[ V^*(x_k) = x_k^T P x_k \qquad (4.8) \]

and satisfies the GARE [5, 17], which is given as:

\[ P = A^T P A + R - \left[ A^T P B \;\; A^T P E \right] \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \qquad (4.9) \]

Substituting Equation (4.8) in Equation (4.5) and applying the Bellman optimality principle, one has:

\[ V^*(x_k) = \min_{u} \max_{w} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right) = \min_{u} \max_{w} \left( x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T P x_{k+1} \right). \qquad (4.10) \]

This can be rewritten as:

\[ x_k^T P x_k = \min_{u} \max_{w} \left[ x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + (A x_k + B u_k + E w_k)^T P (A x_k + B u_k + E w_k) \right]. \qquad (4.11) \]

The optimal controller can be derived from Equation (4.11) by satisfying the first necessary condition, and is given as:

\[ u_k^* = \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1} \left( B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P A - B^T P A \right) x_k \qquad (4.12) \]

so the optimal control is a state feedback with gain:

\[ L = \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1} \left( B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P A - B^T P A \right). \qquad (4.13) \]

The worst-case disturbance is:

\[ w_k^* = \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1} \left( E^T P B (I + B^T P B)^{-1} B^T P A - E^T P A \right) x_k \qquad (4.14) \]

so the optimal disturbance is a state feedback with gain:

\[ K = \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1} \left( E^T P B (I + B^T P B)^{-1} B^T P A - E^T P A \right). \qquad (4.15) \]

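As a minimal sketch, the following computes the saddle-point gains of Equations (4.13) and (4.15) from given system matrices and a GARE solution P; all matrices are assumed given, and the inverses exist when conditions (4.6) and (4.7) hold.

import numpy as np

def saddle_point_gains(A, B, E, P, gamma):
    """Gains L and K of Equations (4.13) and (4.15): u_k = L x_k, w_k = K x_k."""
    m1, m2 = B.shape[1], E.shape[1]
    S = E.T @ P @ E - gamma ** 2 * np.eye(m2)       # E^T P E - gamma^2 I
    L = np.linalg.solve(
        np.eye(m1) + B.T @ P @ B - B.T @ P @ E @ np.linalg.solve(S, E.T @ P @ B),
        B.T @ P @ E @ np.linalg.solve(S, E.T @ P @ A) - B.T @ P @ A)
    K = np.linalg.solve(
        S - E.T @ P @ B @ np.linalg.solve(np.eye(m1) + B.T @ P @ B, B.T @ P @ E),
        E.T @ P @ B @ np.linalg.solve(np.eye(m1) + B.T @ P @ B, B.T @ P @ A) - E.T @ P @ A)
    return L, K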
. the value T function of the game V ∗ (xk ) = xk P xk satisfies a certain Riccati equation.14) exist due to Equations (4. in Equations (4. which was derived under full information structure. Moreover. and disturbance K . Note that Equation (4.16) This is equivalent to: P = R + L T L − γ 2 K T K + ( A + B L + E K ) T P( A + B L + E K ) T = R + L T L − γ 2 K T K + Acl PAcl (4. LEMMA 1 If ( I − γ −2 E T PE) is invertible.17) is the closed-loop Riccati equation.6) and (4. PROOF Apply the matrix inversion lemma. I − γ −2 E E T P is invertible and I − γ −2 EET P > 0. The form of the Riccati equation derived in this chapter is similar to the one appearing in Reference [5]. then (I − γ −2 E E T P) is also invertible. PROOF Because ( I − γ −2 E T PE) is invertible then the following expression I + γ −2 E( I − γ −2 E T PE) −1 E T P. respectively. LEMMA 2 The optimal policies for control L.17) where Acl = A+ B L + E K . are equivalent to the ones that appear in Reference [7] and [8]: L = −B T P( I + B B T P − γ 2 EET P) −1 ) A K = γ −2 E T P( I + B B T P − γ 2 EET P) −1 ) A. (4. under state feedback information structure. 8] derived under the same state feedback information structure. it can be shown that: I + γ −2 E( I − γ −2 E T PE) −1 E T P = ( I − γ −2 E E T P) −1 . it will be shown that the Riccati equation derived in this chapter is equivalent to the work in References [7.7). is valid: Applying the matrix inversion lemma. Next it is shown that. Equation (4.13) and (4.Model-Free Adaptive Dynamic Programming Algorithms 135 Note that the inverse matrices in Equations (4. Hence.15).11) can be rewritten as follows: T T ∗T ∗ xk P xk = xk Rxk + u∗T u∗ − γ 2 wk wk k k ∗ + Axk + Bu∗ + Ewk k T ∗ P Axk + Bu∗ k + Ewk .13) and (4.

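Lemma 2 is a pure matrix identity, so it can be spot-checked numerically. The following self-contained sketch builds the stacked gains via Equation (4.23) below, \([L; K] = -G^{-1}[B^T P A;\ E^T P A]\), and compares them against the closed forms of Lemma 2; all matrices are hypothetical random data.

import numpy as np

rng = np.random.default_rng(2)
n, m1, m2, gamma = 4, 2, 1, 5.0
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m1))
E = rng.standard_normal((n, m2))
Ms = rng.standard_normal((n, n)); P = Ms @ Ms.T          # some P >= 0

# Stacked gains from the block-matrix form (Equation (4.23))
G = np.block([[np.eye(m1) + B.T @ P @ B, B.T @ P @ E],
              [E.T @ P @ B, E.T @ P @ E - gamma ** 2 * np.eye(m2)]])
LK = -np.linalg.solve(G, np.vstack([B.T @ P @ A, E.T @ P @ A]))
L, K = LK[:m1], LK[m1:]

# Closed forms of Lemma 2
core = np.linalg.inv(np.eye(n) + B @ B.T @ P - gamma ** -2 * E @ E.T @ P)
print(np.allclose(L, -B.T @ P @ core @ A),
      np.allclose(K, gamma ** -2 * E.T @ P @ core @ A))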
LEMMA 3
Substituting the policies, Equations (4.13) and (4.15), in Equation (4.17), one can obtain the Riccati equation that appears in Reference [5], given by:

\[ P = A^T P A + R - \left[ A^T P B \;\; A^T P E \right] \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \qquad (4.18) \]

PROOF The control policy and the disturbance policy can be written as follows:

\[ L = D_{11}^{-1} \left( A_{12} A_{22}^{-1} E^T P A - B^T P A \right) \qquad (4.19) \]
\[ K = D_{22}^{-1} \left( A_{21} A_{11}^{-1} B^T P A - E^T P A \right) \qquad (4.20) \]

where \(D_{11}^{-1} = \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1}\), \(A_{11} = I + B^T P B\), \(A_{12} = B^T P E\), \(A_{21} = E^T P B\), \(A_{22} = E^T P E - \gamma^2 I\), and \(D_{22}^{-1} = \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1}\). From Equations (4.6) and (4.7), one concludes that \(D_{11}\) and \(D_{22}\) are invertible. Equations (4.19) and (4.20) can be written as follows:

\[ \begin{bmatrix} L \\ K \end{bmatrix} = - \begin{bmatrix} D_{11}^{-1} & -D_{11}^{-1} A_{12} A_{22}^{-1} \\ -D_{22}^{-1} A_{21} A_{11}^{-1} & D_{22}^{-1} \end{bmatrix} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \qquad (4.21) \]

It is known that:

\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} D_{11}^{-1} & -D_{11}^{-1} A_{12} A_{22}^{-1} \\ -D_{22}^{-1} A_{21} A_{11}^{-1} & D_{22}^{-1} \end{bmatrix}. \qquad (4.22) \]

Therefore, Equation (4.21) can be written as follows:

\[ \begin{bmatrix} L \\ K \end{bmatrix} = - \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix} = - \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \qquad (4.23) \]

Equation (4.17) can be expanded as follows:

\[ P = (A + B L + E K)^T P (A + B L + E K) + L^T L - \gamma^2 K^T K + R = A^T P A + A^T P B L + A^T P E K + L^T B^T P A + K^T E^T P A + [L^T \; K^T] \begin{bmatrix} B^T P B & B^T P E \\ E^T P B & E^T P E \end{bmatrix} \begin{bmatrix} L \\ K \end{bmatrix} + [L^T \; K^T] \begin{bmatrix} I & 0 \\ 0 & -\gamma^2 I \end{bmatrix} \begin{bmatrix} L \\ K \end{bmatrix} + R. \]

18). which is given as: P = R + AT P( I + ( B B T − EET ) P) −1 A. 8].1). E T PA (4. a parametric structure is used to approximate the cost-to-go function of the current control policy.2. Note that since Vi (x) .24) can be written as: P = AT PA + AT PB AT PE L K + R. one has the desired Riccati equation: P = AT PA + R − [ AT PB AT PE] I + B T PB E T PB B T PE E T PE − γ 2 I −1 B T PA .25) Substituting Equation (4. one finds V1 (x) by solving Equation (4.26) is the same as Equation (4.23). Then the certainty equivalence principle is used to improve the policy of the action network.26) It can be seen that Equation (4. In the HDP.3 Heuristic Dynamic Programming (HDP) In this section. one has: P = AT PA + AT PBL + AT PEK + L T B T PA + K T E T PA − L T × I + B T PB B T PE T T E PB E PE − γ 2 I I + B T PB B T PE T T E PB E PE − γ 2 I −1 137 KT B T PA +R E T PA (4.24) = AT PA + AT PBL + AT PEK + R. Starting with an initial quadratic cost-to-go V0 (x) ≥ 0 that is not necessarily optimal. the HDP algorithm is developed to solve the discrete-time linear system zero-sum game described in Section 4.22) in Equation (4.27) Equation (4.18) is equivalent to the game algebraic Riccati equation (GARE) that appears in References [7. HDP and ADHDP.25). Equation (4. 4. forward in time. uk wk (4. and the value function (4.1 Derivation of HDP for Zero-Sum Games Consider the system (4. In the next sections the solution for the optimal controller will be found using the two ADP algorithms.22) in Equation (4.2).27) is a recurrence relation that is used to solve for the optimal cost-to-go.3.27) with i = 0 according to: T Vi+1 (xk ) = min max xk Rxk + uT u − γ 2 w T w + Vi (xk+1 ) .Model-Free Adaptive Dynamic Programming Algorithms Substituting Equation (4. 4. It is shown in Reference [5] that Equation (4. (4. the game value function.

u(x. are found as: L i = ( I + B T Pi B − B T Pi E( E T Pi E − γ 2 I ) −1 E T Pi B) −1 × ( B T Pi E( E T Pi E − γ 2 I ) −1 E T Pi A − B T Pi A). .32). The parameter structures (4. (4. B. and p = v( P). one then repeats the same process for i = 0. Substituting Equations (4. xn−1 xn . it is shown that Vi (xk ) → V ∗ (xk ) as i → ∞. . L i ) = L iT x.27) and (4. K i = ( E Pi E − γ I − E Pi B( I + B Pi B) T 2 T T −1 (4.34) which can be thought of as the desired target function to which one needs to ˆ fit V(x. .27) by using the certainty equivalence principle.30) B Pi E) T −1 ×( E Pi B( I + B Pi B) T T −1 B Pi A − E Pi A). . policies are found using Vi (x) in Equation (4. The output vector of v(·) is constructed by stacking the columns of the squared matrix into a one-column vector with the off-diagonal elements summed as Pi j + P ji . .. L i and K i of ui (xk ) and wi (xk ). where v(·) is a vector function that acts on n × n matrices and outputs a n(n+1)/2 × 1 column vector. a natural choice of these parametric structures is given as: ˆ ¯ V(x. . (4.29) and (4. pi ) = xk Rxk + (L i xk ) T (L i xk ) − γ 2 ( K i xk ) T ( K i xk ) + piT xk+1 .138 Modeling and Control of Complex Systems is not initially optimal.8).31) (4. (4. . 0. Because Vi (x) is quadratic in the state as given in Equation (4. where V ∗ (xk ) is the optimal value function for the game based on the solution to the GARE (4.14). . These greedy policies are denoted as ui (xk ) and wi (xk ).29) (4.31). x1 xn . pi ). x2 x3 . ˆ w(x. . . . 1. In this section. Therefore Vi+1 (x) is given by: T Vi+1 (xk ) = xk Rxk + uiT (xk )ui (xk ) − γ 2 wiT (xk )wi (xk ) + Vi (xk+1 ). xn ) is the Kronecker product ¯ quadratic polynomial basis vector 0.33) 2 2 2 where x = (x1 . x2 .18). and E. T T Once Vi+1 (x) is found.12) and (4. and the two action networks are linear in the state.28). one has: T ¯ d(xk . pi+1 ) in a least-squares sense to find pi+1 such that: T ¯ pi+1 xk = d(xk . and (4. Note that to update the action networks. . it is necessary to know the plant model matrices A.28) It can be shown that the parameters of the action networks.33) give an exact closed-form representation of the functions in Equations (4.28). pi ) = piT x . K i ) = ˆ K iT x. as shown in Equations (4.32) (4. (4. 2.30) in the right-hand side of Equation (4.

34) and (4. with α.35) can be solved in real time by collecting enough data points generated from d(xk .34). and ε1 positive integers and ε0 ≤ ε1 . pi ) The least-squares (4. pi ) ]T . pi ) in Equation (4. α > α0 . by observing the states online. where ¯ X = [ x |xk−N−1 x |xk−N−2 ¯ ··· x |xk−1 ] ¯ · · · d(xk−1 .3. ε0 .36) d(xk−N−2 . Therefore. one has the following leastsquares problem: pi+1 = ( XXT ) −1 XY. ¯ (4. where n is the number of states.31) in a least-squares sense over a compact set. and k is the discrete time. To satisfy the excitation condition of the least-squares problem. Y = [ d(xk−N−1 . one needs to have the number of collected points N at leastN ≥ n(n + 1)/2. or. ⎧ ⎫ ⎨ ⎬ T | pi+1 x − d(x. This can be determined by simulation.Model-Free Adaptive Dynamic Programming Algorithms 139 The parameter vector pi+1 is found by minimizing the error between the target value function (4. pi )|2 d x . is the covariance matrix .36) can be solved recursively by requiring a persistency of excitation condition: ε0 I ≤ 1 α α xk−t xk−t ≤ ε1 I ¯ ¯T m=1 for all k > α0 . . after several time steps that are enough to guarantee the excitation condition. uk . wk ) as the dynamics evolve in time. xk+1 and the reward function r (xk . This requires one to have knowledge of the state information xk . pi ) − xk pi+1 (t − 1) ¯T pi+1 (t) = pi+1 (t − 1) + i (t) − 1) xk e i (t) ¯ 1+ ¯ i (t − 1) xk i (t xk ¯T = i (t − 1) − i (t − 1) xk xk i (t − 1) ¯ ¯T 1 + xk i (t − 1) xk ¯T ¯ where i is the policy update index. pi ) (4.2 Online Implementation of HDP Algorithm The least-squares problem in Equation (4.35) pi+1 = arg min pi+1 ⎩ ⎭ 4. in real-time applications. t is the index of the recursions of the recursive least-squares. The recursive least-squares algorithm (RLS) is given as: e i (t) = d(xk .

3. Ki = ( E T Pi E − γ 2 I − E T Pi B ( I + BT Pi B ) −1 BT Pi E )−1 × ( E T Pi B ( I + BT Pi B ) −1 BT Pi A − E T Pi A). pi )]T .1. of the recursion and e(t) is the estimation error of the recursive least-squares. The on-line HDP algorithm developed in this chapter is summarized in the flowchart shown in Figure 4.1 Zero-sum games HDP flowchart.140 Modeling and Control of Complex Systems Start of the Zero-Sum HDP Initialization p0 = v( P0 ) ≥ 0 : P0 ≥ 0 i =0 Policy Iteration Li = ( I + BT Pi B − BT Pi E ( E T Pi E − γ 2 I ) −1 E T Pi B )−1 × ( BT Pi E ( E T Pi E − γ 2 I )−1 E T Pi A − BT Pi A). Solving the Least-squares X = [x x k − N −1 x x k − N −2 Λ x x k–1 ] d ( xk–1 .3 Convergence of Zero-Sum Game HDP We now prove the convergence of the proposed zero-sum game HDP algorithm. pi ) d ( xk − N − 2 . . Y = [ d ( xk − N −1. pi ) Λ pi +1 = ( XX ) T −1 XY i i +1 No pi+1 − pi F <ε Yes Finish FIGURE 4. Note that i (0) is a large number and i+1 (0) = i . 4.

30). that is.18): Pi+1 =AT Pi A + R−[ AT Pi B AT Pi E] I + B T Pi B B T Pi E T T E Pi B E Pi E − γ 2 I −1 B T Pi A .37) with P0 ≥ 0 converges to P that solves Equation (4.37) follows from Equation (4.Model-Free Adaptive Dynamic Programming Algorithms 141 LEMMA 4 Iterating on Equations (4. (4. pi )|2 d x . Equation (4.39) Because the excitation condition is assumed. iteration on pi is equivalent to the following iteration: Pi+1 = R+ L iT L i −γ 2 K iT K i +( A+ B L i + E K i ) T Pi ( A+ B L i + E K i ).18). THEOREM 1 Assume that the game has a value and is solvable.29). (4. ¯ (4. where v is the vectorized function in the Kronecker product.40).36). E T Pi A (4.18) when starting with P0 ≥ 0.34) in Equation (4.40) Using the same steps as in Lemma 3. substituting Equation (4. one has: ⎛ ⎞−1 ⎛ ⎞ pi+1 = ⎝ xk xk d x ⎠ ¯ ¯T ⎝ xk xk d x ⎠ ¯ ¯T ×v( R + L iT L i − γ 2 K iT K i + ( A + B L i + E K i ) T Pi ( A + B L i + E K i )) = v( R + L iT L i − γ 2 K iT K i + ( A + B L i + E K i ) T Pi ( A + B L i + E K i )).36) under excitation condition is equivalent to the following iteration on the GARE in Equation (4.39). . Since the matrix Pi+1 which reconstructed from pi+1 is symmetric. the corresponding excitation conditions hold.35) is solvable. ¯¯ ¯ (4. then the HDP algorithm converges to the value of the game that solves the Riccati Equation (4. If the sequence of least-squares problems in Equation (4.37) PROOF The least-squares problem is defined in Equation (4.38) pi+1 ⎩ ⎭ The first-order necessary condition requires that: (2x x T pi+1 − 2x d T (x. and (4. PROOF This follows from Lemma 4 and from Reference [5] where it is shown that iterating on Equation (4. which is: ⎧ ⎫ ⎨ ⎬ T pi+1 = arg min | pi+1 x − d(x. pi ))d x = 0.

wk ) is equal to the game value function V ∗ (xk ) when the policies uk . the concept of Q-functions to zero-sum games that are continuous in the state and action space as in Equation (4.43) The optimal action-dependent game value function Q∗ (xk . 4. In this section. will be addressed. wk are optimal. wk ) + V ∗ (xk+1 ) T T T T = xk Rxk + uk uk − γ 2 wk wk + xk+1 P xk+1 T T T = xk Rxk + uk uk − γ 2 wk wk + ( Axk + Buk + Ewk ) T P( Axk + Buk + Ewk ) ⎤⎡ ⎤ ⎡ R 0 0 xk T T T 0 ⎦ ⎣ uk ⎦ = xk uk wk ⎣ 0 I (4. B T PE T E PE − γ 2 I (4. uk . ADHDP. wk ) + V ∗ (xk+1 ) T = xk T uk T wk T H xk T uk T wk T (4. uk .5) is developed.1 is by selecting P0 = 0. so in the next section the relationship between the optimal value function and the Q-function is discussed. uk .4 Action-Dependent Heuristic Dynamic Programming (ADHDP): Q-Learning The ADHDP algorithm is based on the Q-function. will lead to a model-free algorithm. uk .41) where H is the matrix associated with P that solves GARE.142 Modeling and Control of Complex Systems We have just proved convergence of the HDP algorithm. Note that an easy way to initialize the algorithm in Figure 4.42) wk 0 0 −γ 2 I ⎡ ⎤ ⎡ T⎤ xk A T T T + xk uk wk ⎣ B T ⎦ P A B E ⎣ uk ⎦ ET wk so H can be written as: ⎛ ⎞ ⎡ T A PA + R Hxx Hxu Hxw ⎝ Hux Huu Huw ⎠ = ⎣ B T PA Hwx Hwu Hww E T PA AT PB T B PB + I E T PB ⎤ AT PE ⎦. wk ) = r (xk . . and is derived as: T xk T uk T T wk H xk T uk T wk T = r (xk . In the next section the second algorithm. The optimal policies are derived from the Q-function which. as will be shown. The optimal action-dependent value function Q∗ of the zero-sum game is then defined to be: Q∗ (xk .

44) and (4.4.45) ⎤ E L E ⎦ (4. then the system model is not needed to compute the controller gains. and then finds Q1 (x. uk .. we show how to develop an algorithm to learn the Q-functions (i. the gains of the optimal strategies can be written in terms of H as: −1 L = Huu − Huw Hww Hwu −1 K = Hww − Hwu Huu Huw −1 −1 −1 Huw Hww Hwx − Hux . and they are the main equations needed in the algorithm to be proposed to find the control and disturbance gains.48) (4.47) Equations (4. Note that if H is known. 4. a parametric structure is used to approximate the Q-function of the current control policy. −1 Hwu Huu Hux − Hwx . k+1 (4. wk ) uk wk wk T = min max xk ∗ = Q (xk . (4.10) in terms of the H. w) ≥ 0 that is not necessarily optimal.Model-Free Adaptive Dynamic Programming Algorithms Then one has: V ∗ (xk ) = min max Q∗ (xk .45) in Equation (4. uk .46) KE Substituting Equation (4.4 to develop a Q-learning algorithm to solve for the discrete-time zero-sum game H matrix that does not require the system dynamic matrices. Then the certainty equivalent principle is used to improve the policy of the action network.48) and (4. wk ) = r (xk .42): ⎤ ⎡ ⎤T ⎡ ⎡ R 0 0 A B E A B 0 ⎦ + ⎣ LA LB LE ⎦ H ⎣ LA LB H=⎣0 I KA KB KE KA KB 0 0 −γ 2 I which can be related to: ∗ Q∗ (xk .46) and (4. wk ) + Q∗ (xk+1 .9) and (4.8): P= I LT KT H I LT KT T . u∗ . wk ).43).44) Therefore the relation between P and H can be obtained by equating Equations (4. In the Q-learning. In the Q-learning approach.49) Equations (4. we use the Q-function of Section 4. In the next section.47) are the action-dependent version of Equations (4. This model-free Q-learning algorithm allows for solving the GARE equation online without requiring the knowledge of the plant model. w) by solving . uk . wk+1 ). the H matrix) of a given zero-sum game. k uk ∗ T uk T wk T H xk T uk T wk T 143 (4. u. one starts with an initial Q-function Q0 (x. u∗ . Similarly using Equation (4. (4.1 Derivation of Model-Free Online Tuning Based on the Q-Learning Algorithm (ADHDP) In this section.49) depend only on the H matrix.e. u.

it is shown that Qi+1 (xk . wk ) = min max xk uk wk uk wk T uk T wk T Hi+1 xk T uk T wk T . the Q-function is quadratic in the state and the policies. . Because in this ˆ ˆ chapter linear quadratic zero-sum games are considered. then once it is determined. B.49) the corresponding state feedback policy updates are given by i i i i L i = Huu − Huw Hww −1 Hwu i i i i K i = Hww − Hwu Huu −1 Huw −1 −1 i i i i Huw Hww −1 Hwx − Hux . the Q-function is given for any policy u and w.48) and (4. u.51) with ui (xk ) = L i xk wi (xk ) = K i xk . and only the H matrix is required. A parametric structure is used to approximate the actualQi (x.52) in Equation (4. 1. uk+1 . In this chapter. the plant model matrices A. Note that in Equation (4. ui (xk )). wi (xk ) → Q∗ (xk .50) to obtain the following recurrence relation on i: T Qi+1 (xk . to repeat the same process for i = 0. According to Equations (4. L) and w(x. and E are not needed. u. To update the action networks.144 Equation (4. . wk ) T T T = xk Rxk + uk uk − γ 2 wk wk + min max Qi (xk+1 . wi (xk )) = xk Rxk + uiT (xk )ui (xk ) − γ 2 wiT (xk )wi (xk ) T + xk+1 uiT (xk+1 ) T wiT (xk+1 ) Hi xk+1 uiT (xk+1 ) wiT (xk+1 ) T (4. wk+1 ) .50) and by applying the following incremental optimization on the Q function as: T min max Qi+1 (xk . one can substitute Equation (4. the improved policies ui (xk ) and wi (xk ) use the certainty equivalence principle. parametric structures are used to obtain approximate closed-form representations of the two action networks u(x. .52) Note that since Qi (x. the two action networks are . K ). L i → L and K i → K .53) that is used to solve for the optimal Q-function forward in time.50) with i = 0 as: Modeling and Control of Complex Systems Qi+1 (xk . w). uk. w) is not initially optimal. uk . i i i i Hwu Huu −1 Hux − Hwx (4. The idea is to solve for Qi+1 . 2. Similarly. uk+1 wk+1 = = T xk Rxk T xk Rxk + + T uk uk T uk uk − − T γ wk wk T γ 2 wk wk 2 + Vi (xk+1 ) + Vi ( Axk + Buk + Ewk ) (4. wk ) as i → ∞. (4. To develop solutions to Equation (4.50).. Moreover.50) forward in time that do not need the system matrices. uk . ui (xk ). which means Hi → H.

58) The parameter vector h i+1 is found by minimizing the error between the target value function (4. a natural choice of these parameter structures is given as: ui (x) = L i x. the right-hand side of Equation (4. ˆ w i (x) = K i x.59) |h i+1 z(xk ) − d( z(xk ). Hi ) = xk Rxk + ui (xk ) T ui (xk ) − γ 2 w i (xk ) T w i (xk ) ˆ ˆ ˆ ˆ + Qi (xk+1 . h i+1 ) in a least-squares sense to find h i+1 such that: T h i+1 z(xk ) = d( z(xk ). ¯ ¯ h i+1 ⎩ ⎭ Solving the least-squares problem one obtains: ⎛ h i+1 = ⎝ where z(xk ) is: T z(xk ) = xk T = xk T = xk I ⎞−1 z(xk ) z(xk ) T d x ⎠ ¯ ¯ z(xk )d( z(xk ). = h iT z ¯ (4. ⎧ ⎫ ⎨ ⎬ T h i+1 = arg min (4.53). ¯ 2 zq ) is the Kronecker product quadratic polynomial basis vector 0.51). . . .54) and (4. Therefore.55) 2 2 where z = [ x T uT w T ]T z ∈ Rn+m1 +m2 =q . z = (z1 .61) . h i )d x ¯ ¯ (4.54). zq −1 zq . h i )|2 d xk . ui (xk+1 ). z2 . z2 z3 . w i (xk+1 ) ˆ ˆ (4. In the linear case. h i ).55) are updated using Equation (4. (4.55).Model-Free Adaptive Dynamic Programming Algorithms 145 linear in the state. and (4. To solve for Qi+1 in Equation (4. 0. z1 zq .56) in a least-squares sense over a compact set . (4.60) ˆ ( ui (xk )) T (L i xk ) T L iT K iT ˆ ( w i (xk )) T ( K i xk ) T T T T T .53). Note that Equations (4. h i ) = zT Hi z.57) which can be thought of as the desired target function to which one needs to ˆ fit Q(z.53) is written as: T d(zk (xk ). ¯ ¯ (4. ˆ ˆ ¯ Q( z. . The output of v(·) is constructed by stacking 2 the columns of the squared matrix into a one-column vector with the offdiagonal elements summed as Hi j + H ji .56) (4.54) (4. . . .57) and (4. and h = v( H) with v(·) a vector function that acts on q × q matrices and gives a q (q +1)/ × 1 column vector. .56) give an exact closed-form representation of the functions in Equation (4. the parametric structures in Equations (4.

59) and (4. h i ) d( z( p2). exploration noise is added to both inputs in Equation (4. Hi ) = xk Rxk + uei (xk ) T uei (xk ) − γ 2 w ei (xk ) T w ei (xk ) + Qi (xk+1 . w ei (xk ) ˆ K i xk + n2k K i xk n2k Evaluating Equation (4. where q = n + m1 + m2 is the number of states including both policies. control and . To satisfy the excitation condition of the least-squares problem. To overcome this problem.52) to obtain: uei (xk ) = L i xk + n1k ˆ w ei (xk ) = K i xk + n2k ˆ (4.2 Online Implementation of Q-Learning Algorithm The least-squares problem in Equation (4. This requires one to have knowledge of the state information xk . . This can be determined by simuˆ ˆ ˆ ˆ lation. . and also of the reward function r (zk ) = xk Rxk + T 2 T uei (xk ) uei (xk ) − γ w ei (xk ) w ei (xk ) and Qi . and therefore z(xk ) z(xk ) T d xk ¯ ¯ is never invertible. ∈ h i+1 = ( ZZT ) −1 ZY with ¯ Z = [ z( p1) z( p2) ¯ ··· z( pN) ] ¯ · · · d( z( pN).55). p2.58) at enough points p1. one needs to have the number of collected points N at least N ≥ q (q + 1)/2. therefore z(xk ) in Equation (4. σ1 ) and n2k (0.63) is therefore guaranteed by the excitation condition. σ2 ) are zero-mean exploration noise with variances 2 2 σ1 and σ2 respectively. h i ) ¯ Here the target in Equation (4. xk+1 as the T dynamics evolve in time. h i ) in Equation (4. p3. . see Equations (4. however.57) becomes: T ˆ ˆ ˆ ˆ d(zk (xk ). 4.54) ˆ ˆ and (4.61) becomes: ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ xk 0 xk xk ˆ z(xk ) = ⎣ uei (xk ) ⎦ = ⎣ L i xk + n1k ⎦ = ⎣ L i xk ⎦ + ⎣ n1k ⎦.64). by observing the states online. ui (xk+1 ).62) where n1k (0.146 Modeling and Control of Complex Systems Note. or in real-time applications.63) can be solved in real time by collecting enough data points generated from d(zk . that ui and w i are linearly dependent on xk . w i (xk+1 )) ˆ ˆ (4. ¯ .60) will never be solvable. which means that the least-squares problem in Equations (4.63) ¯ Y = [ d( z( p1). one has: (4.64) ˆ ˆ ˆ with ui and w i used for Qi instead of uei and w ei where the invertibility of the ˆ matrix in Equation (4. h i ) ]T .4.

Start of the Zero-Sum Q-Learning Initialization h0 = v(H0) = 0 : P0 = 0 i = 0. hi ) d ( z ( xk −2 ).Model-Free Adaptive Dynamic Programming Algorithms 147 disturbance.65) ¯ Y = [ d( z(xk−N−1 ). and ε1 positive integers and ε0 ≤ ε1 . K0 = 0. ¯ ¯T k=1 for all k > α0 . h i ) ]T . i +1 i +1 i + i +1 i +1 i + i+ i+ K i +1 = ( H ww − H wu H uu1 H uw ) −1 ( H wu H uu1 H ux1 − H wx1 ) −1 −1 −1 −1 i i+1 No hi+1 − hi F <ε Yes Finish FIGURE 4.2 Zero-sum games Q-learning (ADHDP) flowchart. ε0 . ¯ (4. L0 = 0. Solving the Least-Squares Z = [z ( xk−N−1 ) z ( xk −N−2 ) Λ hi+1 = ( ZZ ) ZY Hi+1 = f(hi+1) T −1 z ( xk −1 )] d ( z ( xk −1 ). hi )]T Y = [d ( z ( xk −N −1 ). h i ) ¯ One can also solve Equation (4. α > α0 with α0 . .2. the excitation condition is replaced by the persistency of excitation condition.65) recursively using the well-known recursive least-squares technique. h i ) d( z(xk−2 ). ε0 I ≤ 1 α α zk−t zk−t ≤ ε1 I. The online Q-learning algorithm developed in this chapter is summarized in the flowchart shown in Figure 4. Y and Z matrices are obtained in real time as: ¯ Z = [ z(xk−N−1 ) z(xk−N−2 ) ¯ ··· z(xk−1 ) ] ¯ · · · d( z(xk−1 ). hi ) Λ Policy Iteration i+ i +1 i +1 i +1 i +1 i +1 i+ i+ Li +1 = ( H uu1 − H uw H ww H wu ) −1 ( H uw H ww H wx1 − H ux1 ). In online implementation of the least-squares problem. In that case.

In the remainder of this section. B T Pi B + I B T Pi E Hi+1 = ⎣ B T Pi A T T T 2 E Pi A E Pi B E Pi E − γ I (4.65) is equivalent to: ⎡ ⎡ ⎤T A B A B E Hi+1 = G + ⎣ L i A L i B L i E ⎦ Hi ⎣ L i A L i B Ki A Ki B Ki E Ki A Ki B where G is ⎡ ⎤ 0 0 ⎦. Ki E ⎤ E L i E ⎦.3 Convergence of Zero-Sum Game Q-Learning We now prove that the proposed Q-learning algorithm for zero-sum games converges to the optimal policies.65). Ki A Ki B Ki E Ki A Ki B Ki E LEMMA 6 The matrices Hi+1 .67) . Because the matrix Hi+1 reconstructed from h i+1 is symmetric. the least-squares Equation (4.51) and Equation (4. 4. Ki E (4. Some preliminary lemmas are needed. and K i+1 can be written as: ⎤ ⎡ T A Pi A + R AT Pi B AT Pi E ⎦. it will be shown that this policy iteration technique will cause Qi to converge to the optimal Q∗ . Ki A Ki B Ki E Ki A Ki B Ki E I where v is the vectorized function in Kronecker products.148 Modeling and Control of Complex Systems This algorithm for zero-sum games follows by iterating between Equation (4. L i+1 .64) is equivalent to: ⎛ ⎡ ⎡ ⎤T A B A B E ⎜ ¯T d( zk (xk ).65) becomes: ⎛ ⎡ ⎤T ⎡ ⎤⎞ A B E A B E ⎜ ⎟ h i+1 = ( ZZT ) −1 ( ZZ) ×v⎝G + ⎣ L i A L i B L i E ⎦ Hi ⎣ L i A L i B L i E ⎦⎠.4. −γ 2 ⎤⎞ E ⎟ L i E ⎦⎠. LEMMA 5 Iterating on Equations (4.51) and (4. iterating on h i is equivalent to: ⎡ ⎡ ⎤T ⎤ A B E A B E Hi+1 = G + ⎣ L i A L i B L i E ⎦ Hi ⎣ L i A L i B L i E ⎦. h i ) = zk ×v ⎝G + ⎣ L i A L i B L i E ⎦ Hi ⎣ L i A L i B ¯ Ki A Ki B Ki E Ki A Ki B then using the Kronecker products.66) R ⎣0 0 PROOF 0 I 0 Because Equation (4.

B T Pi E T 2 E Pi E − γ I Using Equations (4.68) (4.70) Equation (4. where Pi is given as Pi = I PROOF 149 (4. one has: Pi+1 = I T L i+1 T K i+1 Hi+1 I T L i+1 T K i+1 T . .71) with Pi defined as in Equation (4. K i+1 = ( E T Pi E − γ 2 I − E T Pi B( I + B T Pi B) −1 B T Pi E) −1 ×( E T Pi B( I + B T Pi B) −1 B T Pi A − E T Pi A).70).69) L iT K iT Hi I L iT K iT T . LEMMA 7 Iterating on Hi is similar to iterating on Pi as: Pi+1 = AT Pi A + R − [ AT Pi B × I + B T Pi B E T Pi B T AT Pi E] −1 B T Pi E E Pi E − γ 2 I B T Pi A E T Pi A (4.72) T T T T = R + L i+1 L i+1 − γ 2 K i+1 K i+1 + ( AT + L i+1 B T + K i+1 E T ) ×Pi ( A + B L i+1 + E K i+1 ).Model-Free Adaptive Dynamic Programming Algorithms L i+1 = ( I + B T Pi B − B T Pi E( E T Pi E − γ 2 I ) −1 E T Pi B) −1 ×( B T Pi E( E T Pi E − γ 2 I ) −1 E T Pi A − B T Pi A).67).70) in Lemma 6.67) in Lemma 6. and using Equation (4. one obtains Equations (4.66) in Lemma 5 can be written as: A B = G + ⎣ Li A Li B Ki A Ki B =G+ A B E T ⎡ Hi+1 ⎡ ⎤T A B E L i E ⎦ Hi ⎣ L i A L i B Ki A Ki B Ki E I L iT K iT Hi I L iT ⎤ E Li E ⎦ Ki E K iT T A B E .68) and (4.70) then it follows that: Hi+1 AT Pi A + R ⎣ B T Pi A = E T Pi A ⎡ AT Pi B T B Pi B + I E T Pi B ⎤ AT Pi E ⎦.69). (4. PROOF From Equation (4.51) and (4. Because Pi is described as in Equation (4. one obtains: ⎡ Pi+1 = I T L i+1 T K i+1 AT Pi A + R ⎣ B T Pi A E T Pi A AT Pi B T B Pi B + I E T Pi B ⎤⎡ ⎤ AT Pi E I ⎦⎣ L i+1 ⎦ B T Pi E T 2 K i+1 E Pi E − γ I (4.

65) is solved completely. where H corresponds to Q∗ (xk .150 Modeling and Control of Complex Systems Substituting Equations (4. the excitation condition is satisfied. L 0 = 0 and K 0 = 0 converges with Hi → H. The F-16 short period dynamics states are x = [ α q δe ]T where α is the angle of attack.72). iterating on Equation (4. We used standard zero-order-hold discretization techniques explained and easily implemented in MATLABTM [30] to obtain . and because from Equation (4.5 Online ADP H∞ Autopilot Controller Design for an F-16 Aircraft In this design application. 4.69) in Equation (4. We have just proved convergence of the Q-learning algorithm assuming the least-squares problem (4. one concludes that Qi → Q∗ . Then. and δe is the elevator deflection angle.68) and (4. L 0 = 0. one has: Pi+1 = AT Pi A+R−[ AT Pi B AT Pi E] I + B T Pi B B T Pi E T T E Pi B E Pi E − γ 2 I −1 B T Pi A .43) with corresponding P solving the GARE (4. E T Pi A The next result is our main theorem and shows convergence of the Qlearning algorithm.46). the zero-sum game that corresponds to the H∞ control problem is solved for an F-16 aircraft autopilot design. Since Lemma 7 shows that iterating on Hi matrix is equivalent to iterating on Pi .9).66) in Lemma 5. and K 0 = 0 implies that P0 = 0.9). q is the pitch rate. THEOREM 2 Assume that the linear quadratic zero-sum game is solvable and has a value under the state feedback information structure.71) with P0 = 0 converges to P that solves Equation (4. that is. uk .70) H0 = 0. The discrete-time plant model of this aircraft dynamics is a discretized version of the continuous-time one given in Reference [27]. Note that this implies that Q-learning can be interpreted as solving the GARE of the zero-sum game without requiring the plant model. wk ) as in Equations (4. withH0 = 0. then as i → ∞ AT PA + R Hi → ⎣ B T PA E T PA ⎡ AT PB T B PB + I E T PB ⎤ AT PE ⎦.41) and (4. B T PE T E PE − γ 2 I Hence from Equation (4. PROOF In Reference [5] it is shown that iterating on the GARE (4.

In Figures 4.0078 ⎦. or injection of a small probing noise signal. including covariance resetting. one can use several standard schemes. Following this initialization step.00038373 ⎦ B = ⎣ −0.0005 −0. Note that P ≥ 0.74) −0.4074 15. In this example.132655 ⎡ ⎤ ⎡ ⎤ −0.0089 P = ⎣ 12. state resetting. that implies: ∞ k=0 T xk Qxk + u∗T u∗ k k ≤ γ2 ∞ T wk wk k=0 (4.4074 −0. we use state .1.5109 12.73) with sampling time T = 0.0741349 0.75) for all finite energy disturbances. that is. from Reference [8]. In order to maintain the excitation condition.3 of this chapter is applied to solve for the H∞ autopilot controller forward in time. The parameters of the critic network and the action network are initialized to zero. the L2 gain for disturbance attenuation is γ = 1.000708383 ⎦ A = ⎣ 0. the states of the aircraft are initialized to be x(0) = [ 4 2 5 ] where any values can be selected.0089 −0.1476 0. (4. the HDP algorithm developed in Section 4.1 HDP-Based H∞ Autopilot Controller Design In this part.00951892 ⎦ E = ⎣ 0.3 and 4. The solution to Equation (4.4.Model-Free Adaptive Dynamic Programming Algorithms the sampled data plant: ⎡ ⎤ 0. In this HDP design.0872 −0.00150808 0.867345 0 151 (4. In this H∞ design problem. Hence u∗ (xk ) has the well-known robustness and disturbance rejection capabilities of H∞ control.0101 The corresponding policies have the gains: L = [ 0.1244 0 ].0096 0.906488 0.0816012 −0. 4. all disturbances with ∞ T wk wk k=0 is bounded.5. the aircraft dynamics are run forward in time and tuning of the parameter structures is performed using recursive least-squares by observing the states and rewards online. the states and the inputs to the aircraft are shown with respect to time.0078 1.9) is ⎡ ⎤ 15.5994 −0.90121 0 0 0.0661 ] K = [ 0.0733 0.

4 The control and disturbance inputs. .152 5 4. x2.5 2 1.3 State trajectories with reinitialization. x3 3 2.2 0 –0.5 States x1.8 0. 1 0.5 4 3.2 –0.6 0.5 1 0.5 0 0 200 400 600 Modeling and Control of Complex Systems x1 x2 x3 800 Time (k) 1000 1200 1400 1600 FIGURE 4.4 Control input Disturbance input The Control and Disturbance Inputs 0 500 Time (k) 1000 1500 FIGURE 4.4 0.

Following this initialization step.5. The states of the aircraft are initialized to be x0 = 4 2 5 where any values can be selected. In this example. 272. The parameters of the critic network and the action network are initialized to zero.4 of this chapter is applied to solve for the H∞ autopilot controller forward in time.7. and 4.8 and 4.2 Q-Learning-Based H∞ Autopilot Controller Design In this part. the persistency of excitation condition . In Figures 4. 4. the parameters of the critic network converge to P in Equation (4.5. 4.9. The parameters of the actions networks are updated according to Equation (4. we inject probing noise to the control and disturbance inputs.Model-Free Adaptive Dynamic Programming Algorithms 16 14 12 The Convergence of P P11 10 8 6 4 2 0 –2 0 500 Time (k) FIGURE 4. the Q-learning algorithm developed in Section 4.51). As expected.74) that solves the GARE equation.) Convergence of the critic network parameters.5 (See color insert following p. In Figures 4. State reinitialization has appeared recently in Reference [29] to solve the Hamilton-Jacobi-Bellman (HJB) equation associated with continuous-time optimal control problems. the convergence of the parameters of the critic network and the action network are shown. 153 P12 P13 P22 P23 P33 1000 1500 resetting and the states are reinitialized to x(0) = [ 4 2 5 ] periodically to prevent them from converging to zero.6. Hence. The recursive least-squares algorithm is used to tune the parameters of the critic network online. the aircraft dynamics are run forward in time and tuning of the parameter structures is performed by observing the states online. the states and the inputs to the aircraft are shown with respect to time.

06 0.1 0.6 (See color insert following p.154 Modeling and Control of Complex Systems 0.) Convergence of the control action network parameters.02 0 500 Time (k) FIGURE 4.08 L11 L12 L13 0 500 Time (k) 1000 1500 FIGURE 4.) Convergence of the disturbance action network parameters.1 0. 272.08 The Convergence of the Control Policy 0.16 The Convergence of the Disturbance Policy 0.08 0.06 0.06 –0.02 –0.04 0.7 (See color insert following p. K11 K12 K13 1000 1500 0.02 0 –0.04 0. .14 0. 272.04 –0.02 0 –0.12 0.

. 272. 4 The Disturbance Input w 2 0 –2 –4 0 1000 2000 3000 4000 5000 6000 7000 8000 4 u The Control Input 2 0 –2 –4 0 1000 2000 3000 4000 5000 The Time Step 6000 7000 8000 FIGURE 4.9 The control and disturbance inputs.8 (See color insert following p.) State trajectories.Model-Free Adaptive Dynamic Programming Algorithms 4 The State x1 2 0 –2 4 The State x2 2 0 –2 5 The State x3 x3 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 x2 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 x1 155 –5 0 1000 2000 3000 4000 5000 The Time Step 6000 7000 8000 9000 FIGURE 4.

that is.) Online model-free convergence of Pi to P that solves the GARE. whereas in the ADHDP case.10. The convergence of these two schemes has been shown to be related to a stable iteration on the GARE. The HDP algorithm requires the knowledge of the plant model to tune the action networks.6 Conclusion This chapter presented application of the online ADP techniques to solve the linear quadratic discrete-time zero-sum game appearing in H∞ optimal control forward in time. the convergence of the critic and action networks is shown. . Two ADP schemes have been applied to the zerosum game case.156 16 14 12 The Convergence of P 10 8 6 4 2 0 –2 Modeling and Control of Complex Systems P11 P12 P13 P22 P23 P33 0 1000 2000 3000 4000 Time (k) 5000 6000 7000 8000 FIGURE 4. The ADHDP algorithm results in a direct adaptive method that finds the H∞ feedback controller. It was shown that the convergence to the optimal solution in the HDP algorithm is faster than in ADHDP. to avoid the parameter drift problem. it can be shown that the critic network parameters Hi converge to the corresponding game value P that solves Equation (4.11.9). required for the convergence of the recursive least-squares tuning. 4. namely HDP and ADHDP. 4. In Figures 4. Using Equation (4. will hold.12.10 (See color insert following p. 272. and 4.70). the plant model is not required to tune the action or the critic networks.

06 0.04 –0.04 0.12 0.16 0. 0.02 –0. 272.11 (See color insert following p. .12 (See color insert following p. 6000 7000 8000 FIGURE 4. 6000 7000 8000 FIGURE 4. 272.06 –0.06 0.02 0 –0.08 The Convergence of the Control Policy 0.08 K11 0.02 K12 K13 157 0 1000 2000 3000 4000 5000 The Policies Update no.1 0.Model-Free Adaptive Dynamic Programming Algorithms 0.14 The Convergence of the Disturbance Policy 0.1 0.08 L11 L12 L13 0 1000 2000 3000 4000 5000 The Policies Update no.) Convergence of the disturbance action network parameters.04 0.) Convergence of the control action network parameters.02 0 –0.

J. An aircraft design example makes the point. which is highly effective in feedback control systems design. P. M. “Stabilizing a discrete. 2004. Barto.” IEEE Trans. No. J. 1994.158 Modeling and Control of Complex Systems The results presented herein are directly applicable in practice because they provide means to solve the H∞ control problem. pp. Landelius. J.” IEEE Transactions on Systems Man. and Cybernetics. vol. New Jersey. . “Neural networks for control and system identification. “Linear quadratic regulation using reinforcement learning. G. 7. S. SMC-13. Birkh¨ user. Ydestie. T. without having to deliberately insert any disturbance signal to the system. Wunsch. 18–27. “Neuronlike elements that can solve difficult learning control problems. Barto. Bradtke. no. one needs to provide an input signal that acts as a disturbance that is tuned to be the worst-case disturbance in forward time. 1997. 1978. 1990. 1989. pp. Once the H∞ controller is found. Ba¸ ar and G. 252–254.” Heuristics. Werbos. F. 1995. L. Stoorvogle and A. Lewis and V. 6. June. vol. 4. Hagen and B. Barto. Brewer. 3. constant. 1994. J. 39–46. A. SIAN. Powell. Note that if γ → ∞ or the disturbance gain matrix E = 0. S. Netherlands. 1999. New York. H∞ Optimal Control and Related Minimax Design Probs lems. Krose. 835–846. E. Bernhard. 1983. Weeren. S. England. Baltimore. 11. a special case of this approach can be the solution of the discrete-time linear quadratic regulator (LQR) in optimal control. PhD Dissertation. MD. Handbook of Learning and Approximate Dynamic Programming. C. CAS-25. pp. 5. 9. T. No. 39. 3475– 3476. Olsder. A.” Proceedings of the American Control Conference. pp. “The discrete-time Riccati equation related to the H∞ control problem. J. and C. Boston. 12. pp. Wageningen. 1974. 686–691.” Proceedings on the 8th Belgian-Dutch Conference on Mechanical Learning.” IEEE Transactions on Automatic Control. vol. a 8. Cambridge University. and A. Philadels phia. R. Linkoping University. 9. 2. D. W. John Wiley & Sons. Optimal Control. T. 13. Si. John Wiley. 1995. Watkins. pp. References 1. 3. one can use the parameters of the control action network as the final parameters of the controller. G. Syrmos. Kleinman. Circuit System. A. Thesis. Hoboken. 1. Vol. October 1998. W. L. and D. 10.D. 3. Sweden. Cambridge. linear system with application to iterative methods for solving the Riccati equation. Anderson. Dynamic Noncooperative Game Theory. A. Ph. Learning from Delayed Rewards. “Adaptive linear quadratic control using policy iteration.” IEEE Transactions on Automatic Control. J. B. Reinforcement Learning and Distributed Local Model Synthesis. “Kronecker products and matrix calculus in system theory. It is interesting to see that when designing the H∞ controller in forward time. W. Sutton. Ba¸ ar and P. T.

Vol. K. 41. 17.Model-Free Adaptive Dynamic Programming Algorithms 159 14. Howard. London. Huang.” International Conference on Machine Learning. 1991. Tsitsiklis. 1986. R. 21. pp.” IEEE Transactions on Systems. T. 1973. MIT Press. W. Aircraft Control and Simulation. P. 2004. 8. Prokhorov and D. G. 25. 26. and J. Athena Scientific. “Multiagent reinforcement learning: Theoretical framework and an algorithm. Vol. M. 5034–5040. M. New York. 18. N. 19. L. . 1992. “Adaptive dynamic programming. H. 1996. S. 30. Dynamic Programming and Markov Processes. 2002. 28. 2001. J. 27. New York. 55–66. B. pp.” IEEE Transactions on Systems. Prentice-Hall. L. and S.” Journal of Cognitive Systems Research.. 1997. G. ed. Lin and C. MATLABTM 7th edition. “Adaptive critic designs. 1998. Narendra and F. Vol. Maitra. 2003. F. “A menu of designs for reinforcement learning over time. SMC-3. A. Belmont. Lewis. L. Sofge. Werbos. New York. 8. 140–153. 1960. “Value-function reinforcement learning in Markov games. Widrow. L. pp. A. D. Werbos. J. No. MA. pp. pp. Miller. Cox. and P. Vol. pp. Lewis. The MathWorks Inc. Englewood Cliffs. Lewis. 2nd edition. 2005. Hu and M. S. L. and Cybernetics. Lewis. No 4. Gupta. Saeks.” Automatica. 1996. Van Nostrand Reinhold. Kwon and S. 455–465. 242–250. Vol.. L.” in 43rd IEEE Conference on Decision and Control. I. Neuro-Dynamic Programming. 22. W. Murray. Byrnes. “H ∞ control of discrete-time nonlinear system. Lewis. 23. F. Werbos: MIT Press. R. 20. 494–510. John Wiley. Vol.” IEEE Transactions on Automatic Control. Man. 24. F. Optimal Estimation. 67–95. MA. pp. J. White and D.” IEEE Transactions on Neural Networks. Littman. P. MA. 5. Cambridge. W. 2002. Wellman. No. “Punish/reward: Learning with a critic in adaptive threshold systems. C. Springer-Verlag. Han. J. “Approximate dynamic programming for real-time control and neural modeling. and Cybernetics. Sutton. 15. 997–1007. 16. 32. Wunsch. 29. J. ed. P. J. 1992. 37. “Special Issue on neural network feedback control. Cambridge. Stevens and F. N. John Wiley. Receding Horizon Control. 2. Applied Optimal Control and Estimation. 5. P. D. 2005.” Handbook of Intelligent Control. Man. Lendaris. “Hamilton-Jacobi-Isaacs formulation for constrained input nonlinear systems. Abu-Khalaf. pp. and R. 2. B. NJ. D. Bertsekas and J. Vol. No. J.” Neural Networks for Control.

.

........................4..................2 Performance Evaluation..................1 Introduction Wireless sensor networks are complex systems consisting of large numbers of small devices...... 161 Fair Data Gathering Problem ............... 172 5..3........................2..................... It is believed that pervasive deployments of such autonomous embedded networked sensing systems will constitute an important milestone in the information revolution [2]............4 Dual-Based Approach .............................. The crucial engineering challenge in the development of protocols for these novel networks is that they are characterized by severe resource constraints....................2...........1 5...... 165 5................................ computation..........3 A Primal-Based Distributed Heuristic ................................. each capable of a combination of radio communication.................. 163 5......1 Modeling Wireless Receiver Bandwidth Consumption ........... They are able to provide a fine granularity of spatiotemporal monitoring at large scale..4......................................5 Conclusions .................. 170 5....................... 168 5.2 Introduction........ 163 5.......................................................1 Distributed Algorithm... 173 5............ and actuation [1]........................... 168 5............................................... Applications envisioned for these next-generation networks range from military applications and ecological studies to civil structure monitoring and industrial process control................................................... 161 .... sensing..................................5 Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks Avinash Sridharan and Bhaskar Krishnamachari CONTENTS 5.........1 Performance of the Primal-Based Algorithm. 176 5.............................. Precisely because the deployments need to be large in scale..... 176 References...............2 Formulating the Constrained Optimization Problem....

Various practical algorithms have been proposed for a range of important tasks in wireless sensor networks. and computational ability. bandwidth. not a top-down process guided by solid mathematical understanding. taking into account resource constraints (such as energy or bandwidth constraints) on intermediate nodes and links. node localization. Despite the large and growing research literature on these topics (see. At a minimum. individual sensor nodes are often characterized by significant limits on energy. the amount of data collected. In particular. Researchers have been hard at work gaining a better understanding of how to model wireless sensor networks and how to develop scalable protocols for them. for instance. Generally. [4]. the survey in Reference [1]). or some fairness measure) is defined in terms of the data flow from sensor sources. providing a benchmark for more scalable distributed approaches. Such distributed convex optimization-based algorithms for multihop wireless networks are presented in some recent works by Chiang [5]. . The prevailing methodology for protocol design in this context is a bottom-up intuitive engineering approach. time synchronization. routing. this approach helps in identifying optimal flows from a centralized perspective. But beyond this. it is also sometimes possible to develop distributed algorithms directly based on the optimization formulation. data-centric storage. A thorough tutorial on distributed convex optimization for cross-layer protocol design in both wired and wireless networks is presented by Chiang et al. in the context of traditional wired networks. and querying. In the specific context of wireless sensor networks. This approach is often characterized by the use of a distributed gradient search based on the Lagrange dual of the original problem formulation. One area of research where this gap is starting to narrow is the development of a network utility maximization/flow optimization approach for various problems pertaining to data gathering. including deployment. particularly when it comes to the design of higher-layer network protocols. medium access. a high-level network utility optimization goal (such as maximizing the lifetime of the network. and Wang and Kar [6]. This has two important interrelated implications for design: (1) control algorithms are needed to allocate and manage the available resources efficiently while enabling the deployed network to perform its intended monitoring application. storage. in these problems. there is still very much a large gap between theory and practice. where a distributed dual-based gradient search algorithm is proposed for the problem of maximizing data extraction under energy constraints.162 Modeling and Control of Complex Systems considerations including economics and ease of deployment restrict the capability of each individual node. and (2) these resource-allocation algorithms must be efficient in their own operation (there is no point designing an inefficient algorithm to efficiently allocate resources!). there is a closely related work by Ye and Ordonez [7]. The design of control protocols for data networks based on distributed solutions to convex optimization problems was first advocated in the famous work by Low and Lapsley [3].

The second is the bandwidth consumed by flows generated by neighbors that are not destined to this receiver. Finally. In the next section. This chapter is organized as follows. we present brief concluding comments and discuss our future work in Section 5. Rangwala et al. 5. we shall describe and formalize the problem as a linear program. The set E is a union of two sets e and n. 5.2. e ⊂ E is the set of links that connect a child to a parent. that work is the basis of the primal-based algorithm we shall present in this chapter. In Section 5. their design approach is not based on a formal distributed optimization framework). and the set of all nodes by the set V. We formulate the objective function as a linear combination of minimum rate and sum rate. In wireless networks there are two factors that contribute to the consumption of bandwidth at a receiver. this is the interference perceived at the receiver. n ⊂ E is the set of links that connect two neighbors (that do not have a parent–child relationship). Every receiver in the network has a finite receiver bandwidth capacity given by the set B.4. Every receiver in the network is bandwidth constrained. We denote the set of all communication links in the network by the set E. Thus. and hence every node in the network must be allocated a rate (for sourcing and relaying data) that does not exceed the available bandwidth at any receiver.5. we illustrate this approach by developing a distributed algorithm for fair data gathering in wireless sensor networks.2 Fair Data Gathering Problem The problem we wish to address is as follows: a set of nodes in the wireless sensor network are all trying to send data to a single sink through the shortest path tree. the network graph G can be represented . We had previously treated this problem from the point of view of developing a timedivision multiple access mechanism for fair data gathering [8]. and experimentally studied a practical distributed rate control protocol called IFRC—interference-aware rate control protocol—for this very problem (however. The first is the bandwidth consumed by flows belonging to the children of the receiver. Recently. The goal is to maximize the utilization of the network capacity while maintaining a fair bandwidth allocation to all source nodes.1 Modeling Wireless Receiver Bandwidth Consumption As a first step we need to develop a model that can capture the consumption of a receiver’s bandwidth by various flows traversing this receiver on their path to the sink.3 we present a primal distributed heuristic that serves as a baseline for comparison with the dual-based distributed gradient search algorithm that we develop and evaluate in Section 5. [9] have developed.Optimization and Distributed Control for Fair Data Gathering 163 In this chapter. implemented. discussing along the way our modeling of a wireless receiver bandwidth that is crucial to the formulation.

Ni is the set of all neighbors (i) of i.164 Modeling and Control of Complex Systems 1 2 3 4 5 6 FIGURE 5. the rates allocated to the set of links n would represent the bandwidth wasted at the receiver due to interference. The segregation of the links into the two sets e and n thus gives us a simple model to quantify the consumption of bandwidth at a receiver. and rsrc is the source rate allocated to node i (if node i is a source). The routing tree rooted at the sink is denoted by T ⊂ G. by a three tuple (E.V.1 Six-node topology.1. through that specific link e ij . Thus. The relevance of this segregation is as follows: the flows from the source to the sink are carried over the set of links e. j∈C i + rk. To make the description of our model more explicit let us calculate the bandwidth constraint of node 2 in the six-node topology presented in Figure 5. In Figure 5.k∈Ni + rsrc ≤ B (i) where Ci is the set of all immediate children of i.B). The bandwidth constraint at the receiver i would then be (i) r j.∀nki ∈n. Note that only the set of edges e are part of T.1 the set of links corresponding to the set e are marked with a solid line and the set of links belonging to the set of noise links belonging . One important aspect of the quantification of the network at hand that we would like to point out is that we have explicitly segregated the set of links that connected two nodes that have a parent–child relationship (set e) and the set of links that connect nodes that do not have a parent–child relationship (set n). Also.∀e ij .1 illustrates the model with an example. Hence the rates allocated to each e ij ∈e are equal to the bandwidth consumed at a receiver (i) by the flows of children ( j) connected to the receiver. These flows are not destined for the receiver but due to the broadcast nature of the wireless domain will interfere in the receiver’s reception of flows that it needs to hear from its children. a receiver is able to hear flows that are being sent by its neighbors to their parents over the set of links n. Figure 5.

this is the total bandwidth at the receiver that is lost due to interference. The constraints of our optimization problem come directly from our bandwidth consumption model presented in Section 5.2 Formulating the Constrained Optimization Problem We now formulate our constrained optimization problem. Rsrc . We define the optimization problem as follows: P1 : max : Y + (i) i∈T r src subject to : 1T × Rin + N × Rnoise ≺B Rin = C × Rsrc Rnoise = C × Rsrc + Rsrc (i) rsrc ≥ Y Rsrc ≺ B. 5. j ∈ V C: Is an n × 1 matrix cij = 0 1 j has i in its path j does not have i in its path • Y: Is a scalar that acts as a slack variable for our objective function. N.1 gives us the relationship between the rate allocated to the sources and the receiver bandwidth.2. Rin represents total input rate incident on a receiver from all its child nodes. and C. r4 = rsrc (5) and r5 = rsrc .e. The new variables that have been introduced are the matrices Rin and Rnoise . i. • • • • B: Is an n × 1 vector representing the bandwidth available at each node i ∈ V Rsrc: Is an n × 1 vector representing the rate allocated to each source i∈V N: Is an n × 1 matrix. for node 2 the bandwidth constrained would be given by: (2) r3 + r4 + r5 + rsrc ≤ B (2) where (3) (6) (4) r3 = rsrc + rsrc . that denotes the presence of a noise edge nij ∈n between two nodes i. Our model of the receiver bandwidth consumption from Section 5. Rnoise represents the total rate that is incident on each receiver from its neighbors.1 and our application and network-level parameters represented by the matrices B.. . Before illustrating the constrained optimization problem let us define some additional notation that will be part of our optimization problem.Optimization and Distributed Control for Fair Data Gathering 165 to the set n is marked with a dashed line.2.2. Thus.

5 2. it can be extended easily to an arbitrarily weighted linear combination (to consider different trade-offs between fairness and total throughput). Hence. The receiver bandwidth capacity is presented in Table 5.5 Heterogeneous 1.0 1. i Note that the noise bandwidth rnoise ∈ Rnoise represents the total outgoing bandwidth of node i ∈ V. the heterogeneous case where at least one node has a receiver bandwidth capacity lower than all the other nodes in the network and the homogeneous case where all receiver bandwidths have the same TABLE 5.5 2.66 1.5 2.5 2. We use the topology presented in Figure 5. We consider two cases.2.66 4. we can obtain a solution to the rate allocation problem by using an appropriate numerical linear program solver.5 2.66 4.166 Modeling and Control of Complex Systems 1 2 3 4 5 6 7 8 9 FIGURE 5.2 Nine-node topology. Heterogeneous and Homogenous Node 2 3 4 5 6 7 8 9 Homogenous 2. The optimization problem is a linear program.5 2.66 1. The objective function is given here as the sum of the minimum rate and the sum rate.0 4.5 2.1 Optimal Source Rate Allocation for a Nine-Node Topology.0 .66 1.1.2 as an example and give the solutions obtained by running the linear program through a linear program solver in Table 5.

Thus. for the heterogeneous case. . or the topology itself might be changing due to node failures. The root has the maximum number of flows incident on it and hence the maximum fair bandwidth allocation it can perform is . Such a centralized solution is therefore not scalable. which could also change dynamically.i= j + ∀ j∈N(i) ∀k. the constraints in the optimization problem are related to the number of flows active in the network. motivating us to explore distributed approaches. It cannot cater for a dynamic environment where the receiver bandwidth capacity might be changing. j=k c jk . every source would be required to be given a rate at least equal to that offered by the bottlenecked node. c ij ∀ j. The bandwidth per flow allocation can then be obtained by dividing the capacity of the receiver by the sum of the number of children incident on the receiver and sum of all the children of its neighbors: B (i) c ij ∀ j. Moreover. The solutions obtained by solving the optimization problem give us the following insights: • For the homogeneous case the node with the bottleneck bandwidth would always be the root. The remaining capacity of the network can than be distributed among nodes that can still have a higher rate (that do not traverse the bottlenecked node) to maximize the capacity utilization. Homogeneous and Heterogeneous Node 1 2 3 4 Homogenous 20 20 20 20 Heterogenous 20 8 20 20 receiver bandwidth capacity. In these cases information would have to be repeatedly sent from the nodes in the network to the root and the result recalculated and propagated to the children.2 167 Receiver Bandwidth Capacity for Nine-Node Topology.Optimization and Distributed Control for Fair Data Gathering TABLE 5. Solving the optimization problem in a centralized manner for a wireless network presents us with some problems.i = r oot B r oot • For the heterogeneous case the node with the bottleneck bandwidth would be the node that has the least bandwidth per flow allocation capacity.

168 Modeling and Control of Complex Systems 5. Hence.3. we can continue our increments until we have consumed the bandwidth at every bottleneck receiver. nodes would send their total output rate to their parents. the algorithm is able to . This heuristic distributed algorithm would be iterative in nature. The output rate of each node (which is treated as “noise” for any neighboring node not logically linked to this node on the tree) would be calculated as follows: (i) (i) (i) (i) routput = rnoise = rin + rsrc . the termination condition for the algorithm would be when all the nodes in the network have been constrained.3 A Primal-Based Distributed Heuristic Consider our optimization problem. Whenever a child receives a message from a parent or a neighbor receives a message from another neighbor about the unavailability of bandwidth. their children. if we iteratively increase the source bandwidths. The pseudo code for the algorithm has been presented in Figure 5. The parent in turn would compare the total bandwidth consumed by themselves. If we start increasing the source rates ( j) of source i ∈ V. As is evident in Figure 5. informing one another of their current available bandwidth. rnoise ( j) 5.3. At the end of each iteration.2 to test the performance of our heuristic. as well as the (i) terms rnoise ∀ j that lie in the path of i to the root. At the end of each iteration every node would compare the total bandwidth consumed with the total receiver bandwidth as follows: (i) (i) (i) Bpending = B (i) − rin − rsrc − j∈N(i) (i) where rnoise is the total noise bandwidth incident on the node from its neigh(i) bors. Thus. We use the nine-node topology presented in Figure 5.1 Performance of the Primal-Based Algorithm Because the distributed algorithm is a heuristic we need to see how close the algorithm performs to the optimal. the child or neighbor should stop incrementing its rate and also inform all their children about the unavailability of receiver bandwidth. This in turn would imply an increase in the bandwidth consumption for all these nodes. If Bpending ≤ 0 node i would send a “CONSTRAIN” message to all its children as well as its neighbors. At each iteration the root node would send messages to the child nodes (which would then propagate the message through the tree) to increment their rate by an amount ε.5 for the nine-node topology. and their neighbors with the available bandwidth. and neighbors exchange messages. we will also start increasing the terms rin . We could formulate the above logic as a distributed solution by making the parents. children.4 and Figure 5.

Else inform parent and all neighbors of the o/p bandwidth Step 5: goto Step 1 (i) (i) FIGURE 5.5 1 0. Step 2: Termination Condition: If (constrainedi = TRUE. For j . Step 3: Rate Update: For i If (constrainedi = FALSE ) rsrc = rsrc + ε. j N i constrainedj = TRUE. j C i constrainedj = TRUE.Optimization and Distributed Control for Fair Data Gathering 169 Step 1: Parent initiates message to child to increment bandwidth.3 Primal-based heuristic algorithm. For j .4 Bandwidth allocation by the heuristic when all receivers have equal bandwidth capacity.5 Src 2 2 Source Bandwidths Allocated 1. of Iterations (υ = 0. i) end.01) 200 250 FIGURE 5. 2.5 0 0 50 100 150 No. . For k. Step 4: Checking Bandwidth Constraints: (i) (i) If (rin + rnoise > B(i)) – constrainedi = TRUE. k Cj constrainedk = TRUE.

The accuracy of the solution is solely dependent on the choice of ε.5 1 0. possibly an expensive proposition. The smaller the value of ε. Thus. Apart from the choice of the parameter for ε another drawback of the algorithm is that the source rates can only be increased and not decreased. 5. dual-based approach that gives an economic interpretation of the problem in terms of shadow prices.5 0 Modeling and Control of Complex Systems Source Bandwidth Allocated 0 50 100 150 200 250 No.3 we presented a heuristic for solving our constrained optimization problem. Hence. of Iterations (υ = 0. this heuristic is not flexible enough to handle such dynamics.5 2 1. achieve the optimum solution.01) 300 350 400 FIGURE 5.170 4 Src 2 Src 4 3. we present an alternative.5 3 2. more rigorously derived. the closer the rate allocation achieved is to the optimum.4 Dual-Based Approach In Section 5. The constrained optimization problem .5 Bandwidth allocation on a nine-node topology with the heuristic when receivers have different bandwidth capacities. At the same time the value of ε controls the rate of convergence of the algorithm. if the topology changes or the number of flows in the network changes we would have to restart the algorithm to make the source nodes converge to the optimum. In this section. the tuning of the parameter ε presents a trade-off between the speed of convergence and accuracy. Thus. This motivates us to explore another approach.

2 is our primal problem.Optimization and Distributed Control for Fair Data Gathering 171 P1 presented in Section 5. Because P2 is a linear program. In our approach we will derive its Lagrange dual and then work with the dual to come up with a distributed algorithm using the shadow price interpretation of the Lagrange multipliers in the dual. To aid our solution we can simplify our primal further. The Lagrange relaxation is as follows: L(Y. Hence. Rsrc . from the duality theorem [10] it can be seen that the duality gap of the primal and the dual would be zero. ν) = Y + i∈T (i) rsrc − λT × (C × Rsrc + N × Rsrc + N × C × Rsrc + Rsrc − B) − υ T (−Rsrc + Y). λ. the Lagrange function is ⎛ D(λ. ( j)∗ ∗ Let ζ ( Rsrc ) = rsrc + j∈C (i) j∈N(i) k∈C ( j) ( j)∗ j∈N(i) From the Lagrange dual function it can be seen that the subgradient with respect to λi is ∂L ∂λi ∗ = −(ζ (i) ( Rsrc ) − B (i) ) . We consider the maximum bandwidth constraint as our domain and relax the constraints in the primal to obtain the Lagrange dual. we consider the Lagrange dual function of the primal P2 (see Reference [10] for a treatment of Lagrange duality theory).2. We can perform the simplification by rewriting the primal in terms of Rsrc and Y as follows: P2 : max :Y + i∈T (i) rsrc subject to: j∈C (i) rsrc + j∈N(i) k∈C ( j) ( j) (k) rsrc + (i) rsrc ≥ Y ∀ i ∈ T Rsrc ≺ B. j∈N(i) (i) rsrc + rsrc ≤ B (i) ∀ i ∈ T ( j) In this section. υ) = max ⎝Y + Rsrc ≺B i∈T (i) + rsrc − B (i) (i) rsrc − i∈T ⎛ λi ⎝ j∈C (i) ( j) rsrc + j∈N(i) k∈C ( j) (k) rsrc + j∈N(i) (i) rsrc − i∈T (i) υi − rsrc + Y) = max + Rsrc ≺B Y 1− i∈T υi + + i∈T (i) rsrc 1 + υi − i∈C ( j) λ( j) + i∈C ( j) k∈N( j) λk λ( j) + λi i∈N( j) λi B (i) i∈T (k)∗ rsrc + (i)∗ rsrc + rsrc .

it can be seen that if [ζ i (R∗ ) − B i ] src is consistently positive. Now given the update mechanism of the subgradients. and decrement rsrc . we will tend toward the maximum. (i) the coefficient of rsrc denoted by μi is given by: ⎞ ⎛ μi = 1 + υi − ⎝ i∈C j λj + j∈Nk i∈C j λk + i∈N j λ j + λi ⎠ . we describe our distributed algorithm that is based on the maximization of the Lagrange dual function. the value of λi would keep increasing. at what point do the coefficients of a particular node i become negative. for a fixed value of rsrc . whose coefficients are negative.4. The objective here is to maximize the Lagrange dual function. we will maximize the Lagrange dual function by moving in the direction of the negative gradient. which would imply a decrement operation to be carried for the source rate of that specific node i? The answer to the above question lies in our observations of the subgradients of the shadow prices. Let us assume we do increment all source rates (this increment could be the same as that being done in our heuristic algorithm). Also to be noted is the fact that λi is a part of the negative term in the coefficient j ∀Rsrc j ∈ C i | j ∈ Ni | j ∈ C k .172 and the subgradient w. because the dual is a linear equation. because the Lagrange dual function is a linear combination of (λ(i) . In the subgradient technique the update of the shadow prices itself would be performed as follows: ∗ λi (t + 1) = [λi (t) + α(ζ i ( Rsrc ) − B i )]+ (i)∗ υi (t + 1) = [υi (t) + αt (rsrc − Y∗ )]+ where R∗ and Y∗ are the optimal values that would maximize the Lagrange src dual function given by dual function at the tth iteration. λi affects the coefficients of source rates of all nodes j for which either i is an intermediate node on the path from j to the sink. μ(i) src (i) and λi ). whose coefficients (i) are positive.1 Distributed Algorithm In this section. it implies that as long as we keep incrementing rsrc . As mentioned earlier if we trace the (i) graph represented by the Lagrange dual function. Also. that is. If we trace the graph in the direction of the negative gradient we are bound to hit the optimal. . k ∈ Ni . 5.r.t υi is ∂L ∂υi Modeling and Control of Complex Systems (i)∗ = rsrc − Y∗ . or i is a neighbor to node j or is a neighbor to a node k which is on the path of j to the sink. In the Lagrange dual function. We will use this to develop our distributed algorithm.

The pending bandwidth is the difference of the receiver bandwidth capacity and the total bandwidth consumed at the receiver by all flows originating either at the children or from neighboring nodes.3 .Optimization and Distributed Control for Fair Data Gathering 173 Thus. The available bandwidth is either the receiver bandwidth at that node divided by the number of children at that node or its parent’s available bandwidth. and a flag is set implying that the node is constrained. our analysis of the update mechanism for the subgradient gives us a clear estimate of the decrement required to be applied to the source rate of a constrained receiver.4. In step 3 we calculate the pending bandwidth at each node in the network. however. unlike in the case of the primal-based heuristic. Hence the source rate of the node is set to the available bandwidth of this constrained node. this implies that a constraint has been violated at one of the nodes on the path or the nodes neighboring the path from the source to the sink. The fast convergence to the optimal value compared to the heuristic presented in Section 5. In step 2 we initialize the source rate of all sources in the network to the available bandwidth at that source. we go ahead and increment the rate of the source node by the pending bandwidth. As can be seen. we need not be extremely cautious about incrementing the source rates to avoid exceeding the receiver bandwidth at a constrained node.2 Performance Evaluation The simulation results of the algorithm are shown in Figure 5. In step 1 we perform a breadth first search of the tree and at every node initialize the available bandwidth. Eventually the value of λi will be j large enough to make the coefficients ∀Bsrc that contain λi negative. We now present an algorithm designed on the above principles in Figure 5. In this case.8. here too we see that the sources whose flows are consuming bandwidth at the constrained receiver are the ones that receive a signal. The algorithm proceeds as follows. We also consider the pending bandwidth of neighbors of these intermediate nodes. the algorithm allows the source bandwidth of nodes to converge to the optimal value within 10 iterations.6.7 and Figure 5. The pending bandwidth at the specific node in question is then the minimum of all these bandwidths. else we repeat the algorithm from step 3. Similar to the primal heuristic. 5. it makes the gradient of λi positive. Hence. If the pending available bandwidth is negative. forcing them to reduce their rates. This in turn would result in a negative reinforcement to the affected nodes. In step 5 we check the constrained flag for all nodes in the network and if all nodes have been constrained the program terminates. which increases the value of λi . In case the pending available bandwidth is positive. if we allocate the source rates in such a manner that we exceed the bandwidth constraints at a particular receiver i. In step 4 for every node in the network we look at the pending bandwidth of nodes lying in the path from the specific node in question to the sink. whichever is smaller.

i Ck Ck constrainedi = TRUE else constrainedi = FALSE (i) rsrc = r (i) + pending _ bandwidth src Step 5: Checking termination condition: If (constrainedi = TRUE) i end else goto Step 3 FIGURE 5.174 Modeling and Control of Complex Systems Step 1: Perform a breadth first search of the tree.6 Dual-based algorithm. there are no iterative increments of source bandwidths (thus removing the effect of ε from the rate of convergence). i If ( pending _ bandwidth < 0) (i) rsrc = Bavailableconstrained_ node Cj i Cj i Nj j Nj j N k. of how much decrement needs to be applied to their source bandwidth. Moreover in the improved bandwidth allocation algorithm. Apart from the rate of convergence another advantage that the improved algorithm presents . is a result of the explicit knowledge that nodes get. and set the available bandwidth at each node i as follows: If Bavailablei < Bavailablej i C j Bavailablei Else Bavailablei = Bavailablej Step 2: Initialization For i (i) rsrc = Bavailablei = Bi Ci constrainedi = FALSE Step 3: Pending Bandwidth Calculation For i Bpending i = ζ i(Bsrc ) – Bi Step 4: Updating Source bandwidth: For i Pending _bandwidth = min(Bpending j). i N k. i constrained _node = arg min(Bpending j ). when one of the receivers becomes bandwidth constrained.

of Iterations 7 8 9 10 FIGURE 5.8 Bandwidth allocation by the distributed algorithm for nine-node topology with all nodes having the same receiver bandwidth. . of lterations 7 8 9 10 FIGURE 5.5 2 1.5 0 1 2 3 4 5 6 No.5 2 1. 3.5 3 2.Optimization and Distributed Control for Fair Data Gathering 4 175 Src 2 Src 3 Src 4 Bandwidth Allocated to Source Nodes 3.5 0 1 2 3 4 5 6 No.7 Rate allocation by the improved distributed algorithm for nine-node topology with node 2 having the constrained bandwidth.5 Bandwidth Allocated per Node 3 2.

pp. 23. H. [9]. 1999. 3. 1. Wang and K. Calderbank. 6. 5. S. 2005. vol. Kar. vol. and J. One direction we are currently pursuing is implementing the dual-based algorithm on a real wireless test-bed. to provide a direct comparison with the IFRC protocol proposed by Rangwala et al.176 Modeling and Control of Complex Systems as compared to the heuristic is its ability to adapt to the changes in network topology. Cambridge University Press. 104–116. 2001. National Academy Press. 4. in press 2007. “Cross-layer rate control for end-to-end proportional fairness in wireless networks with random access. At the same time we have the ability to increment source bandwidths when there is available capacity pending in the network.” Proceedings of ACM MobiHoc. Networking Wireless Sensors. X. In the improved algorithm. I: Basic algorithm and convergence. C. We believe that this kind of systematic modeling and optimization framework represents the future of protocol design in complex wireless networks such as sensor networks. pp. May 2005. Thus.” IEEE/ACM Transactions on Networking. By changes to the network topology we imply a deletion or addition of a new source or an intermediate node into the network. Areas Comm. Cambridge. Embedded Everywhere: A Research Agenda for Networked Systems of Embedded Computers. It is therefore of particular interest to see how such theoretically guided approaches will perform in a real-world example. National Research Council Staff.. “Optimization flow control. 2. Sel. Low and D. M. “Layering as optimization decomposition: A mathematical theory of network architectures. Krishnamachari. 861–875. . Chiang. Doyle. Low. DC.5 Conclusions We have presented an illustrative case study showing how a distributed convex optimization framework can be used to design a rate control protocol for fair data gathering in wireless sensor networks. S. References 1. we have the ability to decrease the source bandwidth when a bandwidth constraint is violated. Lapsley. if a new node were added. Chiang.” Proceedings of the IEEE. 7. in step 4 of the algorithm the nodes that were constrained would become unconstrained and increment their bandwidth to the excess capacity available. M. Washington. A. no. 5. 2005. E. H. B. adding capacity to the network.” IEEE J. “Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control. R.

Rangwala. Boyd and L. Gummadi. “A sub-gradient algorithm for maximal data extraction in energy-limited wireless sensor networks. Cambridge. Krishnamachari.” Proceedings of ACM SIGCOMM Symposium on Network Architectures and Protocols. June 2005. University Press. S. S. R. 2004. 10.Optimization and Distributed Control for Fair Data Gathering 177 7. Psounis. W. Communications and Mobile Computing. September 2006. April 2004. A. Ordonez. Ye and F. 9. Govindan. 8. Vandenberghe. and K. and Communications. R. Sridharan and B. Computing. “Interference-aware fair rate control in wireless sensor networks. “Max-min fair collision-free scheduling for wireless sensor networks. Cambridge. Convex Optimization.” Proceedings of the International Conference on Wireless Networks. 2004.” Proceedings of the IEEE International Conference on Performance. .

.

........ 181 Deployment of Networks with Fixed Data Sources and no Mobility ..................................................... 201 References....2 Incremental Iterative Approach .............2 Enumerating Topology Adjustment Options ................... 199 Acknowledgments .4............................ 187 6.....................................1 Problem Decomposition....................4................ 191 6..........6 Optimization Problems in the Deployment of Sensor Networks Christos G.............. or monitoring and tracking “target points” over a specific region.............................................2..........2 6.......................................... 188 6.......................3...3............................................................ 186 6.............. surveillance................... Cassandras and Wei Li CONTENTS 6..............................” Collected data are then further processed and often support 179 ..........................................................................2............... 189 6........................4 Deployment of Networks with Unknown Data Sources and Mobile Nodes ..................... 189 6............. 195 6...........1 Mission Space and Sensor Model ..............2 Optimal Coverage Problem Formulation and Distributed Control ...........3 Obtaining a New Deployment.................3...... 191 6........4........2.................. 179 Sensor Network Structure ......1 Introduction A sensor network consists of a collection of (possibly mobile) sensing devices that can coordinate their actions through wireless communication and aim at performing tasks such as exploration.1 6...................................................3....5 Research Issues in Sensor Networks.................................................3 Optimal Coverage Problem with Communication Costs...3 Introduction............ 182 6..........3... 201 6...................1 Determining the Bottleneck Node.............................. often referred to as the “mission space............................. 190 6................

often in adverse stochastic environments. In this chapter. we describe deployment problems for sensor networks viewed as complex dynamic systems. When the nodes are mobile. which may adapt to changing conditions in the sensing region. sensor networks are expected to realize a long-anticipated convergence of communication. they have limited on-board resources (e. and control functionality into such networks one can envision closing the loop on remote processes that would otherwise be inaccessible. because of limited energy.” for example. The dynamic version allows the coordinated cooperative movement of sensors. This leads to the basic problem of deploying sensors in order to meet the overall system’s objectives. This implies that optimization in designing and operating sensor networks is a real need and not a mere luxury.180 Modeling and Control of Complex Systems higher-level decision-making processes. not just computers. and they may be subject to communication constraints. We first consider a deployment setting where data sources are known. The performance of a sensor network is sensitive to the location of its nodes in the mission space. mechanisms are also needed to determine desired trajectories for the nodes over the mission space and cooperative control comes into play so as to meet specific mission objectives. optimal locations for them can be determined by an off-line scheme prior to the deployment. and control [1]. nodes are typically small and inexpensive. computing.. which is akin to the widely studied facility location optimization problem. they execute sensing processes or they are mobile. therefore. they allow us to interact with the physical world. Thus. typically deploying them into geographical areas with the highest information density. In addition. databases. Moreover. when it comes to measuring the performance of sensor networks. making a sensor network as a whole a challenging dynamic system. Second. power and computational capacity). the limited computational capabilities of nodes often make distributed control or optimization methods indispensable. This is also referred to as the coverage control or active sensing problem. The static version of this problem involves positioning sensors without any further mobility. First. We formulate a minimum-power wireless sensor network deployment problem whose objective is to determine the locations of a given number of relay nodes and the corresponding link flows . sensors must be deployed so as to maximize the information extracted from the sensing region while maintaining acceptable levels of communication and energy consumption. For example. the metrics can be quite different from those used in standard communication networks. Nodes in such networks are generally inhomogeneous. we recognize that nodes have finite lives and we often seek control mechanisms that maximize an appropriately defined “network lifetime. operating with limited resources. By inserting decision-making.” Part of such mechanisms may involve switching nodes on and off so as to conserve their energy or finding means to periodically replenish their energy supply.g. at least some nodes in such a network are “active. It should be pointed out that sensor networks differ from conventional communication networks in a number of critical ways. giving rise to new types of problems. In particular. they are characterized by dynamics. [2]. or human-generated data. Finally.

At any time instant. Finally. a sensor node may fall into one of the following states: 1.2 describes the basic structure of sensor networks. We then describe a distributed deployment algorithm (first proposed in Reference [4]) applied at each mobile node so that it maximizes the joint detection probabilities of random events. These data will eventually be sent back to the base station. The rest of the chapter is organized as follows. in addition to being combinatorially complex. denoted by B (also referred to as “data collection point” or “sink”). During cooperation. in Section 6. single-base-station data collection network. We assume that a mobile sensor has a limited range that is defined by a probabilistic model. we incorporate communication cost into the coverage control problem. computing. Relaying: a relaying node receives data from other nodes and forwards it towards their destination. denoted by R. the main objective of a sensor network is to collect field data from an observation region (the “mission space”).3. digitizes the information. processes it. Nodes in a sensor network collaborate to ensure that every source is sensed and that the data gathered are successfully relayed to the base station.g.2 Sensor Network Structure In its most basic form.4 we present a solution approach to the coverage control problem. . the temperature of a noxious gas exceeds a certain level or an object emits detectable data).5 we outline some fundamental research questions related to sensor networks and the convergence of communication. If the sensing field (or our perception of the sensing field) changes over time. This is shown to be a nonlinear optimization problem with a nonconvex cost function. 6. the sensing field is modeled using a density function representing the probability that specific events take place (e. we describe a deployment approach for sensor networks with fixed data sources and no mobility. Communication cost is modeled as the power consumption needed to deliver collected data from sensor nodes (data sources) to the base station using wireless multihop links. 2. In Section 6. the adaptive relocation behavior naturally follows from the optimal coverage formulation. and route it to a base station. the coverage problem we formulate trades off sensing coverage and communication cost. Finally. In this case. viewing the sensor network as a multisource. there may exist multiple data sources in R (also referred to as “target points” or simply “targets”). Next. and control. We describe a solution approach (first presented in Reference [3]) based on an incremental algorithm deploying nodes one at a time into the network.Optimization Problems in the Deployment of Sensor Networks 181 in order to minimize the total communication power consumption. Section 6.. we consider a setting where data sources are unknown. Thus. and stores the data in its onboard buffer. In Section 6. Sensing: a sensing node monitors the source using an integrated sensor.

). Power control. The links connecting clusterheads and the base station may have a larger data rate in order to support high-speed data transmission. a clusterhead refines the observation of the cluster’s region. However. it cannot reenter any other state. for a comprehensive overview. A sleeping node does not participate in either sensing or relaying.3 Deployment of Networks with Fixed Data Sources and no Mobility The deployment of sensor nodes may be either deterministic or random. Routing. most of the device is either shut down or works in low-power mode. that is. to optimize some network performance metric. there are also nodes acting as clusterheads. and son on. Once this is accomplished. and [2]. transport.182 Modeling and Control of Complex Systems 3. The most important problems where dynamic control-oriented methods may be used are 1. ultimately. Each clusterhead is in charge of a cluster of sensor nodes which is obtained by making a spatial or logical division of the network. it “wakes up” from time to time and listens to the communication channel in order to answer requests from other nodes. 3. it may produce some post-processed data and route them to the base station. 4. see References [5]. In this case. data-link. Once a node is dead. network. Instead of a flat structure. By aggregating the data sent from sensor nodes. that is. These nodes generally have more powerful data processing and routing capabilites. determining the destination node of data packets transmitted from some node i on their way to the base station. Then. Dead: a dead node is no longer available to the sensor network. research focuses on the relationship between sensor density and network performance. positioning the nodes so as to meet the goal of successfully transferring data from the sources to the base station and. Sleeping: for a sleeping node. besides sensors and a base station. Scheduling. that is. In this case. 6. determining the precise timing mechanism for transmitting packets of possibly different types. there are numerous operational control issues at different layers (physical. some sensor networks assume a more hierarchical one. a state transition to “sensing” or “relaying” may occur. Deterministic deployment takes place when the characteristics of the mission space are . [6]. at the expense of size and cost. It has either used up its energy or has suffered vital damage. The first and most basic problem we face is that of deployment. 2. The latter situation arises in applications such as reconnaissance and exploration where sensors are randomly dropped into the mission space and their exact location cannot be precisely controlled. Upon receiving a request. making decisions aimed at conserving a node’s energy in a way that benefits the network as a whole. that is.

e |E| ] are defined on every link j ∈ E with c j . . . which generally depends on the node locations. One of the commonly applied approaches is to discretize the mission space and place sensor nodes along grid points. In what follows. . in building monitoring). A capacity vector c = [c 1 . Alternatively. . Over this flow network. . . How many sensor nodes are needed to meet the overall system objectives? 2. e) be a flow network with an underlying directed graph G = (V. . one can formulate a nonlinear optimization problem and seek to exploit the structure of a sensor network in order to develop decomposition approaches to solve it. .1) (6. . When data sources change or some part of the network malfunctions. we describe such an approach. c. . we can formulate an optimization problem that minimizes the total cost by controlling on each link j the locations of sensor nodes xs( j) and xt( j) and the data rate f jm from each source m = 1. m (6. . . . introduced in Reference [3]. To collect data at each data source. . . Let W = (V. . E).3) . . with location x0 ∈ R2 . M) and a single base station B. . xt( j) ) f jm a ij f jm = −rm dim ∀i. . how do we adjust the network topology and sensor deployment? These questions can be resolved by an off-line scheme that is akin to the widely studied facility location optimization problem.g. Suppose there are N active sensor nodes and each has location xk ∈ R2 (k = 1. In addition. M). p j ∈ R+ . . N). how do we precisely deploy these nodes in order to optimize network performance? 3. . As all grid points and interconnected links must be considered. because a data source may be far from the base station and the distance may exceed the range of radio communication. The resulting optimal deployment problem can be formulated as a linear program. we consider M data sources residing at points sm ∈ R2 (m = 1. . Fundamental questions in this case include: 1. Each data source has a fixed position and a given data rate denoted by rm (m = 1. . M: M minxi s. .2) j∈E M f jm ≤ c j ∀ j ∈ E m=1 (6. c |E| ] and a cost vector e = [e 1 . E. this results in significant combinatorial complexity. Adopting the source/base station structure of a sensor network discussed earlier..fm m=1 j∈E e j (xs( j) . For a given network with a certain number of sensor nodes. . a sensor must be deployed at its location.Optimization Problems in the Deployment of Sensor Networks 183 known in advance (e. we also need to deploy a certain number of sensor nodes that work as relays. where V is the set of nodes and E is the set of links.t. Each link j starts at node s( j) and ends at node t( j) and e j denotes some cost metric per unit of data.

in which case we can write: E tx = α11 + α2 d n . xt( j) ) = e(||xs( j) − xt( j) ||) = α1 + α2 ||xs( j) − xt( j) ||n (6. . A property of e j (·) as formulated in Equation (6.8) is that it is a convex function of both xs( j) and xt( j) . a minimum-power topology was proposed based on the assumption that there is no constraint on the number of intermediate sensor . . as long as the convexity property is preserved.2). Therefore. M). flow non-negativity. E se = α3 (6. . and dm = [dim ] is the flow balance vector for data source m such that ⎧ ⎨ −1 i = 0 dim = +1 i = m ⎩ 0 otherwise.4) (6. α12 is the energy/bit consumed by the receiver electronics. xt( j) ) can be specified based on a model whose key parameters are the energy needed to sense a bit (E se ). . In this case. and the fixed locations of the M sources. |E| [7]: ⎧ ⎨ +1 if arc j leaves node i a ij = −1 if arc j enters node i ⎩ 0 otherwise.184 Modeling and Control of Complex Systems f jm ≥ 0 ∀ j. In Reference [8]. In the flow balance Equation (6. . (6. and transmit a bit over a distance d (E tx ). N and j = 1. α2 accounts for energy dissipated in the transmit op-amp. the energy consumed by a node acting as a relay that receives a bit and then transmits it a distance d onward is e(d) = α11 + α2 d n + α12 ≡ α1 + α2 d n . the link cost e j (xs( j) . The remaining three equations represent the link capacity constraints. . m xm = sm ∀m. . in Equation (6. and α3 is the energy cost of sensing a bit.1). .8) (6. where a component f jm of the flow vector fm denotes the data rate on link j ( j ∈ E) that originates from source m (m = 1. . receive a bit (Er x ). the decision variables are fm and xi . A = {a ij } is the node-link incidence matrix of graph G such that for all i = 1. . A 1/d n (n ≥ 1) path loss is commonly assumed [8].7) for each link j ∈ E that starts at node s( j) and ends at node t( j). xt( j) ) denotes the transmission energy consumed per unit of data. . we shall consider a particular problem in which our objective is to determine a minimum power deployment. Hence. Er x = α12 .5) In Equation (6. .1) we have: e j (xs( j) . The solution of the deployment problem is in fact robust with respect to the specific form of e j (·).6) where α11 is the energy/bit consumed by the transmitter electronics. Although this formulation is general. The function e j (xs( j) .

a large number of relay nodes are needed. The theoretical optimal number of hops. For example. xt( j) ). the fact that in a sensor network data traffic always flows from the sources towards the base station.Optimization Problems in the Deployment of Sensor Networks 185 nodes. The nonlinearity of the cost function as well as the coupling of these two problems make Equation (6. the most energy-efficient path between a data source and the sink is a straight line with multiple hops. a natural idea is to minimize the power consumption by (1) making two or more data flows share some relays. current sensor networks indeed operate with light traffic and the actual data flow over a link is unlikely to reach the link’s capacity. the solution approach proposed in Reference [3] uses a decomposition method exploiting two facts: 1.1) difficult to solve. In addition. which couples two traditional optimization problems: if flow vectors fm are given and Equation (6. the convexity of the link costs e j (xs( j) .1). Equation (6.1) can be viewed as a facility location problem.9) However. we also relax the capacity constraint (6. When this capacity is not reached.3). and the minimumpower topology is constructed by building such a path for each data source in the network. or (2) deploying fewer relays on some route. Under this assumption.1) is optimized only over the locations of sensors xi . including the node that D acts as a sensor at a data source s. because each data flow is independent and shares no relays with the rest. in constructing this minimum-power topology. n are defined by the node energy model above. is given by K opt = dchar where D = s − b is the distance between s and b and dchar is the “characteristic distance” given by: dchar = n α1 α2 (n − 1) where αi . which allows us to reduce the feasible space of fm by only considering flow vectors that form a tree structure over the network. This topology consumes the least power since each data flow rm takes the shortest path toward the sink and by optimizing the number of intermediate nodes. on the other hand. K opt . it is also easy to see that no links other than those in a tree structure are ever used . n − 1 dchar (6. This brings us back to the minimum-power sensor deployment problem (6. As an alternative. the power consumption on this shortest path is also minimized. we have found that using standard Lagrangian relaxation methods does not substantially reduce complexity because this coupling is tight. The corresponding lower bound for power consumption between some source sm and the base station is (Dm = sm − b ): Pm = α1 n Dm − α12 rm + α3rm . if sensor locations xi are given and fm are the only decision variables. When the number of nodes is limited. and 2. it can be reduced to a minimum-cost flow problem.

f M ) m f (6.13) and (6. .13) (6.10) s. . therefore guaranteeing the tree structure of the network.15) . f M ) for a given set of flow vectors.xm = sm . Equation (6.1) is a weighted sum of all link costs. . . . m j∈E f jm ∈ [0.14) build a unique path between each data source and the base station. . hence. . .t.1) becomes: min g(f1 . Because the cost e j (·) of link j is a convex function of the location of its end points xs( j) and xt( j) and the total cost in Equation (6. xt( j) ) ∂ xi . the first step is to solve Equation (6. .14) s. rm ] ∀ j. (6. this implies that for a given set of flow vectors fm . M g(f1 .12) (6. In this formulation. and keeping in mind the network tree structure and the elimination of Equation (6. .t. Subproblems (6. . in particular. 6. . Starting with a feasible set of flow vectors f1 . .12) still captures all flow balance equations. f M ) = min xi m=1 j∈E f jm e j (xs( j) . j∈E a ij f jm = −rm dim ∀i. . . . 1 M With g(f . More formally. . 1. . . .10). the cost will also be a convex function of the locations of sensors xi . It views the network as a dynamic system with “inner forces” applied to each node. An efficient gradient-based method for doing this (referred to as the “inner force method”) is detailed in Reference [3]. f M . m b ij f jm ≤ rm ∀i. . which provides information used to update the flow vectors.1 Problem Decomposition The proposed decomposition method is motivated by the special structure of the problem. . m = 0.11) suggest an iterative approach for solving the original problem. This convexity permits the design of a fast algorithm to find the optimal sensor locations xi∗ and the corresponding minimal cost g(f1 . . . m where b ij = 1 if arc j leaves node i 0 otherwise. f ) as above. whereas constraints (6. a force applied to node i by link j is defined as: M Fij = − m=1 f jm ∂e j (xs( j) . the power consumption increases as well). the main problem (6. M.3.10) and (6.11) (6.186 Modeling and Control of Complex Systems [7] (if any such link is used. xt( j) ) (6. the distance to the sink is increased.3).

. and 3. As numerical results illustrate (see Reference [3]). we have at our disposal the lower bound (6. there is still a difficulty that prohibits its implementation.1 graphically summarizes the overall process. which generally implies the existence of multiple local minima. Although this idea is straightforward.12) with a ij .2 Incremental Iterative Approach In an incremental deployment. The next step is to add a node and determine its optimal location while preserving the network’s tree structure. we follow a different approach. Thus. Finally. that is. the number of possible tree structures increases exponentially and constructing an efficient algorithm to find the optimal topology is a crucial issue. the price to pay is that global optimality can no longer be guaranteed. The second step is to solve subproblem (6. the power improvement of each case will be checked and the one that provides the greatest improvement will become the new configuration. The difficulty is that g(f1 . as discussed above. . Unfortunately. and construct the corresponding tree structure with the base station as its root.Optimization Problems in the Deployment of Sensor Networks 187 Each such force causes node i to move toward the steepest descending direction that leads it to an equilibrium point (unique.9). as the number of nodes increases. each located at one of the M sources. . determining the optimal location of the new node and the corresponding flow vectors. 2. . The addition of a node and determination of its optimal location is a threestep process. since. b ij determined by this simple initial tree structure. Figure 6. . we know that the optimal deployment with an unlimited number of nodes consists of multihop straight-line paths between every data source and the base station. . However. . The associated flow vectors f1 . f M ) is a nonconvex (and nonconcave) function of the flow vectors f1 . f M are immediately given by Equation (6. . we determine which part of the network needs a new relay the most. f M . Then. based on the idea of 1. First of all. . due to convexity) where all forces applied on i are balanced out. the initial step is to begin with M nodes. find the optimal routing from all data sources to the sink in the tree structure resulting from the first step. 6. incrementing the number of nodes one at a time. The approach proposed in Reference [3] is based on a local topology adjustment. this lower bound is rapidly approached by the proposed algorithm and with a number of nodes significantly smaller than the associated number K opt given earlier.3. . all possible topology adjustments around this area are obtained. .9) that our solution can be compared to. repeating this process until the number of available nodes N is reached or the cost is sufficiently close to the known lower bound (6. .11). . thus the size of the problem is limited.

… . fM)TK (x1. …. xM ) K<N Yes No 1. … fM )t* and ( x1. … . f t=1. the inner forces on a link contain the gradient information of the power consumption on this link. t = 1.….188 Modeling and Control of Complex Systems Initialize with K = M nodes. Before adding a new node. fM ) … Solve Subproblem 1 for (x1. Add node and enumerate candidate tree structures and corresponding flows ( f1. solve min-cost flow problem for (f1. …. TK t 1 M ) (f1. …. xK+1)1 and g1 g1 (f1. xK+1)t* K = K+ 1 Terminate with K = N nodes with optimal solution (x1. The bottleneck node is determined by checking the inner forces applied to nodes: as mentioned earlier. Thus. TK (f1. …. the greater the power savings by shortening this link. … fM ) and ( x1. if a . xN ) and (f1. …. …. fM)t .2. … fM) FIGURE 6. ….3.1 Incremental iterative node deployment process. …. Locate bottleneck node 2. fM)1 Solve Subproblem 1 for (f1.1 Determining Bottleneck Node A bottleneck node is defined as a node around which a new relay and corresponding new topology would bring the most improvement to the power conservation of the whole network. fM ) Solution for Subproblem 2 t * = arg min g (f . xK+1)TK and gTK gTK(f1. …. 6. all nodes in the network have reached their equilibrium points.…. The larger an inner force applied by a link on the node.

. the number of all possible new topologies is 3 · 2m−1 − 2 where m is the number of children of the bottleneck node.3 Obtaining a New Deployment The outcome of step 2 when the current number of nodes is L < N is a number of possible new network tree structures. With these observations in mind. . the solution of problem (6.t∗ . ..11) reduces to comparing all such costs and determining t ∗ = arg mint=1. the power consumption on this link will improve greatly. 6. .. we should point out that the incremental deployment approach above is based on a centralized scheme. . . . .10) is solved (as described earlier). i = 1. Intuitively. i = 1.. f M ) t .t . we define the sparseness around node i as: SPi = j∈V(i) ||Fij || with Fij given in Equation (6. and the flows (f1 .15) and the bottleneck node k is defined to be the node that has the greatest sparseness. i=0. the insertion of a new relay also means adding a new link. . . . . . It assumes the existence of a controller with powerful computational capabilities. In the case of mobile nodes but still known data sources. . . there is no guarantee that the optimal location of the new node is indeed in the vicinity of the bottleneck node as defined above. In closing. . .2. perfect information of the whole network. L + 1 and cost gt (f1 . it follows that by shortening one of its links.. N Obviously. the corresponding node locations xi. we can visualize the area around this sensor node as being more sparse. . t = 1. . each with associated flow vectors (f1 .. we need to consider topologies generated when an additional relay and link are present in the target area. . Once it is determined. . . the precise placement of the new relay must be determined.2. Because we are working on a tree structure. L + 1. giving the corresponding optimal node locations xi. 6. TL gt (f1 . For each such structure t. Thus. . and there is a higher need for a new relay in this region. . and unlimited control over all sensor nodes. TL . . subproblem (6.. . Next. but the cost involved on other links will overwhelm this improvement.. f M ) t∗ . so the solution implied by this approach is generally suboptimal. That is.2 Enumerating Topology Adjustment Options The bottleneck node indicates the area that needs a new relay the most. . f M ). f M ). .3. say TL . k = arg max SPi . an open problem is the development of distributed algorithms for sensor node deployment through which an individual sensor node can autonomously decide how to move based on its own local knowledge of the overall system. .3. . As shown in Reference [3].Optimization Problems in the Deployment of Sensor Networks 189 node is balanced under several inner forces that have relatively larger magnitude.

4 Deployment of Networks with Unknown Data Sources and Mobile Nodes When nodes are mobile and data source targets are either unknown or are mobile as well. sensors must be deployed so as to maximize the information extracted from the mission space while maintaining acceptable levels of communication and energy consumption. There are also efforts that rely on a centralized controller to solve the coverage control problem. optimal locations can be determined by an off-line scheme that is akin to the widely studied facility location optimization problem. The mission space will now be modeled using a density function representing the frequency in which specific events take place (e. The static version of this problem involves positioning sensors without further mobility. tend to overlook the fact that the overall sensing performance may be improved by sharing the observations made by multiple sensors. The movement of sensors not only impacts sensing performance.g. In addition.190 Modeling and Control of Complex Systems 6. Partition-based deployment methods. Much of the active sensing literature [11] also concentrates on the problem of tracking specific targets using mobile sensors. a sensor network is not only required to sense but also to collect and transmit data as well. typically deploying them into geographic areas with the highest information density. In particular. In Reference [9] a coverage control scheme is proposed that aims at the maximization of target exposure in some surveillance applications. A centralized approach does not suit the distributed communication and computation structure of sensor networks.[11]. The dynamic version allows the coordinated movement of sensors. on the other hand. data are .. In Reference [10]. the problem of deploying sensors in order to meet the overall system objectives is referred to as the coverage control or active sensing problem [9]. and in Reference [12] a heuristic algorithm based on “virtual forces” is applied to enhance the coverage of a sensor network. the authors develop a decentralized coverage control algorithm based on Voronoi partitions and the Lloyd algorithm. the combinatorial complexity of the problem constrains the application of such schemes to limited-size sensor networks. the problem is often viewed in that framework. but it also influences other quality-of-service aspects in a sensor network. Some of the methods that have been proposed for coverage control assume uniform sensing quality and an unlimited sensing range. Because of the similarity of coverage control with facility location optimization. another issue that appears to be neglected is the cost of relocating sensors. This motivates a distributed coverage control approach for cooperative sensing [4].[10]. which may adapt to changing conditions in the mission space. For this reason. and the Kalman filter is used extensively to process observations and generate estimates. both sensing quality and communication performance need to be jointly considered when controlling the deployment of sensors. Finally. especially those related to wireless communication: because of the limited on-board power and computational capacity.

. (6. The received signal strength generally decays with x − si . λi are determined by physical characteristics of the sensor.4. A simple example is pi (x) = p0i e −λi ||x−si || where the detection probability declines exponentially with distance. it emits a signal and this signal is observed by a sensor node at location si .. the distance between the source and the sensor. s)d x. Assuming that sensors make observations independently. In the mission space . N. . or it could be the probability that a variable sensed (e.Optimization Problems in the Deployment of Sensor Networks 191 generated at a certain point). A deployment algorithm is applied at each mobile node such that it maximizes the joint detection probabilities of random events. the adaptive relocation behavior naturally follows from the optimal coverage formulation. in the case that the mission space (or our perception of the mission space) changes over time.4. the joint probability that this event is detected can be expressed by: N P(x. x ∈ .2 Optimal Coverage Problem Formulation and Distributed Control When deploying mobile sensor nodes into the mission space. .1 Mission Space and Sensor Model We model the mission space as a polyhedron ⊂ R2 . At the two extremes. (6. s) = 1 − i=1 [1 − pi (x)] . . R(x) may be the frequency that a certain type of data source appears at x. R(x) satisfies R(x) ≥ 0 for all x ∈ and R(x) < ∞. si ∈ R2 . . This motivates the formulation of an optimal coverage problem. When an event occurs at point x. We assume that the event density function is fixed and given. .17) . Depending on the application. s N ). 6. this allows us to model a mission space with no information on target locations (using a uniform density function) or one with known locations (using a probability mass function). . we represent this degradation by a monotonically decreasing differentiable function pi (x). Similar to the model in Reference [13]. that captures the frequency or density of a specific random event taking place (in Hz/m2 ). there are N mobile nodes located at s = (s1 . and p0i .16) The optimal coverage problem can be formulated as a maximization of the expected event detection frequency by the sensor nodes over the mission space : max s R(x) P(x. however. over which there exists an event density function R(x). we want to maximize the probability that events are detected. We assume that a mobile sensor node has a limited range that is defined by a probabilistic model. temperature) at x exceeds a specific threshold. 6. when an event takes place at x and it is observed by sensor nodes. . i = 1.g. which expresses the probability that sensor i detects the event occurring at x.

this solution is only suitable for networks of limited size. we can use i to replace in / Equation (6.k =i [1 − pk (x)] dpi (x) si − x dx ddi (x) di (x) (6. In Equation (6. (6.18) When taking partial derivatives with respect to si .20). N. we have: ∂F ∂si = R(x) ∂ P(x. s) ∂si d x. Since pi (x) = 0. As shown in Figure 6. .21) defines node i’s region of coverage. However. s)d x. In addition.2. To address these difficulties. In this case. when the distance between nodes i and k is greater than 2D. We denote the objective function in Equation (6.16). every point x in i satisfies . di (x) ≥ D ddi (x) (6. instead of using a centralized scheme. a necessary condition for the detection probability pk (x) to be greater than 0 is dk (x) ≤ D. d pi (x) = 0 for all x s. d pi (x)/ddi (x) = 0 for all x ∈ i . Let: pi (x) = 0. This approximation is based on the physical observation that when di (x) is large. since it requires global information such as the value of R(x) over the whole mission space and the exact locations of all other nodes. (6. which is represented by i = {x : di (x) ≤ D}. The problem may be solved by applying a nonlinear optimizer with an algorithm that can evaluate integrals numerically. we first truncate the sensor model and constrain its sensing capability by applying a sensing radius. It is hard for a mobile sensor node to directly compute Equation (6.t.192 Modeling and Control of Complex Systems In this optimization problem. we will develop a distributed control method to solve the optimal coverage problem. a centralized controller with substantial computational capacity is required. the base station is a likely candidate for such a controller. the evaluation of integrals remains a significant task for a sensor node to carry out. Thus.21) is the emergence of the concept of neighbors.19) can be rewritten as: ∂F ∂si N = R(x) k=1. Another byproduct of using Equation (6.20). . In a mobile sensor network. both the complexity of the optimization problem and the communication overhead will make this centralized scheme infeasible.17) by: F (s) = R(x) P(x. pi (x) = 0 for most sensing devices. . Thus. . In view of Equation (6.19) If this partial derivative can be evaluated locally by each mobile node i. the partial derivative (6. the controllable variables are the locations of mobile sensors in the vector s. Otherwise. then a gradient method can be applied that directs nodes towards locations that maximize F (s).21) where D denotes the sensing radius. i = 1. Equation (6. for a point x ∈ i and a node k = i.20).20) where di (x) ≡ x − si .

k = i}.22) The final step in making Equation (6. ˜ pi (x) = pi (u. v) ˜ ddi (x) ˜ where Ri (u.22) become: ˜ R(x) = Ri (u. ddi (x) di (x) (6. v). A (2V + 1) × (2V + 1) grid is applied over the coverage region i with V = D/ . and the unit length being . the terms in Equation (6. . the transformation that maps (u. As nodes are deployed and data are collected. Equation (6. N. If we define a set / Bi = {k : si − sk < 2D. thus pk (x) = 0 and [1 − pk (x)] = 1 for all x ∈ i . where << D is the resolution of the grid. v) onto the global coordinate system is x = si + [ u v ]T . . .20) reduces to: ∂F ∂si = i R(x) k∈Bi [1 − pk (x)] dpi (x) si − x d x.22) computable is to discretize the integral evaluation. an individual node may update its local map through merging new observations into its perception. dk (x) > D. and by exchanging information with nearby neighbors. d pi (x) = pi (u. After applying Equation (6. let (u. On the grid of each node i.Optimization Problems in the Deployment of Sensor Networks 193 2D FIGURE 6. k = 1.2 Defining neighbor sets. . v) denote the location of a point x.20). with its origin located at si . its axes parallel to the grid’s setting.21) and using Bi . then any sensor node k ∈ Bi (k = i) will not contribute to the integral in Equation (6. Upon switching to this local coordinate system. all sensor nodes start with the same copy of an estimated event density function at the beginning of the deployment. . In this local coordinate system. a Cartesian coordinate system is defined. In a typical dynamic deployment application. v). Then. v) indicates node i’s local perception (map) on the event density of the mission space.

By doing so. By applying the grid and the coordinate tranformation. In both cases. v).194 Modeling and Control of Complex Systems We also can rewrite the product term in Equation (6. v) and pi (u.23) depends on the scale of the grid and the size of neighbor set Bi . the parameters p0i and λi if pi (x) = p0i e −λi ||x−si || ) and properly rescaling ˜ pi (u. v) using stored ma˜ ˜ trices. the complexity is quadratic in V.22) can be rewritten as: ∂F ∂si1 ∂F ∂si2 ≈ ≈ 2 ˜ ˜ ˜ Ri (u. 2 + v2 u u=−V v=−V V V V V 2 (6. v) pi (u. The most common approach in applying a gradient method is to determine the next waypoint on the ith mobile node’s motion trajectory through: sik+1 = sik + αk ∂F ∂sik (6. Through acquiring key sensor model parameters from neighbors (e. v) pi (u. where k is an iteration index.v− sk2 − si2 ˜ ≡ Bi (u. v) where (u − sk1 −si1 . v) and pi (u. v) as two matrices in the on-board memory of a sensor ˜ ˜ node. and the corresponding complexity is O(V 2 ).g. v)v √ . node i can also easily evaluate Bi (u. v) and the sensor model. The gradient information above provides a direction for a mobile node’s movement. v) Bi (u. In the worst case. v − sk2 −si2 ) are the coordinates of x in the kth node’s local coordinate system. The precise way in which this information is used depends on the choice of motion scheme. The best case occurs when there is no neigh∂ i . v) and pi (u.23) is that the values of pi (u.. the computation effort in repeatedly evaluating Equation (6. The computational complexity in evaluating the gradient shown in Equation (6.23) These derivatives can be computed easily by mobile sensor nodes using only the local information available. v) are uniquely ˜ determined by (u.24) bor for node i.22) as: [1 − pk (x)] = k∈Bi k∈Bi 1 − pk u − ˜ sk1 − si1 .g. An advantage of switching to the local coordi˜ nates in Equation (6. v)u √ 2 + v2 u u=−V v=−V ˜ ˜ ˜ Ri (u. and the step size αk is selected according to standard rules (e. see Reference [14]) in order to guarantee the convergence of motion trajectories.. Equation (6. node i has N − 1 neighbors and the number of operations F needed to compute ∂s is O( NV 2 ). This motivates the storage of pi (u. v) Bi (u.23) is drastically reduced.

e. Assuming a flat network structure (i. We shall use once again the link energy model (6. and the latter depends on the node’s location. . . Then. the overall objective function is written as: J (s) = w1 F (s) − w2 G(s) ∂ In order to derive partial derivatives ∂sJ as done earlier. the optimal coverage problem can be revised by combining sensing coverage and communication cost as follows: ⎧ ⎫ N ⎨ ⎬ max w1 R(x) P(x.26) s ⎩ ⎭ i=1 where w1 . Let c i (s) be the total power consumed by the network in order to deliver a bit of data from node i to the base station.6) to (6. . the coverage control mission includes the task of forwarding data to the base station.28) .7) of Section 6.18). which can be expressed as: ∂s ∂G ∂si = c i (s) dri (si ) + dsi N rk (sk ) k=1 ∂c k (s) ∂si .3 Optimal Coverage Problem with Communication Costs 195 Besides sensing and collecting data. One can think of w1 as the reward for detecting an event and w2 as the price of consuming a unit of energy.27) so that.4. (6. N. the cost of communication comes mainly from the power consumption for wireless transmissions. Here we assume that ri (si ) is proportional to the frequency events are detected.2). Note that ri is defined as a function of si because the amount of data forwarded at i is determined by the number of events detected.3. Let us denote the communication cost by: N G(s) = i=1 ri (si )c i (s) (6. s)d x − w2 ri (si )c i (s) (6. that is..Optimization Problems in the Deployment of Sensor Networks 6. w2 are weighting factors.25) where α3 (bits/detection) is the amount of data forwarded when the sensor node detects an event. The base station location is represented as s0 ∈ R2 and the data rate originating from the ith sensor node is denoted by ri (si ). we shall focus on the i i evaluation of ∂G . recalling Equation (6. no clusterheads as discussed in Section 6. i = 1. . ri (si ) = α3 R(x) pi (x)d x (6.

With properly selected step sizes. c i (s).7) with α1 = 0. six sensors establish a formation and move toward the center of the mission space.196 Modeling and Control of Complex Systems In this expression. sensors also maintain wireless . We consider two distinct cases. In addition. 20]. α2 = 0. 0]. The sensing radius is D = 5. At time t = 0. . we consider an example with a team of six mobiles waiting to be deployed into a 40 × 40 (meter) mission space. and n = 4.0. that is. both sensing coverage and communication cost are included (w1 .3. both ri and ∂ri can be obtained by applying the same ∂si method as the one described earlier. During its movement. mobile sensors reside at s0 = [0. Each mobile node is equipped with a sensor whose detection probability is modeled by pi (x) = p0i e −λi ||x−si || where p0i = 1.0. v) pi (u. so that sensors expand the overall area of sensing and at the same time jointly cover the points with high event density.3 presents several snapshots taken during the deployment process of the first case. as illustrated by the circles in Figure 6. V V r i ≈ α3 dri ≈ α3 dsi1 dri ≈ α3 dsi2 2 u=−V v=−V V 2 u=−V v=−V V 2 u=−V v=−V V V ˜ R (u. N. R(x) = R0 − β x − x0 (6. Figure 6. is determined by the way in which data forwarding paths are constructed. λi = 1. it collects 32 bits of data and forwards them back to the base station (so that α3 = 32 in Equation (1. mobile sensors will finally converge to a maximum point of J (s).26). In the second case. To illustrate the use of this distributed deployment algorithm. no communication cost is considered. In the first case. v) pi (u. . A mobile sensor also has a wirelesstransceiver whose power consumption is modeled by Equation (6.6)).3a.1. w2 > 0). w2 = 0 in the optimal coverage formulation (6. v)u ˜ √ 2 + v2 u ˜ R (u. Upon a sensor detecting an event. In this case.0 for all i = 1. these quantities are obtained in Reference [4] and each sensor uses gradient information to direct motion control as in Equation (6. 2 + v2 u (6.29) The only term remaining to derive in ∂G is c i (s) and its gradient. which corresponds to w1 > 0. That is. the formation keeps evolving. The cost of ∂si delivering a bit of data from i to the base station.30) where R0 = 3. v) ˜ ˜ R (u. recalling that x = si + [ u v ]T . v)v ˜ √ . The event density function R(x) is given by. For a typical shortest-path-based routing scheme. . v) pi (u. the precise routing protocol used. x0 = [0.24) with ∂ J /∂sik replacing ∂ F /∂sik . .001nJ/bit/m4 .01nJ/bit.0. the event density of a point x (x ∈ ) declines linearly with the distance between x and the center point x0 of the mission space. β = 0. Starting with Figure 6.

Figure 6.26). The team of sensors reaches a stationary deployment as illustrated in Figure 6. but they also maintain an economical multihop path to the base station. In contrast to the final formation of the first case (Figure 6. A direct observation is that in both cases.3 as links between sensor nodes and the base station. Figures 6. This is shown in Figure 6.3d).3d.3 Sensor deployment without communication cost consideration. The team of sensors finally converges to a stationary formation as shown in Figure 6. sensing coverage increases monotonically . The other two sensors are aligned as relays to support the communication with the base station. only four sensors gather around the center of the mission space.4d.5 depicts the change in sensing coverage (measured by the expected frequency of event detection) when sensors move towards the optimal deployment. a critical difference can be observed in the formation of mobile sensors: sensors not only move towards the area with high event density.6 demonstrate the sensing coverage and communication cost associated with the two cases previously shown.5 and 6.4. It can be seen in this symmetric formation that all six sensors are jointly sensing the area with the highest event density.Optimization Problems in the Deployment of Sensor Networks 197 (a) (b) (c) (d) FIGURE 6. The corresponding deployment simulation results are shown in Figure 6. Comparing with the first case. communication with the base station. We incorporate communication cost into the optimal coverage formulation by setting w2 = 0.0008 and w1 = 1 − w2 in Equation (6.

. cost 0 20 40 60 t 80 100 120 FIGURE 6.198 Modeling and Control of Complex Systems (a) (b) (c) FIGURE 6. cost Including comm. (d) 100 90 Frequency of Detection (Hz) 80 70 60 50 40 30 No comm.5 Comparison of sensing coverage.4 Sensor deployment with communication cost consideration.

Compared to the communication cost of the first case (1. one expects that global optimality is intricately connected to properties of the event density function and the sensor model adopted. If no communication cost is considered during sensor deployment. which corresponds to a 7.6 0. 6. when sensors reach optimal deployment.6 1. If communication cost is considered.4 Power (nW) No comm.2 0 0 20 40 60 t 80 100 120 FIGURE 6.47 Hz. One issue that we have not addressed explicitly in the development of this distributed cooperative coverage control approach is that of the global optimality of the gradient-based algorithm involved.26) actually trades off sensing coverage for a lower communication cost. only 84. This remains a topic of ongoing research. sensing coverage reaches a maximum at 91.8 0. cost 1. the final power consumption is 8.Optimization Problems in the Deployment of Sensor Networks ×105 199 2 1.2 some of the major design and control problems related to sensor networks. In particular. because the optimal coverage formulation (6.4 0.8 1. This trade-off can be further examined by looking at Figure 6.5 Research Issues in Sensor Networks We mentioned in Section 6. there is a 95. There are numerous open . This coverage loss is natural.73% power saving.2 1 0. with the evolution of formations.877 × 105 nW). which include the use of cooperative control techniques as they pertain to the case of mobile nodes. cost Including comm. However. in the case that communication cost is considered.6 Comparison of communication costs.01 × 103 nW.36% coverage loss.74 events can be detected per second.6.

we mentioned in Section 6. see also Reference [15]. Second..e. the processes of data fusion and control are traditionally based on a synchronized time structure. First. A second research issue of particular importance to control theory is the obvious shift from sensor-poor to data-rich control systems. identifying the precise location of a sensor network node) when nodes are mobile is one that deserves in-depth study. we also mentioned that one form of power control is to switch the state of sensor nodes between “sleeping” and “sensing” or “relaying. To do so. but that is being challenged by computational models that rely on event-driven processes and by the simple intuitive observation that time-driven sampling is inherently wasteful. The presense of clusterheads implies different approaches for some of the problems we have discussed. making use of clusterheads acting as intermediate processing nodes between data sources and a base station. power control. The traditional setting of differential equation models and time-driven digital sampling provides a comfortable infrastructure for communication and control methodologies. bridging the gaps between them is a real challenge. requires new sampling mechanisms and possibly new data collection hardware as well. where data are collected only when “something interesting” happens. and control which brings together three disciplines that often use different modeling paradigms and different ways of thinking. we briefly mention some of them. however. Traditional feedback control systems have been designed under the premise that sensors are . Naturally.200 Modeling and Control of Complex Systems research issues in the areas of routing. one can define different types of sensor network missions. Although many of the open questions above are technically challenging in their own right. In the context of cooperative control. typically formulated through optimization problems. One of these issues concerns the combination of asynchronous and synchronous modes of operation in a common system setting. and designing a system environment where both can coexist remains an open problem. there are also some more fundamental issues of much broader long-term impact where progress has been minimal. Although the gathering of data is inherently asynchronous (due to multiple sensor nodes operating in different temporal and spatial scales). In what follows. computing. Third.2 that a potentially better structure for sensor networks is a hierarchical one. for example. The limited resources of sensor network nodes emphasize the need to switch to a more efficient event-driven sampling approach. Questions of local versus global optimality and the need for mechanisms consistent with the distributed nature of sensor networks are issues that remain largely unexplored to date. and the execution of various cooperative missions. This is one manifestation of the difference between time-driven and event-driven behavior.” Formulating such a switching control problem and devising solution methods dependent on the information available to each node is an interesting direction for research. These issues are closely related to the convergence of communication. scheduling. deployment may be quite different if a clusterhead can “aggregate” data from neighboring nodes and avoid the need for these nodes to use up energy for direct communication with the base station. the problem of location detection (i.

Traditional feedback control systems have been designed under the premise that sensors are few and expensive, and much of the "intelligence" in such systems is concentrated on compensating for limited state information. The sudden wealth of sensor data (subject, of course, to bandwidth and delay limitations) shifts the need for "intelligence" towards processing potentially huge amounts of data and combining model-based methodologies with increasingly data-driven ones. To date, there appears to be a significant gap between schools of thought advocating one versus the other approach. One would expect that a combination can enable us to exploit the advantages of both.

Acknowledgments

This work is supported in part by the National Science Foundation under Grant DMI-0330171, by the Air Force Office of Scientific Research under Grants FA9550-04-1-0133 and FA9550-04-1-0208, by the Army Research Office under Grant DAAD19-01-0610, and by the Department of Energy under Grant DE-FG52-06NA27490.

References

1. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, vol. 40, no. 8, pp. 102–114, 2002.
2. C. Y. Chong and S. P. Kumar, "Sensor Networks: Evolution, Opportunities, and Challenges," Proceedings of the IEEE, vol. 91, no. 8, pp. 1247–1256, 2003.
3. C. G. Cassandras and W. Li, "Sensor Networks and Cooperative Control," European Journal of Control, vol. 11, no. 4-5, pp. 436–463, 2005.
4. S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. Srivastava, "Coverage Problems in Wireless Ad-Hoc Sensor Networks," in Proceedings of IEEE INFOCOM, pp. 1380–1387, 2001.
5. M. Bhardwaj, T. Garnett, and A. Chandrakasan, "Upper Bounds on the Lifetime of Sensor Networks," in Proceedings of IEEE International Conference on Communications, pp. 785–790, 2001.
6. W. Li and C. G. Cassandras, "A Minimum-Power Wireless Sensor Network Self-Deployment Scheme," in Proceedings of IEEE Wireless Communications and Networking Conference, 2005.
7. W. Li and C. G. Cassandras, "Distributed Cooperative Coverage Control of Sensor Networks," in Proceedings of 44th IEEE Conference on Decision and Control, pp. 2542–2547, 2005.
8. G. Baliga and P. R. Kumar, "Middleware and Abstractions in the Convergence of Control with Communication and Computation," in Proceedings of 44th IEEE Conference on Decision and Control, pp. 4245–4250, 2005.
9. C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Dover Publications, 1998.

10. Y. Zou and K. Chakrabarty, "Sensor Deployment and Target Localization Based on Virtual Forces," in Proceedings of IEEE INFOCOM, pp. 1293–1303, 2003.
11. T. Clouqueur, V. Phipatanasuphorn, P. Ramanathan, and K. Saluja, "Sensor Deployment Strategy for Target Detection," in Proceedings of 1st ACM International Workshop on Wireless Sensor Networks and Applications (Atlanta, GA), pp. 42–48, 2002.
12. J. Cortes, S. Martinez, T. Karatas, and F. Bullo, "Coverage Control for Mobile Sensing Networks," IEEE Transactions on Robotics and Automation, vol. 20, no. 2, pp. 243–255, 2004.
13. L. Mihaylova, T. Lefebvre, H. Bruyninckx, and K. Gadeyne, "Active Sensing for Robotics: A Survey," in Proceedings of the 5th International Conference on Numerical Methods and Applications (Borovets, Bulgaria), pp. 316–324, 2002.
14. D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
15. M. Clune, P. J. Mosterman, and C. G. Cassandras, "Hybrid System Simulation with SimEvents," in Proceedings of 2nd IFAC Conference on Analysis and Design of Hybrid Systems, pp. 136–141, 2006.

7
Congestion Control in Computer Networks

Marios Lestas, Andreas Pitsillides, and Petros Ioannou

CONTENTS
7.1 Introduction
7.2 Problem Formulation
7.3 Previous Work
    7.3.1 Dual Algorithms
    7.3.2 Primal Algorithms
    7.3.3 Max-Min Congestion Controller Algorithms
7.4 Model of the Single Bottleneck Link Case
7.5 Adaptive Congestion Control Protocol
    7.5.1 Protocol
        7.5.1.1 Packet Header
        7.5.1.2 ACP Sender
        7.5.1.3 ACP Receiver
        7.5.1.4 ACP Router
    7.5.2 Performance Evaluation
        7.5.2.1 Scalability
        7.5.2.2 Performance in the Presence of Short Flows
        7.5.2.3 Fairness
        7.5.2.4 Dynamics of ACP
        7.5.2.5 A Multilink Example
        7.5.2.6 Comparison with XCP
        7.5.2.7 Comparison with RCP
7.6 Conclusions
References

7.1 Introduction

In the last twenty years, the Internet has experienced tremendous growth, which has transformed it from a small-scale research network to the largest and most complex artificially deployed system.

The Internet possesses structural properties similar to the ones characterizing many other complex systems pervading science: a plethora of often heterogeneous subsystems (sources and routers) performing complex functions, interconnected by heterogeneous links (wired, wireless, satellite links) often incorporating complex dynamics themselves. There are several factors contributing to the immense complexity of the system: the large scale and size as a result of its exponential growth, the hierarchical organization, the distributed management of the available resources, the extreme heterogeneity as a result of the diverse network technologies and communication services that are accommodated, the fragmented nature of the underlying infrastructure, and the complex structures that arise in the implementation of the various functionalities of the layered protocols [1].

Many of the complex network functions that drive the current Internet have been developed using engineering intuition, heuristics, and ad hoc nonlinear techniques, with the objective of making the system resilient to failures and robust to changing environments. The problem with this approach is that very little is known about why these methods work and very little explanation can be given when they fail. Given the lack of a coherent and unified theory of complex systems, these methods do not have analytically proven performance properties and can thus prove to be ineffective as the system evolves over time. When such vulnerabilities do show up, designers usually resort to even more complex network functions to solve the problem, thus contributing to a spiral of increasing complexity [1]. These observations highlight the necessity to develop a new theoretical framework to help explain the complex and unpredictable behaviors of the Internet and offer alternative network protocols that are provably effective and robust. Such a framework can serve as a starting point to develop a unified theory for complex systems, useful in explaining how the interaction between the individual components of such systems allows the emergence of a global behavior that would not be anticipated from the behavior of the components in isolation. Dramatic progress is being made in developing such a theoretical framework to investigate and solve the problem of Internet congestion control.

Congestion control is a representative example of how ad hoc solutions, although being successful at the beginning, can later be found to be ineffective as the evolution of the system reveals their deficiencies. Congestion control mechanisms were introduced in the transmission control protocol (TCP) to fix the defects that led in October 1986 to the first of a series of "congestion collapses." The original algorithm proposed by Van Jacobson [2], with its later enhancements ([3]–[7]), led to the current implementation of TCP, which has served the Internet remarkably well as it has evolved from a small-scale network to the largest artificially deployed system. Despite its profound success, there are currently strong indications that TCP will perform poorly in future high-speed networks. Simulations and real measurements indicate that as the bandwidth delay products increase within the network, the slow additive increase and the drastic multiplicative decrease policy of the TCP protocol causes the system to spend a significant amount of time trying to probe for the available bandwidth, thus leading to underutilization of the available resources [8].

It has also been shown analytically that as the bandwidth delay products increase, TCP becomes oscillatory and prone to instability [9]. Moreover, TCP is grossly unfair towards connections with high round-trip delays [10]. Finally, it has been shown that in networks incorporating wireless and satellite links, long delays and noncongestion-related losses also cause the TCP protocol to underutilize the network [11].

These observations have triggered intense research activity on Internet congestion control, which has led to TCP enhancements and new congestion control protocols. Despite the fact that heuristic methods continue to provide solutions with improved properties [12], [13], there has been increasing emphasis on designs based on mathematical abstractions of networks of arbitrary topology. Such abstractions, based on fluid flow models, help develop solutions that can be shown analytically, prior to implementation, to be effective, robust, scalable, and stable. In the fore-mentioned framework, the congestion control problem is viewed as a resource allocation problem where the objective is to allocate the available resources (link bandwidths) to the competing users without the input data rates at the links exceeding the link capacities. A utility function is associated to each flow and the objective is to maximize the aggregate utility function subject to the capacity constraints. Through an appropriate representation, this problem is transformed into a convex programming problem. Congestion control algorithms can then be viewed as distributed iterative algorithms that compute optimal or suboptimal solutions of this problem. It turns out that solutions with the required distributed structure can be obtained by interpreting the dual variables of the relevant Lagrangian function as the prices or congestion signals generated at each link [16]. The theoretical framework used in most of the recent studies originates in the work of Hayden [14]; however, it has gained increasing popularity due to the pioneering work of Kelly et al. [15], [16], where it was utilized to develop scalable price-based Internet congestion control schemes. Several congestion control algorithms have been proposed using the described methodology [17], [18], many of which have been accompanied by local or global asymptotic stability results [19]–[22]. Some of these algorithms have been used as a baseline to develop practical packet-level Internet congestion control protocols which have exhibited superior performance [23], [24]. The challenge is then to establish global asymptotic stability of these schemes in the presence of heterogeneous delays.

However, the proposed algorithms share a common problem: connections traveling more hops have a higher probability of being assigned a smaller sending rate value and are thus beaten down by short hop connections [14]. This is known as the beat-down problem [25]. A class of congestion control algorithms that are known to solve this problem are algorithms that achieve max-min fairness at equilibrium. Max-min fairness is considered by many to be the ultimate fairness criterion, as it originates from the intuitive notion of allowing each session to get as much network use as any other session; increasing the allocation of a session beyond the max-min equilibrium results in "forcing" other sessions to reduce their rates below their fair share. Max-min congestion control schemes are associated with a special class of utility functions and are usually characterized by nonlinear feedback communication mechanisms.

which is chosen based on a control law of the form: wi = g(wi . s N ] where si denotes user i. l L }. nonlinear control theory [27]. Associated with each user si is its sending rate xi . 7. We use this model to formulate the congestion control problem mathematically. l2 . However. . In Section 7. we present a fluid flow model that has been used extensively in the last few years. Finally. . This chapter provides a survey of recent theoretical and practical developments in modeling and design of Internet congestion control protocols. s2 . So. . which fail to satisfy all the design objectives.2 we present a theoretical framework that has been used extensively in the last few years to analyze networks of arbitrary topology and in Section 7. The network is utilized by a finite set of users U = [s1 .2 Problem Formulation Central to the development of Internet congestion protocols with verifiable properties are mathematical abstractions of networks of arbitrary topology. and so designers usually resort to simple network topologies comprising a single bottleneck link to analytically validate their designs. . . ˙ xi = h(wi . in Section 7. q i ). We consider a store and forward.3 we review advances that have emerged in the context of this framework. Let I denote the index set of the users. Each user injects data packets into the network. The network consists of a finite set of links R = {l1 . the problem of designing fair (in the max-min sense) and effective congestion control protocols supported by global stability results in the presence of delays still remains open. and even fuzzy logic-based control [28]. despite intense research activity on Internet congestion control. packet-switched network that accommodates elastic applications. In such simple networks several feedback-based control techniques have been used for design: linear control theory [26]. where l j denotes link j. These nonlinearities make the analytical evaluation of the proposed congestion control schemes difficult in networks of arbitrary topology incorporating delays.206 Modeling and Control of Complex Systems functions and are usually characterized by nonlinear feedback communication mechanisms. . a new max-min congestion control protocol that has been shown through analysis and simulations to outperform previous proposals and work effectively in a number of scenarios. . Let J denote the index set of the links.2) .5 we present adaptive congestion control protocol (ACP). Data traffic within the network is viewed as a fluid flow.1) (7. the lack of analytically verifiable solutions in networks of arbitrary topology has led to packet-level protocols.4 we present a simpler mathematical model used in the analysis of recently proposed max-min congestion control schemes and then in Section 7.6 we offer our conclusions and future research directions. q i ) wi (0) = wi0 (7. In this section. In Section 7. .

5) (7. . q 1 ). Similarly we use the vectors w = [w1 . q 2 ). . . z j (0) = z j0 ∀j J (7. zL ]T to denote the vector of the controller states at links l1 to l L . . The entry in the ith row and jth column of A is denoted by a i j . . p j = v(z j . . . . . In this representation. q N )]T We can then write: w = G(w. . . q 2 ). . we use the vector z = [z1 . h 2 (w2 . We use the vector p = [ p1 . . . y2 . . q N )]T H(w. C L ]T . A consists of elements equal to 0 or 1. We lump the functions gi (. z2 .7) At each link j we associate a signal processor that generates a signal p j which denotes the congestion status at the link. Let A ∈ R L×N denote the matrix that represents the route of each user. . . .) and v(. q ) = [h 1 (w1 . . q 2 . Similarly we define the vector C = [C1 .Congestion Control in Computer Networks 207 where wi denotes a state maintained at user si (wi may also be a vector of states). We use the vector y = [y1 . . . yL )]T . Ignoring the queuing dynamics we can establish that: y = Ax (7. .). y1 ). y) = [d(z1 . . p L ]T to denote the vector of the congestion signals at links l1 to l L . w N ]T and q = [q 1 .) to form the vector valued functions: D(z. . q i denotes a feedback signal received from the network which represents the presence of congestion in the route of user si . .) to form the vector valued functions: G(w. h N (w N . .3) (7. d(zL . . s N . q ) = [g1 (w1 . . q 1 ). and gi (. h i (. The functions d(. Otherwise it is equal to 0. . . We use the vector x = [x1 . ˙ x = H(w. . . . g2 (w2 . . x2 . p2 . C2 .). q ). y j ). .4) To each user we also associate a utility function Ui (xi ) of the sending rate xi which describes how “happy” a user is with a particular sending rate assignment.) and v(. y j ). w2 . Let C j denote the output capacity of link j and let y j denote the flow rate of data into link j. and we lump the functions d(. .8) (7. . If user i utilizes link j then a ji is equal to 1. q ) w(0) = w0 (7. xN ]T to denote all the sending rates of the sources s1 . These utility functions are chosen to be strictly increasing and concave. . yL ]T to denote the vector of input flow rates at links l1 to l N . . The congestion signal p j is generated according to a control law of the form: z˙j = d(z j .9) where z j denotes the state of the controller at link j (z j may also be a vector of states).) are to be generated by the congestion control strategy. h i (. s2 .6) (7. g N (w N .) are functions that are chosen by the congestion control strategy. q N ]T to denote all the states and the feedback signals at the sources.

as it traverses from its source si to its destination. p L L L N N N are system signal vectors. Data packets are active participants in this mechanism. and the congestion signals p. y).12) The operator F (.w are the state vectors of the system. D : × → . F (. .) is to be determined by the congestion control strategy. G(.14) (7. yL )]T The time evolution of the congestion signals can then be described by the following control law: z = D(z. So each feedback signal q i can only be a function of the congestion signals p j of the links l j which lie in the path of source si .) has specific structure. The relationship between the feedback signals q .17) (7.1 demonstrates how Equations (7. F : L → N L×N are static.13) to (7. . ˙ x = H(w. The equations indicating how the variables defined above are coupled together are summarized below: Plant : y = Ax Controller : z = D(z.).).10) (7. q ) w(0) = w0 z(0) = z0 (7. is represented by a vector valued function F (.208 Modeling and Control of Complex Systems V(z.16) (7. y) q = F ( p) w = G(w. Figure 7. It must be noted that the operator F (. G : × → are vector fields.11) The congestion signals generated at the links are communicated back to the sources resulting in the generation of a feedback signal q i at each source si .) such that: q = F ( p) (7. q ). v(zL . q .13) (7.).18) are interconnected in a feedback system. ˙ p = V(z. . and A is a matrix. generated at the links. y) z(0) = z0 (7.18) L N N L where z . received at the sources. This feedback signal is communicated back to the source using an acknowledgement mechanism. x.19) .15) (7. due to the assumed mechanism with which the feedback signals are generated. possibly nonlinear mappings. V(. and H(. y) = [v(z1 . The control objective is then to design the operators D(.). V : L × L → L . y1 ). A packet. H : N × N → N . calculates the feedback signal q i by processing the congestion signals it encounters in its path. y.) such that: t→∞ lim x(t) = x ∗ (7. . ˙ p = V(z. y).

y) p = V(z. is bounded. x≥0 (7. This together with the continuity of the cost function guarantee the feasibility of the optimization problem. where x ∗ solves the following optimization problem: P1: max i I Ui (xi ) (7.3 Previous Work Many algorithms have been proposed to solve the above optimization problem or relaxations of the problem. on the other hand. z = D(z.) Feedback Communication p FIGURE 7. These algorithms have been used as a baseline to develop packet-level protocols whose performance has been demonstrated through simulations or practical implementation. w = G(w. In dual algorithms. The polyhedral constraint set. the links update their congestion signals dynamically while the users utilize static laws to determine their sending rates. network users update their sending rates using dynamic laws. In primal algorithms.Congestion Control in Computer Networks 209 x A Routing Matrix . Congestion control algorithms which utilize dynamic laws at both the .y) Congestion Signal Update Source Behavior q F(. 7.q) y . The proposed algorithms can be divided into two classes: primal algorithms and dual algorithms.1 Feedback system. the objective is to maximize the aggregate utility function subject to capacity and feasibility constraints.q) x = H(w. while the links generate congestion signals using static laws.21) In other words.20) subject to Ax ≤ C.21). described by the inequalities (7.

In this subsection we formulate the dual problem and we present a dual algorithm that is shown to converge to the unique equilibrium point that is primal dual optimal. p) + x (7. The dual problem is formulated using the Lagrangian function which involves augmentation of the primal cost function with the constraints.210 Modeling and Control of Complex Systems link and the user ends are widely known as primal-dual algorithms.22) can be rewritten as: L(x. We demonstrate how a class of dual algorithms emerges as distributed solutions of the dual of the problem P1 while a class of primal algorithms solves relaxations of the original optimization problem. It turns out that these dual variables can be interpreted as the congestion signals generated at the links.24) and the dual problem is then: D: min D( p) p≥0 (7. These algorithms. p) = i I (Ui (xi ) − xi q i ) + j J pjCj (7. weighted by auxiliary (or dual) variables. however. Congestion control algorithms that achieve maxmin fairness solve this problem. p) = i I Ui (xi ) − p T ( Ax − C) (7.26) . The Lagrangian of the system problem P1 is given by: L(x.3. 7. In this section. suffer from the beat-down problem [25]. Equation (7.25) The dual function D( p) can be expressed in closed form by noting that the elements of the vector x are decoupled from each other in D( p) and so the values that maximize the Lagrangian can be expressed as follows: xi ( p) = Ui −1 (q i ) ˆ (7.1 Dual Algorithms The main objective of dual algorithms is to solve the dual problem of P1.22) By defining q = AT p and noting that p T Ax = x T AT p = x T q . we review representative primal and dual algorithms. We point out how max-min fairness relates to a special class of utility functions in the optimization problem P1 and we present a dual max-min congestion control algorithm that has been shown analytically to converge to the desired equilibrium point from any feasible initial condition.23) The Lagrangian is used to define the dual function as follows: D( p) = max L(x.

y j − C j ≤ 0 p j (0) = p j0 (7. p ∗ ) is primal dual optimal.29) where xi∗ = Ui −1 (q i∗ ) and q ∗ = AT p ∗ . The above assumption guarantees that the dual function is strictly convex. The strict convexity of f i (z) ∀i. then guarantee that the dual function D( p) is strictly convex. The feasibility of the problem P1. the necessary and sufficient optimality condition ([29]. p. y j − C j > 0.Congestion Control in Computer Networks Substituting the above in Equation (7. the concavity of the cost function. p. (C − Ax ∗ ) ≥ 0 (7. from which we can conclude that f i (z) is strictly convex. The convexity of D( p) follows from the following observation. The preceding analysis that characterizes the desired equilibrium properties of the system provides insights on how to update the primal and dual variables so that they converge to the desired equilibrium values.28) Because the utility function Ui (z) is assumed to be strictly increasing and concave it follows that −Ui −1 (z) is strictly negative and increasing. The strict convexity of the dual function D( p) then guarantees the uniqueness of p ∗ . 176) yields: ∇ D( p ∗ ) T p ∗ = (C − Ax ∗ ) T p ∗ = 0.27) We now make the following assumption: ASSUMPTION 1: The matrix A is full row rank. 317). p ∗ ) = max L(x. it follows that x ∗ ≥ 0. p. Then it is not hard to verify that: d f i (z) = −Ui −1 (z) dz (7. p ∗ ) + i I x (7.31) (7. At the unique optimal solution p ∗ of the dual problem. 316) that the pair (x ∗ .23) yields: D( p) = i I 211 [Ui (Ui −1 (q i )) − Ui −1 (q i )q i ] + j J pjCj (7. We consider the following dual algorithm: ⎧ ⎨ yj − C j p j = yj − C j ˙ ⎩ 0 xi = Ui −1 (q i ) if p j > 0 if p j = 0. Let f i (z) = Ui (Ui −1 (z)) − Ui −1 (z)z.32) . and the polyhedral constraint set guarantee the existence of at least one Lagrange multiplier ([29]. if p j = 0. Since p ∗ ≥ 0. 437) and thus of at least one solution p ∗ of the dual problem ([30]. and Assumption 1. p. It is also true that: Ui (xi∗ ) = L(x ∗ . The latter and the inequality ( Ax ∗ − C) ≤ 0 guarantee that x ∗ is feasible.30) from which it follows ([30].

either ˙ y j = C j and p j > 0 or p j = 0 and y j ≤ C j . THEOREM Suppose Assumption 1 holds.32) has a unique equilibrium point p ∗ which is globally asymptotically stable. We summarize the stability properties of the system in the following theorem.212 where yj = i Ij Modeling and Control of Complex Systems xi . Equation (7. It follows that p ∗ is globally asymptotically stable. it follows that V( p ∗ ) = 0. In addition the corresponding vector x ∗ is the unique solution of the problem P1. for each j. V( p) ≥ 0 for all p ≥ 0. A class of dual algorithms are proposed in Reference [15] which solve relaxations of the problem. Significant efforts have been made . The work is closely related to the work of Kelly and coworkers in References [32] and [15].35) ˙ where J (t) = { j : p j (t) = 0.31) guarantees that p j (t) ≥ 0 ∀t ≥ 0. Then starting from any initial condition p j (0) ≥ 0 j J . for all p ≥ 0 such that V = 0.31) to (7. PROOF We first show that the following function is a Lyapunov function for the system of differential Equations (7. The time derivative of V( p) is given by: ˙ ˙ V = ∇V( p) T p = j J p≥0 (C j − y j ) p j = ˙ j J \J (t) −( y j − C j ) 2 ≤ 0 (7. (C − Ax) T p = 0 and Ax − C ≤ 0. In addition. The network subproblem is a special case of P1 where the utility functions are weighted logarithmic functions. The latter two conditions are the necessary optimality conditions for the minimization of V( p) subject to p ≥ 0 which ˙ are satisfied only when p = p ∗ .31) to (7. y j (t) − C j ≤ 0}. The algorithm presented is similar to the algorithm proposed in Reference [16] and the stability proof is along the lines of the proof in Reference [31].32): qi V( p) = i I q∗ i −Ui −1 (σ )dσ + j J ( p j − p ∗ )C j j (7. Because V( p) = D( p) − D( p ∗ ) and D( p ∗ ) = min D( p). qi = j Ji pj (7. The forementioned algorithms and the related stability results were based on fluid flow models that ignore feedback delays. When V = 0.33) I j is the index set of the users utilizing link j and J i is the index set of the links that lie in the path of user i.34) Since p(0) ≥ 0. the system of differential Equations (7. This suggests that V( p) < 0 for all p ≥ 0 ˙ except when p = p ∗ in which case V( p) = 0. The strict convexity of the dual function guarantees that p ∗ is the only value in p ≥ 0 for which the latter is true. Kelly decomposes the problem P1 into user and network subproblems. So.

continuous. Because the functions f j (σ ) j J .39) The functions f j (σ ).3.Congestion Control in Computer Networks 213 to develop modifications that ensure stability in the presence of feedback delays. primal algorithms can only solve relaxations of the original optimization problem. The reasoning behind the consideration of problem P2 is that by suitable choice of the functions f j (σ ) j J. and we present a primal algorithm that converges to the unique solution of this problem.38) For the above algorithm. one can approximate the original constrained optimization problem P1 with the problem P2. The latter condition together with the continuity of V(x) and the closure of the constraint set guarantees the feasibility of the optimization problem. not identically zero and we consider the following optimization problem: yj P2 : max V(x) = x≥0 i I Ui (xi ) − j J 0 f j (σ )dσ (7.2 A Primal Algorithms The above analysis demonstrates that dual algorithms can solve the original optimization problem exactly. as demonstrated in several studies. a i is a positive source gain. 7. We are thus motivated to study primal algorithms where smart decisions are taken by the end systems. which approximates the problem P1. j J are chosen such that V(x) is coercive. increasing. which involves a penalty for violating the constraints. A notable attempt is reported in Reference [18] where the following algorithm is proposed: p˙j = yj − C j Cj −αi q i Mi τi + (7. We consider functions f j (σ ) j J which are non-negative. The problem with dual algorithms is that smart decisions are taken within the network. and conditions for global asymptotic stability have been established in Reference [22] using Lyapunov Ksasovskii functionals. τi is the round-trip time of source i. xi is a source constant and [ f (x)]+ is defined as: x [ f (x)]+ = x f (x) max (f (x). In this subsection we formulate an alternative optimization problem. 0) if x > 0 if x = 0 (7. conditions for local stability in the presence of delays have been established in Reference [18] using frequency response methods. However.36) pj xi = xi e ¯ (7. that is V(x) → −∞ when x → ∞.37) where Mi is an upper bound on the number of bottleneck links that source i sees in its path. thus violating the end-to-end principle that has shaped the Internet.

43) to (7.44) appear in the literature. The proof of the above theorem is similar to the proof of Theorem 1 and is omitted.41) (7. The cost function in Equation (7. xi (0) = xi0 ˙ ⎪ ⎩0 if xi = 0. At the optimal vector x ∗ . THEOREM Starting from any initial condition xi (0) ≥ 0 i I . This same condition is conjectured to be true in the case of heterogeneous delays. Ui (xi ) − q i > 0.43) xi = Ui (xi ) − q i if xi = 0. .40) This gives the following set of equations: Ui (xi∗ ) − j Ji f j ( y∗ ) ≤ 0 j f j ( y∗ ))xi∗ = 0 j j Ji (7. ∇V(x ∗ ) T x ∗ = 0 (7. yj = i Ij xi (7. It must be noted that several modified versions of the control law (7. which appears in Kelly’s network subproblem.214 Modeling and Control of Complex Systems are increasing it follows that the cost function V(x) is strictly concave.39) serves as a Lyapunov function for the system of differential equations.44) We summarize the convergence properties of the above algorithm in the following theorem. the following source algorithm has been proposed: xi = ki (wi − xi q i ) ˙ (7. Ui (xi ) − q i ≤ 0 where qi = j Ji f j ( y j ).43) to (7. We consider the following control law: ⎧ ⎪ Ui (xi ) − q i if xi > 0 ⎨ (7. the necessary and sufficient optimality condition yields: ∇V(x ∗ ) ≤ 0. The vector x ∗ is the unique solution of the problem P2.42) (Ui (xi∗ ) − We are looking for congestion control laws with the structure described in the previous section which solve the optimization problem P2.44) has a unique equilibrium point x∗ that is globally asymptotically stable.45) i I A simple condition on the gains ki was derived in Reference [19] which guarantees local stability in the presence of delays whenever all round-trip times are equal. This guarantees the uniqueness of a rate vector x ∗ which solves problem P2. This conjecture is shown to be true in Reference [33]. For the weighted logarithmic utility function Ui (xi ) = wi logxi . the system of differential Equations (7.

The vector x ∗ is said to be max-min fair if it solves . Max-min fairness was motivated by the intuitive notion of allowing each session to get as much network use as any other session. max-min fairness can be defined using a special class of utility functions. with any further increase in its rate resulting in the reduction of other sessions. At each user si the feedback signal q i is calculated by adding the congestion signals p j encountered in the path of the user: qi = j Ji pj (7. A special case of the above algorithm which can be used to describe TCP-like congestion algorithms ([20]. [21]) is the following: xi = ki xi ˙ f j ( yj ) = yj Cj α − bxim q i xin B (7. With respect to the optimization problem P1. thus giving rise to the term max-min flow control. 7.49) where a .50) Congestion control schemes that adopt this approach are known to suffer from the beat-down problem [25]. This has led to the idea of maximizing the network use allocated to the users with the minimum allocation. b are positive real numbers and m.3 Max-Min Congestion Controller Algorithms The algorithms reviewed in the previous section share a common method with which they calculate the feedback signals received by the network users as a function of the congestion signals generated within the network. For the above algorithm global asymptotic stability in the presence of heterogeneous delays was established in Reference [21] using Razumkhin’s theorem.3.Congestion Control in Computer Networks 215 Local stability certificates in the presence of heterogeneous delays were also derived in Reference [20] for the following algorithm: xi = ki xi ˙ f j ( yj ) = yj Cj 1− B qi Ui (xi ) (7. [16]. Max-min fairness has been defined in a number of ways. An equilibrium rate assignment that solves the fore-mentioned problem is the one that achieves max-min fairness.48) (7. n are real numbers that satisfy m + n > 0.46) (7. Connections traveling more hops have a higher probability of being assigned a smaller sending rate value and are thus beaten down by short hop connections [14].47) where B > 0.

as it traverses from source to destination. we present a max-min congestion controller which has been shown analytically to converge to the desired equilibrium point in the absence of delays. which update their transmission rates accordingly. ∀j J .54) (7. This nonlinearity in the feedback mechanism makes the analysis of max-min congestion control schemes in networks of arbitrary topology difficult. xi . The congestion control algorithm is the following: xi = min a ji p j . j ∀i I ∀j J (7. However. accumulates in its header the minimum of the desired sending rates it encounters in its path. x≥0 (7. Local stability results have been established in a number of studies taking advantage of the decoupled set of differential equations which describe the system in a neighborhood about the equilibrium point [34]–[36]. However.52) Note the difference with Equation (7. q i = min p j j Ji (7. global stability results are rather difficult to obtain due to the system being hybrid [37]. where the summation operator is replaced with the minimum operator. Then a packet. ˙ yj = i Ij p j (0) = p j0 . So.55) p j = Pr [C j − y j ].53) (7. over the years network engineers have used different approaches to design max-min congestion control schemes based on alternative definitions of max-min fairness. at each user si the feedback signal q i is equal to the minimum of the congestion signals p j encountered in the path of the user. The most popular design approach has been the following: each link generates a signal which denotes the sending rate it desires from all the uses traversing the link. [38]) are modified versions of the algorithm presented here. So. such an approach results in high gains at the sources and leads to undesirable transient properties. This information is communicated back to the sources.51) This suggests that by appropriate choice of the utility functions. only approximations to the max-min allocation can be achieved. In addition. one can use the methods described in the previous section to design congestion control algorithms which converge to sending rate vectors approximating the max-min fair allocation. The establishment of global asymptotic stability of these algorithms in the presence of delays still remains an open problem. Other algorithms that appear in the literature and are accompanied by similar stability results ([14].50). Below.216 the following optimization problem: P3 : max lim subject to Modeling and Control of Complex Systems α→∞ −(− log xi ) α i I Ax ≤ C.

The projection operator in Equation (7. ∀i I .56) guarantees that the controller states are bounded from above.Congestion Control in Computer Networks 217 where p j0 ≥ 0 and the projection operator Pr [. The proof is long and technical and demonstrates the degree of complexity involved in the establishment of global asymptotic stability results for max-min congestion controllers. reveal insights on how to develop algorithms that remain stable in the presence of delays and queueing dynamics. We summarize the properties of the above algorithm in the following theorem. As pointed out in the survey. we present a widely used mathematical model of a single bottleneck link network. These models. j J } of the controllers are chosen to be nonnegative to ensure that the sending rates remain non-negative at all times. However. and t→∞ lim x(t) = x ∗ (7.56) ⎪ ⎩ 0 otherwise The initial states { p j (0). The proof of this theorem can be found in Reference [39]. In this section. simpler in terms of the assumed topology but more complex in terms of the considered dynamics. THEOREM The congestion control algorithm (7. where x ∗ is the max-min vector of sending rates which solves problem P3. 7. j J } are bounded.55) guarantees that the controller states { p j . ∀t ≥ 0. for max-min congestion control schemes. the establishment of global asymptotic stability in the presence of delays still remains an open challenging research problem.57) for any feasible initial condition {xi (0) ≥ 0.53) to (7.] is defined as follows: ⎧ ⎪ C j − y j if p j < K ⎨ p j = C j − y j if p j ≥ K . which we use to demonstrate how the link . some of these algorithms have also been shown to be globally stable in the presence of delays.4 Model of the Single Bottleneck Link Case In the previous section we reviewed representative congestion control algorithms. thus leading to congestion control protocols with verifiable properties. xi (t) ≥ 0. We focused on algorithms whose equilibrium points can be shown to be globally asymptotically stable using fluid flow models of arbitrary networks that ignore delays. C j − y j < 0 ˙ (7. i I }. In order to ensure that this upper bound does not affect the convergence properties of the feedback system we choose the parameter K to be larger than the maximum capacity in the network. This problem has forced designers to develop maxmin congestion control schemes based on models comprising a single bottleneck link.

.58) In addition. N τ τ p(0) = p0 (7.59) (7. We assume that all network users have the same round-trip propagation delay τ and so the sending rate of all users is equal to the same delayed value of the desired sending rate. Signal Processor p Destination2 .218 Modeling and Control of Complex Systems y Source1 q C Destination1 Source2 . which share a common bottleneck link through highbandwidth access links. ˙ q (0) = q 0 (7. This information is communicated back to the network users which respond by setting their sending rate equal to the received information. and the output capacity is denoted by C.60) . We consider the single bottleneck link network shown in Figure 7. DestinationN side algorithm presented in Equation (7.2.2 Single bottleneck link network. . At the bottleneck link we assume that there exists a buffer. The rate of data entering the buffer is denoted by y. . SourceN FIGURE 7. the signal processor calculates the desired sending rate p according to the following differential equation: p= ˙ 1 ki kq (C − y) − 2 q . The input data rate at the link is thus given by the following equation: y = Np(t − τ ) We model the queue as a simple integrator as follows: q = y − C. At the bottleneck link. It consists of N users. which accommodates the incoming packets. which calculates the desired sending rate p.54) needs to be modified to account for delays and queuing dynamics. we implement a signal processor. the queue size is denoted by q . .

61) to (7. in order to maintain stability in the presence of delays.61) (7. simply integrates the excess capacity. The proof of this theorem can be found in Reference [40]. capacity. In the next section we present ACP.58) to (7. an adaptive congestion control protocol. It must be noted that the latter is an unknown time-varying parameter that needs to be estimated. In addition.62) is stable independent of delay. A queue size term is introduced to ensure almost zero queue sizes at equilibrium. however. and the ACP protocol described in the next section.62) The above equations have been used to describe a number of recently proposed congestion control protocols: the explicit congestion control protocol (XCP) presented in Reference [40].60) and is shown through analysis and simulations to satisfy all the design requirements of congestion control protocols outperforming previous proposals.54). The former algorithm continues to integrate the excess capacity. Equation (7. which guarantees stability when delays and queuing dynamics are ignored. τ τ q = x. THEOREM If the parameters ki and kq satisfy: π 0 < ki < √ . 4 2 √ kq = ki2 2 (7. . the rate control protocol (RCP) protocol presented in Reference [26].60) to obtain the following set of differential equations: ki kq x(t − τ ) − 2 q (t − τ ). It is interesting to compare the link side algorithm described by Equation (7. The following theorem gives conditions on ki and kq which guarantee that the system is stable.60) with the link side algorithm described by Equation (7. but incorporates additional terms that guarantee that the system is stable in the presence of delays and queuing dynamics. leading to protocols with different performance characteristics. We substitute the latter in Equations (7. The latter algorithm.63) then the system (7. More relaxed stability bounds using nonlinear analysis tools can be found in Reference [26] and [41].Congestion Control in Computer Networks 219 where ki and kq are design parameters. which is also based on Equation (7. differ significantly in the packet-level implementation of the equation. the control parameters are normalized with the time delay and also with the number of flows N utilizing the bottleneck link. We define the variable x(t) = y(t) − C.60) has been used as a baseline to develop a number of congestion control protocols which. q (0) = q 0 ˙ x=− ˙ x(0) = x0 (7. and number of sources.

Attempts to develop algorithms that do not require maintenance of per flow states within the network include the queue lengthbased approach in Reference [42]. The algorithm integrates the excess capacity and introduces a queue size term to ensure almost zero queue sizes at equilibrium. such an aggressive policy can cause underutilization of the network for large periods of time.220 Modeling and Control of Complex Systems 7. The RCP protocol has been designed with the objective of minimizing the completion time of the network flows. scalability with respect to changing bandwidths. small queue sizes. in networks with high-bandwidth delay products. it applies a rather aggressive policy when increasing or decreasing the sending rate of the network users. In this section we present an adaptive congestion control protocol (ACP). XCP constitutes the most promising approach as it achieves high network utilization. several max-min Internet congestion control protocols have been proposed recently. and the RCP protocol presented in Reference [26]. delays. it is well known that such an approach offers limited control space and thus leads to significant oscillations and degradation in performance. Each link calculates at regular time intervals a value that represents the sending rate it desires from all users traversing the link. a new congestion control protocol with learning capability. as it traverses from source to destination. The user-side algorithm then gradually modifies its congestion window in order to match its sending rate with the value received from the network. However. smooth and fast responses. it has been shown in Reference [43] that the scheme fails to achieve fairness in scenarios with multiple congested links. The deficiencies of the fore-mentioned protocols indicate that the problem of high speed Internet congestion control still remains open. The userside algorithm also incorporates a delayed increase policy in the presence of congestion to avoid excessive queue sizes and reduce packet drops. ACP can be characterized as a dual protocol where intelligent decisions are taken within the network. A packet. This information is communicated to the user that has generated the packet through an acknowledgment mechanism. All these approaches have distinct disadvantages. However. which outperforms previous proposals and is shown through simulations to work effectively in a number of scenarios.5 Adaptive Congestion Control Protocol Due to the problems encountered by TCP in networks with high-bandwidth delay products. The main control architecture is in the same spirit as the one used by the available bit rate (ABR) service in asynchronous transfer mode (ATM) networks. and the number of users utilizing the network. However. the XCP protocol presented in Reference [40]. In order to achieve the latter. The design of the link-side algorithm which calculates the desired sending rate is based on the algorithm described by Equation (7.60). accumulates in a designated field in the packet header the minimum of the desired sending rates it encounters in its path. and almost no packet drops. The scheme proposed in Reference [42] generates feedback signals using queue length information only. In order to maintain stability in the presence .

The scheme guides the network to a stable equilibrium which is characterized by high network utilization. This field is initiated with the user’s desired rate and is then updated by each link the packet encounters in its path. however. It also exhibits nice dynamic properties such as smooth responses and fast convergence. is known to lack robustness and lead to erroneous estimates.3 ACP congestion header. It is read by each router and is used to calculate the control period.3. In ACP online parameter identification techniques are used to derive an estimation algorithm which is shown through analysis and simulations to work effectively. It is scalable with respect to changing delays. In this way. In our simulations we use realistic traffic patterns which include both bulk data transfers and short-lived flows. a packet as it traverses from source to destination accumulates the minimum sending rate it encounters in its path. .1 Protocol 7. Algorithms that have been proposed to estimate this parameter are based on pointwise division in time [44]–[46]. This is an unknown time-varying parameter that needs to be estimated.1. the link informs its users that it is H_rtt (sender’s rtt estimate) H_feedback (desired sending rate) H_congestion (congestion bit) FIGURE 7. In the rest of this section we describe in detail the proposed congestion control scheme and we evaluate its performance using simulations. This approach. The H_congestion bit is a single bit which is initialized by the user with a zero value and is set by a link if the input data rate at that link is more than 95% of the link capacity. The field is set by the user and is never modified in transit. In this way. The H_feedback field carries the sending rate which the network requests from the user that has generated the packet. the value in the field is compared with the desired sending rate value and the smallest value is stored in the H_feedback field. and almost no packet drops.1 Packet Header In a way similar to XCP. and number of users utilizing the network. 7. More details about ACP and additional simulation results involving random packet drops can be found in Reference [41].5. The H_rtt field carries the current round-trip time estimate of the source that has generated the packet. At each link. max-min fairness. Extensive simulations indicate that the proposed protocol satisfies all the design objectives. the ACP packet carries a congestion header which consists of three fields as shown in Figure 7. small queue sizes.Congestion Control in Computer Networks 221 of delays the algorithm requires the number of users utilizing each link.5. bandwidths.

ACP maintains a congestion window cwnd which represents the number of outstanding packets and an estimate of the current round-trip time rtt.2 ACP Sender As in TCP. The projection operator Pr[. which indicates congestion in the source destination path.66) otherwise. the H_feedback field in the packet header is initialized with the desired sending rate of the application and the H_rtt field stores the current estimate of the round-trip time. (7. We multiply with the mr tt to transform the rate information into window information and we divide by the packet size to change the units from bytes to packets. which represents the sending rate requested by the network in bytes per second. In addition to these variables ACP calculates the minimum of the roundtrip time estimates which have been recorded. When a new acknowledgment is received. On packet departure. This is a good measure of the propagation delay of the source destination path and is used to transform the rate information reaching the sender to window information.1 (desired_window − cwnd) cwnd 1 (desired_window − cwnd)] cwnd (7. we apply a less aggressive increase policy.222 Modeling and Control of Complex Systems on the verge of becoming congested so that they can apply a delayed increase policy and avoid excessive instantaneous queue sizes and packet losses. We do not immediately set the cwnd equal to the desired congestion window because this abrupt change may lead to bursty traffic. the value in the H_feedback field. The smoothing gain of this filter depends on the state of the H_congestion bit in the acknowledgment received. The initial congestion window value is set to 1 and is never allowed to become less than this value because this would cause the source to stop sending data. If this is equal to 1. The congestion window is updated according to the following equation: cwnd = cwnd + 0.] is defined below and guarantees that the congestion window does not become less than 1. The congestion window is updated every time the sender receives an acknowledgment. is read and is used to calculate the desired congestion window as follows: desired_window = H_feedback × mrtt size (7. Pr [x] = x 1 if x > 1 otherwise.64) where size is the packet size in bytes. mrtt. The desired window is the new congestion window requested by the network.5.65) if desired_window > cwnd and H_congestion=1 and cwnd = Pr [cwnd + (7. Instead we choose to gradually make this change by means of a first-order filter.1.67) . If the source does not have a valid estimate of the round-trip time the H_rtt field is set to zero. 7.

7.Congestion Control in Computer Networks 223 7. For each link.3175. It then resets the received number of bytes.1. The desired sending rate is denoted by p and is updated every control period. The above variables are used to calculate the desired rate p every control period using the following iterative algorithm: p(k + 1) = Pr [ p(k) + 1 1 ki (0. When the control timer expires. The desired sending rate and other statistics are updated every time the timer expires. Values outside this range are not feasible.5. This variable is incremented with the packet size in bytes. respectively. The router implements a per link control timer.3 ACP Receiver When it receives a packet the ACP receiver generates an acknowledgment in which it copies the congestion header of the packet. and the projection operator is defined as follows: ⎧ ⎨ 0 if x < 0 Pr [x] = C if x > C (7. To achieve this objective the router maintains for each link a value that represents the sending rate it desires from all users traversing the link.05 and is updated every control period. The q is computed by taking the minimum queue seen by the arriving packets during the last propagation delay. On packet arrival the router reads the H_rtt field in the packet header and updates the variables that are used to calculate the average round-trip time. ˆ d(k) N(k) p(0) = 0 (7.4 ACP Router At each output queue of the router. In Reference [41] we show using phase plane analysis that this choice of the design parameters guarantees that the ACP protocol is stable for .68) ˆ where ki and kq are design parameters. N represents an estimate of the number of users utilizing the link. every time the queue associated with the link receives a packet. the objective is to match the input data rate y to the link capacity C and at the same time maintain small queue sizes. the router maintains a variable that denotes the number of received bytes.99 ∗ C − y(k)) − kq q (k)] . The router also maintains at each output queue the persistent queue size q in bytes.5. The average round-trip time is initialized with a value of 0.1587 and 0. The design parameters ki and kq are chosen to be 0. the link calculates the input data rate by dividing the received number of bytes with the control period. The control period is set equal to the average round-trip time d. The router calculates at each output queue the input data rate y. The propagation delay is unknown at the router and is thus estimated by subtracting the local queueing delay from the average round-trip time.1.69) ⎩ x otherwise The projection operator guarantees that the desired sending rate is nonnegative and smaller than the link capacity. The local queueing delay is calculated by dividing the instantaneous queue size with the link capacity.

The last function performed by the router at each link is to notify the users traversing the link of the presence of congestion so that they can apply . The basic idea is to integrate the excess capacity and to add a queue size term to guarantee that at equilibrium the queue size converges to zero.68) is based on Equation (7.71) ˆ N(0) = 10 (7.224 Modeling and Control of Complex Systems all delays. We choose this value to ensure a relatively conservative policy when initially updating the desired sending rate. A novel part of this work is that we use online parameter identification techniques to derive an algorithm that estimates the unknown parameter online.60). when transforming the continuous time representation to the discrete time representation of Equation (7.] is defined as follows: Pr [x] = x 1 if x > 1 otherwise (7. a packet as it traverses from source to destination accumulates the minimum of the desired sending rates it encounters in its path.68). Note that the initial value of the ˆ estimated number of flows N is equal to 10. Here we present a discrete-time implementation of the algorithm: ˆ γ [y(k) − N(k) p(k)] p(k) ˆ ˆ . the excess capacity term must be divided with the time delay and the queue size term must be divided with the square of the propagation delay. This prevents excessive instantaneous queue sizes. The link algorithm (7. Values less than one are obviously not feasible.1. the router compares the desired sending rate with the value stored in the H_feedback field and updates the field with the minimum value. The desired sending rate calculated at each link is used to update the H_feedback field in the packet header. The derivation is based on a fluid flow model of the network and is presented in Reference [41] together with the properties of the algorithm. γ is a design parameter that affects the convergence properties of the algorithm. in order to maintain stability.70) The projection operator guarantees that the number of flows traversing the link is never allowed to be less than 1. In this way. On packet departure. However. Previous experience in the design of link algorithms for congestion control has shown that to maintain stability we need to normalize the control parameters with the number of users utilizing the network. We choose γ to be equal to 0. We do this to reserve bandwidth resources which can be used to accommodate statistical fluctuations of the bursty network traffic. Previous work has shown that in a continuous time representation of the algorithm. N(k + 1) = Pr N(k) + 1 + p 2 (k) where the projection operator Pr[. Note also that we slightly underutilize the link at equilibrium by setting the virtual capacity equal to 99% of the true link capacity. we multiply both terms with the time delay and so we end up dividing only the queue term with the delay to maintain stability.

On packet departure the link also checks whether the input data rate is larger than 0.95 of the link capacity. If it is, the link deduces that it is congested and sets the H_congestion bit in the packet header.
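The congestion-notification rule is a simple threshold test; a hedged sketch of this departure-time check follows (the 0.95 threshold is from the text, while the function and field names are assumptions):

```python
def mark_congestion(header, y, capacity, threshold=0.95):
    """Set the H_congestion bit when the link looks congested.

    Executed on packet departure: if the measured input data rate y
    exceeds 95% of capacity, the receiver echoes the bit back so that
    the source applies its delayed increase policy.
    """
    if y > threshold * capacity:
        header.h_congestion = 1
```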

7.5.2 Performance Evaluation

Our objective has been to develop a window-based protocol that does not require maintenance of per-flow states within the network and satisfies all the design objectives of congestion control protocols. In this section, we demonstrate through simulations that ACP satisfies these objectives to a very good extent. We conduct our simulations on the ns-2 simulator. In our simulations we mainly consider bulk data transfers, but we also evaluate the performance of the protocol in the presence of short web-like flows. We also conduct a comparative study and demonstrate how ACP fixes the performance problems encountered by XCP and RCP. More simulation results, incorporating random packet losses and additional performance metrics, can be found in Reference [41].

7.5.2.1 Scalability

It is important for congestion control protocols to be able to maintain their properties as network characteristics change. We thus investigate the scalability of ACP with respect to changing link bandwidths, propagation delays, and number of users utilizing the network. We conduct our study by considering the single bottleneck link network shown in Figure 7.4. In the basic setup, 50 users share the bottleneck link through access links. The bandwidth of all links in the network is set equal to 155 Mb/sec and their propagation delay is set equal to 20 msec.

FIGURE 7.4
Single bottleneck link topology used to investigate the scalability of ACP with respect to changing link capacities, delays, and number of users. (Sources 1–50 connect through 155 Mb/sec, 20 msec access links to the 155 Mb/sec, 20 msec bottleneck link leading to the corresponding sinks.)

The packet size is equal to 1000 bytes and the buffer size of all links is set equal to the bandwidth-delay product. In our simulations, we consider persistent file transfer protocol (FTP) sources. It is highly unlikely that in an actual network the users will enter the network simultaneously. So, in our simulations, the users enter the network gradually, with an average rate of one user per round-trip time.

When investigating the scalability of the protocol with respect to a particular parameter, we fix the other parameters to the values of the basic setup and we evaluate the performance of the protocol as we change the parameter under investigation. We consider bandwidths in the range 10 Mbits/sec to 1 Gbit/sec, propagation delays in the range 10 msec to 1 sec, and numbers of users in the range 1 to 1000. The simulation time is not constant; it varies depending on the round-trip propagation delay. We simulate for a sufficiently long time to ensure that the system has reached an equilibrium state.

The performance metrics that we use in this study are the average utilization of the bottleneck link and the queue size of the buffer at the bottleneck link. We consider two measures for the queue size: the average queue size and the equilibrium queue size. The average queue size is calculated over the entire duration of the simulation and thus contains information about the transient behavior of the system. The equilibrium queue size is calculated by averaging the queue length values recorded after the system has converged to its equilibrium state. We do not report packet drops, as in all simulations we do not observe any. Moreover, we do not show fairness plots, as in all simulations the network users are assigned the same sending rate at equilibrium, which implies that max-min fairness is achieved in all cases. The dynamics of the protocol and its ability to perform well in more complex network topologies are investigated in separate studies in later sections.

7.5.2.1.1 Effect of Capacity

We first evaluate the performance of the ACP protocol as we change the link bandwidths. We fix the number of users to 50, we fix the propagation delays to 20 msec, and we consider link bandwidths in the range 10 Mbits/sec to 1 Gbit/sec. Plots of the bottleneck utilization and the average queue size versus the link capacity are shown in Figure 7.5. We observe that ACP scales well with increasing bandwidths. The protocol achieves high network utilization (≈ 98%) at all bandwidths. In addition, in all scenarios, the queue size converges to an equilibrium value that is close to zero. The average queue size remains very small, but we do observe an increasing pattern. The reason for this becomes apparent when we investigate the transient properties of the protocol. In the transient period, during which the users gradually enter the network, the queue size at the bottleneck link experiences an instantaneous overshoot before settling down to a value that is close to zero. As the bandwidth increases, the maximum value of this overshoot increases, thus causing the average queue size to increase as well. However, in all cases the queue size at equilibrium is small, as required.
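As a concrete check on the setup, the buffer sizes and the two queue metrics are easy to reproduce. The sketch below is illustrative (the trace format is an assumption); it computes the bandwidth-delay-product buffer of the basic setup and the average versus equilibrium queue size from a queue trace.

```python
def bdp_packets(capacity_bps, rtt_s, pkt_bytes=1000):
    """Buffer size in packets for a bandwidth-delay-product buffer."""
    return capacity_bps / 8.0 * rtt_s / pkt_bytes

# Basic setup: 155 Mb/sec links, 20 msec per link, so an 80 msec
# round-trip propagation delay; the buffer holds about 1550 packets.
print(bdp_packets(155e6, 0.080))  # -> 1550.0

def queue_metrics(trace, t_converged):
    """Average queue (whole run) vs. equilibrium queue (after convergence).

    `trace` is a list of (time, queue_len) samples; t_converged is the
    time after which the system is judged to be at equilibrium.
    """
    avg = sum(q for _, q in trace) / len(trace)
    eq_samples = [q for t, q in trace if t >= t_converged]
    return avg, sum(eq_samples) / len(eq_samples)
```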

FIGURE 7.5
ACP achieves high network utilization and experiences no drops as the capacity increases. The average queue size increases with increasing capacity due to larger instantaneous queue sizes in the transient period. However, at all capacities, the queue size at equilibrium is close to zero. (a) Utilization vs. capacity; (b) average and equilibrium queue size vs. capacity.

7.5.2.1.2 Effect of Delays

We next investigate the performance of ACP as we change the propagation delay of the links. Any change in the link propagation delay causes a corresponding change in the round-trip propagation delay of all source-destination paths. We fix the number of users to 50, we fix the link bandwidths to 155 Mbits/sec, and we consider round-trip propagation delays in the range

10 msec to 1 sec.

FIGURE 7.6
ACP achieves high network utilization and experiences no drops as the round-trip propagation delay increases. The average queue size increases with increasing propagation delay due to larger instantaneous queue sizes in the transient period. However, at all delays, the queue size at equilibrium is close to zero. (a) Utilization vs. delay; (b) average and equilibrium queue size vs. delay.

Plots of the bottleneck utilization and the average queue size versus the round-trip propagation delay are shown in Figure 7.6. The results are similar to the results obtained when investigating the effect of changing capacities. Figure 7.6a demonstrates that the protocol achieves high network utilization at all delays. As the propagation delays increase, the average queue size increases. This trend, however, as in the case of capacities, is due to the increasing instantaneous queue size in the transient period. The equilibrium queue size remains very small. As the propagation delays increase, the maximum of the overshoot

observed in the transient period increases, thus causing an increase in the average queue size.

7.5.2.1.3 Effect of the Number of Users

We finally investigate the performance of ACP as we increase the number of users utilizing the single bottleneck link network in Figure 7.4. We consider numbers of users in the range 1 to 1000. Plots of the bottleneck utilization and the average queue size versus the number of users are shown in Figure 7.7. We observe that up to approximately 800 users the protocol satisfies the control objectives, as it achieves high network utilization and small queue sizes. We observe in Figure 7.7 that when the network is utilized by more than 800 users, the utilization drops to about 90% and the average queue size increases. The reason for this is that as the number of users increases, the queue size experiences oscillations; these oscillations grow in amplitude, and at some point they cause a significant degradation in performance.

The cause of the oscillations is that at such a high number of users, the fair congestion window is close to 1. Because the congestion window can only take integer values, when the fair congestion window is not an integer (which is the common case), the congestion window of each user oscillates, for example between 1 and 2, and the desired sending rate at the link is forced to oscillate about the equilibrium value, thus causing oscillations of the input data rate and the queue size. These oscillations of the congestion window cause both the utilization and the queue size to oscillate. This behavior causes a decrease in the observed average utilization and an increase in the observed average and equilibrium queue size. However, unlike the previous two cases, the equilibrium queue size is not close to zero; it exhibits behavior similar to that of the average queue size. The oscillations dominate the overshoots observed during the transient period, and so the calculated equilibrium queue size is very close to the average queue size.
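The onset of the oscillations around 800 users is consistent with a back-of-the-envelope calculation of the fair congestion window. The sketch below is illustrative, using the basic-setup parameters assumed earlier; it shows how the per-user window approaches 1 packet as the number of users grows.

```python
def fair_window(capacity_bps, rtt_s, n_users, pkt_bytes=1000):
    """Per-user max-min fair congestion window in packets."""
    return capacity_bps / 8.0 * rtt_s / (n_users * pkt_bytes)

# 155 Mb/sec bottleneck, 80 msec round-trip propagation delay:
for n in (50, 500, 800, 1000):
    print(n, round(fair_window(155e6, 0.080, n), 2))
# -> 50: 31.0, 500: 3.1, 800: 1.94, 1000: 1.55 packets; once the fair
#    window falls between 1 and 2, integer-valued windows must oscillate.
```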

FIGURE 7.7
ACP achieves high network utilization and experiences no packet drops as the number of users increases. At a high number of users, the utilization drops slightly and the average queue size increases. The reason is that the fair congestion window is small (close to 1). Because the congestion window can only take integer values, both the utilization and the queue size oscillate, thus causing a slight degradation in performance. (a) Utilization vs. number of users; (b) average queue size vs. number of users.

7.5.2.2 Performance in the Presence of Short Flows

In our performance analysis so far we have only considered persistent FTP flows that generate bulk data transfers. Internet traffic, however, consists of both short and long flows. The set of flows is dominated by a relatively few elephants (long flows) and a very large number of mice (short flows). Elephants account for the biggest percentage of the network traffic. Short flows account for a smaller percentage which, however, cannot be ignored. In this section, we evaluate the performance of ACP in the presence of short web-like flows.

We consider the single bottleneck link network shown in Figure 7.4. The bandwidth of each link is set equal to 155 Mbits/sec and the round-trip propagation delay is equal to 80 msec. Fifty persistent FTP flows share the single bottleneck link with short web-like flows. Short flows arrive according to a Poisson process. We conduct a number of tests where we change the mean of this arrival process to emulate different traffic loads. The transfer size is derived from a Pareto distribution with an average of 30 packets. The shape parameter of this distribution is set to 1.35. In Figure 7.8 we show plots of the utilization and the average queue size at the bottleneck link versus the mean arrival rate of the short flows. We observe

that as we increase the arrival rate, the utilization drops slightly, whereas both the average queue size and the equilibrium queue size increase. It must be noted that 500 users per second corresponds to a link load of 75%. Experiments have shown that short flows account for about 20% of the traffic. Even at the highest arrival rate considered, the utilization recorded at the bottleneck link is 96%, which is satisfactory. The important thing is that the queue size remains small and no packet drops are observed.

FIGURE 7.8
ACP achieves high network utilization and maintains small queue sizes as the arrival rate of short web-like flows increases. In the simulations, the transfer size of the short flows is derived from a Pareto distribution with an average of 30 packets and a shape factor equal to 1.35. Note that 500 users per second corresponds to a link load of 75%. (a) Utilization vs. mice arrival rate; (b) average and equilibrium queue size vs. mice arrival rate.
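The 75% load figure can be verified directly from the short-flow parameters, and the Pareto transfer sizes are straightforward to generate. The following sketch does both; the sampling formula is the standard inverse-transform method, and the parameter names are assumptions.

```python
import random

# Offered load of the mice: arrival rate x mean size x packet size.
rate = 500            # short flows per second
mean_pkts = 30        # mean transfer size in packets
load_bps = rate * mean_pkts * 1000 * 8
print(load_bps / 155e6)  # -> ~0.77, i.e., roughly 75% of 155 Mb/sec

def pareto_size(mean=30.0, shape=1.35):
    """Sample a transfer size (in packets) from a Pareto distribution.

    The scale is chosen so that the mean matches:
    mean = shape * scale / (shape - 1).
    """
    scale = mean * (shape - 1.0) / shape
    u = 1.0 - random.random()        # uniform in (0, 1]
    return scale / u ** (1.0 / shape)
```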

7.5.2.3 Fairness

Our objective has been to develop a congestion control protocol that at equilibrium achieves max-min fairness. In this section we investigate the effectiveness of ACP in achieving max-min fairness in a scenario where the max-min fair sending rates change dynamically due to changes in the network load. We consider the three-link network shown in Figure 7.9. The bandwidth of each link is set equal to 155 Mbits/sec and the propagation delay of each link is set equal to 20 msec.

FIGURE 7.9
A three-link network used to investigate the ability of ACP to achieve max-min fairness. (User 1 traverses all three links; user 2 enters at the first link; users 3 and 4 enter at the second link; users 5–7 enter at the third link. All links are 155 Mb/sec with 20 msec propagation delay.)

Seven users utilize the network at different time intervals. The first two users utilize the network throughout the simulation; users 3 and 4 start sending data at 20 sec, and users 5 to 7 start sending data at 40 sec. The path of the first user traverses all three links, while the path of the second user traverses the first link only. At the beginning, only users 1 and 2 utilize the network. During the time that only these two users are active, the first link is the bottleneck link of the network and the fair sending rate for the two users is 77.5 Mbits/sec. At 20 sec users 3 and 4 enter the network. Both of these users traverse the second link, which becomes the bottleneck link for users 1, 3, and 4. User 2 is still bottlenecked at the first link because this is the only link that it utilizes. Note that at 20 sec user 2 increases its window to take up the slack created by user 1, which now shares the bandwidth of link 2 with the other two users. At 40 sec users 5 to 7 start sending data through the third link, which now becomes the bottleneck link for users 1, 5, 6, and 7, whereas users 3 and 4 are still bottlenecked at the second link.

In Figure 7.10 we show the time responses of the congestion window of a representative number of users. These responses are compared with the theoretical max-min allocation values at each time. The actual responses are denoted by solid lines, whereas the theoretical values are denoted by dotted lines. We observe that at equilibrium the actual values match exactly the theoretical values, which implies that max-min fairness is achieved at all times. One thing to notice is that during the first 20 sec, the congestion windows of users 1 and 2 are different, despite the fact that their theoretical max-min sending rates in this period are the same. There is no inconsistency between the two observations: although their sending rates are identical, the two users experience different round-trip propagation delays, as they travel a different number of hops, and the different round-trip times generate different congestion windows. This demonstrates the ability of ACP to achieve fairness in the presence of flows with different round-trip times and numbers of hops.
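The theoretical values against which the responses are compared follow from the standard max-min (water-filling) allocation; a small solver such as the one below reproduces them. This is a generic textbook algorithm, not the authors' code.

```python
def max_min_rates(links, flows):
    """Progressive-filling max-min allocation.

    links: {link_id: capacity}; flows: {flow_id: [link_ids on its path]}.
    Repeatedly saturate the tightest link and freeze the flows through it.
    """
    rates, active, cap = {}, set(flows), dict(links)
    while active:
        # fair share each remaining link could give its unfrozen flows
        share = {l: cap[l] / sum(1 for f in active if l in flows[f])
                 for l in cap if any(l in flows[f] for f in active)}
        bottleneck = min(share, key=share.get)
        for f in [f for f in active if bottleneck in flows[f]]:
            rates[f] = share[bottleneck]
            active.discard(f)
            for l in flows[f]:
                cap[l] -= rates[f]
        del cap[bottleneck]
    return rates

# First 20 sec of the Figure 7.9 scenario: users 1 and 2 on link 1.
print(max_min_rates({1: 155.0, 2: 155.0, 3: 155.0},
                    {"u1": [1, 2, 3], "u2": [1]}))
# -> both users get 77.5 Mb/sec, as stated in the text
```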

FIGURE 7.10
Time response of the congestion window of a representative number of users, compared with the theoretical max-min values. The theoretical values are denoted by dotted lines.

Also note that the response of user 4 equals the response of user 3, and the responses of users 6 and 7 are equal to the response of user 5; they are thus not shown.

Another interesting observation is the overshoot in the response of user 3, which can be observed in Figure 7.10. This is a result of the second link becoming a bottleneck link only when users 3 and 4 enter the network. During the time that only users 1 and 2 utilize the network, the two users are bottlenecked at the first link, and so the input data rate at the second link is consistently less than the capacity. This causes the algorithm that updates the desired sending rate at the link to consistently increase the desired sending rate. Basically, the link asks for more data; the users do not comply because they are bottlenecked elsewhere, and the link reacts by asking for even more data. The desired sending rate, however, does not increase indefinitely. A projection operator in the link algorithm causes the desired sending rate at the second link to converge to the link capacity. When users 3 and 4 enter the network, the second link becomes their bottleneck link. Their sending rate thus becomes equal to the desired sending rate computed at the link. Because the desired sending rate is originally equal to the link capacity, the congestion windows of the two users experience an overshoot before settling down to their equilibrium value. Despite this overshoot, the system does not experience any packet drops.

The above setting can be used to emulate the case where network users cannot comply with the network's request because they do not have enough data to send. The above shows the ability of ACP to cope with this case as well.

7.5.2.4 Dynamics of ACP

To fully characterize the performance of the proposed protocol, apart from the properties of the system at equilibrium, we need to investigate its transient properties. The protocol must generate smooth responses that are well damped and converge fast to the desired equilibrium state. To conduct our study we consider the single bottleneck link network shown in Figure 7.4 and we generate a dynamic environment where users enter and leave the network at different times. Thirty users originally utilize the network. At 30 sec, 20 of these users stop sending data simultaneously, so the number of users utilizing the network is reduced to 10. At 45 sec, 40 additional users enter the network, thus causing the number of users to increase to 50. In such an environment, we investigate the dynamics of the user sending rates, we examine the queuing dynamics at the bottleneck link, and we also evaluate the performance of the estimator which is used to track the number of users utilizing the network.

FIGURE 7.11
Time response of the congestion window of three users. User 1 utilizes the network throughout the simulation, user 30 stops sending data at 30 sec, and user 40 enters the network at 45 sec. We observe smooth and fast responses with no oscillations.

In Figure 7.11 we present the time responses of the congestion window of a representative number of users. The transient behavior of the other users is very similar to the ones shown in Figure 7.11. We observe that the protocol achieves smooth responses which converge fast to the desired equilibrium with no oscillations. When user 1 starts sending data, it converges fast to its max-min fair allocation. Because the users gradually enter the network, the max-min allocation gradually decreases. However, in some cases the congestion windows experience overshoots. This is why

the congestion window of user 1 experiences a large overshoot before settling down to its equilibrium value.

When the 20 users suddenly stop sending data at 30 sec, the flow of data through the bottleneck link drops, thus causing an instantaneous decrease in the utilization. The link identifies this drop in the input data rate and reacts by increasing its desired sending rate. This causes user 1 to increase its congestion window. The time response in Figure 7.11 indicates fast convergence to the new equilibrium value with no oscillations. However, the response does experience a small overshoot before settling down to its equilibrium value. This slight overshoot is caused by the feedback delays and the pure integral action of the congestion controller. It could be avoided by introducing proportional action; however, such a modification would increase the complexity of the algorithm without significantly improving the performance and is thus avoided.

When the 40 new users enter the network at 45 sec, the max-min fair sending rate decreases. This causes user 1 to decrease its congestion window and user 40, which has just entered the network, to gradually increase its congestion window to the equilibrium value. We observe from Figure 7.11 that user 1 converges fast to the new equilibrium value with no undershoots or oscillations. We also observe that the time response of the congestion window of user 40 experiences a small overshoot before settling down to its equilibrium value. This is due to the fact that the user sets its sending rate equal to the desired sending rate calculated at the bottleneck link while the latter is still decreasing. Note, however, that once the desired sending rate calculated at the bottleneck link has settled down to an equilibrium value, a new user converges fast to the max-min allocation value with no overshoots.

The next thing we investigate is the transient behavior of the utilization and the queue size at the bottleneck link. In Figure 7.12 we show the time responses of the utilization and the queue size at the bottleneck link. We observe that the link utilization converges fast to a value which is close to 1. When the 20 users leave the network, the flow of data suddenly decreases, thus causing an instantaneous underutilization of the link. However, the system reacts quickly by increasing the sending rate of the remaining users, thus achieving almost full utilization in a very short period of time. It might seem strange that we observe increasing queue sizes when users leave the network. This is caused by the fact that the remaining users, while they increase their sending rates to take up the slack created by the users that left the network (such as user 30), experience overshoots. The time response of the queue size indicates that the latter converges to a value that is close to 0, which is what is required by the congestion control protocol in order to avoid excessive queueing delays in the long run. However, in the transient periods during which new users enter or leave the network, the queue size experiences an instantaneous increase. It must be noted that the maximum queue size recorded in the transient period increases as the bandwidth-delay product increases. This is why, in our study of the scalability properties of ACP, the average queue size increases as we increase the

bandwidths and the delays. However, careful choice of the control parameters at the links and the delayed increase policy that we apply at the sources ensure that these overshoots do not exceed the buffer size and thus do not lead to packet drops.

FIGURE 7.12
Time response of the instantaneous utilization and the queue size at the bottleneck link. Utilization converges fast to a value that is close to 1. There is an instantaneous drop when the 20 users leave the network, but the protocol manages to recover quickly. The queue size experiences instantaneous increases when new users enter the network, but at equilibrium the queue size is almost zero. (a) Utilization vs. time; (b) queue size vs. time.

A distinct feature of the proposed congestion control strategy is the implementation at each link of an estimation algorithm that estimates the number of flows utilizing the link. These estimates are required to maintain stability in the presence of delays. Here, we evaluate the performance of the proposed

estimation algorithm. In the scenario that we have described in the previous subsection, the number of users utilizing the single bottleneck link network changes from 30 to 10 at 30 sec, and it becomes 50 at 45 sec. We evaluate the performance of the proposed estimation algorithm by investigating how well the estimator tracks these changes. In Figure 7.13 we show the time response of the output of the estimator. We observe that the estimator generates smooth responses with no overshoots or oscillations. In addition, the estimator tracks the changes in the number of users and produces correct estimates at equilibrium.

FIGURE 7.13
Time response of the estimated number of users utilizing the bottleneck link. We observe almost perfect tracking at equilibrium and fast responses with no overshoots.

7.5.2.5 A Multilink Example

Until now we have evaluated the performance of ACP in simple network topologies that include one, two, or three links. Our objective in this section is to investigate how ACP performs in a more complex network topology. We consider the parking lot topology shown in Figure 7.14.

FIGURE 7.14
A parking lot network topology. (Routers 1–9 connect eight links in series; all links are 155 Mb/sec with 15 msec propagation delay, except link 4, which is 80 Mb/sec. Twenty users traverse the whole path, and each link also carries 20 single-hop users.)

The network consists of eight links connected in series. All links have a bandwidth of 155 Mbits/sec except link 4, which has a bandwidth of 80 Mbits/sec. The propagation delay of all links is set equal to 15 msec. Twenty users utilize the network by traversing all eight links. In addition, each link in the network is utilized by an additional 20 users which have single-hop paths, as shown in Figure 7.14. In this way, all links in the network are bottleneck links, and link 4 is the single bottleneck link for the 20 users that traverse the whole network. We evaluate the performance of ACP by examining the utilization and the average queue size observed at each link. We do not report packet drops, as we do not observe any. In Figure 7.15 we show on separate graphs the utilization achieved at each link and the average and equilibrium queue sizes recorded at each link.
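The max-min allocation for this topology is again easy to predict with progressive filling; the short calculation below gives the resulting illustrative rates (derived here under the stated link parameters, not quoted from the chapter).

```python
# Illustrative max-min prediction for the parking-lot topology of
# Figure 7.14 (derived arithmetic, not values from the chapter).
C_other, C4 = 155.0, 80.0             # Mb/sec
long_flows, local_flows = 20, 20

# Link 4 is the tightest: 40 flows share 80 Mb/sec.
r_long = C4 / (long_flows + local_flows)           # 2.0 Mb/sec each
# Every other link splits its residual capacity among its local flows.
r_local = (C_other - long_flows * r_long) / local_flows
print(r_long, r_local)                             # -> 2.0, 5.75 Mb/sec
```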

Indeed, we observe that ACP achieves almost full utilization at all links. Because all links in the network are bottleneck links for some flows, we do expect them to be fully utilized. Moreover, both the equilibrium queue size and the average queue size remain small. This is consistent with our observations in previous sections. At link 4 we observe a smaller average queue size. This is due to its smaller bandwidth-delay product.

FIGURE 7.15
ACP achieves high utilization at all links and experiences no packet drops. In addition, it manages to maintain small queue sizes. (a) Utilization at each link; (b) average and equilibrium queue size at each link.

7.5.2.6 Comparison with XCP

Our objective has been to develop a congestion control protocol that does not require maintenance of per-flow states within the network and satisfies all the design objectives. The explicit congestion control protocol (XCP), which has been recently developed in Reference [40], satisfies most of the design objectives but fails to achieve max-min fairness in the case of multiple congested links. It has been shown through analysis and simulations that when the majority of flows at a particular link are bottlenecked elsewhere, the remaining flows do not make efficient use of the residual bandwidth [43]. In this section, we consider a topology where the above problem is evident and we demonstrate that ACP fixes this problem and achieves max-min fairness.

FIGURE 7.16
A two-link network used to investigate the ability of ACP to achieve max-min fairness at equilibrium. (Link 1 is 155 Mb/sec, 15 msec; link 2 is 80 Mb/sec, 15 msec. Sources 1–10 have 15 msec access links and sinks at the end of link 1; sources 11–60 have 15 msec access links, sources 61–70 have 100 msec access links, and sources 71–80 have 2 msec access links, with sinks at the end of link 2. All access links are 155 Mb/sec.)

We consider a simulation scenario that involves users with heterogeneous round-trip times. We consider the two-link network shown in Figure 7.16. Link 1 has a bandwidth of 155 Mbits/sec, whereas link 2 has a bandwidth of 80 Mbits/sec. Eighty users access the network through 155-Mbits/sec access links. The access links of the first 60 users have a propagation delay of 15 msec, the access links of the next 10 users have a propagation delay of 100 msec, and the propagation delays of the access links of the last 10 users are set to 2 msec. We have chosen a variety of

propagation delays to investigate the ability of ACP to achieve fairness in the presence of flows with multiple round-trip times. The first 10 users of the network have connection sinks at the first router, and the rest of the users have connection sinks at the second router. This has been done to ensure that both links are bottleneck links for some flows. The first 10 users are bottlenecked at link 1, whereas the remaining users are bottlenecked at link 2.

We simulate the above scenario using both XCP and ACP users. In Table 7.1 we compare the theoretical max-min congestion window values with the equilibrium values achieved by ACP and XCP. We observe that ACP matches exactly the theoretical values, whereas XCP does not.

TABLE 7.1
Theoretical Max-Min Fair Values Compared with the Equilibrium Values Achieved by ACP and XCP

Users   Round-Trip     Max-Min Congestion   ACP Congestion   XCP Congestion
        Time (msec)    Window               Window           Window
1–10    60             56                   56               40
11–60   90             13                   13               13
61–70   260            37                   37               37
71–80   62             9                    9                9
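The theoretical windows in Table 7.1 follow from the max-min rates and the round-trip times; the short computation below reproduces them under the stated parameters (1000-byte packets are assumed, as in the earlier simulations).

```python
# Reproduce the "Max-Min Congestion Window" column of Table 7.1.
PKT = 1000.0                      # bytes per packet (assumed)
C1, C2 = 155e6 / 8, 80e6 / 8      # link capacities in bytes/sec

r2 = C2 / 70                      # 70 users share link 2 (max-min share)
r1 = (C1 - 70 * r2) / 10          # 10 users split link 1's residual

for users, rtt_ms, rate in (("1-10", 60, r1), ("11-60", 90, r2),
                            ("61-70", 260, r2), ("71-80", 62, r2)):
    print(users, round(rate * rtt_ms / 1000.0 / PKT))
# -> 56, 13, 37, 9 packets, matching the table
```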

XCP fails to assign max-min sending rates to the first 10 users, which utilize link 1 only. This is consistent with the findings in Reference [43]. The other users traversing link 1 are bottlenecked at link 2, and so the 10 users that are bottlenecked at link 1 do not make efficient use of the available bandwidth. This inefficiency causes underutilization of link 1. This is demonstrated in Figure 7.17, where we plot the time response of the utilization achieved at link 1 by the ACP and the XCP users. Obviously, XCP causes underutilization of the link, whereas ACP achieves almost full utilization of the link at equilibrium. This example demonstrates that ACP outperforms XCP in both utilization and fairness. Another thing to note in Table 7.1 is the ability of ACP to achieve max-min fairness despite the presence of flows with a variety of round-trip times.

FIGURE 7.17
Time response of the utilization at the first link achieved by ACP and XCP. We observe that ACP achieves higher utilization.

7.5.2.7 Comparison with RCP

RCP and ACP were developed independently based on similar design principles. Although RCP and ACP were motivated by the same design ideas, the design objectives of the two protocols are different. The main objective of RCP is to minimize the duration of the network flows, whereas the main objective of ACP is to optimize network-centric performance metrics such as fairness, utilization, queue sizes, and packet drops. Because their objectives differ, they implement different algorithms at both the sources and the links.

At each source, RCP applies a rather aggressive increase policy where it immediately adopts the desired sending rate received from the network as the current sending rate of the source. This is done to ensure that flows with small file sizes finish their sessions quickly. However, such an aggressive increase policy, to avoid packet losses, must be accompanied by an aggressive decrease policy in the case of congestion. As we will see later, such an aggressive decrease policy can cause RCP to underutilize the network for a significant time period. ACP, on the other hand, applies a more conservative policy both when increasing and when decreasing the source sending rate. This conservative policy ensures no packet losses and high network utilization. However, it does take several round-trip times for each source to converge to its max-min fair sending rate, and this can cause larger duration of flows with small file sizes, especially for new users.

RCP and ACP also have fundamental differences in the implementation of the algorithm that updates the desired sending rate. RCP implements a nonlinear congestion controller, whereas ACP implements a certainty equivalent controller. The properties of the RCP controller have been established by linearizing the nonlinear equations in a small neighborhood about the stable equilibrium point. However, the linear model is a poor approximation of the nonlinear model in some regions of the state space. These model inaccuracies can cause the RCP algorithm to deviate significantly from the predicted behavior and perform poorly in some scenarios. Specifically, when the desired sending rate experiences a large undershoot, the controller is very slow in recovering, thus causing underutilization of the network for large time intervals. ACP, on the other hand, implements a certainty equivalent controller at each link. The controller is designed assuming that the number of users utilizing the link is known. In practice the latter is an unknown time-varying parameter. We utilize online parameter identification techniques to estimate this parameter online. We then replace the known parameter in the control algorithm with its estimate to yield the certainty equivalent controller.

RCP performs poorly when the network experiences sudden changes in the traffic load. In this section we demonstrate this behavior of RCP, and we show that ACP continues to perform well in the scenarios where RCP performs poorly. We consider the single bottleneck link network of Figure 7.4. The bandwidth of each link is set to 155 Mbit/sec and the round-trip propagation delay is set equal to 80 msec. The network is initially utilized by only one user. At 15 sec a second user enters the network. This represents a 100% increase in the traffic load at the bottleneck link. We simulate both ACP and RCP networks.

FIGURE 7.18
Time responses of the congestion window of the network users for (a) ACP and (b) RCP. Observe that the second RCP user converges very slowly to its equilibrium value.

In Figure 7.18 we show the time responses of the congestion window of users 1 and 2 for ACP and RCP. We observe that ACP generates smooth responses which gradually converge to their equilibrium values. It takes several round-trip times for the congestion windows to converge to their equilibrium values. The sending rate of user 1 converges to the bandwidth of the bottleneck link and then gradually decreases to half of this value when user 2 enters the network. RCP, on the other hand, adopts a more aggressive response policy. Note how quickly user 1 originally converges to its equilibrium value. However, when user 2 enters the network, its sending rate is set equal to the sending rate of user 1. This causes excessive queue sizes at the bottleneck link. The aggressive decrease policy that RCP adopts then causes the desired sending rate calculated at the link to decrease to the minimum value allowed by the control algorithm, which is one packet. When this happens, the desired sending rate does not recover quickly. It remains close to one for approximately 5 sec and converges to the equilibrium value in 15 sec. This slow response is a result of the nonlinear control algorithm RCP utilizes to calculate the desired sending rate. The nonlinearity causes slow responses when the desired rate experiences large undershoots. Such sudden changes in load can thus cause RCP to underutilize the network for significant time periods, and the problem is exacerbated as we increase the link bandwidths.

In Figure 7.19 we show the time responses of the utilization of the bottleneck link achieved by ACP and RCP. We observe that RCP underutilizes the network for a significant amount of time when the second user enters the network, whereas ACP achieves almost full utilization in that period.

FIGURE 7.19
Time response of the utilization achieved at the bottleneck link by ACP and RCP. We observe that RCP underutilizes the network for a significant amount of time when the second user enters the network, whereas ACP achieves almost full utilization in that period.

7.6 Conclusions

This chapter provides a survey of recent theoretical and practical developments in the design of Internet congestion control protocols for networks of

arbitrary topology. This effort has so far failed to produce protocols that satisfy all the design requirements. We present a theoretical framework that has been used extensively in the last few years to design congestion control protocols with verifiable properties. In this framework the congestion control problem is viewed as a resource allocation problem, which is transformed through a suitable representation into a nonlinear programming problem. The relevant cost functions serve as Lyapunov functions for the derived algorithms, thus demonstrating how local dynamics are coupled to achieve a global objective. Many of the derived algorithms have been shown to have globally stable equilibrium points in the presence of delays. However, for max-min congestion controllers the problem of asymptotic stability in the presence of delays still remains open. So, for this class of problems, the performance of the algorithms in networks of arbitrary topology has been demonstrated through simulations and practical implementation. In this chapter we present a new adaptive congestion control protocol, which is shown through simulations to outperform previous proposals and work effectively in a number of scenarios. Even though we have not yet established analytically its global convergence, simulation results are encouraging. Future work will consider the analytical establishment of global stability for arbitrary topologies in the presence of delays.

References

1. V. Jacobson. Congestion avoidance and control. In Symposium Proceedings on Communications Architectures and Protocols, pages 314–329, ACM Press, New York, August 1988.
2. W. Stevens. TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms. RFC 2001, January 1997.
3. M. Allman, V. Paxson, and W. Stevens. TCP congestion control. RFC 2581, April 1999.
4. V. Jacobson, R. Braden, and D. Borman. TCP extensions for high performance. RFC 1323, May 1992.
5. M. Mathis, J. Madhavi, S. Floyd, and A. Romanow. TCP selective acknowledgement options. RFC 2018, October 1996.
6. S. Floyd and T. Henderson. The NewReno modification to TCP's fast recovery algorithm. RFC 2582, April 1999.
7. T. V. Lakshman and U. Madhow. The performance of TCP/IP for networks with high bandwidth-delay products and random loss. IEEE/ACM Transactions on Networking, 5(3):336–350, June 1997.
8. Y. Li, D. Leith, and R. N. Shorten. Experimental evaluation of TCP protocols for high-speed networks. Technical Report, Hamilton Institute, Maynooth, Ireland, June 2005.
9. W. Willinger and J. Doyle. Robustness and the Internet: Design and evolution. http://netlab.caltech.edu/internet/, 2002.
10. S. H. Low, F. Paganini, J. Wang, S. Adlakha, and J. C. Doyle. Dynamics of TCP/RED and a scalable control. In Proc. IEEE INFOCOM, volume 1, pages 23–27, New York, June 2002.
11. S. Floyd. High speed TCP for large congestion windows. RFC 3649, December 2003.
12. T. Kelly. Scalable TCP: Improving performance in highspeed wide area networks. Computer Communication Review, 32(2):83–91, April 2003.
13. D. X. Wei, C. Jin, S. H. Low, and S. Hegde. FAST TCP: Motivation, architecture, algorithms, performance. IEEE/ACM Transactions on Networking, December 2006.
14. R. Caceres and L. Iftode. Improving the performance of reliable transport protocols in mobile computing environments. IEEE Journal on Selected Areas in Communications, 13(5):850–857, June 1995.
15. H. Hayden. Voice flow control in integrated packet networks. Technical Report LIDS-TH-1152, MIT Laboratory for Information and Decision Systems, Cambridge, MA, 1981.
16. F. Bonomi and K. Fendick. The rate-based flow control framework for the available bit rate ATM service. IEEE Network, 9(2):25–39, March/April 1995.
17. F. Kelly, A. Maulloo, and D. Tan. Rate control in communication networks: Shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49:237–252, 1998.
18. S. H. Low and D. E. Lapsley. Optimization flow control, I: Basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861–874, December 1999.
19. S. Kunniyur and R. Srikant. Analysis and design of an adaptive virtual queue algorithm for active queue management. IEEE/ACM Transactions on Networking, 12(2):286–299, April 2004.
20. R. Johari and D. Tan. End-to-end congestion control for the Internet: Delays and stability. IEEE/ACM Transactions on Networking, 9(6):818–832, December 2001.
21. G. Vinnicombe. On the stability of end-to-end congestion control for the Internet. Technical Report CUED/F-INFENG/TR.398, Cambridge University Engineering Department, Cambridge, UK, December 2000.
22. L. Ying, G. Dullerud, and R. Srikant. Global stability of Internet congestion controllers with heterogeneous delays. In Proceedings of the American Control Conference, volume 3, pages 2948–2953, June 2004.
23. F. Paganini, Z. Wang, J. C. Doyle, and S. H. Low. Congestion control for high performance, stability and fairness in general networks. IEEE/ACM Transactions on Networking, 13(1):43–56, February 2005.
24. A. Papachristodoulou. Global asymptotic stability of a TCP/AQM protocol for arbitrary networks with delay. In Proc. IEEE Conference on Decision and Control, volume 1, pages 1029–1034, December 2004.
25. S. H. Low. Controlling the Internet: A survey and some new results. In Proc. IEEE Conference on Decision and Control, volume 3, pages 3048–3057, December 2003.
26. A. Pitsillides, P. Ioannou, M. Lestas, and L. Rossides. Adaptive nonlinear congestion controller for a differentiated-services framework. IEEE/ACM Transactions on Networking, 13(1):94–107, February 2005.
27. C. Chrysostomou, A. Pitsillides, L. Rossides, M. Polycarpou, and A. Sekercioglu. Congestion control in differentiated services networks using fuzzy-RED. IFAC Control Engineering Practice (CEP) Journal, 11(10):1153–1173, September 2003.
28. N. Dukkipati, M. Kobayashi, R. Zhang-Shen, and N. McKeown. Processor sharing flows in the Internet. In Proceedings of the Thirteenth International Workshop on Quality of Service, June 2005.
29. L. Benmohamed and S. M. Meerkov. Feedback control of congestion in packet switching networks: The case of multiple congested nodes. International Journal of Communication Systems, 10(5):227–246, Sept.–Oct. 1997.
30. C. Fulton, S. Li, and C. S. Lim. An ABR feedback control scheme with tracking. In Proceedings of the IEEE INFOCOM'97, volume 2, pages 805–814, April 1997.
31. L. Kalampoukas, A. Varma, and K. K. Ramakrishnan. Explicit window adaptation: A method to enhance TCP performance. In Proceedings of the IEEE INFOCOM, pages 242–251, April 1998.
32. S. P. Abraham and A. Kumar. Stochastic approximation approach for max-min fair adaptive rate control of ABR sessions with MCRs. In Proceedings of the IEEE INFOCOM, pages 1358–1365, April 1998.
33. E. Wong and F. Bonomi. A novel explicit rate congestion control algorithm. In Proceedings of the IEEE GLOBECOM'98, volume 2, pages 2432–2439, November 1998.
34. F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33–37, January 1997.
35. B. Wydrowski and M. Zukerman. MaxNet: A congestion control architecture. IEEE Communications Letters, 6(11):512–514, November 2002.
36. B. Wydrowski, L. Andrew, and M. Zukerman. MaxNet: A congestion control architecture for scalable networks. IEEE Communications Letters, 7(10):511–513, October 2003.
37. L. Massoulie. Stability of distributed congestion control with heterogeneous feedback delays. IEEE Transactions on Automatic Control, 47(6):895–902, June 2002.
38. D. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
39. D. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Nashua, NH, 1982.
40. D. Katabi, M. Handley, and C. Rohrs. Internet congestion control for high-bandwidth-delay products. In Proceedings of ACM SIGCOMM, August 2002.
41. M. Lestas, A. Pitsillides, P. Ioannou, and G. Hadjipollas. Adaptive Congestion Protocol: A new congestion control protocol with learning capability. Computer Networks. (Submitted for publication.)
42. M. Lestas, A. Pitsillides, and P. Ioannou. A congestion control algorithm for max-min resource allocation and bounded queue sizes. In Proceedings of the American Control Conference, pages 1683–1688, June–July 2004.
43. S. H. Low, L. Andrew, and B. Wydrowski. Understanding XCP: Equilibrium and fairness. In Proceedings of the IEEE INFOCOM, pages 1025–1036, March 2005.
44. Y. Zhang, D. Leonard, and D. Loguinov. JetMax: Scalable max-min congestion control for high-speed heterogeneous networks. In Proceedings of the IEEE INFOCOM, April 2006.
45. M. Lestas, P. Ioannou, and A. Pitsillides. Global asymptotic stability of a max-min congestion control scheme. In Proceedings of the Workshop on Modeling and Control of Complex Systems, June 2005.
46. M. Lestas, P. Ioannou, and A. Pitsillides. On a hybrid model for max-min congestion controllers. In Proc. IEEE Conference on Decision and Control, pages 4689–4694, December 2004, and Proceedings of the IEEE American Control Conference, June 2006.

8
Persistent Autonomous Formations and Cohesive Motion Control

Barış Fidan, Brian D. O. Anderson, Changbin Yu, and Julien M. Hendrickx

CONTENTS
8.1 Introduction ............................................................................ 248
8.2 Rigid and Persistent Formations ........................................... 250
    8.2.1 Rigid Formations ......................................................... 251
    8.2.2 Constraint-Consistent and Persistent Formations ....... 252
8.3 Acquisition and Maintenance of Persistence ......................... 255
    8.3.1 Acquiring Persistence .................................................. 255
    8.3.2 Persistent Formation Reconfiguration Operations ....... 259
    8.3.3 Maintaining Persistence During Formation
          Reconfigurations ........................................................... 260
8.4 Cohesive Motion of Persistent Formations ............................ 263
    8.4.1 Problem Definition ....................................................... 263
    8.4.2 Acyclically Led and Cyclically Led Formations ............ 264
8.5 Decentralized Control of Cohesive Motion ............................ 266
    8.5.1 Control Design ............................................................. 266
          8.5.1.1 Control Law for Zero-DOF Agents ................... 266
          8.5.1.2 Control Law for One-DOF Agents .................... 267
          8.5.1.3 Control Law for Two-DOF Agents ................... 268
    8.5.2 Stability and Convergence ........................................... 269
          8.5.2.1 Acyclically Led Minimally Persistent
                  Formations ......................................................... 269
          8.5.2.2 Cyclically Led Minimally Persistent
                  Formations ......................................................... 270
    8.5.3 More Complex Agent Models ....................................... 271
8.6 Discussions and Future Directions ........................................ 273
Acknowledgment .......................................................................... 273
References ..................................................................................... 274

8.1 Introduction

Recently, the topic of distributed motion control of autonomous multiagent systems has gained significant attention, in parallel with the interest in real-life applications of such systems involving teams of unmanned aerial and ground vehicles, combat and surveillance robots, underwater vehicles, and so on [1–4,6,12,20,22–24,31]. This topic presents numerous aspects to be explored, corresponding to different control tasks of interest, control approaches to be followed, assumed agent dynamics and interagent information structures, and so on. In this chapter, using a recently developed theoretical framework of graph rigidity and persistence, we analyze a general class of autonomous multiagent systems moving in formation, namely persistent formations, where the formation shape is maintained during any continuous motion via a set of constraints on each agent to keep its distances from a prespecified group of other neighboring agents constant. Leaving the agent dynamics issues to future studies in the field and focusing on the motion of the entire formation rather than individual agent behaviors, we assume a point-agent system model [14,26].¹

Before listing the contents and contributions of the chapter, we give an intuitive introduction to the fundamental terms to be used throughout the chapter. The formal definitions of these terms (where needed) will be given later in the chapter. We use the term formation for a collection of agents moving in real two- or three-dimensional space to fulfill certain mission requirements. We represent each multiagent formation F by a graph G_F = (V_F, E_F) with a vertex set V_F and an edge set E_F, where each vertex i ∈ V_F corresponds to an agent A_i in F and each edge (i, j) ∈ E_F corresponds to an information link between a pair (A_i, A_j) of agents. G_F is also called the underlying graph of the formation F. Typically, the agent pairs in F whose interdistances are explicitly maintained are the ones having information (i.e., sensing and communication) links in between. G_F for a particular F can be directed or undirected, depending on the properties of the information links of F, as will be discussed below.

A formation F with an underlying graph G_F = (V_F, E_F) is called rigid if, by explicitly maintaining distances between all the pairs of agents that are connected by an information link, that is, whose representative vertices are connected by an edge in E_F, the distances between all other pairs of agents in F are consequentially held fixed as well, and hence F can move as a cohesive whole.

¹ It is worth noting here that agent dynamics and dynamic interactions are major issues in real-world multivehicle formation control; some further discussions on these issues can be found in Reference [25] and the references therein.

There are two types of control structures that can be used to maintain the required distance between pairs of agents in a formation: symmetric control and asymmetric control. In the symmetric case, there is a joint effort of both agent A_i and agent A_j to simultaneously and actively maintain their relative positions, for example, to keep the distance between agents A_i and A_j at a desired value d_ij. The associated undirected underlying graph will have an undirected edge (i, j) between vertices i and j. In the asymmetric case, which is the control structure assumed in this chapter, only one of the agents in each pair, for example, agent A_i, actively maintains its distance to agent A_j at the desired value d_ij. This means that only agent A_i has to receive the position information broadcast by agent A_j, or sense the position of agent A_j, and it can make decisions on its own. Hence, both the overall control complexity and the communication complexity, in terms of messages sent or information sensed, are expected to be reduced by half for the formation. This structure is modeled in the associated (directed) underlying graph G_F = (V_F, E_F) by a directed edge (i, j) ∈ E_F from vertex i to vertex j. In this case, we also say that A_i has the constraint of staying at a distance d_ij from A_j, or that A_i follows A_j, or that A_i is a follower of A_j.

For a formation F with asymmetric control structure, explicit maintenance of the distance between an agent pair (A_i, A_j) with an information link between the two corresponds to keeping the length of the edge (i, j) ∈ E_F constant when the formation moves. If enough agent pairs explicitly maintain distances, all remaining interagent distances will be consequently maintained and the formation will be rigid. If each agent in F is able to satisfy all the constraints on it provided that all other agents within F are trying to satisfy their constraints (i.e., trying to satisfy as many of their constraints as possible), then F is called constraint consistent (examples of both a constraint-consistent formation and a formation lacking constraint consistence will be presented subsequently). A formation that is both rigid and constraint consistent is called persistent [31].²

In a persistent formation, provided that all the agents are trying to satisfy the distance constraints on them, they can in fact satisfy these constraints and, consequently, the global structure of the formation is preserved; that is, the formation necessarily moves as a cohesive whole. For a given persistent formation F, if removal of any single edge (in the underlying graph) makes F nonpersistent, then F is further called minimally persistent. Hence, a minimally persistent formation provably preserves its persistence with a minimal number of edges.

Persistence appears to be the crucial property of an information/control architecture of a formation that ensures that the formation can move cohesively.

² There exists an exceptional small class of formations in ℜ³ for which the intuitive explanation here and the formal definition of persistence given in Section 8.2.2 do not match. This special class is further discussed in Section 8.2.2.
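As a concrete illustration of the asymmetric structure, a formation can be represented simply as a set of directed distance constraints. The toy sketch below uses an assumed data layout (not notation from the chapter) to encode "A_i follows A_j at distance d_ij" and lets each agent look up its own constraints.

```python
# A directed formation as a dictionary of distance constraints:
# (follower, target) -> desired distance. Follower i alone is
# responsible for keeping |p_i - p_j| = d_ij (asymmetric control).
formation = {
    (2, 1): 1.0,   # agent 2 follows agent 1
    (3, 1): 1.0,   # agent 3 follows agent 1
    (3, 2): 1.0,   # agent 3 also follows agent 2
}

def constraints_of(agent, formation):
    """Distance constraints that `agent` must actively satisfy."""
    return {j: d for (i, j), d in formation.items() if i == agent}

print(constraints_of(3, formation))  # -> {1: 1.0, 2: 1.0}
```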

Minimal persistence characterizes those situations where the loss of a single link means that cohesiveness of the motion is no longer assured; nevertheless, from an operational point of view, nonminimal persistence may be desirable to secure redundancy [8].

In Section 8.2, we review the general characteristics of rigid and persistent formations using a recently established framework of rigid and persistent graphs; for details the reader may refer to References [6,14,29,31]. We present some operational criteria to check the persistence of a given formation. Based on these characteristics and criteria, in Section 8.3, we focus on the acquisition and maintenance of the persistence of certain types of autonomous formations. We particularly consider systematic construction of provably persistent two-dimensional formations by assigning directions to their information links. We briefly review some common operations on persistent formations, including addition of new agents to the formation, merging two or more formations, closing ranks when an agent is lost, and splitting a formation into smaller formations, and we provide strategies for maintaining persistence during these operations.

In Sections 8.4 and 8.5, we focus on cohesive motion control of persistent autonomous formations. We present a set of distributed control schemes to move a given two-dimensional persistent formation with specified initial position and orientation to an arbitrary desired final position and orientation without deforming the shape of the formation during the motion. The control design procedure is presented assuming a velocity integrator agent model that is widely considered in the literature [1,12,26]; generalization of these designs for other kinematic models is discussed briefly as well. The chapter concludes with some mention of relevant future research directions.

We focus on formations in ℜ² and ℜ³ (two-dimensional and three-dimensional Euclidean spaces, respectively), considering real-world multivehicle formation applications, although most of the definitions and results can be generalized to the arbitrary dimensional space ℜⁿ (n ∈ {2, 3, ...}) [31].

8.2 Rigid and Persistent Formations

In this section, we give formal definitions of the rigidity and persistence notions, using the undirected and directed underlying graphs of formations, and present a brief review of the fundamental characteristics of rigid and persistent formations to the extent needed for the analysis in the following sections.

Consider a formation F with asymmetric control structure. The directed underlying graph G_F = (V_F, E_F) of F has been defined in Section 8.1. The undirected graph G_F^u = (V_F, E_F^u), with the same vertex set V_F and the undirected edge set E_F^u having the same edges as E_F but with the directions neglected, that is, satisfying

(i, j) ∈ E_F^u ⇔ ((i, j) ∈ E_F or (j, i) ∈ E_F)

is called the underlying undirected graph of the formation F (or of G_F).
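The displayed relation between E_F and E_F^u amounts to forgetting edge directions; in code, continuing the assumed edge-set representation used in the earlier illustrative fragment:

```python
def underlying_undirected(directed_edges):
    """E_F^u: the directed edge set with directions neglected.

    Each undirected edge is stored as a frozenset {i, j}, so that
    (i, j) and (j, i) collapse to the same element.
    """
    return {frozenset(e) for e in directed_edges}

E_F = {(2, 1), (3, 1), (3, 2)}
print(underlying_undirected(E_F))
# -> {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}
```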

Next, we focus on the formal definition and characterization of rigidity and persistence of formations with asymmetric control structure, using their undirected and directed underlying graphs.

8.2.1 Rigid Formations

We formally call a formation F in ℝⁿ (n ∈ {2, 3}) with asymmetric control structure rigid (and its directed underlying graph G_F generically n-rigid) if its undirected underlying graph G_F^u is generically n-rigid, where generic n-rigidity of an undirected graph (n ∈ {2, 3}) is defined in the sequel.

In ℝⁿ (n ∈ {2, 3}), a representation of an undirected graph G = (V, E) with vertex set V and edge set E is a function p : V → ℝⁿ. We say that p(i) ∈ ℝⁿ is the position of the vertex i, and we define the distance between two representations p1 and p2 of the same graph by³

δ(p1, p2) = max_{i∈V} |p1(i) − p2(i)|

A distance set d̄ for G is a set of distances d_ij > 0, defined for all edges (i, j) ∈ E. A distance set is realizable if there exists a representation p of the graph for which |p(i) − p(j)| = d_ij for all (i, j) ∈ E. Such a representation is then called a realization. Note that each representation p of a graph induces a realizable distance set [defined by d_ij = |p(i) − p(j)| for all (i, j) ∈ E]. Given a graph G = (V, E) and a corresponding distance set d̄, the pair (G, d̄) can be considered as a weighted graph [a weighted version of the graph G = (V, E)], where the weight of each edge (i, j) ∈ E is d_ij ∈ d̄.

A representation p is rigid if there exists ε > 0 such that for all realizations p′ of the distance set induced by p and satisfying δ(p, p′) < ε, there holds |p′(i) − p′(j)| = |p(i) − p(j)| for all i, j ∈ V (we say in this case that p and p′ are congruent). An undirected graph is said to be generically n-rigid, or simply n-rigid (n ∈ {2, 3}), if almost all its representations in ℝⁿ are rigid. Some discussions on the need for using the qualifiers "generic" and "almost all" can be found in References [14,27]. One reason for using these terms is to avoid the problems arising from having three or more collinear vertices in ℝ² or four or more coplanar vertices in ℝ³.

Another notion used in rigidity analysis is minimal rigidity. A graph G is called minimally n-rigid (n ∈ {2, 3}) if G is n-rigid and if there exists no n-rigid subgraph of G with the same set of vertices as G and a smaller number of edges than G. Provably equivalently, a graph is minimally n-rigid if it is n-rigid and if no single edge can be removed without losing n-rigidity.

Fundamental characteristics of rigid and minimally rigid graphs and some of their applications in autonomous formation control can be found in References [6,27–29]. Following are a selection of these characteristics.

LEMMA 1 Let G = (V, E) be a minimally n-rigid graph (n ∈ {2, 3}) and G′ = (V′, E′) be a subgraph of G. If |E′| = n|V′| − n(n + 1)/2, then G′ is minimally n-rigid.

THEOREM 1 For any n-rigid graph G = (V, E) (n ∈ {2, 3}) with at least n vertices, there exists a subset E′ ⊆ E of edges such that the graph G′ = (V, E′) is minimally n-rigid and satisfies the following: (1) |E′| = n|V| − n(n + 1)/2; (2) any subgraph G″ = (V″, E″) of G′ with at least n vertices satisfies |E″| ≤ n|V″| − n(n + 1)/2.

LEMMA 2 For n ∈ {2, 3}, a graph obtained by adding one vertex to a graph G = (V, E) and n edges connecting this vertex to other vertices of G is (minimally) n-rigid if and only if G is (minimally) n-rigid.

³ In this chapter, we use | · | to denote two different operators, one for vectors and the other for sets. For a vector ξ ∈ ℝⁿ (n ∈ {2, 3}), |ξ| denotes the Euclidean norm of ξ; hence |p1(i) − p2(i)| above denotes the Euclidean distance between p1(i) and p2(i). For a set S, |S| denotes the number of elements of S.
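The generic rigidity notion above admits a simple computational test via the rank of a rigidity matrix: for |V| ≥ 2, an undirected graph is generically 2-rigid if and only if the rigidity matrix of a generic representation has rank 2|V| − 3, and minimal 2-rigidity additionally requires |E| = 2|V| − 3. The following Python sketch is not from the chapter; it is a standard randomized test based on the rigidity literature cited above, with helper names of our own choosing:

```python
import numpy as np

def rigidity_matrix(p, edges):
    # One row per edge (i, j): p_i - p_j in the two columns of i,
    # p_j - p_i in the two columns of j, zeros elsewhere.
    m = np.zeros((len(edges), 2 * len(p)))
    for row, (i, j) in enumerate(edges):
        d = p[i] - p[j]
        m[row, 2 * i:2 * i + 2] = d
        m[row, 2 * j:2 * j + 2] = -d
    return m

def is_generically_2rigid(n_vertices, edges, seed=0):
    # Generic 2-rigidity holds iff rank(R) = 2|V| - 3 at a generic
    # representation; a random sample is generic with probability 1,
    # so a single draw suffices in practice.
    rng = np.random.default_rng(seed)
    p = rng.random((n_vertices, 2))
    return np.linalg.matrix_rank(rigidity_matrix(p, edges)) == 2 * n_vertices - 3

print(is_generically_2rigid(3, [(0, 1), (1, 2), (2, 0)]))  # triangle: True
print(is_generically_2rigid(3, [(0, 1), (1, 2)]))          # path: False
```

The triangle also satisfies |E| = 2|V| − 3 = 3, so it is minimally 2-rigid, consistent with Lemma 1.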

8.2.2 Constraint-Consistent and Persistent Formations

Similar to the definition of rigid formations, we call a formation F in ℝⁿ (n ∈ {2, 3}) with asymmetric control structure persistent (constraint consistent) if its directed underlying graph G_F is n-persistent (respectively, n-constraint consistent), where n-persistence and n-constraint consistence of a directed graph are defined as follows.

Consider a directed graph G = (V, E), a representation p : V → ℝⁿ (n ∈ {2, 3}) of G, and a set of desired distances d_ij > 0, ∀(i, j) ∈ E. Note here that the representation, vertex positions, and distance between two representations corresponding to a directed graph are defined exactly the same as the ones corresponding to undirected graphs (see Section 8.2.1). We say that the edge (i, j) ∈ E is active if |p(i) − p(j)| = d_ij. We also say that the position of the vertex i ∈ V is fitting for the distance set d̄ if it is not possible to increase the set of active edges leaving i by modifying the position of i while keeping the positions of the other vertices unchanged. More formally, given a representation p, the position of vertex i is fitting if there is no p* ∈ ℝⁿ for which

{(i, j) ∈ E : |p(i) − p(j)| = d_ij} ⊊ {(i, j) ∈ E : |p* − p(j)| = d_ij}

A representation p of a graph is called fitting for a certain distance set d̄ if all the vertices are at fitting positions for d̄. Note that any realization is a fitting representation for its distance set. The representation p is called persistent if there exists ε > 0 such that every representation p′ fitting for the distance set induced by p and satisfying δ(p, p′) < ε is congruent to p. A graph is then called generically n-persistent (n ∈ {2, 3}) if almost all its representations in ℝⁿ are persistent. Similarly, a representation p is called constraint consistent if there exists ε > 0 such that any representation p′ fitting for the distance set d̄ induced by p and satisfying δ(p, p′) < ε is a realization of d̄, and we say that a graph is generically n-constraint consistent (n ∈ {2, 3}) if almost all its representations in ℝⁿ are constraint consistent. The relation among persistence, rigidity,⁴ and constraint consistence of a directed graph is given in the following theorem and demonstrated using a two-dimensional example in Figure 8.1.

THEOREM 2 [31] A representation in ℝⁿ (n ∈ {2, 3}) is persistent if and only if it is rigid and constraint consistent. A graph is generically n-persistent (n ∈ {2, 3}) if and only if it is generically n-rigid and generically n-constraint consistent.

FIGURE 8.1 Application of Theorem 2 in ℝ². The distance set d̄ is given by d12 = d13 = d23 = d25 = d34 = d45 = 1 and (for [b] and [c]) d24 = √2. (a) The representation is constraint consistent but not rigid. (Assuming vertices [agents] 1, 2, and 3 are stationary, 4 and 5 can continuously move to new locations without violating d25, d34, d45.) Hence it is not persistent. (b) The representation is rigid but not constraint consistent; vertex 4 is unable to meet all three distance constraints [d24, d34, d45] on it at the same time. (Again assuming that 1, 2, and 3 are stationary, 5 can continuously move to new positions without violating the distance constraint [d25] on it.) Hence it is not persistent. (c) The representation is both rigid and constraint consistent; hence it is persistent.

In order to check the persistence of a directed graph G, one may use the following criterion.

PROPOSITION 1 [31] An n-persistent graph (n ∈ {2, 3}) remains n-persistent after deletion of any edge (i, j) for which d⁺(i) ≥ n + 1, where d⁻(i) and d⁺(i) designate, respectively, the in- and out-degree of the vertex i in the graph G, that is, the number of edges in G heading to and originating from i, respectively. Similarly, an n-constraint-consistent graph (n ∈ {2, 3}) remains n-constraint consistent after deletion of any edge (i, j) for which d⁺(i) ≥ n + 1.

Another notion found useful in characterizing a persistent formation F (or its underlying graph G_F) is the number of degrees of freedom (DOF count) of each agent (vertex) in ℝⁿ (n ∈ {2, 3}), which is defined as the maximal dimension, over all n-dimensional representations of G_F, of the set of possible fitting positions for this agent (vertex). In the underlying graph of an n-dimensional formation (n ∈ {2, 3}), the vertices with zero out-degree have n DOFs, the vertices with out-degree 1 have n − 1 DOFs, the ones with out-degree 2 have n − 2 DOFs, and all the other vertices have zero DOFs. In ℝⁿ (n ∈ {2, 3}), a vertex (agent) with n DOFs is also called a leader. The following corollary of Proposition 1 provides a natural bound on the total number of degrees of freedom in an n-persistent graph in ℝⁿ (n ∈ {2, 3}), which we also call the total DOF count of that graph in ℝⁿ.

COROLLARY 1 The total DOF count of an n-persistent graph in ℝⁿ (n ∈ {2, 3}) can at most be n(n + 1)/2.

In Corollary 1, considering the whole formation as a single body, note that n of the n(n + 1)/2 DOFs correspond to translations and the remaining n(n − 1)/2 correspond to rotations of a formation represented by the n-persistent graph.

Next, we present two other essential results on the characterization of persistent graphs (and hence persistent formations), proofs of which can be found in Reference [31].

THEOREM 3 A directed graph is n-persistent (n ∈ {2, 3}) if and only if all those subgraphs are n-rigid which are obtained by successively removing outgoing edges from vertices with out-degree larger than n until all such vertices have an out-degree equal to n.

PROPOSITION 2 [31] Consider a directed graph G and another directed graph G′ that is obtained by adding one vertex to G and at least n edges leaving this vertex and incident on different vertices of G (n ∈ {2, 3}). Then, G′ is n-persistent if and only if G is n-persistent.

It has been stated in Section 8.1 that there exists a particular small class of formations in ℝ³ for which the intuitive definition of persistence given in Section 8.1 and the formal definition given here in Section 8.2 do not match. The problem in this exceptional class arises when it is not possible for all the agents in a certain subset of the agent set of the formation to simultaneously satisfy all their constraints, despite the ability of any single agent to move to a position that satisfies the constraints on it once all the other agents are fixed [31]. Persistent formations free of this problem are called structurally persistent. The distinction between structural persistence and persistence does not arise in two dimensions. In ℝ³, it turns out that a formation is structurally persistent if and only if it is persistent and does not have two leaders, each with three DOFs.

⁴ Rigidity for a directed graph is defined in the same way as for an undirected graph; one simply takes no account of any assigned direction.
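For small graphs, the criterion of Theorem 3 can be checked directly by enumerating every subgraph in which each vertex keeps at most n out-edges and testing each for generic rigidity. The sketch below does this for n = 2, reusing is_generically_2rigid from the earlier sketch; it is illustrative only, as the enumeration grows exponentially with the number of high-out-degree vertices:

```python
from itertools import combinations, product

def is_2persistent(n_vertices, directed_edges):
    # Theorem 3 with n = 2: the digraph is 2-persistent iff every subgraph
    # obtained by trimming each vertex's out-edges down to 2 is
    # generically 2-rigid (edge directions are ignored by the rank test).
    out_edges = {v: [] for v in range(n_vertices)}
    for e in directed_edges:
        out_edges[e[0]].append(e)
    options = [list(combinations(es, 2)) if len(es) > 2 else [tuple(es)]
               for es in out_edges.values()]
    for choice in product(*options):
        kept = [e for group in choice for e in group]
        if not is_generically_2rigid(n_vertices, kept):
            return False
    return True

# Seed triangle (2->1, 3->1, 3->2) plus one vertex addition (4->1, 4->2),
# written 0-indexed: minimally 2-persistent, so the test returns True.
print(is_2persistent(4, [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1)]))
```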

For a formal definition and the characteristics of structural persistence, as well as the details of the above problem, the reader may refer to Reference [31]. For simplification, we assume all the practical persistent formations considered in this chapter to be structurally persistent as well.

8.3 Acquisition and Maintenance of Persistence

8.3.1 Acquiring Persistence

The importance of persistence for cohesive and coordinated motion of autonomous formations with asymmetric control structure has been indicated in the previous sections. In this subsection, we focus on acquisition of persistence for such formations, which can be interpreted in various ways: (1) systematic construction of a persistent formation from a given team of autonomous agents with certain desired information architecture characteristics; (2) converting a given nonpersistent or nonrigid formation to a persistent one via swapping some of the directions of the information links and adding some extra links if needed; (3) assigning directions to the links of formations with given rigid undirected underlying graphs (i.e., to the edges of the undirected underlying graphs) to obtain persistent formations.

Interpretation (1) is partially analyzed in References [14–16,31], where certain systematic construction procedures have been developed similar to their well-established counterparts for growing undirected rigid formations (or graphs), namely Henneberg construction sequences [6,27]. In References [15,16], a systematic procedure is developed for constructing (minimally) two-persistent graphs, where at each step of the procedure, one of the following three operations is applied: vertex addition, edge splitting, or edge reversal. Each of these operations (as defined below) preserves minimal two-persistence when applied to a minimally two-persistent graph and two-persistence when applied to a two-persistent graph. Hence, if the procedure starts with a two-persistent graph G0, the graph Gi obtained at each step i = 1, 2, ... is two-persistent, and if G0 is further a minimally two-persistent graph, each Gi is minimally two-persistent. Next, we briefly explain the three operations.

At step i (i ∈ {1, 2, ...}), application of a vertex addition to the graph G_{i−1} = (V_{i−1}, E_{i−1}) means addition to G_{i−1} of a vertex j with in-degree 0 and out-degree 2 and two distinct edges (j, k), (j, l) outgoing from j, where k, l ∈ V_{i−1}. The resultant graph is Gi = (Vi, Ei), where Vi = V_{i−1} ∪ {j} and

Ei = E_{i−1} ∪ {(j, k), (j, l)}

Application of edge splitting means removing a directed edge (j, k) ∈ E_{i−1} and adding a new vertex l and the edges (j, l), (l, k), (l, m), where m ∈ V_{i−1}; that is, the resultant graph is Gi = (Vi, Ei), where Vi = V_{i−1} ∪ {l} and

Ei = E_{i−1} ∪ {(j, l), (l, k), (l, m)} \ {(j, k)}

Finally, application of edge reversal on G_{i−1} is replacing a directed edge (j, k) ∈ E_{i−1}, where the head vertex k ∈ V_{i−1} has at least 1 DOF (in ℝ²), with (k, j), to obtain Gi = (Vi, Ei), where Vi = V_{i−1} and

Ei = E_{i−1} ∪ {(k, j)} \ {(j, k)}

Any minimally two-persistent graph G = (V, E) with V = {1, ..., N}, where |V| = N ≥ 3, can be obtained starting with a seed graph G0 = (V0, E0), with V0 = {1, 2, 3}, E0 = {(2, 1), (3, 1), (3, 2)}, and applying the procedure described above in the following particular form [15,16]: First, using a sequence of N − 3 operations, each of which is either a vertex addition or an edge splitting, a minimally two-persistent graph G_{N−3} is built having the same underlying undirected graph as G. It is further shown in References [15,16] that the only possible differences between G_{N−3} and G are the directions (orientations) of certain cycles and the distribution of the DOFs. Then from G_{N−3}, a graph G′ = (V, E′) is obtained having the same undirected underlying graph and the same DOF distribution (among the vertices in V, in ℝ²) as G, by redistributing the DOFs among the vertices via a sequence of edge reversals. Finally, as the final part of the construction procedure, G is obtained by reversing the directions of these cycles via a sequence of edge reversals for each cycle, that is, reversing the direction of each of the edges in these cycles in a sequential manner. The doability of each of the three parts of the above procedure for building G from G0 is proven in References [15,16]. As a special case, if G is an acyclic (cycle-free) minimally two-persistent graph, then G (with possibly a different permutation of vertex indices) can be grown from G0 by applying a sequence of N − 3 vertex additions [14].
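The three operations above can be phrased as simple edge-set transformations. A minimal sketch follows, with the DOF condition of edge reversal encoded as stated above; the function names are ours, not from References [15,16]:

```python
def vertex_addition(V, E, j, k, l):
    # Add vertex j with in-degree 0, out-degree 2: new edges (j, k), (j, l).
    assert j not in V and k in V and l in V and k != l
    return V | {j}, E | {(j, k), (j, l)}

def edge_splitting(V, E, j, k, l, m):
    # Replace (j, k) by a new vertex l with edges (j, l), (l, k), (l, m).
    assert (j, k) in E and l not in V and m in V
    return V | {l}, (E - {(j, k)}) | {(j, l), (l, k), (l, m)}

def dof2(E, v):
    # DOF count in R^2: max(0, 2 - out-degree).
    return max(0, 2 - sum(1 for e in E if e[0] == v))

def edge_reversal(V, E, j, k):
    # Reverse (j, k) to (k, j); requires the head k to have >= 1 DOF in R^2.
    assert (j, k) in E and dof2(E, k) >= 1
    return V, (E - {(j, k)}) | {(k, j)}

# Grow the seed graph G0 by one vertex addition and one edge splitting:
V, E = {1, 2, 3}, {(2, 1), (3, 1), (3, 2)}
V, E = vertex_addition(V, E, 4, 1, 2)
V, E = edge_splitting(V, E, 3, 2, 5, 4)
print(sorted(V), sorted(E))   # 5 vertices, 7 = 2*5 - 3 edges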

A generalized version of the vertex addition operation above (for both two-persistence and three-persistence) is discussed in References [14,15,31], where a vertex with out-degree at least n and in-degree 0 is added to an n-persistent graph (n ∈ {2, 3}). The results in References [14,31] imply that the generalized vertex addition operation preserves n-persistence (n ∈ {2, 3}); that is, any acyclic (cycle-free) n-persistent graph G = (V, E) with V = {1, ..., N}, where |V| = N ≥ n + 1, can be grown from an acyclic n-persistent seed graph with three vertices, performing a sequence of N − 3 generalized vertex additions.

One possible approach to interpretation (2) of persistence acquisition is to perform the acquisition task in two steps, where in the first step the undirected underlying graph is made rigid via addition of a necessary number of links with arbitrary directions, and in the second step the directions of selected links are swapped to satisfy constraint consistence and hence persistence of the directed underlying graph. This interpretation has not been fully analyzed in the literature yet, which is not unreasonable given that the notion of persistence is very recently defined and the relation between this directed graph notion and the undirected graph notion of rigidity is nontrivial. Nevertheless, partial discussions on, or relevant to, making a nonrigid graph rigid via adding edges and making a non-constraint-consistent directed graph constraint consistent via edge reversals can be found in References [6,16,29]. Particularly, a discussion on making a nonrigid graph rigid via adding edges is presented in Reference [6], where the task is named as a (minimal) cover problem.

A general solution to the problem defined in interpretation (3), one applicable to nonminimally rigid as well as minimally rigid graphs, is not available in the literature yet. Nevertheless, systematic solutions for classes of nonminimally rigid undirected graphs, for example, complete graphs, bilateration and trilateration graphs, wheel graphs, C2 graphs, C3 graphs, and bipartite graphs of type K_{m,n}, are provided in Reference [8]. Formal definitions of these graph classes can be found in References [7,10], and their rigidity can be verified easily using these definitions and the rigidity criteria available in the literature, for example, References [27–29]. Below we present the persistence acquisition procedures for some of these classes (depicted in Figure 8.2), followed by some discussion on their practical implications. The complete list of results as well as the proofs can be seen in Reference [8].

PROPOSITION 3 Given an integer k ≥ 3, consider the k-complete (undirected) graph K_k with the vertex set V = {1, 2, ..., k}, where every vertex pair i, j ∈ V is directly connected by an edge. Let K̄_k be the directed graph obtained by assigning directions to the edges of K_k such that for any vertex pair i, j satisfying 1 ≤ i < j ≤ k, the direction of the edge (i, j) is from j to i. Then, K̄_k is n-persistent for n ∈ {2, 3}.

PROPOSITION 4 Given a trilateration graph T, that is, a graph with an ordering of vertices 1, 2, ..., k such that vertices 1, 2, and 3 form a complete graph and vertex j is joined to at least three of the vertices 1, ..., j − 1 for j = 4, 5, ..., k, let T̄ be the directed graph obtained by assigning directions to the edges of T such that the direction of each edge (i, j) for i < j is from j to i. Then, T̄ is n-persistent for n ∈ {2, 3}.

PROPOSITION 5 Given an integer k ≥ 3, consider the wheel graph W_k that is composed of k rim vertices, labeled vertices 1, ..., k; one hub vertex (labeled vertex 0); the rim cycle of edges C_k = {(1, 2), (2, 3), ..., (k − 1, k), (k, 1)} passing through these vertices; and the edges (0, i) for i = 1, ..., k connecting the hub vertex to each of the rim vertices. Let W̄_k be the directed graph obtained by assigning directions to the edges of W_k such that the direction of each rim edge (i, i + 1) is from i to i + 1, the direction of (k, 1) is from k to 1, and the direction of any edge (0, i) is from i to 0. Then, W̄_k is two-persistent.

FIGURE 8.2 Persistence acquisition (via link direction assignment) of two-dimensional formations with (a) a complete underlying graph, (b) a trilateration underlying graph, (c), (d) a wheel underlying graph with two different representations.

Note that each of the rigid graph classes considered above corresponds to a formation architecture that can be used in guidance and control of autonomous multiagent formations. Complete graphs model the information architecture of formations where the sensing (communication) radius of each agent potentially allows it to maintain its distance actively from any other agent in the entire formation. The trilateration results given in Proposition 4 can be used in the acquisition of cycle-free formations with leader-follower structure [26] and asymmetric control architecture.
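The direction assignments of Propositions 3 and 5 are mechanical and can be generated, and checked against Theorem 3 with the earlier is_2persistent sketch, as follows; vertex labels are 0-indexed here for compatibility with that checker:

```python
def directed_complete(k):
    # Proposition 3 (0-indexed): direct every edge {i, j}, i < j, from j to i.
    return [(j, i) for i in range(k) for j in range(i + 1, k)]

def directed_wheel(k):
    # Proposition 5: hub is vertex 0, rim vertices are 1..k; rim edges point
    # i -> i+1 (and k -> 1), and every spoke points from the rim to the hub.
    rim = [(i, i % k + 1) for i in range(1, k + 1)]
    spokes = [(i, 0) for i in range(1, k + 1)]
    return rim + spokes

# In the directed wheel every rim vertex has out-degree 2 and the hub 0,
# so Theorem 3 reduces to a single rigidity check:
print(is_2persistent(5, directed_complete(5)))   # True
print(is_2persistent(7, directed_wheel(6)))      # True (hub + 6 rim vertices)
```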

One might use wheel graphs to model two-dimensional formations in which there is a central agent, the commander, which can be sensed by all other agents. Note here that since the sensing/communication capabilities of the agents and the distance constraints on them may differ from each other, the commander (corresponding to the hub vertex) does not need to be in the geometric center of the formation; as demonstrated in Figure 8.2c and d, formations with wheel underlying graphs may have various geometries.

8.3.2 Persistent Formation Reconfiguration Operations

In many autonomous multiagent formation applications, one needs to analyze certain scenarios that have a significant likelihood in practice, some of which are mentioned in Section 8.1, as a matter of guaranteeing robustness in the presence of such scenarios. One such scenario is where a multiagent (vehicle) formation loses some of its agents and new agents are required to be added to the formation without violating the existing control structure [6]. Another similar scenario is where the leader of a formation has to be substituted due to evolving mission requirements [8,30]. If the formation in the beginning is persistent, the leader change task can be abstracted, without damaging the control structure, as changing the directions of certain edges in the underlying directed graph in an appropriate way that maintains persistence.

In Reference [6], three key categories of operations on rigid formations have been analyzed: merging, splitting, and closing ranks. The focus of analysis in Reference [6] for each operation is preservation of rigidity during the operation. The trivial extensions of these operations for persistent formations and the relevant persistence maintenance problems during these operations can be defined as follows.

In merging, the task is to establish a set L_n of new directed [information] links between (agents of) two persistent formations F1, F2 with directed underlying graphs G_F1 = (V_F1, E_F1), G_F2 = (V_F2, E_F2) such that the merged formation F1 ∪ F2 ∪ L_n (i.e., the formation whose agent set is the union of the agent sets of F1 and F2 and whose directed [information] link set is L1 ∪ L2 ∪ L_n, where L_i denotes the directed [information] link set of F_i for i ∈ {1, 2}) is persistent. In terms of the underlying graphs, the merging task is equivalent to the following: Given the underlying graphs G_F1 = (V_F1, E_F1), G_F2 = (V_F2, E_F2) of two persistent formations F1 and F2, find a directed edge set E_n such that the directed graph (V_F1 ∪ V_F2, E_F1 ∪ E_F2 ∪ E_n) is persistent.

In splitting, which can be thought of as the reverse of merging, the case is considered where a persistent formation F with directed underlying graph G_F = (V_F, E_F) is split into two formations F1, F2 with directed underlying graphs G_F1 = (V_F1, E_F1), G_F2 = (V_F2, E_F2), respectively (where V_F = V_F1 ∪ V_F2), due to loss of some information links in F (or some edges in E_F). The task is to establish new links within each of F1, F2 (add new directed edges to E_F1, E_F2) such that both F1 and F2 become persistent.

Closing ranks can be thought of as a special (pseudo-)splitting operation. The case of interest is the loss of an agent (and the links associated with this agent) from a persistent formation, and the closing ranks task is to establish new directed links between certain pairs among the remaining agents such that the new formation (formed after the agent loss and establishment of the new links) is persistent as well. Note here that splitting can be thought of as a generalized closing ranks operation (defined for the loss of a set of agents instead of a single agent) as well, observing that the scenario of the above splitting problem for the post-split formation F1 can be equivalently reformulated as F1 being what is left when F, having initially F2 as its subformation, loses the agents in the subformation F2. This observation has been found useful (at least for the undirected underlying graphs and rigidity considerations) in treating splitting problems using certain results derived for the closing ranks problem, for example, in Reference [6].

Moreover, the three operations and the corresponding persistence maintenance tasks can be further generalized to consider merging involving more than two persistent formations, splitting into more than two persistent formations, closing ranks during the loss of two or more agents, and so on. Furthermore, various other scenarios can be generated as combinations of specific forms of a number of the above operations.

We conclude this subsection with a real-life example where frequent formation changes are expected: terrain surveillance using a formation of aerial vehicles with surveying sensors mounted on them [3,8]. In this application, abstracting each vehicle with the sensing/communication equipment on it as a sensor agent, an extra sensor agent may be needed to improve the overall coverage, in order to adapt to varying conditions during the surveillance mission. In such a case it is essential to coordinate well the behavior of each such additional sensor agent with that of the already existing agents, which can be done by maintaining persistence of the formation during the variations.

8.3.3 Maintaining Persistence During Formation Reconfigurations

As mentioned in Section 8.3.2, the main issue in the reconfiguration operations on persistent formations, in terms of the information architecture, is maintenance of persistence. The persistence maintenance problems corresponding to the three operations above can be thought of as special cases of the problem of making a nonpersistent (underlying) graph persistent via adding some new edges (and swapping some of the edge directions), which can be thought of as the extension of the (minimal) cover problem discussed in Reference [6] to directed graphs or formations. A complete generalization to directed graphs of the solution to the minimal cover problem has not yet been achieved. Full analysis of the basic or extended versions of the merging, splitting, and closing ranks operations for persistent formations and the relevant persistence maintenance problems are not available in the literature yet.
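In the absence of a general procedure, small closing-ranks instances can be handled by exhaustive search over candidate new links. The following brute-force sketch is a hypothetical helper of ours (not from the cited works), reusing is_2persistent from Section 8.2; it returns a smallest link set restoring two-persistence after the loss of one agent and is only sensible for small formations:

```python
from itertools import combinations

def close_ranks(n_vertices, edges, lost):
    # Remove agent `lost` and its links, then search, by increasing size,
    # for a set of new directed links among the survivors that makes the
    # reduced formation 2-persistent again.
    survivors = [v for v in range(n_vertices) if v != lost]
    relabel = {v: i for i, v in enumerate(survivors)}
    kept = [(relabel[i], relabel[j]) for (i, j) in edges
            if lost not in (i, j)]
    pool = [(a, b) for a in range(len(survivors))
            for b in range(len(survivors))
            if a != b and (a, b) not in kept]
    for size in range(len(pool) + 1):
        for extra in combinations(pool, size):
            if is_2persistent(len(survivors), kept + list(extra)):
                return [(survivors[a], survivors[b]) for (a, b) in extra]
    return None

# Losing agent 0 from the earlier four-agent example leaves only two links;
# the search finds one new link (here 2 -> 3) that restores persistence.
print(close_ranks(4, [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1)], 0))
```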

However, some partial analysis results and relevant discussions can be found in some recent studies. In Reference [17], the problem of merging persistent formations in ℝ² and ℝ³ is partially analyzed in a so-called metaformation framework. The main results of this analysis are summarized in the following theorems. The first theorem is about a particular way of merging the directed underlying graphs of persistent formations via the addition of a set E_M of additional edges having end-vertices belonging to different underlying graphs, and the second is about necessary and sufficient conditions for an edge-optimal persistent merging, that is, for having E_M such that no single edge of E_M can be removed without losing persistence of the merged formation.

THEOREM 4 [17] A collection of n-persistent graphs G_1, ..., G_N (n ∈ {2, 3}, N ∈ {2, 3, ...}) can be merged into a (structurally) n-persistent graph if and only if this merging can be done by adding edges leaving vertices with one or more local DOFs in ℝⁿ, that is, DOFs (in ℝⁿ) in the corresponding G_i (i ∈ {1, ..., N}), such that the original local DOF count of each vertex is greater than or equal to the number of added edges leaving this vertex. In that case, the merged graph G is n-persistent if and only if it is n-rigid.

THEOREM 5 [17] Consider a set G = G3 ∪ G2 ∪ G1 of disjoint directed graphs, where G3 is composed of n-persistent graphs (n ∈ {2, 3}) having at least three vertices, G2 is composed of directed graphs with two vertices and an edge, and G1 is composed of single-vertex graphs. Then, G = ⋃_{G_i ∈ G} G_i ∪ E_M, where E_M is a set of additional edges having end-vertices belonging to different graphs in G, is an edge-optimal persistent merging in ℝⁿ if and only if the following conditions all hold:
1. All edges of E_M leave vertices with local DOFs in ℝⁿ.
2. |E_M| = n(n + 1)/2 (|G3| − 1) + (2n − 1)|G2| + n|G1|.
3. For all nonempty E′_M ⊆ E_M, there holds |E′_M| ≤ n(n + 1)/2 (|I(E′_M)| − 1) + (2n − 1)|J(E′_M)| + n|K(E′_M)|, where I(E′_M) is the set of graphs in which at least three vertices or two unconnected ones are incident to edges of E′_M, J(E′_M) is the set of those in which one connected pair of vertices is incident to edges of E′_M, and K(E′_M) is the set of those in which only one vertex is incident to edge(s) of E′_M.

In References [8,30,31], persistence maintenance of a three-dimensional formation is analyzed during the addition of a new agent to the formation using a DOF allocation state framework. This framework is based on the elaboration of the fact that in a three-persistent graph there are at most six DOFs (in ℝ³) to be allocated among the vertices [8,31]; a graph G is then structurally three-persistent if and only if it has at most one vertex with three DOFs in ℝ³. The six DOF allocation states in the framework are defined as the following six sets of DOF counts (in ℝ³) of vertices, ordered in a nonincreasing manner, which represent the six different ways (considering the agents as indistinguishable) of allocating the six DOFs, where in the state S1, one vertex (the leader) has three DOFs, one has two DOFs, another has one DOF, and all the others are 0-DOF:

S1 = {3, 2, 1, 0, 0, ...},  S2 = {2, 2, 2, 0, 0, ...},  S3 = {3, 1, 1, 1, 0, ...},
S4 = {2, 2, 1, 1, 0, ...},  S5 = {2, 1, 1, 1, 1, 0, ...},  S6 = {1, 1, 1, 1, 1, 1, 0, ...}

Further discussion on the DOF allocation states can be found in Reference [8].

Based on the DOF allocation framework, a set of directed vertex addition operations requiring a minimal number of new edges, namely directed trilateration operations, has been developed in Reference [31] for maintaining persistence. Recall that an undirected graph formed by applying a sequence of trilateration operations starting with an initial undirected triangle, often called a trilateration graph, is guaranteed to be generically three-rigid [28,29]. Similarly, a directed graph formed by applying a sequence of directed trilateration operations, starting with any initial directed triangle with three vertices and three directed edges, one for each vertex pair, is guaranteed to be generically three-persistent [8,31]. A directed trilateration, DT(m), where m ∈ {0, 1, 2, 3}, is defined as a transformation of a three-persistent graph G = (V, E), where |V| ≥ 3, to another three-persistent graph G′ = (V′, E′), where V′ = V ∪ {i} and E′ = E ∪ {(i, k) : ∀k ∈ V1} ∪ {(j, i) : ∀j ∈ V2} for some V1, V2 ⊆ V satisfying V1 ∩ V2 = ∅, |V1| = 3 − m, |V2| = m, and DOF(j) ≥ 1, ∀j ∈ V2,⁵ provided that the vertices of V1 ∪ V2 are all distinct and are not collinear [31]. Note here that Theorem 3 indicates that the graph obtained after applying a directed trilateration is three-persistent; that is, the directed trilateration defined above preserves the three-persistence of the graphs. Further properties and interpretations of the four directed trilateration operations DT(0), DT(1), DT(2), and DT(3) are given in Reference [8].

REMARK 1 In the implementation of the persistence acquisition and maintenance strategies presented in this chapter, a common requirement would be developing decentralized controllers for the individual agents, instead of a centralized control scheme. The main concerns leading to this requirement are the complexity, computational cost, and impracticality of processing local information by a central control unit; communication delays between the commander agent and the other agents; sensitivity to loss of certain agents (e.g., a central commander) in a possible central control scheme; and so on.

⁵ If there is no such V2, then the corresponding DT(m) cannot be performed for the graph G.
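The DT(m) operation translates directly into code. A minimal sketch under the definition above (function names are ours; the DOF bookkeeping follows the out-degree rule of Section 8.2.2):

```python
def dof3(E, v):
    # DOF count in R^3: max(0, 3 - out-degree).
    return max(0, 3 - sum(1 for e in E if e[0] == v))

def directed_trilateration(V, E, i, V1, V2):
    # DT(m), m = |V2|: the new vertex i gets out-edges to the 3 - m vertices
    # of V1 and in-edges from the m vertices of V2; every j in V2 must have
    # at least one DOF in R^3 (footnote 5: otherwise DT(m) cannot be applied).
    assert i not in V and V1 | V2 <= V and not (V1 & V2)
    assert len(V1) + len(V2) == 3
    assert all(dof3(E, j) >= 1 for j in V2)
    return V | {i}, E | {(i, k) for k in V1} | {(j, i) for j in V2}

# DT(1) on a directed triangle: vertex 3 follows {0, 1} and is followed by 2
# (vertex 2 has out-degree 2, hence one DOF in R^3, so the operation is legal).
V, E = {0, 1, 2}, {(1, 0), (2, 0), (2, 1)}
V, E = directed_trilateration(V, E, 3, {0, 1}, {2})
print(sorted(E))
```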

8.4 Cohesive Motion of Persistent Formations

8.4.1 Problem Definition

In the previous sections we have focused on characteristics of persistent autonomous formations and discrete procedures to acquire and maintain persistence, without considering any dynamic control task required for the formation. In this section and the next, we focus on decentralized motion control of two-dimensional persistent formations where each of the agents makes decisions based only on its own observations and state. The particular problem we deal with in these two sections, in its basic form, is how to move a given persistent formation with specified initial position and orientation to a new desired position and orientation cohesively, that is, without violating the persistence of the formation during the motion, using a decentralized strategy. A more specific definition of the problem is given as follows.

PROBLEM 1 Consider a persistent two-dimensional formation F with m ≥ 3 agents A1, ..., Am whose initial position and orientation in ℝ² (the xy-plane) are specified with a set d̄ of desired interagent distances d_ij between neighbor agent pairs (A_i, A_j) and the initial position p_i0 of each agent A_i (i ∈ {1, ..., m}), where the initial positions p_i0 are consistent with d̄. The control task is to move F to a given desired (final) position and orientation, defined by a set of final positions p_if of the individual agents A_i for i = 1, ..., m, where the p_if are consistent with d̄, cohesively, that is, without deforming the shape or violating the distance constraints of F during the motion.

We perform our control design and analysis in the continuous-time domain with the following assumptions, relaxation or generalization of which will be discussed later:

A1: Each agent A_i has velocity integrator kinematics

ṗ_i(t) = v_i(t)     (8.1)

where p_i(t) = [x_i(t), y_i(t)] and v_i(t) = [v_xi(t), v_yi(t)] ∈ ℝ² denote the position and velocity of A_i at time t, respectively; that is, v_i(t) is the control signal of agent A_i. The controller of A_i (i ∈ {1, ..., m}) can adjust the velocity v_i(t) directly.

A2: The individual controller of each agent A_i (i ∈ {1, ..., m}) is assumed to guarantee that v_i(t) is continuous and |v_i(t)| ≤ v̄, ∀t ≥ 0, for some constant maximum speed limit v̄ > 0.

A3: Each agent A_i knows its final desired position p_if and can sense its own position p_i(t) and velocity v_i(t), as well as the position p_j(t) of any agent A_j it follows, at any time t ≥ 0.

A4: The distance-sensing range for a neighbor agent pair (A_i, A_j) is sufficiently larger than the desired distance d_ij to be maintained.

Note here that Problem 1, together with the assumptions A1 to A4, is formulated in a simple form in order to simplify the initial analysis of cohesive motion of persistent formations to be presented in this chapter. In a more realistic or more practical scenario, one would need to consider more complex agent dynamics; imperfect position and distance sensors providing measurements with uncertainties; noise, disturbance, and time delay effects in sensing, control, and communication; obstacles in the area of interest that the formation has to avoid; and so on. Moreover, there would be some optimality criteria for the control task in terms of the overall process duration, physical and computational energy consumption, and so on. Again, these issues are neglected here for the convenience of building a clear initial design and analysis framework that can be elaborated later for a particular, more practical problem according to particular specifications.

On the other hand, a straightforward attempt to solve Problem 1 based on predetermining a suitable time trajectory for the formation, starting at the given initial setting and ending at the final desired setting, and hence a time trajectory p_i(t) for each agent A_i, and then generating v_i for each A_i that would result in the predetermined trajectory p_i(t), would not be easily extendible for more complex scenarios; in fact, for some cases, it might not even be possible. Therefore, in order to make our design and analysis framework usable in more complex practical scenarios, we need to consider the requirements in possible extensions of Problem 1 and pay attention to simplicity and robustness. In our approach, we choose the control laws below so that meeting the distance constraints has a higher priority than reaching the final desired position, which can be rewritten as a guideline as follows:

G1: A 0-DOF agent has to use all of its control potential to satisfy its distance constraints; a 1-DOF agent can undergo its DOF movement only when its distance constraint is satisfied within a certain error bound.
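The consistency requirement of Problem 1, namely that both the initial and the final position sets realize the distance set d̄, can be verified directly. A small hypothetical helper (ours, for illustration only):

```python
import numpy as np

def consistent_with(dbar, positions, tol=1e-9):
    # True iff |p_i - p_j| = d_ij for every constrained pair (i, j),
    # as Problem 1 requires of both the initial and final position sets.
    return all(abs(np.linalg.norm(np.asarray(positions[i]) -
                                  np.asarray(positions[j])) - d) <= tol
               for (i, j), d in dbar.items())

dbar = {(2, 1): 1.0, (3, 1): 1.0, (3, 2): 1.0}          # equilateral triangle
p0 = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.5, np.sqrt(3) / 2)}
print(consistent_with(dbar, p0))                        # True
```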

8.4.2 Acyclically Led and Cyclically Led Formations

For a complete analysis of Problem 1, we need to take into account the following categorization of two-dimensional persistent formations in terms of the distribution of the DOFs. The sum of the DOFs of the individual agents in a persistent two-dimensional formation is at most three, and for a minimally persistent formation, exactly three, which is the same as the DOF count of a free rigid (nonvertex) object in ℝ² (two for translation and one for rotation) [14,31]. Based on the distribution of these three DOFs, minimally persistent formations can be divided into two categories: acyclically led minimally persistent formations (or formations with the leader-follower structure), where one agent has two DOFs, another has one DOF, and the rest have zero DOFs; and cyclically led minimally persistent formations (or formations with the three-coleader structure), where three agents have one DOF each and the rest have zero DOFs. In an acyclically led formation, the two-DOF agent is called the leader and the one-DOF agent is called the first follower. In a cyclically led formation, the one-DOF agents are called the coleaders. In both structures, the zero-DOF agents are called the (ordinary) followers.

The names cyclically led and acyclically led come from the following facts, which can be easily shown using the definition of DOF and Lemma 1 of Reference [15]: There is no cycle in an acyclically led formation passing through the leader, and there always exists a cycle passing through all of the three coleaders in a cyclically led formation. Note here that there exist acyclically led (minimally) persistent formations where the first follower does not directly follow the leader but another (ordinary follower) agent (see Figure 8.3a), and there exist cyclically led (minimally) persistent formations where the coleaders do not directly follow each other but some other agents (see Figure 8.3b). For simplicity, we assume that the first follower directly follows the leader in an acyclically led formation and that the three coleaders follow each other in a cyclically led formation.

FIGURE 8.3 Atypical minimally persistent formations in ℝ²: (a) an acyclically led formation where the first follower (A4) does not follow the leader (A1); (b) a cyclically led formation where the coleaders (A1, A3, A5) do not follow each other.

In a cyclically led formation, because of lying on a cycle, the motions of the three coleaders are cyclically dependent on each other, and hence the motion control for the formation requires a more implicit strategy than one for an acyclically led formation. Some stability properties of a subclass of acyclically led formations are investigated in Reference [26]; however, such an investigation is not available yet for cyclically led formations. In Section 8.5, we design controllers to solve Problem 1 for both the cyclically led and the acyclically led categories of minimally persistent formations, although the motion behaviors and the stability and convergence analyses for these two categories are expected to be different.

8.5 Decentralized Control of Cohesive Motion

8.5.1 Control Design

Based on the definition of Problem 1, Assumptions A1 to A4, and Guideline G1, it can be judged that the structure of the individual controller of each agent in the minimally persistent formation of interest should be specific to its DOF, regardless of whether the formation is acyclically led or cyclically led. In our control design to solve Problem 1 for minimally persistent formations, we design a control scheme for zero-DOF, one-DOF, and two-DOF agents separately; that is, three types of controllers are needed, one for each of the zero-DOF, one-DOF, and two-DOF agent sets.

For optimality considerations, and to cope with constant velocity requirements in certain unmanned air vehicle (UAV) and other flight formation applications, we assert the following two additional guidelines in our control design, in addition to Assumptions A1 to A4 and Guideline G1:

G2: Any agent shall move at the constant maximum speed v̄ > 0 at any time instant t ≥ 0 unless it is impossible to do so at that particular instant t due to, for example, initial and final zero-velocity constraints, and so on.

G3: Any zero-DOF or one-DOF agent shall move through a path of shortest length in order to satisfy its distance constraints.

Note here that the guidelines are labeled as guidelines (separate from the assumptions) because it is recognized that there may be clashes between them. The way these clashes are resolved is embedded in the control laws presented below.

Based on Assumptions A1 to A4 and Guidelines G1 to G3, we use basic vector analysis and borrow ideas from the virtual vector field concept, details of which can be found in Reference [5] and the references therein.⁶ The main idea in the virtual vector field approach is obtaining the overall velocity vector (for each agent) as the superposition of the vectors defining each of the separate motion tasks (of this agent). In our case, the two separate motion vector types of an agent are (1) to maintain a distance constraint with each of the agents it follows and (2) to move towards a final destination.

8.5.1.1 Control Law for Zero-DOF Agents

Consider a zero-DOF agent A_i and the two agents A_j and A_k it follows. Note here that, in a two-dimensional minimally persistent formation, any zero-DOF agent necessarily follows exactly two other agents, by the definition of minimal persistence and Proposition 1.

⁶ However, the virtual vector field approaches described in these works are different from our approach; in these works, the interagent distance constraints are not considered and hence do not constitute a vector field.

Due to the distance constraints of keeping |p_i(t) − p_j(t)| and |p_i(t) − p_k(t)| at the desired values d_ij and d_ik, respectively, at each time t ≥ 0 the desired position p_id(t) of A_i is the point whose distances to p_j(t) and p_k(t) are d_ij and d_ik, respectively. Assuming |p_i(t) − p_id(t)| is sufficiently small, p_id(t) can be explicitly determined as

p_id(t) = p̄_jk(t, p_i(t))     (8.2)

where p̄_jk(t, p0), for any p0 ∈ ℝ², denotes the intersection of the circles C(p_j(t), d_ij) and C(p_k(t), d_ik) that is closer to p0, which satisfies continuity of p_id(t); in the notation C(·, ·), the first argument denotes the center and the second denotes the radius. Based on this observation and Guidelines G1 to G3, we propose the following control law for the zero-DOF agent A_i:

v_i(t) = v̄ β_i(t) δ_id(t)/|δ_id(t)|
δ_id(t) = p_id(t) − p_i(t) = p̄_jk(t, p_i(t)) − p_i(t)
β_i(t) = 0,  if |δ_id(t)| < ε_k
β_i(t) = (|δ_id(t)| − ε_k)/ε_k,  if ε_k ≤ |δ_id(t)| < 2ε_k
β_i(t) = 1,  if |δ_id(t)| ≥ 2ε_k     (8.3)

where v̄ > 0 is the constant maximum speed of the agents and ε_k > 0 is a small design constant. In Equation (8.3), the switching term β_i(t) is introduced to avoid chattering due to small but acceptable errors in the desired interagent distances.

8.5.1.2 Control Law for One-DOF Agents

Let agent A_i have one DOF and let A_j be the agent it follows. First, observe that once A_i satisfies its distance constraint with A_j, it is free to move on the circle with center p_j and radius d_ij, provided that it does not need to use the whole of its velocity capacity to satisfy |p_i − p_j| = d_ij. Based on this observation, we design the following control scheme for A_i:

v_i(t) = β_i(t) v_i1(t) + √(1 − β_i²(t)) v_i2(t)
δ_ji(t) = (δ_jix(t), δ_jiy(t)) = p_j(t) − p_i(t)
δ̄_ji(t) = δ_ji(t) − d_ij δ_ji(t)/|δ_ji(t)|
β_i(t) = 0,  if |δ̄_ji(t)| < ε_k
β_i(t) = (|δ̄_ji(t)| − ε_k)/ε_k,  if ε_k ≤ |δ̄_ji(t)| < 2ε_k
β_i(t) = 1,  if |δ̄_ji(t)| ≥ 2ε_k     (8.4)

where

v_i1(t) = v̄ δ̄_ji(t)/|δ̄_ji(t)|     (8.5)

and

v_i2(t) = v̄ β̄_i(t) sgn(⟨δ_if(t), δ_ji^⊥(t)⟩) δ_ji^⊥(t)
δ_if(t) = p_if − p_i(t)
δ_ji^⊥(t) = (−δ_jiy(t), δ_jix(t))/|δ_ji(t)|
β̄_i(t) = 0,  if |δ_if(t)| < ε_f
β̄_i(t) = (|δ_if(t)| − ε_f)/ε_f,  if ε_f ≤ |δ_if(t)| < 2ε_f
β̄_i(t) = 1,  if |δ_if(t)| ≥ 2ε_f     (8.6)

In Equations (8.4) to (8.6), ε_k, ε_f > 0 are small design constants and ⟨·, ·⟩ denotes the dot product operation. In Equation (8.6), δ_ji^⊥(t) is the unit vector perpendicular to the distance vector δ_ji(t) = p_j(t) − p_i(t) with clockwise orientation with respect to the circle C(p_j(t), d_ij), and the term sgn(⟨δ_if(t), δ_ji^⊥(t)⟩) determines the orientation of motion that would move A_i towards p_if. In Equation (8.4), via the switching term β_i(t), the controller switches between the translational action v_i1 (given in Equation [8.5]) to satisfy |p_i − p_j| ≅ d_ij and the rotational action v_i2 (given in Equation [8.6]) to move the agent A_i towards p_if, which can take place only when |p_i − p_j| is sufficiently close to d_ij. The switching term β̄_i(t) is for avoiding chattering due to small but acceptable errors in the final position of A_i.

8.5.1.3 Control Law for Two-DOF Agents

If a given agent A_i has two DOFs (which can only happen if A_i is the leader of an acyclically led formation in our case), it can use its full velocity capacity to move towards its desired final position p_if, as it does not have any distance constraint to satisfy. Hence the velocity input at each time t can be simply designed as a vector with magnitude v̄ in the direction of p_if − p_i(t):

v_i(t) = v̄ β̄_i(t) δ_if(t)/|δ_if(t)|
δ_if(t) = p_if − p_i(t)
β̄_i(t) = 0,  if |δ_if(t)| < ε_f
β̄_i(t) = (|δ_if(t)| − ε_f)/ε_f,  if ε_f ≤ |δ_if(t)| < 2ε_f
β̄_i(t) = 1,  if |δ_if(t)| ≥ 2ε_f     (8.7)

The switching term β̄_i(t) again prevents chattering due to small but acceptable errors in the final position of A_i.
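Under Assumptions A1 to A4, the three control laws translate into short routines. The following Python sketch implements Equations (8.2) to (8.7); the helper names are ours, and the circle-intersection routine assumes p̄_jk is well defined (guaranteed for sufficiently small ε_k, as discussed in Section 8.5.2 below):

```python
import numpy as np

def beta(err, eps):
    # Switching gain of Equations (8.3), (8.4), (8.6), and (8.7).
    if err < eps:
        return 0.0
    if err < 2 * eps:
        return (err - eps) / eps
    return 1.0

def circle_intersection(c1, r1, c2, r2, near):
    # p_bar_jk of Equation (8.2): the intersection of C(c1, r1) and
    # C(c2, r2) closer to `near`; assumes the circles intersect.
    d = np.linalg.norm(c2 - c1)
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    h = np.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    base = c1 + a * (c2 - c1) / d
    perp = np.array([-(c2[1] - c1[1]), c2[0] - c1[0]]) / d
    return min((base + h * perp, base - h * perp),
               key=lambda q: np.linalg.norm(q - near))

def v_zero_dof(pi, pj, pk, dij, dik, vbar, eps_k):
    # Control law (8.2)-(8.3) for a zero-DOF agent following A_j and A_k.
    delta = circle_intersection(pj, dij, pk, dik, pi) - pi
    dist = np.linalg.norm(delta)
    return np.zeros(2) if dist == 0 else vbar * beta(dist, eps_k) * delta / dist

def v_one_dof(pi, pj, pif, dij, vbar, eps_k, eps_f):
    # Control law (8.4)-(8.6): translational action toward the constraint
    # circle, blended with a rotational action along it toward p_if.
    dji = pj - pi
    r = np.linalg.norm(dji)
    dbar = dji - dij * dji / r                 # radial error vector
    err = np.linalg.norm(dbar)
    b = beta(err, eps_k)
    v1 = vbar * dbar / err if err > 0 else np.zeros(2)
    perp = np.array([-dji[1], dji[0]]) / r     # unit tangent; sgn picks direction
    dif = pif - pi
    v2 = vbar * beta(np.linalg.norm(dif), eps_f) * np.sign(dif @ perp) * perp
    return b * v1 + np.sqrt(1.0 - b ** 2) * v2

def v_two_dof(pi, pif, vbar, eps_f):
    # Control law (8.7) for the leader.
    dif = pif - pi
    dist = np.linalg.norm(dif)
    return np.zeros(2) if dist == 0 else vbar * beta(dist, eps_f) * dif / dist
```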

FIGURE 8.4 An acyclically led minimally persistent formation to be controlled: A1 is the leader, A2 is the first follower, and the other agents are ordinary followers.

8.5.2 Stability and Convergence

In this section we informally discuss the stability and convergence properties associated with the application of the control laws designed in Section 8.5.1 to each of the classes of acyclically led and cyclically led persistent formations separately, noting that the formal complete analysis was not completed at the time of submission of this chapter.

8.5.2.1 Acyclically Led Minimally Persistent Formations

Consider Problem 1 for an acyclically led minimally persistent formation F with m ≥ 3 agents A1, ..., Am, where, without loss of generality, A1 is the leader, A2 is the first follower, and the other agents are ordinary followers (such a formation is depicted in Figure 8.4); by the assumption at the end of Section 8.4.2, A2 follows A1. In this case, based on the proposed control scheme in Section 8.5.1, the leader agent A1 uses the control law (8.7), A2 uses the control law (8.4) to (8.6), and each of A3, ..., Am uses the control law (8.2) and (8.3).

Following is an informal sketch of a possible analysis of the stability and convergence properties of F during its motion. Consider the dynamic behavior of each agent separately. For A1, defining the Lyapunov function V1(t) = ½ δ_1f(t)ᵀ δ_1f(t), from Equation (8.7) it follows that:

V̇1(t) = −δ_1f(t)ᵀ v1(t) = −v̄ β̄1(t) δ_1f(t)ᵀ δ_1f(t)/|δ_1f(t)| = −v̄ β̄1(t) |δ_1f(t)|
      = 0,  if |δ_1f(t)| < ε_f
      = −v̄ (|δ_1f(t)| − ε_f)|δ_1f(t)|/ε_f ≤ −v̄ (|δ_1f(t)| − ε_f),  if ε_f ≤ |δ_1f(t)| < 2ε_f
      = −v̄ |δ_1f(t)| ≤ −2v̄ ε_f,  if |δ_1f(t)| ≥ 2ε_f     (8.8)

Therefore, we have |δ_1f(t)| ≤ |δ_1f(0)|, ∀t ≥ 0, and lim_{t→∞} |δ_1f(t)| ≤ ε_f; that is, p1(t) is always bounded and asymptotically converges to the ball B(p_1f, ε_f), that is, the ball with center p_1f and radius ε_f. It can be further deduced from Equation (8.8) that p1(t) enters the ball B(p_1f, 2ε_f) in finite time and remains there.

A similar but relatively longer analysis can be done for the dynamic behavior of A2, defining the Lyapunov functions

V_21(t) = ½ δ̄_21(t)ᵀ δ̄_21(t),   V_2f(t) = ½ δ_2f(t)ᵀ δ_2f(t)

and examining V̇_21(t) and V̇_2f(t) together with V̇1(t) and the control law Equations (8.4) to (8.6). This analysis is expected to establish the conditions under which p2(t) remains bounded and converges to finite time-varying balls around p1(t) for t ≥ 0 with certain radii, as well as to a fixed ball around p_2f with a certain radius. Similarly, for the agents A_i, where i ∈ {3, 4, ..., m}, analyzing V̇_id(t) for

V_id(t) = ½ δ_id(t)ᵀ δ_id(t)

it appears possible that the boundedness and convergence properties of each p_i(t) can be established. Combining this result with the above ones for agents A1, A2, and via some geometric analysis on the definition of p_id in Equation (8.2), the conditions guaranteeing convergence of all the agents to their final desired positions, as well as satisfaction of the distance constraints within certain error tolerance bounds, can be deduced.

Note here that the discussions above, as well as the applicability of the control laws for the agents A_i (i ∈ {3, ..., m}), are valid if and only if p_id(t) in Equation (8.2) is well defined, that is, the circles C(p_j(t), d_ij) and C(p_k(t), d_ik) intersect for all t ≥ 0. Via a geometric analysis of the accumulation of the position errors, it is observed that Equation (8.2) can be guaranteed to be well defined by selecting the constant ε_k sufficiently small.

Some simulation results and discussions on testing of the control structure described above on acyclically led persistent formations can be found in Reference [25]. The results shown in Reference [25] for a four-agent example formation indicate that the control goals are successfully achieved, where each agent satisfies its distance constraints all the time during motion with a significantly small error (less than 2% of the distance to be kept).

8.5.2.2 Cyclically Led Minimally Persistent Formations

Consider a cyclically led minimally persistent formation F with m ≥ 3 agents A1, ..., Am, where A1, A2, and A3 are the coleaders, and the other agents are ordinary followers (such a formation is depicted in Figure 8.5). By the assumption at the end of Section 8.4.2, assume also that A2 follows A1, A3 follows A2, and A1 follows A3. In this case, based on the proposed control scheme in Section 8.5.1, each of A1, A2, and A3 uses the control law (8.4) to (8.6), and each of A4, ..., Am uses the control law (8.2) and (8.3).

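A qualitative reproduction of the simulations discussed in this section can be built by Euler-integrating the model (8.1) under the control laws implemented above. The sketch below is illustrative only and is not the simulation setup of Reference [25]; the step size, horizon, and three-agent example are our own choices:

```python
import numpy as np

def simulate(p0, finals, dbar, follows, vbar=1.0, eps_k=1e-3, eps_f=1e-3,
             dt=1e-3, steps=50000):
    # Euler integration of the velocity integrator model (8.1) under the
    # laws of Section 8.5.1 (v_zero_dof, v_one_dof, v_two_dof above).
    # follows[i] lists the agents A_i follows: [] for a leader, [j] for a
    # one-DOF agent, [j, k] for an ordinary follower.
    p = {i: np.asarray(q, dtype=float) for i, q in p0.items()}
    pf = {i: np.asarray(q, dtype=float) for i, q in finals.items()}
    for _ in range(steps):
        v = {}
        for i, f in follows.items():
            if not f:
                v[i] = v_two_dof(p[i], pf[i], vbar, eps_f)
            elif len(f) == 1:
                v[i] = v_one_dof(p[i], p[f[0]], pf[i], dbar[(i, f[0])],
                                 vbar, eps_k, eps_f)
            else:
                v[i] = v_zero_dof(p[i], p[f[0]], p[f[1]], dbar[(i, f[0])],
                                  dbar[(i, f[1])], vbar, eps_k)
        for i in p:
            p[i] = p[i] + dt * v[i]
    return p

# Translate an equilateral-triangle acyclically led formation to the right:
p0 = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.5, 3 ** 0.5 / 2)}
pf = {i: (x + 3.0, y) for i, (x, y) in p0.items()}
dbar = {(2, 1): 1.0, (3, 1): 1.0, (3, 2): 1.0}
final = simulate(p0, pf, dbar, {1: [], 2: [1], 3: [1, 2]})
```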
FIGURE 8.5 A cyclically led minimally persistent formation to be controlled: A1, A2, and A3 are the coleaders, and the other agents are ordinary followers.

For the remaining agents, it appears possible that the boundedness and convergence properties of each p_i(t) can be established in a manner similar to that suggested for acyclically led formations. Combining all these results, the conditions guaranteeing cohesive motion, as well as stability and convergence of the entire formation to the desired destination, can be deduced. Again, the formal analysis corresponding to the above sketch has not been completed yet.

Nevertheless, the corresponding simulation results and discussions of Reference [25] on testing of the control structure described in Section 8.5.1 demonstrate that the global stability and convergence properties of cyclically led persistent formations are comparable to those of the acyclically led formations. One distinction observed in the results shown in Reference [25] is that the agents follow longer and more indirect paths than in the acyclically led case. This is mainly because of the guidance of the whole formation by a coleader triple whose members make constrained motions as described by Equations (8.4) to (8.6).

8.5.3 More Complex Agent Models

In this chapter, we have designed control schemes for, and analyzed, the problem of cohesive motion of persistent formations based on the velocity integrator model (8.1) for the agent kinematics. The actual kinematic or dynamic model of agents in a real-life formation, however, would in general be more complex than a velocity integrator. Therefore, a more practical design and analysis procedure for the cohesive motion problem would require a more complex agent model than Equation (8.1). The form of such a model for a particular application, of course, will depend on the specifications of the agents used in this application. Some of the widely used continuous-time agent models⁷ in the formation control literature, corresponding to practical experimental agents (e.g., robots or vehicles) of interest, are the double integrator or point mass dynamics model, where the acceleration term v̇_i = a_i is added to the model (8.1) and it is assumed that the control input is the vectorial acceleration a_i [21,22]; the nonholonomic unicycle dynamics model [19]:

(ẋ_i, ẏ_i) = (v_i cos θ_i, v_i sin θ_i)
θ̇_i = ω_i
v̇_i = (1/m_i) u_i1
ω̇_i = (1/τ_i) u_i2     (8.9)

where p_i(t) = (x_i(t), y_i(t)) denotes the position of A_i as before, θ_i(t) denotes the orientation or steering angle of the agent with respect to a certain fixed axis, v_i is the translational speed, ω_i is the angular velocity, and the control input signals u_i1, u_i2 are, respectively, the force and torque inputs; and the fully actuated uncertain dynamics model [11,13]:

M_i(p_i) p̈_i + η_i(p_i, ṗ_i) = u_i

where M_i represents the mass or inertia matrix, η_i represents the centripetal, Coriolis, and gravitational effects and other additive disturbances, and u_i is the control signal that can be applied in the form of a vectorial force.

A simplified form of Equation (8.9) is the nonholonomic unicycle kinematics model [9]:

(ẋ_i, ẏ_i) = (v_i cos θ_i, v_i sin θ_i)
θ̇_i = ω_i     (8.10)

where it is assumed that the translational speed v_i and the angular velocity ω_i can be applied as control inputs directly.

Some preliminary results, based mainly on simulation studies, on the solution of Problem 1 for the nonholonomic unicycle kinematic agent model (8.10) are presented in Reference [25]. The control schemes used in this work employ a so-called "separation-separation control" idea [9], which was originally developed for the following of one mobile robot by another at a specified distance or relative position; these schemes are not direct extensions of the designs presented in Section 8.5.1. In the simulations in Reference [25] using the new control scheme, it is observed that the control goal in Problem 1 is achieved, although the performance is poor compared to the performance for the velocity integrator model in terms of both the path length and the distance constraints. The simulation results in Reference [25] nevertheless demonstrate that cohesive motion control with agent kinematics or dynamics models that are more complex than the velocity integrator model is feasible. The design of an enhanced control scheme to obtain better performance for the unicycle kinematic agent model (8.10), as well as similar designs for the double integrator and fully actuated uncertain dynamics models, is currently being investigated by the authors.

⁷ Discrete-time counterparts of these models exist and are used in the literature as well.
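A simple way to drive the kinematics (8.10) from the Section 8.5.1 laws is to project each planar velocity command onto the unicycle inputs. The heuristic below is an assumption of ours for illustration; it is neither the separation-separation scheme of Reference [9] nor the enhanced design mentioned above:

```python
import numpy as np

def unicycle_step(x, y, theta, u, dt=1e-3, k_omega=4.0):
    # One Euler step of the unicycle kinematics (8.10), steering toward a
    # planar velocity command u (e.g., an output of the integrator-model
    # laws). The mapping v = |u| cos(heading error), omega = k_omega *
    # (heading error) is a common projection heuristic, assumed here.
    err = np.arctan2(u[1], u[0]) - theta
    err = np.arctan2(np.sin(err), np.cos(err))   # wrap to (-pi, pi]
    v = np.hypot(u[0], u[1]) * np.cos(err)
    omega = k_omega * err
    return (x + dt * v * np.cos(theta),
            y + dt * v * np.sin(theta),
            theta + dt * omega)
```

With this projection, the agent slows down when poorly aligned with the commanded direction and turns toward it, which qualitatively reproduces the longer, more indirect paths reported for the unicycle model.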

the total displacement of all agents. the authors are currently working on developing new metrics to characterize health and robustness of formations. and so on. B.5. we have analyzed certain persistence acquisition and maintenance tasks.2 and 8. Using these characteristics and criteria. Information Technology and the Arts and the Australian Research Council through the Backing Australia’s Ability Initiative. and C. which is funded by the Australian government’s Department of Communications.Persistent Autonomous Formations and Cohesive Motion Control 273 8. Different approaches to these ongoing studies as well as consideration of various real-life effects such as distance measurement noises. We have reviewed the general characteristics of rigid and persistent graphs and their implications on the control of persistent formations. Beside these.6 Discussions and Future Directions In this chapter we have analyzed persistent autonomous multiagent formations based on a recently developed theoretical framework of graph rigidity and persistence. guaranteeing persistence after merging of two or more persistent formations to accomplish the same mission. for example. Hendrickx holds an FNRS fellowship (Belgian Fund for Scientific Research). as well as the general solution to persistence acquisition problems defined in Section 8. as well as testing theoretical results that can be applied to the control of formations of aerial vehicles.3. Fidan. recovering persistence in the event of an agent loss. and solution of the cohesive motion problem in the existence of obstacles in the region of interest. Acknowledgment The work of B. Later.1. Relevant to the studies presented in Sections 8. .3. There still exist open problems or designs to be completed related to discussions presented on each of characteristics. lack of global position sensing for some agents. persistence acquisition. the general forms of splitting and closing rank problems for persistent formations remain open. persistence maintenance. Related to the cohesive motion problem. J. and cohesive motion of persistent formations.3. and presented some operational criteria to check the persistence of a given formation. Yu is supported by National ICT Australia. the authors are currently working on analyzing and enhancing the control laws and strategies presented for optimality in terms of. Anderson. design of similar control schemes for more complex agent models discussed in Section 8. may constitute different future research directions. we have analyzed cohesive motion of persistent autonomous formations and presented a set of distributed control schemes to cohesively move a given persistent formation with specified initial position and orientation to arbitrary desired final position and orientation.

.. B. S. K. in Proc. Rigidity.... J.. 5. and Utkin. Vol. No. Yu. 223–258. Gazi. P. December 2005. R. J. V. Fazackerley.D. Y. B. Das. Belta. References 1. 4. Agent behaviour and capability correlation factors as determinants in fusion processing. and Bogner. Cairns. Murphey and P.E. 1992. Yang. Conf..M. Dordrecht. 247–254.D. Cooperative Agents.. Vol. 14.-C. A. J. pp. 2nd Int. V.. V.. and The Concerted Research Action (ARC) “Large Graphs and Networks” of the French Community of Belgium... V.. IEEE Transactions on Automatic Control. and Passino. on Intelligent Sensors. and Anderson.. 865–875. P. in Proc. W.. 6. Eren. J. A vision-based formation control framework. 2003. and Kumar. Kumar. 1208–1214. 11.P. 21. Fusion 2003: Special Session on Fusion by Dist. December 2005.. No. in Cooperative Control and Optimization. Communications in Information and Systems. 403–409. 13. and Finn. Pardalos (eds.O.. Eren. Vol. R.. Vol. Autonomous control of multiple UAVs for the passive location of radars. the Netherlands.274 Modeling and Control of Complex Systems His work is supported by the Belgian Programme on Interuniversity Attraction Poles initiated by the Belgian Federal Science Policy Office. Song. and Kumar.. 2003.. April 2003. 2.I. Abstraction and control for groups of robots. in Proc. in Proceedings of the 42nd IEEE Conference on Decision and Control.K. R. 2673–2684. T. pp.).D.N. pp. A. R.. and Belhumeur. pp. C.O. in IET Control Theory & Applications. 4. Whiteley. Morse. Australia. Stability analysis of swarms. Morse. A.S. and Blondel.N. Fidan. Graph Theory Applications. Gazi. 4.R. March 2004. IEEE Transactions on Robotics and Automation. Anderson. Acquiring and maintaining persistence of autonomous multi-vehicle formations. Baillieul. and Belhumeur. 10. No. Vol. Vol... Goldenberg. The scientific responsibility rests with its authors. 48. 3.S.S. IEEE Transactions on Robotics and Automation. Anderson.. Brown. 692–697. Cooperative control of robot formations. pp. Swarm aggregations using artificial potentials and sliding mode control. L. Drake.O. J. and Ostrowski. 2.. V. March 2007. 2. 73–94. pp. 556–563. October 2002. INFOCOM — Conference of the IEEE Computer and Communications Societies. A. 11.R. B. Vol. 1. pp...M. 1. June 2004. B. pp. Whiteley. Operations on rigid formations of autonomous agents.. Bowyer. B. 7. 452–460. Sliding mode control for gradient tracking and robot navigation using artificial potential fields. pp... K. 6.. Sensor Networks and Information Processing (ISSNIP).. A. Kluwer Academic. and randomization in network localization. W. P. 4. Information patterns and hedging Brockett’s theorem in controlling vehicle formations. 9. D. pp. computation. C. 813–825. IEEE Transactions on Robotics and Automation.. Das. T. 12.D. V. R.. Guldner.. Vol. Vol. Anderson. pp. Foulds.O.. 20. A. Hendrickx. Directed graphs for the analysis of rigidity and persistence in autonomous agent systems. 3. V. No. 8.. 18. and Suri. Delvenne. J. New York. Fierro.. 2004. IEEE Transactions on Robotics and Automation. No. 2002. Springer-Verlag.D. April 1995. . Fierro.

).. Tay. K. Goodman and J. C. 379–384. and Passino. pp.D. Francis. Generating isostatic frameworks. R. Yu. S.D. 401–420. B.. C.. and Blondel. O’Rourke (eds. IEEE Transactions on Robotics and Automation. 21–69. B. 859–873. Yu. Vol. 17.O. IEEE Transactions on Automatic Control. Rigidity and scene analysis. Oxley. Yu. IEEE Transactions on Automatic Control.. pp.R. 171–311. Automatica. persistence. H. Sandeep.. CRC Press. Vol. 1997. December 2006. Leader-to-formation stability..M. B.J. 2004. Hendrickx. and Kumar. 30–44. Fidan. Lin.M. W. in Handbook of Discrete and Computational Geometry. J. pp. J. 43. Vol. T. Servatius (eds. July 2007. 893–916. R. 30. 27. Fidan. Yu. A decentralized scheme for spacecraft formation flying via the virtual structure approach...).. Anderson.M. 933–941...J. Fidan. Three and higher dimensional autonomous formations: Rigidity.. V.. pp. pp. 21. 5980–5985. J. G. March 2006. and Murray. Beard. 1970.. M. Some matroids from discrete applied geometry. Vol. Journal of Engineering Mathematics. R.. C. B. Formation reorginization by primitive operations on directed graphs. B. pp. J. 24.O. No. Anderson. J. AIAA Journal of Guidance. 28.M. 331–340.. International Journal of Robust and Nonlinear Control. June 2006. to appear in IEEE Transactions on Automatic Control. . Lawton. Flocking for multi-agent dynamic systems: Algorithms and theory. 29.. Tanner.. Fidan. Fidan. pp. Control and Dyanmics. Pappas. Elementary operations for the reorganization of minimally persistent formations. B..M. J. 20. Vol. 25.. 860–881. Graph rigidity and distributed formation stabilization of multi-vehicle systems.. 4. American Mathematical Society. 50.. R. Hendrickx. in Proceedings of the IEEE Conference on Decision and Control. Stable social foraging swarms in a noisy environment. 3.. 31. Liu. No.. and Blondel. On graphs and rigidity of plane skeletal structures. 443–455. Olfati-Saber. pp.D. and Maggiore. in Proceedings of the 14th Mediterranean Conference on Control and Automation. Yu. 2965–2971. W. Vol. and B. Ren. December 2003. January 2005. 3.D. B. W.M.. V. Decentralized cohesive motion control of multi-agent formations.. B. Y. FL. pp. Structural Topology.G. Olfati-Saber..W.D. 121–127. 27. B. pp. A decentralized approach to formation maneuvers. W.. pp. and Whiteley. January 2004.D... 19. J. and Anderson. R. and Yu. IEEE Transactions on Robotics and Automation. Sensor Networks and Information Processing (ISSNIP). No.. July 2006. 20. in Matroid Theory.T. Rigidity and persistence of meta-formations. RI. B. 19. 387–402. Fidan. 49. B. C. No.W. Z.O. Necessary and sufficient graphical conditions for formation control of unicycles. B. B.. Whiteley. V. Hendrickx. Providence. 6. and Young.D.A. Hendrickx. 3. pp.G. 1.. Laman. 17. Boca Raton. Vol. 197 of Contemporary Mathematics. and Blondel. Bonin....D. and Anderson. pp. Whiteley. Vol. Vol. G. and Beard. 18. 73–82. pp. Anderson. 16. in Proceedings of the 2nd International Conference on Intelligent Sensors. V. 22. J. 1996.. in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems (MT NS2006).O. 11. December 2002. C.. December 2005. and structural persistence.. March 2007. 26. IEEE Transactions on Automatic Control. 1985. pp..O.Persistent Autonomous Formations and Cohesive Motion Control 275 15. Vol. No. 51.E.. 2004. in Proceedings of the 45th IEEE Conference on Decision and Control. Persistence acquisition and maintenance for autonomous formations. 23. C. pp. Vol.

.

Future combat operations will continue to place unmanned aircraft in challenging conditions...................... 284 Two-Vehicle Example ................................................................................................................ such as the urban warfare environment............3 9................................... Panos Antsaklis..................... and Kimon Valavanis CONTENTS 9.......................................9 Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions George Vachtsevanos........................................8 Conclusion ......................1 9............4 9............ reduced autonomy.7 New Directions and Technological Challenges..............7.............................................................................................. 277 System Architecture ............... 294 9..... 294 References.2 9........................ 293 9......................... 292 9............. However............................................ and operator workload requirements of current unmanned vehicles present a roadblock to their success............ 292 9...1 Technological Challenges........1 Particle Filtering in a Bayesian Framework ... the poor reliability............................ sharing resources and 277 ............................. 285 Simulation Results .................6 Introduction.....6....................................................................... 291 9............................ 280 Formation Control ..................................................7......................................... 290 9..............1 Introduction Recent military and civil actions worldwide have highlighted the potential utility for unmanned aerial vehicles (UAVs)......................... 288 Target Tracking..................5 9... Both fixed-wing and rotary aircraft have contributed significantly to the success of several military and surveillance/rescue operations.....2 Enabling Technologies...... It is anticipated that future operations will require multiple UAVs performing in a cooperative mode.................

pipeline monitoring. with emphasis on recent developments aimed at improved autonomy and reliability of UAVs. communications. The challenges increase significantly as we move up the . and computing technologies that must be developed and validated if such complex unmanned systems as UAVs are to perform effectively and efficiently. and adversarial reasoning strategies require new technologies to deal with issues of system complexity. target tracking. Figure 9. and other applications will utilize an unprecedented level of automation in which human-operated. We will conclude by proposing possible solutions to these challenges. uncertainty management.1 shows the autonomous control level trend according to the Department of Defense UAV Roadmap [2]. We will pose the major technical challenges arising in the “system of systems” approach and state the need for new modeling. in a variety of application domains. Surveillance and reconnaissance tasks that rely on UAVs require sophisticated modeling. planning. networking. flying in a swarm. computing. The future urban warfare. Multiple UAVs. plug-and-play. and deconfliction of assets coupled with the technology to realize such concepts. and semiautonomous air and ground platforms will be linked through a coordinated control system [1]. and control. in conjunction with manned systems. computing. and communications requirements. The same roadmap details the need for new technologies that will address single-vehicle and multivehicle autonomy issues. as well as search and rescue. Networked UAVs bring a new dimension to future combat systems that must include adaptable operational procedures. We will briefly survey softwareenabled control technologies — fault detection and control reconfiguration. This chapter reviews the current status of UAV technologies. and other quality of service functions. Homeland security. Thus. Here. and envelope protection that are intended to endow the UAV with improved autonomy and reliability even when operating under extreme conditions. optimum terrain coverage. and discusses future directions and technological challenges that must be addressed in the immediate future. planning. planning. and control technologies. and computational efficiency. adaptive mode transitioning. Current R&D activities are discussed that concern issues of modeling. We view the assembly of multiple and heterogeneous vehicles as a “system of systems” where individual UAVs are functioning as sensors or agents. and the emerging need for improved UAV and UAV team autonomy and reliability. rescue operations. such as forest fire detection.278 Modeling and Control of Complex Systems complementing other air or ground assets. and communications issues must be considered as the UAVs are tasked to perform surveillance and reconnaissance missions in an urban environment. hostile threats. networking. The technical challenges the control designer is facing for autonomous collaborative operations stem from real-time sensing. environmental and operational uncertainty. and so on. A software (middleware) platform enables real-time reconfiguration. autonomous. border patrol. constitute a network of distributed (in the spatiotemporal sense) sensors that must be coordinated to complete a complex mission. The same scenario arises in similar civil applications.

especially from an implementation and integration point of view. computational intelligence. control. contingency management. have been investigated intensively in recent years. .1 Autonomous control level trend. We will emphasize the synergy of tools and methodologies stemming from various domains. Moderate success has been reported thus far in meeting the lower-echelon challenges. hierarchy of the chart shown in Figures 9. links to command and control. such as distributed artificial intelligence (DAI). while less attention has been paid to the system architecture. However. as well as the resurfacing of classical mathematical notions that may be called upon now to solve difficult spatiotemporal dynamic situations. such as formation control and autonomous search. more work has been focused on solving particular problems. in this area. as well as game theory and dynamic optimization. Other significant concerns relate to inter-UAV communications. and communication concerns and highlight those new directions that are needed to assist in arriving at satisfactory solutions. We will review briefly in this chapter a few of the challenges referred to above and suggest possible approaches to these problems. and soft computing. Recent advances in computing and communications promise to accommodate the on-line real-time implementation of such mathematical algorithms that were considered intractable some years back.Modeling and Control of Unmanned Aerial Vehicles Autonomous Control Levels Fully autonomous swarms 10 Group strategic goals Distributed control Group tactical goals Group tactical replan Group coordination Onboard route replan Adapt to failures & flight conditions Real-time health/diagnosis Remotely guided 9 8 7 6 5 4 3 2 1 1955 1965 1975 Global Hawk Predator Pioneer AF UCAV UCAR 279 UCAV-N 1985 1995 2005 2015 2025 FIGURE 9. Achieving the ultimate goal of full autonomy for a swarm of vehicles executing a complex surveillance and reconnaissance mission still remains a major challenge. The intent is to motivate through application examples the modeling.2a and b from single-vehicle to multivehicle coordinated control. innovative coordinated planning and control technologies. To meet these challenges. and so on.

fused with off.RT Health Diagnosis. cooperate. capability includes. Individual task planning/execution. Strategic group goals assigned Enemy tactics inferred ATR 8 Battlespace Cognizance Group accomplishment of strategic goal with minimal supervisory assistance 7 Battlespace Knowledge Short track awareness – History and predictive battlespace data in limited range. Tactical group goals assigned.g. The autonomous nature of agents allows for efficient communication and processing among distributed resources. timeframe and numbers. or replaces. Reduced dependence upon off-board data. and take actions to meet the objectives of a particular application or mission. Enemy Trajectory estimated Individual task planning/execution to meet goals Group accomplishment of tactical goal with minimal supervisory assistance 6 Real-Time MultiVehicle Cooperation Ranged awareness – Onboard sensing for long rang. Individual determination of tactical goal. 9. DAI systems provide an environment in which agents autonomously coordinate. Prognostic Health Mgmt) Group diagnosis and resource management Group accomplishment of tactical plan as externally assigned Air collision avoidance Possible close air space separation for AAR. Network failures and communication delays are one of the main concerns in the design of cooperative control systems. Ability – Optimize for current and board data to compensate for most predictive conditions failures and flight conditions.2 The autonomous control level chart. Choose tactical targets Coordinated tactical group planning Individual task planning/ execution Choose targets of opportunity Requires little guidance to do job Group accomplishment of strategic goal with no supervisory assistance 9 Battlespace Swarm Cognizance Enemy Strategy inferred. Capable of total independence Distributed tactical group planning. items from lower levels Level Level Descriptor Observe Perception/Situational Awareness Cognizant of all within Battlespace Battlespace inference – Intent of self and others (allies and foes). this requires that each UAV communicates all the data from its sensors to a central location and receives all the control signals back. . Collision avoidance Ability to predict onset of failures (e. On the other hand. Orient Analysis/Coordination Decide Decision Making Act Capability 10 Fully Autonomous Coordinates as necessary Strategic group goals assigned.280 Modeling and Control of Complex Systems Autonomous Control Level (ACL) Chart Note: As ACL increases. Complex Intense environment – on-board tracking Proximity inference – Intent of self and others (allies and foes).2 System Architecture While networked and autonomous UAVs can be centrally controlled. make decisions. supplemented by off-board data Tactical group goals assigned Enemy location sensed/ estimated Coordinated trajectory planning and execution to meet goals – Group optimization Group accomplishment of tactical goal with minimal supervisory assistance 5 Real-Time MultiVehicle Cooperation Sensed awareness Tactical group plan On-board trajectory – Local sensor to detect assigned replanning others. negotiate. formation in nonthread conditions FIGURE 9.

A “system of systems” approach suggests a hierarchical architecture for the coordinated control of multiple UAVs. capability includes.3. The formation control problem is viewed as a pursuit game of n pursuers and n . The primary task of a typical team of UAVs is to execute faithfully and reliably a critical mission while satisfying local survivability conditions. Deconfliction Self-accomplishment of tactical plan as externally assigned Medium vehicle airspace separation 3 Robust Response to RealTime Faults/Events Health/Status history and models Tactical plan assigned RT Health Diagnostics Ability to compensate for most control failures and flight conditions Evaluate status vs.4. sensors. shown in Figure 9. we adopt an assumed mission scenario of a group of UAVs executing reconnaissance and surveillance (RS) missions in an urban warfare environment. and communication and weapon systems. and obstacle avoidance. a middle level with local knowledge. Self resource management.2 (Continued). or replaces. as depicted in Figure 9. formation control. each individual UAV in the team is considered as an agent or sensor with particular capabilities engaged in executing a portion of the mission. required mission capabilities Abort/RTB if insufficient Self-accomplishment of tactical plan as externally assigned 2 Changeable Mission Health/Status sensors RT Health diagnosis Off-board replan Execute preprogrammed or uploaded plans in response to mission and health conditions Self-accomplishment of tactical plan as externally assigned 1 Execute Preplanned Mission Preloaded mission data Flight Control and Navigation Sensing Pre/Post Flight BIT Report status Preprogrammed mission and abort plans Wide airspace separation requirements 0 Remotely Piloted Vehicle Flight Control sensing Nose camera Telemetered data Remote pilot commands N/A Control by remote pilot FIGURE 9. and a lower level that interfaces with onboard baseline controllers.Modeling and Control of Unmanned Aerial Vehicles Autonomous Control Level (ACL) Chart Note: As ACL increases. The hierarchical architecture. features an upper level with global situation awareness and team mission planning. For the purpose of coordinated control of multiple UAVs. – Ability to compensate for most failures and flight conditions – inner loop changes reflected in outer loop performance On-board trajectory replanning – Event driven. In order to define the application domain. Each level consists of several interacting agents with dedicated functions. items from lower levels Level Level Descriptor Observe Perception/Situational Awareness Deliberate Awareness – Allies communicate data Orient Analysis/Coordination Decide Decision Making Act Capability 281 4 Fault/Event Adaptive Vehicle Tactical plan assigned Assigned Rules of Engagement RT Health Diagnosis.

evaders. The highest level of the control hierarchy features functions of global situation awareness and teamwork. and is further decomposed into tasks and subtasks. The vehicle model is simplified to point mass with acceleration limit. and create operational orders. The overall mission is usually planned by the command and control center based on the capabilities of each individual UAV agent. such as integer programming.7] can be applied as a solution to autonomous mission replanning. Simulation results are provided to verify the performance of the proposed algorithms. and team members’ status. It is also responsible for keeping track of the team’s plan and goals. which are finally allocated to the UAV assets (individually or in coordination with other vehicles). Collision avoidance is achieved by designing the value function so that it ensures that the vehicles move away from one another when they come too close together. Planning the UAVs’ flight route is also an integral part of mission planning. The mission planning agent is able to generate and refine mission plans for the team. which . This can usually be cast as a constrained optimization problem and tackled with various approaches.282 Modeling and Control of Complex Systems Urban Warfare GTMax GTMav OAV Sniper Manned Vehicle Fixed Wing UAV GTMax Ground Sensor Moving Target Soldiers Ground Sensor Ground Sensor Commander Operator FIGURE 9. A modified A* search algorithm. assuming that the destination points are avoiding the vehicles in an optimal fashion. Stability of the formation of vehicles is guaranteed if the vehicles can reach their destinations within a specified time. and so on.3 A team of five UAVs executing reconnaissance and surveillance missions in an urban warfare environment. generate or select flight routes. graph theory. Market-based methods [3–5] and especially auction theory [6.

hazard. whenever necessary. in heterogeneous agent societies. . interacting with the knowledge fusion agent.. while the followers fly in close formation in the proximity of the leader. The global performance measurement agent measures the performance of the team and suggests team reconfiguration or mission replanning. FIGURE 9.. and maneuverability measures. Real-world implementation of this level is not limited to the agents depicted in the figure. For example.Modeling and Control of Unmanned Aerial Vehicles Command Manned 283 Level 3 Global Knowledge Team Mission Planning Global Situation Knowledge Fusion Global Performance QoS Assessment Level 2 Local Knowledge Formation Control Moving Obstacle Local Situation Local Mission FDI/Reconfigurable Level 1 Behavioral Knowledge Vehicle Weapon System Communication Sensing Agent .4 A generic hierarchical multiagent system architecture. evaluates the world conditions based on data gathered from each UAV (and ground sensors if available) and reasons about the enemy’s likely actions. Adversarial reasoning and deception reasoning are two important tasks executed here. knowledge of coordination protocols and languages may also reside [11. The global situation awareness agent. Quality of service (QoS) is assessed to make the best effort to accomplish the mission and meet the predefined quality criteria. an optimal route is generated for the leader. attempts to minimize a suitable cost function consisting of the weighted sum of distance.12]. .. [8–10] can be utilized to facilitate the design of the route planner. In the case of a leader-follower scenario..

if the group of vehicles is viewed as the pursuer and the group of desired points in the formation as the evader.3 Formation Control The problem of finding a control algorithm that will ensure that multiple autonomous vehicles can maintain a formation while traversing a desired path and avoid intervehicle collisions will be referred to as the formation control problem. which provides a framework to determine acceptable initial vehicle deployment conditions but also provides insight into acceptable formation maneuvers that can be performed while maintaining the formation. however. Hence. However. The formation control problem has recently received considerable attention due in part to its wide range of applications in aerospace and robotics. in general. One such application is the so-called pursuit game. as they mention in their conclusion. However. In Reference [14] the individual trajectories of autonomous vehicles moving in formation were generated by solving the optimal control problem at each time step. The formation control problem can be regarded as a pursuit game. This chapter views the formation control problem from a two-player differential game perspective. if the vehicles can reach the points under such conditions then they will always be able to reach their destination. A classic example involving the implementation of the virtual potential problem is presented in Reference [13]. assuming that the destination points are avoiding the vehicles in an optimal fashion. it is necessary not only to determine the control strategies of the vehicles but also the optimal avoidance . Stability of the formation of vehicles is guaranteed if the vehicles can reach their destination within some specified time. if local minima exist. the drawback of the virtual potential function approach is the possibility of being “trapped” in local minima. It seems counterintuitive that the destination points should be avoiding the vehicles optimally. except that it is. one cannot guarantee that the system is stable. the pursuit game will be viewed as a perfect information game. The authors performed simulations on a two-dimensional system. that is. in order to solve such a problem it is advantageous to know the dynamics and the positional information of both the evader and the pursuer. This is computationally demanding and hence not possible to perform in real time with current hardware. which proved to be well behaved.284 Modeling and Control of Complex Systems 9. in which a pursuer has to collide with an evading target. Differential game theory was initially used to determine optimal military strategies in continuous time conflicts governed by some given dynamics and constraints [15]. the problem is essentially reduced to the standard but much more complex pursuit game. as the system consists of n pursuers and n evaders instead of only one of each. Naturally. much more complex in terms of the combined dynamic equations. As a consequence of our stability criterion.

within which we say that the formation has settled. how does one ensure that intervehicle collisions are avoided? To ensure this. it is necessary to consider the payoff function determined by the integral of G(x. ψ)dt = τ .Modeling and Control of Unmanned Aerial Vehicles 285 strategies of the desired points. 9. The f j (x. φ. ψ) is a predetermined function which. Such boundaries naturally add complexity to the problem. As an example. However. and the Vj is the corresponding value of the game. The above formulation suggests a way for approaching the solution to the differential game. φ. ψ) does not provide the means to perform the actual collision avoidance. G (x. however. Notice that the only quantity that is not specified in the equation is the Vj term. provides the payoff of the game. ψ) ⎦ = 0 ¯ ¯ which has to be true for both φ andψ. φ. So. then τ G(x. if we simply seek that the vehicles must reach their goal within a certain time τ .” that is. only initial conditions that ensure collision-free trajectories will be valid.4 Two-Vehicle Example In order to illustrate some of the advantages and disadvantages with the differential game approach to formation control. However. points that can move in three dimensions . φ. ψ) + G(x. if G (x. ψ). they also provide a framework for positional measurement errors. These initial condition requirements provide us with the ability to introduce tolerance boundaries. ψ) = 1. consider the following system of simple point “helicopters. Let us label the final control vector of the ¯ ¯ vehicles by φ and the control final vector of the desired points by ψ. we have restricted our solutions to the initial vehicle deployment. in order to incorporate collision avoidance into the controller. From the main equation it is possible to determine the retrograde path equations (RPEs). initial conditions of the RPEs will have to be considered in order to integrate the RPEs. However. which will ensure that the vehicles will reach the desired points in τ time. φ. the main equation which has to be satisfied is ⎡ min max ⎣ φ ψ j ⎤ Vj · f j (x. Hence. However. when integrated. φ. φ. This can be verified by evaluating 0 G(x. ψ) term is the jth dynamic equation governing the system. G(x. φ. φ. which will have to be solved to determine the actual paths traversed by the vehicles in the formation. Then. ψ) is changed to penalize proximity of vehicles to one another. but merely limits the solution space. one can either change the value function or add terms to the system of dynamic equations.

Figure 9. This simply implies that there is a constant distance separating the two desired points. 2. the ki and kd factors are simply linear drag terms to ensure that the velocities are bounded.5 shows the coordinate system and the associated angles. Hence the dynamic equations become: xd = vxd ˙ vxd = Fd cos(ψ1 ) sin(ψ2 ) − kd · vxd ˙ yd = v yd ˙ v yd = Fd sin(ψ1 ) sin(ψ2 ) − kd · v yd ˙ zd = vzd ˙ vzd = Fd cos(ψ2 ) − kd · vzd ˙ In the above dynamic systems. and the Fd and Fi terms are the magnitudes of the applied forces.286 Modeling and Control of Complex Systems governed by the following dynamic equations: xi = vxi ˙ vxi = Fi cos(φ2i−1 ) sin(φ2i ) − ki · vxi ˙ yi = v yi ˙ v yi = Fi sin(φ2i−1 ) sin(φ2i ) − ki · v yi ˙ zi = vzi ˙ vzi = Fi cos(φ2i ) − ki · vzi ˙ where i = 1.5 Definition of angles. and that the formation can only perform translations and not rotations in the three-dimensional space. Z 2i.Ψ1 X FIGURE 9.Ψ2 Y 2i-1. . The two desired “points” are described by one set of dynamic equations.

1).1).1) To obtain the control law that results from the max-min solution of Equation (9. ρ1 Vvz1 ¯ cos( φ 2 ) = − . the following control strategy for vehicle 1 is found: Vvx1 . For the optimal avoidance strategy of the desired points. the following lemma is used: LEMMA 1 Let a . α . ρd2 ¯ cos( ψ1 ) = + Vvyd ρd1 ρd1 ¯ sin( ψ2 ) = + ρd2 ¯ sin( ψ1 ) = + . ρd1 Vvzd ¯ cos( ψ2 ) = + . b ∈ : Then ρ= is obtained where max (a · cos (θ ) + b · sin (θ )) θ a 2 + b2 cos (θ ) = and the max is ρ. ρ2 ¯ cos( φ 1 ) = − where ρ1 = and ρ2 = 2 2 2 Vvx1 + Vvy1 + Vvz1 2 2 Vvx1 + Vvy1 Vvy1 ρ1 ρ1 ¯ sin( φ 2 ) = − ρ2 ¯ sin( φ 1 ) = − Similar results are obtained for vehicle 2. ρ and sin(θ ) = b ρ By combining Lemma 1 with the solution of Equation (9.1).Modeling and Control of Unmanned Aerial Vehicles 287 Substituting the dynamical equations into the main Equation (9. we obtain the following: Vvxd . we obtain the following expressions: min[F1 · (Vvx1 · cos(φ1 ) · sin(φ2 ) + Vvy1 · sin(φ1 ) · sin(φ2 ) + Vvz1 · cos(φ2 )) φ +F2 · (Vvx2 · cos(φ3 ) · sin(φ4 )+Vvy2 · sin(φ3 ) · sin(φ4 ) + Vvz2 · cos(φ4 ))] and max[Fd ·(Vvxd ·cos(ψ1 ) ·sin(ψ2 ) + Vvyd ·sin(ψ1 ) ·sin(ψ2 ) + Vvzd ·cos(ψ2 ))] ψ (9.

9. to obtain a more general solution. In Figure 9.6 the vehicles can move quickly enough to actually reach the desired trajectories. The above plot shows the tracking capabilities of the derived controller. there will always exist some positional error. Because the forces in the system are not dependent on the proximity of the vehicles to the desired points. in order to display the utility of this approach.5 Simulation Results From the closed-form expression of the control presented in the previous section.7 the velocities of the vehicles are not sufficient to reach the desired trajectories. it is obvious that the optimal strategies are in fact bang-bang controllers. This is due to the fact that the dynamic equations are decoupled. a solution manifold should be used. the final value will be zero. . the analysis is performed on the actual position and velocity differential equations. the vehicles simply move in a smaller circle. and hence working within a three-dimensional framework will not change the problem considerably. it should also be noted that this solution closely resembles the isotropic rocket pursuit game described in Reference [15]. which ensures that the error remains constant. the previously mentioned final conditions will suffice. Furthermore. and occurs when the difference between the desired position and the actual position is zero. Naturally. It is. and hence reduce the number of differential equations by a factor of 2.288 Modeling and Control of Complex Systems From this. for the sake of clarity. However. The closed-form expression of the value function is then of the form: Vvx1 = (x1 − xd ) · 1 − e −k1 t k1 It should be noted that the above analysis could be performed on a reduced set of differential equations. The two vehicles are attempting to follow two parameterized circular trajectories with a radius of three. where each equation would express the differences in distance and velocity. In the latter case. we see that the retrograde equations have the following form: vx1 = −F1 · x1 = −vx1 Vx1 = 0 V vx1 = Vx1 − k1 · Vvx1 o o o o Vvx1 + k1 · vx1 ρ2 For this example. whereas in Figure 9. however. possible to resolve this problem simply by switching controllers at some error threshold. or introducing terms that minimize the force terms F1 and F2 as the vehicles approach the desired points. however.

3D Position Plot Vehicle 1 Vehicle 2 Desired Trajectory 1 Desired Trajectory 2 15 10 Z 5 0 10 15 5 Y 0 0 10 5 X FIGURE 9. .Modeling and Control of Unmanned Aerial Vehicles 3D Position Plot Vehicle 1 Vehicle 2 Desired Trajectory 1 Desired Trajectory 2 289 15 10 Z 5 0 10 15 5 Y 0 0 5 X 10 FIGURE 9.7 Two-vehicle simulation with insufficient vehicle velocities.6 Two-vehicle simulation with sufficient vehicle velocities.

color and motion data are collected for each particle to determine which particles have a high probability of correctly tracking the target. The agent (sensor) placement problem is formulated by defining a global utility function to be optimized given a graph representing the region of interest.21]. We call the new setup (the sensor plus the UAV) an agent. this information is used to initialize the filter in the first few frames of video. This task is achievable as long as the target is within the sensor’s field of view (FOV). in which both tasks discussed above are integrated. On the next iteration. Particle filters have recently been successful in tracking mobile targets in video [19.17] because it captures the multipleobserver–multiple-target scenario with target prioritization. we can start a second task. and motion characteristics of the target is known a priori. a team of agents. To address this problem. Details of the approach can be found in Reference [18]. particles are drawn according to this probability. In the particle filter framework. Information such as size.6 Target Tracking There exist a number of efforts to formally describe the dynamic agent placement problem for target tracking. to (reactively or proactively) move the sensor to guarantee that the target stays in view. At each step. the target tracking task will fail to track the moving target until the target reenters the sensor’s FOV. the sensor is mounted on a moving platform such as a UAV. while the other particles “die. color. A real-time dynamic programming tool is called upon to solve the maximization problem.” . A course motion model is developed first where target transitions follow a stochastic model described by an Mth-order Markov chain. Thus. The video tracking problem consists of determining the position of a target within a particular video frame based on information from all past frames. W-CMOMMT can be shown to be an nondeterminstic plynomial (NP)-hard problem [18]. 9. other than the target tracking task. the state of each particle is updated as the video progresses from one frame to the next. The algorithm attempts to maximize the coverage by searching for a set of observation points at each time step. The work presented in this chapter is of the active sensing-based target tracking variety. successful particles “survive” and are used in subsequent frames. Thus.290 Modeling and Control of Complex Systems The term target tracking is often used to refer to the task of finding or estimating the motion parameters (mainly the location and direction) of a moving target in a time sequence of measurements. Thereafter. Agents use the model to predict the target locations at future time instants as probability distributions. and a set of targets. The choice is made to use a formulation of the variety of Weighted Cooperative Multi-robot Observation of Multiple Moving Targets (W-CMOMMT) [16. That second task is what we call the agent placement task. If it happens that the target keeps moving away from the FOV. using a model similar to that in Reference [20].

prediction and update. and wk is the importance (i) weight associated with this point. n (i) (i) where xk represents a point in the state space. there are a number of vibrations in the video. then the previous weights may be neglected and the above Equation becomes: (i) (i) wk ∝ p zk | xk After the weights are determined to at least a scale factor. zk is made and the prior is updated using Bayes’ rule: p(xk |zk ) p(xk |z1:k−1 ) p(xk |z1:k ) = p(zk |z1:k−1 ) In most cases. Therefore. xk−1 Measurements are then taken at each particle and the weights are updated using: (i) (i) (i) wk ∝ wk−1 p zk xk If the particles are resampled at each iteration. with Gaussian distributions. The particle filter is one way to estimate the above equations. based on all previous measurements. The movie camera was held by a human operator. can be determined in two steps. then the prior is given as: p(xk |z1:k−1 ) = p(xk |xk−1 ) p(xk−1 |z1:k−1 ) dxk−1 After the measurement. . when a Kalman filter is used. The wk are nonnegative. the above equations cannot be determined analytically. A particle filter iteratively approximates the posterior pdf as a set: Sk = (i) (i) xk . In the prediction step. . the system must be linear. probability density function (pdf). xk . This. The Kalman filter is a well-known exception. However. wk i = 1.1 Particle Filtering in a Bayesian Framework 291 The objective of Bayesian state estimation is to estimate the posterior pdf of a state. the particles are updated using the system dynamics and sampling from: (i) (i) p xk . A particle filter was used to track a soldier as he maneuvered in an urban environment 2. . and sum to unity. . z1:k . and the zoom is adjusted during the video.Modeling and Control of Unmanned Aerial Vehicles 9. If a first-order Markov model is assumed. the state update model is used to determine the prior pdf p(xk |xk−1 ).6. It has been shown21 that the posterior pdf estimated using particle filtering converges to the actual pdf as the number of particles increases. Frames were grabbed from a movie at a rate of 30 Hz. they are normalized such that their sum is equal to unity. . At each iteration. p(xk |z1:k ).

7 seconds apart from each other. A few frames of the output are shown in Figure 9.7. If the second lowest is “illuminated.” the neural network has output the highest confidence level.1 Technological Challenges From single system to “system of systems. 9.8. A neural network construct is employed to adapt the particle filter so that minimum number of particles is called upon and these particles are focused on the target even when the latter emerges from an occlusion. . Frames are approximately 1.” • Modeling — Spatiotemporal modeling paradigms are needed for real-time planning and control of networked systems. The box represents a weighted average of the ten best particles.7 New Directions and Technological Challenges 9.292 Modeling and Control of Complex Systems FIGURE 9.” the neural network has output the lowest confidence level. If the middle two are “illuminated. If the top three are “illuminated.8 Typical output frames. If the lowest “light” is “illuminated.” the neural network has output the second lowest confidence level.” the neural network has output the second highest confidence level. The set of “lights” in the upper left corner of each frame are used to indicate the output of the neural network.

and so on. Computing — On-platform computational requirements. improved and reliable/costeffective sensor suites. Computing — Embedded processing requirements. and assessment of networked systems. performance and effectiveness metrics. need for command and control and supervisory functions. Sensors and sensing strategies — Hardware/software requirements.7. “smart” sensors and sensing strategies. networked sensors. bandwidth and other quality of service requirements. open systems architectures. Control of networks of dynamic agents. contingency planning. pursuit-evasion. sensor fusion. new reasoning paradigms for tracking. software reliability issue. and so on. obstacle avoidance. Networking and communications — Communication protocols and standards. • • • • • . verification/validation. with system of systems behaviors. Control — Intelligent and hierarchical/distributed control concepts must be developed and expanded to address “system of systems” configurations. Software models for improved QoS. sensors). data processing.2 Enabling Technologies • New modeling techniques are required to capture the coupling between individual system/sensor dynamics. Sensors and sensing strategies — Innovative concept and technologies in wireless communications. data mining. Spatiotemporal models of distributed agents (sensors) are required to integrate system and motion dependencies. Game-theoretic notions and optimization algorithms running in almost real time to assist in cooperative control and adversarial reasoning. and so on. and so on. communications. formal methods for verification and validation. hardware and software architectures. Performance metrics/verification and validation — Need new system of systems performance and effectiveness metrics to assist in the design. • • • • 9.Modeling and Control of Unmanned Aerial Vehicles • 293 Control — Hierarchical/intelligent control of multiple networked systems (agents. surveillance/reconnaissance. coordinated control. Performance metrics/verification and validation — Defining metrics for design and performance assessment. Means to represent and manage uncertainty.and intrasystems reliable and secure communication protocols. Networking and communications — Inter. planning and scheduling. fault-tolerant computing platforms. new and reliable. Hybrid system approaches will play a key role.

Netherlands: Springer. D. R. Vol. S. Engelbrecht. 2004. 1998. “Intelligent Route Planning for Fast Autonomous Vehicles Operating in a Large Natural Terrain.. and communications/computing concerns into a single architecture. “Market-based Algorithms for Optimal Decentralized Control of Complex Dynamic Systems. of the 3rd International Conference on Multiagent Systems. G. 5. pp.. Homeland security. Air Warfare. 3.” Proc.. and reliability concerns have limited thus far their widespread utility. Baltimore. 7–66. W.. S. Singapore: World Scientific. and Vachtsevanos. and Prasad. or similar federated system of systems configurations to perform efficiently and reliably. Vol. 6. “OSD UAV Roadmap 2002-2027. References 1.” December 2002.. 8. New York: New York University Press. Bidding. Auctions. E.. Al-Hasan. Simon. Phoenix. Vachtsevanos. “Mission Planning and Flight Control: Meeting the Challenge with Intelligent Techniques. M. 1992.294 Modeling and Control of Complex Systems 9. control. 1999. J. M.” Proceedings of the American Helicopter Society 60th Annual Forum. and so on. V. October 1997. and Wellman. and Contracting: Uses and Theory. Office of the Secretary of Defense (Acquisition. Vol. 2002. Computational Optimization and Applications. AZ... W. and Stark. Market-Based Control: A Paradigm for Distributed Resource Allocation. 40. Kim. H. S. UAVs must possess attributes of autonomy in order to function effectively in a “system of systems” configuration. “A Market Protocol for Decentralized Task Allocation. Walsh. . Schrage.. pp. Typical application domains include reconnaissance and surveillance missions in an urban environment. pp. R.. L.. 4. H. target tracking and evasive maneuvers. 40. 1996. MD. Al-Hasan.. autonomy issues. Shubik.8 Conclusion Federated systems consisting of multiple unmanned aerial vehicles performing complex missions present new challenges to the control community. “An Intelligent Approach to Coordinated Control of Multiple Unmanned Aerial Vehicles. 2. 9. Coordinated/collaborative control of UAV swarms demands new and novel technologies that integrate modeling. 1–24. 1(1).. W. search and rescue operations. M. 3295–3296. Technology. The systems and controls community is called upon to play a major role in the introduction of breakthrough technologies in this exciting area.. Auction Algorithms for Network Flow Problems: A Tutorial Introduction. June 7–10. Clearwater. Major technological challenges remain to be addressed for such UAV swarms.” Journal of Advanced Computational Intelligence. & Logistics). J.” Journal of Robotics and Autonomous Systems. G. and Reimann. Vachtsevanos. 1983. F. 1. Bertsekas. Voos. M.” Proc. G. Tang. 62–70. of the 38th IEEE Conference on Decision and Control. D... pp. Excessive operator load. R. Rufus. Vol. 7.

Baras. Hegazy. “Decentralized Control of Autonomous Vehicles. pp. and Maskell. Al-Hasan.” Proceedings of the 41st IEEE Conference on Decision and Control. 2004. J.. Hoff. “A Neural Fuzzy Controller for Moving Obstacle Avoidance. S. 12. “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. S. U. T.” IEEE Transactions on Signal Processing.K. . 50.” Proceedings of the Twelfth Mediterranean Conference on Control and Automation. G.. September 24–27. M. 1965. New York: John Wiley & Sons. Manchester. Vermaak. pp.. pp. Las Vegas.. and Vachtsevanos. X. 11. 173–197.” Third International NAISO Symposium on Engineering of Intelligent Systems. P. Howard. Perez.” Proceedings of the Fourth International Conference on Autonomous Agents. Spain. 134–149. and Lee. Vol. ACM Press: Barcelona 2000. Perez. F. Malaga. Tan. B. “Model Predictive Control of Coordinated Multi-Vehicle Formation. B. 1526–1531.Modeling and Control of Unmanned Aerial Vehicles 295 10.. 2000. 31(1-4).” Proceedings of the European Conference on Computer Vision. A.” Proceedings of the 42nd IEEE Conference on Decision and Control. “Data Fusion for Visual Tracking with Particles.. and Hovareshti. 14. P. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit. and Pereira. 21–22. B. 2002. 4631–4636. Dunbar.” Proceedings of 5th Intl. Hue. December 2003. Hawaii. pp... J. April 10. and Blake. 15. pp. A. and Vermaak.” Annals of Mathematics and Artificial Intelligence. 17. pp. “Hierarchical Command and Control for Multi-agent Teamwork. Isaacs.. 2002. “Broadcast of Local Eligibility: Behavior-Based Control for Strongly Cooperative Robot Teams. Conf. and Murray. “A Framework for Networked Motion Control. December 2002. pp. M. 2002.” Proceedings of the IEEE. J. “Dynamic Agent Deployment for Tracking Moving Targets. Werger. and Vachtsevanos. G. 495–513. “From Insect to Internet: Situated Control for Networked Robot Teams. R. Vol. M. Kusadasi. pp. 92. C. R. Arulampalam. 1–13. 2004. 13.. 16. S. 21.. Turkey. B.. on Practical Application of Intelligent Agents and Mult-Agent Technology (PAAM2000).. 20.. Sousa. J. 19. 174–188. Aydin. 2001. 1532–1537. December 2003.. and Mataric. W. and Mataric. Control and Optimization. M.” Proceedings of the 42nd IEEE Conference on Decision and Control. J. “Color-Based Probabilistic Tracking. B. P. Werger.. Hawaii. C.. pp. 18.

.

.......................2........................................ 308 Challenges and Issues......................................................3................5 10.......2..........................1 Autonomous Recharge ..........................2 Heterogeneous Teams ........................2...................1 Results ......2 Organization .. 311 10...............................................1........... 324 Future Work............................. 304 10...............2.............2..3 How this Approach Differs ............. 329 10..........2 Traditional Algorithms for Docking .............2...........................1................2 10...............................2 Autonomous Docking and Deployment . 304 10....2 Deployment Methods ..........................3 Traditional Assumptions and Limitations of Docking .................1 Problem Description..............................3... 309 10............ 316 10.....2 Hardware ................................. 298 10........... 313 Optimization of Behaviors ...............1 Simulation Extensions................................................3.......................1 General Theory of Docking ......4................. 312 10...........................................................................................................2....6 297 ....... 309 10...5................................................................................................................................ 302 10............ 300 10...................................... 302 10....................2........2............10 A Framework for Large-Scale Multi-Robot Teams Andrew Drenner and Nikolaos Papanikolopoulos CONTENTS 10................................. 300 10......... 299 Related Works ............................... 309 10.3..................3 10.......1.....1 Resource Distribution Problem......1 Introduction...4 Docking Methods ..........................2..1.1 Docking ....2........ 314 10.............................1.2..............2........6............. 314 Simulation... 303 10...........3........................3 Cooperative Robotics .1........ 300 10........ 305 10....... 301 10........................................................................................1 Cooperative Maneuvering for Energy Conservation ........1.........................3 Deployment Models....1.............. 329 10............................. 299 10..........1 Homogeneous Teams .........................................2..4 10.................................................................

..... or an unintentional leak of chemical materials................ and available power...........2 Hardware Innovations ............... more operators than robots are required...3 Applications................... The work presented here deals with the modeling of a much larger-scale robotic team that utilizes principles of marsupial systems. 330 10............. However...... The basic design of the majority of marsupial systems does not have the scalability to handle larger-scale teams. but in general marsupial systems represent teams of two or three robots.............. A primary consideration to the team is cost........ multipurpose robots that are highly redundant are .......... Robotic teams offer the potential for increased accuracy.. the coordination and control of these robotic teams can be a challenging task......7 Conclusions ........... those responding to the scenarios may benefit greatly from the use of robots........... Specifically................. which offer increased performance and redundancy in complex scenarios................... 10... Whether the result of a natural disaster...... Large.... Each deployable member of the distributed robotic team has a specific model comprised of a series of behaviors dictating the actions of the robot.... 332 Acknowledgments .........6....... such as a hurricane or earthquake. sensing... processing..... the power consumption of large-scale robotic teams is modeled and used to optimize the location of mobile resupply stations................... which can create a logistical nightmare in terms of operating a functional team.. Other challenges exist for the utilization of robotic teams....... There has been some work in the area of marsupial systems. The transitions between these behaviors are used to guide the actions of both the deployable robots and the mobile docking stations that resupply them...... and recover smaller deployable robots............. the ability to transport a wide range of sensors...... In some cases............. The effectiveness of these teams requires that the team members take advantage of the strengths of one another to overcome individual limitations......... communication................298 Modeling and Control of Complex Systems 10. 333 Robotic teams comprised of heterogeneous members have many unique capabilities that make them suitable for operation in scenarios that may be hazardous or harmful to humans............. the ability to enter spatially restricted areas......... the number of scenarios where an unmanned response would reduce the risk to responders has increased greatly....6......1 Introduction In recent years........ 331 10............................ terror attack...................... Results from preliminary simulation are presented...... 332 References................ Some of these strengths and deficiencies come in the form of locomotion.............. Many times larger robots can be used to transport....... the ability to operate for extended periods without fatigue................. deploy.. and the ability to do this remotely either semiautonomously or autonomously.........

Large, multipurpose robots that are highly redundant are often prohibitively expensive to use in environments where they may be lost, or by institutions with limited funding. Availability is a secondary concern: even if a robot is deemed affordable, it must be available to respond to a situation in a timely fashion. For the most effective response this means that robotic teams should be staged around an area, similar to the dispersal of police, fire, and ambulance services. As a result of cost and availability, the individual capabilities of the robotic team members are often restricted or specialized.

10.1.1 Problem Description

The task then becomes: How can a large team of robots be controlled in a manner that maximizes the resource availability for mission-specific criteria? Some example criteria are:

• Team longevity — The maximum operational lifetime of the robotic team is an important factor in a number of scenarios. When attempting to monitor a hazardous area, a primary concern revolves around how long the team is capable of monitoring the area before it will run out of power. Finding means to optimize the deployment time is required in many scenarios where robots may be monitoring environmental contamination (such as tracking the release of a harmful chemical agent) or searching for survivors in a disaster area. This can be addressed through a number of means, including energy conservation, finding energy in the environment, resupply off other robots, or some combination thereof.

• Area coverage — There are a number of tasks, such as search and rescue, surveillance, reconnaissance, and inspection, which require a robotic team to thoroughly search an area. When searching areas there may be areas that a single robot may not be able to traverse or enter into because of size limitations. In order to address this, heterogeneous teams comprised of robots with varying locomotion capabilities are desirable. In addition, these robots must have the sensing and computational abilities to recognize when they have been somewhere or if they have found the object of their search.

• Speed of deployment — Response time is critical to successful mediation of many types of hazards. This requires selection of robots for the team that balance size, sensing, processing, power (longevity), and communication capabilities.

10.1.2 Organization

This chapter is organized as follows. Relevant literature, specifically issues in resource distribution, cooperative robotics, robotic team formation, and power utilization, will be discussed in Section 10.2. Section 10.3 will more formally outline the problem discussed in Section 10.1 as well as deal with the specific challenges and issues involved.

Section 10.4 will present the formulations for the optimization of one of the behaviors in greater detail, followed by a discussion of simulation in Section 10.5. Future work, including a short discussion of hardware presently in development for this work, will be discussed in Section 10.6, followed by concluding remarks in Section 10.7.

10.2 Related Works

10.2.1 Resource Distribution Problem

The distribution of resources is an underlying problem in two key ways when responding to emergency scenarios. First, there is the distribution of resources relative to the incident that is being responded to. For example, when preparing for natural disasters, it is necessary to stockpile food, water, and medicine in geographically convenient areas to expedite the initial relief until outside relief can arrive. Military leaders position units around the world to enable resources to be deployed to areas of need as they arise in order to respond quickly and restore order. This can be thought of as the "strategic" response planning: "Where is the best place to store my resources for distribution?" The second key area deals with getting the necessary resources to the place they are needed. This includes many aspects of information gathering and path planning in order to minimize the response time. This can be thought of as the "tactical" question of "How do I best maneuver my resources in the field to achieve a goal?" The first problem is outside of the scope of this work; it is assumed that the robot teams exist and are in useful positions. This work focuses on identifying means by which a robotic team can disperse supplies among its members.

Resource distribution on the scene has been classically done with a fairly static team of robots. There are generally two types of robotic teams: some are homogeneous [71, 62], whereas others are heterogeneous [7, 9, 21]. Often these teams are called upon to identify features in an environment and either manipulate them (such as carrying targets to a goal [39, 41]), create representations of the area [16, 51], or achieve a degree of coverage for surveillance or reconnaissance applications [22].

10.2.1.1 Homogeneous Teams

One interesting case of the resource distribution problem is that of search and retrieval. In the search and retrieval scenario, robots (resources) must be distributed to find objects of interest at unknown locations, and then return with those objects of interest. The MinDART team [54, 55] is an example of a homogeneous team that is used for this type of operation.

One interesting subclass of homogeneous teams is those that are comprised of reconfigurable homogeneous modules. Systems such as PolyBot [71, 70], CONRO [10, 63], and reconfigurable molecules [51, 38] are each able to reconfigure themselves by rearranging identical modules.

In CONRO, different configurations result in different types of locomotion. CONRO can configure itself as a long chain, moving as a snake or crawling as a caterpillar, and individual modules can be autonomously reconfigured to support a variety of locomotion gaits. In order to control the system, predefined gait tables are used, and control of the system is done with static identification of each individual module. Using this static identification, topology information is used to send a virtual hormone which dictates the motion of the individual modules. To accomplish this, a hormone-inspired distributed control scheme has been developed [58]. The dynamic nature of the reconfiguration of individual modules requires a dynamic communication network topology: the system must be fault tolerant as nodes fail or as configurations change, thus it cannot be reliant upon the ability to identify a specific node in the configuration.

PolyBot [71, 72] is a modular system similar to CONRO. By combining the homogeneous modules, a variety of locomotion methods can be achieved [62]; the system can be reconfigured into a rolling wheel configuration as well as a variety of multilegged walkers. Simulations have been developed where the PolyBot is configured such that it can carry an object while rolling. Reconfiguration planning is generally done off-line, although online reconfiguration is possible to demonstrate the versatility of the system in dynamic environments. This work has been extended [33, 31], and it is possible to build self-reconfiguring robots in three dimensions in polynomial time. This lattice structure can make transitions among the modules, which allows it to traverse any continuous surface of the structure.

10.2.1.2 Heterogeneous Teams

Often robotic teams are built on limited budgets and with limited supplies. As a result, many of the individual members may not have the high-fidelity sensors desired. In these cases, systems comprised of robots with extra capabilities are distributed among robots of lesser capabilities. The Millibots [7] combine a variety of sensors on a team of small robots in order to accomplish more than a single robot could. Drenner et al. [13] report on scouts equipped with infrared (IR) emitters that are able to illuminate the way for other scouts that carry different sensor payloads.

Sharing sensor data and computational resources of a robotic team are important aspects in completing a mission. This is particularly useful in applications such as Simultaneous Localization and Mapping, or SLAM. However, it is possible to share other resources as well. Marsupial systems [39] enable the sharing of locomotion (and at times power) of larger systems with smaller ones. This enables small robots to be autonomously transported over large terrain quickly and without exhausting precious battery power. Another type of heterogeneous team was characterized by Hirose as the "super mechano-colony," or SMC [19]. This work has been extended [68, 20] to utilize multiple robots to manipulate a mothership.

10.2.1.3 How this Approach Differs

Although there have been many approaches to solving the classical distribution of resources problem, this approach differs in two main ways. First, traditional resource distribution is built around a static team configuration. At the onset of the response, the robotic team is known to consist of a fixed number of systems, the number of which may become lower due to failure, but rarely increases. An exception to this is cases in which a marsupial system is able to release a second robot to perform a task; however, the system being transported simply replaces the transporting system, so the number of "active" robots within the team does not change. In this approach, the team size is reconfigured dynamically, as the larger systems have the capability to expand the number of "active" systems dramatically.

Second, this approach is built around a different control structure. Many robotic teams have a simple hierarchy for the control structures that underlie the coordination of team members [72]. The typical hierarchical structure has many benefits, such as enabling easier communication routing protocols and task allocation structures. This can be depicted by thinking of the robots as nodes of a tree. Alternatively, a robotic team could be controlled through a fully connected graph where each robot can talk directly to each other robot. Although this makes direct communication possible, it also requires significant overhead to achieve. Fully distributed control is also possible, as reported in Reference [58], where no single robot has any authority over any others as they simply relay information to adjoining robots. The multihierarchical control used here can be thought of as something in between. The difference is that this system proposes the manipulation of a large-scale marsupial device that creates a unique multitiered hierarchical system designed to be scalable across multiple marsupial docking stations. The work proposed here makes use of a similar, but different, system described in detail in Reference [8]. A toy sketch of such a tree-structured team follows.
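As a rough illustration only (not the implementation of Reference [8]; every name below is hypothetical), the multitiered organization can be represented as a tree whose internal nodes are docking stations and whose leaves are deployable robots, so that each robot communicates only with its parent and children rather than over a fully connected graph:

import sys  # only used so the sketch runs standalone

# Hypothetical sketch of a multitiered marsupial team hierarchy.
# Docking stations are internal nodes; deployable robots are leaves.
# Commands flow down the tree and status flows back up, so no robot
# needs a direct link to every other robot.
class TeamNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def broadcast_down(self, command):
        # Relay a command to every descendant (e.g., "return to dock").
        for child in self.children:
            print(f"{child.name} received: {command}", file=sys.stdout)
            child.broadcast_down(command)

# A two-station team in which each docking station carries its own robots.
base = TeamNode("operator_console")
dock_a = TeamNode("docking_station_A", parent=base)
dock_b = TeamNode("docking_station_B", parent=base)
scouts = [TeamNode(f"scout_{i}", parent=dock_a) for i in range(3)]
base.broadcast_down("begin survey")

Adding or removing a subtree (a docking station together with all of its deployables) leaves the rest of the hierarchy untouched, which is what makes this organization scalable across multiple marsupial docking stations.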

10.2.2 Autonomous Docking and Deployment

The act of having one agent connect to another to form a larger, more complex, or more useful device has been around for centuries, from the interlocking of railroad cars to the docking of spacecraft in orbit. There are many reasons and approaches to docking and deployment. In the last decade, the act of docking autonomously has been explored by many. According to Reference [39], autonomous docking can be achieved more quickly and more accurately than through simple teleoperation. Ideally, all robotic systems would have the capability to perform fully autonomously. Sliding scales of autonomy are often beneficial in resource allocation tasks, as the end user may wish to simply assign a set of resources to perform a certain task and take control of individual assets when they have reached an objective. Autonomous docking allows for autonomous recharge, which allows for an increased operational lifetime. This in turn allows for increased sensor coverage, which can expedite the mission being undertaken by a robotic team. Further, there are tremendous advantages in terms of locomotion capabilities gained by transporting smaller systems on more capable ones.

10.2.2.1 Autonomous Recharge

Perhaps one of the most necessary behaviors of any robotic system that is expected to have sufficient longevity to carry out missions is the ability to either generate its own power (through the use of on-board reactors such as the SlugBot [29], or solar conversion as found on Mars rovers such as the Sojourner [35]) or to autonomously reach a recharging station and receive power that way. Walter [66] was probably the first to address the issues of autonomous recharge: a pair of robots would use light sensors to find their way to a charging station. Most modern approaches are derivatives of this method.

Hada and Yuta [17] describe a robotic system that leaves a recharging station, moves to a fixed point, and moves back. This process is repeated continuously for a week to verify the ability to recharge. Over the duration of this first experiment, the robot travels 3.391 km, and the method resulted in 1080 successful dockings and chargings. A second experiment was conducted where the robot patrolled the hallway outside of the lab.

The work in Reference [61] implements a recharging station and docking mechanism for a Pioneer 2 DX robot. The process begins with a color-tracking system that identifies the location of the charging station. As the robot servos towards the docking station, a laser beacon is used to determine the angle of the robot relative to the wall. The robot will attempt docking by calculating the time difference between the detection of a reflective tape near the docking station by each of two photovoltaic sensors; the robot then aligns itself with the tape and moves to the docking station (a minimal sketch of this timing-based alignment appears at the end of this subsection). Once the proper orientation is achieved, the robot rotates using odometry alone. The charging connection is mounted on the rear of the Pioneer 2 DX, thus the robot must drive in backwards without the benefits of any forward-facing sensors, and will then blindly back into place for recharging. Successful connections are detected with an IR system as well as by monitoring for a voltage spike when electrically connected. Over the duration of this experiment, the system was 97% successful in autonomous docking and recharge. Failures in docking can occur if the robot does not properly align itself.

It is worth noting that the capability to reach a home base and recharge has become commercially common in systems such as the Roomba autonomous vacuum cleaner from iRobot [49]. For the purposes of autonomous recharging in preengineered environments, the placements of docking and charging stations may be known a priori. In the work reported in References [17] and [61], the docking station (or marker) was always within the robot's initial field of view. Traversing larger environments where the docking stations are not immediately obvious to the robots may require additional searching or mapping of the environment prior to docking.
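The timing-based alignment described above can be caricatured in a few lines; the gain and tolerance below are illustrative assumptions, not values from Reference [61]:

# Hedged sketch of the reflective-tape alignment idea: two rear-facing
# photosensors detect the tape, and the difference between their
# detection times indicates how far the robot is from square-on.
# Gain and tolerance are invented for illustration.
def alignment_correction(t_left, t_right, gain=0.5, tolerance=0.02):
    """Turn-rate command from the two tape-detection times (seconds)."""
    dt = t_left - t_right
    if abs(dt) < tolerance:
        return 0.0        # aligned: safe to blindly back onto the charger
    return gain * dt      # proportional correction toward alignment

print(alignment_correction(0.31, 0.27))   # skewed: nonzero correction
print(alignment_correction(0.30, 0.29))   # within tolerance: reverse in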

10.2.2.2 Deployment Methods

In order to successfully deploy a team of robots or other sensors, there are many factors to consider, such as the power available to each of the smaller systems, the size of the area that must be covered, the communication capability of the team, and the nature of the mission at hand. These factors can be combined in different ways for different deployment strategies.

In Reference [42] a marsupial approach enables a modified powerwheels jeep to transport a smaller Inuktun microbot to investigate a target area. This system served as a testbed for a more robust system comprised of an RWI ATRV-Jr that carries the prePackbot Urban. These systems have been used to show that marsupial teams, when working cooperatively, can reach locations that independent systems cannot, and reach them faster [43].

The scout team [21] can be deployed ballistically by a larger robot, the ranger. This system has no means of asset recovery, thus the robots that are launched must either make their way back to a "safe" zone or be considered disposable; moreover, with ballistic launch there would be a great uncertainty in their initial starting points. Scouts have also been deployed via their jumping mechanism when used in conjunction with a Pioneer 2 robot [27]. Here, the starting points of the scouts are known relative to the Pioneer, allowing for a better estimation of environmental landmarks relative to the larger Pioneer, and deployment via this mechanism allows for recovery of the robots.

Marsupial robotics need not be restricted to ground vehicles. In Reference [11], Corke et al. describe a group of sensors that are carried and released from an unmanned helicopter. This method allows for rapid deployment of a sensor network, as the helicopter is not nearly as subject to the terrain obstacles that other robots may face. Unfortunately, deployment through this means provides little information about where the robot ultimately ends up, which limits the usefulness of the method for pinpointing features in the environment. The method could be improved by running some form of SLAM on the individual robots and attempting to merge their maps.

10.2.2.3 Deployment Models

Regardless of how the robots are physically deployed, the nature of the mission may require that they deploy themselves in a particular fashion. There are different types of coverage that can be accomplished when deploying robots into different situations. If the robotic team is to perform surveillance on a known area for an extended duration, the task becomes that of the "art gallery problem" [46]. Here, a computational geometry model is used to identify the best location for each of the robots, and each robot simply navigates to the required position. When the environment is not known a priori, there are examples where robots are deployed to explore the environment in order to maximize coverage [23, 22]. In Reference [15], these coverage methods are classified into "blanket coverage," "barrier coverage," and "sweep coverage."

Each coverage model has benefits and drawbacks in terms of costs, sensors, and processing and communication requirements, and there have been many methods of deploying robots to achieve these different forms of coverage. In Reference [26], dispersion occurs through random walk, with artificial potential fields to prevent robots from coming too close together. Dispersion based on animal navigation [37] moves individual robots away from the highest local density of other robots that are visible to onboard sensors. In Reference [47], an overhead camera is used to direct a "virtual pheromone" to cause the robots to disperse.

When mapping the boundaries of pollutants, there are many models for which robots can be deployed. In Reference [4], the robots or sensors are deployed in a static lattice formation for monitoring changes in environmental hazards such as oil slicks, air plumes, or radiation levels. Here, the goal is to maximize observation time by optimizing the power consumption used in communication. This is accomplished by limiting inter-robot communication and changing the "density" of the sensors by using a recursive dyadic partitioning in order to define a minimum set of sensors or robots necessary to map the boundary to a desired resolution. Deployment to map boundaries can also be accomplished by using deformable contour models [36]. This is a similar approach to using deformable contour models (or snakes) for image segmentation; in this case the control points are the mobile robots capable of sensing the environmental disturbance.

Other deployment models address identifying the source of a release of a biological or chemical agent into the atmosphere. Following gradients is done with different degrees of success in References [53, 25, 28, 34, 40]. When attempting to identify the source of a plume, robots may simply follow wind direction to locate the source.

Each of these deployment models can be enhanced through the utilization of a system such as the one proposed in Section 10.6. For instance, when attempting to identify the source of a plume, the mobile docking station could deploy additional robots or sensors to increase the aperture of sensing when it is caught in a local minimum or shallow gradient. For resources that have been dynamically allocated in response to environmental hazards, the mobile docking station can deploy additional units to act as control points, or retrieve those units if they are not necessary, to reduce the complexity of a control scheme such as the deformable contour models used in Reference [36]. Teams of the mobile docking stations can deploy sensors into a lattice such as the ones utilized in References [45, 67]. Finally, the docking station itself can act as an additional landmark for the coordination described in References [22, 23].

10.2.2.4 Docking Methods

There are many approaches to docking. In Reference [5], a reactive, potential fields-based system is developed which allows for the docking of one robot with another. In Reference [65], a mobile platform is docked in industrial environments using low-cost IR sensors; this system uses two passive retroreflectors to localize the robot relative to a known docking location. The IR sensors are ideal for robots that may not have the computational resources for processing vision onboard.

There are many vision-based approaches. In Reference [48], docking is achieved using an alignment and braking process: the optical flow is used to control the motion of the robot to maneuver the robot perpendicularly to a specified docking location. This work is extended in Reference [57], where the concepts of ego-docking, where the vision system is mounted onboard, and eco-docking, where the camera is mounted on an external docking platform, are introduced. In each of these behaviors, the motion is controlled without precalibrating the vision systems; the behavior is determined by what portion of the visual field is utilized and which control law is adopted. In Reference [56], this work is used with a stereo camera system that allows for centering and wall-following behaviors.

In Reference [50], a combination of sensing methodologies is used to maneuver a robotic platform in an industrial environment that is relatively known, with few obstacles. The initial guidance is based on odometry and landmarks fixed within the environment. This method works fairly reliably; however, as errors tend to accumulate from odometry, a combination of IR and ultrasonic sensors with a relatively known environment is used to correct these errors. These sensing readings are used to generate a trajectory that maneuvers the system to a docking station. The final docking procedure makes use of artificial landmarks and a CCD camera.

The work in Reference [5] is extended in Reference [39]. Here, the approach relies on a three-stage behavior for docking. In the first stage, the docking robot is only aware of the general location of the docking station and moves directly towards it. Once the docking robot can detect the docking station with onboard sensing, a second motion control scheme is used based on the original potential field method. When the docking robot is within suitable range, a final dead-reckoning approach is utilized. In order to complete the final stage of docking, a landmark is identified using color cameras based on the spherical coordinate transform [24]; the feature that is used to complete the docking must be identified during the deployment phase. Many experiments were conducted to verify that the autonomous docking was more reliable than teleoperation. In both instances, the docking station must remain stationary or maintain a topological reference to the landmark that remains invariant to any lighting changes that may occur during the duration of the deployment.

However, in responding to scenarios where the environment does not remain static, whether due to an aftershock of an earthquake or an adversarial agent manipulating the environment, it may be impossible to maintain a fix on any feature for docking. Thus, a more reliable method that is independent of the environment must be found.

The previous docking methods [5, 50, 57, 48, 39] are all methods of docking in marsupial or industrial environments where the goal of docking involves the robot arriving in a specified location. In polymorphic systems such as CONRO [10, 9], PolyBot [71, 70], or the lattice structures of References [51, 38], the purpose of the docking is not so much to transport one robot to a specific location, but rather to actively combine capabilities to accomplish something that otherwise would not be possible.

Achievement of docking is highly dependent upon the means by which two robots dock. When one robot docks entirely inside another, such as in Reference [39], there is a different set of needs than when two robots join together to form one larger robot. The appropriate choices of connection method, connector style, and docking schema are important to the development of a reconfigurable system that is capable of transitioning from a homogeneous system to a heterogeneous system. The time and energy required to accomplish these tasks will ultimately affect mission performance in terms of battery life, speed of deployment and recovery of assets, and complexity of controlling a system.

In the case of CONRO [59], special connectors have been designed to allow for fast and convenient docking. At each connection side, there is a set of IR transmitters and receivers. These IR transmitters and receivers are dual purpose: they are used to measure distance and align the modules, as well as serve as a communication link between attached modules. Each module has three male connection sides and one female connection side. The process of connecting modules varies depending upon the distance between the modules. At long range, the two modules must move close enough to sense the guidance signal from the other robot. At medium ranges, they must orient themselves using the guidance sensors, until they are properly aligned and can be connected. As with many other docking systems, the IR sensors become saturated as the modules become close; due to the inaccuracies in angle estimation, the errors are potentially too great for an accurate coupling. The work in Reference [59] is extended to design for self-assembly in space [60]. Here, the goal is to develop systems that can operate in three dimensions and have the ability to tether themselves to one another. The control process for such a system would be the same hormone-inspired system that is used to control CONRO [58].

PolyBot [71] is similar to CONRO in its approach to connecting modules, but does so with a series of genderless connection plates with four pins and four mating chamfered holes. In Reference [72], the process of docking is divided into two phases: the approach and the physical latching. Partial alignment of the system is accomplished through a vision system; otherwise, errors can propagate across the length of the chain. At this point, the pins should be correctly aligned, and the mechanical design of the latching mechanism facilitates the final stage of docking. The design of the system allows for a faster docking and ultimately faster reconfiguration.

In Reference [6], docking is achieved between two mobile robots by using a forklift on one robot and a receptacle set on another. The robots are then skid-steered such that, when they are partially aligned, the wheel slippage will force the connection together, both electrically and mechanically. These devices are designed with the shear forces in mind that such a junction may encounter when covering difficult terrain. In the case of Reference [44], there is a three-staged approach to the docking procedure.

10.2.3 Cooperative Robotics

To this point, the work discussed has mainly focused on the ability of robots to cooperate in terms of locomotion [71, 10], localization relative to one another [23, 14], exploration and coverage [22, 18], or exploration and simultaneous localization and mapping (SLAM) [41, 16].

In Reference [1], a team of four robots is utilized to cooperatively move a large object. Two of the robots in this task are designated observers, and two of them are involved in the cooperative manipulation. Two methods of controlling the manipulation of a large object are discussed. The first approach treats the whole system, including the object being transported, as a large kinematic chain. The second approach distributes the control to the two robots manipulating the object. There are savings in terms of communications and computational complexity in the second approach. However, in a system with reduced complexity (such as the one proposed in Section 10.6.2), the first approach may be more applicable. The work in Reference [1] also requires that two of the four team members be dedicated to observing the environment and the coordination of the two robots actually manipulating the environment, which is not necessarily a particularly efficient utilization of available resources.

In Reference [69], a manipulator is mounted on a mobile platform. The control algorithm in this work is built around two issues. The first is that manipulators and mobile platforms generally have different dynamics. Second, most mobile platforms are subject to nonholonomic movement, while manipulators have a much higher degree of freedom. This control algorithm attempts to address these issues while moving the robot to bring the manipulator into a preferred configuration. This allows for a simplification of manipulation: the manipulator is brought into a desired position by planning the motion of the mobile platform first, then performing the preferred manipulation, rather than developing a more complex manipulation scheme. This is an important aspect to understand when integrating the two aspects of the task that these robots will be used for. The movement of the mobile docking station as described in Section 10.4.1 can also be characterized in the sense of cooperative manipulation.

In Reference [63], a framework for controlling multiple mobile manipulators is presented. The control strategies for coordinating the team of robots are built on a set of primitives for each robot's sensory and locomotion capability. The combined locomotion is broken down into trajectory planning, robot control, and formation control tasks. Behaviors and tasks can be assigned and executed sequentially, concurrently, or conditionally to ensure that the robotic team is able to manipulate the large object to the desired position. The control algorithm for performing this task is decentralized and scalable to a larger number of robots. This is advantageous in that centralized control would be too complex when dealing with large numbers of robots, and dynamic reallocation of robot roles is necessitated by the environment. This system relies upon a human–robot interface that is built upon a hierarchical control system [2]. The interface allows the operator to intervene and prevent the systems from reaching an error state. This system has been used

successfully to transport flexible boards with a team of three mobile robots. The system is capable of transitioning formation to allow for changes in support and navigation of changing environments. However, these approaches must improve in scalability in order to support significantly larger robotic teams.

10.3 Challenges and Issues

The underlying problem in utilizing heterogeneous robotic teams is that of resource availability. The challenge becomes: "Where and how are resources best deployed for a specific mission?" One potential approach is to marsupially transport and deploy smaller robots to the scene to complete the mission objectives. The problem of deploying robotic systems has been studied in detail, whereas work in the area of robot recovery has been more limited.

Increased scalability and team size have many associated challenges. In order to ensure maximum longevity of the team, the robot must be capable of deploying, coordinating, recovering, and recharging the deployable systems. The robot that is deploying other systems must have the communication, processing, and power capabilities to facilitate the organization of the entire team. The framework to support this must also allow the deployable robots to be interchangeable, so that the other "motherships" or "docking stations" can be used in conjunction with one another. The larger "motherships" or "docking stations" provide a number of services to the deployed systems. In some cases they provide power through tethers [42], while in others they provide enhanced processing and communication capabilities [27, 21].

10.3.1 Docking

10.3.1.1 General Theory of Docking

Docking of two or more systems (agents, machines, robots, and so forth) can generally be thought of as a process in which individual systems combine as a means to benefit one another. The benefit is often in terms of energy conservation, time conservation, or resource availability. With this viewpoint, there are a number of systems that may not be traditionally thought of as docking, but actually serve as useful models for this work:

• Trains — Although the environment and perception of trains as a docking system may be restrictive, they are an interesting example of the benefits of docking. The ability of railroad cars to "dock" with one another is an example of a system where two or more unique systems share capability. For example, a locomotive provides the power to move the whole train, and some cars of a train can carry coal or other fuel for powering the train. Other cars carry passengers, who benefit from the speed of the train and collectively save energy over each individual passenger trying to traverse the distance alone.

• In-flight refueling — A more commonly accepted example of docking is that of in-flight refueling. The ability to refuel a plane while in flight enables extended mission durations or relocation of resources over longer distances. This is similar in many ways to the train example. However, it is limited, as the number of agents being simultaneously serviced is reduced in the case of a single refueling plane. Here, the docking process is much more complicated, as the alignment must occur in three dimensions.

• Aircraft carriers — In terms of scalability, the aircraft carrier at sea is one of the best models for this work. A carrier must transport, deploy, recover, and resupply not only the set of aircraft intended for its use, but, as the need arises, the carrier must be able to service planes and helicopters from other carriers as well. Carriers can deploy and recover fighters individually, but can service a number of them simultaneously. The carrier must also coordinate its movements with other ships as well as the aircraft it services.

• Robotic docking — Robotic docking can be classified into three models similar to the models discussed above. The basic case is that of simply docking with a known position for power purposes (similar in nature to in-flight refueling). Work has been done in this area since the 1960s, when systems attempted to align themselves for recharge using a combination of photovoltaic sensors and light beacons [66]. In Reference [17], photovoltaics are used in conjunction with reflective tape to align robots to a fixed docking station. In these cases, the work is done using highly calibrated systems. Such calibration may be possible in fixed industrial environments where all path planning with respect to landmarks can be done a priori [50]. Other cases of extremely calibrated environments include the use of low-cost infrared sensors [65]. A recharging station and docking mechanism for a Pioneer 2 DX robot has been implemented using a color-tracking system in a fixed environment [61]. Vision-based docking is achieved in Reference [48] and extended in Reference [57], where optical flow is used to control the maneuvering of the robot to a desired docking location. Enhancements through the use of stereo camera systems allow for improved centering and wall following for better alignment with the docking location [56]. Such approaches have met with limited success in commercial applications such as the Roomba [49] and the Sony AIBO [3], which have the ability to recognize a docking station and recharge as necessary. However, in general the sensing needs to be adaptive.

The term polymorphic systems can be used to describe the second type of robot docking, where modules of differing capabilities are interlinked to form a more useful structure.

Examples of these types of systems are the PolyBot [71, 72], the CONRO [10, 9, 59], and the lattice structures of References [51, 38]. Here, the robotic modules may interconnect in a nonlinear fashion, allowing for robots that can traverse a variety of terrains.

Finally, there are the marsupial systems. Marsupial systems are in many ways like the carrier example. Teams of robots such as the Scout and Pioneer team [27], the Georgia Tech Yellow Jackets [12], or the marsupial teams of Reference [42] use a larger robot to transport, coordinate, and control one or more other robots. One of the main advantages of this approach is that marsupial teams, when working cooperatively, can reach locations that independent systems cannot, and reach them faster [43].

10.3.1.2 Traditional Algorithms for Docking

To understand the contributions of this work, a brief understanding of traditional algorithms for docking is necessary. This section will provide a brief look at the traditional algorithms for docking and the various assumptions made in specific implementations. The limitations of these assumptions form one of the general problems that this work attempts to solve.

Traditionally, robotic docking starts with a set of control laws, generally formed a priori. These control laws may be a preprogrammed sequence of events in highly calibrated environments [50], slightly adaptive algorithms that attempt to identify landmarks in otherwise unknown (but structured) environments, or systems that follow artificial potential fields towards a recognized goal [5]. The robot then works through three general phases of motion, which cycle between identification of the docking station, coarse approach, and fine approach, until the robot can dock. This is illustrated in Figure 10.1. The individual implementations for each phase are often the result of tradeoffs in available sensing and computational capabilities of the teams. However, regardless of how the stages are accomplished, the majority of algorithms for docking utilize the three stages.

• Docking station identification — Identification of the location of the docking station is the first step in the docking process and is accomplished in several ways. In highly calibrated environments [50] this is known a priori. Some approaches use GPS locations, beacons [66, 17], or visual landmarks [24, 48, 61]. Other methods attempt to utilize knowledge gained from the point of deployment to match the environment from other perspectives when returning [39].

• Coarse approach phase — Once the docking station location is known, the second phase involves traversing a relatively long distance toward the docking area. Here, the initial control laws are used to guide the robot until it reaches a position that is relatively close to the goal. This phase generally consists of vision-based sensing [24, 57, 48, 61] or IR communication [71, 65].

• Fine approach phase — Once the robot is close to the docking point, final alignment must occur. In these instances, more precision is required, as the robot may be attempting mechanical or electrical interconnection. At close range, longer-range sensors may become saturated [72], and relationships between landmarks observed at deployment may become skewed from the close vantage point [39]. Successful docking results in either physical interconnection or the stowing of one robot onboard another, as is the case in Reference [39].

If a failure occurs in one phase, the robot may have to revert to a previous phase or restart the process entirely. Failures often occur as the result of improper alignment leading the robot away from the final goal position. A minimal sketch of this three-phase cycle follows Figure 10.1.

FIGURE 10.1
Traditional docking algorithm flow diagram: predefined control laws feed the identification of the docking station, followed by the coarse approach phase, the fine approach phase, and the dock itself.
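The cycle of Figure 10.1 can be written as a small state machine; the sensing and motion calls below are hypothetical stubs standing in for whatever beacon, landmark, or GPS routines a particular implementation uses:

import random

class StubRobot:
    # Stand-in sensing/motion layer; each call succeeds probabilistically,
    # mimicking the fail-and-revert behavior described above.
    def locate_station(self):  return random.random() < 0.9
    def approach(self):        return random.random() < 0.8
    def align_and_latch(self): return random.random() < 0.7

def docking_cycle(robot, max_steps=50):
    phase = "identify"
    for _ in range(max_steps):
        if phase == "identify":                      # find the dock
            phase = "coarse" if robot.locate_station() else "identify"
        elif phase == "coarse":                      # long traverse toward it
            phase = "fine" if robot.approach() else "identify"
        elif phase == "fine":                        # precise final alignment
            phase = "docked" if robot.align_and_latch() else "coarse"
        if phase == "docked":
            return True                              # interconnection made
    return False

print(docking_cycle(StubRobot()))

Note how a fine-approach failure drops the robot back only one phase, while losing the station entirely restarts the process, matching the reversion behavior described in the text.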

10.3.1.3 Traditional Assumptions and Limitations of Docking

To this point, there have been a number of assumptions, including:

• Static environments — Many approaches assume that the environment will not change and that preplanned movements and known landmarks will be effective for docking. This works in specially built environments, such as factories, where the landmarks are provided and not likely to change. However, the desired application of these systems is often in highly dynamic environments such as those found in urban search and rescue. It is likely that over time the lighting conditions will change, causing differences in visual interpretations of an area, so there is no guarantee that the fiducial calculated at deployment will remain the same by the time the mission is completed. It is also possible that the physical environment changes as the result of falling rubble or aftershocks of an earthquake; thus, in dynamic environments, the route traversed by the deployable system may become impassable.

• Fixed deployment and recovery positions — Many of the implementations assume that, whether or not the environment changes, the docking station will remain in a static location. This assumption allows easy identification of the docking area, provides upfront knowledge of where to return, and allows the use of physical mechanisms such as tethers. However, there are three main disadvantages to this approach. First, this reduces the speed of exploration, as all robots must be deployed from the same area. Second, leaving the docking station in a fixed position removes any aspects of self-preservation; given the cost and complexity of the docking station, it is important that it be able to maneuver if the position it is in becomes unstable. Finally, a fixed position may cause the deployed systems to expend more power to return, which in turn may reduce the operational lifetime as the deployable systems expend more energy to reach their intended destinations, negating the benefits of the assumption.

• Infinite power and perfect communication — Many approaches do not consider power expenditure or the costs of communication, especially in the case of tethering. Many environments will cause tethers to fail as the result of snagging; other times tethers can be restrictive in terms of the energy expended to pull the tether, or simply the finite tether length. Additionally, one should not assume that wireless communication will work perfectly. Thus, a docking station that is mobile and an algorithm for deploying robots that may act as communication-repeating nodes are necessary.

• Team size — Generally speaking, robotic teams that have been physically implemented have consisted of fewer than ten robots (with the exception of a few projects such as the Centibots project [30]). Marsupial teams have typically been one mother and one daughter [43], although more recently there have been up to five robots involved, as in the scouts and pioneers [27] or the AIBOs and ATRV-Jr [12]. Larger teams should expedite the exploration and increase the area covered; however, for cost and complexity of control reasons, the creation of large teams has been somewhat infeasible. An approach that distributes control across a team of docking stations should provide the scalability to enable large teams of inexpensive robots.

10.3.2 Hardware

One of the major challenges to an endeavor of this nature is the design and availability of a hardware platform that is physically capable of deploying, recovering, coordinating, and recharging a sufficiently large number of smaller robots. The design considerations for such a platform are discussed in greater detail in Reference [8].

10.4 Optimization of Behaviors

10.4.1 Cooperative Maneuvering for Energy Conservation

In order to conserve power, a means of intelligently maneuvering both the docking station and the deployable systems must be devised which conserves as much energy as possible. It would be ideal to deploy robots into several areas by having the docking station first drop them off and then move between the deployed robots as necessary to recharge them. However, the docking station may be constantly moving, which may expend power unnecessarily, or the docking station may be situated centrally, causing the deployed systems to expend more power to return. Both of these cases can result in an undesirable situation where robots do not have enough power to make it back to the docking station and are unable to be recovered.

This energy conservation can be formulated as identifying a minimum spanning tree of the estimated locations of all deployed systems such that the distances between the docking station and the robots are minimized; the docking station can then simply traverse this tree. Improving upon this would be to minimize the average distance between the estimated locations of the robots, which would be more beneficial to the deployed systems, which do not carry as much energy. However, given that the rates of power expenditure may vary, a third approach is necessary for finding a position that minimizes not only the distance (and effectively the energy required to return), but adds a penalty to solutions that put the docking station in an area where the remaining energy at time t is less than the required energy for the robot to return.

At each time t, each robot R_i has several parameters, including position [R_Xi(t), R_Yi(t)], velocity V_Ri, and an estimated battery life E_Ri. Each docking station D_i is similar in that it has a position [D_Xi(t), D_Yi(t)] and a velocity V_Di. A cost function C must be used to calculate the cost of the robot returning to a potential location of the docking station at time t + δt. This leads to the creation of the following objective function:

    C(R_i(t), D_i(t + δt)) = x e^{x − f}                              (10.1)

where

    x = d_i / (V_Ri E_Ri),   d_i = dist(R_i(t), D_i(t + δt)),   f = 1.

The distance function dist is simply the distance of the shortest known traversable path to the docking station.

The set of possible positions for the docking station D_i at time t + δt (depicted in Figure 10.2) can be represented as a circle with the origin located at the position of the dock D_i at time t with a radius of r_Di, which is equivalent to the docking station's velocity times the time increment, or V_Di δt. This gives the new position as:

    D_X(t + δt) = D_X(t) + r_D cos(θ_D)                               (10.2)
    D_Y(t + δt) = D_Y(t) + r_D sin(θ_D)                               (10.3)

The task then becomes finding the value of θ_D that minimizes:

    Σ_{i=1}^{n} C(R_i, [D_X(t + δt), D_Y(t + δt)])                    (10.4)

FIGURE 10.2
The set of possible locations at time t + δt.

It is important to consider that the best placement for the docking station may be to remain stationary or move only slightly. This requires the introduction of a velocity scalar α over the range [0−1], and for the optimization to be conducted over both α and θ, resulting in:

    Σ_{i=1}^{n} C(R_i, [D_X(t) + α V_D δt cos(θ_D), D_Y(t) + α V_D δt sin(θ_D)])    (10.5)

Figure 10.3 depicts many possibilities for the best placement of the docking station. There are still cases where this approach may be suboptimal or fail, and additional constraints may be needed for both algorithmic and practical purposes. For example, the deployed robots may be clustered or partitioned for collection by a single docking station, rather than allowing multiple docking stations to attempt pickup of the same systems. In these cases, in order to reduce the complexity, the exploration of the deployable systems may be reduced to a function based upon available sensing and communication ranges, which would reduce the spatial area that the docking stations must cover.

FIGURE 10.3
Computing the cost at different potential locations of the docking station at time t + δt: (a) initial configuration; (b) possible new location; (c) another possible location.

10.5 Simulation

A simulation setup has been developed to test the validity of the method proposed in Section 10.4.1, which involves 50 robots deployed at random in an open field. A single docking station capable of recharging 10 robots at a time attempts to optimally position itself among the deployable robots using Equation (10.5). Figure 10.4 depicts a sample deployment. The simulation requires that the deployable robots maintain a series of internal states based upon available power and estimated proximity to the docking station, as shown in Figures 10.5 and 10.6. While the deployable robots have sufficient power, they "explore" their surroundings by randomly moving at each time step (depicted by a grey dot). The docking station, depicted as an asterisk, continually repositions itself according to Equation (10.5) at each time step; a sketch of this repositioning computation follows.
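As a concrete (but purely illustrative) rendering of this computation, the sketch below evaluates Equation (10.5) over a grid of headings θ_D and velocity scalars α and returns the candidate position with minimum total cost. The grid resolution, the robot parameters, and the Euclidean stand-in for the traversable-path distance dist(·) are assumptions of the sketch, not part of the chapter's formulation:

import math

def cost(robot, dock_xy):
    # Equation (10.1): C = x * exp(x - f) with f = 1, where x is the
    # fraction of the robot's remaining battery life needed to reach
    # dock_xy. Euclidean distance stands in for the traversable path.
    d = math.dist((robot["x"], robot["y"]), dock_xy)
    x = d / (robot["v"] * robot["energy"])
    return x * math.exp(x - 1.0)

def best_dock_move(robots, dock, v_dock, dt, n_theta=72, n_alpha=11):
    # Minimize Equation (10.5) over heading theta and speed scalar alpha.
    best_cost, best_pos = float("inf"), dock
    for i in range(n_theta):
        theta = 2.0 * math.pi * i / n_theta
        for j in range(n_alpha):
            alpha = j / (n_alpha - 1)               # alpha in [0, 1]
            cand = (dock[0] + alpha * v_dock * dt * math.cos(theta),
                    dock[1] + alpha * v_dock * dt * math.sin(theta))
            total = sum(cost(r, cand) for r in robots)
            if total < best_cost:
                best_cost, best_pos = total, cand
    return best_pos

# Example: two low-energy robots to the east pull the station toward them.
robots = [{"x": 40.0, "y": 10.0, "v": 1.0, "energy": 60.0},
          {"x": 35.0, "y": -5.0, "v": 1.0, "energy": 45.0}]
print(best_dock_move(robots, dock=(0.0, 0.0), v_dock=2.0, dt=10.0))

Note that α = 0 leaves the station where it is, so "remain stationary" is always among the candidate placements, as the formulation requires.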

Exploration continues until the robot's power is sufficiently low and the robot begins to "seek home" (depicted as a dark dot). If the robot lacks sufficient power to return to the docking station, it enters an "abandoned" state (depicted as a grey x), where it attempts to minimize power consumption until the docking station is close enough to "seek home" again, or until it runs out of power completely and is "dead" (depicted as a black x).

As long as the relative spatial distribution of the robots is small, simulation shows that the cost function in Equation (10.5) holds. However, as the robots disperse outwards, a problem emerged where groups of robots would become "abandoned" in diametrically opposite positions, as shown in Figure 10.5. Given that all of the robots are of equal priority, the docking station would remain stuck in the center and all of the robots would die. This suboptimal performance of the cost function required that a prioritization mechanism be added. This came in the form of clustering using the ISODATA [64] clustering algorithm. Robots are clustered based on their estimated spatial positions and internal states. A priority cluster is then chosen, and the docking station will optimize its location in order to recover the members of that cluster. Robots are removed from the cluster as they successfully dock or die. When the cluster is empty, a new cluster is chosen. Figure 10.6 illustrates several time steps of the simulation utilizing the clustering. For visualization purposes, the surfaces of the objective function for each of the steps in Figure 10.6 are shown in Figure 10.7. A compact sketch of the per-robot state transitions described above follows.
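The following is a hedged sketch of these per-robot transitions; the reserve margin and the time-based energy bookkeeping are illustrative assumptions rather than the chapter's exact thresholds:

def next_state(state, energy, dist_to_dock, speed, margin=1.2):
    # Transitions among the internal states described above. cost_home
    # approximates the energy (in time steps) needed to reach the dock.
    cost_home = dist_to_dock / speed
    if energy <= 0.0:
        return "dead"                    # out of power entirely
    if state == "exploring" and energy < margin * cost_home:
        return "seek_home"               # power getting low: head back
    if state == "seek_home" and energy < cost_home:
        return "abandoned"               # cannot make it: idle and wait
    if state == "abandoned" and energy >= cost_home:
        return "seek_home"               # the dock has moved close enough
    return state                         # otherwise keep current behavior

Successful docking removes the robot from its priority cluster, matching the bookkeeping used by the clustering scheme above.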

FIGURE 10.4
Simulated initial deployment of mobile docking station and robots.

FIGURE 10.5
A sample situation in which the "global" solution results in extremely suboptimal performance.

FIGURE 10.6
A sample run of the simulation: (a) initial configuration; (b) time = 400; (c) time = 600; (d) time = 1000; (e) time = 1100; (f) time = 1200.

FIGURE 10.7
Objective surfaces for each time step shown in Figure 10.6: (a) initial configuration; (b) time = 400; (c) time = 600; (d) time = 1000; (e) time = 1100; (f) time = 1200.

10.5.1 Results

A series of 24 simulations with 50 robots were run, with the docking station starting in positions in the center of, to the right of, to the lower right of, and below the distribution of robots. Half of the runs were conducted with the robots initially in a unimodal random distribution; the other half were conducted with the robots distributed in a bimodal random distribution. Table 10.1 shows the results of these simulations. Figure 10.8 shows the box plots of the time spent in each state for the unimodal robot distributions, and Figure 10.9 shows the box plots of the time spent in each state for the bimodal runs.

TABLE 10.1
Results of Simulation Runs: mean time spent in each state (docked, exploring, seek home, abandoned, dead) for each combination of initial dock position (centered, right, lower, lower right) and initial robot distribution (unimodal, bimodal), with 3 runs per configuration and an average over all 24 runs.

FIGURE 10.8
Time spent in each state across all unimodal runs: (a) center unimodal.

FIGURE 10.8 (Continued)
(b) right unimodal; (c) lower right unimodal; (d) lower unimodal.

FIGURE 10.9
Time spent in each state across all bimodal runs: (a) center bimodal; (b) right bimodal; (c) lower right bimodal; (d) lower bimodal.

FIGURE 10.10
Power consumption of the docking station for recharging and movement: power used per time step to maneuver the docking station versus power used to recharge robots.

The power consumption of the docking station for movement and recharging from a single run is shown in Figure 10.10. The mean power available to the robots for a single run is shown in Figure 10.11.

[FIGURE 10.11: Mean power across robots over time — mean power available to the distributed robots.]

10.6 Future Work

10.6.1 Simulation Extensions

In order to increase effectiveness as a model for a potential multi-robot system, there are several extensions necessary for consideration. Among these extensions are additional cost functions for different mission priorities to develop a means of weighting priorities in order to address more complex goals. In terms of the simulation, there are several areas of extension that should be considered:

• Docking process and procedure — The current "instantaneous docking" method will not hold in a real-world system. The simulation requires an extension that causes more time to be expended while the robot is physically docking. Increasing the accuracy of this behavior will better model the power consumption of the system and thus make the recovery model more accurate. However, this time expenditure will be based on the actual physical means of docking and thus will be evaluated when the hardware design is finished. Additionally, there will be some work in allowing for simultaneous docking and deployment of multiple robots, as the present designs call for the inclusion of multiple docking bays.

• Robot deployment — The initial deployment of robots is random, which is a feasible simulation of deployment via airdrop or similar mechanism. However, more effort should be placed on intentional initial deployment by the docking station as well as a better method for redeployment. This may involve swapping tasks for a given robot, and it may require the docking station to pick up robots from one area to redirect the search in another.

• Robot motion — The random motion when "exploring" by the deployed robots is not particularly useful in terms of actual multi-robot systems. This process is outside of the scope of the "energy minimization" and "resource recovery" aspects of this work, but it is important nonetheless.

• Communication issues — The present simulation does not take into account transmission delays associated with distance and repeating communication. Thus, a more accurate communication model will be necessary to ensure that the docking station's estimation of robot position and state are accurate in a real-world scenario. If the docking station is unable to communicate with deployed systems, the docking station's movement may be driven by restoring communication rather than minimizing power consumption. Presently, very few small robots have the capability of forming self-organized repeating networks, so communication on miniature hardware platforms will have to evolve. As the cost and availability of single-chip solutions become more feasible, robots integrating these capabilities should appear.

• Environments — Presently the simulation is running in an "open field," so the addition of dynamic events in a cluttered environment will be the true test of the simulation.

• Multiple docking stations — In order to support larger teams of deployable robots, multiple docking stations will be advantageous. The development of extensions to the algorithm discussed in Section 10.4 is necessary to ensure the correct distribution of multiple docking stations.

10.6.2 Hardware Innovations

The work discussed in this chapter is the first step in a much larger process that will culminate in the development of a unique multi-robot system. The simulated results in Section 10.5 revolve around the development of the hardware platform capable of actually performing the tasks of transportation, deployment, recovery, and recharge. This new robotic platform will actually be built using existing platforms and result in a system similar to Hirose's "super mechano-colony" [19]. Here, the robots propelling the system will be the MegaScout [32], which is a ruggedized platform with the strength and versatility to maneuver large payloads. However, unlike the "super mechano-colony," the MegaScouts will cooperatively manipulate a mobile marsupial docking station capable of deploying a dozen smaller robots. The reuse of the MegaScouts as the locomotive capability of this system allows the development to focus only on the deployment, recovery, and recharge aspects, which should expedite development time and lower development costs. A preliminary concept drawing of the system is shown in Figure 10.12.

[FIGURE 10.12: Concept of the modular mobile docking station.]

The design of the docking station component is dependent on a number of parameters. It must be capable of performing the computation necessary to coordinate the deployed robots. This may involve proxy processing of sensing, which could result in a large amount of resources dedicated to computation. It must also be able to communicate with the deployed systems as well as communicating back to a base station independent of the robot team. Additionally, there must be sufficient energy reserves present in order to continuously operate itself as well as provide power to resupply other systems. These capabilities must be fit into a package that leaves enough volume for transporting robots and is still maneuverable by the cooperating MegaScouts. If necessary, the docking station portion will function as a stationary system and potentially allow deployment of the MegaScouts for other purposes. This allows a greater flexibility of usage.

10.6.3 Applications

There are a number of applications that can make use of a scalable multi-robot system as described in this work.

• Distributed detection and decontamination — Determining the location of hazardous materials requires that robotic team members collect samples of the environment and bring them back for further analysis. The method for recovery described here can optimize the time that samples are brought back and can be used to retask robots to sample, monitor, or decontaminate areas that have been designated as contaminated.

• Improved dispersion and recovery models — The underlying hardware system proposed and software simulation provide a unique testbed for dispersion and recovery algorithms. The ability to deploy multiple robots from a single docking station that can be moved to multiple locations allows for new dispersion methods. Conversely, recovery options are increased when the docking station is mobile. The one presented here is just one of many ways in which this system could be used to recover robots.

• Pervasive coverage — The resource sharing and optimization of resource distribution afforded by this model will allow more work in the area of pervasive coverage. A semipermanent sensor network can be created where the docking station will assist in deploying and optimizing the location of sensors to monitor an area for a given task. For example, the increased computational power of the docking station can do the work in planning where sensors are necessary using feedback from the deployed robots. As the environment changes, the docking station can reconfigure the team and recharge the robots that are monitoring the area.

10.7 Conclusions

This work has discussed the background and limitations of existing algorithms and methods for multi-robot systems. The approach presented attempts to provide a method for multi-robot systems to scale to large numbers in practical and computationally feasible ways. This method is built upon the design of a marsupial system that attempts to maximize system longevity by relocating the docking station in order to minimize energy expended for recovery. Preliminary simulations are discussed and initial findings are shown. Initial design considerations of the development of a physical system capable of performing the tasks discussed are also presented. Future work is necessary to extend these simulations to coordinating multiple mobile docking stations and more complex environments.

Acknowledgments

This material is based on work supported under a National Science Foundation Graduate Research Fellowship. This work has also been supported through Grants IIS-0219863, CNS-0224363, CNS-0324864, and CNS-0420836.

The authors also wish to acknowledge Casey Carlson for the concept drawings of the mobile docking station.

References

1. J. Adams, R. Bajcsy, J. Kosecka, V. Kumar, R. Mandelbaum, M. Mintz, R. Paul, C. Wang, Y. Yamamoto, and X. Yun. Cooperative material handling by human and robotic agents: Module development and system synthesis. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robot Systems, volume 1, pages 200–205, Pittsburgh, PA, 1995.
2. J. A. Adams and R. Paul. Human management of a hierarchical control system for multiple mobile agents. In Proceedings of the IEEE Conference on Decision and Control, pages 3524–3529, 1994.
3. R. C. Arkin and K. Ali. Integration of reactive and telerobotic control in multi-agent robotic systems. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior, pages 473–478, Brighton, England, Aug. 1994.
4. R. C. Arkin and R. R. Murphy. Autonomous navigation in a manufacturing environment. IEEE Transactions on Robotics and Automation, 6:445–454, 1990.
5. T. Balch, M. Berhault, F. Dellaert, M. Kaess, R. McGuire, E. Merrill, L. Moshkina, R. Ravichandran, and D. Walker. The Georgia Tech Yellow Jackets: A marsupial team for urban search and rescue. In AAAI Mobile Robot Competition Workshop, Edmonton, Alberta, 2002.
6. C. Bererton and P. Khosla. Toward a team of robots with repair capabilities: A visual docking system. In Seventh International Symposium on Experimental Robotics, pages 333–342, Dec. 2000.
7. C. Bererton, L. E. Navarro-Serment, R. Grabowski, C. J. J. Paredis, and P. K. Khosla. Millibots: Small distributed robots for surveillance and mapping. In Government Microcircuit Applications Conference, pages 44–49, Anaheim, CA, Mar. 2000.
8. A. Castano, W.-M. Shen, and P. Will. CONRO: Towards deployable robots with inter-robot metamorphic capabilities. Autonomous Robots, 8(3):309–324, 2000.
9. A. Castano, R. Chokkalingam, and P. Will. Autonomous and self-sufficient CONRO modules for reconfigurable robots. In Proceedings of the 5th International Symposium on Distributed Autonomous Robotic Systems, pages 155–164, 2000.
10. P. Corke, S. Hrabar, R. Peterson, D. Rus, S. Saripalli, and G. Sukhatme. Autonomous deployment and repair of a sensor network using an unmanned aerial vehicle. In IEEE International Conference on Robotics and Automation, pages 3602–3609, Apr. 2004.
11. A. Drenner, I. Burt, T. Dahlin, B. Kratochvil, C. McMillen, B. Nelson, N. Papanikolopoulos, P. E. Rybski, K. Stubbs, D. Waletzko, and K. B. Yesin. Mobility enhancements to the scout robot platform. In IEEE International Conference on Robotics and Automation, volume 1, pages 1069–1074, Washington, DC, May 2002.
12. A. Drenner and N. Papanikolopoulos. Modular mobile docking station design. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.
13. Sony global — AIBO global link. http://www.sony.net/Products/aibo/index.html.

14. D. Fox, W. Burgard, H. Kruppa, and S. Thrun. A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, 8(3):325–344, 2000.
15. D. Gage. Command control for many-robot systems. In AUVS-92, the Nineteenth Annual AUVS Technical Symposium, June 1992. Reprinted in Unmanned Systems Magazine, 10(4):28–34.
16. R. Grabowski, L. E. Navarro-Serment, C. J. J. Paredis, and P. K. Khosla. Heterogeneous teams of modular robots for mapping and exploration. Autonomous Robots, Special Issue on Heterogeneous Multi-Robot Systems, 8(3):293–308, 2000.
17. Y. Hada and S. Yuta. A first experiment of long term activity of autonomous mobile robot: Result of repetitive base-docking over a week. In Experimental Robotics VII, Lecture Notes in Control and Information Sciences, pages 235–244. Springer-Verlag, Berlin, 2000.
18. A. Hayes, A. Martinoli, and R. Goodman. Swarm robotic odor localization. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1073–1078, Maui, Hawaii, Oct. 2001.
19. S. Hirose, R. Damoto, and A. Kawakami. Study of super-mechano-colony (concept and basic experimental setup). In Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1664–1669, 2000.
20. S. Hirose. Super mechano-system: New perspective for versatile robotic system. In Proceedings of the ISER 2000 Seventh International Symposium on Experimental Robotics, pages 249–258, 2000.
21. D. F. Hougen, J. Bonney, J. Budenske, M. Dvorak, M. Gini, D. Krantz, F. Malver, B. Nelson, N. Papanikolopoulos, M. Powell, P. E. Rybski, S. Stoeter, R. Voyles, and K. B. Yesin. Reconfigurable robots for distributed robotics. In Government Microcircuit Applications Conference, pages 72–75, Anaheim, CA, 2000.
22. A. Howard, M. J. Matarić, and G. S. Sukhatme. An incremental deployment algorithm for mobile robot teams. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2849–2854, EPFL, Switzerland, Oct. 2002.
23. A. Howard, M. J. Matarić, and G. S. Sukhatme. Localization for mobile robot teams using maximum likelihood estimation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 434–439, EPFL, Switzerland, Oct. 2002.
24. H. Ishida, Y. Kagawa, T. Nakamoto, and T. Moriizumi. Odor-source localization in the clean room by an autonomous mobile sensing system. Sensors and Actuators B, 33:115–121, 1996.
25. H. Ishida, K. Suetsugu, T. Nakamoto, and T. Moriizumi. Study of autonomous mobile sensing system for localization of odor source using gas sensors and anemometric sensors. Sensors and Actuators A, 45:153–157, 1994.
26. E. Kadioglu and N. Papanikolopoulos. A method for transporting a team of miniature robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2297–2302, Las Vegas, NV, Oct. 2003.
27. R. Grabowski, L. E. Navarro-Serment, C. J. J. Paredis, and P. K. Khosla. Position estimation and cooperative navigation of micro-rovers using color segmentation. Autonomous Robots, 9:7–16, 2000.

28. S. Kazadi, R. Goodman, D. Tsikata, D. Green, and H. Lin. An autonomous water vapor plume tracking robot using passive resistive polymer sensors. Autonomous Robots, 9:175–188, 2000.
29. I. Kelly, O. Holland, and C. Melhuish. Slugbot: A robotic predator in the natural world. In Proceedings of the Fifth International Symposium on Artificial Life and Robotics for Human Welfare and Artificial Liferobotics, pages 470–475, Oita, Japan, Jan. 2000.
30. K. Konolige, C. Ortiz, R. Vincent, A. Agno, M. Eriksen, B. Limketkai, J. Ko, B. Stewart, D. Fox, L. Guibas, and L. Briesemeister. Centibots: Large scale robot teams. In A. Schultz, L. Parker, and F. Schneider, editors, Multi-Robot Systems: From Swarms to Intelligent Automata, pages 193–204. Kluwer Academic Publishers, Dordrecht, 2003.
31. K. Kotay and D. Rus. Self-reconfigurable molecule robots as 3d metamorphic robots. In Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems, Victoria, BC, Canada, 1998.
32. B. Kratochvil, I. Burt, A. Drenner, D. Goerke, B. Jackson, C. McMillen, C. Olson, N. Papanikolopoulos, A. Pfeifer, S. A. Stoeter, K. Stubbs, and D. Waletzko. Heterogeneous implementation of an adaptive robotic sensing team. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 4264–4269, Taipei, Taiwan, Sept. 2003.
33. K. Kurihara and Y. Matsuo. On analysis and control of collective behavior of a super-mechano colony in an object retrieval mission considering energy recharging and congestion. In Proceedings of the 41st SICE Annual Conference, 2002.
34. Y. Kuwana, I. Shimoyama, and H. Miura. Steering control of a mobile robot using insect antennae. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 530–535, 1995.
35. D. Marthaler and A. L. Bertozzi. Tracking environmental level sets with autonomous vehicles. In S. Butenko, R. Murphey, and P. M. Pardalos, editors, Recent Developments in Cooperative Control and Optimization. Kluwer Academic Publishers, Dordrecht, 2003.
36. M. J. Matarić. Designing and understanding adaptive group behavior. Adaptive Behavior, 4(1):51–80, 1995.
37. C. McGray and D. Rus. Using modular self-reconfiguring robots for locomotion. In Proceedings of the 7th International Symposium on Experimental Robotics, pages 259–269, Honolulu, HI, 2000.
38. B. Minten, R. Murphy, J. Hyams, and M. Micire. Low-order-complexity vision-based docking. IEEE Transactions on Robotics and Automation, 17(6):922–930, 2001.
39. T. Moriizumi and H. Ishida. Robotic systems to track chemical plumes. In Conference Optoelectronic and Microelectronic Materials and Devices, pages 537–540, Dec. 2002.
40. A. Mourikis and S. Roumeliotis. Analysis of positioning uncertainty in reconfigurable networks of heterogeneous mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 572–579, New Orleans, LA, Apr. 2004.
41. Mars microrover power subsystem. http://mars.jpl.nasa.gov/MPF/roverpwr/power.html.

42. R. R. Murphy. Marsupial and shape-shifting robots for urban search and rescue. IEEE Intelligent Systems, 15(2):14–19, 2000.
43. R. Murphy, M. Ausmus, M. Bugajska, T. Ellis, T. Johnson, N. Kelley, J. Kiefer, and L. Pollock. Marsupial-like mobile robot societies. In Proceedings of the Third Annual Conference on Autonomous Agents, pages 364–365, Seattle, WA, 1999. ACM Press.
44. M. Nilsson. Connectors for self-reconfiguring robots. IEEE Transactions on Mechatronics, 7(4):473–474, 2002.
45. R. Nowak, U. Mitra, and R. Willett. Estimating inhomogeneous fields using wireless sensor networks. IEEE Journal on Selected Areas in Communications, Vol. 22, No. 6, pp. 999–1006, June 2004.
46. J. O'Rourke. Art Gallery Theorems and Algorithms. Oxford University Press, New York, 1987.
47. H. Roth and K. Schilling. Navigation and docking maneuvers of mobile robots in industrial environments. In Proceedings of IECON, pages 2458–2462, Aachen, Germany, 1998.
48. D. Rus and M. Vona. Self-reconfiguring robots. IEEE Intelligent Systems, 13(4):2–4, July 1998.
49. R. A. Russell, D. Thiel, and A. Mackay-Sim. Sensing odour trails for mobile robot navigation. In IEEE International Conference on Robotics and Automation, pages 2672–2677, May 1994.
50. R. A. Russell, D. Thiel, R. Deveza, and A. Mackay-Sim. A robotic system to locate hazardous chemical leaks. In IEEE International Conference on Robotics and Automation, pages 556–561, 1995.
51. P. E. Rybski, A. Larson, S. Stoeter, and N. Papanikolopoulos. Dispersion behaviors for a team of multiple miniature robots. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 2003.
52. P. E. Rybski, A. Larson, M. LaPoint, and M. Gini. Communication strategies in multi-robot search and retrieval: Experiences with minDART. In DARS 2004, pages 301–310, Toulouse, France, June 2004.
53. P. E. Rybski, A. Larson, H. Veeraraghavan, M. LaPoint, and M. Gini. Evaluation of control strategies for multi-robot search and retrieval. In Proceedings of the International Conference on Intelligent Autonomous Systems, pages 281–288, Marina del Rey, CA, Mar. 2002.
54. B. Salemi, W.-M. Shen, and P. Will. Hormone-inspired adaptive communication and distributed control for CONRO self-reconfigurable robots. IEEE Transactions on Robotics and Automation, 18(5):700–712, Oct. 2002.
55. J. Santos-Victor and G. Sandini. Camera self orientation and docking maneuver using normal flow. In Proceedings of SPIE, volume 2488, pages 274–283, Orlando, FL, 1995.
56. J. Santos-Victor and G. Sandini. Visual based obstacle detection: A purposive approach using the normal flow. In Proceedings of the International Conference on Intelligent Autonomous Systems, Karlsruhe, Germany, 1995.
57. J. Santos-Victor and G. Sandini. Visual behaviors for docking. Computer Vision and Image Understanding, 67(3):223–238, Sept. 1997.
58. J. Schoolcraft, A. Grossmann, and E. Nowak? (entry partly illegible in source). Docking-related report; details as in the original list.
59. W.-M. Shen and P. Will. Docking in self-reconfigurable robots. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1049–1054, Maui, Hawaii, Oct. 2001. (See also: Roomba self-charging home base, http://www.irobot.com/.)

60. M. C. Silverman, D. Nies, B. Jung, and G. S. Sukhatme. Staying alive: A docking station for autonomous robot recharging. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1050–1055, Washington, DC, May 2002.
61. K. Stoy, W.-M. Shen, and P. Will. Implementing configuration dependent gaits in a self-reconfigurable robot. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, pages 3828–3833, Taipei, Taiwan, Sept. 2003.
62. T. Sugar, J. Desai, V. Kumar, and J. P. Ostrowski. Coordination of multiple mobile manipulators. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 3, pages 3022–3027, 2001.
63. C. Therrien. Decision Estimation and Classification. Wiley, New York, 1989.
64. F. Vaz, P. Ferreira, and C. Ribeiro. Docking of a mobile platform based on infrared sensors. In IEEE International Symposium on Industrial Electronics, pages 735–740, Guimarães, Portugal, July 1997.
65. W. Walter. The Living Brain. Norton, New York, 1963.
66. R. Willett, A. Martin, and R. Nowak. Backcasting: Adaptive sampling for sensor networks. In Proceedings of Information Processing in Sensor Networks, pages 124–133, 2004.
67. M. Yamakita, Y. Taniguchi, and Y. Shukuya. Analysis of formation control of cooperative transportation of mother ship by SMC. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, pages 951–956, Taipei, Taiwan, 2003.
68. Y. Yamamoto and X. Yun. Coordinating locomotion and manipulation of a mobile manipulator. In Proceedings of the 31st Conference on Decision and Control, pages 2643–2648, Tucson, AZ, Dec. 1992.
69. M. Yim, D. Duff, and K. Roufas. Polybot: A modular reconfigurable robot. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 514–520, 2000.
70. M. Yim, D. Duff, and K. Roufas. Connecting and disconnecting for chain self-reconfiguration with polybot. IEEE/ASME Transactions on Mechatronics, 7(4):442–451, 2002.
71. M. Yim, Y. Zhang, and D. Duff. Modular reconfigurable robots, an approach to urban search and rescue. In 1st International Workshop on Human-Friendly Welfare Robotics Systems, pages 69–76, Taejon, Korea, 2000.
72. W.-M. Shen, C. Eldershaw, P. Will, and B. Khoshnevis. Self-assembly in space via self-reconfigurable robots. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, pages 2516–2521, Taipei, Taiwan, Sept. 2003.
.......................................5..........5.........4 11............... 344 Mathematical Details ..................................................................................... rather than that of individual genes...............2 Review of Probabilistic Boolean Networks.................. Bittner...................................................................................................... signal processing..... 362 Acknowledgments ....... 364 Genomics concerns the study of large sets of genes with the goal of understanding collective function............................................................................................ 345 11................................... Such a study is important because cellular control and its failure in disease result from multivariate activity among cohorts of genes.......................... In this chapter..........3 Control in Probabilistic Boolean Networks: Problem Formulation .................7 Concluding Remarks .. 354 11............ 353 11................................................. and control are quite well suited for studying this kind of multivariate interaction.............5.............. Ashish Choudhary...... we will present an overview of the research that has been accomplished thus far in this interdisciplinary field and point out some of the open research challenges that remain......3 11...... Very recent research indicates that engineering approaches for prediction..................... 342 Dynamic Programming .....11 Modeling and Control in Cancer Genomics Aniruddha Datta......... 350 11..2 Real-World Example Based on Gene Expression Data......4 Solution Using Dynamic Programming ...................................................................................................................6. 364 References............5................................................. 339 ........ 354 11................................................................. 341 Intervention ... Dougherty CONTENTS 11.......................... 340 Genetic Regulatory Networks and Dynamical Systems................. and Edward R...................5 Introduction... 347 11........ 362 11....... 345 11......................... 358 11.........1 Simple Illustrative Example ........1 11.....2 11....................................1 Introduction ................8 Future Directions ........6.....................6 Examples................................ Michael L.......

11.1 Introduction

Cancer is caused by a breakdown in the cell cycle control system. This usually manifests as uncontrolled cell proliferation or reduced apoptosis, both of which can lead to tumorigenesis and cancer development. Proliferation genes or oncogenes, which turn on cell division, and tumor suppressor genes, which serve as brakes on cell division, play an important role in the initiation, progression, and final tumor development in the disease. The turning ON of oncogenes or the turning OFF of tumor suppressor genes does not usually occur in isolation. Indeed, these events are triggered by many other genes acting together in a collective fashion. Thus, it makes sense to study the holistic behavior of a set of genes in order to gain an understanding of the gene regulatory mechanisms that endow a cell with remarkable adaptivity under normal, disease-free conditions. Such an understanding can also be expected to provide useful pointers toward the regulatory mechanisms that fail in disease and perhaps also suggest appropriate intervention strategies for treatment.

Among the recent paradigms that have been proposed for modeling genetic regulatory networks are the so-called probabilistic Boolean networks (PBNs). Such rule-based networks provide a convenient tool for studying interactions between different genes while allowing for uncertainty in the knowledge of these relationships. This chapter will first introduce PBNs as a modeling tool and then consider the issue of control in probabilistic Boolean networks. We will consider the following control problem: given a PBN whose state transition probabilities depend on an external (control) variable, choose the sequence of control actions to minimize a given performance index over a finite number of steps. This is a standard finite horizon optimal control problem for Markov chains and can be solved using the classical technique of dynamic programming. The choice of the finite horizon performance index is motivated by cancer treatment applications where one would ideally like to intervene only over a finite time horizon, then suspend treatment and observe the effects over some additional time before deciding if further intervention is necessary. A real-world example, utilizing a melanoma cell line, is included to illustrate the key ideas.

Having established the connection between optimal control theory and a problem in cancer therapy, we will highlight several challenges that will have to be overcome before such methods can be used in actual clinical practice. We will also report on ongoing work and progress made in overcoming some of these challenges. It is our belief that techniques of this type, which are well proven and time tested in the engineering literature, will one day find application in actual cancer therapy.

The first few sections of the chapter are kept nontechnical in an effort to make the results accessible to a wide audience, including biologists and medical practitioners. Such readers can skip directly to Section 11.6 from Section 11.4.

The advent of DNA microarray technology has heralded a new era in the study of intergene relationships [1–5]. By using DNA microarrays, it is now possible to simultaneously monitor the expression status of thousands of genes. By viewing the expression status of the genes across different conditions, it may be possible to establish relationships between the genes that show variations in expression status at least a minimum number of times across the different conditions. For instance, if two genes behave such that both of them turn ON and OFF simultaneously, it is reasonable to infer that they may be coregulated. On the other hand, if one gene turns ON when another turns OFF and vice versa, the expression status of the two genes are inversely related and one may be an inhibitor for the other. In general, the expression status of one particular gene will not depend on just one other gene but on a multitude of genes. To establish such multivariate relationships between genes, it makes sense to quantify how our estimate for the expression status of a particular gene, called a target gene, can be improved in the presence of the knowledge of the expression status of some other genes, called predictor genes, relative to the best estimate in the absence of any knowledge of the transcriptional activity of the predictors. This can be mathematically formalized via the notion of the coefficient of determination (COD). For our purposes here, it is sufficient to note that the COD measures the degree to which the best estimate for the transcriptional activity¹ of a target gene can be improved using the knowledge of the transcriptional activity of some predictor genes. As mathematically defined, the COD is a number between zero and one, with a higher value indicating a tighter relationship. A rigorous treatment of the COD and its use in genomic signal processing, cancer classification, and so forth can be found in References [6–8].

Given a particular target gene of interest, it is possible that several sets of predictors may provide us with an equally good estimate of its transcriptional activity. This goodness can be measured in terms of the COD. Furthermore, for a particular target gene, it is possible to rank several sets of predictors in terms of their CODs. Such a ranking would provide us with a quantitative measure to determine the relative ability of each of these predictor sets to improve the estimate of the transcriptional activity of that particular target gene. Although the COD does not tell us anything about whether the transcriptional activity of the target genes are regulated by their predictors, it does indicate the existence of intergene relationships.

¹ The process of synthesizing m-RNA from DNA is called transcription.

11.2 Genetic Regulatory Networks and Dynamical Systems

Given a set of genes of interest, one would like to study their behavior in a collective fashion. This can be facilitated by observing the transcriptional activity profile or gene activity profile of this set of genes across different


conditions and using that knowledge to infer the existence of relationships between the different genes via the coefficient of determination. As already discussed, given any target gene, in general there will be several sets of predictor genes, each with different determinative power as indicated by the COD. Thus, while attempting to infer intergene relationships, it makes sense to not put all our faith in one particular predictor set; instead, for a particular target gene, a better approach would be to consider a number of predictor sets with high CODs, while discarding the predictors with low CODs. Thereafter, each retained predictor set could be inferred to be indicative of the transcriptional activity of the target gene with a chance (probability) proportional to its COD. Having inferred the intergene relationships as above, it is now possible to use this information to model the evolution of the gene activity profile over time. The only assumption that is required is that the transcriptional activity of a given target gene at a particular time point is determined by the transcriptional activity profile of its predictors at the previous time point. Because each target gene is associated with several predictors, it is not possible to say with complete certainty what the transcriptional activity status of that gene will be at the next time point. Instead, one can compute the chances that at the next time step the target gene will be transcriptionally active, based on the information about the gene activity profile at the previous time step. The time evolution of the gene activity profile now defines a dynamic system. The fact that the gene activity profile at a particular time point depends only on the gene activity profile at the immediately preceding time point makes the dynamic system a Markovian one. Systems of this type have been extensively studied in the dynamic systems literature, and many powerful techniques are available for analyzing their behavior [9]. The ideas articulated in this section have been mathematically formalized in References [10,11] by introducing the so-called probabilistic Boolean networks (PBNs). The Markovian property of such networks has been established and results from Markovian dynamic system theory have been used successfully to study different aspects of their evolutionary behavior [12,13]. Here it is appropriate to mention that the PBNs are a generalization of the Boolean networks introduced earlier in the biological modeling literature by Kauffman [14–16].
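To make this rule-based, probabilistic evolution concrete, here is a minimal simulation sketch for a hypothetical two-gene network of this type. The predictor functions and their selection probabilities are invented for illustration only; in practice they would be derived from COD-ranked predictor sets:

```python
import random

# Hypothetical two-gene network: each gene has a small family of Boolean
# predictor functions, and at every time step one function per gene is
# selected at random with its assigned probability (cf. the COD-based
# weighting described above).
rules = {
    0: [(lambda x: x[1], 0.7),             # gene 0 copies gene 1, or ...
        (lambda x: x[0] and x[1], 0.3)],   # ... the AND of both genes
    1: [(lambda x: int(not x[0]), 0.6),    # gene 1 is inhibited by gene 0, or ...
        (lambda x: x[1], 0.4)],            # ... holds its previous value
}

def step(x):
    """One transition of the gene activity profile x = (x0, x1)."""
    profile = []
    for gene in sorted(rules):
        functions, probs = zip(*rules[gene])
        f = random.choices(functions, weights=probs)[0]  # pick one predictor
        profile.append(int(f(x)))
    return tuple(profile)

x = (1, 0)
for k in range(5):
    print(k, x)
    x = step(x)  # the resulting sequence of profiles is a Markov chain
```

Because the next profile depends only on the current one, repeated calls to this step function trace out exactly the kind of Markovian dynamic system described in the text.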

11.3 Intervention

The PBNs mentioned in the last section are “descriptive” in nature in the sense that they can be used to describe the evolution of the gene activity profile, starting from any initial profile. For treatment or intervention purposes, we are interested in working with “prescriptive” PBNs where the chances of transitioning from one gene activity profile to another depend on certain auxiliary variables, whose values can be chosen to make the gene activity profile evolve in some desirable fashion.


The use of such auxiliary variables makes sense from a biological perspective. For instance, in the case of diseases like cancer, auxiliary treatment inputs such as radiation, chemotherapy, and so forth may be employed to move the gene activity profile away from one that is associated with uncontrolled cell proliferation or markedly reduced apoptosis. The auxiliary variables could also include genes that serve as external master-regulators for all the genes in the network. The values of the individual auxiliary variables, which we also refer to as control inputs, can be changed from one time step to the other in an effort to make the network behave in a desirable fashion. The evolution of the gene activity profile of the PBN with control depends not only on the initial profile but also on the values of the control inputs at different time steps. Furthermore, intuitively it appears that it may be possible to make the gene activity profiles of the network evolve in a desirable fashion by appropriately choosing the control input at each time step. We first provide a nontechnical discussion of the underlying principles. The PBN with control is a special case of what is referred to in the engineering control literature as a controlled Markov chain [17]. Dynamical systems of this type occur in many real-life applications, the most notable example being the control of queues. Given such a controlled Markov chain, the objective is to come up with a sequence of control inputs, usually referred to as a control strategy, such that an appropriate cost function is minimized over the entire class of allowable control strategies. To arrive at a meaningful solution, the cost function must capture the costs and the benefits of using any control. The actual design of a "good" cost function is application dependent and is likely to require considerable expert knowledge. We next outline a procedure that we believe would enable us to arrive at a reasonable cost function for determining the course of therapeutic intervention using PBNs. In the case of diseases like cancer, treatment is typically applied over a finite time horizon. For instance, in the case of radiation treatment, the patient may be treated with radiation over a fixed interval of time following which the treatment is suspended for some time as the effects are evaluated. After that, the treatment may be applied again, but the important point to note is that the treatment window at each stage is usually finite. Thus, we will be interested in a finite horizon problem where the control is applied only over a finite number of steps. Suppose that the number of steps over which the control input is to be applied has been determined a priori to be M and we are interested in controlling the behavior of the PBN over the interval k = 0 through k = M − 1. Suppose at time step k, the gene activity profile of the PBN is given by z(k) and the corresponding control input is v(k). Then we can define a cost C_k(z(k), v(k)) as being the cost of applying the control input v(k) when the gene activity profile is z(k). For a given trajectory, the cost of control over the entire treatment horizon is simply the summation of these one-step costs. Recall that starting from a given initial gene activity profile, the evolution of the gene activity profile may follow several different trajectories. Thus, it makes sense to consider the cost of control averaged over all the possible trajectories for evolution.


The averaged cost of control does give us one component of the finite horizon cost. We now proceed to introduce the second component. The net result of the control actions v(0), v(1), ..., v(M−1) is that the gene activity profile of the PBN will evolve in time and finally end up in some gene activity profile z(M). Depending on the particular PBN and the control inputs used at each step, it is possible that some of the gene activity profiles may never occur. However, because the control strategy itself has not yet been determined, it would be difficult, if not impossible, to identify and exclude such profiles from further consideration. Accordingly, we assume that all the possible terminal gene activity profiles are reachable and assign a penalty or terminal cost C_M(z(M)) associated with each one of them. We next consider penalty assignment. First, consider the PBN with all controls set to zero, that is, all the therapeutic interventions have been deactivated. Then divide the possible gene activity profiles into different categories depending on how desirable or undesirable they are and assign higher terminal costs to the undesirable gene activity profiles. For instance, a gene activity profile associated with rapid cell proliferation leading to cancer should be associated with a high terminal penalty, whereas a gene activity profile associated with normal behavior should be assigned a low terminal penalty. For the purposes of this chapter, we will assume that the assignment of terminal penalties has been carried out and we have at our disposal a terminal penalty C_M(z(M)) which is a function of the terminal gene activity profile. Now, starting from a given initial gene activity profile, there is a certain chance of ending up with a particular gene activity profile in M steps. Furthermore, this particular terminal gene activity profile could be attained following different trajectories, each with its own chances of being followed. Thus, it makes sense to average out the terminal penalty to arrive at the second component of our cost function. The finite horizon cost to be minimized is given by the sum of the averaged cost of control and the averaged terminal penalty. Assuming that the control input v(k) is a function of the current gene activity profile z(k), we now use a mathematical technique called dynamic programming to arrive at an optimal sequence of control inputs that minimizes the finite horizon cost. In the next section, we provide a heuristic discussion of this important procedure.
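For reference, the finite horizon cost just described can be written compactly in the notation introduced above (a sketch; the formal development appears in Section 11.5):

$$
J = E\left[\,\sum_{k=0}^{M-1} C_k\big(z(k), v(k)\big) + C_M\big(z(M)\big)\right],
$$

where the expectation averages over all trajectories that the gene activity profile can follow from the given initial profile.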

11.4 Dynamic Programming

Dynamic programming, pioneered by R. Bellman in the 1950s [18], has been applied extensively in engineering applications. The control of queues in a computer server, the optimal scheduling of elevators in a building, or the optimal routing of telephone calls through a communication network are but a few examples of real-world applications where dynamic programming has played an important role.

[FIGURE 11.1: Optimal fare selection problem — a directed graph of travel legs from point A to point J, with the fare for each leg marked alongside the corresponding arrow.]

To get a more concrete feel for what dynamic programming can do, consider the minimum fare selection problem in Figure 11.1. Here the number alongside an arrow indicates the fare involved in travelling from the vertex at the tail end of the arrow to the one located at the arrow head. For instance, the fare from point A to point C is 5 units. Clearly, there are many different paths that can be used to travel from point A to point J. The problem of interest, however, is to determine the optimal path in traveling from point A to point J, that is, the path for which the fare required is the minimum. This certainly represents a familiar real-world scenario. It can be verified that, in this example, the optimal path is given by A-C-F-H-J. Now suppose that the cost of travel between the different points is uncertain in the sense that at different times, it is represented by different sets of numbers. In this case, it would make sense to minimize the average fares. Roughly speaking, the technique of dynamic programming enables us to systematically determine the path that minimizes the average fare without having to go through unnecessary trial and error. From the discussion presented here, it is intuitively clear that the dynamic programming technique can be used to solve the optimal intervention problem posed in the last section. The technical developments are presented next. To make the presentation self-contained, some of the ideas discussed intuitively so far will be revisited, although at a much higher level of mathematical rigor.
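The backward-recursion idea behind dynamic programming can be illustrated on a small stage graph like the one in Figure 11.1. Only the A-to-C fare of 5 and the optimal route A-C-F-H-J are taken from the text; the remaining fares below are hypothetical placeholders, chosen merely so that the recursion reproduces that route:

```python
# Backward dynamic-programming pass on a stage graph like Figure 11.1.
fares = {
    'A': {'B': 6, 'C': 5},   # A->C fare of 5 is from the text
    'B': {'D': 6, 'E': 5},
    'C': {'E': 7, 'F': 3},
    'D': {'G': 4},
    'E': {'G': 5, 'H': 4},
    'F': {'H': 2, 'I': 3},
    'G': {'J': 5},
    'H': {'J': 4},
    'I': {'J': 6},
    'J': {},
}

def cheapest_route(graph, start='A', goal='J'):
    """Minimum total fare and route, computed backward from the goal."""
    best = {goal: (0, [goal])}           # cost-to-go and route per node
    for node in reversed(list(graph)):   # reverse topological order here
        if node == goal:
            continue
        best[node] = min(
            (fare + best[nxt][0], [node] + best[nxt][1])
            for nxt, fare in graph[node].items()
        )
    return best[start]

print(cheapest_route(fares))  # -> (14, ['A', 'C', 'F', 'H', 'J'])
```

The key point is that each node's cost-to-go is computed once and reused, so no complete path is ever enumerated twice; with random fares, the same recursion applies with expected costs in place of fixed ones.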

11.5 Mathematical Details
11.5.1 Introduction

Probabilistic Boolean networks (PBNs) have been proposed recently as a paradigm for studying gene regulatory networks [10]. These networks, which allow the incorporation of uncertainty into the intergene relationships, are


essentially probabilistic generalizations of the standard Boolean networks introduced by Kauffman [14–16]. Given a PBN, the transition from one state to the next takes place in accordance with certain transition probabilities. Indeed, as shown in Reference [10], and as will be briefly reviewed in the next subsection, the states of a PBN form a homogeneous Markov chain with finite state space. Thus the PBNs form a subclass of the general class of Markovian genetic regulatory networks. The PBNs considered thus far in the literature can be described by Markov chains with fixed transition probabilities. Consequently, for such a network, given an initial state, the subsequent states evolve according to a priori determined probabilities. This setup provides a model for dynamically tracking the gene activity profile while allowing for uncertainty in the relationship between the different genes. However, it does not provide any effective knobs that could be used to externally guide the time evolution of the PBN, hopefully toward more desirable states. Intervention has been considered in the context of PBNs from other perspectives. By exploiting concepts from Markov chain theory, it has been shown how at a given state, one could toggle the expression status of a particular gene from ON to OFF or vice versa to facilitate transition to some other desirable state or set of states [12]. Specifically, using the concept of the mean first passage time, it has been shown how the particular gene, whose transcription status is to be momentarily altered to initiate the state transition, can be chosen to “minimize” in a probabilistic sense the time required to achieve the desired state transitions. These results come under the category of “transient” intervention, which essentially amounts to letting the original network evolve after reinitializing the state to a different value. A second approach has aimed at changing the steady-state (long-run) behavior of the network by minimally altering its rule-based structure [13]. This too constitutes transient intervention, but is more permanent in that it involves structural intervention. In this section, we consider PBNs where the transition probabilities between the various states can be altered by the choice of some auxiliary variables. These variables, which we will refer to as control inputs, can then be chosen to increase the likelihood that the network will transition from an undesirable state to a desirable one. Such a situation is likely to arise in the treatment of diseases such as cancer where the auxiliary variables could represent the current status of therapeutic interventions such as radiation, chemotherapy, and so forth. To be consistent with the binary nature of the state space associated with PBNs, these auxiliary control inputs will be allowed to be in one of two states: an ON state, indicating that a particular intervention is being actively applied at that point in time, and an OFF state, indicating that the application of that particular intervention has ceased. The control objective here would be to “optimally” apply one or more treatments so that an appropriate cost function is minimized over a finite number of steps, which we will refer to as the treatment horizon. The choice of the cost function, as well as the length of the treatment window, are two important aspects where the expert knowledge from biologists/clinicians could play a crucial role.


Once the cost function and the treatment window have been selected, the control problem is essentially reduced to that of controlling a Markov chain over a finite horizon. Control problems of this type have been studied extensively in the controls literature for over four decades. Among the different solution methods available, the most popular one is the technique of dynamic programming, pioneered by Bellman in the 1950s [17,18]. In this section, we will formulate the optimal control problem for a PBN and arrive at a solution based on the dynamic programming approach. This section is organized as follows. In Subsection 11.5.2, we provide a brief review of PBNs as introduced in Reference [10]. In Subsection 11.5.3, we formulate the control problem for PBNs. The solution to this problem using the dynamic programming technique is presented in Subsection 11.5.4.

11.5.2 Review of Probabilistic Boolean Networks

In this subsection, we provide a brief review of PBNs. We will only focus on those aspects that are critical to the development in this section. For a detailed and complete exposition, the reader is referred to References [10–12]. A probabilistic Boolean network is a formalism that has been developed for modeling the behavior of gene regulatory networks. In such a network, each gene can take on one of two binary values, zero or one. A zero value for a gene corresponds to the case when that particular gene is not expressed and a one value indicates that the corresponding gene has been turned ON. The functional dependency of a given gene value on all the genes in the network is given in terms of a single Boolean function or a family of Boolean functions. The case of a single Boolean function for each gene arises when the functional relationships between the different genes in the network are exactly known. Such a situation is not very likely to occur in practice. Nevertheless, networks of this type, referred to as standard Boolean networks [16], have been studied extensively in the literature. To account for uncertainty in our knowledge of the functional dependencies between the different genes, one could postulate that the expression level of a particular gene in the network is described by a family of Boolean functions with finite cardinality. Furthermore, each member of this family is assumed to describe the functional relationship with a certain probability. This leads to a PBN, as introduced in Reference [10]. Our discussion so far has only concentrated on the static relationships between the different genes in the network. To introduce dynamics, we assume that in each time step, the value of each gene is updated using the Boolean functions evaluated at the previous time step. For PBNs, the expression level of each gene will be updated in accordance with the probabilities corresponding to the different Boolean functions associated with that particular gene. To concretize matters, let us assume that we are attempting to model the relationship between n genes. Suppose that the activity level of gene i at time step k is denoted by x_i(k). Thus, x_i(k) = 0 would indicate that at the kth time step, the ith gene is not expressed, whereas x_i(k) = 1 would indicate


that the corresponding gene is expressed. The overall expression levels of all the genes in the network at time step k are given by the row vector x(k) = [x_1(k), x_2(k), ..., x_n(k)]. This vector is sometimes referred to as the gene activity profile (GAP) of the network at time k. Now suppose that for each gene i, there are l(i) possible Boolean functions:
$$f_1^{(i)}, f_2^{(i)}, f_3^{(i)}, \cdots, f_{l(i)}^{(i)}$$

that can be used to describe the dependency of x_i on x_1, x_2, ..., x_n. Furthermore, suppose that f_j^(i) is selected with a probability c_j^(i) so that:

$$\sum_{j=1}^{l(i)} c_j^{(i)} = 1.$$

Then the expression level of the ith gene transitions according to the equation:

$$x_i(k+1) = f_j^{(i)}(x(k)) \quad \text{with probability } c_j^{(i)}. \tag{11.1}$$

Let us consider the evolution of the entire state vector x(k). Corresponding to a PBN with n genes, there are at most N = ∏_{i=1}^{n} l(i) distinct Boolean networks, each of which could capture the intergene functional relationships with a certain probability. Let P_1, P_2, ..., P_N be the probabilities associated with the selection of each of these networks. Suppose the kth network is obtained by selecting the functional relationship f_{i_k}^{(i)} for gene i, i = 1, 2, ..., n, 1 ≤ i_k ≤ l(i). Then, if the choice of the functional relationship for each gene is assumed to be independent of that for other genes, we have:

$$P_k = \prod_{i=1}^{n} c_{i_k}^{(i)}. \tag{11.2}$$
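As a small illustration of Equation (11.2), the sketch below enumerates the constituent Boolean networks of a hypothetical two-gene PBN (two candidate functions per gene, with invented selection probabilities) and computes the network probabilities P_k:

```python
from itertools import product

# Hypothetical per-gene selection probabilities c_j^(i) (made up here):
c = [
    [0.7, 0.3],   # c_j^(1) for gene 1
    [0.6, 0.4],   # c_j^(2) for gene 2
]

# One constituent Boolean network per combination of choices: N = 2*2 = 4.
P = {}
for choice in product(*[range(len(ci)) for ci in c]):
    p = 1.0
    for gene, j in enumerate(choice):
        p *= c[gene][j]      # independent choices multiply, as in (11.2)
    P[choice] = p

print(P)                  # P[(0,0)] = 0.42, P[(0,1)] = 0.28, ... (up to rounding)
print(sum(P.values()))    # 1.0: the P_k form a probability distribution
```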

As discussed in Reference [10], even when there are dependencies between the choice of the functional relationships for different genes, one can calculate the P_i's by using conditional probabilities instead of the unconditional ones c_j^(i). The evolution of the states of the PBN can be described by a finite Markov chain model. To do so, we first focus on standard Boolean networks. Then the state vector x(k) at any time step k is essentially an n-digit binary number whose decimal equivalent is given by:

$$y(k) = \sum_{j=1}^{n} 2^{n-j} x_j(k). \tag{11.3}$$

As x(k) ranges from 00···0 to 11···1, y(k) takes on all values from 0 to 2^n − 1. Now to be completely consistent with the development in Reference [10], define:

$$z(k) = 1 + y(k). \tag{11.4}$$
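A direct transcription of Equations (11.3) and (11.4) into code (the helper name is ours):

```python
def state_index(x):
    """Decimal state z = 1 + y for a binary gene activity profile
    x = [x1, ..., xn], per Equations (11.3) and (11.4)."""
    n = len(x)
    y = sum(2 ** (n - j) * xj for j, xj in enumerate(x, start=1))
    return 1 + y

assert state_index([0, 0, 0]) == 1   # profile 000 -> z = 1
assert state_index([1, 1, 1]) == 8   # profile 111 -> z = 2**3
```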

Then as x(k) ranges from 00···0 to 11···1, z(k) will take on all values from 1 to 2^n. Clearly, the map from x(k) to z(k) is one-to-one, onto and hence


invertible. Thus, instead of the binary representation x(k) for the state vector, one could equivalently work with the decimal representation z(k). Furthermore, each z(k) could be uniquely represented by a basis vector w(k) ∈ R^{2^n} where w(k) = e_{z(k)}; for example, if z(k) = 1, then w(k) = [1, 0, ..., 0]. Then, as discussed in Reference [10], the evolution of the vector w(k) proceeds according to the following difference equation:

$$w(k+1) = w(k)A \tag{11.5}$$

where A is a 2^n × 2^n matrix having only one nonzero entry in each row.² Equation (11.5) is reminiscent of the state transition equation in Markov chain theory. The only difference here is that for a given initial state, the transition is completely deterministic. However, Equation (11.5) can also be interpreted easily within a stochastic framework. For instance, the vector w(k) does represent the probability distribution over the entire state space at time step k. Indeed, because of the deterministic nature of the evolution, at each time step k, the entire probability mass is concentrated on only one out of the 2^n possible states, thereby accounting for the 2^n-dimensional vectors w(k) with only one nonzero entry of one corresponding to the location where the probability mass is concentrated. The matrix A also qualifies as a bona fide stochastic matrix with the sole nonzero entry in each row being equal to one. Thus, given an initial state, the transition to the next state is deterministic and takes place with probability one.

² Row a has a 1 in column b if, given w(k) = a, then w(k+1) = b.

The stochastic interpretation of Equation (11.5) given above allows us to readily extend it to accommodate state transitions in PBNs. Toward this end, let a and b be any two basis vectors in R^{2^n}. Then, using the total probability theorem, it follows that the transition probability Pr{w(k+1) = a | w(k) = b} is given by:

$$\Pr\{w(k+1) = a \mid w(k) = b\} = \sum_{s=1}^{N} \Pr\{w(k+1) = a \mid w(k) = b,\ \text{network } s \text{ is selected}\}\, P_s = \sum_{s \in S} P_s \tag{11.6}$$

where

$$S = \{\, s : \Pr\{w(k+1) = a \mid w(k) = b,\ \text{network } s \text{ is selected}\} = 1 \,\}.$$

By letting the vectors a and b range over all possible basis vectors in R^{2^n}, we can determine the 2^n × 2^n entries of the transition probability matrix A. Now let w(k) denote the probability distribution vector at time k, that is, w_i(k) = Pr{z(k) = i}. It is straightforward to show that w(k) evolves

according to the equation:

$$w(k+1) = w(k)A \tag{11.7}$$

where the entries of the A matrix have been determined using Equation (11.6). This completes our discussion of PBNs. For a more rigorous derivation of Equation (11.7), the reader is referred to Reference [10].

11.5.3 Control in Probabilistic Boolean Networks: Problem Formulation

Probabilistic Boolean networks can be used for studying the dynamic behavior of gene regulatory networks. However, once a probability distribution vector has been specified for the initial state, the subsequent probability distribution vectors evolve according to Equation (11.7) and there is no mechanism for "controlling" this evolution. Thus, the PBNs discussed thus far in this section are "descriptive" in nature in the sense that they can be used to describe the evolution of the probability distribution vector, starting from any initial distribution. For treatment or intervention purposes, we are interested in working with "prescriptive" PBNs where the transition probabilities of the associated Markov chain depend on certain auxiliary variables, whose values can be chosen to make the probability distribution vector evolve in some desirable fashion. The use of such auxiliary variables makes sense from a biological perspective. For instance, in the case of diseases like cancer, auxiliary treatment inputs such as radiation, chemotherapy, and so forth may be employed to move the state probability distribution vector away from one which is associated with uncontrolled cell proliferation or markedly reduced apoptosis. The auxiliary variables could also include genes that serve as external master-regulators for all the genes in the network. To be consistent with the binary nature of the expression status of individual genes in the PBN, we will assume that the auxiliary variables (control inputs) can take on only the binary values zero or one. The values of the individual control inputs can be changed from one time step to the other in an effort to make the network behave in a desirable fashion. Suppose that a PBN with n genes has m control inputs u_1, u_2, ..., u_m. Then at any given time step k, the row vector u(k) = [u_1(k), u_2(k), ..., u_m(k)] describes the complete status of all the control inputs. Clearly, u(k) can take on all binary values from [0, 0, ..., 0] to [1, 1, ..., 1]. As in the case of the state vector, one can equivalently represent the control input status using the decimal number:
m

v(k) = 1 +
i=1

2m−i ui (k).

(11.8)

Clearly, as u(k) takes on binary values from [0, 0 · · · , 0] to [1, 1, · · · , 1], the variable v(k) ranges from 1 to 2m . We can equivalently use v(k) as an indicator of the complete control input status of the PBN at time step k.
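As a quick illustration of Equation (11.8), the binary control vector u(k) can be packed into, and recovered from, the scalar index v(k). The following Python sketch is our own illustration; the chapter itself prescribes no implementation:

```python
def controls_to_index(u):
    """Map a binary control vector [u_1, ..., u_m] to v = 1 + sum_i 2^(m-i) u_i."""
    m = len(u)
    return 1 + sum(2 ** (m - i) * ui for i, ui in enumerate(u, start=1))

def index_to_controls(v, m):
    """Inverse map: recover [u_1, ..., u_m] from v in {1, ..., 2^m}."""
    bits = format(v - 1, f"0{m}b")      # binary expansion of v - 1, m digits wide
    return [int(b) for b in bits]

assert controls_to_index([0, 0, 0]) == 1    # all controls off
assert controls_to_index([1, 1, 1]) == 8    # all controls on
assert index_to_controls(5, 3) == [1, 0, 0]
```

The map is just the usual binary-to-decimal encoding shifted by one, so that the control index starts at 1 rather than 0.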


We now proceed to derive the counterpart of Equation (11.7) for a PBN subject to auxiliary controls. Let v* be any integer between 1 and 2^m and suppose that v(k) = v*. Then it is clear that the procedure outlined in the last subsection can be used to compute the corresponding A matrix, which will now depend on v* and can be denoted by A(v*). Furthermore, the evolution of the probability distribution vector at time k will take place according to the following equation:

w(k + 1) = w(k) A(v*).    (11.9)

Because the choice of v* is arbitrary, the one-step evolution of the probability distribution vector in the case of a PBN with control inputs takes place according to the equation:

w(k + 1) = w(k) A(v(k)).    (11.10)
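Computationally, Equation (11.10) is a control-dependent vector–matrix product. A minimal Python sketch follows; the names are illustrative, and the A(v) matrices would come from applying Equation (11.6) once per control value:

```python
import numpy as np

def propagate(w0, A_of_v, v_sequence):
    """Evolve the distribution via w(k+1) = w(k) A(v(k))  (Equation 11.10).

    w0        : initial probability row vector over the 2^n states
    A_of_v    : dict mapping each control index v to its 2^n x 2^n matrix A(v)
    v_sequence: control inputs v(0), v(1), ..., v(M-1)
    Returns the list of distributions w(0), w(1), ..., w(M).
    """
    w = np.asarray(w0, dtype=float)
    history = [w]
    for v in v_sequence:
        w = w @ A_of_v[v]        # one step of the controlled Markov chain
        history.append(w)
    return history
```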

Note that the transition probability matrix here is a function of all the control inputs u_1(k), u_2(k), ..., u_m(k). Consequently, the evolution of the probability distribution vector of the PBN with control now depends not only on the initial distribution vector but also on the values of the control inputs at different time steps. Furthermore, it appears intuitively that it may be possible to make the states of the network evolve in a desirable fashion by appropriately choosing the control input at each time step. We next proceed to formalize these ideas.

Equation (11.10) is referred to in the control literature as a controlled Markov chain [17]. Markov chains of this type occur in many real-life applications, the most notable example being the control of queues. Given such a controlled Markov chain, the objective is to come up with a sequence of control inputs, usually referred to as a control strategy, such that an appropriate cost function is minimized over the entire class of allowable control strategies. To arrive at a meaningful solution, the cost function must capture the costs and the benefits of using any control. The actual design of a "good" cost function is application dependent and is likely to require considerable expert knowledge. We next outline a procedure that we believe would enable us to arrive at a reasonable cost function for determining the course of therapeutic intervention using PBNs.

In the case of diseases like cancer, treatment is typically applied over a finite time horizon. For instance, in the case of radiation treatment, the patient may be treated with radiation over a fixed interval of time, following which the treatment is suspended for some time as the effects are evaluated. After that, the treatment may be applied again, but the important point to note is that the treatment window at each stage is usually finite. Thus, we will be interested in a finite horizon problem where the control is applied only over a finite number of steps.

Suppose that the number of steps over which the control input is to be applied has been determined a priori to be M and we are interested in controlling the behavior of the PBN over the interval k = 0, 1, 2, ..., M − 1. Suppose at time step k, the state³ of the PBN is given by z(k) and the corresponding control input is v(k). Then we can define a cost C_k(z(k), v(k)) as being the cost of applying the control input v(k) when the state is z(k). With this definition, the expected cost of control over the entire treatment horizon becomes:

E [ Σ_{k=0}^{M−1} C_k(z(k), v(k)) | z(0) ].    (11.11)

³ In the rest of this chapter, we will be referring to z(k) as the state of the probabilistic Boolean network because, as discussed in Section 11.5.2, z(k) is equivalent to the actual state x(k).

Note that even if the network starts from a given (deterministic) initial state z(0), the subsequent states will be random because of the stochastic nature of the evolution in Equation (11.10). Consequently, the cost in Equation (11.11) had to be defined using an expectation. Equation (11.11) does give us one component of the finite horizon cost, namely the cost of control. We now proceed to introduce the second component.

The net result of the control actions v(0), v(1), ..., v(M − 1) is that the state of the PBN will transition according to Equation (11.10) and will end up in some state z(M). Because of the probabilistic nature of the evolution, the terminal state z(M) is a random variable that could possibly take on any of the values 1, 2, ..., 2^n. Depending on the particular PBN and the control inputs used at each step, it is possible that some of these states may never be reached because of noncommunicating states in the resulting Markov chains, and so forth. However, because the control strategy itself has not yet been determined, it would be difficult, if not impossible, to identify and exclude such states from further consideration. Instead, we assume that all the 2^n terminal states are reachable and assign a penalty or terminal cost C_M(z(M)) associated with each one of them. Indeed, in the case of PBNs with perturbation, all states communicate and the Markov chain is ergodic [12].

We next consider penalty assignment. First, consider the PBN with all controls set to zero, that is, v(k) ≡ 1 for all k. Then divide the states into different categories depending on how desirable or undesirable they are, and assign higher terminal costs to the undesirable states. For instance, a state associated with rapid cell proliferation leading to cancer should be associated with a high terminal penalty, whereas a state associated with normal behavior should be assigned a low terminal penalty. For the purposes of this chapter, we will assume that the assignment of terminal penalties has been carried out, so that we have at our disposal a terminal penalty C_M(z(M)) which is a function of the terminal state. Thus, we have arrived at the second component of our cost function. Once again, note that the quantity C_M(z(M)) is a random variable, and so we must take its expectation while defining the cost function to be minimized. In view of Equation (11.11), the finite horizon cost to be minimized is given by:

E [ Σ_{k=0}^{M−1} C_k(z(k), v(k)) + C_M(z(M)) | z(0) ].    (11.12)

To proceed further, let us assume that at time k, the control input v(k) is a function of the current state z(k), that is:

v(k) = μ_k(z(k))    (11.13)

where μ_k : {1, 2, ..., 2^n} → {1, 2, ..., 2^m}. The optimal control problem can now be stated as follows: Given an initial state z(0), find a control law π = {μ_0, μ_1, ..., μ_{M−1}} that minimizes the cost functional:

J_π(z(0)) = E [ Σ_{k=0}^{M−1} C_k(z(k), μ_k(z(k))) + C_M(z(M)) ]    (11.14)

subject to the constraint:

Pr{z(k + 1) = j | z(k) = i} = a_{ij}(v(k))    (11.15)

where a_{ij}(v(k)) is the ith row, jth column entry of the matrix A(v(k)).

11.5.4 Solution Using Dynamic Programming

Optimal control problems of the type described by Equations (11.14) and (11.15) can be solved by using the technique of dynamic programming. This technique, pioneered by Bellman in the 1950s, is based on the so-called principle of optimality. This principle is a simple but powerful concept and can be explained as follows. Suppose that we have an optimization problem where we are interested in optimizing a performance index over a finite number of steps, say M. At each step, a decision is made, and the objective is to come up with a strategy, or sequence of M decisions, which is optimal in the sense that the cumulative performance index over all the M steps is optimized. In general, such an optimal strategy may not exist. However, when such an optimal strategy does exist, the principle of optimality asserts the following: if one searches for an optimal strategy over a subset of the original number of steps, then this new optimal strategy will be given by the overall optimal strategy, restricted to the steps being considered. Although intuitively obvious, the principle of optimality can have far-reaching consequences. For instance, it can be used to obtain the following proposition, proven in Reference [17] (Chapter 1, page 23).

PROPOSITION 1
Let J*(z(0)) be the optimal value of the cost functional (11.14). Then:

J*(z(0)) = J_0(z(0))

where the function J_0 is given by the last step of the following dynamic programming algorithm, which proceeds backward in time from time step M − 1 to time step 0:

J_M(z(M)) = C_M(z(M))    (11.16)
J_k(z(k)) = min_{v(k) ∈ {1, 2, ..., 2^m}} E{ C_k(z(k), v(k)) + J_{k+1}(z(k + 1)) },  k = 0, 1, ..., M − 1.    (11.17)

Furthermore, if v*(k) = μ*_k(z(k)) minimizes the right-hand side of Equation (11.17) for each z(k) and k, the control law π* = {μ*_0, μ*_1, ..., μ*_{M−1}} is optimal.

Note that the expectation on the right-hand side of Equation (11.17) is conditioned on z(k) and v(k). Hence, in view of Equation (11.15), it follows that:

E[ J_{k+1}(z(k + 1)) | z(k), v(k) ] = Σ_{j=1}^{2^n} a_{z(k),j}(v(k)) J_{k+1}(j).

Thus, the dynamic programming solution to Equations (11.14) and (11.15) is given by:

J_M(z(M)) = C_M(z(M))    (11.18)
J_k(z(k)) = min_{v(k) ∈ {1, 2, ..., 2^m}} [ C_k(z(k), v(k)) + Σ_{j=1}^{2^n} a_{z(k),j}(v(k)) J_{k+1}(j) ],  k = 0, 1, ..., M − 1.    (11.19)
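The backward recursion (11.18)–(11.19) translates almost line for line into code. The following Python sketch is our own illustrative implementation, not from the chapter; it returns both the optimal cost-to-go J_k and the optimal policy μ*_k:

```python
import numpy as np

def finite_horizon_dp(A_of_v, cost, terminal, M):
    """Solve Equations (11.18)-(11.19) by backward induction.

    A_of_v   : dict {v: 2^n x 2^n matrix A(v)}, states indexed 0 .. 2^n - 1
    cost     : cost(k, z, v) -> C_k(z, v), the per-step cost of control
    terminal : length-2^n array of terminal penalties C_M(z)
    M        : horizon length
    Returns (J, mu): J[k][z] is the optimal cost-to-go, mu[k][z] the optimal v.
    """
    n_states = len(terminal)
    J = [None] * (M + 1)
    mu = [None] * M
    J[M] = np.asarray(terminal, dtype=float)          # Equation (11.18)
    for k in range(M - 1, -1, -1):                    # Equation (11.19)
        J[k] = np.empty(n_states)
        mu[k] = np.empty(n_states, dtype=int)
        for z in range(n_states):
            # C_k(z, v) + sum_j a_{z j}(v) J_{k+1}(j), minimized over v
            q = {v: cost(k, z, v) + A[z] @ J[k + 1] for v, A in A_of_v.items()}
            mu[k][z] = min(q, key=q.get)
            J[k][z] = q[mu[k][z]]
    return J, mu
```

The inner dictionary comprehension evaluates the bracketed expression of Equation (11.19) once per control value, and the minimizing key is exactly μ*_k(z).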

11.6 Examples

In this section, we present two examples to show optimal control design using the dynamic programming approach. The first example is a simple contrived one for illustrative purposes only, whereas the second one is a realistic example based on actual gene expression data.

11.6.1 Simple Illustrative Example

In this subsection, we present an example of a PBN with control and work through the details to show how Equations (11.18) and (11.19) can be used in arriving at an optimal control strategy. The example we consider is adapted from Example 1 in Reference [10]. That example involves a PBN with three genes, x_1, x_2, and x_3. There are two functions f_1^(1), f_2^(1) associated with x_1, one function f_1^(2) associated with x_2, and two functions f_1^(3), f_2^(3) associated with x_3. These functions are given by the truth table shown in Table 11.1.

TABLE 11.1
Truth Table for Example 1 in Reference [10]

x1 x2 x3 | f1^(1) | f2^(1) | f1^(2) | f1^(3) | f2^(3)
  000    |   0    |   0    |   0    |   0    |   0
  001    |   1    |   1    |   1    |   0    |   0
  010    |   1    |   1    |   1    |   0    |   0
  011    |   1    |   0    |   0    |   1    |   0
  100    |   0    |   0    |   1    |   0    |   0
  101    |   1    |   1    |   1    |   1    |   0
  110    |   1    |   1    |   0    |   1    |   0
  111    |   1    |   1    |   1    |   1    |   1
c_j^(i)  |  0.6   |  0.4   |   1    |  0.5   |  0.5

The truth table corresponds to an uncontrolled PBN. To introduce control, let us assume that x_1 is now going to be a control input whose value can be switched externally between 0 and 1, and that the states of the new PBN are x_2 and x_3. To be consistent with the notation introduced in Section 11.5, the variables x_1, x_2, and x_3 will be renamed: the variable x_1 now becomes u_1, whereas the variables x_2 and x_3 become x_1 and x_2, respectively. With this change, we have the truth table shown in Table 11.2, which also contains the values of the variables v and z corresponding to u_1 and [x_1 x_2], respectively.

TABLE 11.2
Truth Table for the Example of this Section

u1 | v | x1 x2 | z | f1^(1) | f1^(2) | f2^(2)
0  | 1 |  0 0  | 1 |   0    |   0    |   0
0  | 1 |  0 1  | 2 |   1    |   0    |   0
0  | 1 |  1 0  | 3 |   1    |   0    |   0
0  | 1 |  1 1  | 4 |   0    |   1    |   0
1  | 2 |  0 0  | 1 |   1    |   0    |   0
1  | 2 |  0 1  | 2 |   1    |   1    |   0
1  | 2 |  1 0  | 3 |   0    |   1    |   0
1  | 2 |  1 1  | 4 |   1    |   1    |   1
c_j^(i)        |   |   1    |  0.5   |  0.5

The values of c_j^(i) in the table dictate that there are two possible networks, the first corresponding to the choice of functions (f_1^(1), f_1^(2)) and the second corresponding to the choice of functions (f_1^(1), f_2^(2)). The probabilities P_1 and P_2 associated with each of these networks are given by P_1 = P_2 = 0.5.

We next proceed to compute the matrices A(1) and A(2) corresponding to the two possible values for v. From Table 11.2, it is clear that when v = 1, the following transitions are associated with the network N_1 and occur with probability P_1:

z = 1 → z = 1,  z = 2 → z = 3,  z = 3 → z = 3,  z = 4 → z = 2.    (11.20)

The corresponding transitions associated with network N_2, which occur with probability P_2, are given by:

z = 1 → z = 1,  z = 2 → z = 3,  z = 3 → z = 3,  z = 4 → z = 1.    (11.21)

In view of Equations (11.20) and (11.21), the matrix A(1) is given by:

A(1) = [ 1   0   0   0
         0   0   1   0
         0   0   1   0
         P2  P1  0   0 ]    (11.22)

Similarly, we can arrive at the following A(2) matrix:

A(2) = [ 0   0   1   0
         0   0   P2  P1
         P2  P1  0   0
         0   0   0   1 ]    (11.23)

In this example, n = 2, so that the variable z can take on any one of the four values 1, 2, 3, or 4. Moreover, because m = 1, the control variable v can take on any one of the two values 1 or 2. Suppose that the control action is to be carried out over five steps, so that M = 5. Also, for the sake of simplicity, assume that the terminal penalties are given by:

C_5(1) = 0,  C_5(2) = 1,  C_5(3) = 2,  C_5(4) = 3.    (11.24)

Note that the above choices of M and the values of the terminal penalties are completely arbitrary; in a real-world example, this information would be obtained from biologists. The current choice of terminal penalties indicates that the most desirable terminal state is 1, while the least desirable terminal state is 4.

To set up the optimization problem (11.14), (11.15), we need to define the cost C_k(z(k), v(k)). For the sake of simplicity, let us define:

C_k(z(k), v(k)) = Σ_{i=1}^{m} u_i(k) = u_1(k)    (11.25)

where v(k) and u_i(k), i = 1, 2, ..., m, are related by Equation (11.8). Clearly, the cost C_k(z(k), v(k)) captures the cost of applying the input u_1(k) at the kth step. The optimization problem (11.14), (11.15) can now be posed using the quantities defined in Equations (11.22), (11.23), (11.24), and (11.25). The dynamic programming algorithm resulting from Equations (11.18) and (11.19) becomes:

J_5(z(5)) = C_5(z(5))    (11.26)
J_k(z(k)) = min_{v(k) ∈ {1, 2}} [ u_1(k) + Σ_{j=1}^{4} a_{z(k),j}(v(k)) J_{k+1}(j) ],  k = 0, 1, 2, 3, 4.    (11.27)
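Before walking through the backward steps by hand, note that the recursion (11.26)–(11.27) is small enough to check numerically. The following self-contained Python sketch uses P_1 = P_2 = 0.5; any implementation detail beyond Equations (11.22)–(11.27) is our own:

```python
import numpy as np

P1 = P2 = 0.5
A = {1: np.array([[1, 0, 0, 0],          # Equation (11.22)
                  [0, 0, 1, 0],
                  [0, 0, 1, 0],
                  [P2, P1, 0, 0]]),
     2: np.array([[0, 0, 1, 0],          # Equation (11.23)
                  [0, 0, P2, P1],
                  [P2, P1, 0, 0],
                  [0, 0, 0, 1]])}
C5 = np.array([0.0, 1.0, 2.0, 3.0])      # terminal penalties, Equation (11.24)
M = 5

J = C5
policy = []
for k in range(M - 1, -1, -1):           # backward recursion (11.26)-(11.27)
    # per-step control cost u1 = 0 for v = 1 and u1 = 1 for v = 2
    q = np.stack([0 + A[1] @ J, 1 + A[2] @ J])
    policy.insert(0, q.argmin(axis=0) + 1)   # optimal v for states z = 1..4
    J = q.min(axis=0)

print(policy)
# v = 1 everywhere for k = 0..3, and at k = 4: v = 2 only when z = 3,
# reproducing the strategy given below in Equations (11.28)-(11.29).
```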

We proceed backwards step by step from k = 4 to obtain a solution to Equations (11.26) and (11.27). The net result is that the optimal control strategy for this finite horizon problem is given by:

μ*_0(z(0)) = μ*_1(z(1)) = μ*_2(z(2)) = μ*_3(z(3)) = 1 for all z(0), z(1), z(2), z(3)    (11.28)

μ*_4(z(4)) = 2 if z(4) = 3, and 1 otherwise.    (11.29)

Thus, the control input is applied only in the last time step, provided the state z of the system at that time step is equal to 3.

To get a more concrete feel for the optimal control strategy, let us focus on the cases where the PBN degenerates into a standard (deterministic) Boolean network. There are two cases to consider: (1) P_2 = 1, P_1 = 0, and (2) P_2 = 0, P_1 = 1.

(1) P_2 = 1, P_1 = 0: In this case, from Equation (11.22) we have:

A(1) = [ 1  0  0  0
         0  0  1  0
         0  0  1  0
         1  0  0  0 ]    (11.30)

Let us now consider a few different initial states z(0) and see whether the optimal control strategy determined above makes sense.

Case 1, z(0) = 1: According to Equations (11.28) and (11.29), the optimal control strategy in this case is no control. Note from Equation (11.24) that the evolution of the probabilistic Boolean network is starting from the most desirable terminal state. Furthermore, from Equation (11.30), it is clear that in the absence of any control, the state of the network remains at this position thereafter. Hence, the control strategy arrived at is, indeed, optimal, and the value of the optimal cost is 0, which does agree with the value determined from Equations (11.26) and (11.27) with P_1 = 0.

Case 2, z(0) = 4: In this case, from Equation (11.24), it is clear that the evolution of the probabilistic Boolean network is starting from the most undesirable terminal state. Moreover, from Equation (11.30), starting from z(0) = 4, if no control is employed, then the network will reach the state z(1) = 1 in one step and stay there forever. Hence, the optimal control strategy is to not apply any control at all, and this no-control strategy is, indeed, optimal, with optimal cost 0. From Equation (11.23), note that if the control input was kept turned ON over the entire control horizon, then the state would continue to remain in this most undesirable position during the entire control duration. Such a control strategy cannot be optimal because not only does the network end up in the most undesirable terminal state, but also the maximum possible control cost is incurred over the entire time horizon.

(2) P_2 = 0, P_1 = 1: In this case, from Equations (11.22) and (11.23), we have:

A(1) = [ 1  0  0  0          A(2) = [ 0  0  1  0
         0  0  1  0                   0  0  0  1
         0  0  1  0                   0  1  0  0
         0  1  0  0 ],                0  0  0  1 ].    (11.31)

Note from Equation (11.28) that the optimal control strategy is no control over the first four time steps. Thus, in this case, from Equation (11.31) it follows that with z(0) = 4, we will have z(1) = 2, z(2) = 3, z(3) = 3, and z(4) = 3. Then at the last time step, the control input is turned ON and, from Equation (11.31), the resulting state is z(5) = 2. The optimal cost is given by 2 (the sum of the terminal cost and the cost of control), and this value agrees with that determined from Equations (11.26) and (11.27) with P_1 = 1.

11.6.2 Real-World Example Based on Gene Expression Data

In this subsection, we apply the methodology of this chapter to derive an optimal intervention strategy for a particular gene regulatory network. The network chosen as an example of how control might be applied is one developed from data collected in a study of metastatic melanoma [19]. In this expression profiling study, the abundance of messenger RNA for the gene WNT5A was found to be a highly discriminating difference between cells with properties typically associated with high metastatic competence versus those with low metastatic competence. These findings were validated and expanded in a second study [20]. In this second study, experimentally increasing the levels of the Wnt5a protein secreted by a melanoma cell line via genetic engineering methods directly altered the metastatic competence of that cell as measured by the standard in vitro assays for metastasis. A further finding of interest in the aforementioned study was that an intervention that blocked the Wnt5a protein from activating its receptor, by the use of an antibody that binds Wnt5a protein, could substantially reduce the ability of Wnt5a to induce a metastatic phenotype. This of course suggests a study of control based on interventions that alter the contribution of the WNT5A gene's action to biological regulation, because the available data suggest that disruption of WNT5A's influence could reduce the chance of a melanoma metastasizing, a desirable outcome.

The methods for choosing the genes involved in a small local network that includes the activity of the WNT5A gene and the rules of interaction have been described in Reference [21]. As discussed in that paper, the WNT5A network was obtained by studying the predictive relationship among 587 genes. The expression status of each gene was quantized to one of three possible levels: −1 (downregulated), 0 (unchanged), and 1 (upregulated). Thus, the gene activity profile at any time step is not a binary number but a ternary one. In this case, the PBN formulation and the associated control strategy can be developed exactly as described in Sections 11.5.2, 11.5.3, and 11.5.4, with

the only difference being that now, for an n-gene network, we will have 3^n states instead of the 2^n states encountered earlier. In this context, it is appropriate to point out that to apply the control algorithm of this chapter, it is not necessary to actually construct a PBN; all that is required are the transition probabilities between the different states under the different controls.

A network with 587 genes will have 3^587 states, which is an intractably large number to use either for modeling or for control. Consequently, the number of genes was narrowed down to the ten most significant ones, and the resulting multivariate relationship, using the best three-gene predictor for each gene, is shown in Figure 11.2. These relationships were developed using the COD (coefficient of determination) technique [6–8] applied to the gene expression patterns across 31 different conditions and prior biological knowledge. A detailed description of this is available in Reference [21].

FIGURE 11.2  Multivariate relationship among the genes of the ten-gene WNT5A network [21] (WNT5A, pirin, S100P, RET-1, MART-1, HADHB, STC2, PHO-C, MMP-3, and synuclein).

The control objective for this ten-gene network is to externally downregulate the WNT5A gene. The reason is that it is biologically known that WNT5A ceasing to be downregulated is strongly predictive of the onset of metastasis. Controlling the ten-gene network using dynamic programming would require us to design a control algorithm for a system with 3^10 (= 59,049) states. Although there is nothing conceptually difficult about doing this, it is beyond the computational limits of our current software, which we are in the process of improving.

Accordingly, we further narrowed down the number of genes in the network to seven by using COD analysis on the 31 samples. The resulting genes, along with their multivariate relationship, are shown in Figure 11.3. For each gene in this network, we determined their two best two-gene predictors and their corresponding CODs. Using the procedure discussed in Reference [10], the COD information for each of the predictors was then used to determine the 3^7 × 3^7 matrix of transition probabilities for the Markov chain corresponding to the dynamic evolution of the gene activity profile of the seven-gene network.

FIGURE 11.3  Multivariate relationship among the genes of the seven-gene WNT5A network (WNT5A, pirin, S100P, RET-1, MART-1, HADHB, and STC2).

The optimal control problem can now be completely specified by choosing (1) the treatment/intervention window, (2) the terminal penalty, and (3) the types of controls and the costs associated with them. For the treatment window, we arbitrarily chose a window of length 5; that is, control inputs would be applied only at time steps 0, 1, 2, 3, and 4. The terminal penalty at time step 5 was chosen as follows. Because our objective is to ensure that WNT5A is downregulated, we assigned a penalty of zero to all states for which WNT5A equals −1, a penalty of 3 to all states for which WNT5A equals 0, and a penalty of 6 to all states for which WNT5A equals 1. Here the choice of the numbers 3 and 6 is arbitrary, but they do reflect our attempt to capture the intuitive notion that states where WNT5A equals 1 are less desirable than those where WNT5A equals 0.
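In code, this terminal-penalty assignment is a single pass over the ternary state space. The sketch below is our own illustration; the gene ordering (WNT5A first) is an assumption made purely for indexing:

```python
import itertools
import numpy as np

# Terminal penalties for the seven-gene WNT5A network: the penalty depends
# only on the WNT5A coordinate (0, 3, or 6, as chosen in the text).
genes = ["WNT5A", "pirin", "S100P", "RET1", "MART1", "HADHB", "STC2"]
levels = (-1, 0, 1)                      # downregulated / unchanged / upregulated
penalty_of_wnt5a = {-1: 0, 0: 3, 1: 6}

states = list(itertools.product(levels, repeat=len(genes)))   # 3**7 = 2187 states
CM = np.array([penalty_of_wnt5a[s[0]] for s in states])       # s[0] is WNT5A
print(len(states))                       # 2187
```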

Two types of possible controls were used, and next we discuss the two cases separately.

Case 1 (WNT5A Controlled Directly): In this case, the control action at any given time step is to force WNT5A equal to −1, if necessary, and let the network evolve from there. Biologically, such a control could be implemented by using a WNT5A inhibitory protein. In this case, the control variable is binary, with 0 indicating that the expression status of WNT5A has not been forcibly altered, while 1 indicates that such a forcible alteration has taken place. Of course, to achieve this control objective, it seems reasonable to incur a control cost at a given time step if and only if the expression status of WNT5A has to be forcibly changed at that time step. Accordingly, we arbitrarily assigned a cost of 1 to each such forcible change and solved for the optimal control using dynamic programming. The net result was a set of optimal control inputs for each of the 2187 (= 3^7) states at each of the five time points. Using these control inputs, we studied the evolution of the state probability distribution vectors with and without control. For every possible initial state, our simulations indicated that, with control, WNT5A always reached −1 at the final time point (k = 5). Thus, the optimal control strategy of Sections 11.5.3 and 11.5.4 was, indeed, successful in achieving the desired control objective, namely to keep WNT5A downregulated. Moreover, our simulations indicated that at every time step from 1 to 5, the probability of WNT5A being equal to −1 was higher with control than that without control.

Case 2 (WNT5A Controlled through Pirin): In this case, the control objective is the same as in Case 1, namely to keep WNT5A downregulated. The only difference is that this time, we use another gene, pirin, to achieve this control. The treatment window and the terminal penalties are kept exactly the same as before. The control action consists of either forcing pirin to −1 (corresponding to a control input of 1) or letting it remain wherever it is (corresponding to a control input of 0). As before, whether at a given time step such intervention takes place or not is decided by the solution to the resulting dynamic programming algorithm and the actual state of the network immediately prior to the intervention. Also as before, a control cost of 1 is incurred if and only if pirin has to be forcibly reset to −1 at that time step.

Having chosen these design parameters, we implemented the dynamic programming algorithm with pirin as the control. Using the resulting optimal controls, we studied the evolution of the state probability distribution vectors with and without control. For every possible initial state, our simulations indicated that, at the final time point, the probability of WNT5A being equal to −1 was higher with control than that without control. The probability of WNT5A being equal to −1 at the final time point was not, however, in general, equal to 1. This is not surprising given that, in this case, we are trying to control the expression status of WNT5A using another gene, and the control horizon of length 5 simply may not be adequate for achieving the desired objective with such a high probability. Furthermore, there was no definite ordering of probabilities between the controlled and uncontrolled cases at the intermediate time points.

In this context, it is significant to point out that if the network starts from the initial state STC2 = −1, HADHB = 0, MART-1 = 0, RET-1 = 0, S100P = −1, pirin = 1, WNT5A = 1 and no control is used, then it quickly transitions to a bad absorbing state (an absorbing state with WNT5A = 1). With optimal control, however, this does not happen. Nevertheless, even in this case, if the network starts from the state corresponding to STC2 = −1, HADHB = 0, MART-1 = 0, RET-1 = 0, S100P = −1, pirin = 1, WNT5A = 1 and evolves

under optimal control, then the probability of WNT5A = −1 at the final time point equals 0.673521. This is quite good in view of the fact that the same probability would have been equal to zero in the absence of any control action.

11.7 Concluding Remarks

In this chapter, motivated by biological considerations, we have introduced probabilistic Boolean networks with one or more control inputs. In contrast to the PBNs introduced in Reference [10], the evolution of the state of the networks considered here depends on the status of these control inputs. These control inputs can potentially be used to model the effects of treatments such as radiation, chemotherapy, and so forth on the holistic behavior of the genes. Furthermore, the control inputs can themselves be chosen so that the genes evolve in a more "desirable fashion." Thus, the PBNs with control can be used as a modeling tool to facilitate effective strategies for therapeutic intervention.

In Reference [10], it was shown how the state evolution of a PBN can be modeled as a standard Markov chain. Here, we have shown how control can be introduced into a PBN, leading to a controlled Markov chain. Furthermore, we also showed how the control inputs can be optimally chosen using the dynamic programming technique.

The initial result on intervention presented here has been subsequently extended in several directions. First, in Reference [22], we have extended our intervention results to the so-called context-sensitive PBNs, which we believe are a closer approximation at modeling biological reality. In that same reference, we also considered intervention in the presence of random perturbations, where any gene in a PBN could randomly switch values with a small probability. Next, in Reference [23], we have modified the optimal intervention algorithm to accommodate the case where the entire state vector (gene activity profile) is not available for measurement.

11.8 Future Directions

Several open issues, however, remain, and these will have to be successfully tackled before the methods suggested in this chapter find application in actual clinical practice. We next discuss some of the issues that we are aware of at the current time.

• Methodical assignment of terminal penalties: The formulation of the optimal control problem assumes that there is a terminal penalty associated with each state of the PBN. However, the assignment of these terminal penalties for cancer therapy is by no means a

straightforward task. For instance, the kind of terminal penalty used for the melanoma cell line study of Section 11.6.2 is simply inadequate, because it fails to capture the steady-state behavior once the intervention has ceased. The reason is that although the intervention will be carried out only over a finite horizon, one would like to continue to enjoy the benefits in the steady state. To remedy the situation, we propose to assign terminal penalties based on equivalence classes. The results of preliminary simulation studies in this regard [24] appear to be encouraging.

• Choice of control inputs: In the case of the melanoma cell line study presented in Section 11.6.2, one of the genes in the PBN, namely pirin, was used as a control input. In a practical situation, one consideration is to use genes for which inhibitors or enhancers are readily available. However, even if such a gene is chosen, how can we be certain that it is capable of controlling some other gene(s)? Although the answer is not clear at this stage, we do believe that the traditional control theoretic concept of controllability [25] may yield some useful insights. Another possibility is to use the concept of gene influence introduced in Reference [10], an approach that we have explored preliminarily in Reference [23].

• Intervening to alter the steady-state behavior: Given a Boolean network, one can partition the state space into a number of attractors along with their basins of attraction. The attractors characterize the long-run behavior of the Boolean network and have been conjectured by Kauffman to be indicative of the cell type and phenotypic behavior of the cell. Such an idea can be generalized to PBNs. Consequently, a reasonable objective of therapeutic intervention could be to alter the attractor landscape in the associated Boolean networks, and a brute force approach aimed at such intervention has been proposed in Reference [13]. We intend to develop more systematic approaches for affecting the steady-state behavior, and some initial results in this connection have been reported in Reference [26].

• PBN design from steady-state data: Yet another aspect that merits further investigation is motivated by the fact that the currently available gene expression data comes from the steady-state phenotypic behavior and really does not capture any temporal history. Consequently, the process of inferring PBNs from the data will have to be modified, in the sense that it will have to be guided more by steady-state and limited connectivity considerations. Major research efforts in these directions are currently under way [27], [28]. This last aspect further underscores the fact that the category of intervention cannot be researched in isolation. Issues that arise upstream will definitely impact intervention and vice versa.

The optimal control results presented in this chapter assume known transition probabilities and pertain to a finite horizon problem of known length.

Their extension to the situation where the transition probabilities and the horizon length are unknown is a topic for further investigation. Finally, the results presented in this chapter correspond to the following stages in standard control design: modeling, controller design, and verification of the performance of the designed controller via computer simulations. The designed controllers will have to be successfully implemented in practical studies, at least with cancer cell lines, before the benefits of using engineering approaches in translational medicine become transparent to the biological and medical communities. A considerable amount of effort needs to be focused on this endeavour.

Acknowledgments

This work was supported in part by the National Cancer Institute under Grant CA90301, by the National Science Foundation under Grants ECS-0355227 and CCF-0514644, and by the Translational Genomics Research Institute.

References

1. Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science, 270, 467–470.
2. Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., & Davis, R. W. (1996). Parallel Human Genome Analysis: Microarray-Based Expression Monitoring of 1000 Genes. Proceedings of the National Academy of Sciences USA, 93, 10614–10619.
3. DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., & Trent, J. M. (1996). Use of a cDNA Microarray to Analyze Gene Expression Patterns in Human Cancer. Nature Genetics, 14, 457–460.
4. DeRisi, J., Iyer, V. R., & Brown, P. O. (1997). Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale. Science, 278, 680–686.
5. Wodicka, L., Dong, H., Mittmann, M., Ho, M. H., & Lockhart, D. J. (1997). Genome-wide Expression Monitoring in Saccharomyces cerevisiae. Nature Biotechnology, 15, 1359–1367.
6. Dougherty, E. R., Kim, S., & Chen, Y. (2000). Coefficient of Determination in Nonlinear Signal Processing. Signal Processing, 80(10), 2219–2235.
7. Kim, S., Dougherty, E. R., Bittner, M. L., Chen, Y., Sivakumar, K., Meltzer, P., & Trent, J. M. (2000). A General Framework for the Analysis of Multivariate Gene Interaction via Expression Arrays. Biomedical Optics, 4(4), 411–424.
8. Kim, S., Dougherty, E. R., Chen, Y., Sivakumar, K., Meltzer, P., Trent, J. M., & Bittner, M. (2000). Multivariate Measurement of Gene-Expression Relationships. Genomics, 67, 201–209.
9. Kemeny, J., & Snell, L. (1976). Finite Markov Chains. Springer-Verlag, Berlin.

10. Shmulevich, I., Dougherty, E. R., Kim, S., & Zhang, W. (2002). Probabilistic Boolean Networks: A Rule-Based Uncertainty Model for Gene Regulatory Networks. Bioinformatics, 18, 261–274.
11. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks. Proceedings of the IEEE, 90(11), 1778–1792.
12. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). Gene Perturbation and Intervention in Probabilistic Boolean Networks. Bioinformatics, 18, 1319–1331.
13. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). Control of Stationary Behavior in Probabilistic Boolean Networks by Means of Structural Intervention. Journal of Biological Systems, 10(4), 431–446.
14. Kauffman, S. A. (1969). Metabolic Stability and Epigenesis in Randomly Constructed Genetic Nets. Theoretical Biology, 22, 437–467.
15. Kauffman, S. A., & Levin, S. (1987). Towards a General Theory of Adaptive Walks on Rugged Landscapes. Theoretical Biology, 128, 11–45.
16. Kauffman, S. (1993). The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York.
17. Bertsekas, D. P. (2000). Dynamic Programming and Optimal Control. Athena Scientific, Nashua, NH.
18. Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
19. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A., Sampas, N., Dougherty, E., Wang, E., Marincola, F., Gooden, C., Lueders, J., Glatfelter, A., Pollock, P., Carpten, J., Gillanders, E., Leja, D., Dietrich, K., Beaudry, C., Berens, M., Alberts, D., & Sondak, V. (2000). Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling. Nature, 406(6795), 536–540.
20. Weeraratna, A. T., Jiang, Y., Hostetter, G., Rosenblatt, K., Duray, P., Bittner, M., & Trent, J. M. (2002). Wnt5a Signalling Directly Affects Cell Motility and Invasion of Metastatic Melanoma. Cancer Cell, 1, 279–288.
21. Kim, S., Li, H., Dougherty, E. R., Cao, N., Chen, Y., Bittner, M., & Suh, E. (2002). Can Markov Chain Models Mimic Biological Regulation? Journal of Biological Systems, 10(4), 337–357.
22. Pal, R., Datta, A., Bittner, M. L., & Dougherty, E. R. (2005). Intervention in Context-Sensitive Probabilistic Boolean Networks. Bioinformatics, 21, 1211–1218.
23. Datta, A., Choudhary, A., Bittner, M. L., & Dougherty, E. R. (2004). External Control in Markovian Genetic Regulatory Networks: The Imperfect Information Case. Bioinformatics, 20(6), 924–930.
24. Choudhary, A., Datta, A., Bittner, M. L., & Dougherty, E. R. (2005). Assignment of Terminal Penalties in Controlling Genetic Regulatory Networks. Proceedings of the American Control Conference.
25. Kalman, R. E. (1962). Canonical Structure of Linear Dynamical Systems. Proceedings of the National Academy of Sciences USA, 48, 596–600.
26. Pal, R., Ivanov, I., Datta, A., Bittner, M. L., & Dougherty, E. R. (2005). Generating Boolean Networks with a Prescribed Attractor Structure. Bioinformatics, 21, 4021–4025.
27. Pal, R., Datta, A., & Dougherty, E. R. (2006). Optimal Infinite Horizon Control for Probabilistic Boolean Networks. IEEE Transactions on Signal Processing, 54, 2375–2387.
28. Zhou, X., Wang, X., & Dougherty, E. R. (2004). A Bayesian Connectivity-Based Approach to Constructing Probabilistic Gene Regulatory Networks. Bioinformatics, 20, 2918–2927.


12
Modeling and Estimation Problems in the Visuomotor Pathway

Bijoy K. Ghosh, Wenxue Wang, and Zachary V. Freudenburg

CONTENTS
12.1 Introduction
12.2 Multineuronal Model of the Visual Cortex
12.3 Generation of Activity Waves in the Visual Cortex
12.4 Simulation with a Sequence of Stationary Inputs
12.5 Encoding Cortical Waves with β-Strands Using Double KL Decomposition
12.6 Statistical Detection of Position
    12.6.1 Series Expansion of Sample Functions of Random Processes
    12.6.2 Hypothesis Testing
    12.6.3 Decoding with Additive Gaussian White Noise Model
12.7 Detection Using Nonlinear Dynamics
    12.7.1 Phase Locking with a Network of Kuramoto Models
    12.7.2 Memory with Two Elements
12.8 The Role of Tectal Waves in Motor Control: A Future Goal
12.9 Conclusion
Acknowledgments
References

In this chapter we describe how a population of neurons models the dynamic activity of a suitable region of the visual cortex, responding to a class of visual inputs. Specifically, a large-scale neuronal model has been described which generates a propagating wave of activity that has been independently recorded in experiments using multiple electrodes and voltage-sensitive dyes. We show how the model cortex is able to discriminate location of a target in the visual space. The discrimination is carried out using two separate algorithms.

The first method utilizes statistical detection, wherein the activity waves generated by the visual cortex are encoded using principal components analysis. Using the model cortex, we show that the representation of the activity waves, over a sequence of sliding windows, first in the spatial domain and subsequently in the temporal domain, viewed as a "beta strand," is sufficiently different from each other for alternative locations of point targets in the visual space. Discrimination is carried out assuming that the noise is additive and Gaussian. In the second method, the beta strands (β-strands) are discriminated using a nonlinear dynamic system with multiple regions of attraction. Each beta strand corresponds to a suitable initialization of the dynamic system, and the states of attraction correspond to various target locations. The chapter concludes with a discussion of the motor control problem and how the cortical waves play a leading role in actuating movements that would track a moving target with some level of evasive maneuvers.

12.1 Introduction

In this chapter our goal is to describe modeling and estimation problems that arise in the animal visuomotor pathway. The pathway is particularly adept at tracking targets that are moving in space. In Figure 12.1 we show the tracking maneuver of a freshwater turtle as it strives to capture a moving fish. Turtles anticipate the future position of a moving target by solving a motion prediction problem, a task that is believed to be initiated in the visual cortex. To carry out such a maneuver, the pathway has to acquire and internally represent images of the target, and finally actuate a suitable motor action, such as capturing the target. The role of the visual pathway prior to the cortex is essentially filtering the visual signal, although the role of the cortex is somewhat more involved. Visual inputs to the retina are routed through the geniculate before they hit the cortex (see Figure 12.2).

FIGURE 12.1 (See color insert following p. 272.) Kinematic analysis of turtle prey capture. Selected movie frames at the top of the figure show a turtle orienting to a moving fish (arrow) in frames 01 to 82, moving toward it (100 to 130), extending and turning its neck (133 to 135), and capturing the fish (138). The bottom image shows the digitization points of the kinematic analysis: 1. head angle (α); 2. prey angle (β); 3. distance from the reference point (RP) to the snout; 4. distance from the snout to the prey.

Mammals have a cerebral cortex that embodies several topographically organized representations of visual space. Extracellular recordings show that neurons in a restricted region of visual cortex are activated when a visual stimulus is presented to a restricted region of the visual space, the classical receptive field of the neuron [7]. Neurons at adjacent points in the cortex are activated by stimuli presented at adjacent regions of the visual space. Consequently, there is a continuous but deformed map of the coordinates of visual space to the coordinates of the cortex. Extracellular recordings from the visual cortex of freshwater turtles produce a different result [16]. Neurons at each cortical locus are activated by visual stimuli presented at every point in the binocular visual space, although the latency and shape of the response waveforms vary as the stimulus is presented at different loci in the visual space. This suggests that there may not be a simple map of the coordinates of the visual space to the coordinates of the visual cortex in turtles. Position in the visual space is perhaps represented in a form other than a retinotopic map.

Experiments conducted by Senseman and Robbins [27], [28], [29], [30] have supported this viewpoint. They used voltage-sensitive dye methods to show that presentation of a visual stimulus to the retina of an in vitro preparation of the turtle eye and brain produces a wave of depolarization that propagates anisotropically across the cortex (see Figure 12.3 for a visualization of the wave propagation in a model cortex). These waves have been demonstrated using both multielectrode arrays and voltage-sensitive dyes [23], [24]. Both methods detect the activity of populations of pyramidal cells [28]. The waves have been analyzed using a variant of the principal components method, known as Karhunen–Loeve decomposition. Individual waves could be represented as a weighted sum of as few as three eigenvectors which are functions of the coordinates of the cortex. Interestingly, presentation of different visual stimuli, such as spots of light at different points in the visual space, produce waves that have different representations in the three-dimensional eigenspace. This raises the possibility that visual information is coded in the spatiotemporal dynamics of the cortical waves. Subsequent research work has provided abundant evidence that traveling electrical waves are observed not only in turtle visual cortex [24], but also across olfactory, visual, and visuomotor areas of the cortex in a variety of species [10].

FIGURE 12.2 (See color insert following p. 272.) The visual pathway in the turtle visual system from eyes to visual cortex.

FIGURE 12.3 (See color insert following p. 272.) A traveling wave of cortical activity from the model cortex without Hebbian and anti-Hebbian adaptation.

Propagating waves with comparable properties can be produced in a large-scale model of turtle visual cortex that contains geniculate and cortical neurons [9]. Turtle visual cortex has three layers, an outer layer 1, an intermediate layer 2, and an inner layer 3, and is divided into lateral and medial parts. The large-scale model described in this chapter contains geniculate neurons in the dorsal lateral geniculate complex of the thalamus and the five major populations of neurons in the visual cortex (see Figure 12.4). Pyramidal cells (including lateral and medial pyramidal cells) have somata situated in layer 2 and are the source of efferent projections from the cortex. The cortex also contains at least three populations of inhibitory interneurons: the subpial cells (situated in the outer half of layer 1), the stellate cells (situated in the inner half of layer 1), and the horizontal cells (situated in layer 3). Both geniculate and pyramidal cells are excitatory. Geniculate neurons project excitatory contacts onto pyramidal cells, subpial cells, and stellate cells. Interactions among these five types of cells involve two types of effects: excitatory and inhibitory. Pyramidal cells give rise to excitatory inputs to the inhibitory interneurons as well as to neighbor pyramidal cells. The subpial, stellate, and horizontal cells are inhibitory and provide inhibitory inputs to pyramidal cells. Subpial and stellate cells also involve recurrent connections to neighbor cells.

FIGURE 12.4 (See color insert following p. 272.) Distribution of cells in each of the three layers of the turtle cortex projected on a plane. The lateral geniculate (LGN) cells are distributed linearly (shown at the right side of the bottom edge of the cortex) and the solid line shows how they interact with cells in the cortex.

Figure 12.5a shows the interconnections among the cortical neurons in the large-scale cortex model. The five types of cells can be thought of as forming two anatomically defined pathways within the cortex (Figure 12.5b). A feed-forward pathway (Figure 12.5b, left part) involves the geniculate inputs that make excitatory contacts on subpial, stellate, and lateral pyramidal cells. The subpial and stellate cells make inhibitory contacts on the lateral pyramidal cells. In addition, there are inhibitory contacts between individual subpial cells as well as between individual stellate cells. A feedback pathway (Figure 12.5b, right part) involves the recurrent collaterals of both lateral and medial pyramidal cells, which make excitatory contacts on subpial, stellate, and horizontal cells. Each of these populations of inhibitory interneurons makes inhibitory contacts on pyramidal cells. Lateral pyramidal cells give rise to excitatory recurrent contacts on the other lateral and medial pyramidal cells. Both the lateral and medial pyramidal cells give rise to efferent connections to the thalamus, striatum, and brainstem; the axons of pyramidal cells leave the visual cortex to other cortical areas and to the brainstem in both pathways.

FIGURE 12.5 Cortical circuit of freshwater turtles. Interconnection between neurons in various layers of the visual cortex is shown in (a); the feed-forward and feedback pathways are shown in (b). Each box symbolizes a population of cells. The geniculate afferents (GA) provide excitatory input to cells in both pathways. The pyramidal cells (PYR) are excitatory. The subpial (SP), stellate (ST), and horizontal (H) cells are inhibitory. The distinction between medial and lateral pyramidal cells is not made in this diagram.

Visual input from the LGN to the cortex is not retinotopic. The retinal ganglion cells are densely distributed around a horizontal axis called the visual streak. Thus, turtle vision has a greater acuity across the horizontal axis (along the surface of water for a freely swimming turtle) in comparison to the vertical axis (above and below the water surface). The retinal inputs are redistributed "retinotopically" on the lateral geniculate nucleus (LGN). The LGN receives feed-forward inputs from the retina and feedback inputs from the cortex. The precise functional role of the LGN is not entirely known and has not been detailed here. The retino-cortical pathway has been sketched in Figure 12.4. In fact, inputs from the LGN are spatially distributed across the cortex, and the axons are shown as lines in Figures 12.4 and 12.6. These lines cross over, giving rise to an intense activity at the rostro-lateral part of the cortex, sparking the generation of a wave.

FIGURE 12.6 Linear arrangement of geniculate neurons. The somata are shown as boxes and the corresponding axons are shown as lines. Only 13 out of 201 LGN neurons are shown for clarity.

The visuo-cortical complex is part of an overall visuomotor pathway, sketched in Figure 12.7. Visual input converges onto the optic tectum via two separate routes: a direct input from the retina is fused with an input from the cortex at the tectum. The intermediate stages of the cortical input, namely the striatum, the pretectum, and the substantia nigra, are not relevant for this discussion. The tectum is responsible for predicting future locations of moving targets by processing cortical waves and fusing more recent visual inputs from the retina. The animal is able to make a prediction based on long-term visual data and correct the prediction based on "somewhat recent" target location. The exact mechanism of sensor fusion at the tectum is a subject of future research.

FIGURE 12.7 (See color insert following p. 272.) Prey capture and motion extrapolation. (a) To capture a moving fish, a turtle must extrapolate its future position. (b) Probable neural substrate for motion extrapolation in turtles. Visual input converges onto the optic tectum via two separate routes. (Abbreviations: LGN, lateral geniculate nucleus; PT, pretectum; RF, reticular formation; SN, substantia nigra.)

12.2 Multineuronal Model of the Visual Cortex

In this section, we give a description of the large-scale model of the turtle visual cortex. For a comprehensive description of the computational model we would like to refer to Reference [21]. Modeling, in general, is an evolutionary process and involves numerous parameters, some of which are obtained by physiological measurements and some of which are simply tuned in the modeling process. Briefly, the dimensions of the somata and dendrites of individual types of neurons are based on measurements from Golgi impregnations of turtle visual cortex [4], [5]. Biophysical parameters for each cell type are measured with in vivo intracellular recording methods [14], [15]. The physiology of each type of synapse included in the model is known from in vitro intracellular recording experiments [13], [17]. The kinetics of individual types of voltage-gated channels have not been characterized with voltage clamp methods in turtle visual cortex, so the parameters needed to implement Hodgkin–Huxley-like kinetic schemes are obtained from work on mammalian cortex and constrained by comparing the firing patterns of model cells to real cells following intracellular current injections.

Each neuron is represented by a multiple compartmental model with 3 to 29 compartments based on its morphology (see Figure 12.8). As noted in the introduction, the visual cortex of freshwater turtles contains three layers; our model assumes the three layers are projected onto a single plane (see Figure 12.4). Each compartment is modeled by a standard membrane equation and implemented in GENESIS [3]. The membrane equation is written using a set of ordinary differential equations described as follows:

dV_i(t)/dt = −(1/C_i) [ (V_i(t) − E_r)/R_i + Σ_j (V_i(t) − V_j(t))/R_ij + Σ_ion g_k (V_i(t) − E_k) + Σ_syn g_k (V_i(t) − E_k) + I_stim(t) ]    (12.1)

where V_i(t) is the time-dependent membrane potential of the ith compartment relative to the resting membrane potential, C_i is the total membrane capacitance of the ith compartment, R_i is the total membrane resistance of the ith compartment, and R_ij is the coupling resistance between the ith and jth compartments. The first summation in Equation (12.1) is over all the compartments linked to the ith compartment. The second summation is over all the species of ionic conductances present on the ith compartment. The third summation is over all the species of synaptic conductances present on the ith compartment. I_stim(t) is a time-varying current injected into the ith compartment. Total resistances and capacitances are calculated from the geometry of the compartments and the biophysical parameters, R_m, C_m, and R_a, using standard relationships [3]. The somata are modeled as spherical compartments and the dendrites are modeled as cylindrical compartments. In Figure 12.8 we show the compartmental structures of the five types of cortical interneurons in the model cortex. For a detailed description of compartmental models see References [21] and [33].

FIGURE 12.8 (See color insert following p. 272.) Compartmental structures of cortical neuron models (lateral pyramidal, medial pyramidal, subpial, stellate, and horizontal cells) in the large-scale model of turtle visual cortex.

Maps of the spatial distribution of neurons in each of the three layers of the cortex are constructed from coronal sections through the visual cortex of a turtle. The maps are divided into an 8 × 56 array of rectangular areas, each measuring 28 × 190 μm. Experimental data are not available for each of the 8 × 56 rectangular boxes and are interpolated at locations where measurements are not available. An algorithm is developed in MATLAB™ to construct an array of neurons in each layer that preserves the ratios of cells between layers in the real cortex, while retaining the information about the relative densities of cells in the visual cortex of a real turtle. The cells are distributed between the 8 × 56 blocks according to the actual density information. Within each block, the cell coordinates are chosen randomly from a uniform distribution, independently for every block. This algorithm is convenient to use because it can generate as many different models as needed. The model in our study has 368 lateral pyramidal cells, 311 medial pyramidal cells, 44 subpial cells, 45 stellate cells, and 20 horizontal cells (see Figure 12.4).

Biophysical data are not available for neurons in the dorsal lateral geniculate complex of turtles, so geniculate neurons are modeled as single isopotential compartments with a spike-generating mechanism. The number of geniculate neurons in the model is L = 201. The LGN neurons are arranged linearly along the lateral edge of the cortex with axons extending to the cortex (see Figure 12.6). The geometry of the geniculate afferents and their spatial distribution are based on anatomical data from Reference [17]. Geniculate axons are modeled as delay lines that extend across the cortex from lateral to medial. Geniculate afferents enter the cortex at its lateral edge, cross over each other, and then run in relatively straight lines from lateral to medial cortex. The axons of the most rostral (right) and most caudal (left) LGN neurons in the array extend to the caudal and rostral poles of the cortex, respectively. The rostrocaudal axis of the geniculate is consequently mapped along the caudorostral axis of the cortex. The other afferents¹ are evenly spaced between these two axons. The number of synaptic sites (varicosities) assigned to each geniculate afferent is calculated by multiplying the length of the axon by the average number of varicosities per 100 μm of axon length. The spatial positions of the individual varicosities (a total of approximately 11,300 varicosities has been used) are assigned to axons using the distribution of varicosities along the lengths of real axons [17]. The distribution is strongly skewed to the left, indicating a greater number of varicosities in the lateral than in the medial part of the visual cortex.

¹ An afferent nerve carries impulses toward the central nervous system. The opposite of afferent is efferent.

The geometry of the geniculocortical and intracortical interconnections is known in detail [6]. Moreover, there is some information on the basic shape and dimensions of the axonal arbors of subpial, stellate, and horizontal cells from Golgi preparations. These data are used to estimate spheres of influence between subpial, stellate, and horizontal cells and their postsynaptic targets. For cortico-cortical connections, we have constructed spheres of influence, the details of which have been explained in Reference [32]. Roughly speaking, a cortical neuron will be connected to any other cell in the cortex within its sphere of influence. The synaptic strengths are higher in the center of influence and are linearly reduced with the distance. The axons are not modeled as compartments but as delay lines. Propagation times between neurons are calculated using the distance between a pair of neurons and conduction velocities. The conduction velocity for geniculate afferents in turtle visual cortex has been measured at 0.18 m/s [5]. Cortico-cortical connections are given conduction velocities of 0.05 m/s, consistent with measurements of propagating waves in the turtle visual cortex [27] and with the conduction velocities for axons of inhibitory interneurons in rat cortex [26].

12.3 Generation of Activity Waves in the Visual Cortex

We have already seen that a group of neurons in the turtle visual cortex has the ability to sustain a traveling wave. Waves are typically generated in the pyramidal cells as a result of an external input current that results in an increase in membrane potential, producing a region of neural activity that tends to propagate in all directions. Typically this wave results as an interaction between a feed-forward and a feedback circuit (see Figure 12.5). Roughly speaking, the feed-forward circuit controls the origination and propagating speed of the traveling wave, and the feedback circuit controls the propagation duration. In our model, the feed-forward circuit incorporates inhibitory actions from the stellate and subpial cells. There are also inhibitory actions that inhibit the wave using a feedback circuit due to three different cells: subpial, stellate, and horizontal. Although the precise roles of the two inhibitory cells are different and somewhat unclear, they control the timing of wave generation. The feedback inhibition reduces and eventually kills the neuronal activity at the spot where the activity is greatest. Left unabated, the pyramidal cells would excite the entire cortex, an undesirable property. The combined effect of the two circuits gives the appearance of a traveling wave.

Using the large-scale model of the visual cortex that consists of the excitatory and inhibitory cells described above, we have observed that the neural population remains hyperpolarized (i.e., maintains a very low membrane potential) long after the initial wave has been killed. Eventually the waves are killed by a strong GABA (a type of synaptic input)-initiated inhibition that originates after a long delay. After about 700 ms, the first round of waves has been inhibited and the pyramidal cells are hyperpolarized. The cortex remains unresponsive to future visual inputs. One way to remedy this problem is to detect this period of hyperpolarization and increase the synaptic interaction between the excitatory pyramidal cells, forcing these cell populations to get out of hyperpolarization. This would amplify the tiny input into the pyramidal cells. Fortunately, this is achieved successfully using Hebbian and anti-Hebbian adaptation.

In Hebbian adaptation, the synaptic strength between two cells increases in proportion to the product of the pre- and postsynaptic activities. Likewise, in anti-Hebbian adaptation, the synaptic strength between two cells decreases in proportion to the product of the pre- and postsynaptic activities. In our model, the excitatory interconnection between pyramidal cells is chosen to be anti-Hebbian. This produces increasingly larger synaptic weights between pyramidal cells once the waves have been abated. The inhibitory interactions between the stellate/subpial/horizontal cells and the pyramidal cells are chosen to be Hebbian. These produce increasingly stronger inhibition to active pyramidal cells (see Figure 12.5).

In Figure 12.9 we show anti-Hebbian action on the pyramidal cells. Rows 1a and 2a show wave activity as a function of time. Once the waves have been abated and the pyramidal cells are hyperpolarized, the weights between the cells are very large, as indicated by the red lines in rows 1b and 2b of Figure 12.9. A subsequent input causes a second round of waves (not shown in the figure).

In summary, we outline in this section how cortical cells have the ability to generate and maintain a wave of activity. We show, using simulation of the model cortex, how Hebbian and anti-Hebbian adaptation has been used in generating a series of cortical waves. We do not claim to have established a biological role of the Hebbian/anti-Hebbian adaptation in the wave-generation process observed in actual turtles. An important result, outlined in this chapter, is that the waves encode target information from the visual scene. In later sections we show how these waves encode the location of targets in the visual space.
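To make the membrane dynamics of Equation (12.1) concrete, the following Python sketch integrates a passive two-compartment version of the equation with forward Euler. It is only an illustration of the equation's structure: all numerical values are placeholder assumptions, the conductance sums are set to zero, and the actual model is implemented in GENESIS with measured biophysical parameters:

```python
import numpy as np

# Passive two-compartment sketch of Equation (12.1): soma + one dendrite.
dt  = 1e-5                        # integration step [s]
Er  = 0.0                         # resting potential (V measured relative to rest)
C   = np.array([1e-10, 2e-10])    # membrane capacitances C_i [F] (assumed values)
R   = np.array([1e8, 5e7])        # membrane resistances R_i [ohm] (assumed values)
Rij = 5e7                         # coupling resistance between the compartments

V = np.zeros(2)                   # V_i(t), starting at rest
for step in range(int(0.05 / dt)):                # 50 ms of simulated time
    t = step * dt
    # Current pulse into compartment 0 between 10 and 20 ms.  The sign follows
    # the printed convention of Eq. (12.1), where I_stim enters inside the
    # bracket, so a depolarizing pulse is entered with a negative value here.
    I = np.array([-2e-10 if 0.01 < t < 0.02 else 0.0, 0.0])
    dV = np.empty(2)
    for i in range(2):
        j = 1 - i                                 # the single linked compartment
        leak     = (V[i] - Er) / R[i]             # membrane leak term
        coupling = (V[i] - V[j]) / Rij            # axial current to neighbor
        # Ionic and synaptic conductance sums of Eq. (12.1) are omitted (zero).
        dV[i] = -(leak + coupling + I[i]) / C[i]
    V += dt * dV
print(V)   # small residual depolarization remaining after the pulse has decayed
```

The step size is chosen well below the membrane time constant R_i C_i (about 10 ms with the values above), which keeps the explicit Euler scheme stable; GENESIS uses its own, more robust integration machinery.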

Neurons at each cortical locus are activated by visual stimuli presented at every point in the binocular visual space. The role of the visual pathway prior to the cortex is essentially filtering the visual signal although the role of the cortex is somewhat more involved. although the latency and shape of the response waveforms vary as the stimulus is presented at different loci in the visual space. Neurons at adjacent points in the cortex are activated by stimuli presented at adjacent regions of the visual space. In the second method.1 Introduction In this chapter our goal is to describe modeling and estimation problems that arise in the animal visuomotor pathway. and finally actuate a suitable motor action. which we describe presently. The chapter concludes with a discussion of the motor control problem and how the cortical waves play a leading role in actuating movements that would track a moving target with some level of evasive maneuvers. Consequently. This suggests that there may not be a simple map of the coordinates of the visual space to the coordinates of the visual cortex in turtles. Extracellular recordings from the visual cortex of freshwater turtles produce a different result [16]. Turtles anticipate the future position of a moving target by solving a motion prediction problem—a task that is believed to be initiated in the visual cortex. The representation is carried out. acquire and internally represent images of the target. Discrimination is carried out assuming that the noise is additive and Gaussian.368 Modeling and Control of Complex Systems The first method utilizes statistical detection wherein the activity waves generated by the visual cortex are encoded using principal components analysis. there is a continuous but deformed map of the coordinates of visual space to the coordinates of the cortex.” is sufficiently different from each other for alternative locations of point targets in the visual space. we show that the representation of the activity waves. over a sequence of sliding windows. first in the spatial domain and subsequently in the temporal domain. 12. Mammals have a cerebral cortex that embodies several topographically organized representations of visual space. Visual inputs to the retina are routed through the geniculate before it hits the cortex (see Figure 12. Extracellular recordings show that neurons in a restricted region of visual cortex are activated when a visual stimulus is presented to a restricted region of the visual space. Each beta strand corresponds to a suitable initialization of the dynamic system and the states of attraction correspond to various target locations.2).1 we show the tracking maneuver of a freshwater turtle as it strives to capture a moving fish. viewed as a “beta strand. such as capturing the target. The pathway is particularly adept at tracking targets that are moving in space. the classical receptive field of the neuron [7]. the beta strands (β-stands) are discriminated using a nonlinear dynamic system with multiple regions of attraction. In Figure 12. Using the model cortex. Position in the visual space is perhaps represented in a form other than a retinotopic .

) Kinematic analysis of turtle prey capture. Selected movie frames at the top of the figure show a turtle orienting to a moving fish (arrow) in frames 01 to 82.3 for a visualization of the wave propagation in a model cortex). [24]. [28]. Individual waves could be represented as a weighted sum of as few as three eigenvectors which are functions of the coordinates of the cortex. known as Karhunen–Loeve decomposition. Interestingly. Head angle (α) 2. Prey angle (β) 3. presentation of different . The waves have been analyzed using a variant of the principal components method. [30] have supported this viewpoint. extending and turning its neck (133 to 135) and capturing the fish (138).1 (See color insert following p. 272. [29]. Distance from snout to prey X Reference Point (RP) FIGURE 12. Distance from RP to snout 4. The bottom image shows the digitization points of the kinematic analysis. map.Modeling and Estimation Problems in the Visuomotor Pathway 369 01 24 60 82 100 117 130 133 135 138 Y Prey α β Digitization Points for Kinematic Analysis 1. moving toward it (100 to 130). These waves have been demonstrated using both multielectrode arrays and voltage-sensitive dyes [23]. Experiments conducted by Senseman and Robbins [27]. Both methods detect the activity of populations of pyramidal cells [28]. They used voltage-sensitive dye methods to show that presentation of a visual stimulus to the retina of an in vitro preparation of the turtle eye and brain produces a wave of depolarization that propagates anisotropically across the cortex (see Figure 12.

2 (See color insert following p.) The visual pathway in the turtle visual system from eyes to visual cortex.370 Modeling and Control of Complex Systems Cortex Visual Input Eye Retina Ho LGN b a Visual Streak a rizo Axi ntal s b c c d Lateral Medial Subpial Stellate Horizontal d L C R FIGURE 12. 272.) A traveling wave of cortical activity from the model cortex without Hebbian and anti-Hebbian adaptation.3 (See color insert following p. 1 ms 90 ms 160 ms 15 0 220 ms 400 ms 500 ms –15 –30 580 ms 760 ms 880 ms –45 –60 FIGURE 12. 272. .

The cortex also contains at least three populations of inhibitory interneurons. but also across olfactory. and stellate cells. Both geniculate and pyramidal cells are excitatory. Geniculate neurons project excitatory contacts onto pyramidal cells. [32]. subpial cells. visual. and is divided into lateral and medial parts. such as spots of light at different points in the visual space. Turtle visual cortex has three layers: an outer layer 1.) Distribution of cells in each of the three layers of the turtle cortex projected on a plane. Pyramidal cells (including lateral and medial pyramidal cells) have somata situated in layer 2 and are the source of efferent projections from the cortex. and the horizontal cells (situated in layer 3). an intermediate layer 2. Pyramidal cells give rise to 1600 μm 800 0 0 Lateral Medial Subpial Stellate Horizontal M C L 800 Left μm Center Right 1600 R FIGURE 12.Modeling and Estimation Problems in the Visuomotor Pathway 371 visual stimuli. the stellate (situated in the inner half of layer 1). and an inner layer 3. [21]. 272. Interactions among these five types of cells involve two types of effects: excitatory and inhibitory. [20]. . Subsequent research work has provided abundant evidence that the traveling electrical waves are observed not only in turtle visual cortex [24]. produce waves that have different representations in the three-dimensional eigenspace.4 for a model cortex). The large-scale model described in this chapter contains geniculate neurons in the dorsal lateral geniculate complex of the thalamus and the five major populations of neurons in the visual cortex (see Figure 12. the subpial (situated in the outer half of layer 1). Propagating waves with comparable properties can be produced in a large-scale model of turtle visual cortex that contains geniculate and cortical neurons [9]. This raises the possibility that visual information is coded in the spatiotemporal dynamics of the cortical waves. and visuomotor areas of the cortex in a variety of species [10].4 (See color insert following p. The lateral geniculate (LGN) cells are distributed linearly (shown at the right side of the bottom edge of the cortex) and the solid line shows how they interact with cells in the cortex.

excitatory inputs to the inhibitory interneurons as well as neighbor pyramidal cells.5B. The subpial (SP). The pyramidal cells (PYR) are excitatory. and horizontal cells are inhibitory and provide inhibitory inputs to pyramidal cells. The distinction between medial and lateral pyramidal cells is not made in this diagram. left part) involves the geniculate inputs that make excitatory contacts on subpial. The geniculate afferents (GA) provide excitatory input to cells in both pathways.372 Modeling and Control of Complex Systems Stellate Horizontal Input LGN Pyramidal Subpial Excitation connection Inhibition connection Anti−Hebbian connection Hebbian connection (a) SP GA PYR ST Feed Forward (b) GA SP ST PYR H Feedback FIGURE 12. Figure 12. The axons of pyramidal cells leave the visual cortex to other cortical areas and to the brainstem in both pathways. A feed-forward pathway (Figure 12. . Subpial. and horizontal (H) cells are inhibitory.5b). Subpial and stellate cells also involve recurrent connections to neighbor cells. Each box symbolizes a population of cells. The five types of cells can be thought of as forming two anatomically defined pathways within the cortex (Figure 12. stellate. Interconnection between neurons in various layers of the visual cortex is shown.5 Cortical circuit of freshwater turtles. stellate (ST).5a shows the interconnections among the cortical neurons in the large-scale cortex model.

2. and horizontal cells. The subpial and stellate cells make inhibitory contacts on the lateral pyramidal cells. stellate. Each of these populations of inhibitory interneurons make inhibitory contacts on pyramidal cells. striatum. turtle vision has a greater acuity across the horizontal axis (along the surface of water for a freely swimming turtle) in comparison to the vertical axis (above and below the water surface). . Both the lateral and medial pyramidal cells give rise to efferent connections to the thalamus. which make excitatory contacts on subpial. and lateral pyramidal cells. there are inhibitory contacts between individual subpial cells as well as between individual stellate cells. and 12. Visual input from the LGN to the cortex is not retinotopic. The retinal ganglion cells are densely distributed around a horizontal axis called the visual streak. In addition. and brainstem. 12. right part) involves the recurrent collateral of both lateral and medial pyramidal cells. A feedback pathway (Figure 12. Lateral pyramidal cells give rise to excitatory recurrent contacts on the other lateral and medial pyramidal cells. sparking the generation of a wave.2. Only 13 out of 201 LGN neurons are shown for clarity.4.6. The retinal inputs are redistributed “retinotopically” on the lateral geniculate nucleus (LGN).5 mm R FIGURE 12. Thus.6 Linear arrangement of geniculate neurons. These lines cross over. In fact. The retino-cortical pathway has been sketched in Figure 12. M C L 0.5b. The precise functional role of the LGN is not entirely known and has not been detailed here.Modeling and Estimation Problems in the Visuomotor Pathway 373 stellate. The somata are shown as boxes and the corresponding axons are shown as lines. The LGN receives feed-forward inputs from the retina and feedback inputs from the cortex. giving rise to an intense activity at the rostro-lateral part of the cortex. inputs from the LGN are spatially distributed across the cortex and the axons are shown as lines in Figures 12.

272. (b) Probable neural substrate for motion extrapolation in turtles.7. and the preteactum. A direct input from the retina is fused with an input from the cortex at the tectum. PT. the dimensions of the somata and dendrites of individual types of neurons are based on measurements from Golgi impregnations of turtle visual cortex [4].7 (See color insert following p. Biophysical parameters for each cell type are measured with in vivo intracellular recording methods [14]. is an evolutionary process and involves numerous parameters. in general.) Prey capture and motion extrapolation. namely the striatum. The visuo-cortical complex is part of an overall visuomotor pathway. sketched in Figure 12. The animal is able to make prediction based on long-term visual data and correct the prediction based on “somewhat recent” target location. Visual input converges onto the optic tectum via two separate routes. pretectum. Briefly. substantia nigra). (a) To capture a moving fish. The physiology of each type of synapse included in the model is known from in vitro . a turtle must extrapolate its future position.y. the substantia nigra. The intermediate stages of the cortical input. RF. lateral geniculate nucleus. some of which are obtained by physiological measurements and some of which are simply tuned in the modeling process.2 Multineuronal Model of the Visual Cortex In this section. 12.t) (a) (b) FIGURE 12. (Abbreviations: LGN. Modeling. [15]. reticular formation. t3 Modeling and Control of Complex Systems Striatum Cortex SN PT LGN Tectum RF Retina Movement I (x. SN. t1 x2. The exact mechanism of sensor fusion at the tectum is a subject of future research.374 x1. are not relevant for this discussion. The tectum is responsible for predicting future locations of moving targets by processing cortical waves and fusing more recent visual inputs from the retina. For a comprehensive description of the computational model we would like to refer to Reference [21]. we give a description of the large-scale model of the turtle visual cortex. [5]. t2 x3.

Moreover. and horizontal cells from Golgi preparations. stellate. Each compartment is modeled by a standard membrane equation and implemented in GENESIS [3]. The kinetics of individual types of voltage-gated channels have not been characterized with voltage clamp methods in turtle visual cortex. Each neuron is represented by a multiple compartmental model with 3 to 29 compartments based on its morphology (see Figure 12. .8 (See color insert following p.Modeling and Estimation Problems in the Visuomotor Pathway 375 intracellular recording experiments [13]. 272.8). and horizontal cells and their postsynaptic targets.) Compartmental structures of cortical neuron models in the large-scale model of turtle visual cortex. The geometry of the geniculocortical and intracortical interconnections are known in detail [6]. stellate. These data are used to estimate spheres of influence between subpial. [17]. The membrane equation is written Lateral Medial Subpial Lateral Medial Subpial Stellate Horizontal Stellate Horizontal FIGURE 12. Our model assumes the three layers are projected onto a single plane (see Figure 12. the visual cortex of freshwater turtles contains three layers. So the parameters needed to implement Hodgkin– Huxley-like kinetic schemes are obtained from work on mammalian cortex and constrained by comparing the firing patterns of model cells to real cells following intracellular current injections. there is some information on the basic shape and dimensions of the axonal arbors of subpial. As noted in the introduction.4).

each measuring 28 × 190 μm. 311 medial pyramidal cells. The . Geniculate axons are modeled as delay lines that extend across the cortex from lateral to medial. 45 stellate cells. Ri is the total membrane resistance of the ith compartment. 44 subpial cells. The second summation is over all the species of ionic conductances present on the ith compartment. Cm . The third summation is over all the species of synaptic conductances present on the ith compartment. Maps of the spatial distribution of neurons in each of the three layers of the cortex are constructed from coronal sections through the visual cortex of a turtle.1) is over all the compartments linked to the ith compartment. The cells are distributed between 8 × 56 blocks according to the actual density information. while retaining the information about the relative densities of cells in the visual cortex of a real turtle. The axons are not modeled as compartments but as delay lines. Ci is the total membrane capacitance of the ith compartment. The maps are divided into an 8 × 56 array of rectangular areas. In Figure 12. Within each block. For a detailed description of compartmental models see References [21] and [33].8 we show the compartmental structures of the five types of cortical interneurons in the model cortex. and Ra using standard relationships [3]. The first summation in Equation (12.4). Total resistances and capacitances are calculated from the geometry of the compartments and the biophysical parameters. This algorithm is convenient to use because it can generate as many different models as needed. and 20 horizontal cells (see Figure 12. Biophysical data are not available for neurons in the dorsal lateral geniculate complex of turtles. independently for every block. The number of geniculate neurons in the model is L = 201. Istim (t) is a time-varying current injected into the ith compartment. so geniculate neurons are modeled as single isopotential compartments with a spike-generating mechanism. The somata are modeled as spherical compartments and the dendrites are modeled as cylindrical compartments.1) gk (Vi (t) − E k ) + syn gk (Vi (t) − E k ) + Istim (t) ⎦ where Vi (t) is the time-dependent membrane potential of the ith compartment relative to the resting membrane potential. the cell coordinates are chosen randomly from a uniform distribution. An algorithm is developed in MATLABTM to construct an array of neurons in each layer that preserves the ratios of cells between layers in the real cortex. and Ri j is the coupling resistance between the ith and jth compartments. Rm .376 Modeling and Control of Complex Systems using a set of ordinary differential equations described as follows: ⎡ 1 ⎣ (Vi (t) − Er ) d Vi (t) + =− dt Ci Ri + ion j (Vi (t) − Vj (t)) Ri j ⎤ (12. Experimental data are not available for each of the 8 × 56 rectangular boxes and are interpolated at locations where measurements are not available. The model in our study has 368 lateral pyramidal cells.

The LGN neurons are arranged linearly along the lateral edge of the cortex with axons extending into the cortex (see Figure 12.5). The geometry of the geniculate afferents and their spatial distribution are based on anatomical data from Reference [17]. Geniculate afferents enter the cortex at its lateral edge, cross over each other, and then run in relatively straight lines from lateral to medial cortex. The axons of the most rostral (right) and most caudal (left) LGN neurons in the array extend to the caudal and rostral poles of the cortex, respectively; the rostrocaudal axis of the geniculate is consequently mapped along the caudorostral axis of the cortex. The other afferents¹ are evenly spaced between these two axons.

The number of synaptic sites (varicosities) assigned to each geniculate afferent is calculated by multiplying the length of the axon by the average number of varicosities per 100 μm of axon length. The spatial positions of the individual varicosities (a total of approximately 11,300 varicosities has been used) are assigned to axons using the distribution of varicosities along the lengths of real axons [17]. The distribution is strongly skewed to the left, indicating a greater number of varicosities in the lateral than in the medial part of the visual cortex.

For cortico-cortical connections, we have constructed spheres of influence, the details of which have been explained in Reference [32]. Roughly speaking, a cortical neuron will be connected to any other cell in the cortex that lies within its sphere of influence. The synaptic strengths are highest at the center of influence and are linearly reduced with distance. Propagation times between neurons are calculated using the distance between a pair of neurons and the conduction velocities. The conduction velocity for geniculate afferents in turtle visual cortex has been measured at 0.18 m/s [5]. Cortico-cortical connections are given conduction velocities of 0.05 m/s, consistent with measurements of propagating waves in the turtle visual cortex [27] and with the conduction velocities for axons of inhibitory interneurons in rat cortex [26].

¹ An afferent nerve carries impulses toward the central nervous system; the opposite of afferent is efferent.
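The connectivity rule just described reduces to a simple computation; here is a hedged sketch of it (function and parameter names are our own, not from Reference [32]).

```python
import numpy as np

V_CORTICAL = 0.05    # m/s, cortico-cortical conduction velocity [26], [27]
V_GENICULATE = 0.18  # m/s, geniculate afferent conduction velocity [5]

def connect(pre_xy, post_xy, radius, w_max, velocity=V_CORTICAL):
    """Return (weight, delay) if `post` lies inside `pre`'s sphere of
    influence, else None.  Positions are in meters; the weight is highest
    at the center and falls off linearly, and the delay would be realized
    as a delay line in the network model."""
    d = np.linalg.norm(np.asarray(post_xy) - np.asarray(pre_xy))
    if d > radius:
        return None
    weight = w_max * (1.0 - d / radius)
    delay = d / velocity
    return weight, delay
```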

12.3 Generation of Activity Waves in the Visual Cortex

We have already seen that a group of neurons in the turtle visual cortex has the ability to sustain a traveling wave. In summary, we outline in this section how cortical cells generate and maintain a wave of activity. Waves are typically generated in the pyramidal cells as a result of an external input current that produces an increase in membrane potential. Typically the wave results from an interaction between a feed-forward and a feedback circuit (see Figure 12.6): roughly speaking, the feed-forward circuit controls the origination and propagation speed of the traveling wave, and the feedback circuit controls the propagation duration. Pyramidal cells locally excite each other, resulting in a region of neural activity that tends to propagate in all directions. In our model, the feed-forward circuit incorporates inhibitory actions from the stellate and subpial cells. There are also inhibitory actions that inhibit the wave through a feedback circuit due to three different cell types: subpial, stellate, and horizontal. These produce increasingly stronger inhibition to active pyramidal cells (see Figure 12.5). The feedback inhibition reduces and eventually kills the neuronal activity at the spot where the activity is greatest. The combined effect of the two circuits gives the appearance of a traveling wave. Although the precise roles of the inhibitory cell classes are different and somewhat unclear, they control the timing of wave generation. Left unabated, the pyramidal cells would excite the entire cortex, an undesirable property. Eventually the waves are killed by a strong GABA (a type of synaptic input)-initiated inhibition that originates after a long delay.

Using the large-scale model of the visual cortex consisting of the excitatory and inhibitory cells described above, we have observed that the neural population remains hyperpolarized (i.e., maintains a very low membrane potential) long after the initial wave has been killed, so that the cortex remains unresponsive to future visual inputs. One way to remedy this problem is to detect this period of hyperpolarization and increase the synaptic interaction between the excitatory pyramidal cells, forcing these cell populations to come out of hyperpolarization. This would amplify the tiny input into the pyramidal cells. This is achieved successfully using Hebbian and anti-Hebbian adaptation. In Hebbian adaptation, the synaptic strength between two cells increases in proportion to the product of the pre- and postsynaptic activities. Likewise, in anti-Hebbian adaptation, the synaptic strength between two cells decreases in proportion to the product of the pre- and postsynaptic activities. In our model, the excitatory interconnections between pyramidal cells are chosen to be anti-Hebbian, and the inhibitory interactions between the stellate/subpial/horizontal cells and the pyramidal cells are chosen to be Hebbian. We do not claim to have established a biological role of the Hebbian/anti-Hebbian adaptation in the wave-generation process observed in actual turtles. We show, using simulation of the model cortex, how Hebbian and anti-Hebbian adaptation can be used to generate a series of cortical waves. An important result, outlined in this chapter, is that the waves encode target information from the visual scene; in later sections we show how these waves encode the location of targets in the visual space.

In Figure 12.9 we show the anti-Hebbian action on the pyramidal cells. Rows 1a and 2a show wave activity as a function of time. After about 700 ms, the first round of waves has been inhibited and the pyramidal cells are hyperpolarized. This produces increasingly larger synaptic weights between pyramidal cells once the waves have abated, as indicated by the red lines in rows 1b and 2b of Figure 12.9; the weights between the cells become very large. A subsequent input causes a second round of waves (not shown in the figure).
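The two adaptation rules, as stated in the text, amount to sign-opposite outer-product updates. The sketch below is only an illustration of the definitions; the learning rate and activity traces are assumed, not taken from the model.

```python
import numpy as np

def hebbian_update(w, pre, post, eta=1e-3):
    """Inhibitory (stellate/subpial/horizontal -> pyramidal) synapses:
    strength increases in proportion to the product of pre- and
    postsynaptic activities."""
    return w + eta * np.outer(post, pre)

def anti_hebbian_update(w, pre, post, eta=1e-3):
    """Excitatory pyramidal-to-pyramidal synapses: strength decreases in
    proportion to the product of pre- and postsynaptic activities."""
    return w - eta * np.outer(post, pre)
```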

FIGURE 12.9 (See color insert following p. 272.) Pyramidal-to-pyramidal anti-Hebbian synaptic response to changes in the pyramidal activity. (1a): Frames of pyramidal cell activity due to a pulse input to the LGN at 0 ms, lasting for 150 ms. (1b): Frames of synaptic weight responses corresponding to the activities in (1a). (2a): Frames of pyramidal cell activity due to a pulse input to the LGN at 400 ms, following the first pulse, lasting for 150 ms. (2b): Frames of synaptic weight responses corresponding to the activities in (2a). (Frames shown at 100 ms through 800 ms; axis residue omitted.)

12.4 Simulation with a Sequence of Stationary Inputs

The stationary stimulus has been simulated by presenting a 150-ms square current pulse to a set of adjacent geniculate neurons (see Figure 12.10). For the purpose of our simulation, three equidistant positions of the stimuli have been chosen across the LGN; each input goes into 20 LGN neurons, namely 1–20, 91–110, and 181–200, from left to right along the LGN array. The stimuli are labeled “Left,” “Center,” and “Right,” respectively (see Figure 12.10).

To study the encoding property of the large-scale cortex model, noise has been introduced into the model by injecting randomly generated currents, drawn from a Gaussian distribution, into the somata of the cortical neurons.

FIGURE 12.10 Simulation of flash inputs. A 150-ms, 0.3-nA square current pulse is applied to the Left, Center, or Right cluster of LGN neurons. (Current-versus-time plot omitted.)

Without Hebbian and anti-Hebbian adaptation, the model cortex produces propagating waves of activity that last for about 600 to 800 ms (see Figure 12.3) with the stationary inputs described above. After the wave has propagated, the cortical neurons remain hyperpolarized and unresponsive to any future inputs. In this section we claim that by introducing Hebbian/anti-Hebbian adaptation we are able to pull the model cortex out of hyperpolarization after the first wave has propagated. With the implementation of Hebbian/anti-Hebbian adaptation, one obtains a model that responds to the activities of the pyramidal cells by altering intercellular synaptic interactions to compensate for the hyperpolarization of the membrane potential: the synaptic interactions between pyramidal cells are stronger in the model cortex with adaptation than in the model without adaptation. This results in a strong amplification of tiny inputs into the pyramidal cells. Among the many consequences of adaptation, an important one is that the duration of wave propagation is shortened from about 800 ms (see Figure 12.3) to less than 400 ms (see Figure 12.11). After the end of the first round of waves (around 450 ms in Figure 12.11), an input initiated around 500 ms results in a second wave (see Figure 12.11).

The model cortex with adaptation described in this section thus samples the visual world every 500 ms by producing a wave of cortical activity that lasts a little less than 400 ms. Consequently, the model is expected to generate a sequence of activity waves corresponding to a sequence of visual inputs. Using waves generated by inputs at different locations in the visual space, we are able to study the encoding property of the model cortex (see Reference [9]). In order to study the ability of the model cortex to encode a sequence of consecutive events, each of the 500-ms time intervals will be called a period.

In each period, a target is located at one of three different locations: Left (L), Center (C), or Right (R). A target is shown only for the first 150 ms of each period and is removed subsequently. At any given period, one would like to detect the “target location” from the associated cortical wave observed during the same period. In order to ensure that one period of cortical activity does not “spill over” into the next period, one would like the detection results at any given period to be independent of the prior locations of the target. In the simulations that we have carried out, we consider a pair of consecutive periods. This gives rise to a total of nine pairs of target locations for the two consecutive periods, given by LL, LC, LR, CL, CC, CR, RL, RC, and RR. Each combination of two locations can be simulated as an input by presenting two 150-ms square current pulses that start at 0 ms and 500 ms, respectively, applied to the corresponding sets of adjacent geniculate neurons.

FIGURE 12.11 (See color insert following p. 272.) Two cortical waves generated by the model cortex using Hebbian and anti-Hebbian adaptation with two consecutive inputs. (Frames shown at 100 ms through 950 ms; axis residue omitted.)

Each of the nine inputs causes the model cortex (with adaptation) to produce a pair of activity waves, one in each of the two consecutive periods. Considering a noisy model of the cortex, repeated presentation of the same stimulus does not in general produce the same response; the simulation is therefore repeated 20 times for each of the nine input pairs, giving rise to a set of 180 pairs of activity waves. The overall simulation time is set to 1000 ms.

The simulation results, consisting of the membrane potentials of the individual neurons, are recorded and saved in a data file. Even though the data for all neurons are available, we are primarily interested in the pyramidal neurons. The responses of pyramidal neurons are visualized as movies by spatially resampling the data from the nonuniform grid of neuron coordinates to an artificially constructed l × l uniform grid. The program uses triangle-based linear interpolation, although other methods are also available (triangle-based cubic interpolation, nearest neighbor interpolation, and so forth) [20]. The interpolated data, for visualization, are denoted by I(x, y, t), where the pair (x, y) denotes the pixels; the responses of the pyramidal cells themselves are denoted by I(t, n), 1 ≤ n ≤ N, where t is time and n is the index of the pyramidal neuron. The value of the membrane potential at each pixel is color coded, and the spikes are not removed in the process. Selected snapshots from movies corresponding to stationary stimuli (assuming a model cortex without adaptation) are shown in Figure 12.3. A comparison between model waves [20] and experimental waves recorded by Senseman [28] shows that the two waves have similar features: they originate from the same point in the cortex (the rostrolateral edge) and they propagate in both the rostrocaudal and mediolateral directions.

The spatiotemporal response I(x, y, t), 0 ≤ t ≤ T, of the model cortex to different target locations can be viewed as a collection of movie frames (snapshots). Given that every frame has l × l pixels and every movie has m frames, it is clear that the dimension of I(x, y, t) could be very high (l × l × m). In order to compare two movies, in the next section we describe a principal components-based technique.

Principal components analysis has been introduced independently by many authors at different times. The method is widely used in various disciplines, such as image and signal processing, data compression, partial differential equations, fluid dynamics, weather prediction, and so forth [11]. The transformation itself is linear and represents a rotation of the coordinate system such that neighboring pixels in the new coordinate system are less correlated; this is equivalent to diagonalizing the image correlation matrix. In image processing, the method is used for removing redundancy (decorrelating pixels) from images [25]; the rotation proposed by the method is optimal in that it leads to a complete removal of the correlation between neighboring pixels, so that the image can be approximated in a low-dimensional subspace using only selected basis vectors, also called principal eigenvectors. This method has also been used earlier by Senseman and Robbins for the analysis of data recorded from the cortex of a freshwater turtle [29], [30]. Moreover, in the theory of partial differential equations, the method is useful for finding a separable approximation to the solution of a partial differential equation, which is optimal in the sense that it maximizes the kinetic energy cost functional [8].
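As a hedged illustration of the low-dimensional approximation described above (a sketch for small frames, not the chapter's pipeline), the following computes a rank-p PCA reconstruction of a movie stored with one flattened frame per row; `movie` and `p` are assumed names.

```python
import numpy as np

def pca_approximate(movie, p):
    """Rank-p principal-components approximation of a movie.
    `movie` has shape (m, l*l): m frames, each a flattened l-by-l image.
    Note the pixel correlation matrix is (l*l) x (l*l), so this sketch is
    only practical for small frames."""
    mean = movie.mean(axis=0)
    X = movie - mean
    C = X.T @ X / X.shape[0]                        # pixel correlation matrix
    evals, evecs = np.linalg.eigh(C)                # symmetric -> real spectrum
    basis = evecs[:, np.argsort(evals)[::-1][:p]]   # top-p principal eigenvectors
    coeffs = X @ basis                              # coordinates in the new basis
    return coeffs @ basis.T + mean                  # rank-p reconstruction
```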

Depending on the context, the method goes by the names Karhunen–Loeve (KL) decomposition, Hotelling decomposition, proper orthogonal decomposition, and singular value decomposition. We shall refer to it as the KL decomposition; it has already been applied to the analysis of cortical waves (see References [9], [21], [29], [30]). The next section describes some of the main ideas using a double KL decomposition, in which each segment of the cortical wave is mapped to a point in a suitably defined B-space, as a β-strand.

12.5 Encoding Cortical Waves with β-Strands Using Double KL Decomposition

In this section, we describe how the set of 180 pairs of activity waves (described in Section 12.4) is encoded, using double KL decomposition. The model cortex with adaptation is repeatedly simulated by adding independent and identically distributed Gaussian noise to each of its neurons; as a result of the additive noise injected into the cortical neurons, repeated presentation of the same stimulus does not in general produce the same response. We discuss how to utilize a two-step KL decomposition to analyze the cortical responses of the model cortex to various stimuli with injected noises, using the sliding detection window (SDW) technique. As shown in Figure 12.12, the time axis is covered by equal-length, overlapping, sliding encoding windows, and double KL decomposition is applied to the segment of the spike rate signal that is covered by each window. Both the starting and ending times of the windows slide over the time axis while the length of the window remains constant. Another encoding window technique, considered in Reference [9], is the expanding detection window (EDW), for which the starting time remains unchanged at 0 ms; in this section we describe only the SDW technique. Plotting the images of successive windows produces a sequence of points in the B-space, called the β-strand (see Figure 12.15 below).

FIGURE 12.12 Encoding window. A window covers the interval [t1 + 1, t1 + w]; a is the amount of time that the window slides and w is the width of each encoding window.

The β-strand is a vector-valued function of time and is an alternative way to encode the original movie as a strand.

FIGURE 12.13 Responses of model pyramidal cells. The traces in the left column show voltage traces from three model pyramidal cells; the traces in the right column show the smoothed spike rates of the same three cells. (Axes: time, 0–1200 ms; membrane voltage (V). Axis residue omitted.)

The encoding process is now described in detail. Prior to the double KL decomposition, the spike trains from the pyramidal cells are smoothed by a low-pass filter into a spike rate function; Figure 12.13 shows some examples of spike trains of pyramidal cells and their smoothed spike rates. We continue to use I(t, n), 0 ≤ t ≤ T, 1 ≤ n ≤ N, to denote the smoothed spike rate as a spatiotemporal response signal, where t is time and n is the index of the pyramidal neuron. For a particular input stimulus, I(t, n) can be viewed as a matrix: the tth row represents the spike rate of each neuron at time t in response to a particular stimulus, and the nth column corresponds to the pyramidal neuron n. Let the length of each time window be w, and let I(t, n), t1 + 1 ≤ t ≤ t1 + w, for t1 = 0, a, 2a, . . . , be the response signals for the different time windows, where a is the amount of time that the encoding window slides (see Figure 12.12).

The dimensionality of the cortical response is reduced by two KL transforms, first into A-space and then into B-space. We first describe the KL transform into A-space. Let M denote the total number of cortical response movies in response to stimuli in the left, center, and right visual fields. For the kth movie, k = 1, 2, · · · , M, the spatiotemporal signal in a given time window can be viewed as a collection of vectors \(\{I^k(t_1+1), I^k(t_1+2), \ldots, I^k(t_1+w)\}\), where \(I^k(t_1+i) \in \mathbb{R}^{1\times N}\). The covariance matrix \(C_1 \in \mathbb{R}^{N\times N}\) for a family of M movies is calculated as:

\[
C_1 = \frac{1}{Mw}\sum_{k=1}^{M}\sum_{i=1}^{w}\bigl(I^k(t_1+i)\bigr)^T\bigl(I^k(t_1+i)\bigr) \tag{12.2}
\]

where \((I^k(t_1+i))^T\) is the transpose of \(I^k(t_1+i)\). The matrix \(C_1\) is symmetric and positive semidefinite, so its eigenvalues are all real and nonnegative and the corresponding eigenvectors form an orthonormal basis in \(\mathbb{R}^N\). The eigenvectors corresponding to the largest p eigenvalues of \(C_1\) are called the principal eigenvectors, or modes. The time coefficients are given by

\[
\alpha_j^k(t_1+i) = \bigl\langle I^k(t_1+i),\,\phi_j^T\bigr\rangle,\qquad j = 1,\ldots,p,\quad i = 1,2,\ldots,w \tag{12.3}
\]

where \(\phi_j \in \mathbb{R}^{N\times 1}\) is the jth principal mode and \(\langle\cdot,\cdot\rangle\) stands for the standard inner product. The pth-order successive reconstruction of the spatiotemporal signal \(I^k(t) \in \mathbb{R}^{1\times N}\) is given by:

\[
\hat{I}^k(t_1+i) = \sum_{j=1}^{p}\alpha_j^k(t_1+i)\,\phi_j^T,\qquad i = 1,2,\ldots,w \tag{12.4}
\]

The coefficients \(\alpha_j^k(t_1+i)\) of the KL decomposition are uncorrelated in j, and we call \(\alpha_1^k(t), \alpha_2^k(t), \ldots, \alpha_p^k(t)\), \(t_1+1 \le t \le t_1+w\), the pth-order A-space representation of the movie segment within the corresponding time window for the kth movie. Figure 12.14 shows the first three principal modes and the corresponding time coefficients in a certain time window.

The vector function \((\alpha_1^k(t), \ldots, \alpha_p^k(t))\), \(t_1+1 \le t \le t_1+w\), can be viewed as a sample function of a vector random process. Statistical analysis of the random process can be facilitated if the process is further parameterized using a second KL decomposition. Let

\[
\gamma_j^k = \bigl[\alpha_j^k(t_1+1),\,\alpha_j^k(t_1+2),\,\ldots,\,\alpha_j^k(t_1+w)\bigr]^T,\qquad
\xi^k = \bigl[(\gamma_1^k)^T,\,(\gamma_2^k)^T,\,\ldots,\,(\gamma_p^k)^T\bigr]^T \tag{12.5}
\]
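A hedged sketch of this first (A-space) transform for one encoding window follows; the array layout and function name are assumptions for illustration only.

```python
import numpy as np

def kl_a_space(segments, p):
    """First KL transform for one encoding window.
    `segments` has shape (M, w, N): M movies, w time samples, N neurons.
    Returns the p principal modes phi, the time coefficients alpha
    (Equation (12.3)), and the stacked vectors xi^k (Equation (12.5))."""
    M, w, N = segments.shape
    X = segments.reshape(M * w, N)
    C1 = X.T @ X / (M * w)                          # Equation (12.2)
    evals, evecs = np.linalg.eigh(C1)
    phi = evecs[:, np.argsort(evals)[::-1][:p]]     # N x p principal modes
    alpha = segments @ phi                          # (M, w, p) coefficients
    xi = alpha.transpose(0, 2, 1).reshape(M, p * w) # stack gamma_j^k into xi^k
    return phi, alpha, xi
```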

FIGURE 12.14 (See color insert following p. 272.) The left-hand column shows the first three principal spatial modes (i = 1, 2, 3); the right-hand column shows the corresponding time coefficients.

Calculating the covariance matrix as in Equation (12.2), we have:

\[
C_2 = \frac{1}{M}\sum_{k=1}^{M}(\xi^k)^T(\xi^k) \tag{12.6}
\]

The coefficients \(\beta_j^k\) are found by orthogonal projection of \(\xi^k\) onto the jth eigenvector, \(\beta_j^k = \langle \xi^k, \psi_j^T\rangle\), and the qth-order successive approximation of the kth vector \(\xi^k\) is given by:

\[
\hat{\xi}^k = \sum_{j=1}^{q}\beta_j^k\,\psi_j^T \tag{12.7}
\]

where \(\psi_j\), j = 1, 2, · · · , q, are the eigenvectors corresponding to the largest q eigenvalues of the matrix \(C_2\). The β vector is referred to as the B-space representation of the cortical movie restricted to a given time window. It turns out that only a few β components capture most of the information contained in the original movie and the rest of the β components are close to zero. By discarding those components that are close to zero, we obtain a low-dimensional representation of the original movie segment. If, for each sliding encoding window, the first q components of each β vector are used, we say that the vector consisting of these q components is the qth-order B-space representation of the movie. Repeating the above data-processing procedure for all the sliding encoding windows of a movie produces a β-strand as a vector-valued function of time; we refer to this β-strand as the B-space representation of the movie. In the analysis of this section, we used w = 10, p = 679, q = 3, and M = 180, and the values of t1 were chosen to be 0, a, 2a, and so on. The statistical mean of the β-strands of the left-, center-, and right-stimuli movies can then be easily obtained.
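The second (B-space) transform is structurally identical to the first; the following minimal sketch (assumed names, illustration only) completes the pair.

```python
import numpy as np

def kl_b_space(xi, q):
    """Second KL transform: project each stacked vector xi^k onto the
    top-q eigenvectors of C2, yielding the q-th order B-space vectors."""
    M = xi.shape[0]
    C2 = xi.T @ xi / M                           # Equation (12.6)
    evals, evecs = np.linalg.eigh(C2)
    psi = evecs[:, np.argsort(evals)[::-1][:q]]  # top-q eigenvectors
    beta = xi @ psi                              # one beta vector per movie
    return psi, beta

# Sliding the encoding window across the movie and collecting the
# successive beta vectors yields the beta-strand for that movie.
```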

Figure 12.15 shows the mean β-strands for 60 presentations of stimuli at the left, center, and right clusters of geniculate neurons in the first time period and in the second time period.

FIGURE 12.15 (See color insert following p. 272.) Typical β-strands with double KL decomposition. In the left figure are the mean β-strands for the 60 presentations of stimuli at the left, center, and right clusters of geniculate neurons in the first time period; in the right figure are the mean β-strands in the second time period. The colors blue, red, and green represent the actual positions of stimuli at left, center, and right, respectively. (Axis residue omitted.)

12.6 Statistical Detection of Position

In this section, the problem of detection is posed as a hypothesis testing problem. Assume that the three positions of the target correspond to three different hypotheses; that is, let H1, H2, and H3 denote the hypotheses that the stimulus is from the left, center, and right clusters of geniculate neurons, respectively. Let us write:

\[
r(t) = s_i(t) + n(t),\qquad i = 1,2,3 \tag{12.8}
\]

where n(t) represents a vector-valued Gaussian noise process, with mean 0, contained in the β-strand.

12.6.1 Series Expansion of Sample Functions of Random Processes

The β-strand r(t) can be regarded as a sample function of a vector stochastic process. It is well known that a deterministic waveform with finite energy can be represented in terms of a series expansion; this idea can be extended to include sample functions of a random process as well. We propose to obtain a series expansion of the β-strand within a chosen detection window. This process involves finding a complete orthonormal set \(\{\phi_i(t),\ i \in \mathbb{N}\}\), where \(\mathbb{N}\) denotes the set of integers, and expanding r(t) in this set.

The expansion takes the form:

\[
r(t) = \mathop{\mathrm{l.i.m.}}_{L\to\infty}\;\sum_{i=1}^{L} r_i\,\phi_i(t),\qquad T_1 \le t \le T_2 \tag{12.9}
\]

where the \(\phi_i(t)\) are vectors of the same dimension as r(t). The coefficients \(r_i\) are required to be uncorrelated with each other; that is, if \(E[r_i] = m_i\), then we would like to have:

\[
E\bigl[(r_i-m_i)(r_j-m_j)\bigr] = \lambda_i\,\delta_{ij} \tag{12.10}
\]

where E is the expectation operator. In Equation (12.9), l.i.m. denotes “limit in the mean,” which is defined as:

\[
\lim_{L\to\infty} E\left[\sum_{k=1}^{q}\Bigl(r^k(t) - \sum_{i=1}^{L} r_i\,\phi_i^k(t)\Bigr)^2\right] = 0,\qquad T_1 \le t \le T_2 \tag{12.11}
\]

where \(r^k(t)\) and \(\phi_i^k(t)\) denote the kth components of the vectors r(t) and \(\phi_i(t)\), respectively. Let us recall from Section 12.5 that q is the number of β components we choose for the B-space representation of the cortical movies. The complete orthonormal set \(\phi_i(t)\) is the solution of the integral equation:

\[
\lambda_i\,\phi_i^k(t) = \sum_{j=1}^{q}\int_{T_1}^{T_2} K_{kj}(t,u)\,\phi_i^j(u)\,du,\qquad k = 1,2,\ldots,q,\quad T_1 \le t \le T_2 \tag{12.12}
\]

where \(K_{ij}(t,u) = E[n_i(t)\,n_j(u)]\) is the covariance matrix of the noise process n(t). Here t and u denote time, and i and j denote indices of the components of the vector noise process. In Equation (12.12), \(\lambda_i\) is called an eigenvalue of the noise process and \(\phi_i(t)\) is called the corresponding eigenfunction. Clearly, \(\lambda_i \ge 0\) for all i. The value of \(r_i^2\) has a simple physical interpretation: it corresponds to the energy along the coordinate function \(\phi_i(t)\), and it is shown in Reference [31] that if \(m_i = 0\), then \(\lambda_i\) is the expected value of the energy along \(\phi_i(t)\).

Once the coordinate functions \(\{\phi_i(t),\ i \in \mathbb{N}\}\) are obtained, one can project the sample function r(t), \(T_1 \le t \le T_2\), onto \(\phi_i(t)\) and obtain the coefficient \(r_i\) as:

\[
r_i = \sum_{k=1}^{q}\int_{T_1}^{T_2} r^k(t)\,\phi_i^k(t)\,dt \tag{12.13}
\]

The νth-order representation of r(t) can then be written as a vector \(R = [r_1, r_2, \cdots, r_\nu]\), with the coefficients \(r_i\) given by Equation (12.13).
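In discrete time, the projection (12.13) is simply a Riemann sum; the following is a minimal sketch under that assumption (array shapes and names are ours).

```python
import numpy as np

def project_coefficients(r, phi, dt):
    """Approximate Equation (12.13) from sampled data.
    `r` has shape (q, T): the q components of the strand over T samples.
    `phi` has shape (nu, q, T): nu coordinate functions.
    Returns the nu coefficients r_i via a Riemann-sum approximation."""
    return np.einsum('kt,ikt->i', r, phi) * dt
```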

12.6.2 Hypothesis Testing

The proposed detection algorithm is based on computing conditional probability densities and choosing a decision criterion (see Reference [31] for details). Commonly used decision criteria include the Bayes and Neyman–Pearson criteria. In this chapter, we use the former, for two reasons. The first is that the hypotheses are governed by probability assignments, which we denote as \(P_j\), j = 1, 2, 3: hypothesis H1 occurs with probability P1, hypothesis H2 with probability P2, and hypothesis H3 with probability P3. In the following discussion, we assume that P1 = P2 = P3 = 1/3. The second reason is that a certain cost is incurred each time an experiment is conducted, and we propose to design our decision rule so that, on average, this cost is as small as possible. If we assign the cost of correct detection to be zero and that of wrong detection to be 1, it is demonstrated in [31] that for a decision using the Bayes criterion, the optimum detection consists of computing the logarithm of the likelihood ratio and comparing it to a threshold.

12.6.3 Decoding with the Additive Gaussian White Noise Model

The vector noise process n(t) in Equation (12.8) can be either white or colored; we address only the case in which n(t) is white, that is, \(E[n(t)\,n^T(u)] = N_0\, I\, \delta(t-u)\), where \(N_0 \in \mathbb{R}\), I is the identity matrix, and \(\delta(\cdot)\) is the Dirac delta function. For a particular strand r(t), the likelihood ratios can be computed (see Section 2.3 of Reference [31]) as follows:

\[
\Lambda_1(R) = \frac{p_{r|H_2}(R|H_2)}{p_{r|H_1}(R|H_1)} \tag{12.14}
\]
\[
\Lambda_2(R) = \frac{p_{r|H_3}(R|H_3)}{p_{r|H_1}(R|H_1)} \tag{12.15}
\]

The decision regions in the decision space are determined by comparing the logarithm of the likelihood ratios to the following thresholds:

\[
\ln\Lambda_1(R)\;\mathop{\gtrless}_{H_1\ \mathrm{or}\ H_3}^{H_2\ \mathrm{or}\ H_3}\;\ln\frac{P_1}{P_2} \tag{12.16}
\]
\[
\ln\Lambda_2(R)\;\mathop{\gtrless}_{H_1\ \mathrm{or}\ H_2}^{H_3\ \mathrm{or}\ H_2}\;\ln\frac{P_1}{P_3} \tag{12.17}
\]
\[
\ln\Lambda_2(R)\;\mathop{\gtrless}_{H_2\ \mathrm{or}\ H_1}^{H_3\ \mathrm{or}\ H_1}\;\ln\Lambda_1(R) + \ln\frac{P_2}{P_3} \tag{12.18}
\]

The associated decision space is sketched in Figure 12.16. If the logarithm likelihood ratio pair falls in region H1, we say that hypothesis H1 is true, that is, the stimulus is from the left part of the visual space; the same can be said for H2 and H3. In Figure 12.16, with P1 = P2 = P3 = 1/3, the dividing line between regions H1 and H2 becomes the negative vertical axis, the dividing line between regions H3 and H1 becomes the negative horizontal axis, and the dividing line between regions H2 and H3 becomes the diagonal line 45 degrees counterclockwise from the positive horizontal axis.
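With equal priors, all three thresholds in (12.16) through (12.18) reduce to ln(1) = 0, and the decision becomes a few comparisons. The sketch below assumes that equal-prior case.

```python
def decide(ln_L1, ln_L2):
    """Three-hypothesis Bayes decision for P1 = P2 = P3 = 1/3.
    ln_L1 = ln Lambda_1(R), ln_L2 = ln Lambda_2(R)."""
    if ln_L1 <= 0.0 and ln_L2 <= 0.0:
        return 'H1'   # left
    if ln_L1 >= 0.0 and ln_L1 >= ln_L2:
        return 'H2'   # center
    return 'H3'       # right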

FIGURE 12.16 Decision space divided into three regions. H1 is the hypothesis that the visual input is from the left, H2 is the hypothesis that the visual input is from the center, and H3 is the hypothesis that the visual input is from the right.

For any given β-strand r(t), the region of the decision space into which the pair \((\ln\Lambda_1(R), \ln\Lambda_2(R))\) falls determines which hypothesis is declared true. For the white noise model, instead of solving the integral Equation (12.12), we apply the Gram–Schmidt orthogonalization procedure to \(\{s_i(t),\ i = 1, 2, 3\}\) to get \(\{\phi_i(t),\ i = 1, 2, 3\}\); the eigenfunctions of the noise process turn out to be the orthonormalization of \(\{s_i(t),\ i = 1, 2, 3\}\):

\[
\phi_1(t) = s_1(t)/\mathrm{norm}(s_1(t)),\qquad
\phi_2(t) = \psi_2(t)/\mathrm{norm}(\psi_2(t)),\qquad
\phi_3(t) = \psi_3(t)/\mathrm{norm}(\psi_3(t))
\]

where

\[
\psi_2(t) = s_2(t) - c_1\,\phi_1(t),\qquad
\psi_3(t) = s_3(t) - c_2\,\phi_1(t) - c_3\,\phi_2(t)
\]

and

\[
c_1 = \frac{\mathrm{IP}(s_2(t),\phi_1(t))}{\mathrm{IP}(\phi_1(t),\phi_1(t))},\qquad
c_2 = \frac{\mathrm{IP}(s_3(t),\phi_1(t))}{\mathrm{IP}(\phi_1(t),\phi_1(t))},\qquad
c_3 = \frac{\mathrm{IP}(s_3(t),\phi_2(t))}{\mathrm{IP}(\phi_2(t),\phi_2(t))}
\]

with the inner product and norm defined, respectively, as

\[
\mathrm{IP}(a(t),b(t)) = \sum_{k=1}^{q}\int_{T_1}^{T_2} a^k(t)\,b^k(t)\,dt,\qquad
\mathrm{norm}(a(t)) = \sqrt{\mathrm{IP}(a(t),a(t))}
\]
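A minimal sketch of this Gram–Schmidt step on sampled signals follows (the discretization and function names are our assumptions, mirroring the formulas above).

```python
import numpy as np

def ip(a, b, dt):
    """IP(a, b) = sum_k integral a^k(t) b^k(t) dt for arrays of shape (q, T)."""
    return float(np.sum(a * b) * dt)

def gram_schmidt(s1, s2, s3, dt):
    """Orthonormalize the three mean signals s_i(t) under IP."""
    phi1 = s1 / np.sqrt(ip(s1, s1, dt))
    psi2 = s2 - ip(s2, phi1, dt) / ip(phi1, phi1, dt) * phi1
    phi2 = psi2 / np.sqrt(ip(psi2, psi2, dt))
    psi3 = (s3 - ip(s3, phi1, dt) / ip(phi1, phi1, dt) * phi1
               - ip(s3, phi2, dt) / ip(phi2, phi2, dt) * phi2)
    phi3 = psi3 / np.sqrt(ip(psi3, psi3, dt))
    return phi1, phi2, phi3
```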

The remaining \(\phi_i(t)\) consist of an arbitrary orthonormal set whose members are orthogonal to \(\phi_1(t)\), \(\phi_2(t)\), and \(\phi_3(t)\), chosen so that the entire set is complete. We then project r(t) onto this set of orthonormal coordinate functions to generate the coefficients \(r_i\) as in Equation (12.13). The mean values of \(r_1\), \(r_2\), and \(r_3\) depend on which hypothesis is true: \(E[r_i|H_j] = m_{ij}\), i, j = 1, 2, 3. All of the \(r_i\) except \(r_1\), \(r_2\), and \(r_3\) do not depend on which hypothesis is true and are statistically independent of \(r_1\), \(r_2\), and \(r_3\); note also that the coefficients \(r_1\), \(r_2\), \(r_3\) are uncorrelated with each other. The noise n(t) is assumed to be additive, Gaussian, white, and independent over time. Based on the Gaussian assumption, the logarithm likelihood ratios (12.14) and (12.15) can be calculated as:

\[
\ln\Lambda_1(R) = \sum_{i=1}^{3}\frac{1}{N_0}\Bigl(r_i m_{i2} - \tfrac{1}{2}m_{i2}^2 - r_i m_{i1} + \tfrac{1}{2}m_{i1}^2\Bigr)
\]
\[
\ln\Lambda_2(R) = \sum_{i=1}^{3}\frac{1}{N_0}\Bigl(r_i m_{i3} - \tfrac{1}{2}m_{i3}^2 - r_i m_{i1} + \tfrac{1}{2}m_{i1}^2\Bigr)
\]

In this study, the length of the sliding detection window has been set to 99 ms. In Figure 12.17, we show the decision spaces for a set of five different time windows chosen from two consecutive time periods. The column on the left corresponds to the first period and the column on the right corresponds to the second period. The actual positions of the stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively. On the decision space, each point represents a given cortical wave (restricted to the corresponding time window) in response to an unknown stimulus. Ideally, any point corresponding to a left, center, or right stimulus should fall in the region of H1, H2, or H3, respectively; any point that does not fall in its corresponding region produces a detection error.

For each of the decision spaces in Figure 12.17, the waves generated in each of the time periods have been used to detect the location of the target at that time period. In Figure 12.18, we have plotted the detection error probability as a function of the location of the “time window.” We observe that the detection error increases slightly when the detection window slides to the latter part of any period. This indicates that the target locations are “less detectable” toward the latter part of the period in comparison to the earlier part, an observation that has already been made by Du et al. [9]. We also note that the detection error probabilities are slightly higher for the second time period in comparison to the first, indicating a “spillover effect” from the first time period. This phenomenon has not been studied in detail in this chapter and will be described in the future.
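The closed-form expressions above translate directly into code; this sketch (assumed names) pairs with the `decide` function given after Section 12.6.3.

```python
import numpy as np

def log_likelihood_ratios(r, m, N0):
    """Log-likelihood ratios under the white Gaussian noise model.
    r = (r1, r2, r3); m[i, j] = E[r_i | H_{j+1}], so column 0 is H1."""
    lnL1 = np.sum(r * m[:, 1] - 0.5 * m[:, 1]**2
                  - r * m[:, 0] + 0.5 * m[:, 0]**2) / N0
    lnL2 = np.sum(r * m[:, 2] - 0.5 * m[:, 2]**2
                  - r * m[:, 0] + 0.5 * m[:, 0]**2) / N0
    return lnL1, lnL2
```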

FIGURE 12.17 (See color insert following p. 272.) Decision spaces for the detection of the three hypotheses. The coordinates are the log likelihood ratios computed for five different time windows (65–163 ms, 145–243 ms, 225–323 ms, 305–403 ms, and 385–483 ms). The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively. The left column shows the decision spaces using the waves in the first time period; the right column shows the decision spaces using the waves in the second time period.

FIGURE 12.18 Detection error probability using the statistical method with points on β-strands within a sliding time window of 99 ms. The left figure shows the detection error probability using the first waves; the right figure shows the detection error probability using the second waves. (Axis residue omitted.)

12.7 Detection Using Nonlinear Dynamics

The purpose of this section is to introduce yet another computing paradigm, emerging from a network of oscillators, for the purpose of decoding from cortical waves. For pattern recognition with a network of oscillators, memorized patterns correspond to synchronized states. Each unit of the oscillator network oscillates with the same frequency and a prescribed phase relationship, and the elements of the network interact with each other via phases rather than amplitudes. The mechanism of recognition is related to phase locking, in which phase differences, rather than phases, play a crucial role. To illustrate the main idea, we review a model proposed by Kuramoto [12].

12.7.1 Phase Locking with a Network of Kuramoto Models

Consider a dynamic system of the form:

\[
\dot{\phi}_i = \omega + \sum_{j=1}^{N} s_{ij}\,\sin(\phi_j - \phi_i + \psi_{ij}) \tag{12.19}
\]

where the \(\phi_i\), i = 1, · · · , N (assume N = 2 for illustration), are phase variables taking values in the interval \([-\pi, \pi)\). The index i refers to the ith unit, and the units are coupled. The parameters \(s_{ij}\) and \(\psi_{ij}\) are assumed to satisfy \(s_{ij} = s_{ji}\) and \(\psi_{ij} = -\psi_{ji}\). In order to understand the dynamics of Equation (12.19), we define a new variable \(\phi = \phi_1 - \phi_2\) and rewrite Equation (12.19) in terms of \(\phi\).

This yields:

\[
\dot{\phi} = -2\,s_{12}\,\sin(\phi - \psi_{12}) \tag{12.20}
\]

For \(\phi_1, \phi_2\) in the interval \([-\pi, \pi)\), the stationary points of Equation (12.20) are given by \(\phi - \psi_{12} = k\pi\), k = 0, ±1, ±2, · · · , out of which the stable points are given precisely by:

\[
\phi - \psi_{12} = 2k\pi \tag{12.21}
\]

Thus \(\phi = \psi_{12}\) and \(\phi = \psi_{12} + 2\pi\) are the two stable points if \(\psi_{12} < 0\), and \(\phi = \psi_{12}\) and \(\phi = \psi_{12} - 2\pi\) are the two stable points if \(\psi_{12} > 0\). Up to mod 2π, the two stable points of \(\phi\) are actually the same, indicating that Equation (12.20) converges globally to a unique equilibrium point.

In order to use Equation (12.20) for the purpose of memorizing n patterns, we would require that it have (at least) n equilibria. This can be achieved by rescaling the phase variables as:

\[
\bar{\phi}_1 = \frac{1}{n}\,\phi_1,\qquad \bar{\phi}_2 = \frac{1}{n}\,\phi_2
\]

Rewriting Equation (12.19) with respect to the new variables, we obtain:

\[
\dot{\bar{\phi}}_1 = \frac{\omega}{n} + \frac{1}{n}\,s_{12}\,\sin(n\bar{\phi}_2 - n\bar{\phi}_1 + \psi_{12}),\qquad
\dot{\bar{\phi}}_2 = \frac{\omega}{n} + \frac{1}{n}\,s_{21}\,\sin(n\bar{\phi}_1 - n\bar{\phi}_2 + \psi_{21})
\]

By defining \(\bar{\phi} = \bar{\phi}_1 - \bar{\phi}_2\), we obtain analogously the following equation:

\[
\dot{\bar{\phi}} = -\frac{2}{n}\,s_{12}\,\sin(n\bar{\phi} - \psi_{12}) \tag{12.22}
\]

Up to mod 2π, the n stable stationary points of Equation (12.22) are given by \(\bar{\phi}_k^e = \frac{\psi_{12}}{n} + \frac{2(k-1)\pi}{n}\), k = 1, 2, · · · , n, if \(\psi_{12} < 0\). Additionally, it can be verified that if:

\[
\bar{\phi}_k^e - \frac{\pi}{n} < \bar{\phi}(0) < \bar{\phi}_k^e + \frac{\pi}{n} \tag{12.23}
\]

then \(\bar{\phi}(t)\) converges to the kth stable stationary point \(\bar{\phi}_k^e\).

12.7.2 Memory with Two Elements

Let us discuss the problem of detecting n patterns with a Kuramoto model using two units (i.e., N = 2). The main idea behind pattern recognition is to utilize the convergence properties of Equation (12.22) to distinguish among n complex patterns. In Figure 12.19 the phase difference variable \(\bar{\phi}(t)\) is plotted as a unit complex number \(e^{i\bar{\phi}(t)}\) for the case in which the rescaling parameter n is 3; this gives rise to three stable stationary points at \(\bar{\phi}_k^e = \frac{\psi_{12}}{3} + \frac{2(k-1)\pi}{3}\), k = 1, 2, 3.
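Equation (12.22) is easy to simulate directly; the sketch below (with illustrative values of ω, s12, ψ12, n, and the step size) can be used to verify that the trajectory converges to the equilibrium selected by the initial condition (12.23).

```python
import numpy as np

def simulate_phase_difference(phi0, n=3, s12=1.0, psi12=-0.5,
                              dt=1e-3, steps=20000):
    """Forward-Euler integration of Equation (12.22)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (-(2.0 / n) * s12 * np.sin(n * phi - psi12))
    return phi

# Equilibria predicted by the text for psi12 < 0 and n = 3:
equilibria = [(-0.5) / 3 + 2 * k * np.pi / 3 for k in range(3)]
```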

FIGURE 12.19 The phase variable \(\bar{\phi}(t)\) plotted as a unit complex number \(e^{i\bar{\phi}(t)}\) with rescaling parameter n = 3, showing three stable equilibria (\(\frac{\psi_{12}}{3}\), \(\frac{\psi_{12}}{3} + \frac{2\pi}{3}\), and \(\frac{\psi_{12}}{3} + \frac{4\pi}{3}\)), which result in three regions of convergence for the dynamic system (12.22) under initial conditions constrained by Equation (12.23).

Let us define a mapping \(\xi : \mathbb{C}^2 \longrightarrow \mathbb{R}\) as follows:

\[
\begin{pmatrix} w_1\\ w_2\end{pmatrix} \longmapsto \arg(w_1\,\bar{w}_2)
\]

and define the following n vectors in \(\mathbb{C}^2\):

\[
p_1 = \begin{pmatrix}\pi_1\\ \pi_2\end{pmatrix},\qquad
p_k = \begin{pmatrix} e^{+i\frac{(k-1)\pi}{n}}\,\pi_1\\ e^{-i\frac{(k-1)\pi}{n}}\,\pi_2\end{pmatrix},\qquad k = 2,\ldots,n \tag{12.24}
\]

where \(\pi_1\) and \(\pi_2\) are any two complex numbers such that \(|\pi_1| = |\pi_2| = 1\) and \(\arg(\pi_1\bar{\pi}_2) = \frac{\psi_{12}}{n}\). It would follow that:

\[
\xi(p_k) = \bar{\phi}_k^e = \frac{\psi_{12}}{n} + \frac{2(k-1)\pi}{n},\qquad k = 1,2,\ldots,n \tag{12.25}
\]

Thus, the n complex vectors \(p_k\), k = 1, 2, · · · , n, are mapped to the n stable equilibria of Equation (12.22) under the map ξ; the n patterns \(p_k\), k = 1, 2, · · · , n, are n memorized complex patterns associated with the n stable equilibria \(\bar{\phi}_k^e\). Patterns that are close to any \(p_k\) would be attracted toward the corresponding kth equilibrium.
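The map ξ and the memorized patterns (12.24) are a few lines of code; in the sketch below, the choice of π1 and π2 is an illustrative assumption satisfying the stated constraints.

```python
import numpy as np

def xi(w1, w2):
    """xi(w1, w2) = arg(w1 * conjugate(w2))."""
    return np.angle(w1 * np.conj(w2))

def patterns(n=3, psi12=-0.5):
    """The n memorized patterns of Equation (12.24), with pi1, pi2 chosen
    so that |pi1| = |pi2| = 1 and arg(pi1 * conj(pi2)) = psi12 / n."""
    pi1 = np.exp(1j * psi12 / n)
    pi2 = 1.0 + 0.0j
    return [(np.exp(1j * (k - 1) * np.pi / n) * pi1,
             np.exp(-1j * (k - 1) * np.pi / n) * pi2)
            for k in range(1, n + 1)]
```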

This principle can therefore be used to distinguish between alternative target locations. Target locations, however, are not typically given as vectors of complex numbers; rather than a set of complex patterns, we would like to memorize patterns of real vectors. Assume that we have n vectors \(v_k\), k = 1, 2, · · · , n, in \(\mathbb{R}^Q\) which we would like to memorize. We consider a map:

\[
T : \mathbb{R}^Q \longrightarrow \mathbb{C}^2 \quad\text{such that}\quad v_k \longmapsto p_k,\qquad k = 1,2,\ldots,n \tag{12.26}
\]

where the \(p_k\) are defined as in Equation (12.24). The memorized patterns are associated with the phase difference equilibria via the map ξT, where \(\xi T(v_k) = \bar{\phi}_k^e\). The dynamics of Equation (12.22) can thus be used to memorize patterns of n real vectors. To detect a pattern v in \(\mathbb{R}^Q\), the phase variables \(\phi_i\) of the two oscillatory units can be initialized with ξT(v), and \(\bar{\phi}(t)\) converges to one of the equilibria.

The map T can be either linear or nonlinear. In the case of a linear transformation, the map T between the real vector space \(\mathbb{R}^Q\) and the complex vector space \(\mathbb{C}^2\) is obtained by minimizing the following error criterion:

\[
\sum_{k=1}^{3}\sum_{j=1}^{60}\bigl\| p_k - T\,q_{kj}\bigr\|^2 \tag{12.27}
\]

where k is the position index, j varies over the total number of movies, the \(p_k\) are defined in Equation (12.24), and the vectors \(q_{kj}\) are the averages of the β-strands within a time window. In this example, we have Q = 3, indicating that only the first three principal components are used for the detection problem.

As an illustration, a Kuramoto model with two units has been chosen to detect the position of visual inputs (L, C, or R) from the cortical waves generated in the first time period and the second time period. The locations of the target are detected with the β-strands within a sliding time window of width 99 ms: the averages of the points on the β-strands within the sliding time window, in either the first or second time period, are vectors in \(\mathbb{R}^Q\) that are mapped to the complex vector space \(\mathbb{C}^2\), and the phases of the two units in the Kuramoto model are initialized accordingly. Three equilibria are achieved by rescaling with n = 3, and the rescaled phase difference \(\bar{\phi} = \phi/3\) converges to one of the three equilibria, which are associated with the three positions of the targets. Figure 12.20 shows plots of the phase difference variable \(\bar{\phi}(t)\), in terms of sin and cos functions over time, for the detection results from the 180 cortical responses using the two-unit Kuramoto model and the linear map applied to the average points on β-strands within the time window associated with the waves in the first time period (left column) and the second time period (right column).
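Minimizing (12.27) over linear maps T is a standard least-squares problem; here is a hedged sketch (array layouts assumed) that fits T and notes how detection would proceed.

```python
import numpy as np

def fit_linear_map(Q_data, targets):
    """Fit T in (12.26)-(12.27) by least squares.
    Q_data: (180, Q) real array of average beta points (60 per class).
    targets: (180, 2) complex array holding the pattern p_k for each sample.
    Returns T of shape (2, Q) with p approximately equal to T @ q."""
    X, *_ = np.linalg.lstsq(Q_data.astype(complex), targets, rcond=None)
    return X.T

# To detect a new pattern v in R^Q: compute T @ v, initialize the two
# oscillator phases from it (via the map xi), and integrate Equation
# (12.22) as sketched earlier; the equilibrium reached identifies the
# target location.
```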

FIGURE 12.20 (See color insert following p. 272.) Convergence of the phase variables in the two-unit Kuramoto model for detection with linear maps from β-space to the complex vector space, using the first waves and the second waves. The figure shows the detection results within five different sliding time windows (65–163 ms, 145–243 ms, and 225–323 ms shown on this page). The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively. (Panels plot sin(φ/3) against cos(φ/3) over time; axis residue omitted.)

21.5 3) –1 0 200 250 100 150 ime 50 T 300 385–483 ms FIGURE 12. respectively.5 250 300 0 cos 150 200 (φ/ –0.5 0 cos (φ/ –0.398 Modeling and Control of Complex Systems 1 sin(φ/3) sin(φ/3) 0. we can map the points on the decision space S onto the complex vector space C2 by a linear transformation D.5 100 3) Time –1 0 50 305–403 ms 1 sin(φ/3) sin(φ/3) 0. any curve corresponding to a left.20 (See color insert following p. Any curve that does not approach to its corresponding equilibrium produces a detection error. Subsequently.5 –1 1 0. Because the maps from β-strands to complex vector space are not limited to being linear. We consider the nonlinear map L from the space R Q of β-strands to the points on the decision space S. one would like to ask if there is a nonlinear map for which the detection results can be improved.5 –1 1 1 0. obtained in Section 12.5 3) 200 100 150 ime T –1 0 50 250 300 1 0.5 0 cos (φ/ –0.5 300 250 0 cos (φ/ 150 200 100 3) –0. In this chapter we give an example of such a map that can improve the detection results using the Kuramoto model.) (Continued). Ideally.6. Performing the detection over a continuum of detection windows and summing the total detection error for each detection window yields the relationship between the probability of detection error and detection window as shown in Figure 12.5 –1 1 0.5 –1 1 0.5 0 –0. It may be remarked that the detection error probabilities are quite large in comparison to Figure 12. indicating that the algorithm using nonlinear dynamic methods requires further improvement. The maps defined are described as follows: L : R Q −→ S D : S −→ C2 . center.28) .18. and green colors. 272.5 0 –0.5 0 –0.5 –1 0 50 Time 0. (12.5 0 –0. or right stimulus should converge to the associated point of equilibrium.

The map D is obtained by constructing an optimal linear function that maps each of the three clusters on the decision space S to its corresponding pattern in the complex vector space \(\mathbb{C}^2\); the details are similar to optimizing a cost function of the form (12.27). By concatenating the two maps L and D we obtain a nonlinear map from the space of β-strands onto the space \(\mathbb{C}^2\) of complex patterns. One can now use the Kuramoto model as described earlier. The detection results, shown in Figures 12.22 and 12.23, are considerably improved in comparison to Figures 12.20 and 12.21.

FIGURE 12.21 Detection error probability using the Kuramoto model with points on β-strands within a sliding time window of 99 ms. The left figure is the detection error probability using the first waves and the right figure is the detection error probability using the second waves. (Axis residue omitted.)

12.8 The Role of Tectal Waves in Motor Control: A Future Goal

We have remarked earlier that an important role of the cortex is in predicting the future locations of a moving target. In Figure 12.7 we show a turtle that is trying to catch a fish moving past it from left to right. We now describe in some detail how the tracking movement is executed. The turtle first notices the fish at point x1 at time t1. It watches the fish until it reaches point x2 at time t2, and then moves toward the fish to grasp it with its mouth. However, the fish keeps moving and reaches point x3 by the time t3, when the turtle completes its head movement. Thus, the turtle will miss the fish if it bases its head movement on the position of the fish when the movement began. To be successful, the turtle must extrapolate the present position of the fish and plan its head movement to reach point x3 at time t3. The turtle tries to track the moving fish by anticipating its future location. However, the fish has a stake in the event and will try to evade capture by making escape movements that appear unpredictable to the turtle. An important question in this context is, “How does the turtle accomplish this complex motion extrapolation task?”

FIGURE 12.22 (See color insert following p. 272.) Convergence of the phase variables in the two-unit Kuramoto model for detection with maps from points in the decision space to the complex vector space, using the first waves and the second waves. The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively. (Panels for the 65–163 ms, 145–243 ms, and 225–323 ms windows; axis residue omitted.)

.23 Detection error probability using the Kuramoto model with maps from points in decision space to complex vector space.5 250 300 0 cos 150 200 (φ/ –0.1 0.2 0.5 0.05 0 −0.5 100 3) Time –1 0 50 1 0.5 0 cos (φ/ 3) –0.5 –1 1 0.5 100 150 me 3) Ti –1 0 50 305–403 ms 1 sin(φ/3) sin(φ/3) 0.05 Second Waves 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 FIGURE 12.5 3) –1 0 200 250 100 150 ime 50 T 300 385–483 ms FIGURE 12. First Waves 0.22 (See color insert following p.5 –1 1 0.5 0 –0.5 0 –0.15 0. 272.5 0 –0. The left figure is the detection error probability using the first waves and the right figure is the detection error probability using the second waves.5 –1 1 0.15 0.5 300 0 200 250 cos (φ/ –0.) (Continued).Modeling and Estimation Problems in the Visuomotor Pathway 401 1 sin(φ/3) sin(φ/3) 0.05 0 −0.5 –1 1 250 150 200 50 100 Time –1 0 300 1 0.05 0.5 0 –0.1 0.2 0.5 0 cos (φ/ –0.

The neural system in Figure 12.1 is the substrate for the prey capture behavior. A light intensity function, I(x, y, t), contains the image of the moving fish that is the input to the system. The image is transformed by the retinal circuitry and fed in parallel to the lateral geniculate nucleus and the optic tectum. The lateral geniculate transmits information to the cortex, which sends information to the tectum via the striatum, pretectum, and substantia nigra. The tectum thus receives direct information from the retina with a relatively short time delay and indirect information from the cortex with a longer delay. The tectum contains a topographic map of visual space and can generate a head movement directed toward point x2 at time t2. The movement is realized by projections from the tectum to the brainstem reticular formation, which drives the motoneurons that innervate the neck muscles.

An interesting feature of the dynamics of this system is that moving stimuli produce waves of activity in the retina, visual cortex, and tectum. Berry et al. [2] used single electrodes to record the responses of retinal ganglion cells in salamanders and rabbits to moving bars. The neural image of the bar on the retina is a wave of activity that leads the advancing edge of the bar. This phenomenon is due to a contrast gain mechanism in the retinal circuitry and is a potential mechanism underlying motion extrapolation. More recently, Wilke et al. [34] used an array of 100 extracellular electrodes to record the responses of ganglion cells in turtles to moving and stationary bars. A moving bar produced a rapid and intense increase in the firing of ganglion cells that was not seen following the presentation of a stationary bar. Both retinal and tectal waves have been considered as candidate mechanisms for motion extrapolation [2], [34].

Several studies ([1], [19]) suggest that moving stimuli produce a wave, or “hill,” of activity in the superior colliculus (the mammalian homolog of the optic tectum), so it is natural to inquire whether such waves play a role in the turtle’s attempt to catch the fish. Port et al. [22] recorded simultaneously from pairs of electrodes implanted in the superior colliculus of macaques while the monkeys made visual saccades. Their data suggest that relatively large saccades (typically coordinated with head movements) are accompanied by a wave of activity that proceeds from caudal to rostral across the superior colliculus. They hypothesize that this wave is involved in determining the duration of eye and head movements.

Finally, studies using both multielectrode arrays and voltage-sensitive dyes show that visual stimuli produce waves of activity in the visual cortex of freshwater turtles [18], [20], [28], and information about the position and speed of visual stimuli is encoded in the spatiotemporal dynamics of the wave [9]. Our future work will test the hypothesis that the waves in the visual cortex and optic tectum contain information that can be used to extrapolate the future position of a moving stimulus from its past trajectory. Specifically, we hypothesize that the cortical wave contains information that can be used to extrapolate the future position of a stimulus that has been moving along a smooth trajectory, whereas the tectal wave contains information that can be used to predict the future position of a stimulus that undergoes an abrupt change in its trajectory.

12.9 Conclusion

To conclude, we would like to emphasize that the retino-cortical circuit of a turtle has been described in this chapter. The animal scans the visual world by splitting it into a sequence of time windows called “periods.” In each period, the cortex produces a wave of activity. We emphasize how Hebbian and anti-Hebbian adaptation plays a crucial role in maintaining a sequence of cortical activity, and we describe how these activity waves can be used to decode the locations of unknown targets in space. Two different decoding algorithms have been discussed in this chapter. The first algorithm utilizes a statistical detection method to detect the location of a target in the visual space; the proposed method utilizes hypothesis testing by projecting the observed data onto a decision space. The second algorithm is developed using a nonlinear dynamic model that has the ability to converge to an equilibrium point based on where the model has been initialized. Each of the equilibrium points is calibrated to an alternative location of the target, and a function is constructed that maps the raw data onto a vector space of complex numbers that can be used as an initial condition for the model. We show that in order to obtain detection results comparable to those obtained by the statistical method, a nonlinear function is required to generate the associated initial conditions, and we construct one such function in this chapter. The role of the retino-cortical pathway is discussed in the context of the overall visuomotor control problem, and we remark that the tectum plays an important part in the generation of the associated motor commands. Such control problems are the subject of future research.

Acknowledgments

This work is partially supported by Grants EIA-0218186 and ECS-0323693 from the National Science Foundation.

References

1. R. W. Anderson, E. L. Keller, N. J. Gandhi, and S. Das. Two-dimensional saccade-related population activity in superior colliculus in monkey. J. Neurophysiol., 79:2082–2096, 1998.
2. M. J. Berry II, I. H. Brivanlou, T. A. Jordan, and M. Meister. Anticipation of moving stimuli by the retina. Nature, 398:334–338, 1999.

3. J. M. Bower and D. Beeman. The Book of GENESIS. TELOS, Santa Clara, CA, 1994.
4. J. B. Colombe and P. S. Ulinski. Temporal dispersion windows in cortical neurons. J. Comput. Neurosci., 7:71–87, 1999.
5. J. B. Colombe, J. Sylvester, J. Block, and P. S. Ulinski. Subpial and stellate cells: Two populations of interneurons in turtle visual cortex. J. Comp. Neurol., 471:333–351, 2004.
6. C. B. Cosans and P. S. Ulinski. Spatial organization of axons in turtle visual cortex: Intralamellar and interlamellar projections. J. Comp. Neurol., 296:548–558, 1990.
7. F. Delcomyn. Foundations of Neurobiology. W. H. Freeman & Co., New York, 1998.
8. M. Dellnitz, M. Golubitsky, and M. Nicol. Symmetry of attractors and the Karhunen–Loeve decomposition. In L. Sirovich, editor, Trends and Perspectives in Applied Mathematics, pages 73–108. Springer-Verlag, New York, 1994.
9. X. Du, B. K. Ghosh, and P. Ulinski. Encoding and decoding target locations with waves in the turtle visual cortex. IEEE Trans. Biomedical Engineering, 52:566–577, 2005.
10. G. B. Ermentrout and D. Kleinfeld. Traveling electrical waves in cortex: Insights from phase dynamics and speculation on computational role. Neuron, 29:33–44, January 2001.
11. P. Holmes, J. L. Lumley, and G. Berkooz. Turbulence, Coherent Structure, Dynamical Systems and Symmetry. Cambridge University Press, Cambridge, 1996.
12. Y. Kuramoto. Chemical Oscillations, Waves, and Turbulence. Springer-Verlag, New York, 1984.
13. L. J. Larson-Prior, P. S. Ulinski, and N. T. Slater. Excitatory amino acid receptor-mediated transmission in geniculocortical and intracortical pathways within visual cortex. J. Neurophysiol., 66:293–306, 1991.
14. J. G. Mancilla, M. Fowler, and P. S. Ulinski. Responses of regular spiking and fast spiking cells in turtle visual cortex to light flashes. Vis. Neurosci., 15:979–993, 1998.
15. J. G. Mancilla and P. S. Ulinski. Role of GABAA-mediated inhibition in controlling the responses of regular spiking cells in turtle visual cortex. Vis. Neurosci., 18:9–24, 2001.
16. P. Z. Mazurskaya. Organization of receptive fields in the forebrain of Emys orbicularis. Neurosci. Behav. Physiol., 7:311–318, 1974.
17. K. Mulligan and P. S. Ulinski. Organization of geniculocortical projections in turtles: Isoazimuth lamellae in the visual cortex. J. Comp. Neurol., 296:531–547, 1990.
18. D. P. Munoz and R. H. Wurtz. Saccade-related activity in monkey superior colliculus. II. Spread of activity during saccades. J. Neurophysiol., 73:2334–2348, 1995.
19. D. P. Munoz, D. Guitton, and D. Pelisson. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat. III. Spatiotemporal characteristics of phasic motor discharges. J. Neurophysiol., 66:1642–1666, 1991.
20. Z. Nenadic, B. K. Ghosh, and P. Ulinski. Modeling and estimation problems in the turtle visual cortex. IEEE Trans. Biomedical Engineering, 49:753–762, 2002.
21. Z. Nenadic, B. K. Ghosh, and P. S. Ulinski. Propagating waves in visual cortex: A large scale model of turtle visual cortex. J. Comput. Neurosci., 14:161–184, 2003.
22. N. L. Port, M. A. Sommer, and R. H. Wurtz. Multielectrode evidence for spreading activity across the superior colliculus movement map. J. Neurophysiol., 84:344–357, 2000.



13
Modeling, Simulation, and Control of Transportation Systems

Petros Ioannou, Yun Wang, and Hwan Chang

CONTENTS
13.1 Introduction ................................................................ 408
13.2 Modeling of Traffic Flow ................................................... 409
  13.2.1 Macroscopic Traffic Flow Models ...................................... 409
  13.2.2 Microscopic Traffic Flow Models ...................................... 417
    13.2.2.1 Linear Car-Following Model ....................................... 417
    13.2.2.2 Generalized Linear Car-Following Model ........................... 418
    13.2.2.3 Asymmetric Model ................................................. 418
    13.2.2.4 Nonlinear Car-Following Model .................................... 418
    13.2.2.5 Helly's Model .................................................... 419
    13.2.2.6 Deterministic Optimal Control Model .............................. 419
    13.2.2.7 Stochastic Optimal Control Model ................................. 420
    13.2.2.8 Gipps Model ...................................................... 420
    13.2.2.9 Psychophysical Spacing Model or Action Point (AP) Model .......... 420
  13.2.3 Mesoscopic Models .................................................... 421
13.3 Highway Traffic Flow Control .............................................. 422
  13.3.1 System Description and Notation ...................................... 422
  13.3.2 Microscopic Simulation ............................................... 424
  13.3.3 Control Strategies ................................................... 427
  13.3.4 Evaluation of the HTFC System ........................................ 431
13.4 Conclusions ................................................................ 434
References ...................................................................... 435

13.1 Introduction

Freeways were built originally to provide almost unlimited mobility to road users for a number of years to come. No one predicted the dramatic increase in car ownership that has led to the current situation, where congestion during rush hours often converts a smooth traffic flow to a virtual parking lot. The need for additional capacity in order to maintain the mobility and freedom drivers used to enjoy in the past can no longer be met in most metropolitan areas by following the traditional approach of building additional highways. The lack of space, the high cost of land, environmental constraints, and the time it takes to build a new highway, as well as the possible disruption to the traffic system in already congested areas, make the building of new highways in many metropolitan areas a very challenging proposition. The only way to add capacity is to make the current system more efficient through the use of technologies and intelligence.

The current highway system operates as an almost open-loop dynamic system, which is susceptible to the effect of disturbances in ways that lead to frequent congestion phenomena. As characterized on page 271 of Reference [1], "the traffic situation on today's freeways resembles very much the one in urban road networks prior to the introduction of traffic lights: blocked links, chaotic intersections, reduced safety." In another paper [2] it is pointed out that most of the congestion is due to mismanagement of traffic rather than to demand exceeding capacity. The negative effects of congestion go beyond the obvious ones, the travel time that drivers experience, travel cost, quality of life, and safety, to include environmental and health effects.

Highway traffic flow is a complex dynamic system that exhibits phenomena that are not easy to capture using mathematics. When one looks at a particular vehicle, a small element of this large dynamic system, that element itself has complex dynamics, which include those of the vehicle as well as the dynamics of the driver. The driver's response exhibits a certain level of randomness, as different drivers may respond differently under the same driving conditions. Different vehicles may have different dynamics. Furthermore, each vehicle interacts with others, leading to an interconnected dynamic system. The successful implementation of intelligent transportation systems will require a good understanding of the dynamics of traffic and the effect of associated phenomena and disturbances. In addition, the understanding of human interaction within the transportation system is also crucial.

Transportation systems and traffic phenomena constitute highly complex dynamic problems for which simplified mathematical models are not adequate for analysis. The high complexity and dynamicity of traffic systems cannot always be accurately captured by mathematical models. For this reason, computer simulation models are developed and tuned to describe the traffic flow characteristics on a given traffic network. There is a need for more advanced methods and models in order to understand the complexity of traffic flow characteristics and find ways to manage traffic better using advanced technologies and feedback control techniques.

Zooming out from the individual vehicle, one can view the traffic flow as a flow of fluid where, instead of overflows, we have reductions in speed and flow rates. In such a macroscopic view the responses of individual vehicles are averaged, and quantities such as average speed, density, and flow rate are the main states of the system. This chapter presents an overview of traffic flow modeling at the microscopic and macroscopic levels, an example of a computer simulation model validated by field data, and the design of a highway traffic flow control system evaluated using the simulation model.

13.2 Modeling of Traffic Flow

Traffic flow models can be divided into two major popular classes. The first class, referred to as macroscopic models, is concerned with the overall behavior of traffic flow as described by the values of average speed, density, and flow rate, and does not capture individual vehicle responses. Macroscopic models are simpler to develop and analyze but less accurate in their description of traffic flow characteristics during transient traffic disturbances. The second class, referred to as microscopic models, deals with the individual vehicle/driver behavior as the vehicle operates in traffic, both on the vehicle level and on the flow level. Microscopic models are more accurate in the sense of capturing individual vehicle behavior, but they are computationally demanding if one wants to model traffic flow due to many vehicles.

Efforts to model traffic flow have been around as early as the 1950s, if not earlier, and continue to take place. With advances in computers and computational power, simulation tools have been developed to capture what up-to-date mathematical models cannot adequately describe, including individual vehicle responses and local traffic disturbances. For this reason, software tools and packages have been developed to model traffic flow using the microscopic behavior of vehicles. These packages include CORSIM, generated by the Federal Highway Administration (FHWA) [3], and commercial ones such as VISSIM [4], AIMSUN [5], PARAMICS [6], and others. With advances in computers and computational speed, these software tools allow the simulation of a large traffic network using microscopic models. In the following subsections we present some of the most popular macroscopic and microscopic models proposed in the literature over the years.

13.2.1 Macroscopic Traffic Flow Models

On the macroscopic level, traffic flow is often treated as a fluid flow, typically characterized by flow rate $q$ (number of vehicles/unit time), density (concentration) $k$ (number of vehicles/unit length), and speed $v$ (distance traveled/unit time). Generalized average speed is defined as generalized flow/generalized density [7].

It is more common to define flow as the rate at which vehicles pass a point in space, and density as the number of vehicles per unit length over a segment of road at an instant of time. In order to better understand these definitions, let us consider Figure 13.1.

FIGURE 13.1 Upstream vehicles passing the observer.

All the vehicles that are within a short distance $\Delta x$ upstream of the observer pass the observer in a very small time $\Delta t$. Because $\Delta t$ is small, speed and density can be considered constant. Therefore, $\Delta t$ is approximately equal to $\Delta x / v$. The total number of vehicles that pass the observer is approximately $k \Delta x$. Therefore, the flow rate $q$ is approximately equal to $k \Delta x / \Delta t$, which is equal to $kv$. Thus, at steady state, we have:

$q = kv$   (13.1)

at location $x$ and time $t$. If traffic is stationary, it is reasonable to assume that there is a relationship between flow and speed that depends on the properties of the road, the traffic composition (ratio of passenger cars to trucks), the weather, and so forth [7]. Using field data and empirical studies, several static models have been proposed in the literature in an effort to capture these relationships in the form of equations. We describe some of these models below.

Greenshield's model [8] assumes that the traffic flow speed $v = V(k)$ is a linear function of density, described as:

$V(k) = v_{\text{free}} \left( 1 - \dfrac{k}{k_{\text{jam}}} \right)$   (13.2)

where $v_{\text{free}}$ is the free-flow speed and $k_{\text{jam}}$ is the jam density. It is clear that as the density reaches the jam density, the speed goes to zero. This model has been shown to approximate real traffic in the case of fairly light traffic conditions. Using Equations (13.1) and (13.2) we obtain the relationship between flow and density:

$q = v_{\text{free}} \left( k - \dfrac{k^2}{k_{\text{jam}}} \right)$   (13.3)

which shows a quadratic relationship between flow and density, with a maximum flow $q_0$ reached at the critical density $k_c$. Figure 13.2 shows the speed–density, flow–density, and speed–flow relationships from some field data.
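As a quick illustration of how Equations (13.1) to (13.3) fit together, the short Python sketch below evaluates Greenshield's speed–density and flow–density relations over a range of densities; the numerical values of the free-flow speed and jam density are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

v_free = 100.0   # free-flow speed (km/h); illustrative value
k_jam = 120.0    # jam density (veh/km/lane); illustrative value

k = np.linspace(0.0, k_jam, 200)    # density grid
v = v_free * (1.0 - k / k_jam)      # Equation (13.2): V(k)
q = k * v                           # Equation (13.1): q = k v

k_c = k[np.argmax(q)]               # critical density, analytically k_jam / 2
print(f"critical density ~ {k_c:.1f} veh/km, capacity ~ {q.max():.0f} veh/h")
```

Evaluating the flux over a grid and locating its maximum recovers the critical density and capacity used in the fundamental diagram discussion that follows.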

FIGURE 13.2 Speed–density, flow–density, and speed–flow curves from some field data.

The maximum flow $q_0$ can be viewed as the capacity of the traffic network based on its geometry, road conditions, and so forth. The corresponding fundamental diagram based on Equation (13.3) is shown in Figure 13.3. The speed is equal to the slope of the straight line connecting the origin with any point on the curve. At low densities the flow increases linearly with density; in this case the speed is equal to the slope of the curve, which is $v_{\text{free}}$. As the density increases further, the flow also increases until it reaches the maximum value $q_0$ at the critical density $k_c$. After this point, further increases in density lead to a reduction of the flow rate, and congestion takes place until the jam density is reached, where the speed and flow rate become zero. In the region to the left of the critical density the traffic is considered stable, and to the right of the critical density the traffic is considered congested and unstable. Comparing Figure 13.3 with the corresponding curve in Figure 13.2, it is clear that the Greenshield model is a good approximation, at least qualitatively, of the traffic flow at low densities.

FIGURE 13.3 Fundamental diagram of Greenshield's model.

The Greenberg model [9] offers a good approximation of traffic flow characteristics during congested traffic conditions and is given by the equation:

$V(k) = v_{\text{free}} \ln \dfrac{k_{\text{jam}}}{k}$   (13.4)

Similarly, the relationship of flow rate with density is given by:

$q = v_{\text{free}}\, k \ln \dfrac{k_{\text{jam}}}{k}$   (13.5)

which describes a similar shape (Figure 13.4) as that shown in Figure 13.3. Comparing the shape of this curve with field data, it is clear that the Greenberg model is a good approximation, at least qualitatively, of the flow at high densities.

FIGURE 13.4 Fundamental diagram of Greenberg's model.

Underwood's model [10] also gives a good approximation of the free-flow traffic and is described as:

$V(k) = v_{\text{free}} \exp\left( \dfrac{-k}{k_c} \right)$   (13.6)

The corresponding flow–density relationship is:

$q = v_{\text{free}}\, k \exp\left( \dfrac{-k}{k_c} \right)$   (13.7)

where $k_c$ is the critical density, that is, the density at which the roadway segment is operating at its capacity.

The diffusion model [11, 12] modifies Greenshield's model to make the speed drop gradually (instead of instantaneously) as density increases and is described by the equation:

$V(k) = v_{\text{free}} \left( 1 - \dfrac{k}{k_{\text{jam}}} \right) - \dfrac{D}{k} \dfrac{\partial k}{\partial x}$   (13.8)

where $D$ is a diffusion coefficient given by $D = \varsigma v_r^2$, $\varsigma$ is a constant referred to as the relaxation parameter, and $v_r$ is a random speed.

A general model that describes some of the previous models as special cases is given in Reference [13] as:

$V(k) = v_{\text{free}} \left[ 1 - \left( \dfrac{k}{k_{\text{jam}}} \right)^l \right]^m$   (13.9)

where $m \ge l > 0$ are real-valued parameters. For example, Greenshield's model is obtained by setting $m = l = 1$. Underwood's model is obtained by setting $k_{\text{jam}} = k_c m$, $l = 1$, $m \to \infty$; that is, Equation (13.6) can be obtained from Equation (13.9) by appropriate choices of $m$ and $l$.

The above models are based on observations of traffic flow at steady state. They are not dynamic models, as there is no dependency on time. The first dynamic macroscopic traffic flow model was proposed by Lighthill and Whitham [14] and Richards [15], and is referred to as the LWR model. It is the simplest first-order hydrodynamic model that provides a coarse description of one-dimensional traffic flow dynamics. The model is based on the assumption that the fundamental relationship between flow and density, $q = Q(k, x, t)$, is also true when traffic is not stationary. If the road is homogeneous, this becomes:

$q(x, t) = Q(k(x, t))$   (13.10)

For a long, crowded, one-way road without traffic lights, exits, and entrances (Figure 13.5), the total number of vehicles within the space interval $[x_1, x_2]$ at time $t$ is:

$N(t) = \displaystyle\int_{x_1}^{x_2} k(x, t)\, dx$   (13.11)

FIGURE 13.5 Long one-way road without exits, entrances, and traffic lights.

Using the conservation law for vehicles within $[x_1, x_2]$, the change of $N(t)$ can only come from the change of the flow at the boundaries, that is:

$\dfrac{\partial N(t)}{\partial t} = q(x_1, t) - q(x_2, t)$   (13.12)

Substituting Equation (13.11) into Equation (13.12) we obtain:

$\dfrac{\partial}{\partial t} \displaystyle\int_{x_1}^{x_2} k(x, t)\, dx = q(x_1, t) - q(x_2, t)$   (13.13)

which, using $q(x_1, t) - q(x_2, t) = -\int_{x_1}^{x_2} \frac{\partial}{\partial x} Q(k(x, t))\, dx$, can be written as:

$\displaystyle\int_{x_1}^{x_2} \left[ \dfrac{\partial}{\partial t} k(x, t) + \dfrac{\partial}{\partial x} Q(k(x, t)) \right] dx = 0$   (13.14)

Equations (13.11) and (13.14) completely define the LWR model and can be expressed as a single equation:

$k_t + Q_k k_x = 0$   (13.15)

where $k_t = \partial k / \partial t$, $Q_k = \partial Q / \partial k$, and $k_x = \partial k / \partial x$. Given appropriate initial/boundary conditions, the LWR model, together with a speed–density relationship (such as Equation [13.2]) or a flow–density relationship (such as Equation [13.11]), describes the evolution of the traffic states (flow, density, and speed) along a roadway section. Equation (13.15) defines the evolution of traffic density along a specified roadway.
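A minimal numerical sketch of Equation (13.15) is given below: it advances the LWR density with a Godunov-type (demand/supply) update using Greenshield's flux. The parameter values, grid, and initial condition are illustrative assumptions, and real traffic simulators use considerably more careful numerics.

```python
import numpy as np

v_free, k_jam = 100.0, 120.0          # illustrative parameters (km/h, veh/km)
k_c = k_jam / 2.0                     # critical density of the Greenshield flux
L, N, dt = 5.0, 50, 0.0005            # road length (km), cells, time step (h)
dx = L / N                            # CFL number: v_free * dt / dx = 0.5 <= 1

def Q(k):                             # Greenshield flux, Equation (13.3)
    return v_free * (k - k**2 / k_jam)

def demand(k):                        # flow a cell can send downstream
    return np.where(k < k_c, Q(k), Q(k_c))

def supply(k):                        # flow a cell can receive from upstream
    return np.where(k > k_c, Q(k), Q(k_c))

k = np.full(N, 20.0)                  # light background traffic (veh/km)
k[20:25] = 80.0                       # a localized congestion pocket

for _ in range(400):                  # march k_t + Q_k k_x = 0 forward in time
    flux = np.minimum(demand(k[:-1]), supply(k[1:]))   # interface flux
    k[1:-1] += dt / dx * (flux[:-1] - flux[1:])        # conservative update
```

Running this sketch shows the congestion pocket spreading upstream as a shock while a rarefaction fan forms downstream, the qualitative behavior the LWR model is known for.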

Payne [16] modified the LWR model by adding a second differential equation, the dynamic speed equation, derived from microscopic car-following models (it can also be derived from statistical mechanical models) [16]. The model is usually called the PW model and includes, in addition to Equation (13.15), the dynamic speed equation:

$\dfrac{\partial v}{\partial t} = -v \dfrac{\partial v}{\partial x} + \dfrac{1}{\tau} \left[ V(k) - v \right] + \dfrac{1}{2 k \tau} \dfrac{d V(k)}{d k} \dfrac{\partial k}{\partial x}$   (13.16)

where $\tau$ is the driver's reaction time and $V(k)$ is the stationary speed–density relationship implied by the particular car-following model. Whitham [17] presented a similar model independently. Payne also presented a discretized version of the PW model in Reference [16]. During the past thirty years the PW model motivated numerous publications dealing with extensions, variations, and applications of the model. One of the popular extensions is the discrete model proposed by Papageorgiou et al. [13], derived as follows.

Papageorgiou et al. [13] first modified the PW model in a continuous space–time framework. The conservation of flow Equation (13.15) is modified to incorporate on-ramp and off-ramp flows:

$\left( \dfrac{\partial k}{\partial t} + \dfrac{\partial q}{\partial x} \right) dx = r - s$   (13.17)

where $r$ is the on-ramp inflow and $s$ is the off-ramp outflow.

Consider the freeway segment shown in Figure 13.6, subdivided into $N$ sections.

FIGURE 13.6 Discretized space–time model.

In this discrete space configuration, the following variables are used:

$T_0$  time step size (in h)
$L_i$  length of section $i$
$m_i$  number of lanes of section $i$
$k_i(n)$  traffic density (in veh/km/lane) of section $i$ at time $nT_0$
$v_i(n)$  mean speed (in km/h) of section $i$ at time $nT_0$
$q_i(n)$  traffic flow (in veh/h) out of section $i$ at time $nT_0$
$r_i(n)$  on-ramp inflow of section $i$ at time $nT_0$
$s_i(n)$  off-ramp outflow of section $i$ at time $nT_0$

Its discrete time approximation is derived as:

$\dfrac{\partial k_i}{\partial t} \approx \dfrac{k_i(n+1) - k_i(n)}{T_0}, \qquad \dfrac{\partial q_i}{\partial x} \approx \dfrac{q_i(n) - q_{i-1}(n)}{L_i}$   (13.18)

Then the difference equation of density is obtained as:

$k_i(n+1) = k_i(n) + \dfrac{T_0}{L_i} \left[ q_{i-1}(n) - q_i(n) + r_i(n) - s_i(n) \right]$   (13.19)

The dynamic speed Equation (13.16) is discretized as:

$\dfrac{v_i(n+1) - v_i(n)}{T_0} = -v_i(n) \dfrac{v_i(n) - v_{i-1}(n)}{L_i} + \dfrac{1}{\tau} \left[ V(k_i(n)) - v_i(n) \right] - \dfrac{\nu}{\tau} \dfrac{k_{i+1}(n) - k_i(n)}{k_i(n) L_i}$   (13.20)

where $\nu = -\frac{1}{2} \frac{dV(k)}{dk}$ is regarded as a constant parameter. By rearranging the terms and adding the effect of the on-ramps, the speed difference equation is obtained as:

$v_i(n+1) = v_i(n) - \dfrac{T_0}{L_i} v_i(n) \left[ v_i(n) - v_{i-1}(n) \right] + \dfrac{T_0}{\tau} \left[ V(k_i(n)) - v_i(n) \right] - \dfrac{\nu T_0}{\tau L_i} \dfrac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi} - \dfrac{\delta T_0}{L_i} \dfrac{r_i(n) v_i(n)}{k_i(n) + \xi}$   (13.21)

where $\nu$, $\tau$, $\delta$ are constant parameters, and $\xi$ is introduced in order to keep the last term within reasonable bounds when the density becomes small.

The complete discrete time–space model, improved in References [18] and [19], is summarized as:

$k_i(n+1) = k_i(n) + \dfrac{T_0}{L_i m_i} \left[ q_{i-1}(n) - q_i(n) + r_i(n) - s_i(n) \right]$   (13.22)

$v_i(n+1) = v_i(n) - \dfrac{T_0}{L_i} v_i(n) \left[ v_i(n) - v_{i-1}(n) \right] + \dfrac{T_0}{\tau} \left[ V(k_i(n)) - v_i(n) \right] - \dfrac{\nu T_0}{\tau L_i} \dfrac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi} - \dfrac{\delta T_0}{L_i} \dfrac{r_i(n) v_i(n)}{k_i(n) + \xi} + \omega_i(n)$   (13.23)

$V(k) = v_f \exp\left[ -\dfrac{1}{a} \left( \dfrac{k}{k_c} \right)^a \right]$   (13.24)

$q_i(n) = k_i(n) \cdot v_i(n) \cdot m_i + \zeta_i(n)$   (13.25)

where $\nu$, $\tau$, $\delta$, $\xi$, $a$ are constant parameters, which have the same values for all sections and need to be calibrated for different roadway segments. Equation (13.22) is the conservation equation and it is deterministic, whereas the dynamic speed Equation (13.23) and the transport Equation (13.25) contain the noise terms $\omega_i$ and $\zeta_i$.
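The sketch below implements one deterministic update of Equations (13.22) to (13.25), with the noise terms $\omega_i$, $\zeta_i$ set to zero. All parameter values are illustrative assumptions; as the text notes, they must be calibrated against field data for any real roadway.

```python
import numpy as np

# Illustrative parameter values; these must be calibrated per roadway segment.
T0, tau = 15/3600, 20/3600      # step size and relaxation time (h)
nu, delta, xi, a = 35.0, 0.9, 13.0, 2.0
vf, kc = 100.0, 33.0            # free speed (km/h) and critical density (veh/km)
N = 8
L = np.full(N, 0.5)             # section lengths (km)
m = np.full(N, 4.0)             # lanes per section

def V(k):                       # stationary speed-density relation, Eq. (13.24)
    return vf * np.exp(-(1 / a) * (k / kc) ** a)

def step(k, v, q_in, r, s):
    """One step of Eqs. (13.22)-(13.25); k, v, r, s are per-section arrays."""
    q = k * v * m                                      # Eq. (13.25), zero noise
    q_up = np.concatenate(([q_in], q[:-1]))            # flow entering each section
    k_next = k + T0 / (L * m) * (q_up - q + r - s)     # Eq. (13.22)
    v_up = np.concatenate(([v[0]], v[:-1]))            # upstream-boundary copy
    k_down = np.concatenate((k[1:], [k[-1]]))          # downstream-boundary copy
    v_next = (v
              - T0 / L * v * (v - v_up)                          # convection
              + T0 / tau * (V(k) - v)                            # relaxation
              - nu * T0 / (tau * L) * (k_down - k) / (k + xi)    # anticipation
              - delta * T0 / L * r * v / (k + xi))               # on-ramp term
    return k_next, np.clip(v_next, 1e-3, None)                   # Eq. (13.23)
```

Iterating `step` with boundary inflow, ramp flows, and initial profiles produces the space–time evolution of density, speed, and flow described by the model.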

The physical meanings of some of the terms in the dynamic speed Equation (13.23) were given in References [20] and [21]: $v_i(n)[v_i(n) - v_{i-1}(n)]$ is the convection term, which represents the influence of the upstream speed; $\frac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi}$ is the anticipation term, which describes how drivers respond to the downstream density; and $[V(k_i(n)) - v_i(n)]$ is the relaxation term, which regards the speed provided by the stationary speed–density relationship under the current density $k_i(n)$ as the desired value. The stationary speed Equation (13.24) is a special case of the general model in Equation (13.9). The discrete time model (13.22) to (13.25) describes the evolution of the speed, density, and flow with time at different segments of the highway lanes. The various constants that appear in the model need to be selected so that the model closely describes traffic flow characteristics by matching real traffic data. Efforts to validate the model using real traffic data are described in Reference [13].

13.2.2 Microscopic Traffic Flow Models

Microscopic traffic flow models deal with the individual vehicle behavior; therefore, the modeling of a traffic network involving many vehicles is far more elaborate and complex than in the case of macroscopic models. The two major classes of microscopic models are the car-following models and the cellular automata. The cellular automata treat the road as a string of cells, which are either empty or occupied; an example is the stochastic traffic cellular automata model presented in Reference [22]. The cellular automata are not as popular as the car-following models, which attracted most of the attention from the research as well as the simulation and analysis points of view. Lane-changing models dealing with passing and merging also exist [24], but because lane changing and passing are not as frequent a phenomenon as vehicle following, lane-changing models received less attention. Below we present some of the most popular car-following models. In these models we assume that vehicles stay in the same lane, that is, no passing or moving backward is allowed.

In car-following models, each driver reacts in some fashion to a stimulus from the vehicle ahead in the following way [23]:

response($t + \tau$) = sensitivity $\times$ stimulus($t$)   (13.26)

where $\tau$ is the reaction time, which includes the reaction time of the driver as well as that of the vehicle actuation system.

13.2.2.1 Linear Car-Following Model

Pipes [25] was one of the first researchers to propose the linear car-following model:

$\dot{v}_f(t + \tau) = \lambda \left[ v_l(t) - v_f(t) \right]$   (13.27)

where $v_f$ is the speed of the following vehicle and $v_l$ is the speed of the leading vehicle in the same lane. In this model, the following vehicle reacts to the speed difference between the lead and following vehicles by generating an acceleration/deceleration command proportional to the speed difference after some delay $\tau$. The proportionality or sensitivity variable $\lambda$ is assumed to be constant. Therefore, the driver's reaction is solely dependent on the relative speed and does not depend on the spacing between the lead and following vehicles, which is not realistic.

13.2.2.2 Generalized Linear Car-Following Model

Lee [26] made a generalization of the linear car-following model, expressed as:

$\dot{v}_f(t) = \displaystyle\int_0^t M(t - z) \left[ v_l(z) - v_f(z) \right] dz$   (13.28)

where $M(t)$ is a memory function, one choice of which is $M(t) = \alpha e^{-\beta t}$, where $\alpha$ and $\beta$ are constant parameters. This model assumes that the acceleration/deceleration at time $t$ depends on the time history of the relative speed. The approximate sensitivity coefficient and reaction time of the Pipes model (13.27) can be derived from the memory function as:

$\bar{\lambda} = \displaystyle\int_0^\infty M(t)\, dt, \qquad \bar{\tau} = \dfrac{1}{\bar{\lambda}} \displaystyle\int_0^\infty t\, M(t)\, dt$

The terms $\bar{\lambda}$ and $\bar{\tau}$ are roughly equivalent to $\lambda$ and $\tau$ in Equation (13.27).

13.2.2.3 Asymmetric Model

The linear car-following model assumes that drivers react to acceleration and deceleration in the same way, whereas in reality drivers' reactions to deceleration are generally greater than to acceleration, for safety reasons. This fact motivated the asymmetric model:

$\dot{v}_f(t + \tau) = \begin{cases} \lambda_+ \left[ v_l(t) - v_f(t) \right], & v_l(t) - v_f(t) \ge 0 \\ \lambda_- \left[ v_l(t) - v_f(t) \right], & v_l(t) - v_f(t) < 0 \end{cases}$   (13.29)

Instead of a single sensitivity coefficient $\lambda$ as in Equation (13.27), there are two sensitivity coefficients $\lambda_+$ and $\lambda_-$ at positive and negative relative speed, respectively.

13.2.2.4 Nonlinear Car-Following Model

Gazis et al. [27] tried to improve Pipes' model by assuming that the sensitivity constant in Equation (13.27) is a function of the intervehicle spacing, leading to the generalized nonlinear model:

$\dot{v}_f(t + \tau) = \lambda \dfrac{v_f^m(t + \tau)}{\left[ x_l(t) - x_f(t) \right]^l} \left[ v_l(t) - v_f(t) \right]$   (13.30)

where $x_l$, $x_f$ are the absolute positions of the lead and following vehicles, respectively, and $m$, $l$, and $\lambda$ are design constants. This model is also called the Gazis–Herman–Rothery (GHR) model. The model suggests that at large intervehicle spacing the vehicle does not accelerate as much, and at small intervehicle spacing it accelerates much more if the relative speed is also high. This behavior may be unrealistic, as drivers may be willing to use high acceleration when the intervehicle spacing is large irrespective of the relative speed, and may not like to accelerate as much at small intervehicle spacing even if there is a positive relative speed.
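To make the structure of Equations (13.27) and (13.30) concrete, here is a small Euler simulation of a single follower responding, after a reaction delay, to a braking leader. With the exponents set to zero it reduces to Pipes' linear model; nonzero $m$ and $l$ give the GHR form. All numerical values are illustrative assumptions.

```python
dt, tau = 0.1, 1.0                  # integration step and reaction time (s)
steps, delay = 600, int(tau / dt)
lam, m_exp, l_exp = 0.5, 0.0, 0.0   # Pipes' model; nonzero m, l give GHR (13.30)

x_l, v_l = 50.0, 20.0               # leader position (m) and speed (m/s)
x_f, v_f = 0.0, 20.0                # follower position and speed
hist = [(x_l - x_f, v_l - v_f, v_f)] * delay   # buffer of delayed stimuli

for n in range(steps):
    if n < 100:                     # leader brakes gently for the first 10 s
        v_l = max(v_l - 0.5 * dt, 5.0)
    x_l += v_l * dt
    gap, dv, vf_then = hist[0]      # stimulus observed tau seconds ago
    a_f = lam * (vf_then ** m_exp) / (gap ** l_exp) * dv   # Eq. (13.30)
    v_f = max(v_f + a_f * dt, 0.0)
    x_f += v_f * dt
    hist = hist[1:] + [(x_l - x_f, v_l - v_f, v_f)]        # shift delay buffer
```

The delay buffer is what implements the reaction time $\tau$: the acceleration applied now is computed from the gap and relative speed observed one reaction time earlier.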

13.2.2.5 Helly's Model

When the relative speed is zero, Equations (13.27), (13.29), and (13.30) give zero acceleration no matter how small the spacing is, which is not realistic. Helly [28] proposed a model that takes both the relative speed and the spacing as stimulus:

$\dot{v}_f(t + \tau) = \lambda_v \left[ v_l(t) - v_f(t) \right] + \lambda_x \left[ x_l(t) - x_f(t) - D(t) \right]$   (13.31)

where $\lambda_v$ is the speed sensitivity coefficient, $\lambda_x$ is the spacing sensitivity coefficient, and $D(t)$ is the desired intervehicle spacing. In this case the driver response depends both on the relative speed and on the relative spacing the driver likes to maintain.

13.2.2.6 Deterministic Optimal Control Model

Tyler [29] modeled the car-following behavior as an optimal control problem where the speed of the following vehicle is regarded as the state of the dynamic system and the control $u(t)$ is generated by solving an optimization problem. The cost of the optimization problem is chosen as:

$J = \dfrac{1}{2} \displaystyle\int_0^\infty \left\{ \left[ x_l(t) - x_f(t) - \sigma v_f(t) \right]^2 q_1 + \left[ v_l(t) - v_f(t) \right]^2 q_2 + r\, u^2(t) \right\} dt$   (13.32)

where $\sigma v_f(t)$ is some desired spacing that depends linearly on the speed of the following vehicle, and $q_1$, $q_2$, $r$ are the weights of the three different square terms. If we assume that the dynamics of the lead and the following vehicles are the same, the optimal control can be shown to be:

$u(t) = C_v \left[ v_l(t) - v_f(t) \right] + C_x \left[ x_l(t) - x_f(t) - C_c v_f(t) \right]$   (13.33)

where $C_v$, $C_x$, and $C_c$ are constant gains. The acceleration of the vehicle is equal to:

$\dot{v}_f(t) = u(t - \tau) - \rho v_f(t) - \beta v_f^2(t)$   (13.34)

where $\rho$ is a coefficient related to mechanical drag (about $10^{-5}$ to $10^{-4}$ sec$^{-1}$), $\beta$ is a coefficient that depends on the aerodynamic drag (about $10^{-3}$ to $10^{-2}$ m$^{-1}$), which was introduced by Burnham et al. [30] into the control structure, and $\tau$ is the reaction time as before. The parameters and controller gains were estimated using real traffic data in Reference [30]. This model clearly indicates that the driver/vehicle response depends on the relative speed and spacing, as well as on the vehicle speed.
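Expressed as code, the control law (13.33) and the dynamics (13.34) are just two small functions; the gains and drag coefficients below are illustrative placeholders rather than the values estimated in Reference [30].

```python
def control(xl, xf, vl, vf, Cv=0.3, Cx=0.1, Cc=1.5):
    """Optimal-control car-following law, Eq. (13.33); gains are illustrative."""
    return Cv * (vl - vf) + Cx * (xl - xf - Cc * vf)

def accel(u_delayed, vf, rho=5e-5, beta=5e-3):
    """Vehicle longitudinal dynamics with drag terms, Eq. (13.34)."""
    return u_delayed - rho * vf - beta * vf ** 2

# Example: a follower 40 m behind a leader traveling 2 m/s faster
u = control(xl=40.0, xf=0.0, vl=22.0, vf=20.0)
print(accel(u, vf=20.0))
```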

13.2.2.7 Stochastic Optimal Control Model

By introducing noise, a stochastic optimal control car-following model can be derived similar to Equations (13.33) and (13.34). In this case, the observed position $\hat{x}(t)$ and speed $\hat{v}(t)$ are corrupted by noise. The optimal control law and model are described by:

$u(t) = C_v \left[ \hat{v}_l(t) - \hat{v}_f(t) \right] + C_x \left[ \hat{x}_l(t) - \hat{x}_f(t) - C_c \hat{v}_f(t) \right]$   (13.35)

$\dot{v}_f(t) = u(t - \tau) - \rho v_f(t) - \beta v_f^2(t) + \omega(t)$   (13.36)

$\hat{v}_f(t) = v_f(t) + \eta(t)$   (13.37)

where $\omega(t)$ and $\eta(t)$ are white noise.

13.2.2.8 Gipps Model

The Gipps model [31] is based on the assumption that each driver sets limits to his or her desired braking and acceleration rates and selects a safe speed to ensure that there will be no collision even when the vehicle ahead comes to a sudden stop. This model consists of two parts, the acceleration part and the deceleration part, which can be expressed in the same equation as:

$v_f(t + \tau) = \min \left\{ v_f(t) + 2.5\, a_{f,m} \tau \left( 1 - \dfrac{v_f(t)}{v_{f,\text{des}}} \right) \left( 0.025 + \dfrac{v_f(t)}{v_{f,\text{des}}} \right)^{1/2},\ b_{f,m} \tau + \left[ b_{f,m}^2 \tau^2 - b_{f,m} \left( 2 \left[ x_l(t) - L_l - x_f(t) \right] - v_f(t) \tau - \dfrac{v_l^2(t)}{\hat{b}} \right) \right]^{1/2} \right\}$   (13.38)

where $a_{f,m}$ is the maximum acceleration the driver of the following vehicle is willing to undertake, $b_{f,m}$ ($< 0$) is the most severe deceleration the driver of the following vehicle is willing to undertake, $\hat{b}$ is the estimate of $b_{l,m}$, the most severe deceleration the driver of the lead vehicle is willing to undertake, $L_l$ is the physical length of the leading vehicle plus a margin, and $v_{f,\text{des}}$ is the speed at which the following vehicle wishes to travel.
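The two branches of Equation (13.38) translate directly into code; the sketch below takes the minimum of the acceleration-limited and safety-limited speeds exactly as in the equation, with illustrative parameter values.

```python
import math

def gipps_speed(vf, vl, xl, xf, tau=1.0, a_m=1.7, b_m=-3.0, b_hat=-3.5,
                v_des=30.0, L_l=6.5):
    """Gipps speed update, Eq. (13.38). All parameter values are illustrative."""
    # Acceleration branch: free driving limited by the desired speed
    v_acc = vf + 2.5 * a_m * tau * (1 - vf / v_des) * math.sqrt(0.025 + vf / v_des)
    # Deceleration branch: safe speed given the leader's position and speed
    disc = (b_m * tau) ** 2 - b_m * (2 * (xl - L_l - xf) - vf * tau - vl ** 2 / b_hat)
    v_safe = b_m * tau + math.sqrt(max(disc, 0.0))
    return min(v_acc, v_safe)

# Example: follower at 20 m/s, 50 m behind a leader at 18 m/s
print(gipps_speed(vf=20.0, vl=18.0, xl=80.0, xf=30.0))
```

When the gap is large the acceleration branch is active and the vehicle drifts toward its desired speed; as the gap closes, the safety branch takes over and enforces collision-free following.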

13.2.2.9 Psychophysical Spacing Model or Action Point (AP) Model

The above models assume that drivers react to changes in relative speed even at large spacing. However, drivers are subjected to certain constraints on the stimuli to which they respond: at large spacing, drivers are not influenced by relative speed; at small spacing, there are combinations of spacing and relative speed for which there is no response because the relative motion is too small. The smaller the spacing, the more perceptible the speed difference is. Wiedemann [32, 33] proposed a psychophysical car-following model that takes these considerations into account. Perception thresholds or action points are the basic characteristics of these models: the driver of the following vehicle continues to do what he or she is doing until he or she hits a threshold. This threshold depends on the driver's perception and physical ability as well as on the spacing and relative speed. Moreover, the model is stochastic, with specified distributions of thresholds; therefore, such models are often called action point (AP) models. This model is used in the microscopic traffic simulator VISSIM. More details about the model can be found in References [32–34].

13.2.3 Mesoscopic Models

As described in the previous sections, the macroscopic models capture the average characteristics of traffic flow determined by variables such as average speed, flow rate, and density. Individual vehicle responses and local traffic disturbances get averaged and cannot be seen in a macroscopic model. Macroscopic models are simpler to analyze and simulate. On the other hand, microscopic models are more elaborate, as they include individual vehicle responses, and their complexity increases rapidly as the number of vehicles in the network increases. Another class of models that is more complex than the macroscopic ones but not as complex as the microscopic models is referred to as mesoscopic models. These models can be built by using the microscopic models and interpolation to generate the states of the macroscopic model. In Reference [36], such a mesoscopic traffic flow model is proposed for automated vehicles.

The connection between microscopic and macroscopic models can be established by using the microscopic models to define speed–density relationships on the macroscopic level during stationary or steady-state traffic conditions. Gazis et al. [35] showed that Greenberg's model (Equation [13.4]) can be derived from the nonlinear car-following model (13.30) with $m = 0$, $l = 1$, and $\tau = 0$ as follows:

$\dot{v}_f(t + \tau) = \lambda \dfrac{v_l(t) - v_f(t)}{x_l(t) - x_f(t)}$   (13.39)

Let the spacing be $s = x_l(t) - x_f(t)$. Then Equation (13.39) becomes:

$\dfrac{dv}{dt} = \dfrac{\lambda}{s} \dfrac{ds}{dt}$   (13.40)

Integrating both sides, we obtain:

$v = \lambda \ln s + c_0 = \lambda \ln \dfrac{1}{k} + c_0$

When $k = k_{\text{jam}}$, $v = 0$; therefore $c_0 = -\lambda \ln \frac{1}{k_{\text{jam}}}$, which leads to:

$v = \lambda \ln \dfrac{k_{\text{jam}}}{k}$   (13.41)

When $\lambda = v_{\text{free}}$, Equation (13.41) is the same as Equation (13.4), which is Greenberg's model.

13.3 Highway Traffic Flow Control

In recent years considerable research efforts have been made to improve highway traffic flow. Among the various traffic flow control strategies, ramp metering, speed control, route guidance, and combinations of these have been developed and implemented. According to an overview [1], modern ramp metering strategies can be classified into two categories: (1) reactive strategies, aiming at maintaining the freeway traffic conditions close to prespecified set values by use of real-time measurements, such as ALINEA [37–39], and (2) nonlinear optimal ramp metering strategies, such as fuzzy logic, artificial neural networks, and other optimal ramp control strategies [40–46].

In addition to ramp metering, variable speed limits can be issued by the infrastructure to vehicles in an effort to control traffic flow characteristics on highways. It has been shown that the use of variable speed limits can improve traffic flow performance [47, 48] by preventing traffic flow breakdown [49] in the presence of traffic disturbances. The coordination of variable speed limits and ramp metering is shown to increase the range over which ramp metering is effective [50]. Nonlinear optimization and model predictive control (MPC) techniques have been used for generating desired speed limit commands [50, 51].

During the last decade, considerable research efforts have been devoted to automating vehicles in an effort to improve the safety and efficiency of vehicle following [52]. Although dedicated highways with fully automated vehicles are a far-in-the-future objective [53], the introduction of semiautomated vehicles, such as vehicles with adaptive cruise control (ACC), also referred to as intelligent cruise control (ICC), on current freeways designed to operate with manually driven vehicles has already taken place in Japan and Europe, and more recently in the United States too [54]. These trends offer an opportunity to have the infrastructure communicate directly with ACC vehicles by providing commands, recommendations, and warnings for the purpose of improving traffic flow characteristics and safety. This motivates the design of ACC systems as an integral part of a larger control system that involves the roadway. In this section we present a highway traffic flow control (HTFC) system which integrates roadway-to-vehicle (R2V) communication capabilities and ACC systems to design a traffic flow control system.

13.3.1 System Description and Notation

The structure of the HTFC system is shown in Figure 13.7. The highway traffic management center (HTMC) collects information about the status of the traffic and calculates the appropriate commands for the ramps and the desired speed limits along the highway lanes. The speed limits are communicated to the individual vehicles via short-range roadway-to-vehicle communications or by billboards (less advanced system). If the vehicles are equipped with

adaptive cruise control systems (ACC), these systems are modified to accept and respond to speed limit commands from the roadway. The non-ACC vehicles would have to rely on the human drivers to obey the desired speed limits. Since almost all vehicles are following the vehicles immediately ahead of them, the speed limit commands will be indirectly obeyed by all if at least one driver in each lane obeys the roadway speed limit commands.

FIGURE 13.7 The integrated roadway/adaptive cruise control system (remote HTMC, wide area network, roadside beacon units, and vehicles with ACC).

The HTFC system can also be viewed as a feedback control system, as shown in Figure 13.8. The HTMC system consists of the data acquisition and processing block, whose responsibility is to process all traffic measurements obtained at a sampling period $T_2$ and provide to the roadway controller those measurements that are relevant to control at a sampling rate $T_0$. Aggregated traffic state variables are collected from traffic surveillance systems or estimated every $T_0$ seconds, and the controller generates commands every $T_1$ seconds, where $T_1 = N_c T_0$ and $N_c$ is a positive design integer. The roadway controller uses these measurements to come up with the control commands, which include ramp metering commands and desired speed limits for the various sections of the traffic network. These commands are provided at a sampling period $T_1$. Once a control command is generated at time $nT_1$, it remains constant during this control interval, that is, from $nT_1$ to $(n+1)T_1$.

Consider a freeway stretch that is subdivided into $N$ sections, each about 500 m long. The freeway stretch and its surveillance system are simulated using the microscopic simulator VISSIM.

FIGURE 13.8 The HTFC as a feedback control system (control inputs: speed limit and ramp metering rate; data acquisition and processing at $T_0$; roadway controller at $T_1$; roadside-vehicle communication (DSRC) at $T_2$; output: traffic data).

In addition to the symbols and notation defined in Section 13.2.1, the following symbols and notation are used:

$T_o$  surveillance system time step size (in this project, $T_o$ = 15 sec)
$T_1$  controller time step size ($T_1 = N_c T_o$, where $N_c$ is a positive integer)
$V_i(nT_1)$  speed limit command of the $i$th section during the time interval $[nT_1, (n+1)T_1]$
$R_j(nT_1)$  ramp flow command of the $j$th on-ramp during the time interval $[nT_1, (n+1)T_1]$
$I_V$  the set of the section indices in which speed limits are controlled
$J_R$  the set of the section indices in which ramp meters are controlled
$V_{\min}$, $V_{\max}$  the lower and upper bounds of the speed limits; the upper limit is the default speed limit of the freeway stretch
$R_{\min}$, $R_{\max}$  the lower and upper limits of the ramp flow rate

13.3.2 Microscopic Simulation

Because actual experiments involving new traffic control algorithms are not feasible most of the time, due to cost and possible adverse effects on traffic, extensive simulation studies need to be performed to evaluate the performance and robustness of the proposed control strategies and the effect of proposed commercial developments or road schemes. Macroscopic models capture the evolution of traffic on a coarse level and therefore need less computing power and less calibration effort. However, they are sometimes

not sufficient to capture the desired level of detail of the studied transportation system. Along with the advances in computing power, microscopic simulations have increased their area of application. Software packages such as PARAMICS [6], AIMSUN [5], CORSIM [3], and VISSIM [4] have been used and studied among traffic engineers and researchers. Different software packages contain different traffic flow models, which are the keys to the accuracy of a traffic simulation system. Therefore, the calibration of the model parameters plays an important role in simulating the desired transportation system accurately.

In the design and evaluation of the HTFC system, a freeway segment model is created and validated by field data using VISSIM. The Berkeley Highway Laboratory (BHL) is a test site covering 4.3 km of Interstate-80 immediately east of the San Francisco-Oakland Bay Bridge. The facility provides traffic data collected by 16 directional dual-inductive-loop-detector stations [55]. As shown in Figure 13.9 (upper part), the freeway stretch includes two on-ramps (Ashby Avenue and University Avenue) and one off-ramp (University Avenue). The triangular marks represent the data collection stations of the BHL.

FIGURE 13.9 The BHL and its extension (sections 1 to 14).

The unidirectional freeway stretch constructed in VISSIM includes only the BHL northbound part, which has five lanes, including one HOV (high occupancy vehicle) lane. Dual-loop detectors in these seven detector stations collect speed, occupancy, and flow measurements, which were then aggregated into 30-sec summary data files and could be downloaded from the BHL Web site. The existence of the HOV lane was not considered. The freeway curvature was not considered either, because the degree of curvature in the area is not high enough to affect the traffic.

The basic idea of calibrating the model is to use the flow measurements from station 7 as input flows to VISSIM and compare the flow and speed measurements of simulation runs with different parameters to the field measurements in order to find an acceptable set of parameters. Specifically, data from four different days of June 2004 were selected. These four days showed similar congestion patterns, that is, duration of the congestion, peak flow rate, and congested speed. Because our simulation period is 12 hours (from 10:00 am to 10:00 pm), we increased the sampling period to 5 min.

The traffic flow model in VISSIM is a discrete, stochastic, time-step-based, microscopic model. The model contains a psychophysical car-following model for longitudinal vehicle movement based on the Wiedemann 1999 car-following model. Several parameters involved in this model are quite sensitive and need to be calibrated: CC0 is the standstill distance, CC1 is the headway time, and CC2 is the following variation. These parameters can be expressed in the following equation:

$s = \text{CC0} + \text{CC1} \cdot v + V_l + \text{CC2}/2$   (13.42)

where $v$ is the vehicle speed, $V_l$ is the vehicle length, and $s$ is the spacing. Therefore, our first guess of these parameters came from the estimation of the two parameters $h$ and $d$ in:

$s = h \cdot v + d = 1.4934 \cdot v + 9.2099$   (13.43)

where $h$ and $d$ are estimated by least-squares estimation using our field data. Due to the fact that the spacing $s$ is approximately the inverse of the density $k$, and that we can estimate the relationship between $k$ and $v$ in steady state, spacing estimates were obtained by using $s = v/q$. Flow and speed measurements in the free-flow region of the 4 days' data were pooled together. As shown in Figure 13.10, the blue points are the (speed, spacing) points from the field data, and the red points form the straight line fitted by least-squares estimation; the slope of the line is approximately the time headway $h$ and the intercept of the line is approximately the parameter $d$.

FIGURE 13.10 Car-following model parameter estimation using field data.

Comparing Equations (13.42) and (13.43), we get nominal values for CC1, which is around 1.5, and for CC0 + $V_l$ + CC2/2, which is around 9. Considering the common length of a car (including the standstill distance) to be around 6 m, CC2/2 is then around 3, that is, CC2 is around 6.
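The headway/intercept estimation described above can be reproduced with an ordinary least-squares fit. The sketch below assumes arrays of free-flow speed and per-lane flow measurements and uses $s = v/q$ for the spacing, as in the text; the synthetic numbers are illustrative, not the BHL data.

```python
import numpy as np

# Free-flow detector records (synthetic): v in m/s, q in veh/s per lane
v = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 22.0])
q = np.array([0.40, 0.45, 0.52, 0.57, 0.60, 0.62])

s = v / q                       # spacing estimate (m/veh), s = v/q
h, d = np.polyfit(v, s, 1)      # least-squares fit of s = h*v + d, Eq. (13.43)
print(f"headway h ~ {h:.3f} s, intercept d ~ {d:.2f} m")
# From Eq. (13.42): CC1 ~ h and CC0 + Vl + CC2/2 ~ d
```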

Based on the nominal values of these parameters and a series of simulation runs, CC1 is chosen to be 1.5 and CC2 is chosen to be 6.5. Figure 13.11 shows the validation results in terms of field flow, simulated flow, field speed, and simulated speed [56]. It is clear that the simulation model matches the real data and is therefore suitable for studying traffic phenomena and the effects of new control strategies on traffic flow characteristics.

FIGURE 13.11 Validation results: (a) field flow; (b) simulated flow; (c) field speed; (d) simulated speed.

13.3.3 Control Strategies

Several ramp metering strategies, such as ALINEA, METALINE, FLOW, the demand–capacity strategy, and the occupancy strategy, are investigated in References [1, 57]. It has been shown that these strategies are easy to implement and capable of reducing traffic congestion. As our ramp metering strategy is a modification of ALINEA, we present a brief review of ALINEA here. ALINEA is a simple, robust, flexible, and efficient local ramp metering strategy. It can be applied without any theoretical preinvestigation or calibration to a broad range

of freeway ramps where congestion problems exist. Different studies have demonstrated that ALINEA is not inferior to sophisticated coordinated approaches under recurrent traffic congestion [39]. ALINEA can be expressed as:

$R(nT_1) = R((n-1)T_1) + K_r \left[ o_d - o(nT_1) \right]$   (13.44)

where $R(nT_1)$ is the ramp meter command at time $t = nT_1$, $K_r$ is a positive controller parameter, $o(nT_1)$ is the measured downstream occupancy at time $nT_1$, and $o_d$ is the desired value for the downstream occupancy, which is typically chosen close to the critical occupancy $o_c$ [39]. The control strategy described by Equation (13.44) is a simple integral controller, where the integral action rejects constant disturbances and tracks constant reference points in an effort to force the downstream occupancy to stay close to the desired occupancy when the traffic volume is high.

In the HTFC system a generalized ALINEA ramp metering strategy is used, with the occupancy $o$ replaced by the traffic density $k_i$ and the desired occupancy $o_d$ replaced by the desired density $k_d$; $k_d$ can be chosen to be close to the critical density $k_c$ in the fundamental diagram. In the freeway layout shown in Figure 13.6, if section $j$ ($j \in J_R$) contains one on-ramp located near the middle of the section, then a ramp metering strategy similar to Equation (13.44) can be implemented, described as follows:

$R_j(nT_1) = \begin{cases} R_{\max}, & \text{if } \bar{R}_j(nT_1) > R_{\max} \\ R_{\min}, & \text{if } \bar{R}_j(nT_1) < R_{\min} \\ \bar{R}_j(nT_1), & \text{otherwise} \end{cases}$   (13.45)

where

$\bar{R}_j(nT_1) = R_j((n-1)T_1) + K_r \displaystyle\sum_{m=1}^{N_c} \left[ k_d - k_j((n-1)N_c T_0 + m T_0) \right]$   (13.46)

$R_j(nT_1)$ is the ramp command for the ramp on section $j$ at time $t = nT_1$, $j \in J_R$, $J_R$ is the set of section indices in which ramp meters are controlled, $K_r$ is a control parameter, $k_d$ is the desired density, $T_1 = N_c T_0$, and $N_c$ is a positive integer. The ramp metering control strategy is combined with the speed limit control strategy developed next to form the overall roadway controller of the HTFC system.
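A minimal sketch of the generalized ALINEA update (13.45)–(13.46) follows; the density-based integral term accumulates over the $N_c$ samples of one control interval, and the result is saturated at the ramp flow bounds. The gain and bounds are illustrative assumptions, not calibrated values.

```python
def ramp_command(R_prev, k_samples, k_d=30.0, K_r=40.0,
                 R_min=200.0, R_max=1800.0):
    """Generalized ALINEA, Eqs. (13.45)-(13.46).

    R_prev: previous ramp command (veh/h); k_samples: the Nc density
    measurements (veh/km/lane) taken downstream during the last T1 window.
    The gain K_r and the bounds are illustrative, not calibrated values.
    """
    R_bar = R_prev + K_r * sum(k_d - k for k in k_samples)  # Eq. (13.46)
    return min(max(R_bar, R_min), R_max)                    # saturation (13.45)
```

Because the update is integral in nature, the command keeps moving until the measured density settles at the desired density, which is what rejects constant disturbances.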

The current highway traffic system operates as an almost open-loop dynamic system. Ramp metering provides some feedback by controlling the volume of vehicles entering the highway through the ramps, but there is no control of the vehicles coming into the highway network with different speeds from different branches of the highway. A small traffic disturbance due to a short-duration accident or vehicle breakdown creates a shock wave that takes a long time to dissipate, due to the fact that vehicles away from the accident can be at high speeds whereas vehicles close to the accident are at almost zero speed. This possible high speed differential along the highway lanes is also associated with a high differential in the traffic density. It results in low-speed, high-density waves that propagate upstream and persist for much longer than it takes to clear the accident or the vehicle breakdown. One way to close the loop in this physically unstable dynamic system is to calculate the desired speeds vehicles need to follow at each section of the highway for a given traffic flow situation and communicate them to the vehicles using variable message signs along the freeway [49] or via short-range roadway-to-vehicle communication. The deployment of roadway control systems involving variable speed limits is feasible with current communication technologies.

Various speed control strategies have been proposed in the literature [48, 50, 51, 58] based on second-order traffic flow models [such as Equations (13.22) to (13.25)]. These control strategies are usually computationally intense, and their robustness is questionable because the design models involve many unknown parameters that have to be estimated or calibrated a priori. A simple speed control strategy based on information from the fundamental flow–density relationship is used in the HTFC system, which is described as follows.

The speed of the traffic flow at each section $i$ satisfies an upper and a lower bound:

$V_{\min} \le V_i(nT_1) \le V_{\max}$   (13.47)

where $V_{\min}$, $V_{\max}$ are positive constants.

Denote by $C_i$ ($i \in I_V$) the controller generating the desired speed limit $\bar{V}_i$ for section $i$. The following switching rules are used to determine whether or not $C_i$ should be active:

S1. If $k_{i+1}(nT_1) \ge (1 + \Delta_+) k_c$, then $C_i$ is active, where $\Delta_+$ is a positive design parameter and $k_c$ is the critical density.
S2. If $k_{i+1}(nT_1) \le (1 - \Delta_-) k_c$, then $C_i$ is inactive, where $\Delta_-$ is a positive design parameter.
S3. Otherwise, $C_i$ maintains the same status as in the previous control time interval.

The above rules prevent frequent switches of the controller between the active mode and the inactive mode. When $C_i$ is inactive, the desired speed limit is the default speed limit of the $i$th freeway section. When $C_i$ is active, section $i$ is regarded as a virtual on-ramp of section $i+1$, and the same generalized ALINEA ramp metering strategy is applied to regulate the flow rate $Q_i$ from section $i$ to section $i+1$:

$Q_i(nT_1) = \begin{cases} Q_{\max}, & \text{if } \bar{Q}_i(nT_1) > Q_{\max} \\ Q_{\min}, & \text{if } \bar{Q}_i(nT_1) < Q_{\min} \\ \bar{Q}_i(nT_1), & \text{otherwise} \end{cases}$   (13.48)

where

$\bar{Q}_i(nT_1) = Q_i((n-1)T_1) + K_v \displaystyle\sum_{m=1}^{N_c} \left[ k_d - k_{i+1}((n-1)N_c T_0 + m T_0) \right]$   (13.49)

and $K_v$ is a controller parameter.
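The hysteresis of rules S1 to S3, combined with the virtual-ramp flow update (13.48)–(13.49), can be sketched as follows; the thresholds, gain, and bounds are illustrative assumptions.

```python
def update_speed_controller(active, Q_prev, k_next_now, k_next_samples,
                            k_c=33.0, k_d=30.0, K_v=30.0,
                            d_plus=0.1, d_minus=0.1,
                            Q_min=400.0, Q_max=8000.0):
    """Rules S1-S3 with the virtual-ramp update, Eqs. (13.48)-(13.49).

    k_next_now: current density of section i+1; k_next_samples: its Nc
    samples over the last control interval. Values are illustrative.
    """
    if k_next_now >= (1 + d_plus) * k_c:        # S1: congestion onset downstream
        active = True
    elif k_next_now <= (1 - d_minus) * k_c:     # S2: downstream has cleared
        active = False
    # S3: otherwise the previous status is kept unchanged
    if not active:
        return active, None                     # default speed limit applies
    Q_bar = Q_prev + K_v * sum(k_d - k for k in k_next_samples)  # Eq. (13.49)
    return active, min(max(Q_bar, Q_min), Q_max)                 # Eq. (13.48)
```

The dead band between $(1 - \Delta_-)k_c$ and $(1 + \Delta_+)k_c$ is the design feature that prevents the controller from chattering between the active and inactive modes.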

The above equations provide the regulation of the flow at a particular section of the highway. Our control variable, however, is speed. Therefore, in order to regulate the traffic speed instead of the traffic flow rate, as done in ramp metering, we use the flow-rate-to-speed relationship as described by the fundamental flow–density diagram, shown in Figure 13.12.

FIGURE 13.12 Fundamental flow–density diagram.

The flow–density relationship of every section can be estimated either off-line or online [59]. Specifically, if the flow–density relationship is assumed to be:

$q = v_{\text{free}} \cdot k \cdot \exp\left[ -\dfrac{1}{\alpha} \left( \dfrac{k}{k_c} \right)^\alpha \right]$   (13.50)

then the free-flow speed $v_{\text{free}}$, the critical density $k_c$, and the exponent $\alpha$ can be estimated online or off-line using real traffic data and used to find the mapping $f(Q)$. We denote by $v_c$ the speed corresponding to the critical density. It is reasonable to assume that $V_{\min} \le v_c \le V_{\max}$. We set $Q_{\max}$ as the flow corresponding to the critical density, which is the capacity, and we can set $Q_{\min}$ as the flow corresponding to $V_{\min}$. A strictly increasing mapping from $[Q_{\min}, Q_{\max}]$ to $[V_{\min}, v_c]$, denoted as $f(Q)$, can then be found, as shown in Figure 13.13; that is, the mapping $f(Q)$ is defined based on the estimated flow–density relationship.

FIGURE 13.13 $f(Q)$: strictly increasing mapping from $[Q_{\min}, Q_{\max}]$ to $[V_{\min}, v_c]$.

Therefore, when $C_i$ is active, we have the desired speed limit as:

$\bar{V}_i(nT_1) = f(\bar{Q}_i(nT_1))$   (13.51)

However, $\bar{V}_i$ generated by Equation (13.51) may lead to unsafe changes of the speed limits. Therefore, a smoother speed limit $V_i$, $i \in I_V$, is used, as described below.
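One way to build the mapping $f(Q)$ numerically is to invert the flow–density relation (13.50) on its congested branch, where the flow decreases with density, and read off the speed $v = q/k$; this yields a mapping that increases with $Q$, consistent with Figure 13.13. The sketch below does this by bisection, with illustrative parameter values.

```python
import math

v_free, k_c, alpha = 100.0, 33.0, 2.0     # illustrative calibrated parameters

def q_of_k(k):                            # flow-density relation, Eq. (13.50)
    return v_free * k * math.exp(-(1 / alpha) * (k / k_c) ** alpha)

def f_of_Q(Q, k_hi=120.0, tol=1e-6):
    """Speed limit f(Q): invert q(k) on the congested branch [k_c, k_hi]."""
    lo, hi = k_c, k_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_of_k(mid) > Q:               # larger k gives smaller flow here
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return q_of_k(k) / k                  # corresponding speed (km/h)
```

Larger target flows map to densities closer to $k_c$ and hence to higher speed limits, up to the speed $v_c$ at capacity.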

The smoother speed limit is given by:

$V_i(nT_1) = \begin{cases} V_i((n-1)T_1) - c_v, & \text{if } \bar{V}_i(nT_1) \le V_i((n-1)T_1) - c_v \\ V_{i+1}(nT_1) + c_v, & \text{if } \bar{V}_i(nT_1) \ge V_{i+1}(nT_1) + c_v \\ \bar{V}_i(nT_1), & \text{otherwise} \end{cases}$   (13.52)

where $c_v$ is a design constant. If $C_i$ is inactive at time $(n-1)T_1$ and becomes active at time $nT_1$, the speed limit is given as:

$V_i(nT_1) = \begin{cases} V_{i+1}(nT_1) + c_v, & \text{if } \bar{V}_i(nT_1) \ge V_{i+1}(nT_1) + c_v \\ f(k_{i+1}(nT_1)\, v_{i+1}(nT_1)), & \text{otherwise} \end{cases}$   (13.53)

The roadway controller of the overall HTFC system consists of the ramp metering strategy given by Equation (13.45) and the speed control strategy given by Equations (13.52), (13.53), and rules S1 to S3.

13.3.4 Evaluation of the HTFC System

The validated microscopic simulation model of the upper part of Figure 13.9 was extended north to include a total of 14 sections, about 7.6 km long, as shown in the lower part of Figure 13.9. Different congestion scenarios were created in VISSIM to evaluate the proposed HTFC system (Table 13.1). In order to quantify the effectiveness of the proposed HTFC system, we use two quantities: the total time spent (TTS) in the network and the standard deviation of density (StdK).

The TTS is defined as:

$\text{TTS} = T_0 \displaystyle\sum_{n=1}^{N_{\text{sim}}} \sum_{i=4}^{N} \left[ m_i L_i k_i(nT_0) \right]$   (13.54)

where $N_{\text{sim}} = 3600/T_0 = 240$, as $T_0 = 15$ sec and the simulation time is 1 hour, and $N = 14$ is the total number of sections. We consider sections 4 to 14 for calculating the TTS because the first three sections of the segment are not controlled via variable speed limits; moreover, because the inflow to section 1 is at a constant level, if the speed limits are reduced at section 4, the simulation model needs some space to accommodate the extra vehicles that cannot enter section 4 and all its downstream sections. Because all the simulation runs are 1 hour long and the length of each section is constant, the TTS is actually a weighted measure of the average density of the segment (freeway sections 4–14).

The StdK, which is defined below, is a smoothness measure of traffic:

$\text{StdK} = \text{std}[(k_{i,n})], \qquad k_{i,n}: \text{density of section } i \text{ at time } nT_0$   (13.55)

where $4 \le i \le 14$, $1 \le n \le 240$, and $(k_{i,n})$ is the density map of the segment for the whole hour. Environmental and safety effects are related to the StdK because the smoother the density of the segment, the fewer the acceleration or deceleration events that take place. Moreover, a smaller density deviation is an indicator of possibly lower emission rates and a lower possibility of accidents.
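Given a density map $k_{i,n}$ from a simulation run, the two performance measures of Equations (13.54) and (13.55) are straightforward to compute. The sketch below assumes a NumPy array indexed as [section, time] for sections 4 to 14, with illustrative section lengths and lane counts.

```python
import numpy as np

T0 = 15.0 / 3600.0                  # sampling period in hours
L = np.full(11, 0.5)                # lengths of sections 4-14 (km); illustrative
m = np.full(11, 4.0)                # lanes per section; illustrative

def tts_and_stdk(k_map):
    """k_map[i, n]: density (veh/km/lane) of section 4+i at time nT0."""
    tts = T0 * np.sum(m[:, None] * L[:, None] * k_map)   # Eq. (13.54), veh*h
    stdk = np.std(k_map)                                 # Eq. (13.55), veh/km
    return tts, stdk
```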

TABLE 13.1 Simulation Inputs for Different Congestion Scenarios

Scenario No. | Scenario Category | Inflow (veh/h/lane) | Disturbance/Incident
1 | Peak hour traffic (high mainline demand) | 2100 | None
2 | Peak hour traffic (high mainline demand) | 2200 | None
3 | Peak hour traffic (very high mainline demand) | 2300 | None
4 | Accident traffic | 1800 | Speed drops to 10 km/h during the time interval 600–900 sec on sections 10 and 11
5 | Accident traffic | 1800 | Speed drops to 4 km/h during the time interval 600–900 sec on sections 10 and 11

When all the vehicles are manually driven, the speed limit commands are communicated to the drivers via billboards or, in the case of a roadway-to-vehicle communication system, via a display or audio inside the vehicle. We assume that drivers will follow the speed limit commands. This is not a strong assumption, as only a single driver in each lane needs to respond favorably to the speed limit command to affect the rest. We also tested the HTFC system for different ACC penetrations: simulation runs were conducted when 0%, 10%, 40%, and 100% of the vehicles are ACC-equipped vehicles. We also estimated critical densities, capacities, and other traffic flow characteristics for the mixed manual and ACC vehicle scenarios. Figure 13.14 shows that the critical density and capacity increase with the ACC penetration, which agrees with intuition, whereas the shape of the fundamental diagram remains the same in the free-flow region.

Results from over 100 simulation runs showed that the HTFC system relieves congestion and reduces the TTS for all the scenarios in Table 13.1. For the peak hour traffic scenario (scenario 1), TTS is reduced to 449 veh·h, which means a 13% decrease from the case without the HTFC system (517 veh·h). This reduction was acquired by the quick response to the onset of congestion and the smoothness effect of reducing the speed limits. The smoothness of traffic, as indicated by the density deviations, is also reduced by a factor of 27% (Table 13.2). Furthermore, as the penetration of ACC vehicles increases, congestion gets dissipated faster, as indicated by the density deviations (Table 13.2).

TABLE 13.2 Simulation Evaluation Results. For each scenario of Table 13.1 the table reports the TTS (veh·h) and StdK (veh/km) with the HTFC system off and on, and the corresponding percentage reductions, at ACC penetrations of 0%, 10%, 40%, and 100%. For scenario 1, the TTS is 517 veh·h (HTFC off) and about 449 veh·h (HTFC on) at 0% ACC, and the reductions are 13% (TTS) and 27% (StdK) at 0% ACC, 6% and 19% at 10% ACC, 3% and 14% at 40% ACC, and 2% and 8% at 100% ACC.

In scenario 1, section 11 begins to become congested due to the high inflow rate, which is approximately close to the estimated capacity. This four-lane section becomes a bottleneck due to the immediate on-ramp in the next section. When the onset of congestion is detected at section 11, the roadway controller immediately reduces the speed