Introductory Techniques For 3D Computer Vision

INTRODUCTOR TECHNIQUES for 3-D COMPUTE VISION Emanuele Trucco Alessandro Verri‘nunc agere incipiam tibi, quod vementer ad has res attinel, esse ea quae rerum simulacra vocamus. you shall now see me begin to deal with what is of high importance to the Subject, and to show that there exists what we call images of things) Lucretius, De Rerum Nanura, 424-44 Contents Foreword Preface: About this Book Introduction 1 11 Whatis Computer Vision? 1 12 The Mary Faces of Computer Vision 2 121. Rested Disciplines 2 122. Recrchand Application Areas 3 13. Exploring the Computer Vision World. 4 131. CaxferencesJourals and Books 4 132 Income, 6 133 Some Hints on Math Sofware, 11 14 TheRoad Ahead 11 Digital Snapshots 5 21 Introdudion 16 2.2 Intensitylmages 16 21 Main Concens 16 222 Baie Optics 18 223 Baie Radiomery, 22 224 Gemmaric Image Formation, 26 23 Acquirirg Digital Images 28 2D Basic acs 28 232 Spatial Sampling 31 233 Acquistion Noise and How to Esimat 32 24 — CameraParameters M 24d Denton 34 242. Berisic Prameters, 38Contents 243 Inrnsic Parameters 36 24S Camera Models Rested, 38 25 Range Data and Range Sensors 40 25.1. Representing Range Images 41 252 Range Sensors 41 253 Active Trangulaon, 254 A Simple Sensor $4 26 Summary 47 27 Further Readings 47 28 Review 48 Dealing with Image Noise st 31 Image Noise 51 BLL Gausian Noise 3 B12 Impulsive Nowe 53 32 Noise Filtering 55 4121 Smoothing by Averaging, 56 322 Gaussian Smoodhing 423 Areour Samples Really Gausian’, 60 324 Nonlinear Filtering, 62 33 Summary 64 34 Further Readings 64 35 Review 6 Image Features 67 4.1 What Are Image Features? 68 42 Bdge Detection 69 424 Bases, 9 422. The Canny Edge Decor 71 423 Other Ee Detectors 50 424 Concluding Remarks on Edge Detection 89 43, Point Beatures: Corners. 82 44 Surface Extraction from Range Images 85 44.1. Deine Shape Clases 86 442. Esimasing Local Shape 87 45° Summary 90 46 Further Readings 90 47 Review 91 More Image Features 95 SA Introduction: Line and Curve Detection 96 52 The Hough Transform 97 S21. The Hough Transform for Lines 97 522 The Hough Traform for Curves, 100 Contents vl 523. Concluding Remarks on Hough Transforms, 100 53. Fitting Elipses to Image Data 101 $41 Euclidean Distance Fi 101 532. Algebra Disance Fi 103, ‘533 Robust ing 106 534 Concluding Remarks on Elie Fiting 107 54 Deformable Contours 108 SAL The Energy Petional, 109 542 The Element ofthe Enerty Functional, 109 543 AGreedy Algorithm, 110 55 LineGrouping 114 56 Summary 17 57 Further Readings 117 58 Review 118 Camera Calibration 123 61 Introdtetion 123 62. Direct 2arameter Calibration 125 621. Basie Equations 25 622 eal Length, Aspect Rati, and Eurinsic Parameters 127 (623 Fuimasing the Image Cenen,131 63 Camera Parameters from the Projection Matrix 132 631. imation ofthe Projection Mari, 132 632 Computing Camera Parameters, 134 64 — Conclusing Remarks 136 65 Summary 136 66 Further Readings. 136 67 Review 137 Stereopsis 139 71 Introdvetion 140 ZA Tee Two Problems of Stra, 140 712. ASinple Stereo Sytem, 188 213. Toe Poramaor ofa Str Spon, 144 712 The Correspondence Problem 145 72. Bascs 145 722 Corelation-Based Methods 146 723. Fate ated Methods, 148 724 Conluding Remarks 149 73 Epipoler Geometry 150 73. Notation, 150 732. Bases 151 733. Tre Esental Mais, E182 734 Tae Fundamental Mtr, F156Contents 735. Computing Eand F: The Figh-point gordon, 155 736 Locating the Epipotes ftom Eand F156 737. Rectfcaion, 157 74 3D Reconstruction 161 74.1 Reconsructon by Tangulaton, 122 742. Reconsnction up oa Sele Factor 164 743 Reconstruction up toa Projectve Transformation, 165 78° Summary 171 76 Further Readings 171 77 Review 172 Motion ” 81 Introduction 178, SLL. Tre Importance of Visual Motion, 178 812 The Problems of Modon Analysis 180 82 The Motion Field of Rigid Objects 183 821 Basics 18 {$22 Special Case: Pare Translation, 188 823. Special Cave 2 Moving Plane, 187 824 Motion Parlay 188 325 The Instantaneous Epipole, 191 83 The Notion of Optical Flow 191 831. ‘The Image Brighnes Constancy Equation, 122 832. The Aperure Problem, 192 833. TheValiliy ofthe Constaney Equation: Optical Flow, 166 84 Estimating the Motion Field 195 84D. Differential Techniques. 195 842. Feaure-based Telniqus 198 85 Using the Motion Feld 208 SI. 4D Motion and Sracture rom a Sparse Motion Field, 203 852 ED Motion and Sucre rom a Dense Motion Feld, 208 86 — Motion-based Segmentation 212 87 Summary 215 88 Further Readings 215 89° Review 216 Shape from Single-image Cues a9 9.1 Introduction 220 92 Shape from Shading 221 921. Dhe Refecance Map, 221 922 The Fundamental Equation, 23 93. Finding Albedo and lluminant Direction 226 83.1 Some Necessary Assumptions 226 932A Simple Method for Lamberian Swfoces 277 10 " 94 9s 96 97 98 Contents ‘A Variational Method for Shape from Shading 229 9441 The Functional 1 be Minimized, 229 942 The Euler Lagrange Equations, 230 943. from the Continuous othe Discrete Case, 231 944 The Algor, 231 945. Enforcing Igri, 232 946. Some Necessary Deals 244 Shape ‘tom Texture 235, 95.1. Whatis Texte’, 235 95.2. Using Terre to Infor Shape: Fundamentals 297 953 Surface Orientaion from Stas Texture 239 954 Concluding Remarks, 242 Summary 241 Furthe- Readings 242 Review 242 Recognition 207 101 102 03 104 10s 106 107 108 109 What Does it Mean to Recognize? 248 Interpretation Trees 249 1021 An Example, 251 1022 Wild Cards ond Spurious Features 253 1023 A Feable Algorithm, 253 Invariants. 255 1031 troduction 285 02 Definitions 26 1033 twvariant Based Recognition Algorithm, 259 Appescance Based Identification 252 M41 Images or Features? 22 1042 mage Eigenspaces, 25 Concluding Remarks on Object Identification 270, 3D Object Modelling 270, 1061 Feawre-ased and Appearance-based Modes 271 1062 Objes Versus Vewercenered Representation, 272 1063 Coneluding Remarks 273 Summary 273 Furthe- Readings 274 Review 275 Locating Objects in Space 279 Md m2 Introduction 280 Matehing from Intensity Data. 283, 1121 3D Location from a Pepectv Image, 283 122 ¥D Location froma Weak perspective Image, 288 1123 Pose from Elipes, 292Contents 11.24 Conctding Remarks, 299 113 Matching from Range Data. 294 1131 Esimating Translation Firs, 296 M22. Esimaing Rowton First, 300 1133 Concluding Remarks 301, 114 Summary 302 115 Further Readings 302 116 Review 303, ‘A Appendix 307 AJ Experiments: Good Practice Hints 307 A2 Numerical Differentiation 311 A3.TheSampling Theorem 314 Ad Projective Geometry 316 AS Differential Geometry 320, AG — Singular Value Decomposition 322 AT Robust Estimators and Model Fiting. 326 AS Kalman Filtering 328 AQ Three-dimensional Rotations 332 Index 335 Foreword Until recently, computer vision was regarded asa fleld of research stil in its infancy, rot yet mature and siable enough to be considered part ofa standard curriculum in ‘computer science. Asa consequence, most hooks on computer vision became absolete as soon as they were published. No book thus far has ever managed to provide a ‘comprehensive avervew of theficld since even the good ones focusonanarrow subarea ‘typically the author's research endeavor. With Trucco and Veri, the situation has finally changed. Ther book promises to bo the first true textbook of computer vision, the first to show that computer vision is now a mature discipline with solid foundations. Among connoisseurs, the aulhors are ‘well knowa a carefuland critical experts inthe eld (Tam proud to have figured inthe career of one of therr: Alessandro Verti worked with me at MIT fora short yea, and ‘twas ajoy to work with him) ‘Over the years I have been asked many times by new graduate students or colleagues what to readin order to lear about computer visio. Until now, my answer ‘was that I could not recommend any single book. As.a substitute, | would suggest an ‘ever-changing list of existing books together witha small collection of specific papers. rom now on, however, my answers clear: Introductory Techniques for 3-D Computer Vision i the ext to read, T personaly belive that Jnroductory Techniques for3.D Campute Visi wil he the standard textbook for graduate and undergraduate courses on computer vision in years to come, Its ar almost perfect combination of theory and practice. It provides a complete introduction to computer vison, effectively giving the base background for practitioners and future researchers in the fl “Trucco and Ver have written a textbook that is exemplary in its larity of ex- positon and in its intentions. Despite the intial warning (“Frail dice e il fare c di smezz0 il mare™), the objectives stated in the preface are indeed achieved. The book "penore wor a deere thi Foreword ‘not only places a correctly balanced emphasis on theory and practice but also provides ‘needed material about typically neglected but important topics such as measurements, calibration, SVD, robust estimation, and numerical diferentiaion, Computer vision is just now maturing from an almost esoteric corner of research to.a key discipline in computer science. In the last couple of years, the first billion- dollar computer vision companies have emerged, a phenomenon no dovbt facilitated by the irrational exuberance of the stock market. We will undoubtedly see many more commercial applications of computer vision inthe near future, ranging from industrial inspection and measurements to security database search, surveillance, multimedia and ‘computer interfaces. This is a transition that other fields in engineering, such as signal processing and computer graphics, underwent long ago. Trucco and Vers imely book isthe first to represent the discipline of computer vision ia its new, mature state, asthe industries and applications of computer vision grow and mature as wel. sit reaches ‘adulthood, computer vision is sil far from being a solved problem. The most exciting ‘developments, iscoveries and application ie ahead of us. Though a similar statement can be made about most areas of computer science, iis true for computer vision in ‘much deeper sense than, ay, for databases or graphics. Afterall, understanding the Principles of vision has implications far beyond engineering, since visual perception is ‘one of the key modules of human intelligence, Ulimately, understanding the problem ‘of vision islielyt help us understand the brain. For this reason, Iam sure that along and successful series of new editions will follow this book, with updates most likely to.come in the chapters dedicated to object recognition and in new hot topics such as adaptation and learning. Introductory Techniques for 3-D Computer Vision is much more than a good ‘textbook: I's the fist book to mark the coming of ge of our own discipline, computer ‘Tomaso Poggio ‘Cambridge, MA Brain Sciences Department and Artificial Intelligence Laboratory “Massachussetts Insitute of Technology Preface: About this Book Here tae this book and peruse it wel (Christopher Marlowe, Doctor Fuss, rai die ei fare ce’ di mezzo il mar! Ttlin proves What this Book is and is Not ‘This book le meant:0 be: ‘+ anapplie introduction tothe problems and solutions of modern computer vision ‘+ practical texbook, teaching how to develop and implement algorithms to rep: resentative problems. ‘+a structured, easy-to-follow textbook, in which each chapter concentrates on a specifi problem and solves it building on previous results, and all chapters form logical progression, + a colection ofseleted, well-tested methods (theory and algorithms), aiming to balance difficulty and applicability. ‘+ a starting point to understand and investigate the literature of computer vision, including confxences, journals, and Internet sites a sel-teaching tool for esearch students acad and professional scientists This book is nor meant to be: + an alLembracing book on computer vision and image processing ‘+ a book reporting research results that only specialist ean appreciate: It is meant for taching an exhaustive or historical review of methods and algorithms propose for each problem. ‘The choive of topics has been guided by our feeling of practitioners. There is ‘no implication whatsoever that what is lft out is unimportant. selection has been Between words ndnkthere ithe se.Preface: About this Book imposed by space limits and the intention of expining both theory and algorithms to the level of detail necessary to make implementation really posible What are the Objectives ofthis Book? + ‘Tointroduce the fundamental problems of computer vision, ‘+ Toenable the reader to implement solutions for easonaby complex problems. + To develop two parallel tracks, showing how fundamental problems are solved using both intensity and range images, wo most popular types of images in today's ‘computer vision community. ‘+ To enable the reader to make sense of the literature of computer vision. ‘What isthe Reader Expected to Know? This book hasbeen written for people itrestedin programming solutions tocomputer ‘ison problems The bes way of reading iso ty out the algorithms ona computer. ‘We assume that th eader is able oraslte ou pseudocode into computer programs, and therefore that he o hei familiar with language suitable for numerical compu tations (for instance C, Fortran). We also expect thatthe reader has aces to popular rumercl bares tik the Numerical Rerpes or Meschach orto high-evellanguages, for developing numerical software like MATLAB, Mathematica or Sia The whole book is non-languae specif: We have endeavored to present ll he necessary vision specif information, so thatthe reader anly needs some competence ina programming Language Although some of the mathematics may appear complex at fist glance, the whole book revolves around base calls linear algebra including least squares, eigenvee: tors andsingular value decomposition andthe fundamentals ofanalyticand projective geometry. Who can Benefit from this Book? + Students of university courses on computer vison, typically final-year undergrad uates or postgraduates of degrees like Computer Science, Engineering, Mathe: matics, and Physics. Most of the knowledge required to read this book should be part oftheir normal background, ‘+ Researchers looking for a modem presentation of computer vision, as well asa collection of practical algorithms covering the main problems ofthe discipline. ‘+ Teachers and students of professional training courses. ‘+ Industry scientists and academics interested in learning the fundamentals and the practical aspests of computer vision. 3For information on tia the othe packages msatiaed ere se Chae Preface: About this Book xv How is this Book Organized? Each chapter is opened by a summary fits contents, and concluded by asel-check list, review questions, acancse guide tofurther readings, as wellasexercses and suggestions for computer project. For each problem analyzed, we give 1. problem statement, defising the objective to be achieved 2, atheoretical treatment ofthe problem, 3, one or two algorithms in pseudocode. 4, hints on the practical applicability ofthe algorithms ‘few maberatil concep are cui tothe understanding of solos and signin bt not neces Haown to everybody To make te ook ressoaby Scfeontaned, we lave ince an append ih several brit section reviewing Bachgound tps We ed to gear te append to he velo etl neo 0 Understand te dscsions ofthe main extn tempt ovo jut amere spel ot vague reminder. it iade an fro Keep the fone informal thowghot,hopefly without rsaxing oo mosh te material gor. "he paps ave ben designed oft ick etication of important mater Pelem Stement port defitons and algorithms at ese i fas hin and coments of pace elvan alling codag suggestion spear ina fren point rand are bight a pointer (=). Final we ht nuded in Caper information on he computer von com smn indding ptr Internet von ies (eta image ad dimen) Anda tofthe na pucatons electron newts ndcnferences Suggestions for Instructors “Thematerialinthistextshould be enough for two semesters atthe senior undergraduate level, assuming thre hours per week. Ukimately, this depends onthe students’ back: ‘round, the desired evel of detail the choice of topics, and how much time is allocated ta project work. Insructors may want to review some of the material in the appendix in the first few lectures ofthe course nase ony ore semester i available, we suggest two selections of topics. + Stereo and Motion Chapters 1 to 6 (image acquisition, noise attenuation, feature texttaction andcalibration), then Chapters 7 (stereopsis) and 8 (motion analysis), ‘Object Recognition Chapters | to 6, then Chapters 10 (object recognition) and 11 {object location). [Ideally the students should be assigned projects to implement and test atleast some of the algorittms It is up to th instructor to decide which ones, depending on how the course is stctured, what existing software is available to students, and which parts ofthe book one wants to cover.‘So Why Another Book on Computer Vision? ‘We like to think of this textbook, frst and foremost, asa practical guide tothe solutions of problems characteristic of today's computer vision community As thisbookis meant for both students and practitioners, we have tied to give a reasonably complete theo= retical treatment of each problem while emphasising practical solutions. We have ried to state algorithms as clearly as possible, and to lay out the material in a graphically appealing manne, i a logical progression. Ttseems tous that theresa shortage of such textbooks on computer vision, There are books surveying large number of topics and techniques, often large and expensive, sometimes vague in many places because of the amount of material included; books very detailed on theory, but lacking on algorithms and practical advice; books meant for the specials, reporting advanced esuts in specific research or application areas, but of little use to students; and books which are nearly completely out of date. Moreover, and ‘not infrequently in computer vison, the style and contents of research articles makes it dificult (sometimes close to impossible) to reimplement the algorithms reported. ‘When working on such articles for this book, we have tried to explain the theory in what seemed to us a more understandable manner, and to add details necessary for implementation. Of course, we take full and sole responsiblity for our interpeetalion. ‘We hope our book fils gap, and satisfies areal demand. Whether or not we have succeeded is for you, the eader, to decide, and we would be delighted to hear your ‘comments. Above all, we hope you enjoy reading this book and find it useful ‘Acknowledgments ‘We are indebted toa numberof persons who contributed various ways tothe making ofthis book. We thank Dave Braunegg, Bob Fisher, Andrea Fusiello, Massimiliano Ponti, Claudio Uras, and Larry Wolff for their precious comments, which allowed us tore move several laws from preliminary drafi Thanks ls to Massimiliano Aonzo, Adele Lorusso, Jean-Francois Los, Alessandro Migliorini, Adriano Pascoleti, Piero Parodi, and Maurizio Plu for their careful proofreading Many people kindly contributed various material which has been incomporated in the book; in the hope of mentioning them all, we want to thank Tiziana Acard, Bill Austin, Brian Calder, Stuart Clarke, Bob Fisher, Andrea Fusiello, Christan Fruhling, Alois Goller, Dave Lane, Gerald McGunigle, Stephen McKenna, Alessandro Miglio- fini. Maid Mirmehd, David Murray, Francesca Cwlone, Mattzin Pi, Costas Plaka, Toseba Tena Ruiz, John Selkirk, Marco Straforini, Manickam Umasuthan, and Andy Wallace. Thanks to Marco Campani, Marco Cappello, Bruno Caprile, Enrico De Micheli, Andrea Fusello, Federico Girosi, Francesco Isgrd, Greg Michaelson, Pasquale Ot tonello, and Vito Roberto for many useful discussions (Our thanks to Chris Glennie and Jackie Harbor of Prentice-Hall UK, the former {or taking us through the early stages ofthis adventure, the latter for following up with remarkably light-hearted patience the development of this book, which was peppered by our consistent inftnging of deadlines. Thanks to Irwin Zucker and Tom Robbins of Prentice Hal in the US. for taking the book through is very final stage. Finally very special thanks to Clare, Daniela, Emanuele, Emily, Prancesce and Lorenzo, who put up wit two absent fathers and husbands for many a month, for their support and love, Fortunately for us, maybe unfortunately for them, we are back, Binanuele Truce Alessandro Ver Dept of Computing and Dip. di lnformaticae letra Engineering ‘Scienze de nformarion Heriot-Watt Unizersiy Universi i Genova Rcearton Via Dodecanes0 35 Edinburgh EH92501 16146 Genova UK Kea sretece.t0- 46. vorridge-inéa.tt [stu wn writen eile the authr was wi the Department of Pass tthe Unversity of Gens,L Introduction “alse “Ready when you are” Big Trouble in Lite China 1.1. Whatis Computer Vision? “This isthe fis, ineseapable question of this book. Since itis very dificult to produce an uncontroversial definition of such a multifaceted discipline as computer vision, let tusask more precise questions Which problems are we attempting to tackle? And how ddowe plan to solve them? Answering these questions wil imit and define the scope of this book, and, in doing so, motivate our definition of computer vision, ‘The Problems of Computer Vision, The target problem ofthis book is comput ing properies ofthe -D world from one or mor digital images. The properties that intrest us are mainly geometric (fr instance, shape and position of solid objets) and dynamic (for instance, object velocities). Most ofthe solutions we present assume that ‘considerable amourt of image processing has alteady taken place; thats, new images have been computed from the original ones, or some image parts have been identified tomake explicit the information necessary to the target computation The Tools of Cemputer Vision. As the name suggests, computer vision involves computers interpreting images. Therefor, the toolsmeeded bya computer vision sytem inchude hardware for acquiring and storing diptal imagesina computer, processing the mages, and commun ating results to users or other automated systems Thisisa book bout the algorithms 3f computer vision: it contains very litle material about hardware, but hopefully enough o realize where digital images come from. Ths does nor mean that algorithms and software are the only important aspect ofa vision system. On the2 Chapter 1 Introduction ‘contrary, in some applications, one can choose the hardware and can engineer the scene to facilitate the task ofthe vision system: fr instance, by controlling the illumination, using high-resolution cameras, or constraining the pose and location of the objects In ‘many situations, however, one has litle or no control over the scene. For instance, in the case of outdoor surveillance or autonomous navigation in unknown environments, appropriate algorithms are the key to sucess. ‘We are now ready to define the scope of computer vison targeted by this book: a set of computational techniques aimed at estimating or making explicit the geometric and dynamic properties of the 3-D world from digital images. 1.2. The Many Faces of Computer Vision ‘An exhaustive list of all the topics covered by the term “computer vision” dificult to collate, because the ld is vast, multidsciplinry, and in continuous expansion: new, exciting applications appear ll the time. So there is more to computer vision than this ‘book can cover, and we complement our definition inthe previous section with a quick ‘overview of the main research and application areas, and some related disciplines, 1.21 Related (Computer vision has been evolving asa multidisciplinary subject for about thirty years. Its contours blend into those of artificial intelligence, robotics, signal processing, pattem recognition, control theory, psychology, neuroscience, and other fields Two consequences of the rapid growth and young age of the field of computer vision have been, that + the objectives tools nd people ofthe computer vision community overlap those of several other disciplines. * the definition and scope of computer vision are sill maters of discussion, so that all definitions should be taken witha grain of sal You are likely to come across terms like image analysis, scene analysis, and image understanding, tht in this book we simply regard as synonyms for computer vision, ‘Some other tems, however, denote disciplines closely related but not identical (0 computer vision. Here are the principal ones: mage Processing. Image processing isa vast research area. For our purposes, it differs from computer vision in that it concerns image properties and image-o-image transformations, whereas the main target of computer Vsion isthe 3-D world. As most computer vision algorithms require some preliminary image processing, the overlap between the two disciplines is significant. Examples of image processing include en- ‘hancement (computing an image of better quality than te original oe), compression (devising compact representations for digital images, typically for transmission pur- ‘poses, restoration (climinating the effect of known degradations) and featureexraction (locating special image elements like contours, or textured areas). A practical way to Section 12. The Many Faces of Computer Viion 3 understand the diference between representative problems of image processing and ‘computer vision is tocompare the contents of Chapters 3 4, and 5 with those of Chap: ters6 011, Pattern Recogaition. For along time, pattern recognition has produced teeh- niques for recognizng and clasifving objects using digial images. Many methods developed in the past worked well with 2-D objects or 3D objects presented in constrained poses, but were unsuitable forthe general 3-D world. This triggered much of the research which led to todays field of computer vision. This book does not cover classic patter recognition, although some of its methods creep up here and there, The International Association jor Pattern Recognition (IAPR) gathers many researchers and users interested inthe field, and maintains a comprehensive WWW site (ntep://peipa.esesx.ac.uk/iapr/) Photogrammery. Photogrammetry is concerned with obtaining reliable and accurate measuremers from noncontactimaging. Ths discipline overlap les withcom- puter vision than image processing and pattern recognition The main differences are that photogrammetry pursues higher levels of accuracies than computer vision, and not all of computer vision is related to measuring Taking a look at photogrammetric meth- ‘ds before designinga vision system carrying out meesurements i always a good idea ‘The International Soiesy of Photogrammetry and Remote Sensing iste international organization promoting the advancement of photogrammetry. It maintains avery com: prehensive Internet site (ae%p://0u.p.igp-ethz.ch/isprs/isprs.htal) including archives and activites, and publishes the Journal of Photogrammerry and Remote Sensing 1.22 Research and Application Areas For the purposes ofthis section, research areas refer to topics addressed bya significant ‘numberof computer vision publications (a visible indicator of research), and application ‘areas refer to domains in which computer vision methods ate used, possibly in conjunc- tion with othe techuologies to solve real-world problems. Te following ists and the agcompanying figures should give you the flavor ofthe variety and scope of computer vision; further applications ae illustrated in the book. The ls are meant to be sugges tive, not exhaustive; most ofthe terms that may be unclear now will be explsned later inthe book. Examples of Research Areas Image feature detection Contour representation Featuro-hased segmentation ange image analysis ‘Shape modelling and representation ‘Shape reconstruction from single mage cues (shape from X)4 Chapter 1 Introduction Stereo vision Motion analysis Color vision ‘Active and purposive vision Invariants ‘Uncalibeated and selcalibeating ystems Object detection 34D object recognition 3D object location High-performance and real-time architectures Examples of Applicaton Areas Industrial inspection and quality control Reverse engineering Surveillance and security Face recognition Gesture recognition Road monitoring Autonomous vehicles (land, underwater, space vehicles) Hancb-eye robotics systems Space and applications Miltary applications Medical image analysis (e., MRI, CT, X-rays, and sonar sean) Image databases Virtual reality telepresence, and teleroboties 1.3 Exploring the Computer Vision World ‘This setion provides satin set of pointers (othe mulifceted work of computer vision. nal the following list, items appear in no particular order, 1.3.1. Conferences, Journals, and Books Conferences. ‘The following international conferences cover the most significant advancements on the topics central o this book. Printed proceedings are avsilable for allconferences, and details appear regularly on the Internet. International Conference on Computer Vision (ICCV) International Conference on Computer Vision und Pattern Recognition (CVPR) European Conference on Computer Vision (ECCV) Section 13. Exploring the Computer Vision World 5 Figure 11 protaype of 2D inspection cel, The cell incudes two types of depth sensors 8 laser scanner, and a Mair fine sem (se Chapter2), whi locate the object n space and esform measurements Noie the urnable for optima, tomate objet pastoning. International Conference on Image Processing (ICIP) International Conference on Pattern Recognition (ICPR) Several national conferences and international workshops are organized on annual or biennial hasis. A complete list would be too long, so none of these are mentioned for fairness. Journals, ‘The following technical journals cover the most signilicant advance ‘ments onthe fied. Tey ean be found inthe libraries of any university hosting research ‘on computer vision or image processing International Journal of Computer Vision IEEE Transactions on Pattern Analysis and Machine Intelligence Compuier Vision and Image Understanding ‘Machine Visionand its Applications “Image and Visien Computing Journal Journal ofthe Cptical Socesy of America A Pater Recognition6 Chapter 1 Introduction Figure 12 Left: automatic recogiion of road bridges in arial inrared images courtesy of Majid Mirmebdi, University of Surey; Cow copyright reproduced withthe permission ‘of the Conller of Her Majesty's Stationery Office), Right an example of automatic ace detection, parcalarly important for survslline and security systems Tae face regions teleted ‘ean bo subsequenly compared wi database of faces for dentieaton (courtesy of Stephen MeKenna, Queen Mary and Wesel College, London) Paitemn Recognition Leters IEEE Transactions on Image Processing IEEE Transactions on Systems, Man and Cybernetics IEE Proceedings: Vision, Image and Signal Processing Biological Cybernetics Neural Computation Arfcal Intelligence Books. So many books on computer vision and related fields have been published that it seems futile to produce lon lists unless each entry is accompanied by a ‘comment. Since including a complete, commented ist here would take too much space, wwe leave the task of introducing books inspect, echnical contexts to the following. chapters. 13.2 Internet As the Internet undergoes continuous, ebullient transformation, this information is likely o age faster than the rest of this book, and we can only guarantee that the section 12 Exploring the Computer Vsion World 7 Steaks and avtonomous road navigation: some images from a sequence nthe estimated motion Bed (optical low, cscs in Chapter 8) rapa ndzating the sativa motion of wel and came Figure 12 Computer vsion scguired fom a moving 2 ‘Somputed by motion rays list below is correct a the time of printing, Further Internet sites related to specitic problems ae given in the relevant chapters of this book. cau. edu/-ch1/vision chive home page, nevp:// demos, archives, research + The Computer Vision Home Page, itep:// haem} and the Pilot Buropean Image Processing Ar peipa.essex.ac.uk, contain links to test images,8 chapter Introduction Section 13 Exploring the Computer Vision World 9 Figure 15 Computervison and vinta eleprsence: the movements ofthe operator’ head ae racked by a sion system (aot shown) and copied in el ine by the head-ee plato (or stereo head) onthe sigh (courtesy of David W. Maray, University of Oxford) roups, research publications, teaching material, frequently asked questions and plenty of pointes to other interesting sites. + The Annotated Computer Vision Bibliography is an excllent, well-organized source of online published papers and eports, as well as announcements of con ferences and journals at http: //iris.usc.edu/Vision-llotes/oibliography/ contents. heal. You can search the contents by keyword, author, journal, con ference, paper te, and other ways. ‘ery competes bibliographies on image analy, pater reopition and Figure 14. Compote vison ming nein npn for ene) oe nd compute von are proded every yer by Aatel Resetfeld atte Univer turner vec (ROVAUYS te tt one shown os ANGUS, bl by the Siy or Mayland (Eepe/septeleoycon/VSION-LISTARCEVE/RUSEPLD™ Occr Sulems Laborato Hero Wat Unser As wh ay ROVIAUYS ANGUS He ae Wee ee ements cis so an sonar somos Chager) Boton ln example of wart soe patio Balan age The wt cs ute rons rma ver athe ples pls, nage fom shove : Beton agi he rot of oma uo lj inret te of Dave Lae + C¥ontin i colton of hypertext summaries of methods and applications o eso Unberst) computer vis, rece extished ty the Univers of abu at /7 wor dato ae h/eaigh/svatt/pesonth pages/eh/Clontine/Nenty. ne)0 Chapter 1 Introduction Figure 16 An example of modal application of computer vision: computer sista diagoses from mammographic images Top: Xray image of a female breast, distized rom a conventional Xiu) photography. Bottom: ose-up and automatic identifation of suxpet nodules (courtesy ‘rseuar Clarke and Brian Calder, ero Watt University, and Matthw Freedman, Georgetown University Media Scoot, Washington DC). Section 14 TheRoad Ahead 11 «The Vision List and The Pixel are fre electronic bulletins circulating news and requests, and hostng technical debates To subscribe, email pixel essex.ac.x and Vislon-List-Requesteteleoe.con, Both have fip and WWW archives of ‘useful material 133. Some Hints on Math Software “Thssecton gives pointers to numerical computation packages widely used in computer vision, which we founc useful. Notice that this list reflects only our experience; no ‘comparison whatsoever with other packages i implied «+ Numerical Recipes isa book and software package very popularin the vision com ‘munity The soure: code, in C, FORTRAN and Pascal, is published by Cambridge University Press together with the companion book by Pres, Teukolsky, Vet tering, and Flannery Numerical Recipes in CFORTRAN/Pascal. The book is an ‘xcellent introduction to the practicalities of numerical computation. There ialso 1 Numerical Recipes: Example Book illustrating how to cal the library routines. + Meschach i a public-domain numerical library of C routines fo linear algebra, developed by David F. Stewart and Zbigniew Leyk ofthe Australian National University, Canberra. For information and how to obtain a copy, see Web page at http: //a.netlib.no/net1ib/c/neschach/readze, + MATLAB sa software environment for fast prototyping of numerical programs, ‘vith its own language, interpreter, libraries (called ¢oolbox) and visualization took, commerciaized by the US. company The MathWorks. It is designed to be ‘easy to use, and rans on several platforms, inchuding UNIX and DOS machines. Matlab is desrited in several rocent books, and there is large community of tasers Plenty ofieformation on software, books bulletins training and soon avail fable at The MarsWorks’ WWW site, xeep://nex nathworks con/, of contact "The MathWorks In, 24 Prime Park Way, Natick, MA 01760, USA. «+ Mathematiais another software ensironment for mathematical applications, with ‘a large community of users. The standard reference book is Stephen Wolfram’ ‘Mathematica, Plenty of information on software, books, and bulletins available at ‘Wolfram Research's WWW sit, tp: /awy.04 -con/. «+ Sila is a publedomain scientific software package for numerical computing developed ty INRIA (France). It includes linear algebra, control, signal processing, praphics and animation. You can aocess Sciab from http: //awe= rroeq.inria, fr/ecilab/, or contact Scslabeinria. fr, 14 The Road Ahead “This book is organize in two logical parts The fist part (Chapters 2to 5) deals with the image aequsition and processing methods (nose attenuation, feature extraction, Fine and curve detection) necessary to produce the input data expected by subsequent algorithms The primary purpose of thsrst part isnot to give aneshaustive treatment of Jmage processing, butto make the book sl-contained by suggesting image processingChapter 1 introduction Figure 1.7 ‘The book ata glance: metho dases (white boxes), results (grey Box) their faterdependence, and where to ind the various tpi in this book nto commonly fund inn some cases characteris of compte vison, The me era ebook (Chaps 6 1d with he conpute sn probes {cts nn anal bee entation, objet lean) hat we hve nied sours Tree cute of he book captured by Fi 17, wich stows the metho preset her merdpenene he mtr nr and wth ha {Eons ich opis Ouro tops wh he seg of ne inet or eo ‘oars id sequen, Ble bmg lo rts te images ae proceed atten nie ogo bythe auton pres. The arget Mrantn {tre ot lean and en of ec and ssemparan- Section 1.4 The Road Ahead 13 eters) ae shown at the bottom of Figute 7. The diagram suggests that in most cases the same information an be computed in more than one way. ‘One well-known class of methods rely on the identification of special image clement, called image features. Examples of such methods are: + callration, which determines the value of internal and external parameters of the vision system; + stereo anayss, vhich exploits the difference between two images to compute the structure (shape) of -D objects, and thei location in space: + recognition, whigh determines the objects’ identity and location; + feature-based motion analysis, which exploits the finite changes induced in an mage sequence by the relative motion of world and camera to estimate +-D structure and motion; and + some shape fron single image methods, which estimate 3D structure from the information contained in one image only. [Another class of methods computes the target information from the images directly Of these, this book ineludes: + one shape from single image method, which estimates 3D structure from the shading of a singe image, and + optical flow metiods, class of motion analysis methods which regards an image sequence as a cbse approximation ofa continuous, time-varying sign ‘We are now ready to begin our investigation into the theory and algorithms of computer vision.Digital Snapshots Verwelle doch! Dubs so shin!! Goethe, Faust This chapter deals with digita. images and their relation tothe physical work, We learn the principles of image formation define the two main types of images i this book (inensity and range images), and discuss hov to acquire and store them in a computer. Chapter Overview Section 2.2 considers the base optical, radiometric, and geometric principles underlying the formation of intensity images. ‘Section 23 brings the computer into the picture, laying out the special nature of digital images, their acquisition, and some mathematical models of intensity cameras, Section 24 discuses the funlamental mathematical models of intensity cameras and theit parameters Section 2.5 introduces range images and describes class of range sensors hated on intensity ‘cameras o that we can use what we learn about intensity imaging ‘What You Need to Know to Understand this Chapter ‘+ Sampling theorem (Appentix, section A3). + Rotation matries (Appendix, section A.) "op Yor ae ote! 86 Chapter 2 Digital Snapshots 24 Introduction This chapter deals with the main ingredients of computer vision: digital images. We concentrate on two types of images frequently used in computer vision: intensity images, the familiar, photographie images encoding light intensities, ac ‘quired by television cameras; ange images, encoding shape and distance, acquired by special sensors like sonars oF laser scanners Intensity images measure the amount of Fight impinging on a photosensitive device rangeimagesestimate directly the3-D structure ofthe viewed scene through avarity of techniques. Throughout the book, we will develop algorithms for both types ofimages? tis important to stres immediatly that an digital image, irrespective of type is a2-D army (trix) of numbers. Figure 21 illustrates ti fact forthe case of intensity images Depending onthe nature of the image the numbers may represent light intensities, distances or other physical quantities Ths fact has two fundamental consequences + The exact relationship ofa digital image to the physical world (ie. its nature of range or intensity image) is determined by the acquisition process, which depends on the sensor used + Any information contained in images (¢. shape, measurements, or object iden: sity) must ultimately be extracted (computed) trom7-D numerical arrays in which itisencoded. {In this chapter, we investigate the origin ofthe numbers forming a digital image: the rest ofthe book is devoted to computational techniques that make explicit some of the information contained implicy in these numbers. 22. Intensity images ‘We startby introducing the main concepts behind intensity image formation 2.2.1. Main Concepts In the visual systems of many animals incloding man, the process of image formation begins withthe light ays coming from the outside world and impinging onthe photore ceptors in the retina A simple look at any ordinary photograph suggests the variety of physical parameters playing a role in image formation. Here i an incomplete list: (Optical parameters of the ens characterize the sensor's optics They include «lens ype, + foal length, Anoop esas ome algorithm mk ses fr nts nage cn Section 22 Intersty mages 47 Figure 21. Digital images are 2 arrays of numbers: 2020 grey evel image of an eye (pivels have been elrgd for display) andthe coresponding 2-D acy. + fild of view, + angular apertures. Photometric pramcters appear in models ofthe light energy reaching the sensor after being reflected from the objects in the scene. They include: + type, intensity, and direction of ilumination, « reflectance properties ofthe viewed surfaces, + effets ofthe sensors structure on the amount of ight reaching the photore cxptors Geometric parameters determine the image position on which aD points projected. ‘They include: + typeof projections, + position and orientation of camera in space, + perspective distortions introduced by the imaging process.8 Chapter 2 Digital snapshots ‘oprica ‘SoREEN ‘opricaL ae svsTeM ‘APERTURE Figure 22. "The basic elements of an imaging device Allthe above playsa olen any intensity imaging device, beita photographiccam- cera, camooeder, or computer-based systom. However, further parametersare needed to characterize digital images and their acquisition systems. These include: ‘the physical properties of the photosensitive matrix of the viewing camera, ‘the discrete nature ofthe photoreceptors, ‘the quantization of the intensity scale We will now review the optical, radiometric, and geometri aspects of image formation, 222 Basic Optics ‘We fist noed to establish afew fundamental notions of optics. As for many natural visual systems, the process of image formation in computer vision begins with the light ‘ays which enter the camera through an angular aperture (or pupil), and hit a sereen ‘oF image plane (Figuse 2.2), the eamera’s photosensitive device which registers ight intensities. Notice that most ofthese rays are the result of the reflections of the rays ‘emitted by the light sources and hiting object surfaces. Image Focusing. Any single point of a sene reflects light coming from possibly ‘many directions so that many ras reflected by the same point may enter the camera In order to obtain sharp images, all rays coming from a single scene point, P, must ‘converge onto a single point on the image plane, p, the image of P. i this happen, we say that the image of Pisin focus if ot, the image is spread over aetce, Focusing all rays from a scene point onto a single image point can be achieved in two ways: 1 Reducing the camera's aperture to a point, called a pinhole. his means that only ‘one ray from any given point can enter the camera, and creates a one-to-one correspondence between visible points, rays, and image points Thsresultsin very Section 22 Intensity mages 19 sharp, undistored images of objects at different distances from the camera (see Project 21), 2. Introducing an optical system composed of lenses, apertures, nd other elements, explicily designed to make all rays coming from the same 3-D point converge ‘onto a single image point ‘An obvious disadvartage of a pinhole aperture is its exposure rime; that i, how long the image plane is alowed to receive light. Any photosensitive device (camera film, electronic sensors) needs @ minimum amount of light to rgistera legible image. As @ pinhole allows ver litle lightinto the camera er time uit the exposure time necessary toform the image is too long (typically several seconds) tobe of practical use? Optical systems, instead, can 2e adjusted to work under a wide range of illumination conditions nd exposure times (he exposure time being controlled by a shut), "= Intuitively an opal system canbe regarded as a device that sims at producing the same ‘mage obtained pinhole aperture, but by means oa much ager aperture and ashore «exposure time: Moreover, an optical systems enhances the light gathering power. Thin Lenses. Standard optical ystems are quite sophisticated, but we can learn the basi ideas from the simplest optical system, the shin lens, The optical behavior of thin lens (Figure 2.3) is characterized by two clement: an axis, called opical avs, going through the lens center, O, and perpendicular to the plane; and two special points F and F,, called left ani right focus, placed on the optical ais oa the opposite sides of the lens, and at the same distance from 0. This distance, called the focal length ofthe lens is usually indicated by By construction a tin lens deflects all rays parallel tothe optical axis and coming from one side onto tke focus onthe other side, as described by two basic properties, Thin Lens: Basic Properties 1 Any ray entering the les parallel othe axis on one side goes through the fots on the “other side, 2. Any ray eoteringthe lens from the focus. on onesie emerges pall to thesis oa the ther side The Fundamental Equation of Thin Lenses. Our next task isto derive the {fundanerial equation of thin lenses from the basic properties 1 and 2. Consider a point ?P, not too far from the optical axis, and let 7+ f be the distance of P from the lens along the optical axis (Figure 2.4). By assumption, a thin lens focuses al the rays from onto the same poit, the image point p. Therefore, we can locate p by intersect only two known rays.and we do not have to worry about tracing the path of any other sume, uhh, ven) popetonlo the ager othe spre dane, ihn um ‘spropriorl the mont of ight ht enter the nag emChapter2. Digital Snapshots Section 22 Intensity mages 21 Figure 23 Geometic optics ofa thin les (a perpendcaat view 10 ‘he plane approximating the len} Note that by applying property 1 tothe ray PQ and property? tothe ray PR, PQ and PR are deflected to intersect at certain pointon the other side of the thin lens. But since the lens focuses all rays coming from P onto the same point, PQ and PR must intersect at p! From Figuee 24 and using the two pars of similar triangles < PS > and < ROF; > and < psF, > and < QO F, >, we obiain immediately f en Setting 2=Z + f and3=2+ f, (21) reduces to our target equation. 2: ‘The Fundamental Equation of Thin Lenses, @2) © The say going through the lens ceter, 0, named the principal ray, ges through pun selected. Field of View, One last observation about optics Let d be the effective diameter ofthe lens, ideatifying the portion ofthe lens actually reachable by light rays Figure 24 Ingingby a thin lens Noie tht in genera, tel leas has wo Aitferent focal lengths, because the carvatares of twosurtaes may beillerent ‘Te sation depicted hee ia speci case, but its sufcient for our purposes. See the Furhet Readings at the end of his chapter for more on optics. © Wecalld the efcive diameter wo emphasie the ilerence between d and the physical ameter ofthe ens The aperture may prevent ight rays from reaching the peripheral Pon ofthe lng so that ds usually smaller than the pial diameter ofthe les. ‘The effective lens diameter and the focal length determine the field of view of the lens, which isan angular measure ofthe portion of3-D space actually seen by the camera Itiscustomary to deine the field of view, w, as half of the angle subtended by the lens ameter as seen fom the focus: a nw = 55 ee ‘This is me minimum amount of optics needed for our purposes Optical models of real imaging devices are a great desl more complicated than our treatment of thin (and ideal) lenses; problems and phenomena not considered here include spherical aberration (defocusirg of nonparaxial rays), chromatic aberration (different defocusing ‘of rays of different dors), and focusing objets at different distances from the camera.* “The findanetal oat oth ems imp ht scene point dln stances he esc ‘infocus te! nag dsances The opal las sews fel exneras re signed hat al ons thn agen ge of dances aread ono slo ote mage plane nd etetoe sepa ino, “Thisrange scaled dh ld ofthe camera22 Ghapter2_ Digital Snapshots Lic source see coo ARRAY 0. Tot D> omnes p . ap) ua) 1 @ |sunrace Figure 25. tlusraton of the basic radiometric concepts. ‘The Further Readings section atthe end of this chapter tell where to find more about opts. 2.23. Basic Radiometry ‘Radiometrysthe essential part of image formation concerned with the relation among the amounts of light energy emitted {rom light sources, reflected from surfaces, and registered by sensors. We shall use radiometric concepts to pursue two objectives 1. modelling how much of the illuminating lights reflected by object surfaces; 2, modeling how much ofthe reflected light actually reaches the image plane of the Definitions. We begin wit some definitions, illustrated in Figure 25 and sum marized as follows: ‘Image Irradiance and Seene Radiance ‘The image tradiance isthe power of he ight, er unt area and at each point p of the image plane "The sone radiance isthe power ofthe light, er unit rca, deally emitted by each pot P of surface in SD space ina given direction d = Ideally refers to the fact thatthe surface in the definition ofsoene radiance might be the illuminated surface ofan objet the radiating surface ofa light source, or even & Scttiows surface. Te term scone radiance denotes the tll radiance ete bya pint; Section 22 Intensity mages 23 Sometimes radiance refers tothe enery raised from a surface (emitted or reflected), whereas rane refers to the energy incident on a surface. Surface Reflecance and Lambertian Model. A model of the way in which 2 surface refiecs incident light called a surface reflectance model. A well-known one is the Lambertian model, which assumes that each surface point appears equally bright from all viewing directions This approximates well the behavior of rough, nonspecular surfaces as well as various materials lke matte paint and paper. If we represent the direction and amouat of incident light by @ vector I, the scene radiance ofan ideal Lambertian surface, , is simply proportional tothe dot product between Tand the unit ‘normal to the surfac, Un ea) With >0 a constant called the surfac's albedo, which is typical of the surfaces material, We abo assume that I's postive; that the surface faces the ight source “his isa necessary canton forthe ay flight wo reach P this condition snot met, the scene radiance shouldbe set equal 0. ‘We willuse the Lambertian modelin several pars fthisbook; or example, wile analyzing image sequences (Chapter) and computing shape fom shading (Chapter9). Intuitively, the Lambertian modelisased on the exact cancellation of two factors Ne- sletng constant ems the amount of ight reaching any surface isalways proportional tothe cosine of the angle between the Huminant and the surface normal w (that i the effective area of the suriae as seen fom the luinant direction). According to the model, a Lambertian surface reflects light ina given direction d proportionally to the cosine of the angle between dand m. Bu since the surfaces rea sen fom the m grid of squared ‘element (0). Section 23 Acquiring Digital Images 31 In summary itis convenient fo assume that the CCD elements are always in ‘one-to-one correspondence with the image pixels and to introduce effective horizontal and vertical sizes to account forthe posibe different scaling along the horizontal and vertical direction, The effective sizes of the CCD elements are our first examples of camera parameters, vhich are the subject of section 24, 23.2. Spatial Sampling ‘The spatial quantization of images originates at the very early stage of the image {formation process asthe photoreceptors ofa CCD sensor are organized in rectangular array of photosensitive elements packed closely together. For simplicity, we assume that the distance d between adjacent CCD elements (specified by the camera manufacturer) isthesamein thehorzontal and vertical directions Weknow from the sampling theorem ‘that d determines the highest spatial frequency, that can be captured by the system, according tothe relation How does this characteristic frequency compare with the spatial frequency spectrum of ‘mages? A classical sult ofthe diffraction theory of aberration states that she imaging _process canbe expressed in terms of a liner low-pass fltring ofthe spatial frequencies, (ofthe visual signal Foe more information about the diffraction theory of aberrations, see the Further Reafings) In particular, fais the linear size of the angular aperture ‘ofthe optics (eg, the diameter ofa circular aperture), the wavelength of ight, and f ‘the focal length, spatial frequencies larger then SF {do not contribute tothe spatial spectrum of the image (that is they are fered ou). Ina typical imege acquisition system, the spatial frequency vs nearly one order ‘of magnitude smaller than v.. Therefore, since the viewed pattern may well contain spatial frequencies larger than, we expect aliasing. You can convince yourself of the realty of spatial aliasing by taking images of a pattern of equally spaced thin black lines on a white background (sce Exercise 2.6 at inercasing distance from the camera. ‘As predicted by the sampling theorem, ifm isthe numberof the CCD elements inthe horizontal direction, the camera cannot se more than’ vertical lines with’ somewhat less than n/2, say n’~n/3). Until dhe number of ines within the field of view remains smaller than all the ine are correctlyimaged and resolved. Once the limits reached, i the distance of the patter is increased further, but before blurring effects take over, the numberof imaged lines decreases a the distance of the pattern increases! © Themainresor why spatial aliasing soften newecedis thatthe amplitude (hat ithe {informatica coment of high requency components of ordinary images s usualy, though ‘byo means always, very small32 Chapter? Digital Snapshots 233 Acquisition Noise and How to Estimate It Let us briefly touch upon the problem of nose introduced by the imaging system and bow itis estimated. The effect of noses, essentially that image values are not those expected, as these ae corrupted during the various stages of image acquisition. Asa onsequcnc, the pixel values of twoimagesof the same scene taken bythe sume camera andinthe same light conditions are never exact the same (ty it), Such Buctuations vil intodce eosin the results of ealulation based on piel values iis therefore ‘important to estimate the magnitude ofthe noise. “The main objective of this setion io suggest sinple characterization of image noite, which can be used by the algorithms of fllowing chapters, Noise attenuation, in pariculr, isthe subject of Chapter 3 "An obvious way 1 proceed isto regard noisy variations as random variables, and ty to characterize their statistical behavior, Todo this we aequie a sequence of mages ofthe same scene in the same aquisition conditions, andcompute the pointwise average of the image rightness overallthe images The same sequence can also be sed to estimate the signalo-noise rai ofthe aoqusition system, as fllows® Algorithm EST_NOISE. Weare givenm images ofthe same cee, Fy .. Ey-1hich we assume guare (Nx N) fo simplicity Torech f= let r= Baw - 1 vore(2 {LTD Ba, a} en he quaiyo(, san eal the sana devon th acuion net ach pia “heavens of, oe te inagean ema fe serge owl ey (ot. anesinat ofthe wort ae acon noe, © Notice thatthe bea requeney of some Muorescent room lights may skew the resus of EST-NOISE. Figure211 shows the noise estimates relative to particular acquisition system. A statie camera was pointed at a picture posted on the wall. A sequence of n= 10 images ‘was then acquired. The graphs in Figure 2.11 reproduce the average plus aad minus ‘the sgn rato is usally exes ia decibel (8), an i deed as 10 tne the past in ‘eI othe to of wo power in ures, of pal nd nae). Fo xa» alton ao of 1heamesponc t= 28 Section 23 Acquiring Digital Images 33, 1 Srey eves scan line Figure 2.11. Estimated acquisition noise. Graphs of the average image rightness plu (solid ine) and minus (dotted ne) the estimated standard Geviaton, over sequence of inages of the same scene along the same orion can fe. The image brightness ranges fom 73 o 211 grey evel the standard devition of the image brightness (pixel values) over the entre sequence, along an horizontal scantine (image row). Notice thatthe standard deviation is almost independent ofthe average, typically ess than 2 and never larger than 25 grey values ‘This corresponds o an average signal-to-noise ratio of nearly one hundred "Another cause of noise, which i important when a vision system is used for fine reasurements is that pizel values are not completely independent ofeach other: some cros-salhing occurs between adjacent photosensors in each row of the CCD array. {due to the way the content of each CCD row is read in order tobe sent to the frame bulfer. This caa be vetfied by computing the autocovariance Cre, ) of the image of spatially unifoem pater parallel to the image plane and illuminated by diffuse light ‘Algorithm AUTO_COVARIANCE 1. Given an image E, foreach Cent’) -FTDVEU +E +1)-FT TT 8)Fn Chapter? Digital snapshots Figure 2.12 Autocovariance ofthe image ofa uniform patter for typical image dequisition sistem, showing ross aking Between adjacent piel along © The autcovarance should actually be estimated asthe average ofthe autocovarianee ‘computed on many images of the same patter. To minimize tbe effect of radiometric rotlinearies (se (2.13), Cre should be computed ona atch in the cena portion of the image Figure 2.12 display the graph ofthe average ofthe autocovariance computed on ‘many images acquired bythe same acquisition system used to generate Figure 2.11. The utocovariance was computed by means of (2.18) onapatch of 16 x I6pixelscentered in the image center, Notice the smal but visible covariance along the horizontal direction consistently with the physical properties of many CCD cameras, this indicates that the ‘rey valu ofeach pixel is not completely independent ofthat ofits neighbors 24 Camera Parameters ‘We now come back to discuss the geometry of a vision system in greater detail. In prtvula, we want to characterize the parametcrs underlying camera models 2A. Definitions Computer vision algorithms reconstructing the 3-D structure of a scene or computing the positon of objects in space need equations linking the coordinates of points in -D space with the coordinates oftheir corresponding image points. These equations are ‘written in the camera reference frame (see (2.14) and section 2.2.4), but itis often assunied that + the camera reference frame can be located with respect to some other, known, eference frame (the world reference frame), nd Section 24 Camere Parameters 35 + the coordinatesof the image points inthe camera reference frame canbe obtained {tom pivelcoorlinates, the only ones directly available from the image. This is equivalent to assume knowledge of some camera's characteristics, known in vision asthe camera's extrinsic and ininsic parameters. Our next task to understand ‘theexaet nature of the intrinsic and extrinsic parameters and why the equivalence holds. Defi ‘Camera Parameters ‘Tae extrinsic parameten ao the parameters that define he location and orientation of th camera reference fame wath rapes to known wold eteence fame. “Te nisi paramere are the parameters necessary oink the piel coordinates of animage point with the corespanding coordinates inthe camera reference frame. Inthe next two sections, we write the basi equations that allow us to define the extrinsic and intinsic parameters in practical terms. The problem of estimating the value ofthese parareters is called camera calibration. We shall solve this problem in ‘Chapter 6 since calination methods need algorithms which we discuss in Chapters 4 and 5 242. Extrinsic Parameters ‘The camera reference frame has been introdced forthe purpose of writing the funds ‘mental equations of the perspective projection (2.14) in a simple form. However, the ‘camera reference frane i ofen unknown, and & common problem is determining the location and orientation of the camera frame with respect to some known reference frame, using only image information. The extrinsic parameters are defined as any’ set ‘of geomeric parame that identify uniquely the transformation berween the unknown ‘camera reference frameand a known reference frame, named the world reference fame. ‘A typial choie for describing the transformation between camera and world frame isto wse + 1 3-D translation vector, desribing the relative postion ofthe origins ofthe twoeference frames, and + 43 x 3rotationmatri, R,an orthogonal matrix(R” R= RAT = 1) that brings the corresponding axes of the two frames onto each othe, The orthogonality relations reduce the numberof degrees of freedom of Rto thee (see section A.9 inthe Append). Inn obvious rotation (see Figure 2.13), the relation between the coordinates of ‘point Pin world sed camera frame, Py and P, respectively, is 19)6 ‘Chapter 2 Digital snapshots Figure 2.13. The elation between camere and world coordinate frames. with my rans Rel rams mre m3 Definition: Extrinsic Parameters The camer extrinsic parameters ae the translation vector, T, and he rotation math, (or, beter it fee parameters) which pei the transformation between the camera andthe word relerence frame 243. Intrinsic Parameters ‘The intinsc parameters can be defined as the set of parameters needed to characterize ‘the optical, geometric, and digital characteristics ofthe viewing camera, For a pinhole ‘camera, we need thee sets of intrinsic parameters, specifying respectively + the perspective projection, for which the only parameter is the focal length, f + the transformation between camera frame coordinates and piel coordinates; * the geometric distortion introduced by the optics. From Camera to Pisel Coordinates. To find the second set of intrinsic param- ‘ters, we must link the coordinates (1g Yn) of an image point in pixel units with the ‘coordinates (x,y) of the same point in the camera reference frame. The coordinates Section24 Camera Parameters 37 (ins Yin) canbe thought of as coordinates of a new reference frame, sometimes called image reference frame, ‘The Transformation between Camera and Image Frame Coordinates Neglecting any geometrdstorsions possibly introduced by the opis nd in the assumption that the CCD arrays made «rectangular rid of photosensitive elements, we have 2G, in) eam) with (2.0) the coordinates in pixel of the image center (the principal point), and (5) the tffetive sie of th pixel (in ilimeters) inthe boizontl and vertical direction respectively, ‘Therefore, the earrent set of intrinsic parametersisf,0,,0) 554 1% ‘The sign changein(220)isduc tothe fact thatthe horizontal nd vertical axes of the image nd camera elerece frames have opposite orientation. In several cases the optics introduces image distortions that become evident a the periphery of the image, or even elsewhere using optics with large fields of view. Fortunately, these distortions can be modelled rather accurately as simple radial distortions aecordingto the relations xenil thr? th) pull thy? + hor with (2494) the coordinates ofthe distorted points, and = x + 93. As shown by the equations above, this distortion isa radial displacement ofthe image points The displacement i all atthe image center, and increases with the distance ofthe point from the image center. ky and a are further intrinsic parameters Since they are usually ‘ery smal, radial distortion is ignored whenever high accuracy ie not required inal regions ofthe image cr when the peripheral pixelscanbe discarded, IFnot, as <<, fps often set equal to O, and ky isthe only intinsie parameter to be estimated inthe radial distortion mode The magnitude of geometric dstortion depends on the qult ofthe ens used. Asarule of thumb, with optisof average quality and CCD size around 50 » S00 expect distorsonsof several pics Gayaround in he outer cori ofthe image- Under these circumstances, ‘model with fs =0 is stl acute, Its now time foe a summary.38 Chapter? Digital Snapshots Intrinsic Parameters ‘The camera itis parameters are defined a8 the foal length, f, the location ofthe image center in pixel coordinates, the effective pixel size inthe hovigonta nd vertical direction (sy), and, frequied the radial distortion coefficient, ky 244 Camera Models Revisited We are now fully equipped to write relations linking diretl the pixel coordinates ofan ‘mage point wit the world coordinates ofthe corresponding 3D point, without explicit reference tothe camera reference frame needed by (2.18). Linear Version of the Perspective Projection Equations. Plugging (219) and (2.20) into 2.14) we obtain (tin = 008 =f bm = 098) =F; 2 Re. where R,,/=1,2,3,isa3-D vector formed by the/-throw ofthe matrix . Indeed, (2.21) relates the 3-D coordinates ofa point in the word frame to the image coordinates of the corresponding image point, va the camera extrinsic and intinsie parameters. "Notice that, due tothe particular form of (2.21), not all the itis parameters are independent. In particular, the Focal Tength could be absorbed into the effective sizes of the CCD elements [Neglecting radial distortion, we can rewrite (2.21) asa simple matrix produc. To this purpose, we define two matrices, My and Men a ~fls: 0 0 0 -Fisy oy o 0 7 mong ns “RIT (= mons 1), rym ny RIT M, and Man so that the 3 3 matrix Miy depends only on the intrinsic parameters, while the 3x 4 ‘matrix M,,, only oncheexzrnsic parameters, we now adéa“1” asa fourth coordinate of Po (thatis express Pin homogeneous coordinates) and form the product Mi Mex Pyy ‘we obtain linear matrix equation describing perspective projections. Section24 Camera Parameters 39 ‘The Linear Matrix Equation of Perspective Projections Me x } Pa T What isinteresng about vector [x 2,23] thatthe ratios (3/85) and (x9) are nothing but the inage coordinates n/n =n 22/3 Yin Moreover, we have separated nicely the two steps ofthe word-image projection; ‘+ Mex performs te transformation between the world and the camera reference frame; ‘+ My performs the transformation between the camera reference frame and the mage reference frame. © In more formal terms the relation between a 3-D point anit perspective projection on ‘he image plane cn be seen linear wansfrmation fom the projective space, the space of vectors [Xa Fo Za 1] tothe projective plane, the space of vector [2,82 33) This ‘wansformation is fined up oan arbrary sale factors otal tho matix Mba ony ‘independent eres (Ge review questions). This fat wl be dscused in Chapter 6 The PerspectiveCamera Model. Various eamera model inching te perspec tive and weak perspective ones, can be derived by setting appropriate constraints on the matrix M = Mi Mss. Assuming, for simplicity, 0, and sy ss) 1, M ean then be rewritten a fins Sma ~fns SRT M ( [Resolution or prcson: the smallest change in range thatthe Sensor can measure ot represent. Speed the number of ange points measured per second ‘Sie and weight inpriant in some applications eg, only smal sensors canbe ited on 8 ot ar), = tis often dificult to know the ata souray of Sensor without carrying out your own measurements A:curacy figures are sometimes reported without speiying to which error they refer to (eg, RMS absolute mea, maximum), and often omiting the experimental conltons and the optical properties of the surfaces used. 2.6 Summary Alter working through tis chapter you should be able wo: 2 explain ow digital images are formed, represented and acquired 2 estimate experimentally the nos introduced in an image by an acquisition sytem 2 explainthe concep ofintrisicand extrinsic parameters, the mosteommon modes of intensity cameras and their applicability design (but not etimplement) an algorithm fr calibrating and wing complete range sensor based on det calibration 27 Further Readings Ttishard to find moreon the cnntent of hichapter ont one hook Asa resto ‘want to know more jou must be willing to do some bibliographic seach. A readable account basic optican be foundinthe Feynman's Lecture on Physics[4). Aclasscon the subject and beyond i the Born and Wolt[3]. The Bom and Wolf also covers topics like image formation and spatial requency filtering (though ts not always simple to go through). Our derivation of (2.13) is based on Hor and Sjoberg 6], Horn [5] gvesan extensive treatment of surface refetance models Of the many, very good textbooks on signal theory, ou favorit isthe Oppeniteim, Wllsky ané Young [11]. The discussion con camera models vis the projection mati is based on the appendix of Mundy and Zisserman’s book Geometric Invariants in Computer Vision (9). ‘Our discussion of range seasrs is largely based on Bes [1], which isa very 200d introduction tothe principles, types and valuation of range Sensors. recent48 Chapter? Digital Snapshots detailed review of commercial laser scanners can be found in [14]. Two laser-based, active iiangulation range sensors are described in (12, 13]; the latter is based oa des cltraon, te former wera grote mode Refen [8] and [2] are examples of triangulation sensors projecting patterns of lines generated us Tncerent ih (es opposed to ase ight) ont th sens. Krotkow (7 nd Nayar and [Nakagawa [10] make good introductions to focus-besed ranging. 28 Review Questions © 21 How does an image change if the focal eng is varied? 22 Give an intuitive explanation of the reason why a pio camera as an infinite depth of Hl 323 Use the definition of Fnumber to explain geometrically why this quantity measures the fraction ofthe light entering the camera which reaches the image plane [3 24 Explain why the beat requency of fuorescent room ight (eg, 60 Hz) ean ‘skew the results of EST NOISE, 1 25 Intensity shresholing is probably the simplest way to locate interesting object in an image (a problem caled image segmenaion). Te idea is that only the pels whose valuc is abowe a threshold belong to interesting objects. Com rent on the shortcomings ofthis technique, patcuady in trms of he relation between scene radiane and image irradiance. Assuming that scene and ilumination can be controlled, what would you dot gurantee sucessul segmentation by thresholding? 26 The projection matrix fis 23% 4 matrix defined upto an arbitrary sale factor. Thisleaves only 1 of the 1 enties of independent. On the oer hand, we have seen thatthe matrix canbe written in terms of 10 parameters ($intinsic and Gextisicindependent parameters). Can you gues the independent intrinsic parameter that has been lt out? Ifyou cannot guess now, you have to wait for Chapters 1-27. Explain the problem of camera calibration, and why calibration s necessary atl 28 Explain why the length in milimees of an image line of endpoints 2, :) and [9] snot simply 9 ~~ 2 = 9) What does this formal mis? 29. Explain the ditferenc between a range and an intensity image. Could range images be acquired using intensity cameras only (Leno laser ight or the ke)? 2.10. Explain the reason for the word “shaded” in “cosine shaded rendering of ‘ange image". What assumptions onthe illumination does cosine shaded image imply? How i the surface gradient Linked to shading? 2-211 Whatisthe eason fr step lin RANGE CAL? Section2.8 Review 49 15 22 Consider a triangulation sensor which scans a wiole surface profile by translating an object through a plane of laser light. Now imagine the surface is scanned by macing the laser light sweep the object. In both cases the camera is stationary, What parts ofthe triangulation algorithm change? Why? 18 213. The performance ofa range sensor based on (224) depend on the values of {f.8,8. How would you define and determine “optimal” values of /,,9 fo such Exercises © 24 Show that 2.1) and (2.2) are equivalent (© 22 Devise an experiment that checks the prediction of (213) on your own ‘stem. Hint: Use spatially uniform object (ike a lat sheet of matte gay paper) illuminated by perfectly diffuse light. Use opties with a wide ild of view. Repeat the experimentby averaging the acquired image overtime. What ditference does ‘his averaging sep make? © 23. Show that.in the pinhole camera model, three collinear points in 3-D space are imaged int three collinear points onthe image plane. 2 24 Use the perspective projection equations to explain why ina picture ofa face {aken frontaly and from a very small distance, the nose appears much larger than the rest ofthe face, Can this effect be reduced by acting onthe focal length? © 25 Estimate tie nose of your acquisition system using procedures EST NOISE. and AUTO_COVARIANCE. © 26 Use the equations of section 232 to estimate the spatial aliasing of your acquisition system, and devise a procedure to estimate, roughly, the number of CCD elementsof your eamera. 2 2.7 Writea program which displays a range image asa normal image (grey levels encode distance) ora a cosine shaded image. (2 28 Derive 224 from the geometry shown in Figure 215. Hin: Use the law of sines and the pinhole projection equation. Why have we chosen to position the reference frame asin Figure 2.15? © 29. Wecan pric the sensitivity of measurements obtained through (2.24) by ‘aking partial derivatives with espect to the formula’ parameters, Compare such Predictions with respect tb and f. Projects © 21 You can tuld your own pinhole camera, and join the adepts of pinhole ‘photography. Perce ahole about Sami diameter on one side ofan old tin box, 10 to 30min depth. Spray the inside of box and li with back paint Pierce a pinhole ina piece of thsk aluminium fol (eg, the one used for milk tops), and fix the foil, tothe hole int box ith black tape. Ina dark room, fixa piece of black and white photographic film on the hole in the box, and sel the box with black tape. The ‘nearer the pinhole tothe fim, the widor the field of view. Cover the pinhole with50° Chapter? Digital snapshots References W a 6 4 ‘1 a a 8 o) 0) un ir 03) 0 «pies of black paper tobe used as shutter, Your camera is read. Indicative, 8 125-ASA film may require an exposure of about 5 seconds Make sure that the camera doesnot move as you open and clos the shutter. Some experimentation willbe necessary but results can be trking! 22 Although you wil lam how to locate image features and extract stright ines atomatislyin the next chapter youcan et ead for an implementation of the profile scanner described in section 254, and setup the equipment necessary Au need (inv aition to eamera, fame buffer and computes) is a pojetar cxeating black stripe (easily done with ide projector and an appropiate slide, or even with a asiight) and a few, accurately cut blocks You must also work out the bes arrangement for projector, spe and camera J. Bes, Active, Optical Imaging Sensors, Machine Vision and Applications, Vo. 1, pp. 127152 (988), A. Bake, H.R. Lo, D.MeCoven and B Lindsey taocalr Active Range Sesing, EEE Transactions on Pater Analysis and Machine Ineligence, Vo. 8, pp. 477-483 (193) M. Bor and E. Wolf, Principles of Opis, Pergamon Press, New York (1959), RP Feynman, RB Leighton, and M, Sand, The Feynman Lectures on Physics, Adson ‘Wesley, Reading, Mass (1965). BK Horn, Robor Vision, MIT Press, Cambridge, MA (1986. [BKC Horn and RW. Sjoberg. Caleuatng the Reflectance Map, Applied Optics, Vol. 18, pp 17101799197). E Krotkow, Fotsng, International oust Computer Vision, Vol, 1,9p.223-237 (185) 'M. Maruyama and'S. Abe, Range Sensing by Projecting Malipl Slits with Random Cus, IEEE Transactions on Pater Analysis and Machine Ineligence vo 15, 00, p. 647-551 (1988), {UL Mundy and A. Ziserman, Appendix -Projective Geometry for Machine Vision. Ia ‘Geomericfvarians in Computer Vision, Mundy JL and Ziserman, A. eds, MIT Press (Cambidgs, MA (192) SSK. Nayar and, Nakagawa, Spe from Foeus IEEE Tansactons on Pater Analysis ‘and Machine Ineligece, Vl 16, pp. 824-831 (1938) A. Oppeubcia, A'8 Wilsy an LT. Young, Sinan Stes, Preatie al Items ‘ional Editions (193) Sunt Mare, 1-C Jzouin and G. Mein A Versatile PC-Based Range Finding System, TEE Transctons on Robotics and Automation Vl, RAT, 9, 2, pp. 250-256 (191) E Truccoand RB Fst, Acquistion of Consistent Range Data Using DizetCalibaton, Proc IEEE lt Conf on Robotics and Automaton San Diego, pp. 410-315 (194), ‘T. Wohlers, 3D Digs, Computer Graphies Worl, hl pp. 78.7 (192) Dealing with Image Noise ‘The mariachis would serenade ‘And they would not shut yp tl thy were pid ‘Tom Letre,n Old Mexzo Attenuating of, ideally, suppressing image noise is important because any computer vision ‘system begins by processing intensity values, This chapter introduces a few, basic noise models and filtering methods which constitute an initial but useful toolkit for many practical situations. Chapter Overview Section 3.1 discusses the concept of noise and bow to quantify it I also introduces Gaussian and impulsive noise, and the effects on images. Section 32 discusses some estental linear and a nonlinear filtering methods sed to attenuate random and impulsive noise. ‘What You Need to Know to Understand this Chapter ‘+ The basics of signal theory: sampling theorem (Appendix, section A.3), Fourier transforms, and linear filtering. 3.1. Image Noise (Chapter 2 introduced the concept of acquisition noise, and suggested a method to estimate it. But, in general, the term noise covers mach more st52 Chapter3 Dealing with image Noise Noise In computer vison, noise may rete to any ei in images, data or intermediate ess, thats not interesting forthe purposes ofthe maia computation For example, one can speak of noise in different cases ‘+ For image processing algorithms lke edge or line detection, noise might be the spurious Nuctuations of pixel values introduced by the image acquisition system, ‘+ For algorithms taking as input the results of some numerical computation, noise can be the errors introduced in the latter by random fluctuations or inaccuracies of input data, the computer's limited precision, round-off errors, and the like, * For algorithms trying to group lines into meaningful objects, noise is the contours ‘which do not belong to any meaningful object. 8 In computer vision, what is considered noise for a ask soften te interesting signal fora sitleret task Different types of noise are countered by different techniques, depending on the noise’ mature and characteristics. This chapter concentrates on image nse. It must be clear that nove filtering is classic topic ofboth signal and image processing, and the literature on the subjects vas (Gee section 34, Further Readings). This chapters just ‘meant to provide afew starting tools which prove useful in many practical situations tis now time to formalize better our dramatis persona. Image Noise ‘We sal ase thatthe main image nos is dive and random; thai spurious, random signal, mij), added othe tre piel vals 14) Hepat Demin 6 Noise Amount ‘The amount of nie nan image an be estimated by means of mth standard deviation of the random signaln())-simportantto iow how ong che vithceapct othe interesting signa. This is specid by the sgnalso-noze rao, or SNR: Sie 6) where ois the standard deviation ofthe signal (ihe piel vales 1). The SNR is often expressed in dase: Sitar ta 03) Ths bsration wat mide by end Mai Section 3.1. Image Noise 53 © Addtive noise isan adequate assumption for he image acquisition spstems introduce in ‘Chapter 2, ut in some eases the nose might not be additive. For insiaace, malice oie whereby I =n1,odelsimage degradation in loisin ines und photonraphs owing to gin se "Notice that we assume that the resolution of the quantized grey eves issufcient to sample the image appropriately; that i, to represent all significant variations of the image iradianct.? Coarse quantization can introduce spurious contours and is thus called quantizaion noise. Byte images (256 grey levels per pixel), introduced in (Chapter2, appear to 2 adequate for most practical purposes and are extremely popular. BAA Gaussian Noise In the absence of information, one often assumes nj) fo be modelled by a white, Gaussian, zero-meanstochasic process. For eachlocation (i, ),thisamountsto thinking ‘of mi ) 8 & random variable, distributed according toa zero mean Gaussian distr ‘bution function of fixed standard deviation, which is added to 1(, j) and whose values are completely independent ofeach other and of the image in both space and time. “This simple model preits that noise values are distributed symmetrically around zero and, consequeil, pixel values /(,j) around ther true values 1(j; this is what you expect from goud acquisition systems, whic, in addition, should guarantee low rise levels, Moreover, its easier to deal formally with Gaussian distributions than ‘wth many other statistical models To illustrate the effect of Gaussian noise on images, Figure 3. (a) shows a synthetic grey-level “checkerboard” pattem and the profile ofthe ‘rey levels along a hrizontl canine. Figure 3.1 (b) shows the same image corrupted by adaltive Gaussiar noise, and the profil of the grey levels along the same seanlin. © The Gaussian mie modal is often convenient approximation dictated by ignorance: if ‘xe donot know and cannot eimae the noise crac, we take ito be Gaussian. Be ‘rare, however, tht white Gausian nosis jst an approximation of adie eal ose! ‘You shoul away try and discover as much as possble atthe origin ofthe noise: e3- investigating whch sensor aoquired the Gata, and design suppression methods optimally talloed tos eharacterisics This 8 known a8 mage restoration, another vast chapter of sage processing, 3.4.2. Impulsive Noise [Impulsive noise, also know 2s spot or peak nots, occurs usualy in addition tothe one normally introducedby acquisition, Impulsive noise alters random pixels, making their values very diferen: from the true values and very often from those of neighboring pixels 100, Impulsive nose appears inthe image & a sprnkle of dark and ight spots Itean be caused by ransmission errors, faulty elements in the CCD array, or external noise corrupting the analog-o-digial conversion. Ofcouns, hat asgiionvritonsdependson what youste ae. Tisisdicoseuhrin Capersse Chapter 3 Dealing with Image Noise @ © © Figure 3.1. (a) Symhetcimage o «120 x 120 greysevel“checkerhoat” and sey evel profile slong row. (b) Ate adding roman Gaussian aise =5).(e) Aer adding sl and pepper ‘we (etext for parameters Sal-and-pepper noise i 4 model adopted frequently to simulate impulsive noise in symthetic images. The nosy image values fy(h,&) are given by 1k) xe Ia k ul + Himes ~inin) X21 6a) where / isthe trucimage, x,» ¢ [0,1] are two uniformly distributed random variables, isaparameter controling how much of the image iscorrupted, and nie HOW Severe Section 32 NoiseFitering 55 isthe noise, You eanoblain saturated sal-and-pepper noise turning y into atwo-valued variable (y = Oot y= 1) and setting nin =0 0 ngs = 255, ‘illustrat the effects of salt and pepper noise onimages, Figure 3.1 (right) shows the “checkerboard” satlern andthe sane Seanline of Figure 3.1 (let) corupted by alt and pepper noise Wil nig and |= 99, 32. Noise Filtering Problem Statement: N Suppression, Smoothing, and Filtering Given an image 1 crnpied by nose attenuate a much as possible (aly, eliminate it ng! significantly Attenuating or if posible, suppressing image nose is important a the result of ‘most computation ca pixel values might be distorted by noise. An important example is computing image derivatives, which isthe basis of many algorithms: any noise in the signal can resultin serious errors in the derivatives (see Exercise 3.3). A common technique for noise smoothing sincar fering, which consists in convolving the image With a constant mati, called mask or kernel’ Asa rominder, here isthe basic linear filtering algorithm. ‘Algorithm LINEAR FILTER [Let be aN % M imag, an odd number smaller than both W and Mand A the Kernel of linea ler, that sa mm mas, The fered version fof Ja each pel (, ) en by he diserete convolution ter replaces the value 1, j) with a weighted sum of J values in a neighborhood of i,j); the weights are the entries of the kerel. Te effects of linear filter on a signal can be better appreciated in the frequency domain. Through the convolution theorem, the Fourie transform ofthe convolution of 7 and A is simply the product oftheir Fowier transforms (1) and $(A). Therefor, the result of convolving signal with sto attenuate (or suppress the signal frequencies corresponding to low (or zero) values of [5(4), peetrum of the filter A. "The ane aris 0 inv easter os his pit few, Kernel 3 ft ht he common with const Reel model pace: ad ine ithe pu reponse a the ler,56 ‘Chapter 3 Dealing with Image Noise 3.2.1. Smoothing by Averaging fall entries of Ain (35) are non-negative, the filters performs average smoothing. The simplest smoothing Kernel isthe mean iter, which replaces pixel value with the mean ofits neighborhood for instance, with m = 3 ifita liad 66 Sliad ‘ifthe sum ofall kernel entries isnot one, asit happens for averaging Kernels (must, be divided bythe sum of te entries to avoid that the fiered image becomes brigher than the origina ove Wy does such ler tenuate nos? Inte, averaging takes out small ‘vats: Averaging mn vale around piel) Grids he andr deviation tthe ne by vm, Frequency Behavior of the Mean Filter. in the frequency domain we have that ‘the Fourier transform of I-D mean iter kernel of width 2W isthe “sine” function sinful sine(o) = 252) (Figure 33 shows an example in 2-D). suai lobe are weighted more thaa the frequencies fal ‘mean filter canbe regarded as an approximate “low-pass ice the signal frequencies falling inside the in the secondary lobes, the filter Limitations of Averaging. Averaging is simple but as problems, including at least the following 1. Signal frequencies shared with noise are los; this implies tht sharp signal vari ations ae filtered out by averaging, and the image is blurted. As we shall ee in (Chapter 4, blurring affects the accuracy of feature localization 2. Impulsive noise is only attenuated and diffused, not removed. 3. The secondary obesin the Fourier transform ofthe mean filter's et nose into the filtered image. 3.22 Gaussian Smoothing Gaussian smoothing isa particular case of averaging, in which the kernels a2-D Gaus sian. Iseffectisllusrated by Figure 32, which shows the results of Gaussian smoothing applied to the noisy “checkerboards” of Figure 3.(center and right), corrupted by Gaussian and impulsive noise respectively. Notice that impulsive noise has only been attenuated; in fact, each spike has also been spread in space Frequency Behavior of the Gaussian Kernel, Te Fourier transform of Gauss {anisstilla Gaussian and, hence, hs no secondary lobes. Thismakes the Gaussian kernel Section3.2 Noise fitering 57 LCC = @) () Figure 3.2 (a) Results of applying Gaussian ering (kernel vd spinel, =1) tothe "checherbour” image coreuped by Gaus note and grey-level pole along a row (b) Same or the eeckerbosed” mage corrupted by sal und pepper noise a better low-pass filter than the mean filter. A comparison of the mean and Gaussian filters in both the spatial and frequency domain in 2-D is shown in Figure 33 Separability ofthe Gaussian Kernel. Gaussian smoothing canbe implemented efficiently thanks tothe fact thatthe kernels separable:58 Chapter3 Dealing with Image Noise @ () Figure 23. (a) The plot of «$x 5 Gaussian kernel of width (tp) and its Fourie transform (Gottom).(b) The same fora mean-fterkemel, ai ao =¥ YD chi -nj-= en This means that convolving an image / with a 2-D Gaussian kernel G is the same {8 convolving frst all ows, then all columns with a 1-D Gaussian having the same 2. The advantage is that time complexity increases linearly with mask size, instead of quadratically (see Exercise 34). The next box gives the obvious algorithm for a ‘separable kernel implemented in the special cae of the Gaussian kerel Section 32 NoseFitering 59 @ ) Figure 3.4 (a) 1-D Gausin (dotted) and real samples (cres for Sx 5 kere. (b) Plot af corresponding integer cermel, Algorithm SEPAR FILTER “Toconvove an image! with am x m2-D Gaussian kernel G witha = 90 4 Build 1D Gausian mask g, of width, with 2 Convole each rw of with, yeding anew image | Comoe cach column of with Building Gausian Kemels. Thanks to the separability ofthe Gaussian kernel, wwe can consider only 1-D masks, To build a discrete Gaussian mask, one has to sample ‘continuous Gaussian. To do so, we must determine the mask width given the Gaus- sian kernel we intend to use, or, conversely the o of the continuous Gaussian given the desired mask width, A relation between o and the mask width w (typically an odd umber) can be obtained imposing that w subtends most of the area under the Gaus- fan, A adequate choice i w ~ 57, which swbtends 9R.76% of the area. Fitting this portion ofthe Gaussian between the endpoints of the mask, we find that a 3pixel mask corresponds to 23 =3/5 = (46 pixel: Spiel mask to 05 =5/5 = I pixel: in general, oat Ga) Sampling acortinuous Gaussian yields real ere! entries Filtering times can be realy reduced by approximated integer kernels so that image values being integers 00, ‘lating point operations are necessary a all To build an integer kernel, you simply normalize the real eel to make its smallest entry 1, round off the results, and divide by the sum ofthe entries Figure 34 shows the plot ofa LD Gaussian profile, the real samples taken, and the 5» 5 integer kernel (1,9, 18,9, 1),60 Chapter3_Dealing with Image Noise ‘Algorithm INT_GAUSS_KER ‘To build an approximate integer kernel 1. Compute a floating point kernel G¢h 8) the same size 85 Get gin = G0) be the minimum valu of 6 2 Determine the normalization factor f = Ita ‘3 Compute the entries of the nom normalized fer as G.(hk) indicates closest integer. int {GU 8), where ie 3.23 Are our Samples Really Gaussian? You must be avare that problems lrk behind the straightforward recipe we gave for building discrete Gaussian kernels Itis instructive to take a closer look atleast atone, sampling. The pseization imposes axed sarplinginterval of pixel the piel with is taken as un, by virtue of the sampling theorem (see Append, section A’), we , =2x(05) = is lost. Notice that ai ied ony by the pixelizaton step not by the sana ‘What are the consequence foe building discrete Gaussian Kernels? In the contin ‘uum, the Fourier transform ofthe Gausian ga) =6 3 is the Gaussian g(o, 0°) with o'= 1/o. As g(o, 0") snot bandlimited, sampling ¢(x.0) ‘onthe pixel grid implies necessarily the loss of all components with > a, For roa tively small this means thatthe Fourier transform of g(x), s(a,0), is substantially ferent from zero well outside the interval [~, x], as shown in Figure 3S. To avoid lasing, the best we can do is to try Keeping most ofthe energy of g(v,0") within the interval [—,x]- Applying the “98:86% of the area” criterion inthe frequeney domain, wefind one ‘The preceding inequality tells you that you cannot sample appropriately a Gaussian ‘kernel whose 0 isless than 08 (in pixel units) no matter how many spatial samples you keep! ‘We can also interpret this result in terms of the minimum size for a Gaussian kernel Since o = w/5, for w =3 we have ¢ =046. Therefore, you cannot build a fitful Gaussian Kernel with just 3 samples For w =5,iastead, we have o = 1 which means that S samples are enough, What happens if you ignore allthis? Figure 3.6 shows that the inverse FFT of the FFT ofthe original Gaussian g(x,2) is significantly different from g(r, 0) for ¢ =06 (w =3). In acordance with oar prediction, a much smaller difference is found for @ = (w =5). 0.796, Section32 Noe itering 61 Figure 35. The Fourier transforms of two sampled Gaussians, for w= 3 (9 = 05, dotied line) and w= 5 (¢ =1, slid lin) [Notice thata smaller poron othe transform coresponing 0.2 ands lost between = Repeated averaging (RA) isa simple and etcient way to approximate Gassan smoothing It is sed on the fact thay vit of the centr it theorem, convohing a3 3 verging mask mines witha image J apponimates the convolution fF with a Gaussan mask of ~ yn/3 and sie n+ 1) “n= 2043 © Notise that RA ends toa iffrent relation betwoce @ and m fom the one we obtained {com the aeacieron (Exercise 3.6). © Figure3.6_Contiuots Gausian kernel (dotted), sampled el kernel, and continuous kernels reconstructed from samples (sit), fr = 0:5 (w= 3) (a) and o = 1 (w= 5} (b) respectivelyCChapter 3 Dealing with Image Noise Algorithm REP_AVG Leta « Binds th RA mask convolution matrices A and @.Let/ be the mputimage, Dine the 3x3 124] 1 = 2 69) 5[} 24] To onvolve / wth an approximated Guusian Kerac of = 73 L hoa! 2 For = rR ‘You might be tempted to combine separability and repeated averaging, as this would yield avery efficent algorithm indeed. But are you sue thatthe Kernel defined in REP_AVG is separable? Using separability with a nonseparable kernel means that the result of REP_AVG is different from the application of the 2-D mask, which ‘may result in errors in further processing; image differentiation is once again an apt example. Asafe way to combine separability and repeated averaging is cascading. The idea is that smoothing with Gaussian kernels of increasingly large standard deviations can also be achieved by convolving an image repeatedly with the same Gaussian kernel, In this way each tering pass of REP_AVG is surely separable (see Exercise 37) 3.24 Nonlinear Filtering Insection 32.1, we listed the main problems of the averaging filter: blur, poor feature localization, secondary lobes inthe frequency domain, and incomplete suppression of peak noise. Gaussian filters solve only the third one, asthe Fourier transform of a Gaussian has no secondary lobes The remaining problems are tackled efficiently by nonlinear filtering; thats fitering methods that cannot be modelled by convolution, The median filter is 4 useful representative of this lass. A median filter just replaces each pixel value /, ) wit the median of the values found ina local neigh hothood of (ij). As with averaging, the larger the neighborhood, the smoother the result Algorithm MED_FILTER ‘LeU be the input image, the tered i Foreach pel, sand an odd number 1. Compute the median m(, ) of he values in nn neighborhood of jy +h + he [on/2sn/]). where n/2ndsaesintger division, Section32 Nokefitering 63 @ ©) Figure3.7_ (a) Results of wppying median filtering (pixel, wide) othe “checkerboard” image corrupted by Gausian se, and eey-vel pro lon (@)Sane forthe “checkerboard” image onropted by impulsive Figure 327 shows the effects of median filtering on the “checkerboard” image corrupted by Gaussin and impulsive noise (Figure 3.1 center and right, respectively). (Compare these resus with those obtained by Gaussian smoothing (Figure 32): Median filtering has suppressed impulsive noise completely. Contoursare ako blurred lessby the sedan than by the Gaussian filter; therefore, a median filter preserves discontinuities better than linear, sseraping filters6 Chapter3 Dealing with image Noise 33. Summary After working through this chapter you shouldbe able to: explain the concept of nose, image noise, and why noise smoothing s important for computer vision design noise-smoothing algorithms wing Gaussian and median filtering decide whetheritis appropriate touse linear or median smoothing filtersin specific 3.4 Further Readings [Noise filtering and image restoration are clase tpies of noise and image processing, Detaled discussions of image processing methods are found in several books; for instance, [, 3, 10,8]. Papoulis [7] is a good reference text for Fourier transforms Repeated averaging for computer vision was fst reported by Brady eta. (1) Cai 2] discusses several linear filtering methods inthe context of difusion smoothing ‘Witkin [II] and Lindeberg [5] provide good introduction to sealespace represen tations, the study of image properties when smoothing with Gaussians of increasing standard deviation (the scale parameter). One reason for keeping multiple scalesis that, some image features may be lost after filtering with large Kernel, but small Kernels could keep in too much noise, Alternative methods for representing signals at multiple scales include pyramids [9] and wavelets [6] (see also references therein). 35. Review Questions, 54. Explain the concept of image nose, how itcan be quanted, and how it can affect computer vision computations. 13 32 How would you estimate the quantization noise in a range image in terms ‘of meun and standard deviation? Notice that thi allows you to compare icetly ‘quantization and acquisition nose 833. Explain why a non-negative kernel works as a low-pass filters and in whit ‘assumptions it ean suppress noise, (1 34 Whatisa separable kernel? What are the advantages of separability? 135 What are the problems ofthe mean filter for noise smoothing? Why and in ‘what sense is Gaussian smoothing better? 5 36 Explain why the sampling accuracy of a L-D Gaussian filter with « =0.6 ‘cannot be improved using more than three spatial samples. 1 37 Whatis repeated averaging? What are its effects and benefits? 1 38 Whatis the diference between cascading and repeated averaging? 3 39 Can you think of any disadvantage of cascading? (Hine Which standard 0) 2 38 Consider the L-D step profile 10 (3 fe [0,3] 8 iels] ‘Work out the result of median filtering with » =3, and compare the result with the output of ilering wih the averaging mask 1/4 [12 1} © 39 Median tering can degrade thin ines. Thiscan be partially avoided by vsing ‘nonsquare neighborhoods What neighborhood shape would you use to preserve horizontal or vertical ines, pixel wide?65 Chapter3_ Dealing with Image Noise Project (© 21. Write programs implementing Gaussian and median noise filtering The code should allow yout specify the filter's width. The Gaussian implementation should bbe made as eficient a possible. References [1] MBrad,1 Ponce, A. Yuilland M Asada, Describing Surfaces, Computer Vision, Graph les and Image Processing, Vol. 32,001, pp. 1-28 (1985), LD. Cai, Sele Based Surface Undersanding Using Difsion Smooshing, PHD Thess, Deparment of Arial Intligence, Univesity of Edinburgh (1950), [3] RC Gonzalezand RE. Woods Digital nage Processing, Addison-Wesley, Reading (MA) 099, RM. Haralick and LG, Shapiro, Computer and Robot Vision, Vl, Adison-Wesley, Reacing (MA) (1992) [5] TiLindeberg, Scale-Space for Discrete Signal, IEEE Transactions on Patera Analysisand ‘Machine Intelignce, Vo, PAMI-2, 30.3, p. 24-254 (1930. [6] SG Maat, A Theor of Mulizesluion Image Procesing: the Wavelet Representation, TEEE Transactions on Pater Analysis and Machine Ineligence, Vo. PAMI-I, 0.6, (614-695 (198). 11. Papouis, The Fourier Integral ands Applications, MeGra-Hl, New York (196). [8] WK Pra, Digital image Processing, Wiles, New York (1981). [9] A. Rosenfeld Multiresolution Image Processing, Springer Verlag, New York (184), [00] A. Rosenfeld and A.C. Kak, Digi! Picture Procesing, Academic Press, London 1976). [01] AP Witkin, ScaleSpace itering, Proc. th In. Conf on Arf Inligence ICAL, Kariseube, pp. 1019-1022 (193). i Image Features (Que nso dito come una salita veg oh ales dl itliano in gia! Polo Cots, Baral ‘This and the following chapter consider the detection, location and representation of special parts ofthe image, called image features, usually corresponding to interesting element ofthe Chapter Overview Section 41 introduces the concept of image feature, and sketches the fundamental isues of Feature detection, on which many computer visio algorithms are based. Section 4.2 deals with edges, cr contour fragments, and how to detect them, Edge detectors are the basis ofthe line and curve detectors presented in the next chapter. Section 43 presens features which do not correspond necessarily to geometric elements of the scene, but are nevertheless useful Section 4.4 discusses surface features and surface segmentation for range images. What You Need to Know to Understand this Chapter ‘+ Working knowledge of Chepter2and3 + Basie concepts of signal theory, ‘Eigenvalues and eigenvectors ofa matrix. ‘Elementary differential geometry, mainly surface curvatures (Append, section A.) The ne anit nul oud The ey ys on alan on boys, o62 Chapter4 Image Features 4.1 What Are Image Features? Tn computer vision, the term image feature refers to two possible entities: 1. a global property of an image or part thereof, for instance the average grey level, the area in pixel (global feature); oF 2 a par ofthe image with some special properties, for instance a circle, a line, ot textured region in an intensity image, a planar surface in a range image (local feature) The Sequence of operations of most computer vision systems begins by detecting and Tocating some features i the input images. I this and the following chapter, We con: centrate on the second definition above, and illustrate how to detect special parts of intensity and range images like points, curves particular strutures of grey levels, oF su face patches. The reason for this choices that mos algorthmsin the following chapters assume that specifi, local features have already been Tocated. Here, we provide ways of doing that. Global features are indeed used in computer vision, but are less useful to solve the problems tackled by Chapters 7, 8,9, 10 and 11. We assume therefore the following detition, Definition: Image Features meaningful dete Image features ae os -Meaningfal means thatthe features are associated to interesting scene elements via the image formation process. Typical examples of meaningful features are sharp intensity variation crated by the contours of the objects the scene, or image regions ith uniform grey levels, for instance images of planar surfaces, Sometimes the image features welook for are nat associated obviously toany part or property ofthe scene, but reflect particular arrangements image value wth desirable properties like invariance orease of detectability Fr instance, section 4.3 discusses an example of features whch prove adequate for tracking across several images (Chapter 8). On the other hand, the number of pixels of grey level 134 makes a rather unuseful feature, as in general, it cannot be associated o any interesting properties of the scene, a individual grey levels, chaage with illumination and viewpoint. ‘Detecable means tha location algorithms must exist, otherwise a particular fea- lure is of no use! Different features are, of course, associated to diferent detection algorithms; these algorithms output collections of feature descriptors, which specify the position and other essential properties of the features found in the image. For instance, ‘descriptor for line features could specify the coordinates of the segments central point the segments length, and its orientation, Feature descriptors ae used by higher-level programs; fr instane, i this book, chains of edge points (section 4.2 are used by line detectors (Chapter3)ines, in tur, are used by calibration (Chapter 6) and recognition algorithms (Chapter 10). Section 42 EdgeDetecton 69 "© In BD computer vision, feature extraction i an intermediate step, not the gel ofthe system, We do mc extzat ine, a, jas to obi ine mape; we extract ines fo nage robots in corido, to decide wheter an image contains certain objet, ta calibrate the innsc parameters ofa camera, and soon. The important corollary is that does not ‘make mc senseto pursue “peret feature extraction per se asthe adequacy of feature , the edge orientation image and ‘as twothreshods such hat ral the edge points in fy, nd scansing I ina fixed order: 1. Locate the next unvsited ede piel (J), such hat If) > 1 ‘ithe hips senting, vont Section 42 Edge Detection 79 2 Stang fom yj), follow the chains of connected local maxima, in bot directions perpendicular tthe edge normal, as long Jy > x. Mark al visited points, and eave 3 Tt the locations al points ia the coenected nou found ‘The ouput is st of sts each describing the poston of «connected contour inthe image, ‘swells the strength anc tho orientation images, desrbing the properties ofthe edge pois Hysteresis thresholding reduces the probability of fale contours @ they must produce a response higher than r, 1 occur, as wells the probability ofsteaking, which requires now much lager fluctuations to cur than in the single-threshold case, FA yis lange, x amas be set 100, Notice that HYS“ERESIS_THRESH performs edge tracking: i finds chains of connected edge maxims, or connected contours. The descriptors for such chains saved by HYSTERESIS THRESH in addition to edge point descriptors, can be useful for curve detection ‘= Notie that Yunis willbe splithy NONMAX._ SUPPRESSION. How serious this de pendson what the elgsare computed fr. posible solution sto modly HYSTERESIS ‘THRESH so that treognizes Vjunction and interrupt all edges, Figure4.6showstte output ofourimplementation of NONMAX_SUPPRESSION and HYSTERESIS.THRESH when run on the images in Figure 45, All contours are ‘one pixel wide, a desi Figure4.6_ Output of HYSTERESIS. THRESH ran on igure 45 showingtheeffect of varying thefilters ize Lefto right 67 = 1,23 pixel The grey levels hasbeen inverted (back on white) forclaiy80 Chapter4 Image Features 423, Other Edge Detectors Early edge detection algorithms were less formalized mathematically than Canny’ ‘We sketch two examples, the Robers and the Sobel edge detectors, which are easily implemented in their essential form, 7 ‘Algorithm ROBERTS EDGE DET “The inputs formed by an image 7, anda threshold. 1. apply aise soothing as appropri (Fr instance, Gaussian smoothing in the absence of information on nie: see Chapter 3) obtaining a new image 1 2: filer (algorithm LINEAR. FILTER, Chapter) withthe masks [1a] G4] btining two images J and 1. estimate the gradient magnitude at cach pixel (i,j) 15 iia. cbining n imag of mags rains 4 aay esl ps, ) eh at) > ou The outpt isthe location of edge poins obtained inthe last step _ Algorithm SOBEL_EDGE_DET Same as for ROBERTS EDGE_DET, but replace step 2. withthe follwing, 2 filter, algorithm LINEAR FILTER, Chapter 3) wi the masks 4-2-1 101 oo a] |-202 124 101 obtaining wo images and Notice that the element special to these two detectorsis the edge-enbancing filter (see Review Questions), Figure 4.7 shows an example of Sobel edge detection. 424 Concluding Remarks on Edge Detection Evaluating Bdge Detectors. The ultimate evaluation for edge detectors which are pat of larger vision systems is whether or not a particular detector improves the performance ofthe global system, other conditions being equal. For instance, within an Section 42 Edge Detection 81 Figure 47_Lete output of Sobel edge enhancer run on Figure 4.1. Middl: edges detected by thresholding the enlancod image at 38. Right sme, thresholding at SU. Notice hat some ‘omtous ae thicker thin one pitel (compare wth Figure 45), inspection system, the detector leading tothe best accuracy inthe target measurements, and acceptably fat, ito be preferred However, it is useful to evaluate edge detectors per se as well. We run edge . The geome interpretation i ad can be understood through fe pati ates Fist consider perfect uniform Q: the image gradient vanishes everywhere, © comes te mul matt and wehave y= = 0. Second asume that conan an ideal lak an white step edge: we hve p=, i > andthe eigenvector asocinted wih is parle othe image gradient Noe that Ci rank dee in Both cases, with rk 0 andl respectely id, ssume that Q contains the comer ofa black Sauate aginst awhile background: ss there ae two principal dictions in we Chee orig» Ovand the larger the elgevaes, the sronge (hehe eons) heir Corresponding image lines At this pit, you have caught on wh he fact tha he Cigemectorseheadeedge directions the egeales edge srengh A comers ened Section 43 Point Features: Comers &3 @ oS) Figure 48 Corners fan ina 8-i,syothtic checkerboard image, coruptd by two realizations of sybase Gausian noise of standard deviation 2. Te corner isthe botom right point of each 15 1S nighbourhood (hight), by two strong edges; therefore, as; = 2a, a comer isa location where the smaller eigenvalue, 3 i Tare enough Time for examples Figure 4.8 shows the comers found in a synthetic image of a ‘checkerboard, with and without additive noise. Figure 49 shows the corners found in the image ofa building, an the histogram of the 22 values. The shape ofthis histogram israther typical fr most natural images. Ifthe image contains uniform regions or many almost ideal step edgs the histogram has a second peak at =0. he til (right ofthe histogram is formed by the points fr which hs is large, which are precisely the points (or, equivalently, the neighbourhoods) we are interested in. Figure 4.10 shows another ‘example with a road scene @ o rc) Figure 49 (a: rigina ima the image points for wheh of building (b): the 15x 1 pixel neighbourhoods of some of 20.6): hitgram ofA values across the imageChapter 4 image Features Figure 410. (a):image ofan outdoor sene. The oeneris the bottom right point ofeach 15 18 ‘eibouthood (highlighted). (corer found sing 215 x 15 neighbourhood. ‘We altcratethat our feature pon ise high const image comes and junc earth messin Set soto (asthe comeria Fite 48), Bak comer tthe lal neat aterm aot cresponing o obvi en fea antmeof te eres in Fig 0) In onerale t aes pois he try aac hsivewel pronounced dstncvedcon asociedo cgrvalcs vc bon scanty ager an er et nc ummare te pede or ating ths new pe afimge features ‘Algorithm CORNERS “Te input is formed by an image, , ad wo parameters: the threshold on 2, rand the Hiner sive of square wind (aeighbourbood), ay 2 +1 pte 1. Compute the image gradient ove the enite image 1 2 For ech image point (4) form tne matic C of (49) over (2H +1) x QN +1) aeghbourhood @ ofp {b) compute the smal eigenvae of C; {e) iia > rate the coordinates pinto a Ts, 2 Sort Lin dereasing order of, ‘4 Scanning the sorted list topo hottom: foreach current pint, p, delete al points pearing furtheron in thelist which belong io the neighbourhood ofp he output nist of feature points fr which 22> «and whose neighbourhoods Jo not overap Section 4.4 Surface Exaction from Range Images 85 Algorithm CORNERS has two main parameters: the threshold , and the sizeof the neighbourhood, 2 +1). The threshold, x, canbe estimated from the histogram of ig (Exercise 4.6), the latter has often an obvious valley near zero (Figure 49) Notice that such ale snot abvays present (Exercise 47). Unfortunatly thereis no simple criterion for the estimation ofthe optimal size ofthe neighbourhood. Experience indicates that choices of ¥ between 2and 10 are adequate in most practical case. © Inthe case of comerpoits the vale of Wi linked tothe lestono the corner within the neighbourhood. As you ean sce rom Figure 49, for elativaly large values of Whe cornet tends to move any from he neighbourhood center (ee Exercise 48 fra quantitative analysis ofthis eft). 4.4 Surface Extraction from Range Images Many 3-D objects, especially man-made can be conveniently described in terms of the shape and postion cf the surfaces they are made of, For instance, you can describe a cone as an object formed by two surface patches, one conical and one planar, the latter perpendicular to theaxis ofthe former. Surface-based descriptions are used for abject classification, poseeximation, and reverse engineering, and are ubiguitous in computer sraphics ‘As we ave seen in Chapter 2, range images are basically sampled version ofthe visible surfaces inthe scene. Therefore, ignoring the distortion introduced by sensor imperfections, the shape of the image surface and the shape ofthe visible scene surfaces ‘are the same, and any geometric property holding for one holds forthe other too. This section presents a wdl-known method to fin patches of various shapes composing the Visible surface of an object. The method, called HK segmtentation, partitions a range image into regions ofhomogeneous shape, called homogencous surface patches, just surface patches fo shor.” The method is based on differential geometry; Appendix, section A gives a stort summary of the basic concepts necessary. == Thesolution1o several eomputer vision problems involving 3D objet modes are simpler when using }D features than 2-D features, as image formation must be taken into account forthe later. Tate tho iage alucreardl a asic land nthe age lan "Nose hat sure paces ae he bacingesiens for bung sara tase CAD model of an ox atonalChapter 4 image Features kK T Tale Figure 411. usteation ofthe local shapes resuing rom the HK casifcaion Problem Statement: 1 K Segmentation of Range Images Given a range image Fin. form, compute anew image registered with and the same ize, in which ach pe asia with local shape cas selected from a pven dictionary ‘To sole this problem, we need two tools a dictionary of shape classes, and ‘an algorithm determining which shape class approximates best the surface at each piel 44.4. Defining Shape Classes ‘Since we want to estimate surface shape at each point (pixel), we need a local definition ‘of shape Differential geometry provides a convenient one: using the sign ofthe mean ‘curvature Hand ofthe Gaussian curvature K, we can clasty the local surface shape as ‘shown in Table 4.1, and illustrated by Figure 4.11 ‘Inthe able, concave andconvex are defined with respect tothe viewing direction: a holein therange surfaces concave, and itsprincipal curvatures Appendix, section A.S) negative, At cylindrical points, one of the two principal curvatures vanishes, a8 for instance at any point of a simple cylinder or cone (not the vertex). At eliptc points, both principal curvatures have the same sign and the surface Tooks locally like either Section 44 Surface Extraction from Range images 87 x H____Laalaape dw ° ae cave ein comercial cnear ipic + = camvereigue xy ype Table 4.1, Surface patches casifcation scheme, the inside of a bow if concave) or the tip of anos (ifconves) Athypesboic points the principal curvatures ate nonzero and have diferent signs the surface looks like a sade Novice that this classication i qualitative, in the sense that only the sign of the curvatures, not ther magnitude," influences the result. This fers Some robust. ses assign can oft be estimated correctly even when magnitude estimates become 442. Estimating Local Shape Given Table 4.1, all ve have to dois to recall the appropriate expressions of H and K, ‘evaluate them at each image point, and use the signs of Hand K to index Table 4. Here ishow to compute HK from a range image, hin ry form (subscripts indicate again partial ditferetition), Dray Hy 2 4.10) asin en) (4 Hhyy ~ Dh hlny OE ay = ee ++ AY fa Unfortunately, we cannot expect good results without sorting outa few details Fist, the input imagecontains nose, and this distorts the aumericl estimates of deriva tives and curvaturessnoise smoothing is therefore required. Notice thatthe worst nose may be due to quanisation (if the 8-bit image does not capture all significant depth Variations of the sere), o to limited sensor accuracy "Win ee exception fo cousChapter image Features The low anqiston noise of state-of-the-art later scanners should not jeopardise se ‘us the quality ofthe HK segmeatation. Segmentation can fly Howevet, because im ge quantisation and resolution are nt sfcont given the objects andthe stand-off distance, = Gaussian smoothing, as any averaging file, tends to underestimate high curvatures, and to introduce spurious curvatures around contours (Exercise 49) Second, the reslt may sill contain smal, noisy patches, even wien smoothing is applied to the data, Small patches can be eliminated by additional filtering (Exer cise 4.10) Third, planar patches should yield # = K 0, but numerical estimates of Hand K will nover be exactly ero, To decide which small numbers ean be safely considered zero, we can establish zero-hresholds for H and K. In this cas, the ac curacy with which plana patches are extracted depends (among others) on the noise level, the orientation ofthe plane and the values of the thresholds Fourth, estimat ing derivatives and curvatuzes does not make sense at surface discontinuities, To skip discontinuities, ne could therefore un an edge detector on R (c.g, CANNY. EDGE, DETECTOR for step edges) keep a map of edge points andskip them when estimating Hand K. ‘Tosummarie, here isa basic HK segmentation algorithm. Algorithm RANGE, SURF_PATCHES The inputs a range image, In form, and ast of six shape label (sl, asocated to the clases of Table. 1 Aply Gausan smoothing oF, obtaining, 2 Compute the inmages ofthe derivatives fy fu yy (Append, Seton A2) ‘Compute the HK images wing (4.1) and (4.10) ‘4 Compute the shape image, by assigning a shape label to each pel, aoxoring to the rules in Table. “The output isthe shape image S Figure 4.12 and 4.13 show two examples of HK segmentation. The range data were acquired and processed using the range image acquisition and processing systems developed by the Computer Vision Group of Heriot-Watt University. The objects are ‘mechanical components formed by planar and curved surfaces. The figures show the input range data the data after smoothing (both as grey-level image and as 3D plot), and the patches detected Section 44 Surface Extraction from Range images 89 @ @ Figure 4.12 (a): Inputrange image, grey ced (the darker te cost to the sensor. (b): Aer smoothing srey coded, (c: Same stp ht a6 3-D Hometrc pot. (4) The patches detected bby HA segmentation, Courtesy of M Uist, Heriot-Watt Universi Inorder to be wed by subsequent task ike classification or pose estimation, the ‘ulput of RANGE SURF PATCHES is often converted into alist of symbolic patch descriptors. In each descriptor, a surface patch is associated with a number of attributes, ‘which may include a enigue identifier, postion of pati center, patch area, information fon normals and curvatures, contour representations, and poiaters to neighbouring patches Closed-form surface models (eg, quadrcs) are often fitted to the surface patches extracted by the HK segmentation, and only the model's coefiiens and type (ce. plane, elinder,cone) stored inthe symbolic descriptors,90 Chapter a Image Features @ Figure 413. (a): input range image, ry coded the darker te loser othe sensor) (alter soothing, grey coded, c):stme a top right, a8 -D isometric lo (2: the patches detected by HK segmentation. Courtesy of M, Umasithan, Heriot-Watt Univers 45° Summary After working through tis chapter you should be able to: {2 explain what image features are and how they relate tothe 3-D world ests forthe related design detectors for edges and point features andperforman algorithms; a design a simple HK segmentation program for range images. 46 Further Readings Several books review the theory and algorithms of large collections of edge detectors, for instance (8, 9,14} John Canny’s description of his edge detector is found in [4] Section 4.7 Review 91 Spacek [15] derives sihtl different, optimal ede detector. De Micheli et al [6] and Pratt [4] give exarples of discussions on performance evaluation of edge detectors A very good electronic textbook on image processing, including material on feature ‘detection, isHIPR (Hypermedia Image Processing Reference), published by Wily (on line information at ettp; //w, viley .co.uk/electronic/aipr). Second-order derivative filters for edge detection have been very popular inthe eighties; a clas reference is Marts book {13}. These filters look forthe zero-rossings ofthe second derivative of a Gaussian-fitered image. Their dsadventages in compar ison with Canny’s detector include worse directional properties (being sotropic their ‘output contains contibutions fom the direction perpendicular tothe edge normal: this does increase noise without contribute to detection); moreover, they always produce closed contours, whch do not always correspond to interesting edges For a theoreti cal analysis of the main properties of ist-order and second order derivative filters see Torre and Poggio (Ii. "The pont feature (corner) detector CORNERS is based on Tomasi and Kanade’s fone [I6; an application to motion-based reconstruction is desribed in [17], and discussed in Chapter 8. Further corner detectors are reported in 10, 39} ‘Bes! and Jain 1}, Hoffman and Jain [11], and Fan [7] are some variations of the HK segmentation method. Hoover et al. {12} and Truceo and Fisher [19] report useful experimentalassessments of H,K segmentation algorithms from range images. For examples of reverse engineering from range images se [5,2 47. Review Questions 141 Describe ie difference between local and global features and give examples ‘of image features from both classes, 13 42 Given ou: definition and classification of edges, discus the differences be: tween edges in intensity and range images Do our edge types make sense for range images? 19 43 Would youapply intensity edge detection algorithms to range images? Would the algorthmsrequire modifications? Why? 10 44. Consider he equation giving the orientation ofthe edge normal in algorithm CANNY_ENHANCER. Why can you ignore the aspect rato of the pixel here, ‘but notin the situation proposed by Question 28? 1D 45 In section 422, we used the image gradient to estimate the edge normal Discuss the practical approximations implied by this choice. 1346. Explain vty ¥junctions are spit by HYSTHERESIS. THRESH. 1-42. Discus the diferences between Sobel’ edge enhancement scheme pre ceded by a Gaussian smoothing pass, and Canny’s edge enhancement scheme, ‘What are the main design differences? Do you expect different results? Why?2 Chapter 4 image Features 13-48 You can suppress short edge chains in the output of CANNY_EDGE_ DETECTOR by filtering the input image with wider Gaussians, How would you achieve the same result with a Sobel edge detector? 19-49 How would you design an experimental comparison of Canny’s and Sobel's edge detectars? 440 Why isthe case K > 0, H =0 not featured in Table 4.1? 3 411 Explain why 11K segmentation cannot be applied to intensity imagesin the hope to find homogencous scene surfaces Exercises © 4d Consider the SNR measure formalzing the good detection criterion (43). ‘Show that, ifthe filter has any symmetric components, the SNR measure wil ‘worsen (decrease). This shows that the best detection is achieved by purely ant- symmetric filters (© 42 Prove the detection-localizaton uncertainty equations (46) for step edge rele?) where © isa fixed fraction (eg, 09) and the weights are proportional tothe counters” values ‘Asan example, Figure 52 (s) shows asymthetic 64x 64 image of two Fines Oaly a subsel ofthe lines’ points are present, and spurious points appear at random locations. Figure 5.2 (b) shows the counters in the associated (m,n) parameter space. Section52 The Hough Transform 99 o 0) Figure 52_ (a) An imag containing wo ines sampled iregulaly and several random pints (©) Plot ofthe counters in the corresponding parameter space (ow may pont contribute to «ach ell (n,n)). Notice thatthe main peaks are obvious but hore are masy secondary peaks. ‘We are now recy fo te following algorithm ‘Algorithm HOUGH. LINES ‘The input is, a AN binary image in which each pixel Bj) 81 if an edge pixel. 0 ober. Le. 8 bethe araysconsiing the discrete intervals of the p, parameter spaces (010, VAP + N29 [0.x] and R, 7 respectively their numberof clement. 1. Discetize the parameter spaces ofp and @ asing sampling steps Jp, 30, which must yield ceptable resolition and manageable size foray an 2 Let A(R, 7) hom area integer counters (accumulators: inal 5 For each pte, Bj), sUch that £()= and for (4) let 9 = co 40) + jin (0) find the nde. ofthe element of loses 0 (6) increment 4,0) by one allelements of Ato r ‘4 Finda ocal mavima (fy such hat Ai, p> e, whore i user defined threshold The ourputis ast of pairs ad), By), seine the lines detested in n polar form. *© Iamestinate mp) ofthe edge direction at image point pis available, and we assume that mp) i also he dretion ofthe line through p, gue cll m,n, = ¥~ mys) canbe identified. Inthissase instead of the whol line, weinerement onl the cuter tm.) toallow forthe uncerainty associated wth edge direction estimates weintement al the cello a small sqyment centered in (m,),the length af which depends inversely on he100 chapter More Image Features elailt ofthe direction estimates This can speed up considerably te construction of the paramcter space 5.2.2. The Hough Transform for Curves, ‘The HT is easly generalized to detect curves y = f(s, a), where a=[ay,...,9]' isa vector of P parameters, The basi algorithm is very similar to HOUGH LINES. ‘Algorithm HOUGH_CURVES The input asin HOUGH_LINES. Let (i) be the chosen parametrization of tags 1. Diseetize the intervals of variation of ,.-.ap with sampling steps yeding acceptable resoltion and managesble size forthe parameter space Let ...5p be te ize ofthe iseoized intervals 2. Let Alston) be an aray of integer counters (accumulators), and iniaize alts ‘ements 020. ‘4 For each pel Ej) such that EG, y= flsyayin 4, Final focal maxima a suc that A(ag) > where xb a wer defined threshold 1. increment al counters n he curve defined by set of vectors ay. ar deserting the curve instances detected . = Thesizeofthe parameter space increases exponentially with te numberof model parame ters and thetime neded to find all maximabecomesrapdly unacceptable. This iascrious limatation. In particle, assuming for simplicity that he disretied intervals fal param fers ave the same ie N, the cost ofan exhanstve search ora curve with parameters proportional o.N?. This pablem canbe tackled by varabe-esoluion parameter spaces (sce Question 56), 5.23 Concluding Remarks on Hough Transforms ‘The HT algorithm i a voting algorithm: Each point “votes” for all combinations of ppstanetets which may have produced iifitwors part of the trget curve From thispoint bf view, the array of counters in parameter space can be regarded asa histogram. The fina total of votes, c(m), in a counter of coordinates m indicates the relative likelihood ofthe hypothesis “a curve with parameter set m exists in the image. “The HTcan also be regarded as patern marching: the clas of curves identified by the parameter space isthe class of patterns. Notice that the HT is more efficient than direct template matehing (comparing all posible appearances ofthe pattern with the image). "The HT has several attractive features Fist, a all points are procesed indepen: dently, it eopes well with occlusion (fhe noise does ot result in peak as high as those ‘created by the shortest true lines) Secon itis relatively robust to noise, as spurious Secton53 Fitting Elipsesto image Data 101 points are unfikely to contribute consistently to any single bi, and just generate back ‘round noise. Thirdit detects multiple instances ofa model ina single pass ‘The major limiation of the Hs probably the rapid inerease ofthe search time with the numberof parameters inthe curve's representation, Another limitation i that non-target shapes cn produce spurious peaks in parameter space: For instane, line detection can be distarbed by low-curvature circles. 5.3. Fitting Elipses to image Data Many objects contancicular shapes, which almost always appear slipsesin intensity images (but see Exercise 5.5): for this reason, ellipse detectors are usefl tools for ‘computer vision. Theellipse detectors we consider take an image of ee points in input, {ndnd hobs lige ing te points Therefore ths sexton connate ele {iting nd assumes that we have identified a set of mage points plausibly belonging to a single arc of elise ‘Problem Statement: Elipse Fiting Letpr py beasetof image poins and voltae [2a 8ni] fip.a) =a" amar + bay ty? tae bey f=0 ‘heim equation ofthe generis mv aracterzed by the parameter vector ee Find the parameter vector, a, asocated tothe elise which fits p py bes in the east squares sense asthe soutonof| in (Dp. (1) where Dip table distance ‘Notice that the ecuation we wroe fr fia) is eally a generic conic We sal ave more ‘say about his point later Whats asta tance? Tee ae to main answers for ei iting, he Euclidean distance and the algebraic distance. * 5.3.1. Euclidean Distance Fit The first idea sto tryand minimize the Euclidean distance between the ellipse and the ‘mieasured points In this cae, problem (5.1) becomes sin > ip pl? (62)102 Chapter More image Features ‘under the constrain that p belongs to the ellipse: (pa) =0. Geometrically, the Euclidean distance seems the most appropriate, Unfortunately, it leads only to an approximate, numerical algorithm, How does this happen? Let us try Lagrange muliplies to solve problem (52). We define an objective function b= ip— pil ~2/p.a, which yield p-p=2°/(Pa) 3) Since we do not know p, we try to expres it asa function of computable quantitics. To do this, we introduce two approximations 1. We considera first-order approximation ofthe curve = Fp.a)® f(Pi.a) + (PPI V/0..a) G4) 2, We assume that the p, are close enough to the curve, so that V (p) ~ V(@)- Approximation 2 allows us to rewrite (53) as p—Pi=AVf(P.a), which, plugged into (5.4), gives = £0018) Wn. ‘Substituting in (5.3), we finally find 0 ISFeei “Thisisthe equation we were after: Itallows uso replace, n problem (5.2), the unknown ‘quantity ip ~ pil with a function we can compute. The resulting algorithm iss follows: Ip pel= Algorithm EUCL_ELLIPSE_ FIT The inputisaset of Vimage pins p,-..-py- Weassume the notation inroducedin he problem statement box fr elie tng, 1. Stato an iii value my Section53FitingElipses to image Data 103, 2 Using ay asin point, run a numerical minimization to find the soation of ‘The outputs the efnng the best. ise © A reasonale inal value is the solution ofthe clsedorm algorithm discussed next (secon 532), How satisfactory is EUCL_FLLIPSE_FIT? Only partially, We started with the tmue (Euclidean) distance, the best possible, but were forced to iatroduce approximations and arrived a 2 nonlinear minimization that can be solved only numerically, We are not even guaranteed thatthe best-fit solution isan ellipse: It could be any conic, as ‘we imposed no constraints ona. Moreover, we have all the usual problems of numerical ‘optimization, ineludcg how tofind a good initia estimate fora and how to avoid getting stuck in local minima = Thegood news however stat EUCL. ELLIPSE. FIT canbe weld for general oni ‘Of cous, tere ia isk thatthe rsat is not the con we expect (ee Further Reangs) A logical question at this point is: If using the rue distance implies anyway approximations and anumerial solution, can we pechaps find an approximate distance leading toa closed-form solution without further approximations? The answer is yes, and the next section explains how todo it 532 Algebraic Distance Fit Definition: Algebrale Distance The algebraic distance oa point p from acure fp, simply LF. ‘The algebraic dstance is different fom the true geometric distance between & curve anda pont; inthis sense, we start of with an approximation, However, this isthe ‘only approximation we introduce, sine the algebraic distance turns problems (51) ito 8 linear problem that we can solve in closed form and with no further approximations. Problem (5.1) becomes ayo Sisfa? re) ‘To od he ri ston a=0, we mast enforse a constant on a. OF he several constraints possible (see Further Readings), we choose one which forces the solution tobe an elise108 Chapter More Image Features 00-2000 010000 r}-2.0.0 000|,_sroy_ 100 0 000) 66) 000000 000000 Notice that this can be regarded as a “normalized” version of the elpial constraint P= dae = 0,38 aly defined up toa sale factor. ‘We can ida solution to this problem with no approximations. Fist, we rewrite problem (5.5) 38 Jax Xa = mn "Sal, (57) where 2 2 1 qoan op non dup yan x-|2 mm Hon (58) 2 yw Yh ow wT Inthe terminology of constrained last squares is called the design matrix, §= XX the seater matrix, and C the constrains matrix. Again using Lagrange multipliers, we ‘obtain that problem (5.7) issolved by Sa=2Ca 69) ‘Thisisaso-called generalized eigenvalue problem, which can be solved in closed form. It canbe proven thatthe solution, asthe eigenvector corresponding othe only negative eigenvalue, Most numerical packages wil find the solution of problem (5.9) for you, ting care of the fact that Ci rank-deficient, The resulting algorithm is very simple. ‘Algorithm ALG ELLIPSE. FIT “Theinputisaset of W imag pins p,...,py. Weassume he notation introduced inthe problem. statement box fr elise iting 1. tld he design mrs, as pee (58), 2 Build the scatter matrix, S=X", {3 Build the constant mati, C, 8 por 4). ‘4 Use a numerical package to compute the eigenvalves ofthe g Jen andl the oly negative eigenvalue eralized eigenvalue pro “The output isthe best parameter vector, ven by the eigenvector asain 10. Figure 53 shows the result of ALG_ELLIPSE_FIT run on an elliptical ar cor rupted by increasing quantities of Gaussian noise. Section 53 Flting Elipsesto image Data 105 Figure 53. Fxample cf best elpses found by ALG_ELLIPSE.FIP for the sume ar of ellipse, coruped by increasingly strong Gaussian noise From lett right, the noise varies fom 395 10.20% ofthe data read (gure courtesy of Maur Plu, Univesity of Edinburg). ‘ALG_ELLIPSE_FIT tends tobe biased towards low-eccenmiy slutonsindeed 4 characteristic ofall methods based on the algebraic distance. Informally, this means that the algorithm prefers fat ellipses to thin ellipses, as shown in Figure 54, The reason isbest understood through the geometric interpretation of the algebraic distance Geometre Interpretation ofthe Algeb Consider a point p ner Tying onthe ellipse f(y proportional 0 stance (Elipses) The algebraic distance, fp, is =! [s#} =] whee risth tans the pe roms cetera th dnc ofp fom he lps along the ame ine hich goos through p, and ds (Figure 55). Figure $4. station of the low-ccentrcy bias introduced by the algebraic tance. ‘ALG_ELLIPSE_FIT was run on 20 sample covering hall an elise, spaced uniformly along sand corruped by diferent realization of rather strong, Gaussian ise wih costa Standard deviation (0 =008, abou 10% ofthe smaller soma). Tho best ellipse (oid) is systematically biased tobe Mater” than the tue one (dashed)Chapter More Image Features Figure 55 ttwsraion of the distances d and ‘nthe geomet interpretation ofthe algebraic itance, Q. Ata parity of, Qisargerat Phan MPs Notice that this interpretation i valid flor any conic. For hyperbola, the ceater ithe inersection of the asympcoes for parabola, he center sat init For any fixed d, Q is maximum atthe intersection of the ellipse with its smaller axis (¢, Pin Figure 55) and minimum at the intersection of the elise with ts agger axis (eg. Pin Figure 5.5). Therefore, te algebraic distance is maximum (high weight) for observed points around the fat parts ofthe clips and minimum (Jow weight) for served points around the pointed parts. As consequence, iting algorithm based fon Q ends to believe that most data points are concentrated inthe Matter part ofthe ellipse, which results in “ater” best-fit elipses 53:3. Robust Fitting ‘One question which might have occured to yous Where do the data point for ellipse iting come frm? In real applications and without apron information on the cen, finding the points most kel to belong aspecifc elise isa dificult problem. Insome cases its reasonable to expect that the datapoints can he selected by hand. Filing that, serelyonedge chaining as dseribodin HYSTERESIS, THRESH. inanyaseitisvery key thatthe datapoints contain outs ree ts re dt pons which vt te Stil asaptns ofthe ein tor In our ease, an outer isan edge point erroneously assumed to biog toa elise fut Both EUCL_ ELLIPSE. FrT and ALG_ELLIPSE. FIT asleastsquares estimators, assume thatal datapoints aa he regarded as itu poits corrupted by additive, Caus- sian noise; hence, evenasmall number fours can degrade theiresls badly. Robust estimators ae a lass of methods designed to tolerate outers" A robust distance that i Sxtion inthe Append gies sn itredtin toronto Section5.3 Fitting Elipses to Image Data 107 ‘often works wells the absolute value, which is adopted by the following algorithm for robust ellipse ting Algorithn ROB_ELLIPSE. FIT Theinputisasctof image points... .py.Weastume the notationintcedinthe problem statement box for elips iting 1, run ALG_ELLIPSE FT, and allt stuson my 2 Usingay asin, un aaumeszl mination ond oat of nin la ‘The outputs the soliton, a, which denis the best elise Figure 56 illutates the problems caused by outliers to ALG_ELLIPSE_FIT, and allows you to conpare the result of ALG_ELLIPSE_FIT with those of ROB_ ELLIPSE_FIT, started from the solution of ALG_ELLIPSE_FIT, in conditions of severe nose (that is, bts of outliers and Gaussian nose). Both algorithms were run ‘on 40 points from halfan ellipse, spaced uniformly along x, and corrupted by different realizations of Gaussin nose with constant standard deviation (o = 0.05, about 7% of the smaller semians c), About 20% of the points were turned into outliers by adding 8 uniform deviate in a, a] to their coordinates. Notice the serious errors caused to ‘ALG_ELLIPSE_FIT by the outliers which are well tolerated by ROB_ELLIPSE_FIT. 5.3.4 Concluding Remarks on Ellipse Fitting With moderately noisy data, ALG_ELLIPSE_FIT should be your first choice, With seriously noisy data te eccentricity ofthe best-fit ellipse can be severely underest- ‘mated (the more so, the smaller the arc of ellipse covered by the data). If ths is a problem for your applization, youcan try EUCL ELLIPSE, FIT, starting from thesolution of ALG_ELLIFSE_FIT: With data containing many outlier the resuits ofboth UCL ELLIPSE_FIT and ALG_ELLIPSE_FIT wil be skewed: in his case, ROB ELLIPSE_FTT, started from the solution of ALG_ELLIPSE_FIT should do the trick (but you are advised t take a look at the references in section A.7 in the Appendix if robustness sa serios issue for your application). If speed maters, your best bet i ALG_ELLIPSE_FIT slone, assuming you use a reasonably efficient package to solve ‘the eigenvalue problen, and the assumptions ofthe algorithms are plausibly satisfied ‘What “moderate noisy” and “seriously nosy” mean quantitatively depends on your data (number and density of data points along the ellipse, statistical distribution, land standard deviation of noise). In our experience, ALG ELLIPSE. FIT gives good fits with more than 10 points from half an ellipse, spaced uniformly along 1, and108 Chapter More Image Features FigureS6 Compara of ALG_ELLIPSE. FIT and ROB_ELLIPSE_FTT when ing to data severely corrupted by oaticrs The cls show the datapoints the asterisks suges the robust fit the slid ine show the algebraic, and the dots the tre (uncorrupted) eligse corrupted by Gaussian noise of standard deviation up to about 5% of the smaller semiaxis Section 5.7 suggests Further Readings on the evaluation and comparison of ellpse-fiting algorithms. 5.4. Deformable Contours Havin discussed how to it simple curves, we now move on tothe general problem of fitting & curve of arbitrary shape toa set of image edge points We shall deal with closed contours only. 'A widely used compster vision model to represent and lt general, closed curves isthe snake, or active contour, or again deformable contour. You can think of a snake asan elastic band of arbitrary shape, sensitive to the intensity gradient. The snake is located initially near the image contour of interest, and is attracted towards the target contour by forces depending on the intensity gradient Section 5.4 Deformable Contours 109 "Notice thatthe sakes applied othe inst image, not to anime of ede points ax the line and ellipse detactos ofthe previous sections ‘We start giving description of the deformable contour model using the notion of energy functional an continuous image coordinates (no pxelization) We then discuss «simple, iterative algrithm fiting a deformable contour toa chain of egde points of real, piselized image. 5.4.1 The Energy Functional “The keyidea of deformable contours isto associate an energy funcionalto each possible contour shape, in such a way thatthe image contour to be detected corresponds to & ‘minimum ofthe functional. Typically, the energy functional used is a sum of several terms, each correspording to some force acting on the contour. Consider a contour, e=e(s), parametrized by its arclength? s.A suitable energy functional, &,consistsof the sum of three tems: es f (015) Boa + BU)Ecare + (8) Eimage) 43, (5:10) ‘where the integrals taken along the contoure and each ofthe energy term Ener and Einag, i function of e or ofthe derivatives of e with respect tos. The parameters 1,8 and, contro therelative influence ofthe corresponding energy term, and can vary along e. Let us now define more precisely the three energy terms in (510) 5.4.2. The Elements of the Energy Functional Each energy term serves a diferent purpose. The terms Ey and Fours encourage continuity and smooulness of the deformable contour, respectively they canbe reparded 88a form of internal energy. Eiqagea€counts for edge attraction, dragging the contour toward the closest image edge: it canbe regarded asa form of external energy. What functions can achieve these behaviors? Continuity Tere. Wecanexploit simple analogies with physical systems to devise ‘rather natural formfor the continuity term: "Gen an array paren of cave e= ei) ih the parameter and > awe have Eons = 1Ps ~ Bs-all*s ‘hile for smaller distances (5.11) promotes the formation of equally spaced chains of points and avoids the formation of point clusters. ‘Smoothness Term, The sim ofthe smoothness term is to avoid oscillations ofthe deformable contour. This is achieved by introducing an energy term penalizing high contour curvatures Since Evy encourages equally spaced points on the contour, the ‘curvature is well approximated bythe second derivative ofthe contour (Exercise 5.8), hence, we cam define Eee 38 Beare = [Bit — 281+ Bist (5.12) Ecdge Atiaction Term. ‘The third term corresponds to the energy associated to the extemal foree attracting the deformable contour towards the desired image contout. ‘This canbe achiewed by a simple function: Iii, (13) Eine = were VF isthe spatial gradient of he intensity image 1, computed at each snake point. (Ceatly, Ejnays becomes very small (negative) wherever the norm ofthe spatial gradient is large, (that ig near images edges), making € small and attracting the snake towards {mage contours Note that Ejmoe alike Fee and Ey, Gepends only onthe contour, ‘ot on is derivatives with respect tothe ae length, 5.43 AGreedy Algorithm We are now ready to describe a method for iting a snake to an image contour. The ‘method is based on the minimization ofthe energy functional (510). First of ll, let us summarize the assumptions and state the problem. ‘Assumptions Let beam image and jy... the chain of image locations representing the ntl postion ofthe deformable contour, which we assume close tothe image contour f interes, Section5.4 Deformable Contours 119 Problem Statement Starting rom jy... fy ind the deformable contour ps... py which fs the target image ‘contour best, by inimzing the energy fuetonal Y leon + Alar +E ne) with ws =O, Eons Engr tl Ea 8885.11), $12) and (5.13) espectvly (Ofte many algorithms proposed to fit deformable contours, we have selected a seedy algorithm. A gredy algorithm makes oly optimal ches, inthe hope that they lead ta globaly optimal solution. Among the reasons for selecting the seedy algorithm instead of cher methods we emphasize its simplicity and low computational complet The algorithm iconcepnal sinple because i doesnot require knowledge ofthe calculus of variations thas low computainal comple because it converges ina number of trations proportional tothe number of contour pints times the number of locations in whie each point can move at each iteration, whereas other snake algorithms take much longer. ‘Te core oa greedy algorithm for the computation of deformable contour consists of two hascsteps Fist, at each iteration, ach point ofthe contour is moved within a small neighborhood tothe point which minimizes the energy functional. Second, fore starting anew iteration the algorithm looks for corner in the eontour, and takes appropriate measures onthe parameters j,..., By controlling Ey. Let ws discuss these two steps in more detail Step 1: Greedy Mininization. The neighborhood over which the energy functionals locally minimized is typically smal for instance, a3 x3 or x S window entered at cach contour point). Keeping the size ofthe neighborhood small lowers the computational nad of the method the complexity beng liner the sizeof the neighborhood). The local minimization is done by direct comparison ofthe eneray functional valu teach locaton. ‘Step 2: Comer Elimination. During the second step the algorithm searches for comers ascurvature macina along contour. a eorvatoe maximum found at point Bisset to zt Neglecting the contibuion of Es at p makesit posible to Keep the deformable contour pieeense smooth. © Fora corre impkmentaton ofthe mood, it isimportanto normalize the contibation ofeach energy ter. For the ters Bad Er ti suiient did by the largest he leas of various he mubomataltectaigue fo zen he imam ofan, ‘hese wy alas ves the efor termining te minima an rr neon,wm Chapter5 More Image Features ‘aluein the neighborhoodin which he pointean move. For Zina insted, tay be eal to normalz the nom of he spatial gradient, VI, a6 Ma =m With and maximum and minimum of V7] over the neighborhood respectively. The iterations stop when a predefined fraction of all the points reaches a local ‘minimum: however, the algorithms greed does not guarantee convergence tthe global ‘minimum. I usually works very well a fr as the initialization isnot too far from the desired solution. Let us enclose the algorithm details inthe usual bo. Algorithm SNAKE, The input formed y an intensity image, , which contains a closed contour of interest, and by chain of mag locations p,...,y,denig the intial postion and shape ofthe snake ‘Let f be the minimum fraction of snake points that must move in each iteraon before coavergence, and Uip) a smal neighborhood of point p In the besining, p= and (used in Eom) ‘While afactin greater han of he snake points move in a iteration 1. foreach N, finde location of U (for which the funsonal detinedin (5.10) ‘minimum aad move the snake pont pt that leaton 2 for eachi = 1,...,, estimate the carvatae kof the snake a 9 2, im-1— 2014+ Poe and lok for local maxina. Set f= Dfor lp at which the curvature hs oc maximum snd exceeds a user-defined minum vale; ‘update the value of the average distance, ‘Onoutpt tis algorithm returns a chain of points py that represent a deformable contour. ‘+ Westll have to asin values toa, i, and One posible choice st inalize all of them to: another possbity ise, = = and y= 12, which gives edges attraction more relevance in the minimization stage. ‘© Toprevent the formation of noisy corer t may be useful toad an eta condition: point 1 is comer if and only if the curvature i loaly maxwum a py andthe norm of the Intensity gradient at pis ucienly large This ignores corners formed too far avay from image edges. Examples ofthe application of SNAKE to synthetic and ral images are shown in Figures 57 and 58, Section 54 Deformable Contours 113 Figure 57 = ): Intl postion athe snake, oe pastion (th iteration), al result (Qoth tration) Parameters were a= f= y= 1.2. SNAKE sed « 7ocal nighborboods and stopped when ono ewer pots changed ina eration. @ () © Figure $8 (a: Intl position ofthe snake. lntermediate position Sh iteration. (Final result (130th eration) Parameters were = f= I, y~ 12, SNAKE ised 7x Tloal nejhborboods and stopped when ony 9 or ewer points changed in an teration,Chapter More image Features 55. Line Grouping ‘We close this chapter by touching upon the dificult problem of grouping. For our Purposes, we can state the problem as follows Problem Statement: Grouping Given the st of features detected in an image, decide which groups of features part ofa same object, wihour knowing what objets we are looking at kay tobe ‘The featres in question ould be edge points lines, or curve sepments We iit our discasion tothe particular ease of sagt line segment, which ae important featuresin many practical casesand allow ustointtoduce the main concepts grouping Moreover, ine Segments are easy represented through ther endpoints; we adopt his convention here. ‘Notice that our problem statement generalizes the sims of roupig given atthe being ofthis chapter, in two diestions Fist, we are now intersted in grouping smote comples features than ede points second, we consider feature not ying neces sarily next to cach other onthe image plane Aswecannot use object model, we must esr to properties ofthe image to form the desired aroups of featres. We consider three such properties: proximity, paral, and collinear, Using these, we devise a method based on empirical estimates of he probability ofa group arising by acident of viewpoint or object postion. Grouping by means of properties like proximity, parallels, and collinearity i often called perceptual grouping, o perceptual organization’ Let us start by discussing how to exploit proximity. Cleary, wo line segments close to eachother in space project onto image segments also close to each other, irespectively of viewpoint and positioning. The converse 8 not tue: As iusrated in Figure 59, two 3D line segments might be fr apart, but appear close onthe image plane, for example duet aecdental positioning in 3D space. Notice that “cient indicates thatyou would expect the two 3D segmentsin Figure 590 appear, general, {ar apar from each othe. This observation leads us to precise problem statement of grouping by proximity. ‘Problem Statement: Grouping by Proximity Given et of image line segments projections of 3-D ine egmens determine whch pats ate ‘mos likely tobe projections of 3D line segments which are clos to eachother in the scene “ure icason oesnot do jstctothe vst oy of pchophy sel sudieson perceptual orgie, youre interese okow more ook othe Furr Reading Section5.5 Line Grouping 115. Figure 5.9 Due to assdenalpostoning, two 3. line far apr from ach oer projet oto close image ines. ‘What we need isa way of quantifying “most likely”; in other words, we are looking {ora significance mecsure of proximity, tp, which takes on large values for pairs of image lines corresponding to pairs of3- lines which are close to eachother in the seen. Intuition suggests that pairs of long image sepments ae better than short ones for our purposes. Therefore, f we denote with g the smaller distance between the endpoints of ‘wo image segments and with the length ofthe shorter segment (Figure 5.10(a)), 2 reasonable formula fr jy 8 Byo= () 1 Under appropriate assumptions on the distribution of the segments in 3-D space, you can show that 1/19 proportional to the probability that he two endpoints are close by accident (Exercise 58), ‘We ean devise ia similar w significance measures for parallelism and collinear ity, per ad at 2 f 5.15) Hee os = E 619) bet = BF) with 9 the angle betwsen the two segments s the separation (measured as the distance of the midpoint of th: shorter segment from the longer one), and the length of the longer segment (se Figure 5.10(b) and ()).Chapter More image Features —— ofA 00. f ; G a po o o o Figure 5.10. Mhsration of the relevant quantities in grouping by proximity (a), paralleis, (@),andollnearity (0) = Iegoes almost without saying that inthe case of parallel tho two Segments are supposed {overlap while inthe ase of eoinearty they ae not Asin the case of proximity, I/per nd Iu ean be interpreted as proportional tothe probability of accidental parallelism and collinearity, respectively Notice that the smaller # ands, the larger pr nd jt In addition, py gels larger for I~, while ica decreases withthe distance g between the endpoints of the segment pai and is independent ofthe length ofthe longer cement. We can now give a simple grouping algorithm based on yo, ar 84 Hes ‘Algorithm PERC_GROUPING ‘The inpat is Sst of image ine sepmensprejectins of ine Segments in spAEE Le pr sd! Te postive real tresolds. 1. For al pais of endpoints of image segments in 8 (2) compat iy with (14); (b) depending on wheter the sepmensin the pir overlap compute pe from (515) oF es Wi (3.10) 2 Order the significant mesures obtained, separately and in decreasing ode. 5 Group the segment pais for whi oye Of ge pr a “The output is formed by groups of image segments most kel to be pojgtions of $D segments pat ofa same object. wo segments and do ot elp if the peti of ve dosnt oe With band vee ves ‘Oe the opment erp Section5.7 Further Readings 117 + In case of cont (the same image pir being assigned a sgniflcance measure above threshold for bh proximity an either parallelism or ealinesrity), proximity fediscarded, The values of fy, fr And ty depend on the application and shouldbe determined empirical, However one is often interested only in fnng, sa. the tn best segment ‘air for each property in hs case the problem of sting Values fo iy, pry 88 ta S sided PERC_GROLPING does a reasonable job of identifying close, parallel, and collinear segment peirs. If S becomes large, precomputing the pairs of segments which {do not le too far apart on the image plane leads to a more efficient implementation. 56 Summary ‘After working through this chapter, you should be able to: @ explain the retion between the image features introduced in this chapter and those introduced in Chapter 4 detect Lines elfpses and more general curves in digital images @ design an algorithm for fiting deformable contours @ grouping image lines 57 Further Readings (Our treatment of the Hough transform is largely based on Ilingworth and_Kitter's review [8], which describes a umber of applications and developments. Haralick and ‘Shapiro'sbook [7] gve algorithmic details, and Davies [4] reports a detailed analysis of Hough methods. The Hypermedia Image Processing Reference (see Further Readings, ‘Chapter 4 covers both line and curve detection ur dscussionof ellipse fitting using the Euclidean distance is based on Sampson [17] and Haraick ani Shapiro [7]. The elipse-specifc algorithm, ALG_ELLIPSE_ FIT, is due to Pil, Fitzgibbon and Fisher (I4]- Strang {19] (Chapter 6) and Press eta {15] (Chapter 1) include good discussions ofthe generalized eigenvalue problem. The ‘geometric interpretation of the algebraic distance for general cone fiting is discused bby Booketein [3] ane Sampeon [17], Rosin [16] studict the effect of various coasteaints for algebraic distane:fiting Kanatani (9} (Chapter 8 and 10) discusses weighted least, squares fing, way of adapting least squares to the general casein which the standard deviation ofthe noe is different for each data point. Fitzgibbon and Fisher [5] report ‘useful, experimemal comparison of several conic-iting algorithms, while a teview ‘of applications of robust statistics in computer vision, du to Meer eal, can be found in[13} Deformable centours were introduced by Kass, Witkin, and Terzopoulosina now classical paper [10]. The algorithm discussed inthis chapter is based on the work of ‘Williams and Shah 20). Further interesting work on snakes, addressing minimization and contour tracking issues, i reported in(1] and (2)chapter's Mote Image Features Much research has been devoted to perceptual grouping and perceptual organization [6,21]; see [18] for «rent survey. The signfcance measures discussed in this chapter are esentialy based onthe work of Lowe [12] who discusses perceptual group. ing at length in [1] rom both the computational and psychophysical viewpoints 5.8 Review Questions 1 5._Line detection canbe implemented as template marching, which consists in ‘tering images with masks responding to lines in different orientations, then applying threshold to select line pixels. Compare template matching with the “Hough transform as methods to detect image lines. 1352 What are the disadvantages of template matching as a curve detection technique? 153. Discuss how the availabilty of edge direction information can reduce the search space of the Hough transform forthe case of icles. Generalize for higher- order parameter spaces. 11 5.4. Explain the presence of the secondary peaks around the main two peaks in Figure 52, 15.5. Explain why the (p,0) parametrization for lines (Section 52.1) leads to a better discretization ofthe Hough parameter space thaa the (mn) ane. Compare the accuracies ofthe parameter estimates you can hope to achieve in bath cases, liven thatthe search forthe maxima must take place ina reasonable time, 3-56 Whenusing Hough transform algorithms, searching for maxima in parameter space becomes increasingly onerousas the number of model parametersincrases. ‘Time and space could be saved by sampling the parameter spaces mone coarsely far from significant maxima, Discuss 1357. The finite size ofthe image implies that, on average, the length in pixel ofthe visible portions oflnes close to the image center Cis greater than tht of lines distant from C. How does this bias the Hough transform How could you counter this bias? 1 S8_Explain why the result of EUCL_ ELLIPSE _FIT could be any conic, not just anellipse. 159 Whatis the purpose of the parameters wf, and yin the energy functional ‘hich defines a deformable contour? 1 5.10 Could yousuggest an energy term that keeps. deformable contour faraway from a certain image point? 51 5.11 Isthe square in (5.14) rally necessary? 2 5.1 Modify algorithm HOUGH_LINE to take advantage of information about the edge direction Section 58 Review 119 © 52. Determine the equation ofthe curve identified in the (9,6) plane bya point (9) when searching for lines. (© 53. Write an algorithm, HOUGH CIRCLE, using the Hough transform 10 de- teetcirles © 54. Write the equation of the weighted average suggested in section 52.1 t0 estimate the perameters of line inthe presence of noise. 2. 55. Prove thatthe image of aD circle lying entirely in front of the viewer is an ellipse. What happens to the image i the 3D cree lies on both sides ofthe image plane. (HintTinkforinstance ofthe profile ofthe Earthasseenfroman airplane) 9. 56. Yow are hoking for lines in an image containing 100 edge points (noice that this isa small number for real applications). You deve to apply your iting routines to all aosible groups of points containing at leat 5 points (Brute force fitting) How many groups do you have to search? Assume that the time cost of| fitting afne to. points is a (in some relevant units), and that the time cost of forming m groupsfram 100 points smn?. How many points should agroup contain tomake sroupng more expensive than brte fore fting? © 57. Show thatthe tangent vector toa curve parametrized by its ac length is @ ‘uit vectr. Hit Use the chain ue of differentiation to find the elation between the tangent vedor with respect to an arbitrary parametrization and th ac length Now, looking the definition of arc length...) © 58 Given a curve in Cartesian coordinates fs) = [xs), y()]", withthe are length the curvature kis given by the expression iy =2"y| ‘Show that this definition of curvature reduces tothe expression used in section 54 ifle=[x,y)] and the x are equally spaced © 59 Inthe assumption ofa background o ine segments uniformly distributed in the image with espect to orientation, position, and scale, the density of lines varies with the inverse ofthe square oftheir length Show that the expected number of endpoints of lines of length within a radius of a given endpoint is thus given by me N 2 Projects {© 51. Implement algorithm HOUGH_LINES; thea, write and implement an algo rithm using the Hough transform to detect circles Both algorithms should take advantage of information on edge directions when available, using your imple ‘mentation of rat ast the output assumed from) relevant algorithms in Chapter, Using synthetic images, compare the time needed to detect single line and asin ple citele formed by the same number of points. Corrupt the edge images with spurious edge poinis by adding saturated salt-and pepper noise (Chapter 3), and estimate the ro>ustness of both algorithms by ploting te errorsin the parameter120 Chapter More Image Features References Oo) a 8 co] io] ic) nm wl io] fo) wy 13} 0 04 0s) bg un cstimates as a function ofthe amounts of noise (quantified bythe probability that pine is changed), 52 Implemeat algorithm SNAKE. Using synthetic images, study'the difference between the deformable contours obtained for different choices ofa, f, and y, AA. Amini S. Tehrani and TE. Weymouth, Usying Dynamic Programming for Minimia- ing the Energy of Active Contours inthe Presence of Hard Consens Proc 2™ ner Conf on Computer Vision, Tampa Bay (FL), pp. 95-9 (1588). ‘A. Blaks,R.Curaen, and A, Ziserman, A Framework fr SptioFemporal Contonthe ‘rackngof Visual Contours temainal Journal of Compute Vision Vl 11, pp. 127-148 983), FEL. Bookstein, iting Conic Sections to Scattered Data, Computer Graphics and Image Processing, NOL8, 5691 (197), ER Davies, Machine Vision Theory Algorithms, Proccalities, Academic Press London 90), A.W. Fzibbon and RB. Fisher, A Buyers Guide to Conic iting, Proc! Bish [Machine Vision Conference, Birminghar, pp S352 (1995) B, Flnchbaugh and B Chandrasekaran, A Theory of Spatio-Temporal Agaregation for Vision, AficialImelignce, Vol 19 (1981). RAM Haraick and L.G. Shapiro, Compuerand Robo Vision, Volume I Addison-Wesley a9), [lingworth nd J.Kitler, A Survey ofthe Hough Transform, Computer Vision, Graphics, and Image Processing Vl. 4, pp 87-116 (1988). K. Kanatani: Geometric Compuaion fr Computer Vision, Oxford University Pres, Ox- fora (1993), M.Kass, A. Witkin, and D.Tezopoulos Snakes: Active Contours Model Pro. Fist Inter Cont Comput Vision, Landon (UK), pp. 259-209 (1987), 1G. Lowe: Perceptual Organization and Visual Recognition, Kluwer, Boston (MA) (195). DG. Lowe, Objet Recognition rom Single 2D Images, Artifical Inligence Vol. 31, p-385-385 (1957. P Meer PL Mint, A. Reweneld and IY. Kim: Rast Regn Method for Comper Vision a Review, aeration Journal of Computer Visio, VOL. 6, p. 59-70 (181) M Plu, A.W. Fiugibbon and RB Fisher, Elipse Speci Least Square Fiting IEEE In Conf on image Processing Lausanne (199). WH. Press SA. Teulosky, WT. Vetting and BP Fannery, Numerical Reips in, second etiton, Cambridge University Press (1992), PL. Rosi, A Note on he Least Square Fiting of Elipses,Panem Recognition Lene, VoL. 14, pp. 79-808 (199). ‘PD. Sampson Fiting Cone Sections 1 “Very Scattered” Data:an Iterative Refinement of the Bookstein Algoritim, Computer Graphics and Image Pocesing, Vol. 18, pp. 97-108 (19%), (is) tia) fen) 2) References 121 S Sarkar and KL Boyer, Perceptual Orasizaton in Computer Vision: @ Review anda Propostl fora Casieatory Strvture, IEEE Transactions on Stems, Man, and Cyber resis Vol. 23, 382-39 (1993), 6. Strang Linear Algebra and ls Applications, Harcourt Brace, Orlando (FL (1988). DJ. Wiliams anéM. Shab, A Fast Algorithm for Active Contours and Curvature Estima tion, CVGIP: Image Undertanding, Vo. 8,9. 1-26 (192), ‘A. Witkin andJ. Tenenbaum, On the Roe of Structure in Vision in Human ad Machine Vision, 1 Beck, H Hope and A. Rosenfeld eds, Academic Press (New York), p.a8I-SiS 83Camera Calibration For the ancient Egyptians exactitude was symbolized by feather thal served as a weight on sales used forthe weighing of souls Lalo Calving, Six Memo forthe Nex Milenio ‘This chapter tackles the problem of camera calibration; that i determining the value ofthe extrinsic and intrinsic parareters ofthe camera ‘Chapter Overview Section 6. defies and motivates the problem of camera calibration and its main issues, Seetion 62 discusses a method based on simple geometric properties for estimating the camera ‘parameters, given a numberof corsespondences between scene and image points. Section 6.3 describes an alternative, simpler method, which recovers the projection matrix fis, then computes the camera rarameters as functions ofthe entries ofthe matrix, What You Need to Know to Understand this Chapter ‘+ Working knowledge of geometric camera models (Chapter 2). ‘+ SVD and constrained leit-squares (Appendix, section A.) ction methods (Chapter 5). «+ Famliarity with fine ext 6.1. Introduction We learned in Chapters 4 and Show to identify and locate image features and we are therefore fully equipped to deal withthe important problem of camera calibration; that m3124 chapter§ Camera Calibration is estimating the values ofthe intrinsic and extrinsic parameters ofthe camera mode, Which was intzoduced in Chapter 2 The key idea behind calibration isto write the projection equations linking the ‘known coordinates of ast of -D points and ter projections and solve forthe camera parameters Inonder to ge to know the coordinates of some 3D points camera calibra tion methods rely on one or more images ofa calibration pate: thats, a 3D object of ‘known geometry posibly located in a known position in space and generating image features which canbe located accurately. Figure 6.1 shows atypical calibration pattern, consisting of two planar grids of biack squares on a white background. Itiseasy to know the 3D postion ofthe vertices of each square once the postion ofthe two planes has been measured, and locate the vertices on the images, for instance as intersection of mage lines, thanks Co the high contrast and simple geometry ofthe patter, Problem Statement Given one or more images of acalration pattern, estimate 1. te intrinsic parameters 2 the earns parameters or both © Theaccurayofcalbation depends onthe acura ofthe measurements ofthe calibration pattern; hati is construction tolerances. Tobe on these sd, the cairation pattern ‘Should be bull ith tolerances one or two order af magnitudes smaller than the desired curacy of calibration. For example, ifthe desired aceuracy of ealbation is O.imm, the ‘alton pattern shoud be built wt olranee smaller than Q0Lmm. Although there ae techniques inferring 3-D information about the scene fom smcaliraed cameras some of wich willbe deseribedin the next two chapters etective camera calibration procedures open up the possiblity of using a wide range of existing algorithms for reconstruction and recognition, al lying onthe knowledge of the camera parameters. This chapter scsses two algorithms for camera calbaton. The frst method recovers rely the intnsc and extinsie camera parameters the second metod est rates the projection marx it, withou solving explicit fo the various parameters! which are then computed as closed-form functions ofthe entros of the projection ma trix The choice of which method to adopt depends largely on which algorithms are to be applied next (Section 64), "eal hat he pectin matric inks world sd iage cous, nd enti ar fasion ofthe Insc td enc partes Section 62 Direc Parameter Calibration 125 Figure 6.1 The typical calletin patter used inthis chap. 6.2. Direct Parameter Calibration ‘We start by identifying the parameters to be estimated, and east the problem in geometric terms 6.2.1. Basic Equations ‘Consider aD pont, P, defined byitscoorinates[X*, "inthe word reference frame. As usual in carton, the world reference frame is known. © This means to pok an accesible object dining three mutually orthogonal directions interetng ina common pont. In this chapter, this objec she ealibration pattern: a ‘son systems fr indoor robo navigation, fr insane, tear be acorner of 2 oom. Let [X°,1°,2°7 be the coordinates of P inthe camera reference frame (with 2° > OifPisvsble) Asusual, the origin ofthe camera frameisthe enter of projection, and the Z ais isthe optical axis The poston and orientation of the camera frame is unknown since, ulke the image and world reference frames, the camera frame is inacesile direct. This equivalent wo saying thatthe we donot know the extrinsic parameters hat the 33 rotation matrix R and +D translation vetorT such that26 Chapter Camera Calibration x kl re on. (61) = In components, (6:1) can be written as XonuX" 4 ra¥* + rZ" +, Sraik® tral bra" +7; Zo araX" tral” +132" 4 (62) ‘Note the sgt but important change of notation with respect to Chapter. Ia that chap. ter, the transformation beeen the world and camera reference frames was delned by translation followed by otton. Her, theoreti reversed nd rotation precede tans ‘ion. While the rotation matrix isthe samen bth eases, the rasation vectors (se Question 6.2), Assuming that radia distortions (setion 243) canbe neglected? we can write theimage of[4°.1", 2°] inthe image reference frame as (se (2.14) and (2.20) aan LE ton 3) th Nw to (64) Forsimpliity, and since theres norsk of confusion, we drop the subscript indicating image (pixel) coordinates, and write (x,y for (xi, dia). AS we know from Chepter 2, (63) and (6.4) depend on the five intrinsic parameters f (focal length), 5, and 5, (horizontal and vertical effective pixel size) and o, and o, (coordinates ofthe image center), and, owing to the particular form of (6:3) and (64), the five parameters are not independent, However, if we let fx = fy and a = s/s, we may consider a new Set of four intrinsic parameters, 2. oy fa and a, all independent of one another. The parameter f, is simply the focal length expressed inthe effective horizontal pixel size (the focal length in horizomal pixels), while a, usally called aspect ratio, specifies the pitel deformation induced by the sequisition process defined in Chapter 2 Let us ‘ow summarize all the parameters tobe calibrated ina box (se aso the discussion in sections 749 and 243). Extrinsic Parameters * Rythe 3x3 rotation matic the 3. translation vector sal cose his sumption in Exerc, which supe ow ocala the artes ofthe il ston del of Chapter Section 62 Direct Parameter Calibration 127 Intrinsic Parameters length in effective horizontal isl Sie wits + aaiy/ aspectatio + (01.0) image caer coordinates + trail distortion coeient Plugging (6.2) into (63) and (6.4) gives raX* ra" ns" 4 Ty “rk bral rele oT Xm brs" 47, PX peal @ pra Te 3) -o% 66) Notice that (6) and (66) bypass the inaccessible camera reference frame and ink directly the world conrdinaes [X*,Y¥, 2*]" with the coordinates (x,y) ofthe corte- sponding image point. If we use @ known calibration patter, both vectors are messi able. This suggests tat, vena sufficient numberof points on the calibration pattern, ‘we can ty to solve (5) and (6.6) fr the unknown parameters. Ths isthe idea behind the fist calibration method, which is articulated in two parts 4, assuming the ecordinates ofthe image center are known, estimate al the remain ing parameters 2 find the coordiates ofthe image center 6.2.2. Focal Length, Aspect Ratio, and Extrinsic Parameters We assume that the coordinates of the image center are known, Thus, with no los of seneralty, we cam consider the transated coordinates (x, )= (x ~0,,~0,)-In other ‘words, we assume that the image center iste origin ofthe image reference frame. As we suid in the introduction, the key idea iso exploit the known coordinates ofa sufficient ‘numberof corresponding image and world points. ‘Assumptions and Problem Statement Assuming that the locaton ofthe image contr (oo, iskaow, and that rail distortion can be neglected, estimate a, and T trom image points (1 = 1.-.,pojetions of 8 known word points [32 in the word ference frame The key observation that (65) and (66 ave the same denominator therefore, from each corresponcing pairo points (X,Y, 2,2, »)) wesan write an equation ofthe form BflraX? + mY rast +) = yfeluX! + nad! +maZP+ 1). (62)128 Chapter § Camera Calibration Since w= f/f (62) can be thought of as @ linear equation for the 8 unknowns =: aXPo alt ndP tame — XPS — pla WZ — vty where weran al, \Writing the last equation for the N corresponding pairs leas to the homogeneous systom of N linear equations av=0 68) ‘where the W > 8 matrix A i given by Xp att ade on nah onthe aMY tf Poe XP nlf mdf -n By yw avi an WZ av XY Ih -7 IFW > Tandthe W points are notcoplanar, A hasrank 7, andsystem (68) hasa nontrivial salution (unique upton unknown sae factor), which can be determined from the SVD of 4, A= UDV™, 8 the column of V corresponding tothe only nll singular vale along the diagonal of D (Appendix, section A‘). 1 The teste ofthe mise and the inaccurate localization of image and world points make the rank of 4 key to be maxima (ight), In his ese the solution s the eigenvetor coresponig tothe wall eigenvalue A rigorous proof of the fact that, in th ideal case (noise-ree, perfectly known coordinstos) the rank of Ais? seems oo involved tobe presented here, and we refer yr to the Further Readings section for details Here, we just observe that, ifthe effective rank is larger than 7, system (68) would only have the trivial solution. ‘Our next tsk is to determine the unknown scale factor (and hence the various camera parameters) rom the solution vector v = ¥, If we call the sale factor, we have virnasm Tani aravans, a 69) Since r+, += 1, from the ist three components of # we obtain (610) Section 62 Direct Parameter Calibration 129 Sinilay singe + b= of ¥we ave andr >0, fom the fifth, sixth, and seventh component VERB ra Ree} t+) =alr (an) We an solve (6.10) and (6.1) fr [yas well asthe aspect ratio, We observe that the {frst wo rows of the retation matrix, R, andthe fst two components ofthe translation vector, T,can now be determined, up to an unknown common sgn, rom (6.9). Further more, the third row othe matrix & can he obtained a the vector produto he fi two estimated rows thought of as3-D vectors. Interestingly this implis that the sign of the third row is already fixed, asthe entries of the third sow remain unchanged ifthe signs ofall the entresof the ist two rows are reversed, ‘© Since the computtion ofthe estimated rotation matrix, does not take into agcoun ex ly the orhogsraity constrains, cannot be expected tobe orthogonal (RR™ = 1) In ‘order o enforce othganaltyon Rone can sort tothe ubiquitous SVD decomposition. ‘Assume the SVDof is R= U DV, Sige the thre singular vals ofa 3 x 3 orthogo tal mati are all, we can simply replace D withthe 3 3 iemity mat, s tha he ‘esting mata, CTV" isexaellyonbogonal Gee Appendix, section A. for details) Finally, we detemine the unknowa sign ofthe scale factor y, and finalize the estimates ofthe paraneters To this purpose, we go back to (6), for example, with + instead of x ~o,, and recall that for every point 2°> Oand, therefor, x and rX" + rig¥™ srri2™ + T, mast have opposite sgn. Consequenty its sufcient to check the sign of x(rj:X" + 712° +n3Z" + T,) for one of the poins. If x(rX" tr + rgZ*+T) > 0, 12) the sign ofthe frst to rows of ® and ofthe first two components ofthe estimated translation vector must be reversed. Otherwise no further action is required. A similar argument canbe applied to y and r.X* + rn¥* + ras Z* + Ty in (6.8). Atthispoint, wehave determined te rotation matrix, R,the rst two components ofthe translation vector, T, and the aspect ratio, . We ae left with two, tll undeter- ‘mined parameters: T,the third component of the translation vector. and f. the focal length in horizontal pixel units. Both 7, and f, can be obtained by least squares from a jstem of equations ike (65) or (66), written for WV points To do this, foreach point (3,99) We can write sik? +n ~falraX! + rake trsZP +1), (613) thea solve the overeosstained system of N tinear equations (EJs «aw130 Chapter Camera Calibration inthe two unknowns T; and f, where a uk} +na¥f +rsZf +7) ma (uk! +m¥f +rsZf +7) Aw (iX¥ + may + msZh +7) and tra XY + my +523) matrsXG + aN tr ZQ) ‘The least squares solution (7, js) of system (6.14) is cavay“ian (as) 11 remains to be discussed how can we actually acquire an image of NV points of ‘known world coordinates and locate the N corresponding image points accurately. One possible solution of this problem can be obtained with the pattern shown in Figure 6. ‘The pattern consists of two orthogonal grids of equally spaced Black squares drawn on ‘white, perpendicular planes. We lt the world reference frame be the 3-D reference frame centered in the lower lft corner ofthe left rid and with the aes parallel to the three directions identified by the calibration pattern. Ifthe horizontal and vertical size ofthe surfaces andthe angle between the surfaces are known with high accuracy (ftom construction) then the 3-D coordinates ofthe vertices ofeach ofthe square in the world reference frame canbe easily and accurately determined through simple trigonometry. Finally, the location ofthe vertices onthe image plane can be found by intersecting the edge lines ofthe corresponding square sides. We now summarize the method: ‘Algorithm EXPL_PARS_CAL ‘The input isan image ofthe calibration pattern described inthe text (Figure 6.1) andthe loeaton ofthe image center, 4. Measure the 3 coordinates ofeach vertex ofthe» squares on the calibration pattern in the word relerene frame Let 2 In onde a Sind the coordinates inthe image reference frame ofeach ofthe W vertices + locate the image ines defined by the sides the squares, using procedures EDGE, COMP and HOUGH_LINES of Chapters3 and). + estimate the image coordinates of al the Vortices of he imaged squares by intersecting the Tins found. Section 62 Direct Parameter Calibration 131 4 Having established the W comespondences between image and world points compute the ‘SVD of in (6, Te solution isthe column of V corresponding tothe smallest singular vale of 44. Determine j ard fom (60) and (61). 5. Recover the fsttwo rows of Rand the ist wo components fT from (69). 6 Compute the thd rw of asthe vector prodvet of the fist two rows estimated inthe previous step, ant enforce the orthogonal constraint on the estimate of tout SVD decomposition 7. Pcka point for wich ~o,)snotceably diferent fromO. inequality (6.12) ists, ever the sgn the first two rows of Rand ofthe frst two components of, 8 Set up A and bolster (6.14), and we (6.15) t estimate Ts nd fi he outpuis formed by af, andthe extras parameters ofthe vicwing camera "When using cabration patter ke the on in Figure 6, the 3D squares lic on two di ferent planes Th intersections of ines define by squares fom diferent world panes do ot correspond t any image vertices. You must theretoreense that your implement tion considers ony the intersections of pairs of nes associated to the same plane ofthe calbration pattem * The tne equation in image coordinates are computed by leas squares. using as many collinear edge points as possible: Tis process improves the accuracy ofthe estimates of line parameters ad vertex location onthe image plane. 6.2.3. Estimating the Image Center In what follows we describe a simple procedure forthe computation of the image center. Asa prelininary step, we recall the definition of vanishing points fom projective geometry, and state a simple theorem suggesting how to determine the image center through the orthocen er ofa triangle in the image Lett sy. be pnalltne in3-D space, and the corresponding image lines. Dut tthe perspective projection, ie lnes Ly appear to meat in «point p, called vushing point, dened 1 he common intersect al the mage es Orthoceater Theorem: Image Center from Vanishing Points Let be the triangle onthe image plane defined bythe tre vanishing points of three mutually orthogonal eso paral lines in space. The nage center ithe orthocenter of 7 ‘The proof ofthe theorem is lelt a an exercise (Fxercise 6.2) The important fact Js the theorem reduees the problem of locating the image center fo one of intersecting The rene owing ithe commen tert the thee ies132 Chapter Camera Calibration ‘mage lines, and canbe created easily on a suitable calibration pater. In fact, we can tse the same calibration patera (actually, the same image!) of Figure .1, already used for EXPL. PARS.CAL, so that EXPL_PARS_CAL and the new algorithm, IMAGE. CENTERCAL, it nicely together “Algorithm IMAGE_CENTER_CAL The input isan image of he calration pattern in Figure 6 and the output ofthe st two steps of algorithm EXPL_PARS. CAL 1. Compute the three vanishing points pp and ps, determined bythe thee Bundles of tins obtained instep 2of EXPL. PARS CAL 22. Comput he orhoosater, 0 ofthe trans pp “The output are the image cooeinates ofthe image center, 0. © This esemial hat he calitation pater is imaged from a viewpoint guaranteeing that ‘0 vanishing pointes much fart tan te het from the image center, otherwise ‘he image ines become nearly pall, and small inaccuricies in the location ofthe fines rest in large errs inthe coordinates ofthe vanishing point. This can happen ifone of the three mutaly orthogonal detions i nearly parallel othe image plane station tobe definitely voided, Even witha good viewpoint itisbest to determine the vanishing points wing several lines and las squares += Toimprove the asuray ofthe mage contr estimate, you should run IMAGE. CENTER, CAL with several viens ofthe calitation patter and average the results Experience shows that an accurate location ofthe image center is not erucal for ‘obiaining precise estimates of the other camera parameters (see Further Readings). Be feareful, however, as accurate knowledge ofthe image center i required to determine the ry in space identified by an image point (as we shal se, fr example, in Chapter). 6.3. Camera Parameters from the Projection Matrix ‘We now move on to the description of a second method for camera cal new method consis in two sequential stages 1. estimate the projection mari linking world and image coordinates; 2 compute the camera parameters as closed-form functions of the entries of the projection matrix, 6.3.1. Estimation ofthe Projection Matrix [As we have seen in Chapter 2, the relation between the 3-D coordinates (X,Y ‘a point in space and the 2-D coordinates (x,y ofits projection on the image plane Sesion 63 Camera Parameters from the Projection Matrix 133, can be written by means of a3 > 4 projection matrix, M, according tothe equation ()(F uy muX! + mae + mis2 + m4 with 8 mkt +m 6) 2 mak bm em Te mati died pt an tity ee aor nd shoe cl independent xin viene deemed troup homogencos er oem formed by writing (6.16) for at least 6 world-image point matches However, through the ‘sc ofcourse one Fg 6, many mov corespadeee suntan tube an cn bested ire en sree ‘Case wetc vn mach be moyen nearer ee Am=0, (ox) with Mh tak ky mth 0) or behind (F. <0) the camera, We do not give the usual algorithm, ‘which would merely restate the equations derived in ths section.136 Chapter 6 Camera Calibration 6.4 Concluding Remarks Isthereany difference between the two calibration methods presented? Obviously, you should expect the same (in practic, very similar) parameter values from both! Method 2s probably simpler, but the algorithm on which we based EXPL. PARS CAL (see Further Readings) isa well-known technique inthe computer vision community, and has been implemented and used by many groups. PROJ_MAT_CALIB isuseful whenever the projection matrixissufficienttosolvea vision problem, andthereis noneed to make the individval parameters explicit; an example i referenced in the Further Readings. ‘We sd that the precision of calibration depends on how accurately the world and image reference points are located. But which accuracy should one pursue? AS the terrors the parameter estimates propagate to the results the application, the answer ‘depends ultimately on the accuracy requirements ofthe target application. For instance, inspection systems in manufacturing often require submillimeter accuracies, whereas certors of sentimeters can be acceptable in vision systems for outdoors navigation, Locating image reference points of lines at pixel precision is generally unsatisfactory in the former ease, but acceptable inthe latter. A practical guideline is: the effort going into improving calibration accuracy shouldbe commensurate othe requirements of the ‘aplication 65. Summary ‘After working through this chapter you shouldbe abe to explain what calibration is and why itis necessary 2 calibrate the intrinsic and extinsc parameters of an intensity camera estimate the entries ofthe projection matrix design a calibration pattern, motivating your design choices 6.6 Further Readings [EXPL_PARS_CAL has been adapted from wellknown (but rather involved) paper by Tsai 10}, which contains a proof that dhe rank of the matrix A of EXPL_PARS_ CALIS7 in the Weal ase. The orthocenter property and the calibration of the image center through vanishing poinsis due to Capile and Torre [2], who also suggest a neat ‘method for calibrating the rotation matrix. Direct calibration ofthe projection matrix is described, for example, by Faugeras and Toscan [5]. The explicit computation ofthe calibration parameters from the projection matrix is taken from Faugera’s book [4]. ‘which also discusses the conitions under which thsi posible (Chapter 3, Section 4) Ayache and Lastman [1] desribe a stereo system which requires calibration of the projection matrix, but not explicit knowledge ofthe camera parameters The influence ‘on vision algorithms of erroneous estimates of the image center and other camera parameters is discused by [3 7,6]. Thacker and Mayhew [9] describe a method for calibrating a stereo system from arbitrary stereo images. Section 67 Review 137 Recent developments on calibration are presented, for instance, by Faugeras and Maybank [8] who introduce an elegant method based solely on point matches and able {obtain camera calbration without the use of calibration patterns 67 Review Questions 9 6.1 Describe tie purpose of calibration and name applications in which calbration is necessar {3 62 Whatis the relation between the translation vectors between two reference frames ifthe uansformation between the frames is (a) first rotation and then translation and (b) vice versa? Verify your answer in the trivial case in which roation is described by the identity mati, 3 63 Why does gorithm EXPL. PARS_CAL determine only f, and not f,? 64 Discuss the construction requirements of a calibration pattern 65, How would you estimate the accuracy ofa calibration program? Exercises (© 61 In the case of large fields of view, or if pixels very far from the center of the image are considered, the radial distortion cannot be neglected. A possible procedure for calibrating the distortion parameter k ito rewrite (6.13) with xu(1 + hr} in place of x with r? =x? +a°y?,0r CLEA TV0XE tral + nnd + T= — flr XF te naYl tnsZf + Ta. ‘The corresponding nonlinear system of equations fo the unknowns f,T, and ‘k can be solved through gradient descent techniques using the output of EXPL_ PARS_CAL as inital guess for f, and T. and 0 as intial ues for ky. Write oat ‘the complete equations suggested, and devise a calibration algorithm ineluding y inthe intrinsic parameters. 62 Prove the arthocenter theorem by geometrical arguments. (Hints Leth be the altitude from the vertex (and vanishing point)» to the side s, and O the projection center. Since both the segments hand 0 ate orthogonal (os, the plane through f and u0 is orthogonal tos and hence to the image plane...) {631 Prove thatthe vanishing points associated to three coplanar bundles of par: allel lines are ellnea, (64. timate tke theoretical error ofthe coordinates of the principal point as & function of te errr (uncertainty) ofthe coordinates ofthe vanishing points. Can. ‘you gues the viewpoint that minimizes erro propagation? Stn by nec Robins138 Chapter6 Camera Calibration © 65. Explain why errors inthe location ofthe image center do not affect greatly the estimates ofthe remaining parameters. (Hine Check out (6:18), (6.19) and (620)) © 66 The direct calibration procedure given in Chapter 2 fora range sensor establishes a direct mapping between pixel and word coordinates. Sketch a calibration/acqusition procedure forthe same sensor, but based on the estima- tioa/knowledge of the camera parameters. Projects © 6. Inplemeotthefirst(EXPL_PARS_CAL and IMAGE_CENTER_CAL)and second (based on PROJ_MAT-CALIB) calibration algorithm in this chapter. ‘Compate the sensiivity ofthe estimates ofthe iurinsic parameters obtained by the two methods, by eepeating the calibration 20 times for different views of the ‘calibration pattern and without moving the camera (© 62 Compare the sensitivity ofthe estimates ofthe extrinsic parameters obtained by the two methods, by repeating the procedure above without moving the setup. References u I. Ayache and F, Lustman, Tiiocular Stereo Vision for Robotics IEEE Transactions on ‘Pater Analyse and Machine Inaligence Vol 13, pp. 73-85 (191), [2] B.Caprile and V. Torre, Using Vanishing Points for Camera Calibration, International ural of Computer Vision Vol 3, pp 127-140 (190). [5] K Danis and H.-H. Nagel, Amaya Results onEtrorSensvty of Motion Estimation ‘rom Two Views mage ar Vision Computing Vo 8 pp. 297-20 (1950, [5] OD. Faugeras, Three-Dimensional Compuer Vision: A Geometric Viewpoint, MIT Pres, Cambridge (MA) (193), [5] OD Faugeras and G Toscan, Th Catration Problem for Sete. in Proc IEEE Confer ‘ence on Compater Vision and Patern Recognition 86, Miami Beach (FL),pp.1520( 1980), (6) B KamgarParsi and B, Kamgar-Pari, Evaluation of Quantization Eros in Computer Vision, IEEE Transaction on Patern Anclysis and Machine Inelignce, Vol PAMI-ll, N09, pp. §29-939 (1988), PRK. Lenz and R ¥ Tos Teohniqus for Calibration of the Scale Factor and Image Center for Fgh Accuracy }D Machine sion Metrology, [EEE Transaction on Pater Analysis ‘nd Machine Intelgence, Vl. MIO, No 1, pp. 713-720 (198) [8] § Maybank and OD. Faugeras, A Thoory of Slf-Calibratin ofa Moving Camera, Ir rational Journal of Computer Vision Vol 8p. 123-181 (1982). 1A. Thacker and LEW. Mayhew, Optimal Combination of Stereo Camera Calibration from Arbitrary Stereo Images, Image and Vision Compuaing Vo 9, pp. 27-32 (1991), U0] RX. Tosi, A Verse Camera Calibration Technique for High Accuracy 3D Machine ‘Vision Metrology Using Off-the-Shelf TV Cameras ad Lenses, IEEE Journal of Robotics and Automation Vol RA:3, No 4 pp. 323-34 (1987) 5 Stereopsis ‘Two are beter than one; boca they have a good reward for thei labor. Ecclesiae 49 ‘This chapter isan introductiontostereopsis, or simply stereo, in computer vision, Chapter Overview Seaion 71 san fomsnoiusion isd sero in wo bproblenscomtepondence and reconsston nd aaj simples em Secon 72 eas wit the prob of essing corespondensesbteen nog teens ot tate Secon 73s deste tte gomety of seen the ep geoer recovered and used. : Sesion 7 dace the mtd forte eosin ofthe rc fase cach assuming a different amount of knowledge on the intrinsic and extrinsic parameters of the and how it can be What You Need to Know to Understand this Chapter + Working knowledge of Chapters 2, 45,6 ‘+ ‘Singular value decomposition and constrained optimization (Appendix, section A.) ‘+ Basic notions of projective geometry (Append, section A.A), 139140 Chapter 7 stereopsis 7.1. Introduction Stereo vision referstothe ability to infer information onthe 3-D structure and distance of ‘ascene from ovo or more images taken from different viewpoints. You can learn a great {deal of the basic principles (and problems) ofa stereo system through a simple exper- mont. Hold one thumb at arm's length and close the right and left eye alternatively ‘What do you expect to see? With presumably litle surprise you find that the relative position of thumb and background appears to change, depending on which eye is open {or closed). Its precisely tis difference in retinal location thats used by the brain to reconstruct a3-D representation of what we see. TAA The Two Problems of Stereo From a computational standpoint, treo system must solve two problems. The fist own as correspondence consists in determining which tem inthe lf eye corresponds to which em in he righ ee Figure 7.1). rather subtle dfclty hee is that some parts of the scene ae vibe by one eye onl. nthe thumb experiment, fr example, which part ofthe background is occluded by your thumb depends on which eye's open. “Therefore, aster system must alo be able to determine the image parts that should not be matched “The second problem tht sere system must solve i reconstruction. Our vivid :D perception ofthe world is de othe interpretation that the rai ives ofthe com pated difference in retinal postion, sumed dispai, between corresponding items? "The disparities of al the image points form the socalled disparity map, which canbe displayed as an mage. the geometry ofthe stereo sytem is known, the dspaiy map an be converted to a 4D map ofthe viewed scene (the reconstruction). Figure 72 Shows an example of stereo reconstruction with a human fac. Figure 7. illustrates an application of computational streopss in space research: reconstructing the elie of ihe surface of Vents rom two satelite (SAR) images The images were recorded by the Magellan spacecraft, and cover an area of approximately 120 x 40 km; each pixel corresponds o 75 Definitions: Stereo Correspondence and Reconstruction ‘The corspondence problem: Which prs the lean ight images ae projections ofthe same scene clement? ‘The reconsiucion problem: Given a numberof corresponding parts ofthe ef nd sight image, «and pens information onthe geometry ofthe sere stom, what can we sty about the 3D Tecation and structure ofthe observed objets? “Af yon think isi bons yt expan why, wt both eyes oper, you los inary se only one th, wllscprse in dopa Gom the ackgsons sis bet dees the now popu auotregrons in which he ecto of ephsindued Lynette then pay So be Further Resigs ov ore on aosteogam Section 7.1 Introduction 141 Figure 7.1 Anillusteaion ofthe correspondence problem. A matching between coresponding Pints ofan image per estas (aly some correspondences are shown), © Figure 7.2 (a) On ims roms reo pir of Emam Tacs face. (8) -D rendering ‘stereo reconstruction. Courtesy of the Turing Institute, Glasgow (UK). =chapter? Stereopsis rigre 73. Ste resnsevcton of the sure of Ven rom apa of SAR images (ack tie ayenion swing these seu planet) The ie se of Vens acquit by the Magn satelite ()D rein othe se ara Coury a Ass Gl, ee or Cpe raps ad Vn, ‘Teshnial University of Graz Section7.1 Introduction 143 P : oo Pe eo = . . o, ro © Figure 74 A dimple stereo stem, 3.D reeosruction depends on the solution ofthe correspondence problem (a; dep is estimated from the ‘span ofeoresponing pins () 741.2 A Simple Stereo System Before starting our investigation of the corespondence and reconstruction problems with he necessary mathematical machinery itis wsefulto eam s mach as we ean fom the very simple model ilustrated in Figure 7.4(a. The diagram shows the lop view of a stereo system compysed of vo pinhole eameras, The et and ight image planes are coplanar, and represated bythe segments and I, respectively, 0, and Oy are the centers of projection. The optical axes arc parallel; for this reason, the fixation point, defined the point oinersection ofthe optical axes is intl fr rom the cameras. ‘The way in whith stereo determines the poston in space of P and Q (Fig ue 74a) is iangulton, that i by intersecting the rays defined bythe centers ot projetion andthe images of P and Q pra angulation depends cracally on the suuton ofthe omespadece protien I (pp) ah Gy) ate sen ts als of corresponding image pint intersecting the ray Orn ~ Ou and Oi ~ Oe eas to nerpretng the image pons as projections of P and Q; buf (p14) aid a.) are the selected pairs of corresponding points, triangulation returns P! and 0 Note that both interpretation although dramatically diferent, sand nan equal footing ‘nce we acepl the respective correspondences. We wll ave more tose about the ortespondenve problem and is sotions ia Section 72. Le us now assue that the eorespondence problem has ben solved, and turn to reconstruction Itsintructve to wrt the equation underying the ianglaion of Figure 7, We cncenirate onthe recovery ofthe poston of singe pont, P, rom tsua Chapter 7 Stereopsis projections and p Figure74(b). The estane, 7, betwen the centersf projection 1, and 0,,iscale the baseline ofthe stereo system. Let ands, be the coordinates ‘of pand wih pect to the principal pons yan cy, the common foal eat and Z the distance between P and the baseline. From the similar anges, P,P) and (01 P.0,) we have Ten-n_T Dates a) Solving (71) for 2 we obtain a) where d =x, — xy, the disparity, measures the difference in retinal positon between the corresponding points in the two images From (7.2) we see that depth i inversely proportional to disparity. You can verily this by looking at moving objects ouside: distant objects seem to move more slowly than close ones. 7.13. The Parameters ofa Stereo System ‘As shown by (72), in our simple example depth depends on the focal length, f, and the stereo baseline, 7; the coordinates x and tare referred tothe principal points, e and cy. The quantities f,7,1,¢- ate the parameters ofthe stereo system, and finding their values isthe stereo calibration problem. There are two kinds of parameters to be calibrated ina general stereo system, ‘The Parameters of a Stereo System The intrinsic parameters characterie the transformation mapping an image point rom camera to pitl coordinates in each eamera ‘The eine parameters describe the eave peston and orientation of he two cameras. ‘The intrinsic parameters are the ones introduced in Chapter 2; a minimal set for each camera includes the coordinates ofthe principal point and the focal lengths in pixel. The extrinsic parameters instead, are slightly different: they deseribe the rigid transformation (rotation and translation) that brings the reference frames of the two cameras onto each other: ‘Since in many cases the intrinsic parameters, or the extrinsic parameters or both, are unknown, reconstruction i often acallbration problem, Rather suprisingly, you will learn in Section 74 that a stereo system can compute a great deal of }-D information without any prior knowledge ofthe stereo parameters (uncalibrated stereo. In order Bn Chap these exisieparumetr were to dene the Digit mdon hat igs theca nth word fre tue ota cach teres, th referents fae of oe cara tke sthe wld ‘eference fame, Section 72 The Correspondence Problem 145 to deal properly with reconstruction, we need to spend some time on the geometry ‘of stereo the so-called epipolar geometry (Section 7.3). Asa byproduct, the epipolar ‘geometry will prove wsefl to get better understanding ofthe computational problems ‘of stereo, and fo deve more efficient (and effective) correspondence algorithms. © Belore starting cur investigation of stereo, a word of warhing about the validity ofthe encusons that an be dean fom the simple terco mode (7.2) cefering to Figure 7 ‘They illseate wll the main issues of stereo, but are to simple to tl the entire story. In patcular, (72) may lad you to conlude thatthe disparity can only decease a he stance ofthe objet rom the cameras increases; nstea, in typical stereo system with ‘converpingecamerast disparity actully increases with the distance ofthe objects fron the {zation pont. Cary the reasoa why You canao ner his property frm our examples ‘thatthe tration point satiny 7.2 The Correspondence Problem Let us first discuss the corespondence problem jgnoring quantitative knowledge ofthe cameras’ parameters, 724 Basics ‘We will start off witk the common assumptions underlying most methods for finding correspondences in image pais. ‘Assumptions | Most cone point ae visible from bth viewpoints. 2 Corresponding inage regions are sini. ‘These assumptins hold for stereo systems in which the distance ofthe fixation point from th cameras is much larger than the baseline. In general, however, both assumptions may be fils and the correspondence problem becomes considerably more dict. For the ime being, we take the validity of these assumptions fr granted and ‘ew the correspondence problem asasearch problem: givenanelementin the lftimage, ve search fr the cormsponding element inherit mage. This involves two dezsons * which image element to match, and ‘which similarity measure to adopt. ‘= We postpone the scusion ofthe problem arising from the fact that not all the elements of one image have necessarily a corespondng element inthe other image ‘Myaig he otal ae etenct in atin pit a fit ditance from te cameras,6 Chapter? Stereopsis Figure 7.5 Aa llscaton of eorelation-basedeorespondence. We look for he right iage Point coresponding to the central pixel ofthe left-mage window. Ths window is correlated to several windows of the sae size in the ight image (oly afew are shown ere). he center of the right image window producing th highest correlation isthe corresponding point sought For the sake of convenience, we classify eorrespondence algorithms in two classes, correlation-based and feature-based methods, and discuss them separately. Although almost indistinguishable from a conceptual point of view, the two classes lead to quite different implementations: fr instance, coreltion-based methods apply tothe totality ofimage points; feature-based methods attempt to establish a correspondence between sparse sets of image features. 7.22 Correlation-Based Methods Incorrelation-based methods, the elements to match are image windows of fixed size, and the similarity criterion is a measure of the correlation between windows in the two images. The corresponding element is given by the window that maximizes the similarity criterion within a search region (Figure 7.5) As usual, we give a summary of the algorithm. ‘Algorithms CORR_MATCHING The inpotisasterco pir ofimages, (let) andl (ight Letprandp, be pitelsintheletandrightimags 2 +1 the wid (in pixel) ofthe correlation indo, Rp) the Search region nthe right image ssoctd with pan ») funtion of two piel values ux oreach pine y= [i,j othe lef image: 1. foreach displacement Ia. Rip) compute Section 7.2. The Correspondence Problem 187 LY week Di thd p+ t dy 3) 2 the ipaityofpristhe vector d= dl" that maximizes 8) over Rip a remax ((@] The eutputis an ares of esparits (the dipariy map, one pe each piel of ‘Two widely adopted choices forthe function y= Wu, v)in(7.3) are us) (a) ‘hich yields the erosscorrelation between the window in the left image and the search region inthe right image, and vu =-W—wF, 5) hich performs the 0 called SSD (sum of squared diferencs) or block matching. ‘= The elation etneencos-coreaton and block matching becomes apparent expanding (75) (Exercie 73), Notice thatthe two choices of yin (7.4) and (75) lead to the same set of corre spondences ifthe energy ofthe right image inside each window, defined asthe sum of the squares ofthe intsty values inthe window, is constant across the search region. [In many practical situations, thisis unlikely to happen, and (75) i usually preferable to (7-4) The teason is tat SSD, unlike cross- ePPOLAR ore : tine Nt Ne 5 ‘ EPIPOLAR PLANE Figure7.6. ‘The epipolar geometry ‘= Note that point vectors denoted by the same bold capital eter but by diferent subscripts like P and P ident the same point in space. The subscript! or tls you he reference {ame in which we vectors are expese (elt sgh) Instead, point vectors denoted by ‘he san old salletr but bya diferent subscript, ike pan p,xidenty diferent pois insgce (ie, belonging to diferent image plans). In this ease, the subscript els you alo thea leon ce erste Tish iba er tie ase 73.2. Basics "The reference frames of the lft and right cameras ae related via the extrinsic param: eters. These define arigi transformation in 3-D space, defined by a translation vector, T=(0, ~ 0)), and arotation matrix, R. Given a point P in space, the relation between Prand P,istherefor: (PT), 07) ‘The name epipolar geometry is used because the points at which the line through the centers of projection intersects the image planes (Figure 7.6) ae called epipoes. We denote the lft and right epipole by ey and e, respectively. By construction, de lef epipole isthe image of the projection center ofthe ight camera and vce versa ‘© Notie that, i te line through the centers of projection is parallel to one of the image lanes, the coresponding eppole the pint at init ofthat ine The relation between a point in 3D space and ils projections is described by the ‘usual equation of perspective projection in vector form: fi pay, P, 78)182 Chapter 7 Stereopsis and =f p= ze. 9) ‘The practical importance of epipolar geometry stems from the fact that the plane identified by P, 0, and 0, called epipolar plane, intersects each image ina line, called epipolar line (see Figure 7.6). Consider the triplet P, p, and p,. Given pi, P can lie anywhere on the ray from O; through py. But, since the image of this ray in the right ‘image isthe epipolar line through the corresponding point, p,, the correct match must lie ‘on the epipolar line. This important factis known asthe epipolar constraint. Itestablishes ‘a mapping between points in the eft image and lines inthe right image and vice versa © Incidomaly, sine al rays include the projection center by construction this also proves, that all tbe epipolar lines go through the epipole. So, if we determine the mapping between points on, say, the lft image and corresponding epipolar lines on the right image, we can restrict the search for the match of pr along the corresponding epipolar line. The search for correspondences {is thus reduced 10a I-D problem. Alternatively, the same knowledge can be used 10 verify whether or aot a candidate mateh ies on the corresponding epipolar ine. This is usually a most effective procedure to reject fase matches due to occlusions Let us now summarize the main ideas encountered inthis section: Definition: Epipolar Geometry Given a sereo pat of cameras any point in 3D space, P, defines a plane, p, going though andthe centers of projection ofthe two cameras. The plane xp scaled epipolar plane, andthe Tes where xp intersects the image planes conugated cipolr ines. The image in onecamera of the projection center ofthe other is ealled epipoe Properties ofthe Epipoles ‘With the exception ofthe epipole, oly one epipolar line goes through any image point ‘All he epipolar ines of one camera go through the camera's eppol, Definition: Epipolar Constraint Corresponding points mst ie on conjugated epipolar lines. ‘The obvious question at this point is, can we estimate the epipolar geometry? (Or equivalently, how do we determine the mapping between points in one image and cpipolarlinesin the other? This is the next problem we consider. Is solution also makes clear the relevance of epipolar geometry for reconstruction. 7.3.3. The Essential Matrix, E ‘Theequationof the epipolar plane through P can be written asthe coplanarity condition of the vectors Py, T, and Py —T (Figure 76), or Section73 pipolar Geometry 153 DTT xP =o Using (7.7), we obtain RTP) TT «P, 7.40) Recalling that a vector product can be written as a multiplication by a rank-defcient mati, we can write where at) Using this fat, (710) becomes 2) with E 3) Note tha, by constriction, has always rank 2. The matrix H is called the essential ‘aris and establishes natura lnk berween the epipolar constraint and the extrinsic parameters ofthe stereo system. You wll earn how to recover the extrinsic parameters from the essential matrix inthe next section Inthe meantime, observe tha, using (78) and (79), and dividing by 2,2), (712) can be ewritten as Py Ep|=0. ay As already mentioned, the image points p and p,, which lie on the left and right image planes respectively, ean be repasded as points inthe projective plane P* defined by the left and rightimage planes respectively (Appendix, section A.4). Consequently, you are entitled to think of Ep, in (7.14) as the projective line in the right plane, w, that goes through p, andthe esipole os Pe (as) [As shown by (714) and (7.15), the essential mari isthe mapping between points and epipolar lines we wer: looking for. = Notice that the whole scusion used coordinates inthe camera reference fame, but what we atl ease from images are piel coordinates. Therefore, inorder tobe able to rake proftable use of the essential matrix, we eed to know the transformation from ‘camera coors to psel coordinates, tat the intnsie parameters. This imitation iscomoved in he nest socton but a pice.Chapter 7 Stereopsis 7.3.4 The Fundamental Matrix, F ‘We now show thatthe mapping between points and epipolar lines can be obtained from corresponding pints only, with no prior information on the stereo system. Let Mj and M, be the matrices ofthe intrinsic parameters (Chapter 2) of the lett and right camera respectively. If py and are the points in piel coordinates corresponding top and p, in camera coordinates, we have p= My pr (7.16) and = My by (any ‘By substituting (7.16) and (7.17) into (7.14), we have BF, as) where PM TeMs! uy) F isnamed fundamental mati. Te essential and fundamental matrix, as wells (7.14) ‘and (7.18), are formally very similar. As with Ep, in (7-14), F'n (7.18) canbe thought ‘ofasthe equation of the projective epipolar line that correspond tothe point, or = Fb 2) ‘The most important difference between (715) and (720), and between the essential and fundamental matrices, is that the fundamental marx is defined in terms of pixel coordinates, the essential matrix in terms of camera coordinates. Consequently. youcan estimate the fundamental matrix from a number of pont matches in pixel coordinates, ‘you can reconstruct the epipolar geometry with no information at all onthe ntnsic oF ‘extrinsic parameters ‘= Thisindicates thatthe epipolar onsraint asthe mapping between points and conrespond ing epipolar lies canbe established with o prior knowledge ofthe stereo parameters ‘The definitions and basic mathematical properties of these twoimportant matrices are worth a summary Definition: Essential and Fundamental Matrices Toreech pair ofcorespondng points pandpincamera coordinates the essential marx satisies the equation PER =O. Toreach par of corresponding pont randy in piel coordinates, the fundamental mars satsis the equation oT rh=0. Section73Epipolar Geometry 155 Properties Both matrices ena full econstrcton ofthe epipolar geometry ICM, and M, ate te mati ofthe incase parameters, the relation between the essential and fundamental mates given by ‘The eset marie 1. encodes information onthe extinsie parameters only (se (7.13) 2 haseank 2, since in (7-11) haseank and fll rank 2. its two nonzcro singular values are equal ‘The fundamental marx: 1 encodes informston on both the intrinsic and extrinsic parameters 2 has ank 2,soce and 7 have fll rank and has ank 2 7.3.5 Computing £ and F: The Eight-point Algorithm ow do we compute the essential and fundamental matrices? Of the various methods possible, the eight-point algorithm iby far the simplest and definitely the one you can not ignore (ifyou ar curious about other techniques, ook ino the Further Reading). ‘We consider here the fundamental matrix only, and leave it to you to work out the straightforward modification needed to recover the essential matrix, ‘The idea behind the eight-point algorithm is very simple. Assume that you have ‘been able to establish n point correspondences between the images. Each eorrespon- {dence gives you a homogeneous linear equation ike (718) for the nine entries of F; these equations fonn a homogeneous linear system. If you have at last eight correspondences (ie, n > 8) and the n points do not form degenerate configurations® the rine entries of F cas be determined asthe nontrivial solution ofthe system. Since the system is homogeneous, the solution i unique up toa signed scaling factor. fone uses ‘more than eight posts, so thatthe system is overdetermined, the solution ean once again be obtained by means of SVD related techniques. If Ais the system's matrix and 4—UDV", the solation isthe column of ¥ corresponding to the only null singular value of A (See Appentis, section A.6), Because of noi, numerical errors and inaccurate correspondences, Ais more likely to be fullrank, and the olution isthe column of V associate withthe Test singular value of A = The estimated fundamental matrix is almost certsaly noosingular. We can enforce the ssngulaiy consi by adjusting he ets othe estimated matcn Fas Jone in Chapter6 "ora torugh dacnson o the dgenrte cnfiguaions of ht or more pois wll of he ints inf etn othe seal ana funda matics the Free Renings.|136 Chapter? Stereopsis forrotaionmatrzes: we compute the singular value decomposition ofthe catimated tix, ‘DV, and se the smallest singular valve onthe diagonal f the matrix D equal © 0.16 isthe corrected D mtr, the orrected estimate, Fis given by F'= UD" (ee Appendix, section A). ‘The following isthe asic structure ofthe eight- point algorithm: Algorithm EIGHT_POINT ‘Th input is formed byw point correspondences with n> 8. 1 Construct system (7.18) fom x correspondences Let be then 9 marx ofthe eoet- cients ofthe system and A= UDV" the SVD o A 2 Theentres of F(aptoan unknown, signed scale factors the components othe column of V corresponding tothe least Singular value of 4 ‘& Toenfore the singularity constant compute the singular valve decompenition of F: F=vpv". 4 Sethe smalessingularalueinthe digoaalotD equalto et D'bethe corrected marx The coreced estimate of F, is ally given by Feupv" The output isthe estimate ofthe fundamental mates, © Inowderto avoid numerical asses the eightpointalgoritim shouldbe implemented wi care, The most important action to take i to normalize the coordinates ofthe cor responding point otha the envies of & ae of comparable size. Typically, the Hist two ‘ooedinats (in pixc) ofan ig point ae refered tothe top lt comer of the image, tnd can vary between afew pixels toa few hundreds the llerenes can make A seriously itteonitioned (Appendix section A.6). To make things worse, the third (nomogeneous) ‘ooeinate of nage points is usualy 10 one. A simple procedure to avoid numerical instability ito translate the first two coordinates of cach point tothe centroid of ach data st, and scale the norm of each points hat he average norm over the datasets. This ‘ean be accomplished by muting each let (ght) point by two suitable 3 3 matrices, Hi and H, (ge Excreue 16 lor details on how to compute both Hand H,). The alge ‘thm EIGHT. POINT i then used o estimate the mari F = HF, and F obiaized as aptea 7.3.6 Locating the Epipoles from E and F ‘We can now establish the relation between the epipoles and the two matrices F and F- Consider for example the fundamental mattis, F. Since &; lies on all the epipolar lines ofthe lft image, we can rewrite (7.18) as Bre Section73 Epipolar Geometry 157 for every py. But since F isnot identically zero, this is possible if and only if Fe on) From (7.21) and the fact that F has rank it follows that the epipol, isthe null space of F: similarly, 6, i the null space of F”. ‘We are now in position to present an algorithm for finding the epipoles. Accurate epipote localization & helpful for refining the location of corresponding epipolar lines, checking the geometric consistency ofthe entre construction, simplifying the stereo ‘geometry and recovering 3D structure in the case of uncalibrated stereo, ‘Again we present the algorithm in the cas of the fundamental matrix. The adap tation tothe case ofthe essential matrix is even simpler than before. The algorithm follows easily from (7.21): To determine the location ofthe epipoles, itis sufficient to find the nul spaces cf F and F These canbe detemined for instance from the singular value decomposition P = UDV™ and F=V DU" as column of Vand U respectively corresponding tothe nl singular value inthe diagonal matrix D ‘Algorithm EPIPOLES LOCATION ‘Tre input isthe fundamental mates F 1, Find the SVD ofthat is, = UV". 2 The epipolee isthe column of V corresponding tothe nl singular value. ‘3. The epipole isthe columa of V corresponding othe aul singular vale ‘Tae output ar the pipes, and, ‘Notice hat we cn safely assume that hors exactly oe singular value equa to because algorithm EIGHT POINT enforces the singularity constraint exp thas to be noticed that there are alternative methods to locate the epipoles, not ‘based on the fundamental matrix and requiring as few a 6 point correspondences. More about them in the Further Readings. 73:7 Rectification ‘Before moving on tothe problem of 3D reconstruction, we want to address the issue of rectification. Given pair of stereo images, rectification determines transformation (or warping) of each image such that pairs of conjugate epipolar line become collinear and parallel to one ofthe inage axes, usually the horizontal one. Figure7.7 shows an example, The importance of rectification is that the correspondence problem, which involves - Dsearch in general, reduced ta J-D search on ascantine identified rivially. In other ‘words to find the point corresponding to (i,j) of the lft image, we just look along the scanlne j= jin the right image158 Chapter? Stereopsis © Figure 77. (a) A stereo pair (b) The pair retied. The left images pot the epipolar ine ‘omesponng tthe points mike in the ight pictues Stereo pa courtesy of INRIA (France). Let us begin by stating the problem and our assumptions Given a treo pata images the intinse parameters of each ames, and she extrinsic param ‘ers ofthe system, & and T, compute the mage transformation that makes conjugated epipolar lines collinear and pail tothe horizontal image ax. "The assumption of knowing the intrinsic and extrinsic parameters ot strcly necessary (see Further Readings) but leads to a very simple technique. How do we Section? Epipolar Geometry 159 > ° Figure 7.8. Recitation of stereo pat: The epipoa ines estocited toa -D point Pinthe original eancrs (bac ines) become colincar inthe rected camera liht sey) Notice thatthe orginal cameras cab bein any positon, and the opteal axes may 20 about computing the rectifying image transformation? The rectified images can be thought of as acquired by a new stereo rig, obtained by rotating the original cameras around their optical centers. Tiss illustrated in Figure 78, which shows also how the points of the rectiiedimages are determined fom the point of the original images and their corresponding projection rays proceed to describe a rectification algorithm assuming, without losing gener ality that in both cameras 1 the origin ofthe image reference frame is the principal point 2 the focal engthis equal o f160 Chapter 7 Stereopsis The algorithm consists of four steps: + Rotate the lft camera so thatthe epipole goes oii along the horizontal axis, ‘Apply the same rotation tothe right camera to recover the orginal geometry. + Rotate the right camera by R. ‘Adjust the scale in both camera reference frames. To carry out this method, we construct triple of mutually orthogonal unit vector e, ez, and e. Since the problem is underconsraned, we are going to make an arbitrary choice. The first vector, e, isgiven by th epipole; since the image centers inthe origin, coincides with the direction of translation, or T a= ay The only constraint we have on the second vector, eis that it must be orthogonal to To this purpose, we compute and normalize the cross product of ey wth the direction vector ofthe optical axis to obtain 1 yaen Tho third unit vector is unambiguously determined as [-y.n.0)" eer xe Tis easy to check that the orthogonal matrix defined as om) rotates the left camera about the projection center in such away that the epipolar lines ‘become parallel tothe horizontal axis This implements the fist step of the algorithm. Since the remaining steps are straightforward, we proceed to give the customary slgo- sithm: ‘Algorithm RECTIFICATION ‘The aputs formed by he asasc and extrinsic parameters of astereo stem anda se of points ineachcamerato be rected (which oul be the whole images). In additon, Assumptions 1 and shove old 1 Bld the matrix Rr 8810 (722), 2 SOUR = Rog and = RR Section 7.4 3.0 Reconstruction 161 ‘3 For ach eft-camera point, p=; fT compute Reale andthe coodins of he corresponding rected poi abo 4. Repeat the previous step forthe right camera wing Rand ‘The ouput hepa of transformations to be applica to the two camer in oder to rectify the two input point se aswell as Notice thatthe rectifed coordinates are in general not integer. Therefor, if you ‘want to obtain integer coordinstes (for instance if you ate rectifying the whole images), you should implemeat RECTIFICATION backwards, that i, starting from the new image plane and applying the inverse transformations, so tha the pixel values inthe new image plane canbe computed as bilinear interpolation of the pel values in the ‘ld image plane. ‘= A rected imageis notin general consid inthe same region ofthe image plane athe ‘original image. You may have to alter the fol lenghs ofthe rece camera to kesp all the points withinimages of the same sz asthe orginal We are now full equipped to del with the reconstruction problem of stereo, 7.4 3D Reconstruction We have learned methods for solving the correspondence problem and determining the epipolar geomety from atleast eight point correspondences. At this point, the 3-D reconstruction tht can be obtained depends onthe amount ofa priori knowledge available on the parameters of the stereo system: we ean idemty three cases” First, if both intrinsic and exrnsic parameters are known, you can solve the reconstruction problem unambiguously by triangulation, as detailed in section 71, Second, if oaly the intrinsic parameters are known, you can stl solve the problem and, at the same time, estimate the extrinsic parameters of the system, but only up to an unknown sealing facior. Thied ifthe pixel correspondences are the only information available, and neither the intritsc nor the extrinsic parameters are known, you ean still obtain a reconstruction of tie environment, but only up 10 an unknown, global projective ‘transformation. Heres «visual suimary. ine here ar sever interment ae bus ws concert om tee the for snp,162 Chapter 7 Stereopsis | Priori Knowledge “ED Reconstruction from Two Views Intrinsicand extinde parameters Unambiguous (absolute coordinates) Inns parameters only Upton unknown scaling factor NNoinformation on parameters Uptoan unknown projective transformation of the enironment ‘We now consider these three casesin turn, 7.441 Reconstruction by Triangulation ‘This is the simplest case, I you know both the intrinsic and th extrinsic parameters of your stereo system, reconstruction is straightforward, “Assumptions and Problem Statement ‘Under the assmption that the itis and exinsic parameters are known, compute the -D Tocition of the pens from her projections pr and py [As showin in Figute 7 the point P, projected into the pair of corresponding points py and p, is atthe intersection of the two rays from 0; through pr and from 6, through p respectively. In our assumptions the rays are known and the intersection can be compiited. The problem i since parameters and image locations are known only approximately, the wo rays wll not actually intersect space; thei intersection can only be estimated as the point of minimum distance from both rays This is what we set off todo, Let ap (a €)be the ray. through O; and py-Let T+ bRp, (beR) be the ray, ‘through O- and p, expressed in the left reference frame, Letw be a vector orthogonal to both | and . Our problem reduces to determining the midpoint, P, ofthe segment paral! tow that joins # and r Figure 79) ‘Thisis very simple because the endpoints ofthe segment, say agp and + byR™Pr, can be computed solving the linear system of equations apy — DR" +e(H x RTP =T ) foray, by, and cy, We summarize this simple method below ‘Algorithm TRIANG All vectors ad cordinates ate referred othe left camera reference frame. The inputs formed bya set of corresponding pins; let rand p be a generic pai. Section 74 3.0 Reconstruction 163 etapa ¢B, be the ay J through 0, (a =0) and ps a= 1)-LetT + 5R"p, eB, thera r through 0, (0 =0)andp,( Pps the vector erthogonal both andr and ‘ay -+ev, CR, te line w through ap (or sme ed) and paral! ow 1, Determine the exdpoins ofthe segments, belonging othe ine paalel tw that joins and .ayp and T+ yp, by solving (723). 2% The triangulated point, Ps Ue midpoint ofthe segment The output isthe set reconstructed 3-D pots, "The determinanot the coetcens of system (7.23) isthe tiple product ofp, Rp. and 8, compute tho loeaton of the 3D points rom their projections p and. Unlike triangulation, in which the geometry ofthe stereo system was fully know, the solution eannot rely on sufficient information to locate the 3D points unambign- ‘ously, Intuitively since we do not know the baseline ofthe system, we cannot recover the ‘rue cae of he viewed scene. Consequently, the reconstruction is unique only upto an ‘unknowa sealing factor. This factor ean be determined if we know the distance between ‘two points in the observed scene. ‘The origi of his ambiguity is quite clear inthe method that we now preseat. The first step requires estimation of the essential matrix, £, which can only be known up to an arbitrary eae facto; therefore, we look fora convenient normalization of E. From, ‘the definition of the essential matrix, (7.13), we have. E Talas=s"s, 412 nT, -Tt eve=| “1 +1? -11 (728) “EI, ~RI, T2472 From (725) we have thatthe trace of EE is TH(ETE) =2TP, so that dividing the enties of the essential matrix by rere is equivalent vo normalizing the length of the translation veetor to unit. Notice that by effect of this normalization, the dference between the resents matrix andthe one estimated through the eight-poin algorithm gat most a global sig change Section 7.4 3.0 Reconstruction 165 ‘Using this normalization, (725) canbe rewritten as . (726) where 2 is the normalized essential matrix and t= the normalized translation vector. Recovering tke components of T from any row or column of the matrix E°E is now a simple mater, However, since each entry of the matiix E™E in (7.26) is quadratic in the components of 7, the estimated components might differ from the true components by aglobal sign change: Let us assume, forthe time being, that T has been recovered withthe proper global sign; then the rotation matrix can be obtained by simple algebraic computations. We define ct, azn with-=1,2,3and Ee theee ons ofthe normalized essential matrix thought ofas D vectors If are the rowsof the oaton matrix Ragin thought of s3-D vectors, éasy but rather lengthy algebraic exlculations yield R vbw) om 728) With the triplet (j,k) spanning al eycic permutations of (1, 2,3), Insummary given an estimated, normalized essentil mattix, we end up with four «ifferent estimates forthe par (FR). These four estimates are generated by the twofold ambiguity inthe sign of £ and’. The 3-D reconstruction ofthe viewed point resolves ‘the ambiguity and finds the only correct estimate. Fr each ofthe four pairs (FR), we compute the third component ofeach point in the lft camera reference frame, From (72) and (79), and since 2, =Rj (P|, we obtain L£R@~t) Re ‘Thus, forthe fist component ofp, we have ARE 125 we} = Fal plugging (78) ino (7.29) with T=, and solving for Zi, pn glfRicaROt ox) eR aR) we We can recover the otter coordinates of ; fom (7.8), and the coordinates of P, from the relation P= RPT (3)166, Chapter? Stereopsis "eturs out that ony one ofthe fourestiate of, vies geometrically consistent (ie, postive) Zand Z coordinates forall the points. The ston to take inorder to Aketimine the correct solution are dealed inthe box below, which summarizes the entire lgorithm. ‘Algorithm EUCLID REC ‘Te inputs formed by asetofcoresponding image pins in camera cooninates with pnd Pe «generic pai, and an estimate ofthe normalzed esetial mati, 1. Recover rom (7.26, 2 Come the vectors w rom (727), and compute the rows othe mati Rog (728, -& Reconstet the Zand, oot each oi! using (7) (8) (3). 44 thesis of and, ofthe reconstructed pins are (a) bah negative for some pin cbangethesga rand potosep3s (6) one negative, one psive for some pois, cane thes of ach ey of Ean po tosep (6 bth potive ora point ‘Tae outputs the st of ecostractd 3D points (upto x scale factor. = When implementing BUCLID_REC, make sue tat he grt des not though mor than iterations of steps 2-4 Sine there are only 4 possible combinations forthe unknown signs ofT and). Keepin mind ht inthe caeof very small placement, tie eros in te ipariy estimates may be sficen wo make the }D reconstution inconsistent when ths happens the algorithm keeps going trough steps 24. 7.43. Reconstruction up to a Projective Transformation The am ofthis section show that youcan compute a3-D reconstruction even inthe absence of any information on the intrinsic and extinsie parameters. The pice to pay thatthe reconstruction is unique only up oan unknown projective ransformation othe ‘orld The Further Readings point you to methods for determining this ransformaton ‘Assumptions and Problem Statement Assuming bat oly m point correspondences are given, with > 8 (and therefore the location of the epipoes, end €),compute the location of the points rom thee projections, pane. = It worth noticing that, fo estimates of the intinsc and extrinsic parameters are valable and nonlinear deformations canbe neglected, the accuracy ofthe reeansruction ‘only affected by that ofthe algorithms computing he disparities, not by calibration, The plan for this section sas follows, We show that, mapping five arbitrary scene point ino the standard projective bass of P*, and using the epipoes, the projection Section 7.4 30Reconstrution 167 ‘matrix ofeach camera can be explicitly recovered up to an unknown projective transformation (the one associating the standard basis to the five points selected, which is "unknown as we donotkrow the locaton of the ve3-D points in camera coordinates)” (Once the projection ratrices are determined, the 3-D location of an arbitrary point in space is obtained by uiangulation in projective space, You can fin the essential notions ‘of projective geometry needed to cope with allthis in the Appendix, section A. Determining th Projection Matrices, order to carry out ou pla, we intro duce a slight change of notation. In what follows, we deop the andr subscripts and ‘adopt the unprimed end primed letters to indicate points in the left and right images respectively, In addition, capital eters now denote points inthe projective space P° (four coordinates), while small eters points in P? (three coordinates). The 3-D space is reparded as a subsot of P?, and each image plane asa subset of P2. This means that we regard the 3-9 point [X, ¥,2]" of B® as the point [X,¥, Z, I” of P% and a point [s,)]” of R? asthe point [x,, 1” of P®. Let O and denote the projection centers, We let Py... Py be the point in P® toe recovered fom their ft and sight ages, pi... Pe and p...p amd assume that, ofthe first ive B;(Py.Pa,... Ps), no three ae colinearand no four are coplanar. Wetirstshow tat, f we choose P;,Pr,..., Psasthe standard projective bass of P* (see Appendix, section A 4), each projection matrix canbe determined upto a projective {factor that depends onthe location of the epipoles. Since a spaial projective transforma- fon is fixed ithe destiny of five points is known, we can, without losing generality. setup a projective transformation that sends P, Py... Ps into the standard projective basis of P2, Py = [1,0 0,0], Ps =(0,1,0,0]',Py~[0,0,1,0)", P= (0.0.0.1) and Pe=[l, Lisl) Forte coresponing image points pin the let camera, we an write MP, = op. (732) where Af is the projection matrix and py #0. Similarly sine a planar projective rans- formations fixed if th: destiny of four points is Knowa, we can also setup a projective transformation that send the fist four py into the standard projective bass of, that 10,0)", pp = (@, 1,0)", ps=(0,0,1)", and py= (11,1) »In-what follows it is assumed that the covedinates of the ft point, ps, ofthe epipoe e, and of any other Jimage point, p, ate obtained applying his transformation to Weir old coordinates The purpose ofall this isto simplify the expression of the projection matrix: substituting Pj,...,Py and py,....ps into (732), we see that the matrix Mf can be rewritten as 10 0 wm Dp 0 00 yy "You sald eosin youl that knowing he lation of tee pois nh camer eee ame sunt canes ala, wick ater defen be pl worsted seo 33)168 Chapter 7 stereopsis Let a, 8,7 be the coordinates of s inthe standard basis; (732) with i = S makes it posible f eliminate pa, pz and p rom (7.33), obtaining sy 0 a) M=| 0 Pos~oe 0m 73) 0 0 yps~ ps ps Finally snc a projection matrix is defined only up toa seal factor, wecan divide each enry of mati (734) by a, obtaining w-1 0 0 1 M=| 0 pr-1 0 1 (738) 0 0 yx-11 where x = psp. The projection matrix ofthe eft camera has been determined up othe unknown projective parameter x In oder to determine x, iis useful to relate the entries of M tothe coordinates ‘ofthe projection center, O. This can be done by observing that M models a perspective projection with © as projection center. Therefore, M projects every point of P?, with he ‘exception of O, into a point of P® Since M has rank 3, the nul space of Mf is nontrivial and consists necessarily of O: Mo=0. (736) Equation (736) can be solved for 0,, 0, and 0. Pace) eu Corresponding relation and results can be obtained forthe right camera (in the primed reference frame). In parfculr, we can write wy-1 0 0 1 w=| 0 py-1 0 a]. and a ] E (738) Since the location of the epipoles is known, x and 2’ (und hence the full projection ‘matrices and the centers of projection) can be determined from MOo'=ce i) Section 74 20Reconstuction 169 and MO=oe 40) with o #0ando' ¢0 L's ee first what we can recover fom (7.38). Substituting (7.35) and (7.38 into (7.38), we obtain te followin system of equations FSel(2)-E) Since is unknown, system (7.4) is homogencous and nonlinear inthe three unknown -%.2'and a. However, we can regard it asa lieae system in the unknown x,» and a, so that solving foro” we have Son) VT(ps xP) with v = (ae, p'e,,y"e:). Since e, ps, ps, and v are known and the unknown factor cancels out, (742) acually determines x A similar derivation applied to (7.40) yields ay psx p) vTips% pO with v = (ae, }, ye). Having determined both x and x’ we can regard both the Projection matrices ad the centers of projetions as completely determined. (a3) Computing the Projective Reconstruction. We ate now in a position to reconstruct any point in P? given its corresponding image points p= [p.,py.p.]" and [Pi roe] - Te reconstructions unique wp othe unkown proj rane formation fixed by the choice of P,..., Ps as the standard basis for P°. Observe that the projective line! defined by 30+ 4 [Ores OyPy OP], cay, wih, w €Band nt both 0, goes through O (lord spe spy : ( oe 0 » Sine th ples nth cates of rojo ona ght in (738 and (74) ate otndepeser, Forte ppt ofthis nod bower hs can be sll areSection 7.6 Further Readings 171 170 Chapter? stereopsis 75 Summary Similarly, the projective line v0'+ [ote 0,p 0:00)» gh 0 and p The projective pot an hs with and ot both 0, oes through and The on read by merece vo poet ies and This amount aking {hn sou of the Homogeneous sytem neat exuations 0. Oars ~0; ~Or 0; Op, ~0; ~0;0, 45) 0, Op, ~0; ~O%9: |\ * roo a 0 dhe = Once again, singular valve decompostion 1 DVT of he system matrix of (7.45) provides uerteatyslable procedure for solving his inca yster: The solution i given by the Column of V associated wth the smiles singular vale along the diagonal oD. ‘Algorithm UNCAL, STEREO re ys ne in ei mi OLE. ‘se throughout te section inthe standard projective basis ‘Te outpatis formed hy the cordinates of Py. Having found a projective reconstruction of our points, how do we go back 10 cies oorinates 1 we know thelosaton oP... Psin the world fame, wean determine the projective ensfomation introduced atthe begining ofthis section hat mapped these ie pints thought of as pons of Pinto he tangas projet Dass (se the Append section A ordeals onhow o doit). Te Further Readings point { (nontrivial algorithms for Euclidean reconstruction which lax this assumption, need more than two images. After working through this chapter you shouldbe able to: explain the fundamental concepts and problems of stereo solve the correspondence problem by means of correation-based and feature- based techniques 1D estimate the fundamental (and, if possible the essential) matrix from point cor respondences determine the epipolar geometry and rectifying a stereo pair recover 3D structure from image correspondences when (a) both intrasie and extrinsic parameters are known, (6 onl the intrinsic parameters are known, and () both intrinsic and extrinsic parameters ate unknown 7.6 Further Readings ‘Tho literature on sero immense. You may wish to stat with the two classic orre- spondence algorithmsby Marr and Poggi [14,15], Among the mulitude of errespon. dence algorithms proposed inthe last two decades, we suggest the methods proposed in 2,9, 12,17]. A way to adapt the shape and sizeof SSD windows to difereut image pars is esried in Kanade end Okutomn [10] Rectication is discussed by Ayache [1] and Faugeras [5]. A MATLAB imple- ‘mentation of etiction algorithm based on projection matrcescan be downloaded ftom fep://taras.d:si.unind.it/pub/aources/sect stn. tar.gz. The algorithm and the implementation are due to Andea Fusilo, For uncalibrated rectification see Robert eta 20} ‘The cigh-pointslgoritm is due to Longuet Higgins [11 the normalization peo- cedure to avoid numerical insabliies (discussed at length in Exercise 16) i due to Harley [7]. Linear and nonlinear methods fr determining the fundamental mtr 8 well s stability sues and erica configurations, are studied by Luong and Faugeras 113. Shashua 21] proposed method for locating the epipoles and achieving proj. tive reoonsiruction that require only Six point matches The recovery of the exinic parameters fom the esential matrix described inthis chapters again due to Longue Higgins [11]. An alterrative method forthe calibration ofthe extrinsic parameters can te found in 8. We hve largely based the introduction to uncalibrated sere onthe seminal paper by Faugeras [4], Similar results are discussed by Spar [22] and Mohr and Arbogast 18). More eenty a numb ofmthods fr the recovery of Euclidean struc ture rom the projetv: econsrucion have been proposed (se [36,1] fr example) Ifyou are curios about understanding and creating autosterengrams, check cutthe Website neepi/ iw. ccc, nottinghan.ac.uk/-etzpe/sirds bial, The lace reconstruction in Figure 7.2 was computed by a sereo stem commercialized by the Turing Institute (ieep://evw tring gla. ac.) which maintains un intrest ing Website on stereopsis The INRIASyntim Web site contains useful test dat, including calibrated stereo pais (please notice the copyright atached!) beep. syatin. iia. fx/eyutin/analyze/paites-eng bt)172 Chapter7 _stereopss Section7.7 Review 173 “Toconclude, we observe tha stereo-tke visual systems can also be bil taking two pictures of the same scene from the same viewpoint, but under different illumination Lv 5 = Day Nes (Ew randy pute and sire the possible values Of i you are sing (746)? Ey ¥8. Canyoupresom- ‘Thisis the so-called photometric stereo, rst proposed by Woodham [23] © 74 Discussstmategies for estimating cortespondences at subpixel precision using correlation-based methods (Zin: Keep tack ofthe value of win the neighborhood ofthe maximum, and take a weighted average of every integer location in 177 Review that neighborhood.) Make sue tht the weights ate positive and sum o 1 ‘Guains (© 75 Design a corelation-based method that canbe used to match edge points. 1 74 What are the intrinsic and extrinsic parameters ofa stereo sytem? 517.2. What are the main properties of correlation-based and feature-based meth- ‘ods for fading correspondences? 13 73 What isthe epipolar constraint and how could you use it to speed up the search for corresponding points? 1374 What are the main properties ofthe esential and fundamental matrices? (1175 Whats the purpose of rectification? (376 What happens in ectfication if the focal lengths of the two original cameras are not equal? 7.7. What sort of 3D reconstruction can be obtained if al the parameters, only the intrinsic parameters, or no parameters can be assumed to be knows? 76 Determine the matrices MH; and H, needed to normalize the entries ofthe fundamental matrix before sppling algorithm EIGHT POINT. (Hint Given a ye YT with Dive ‘Then, find the 3x 3 matrix H such that Hp. = i with p= [sy ~ 1/4 (9; ~ 3/4, I") Verify tat the average length of each com ponent of , equals 1 77. Write an algorithm reconstructing sene fom a rectified tere pir sin J 7.8 What is the purpose of step 4 in algorithm EUCLID_REC? rectified images coordinates. (Hint: Use the simultaneous projection ‘uations 4.79 Isthe reconstruction of the fundamental marx necessary for unabated associated toa -D pont in he two cameras) stereo? 78 Verity that(72) ean be derived from (7.23 in the special ase of he stereo 710 How can (123) reconstruct depts in misters ifthe focal length in system of Figure 74 nilmetrsis not known? © 79 In analogy with the case of point matching, compute the sition tthe ‘rangulation problem inthe case of line matching I andl, are the matched aa lines thisamourtstofnd the ln intersection othe panes though O an, and Oy and repost. Whyte riangulaton based on nes computationall © 7.1 Estimate the accuracy of the simple ae system Lobes 74 oa ‘easier than the triangulation based on points? ‘that the only source of noise isthe localization of corresponding points in the two 2 7.10 Assume T and Ri, and T, and R, are the extrinsic parameters of two cameras ‘images. (Hint: Take the partial derivatives of Z with respect to x, f.) a with respect to tie same world reference frame, Show | that ‘the translation vector, thedependene ofthe erin depth estimation sa fancton ofthe baseline wid See ene ee eee eae and the focal length. system composed of the two cameras are given by (7.24). (Hint: For a point P 9 72 Usingyoursouton io Exerc 71 estimate the accray with which features inthe word reletence fame we have Pe —RP I and Py = MP + But he ‘oul be lcaled inthe two ages in order to reconstruct ph witha lative teallonbaweer Brand P. epee Ecc RP D) értorsmaler than 1%. © TAY tates: notation of Section? tw =, x‘ with =<, Prove (© 73. Check what happens if you compute SSD and cross-correlation between an arbitrary pattern and a perfectly black pattern over a window W. Discuss the effect of replacing the definition of eross-correlaton with the normalized cross correlation ven) = =BO=D 06 that a@Ryt for every triplet (,j,) whichis a eyeic permutation of (1,2,3) and use the result to derive (728). (Hint Make use of the vector identity’ A (B x €) = (A OB—(A"B)C)™ Chapter? Stereonsis Projects © 71 Ina typical implementation of CORR-MATCHING, one computes e(d) = iy yateach piel for each posible shift (where W is the window over which ed) iSevaluated andy isthe pixelwise cross correlation measure), and stores the shift for which e() is maximum, Ifthe sizeof W is x, thisimplementation requires O(n) ditions. However, ifm is larger than afew units the overlap between the cortelation windows centered at neighbor pixels can be exploited to obtain & ‘more efficient implementation that requires O(2n) additions. The key ida i 0 compute e(d) for each posible shift tall piel fist. This makes it possible to ‘se the result ofthe computation of ed) for some dat one pixel to evaluate ed) ar the neighboring pixel, Here isa simple way to doit. For each posible shift, evaluate y over the entire image. Once you have obtained c(d) at some pixel p ‘over the window W, you can compute cd) forthe pixel immediately tothe right ‘ofp, for example by simply subiacting the contribution to c(d) from the leftmost column of W and adding the contribution from the column immediately to the right of W, The memory equizement isnot much different, a for each shift you donot need tosave the value of ¥ over the entire image, but only the intermediate ‘maximum of o(d) {and corresponding shift) forall pixels Implement tis version ‘of CORR- MATCHING and compare it with the standard implementation © 72. Design and implementa program that, givenastereo pair, determines atleast «ight point matches, then recovers the fundamental matrix and the location ofthe ‘epipoles. Check the accuracy of the result by measuring the distance between the estimated epipolar lines and image points not used by the matrix estimation. References IN Ayache, Anca Viton for Mobile Robos: Stereo Vision and Mulisesory Perception, MIT Press Cambridge (MA) (19) NN. Ajache and B.Fveron, ficient Registration of Stereo Images by Matching Grap Descriptions of Edge Segments, Intemational Jounal of Computer Vision, VoL, na. 2 987), [5] Devernay and OD. Faugeras From Projetve to Euclidean Reconstruction, Technical Report 272, INRIA (1985) avilable from heep:/ esas. 2). [4] OD Faugeras, What Can Be Seon a Throe Dimensions wih an Uncalirated Stereo Rig? rc. 2 Eurypeun Qnferenc on Compe Vion, Sata Macher (0a), pp. 2-578 (19%), [5] OD. Faugeras, Three-Dimensional Computer Vision: A Geomanic Viewpoint, MET Pres, Cambridge (MA) (1993), RL Hatley Estimation of Relative Camera Postion for Unealbated Cameras, Proc 2nd European Conference on Computer Vision, Santa Margherita (Italy), pp. 519 87 (1992) [7] RA. Hardy, In Defence of the &-Point Algorithm, Proc. Sth Intemational Conernce on Computer Vision, Cambridge (MA), p. 106-107 (195). BKCP. Hom, Relative Orientation, International Journal of Computer Vision, VOL. , pp 5978 (199), By io 8 References 175 [9] DG-ones and 1. Malik, Determining 3-D Shape from Orientation and Spatial Freguency Dispaies, Proc 2nd European Confrence on Computer Vision, Santa Margherita Ital), Pp. 66-69 (1092), {10} T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: ‘Theory and Experiments [BEE Transacionson Pater Anais and Machine Ineligence, ‘Vol. PAM-16, 920-92 (199) {11} HLC. Longuet-Higins, A Computer Algorithm for Reconstructing a Scene frm Two Projections Notre Vo 293, no. 10, pp 13313598, [12] BD.Lucasand'T. Kanade, An Herative Image Registration Technique wih an Application to teteo Vision, Foe Intemational Joint Conference on Aa Inceligence, pp. 514599 (981), 103] 0-1 Luongané OD. Faugeras The Fundamental Matrix: Tory, Algoitims an Sabi ty Analysis Inemational Journal of Computer Vision, Vol 1, p. 4-75 (1996). [14] D. Marr and: Poi, Cooperative Computation of Stereo Disparity, scence Vol. 194 pp, 253257 (1996, [15] D. Marr ana: Powio, A Computational hoor of Human Stereo Vision, Proe R. Soe Lond. B Vo. 208, p, 301-28 (1979). [06] 1. Mathes, 7: Kanade and R. Sreiski, Kalman FiterBssed Algorithms fr Estimating Depth fom Image Sequences, Inernational Journal of Computer Vision Vl. 3, pp. 29 236(1980, [17] JEW. Mayhew and LP Fis, Pychophysicel and Computational Studies Towards a ‘Theor of Human Steroopsis Aria Inligence, Vo. sp. 49-385 (198) [18] R.Mobr and E.Ashogast It Can Be Done without Camera Calibration, Pater Recogni- tion Leners, Vo. 139.3948 (1950), [19] M. Poets L. van Gool and M,Proesmans, Euclidean 3-D Recoastractio fom Image Sequences wit Viriable Foal Lenghts, Proc European Conference on Computer Von, Cambeidge (UK), pp. 31-2196), [20] 1. Robert, C. Ze, OD. Faugerts and M. Heber, Aplications of Non Metre Vision {0 Some Visually Guided RobotieTsks, Technical Report 2584, INRIA (1985) (avaiable from itp: fone ria. f). [21] A. Shasua, Projetve Depth: a Geometse Invariant for D Reconstruction fom Two PerspectivelOrthoyraphic Views and for Visul Recognition, Proc. IEEE Int. Conf on Computer Vision, Belin (ermany), pp 583330 (1983), [22] G.Sparr, An Altai Analytic Method for Reconstruction fom Imae Correspon- ences it Proc. 7H Scandinavian Conference on image Anas, pp. 274-281 (191) [23] RJ. Woodham, Ptotometric Stereo: Reflectance Map Technique fo Determining Sur: {ace Orientation froma Single View, Proc SPIE Technical Symposium on Image Under standing Sysems ad Industral Applications, Vl. 155, pp 136143 (1078),Eppursi moove! Galileo ‘This chapter concerns the analysis of the visual motion observed in time-varying image sequences Chapter Overview Section 8.4 present the basic soncepts importance and problems of visual motion, Section 82 introduces the notions of marion field and motion parallax, and their fundamental equations. Section 8.3 discusses the image brighmess constancy equation and the opical low, the approx ‘mation of the motion field which canbe compute from the changing image brighiness patter, Section 84 presents methods for estimating the motion fd, divided in diferential and feature- ‘marehingiracking methods. Section 8.5 deals with the reconsruction of 3-D morion and structure. Section 8.6 discusses motion-based segmentation based on change detection. What You Need to Know to Understand this Chapter ‘+ Working knowledge of Chapters 2 and? + Eigenvalues and eigenvectcrs ofa matrix + Least squares and SVD (Appencix, section A), ‘+ The basics of Kalman filtering (Appendix, section A). "and eis movig ”178 Chapter8 Motion 8.1. Introduction ‘Until now, we have studied visual computations on single images, or two images ac= quired simultaneously. In this chapter, we broaden our perspective and focus on the processing of images over ime. More precisely, we are interested in the visual information that can be extracted from the spatial and temporal changes occurring in an mage sequence, Definition: Image Sequence ‘An image sequence is a series of N images or frames, acquired at discret ime instants = fo-+ Rt where drifted time interval andt-=0,1,.... —1 == Inondertoaoquitean image sequence, you needa frame grabber capable of storing frames stafat sate Typial rates ire the socalled frame rae and eld rae, correspondngtoa time invert of 24sec and 30sec respectively Iyouare allowed tochoosaiferenttime interval osimply want to subsample an image sequence, make sure that Arssmall enough toguarantoe that the dsereesequences representative sampling ofthecoainveusimage ‘volving overtime; as aul of thumb, hi means that the apparent displacements over the image plane between frames should be at mos few pel. Assuming the illumination conditions do not vary, image changes are caused by a ‘relative motion between camera and scene: the viewing camera could move infront of ‘static scene, or parts ofthe scene could move infront of a stationary camera, or in ‘general, both camera and objects could be moving with different motions. 8.1.1. The Importance of Visual Motion ‘The temporal dimension in visual processing is important primarily (or two reasons. Firs, the apparent motion of objects onto the image plane isa strong visual eve for understanding structure and 3-D motion, Second, biological visual systems use visual tuoion (infer properties ofthe 3-D wotld wit litle prior’ knowledge of it Two simple examples may be useful to illustrate these points Example I: Random Dot Sequences. Consider an image of random dots, gener ated by assigning to each pela random grey level. Consider a second image obtained by shifting a squared, central region ofthe ist image by afew pels, say, to the righ and filling the gap thus created with more random dots. Two such images are shown in Figure 8.1. Ifyou display the two imagesin sequence on a computer screen, in the same window and one after the other ata suficientl fst rate, you will unmistakably see a square moving sideways back and forth agninst a steady background, Notice that the Section 8.1. Introduction 179 Figure 8.1 sequence of two random dot images a square hasbeen spaced bawecn the two frames, visual system base ts judgement on the only information available in the sequence; that ithe displacement of the square in the two images? ‘Example 2: Computing Time-to-Impact. Visual motion allows us to compute "useful properties of the observed 3-D world with very litle knowledge about it. Com sider planar versionof the usual pinhole camera model, and a vertical bar perpendicular tothe optical axis travelling towards the camera with constant velocity as shown in Figure 82. We want to prove a simple bu very important fact: Is possible to compute the time, taken by te bar to reach the camera only from image information; that is, without knowing either te real sizeof the bar or is velocity in 3-D space? Asshownin igure82, we denote with Ltherealsize ofthe bar, with Vtsconstant velocity, and with f te focal length ofthe camera, Te origin ofthe reterence fame is the projection center. the position ofthe ba on the optical ais is D(0) = Dy at time +=, is positon at alater timer will be D= Dp ~ Vt. Note that L, V,f, Dy, and the choice of the time origin ae ll unknown, but that ean be written as fz D From Figure 82, we wee that (7), the apparent size of the bar at time 1 on te image an) idetlyyoucan oka the wo apes of Figure 81a ono do sereoyran pecs qate oan in he background Stand ads (ora het of pape ofthe mei) beech he wo as doc your mse gst he diet, 0 tht each eye ean see ely One nage. Fats yee end the page Aer wie thetmo agi hol fie nd pce the pec ae ting ein ‘hetackground, Inthe inlopaly-crclleommuniyofompter vison, isc her pesmi sme -colson180 Chapter Motion Figure 82. How lonbefore the bar reaches the camera? we now compute the time derivative of 1), at phe adr aD take the ratio between 1) and), and use (81), we obtain to, to. (82) ro 2 ‘Thisis the equation we were after: since both the apparent sizeof the bar, ?), and its time derivative, are measured from the images, (82) allows us to compute + inthe absence of any -D information, lke the size of the bar and its velocity 8.1.2. The Problems of Motion Analysis Itisnow time to state the main problems of motion analysis. The analogies with stereo suggest to begin by dividing the motion problem into two subproblems. “Two Subproblems of Motion Corespondence: Which element of fame correspon to which elements ofthe nex frame of thesequence? Reconsracon: Given 4 number of corresponding elements and possbly knowledge of the camera's inns parameters whatcan we sayabout the +D motion andsiusture ofthe observed world? Main Differences between Motion and Stereo Corespondence: As image sequences ate sampled temporally at usally high ate the spatial ‘erences (disparities) between consecutive frames are on average uch smaller than those of typical stereo prs Recorsrucon: Unie stereo, in motion the relative 2D displacement between the viewing Camera and the scene is not necessary caused by a single }D nig transformation Section 81 Introduction 181 Figure 83 ‘Thre ames rom along image sequence (ft right and top cotta) and the ‘Optical flow computed fom the sequence, sowing thatthe plant inthe foreground is moving Award the camera, nc the sof toys away from it Regarding correspondence, the fact that motion sequences make many, closely sampled frames availible for analysis is an advantage over the stereo ease for at least two reasons, Fis, feature-based approaches can be made more effective by tracking techniques, which exploit the past history of the features’ motion to predict disparities inthe next frame. Sesond, due tothe generally soll spatial and temporal differences ‘petween consecutive frames, the correspondence problem can also be casts the pro: Jem of estimating the apparent morion ofthe image brighmess patern, usually called ‘opicl flow (see Figue 8.3).Chapter 8 Motion ‘We shall use two strategies for solving the correspondence problem, Differential methods (sction 84.1) lead to dense measures; that i, computed at cach image pixel, They use estimates of time derivatives, and require therefore image sequences sampled closely. “Matching methods (section 842) lead to sparse measures; that is. computed only at a subset of image points We shall place emphasis on Kalman tering asa technique for matching and tracking lficenty sparse image features over time, Unlike correspondence, and perhaps not surprisingly reconstruction i more dif cult ia motion than in stereo, Even inthe presence of ony one 3.D motion between the viewing camera and the scene, frame-by-frame recovery of motion and structure turns ‘ut tobe more sensitive to noise. The reason is thatthe baseline between consecutive frames, regarded asa stereo par, is very small (see Chapter 7). motion and struc: ture estimation from both sparse and dense estimates of the image motion is discussed in sections 851 and 852, respectively. This chapter discusses and motivates methods for solving correspondence and reconstruction under the following simplifying assumption. ‘Assumption ‘Tere is only one igi, reative motion betwoen the camera and the observed seen, and the ‘umination conditions Jo uot change Thisassumption of single rg motion implies that he 3D objects observed cannot ‘move of diferent motions. Tis assumption is volted, for example, by sequences of football matches, motorway traffic or busy streets butsatisie by, say the sequence of a building viewed by a moving observer. The assumption als rules out exible (nonrigid) objects: deformable objets like clothes or moving human bodies are excluded, If the camera is looking at more than one moving objet, or you simply cannot assume a moving camera in a static environment, a third subproblem must be added “The Third Subproblem of Motion ‘Thescemeniaion problem: What are he regiosof the image plane which correspond to dtferent moving objects? ‘The main dificulty here i a chicken and egg problem: should we first solve the matching problem aad then determine the regions corresponding to the diferent ‘moving objects, or find the regions first, and then Took for correspondences? This ‘question is addressed in section 86 in the hypothesis that the viewing camera is not ‘moving, Pointers to solutions to this difficult problem in more general cases are given in the Further Readings. Section 82 The Motion Field of Rigid Objects 183 ‘We now begin ty establishing some basic facts. 8.2. The Motion Field of Rigid Objects Definition: Motion Field The motion felis the 2-D vector ld of veloc ofthe image point induce bythe relative notion between the viewing eimera andthe observed sens The motion fied can be thought of as the proection of the 3-D velocity field ‘on the image plane (0 visualize this vector fel, imagine to project the 3-D velocity ‘vectors on te image, The purpose ofthis section is to get acquainted with the theory and geometrical properties ofthe motion fel. We shall work inthe camera reference frame, ignoring the image reference frame and the pixelization The issue of eamera ‘calibration wil be raed in due time This section presents some essential facts of motion fells compares disparity representationsin motion and stereo, analyzes two special eases of rigid motion leading to generally useful ft, and introduces the concept of mation parallax, 82.1 Basics Novation, Welet P=[X.¥,2]' be a-D point in the usual camera reference frame: The projector centers inthe origin the optical axisis the Z axis and f denotes ‘the focal length, Theimage ofa scene point, P, isthe point p given by P Paty 3) ‘As usual (See Chapter 2), since the third coordinate of pis always equal tof, we write s.9]” instead ofp = [x,y,f]"- The relative motion between Pand the camera can he described as Tox, 4) where Tis the tanstioal component of the motion and the angular velocity As the motion iii, Tand w are the same for any P In components, (4) reads T.-oztox TeX tort 65) mond tank “Remember, th mean tt we consider tents parameters know Sate that T denotes «cy vet aint hater, oo daca! ves ain the eo the okChapter Motion ‘The Basie Equations of the Motion Field. To obtain the relation between the velocity of P in space and the corresponding veloity ofp onthe image plane, we take the time derivative ofboth sides of (83), which gives an important set of equations. on Field "The Basic Equations of the Mo “The motion Ged i vem by cr In component nd sing (85), (86) read BET gpa 7 67 fF tof 0 Notice that she motion field is the sum of two components one of which depends on translation only, he other on roration only n particular, the translational components ‘of the motion field are pr Txn td, Since the component ofthe motion field along the optical axis always equal 00, we stallwritew = [> %]"intead of y= [v1.0] - Notice hain the last wo pais of ‘uations the erms depending onthe angular veloc, wand depth, Z, ae decoupled This discoses an important property ofthe motion fel: He part of te motion fd that depends on angular velocity doesnot carry information on depth Comparing Disparity Representations in Stereo and Motion. Aswesaid before, stereo und motion pose similar computation problems, and one of these is correspon dence. Point displacements are represented by disparity maps in stereo, and by motion fields in motion. An obvious question is, how similar are disparity maps and motion fields? The Key difference i that the motion feld ia differential concept, stereo di parity isnot. The motion field is based on velocity, and therefore oa time derivatives: Section 82 The Motion Field of Rigid Objects 185 Consecutive frames must be as close as possible to guarantee good discrete approsi- ‘mations of the continous time derivatives. In stereo, there is no such constraint on the ‘wo images, and the cisparities ean take, in principle, any value ‘Stereo Disparity Map and Motion Field The spatial dsplacemerts of eortespondng pins between the images ofa treo pair (forming ‘he stereo csprty ma) ae fn, an, in principe uneonsrained The spatial displacenents of corresponding points between consecutive Frames ofa motion sequence (forming the notion fol) ae discret apprenimatons of time-varying derivatives, an mus therefore be suitably sal, The motion fldeaincides wth the stereo dspaiy map ony if spatial and temporal ie ences between ames sen small. 82.2 Special Case |: Pure Translation ‘We now analyze the casein whic the relative motion between the viewing camera and the scene has no sotatinal component. Theresakng motion eldhes a peculiar spat structure anditsanais leads to concepts very usefl in general Since @=0, (8.7) read 8) Lots 7 ‘We fist consider the general case in which 7, #0. Introducing point py = [x,y] such that (89) (88) become T ws0-wl = Equation (810) say that the motion fel of a pure translation is radia: It consists of ‘vectors radiating fom a common origi, the point po, which is therefore the vanishing point ofthe translation direction. In particular if Te ~ O, the vectors point away from. Po, and py is called the focus of expansion (Figure 84 (a); iT. > 0, the motion field ‘vectors point toward po, and py i called the focus of canracion (Figure 84 (b)). Ia addition, the length cf v = vip) i proportional to the distance between p and po, and inversely proportional to the depth ofthe -D point P.106 SWZ Rule [= = ANY @ © 0 Figure 84. ‘The thre types of motion fli generated by translational motion. The led square marks the instantaneous epipoe. EE ‘ = = Notie thatthe pont py retains its significance and many of it properties even in the presence ofa eatonal component of D motion (section 852) ET vanishes (a rather special case), (8.8) become all the motion fed vectors are paral (see Figure 84(¢) and their lengths are inversely proportional to the depth ofthe corresponding 3-D points. © Inhomogeneous coordinates there would be no nee to dtnguish between the twocass T-40andT, =: Forall posible values of 7 including 7 =0, pis the vanishing point of the direction in 3D space of he teaslation vector T, and the -D kn trough the center of projection and po paral oT. Following is a summary of the main properties of the motion field of a purely translational motion ‘Pare Translation: Properties ofthe Motion Field 1. 10.40, the motion Sed isradale(8.10), andl vectors point toward (aay fom) sng oi en by (88) 17, =, he moto els paral. 24 The lng of motion fd ecto iin propotonal othe depth ZiT #0, hoinveney proportional othe dsanee am pop {piste vanishing pin ofthe direction of ranlation (ee (:10) 44 iste intrstion ofthe ray parallel othe eslaton vector with the image plane Section 82 The Motion Field of Rigid Objects 187 8.23 Special Case2: Moving Plane ‘Panes are common urface in man-made objects and environments, so its useful 10 investigate the properties ofthe mation field ofamaving plane. Assume that the camera is observing a planarsurface, x, of equation (ean) where m= [nyysr isthe unit vector normal 0, and d the distance between andthe origin (the enter of projection). Let x be moving in space with tansational ‘elosity Tand angular velocity so that both m and d in 1) are Funetions of ine By means of (83) (811) canbe rewritten as natny enh maser y f @2) Solving for 2 in (8.12), and plugging the resulting expression into (8.7), we have 1) where 24> Te Ty lox — Tye “The (8.13) stats, interestingly that she moron fed ofa moving plana surface, at ‘any instant, is @ quatratc polynomial inthe coordinates (x,y, $} ofthe image pois. "The remarkable symmetry of the time-dependent coeiicients ay... ass not co incidental. You can easily verify thatthe a remisin unchanged if d, m, T, and w are replaced by Tit T=(tin tmx T/A,Chapter 8 Motion This means that, apart from the special casein which m and Tare parallel, the same ‘motion field can be produced by two diferent planes undergoing two different 3-D ‘= The praca consequence is that it is usually imposible to recover uniquely the 3D structure parameters w and and motion parameters T and o, of plana se of points from the motion fed alone. ‘You might be tempted to regard this discussion on the motion field ofa planar surface as a mere mathematical curiosity. On the contrary, we can draw at least two important and general conclusions from it. 1. Since the motion field of a planar surface is described exactly and globally by 8 polynomial of second degree (see (8.13), the motion field of any smooth surface is likely 10 be approximated wel bya low-order polynomial even over relatively large regions of the image plane (Exercise 8.1), The useful consequence is that very simple parametric models enable «quite accurate estimation ofthe motion field in rather general cccumstances (section 8.4.1), 2. As algorithms recoverying 3-D motion and structure eannot be based on motion estimates produced by coplanar points, measurements must be made at many diferent locations ofthe image plane in order to minimize the probability of ooking at points that le on planar or nearly planar surfaces” We will return 10 this point in sections 85.1 and 85.2. ‘We conclude this ection with a summary ofthe main properties of the mation field ofa planar surace Moving Plane: Properties of Motion Field 1 The motion eld of «plana secs, any tne instant, x quadratie polynomial in the image coordinates. 2 Due to the special symmetry of the polynomial coefficient the sme motion eld cam be _rodced by wo diferent lana surfaces undergoing diferent 3D motors. 8.2.4 Motion Parallax ‘The decoupling of rotational parameters and depth in the (8.7) is responsible for what iscalled motion paralar. Informally, motion parallax refers to the fat that she relative ‘motion feld of two instantaneously coincident points does not depend on the rotational "srs shold not sures you, Pasa ssc lck neal The ght pit lorit (Chale), for example alto yl unig sin the pins al coplanar in pace ey pla” srface a uc tat can be apimsed by plan within a ive tolerance, which isp propel othe dts ofthe sua rs tas pane Section 82 The Motion Feld of Rigid Objects 189 4 | A_ 4 om Figure @5 Thre couples of nstantancouly coincident image points ad ther Now vectors 3) the diference vectors pint towards he istantaneous eipoe(). component of motion in 3D space; ths section makes ths statement more precise. Motion parallax willbe used in section 8.5.2 to compute structure and motion from optical ow. Let two points P =[X,¥, Z]" and P= [3, 7, 2]" be projected into the image points p and p respectively We know thatthe corresponding metion field vectors can be writen as and at some instants, the points p and phappen tobe coincident (Figure 8.5(a), we have oa nd the rotational components ofthe observed motion, (vx) and (i, 3), become Fem —oyf tony 4 22 nf tay Oe f 61) nf nox190 Chapter8 Motion Therefore, by taking the difference between v at ‘ou, and we obtain he rotational components cancel 1a Os-LIG- 5) 1d TMZ > ‘The vector (Av, 1) can be thought of asthe relative motion field Other factors being equal, Au and A, increase with the separation in depth between P and P. NNotie thatthe ratio between dv, and dv, can be written as avy y= Be vith so,” image coordinates ofp, the vanishing point of the tansation direction (Figure 85(b)).* Hence, forall posible rotational motions the vector (Ax, Av") points inthe direction of po, Consequentiyte dot product hetveen the motion fel, and the vetor [y~ ype ~ x, which s perpendicular to p~ p, depends nether on the MD sirutureof the seen nor onthe taeslatonal component of motion, and can be writen as v= Owe — G—a0bef. We will make use ofthis result in section 82, where we will eam how fo compute ‘motion and structure from dense estimates ofthe motion fed 1 Beaware thatthe vanishing point of wansation,p, and the point at wich» vanishes, cll, ivq, are in general diferent they consid onl i the mati spuely trasational. Any rutationl component about nas not perpendiculrto the image plate shits the postion ‘tq whereas the positon ofp remains uachanged,asits determined bythe anslational ‘component only. Somewhat deceptily the Now eld inthe neighborhood of q might ill Took very much lik a focus of expansion a contraction (see Fgure 83). ‘And heres the customary summary of the main ideas Motion Parallax ‘Tie relative motion ld of tw instantaneously coincident points: 1 doesnt depend onthe rotational component of motion 2_point towards away from) the pot, the vanishing point of the translation diretion 5 Seon 82 makes it lara hs pint ca he eared aan nonanear pipe. Section 83 TheNotion of Optical Flow 191 Figure 8. The point py as isantancous epipate 82.5. The Instantaneous Epipole We close this introductory section with an important remark. The point ps. being the intersection of te image plane with the direction of translation of the center of projection, can be regarded asthe instantaneous epipole between pairs of consecutive frames in the sequence (Figure 86). The main consequence of this property is that itis possible to locae pp without prior knowledge of the camera intrinsic parameters (section 85.2) 55 Notice that, ain the case of stereo, knowing the ipo location in image coordinates {smo equvalentto knowing the diction of transation (the baseline vector for stee0), The elation hetveen epipoe location and translation direction isspecied by (89), which is writen inthe camera (not image) fame, and contains the foal length f. Therefore ‘he epipole's locaion gives the recon of ranlaton only ifthe burns parameters ofthe viewing camera ae known, 83. The Notion of Optical Flow ‘We now move tothe problem of estimating the motion field from image sequences, that is from the spatial ind temporal vacations of the image brightness. To do this, we ‘must model the link between brightness variations and motion fed, and arrive at @ fundamental equation of motion analysis the image brighmess constancy equation, We ‘want also to analyze the power and validity of this equation, thats, understand how ‘much and how well can help us to estimate the motion field, For simpli, we will assume thatthe magebrighmes is continuous and differentiable as many tes as needed {in both the spatial and temporal domain,12 Chapter Motion 183.1. The image Brightness Constancy Equation 11s common experience that, under most ircamstances, the apparent brightness of troving objets remains constant, We have sen in Chapter 2 thatthe image radiance Fs proportional tothe scene radiance inte direction of the optical axis of the camera, ifwe assume that the proportionality factor i the same across the entire image plane the constancy ofthe apparent brightnes of the observed sene canbe writen athe stationarity ofthe image brightness E overtime: ae a sta (1) the image rightness, should be repre 8 function of oth the sai cording ofthe age plane, «andy and of ime, that B= lay 0), Sine and $ aren tr functions othe otal dtvatv in (5) should not be confused wih the aril desivatv ‘Via the chain ule of ferentiation, the total temporal derivative reads BGO. y0.0 _dBds , dy J 9E og a sar * ay ar * “The partial spatial derivatives of the image brightness are simply the components of te spatial image gradient, VE, andthe temporal derivatives ds/dt and dy/dr, the ‘components ofthe motion fv Using these fact, we can rewrite (8.16) asthe image brightness constancy equation (eas) 19) “The Image Brightness Constancy Equation Given the image righiness, E = (a, 1), and the motion Bld, (wey v+ B=. ein “The subscript denotes partial dferenttion wth respect time. ‘We shall now discuss the relevance and applicability of this equation for the estimation of the motion field. 832. The Aperture Problem How much ofthe motioa field ean be determined through (817)? Only is component inthe direction ofthe spatial image gradient. We can see this analytically by isolating the measurable quantities in (8.17). Ee a =u (8.18) IvEL~ TEL =) ‘ncomponentssreinc alle the norma component eae the paige grat isos to the pleco lang wich ingens esis cosa Section 83 The Nation of Optical Flow 193 Fgwe 87 The aerure pre: he Bk nd eye stow wo ston of ‘the same image ins in two consecutive frames. The image volosty perceived in a) ‘through the small aperture, only the component parallel othe image raient of the tre image velniy, revealed in). ‘The Aperture Problem The compen he notion ein the dito organo he pti imag rat is ‘not constrained by the image brightness constancy equation. : —— ‘The aperture problemcan be visualized as follows Imagine to observea thin black rectangle moving against white background through a small aperture, “Small” means thatthe corners ofthe rectangle are not visible through the aperture (Figure 87(a)); the small aperture simulates the narrow support ofa differential method. Clearly, there fare many, actually infinite, motions of the rectangle compatible with what you see ‘through the aperture (Figure 8.7()} the visual information available is ony sufficient {odetermine the velocity ia the direction onhogonal tothe visible side ofthe rectangle; the velocity inthe pale! direction cannot be estimated, - Notice tha the pall between (617) snd Figure 87 isnot eres. Eguation (817 relates ‘the image gradiet and the motion eld tthe same inage point, there estshishing ‘contrat aif smal spatial seppot instead, Fgure 87 describes a state ‘of afais over x sal bu ne spat eon. This iumediteysoggests tha a possible Steateg for solving the aperture problem so lok atthe spatial and temporal variations ‘ofthe mage brigtines vers neighborhood of ach pois. Wael hs te sppers tobe adopted the vs tem frites194 Chapter 8 Motion 83.3. The Validity of the Constancy Equation: Optical Flow How well does (8.17) estimate the normal component of the motion field? To answer this question, we can look at the difference, Av, between the true value andthe one estimated by the equation To do this, we must introduce a model of image formation, agoaunting forthe reflectance of the surfaces and the illumination ofthe scene. For the purposes of this discussion, we restrict ourselves toa Lambertian surface, Silluminated by a pointwise ight source infinitely far away from the camera (Chap: ter 2), Therefore, ignoring photometric dstorsion, we can write the image brightness, Eas on, (a9) where p isthe surface albedo, I identifies the direction and intensity of illumination, and mis the unit normal to a P. Let us row compute the total temporal derivative ofboth sides of (8.19). The only ‘quantity that depends on time on the right hand side isthe normal tothe surface. If the surface is moving relative to the camera wit translational velocity T and angular velocity, the orientation of the normal vector n will change according to dn Fawxn, (820) ‘where x indicates vector product. Therefore, taking the total temporal derivative of ‘oth sides of (8.19), and using (8.17) and (820), we have VET +E) = plo xm) (21) We can obtain the desired expression for Av from (818) and (821): Woxnl TEI ‘We conclude tha, even under the simplifying assumption of Lambertian reflectance, the image brightness constancy equation yields the true normal component of the ‘motion field (that is, | is identically 0 for every possible surface) only for (a) purely translational motion, or (b) fr any rigid motion suc thatthe illumination direction is parallel othe angular velocity. ‘Other factors being equal, the diference Av decreases as the magnitude of the spatial gradient increases; this suggest that points with high spatial image gradient are the locations at which the motion field can be best estimated by the image brighmess constancy equation. In general, |Av| is unlikely to be identically zero, and the apparent motion ofthe image brighness is almost always different from the motion field. For this reason, to Avoid confusion, we cal the apparent motion an optical flow, and refer to techniques, cstimating the motion eld rom the image brightness constancy equation as ptica flow techniques, Here isa summary of similarities and differences between motion eld and optical low. |av= 8.4. Estimating the Motion Section 84 Estimating the Motion Field 195 Definition: Optical Flow The optialfiovsa vector eld aj tothe contin (817, ands defined as the apparent ‘motion of the image brightness pattern. oa Optical Flow and Motion Field ‘The optical lw stheappraximation of the movin field whichcan be computed from time-varying image sequences. Under the smplilingestumpions of + Lambertian surfaces points light source at ininty + no photometiditortion the enor ofthis approximation ‘+ small at pins wit high pail rains * extly zero ony fr ratsational mtion or fran gd motion sue tha the iltminaton rction pail to the angular velocity We are now reacy to learn algorithms estimating the motion fl id ‘The estimation ofthe motion ed isa useful starting point forthe solution of many motion problems. The many techniques devised by the computer vision community cin be ‘oughly divided into two major clases differential ehniques and matching techniques. Differemial techniques are based on the spatial and temporal variations of the image brightness at all pixel, and can be regarded as methods for computing optical ov. Matching techniques nstea, estimate the disparity of special image points (eatures) between frames. We examine differential techniques in section 84.1; matching isthe theme of section 84.2. 8.4.1. Differential Techniques In recent (and not so wecent) yeas a large number of diferentil techniques for com puting optical low have been proposed, Some of them require the solion of a system of partial differential :quations, others the computation of second and higher-order “derivatives ofthe image brightness, others again least-squares estimates of the pa ‘ameters characterizing the optical flow. Methods in the latter clas have at least two advantages over thosein the frst tor ‘+ They are not iterative; therefor, they are genuinely local, and less biased than iterative methods by possible discontinuities ofthe motion fel + They do not involve derivatives of order higher than the first therefore, they are less sensitive to nis than methods requiring higher-order derivatives.Chapter Motion We describe differential technique that gives good results The basic assumption is that the motion field is well approximated by a constant vector eld, ¥, within any small region ofthe image plane." Assumptions 1 The image brighiness constancy equation sols a g00d approximation of the normal ‘component ofthe motion eld 2 The motion felis wll approximated bya conan ecor Sed within any smal patch of the image plane, ‘An Optical Flow Algorithm. Given Assumption 1 for each point p, within a small, 1 patch, Q, we can write (Ey vs E,=0 ‘where the spatial and temporal derivatives ofthe image brightness are computed at PisP2--- Py © A ype siz ofthe “small patch” Sx S Therefore, the optical flow can be estimated within Q as the constant vector, 7, ‘hat minimizes the functional vbl= > [weve el] na The solution to this least squares problem canbe found by solving the near system slave a, 62) The th row ofthe N? x 2 mati As the spatial image gradient evaluated at point ve) vem) Ae : 623) VE(wvs0) and bis the ?-dimensional vector ofthe partial emporal derivatives ofthe image brightness evaluated at p,..pyssaftera sign change: = = [EDD +5 EDN aa) (624) "Note tat ths iin arom with he rst coho of ston 82.3 (ton el orn planes) reputig the apposition of oth motion Sel Section 84 Estimating the Motion Field 197 ‘The least squares solution ofthe overconstrained system (8.22) can be obtained as!? (825) Fisthe optical flow (th estimate of the motion field) a the center of patch Q; repeating this procedure forall mage points we obtain a dense optical flow. We summarize the algorithm as follows eee Algorithm CONSTANT. FLOW ‘Tae input i time-varying sequence ofm images. 6,8... By, Let Q be a quite region of oe W pines (pica, N= 5), 1. tte each image othe sequence with a Gaussian fiter of standard deviation equal to, (ypcaly 0, = 15 pvels along each pail dimension, 2 iter each image ofthe sequence along the temporal dimension with a Gaussian iter of Standard deviation (epic o, = 15 frames) 112+ isthe size ofthe temporal iter, leave out the ist and last ages ‘3 Forcach pisel of each image ofthe sequence: (3) compute the matix 4 and the vector busing (823) and (824) () compate the optical fow wing (825) The outputs the ope How compute in the last step *= The purpose of spa ltering iso attenuate nos inthe estimation ofthe spatial image trade: temporal ering prevent aliasing in the time domain. Fo the implementation ‘the temporal isting imagine to stack the images ane om top ofthe ater and ier sequences of pines having te same coordinates Note tht the sie ofthe temporal iter islnked tothe maxmum speed that en be “measired” by the algorithm, ‘An Improved Optical Flow Algorithm. We can improve CONSTANT. FLOW byobserving that the ertormade by approximating the motion feldatp ith ts estimate atthe center ofa patch increases with the distance ofp from the center itsell. This suggests a weighted least-square algorithm, in which the points cose to the center of the patch are given more weight than those at the periphery. If W is the weight matrix, the solution, fy isgiver by jo = (AWA LAT Wb, Concluding Remarks on Optical Flow Methods. Its instructive to examine the image locaton at whict CONSTANT. FLOW fils As we have sen in Chapter the 2x 2matin tga( DB Dey = sae( oe, ne): (626) See Appendix ston Af aerate wap of svg oeonsaie nest cn198 (Chapter 8 Motion ‘computed over an image egion Q, is singular if and onl ill the spatial eradients in Q are all or parallel. In this case the aperture problem cannot be solved, and the only possibilty i to pick the solution of minimom norm, that i, the normal flow. The fact that we have already met the matrix ATA in Chapter 4is not a coincidence: the next section tells you why. ‘Notice that CONSTANT_FLOW gives good results because the spatial structure ofthe motion field of rigid motion is well described by alow-degree polynomial in the image coordinates (as shown in section 82.3). For this reason, the assumption of local constancy of the motion field over small image patches is quite effective 8.42 Feature-based Techniques The second class of methods for estimating the motion feld is formed by so-called ‘matching techniques, which estimate the motion fil at feature points only. The result isa sparse motion field, We start with a two-{rame analysis (nding feature disparities ‘between consecutive frames), then illustrate how tracking the motion ofa feature across ‘long image sequence can improve the robusiness of frame-to-frame matching ‘Two-Frame Methods: Feature Matching. If motion analysis is restricted to two consecutive frames, the sume matching methods can be used for stereo and motion.” ‘Thisistrue for bth ooeeation-based and feature-based methods (Chapter). Here we concentrate on matching feature points. You can easily adap this method fr the stereo ‘The point-matching method we describe is reminiscent of the CONSTANT. FLOW algorithm, and based on the features we met in Chapter 4. There, we looked at thematrix A’ A of (8.26) computed over small square image regions the features were the centers of those regions for which the smallest eigenvalue of A” A was larger than a threshold, The idea of our matching method is simple: compute the displacement of such feature points by iterating algorithm CONSTANT. FLOW. The procedure consists of three steps First, the uniform displacement of the square region Q is estimated through CONSTANT_FLOW, and added tothe current Aisplacement estimate (initially set to 0) Second, the patch Q is warped according to the estimated flow, This means that 0 is displaced according (othe estimated flow, and the eesuting patch, 0, ie resampled in the pixel grid of fame J. Ithe estimated fow ‘equals (1). the pray valu at piel (i,j) of ean be obtained from the gray values ofthe pixels of Q close to (i ~ vy, j ~ v2) For aur purpose, bilinear interpolation is sufficient. Third, the fist and second steps are iterated una stopping criterion is met. “Here isthe usual algorithm box, conaiaing an example of stopping criteria "pat heey mind the dicenion of ston 2: 08 the iferenses betwee see and mation dsp. "araterplaon meus thuthe interpolate i inearinetch ofthe oupte hoes to =.) = Section 84 Estimating the Motion Field 199 ‘Algorithm FEATURE. POINT MATCHING “The inputs formed by 1 and I Feature points inthe vwoframes Let Os, Os, and O bethree YN image rgions andr a fixe, posiie real number Leta ‘othe unknown dsplacenent between J and Fy ofa Feature point pon which Qi centered, Foal feature pons two frames ofan image sequence, anda set of coresponding 1. Setd=Oand center 0; onp. 2. Fstimate the displacement ofp, center of Q, trough (8.25) and etd = A+ 4 Let O'be the patch obained by warping Q; according to . Compute 5, the sum ofthe squared diference (ihe SSD of Chapter? etwoon the now pach Q and he correspond ing patch Qs inthe frame f 4. US r,s0t = 0 and goo stp i; otherwie ext ‘The outputs an estimate of foal eature pints. 'F Inboththe smoothing stage necessary tocompute the derivatives in (6.25), andthe warping ‘age of sieps 2 and 3 respectively, you should considera epiom ataly larger than Qt (ap by efector hiscabesyou tate the produ wound oun elles 7 © Analterative stoping criterion ito contol te etative vatation of the estimated flow at each tration and exit the lop if the elaive variation falls Blow a ined threshold Multiple-Franve Methods: Feature Tracking. As we assume to analyze long i age sequences, nol ust airs of frames, we can improve on two-rame feature matching ‘We start with an intuitive fact: ifthe motion of the observed scene is continuous, as itnearly always ig we shoul be able o make predictions on the motion ofthe image Points, at any instant, on the bass oftheir previous trajectories. In other words, we ex- Pect the motion of image points to be continuous and therefore predictable, in most ‘casex: we shouldbe abl o use the disparities computed between frames J, and 2, 11.2 and |, and so on, to make predictions on the disparities between f and i, before observing frame i Definition: Feature Tra Feature racking isthe pcblem of matching features from frame to frame in tong sequences of images ‘We approach traccing in the general framework of optimal estimation theory: ou solution isthe Kalman ter, For our purposes, a Kalman filter isa recursive algorithm ‘hich estimates the postion and uncerainty ofa moving feature point inthe next frame thats, where to Took fr the feature, and how large a region should be searched in the200 CChapter8 Metion nex fame, around the preived poston, tobe sue to find the feature withina certain Confidence. An introduction to the basi lements of Kalman iter theory i necessary {o understand ths section, and can be found inthe Appenci, section A. Readit now ityou are not fila wth the Kalman filter. ‘Let us formalize the tracking problem. A new frame of the image sequence is acquired and processed at each isan, = 1+, where ki natural number. The Sampling interval i essumed I for simplicity, and, more importantly, small enough to consider the motion of feature pints fom frame to frame near. ‘Weeonsideronyonefeaire pint, = [2's inthe frame acquired at instant tec moving with velocity v= [tasty] - We describe the motion on the image plane with the state vector x= [xe, yi, Yet, Uys]"s Assuming a sufficiently small sampling Interval (an therefore constant feature velcty between frames), we write the system rmodelof the linear Kalman ltr as peapei thai téint 29 memati where &-1 and m1 are zero-mean, white, Gaussian random processes modelling the -ystem noise In terms ofthe state vector x, (8.27) rewrites wan +m with 4 ‘Astomeasurements we assume that fas feature extractor estimates the position of the feature point p, at every frame ofa sequence. Therefore, the measurement model of the Kalman fer becomes Loo opm), ae[otool[t]e™ itp 2220-0 white Gaus aon proses olin te eee ‘Assumptions and Problem Statement In the assumptions of the linear Kalman filter (Appendix, section A}, and given the noisy observations, compute the best estimate ofthe features postion and velocity at instant nd their uncertainties Section 84 Estimating the Motion Field 201 Figure 88 An example of feature tacking over thre frames of talc eguense, The feature tracked the centroid othe eu, marked with cross The Kalman filter algorithm is summarized in the following equations, repeated here from the Appencx, section A.8 for completeness. Algorithm KALMAN. TRACKING = ‘The inputs formes at instant, by the covariance matrices of system and measuemeat rose at time ty, Qs and Ay respectively the time-invariant state matrix, @, the time-invariant measurement matrix, andthe position measurement atime 2. The ens of Py ate seo igh, abiary values Fm MP] + On Ke = PLE RCE + Ray B= Osh Riles MD) = KORT = Ka RT ‘The ouput isthe optima! estimation ofthe poston and velocity atime and their nce tame giver by the diagonal elements of Fh Two thins are worth noticing here. First, we donot just believe the noisy measure- ‘ments of py of the feature detector; the hltr integrates them with model predictions to ‘obtain optimalestimates Second, the fier quantifies th uncertainty onthe stat estimate, in the form of the diagonal elements of the state covariance matrix, This information allows the feature detetorto dimension automatically the image region to be searched to find the feature point in the next frame, The search region is centered on the bes positon estimate, and is larger the larger the uncertainty: The elemeats ofthe state covariance matrix re aually initialized to very large values; ina well-designed filter, they decrease and reach a steady sate rapidly thereby restricting the search region of an image feature within afew frames An cxample of tacking with Kalman filtering is shown in Figure88. Th centroid ofthe erin the mage (indicated by the white cross is tracked overtime, The sizeof the crossis proportional othe uncertainty inthe system's202 Cchapter8 Motion Two problems arse inthe implementation ofthis algorithm. SS ‘Missing Information Kalman ttering is based onthe knowledge ofthe following: 1. the system mode and the corresponding noise covariance mati, 2 the measurement model and the corresponding nose covariance mati, Ry ‘thei time sytem tat, and sate covariance mates, However several of these quantities are sully unknown, Data Assocation {nthe presence of several image eatures and muliple measurements which observed measure ments shoal be assoiated with which feature? “Missing Information, Fortunately, this problem i not as ba as you could expect. “The system model is usually wnkaown, but is assumed linear ifthe time sampling is fast enough. The measurement model is available, as we assume that feature positions are computed at each frame. A really critical parameter, instead, ithe relative weight Of model prediction and measurements expressed by the filter's gain, Ky. From the {equation of Kc, we see that the gain depends on the covariance matrices of system ‘and measurement noise. In particular, ifthe entries of Rare much smaller than those ‘f 0, (that is the system model is much noisier, and therefore more uneetain than the measurements), the Kalman filter ignores the prediction of the system model and relies almost entirely on measurements. Conversely, ifthe entries of Q, are much Smaller than the entries of Re (that ithe measurements are much more uncertai than the prediction), the filter ignore the measurements and relies almost entirely on the prediction ofthe system model. Clearly, one aims ata balanced situation to achieve the greatest benefit from the integration of measurements and prediction. To achieve a balance, one can estimate Rj on te bass of the information available on the measuring process then scale the entries of Qy,making them comparable with those of Finally the tate ants covariance canbe initialized far oftheir asymptoticvalues with norsk of compromising the filter convergence.* ‘Data Assocation, Ths is @ nontvial problem in general, as there may be many featuresto be tracked, You should look into the Further Readings fra detailed analysis of techniques dealing with it, Here, we just consider briefly the case of low cluter fand multiple but nonintrfering targets. Low clutter means thatthe likelihood of noisy features at each frame (e., false features, features appearing for one frame only) is low. Noninterfering targets means that feature pas do not intersect. I this case, the technique known as nearest neighbor daia association (NNDA) is the most effective NNDA just selects the measurement associated withthe updated state nearest to the Site ier asunptios ae said of ous Section 85 Using the Motion Field 203 @) ©) Figure 88 (a) Disjoin search regions of two features, centered ‘round the best positon estimates pp: the measurements my, my fare asociated to the closest estimates (b) Ith search regions erst, te minimum distance criterion ils predicted state (see Figure 8.9(a). It is good practice to measure the distance between Mates by means ofthe inverse of te state covariance matrix (Exercise 8.5), NDA is clearly suboptimal (Figure 89(4) and more sophisticated methods ae required to deal with high cutter and interfering targets 85_ Using the Motion Field [Now that we have various ways to estimate the motion field, what d timate the motion fed, what do we do with it? We targettwo tasks of practical importance, the reconstruction of -D motion and structure Problem Statement Given the movin ft etimated om ange sequen, comput the shape, os thers obo an ti mao ice ote ewingemera et (Once again we distinguish between methods using dense and sparse she note a tin thos using dense and sparse estimates of 85.1. 3-D Motion and Structure from a Sparse Motion Field Inthisseation we stint 3.D motion andstutur fom asparse st of mache img tcatures the average pry between onecteframesssmal he eonstcton an gin in slit ad obanes from he tne integration af log sequenecs frames In the cons th terape dpa bewee ames arg ths problem can be dealt with in asereoke fasion, for example by meats af he eight pint slzctimof Chapter” ppd to pir of frames OF he many metho proposed in he erature for the former cs, we hae stn the facorzaton me, Wish isimpleo implement and pes ery good and manera stb es or obecs ‘eed from ator le distances Te essay ssption are tanmarzed bel204 Chapter8 Motion “Assumptions: Factorization Method 1 The camera mode! sonographic 22 The postion of» image points. corresponding to the scene points Py P coplanar, have been tacked in frames with N23, Ps, not all x Note that Assumptions equivalent to acquiring the ene sequence before starting any processing This may or may nt be ascepable depending on the appicaton, Notice also fat sine the eamera models ortbographi, camera calibration cn be altogether ignored if we accept to reconstruct he SD points only up toa sale factor “The remainder of this section introduces the necessary notations, discusses the ‘rankheorem,on whichthe whole method based, andstates the complete factorization algorithm. Notation, We let pj = [1 nu] denote the _jth image point (j= 1...) atthe fat frame (f= +4), a think ofthe x and a8 ene of two > w matrices, ‘Cand Y respectively, We then form the 2» n measurement mati “fl where (629) ae the coordinates of the centroid of the image points in the i-th frame, Again, we think ofthe ¥ and jas entries of two NY x n matrices X and ¥, and form the 2 x m matrix W, called the registred measurement matrix (630) ‘The Rank Theorem. ‘The factorization method is based on the proof ofa simple but fundamental result, Section 85 Using the Motion Field 205 ‘WORLD REFERENCE FRAME Y FRAMES x Figure 8.10. Thegeometry ofthe factorization method. ‘Rank Theorem ‘The registered measurement mati (without nose) has a most ank 3 ‘The proof is baed on the decomposition (factorization) of W of (830) into the product of a 2V x3 matrix, and a3 x n matrix, S. R desribes the frame-to-frame ‘otation of the camera with respect to the points PS describes the points’ structure (coordinates). The poof is essential forthe actual algorithm, so we will go through it in dot ‘We consider all uanttes expressed in an object-centered reference frame with the origin inthe centoid of Py,...P, (Figure 8.10), and let, and j, denote the unit veetors of the image eference frame, expressed in the world reference frame and at tne instant. Ths he direction ofthe opt axis sien by the cross produto andj, kK Itcan be seen from Figuie 8:10 dat (631) (632) where 7; i the vector from the world origin to the origin of the #-th image frame; ‘moreover, 8 the origin is inthe centroid ofthe points, (633)206 Chapter8 Motion ‘Now, plugging (8.31) and (8.32) into (828), and using (829), we obtain, gal @-m)-2 De (Pe ~T) spi) -1)-1 Pie. 6s) But due to (8.33), and tothe fact thatthe index isnot summed, (834) become Suis ‘Therefore ifwe define the 2N x 3 rotation matrix Ras (835) and a3 xm shape matrix Sas S=[Pi Pr... Pads (8.36) wwe ean write Since the rank of R i 3 because N > 3, and the rank of Sis also 3 because the W points in 3D space are not all coplanar, the theorem is proved 1 Notice tempore ofthe assumption of noncoplanar pints ‘The importance of the rank theorem is wofold. Fis i tls you that there sa eat deal of redundancy inthe image data no matter how many points and views you fe considering the ank ofthe reitered measurement mati doesnot exceed thee Second, and most importantly, the factrization ofthe registered measurement matrix, Was the product of Rand soggestsa method for reconstructing structure and mation from asequene of tracked image points Section 85 Using the Motion Field 207 The Factorization Algorithm. The factorization of Wis relatively straightfor Ward. First of all, note that this factorization isnot unique: if Rand S fatorize W, and ‘Qisany invertible x }matrix, then RQ and Q~Salsofactorize W. The prootissimple: (ROO) = RIO Fortunately, we ean adi two constraints RS=W, 1. the rows off, thought ofas -D vectors, must have unit norm; 2 the fist n rows of (the i) must be orthogonal tothe corresponding last n tows (che) Our ast ettort before aching analorithm box sto show that these constraints alow us to compute factriation of which unique upto an unknown inital orientation ofthe world reference frame with respect to the camera frame (Figure 810). At the same time, we aso show how io extend the method to the casein which, due to noise ov imperfect matching, the rank ofthe matrix Wi greater than 3. Here ithe proot Fin, consider the singular value decomposition (Append, section A.6) of W, Weupv’, (837) ‘The fact thatthe rank of W is greater than 3 means that more than 3 singular values along the diagonal of 0 will not be zero, The rank theorem can be enforced simply by setting all but the tree largest singular values in D to zero and recomputing the corrected matrix W from (837). ‘© _Bynow,thisshouldnot suprise you. We wed the same method elewhere;e. to compute the closest rotation matrix to a numerical estimate ia Chapter 6 Notice tha if the rato between th hid and fourth singular value isnot are, s expected the SVD wan you bout the conssteny ofthe dat. ‘Then let D'be the3 « 3top left submatrix of D corresponding to the three largest singular values 01,0, and 03, and U’ and V’ the 2N x 3andin x 3submatrices of U and V formed by the columns cosresponding too;, 01, and es, (828) In general the ows and ofthe mtrs wi ot sats the constrains mentioned above: hawever, if we ok fora matix Q such hat (839) then the new matrices R= i¢@ and S$ = Q~1R still factorize W,and the rows of R satisfy the constraints. The obtained factorization is now clearly unique up to an arbitrary208 Chapter8_— Motion rotation, One possible choice isto assume that at time ¢=0 the world and camera reference frame coincide. Here isa concise description of the entire method. A method for determining Q from (839) is discussed in Exercise 88. ‘Algorithm MOTSTRUCT_FROM_FEATS “The input isthe regintered measurement mati W, computed fom n features tracked over N 1, Compute the SVD of W, Wevv, where U is 82 «20 matrix, Vn xn, sd D2W x UTU = 1, V"V = fs and D isthe ‘agonal matrix ofthe singular values 2 Seto zero al but the three largest singular valves in D. 1. Define and Sain (85). 4. Solve (839) for Qfor example by means of Newton's method (Exerise 88). ‘The output ae the rotation and shape matrices given by R= RO and 8-075, The algorithm determines the rotation of a set of D points with respect to the ‘camera, but how about their translation? The component of the translation parallel to ‘the image plane is simply proportional to the frame-by-frame motion ofthe centroid of the data points on the image plane, However, because ofthe orthographic assumption, the component ofthe transaion along the optical axis cannot be determined. 852. 3-D Motion and Structure from a Dense Motion Field ‘We now discuss the reconstruction of 3-D motion and structure from optical ow. The two major diferences with the previous section are that += optical Now provides deuse but often inaccurate estimates of the motion field; ‘the analysisis instantaneous not integrated over many frames. Problem Statement Given an optical ow and he intras parameters ofthe viewing camera, recover the 3D motion an statue ofthe observed scene with respect othe camera reference fame, ‘We have chosen a method that represents a good compromise between ease of ‘implementation and quality of results. The method consists of two stages: Section 85 Using the Motion Feld 209 1. determine the rection of transiation through approximate motion parallax, 2, determine a least-squares approximation ofthe rotational component of the op tial flow, and weit in the motion field equations to compute depth. Stage I: Translation Direction, "Th fst stage is rather complex. We start by explaining the solution in the ideal case of exact motion parallax, then move tothe as of approximate farallax, We learned in section 8.2 thatthe relative motion fed of| ‘wo instantaneously coincident image points, [Av,, Av,] is directed towards (or away from) the vanishing oint ofthe translation direction, py (the instantaneous epipole), according to 1 ounaa-tn (5 7 ayato-n($ where Z and Z ae the depths ofthe 3-D points P= (X, ¥,Z) and P= (X, ¥, 2), which project onto the sameimage point, p= [,y]", inthe frame considered. IF(840) can be ‘written fortwo different mage points, we can locate the epipole, p,as the intersection ofthe estimated, relative motion fields. Once the epipoleisknowa, itisstraightforward to get the direction of translation from (83). ‘If (8.40) canbe wit for more than two points, we ca esto least squares to abana ‘beter esate ofthe epipole’s locaton. ‘his solton ea be extended tthe mor east ease of approximate motion paral, in hich the esimates thereat maton elare svat on fo aot void ie pots The ey obser hat te direcsDetveen hope fow vers ean mage on panda any poi: as can be regarded nosy tsa of he mot paral p(eeon 82) "Ne mt now eit th (40) forthe ae of aproximate paral. We tein by wating be utes ad voto component oe rate ton Sl fez. au?) and [ap 05)” respect for vo alot oinident image points, p fndp There Ts Ts (al) Av! =0x0-5)+ 2ay-39)-Ba?—¥ 7 i (a) Sete ete 2) Fay a+ B08 F 7210 Chapter8 Motion From the rotation equations we notice that Av) > Oand A’ Ofor p> p. Asto the translation equations we ean rewrite them as ateas-nn(t ry a=. (5 The second tems ofthe right-hand side ofthe (8.43) tend to zero for > p, while the first terms tend tothe expression obtained forthe exact motion parallax, (40). We can therefore write the relative motion feld of two almost coincident points concisely as uy with e- and e, smooth functions ofthe diference between pnd, and e (0) =¢,(0)=0. ‘Equations (844) show that, if p and p are close enough, a lage relative motion field can only be due to a large difference in depth between the 3-D points P and P. ‘This observation suggests a relatively simple algorithm for locating the instantaneous epipole (end therefore the direction of translation) from a number of approximate ‘motion paallx estimates, We compute the ow differences (Av, ;) between apoint py and alts neighbors within a small patch Q,, then determine the eigenvalues and eigenvectors ofthe matrix an[ Zee, Sone"). a Ladvdey LA%y here the sums are taken over the Qi; the eigenvector corresponding to, the greater cigenvalue, idemifes the direction ofthe line through p, which minimizes the sum ofthe squared distances to the set of difference vectors (Append, section A.6). This «rection i taken o be the optimal estimate of motion parallax within the patch Q, “Morcover,y itso canbe rogarded asa moneure ofthe eatimate’s relly. IF slang the underlying distribution ofthe flow differences has a peak inthe direction fi. Thisis likely to be due tothe presence of considerable differences in depth within (Instead, if is small, the underlying distribution of the flow differences is ater, and almast certainly eeated by the flow fild ofa surface that doesnot vary much in depth within 0. "Oe might rue hat wht aly cous sould be the ati eee he sal nd eater eieavaes. However te the rang of and br fist, the restr peau sagen abs tres Section 85 Using the Motion Field 211 ‘We can now orulate a weighted east squares scheme to comput the intersec- Son of th several lines that, the epipole py, Snce hand te coplanar tach patch Q, we cane dxw'r (245) [there are N patches, we can write simultaneous instances of (8.46), that in matrix notation, bp=0 (47) with co) iv x ph The problem of determining least-squares estimate of psthusreduced tothe problem of solving the overcorstrained homogeneous system (8.47). As customary the solution can be found from the SVD of B, 8 = UDV" (Appendix, section A.6), asthe column oF V corresponding tothe null (in practic, the smallest) singular value of B. «© _Inorderio give appropriate weighs othe different estimates, itishetter to use a weighted leastsquae scheme and consider the matrix WB, where the ens ofthe diagonal matt W are the larger egeavaucs of , Stage Two: Rovedonal Flow and Depth The res ofthe algorithm is stig forward. We simply frm the pointwise dot producto, between the optical ow at point p; = [x),]" and the vector [y; ~ yy, —(xi — 4p)]". As we know from section 82, 1 depends only onthe rotational componeat of matin; therfore, teach point py of the image plane we have = Hf01— dW) 64) with and v asin (842. the intrinsic parameter ofthe camera are known, we can ‘rte inear stem: simultaneous instances of (848) in the image reference frame by using (814), and scve forthe three components ofthe angular velocity using least squares. ial, we recover the traltional direction from the epiole coordinates by means of (89), and sove (87) fo the depth Z of each image point tis now time tosummarze the method Algorithm MOTSTRUCT_FROM_FLOW The input quantities are he intrinsic parameters ofthe viewing camer Sold, ¥, produced by singe rigd motion und a dense optical fowa2 ‘Chapter8 Motion 1 2, For each image point p, ‘Write (87) nthe image reference rame, sing the knowledge ofthe inti parameters (@) compute the Now diferences Avy and ey between he optical fow tp, and atallthe points pina neighborhood of () compate the eigenvalues and eigenvectors ofthe matrix A of (845); let be the sreater eigenvalue, andthe uit eivenvevtr corresponding 10 a Compute the SVD of WB, WB =U DV", with Basin (8.48) and W a diagonal matrix suck i. Estate the eppole py a the column of V corresponding to the smallest vale 44 Form the dt produc of (848) f= 1... and rewrite the equations obtained inthe mage reference frame 5, Determine the angular velocity components asthe least squares solution of system of simultaneous instances of (8.9). 6 Determine the translational direction trom the epipol cordinates and the knowledge of ‘he intrinsic parameters (se (89). 7, Solve (8.7) forthe depth Zo each mage pin. ‘The output quantities are the deci of wansation, the angular velocity, and the 3D coordinates of the scene pots, © Notice that, as discussed in section 825, the epipoe canbe estimated withow prior Inowledge ofthe camera parameters, that ig, wth an uncalibrated camers. The diestion of ransation, instead, canbe obtained from the eppoe nly if the iniasic parameters of ‘he camera are known MOTSTRUCT_FROM_FLOW is not as accurate as MOTSTRUCT_FROM_ FEATS. This is not surprising, as MOTSRUCT.FROM_FLOW is an instantaneous ‘method, which rlis on local approximations ofthe observed motion onthe assump ‘ion of large variation in depth inthe observed scene, and on the accuracy of camera calibration ‘86 Motion-based Segmentation Inthisfinal section, we relax the assumption thatthe motion between the camera andthe scene is desribed by a single 3-D motioa to deal with the problem of multiple motion. For the sake of simplicity, we restrict the analysis tothe ease in which the camera is fixed. Ifyou ace interested in motion segmentation inthe presence of camera motion, problem which sil waiting fr a general and satisfactory olution, see the Further Readings. I the camera is fixed, identifying moving objects can be seen as a problem of, detecting changes against a fied background. Section 86 Motion-based Segmentation 213 ‘Problem Statement Given a sequence of images taken by a Oxed camer, tnd the regions ofthe image, if any, corresponding (othe dierent moving objects. This problem can be thought of asa dasifiation problem. One has o classify the pines ofeach frame of a sequence as either moving or fted with respect tothe camera. On the bass of what we have seen so far, possible procedure seems to be the computation of optical ow followed bya thresholding procedure. The pixels for which the norm of te optical ow is large enough are labeled as "movin, he others 25 “fixed”. Two eriicams canbe raised against this approach. Fist, the estimation of optical ow inside ptces that contain pixels from independently moving objects s usualy rather poor. Sond a many applications surveillance tasks fr example) the detection of motion o changes in the scene sal that needed Presumably the simplest strategy for detecting changes in an image sequence i image diferencng: (take the pointwise ference between consecutive frames, ad (i label as “moving* the pines for which this difference exceds a predetermined ‘threshold, ‘Algorithm CHANGE, DETECTION ‘The input isan image suenc, If... ly and postive ral number, x:for each image pat (esha) 1 Compute the potewise image dterence Au, j)= haf) ~ 1 2 if) > «label piel jo the frame tas moving The output isa map the moving image regions ‘The threshold must be chosen so thatthe probability of mistaking differences created by image noise for real motions very smal simple way todo this isto acquire ‘wo images ofa static scene, with no illumination changes and look at the histogram of the diference image. fn agreement withthe assumption that this diferenoe is mainly due to the camera noise (see Chapter 2), the histogram should look ikea zero-mean Gaussian function (See Figure 8.11). The threshold can thus be estimated asa muliple ofthe standard deviaton ofthe computed distribution, Ofcourse, the mox diferent the images used to estimate the standard deviation, the more ‘ccuately the histogram eects the cstbuton of the nose. Better results cax be obtained ina numberof different ways One possibility is to resort to statistical tess (se Further Readings) A second possiblity so take motion ‘measures more sophisicated than image differencing. For instance, an alternative 10 image differencing is he weighted average ofthe normal flaw magnitude, |E|/IVEl,214 Chapter8 Motion H H “a a 35000. 30000. 25000 - 20000 - 5000 murmber ofriseis 10000 - 5000 10 109530 $0 Ss gray eve diffrence Figure 8.11 ‘Tan sample snaahoty, taken a cle int frm an owemig, surveillance sequence, and the histogram af the dierences ofthe ya values Between the Wo ‘vera small patch Q; centered at point py. The weights are taken tobe the square of the norm ofthe spatial image gradient, so that this motion measure can be written as (lI DIVER M Where the sum are taken over Qj, and the temporal and spatial derivatives are meant to be computed at each point of Q,. The purpose of the constant C isto remove the instability that may be caused by uniform patches. The choice ofthe weights agrees section 88 Further Readings 215 withthe results of secion 83, which proved thatthe difference between the true motion field andthe apparent motion ofthe image brightness smaller at the locations where the norm of the spatial image gradient is larger. The conversion of this idea into an algorithm for change detection is straightforward, and lft as an exercise. 8.7 Summary After working trough this chapter you shouldbe able to: 2 explain the fundamental problems of motion analysis in computer vision 2 estimate the opieal ow from a sequence of images (@_-match and tracking image features overtime a estimate 3D motion and structure from both dense and sparse estimates of the ‘motion field 1D detect changes in image sequences taken from a fixed camera 8.8 Further Readings ‘The discussion on ihe difference between motion fed and optical flow is taken from Versi and Poggio 19} The method descrined here for estimating optical flow is due to Lucas and Kanade [1]. A variational approach to the computation of optical fw was first proposed by Hom and Schunck [8], Of the many parametric methods proposed since see, for example, Campani and Vert 3] (in which you ean also find a discussion ‘on the spatial properties ofthe motion fel). For strictly local computation of optical flow you should start with Nagel’s work (see [14], for example). correlation-based method is described by Poggio, Little, and Gamble [15]. Finally, the paper by Barron, Fleet and Beauchemin (2} san excellent review of many methods for estimating optical flow, and includes Internet sites with public-domain code ‘The literature on feature matching for motion is also vast, Amoag the many techniques available, you may want to look atthe classic books by Ullman [18] and Hildreth [7] A nice and simple account of Kalman fltering is the paper by Cooper [1]. Alternatively, you ca look atthe very clear and complete book by Maybeck [13]. Data association is diseusst, for example, in the book edited by Bar-Shalom (I) The factorizaticn method for +-D motion and structure from sparse features is ddue to Tomasi and Kanade [17]. Among the other feature-based method we suggest the technique proposed by Faugeras, Lustman, and Toscani[]. Te epipole’s method for reconstruction frcm optical ow is based on the paper by Rieger and Lawton [16] and onan idea originally suggested by Longuct-Higains and Prazday [11], Part of the algorithm MOTSTRUCT_ FROM. FLOW is based on the implementation proposed inthe appendix ofthe paper by Heeger and Jepson [6], the main topic of which san alteratve algorithm to MOTSTRUCT_ FROM. FLOW, Fora thorough analysis of change detection see Hsu, Nagel and Reker [9]. As to ‘motion-based segmeatation inthe general ease, you may want to look atthe work of Irani, Rowsso, and Peg (10, for example, and references therein26 Chapter Motion 89 Review Questions 81 Whatare the properics of the motion eld generated by a planar surface? Woald an arbitrary oriented planar surface generate x motion fel across the cate age lane? 4&2 What isthe relation between the instantaneous eppole andthe foes of expansion or contraction)? 5 83 What isthe erence between motion feld and opi ow’? 3 84 The derivation of (817) assumes thatthe mage rightness icontinuos and canbe diferentated, How puss this assumption in general? Why? 5.85 What are the astumptios behind the algorithm CONSTANT. FLOW? 3.86 Givenourdisusson ofthe Kalman iter fr festr tracking, how wouldyou decide the shape and iv of the earch resions given the uncertainties prodoced ty thefiter? 13 87 How would you decide whether the effective rank of W in algorithm MOTSTRUCT FROM. FEATS is 3 88. What happens i you apply MOTSTRUCT. FROM. FLOW tothe opis flow ofa plana suis? © 89. Why change detection methods are not useful (a themselves) fr motion based segmentation inthe geeral ese? Exercises (© AL Estimate the ratio between the quadratic and linear terms, the quadratic and constant terms, and the linear and constant terms in (8.13. Set the motion and structure parameters ofthe plana surface to some arbitrary but reasonable rales © 82 Show that the aperture problem can be solved if corer i visible through the aperture. © 83 Extend algorithm CONSTANT_FLOW by assuming that the motion fields locally approximated by a linear vector field © 84 Cicate a simple, symhetc image sequence by sifting a given image by [eset] piel per frame (with ug and vy not integer, and using bitnea iter. polation inthe resampling tage, Then, apply the inverse transformation; that is, ‘eate an image sequence starting from the last frame and applying a motion of [Ces,—v]' pixel per frame. Compa the las frame ofthis sequence withthe ‘original image. Do you expect them tobe exactly equal? If nt, why? (© 85 Show thatthe distance between system states computed through the covari- lance matrix, named Mahalanobis distance, isnot isotropic. What isthe advantage ‘of using Mahalanobis distance instead of the usual Fueidean distance? 9. 86 Write the Kalman filter equations fr a one-dimensional state vector. Section 89 Review 217 © 87 ‘The celebrated structure-rom-motion theorem, due to Shimon Ullman, states that, under orthographic projection, at least views of atleast n noncoplanar points are reeded to uniquely recover structure. Can you guess the values of Nand from the factorization method? (2 88 The nonlinear system (8.39) can be solved by means of Newton's method, anterative prevedure, the main idea of which is rather simple. Write down the {ull system of 3 quadratic equations inthe nine entries of the matrix Q, say 4is---+4y Starting {rom an inital guess forthe values of gi... 95 (lke the identity matrix for example, take the partial derivatives of each equation with respect fo each unknown, and evaluate the expressions obtained atthe current, value ofthe enties If Mi denotes the partial derivatives ofthe i-th equation with respect f0 gj, the matrit Mi, can be viewed asthe system matrix of the linear Mag: where Aq=[42,... gs] the components of which shouldbe used to update the current estimate of gi, ....qs- The components of the 30-D vector € are the residuals of all equations, computed by means of the previous estimates of iy. 44 The procedure is iterated until the components of 44 are sufficiently smal, implement and solve this method fo estimating 0. (2&9 Show that Step 3 of MOTSTRUCT_FROM_FLOW works equally welleven ifthe epipole isa init Projects © 81 Implement a coarse-tosine versio of CONSTANT_FLOW. Build the coarse levels by iteratively averaging ovr the fur neighboring pites at the Jumediatelinrleves Stat CONSTANT _FLOW atthe coarsest vel and propagate the estimtes to each corespondig group of fou pitels atthe nr level Compare the results with a standard implementation of CONSTANT. FLOW. ‘Why should thtwo methods ctf? Which method is expected to perform better inthe presence of large image displacement? © 82 Inplemen! FEATURE_POINT MATCHING and MOTSTRUCT_ PROMPEATS Use these of matched features, output of the forme, a input ofthe later. © 83 Implement CONSTANT FLOW and MOTSTRUCT_FROM FLOW. Use the optial fowestimates output ofthe former as input of he lat. (© 84 Implement a tracking system based on the Kalman filer, and use i to ack moving object viewed from afixed camera, Use the centroid af the largest set of ‘cnnected pte in which changes detected, an measue the objec’ position through CHANGE, DETECTION. Lette centroid position and velocity be the system stat. Devise a simple data association algorithm to deal with the possibilty ‘of detecting changes in more than one large set of connected piel218 chapter8 Motion References 0 el 8 “1 6 9 a io 8) to) un) 12] 3] i tus] i) rr 18) 9) pe Y. BarShalom (ed), Mul-Target Mult Senor racking, Artec Howse (1950. 4L, Barron, DJ Flee, and S. Beauchemin, Performance of Optical lw Tehniques Iemationl Journal of Computer Vision, Vo. 12, pp. 43-T7 (1984), M, Campaniand A. Ver, Motion Analysis rom Fist Order Properties of Optical Flow, CVGLP: Image Understanding, Vo. 6, 9p. 90-101 (199) [WS Cooper, Use of Optimal Estimation Teor, in Particular the Kalman Fite, in Data ‘Analysis and Signal Pressing, Review of Sint Isramenaation, Nol. 7, pp. 2862-2869 (1986), ‘OD. Faugeras F Lastman, and G. Toscan, }D Structure from Point and Line Matches, Proc. It International Conference on Compa Vision, London (UK), pp. 25-34 (1987) 1D Heegerand A.D. Jpion Subspace Method for Recovering Rigid Motion I Algoritim ‘nd Implementation, International Journal of Computer Vision, Vo. 1, pp. 95-117 (192). EC Hildreth, The Measurement of Vual Mion, MIT Press Cambridge (MA), (1984). KP. Horn and BG, Sehunek, Determining Optical Fos, Arca Duligence, Vol. 17, pp 185-203 (1981), ‘YZ. His, HHL Nagel and G.Rekers New Likeinood Test Methods for Change Detection intmage Sequences Computer Vision, Graph and mage Processing, Vol. 26, pp. 73-108 (2989, IM Irani, B Rowso, and S Peleg, Computing Ocloding and Transparent Mtions, Inter rational Journal of Computer Vision, Vo. 2, pp. 5-16 199) H.C Longuet Higgins and, Praday, The Iterpretaton ofa Moving Retinal Image, Pro. Royal Soe. London B, Vo. 208, pp. 88-897 (980, BD. Laces and: Kanade, An Herative Image Registration Teehsigue with an Appice tion to Stereo Vision, Pro. Tih International Joint Conference on Aral Inligence, Vancouver (CA), p. 64-679 (181) PS. Maybeck, Sochae Models, Estimation, and Control, Vol I, Academic Press, New York (197). H.-H, Nagel, Dislacemeat Vctrs Derived from 2nd Order Intensity Variations in Image Sequences, Computer Vision, Graphics, and Image Proesing, Vol-21, pp. 85-117 (1983) “Poggio .Ltle E. Gamble, Pall Optical Flow, Nature, Vl 30, p.375-378(1986). UH, Rieger and DT. Lawton, Processing Dillerentil Image Motion, Zoural ofthe Optical “Society of America, Vol. pp 384-359 (1985). Tomas and: Kanads, Shape and Motion fom Image Steans under Orthography: 3 Fctoization Method iernatonal urna of Computer Vision, Vol 9, pp. 137-154 (199). S Ullman, The Inerpretation of Vinal Motion, MIT res, Cambridge (MA) (197). [A Ver and T. Poggio, Motion eld and Opt Flow: Qualitative Properies, IBEE Transactions on Pater Anas Machine Ineligece, Vo, pp 490458 (1989). “. Vigil and OD. Faugeras, Foed Forward Recovery of Motion and Stustare from a Sequence of 2D-ines Matches, Proc. 3rd Intemational Conference on Computer Vision, (Osaka (Japan), ps 517-520 (19). Shape from Single-image Cues Je voutais pes erever ‘Sans soir lune Sous son faux aide thune ‘Aun cbté point! Boris Van JeVoudras pas Crver The subject ofthis chapter, nferrng the shape of objects from a single intensity image, is a Chi polem of omer son, We donot sonser ange imagen which shape sandy explicit Chapter Overview Section 91 lists the main methods for inferring shape from intensity images, Section 9.2 introduces the concept of reflectance map and the problem of shape from shading fom a physical and mathemztical viewpoint 7 Section 9.3 discusses a metha for estimating albedo and iltoninant direction, Section 9.4 describes a methed for extracting the shape ofan object from a shading patter. Section 9.5 shows how shape can be computed from the distortion oD texturescaused bythe aging projection, considering deterministic and statistical textures, ‘What You Need to Know to Understand this Chapter + Working knowledge of Chapters 2 and 4 + Working knowledge of Fas Fourier Transform (FFT), "would or wit oe wiht knowing wb the ons, Blin be fi ko is poi sce 2920 ‘Chapter Shape from Single-image Cues 9.1. Introduction ‘A common experience of our everyday life isthe perception of solid shape, indeed so ‘common that itis hard t appreciate its ull complexity. We percsive without effort the ‘shape of complex abject ike faces and cars, irregular surfaceslike mountains and even of changing surfaces ikea ee inthe wind. In spite of this apparent simplicity, shape reconstruction has proven avery hard problem for computer vision; indeed, one which is solved only partially. Many methods, collectively known as shape from X, have been proposed for reconstructing -D shape from intensity images, and the algorithms in the previous two chapters can be regarded as shape-from-X methods (ie, shape from stereo, shape from motion) Shape-from-X methods exploit a large variety of image cues; Table 9.1 mentions the best-known ones, and classifies them according to two important characteristic: the number ofmagesneeded, and whether or notthe method requires purposive modification of the vison systems parameter; that i, whether the method is ative or pasive. "The active-passive distinction is worth commenting. In ative methods, the vision system's parameters are modifed purposively; for instance, a shape-from-defocus system control the focus ofthe lens to acquire two or more out-of-focus images, and use estimates ofthe focus level at each piel o compute local shape. Shape-from-motion systems can be either ative (ifthe sensors are moved purposively to generate a relative motion between sensor and scene) or pasive (if relative motion occurs without purposive sensor motion)? This chapter concentrates on shape reconsructon from a single intensity image. Notice that we make no assumptions on the shape ofthe objects inthe scene, This, situation is similar to trying to make out the shapes of unknown, solid objects from a single photograph, Here is concise statement ofthe problem, Problem Statement Given a single image of unknown objers. reconstruct the shape ofthe vil suraces. As suggested by the table above, various eues can be exploited to tackle the problem. This chapter discusses two of them in detail, shading and rexture, and gives algorithms for computing shape from shading and shape from texture. Te reasons for choosing these two methods are that they do not require special hardware (unlike for instance shape from zoom or focusidefocus, which require a motorized lens), and do ‘not depend on image preprocessing (unlike for instance shape from contours, which assumes thatthe contours of objects have been identified). cv von denotes whol ses compue vison he np of bch ef eyo recor in ser the term ef orate fr bration, ah et son pee, mpes re anata at ‘Spurl parently norer lo comp pease Thus eto nd obser nerat ata, {vison isnot tou intereting ed saphos or eguenes For mason ave slo, set Farther Reaings Section 82 Shape from Shading 221 Stopes How may ingues Mebaliype _ Fadmoein stereo Dormer pase Clap Motion asequnce—setepassne: Chapters Foceidekens Doe more sive Farther Readings Zoom doemore scive Farr Resins Conte singe pasive Farther Readings ‘Temure single pauine ——thischpter Sain, singe pine ——_thisater Table 9.1 Shape-from-X methods and their lasitcaton, ‘= Notice that our problem statement implies the use of intensity images as range images are themselves arepesentation of D shape. In ft the problem ofthis chaper canbe egarded as one f producing a rang mage from an bensty one. 9.2. Shape from Shading ‘Shape from shading wses the pattern of lights and shades in an image to infer the shape. ‘ofthe surfaces in view. This Sounds very useful also considering that single intensity images are easier to aoquite than stereo pars or temporal sequences Unfortunately, the problems we face in shape from shading ae considerably more complicated that any others encountered so far. There are at least two main reasons fr this fact: the fist has todo ith he physi ofthe problem, the second with the mathematics. We willhave to look into both in ceder to arrive ata method for computing shape from shading In spite of its ficulties, shape-from-shading has been used successfully in var ious applications. A typical example is astronomy, where shape from shading is used to reconstruct the surface of a planet from photographs acquired by a spacecraft. An ‘example is given in Figure 9.1, which takes forward the Magellan example of Chap- ter7. Here ashape-from-shading algorithms refines an inital range image obtained by stereopsis 9.2.1 The Reflectance Map "The fact that shape can be reconstructed from shading is duc 0 the existence of a fundamental equation that links image intensity and surface slope. In order to establish this equation we need to introduce the important concept of refectance map. We fst discuss the simple cae of Lambertian surfaces. Tn Chapter 2 vo have seen that a uniformly illuminated Lambertian surface appears equally right fomall viewpoints The radiance L ata3-D point, P, is proportional tothecosine ofthe angle between the surface normal andthe direction of theilluminant: en222° Chapter9_ Shape fom Singlesimage Cues i) Figure 9.1 (a): 0 rendering ofthe surface reonsteucted by stereopis from the stereo pairon page 142. (0): the refined surface obtained by te shapefrom shading lgoritm: notice the increase level of deta: Courtesy of Alois Gller, Insitute for CCompater Graphics und Vision, Techical University of Graz withthe vector which gives the illuminant direction (pointing towards the light source), nthe surface normal at P=[X, Y,Z]" and pte effective albedo; thats, the real albedo times the intensity of illuminant. Tsiessthe dependence of the radiance on the surface normal, we rewrite (91) a8 02) Section 82 Shape from Shading 223, We use R instead of L because (92) tells you how light is reflected by a Lambertian surface in terms of the surface normal for any given albedo and illuminant. This is @ particular example o: reflectance map. In general, the function Rismore complicated ‘or known only numerically through experiments 92.2. The Fundamental Equation InChapter?2, we alsomet the imagelrradiance equation, which, denoting with p(s, 9)” the image of P, we wrote as (44 2) 105 (4) cate, (03) ith £(p) the brightness measured onthe image plane at p. We now make two important assumptions: 1, Weneglect the constant termsin (93) and assume that the opicalsystem hasbeen, calibrated to remediate the cos effect. 2, We assume tha all the visible points of the surface receive divetilumination. Assumption 1 simplifies the mathematics of the problem, and Assumption 2 avoids , becomes therefore asthe average ofthe rightness ‘ales ofthe given image, and hene look a (9.12) as an equation for albedo and slant. "Asimiar derivation forthe average ofthe square othe image rightness < E>, (see Exercise 91 for some hit) give ‘Wearesilleft withthe problem of estimating r, the tt of the illuminant, Thiscan be obtained from the spatial derivatives ofthe image brightness, Fund Through, some simple but lengthy algebra that we omit (if you are curious, see the Further Readings) one finds tan = 916 ee) with < £, > and < E> the averages ofthe horizontal and vertical components of the direction ofthe image spatial gradient, [E,,£,]" =(E2 + £3)* [Ex]. Wenow summarize this method Section 94 Variational Method for Shape from Shading 229 ‘Algorithm APPRX_ALBEDO_ILLUM_ FINDER ‘The inputs an tensity image of Lamberton surface. 1 Compute the average ofthe image brightness < E >, and of its square, 2 Compute the spi image gradient, £1", and let, £,]" be the wi vector giving the direction of," Compute the average ofboth components, < Ey > and < Fy > ‘3. Estimate p, cos o, and tan through (9.14), (2.15), and (2.16 ‘The ouput ae estimates of co nd tan ‘This method gives reasonably good resus though it fis or very smal and ery large slants However, one ofthe hypothesis underlying the derivation of his method is nc sister. I might tave ocurred to you that a surface whose normal vectors ae wiry tributed in 3 spaces ikely to give rst seltshadowing. especial fr large ant of the illuminant diction. This means that in some of aur precomputed integrals the image brightness was negative. To avoid this inconsistency, the ntgrals shouldbe cvluated a> rmeicaly using (17) 8 dfiton of mig brighiness For details on the conseat (bat considerably nox complicate) version of APPRX_ALBEDO_ILLUM FINDER reler to the Further Readings Curiously enough, the “consent” method doesnot improve ‘uch the ial rest. ‘We are now ready to search fora solution to the shape from shading problem. 9.4 A Variational Method for Shape from Shading We picked one of the many existing methods based on a variational framework, the solution of which gives the slopes of the unknown surface Z. 9.4.1. The Functional to be Minimized Even under the simplifying Lambertian assumption, the direct inversion of (9.6) is very dificult task Inessence, one as to solve a nonlinear paral differential equation in the presence of uncertain boundary conditions. For many such equations, even slight amaunt of nie: can means that the sation (a) des not exist (bs not unique ‘or (c) does not depend continuously on the data,‘ Our equation is no exception. If you are interested to the mathematical aspects of this problem you will nd pointers in the Further Readings. ‘A typical trick to circumvent at least existence and continuity problems (con- litons (a) and (c) i to recast the problem in the variational framework. Instead of ooking for an exact solution to (96), we allow for some small deviations between the “inte mathemati true, poem fr wit et oe he omits or hls tobe pos230 (Chapter shape from singl-image Cues image brightness and the reflectance map, and enforce a smoothness constraint which controls the smoothness ofthe solution. One possible way to implement this ida isto Took forthe tinimum of a funcional £ ofthe form b= [ted (Cusy Robots tate)). — Can in which the smoothness constraint is given by the sum ofthe spatial derivatives of p and g. The parameter 2 is always positive and controls the relative influence ofthe two terms inthe minimization process. Cleary, a large 2 encourages a very smooth solution nat necessarily close to the data, while a small 2. promotes a more irregular solution closer tothe data, Unlike the cas of deformable contours, the minimization of his functional cannot be performed effectively by means ofa greedy algorithm and we have to make use of the full machinery ofthe calculus of variations, 12. The Euler-Lagrange Equations ‘The eaeuls of variations gives you a straightforward procedure to derive equations minimizing pencicfuntonal the Euler-Lagrange equations. Thissectin simply tls Youhowtosetup these equations refers tothe Further Readings for more information ora funtional& which, ie (317), depends on two function p and g of wo real variables and y, and on their fist orde spatial derivatives, the Euler-Lagrange ‘equations read and Since Ris the only function ofp and. in (9.17), and neither E or R depend on Ps Py ‘ge and gy, the Euler-Lagrange equations associated with (9.17) become aR WAU FE 2p Bry and aR 20 RYE Digs 2iay “For ur pups ts dition fhe Esler Lagrange etion sect ovo maths» rather boing sat ch inate exe of alee. There podem sot the derivation othe Section 94 AVariational Method for Shape from Shading 231 Which can be simpli to give (018) and 9) With Ap and Ag deroting the Laplacian of p and g (that is, Ap = pus + pyy and Aq = due + dy)-Ournext tsk isto solve (9.18) and (9.19) for p ang 9.43. From the Continuous to the Discrete Case ‘etamns out that solving (9.18) and (9.19) i easier inthe diserete than in the continuous «ase. Thus we immediately proceed to ind the disrete counterpart of 9.18) and (9.19). ‘We start by denoting with pnd, the samples of p and g over the pixel grid atthe location (i,j). Trough the usual formula fr the numerical approximation of the second derivative (Appendix, section A.2), (9.18) and (9.19) become 1 Arist Peay Pray + gst + Pajnt =~ FEU) ~ ROP 4)) 2 (820) ” and 1 stay east acai toi HE Real, 021 ci © The paral derivaves of the reectance map in (220) and (921) are either computed snalytially in the Lambertian case for example) and then evaluated at py end OF ‘ealuated numerical fom the reflectance map sel, The problem is now reduced to finding the slopes py and 9, solutions of (9.20) and (921), and determining the unknown surface Z = Z(x, from them, 9.44 The Algorithm We observe that (9.20) and (0.21) canbe rewritten as a a R a e-e (922) pas Put and (023)232 chapter Shape from Single image Cues with (Appendix, section A2) px Metab ist Piss + bu + and gis P41 + Ais1 EMAL 7 [As pj and diy ate the averages of pj and qiy over the four nearest neighbors, (0.22) and (9-23) can be turned into an iterative scheme that, starting from some initial configurations forthe py and q,advances from the step k to the step k + according to the updating rule (924) and (025) However, where have all those houndary conditions gone? The answer isnot easy, because in many cases of interest the boundary conditions ae practically unknown, In this case, one attempts to impose “neutral” boundary conditions, ike the so-called natural boundary conditions (p and q constant over the image boundary), of the eylie boundary conditions (p and q wrapped around atthe image boundary). Since we are soing to compute the Fourier transform ofp and g, in what follows we adopt the eelic boundary conditions. Ifthe boundary conditions are known, they should be enforced in the iterative scheme at every step. 9.45. Enforcing integrability [As most pioneers of shape from shading, we too have left out what tums out 0 be a ‘ery important deta Ifyou tually atemprto reconstruct the viewed surface fom the normals obtained by iterating (24) and (228), you wll son ind out thatthe solution is inconsistent This shal surprising: since the functional was not told that p and @ were the partial derwvalives ofthe stme function Z, there sno Z stich hat 2 = p and 2, = 4. Toctcumvent this inconvenience, a good iea isto inser a step enforcing integrability after each tration. ts more complicated to explain the reason why it ‘works than doing itso we fist you how it works and then discus why "At each iteration, we compute the Fast Fourier Transform (FFT) of pang. In complex notation,” with asthe imaginary unit, we can write Tf you aos fir with the complex notation fo the FFT, ou may ust skp he dren snd go lt the my toibedextpio oft agra, Of coun yoo need at at be seo use an FFT eu! Section 94 A Variational Method for Shape from Shading 233 paLeploneyelosor? and a= Deena, where the sums range over al possible values of a, and (multiple of fundamental frequency) and the cand cy ate the Fourier coefficients, Then let, 2= Leto err, 02) with hn oy) = “ety ») (oan ‘The function Zin (226) has three important properties. © provides souton to the problem of econsutig suac rom set of poninteabe pant * Sine the soetkent eo») donot depend on x and, (226) can be easly diferenited wth respec to andy ope a nwt ol megble pala and sch at i Dilenetoo, 9 =P chlo oye (928) and Z =F = Duoscton peor? = Prelarasili, 828) '* Mostimportantly, p’ and’ are the integrable pair closest to the old pair of p and g. Inmore technical terms, wehave projected the old p and ontoaset which contain only Integrable pairs. Its «projection in a mathematical as Wel as intuitive sense, because it you doit twice (or more times) you invariably get the result ofthe fist ime. This can easily be sen by plugging the coefficients cand cof (28) and (9.29) in (9.27), ‘We now summatize the procedure we have dbewssed Algorithm SHAPE_FROM_SHADING The inputs Formed by sa nage ofan unknowa sua the electance map ofthe suri the surface’ albedo, andthe direction an intensity of the illuninant. The same assumptions of section 92 bald. Moreorer the surtace slopes, p and, Gitialied to 0) ae assumed to wrap sound th image boundries eyelie boundary contin) Let be the elfectivealbedo andi the ilinant direction, Uni a suitable stopping criterion sme, iterate the flowing two scpae Chapter shape from Single-image Cues 1, Update p and g through (8.22) nd (2). 2 Compute the FFT ofthe updated p and q, estimate Z acconding to (926), and p' and g according 0 (9.28 and (0.29)-Set p= p! and g=¢ ‘Tae outputs formed bythe estimate of Z, p and g ‘© Aveschiteration, make ste hat he relectance maps postive at each pte: st negative “ales to The parameter is usually sto a vathe lage value, for instance 100 = Notice thatthe stepenfocingintgrabiliy can slsobeemployed asastand-alone proseéare for reconstructing srlae Z fom nniategrale paso slopes pandq over a pixel rid. 9.46 Some Necessary Details ‘What can be said about the convergence properties of SHAPE FROM SHADING, the optimal and the stopping conditions? ‘Convergence. Unfortunately, not much canbe said in general about convergence, except thet it depends on how far the initial p and g are from the true valves and on the precision with which the reflectance map and the illuminant are known, “The optimal >. Something more canbe suid on 2. The algorithm tends to converge faster for smaller 2 (as it ean be realized by looking atthe right-hand.side of (9.24) and (9.25) Not surprisingly, though, if becomes too small (typically below a few ‘hundreds, the algorithm becomes unstable. For better (and more sophisticated) ways to speed up the iterative step of SHAPE. FROM_SHADING, look into the Further Readings. Instead, if istoo large, the algorithm tends to promote very regular solutions and can even walk away fom the “correct solution .. having started from it! ‘Stopping Condition. Its not easy to give stopping condition valid inal cases Looking atthe resid, facarce Re often helps but is not always appropriate, and the same can be said forthe functional © itself. The problem i, the history of values of a residual funetion is not always a faithful indication of what is going on ia the minimization process. It may happen that the residual appears stuck while the solution is actually getting closer tothe desired minimum, It also n0t unusual thatthe residual does wot change appreciably for many iterations and then starts a relatively rapid descent toward the minimum value ‘We leave you with the somewhat uncomfortable feeling that implementing a shape-from-shading algorithm means dealing with a numberof open issues Ultimately, you must get acquainted with the particular problem at hand, and develop your own, ideas about it We conclude this part of the chapter with an example of SHAPE_FROM_ SHADING when sun on the data of Figure 92 (a). Figure 94 displays (from left to right) the surface reconstructed after 100 iterations, 1000 iterations, and 2000 iterations. Section 85 Shape from Texture 235 a oN () ie lm esi eo fm 9.5. Shape from Texture ‘We now move on to the second theme ofthis chapter, shape from texture, Fist ofall, ‘we must specify what we mean by texture. 9.5.1 What is Texture? ‘Definition: Texture, Texels Aseria reption an eon opt cle fact Amiens ol sre petting esa of which is distorted by the projection across the image, 7 — a Fae 5 and 96st odio, Figure 9 shows images oa euler asidofees coverings plane @) and ner (9) Sots the dt o he clpes Gage exe) proetonsf he sls he sone (uae es) ao te mee: sesh commen ont nperiant etre soon Figure sho ns oat Surtcs om leh wood sks ad sal avers, sn. Te etre236 Chapter 9 Shape from Single-image Cues @ Figure 95. (a) Image ofa pla curved sri covered by «deterministic texture. (b) The same texture on Figure 95 is called deterministic, the one in Figure 9.6 statistic. As we shal se soon, there isan important difference between deterministic and statistic textures in practice OO ‘efinition: Deterministic and Statistic Textures _Devrminstic textures ace eeated by the repetition of xed geometric shape such as acc, square, a deoorative mot ‘Stati textures ae erated hy changing puters with i Mi NN Nit @) (b) © Figure 96. Three natural textzes (a) Wood sticks and smal eaves. (b) The surface ofa rock (6) Linea patoras on sand. Section 85 Shape from Texture 237 Examples of deterministic textures are images of patterned wallpaper, bricks walls, and decorative tiles Most natural textures are statistic: think for example of pebbies, gravel, wood, or lawns. To recover shape from both types of texture we need ‘oleama few, basic facts and make some choices 9.5.2. Using Texture to infer Shape: Fundamentals Why Does it Work? Notice the strong impresion of shape you get fom Fig- ures 5, in which texture isthe only cue preseat® How does this happen? In the image ofa textured 3-D surace, de eel appear done. Tithe ke fact behind shape from texture: the distortion of the individual exes and ts variation across the image create the 3D impresion? We can therefore formulate the computational problem of hape-trom-textre a follows Problem Statement: Shape from Texture Given a single image ofa texturod surface, estimate the shape ofthe observed surface fom the astortion ofthe texture created by the imaging process. Representing Tage Texture How do we represent image texels? Deterministic and statistic textures ere represented by qualitatively diferent techniques, and methods vary accordingly Deterministic texels are represented naturally by the shape parameters of the specific shape at hand. For instance, the ellipses in Figure 9.5 can be represented by the parameters ofthe ellipse equation (Chapter 5). Statistic textures are represented typically in terms of pail frequency properies; for instance, by the power spectrum ‘computed over imageregions Representing Tectare Distortion, What kind of texture distortion does the imag. ing process introduce? Considering Figure 9.5, we notice two distortions: perspective distortion due to the perspective projection, which makes circles increas ingly far from the camera project to smaller and smaller ellipses; foreshortening: which makes citcles not parallel to the image plane appear as ellipses. Under suitable assumptions, the amount of both distortions can be measured from an mage. For example i Figure 95 (a) perspective distortion can be quantified by the area variation cross the elipses, and foreshortening by the ratio ofthe else's semiaxes. In act, we can use two different classes of texture-based measures to estimate shape: “Tatura be ao a pore meant segment ings Ie Fgre 96, slough testy varies with 20 pirat repr wehaesronginpeesntsunfor textnsincach mags edi sere obvions ithe ve mags epee dierent aac pe. nde oth ar import seo shape persion a human vs,238 Cchapter9 Shape from Single-Image Cues SURFACE Figure 9.7. The angles tit rand slant, defining unit norm Nove that this gue does nr illsrate a pojction, which is why the cog o the ference frames son the image plan, and tha these ages fate fret fom thse use in shape fom shading (Figure 93), 1. ameasure of shape distortion (applicable to individual image teres) 2. the rate of change of a measure of shape distortion, called texture gradient or ddistorion gradient for that measure (applicable to regions containing several image texels) Representing Surface Shape. a general, the shape of @ surface at any point is completly identified by the surface’s orientation (i, the normal) and curvatures (Appendix, section AS) However, i turns ou that estimating curvatures from texture {sar from trivial, We shall therefore concentrate on recovering just surface normals ‘we did in section 94, As we know from our diseussion of shape from shading, a map of rrormals specifics the surfaces orientation only at the points at which the normals are computed (e4, the center of deterministic exes), but, assuming thatthe normal are ‘dense enough and the srface fs moth, the map can be integrated to recover surface shape (for example, as explained in section 9.43). Representing Normals. Inthe literature of shape from texture, its eommon to represent unit normals by the tilt and slant angles illustrated in Figure 97, = Notice tht these angles aze defined ina diferent way trom thelr shape-fom-hading ames (Figure 93) Leb he north sure at P The sures aly approximated yi tangent plane, perpendicular om, The tl «isthe angle taking te X axis of the camera frame, XYZ, on the projetion ofthe normal on the image plane, N. The slant, isthe Section 85 Shape from Texture 239 angle taking m onto th: ~Z axis ofthe camera frame, Rotating the frame OXY Z around Z by « generates the new reference frame OX,¥,Z; i this frame, the normal lies in the X,2 plane, so thal the tangent plane is parallel to ¥ The General Stmcture of Shape-from-Texture Algorithns. We have now all the elements necessary tsate the general structure of shapefrom-texture methods, 1, Select arepresextation adequate forthe image texture at hand, 22 Compute the chosen distortion measures (if required, their gradients) from the mage, in erms of the representation selected. 3. Use local dstorion (if required, texture gradients) to estimate the local orient tion ofthe surtaze, ‘The nxt section gives simple shape from texture algorithm which estimates the ‘orientation ofa planefeom a statistic texture.” 853 Surface Oriertation from Statistic Texture As usual, we begin by stating a set of assumptions ‘Assumptions 1 The 3D tomes ar small ine segments called needles, 2 The needles ave dstibuted uniformly on the 3-D surface, and their dretons are all independent. ‘A Thesurface is approximately planar 44 The image projection i orthographic We chaactriethe needle by their orientation only: ther postions and lengths ae ieevant, Te reason for choosing such apparently odd exes is that they allow us to establish a deterministic, geometric relation between the orientation of a 3D texel and the one ofits corresponding image texel. The idea i ows this elation to ‘write the probability. say p, of the observed image given aa orientation (2,1) ofthe EXD plane, which allows ust estimate the orientation of the plane athe pat ‘meximiing p(a maximum likligod approach, se Appendix, section A). In actual fact, an approximate olution (¢, 2) an be derived in closed frm, and this what we shall sein STAT SHAPE FROM_TEXT. © Image nedtes canbe extracted from images of 3D textures not necessary mde up by ‘sl tine seamen. To do thi you run an edge detector (Chapter 4) with an adequately small Kernel to extract short coatour, followed by a module detecting and desing 7 goth covering ovation of kin spc given in Chapter 0240 Chapter shape from Singleimage Cues small, rectilinear segments, 8, «variation of the Hough ine detector of Chapter 5. Remember though ifthe 3D texture isnot composed ofsmal line segments, the conditions ‘ofassumptions | and2are only approximated and this isikely o worsen results ‘We now sketch the derivation behind STAT_SHAPE_FROM_TEXT, omitting much detail. Assume there are N needles in the image, and eta; be the angle formed by the ith image needle with the image's x axis In our assumptions, a is @ uni formiy distributed random variable in [0,]. We now introduce an auxiliary vector, {eos 2a, sin 2a", itself a random quanti. This vector is characterized by a probability distribution called distribution ontheunitcrce, whichis function of the dstibution ‘ofthe a Is center of mas is defined by 1 c 2 sin? 130) rym 0%) Itcan be proven that, in orthographic projections (Assumption 4), the center of massis sa sin 2S, 31) Solving foro and r, we find te +o WF (mod2n), (932) where @ and ¥ are the polar coordinates of the center of mass =} arcan 5. 03) Q=VEFH, w= janet 2 Notice the ambiguity inthe estimate oft ‘The complete algorithm is stated below 2S ‘Algorithm STAT. SHAPE. FROM_TEXTURE The input isan image containing W neds cach forming an angle wih the x axis ofthe image ‘The asunptions ated above hol 1. Compute C$ wing (9.30) Section 96 Summary 241 2 Compute the polar coordinates Q,¥ ofthe center of mass af the dstibaio onthe uit cic wing (933. ‘3. Estimate the orientation ofthe ¥D plan, 2), using (9.32) ‘Tae outputs f),the estimate ofthe orientation ofthe 3. plane 9.5.4 Concluding Remarks ‘Shape from Testure and Teeture Segmentation. 1 our discussion of shape ftom {exture, we assumed. uniform texture throughout the image. In reality, tis i something ofa special case: Images ae ikely to contain different textures, of textured areas surrounded by non-textured ones. In general, differenly textured regions need separat- ing before shape-from-texture algorithms can he applied. This problem is called txaure segmentation, anditisa classic problem of image processing. As texture isan ubiquitous feature ofsutaces texture segmentationis frequently used to identify objects ofinterest in the image; for instance, it proves very useful in many defect detection systems. Intexture segmentation, image pixels are classified on the bass of several textural measures, called fesure features, computed in local neighborhoods These measures are usually statistical properties of the intensity values, or spatial frequency measures like the power specum. Unfortunately, segmentation relies on the assumption that texture features are constant within regions of uniform texture, whereas texture (and feature) distortions are exactly what shape-fom-texture methods rely on! For this ‘reason, petforming texture segmentation and shape from texture at thesame time snot trivial at ll which is why we have treated shape from texture us independent of texture segmentation. The Further Readings point you to en introduction tothe literature of texture segmentation! Texture Depends on Spatial Scale. ‘Textures appear and disappears at different spatial scales. For example, imagine to zoom in on a loor made of wooden planks: ‘When the image conans many planks, the main image texture is given by the planks’ contours; as you clos? in on one plank, the main texture is given by the Wood's fibers. ‘These textures look different: The wooden planks create a deterministic pattern, the fibersa statistical one Therefore, “the texture ofa surface” actually refers tothe texture ofa surface ata giver spatial scale 9.6 Summary ‘After working through this chapter you should be able to: explain the nature and objectives of shape-from-X methods explain the purpose and nature of shape from shading 2 euson why tue epmetaton hs not een inde in og cus of festa tsi (Chap ‘eps4and) sea thai war ot needed spot shape fm sae242 Chapter Shape from singlesmage Cues ‘2 design an algorithm for shape from shading, and recover a surface from a map of normals explain the purpose and nature of shape from texture 1D recover the orientation of plane covered by a statistical texture 9.7. Further Readings Table 9.1 mentions several shape-fom-X methods; here are some introductory refer ences, chosen from a vat iterature. You can begin an investigation into shape from focus end defocus fom [17,21 Ma and Olsen's work [15] sa good staring pont for ‘Shape from zoom. A goodlvtraton af shape fron contours given by [16], anda mod fm shape-rom-contour approach is discussed by Cipolla and Blake (5). Active vision isa recent, influential paradigm of computer vision; ora introduction, see (3), or the report ofthe recent pane discussion in [2,0 again the seminal paper by Aloimonos etal Ue aprximte meta for deteminng odo niin aseen aed from [25], which also explains in detail how to obtain 216). The field of shape from shading hes been strongly influenced by a numberof pioneering works due to Horn and coworkers [11 14, 12] More on the electance map ean be found in 13) The original ‘method proposed by Horn moe than twenty yeas goin [1] isl a classi, though fot easy to explain and implement. The algorithm described in this chapter as been ‘proposed by Frankot and Chellppa [6] an improvement of the method described in [12] A rigorous account onil- posed problem of eomputer vision and methods for their solution canbe found in (2) Much of our discussion of shape from texture is based on Grdiag’s work [7, 8] and references therein, in which you can also find a detailed treatment of texture: ‘axed curvature estimation, Methods for recovering shape from deterministic textures, Lnder perspective projections are discussed in [4 15,9} The closed-form solution in STAT. SHAPE, FROM_ TEXTURE isducto Garding 8} andisa variation of Witkin’ ‘maxiasunikelinood estimator [25] which involves @nontiil minimization. Receat xampls of shape-rom-texture algorithms reconstructing curved surfaces covered by Statistica textres using spatial frequen desriptos, are given in (23 and [20]. Ap proaches tothe problem of segmenting texture and computing shape from textare taneously exists but the methods are aot eva; an eevmpl i reported in (8) Texfuresezmentation in sell sa casi topic of image processing, and iterature s ‘ast To begin its exploration, se [10] for a collection of texturebesed segmentation algorithms, and [2] for recent survey ofthe eld. 98 Review Questions (3-91 Why the methods in tis chapter recover shape but not distance? 11 92 Why the methods inthis chapter are applied to intensity images, and not to range images? Section 98 Review 243 (1 93. Explain theditference between the tit and slant used in shape from shading ‘and shape from texture. 10 94 Ate the til and slant anges used in shape from shading and shape from texture the same as spherical coordinates? If not, what are the differences? 1995. Explain why, in algorithm APPRX_ALBEDO_ILLUM_FINDER, it could hhappen that cos > 1. What would you do to avoid this inconsistency? 1996 Identify the ambiguity intrinsic tothe reflectance map of a Lambertian sur face, Diseuss the consequences on shape from shading 1-97 Whyacethcboundary conditions necessaryin SHAPE_FROM. SHADING? Could you run the algorithm ignoring them? 9-98 How would you seta value for 2 in SHAPE_FROM SHADING? (Hint: Look atthe right-hand side of (9.24) and (9.25).) 1 99 Explain the diference between foreshortening and perspective distortion. 910 Can you ssume orthographic projections when using the area gradient? Why? 59.11 Identity tte elements given in the general structure of shape-trom-texture algorithms in STAT_SHAPE_FROM_TEXT, © 9.12 How woull you choose the parameters of an edge detector used to extract needles for STAT_SHAPE. FROM TEXT? © 9.1 Compute te integral os : o) Figure 10.2 ‘3) Ideal range view ofa simple objec, showing the labels) of image feats (surface patches) mmm ae rectangular mg 7 ate Square; mm ate L-shaped (b) Visualization of sua based mode of te same objec, stowing the lbes(m,) of the object features.252 chapter 10 Recognition 6 Figure 103. Portion ofthe interpretation tee forthe problem of Figure 102 expanded when reaching the fst interpretation, f (slid path). The fire shows also the pat (dashed) correspanding to the comer interpretation fp (but not the nodes generated between fan I). only be matched to m, or my. The tree is expanded depth-first, uni all image features acematched, The next level down does not include the model features already matched, ‘seach model patch can correspond to only one image patch The fist consistent inter. pretation we find, 1, = (ism) fram) ema Sa ns is wrong. Why? Because the constraints enforced are associated to local features, and therefore focal in nature. In general, even if we introduce more constraints, even involving groups of some features, there is no guarantee that all interpretations found make global sense. One way to check global consistency isto compute the transformation which brings the matched model patches onto the image patches, project the located model tonto the image, and check that all madel patches end up in the expected positions. This process i called verification; it uses the image and 3-D positions ofthe features matched to estimate the imaging transformation, backprojets the mode features onto Section 102 Interpretation Tees 253, ‘the image plane, andchecks whether model features are imaged close enough to their corresponding image features inthe orginal image. Since this amounts to solving the location problem, which is the subject of the next chapter, we leave out the details for now. Going back tocar example, we verify that fy sinconsistent, andstrt backtracking, ‘We move to the frst alternative possible, (fs, m:) expand a new subiree depth-first, and look for new interpretations. We come across afew more inconsistent ones, and eventually reach the zofrect one. "Notice that dept-tst search with backtracking isa reasonable choice, although heuristic Search and mere sophisticated algorithms are ako possible (ce Further Readings). 10.22 Wild Cards and Spurious Features ur example above is unrealistic in atleast two ways. First, eal images offen contain severalinstances of te target object; we wouldlike a recognition algorithm find them all In this sense, we should make ITs addess the question “are there any instances of X in the image?” (section 101). Second, feature correspondence is not generally one-to-one: Some object features are certainly hidden to the viewer (m,ma.ms in Figure 10.2(a)), somemaybe occluded by other objects; the feature detector might miss ‘some image features and introduce some spurious ones. We would like to recognize “objects in spite of reasonable amounts of missing and spurious features, Spurious features are accommodated by a wild card; that i, ettious feature which matches any image feature for which no real match is found. In practice, an ‘addtional “wildcard aode” issimply appended tothe list of model features Without the ‘wildcard, he search would backtrack as soon as encountering a spurious image Leatue, {ailing potentially axeptable interpretations Unfortunatly the wildcard increases the search complexity setiousy. Assume that R image features are real object features, and S are spurious The wdeard search finds the tru interpretation, in which all the B real image features have deen matched, but also the interpretations containing R ~ & real features and S +k wid card matches, of which there are many. ‘A method redtcing unnecessary efforts is branch-and-bound. Suppose we are looking forall the instances ofa modelin theimage. When we find the fist interpretation accepted by verification, we record the number, rf its nonwildcard matches. We then ‘backtrack fo look for ater interpretations, but terminate any path f we realize that any interpretation below the current node cannot possibly include more than real matches. For instance, ifr =6, the current interpretation has two nonwildcard matches, and only 3 image features hav not been explored, we can safely quit this path and backtrack (Of course, we must abo update r when discovering a verified interpretation with r! > r real matches, 102.3 A Feasible Algorithm ‘We are now ready togve an algorithm which takes into account ll the points tovched ‘on in our discussion.254 Chapter 10 Recognition ‘Algorithm INT_TREE Let (Open the ordered lis of eso expand eft rg {nterp the is of consistent matches forming an interpretation: Marsice the maximum numberof moawidcard matches in any interpretation found: Whe wild ard, sSze{nterp) the numberof noildcard matces in nterp: root abe forthe te 008; consist (X),X a non- wider math, is true if Xs consistent wih constraints or X00, and fale otherwise, Let [indicate an empty stand [a 5,.-.1] lis of elements. Notice that he ‘ildardis by definition, consent with any constraints (Open s [root Iterp=[ Masize=0 WHILE (opea #[)) BEGIN remove leftmost match, X = (fjm) {tom ope; TF (cosisteat(X)) AND ‘max posse sie of interpretation on his path > Masi) BEGIN ad X to Inerp TF (leat node reached on this path) BEGIN vei Tater: IF (verication succeeds) BEGIN sve laterp: Intesp«[ Maxsze=sze(laterp) END, return} END ELSE (+ not et but someon) Tet = (tomo fptomalSgae WI) be the expansion of the eurreat node drop model features aleady matched fom Ls dL tothe left of Open END ELSE (‘inconsistent match *) END END Section 103 variants 255 103 Invariants ‘We now move onto variants, the second identification method of this chapter. Invari- ants are functions of ‘eature groups that allow us to index into the database of models. Therefor, with reference to our classification of recognition tasks (section 10.1), invariants adress the question “what is this part ofthe image?” (Problem 3). This ection suggests also how to use invariants to address the more general question “what objects are we looking at” (Problem 1)? For reasons which willbe clear soon, we focus on intensity images. 103.1. Introduction Ingeneral.invariatscan be defined as properties (untions of geometric configurations which do not change under a certain clas of transformations. For instance, the length of line segment doesnot change if the segment is rotated and translated: Length is invariant under rigidmotion. How can invariants help in object recognition? ‘One of the Key prablems of recognition from intensity images thatthe appearance ofan object depends on imaging conditions like viewpoint end intrinsic parameters. fwe could identity shape properties which donot change with imaging conditions the problem would be soved. This is exactly what invariants are, and te basic idea behind invarant-based recognition svery simple. You define invariant functions of some image ‘measurables (for instance, image contours) which yield sufficient different values for allthe shapes of nters, and then use vertors of invariant functions to index a database of models The invariance ofthe model index (the vectors of invariants) tothe imaging conditions means thatthe same index is computed from any image ofthe same object. So here is what we want todo Problem Statement| Given st ofimage festares define a vector of invariant functions ofthe image Features and vse itt index brary of models ‘Unfortunatly, but not surprisingly, chee is basie imitation: Invariants are known only for some lasses of shapes, and defining useful invariants for general objects isnot eagyatall Forte pepases ont inteovctory dienesion, we make the felling assumptions Assumptions 1 Weconside olysclor, algebraic varans ofthe geome maging transformation. 2 The objects to be rocognned are planar (but their position and pose in space is uncon stained) ‘The invariants are functions of group of image contours 44 mage contours ae formed byline segments and arcs conics.256 Chapter 10 Recognition Scalar, algebraic invariants ate obviously scalar, algebraic functions, in our ease of the parameters of the equations representing image contours Two examples o such invariants are introduced inthe next section; the Further Readings point to discussions ‘of non-algebraic invariants. We concentrate on geometric invariance. ‘There aze various reasons behind the apparently very restrictive Assumptions 2, 3, and 4, Fist, all the fundamental concepts of invariants can still be illustrated, ‘Second, several invariants for lies and conics are known. Third, many’ man-made ‘objects although not planar, contain coplanar lines or conics (or instance, windows of ‘ars and buildings or circular holes on a same planar face of» mechanical component). Fourth, itis dificult to determine invariants for general, 3-D objets. ‘Assumption 4 implies that line and conics detectors, such as those described in ‘Chapter 5, are available to locate and describe lines and conics in the image. "The use of invariants brings about two main advantages. Easy model aquisition A library of models suitable or recognition i bul easy by acquiring a single eal image ofeach objec in any pos, computing a vector of invariants and string the vector asthe object index Recogition without model search, There is no need to search the model base to evoenize an object, sa vector of invariants indexes rely into the Iibary of models This implies that suitable adening stateay (or instance, a hash table) rakes the recognition time independent of the size of the model brary. 1032 Definitions We are now ready to give more precise definitions. In particular, we want to formalize the transformation for which we need invariants, as well s the concept of invariant property care after invariants ofthe geometric imaging transformation introduced in Chapter 2.As we assume all object planar, he imaging transformation sa transforma: tion between planes: it maps an abject plane onto the image plane, Therefore, we can ‘consider ditectl the mapping between corresponding points on the objec plane and the image plane. The situation is illustrated by the example in igure 104, As the general imaging transformation of Chapter 2, our plane-to-plane mappings modelled by apr jective transformation (Append, setion A.4); henoe the name projective invariants. ‘Let [r,1]' be the homogeneous coordinates of an image point and [X. ¥. 1] the homogeneous evordinates of the corresponding soene point, P. The projective transformation that maps P into p (Appendix, section A.4) can be written as kp=TP, hoaphii ood dn sue and oes mee thn one img per mode Dtapratis rent time wl gow with tei of te ode basse te Fut Resins for mae Section 103 variants 257 OBJECT PLANE ‘OBJECT PLANE? IMAGE PLANE Figure 104 Th: projective transformations 7; and Ts model the mapping from object to image pane fortwo diferent viewpoints (before and after a rotation R anda walation tf the objet plan). Where T isthe 3 3 projection matrix, defined up toa scale factor, and k accounts for the fact that homogeneous coordinates are defined upto asele factor too, And here is 1 formal definition ofthe invariants we need for our purposes. Definition: Projctive Invariant Let ey beast of contour descriptors of a planar objet, expressed inthe homogeneous coor nates th objet plane Let be the projective transformation modeling he mapping between object plane and image plane points The function / = (eq) is projet variant it Hq) =H) forany projective transbrmaton T: Tiseasy to guess how eg looks like in our assumptions A line, ax + by + cz = simply represented bythe vector1= [a b,c]";a conic, ax? +bxy +cy" +dx bey +f 0 by the matex,C, defined by a 62 ap)fx bey for ec e}|y ap epr ¢\LA Tcp=0,258 Chapter 10 Recognition and used in Chapter Sin the context of elise fitting. n order to move onto algorithms, ‘ve sill need to define invariants suitable for recognizing planar objects delimited by segments of lines and conics. We begin with the crass-ratio, defined on the projective line, on which our other invariants are based. Definition: Cross-ratio Given four dint colinear point described ia homogencous coordinates ofthe projective line oI]! Gayoso A theerossrati cis defined as 40,204) 0.2.3.9) vith at) the determinant between 6p Cross-ratio Invariance ‘The cross atio isinvaran to projective transformations ofthe projective line onto tse “The proof of the invariance andthe main mathematical properties ofthe eros- ratio can be found inthe Appendix, section A.4. The importance ofthe cross-rato isin the fet that, although four collinear points may seem arather restrictive configuration, itis posible to obtain them by projective constructions of some sts of ines and conics. ‘We report two of these inthe folowing, [Invariants of Five Coplanar Lines Given five coplanar lines labelled. two independent projective invariants are Man Me Lael Ms Meal sl” 4) [Maza [Mss ~ 1 here My tha and M8 the determinant of My, = Rewarnedthat,forcettin coniguationsthelubelingoflinesin each My maymakesome determinant inthe denominators vanish (Exerc 105) See the Appendiy, section A+ faran approieh to this problem ‘The proof ofthe invariance off nd Is left san exercise (Exercise 106). Notice that each Mi canbe viewed as the area ofthe triangle the vertices of which are the intersections ofthe lines 1), and lk. Our next invariant involves two coplanar conics, Section 103 Invariants 259 Invariants of Two Coplanar Conics| Given wo matrices desing coplanar cons, C nd C, sucha C ent projective invariants are Ltwoindepen (a2) wher [4] isthe teaceok matic The proof of the invariance of, say, sis instructive, We nee to show that efa']-r[es) vere cand arth maces desrting the cons C; and > respectively tans formed by an arbitra-y projective transformation T such that peaely P. 103) The four conics are defined by PrcP=0 (104) PrP =0 (05) Prep=0 (106) prep=0 (107) Iwe recover P from (10.3) and plug the result (10.4) and (10.5) by comparison with (1046) and (10.7) we obtain aston ror? Equations (10.8) tellyou how conics ate affected by a projective transformation, By means of (108) it is easy to derive the thesis as fe where the last passage relies onthe circular property of the trace of the matrix product, (08) an [replete tert tela, Add tty Atos Ana 10.33 Invariant-Besed Recognition Algorithms ‘This section gives two esental,invariant-based algorithms for recognition and model acquisition. In both eases, the underlying mechanism srather simple. Weidentifyaset of four invariants, f,.... such that one or more can be computed forever interesting object, and define a vector collecting al the invariants in afixed order, g[l}.... 4)"260 Chapter 10 Recognition For model acquisition, we take a generic image of each object, 0 extract lines and ‘conics, evaluate the vetor of invariants for each relevant group of contours, and store the resulting vector, gas the index forthe model of object 0. © Weare guingto use mo vews of ach abet ot jst one tote out invariants which may vary too much aumeriallybeweenditiorent viens. Thetwo views shouldbe representative ‘of ewpoins adopted during recognition. For recognition, we extract lines and conics from the input image, elect the groups ‘of contours for which we can compute the invariants, compute the invariant vectors for all such groups, and use the vectors to index the model library. The object models indexed by the larger numbers of invariants are assumed tobe present in the image. = Contour grouping is «serous practical problem (reall ur dscuson of groping in (Chapter). How do we know thal the featres we use to compute an invariant relly belong tothe sume obec? Simply tying posible groups of features east impractical large ‘numbers of groups including many spurious ones The algorithms inthis section adopt a ‘imple solution: fr lines, they consider only groups of conccuive mage ine (rouping bby provimty) tht is tines with endpoint sulin close to each oer fr conics they just assume that the mumber of eons inte img slow, and the numberof al posible ‘roupsof conics smal (on grouping at al), Notice hat oven simple proxiniy requires care, fs line detectors ae likey to split atleast some ines nto various segment “Thisis the box forthe algorithm acquiring models ‘Algorithm INV_ACQ “The input isthe set of esripto fines and eonis (Sefined in ChaperS detected in wo views ofeach objet 1 For each objet, 04, (a) foreach othe two views (i) form all posible feature groups, say Min al for which an invariant... eis ein (ether five consecutive ines o ais of eons (a compute tho M vectors yr Bae O86Po? BLOUP. Whore gh ~ [1] consis the vals othe four invariants defined by equations (10.1) nd (102), ‘mar invariants inapplicable to each group (some groups contain only cons, the others on fins) (b) store an objet model for 0, formed by abel (ihe objets name) anda vector inex, Tatoo Bul where ge = [ie fe fx dal ad J ii) the average values of he invariant ove the two views, ithe values are su Gently lone to cach ater iy symbol abtiniatng that the iavaran snot liable otherwise ‘The outputisa brary of models sutale or invariant ase recognition Section 103 variants 261 © The inevitable Metuations ofthe numerical values computed, caused by inaccuracies of feature detection image noise and soo, men that range of values, ot just one, of each invariant must he sweated to each objec. [Next we give the skeleton of an invaiant-based recognition algorithm. As we ‘adopt only a very linited mechanism for feature grouping, recognition tris to filter ‘ut spurious groups by verification; thats, backprojectng each model indexed by a set of feature groups and checking that the mode!’ features actually projet onto the corresponding image features in the groups Algorithm INV REC ‘The inputs the set of escriptors of ines and conics defined setion 10.3.2) detestedinan input mage, andthe brary o modes generated by INV_ACO. in which each mode, sasocated {oa se of merical vector indies.» Bu, Wild B= [Fe ->a) 1 Formallpossleroups sy theese R) ffi consecutiv imagenes and image conics, asdone for INV_ACO. 2 Compute the Rivariant vectors... sone per feature group where B= [ff 1] contains he values ofthe fou insrans dined by equstons (101) and (102); mark invariants inappiable othe eroup (some groupscantin oly conics, the theron) {3 Forma lis of al bjt mols indered by t east one invariant vet ndex of... Bs Let gyo-s-rBy. the H index vectors pointing oa generic model hypathess 0, 4. (erteaion 1) Discard rom the stall O; for whichis not possible to determine aunque Projectvetransfomnaion, , compatible with ll tho features assorted 0B .. § Werieaion 2) Dscaté all 0, now associated tow unique projective transformation for hich some baekorojecte features are no sufciently eos the coresponding mage features The outputs st of objects detected in the image ‘Some comments are inorder on steps 3,4.andS.Forstep 3, all invariant values are real, and therefore best represented by ranges of values to account for uncertanty;care must be taken when writing indexing procedures. See Further Readings for examples ‘of indexing procedures. Astostepd, youcan estimate the projective transformation T 2s follows For etch mage line featuring in the definition of gy... 7 (Say = 1,-..,n) you write Th= kd, (109) where Lyi the model ine corresponding to and ki s usual in projective geometry, «nonzero, arbitrary eal number. By linearly combining the three sealar equations in (109), each line I generates two homogeneous equations in the nine entries of T, in which the coefficients are functions of the components of Ly and k. The set ofall262 Chapter 10 Recognition equations (10.9) forms linear, overconstrained, homogeneous system in the entries of T that we wite A (0.19) with A the 2n x9 matrix of coefficients and t= [Tip Tia... 7)". he compatibility ofthe lines and ly canbe checked by looking atthe SVD of the matrix A, A= U DV (Appendix, section A.6). Ifthe effective rank of Ais (that ig the least singular value of Ais very smal) al the line are compatible an tas usual, i given by the column of ¥ corresponding to the least singular value. Otherwise, the lines are not compatible Inthe assumption thatthe matrix T passed the compatiblity test fr lines, you are left wth the problem of testing the compatibility forthe conic; featuring in the Aefnition of ga...» (Say = Lm) withthe corresponding model conics C; ‘This can be done by Tooking a the ratio between each ofthe nine coefficients ofc, and TCT". If the ratio kis approximately the same fr all coefcent, the matrix 7 passed the pobal compatibility test and you may proceed to step 5. Otherwise the ‘mode bypthess 0 discarded, In sep 5, there are two points fo make. Fist, how do we backproject lines and conics? For lines, we backproject each by applying the transformation tothe model line like in (109). For cones, we obtain the backprojection ofeach c; by computing ‘TC,T". Notice thatthe formal diference between this expression and the right hand side of (108) ise othe fact that denotes a projective transformation of ines the former and of points inthe latter. In mathematical terms 7 inthe formeris the inverse transpose of Tin the later. Second, “sufcently close” (verification 2) means that the distance between backprojcted featuresshould never be larger than afew pixelssexact figures depend on resolution, 104 Appearance-Based Identification Finally, we address Problem of section 101:“Ithispart of theimage an instance of X?" ‘And we address the question by using images instead of features as basic components of object modes. 10.4.1. Images or Features? “The key ida behind appearance-based dentincatin Is simple To tore images at -D objets as their representation. Instead of epresentng oben O throught geometic featuesand their sptialelatons werepesentO withiheserofis possible appearances, thatis the set ofmages taken, del, rom ll posible Viewpoints and witha posible ‘lunination diction In practice, weusea suey are numberof vewpointsand ituninaions direction. Asan example, Figure 15 shows 12mage representation of atoy car Wecan create a database of models for ientication by building such set * conser ly listion fo sly. i me Know om Chapter 2. ter iminain parameters ley olen dating pl aes Section 10.4 Appesrance-Based Identification 263, Figure 1055 A simple database exemplifying appearance-based objet representation, ‘Only the viewpoint, 2% the iluinaton was changed oobi the views sown. forall objects of interest. Identifying an object, then, means to find the sot containing ‘the image which i mest similar to che one tobe recognized. Problem Statement Given an image, , conning an objet to identi, anda database of objet model, each one formed by asetofimagetshowingtheobjctundera large number of viewpoints an illminaton onion, fn the set containing the image which most similar oJ A desirable characteristic of appearance based identification, a presented, is that ‘object models can be compared directly with input data as both are images. Feature- based modes (like the ones used with invariants and interpretation tees), instead, ‘equite that features detected and described before data and modelean be compared Unfortunately theresa price to pay: the database may become extremely large even for limited numbers of ots and illumination conditions For example assuming 128x128 mages at one byte pe: pixel, 10 viewpoints per abject, and 10 illumination directions, the representation ofa single object would occupy about 64 megabytes of memory! Therefore the practical problem is, can we devise a way to Keep memory accupation within manageable Knits while performing appearance-based recognition?264 Chapter 10 Recognition 10.42 Image Eigenspaces ‘We shall arive at an appearance-based algorithm, the parametric eigenspace method in thee steps: 1. We definea quantitative method to compare images and introduce some necessary assumptions 22, We introduce an efficient, appearance-based object representation, which makes itfeasible to search a large database of images. ‘3. We give algorithms to build the representation and to perform identification, Comparing Images. A simple, quantitative way to compare two images, ay J, and fy, both 3 for simplicity, i to compute their correlation, e: ELV hth nid. hoh= where K isa normalizing constant, and o denotes image correlation. The larger c, the ‘more similar fy and f- ‘This is simple enough, but we must take some precautions. AS we are really interested in comparing 3D objects, not just images, we need assumptions to guarantee that correlation is meaningful for our purposes ‘Assumptions 1 Each image contains on obec only 2 The objets are imaged bya fied camera under weak perspective [3 The images are normalized in ie that the image frame isthe inimam rectangle cnclosng the largest appearance ofthe object. 4. Theenergy ofthe piel values of each imageisnormalizdt statis SY 1 &, Thecbjctis completely vile and unoceuded in all images [All these assumptions ae consaquences of the fact that we want to compare 3D objects by comparing their images. You shouldbe able to explain the reasons behind each assumption (see Review Questions for hints). Efficiens Image Comparison with Kigenspaces. Our next goal is to devise an efficient method to search a large image database in order to find the image most similar to a new one, inthe correlation sense. The database contains the appearance: based representations of several objects; each objects represented bya set of images, taken by different viewpoints and with light coming from different directions. Such a database is suggested by Figure 10.6, which shows only one image per object for reasons of space. Clearly, if we store a full image foreach view, and many views for Section 104 Appearance-Based Identification 265 Figure 106 Images rom a smal appearance based dataase composed of twelve toy ars. Only one image per object is shown, cach obj the ssc a abe becomes probe large, and serch sed on briefors crc anne Instead, we vepresent jets egepuce To invoice egempocs and thei advange we ned To ear images over a state a fundamental theorem. : : - “Totarstrm 2D image ino vector. we justsan the image topo tom and left to right. In this way, a 'x N image, X, is represented by a N-dimensional weston [Xa Xia Kins Katee Xa Notice tha his represntaton allows us to write image correlation asthe dot product of wo vectors® For instance, tne correlation of images X and X, represented by vetors 1 ao respectively, becomes 0X; 0, From now on, we shall use vectors for images. And here isthe fundamental theorem, SWeasume tht th cnsn nthe olin defo266, Chapter 10 Recognition ========= === = “Theorem: Eigenspace Representa Let ny. be N2.dimensional vectors and §= 1° 4) their average Given the N? xm I-08) we can write each a6 whore, .--€y ae the eienvestors ofthe covariance mati, Q = XX", corresponding tthe ‘(nonzero eigenvals of Q,and = [pit ex” ithe vector ofthe components ox egenspace ‘Now let us go ack 0 our database ofall images ll objets al viewpoints, ll suminton diretions. Assuming O obec with P vewpoits and 1 iuination Cizetons foreach abject the database conains OP images. Using the procedure Sggesedy the theorem. we con bid the covariance mat, 0, fhe whole database, ndropresent each imag, with is vector of eigenspace coordinates, Qisceatly det large mati, and the some ze asx, tut here comes heist advantage of tigenspcesonly ic component associated otelargestegenvales of Qaresigicnt tcraprsent ih images Ta other words assuming tha he nonzero cigenvales hy OF Gare such that > 4p 2+ hy ad y= ford > K We can Write aD ts and ignore all the represented by a point of coordinates ina k-dimensional eigenspace, smaller subspace ofthe original, n-dimensional cgenspace. So far for one image, but how do we representa set of images; that i, all the views in the representation ofthe o-th object? Imagine to move through the views of ‘the representation a pose end illumination change continuously, the point gy moves ‘continuously in eigenspace, sweeping a so-called manifold g” = g'p,D, where p and Tare vectors defining the objet pose and the illumination direction, respectively. The set of eigenspace points associated tothe images ofthe o-th objec is a sampling ofthe associated manifold.” emaining nk components If W pixels 1. For each objet oo be represented, 0: (a) plac he object o the cumtable; = (b) acquit a et of images by rotating the turtable by each ime; (6) imall images, make sre to adjust the background so'that the object canbe easily segmented from the backaround: {) segment the obec from the background (se Execs 109); (6) normalize the images in sale and energy as tated inthe assumptions (9 represent the normalized images a8 veclors. x, where pis he rotation index, p = Teves 2. Compute the svrage image veto 4 ofthe compete database fx. bah x, dl ‘A Form the N2 x N? covariance matrix, Q= XX", with X = [xx]. -atis]..32] 4, Comput the eigenvalues of, Ree the Sint args egevalues and the asscsted sigevetons 1 foreach objeto (2 compute the t-imensonl eigenqace pias curesponding to then images: [ele (estore the dsreteeigespace carve [afl] a8 the representation of ovjete “Tc ouput saset of 0 drt cuvesin the k-dimensional cgespcs, each representing BD object. Pe © Asdetiedin the Append, ection A.6, you do not actualy ned o compute the eigen ‘ales and eigenvectors of 1X7. Thanks oa fundamental property ofthe singular valve *rfpoutive a few ists tops that ic Section 104 Appearance-Based Identification 269 dezomposition te eigenvalues of 1." are the meas the eigenvalue the mn matrix XX (a matrix of much smaller size) and the eigenvectors of XX ca be computed from the corresponding eigenvectors of X71 Notice thatthe dimensionality reduction is cartied out onthe global database. This ensures that the important eigenvectors record visual information of ll objects inthe database, Comersely, as we see next, recognition i performed inthe eigenspace of individual objects, To begin with, we suggest you try EIGENSPACE_ LEARN with N=61, 0=5,n=32.This means that Q is 4096 x 4096, and X i 4096 x 32, all rather reasonable numbers ‘© Toimplement EIGENSPACE LEARN, you nocd turntable to change the viewpoint by controled rotations The best would be ous a computer controlled ualabl,buthe dik of anold record payer, tring at constant speed, Wil do for he st attempts. ‘We now tur tothe identification algorithm. Obviously enough, we assume that the learning and identification stage are run wit the same ilumination conditions and ‘camera position, Algorithm EIGENSPACE_IDENTIF ‘The input isa W x W nage, 1 of one ofthe objec inthe database The image F mst satis ‘he asumpions stated tthe beginning ofthis soxtion and acquired so that the abject ean be cal seumented rom de background, We sstme the smeilluminaton conditions and camera poston adopted in EIGENSPACE. LEARN. 1 Segment the objec from the background 2% Normalize in see and energy, and represent the normalize image as a vector A Compute the fnensional eigenspace point corresponding tok ge [o)..el da, Where isthe aveage image vector of the whole database. 4 Find te igenspace point, , ated by BIGENSPACE_ LEARN, closes to, ‘The output isthe est asoited to the curve on which js; tht the deny ofthe objet in [Now fora few points of practical importance. Fist, finding the point of a curve or surface which is closest toa given point i not trivial if the eurve is represented by high numberof point (asin our case), brute force can prove too expensive, Second it may not always be tru that < x. Third, finding the eigenvalues of vey large matrices is computationally expensive, and special algorithms exist for this purpose Finally, theChapter 10 Recognition figure-ground segmentation necessary to 7er0 out background pixels not trivial, and, in general, is simple ony for certain clases of objects and with controlled scenes. 405 Concluding Remarks on Object identification How do the methods presented compare with each other? Although simple, INT. TREE isa reasonable algorithm fr ral images. It copes with missing features, noisy features, and multiple objec instances. The wild card infates complexity, which be ‘comes exponential in the number of model and data features; branch-and-bound and ‘other methods (see Further Readings) alleviate this problem, INT_TREE performs ‘roving and identification simultaneously (See the four problems of section 10.1: It ‘elects which image features are most likely to belong tothe object, and performs the identity test. The inevitable pice isa rather high complexity. Aligament or hybrid meth ‘ods ace another way to reduce the complexity of IT search Such methods match only the number of data features stitly necessary for carrying ou verification. More of this is discussed in the next chapter. Invariants provide image measurements independent of viewpoint and intrinsic parameters, and suggest an easy strategy for model acquisition from real images. These ace very valuable characteristics for practical recognition systems. However, their us- ability is subject othe possibility of defining invariants forthe objects of interest, but ‘ot many invariants are known and easy to compute for 3-D shapes. Other points r= {quiring atention include grouping, which is shared by all classifications methods based ‘on local features, andthe discriminational power of each invariant (how reliably ean different objects be told apart given noisy images). Tt can be more laborious to build models suitable for an interpretation-tre al gorithm, as we presented it, than fr invariants Moreover, interpretation tree locate instances ofa given abject in an image, and require model search to recognize all ob- jeets present in an image; invariants, instead, support direet mode! indexing, and do rot require model earch. However, interpretation trees take care of feature grouping, invariant do not. Invarians-based methods allow one to build mode! libraries from only one or two views per object, as opposed to the many views required to build a parametric cigenspace. However, eigenspaces can be bull for any 3-D shape and do not require feature extraction, while invariants can cate only for special shape classes and depend on the performance ofthe feature extractor. Asain eigenspaces do not require feature rouping, invariants do, A disadvantage of parametric eigenspace methods that they are vulnerable to oclusion and sensitive to segmentation 106 3-0 Object Modeling As promised at the beginning ofthis chapter, we now bring together the hints to 3-D ‘object modelling scattered throughout this chapter. By now you have certainly realized that designing adequate modes is tremendously important, and, indeed, 3-D object ‘modeling is much-investigated issue in computer vision. The aim of this section is not to lst the many representations in existence (although some examples will Section 10.6 30 Object Modelling 271 mentioned), but to make you reflect on the necessity and design issues of 3-D object ‘modeling, We refer constantly tothe identification methods we learnt tohelp you make practical sense ofthe discussion Let us delimit the scope of ths section, and give a definition. We consider only models of 3-D objects and their features, but models canbe devised fo image features, (see for instance the edge models of Chapter 4) or even images (see the eigenspace ‘epresentation of secion 10.4). We limit ourselves to geometric features, as this book concentrates on the geometric properties of the world; the Further Readings point you to alterative models Also, our treatment of object modeling is centered on identification and locition, but -D models ate used for various other purposes (e. inspection). And heres the definition promised. Definition: Computer Vision Model of aD Object A computational represniaton of slfislent numberof geometric properties of the object to petfoom a desved vir tasks, (ur definition suggests that computer vision models need not be exhaustive: we ‘want to represent onl what we need to accomplish a given task. For instance, to tll X from ¥, we would like the represent onl the minimum information necessary tell X and Y apart. This i different from computer graphics, where must be exhaustive: You cannot generate an image and leave blank areas! ‘The identiicatien method presented inthis chapter suggest at lest two pairs of alternatives for objec modelling schemes: feature-based versus appearance-based, and object-cemtered versus viewer-centeed, 106.1. Feature-based and Appearance-based Models ‘The fist alternative soncerns the base elements ofthe representation: features or images Feature-based models represent 3-D objets through features, theit type, and their spatial relations The features can be thse introduced in Chapters 4 and 5, or others, pethaps nongeometrg, like colo, reflectance properties and polarization. We used feature-based models for interpretation tees (te features were surface patches) and invariants (he feaires weer enntoe) With eate-hased made, dentition means finding a set of features whichis uniquely distinctive for an object; location ‘means in essence, to match a number of image and object features, plug thei positions inthe projection eqrations, and solve for the positon and orientation ofthe 3-D object!” The advantage of feature-based models is that they generate compact object descriptors offer some robustness against occlusion (features are local), and some Intl th isthe rasor why the econ on eet modeling comes afer thse enifton a Deca bout hit iden a cused a he eat caper.m Chapter 10 Recognition invariance against illumination and pose variations A disadvantage is that they eannot bbe compared dirty with images and require feature extraction. Notice that feature- ‘based models are further classed in terms of features type: For instance, boundary representations, 0 B-reps, describe an object by its boundaries (eg, lines, surfaces), ‘olumetrc representations describe the position, size, and approximate shape of the main parts ofan objec, or just the shape ofthe volume of space occupied by the objec. “Appearance-based models represent an object through one or more images, as in the eigenspace method, This works for -D objects, or for 4D objects constrained to limited number of poses (as assumed by EIGENSPACE_LEARN); however, as cigenspaceshave shown, oe needs manyimagesto represent 3-D objectsatistactorily. Recognition (both identiiation and location) means to find the image in a model set which is most similar to the one to recognize. Similarity is quantified by a metric ‘measuring the distance between images fo instance correlation or SSD. The advantage js that images and models can be compared directly, and objects with no obvious features can sill be modelled. Disadvantages include the fact that lumination, pose, and location variations alter the images. 10.62 Object Versus Viewer-centered Representations, The second alternative concerns the fat that -D objects canbe observed by different viewpoints ‘Object-centered representations attach reference frame to an objet, and express the object geometry (points lnc, surfaces) in that fame. We did this with interpretation trees. The classic example of object-centered representations isthe generalized cone, ‘that we can figure as a section sweeping along and perpendicularly to an axis; the section may change ast moves, A special clas of feature-based modes is needed to deseribes deformable objects like faces, clothes, and human bodies. Snakes (Chapter 3) areexamples of features atthe hear of deformable models andthe deacan be extended to three dimensions If the deformation allowed are limited, an interesting idea is to ‘model a deformable object with a reference shape and these of its deformations. Viewercentered representations model 3-D objects through a set of images or views taken, ideally, in all posible conditions (viewpoint, illumination, sensor param eters), Views can be images or processed images (thai, visible features). Eigenspaces are an example of viewercentored representations storing an image for each views ‘spect graphs, instead, store a feature-based description foreach view. An aspect isa set fof viewpoints in space fom wlil the sane object Features ave simultancously visible. ‘When, moving he sensor orth objec, some features appear or disappear (a situation called visual event) a new aspect is entered. Aspect graphs are computed by algorithms analyzing symbolic, feature-based object descriptions, in which the features are nearly invariably contours Buta symbolic object description is not always available, and ro feasible aspect graph algorithms exist for complex shapes. In this case, appraximate SNe the erences ten aie eats and elgnspocs the forme ar are om x symbole scion the ater rom naps th omer ne ete bsed tbe later mage based. Moreove, it een sede ene dase anes in core, Section 107 Summary 278 aspect graphs can be built. The set of posible viewpoints is restricted toa finite grid coma sphere centered on the object; the features visible from each viewpoint are found by raytracing @ CAD model ofthe object. This method is applicable to any object for which a CAD mode! can be built, which are many more than those for which viable aspect graph algorithms exist. The problem is that not all the important features may ‘be captured by the faite set of viewpoints on the sphere. 10.6.3. Concluding Remarks iow should we chose between feature and appearance-based representations? This {depends largely on the object tobe represented and the algorithms at hand. Frinstance, ifwe must represent set of polyhedra, and have access to ane detector, a description in terms of lines andtheir relative postion in space i reasonable: It describes well the ‘objects’ shape, and we can identify the features in input images. Feature-based models 4o not make sense for objects without detectable features; if the only features we can detect are straight lines and we must represent aset of human faces, we would be better off with appearance-based methods. Objectcentered representations are intrinsically feature-based, so identification and location work enatly as explained for feature-based models, and advantages and isadvantages are similar. For instance, a disadvantage is that input images and models cannot be compared directly a the senso isnot an element ofthe representation, and object appeararce is not explicitly predicted. The features chosen should also create image features which ae reasonably invariant o viewpoint and illumination (for Instance, a line remains a line from all viewpoints but a circle is turned into diferent ellipses). Finally, shape may be too complex to be represented in terms of features. ‘One disadvantage of viewer centered representations i their lpg siz, so that efficient algorithms for view comparison are vital. The eigenspace method i basically an efficent way to perform image correlation, but it suffers from the problems of ppearance-ased matching; with aspect graphs, such problems are alleviated thanks to the use of feature based views, but searching large aspect graph is ill a problem for applications. A great advantage of image-based methods is that any shape can be represented, no matter how complex, as long as we can take images of it. 107 Summary ‘After working through this chapter you should be able to: {explain the nature of 3-D object recognition and identification, and describe and ‘motivate the sthproblems involved design a basi iatepretation tee algorithm and a few variations design simple appearance-based system learning a small database of object ‘models and recognizing objects in unknown poses 1D design a simple invariantsbased system for plana shapes 1B. discuss the main issues of representing -D objects for model-based recognition274 Chapter 10 Recognition 108 Further Readings (Our discussion of interpretation tees andthe algorithm INT_TREE, isbased argelyon Grimson’s book [14], which contains an extensive analysis ofthe method, details its use with both range and intensity images, and shows how geometric constraints can reduce the search, Fisher [12] report an informative, experimental comparison often different variations on the basic intrepretation tree algorithms, Detailed introductions to state space methodk, the historic roots of interpretation trees are found in many books on artical intligence, for instance [25] an [16]. Haralick and Elliot [15] discuss efficient ttee search for constraint satisfaction, Murray ea. [20] isan example of application of interpretation tes to intensity images. ‘Another approach to feature-based identification is graph matching, in which ‘one builds graphs composed of features (nodes) and adjaceney links (ares) for both image data and model, Identification is cast a a subgraph isomorphism problem; that is deciding whether the data graph i contained inthe model graph Solutions to this dificult problem are computationally complex, and for this reason no algorithm has been included here. The interested reader can const [1,24 “Methods for searching large databases of object models for identification purposes include model invocation {11} “Zisserman and Mundy’s book [18] s the best starting point to investigate invar- tants; the frst chapters an excellent introductionto the field. INV_ACO and INV_REC are based on the complet, invarant-based vision system fr planar objects reported in Rothwell eral. 23), which includes interesting discussions of algebraic and nonalge- braic invariants, grouping, and recognition times. Invariance can be extended to image features other than contours; se for instance Nayar and Bolle (21) ‘The parametric eigenspace method for appearance-based identification is due to ‘Murase and Nayar [19], who also suggest ways to address the problems pointed out at the end of section 10.4. Finding the closest point to a curve or surface is discussed in several articles on free-form object location, or instance (4 Building automatically databases of 3-D models from multiple views of an object (reverse engineering) continues to attact substantial research. Accurate models can be ‘obtained by fusing range views [4,28 13} see also Chapter 2 and references therein for ‘model acquisition sensors. Notice that algorithms computing structure from stereo and ‘motion (Chapters 7 and 8) can be regarded asthe bass of a model acquisition system, ‘using maliple intensity views ‘Object modelling and shape representation is a pedigree problem in computer vision, and the literature is vast. General discussions can be found in (1, 3, 7, 17, 24), 1 which we refer for pointers tothe many representations devised by computer vision. The classic reference for generalized cones is Nevatia and Binford [22]. An increasingly popular typeof deformable-object models based ona reference shape and. its deformations isthe Point Distbution Mode! [10] (see also the Further Readings on snakes in Chapter 5). For exible models of 3D objects, see for instance [9}. Bowyer and Dyer [6] give an overview of exact and approximate aspect graphs and (5) isan instructive debate on their pros and cons. Useful articles are also found in computer raphics journal for instance the IEEE Transactions on Visualization and Computer Graphs (see nttp:/ wey. 100e 0rg/). Section 109 Review 275 109 Review Questions 0 10. Which nongcometrc features could be usd fr 3-D object identification, and in which asumprions? Mention situations in which nongeometri features simpiiy visual identfcaion, and other in which shape isthe Best or the only feature posible © 102 ‘Think of al-world applications in which an denieation system could be usefl. Discus potential advantage and problem for each application on your lis, and the sur ability ofthe methods presented in his chapter. 1 103 Consider the example of Figure 103. I the image features had been listed ina diferentoner, the right path could have been found earir. Can youthink of any citera fr ordering image and model features otha the search generally reduced? 9. 104 Wihy did ne not use conics forthe estimation ofthe projective transforma tion in Step 4 of INV_REC? 11105 Explain titively why; nthe case that al he lines associated to an object, hypothesis belng indeed to that object, the ank of the mati Ai (1010) 88 1106 Why do we need to assume that the illumination conditions ate the same for EIGENSPACE_IDENTIF and EIGENSPACE LEARN ifthe grey levels are omalied (x? = 1? Does the normalization countractthe effects of changes illuminant destion? Ando illuminant intensity? 1 107 Explain the seasons which make cach ascumption of EIGENSPACE_ IDENTIE necsary. (Hint for Assumption 3: Consider two diferent images, f and, such thatthe minim value rear than the maximum value ‘The conrelationof and islarger than the autocorrelation ff, yet f cannot tbe more smiat to than itself) 108 What hagpens to algorithm EIGENSPACE_IDENTIF ifthe object inthe input image isnot one of te objet in the database? 109 Suppose wo ofthe eigenspace curves built by EIGENSPACE_LEARN intersect in sone points, What does tis mean in terms of identification? 2 10.10 Compare the characteristics of the thee identification techniques presented in hs chapter. Tent clase of objects and appiaton or whic some ae better suited than others. Why iit the case? 10.11 Wessidthat idenitication and location can be performed simultaneously. How could thisbe done? 1 10.2 Why did we say that objectcentered representations are necessarily feature-based? Exercises © 101 Estimate the complexity increase of the interpretation tree search caused by the introduetion ofa wildcard26 Chapter 10. Recognition © 10.2 Expand the complete tre explore by the search illustrated by Figure 103. How many inconsistent solutions are generated? © 103. Modify INT-TREE into an alignment algorithm: that i, verification takes place as soon as enough features for computing the modet-mage transformation have been computed, 2 104 What happens of an interpretation tree search without wid card, which ‘comes across a spurious image feature as the first feature? © 105. Prove that, if no two of the three lines hl, and Is ae parallel, the deter inant of Mio (in the same notation of (10.1) equals the area ofthe triangle the vertices of which are dhe intersections of hh, and 2 106 Prove the invariance of J; oF Fs of (101). (Hint Draw five fines on a sheet of paper and observe that all four triangles involved in the invariant havea side hich lies on a same line, Now from the definition of eross-rati © 10.7 Assume a set of objects characterized by a n-dimensional vector of inva ants. Tlustrate how an interpretation tre could be used to guide invariant-based recogaition, and what particular problem of invariant-based recognition would the interpretation tree address. © 108 Give criterion to estimate k automatically in EIGENSPACE, LEARN. © 109 Adaptalgoritm CHANGE, DETECTION forsegmenting the object from the background in tep 1 of EIGENSPACE_LEARN 2 10,10 Prove that the Euclidean distance between two images represented as vectors i the same as the SSD (sum of squared differences) distance used in Chapter 7 for stereo correspondence. © IAI Extend EIGENSPACE_LEARN and EIGENSPACE_IDENTIF to take into account a variable illumination direction, Project © 10.1 Implement wo versions of INT_TREE, with and without wildcard and run both on a set of synthetic data (created, for instance, by writing les of symbolic patch descriptors by and), Verify the complexity increase caused by the wild cae References it DH. Ballard and CM. Brown, Compuer Vision, Prentice-Hall, Englewood Clif, NJ (098, (2). Bergevn, M. Soucy, H. Gagnon and D. Laurendest, Towards a General Multvew Registration Technique, [BEE Transactions om Pare Anas and Machine Inligence, Vol. PAMIIS, pp. 540.547 (1996 4. Bes), Geometric Modeling and Computer Vision, Proceedings othe IEEE, Vol. 1,0. 5369881088), PZ Besl and N. MeKi, A Method for Registaion of 3.D Shapes, IEEE Transactions on ‘Potern Analysis and Machine Ineligence, ol PAML4, pp. 238256 (19). 8 in References 277 [5]_K. Bowyer (ed), Way Aspoct Graphs Are Not (Yet) Practical for Computer Vision, Computer Vision, Graphics and Image Processing: Image Understanding, ol. 88 pp 212. 218(1932) [6] K-Bowyerand CR Dyer, Aspect Graphs: An Introduction and Survey of Recent Rests, Intemational formal of bnaging Systems and Techoogy, Vo. 2, pp 315-28 (199) (7) MA. Brady, Critra for the Representation of Shape ia Huan and Machine Vision, J ‘Beck, B. Hope and A, Rosenfeld ods, Academic Pres (1983). [8]. ¥. Chen and G. Medion, Object Modeling by Regssaton of Mulple Range Images, Image and Visio Computing, ol 10, pp 145185 (1982). [9] 1.D.Cohen and’. Cohen, Finite Elements Methods fr Active Contour Moslsin Ballons for2-D and 2D ages BEE Transactions on Pater Anais and Machine Ineligence, Vol. AMIS, pp. 131-1147 (1993), [20] ‘TE Cootes,C Taylor, DHL Cooper and. Graham, Active Shape Models ‘Thee Taning and Application, Computer Vision and Image Understanding, Vo. 6 9p. 38-59 (195) [11] RB Fisher, From Surjaces to Objects, Jon Wiley and Sons, Chichester (UK) (1988). [12], RB Fisher, Perfomance Comparion of Te Variations on the Interpretation Tree Mach ing Algorithm, Frac. European Cont on Computer Vion, Sockbolt, pp. SU-S12 (1984), [03] A.W. Fitzgibbon DLW, Eggert and RB, Fisher, High-Level CAD Model Acquistion from Range Images, Compu Aided Desig, Vol. 29, p. 321-230 (197), [18] WEL Grimson, Object Recognition by Compuer:the Role of Geometric Constraint, MIT ros Cambri (MA) (1990) [15] RM Hara and GL. Eliot, Increasing Tree Search Efcieny for Constraint Satsac: tion Problems, dif Ineligence, Vl 14, pp 263313 (980, [06] GE. Lager anc WA. Stubblefield, Arial Inlligence, second edition, Benjamin’ Cummings, Redvood Cay (CA) (193) [07] D. Mar, Vision, W.H, Foeman an Francisco (198). [U8] 41. Mundy and A. Ziserman, Geometric Invariance in Compuer Vision, IT Pres Cambridge (MA) (199). [09] HL Murase and SK. Nopar, Visual Learning and Recognition of3-D Objecs rom Appeat: ane, Intemational Toural of Computer Vision, Vo. 14, p. 5:24 (1985. [20] DW. Murray, BA. Castelow and BF. Buston, From Images Sequences to Recomised Moving Polyhedal Object Intemational fournal of Comper Visio, Val. 3, pp 181208 (298, 1] SK. Nayar and RM Dole, Reflectance Ratio: A Photometric Invariant for Object Reeog nition, Proc IEEE Inerational Conf on Computer Vision, pp, 280-285 (193) [22] RNevatia and TBinford, Descision and Recognition of Curved Objects, Arf Il. ligence, Vol. 38,9p. 7738 (197, [23] CA. Rothwell, A. Ziserman, D.A, Forsyth and JL. Mundy, Planar Object Recognition Using ProjectiveSape Representation, iterational ural of Computer Vision, N15, 1p. 57-99 (1985) [28] M.Sonka,V. Hlvacand RBosle, Image Processing, Analysis and Machine Vision, Chap man and Hal, London (1953 (25) PH.Winsion, Amici! neligence, Third edition, Addison Wesley, Reading (MA) (193).11 Locating Objects in Space Un sunt poculadulciora elle? Horatio This chapter introduces algcrithms for object locaton; that is determining the position and orientation ofthe objects in view, assuming their identity (model) is known, Chapter Overview Section 1.1 introduces the problem andthe assumptions made for this chapter. Section 11.2solves object locstion piven ull: perspective and weak-perspeciv intensity image. Section 1.3 solves object location given a range image. What You Need to Know to Understand this Chapter + Basic feature extraction (Chapter 4) and object recognition (Chapter 10), + Perspective and weak-perspective camera models (Chapter 2}. ‘+ Least squares, SVD (Appendix, section A.6). ‘+ Rotation matrices ad their parametrizatons (Appendix, section A.) Whee ze he che este thin honey? 29280 Chapter 11 Locating Objects in Space 114 Introduction ‘Once we have identified the objects in an image, how do we find their location in space? ‘This problem is known in computer vision as model-based object location, but you are likely to come across other names for instance model marching, pose estimation, and optical jigging. We begin with a general definition of the problem, which willbe specialized fr the intensity and range cases. Problem Statement: Model-based Object Location Given an image, a sensor model, nd the geomet model of the object imaged, nd which 3D position and orientation of the mode generated the image Postion and orientation (or pose) refer to the 3-D translation and rotation, re spectively, which bring the object model wherever it was observed by the sensor. The definition above hints the assumptions we makein thischaptr, which are summarized below ‘Assumptions | 1. Location uses single image. 2. A model of geometric image formation (sensor mode is completely known, [3 Object models are object-centered, and based on geometric features. ‘4. The detifcation problem as already ben solved ‘Assumption 2 means, for instance in the case of intensity images, that a precise camera model is adopted (eg, full perspective), and the cameras intrinsic parame: ters are known, Assumption 3 excludes appearance-based object models, and models formed by non- geometric features (see the Further Readings and Chapter 10 for more ‘on such models). Assumption 4 means that we know thal a precise part of the image (forintance, a group of features corresponds to an object model and, within that part of the image, which image feature corresponds to which model feature. "Animportant aspect of locaton, introduced in the previous chapter, is verification. Errorsin the output of feature extraction and object identification can cause inaccurate or definitely wrong location estimates. We can verify the accuracy ofthe latter by creating an image of the model in the estimated postion (backprojecting the model), and checking how close the mode! features are imaged tothe corresponding featuresin the input image? Note at his ps becuse the sensor mols ksows Section 1.1 Introduction 281 @ w Figure 11.1. (a) Linedrawing rendering ofa }D object mode! (a photographic stand). (6) The ‘mode's location has ben estimated from a range image, and the model backprojcted. Notice the imperfet alignment, mainly de to the incomplete range data. Courtesy of A. M. Walle, Heriot. Wat Universit Asan example Figure 1. illustrates model-based matching and verification for ‘range image of a photographic stand, Another example is shown in Figure 11.2 in hich intensity and range data are used together to compute location, Before moving on to intesity-based location, we summarize the three basic modules ofa recogn tion system,’ as defined by the discusion and assumptions ofthis and the previous chapter. ‘Object Recognition: Basic Modules 1. Objcerideniaton, which selects object modes roma database of modelsand establishes the erresponderce of model and data features. 2 Model based losin, which postions he seleted modes in space. [Veriton whi tters ou the Byptbeses (located model) hat prove coast with the input mage © The ype of images (range or intensiy) makes a substantial difference for model-based machin. In the imtenity css, model and data ae diferent in nature (te modal is 3. the image 2), and mus be transformed oe compared: inthe range case, both models and ala are spesied by SD coordinates, and canbe compared unalred (mie ston an eet ae ake fr rtd282 Chapter 11 Locating Objecsin Space Section 112. Matching from Intensity Data 283 11.2. Matching from ntersity Data This section diseuses model. based objet location from a single intensity image (more precisely, roma setof image features) All model points and vectors are expressed in the model reference jrame, te frame associated withthe object model (Chapter 10}; all data points and vectors are expressed inthe camera reference frame, as we assume the intrinsic parametersknown, ‘We describe two methods The frst one adopts a full-perspectve camera and is basically an application of Newton’ iterative method for solving a system of simulta neous, nonlinear equations; the features canbe either points or lines The second one «employs the more retrictive weak-perspective eamera, but givesa closed-form solution (as opposed to an trative one) using three corresponding pars of image and model points: We start by satng the problem in the ease of point features, Problem Statement Lary sith Y= [X",78,28]" and n> 3 expresed in the model reference frame, bem point of an objet model. Let Py... Py with P,= [Xs Yi] expressed in the camera reference frame, indtte he coordinates ofthe corresponding points on the object observed. Let BiB Mith =D i, Bethe image pons, expressed inthe eamera frame, projections tthe Determine the rig tansformation (rotation matrix of ents (t..1,1]' algnng te camera and model reference frames tat, Do.tazi = Ren +t quay © Notice hat the problem equivalent to determinngthe camera sextrinsic parameter with espoet othe objec eerece frame. In thi sere, solving the location problem means locating the camcra in space and consequent locating te vehicle of root actuator on stich the camer i possibly ited, 11.21 3-D Location from a Perspective Image ‘The structure of a location algorithm depends on the camera model adopted, a the lauer determines te equations of image formation. Inthe ease of full perspective, he Figure 11.2. {a) Range image ofa mechanical component. (b) Surface patches extracied (Chapter 8). ()Inensiy image ofthe sme component. () Linear ‘edges detected inthe intensity image, =) CAD model ofthe compocent located algorithm can be formulated as an application of Newton's iterative method for solving land backprojeted onto the intensity image. Courtesy of W. Austin, Heriot-Watt systems of noalinear equations University. Outline of the Algorithm. The relation bewween an objet point and an image Point, both in camer: coordinates, is given by the usual perspective projections: $ Wedereat neces cyuton without asuningpviorknowledgot Nevo’ metho youre aie ssh yu loon eit simpy mena cued by he ac tha 3 oon mati Isai ents bt thes: depend onthe parameters a24 Chapter 11 Locating Objects in Space rego? =| 28,28 bal Th. iy Pgging (11.1) nto (11.2), we se that each point corespondence generates to non linear equations, nu X" +no¥? +n3ZF + Th Per ea aX tral tne 47s TAK ral Prat +Ts ata) meas ‘Te unknown components of ® and T can be dtexmined froma sfcent numberof correspondences, each binging tvo equations ike (113) The resting stem has si ‘knowns a & depends only on thre free parameters (Append, section A) the following these ae the rotation angles about the three axes ofthe cameras frame, dy, dx and dy. Our six unknowns ae therefore ¢, 8,5, Tis TT. Solving the system i ‘where Newton's method comes into ply. [Newtons Method. As any iterative technique, Newton's method stat off with an intial guess for R and T, say R and T°, and computes the location ofp through (113) with ® = #" and T=T".16R® and are not to fr from the true solution, (R, Ty the residuals (ay are smal and can be approximated by a first-order expansion of x (1, 62,63, Ti- and yl. $s Tis Te, ) given by (11.3) ina neighborhood of Rand 7? Let us compute the partial derivatives necessary forthe expansion. The partial derivatives with respect to 7,7, and Ts are 1) ax, a and on an For the partial derivatives with respect to the rotation angles, we must recall the defni- sion of derivative ofa vector with respect to rotation angle, illustrated in Figure 1.3, Given our choice of angles, we have oP, r Fat naT xR, Pe toto xP, i Section 11.2. Matching from intensity Data 285, Figure 112 The derivative of point vector, wi respect to a rotation angle, 8, about 2 zectin through the oiin,m (In= 1) is ven bythe vector product of mand P and rom which we obtain by PRN a a; i Pe is and ay AAW a Kh a _ (Xi a TE aI ZR esa “Therefore, for each image-model point correspondence, the expansion yield the fl lowing pair of linear 2quations: 28 a 4 8 ayy Logan + gab =se us) ay gp, Bar + Bae) = San + Fao) =8nChapter 11 Locating Objects in Space The six unknowns AT; and Ag, can be determined if atleast three point correspon- denoes are known, To counteract the effect of inaccurate measurements or correspon ddences however, we xy 10 use as many correspondences as possible. “The method proceeds by producing new estimates forthe rotation matrix and translation vector, and iterating the procedure until th residuals fx, andy; become small enough. Some Necessary Details. In order to arrive atthe customary algorithm box, we must address three problems. 1. Given the curentestimatesR and, andcorections 7; and how do we compute the new estimates) and)? 2 Is the solution unique? ‘3 How doe determine a satisfactory intial guess? Updating the Estimates, Updating T, is straightforward: Tu =T +07) Updating 8, instead, requires some care. The new rotation matrix, Rj, s obtained by repeated matrix multiplication of Rj with the three rotation matrices defined by the correction angles, Ad, Ady, and Ads. Appendix, section A.9 gives the structure ofthe ecessary matrices ‘= Upon convergence, the correction anges become sufiintly smal to make the order of multiplication irrelevant (Exerc 111). However, thsi ot true a the begining. In over tibet ofc an ode, a sik o your chose, Uniqueness of Solution. 1 large number of feature points in nondegenerate configurations (eg not all coplanar are employed, te issue of multiple solutions with the exception of symmetrical objects, canbe safely ignored. Multiple solutions do arse ‘ith small numbers of features ¢g.,n =3)sin this case, one way of finding the correct, solution isto run the method ftom several, different starting positions. Fora thorough iscussion of ths issue see the Further Readings, Determining the Initial Guess. Fortunately, this problem is less critical than expected. The (11.3) are linear with respect to translation and scaling over the image plane, and approximately linear over a wide range of values ofthe rotational parameters. “rence, the method is likely 1 converge to the desired solution for a rather wide range ‘of possible starting postions, and even rough estimates of T and should be sufficient toensure convergence to the true solution, Suggestions for ths initialization stage are sven in Exercise 11.2 Summary of the Algorithm. We are now ready to state a location algoritam, Section 1.2 Matching from Intensity Data 287 Algorithm 3D_POSE “The input is formed by n coresponing image and model pins with » > 3, and the inl estimates # and 1, Use the curent stimates of and and (1.1 10 compute the that the rected location of the model points Pin the camera ame. 2 Projet he P, ono the image plane though (11.2). ‘A Compute the resduls and 41, from (11.4). ‘4. Solve the ines stm formed by n instances of (1.5) forthe unknown corrections 7; and 49, J 1.2.3 ‘5. Upsat the curtent estimates ofthe transation vector and the rotation matrix Ce duals ae sutficienly small exit ele goto tep "During the it erations the residuals should docroase by abou one oder of magnitude periteration. Thar afew iterations are usualy sufcien to obain a satisfactory saluton Rernember that stem (i) (step 4) ought tobe overcosstrained (n> 3) to counteract oie, and solve by eat squares. Extending the Method o Line Features. Since linesate often easier to detest and localize in images than points, lne-based location has important practical applications. ‘We just sketch the principles behind the extension of 3D_POSE to line features; the development ofan algorithm is lft as an exercise (Exercise 11.3). We write the equation of a line in the image in the form vorrei ‘Observe thatthe distnce between tis lin anda point, fr, yo issimply d ~ do), with (19) therefore, the derivatives of dy are simply lincer combination of the derivatives of and yp, with the sam weights of (11.6) Now, given a st of pairs of corresponding image and mode! lines, we choose two points on each matched image line, and compute the distance between each point and the matched model line, as shown in Figure 114, Since each point gives one ‘equation forthe corroction parameters, andtwo points are sufficient to nique idenlify the model line line-tosine correspondence brings home the same information (wo equations) of @ point-to-point correspondence, and the stracture of algorithm 3D_ POSE remains unchanged. The only changes are thatthe (11.6) are replaced by the two ppoint-line distance equations and the expressions ofthe two distances replace those of {he residuals (115),288 Chapter 11. Locating Objects in Space Section 11.2. Matching from Intensity Data 289 Figure 11.4 The dotted lines show the line- ‘sed model of box backprojected onto the mage. Te olidlines show thelines detected the input image ofthe box, each corresponding to the closest model line. The arrows show the ‘wo pois and distances used by the method for one ofthe pairs of matched lines. ‘= In order to achieve stable estimate the tw points onthe matched model ine shouldbe the endpoints ofthe matched image ine (ee Figure 1.4) 112.2. 3 Location from a Weak-perspective Image This ction presents an alternative method, esl ilustated in geometic terms The rethod utes pot correspondences and assumes a weak perspective camera. It ako cass us into our next tas, locating objects from range dal. Setting the Location Problem in Geometri Terms, Fits of ll.et us summarize the assumptions behind the method. ‘Assumptions 1 The viewing goometry can be approximated by the weak perspective camera model, 2 Thre image points Br a Cortespn fo the model pons Pf, PY PE, respectively The most obvious difference between this method and 3D_POSE is the camera, model. The simpler weak-perspectve camera although more resrictve, is mathematically simpler the than ful-prspective one, and does not contain intrinsic parameters 5ifyoudnotrememberthewenk-penpstine camer model review scons??? Chapter 2beore song on. a Figure 115 The object pois, Py, Ps, Ps, and ther images under weak petspeciv, poh Po. generated by an orthographic projection flowed by a Scaling factors. Notice that, a consequence, the pyramid onthe left ia sealed ‘version ofthe pyramid on the ih like principal point cx focal length The price to pay, as we shall seein deal, is that we cannot determine how far each point is asa weak-perspective camera does not give information on distarces. ‘= There sno such ting lke an optal camera model er se The optimal camera models simply se simplex geometric model compatible with the conan of he problem at hand. It youean guarantee that the depth the scene observed issmal wth respect totsdstance {rom the camera, a weak-perspective camer is pefelly adequate, and there i no need for more sophstiated models Under weak perspetive, the geometry of image formation is described by an cxthographic projection followed by scaling. This ilustrated by Figure 1.5, which also shows the geometric quantities used inthe owing. In a notation sina othe ‘one sed for3D_POSE, we t P,P P; denote three point (incamera coordinates) on the objet imaged, PS, PY, PS the corresponding model pints (in mode! oordinates), and py pi o the img points projections of Pp, Py, Ps Firs, we project Py and Pz othograpically ota a plane, parallel tothe image Plane and pasing through the thie object point, Po. We then sale down the resulting triangle on tothe image plane triangle oppo. The tw angles, swell asthe pyramids shown in igure 115 differ by an unknown scale factors, The distances from x to P, and Pp, which we consider signed, are denoted with Hy and respective The method works in wo stages: 1. Compute the 2D coordinate ofthe object points inthe camera frame, 2. Find the rigid transformation bringing the model points onto the object points ‘Notice tha stage produces ase of range dat; that a cloud of 1D points Stage 2, ‘herefors, mast se the location problem from range data290 Chapter 1 Locating Objecsin Space ‘Stage 1: Solving forthe Scene Poins. The 3-D coordinates ofthe scene points {nthe camera frame can be obtained in closed form as follows. We can derive three constraints from the three right triangles in Figure 11.5 ay where the distances 4, lbs ~ pil and IPP yl are known: the dy can be computed from the image, the D,j from the model, These distances, as well as the scale factor, s, are postive; and hy (likewise A and H:), instead, are signed. The "sin in (h;~ h)issimply due tothe particular configuration of Figure 115. “Adding the first two equation in (11.7) and subtracting the third gives hha = s*(Dpy + Din — Dia) ~ + dn — ‘Squaring this last equation and using the frst two in (1.7) yields a quartic equation in the scale factor: ast tbs += rn) {Dus + Doo + Dia\-Dor + Dee + Diz\Du ~ Daa + Dn)(Dy1 + Ba Pi) Did +d, +d) + DB dy ~ dy +a + DRG, +a —dh) (9) Cy + +d + + ha ~ dn + a) +d — i) ‘Ascuming thatthe mode! triangle is not degenerate, simple but rather lengthy calew- lations (see Exercise 114 for some hints, and the Further Readings forthe complete sory) show that (1.8) has a unique solution given by [ox lore re and thatthe coordinates ofthe scene points in the camera frame are (1110) Gaye) Pe ra. ya.1 +a) (ALL) Po=He.990) Pi Section 1.2. Matching from Intensity Data 291 with is.h) = 46 (6D =, of fobs? =A) (i, = dnt call ita cofora-+ 6, (UL11 ellbusthata 3D pose cannot be estimated ft tne del eins are cole In praice the numeral solution willbe l-eonitoned whenever the ‘model pont are sealy cline; his situation should be avoided Here isa summary ofthe algorithm that recovers the coordinates ofthe seene points n the camera frame. ‘Algorithm WEAK_PERSP_INV ‘The input stormed by hee model points PF, PY, PY, and tbe tre corresponding mage pins, PoP Pe 1 Compute the seal factor wing (1.10) and (119). 2 Compute the cocinats ofthe scene points using (1.1). ‘The outputs formed by the coordinates ofthe scene points imaged, Py, Py, Ps, expresed in the camera reference fame ‘Stage 2: Solving for the Pose. We now turn briefly tothe second stage of our method, ie, how to compute the rigid transformation bringing the model points onto the scene points expressed inthe camera reference frame. We pointed out already that, asthe input i now a tof 3D points expressed i the camera frame, this corresponds exactly to solving location from range data. Several solutions to this problem are discussed in ection 11.3, and any of them can be coupled to WEAK PERSP_INV to complete the slutiono the lation problem from a weak-perspectve image Youcanzegatd ein features “AK. PERSP_INV san algorithm econstrstng shape acre) rom292 Chapter 11 Locating Objecsin Space CIRCLE PLANE IMAGE PLANE. z Figure 11.6 The cone and two ofthe thre reference frum used by POSE FROM ELLIPSE. Notice tht th ais forms a generic angle itn, 11.2.3 Pose from Ellipses ‘We conclude the section on intensity-based location with an algorithm estimating the pose of a circle in space from a singe intensity image. Given that many man-made objects contain circles, which are imaged as ellipses, this is @ useful too! in many situations; eg, estimating the orientation of a vehicle relative to a circular target for vision-guided docking ‘The geometry ofthe algorithm, POSE_FROM_ELLIPSE, is illustrated in Fig ‘ure 116. The image ellipse delins a cone with vertex in te centr of projection ofthe pinhole camera, We can fnd the orientation ofthe circle's plane, x, by rotating the cam ‘era so that the intersection ofthe cane withthe image plane becomes circle, which happens when the image plane is parallel tothe circle, This rotation is estimated asthe ‘composition of two successive rotations: The first put the Z axis through the center of the circle, and aligns the X and Y axes with the axes of the image ellipse; the second rotates the reference frame around the new ¥ axis until the image plane becomes par allel to. In general this problem has two distint solution, due to a twofold ambiguity {nthe determination of the second rotaton’s angle How do we estimate the two rotations? Let aX? + XY + c¥? dX +e¥ + f=0 ‘be the equation ofthe ellipse in the image plane. We assume all distances expressed in multiples of the focal length, which s therefore set to 1. The equation ofthe conesthen PTCP=0, «112 [X,¥, Zand isthe real, symmetrie matrix ofthe elise, The frst rotation, ‘ansforming the frame OXY7Z into OX'¥'Z’ and sending point Po P is determined AN? 4X +e? ANZ HOYT +f Section 1.2 Matching from Intensity Data 283 simply by iagonalizing C. 14, 22,2 re the eigenvalues of C, with A; < Ay < As, and 1, €2,€5 the corresponding eigenvalues, we have P= R]P=[ejele)" P. ‘The second rotation is determined by imposing the equality ofthe coefficients of 1? and ¥°, resulting in arotation around the ¥ axis by an angle which sets a = c= 3a, Notice the twofold ambiguity in 8, which cannot be resolved in the absence of further constraints Inthe X'7'Z frame, the second rotation is therefore identified by a matrix cos? 0 sind m=| 0 1 0 sin? 0 cos ‘which brings OX'Y'2'ontoa new frame OX"Y"Z". The global rotation matrix, R, ithe composition of Ry and; tha is, R = Ry. The normal to plane x is therefore “(4 Following isa descripion ofthe algorithm. ‘Algorithm POSE_FROM_ELLIPSE ‘The input isthe implicit equation of an image ellipse, (11.12), pespective projection of a ele inspace on plane. The focal legthisset to ll distances are expressed in focal length wits 1. Compute the eigenvalues },2,33 0f€ (1 <}y <2s),and the corresponding eigenvectors 2 Compute he rwovalues = Compute he rrsion matic rcan ER ue 0 sno rota “OE © Somoclipes maybe partly vise because of surface discontinuities or eckson from ‘other objects, In sack eases, errors may crep into the ellipse ft, and consequently into294 Chapter 11. Locating Objects in Space the orientation estimates Notice that POSE_FROM ELLIPSE canbe regarded a de- termini shape fram-eaure algorithm or surfaces covered by cles The cocesponding ellipses in dhe images must be large enough o be extracted reliably by an clipe detector (Chapter), 11.24 Concluding Remarks ‘We conclude this section with some comments on the strengths and weaknesses ofthe ‘two methods described, and a word of caution, First ofall, which method should we ‘use, and when? The answer depends on a nusnber of factor In short, 3D_POSE is considerably more complicated from the computational viewpoint, butcanlbe employed with points, lines, or arbitrary combinations thereof Further strengths of 3D_POSE Include the ability to aust automatically toa variable numberof data features, and the possibilty to include unknown model parameters inthe estimation process A limit, isthe need for an initial guess ofthe solution. WEAK_PERSP_INV is computationally simpler, but is likely to be less effective in the presence of large amounts of noise, as practically every method based on the minimum amount of information needed to compute a unique solution. For details on the important issue of robustness against noise, see the Further Readings (as wel as Appendix, section A.7) ‘We reiterate that any answers should be verified against the data Ideally, verification could exploit all the features extracted from the image, which in general are ‘many more than those used for solving location. However, occlusions, missing features, false correspondences and inaccuracies of numerical estimates contribute to make ver ification a difficult operation in itselt As discussed in the context of invaiants-based identification (Chapter 10), verification can be used not only to check the accuracy ‘of the estimated location ofan object, but also the hypothesized object identity (the model selected). The complexity of efficient procedures for pose verification are beyond the scope ofthis introductory chapter; the interested reader is refered tothe Further Readings, 11.3. Matching from Range Data ‘Wenow tumtomodelbased objet location froma single range image. As done through ‘ot this chapter we assim that the identification problems hasbeen solved; we mus now match a given objectcentered, feature-ased model toa given subset of range data, knowing the coresponding pairs of model and data features. All model points and vectors are expresedin the model reference frame associated wih the model ll datapoints and vectors areexpresedin the senso reference frame, defined by the range sensor asd to segue the image, and known frm cabration. ‘As model and data are now directly comparable (bth ae specified by 3D coordinates), we are better of than in the intensity case: We jst have to find the rigid transformation which aligns the model with the dat, and no projection is involved ‘This acualy an assumption that doesnot relet the way in which some range sensors operate (for insane, sale factor ould be invoincd), Section 1.3 Matching from Range Data 285 LOCATED NODEL REF. FRAME. DATA (VISIBLE SURFACES) » SENSOR REF. FRAME MODEL REF. FRAME AT START Figure 117. A2. istration ofthe location problem with range data, Staring wih the model reference frame superimposed onthe sensor reference fame, we mst estimate the transformation R,T bringing the model ono the dat ‘You can visualise « model reference frame initially coincident with the sensor reference frame, then rotated and translated so that the model points coincide with their corresponding datapoints; Figure 11.7 ilutrates ths idea with a2-D example, In the following we assume that the features forming both model and data are surface patches, extracted as discussed in Chapter 4; for simplicity, we consider only planar paces. ‘© Plnarpatchesareeay to represent and ther normals canbe estimated reliably from range {ata proved rg enough portion of the patch vibe A sfsienty high number of Planar patches ea to acurate location estimates, Following are the problem statement and our explicit assumptions ‘Assumptions The range image const a known objec in unknown poston and oentaton. Ast of feature desciptors dy... desrbing the vise planar patches of the object is avalable; each patch dosriptor contains estimates ofthe patch centr, andthe patch236, Chapter 11 Locating Object in Space oral exes inthe sensor fees ram, wel the tase fhe pln fhe ch om thoi ‘An ot onesies mal ofthe bet aval, he rg fhe ode rerense rane thot cent. Te model Sexcpor fh Wie athe am ind the ath etd ante al’ pomal bike xed ihe el tren frames elas the dite fhe ple ote pth om he ‘Mosel estore ospond data desig, forall ewe Land N Problem Statement Compute the rig transformation (rotation and transaton of the model reference fame) lig ing mel and data, "The data are supplied by segmentation and identification algorithms, and typi cally corupted by noise ofa least three kinds First, all data measurements (normals, centroids, distances) are estimates and eatry numerical errors. Second, some correspon ‘ences may be fale, Third, some data patches may be partially occluded (so that their centroid does not coincide with that ofthe complete model patch), and some model patches completely invisible (o that only a subset of the model patches can be used for cation). Al these factors contribute tothe uncertainty of location estimates ‘We solve the location problem in two ways, differing forthe order in which we cstimate rotation and translation, and forthe data used: ‘+ esiating translation frst, and representing the planar patches by their normals, and centroids; “+ esimating rotation frst, and representing the planar patches by their normals as well asthe equations ofthe associated planes 113.1. Estimating Translation First ‘This method computes the translation of the object's centroid fist, then uses the result ‘tocompute the rotation, inthe translated reference frame, bringing the ranslated model ‘onto the data Estimating Translation. fest idea is suggested by the fact that, atthe begin: ning, the model reference frame coincides withthe sensor reference frame Figure 11.7); therefore, an estimate of the tansation of the object centroid is obtained simply by averaging the centroids of all the data patches. The resulting algorithm is extremely simple ‘Algorithm TRANSL, RANGE, ‘The inputs formed by the observed centroids ofthe data paches i, ..-.P "The ouput isan estimate ofthe object centro Se ony Section 113 Matching from Range Date 287 ‘This simple algorithm does not need rotation estimates, but is not free from problems. How relicble is 1? In general, the average altenvates the errors on the centroid data; however, if some of the patches imaged are partially occluded, their centroids will be wrong, and this eror will affect T (Exercise 117). Moreover, the average of the visible centroids is generally different than the erage of al the patch centroids in the model, and some views ofthe objet can make ths problem particularly Severe, The conclusion is that this algorithm is ony a fist idea and must be improved "upon: we do soi the next section. ‘Unfortunately, 3-D rotations are trickier than translations s they can he repre sented in several diffrent ways (Appendix, sction A.9). We must therefore choose & ‘representation first, then use i to estimate the unknown rotation. We present two sim: ple methods to estimate rotations, based on two different representations For move on {he representation of -D rotations, you ae referred tothe Further Readings Estimating the Rotation Matrix. first idea ist estimate the entries ofthe ‘rotation matrix direcly then ensure that the resulting matrix is numerically rthogo. nal We needa mininum of thre pairs of corresponding data-model normal; the nine simultaneous equatens resulting, (as) constrain completely he nine entries ofthe tation matrix, which brings the model onto the data '%* Itis good dea to discard coresponding pis of normals in which the two normals form liferen angles (lowing fra resonable numerical olerace), ‘Tocounterat th effects of nose, we use al the available pairs of corresponding ‘normals and solve the least-squares problem axgmyht> Inf ~ Rag?) (1s) Tosolve problem (1.15) we ntualucea veer, = [Py n,n34m9)--«3|" formed by the entries ofthe unktown rotation matrix, and write the associated, overconstrained system of equations : (6) here Mf is 3N’ x 9Hlck-diggonal mati and fis a 3N-element vector. ‘Tiss what ei. forinsanc, nthe eration of haiti alpitn EXPL. BARS, CAL Chspier) "Note thatthe three vests ead spall ed by the cet oot plc, b this woud be moe seas iaons,298 Chapter 11 Locating Objects in Space © Notice that, thanks to the structure of M, sytem (11.16) is formed by three independent systems cach yielding oe line of (Exercise 118. Here isthe usual algorithm box, ‘Algorithm ROT_MATRIX RANGE, “The input is formed by N pis of coreponing dita and vel noma wf... ad merase 1. Form the overeonstrined system (11.16) and compute the least-squares solution, com responding toa matrix 2 Toentorce orthogonality on compute its SVD decomposition = UAV", where 100) asv(o10}o7, uy) Dor and set t01 or 1, whatever closest to det(V 0") & Compute = UAV" using the comected A. The outputs the bes eassqures esi © As usual the least squares solution sues fom the peseace of oatlins created by fae ‘correspondences and orasonal lage erorsin the data norma. this case you can run a ‘robust estimator (Appendi, section A7) to identify the outers, eliminate them fom the ata et, and reapply ROT_MATRIX_ RANGE. Estimating Axis and Angle, ‘Tho second method is geomettic algorithm based ‘om the axs-angle parametrization (Appendix, section A9). Any 3-D rotation can be expressed as a rotation around an axis® represented by a -D unit vector n, by acertain angle, ¢, so that mand p sais the right-hand rule. We assume 6 [0, x], and represent rotations by angles between x and 2x by using —m instead ofm “Two corresponding pairs of directions (two model normals two data normals) are sufficient to idemtity uniguely m and 9, which can be computed by a simple geometric algorithm, The algorithm considers three possible cases: Fist, the axis of rotation land the model and data normals are all independent, second, two of the normals coincide, and therefore define the axis of rotation; third, the model normals, the data ‘normals and the exis of rotation are all coplanar. As usual, we shal use al he available correspondences Following isthe detailed description of th algorithm, "isis knowns eran (ut om many heres ze kc by ths me) Section 113 Matching from Range Data 299 ‘Algorithm AXIS_ANGLE_ RANGE ‘The input is formed by pais ofcorespondng dats and model normals. (9) with = 1,...,¥. We sume no pairs il that a, mh ora, + fr al pairs 1, Forcahi-t puro corresponding norma: (0) a= (ns, nh a) #10.0.07, (0 Rajat xa <0) fr (G) Tacromion ale 6 [7] een by femal eg PEATE guy ae? () ELSE TF nf =n, @m=my 4) IF (anf cay <0, etm = ® (o ELSE tal ag, ands (i TFCare xa <0, ww tan)" (6) E15 (xo ation coplanar with he ta normal acne with he tel ors DOM an =m a2) (Gi) TE Cafe, <0) 06 (a) <0) «i som A= led ad 1am 1) aus)300 Chapter 11 Locating Objecsin Space ‘The algorithm canbe made more robust in various ways, You can consider ony the pir in which the deta end model normals frm the sme angles, within 2 reasonable numerical tolerance. You can ty to identify oars (severely wrong estimates) inthe my and 6 (Exercise 11.10), disacd them, and recompute the averages. You can take the median instead ofthe mean instep 11.32 Estimating Rotation First ‘The second solution ofthe location problem computes the rotation ofthe object fist, then computes the translation of the rotated model reference frame which brings the ‘model onto the data Estimating Rotation. The rotation matrix is computed essentially as in ROT MATRIXRANGE, but we incorporate the reliability of the data normals possibly supplied by the acquisition stage, and solve the least-squares problem in a different ===S=======— === ‘Algorithm ROT_MATRIX. RANGE? The input i formed by V pais of corresponding data and model nocmal,mf,....mf, and ifs. -tf respecivey aswell kaso of positive weighs... uy expressing thereby secede data normals (he lrgerw, the more cle the estimate m) 1. Bail the correlation marc K = 4, wn 2 Compute the SVD K-=VAU A being the diagonal mati of the singular values ‘A. Compute 3= det(VU, andset 8 too, whatever isclosest 44 The test east-quares estimate ofthe rotation mariis 100) o1oly (1120) 003 ‘The output she estimate ofthe rotation mati, ‘2 ifn information on data liability is valle, set © ROT.MATRIX RANGE? and ROT MATRIX RANGE are essentially two ways of Solving the same least squares problem. ROT_MATRIX. RANGE oo could be modified totabe into acount data uncertainties (Exercise 11.9). Estimating Translation, We now rotate the mode! reference frame sing R, and proceed to estimate the translation bringing the rotated model onto the data, We make ‘se ofthe equations ofthe patch planes, Pini =a (uy Section 113 Matching from Range Data 301 for data and mol respectively isead ofthe patch entoidh used by TRANSL RANGE (weshalecmment on thicoie inthe next secon) Inthe equations above, 4f and dare he dances ofthe planes (om the on aad Pi gener pit in shoe. Te indeperden planes constrain the asain compete Inthe Weal xs, the projection ofthe usknows translation slong the normal of any plane would be ely the distance between te unrated an traslated plans, (if =). Therefore we can determine Ta he slution of he east gars problem ‘= aramjni Des — ay Tay (22) ‘where a isthe normal ofthe rotated jth mode plane, Problem (1.2) coresponds tothe overconraned system of N equations ajay, re with j =1, ..,.N,which cam be rewritten inthe familiar matrix form as 1T=4, (1123) With L 230 x 3 matix, and da 3N-dimensional vector. Here isthe usual summary Algorithm TRANSL, RANGE2 ‘The nputisformed by tre esti planar patches, Pa! =f and Ps rotation matrix, and by the equations aN comesponding 1 foe data and model respectively 1. Rotate the normals al the model patches: 2 Solve the ovecostained system (1.23) by least squares. ‘The outputs the estate ofthe translation vector, The considerations on outers made for ROT_MATRIX_RANGE apply here to. 11.33. Concluding Remarks From a mathematical point of view, range-based location appears as litle more than an exercise on least squares: One way of another, we always setup a least squares problem, then just twned the crank. The real challenges are robusiness and accuracy, ‘hich depend on how well we can cope with the errors and occlusions corrupting the ata, and how well wecan dently and discard outlersin sets of input data or numerical estimates. The Further Readings and Appendix, section A.7 point to suggestions on the latterpoint Related challenges include segmenting and pairing features and estimating feature geometry accurately.302 (Chapter 11 Locating Objects in Space Using plane equations leads to more reliable translation estimates than using ceatroids, because the latter are more sensitive to partial occlusion. Nonplanar (¢., cylindrical, spherical) patches can also support range matching, at the price of more ‘complex descriptors and procedures. 11.4 Summary ‘After working through this chapter you shouldbe able to {2 explain the problem of 3-D location from intensity and range images 2 explain the different problems involved by the use of intensity and range data 12. design algorithms fr 3-D model matching from intensity and range images, given set of corresponding feature pai find the pose ofa circle in space from its image 115 Further Readings ‘The literature on model-based location from intensity images is very rch [14] contains an interesting collection of papers on -D recognition and location. Fora review on the recovery of D pose fom pont triplets in the general perspective cas, see Haack et 1a [8} A closed form solution based on quaterions i given by Horn [9]. Murase and [Nayar [15] i an example of location sytem adopting viewer-entered object models and models not based on features. 3D_POSE i based on work by Love [LI], who deals withthe uniqueness of the solution and other issues in his account ofthe SCERPO sjstem. Lowe has also reported fn extesion forthe location of arbitrary 3D surfaes and articulated objets (12) WEAK _PERSP_INV is based on Alters thesis [1] (se als [2)), which isan excellent source of references. Algorithms ROT_MATRIX.RANGE end RANGE_TRANSL ‘wereused by Banu [3]in an earl system for automatic model aoquisiion from multiple range views. Grimson and Lozano Perez [7] us the axing parameterization and show an application of AXIS_ANGLE, RANGE. Our algorithm s adapted from Orr [16], ROT_ MATRIX. RANGE2is based onthesolation givenin Kanatani book (10), Chapter 5), which includes proofs and a detailed discussion of the algorithm, The same ‘chapier gives a complete mathematical treatment of rotation estimation, least squares and SVD, ‘Among the Several nfuentil papers on model-based matching, Faugeras and Hebert [5] apply the pscudoinverse method to obtain the best least-squares transfor- ‘mation; Marshall and Marin [13] discus and refine ths method. Boles and Horaud describe the local feature focus method, and its implementation in the 3DPO system [s} designed to dite robot arm to gasp industrial parts RANSAC [6 (fr random sumple consensus) is an example of robust algorithm, inthe sense of good tolerance 10 cuter inthe data (ee the Appendix, section A.7 for more on robustness). Section 1.6 Review 303 116 Review Questions 11 ILI Whatis the location problem? How does it relate to identifcation? 1D 112 How would you modify 3D_POSE if the focal length is unknown? 113 Inthe eas: of lne-to-line correspondence, 3D_POSE requires the choice of ‘wo points on eich matched model line. What isthe best possible choice? 5114 Invhat siuationsisthe weak-perspoctive camera model suitable for model based matching? 0115 Explain why a model triangle parallel to the image plane (Figure 115) leads to instabilities in algorithm WEAK_PERSP_INV (Hin Look at what happens it the locaton of ne of the three points on plane x is perturbed inthe direction ‘orthogonal to the lin identified by the other two points) 13116 Thetwomethodsforrange-based location suggested in section 11 3estimate rotation and translation in different orders Does this mean thatthe estimated rotation and translation should be applied in diferent orders to align model and data? Why? 11 117 What is tre exact difference between applying the rigid transformations Rix+T) and Re + Ttoa pair of independent vectors u,v? Visualize your rests bby moving areal object in pace, Can you imagine cases in which the result isthe same independent of the order of application of rotation and translation? 1 1L8 Under wtich hypotheses iit sensible to take the average of several esti mates to find an approximate answer, as done in TRANSL. RANGE? 3119 We suggested that algorithm AXIS_ANGLE.RANGE can be made more robust by using the median, not the mean, in Step 2. Under what assumptions ‘would this work? When would it fil? 1311.0. How world you assign the reliability weights tothe datainput oa location algorithm ike ROT_MATRIX_RANGE2?Inother words, which elementsin the processing of the previous modules conteibute measurable uncertainty? ILI We suggested planar patches as the main features for solving the 3D location problen with ange data. Can you imagine what problems would aise in ‘ur solutions if eurved patches were adopted instead? Exercises © Id Devise numerical and geometric examples showing thatthe order of mult plication of roticon matrices matters in general (Appendix, section A). Is this still the case if the rotation angles are very small? Check the results by rotating ‘eal object by wo large and two small rotation angles around the same axes, respectively, anc altering the order of matrix multiplication © 112 Think of smple heuristics forthe estimation ofthe initial values of Rand T” {in 3D_POSE. (Hint Try to make use of information about the centr of gravity of308 Chapter 11 Locating Objects in Space the data (for 7 and 7), the apparent size (for f), and orientation ofthe matched image lines (for two ofthe rotation angles),) © 113° Write outa complete algorithm extending 3D_POSE tothe case of lines, as suggested in section 11.21, © 114 The extension suggested in Exercise 113s based on the explicit line equation (11.6), butthis representation sinadequate for vertical and near-verticallines. ‘Adapt your solution to the line representation x cos + ysin# = d, which does not have this problem, © ILS. Devise an algorithm for backprojecting model features onto data, and a procedure based on the RMS error for estimating the fit between backprojected ‘model features andthe corresponding image features. 116 Show thatthe quantities a and in (1.8) are proportional to the areas of the model and image triangle, respoctively, Show that a > 0 implies b > 0 and #2 — ae > 0. (Hint Make use ofthe angles ppop2 and PiPiP>) (© 117 If only a subset of the model patches are visible, algorithm TRANSL. [RANGE produces, in general, wrong translations (centroids). Assuming no false correspondences, explain how the model canbe altered to counteract tis problem, (Hint: Compute the model centroid using ony the matched patches) (2 IL. Write out the components of M and f in algorithm ROT MATRIX. RANGE. Use the result to identify three independent systems within (11.16) © 119 How would you modify ROT_ MATRIX. RANGE so that the oltion incorporate uncertainties on the dato normals? (© 12.0 How would you identity outers in the m and ¢ of AXIS_ANGLE. RANGE? Ifyou need help, checkout Append section A. © 1141. Using the equations inthe Appendix, section A, writ roaton maeix Rvihch expresses a rotation around [1,1] by 48” anticlockwise. Assume @ right-handed reference frame. © 11,12 Wrvea program which (a rotates wo syihtievectors, vino w,¥t8- ing the rotation matrix defined in he previous exereise; (computes an estinate ‘of R using algorithm ROT_MATRIX_RANGE. Iya rotation matrix within reasonable numerical accuracy? 9 1143. Usingte Restimatedin the previous exercise, writea program thatrotates buck uy, using, producing di ¥. Compute the distance between the tis of tia, and ¥,,- How large are the errors? How do they change if you perturh 1, With Gaussian noise of increasing standard deviation? Projects © 111 Implement the slgorithm 3D_POSE for point-to-point and ine tole cor- espondences respectively Using the solution to Exercise 11.3, study the conver References by eI 1 i 5 6 8) 8 uo) wy References 305 gence properties ofthe algorithm asa function ofthe numberof features. Test your ‘ode on real imzges of a polyhedral object using the solution to Exercise 115, 112 Implement WEAK_PERPS INV, and test the stability of the results (in {erm of reconstruction errors) with several data corrupted by varying amounts of Gaussian noie. Run experiments with several triplets of object points, so that the angle formed by the image plane with the plane defined by a triplet varies between about 3 and ©. 13 Implement algorithms ROT_MATRIX.RANGE? and TRANSL RANGE? using your favorite numerical package. Test your code on synthetic ‘data, obtained by generating sets of points rotating and translating them, and coe ruptng the resus with nose of various intensity (see the Appendix, section A. for guidelines or experiments) to check robustness. Ifyou have implemented the Patch extractor proposed in the project of Chapter 2 o if you have access to a program extracting adequate features from range data, you can test your code ‘with real data aswell ‘TD. Alter, RobustandEficent3.D Recognition by Alignment, ech Report AP-TR-I410, “Massachsets Insitute of Technolgy, Cambridge (MA) (1982) ‘TD. Alterand EWI. Grimson, Fast and Robust 3.D Reoogition by Alignment, Proceed: ings 3d IEEE Iuernational Conference on Comper Vision (193). 'B, Bhanu, Repreeataton and Shage Matching of :D Objects, IEEE Transactions on Panera Anais ad Machine Iteligence, Vol PAMIS, pp. 40-351 (198), RC Bolles and PHocaud, PO: A Thee-Dimensional Part Orientation Sytem, IEEE InernationalIournal of Robotics Research, Val 5, pp. 3-26 (1986). (OD. Faugeras and M. Heber, The Representation, Recognition, and Location of 3D Shapes trom Range Dat, Znemacona Jounal of Robots Research, Vol. 5, pp. 272 (0986, ‘M.A. Fichler and C. Bolles Random Sample Consensus A Parag for Mode Fiting wi Applications Image Analsisand Automated Carogrphy, Communications of the ACM, Vol 24, 9p. 381385 (1981), WEL. Grimson an T Lozano Péter, Mode-Based Reeogiton and Localization fom Sparse Rang a Tete Data, Inertial Journal of Roboie Research, Vol 3, pp 3-33 (i984), RM. Haralick, C. Lee, K. Ottenberg and M. Noll, Analysis of Solutions of the Throe Point Perspective Pose Estimation Problem, Proc, IEEE Int Cont on Compute Vion snd Pater Recogtion, pp. 82-8 (19) BK. Horn, Cloed-Form Solution of Absolute Orientation Using Unit Quaterions, Journal of he Opeal Soi of Americ A, No.4 (137), K. Kanatani, Geometric Computation for Machine Vision, Oxford Univesity Press Oxford 0993, D.Lowe, Three-Dinensional Object Resngiton rom Two-Dimensional Images Aifiial Inelgence, VOL 3, pp, 355395 (198).306 Chapter 11 Locating Objecsin Space 10). Lowe tng Pre The Dimes deo Inge EEE aos oo it oe Macnee VL PMI ep 1250 091 1a) Ab vaahand Mart Comper ion Model ond pasion Ne ete Cenio ) pu) enw Masten 38530 Node Rern fom Sep Cue, NT os, Cnet) 9) 15) ese dK Sap Vl Laing an Ripon. Oe om App : a ance, International Journal of Computer Vision, Vol. 14, pp. 5-24 (1995). Pppen Ix (is) MIL Om On tmaing Rte DAI ong Pape 5, Det o Ariel nei foe aly bn 0) In this appendix, we sive you some details on a number of fats needed to read and make profitable use ofthis book. ‘A1 Experiments: Good Practice Hints This secon cones ineduton othe perfomance sessment of compat von Drom and is coerned mainly with te dg of experiment wx Toe purpose Sony to provide of ood yrctce panies be wared Unt te foundations of ! tote he pei la of Mahine Vion and Appraton orale performer char tctrzationof von clgriims ination othe Web ste quoted in Chapter check out the infra ston pefoatcesesteat ecm for compat von systems and comporeng Srp. //pendora. ings /B0Mot/becinaring. ial tainted by the Ewopean Computer Vsion Network (ECV Ne). Another txing te dese HATE ol fo senerting spt for testing computer ven programs (aeep://peipa.eesex.ac.uk/nate/) yan Anand on Expres | The reason for an appendix on experiments in a textbook on computer vision is twofold. Fan fro copter eben exten icin tr dt fst thon caial on tnt dlc apne senna | mio in gee don nl Sole er of opty amatuer) enon nan ces went ste oe veoh pce 307308 Appendix Appendix Experimental Performance Assessment ‘Testing isnot just about taking separate measurements, but discovering how a system behaves asits parameters and experimental conditions vary. Thisbehavior canbe investigated theoretically and experimentally ere we ate interested inthe latter method. Our objective is therefore 10 assess experimentally and quantiatvely how well a computer vision program performs its tas. ‘Toachieve tis objective, we must 1. identify the key variables and parameters ofthe algorithm; 2, predict the behavior of the output variables as functions of the parameters and input variables; 4. devise a systematic testing provedure, checking the results ofthe tests against ‘expected values, ‘We discuss briefly each point in the next sections. In essence, we run the program ‘undergoing testing in a large number of conditions, and record the value of target variables, their diserepances from the expected true values, or errors and the error statstis Ifthe tue values are not knownin advance, we canstill easure the dil between each messure and its average, and its statistics. Identifying the Key Variables The key variables of an algorithm are: © its input variables; + its ouput or target variables; + its parameters, for instance constants and thresholds ‘We must also consider quantifiable experimental conditions which can affect the results of tests (eg, illumination dtection, uncertainties on input variables), that we shal all, experimental parameters. ‘The nature of the target variables and their asociated errors depends onthe algorithm, For the purposes of this discussion, we identify two broed classes of algorithms, aimed respectively at measurement or detection. “Measurement Algorithms. a algorithm outputs measurements like lengths or areas such measurements are the target variables. In this case, typical errors used are + themean aboe ero, tm = Bath where isthe numberof tests is the value ofthe target variable, x, measured atthe i-th test, and xp the varable’s true vale; Section A1 Experiments: Good Practice Hints 308 * themean error, the RMS (rootmean square) error, Detection Algorithms. Wfanalgorithmis meant to detectinstancesofa particular entity, the errors abeve do not make sense; instead, we observe + the numbers offale posit instances. ats spurious responses not corresponding trea ‘+ the numberof fase negatives; that i real instances missed by the algorithm, © Notice thatthe quantities above are estimates ofthe poser’ probabiliy of fake positives and fae negatives Predicting the Behavior of the Output Variables ‘There are two basic cases, depending on the nature ofthe input data: As the test data are synthetic Syheti daa ace simuloted datasets. They allow us to run our program in per fectly controlled stations; as we design the data, we know the value of al! input variables and parameters (Sometimes called ground rath), as well as the exact value of the corresponding output variables. ‘Real data ace o%taned Srom real images. When testing with real data, you should endeavor to control the input variables, parameters and experimental parameters, so that results can be compared with the predictions of synthetic tests. + Synthetic data are wef 2 ong as they miro elise situations as closely as possible It ‘good practice w hein testing with such synthetic dita then use the resus t uid the hoo of paramatrs and input vaiales ob Uo in tests with real data Its deity ‘bad procice oem tbat “the program works well” because “the results look god” wih ‘fem uncharacterized el images Designing Systematic Tests ‘The next step isto cesign systematic experiments to investigate the behavior of the program as input variables, parameters and experimental parameters vary. Again, the ‘procedure depends onthe nature of data we begin with synthetic dataAppendixA Appendix Synthetic Data, Reference Procedure with Synthetic Data 1. Choose a realistic range of variation fo each input variable, algorithm parameter, aad experimental parameters. Disretie each range using sampling steps small enough 10 {utrance a signican sampling ofthe behavior ofthe output variables, but large enough ‘omake the toal number o uns in ste 4 (below) feasible in reasonable mes, gven the ‘computational resources available, 2 ‘Selecta modelofandom, additive nose tobe added the ideal data, The model (statistical ‘dtributon anit paramcter) should ree the real mise measured the aplication; in the absence of such infomation, ase Gaussian nose lus outliers. Chooses eaisicrange forthe amount (standard deviation) of the nose, and discretize such range according to the ere of pot ‘4 Foreach posible combination of values of input parameters experimental parametersand mounts of oie: (@) generate the dataset corresponding tothe current values andcoruptit witha number of diferent realizations of noise ofthe earent amount, (6) run the program onl the nosy datasets obtained; (6) store the average values of the target variables together with the current valves of ‘oie amount, input variables algorithm parameters and experimental parameter. 4 Compate global sais (over al ests) forthe errors ofthe tage variables Itis convenient to show the results as graphs plotting the error statistics against variations of input variables, program and experimental parameters, and noise level. ‘= As we cannot test the progam in all posible situations a practical question ass: What isthe minimom number of test guaranteeing that the results are representative forthe ‘general bebavir of the progam, within a given level of confidence? Aa exhaustive an "wer comes from tats, and the considerations required are beyond the spe of his appendix Refer othe Further Reading (for instance Lapis book forthe we sory. Real Dasa, Ulvaately, the ayoritn must work with sea das fo he target application. Hence, you should test your program with real, controlled images. What and how much you can control in real data depends on the application; forthe purposes of performance assessment, ou should know in advance reliable estimates of input and ‘output variables and experimental parameters, This includes the statistical properties ofthe image noise (Chapter 3) and the uncertainty on any quantity, whichis input to the program, ‘In these assumptions, the reference algorithm for synthetic data cam be adapted and applied. You eannot vary values as you please any more, but you can certainly use the results of the synthetic tess to identify key values for each controllable range, and run real tests forthe seleted combinations of input variables and parameters Section 2 Numerical Differentiation 341 References RM. Hlaralich, C. Lee, K. Ottenberg, and M. Nolle, Analysis of Solutions of the Three Poirt Perspective Pose Estimation Problem, Proc. IEEE Int. Conf on Computer Vision and Pattern Recognition, pp. 592-598 (1991) LW. Lapin, Probability and Statsties for Modern Engineering, 2nd edition, PWS- Kent, Boston (MA) (199), ‘Machine Vision and Applications, Special Issue on Performance Characteristics of Vision Algorithms, Vol. 9, no. 56 (1997) ‘AZ. Numerical Differentition ‘Tis section remind: you of some basic formulae for numerical derivatives, and how to apply them to digita images. Given our definition of digital image andthe needs ofthe algorithms inthis beok, we limit ourselves to firs and second derivatives in the case of ‘equally spaced data. To dispel any potential impression that the problem is solved by a few, straightforward formula, we list inthe lat section a few ofthe many dangers lurking in the periloa land of numerical differentiation, Deriving Formulae for Numerical Derivatives Consider a real-valued function, (1), of which we know only 1 samples taken at equidistant points :y,-..,., sue that xj = 1.1 + for some h > 0, We want to compute finite difference estimates off’ and "the fist and second derivatives of fas ella the error on these estimates! T solve the problem we write Taylor's polynomial approximation of f(s) atx +h and x ~ hs fusmererenoebiroron) (any fe wapor-mroredreson (aa subi (A2) f(g oe te se mf he fC) Lot 4 + om, The quantity 0(2) means thatthe uncation error, caused by stopping the polynomial approximation tosecond order, tendsto 210.8 ash tends to zero. Similarly, summing up (A.1) and (A.2) and solving for f"(x) we get the desired formula forthe second derivative "Note thar weasume ta fan et hima note te stam pois the sample do telly sels!AppendicA Appendix +H) 2/001 + F0-W Pew = Let = 28a fea ‘We can derive fnteifference formulae in which the truncation ertor vanishes more rapidly if we begin with higher-order polynomial approximations (which involve more samples)? Here isa summary of the useful formulae and their truncation errors in the notation introduced at the beginning ofthe section + 00h, First Derivatives: Cental-difference Approximations Second Derivatives: Central-fference Approximations 12h Si “AI 4 om) pa ee} fis =F fis — fa, oy f ~ ae D ‘= With digital images one nearly nvaibly ets ‘Theequationsabove are called cenra-difference approximations, es they estimate the derivatives atx using samples from a symmetric interval centered on x. One can use asymmetric intervals instead, lading tothe so-called forward and backward “approximations; fr instance, fr the fst derivatives, £60 ow are the forward and backward approximation respectively, Notice that these formulae carry larger truncation errors; therefore, central diferences should be your choice whenever derivatives must be estimated wth simple formulae (see the next subsection for hits tormore sophisticated methods). i thingie the emp ht ihe oer derives off ex Section A2 Numerical Differentiation 3 Computing Image Derivatives One of the nice fetures of the formulae above is that shey allow us to compute im «age derivatives by convolution (see algorithm LINEAR FILTER in Chapter 3). Fo instance, suppose sou want to estimate the image gradient, VI ~[@1/0x, 81/0} all pixels, )- You just convolve the rows and columns withthe mask fro =) and this implemen’ the fist formula in the box a ll image points. Such masks ar sometimes called stencil "Notice that theresut can be epresented astm images, one for each radon component suming we alow nnintegcr pixel values (and donot consider peripheral ils). Building the masks implementing the other formulae in the box is left asa exercise. You may aso want to try and show that the tend for the Laplacian o fa fieyh ay oe ey ae 010 1-41 010 ‘A Word of Caution ‘What we sid so farallows you to take numerical derivatives with simple formulae, bu 'sjustthe tp of the ibeberg;a word of caution must be spent on errors. The total error of numerical approx mation is due to two sources, roundoff errors and truncation errors Thissection has considered only the later. The formers due tothe fact that most numbers are not represented exactly by binary cades in the computer; this can lead to significant errors in the result of even simple computations. Moreover, it turns out that there is an optimal value of h minimising the total error ofa given derivative of f at +4; fo make things vorse. this value varies. in general, with. Since we do not contra the spatial sampling of a digital image, we must expect that the errors associated with numerical image derivatives vary across the image. © A useful take-home messige is the fractional acuray of the simple ite-ifference ap proximation inthe box is abvays worse tan the factional accuracy with which ke function an be computed (which, nurs generally worse than the machin acura), Better accuracies can be obtained with more sophisticated methods. For instance, ‘one can ita spline othe data, and compute the derivatives from the splines coefcient. ‘A good, concise introduction to these methods (including C code) is given by the Numerical Recipesa4 AppendixA Appendix References (CE Gerald and PO. Wheatley, Applied Numerical Analysis, Fourth edition, ‘Addison-Wesley, Reading (MA) (1970) WH. Press S.A. Teulosky, WT, Vettrling, and BP, Flannery, Numerical Recipes nC, 2nd ecition, Cambridge University Press, Cambridge (UK) (1992) ‘A3 The Sampling Theorem ‘This section sates the celebrated Sampling Theorem and discusses afew related concepts This should give you the minimum knowledge necessary to go through Chapters and 3.Butifthisis the first ime you hear about the Sampling Theorem, you arestrongly ‘encouraged to read more about it The Theorem Let F(o) be the Fourier transform ofa function £(), with € (~90,00). We assume that f is band-limited; that is, F(w) =0 for Jo| > oc > O. Then the following theorems holds true. ‘Sampling Theorem ‘Tae function f canbe exact reconstructed for every € (oe, +0) fom a sequenee of equis- tant samples, f= e/a), according othe formula $F jlitlarcan £0 — Sabet = Sa © The Sampling Theorem i nontrivial in atlas three respects. Fist, ites you that any bndimited fonction canbe reconstructed acy at locations wher ths ot been ob served. Sesondtherecoseuction relies upon discrete observations. Third, the iformation contet ofa functions not “locals shown by (A.3), he reonseution of ay ale f() receives ite contributions fom al samples A few important remarks now follow, and the Nyquist Frequency ‘The frequency v,= o4/r, inverse of the sampling interval T= /o, is named Nyquist Frequency and is typical of the signal tis oe minimal sampling frequency necessary to ‘reconstruet the signal. Ths means that, fre < the series tive F pitiet=m Section A3 The Sampling Theorem 315 ‘doesnot converge 13 f(t). The diference between f and fs thats the reconstruction error, is due to the fact that the sampling distance is too coarse to capture the higher frequencies of the signal. This phenomenon is called aliasing because f,, the econ- struction of the original signal f, is corrupted by higher fequencies which behave a i ‘they were lower frequencies Ifthe Function is Not Band-imited iow strong isthe assumption that f is band-limited? In practice, not very: Afterall, ‘because ofthe integrability conditions the Fourier transform of any function (fitexsts) ‘must be very close to ero outside some finite frequency range. The problem is, one {does not necessarily know in advance the value of o, and it may well be the ease that 1 fais too small wth respect to the finest sampling distance achievable in practoe. The consequence of thisfact can be appreciated by means of the following example. ‘Assume you are given a sequence of equidistant samples... g(T),g(0),g(T), @27),... of function g(. You don't know whether or not gis band-limited, Ifyou eto =/T, the series Same =) converges to bandlimited function with a: =. (the proof of this fat is left 1 you as an exercise). I gs not band-limited, or pethaps its band is larger than 2, g and & lifer and the difference inereaes with the amplitude of G, the Fourier transform of ‘outside the band [—, 0]. Function Reconstruction Perhaps a weaknessof the Sampling Theorem is that the reconstruction given by (A:3) converges rather sloay. he function sn x/x goes to zeroas x forx -» oe. This means that samples faraway from the location where the reconstruction takes place might stil ive important contabutions. tisinstrutive to evaluate the derivative ofa band-timited funeton f(@) making use of(A.3). To compute / (0), the derivative off) at =O, we take the derivative of both sides of (A.3) set = 0, and obtain Spek, "h Where n ranges from ~co to +50 (with the exception of n=O, where the derivative of sinc(x) vanishes). Thus fora = x (the typical computer vision setting in which the pixel width is one) we have 0 fur fica, ftw fon z * ‘This formula shouldbe compared with the stencils proposed in section A.2. ~ fiat316 Appendix Appendix References |A.Papoulis, The Fourier Integra and ls Applications, MeGraw-Hil, New York (1962) AA Projective Geometry Inthis section, which i not meant tobe a rigorous introduction to projective geometry, ve give you the minimum information necessary to go through the projective material ‘ofthe book. We ist define projective transformations and standard bases, then discuss briefly the most important projective invariant, the eross-ratio. Definitions “The projective geometry immediately relevant for computer vision deals with points, lines and thei relations in -D and 3-D. In this ection, we discus the main conceptsin ‘the planar case; the extension tothe 3-D casei straightforward ‘The Projective Plane ‘We begin by defining the projective plane. Definition: Projective Plane ‘The projective plane, Piste set of equivalence classes of ples of el numbers (oot az), where two tpl p=[e,y,2)" and p bn, A point p< P? is thus identified by three qumbers, called homogeneous co- ‘ordinates, defined up to an undetermined factor. This redundant representation allows us to develop a more general geometry than Euclidean geometry. We retain only the elementary concepts of point, ine, and incidence, but we donot talk of angles and lengths AUseful Model ‘A useful mode ofthe projective plane can bo obtained in 3-D space: each projective point pis put in correspondence witha -D ine through the origin. The prof that this §8a faithful model ofthe projective plane i left to you as an exercise. In this setting all 3D lines (or, equivalently, all points of P*) stand on an equal footing. Instead ifwecut the bundle of3-D linesin our model with plane, x, not going through the origin (say the plane of equation z= 1), we can distinguish between proper and improper pois: ‘+ each point of the projective plane with z #0 isa proper point, identified by the coordinates [x/ Section A& Projective Geometry 317 + each point with: = Ois an improper point, identified by the coordinates [x yO)" ‘Thesame eascning an be applied toboth the I-D and 3-D case forthe definitions ‘of P! and P® respectively). Muatis mucandis, the picture is identical, You add one ‘coordinate forthe description of a n-dimensional point, subject to the condition that the (n+ 1)-tuple of numbers (not all zero) are unigue up to an undetermined factor ‘ence a point in the projective line is ideniied by two numbers, whereas a point in the projective space by four numbers In both cass, the homogeneous coordinates are ‘unique up to an undetermined factor. The Projective Line ‘We now cise this preliminary section by introducing the notion of projective fn. This can be easily done trough the mode above, snc collinear points in corespond to coplanar ines in the -D mode Definition: Projective Line A projective ln u, is terested by 83.0 plane gong though the origin, ot wo. ayy FF Inthe projective lane points and line are dual In (4), one can alternatively think of p 88 (a) point ing on the ine ,o (6 ine going shcough the pint Projective Tansforn AA projective transfomation isa linear transformation beoween projective spaces. In ‘computer vision, thee area least wo important clases of projective transformations ‘+ linear invertible transformations of P*,n = 1,2, 3, into themselves. ‘+ transformationsbetween P® and P?, which model image formation, In what follows, wo are interest in the first class. In particular, we want to establish that a projrtive transformation of * onto iselfs completely determined by ityaction on n +2 points. For Ue sake of simplicity, we prove this general result in the Particular cae of n=2; the extension to the ease ofa generic n> 0 does not pose any ‘problem and is eft fr an exercise. Determining a Projestive Transformation A projective transformation of the projective plane onto sifiscompletely determined one the {eansformation is knowa on four point of which no thre are elise.a8 ‘Appendix A Appendix ‘Asa projective transformation is line resent tin matrix form and write invertible transformation, we ean rep- Tp=p. ‘Since the coordinates of both p and p! are known up to an undetermined facto, the entries of Tare also known up to an undetermined factor. ‘We want to show that T can be writen in terms ofthe four points p= [xx +. -- image of py = [1,0,0]", p2=10, 1,0)", ps =[0, 0,1)", and ps [2,15 4]", respectively. Thanks to the fact that T isa 3x 3 invertible matrix, our statement is proven ifitcan be proven for T, We start by writing Ty From this equation we find thatthe frst columa of 7! can be written as Ax 921", vith undetermined, By using the knowledge of p and p), we then find that 7"! can be written as ay} wo (: a dh acy Since no three ofthe fou pont pare collinear, we can now determine 2 and v, up to an unknown factor. To do this, we us the lst availabe point, pa, to find af + ua} bos =p ay, rink toy de teh veh = 0x i ‘The nine entries of T~ are therefore known up fo an undetermined factor. In summary, we have shown that a projective trasformaton of P® onto itself is characterized by it ation on four points, no three of which are collinear. The four points py.s, are called the standard basis of P= ‘By means of similar argumens, you should be able to show that one can pick (0,0f",10.1]", [11] asthe standard bess forthe case ofline-totine projective ransor- ‘mations, and[1,0,0,0][0,,0,0]",[0,0,1,0]",[0,0,0, 1)", [1,1.1, 1)’ asthe standard bass forthe case of space-tospace projective transformations ‘The Cross-atio We close this brief appendix on projetive geometry by touching upon the vast subject ‘of invariants, We consider the most important and simplest invariant, the rossratio. Section A4 Projectve Geometry 319 Definition: Crose-ratio Given four dine points of P*, described in homogeneous coordinates 3 p Atheros moc i defined as [ssl = 20.246.9) 239 = G5) with ai, =n) —59% {he determina betwee p20 Croseato Invariance ‘The eos ati invatant to projective tansformations of P onto itt ‘© Notice thatthe orkrof the points matter inthe defniton of rosa, and each pint appears once in tke numerator and once inthe denominator. In addition, the erosatio canals ake onthe improper values. ‘You should be sble to prove that, given four points, you can define sx different cross-ratios. A slightly more dificult exercise isto show tha, if is one of thes erss- ratios, the other five ae 1/2, 1-2, 1/(1 — 2), (—1)/2, and 4/0. —1). This tells you also that for some labelling the cros-ratio goes toinfiaty, iti always possible to find a different labeling which evaluates toa finite value Itisinstructive © go through the proof ofthe cros-rati invariance step by step. If we form the determinant between two of the four points, say p; and po, and assume ‘yand y, are not zero we can write wrsmm= yn (2-2) as aa.2 Aslx,y1” arehomogeneous coordinates of ine point, 1, 2)can be interpreted asthe Euclidean distance between py and po times an undetermined factor, yy2, depending ‘om both py and pa Naw let 7, 2x 2 invertible matrix, representa generic projective ‘transformation, Th =P. If the determinant between p] and py is denoted by a1, 2), we have #0, 40,2), where [| denotes the determinant of 7 We can see that the determinant of the trans- {formation cancels outin any ratio of determinant ike (AS), but not the factors 913 such factors cancel ou, however, in the definition of crss-ato, for which invariance is achievedAppendix Appendix References JUL, Mundy and A. Zisserman, Appendix—Projective Geometry for Machine Vision. In Geomeric Invariants in Compuaer Vision, Mundy, JL and Zisserman, Ao MIT Press, Cambridge (MA) (1992), CE. Springer, Geometry and Analysis of Projetive Spaces, Freeman (1964), AS. Differential Geometry This section complements our discussion of range image segmentation (Chapter 4) by ‘recalling afew concepts from the differential geometry of surfaces. Surface Tangent, Normal, and Area Consider a parametric surface, S(u,v) = [x v),y(u, 0), 2,1) ]" and a point P on $. ‘Assume that all ist and second derivatives of § with respect tow and w exist at P. The ‘tangent plane at P is identified by the two vectors S, = 38 au and $, = 8/30. ‘The surface normal isthe unit normal vector to the tangent plane of Sat Ps thats, S,xS, 18.30 ‘Range images in ry form (Chapter 4) correspond to the parametrization S(x, y) = (x,x.Gx 9)" In this case, asi?) hs. tis also useful to know how to compute the area of a surface patch, Q, from 2 generic parametrization, Qn»), aswell s from the parametrization Qa, = (x yh, ))). If weal Ag the former and Ay, the latter, we have sox f f10. aided ay= | [ Vir rmaray [v Surface Curvatures ‘Curvatures make useful shape descriptors as they are invariant to viewpoint and parametrization. We now want to extend the notion of curvature of a curve to define ‘the curvature ofa surface, in order to derive the quantities used in Chapter 4 ‘To begin with, recall thatthe curvature ofa parametric curve a(t) = (x0, ¥10), with ra parameter, i given at each point by Section AS Differential Geometry 321 ayter'y koe a wrt ort ‘where a= dad. For the common parametrization a(x) = [x,y(x)]', the curvature becomes ko=—_, asort Consider a parametic surface, Su»), and a point P on S. Assume that all ist and second derivatives o”§ with respect 10 u,v exstat P. We define surface curvatures in foursteps. ‘Step 1: Normal Curvature of a Curve on S. Consider a curve C on S going through P. We define the normal curvature of Cat P as ky =koos, ‘where fis the curvature of Cat P, and is the ange formed by the surface normal at P,s(P), withthe curve normal, me(P).> Step 2: Norma! Curvature along a Direction, Itcan be proven tht ky does not depend on the paricular curve C chosen, but only on the tangent of Cat P, identified by the unit vector d. Thi enables us to speak of normal curvature along a direction. For the sake of visualization. we can choose C as the planar curve obtained by intersecting $ ‘witha plan through P and containing both d and ms(P). Obviously, C8 a erss-section ‘of S along d, and describes the surface shape along that direction. Notice tha, in this ase ‘Step 3: Principal Curvatures and Directions. We could now desceibe the local shape ofS at P by taking the normal curvatures at P in all directions. This i totally ‘impractical, but fortunately unnecessary. Assume we know the maximum and minimum ‘normal curvatures at P, ky and fy respectively, called principal curvatures, and the corresponding direction, dj and d called principal directions. It can be proven that the principal directions are always orthogonal: * the normal curvature along any direction, v= (cos 8, sin 6), where f isthe angle from dj to d cn be computed through Euler’ forma Ig = hy 00s" f+ hy inf * consequently, the local shape of the surfaces completely specified by the principal curvatures and directions. Soca ink of th projstion kc) ng the ace ormal ns?)32 Appendix Appenclx ‘Step 4: Classifying Local Shape. Finally, the shape classification given in Chap- tee 4 is achieved by defining two further quantities, the mean curvature, H, and the Gaussian curvature, K +h AGA k= hike ‘One can show that the Gaussian curvature measures how fast the surface moves away {from the tangent plane around P, and inthis sense isan extension of the L-D curvature {The formilae giving M and K Tor a range surface in form, (x, ,h(x,9)) are given in Chapter 4 #H References MB. Do Carmo, Differential Geometry of Curves and Surfaces, Prentice-Hall, Englewood Cit (1) (1976) ‘AG Singular Value Decomposition ‘The aim of this section isto collect the basic information needed to understand the Singular Value Decomposition (SVD) as used throughout this book. We start giving the definition of SVD fora generic, rectangular matrix A and discussing some related concep. We then illustrat thee important application ofthe SVD: ‘+ solving systems of nonhomogeneous linear equations; + solving rank-defcient systems of homogeneous linear equations; ‘guaranteeing that the entries ofa matriestimated numerically satisty some given constraints (e@, orthogonality) Definition ‘Siogel ‘Any m x nmateix A can be writen asthe product of tree matrices: "Value Decomposition Azupv". ao ‘The columas ofthe m x m matic U are mutually onthogoal unit vecors, as ate the columns ofthe mx matrix V. The m xa matrix D is diagonals agonal elements, called singular aes, are suc hat "= Whileboth U and V are no unique, the singular values; are uly determined by A, ” zl Section AS Singular Value Decomposition 323, ‘Some important properties now follow. Properties of the SYD Property 1. The singular values giv yu valuable information onthe singulavity ofa square mati square marx A i nonsingular if and only if als Singular values ae diferent fom zero. Mest importantly, theo; aso tll you how close A is tobe singular the aio called condition number, measures the degree of singularity of A. When 1/ is comparable with the arithmetic precision of your machin, the matrix Ai il-condifoned and, forall practical purposes can be considered singular. Property 2. If Ais recanguar mati, the number of nonzero equ the rank of A Thus given a xed tolerance, (typically ofthe order of 10") the number of singular values arcater than ¢ equals the effective rank ofA Property 3. If Aisa square, nonsingular matrix its inverse can be writen as atevpy", Be A singular or not, the pseudoinverse of 4, A*, can be written as a with Dj" equal to 2 for all nonzero singular values and zero otherwise, ICA i nonsingular, then Dj!= D and a* =a! Property4. ‘Tre columas of V corresponding to the nonzero singular values span the range of A the columns of V corresponding tothe zero singular value the null space ofA Property 5. ‘he squares ot the nonzero singular Value are the nonzero eigen values of hoth the mx matrix ATA and m x m matrix AAT, The columns of U are eigenvectors of AA, the columns of ¥ eigenvectors of A". Morevoer, Auk =o." and Avy = oyu, where wand v, ae the columns of € and V corresponding 0 oy Property 6. Cne possible distance measure between matrices can use the Frobenius norm, The Frobenius norm of a matrix A is simply the sum of the squares of the entries aj of A, on 4. (an)324 Appendix A Appendix ‘By plugging (A.6) in (A.7), it follows that \Alr= De ‘We are now ready tosummarize the applications of the SVD used throughout this book. Least Squares Assume you have to solve a system of m linear equations, Ax=b, for the unknown n-dimensional vector x. The m xn matrix A contains the coefficients of the equations, the m-limensional vector b the data. If nt all the components of b ‘are null, the solution can be found by multiplying both sides ofthe above equation for A" toobtain Al ax= A", It follows tha the solution i given by xeATaytay, ‘This solution is known to be optimal in the least square sense. is usually good idea to compute the pseuoinverse of ATA through SVD. In the case of more equations than unknowns the pseudoinverse is more likely to coincide with the inversof AA, but keeping an eyeon the condition numberof AA (Property 1} won't hut ‘Notice that inca iting amounts to soe exactly the same equation. Consequently, you can se the same strategy! Homogeneous Systems ‘Assume you are given the problem of solving a homogeneous system of m linear equations in unknowns ax with m > n ~ Land rank(A) =» ~ 1. Disregarding the trivial solution x =0, a solution ‘unique up toa sale factor can easily be found through SVD. This solution is simply ‘proportional othe eigenvector corresponding to the only zero eigenvalue of A°A (all ‘other eigenvalues being strictly positive because rank(A) =n ~ 1). This can be proven as follows Since the norm ofthe solution of homogeneous system of equations i arbitrary, wwe look fora solution of unit norm in the least square sense. Therefore we want to lan? = (ax) "Ane xT AT AR, Section AS Singular Value Decomposition 325 subject to the constraint Introducing the Lagrange multiplier this equivalent to minimize the Lagrangian f(a) =x" AT Ax — XX" — 1), Equating to 7er0 the derivative ofthe Lagrangian with respect to x gives AAR —ax=0. This equation tls you that 2. an eigenvalue of ATA, and the salution, x=, the corresponding eigmector. Replacing x wth, and A" Ae, with ein the Lagrangian yields Le “Therefore, the mininum i eached at 3 =0, the last eigenvalue of ATA. But from Properties 4and§ follows tha thssoltion coud have been equivalently established ashe column of V cxesponng to he only rll singular value of A (he kernel of ). This the reason wt, throughout his book, we have not distinguished between these ‘90 seemingly diferent solutions ofthe same problems Enforcing Constraints One often generates numerical estimates of a matrix, A, whose entries are not all independent, but sasty some algebraic constraints. Ths isthe exs, for example, of ‘orthogonal matrices or the fundamental matrix we met in Chapter 7. What is bound to happen is that the errors introduced by nose and numerical computations ater the «estimated mati, cal it , so that its entries no longer satis the given constraints. This ‘may cause serious problemi subsequent algorithms assume that A satis exacl the constraints __ Once again, SYD comes to the rescue, and allows us to find the eases matrix 10 A, in the sense ofthe Frobenius norm (Property 6), which sates the constraints cexacly. Thsisachieved by computing the SVD of the estimated matrix, A=U DV" ,and estimating Aas U D'7", with 0” obtained by changing the singular values of Dto those expected when the constraint are satisfied exactly" Then, the entries of A= UD'V" satis the desired ecasteints by construction. References Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich, Or- lando (FL) (1988). “ifs a pod sume stint issnglr aus sold not eto aro the expected ones.226 Appendix Appendix ‘AJ Robust Estimators and Model Fitting “Thissecton sketches few, introductory concepts behind robust estimator, n particular the socalled M estimators. Our limited aimis to support the discussion of Chapter S by explaining ‘why least squares are a maximum likelihood estimator from the point of view of statis, + wy least squares i skewed significantly by outliers, and ‘+ why an estimator based on the east absolute value, as used in Chapter 5, tolerates, ‘outliers better than conventional leat squares. "The subject asa vast literature; an inital set of further readings is provided atthe end ofthis appendix. Several robust estimators nat detailed here have become popular in computer vision. interested, you should checkout atleast the least median of squares, discussed inthe review by Meer eta. and detailed in Roussecuw and Leroy's book, and the RANSAC (Random Sample Consensus) algorithm, introduced by Fischler and Bolles and also discussed by Meer et a. Least Squares as Maximum Likelihood Estimator Consider W data points p= ry)" 1... Nand a model y= f(x, where a is vectorof parameters an fis a inown fenton. Assome ths the data pint are erations corrupted by noe (tobe characterized ete inthe following) Te well known estimate of the parameter vector, a,, such that f (x, a.) interpolates the data best inthe leas squares senses were ‘We want to show briefly that a, i the parameter vector maximizing the probability that ‘he data area noisy version of xa), given appropriate assumptions on the noise. ‘Notice that we should estimate a by maximizing the probability that ais correct _siven the dota, but we cannot estimate ths probability (why?), However, if we assume that the noise corrupting each data point is addiive and Gaussian, with 2ero mean and. standard deviation , and that the amounts of noise at different data points are all, independent, we cun express the probability P that, for a given a, all datapoints fall within ay ofthe true valve: FGa) (as) 8 oryat Pa] ay), (as) In essence, mavimum likelihood estimation follows from the assumption that the Parameter yector which maximizes P is also the mast likely one 10 occur given the ob | Section A7 Robust Estimators and Medel Fitting 327 served data. To obtain (A.8) from (A.9) snow easy: we just notice that P is maximized by minimizing the negative of its logarithm: that, and the constants ¢, N, Aycan be ignored inthe minimization, Why Least Squares Fits are Skewed by Outliers Since we assumed that the noise corrupting the datapoints is Gaussian, the probability that noisy points within distance d fis corresponding true point deceass very rapidly with Consequently, the lest squares estimator belies that mest points li within afew standard deviations ofthe true (ankaown) model. Now suppose thatthe data contin even a small percentage of points that are just way off and presumably snot consistent with the Gausian hypothesis Ponts ike these are called ous. Inthe absence of further information, a leas squares estimator believes that outliers to ae lose to the model asthe probability of an outice beng a fa sit really from the true models racially zero. The results thatthe outs “pull” the best ft away from the true model mut re than they shoul Why Absolute Value Estimators Tolerate Outliers Better ‘Theessence ofthe problem with leastsquaresis thatthe Gaussian distribution decreases very rapidly as d becomes lager than ¢. Therefore, a solution is to adopt a nose distribution which does not vanish as quickly as a Gaussian; that is, which considers outliersmore likely occur. An example ofsuch distribution s the double exponential, Pridjace! In this case, the provabilty P becomes Poff ay (ano ‘Thesame reasoning took stom (A) 10(A8) kes rom(A.10 tothe rob mnesimum elo! einer ao S| — Fe 8)| The price we payisthat, unlike (A), this problem cannot be solved in closed form i sms cases and unerical methods mast be employed. ‘Tye resons or ths pint being the datasets ever an forthe prrseo is brietrotaton, sage leva328 Appendix Appendix References MA. Fischler and R.C. Bolles Random Semple Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications ofthe ACM, Nol 24, pp. 381-395 (198, P Meer, D. Mintz and A. Rosenfeld, Robust Regression Methods for Computer Vision: a Review, Intemational Jounal of Computer Vision, Vol. 6, pp. 59-70 (991, PA Rousseeuw and AM Leroy, Robust Regression and Outlier Detection, Wiley, ‘New York (1987) ‘AB Kalman Filtering ‘This section isa brief account of a casi too! of optimal estimation theory the linear Kalman iter. Its meant only to enable understanding of our treatment feature-based tracking in Chapter 8, Foran extensive discussion of Kalman filtering, see Maybeck’s lassc book. ‘The purpose of optimal estimation algorithms isto produce optimal estimates of ‘the state of dynamic system, onthe basis of nosy measurements and an uncertain model “ofthe system's dynamics. The theory is based on two ingredients, the notions of system ‘model and measurement model ‘The System Model A physical system is modelled by a sue vector, x, often called simply the state, and & set of equations, called the system model. The tate isa time-dependent vector, x() the ‘components of which are system variables, in sufficient number to capture the dynamic properties of the system. The system models vector equation describing the evolution ofthe stat in time We indicat discrete equally spaced time instants with x = +£AT, with #=0,1,... and AT'a sampling interval, and denote with % the state x(.). We also assume that 47 is small enough to captue the system's dynamies, so that she tate does not change much between consecutive time instants, and a linear system mode is an adequate approximation of the state change within AT. The linear system model is usually written in vector form as w= Ou thon whore & i a random vector modelling additive sytem nose. The subscript 1 indicates thatthe state anstion marx, isa function of time, and hence accounts for more complicated dynamics than the one we considered for feature tacking in Chapter 8 ‘The Measurement Model ‘The second ingredient of estimation theory is the measurement model. We assume that, at any time instant 1, a noisy measurement of the state vector (or atleast of Section A8 Kalman Fikering 328 some components) taken, and that the following, linear, relation holds between the ‘measurements and the true sysem state: = Hit Me In this equation, 2s the vector of measurements taken at time ., Hh the socalled ‘measurement mati, and wy a random veetor modeling additive noise, which accounts for the uncertainty associated with the measurements. “The key points and assumptions are summarized as follows: ‘Optinal Linear Estimation Theory (Kalman Filtering) Assumptions Thestate ofa djnamiesysem at times (1,2...) is described by n-dimensional vector, the state vector. The evolution ofthe sstens stat modeled by thelinar equation weeny thes with 6.19 time-dependent m D rotation canbe described asarotation by an gle, 6, around an axis identified by a unit vector m= [n3.22, 2 The corresponding rotation matrix, R, can then be obtained in tems of # and the components of m, which ives you a total of four parameters. The redundancy of this334 Appendix Appendix ‘parametrization (four parameters for three degrees of freedom) is eliminated by adding the constrain that mas unit norm, that by dividing each n by 3-3 The matrix Rin terms of 9 and mis given by mim ni 0-15 m Teos6-+(1—eos8)] myn on} many |+sine} ms 0 —m| (AT) Index ns stn mom 0 Conversely, both # and canbe obtained from the eigenvalues and eigenvectors ‘of R. The thre eigenvalues of Rare 1,co80 + isin, and cos@ i sind, where isthe {imaginary uit. The unit veto, ni proportional othe eigenvector of Rcoresponding to the eigenvalue 1; the angle # can be obtained from either of the two complex eigenvalues. To resolve the ambiguity inthe sgn of bth # and n, you can check the consistency of (8.1). References S.L Altmann, Rotations, Quaterions, and Double Groups, Oxford University Press, Oxford (1986). AUTO_COVARIANCE, 33, erage smoothing 56 ©. Fagen and M Heer, Te Repesettion, Recognition and Loaing | prose Seaneprnetantin 2 {0 Shapes rom Range Das, eta ual of oboe Rawk, No : EXSaNGLE RANGE a9 Sims inde GA-Komnand: M. Korn Mathnata andbok or Sdenss ond Egnees, | “- ‘second edition, McGraw-Hill, New York (1968). aberrations. See lens. -B- | teuisiion nose 32 Breprn, ‘Secon sats Decco, 29,20, 280 Shean Reson vas Soins Boren 2s ae pe Sang snr cna 26-29 ‘Savior anin t algebraic distance, 10,105, ofstereo system, 14 SEctEiLie io sachin 38 premety ened tom Binns | sheet Bn i lon 98 trea Bona er pdm mia be ' appearance-based identification, 249, 262-270 block matching, 147 Mraumctdieremitia mage MFR AIREDO LUMPADER.2 Bott ne-ae ‘te Books 7 oi torn | pore Deut 24 | Showa aman oat | sci ma. an we soars 22 ‘wong M0 Boner335 Index box fer 76 Brady. M6 ‘ranch hound method, 253 ‘brightness constancy equation, -c CHD, 64 «ales of variations, 11,229 Calder, 8,10 calbration, 123-136, 144 curacy, 124,136 of eamera parameters 125,134 of projection mars, 132 cnbration pattern, 124,125, 130,131,132 calibration. See calibration, uncalibrated, 28 ‘amera-image coordinate transformation, 37 camera model ating, 40 perspective, 26,39 ‘fundamental equations 26 linear version, 38 projection matrix, 39 weak perpestive, 27, 38,228,253, 288 projection matrix, 40 camera parameters 35 ‘extras, 35,126 38127 ‘Campani, M215, Canny, J.90 ‘CANNY_EDGE_DETECTOR, 71 ‘CANNY_ENHANCER, 76 Capi, B36 cascading 62 (CCD. See Charged Coupled Device CCD array, 29 ester of projection, 26 chain, 68 (CHANGE. DETECTION, 213 Charged Coupled Deve, 38, CCallappa, R22 ipl, R, 242 Clarke, 5,0 loud of points computer vision ‘exploring. 4 related dsiplies,2 research and application areas, 3 Scope ofthis bok, 2 In this book, 11 coiton number, 323, oni detetion, 108. CONSTANT. FLOW, 197 constraint marx, 18 constant satisfcton, 250, onvoton theorem, 55 Cooper, W'S, 215 comme detection, 82-85 CORNERS, 84 corelation, 254 ‘orelation matrix, 300 correspondence, 140, 145-150, 180 constrains, 148 corelation-based, 146-148, sisfereniat methods, 182 feature-based, 148-180 matching methods (motion), 182 ‘motion stereo 180, and eticatin, 157 ssearch problem, 145 ‘CORR MATCHING, 146 cosine shading Se image ‘oss corelation, 7 noemalized 147,172 roserato, 318. cose talking, 33 curvature ‘of eurve, 500, {stron by smootbing. 8, 92 Gaussian 85,322 mean, 85,322 form, 321 principal, 321 pf sarace, 520 carve detection, 6 elise, 101-108 enerl curves, 108-112 ata association, 202 Davies E17 exe, 32 ‘elormable contour. See sake DeMichel, E91 ep image, 41 {pho eld Seems ceive ‘feo tang 24 eivaies. Se numeri dlerentiation. sesin atin, 18 iret geometry, 30-322 recalibration 4418 pany 10, 14,160 moto sre, 84 sary map 140,17 385 ‘orton ofapect rato 39 ence 37 aa 37137 isto o the wise, 20 double exponential 3 Dyer,cR I EB ge, 09 ddzection, 73 enhancement, 70, 5-77 Jocaization, 0 ‘modeling, 1-73 normal, 73,76 posto, center, 73 ‘amp, 71 tidy, 71 tof, 71 step. 71 strength, 73,76 tvocking. 9 ye deteton, 8-82 optimal, 70 stages 70 sutpae precision, 81 ge detector, 8 Canny. 71-79 performance ters, 7, 73-75, 80,91 Roberts 80 second-order, 91 Sobel, 80,91 step edge, 71.76 ge, 8 clgenspace, 265, elgenspace method, 264270 EIGENSPACE_IDEN-IE 269 Index 337 EIGENSPACE_LEARN, 268 ah point slgoritm, 155,208 EIGHT POINT, 156 Eliot, 61,774 ellipse ase fom, 292 clipe detection, 101-108 obs, 106 ‘ng algerie dtane, 103 sing Eucdean distance 101 lip titing See eps detection. energy functional, 109 epipolar constrain, 152 epipolar gomety, 145, 150-157 epipolr lin, 152,157 ‘conjugate, 152 epipolar plane, 152 pial, 11, 152,186,191 ‘instantaneous, 191, 08,210 and tramlation direction, 191,20, 212 EPIPOLES LOCATION, 157 0%, 308 mean, 309 ean absolte, 308 mean square, 309 roundol, 313 truncation, 311 ese mali, 12, 16,185, estimation theory, 199 EST_NOISE,32 EUCLID_REC, 165 ‘BUCL ELLIPSE FT, 102 Euler's theorem (Fepresnting rotations), 295, 333 Euler Lagrange equations 290 experimental parameters 308 EXPL PARS CAL, 150, expose time, 19 Fmumber, 26 fsctrization method, 23-208 fale negative, 309 fale postive, 309 Fan, TI,91 Faugeras 0 D, 136,171,215, 302 feature, 68 soba, 68338 Index feature (con) grouping See grouping. Toca, 68 featur descriptor, 6, 250 edge, 3 surface patch, 89 feature extraction, 2,62 feature matching, 198 feature tracking 199-203, FEATURE MATCHING, 19 FEATURE_POINT MATCHING, 199 fe ate 178 fiter box, 78 linear, 55 low-pass 56 mean, 56 medina, 62 she, RBA, 91,117,274 ‘zgibboa, A W.i17 firation point, 143 Flet.DJ,215, focal lent, 19,26 im piel, 126 focus. 18 focus of contraction, 185,190 focus of expansion, 185, 190 focus of projection, Se center of projection. frame, 178 feame baer and CCD ara siz, 30 ‘ame grabber, 28,29 for sequences, 178 frame rate, 178 Frankot, R242 Freedman, M,10 Frobenis nom, 323 fundamental matrix. 154,155 sell 8,171 -6- Gamble E215 Gaussian Kernel, 57-60 building 59 separability 57,77 Gaussian smoothing. 56 ‘and carat estimation, 8 peneralized cone, 272 generalized eigenvalue probe, 104 Goller, A 12,222 ‘raph matching 74 Grimson, WEL, 274, 302 ‘round truth 59 ‘ouping, 96, 14-117, 260,270, Ging, 1242 oH Haale, RM, 107,274,502 Haley 1,171 Hebert, M, 302 Heeger, D215 Hildreth EG, 215 HK segmentation, 85 Hotfman,R,91 ‘homogeneous coordinates 316 Hoover, A, 91 Horas, F302 Hom, BKB 4,215,222,302 Hough transform, 97-101, 118, for curve detection, 100 forline detection, 7 HOUGH CURVES, 100, HOUGHLINES, 9 Hsu, YZ, 215 hysteresis thresholding, 8 HYSTERESIS. THRESH, 75 - ideatifcaton 288,270 i-posed problem, 229 ingwort, 317 ihuminan ection, 222 estimation, 225-239 image rightness. 26 exer, 26126 13,132 ‘comparison, 264 ‘compresion, 2 digi, 16,2834 ahaneement2 Feature See feature. formation, 16| geometric, 26-28 intensity 16-28. radiomecy. Se radiometry intensity. 16, 26-49, inradinge, 23,26 sadianoe, 22 range, 16, $047,250, cosine shaded ry form, sh i frm, 520 form, 4 restoration, 2,53 segmentation, 8 sequence, 178 mage aoquison system, 28, mage analysis 2 ‘mage dilerening 215 mage morphology 93 mage noise. See nose, age plane, 18,26 image procesing. 2.91 mage sequence, 176 mage understanding ¢ IMAGE_CENTER CAL 132 improper point, 317, ‘impulse response, 8 INRIA, 18,171 lmerpretation, 250 lnterpretation tre, 2-254 INT GAUSS KER, 6f INTTREE, 254 snvaran, 299, 288-282 algebraic, 25,256 based recognition, 29 of fines. 288 of wo cones, 259 INV_ACO, 260 INVIREG 261 Irani, 215, inadiance. Ser image, a Jain, AK, 91 Sain, RC. 81 Jepson, AD, 215 ok Kalman fering, 18,199, 200,20, 38-3 data sociation, 202 ‘missing information, 202 Inder 339 problems 202 Uncertain, 381 KALMAN TRACKING, 201 Kanade 7,91, 71,215 Kanatani K, 117,302 Kass M117 ern $8 ‘iter, J, 117 ‘rok, .48 “Le Lambertian surface, 23, ‘reatngasytheicimage, 224 Lane, DM, 8 Laplacian, 313 laser scanners 4,82 Laston, DT. 215 least median of squares, 326 leas squares fet of outers 527 as maximum Ikelihod estinator, 326 fd SVD, 324 ‘robust estimators 327 Jeftight consists, 150 Jens sheratons 21 Sept of Bel, 21 clletve dimer, 20 eld of sew, 2 focal length 19 fock of 19 prinepl ry, 20 ‘in 19-22 basic properties 19 fundamental eqution, 1,20 Lindeber.T, 64 LINEAR FILTER, 55 line detection 96, 97-101 ite, 4,215 loc featarefoeus method, 2 location, 248 from intensity images, 29-294 perpective, 283, weak perspective, 298 problem, 28 from range images, 294-302 Longue Higgins HC Lowe, D 0,118Index Lozano-Pecen 1.302 Laas, B D215 {Laong, Q-7 171 astm, F 135,215 Me Meestimaor, 325 Magellan spacecraft, 140 “Mahalanobis distance, 216 Malik 52 rman, 266 Mart,D,91, 171 Marshal, AD, 302 Mari, RR, 302 ask, 55 Maybank $137 Maybeck,PS,215, 328 Mashew, TE W, 136 Meenas, 5,6 MED_FILTER, 62 Meet, B.117 Mirmehdi M,6 model, 248, 270-273 featire-based, 271 rmodelhased recopiton, 48 ‘model soquistion 256, 260, 258 model fing 9, 326-328 ‘model index, 255 model matching, 80 See objec recognition model reference frame, 28,294 Mohr, 171 motion based segmentation, 212-215 ‘motion analy ortespondenc, 180 reconstruction, 180,208 from dense motion fel, 208-212 from sparse motion fed, 205-208 segmentation, 182, 212-215 sulbproblems 180,182 ‘motion fel 183 Isic equations 184 dierent techniques, 195, estimation, 195-203 feature-based techniques, 198 ‘of moving plan, 187 of pure translation, 185 relative, 189,190,208 ‘of sooth surface, 18 1 optical ow, 195 ‘motion pala, 188,209,210 MOTSTRUCT. FROM. FEATS, 208, MOTSTRUCT_FROM FLOW, 211 Mundy, 31, 47.274 Murase, H, 274, 302 Murray D W,9,274 oN Nagel, HH 215 Nakagawa, ¥,48 Nayar SK, 24,302 nearest neighbor data asoiation, Er eee, 239 Neva, R274 Newton's method 217,284 oie, 51-35 adv, 52 amount, 52 incompater vision, 52 ering, 2, 55-63 197 average, 56 Gaussian, $6 linear, 55 mean, 56 media, oninear, 62 repeated averaging, 6 Gausian, $3 impale, 8, rltpicative 53 ‘quantization, $3 random, 52 saltand pepper, 4 ‘smoothing See nose tering ‘suppression. Senos, tering. inte, 53 online ltering, 2 ‘onmaximum supresio, 70,77 NONMAX SUPPRESSION, 77 ‘ormal (to surface), 320, numerical dferentaon, 31-314 ‘aceuray, 313 ceaal-iferences 312 orwaribackwarddiferencs, 312 ‘numerical packages, 1 ‘Nyist froqueny, 314 -o- objec centered representation, 272 ‘objet modeling 27073 ‘objet recognition, 247-273 ‘object representation. e object modeling. ‘Okutom M171 ‘OppenhcmnA V.47 coptcl ais, 19,26, optical ow, 181,191 Ox, MIL 302 ontcentey, 131 onthogonality of mmesical mati, 129,257 conhographic projector, 27 utr, 106,327 Papoulis A, 64 patlern recognition, 3 peak ase. Se nose, inpusve Peg, 8,215 pesca grouping, 1 etecptual orpaizatin, 14 PERC.GROUPING, 1i6 performance asset. See testing. perspective. Se camera model photogrammetry 3 hotometistero, 172 photosensitive deve, 1° Pu, M, 117 Pinhole 18,26 pinhole photopephy. 8 pac, 29 fetiv size, 3, 37,25. pine eoordinats. 38 Poggio, 91,171,215 point distribution mod, 274 ose estimation, 280, ose frm lps, 292 POSE_FROM ELLIPSE, 293 Prat, WK, Prazdny,K.215 principal dietionseuratres, 321 Inder 341 rina point, 26,37 ‘projection mai, 39 libation.Secalbatin, independent parameters 135, projective geometry, 316-320 invariant, 256,257 fine 317 pane, 316 rasformatio, 26,317 PROJ_MAT-CALIB 134 Proper point, 36 pebdimvers, 323, pop 8 Pyramids 64 radiance, Seimage. radiometry, 2-26 fundamental equation, 25 random dots sequences, 178 stereograns, 179 random sample consensus Soe RANSAC. range image, Se image. ange sensor, 4 ‘cive triangulation, baseline, 2 calration. Se dret calbation, characteriza, 46 parameters, $7 ripe, 43 RANGE_ACO, 46 RANGE.CAL, 4 [RANGE_SURF PATCHES, 88 rank tore, 204 RANSAC, 3, 526 eoogntion. Se object recognition, base modules 281 location, Se lesion, reconsruciog, 140,148, 161-170, 180, motion vx. sero, 18) projective, 165,169 byteangulation, 162 uptoa scale factor 164 reetifestion, 157,160 reflectance map, 221,223, 22342 Index relectance model, 23, Lambertian, 23 Reker, 6,215, repeated averaging. 61 REP_AVG, resolton, 3 restoration, See image. reverse engineering, 91,274 Rieger, TH, 215 Robbiana F137 Rober, 101 Roberts edge detector, 80 ROBERTS EDGE_DET,®) robust estimator, 106, 326-328 ROB_ELLIPSE FIT, 107 Rosin, PL, 11? rotations in 3D, 332-334 Rotwel. CA, 278 ROT MATRIX RANGE, 298 ROT_MATRIX_RANGE2, 00 roundoff error, 313 Rows, B,215| os. sapling theorem, 31, 314-316 Sampson, PD, 117 SAR images 149| Seale space representations, 6 scatier matt, 104 scene analysis 2 SCERPO, 302 Schunk, BG, 215 Seosor reference fam, 294 SEPAR FILTER, 59, sequence, 178 Shab, MIT share Toca detition, 85 parameters 257 shape fom shading. See shape trom X shape fom X, 220, contours 242 focus 20 Shading 21-204 boundary conditions, 224,252 enforcing integrity, 232 fundamental equation, 223 iterative scheme, 232 variation! sli, 229 restue, 235-241 eterministi, 298 00m, 242 SHAPE_FROM SHADING,253, Shapio,1.G,117 Shashua,A,17L shutter, 19 signa o-noise ratio 32, 33,52, 78 ‘Singular value decomposition, 322-325 Sjoberg, RW, 47 smart camera, 28, sake, 108-112 'SNR. See signal to noise rato, Sobel edge detector, 91 SOBEL, EDGE _DET, 89 solid angle, 3 Spacek, LA, 91 Sparr,G,17. spatial sampling, 3. Spat noise See nose, impulsive, 880,147 standard basis 318 STAT_SHAPE FROM_TEXTURE, 200 stencil 313, Stereograms. Soe autosterengrams, stereo ead, 9 sereopsis 140-170 ‘alain, 144 ‘dspaiy. Se dispar, epipolar geometry, 145 parameter estimation, 164 Parameter of stereo syste, 14 Subproblems 140 meaieatod, 14 stereo vision. Se sereopis stochastic process 58, Sreng 117 streaking 78 stipe. See range sensor. structrefrom-moson theorem, 217 subpixel precision 81, 173, sum of sated dtferences (SSD), 147 surfae-based segmentation, &8-89, surface pate, 85,250, ‘area, 320, tangeot plane, 320 ‘SVD See singular value decomposition. “1 tangent plane (10 surface), m0 template matching 96,18 Teezapouls,D, 117 test data real, 309 synthe 309 testing, 307-311 designing tess 309 reference procedure, 310, tex data, 309 tex, 235, texture, 235-257 deterministic, 235 istortion, 257,238 features, 241 sradieat, 238 segmentation, 241 statis, 236 ‘Thacker, NA, 136 thresholding. 8, 70,78 Iyseresis, 78 talent anges In shape from shading 206, in shape from teture,238 time-to-impaet, 79 time fig, «2 ‘Tomasi C, 91,215, Tore, ¥,81, 136 Tosca, 6.136 215, tracking 181, 331 TRANSL, RANGE, 26 TRANSL. RANGE?, TRIANG, 162 triangulation, 14: See range sensor. Lunatic 31D Tsai, R136 ‘Tasing inst, 141,17, Index Ulm, $215,217 Umasuthan, M8990 UNCAL STEREO, 170, ‘uncertainty, 331 ~v- vanishing point 131, 137,185, 186,190, velocity fel, 183 verifeaton, 252,261, 250,294 video signa, 29 viewer centered epreetation, 272 visual eveat,272 volumetri epresentatons 272 ein, 100 — Wallace, A M45 warping (in etieation), 157 wavelets 64 weak perspective. See camera model WEAK PERSP INV.291 wild ard, 253 Wiliams D 217 Wilsky, 8,47 Witkin, 64,117,282 Woit E47 Woodhars, R172 orld reference fame, 34 aye Young, 11.47 ot Zisserman, A, 47,278 3

Introductory Techniques For 3D Computer Vision

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introductory Techniques For 3D Computer Vision

Uploaded by

Copyright:

Available Formats

You might also like