
Vision based lane tracking using multiple cues and particle filtering

Nicholas Apostolo

A thesis submitted for the degree of Master of Philosophy at the Australian National University

Department of Systems Engineering Research School of Information Sciences and Engineering

August 2, 2004

© 2004 Nicholas Apostolo. All rights reserved.

Statement of Originality
This thesis is an account of my research undertaken between April 2001 and September 2002 while I was a Master's student in the Department of Systems Engineering at the Australian National University. Except as otherwise indicated in the text, the work described is my own. It has not been submitted for any other degree or award in any other university or educational institution.

Nicholas E. Apostolo

Acknowledgments
In a similar fashion to every other report that I have submitted as part of my education, the creation of this dissertation has been somewhat chaotic, and I am left writing this last page with only a few hours to spare. There are many people who have helped me during this thesis and I wish to thank you all. If your name is not mentioned here, it is not because I do not appreciate your efforts; it is because I have taken far too long already to complete this thesis and I just can't wait any longer to submit it. Firstly, I would like to thank my supervisor Professor Alexander Zelinsky for his support and guidance throughout my Masters studies, and for introducing me to the world of robotics and computer vision. Thanks to the many other academics in the department, particularly Dr David Austin, whose knowledge and willingness to answer my endless stream of questions was invaluable. To our non-academic staff, James Ashton, Luke Cole, Rosemary Shepherd and everybody at RSISE who keeps the lab running smoothly. I would also like to thank the staff of Seeing Machines for their help with faceLAB™, particularly David Liebowitz, who was kind enough to spend time helping with the analysis of the data collected for VIOV. To my fellow students who made RSISE such a memorable place to work: Kiril Kouzoubov, Leanne Matuszyk, Grant Grubb, Simon Thompson, and The Mad Swedes (Peter Lundgren, Anton Falk and Lars Petersson; you know I will never forget that video!). A special thanks to Andrej Mihajlovski for our many insightful discussions over coffee that have expanded my narrow little view of the world, and to Gareth Loy and Luke Fletcher for making our office the place to be in RSISE. Lastly, I would like to thank my family for their love, guidance, and encouragement that has always been unconditional, even with the ups and downs and the habitual hours that I keep.

Abstract

ROAD related fatalities cause an economic and social burden that is in stark contrast to the benign convenience that we normally associate with automobile use. In 1999 alone, between 750,000 and 880,000 people were killed as a result of road related accidents, with an estimated cost of US$500 billion. In OECD countries (the 23 leading economically developed countries of the world) 150,000 are killed every year. One way of combating this problem is to develop intelligent vehicles that are self-aware and act to increase the safety of the transportation system.

This dissertation presents a novel multiple-cue visual lane tracking system for research into intelligent vehicles. An algorithm called distillation is developed that adaptively allocates computational resources over multiple cues to robustly track a target in a multi-dimensional state space. Bayesian probability theory provides a framework for sensor fusion, and resource scheduling is used to intelligently allocate the limited computational resources available across the suite of cues. Using distillation, a 15 Hz visual lane tracking system is developed to seamlessly detect and track the pose of the vehicle, and the width of the road, over time. A selection of cues is developed that are suited to a variety of road types and conditions. Quantitative experimental results of the distillation lane tracker are discussed and compared qualitatively to prior art. Frame-by-frame success rates ranging between 91.4% and 100% were achieved for scenarios ranging from clear highways to high curvature roads in the rain.

The particle filter conferred a number of benefits suited to the task of lane tracking. First, a priori constraints, such as lane edges meeting in a vanishing point on the horizon and road hypotheses lying in the plane of the road, were indirectly incorporated into the algorithm through the top-down validation process of the particle filter. Instead of searching for a single result like traditional lane trackers, the distillation algorithm tests a number of hypotheses that all satisfy the road model construct. Second, these constraints, in combination with concentrating particles in areas of high probability, helped prune erroneous solutions that have previously plagued lane trackers (i.e. oil marks and cracks on the road surface, rain on the windshield, departure lanes, construction sites, etc.). Third, randomly redistributing a small number of particles each iteration helped the system recover from tracking failure.

To explore the use of lane tracking as a driver assistance tool, the lane tracker was integrated with a driver monitor to estimate the driver's focus of attention with respect to the road. A strong correlation was found between the eye gaze direction of the driver and the curvature of the road, which was identified previously by Land (1992) using a manual frame-by-frame visual analysis of the scene. This system successfully closed the loop between vision inside and outside the vehicle for the first time.

Publications
Journal articles
Apostolo, N. E. and A. Zelinsky (2004, April). Vision in and out of vehicles: integrated driver and road scene monitoring. International Journal of Robotics Research 23(4), 513–538.

Conference papers
Apostolo, N. E. and A. Zelinsky (2003, June). Robust vision based lane tracking using multiple cues and particle filtering. In Proceedings of the IEEE Intelligent Vehicles Symposium, Columbus, OH, USA.

Petersson, L., N. E. Apostolo, and A. Zelinsky (2003). Driver assistance: an integration of vehicle monitoring and control. In Proceedings of the IEEE International Conference on Robotics and Automation.

Petersson, L., N. E. Apostolo, and A. Zelinsky (2002, November). Driver assistance based on vehicle monitoring and control. In Proceedings of the Australian Conference on Robotics and Automation, Auckland.

Apostolo, N. E. and A. Zelinsky (2002, July). Vision in and out of vehicles: integrated driver and road scene monitoring. In Proc. 8th International Symposium on Experimental Robotics (ISER'02), Sant'Angelo d'Ischia, Italy. Also in B. Siciliano and P. Dario (Eds.), Experimental Robotics VIII, Springer Tracts in Advanced Robotics 5, pp. 634–643. Springer.

Loy, G., L. Fletcher, N. E. Apostolo, and A. Zelinsky (2002, May). An adaptive fusion architecture for target tracking. In Proc. 5th International Conference on Automatic Face and Gesture Recognition, Washington DC.

Fletcher, L., N. E. Apostolo, J. Chen, and A. Zelinsky (2001). Computer vision for vehicle monitoring and control. In Proc. Australian Conference on Robotics and Automation, Sydney, Australia.

Magazine pages
Fletcher, L., N. E. Apostolo, L. Petersson, and A. Zelinsky (2003, May/June). Vision in and out of vehicles. In A. Broggi (Ed.), IEEE Intelligent Systems: Putting AI Into Practice, Volume 18, pp. 12–17. IEEE Computer Society.

Contents

Statement of Originality
Acknowledgments
Abstract
Publications

1 Introduction
  1.1 Research objectives
  1.2 The experimental platform
  1.3 Contributions
  1.4 Thesis outline

2 Previous work
  2.1 Intelligent transportation systems
  2.2 Cues for lane detection
  2.3 Lane localization and tracking
  2.4 Summary

3 Experimental platform
  3.1 Vehicle overview
  3.2 Sensors
  3.3 Actuators
  3.4 Computing architecture
  3.5 Summary

4 Distillation: a detection and tracking algorithm
  4.1 The structure of the distillation algorithm
  4.2 Particle filtering for target tracking
  4.3 The cue processor
  4.4 Summary

5 Lane localization and tracking with distillation
  5.1 Coordinate geometry and reference frames
  5.2 Ackermann steering motion model
  5.3 Sensor calibration
  5.4 Lane tracking cues
  5.5 Lane tracking performance
  5.6 Comparisons with the literature
  5.7 Summary

6 Vision in and out of vehicles
  6.1 System architecture
  6.2 Driver monitoring with faceLAB™
  6.3 Integrating the lane tracker and faceLAB™
  6.4 Experimental results
  6.5 Conclusions
  6.6 Acknowledgments

7 Conclusion

A Lane curvature tracking
  A.1 Integrated vehicle pose and curvature tracking
  A.2 Curvature tracking system
  A.3 Conclusions

B Vehicle calibration
C Enlarged figures
D Multimedia extensions

List of Figures
1.1 Sample road images.
1.2 Lane tracking.
1.3 TREV (TRansport Experimental Vehicle): the testbed vehicle.
2.1 Cues for object identification.
2.2 Oranges. The combination of texture, colour and edges simplifies the identification of the oranges.
2.3 Lane marker extraction using dark-light-dark lane profiles.
2.4 Lane marker extraction using thresholding.
2.5 Edge cues for road detection.
2.6 First and second derivatives for edge detection.
2.7 Road gradient magnitude image.
2.8 Thresholding the gradient magnitude image to obtain edges.
2.9 Non-maximal suppression.
2.10 Canny edge detection.
2.11 Dickmanns' oriented edge detection algorithm.
2.12 Lane edge feature detection using Dickmanns' oriented edge detector.
2.13 Laplace of Gaussian kernel for lane marker feature extraction.
2.14 Colour for lane detection.
2.15 Road colour histogram classification.
2.16 VaMoRs and VaMP. The two testbed vehicles used by Dickmanns.
2.17 MarVEye. The vision system used by Dickmanns.
2.18 GOLD: inverse projective mapping for lane tracking.
2.19 Steps of the GOLD algorithm.
2.20 Rapidly adapting lateral position handler (RALPH).
2.21 Hough transform for lane tracking.
2.22 Massively parallel road follower.
3.1 TREV: the experimental platform.
3.2 Vision platforms installed in TREV.
3.3 CeDAR's fields of view.
3.4 CeDAR: the active platform used in TREV for lane tracking.
3.5 CeDAR: the Helmholtz configuration has three degrees of freedom that allows left and right verge, and head tilt.
3.6 CeDAR is located in place of the rear-view mirror in TREV.
3.7 faceLAB™.
3.8 Laser range finder.
3.9 Raytheon drive motor steering actuator.
3.10 Location of the Animatics braking actuator under the driver's seat in the vehicle.
3.11 Computing and communications architecture on TREV.
4.1 The two subsystems of the distillation algorithm.
4.2 Example distributions of the Bayes filter.
4.3 The four iterative steps of the particle filter algorithm.
4.4 Example distributions of the particle filter.
4.5 Thrun's laser range finder sensor model.
4.6 Cue processor cycle.
4.7 Distillation algorithm with a slow processing loop.
4.8 Time allocation during one iteration.
4.9 An example of the scheduling of cues and their migration over one iteration of the distillation algorithm.
4.10 An example of the shared observation preprocessing for four cues of the lane tracker.
5.1 The search space of the lane tracker.
5.2 Visual lane tracking cues.
5.3 Homogeneous coordinate transformations.
5.4 Roll, pitch, and yaw rotation directions about the X, Y, and Z axes.
5.5 Coordinate systems used to model TREV.
5.6 Pinhole camera model.
5.7 Ackermann steering model.
5.8 Fields of view and lookahead distances of the two cameras on CeDAR.
5.9 Camera calibration target.
5.10 Steering sensor calibration: raw data.
5.11 Steering sensor calibration: graph of Instantaneous Center of Curvature versus front wheel angles.
5.12 The bicycle approximation to the Ackermann steering model.
5.13 Steering sensor calibration: least-square fit mapping steering transducer voltage to the ICC.
5.14 Transformation of the road model into image space.
5.15 Vehicle state and road width estimation model.
5.16 Road model: the lane marker region.
5.17 Lane marker cue.
5.18 Road model: the road edge region.
5.19 Possible inconsistencies between cues using the road edge region.
5.20 Road edge cue.
5.21 Road model: the road interior region.
5.22 Road colour probability map.
5.23 Road colour cue.
5.24 Non-road colour cue.
5.25 Particle filter convergence I.
5.26 Particle filter convergence II.
5.27 Particle filter convergence is evidenced by the stable state variance.
5.28 Cue scheduling example.
5.29 The scene during cue scheduling.
5.30 Performance characteristics of the lane tracking system on a highway with clear lane markings.
5.31 Performance characteristics of the lane tracking system on a high curvature road.
5.32 Performance characteristics of the lane tracking system on a high curvature road in the rain.
5.33 Performance characteristics of the lane tracking system on a highway test with poor lane markings.
5.34 The lane tracker is distracted by a passing vehicle because of the low signal produced by the lane markers separating the lanes.
6.1 Vision in and out of vehicles.
6.2 Structure of the integrated system.
6.3 Regions of interest for obtaining the driver's focus of attention.
6.4 VIOV results: rendered output for a high curvature road.
6.5 VIOV results: comparison between the vehicle yaw and driver's gaze yaw.
6.6 VIOV results: the driver's focus of attention along a highway and high curvature road.
A.1 Complete lane tracking system with curvature estimation.
A.2 A bird's-eye view of a clothoid road element.
A.3 Curvature road model.
A.4 Vibration effects on the far-field camera.
A.5 Disappearing lane edges in curvature tracking.
C.1 Enlargement of figures 5.30(b)-(d) that shows a comparison between the experimental and baseline results on a highway with clear markings.
C.2 Enlargement of figures 5.31(b)-(d) that shows a comparison between the experimental and baseline results on a high curvature road.
C.3 Enlargement of figures 5.32(b)-(d) that shows a comparison between the experimental and baseline results on a high curvature road in the rain.
C.4 Enlargement of figures 5.33(b)-(d) that shows a comparison between the experimental and baseline results on a high curvature road in the rain.

List of Tables
3.1 Performance specifications of CeDAR.
5.1 Intrinsic camera parameters and their errors.
5.2 Statistics for the prior art autonomous vehicle trials.
5.3 Statistics for the distillation lane tracker.
B.1 Vehicle calibration.
B.2 Extrinsic camera calibration.
B.3 Intrinsic camera calibration.
D.1 Index into the multimedia extensions.

Chapter 1 Introduction

THERE is no doubt that the mass introduction of the automobile by Henry Ford in 1914 revolutionized the world and forever changed the way we travel. Suddenly everything seemed a little closer, more attainable. The automobile became an integral part of society while we became dependent upon its use. Can you imagine life without it?

Unfortunately the old adage "nothing in life is free" is applicable with respect to the automobile. Ignoring the economic costs of owning and maintaining a vehicle, there are other more global effects that affect our everyday lives. The emission of greenhouse gases into the environment, noise pollution and the congestion of road infrastructures are just a few. One of the more startling effects is the economic and social burden caused by accidents and fatalities. Between 750,000 and 880,000 people died globally in road related accidents in 1999 alone, with an estimated cost of US$518 billion (Jacobs et al. 2000). In OECD countries (the 23 leading economically developed countries of the world) 150,000 are killed every year.

Several alternatives exist to battle the ever-growing burden of road related fatalities. These include stricter road regulations, marketing campaigns targeting speeding and driver fatigue, increased driver training, and improving the safety of the road infrastructure and vehicles. It is apparent from these approaches that the driver is commonly considered the most unreliable component within a vehicle. While strategies that target the driver have been successful in the past (DSA 2003), a large proportion of automobile safety research is the design of safe vehicle mechanisms (i.e. crash zones, airbags, ABS braking, etc.). A less passive approach is taken by the field of robotics and computer vision. Here we aid, or completely remove, the driver from the system by introducing smart functions in the vehicle; however, the vehicle must be aware of its environment to accomplish this. It must be able to see the world. Specifically, it must be able to estimate some form of model of the world that is sufficient for the task at hand.

This dissertation concentrates on increasing the safety of the vehicle through the development of a robust lane tracking system to see the world outside the vehicle. The lane tracker is then demonstrated in combination with a driver monitor to estimate the location of the driver's focus of attention. This work is part of an intelligent transportation systems (ITS) initiative by the Australian National University (ANU) that is focused on autonomous driver monitoring and vehicle control to aid the driver (Fletcher et al. 2001). A major aim of this project is the development of a system of cooperating internal and external vehicle sensors to aid research into the visual behavior of the driver. Potential uses for this technology include: lane departure warning systems; adaptive cruise control; obstacle detection and warning; automated driver visuo-attentional analysis; fatigue and inattention warning systems; autonomous vehicle control.

While roads contain an inherent structure that has made them a popular target for computer vision and robotics researchers for decades, there still exists a significant variability that hinders the progress of lane tracking technologies. Take for example the road images in figure 1.1. While this is not representative of the actual proportion of different road types (many roads are structured and clearly marked) it does indicate that a significant variability exists and that no single cue would be sufficient for robust tracking.

Figure 1.1: Sample road images. While roads are inherently structured, there still exists a large variability that hinders the acceptance of lane tracking technologies. A move away from single cue lane trackers is required to begin acquiring the standard of robustness required for wide acceptance.

Significant advances have been made on specialized systems for tracking individual road types, but little progress has been made in robustly tracking a variety of road types seamlessly. This is especially evident in the recent results of the DARPA Grand Challenge 2004, a land race for autonomous vehicles (DARPA 2004). Only seven teams out of twenty managed to complete more than 1 mile of the 142 mile course, the furthest of which made a humbling 7 miles. Varying road types cannot be entirely blamed for this (a number of vehicles were disabled due to collisions with obstacles and equipment failures) but it does highlight the inherent difficulty of autonomous travel. Until there is a significant move away from single cue tracking, there is little hope of achieving the standard of robustness required for the wide acceptance of lane trackers. It is this premise that underlies and motivates the research presented in this thesis. To see the road robustly, a suite of cues that sufficiently encompasses the variety of possible road variations must be integrated and managed accordingly.

1.1 Research objectives

The central theme of this thesis is the design, implementation and demonstration of a novel visual lane tracking system suitable for research into smart-vehicle functions and autonomous vehicle control. The system should be adaptive at the level of perception and should therefore seamlessly track the road over a variety of conditions.

There are a number of facets to this goal. First, the design and development of a novel, adaptive, multiple-cue localization and tracking algorithm suitable for seamless lane tracking across different road surfaces is investigated. Second, the application of this algorithm to vehicle localization and lane structure tracking is demonstrated on a number of driving scenarios. Third, a system that integrates vision inside and outside the vehicle is developed and tested to monitor the visual behaviour of the driver.

The localization and tracking algorithm was approached with four complementary goals that have not yet been widely explored:

- A generic tracking infrastructure that allows the seamless tracking of roads of differing appearance through the dynamic fusion and distillation of multiple cues. Distillation in this context refers to the process of prioritizing cues based on the information content of their observations. This is to be accomplished at the fundamental level of perception, as opposed to explicitly detecting scene changes followed by algorithm adaptation.

- Maximizing CPU usage with respect to cue performance. A major aim of this system is to determine the dominant cues for tracking and manage the computational resources accordingly.

- The fusion of cues running at different frequencies, to allow slow running but powerful cues to run in the background.

- A probabilistic engine to drive the search and provide a statistical measure of the uncertainty in the result.
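The resource-allocation goal of prioritizing cues by their usefulness per unit cost can be illustrated with a small sketch. This is a hypothetical simplification, not the thesis implementation: the cue names, utility scores, and per-cue costs below are invented placeholders, and the thesis derives cue priority from the information content of each cue's observations rather than from fixed scores.

```python
def schedule_cues(cues, budget_ms):
    """Greedily run cues in order of utility per unit cost until the
    per-frame CPU budget is exhausted. 'utility' is a placeholder for
    the information content of a cue's recent observations."""
    ranked = sorted(cues, key=lambda c: c["utility"] / c["cost_ms"], reverse=True)
    chosen, spent = [], 0.0
    for cue in ranked:
        if spent + cue["cost_ms"] <= budget_ms:
            chosen.append(cue["name"])
            spent += cue["cost_ms"]
    return chosen

# Hypothetical cues: a strong, cheap lane marker cue and slower support cues.
cues = [
    {"name": "lane_marker", "utility": 0.9, "cost_ms": 12},
    {"name": "road_edge",   "utility": 0.6, "cost_ms": 20},
    {"name": "road_colour", "utility": 0.4, "cost_ms": 25},
]
print(schedule_cues(cues, budget_ms=40))  # ['lane_marker', 'road_edge']
```

Under a 40 ms budget only the two most informative cues per millisecond run; a larger budget would admit the slower colour cue as well, mirroring the idea of reserving expensive cues for periods when they are worth their cost.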

1.1.1 Lane tracking

The goal of lane tracking is twofold. First, the vehicle must be localized with respect to the road. Second, the lane structure must be estimated. Tracking the parameterization of the vehicle state and lane structure over time is implicitly assumed in this formulation.

Formally, the lane tracking problem investigated here involves the estimation of the lateral offset of the vehicle from the centerline of the road, ysr, the vehicle yaw, vs, and the width of the road, rw (figure 1.2). The first two parameters complete the description of the vehicle state while the third provides the lane structure.

Figure 1.2: Lane tracking includes the detection and tracking of three features: the lateral offset of the vehicle ysr, the vehicle yaw vs, and the width of the road rw. Note that this figure is exaggerated for clarity.

Although parameterizing the road structure with only the road width inherently limits the lane tracker to low curvature roads, it is suitable for the lane tracking applications investigated here. An extension to curvature tracking is discussed in appendix A.

The majority of previous lane tracking projects have focused on using one or two visual cues for the detection of lane boundaries. These systems are inherently limited by the failure modes of the individual cues chosen. The integration of multiple cues that specialize in specific road features is deemed essential for a robust solution. Cues investigated include edge feature detection, lane marker detection, dynamic road colour probability maps and a priori state-based cues that help regularize the solution at a fundamental level. The resulting tracking architecture uses a Bayesian framework to fuse data from multiple cues and a particle filter (Blake and Isard 1998) to control hypothesis generation over the search space.

Additionally, it is desirable to perform such computations in an efficient manner by limiting the use of extra cues to periods of high uncertainty. This is particularly useful for initialization. Once the road position is known, robust prediction of future road positions is made using the continuity of the vehicle's motion and validated via further observations. This enables higher probabilistic outcomes of the cues and a resulting reduction in computational complexity. The lane tracker is demonstrated in a number of challenging scenarios and is ultimately integrated with a driver monitoring system to track the visual behaviour of the driver.

Figure 1.3: TREV (TRansport Experimental Vehicle): the testbed vehicle.
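The predict-weight-resample loop of a particle filter over the three-parameter lane state can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis system: the two cue likelihood functions, noise levels, and state bounds are invented stand-ins, and the real tracker validates each hypothesis against image observations rather than analytic functions.

```python
import numpy as np

np.random.seed(0)  # reproducibility of this sketch only

# Hypothetical state: [lateral offset ysr (m), vehicle yaw vs (rad), road width rw (m)]
N = 200
LOW, HIGH = np.array([-2.0, -0.3, 2.5]), np.array([2.0, 0.3, 4.5])

def predict(particles, noise=(0.05, 0.01, 0.02)):
    """Diffuse each hypothesis with Gaussian process noise (a crude
    stand-in for a vehicle motion model)."""
    return particles + np.random.randn(*particles.shape) * noise

def weight(particles, cues):
    """Fuse cue likelihoods as a product (naive Bayes fusion); each cue
    maps a state hypothesis to a likelihood in (0, 1]."""
    w = np.ones(len(particles))
    for cue in cues:
        w *= np.apply_along_axis(cue, 1, particles)
    return w / w.sum()

def resample(particles, w, recovery_frac=0.05):
    """Importance resampling, plus a few uniformly redistributed
    particles to aid recovery from tracking failure."""
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    out = particles[idx].copy()
    n = int(recovery_frac * len(particles))
    out[:n] = np.random.uniform(LOW, HIGH, (n, 3))
    return out

# Two toy cues standing in for image-based observations: they prefer a
# centered vehicle (ysr near 0) and a road width near 3.5 m.
cues = [lambda s: np.exp(-s[0] ** 2), lambda s: np.exp(-(s[2] - 3.5) ** 2)]
particles = np.random.uniform(LOW, HIGH, (N, 3))
for _ in range(20):
    particles = predict(particles)
    particles = resample(particles, weight(particles, cues))
est = particles.mean(axis=0)  # mean-state estimate of (ysr, vs, rw)
```

The top-down character noted earlier is visible here: only complete state hypotheses are ever evaluated, so any constraint built into the state parameterization is satisfied by construction, and the small fraction of randomly redistributed particles gives the filter a route back from tracking failure.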

1.2 The experimental platform

The testbed vehicle, TREV (TRansport Experimental Vehicle), is a 1999 Toyota Landcruiser 4WD (figure 1.3). As vision is the main form of sensing in this project, two different vision platforms have been installed. A passive set of cameras mounted on the dashboard facing the driver is part of the faceLAB™ (Victor et al. 2001) system for driver monitoring. CeDAR, an active vision head designed at ANU (Sutherland et al. 2000), carries two forward-facing cameras that are configured for simultaneous near-field and far-field scene coverage.

1.3 Contributions

- The design, development and testing of the distillation algorithm: a novel tracking algorithm that integrates particle filtering and cue fusion technologies for target detection and tracking. This work was carried out in conjunction with Gareth Loy and Luke Fletcher of the Robotics Systems Laboratory at the Australian National University.

- The prototyping of a visual lane tracking system using the distillation algorithm as a search basis in MATLAB.

- A real-time 15 Hz visual lane tracking system developed in C++ with the distillation algorithm and tested on TREV in a variety of road scenarios.

- Integration of lane tracking with driver monitoring to provide the first instance of integrated vision inside and outside the vehicle.

- The design and implementation of a framework for curvature estimation. Note that this was not tested thoroughly due to inadequate camera stabilization.

1.4 Thesis outline

Chapter 2 discusses intelligent transportation systems with particular emphasis on the use of computer vision for lane tracking. TREV, the experimental platform used in the project, is described in chapter 3. The distillation algorithm that forms the basis of the detection and tracking architecture is motivated and developed in chapter 4. Chapter 5 applies the distillation algorithm to the task of lane tracking and reviews the performance of the system. Appendix A discusses an extension of the system to track lane curvature. A driver assistance system that integrates the lane tracker with a driver monitoring package to estimate the driver's focus of attention with respect to the road is developed and reviewed in chapter 6. The thesis is concluded and further work is discussed in chapter 7.


Chapter 2 Previous work

Imagine driving along the highway after a tiring week-long trip to the snow. One moment starts to blend in with the next. Your eyelids start to feel heavy, your blink rate changes and you stop looking at the road and the surroundings like you normally would. While this is happening, the car starts to drift from the centre of the lane and you don't notice. Suddenly you feel the steering wheel move to correct the position of the car and you hear a voice. It is soothing but has a sense of urgency: "Warning! Correcting unintentional lane departure! Depress the brake pedal to regain full control of the vehicle." You realise that you must have been suffering a nanosleep [1] and had unintentionally steered into the oncoming traffic. All while your wife and two children were sleeping unaware.

The above scenario of fatigue-affected driving is frightfully common. It has been estimated that driver fatigue and inattention account for around 30% of all driver fatalities (HoRSCoCTA 2000). This kind of recovery system is not a figment of our imagination but an area of ongoing research in the field of intelligent vehicles (Petersson et al. 2002), a technology that will become a reality in years to come. Systems that monitor the driver for signs of fatigue and where their attention is focused, that detect and track the lane you are driving in, and that can autonomously control the lateral and longitudinal position of the vehicle are the types of technologies that are slowly coming onto the marketplace and are being actively researched. Any vehicle that displays some form of awareness of its environment through smart functions falls under the generic title of intelligent vehicles. However, intelligent vehicles are not limited to self-aware vehicles, but

[1] A nanosleep is a brief period (usually only a few seconds) in which the brain enters a sleep state regardless of the activity the person is performing at the time.


encompass technologies such as fuel cell power sources and external airbags. This chapter reviews the field of intelligent transportation systems (ITS) with an emphasis on visual lane tracking algorithms. Three important facets are covered. First, a broad overview of ITS and the major ITS initiatives is given. Second, the range of cues for lane detection is discussed. Finally, the lane tracking systems that have shaped the field are reviewed.

2.1 Intelligent transportation systems

The field of intelligent transportation systems covers a broad variety of technologies ranging from fuel cell development to information systems, communications technology and intelligent vehicle functions. However, two main schools of thought exist on the construction of intelligent transportation systems: either a new road infrastructure can be created (or the existing one modified) to be an integral part of the ITS (i.e. through the use of lane marker beacons, dedicated roadways, advanced traffic control systems, etc.), or intelligent vehicles can be designed to work with the existing road infrastructure. There are obvious issues that support the development of intelligent vehicles over the radical change of current road infrastructures: the financial restrictions associated with the creation, modification and maintenance of existing road infrastructures, and the additional flexibility of designing a vehicle that can deal with the real world environment. Because of this, and the additional challenge of creating intelligent robotic systems for use in the real world, this project is focused on intelligent vehicles for use within existing road infrastructures.

2.1.1 ITS technologies

Early research in intelligent vehicles focused on autonomous vehicles as a solution to the problem of road related accidents. This was based on the premise that the human driver is the most unreliable component of the system: if responsibility is taken away from the unreliable component, then the safety of the overall system will be increased. Due to the legal issues associated with an autonomous robotic system that is responsible for the lives of many people, both inside and outside the vehicle, the commercial viability of these systems will not be realized for decades to come. The focus of intelligent vehicle research has since become more widespread, with an emphasis on introducing intelligent functions to


support, as well as to assist, the driver in their driving task. This is not to say that autonomous driving is no longer a focus of IV research; it is just a way of achieving this goal in smaller, more incremental steps. Some of the technologies currently being researched for intelligent vehicles are presented below in order of projected commercial developmental completion (Pearson and Neild 2001).

- Head up displays (HUDs) move visual information the driver needs into their line of sight, so that they do not have to take their eyes from the road. The HUD projects the information the driver needs onto the windscreen of the vehicle so that it appears to float in mid air about one arm's length in front of the driver. This saves the driver from moving their attention from the road to the gauges on the dashboard and then having to refocus their attention onto the road.

- Adaptive cruise control (ACC). Current cruise control systems maintain a constant speed without the driver having to manually control the throttle of the vehicle. As the name suggests, adaptive cruise control systems adjust the speed of the vehicle to maintain a safe distance to the other vehicles directly ahead.

- Driver monitoring (inattention and fatigue detection) systems are used in a variety of situations where either the driver's gaze direction or metrics based on the visuo-attentional behaviour of the driver are required. Examples include studies by car manufacturers to determine how distracting dashboard configurations are, and studies of the driver's visual behaviour during driving to determine the onset of fatigue.

- External air bags for pedestrian collisions are currently being considered by the European Commission for pedestrian friendly cars.

- Pedestrian detection systems that use vision and other sensors to detect the presence of pedestrians and can warn the driver of a possible impact.
- Obstacle detection systems, like pedestrian detection systems, can be used to warn drivers of an imminent impact. Early commercial systems are based on proximity warning systems for car parking assistance. More sophisticated systems will be used to characterize the environment external to the

vehicle and will be used in conjunction with lane tracking systems for autonomous vehicle guidance, and with driver monitoring systems for fatigue and inattention warning systems.

- Lane tracking and associated systems will be introduced in gradual steps, starting with lane departure warning systems, then assisted lane keeping systems, cars with automatic steering, and then urban autonomy.

- Intelligent air bags will be able to detect the presence of passengers, the size of the passenger and the severity of the collision to judge the required deployment of airbags to maximize safety.

- Full autonomous control of vehicles, including the ability to specify the destination only and have the vehicle take control for the entire journey. This kind of system requires a myriad of technologies including GPS mapping, voice recognition systems, lane tracking and driver monitoring systems, obstacle and pedestrian detection systems, as well as advanced driving control systems.

It is clear that perceiving the environment both inside and outside the vehicle is important in ITS. However, few projects look to the driver for any information. As an example of this, an application is developed in this dissertation that fuses visual lane tracking with driver monitoring in the first step towards closing the loop between vision inside and outside the vehicle (chapter 6). This results in a system that allows a detailed analysis of the driver's visual behaviour with respect to the environment external to the vehicle.

2.1.2 Major ITS initiatives

The first phase of research into intelligent transportation systems saw projects initiated in universities, automobile manufacturers and research centers all over the world. Many of the technologies you see on the road today are the result of these projects. As early as 1986, Europe introduced the PROgraM for a European Traffic system with Highest Efficiency and Unprecedented Safety (PROMETHEUS) project. This was a consortium spanning over 19 European countries and including 13 vehicle manufacturers, several government research centers and a variety of universities. The focus of this project ranged from digital road map technology to autonomous


vehicle research, with a number of ITS ideas carried through from conception to implementation and testing.

ERTICO is a more politically and commercially oriented ITS partnership set up in 1991 in Europe (ERTICO 2004). It is focused on promoting and supporting the implementation of ITS in Europe, and is made up of 85 partners including 38 from industry and 25 public authorities. Its achievements to date are mostly in standards development, coordinating cooperation between partners through various forums and collaborative projects, and the promotion of ITS technologies through public demonstration of their systems.

In 1995, the United States banded together to form the National Automated Highway System Consortium (NAHSC) which, like the PROMETHEUS project, contained a number of universities, research centers and automobile manufacturers across America (NAT 1998). Today the NAHSC is no longer being funded by Congress.

The Japanese government formed the Advanced Cruise-Assist Highway System Research Association (AHSRA) in 1996 from a large number of automobile manufacturers and research centers (ACAHSRA 2004). AHSRA focused on the automatic vehicle guidance problem.

The CyberCar project is yet another European based consortium, aimed at changing the urban transport environment into one where a number of shared, fully automated vehicles form a transportation system for goods and passengers with an on-demand door-to-door capability (Ouanounou 2004). This system may incorporate other forms of mass transportation such as trains, with the automated vehicles used as a complement. The system is intended to work on existing road infrastructures, with the automated vehicles dealing with issues such as road recognition. These systems are being tested in a number of European countries including Switzerland, Germany and France. A central theme of these projects is to be able to control both the number of vehicles as well as the technologies they use, such as clean fuels.
This initial flux of research has had a significant impact on the automobile industry, with a number of innovative products finding their way into the marketplace. Companies such as Nissan, Toyota, Honda and DaimlerChrysler are all investing in ITS technology through active in-house development of ITS devices demonstrated in various concept cars. Airbags, ABS brakes, GPS guided electronic road


maps and automatic emergency call systems are all examples of the types of technologies that have appeared on the market as a result of this research. While the market for ITS technologies is dominated by large multinationals, smaller startups are starting to find their way into the marketplace:

- Mobileye produces a single camera embedded system for obstacle detection, ego-motion analysis and lane marker detection. Its products, such as LDW and ACC, are due to be released by 2005 (Mobileye 2004).

- Iteris develops intelligent transportation systems ranging from vehicle detection systems to lane departure warning systems and highway capacity software (Iteris 2002).

- Seeingmachines is a spin-off from the Robotics Systems Laboratory at the Australian National University that produces faceLAB™, a driver monitoring package (Seeingmachines 2004). It uses a passive stereo camera pair mounted on the dashboard of the vehicle to capture images of the driver's head at 60 Hz. These are processed in real-time to determine the 3D pose of the person's face as well as their eye gaze direction. Its primary uses are analyzing the visual behaviour of the driver and fatigue detection.

Although a large number of these groups are non-profit organizations, the ultimate driving force behind ITS is marketplace acceptance and profit margins. To be accepted, a system must satisfy stringent government and industry standards, not the least of which is reliability. Failure rates of 1%, 0.1% or even 0.01% are not acceptable. Focus must be placed on designing systems that can adapt to the variation of scenes that are likely to occur and, as will be shown later, multiple cues are crucial to the success of such systems.

2.2 Cues for lane detection

Think of the way you drive your car and perceive the environment. A person runs into your view on a footpath: you detect them in your peripheral vision using motion. You spot a red car on the side of the road in the middle of the night using colour. You follow the road with the help of the lane markings and reflectors. When they disappear, you use colour and texture segmentation to detect the road.


Figure 2.1: Cues for object identification. The shape is shown in (a), while texture is shown in (b) and colour in (c). Although you can probably tell what this is by looking at all three, no individual cue gives enough information for robust identification. Figure 2.2 over the page shows the original image.

All are tools of vision, showing that it is a very powerful medium to sense with: one that allows multiple cues to sense different characteristics of the same object to increase tracking and detection robustness. Figure 2.1 shows an example of this. Individual cues do not provide enough information to correctly identify the object; however, when all the cues are combined, the object is clear (figure 2.2). It is a goal of this thesis to build a lane tracking system that makes use of multiple cues in a similar fashion to the human vision system.

2.2.1 Characteristics of the road

Like most objects and structured environments, there are certain characteristics of the road that distinguish it from the background clutter in the scene. Figure 1.1 in chapter 1 showed a number of common roads that hint at the variety of features that can be used for road detection and tracking:

- Most semi-structured roads are locally flat with continuous curvature. Discontinuities in curvature are usually present at traffic junctions and can be handled separately.


Figure 2.2: Oranges. The combination of texture, colour and edges simplifies the identification of the oranges.

- While the texture may vary between roads, it is often homogeneous within the road and different from its surroundings.

- Similarly, the colour may vary between roads, but it is often constant within the road even though shadows may change its brightness.

- The boundary separating the road and the non-road regions is often characterized by an edge.

- Finally, the most common feature on structured roads is the lane marker.

In the next section, each of these features that has been used previously in lane tracking is discussed with respect to its advantages and disadvantages for the task of lane detection.

2.2.2 Intensity profile

Ohta et al. (1980) showed in their seminal paper that, in general, the first eigenvector from a PCA of colour images transforms the colour image into the intensity image. This indicates that a significant portion of the information, or variance, of the image is contained in the intensity signal, making it an ideal first choice for analysis. Raw image data was used explicitly in some early systems at CMU (Pomerleau 1989; Pomerleau and Jochem 1996).

Figure 2.3: Lane marker extraction using dark-light-dark lane profiles. The blue line shows the intensity profile along the black scanline in the image. Lane markers are extracted by looking for dark-bright-dark transitions in the intensity profile (red circles). Other markers such as oil streaks or tire marks (green circle) can also be tracked by searching for representative template profiles.

Pomerleau (1989) uses the intensity image in combination with steering commands of a human operator to train an artificial neural network to steer the vehicle autonomously. Later, Pomerleau and Jochem (1996) matched horizontal image intensity profiles to a template profile to identify lanes without relying on lane markers explicitly. The type of feature tracked depends on the intensity profile; for example, figure 2.3 shows the intensity profile of a road with both lane markers and tire marks. Lane markers are identified by dark-bright-dark transitions (peaks in the intensity profile), while the two troughs to the left identify the tire marks. Consistent horizontal profiles like this at different rows of the image can be used to reconstruct the road profile from any type of feature that runs parallel to the road, as long as the profile of that feature is known. However, using intensity profiles directly has a number of drawbacks: the profiles are strongly dependent on the type of marker being tracked and are sensitive to lighting changes in the image.
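The dark-bright-dark search described above can be sketched in a few lines. The function below is illustrative only (the `min_rise` and `max_width` parameters are hypothetical values, not those used in RALPH): it flags narrow bright pulses in a single scanline as candidate lane markers.

```python
import numpy as np

def dark_bright_dark(profile, min_rise=40, max_width=10):
    """Find dark-bright-dark transitions (candidate lane markers) in a scanline.

    A candidate is a run of pixels that rises at least `min_rise` above the
    scanline's median (a crude road-surface baseline) and falls back within
    `max_width` pixels.
    """
    profile = np.asarray(profile, dtype=float)
    baseline = np.median(profile)
    bright = profile > baseline + min_rise   # pixels clearly brighter than road
    centres = []
    i = 0
    while i < len(bright):
        if bright[i]:
            j = i
            while j < len(bright) and bright[j]:
                j += 1
            if j - i <= max_width:           # narrow pulse -> marker-like
                centres.append((i + j - 1) // 2)
            i = j
        else:
            i += 1
    return centres

# Synthetic scanline: road surface at ~60, two 3-pixel-wide markers at ~200
line = np.full(100, 60.0)
line[20:23] = 200.0
line[70:73] = 200.0
print(dark_bright_dark(line))   # -> [21, 71]
```

A trough-shaped template (for tire marks or oil streaks) would be handled analogously by searching for dips below the baseline.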


Figure 2.4: Lane marker extraction using thresholding. (a) intensity image; (b) threshold = 100; (c) threshold = 200; (d) threshold = 240. Thresholding an image is extremely sensitive to the threshold used. Too low a threshold results in significant noise (b), while too high a threshold throws away information (d).

Thresholding

Thresholding the intensity image has been used by a number of researchers in combination with classification algorithms to extract lane markers from the image (Bertozzi and Broggi 1998; McDonald et al. 2001). This technique is based on the premise that the lane marker is a bright-against-dark feature (figure 2.4) and is a natural extension of the intensity profile technique, where the threshold level can be adapted to suit different lighting conditions. It is inherently limited as it can only extract bright lane markers, and its success is extremely sensitive to the threshold level.

2.2.3 Gradient images and edge detection

Gradient edge methods were the central theme of many first generation lane trackers and continue to be a commonly chosen cue (Kenue 1989; Kluge 1994; Taylor et al. 1996; Lakshmanan and Kluge 1996; Bertozzi and Broggi 1998; Dickmanns 1998; McDonald et al. 2001). This is the result of a number of premises. First, gradient methods are relatively insensitive to brightness levels and do not rely on a single threshold. Second, the majority of structured roads are characterized by either well-defined edges or lane markers (figure 2.5). Third, a transition between objects, such as the road and surroundings, will induce an intensity gradient in


image space. The majority of edge detection methods can be grouped into two categories: gradient and Laplacian. Gradient methods look for minima and maxima in the first derivative of the image (figure 2.6(b)). Laplacian methods search for zero crossings in the second derivative (figure 2.6(c)). We will concentrate on gradient methods here, as Laplacian methods are seldom used within the lane tracking community. For a detailed review of both these methods, see (Forsyth and Ponce 2003).

Figure 2.5: Edge cues for road detection. The original road image is shown in (a) and the edge gradient magnitude calculated using the Sobel operators is shown in (b). Lighter values correspond to stronger gradients. The edges found using thresholding and non-maximal suppression are shown in (c). Both lane markers and the road to non-road boundary can be characterized using these edge gradients.

Gradient methods

Edge detection starts with the calculation of the edge gradients $\nabla I$ of the image $I$ in the $x$ and $y$ directions by convolution with a kernel $g$:

$$\nabla I = \frac{dI}{dx}\,\mathbf{i}_x + \frac{dI}{dy}\,\mathbf{i}_y = (g_x * I)\,\mathbf{i}_x + (g_y * I)\,\mathbf{i}_y \qquad (2.1)$$

where $\mathbf{i}_x$ and $\mathbf{i}_y$ are the unit vectors in the horizontal and vertical image axes respectively. A popular set of kernels are the Sobel operators, $g_x$ and $g_y$, which calculate the gradients in the $x$ and $y$ directions respectively:

$$g_x = \frac{1}{4}\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}; \qquad g_y = \frac{1}{4}\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}. \qquad (2.2)$$


Figure 2.6: Edge detection. (a) the intensity profile of an edge; (b) the first derivative; (c) the second derivative. Gradient methods search for the minima and maxima in the first derivative of the image (b) while Laplacian methods search for zero crossings in the second derivative (c).


Figure 2.7: Road gradient magnitude image. (a) intensity; (b) $dI/dx$; (c) $dI/dy$; (d) gradient magnitude. Applying the Sobel kernels to an intensity image of the road results in the $x$ and $y$ gradient images (b) & (c). Road edges and lane marker edges are highlighted in the gradient magnitude image (d).

Using a 2D kernel like this incorporates the premise that an edge is more than a peak in a scanline, but has a 2D structure that is characterized by its neighbouring pixels. The gradient magnitude of image $I$ is then

$$|\nabla I| = \sqrt{(g_x * I)^2 + (g_y * I)^2} \qquad (2.3)$$

and the gradient direction is

$$\theta(\nabla I) = \arctan\left\{(g_y * I)/(g_x * I)\right\}. \qquad (2.4)$$
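As a concrete illustration of equations 2.1 to 2.4, the following sketch applies the Sobel kernels to a synthetic step-edge image in plain NumPy. The hand-rolled `convolve2d` helper is an assumption made for self-containedness (in practice a library routine such as `scipy.ndimage.convolve` would be used), and it applies the kernels in correlation form.

```python
import numpy as np

def convolve2d(img, k):
    """'Same'-size 2D correlation with zero padding.  The Sobel kernels below
    are applied without flipping, i.e. in correlation form."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * k)
    return out

# Sobel kernels (equation 2.2)
gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 4.0
gy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]) / 4.0

def sobel(img):
    ix = convolve2d(img, gx)            # dI/dx
    iy = convolve2d(img, gy)            # dI/dy
    mag = np.sqrt(ix**2 + iy**2)        # equation 2.3
    theta = np.arctan2(iy, ix)          # equation 2.4 (arctan2 keeps quadrants)
    return mag, theta

# A vertical step edge: dark on the left, bright on the right
img = np.zeros((8, 8))
img[:, 4:] = 100.0
mag, theta = sobel(img)
# The strongest response sits on the step (columns 3-4); for a vertical edge
# the gradient direction is horizontal, i.e. theta is about zero there.
print(int(np.argmax(mag[4])), float(theta[4, 3]))
```

Note the zero padding creates a spurious response at the image border; real systems typically ignore a margin around the frame.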

Applying the Sobel kernels to the road image in figure 2.7, we get the gradient images shown. Both road edges and lane marker edges are highlighted in these images; however, a significant amount of noise is present in the background that may distract a lane detector. There are many other kernels for calculating the gradient of an image and each has its own advantages and disadvantages. Some basic 1D derivative kernels are

$$g_x = g_y^T = \begin{bmatrix} -1 & 1 \end{bmatrix} \qquad (2.5)$$

$$g_x = g_y^T = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}. \qquad (2.6)$$

While these filters appear similar, they differ significantly in their Fourier characteristics: the second filter suppresses high frequency terms while the first does not, and the first form induces a phase shift while the second does not. This can be seen in the Fourier magnitude, $G$, and phase, $\Phi$, characteristics

$$g = \begin{bmatrix} -1 & 1 \end{bmatrix} \;\xrightarrow{\mathcal{F}}\; |G(\omega)| = 2|\sin(\omega/2)|; \quad \Phi(\omega) = (\pi - \omega)/2 \qquad (2.7)$$

$$g = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \;\xrightarrow{\mathcal{F}}\; |G(\omega)| = 2|\sin(\omega)|; \quad \Phi(\omega) = \pi/2. \qquad (2.8)$$
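The magnitude responses in equations 2.7 and 2.8 can be checked numerically by sampling the DFT of each kernel; a small sketch:

```python
import numpy as np

def freq_response(kernel, n=256):
    """Magnitude of the frequency response of a small FIR kernel, sampled at
    n points over [0, 2*pi)."""
    return np.abs(np.fft.fft(kernel, n))

w = 2 * np.pi * np.arange(256) / 256

# g = [-1, 1]: |G(w)| = 2|sin(w/2)|  (equation 2.7)
assert np.allclose(freq_response([-1, 1]), 2 * np.abs(np.sin(w / 2)), atol=1e-12)

# g = [-1, 0, 1]: |G(w)| = 2|sin(w)|  (equation 2.8)
assert np.allclose(freq_response([-1, 0, 1]), 2 * np.abs(np.sin(w)), atol=1e-12)

# At the highest frequency (w = pi) the central-difference kernel vanishes,
# i.e. it suppresses high-frequency noise, while the two-tap kernel does not.
print(freq_response([-1, 0, 1])[128], freq_response([-1, 1])[128])
```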


Figure 2.8: Thresholding the gradient magnitude image to obtain edges. (a) $|\nabla I|$; (b) $\tau = 10$; (c) $\tau = 50$; (d) $\tau = 100$. Selecting different thresholds results in differing levels of detail. Too low a value of $\tau$ and the edge image becomes cluttered. Too high a value of $\tau$ and all detail is removed.

Another gradient operator is the Prewitt pair of kernels, a simpler version of the Sobel operators that don't weight the contributions from the neighbouring pixels:

$$g_x = \frac{1}{3}\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}; \qquad g_y = \frac{1}{3}\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}. \qquad (2.9)$$

Edge detection

Edges can then be found by thresholding the gradient magnitude image with a threshold $\tau$:

$$E = |\nabla I| > \tau. \qquad (2.10)$$

The effect of using different $\tau$ is illustrated in figure 2.8. Too high a value of $\tau$ and little detail is captured. Too low a value of $\tau$ and the image becomes cluttered. Two problems are characteristic of this technique for finding edges. First, taking the derivative enhances noise within the images. This can be overcome by blurring the intensity image before calculating the derivatives; however, this risks the loss of edge details. Second, thick edges are produced where the intensity changes rapidly over several pixels. This is addressed by non-maximal suppression: thick edges are reduced to a single pixel by suppressing all pixels whose gradients are not greater than their immediate neighbours in the direction of the local gradient. Possibly the most popular edge detection algorithm in the literature is the Canny edge detector (Canny 1986). The Canny edge detector is an optimal edge detector according to certain statistical criteria [2], and Canny used the calculus of variations
[2] These statistical criteria are not covered here for brevity. See (Canny 1986) for more information.


Figure 2.9: Non-maximal suppression. (a) $I$; (b) $|\nabla I|$; (c) $|\nabla I| > 50$; (d) edges. Edge pixels in the thresholded image that are not greater in magnitude than their previous and succeeding pixels in the gradient direction are suppressed.
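A minimal sketch of these clean-up stages: simplified non-maximal suppression (restricted here to horizontal gradients, i.e. vertical edges) followed by hysteresis linking of weak edge pixels to strong ones, in the spirit of the Canny detector. The thresholds and test array are illustrative assumptions, not values from any cited system.

```python
import numpy as np

def nms_horizontal(mag):
    """Non-maximal suppression for predominantly vertical edges: keep a pixel
    only if it is a local maximum along its row (the gradient direction for a
    vertical edge).  A full implementation quantizes the gradient angle."""
    out = np.zeros_like(mag)
    left, right = np.roll(mag, 1, axis=1), np.roll(mag, -1, axis=1)
    keep = (mag >= left) & (mag > right)
    out[keep] = mag[keep]
    return out

def hysteresis(mag, t_low, t_high):
    """Keep weak edge pixels (> t_low) only when 8-connected to a strong
    pixel (> t_high), grown by iterative one-pixel dilation."""
    strong = mag > t_high
    weak = mag > t_low
    edges = strong.copy()
    while True:
        grown = edges.copy()
        for dy in (-1, 0, 1):           # dilate into the 8-neighbourhood
            for dx in (-1, 0, 1):
                grown |= np.roll(np.roll(edges, dy, axis=0), dx, axis=1)
        grown &= weak                   # never grow outside weak pixels
        if (grown == edges).all():
            return edges
        edges = grown

# A 2-pixel-thick vertical ridge whose strength fades down the image
mag = np.zeros((5, 7))
mag[:, 3] = [90, 80, 60, 40, 20]        # ridge peak
mag[:, 4] = [70, 60, 40, 20, 10]        # weaker shoulder
thin = nms_horizontal(mag)              # shoulder suppressed, ridge kept
edges = hysteresis(thin, t_low=15, t_high=50)
print(edges[:, 3])                      # faded tail kept via strong pixels above
```

The `np.roll` calls wrap around the image borders; a production implementation would pad instead, but for this interior ridge the wrap is harmless.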

Figure 2.10: Canny edge detection. The original image is shown in (a), while the edges found using the Sobel operator and the Canny edge detector are shown in (b) and (c) respectively. The hysteresis thresholding of the Canny edge detector generates strings of edges as opposed to the noisy edges found using the Sobel operator.

to satisfy these criteria. The edge detector works in a number of stages. First, the input image is smoothed with a Gaussian kernel. A simple first derivative operator is then applied to highlight regions of the image with high first derivatives. Because edges produce ridges in the gradient magnitude image, the algorithm tracks these ridges and sets all pixels that are not on the peak of the ridge to zero, which reduces the edge to a thickness of one pixel in a way similar to non-maximal suppression. Tracking is controlled by two thresholds, T1 and T2, with T1 > T2. Tracking begins at a point on the ridge that is greater than T1, and continues from that point in both directions until the ridge falls below T2. This process is called hysteresis thresholding and merges noisy edges that are connected to strong edges into edge strings. There is a general mix in the literature between the use of binary edge images and gradient magnitude images for lane detection. The gradient can be used as a

sample of edge strength (Lakshmanan and Kluge 1996; Kluge et al. 1998), whereas thresholding techniques combined with non-maximal suppression can be used to extract binary edge locations and mask out erroneous edges with small gradient magnitudes (Kluge 1994; Taylor et al. 1996; Aufrère et al. 2001; McDonald et al. 2001). Perhaps the most prominent of the edge detectors used in lane tracking is that of Dickmanns (1998), who calculates the oriented edge strength using a combination of a ternary mask and a low-pass filter.

Figure 2.11: Dickmanns' oriented edge detection algorithm. An oriented pixel field is condensed into a single vector (compressing rows into a single vector) and correlated with a ternary mask to search for edge pixels.

Dickmanns' oriented edge detector

The oriented edge detector derived by Dickmanns (1998) extracts edges at a pre-specified angle, θ, to detect road boundaries and lane markings in an image. An outline of the algorithm is shown in figure 2.11. A low-pass filtering of the search region is performed by condensing a pixel field oriented at an angle θ into a single vector. A ternary correlation is then used to search this vector for edge pixels. The algorithm is particularly useful for searching for edges at predicted angles, but can also initialize lane trackers by searching over all angles. Sensitivity of the edge detector to curved edges and varying edge widths is controlled by the size of the ternary mask and the number of rows condensed together. Figure 2.12 shows the edges found using this algorithm on a road image containing various characteristics that make road detection difficult (reflections in the windscreen, structures that run parallel to the road, shadows across the road, etc.). The blue lines are the road and lane edges that were correctly detected.
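The two stages of the detector can be sketched as follows. For simplicity this version condenses the rows of an axis-aligned search region (the real detector condenses along the pre-specified angle to gain orientation selectivity), and the ternary mask is an illustrative choice, not the one used by Dickmanns.

```python
import numpy as np

def oriented_edge_position(region, mask=(-1, -1, 0, 1, 1)):
    """Condense the rows of a search region into one vector (a crude low-pass
    over a vertically oriented pixel field), then correlate with a ternary
    mask to locate the strongest edge column."""
    vec = region.mean(axis=0)                        # row condensation
    mask = np.asarray(mask, dtype=float)
    resp = np.correlate(vec, mask, mode='valid')     # ternary correlation
    offset = len(mask) // 2                          # centre of the mask window
    return resp, int(np.argmax(np.abs(resp))) + offset

# Dark road (20) with a bright shoulder (120) starting at column 6
region = np.full((10, 12), 20.0)
region[:, 6:] = 120.0
resp, col = oriented_edge_position(region)
print(col)   # column of the strongest dark-to-bright transition
```

Because the row averaging is a low-pass operation, isolated bright pixels (noise) are attenuated before the correlation, which is precisely the robustness property exploited by the original detector.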


Figure 2.12: Lane edge feature detection using Dickmanns' oriented edge detector. The solid blue lines show the correct road and lane marker edges while the dashed lines show possible erroneous edges. The vertical red lines show erroneous edges that are easily removed via simple continuity constraints and road model fitting.

The red lines indicate invalid edges that can be easily removed via continuity constraints and road model regression, while the dashed lines indicate edges that are incorrectly detected as road or lane boundary edges. This example clearly shows the difficulty and ambiguity in using a single cue for lane detection; however, the additional selectivity of the edge detector to oriented edges can help reduce the number of false lane edges that are generated.

2.2.4 Lane marker detection

A common theme among many previous lane trackers is the use of lane markers as the primary feature for tracking, and consequently there are a number of techniques for lane marker extraction. Pomerleau and Jochem (1996) learn the dark-bright-dark intensity profile of the lane markers explicitly in RALPH; however, they are not limited to that feature alone, but track any feature that runs parallel with the road. Bertozzi and Broggi (1998) and McDonald et al. (2001) both exploit the brightness of the lane markers by thresholding the intensity image of the road to extract lane marker features. A useful characteristic of lane markers is that they are almost always homogeneous in width and are typically smooth; therefore, for lane marker detection it is possible


to use a feature detector that is tuned to respond to bar-like features of a particular width.

Figure 2.13: Laplace of Gaussian kernel for lane marker feature extraction. (a) 1D approximation to the LoG kernel; (b) the original colour camera image; (c) the LoG filtered image.

Laplace of Gaussian kernel

The Laplace of Gaussian (LoG) kernel, also known as the Mexican-hat kernel, is particularly suited to the extraction of bar-like features in an image (figure 2.13). It is a kernel that is traditionally used to calculate the zero-crossings of an intensity image for edge extraction. We can approximate the LoG kernel in 1D to extract only vertical bar-like features:

$$\mathrm{LoG}_k(r) = c\,(1 - r^2/a^2)\,e^{-r^2/2a^2} \qquad (2.11)$$

where $r$ is the radial offset from the centre pixel of the kernel, $c$ controls the amplitude of the kernel and $a$ controls the spread of the kernel. These parameters can be configured to emphasize bar-like features of a set width in the image. For example, with the camera configuration used in this thesis, lane markers on Australian roads were typically three pixels wide.
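Equation 2.11 is straightforward to implement; the sketch below builds a zero-mean 1D LoG kernel and shows its response peaking at the centre of a synthetic three-pixel-wide marker. The parameter values are illustrative, not those tuned for the camera configuration used in the thesis.

```python
import numpy as np

def log_kernel_1d(half_width, a, c=1.0):
    """1D Laplacian-of-Gaussian approximation of equation 2.11.
    `a` controls the spread (tune it to the expected marker width in pixels)."""
    r = np.arange(-half_width, half_width + 1, dtype=float)
    k = c * (1 - r**2 / a**2) * np.exp(-r**2 / (2 * a**2))
    return k - k.mean()        # zero-mean: no response on a flat road surface

# A scanline with a 3-pixel-wide bright bar (a lane marker) on a dark road
line = np.full(40, 50.0)
line[18:21] = 200.0
kernel = log_kernel_1d(half_width=4, a=1.5)
response = np.convolve(line, kernel, mode='same')
print(int(np.argmax(response)))   # -> 19, the marker centre
```

Subtracting the kernel mean makes the filter insensitive to the absolute road brightness, so only the bar-shaped deviation from the surface produces a response.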

2.2.5 Colour

Colour is a cue that is often ignored in lane tracking systems, simply because it is difficult to exploit efficiently and is sensitive to lighting changes; however, as figure 2.14 shows, road and non-road pixels often form largely disjoint sets in colour space, and colour can therefore be exploited to track both structured and unstructured


Figure 2.14: Colour for lane detection. (a) colour image; (b) RGB; (c) YUV. There is a clear distinction between the road and non-road colour pixels in both the RGB and YUV coordinate spaces.

roads (Kluge and Thorpe 1992; Jochem and Baluja 1993). Jochem and Baluja (1993) overcame the computational burden of colour image processing by exploiting a massively parallel platform to classify pixels as either road or non-road in the RGB colour space. They use between 5 and 30 clusters to represent road and non-road colours. Conversely, SCARF (Kluge and Thorpe 1992) uses a Bayesian classification technique to determine a continuous road-surface likelihood for each


pixel in a reduced colour image and compares this with an ideal model. SCARF represents both road and off-road regions by a distribution similar to a Gaussian mixture model in RGB space. The general approach for using colour as a cue within lane tracking systems is summarized in the following steps:

- Generate a static road colour model offline, or a dynamic model online.
- Classify pixels as either road or non-road pixels.
- Fit a road model to the classified pixels.

Building a colour model involves:

- Grouping road and non-road pixels. This is either done offline by a human operator labelling each pixel, or online by using the road model formed in the previous timestep of the algorithm to extract the road and non-road pixels from that image.
- Converting the pixels into the relevant colour space.
- Clustering the road and non-road pixels.
- Labelling the clusters as either road or non-road clusters.

Nearest neighbour classification can then be used to label each pixel in a new image as either road or non-road (Jochem and Baluja 1993). Nearest neighbour classification is not the optimal method for pixel classification, but is used here as an illustrative example. The RGB colour space typically used in these systems can be sensitive to changes in the lighting conditions caused by shadows or cloud cover; therefore, it is desirable to use a colour model that is not dependent on the intensity of the image. We can follow the work from the face detection literature and use a colour model that separates the intensity or luminance of an image from the colour chrominance. A popular choice for this has been the CIE Lab colour space (Cai and Goshtasby 1999); however, most colour spaces have been used successfully for face detection (Loy 2003).


Video cameras commonly return the raw image data in the YUV format, where Y is the intensity, and U and V are the chrominance values. This is primarily a consequence of engineers designing colour television signals to be backward-compatible with black and white TVs (Maller 2002). Since it has been shown previously that the intensity of an image contains the majority of the information in the signal, each pixel is assigned an intensity value Y, whereas each pair of pixels shares the UV components. This reduces the storage size of a YUV image by a third. Many applications then convert the YUV images to RGB for processing (often converting to another space for colour manipulation). The advantage of one colour space over another for lane detection is questionable at best. Therefore, for computational simplicity the lane tracker developed in this thesis uses YUV histogram lookup tables formed through dynamic modelling of the road colour distribution (5.4.3). The transformation between RGB and YUV is linear and reversible:

Y = 0.299 R + 0.587 G + 0.114 B;    (2.12)

U = -0.169 R - 0.332 G + 0.500 B + 128;
V = 0.500 R - 0.419 G - 0.0813 B + 128;    (2.13)

with the inverse

R = Y + 1.4075 (V - 128);
G = Y - 0.3455 (U - 128) - 0.7169 (V - 128);
B = Y + 1.7790 (U - 128).
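As a quick sanity check, the transform pair above can be coded directly; with the rounded coefficients of equations (2.12)-(2.13), the round trip recovers RGB values to within roughly one intensity level. The function names are illustrative:

```python
def rgb_to_yuv(r, g, b):
    """Forward transform using the coefficients of equations (2.12)-(2.13)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.332 * g + 0.500 * b + 128
    v = 0.500 * r - 0.419 * g - 0.0813 * b + 128
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse transform of the pair above."""
    r = y + 1.4075 * (v - 128)
    g = y - 0.3455 * (u - 128) - 0.7169 * (v - 128)
    b = y + 1.7790 * (u - 128)
    return r, g, b
```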

Evolving the technique described previously for pixel classification, we no longer classify pixels as either road or non-road, but instead assign each pixel a road probability according to the colour histogram table (figure 2.15). A threshold can be applied to the output likelihood image to obtain a binary road colour image; however, the image can be left as continuous values to give a measure of how road-like each pixel is. This confers the advantage that slightly discoloured road pixels can still contribute to the final solution.
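A minimal sketch of this lookup-table approach, assuming an illustrative bin count and a simple box blur in place of whatever smoothing kernel an implementation might use:

```python
import numpy as np

def build_uv_lookup(road_uv, bins=64, blur_passes=2):
    """Histogram the UV components of sampled road pixels and blur the
    result into a smooth road-likelihood lookup table (cf. figure 2.15)."""
    hist, _, _ = np.histogram2d(road_uv[:, 0], road_uv[:, 1],
                                bins=bins, range=[[0, 256], [0, 256]])
    # Box blur spreads probability mass to neighbouring UV cells so that
    # slightly discoloured road pixels still receive some likelihood.
    for _ in range(blur_passes):
        padded = np.pad(hist, 1, mode='edge')
        hist = sum(padded[i:i + bins, j:j + bins]
                   for i in range(3) for j in range(3)) / 9.0
    return hist / hist.max()

def road_likelihood(image_uv, lut):
    """Map each pixel's (U, V) pair through the lookup table, yielding a
    continuous road-likelihood image rather than a hard binary mask."""
    bins = lut.shape[0]
    iu = np.clip((image_uv[..., 0] * bins / 256).astype(int), 0, bins - 1)
    iv = np.clip((image_uv[..., 1] * bins / 256).astype(int), 0, bins - 1)
    return lut[iu, iv]
```

Thresholding the output of `road_likelihood` would give the binary road image; leaving it continuous preserves the soft evidence discussed above.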

2.2.6 Texture and other modalities

While the edge, lane marker, and colour cues have been the most commonly used cues for lane tracking, researchers have not limited themselves to only those. Rasmussen (2002) combines both colour and texture with a laser range finder for road following.

Figure 2.15: Road colour histogram classification. The UV components of the sampled pixels (a) are histogrammed in (b) (the x and y axes are U and V respectively) and blurred to generate the final road probability lookup table in (c). This is used to estimate the road likelihood image (e) of the road scene in (d).

Cramer and Wanielik (2002) use a laser range finder to detect road boundaries based on the statistical reflectivity of surfaces.

Masaki (2004) combines edge detection with stereo vision to obtain accurate depth measurements of objects within the scene for lane and object detection.


2.2.7 Lane tracking cues summary

It is clear that there is an abundance of information available in images for lane tracking. This is evidenced by the variety of methods that have been used in the past for lane detection. One conclusion of these works is that each cue has both advantages and disadvantages over the other cues and, more importantly, that no individual cue is enough to solve the lane tracking problem. Combining cues in an intelligent manner, as the human visual system does, is an important step towards attaining the reliability and flexibility that are essential for commercial acceptance of lane trackers. While the cues selected play an important role, the lane localization and tracking framework is the basis upon which this will be accomplished and is the focus of the next section.

2.3 Lane localization and tracking

Although lane tracking is a challenging task, it is essential for autonomous road following, and hence there is a plethora of approaches in the literature. Lane tracking can be formally expressed as the estimation of two sets of parameters over time: the vehicle pose with respect to the road and the road structure. While a common approach is to first determine the relative position of the vehicle with respect to the road, followed by the estimation of the road structure (Dickmanns 1999), a number of early approaches to road following completely ignore forming an explicit model of the road. They focus instead on learning the steering response of a human operator given only an image as input. Pomerleau (1989) uses an artificial neural network in ALVINN (autonomous land vehicle in a neural net) to learn a mapping from image pixel values to steering actuator commands for autonomous lane following. While this work is interesting, the majority of lane following systems learn an explicit model of the road structure so that more advanced steering commands can be issued. For this reason, this chapter focuses on lane tracking systems that estimate an explicit road model. The remainder of this section provides an overview of the most common techniques for lane localization and tracking, paying particular attention to vision-based methods.


Figure 2.16: VaMoRs (a) and VaMP (b), the two testbed vehicles used by Dickmanns. Pictures from (Gregor 2000) and (Hofmann 2000).

2.3.1 Dickmanns and the 4D-approach

One of the most successful and prominent approaches to lane tracking is the work by the team at the Universität der Bundeswehr led by Professor Dickmanns (Dickmanns and Zapp 1987; Dickmanns 1998; Dickmanns 1999). Dickmanns et al. pioneered the 4D-approach to lane tracking in the PROMETHEUS project, where they demonstrated autonomous control of their vehicle at speeds up to 100 kph on the German Autobahn. They use a detailed dynamical model of the vehicle with edge based feature detectors and an extended Kalman filter to recursively estimate vehicle and road state parameters in a predefined lookahead range.

Two different testbeds were used in the developmental phase of this research (figure 2.16). A 5 tonne van known as VaMoRs was driven autonomously at speeds of up to 100 kph on a 24 km section of the Autobahn that was closed for construction (Dickmanns 1988). Later VaMoRs was superseded by a Mercedes sedan called VaMP. VaMP was driven autonomously for 95% of a 1600 km trial from Munich, Germany to Odense, Denmark to demonstrate a number of intelligent systems for autonomous vehicle control (5.6.1). While lane tracking and following formed a significant portion of their work, lane changing, convoying and obstacle avoidance algorithms were also demonstrated. Vision is used as the main method of sensing the external environment, but an assortment of other sensors such as accelerometers are used to estimate vehicle motion.

Four main contributions of this work stand out. First, an efficient oriented edge


feature detection algorithm enabled real-time performance on the limited hardware of the time (2.2.3). Second, the 4D-approach was fundamental to the efficiency of the algorithm as it constrained the search area in image space to tractable proportions. Third, specialized vision systems were developed to allow both near-field and far-field views in combination with saccadic motion to increase the visual search space. Finally, clothoid road models were promoted as an efficient means of representing the curvature of the road (A.2).

The 4D-approach

The 4D-approach is a combination of a bottom-up and top-down method for road recognition. Bottom-up estimation is an independent process that determines the road structure using edge features, while the top-down approach is used for prediction and validation of road models. This algorithm incorporates a model of the world in space (3D) and in time (1D) that is recursively refined using an extended Kalman filter.

Horizontal search windows are used to extract lane markers from intensity images for tracking. Lane markers are characterized by dark-bright-dark transitions at a particular orientation and width (figure 2.3). The oriented edge detector described in section 2.2.3 is used to extract these transitions. The search windows are localized using the expected positions and orientations of the lane markings predicted each timestep. The error signal between the predicted position and detected position is used to update the road model.

Tracking is established in two phases. First, the algorithm is initialized by extracting the edges that delineate the lane markings from a large search area in the image. Regression is used to form two lines through the left and right lane markings in the wide-angle image. Straight lines form a good approximation in the wide-angle image because the lane curvature is negligible in the near-field (6-30 m) on highways. Using this assumption of a linear road segment in the near-field, the vehicle yaw and lateral offset from the centreline of the road, as well as the road width, are estimated. This forms the basis for an extended Kalman filter (EKF) that estimates a moving average clothoid model of the road (A.2.1; Dickmanns 1988). The estimated state parameters can be broken into two categories: the vehicle position parameters, which include the lateral offset and the yaw of the vehicle with respect to the centreline of the road;

and the road shape parameters, which include the horizontal and vertical curvature, the change in horizontal and vertical curvature, the road width and the change in the road width along the lane (A.2.1).

These parameters are robustly estimated by gradually extending the lookahead range so that the estimated curvature parameters approach those of the lane. The search window in the near-field covers the 6-40 m range, while the search field in the far-field covers the 30-100 m range. The lookahead region is shortened in periods of uncertainty and then increased again over time. A separate obstacle detection module indicates regions in the image that contain other vehicles so that the lane detection and tracking modules do not search in these regions for lane markers.

Figure 2.17: MarVEye. The vision system used by Dickmanns et al. has four cameras providing three fields of view. 3D information can be inferred by the stereo rig in the near-field, while curvature and scene estimation is accomplished using a colour camera in the medium-field and a grayscale camera in the far-field. Pictures from (Gregor 2000) and (Gregor 2000a).
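Dickmanns' full clothoid EKF is far richer than can be shown here, but the recursive predict/update cycle at its core can be illustrated with a toy linear Kalman filter over just two states, lateral offset and yaw. The motion model, noise covariances and observation model below are assumptions made for the example, not values from the 4D-approach:

```python
import numpy as np

def kalman_step(x, P, z, dt, speed, Q, R):
    """One predict/update cycle for a toy state x = [lateral offset, yaw].
    The motion model says the offset drifts by speed * yaw * dt (small-angle
    assumption); both states are then corrected by a direct, noisy
    measurement z (e.g. from a near-field line fit)."""
    F = np.array([[1.0, speed * dt],   # offset += speed * dt * yaw
                  [0.0, 1.0]])         # yaw assumed constant over one step
    x = F @ x                          # predict state
    P = F @ P @ F.T + Q                # predict covariance
    H = np.eye(2)                      # identity observation model
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)            # update with measurement residual
    P = (np.eye(2) - K @ H) @ P        # update covariance
    return x, P
```

The 4D-approach additionally linearizes a nonlinear vehicle and clothoid road model about the current estimate, and uses the prediction to place the edge search windows; this sketch captures only the recursive estimation skeleton.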

Summary

The work of Dickmanns and the team at the Universität der Bundeswehr has produced a number of excellent intelligent systems for vehicles, and they were pioneering in the field of lane tracking. They showed extreme reliability on the structured autobahns in Germany, and even extended their systems to off-road use and intersection detection.

The success of Dickmanns' algorithm can largely be attributed to the strong dynamical models for vehicle motion used in conjunction with the recursive 4D-approach to machine vision. While the edge feature detectors that were used were novel and powerful for their day, they were nonetheless based on simple edge extraction, which can still be fooled by erroneous edges. It was the strong temporal coherence enforced with the 4D-approach that allowed them to succeed where others had failed. This directed search for edge features allowed the use of an expensive image processing algorithm and was particularly useful for structured man-made environments.

However, the major limitation in their approach was that it struggled with variations in the driving environment (i.e. construction zones, dramatic lighting changes, poor or no lane markings, shadows across the road, etc.) since edge-based feature detection was the primary means of lane extraction. While the extended Kalman filter enforced temporal consistency, it is inherently a unimodal technique that is prone to failure if the posterior distribution used for tracking is non-Gaussian. By restricting the visual search space using predictions from the tracking algorithm, they overcame many of the problems associated with multimodal posterior distributions; however, this was not sufficient to deal with the uncertainty of semi-structured roads that contain poor lane markings or inconsistent structure.

2.3.2 Scene constraints and a priori knowledge

The multimodal nature of the lane tracking problem presents an interesting question: how does one decide which edges are part of the road boundary and which ones are not? Further information or a priori knowledge must be used to filter out invalid information. In a true philosophical sense, Webster (1998) defines a priori information as the knowledge and conceptions assumed, or presupposed, as prior to experience, in order to make experience rational or possible. In a logical sense, a priori characterizes a kind of reasoning that deduces consequences from definitions formed, or which infers effects from causes previously known.

This concept of a prior model that can help infer effects from causes has been used by previous authors to help disambiguate the ill-posed nature of the lane tracking problem. For example, the LANELOK system by General Motors (Kenue 1989) and the system presented in (Suzuki et al. 1992) both use a global constraint on the scene: the projective transformation induced by the camera projects the parallel lines of the road such that they meet at the vanishing point. This shall be referred to as the vanishing point constraint. However, this does not help with any edges that are found to be parallel with the road (cracks, oil stains, fences, etc.). Alternative sources of information or constraints on the scene must be used to disambiguate the problem, and the lane tracking systems in the literature are characterized by the assumptions they make.

2.3.3 GOLD: general obstacle and lane detection

The General Obstacle and Lane Detection system (GOLD) used in the ARGO vehicle at the University of Parma exploits a flat road assumption to help simplify the problem of lane tracking (Bertozzi and Broggi 1998). The system transforms stereo image pairs into a common bird's eye view using an inverse projective mapping (IPM) and uses a pattern matching technique to detect lane markings on the road (figure 2.18). Disparities between the two images in the common domain are used to detect obstacles that deviate from the plane of the road. The remapped images simplify the detection of the lane markings as they can be thought of as almost vertical lines with a constant separation. The areas of the images containing detected obstacles are removed from the remapped images, thus removing the chance of an obstacle influencing the detection of the lanes. The algorithm proceeds in six steps (figure 2.19):

1. The inverse projective mapping is applied to the image to back-project it onto the ground plane.

Figure 2.18: GOLD: inverse projective mapping for lane tracking. (a) the captured road image. (b) the inverse projective transform of the road image. (c) the detected lane centre. Pictures from (Bertozzi 1998).

2. The lane markers are modelled as bright regions against a dark background and are extracted in each row of the image using low-level image processing.

3. The image is enhanced using geodesic morphological dilation (Serra 1982), which exploits the vertical nature of the lane markers in the IPM.

4. An adaptive threshold in a 3x3 neighbourhood is used to extract lane markings, which handles lighting changes and differing lane markers effectively.

5. The binary image is thinned and scanned horizontally to build chains of non-zero pixels. These are then joined if they satisfy certain criteria that help remove the effect of erroneous lines.

6. A road model is used to choose the chain that most likely fits the center of the lane.

A number of advantages are conferred by this technique. Remapping the image into the ground plane and looking for vertical lines inherently enforces the constraint that lane markers are parallel (the vanishing point constraint). Robustness to shadows is achieved by searching for dark-bright-dark regions that represent lane markers and through the adaptive thresholding operation used. Finally, the inverse perspective mapping allows fast image processing algorithms to be used, thus allowing real-time operation.

The GOLD system was experimentally tested over 2000 km of roads throughout Italy in the MilleMiglia in Automatico Tour in 1998 (5.6.1). The ARGO vehicle

Figure 2.19: Steps of the GOLD algorithm. (a) lane markers are extracted from the inverse projective mapping and the image is enhanced using geodesic morphological dilation. (b) an adaptive threshold is used to create a binary image of lane markers. (c) the binary image is thinned. (d) polygon chains are formed and joined together. (e) a road model is used to extract the centerline polygon. (f) the extracted road model. Pictures from (Bertozzi 1998).

was guided autonomously by the GOLD system for 94% of the trial (Broggi et al. 1999a). This was an excellent result for the project and shed some light on a number of the inherent difficulties in lane tracking. Failures in lane tracking were caused by poor road infrastructure (absent or worn lane markings); this is a direct result of using lane marker features as the sole cue. In regions where the signal from the lane markings is weak or absent, the system fails. Another constraint on the system was that it was limited to roads with little horizontal curvature and no vertical curvature.
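The flat-road geometry underlying GOLD's inverse perspective mapping can be sketched with a pinhole model: ground-plane points project into the image and, because every pixel below the horizon corresponds to exactly one ground point under the flat-road assumption, the projection can be inverted in closed form. The camera height, pitch and intrinsics below are illustrative values, not ARGO's calibration:

```python
import numpy as np

def ground_to_image(X, Z, h=1.2, pitch=0.03, f=400.0, cx=320.0, cy=240.0):
    """Project a flat-ground point (lateral X, forward Z, in metres) to
    pixel coordinates for a pinhole camera at height h metres, pitched
    down by `pitch` radians."""
    z_c = h * np.sin(pitch) + Z * np.cos(pitch)   # distance along optical axis
    y_c = h * np.cos(pitch) - Z * np.sin(pitch)   # downward offset in camera frame
    return cx + f * X / z_c, cy + f * y_c / z_c

def image_to_ground(u, v, h=1.2, pitch=0.03, f=400.0, cx=320.0, cy=240.0):
    """Closed-form inverse for pixels below the horizon: under the flat-road
    assumption each such pixel back-projects to a unique ground point."""
    b = (v - cy) / f
    Z = h * (np.cos(pitch) - b * np.sin(pitch)) / (b * np.cos(pitch) + np.sin(pitch))
    z_c = h * np.sin(pitch) + Z * np.cos(pitch)
    return (u - cx) * z_c / f, Z
```

Sampling a regular grid of ground-plane points through `ground_to_image` and reading off the corresponding pixels is one way to produce a bird's eye remapping like figure 2.18(b).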


2.3.4 Navlab: Carnegie Mellon University

The Navlab project at Carnegie Mellon University (Thorpe 1990) has explored many different algorithms for lane detection. SCARF (Crisman and Thorpe 1993) used adaptive colour classification and a voting scheme for lane localization, while YARF (Kluge and Thorpe 1992) extended this with the addition of feature detection. ALVINN (Baluja 1996) employed a neural net to learn a function from image space to steering angle through supervised training. MANIAC (Jochem et al. 1993) extended ALVINN through a modular architecture that used pre-trained ALVINN networks for different road types and a connectionist superstructure to handle the transition between different road types.

Possibly the most impressive lane tracker from this research is RALPH, the rapidly adapting lateral position handler (Pomerleau 1995). RALPH validates multiple hypotheses of the road curvature by removing the hypothesized curvature from a bird's eye view of the road and condensing the rows of this image into a single row. Correct hypotheses will form a clear peak and trough in the intensity signal that can be compared with predetermined templates to discover the lateral offset of the vehicle (figure 2.20). The system adapts to new road types through the use of a number of predetermined templates and a temporally adapting template it calculates using the far-field view of the road.

The basis of this work is similar to that of Bertozzi and Broggi (1998) and the GOLD lane tracker, which also applies an inverse perspective mapping to obtain a bird's eye view of the road; however, RALPH differs in a number of ways. First, RALPH estimates the curvature of the road. Second, there is no low-level image processing required: RALPH works directly with the intensity image. Third, condensing the rows of the hypothesized image confers the same low-pass filtering advantage as Dickmanns' oriented edge detector. Fourth, adaptive template matching is used to determine the road model, which confers the advantage that the system can adapt to changes in road widths as well as being able to track anything that runs parallel to the road (i.e. discolouration in the middle of the lane caused by oil spots from other vehicles, tracks left by other vehicles in the rain, etc.).

RALPH first introduced the idea of validating hypotheses of the road model that match the observation, instead of estimating the model from the observation. This confers a number of advantages, two of which are particularly important. First, erroneous markings on the road have less impact on the estimation process


Figure 2.20: Rapidly adapting lateral position handler (RALPH). A trapezoidal region is extracted from the road image (a), which is transformed to remove the hypothesized road curvature (c) to match the profile in (b). Pictures from (Pomerleau 1995).


because they are only considered if they happen to lie on the hypothesized road model. The system is therefore less likely to be distracted by these erroneous markings and by obstacles such as other cars. Second, the system implicitly incorporates a priori assumptions such as the vanishing point constraint, reducing the need for complex post-processing algorithms.

RALPH was successfully tested in Navlab 5 over 3000 miles of road, part of which was a trial from Pittsburgh, PA to Washington DC where RALPH steered autonomously for 96% of the 302 mile journey (5.6.1). It was successful at handling harsh shadows, dense fog, rain and night-time tracking; however, it suffered from the inclination to follow departure lanes instead of tracking the lane that it was in. The approach is unique in that it is not restricted to tracking lane markers, but can follow any feature that is parallel with the road; however, the model is inherently limited to roads with no vertical curvature and has no computationally efficient way of being extended to handle it. The main problems discovered during the trial were associated with poor visibility due to rain, shadows of overpasses and road deterioration. In addition, the rapidly adapting template introduces the possibility of drift in the estimation; for example, because the far-field is used to adapt the template of the road profile, it is possible for the system to mistakenly track features that are not part of the road and for the tracking to drift off the road. This explains the tendency of the system to follow departure lanes.
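RALPH's hypothesize-and-validate idea can be sketched in a few lines: undo each candidate curvature by shifting the rows of a bird's eye intensity image, condense the rows into a single profile, and score the profile's contrast. The quadratic shift model and the peak-to-trough score are simplifying assumptions for the example, not RALPH's actual template matcher:

```python
import numpy as np

def score_curvature(birdseye, curvature_hyp):
    """Score one curvature hypothesis RALPH-style: shift each row of a
    bird's eye intensity image to undo the hypothesized curvature, condense
    the rows into a single profile, and measure its contrast. The correct
    hypothesis straightens the lane features, giving the sharpest profile."""
    rows, cols = birdseye.shape
    undone = np.empty_like(birdseye)
    for r in range(rows):
        # Assumed model: lateral shift grows quadratically with row index.
        shift = int(round(curvature_hyp * r * r))
        undone[r] = np.roll(birdseye[r], -shift)
    profile = undone.mean(axis=0)     # condense all rows into one
    return np.ptp(profile)            # peak-to-trough contrast

def best_curvature(birdseye, hypotheses):
    """Validate a set of hypotheses and return the highest-scoring one."""
    return max(hypotheses, key=lambda k: score_curvature(birdseye, k))
```

Note how any feature running parallel to the road, not just a lane marker, sharpens the condensed profile; this is the source of RALPH's flexibility (and of its drift failure mode).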

2.3.5 LANA and LOIS

The group at the University of Michigan has produced two novel contributions to the field of lane tracking: LOIS (Lakshmanan and Kluge 1996; Kluge et al. 1998) and LANA (Kreucher and Lakshmanan 1999). Central to LOIS (likelihood of image shape) is a likelihood function that encodes the knowledge that the edges of a lane should be near intensity gradients in an image. A parametric family of deformable templates is used to identify lanes, with the likelihood measure identifying how well a particular parameterization matches the observation. Like the 4D-approach of Dickmanns, a prior model constrains the possible locations of the lane based on the lane location from the previous frame. The lane edges are modelled as a set of circular arcs on flat ground. For small to moderate curvatures k, this can be well approximated by a parabola of the form

x = 0.5ky² + my + b.    (2.14)


The transformation of this curve into the image plane by perspective projection results in a family of curves which is well documented in the literature and forms the basis for a number of lane tracking algorithms (see (Kluge et al. 1998) for more details on this model).

The likelihood function is based on the premise that there should be an intensity gradient near the location of the lane edge and that this intensity gradient should have the same orientation as the lane edge. They formulate a penalty function to encode this belief, p(I|Θ), and pose the problem within a Bayesian framework as finding the maximum a posteriori (MAP) estimate of the likelihood function and prior using the Metropolis algorithm with geometric annealing:

Θ̂ = argmaxΘ p(Θ|I) = argmaxΘ [p(I|Θ) p(Θ) / p(I)],    (2.15)

where Θ is the parameterization of the road (the output variables) and I is the data (the gradient image). The prior p(Θ) is actually the solution from the previous iteration of the algorithm, which is used as the starting point for the estimation in the current iteration. The evidence p(I) is ignored in the estimation since it is constant with respect to Θ.

In a similar fashion to the curvature hypothesis testing of RALPH, LOIS does not estimate the road model from a set of image features, but validates a number of hypotheses via a likelihood function. Another significant advance is the use of a Bayesian framework for lane tracking. A number of benefits are conferred by this technique. First, certain a priori information is indirectly included in the estimation (i.e. lane edges meeting at the vanishing point on the horizon, lane edges lying in the plane of the road, etc.). Second, outlying features can be discounted if they are incorrectly oriented with respect to the hypothesis, while weak features can contribute positively to the estimation. Third, the Bayesian framework allows a mathematically rigorous approach to the lane tracking problem that allows prior knowledge to be introduced into the estimation.

LANA (Kreucher and Lakshmanan 1999) is an extension of LOIS that captures the strength and orientation of image-based features in the frequency domain using discrete cosine transforms (DCT). Like LOIS, deformable templates are used to detect lane markers of interest within a Bayesian framework. A number of advantages in using the frequency domain for lane detection were found by Kreucher and Lakshmanan. The most significant of these was that LANA was less distracted by strong erroneous lane-edges in the far-field. It was


also postulated that LANA would discriminate better between globally correct and incorrect hypotheses, which is realized through likelihood functions that have a higher kurtosis than those produced by LOIS. However, no definitive proof that LANA was indeed a better lane tracker than LOIS was offered. A distinct disadvantage of both techniques is that they are inherently reliant on lane markers. In addition, their use of the solution from the previous iteration as a starting point for the current iteration is similar in form to Bayes filtering (4.2.1) but without the mathematical rigour.
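The MAP search of equation (2.15) can be illustrated over a discrete candidate set (in place of the Metropolis sampler LOIS actually uses), with a parabolic template in the style of equation (2.14), a gradient-magnitude likelihood and a Gaussian prior about the previous solution. All function names and parameters here are assumptions made for the example:

```python
import numpy as np

def map_lane_estimate(grad_mag, candidates, prev_theta, prior_sigma=1.0):
    """Toy MAP search over theta = (k, m, b): each candidate defines a
    parabolic lane edge x = 0.5*k*y^2 + m*y + b in a gradient-magnitude
    image; the log-likelihood rewards gradient strength under the curve
    and the log-prior penalizes distance from the previous solution.
    p(I) is constant over theta, so it is omitted (cf. equation 2.15)."""
    rows, cols = grad_mag.shape
    ys = np.arange(rows)
    best, best_post = None, -np.inf
    for theta in candidates:
        k, m, b = theta
        xs = np.clip((0.5 * k * ys ** 2 + m * ys + b).astype(int), 0, cols - 1)
        log_like = np.log(grad_mag[ys, xs] + 1e-6).sum()
        diff = np.asarray(theta) - np.asarray(prev_theta)
        log_prior = -0.5 * np.dot(diff, diff) / prior_sigma ** 2
        post = log_like + log_prior
        if post > best_post:
            best, best_post = theta, post
    return best
```

Because the posterior is evaluated rather than maximized analytically, weak but consistent gradient evidence along the whole template can outweigh a single strong, misaligned edge, which is the behaviour the text attributes to LOIS.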

2.3.6 Hough transform lane tracking

An interesting approach to lane tracking was taken by McDonald et al. (2001), who use a Hough transform to extract small linear segments from images acquired from a dashboard mounted camera. The Hough transform (Hough 1962; Duda and Hart 1972) is an algorithm that transforms a set of points in image space to a set of points in parameter space. In this case the parameter space is defined by a curve parameterization. Each point in image space maps to the set of parameterizations in the parameter space that could have generated this point. After transforming all points from image space, parameterizations that have a high number of votes result in peaks in the parameter space. The main advantage of the Hough transform is that it is robust to noise and occlusion.

The Hough transform is used to find line segments within the image that are parameterized by θ and ρ, the angle of the line and the perpendicular distance of the line from the origin respectively (figure 2.21). The search is performed for road contours and not for a road model. Road curvature is then estimated by repeating the above process for several horizontal image segments and applying an exponential averaging technique to smooth the solution over time.

Combining edge detection using Sobel operators with thresholded intensity images (by ANDing the two images together to remove erroneous edges) conferred a cheap feature detector for the Hough transform that was specifically suited to lane marker detection; however, this in itself is a major limitation. Incorporating a threshold into the image processing places an inherent constraint on the system: changes in the contrast or lighting levels can have a dramatic effect on the result of a thresholding operation.
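A minimal (θ, ρ) Hough accumulator, intended as a sketch of the voting scheme rather than McDonald et al.'s implementation; the bin counts are arbitrary choices:

```python
import numpy as np

def hough_lines(edge_points, img_diag, n_theta=180, n_rho=100):
    """Vote each edge pixel (x, y) into a (theta, rho) accumulator using
    the normal line parameterization rho = x*cos(theta) + y*sin(theta)."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rho_bins = np.linspace(-img_diag, img_diag, n_rho)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in edge_points:
        # Every (theta, rho) pair consistent with this point gets one vote.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.digitize(rhos, rho_bins) - 1
        acc[np.arange(n_theta), np.clip(idx, 0, n_rho - 1)] += 1
    return acc, thetas, rho_bins

def strongest_line(acc, thetas, rho_bins):
    """Return the (theta, rho) parameterization with the most votes."""
    ti, ri = np.unravel_index(acc.argmax(), acc.shape)
    return thetas[ti], rho_bins[ri]
```

Collinear edge pixels all vote for the same accumulator cell, so lines appear as peaks even when the edge chain is noisy or partially occluded, which is the robustness property noted above.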


Figure 2.21: Hough transform for lane tracking. Here each edge pixel in the image space (a) contributes to the set of possible parameterizations in Hough space (b) that may have generated it. Intensity in (b) represents peak size. The vertical axis is ρ and the horizontal axis is θ. Four peaks exist in the Hough transform, one for each of the four lines in (a). Notice that the space is periodic; therefore, the two peaks on the right edge correspond to the same two lines as the two peaks on the left edge. Picture taken from (McDonald et al. 2001).


Figure 2.22: Massively parallel road follower. A trapezoid is used to model the road and a Hough transform calculates the centreline of the road using a 5-30 cluster colour model for road and non-road colours. Picture taken from (Jochem 1993).

Early work by Jochem and Baluja (1993) also used a Hough transform for estimating the centreline of the road; however, they used colour as the primary cue (figure 2.22). A 5-30 cluster colour model is used to approximate the distribution of road and non-road colours, which is then used for pixel classification. The Hough transform is then used to estimate the centreline of the road from the classified pixels. The system was designed to run on a massively parallel architecture, but is initialized by the user outlining the road in the first frame.

2.4 Summary

While the task of lane tracking has the distinct advantage that roads are a semi-structured environment, it is still an inherently difficult problem that requires a combination of information rich cues and intelligent tracking algorithms to succeed. More information than the observations alone is required to regularize the solution, and the lane trackers in the literature are characterized by the assumptions they use.

Many lane trackers assume that there are well-defined lane markers that differentiate the road from non-road regions (Dickmanns and Zapp 1987; Lakshmanan and Kluge 1996; Bertozzi and Broggi 1998; Kreucher and Lakshmanan 1999; McDonald et al. 2001). This assumption can be used to limit the search space to bar-like intensity profiles within the observation. Bertozzi and Broggi (1998) use these intensity profiles directly to track the lane, while Dickmanns and Zapp (1987) use horizontal gradient profiles within the image as the observations. McDonald et al. (2001) use the brightness of lane markers to mask a gradient map


to nd lane marker edges, while LANA searches for lane markers in the frequency domain (Kreucher and Lakshmanan 1999). This is often still insucient to robustly track lanes, so some systems further constrain the solution by including a priori information: Suzuki et al. (1992) and Kenue (1989) constrain the lane edges to meet at a point in the horizon and Bertozzi and Broggi (1998) and Pomerleau (1995) assume that the road is at and that the lane markers run parallel to each other. Dickmanns and Zapp (1987) use a highly accurate dynamical model to extend the single image approach to lane detection into the time domain (the 4D approach). This restricts the search space at frame t of the algorithm to a region predicted from frame t 1 using the dynamics of the vehicle. This was a major advance in lane tracking since the predicted lane location constrained the search space, which enabled the ecient use of the limited computational resources. The next generation of lane tracking algorithms use more advanced area-base techniques for lane detection that form explicit models of the lane structure for validation of hypotheses (Lakshmanan and Kluge 1996; Kluge et al. 1998; Kreucher and Lakshmanan 1999). This conferred the additional benet of reducing the eect of noise in the observation space, but conversely increased the computational complexities of the algorithm. One common characteristic of almost all of these systems is that they rely on only a single cue for lane detection, which is used regardless of how well it is performing. No attempt is made to seamlessly track the road as its characteristics change at the fundamental level of perception (i.e. from a highly structured highway to semi-structured lanes to unstructured o-road conditions). To achieve the level of robustness required for commercial acceptance, lane trackers must be able to handle a large variety of scenarios. 
It is clear that the failure modes of the systems presented here are heavily influenced by the cues used; therefore, it would appear that multiple cues are essential for robust tracking. In addition, effort must not be wasted on cues that provide little information. After all, infinite computational power is still a long way off. In contrast with the CMU systems that accommodate changes in road appearance by switching pre-trained or retrained models, our system is based on the distillation algorithm, which attempts to dynamically allocate computational resources over a suite of cues to robustly track the road in a variety of situations (§4; Loy et al. 2002; Apostoloff and Zelinsky 2004).

Chapter 3 Experimental platform

THE selection of a suitable experimental platform is the first of a series of difficult problems faced in an intelligent vehicles project. A number of criteria must be satisfied. First, the platform must be sufficiently flexible to fit all the computers that are required for experimentation. Second, it must withstand the rigors of experimental operation. Third, it must supply sufficient power to run the necessary computing and sensory equipment. Fourth, it must be adaptable and easy to work on. Finally, it must have all this at a relatively low cost. This chapter outlines the experimental platform used for this research. An overview of the actuators and sensors fitted on TREV (Transport Research Experimental Vehicle) together with a detailed description of the on-board vision systems used for lane tracking is provided.

3.1 Vehicle overview

The vision in and out of vehicles (Apostoloff and Zelinsky 2004) project at the ANU brings with it an additional set of requirements on the platform:

several vision platforms are required for vision inside the vehicle to monitor the driver and vision outside the vehicle to monitor the surrounding scene;

automatic control of the steering, acceleration and braking for autonomous operation requires sufficient space for after-market modifications to the vehicle;


Figure 3.1: TREV: the experimental platform.

a communications network is necessary to allow the actuators, sensors and computers to exchange information.

With this in mind, the experimental platform selected for the first iteration of research was a 1999 Toyota Landcruiser 4WD (figure 3.1). A 4WD vehicle was chosen for a number of reasons. First, it provides a robust platform capable of surviving the rigors of experimentation. Second, it has sufficient interior space for installing sensors, computers and actuators. Finally, it allows the option of off-road autonomous driving research (Fletcher et al. 2001). For environment sensing, a number of sensors were installed including two vision systems, a global positioning system (GPS), an inertial navigation sensor (INS), a SICK laser range finder and a pulsed FM radar. Additionally, three actuators were fitted to control the steering, acceleration and braking of the vehicle.

3.2 Sensors

3.2.1 Vision

As vision is the main form of environment sensing, two different vision platforms have been installed in TREV (figure 3.2 and extension 1 in appendix D). A passive set of cameras mounted on the dashboard is part of the faceLAB™ system¹ for driver monitoring (Victor et al. 2001). The other vision platform, CeDAR, is an active vision head designed at ANU (Sutherland et al. 2000; Dankers and Zelinsky 2003) and carries two cameras that are used for dual near-field and far-field scene coverage. The near-field camera on CeDAR is the main form of sensing for the lane tracker. It was used passively at 60 Hz with a field of view of 46.4°. The far-field camera was configured for curvature tracking and was used at 60 Hz with a field of view of 17.06° (figure 3.3).

¹ A commercial face and eye gaze tracking system built by Seeing Machines (http://www.seeingmachines.com).

Figure 3.2: Vision platforms in TREV. Top: CeDAR active vision head. Right (above the steering wheel): faceLAB™ passive stereo cameras.

Lane tracking

CeDAR is an active stereo vision head designed in the Robotics Systems Laboratory (RSL) at ANU (Sutherland et al. 2000) for research into active vision systems for target detection and tracking (figure 3.4). The platform uses a parallel architecture with a cable drive mechanism to achieve performance superior to the dynamics of the human eye with zero backlash. The Helmholtz configuration of the head gives the system three degrees of freedom in the tilt, left verge and right verge axes (figure 3.5). Table 3.1 summarises the


Figure 3.3: Fields of view and lookahead distances of the two cameras on CeDAR (FOV = 46.4° and 17.06°; L0.05 = 21.9 m and 53.3 m). The short lookahead distance of L0.05 = 21.9 m is intended for maximum scene coverage in the near-field and was the smallest possible with the range of focal lengths available on the cameras. See §5.3.1 for more details on this configuration.

Figure 3.4: CeDAR: the active platform used in TREV for lane tracking.


Figure 3.5: CeDAR: the Helmholtz configuration has three degrees of freedom that allow left and right verge, and head tilt.

performance specifications of the head.

Specification                  Tilt     Verge
Saccade rate (Hz)              5        6
Angular resolution (°)         0.01     0.01
Angular repeatability (°)      0.01     0.01
Maximum range (°)              90       90
Maximum velocity (°/s)         600      800
Maximum acceleration (°/s²)    18000    20000

Table 3.1: Performance specifications of CeDAR.

CeDAR is positioned in place of the rear-view mirror, facing the front of the vehicle (figures 3.2 and 3.6). This location was chosen as it offered the optimal angle of view for the road and surrounding area.

Why active vision?

There are several reasons why active vision is advantageous in an intelligent vehicle project:

Stabilization of the camera frame can be performed with respect to vibrations from the road surface. This is important for stabilising the far-field image streams that are used for road curvature tracking (§A.3).


Figure 3.6: CeDAR is located in place of the rear-view mirror in TREV.

In modelling the curvature of the road, the cameras can follow the shape of the road to obtain a superior view for model estimation.

Saccade movements can be used to increase the physical search space for lanes, obstacles and pedestrian detection.

The performance characteristics of CeDAR easily make it suitable for the tasks associated with intelligent vehicles.

Driver state monitoring

faceLAB™ (Victor et al. 2001) is a head pose and eye gaze monitoring system commercialised by Seeing Machines (Seeingmachines 2004) based on research and development work between ANU and the Volvo Technological Development Corporation. It uses a passive stereo camera pair mounted on the dashboard of the vehicle to capture 60 Hz video images of the driver's head. These images are processed in real-time to determine the 3D position of matching features on the driver's face. The features are then used to calculate the 3D pose of the person's face (1 mm, 1°) as well as the eye gaze direction (3°), blink rates and eye closure. The technology has been developed for driver safety systems, particularly driver fatigue and inattention measurement. In this thesis, faceLAB™ is used in a demonstration application that fuses lane tracking and driver monitoring to analyze where and when the driver is looking at the road (§6).


Figure 3.7: faceLAB™ inside a test vehicle (a) and the world environment illustrating a vehicle interior and the driver's focus of attention (b).

Figure 3.8: The location of the laser range finder is in the centre of the bull-bar on TREV.

3.2.2 Radar and laser range finder

TREV has been equipped with several other external sensors for obstacle detection, including a 94 GHz pulsed FM millimetre wave radar (figure 3.8; Mihajlovski 2002) and a SICK laser range finder (LMS221-30206). Both can be interchangeably mounted on the bull-bar at the front of the vehicle and used, by themselves or in conjunction with the lane tracking system (chapter 5), for obstacle detection and tracking. Note that neither the radar nor the laser range finder is used in the lane tracking system presented here.


3.2.3 Vehicle state sensors

A global positioning system (GPS), an inertial navigation sensor (INS) and a steering position transducer have been fitted to the vehicle, and are used with the factory-standard tailshaft encoder to determine the current dynamic state of the vehicle. The Trimble AgGPS132 DGPS receiver has sub-metre differential GPS accuracy and a velocity accuracy rated at 0.16 km/h. This sensor is capable of providing data at a rate of 10 Hz, which is useful for high-level navigational problems, correcting drift in the INS unit and providing additional data for fusion with the tailshaft encoder and steering position sensor. The Crossbow FOG-AUTO IMU600AA INS unit provides fully compensated angular rate and acceleration outputs suitable for automobile applications, with output at up to 200 Hz. This can be used to stabilize the active vision system with respect to vibrations caused by road noise. An HSI-Houston Scientific 1950 position transducer has been fitted to the steering mechanism in the undercarriage of the vehicle to enable steering angle measurements to be made. Combining this with readings from the tailshaft encoder, the vehicle speed and heading direction can be measured for use in the motion model of the lane tracker (§5.2).
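As a rough sketch of how the tailshaft-encoder speed and steering angle might be combined into a heading estimate, a kinematic bicycle approximation can be used. The wheelbase value and the simple Euler integration below are illustrative assumptions only; the actual Ackermann motion model is described in §5.2.

```python
import math

def heading_rate(speed_mps, steering_angle_rad, wheelbase_m=2.85):
    """Yaw rate of a kinematic bicycle model: psi_dot = v * tan(delta) / L.
    The 2.85 m wheelbase is an illustrative figure, not a measured value."""
    return speed_mps * math.tan(steering_angle_rad) / wheelbase_m

def integrate_pose(x, y, psi, speed_mps, steering_angle_rad, dt):
    """Advance the planar pose (x, y, psi) over one time-step dt."""
    psi_dot = heading_rate(speed_mps, steering_angle_rad)
    x += speed_mps * math.cos(psi) * dt
    y += speed_mps * math.sin(psi) * dt
    psi += psi_dot * dt
    return x, y, psi

# One second of driving at 10 m/s with a small, constant steering input.
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = integrate_pose(*pose, speed_mps=10.0,
                          steering_angle_rad=0.02, dt=0.01)
print(pose)
```

The same two measurements (speed and steering angle) are all this model needs, which is why the encoder and steering transducer together are sufficient inputs for a motion model of this kind.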

3.3 Actuators

Three actuator sub-systems have been fitted to control the steering, acceleration and braking. Acceleration control has been achieved via an interface to the vehicle's cruise control unit through the SNAP I/O module (§3.4). A Raytheon rotary drive motor/clutch unit is used for steering control. This was installed in the engine bay next to the steering shaft of the vehicle (figure 3.9). The steering shaft of the vehicle is separated from the Raytheon drive motor unit via an idle gear so that manual control of the steering can be recovered at any time by the driver. A linear drive unit produced by Animatics is used to control the braking subsystem (figure 3.10). The linear drive is connected to the brake pedal through a steel cable that is protected by a guiding sheath. Braking is accomplished by the linear drive unit pulling on the cable. An electromagnet connects the cable


Figure 3.9: Location of the Raytheon drive motor steering actuator next to the steering shaft in the engine bay.

to the linear drive unit and must be powered for the linear drive to operate. This safety feature enables power to be cut to the linear drive unit, returning manual control to the operator of the vehicle. All three of these actuators are connected to the ethernet on the vehicle through a SNAP I/O module to allow autonomous control of the vehicle (§3.4). The amalgamation of these three actuators allows full lateral and longitudinal control of the vehicle.

3.4 Computing architecture

Figure 3.11 shows the computing and communications architecture on board TREV. One monitor and set of input devices is used to control multiple computers through an electronic switching box. An ethernet hub is the networking backbone of the vehicle, allowing communications to occur between the various computers and devices. Access to the low-level hardware devices, such as the steering actuator and the tailshaft encoder, is accomplished via a SNAP I/O B3000ENET unit. The main purpose of the SNAP I/O unit is to provide a network presence for any hardware device that has a communications interface.

 

Figure 3.10: Location of the Animatics braking actuator under the driver's seat in the vehicle.


Figure 3.11: Computing and communications architecture on TREV.



Any computer that requires access to an image stream, such as the lane tracking computer, is connected to the cameras through a set of framegrabbers located on that computer. The CeDAR cameras (two Sony CCB EX37 colour units) are the only devices on the car that are directly accessed by the computers. All other interaction is done over the network via the SNAP I/O unit. Images are captured from these cameras at 60 Hz using two Imagenation PXC200L framegrabbers. The four computers installed in TREV are for image processing, hardware control and high-level data interpretation. A Pentium IV 1.4 GHz computer is dedicated to the lane tracking algorithm. The faceLAB™ system requires a single computer for driver head pose and eye gaze estimation. The third computer is used to control the three axes of the active vision head, while the fourth computer is the coordinator of the system and performs all high-level computations. All four computers communicate over the ethernet network installed in the vehicle.

3.5 Summary

TREV was designed to allow comprehensive experimentation in the field of intelligent transportation systems and autonomous vehicles. It is an ideal platform for lane tracking experimentation. The high-performance CeDAR head supports a variety of configurations for lane tracking tasks, with the active component essential for building sophisticated lane tracking systems (§A.3). The large space capacity of the vehicle allows the use of off-the-shelf components, thereby reducing the expense of the system. More importantly, TREV has the full range of sensors necessary for lane tracking experimentation, with a number of additional systems available for use in novel demonstration applications with the lane tracker. The system setup can easily coordinate the vision systems of faceLAB™ and CeDAR, allowing for experiments involving vision in and out of the vehicle. For example, faceLAB™ and the lane tracker are combined in this dissertation to create a driver assistance system that determines whether the driver's attention is focused on the road (§6).


Chapter 4 Distillation: a detection and tracking algorithm

THE target detection and tracking algorithm is undoubtedly the core of any lane tracking system. It is used to search over the space of possible road models for the most likely estimate of the scene and governs not only the performance of the system, but also the framework within which it must operate. TREV uses an algorithm called distillation that was developed at the Robotics Systems Laboratory, ANU, for lane tracking and for face localization and tracking. The distillation algorithm is a novel method for target detection and tracking that combines a particle filter¹ with a multiple cue fusion engine and is suitable for both low and moderately high dimensional tracking problems. The algorithm is grounded in Bayesian statistics and is self-optimized using the Kullback-Leibler distance. This attempts to produce the best statistical result given the computational resources available. The basis of the distillation algorithm is that a suite of cues is used to search for a target, the performance of each cue over time is evaluated, and the set of cues that are performing optimally is distilled to a subset of cues that can run at the real-time frame rate² (15 Hz in the case of the lane tracker). The remainder of the cues are processed at speeds less than the frame rate and their results are monitored for a contribution

¹ A particle filter (Blake and Isard 1998) is a search method that represents a continuous posterior density with a set of discrete particles, or hypotheses. These particles represent the target location and are moved to positions of high probability to concentrate computational power in those areas of interest.
² The real-time frame rate is defined here to be a minimum desired processing speed for the cues that is guaranteed over time. This number is application specific. For example, highly dynamic applications might require a minimum frame rate of 60 Hz. In the case of the lane tracking system, 15 Hz is sufficient.


to the overall tracking. The cues can be reinstated to run at the frame rate at any time if it is determined that their contribution will result in improved tracking performance. This allows cues to run at different frequencies, enabling slow-running but valuable cues to run in the background. This chapter describes the distillation algorithm in detail, focusing on the particle filtering and cue fusion technologies used. A hypothetical lane tracking example is used to explain the development of the particle filter from its Bayesian roots and to outline its application to tracking problems. The novelty of the distillation algorithm lies in the integration of the particle filter, cue fusion and cue adaptation technologies used and in the seven main characteristics that were initially set out as goals of the algorithm in its design stage:

Adaptive at the cue level. The system should automatically adapt its choice of cues based on sensor input and cue performance, to maximise information extraction from the data stream given the computational resources that are available.

Robust to sensor noise and discontinuities in tracking trajectories.

Efficient use of finite computational resources. The system should dynamically allocate computational resources to the dominant cues so that the best real-time tracking result is achieved within the computational constraints of the system.

Extendable. With the continued increase in computational power (described by Moore's Law³), increasingly more advanced techniques in image processing become viable alternatives to current algorithms. The system should be able to easily incorporate new image processing algorithms, whether to improve existing cues or to add new ones.

Scalable architecture that allows the system to be applied to more difficult problems with scalable computational time.

Probabilistic approaches fit naturally into an uncertain world and can provide statistical information regarding the performance of the system as well as a measure of confidence in the final result.

³ Moore's Law was described by Gordon Moore in 1965 when he observed that the capacity of the microchip seemed to double every 18 to 24 months (Moore 1965). This increase has continued until the present day.


Generic core structure that allows the algorithm to be applied without modification to various tracking problems.

Figure 4.1: The two subsystems of the distillation algorithm. The cue processor manages the computational resources and cues, while the particle filter controls the search.

4.1 The structure of the distillation algorithm

The framework of the distillation algorithm is separated into two main subsystems: the cue processor and the particle filter (figure 4.1). Information that flows between the two subsystems is limited to the particles that form the search basis, x_t^l, and the probabilities returned by the cue processor for each particle, p(o | x_t^l). Here, o is the set of observations used by the cue and p(o | x_t^l) is the likelihood of these observations given the particle x_t^l. The particles are hypotheses, or guesses, at the target location in the state space and are tested by the cues. For the lane tracker, these particles represent the lateral offset of the vehicle, y_sr, the yaw of the vehicle with respect to the centreline of the road, and the road width, r_w (§5). Additional parameters can be added to estimate the curvature of the road (§A.2). The particle filter controls the search process by manipulating the position of the particles in the state space based on the probabilities that are returned by the cues (§4.2). Particles are moved to regions of high probability, allowing greater resolution in areas of interest in the posterior distribution.


The cue processor has a number of functions (§4.3). First and foremost, it schedules which cues are to run in the current iteration based on their performance in past iterations. Second, it processes the likelihood of each particle passed from the particle filter with each cue. Third, it fuses the likelihoods of the cues. Fourth, the performance of each cue is evaluated, and finally the cue processor passes the likelihood probabilities of each particle back to the particle filter.
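The fusion step can be illustrated with a small sketch. The cue names, weights and the weighted geometric mean used here are illustrative assumptions only, not the fusion method of the actual cue processor:

```python
import numpy as np

def fuse_cue_likelihoods(cue_probs, weights):
    """Fuse per-cue particle likelihoods p(o | x_t^l) into a single
    likelihood per particle using a weighted geometric mean.

    cue_probs: (num_cues, num_particles) array of p(o | x) per cue.
    weights:   per-cue reliability weights (normalised to sum to one)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Weighted geometric mean: exp(sum_c w_c * log p_c(o | x)).
    log_p = np.log(np.clip(cue_probs, 1e-12, None))
    return np.exp(weights @ log_p)

# Two hypothetical cues scoring three particles; the first cue is
# currently trusted more, so it receives a larger weight.
edge_cue   = np.array([0.7, 0.2, 0.1])
colour_cue = np.array([0.6, 0.3, 0.4])
fused = fuse_cue_likelihoods(np.vstack([edge_cue, colour_cue]), [0.8, 0.2])
print(fused.argmax())
```

A weighted geometric mean is one simple way to let a reliability weight modulate each cue's influence; down-weighting a cue flattens its contribution towards uniformity rather than removing it entirely.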

4.2 Particle filtering for target tracking

The particle filter has evolved from Bayes filtering into a recursive method capable of searching previously intractable multi-modal non-Gaussian state-space problems. The continuous posterior is approximated by a discrete set of particles, or hypotheses, that represent the target location. The particle filter moves particles to regions of higher probability to concentrate computational power in these areas of interest. There are a number of characteristics conferred by the particle filter algorithm that are particularly beneficial for lane tracking:

No Gaussian assumptions are made about the posterior distribution. The filter removes the shape restriction on the posterior distribution required by other algorithms. For example, the Kalman filter assumes Gaussian distributions throughout to simplify the analytical expression that is required to update the posterior. This assumption can be grossly inaccurate and causes the algorithm to fail for posteriors that do not resemble Gaussian distributions or that are multi-modal. The particle filter allows the posterior distribution to take any shape.

Information concerning the confidence of the result can be directly derived from the particle distribution.

Particle filters can be configured to gracefully handle catastrophic tracking failures through the random distribution of a small subset of the particles over the state space (i.e. the kidnapped robot problem; Thrun et al. 2001a).

Particle filtering concentrates computational resources in the areas of the posterior that are most relevant, by sampling in proportion to the posterior distribution (Thrun et al. 2001a).


The number of particles can be adjusted dynamically to adapt to the available computational power. However, reducing the number of particles also decreases the resolution of the output and the likelihood of finding a global maximum in the posterior.

The theory presented below is an overview of particle filtering and appears in many texts (Isard and Blake 1998; Vlassis et al. 2001) in several different forms; however, Thrun et al. (2001b) present an excellent derivation of the particle filter within a robotics framework and should be referred to for more detail. The derivation provided here is designed to give a broad overview of the technique, focusing on the important aspects of the algorithm with respect to its implementation.

4.2.1 Bayes filtering

The recursive Bayes filter described by equation 4.1 is a common filtering technique used in the field of mobile robotics (for simultaneous localization and mapping). This equation is the basis upon which the particle filter is built. The recursive nature of the Bayes filter allows the calculation of the current posterior of states x at time t, p(x_t | d_{0...t}), based upon knowledge of the current evidence d (consisting usually of observations, o, and actions, a) and the previous posterior, p(x_{t-1} | d_{0...t-1}). It is important to note that it is assumed that the environment upon which the Bayes filter operates is Markov, in that past and future data are independent if one knows the current state. This is a valid assumption for lane tracking given that the vehicle is under full control. To determine the current state of the vehicle with respect to the road, all that is required is the previous state, the actions from that state (i.e. the odometry and dynamic state of the vehicle) and the current observations to correct errors in the action data. The core of the Bayes filter is in the recursive equation that links the current state with the posterior from the previous iteration:

p(x_t | d_{0...t}) = η_t p(o_t | x_t) ∫ p(x_t | x_{t-1}, a_{t-1}) p(x_{t-1} | d_{0...t-1}) dx_{t-1}    (4.1)

where x_t denotes the state at time t, d_{0...t} denotes all evidence from time 0 to time t, and η_t is a normalization factor that does not depend on x. Equation 4.1 will now be derived from equation 4.2, which describes the probability of state x_t at time t given all data d_{0...t} from time 0 to time t:

p(x_t | d_{0...t}) = p(x_t | o_t, a_{t-1}, o_{t-1}, a_{t-2}, ..., o_0)    (4.2)


In this case two types of data are present and arrive alternately. First, the observation o, which comes in the form of image data from the cameras. Second, the action a, which comes in the form of vehicle state information (velocity v and steering angle). Applying Bayes' rule to equation 4.2, the posterior is expanded to give

p(x_t | d_{0...t}) = η_t p(o_t | x_t, a_{t-1}, o_{t-1}, ..., o_0) p(x_t | a_{t-1}, o_{t-1}, ..., o_0)    (4.3)

The Markov assumption reduces p(o_t | x_t, a_{t-1}, o_{t-1}, ..., o_0) to p(o_t | x_t), and then

p(x_t | d_{0...t}) = η_t p(o_t | x_t) p(x_t | a_{t-1}, o_{t-1}, ..., o_0).    (4.4)

Integrating the rightmost expression over time using the theorem of total probability gives

p(x_t | d_{0...t}) = η_t p(o_t | x_t) ∫ p(x_t | x_{t-1}, a_{t-1}, o_{t-1}, ..., o_0) p(x_{t-1} | a_{t-1}, o_{t-1}, ..., o_0) dx_{t-1}.    (4.5)

Applying the Markov assumption once more reduces p(x_t | x_{t-1}, a_{t-1}, o_{t-1}, ..., o_0) to p(x_t | x_{t-1}, a_{t-1}), and noticing that the a_{t-1} element in the rightmost term has no effect on the result of that term (as a_{t-1} only affects the state at time t, not t-1), the recursive formula for the Bayes filter reduces to

p(x_t | d_{0...t}) = η_t p(o_t | x_t) ∫ p(x_t | x_{t-1}, a_{t-1}) p(x_{t-1} | o_{t-1}, ..., o_0) dx_{t-1}.    (4.6)
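The recursion in equation 4.6 can be illustrated numerically on a one-dimensional grid of lateral offsets. The grid, noise levels and Gaussian motion and sensor models below are illustrative assumptions, not the models of the actual system:

```python
import numpy as np

# Discrete grid of lateral offsets (m); prior and noise values are
# illustrative only.
x = np.linspace(-2.0, 2.0, 81)
dx = x[1] - x[0]
prior = np.exp(-0.5 * (x / 0.5) ** 2)   # p(x_{t-1} | o_{t-1}, ..., o_0)
prior /= prior.sum() * dx

def motion_update(prior, shift, sigma_a):
    """Unnormalised proposal: the integral of p(x_t | x_{t-1}, a_{t-1})
    times the prior; normalisation is absorbed into eta_t later."""
    proposal = np.zeros_like(prior)
    for j, xj in enumerate(x):
        # Gaussian motion model centred on the deterministic drift.
        k = np.exp(-0.5 * ((xj - (x + shift)) / sigma_a) ** 2)
        proposal[j] = (k * prior).sum() * dx
    return proposal

def sensor_update(proposal, z, sigma_o):
    """Posterior: normalised p(o_t | x_t) * proposal (equation 4.6)."""
    likelihood = np.exp(-0.5 * ((z - x) / sigma_o) ** 2)
    posterior = likelihood * proposal
    return posterior / (posterior.sum() * dx)

proposal = motion_update(prior, shift=-0.3, sigma_a=0.1)   # vehicle drifted left
posterior = sensor_update(proposal, z=-0.25, sigma_o=0.2)  # camera observation
print(x[np.argmax(posterior)])
```

The posterior peak lands between the odometry-predicted offset and the observation, weighted by their respective uncertainties, which is exactly the predict-then-correct behaviour the recursion describes.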

This equation has two conditional distributions that are fundamental to the filter. The conditional distribution p(x_t | x_{t-1}, a_{t-1}) is called the motion model, and it updates the prior p(x_{t-1} | o_{t-1}, ..., o_0) given any actions made from time t-1 to time t (§4.2.2). Note that the prior for time t is the posterior from time t-1. This integral gives the temporary distribution, which is referred to by Thrun et al. (2001a) as the proposal distribution. The sensor model, p(o_t | x_t), is used to validate the proposal distribution and describes the likelihood that the observation o_t is true given the state x_t (§4.2.2). This is also known as the likelihood distribution. Combining the observation and action models with an initial prior, this equation defines a recursive estimator for the state of a partially observable system. Figure 4.2 shows the distributions of the Bayes filter that are modified during each step of the algorithm. This figure is based on a hypothetical lane tracker that detects the lateral offset of the vehicle from the lane centre. The posterior

Figure 4.2: Example distributions from a Bayes filter for the case of a hypothetical lane tracker that detects the lateral offset of the vehicle from the lane centre. The posterior from iteration t-1 is modified by the motion model via a deterministic drift based on the actions (odometry) of the vehicle and a diffusion of the distribution determined by the error characteristics of the action sensors (∫ p(x_t | x_{t-1}, a_{t-1}) p(x_{t-1} | o_{t-1}, ..., o_0) dx_{t-1}). The observation step modifies the proposal distribution based on measurements from the vehicle observation sensors (p(o_t | x_t)) to form the posterior (p(x_t | o_t)).

from iteration t-1 is modified via the deterministic drift⁴ and diffusion⁵ noise of the motion model. The deterministic drift can be observed in the shift of the distribution to the left in proportion to the measured shift of the vehicle. The error in the odometry causes the distribution to shift slightly further to the
⁴ The deterministic drift is the movement of particles that would occur due to the motion of the vehicle if its action (odometry) sensors were noiseless.
⁵ The diffusion accounts for errors in the action (odometry) sensors.


left than the car actually moved. This error is corrected by the diffusion of the distribution and by the next step. The intermediate distribution of the motion model is then corrected with the observation data (images) from the vehicle through the sensor model to form the posterior.

4.2.2 Particle filtering

The central problem with the continuous form of the Bayes filter is that the sensor and motion models require analytical solutions to be evaluated. This is often difficult and inefficient to implement for all but the simplest of problems. One way of dealing with this is to generate approximate solutions with Gaussian distributions, which is the case in the Kalman filter family of algorithms. However, this type of solution fails when the posterior is non-Gaussian. An alternative way of solving this problem is to approximate the continuous distributions of equation 4.6 with a discrete set of weighted samples. The form of the posterior then simplifies to:
p(x_t | d_{0...t}) = η_t p(o_t | x_t) Σ_{i=1}^{n} p(x_t | x_{t-1}^{(i)}, a_{t-1}) p(x_{t-1}^{(i)} | o_{t-1}, ..., o_0)    (4.7)
where (i) denotes the ith particle. The posterior becomes a set of n tuples, each consisting of a particle, or hypothesis, of the state x and its weight w:

p(x_t | d_{0...t}) ≈ {x^{(i)}, w^{(i)}}_{i=1,...,n}    (4.8)

w^{(i)} = η_t p(o_t | x_t^{(i)}) Σ_k p(x_t^{(i)} | x_{t-1}^{(k)}, a_{t-1}) p(x_{t-1}^{(k)} | o_{t-1}, ..., o_0).    (4.9)

Each weight is a non-negative number representing the belief in the particle, and the weights are normalized so that they sum to one. By approximating the equation through sampling, the particle filter relaxes the constraint of requiring analytical solutions to the motion and sensor models to requiring only a sample-based motion and sensor model mechanism to regenerate the samples of the posterior from the prior using the current data. In prior work it has been shown that the solution in equation 4.7 converges to the true posterior of the Bayes filter (equation 4.1), under certain weak assumptions (Tanner 1996), when the number of particles, n, approaches infinity. However, there is a tradeoff between the number of particles and the execution speed of the


Figure 4.3: The four iterative steps of the particle filter algorithm. First the pdf is updated using the sensor model and then the particles are resampled according to the pdf. The motion model is then applied to diffuse the particles and apply a deterministic drift according to the actions performed by the robot.

algorithm. An increase in the number of particles proportionally increases the time required to update the posterior as well as increasing the accuracy of the filter. As mentioned previously, there are two important terms in equation 4.7: the motion model and the sensor model. The combination of these two terms with an initial prior is sufficient for the full update of the discrete posterior distribution. The full update consists of the four steps shown in figure 4.3. The summation Σ_i p(x_t | x_{t-1}^{(i)}, a_{t-1}) p(x_{t-1}^{(i)} | o_{t-1}, ..., o_0) in equation 4.7 applies the motion model to the prior (i.e. the posterior from the previous time-step), which resamples the particles based on their weights, moves the particles according to the action (odometry) of the vehicle and diffuses them according to the accuracy of the action sensors. Specifically, it describes the probability of particle x_t^{(i)} existing given the action a_{t-1} on each particle x_{t-1}^{(i)} from the previous time-step. In practice this is implemented by a sampling technique that is specific to the problem being solved (§4.2.2; Thrun et al. 2001b). The update step is a direct


application of the sensor model, p(o_t \mid x_t). A particle filter version of the example presented in figure 4.2 is shown in figure 4.4, where the deterministic drift of the motion model is shown in two steps. The motion model first resamples the particles based on their weights from the prior to shift them to areas of high probability and then moves them according to the actions of the vehicle. The particles are diffused to account for the error in the odometry action sensors (note that the particles were shifted further than the car actually moved due to the error in the odometry sensors). The sensor model p(o_t \mid x_t) then re-evaluates the weights on the particles based on the observations to form the posterior.
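The resample, drift, diffuse and reweight cycle described above can be sketched for a hypothetical one-dimensional state; the function, noise levels and test scenario below are illustrative, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, action, sensor_model, motion_noise=0.05):
    """One iteration of the four steps in figure 4.3, for a scalar state."""
    n = len(particles)
    # Resample according to the weights: particles migrate to high-probability regions.
    particles = particles[rng.choice(n, size=n, p=weights)]
    # Motion model: deterministic drift (odometry) plus Gaussian diffusion.
    particles = particles + action + rng.normal(0.0, motion_noise, size=n)
    # Sensor model: reweight each hypothesis by the likelihood of the observation.
    weights = sensor_model(particles)
    weights = weights / weights.sum()   # normalize so the weights sum to one
    return particles, weights

# Usage: the vehicle moves by +1.0 and the sensor peaks at the true position 1.0.
particles = rng.normal(0.0, 0.5, size=500)
weights = np.full(500, 1.0 / 500)
likelihood = lambda x: np.exp(-0.5 * ((x - 1.0) / 0.2) ** 2) + 1e-9
particles, weights = particle_filter_step(particles, weights, 1.0, likelihood)
estimate = float(np.sum(particles * weights))   # posterior mean
```

With enough particles the weighted mean concentrates around the true position, mirroring the convergence discussion above.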

The motion model

The motion model accounts for the vehicle's actions in each cycle and for the errors inherent in the measurement of these actions. It updates the positions of the particles based on the vehicle's action and disperses them according to the error characteristics of the vehicle's action sensors. In the case of the lane tracking system described in chapter 5, the particles representing the state of the vehicle are randomly resampled from the weighted sample set such that the particles are distributed according to the prior distribution. This moves them to areas of higher probability. The particle positions are then updated according to the Ackermann motion model (§5.2) combined with a normal random dispersion to account for measurement errors.
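A sample-based motion model in this style might look as follows. The state layout matches the lane tracker of chapter 5, but the kinematic step is a crude stand-in for the Ackermann model of §5.2, and the names, wheelbase and noise values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def motion_model(particles, weights, speed, steer, dt,
                 wheelbase=2.7, noise=(0.02, 0.01, 0.0)):
    """Resample, drift and diffuse an (n, 3) set of [offset, yaw, road width]
    hypotheses. `noise` holds the per-dimension action-sensor errors."""
    n = len(particles)
    # Resample from the weighted set so the particles follow the prior.
    particles = particles[rng.choice(n, size=n, p=weights)]
    y, yaw, rw = particles.T
    # Deterministic drift: forward motion changes lateral offset and yaw.
    y = y + speed * dt * np.sin(yaw)
    yaw = yaw + speed * dt * np.tan(steer) / wheelbase  # crude kinematic stand-in
    # Diffusion: disperse each dimension by its action-sensor error.
    y = y + rng.normal(0.0, noise[0], n)
    yaw = yaw + rng.normal(0.0, noise[1], n)
    rw = rw + rng.normal(0.0, noise[2], n)
    return np.column_stack([y, yaw, rw])
```

Keeping the road-width noise at zero, as in the default above, freezes that dimension; in practice each dispersion term would be tuned to the corresponding sensor's error characteristics.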

The sensor model

The sensor model completes the particle filter update step by assigning a weight to each particle through the likelihood distribution p(o_t \mid x_t). This enables the particle filter to migrate particles into areas of high probability in the next iteration, which increases the resolution in areas of interest. In the lane tracking system each sensor is a cue that is used to find the lane based on the particular observation that the cue uses. For example, there are cues based on edge maps of the road images as well as road colour probability maps, and each of these cues uses a different sensor model to calculate the probability p(o_t \mid x_t) of observation o_t existing given the state x_t (§4.3.2). The sensor model must take into account the error characteristics of the sensor when calculating the weights that will be used to

Figure 4.4: Distributions from the particle filter for the case given for the Bayes filter (figure 4.2). The posterior from iteration t-1 is modified by the motion model via a deterministic drift of the distribution based on the actions (odometry) of the vehicle (which also re-samples the distribution according to the weights of the prior) and a diffusion of the distribution based on the error characteristics of the action sensors (\sum_i p(x_t \mid x_{t-1}^{(i)}; a_{t-1})\, p(x_{t-1}^{(i)} \mid o_{t-1}; \ldots; o_0)). The observation step modifies the proposal distribution using the observations from the vehicle's sensors (p(o_t \mid x_t)). Particles occluding each other in the top-right figure indicate multiple particles at the same position.

update the posterior. Due to the nature of the observations (image based), these cues follow a different approach to the sensor model than is typically applied to mobile robot localization (Thrun et al. 2001b). In the case described by Thrun et al. (2001b), a laser range finder is modelled using three different elements. Based on the 2D pose of the robot (which is the state


space of the particle filter) and the map of the robot's environment, the sensor is modelled using a combination of a Gaussian, an exponential component and a sensor failure component. The Gaussian is centered at the measurement that should be returned by the sensor based on the map and the hypothesised pose. The exponential component is designed to model occlusions and passing humans, and a high-probability component is added to the tail of the distribution to model sensor failures (figure 4.5). The combination of these three terms provides an efficient analytical mapping between each particle and the probability of the observation given the particle state. In the lane tracking system the input space of the observation is large since it is an image, and the approach described by Thrun et al. would be very inefficient to apply because it would require manipulating the image stream for every single particle. A different approach is taken: the observation image is manipulated to include the error characteristics of the sensor model explicitly so that measurements can be efficiently sampled directly from the image (§5.3.3). In the majority of circumstances the image map containing the observation was blurred with a Gaussian kernel of a standard deviation equal to the expected error. This produces a smooth surface in the sensor model, aiding the convergence of the particle filter.

Figure 4.5: Thrun's laser range finder sensor model.

4.3 The cue processor

Many perceptual tasks require information from several different sources for robustness. Intelligent agents that act in dynamic environments cannot rely on individual cues to maintain their accuracy over time. An intelligent agent is defined as "a system situated within and as a part of an environment, that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future" (Franklin and Graesser 1997). An example is an autonomous robot or vehicle that has to navigate autonomously. Since individual cues are so unreliable, mechanisms for fusing and adapting cues are active areas of research (Triesch and von der Malsburg 2000; Kittler et al. 1998; Nageswara 2001). At the beginning of this chapter a number of design goals for the distillation algorithm were listed. The cue processor was designed to fulfill four of these goals directly (adaptive, efficient, extendible and generic) and two in combination with the particle filter (robust and probabilistic). The cue processor is adaptive in selecting which cues are performing best and provides those cues with more computational resources. This action distills⁶ the set of cues to those that contain more information. The efficient use of computational resources is achieved by the adaptive nature of the cue processor: computational resources are allocated based on the utility⁷ of each cue and the time required to calculate each cue (§4.3.1). This divides computational resources across the cues in a way that maximizes the information extracted. The process is extendible and generic in that cues can be easily inserted into the algorithm without any modification to the program structure, regardless of their internal mechanism; however, each cue must agree on the state space that is being used in the system, have some knowledge of the resources that are available and, depending on the fusion mechanism, be probabilistically independent of the other cues. In combination with the particle filter, robustness and probabilistic performance are achieved through the use of multiple cues and cue fusion.
To summarize, the cue processor subsystem of the distillation algorithm has two main responsibilities: the fusion of multiple cues over time and the management of cues for real-time performance. Real-time performance is achieved through the scheduling of cues such that a minimum frame rate is preserved given the computational resources of the system. The cue processor cycle shown in figure 4.6 is executed each time the particle filter
⁶The method of distilling the set of cues to a set that contains a higher concentration of information is the motivation behind the name the distillation algorithm.
⁷The utility of a cue is a measure of the performance of the cue with respect to the other cues. It is calculated using the Kullback-Leibler divergence (Soto and Khosla 2001).


Figure 4.6: Cue processor cycle.

requires an update of the posterior in each iteration of the distillation algorithm. The processor fuses the weights w_t^{(i)} returned from each cue, which are then normalized to give the sensor model distribution. The cycle consists of five key steps that are used to evaluate the posterior. In the first step the cues are scheduled to run as either foreground or background cues based on their respective utility and execution time. A foreground cue runs at the frame rate of the system while a background cue can run at rates slower than the frame rate. To make the algorithm more efficient, each cue is preprocessed so that any observations that need to be collected or manipulated (such as grayscale images, edge maps and colour probability maps) are calculated once only and not by each cue separately. After preprocessing, each cue (j) calculates its sensor model, p(o_t^{(j)} \mid x_t) (§4.2). These distributions are fused together and normalized in the last step to give the posterior. Foreground cues calculate their sensor models in this frame, while background cues only partially calculate their sensor models and are fused with the foreground cues in the future time-step in which they complete their calculations.

Figure 4.7: Distillation algorithm with a slow processing loop.

4.3.1 Scheduling and performance evaluation

The computational resources of the system are dynamically allocated based on metrics that predict the future performance of each cue. This configuration not only increases the performance of the system, since it dynamically chooses the cues most suited to the current conditions, it also makes the system flexible to future changes in hardware and software. The concept of foreground and background cues was introduced in the previous section. Figure 4.7 shows a slow processing loop running in the background processing the background cues. When results become available from the background cues, their likelihoods are added to a list to be fused at the end of the current cycle in the main cue processing loop. The distillation algorithm traces the path of the particles over time so that the current position of a particle from a previous time-step is known and the sensor model from that previous time-step can be fused with the sensor models of the current time-step.


Performance evaluation

To dynamically allocate computational resources over a set of cues, the distillation algorithm uses a measure of each cue's performance, or utility, together with its execution time to select which cues are to be foreground and background in the next iteration. This scheduling procedure maximizes the amount of information that can be obtained from the complete set of cues in one time frame, t_f, using the computational resources available. Assuming that the fused result from all the available cues is close to the true likelihood p(o_t \mid x_t), a good measure of the information contained in the j-th cue, p(o_t^{(j)} \mid x_t), can be obtained via a comparison with the fused result. The Kullback-Leibler distance (Kullback and Leibler 1951), or relative entropy, is an information-theoretic measure of how close two probability distributions are to one another. The Kullback-Leibler distance is given by

\kappa_t\left(p(o_t^{(j)} \mid x_t), p(o_t \mid x_t)\right) = \sum_i p(o_t \mid x_t^{(i)}) \log \frac{p(o_t \mid x_t^{(i)})}{p(o_t^{(j)} \mid x_t^{(i)})} \qquad (4.10)

The Kullback-Leibler distance has been used previously by Soto and Khosla (2001) to evaluate the performance of cues and has proved a successful measure of a cue's performance relative to the rest of the cues in the set. The utility of the j-th cue at time t is defined as

u_t^{(j)} = \frac{1}{\kappa_t\left(p(o_t^{(j)} \mid x_t), p(o_t \mid x_t)\right) + \epsilon} \qquad (4.11)

where \epsilon is a positive offset to ensure the denominator is never zero. The Kullback-Leibler distance produces a higher value the greater the distance between two distributions. The utility is a measure of the performance of a cue with respect to the fused likelihood and should produce a lower value for a larger distance between a cue's likelihood and the fused likelihood; hence, the inverse of the Kullback-Leibler distance is used.

Scheduling using the utility and execution time

The scheduler performs an exhaustive search through the complete space of possible cue combinations C and cue rates R for the configuration that has the maximum total utility. The cue rate is the number of frames allocated to a cue


to process the particles. So if the frame rate is 15 Hz and a cue has a rate of 2, then it will execute at a frequency of 7.5 Hz. This attempts to maximize future information extraction based on previous cue performance. The total utility of a cue combination is then the sum of the utilities from each cue, discounted exponentially for each frame that it is late:

u_t(c) = \sum_{j \in c} u_t^{(j)} \lambda^{r_j - 1} \qquad (4.12)

where \lambda is the discount factor and r_j is the rate of the j-th cue. Note that each cue in c is a tuple that includes both the cue and its rate r. Initially, the system was tested with rates of 1, 2, 4, and 8. The utility is discounted on the premise that the amount of information provided by the cue reduces exponentially for each frame that it is late (i.e. the result from a cue that runs over eight frames is worth less than the result of a cue that runs over two frames). An analogy is the way returns on investments are calculated: money received now is worth more than it is in the future. A value of 0.9 was typically used for \lambda; this discounts the utility of each cue by 10% for each frame that it is late (compounded). This value represents the value placed on receiving current information. If the system is highly dynamic, a lower value of \lambda should be used. In the case of the lane tracker operating at 15 Hz, a discount factor of 0.9 discounts information highly (by 80%) if it is received after 1 second, an approximate reaction time for a typical driver. The cue configuration c_i for iteration i is selected from all cues that are currently not processing in the background (equation 4.13) and must satisfy the time constraint in equation 4.14:

c_i = \operatorname*{argmax}_{c \in C} \left(u_t(c)\right) \qquad (4.13)

t(c_i) < t_{frame} \qquad (4.14)

where t(c_i) is the time required by this cue combination and t_{frame} is the inverse of the frame rate, i.e. the period of one frame. The period t(c_i) is divided into four sections, as shown in figure 4.8, and defined by

t(c) = t_F(c|_{r=1}) + t_B(c|_{r>1}) + t_{B0} + t_I; \qquad (4.15)

t_F(c|_{r=1}) < 0.5\, t_{frame}; \qquad (4.16)

t_B(c|_{r>1}) + t_{B0} < 0.2\, t_{frame}; \qquad (4.17)

t_I < 0.3\, t_{frame}. \qquad (4.18)

Figure 4.8: Time allocation during one iteration.

The parameter c|_{r=1} is the set of foreground cues and c|_{r>1} is the set of background cues. The period t_F(c|_{r=1}) is the time required to process the foreground cues and is a combination of the shared preprocessing time, t_o (§4.3.2), and the fixed particle processing time, t_f, that computes the likelihood distribution p(o_t \mid x_t):

t_F(c) = \sum_{i \in O^*(c)} t_o(i) + \sum_{j \in c} t_f(j) \qquad (4.19)

where O^*(c) is the set of unique observations from the cues c, t_o(i) is the preprocessing time for observation i and t_f(j) is the fixed processing time for cue j. The shared preprocessing time is the time needed to calculate all the required observations once only (§4.3.2). The fixed particle processing time is the time required to evaluate the sensor model of each cue. Similarly, t_B(c|_{r>1}) is the time required to process the background cues and consists of the preprocessing time and the fixed particle processing time; however, the background cues do not use shared preprocessing since the exact iteration in which a cue will start processing is unknown at the time of scheduling. Hence, it is undetermined which observations will already be processed in each frame:

t_B(c) = \sum_{i \in O(c)} t_o(i) + \sum_{j \in c} t_f(j) \qquad (4.20)

where O(c) is the complete list of observations from the cues c (possibly containing repeated elements). This is preferable to the shared preprocessing time as it is the worst-case scenario and allows enough time for the calculation of all background cues even if they all have to preprocess their observations in isolation. The period t_{B0} is the processing time required for the background cues that are already running from a previous iteration, while t_I is a small fraction of time allocated to the internal computation of the particle filter algorithm.


The cues are scheduled in two steps to reduce the computational complexity involved in the exhaustive search over all the possible combinations of cue rates. First the foreground cues are scheduled. The scheduler starts by generating all possible cue combinations c|_{r=1}. All combinations that have an execution time greater than the time allocated for foreground cues are removed from the list (equation 4.14). The utility of each remaining cue combination is calculated, and the combination with the highest utility is selected as the foreground cue set (equation 4.13) and assigned a cue rate of 1. The remaining cues are then scheduled as background cues:

c|_{r>1} = C - c|_{r=1}. \qquad (4.21)

All combinations of rates r_j are generated for the remaining cues, and combinations that don't satisfy the time constraint in equation 4.14 are removed from the list. The utility for each combination is calculated according to equation 4.12. The configuration of rates and background cues c|_{r>1} with the highest utility is then selected for the background cues, and these cues are put on a list of cues waiting to be processed by the slow cue processor loop shown in figure 4.7. An example of the operation of the scheduler and the migration of cues over the foreground and background lists is shown in figure 4.9.
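The two-step search can be sketched as follows. The per-cue (utility, execution time) tuples, the amortized background cost model, and the way the 0.5/0.2 frame-budget split of equations 4.16 and 4.17 is applied are illustrative assumptions, not the thesis code.

```python
from itertools import combinations, product

def schedule(cues, t_frame, rates=(2, 4, 8), lam=0.9):
    """Two-step scheduler sketch. `cues` maps name -> (utility, exec_time).

    Step 1 exhaustively searches foreground combinations (rate 1) fitting in
    half a frame; step 2 assigns the remaining cues the background rates that
    maximize the discounted utility u * lam**(r - 1) of equation 4.12 within
    the background budget, amortizing each cue's cost over its r frames."""
    names = list(cues)
    best_fg, best_u = (), 0.0
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            t = sum(cues[c][1] for c in combo)
            u = sum(cues[c][0] for c in combo)
            if t < 0.5 * t_frame and u > best_u:
                best_fg, best_u = combo, u
    rest = [c for c in names if c not in best_fg]
    best_bg, best_bu = {}, -1.0
    for rs in product(rates, repeat=len(rest)):
        t = sum(cues[c][1] / r for c, r in zip(rest, rs))
        if t >= 0.2 * t_frame:
            continue                      # combination misses the time budget
        u = sum(cues[c][0] * lam ** (r - 1) for c, r in zip(rest, rs))
        if u > best_bu:
            best_bg, best_bu = dict(zip(rest, rs)), u
    return best_fg, best_bg
```

At a 15 Hz frame rate, a fast high-utility cue is kept in the foreground while slower cues are pushed to the largest rates that still fit the budget, exactly the migration behaviour figure 4.9 illustrates.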

4.3.2 Preprocessing

Preprocessing the raw data from the sensors (i.e. the images) is necessary to transform the data into the form that is used for evaluating the sensor model of each cue. Distillation uses a preprocessing algorithm that ensures that the observations o_t required by the cues are calculated once only, regardless of the number of cues that use the observation.



Figure 4.9: An example of the scheduling of cues and their migration over one iteration of the distillation algorithm. Cues C1, C2, and C9 are scheduled to run as foreground cues during cycle t, while cues C6 and C8 are scheduled as background cues. Cues C3, C4, C5 and C7 continue to process in the background over this iteration. During the cycle, cue C3 finishes and is put back into the available cues list.

Figure 4.10 shows an example of the type of data that is shared among cues in the lane tracking system and how the preprocessed observation is only calculated once. Both the road colour cue and the non-road colour cue require the road colour probability map, while both the road edge cue and the lane marker cue use the grayscale image (§5.4). In each case only the cue that requests the

observation first will create it. A central storage unit (CSU) is used to manage all observation objects required by the cues. It maintains exclusive access to the different observation objects as well as maintaining statistical information that includes observation creation time and which cues access the objects. During the processing stage, each cue checks whether the observations it requires are in the CSU. If they are not, then the cue creates them and places them in the CSU. This process of calculating shared resources is used by the scheduling algorithm to distribute computational resources efficiently across the set of cues.

Figure 4.10: An example of the shared observation preprocessing for four cues of the lane tracker. Both the road colour cue and the non-road colour cue share the road colour probability map, while the road edge cue and the lane marker cue share the grayscale image. In each case, only the first cue that requests the observation creates it.

4.3.3 Probability evaluation

The evaluation of the likelihood for the j-th cue is a direct application of the sensor model p(o_t^{(j)} \mid x_t) on the particles passed to it from the particle filter. The details of these models for the lane tracker are left until chapter 5 since they are application specific.


Whereas the sensor data (i.e. images) is only preprocessed once for each iteration, regardless of the number of cues that require it, the evaluation of the likelihood for each cue requires one calculation for every particle in the system (i.e. p(o_t^{(j)} \mid x_t^{(i)}) is calculated for every i-th particle in the j-th cue). The execution time for calculating the likelihood is directly proportional to the number of particles in the system. Hence it is desirable to design a system where the majority of complex calculations are in the preprocessing component of the cycle and the likelihood evaluation is as efficient as possible.

4.3.4 Fusion

The fusion of the likelihood distributions calculated by the cues is the final step in updating the posterior of the search space. With the assumption of probabilistic independence, the likelihood distributions of the cues can be fused using the product rule (Kittler et al. 1998; Nageswara 2001):

p(o_t \mid x_t) = p(o_t^{(1)} \ldots o_t^{(m)} \mid x_t) = \prod_{j=1}^{m} p(o_t^{(j)} \mid x_t). \qquad (4.22)

Here, the probability of state x_t given the measurements from the different sensors, or observations o_t^{(1)} \ldots o_t^{(m)}, returned by each cue is the product of the likelihood distributions. An important consideration when fusing the probability distributions using the product rule is that if a cue returns 0 for any of the particles then the final probability of that particle will be 0. This can be dealt with in two ways. Equation 4.22 can be modified to use an offset \epsilon \in (0, 1) so that each element of the product cannot be 0:

p(o_t^{(1)} \ldots o_t^{(m)} \mid x_t) = \prod_{j=1}^{m} \left(p(o_t^{(j)} \mid x_t)(1 - \epsilon) + \epsilon\right). \qquad (4.23)

An alternative method of handling this is to disallow zero probabilities in the evaluation of the cues' likelihoods. Since each cue has more knowledge of the shape of its distributions, this strategy is used in the lane tracking system (chapter 5).
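Product-rule fusion with the offset of equation 4.23 is a one-liner over an array of cue likelihoods; the ε value and the example numbers below are illustrative.

```python
import numpy as np

def fuse(cue_likelihoods, eps=0.01):
    """Product-rule fusion with the offset of equation 4.23.

    `cue_likelihoods` is an (m, n) array: m cues by n particles. The offset
    stops a single zero-probability cue from vetoing a particle outright."""
    lik = np.asarray(cue_likelihoods, dtype=float)
    fused = np.prod(lik * (1.0 - eps) + eps, axis=0)
    return fused / fused.sum()   # normalize into the posterior weights

cue_a = [0.6, 0.4, 0.0]   # this cue vetoes the third particle...
cue_b = [0.1, 0.3, 0.6]   # ...but this one still believes in it
w = fuse([cue_a, cue_b])
```

With the offset in place the vetoed particle keeps a small but non-zero weight, so a single failing cue cannot erase a hypothesis that the other cues support.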

Fusion of slow cues

The distillation algorithm maintains a record of where the particles move over time so that background cues that were initiated at a previous time-step can


be fused with the particles of the current time-step. Since the algorithm knows where particles have migrated, it is possible to fuse the likelihood distributions from background and foreground cues using equation 4.23. However, if a sampling strategy is used that also distributes a fraction of the particles randomly over the state space⁸, not all the particles will have logical migration paths. In this case, if a particle from a previous time-step does not have a direct descendant, it is assigned a probability of 0.5, indicating that the cue neither believes nor disbelieves the hypothesis of that particle:

p(o_t^{(j)} \mid x_t^{(i)}) = \begin{cases} p(o_{t_{old}}^{(j)} \mid x_{t_{old}}^{(i)}) & \text{if direct descendant;} \\ 0.5 & \text{if not a direct descendant.} \end{cases} \qquad (4.24)

The distribution of equation 4.24 is then used for the fusion of a background cue that was started at time t_{old} with the foreground cues at time t.
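Equation 4.24 reduces to a lookup through the recorded migration paths; a minimal sketch with assumed data structures:

```python
def slow_cue_likelihoods(old_lik, descendant_of, n_particles):
    """Map a background cue's old likelihoods onto current particles (eq. 4.24).

    `descendant_of[i]` gives the index of particle i's ancestor at the time
    the background cue started, or None for randomly injected particles."""
    out = []
    for i in range(n_particles):
        a = descendant_of[i]
        # Particles without a migration path neither believe nor disbelieve.
        out.append(old_lik[a] if a is not None else 0.5)
    return out

# Particle 2 was injected randomly, so it gets the neutral value 0.5.
lik = slow_cue_likelihoods([0.9, 0.1], [0, 1, None], 3)
```

The resulting vector can then be fused with the foreground likelihoods exactly as in equation 4.23.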

4.3.5 Cue design considerations

There are several design considerations when creating cues for use in the distillation algorithm:

Researchers have found that the particle filter performs poorly when the sensor, or in this case the cue, is without noise (Doucet 1998; Liu and Chen 1998; Thrun et al. 2001b). This apparent contradiction makes perfect sense: if the likelihood from the cue has high kurtosis, fewer particles will be in regions of high probability and hence particles will be propagated to those regions more slowly. Thrun et al. (2001b) found that the introduction of an artificial noise level of 20% worked reliably, although the method was not mathematically rigorous.

Cues must be close to probabilistically independent to validate the use of the product rule for fusion.

Whereas the sensor data (i.e. images) is only preprocessed once for each iteration regardless of the number of cues that use it, the evaluation of the sensor model of each cue requires one calculation for every particle generated by the particle filter (i.e. p(o_t^{(j)} \mid x_t^{(i)}) is calculated for every i-th particle in the j-th cue). The execution time for calculating the sensor model is directly proportional to the number of particles in the system; hence, resources should be allocated to preprocessing information-rich observations o_t to ensure that the sensor model evaluation is as efficient as possible.

⁸The random distribution of a fraction of the particles over the state space is a common strategy for helping the particle filter recover from failures in which the target is lost. It allows particles to exist in regions of the state space which would not normally be covered due to the convergence of particles in regions of high probability.

In the lane tracking system described in the next chapter, all of these points were considered during the design of the system. The noise characteristics of the sensors are incorporated efficiently into the cues by blurring the image maps used to evaluate the likelihoods. This smooths the likelihood, which in turn aids the convergence of the algorithm. By allocating resources to build information-rich observations in the form of image maps and incorporating the sensor error into these directly, the cues efficiently calculate their associated sensor models directly from the image map. Hence, the sensor model evaluation is the least expensive part of the process and a real-time frame rate of 15 Hz is achievable. Finally, cues were designed to be as close to probabilistically independent as possible to validate the use of the product rule for cue fusion.
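Incorporating the sensor error by blurring an observation map can be sketched with a separable Gaussian convolution; this is a stand-in for the image maps of chapter 5, and the truncation of the kernel at three standard deviations is an assumption.

```python
import numpy as np

def encode_sensor_error(obs_map, sigma):
    """Blur an observation map with a Gaussian whose standard deviation equals
    the expected sensor error, so a cue can read its likelihood directly from
    the map at each particle's projected position (separable 1D convolutions)."""
    radius = int(3 * sigma)   # truncate the kernel at three standard deviations
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (xs / sigma) ** 2)
    kernel = kernel / kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, obs_map)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)
```

Evaluating a sensor model then costs one map lookup per particle, which is why the expensive work can be pushed into this shared preprocessing step.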

4.4 Summary

The application of the distillation algorithm to lane tracking is discussed in detail in the next chapter; however, it is important to outline why the distillation algorithm is a suitable basis for lane tracking. Robustness is an essential characteristic for obvious reasons, but is also desirable with respect to discontinuous changes in road structure that can often occur on moderately structured roads and at road junctions. The particle filter has demonstrated an ability to adapt to discontinuous changes in road width, while probabilistic measures of the filter's performance could be used to determine when tracking has been lost. The adaptive nature of the distillation algorithm enables the lane tracker to adjust to changes in road types and characteristics seamlessly while still operating at a frame rate and accuracy suitable for the task being performed. A real-time system requires the efficient use of computational resources, and this is handled at a fundamental level in the distillation algorithm, such that the best statistical result is obtained given the computational resources available. The modularity of the cue processor allows the lane tracker to be extendible while being generic enough to be applied to a variety of tracking problems. When new


hardware becomes available or new processing techniques for lane detection are developed, they can be easily incorporated into the current system by the addition of a cue that encompasses the new technique. A scalable architecture is useful in lane tracking as it allows the tracker to adjust itself to the characteristics of the task being solved (i.e. if the weather conditions are bad with the possibility of poor visibility, the number of particles used by the system can be increased to counteract sensor noise, with scalable computation time). The novelty of distillation lies in the integration of particle filtering, cue fusion and cue adaption technologies and the associated performance characteristics of the algorithm listed above. The framework seamlessly adapts cues at a fundamental level to handle changes in the environment that it is perceiving. This characteristic, combined with a strong mathematical basis for adaption, is crucial to its success. Many previous attempts at using multiple cues blindly use all cues regardless of their performance, or adopt ad-hoc methods for selecting and changing cues. Previously, little effort has been focused on running cues at different frame rates depending on their utility and speed. Most often, cues that are not performing well are stopped and integrated back into the algorithm in a seemingly ad-hoc manner. Distillation allows cues to run continuously and at different frame rates depending on their overall contribution to the fused result, with the aim of extracting the most information given the computational resources available. Given a set of cues suitable to a variety of roads, the distillation algorithm provides the lane tracking system with a search basis that has the ability to adapt itself automatically to the type of road that it is on and manage the computational resources at its disposal so that it will produce the best statistical result possible according to the specified criteria.


Chapter 5

Lane localization and tracking with distillation

Chapter 1 identified the two tasks of a lane tracker: localization and tracking of the vehicle with respect to the road, and modelling of the lane structure. This is an interesting problem because of its ill-posed nature. The variability of roads, changing and difficult lighting conditions, and occluding obstacles all make this a challenging task requiring a robust method for localization and tracking. A novel 15 Hz lane tracking system built around the distillation algorithm is presented in this chapter. Figure 5.1 shows the tracking problem solved here. The lateral offset, y_{sr}, and the yaw of the vehicle, \phi_{vs}, from the skeletal line of the lane complete the locale of the lane. Estimating the road width, rw, as part of the tracking process completes an initial model of the lane structure and is the first step to determining a full curvature-based model. This state vector can be written as

x = \begin{bmatrix} y_{sr} \\ \phi_{vs} \\ rw \end{bmatrix}. \qquad (5.1)

Six cues using colour, edge, and lane marker features as well as heuristic guides are fused in the distillation algorithm to validate state hypotheses at a frame-rate of 15 Hz (figure 5.2 and extension 11 in appendix D). This chapter is presented in five parts. First, an overview of coordinate geometry and the reference frames used in TREV is provided (§5.1). Second, the Ackermann steering model used to model the motion of the vehicle is reviewed (§5.2). Third,



Figure 5.1: The search space of the lane tracker: the lateral offset, y_sr, and yaw, ψ_vs, of the vehicle with respect to the road skeletal line, and the road width, rw. Note that this figure is exaggerated for clarity.

(a) Road. (b) Colour. (c) Edges. (d) Lane markers.

Figure 5.2: Visual lane tracking cues. Four visual lane tracking cues using colour, edges and lane marker features are used to track the road.

image acquisition and the calibration of sensors on TREV are discussed (§5.3), and then the cues that form the sensor models are developed (§5.4). To conclude, experimental results of the lane tracker, designed to demonstrate the ability of the distillation algorithm as a lane tracking solution, are presented, focusing on the particle filtering, cue fusion and scheduling algorithms used (§5.5).

5.1 Coordinate geometry and reference frames

To evaluate the cues' likelihoods¹ from image observations, a mechanism for transforming the hypothesized road model into image space is required. For this purpose, six coordinate systems, or reference frames, are used to model TREV with homogeneous coordinates.
¹The term likelihood will be used interchangeably with the term sensor model to reference the mathematical entity p(o|x) in this chapter.


5.1.1 Homogeneous coordinate transformations

Homogeneous coordinates allow a consistent mathematical framework to be used for translational and rotational transformations between coordinate systems. Traditionally, a point in 3D space would be represented by its inhomogeneous form p = (x, y, z)^T. However, when we wish to apply multiple rotation and translation transformations to a point, it is simpler to represent this point in its scale-invariant homogeneous form P = (x, y, z, 1)^T. A transformation involving a translation T followed by a rotation R can then be applied using P' = RTP. The transformation matrices T and R use homogeneous coordinates and are based on the standard homogeneous Euler transformations.

Euler translations

Let us define P_A = (x, y, z, 1)^T to be a homogeneous point in coordinate system A, P_A ∈ R^4, as shown in figure 5.3. A transformation of point P_A between translated coordinate systems A and B is then

    P_B = T(x_BA, y_BA, z_BA) P_A    (5.2)

where T is the Euler homogeneous translation matrix

    T(t_x, t_y, t_z) = [ 1  0  0  t_x ]
                       [ 0  1  0  t_y ]
                       [ 0  0  1  t_z ]
                       [ 0  0  0  1   ].    (5.3)

The translations t_x, t_y and t_z are along the X, Y and Z directions respectively, and x_BA, y_BA and z_BA describe the location of the origin of coordinate system A in coordinate system B.

Euler rotations

A common method of transforming a point from one coordinate system to another that has a different orientation is to use the z-y-x Euler angles and apply the rotation transformations in the order given by

    P_B = R_z(ψ_BA) R_y(θ_BA) R_x(φ_BA) P_A    (5.4)


Figure 5.3: Homogeneous coordinate transformations. (a) Point P_A in coordinate system A. (b) The transformation of a point P between two translated coordinate systems is described by equation 5.2.

where the Euler homogeneous rotation matrices are defined as

    R_x(φ) = [ 1  0       0       0 ]
             [ 0  cos(φ) -sin(φ)  0 ]
             [ 0  sin(φ)  cos(φ)  0 ]
             [ 0  0       0       1 ];    (5.5)

    R_y(θ) = [ cos(θ)  0  sin(θ)  0 ]
             [ 0       1  0       0 ]
             [-sin(θ)  0  cos(θ)  0 ]
             [ 0       0  0       1 ];    (5.6)

    R_z(ψ) = [ cos(ψ) -sin(ψ)  0  0 ]
             [ sin(ψ)  cos(ψ)  0  0 ]
             [ 0       0       1  0 ]
             [ 0       0       0  1 ].    (5.7)

The set of Euler angles {ψ, θ, φ} is formed by rotating coordinate system A around the z, y and x axes respectively to form coordinate system B.

Combining rotations and translations

The full transformation between a set of coordinate systems that have different orientations and centres is a combination of equations 5.2 and 5.4, applying the translation first

    P_B = R_z(ψ_BA) R_y(θ_BA) R_x(φ_BA) T(x_BA, y_BA, z_BA) P_A.    (5.8)

Figure 5.4: Roll φ, pitch θ, and yaw ψ rotation directions about the X, Y and Z axes. The axis orientations shown here are the standard definition for the automotive industry, where the X axis is in the direction of travel, the Z axis is in the opposite direction to gravity and the Y axis obeys the right hand rule and goes into the page.
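As an illustration of equations 5.2-5.8, the homogeneous transformations can be sketched in a few lines of Python (a minimal sketch for clarity, not code from this thesis; all function names are illustrative):

```python
import math

def trans(tx, ty, tz):
    """Euler homogeneous translation matrix T (equation 5.3)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rot_x(phi):
    """Rotation about X (roll), equation 5.5."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_y(theta):
    """Rotation about Y (pitch), equation 5.6."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_z(psi):
    """Rotation about Z (yaw), equation 5.7."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """4x4 homogeneous matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(psi, theta, phi, tx, ty, tz, point):
    """Equation 5.8: P_B = R_z(psi) R_y(theta) R_x(phi) T(tx, ty, tz) P_A."""
    m = matmul(rot_z(psi), matmul(rot_y(theta), matmul(rot_x(phi), trans(tx, ty, tz))))
    return [sum(m[i][k] * point[k] for k in range(4)) for i in range(4)]
```

As in equation 5.8, `transform` applies the translation first and then the z-y-x Euler rotations.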

5.1.2 Coordinate systems for modelling TREV

Following previous authors, TREV is modelled using six coordinate systems that transform a model of the lane into the image plane (figure 5.5; Dickmanns 1999). The road coordinate system (RCS) is centred on the skeletal line of the lane at the same longitudinal position as the centre of the rear axle of the vehicle (figure 5.5(a)). The skeletal line of the lane is located at the centre of, and runs parallel with, the lane. The road model is defined in this coordinate system and is transformed into the image plane using the five following transformations and coordinate systems. The origin of the surface coordinate system (SCS) is in the centre of the rear axle of the vehicle and is oriented in the same direction as the RCS (figure 5.5(b)). The transformation of a point in the RCS to the SCS, M_sr, consists of a translation in the Z direction, z_sr, that accounts for the height of the vehicle and a translation in the Y direction, y_sr, that accounts for the lateral offset of the vehicle from the skeletal line of the lane

    P_s = M_sr P_r = T(0, y_sr, z_sr) P_r.    (5.9)


(a) The road coordinate system. (b) The surface coordinate system. (c) The vehicle coordinate system. (d) The base coordinate system.

Figure 5.5: Coordinate systems used to model TREV. The road, surface, vehicle and base coordinate systems define the transformation into camera space. The parameters in red govern the transformations. The camera and image space coordinate systems are not shown since they are camera centric.

The parameter P_r is a 3D homogeneous point (P = (x, y, z, 1)^T) in the RCS and P_s is point P in the SCS. Note that y_sr is one of the three unknowns in the state space of the distillation algorithm. The vehicle coordinate system (VCS) shares the same origin as the SCS but is oriented so that the X axis is parallel with the vehicle's motion (figure 5.5(c)). The axes of the coordinate system follow the standard orientation in the automotive industry, with X being the direction of travel, Z being in the opposite direction to gravity and Y following the right hand rule. The transformation between the SCS and the VCS, M_vs, accounts for the yaw of the vehicle, ψ_vs, and the roll of the vehicle², φ_vs

    P_v = M_vs P_s = R_z(ψ_vs) R_x(φ_vs) P_s.    (5.10)

Note that ψ_vs is the second of the three unknowns in the state space of the distillation algorithm.

The base coordinate system (BCS) is centred at the base of the CeDAR camera head and is oriented in the same direction as the vehicle but is tilted θ_bv relative to the pitch of the vehicle (figure 5.5(d)). The transformation between the VCS and the BCS, M_bv, accounts for this tilt and for the translations, x_bv and z_bv, of the camera head relative to the centre of the rear axle in the X and Z directions respectively

    P_b = M_bv P_v = R_y(θ_bv) T(x_bv, 0, z_bv) P_v.    (5.11)

The camera coordinate system (CCS) is offset from the camera base in the Y and Z directions by y_{c_i b} and z_{c_i b} for each camera i, with a yaw and pitch of ψ_{c_i b} and θ_{c_i b} respectively

    P_c = M_cb P_b = R_z(ψ_{c_i b}) R_y(θ_{c_i b}) T(x_{c_i b}, y_{c_i b}, z_{c_i b}) P_b    (5.12)

where M_cb is the transformation between the CCS and the BCS. This incorporates the offset from the camera centre to the centre of CeDAR. Finally, the transformation into the image plane coordinate system (IPCS), M_ic, is derived from the pinhole camera model (Hartley and Zisserman 2000) shown in figure 5.6 and the internal specifications of the camera

    p_i = M_ic P_c = (1/f) [ f k_x  0      x_0  0 ]
                           [ 0      f k_y  y_0  0 ] P_c.    (5.13)
                           [ 0      0      1    0 ]

The matrix M_ic is the perspective transformation matrix (PTM) of the camera, which has been modified to incorporate the change of axes from the CCS to the IPCS. This is included so that the X and Y axes represent the standard horizontal and vertical axes in the image respectively, with the centre located at the top left corner of the image plane. The parameters f, k_x, k_y, x_0 and y_0 are the focal length, scaling factors in the X and Y directions and the principal point of the camera respectively. The full transformation from the RCS to the IPCS is then described by

    p_i = M_ic M_cb M_bv M_vs M_sr P_r = M_ic M_cr P_r.    (5.14)

²For the experiments in this dissertation, the roll of the vehicle is assumed to be negligible, but it was included in the formulation to allow for more dynamic scenarios in future work.


Figure 5.6: Pinhole camera model. The focal length f and principal point (x_0, y_0) govern the projection of a 3D point X onto the image plane at x.

The matrix M_cr is the transformation from the RCS to the CCS. The internal perspective mapping of the point from the CCS into the image plane is defined by M_ic. All the parameters apart from the search parameters y_sr and ψ_vs are known from the calibration of the vehicle, CeDAR and the intrinsic camera parameters (appendix B). Therefore the search space completes the transformations from the RCS to the image plane, so that the road model can be transformed into the image plane and validated by the distillation algorithm.
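After dividing through by the homogeneous scale, the internal projection of equation 5.13 reduces to two scalar equations. A minimal sketch (the function name and axis conventions are assumptions for illustration, not taken from the thesis):

```python
def project(p_c, f, kx, ky, x0, y0):
    """Pinhole projection of a camera-frame point onto the image plane
    (equation 5.13). p_c = (X, Y, Z) with Z along the optical axis."""
    x, y, z = p_c
    u = f * kx * x / z + x0  # horizontal pixel coordinate
    v = f * ky * y / z + y0  # vertical pixel coordinate
    return u, v
```

A point on the optical axis projects to the principal point (x_0, y_0), which is a quick sanity check for any calibration.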

5.2 Ackermann steering motion model

The purpose of the motion model in the distillation algorithm is threefold. First, it must resample the particles according to the posterior of the last iteration. Second, it must move the particles corresponding to any actions performed by the vehicle. Finally, it disperses the particles according to the inaccuracies of the action measurements and motion model. The steering system on TREV, like most commercial vehicles, is based on the Ackermann steering model (figure 5.7; Dixon 1991). This section presents a motion model that is founded on the Ackermann steering system, which accurately

describes the path travelled by the vehicle relative to a low curvature road. This motion model implements the

    Σ_i p(x_t | x_{t-1}^{(i)}; a_{t-1}) p(x_{t-1}^{(i)} | o_{t-1}; . . . ; o_0)

term of the particle filter update equation.

Figure 5.7: Ackermann steering model used for the lane tracking motion model. The lines extending perpendicular from all wheels must intersect at the instantaneous centre of curvature (ICC).

The Ackermann steering system shown in figure 5.7 is designed so that all four wheels rotate about the same point without slipping. This point is called the instantaneous centre of curvature (ICC). The lines extending from the axes of the front steering wheels, (a and b), intersect the same point on the line extending from the axis of the rear wheels, (c). For this to occur, the two front wheels must be at different angles relative to each other, and the difference between them is called the Ackermann angle

    Δ(δ) = δ_2 - δ_1    (5.15)

where the angles of the left and right wheels are δ_1 and δ_2 respectively. The simple trigonometry of figure 5.7 produces two equations for calculating R

    R + d/2 = L tan(π/2 + δ_1);    (5.16)
    R - d/2 = L tan(π/2 + δ_2);    (5.17)

where the distance L is the wheelbase of the vehicle, d is the distance between the centres of the left and right wheels, and R is the distance between the centre of the rear axle and the ICC.


Given an initial lateral offset, y_i, and yaw, ψ_i, of the vehicle at iteration i, the Ackermann update equations for the state of the lane tracker with added Brownian motion are

    ψ_{i+1} = ψ_i + Δt v / R + randn(0, σ_ψ);    (5.18)

    y_{i+1} = y_i + R ( cos(ψ_i + π/2) - cos(ψ_{i+1} + π/2) ) + randn(0, σ_y);    (5.19)

    rw_{i+1} = rw_i + randn(0, σ_rw).    (5.20)

Here Δt is the period between iterations i and i+1, and v is the velocity of the vehicle. The function randn(0, σ) diffuses the particles using a normally distributed random number generator with mean 0 and standard deviation σ. This Brownian motion is added to account for errors in the action measurements. The standard deviations (σ_y, σ_ψ, σ_rw) were set empirically, with typical values of (0.02, 0.008, 0.01). Therefore the summation term in the full update equation of the particle filter is implemented in three steps. First, resampling with replacement is used to resample the distribution according to the posterior from the previous iteration. Second, the deterministic drift is applied through the Ackermann motion model. Finally, Brownian motion is added to diffuse the particles. Note that this motion model is only valid for roads with low curvature. With high curvature roads, the curvature parameters have to be included in the motion model to find the vehicle's new state with respect to the road. It was found during experimentation that using this motion model on roads with high degrees of curvature resulted in sporadic tracking. This was caused by particles being moved to regions of low probability between frames because of the inaccuracy of the motion model on high curvature roads.
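The drift-and-diffuse update of equations 5.18-5.20 can be sketched for a single particle as follows (an illustration only; `ackermann_step` and its defaults are hypothetical, using the typical deviations quoted in the text):

```python
import math
import random

def ackermann_step(y, psi, rw, R, v, dt, sig=(0.02, 0.008, 0.01), rng=random):
    """One Ackermann motion-model update (equations 5.18-5.20) for a particle.
    y: lateral offset (m), psi: yaw (rad), rw: road width (m),
    R: ICC radius (m), v: vehicle velocity (m/s), dt: iteration period (s),
    sig = (sigma_y, sigma_psi, sigma_rw): Brownian-noise standard deviations."""
    sy, spsi, srw = sig
    # Deterministic drift plus diffusion on the yaw (equation 5.18).
    psi_new = psi + dt * v / R + rng.gauss(0.0, spsi)
    # Lateral offset follows the arc about the ICC (equation 5.19).
    y_new = (y + R * (math.cos(psi + math.pi / 2) - math.cos(psi_new + math.pi / 2))
             + rng.gauss(0.0, sy))
    # Road width performs a pure random walk (equation 5.20).
    rw_new = rw + rng.gauss(0.0, srw)
    return y_new, psi_new, rw_new
```

Resampling with replacement would be applied to the particle set before calling this step, and the same function is applied independently to every particle.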

5.3 Sensor calibration

Before experimentation with the lane tracker could begin, three sensors on TREV required characterization. The intrinsic parameters of the cameras on CeDAR were calibrated using a standard matlab toolbox, while the steering sensor and tailshaft encoder used for odometry were characterized empirically.


5.3.1 Image acquisition

Each Sony CCB EX37 colour camera on CeDAR digitises a 60 Hz interlaced 640x480 image stream via an Imagenation PXC200L framegrabber. Alternating rows of the image are updated in each cycle. This gives a full 60 Hz 320x240 image stream or a 30 Hz 640x480 image stream. For this project, the 320x240 image streams were used, allowing frame-rates of up to 60 Hz. The CeDAR vision system combines a wide field of view (FOV) of 46.40° (L0.05 = 21.9 m) for maximum scene coverage in the near-field, and a narrow FOV of 17° (L0.05 = 53.3 m) in the far-field for road curvature estimation (figure 5.8). The parameter L0.05 is the distance at which one pixel represents 5 cm in the real world and is a measure of the maximum distance at which lane markings can be reliably extracted (Gregor et al. 2000). At L0.05, lane markers are typically around three pixels wide. At a speed of 70 kph, a lookahead distance of 53.3 m provides a reaction time of around 3 seconds. This far-field lookahead distance is suitable for curvature estimation on the test roads around the ANU, where the speed limit is 70 kph.

5.3.2 Camera calibration

In the previous section it was shown that the transformation between the RCS and the IPCS is completed with two of the three state parameters of the particle filter, y_sr and ψ_vs. The rest of the parameters of the transformation are obtained through the design of the vehicle and the calibration of the cameras. The vehicle design parameters are straightforward to obtain through measurement or from design drawings (appendix B). The cameras are calibrated using the Camera Calibration Toolbox for matlab v3.0 10-17-00 (Heikkilä 2000; Bouguet 2002). The Camera Calibration Toolbox uses a linear technique to initialize the internal camera parameters, which are then refined by a non-linear optimization. The toolbox estimates ten intrinsic camera parameters: effective focal lengths, principal point coordinates, skew factor, two radial and three tangential distortion coefficients. Twenty images of a calibration target at different orientations were used to estimate the intrinsic parameters (figure 5.9). These results are shown in table 5.1. Only the four intrinsic parameters in table 5.1 were required for the pinhole camera model used (figure 5.6). The skew factor and distortion coefficients

are not shown as they proved to be negligible.

Figure 5.8: Fields of view and lookahead distances of the two cameras installed on CeDAR are configured for dual near-field and far-field scene coverage. The short lookahead distance of L0.05 = 21.9 m is intended for maximum scene coverage in the near-field and was the smallest that was possible with these cameras. A lookahead distance of L0.05 = 53.3 m was chosen for the far-field since it allows approximately three seconds of reaction time at a speed of 70 kph, which is the speed limit on the high curvature roads that the curvature experiments are being conducted on.

               f_x (mm)   f_y (mm)   x_0 (pixels)   y_0 (pixels)
    Camera 1   5.5955     5.5985     172.0103       107.9645
    σ_1        0.01257    0.0130     5.0411         3.0329
    Camera 2   16.0039    15.7260    152.0712       97.3408
    σ_2        0.0498     0.0498     5.6237         6.0153

Table 5.1: Intrinsic camera parameters and their errors. The standard deviation of parameter i is σ_i.

5.3.3 Steering sensor characterization

TREV is equipped with a HSI-Houston Scientific 1950 position transducer fitted to the steering mechanism to measure the steering angle of the vehicle (§3.2.3). This



Figure 5.9: The calibration target with 25.4 mm squares used to calibrate the internal parameters of the cameras with the Camera Calibration Toolbox for matlab (v3.0 10-17-00).

Figure 5.10: Steering sensor calibration. Wheel angle versus steering transducer voltage data. The parameters δ_1 and δ_2 are the wheel angles for the left and right front wheels respectively.

provides a voltage output, V, that is related to the steering angle of the left and right front wheels (δ_1 and δ_2 respectively). The voltage V is measured through the SNAP IO module. This system was characterized by manually measuring the steering angle of both front wheels and the voltage returned from the transducer at various steering angles. Figure 5.10 shows the wheel angles with the corresponding transducer voltage data.


Figure 5.11: Graph of the instantaneous centre of curvature radius (R) versus the front wheel angles. The parameters R_1 and R_2 are calculated from the left and right front wheel angles. According to the Ackermann steering model, R_1 should equal R_2; however, there is a slight discrepancy that can be resolved using the bicycle approximation shown in figure 5.12.

Plotting R against δ_1 and δ_2 using equations 5.16 and 5.17, a slight discrepancy between the values of R hints that the steering mechanism doesn't obey a pure Ackermann model (figure 5.11). Often, a small amount of flexibility is allowed in steering mechanisms for slippage, which explains the discrepancy here. This anomaly can be accounted for by assuming a bicycle Ackermann model, where there is only one wheel at the front and rear of the vehicle (figure 5.12), and using the mean of the two curves in figure 5.11. The mean curve in figure 5.11 can be approximated analytically by a hyperbolic coth function to map the voltage V to the ICC radius R

    R = a coth(bV) + c    (5.21)

where a, b and c are coefficients determined via a least-squares fit with the measured data (figure 5.13)

    (a, b, c)^T = (5.0309, 0.2565, 1.2267)^T.
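Taking the coefficient signs as printed (the extraction may have dropped signs, so treat them as indicative), the voltage-to-radius mapping of equation 5.21 is a one-liner:

```python
import math

# Least-squares coefficients from the text (signs as printed).
A, B, C = 5.0309, 0.2565, 1.2267

def icc_radius(volts):
    """Map steering-transducer voltage V to the ICC radius R (equation 5.21):
    R = a*coth(b*V) + c, using coth(x) = 1/tanh(x)."""
    return A / math.tanh(B * volts) + C
```

As expected of a coth curve, the radius is largest near the voltage corresponding to straight-ahead steering and flattens towards a + c for large voltages.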



Figure 5.12: The bicycle approximation to the full Ackermann steering model uses only one front and one rear wheel, with d = 0 in equation 5.16, to calculate R. The angle δ_mean is the mean of δ_1 and δ_2.

Figure 5.13: The least-squares fitted data for the function mapping the steering transducer voltage V to the ICC radius R.


5.3.4 Tailshaft encoder characterization

To measure the velocity of the vehicle, the Toyota factory standard tailshaft encoder on TREV is accessed through the SNAP IO unit over the network (§3.2.3). This occurs every iteration of the algorithm, and a separate module maintains the velocity of the vehicle for other modules to access. The SNAP IO unit returns the number of tailshaft encoder counts that have occurred since the last time it was read. Characterization of the tailshaft encoder revealed that the velocity is given by

    v = 0.3910833 C / Δt    (5.22)

where v is measured in m/s, C is the number of counts since the last iteration, Δt is the time since the last iteration in seconds, and the constant, 0.3910833, is a unit conversion factor between the number of counts and the distance travelled, in m/count. This conversion factor was empirically determined by measuring the number of counts that occurred over a distance of 2 km.
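Equation 5.22 amounts to a one-line conversion; a sketch with an illustrative function name:

```python
# Empirically determined over a 2 km run (metres per encoder count).
METRES_PER_COUNT = 0.3910833

def velocity(counts, dt):
    """Vehicle speed in m/s from tailshaft encoder counts accumulated
    over dt seconds (equation 5.22)."""
    return METRES_PER_COUNT * counts / dt
```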

5.4 Lane tracking cues

The cues selected for this experiment were designed to be simple and efficient while covering a variety of road types. Individually, each cue is weak, but when they are combined through the cue fusion process, they produce a robust solution to lane tracking. Two types of cues are used in the lane tracker: state based cues and image based cues. State based cues use the state represented by each particle as the observation measurement. These were introduced as a heuristic to constrain the lane tracker. For example, it was found in earlier experiments without the heuristic cues that the tracking algorithm would often jump between lanes on a multilane road, or would measure multiple lanes as a single lane. The introduction of cues that impose a prior on the lane width and location constrained the solution sufficiently to correct this. Image based cues use the image streams from CeDAR as the observation measurements. These cues depend on the road regions and their transformations into image space, as shown in figures 5.14 and 5.15, to process their sensor models, p(o_t | x_t^(i)). Each hypothesis from the particle filter defines a road model in the RCS through the parameter rw. The lateral offset, y_sr, and the yaw, ψ_vs, of each hypothesis complete the transformation between the RCS and


Figure 5.14: Transformation of the road model into image space. The road regions are sampled from the preprocessed images to calculate the likelihood of each image based cue.

Figure 5.15: The road model used for vehicle state and road width estimation is defined by the three unknowns, y_sr, ψ_vs and rw. The dark shaded area is used as the lane marker region in the lane marker and road colour cues, while the light shaded area is the road region. The light edges of the lane marker region are part of the road edge model. The parameters L_start and L_end are the start and end positions of the road region in the road coordinate system and were set to 8 m and 26 m respectively. Note that the figure is exaggerated for clarity.

image plane (§5.1.2), allowing the road model defined by rw to be projected onto the image plane. Each image-based cue then samples pixels from the projected model in the observation image to evaluate its likelihood. Each cue calculates its sensor model using n lines in the RCS, which are defined according to the set of coupled points

    S^r = { P^r_{1,s}, P^r_{1,e}, P^r_{2,s}, P^r_{2,e}, . . . , P^r_{n,s}, P^r_{n,e} }.    (5.23)

Points P^r_{i,s} and P^r_{i,e} are respectively the start and end points of the ith line in the RCS. These points are transformed from the road coordinate system to the


image plane by the cues to evaluate their sensor models

    S^I = M_ir S^r    (5.24)

where S^I is the set of points in the image plane.

5.4.1 Lane marker cue

The lane marker cue (LMC) filters the intensity image with a Laplacian of Gaussian (LoG) kernel to emphasize vertical bar-like features (§2.2.4). This makes it particularly good at locating lane markers. The LMC samples the lane marker region in image space to evaluate its sensor model (figure 5.16)

    p(o_t | x_t^(i)) ∝ ( Σ_{l=1}^{n} Σ_{p=1}^{m_l} I_lmm(p^I_{l,p,i}) ) / ( Σ_{l=1}^{n} m_l )    (5.25)

where p^I_{l,p,i} is the pth point of the lth line from the set of points S^I defined by the lane marker region and particle i, I_lmm(p) is the value of pixel p from the lane marker map described below, and m_l is the number of pixels in line l.

Lane marker region

The lane marker region is defined by the two lines in figure 5.16 and the cumulative offset O_rer + O_nrr. This region is suitable for cues that respond to peaks in the lane marker map or troughs that occur near the centre of lane markers in the road colour probability maps.
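Equations 5.25-5.27 share the same structure: an average of preprocessed-map values over the pixels sampled from a particle's projected road model. A minimal sketch (hypothetical helper name; `value_map` stands in for any of the maps I_lmm, I_cem or I_rcpm):

```python
def cue_likelihood(lines, value_map):
    """Generic image-based cue sensor model (equations 5.25-5.27): the mean
    preprocessed-map value over all pixels sampled from the projected
    road-model lines of one particle.
    lines: list of lists of (row, col) pixel coordinates, one list per line.
    value_map: 2D array-like of per-pixel cue responses in [0, 1]."""
    total, npix = 0.0, 0
    for line in lines:
        for (r, c) in line:
            total += value_map[r][c]
            npix += 1
    return total / npix if npix else 0.0
```

Because each cue only differs in which preprocessed map and which road-model region it samples, one helper like this can serve all of the image-based cues.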

Lane marker map

The lane marker map is generated from the grayscale camera image by a horizontal correlation with a 1D approximation to a LoG kernel (§2.2.4). The parameters of the LoG kernel were set to emphasize bar-like features that are approximately three pixels wide. This can capture lane markers up to the lookahead range L0.05 (§5.3.1). Typical values were a = 2 and c = 2/(√3 π^{1/4}). Using a horizontal correlation in this fashion emphasizes vertical bars in the image and reduces the possibility of horizontal shadows across the road being detected. Figure 5.17 shows the steps in the evaluation of the lane marker cue sensor model. The colour camera image from (a) is used to create the lane marker map in (b),


Figure 5.16: Road model: the lane marker region. The lane marker region is defined by two lines and the combined offset O_rer + O_nrr. This region is suitable for cues that test peaks in the lane marker map or troughs in the road colour probability maps that occur near the centres of lane markers.

(a) The original colour image. (b) The lane marker map, I_lmm. (c) The likelihood from the lane marker cue.

Figure 5.17: Lane marker cue. A grayscale version of the original camera image is preprocessed using a 1D LoG kernel to extract bar-like regions (b). The lane marker region is then used to evaluate the sensor model from this preprocessed image (c). Note that lighter pixels correlate to higher probabilities in (b) and (c).

while this image is used by the sensor model to generate the cue likelihood in (c).

5.4.2 Road edge cue

The road edge cue (REC) is designed for roads with lane markings or definite edges on the road boundary. It uses a blurred Canny edge map and the road edge


Figure 5.18: The road edge region is defined by the two lines and the offset O_rer. This region is specifically suited to testing for the edges of the road and inside edges of lane markings.

region (figure 5.18) to evaluate its sensor model

    p(o_t | x_t^(i)) ∝ ( Σ_{l=1}^{n} Σ_{p=1}^{m_l} I_cem(p^I_{l,p,i}) ) / ( Σ_{l=1}^{n} m_l )    (5.26)

where p^I_{l,p,i} is the pth point of the lth line from the set of points S^I defined by the road edge region and particle i, and I_cem(p) is the value of pixel p from the blurred Canny edge map described below.

Road edge region

This region is specially suited to detecting the edge of the road and the edges of lane markings. The lines and points that describe this region in the RCS are given in figure 5.18. The lines are offset from the boundary of the inner road region by a distance O_rer for compatibility with the other cues in the system. The offset permits slight road curvatures and ensures that the road edge cue and the road colour cues produce peaks in the likelihood at similar locations. For example, if there are no lane markers, the road edge cue will peak at the edge of the road using the RER, and the non-road colour cue will peak at a similar location using the LMR. Inversely, if there are lane markers, both the lane marker cue and the non-road colour cue will peak at the same location using the LMR. Figure 5.19 shows what happens when O_rer is too small and the curvature of the road causes the edges of the road to overlap both the road edge region and the road interior region. A typical value for O_rer was 0.2 m.

Figure 5.19: Possible inconsistencies between cues using the road edge region. If O_rer is too small, then the edges of a road with low curvature can overlap both the road edge region and the road interior region, causing inconsistencies in the likelihoods between cues (right).

Canny edge extraction

A Canny edge detector (Canny 1986) is used to generate the edge map required by the road edge cue, which is blurred to allow for errors in the image sensors (§2.2.3). The Canny edge detector uses three stages to generate edges: directional gradients are calculated from the intensity image; non-maximal suppression is used to find peaks in the image gradients; and hysteresis thresholding is used to locate edge strings. The process for evaluating the road edge cue's sensor model from the Canny edge map is shown in figure 5.20. An intensity image of the camera colour image (a) is preprocessed using a Canny edge detector and then blurred with a Gaussian filter (b). This is used in conjunction with the road edge region to evaluate the sensor model of the cue (c).

5.4.3 Road colour cue

The road colour cue (RCC) is designed for roads that have a different colour than their surroundings (both unmarked and marked roads). It returns the average sampled pixel value in the hypothesized road interior region from a road colour probability map. This map is dynamically generated each iteration using the


(a) The original colour image. (b) The blurred Canny edge map, I_cem. (c) The likelihood from the road edge cue.

Figure 5.20: Road edge cue. A grayscale version of the original camera image is preprocessed using a Canny edge detector and blurred using a Gaussian kernel to extract possible lane edges. The road edge region is then used to evaluate the sensor model from this preprocessed image. Note that lighter pixels correlate to higher probabilities in (b) and (c).

estimated road parameters and the YUV colour image from the previous iteration. The sensor model is evaluated using

    p(o_t | x_t^(i)) = ( Σ_{l=1}^{n} Σ_{p=1}^{m_l} I_rcpm(p^I_{l,p,i}) ) / ( Σ_{l=1}^{n} m_l )    (5.27)

where p^I_{l,p,i} is the pth point of the lth line from the set of points S^I defined by the road region and particle i, and I_rcpm(p) is the value of pixel p from the road colour probability map described below.

Road interior region

This region is used by the road colour cue to evaluate the sensor model and test each hypothesis against the interior of the road region. Seven lines form a grid over the hypothesised road region, as shown in figure 5.21. The distance L_sep and line 3 were chosen to spread the test region evenly over the central section of the road region.

Road colour model

The road colour cue relies on a coarse road colour probability map in UV space to calculate its sensor model. The road colour probability of each pixel in the YUV image is determined using a UV histogram lookup table generated from


Figure 5.21: The road interior region is defined by seven lines and L_sep to distribute the test area evenly over the region. This region is used to test the interior of a hypothesised road region.

(a) The estimated road region from iteration t-1.

(b) The non-zero bins of the blurred histogram that forms the road colour probability map.

(c) The road colour probability image in iteration t.

Figure 5.22: Road colour probability map. This map in UV space is generated from pixels sampled from the estimate of the road location in the previous iteration (a) and is used to form the road colour probability image for use by the road colour and non-road colour cues (b + c). Note that lighter pixels correlate to higher probabilities in (c).

the result of the previous iteration. The intensity component of the mapping is removed to reduce the influence of shadows on the result. The road colour histogram is generated in three steps (figure 5.22 and section 2.2.5):

- pixels of the previous road model estimate are sampled from the YUV camera image of the previous iteration;
- a 256x256 UV histogram is created from the UV values of the extracted pixels;


(a) The original colour image.

(b) The road colour probability image, Ircpm .

(c) The likelihood from the road colour cue.

Figure 5.23: Road colour cue. A road colour probability image is generated from the original camera image and road estimate from the previous iteration. The road interior region is then used to evaluate the sensor model from this image. Note that lighter pixels correlate to higher probabilities in (b) and (c).

- the histogram is blurred using a 5x5 Gaussian kernel and normalized to form a smooth road colour probability map.
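The three steps above can be sketched as follows (a rough NumPy version; the blur kernel's sigma and the choice of normalizing so the peak maps to 1 are assumptions):

```python
import numpy as np

def build_road_colour_map(yuv_pixels, bins=256, sigma=1.0):
    """Build the UV-space road colour probability map from road pixels
    sampled from the previous iteration's road estimate.

    yuv_pixels: (N, 3) array of (Y, U, V) values in [0, 255]; Y is
    discarded to reduce the influence of shadows. sigma is an assumed
    value for the 5x5 Gaussian blur.
    """
    u = yuv_pixels[:, 1].astype(int)
    v = yuv_pixels[:, 2].astype(int)
    hist, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 256], [0, 256]])

    # Separable 5x5 Gaussian blur of the histogram.
    ax = np.arange(-2, 3)
    kernel = np.exp(-ax**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(np.convolve, 0, hist, kernel, mode='same')
    blurred = np.apply_along_axis(np.convolve, 1, blurred, kernel, mode='same')

    # Normalize so the most probable road colour maps to 1.
    peak = blurred.max()
    return blurred / peak if peak > 0 else blurred
```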

The colour map is initialized using a default road region with zero lateral offset and yaw, and a road width of 2 m. This road width was chosen to reduce the possibility of the initial road region containing non-road regions. This assumption has an obvious limitation: it requires the lane tracker to be started while the vehicle is on the road. This could be overcome by initializing the road colour probability map to have a uniform distribution over all values of U and V and allowing it to converge over time to the true road colour distribution. Figure 5.23 shows an example of the road colour probability map (b) generated from the YUV colour camera image in (a). The likelihood created from the road colour probability image and the hypothesized road regions is shown in (c).

This cue is very weak without the addition of the non-road colour cue. This is evident if you consider that any road hypothesis that lies on the road will return a high probability; however, only the hypotheses on the edge are correct. The inverse is true for the non-road colour cue. Both cues produce peaks in their likelihood that slightly overlap because of the offset between the road region and the lane marker region that they use; hence, the combined probability of the two cues produces the correct distribution.


5.4.4 Non-road colour cue

The non-road colour cue (NRCC) is used to evaluate non-road regions in the road colour probability map. It uses the lane marker region to validate that these regions of each hypothesis lie in regions of low road colour probability

p(o_t \mid x_t^i) = 1 - \frac{\sum_{l=1}^{n} \sum_{p=1}^{m_l} I_{rcpm}(p^I_{l,p,i})}{\sum_{l=1}^{n} m_l}    (5.28)

where p^I_{l,p,i} is the pth point of the lth line from the set of points S_I defined by the non-road region and particle i, and I_{rcpm}(p) is the value of pixel p from the road colour probability map. The lane marker region is used because the lane markers form low probability regions in the road colour probability map and will therefore produce peaks at the same locations in the likelihood as the lane marker cues. In addition, these regions naturally fall in non-road regions when there are no lane markers.

Non-road colour model The road colour probability map described in section 5.4.3 is used with the lane marker region to evaluate the cue's sensor model. Figure 5.24 shows the various steps in evaluating the cue. As with the road colour cue, the road colour probability map is generated (b) from the YUV colour camera image (a). These are used in conjunction with the non-road regions to calculate the likelihood (c).

5.4.5 Elastic lane cue

The elastic lane cue (ELC) is a heuristic cue used to move particles towards the lane that the vehicle is in. Preliminary experiments showed the tracking system switching randomly between lanes when driving on a multi-lane highway and necessitated the introduction of this state-based cue. The ELC returns 1 if the lateral offset of the vehicle is less than half of the road width and 0.5 otherwise

p(o_t \mid x_t^i) = p(o_t \mid y_{sr}^i, \psi_{vs}^i, rw^i) = \begin{cases} 1.0 & : |y_{sr}^i| \le rw^i/2 \\ 0.5 & : |y_{sr}^i| > rw^i/2 \end{cases}    (5.29)


(a) The original colour image.

(b) The road colour probability map, Ircpm .

(c) The likelihood from the non-road colour cue.

Figure 5.24: Non-road colour cue. A road colour probability map is generated from the camera image and road estimate from the previous iteration. The lane marker region is then used to evaluate the sensor model from this image. Note that lighter pixels correlate to higher probabilities in (b) and (c).

5.4.6 Road width cue

The road width cue (RWC) is particularly useful on multi-lane roads where it is possible for the other cues to favour two or more lanes over one. It returns a value from a Gaussian function centered at a preset road width \overline{rw} given the hypothesized road width rw^i

p(o_t \mid x_t^i) = p(o_t \mid y_{sr}^i, \psi_{vs}^i, rw^i) = \frac{1}{\sigma_{rw}\sqrt{2\pi}} \exp\left(-\frac{(rw^i - \overline{rw})^2}{2\sigma_{rw}^2}\right)    (5.30)

The standard deviation, \sigma_{rw}, governs the strength of this cue and a value of 1 m was typically used. The preset road width, \overline{rw}, was 3.6 m, which was empirically determined from previous lane tracking experiments to be the average road width.
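Both state-based cues, (5.29) and (5.30), are cheap to evaluate since they depend only on a particle's state, not on the image. A direct sketch using the values quoted above:

```python
import math

RW_MEAN = 3.6    # preset average road width (m)
RW_SIGMA = 1.0   # standard deviation governing the cue strength (m)

def road_width_cue(rw_i, rw_mean=RW_MEAN, rw_sigma=RW_SIGMA):
    """Gaussian likelihood of the hypothesized road width, as in (5.30)."""
    z = (rw_i - rw_mean) / rw_sigma
    return math.exp(-0.5 * z * z) / (rw_sigma * math.sqrt(2.0 * math.pi))

def elastic_lane_cue(y_sr_i, rw_i):
    """Return 1 when the particle's lateral offset keeps it within the
    current lane, 0.5 otherwise, as in (5.29)."""
    return 1.0 if abs(y_sr_i) <= rw_i / 2.0 else 0.5
```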

5.5 Lane tracking performance

The experimental data presented in this section is the result of two different versions of the lane tracking system. A prototype was originally implemented in Matlab to test the distillation algorithm and to develop a suite of cues. Lane detection and cue performance were characterized during this stage (section 5.5.1). The second version was a real-time implementation in C++ that was integrated into TREV. The results presented here are therefore organized to demonstrate the ability of the distillation algorithm as a lane tracking solution. The performance characteristics of the distillation algorithm are analysed with respect to the particle


filter, cue fusion and cue scheduling algorithms. Finally, the overall performance of the lane tracker is characterized and qualitatively compared to a number of prominent lane trackers in the literature.

5.5.1 Lane detection

One of the most impressive characteristics of the particle filter is its proficiency for target detection. A separate bootstrapping procedure is commonly used to localize the lane using specific assumptions, such as assuming that the vehicle is already on the road, when no prior information exists. Conversely, the particle filter seamlessly moves from detection to tracking without any additional computation in the detection phase. Figures 5.25 and 5.26 show the particle locations for the first eight iterations of the lane tracker. The particles are initially distributed uniformly over the state-space and the convergence of the mean of the particles to the location of the road takes approximately five iterations, while a good estimate of the road location is found within three iterations (extension 2 in appendix D). The convergence of the algorithm is also evident in the three graphs of figure 5.27 where all three parameters of the particle filter converge within ten iterations. The yaw of the vehicle appears to take longer; however, this is due to the slow variation of the vehicle's direction over time. The level of the constant variance is maintained by the error diffusion of the motion model.
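This detection-by-filtering behaviour can be sketched minimally as follows. The state bounds, particle count and the simple CDF-inversion resampling used here are illustrative assumptions; the thesis's filter additionally applies a motion model with error diffusion between iterations, which is omitted:

```python
import bisect
import random

# Illustrative bounds for (lateral offset y_sr [m], yaw [rad], road width rw [m]).
BOUNDS = [(-4.0, 4.0), (-0.3, 0.3), (2.0, 6.0)]

def init_particles(n):
    """No separate bootstrap is needed for detection: particles start
    uniformly distributed over the whole state space."""
    return [tuple(random.uniform(lo, hi) for lo, hi in BOUNDS) for _ in range(n)]

def filter_step(particles, likelihood):
    """One iteration: weight every hypothesis by the fused cue likelihood,
    then resample so that particles concentrate around the road."""
    weights = [likelihood(p) for p in particles]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    n = len(particles)
    return [particles[min(bisect.bisect_left(cdf, random.random()), n - 1)]
            for _ in range(n)]
```

Run against a likelihood peaked at the true road state, the particle mean drifts to the peak within a handful of iterations, mirroring the convergence seen in figures 5.25 and 5.26.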

5.5.2 Cue scheduling

Typical behaviour of the cue scheduling algorithm is presented in figures 5.28 and 5.29. In this case, the two colour cues are scheduled into the foreground at iteration 1922, while the lane edge cue is scheduled into the background. This occurs because the combined utility of the two colour cues increases to a value greater than the lane edge cue utility at iteration 1921. All three are not scheduled to run in the foreground due to the processing time constraints outlined in section 4.3. Figure 5.29 shows why the utility of the colour cues increases with respect to the lane edge cue. The colour of the road changes between iterations 1890 and 1905, which causes it to become more distinct from the non-road regions. This results in the stronger performance of the colour cues as the road colour probability map adjusts.
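A toy illustration of the foreground/background trade-off just described (the real scheduler is defined in section 4.3; the utilities, costs and the greedy budget rule below are assumptions for illustration only):

```python
def schedule_cues(cues, time_budget):
    """Greedy sketch of foreground selection: take cues in order of utility
    until the per-iteration processing-time budget is exhausted; everything
    else runs in the background.

    cues: list of (name, utility, cost) tuples with illustrative values.
    """
    foreground, background, used = [], [], 0.0
    for name, utility, cost in sorted(cues, key=lambda c: c[1], reverse=True):
        if used + cost <= time_budget:
            foreground.append(name)
            used += cost
        else:
            background.append(name)
    return foreground, background
```

When the combined utility of the colour cues overtakes that of the lane edge cue, a scheduler of this shape swaps them into the foreground while the edge cue drops to the background, as happens at iteration 1922.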


(a) Uniformly initialized particles

(b) Iteration 3

Figure 5.25: Particle filter convergence I (continued in figure 5.26). The initial uniform distribution of the particles over the state space in (a) converges after only eight iterations to the Gaussian-like distribution in figure 5.26(b).


(a) Iteration 5

(b) Iteration 8

Figure 5.26: Particle filter convergence II (continued from figure 5.25). The initial uniform distribution of the particles over the state space in figure 5.25(a) converges after only eight iterations to the Gaussian-like distribution in (b).


(a) Lateral offset (y_sr)

(b) Vehicle yaw (ψ_vs)

(c) Road width (rw)

Figure 5.27: Particle filter convergence is evidenced by the reduction of the state variance over time to a stable level. This level is maintained by the error diffusion of the motion model.

Figure 5.28: Cue scheduling example. The utility of the two colour cues increases between iterations 1922 and 1944 and they are both scheduled into the foreground cue list, while the lane edge cue is removed. The total utility over this range has increased due to the addition of the extra cue into the foreground list.

5.5.3 Vehicle pose and road width tracking

The performance of a lane tracking system is often quantified by the percentage of time the experimental vehicle is operated in autonomous mode. The term autonomous mode has a variety of meanings in the literature, which can lead


(a) Iteration 1890

(b) Iteration 1905

(c) Iteration 1920

(d) Iteration 1935

Figure 5.29: The scene during cue scheduling. The two colour cues are scheduled into the foreground at iteration 1922, while the lane edge cue is scheduled into the background. Note the change in road colour from iteration 1890 to 1905. This change in road colour causes it to become more distinct from the non-road regions resulting in the eventual increase in utility of the colour cues as the road colour probability map adjusts.

to quantitative comparisons between systems that are inherently ambiguous: the VaMP prototype in the experiment from Munich (Germany) to Odense (Denmark) in 1995 (Maurer et al. 1996) was operated autonomously, although a human operator selected the desired travel speed and initiated automatic lane change manoeuvres while supervising the rear hemisphere for obstacles; on the other hand, Navlab 5, used in the No Hands Across America trial from Pittsburgh, PA, to San Diego, CA, by researchers at the Robotics Institute of Carnegie Mellon University (Pomerleau and Jochem 1996), produced steering commands for lane keeping while a human operator controlled both the throttle and brakes of the vehicle. In addition to the ambiguous meaning of autonomous mode, it is not an appropriate measure of performance for a lane tracking system. This is because it


is a measure of performance for an autonomous agent that includes not only the lane tracking system, but a collective of collaborating systems designed to control hardware in combination with high level logic systems for decision making. To measure performance in this study, a direct frame-by-frame comparison between the results of the lane tracking system and manually derived baseline results is used. This baseline was obtained by a human operator manually labelling each image from the experiment to calculate the road width and vehicle state. In the following analysis, the absolute estimate error x_err is calculated for each parameter x of the particle filter as the absolute difference between the estimate x_e and the baseline x_b

x_{err} = |x_e - x_b|.    (5.31)

The mean estimate error <x_err> is the average of the absolute estimate error over time

\langle x_{err} \rangle = \frac{1}{N} \sum_{i=1}^{N} x_{err}(i)    (5.32)

where N is the total number of iterations in the experiment. If the mean estimate error is greater than a threshold then the iteration is deemed to be a failure. The thresholds are set according to the lane types being tracked. In these experiments, the thresholds were set to 0.3 m, 2.3° and 0.6 m for y_sr, ψ_vs, and rw respectively. Three sets of results are presented for the mode and weighted mean estimates: unfiltered, median filtered, and median filtered with a moving average filter. The mean and median filters were both computed over seven consecutive iterations. The mode estimate, or MAP estimate, x_me, is the particle with the highest probability in the posterior, p(x|d)

x_{me} = \arg\max_x p(x|d).    (5.33)

The weighted mean estimate, x_wme, is the mean of the particles in the posterior weighted by their probability, which is the expectation value, E{x}, of the posterior

x_{wme} = \sum_{i=1}^{n} x^i \, p(x^i|d) = E\{x\}.    (5.34)
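For a single state parameter, the two estimators and the post-filtering used in the result sets can be sketched as below (particle values and weights are kept one-dimensional for clarity; the edge handling of the sliding windows is an assumption):

```python
import statistics

def map_estimate(particles, weights):
    """Mode (MAP) estimate: the particle with the highest posterior weight (5.33)."""
    return max(zip(particles, weights), key=lambda pw: pw[1])[0]

def weighted_mean_estimate(particles, weights):
    """Weighted mean estimate: the expectation of the posterior (5.34)."""
    total = sum(weights)
    return sum(p * w for p, w in zip(particles, weights)) / total

def median_filter(series, window=7):
    """Sliding median over `window` iterations (edges use a shorter window)."""
    half = window // 2
    return [statistics.median(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]

def moving_average(series, window=7):
    """Sliding mean over `window` iterations (edges use a shorter window)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        w = series[max(0, i - half):i + half + 1]
        out.append(sum(w) / len(w))
    return out
```

The "median + averaged" result sets correspond to `moving_average(median_filter(series))` applied to each parameter's estimate sequence.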

The lane tracker was tested in several different scenarios including:

- highways with clear lane markings and light traffic;
- outer city driving with high curvature roads;
- medium levels of rain on a high curvature outer city road;
- highways with poor lane markings and light traffic.

The experimental results presented here are from a series of trials, some of which were processed online and others offline. The offline experiments were necessary to obtain the complete dataset for comparisons with baseline results. This was the result of a hardware limitation where the images captured from the vision system could not be saved to disk in real-time. Figures 5.30-5.33 show the lane tracking results in the above four scenarios using six different cues. Extensions 3-6 in appendix D show the tracking sequences for these experiments.

The first experiment shows the lane tracker performing well on a highway with clear lane markings (figure 5.30 and extension 3). It successfully handles shadows across the road, departure lanes and miscellaneous markings that can often confuse lane trackers. The application of a median filter followed by an averaging filter improves the results; however, all failures are removed using the weighted mean of the particles.

A more challenging experiment is shown in figure 5.31 (extension 4). Here the lane tracker is tested on a road that has both high horizontal and vertical curvature. Extremely harsh shadows were present on the road surface, as well as dramatic changes in lighting from overhead bridges, and several departure lanes. The tracking performance in this case is slightly lower than the highway example because of the high degree of curvature present, but is still very good. The high degree of curvature invalidates both the motion model and the no-curvature assumption, resulting in slightly deteriorated performance.

Rain has traditionally proven a very difficult scenario for lane trackers. The road becomes reflective, the images captured through the windscreen are blurred and there is generally poor visibility. Figure 5.32 (extension 5) shows the lane tracker performance in medium rain on the high curvature road of the previous experiment. Notice that, surprisingly, the success rate is higher than in the previous experiment without rain.
One possible reason for this outcome could be that the blurred vision caused by the rain on the windscreen conditioned the observations of the cues such that the fused posterior in the particle filter was smoother. Resampling in this case would show improved convergence. Other consequences of rain that normally cause concern for traditional lane trackers, such as a reflective road, do not impair the result here because of the validation framework in the distillation algorithm. The a priori knowledge that is inherently incorporated into the algorithm through this validation framework, as well as particles concentrating resources in areas of high probability, help to prune erroneous results caused by rain artifacts.

The final experiment was conducted on a particularly difficult highway sequence where the markers separating the lanes are not clearly visible (figure 5.33 and extension 6). This experiment was designed to test the limits of the lane tracker where the signals from the cues are very weak. Figure 5.33(a) shows three frames from the tracking sequence where there were harsh shadows, low lighting, and shadows parallel to the lane markers. Note how difficult it is to see the markers separating the lanes. The comparison between the lane tracker results and the manual results shows a reasonable correlation. There is one notable exception where the lane tracker was distracted by a passing car at iteration 900 due to the weak lane marker signals (figure 5.34). Note that the lane tracker does successfully recover from this. The constant offset that is visible in the yaw and road width measurements is due to the lane tracker's dependence on the heuristic road width cue when it could not consistently track the right edge of the road in this sequence. In this case, a default road width, governed by the road width cue, automatically dominates the road width estimation.

Several characteristics of these results stand out:

Filtering the estimates affected the results differently for each scenario. In the highway experiment, median filtering reduces the mean error significantly and the weighted mean estimate has the highest success rate. This is the result of the low dynamical nature of the highway scenario. The samples in the particle filter converge to the road location and the weighted mean of the samples removes the effect of outliers on the estimation. In the case of the high curvature road, median filtering increases the success rate, but reduces the mean error by a much lower degree. The weighted mean estimate for this scenario decreases the accuracy of the result in all cases because of the highly dynamical nature of the road. With this in mind, the weighted mean would seem an appropriate choice for the estimate of the scene. It is inherently smoothed by the validation mechanism of the particle


(a) Rendered lane tracking results. The top row shows 3 images from the sequence that include harsh shadows and departure lanes. The bottom row shows the tracked lane.

(b) Lateral offset

(c) Yaw

(d) Road width

Estimate        Filter              Success rate (%)   <y_err> (m)   <ψ_err> (°)   <rw_err> (m)
Mode            Unfiltered          99.0               0.06          0.29          0.07
Mode            Median              99.5               0.03          0.23          0.06
Mode            Median + averaged   99.8               0.03          0.20          0.05
Weighted mean   Unfiltered          100                0.07          0.24          0.04
Weighted mean   Median              100                0.07          0.24          0.04
Weighted mean   Median + averaged   100                0.07          0.23          0.04

(e) Comparison with the baseline results. The weighted mean estimate has the highest success rate with minimal cost to the accuracy of the result.

Figure 5.30: Performance characteristics of the lane tracking system on a highway with clear lane markings. The top row shows a representative example of the rendered lane tracking results. The middle row shows a comparison with the baseline results over time (for an enlarged view, see figure C.1 in appendix C). The bottom row shows a quantitative comparison with the baseline results.


(a) Rendered lane tracking results. The top row shows 3 images from the sequence that include harsh shadows, dramatic lighting changes and high curvature roads. The bottom row shows the tracked lane.

(b) Lateral offset

(c) Yaw

(d) Road width

Estimate        Filter              Success rate (%)   <y_err> (m)   <ψ_err> (°)   <rw_err> (m)
Mode            Unfiltered          97.7               0.08          0.50          0.09
Mode            Median              98.8               0.07          0.47          0.09
Mode            Median + averaged   97.7               0.07          0.44          0.08
Weighted mean   Unfiltered          96.5               0.09          0.60          0.12
Weighted mean   Median              96.5               0.09          0.61          0.12
Weighted mean   Median + averaged   96.5               0.09          0.59          0.11

(e) Comparison with the baseline results. The weighted mean estimate has a lower success rate in this case due to the highly dynamic nature of the road.

Figure 5.31: Performance characteristics of the lane tracking system on a high curvature road. The top row shows a representative example of the rendered lane tracking results. The middle row shows a comparison with the baseline results over time (for an enlarged view, see figure C.2 in appendix C). The bottom row shows a quantitative comparison with the baseline results.


(a) Rendered lane tracking results. The top row shows 3 images from the sequence that include medium rain, erroneous lines and reflections on the road and image distortion caused by windscreen wipers. The bottom row shows the tracked lane.

(b) Lateral offset

(c) Yaw

(d) Road width

Estimate        Filter              Success rate (%)   <y_err> (m)   <ψ_err> (°)   <rw_err> (m)
Mode            Unfiltered          96.3               0.10          0.53          0.10
Mode            Median              98.2               0.09          0.48          0.10
Mode            Median + averaged   98.2               0.09          0.46          0.09
Weighted mean   Unfiltered          97.5               0.08          0.46          0.09
Weighted mean   Median              97.5               0.08          0.46          0.09
Weighted mean   Median + averaged   98.2               0.08          0.46          0.09

(e) Comparison with the baseline results. The weighted mean estimate shares the highest success rate and also improves the accuracy of the result.

Figure 5.32: Performance characteristics of the lane tracking system on a high curvature road in the rain. The top row shows a representative example of the rendered lane tracking results. The middle row shows a comparison with the baseline results over time (for an enlarged view, see figure C.3 in appendix C). The bottom row shows a quantitative comparison with the baseline results.


(a) Rendered lane tracking results. The top row shows 3 images from the sequence that include harsh shadows, dramatic lighting changes and barely visible lane markings. The bottom row shows the tracked lane.

(b) Lateral offset

(c) Yaw

(d) Road width

Estimate        Filter              Success rate (%)   <y_err> (m)   <ψ_err> (°)   <rw_err> (m)
Mode            Unfiltered          87.1               0.15          1.22          0.32
Mode            Median              88.2               0.14          1.30          0.33
Mode            Median + averaged   90.3               0.14          1.25          0.30
Weighted mean   Unfiltered          91.4               0.11          1.26          0.20
Weighted mean   Median              91.4               0.11          1.26          0.20
Weighted mean   Median + averaged   92.5               0.11          1.26          0.20

(e) Comparison with the baseline results. The weighted mean estimate has the highest success rate and also improves the accuracy of the result.

Figure 5.33: Performance characteristics of the lane tracking system on a highway test with poor lane markings. The top row shows a representative example of the rendered lane tracking results. The middle row shows a comparison with the baseline results over time (for an enlarged view, see figure C.4 in appendix C). The bottom row shows a quantitative comparison with the baseline results.


(a) Iteration 810

(b) Iteration 840

(c) Iteration 870

(d) Iteration 900

(e) Iteration 930

(f) Iteration 960

Figure 5.34: The lane tracker is distracted by a passing vehicle because of the low signal produced by the lane markers separating the lanes.

filter and automatically reduces the effect of outliers, while only harming the accuracy in highly dynamic scenarios.

Traditional hindrances to lane tracking, such as miscellaneous markings in the middle of the road and rain, do not pose a significant problem to the distillation algorithm. This is a result of two characteristics of the algorithm. First, a number of a priori constraints are indirectly incorporated into the algorithm through the hypothesis validation process of the particle filter. Instead of searching for a result like traditional lane trackers, the distillation algorithm validates a number of hypotheses that all satisfy the road model construct. Hence, constraints such as all lines meeting in a vanishing point and lane markers lying on the plane of the road are included at a fundamental level of the algorithm. Second, because particles are concentrated in areas of high probability, there is less chance of distraction by a miscellaneous marking that appears in the road image.

The selection of cues has a strong impact on the outcome of the algorithm. For example, since no cues are particularly sensitive to dramatic changes in lighting (none are reliant on particular threshold parameters that are a function of image contrast), the traditional difficult scenario of entering a tunnel does not pose a problem to this algorithm.

Changes in road type and disappearing lane markers (or weak lane marker signals) are handled elegantly within the cue fusion framework. As long as there is a cue, or a combination of cues, that can handle a particular scenario, the road parameters will be estimated well.

The kidnapped robot problem is successfully handled by the random dispersion of a fraction of the particles in the particle filter. When the lane is lost, the random dispersion of particles quickly relocates the correct location of the lane and concentrates resources there (figure 5.34).

A number of observations were formed through experimental experience with the algorithm that should be noted for future use:

The road colour cues were difficult to incorporate into the cue processing structure. It was mentioned previously that it is important to concentrate resources on calculating information-rich observations so that the evaluation of the particles is efficient. However, it was found that the colour cues, while expensive to calculate, provided little additional information and were mostly scheduled into the background. The colour cues are important for cases where there are no clear lane markers and therefore should not be ignored. There are two possible solutions for improving the colour cues. First, develop a sensor model where the cue estimates the position of the road from the road colour probability map and evaluates the hypotheses according to an error model based around this estimate. This is similar to the work of Thrun et al. (2000) for mobile robot localization. Second, the road colour probability map could be blurred to artificially induce noise, which may make the likelihood surface smoother and more well defined.

The speed of the algorithm is linearly proportional to the number of particles in the particle filter. While a high number is desirable to increase the resolution in the search space, this number is practically limited by the speed of the hardware. 300 particles were used to estimate the 3D state vector, which was found to be sufficient. However, if we were to introduce extra parameters, such as vehicle roll and curvature estimation, the number of particles would have to be increased, pushing the hardware to its limits.


The algorithm was sensitive to the amount of noise introduced into the observations by blurring them with a Gaussian kernel. If they were not blurred, the algorithm would often have difficulty converging, which is consistent with the observations of Thrun et al. (2000).

While every effort was made to ensure that the algorithm executed in real-time, real-time performance is not guaranteed since a real-time operating system was not used. This was only a problem when the system was set to run at a frame-rate that was beyond the speed of the hardware.

5.6 Comparisons with the literature

The lack of a standard measure of performance and a common ground truth under which different algorithms can be tested is a significant problem in the lane tracking community. The percentage of time the experimental vehicle was operated in autonomous mode is the most common measure of performance used by researchers, if any quantitative analysis is offered at all (Bertozzi and Broggi 1998; Dickmanns 1999; Pomerleau 1995; Thorpe 1990; Suzuki et al. 1992). This is often quoted for long distance experiments over very specific road surfaces; however, the accuracy of the vision systems is never analysed directly. This makes an accurate comparison between vision systems virtually impossible, for not only is there no standard measure of performance, all the systems operate on completely separate datasets.

The previous section presented a quantitative analysis of the lane tracker's performance in a number of different road scenarios. The following section presents a qualitative comparison between the results of this lane tracker and a number of prominent systems in the literature. It is a qualitative comparison for four reasons:

- different datasets are used for each experiment;
- each system in the literature uses a different definition of autonomy;
- the lane tracking systems are not analyzed directly but through a performance measure that depends highly upon the rest of the system encapsulating the autonomous vehicle;
- no system explicitly defines how their measures of performance are calculated or what defines system failure.


The result of this is that a direct comparison between the accuracy of the lane trackers is impossible. A comparison between the measures of performance provided by each project is offered as a rough guide, but the reader is warned not to make strong quantitative judgments considering the radical differences in datasets and conditions. A dataset containing the baseline, the respective image set and a transformation framework between the road-centric coordinate system and the image coordinate system is included in extension 7 in appendix D and online [3]. This can be used as a common dataset for comparisons between lane tracking systems.

5.6.1 Public road trials

Three of the most prominent trials of autonomous vehicles in real traffic conditions on public roads include:

- The VaMP prototype in the experiment from Munich (Germany) to Odense (Denmark) in 1995 by researchers at the Universität der Bundeswehr München (UBM) (Maurer et al. 1996).
- Navlab 5 was evaluated in the No Hands Across America experiment from Pittsburgh, PA to San Diego, CA by researchers at the Robotics Institute of Carnegie Mellon University (CMU) in 1995 (Pomerleau and Jochem 1996).
- The ARGO experimental vehicle developed at the Dipartimento di Ingegneria dell'Informazione of the Università di Parma, Italy, was tested in the MilleMiglia in Automatico Tour in 1998. In this trial, the GOLD lane tracking system guided the vehicle over 2000 km of roads throughout Italy (Broggi et al. 1999a).

[3] http://rsise.anu.edu.au/~nema/research

Munich to Odense UBM experiment

The VaMP prototype vehicle developed at the Universität der Bundeswehr München (UBM) was tested in a large-scale trial in 1995 on public motorways from Munich, Germany to Odense, Denmark. This was a ground-breaking experiment in both its size and in the level of autonomy attained. Over 95% of the 1600 km

journey was performed autonomously, where the vehicle was responsible for both lane keeping and longitudinal control, while the human driver was responsible for setting the desired travel speed according to speed limits and environmental conditions. Over 400 automatic lane change manoeuvres were performed by the vehicle, which were initiated by the human driver, who was also responsible for checking the rear hemisphere of the vehicle for obstacles.

The road detection and tracking module was responsible for estimating a state vector describing the road shape and dynamical model of the vehicle (section 2.3.1; Dickmanns and Zapp 1987; Dickmanns 1998; Dickmanns 1999). This vector included the lateral offset, yaw, side slip angle, pitch angle and the horizontal and vertical clothoid parameters describing the shape of the road. A Kalman filter was used to recursively estimate this state vector based on the visual detection of lane markers. The left and right lane markers were searched for by an oriented edge detector of a small width in a long search window across the expected lane location. The road model was then fitted to the detected edge elements (Dickmanns and Zapp 1987).

The main purpose of this experiment was testing the capability, reliability and performance of highway autonomous driving. The success of the system comes from the high level of dynamical modelling of the vehicle and the highly uniform roads on which it was tested. The main problems discovered during the trial occurred at construction zones where lane markings were ambiguous. Another major impediment was the limited operating range of the video cameras. The glare of the sun on the windscreen would often saturate the cameras, making the detection of lanes difficult.

Pittsburgh to San Diego CMU experiment
The Robotics Institute of Carnegie Mellon University conducted a transcontinental trial from Pittsburgh, PA, to San Diego, CA, in a Pontiac Trans Sport, testing over a decade of research in autonomous vehicles (Jochem and Pomerleau 2004). Navlab 5 was the brains behind the vehicle, which included a number of portable computers, a windshield-mounted camera, a GPS receiver and a radar system for obstacle detection (Jochem et al. 1995). The vehicle drove autonomously for 98.2% of the 2849 mile journey. Lateral control of the vehicle was performed by Navlab 5, which issued steering commands after estimating the road curvature from an image of the frontal scene and radar data. The throttle and brakes were human operated. The rapidly adapting lateral position handler (RALPH) was used to detect and track the lanes for lane keeping (2.3.4; Pomerleau 1995). RALPH uses horizontal intensity profiles to determine and remove road curvature from a bird's eye view of the road, and then calculates the lateral offset of the vehicle using a horizontal scanline template matching approach. The algorithm exploits a flat-road assumption and uses any markings that run parallel to the road to calculate its curvature (i.e. lane markings, oil spots, curbs and even ruts made in the snow by car wheels). The high percentage of autonomy attained affirms the road-worthiness of the on-road autonomous lane keeping abilities, and of the lateral roadway departure warning and support systems. The main problems encountered include poor visibility due to rain, low sun reflections and glare from the windscreen, shadows of overpasses, construction zones, and road deterioration. A strong point of the system is its ability to use any feature that runs parallel to the road to estimate the road curvature.

MilleMiglia in Automatico Tour trial
The ARGO vehicle developed at the Dipartimento di Ingegneria dell'Informazione of the Università di Parma, Italy, was tested in 1998 in the MilleMiglia in Automatico Tour (Broggi et al. 1999b). The aim of this trial was to demonstrate that it is possible to autonomously drive a vehicle under different road and environmental conditions using only visual information and low-cost general purpose hardware. Only low-cost passive sensors (cameras) were used for environmental sensing (stereo vision for obstacle detection and monocular vision for road geometry extraction). Of the 2000 km journey, 94% was driven in autonomous mode, which included autonomous lateral control for road following and human-triggered lane change manoeuvres. Lane and object detection was performed using the Generic Obstacle and Lane Detection (GOLD) algorithm (2.3.3; Bertozzi and Broggi 1998). GOLD transforms stereo pairs into a common bird's eye view and uses a pattern matching technique to detect lane markings on the road. The remapped image simplifies the detection of the lane markings, as they appear as vertical lines with a constant separation. Disparities between the two images in the common domain

are used to detect obstacles that deviate from the plane of the road. The area each detected obstacle occupies in the images is removed from the remapped images, removing the chance of the obstacle influencing the detection of the lanes. Difficulties in lane tracking were caused by poor road infrastructure (absent or worn lane markings), sun glare on the windscreen, and the inability of the cameras to adapt to dramatic changes in illumination, such as at the entrance or exit of a tunnel.

Group | Distance [km] | % Automatic | Level of Autonomy
UBM   | 1600          | 95          | Autonomous steering and longitudinal control with human-selected travel speed and human-triggered lane changing manoeuvres
CMU   | 4587          | 98.2        | Autonomous steering for lane keeping
ARGO  | 2000          | 94          | Autonomous steering and human-triggered lane changing manoeuvres

Table 5.2: Statistics for the prior art autonomous vehicle trials.

Experiment                | Success rate (%) | <y_err> [m] | <phi_err> [deg] | <rw_err> [m]
Highway (clear markers)   | 100              | 0.07        | 0.23            | 0.04
High curvature road       | 96.5             | 0.09        | 0.60            | 0.12
Rain                      | 97.5             | 0.08        | 0.57            | 0.09
Highway (unclear markers) | 91.4             | 0.11        | 1.14            | 0.20
Average                   | 96.4             | 0.09        | 0.22            | 0.11

Table 5.3: Statistics for the distillation lane tracker. All results are based on the weighted mean estimate (expected value) of the posterior.

Comparison with literature summary
All three trials succeeded in attaining high percentages of autonomous travel, with the move to non-specialized equipment looking promising (Dickmanns used predominantly highly specialized equipment in the initial VaMP vehicle, but has recently moved to non-specialized processing equipment while using expensive, specialized vision systems). The main problem evident from these experiments lies in the data acquisition systems. Computing power is gradually becoming a non-issue; however, light reflections and glare from the sun on the windscreen, as well as dramatic lighting changes, have proved difficult for the cameras to handle. The introduction of cameras with logarithmic response curves should help to curb this issue. The promising results show that full automation of vehicles is feasible, at least on motorways or highly structured roads, but that various man-made structures such as construction zones and overpasses can prove confusing for systems that try to fit a road model to features in an image. Tables 5.2 and 5.3 compare the results of these systems with the distillation lane tracker.

5.7 Summary

A direct quantitative comparison with other lane tracking systems is dubious at best, given the loose measures and different datasets used. However, a qualitative analysis can offer some insight into performance. As shown in tables 5.2 and 5.3, the distillation lane tracker performs both slightly above and below the results of the other lane trackers, depending on the scenario. The lane tracker was found to work robustly with respect to the problems typically associated with lane tracking, including

dramatic lighting changes;
changes in road colour;
changes in road type;
shadows across the road;
roads with erroneous lines that are not lane markings;
rain;
lane markings that disappear and reappear.

This can be attributed to the combination of particle filtering and cue fusion. Because of the particle filter, cues only have to validate hypotheses and do not have to search for the road by fitting a model to image features. This implicitly incorporates a number of a priori constraints into the system (such as road edges meeting at the vanishing point in image space and the edges lying in the road plane) which assist it in its detection task.

Cue fusion was found to dramatically increase the robustness of the algorithm due to the variety of conditions the cues were designed for. The initial set was limited to image-based cues that contained no prior information except for the road model described in section 5.4. The particle filter often converged to lane segments that the vehicle was not in, or to the whole road instead of a single lane. This was because the lane markings and edges of the road had stronger signals than the lane markings separating the lanes. The addition of the two heuristic cues (the road width cue and the elastic lane cue), which strengthen hypotheses that contain the vehicle and have a road width close to the average, was found to be an effective solution to this problem.

In addition to these points, the particle statistics of the lane tracker offer a form of confidence measure that was not used here. A high covariance on the distribution of particles indicates a flat posterior distribution, in turn indicating that the particle filter is having difficulty tracking the road. Conversely, high kurtosis indicates that the particle filter is confident in its result. The combination of these factors has proven to be an elegant and successful approach to lane tracking that can be easily adapted over time to incorporate new cues and more processing power.
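As a sketch of how such a covariance/kurtosis confidence measure could be computed from a weighted particle set (this helper is illustrative and was not part of the system described here):

```python
import numpy as np

def particle_confidence(particles, weights):
    """Summarize a weighted particle set (N x d states) by its covariance and
    per-dimension excess kurtosis. A large covariance indicates a flat
    posterior (the tracker is struggling); high kurtosis indicates a sharply
    peaked posterior (the tracker is confident)."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    p = np.asarray(particles, float)
    mean = w @ p
    centered = p - mean
    cov = (centered * w[:, None]).T @ centered       # weighted covariance
    var = np.clip(np.diag(cov), 1e-12, None)
    m4 = w @ centered**4                             # weighted fourth moment
    kurtosis = m4 / var**2 - 3.0                     # excess kurtosis per axis
    return cov, kurtosis
```

A flat, uniform-like particle cloud yields a large covariance trace and negative excess kurtosis, while a tightly peaked cloud yields the opposite, matching the qualitative description above.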

Chapter 6 Vision in and out of vehicles

In the introduction to this dissertation we discussed the startling social and economic effects that road related accidents have on our community. In 1999 alone, between 750,000 and 880,000 people died globally in road related accidents, at an estimated cost of US$518 billion (Jacobs et al. 2000). Around 30% of these fatal car crashes can be attributed to driver inattention and fatigue (HoRSCoCTA 2000; Victor et al. 2001). One way of combating this problem is to develop intelligent vehicles that are self-aware and act to increase the safety of the transportation system.

Numerous studies have been performed to analyze signs of driver fatigue through the measurement of the visual demand on the driver, often through manual frame-by-frame visual analysis and measurement, or infrared corneal-reflection technologies (Land 1992; Land and Tatler 2001). While these studies produce valuable results, they are often time consuming and too unreliable for many practical purposes (Victor et al. 2001).

This chapter introduces VIOV, an intelligent transportation system that fuses visual lane tracking and driver monitoring technologies in the first step towards closing the loop between vision inside and outside the vehicle (Apostoloff and Zelinsky 2004). The distillation lane tracker is integrated with the driver monitoring system faceLAB™ to determine where and when the driver's attention is focused on the road. VIOV is part of an ongoing project at the Australian National University focused on developing a system of cooperating internal and external vehicle sensors to aid research into the visual behaviour of the driver. By combining lane tracking with driver monitoring we can detect not only when a driver is looking at the road, but also where their attention is focused (figure 6.1). This is the first step in characterizing the environment surrounding the driver for
a complete analysis of what holds the driver's attention. There are numerous practical uses for this technology:

lane departure warning systems, and particularly the reduction of false positives;
obstacle detection and warning systems;
automated driver visuo-attentional analysis systems;
fatigue and inattention warning systems.

Figure 6.1: Vision in and out of vehicles. By combining lane tracking with driver monitoring it becomes possible to know exactly where and when the driver's attention is focused on the road.

This chapter is organized as follows. First, the architecture of VIOV is presented. Second, the driver monitoring system faceLAB™ is reviewed, followed by a discussion of its integration with the distillation lane tracker. The chapter concludes with a review of the experimental results of the integrated system and an analysis of the driver's visual behaviour in several different scenarios.

Figure 6.2: Structure of the integrated system. The lane tracker estimates a model of the road while faceLAB™ tracks the driver's head pose and eye gaze. The information from both of the sub-systems is fused to estimate what regions of the scene hold the driver's attention.

6.1 System architecture

The software architecture is composed of the lane tracker, the driver monitor faceLAB™, and the high level data correlation system (figure 6.2). The lane tracking system calculates the state of the vehicle relative to the road, x_lt (the lateral offset of the vehicle, y_sr, and the yaw of the vehicle with respect to the centerline of the road, and the road width, rw), while the driver monitor tracks the driver's head pose, x_hp, and eye gaze, x_gaze. The information from both subsystems is integrated in the high level data correlation system to compute the driver's focus of attention. The driver's focus of attention is extrapolated to one of 12 regions of interest (figure 6.3). The regions are divided into three different ranges: the near-field, the far-field and the horizon. The horizon is used to collect eye gaze data that is roughly parallel to the ground plane and doesn't intersect the other two ranges.

Figure 6.3: Regions of interest for obtaining the driver's focus of attention. The near-field and far-field cover the ground plane, while the horizon region is used to collect eye gaze data that is roughly parallel to the ground. These ranges are each split into a further three or four regions governing which side of the scene the driver is focused on.

The near-field covers the ground plane up to the end of the calculated road region (26 m in front of the vehicle). The far-field extends from the near-field to a distance of 100 m away from the driver. Each of these three ranges is split into another three or four regions depending on the experiment. On a two-lane road, four regions are defined: a region to the left of the road, the left lane (the lane detected by the lane tracker), the right lane containing on-coming traffic, and a region to the right of the road. The right hand lane containing the on-coming traffic is not detected by the lane tracker; it is assumed to be a mirror image of the detected left lane. If the experiment is on a one-way road such as a highway, the right lane containing on-coming traffic is not used. The system can be extended to cater for any road configuration. The driver's focus of attention is obtained by calculating the intercept of the driver's eye gaze vector, found by faceLAB™, with the regions calculated by the lane tracker (section 6.3).
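The region layout described above can be sketched as follows. The 0-26 m and 26-100 m extents come from the text; the road-centred coordinate convention, the 5 m outer margin, and all names are assumptions for illustration (the horizon regions, which complete the 12, are omitted):

```python
# Hypothetical layout of the ground-plane focus-of-attention regions in
# road-centred coordinates (x lateral in metres, z longitudinal in metres).
# The detected lane is assumed to span [-lane_width, 0].
NEAR, FAR, MARGIN = 26.0, 100.0, 5.0

def ground_regions(lane_width, two_way=True):
    """Return {(range_name, side): (x_left, x_right, z_near, z_far)}."""
    sides = {"left of road": (-lane_width - MARGIN, -lane_width),
             "left lane": (-lane_width, 0.0)}
    if two_way:
        # The on-coming lane is assumed to mirror the detected left lane.
        sides["right lane"] = (0.0, lane_width)
        sides["right of road"] = (lane_width, lane_width + MARGIN)
    else:
        sides["right of road"] = (0.0, MARGIN)
    ranges = {"near-field": (0.0, NEAR), "far-field": (NEAR, FAR)}
    return {(r, s): (x0, x1, z0, z1)
            for r, (z0, z1) in ranges.items()
            for s, (x0, x1) in sides.items()}
```

On a two-way road this yields eight ground-plane patches (two ranges by four sides); on a one-way road, six.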

6.2 Driver monitoring with faceLAB™

The faceLAB™ package (Victor et al. 2001) is a driver monitoring system commercialized by Seeing Machines (Seeingmachines 2004), based on research and development work between the ANU and the Volvo Technological Development Corporation. It was originally developed by members of the Robotic Systems Laboratory, ANU, and is actively used in the ITS project as well as a number of other projects (Atienza and Zelinsky 2002). It uses a passive stereo pair of cameras mounted on the dashboard of the vehicle to capture 60 Hz video images of the driver's head. These images are processed in real time to determine the 3D position of matching features on the driver's face. The features are then used to calculate the 3D pose of the person's face (1 mm, 1°) as well as the eye gaze direction (3°), blink rates and eye closure. Extension 8 in appendix D shows faceLAB™ tracking a person's head pose and eye gaze over a short period. For the experiments presented here, faceLAB™ measures the head position, x_hp, and the gaze direction of the driver, x_gaze, at a rate of 60 Hz:

\[ \mathbf{x}_{hp} = (x_{hp}, y_{hp}, z_{hp}); \tag{6.1} \]
\[ \mathbf{x}_{gaze} = (x_{gaze}, y_{gaze}, z_{gaze}). \tag{6.2} \]

Here, x_hp is the location of the head (defined at the center of the forehead) in the Cartesian world coordinate system that is initialized in the calibration of faceLAB™. In this work, only the head position is important and the orientation is ignored. The driver's gaze direction, x_gaze, is a 3D point estimated relative to x_hp.

6.3 Integrating the lane tracker and faceLAB™

Combining the head pose and gaze direction captured by faceLAB™ with the parameterized lane model determined by the lane tracker, we can track the visual behaviour of the driver relative to the road. Calculating the intercept of the driver's gaze vector with the regions of interest shown in figure 6.3 determines the driver's focus of attention. Additionally, the visual scan patterns of the driver can be recorded, with emphasis on fixation points and saccade movements to different regions in the scene.

Each focus of attention region in figure 6.3 is defined by four vertices in the world coordinate system[1] of faceLAB™:

\[ X_v = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \\ z_1 & z_2 & z_3 & z_4 \end{bmatrix}. \tag{6.3} \]

The patch vertices relative to the head position,

\[ \bar{X}_v = X_v - \begin{bmatrix} \mathbf{x}_{hp} & \mathbf{x}_{hp} & \mathbf{x}_{hp} & \mathbf{x}_{hp} \end{bmatrix} = \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \bar{x}_3 & \bar{x}_4 \\ \bar{y}_1 & \bar{y}_2 & \bar{y}_3 & \bar{y}_4 \\ \bar{z}_1 & \bar{z}_2 & \bar{z}_3 & \bar{z}_4 \end{bmatrix}, \tag{6.4} \]

are converted to pitch, \(\phi\), and yaw, \(\theta\), angles according to

\[ \theta_i = \mathrm{atan2}(\bar{x}_i, \bar{z}_i); \tag{6.5} \]
\[ \phi_i = \mathrm{atan2}\left(\bar{y}_i, \sqrt{\bar{x}_i^2 + \bar{z}_i^2}\right). \tag{6.6} \]

Here atan2(a, b) is a function that calculates the arc tangent of a/b and uses the signs of its arguments to determine which quadrant the result lies in. Applying equations 6.5 and 6.6 to the vertices \(\bar{X}_v\), they can be arranged into the following 2D homogeneous points:

\[ X_p = \begin{bmatrix} \mathbf{x}_{p1} & \mathbf{x}_{p2} & \mathbf{x}_{p3} & \mathbf{x}_{p4} \end{bmatrix} = \begin{bmatrix} \theta_1 & \theta_2 & \theta_3 & \theta_4 \\ \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ 1 & 1 & 1 & 1 \end{bmatrix}. \tag{6.7} \]

The direct linear transformation (DLT) algorithm (Hartley and Zisserman 2000) can then be used to calculate a homography H that transforms these 2D homogeneous points onto the unit square \(X_{us}\):

\[ X_{us} = H X_p. \tag{6.8} \]

The homography is calculated by solving for the null vector of A:

\[ A\mathbf{h} = A \begin{bmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \mathbf{h}^3 \end{bmatrix} = \mathbf{0}, \tag{6.9} \]

where

\[ H = \begin{bmatrix} \mathbf{h}^{1T} \\ \mathbf{h}^{2T} \\ \mathbf{h}^{3T} \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \tag{6.10} \]

and A is formed from the elements of \(X_p\) by stacking, for each correspondence \(\mathbf{x}_{us,i} = (x'_i, y'_i, 1)^T \leftrightarrow \mathbf{x}_{pi}\), the two rows

\[ \begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_{pi}^T & y'_i \mathbf{x}_{pi}^T \\ \mathbf{x}_{pi}^T & \mathbf{0}^T & -x'_i \mathbf{x}_{pi}^T \end{bmatrix}. \tag{6.11} \]

Then the gaze vector can be projected onto the same plane using the homography,

\[ \bar{\mathbf{x}}_{gaze} = H \mathbf{x}_{gaze}, \tag{6.12} \]

and if \(\bar{\mathbf{x}}_{gaze}\) lies within the unit square then the driver's gaze is focused on that region. This is repeated for each focus of attention region \(X_v\) to determine which region holds the driver's attention (intersections with the ground plane take priority over intersections with the horizon).

[1] The conversion between the road coordinate system of the vehicle and the world coordinate system of faceLAB™ is handled by a 3D translation and rotation matrix that depends on the internal configuration of TREV.
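The steps above (equations 6.3 to 6.12) can be sketched in Python. The test geometry, function names, and the use of SVD to obtain the null vector of A are illustrative choices, not code from the thesis:

```python
import numpy as np

def angles(points, head):
    """Convert 3D vertices (3 x N, world frame) to homogeneous yaw/pitch
    points relative to the head position, following eqs. (6.5)-(6.6)."""
    p = points - head.reshape(3, 1)
    yaw = np.arctan2(p[0], p[2])
    pitch = np.arctan2(p[1], np.hypot(p[0], p[2]))
    return np.vstack([yaw, pitch, np.ones(points.shape[1])])

def dlt_homography(src, dst):
    """DLT: find H with dst ~ H @ src for 3 x 4 homogeneous point sets,
    solving for the null vector of A via SVD (eqs. 6.9-6.11)."""
    rows = []
    for i in range(src.shape[1]):
        x = src[:, i]
        xp, yp, wp = dst[:, i]
        rows.append(np.concatenate([np.zeros(3), -wp * x, yp * x]))
        rows.append(np.concatenate([wp * x, np.zeros(3), -xp * x]))
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)          # null vector, reshaped into H

# Corners of the unit square as homogeneous 2D points.
UNIT_SQUARE = np.array([[0., 1., 1., 0.],
                        [0., 0., 1., 1.],
                        [1., 1., 1., 1.]])

def gaze_in_region(region_vertices, head, gaze_point):
    """True if the gaze direction through gaze_point falls inside the patch
    defined by region_vertices (3 x 4, ordered to match UNIT_SQUARE)."""
    H = dlt_homography(angles(region_vertices, head), UNIT_SQUARE)
    g = H @ angles(gaze_point.reshape(3, 1), head)[:, 0]
    u, v = g[0] / g[2], g[1] / g[2]
    return 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0
```

Mapping to the unit square makes the inside/outside test a pair of interval checks, which is the design rationale behind equation 6.8.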

6.4 Experimental results

VIOV was tested on a highway and on an outer-city high curvature road. Example rendered output of the system is shown in figure 6.4 (extension 9 in appendix D) for the high curvature road. The light gray region shows where the driver's attention is focused. These results can be viewed interactively and analyzed using the matlab code and dataset in extension 10. A comparison of the yaw of the vehicle and the driver's gaze is shown in figure 6.5, while a comparison of the driver's focus of attention statistics is shown in figure 6.6. Figure 6.5 shows a strong correlation between the yaw of the vehicle (a good indication of the curvature of the road) and the driver's gaze direction. This suggests that this driver's gaze follows the curvature of the road. A similar result was found by Land (1992), who used a frame-by-frame visual analysis system to find that the driver often looks at the tangent of the road when cornering. Figure 6.6 shows the percentage of time the driver focuses on the different regions of interest in the scene. The classification Other is the percentage of time that the driver's gaze did not intercept any of the fields of interest. These graphs clearly

Figure 6.4: VIOV results: rendered output for a high curvature road. Each figure shows the rendered lane tracking results (top left), the view from within the cabin (top right), and the rendered integrated results (main). The rendered results show the head pose and eye gaze of the driver and the driver's focus of attention (the light gray region). (a) Focus of attention is to the left of the road on the horizon. (b) Focus of attention is in the far-field on the left lane. (c) Focus of attention is in the near-field on the left lane. (d) Focus of attention is on the horizon in the left lane.

show the repeatability of the experiments as well as particular characteristics of the experimental dataset. The road used for the four pie charts to the left has a predominantly right curvature in the north-easterly direction (column two of figure 6.6), shown by the greater proportion of time the driver spent focusing on the right lane as well as the right side of the road. Note that the classification Right Road in these pie charts indicates the lane that the on-coming traffic occupies. The larger fraction of time classified as Other in the south-westerly experiments is a result of the driver looking around the left curvature to such a high degree that his gaze does not intercept any of the listed regions. This does not appear

in the north-easterly experiments because the combined Right Road and Right regions are larger and hence capture more of his gaze. The amount of time the driver spent looking at the road in the highway experiment (right pie chart of figure 6.6) stayed approximately the same as with the high curvature roads, even though the road was actually smaller (no right lane). The time spent focusing on the left and right sides of the highway was approximately equal, which is to be expected considering the low curvature of the road.

Figure 6.5: VIOV results: comparison between the vehicle yaw and the driver's gaze in two tests along the same piece of road but in different directions. The dashed line is the yaw of the vehicle, while the solid line is the yaw of the driver's gaze. Note that the large peaks are generated by the driver's saccade movements to the left and right windows of the car. The driver's gaze closely follows the vehicle yaw.
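The yaw-versus-gaze comparison of figure 6.5 could be quantified along these lines; the saccade threshold and all names are assumptions for illustration, not the analysis actually performed:

```python
import numpy as np

def yaw_gaze_correlation(vehicle_yaw, gaze_yaw, saccade_thresh=0.5):
    """Pearson correlation between vehicle yaw and gaze yaw (radians), after
    masking out large saccades (e.g. glances to the side windows), which
    appear as the large peaks in figure 6.5."""
    v = np.asarray(vehicle_yaw, float)
    g = np.asarray(gaze_yaw, float)
    keep = np.abs(g - v) < saccade_thresh    # drop off-road saccades
    return float(np.corrcoef(v[keep], g[keep])[0, 1])
```

On gaze data that tracks the road curvature with occasional window glances, the masked correlation stays close to 1, consistent with the strong correlation reported above.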

6.5 Conclusions

An integrated visual driver monitoring and lane tracking system was successfully used to close the loop between vision inside and outside the vehicle for the first time. A strong correlation was found to exist between the eye gaze direction of the driver and the curvature of the road, and the repeatability of the experiments was high. A similar correlation has been found previously by Land (1992) using a manual frame-by-frame visual analysis of the scene. While the measured data enabled the estimation of the driver's focus of attention, a more informative visuo-attentional analysis could be made using a more detailed model of the external scene. A step towards this would be estimating the curvature of the

road and detecting obstacles such as cars and sign-posts. The applications of such a system are then numerous, possibly the most important being driver warning systems of the future.

Figure 6.6: VIOV results: the driver's focus of attention along a highway and a high curvature road. In the four pie charts to the left, the top and bottom rows present the data from the two different tests along the same high curvature road; column one is the south-west route and column two is the north-east route. The pie chart to the right is for the highway test.

6.6 Acknowledgments

I would like to thank the staff of Seeing Machines for their help with faceLAB™, particularly David Liebowitz, who was kind enough to spend time helping with the analysis of the data collected.

Chapter 7 Conclusion
There is no doubt that the advent of the automobile in 1914 revolutionized the way we travelled; however, the benefits associated with this revolution are contrasted dramatically by the toll that road related accidents impose on us both personally and socially. This dissertation identified the development of intelligent vehicles that are self-aware as a step towards combating this toll.

To aid the development of these vehicles, a new algorithm for target detection and tracking called distillation was developed. Distillation uses multiple cues to track a target in a multi-dimensional state space and controls the search using a particle filter. Finite computational resources are efficiently allocated across the cues, taking into account each cue's expected utility and resource requirement. The system can accommodate cues running at different frequencies, allowing cues performing less well to be run slowly in the background for added robustness with minimal additional computation.

A novel approach to lane tracking was then presented that uses the distillation algorithm to track the vehicle pose and the width of the road. It was found that the lane tracking system benefited greatly from the cue fusion and particle filtering technologies used, and it was shown to perform robustly in a number of situations that are often difficult for lane trackers.

The particle filter conferred a number of benefits suited to the task of lane tracking. First, a priori constraints, such as lane edges meeting at a vanishing point on the horizon and road hypotheses lying in the plane of the road, are indirectly incorporated into the algorithm through the hypothesis validation process of the particle filter. Instead of searching for a result like traditional lane trackers, the distillation algorithm tests a number of hypotheses that all satisfy the road model

construct. Second, these constraints, as well as the particles concentrating resources in areas of high probability, helped prune the erroneous solutions that have previously plagued lane trackers (i.e. oil marks and cracks on the road surface, rain on the windshield, departure lanes, construction sites, etc.).

Randomly redistributing a small number of the particles in the particle filter handled the kidnapped robot problem effectively. When the lane is lost, randomly dispersed particles quickly localize the lane and concentrate resources there. This characteristic of the particle filter also aided the initial convergence of the algorithm.

It was found that the selection of cues strongly influences the outcome of the algorithm. For example, since no cues were particularly sensitive to dramatic changes in lighting, the traditionally difficult scenario of entering a tunnel does not pose a problem to the tracker. Having a wide selection of cues conferred the additional benefit that changes in road type and disappearing lane markers (or weak lane marker signals) are handled elegantly at a fundamental level of the algorithm. Scene changes do not have to be explicitly detected and handled separately. Cues that are better suited to the scene will take priority over, but will not replace, other cues, based on their performance with respect to the fused posterior distribution.

The distillation lane tracker performed well in comparison with previous lane trackers, achieving ground truth success rates between 91.4% and 100% in scenarios ranging from clear highways to high curvature roads in the rain. Prior art systems achieved between 94% and 98.2% autonomous operation over large distances. A direct quantitative comparison between systems is impossible, since each group uses a different dataset, has its own definition of autonomous operation, and embeds its lane tracker within an autonomous agent responsible for driving on public roads.
Therefore, a dataset of images with vehicle state parameters was established as part of this project to encourage direct comparisons between lane trackers.

Finally, the lane tracking system was integrated with a driver monitor in a system called VIOV, which estimates the driver's focus of attention with respect to the road and successfully closes the loop between vision inside and outside the vehicle. A strong correlation was found to exist between the eye gaze direction of the driver and the curvature of the road, and the repeatability of the experiments was high. A similar correlation has been found previously by Land (1992) using

a manual frame-by-frame visual analysis of the scene.

Until there is a significant move away from single-cue tracking, there is little hope of achieving the standard of robustness required for commercial acceptance of lane trackers. It is hoped that this dissertation has not only provided a framework for multiple-cue lane tracking, but has also motivated the importance of using multiple complementary modalities for lane tracking and stimulated research in this area.
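The random redistribution of a small fraction of particles, described above as the remedy for the kidnapped robot problem, can be sketched as follows; the systematic-resampling base and all names are illustrative, not the thesis implementation:

```python
import numpy as np

def resample_with_reinjection(particles, weights, state_lo, state_hi,
                              frac_random=0.05, rng=None):
    """Systematic resampling in which a small fraction of particles is
    redistributed uniformly over the state bounds, so that a tracker that
    has lost the lane can quickly relocate it."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = particles.shape
    n_rand = int(frac_random * n)
    n_keep = n - n_rand
    # Systematic resampling of the surviving particles.
    w = np.asarray(weights, float)
    w = w / w.sum()
    positions = (rng.random() + np.arange(n_keep)) / n_keep
    idx = np.searchsorted(np.cumsum(w), positions)
    survivors = particles[idx]
    # Reinject uniformly distributed particles over the state bounds.
    randoms = rng.uniform(state_lo, state_hi, size=(n_rand, d))
    return np.vstack([survivors, randoms])
```

When the posterior collapses (the lane is lost), the reinjected particles provide hypotheses everywhere in the state space, which is the mechanism the summary credits for both recovery and initial convergence.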

Further Work
While the results presented in the previous chapter are promising, there are a number of improvements and directions that this research could take in the future.

Auto-calibration. Any autonomous agent that is expected to act in a dynamic environment must be able to automatically calibrate itself. In the system presented here, the calibration of the cameras was performed offline prior to experimentation. A significant improvement in both practicality and accuracy would be achieved by developing an auto-calibration procedure for the camera system. Many previous authors have used calibration grids on the road, with some more adventurous authors using calibration markers on the bonnet of the car. The latter approach would prove more valuable, as calibration could be performed on-the-fly even while tracking the road.

Sensor modelling. An important consideration when using a particle filter for tracking is accurate modelling of the sensors and their error characteristics for the evaluation of the sensor model. For efficiency, the error characteristics of the image sensors were modelled by blurring the probability maps with a Gaussian kernel once each iteration. This effectively smoothed the likelihood distribution for each cue. While this was found to aid convergence, another, more mathematically rigorous method should be investigated, such as that used by Thrun et al. (2001b).

Action sensors. Error characteristics of the action sensors were not modelled explicitly, but were incorporated into the algorithm by dispersing the particles according to Brownian motion after applying the action model. An advancement on this would be to learn the error characteristics of the action sensors, such as the tailshaft encoder and wheel angle sensor, through empirical experiment.

Cue performance evaluation. The Kullback-Leibler divergence is an information theoretic measure of the closeness of two distributions that was used to measure the distance between each cue's likelihood and the fused likelihood of all the cues. While the fused result is the best estimate we have of the true distribution, comparing a cue's distribution with it may not be the best measure of utility. For example, consider the case where one cue is very good at estimating the width of the road and the lateral offset of the vehicle, but is poor at estimating the yaw of the vehicle. It would produce a distribution that is very peaked on the road width and lateral offset axes, and very flat on the yaw axis. Now, consider another cue that is good at estimating the yaw, but poor at estimating the road width and lateral offset. The combination of both of these cues would produce an excellent result, yet both will differ significantly from the fused result. This artifact has not been investigated in great detail in this thesis, but should be considered for future research.

Particle filter. The particle filter proved extremely good at bootstrapping the algorithm and tracking the road; however, after convergence the distribution of particles was often close to a Gaussian distribution. This suggests that the non-Gaussian modelling capabilities provided by the particle filter may not be needed once the algorithm has converged. It would be interesting to experiment with using the particle filter as a detection apparatus and then using a Gaussian-based tracker such as an extended Kalman filter to track the road while the signal is strong. In times of uncertainty, the particle filter could be restarted to relocate the road, or could be run in the background as a form of top-down checking against tracking failure.

Cues. To increase the robustness of the algorithm to road type, more cues need to be developed for non-structured roads.
The colour cues could be improved as suggested in section 5.5.3 for this purpose. The colour road model could also be initialized to a uniform distribution and left to converge over time. A time constant could be added so that the distribution is smoother over time. In addition, a non-road colour model could be formed in a similar fashion to the road colour model and used for pixel classification. A further step would be to include cues that do not use the modality of vision, such as laser range finders and sonar.

Curvature estimation. The road model at the moment is limited to a linear region of road ranging from 8-26 m in front of the vehicle. The next step is to

incorporate curvature estimation. Experiments were performed with a curvature tracking system as outlined in appendix A; however, insufficient camera stability caused the system to fail. With camera stabilization, this system could be investigated in more detail. In addition, merging the curvature estimation with the vehicle pose estimation, so that all the parameters are estimated in the same step, would reduce the propagation of errors from one system to the other. The downside of this is that more particles, and hence more processing time, would be required to handle the higher-dimensional space. Taking advantage of the statistical measures that can be produced by a probabilistic method such as particle filtering, the confidence of the result could be used to set an adaptive lookahead distance for curvature estimation. As the confidence of the result increases, the lookahead distance can be extended. In times of uncertainty, the lookahead distance could be reduced to consolidate resources.
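The cue performance evaluation described above, the Kullback-Leibler divergence between a cue's likelihood and the fused likelihood, can be sketched in a few lines. This is an illustrative Python fragment, not the thesis implementation; the histograms below are hypothetical discretizations of the likelihoods over the state space.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same bins.

    eps guards against log(0) in empty bins; both inputs are assumed
    to be normalized (they sum to 1)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def normalize(w):
    s = float(sum(w))
    return [x / s for x in w]

# Hypothetical likelihood histograms over a coarse state grid: a cue that
# is sharply peaked on one axis can still be far, in KL terms, from the
# fused result -- the artifact discussed in the text.
cue = normalize([0.1, 8.0, 0.1, 0.1, 0.1])
fused = normalize([0.5, 4.0, 3.0, 0.5, 0.5])

distance = kl_divergence(cue, fused)
```

A cue whose distribution matches the fused distribution scores zero; larger values indicate disagreement, which, as argued above, does not necessarily mean the cue is poor.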



Appendix A Lane curvature tracking

The main focus of this dissertation was the development of a robust visual lane tracking system for intelligent vehicle experimentation. In chapter 5 a vehicle pose and road width tracking system was developed that was sufficient for simple intelligent systems such as the driver monitoring system developed in chapter 6; however, advanced systems will require more detailed descriptions of the surrounds, and the next step is curvature tracking. The system discussed below was implemented and tested; however, the hardware supporting the vision system was insufficient to handle the vibrations amplified by the far-field camera. Hence, a description of the system is presented here and possible extensions discussed, but no results are presented.

A.1 Integrated vehicle pose and curvature tracking

The architecture of the lane tracking system is made up of two distillation algorithm sub-systems that share common resources (figure A.1). The first sub-system detects and tracks the vehicle state relative to the road and the road width (chapter 5). Both the lateral offset of the vehicle with respect to the skeletal line of the lane, ysr, and the yaw of the vehicle, vs, are determined with the width of the lane, rw, in the first sub-system. This information is used to complete the transformation between the road central co-ordinate system and the image plane so that the curvature can be estimated in the second sub-system. The curvature is estimated in the form of a horizontal clothoid element. The clothoid describes the curvature of the road in the form of an initial curvature (c0) and a rate of change of curvature (c1).

Figure A.1: Complete lane tracking system with curvature estimation. First, the lane tracker estimates the pose of the vehicle and the road width (ysr, vs, rw) with the first distillation processor. Second, the curvature estimator uses the pose of the vehicle and the road width to calculate the curvature of the road (c0, c1) in the second distillation processor.

A.2 Curvature tracking system

This section describes the state space, motion models and sensor models used in the road curvature tracking system. The vehicle state and road width tracking system described in chapter 5 determines the missing vehicle state parameters required for the transformation of points from the road co-ordinate system into the image plane. The road width is also determined, so that a single clothoid of two parameters can be used to model the horizontal curvature of the road. A clothoid is a differential geometry curve that specifies the trajectory that the vehicle follows given a constant steer-rate at a constant velocity.

A.2.1 State space

The skeletal line of the road is given by the horizontal clothoid parameters c0h and c1h:

    c(l) = c0h + c1h l                                          (A.1)

where c(l) is the curvature of the road at arc length l. The curvature is equal to 1/R, where R is the instantaneous centre of curvature. The parameter c0h is the initial curvature at l = 0, and c1h is the rate of change of curvature with respect to l. The state space of the curvature tracking system is therefore

    xc = [c0h, c1h]^T.                                          (A.2)

Integrating c(l) with respect to l gives the heading angle θ(l) of the vehicle

    θ(l) = θ0 + ∫_0^l c(τ) dτ                                   (A.3)
         = c0h l + c1h l²/2.                                    (A.4)

Integrating again gives the x and y positions along the trajectory at arc length l

    x(l) = x0 + ∫_0^l cos(θ(τ)) dτ                              (A.5)
    y(l) = y0 + ∫_0^l sin(θ(τ)) dτ.                             (A.6)

Assuming a total trajectory heading change of less than 15°, equations A.5 & A.6 can be approximated by

    x(l) = x0 + l                                               (A.7)
    y(l) = y0 + c0h l²/2 + c1h l³/6.                            (A.8)

Therefore a point PR(l) along the skeletal line of the road is represented by x(l) and y(l) in equations A.7 & A.8 and heading θ(l) in equation A.4:

    PR(l) = [x(l), y(l), θ(l)]^T.                               (A.9)

A bird's-eye view of a clothoid in the road co-ordinate system is shown in figure A.2.
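The small-heading approximation of equations A.7 & A.8 can be checked numerically against the exact integrals of equations A.5 & A.6. The sketch below is illustrative only (the step size, curvature values and function names are mine, not the thesis code); it integrates the heading of equation A.4 with a midpoint rule and compares the result with the closed form.

```python
import math

def clothoid_exact(c0h, c1h, l, steps=20000):
    """Numerically integrate x(l) and y(l) (equations A.5 & A.6) for the
    clothoid heading theta(t) = c0h*t + c1h*t^2/2 (equation A.4), with
    x0 = y0 = 0, using a midpoint rule."""
    x = y = 0.0
    dt = l / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        theta = c0h * t + c1h * t * t / 2.0
        x += math.cos(theta) * dt
        y += math.sin(theta) * dt
    return x, y

def clothoid_approx(c0h, c1h, l):
    """Small-heading approximation (equations A.7 & A.8), x0 = y0 = 0."""
    return l, c0h * l ** 2 / 2.0 + c1h * l ** 3 / 6.0

# Gentle curvature over a 26 m lookahead: the total heading change is
# roughly 4 degrees, well inside the 15 degree assumption.
c0h, c1h, l = 1.0 / 400.0, 1e-5, 26.0
xe, ye = clothoid_exact(c0h, c1h, l)
xa, ya = clothoid_approx(c0h, c1h, l)
```

Under the 15° assumption the approximation error stays at the centimetre level over the full lookahead, which is why the closed form is safe to use in the road model.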

Figure A.2: A bird's-eye view of a clothoid road element with initial curvature c0h and rate of change of curvature c1h. Note that the curvature in this figure is exaggerated for clarity.

A.2.2 Curvature motion model

The update equations for the curvature motion model are founded on the differential geometry of the clothoid curves described previously. The term c0h is updated via equation A.1 over the arc length travelled since the last iteration, Δl, to get

    c0h,i+1 = c0h,i + c1h,i Δl                                  (A.10)

where c0h,i is the curvature and c1h,i is the rate of change of curvature at iteration i. The length of the arc travelled since the last iteration is

    Δl = v Δt cos(vs)                                           (A.11)

where vs is the yaw of the vehicle with respect to the road, Δt is the time that has expired between iterations and v is the velocity of the vehicle. The rate of change of c with respect to the arc length travelled, c1h, does not change with l according to the clothoid model:

    c1h,i+1 = c1h,i.                                            (A.12)

The particles are diffused using a normally distributed random number generator centered at (0, 0) with standard deviations (σc0h, σc1h):

    c0h = c0h,i+1 + randn(0, σc0h)                              (A.13)
    c1h = c1h,i+1 + randn(0, σc1h)                              (A.14)
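Per particle, the motion model above reduces to a few lines. The sketch below is a hedged illustration: the function name, noise magnitudes and vehicle state values are placeholders of mine, not the tuned values used in the thesis.

```python
import math
import random

def curvature_motion_update(c0h, c1h, v, dt, yaw,
                            sigma_c0h, sigma_c1h, rng=random):
    """One prediction step for a single particle (c0h, c1h).

    Deterministic part: equations A.10-A.12, with arc length
    dl = v*dt*cos(yaw) (A.11).  Diffusion: zero-mean Gaussian noise on
    both clothoid parameters (A.13 & A.14)."""
    dl = v * dt * math.cos(yaw)               # (A.11)
    c0h_new = c0h + c1h * dl                  # (A.10)
    c1h_new = c1h                             # (A.12)
    c0h_new += rng.gauss(0.0, sigma_c0h)      # (A.13)
    c1h_new += rng.gauss(0.0, sigma_c1h)      # (A.14)
    return c0h_new, c1h_new

# Example: 20 m/s at 25 Hz with a small yaw; the sigma values are
# illustrative placeholders only.
c0h, c1h = curvature_motion_update(0.002, 1e-5, 20.0, 0.04, 0.01,
                                   sigma_c0h=1e-4, sigma_c1h=1e-6)
```

With the noise set to zero the update is deterministic, which makes the prediction step easy to unit-test in isolation from the diffusion.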

A.2.3 Road model for curvature estimation

The road model and regions used by the cues for curvature estimation are fundamentally the same as the road model for vehicle state and road width estimation (section 5.4). The road edge cue still uses the road edge region, the lane marker cue uses the lane marker region, and the colour cues use both the road region and the non-road region. The difference is that a straight section of road in front of the vehicle is no longer the only part of the road tested. A clothoid element is generated using equations A.7 & A.8 to represent the skeletal line of the road in the road co-ordinate system, as shown in figure A.3. This element is broken up into five segments that each have the same width (Lsep). These five sections are then used to create five joined linear road models in the road co-ordinate system, each of which is defined in the same way as in section 5.4. An important aspect of this model is that the even spacing of the parallelogram elements in the road co-ordinate system produces a curve in image space that has a higher resolution of curve elements at depths further from the camera. The curvature of the road at greater depths is more pronounced in the image plane due to the perspective mapping and the angle of view of the road. Thus a higher concentration of road elements is needed to maintain accuracy in the far field.

Figure A.3: Curvature road model. The road model for curvature estimation is a combination of five linear segments created using a clothoid skeletal line with an initial curvature c0h and a rate of change of curvature c1h. Note that the curvature in this figure is exaggerated for clarity.
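Generating the endpoints of the five joined segments described above follows directly from equations A.4, A.7 & A.8. The following sketch is my own illustration, not the thesis code; the parameter names follow the figure, but the l_start, l_sep and curvature values are assumed placeholders.

```python
def clothoid_point(c0h, c1h, l):
    """Skeletal-line point at arc length l, small-heading form
    (equations A.7 & A.8 with x0 = y0 = 0) plus heading (A.4)."""
    x = l
    y = c0h * l ** 2 / 2.0 + c1h * l ** 3 / 6.0
    heading = c0h * l + c1h * l ** 2 / 2.0
    return x, y, heading

def segment_endpoints(c0h, c1h, l_start, l_sep, n_segments=5):
    """Endpoints of n joined linear road-model segments, evenly spaced
    in arc length along the clothoid skeletal line."""
    return [clothoid_point(c0h, c1h, l_start + i * l_sep)
            for i in range(n_segments + 1)]

# Five segments spanning an assumed 8-26 m lookahead ahead of the vehicle
# (curvature values are illustrative).
pts = segment_endpoints(0.002, 1e-5, l_start=8.0, l_sep=(26.0 - 8.0) / 5.0)
```

Even spacing in the road co-ordinate system maps, through the perspective projection, to denser curve samples in the far field of the image, which is the property noted in the text.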

A.2.4 Curvature cues

All the image-based cues discussed in section 5.4 are used for curvature estimation of the road. The cues behave in the same fashion as the vehicle state and road width estimation cues, except that the road model is generated using the c0h and c1h parameters of each particle, and the line segments from each of the five linear road models are summed to evaluate the sensor models.

Figure A.4: Vibration effects on the far-field camera. The two consecutive frames here show a 6 pixel vertical jump caused by slight vibrations in the car being amplified by the zoom on the far-field camera. Such vibrations made testing the curvature tracking system impossible.

A.3 Conclusions

The curvature tracking system was implemented and combined with the vehicle state and road width tracking system; however, it was found that the far-field camera was too unstable to provide a reliable image to test the curvature tracking system (figure A.4). In addition, active control of the vision system will be necessary to keep the lane in the image on high curvature roads (figure A.5). The combination of these two effects caused the curvature tracking system to fail. There are a number of ways to overcome these inadequacies. First, active stabilization using the inertial sensor on CeDAR, combined with image stabilization based on horizon detection and correction, could be used to reduce vibration effects. Second, curvature tracking could be implemented using the near-field camera. This would reduce the accuracy of the curvature estimates but would reduce the effects of vibrations significantly. In addition, estimation of the curvature could be integrated with the estimation of the vehicle pose and road width. This would increase the dimensionality of the problem, thus increasing the number of particles required in the particle filter, but would simplify the algorithm dramatically while removing the possibility of propagating errors from one sub-system to the other.

Figure A.5: Disappearing lane edges in curvature tracking. Active vision is required to keep lanes centred during curvature tracking using the far-field camera configuration.


Appendix B Vehicle calibration

Below is the vehicle calibration data used in the coordinate system transformations discussed in section 5.1.2, as well as the camera calibration data from section 5.3.2.

    ysr (m)   zsr (m)   vs (°)    bv (°)   xbv (m)   zbv (m)
    -0.212    -0.365    -0.0045   0.0      -1.6      -1.255

Table B.1: Vehicle calibration.

              θcib (°)   φcib (°)   xcib (m)   ycib (m)   zcib (m)
    Camera 1  0.0        0.036      -0.02      -0.15      0.025
    Camera 2  0.0        0.070      -0.02      0.15       0.025

Table B.2: Extrinsic camera calibration.

              fx (mm)   fy (mm)   x0 (pixels)   y0 (pixels)
    Camera 1  5.5955    5.5985    172.0103      107.9645
    σ1        0.01257   0.0130    5.0411        3.0329
    Camera 2  16.0039   15.7260   152.0712      97.3408
    σ2        0.0498    0.0498    5.6237        6.0153

Table B.3: Intrinsic camera calibration. The standard deviation of parameter i is σi.


Appendix C Enlarged figures

Figure C.1 ((a) Lateral offset, (b) Yaw, (c) Road width): Enlargement of figures 5.30(b)-(d), which show a comparison between the experimental and baseline results on a highway with clear markings.

Figure C.2 ((a) Lateral offset, (b) Yaw, (c) Road width): Enlargement of figures 5.31(b)-(d), which show a comparison between the experimental and baseline results on a high curvature road.

Figure C.3 ((a) Lateral offset, (b) Yaw, (c) Road width): Enlargement of figures 5.32(b)-(d), which show a comparison between the experimental and baseline results on a high curvature road in the rain.

Figure C.4 ((a) Lateral offset, (b) Yaw, (c) Road width): Enlargement of figures 5.33(b)-(d), which show a comparison between the experimental and baseline results on a high curvature road in the rain.

Appendix D Multimedia extensions

The table below presents the multimedia extensions that can be found on the CD accompanying this dissertation.

Extension 1, Video: Overview of the vision systems on TREV.

Extension 2, Video: Example lane tracking sequence from a high curvature road showing the convergence of particles onto the lane location.

Extension 3, Video: Lane tracking sequence along a highway with clear lane markings, shadows and several run-off lanes.

Extension 4, Video: Lane tracking sequence along a high curvature outer city road showing dramatic lighting changes, a high level of shadows and discontinuous lane markings.

Extension 5, Video: Lane tracking sequence along a high curvature outer city road in medium rain.

Extension 6, Video: Lane tracking sequence along a highway with poor lane markings and harsh shadows.

Extension 7, Data/Code: Example dataset from the highway lane tracking scenario containing the baseline, the respective image set and a transformation framework between the road-centric coordinate system and the image coordinate system. Extract the data and code using your favourite archiving utility and use am_skeleton.m as a template file for your code.

Extension 8, Video: Example faceLAB™ sequence showing head pose and eye gaze tracking.

Extension 9, Video: The integrated driver and road scene monitoring system is shown in a 3 minute sequence around a high curvature outer city road.

Extension 10, Data/Code: Example dataset from a real-world test as well as the visualization software that integrates the lane tracking results with the head pose and eye gaze vectors from faceLAB™. (The faceLAB™ Toolbox for matlab was kindly supplied by Seeing Machines; I have extended it, with the help of David Liebowitz, to use dynamic patch data.) Extract the data and code using your favourite archiving utility and run the run_viov script in a matlab shell to visualize the data. The data viewer is a modified version of the threed browser that comes with FAT v1.0.

Extension 11, Video: Cues for lane tracking.

Table D.1: Index into the multimedia extensions.