AI Magazine Volume 25 Number 2 (2004) (© AAAI)

Dynamic Vision-Based Intelligence
Ernst D. Dickmanns

■ A synthesis of methods from cybernetics and AI yields a concept of intelligence for autonomous mobile systems that integrates closed-loop visual perception and goal-oriented action cycles using spatiotemporal models. In a layered architecture, systems dynamics methods with differential models prevail on the lower, data-intensive levels, but on higher levels, AI-type methods are used. Knowledge about the world is geared to classes of objects and subjects. Subjects are defined as objects with additional capabilities of sensing, data processing, decision making, and control application. Specialist processes for visual detection and efficient tracking of class members have been developed. On the upper levels, individual instantiations of these class members are analyzed jointly in the task context, yielding the situation for decision making. As an application, vertebrate-type vision for tasks in vehicle guidance in naturally perturbed environments was investigated with a distributed PC system. Experimental results with the test vehicle VAMORS are discussed.

During and after World War II, the principle of feedback control became well understood in biological systems and was applied in many technical disciplines for relieving humans of boring workloads in system control and introducing automatic system behavior. Wiener (1948) considered it to be universally applicable as a basis for building intelligent systems and called the new discipline cybernetics (the science of systems control). After many early successes, these arguments soon were oversold by enthusiastic followers; at that time, many people realized that high-level decision making could hardly be achieved on this basis. As a consequence, with the advent of sufficient digital computing power, computer scientists turned to descriptions of abstract knowledge and created the field of AI (Miller, Galanter, and Pribram 1960; Selfridge 1959).¹ With respect to results promised versus those realized, a situation similar to that with cybernetics developed in the last quarter of the twentieth century.

In the context of AI, the problem of computer vision has also been tackled (see, for example, Marr [1982]; Rosenfeld and Kak [1976]; Selfridge and Neisser [1960]). The main paradigm initially was to recover three-dimensional (3D) object shape and orientation from single images or from a few viewpoints. In aerial or satellite remote sensing, the task was to classify areas on the ground and detect special objects. For these purposes, snapshot images taken under carefully controlled conditions sufficed. Computer vision was the proper name for these activities because humans took care of accommodating all side constraints observed by the vehicle carrying the cameras.

When technical vision was first applied to vehicle guidance (Nilsson 1969), separate viewing and motion phases with static image evaluation (lasting for many minutes on remote stationary computers) were initially adopted. Even stereo effects with a single camera moving laterally on the vehicle between two shots from the same vehicle position were investigated (Moravec 1983). In the early 1980s, digital microprocessors became sufficiently small and powerful, so that on-board image evaluation in near real time became possible. The Defense Advanced Research Projects Agency (DARPA) started its program entitled On Strategic Computing, in which vision architectures and image sequence interpretation for ground-vehicle guidance were to be developed (autonomous land vehicle [ALV]) (AW&ST 1986). These activities were also subsumed under the title computer vision. This term became generally accepted for a broad spectrum of applications, which makes sense as long as dynamic aspects do not play an important role in sensor signal interpretation.
For autonomous vehicles moving under unconstrained natural conditions at higher speeds on nonflat ground or in turbulent air, it is no longer the case that the computer "sees" on its own. The entire body motion as a result of control actuation and perturbations from the environment has to be analyzed based on information coming from many different types of sensors. Fast reactions to perturbations have to be derived from inertial measurements of accelerations and the onset of rotational rates because vision has a rather long delay time (a few tenths of a second) until the enormous amount of data in the image streams has been interpreted sufficiently well. This concept is proven in biological systems operating under similar conditions, such as the vestibular apparatus of vertebrates.

Figure 1. Test Vehicle VAMP for Highway Driving.
If internal models are available in the interpretation process for predicting future states (velocity, position, and orientation components) based on inertial measurements, two essential information cross-feed branches become possible: (1) own body pose need not be derived from vision but can be predicted from the actual state and inertially measured data (including the effects of perturbations), and (2) slow drift components, occurring when only inertial data are used for state prediction, can be compensated for by visual observation of stationary objects far away in the outside world. Thus, quite naturally, the notion of an extended presence is introduced because data from different points in time (and from different sensors) are interpreted in conjunction, taking additional delay times for control application into account. Under these conditions, it no longer makes sense to talk about computer vision. It is the vehicle with an integrated sensor and control system that achieves a new level of performance and becomes able to see, also during dynamic maneuvering. The computer is the hardware substrate used for data and knowledge processing.

The article is organized as follows: First, a review of the 4D approach to vision is given, and then dynamic vision is compared to computational vision, a term that is preferred in the AI community. After a short history of experiences with the second generation of dynamic vision systems according to the 4D approach, the design goals of the third-generation system are discussed, emphasizing dynamic aspects. A summary of the requirements for a general sense of vision in autonomous vehicles and the natural environment is then followed by the basic system design. Discussions of the realization of the concept on commercial off-the-shelf (COTS) hardware (a distributed PC system) and the experimental results with VAMORS conclude the article.

Review of the 4D Approach to Vision

This approach with temporal models for the motion processes to be observed was quite uncommon when our group started looking at vision for flexible motion control in the late 1970s and early 1980s. In the mid-1980s, it was realized that the full reward of this approach could be obtained only when a single model for each object was installed covering both visual perception and spatiotemporal (4D: integrated models in three-dimensional space and time, the fourth dimension) state estimation (Dickmanns 1987; Wuensche 1988). Also, the same model was to be used for computation of expectations on higher system levels for situation assessment and behavior decision, which includes the generation of hypotheses for intended behaviors of other subjects in standard situations. The big advantage of this temporal embedding of single-image evaluation is that no previous images need to be stored. In addition, 3D motion and object states can more easily be recovered by exploiting continuity conditions according to the 4D models used (3D shape and dynamic models for 3D motion behavior over time).

Joint inertial and visual sensing and recursive state estimation with dynamic models are especially valuable when only partially predictable motion has to be observed from a moving platform subject to perturbations.
This is the general case in real life under natural conditions in the outside world. It was soon realized that humanlike performance levels could be expected only by merging successful methods and components from both the cybernetics approach and the AI approach. For about a decade, the viability of isolated perception and motion control capabilities for single tasks in vehicle guidance has been studied and demonstrated (Dickmanns 1995; Dickmanns and Graefe 1988). Figure 1 shows one of the test vehicles used at the University of the Bundeswehr, Munich (UBM).

Basic knowledge had to be built up, and the lacking performance of electronic hardware did not yet allow for the realization of large-scale vision systems; this lack of performance was true for many of the subfields involved, such as frame grabbing, communication bandwidth, digital processing power, and storage capabilities.

At many places, special computer hardware with massively parallel (partially single-bit) computers has been investigated, triggered first by insight into the operation of biological vision systems (for example, in the DARPA project on strategic computing [AW&ST 1986; Klass 1985]). This line of development lost momentum with the rapid development of general-purpose microprocessors. Video technology available and driven by a huge market contributed to this trend. At least for developing basic vision algorithms and understanding characteristics of vision paradigms, costly hardware developments were no longer necessary. Today's workstations and PCs in small clusters allow investigating most of the problems in an easy and rapid fashion. Once optimal solutions are known and have to be produced in large numbers, it might be favorable to go back to massively parallel vision hardware.

It is more and more appreciated that feedback signals from higher interpretation levels allow making real-time vision much more efficient; these problems are well suited for today's microprocessors. Within a decade or two, technical vision systems might well approach the performance levels of advanced biological systems. However, they need not necessarily follow the evolutionary steps observed in biology. On the contrary, the following items might provide a starting level for technical systems beyond the one available for most biological systems: (1) high-level 3D shape representations for objects of generic classes; (2) homogeneous coordinate transformations and spatiotemporal (4D) models for motion; (3) action scripts (feed-forward control schemes) for subjects of certain classes; (4) powerful feedback control laws for subjects in certain situations; and (5) knowledge databases, including explicit representations of perceptual and behavioral capabilities for decision making.

As some researchers in biology tend to speculate, the conditions for developing intelligence might be much more favorable if tasks requiring both vision and locomotion control are to be solved. Because biological systems almost always have to act in a dynamic way in 3D space and time, temporal modeling is as common as spatial modeling. The natural sciences and mathematics have derived differential calculus for the compact description of motion processes. In this approach, the motion state is linked to temporal derivatives of the state variables, the defining property of which is that they cannot be changed instantaneously at one moment in time; their change develops over time. The characteristic way in which this takes place constitutes essential knowledge about "the world." Those variables in the description that can be changed instantaneously are called control variables. They are the basis for any type of action, whether goal oriented or not. They are also the ultimate reason for the possibility of free will in subjects when they are able to derive control applications from mental reasoning with behavioral models (in the fuzzy diction mostly used). It is this well-defined terminology, and the corresponding set of methods from systems dynamics, that allows a unified approach to intelligence by showing the docking locations for AI methods neatly supplementing the engineering approach.

Recursive estimation techniques, as introduced by Kalman (1960), allow optimal estimation of the state variables of a well-defined object under noisy conditions when only some output variables can be measured. In monocular vision, direct depth information is always lost; nonetheless, 3D perception can be realized when good models for ego motion are available (so-called motion stereo). This fact has been the basis for the success of the 4D approach to image sequence analysis (Dickmanns and Wünsche 1999).
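To make the recursive estimation idea concrete, here is a minimal scalar Kalman filter in C++ (an illustrative sketch under simplifying assumptions, a one-dimensional random-walk state, not code from the UBM vehicles; all names are hypothetical):

```cpp
// Minimal scalar Kalman filter: estimate one state x from noisy measurements z.
// Hypothetical illustration of recursive estimation (Kalman 1960).
#include <cstdio>

struct ScalarKalman {
    double x;  // state estimate
    double p;  // estimate variance
    double q;  // process noise variance (trust in the dynamic model)
    double r;  // measurement noise variance (trust in the sensor)

    void predict(double u) {        // dynamic model: x' = x + u (e.g., odometry step)
        x += u;
        p += q;                      // uncertainty grows with each prediction
    }
    void update(double z) {         // measurement model: z = x + noise
        double k = p / (p + r);      // Kalman gain
        x += k * (z - x);            // correct with the prediction error (innovation)
        p *= (1.0 - k);              // uncertainty shrinks after a measurement
    }
};

int main() {
    ScalarKalman f{0.0, 1.0, 0.01, 0.25};
    const double z[] = {0.9, 1.1, 1.05, 0.95};   // noisy observations
    for (double zi : z) {
        f.predict(0.0);
        f.update(zi);
        std::printf("estimate = %.3f (var %.4f)\n", f.x, f.p);
    }
}
```

The ratio of q to r here corresponds to the balance of trust between the dynamic model and the measurement model that is discussed again later.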
First applications in the early 1980s had been coded for systems of custom-made (8- to 32-bit) microprocessors in Fortran for relatively simple tasks. Driving at speeds as high as 60 miles per hour (mph) on an empty highway was demonstrated in 1987 with the 5-ton test vehicle VAMORS; both lateral control (lane keeping) and longitudinal control (adapting speed to road curvature) were done autonomously with half a dozen Intel 8086 microprocessors (Dickmanns and Graefe 1988).
Second-generation vision systems by our group (around 1990 to 1997) were implemented in the programming language C on "transputers" (microprocessors with four parallel links to neighboring ones in a network). Bifocal vision with two cameras, coaxially arranged on gaze control platforms, has been the standard (Schiehlen 1995). The focal lengths of their lenses differed by a factor of 3 to 4 so that a relatively large field of view was obtained nearby, but in a correspondingly smaller field of view, higher-resolution images could be obtained. In the framework of the PROMETHEUS Project (1987–1994), more complex tasks were solved by using about four dozen microprocessors. For information on road and lane recognition, see Behringer (1996) and Dickmanns and Mysliwetz (1992); for information on observing other vehicles in the front and rear hemispheres, see Thomanek (1996) and Thomanek, Dickmanns, and Dickmanns (1994). Another dozen transputers were in use for conventional sensor data processing, situation assessment, and control output (throttle, brakes, and steering of the car as well as gaze control for the active vision system).

At the final demonstration of the EUREKA project, two Mercedes 500 SEL sedans (VAMP [figure 1] and VITA [the Daimler-Benz twin vehicles]) showed the following performance in public three-lane traffic on Autoroute A1 near Paris: (1) lane keeping at speeds as high as 130 km/h; (2) tracking and relative state estimation of (as many as) six vehicles in each hemisphere in the autonomous vehicle's and the two neighboring lanes; (3) transition to convoy driving at speed-dependent distances to the vehicle in front; and (4) lane changing, including decision making about whether the lane changes were safe (Dickmanns 1995; Dickmanns et al. 1994).

In figure 2, a summary is given of the general system concept as derived from ground and air vehicle (Schell and Dickmanns 1994) applications (except the upper right part, which is more recent). The lower left part, including inertial sensing, was developed for air vehicles first and added later to the ground vehicle systems. The lower part essentially works with systems engineering methods (recursive state estimation and automatic control). The vision part is based on computer graphics approaches adapted to recursive estimation (object-oriented coordinate systems for shape representation, homogeneous coordinate transformations [HCTs] for linking points in the image to object representation in 3D space, and simulation models for object motion). It should be noted that HCTs allow for easy implementation of spatial scaling by just one number, the scale factor. The series of arrows, shown directed upward in the center left of the figure, is intended to indicate that object data come from a distributed processor system. In the system developed for our industrial partner, even diverse makes with arbitrary programming methods could be docked here. The dynamic database (DDB, as it was called at that time; now the dynamic object database [DOB], shown by the horizontal bar, left) functioned as a distribution platform to all clients; in a transputer network, this function could be performed by any processing node. The upper part was not well developed; elements necessary for the few functions to be demonstrated in each test were programmed in the most efficient way (because of time pressure resulting from the tight project plan). On the basis of this experience, a third-generation system was to be developed in an object-oriented (C++) programming paradigm using commercial off-the-shelf PC systems and frame-grabbing devices, starting in 1997. It was designed for flexible use and for easy adding on of components as they become available on the market. In addition, system autonomy and self-monitoring were to achieve a new level.

The structure of several software packages for classes of objects to be perceived (these might be called "agents for perception") had proven to be efficient. These packages combine specific feature-extraction algorithms; generic shape models with likely aspect conditions for hypothesis generation; dynamic models for object motion; and recursive estimation algorithms, allowing the adjustment of both shape parameters and state variables for single objects. N of these objects can be instantiated in parallel. Communication is done through a dynamic object database (DOB) updated 25 times a second (video rate). More details on the system can be found in Gregor et al. (2001); Pellkofer (2003); Pellkofer, Lützeler, and Dickmanns (2001); Rillings and Broggi (2000); and Rieder (2000). On the higher system levels, more flexibility and easy adaptability in mission performance, situation assessment, and behavior decision were to be achieved.

Starting from activities of a physical unit (the body with its functioning subsystems) instead of beginning with abstract definitions of terms seems to be less artificial because the notion of verbs arises in a natural way. This start is achieved by introducing symbols for activities the system already can perform because of the successful engineering of bodily devices.
Figure 2. Dynamic Vision System with Active Gaze Stabilization and Control for Vehicle Guidance under Strongly Perturbed Conditions. Control engineering methods predominate in the lower part (below the knowledge representation bar) and AI-type methods in the upper part. (The diagram shows mission planning, situation assessment, and central decision on the AI level; behavior decision for gaze and attention (BDGA) and for locomotion (BDL); the dynamic object database (DOB) with time histories (ring buffers) and the dynamic knowledge representation (DKR); and, on the 4D level, recognition of objects and of the inertial ego state feeding feed-forward and feedback control of the gaze platform and the vehicle actuators.)
This abstraction of symbols from available behavioral capabilities provides the symbol grounding often complained to be missing in AI. It is interesting to note that Damasio (1999) advocates a similar shift for understanding human consciousness.

The rest of the article discusses the new type of expectation-based, multifocal, saccadic (EMS) vision system and how this system has led to the modified concept of technical intelligence. EMS vision was developed over the last six years with an effort of about 35 person-years. During its development, it became apparent that explicit representation of behavioral capabilities was desirable for achieving a higher level of system autonomy. This feature is not found in other system architectures such as those discussed by Arkin (1998) and Brooks and Stein (1994). Albus and Meystel (2001) use multiple representations on different grid scales; however, their approach misses the nice scalability effect of homogeneous coordinates in space and does not always provide the highest resolution along the time axis. In the existing implementation at UBM, explicit representation of capabilities on the mental level has been fully realized for gaze and locomotion control only. Several processes in planning and perception have yet to be adapted to this new standard. The right center part of figure 2 shows the functions implemented.

Dynamic versus Computational Vision
Contrary to the general approach to real-time vision adopted by the computer science and AI communities, dynamic vision right from the beginning embeds each single image into a time history of motion in 3D space for which certain continuity conditions are assumed to hold. Motion processes of objects are modeled, rather than objects that later on are subjected to motion. For initialization, this approach might seem impossible because one has to start with the first image anyway. However, this argument is not quite true. The difference lies in when to start working on the next images of the sequence and how to come up with object and motion hypotheses based on a short temporal sequence. Instead of exhausting computing power with possible feature combinations in a single image for deriving an object hypothesis (known as combinatorial explosion), recognizable characteristic groupings of simple features are tracked over time. Several hypotheses of objects under certain aspect conditions can be started in parallel. It is in this subtask of hypothesis generation that optical (feature) flow might help; for tracking, it loses significance. The temporal embedding in a hypothesized continuous motion then allows much more efficient pruning of poor hypotheses than when you proceed without it. These motion models have to be in 3D space (and not in the image plane) because only there are the Newtonian second-order models for motion of massive objects valid in each degree of freedom. Because of the second order (differential relationship) of these models, speed components in all six degrees of freedom occur quite naturally in the setup and will be estimated by the least squares fit of features according to the recursive estimation scheme underlying the extended Kalman filter. Prediction error feedback according to perspective mapping models is expected to converge for one object shape and motion hypothesis. There are several big advantages to this approach:

First, state prediction and forward-perspective mapping allow tuning of the parameters in the feature-extraction algorithms; this tuning might improve the efficiency of object recognition by orders of magnitude! It also allows implementing the idea of Gestalt in technical vision systems, as discussed by psychologists for human vision.

Second, in parallel to the nominal forward-perspective mapping, additional mappings are computed for slight variations in each single (hypothesized) value of the shape parameters and state components involved. From these variations, the Jacobian matrix of first-order derivatives describing how feature values in the image plane depend on the unknown parameters and states is determined. Note that because the model is in 3D space, there is very rich information available about how to adjust parameters to obtain the best-possible spatial fit. (If you have experienced the difference between looking at a monitor display of a complex spatial structure of connected rods when it is stationary and when it is in motion, you immediately appreciate the advantages.)

Third, checking the elements in each row and column of the Jacobian matrix in conjunction allows decision making about whether a state or shape parameter should be updated or whether continuing the evaluation of a feature makes sense. By making these decisions carefully, many of the alleged cases of poor performance of Kalman filtering can be eliminated. If confusion of features might occur, it is better to renounce using them at all and instead look for other features for which correspondence can be established more easily.

Fourth, characteristic parameters of stochastic noise in the dynamic and measurement processes involved can be used for tuning the filter in an optimal way; one can differentiate how much trust one puts in the dynamic model or the measurement model. This balance might even be adjusted dynamically, depending on the situation encountered (for example, lighting and aspect conditions and vibration levels).

Fifth, the (physically meaningful) states of the processes observed can be used directly for motion control. In systems dynamics, the best one can do in linear feedback control is to link corrective control output directly to all state variables involved (see, for example, Kailath [1980]). This linear feedback allows shifting the eigenvalues of the motion process arbitrarily (as long as the linearity conditions are valid).

All these items illustrate that closed-loop perception and action cycles do have advantages. In the 4D approach, these advantages have been exploited since 1988 (Wuensche 1988), including active gaze control since 1984 (Mysliwetz 1990). Visual perception for landing approaches under gusty wind conditions could not be performed successfully without inertial measurements for counteracting fast perturbations (Schell 1992). This feature was also adopted for ground vehicles (Werner 1997).
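As a sketch of the second advantage, the Jacobian of the forward-perspective mapping can be obtained by numeric differencing, as described above. The pinhole model and all names below are illustrative assumptions, not the EMS vision code:

```cpp
// Sketch: numeric Jacobian of a forward-perspective measurement model h(x)
// by central differencing.
#include <array>
#include <cstdio>

constexpr int NX = 6;                      // state: 3D position + 3D velocity
using State   = std::array<double, NX>;
using Feature = std::array<double, 2>;     // predicted image point (u, v)

// Simple pinhole projection of the object position (x, y, z); f in pixels.
Feature h(const State& x, double f = 750.0) {
    return { f * x[0] / x[2], f * x[1] / x[2] };
}

// dh/dx: perturb each state component slightly, re-run the mapping, difference.
std::array<State, 2> jacobian(const State& x, double eps = 1e-4) {
    std::array<State, 2> J{};              // 2 x NX matrix, row-major
    for (int j = 0; j < NX; ++j) {
        State xp = x, xm = x;
        xp[j] += eps; xm[j] -= eps;
        Feature fp = h(xp), fm = h(xm);
        for (int i = 0; i < 2; ++i)
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps);
    }
    return J;
}

int main() {
    State x{2.0, -0.5, 20.0, 0.0, 0.0, 0.0};      // 2 m right, 20 m ahead
    auto J = jacobian(x);
    std::printf("du/dx = %.2f px/m\n", J[0][0]);  // = f/z = 37.5 here
}
```

In the real system, one such Jacobian links each tracked feature set to the object's state and shape parameters; its rows and columns are exactly what the third advantage inspects.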
When both a large field of view close up and good resolution farther away in a limited region are requested (such as for high-speed driving and occasional stop-and-go driving in traffic jams or tight maneuvering on networks of minor roads), peripheral-foveal differentiation in visual imaging is efficient. Figure 3 shows the aperture angles required for mapping the same circular area (30 meter diameter) into a full image from different ranges.
Figure 3. Scaling of Lenses Required for Mapping the Same Area at Different Ranges into an Image of Given Size. (Aperture angles of about 60 degrees, 17.5 degrees, and 5.7 degrees map the 30-meter diameter area into a full image from ranges of 30, 100, and 300 meters, respectively.)

With approximately 760 pixels to an image row, at each of the distances given, about 4 centimeters normal to the optical axis are mapped into a single pixel. The wide-angle lens (60 degrees) would map 40 centimeters at a distance of 300 meters onto a single pixel. In this case, saccadic viewing direction control is advantageous because it trades about two orders of magnitude in data flow requirement for short delay times until the high-resolution imager has been turned to point to the area of special interest. These two-axis pointing devices do have their own dynamics, and image evaluation has to be interrupted during phases of fast rotational motion (blur effects); nonetheless, an approximate representation of the motion processes under observation has to be available. This requirement is achieved by a standard component of the recursive estimation process predicting the states with the dynamic models validated to the present point in time. This very powerful feature has to rely on spatiotemporal models for handling all transitions properly, which is way beyond what is usually termed computational vision. Dynamic vision is the proper term for this approach, which takes many different aspects of dynamic situations into account for perception and control of motion.
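As a quick check of these numbers (a worked example added here, not from the original): with N ≈ 760 pixels per row and horizontal aperture α, one pixel at range R covers approximately

$$ \Delta \approx \frac{2R\tan(\alpha/2)}{N}. $$

Mapping the 30-meter circle exactly onto the image row gives Δ ≈ 30 m / 760 ≈ 4 cm at every design range, whereas the 60-degree lens at R = 300 m spans 2 · 300 · tan 30° ≈ 346 m, that is, roughly 0.45 m for each pixel, in line with the 40 centimeters quoted above.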
Requirements for a General Sense of Vision for Autonomous Vehicles

The yardstick for judging the performance level of an autonomous vehicle is the human one. Therefore, the vehicle's sense of vision should at least try to achieve many (if not all) of the average human capabilities.

The most important aspects of human vision for vehicle control are as follows: First, a wide field of view (f.o.v.) nearby is needed; regions greater than 180 degrees, as with humans, are not required, but greater than about 100 degrees is desirable. Second, a small subarea of this f.o.v. needs to have much higher spatial resolution, such that at several hundred meters distance, an object of the size of about 0.1 meter in its smaller dimension should be detectable. Third, the pointing direction should be quickly controllable to track fast-moving objects and reduce motion blur, which is also required for compensating disturbances on the autonomous vehicle body. In case some new, interesting features are discovered in the wide f.o.v., the top speed of the saccades for shifting the viewing direction should limit delay times to a few tenths of a second (as with the human eye). Fourth, the capability of color vision should be available at least in part of the f.o.v. Fifth, there should be a high dynamic range with respect to light intensity distribution in the f.o.v. (in humans, up to 120 decibels). Sixth, for maneuvering in narrow spaces or reacting to other vehicles maneuvering right next to oneself (for example, lane changes), stereo vision is advantageous. Stereo ranges up to 10 meters might be sufficient for these applications. Under the flat-ground assumption, good monocular range estimation (accuracies of a few percent) to vehicles has been achieved reliably as long as the point where the vehicles touch the ground can be seen. Seventh, degradation of the vision process should be gradual if environmental conditions deteriorate (dusk, rain, snowfall, fog, and so on). Multiple cameras with different photometric and perspective mapping properties can help achieve robustness against changing environmental conditions.

Based on these considerations, an eye for road vehicles to be assembled from standard cameras and lenses was designed in 1995 for quick testing of the approach. It was dubbed the multifocal, active-reactive vehicle eye (MARVEYE); it is not considered a prototype for a product, just a research tool. Figure 4 shows some of the properties. It allows investigating and proving all basic requirements for vision system design; further development steps toward a product seem justified only in case it appears to be promising from a price-performance point of view.

Beside these specifications regarding sensor properties, equally important features regarding the knowledge-based interpretation process should be met. It relies on schemata for classes and relationships between their members. Objects of certain classes are known generically, and visual interpretation relies on tapping this source of knowledge, which encompasses both 3D shape and likely visual appearance as well as typical motion patterns over time.
Figure 4. Visual Ranges of the MARVEYE Concept with Four Cameras. (Focal lengths of 4.8, 16, and 50 millimeters yield fields of view of roughly 45, 23, and 7 degrees; divergent single-chip color cameras cover a wide field of view of more than 100 degrees, with trinocular stereo nearby and a high-sensitivity 3-chip color camera on the tele lens.)

In the generic models, some adaptable parameters, the actual aspect conditions, and the motion state variables have to be derived from actual observations.
Objects of certain classes do not follow simple laws of motion. Subjects capable of initiating maneuvers (actions) on their own form a wide variety of different classes and subclasses. Each subclass again can have a wide variety of members: All animals capable of picking up information from the environment by sensors and making decisions on how to control parts of their body or the whole body are members of these subclasses. Also, other autonomous vehicles and vehicles with human drivers inside form specific subclasses.

Most of our knowledge about the world (the mesoscale world of everyday life under discussion here) is geared to classes of objects and subjects as well as to individual members of these classes. The arrangement of individual members of these classes in a specific environment and a task context is called a situation.

Three areas of knowledge can be separated on a fine-to-coarse scale in space and time (table 1): (1) the feature/(single-object) level ("here and now" for the actual point in space and time, upper left corner); (2) the generic object (subject) class level, combining all possible properties of class members for further reference and communication (central row); and (3) the situation level, with symbolic representations of all single objects (subjects) relevant for decision making. In addition to their relative spatial arrangement and motion state, the maneuver activity they are actually involved in and the relative state within the maneuver should also be available (lower right corner in table 1).

On the feature level, there are two (partially) separate regimes for knowledge representation. First, what are the most reliable and robust features for object detection? For initialization, it is important to have algorithms available allowing an early transition from collections of features observed over a few images of a sequence (spanning a few tenths of a second at most) to hypotheses of objects in 3D space and time. Three-dimensional shape and relative aspect conditions also have to be hypothesized in conjunction with the proper motion components and the underlying dynamic model. Second, for continued tracking with an established hypothesis, the challenge is usually much simpler because temporal predictions can be made, allowing work with special feature-extraction methods and parameters adapted to the actual situation. This adaptation was the foundation for the successes of the 4D approach in the early and mid-1980s.

For detecting, recognizing, and tracking objects of several classes, specific methods have been developed and proven in real-world applications.
In the long run, these special perceptual capabilities should be reflected in the knowledge base of the system by explicit representations, taking their characteristics into account for the planning phase. These characteristics specify regions of interest for each object tracked. This feature allows flexible activation of the gaze platform carrying several cameras so that the information increase by the image stream from the set of cameras is optimized. These decisions are made by a process on the upper system level dubbed behavior decision for gaze and attention (BDGA, in the upper right of figure 2) (Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2003).

On the object level, all relevant properties and actual states of all objects and subjects instantiated are collected for the discrete point in time now and possibly for a certain set of recent time points (recent state history).

For subjects, the maneuvers actually performed (and the intentions probably behind them) are also inferred and represented in special slots of the knowledge base component dedicated to each subject. The situation-assessment level (see later discussion) provides these data for subjects by analyzing state-time histories and combining the results with background knowledge on maneuvers spanning certain time scales. For vehicles of the same class as oneself, all perceptual and behavioral capabilities available for oneself are assumed to be valid for the other vehicle, too, allowing fast in-advance simulations of future behaviors, which are then exploited on the situation level.

For members of other classes (parameterized), stereotypical behaviors have to be stored or learned over time (this capability still has to be implemented). This animation capability on the subject level is considered essential for deeply understanding what is happening around oneself and arriving at good decisions with respect to safety and mission performance.

Table 1. Multiple-Scale Recognition and Tracking of Features, Objects, and Situations in Expectation-Based, Multifocal, Saccadic (EMS) Vision.

Range in space (rows) versus range in time (columns): point in time ("here and now") | local time integral | extended time integrals
Features (point in 3D space; local differential environment): measurements: edge positions, regions, angles, curvatures | transition matrices for local objects; change of feature parameters | feature history
Objects (local space integrals): object states, feature distribution, shape | state transitions, new aspect conditions ("central hub") | state predictions, object state histories
Situations (maneuver space): local situation (usually not done) | one-step prediction of situation | multiple-step predictions; monitoring
Situations (mission space): actual global situation | (not applicable) | mission performance

On the situation level (lower right corner in table 1), the distinction between relevant and irrelevant objects-subjects has to be made, depending on the mission to be performed. Then, the situation has to be assessed, further taking a wider time horizon into account, and decisions for safely achieving the goals of the autonomous vehicle mission have to be taken. Two separate levels of decision making have been implemented: (1) optimizing perception performance (mentioned earlier as BDGA) (Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2001) and (2) controlling a vehicle (BDL [behavior decision for locomotion]) (Maurer 2000; Siedersberger 2003) or monitoring a driver. During mission planning, the overall autonomous mission is broken down into mission elements that can be performed in the same control mode. Either feed-forward control with parameterized time histories or feedback control depending on deviations from the ideal trajectory is applied for achieving the subgoal of each mission element; a sketch of both modes follows this paragraph. Monitoring of mission progress and the environmental conditions can lead to events calling for replanning or local deviations from the established plan. Safety aspects for the autonomous vehicle body and other participants always have a high priority in behavior decision. Because of unforeseeable events, a time horizon of only several seconds or a small fraction of a minute is usually possible in ground vehicle guidance. Viewing direction and visual attention have to be controlled so that sudden surprises are minimized. When special maneuvers such as lane changing or turning onto a crossroad are intended, visual search patterns are followed to avoid dangerous situations (defensive-style driving).
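To make the two control modes concrete, here is a minimal sketch in C++ (illustrative shapes and gains only, not the implemented control laws; all names are hypothetical):

```cpp
// Feed-forward: a parameterized control time history for a lane-change
// maneuver of width w [m] completed in time tm [s] (smooth cosine blend).
#include <cmath>

double lateralOffsetCmd(double t, double w, double tm) {
    if (t <= 0.0) return 0.0;
    if (t >= tm)  return w;
    return 0.5 * w * (1.0 - std::cos(3.14159265358979 * t / tm));
}

// Feedback: corrective steering from deviations from the ideal trajectory,
// here lateral offset y [m] and heading error psi [rad] for lane keeping.
double steerCmd(double y, double psi, double ky = 0.05, double kpsi = 0.8) {
    return -(ky * y + kpsi * psi);
}
```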
A replanning capability should be available as a separate unit accessible from the central decision part of the overall system (see figure 2, top right); map information on road networks is needed, together with quality criteria, for selecting route sequences (Gregor 2002).

The system has to be able to detect, track, and understand the spatiotemporal relationship to all objects and subjects relevant for the task at hand. To not overload the system, tasks are assumed to belong to certain domains for which the corresponding knowledge bases on objects, subjects, and possible situations have been provided.
Confinement to these domains keeps the knowledge bases more manageable. Learning capabilities in the 4D framework are a straightforward extension but still have to be added. It should have become clear that in the dynamic vision approach, closed-loop perception and action cycles are of basic importance, also for learning improved or new behaviors.

In the following section, basic elements of the EMS vision system are described first; then, their integration into the overall cognitive system architecture is discussed.

System Design

The system is object-oriented in two ways: First, the object-oriented programming paradigm as developed in computer science has been adopted in the form of the C++ language (grouping of data and methods relevant for handling of programming objects, exploiting class hierarchies). Second, as mentioned earlier, physical objects in 3D space and time with their dynamic properties, and even behavioral capabilities of subjects, are the core subject of representation.² Space does not allow a detailed discussion of the various class hierarchies of relevance.

Basic Concept

According to the data and the specific knowledge levels to be handled, three separate levels of visual perception can be distinguished, as indicated in figure 5 (right). Bottom-up feature extraction over the entire area of interest gives the first indications of regions of special interest. Then, an early transition to objects in 3D space and time is made. Symbols for instantiated objects form the central representational element (blocks 3 to 5 in figure 5). Storage is allocated for their position in the scene tree (to be discussed later) and the actual best estimates of their yet-to-be-adapted shape parameters and state variables (initially, the best guesses).

The corresponding slots are filled by specialized computing processes for detecting, recognizing, and tracking members of their class from feature collections in a short sequence of images (agents for visual perception of class members); they are specified by the functions coded. The generic models used on level 2 (blocks 3 and 4) consist of one subsystem for 3D shape description, with special emphasis on highly visible features. A knowledge component for speeding up processing is the aspect graph designating classes of feature distributions in perspective images. The second subsystem represents temporal continuity conditions based on dynamic models of nth order for object motion in space. In a video data stream consisting of regularly sampled snapshots, discrete dynamic models in the form of (n × n) transition matrices for the actual sampling period are sufficient. For rigid objects with 3 translational and 3 rotational degrees of freedom, these are 12 × 12 matrices because of the second-order nature of Newtonian mechanics. If the motion components are uncoupled, only 6 (2 × 2) blocks along the diagonal are nonzero. For practical purposes in noise-corrupted environments, usually the latter model is chosen; however, with a first-order noise model added in each degree of freedom, we have 6 blocks of (3 × 3) transition matrices.
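For one degree of freedom sampled with period T, these blocks take the familiar form shown below (a sketch: the (3 × 3) variant assumes the appended first-order noise state is a colored-noise acceleration with time constant τ; the article does not spell out the matrices):

$$
A_{2\times 2} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix},
\qquad
A_{3\times 3} = \begin{pmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & e^{-T/\tau} \end{pmatrix},
$$

acting on (position, speed) or (position, speed, acceleration) per degree of freedom; stacking six such blocks along the diagonal yields the transition matrix of a rigid object.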
By proper tuning of the noise-dependent parameters in the recursive estimation process, good results can in general be expected. If several different cameras are being used for recognizing the same object, this object is represented multiple times in the scene tree, initially with the specific measurement model. A supervisor module then has to fuse the model data or select the one to be declared valid for all measurement models. In the case of several sensors collecting data on the same object instantiated, one Jacobian matrix has to be computed for each object-sensor pair. These processes run at video rate. They implement all the knowledge on the feature level as well as part of the knowledge on the object level. This implies basic generic shape, the possible range of values for parameters, the most likely aspect conditions and their influence on feature distribution, and dynamic models for temporal embedding. For subjects with potential control applications, their likely value settings also have to be guessed and iterated.

As a result, describing elements of higher-level percepts are generated and stored (3D shape and photometric parameters as well as 4D state variables [including 3D velocity components]). They require several orders of magnitude less communication bandwidth than the raw image data from which they were derived. Although the data stream from 2 black-and-white and 1 (3-chip) color camera is in the 50 megabytes a second range, the output data rate is in the kilobyte range for each video cycle. These percepts (see arrows leaving the 4D level upward in figure 2 [center left] and figure 5 [into block 5]) are the basis for further evaluations on the upper system level. Note that once an instance of an object has been generated and measurement data are not available for one or a few cycles, the system might nonetheless continue its regular mission, based on predictions according to the temporal model available. The actual data in the corresponding object slots will then be the predicted values according to the model instantiated.
Trust in these predicted data will decay over time; therefore, these periods without new visual input are steadily monitored. In a system relying on saccadic vision, these periods occur regularly when the images are blurred because of high rotational speeds for shifting the visual area of high resolution to a new (wide-angle) image region of special interest. This fact clearly indicates that saccadic vision cannot be implemented without temporal models and that the 4D approach lends itself to saccadic vision quite naturally.
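A minimal sketch of this prediction-only coasting during a saccade (hypothetical names; the real system predicts the full spatiotemporal state):

```cpp
// "Coasting" an object slot through a saccade: while images are blurred, the
// slot is advanced by the dynamic model alone, and a confidence value decays
// until measurements resume.
struct TrackedObject {
    double x, v;          // state: position and speed (one degree of freedom)
    double confidence;    // trust in the current estimate, 0..1
};

void coast(TrackedObject& o, double dt, double decayPerSec = 0.5) {
    o.x += o.v * dt;                       // prediction with the dynamic model
    o.confidence -= decayPerSec * dt;      // trust decays without visual input
    if (o.confidence < 0.0) o.confidence = 0.0;
}
```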
Figure 5. Three Distinct Levels of Visual Perception. (1) Bottom-up detection of standard features. (2) Specific features for individual objects under certain 3D aspect conditions. (3) Time histories of state variables for recognizing maneuvers and mission elements (trajectory and intent recognition at larger spatial and temporal scales).

In a methodologically clean architecture, the blocks numbered 1 to 3 in the dark circular regions in figure 5 should be separated completely. Because of hardware constraints in frame grabbing and communication bandwidth, they have been lumped together on one computer in the actual implementation. The elliptical dark field on the shaded rectangular area at the left-hand side (1–3a) indicates that object recognition based on wide-angle images is performed directly with the corresponding features by the same processors used for bottom-up feature detection in the actual implementation. In the long run, blocks 1 and 2 are good candidates for special hardware and software on plug-in processor boards for a PC.

The 4D level (2) for image processing (labeled 3 and 4 in the dark circles) requires a completely different knowledge base than level 1. It is applied to (usually quite small) special image regions discovered in the feature database by interesting groups of feature collections. For these feature sets, object hypotheses in 3D space and time are generated.
These 4D object hypotheses with specific assumptions for range and other aspect conditions allow predictions of additional features to be detectable with specially tuned operators (types and parameters), thereby realizing the idea of Gestalt recognition. Several hypotheses can be put up in parallel if computing resources allow it; invalid ones will disappear rapidly over time because the Jacobian matrices allow efficient use of computational resources.

Determining the angular ego state from inertial measurements allows a reduction of search areas in the images, especially in the tele-images. The dynamic object database (DOB, block 5 in figure 5) serves the purpose of data distribution; copies of this DOB are sent to each computer in the distributed system. The DOB isolates the higher system levels from the high visual and inertial data rates to be handled in real time on the lower levels. Here, only object identifiers and best estimates of state variables (in the systems dynamics sense), control variables, and model parameters are stored.
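One entry of such a database might be sketched as follows (a hypothetical illustration based on the items just listed, not the EMS vision declarations):

```cpp
// Sketch of one entry in the dynamic object database (DOB): identifier, best
// state estimate, control variables, and model parameters, refreshed at the
// 25 Hz video rate. Field names and sizes are assumptions.
#include <array>
#include <cstdint>
#include <vector>

struct DobEntry {
    std::uint32_t          objectId;     // unique identifier of the instantiated object
    std::uint32_t          classId;      // generic class (road, car, truck, ...)
    double                 timeStamp;    // time of validity of the estimates [s]
    std::array<double, 12> state;        // best estimate: 6 pose + 6 rate components
    std::vector<double>    shapeParams;  // adapted generic 3D shape parameters
    std::vector<double>    controls;     // hypothesized control variables (subjects only)
};

// The DOB itself is a snapshot container distributed to all client processes
// each video cycle; buffering recent snapshots yields the limited time
// histories used on the situation level.
using Dob = std::vector<DobEntry>;
```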
By looking at time series of these variables (of limited extension), the objects most interesting for the autonomous vehicle and their likely future trajectories have to be found for decision making. For this temporally deeper understanding of movements, the concept of maneuvers and mission elements for achieving some goals of the subjects in control of these vehicles has to be introduced. Quite naturally, this requirement leads to explicit representation of behavioral capabilities of subjects for characterizing their choices in decision making. These subjects might activate gaze control and select the mode of visual perception and locomotion control. Realization of these behavioral activities in the physical world is usually performed on special hardware (real-time processors) close to the physical device (see figure 2, lower right). Control engineering methods such as parameterized stereotypical feed-forward control-time histories or (state- or output-) feedback control schemes are applied here. However, for achieving autonomous capabilities, an abstract (quasistatic) representation of these control modes and the transitions possible between them has been implemented recently on the upper system levels by AI methods according to extended state charts (Harel 1987; Maurer 2000a; Siedersberger 2003). This procedure allows more flexible behavior decisions, based also on maneuvers of other subjects supposedly under execution. In a defensive style of driving, this recognition of maneuvers is part of situation assessment. For more details on behavior decision for gaze and attention (BDGA), see Pellkofer (2003) and Pellkofer, Lützeler, and Dickmanns (2001).

The upper part of figure 5 is specific to situation analysis and behavior decision. First, trajectories of objects (proper) can be predicted into the future to see whether dangerous situations can develop, taking the motion plans of the autonomous vehicle into account. For this reason, not just the last best estimate of state variables is stored; on request, a time history of a certain length can also be buffered. Newly initiated movements of subjects (maneuver onsets) can be detected early (Caveney, Dickmanns, and Hedrik 2003). A maneuver is defined as a standard control time history resulting in a desired state transition over a finite period of time. For example, if a car slightly in front in the neighboring lane starts moving toward the lane marking that separates its lane from the one of the autonomous vehicle, the initiation of a lane-change maneuver can be concluded. In this maneuver, a lateral offset of one lane width is intended within a few seconds. This situational aspect of a started maneuver is stored by a proper symbol in an additional slot; in a distributed realization, pointers can link this slot to the other object properties.
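A toy version of such a maneuver-onset test might look as follows (thresholds and names are invented for illustration):

```cpp
// Flag the onset of a lane-change maneuver of an observed car from the recent
// time history of its estimated lateral offset y (meters, relative to its lane
// center; positive toward our lane), sampled at the 25 Hz video rate.
#include <cstddef>
#include <vector>

bool laneChangeOnset(const std::vector<double>& y, double dt = 0.04) {
    const std::size_t n = y.size();
    if (n < 10) return false;                  // need roughly 0.4 s of history
    // Mean lateral rate over the last 10 samples (crude finite difference).
    double vy = (y[n - 1] - y[n - 10]) / (9 * dt);
    // Sustained drift toward the marking plus an offset already built up.
    return vy > 0.3 && y[n - 1] > 0.4;         // [m/s], [m]: tuning guesses
}
```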
A more advanced application of this technique is to observe an individual subject over time and keep statistics on its way of behaving. For example, if this subject tends to frequently close in on the vehicle in front at unreasonably small distances, the subject might be classified as aggressive, and one should try to keep a safe distance from him or her. A great deal of room is left for additional extensions in storing knowledge about behavioral properties of other subjects and learning to know them not just as class members but as individuals.

Arranging of Objects in a Scene Tree

To classify the potentially large decision space for an autonomous system into a manageable set of groups, the notion of situation has been introduced by humans. The underlying assumption is that for given situations, good rules for arriving at reasonable behavioral decisions can be formulated. A situation is not just an arrangement of objects in the vicinity but depends on the intentions of the autonomous vehicle and those of other subjects. To represent a situation, the geometric distribution of objects-subjects with their velocity components, their intended actions, and the autonomous vehicle's actions all have to be taken into account.

The relative dynamic state of all objects-subjects involved is the basic property for many aspects of a situation; therefore, representing this state is the central task.
In the 4D approach, Dirk Dickmanns (1997) introduced the so-called scene tree for achieving this goal in an efficient and flexible way. In figure 6, an example is given for a road scene with other vehicles and shadows. As in computer graphics, each node represents an object, a separately movable subobject, or a virtual object such as a coordinate system or the frame grabber. The homogeneous coordinate transformations (HCTs), using 4 × 4 matrices and represented by graph edges, are exactly as used in computer graphics; however, in computer vision, many of the variables entering the transformation matrices between two elements are the unknowns of the problem and have to be iterated separately. By postponing concatenation and forming the derivatives of each matrix with respect to the variables involved, the elements of the first-order overall derivative matrix linking changes of feature positions in the image to changes in the generic shape parameters or state variables can easily be computed. However, this process involves a heavy computational load. Both numeric differencing and analytic differentiation of each single HCT, followed by the concatenations required, have been tried and work well. From systematic changes of the full set of unknown parameters and state variables, the Jacobian matrix, rich in first-order information on all spatial relationships, is obtained. Under ego motion, it allows motion stereo interpretation with a single camera.
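A minimal sketch of HCTs and their concatenation along one scene-tree branch (matrix layout and names are assumptions; the real system also forms the derivative of each edge matrix before concatenating, as described above):

```cpp
// 4 x 4 homogeneous coordinate transformations (HCTs) and their concatenation
// along scene-tree edges; illustrative sketch only.
#include <array>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

Mat4 translate(double x, double y, double z) {     // edge: pure translation
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 rotZ(double psi) {                            // edge: yaw rotation
    Mat4 m = identity();
    m[0][0] =  std::cos(psi); m[0][1] = -std::sin(psi);
    m[1][0] =  std::sin(psi); m[1][1] =  std::cos(psi);
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {           // concatenation of edges
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Example branch, object -> road -> ego vehicle -> camera; the unknowns of
// vision (e.g., the yaw angle below) sit inside individual edge matrices:
// Mat4 T = mul(camFromEgo, mul(rotZ(psiEgo), roadFromObject));
```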
The lower left part of figure 6 shows all rele-
vant relationships within the vehicle carrying
Autonomous vehicle cg the three cameras. The perspective projection
position & orientation step P models the physics of light rays between
Fixed
(6 dof) objects in the outside world (road and other ve-
Platform base hicles) and the sensor elements on the imager
on chip Other chip. When transfer into computer memory is
Angles measured object, performed by the frame grabber, additional 2D
conventionally Local
Features “where”- (position) shifts can occur, represented by transformation
(2 rotations) in images problem FG. In the upper part of figure 6, the extended
(+1 translation
fixed)
road is represented, both with its nearby region
(shape yielding information on where the autonomous
P FG “what”- parameters)
problem vehicle is relative to it and with its more distant
parts connected to the local road by a generic
Moveable part curvature model (Dickmanns and Zapp 1986).
of platform (6 dof Vision
fixed) When absolute geographic information is
Frame grabbing offset
3 cameras needed, for example, for navigating with maps
Perspective projection or determining orientation from angles to the
Concatenated transformations Pi' Sun, corresponding elements of planetary
geometry have to be available. It is interesting
to note that just five HCTs and a few
astronomical parameters allow representing
Figure 6. Scene Tree for Visual Interpretation in Expectation-Based, Multifo-
cal, Saccadic Vision Showing the Transformations from the Object in Space the full set of possible shadow angles for any
to Features in the Image. point on a spherical Earth model at any arbi-
trary time. For a given point on Earth at a given
time, of course, only two angles describe the
direction to the Sun (azimuth and elevation
yielding actual, local shadow angles).
so-called scene tree for achieving this goal in an Other objects can be observed relative to the
efficient and flexible way. In figure 6, an exam- autonomous vehicle’s position or relative to
ple is given for a road scene with other vehicles their local road segment (or some other neigh-
and shadows. As in computer graphics, each boring object). The problem of where another
node represents an object, a separately mov- object is located essentially results from sum-
able subobject, or a virtual object such as a co- ming positions of features belonging to this
ordinate system or the frame grabber. Its ho- object. For understanding the type of object,
mogeneous coordinate transformations essentially differencing of feature positions
(HCTs), using 4  4 matrices and represented and shape interpretation are required. In bio-
by graph edges, are exactly as used in computer logical systems, these computations are done
graphics; however, in computer vision, many in separate regions of the brain, which is why
of the variables entering the transformation the naming of “where” and “what” problems
matrices between two elements are the un- has been introduced and adopted here (right
knowns of the problem and have to be iterated part of figure 6). The number of degrees of free-
separately. By postponing concatenation and dom of relative position and orientation mod-
forming the derivatives of each matrix with re- eled is shown at each edge of the scene tree; 6
spect to the variables involved, the elements of degrees of freedom (3 translations and 3 rota-
the first-order overall derivative matrix linking tions) is the maximum available for a rigid ob-
changes of feature positions in the image to ject in 3D space. (More details are given in
changes in the generic shape parameters or Dickmanns [1997] and Dickmanns and Wuen-
state variables can easily be computed. Howev- sche [1999].)
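To make these scene-tree mechanics concrete, the following sketch (illustrative code with assumed numbers, not the EMS vision implementation) chains one HCT with a perspective projection and obtains the Jacobian of a feature's pixel position by numeric differencing of the state variables, as described earlier:

```cpp
// Minimal sketch: one homogeneous coordinate transformation (HCT) maps
// a 3D feature point into the image; the Jacobian of its pixel position
// with respect to the unknown state variables is obtained by numeric
// differencing. A real scene tree concatenates many such HCTs.
#include <array>
#include <cmath>
#include <cstdio>

using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

// HCT for a planar pose (x, y, yaw); one edge of the scene tree.
Mat4 poseHCT(double x, double y, double yaw) {
    double c = std::cos(yaw), s = std::sin(yaw);
    return {{{c, -s, 0, x}, {s, c, 0, y}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
}

Vec4 apply(const Mat4& m, const Vec4& v) {
    Vec4 r{0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
    return r;
}

// Perspective projection step P: pinhole camera, x axis pointing forward.
void project(const Vec4& p, double f, double& u, double& v) {
    u = f * p[1] / p[0];  // lateral offset -> image column
    v = f * p[2] / p[0];  // height offset  -> image row
}

// Pixel position of a feature (given in object coordinates) as a
// function of the unknown state (relative x, y, yaw of the object).
void featurePixel(const double s[3], const Vec4& objPt,
                  double& u, double& v) {
    Mat4 T = poseHCT(s[0], s[1], s[2]);
    project(apply(T, objPt), 750.0, u, v);  // assumed focal length [px]
}

int main() {
    double s[3] = {15.0, 2.0, 0.1};  // state: 15 m ahead, 2 m left, yawed
    Vec4 pt{0.0, 0.8, 0.5, 1.0};     // feature point on the object
    const double eps = 1e-6;         // perturbation for differencing
    double u0, v0, J[2][3];
    featurePixel(s, pt, u0, v0);
    for (int k = 0; k < 3; ++k) {    // numeric differencing per variable
        double sp[3] = {s[0], s[1], s[2]};
        sp[k] += eps;
        double u1, v1;
        featurePixel(sp, pt, u1, v1);
        J[0][k] = (u1 - u0) / eps;
        J[1][k] = (v1 - v0) / eps;
        std::printf("d(u,v)/ds[%d] = (%8.3f, %8.3f)\n", k, J[0][k], J[1][k]);
    }
}
```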

Representation of Capabilities

Useful autonomous systems must be able to accept tasks and solve them on their own. For this purpose, they must have a set of perceptual and behavioral capabilities at their disposal. Flexible autonomous systems are able to recognize situations and select behavioral components that allow efficient mission performance. All these activities require goal-oriented control actuation. Subjects can be grouped into classes according to their perceptual and behavioral capabilities (among other characteristics such as shape), so that the scheme developed and discussed later can be used for characterizing classes of subjects. This scheme is a recent development; its explicit representation in the C++ code of EMS vision has been done for gaze and locomotion behavior (figure 7).

Figure 7. Capability Matrix as Represented in Explicit Form in the EMS Vision System for Characterizing Subjects by Their Capabilities in Different Categories of Action. (Columns: gaze control and locomotion control; rows, from top: mission requirements; goal-oriented use of all capabilities in the mission context; goal-oriented specific capabilities in each category; elementary skills, that is, automated capabilities, in each category; basic software and hardware preconditions.)

Two classes of behavioral capabilities in motion control are of special importance: (1) so-called maneuvers, in which special control time histories lead to a desired state transition in a finite (usually short) amount of time, and (2) regulating activities, realized by some kind of feedback control, in which a certain relative state is to be achieved and/or maintained; the regulating activities can be extended over long periods of time (like road running). Both kinds of control application are extensively studied in the control engineering literature: maneuvers in connection with optimal control (see, for example, Bryson and Ho [1975]) and regulating activities in the form of linear feedback control (Kailath [1980] is only one of many textbooks). Besides vehicle control for locomotion, both types are also applied in active vision for gaze control. Saccades and search patterns are realized with prespecified control time histories (topic 1 earlier), and view fixation on moving objects and inertial stabilization are achieved with proper feedback of visual or inertial signals (topic 2).
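The distinction between the two classes can be illustrated with a small sketch; the profile, gains, and signal names below are assumptions for illustration, not the EMS vision code:

```cpp
// Sketch of the two classes of behavioral capabilities (assumed
// numbers): a feed-forward maneuver time history and a feedback
// regulating activity.
#include <cmath>
#include <cstdio>

// (1) Maneuver: prespecified steering-angle time history for a lane
// change of duration T -- one full sine period steers left, then right,
// and ends straight, achieving the state transition in finite time.
double laneChangeSteerAngle(double t, double T, double amplitude) {
    if (t < 0.0 || t > T) return 0.0;
    return amplitude * std::sin(2.0 * 3.14159265358979 * t / T);
}

// (2) Regulating activity: linear feedback on lateral offset and heading
// error relative to the lane center ("road running"), applicable over
// arbitrarily long periods of time.
double roadRunningControl(double lateralOffset, double headingError) {
    const double kY = 0.08, kPsi = 0.9;  // illustrative feedback gains
    return -kY * lateralOffset - kPsi * headingError;
}

int main() {
    for (double t = 0.0; t <= 4.0; t += 1.0)
        std::printf("maneuver delta(%.0f s) = %+.3f rad\n",
                    t, laneChangeSteerAngle(t, 4.0, 0.05));
    std::printf("regulation u = %+.3f rad\n", roadRunningControl(0.3, -0.02));
}
```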
Complex control systems are realized on distributed computer hardware with specific processors for vehicle- and gaze-control actuation on the one side and for knowledge processing on the other. The same control output (maneuver, regulatory behavior) can be represented internally in different forms, adapted to the subtasks to be performed in each unit. For decision making, it is not required to know all data of the actual control output and the reactions necessary to counteract random disturbances as long as the system can trust that perturbations can be handled. However, it has to know which state transitions can be achieved by which maneuvers, possibly with favorable parameters depending on the situation (speed of vehicle, friction coefficients to the road, and so on). This knowledge is quasistatic and is typically handled by AI methods (for example, state charts [Harel 1987]).

Actual implementations of maneuvers and feedback control are achieved by tapping knowledge stored in software on the processors directly actuating the control elements (lower layers in figures 2, 7, and 8, right). For maneuvers, numerically or analytically coded time histories can be activated with proper parameters; the resulting nominal trajectories can also be computed and taken as reference for superimposed feedback loops dealing with unpredictable perturbations. For regulatory tasks, only the commanded (set) values of the controlled variables have to be specified. The feedback control laws usually consist of matrices of feedback gains and a link to the proper state or reference variables. These feedback loops can be realized with short time delays because measurement data are directly fed into these processors. This dual representation with different schemes on the level close to hardware (with control engineering–type methods)
and on the mental processing level (with quasistatic knowledge as suited for decision making) has several advantages. Besides the practical advantages for data handling at different rates, this representation also provides symbol grounding for verbs and terms. In road vehicle guidance, terms such as road running with constant speed or with speed adapted to road curvature, lane changing, convoy driving, or the complex (perception and vehicle control) maneuver for turning off onto a crossroad are easily defined by combinations or sequences of feed-forward and feedback control activation. For each category of behavioral capability, network representations can be developed showing how specific goal-oriented capabilities (third row from bottom of figure 7) depend on elementary skills (second row). The top layer in these network representations shows how specific capabilities in the different categories contribute to overall goal-oriented use of all capabilities in the mission context. The bottom layer represents necessary preconditions for the capability to be actually available; before activation of a capability, this fact is checked by the system through flags, which have to be set by the corresponding distributed processors. For details, the interested reader is referred to Pellkofer (2003) and Siedersberger (2003).
Overall Cognitive System Architecture

The components for perception and control of an autonomous system have to be integrated into an overall system, allowing both easy human-machine interfaces and efficient performance of the tasks for which the system has been designed. To achieve the first goal, the system should be able to understand terms and phrases as they are used in human conversations. Verbs are an essential part of these conversations. They have to be understood, including their semantics and the movements or changes in appearance of subjects when watching these activities. The spatiotemporal models used in the 4D approach should alleviate the realization of these requirements in the future.

In general, two regimes in recognition can be distinguished quite naturally: First are observations to grasp the actual kind and motion states of objects or subjects as they appear here and now, which is the proper meaning of the phrase to see. Second, when an object-subject has been detected and recognized, further observations on a larger time scale allow deeper understanding of what is happening in the world. Trajectories of objects can tell something about the nature of environmental perturbations (such as a piece of paper moved around by wind fields), or the route selected by subjects might allow inferring pursued intentions. In any case, these observations can affect autonomous vehicle decisions.

Recent publications in the field of research on human consciousness point in the direction that the corresponding activities are performed in different parts of the brain in humans (Damasio 1999).

Figure 2 shows the overall cognitive system architecture as it results quite naturally when exploiting recursive estimation techniques with spatiotemporal models for dynamic vision (4D approach). However, it took more than a decade and many dissertations to get all elements from the fields of control engineering and AI straightened out, arranged, and experimentally verified such that a rather simple but very flexible system architecture resulted. Object-oriented programming and powerful modern software tools were preconditions for achieving this goal with moderate investment of human capital. An important step in clarifying the roles of control engineering and AI methods was to separate the basic recognition step for an object-subject here and now from accumulating more in-depth knowledge on the behavior of subjects derived from state variable time histories.

The horizontal bar halfway up in figure 2, labeled "Dynamic Knowledge Representation (DKR)," separates the number-crunching lower part, working predominantly with systems dynamics and control engineering methods, from the mental AI part on top. It consists of three (groups of) elements: (1) the scene tree for representing the properties of objects and subjects perceived (left); (2) the actually valid task (mission element) for the autonomous vehicle (center); and (3) the abstract representations of the autonomous vehicle's perceptual and behavioral capabilities (right).

Building on this wealth of knowledge, situation assessment is done on the higher level independently for gaze and vehicle control because MARVEYE with its specialized components has more precise requirements than just vehicle guidance. It especially has to come up with the decision when to split attention over time and perform saccadic perception with a unified model of the scene (Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2001). A typical example is recognition of a crossroad with the telecamera, requiring alternating viewing directions to the intersection and further down into the crossroad (Lützeler 2002). If several specialist visual recognition processes request contradicting regions of attention, which cannot be satisfied by the usual perceptual capabilities, the central decision unit has to set priorities in
the mission context. Possibly, it has to modify vehicle control such that perceptual needs can be matched (see also figure 2, upper right); there is much room for further developments.

The results of situation assessment for vehicle guidance in connection with the actual mission element to be performed include some kind of prediction of the intended autonomous vehicle trajectory and of several other ones for which critical situations might develop. Background knowledge on values, goals, and behavioral capabilities of other vehicles allows assuming intentions of other subjects (such as a lane-change maneuver when observing systematic deviations to one side of its lane). On this basis, fast in-advance simulations or quick checks of lookup tables can clarify problems that might occur when the autonomous vehicle maneuver is continued. Based on a set of rules, the decision level can then trigger transition to a safer maneuver (such as deceleration or a deviation to one side).
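A rule of this kind might look as follows; the predicted quantities and thresholds are illustrative assumptions, not values from the project:

```cpp
// Hedged sketch: a decision rule that checks an in-advance simulation
// of the current maneuver and triggers a transition to a safer one.
// All names and thresholds are assumptions for illustration.
#include <algorithm>
#include <cstdio>

// Stand-in for a fast in-advance simulation: predicted minimum distance
// [m] to another vehicle if the current maneuver is simply continued.
double predictedMinDistance(double gap, double closingSpeed, double horizon) {
    return std::max(0.0, gap - closingSpeed * horizon);
}

int main() {
    double gap = 25.0, closingSpeed = 4.0;  // observed relative state
    double dMin = predictedMinDistance(gap, closingSpeed, 5.0);
    const double safeDistance = 8.0;        // rule threshold [m]
    if (dMin < safeDistance)
        std::printf("rule fired: transition to deceleration maneuver\n");
    else
        std::printf("continue current maneuver (d_min = %.1f m)\n", dMin);
}
```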
The implementation of a new behavior is done on the 4D level (lower center right in figure 2), again with control engineering methods allowing the prediction of typical frequencies and damping of the eigenmodes excited. Designing these control laws and checking their properties is part of conventional engineering in automotive control. What has to be added for autonomous systems is (1) specifying situations when these control laws are to be used with which parameters and (2) storing actual results of an application together with some performance measures about the quality of resulting behavior. The definition of these performance measures (or pay-off functions, possibly geared to more general values for judging behavior) is one of the areas that needs more attention. Resulting maximum accelerations and minimal distance to obstacles encountered or to the road shoulder might be some criteria.

Realization with Commercial Off-the-Shelf Hardware

The EMS vision system has been implemented on two to four dual-Pentium PCs with a scalable coherent interface (SCI) for communication rates up to 80 megabytes a second and with NT as the operating system. This simple solution has only been possible because two transputer subsystems (having no operating system at all) have been kept as a hardware interface to conventional sensors and actuators from the previous generation (lower layers in figure 8). Spatiotemporal modeling of processes allows for adjustments to variable cycle times and various delay times in different branches of the system.

Figure 8. Distributed Commercial Off-the-Shelf (COTS) Computer System with Four Dual PCs, Three Frame Grabbers, and Two Transputer Systems on Which Expectation-Based, Multifocal, Saccadic Vision Has Been Realized and Demonstrated.

Three synchronized frame grabbers handle video data input for as many as 4 cameras at a
rate of as high as 50 megabytes a second. The first one transfers only 2 video fields every 40 milliseconds into PC1 (of the 4 odd and even ones grabbed every 20 milliseconds). The two black-and-white cameras with wide-angle lenses are divergently mounted on the pointing platform. The second frame grabber does the same with (triple) red-green-blue fields of a three-chip color camera with a mild telelens on PC2, and the third one handles a black-and-white video field of a camera with a strong telelens on PC3. The embedded PC (EPC) DEMON process allows starting and controlling of all PCs from a single user interface on the behavior PC, on which all higher-level processes for behavior decision (both central decision [CD] and those for gaze and attention [BDGA] as well as for locomotion [BDL]) run. The subsystems for gaze control (GC) and vehicle control (VC), as well as the global positioning system (GPS) sensor, are connected to this PC (figure 8, right). It also serves as human-machine interface (HMI) and mission planning (MP). In the lower right, a small picture of the test vehicle VAMORS can be seen.

System integration is performed through dynamic knowledge representation (DKR) exploiting SCI, the bar shown running on top through all PCs, thereby implementing the corresponding level in figure 2 (for details, see Rieder [2000]).

MARVEYE

Several realizations of the idea of a multifocal active-reactive vehicle eye (MARVEYE) with standard video cameras have been tested. Figure 9 shows one version with degrees of freedom in yaw and pitch (pan and tilt), as used in the five-ton test vehicle VAMORS. Its stereo base is about 30 centimeters. For achieving a large stereo field of view, the optical axes of the outer (wide-angle) cameras are almost parallel in the version shown. Divergence angles as high as ±~20° have been investigated for a larger total field of view, accepting a reduced stereo field. Both extreme versions have even been implemented together with three more cameras mounted at the bottom of the carrier platform. Because of the increased inertia, time for saccades went up, of course. For high dynamic performance, a compromise has to be found with smaller cameras and a smaller stereo base (entering inertia by a square term in lateral offset from the axis of rotation). (Note that the human stereo base is only 6 to 7 centimeters.)

Figure 9. One Version of MARVEYE as Tested in VAMORS.

Smaller units for cars with reduced stereo base (~15 cm) and only 1 degree of freedom in pan have also been studied. In the latest version for cars with a central zoom camera mounted vertically in the yaw axis, viewing direction in pitch is stabilized and controlled by a mirror (Dickmanns 2002).

Experimental Results

Driving on high-speed roads with lane markings, on unmarked dirt roads, and cross country has been demonstrated with EMS vision over the last three years. Recognizing a crossroad without lane markings and turning onto it, jointly using viewing direction and vehicle control, is shown here as one example (Lützeler and Dickmanns 2000). Figure 10 shows three snapshots taken at different time intervals, with the telecamera performing a saccade during the approach to an intersection. Saccading is performed to see the crossroad both at the intersection and at a distance into the intended driving direction for the turnoff. This procedure allows determining the intersection angle and geometry more precisely over time. When these parameters and the distance to the crossroad have been determined, the turning-off maneuver is initiated by fixing the viewing direction at some suitable distance from the vehicle into the crossroad; now the gaze angle relative to the vehicle body increases as the vehicle continues driving straight ahead. Based on the turning capability of the vehicle, a steer angle rate is initiated at a proper distance from the center of the intersection, which makes the vehicle start turning such that under normal conditions, it should end up on the proper side of the crossroad.

Figure 10. Teleimages during Saccadic Vision When Approaching a Crossroad. The center image, taken during a saccade, is not evaluated (indicated search paths are missing).
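The growth of the gaze angle follows directly from the fixation geometry, as the following sketch with made-up numbers (not project data) illustrates:

```cpp
// Sketch with made-up numbers: during view fixation on a point in the
// crossroad, the commanded gaze pan angle relative to the vehicle body
// grows while the vehicle continues driving straight ahead.
#include <cmath>
#include <cstdio>

int main() {
    const double kPi = 3.14159265358979;
    const double xFix = 30.0, yFix = 12.0;  // fixation point ahead/left [m]
    const double v = 5.0, dt = 0.5;         // vehicle speed [m/s], step [s]
    double x = 0.0;                         // distance driven straight [m]
    for (double t = 0.0; t <= 4.01; t += dt) {
        // heading stays 0, so the platform pan angle equals the bearing
        // of the fixation point in body coordinates
        double pan = std::atan2(yFix, xFix - x) * 180.0 / kPi;
        std::printf("t = %3.1f s  x = %5.1f m  pan = %5.1f deg\n", t, x, pan);
        x += v * dt;
    }
}
```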
During this maneuver, the system tries to find the borderlines of the crossroad as a new
reference in the wide-angle images. Figure 11 shows this sequence with simultaneously taken images left and right in each image row and a temporal sequence at irregular intervals from top down. The upper row (figure 11a) was taken when the vehicle was still driving essentially in the direction of the original road, with the cameras turned left (looking over the shoulder). The left image shows the left A-frame of the body of VAMORS and the crossroad far away through the small side window. The crossroad is measured in the right window only, yielding the most important information for a turnoff to the left.

In figure 11b, the vehicle has turned so far that searching for the new road reference is done in both images, which has been achieved in figure 11c of the image pairs. Now, from this road model for the near range, the road boundaries are also measured in the teleimage for recognizing road curvature precisely (left image in figure 11d). The right subimage here shows a bird's eye view of the situation, with the vehicle (marked by an arrow) leaving the intersection area.

Figure 11. MarVEye Snapshots for Turning Left into a Crossroad (unsealed, gravel). A. The turn maneuver just started; the right wide-angle camera tracks the crossroad; the left one shows the side window of the vehicle. B. Halfway into the turnoff. C. Both wide-angle cameras track the new road reference. D. Left: Teleimage after resuming "road running" on the crossroad. Right: Bird's eye view of the turn-off trajectory.

This rather complex perception and locomotion maneuver can be considered a behavioral capability emerging from proper combinations of elementary skills in both perception and motion control, which the vehicle has or does not have. The development task existing right now is to provide autonomous vehicles with a sufficient number of these skills and behavioral capabilities so that they can manage to find their way on their own and perform entire missions (Gregor 2002; Gregor et al. 2001). As a final demonstration of this project, a complete mission on a road network was performed, including legs on grass surfaces with the detection of ditches by EMS vision (Siedersberger et al. 2001). For more robust performance, the EMS vision system has been beefed up with special stereo hardware from Pyramid Vision System (PVS ACADIA board). The board was inserted in one of four PCs. Detection of a ditch at greater distances was performed using proven edge-detection methods, taking average intensity values on the sides into account; discriminating shadows on a smooth surface and nearby tracking of the ditch had to be done on the basis of the disparity image delivered. The ditch is described by an encasing rectangle in the ground surface, the parameters and orientation of which are determined visually. An evasive maneuver with view fixation on the nearest corner was performed (Hofmann and Siedersberger 2003; Pellkofer, Hofmann, and Dickmanns 2003).
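The principle of finding such a negative obstacle in a disparity image can be sketched as follows; this toy example (not the PVS or EMS vision code, which works with an oriented rectangle in the ground plane) flags pixels whose disparity falls well below that expected for flat ground and encloses them in an image-aligned rectangle:

```cpp
// Hedged sketch: pixels whose stereo disparity is well below the value
// expected for flat ground are a cue for a negative obstacle (ditch);
// the region is described by an encasing rectangle.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int W = 10, H = 8;
    std::vector<std::vector<double>> disp(H, std::vector<double>(W));
    for (int r = 0; r < H; ++r)                      // synthetic flat ground:
        for (int c = 0; c < W; ++c) disp[r][c] = 2.0 + r;  // disparity grows downward
    for (int r = 3; r <= 5; ++r)
        for (int c = 4; c <= 7; ++c) disp[r][c] -= 2.5;    // synthetic ditch

    const double tol = 1.0;   // allowed deviation from the ground model
    int rMin = H, rMax = -1, cMin = W, cMax = -1;
    for (int r = 0; r < H; ++r)
        for (int c = 0; c < W; ++c) {
            double expected = 2.0 + r;               // ground-plane disparity model
            if (disp[r][c] < expected - tol) {       // farther than ground: hole
                rMin = std::min(rMin, r); rMax = std::max(rMax, r);
                cMin = std::min(cMin, c); cMax = std::max(cMax, c);
            }
        }
    if (rMax >= 0)
        std::printf("encasing rectangle: rows %d-%d, cols %d-%d\n",
                    rMin, rMax, cMin, cMax);
}
```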
Conclusions and Outlook

The integrated approach using spatiotemporal models, similar to what humans do in everyday life (subconsciously), seems more promising for developing complex cognitive robotic systems than separate subworlds for automatic control and knowledge processing. Both the cybernetics approach developed around the middle of the last century and the AI approaches developed in the latter part seem to have been too narrow-minded for real success in complex environments with noise-corrupted processes and a large spectrum of objects and subjects. Subjects encountered in our everyday environment range from simple and dumb (animals on lower evolutionary levels or early robots) to intelligent (such as primates); autonomous vehicles can augment this realm. Classes of subjects can be distinguished (beside shape) by their potential perceptual and behavioral capabilities. Individuals of each class can actually only have some of these capabilities at their disposal. For this reason (among others), the explicit representation of capabilities has been introduced in the EMS visual perception system.

This advanced system mimicking vertebrate vision trades about two orders of magnitude reduction in steady video data flow for a few
tenths of a second delay time in attention-focused, high-resolution perception. This concept allows a wide field of view (> ~100° by 45°) at moderate data rates, including a central area of stereo vision that can be redirected by about ±70 degrees in pan. All video signals are interpreted in conjunction on the so-called 4D system level by distributed processing taking specific sensor locations and parameters into account; basic visual perception is done (except during saccades with fast gaze direction changes) by class-specific agents having generic shape and motion models at their disposal. During saccades, the system works with predictions based on the dynamic models (including animation of typical behaviors). The results are accumulated in a shared scene tree displaying the actual best estimates of geometric shape parameters and (3D) dynamic state variables of all objects-subjects.

This so-called DOB is the decoupling layer with respect to the upper AI-type level on which the data stream concerning objects and subjects is monitored and analyzed (maybe on a different time scale) with special respect to the tasks to be solved. The object data rate is reduced by several orders of magnitude compared to video input rate. The object-oriented representation in connection with the task at hand yields the situation and determines the decisions for efficient use of perceptual and behavioral capabilities.

Behavioral capabilities are implemented on the 4D layer with its detailed representation of dynamic properties. Most of the stochastic perturbations encountered in real-world processes are counteracted on this level by feedback control geared directly to desired state-variable time histories (trajectories). All these capabilities are represented on the higher decision level by their global effect in state transition (performing maneuvers) or by their regulatory effects in counteracting perturbations while performing tasks for achieving the goals of the mission elements. Switching between capabilities is triggered by events or finished mission elements. Switching and transition schemes are represented by state charts.
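Such an event-triggered switching scheme can be sketched as a simple state machine; the behaviors and events below are assumed examples in the spirit of the state charts mentioned:

```cpp
// Minimal sketch (assumed names, not EMS vision code) of event-triggered
// switching between behavioral capabilities, in the spirit of state
// charts (Harel 1987).
#include <cstdio>

enum class Behavior { RoadRunning, LaneChange, Decelerate };
enum class Event { LaneBlocked, ObstacleClose, ManeuverDone };

const char* name(Behavior b) {
    switch (b) {
        case Behavior::RoadRunning: return "road running";
        case Behavior::LaneChange:  return "lane change";
        case Behavior::Decelerate:  return "decelerate";
    }
    return "?";
}

Behavior step(Behavior b, Event e) {
    switch (b) {
        case Behavior::RoadRunning:
            if (e == Event::LaneBlocked)   return Behavior::LaneChange;
            if (e == Event::ObstacleClose) return Behavior::Decelerate;
            break;
        case Behavior::LaneChange:
        case Behavior::Decelerate:
            if (e == Event::ManeuverDone)  return Behavior::RoadRunning;
            break;
    }
    return b;  // no transition defined: keep current behavior
}

int main() {
    Behavior b = Behavior::RoadRunning;
    Event script[] = {Event::LaneBlocked, Event::ManeuverDone,
                      Event::ObstacleClose, Event::ManeuverDone};
    for (Event e : script) {
        b = step(b, e);
        std::printf("behavior after event: %s\n", name(b));
    }
}
```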
The system has been implemented and validated in the two autonomous road vehicles VAMORS (5-ton van) and VAMP (Mercedes S-class car) at UBM. Driving on high-speed roads, turning off onto crossroads in networks of minor roads, leaving and entering roads for driving cross country, and avoiding obstacles (including ditches as negative obstacles) have been demonstrated.

By defining proper performance criteria in special categories and more general values for the overall system, the quality of the actual functioning of these capabilities can be judged in any actual case. This procedure can form a starting point for learning, alone or in a team, based on perception and locomotion capabilities. Slightly varied control output in certain maneuvers (or, more generally, actions) will lead
to different values of the performance measure for this maneuver. By observing which changes in control have led to which changes in performance, overall improvements can be achieved systematically. Similarly, systematic changes in cooperation strategies can be evaluated in light of some more refined performance criteria. On the top of figure 7, layers have to be added for future extensions toward systems capable of learning and cooperation. The explicit representation of other capabilities of subjects, such as perception, situation assessment, planning, and evaluation of experience gained, should be added as additional columns in the long run. This approach will lead to more flexibility and extended autonomy.
Acknowledgments

This work was funded by the German Bundesamt fuer Wehrtechnik und Beschaffung (BWB) from 1997 to 2002 under the title "Intelligent Vehicle Functions." Contributions by former coworkers S. Baten, S. Fuerst, and V. v. Holt are gratefully acknowledged here because they do not show up in the list of references.

Notes

1. Also of interest is McCarthy, J.; Minsky, M.; Rochester, N.; and Shannon, C. 1955. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, Aug. 31.

2. It is interesting to note here that in the English language, this word is used both for the acting instance (a subject [person] doing something) and for the object being investigated (subjected to investigation). In the German language, one would prefer to call this latter one the object, even if it is a subject in the former sense (der Gegenstand der Untersuchung; Gegenstand directly corresponds to the Latin word objectum).

References

Albus, J. S., and Meystel, A. M. 2001. Engineering of Mind: An Introduction to the Science of Intelligent Systems. New York: Wiley Interscience.

Arkin, R. C. 1998. Behavior-Based Robotics. Cambridge, Mass.: MIT Press.

AW&ST. 1986. Researchers Channel AI Activities toward Real-World Applications. Aviation Week and Space Technology, Feb. 17: 40–52.

Behringer, R. 1996. Visuelle Erkennung und Interpretation des Fahrspurverlaufes durch Rechnersehen für ein autonomes Straßenfahrzeug (Visual Recognition and Interpretation of Roads by Computer Vision for an Autonomous Road Vehicle). Ph.D. diss., Department LRT, UniBw Munich.

Brooks, R., and Stein, L. 1994. Building Brains for Bodies. Autonomous Robots 1(1): 7–25.

Bryson, A. E., Jr., and Ho, Y.-C. 1975. Applied Optimal Control: Optimization, Estimation, and Control, rev. ed. Washington, D.C.: Hemisphere.

Caveney, D.; Dickmanns, D.; and Hedrik, K. 2003. Before the Controller: A Multiple Target Tracking Routine for Driver Assistance Systems. Automatisierungstechnik 51(5): 230–238.

Damasio, A. 1999. The Feeling of What Happens: Body and Emotion in the Making of Consciousness. New York: Harcourt Brace.

Dickmanns, E. D. 2002. The Development of Machine Vision for Road Vehicles in the Last Decade. Paper presented at the IEEE Intelligent Vehicle Symposium, 18–20 June, Versailles, France.

Dickmanns, E. D. 1998. Vehicles Capable of Dynamic Vision: A New Breed of Technical Beings? Artificial Intelligence 103(1–2): 49–76.

Dickmanns, D. 1997. Rahmensystem für visuelle Wahrnehmung veränderlicher Szenen durch Computer (Framework for Visual Perception of Changing Scenes by Computers). Ph.D. diss., Department Informatik, UniBw Munich.

Dickmanns, E. D. 1987. 4D Dynamic Scene Analysis with Integral Spatiotemporal Models. In The Fourth International Symposium on Robotics Research, eds. R. Bolles and B. Roth, 311–318. Cambridge, Mass.: The MIT Press.

Dickmanns, E. D. 1995. Performance Improvements for Autonomous Road Vehicles. In Proceedings of the International Conference on Intelligent Autonomous Systems (IAS-4), 27–30 March, Karlsruhe, Germany.

Dickmanns, E. D., and Graefe, V. 1988a. Application of Dynamic Monocular Machine Vision. Journal of Machine Vision and Application 1: 223–241.

Dickmanns, E. D., and Graefe, V. 1988b. Dynamic Monocular Machine Vision. Journal of Machine Vision and Application 1: 242–261.

Dickmanns, E. D., and Mysliwetz, B. 1992. Recursive 3D Road and Relative Ego-State Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) (Special issue on Interpretation of 3D Scenes) 14(2): 199–213.

Dickmanns, E. D., and Wünsche, H.-J. 1999. Dynamic Vision for Perception and Control of Motion. In Handbook of Computer Vision and Applications, Volume 3, eds. Bernd Jähne, Horst Haußecker, and Peter Geißler, 569–620. San Diego, Calif.: Academic Press.

Dickmanns, E. D., and Zapp, A. 1986. A Curvature-Based Scheme for Improving Road Vehicle Guidance by Computer Vision. In Mobile Robots, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Volume 727, 161–168. Bellingham, Wash.: SPIE, the International Society for Optical Engineering.

Dickmanns, E. D.; Mysliwetz, B.; and Christians, T. 1990. Spatiotemporal Guidance of Autonomous Vehicles by Computer Vision. IEEE Transactions on Systems, Man, and Cybernetics (Special issue on Unmanned Vehicles and Intelligent Robotic Systems) 20(6): 1273–1284.

Dickmanns, E. D.; Behringer, R.; Dickmanns, D.; Hildebrandt, T.; Maurer, M.; Thomanek, F.; and Schiehlen, J. 1994. The Seeing Passenger Car. Paper presented at the Intelligent Vehicles '94 Symposium, 24–26 October, Paris, France.

Gregor, R. 2002. Fähigkeiten zur Missionsdurchführung und Landmarkennavigation (Capabilities for Mission Performance and Landmark Navigation). Ph.D. diss., Department LRT, UniBw Munich.

Gregor, R., and Dickmanns, E. D. 2000. EMS Vision: Mission Performance on Road Networks. In Proceedings of the IEEE Intelligent Vehicles Symposium, 140–145. Washington, D.C.: IEEE Computer Society.

Gregor, R.; Lützeler, M.; Pellkofer, M.; Siedersberger, K.-H.; and Dickmanns, E. D. 2001. A Vision System for Autonomous Ground Vehicles with a Wide Range of Maneuvering Capabilities. Paper presented at the International Workshop on Computer Vision Systems, 7–8 July, Vancouver, Canada.

Gregor, R.; Lützeler, M.; Pellkofer, M.; Siedersberger, K.-H.; and Dickmanns, E. D. 2000. EMS Vision: A Perceptual System for Autonomous Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 52–57. Washington, D.C.: IEEE Computer Society.

Harel, D. 1987. State Charts: A Visual Formalism for Complex Systems. Science of Computer Programming 8(3): 231–274.

Hofmann, U., and Siedersberger, K.-H. 2003. Stereo and Photometric Image Sequence Interpretation for Detecting Negative Obstacles Using Active Gaze Control and Performing an Autonomous Jink. Paper presented at the SPIE AeroSense Unmanned Ground Vehicles Conference, 22–23 April, Orlando, Florida.

Hofmann, U.; Rieder, A.; and Dickmanns, E. D. 2000. EMS Vision: An Application to Intelligent Cruise Control for High Speed Roads. In Proceedings of the IEEE Intelligent Vehicles Symposium, 468–473. Washington, D.C.: IEEE Computer Society.

Kailath, T. 1980. Linear Systems. Englewood Cliffs, N.J.: Prentice Hall.

Kalman, R. 1960. A New Approach to Linear Filtering and Prediction Problems. In Transactions of the ASME, Journal of Basic Engineering, 35–45. Fairfield, N.J.: American Society of Mechanical Engineers.

Klass, P. J. 1985. DARPA Envisions New Generation of Machine Intelligence. Aviation Week and Space Technology 122(16), 22 April 1985.

Lützeler, M. 2002. Fahrbahnerkennung zum Manövrieren auf Wegenetzen mit aktivem Sehen (Road Recognition for Maneuvering on Road Networks Using Active Vision). Ph.D. diss., Department LRT, UniBw Munich.

Lützeler, M., and Dickmanns, E. D. 2000. EMS Vision: Recognition of Intersections on Unmarked Road Networks. In Proceedings of the IEEE Intelligent Vehicles Symposium, 302–307. Washington, D.C.: IEEE Computer Society.

Marr, D. 1982. Vision. San Francisco, Calif.: Freeman.

Maurer, M. 2000a. Flexible Automatisierung von Straßenfahrzeugen mit Rechnersehen (Flexible Automation of Road Vehicles by Computer Vision). Ph.D. diss., Department LRT, UniBw Munich.

Maurer, M. 2000b. Knowledge Representation for Flexible Automation of Land Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 575–580. Washington, D.C.: IEEE Computer Society.

Miller, G.; Galanter, E.; and Pribram, K. 1960. Plans and the Structure of Behavior. New York: Holt, Rinehart, and Winston.

Moravec, H. 1983. The Stanford Cart and the CME Rover. Proceedings of the IEEE 71(7): 872–884.

Mysliwetz, B. 1990. Parallelrechner-basierte Bildfolgen-Interpretation zur autonomen Fahrzeugsteuerung (Parallel Computer-Based Image Sequence Interpretation for the Guiding of Autonomous Vehicles). Ph.D. diss., Department LRT, UniBw Munich.

Newell, A., and Simon, H. 1963. GPS: A Program That Simulates Human Thought. In Computers and Thought, eds. E. Feigenbaum and J. Feldman, 279–293. New York: McGraw-Hill.

Nilsson, N. J. 1969. A Mobile Automaton: An Application of Artificial Intelligence. In Proceedings of the First International Joint Conference on Artificial Intelligence, 509–521. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Pellkofer, M. 2003. Verhaltensentscheidung für autonome Fahrzeuge mit Blickrichtungssteuerung (Behavior Decision for Autonomous Vehicles with Gaze Control). Ph.D. diss., Department LRT, UniBw Munich.

Pellkofer, M., and Dickmanns, E. D. 2000. EMS Vision: Gaze Control in Autonomous Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 296–301. Washington, D.C.: IEEE Computer Society.

Pellkofer, M.; Hofmann, U.; and Dickmanns, E. D. 2003. Autonomous Cross Country Driving Using Active Vision. Paper presented at Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, 27–30 October, Providence, Rhode Island.

Pellkofer, M.; Lützeler, M.; and Dickmanns, E. D. 2001. Interaction of Perception and Gaze Control in Autonomous Vehicles. Paper presented at the XX SPIE Conference on Intelligent Robots and Computer Vision: Algorithms, Techniques, and Active Vision, 28 October–2 November, Boston, Massachusetts.

Pomerleau, D. A. 1992. Neural Network Perception for Mobile Robot Guidance. Ph.D. diss., Computer Science Department, Carnegie Mellon University, Pittsburgh, Penn.

Rieder, A. 2000. Fahrzeuge sehen: Multisensorielle Fahrzeugerkennung in einem verteilten Rechnersystem für autonome Fahrzeuge (Seeing Vehicles: Multisensory Vehicle Recognition in a Distributed Computer System for Autonomous Vehicles). Ph.D. diss., Department LRT, UniBw Munich.

Rosenfeld, A., and Kak, A. 1976. Digital Picture Processing. San Diego, Calif.: Academic.

Schell, F. R. 1992. Bordautonomer automatischer Landeanflug aufgrund bildhafter und inertialer Meßdatenauswertung (Onboard Autonomous Automatic Landing Approach Based on Processing of Iconic (Visual) and Inertial Measurement Data). Ph.D. diss., Department LRT, UniBw Munich.

Schell, F. R., and Dickmanns, E. D. 1994. Autonomous Landing of Airplanes by Dynamic Machine Vision. Machine Vision and Application 7(3): 127–134.

Schiehlen, J. 1995. Kameraplattformen für aktiv sehende Fahrzeuge (Camera Pointing Platforms for Vehicles with Active Vision). Ph.D. diss., Department LRT, UniBw Munich.

Selfridge, O. 1959. Pandemonium: A Paradigm for Learning. In Proceedings of the Symposium on the Mechanization of Thought Processes, eds. D. V. Blake and A. M. Uttley, 511–529. London, U.K.: Her Majesty's Stationery Office.

Selfridge, O., and Neisser, U. 1960. Pattern Recognition by Machine. In Computers and Thought, eds. E. Feigenbaum and J. Feldman, 235–267. New York: McGraw-Hill.

Siedersberger, K.-H. 2003. Komponenten zur automatischen Fahrzeugführung in sehenden (semi-) autonomen Fahrzeugen (Components for Automatic Guidance of Autonomous Vehicles with the Sense of Vision). Ph.D. diss., Department LRT, UniBw Munich.

Siedersberger, K.-H., and Dickmanns, E. D. 2000. EMS Vision: Enhanced Abilities for Locomotion. In Proceedings of the IEEE Intelligent Vehicles Symposium, 146–151. Washington, D.C.: IEEE Computer Society.

Siedersberger, K.-H.; Pellkofer, M.; Lützeler, M.; Dickmanns, E. D.; Rieder, A.; Mandelbaum, R.; and Bogoni, I. 2001. Combining EMS Vision and Horopter Stereo for Obstacle Avoidance of Autonomous Vehicles. Paper presented at the International Workshop on Computer Vision Systems, 7–8 July, Vancouver, Canada.

Thomanek, F. 1996. Visuelle Erkennung und Zustandsschätzung von mehreren Straßenfahrzeugen zur autonomen Fahrzeugführung (Visual Recognition and State Estimation of Multiple Road Vehicles for Autonomous Vehicle Guidance). Ph.D. diss., Department LRT, UniBw Munich.

Thomanek, F.; Dickmanns, E. D.; and Dickmanns, D. 1994. Multiple Object Recognition and Scene Interpretation for Autonomous Road Vehicle Guidance. Paper presented at the 1994 Intelligent Vehicles Symposium, 24–26 October, Paris, France.

Werner, S. 1997. Maschinelle Wahrnehmung für den bordautonomen automatischen Hubschrauberflug (Machine Perception for Onboard Autonomous Automatic Helicopter Flight). Ph.D. diss., Department LRT, UniBw Munich.

Wiener, N. 1948. Cybernetics. New York: Wiley.

Wuensche, H.-J. 1988. Bewegungssteuerung durch Rechnersehen (Motion Control by Computer Vision). Band 20, Fachber. Messen-Steuern-Regeln. Berlin: Springer.

Ernst D. Dickmanns studied aerospace and control engineering at RWTH Aachen and Princeton University. After 14 years in aerospace research with DFVLR Oberpfaffenhofen in the field of optimal trajectories, in 1975, he became full professor of control engineering in the Aerospace Department at the University of the Bundeswehr near Munich. Since then, his main research field has been real-time vision for autonomous vehicle guidance, pioneering the 4D approach to dynamic vision. His e-mail address is Ernst.Dickmanns@unibw-muenchen.de.