Articles
Dynamic Vision-Based Intelligence
Ernst D. Dickmanns
■ A synthesis of methods from cybernetics and AI yields a concept of intelligence for autonomous mobile systems that integrates closed-loop visual perception and goal-oriented action cycles using spatiotemporal models. In a layered architecture, systems dynamics methods with differential models prevail on the lower, data-intensive levels, but on higher levels, AI-type methods are used. Knowledge about the world is geared to classes of objects and subjects. Subjects are defined as objects with additional capabilities of sensing, data processing, decision making, and control application. Specialist processes for visual detection and efficient tracking of class members have been developed. On the upper levels, individual instantiations of these class members are analyzed jointly in the task context, yielding the situation for decision making. As an application, vertebrate-type vision for tasks in vehicle guidance in naturally perturbed environments was investigated with a distributed PC system. Experimental results with the test vehicle VAMORS are discussed.

During and after World War II, the principle of feedback control became well understood in biological systems and was applied in many technical disciplines for unloading humans from boring work load in system control and introducing automatic system behavior. Wiener (1948) considered it to be universally applicable as a basis for building intelligent systems and called the new discipline cybernetics (the science of systems control). After many early successes, these arguments soon were oversold by enthusiastic followers; at that time, many people realized that high-level decision making could hardly be achieved on this basis. As a consequence, with the advent of sufficient digital computing power, computer scientists turned to descriptions of abstract knowledge and created the field of AI (Miller, Gallenter, and Pribram 1960; Selfridge 1959).1 With respect to results promised versus those realized, a similar situation to that with cybernetics developed in the last quarter of the twentieth century.

In the context of AI, the problem of computer vision has also been tackled (see, for example, Marr [1982]; Rosenfeld and Kak [1976]; Selfridge and Neisser [1960]). The main paradigm initially was to recover three-dimensional (3D) object shape and orientation from single images or from a few viewpoints. In aerial or satellite remote sensing, the task was to classify areas on the ground and detect special objects. For these purposes, snapshot images taken under carefully controlled conditions sufficed. Computer vision was the proper name for these activities because humans took care of accommodating all side constraints observed by the vehicle carrying the cameras.

When technical vision was first applied to vehicle guidance (Nilsson 1969), separate viewing and motion phases with static image evaluation (lasting for many minutes on remote stationary computers) were initially adopted. Even stereo effects with a single camera moving laterally on the vehicle between two shots from the same vehicle position were investigated (Moravec 1983). In the early 1980s, digital microprocessors became sufficiently small and powerful, so that on-board image evaluation in near real time became possible. The Defense Advanced Research Projects Agency (DARPA) started its program entitled On Strategic Computing in which vision architectures and image sequence interpretation for ground-vehicle guidance were to be developed (autonomous land vehicle [ALV]) (AW&ST 1986). These activities were also subsumed under the title computer vision. This term became generally accepted for a broad spectrum of applications, which makes sense as long as dy-
10 AI MAGAZINE Copyright © 2004, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2004 / $2.00
SUMMER 2004 11
is the general case in real life under natural conditions in the outside world. It was soon realized that humanlike performance levels could be expected only by merging successful methods and components both from the cybernetics approach and the AI approach. For about a decade, the viability of isolated perception and motion control capabilities for single tasks in vehicle guidance has been studied and demonstrated (Dickmanns 1995; Dickmanns and Graefe 1988). Figure 1 shows one of the test vehicles used at University of the Bundeswehr, Munich (UBM).

Basic knowledge had to be built up, and the lacking performance of electronic hardware did not yet allow for the realization of large-scale vision systems; this lack of performance was true for many of the subfields involved, such as frame grabbing, communication bandwidth, digital processing power, and storage capabilities.

At many places, special computer hardware with massively parallel (partially single-bit) computers has been investigated, triggered first by insight into the operation of biological vision systems (for example, in the DARPA project on strategic computing [AW&ST 1986; Klass 1985]). This line of development lost momentum with the rapid development of general-purpose microprocessors. Video technology, available and driven by a huge market, contributed to this trend. At least for developing basic vision algorithms and understanding characteristics of vision paradigms, costly hardware developments were no longer necessary. Today's workstations and PCs in small clusters allow investigating most of the problems in an easy and rapid fashion. Once optimal solutions are known and have to be produced in large numbers, it might be favorable to go back to massively parallel vision hardware.

It is more and more appreciated that feedback signals from higher interpretation levels allow making real-time vision much more efficient; these problems are well suited for today's microprocessors. Within a decade or two, technical vision systems might well approach the performance levels of advanced biological systems. However, they need not necessarily follow the evolutionary steps observed in biology. On the contrary, the following items might provide a starting level for technical systems beyond the one available for most biological systems: (1) high-level 3D shape representations for objects of generic classes; (2) homogeneous coordinate transformations and spatiotemporal (4D) models for motion; (3) action scripts (feed-forward control schemes) for subjects of certain classes; (4) powerful feedback control laws for subjects in certain situations; and (5) knowledge databases, including explicit representations of perceptual and behavioral capabilities for decision making.

As some researchers in biology tend to speculate, the conditions for developing intelligence might be much more favorable if tasks requiring both vision and locomotion control are to be solved. Because biological systems almost always have to act in a dynamic way in 3D space and time, temporal modeling is as common as spatial modeling. The natural sciences and mathematics have derived differential calculus for compact description of motion processes. In this approach, motion state is linked to temporal derivatives of the state variables, the definition of which is that they cannot be changed instantaneously at one moment in time; their change develops over time. The characteristic way in which this takes place constitutes essential knowledge about "the world." Those variables in the description that can be changed instantaneously are called control variables. They are the basis for any type of action, whether goal oriented or not. They are also the ultimate reason for the possibility of free will in subjects when they are able to derive control applications from mental reasoning with behavioral models (in the fuzzy diction mostly used). It is this well-defined terminology, and the corresponding set of methods from systems dynamics, that allows a unified approach to intelligence by showing the docking locations for AI methods neatly supplementing the engineering approach.

Recursive estimation techniques, as introduced by Kalman (1960), allow optimal estimation of state variables of a well-defined object under noisy conditions when only some output variables can be measured. In monocular vision, direct depth information is always lost; nonetheless, 3D perception can be realized when good models for ego motion are available (so-called motion stereo). This fact has been the basis for the success of the 4D approach to image sequence analysis (Dickmanns and Wünsche 1999).

First applications in the early 1980s had been coded for systems of custom-made (8- to 32-bit) microprocessors in Fortran for relatively simple tasks. Driving at speeds as high as 60 miles per hour (mph) on an empty highway was demonstrated in 1987 with the 5-ton test vehicle VAMORS; both lateral (lane keeping) and longitudinal control (adapting speed to road curvature) were done autonomously with half a dozen Intel 8086 microprocessors (Dickmanns and Graefe 1988).
[Figure 2 (block diagram; recoverable labels only): at the top, central decision (CD) with task assignment, mission planning, and situation assessment (SA) on a larger time scale in the task context; behavior decision for gaze and attention (BDGA) and behavior decision for locomotion (BDL) with explicit representation of capabilities and knowledge-triggered behaviors; the dynamic knowledge representation (DKR) with the dynamic object database (DOB; time histories in ring buffers; objects/subjects represented by identifiers and actual state variables); on the 4D level, recognition of objects of specific classes and of the inertial and relative ego state, including short-term predictions, with feed-forward and feedback control time histories and control laws feeding gaze control and vehicle control on their respective computer hardware.]

Figure 2. Dynamic Vision System with Active Gaze Stabilization and Control for Vehicle Guidance under Strongly Perturbed Conditions. Control engineering methods predominate in the lower part (below the knowledge representation bar) and AI-type methods in the upper part.
grounding often complained to be missing in AI. It is interesting to note that Damasio (1999) advocates a similar shift for understanding human consciousness.

The rest of the article discusses the new type of expectation-based, multifocal, saccadic (EMS) vision system and how this system has led to the modified concept of technical intelligence. EMS vision was developed over the last six years with an effort of about 35 person-years. During its development, it became apparent that explicit representation of behavioral capabilities was desirable for achieving a higher level of system autonomy. This feature is not found in other system architectures such as those discussed by Arkin (1998) and Brooks and Stein (1994). Albus and Meystel (2001) use multiple representations on different grid scales; however, their approach is missing the nice scalability effect of homogeneous coordinates in space and does not always provide the highest resolution along the time axis. In the existing implementation at UBM, explicit representation of capabilities on the mental level has been fully realized for gaze and locomotion control only. Several processes in planning and perception have yet to be adapted to this new standard. The right center part of figure 2 shows the functions implemented.

Dynamic versus Computational Vision

Contrary to the general approach to real-time vision adopted by the computer science and AI communities, dynamic vision right from the
beginning does embed each single image into a time history of motion in 3D space for which certain continuity conditions are assumed to hold. Motion processes of objects are modeled, rather than objects, which later on are subjected to motion. For initialization, this approach might seem impossible because one has to start with the first image anyway. However, this argument is not quite true. The difference lies in when to start working on the next images of the sequence and how to come up with object and motion hypotheses based on a short temporal sequence. Instead of exhausting computing power with possible feature combinations in a single image for deriving an object hypothesis (known as combinatorial explosion), recognizable characteristic groupings of simple features are tracked over time. Several hypotheses of objects under certain aspect conditions can be started in parallel. It is in this subtask of hypothesis generation that optical (feature) flow might help; for tracking, it loses significance. The temporal embedding in a hypothesized continuous motion then allows much more efficient pruning of poor hypotheses than when you proceed without it. These motion models have to be in 3D space (and not in the image plane) because only there are the Newtonian second-order models for motion of massive objects valid in each degree of freedom. Because of the second order (differential relationship) of these models, speed components in all six degrees of freedom occur quite naturally in the setup and will be estimated by the least squares fit of features according to the recursive estimation scheme underlying the extended Kalman filter. Prediction error feedback according to perspective mapping models is expected to converge for one object shape and motion hypothesis. There are several big advantages to this approach:

First, state prediction and forward-perspective mapping allow tuning of the parameters in the feature-extraction algorithms; this tuning might improve the efficiency of object recognition by orders of magnitude! It also allows implementing the idea of Gestalt in technical vision systems, as discussed by psychologists for human vision.

Second, in parallel to the nominal forward-perspective mapping, additional mappings are computed for slight variations in each single (hypothesized) value of the shape parameters and state components involved. From these variations, the Jacobian matrix of first-order derivatives on how feature values in the image plane depend on the unknown parameters and states is determined. Note that because the model is in 3D space, there is very rich information available about how to adjust parameters to obtain the best-possible spatial fit. (If you have experienced the difference between looking at a monitor display of a complex spatial structure of connected rods when it is stationary and when it is in motion, you immediately appreciate the advantages.)

Third, checking the elements in each row and column of the Jacobian matrix in conjunction allows decision making about whether a state or shape parameter should be updated or whether continuing the evaluation of a feature does make sense. By making these decisions carefully, many of the alleged cases of poor performance of Kalman filtering can be eliminated. If confusion of features might occur, it is better to renounce using them at all and instead look for other features for which correspondence can be established more easily.

Fourth, characteristic parameters of stochastic noise in the dynamic and measurement processes involved can be used for tuning the filter in an optimal way; one can differentiate how much trust one puts in the dynamic model or the measurement model. This balance might even be adjusted dynamically, depending on the situation encountered (for example, lighting and aspect conditions and vibration levels).

Fifth, the (physically meaningful) states of the processes observed can be used directly for motion control. In systems dynamics, the best one can do in linear feedback control is to link corrective control output directly to all state variables involved (see, for example, Kailath [1980]). This linear feedback allows shifting eigenvalues of the motion process arbitrarily (as long as the linearity conditions are valid).

All these items illustrate that closed-loop perception and action cycles do have advantages. In the 4D approach, these advantages have been exploited since 1988 (Wuensche 1988), including active gaze control since 1984 (Mysliwetz 1990). Visual perception for landing approaches under gusty wind conditions could not be performed successfully without inertial measurements for counteracting fast perturbations (Schell 1992). This feature was also adopted for ground vehicles (Werner 1997).

When both a large field of view close up and a good resolution farther away in a limited region are requested (such as for high-speed driving and occasional stop-and-go driving in traffic jams or tight maneuvering on networks of minor roads), peripheral-foveal differentiation in visual imaging is efficient. Figure 3 shows the aperture angles required for mapping the same circular area (30 meter diameter) into a full image from different ranges.
[Figure 3 (camera arrangement; recoverable labels only): divergent trinocular stereo with single-chip color cameras covering a wide field of view of more than 100˚; f = 4.8 mm (~45˚ field of view), f = 16 mm (~23˚), and a high-sensitivity 3-chip color camera with f = 50 mm (~7˚).]
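The geometric relation behind figure 3 is elementary: a circular area of diameter D at range r fills a full image of aperture angle α = 2·atan(D / 2r). A short sketch (the ~45˚, ~23˚, and ~7˚ values are the rounded angles from the figure) gives the ranges at which the 30-meter circle fills each field of view.

```python
import math

def full_image_angle_deg(diameter_m, range_m):
    """Aperture angle (degrees) under which a circle of the given diameter
    fills a full image at the given range."""
    return math.degrees(2.0 * math.atan(0.5 * diameter_m / range_m))

def range_for_angle_m(diameter_m, angle_deg):
    """Inverse relation: range at which the circle fills the given angle."""
    return 0.5 * diameter_m / math.tan(math.radians(angle_deg) / 2.0)

# ranges at which a 30 m circle fills fields of view of ~45, ~23, ~7 degrees
ranges = [range_for_angle_m(30.0, a) for a in (45.0, 23.0, 7.0)]
# roughly 36 m, 74 m, and 245 m: wide-angle near, tele far
```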
able parameters, the actual aspect conditions, and the motion state variables have to be derived from actual observations.

Objects of certain classes do not follow simple laws of motion. Subjects capable of initiating maneuvers (actions) on their own form a wide variety of different classes and subclasses. Each subclass again can have a wide variety of members: All animals capable of picking up information from the environment by sensors and of making decisions on how to control parts of their body or the whole body are members of these subclasses. Also, other autonomous vehicles and vehicles with human drivers inside form specific subclasses.

Most of our knowledge about the world (the mesoscale world of everyday life under discussion here) is geared to classes of objects and subjects as well as to individual members of these classes. The arrangement of individual members of these classes in a specific environment and a task context is called a situation.

Three areas of knowledge can be separated on a fine-to-coarse scale in space and time (table 1): (1) the feature/(single-object) level ["here and now" for the actual point in space and time, upper left corner]; (2) the generic object (subject) class level, combining all possible properties of class members for further reference and communication (central row); and (3) the situation level with symbolic representations of all single objects (subjects) relevant for decision making. In addition to their relative spatial arrangement and motion state, the maneuver activity they are actually involved in and the relative state within the maneuver should also be available (lower right corner in table 1).

Table 1. Multiple-Scale Recognition and Tracking of Features, Objects, and Situations in Expectation-Based, Multifocal, Saccadic (EMS) Vision. Rows run from fine to coarse scales in space; columns from actual measurements over single-step transitions to time histories and multiple-step predictions:
- features (differential environment in 3D space): local edge positions, regions, angles, curvatures | change of feature parameters | feature history
- objects (space integrals): local object states, feature distribution, shape | state transitions, new aspect conditions ("central hub") | state predictions, object state histories
- situations (maneuver space): local situation (usually not done) | 1-step prediction of situation | multiple-step predictions; monitoring
- mission (mission space): actual situation | -- | global mission performance

On the feature level, there are two (partially) separate regimes for knowledge representation: First, what are the most reliable and robust features for object detection? For initialization, it is important to have algorithms available, allowing an early transition from collections of features observed over a few images of a sequence (spanning a few tenths of a second at most) to hypotheses of objects in 3D space and time. Three-dimensional shape and relative aspect conditions also have to be hypothesized in conjunction with the proper motion components and the underlying dynamic model. Second, for continued tracking with an established hypothesis, the challenge is usually much simpler because temporal predictions can be made, allowing work with special feature-extraction methods and parameters adapted to the actual situation. This adaptation was the foundation for the successes of the 4D approach in the early and mid-1980s.

For detecting, recognizing, and tracking objects of several classes, specific methods have been developed and proven in real-world applications. In the long run, these special perceptual capabilities should be reflected in the knowledge base of the system by explicit representations, taking their characteristics into account for the planning phase. These characteristics specify regions of interest for each object tracked. This feature allows flexible activation of the gaze platform carrying several cameras so that the information increase by the image stream from the set of cameras is optimized. These decisions are made by a process on the upper system level dubbed behavior decision for gaze and attention (BDGA in upper right of figure 2 [Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2003]).

On the object level, all relevant properties and actual states of all objects and subjects instantiated are collected for the discrete point in time now and possibly for a certain set of recent time points (recent state history).

For subjects, the maneuvers actually performed (and the intentions probably behind them) are also inferred and represented in special slots of the knowledge base component dedicated to each subject. The situation-assessment level (see later discussion) provides these data for subjects by analyzing state-time histories and combining the results with background knowledge on maneuvers spanning certain time scales. For vehicles of the same class as oneself, all perceptual and behavioral capabilities available for oneself are assumed to be valid for the other vehicle, too, allowing fast in-advance simulations of future behaviors, which are then exploited on the situation level. For members of other classes (parameterized), stereotypical behaviors have to be stored or learned over time (this capability still has to be added).

On the situation level (lower right corner in table 1), the distinction between relevant and irrelevant objects and subjects has to be made, depending on the mission to be performed. Then, the situation has to be assessed, further taking a wider time horizon into account, and decisions for safely achieving the goals of the autonomous vehicle mission have to be taken. Two separate levels of decision making have been implemented: (1) optimizing perception performance (mentioned earlier as BDGA) (Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2001) and (2) controlling a vehicle (BDL [behavior decision for locomotion]) (Maurer 2000; Siedersberger 2003) or monitoring a driver.

During mission planning, the overall autonomous mission is broken down into mission elements that can be performed in the same control mode. Either feed-forward control with parameterized time histories or feedback control depending on deviations from the ideal trajectory is applied for achieving the subgoal of each mission element. Monitoring of mission progress and the environmental conditions can lead to events calling for replanning or local deviations from the established plan. Safety aspects for the autonomous vehicle body and other participants always have a high priority in behavior decision. Because of unforeseeable events, a time horizon of only several seconds or a small fraction of a minute is usually possible in ground vehicle guidance. Viewing direction and visual attention have to be controlled so that sudden surprises are minimized. When special maneuvers such as lane changing or turning onto a crossroad are intended, visual search patterns are followed to avoid dangerous situations (defensive-style driving).

A replanning capability should be available as a separate unit accessible from the "central decision" part of the overall system (see figure 2, top right); map information on road networks is needed, together with quality criteria, for selecting route sequences (Gregor 2002).

The system has to be able to detect, track, and understand the spatiotemporal relationship to all objects and subjects relevant for the task at hand. To not overload the system, tasks are assumed to belong to certain domains for which the corresponding knowledge bases on objects, subjects, and possible situations have been provided. Confinement to these domains
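The feedback branch of such a control mode can be sketched for lane keeping: corrective control is linked directly to the state variables (lateral offset and its rate), as in linear state feedback (see Kailath [1980]). The double-integrator lateral model, the gains, and the step sizes are illustrative assumptions, not the controllers used on VAMORS.

```python
# Minimal sketch of feedback control within one mission element:
# drive the deviation from the ideal trajectory (lateral offset y) to zero
# with the state-feedback law u = -k1*y - k2*ydot around y'' = u.

def simulate_lane_keeping(y0, ydot0, k1=4.0, k2=4.0, dt=0.02, steps=500):
    """Simulate the closed loop with semi-implicit Euler steps; the gains
    place both closed-loop eigenvalues at -2 (critically damped)."""
    y, ydot = y0, ydot0
    for _ in range(steps):
        u = -k1 * y - k2 * ydot      # feedback control law on state variables
        ydot += u * dt
        y += ydot * dt
    return y, ydot

# a 1 m initial lateral offset decays smoothly toward the lane center
y_end, ydot_end = simulate_lane_keeping(1.0, 0.0)
```

Choosing k1 and k2 is exactly the eigenvalue shifting mentioned among the advantages of the 4D approach: with this model, the closed loop obeys y'' + k2·y' + k1·y = 0, so the gains set the poles directly.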
keeps knowledge bases more manageable. Learning capabilities in the 4D framework are a straightforward extension but still have to be added. It should have become clear that in the dynamic vision approach, closed-loop perception and action cycles are of basic importance, also for learning improved or new behaviors.

In the following section, basic elements of the EMS vision system are described first; then, their integration into the overall cognitive system architecture is discussed.

System Design

The system is object-oriented in two ways: First, the object-oriented programming paradigm as developed in computer science has been adopted in the form of the C++ language (grouping of data and methods relevant for handling of programming objects, exploiting class hierarchies). Second, as mentioned earlier, physical objects in 3D space and time with their dynamic properties, and even behavioral capabilities of subjects, are the core subject of representation.2 Space does not allow a detailed discussion of the various class hierarchies of relevance.

Basic Concept

According to the data and the specific knowledge levels to be handled, three separate levels of visual perception can be distinguished, as indicated in figure 5 (right). Bottom-up feature extraction over the entire area of interest gives the first indications of regions of special interest. Then, an early transition to objects in 3D space and time is made. Symbols for instantiated objects form the central representational element (blocks 3 to 5 in figure 5). Storage is allocated for their position in the scene tree (to be discussed later) and the actual best estimates of their yet-to-be-adapted shape parameters and state variables (initially, the best guesses).

The corresponding slots are filled by specialized computing processes for detecting, recognizing, and tracking members of their class from feature collections in a short sequence of images (agents for visual perception of class members); they are specified by the functions coded. The generic models used on level 2 (blocks 3 and 4) consist of one subsystem for 3D shape description, with special emphasis on highly visible features. A knowledge component for speeding up processing is the aspect graph designating classes of feature distributions in perspective images. The second subsystem represents temporal continuity conditions based on dynamic models of nth order for object motion in space. In a video data stream consisting of regularly sampled snapshots, discrete dynamic models in the form of (n × n) transition matrices for the actual sampling period are sufficient. For rigid objects with 3 translational and 3 rotational degrees of freedom, these are 12 × 12 matrices because of the second-order nature of Newtonian mechanics. If the motion components are uncoupled, only 6 (2 × 2) blocks along the diagonal are nonzero. For practical purposes in noise-corrupted environments, usually the latter model is chosen; however, with a first-order noise model added in each degree of freedom, we have 6 blocks of (3 × 3) transition matrices.

By proper tuning of the noise-dependent parameters in the recursive estimation process, in general, good results can be expected. If several different cameras are being used for recognizing the same object, this object is represented multiple times, initially with the specific measurement model in the scene tree. A supervisor module then has to fuse the model data or select the one to be declared the valid one for all measurement models. In the case of several sensors collecting data on the same object instantiated, one Jacobian matrix has to be computed for each object-sensor pair. These processes run at video rate. They implement all the knowledge on the feature level as well as part of the knowledge on the object level. This implies basic generic shape, the possible range of values for parameters, the most likely aspect conditions and their influence on feature distribution, and dynamic models for temporal embedding. For subjects with potential control applications, their likely value settings also have to be guessed and iterated.

As a result, describing elements of higher-level percepts are generated and stored (3D shape and photometric parameters as well as 4D state variables [including 3D velocity components]). They require several orders of magnitude less communication bandwidth than the raw image data from which they were derived. Although the data stream from 2 black-and-white and 1 (3-chip) color camera is in the 50-megabytes-a-second range, the output data rate is in the kilobyte range for each video cycle. These percepts (see arrows leaving the 4D level upward in figure 2 [center left] and figure 5 [into block 5]) are the basis for further evaluations on the upper system level. Note that once an instance of an object has been generated and measurement data are not available for one or a few cycles, the system might nonetheless continue its regular mission, based on predictions according to the temporal model available. The actual data in the corresponding object slots will then be the predicted values according to the model instantiated. Trust in these predicted data will decay over time; therefore, these periods without new visual input are steadily monitored. In a system relying on saccadic vision, these periods occur regularly when the images are blurred because of high rotational speeds for shifting the visual area of high resolution to a new (wide-angle) image region of special interest. This fact clearly indicates that saccadic vision cannot be implemented without temporal models and that the 4D approach lends itself to saccadic vision quite naturally.

In a methodologically clean architecture, the blocks numbered 1 to 3 in the dark circular regions in figure 5 should be separated completely. Because of hardware constraints in frame grabbing and communication bandwidth, they have been lumped together on one computer in the actual implementation. The elliptical dark field on the shaded rectangular area at the left-hand side (1–3a) indicates that object recognition based on wide-angle images is performed directly with the corresponding features by the same processors for bottom-up feature detection in the actual implementation. In the long run, blocks 1 and 2 are good candidates for special hardware and software on plug-in processor boards for a PC.

The 4D level (2) for image processing (labeled 3 and 4 in the dark circles) requires a completely different knowledge base than level 1. It is applied to (usually quite small) special image regions discovered in the feature database by interesting groups of feature collections. For these feature sets, object hypotheses in 3D space and time are generated. These 4D object hypotheses with specific assumptions
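The discrete dynamic models with uncoupled motion components can be written down directly: the 12 × 12 transition matrix for a rigid object is block diagonal with six (2 × 2) blocks [[1, T], [0, 1]], one per degree of freedom. A sketch, with the sampling period T = 40 ms assumed here as one video cycle:

```python
# Build the block-diagonal transition matrix for a rigid object with
# 6 uncoupled degrees of freedom, each with second-order (position,
# velocity) dynamics, and propagate a state over one sampling period.

def block_diagonal_transition(T, dof=6):
    """Assemble the (2*dof x 2*dof) matrix from (2 x 2) blocks [[1, T], [0, 1]]."""
    n = 2 * dof
    F = [[0.0] * n for _ in range(n)]
    for d in range(dof):
        i = 2 * d
        F[i][i] = 1.0          # position keeps its value ...
        F[i][i + 1] = T        # ... plus T times velocity
        F[i + 1][i + 1] = 1.0  # velocity constant over one cycle
    return F

def apply(F, x):
    """Matrix-vector product x_{k+1} = F x_k."""
    return [sum(F[r][c] * x[c] for c in range(len(x))) for r in range(len(F))]

F = block_diagonal_transition(0.040)          # 40 ms video cycle (assumed)
x = [0.0, 1.0] * 6                            # (position, velocity) per DOF
x1 = apply(F, x)                              # one prediction step
```

Adding the first-order noise model per degree of freedom mentioned in the text would enlarge each block to 3 × 3 (position, velocity, noise state) in the same block-diagonal pattern.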
20 AI MAGAZINE
Articles
for range and other aspect conditions allow predictions of additional features to be detectable with specially tuned operators (types and parameters), thereby realizing the idea of Gestalt recognition. Several hypotheses can be put up in parallel if computing resources allow it; invalid ones will disappear rapidly over time because the Jacobian matrices allow efficient use of computational resources.
Determining the angular ego state from inertial measurements allows a reduction of search areas in the images, especially in the tele-images. The dynamic object database (DOB, block 5 in figure 5) serves the purpose of data distribution; copies of this DOB are sent to each computer in the distributed system. The DOB isolates the higher system levels from the high visual and inertial data rates to be handled in real time on the lower levels. Here, only object identifiers and best estimates of state variables (in the systems dynamics sense), control variables, and model parameters are stored.
By looking at time series of these variables (of limited extension), the objects most interesting to the autonomous vehicle and their likely future trajectories have to be found for decision making. For this temporally deeper understanding of movements, the concept of maneuvers and mission elements for achieving some goals of the subjects in control of these vehicles has to be introduced. Quite naturally, this requirement leads to an explicit representation of the behavioral capabilities of subjects for characterizing their choices in decision making. These subjects might activate gaze control and select the mode of visual perception and locomotion control. Realization of these behavioral activities in the physical world is usually performed on special hardware (real-time processors) close to the physical device (see figure 2, lower right). Control engineering methods such as parameterized stereotypical feed-forward control-time histories or (state- or output-) feedback control schemes are applied here. However, for achieving autonomous capabilities, an abstract (quasistatic) representation of these control modes and of the transitions possible between them has recently been implemented on the upper system levels by AI methods according to extended state charts (Harel 1987; Maurer 2000a; Siedersberger 2003). This procedure allows more flexible behavior decisions, based also on maneuvers of other subjects supposedly under execution. In a defensive style of driving, this recognition of maneuvers is part of situation assessment. For more details on behavior decision for gaze and attention (BDGA), see Pellkofer (2003) and Pellkofer, Lützeler, and Dickmanns (2001).
The upper part of figure 5 is specific to situation analysis and behavior decision. First, trajectories of objects (proper) can be predicted into the future to see whether dangerous situations can develop, taking the motion plans of the autonomous vehicle into account. For this reason, not just the last best estimate of the state variables is stored; on request, a time history of a certain length can also be buffered. Newly initiated movements of subjects (maneuver onsets) can be detected early (Caveney, Dickmanns, and Hedrik 2003). A maneuver is defined as a standard control time history resulting in a desired state transition over a finite period of time. For example, if a car slightly in front in the neighboring lane starts moving toward the lane marking that separates its lane from that of the autonomous vehicle, the initiation of a lane-change maneuver can be concluded. In this maneuver, a lateral offset of one lane width is intended within a few seconds. This situational aspect of a started maneuver is stored by a proper symbol in an additional slot; in a distributed realization, pointers can link this slot to the other object properties.
A more advanced application of this technique is to observe an individual subject over time and keep statistics on its way of behaving. For example, if this subject tends to frequently close in on the vehicle in front to unreasonably small distances, the subject might be classified as aggressive, and one should try to keep a safe distance from him or her. A great deal of room is left for additional extensions in storing knowledge about behavioral properties of other subjects and learning to know them not just as class members but as individuals.

Arranging of Objects in a Scene Tree

To classify the potentially large decision space for an autonomous system into a manageable set of groups, the notion of situation has been introduced by humans. The underlying assumption is that for given situations, good rules for arriving at reasonable behavioral decisions can be formulated. A situation is not just an arrangement of objects in the vicinity but depends on the intentions of the autonomous vehicle and those of other subjects. To represent a situation, the geometric distribution of objects-subjects with their velocity components, their intended actions, and the autonomous vehicle's actions all have to be taken into account.
The relative dynamic state of all objects-subjects involved is the basic property for many aspects of a situation; therefore, representing this state is the central task. In the 4D approach, Dirk Dickmanns (1997) introduced the scene tree.
Representation of Capabilities, Mission Requirements

Useful autonomous systems must be able to accept tasks and solve them on their own. For this purpose, they must have a set of perceptual and behavioral capabilities at their disposal. Flexible autonomous systems are able to recognize situations and select behavioral components that allow efficient mission performance. All these activities require goal-oriented control actuation. Subjects can be grouped into classes according to their perceptual and behavioral capabilities (among other characteristics, such as shape), so that the scheme developed and discussed later can be used for characterizing classes of subjects. This scheme is a recent development; its explicit representation in the C++ code of EMS vision has been done for gaze and locomotion behavior (figure 7).

[Figure 7 shows a capability matrix with one column for gaze control and one for locomotion control. From top to bottom, its rows are: goal-oriented use of all capabilities in the mission context / performance of mission elements by coordinated behaviors; goal-oriented specific capabilities in each category (maneuvers and special feedback modes in both columns); elementary skills (automated capabilities) in each category (gaze control basic skills / vehicle control basic skills); and basic software and hardware preconditions (underlying software and actuators: gaze control / own body).]

Figure 7. Capability Matrix as Represented in Explicit Form in the EMS Vision System for Characterizing Subjects by Their Capabilities in Different Categories of Action.

Two classes of behavioral capabilities in motion control are of special importance: (1) so-called maneuvers, in which special control time histories lead to a desired state transition in a finite (usually short) amount of time, and (2) regulating activities, realized by some kind of feedback control, in which a certain relative state is to be achieved and/or maintained; the regulating activities can be extended over long periods of time (like road running). Both kinds of control application are extensively studied in the control engineering literature: maneuvers in connection with optimal control (see, for example, Bryson and Ho [1975]) and regulating activities in the form of linear feedback control (Kailath [1980] is only one of many textbooks).
Besides vehicle control for locomotion, both types are also applied in active vision for gaze control. Saccades and search patterns are realized with prespecified control time histories (topic 1 earlier), and view fixation on moving objects and inertial stabilization are achieved with proper feedback of visual or inertial signals (topic 2).
Complex control systems are realized on distributed computer hardware with specific processors for vehicle- and gaze-control actuation on the one side and for knowledge processing on the other. The same control output (maneuver, regulatory behavior) can be represented internally in different forms, adapted to the subtasks to be performed in each unit. For decision making, it is not required to know all data of the actual control output and the reactions necessary to counteract random disturbances as long as the system can trust that perturbations can be handled. However, it has to know which state transitions can be achieved by which maneuvers, possibly with favorable parameters depending on the situation (speed of vehicle, friction coefficients to the road, and so on). This knowledge is quasistatic and is typically handled by AI methods (for example, state charts [Harel 1987]).
Actual implementations of maneuvers and feedback control are achieved by tapping knowledge stored in software on the processors directly actuating the control elements (lower layers in figures 2, 7, and 8, right). For maneuvers, numerically or analytically coded time histories can be activated with proper parameters; the resulting nominal trajectories can also be computed and taken as reference for superimposed feedback loops dealing with unpredictable perturbations. For regulatory tasks, only the commanded (set) values of the controlled variables have to be specified. The feedback control laws usually consist of matrices of feedback gains and a link to the proper state or reference variables. These feedback loops can be realized with short time delays because measurement data are fed directly into these processors. This dual representation with different schemes on the level close to the hardware (with control engineering–type methods)
and on the mental processing level (with quasistatic knowledge as suited for decision making) has several advantages. Besides the practical advantages for data handling at different rates, this representation also provides symbol grounding for verbs and terms. In road vehicle guidance, terms such as road running with constant speed or with speed adapted to road curvature, lane changing, convoy driving, or the complex (perception and vehicle control) maneuver for turning off onto a crossroad are easily defined by combinations or sequences of feed-forward and feedback control activation. For each category of behavioral capability, network representations can be developed showing how specific goal-oriented capabilities (third row from the bottom of figure 7) depend on elementary skills (second row). The top layer in these network representations shows how specific capabilities in the different categories contribute to the overall goal-oriented use of all capabilities in the mission context. The bottom layer represents the necessary preconditions for a capability to be actually available; before activation of a capability, this fact is checked by the system through flags, which have to be set by the corresponding distributed processors. For details, the interested reader is referred to Pellkofer (2003) and Siedersberger (2004).

Overall Cognitive System Architecture

The components for perception and control of an autonomous system have to be integrated into an overall system, allowing both easy human-machine interfaces and efficient performance of the tasks for which the system has been designed. To achieve the first goal, the system should be able to understand terms and phrases as they are used in human conversations. Verbs are an essential part of these conversations. They have to be understood, including their semantics and the movements or changes in appearance of subjects when watching these activities. The spatiotemporal models used in the 4D approach should ease the realization of these requirements in the future.
In general, two regimes in recognition can be distinguished quite naturally. First are observations to grasp the actual kind and motion states of objects or subjects as they appear here and now, which is the proper meaning of the phrase to see. Second, when an object-subject has been detected and recognized, further observations on a larger time scale allow a deeper understanding of what is happening in the world. Trajectories of objects can tell something about the nature of environmental perturbations (such as a piece of paper moved around by wind fields), or the route selected by subjects might allow inferring pursued intentions. In any case, these observations can affect autonomous vehicle decisions.
Recent publications in the field of research on human consciousness point in the direction that the corresponding activities are performed in different parts of the brain in humans (Damasio 1999).
Figure 2 shows the overall cognitive system architecture as it results quite naturally when exploiting recursive estimation techniques with spatiotemporal models for dynamic vision (the 4D approach). However, it took more than a decade and many dissertations to get all elements from the fields of control engineering and AI straightened out, arranged, and experimentally verified such that a rather simple but very flexible system architecture resulted. Object-oriented programming and powerful modern software tools were preconditions for achieving this goal with a moderate investment of human capital. An important step in clarifying the roles of control engineering and AI methods was to separate the basic recognition step for an object-subject here and now from accumulating more in-depth knowledge on the behavior of subjects derived from state-variable time histories.
The horizontal bar halfway up in figure 2, labeled "Dynamic Knowledge Representation (DKR)," separates the number-crunching lower part, working predominantly with systems dynamics and control engineering methods, from the mental AI part on top. The latter consists of three (groups of) elements: (1) the scene tree for representing the properties of objects and subjects perceived (left); (2) the actually valid task (mission element) for the autonomous vehicle (center); and (3) the abstract representations of the autonomous vehicle's perceptual and behavioral capabilities (right).
Building on this wealth of knowledge, situation assessment is done on the higher level independently for gaze and vehicle control because MARVEYE with its specialized components has more precise requirements than just vehicle guidance. It especially has to come up with the decision when to split attention over time and perform saccadic perception with a unified model of the scene (Pellkofer 2003; Pellkofer, Lützeler, and Dickmanns 2001). A typical example is recognition of a crossroad with the telecamera, requiring alternating viewing directions to the intersection and further down into the crossroad (Lützeler 2002). If several specialist visual recognition processes request contradicting regions of attention, which cannot be satisfied by the usual perceptual capabilities, the central decision unit has to set priorities in
the mission context. Possibly, it has to modify vehicle control such that perceptual needs can be matched (see also figure 2, upper right); there is much room here for further developments.
The results of situation assessment for vehicle guidance in connection with the actual mission element to be performed include some kind of prediction of the intended trajectory of the autonomous vehicle and of several other ones for which critical situations might develop. Background knowledge on the values, goals, and behavioral capabilities of other vehicles allows assuming intentions of other subjects (such as a lane-change maneuver when observing systematic deviations to one side of its lane). On this basis, fast in-advance simulations or quick checks of lookup tables can clarify problems that might occur when the autonomous vehicle's maneuver is continued. Based on a set of rules, the decision level can then trigger a transition to a safer maneuver (such as deceleration or a deviation to one side).
The implementation of a new behavior is done on the 4D level (lower center right in figure 2), again with control engineering methods allowing the prediction of the typical frequencies and damping of the eigenmodes excited. Designing these control laws and checking their properties is part of conventional engineering in automotive control. What has to be added for autonomous systems is (1) specifying the situations in which these control laws are to be used with which parameters and (2) storing actual results of an application together with some performance measures about the quality of the resulting behavior. The definition of these performance measures (or payoff functions, possibly geared to more general values for judging behavior) is one of the areas that needs more attention. Resulting maximum accelerations and minimal distances to obstacles encountered or to the road shoulder might be some criteria.

Realization with Commercial Off-the-Shelf Hardware

The EMS vision system has been implemented on two to four Dual-Pentium PCs with a scalable coherent interface (SCI) for communication rates of up to 80 megabytes a second and with NT as the operating system. This simple solution has only been possible because two transputer subsystems (having no operating system at all) have been kept as a hardware interface to conventional sensors and actuators from the previous generation (lower layers in figure 8). Spatiotemporal modeling of processes allows for adjustments to variable cycle times and various delay times in different branches of the system.

[Figure 8 shows the distributed hardware: the MarVEye camera configuration (monochrome and color cameras) on a two-axis gaze control platform, the gaze control subsystem, and the vehicle subsystem with GC, VC, EPC, and GPS units connected to the PCs.]

Figure 8. Distributed Commercial Off-the-Shelf (COTS) Computer System with Four Dual PCs, Three Frame Grabbers, and Two Transputer Systems on Which Expectation-Based, Multifocal, Saccadic Vision Has Been Realized and Demonstrated.

Three synchronized frame grabbers handle video data input for as many as four cameras at a […]
[…] reference in the wide-angle images. Figure 11 shows this sequence with simultaneously taken images left and right in each image row and a temporal sequence at irregular intervals from top down. The upper row (figure 11a) was taken when the vehicle was still driving essentially in the direction of the original road, with the cameras turned left (looking over the shoulder). The left image shows the left A-frame of the body of VAMORS and the crossroad far away through the small side window. The crossroad is measured in the right window only, yielding the most important information for a turnoff to the left.
In figure 11b, the vehicle has turned so far that searching for the new road reference is done in both images, which has been achieved in figure 11c of the image pairs. Now, from this road model for the near range, the road boundaries are also measured in the teleimage for recognizing road curvature precisely (left image in figure 11d). The right subimage here shows a bird's-eye view of the situation, with the vehicle (marked by an arrow) leaving the intersection area.
This rather complex perception and locomotion maneuver can be considered a behavioral capability emerging from proper combinations of elementary skills in both perception and motion control, which the vehicle either has or does not have. The development task existing right now is to provide autonomous vehicles with a sufficient number of these skills and behavioral capabilities so that they can manage to find their way on their own and perform entire missions (Gregor 2002; Gregor et al. 2001). As a final demonstration of this project, a complete mission on a road network was performed, including legs on grass surfaces with the detection of ditches by EMS vision (Siedersberger et al. 2001). For more robust performance, the EMS vision system has been beefed up with special stereo hardware from Pyramid Vision System (the PVS ACADIA board). The board was inserted in one of the four PCs. Detection of a ditch at greater distances was performed using proven edge-detection methods, taking average intensity values on the sides into account; discriminating shadows on a smooth surface and nearby tracking of the ditch had to be done on the basis of the disparity image delivered. The ditch is described by an encasing rectangle in the ground surface, the parameters and orientation of which are determined visually. An evasive maneuver with view fixation on the nearest corner was performed (Hofmann and Siedersberger 2003; Pellkofer, Hofmann, and Dickmanns 2003).

Conclusions and Outlook

The integrated approach using spatiotemporal models, similar to what humans do in everyday life (subconsciously), seems more promising for developing complex cognitive robotic systems than separate subworlds for automatic control and knowledge processing. Both the cybernetics approach developed around the middle of the last century and the AI approaches developed in its latter part seem to have been too narrow-minded for real success in complex environments with noise-corrupted processes and a large spectrum of objects and subjects. Subjects encountered in our everyday environment range from simple and dumb (animals on lower evolutionary levels or early robots) to intelligent (such as primates); autonomous vehicles can augment this realm. Classes of subjects can be distinguished (beside shape) by their potential perceptual and behavioral capabilities. Individuals of each class can actually have only some of these capabilities at their disposal. For this reason (among others), the explicit representation of capabilities has been introduced in the EMS visual perception system.
This advanced system mimicking vertebrate vision trades about two orders of magnitude of reduction in steady video data flow for a few […]
[…] to different values of the performance measure for this maneuver. By observing which changes in control have led to which changes in performance, overall improvements can be achieved systematically. Similarly, systematic changes in cooperation strategies can be evaluated in light of some more refined performance criteria. On top of figure 7, layers have to be added for future extensions toward systems capable of learning and cooperation. The explicit representation of other capabilities of subjects, such as perception, situation assessment, planning, and evaluation of the experience gained, should be added as additional columns in the long run. This approach will lead to more flexibility and extended autonomy.

Acknowledgments

This work was funded by the German Bundesamt fuer Wehrtechnik und Beschaffung (BWB) from 1997 to 2002 under the title "Intelligent Vehicle Functions." Contributions by former coworkers S. Baten, S. Fuerst, and V. v. Holt are gratefully acknowledged here because they do not show up in the list of references.

Notes

1. Also of interest is McCarthy, J.; Minsky, M.; Rochester, N.; and Shannon, C. 1955. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, Aug. 31.
2. It is interesting to note here that in the English language, this word is used both for the acting instance (a subject [person] doing something) and for the object being investigated (subjected to investigation). In the German language, one would prefer to call this latter one the object, even if it is a subject in the former sense (der Gegenstand der Untersuchung; Gegenstand directly corresponds to the Latin word objectum).

References

Albus, J. S., and Meystel, A. M. 2001. Engineering of Mind—An Introduction to the Science of Intelligent Systems. New York: Wiley Interscience.
Arkin, R. C. 1998. Behavior-Based Robotics. Cambridge, Mass.: MIT Press.
AW&ST. 1986. Researchers Channel AI Activities toward Real-World Applications. Aviation Week and Space Technology, Feb. 17: 40–52.
Behringer, R. 1996. Visuelle Erkennung und Interpretation des Fahrspurverlaufes durch Rechnersehen für ein autonomes Straßenfahrzeug (Visual Recognition and Interpretation of Roads by Computer Vision for an Autonomous Road Vehicle). Ph.D. diss., Department LRT, UniBw Munich.
Brooks, R., and Stein, L. 1994. Building Brains for Bodies. Autonomous Robots 1(1): 7–25.
Bryson, A. E., Jr., and Ho, Y.-C. 1975. Applied Optimal Control: Optimization, Estimation, and Control. Rev. ed. Washington, D.C.: Hemisphere.
Caveney, D.; Dickmanns, D.; and Hedrik, K. 2003. Before the Controller: A Multiple Target Tracking Routine for Driver Assistance Systems. Automatisierungstechnik 51(5): 230–238.
Damasio, A. 1999. The Feeling of What Happens: Body and Emotion in the Making of Consciousness. New York: Harcourt Brace.
Dickmanns, E. D. 2002. The Development of Machine Vision for Road Vehicles in the Last Decade. Paper presented at the IEEE Intelligent Vehicle Symposium, 18–20 June, Versailles, France.
Dickmanns, E. D. 1998. Vehicles Capable of Dynamic Vision: A New Breed of Technical Beings? Artificial Intelligence 103(1–2): 49–76.
Dickmanns, D. 1997. Rahmensystem für visuelle Wahrnehmung veränderlicher Szenen durch Computer (Framework for Visual Perception of Changing Scenes by Computers). Ph.D. diss., Department Informatik, UniBw Munich.
Dickmanns, E. D. 1987. 4D Dynamic Scene Analysis with Integral Spatiotemporal Models. In The Fourth International Symposium on Robotics Research, eds. R. Bolles and B. Roth, 311–318. Cambridge, Mass.: The MIT Press.
Dickmanns, E. D. 1995. Performance Improvements for Autonomous Road Vehicles. In Proceedings of the International Conference on Intelligent Autonomous Systems (IAS-4), 27–30 March, Karlsruhe, Germany.
Dickmanns, E. D., and Graefe, V. 1988a. Application of Dynamic Monocular Machine Vision. Journal of Machine Vision and Application 1: 223–241.
Dickmanns, E. D., and Graefe, V. 1988b. Dynamic Monocular Machine Vision. Journal of Machine Vision and Application 1: 242–261.
Dickmanns, E. D., and Mysliwetz, B. 1992. Recursive 3D Road and Relative Ego-State Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (Special Issue on Interpretation of 3D Scenes) 14(2): 199–213.
Dickmanns, E. D., and Wünsche, H.-J. 1999. Dynamic Vision for Perception and Control of Motion. In Handbook of Computer Vision and Applications, Volume 3, eds. Bernd Jähne, Horst Haußecker, and Peter Geißler, 569–620. San Diego, Calif.: Academic Press.
Dickmanns, E. D., and Zapp, A. 1986. A Curvature-Based Scheme for Improving Road Vehicle Guidance by Computer Vision. In Mobile Robots, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Volume 727, 161–168. Bellingham, Wash.: SPIE—The International Society for Optical Engineering.
Dickmanns, E. D.; Mysliwetz, B.; and Christians, T. 1990. Spatiotemporal Guidance of Autonomous Vehicles by Computer Vision. IEEE Transactions on Systems, Man, and Cybernetics (Special Issue on Unmanned Vehicles and Intelligent Robotic Systems) 20(6): 1273–1284.
Dickmanns, E. D.; Behringer, R.; Dickmanns, D.; Hildebrandt, T.; Maurer, M.; Thomanek, F.; and Schiehlen, J. 1994. The Seeing Passenger Car. Paper presented at the Intelligent Vehicles '94 Symposium, 24–26 October, Paris, France.
Gregor, R. 2002. Fähigkeiten zur Missionsdurchführung und Landmarkennavigation (Capabilities for Mission Performance and Landmark Navigation). Ph.D. diss., Department LRT, UniBw Munich.
Gregor, R., and Dickmanns, E. D. 2000. EMS Vision: Mission Performance on Road Networks. In Proceedings of the IEEE Intelligent Vehicles Symposium, 140–145. Washington, D.C.: IEEE Computer Society.
Gregor, R.; Lützeler, M.; Pellkofer, M.; Siedersberger, K.-H.; and Dickmanns, E. D. 2001. A Vision System for Autonomous Ground Vehicles with a Wide Range of Maneuvering Capabilities. Paper presented at the International Workshop on Computer Vision Systems, 7–8 July, Vancouver, Canada.
Gregor, R.; Lützeler, M.; Pellkofer, M.; Siedersberger, K.-H.; and Dickmanns, E. D. 2000. EMS Vision: A Perceptual System for Autonomous Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 52–57. Washington, D.C.: IEEE Computer Society.
Harel, D. 1987. State Charts: A Visual Formalism for Complex Systems. Science of Computer Programming 8(3): 231–274.
Hofmann, U., and Siedersberger, K.-H. 2003. Stereo and Photometric Image Sequence Interpretation for Detecting Negative Obstacles Using Active Gaze Control and Performing an Autonomous Jink. Paper presented at the SPIE AeroSense Unmanned Ground Vehicles Conference, 22–23 April, Orlando, Florida.
Hofmann, U.; Rieder, A.; and Dickmanns, E. D. 2000. EMS Vision: An Application to Intelligent Cruise Control for High Speed Roads. In Proceedings of the IEEE Intelligent Vehicles Symposium, 468–473. Washington, D.C.: IEEE Computer Society.
Kailath, T. 1980. Linear Systems. Englewood Cliffs, N.J.: Prentice Hall.
Kalman, R. 1960. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering: 35–45. Fairfield, N.J.: American Society of Mechanical Engineers.
Klass, P. J. 1985. DARPA Envisions New Generation of Machine Intelligence. Aviation Week and Space Technology 122(16), 22 April.
Lützeler, M. 2002. Fahrbahnerkennung zum Manövrieren auf Wegenetzen mit aktivem Sehen (Road Recognition for Maneuvering on Road Networks Using Active Vision). Ph.D. diss., Department LRT, UniBw Munich.
Lützeler, M., and Dickmanns, E. D. 2000. EMS Vision: Recognition of Intersections on Unmarked Road Networks. In Proceedings of the IEEE Intelligent Vehicles Symposium, 302–307. Washington, D.C.: IEEE Computer Society.
Marr, D. 1982. Vision. San Francisco, Calif.: Freeman.
Maurer, M. 2000a. Flexible Automatisierung von Straßenfahrzeugen mit Rechnersehen (Flexible Automation of Road Vehicles by Computer Vision). Ph.D. diss., Department LRT, UniBw Munich.
Maurer, M. 2000b. Knowledge Representation for Flexible Automation of Land Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 575–580. Washington, D.C.: IEEE Computer Society.
Miller, G.; Galanter, E.; and Pribram, K. 1960. Plans and the Structure of Behavior. New York: Holt, Rinehart, and Winston.
[…] Joint Conferences on Artificial Intelligence.
Pellkofer, M. 2003. Verhaltensentscheidung für autonome Fahrzeuge mit Blickrichtungssteuerung (Behavior Decision for Autonomous Vehicles with Gaze Control). Ph.D. diss., Department LRT, UniBw Munich.
Pellkofer, M., and Dickmanns, E. D. 2000. EMS Vision: Gaze Control in Autonomous Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, 296–301. Washington, D.C.: IEEE Computer Society.
Pellkofer, M.; Hofmann, U.; and Dickmanns, E. D. 2003. Autonomous Cross Country Driving Using Active Vision. Paper presented at Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, 27–30 October, Providence, Rhode Island.
Pellkofer, M.; Lützeler, M.; and Dickmanns, E. D. 2001. Interaction of Perception and Gaze Control in Autonomous Vehicles. Paper presented at the XX SPIE Conference on Intelligent Robots and Computer Vision: Algorithms, Techniques, and Active Vision, 28 October–2 November, Boston, Massachusetts.
Pomerleau, D. A. 1992. Neural Network Perception for Mobile Robot Guidance. Ph.D. diss., Computer Science Department, Carnegie Mellon University, Pittsburgh, Penn.
Rieder, A. 2000. Fahrzeuge sehen—Multisensorielle Fahrzeugerkennung in einem verteilten Rechnersystem für autonome Fahrzeuge (Seeing Vehicles—Multisensory Vehicle Recognition in a Distributed Computer System for Autonomous Vehicles). Ph.D. diss., LRT, UniBw Munich.
Rosenfeld, A., and Kak, A. 1976. Digital Picture Processing. San Diego, Calif.: Academic.
Schell, F. R. 1992. Bordautonomer automatischer Landeanflug aufgrund bildhafter und inertialer Meßdatenauswertung (On-board Autonomous Automatic Landing Approach Based on the Evaluation of Visual and Inertial Measurement Data). Ph.D. diss., LRT, UniBw Munich.
Selfridge, O. G. 1959. Pandemonium: A Paradigm for Learning. In Proceedings of the Symposium on Mechanisation of Thought Processes. London: Her Majesty's Stationery Office.
Selfridge, O. G., and Neisser, U. 1960. Pattern Recognition by Machine. In Computers and Thought, eds. E. Feigenbaum and J. Feldman, 235–267. New York: McGraw-Hill.
Siedersberger, K.-H. 2003. Komponenten zur automatischen Fahrzeugführung in sehenden (semi-)autonomen Fahrzeugen (Components for Automatic Guidance of Autonomous Vehicles with the Sense of Vision). Ph.D. diss., LRT, UniBw Munich.
Siedersberger, K.-H., and Dickmanns, E. D. 2000. EMS Vision: Enhanced Abilities for Locomotion. In Proceedings of the IEEE Intelligent Vehicles Symposium, 146–151. Washington, D.C.: IEEE Computer Society.
Siedersberger, K.-H.; Pellkofer, M.; Lützeler, M.; Dickmanns, E. D.; Rieder, A.; Mandelbaum, R.; and Bogoni, I. 2001. Combining EMS Vision and Horopter Stereo for Obstacle Avoidance of Autonomous Vehicles. Paper presented at the International Workshop on Computer Vision Systems, 7–8 July, Vancouver, Canada.
Thomanek, F. 1996. Visuelle Erkennung und Zustandsschätzung von mehreren Straßenfahrzeugen zur autonomen Fahrzeugführung (Visual Recognition and State Estimation of Multiple Road Vehicles for Autonomous Vehicle Guidance). Ph.D. diss., LRT, UniBw Munich.
Thomanek, F.; Dickmanns, E. D.; and Dickmanns, D. 1994. Multiple Object Recognition and Scene Interpretation for Autonomous Road Vehicle Guidance. Paper presented at the 1994 Intelligent Vehicles Symposium, 24–26 October, Paris, France.
Werner, S. 1997. Maschinelle Wahrnehmung für den bordautonomen automatischen Hubschrauberflug (Machine Perception for Onboard Autonomous Automatic Helicopter Flight). Ph.D. diss., LRT, UniBw Munich.
Wiener, N. 1948. Cybernetics. New York: Wiley.
Moravec, H. 1983. The Stanford Cart and
the CME Rover. Proceedings of the IEEE board Autonomous Automatic Landing Ap- Wuensche, H.-J. 1988. Bewegungss-
71(7): 872–884. proached. Based on Processing of Iconic teuerung durch Rechnersehen (Motion
(Visual) and Inertial Measurement Data). Control by Computer Vision). Band 20,
Mysliwetz, B. 1990. Parallelrechner-basierte
Ph.D. diss., LRT, UniBw Munich. Fachber. Messen-Steuern–Regeln. Berlin:
Bildfolgen-Interpretation zur autonomen
Schell, F. R., and Dickmanns, E. D. 1994. Springer.
Fahrzeugsteuerung (Parallel Computer-
Based image Sequence Interpretation for Autonomous Landing of Airplanes by Dy- Ernst D. Dickmanns studied aerospace
the Guiding of Autonomous Vehicles). namic Machine Vision. Machine Vision and and control engineering at RWTH Aachen
Ph.D. diss., Department LRT, UniBw Mu- Application 7(3): 127–134. and Princeton University. After 14 years in
nich. Schiehlen, J. 1995. Kameraplattformen für aerospace research with DFVLR Oberpfaf-
Newell, A., and Simon, H. 1963. GPS: A Pro- aktiv sehende Fahrzeuge (Camera Pointing fenhofen in the field of optimal trajecto-
gram That Simulates Human Thought. In Platforms for Vehicles with Active Vision). ries, in 1975, he became full professor of
Computers and Thought, 279–293, eds. E. Ph.D. diss., LRT, UniBw Munich. control engineering in the Aerospace De-
Feigenbaum and J. Feldman. New York: Mc- Selfridge, O. 1959. Pandemonium: A Para- partment at the University of the Bun-
Graw-Hill. digm for Learning. In Proceedings of the Sym- deswehr near Munich. Since then, his main
Nilsson N. J. 1969. A Mobile Automaton: posium on the Mechanization of Thought research field has been real-time vision for
An Application of Artificial Intelligence. In Processes., 511–529, ed. D. V. Blake and A. autonomous vehicle guidance, pioneering
Proceedings of the First International Joint M. Uttley. London, U.K.: Her Majesty’s Sta- the 4D approach to dynamic vision. His e-
Conference on Artificial Intelligence, tionary Office. mail address is Ernst.Dickmanns@unibw-
509–521. Menlo Park, Calif.: International Selfridge, O., and Neisser, U. 1960. Pattern muenchen.de.
30 AI MAGAZINE