

Chapter 12

AUTOMATION AND DIGITAL PHOTOGRAMMETRIC WORKSTATIONS

Peggy Agouris, Peter Doucette and Anthony Stefanidis

Manual of Photogrammetry (5th Edition), McGlone, C., Mikhail, E., and Bethel, J. (eds.), American Society for Photogrammetry and Remote Sensing, pp. 949-981.

12.1 DIGITAL PHOTOGRAMMETRIC WORKSTATIONS

12.1.1 Introduction
Prior to the 1990s, early pioneering attempts to introduce digital (softcopy) photogrammetric map
production involved multiple special purpose and high-cost hybrid workstations equipped with custom
hardware and running associated proprietary software. The early 1990s saw the advent of the end-to-end
softcopy-based system, or Digital Photogrammetric Workstation (DPW). This marked a major paradigm
shift toward a stand-alone workstation that could accommodate virtually all aspects of image-based map
production and geospatial analysis. Through most of the 1990s, DPW systems ran almost exclusively on
high-end Unix or VAX platforms. Supported by advances in computer hardware and software, the late
1990s saw a migration toward more economical, modular, scalable, and open hardware architectures
provided by PC/Windows-based platforms, which also offered performance comparable to their Unix-
based counterparts. At the printing of this manual, there are about 15 independent software vendors that
offer PC/Windows-based DPW production systems that vary in cost, functionality, features, and complexity.
Among the most popular fully-featured high-end systems are SOCET SET® by BAE Systems, Softplotter®
by Autometric, ImageStation® by Z/I Imaging, and Geomatica® by PCI Geomatics. Contained among these
system designs is a legacy of experience in the development of photogrammetric and mapping
instrumentation from several familiar vendors. Therefore, DPW design and development have been
greatly influenced by established conventions in the photogrammetric practice, resulting in comparable
architecture and functionality in these systems.
A DPW system comprises software and hardware that supports the storage, processing, and display of
imagery and relevant geospatial datasets, and the automated and interactive image-based measurement
of 3-dimensional information. Dictated by the requirements of softcopy map production, the defining
characteristics of a DPW include:
• the ability to store, manage, and manipulate very large image files,
• the ability to perform computationally demanding image processing tasks,
• providing smooth roaming across entire image files, and supporting zooming at various resolutions,
• supporting large monitor and stereoscopic display,
• supporting stereo vector superimposition, and
• supporting 3-dimensional data capture and edit.
Some of these challenges are met by making use of common computer solutions. For example, in order
to support rigorous image processing and display demands for images with radiometric resolutions of up
to 16 bit panchromatic (48 bit RGB), DPW systems make use of high-end graphics hardware and large
amounts of video memory and disk storage. These capabilities are enhanced by specially designed
solutions that enable, for example, seamless roaming over large image files (see e.g. the architecture of
Intergraph’s ImagePipe [Graham et al., 1997]).
A typical DPW is shown in Figure 12.1(a). It is a dual monitor configuration, designed to dedicate one
monitor to stereoscopic viewing. The second monitor is commonly used for the photogrammetric software graphical user interface (GUI), general display purposes, and general computing requirements. Even though single monitor configurations are also possible when the graphics hardware supports stereo-in-a-window viewing, dual monitor configurations currently remain the most popular choice. They represent
a natural evolution of analytical photogrammetric plotters. Similar to DPWs, analytical stereoplotters

made use of a monitor dedicated to GUI and software control. The stereo displaying monitor of a DPW (and corresponding eyewear) can be considered the counterpart of the complex electro-optical viewing system of an analytical stereoplotter with its oculars and corresponding prisms. It allows the operator to view stereoscopically a window in the overlapping area of a stereopair. As the images are available in a DPW in softcopy format (as computer files) instead of actual film, the complex electromechanical mechanism used to control movement in an analytical stereoplotter is obsolete for a DPW, replaced by computer operations that generate pixel addresses within a file and access the corresponding information. The 3-dimensional measuring device (a.k.a. turtle) of an analytical stereoplotter is replaced by specialized 3-dimensional mouse designs that are available to facilitate the extraction of x, y, z coordinates. These 3-dimensional mouse designs may range from simple mouse-and-trackball configurations to more complex specially designed devices such as Leica Geosystems' TopoMouse shown in Figure 12.1(b). The TopoMouse has several programmable buttons and a large thumbwheel to adjust the z-coordinate, and complements standard PC input devices to control roaming and measuring in both single image and stereo mode.

Figure 12.1. (a) A Digital Photogrammetric Workstation (courtesy of Z/I Imaging); (b) 3-dimensional TopoMouse (courtesy of Leica Geosystems).
Beyond the above-mentioned effects that the availability of softcopy imagery has had on the design
of DPW configurations, the most dramatic effect of this transition has been the increased degree of
automation of photogrammetric operations. Whereas automation in analytical stereoplotters was limited
to driving the stereoplotter stage to specific locations, in DPWs automation has affected practically all
parts of the photogrammetric process, from orientation to orthophoto generation. This automation is
addressed in Sections 12.2 through 12.4 of this chapter.

12.1.2 Electro-Stereoscopic Display and Viewing Techniques
Among high-end DPWs, the preferred electro-stereoscopic viewing methods include active liquid crystal
shuttering eyewear used in conjunction with an IR emitter, or passive polarized spectacles used in
conjunction with a liquid crystal modulating display panel. Either system allows viewing color imagery,
and permits multiple simultaneous viewers. Popular choices include the electro-stereoscopic viewing
devices shown in Figure 12.2.
The active modulating panel (ZScreen) of Figure 12.2(a) is a liquid crystal panel that is placed over a monitor. The polarization (electric field component of EM radiation) of the panel is modulated at a rate synchronized with the graphics processor that alternates left and right image displays on the screen. Vertical and horizontal polarization directions are used to differentiate between left and right images. Similarly aligned polarized spectacles are worn by the viewer to decode the stereo image.

Figure 12.2. Electro-stereoscopic viewing devices. (a) Passive polarized glasses with active modulating panel (Monitor ZScreen®), and (b) Active shuttering glasses with IR emitter (CrystalEyes®) (courtesy of StereoGraphics Corporation).

The active shuttering eyewear in Figure 12.2(b) is synchronized with the display to occlude the unwanted image, and transmit the wanted image for each eye to render a stereo image. The synchronization signal is

transmitted to the eyewear via an IR signal that originates from an emitter that interfaces with the
graphics hardware.
In either case, the stereoscopic display technique used is referred to as time-multiplexing or field-
sequential. A field represents image information allocated in video memory to be drawn to the display
monitor during a single refresh cycle. In such an approach, parallax information is provided to the eye by
rapidly alternating between the left and right images on the monitor. The images must be refreshed at a
sufficient rate (typically 120 fields per second, to achieve 60 fields per second per eye), in order to generate
a flicker-free stereoscopic image. Today, stereo-ready high-end graphics cards and monitors that use at
least a double-buffering technique to provide refresh rates up to 120 fields per second are readily available
for most computing platforms. Such graphics cards are equipped with a connector that provides an output
synchronization signal for electronic shuttering devices such as an IR emitter or monitor panel.
Stereoscopic viewing solutions also exist for graphics hardware that is not stereo-ready. Known as
the above-and-below format, the method uses two vertically arranged subfields that are above and
below each other and squeezed top to bottom by a factor of two in a single field. A sync-doubling emitter adds the missing synchronization pulses for graphics hardware running at a nominal rate of 60 fields per
second, thus doubling the rate for the desired flicker-free stereo image. As a result of altering the sync
rate of the monitor, above-and-below stereoscopic applications must run at full-screen. The stereo
graphics hardware automatically unsqueezes the image in the vertical so the stereo image has the
expected aspect ratio. A trade-off with this approach is that the vertical resolution of the stereo image
is effectively reduced by a factor of 2, since the pixels are stretched in the vertical direction. Nonetheless,
the above-and-below format provides a workable solution for non-stereo-ready graphics hardware.
When video memory is limited, a relatively low-cost technique for generating a stereo image is to
interlace left and right images on odd and even field lines (e.g., left image drawn on lines 1, 3, 5…etc.,
and right image on lines 2, 4, 6…etc.). Shuttering eyewear is synchronized with the refresh rate of the
odd and even field lines in order to decode a stereo image for the viewer. The drawbacks of interlaced
stereo include a degradation of vertical resolution by a factor of 2, more noticeable flicker, and applications
that are limited to full screen stereo mode.
For state-of-the-art stereo viewing capabilities, high-end 3-dimensional graphics hardware designs
offer what is known as quad buffered stereo (QBS). Once available only on Unix workstations, QBS has
now become commonplace on PCs. QBS can be understood as a simple extension of double-buffered
animation, i.e., during the display of one video image, the image to follow is concurrently drawn to a
memory buffer. The result is a faster, and thus smoother transition between image sequences. Double-
buffering is exploited in stereoscopic display techniques to speed up the transitions between left and
right images. QBS extends this concept even further by dividing the graphics memory into four buffers,
such that one pair of buffers is the display pair, while the other is the rendering pair. The result is vastly
improved stereo viewing quality in terms of smoothness during image roaming, as well as rendering
real-time superimposed vectors that are always current. A significant advantage of QBS is that it allows
for multiple stereo-in-a-window (SIAW) renderings. That is, a user has access to other application windows
while rendering stereo displays in one or more windows on a single monitor.
In an ideal stereoscopic system, each eye sees only the image intended for it. Electro-stereoscopic
viewing techniques are susceptible to a certain amount of crosstalk, i.e., each eye seeing, to some extent, the image intended for the other eye. The principal source of crosstalk in electro-stereoscopic
display is CRT phosphor afterglow, or ghosting. Following the display of the right image, its afterglow
will persist to some extent during the display of the left image, and vice-versa. Ghosting is kept at a
tolerable level by using sufficiently low parallax angles. As a general rule of thumb, parallax angles are
kept under 1.5° for comfortable stereo viewing with a typical workstation configuration (StereoGraphics,
1997). The parallax angle θ is defined as

$$ \theta \;=\; 2\arctan\!\left(\frac{P}{2d}\right) \qquad (12.1) $$

where P is the distance between right and left-eye projections of a given image point, and d is the

distance between the viewer's eye plane and the display screen. The latest advances in stereoscopic viewing devices include autostereoscopic solutions that require no specialized eyewear, rendering the stereo display directly on a liquid crystal flat-screen monitor.
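Equation 12.1 can be applied directly as a sanity check against the 1.5° comfort limit quoted above. The following minimal Python sketch uses purely illustrative screen-parallax and viewing-distance values (not taken from the text):

```python
import math

def parallax_angle_deg(p_mm: float, d_mm: float) -> float:
    """Angle subtended at the viewer by a screen parallax p at viewing distance d (Eq. 12.1)."""
    return math.degrees(2.0 * math.atan2(p_mm, 2.0 * d_mm))

# Illustrative values: 12 mm of screen parallax viewed from 700 mm.
theta = parallax_angle_deg(12.0, 700.0)
print(f"parallax angle = {theta:.2f} deg, comfortable: {theta < 1.5}")
```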

12.2 AUTOMATED ORIENTATIONS IN A DPW

12.2.1 Photogrammetric Mapping Workflow


The structure of the typical photogrammetric mapping workflow remains largely unchanged, as images
are oriented and subsequently analyzed to produce geospatial information. Figure 12.3 is a schematic
description of this workflow, identifying two groups of operations that make use of digital imagery. The
middle column group comprises operations that relate images to other images and to a reference
coordinate system, namely orientations and triangulation. The right-hand column comprises operations
that generate geospatial information: producing digital elevation models (DEMs) and orthophotos, and
proceeding with GIS generation or updates through feature extraction.

Figure 12.3. Typical photogrammetric mapping workflow.

While the overall workflow has not changed much, the use of softcopy imagery and DPWs has
revolutionized photogrammetric mapping by introducing automated modules to perform these tasks.
Today, modern DPW production software provides users with end-to-end solutions that are highly
modularized, flexible, and interoperable. Software applications are typically offered as a collection of
self-contained tools that can be run in customizable workflow sequences to generate desired products.
Typical core components offered by most vendors correspond to the fundamental tasks identified in
Figure 12.3, with additional modules dedicated to the management of the produced information. For
instance, in the case of SoftPlotter’s main production toolbar depicted in Figure 12.4, a standard end-to-
end workflow would generally proceed from left to right. However, it is also possible to customize this
process, importing for example external block triangulation results from third party software, so that the
user could proceed immediately to DEM generation. Each flow component usually offers import and
export options to accommodate an extensive variety of standard and customized image and data formats
to support interoperability with software from other vendors. In Sections 12.2-12.4 an overview of the fundamental implementation and capabilities of essential DPW workflow components is presented.

Figure 12.4. Main production toolbar of SoftPlotter (courtesy of Autometric).

12.2.2 Interior Orientation in a DPW
While digital cameras are becoming the standard in close-range applications, aerial mapping missions
commonly make use of analog cameras. The use of advanced high resolution digital sensors in
photogrammetric applications is still at an experimental stage [Fraser et al., 2002]. Aerial photographs
captured in analog format (using film cameras) are subsequently scanned to produce softcopy counterparts
of the original diapositives. These digitized diapositives are the common input to a DPW workflow. In this
set-up, interior orientation comprises two transformations:
• a transformation from the digital image pixel coordinate system (rows and columns) to the photo coordinate system of the analog image, as it is defined by the fiducial marks, and
• the definition of the camera model by selecting the corresponding camera calibration parameters, to support the eventual transformation from photo coordinates to the object space.
The second task is addressed during project
preparation by selecting the appropriate camera
calibration file, similar to the way this information is
selected in an analytical stereoplotter. This file includes
the geometric parameters that define the specific
image formation process, e.g. camera constant, fiducial
mark coordinates, distortion parameters. This
information is used later during aerotriangulation to
reconstruct the image formation geometry using the
collinearity equations (Chapters 3 and 11).
The novelty introduced by the use of DPWs in
recovering interior orientation is related to the first
transformation, namely from pixel (r,c) to the photo
(xp,yp) coordinate system, which requires the
identification and precise measurement of fiducial marks
in the digital image. Since fiducial marks are well-defined
targets, this process is highly suitable for full automation.
The typical workflow of an automated interior orientation module is shown in Figure 12.5. Input data include the image file, corresponding fiducial mark photo coordinates, and information on scanning pixel size and scanner calibration. This information is used to derive the approximate locations of fiducial marks and to extract image patches that contain them. This can be accomplished either manually, with the human operator pointing the cursor to their vicinity, or automatically, using hierarchical approaches [Schickler and Poth, 1996]. In either case, the selected image patches are large enough to ensure that they fully contain the fiducial marks, corresponding for example to a diapositive window as large as 15 by 15 mm in the approach of [Kersten and Haering, 1997].

Figure 12.5. Typical workflow of an automated interior orientation module.

Figure 12.6. Examples of fiducial marks supported in automated interior orientation schemes: Jena LMK 2000, Wild RC20, Wild RC30, Zeiss RMK TOP15 (courtesy of Autometric, 2002).

The selected image patches serve as search windows, and the precise measurement of fiducial marks
becomes a matching task that compares templates of fiducial marks to these search windows [Lue, 1995].
Templates depict an ideal view of the fiducial mark, and can be synthetic or real image patches. Matching
a template to the search window allows for the identification of the center of the fiducial mark as the
matching counterpart of a pre-identified template location. Practically any matching technique can be
employed to perform template-to-image matching, but the most popular choices remain correlation and
least squares matching techniques. They allow for the identification of a location within the search
window that offers the maximum correlation coefficient or minimum sum of squared gray value differences
when compared to the template. Chapter 6 of this Manual contains details on the application of correlation
and least squares matching. In order to facilitate the performance of automated interior orientation, most
DPW software platforms have a database with templates of commonly encountered fiducial marks (Figure
12.6). DPW operators can also design additional templates to depict any type of fiducial mark.
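As an illustration of the correlation-based template matching described above, the following Python sketch exhaustively scans a search window for the position that maximizes the normalized correlation coefficient with a fiducial template. Function and variable names are illustrative only; a production system would typically refine the result to sub-pixel accuracy with least squares matching.

```python
import numpy as np

def ncc_match(search: np.ndarray, template: np.ndarray):
    """Exhaustive normalized cross-correlation of a fiducial template over a
    search window; returns the (row, col) of the best match and its score."""
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best, best_rc = -1.0, (0, 0)
    for r in range(search.shape[0] - th + 1):
        for c in range(search.shape[1] - tw + 1):
            w = search[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            if denom == 0:
                continue                      # flat patch, no texture to correlate
            rho = (w * t).sum() / denom       # correlation coefficient in [-1, 1]
            if rho > best:
                best, best_rc = rho, (r, c)
    return best_rc, best
```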
Alternative approaches to template matching include feature-based techniques that decompose a
fiducial mark into its structural primitives (Figure 12.7), and attempt to identify them within the search
window using appropriate techniques. For example, circular elements may be identified using a Hough
transform, while linear elements may be identified using edge detectors such as the Canny filter. These
techniques can be more time consuming than template techniques, but tend to be more suitable for
integration into hierarchical approaches to perform search window detection and precise fiducial mark
localization.

Figure 12.7. Structural decomposition of a fiducial mark to its graphical primitives.

The relationship between pixel (r,c) and photo (xp,yp) coordinate systems is described by a six-parameter affine transformation:

$$ x_p \;=\; a_0 + a_1\,c + a_2\,r, \qquad y_p \;=\; b_0 + b_1\,c + b_2\,r \qquad (12.2) $$

The six parameters express the combined effects of two translations, rotations and scalings between
the two coordinate systems. The measurement of each fiducial mark introduces two such observation
equations. Therefore, as soon as three fiducial marks have been measured, an initial estimate of the parameters can be made, with the solution re-adjusted every time an additional mark is measured. When using calibrated scanners, the results of scanner calibration should be used to correct pixel coordinates before using them in the interior orientation process.
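A minimal least-squares estimate of the six parameters of Equation 12.2 from measured fiducial marks can be sketched as follows (numpy-based, with hypothetical array layouts; a production implementation would also apply scanner-calibration corrections and report residuals):

```python
import numpy as np

def fit_affine(pixel_rc: np.ndarray, photo_xy: np.ndarray) -> np.ndarray:
    """Least-squares estimate of the six affine parameters of Eq. 12.2 from
    n >= 3 measured fiducial marks.
    pixel_rc: (n, 2) array of measured (row, col); photo_xy: (n, 2) calibrated (xp, yp) in mm."""
    r, c = pixel_rc[:, 0], pixel_rc[:, 1]
    A = np.column_stack([np.ones_like(r), c, r])              # design matrix [1, c, r]
    ax, *_ = np.linalg.lstsq(A, photo_xy[:, 0], rcond=None)   # a0, a1, a2
    ay, *_ = np.linalg.lstsq(A, photo_xy[:, 1], rcond=None)   # b0, b1, b2
    return np.concatenate([ax, ay])

def pixel_to_photo(params: np.ndarray, rc: np.ndarray) -> np.ndarray:
    """Apply Eq. 12.2 to an (n, 2) array of pixel (row, col) coordinates."""
    a0, a1, a2, b0, b1, b2 = params
    r, c = rc[:, 0], rc[:, 1]
    return np.column_stack([a0 + a1 * c + a2 * r, b0 + b1 * c + b2 * r])
```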
Reported accuracies in the automated measurement of fiducial marks are in the range of 0.1-0.3 pixel
using template matching techniques [Kersten and Haering, 1997; Schickler and Poth, 1996]. With a scanning
resolution of 12.5 to 25 µm, this corresponds to fiducial measurements with accuracy better than 4 to 8 µm.
These results are comparable to those achieved by a human operator in an analytical plotter. Figure 12.8
depicts an example graphical user interface of an automated interior orientation module.

12.2.3 Automatic Conjugate Point Measurements in a Stereopair
Interior orientation allows for the transformation of pixel to photo coordinates and, using camera calibra-
tion data, to reconstruct if necessary the geometry of the bundle that generated a single image. The next
logical photogrammetric step is relative orientation. Its objective is to determine the relative position of

two overlapping images with respect to each other, allowing conjugate rays to intersect in space and form a stereomodel of the depicted area. This requires the measurement of an adequate number of conjugate points in the overlapping area of the stereopair. As discussed in Chapter 11, in order for a relative orientation solution to be statistically robust these points must be well dispersed, covering the six von Gruber locations (Figure 12.9). Benefiting from the development of robust matching techniques and work on the automation of relative orientation, the measurement of conjugate points in a stereopair has become an automated process in digital photogrammetric applications.

Figure 12.8. Example GUI of an automated interior orientation module (courtesy of Autometric, 2002).
The typical workflow of automated conjugate point measurement in a stereopair is shown in Figure 12.10. Using approximate information on image overlap, windows in the vicinity of the von Gruber locations are selected in each stereomate. The challenge is then to identify within these windows distinct primitives (e.g. points, line segments) that are suitable for subsequent matching, and to select an appropriate matching technique to establish correspondences among them. A popular choice among existing software modules is to select interest points in each image separately, and to match them subsequently using correlation techniques [Schenk et al., 1991]. Interest points are points of high gray value variance, e.g. bright spots, or sharp corners, and can be detected using operators like Moravec or Förstner (Chapter 6). By definition, interest points are distinct from their background, and are therefore highly suitable for localization and matching. By applying such an operator in each von Gruber window of each stereomate, two pools of interest points per window are produced, one for each image. These points are matching candidates, and become input to a matching scheme that aims to identify pairs of conjugate points. Matching can be performed using an area-based approach, whereby windows centered on each interest point are compared to identify conjugate pairs as those that display the highest correlation values. Various conditions can be introduced to minimize gross errors and improve the overall performance of the technique (e.g. setting a minimum threshold on acceptable correlation limits, imposing constraints on acceptable parallax ranges).

Figure 12.9. Von Gruber locations in the overlapping area of a stereopair.

This matching process is commonly

implemented in a hierarchical manner, with window sizes and acceptable parallax values decreasing as
one moves from coarse to fine resolutions in an image pyramid [Tang and Heipke, 1996]. This is equivalent
to zooming-in to the correct solution as the resolution of the image is sharpened. For more practical
details on the implementation of conjugate point matching strategies, see Section 12.3.2.
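The interest-point detection step can be illustrated with a simplified Moravec operator, which scores each pixel by the minimum sum of squared gray-value differences between its window and the same window shifted by one pixel in four directions. The window size and threshold below are illustrative, and a practical implementation would add local non-maximum suppression before passing the points to the matcher:

```python
import numpy as np

def moravec(img: np.ndarray, win: int = 5, threshold: float = 100.0):
    """Simplified Moravec interest operator: returns (row, col, interest value)
    for pixels whose minimum directional SSD exceeds the threshold."""
    h = win // 2
    img = img.astype(float)
    rows, cols = img.shape
    points = []
    for r in range(h + 1, rows - h - 1):
        for c in range(h + 1, cols - h - 1):
            w = img[r - h:r + h + 1, c - h:c + h + 1]
            ssd = []
            for dr, dc in ((1, 0), (0, 1), (1, 1), (1, -1)):   # four shift directions
                ws = img[r - h + dr:r + h + 1 + dr, c - h + dc:c + h + 1 + dc]
                ssd.append(((w - ws) ** 2).sum())
            v = min(ssd)
            if v > threshold:
                points.append((r, c, v))
    return points
```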
Automated conjugate point measurement processes differ in their point selection and matching strategies. For
example, matching points can be selected to form a grid (or some other regular pattern) in the left image.
They are then transferred to its right stereomate using a priori information on the average parallax values
for this model. These points are then precisely matched using area-based techniques to identify pairs of
conjugate points.
Once an adequate number of conjugate points has been selected, their coordinates and standard
photogrammetric techniques are used to recover the relative orientation parameters, (Chapter 11). The
relationship between two stereomates is expressed by a total of five parameters, e.g. three relative
rotations dω, dφ, dκ, of the right stereomate with respect to the left one, and the two components of the
stereomodel base along the Y and Z axes. The estimation of these parameters requires the identification
and measurement of conjugate points in the overlapping area of two stereomates. The observation of the
same point in two photos introduces 4 observation equations (2 collinearity equations per point per
photo) and three unknowns (the three model coordinates of this point), in addition to the 5 parameters of
the relative orientation. Accordingly, for n points in the overlapping area a total of 4n equations result,
with 3n+5 unknowns. Therefore, at least five points must be observed in order to obtain a minimum
constraint solution, while observing six or more points allows for the performance of a least-squares
adjustment for the estimation of the orientation parameters.
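As a worked illustration of this count, a stereopair with n = 6 matched points yields 4 × 6 = 24 observation equations against 3 × 6 + 5 = 23 unknowns, i.e. a redundancy of one; in general the redundancy is 4n − (3n + 5) = n − 5, which is why five points give only a minimum-constraint solution, while the hundreds of points typically matched automatically yield a strongly over-determined adjustment.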
The automated approach represents a significant evolution from conjugate point measurement during
relative orientation performed in an analytical stereoplotter. The obvious advantage is related to the
number of conjugate points identified to orient a stereopair: whereas in the analytical approach the
human operator typically measures six points (one in each von Gruber location), in the softcopy approach
it is not uncommon to have hundreds of conjugate points automatically measured. This obviously results
in more robust solutions. However, this advantage is compromised by the fact that the accuracy with
which these points are matched tends to be slightly lower than the accuracy with which a human
operator identifies conjugate points in an analytical stereoplotter. This problem can be controlled to a
certain extent by imposing stricter conditions on acceptable matches: raising, for example, the minimum
acceptable correlation coefficient will eliminate
weak matches that tend to be blunders.
Narrowing the range of acceptable parallaxes
will have similar effects, but requires reasonable
a priori approximations of the orientation
parameters.
Figure 12.10. Automated conjugate point measurement in a stereopair.

In Figure 12.10 epipolar image resampling is shown as a final step of this orientation process. It is marked by a dashed line to indicate that even though in actuality it is a separate process that simply makes use of relative orientation results, it has become a de facto part of the image orientation process within a DPW environment. Its objective is to produce epipolar stereopairs (also referred to as normalized stereopairs), generated by rectifying the original stereomates into an epipolar orientation. This removes y-parallax, while leaving x-parallax unresolved so that it may be interpreted as differences in elevation. Epipolar rectification generally requires rotations of one

or both images such that horizontal lines of imagery displayed on the screen are epipolar lines as
illustrated in Figure 12.11. A detailed mathematical derivation of the procedure can be found in Section
3.2.2.7, while resampling is addressed in more detail in Section 12.3.3. With digital images, rotations are
accomplished via a resampling process that must guarantee the same ground sample distance per pixel
for both images. This technique, referred to as epipolar resampling, has become an essential part of DPW
workflow because it supports subsequent processes like DEM generation and orthophoto production. As
image resampling involves interpolation to derive each new pixel, it is a CPU intensive operation. In
order to alleviate this problem, techniques such as on-the-fly epipolar resampling have been developed
(used e.g. in ImageStation) to dynamically localize the resampling process to regions of interest. This
eliminates the need to generate entire epipolar resampled images prior to display and reduces the
amount of disk storage overhead.

12.2.4 Aerotriangulation
In modern DPWs the relative orientation workflow presented in the previous section is not implemented
as a separate stand-alone module, but rather as part of a broader point measurement and triangulation
module. However, it provides the theoretical and practical basis of automated point measurement during
aerotriangulation in DPWs. Aerotriangulation is often characterized as one of the more complex procedures
in terms of user knowledge of the underlying principles of a photogrammetric block adjustment. Its
objective is to relate multiple images to each other in order to:
• recover the complete orientation parameters of each image in the block, namely the (X0,Y0,Z0) coordinates of the exposure station and the ω,φ,κ rotations, and to
• determine the ground coordinates (X,Y,Z) of points observed in them.
This requires the measurement of conjugate points in the overlapping areas of the block imagery (tie
and pass points), and the measurement of the photo coordinates of depicted control points.
Virtually all vendors provide triangulation algorithms that are based on rigorous physical sensor
models and the well-established principles of least squares bundle adjustment in which all parameters
are fully weight-constrained. These modules typically support two types of measurements:
• Automatic Point Measurements: proceed according to the workflow described in Section 12.2.3
to automatically produce large amounts of conjugate points. To accommodate the needs of
aerotriangulation, matching tools have been extended from stereo to multi-image application.

Figure 12.11. Pairwise epipolar rectification (courtesy of BAE Systems, 2001).

Figure 12.12. Point measurement in aerotriangulation (courtesy of Autometric, 2002).

The proposed techniques include the extension of least squares matching to multi-image
application and their integration with block adjustment [Agouris, 1993; Agouris and Schenk,
1996], and the introduction of graph-based techniques to combine sequences of stereo matching
results in a block adjustment [Tsingas, 1994].
• Interactive Point Measurements: support user-controlled identification and measurement of specific points in a semi-automatic mode, especially ground control points.
Additionally, modern DPWs support blunder detection and the remeasurement of erroneous points,
to improve the overall quality and performance of softcopy aerotriangulation. Figure 12.12 illustrates
how point measurement can be performed on multiple overlapping images that include reference views.
Experiments with DPW aerotriangulation indicate the high accuracy potential of the automated
approach. More specifically, results from the recent OEEPE aerotriangulation test using imagery scanned at resolutions of 20-30 µm indicate tie point measurements with accuracies ranging from 0.11-0.5 pixels (corresponding to 2.2-11 µm) [Heipke, 1999]. The optimal results (0.11-0.2 pixels) were achieved processing imagery of open and flat terrain with good texture. In more adverse conditions, such as in blocks of alpine regions at scales ranging from 1:22,000 to 1:54,000, and a scanning resolution of 25 µm, point measurement accuracies ranging from 0.25-0.5 pixels are achieved [Kersten, 1999]. In the same
set-up the exposure station coordinates are estimated as accurately as 0.6m in X and Y, and 0.4m in Z.
These results indicate that under favorable conditions (open and flat terrain, good texture, high scanning
resolution), with a DPW, natural point measurement accuracies comparable to the accuracies measured at
signalized points in analytical plotters can be achieved. The single disadvantage today is the rather large
number of blunder matches that can be introduced as the result of full automation. Editing and removal
(or remeasuring) of blunders is a time-consuming process.
The performance of DPWs becomes even more impressive when considering the favorable effects on
production due to the high degree of automation. The time requirements to process a block are reported

to range from 10-20 minutes per image considering only operator-assisted processes and excluding
batch processes, scanning, and control preparation [Kersten, 1999]. This represents a significant
improvement compared to analytical processes.

12.2.5 Generalized Sensor Models
The sensor model establishes the functional relationship between image and object space. A physical or
rigorous sensor model is used to represent the physical imaging process, making use of information on the
sensor’s position and orientation. Classical sensors employed in photogrammetric missions are commonly
modeled through the collinearity condition. By contrast, a generalized or replacement sensor model does
not include sensor position and orientation information. Rather, a general polynomial function is used to
represent the transformation between image and object space. While a general polynomial function is less
accurate than a physical model, it offers significantly faster computational processing. It is also completely
independent of the sensor platform, and thus particularly suitable for today’s ever increasing variety of
image sources. A common practice used in DPW software implementations is to use a physical sensor
model (when available) during aerotriangulation only. Then, a set of rational functions that approximate the
projective geometry is derived from the physical model solution. These rational functions then serve as a
replacement sensor model that can provide for faster or real-time implementations of subsequent workstation
operations (e.g., DEM generation, orthoimage generation, and feature extraction).
High-end DPW software vendors provide several polynomial transformation schemes. Table 12.1 lists available options for the fast sensor model tool in SOCET SET®. Although a physical sensor model (when available) can be used for all DPW operations in SOCET SET, the standard practice is to use the physical model for aerotriangulation only, then derive a polynomial model to improve system performance for subsequent operations. Further details on RFMs are given in Chapter 11.

Table 12.1. Example polynomial transformation schemes offered in SOCET SET®.
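The structure of such a replacement model can be sketched as a ratio of polynomials in object-space coordinates. The fragment below uses first-order polynomials with hypothetical coefficients purely for illustration; operational rational function models (e.g., RPCs) use third-order polynomials with 20 coefficients per numerator and denominator, but are evaluated the same way:

```python
import numpy as np

def rfm_eval(num: np.ndarray, den: np.ndarray, X: float, Y: float, Z: float) -> float:
    """Evaluate one image coordinate of a simple rational function model as the
    ratio of two first-order polynomials in normalized object-space coordinates."""
    terms = np.array([1.0, X, Y, Z])
    return float(num @ terms) / float(den @ terms)

# Hypothetical coefficients for illustration only (not from any real sensor model).
line_num = np.array([0.002, -0.98, 0.01, 0.05])
line_den = np.array([1.0, 0.001, -0.002, 0.0005])
row_norm = rfm_eval(line_num, line_den, X=0.2, Y=-0.1, Z=0.05)
```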

12.3 GENERATING GEOSPATIAL DATASETS: DEMS AND ORTHOPHOTOS

12.3.1 Digital Photogrammetric Production of Geospatial Datasets
The completion of triangulation in a DPW allows the subsequent extraction of precise 3-dimensional
geospatial information. Geospatial dataset production has truly been revolutionized due to automation
and the processing convenience provided by a fully-digital production environment. The increase in
computing power during the last decade and the evolution of multimedia computing have created a trend


towards geospatial data fusion and 3-dimensional visualization. This leads to an ever-expanding list of
geospatial products and analysis capabilities that can be derived from modern DPWs, ranging from simple
line-of-sight analysis and perspective view generation to complex site modeling and fly-through animations
(Chapter 13). As an example, Figure 12.13 demonstrates 3-dimensional visualizations of orthoimagery
draped over DEMs. As geospatial information is becoming crucial to numerous diverse applications and
supports an ever-expanding user community, the diversification and customization of geospatial products
is expected to multiply in the future. However, at the core of geospatial product generation are two of
the staples of photogrammetric production: DEMs and orthophotos.

Figure 12.13. 3-dimensional visualization of DPW derived products. (a) Orthoimage with 1m GSD draped over a DEM with 30m post spacing (Mt. Katahdin, Maine); (b) Orthoimage with 0.5m GSD draped over a DEM with 1m post spacing (Seattle, Washington) (note: building sides are artificially rendered).

12.3.2 Automated DEM Generation
Generating a Digital Elevation Model (DEM) from a rectified stereo pair is a highly automated process on
a DPW. A DEM is a regularly spaced raster grid of z-values of the surface terrain. Alternatively terrain
elevation information may be available in the form of a triangulated irregular network (TIN). A TIN stores
only critical points and breaklines (i.e., topographic surface discontinuities created by such features as
ridgelines, rivers, and cliffs) that can define a surface model more efficiently than a DEM grid at comparable
or even better accuracy. The standard approach to automated terrain extraction (ATE) is similar to the
approach followed for conjugate point matching in orientations as discussed in Section 12.2.3. It makes
use of image matching or correlation to generate a uniformly spaced grid of posts (z-values). Through
correlation, points in the reference (left) image are matched (correlated) automatically to their conjugates
in the target (right) image. A correlation procedure generally follows a hierarchical approach by progressing
through successively higher resolution layers of an image pyramid, or reduced resolution data set (RRDS).
Results from a low RRDS layer are used to initialize the search for the next highest RRDS layer, and so on.
In this manner, the search area is constrained to minimize wandering and reduce the rate of erroneous
matches. Each DPW platform uses its own image correlation strategy, defined by key parameters. Factors
that dictate the selection of strategy parameters include terrain relief, cultural content, image quality,
shadowing, and desired speed of operation. Table 12.2 shows typical strategy parameters [Autometric, 2000], which can be input manually or selected automatically according to terrain type (e.g., dynamic ATE in SOCET SET). The procedure is often performed in an iterative manner such that results may be reviewed, correlation strategy parameters fine-tuned accordingly, and the process repeated.

Table 12.2. Example parameters defined in a correlation strategy.
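The hierarchical (RRDS-based) strategy itself can be sketched as follows; `match_fn` stands in for whatever single-level correlation routine a given system uses, and is a placeholder rather than an actual vendor API:

```python
import numpy as np

def build_pyramid(img: np.ndarray, levels: int):
    """Reduced resolution data set (RRDS): each level halves the previous one
    by 2x2 block averaging; the list is returned coarsest-first."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        pyr.append(a[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyr[::-1]

def coarse_to_fine(left, right, match_fn, levels=4):
    """Skeleton of a hierarchical matching strategy: the disparity found at a
    coarse level initializes (and constrains) the search at the next finer level."""
    disparity = 0.0
    for lv, (L, R) in enumerate(zip(build_pyramid(left, levels),
                                    build_pyramid(right, levels))):
        # Pixel coordinates double from one level to the next, so scale the estimate.
        disparity = match_fn(L, R, initial=disparity * 2.0 if lv else 0.0)
    return disparity
```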
In order to further improve the performance of automated DEM point measurement, many platforms
perform this operation using epipolar resampled imagery (Section 12.2.3). The effect of this choice on
processing speed is demonstrated in Figure 12.14. The strategy on Figure 12.14a represents a non-
constrained area-based matching approach in which a square template (patch) centered on the point of


interest in the reference image identifies the reference pattern to be matched in the target image. In
general, larger template sizes produce better results due to higher signal-to-noise ratios, but at the
expense of higher computational demands. In theory, such a search could scan through every pixel in the
target image to determine the best matching patch. By contrast, the epipolar searching strategy (Figure
12.14b) is considerably more efficient because the search is constrained to corresponding epipolar lines.
By its nature, epipolar searching obviously depends to a certain extent on the quality of the relative orientation of the stereopair.

Figure 12.14. Autocorrelation strategies: (a) non-constrained area-based matching; (b) epipolar searching (courtesy of Autometric, 2002).
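A minimal sketch of the epipolar searching strategy of Figure 12.14(b), assuming the pair has already been epipolar-resampled so that conjugate points share the same image row; the window size and search range are illustrative:

```python
import numpy as np

def epipolar_ncc_search(left, right, row, col, half=7, max_dx=64):
    """Match a point across an epipolar-resampled pair: conjugate points lie on
    the same row, so the search reduces to a 1-D scan along that row."""
    t = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    t = t - t.mean()
    tn = np.sqrt((t * t).sum())
    best, best_dx = -1.0, 0
    for dx in range(-max_dx, max_dx + 1):
        c = col + dx
        if c - half < 0 or c + half + 1 > right.shape[1]:
            continue
        w = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        w = w - w.mean()
        denom = np.sqrt((w * w).sum()) * tn
        if denom > 0:
            rho = (w * t).sum() / denom
            if rho > best:
                best, best_dx = rho, dx    # best_dx is the x-parallax of the match
    return best_dx, best
```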
Upon completion of automated DEM generation the results have to be reviewed and edited to
remove blunders. The quality metrics presented in Section 12.2.4 are representative of the accuracy
potential of automated matching methods in a DPW. Accordingly, it is expected that correctly matched points from DEM generation are as accurate as 0.5 pixel or even better. However, the main difference between automated point measurement during aerotriangulation and during DEM generation is related to the massive
amounts of points collected during the second process. This increases the potential for blunders, as
attempts are made to match points in ground areas that may not be suitable for this task (e.g. having low
radiometric variation). Even though automated modules are equipped with tools to identify and remove
poor matching candidates, it is still estimated that anywhere from 5% up to 30% of the points automatically


generated require post-editing [LH Systems, 2001]. Modern high-end DPW platforms generally provide
a comprehensive set of post-processing tools for DTM accuracy assessment, breakline analysis, and
interactive post editing. The autocorrelation process generates a correlation coefficient that indicates
the relative accuracy of a match between a point on the source image and the corresponding point on the
target image. The correlation coefficient takes on a value from 0 to 1, where 1 represents perfect
correlation. Figure 12.15 illustrates one way to review ATE results, i.e., by superimposing the post grid
over the stereo pair. A color-coded classification scheme as in Table 12.3 is used to indicate the relative
accuracy of each post, which is based on the correlation coefficient. Points with low correlation coefficient
values are prime candidates for post-processing.
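The color-coded review of Figure 12.15 can be emulated with a simple binning of the correlation coefficients; the class boundaries below are hypothetical, since the actual thresholds of Table 12.3 are vendor- and strategy-dependent:

```python
import numpy as np

# Hypothetical correlation-coefficient bins in the spirit of Table 12.3.
BINS = [0.0, 0.5, 0.7, 0.85, 1.01]
LABELS = ["poor", "fair", "good", "excellent"]

def classify_posts(corr: np.ndarray):
    """Assign each DEM post a quality class from its correlation coefficient,
    as used for a color-coded review display."""
    idx = np.digitize(corr, BINS) - 1
    return np.array(LABELS, dtype=object)[np.clip(idx, 0, len(LABELS) - 1)]

print(classify_posts(np.array([0.32, 0.64, 0.91])))   # ['poor' 'fair' 'excellent']
```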

Figure 12.15. ATE review by superimposing crosses to show post accuracies. These crosses may be color
coded according to classification scheme of Table 12.3 (courtesy of Autometric, 2002). Please refer to the
color appendix for the color version of this image.

Table 12.3. Sample classification scheme for posts from ATE [Autometric, 2000].

12.3.3 Orthoimage Generation


Existing imagery may be resampled to produce a new image that conforms to specific geometric properties,
such as the production of a vertical view from oblique imagery. This may be a one-to-one process, where
a single source image is modified into another resampled one, as is commonly the case when producing
an orthophoto, or a many-to-one process, whereby a new image contains parts from multiple images as
is the case in mosaicking. Resampling is typically a two-step process comprising:
• the establishment of a geometric correspondence between the coordinate systems of the source image s(x,y) and the resampled image r(x',y'), and
• the establishment of a function to express the radiometric relationship between the two images.
Orthorectification is a special case of image resampling whereby the effects of image perspective and
relief displacement are removed so that the resulting orthoimage has uniformly scaled pixels, resembling
a planimetric map. The two basic approaches to orthoimage generation are forward and backward projection
[Novak, 1992]. In forward projection, pixels from the source image are projected onto the DEM to
ascertain their object space coordinates, which are subsequently projected into the orthoimage. In
backward projection, the object space coordinates are projected into the source image to derive the
radiometric information for the corresponding orthoimage pixel. In either case, image resampling is
required to account for terrain variation and perspective effects. Orthophoto generation typically proceeds
following a differential rectification, using the collinearity equations to describe the above-mentioned
geometric relationship between the two coordinate systems. In analog and analytical applications
orthoimage generation was a time-consuming process that often required the use of dedicated hardware.
With the use of digital imagery, orthorectification was one of the first photogrammetric processes to be
automated, and orthoimagery gained renewed popularity in the geospatial user community.
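The backward-projection approach can be sketched as follows; `project_to_image` is a placeholder for the sensor model (the collinearity equations with the triangulated orientation parameters), and nearest-neighbour resampling is used for brevity where production systems would normally offer bilinear or cubic convolution:

```python
import numpy as np

def orthorectify(src, dem, project_to_image, gsd, origin):
    """Backward-projection orthorectification sketch: for each orthoimage pixel,
    look up the terrain height in the DEM, project the ground point (X, Y, Z)
    into the source image with the sensor model, and resample its gray value.
    project_to_image(X, Y, Z) -> (row, col) is assumed to be supplied by the caller."""
    rows, cols = dem.shape
    ortho = np.zeros_like(dem, dtype=src.dtype)
    x0, y0 = origin                          # ground coordinates of the upper-left post
    for i in range(rows):
        for j in range(cols):
            X = x0 + j * gsd
            Y = y0 - i * gsd
            r, c = project_to_image(X, Y, dem[i, j])
            ri, ci = int(round(r)), int(round(c))
            if 0 <= ri < src.shape[0] and 0 <= ci < src.shape[1]:
                ortho[i, j] = src[ri, ci]    # nearest-neighbour resampling
    return ortho
```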
DPW software input requirements for orthoimage generation include triangulation results and a DEM.
The main factors affecting the accuracy of the resulting orthoimage are the spatial resolution of the
source image, the accuracy of triangulation, and the accuracy and resolution of the DEM. Beyond these
factors, a common problem with orthoimage generation is building lean, the effect of building displacement
in urban scenes. The problem and its treatment are demonstrated in Figure 12.16. The figure illustrates
the progressive correction of orthophoto distortions and displacements according to the availability of
certain input sources. Without a detailed DEM it is impossible to correct terrain variations, as demonstrated
by the distorted orthophoto grid in Figure 12.16(c). Today most orthoimage generation modules support
at least the use of a DEM to correct for these distortions as demonstrated in Figure 12.16(d). By using as

additional input feature files that model buildings or structures with significant height, the position of
these features can be corrected as demonstrated by Figure 12.16(e). The building roof is repositioned to
reflect its true position in an orthographic projection. However this repositioning has the effect of
leaving a shadow of the building in the orthophoto, corresponding to the area covered by the oblique
building image. This effect is commonly referred to as building lean. This effect can be corrected by using
available supplemental images that reveal areas hidden in building shadows (Figure 12.16[f]).

Figure 12.16. Approaches to removing the effects of building lean from an orthoimage. (a) Orthoimage generation geometry; (b) raw image; (c) Orthorectification from triangulation, but without a DEM; (d) Orthorectification from triangulation and a DEM; (e) Orthorectification from a DEM and feature information, but no supplemental imagery; (f) same inputs as previous, but with supplemental imagery to fill in shadows (courtesy of BAE Systems, 2001).

Joining two or more contiguous orthoimages to create large coverage image maps is accomplished
through image mosaicking (Figure 12.17). The general requirement to produce a mosaic is contiguous
orthorectified images (although it is possible to create a mosaic from raw imagery). The process involves

resampling all input images into a common spatial resolution. The user typically has complete control over the positioning of seam lines. Automatic (or manual) histogram matching techniques are employed to smooth out radiometric differences among the input images and to optimize the dynamic range of the mosaic.

Figure 12.17. Image mosaic geometry (courtesy of BAE Systems, 2001).

Histogram matching techniques, e.g., image dodging, are used to smooth radiometric unevenness among different input images that compose a mosaic. In histogram matching, a
lookup table is generated to convert the histogram of one image to resemble or match the histogram of
another. The matching process is based on the assumption that differences in global scene brightness are
due to external factors such as atmospheric conditions and sun illumination. Therefore, all pixels in a
particular match are radiometrically adjusted in a similar manner. Figure 12.18 demonstrates histogram
matching applied to a mosaic created from four orthoimages. Illumination differences are evident between image sequences 1-2 and 3-4, which were photographed approximately two years apart.
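The lookup-table construction described above can be sketched for 8-bit imagery by equating cumulative histograms; this is a generic histogram-matching routine, not the specific dodging algorithm of any particular DPW:

```python
import numpy as np

def match_histogram(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Build a lookup table that maps the gray-value histogram of `src` onto that
    of `ref` by equating their cumulative distribution functions, then apply it
    to every pixel. 8-bit integer imagery is assumed."""
    src_hist = np.bincount(src.ravel(), minlength=256).astype(float)
    ref_hist = np.bincount(ref.ravel(), minlength=256).astype(float)
    src_cdf = np.cumsum(src_hist) / src_hist.sum()
    ref_cdf = np.cumsum(ref_hist) / ref_hist.sum()
    # For each source gray value, pick the reference gray value with the nearest CDF.
    lut = np.searchsorted(ref_cdf, src_cdf).clip(0, 255).astype(np.uint8)
    return lut[src]
```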

Figure 12.18. Histogram matching. (a) Input orthoimages: (1-2) photographed July 1994, and (3-4) photographed May 1996; (b) Mosaic performed with histogram matching.

12.4 AUTOMATED FEATURE MEASUREMENTS FOR GEOSPATIAL APPLICATIONS

12.4.1 Introduction
Feature extraction represents one of the most complicated photogrammetric workflow components from both design and user perspectives. All of the high-end systems provide for the creation of three-dimensional feature topology composed of standard vector primitives, i.e., points, lines, and polygons. Sophisticated relational databases for feature geometry and attributes are also provided, with import and export options to several commercial formats. A common practice among DPW vendors is to provide a seamless interface for a third party software solution to feature extraction, in addition to, or in lieu of, a native solution. A popular environment is the Computer Aided Design (CAD) software package Microstation®, by Bentley Systems.

Figure 12.19. Feature extraction in a DPW (courtesy of Autometric, 2002). Please refer to the color appendix for the color version of this image.
Features can be delineated and edited in monoscopic mode (2-dimensional), or stereoscopic mode (3-
dimensional) using a 3-dimensional mouse configuration. In either mode, feature vectors are superimposed
on the imagery as shown in Figure 12.19. Feature extraction requires triangulated imagery and, although
not required, a DEM is usually generated first in order to facilitate the feature extraction process. For
example, a DEM can be used to automatically determine the bottom of a building from a delineated
rooftop, or to provide continuous surface tracking of geomorphic features (e.g., drainage) by constraining
the cursor to the terrain surface.
The process of feature attribution, i.e., assigning numerical or textual characteristics to a feature such as composition, size, purpose, and usage, is usually driven by a user-definable set of rules referred to as the
extraction specification. In a typical feature attribution configuration, the user populates a list of pre-
defined attribute names for a given feature type. To provide some level of automation to the process, a
set of reserved attribute names can be automatically calculated from the feature geometry, such as area,
length, width, height, and angle of orientation.
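Reserved attributes of this kind follow directly from the captured geometry. A minimal sketch for a closed polygon feature (planimetric area via the shoelace formula, plus perimeter length) might look as follows; the vertex array layout is illustrative:

```python
import numpy as np

def polygon_area_perimeter(xy: np.ndarray):
    """Compute two reserved attributes from feature geometry: planimetric area
    (shoelace formula) and perimeter length of a closed polygon given as an
    (n, 2) array of X, Y vertices (last edge closes back to the first vertex)."""
    x, y = xy[:, 0], xy[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    d = np.diff(np.vstack([xy, xy[:1]]), axis=0)
    perimeter = np.sqrt((d ** 2).sum(axis=1)).sum()
    return area, perimeter

# A 20 m x 10 m building footprint.
print(polygon_area_perimeter(np.array([[0, 0], [20, 0], [20, 10], [0, 10]], float)))
# -> (200.0, 60.0)
```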
Unlike DEM generation and orthophoto production, the complexity of feature extraction renders it a
largely manual process. However, most platforms provide semi-automated feature extraction tools to
assist the user by completing a feature once adequate information has been collected (e.g. automatically
drawing the sides of a building based on a user-delineated roof). Furthermore, users typically have the
opportunity to import dedicated software solutions to automate feature extraction.
Efforts to automate the extraction of cartographic vector features from digital images form a major
research direction in the photogrammetric and computer vision communities. The recent proliferation of
high-resolution remotely sensed imagery is further intensifying the need for robust automated feature
extraction (AFE) solutions. However, AFE has proven to be a challenging task, as it involves the
identification, delineation, and attribution of specific features (e.g., buildings, roads, and rivers).
Accordingly, the solution of this complex problem lies well beyond matching arbitrary image patches as
performed in automated DEM generation. To date, feature extraction remains largely a manual task in
typical production settings. However, as a result of on-going efforts, many AFE algorithms are approaching
the robustness levels required by production environments.

Table 12.4. Candidate geospatial features for automated extraction.

AFE research is focused on features that are the most useful in GIS applications and those that are the
most time-consuming for manual extraction. For instance, Table 12.4 lists geospatial features that have
been identified by the National Geospatial-Intelligence Agency (NGA) as contributing significantly to extraction
labor in the creation of typical feature data sets. An estimation of the level of research effort being given
to each feature is also provided. AFE research, to date, has focused heavily on man-made features and
targets, with emphasis given to road networks and buildings. Beyond the obvious importance of these
features for geospatial applications, the motivation for this is the fact that roads and buildings are among
the most easily recognizable features over a wide range of image scales for human vision. Although road
and building extraction are relatively trivial tasks for a human extractor, most automated methods are not
yet able to achieve comparable reliability. In the remainder of Section 12.4.1 general design issues
behind common AFE strategies are presented. Some representative automated approaches for road and
building extraction are shown in Sections 12.4.2 and 12.4.3.

12.4.1.1 Photogrammetry and Computer Vision


The problem of Automated Feature Extraction has greatly extended the scope of traditional photogrammetry
into the domain of computer vision (CV) and artificial intelligence. Techniques spanning many disciplines,
including digital image processing, statistical pattern recognition, perceptual organization, computational
geometry, artificial neural networks, and fuzzy logic, have been explored. While short-term goals focus
on solving specific problems, a far-reaching goal of computer vision is to model the perceptual abilities
of human vision. Computer vision can be divided into two broad categories: machine vision and image
understanding. In machine vision, imaging conditions (e.g., lighting and camera positioning) are typically
close-range and highly controlled. Scene objects are relatively well defined for specific extraction tasks.
Typical applications might include industrial inspection of machine parts, optical character recognition,
and feature extraction from scanned maps. Machine vision algorithms tend to be simple and fast, and to
provide complete and reliable solutions.
In image understanding (IU), control over imaging conditions is comparatively limited, scene features
are often ambiguous, and background clutter and noise exist to a greater extent. Also, imagery may consist
of several multispectral bands, which is seldom the case in machine vision. The goal of IU is to attempt to
model a scene in terms of its constituent components (regions) in order to facilitate some form of feature
extraction. Depending upon scene complexity, solutions are often partial at best. Typical IU applications
involve object detection and extraction from remotely sensed imagery from space, aerial, and terrestrial
sensors. In the remainder of this section the AFE problem will be assumed to fall within the domain of IU.

12.4.1.2 Feature Detection versus Delineation
Automated Feature Extraction commonly refers to a class of CV algorithms that address the tasks of
feature detection and feature delineation. Feature detection represents a scene segmentation process in
which image pixels are associated with specific feature patterns based on spatial and/or radiometric
properties. Pixel-feature associations are generally made with Type I operators, examples of which
include edge detection, spectral classification, histogram thresholding, texture analysis, and correlation
matching. Type I operators are useful in applications such as automated target recognition, template
matching, and thematic classification of scene regions.
In feature delineation, features are collected and displayed as vectors in a complete and precise
topological representation suitable for subsequent GIS analysis and database storage. Such representations
usually assume the form of strings of connected pixels, or preferably, points, lines or polygon vectors.
Feature delineation is performed with Type II operators, which include methods for linking and grouping
pixels into larger feature components. Type II operators usually work in conjunction with Type I operators
in that the output of the latter serves as input to the former. Omission and commission errors generated
from Type I output are minimized to some extent by Type II operators. For example, feature gaps
(omission errors) are filled in, and clutter (commission error) is ignored. Examples of Type II operators
include region growing, the Hough transform, and line trackers.
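
As a concrete, deliberately simplified illustration of this pairing, the sketch below applies a Type I operator (Canny edge detection) followed by a Type II operator (a probabilistic Hough transform that links edge pixels into line segments), assuming the OpenCV library is available; the file name and parameter values are illustrative only.

    import cv2
    import numpy as np

    # Type I: associate pixels with candidate features from radiometric properties.
    image = cv2.imread("scene.tif", cv2.IMREAD_GRAYSCALE)     # hypothetical input image
    edges = cv2.Canny(image, 50, 150)                          # binary edge map

    # Type II: group the labeled pixels into larger vector components.
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180.0,
                               threshold=60, minLineLength=40, maxLineGap=5)

    if segments is not None:
        for x1, y1, x2, y2 in segments[:, 0]:
            print("segment:", (x1, y1), "->", (x2, y2))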
As scene structure increases in complexity, so does the need for more sophisticated AFE techniques.
Urban scenes generally present more structural complexity than rural scenes. For instance, automated
delineation of roads in urban scenes is complicated by road markings, vehicular traffic, parking lots,
driveways, sidewalks, curbs, and shadows. By contrast, extraction of a radiometrically homogeneous
road going through a desert is a comparatively trivial task for a simple AFE technique.

12.4.1.3 Degrees of Automation


Automated feature extraction strategies are often categorized according to their degree of automation
into semi- or fully automatic. The objective of semi-automatic methods is to assist the human operator in
real-time. This strategy is designed to use interactive user-provided information in the form of seed
points, widths, and directions, with real-time manual editing of the extraction results. On the other hand,
real-time execution sets limits on the computational complexity of the algorithm.
A fully automatic extraction strategy is intended to extract features from a scene as an offline
process, i.e., without the need for user-provided inputs. This approach is well suited to GIS updates, in
which existing coarse or outdated feature data is used to guide a revised extraction. A successful update
is dependent upon the accuracy of the reference data relative to the extraction image. A feature update
strategy is an example of a top-down process in that it begins with a priori reference information to guide
feature extraction. Given current worldwide feature database holdings, an update strategy offers a
practical approach to revise and refine existing data. However, for extraction of new features, a rigorous
approach to full automation is needed in the sense that there can be no reliance on reference vector data.
A rigorous strategy is also motivated by gaining insight into the nature of the extraction problem as a
vision process. A standard methodology begins with low-level detection that generates initial hypotheses
for candidate feature components, followed by mid-level grouping of components, and concludes with
high-level reasoning for feature completion. This operational flow is an example of a bottom-up, or data-
driven process.

12.4.1.4 General Strategies


Automated feature extraction techniques often use a toolbox approach, in which a collection of specialized
feature models is assembled in a single algorithm to accommodate variable scene content. Regardless of
the algorithm, three key strategies are often used to support feature extraction: 1) use of scale space, 2) use
of context in scene interpretation, and 3) data fusion. A synopsis of each strategy follows.
Scale space. Several photogrammetric processes make use of image pyramids, also known as reduced
resolution data sets. Image pyramids are an example of scale space [Koenderink, 1984; Witkin, 1983]. In
computer vision, image scale space is used to extract the manifestations of features at different image
scales. For example, the use of image scale space can optimize the extraction of salient features by first
isolating them at lower resolutions, and then homing in on their precise locations at higher resolutions.
In this way, extraction at lower resolutions serves to initialize subsequent extractions at higher resolutions.
There is ample evidence to suggest that scale space processing is inherent in human vision.
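
A minimal coarse-to-fine sketch of this idea is given below, assuming NumPy and SciPy are available. A Gaussian pyramid is built, a crude stand-in detector (here simply the brightest pixel) localizes a feature at the coarsest level, and the estimate is propagated and refined in a small window at each finer level; the window size and pyramid depth are arbitrary choices.

    import numpy as np
    from scipy import ndimage

    def gaussian_pyramid(image, levels=3):
        """Return [full-res, half-res, quarter-res, ...] versions of the image."""
        pyramid = [np.asarray(image, dtype=float)]
        for _ in range(levels - 1):
            smoothed = ndimage.gaussian_filter(pyramid[-1], sigma=1.0)
            pyramid.append(smoothed[::2, ::2])            # decimate by a factor of 2
        return pyramid

    def coarse_to_fine_peak(image, levels=3, window=5):
        """Locate the brightest pixel coarsely, then refine the estimate level by level."""
        pyramid = gaussian_pyramid(image, levels)
        r, c = np.unravel_index(np.argmax(pyramid[-1]), pyramid[-1].shape)   # coarsest level
        for level in reversed(range(levels - 1)):
            r, c = 2 * r, 2 * c                           # propagate estimate to finer level
            img = pyramid[level]
            r0, r1 = max(r - window, 0), min(r + window + 1, img.shape[0])
            c0, c1 = max(c - window, 0), min(c + window + 1, img.shape[1])
            patch = img[r0:r1, c0:c1]
            dr, dc = np.unravel_index(np.argmax(patch), patch.shape)
            r, c = r0 + dr, c0 + dc                       # refined position at this level
        return int(r), int(c)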
Scene context. Modeling scene context is motivated by perceptual organization in human vision. The
premise is to interpret and exploit background scene components more completely to provide a contextual
reference that enhances the extraction of target features, as opposed to a more constrained approach that
only distinguishes targets from non-targets. For example, road markings and vehicular traffic can provide
valuable cues for the existence of roads. Modeling scene context generally increases the complexity of
the algorithm.
Data fusion. The goal of data fusion is to merge different types of information to enhance the
recognition of features. High-resolution multispectral and hyperspectral imagery and high-resolution
DEMs have become very useful information sources for automated extraction algorithms. The premise is
that solution robustness can be increased when several different input sources of information are
analyzed. However, the increase in computational complexity sets an upper limit on the effectiveness of
data fusion.

12.4.1.5 Evaluating the Performance of Extraction Algorithms


The development of a variety of algorithms to support AFE has brought attention to the development of
robust performance evaluation metrics for these techniques [Bowyer and Phillips, 1998; McKeown et al.,
2000]. Practical utility of an algorithm within a production environment is ultimately determined by its
usage cost. An algorithm’s usage cost includes algorithm initialization (e.g. selection of seed points and
definition of algorithm parameters), algorithm execution (computer run time), and the subsequent manual
editing of its output to meet accuracy requirements. Usage cost is typically expressed as a comparison
(e.g. fraction) of the level of effort expended in algorithm-based extraction versus completely manual
extraction for the same job.
While cost effectiveness is a production-driven measure of success, delineation accuracy is the
measure that typically defines the success of an AFE strategy from an algorithmic standpoint. Accuracy
is commonly measured by comparing algorithm output against a manually derived ground truth. Human
error that may be introduced in the process can be accounted for by defining a buffer tolerance region
during the comparison between human and algorithm extraction. Algorithm extraction output pixels are
compared against the ground truth, and separated into four categories [Wiedemann et al., 1998]:
 true positives (TP): correctly extracted pixels (e.g. actual road pixels extracted as such).
 true negatives (TN): correctly unextracted pixels (e.g. actual background pixels extracted as
such).
 false positives (FP): incorrectly extracted pixels (e.g. background pixels marked incorrectly by
the algorithm as road pixels).
 false negatives (FN): incorrectly unextracted pixels (e.g. road pixels marked incorrectly by the
algorithm as background pixels).
Various accuracy measures used in the literature are derived from these four classifications. Among these,
three of the most common are correctness, completeness, and quality:

correctness = TP / (TP + FP) (12.5)

completeness = TP / (TP + FN) (12.6)

quality = TP / (TP + FP + FN) (12.7)

Correctness is a measure ranging between 0 and 1 that indicates the detection accuracy rate relative
to ground truth. It can also be interpreted as the converse of commission error, such that correctness +
commission_error = 1. Completeness is also a measure ranging between 0 and 1 that can be interpreted
as the converse of omission error, such that completeness + omission_error = 1. Completeness and
correctness are complementary metrics that clearly need to be interpreted simultaneously. For example,
if all scene pixels are classified as TPs, then the completeness value is a perfect 1.0; of course the
correctness value would likely be near 0. A more meaningful single metric is quality, which is a normalized
measure combining correctness and completeness. The quality value can never be higher than either the
completeness or correctness.
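
A minimal sketch of these measures, computed from boolean extraction and ground-truth rasters (the buffer-tolerance comparison is omitted for brevity):

    import numpy as np

    def extraction_metrics(extracted, truth):
        """Completeness, correctness, and quality from boolean pixel masks."""
        extracted = np.asarray(extracted, dtype=bool)
        truth = np.asarray(truth, dtype=bool)
        tp = float(np.logical_and(extracted, truth).sum())
        fp = float(np.logical_and(extracted, ~truth).sum())
        fn = float(np.logical_and(~extracted, truth).sum())
        completeness = tp / (tp + fn) if tp + fn else 0.0     # 1 - omission error
        correctness = tp / (tp + fp) if tp + fp else 0.0      # 1 - commission error
        quality = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        return completeness, correctness, quality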

12.4.2 Automating Road Extraction


Since a road network does not conform to a specific global shape, the detection process must begin by
considering local shape characteristics. Road extraction algorithms commonly use geometric and
radiometric attributes to model the appearance of a road segment. With respect to geometry, roads are
generally considered to be elongated, constant in width, linear to mildly curvilinear, smoothly curved,
continuous, and connected into networks. In low-resolution images roads are single-pixel-width lines,
whereas in high-resolution images roads are characterized geometrically as the area within a pair of
edges. With respect to radiometry, road models tend to assume good contrast, well-defined and connected
gradients, homogeneity, and smooth texture. As many road models are based on gradient and texture
analysis of high-resolution imagery, input images are often single-layer (e.g., panchromatic). However,
models that incorporate spectral analysis can exploit the content of multispectral images [Agouris et al.,
2002].
Initially most approaches to road extraction focused on scenes with mainly rural content as they are
much less complex and easier to model than urban scenes. However, in recent years researchers have
developed models for urban scene content for road extraction. Approaches to urban scenes include
modeling road markings (which require a spatial resolution of about 0.2-0.5m per pixel) [Baumgartner et
al., 1999b], or exploiting the geometric regularity of city grids [Price, 2000].
Over the last two decades, a few prominent modeling techniques have emerged from road extraction
research. Among them, tracking, anti-parallel edge detection, and snakes represent a progression toward
more complex road models. In the remainder of this section a brief overview of these techniques
is provided, and the section concludes with examples from recently developed road extraction strategies
that employ these techniques.

12.4.2.1 Interactive Road Tracking


Road tracking or following is a local exploratory technique within a scene. In interactive tracking, the
process is user-initialized in real time with a starting point, start direction, and/or feature width. The
algorithm then predicts the trajectory of the road in incremental steps until it reaches a stopping
criterion. One technique for prediction is to fit a polynomial such as a parabola to the most recently
identified path points [McKeown and Denlinger, 1988]. The tracking process may combine edge, radiometric
and/or textural constraints via template/correlation matching, whose patterns are derived from the
starting point, and periodically updated to better model local conditions along the road path.
More robust algorithms allow for gradual surface and width changes along a track, as well as negotiating
occlusions such as shadows, vehicles, surface markings, and surface irregularities. Additional search
parameters often include a search radius, allowable gap size, search angle, junction options, curvature
rate, and line smoothing and generalization. An effective approach to the problem is to provide an on-
the-fly input mode in which the algorithm stops when it encounters potential obstacles such as overpasses
and intersections, and prompts the user to determine how the algorithm should proceed.
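
The prediction step of such a tracker can be sketched as follows: a parabola is fitted to the most recent centerline points and extrapolated one step ahead. The matching, updating, and stopping logic that does most of the work is omitted, and the step size and history length are arbitrary.

    import numpy as np

    def predict_next_point(path, step=2.0, history=8):
        """Extrapolate the road trajectory from recent (x, y) centerline points."""
        pts = np.asarray(path[-history:], dtype=float)
        # Parameterize by cumulative arc length to handle near-vertical trajectories.
        t = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(pts[:, 0]),
                                                      np.diff(pts[:, 1])))))
        cx = np.polyfit(t, pts[:, 0], 2)      # parabola x(t)
        cy = np.polyfit(t, pts[:, 1], 2)      # parabola y(t)
        t_next = t[-1] + step
        return float(np.polyval(cx, t_next)), float(np.polyval(cy, t_next))

    # Hypothetical gently curving track
    track = [(0, 0), (2, 0.1), (4, 0.4), (6, 0.9), (8, 1.6), (10, 2.5), (12, 3.6), (14, 4.9)]
    print(predict_next_point(track))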

12.4.2.2 Road Detection from the Anti-Parallel Edge Model


Over the last two decades, gradient analysis has perhaps provided the most motivation for road extraction
algorithms. A simple and well-known technique for detecting roads in high-resolution images is via anti-
parallel (apar) edge detection [Nevatia and Ramesh, 1980; Zlotnick and Carnine, 1993]. Given its prominence
in the literature, the apar method is presented in some detail to provide an indication of the practical
utility and limitations of automated road detection.
Figure 12.20, adapted from Gonzalez and Woods, 1992, demonstrates anti-parallel edges with a
simulated road feature that is 3 pixels wide. Any two edge pixels p and q are considered anti-parallel if
the distance between them is within a predefined width range, and the relative difference in their
gradient orientations is less than a predefined angle. In addition, the gradient directions must oppose one
another (hence the prefix anti). Anti-parallel gradients either attract or repel one another depending on the
road/background radiometric relationship, i.e., whether the road is brighter or darker than its surroundings.

(a) (b)
Figure 12.20. Anti-parallel gradients. (a) attracting gradients, and (b) repelling gradients (adapted from Gonzalez and
Woods, 1992).

The implementation of apar detection begins with an edge detection technique that provides gradient
magnitude and orientation, such as the 3x3 Sobel operators (Fig. 12.21). As an example, the horizontal
and vertical gradients at pixel z5 in Figure 12.21 are calculated respectively as,

Gx = (z1 + 2z2 + z3) – (z7 + 2z8 + z9) (12.8)

and

Gy = (z1 + 2z4 + z7) – (z3 + 2z6 + z9). (12.9)

The magnitude of the gradient at pixel z5 is calculated as,

G = sqrt(Gx^2 + Gy^2), (12.10)

and the local orientation of the gradient is estimated as,

θ = arctan(Gy / Gx). (12.11)

Further details on these and other similar operators may be found in Chapter 5 of this Manual.
The application of this technique requires an image to be scanned in horizontal and vertical directions
in search of successive edge pixels p and q that satisfy road width and gradient magnitude and orientation
criteria. For example, a horizontal scan line first detects a candidate edge pixel p, and searches for a
candidate anti-parallel edge pixel q on the same row (Figure 12.22). Vertical scanning follows horizontal
scanning, and results are merged.

(a) (b) (c)

Figure 12.21. 3x3 Sobel masks. (a) horizontal detector, (b) vertical detector, and (c) gray levels of
image patch.

The perpendicular width estimate of the road for two edge pixels p and q on a scan line is determined as,

(Horizontal scan) (12.12)

(Vertical scan) (12.13)

An anti-parallel pair is detected when ŵpq falls within a specified width range, and the deflection angle α
between the gradient orientations at p and q, defined as

(12.14)

falls below a specified threshold. There are many variations of this implementation in the literature.

Figure 12.22. Horizontal scan for anti-parallel pixels.

Once anti-parallel edge pixels are detected, corresponding centerline pixels are derived in a
straightforward manner by determining the midpoint between anti-parallel pixels. Road network topology
can then be constructed from the centerline pixels by using an appropriate linking or tracking strategy.
However, apar detection errors can confound tracking algorithms. Figure 12.23 shows the results of an
apar centerline detection algorithm using a width range of 5 to 15 pixels, and a gradient orientation
deflection angle threshold of 50 degrees.
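
A minimal sketch of a horizontal apar scan is given below, assuming NumPy is available. Sobel gradients are computed with shifted array slices, and pairs of edge pixels on each row are tested against width and opposing-orientation criteria; the thresholds are illustrative and not tied to the example of Figure 12.23.

    import numpy as np

    def sobel_gradients(img):
        """Gradient magnitude and orientation (radians) from 3x3 Sobel masks."""
        p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
        gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
              - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
        gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
              - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
        return np.hypot(gx, gy), np.arctan2(gy, gx)

    def apar_centerline_columns(mag, ori, row, w_min=5, w_max=15,
                                mag_thresh=100.0, ang_thresh=np.radians(50.0)):
        """Candidate road centerline columns for one horizontal scan line."""
        cols = np.nonzero(mag[row] > mag_thresh)[0]      # candidate edge pixels on this row
        centers = []
        for i, p_col in enumerate(cols):
            for q_col in cols[i + 1:]:
                if q_col - p_col > w_max:
                    break                                 # columns are sorted; no wider pairs
                if q_col - p_col < w_min:
                    continue
                # Wrapped angular difference between the two gradient orientations.
                diff = np.pi - abs(np.pi - abs(ori[row, p_col] - ori[row, q_col]))
                if abs(np.pi - diff) < ang_thresh:        # roughly opposing: anti-parallel
                    centers.append((p_col + q_col) // 2)
        return centers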
Finding roads by anti-parallel edge detection is effective to the extent that 1) anti-parallel edges are
exclusive to roads and 2) all roads have anti-parallel edges. Buildings, road markings, sidewalks, shoulders,
vehicular traffic, medians, intersections, and random clutter can cause erroneous detection. There are a
variety of input parameters that can be adjusted heuristically to provide an acceptable compromise between
the rates of false positive versus false negative detection. Despite these limitations, anti-parallel edge
detection remains a standard and effective segmentation technique as a first step toward automated road
extraction in high-resolution images.

(a) (b)

(c) (d)


Figure 12.23. Apar centerline detection (a) input image (GSD = 1m/pixel); (b) Canny edge detection; (c) Gradient
orientation image (units are angular degrees); (d) Road centerline hypotheses detected from anti-parallel edges. Images
courtesy [Agouris et al., 2002]. Please refer to the color appendix for the color version of this image.

12.4.2.3 Road Extraction by Active Contour Models


Deformable contour models or snakes applied as object extraction tools were first introduced in [Kass et
al., 1988], and have since been the subject of substantial research for road extraction. A deformable line
attaches itself to an edge location in an image much as a deformable object
embedded in a viscous medium deforms its shape until it reaches a stage of stability. In its numerical
solution the snake is represented by a polygonal line (i.e., nodes connected by segments). The geometric
and radiometric relations of these nodes are expressed as energy functions, and object extraction
becomes an optimization problem. The main issues are how to define the energy at the snake nodes and
how to solve the energy minimization problem.
In general, the energy function of a snake contains internal and external forces. The internal forces
regulate the ability of the contour to stretch or bend at a specific point. The external force attracts the
contour to image features (e.g., edges). Additionally, external constraints may be used to express user-
imposed restrictions (e.g., to force the snake to pass through specific points).
The total energy of each point is expressed as

E(vi) = α Econt(vi) + β Ecurv(vi) + γ Eedge(vi) (12.15)

where,
 Econt , Ecurv are expressions of the first and second order continuity constraints (internal forces),
 Eedge is an expression of the edge strength (external force), and
 α, β, and γ are relative weights describing the importance of each energy term.
A brief description of these energy functions follows [Agouris, et al., 2001].
Continuity term: If vi = (xi, yi) is a point on the contour, the first energy term in (12.15) is defined as:

(12.16)

where d is the average distance between the n points:

d = (1/n) Σ |vi – vi–1| (12.17)

The continuity component forces snake nodes to be evenly spaced, avoiding grouping at certain areas,
while at the same time minimizing the distance between them.
Curvature term: This term expresses the curvature of the snake contour, and allows for the manipulation
of its flexibility and appearance:
Ecurv(vi) = |vi–1 – 2vi + vi+1|^2 (12.18)

Edge term: Continuity and curvature describe the geometry of the contour and are referred to as
internal forces of the snake. The third term describes the relation of the contour to the radiometric
content of the image, and is referred to as external force. In general, it forces points to move towards
image edges. An expression of such a force may be defined as

(12.19)

The above model attracts the snake to image points with high gradient values. Since the gradient is
a metric for the edges of an image, the snake is attracted to strong edge points. The gradient of the image
at each point is normalized to accentuate small differences in values within the neighborhood of that point.
The coefficients α, β, and γ in (12.15) are weights describing the relative importance of each energy
term in the solution. Increasing the relative values of α and β will result in putting more emphasis on the
geometric smoothness of the extracted line. This might be suitable for very noisy images, but might be
unsuitable when dealing with sharp angles in the object space. Increasing the relative value of γ places
more emphasis on the radiometric content of the image, regardless of the physical appearance of the
extracted outline. As is commonly the case in snake solutions, the selection of these parameters is
performed empirically.
Together, the three energy functions describe an ideal model of a road segment, namely a smooth
curve coinciding with a strong edge in the image. The objective of traditional snake-based road extraction
is to identify in an image a sequence of points describing a contour that approximates this ideal model.
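
A greedy-optimization sketch of this model is given below, assuming the common discrete forms of the continuity, curvature, and edge terms; the weights, fixed endpoints, and 3x3 search neighborhood are illustrative choices rather than those of any particular published implementation.

    import numpy as np

    def snake_step(nodes, grad_mag, alpha=1.0, beta=1.0, gamma=1.2):
        """One greedy pass: move each interior node to its lowest-energy 3x3 neighbor."""
        nodes = np.asarray(nodes, dtype=int)              # (row, col) snake nodes
        new_nodes = nodes.copy()
        d_bar = np.mean(np.hypot(*np.diff(nodes, axis=0).T))        # mean node spacing
        g_min, g_max = float(grad_mag.min()), float(grad_mag.max())
        limit = np.array(grad_mag.shape) - 1
        for i in range(1, len(nodes) - 1):                          # endpoints held fixed
            best, best_e = nodes[i], np.inf
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    cand = np.clip(nodes[i] + (dr, dc), 0, limit)
                    e_cont = abs(d_bar - np.hypot(*(cand - new_nodes[i - 1])))
                    e_curv = float(np.sum((new_nodes[i - 1] - 2 * cand + nodes[i + 1]) ** 2))
                    g = grad_mag[cand[0], cand[1]]
                    e_edge = -(g - g_min) / (g_max - g_min + 1e-9)  # strong edge -> low energy
                    e = alpha * e_cont + beta * e_curv + gamma * e_edge
                    if e < best_e:
                        best, best_e = cand, e
            new_nodes[i] = best
        return new_nodes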

Selecting seed points within the vicinity of an image feature initializes the snake contour. Using an
iterative procedure, nodes are repositioned to produce a new snake with a lower total energy than its
prior state. For road extraction in high resolution images, the single line snake model is easily extendable
to a dual line, or ribbon snake by including a width component in the model, as shown in Figure 12.24.

(a) (b) (c) (d)


Figure 12.24. Optimization steps of a ribbon snake. (a)-(c): Dotted lines indicate the passive part of
the ribbon. White parts are currently optimized. Black ends indicate the result of the optimization so
far. (d): Final result. Images courtesy [Baumgartner et al., 1999a].

12.4.2.4 Sample Strategies for Fully Automated Road Extraction


To demonstrate the capabilities of current automated road extraction algorithms, three different fully automated
extraction strategies are briefly described, and results shown in Figure 12.25 with evaluation values.

Strategy 1 [Baumgartner et al., 1999a]


This strategy is a compilation of three different modules that are used in an integrated approach to road
extraction. The first module (local extraction) uses multiple image scales, context information, and ribbon
snakes for road extraction, and is based on local grouping of lines (hypotheses for road axes) and edges
(hypotheses for roadsides). It was developed for panchromatic aerial imagery with a spatial resolution of
0.5m or smaller. It delivers reliable hypotheses for roads with a good geometric accuracy. The second
module (global extraction) fuses linear structures from various sources and constructs a weighted graph.
Pairs of seed points within this graph are selected and the shortest paths between these seed pairs are
extracted to construct a road network. Compared to the first module, the second module relies on more
global grouping criteria. The third module (network completion) completes the road network delivered
by the second module. It generates hypotheses for missing connections and verifies these hypotheses
based on the image data. Algorithm extraction results following application of all three modules on a rural
panchromatic scene are shown in Figure 12.25(a).

Strategy 2. [Agouris et al., 2002]


This strategy combines anti-parallel edge detection with classification of spectral image content. The
procedure begins by identifying initial hypotheses of road network centerlines using a combination of
anti-parallel edge detection and a fuzzy linking technique based on principles of perceptual organization.
The initial centerline hypotheses provide the sites for the selection of training samples for a subsequent
Bayesian supervised classification. The non-road class statistics are generated from unsupervised
classification. A binary road class image is generated, and the process of anti-parallel edge detection and
fuzzy linking is repeated (this time on the binary image) to generate new road centerline hypotheses. The
selection of training samples and supervised classification is repeated on the new sites to refine the road
class statistics. The entire process is repeated until a stopping criterion is reached. The incorporation of
a spectral refinement feedback loop in the process acts as a method of self-supervised road classification.
The key to a successful extraction refinement is the accurate selection of training samples. To ensure high
quality training samples, a conservative linking strategy is used to keep false positive detection of
candidate road centerline segments to a minimum. Algorithm extraction results shown in Figure 12.25(b)
are generated from USGS 1.0m color-infrared orthoimagery of a suburban scene after two iterations of
self-supervised classification.

Strategy 3 [Harvey, 1999]
This strategy uses a combination of road finding and road tracking, where the former serves to initialize
the latter. The road finding technique is based on a local histogram of edge directions. A user provides a
tile size (e.g., 50m), as well as edge magnitude and direction images. The image is then tiled using the
given tile size, and each tile is overlapped by 50% to help minimize edge effects. Within each tile, the
procedure proceeds as follows:
 Histogram the edge directions;
 Find the prominent direction peaks in the direction histogram;
 For each direction peak, compute edge direction consistency and inconsistency histograms;
 Split/merge these consistency peaks to generate road hypotheses.
This technique generates rectangular road hypotheses of a fixed length, but with a position, angle,
and width that represent the local attributes of the road. Multiple roads can be detected within each tile.
The road finding results are then passed on to a tracking technique as described in [McKeown and
Denlinger, 1988]. Algorithm extraction results from a panchromatic image of the Ft. Benning site are
shown in Figure 12.25(c).
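
The first two steps for a single tile might be sketched as follows, assuming gradient magnitude and direction rasters are already available; the magnitude threshold, bin width, and peak criterion are arbitrary.

    import numpy as np

    def direction_peaks(mag, ori_deg, mag_thresh=100.0, bin_width=5.0, peak_frac=0.25):
        """Histogram edge directions in a tile and return the prominent peak angles."""
        mask = mag > mag_thresh
        angles = np.mod(ori_deg[mask], 180.0)            # fold opposing gradients together
        bins = np.arange(0.0, 180.0 + bin_width, bin_width)
        hist, edges = np.histogram(angles, bins=bins)
        threshold = peak_frac * hist.max() if hist.max() > 0 else np.inf
        peaks = [0.5 * (edges[i] + edges[i + 1])
                 for i in range(len(hist))
                 if hist[i] >= threshold
                 and hist[i] >= hist[max(i - 1, 0)]
                 and hist[i] >= hist[min(i + 1, len(hist) - 1)]]
        return peaks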

(a)

(b)

(c)
Figure 12.25. Examples of fully automated road extraction strategies (reference scene on left, ground truth in middle,
and extraction results on right). (a) Strategy 1 (correctness = 91.2%, completeness = 83.2%, quality not provided); (b)
Strategy 2 (correctness = 94.0%, completeness = 91.9%, quality = 86.8%); (c) Strategy 3 (correctness = 27.5%,
completeness = 56.3%, quality = 25.0%). Images are provided courtesy of the respective publications for each
strategy. Please refer to the color appendix for the color version of this image.

12.4.3 Building Extraction
While roads generally lie on the terrain surface, buildings extend from the terrain as self-contained 3-
dimensional objects. Approaches to automated building extraction therefore typically combine
photogrammetric principles with CAD-based modeling techniques, commonly referred to as site modeling.
Extraction methods exploit the strong geometric regularities of collinearity, coplanarity, and parallel
lines and planes that are inherent to buildings. They proceed by extracting from the image primitives that
describe the building structure. These primitives may range from points and lines to planar elements
(Figure 12.26). Extraction models can also exploit the orientation of shadows cast by buildings, whether or
not the camera and sun orientations are known.

(a) (b) (c) (d)

Figure 12.26. Building representations. (a) points, (b) wire frame, (c) surface, (d) volumetric. Courtesy [Rottensteiner,
2001].

The roof structure of a building is a fundamental model consideration in building extraction algorithms.
Roof structures can be separated into three broad categories: flat, peaked, and gabled, as illustrated in
Figure 12.27. More complex roof structures are usually modeled by considering individual planar roof
components. Semi-automated extraction techniques can exploit geometric regularity combined with a
DEM. The Auto Create tool in SOCET SET is designed to complete a building structure based on a
particular digitizing sequence. The circled numbers in the top views in Figure 12.27 demonstrate the
digitization sequence for each roof type. For example, in Figure 12.27(a), the user digitizes the three
points as shown, and the algorithm derives the z-value from the DEM (or manually input by the user) to
complete a flat roof building. The peaked and gabled roofs require four and six digitized points respectively.
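
Under the flat-roof assumption described above, the completion step can be sketched as follows: three digitized roof corners define a rectangle by parallelogram closure, and a terrain elevation (e.g., taken from the DEM) extrudes the roof into a wireframe box. The function and coordinate values are hypothetical.

    import numpy as np

    def complete_flat_roof(p1, p2, p3, roof_z, ground_z):
        """Three digitized roof corners (x, y) -> 8 wireframe corners of a box."""
        p1, p2, p3 = map(np.asarray, (p1, p2, p3))
        p4 = p1 + p3 - p2                 # parallelogram closure gives the 4th corner
        roof = [np.append(p, roof_z) for p in (p1, p2, p3, p4)]
        base = [np.append(p, ground_z) for p in (p1, p2, p3, p4)]
        return np.array(roof + base)

    # Hypothetical: roof digitized at 58 m elevation, DEM gives 46 m at the footprint
    corners = complete_flat_roof((100, 200), (120, 200), (120, 212), roof_z=58.0, ground_z=46.0)
    print(corners.shape)    # (8, 3)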

12.4.3.1 Using High Resolution DTMs


With the advent of high-resolution digital elevation models (e.g., from lidar), the task of fully automatic
building extraction can be greatly facilitated. In a manner similar to detecting edges from radiometric

Figure 12.27. Digitizing sequences for rapid extraction of buildings with specific roof structures. (a) flat; (b) peaked;
(c) gabled.

discontinuities from imagery, elevation discontinuities in a high-resolution DEM can be used to detect
objects that extend above the terrain. Figure 12.28 demonstrates how edge detection performed on a
high-resolution DEM compares with an orthoimage of the same urban scene. The DEM in Figure 12.28(c)
is represented as an 8-bit gray level image, where darker shades indicate higher elevation values. A
perspective view of the DEM is shown in Figure 12.28(e). Performing edge detection with the DEM
image can be particularly effective at dealing with multi-tiered roofs and other complicated roof structures.
Currently, the generation of such high-resolution DEMs is a costly process.
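
A minimal sketch of detecting above-terrain objects from elevation discontinuities is given below, assuming NumPy and SciPy are available; the slope threshold and minimum region size are illustrative and would depend on the DEM post spacing and the buildings of interest.

    import numpy as np
    from scipy import ndimage

    def elevation_discontinuities(dem, post_spacing=1.0, slope_thresh=2.0):
        """Flag DEM cells where the elevation gradient exceeds a slope threshold."""
        dz_dy, dz_dx = np.gradient(dem, post_spacing)       # meters of rise per meter
        slope = np.hypot(dz_dx, dz_dy)
        return slope > slope_thresh                          # boolean discontinuity mask

    def building_blobs(dem, min_cells=25, **kwargs):
        """Group discontinuity cells into candidate building regions."""
        mask = ndimage.binary_closing(elevation_discontinuities(dem, **kwargs))
        labels, count = ndimage.label(mask)
        sizes = ndimage.sum(mask, labels, index=range(1, count + 1))
        return [i + 1 for i, s in enumerate(sizes) if s >= min_cells]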

(a) (b) (c)

(d) (e)
Figure 12.28. Automatic building detection. (a) Orthoimage; (b) Canny edge detection of orthoimage; (c) DEM image
with 1m post spacing; (d) Canny edge detection of DEM image; (e) perspective view of DEM.

12.4.3.3 Sample Algorithms of Fully Automated Building Extraction


To demonstrate the capabilities of current automated building extraction algorithms, two different fully
automated extraction algorithms are described, and extraction results shown in Figures 12.29-30, with
evaluation metrics. Note that the evaluation metrics are provided for both two (image space pixels) and
three (object space voxels) dimensions.

Algorithm 1 [Irvin and McKeown, 1989; Shufelt, 1999a].


This algorithm combines two techniques called BUILD and SHAVE. BUILD is a line and corner-based
analysis system which operates solely in image space. BUILD assumes that all images are acquired with
nadir or near-nadir imaging geometry, that perspective effects can be ignored, and that all buildings can
be modeled by 2-dimensional convex quadrilaterals (boxes). BUILD begins by using a sequence finder to
break edges at points of high curvature, and then uses a collinear line linking process to handle fragmented
edges that appear to share the same underlying structure. By itself, BUILD does not generate 3-dimensional
building hypotheses.
SHAVE is a shadow-based verification system that makes the same assumptions of image geometry
as BUILD. SHAVE uses the global shadow threshold computed by BUILD in conjunction with a sequence
finder to delineate shadow regions on the shadow-casting sides of boxes, using the known solar azimuth.
After delineating a shadow region, SHAVE computes the average length of the shadow region, which
could be used to derive an image space estimate of building height in conjunction with the solar
elevation angle.
BUILD+SHAVE simply runs BUILD to produce 2-dimensional boxes, and then runs SHAVE on those
boxes to obtain shadow lengths for each box. The ground sample distance is computed at the center of
the box and this is multiplied by the length of the shadow in image space to obtain the length of the
shadow in object space, which can then be used with the solar elevation angle to derive an object space
height estimate for the building. Photogrammetric routines are then used to generate a 3-dimensional
object space wireframe model from the 2-dimensional box and height estimate. BUILD+SHAVE extraction
results from a panchromatic image of the Ft. Hood site are shown in Figure 12.29(c).
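
The shadow-to-height conversion used in this step can be sketched as follows; the variable names and values are illustrative.

    import math

    def building_height_from_shadow(shadow_len_pixels, gsd_meters, solar_elev_deg):
        """Estimate building height (m) from its shadow length on flat terrain."""
        shadow_len_ground = shadow_len_pixels * gsd_meters          # object-space shadow length
        return shadow_len_ground * math.tan(math.radians(solar_elev_deg))

    # Hypothetical: 24-pixel shadow, 0.5 m GSD, sun 35 degrees above the horizon
    print(round(building_height_from_shadow(24, 0.5, 35.0), 1))     # ~8.4 m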

(a) (b) (c)


Figure 12.29. Algorithm 1 (BUILD+SHAVE) extraction results from the Ft. Hood site. (a) Reference scene; (b) Ground
truth; (c) Automated extraction. Images courtesy of [Bulwinkle and Shufelt, 1998]. Please refer to the color appendix for
the color version of this image.

Algorithm 2 [Shufelt, 1999b]


This algorithm, referred to as PIVOT (Perspective Interpretation of Vanishing points for Objects in Three
dimensions), is a building extraction system which uses rigorous photogrammetric camera modeling at all
phases of its processing. PIVOT uses a vanishing point detection algorithm to detect horizontal, vertical,
and slanted vanishing points, using this orientation information to constrain the search for building
structure. The constrained corners produced by PIVOT are then used to form rectangular and triangular
primitive volumes, which PIVOT combines to model simple and complex 3-dimensional object-space
buildings. In addition to the conventional hypothesis verification approach of evaluating the edge support
for a building hypothesis, PIVOT also makes full use of the camera model and knowledge about solar
elevation and azimuth in an object-space based shadow verification test and a surface intensity
consistency test. Extraction results from PIVOT using panchromatic imagery of the Ft. Hood site are
shown in Figure 12.30.

Figure 12.30. Algorithm 2 (PIVOT) extraction results from the Ft. Hood site. Image courtesy of
[Shufelt, 1999a].

The evaluation metrics of these two algorithms are as follows:


Algorithm 1:
2-dimensional: correctness = 91.6%, completeness = 71.3%, quality = 60.0%
3-dimensional: correctness = 60.9%, completeness = 47.1%, quality = 36.1%
Algorithm 2:
2-dimensional: correctness = 79.6%, completeness = 84.3%, quality = 69.3%
3-dimensional: correctness = 77.4%, completeness = 77.0%, quality = 62.8%

References
Autometric, Inc., 2002. SoftPlotter 4.0 User’s Guide.
Agouris, P., 1992. Multiple Image Multipoint Matching for Automatic Aerotriangulation, Ph.D.
Dissertation, Dept. of Geodetic Science, The Ohio State University, Columbus, Ohio.
Agouris, P., P. Doucette, and A. Stefanidis, 2002. Automated Road Extraction from High Resolution
Multispectral Imagery, Technical Report, Digital Image Processing and Analysis Laboratory,
Department of Spatial Information Science and Engineering, University of Maine, Orono, Maine.
Agouris, P., and T. Schenk, 1996. Automated Aerotriangulation Using Multiple Image Multipoint
Matching, Photogrammetric Engineering and Remote Sensing, 62(6): 703-710.
Agouris, P., A. Stefanidis, and S. Gyftakis, 2001. Differential Snakes for Change Detection in Road
Segments, Photogrammetric Engineering and Remote Sensing, 67(12): 1391-1400.
BAE Systems, 2001. SOCETSET User’s Manual, Version 4.3.1.
Baumgartner, A., W. Eckstein, C. Heipke, S. Hinz, H. Mayer, B. Radig, C. Steger, and C. Wiedemann,
1999a. T-REX: TUM research on road extraction. In: Festschrift für Prof. Dr.-Ing. Heinrich Ebner zum
60. Geburtstag, C. Heipke and H. Mayer (eds.), pp. 43-64. Lehrstuhl für Photogrammetrie und
Fernerkundung, Technische Universität München
Baumgartner, A., C. Steger, H. Mayer, W. Eckstein, and H. Ebner, 1999b. Automatic Road Extraction
Based on Multi-Scale, Grouping, and Context, Photogrammetric Engineering and Remote Sensing,
65(7): 777-785.
Bowyer, K. and P. Phillips, 1998. Overview of Work in Empirical Evaluation of Computer Vision
Algorithms. Empirical Evaluation Techniques in Computer Vision (K. Bowyer and P. Phillips, editors),
IEEE Computer Society Press, pp. 1-11.
Bulwinkle, G. and J. Shufelt, 1998. A Building Model Evaluation Suite Using the CMU Site Exchange
Format. Tech. Report CMU-CS-134, School of Computer Science, Carnegie Mellon University,
Pittsburgh, Pennsylvania.
Doucette, P. 2002. Automated Road Extraction from Digital Imagery by Self-Organization, Ph.D.
Dissertation, Dept. of Spatial Information Engineering, University of Maine, Orono, Maine.
Fraser C.S., E. Baltsavias, and A. Gruen, 2002. Processing of Ikonos Imagery for Submetre 3D
Positioning and Building Extraction, ISPRS Journal of Photogrammetry and Remote Sensing , 56:
177– 194.
Gonzalez, R., and R. Woods, 1992. Digital Image Processing. Addison-Wesley.
Graham L.N., K. Ellison, and H. Riddell, 1997. The Architecture of a Softcopy Photogrammetry System,
Photogrammetric Engineering and Remote Sensing, 63(8): 1013-1020.
Harvey, W., 1999. Performance Evaluation for Road Extraction. Bulletin de la Société Française de
Photogrammétrie et Télédétection, n. 153(1999-1): 79-87.
Heipke, Ch., 1999. Automatic Aerial Triangulation: Results of the OEEPE-ISPRS Test and Current
Developments, Photogrammetric Week ‘99, Wichmann, pp. 177-191.
Irvin, R. and D. McKeown, 1989. Methods for Exploiting the Relationship Between Buildings and Their
Shadows in Aerial Imagery, IEEE Transactions on Systems, Man, and Cybernetics, 19(6): 1564-1575.
Kass, M., A. Witkin, and D. Terzopoulos, 1988. Snakes: Active Contour Models, International Journal of
Computer Vision 1(4): 321-331.
Kersten, Th., 1999. Digital Aerial Triangulation in Production - Experiences with Block Switzerland,
Photogrammetric Week ‘99, Wichmann, pp. 193-204.
Kersten, Th., and S. Haering, 1997. Automatic Interior Orientation of Digital Aerial Images,
Photogrammetric Engineering and Remote Sensing, 63(8): 1007-1011.
Koenderink, J., 1984. The Structure of Images. Biological Cybernetics, 50: 363-370.
Lue, Y., 1995. Fully Operational Automatic Interior Orientation, Proceedings of Geoinformatics ‘95, pp.
26-35.
McKeown, D., T. Bulwinkle, S. Cochran, W. Harvey, C. McGlone, and J. Shufelt, 2000. Performance
Evaluation for Automatic Feature Extraction, International Archives of Photogrammetry and Remote
Sensing, XXXIII: (B4).
McKeown, D., and J. Denlinger, 1988. Cooperative Methods for Road Tracking in Aerial Imagery, IEEE
Proc. Computer Vision and Pattern Recognition, Ann Arbor, MI, pp. 662-672.
Nevatia, R. and B. Ramesh, 1980. Linear Feature Extraction and Description, Computer Vision,
Graphics, and Image Processing 13: 257-269.

Novak, K., 1992. Rectification of Digital Imagery. Photogrammetric Engineering and Remote Sensing,
58(3): 339-344.
Price, K. 2000. Urban Street Grid Description and Verification, IEEE Workshop on Applications of
Computer Vision (WACV), Palm Springs, pp. 148-154.
Rottensteiner, F., 2001. Semi-Automatic Extraction of Buildings Based on Hybrid Adjustment Using 3D
Surface Models and Management of Building Data in a TIS, Ph.D. Dissertation, Vienna University of
Technology.
Schickler, W. and Z. Poth, 1996. Automatic Interior Orientation and Its Daily Use, International
Archives of Photogrammetry and Remote Sensing, XXXI(B3), pp. 746-751.
Schenk, T., J.C. Li, and C. Toth, 1991. Towards an Autonomous System for Orienting Digital
Stereopairs, Photogrammetric Engineering and Remote Sensing, 57(8): 1057-1064.
Shufelt, J. 1999a. Performance Evaluation and Analysis of Monocular Building Extraction from Aerial
Imagery, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(4): 311-326.
Shufelt, J. 1999b. Performance Evaluation and Analysis of Vanishing Point Detection Techniques, IEEE
Transactions on Pattern Analysis and Machine Intelligence 21(3): 282-288.
StereoGraphics Corporation, 1997. StereoGraphics Developers’ Handbook.
Tang, L., and C. Heipke, 1996. Automatic Relative Orientation of Aerial Images, Photogrammetric
Engineering and Remote Sensing, 62(1): 806-811.
Tsingas, V., 1994. A Graph-Theoretical Approach for Multiple Feature Matching and Its Application on
Digital Point Transfer, International Archives of Photogrammetry and Remote Sensing, XXX(3/2), pp.
865-871.
Wiedemann, C., C. Heipke, H. Mayer, and O. Jamet, 1998. Empirical Evaluation of Automatically
Extracted Road Axes, Empirical Evaluation Methods in Computer Vision (K. Bowyer, and P. Phillips,
editors), IEEE Computer Society Press, pp. 172-187.
Witkin, A. 1983. Scale-Space Filtering, Int. Joint Conference on Artificial Intelligence, pp. 1019-1022.
Zlotnick, A. and P. Carnine, 1993. Finding Road Seeds in Aerial Images, Computer Vision, Graphics,
and Image Processing 57(2): 243-260.
