Chapter 12

AUTOMATION AND DIGITAL PHOTOGRAMMETRIC WORKSTATIONS

Peggy Agouris, Peter Doucette and Anthony Stefanidis

Manual of Photogrammetry (5th Edition), McGlone, C., Mikhail, E., and Bethel, J. (eds.), American Society for Photogrammetry and Remote Sensing, pp. 949-981.

12.1.1 Introduction
Prior to the 1990s, early pioneering attempts to introduce digital (softcopy) photogrammetric map
production involved multiple special purpose and high-cost hybrid workstations equipped with custom
hardware and running associated proprietary software. The early 1990s saw the advent of the end-to-end
softcopy-based system, or Digital Photogrammetric Workstation (DPW). This marked a major paradigm
shift toward a stand-alone workstation that could accommodate virtually all aspects of image-based map
production and geospatial analysis. Through most of the 1990s, DPW systems ran almost exclusively on
high-end Unix or VAX platforms. Supported by advances in computer hardware and software, the late
1990s saw a migration toward more economical, modular, scalable, and open hardware architectures
provided by PC/Windows-based platforms, which also offered performance comparable to their Unix-
based counterparts. At the printing of this manual, there are about 15 independent software vendors that offer PC/Windows-based DPW production systems that vary in cost, functionality, features, and complexity.
Among the most popular fully-featured high-end systems are SOCET SET® by BAE Systems, SoftPlotter® by Autometric, ImageStation® by Z/I Imaging, and Geomatica® by PCI Geomatics. Contained among these
system designs is a legacy of experience in the development of photogrammetric and mapping
instrumentation from several familiar vendors. Therefore, DPW design and development have been
greatly influenced by established conventions in the photogrammetric practice, resulting in comparable
architecture and functionality in these systems.
A DPW system comprises software and hardware that supports the storage, processing, and display of
imagery and relevant geospatial datasets, and the automated and interactive image-based measurement
of 3-dimensional information. Dictated by the requirements of softcopy map production, the defining
characteristics of a DPW include:
- the ability to store, manage, and manipulate very large image files,
- the ability to perform computationally demanding image processing tasks,
- the ability to provide smooth roaming across entire image files and zooming at various resolutions,
- support for large-monitor and stereoscopic display,
- support for stereo vector superimposition, and
- support for 3-dimensional data capture and editing.
Some of these challenges are met by making use of common computer solutions. For example, in order to support rigorous image processing and display demands for images with radiometric resolutions of up to 16-bit panchromatic (48-bit RGB), DPW systems make use of high-end graphics hardware and large amounts of video memory and disk storage. These capabilities are enhanced by specially designed solutions that enable, for example, seamless roaming over large image files (see e.g. the architecture of Intergraph's ImagePipe [Graham et al., 1997]).
A typical DPW is shown in Figure 12.1(a). It is a dual-monitor configuration, designed to dedicate one monitor to stereoscopic viewing. The second monitor is commonly used for the photogrammetric software graphical user interface (GUI), general display purposes, and general computing requirements. Even though single-monitor configurations are also possible when the graphics hardware supports stereo-in-a-window viewing, dual-monitor configurations currently remain the most popular choice. They represent a natural evolution of analytical photogrammetric plotters. Similar to DPWs, analytical stereoplotters made use of a monitor dedicated to GUI and software control. The stereo display monitor of a DPW (and corresponding eyewear) can be considered the counterpart of the complex electro-optical viewing system of an analytical stereoplotter, with its oculars and corresponding prisms. It allows the operator to view stereoscopically a window in the overlapping area of a stereopair. As the images are available in a DPW in softcopy format (as computer files) instead of actual film, the complex electromechanical mechanism used to control movement in an analytical stereoplotter is obsolete for a DPW, replaced by computer operations that generate pixel addresses within a file and access the corresponding information. The 3-dimensional measuring device (a.k.a. turtle) of an analytical stereoplotter is replaced by specialized 3-dimensional mouse designs that facilitate the extraction of x, y, z coordinates. These designs may range from simple mouse-and-trackball configurations to more complex specially designed devices such as Leica Geosystems' TopoMouse, shown in Figure 12.1(b). The TopoMouse has several programmable buttons and a large thumbwheel to adjust the z-coordinate, and complements standard PC input devices to control roaming and measuring in both single-image and stereo mode.

Figure 12.1. (a) A Digital Photogrammetric Workstation (courtesy of Z/I Imaging); (b) 3-dimensional TopoMouse (courtesy of Leica Geosystems).
Beyond the above-mentioned effects that the availability of softcopy imagery has had on the design
of DPW configurations, the most dramatic effect of this transition has been the increased degree of
automation of photogrammetric operations. Whereas automation in analytical stereoplotters was limited
to driving the stereoplotter stage to specific locations, in DPWs automation has affected practically all
parts of the photogrammetric process, from orientation to orthophoto generation. This automation is
addressed in Sections 12.2 - 12.4 of this manual.
Synchronization information is transmitted to the eyewear via an IR signal that originates from an emitter that interfaces with the graphics hardware.
In either case, the stereoscopic display technique used is referred to as time-multiplexing or field-
sequential. A field represents image information allocated in video memory to be drawn to the display
monitor during a single refresh cycle. In such an approach, parallax information is provided to the eye by
rapidly alternating between the left and right images on the monitor. The images must be refreshed at a
sufficient rate (typically 120 fields per second, to achieve 60 fields per second per eye), in order to generate
a flicker-free stereoscopic image. Today, stereo-ready high-end graphics cards and monitors that use at
least a double-buffering technique to provide refresh rates up to 120 fields per second are readily available
for most computing platforms. Such graphics cards are equipped with a connector that provides an output
synchronization signal for electronic shuttering devices such as an IR emitter or monitor panel.
Stereoscopic viewing solutions also exist for graphics hardware that is not stereo-ready. Known as the above-and-below format, the method uses two vertically arranged subfields, one above the other, each squeezed top-to-bottom by a factor of two within a single field. A sync-doubling emitter adds the missing synchronization pulses for graphics hardware running at a nominal rate of 60 fields per second, thus doubling the rate for the desired flicker-free stereo image. As a result of altering the sync rate of the monitor, above-and-below stereoscopic applications must run at full screen. The stereo graphics hardware automatically unsqueezes the image in the vertical so the stereo image has the expected aspect ratio. A trade-off with this approach is that the vertical resolution of the stereo image is effectively reduced by a factor of 2, since the pixels are stretched in the vertical direction. Nonetheless, the above-and-below format provides a workable solution for non-stereo-ready graphics hardware.
When video memory is limited, a relatively low-cost technique for generating a stereo image is to
interlace left and right images on odd and even field lines (e.g., left image drawn on lines 1, 3, 5…etc.,
and right image on lines 2, 4, 6…etc.). Shuttering eyewear is synchronized with the refresh rate of the
odd and even field lines in order to decode a stereo image for the viewer. The drawbacks of interlaced
stereo include a degradation of vertical resolution by a factor of 2, more noticeable flicker, and applications
that are limited to full screen stereo mode.
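The line interleaving itself is easy to express. Below is a minimal sketch in Python with NumPy (an illustrative choice; the text prescribes no implementation) that builds an interlaced frame from a left/right pair. The halved per-eye vertical resolution is visible in that each eye receives only every other line.

```python
import numpy as np

def interlace_stereo(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Interleave a stereo pair line by line: the left image occupies
    lines 1, 3, 5, ... and the right image lines 2, 4, 6, ... (1-based),
    matching the interlaced format described above."""
    assert left.shape == right.shape, "stereo mates must share one shape"
    stereo = np.empty_like(left)
    stereo[0::2] = left[0::2]   # odd display lines (1-based)
    stereo[1::2] = right[1::2]  # even display lines
    return stereo
```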
For state-of-the-art stereo viewing capabilities, high-end 3-dimensional graphics hardware designs
offer what is known as quad-buffered stereo (QBS). Once available only on Unix workstations, QBS has now become commonplace on PCs. QBS can be understood as a simple extension of double-buffered
animation, i.e., during the display of one video image, the image to follow is concurrently drawn to a
memory buffer. The result is a faster, and thus smoother transition between image sequences. Double-
buffering is exploited in stereoscopic display techniques to speed up the transitions between left and
right images. QBS extends this concept even further by dividing the graphics memory into four buffers,
such that one pair of buffers is the display pair, while the other is the rendering pair. The result is vastly
improved stereo viewing quality in terms of smoothness during image roaming, as well as rendering
real-time superimposed vectors that are always current. A significant advantage of QBS is that it allows
for multiple stereo-in-a-window (SIAW) renderings. That is, a user has access to other application windows
while rendering stereo displays in one or more windows on a single monitor.
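The buffer juggling implied by QBS can be sketched against a stereo-capable OpenGL context. The fragment below uses PyOpenGL with GLUT as an illustrative binding (not an API named by the text): GLUT_STEREO requests the quad-buffered visual, each eye is rendered into its own back buffer, and a single swap flips both back buffers to the front pair. It runs only on a stereo-capable card and driver.

```python
# A minimal quad-buffered stereo sketch; draw_scene() is a placeholder.
from OpenGL.GL import (GL_BACK_LEFT, GL_BACK_RIGHT, GL_COLOR_BUFFER_BIT,
                       GL_DEPTH_BUFFER_BIT, glClear, glDrawBuffer)
from OpenGL.GLUT import (GLUT_DEPTH, GLUT_DOUBLE, GLUT_RGB, GLUT_STEREO,
                         glutCreateWindow, glutDisplayFunc, glutInit,
                         glutInitDisplayMode, glutMainLoop, glutSwapBuffers)

def draw_scene(eye):
    pass  # render the scene with the left- or right-eye frustum

def display():
    glDrawBuffer(GL_BACK_LEFT)               # render pair, left buffer
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    draw_scene("left")
    glDrawBuffer(GL_BACK_RIGHT)              # render pair, right buffer
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    draw_scene("right")
    glutSwapBuffers()                        # flip both buffers at once

glutInit()
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH | GLUT_STEREO)
glutCreateWindow(b"stereo-in-a-window")
glutDisplayFunc(display)
glutMainLoop()
```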
In an ideal stereoscopic system, each eye sees only the image intended for it. Electro-stereoscopic viewing techniques are susceptible to a certain amount of crosstalk, i.e., each eye sees, to some extent, the image intended for the other eye. The principal source of crosstalk in electro-stereoscopic display is CRT phosphor afterglow, or ghosting. Following the display of the right image, its afterglow will persist to some extent during the display of the left image, and vice-versa. Ghosting is kept at a tolerable level by using sufficiently low parallax angles. As a general rule of thumb, parallax angles are kept under 1.5° for comfortable stereo viewing with a typical workstation configuration (StereoGraphics, 1997). The parallax angle θ is defined as

\[ \theta = 2 \arctan\left( \frac{P}{2d} \right) \tag{12.1} \]

where P is the distance between right and left-eye projections of a given image point on the screen, and d is the viewing distance between the observer and the display.
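As a quick numerical check of Equation 12.1 (the values below are illustrative, not from the text):

```python
import math

def parallax_angle_deg(P_mm: float, d_mm: float) -> float:
    """Parallax angle of Eq. 12.1: theta = 2 * arctan(P / (2d))."""
    return math.degrees(2.0 * math.atan(P_mm / (2.0 * d_mm)))

# 12 mm of screen parallax viewed from 600 mm gives about 1.15 degrees,
# inside the 1.5-degree comfort rule of thumb quoted above.
theta = parallax_angle_deg(12.0, 600.0)
print(f"{theta:.2f} deg, comfortable: {theta < 1.5}")
```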
While the overall workflow has not changed much, the use of softcopy imagery and DPWs has
revolutionized photogrammetric mapping by introducing automated modules to perform these tasks.
Today, modern DPW production software provides users with end-to-end solutions that are highly
modularized, flexible, and interoperable. Software applications are typically offered as a collection of
self-contained tools that can be run in customizable workflow sequences to generate desired products.
Typical core components offered by most vendors correspond to the fundamental tasks identified in
Figure 12.3, with additional modules dedicated to the management of the produced information. For
instance, in the case of SoftPlotter’s main production toolbar depicted in Figure 12.4, a standard end-to-
end workflow would generally proceed from left to right. However, it is also possible to customize this
process, importing for example external block triangulation results from third party software, so that the
user could proceed immediately to DEM generation. Each flow component usually offers import and
export options to accommodate an extensive variety of standard and customized image and data formats
to support interoperability with software from other vendors. In Sections 12.2-12.4 an overview of the fundamental implementation and capabilities of essential DPW workflow components is presented.

Figure 12.4. Main production toolbar of SoftPlotter (courtesy of Autometric).
12.2.2 Interior Orientation in a DPW
While digital cameras are becoming the standard in close-range applications, aerial mapping missions
commonly make use of analog cameras. The use of advanced high resolution digital sensors in
photogrammetric applications is still at an experimental stage [Fraser et al., 2002]. Aerial photographs
captured in analog format (using film cameras) are subsequently scanned to produce softcopy counterparts
of the original diapositives. These digitized diapositives are the common input to a DPW workflow. In this
set-up, interior orientation comprises two transformations:
a transformation from the digital image pixel coordinate system (rows and columns) to the photo
coordinate system of the analog image, as it is defined by the fiducial marks, and
the definition of the camera model by selecting the corresponding camera calibration parameters,
to support the eventual transformation from photo coordinates to the object space.
The second task is addressed during project preparation by selecting the appropriate camera calibration file, similar to the way this information is selected in an analytical stereoplotter. This file includes the geometric parameters that define the specific image formation process, e.g. camera constant, fiducial mark coordinates, distortion parameters. This information is used later during aerotriangulation to reconstruct the image formation geometry using the collinearity equations (Chapters 3 and 11).

The novelty introduced by the use of DPWs in recovering interior orientation is related to the first transformation, namely from pixel (r,c) to the photo (xp,yp) coordinate system, which requires the identification and precise measurement of fiducial marks in the digital image. Since fiducial marks are well-defined targets, this process is highly suitable for full automation. The typical workflow of an automated interior orientation module is shown in Figure 12.5. Input data include the image file, corresponding fiducial mark photo coordinates, and information on scanning pixel size and scanning calibration. This information is used to derive the approximate locations of fiducial marks and to extract image patches that contain them. This can be accomplished either manually, with the human operator pointing the cursor to their vicinity, or automatically, using hierarchical approaches [Schickler and Poth, 1996]. In either case, the selected image patches are large enough to ensure that they fully contain the fiducial marks, corresponding for example to a diapositive window as large as 15 by 15 mm in the approach of [Kersten and Haering, 1997].

Figure 12.5. Typical workflow of an automated interior orientation module.
Figure 12.6. Examples of fiducial marks supported in automated interior orientation schemes: Jena LMK 2000, Wild RC20, Wild RC30, Zeiss RMK TOP15 (courtesy of Autometric, 2002).
The relationship between pixel (r,c) and photo (x_p, y_p) coordinate systems is described by a six-parameter affine transformation:

\[ x_p = a_0 + a_1 c + a_2 r, \qquad y_p = b_0 + b_1 c + b_2 r \tag{12.2} \]
The six parameters express the combined effects of translation, rotation, scale, and skew between the two coordinate systems. The measurement of each fiducial mark introduces two such observation equations. Therefore, as soon as three fiducial marks have been measured, an initial estimate of the parameters can be made, with the solution re-adjusted every time an additional mark is measured. When using calibrated scanners, the results of scanner calibration should be used to correct pixel coordinates before using them in the interior orientation process.
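The parameter estimation is an ordinary linear least squares problem: each measured fiducial contributes one row to the design matrix of each equation in (12.2). A minimal sketch in Python/NumPy, with hypothetical fiducial values, follows.

```python
import numpy as np

def fit_interior_orientation(pixel_rc: np.ndarray, photo_xy: np.ndarray):
    """Least squares estimate of the six affine parameters of Eq. 12.2
    from n >= 3 measured fiducial marks.

    pixel_rc : (n, 2) measured (row, col) fiducial centers
    photo_xy : (n, 2) calibrated (x_p, y_p) photo coordinates
    Returns (a, b) with x_p = a0 + a1*c + a2*r, y_p = b0 + b1*c + b2*r.
    """
    r, c = pixel_rc[:, 0], pixel_rc[:, 1]
    A = np.column_stack([np.ones_like(r), c, r])        # design matrix
    a, *_ = np.linalg.lstsq(A, photo_xy[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(A, photo_xy[:, 1], rcond=None)
    return a, b

# Four synthetic corner fiducials (illustrative numbers only).
pixel = np.array([[120.4, 130.2], [118.9, 9030.7],
                  [9020.3, 9032.1], [9021.8, 128.6]])
photo = np.array([[-106.0, 106.0], [106.0, 106.0],
                  [106.0, -106.0], [-106.0, -106.0]])
a, b = fit_interior_orientation(pixel, photo)
```

With more than three marks the system is over-determined, which is what allows the solution to be re-adjusted as each additional fiducial is measured.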
Reported accuracies in the automated measurement of fiducial marks are in the range of 0.1-0.3 pixel
using template matching techniques [Kersten and Haering, 1997; Schickler and Poth, 1996]. With a scanning
resolution of 12.5 to 25 µm, this corresponds to fiducial measurements with accuracy better than 4 to 8 µm.
These results are comparable to those achieved by a human operator in an analytical plotter. Figure 12.8
depicts an example graphical user interface of an automated interior orientation module.
12.2.3 Relative Orientation

The objective of relative orientation is to orient two overlapping images with respect to each other, allowing conjugate rays to intersect in space and form a stereomodel of the depicted area. This requires the measurement of an adequate number of conjugate points in the overlapping area of the stereopair. As discussed in Chapter 11, in order for a relative orientation solution to be statistically robust these points must be well dispersed, covering the six von Gruber locations (Figure 12.9). Benefiting from the development of robust matching techniques and work on the automation of relative orientation, the measurement of conjugate points in a stereopair has become an automated process in digital photogrammetric applications.

Figure 12.8. Example GUI of an automated interior orientation module (courtesy of Autometric, 2002).

The typical workflow of automated conjugate point measurement in a stereopair is shown in Figure 12.10. Using approximate information on image overlap, windows in the vicinity of the von Gruber locations are selected in each stereomate. The challenge is then to identify within these windows distinct primitives (e.g. points, line segments) that are suitable for subsequent matching, and to select an appropriate matching technique to establish correspondences among them. A popular choice among existing software modules is to select interest points in each image separately, and to match them subsequently using correlation techniques [Schenk et al., 1991]. Interest points are points of high gray value variance, e.g. bright spots or sharp corners, and can be detected using operators like Moravec or Förstner (Chapter 6). By definition, interest points are distinct from their background, and are therefore highly suitable for localization and matching. Applying such an operator in each von Gruber window of each stereomate produces two pools of interest points per window, one for each image. These points are matching candidates, and become input to a matching scheme that aims to identify pairs of conjugate points. Matching can be performed using an area-based approach, whereby windows centered on each interest point are compared to identify conjugate pairs as those that display the highest correlation values. Various conditions can be introduced to minimize gross errors and improve the overall performance of the technique (e.g. setting a minimum threshold on acceptable correlation limits, imposing constraints on acceptable parallax ranges).

Figure 12.9. Von Gruber locations in the overlapping area of a stereopair.
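As an illustration of the interest point selection step, the following is a minimal Moravec operator in Python/NumPy (window size and threshold are illustrative; production modules typically use more elaborate operators such as Förstner):

```python
import numpy as np

def moravec(img: np.ndarray, window: int = 3, threshold: float = 100.0):
    """Plain Moravec interest operator: the interest value of a pixel is
    the minimum, over four shift directions, of the sum of squared
    differences between a window and its shifted copy. Returns (row,
    col) candidates whose interest value exceeds the threshold."""
    img = img.astype(np.float64)
    h = window // 2
    rows, cols = img.shape
    shifts = [(1, 0), (0, 1), (1, 1), (1, -1)]
    points = []
    for r in range(h + 1, rows - h - 1):
        for c in range(h + 1, cols - h - 1):
            w0 = img[r - h:r + h + 1, c - h:c + h + 1]
            v = min(np.sum((img[r + dr - h:r + dr + h + 1,
                                c + dc - h:c + dc + h + 1] - w0) ** 2)
                    for dr, dc in shifts)
            if v > threshold:
                points.append((r, c))
    return points
```

Run separately on the corresponding von Gruber window of each stereomate, this yields the two pools of matching candidates described above.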
Using the results of relative orientation, it is possible to rotate one or both images such that horizontal lines of imagery displayed on the screen are epipolar lines, as illustrated in Figure 12.11. A detailed mathematical derivation of the procedure can be found in Section 3.2.2.7, while resampling is addressed in more detail in Section 12.3.3. With digital images, rotations are accomplished via a resampling process that must guarantee the same ground sample distance per pixel for both images. This technique, referred to as epipolar resampling, has become an essential part of the DPW workflow because it supports subsequent processes like DEM generation and orthophoto production. As image resampling involves interpolation to derive each new pixel, it is a CPU-intensive operation. To alleviate this problem, techniques such as on-the-fly epipolar resampling have been developed (used e.g. in ImageStation) to dynamically localize the resampling process to regions of interest. This eliminates the need to generate entire epipolar resampled images prior to display and reduces disk storage overhead.
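The rotation-plus-interpolation at the core of the resampling can be illustrated with a toy in-plane rotation (Python/SciPy). This is a deliberate simplification: the full epipolar transformation of Section 3.2.2.7 is more general, and the angle here is assumed to have been derived from the relative orientation.

```python
import numpy as np
from scipy import ndimage

def rotate_to_epipolar(img: np.ndarray, kappa_deg: float) -> np.ndarray:
    """Resample an image by an in-plane rotation so that image rows
    become parallel to an assumed epipolar direction. Bilinear
    interpolation (order=1) computes each new pixel, which is what
    makes the operation CPU intensive for full-size frames."""
    return ndimage.rotate(img.astype(np.float32), kappa_deg,
                          reshape=True, order=1)
```

On-the-fly variants apply the same interpolation only to the region currently roamed, rather than to the whole frame.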
12.2.4 Aerotriangulation
In modern DPWs the relative orientation workflow presented in the previous section is not implemented
as a separate stand-alone module, but rather as part of a broader point measurement and triangulation
module. However, it provides the theoretical and practical basis of automated point measurement during
aerotriangulation in DPWs. Aerotriangulation is often characterized as one of the more complex procedures in terms of the user knowledge it requires of the underlying principles of a photogrammetric block adjustment. Its
objective is to relate multiple images to each other in order to:
- recover the complete orientation parameters of each image in the block, namely the (X0, Y0, Z0) coordinates of the exposure station and the ω, φ, κ rotations, and to
- determine the ground coordinates (X, Y, Z) of points observed in them.
This requires the measurement of conjugate points in the overlapping areas of the block imagery (tie
and pass points), and the measurement of the photo coordinates of depicted control points.
Virtually all vendors provide triangulation algorithms that are based on rigorous physical sensor
models and the well-established principles of least squares bundle adjustment in which all parameters
are fully weight-constrained. These modules typically support two types of measurements:
- Automatic Point Measurements: proceed according to the workflow described in Section 12.2.3
to automatically produce large amounts of conjugate points. To accommodate the needs of
aerotriangulation, matching tools have been extended from stereo to multi-stage application.
The proposed techniques include the extension of least squares matching to multi-image
application and their integration with block adjustment [Agouris, 1993; Agouris and Schenk,
1996], and the introduction of graph-based techniques to combine sequences of stereo matching
results in a block adjustment [Tsingas, 1994].
- Interactive Point Measurements: support user-controlled, semi-automatic identification and measurement of specific points, especially ground control points.
Additionally, modern DPWs support blunder detection and the remeasurement of erroneous points,
to improve the overall quality and performance of softcopy aerotriangulation. Figure 12.12 illustrates
how point measurement can be performed on multiple overlapping images that include reference views.
Experiments with DPW aerotriangulation indicate the high accuracy potential of the automated approach. More specifically, results from the recent OEEPE aerotriangulation test using imagery scanned at resolutions of 20-30 µm indicate tie point measurements with accuracies ranging from 0.11-0.5 pixels (corresponding to 2.2-11 µm) [Heipke, 1999]. The optimal results (0.11-0.2 pixels) were achieved processing imagery of open and flat terrain with good texture. In more adverse conditions, such as in blocks of alpine regions at scales ranging from 1:22,000 to 1:54,000 and a scanning resolution of 25 µm, point measurement accuracies ranging from 0.25-0.5 pixels were achieved [Kersten, 1999]. In the same set-up, the exposure station coordinates were estimated as accurately as 0.6 m in X and Y, and 0.4 m in Z. These results indicate that under favorable conditions (open and flat terrain, good texture, high scanning resolution), a DPW can achieve natural point measurement accuracies comparable to the accuracies measured at signalized points in analytical plotters. The single disadvantage today is the rather large number of blunder matches that can be introduced as a result of full automation. Editing and removal (or remeasurement) of blunders is a time-consuming process.
The performance of DPWs becomes even more impressive when considering the favorable effects on
production due to the high degree of automation. The time requirements to process a block are reported
to range from 10-20 minutes per image considering only operator-assisted processes and excluding
batch processes, scanning, and control preparation [Kersten, 1999]. This represents a significant
improvement compared to analytical processes.
Figure 12.13. 3-dimensional visualization of DPW-derived products. (a) Orthoimage with 1 m GSD draped over a DEM with 30 m post spacing (Mt. Katahdin, Maine); (b) Orthoimage with 0.5 m GSD draped over a DEM with 1 m post spacing (Seattle, Washington) (note: building sides are artificially rendered).
Table 12.2. Example parameters defined in a correlation strategy.
A template window centered on the point of interest in the reference image identifies the reference pattern to be matched in the target image. In general, larger template sizes produce better results due to higher signal-to-noise ratios, but at the expense of higher computational demands. In theory, such a search could scan through every pixel in the target image to determine the best matching patch. By contrast, the epipolar searching strategy (Figure 12.14b) is considerably more efficient because the search is constrained to corresponding epipolar lines. By its nature, epipolar searching depends to a certain extent on the quality of the relative orientation of the stereopair.
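A minimal sketch of epipolar-constrained template matching with normalized cross-correlation follows (Python/NumPy; patch size and acceptance threshold are illustrative, and the target image is assumed to be epipolar-resampled so the search runs along a single row):

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_on_epipolar_row(template: np.ndarray, target: np.ndarray,
                          row: int, min_corr: float = 0.7):
    """Slide the template along one target row and return (column,
    correlation) of the best match, or None below min_corr."""
    th, tw = template.shape
    top = row - th // 2
    best_col, best_c = None, -1.0
    for left in range(target.shape[1] - tw + 1):
        c = ncc(template, target[top:top + th, left:left + tw])
        if c > best_c:
            best_col, best_c = left + tw // 2, c
    return (best_col, best_c) if best_c >= min_corr else None
```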
Upon completion of automated DEM generation, the results have to be reviewed and edited to
remove blunders. The quality metrics presented in Section 12.2.4 are representative of the accuracy
potential of automated matching methods in a DPW. Accordingly it is expected that correctly matched
points from DEM generation are as accurate as 0.5 pixel or even better. However, the main difference between automated point measurement during aerotriangulation and during DEM generation is the massive amount of points collected in the latter process. This increases the potential for blunders, as
attempts are made to match points in ground areas that may not be suitable for this task (e.g. having low
radiometric variation). Even though automated modules are equipped with tools to identify and remove
poor matching candidates, it is still estimated that anywhere from 5% up to 30% of the points automatically
generated require post-editing [LH Systems, 2001]. Modern high-end DPW platforms generally provide
a comprehensive set of post-processing tools for DTM accuracy assessment, breakline analysis, and
interactive post editing. The autocorrelation process generates a correlation coefficient that indicates
the relative accuracy of a match between a point on the source image and the corresponding point on the
target image. The correlation coefficient takes on a value from 0 to 1, where 1 represents perfect
correlation. Figure 12.15 illustrates one way to review ATE (Automatic Terrain Extraction) results, i.e., by superimposing the post grid over the stereopair. A color-coded classification scheme as in Table 12.3 is used to indicate the relative accuracy of each post, based on the correlation coefficient. Points with low correlation coefficient values are prime candidates for post-processing.
Figure 12.15. ATE review by superimposing crosses to show post accuracies. These crosses may be color coded according to the classification scheme of Table 12.3 (courtesy of Autometric, 2002). Please refer to the color appendix for the color version of this image.
Table 12.3. Sample classification scheme for posts from ATE [Autometric, 2000].
Figure 12.16. Approaches to removing the effects of building lean from an orthoimage. (a) Orthoimage generation geometry; (b) raw image; (c) orthorectification from triangulation, but without a DEM; (d) orthorectification from triangulation and a DEM; (e) orthorectification from a DEM and feature information, but no supplemental imagery; (f) same inputs as previous, but with supplemental imagery to fill in shadows (courtesy of BAE Systems, 2001).
Joining two or more contiguous orthoimages to create large-coverage image maps is accomplished through image mosaicking (Figure 12.17). The general requirement for producing a mosaic is contiguous orthorectified images (although it is possible to create a mosaic from raw imagery). The process involves resampling all input images to a common spatial resolution. The user typically has complete control over the positioning of seam lines. Automatic (or manual) histogram matching techniques are employed to smooth out radiometric differences among the input images and to optimize the dynamic range of the mosaic.

Figure 12.17. Image mosaic geometry (courtesy of BAE Systems, 2001).

Histogram matching techniques, e.g., image dodging, are used to smooth radiometric unevenness among the different input images that compose a mosaic. In histogram matching, a lookup table is generated to convert the histogram of one image to resemble or match the histogram of another. The matching process is based on the assumption that differences in global scene brightness are due to external factors such as atmospheric conditions and sun illumination. Therefore, all pixels in a particular match are radiometrically adjusted in a similar manner. Figure 12.18 demonstrates histogram matching applied to a mosaic created from four orthoimages. Illumination differences are evident between image sequences 1-2 and 3-4, which were photographed approximately two years apart.

Figure 12.18. Histogram matching. (a) Input orthoimages: (1-2) photographed July 1994, and (3-4) photographed May 1996; (b) Mosaic performed with histogram matching.
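The lookup table at the heart of histogram matching can be sketched as follows (Python/NumPy, assuming 8-bit imagery; this illustrates the general idea rather than any vendor's dodging implementation):

```python
import numpy as np

def histogram_match_lut(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Build a 256-entry lookup table mapping the histogram of an 8-bit
    source image to resemble that of a reference image, by pairing gray
    values of equal cumulative frequency."""
    src_cdf = np.cumsum(np.bincount(source.ravel(), minlength=256)) / source.size
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=256)) / reference.size
    # For each source gray value, pick the reference gray value whose
    # cumulative frequency is closest from above.
    return np.searchsorted(ref_cdf, src_cdf).clip(0, 255).astype(np.uint8)

# Usage: every pixel of the source orthoimage passes through the same
# table, consistent with the global-brightness assumption stated above.
# matched = histogram_match_lut(source, reference)[source]
```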
12.4.1 Introduction

Feature extraction represents one of the most complicated photogrammetric workflow components from both design and user perspectives. All of the high-end systems provide for the creation of three-dimensional feature topology composed of standard vector primitives, i.e., points, lines, and polygons. Sophisticated relational databases for feature geometry and attributes are also provided, with import and export options to several commercial formats. A common practice among DPW vendors is to provide a seamless interface to a third-party software solution for feature extraction, in addition to, or in lieu of, a native solution. A popular environment is the Computer Aided Design (CAD) software package Microstation®, by Bentley Systems.

Figure 12.19. Feature extraction in a DPW (courtesy of Autometric, 2002). Please refer to the color appendix for the color version of this image.
Features can be delineated and edited in monoscopic mode (2-dimensional), or stereoscopic mode (3-
dimensional) using a 3-dimensional mouse configuration. In either mode, feature vectors are superimposed
on the imagery as shown in Figure 12.19. Feature extraction requires triangulated imagery and, although
not required, a DEM is usually generated first in order to facilitate the feature extraction process. For
example, a DEM can be used to automatically determine the bottom of a building from a delineated
rooftop, or to provide continuous surface tracking of geomorphic features (e.g., drainage) by constraining
the cursor to the terrain surface.
The process of feature attribution, i.e., assigning numerical or textual characteristics to a feature such as composition, size, purpose, and usage, is usually driven by a user-definable set of rules referred to as the extraction specification. In a typical feature attribution configuration, the user populates a list of pre-defined attribute names for a given feature type. To provide some level of automation, a set of reserved attribute names can be automatically calculated from the feature geometry, such as area, length, width, height, and angle of orientation.
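Computing such reserved attributes from the feature geometry is straightforward. A sketch for 2-dimensional polygon features follows (the attribute names are illustrative, not an actual extraction specification):

```python
import numpy as np

def polygon_attributes(xy) -> dict:
    """Derive reserved attributes from polygon vertices: area by the
    shoelace formula, perimeter length, and the orientation angle of
    the longest edge."""
    xy = np.asarray(xy, dtype=float)
    closed = np.vstack([xy, xy[:1]])               # close the ring
    x, y = closed[:, 0], closed[:, 1]
    area = 0.5 * abs(np.sum(x[:-1] * y[1:] - x[1:] * y[:-1]))
    edges = np.diff(closed, axis=0)
    lengths = np.hypot(edges[:, 0], edges[:, 1])
    k = int(np.argmax(lengths))
    angle = float(np.degrees(np.arctan2(edges[k, 1], edges[k, 0])))
    return {"area": area, "length": float(lengths.sum()),
            "angle_of_orientation": angle}

attrs = polygon_attributes([(0, 0), (20, 0), (20, 10), (0, 10)])
# {'area': 200.0, 'length': 60.0, 'angle_of_orientation': 0.0}
```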
Unlike DEM generation and orthophoto production, the complexity of feature extraction renders it a
largely manual process. However, most platforms provide semi-automated feature extraction tools to
assist the user by completing a feature once adequate information has been collected (e.g. automatically
drawing the sides of a building based on a user-delineated roof). Furthermore, users typically have the
opportunity to import dedicated software solutions to automate feature extraction.
Efforts to automate the extraction of cartographic vector features from digital images form a major
research direction in the photogrammetric and computer vision communities. The recent proliferation of
high-resolution remotely sensed imagery is further intensifying the need for robust automated feature
extraction (AFE) solutions. However, AFE has proven to be a challenging task, as it involves the
identification, delineation, and attribution of specific features (e.g., buildings, roads, and rivers).
Accordingly, the solution of this complex problem lies well beyond matching arbitrary image patches as
performed in automated DEM generation. To date, feature extraction remains largely a manual task in
typical production settings. However, as a result of on-going efforts, many AFE algorithms are approaching
the robustness levels required by production environments.
Table 12.4. Candidate geospatial features for automated extraction.
AFE research is focused on features that are the most useful in GIS applications and those that are the
most time-consuming for manual extraction. For instance, Table 12.4 lists geospatial features that have
been identified by the National Geospatial-Intelligence Agency (NGA) as contributing significantly to extraction
labor in the creation of typical feature data sets. An estimation of the level of research effort being given
to each feature is also provided. AFE research, to date, has focused heavily on man-made features and
targets, with emphasis given to road networks and buildings. Beyond the obvious importance of these
features for geospatial applications, the motivation for this is the fact that roads and buildings are among
the most easily recognizable features over a wide range of image scales for human vision. Although road
and building extraction are relatively trivial tasks for a human extractor, most automated methods are not
yet able to achieve comparable reliability. In the remainder of Section 12.4.1 general design issues
behind common AFE strategies are presented. Some representative automated approaches for road and
building extraction are shown in Sections 12.4.2 and 12.4.3.
Scale space. Image features manifest differently at different scales. For example, the use of image scale space can optimize the extraction of salient features by first isolating them at lower resolutions, and then homing in on their precise locations at higher resolutions. In this way, extraction at lower resolutions serves to initialize subsequent extractions at higher resolutions. There is ample evidence to suggest that scale space processing is inherent in human vision.
Scene context. Modeling scene context is motivated by perceptual organization in human vision. The
premise is to interpret and exploit background scene components more completely to provide a contextual
reference that enhances the extraction of target features, as opposed to a more constrained approach that
only distinguishes targets from non-targets. For example, road markings and vehicular traffic can provide
valuable cues for the existence of roads. Modeling scene context generally increases the complexity of
the algorithm.
Data fusion. The goal of data fusion is to merge different types of information to enhance the
recognition of features. High-resolution multispectral and hyperspectral imagery and high-resolution
DEMs have become very useful information sources for automated extraction algorithms. The premise is
that solution robustness can be increased when several different input sources of information are
analyzed. However, the increase in computational complexity sets an upper limit on the effectiveness of
data fusion.
Road extraction results are commonly evaluated with the metrics of Wiedemann et al. [1998], comparing extracted road pixels against ground truth in terms of true positives (TP), false positives (FP), and false negatives (FN):

\[ \text{correctness} = \frac{TP}{TP + FP} \tag{12.5} \]
\[ \text{completeness} = \frac{TP}{TP + FN} \tag{12.6} \]
\[ \text{quality} = \frac{TP}{TP + FP + FN} \tag{12.7} \]
A widely studied approach to road detection is anti-parallel (apar) edge detection [Nevatia and Ramesh, 1980; Zlotnick and Carnine, 1993]. Given its prominence in the literature, the apar method is presented in some detail to provide an indication of the practical utility and limitations of automated road detection.
Figure 12.20, adapted from Gonzalez and Woods, 1992, demonstrates anti-parallel edges with a
simulated road feature that is 3 pixels wide. Any two edge pixels p and q are considered anti-parallel if
the distance between them is within a predefined width range, and the relative difference in their
gradient orientations is less than a predefined angle. In addition, the gradient directions must oppose one
another (hence the prefix anti). Anti-parallel gradient orientations either attract or repel one another
relative to the road/background relationship.
Figure 12.20. Anti-parallel gradients. (a) Attracting gradients; (b) repelling gradients (adapted from Gonzalez and Woods, 1992).
The implementation of apar detection begins with an edge detection technique that provides gradient magnitude and orientation, such as the 3x3 Sobel operators (Figure 12.21). As an example, with pixels z_1 through z_9 labeling a 3x3 neighborhood row by row, the horizontal and vertical gradients at the center pixel z_5 in Figure 12.21 are calculated respectively as

\[ G_x = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \tag{12.10} \]
\[ G_y = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \tag{12.11} \]
scanning, and results are merged. The perpendicular width estimate of the road for two edge pixels p and
q on a scan line is determined as,
(12.14)
falls below a specified threshold. There are many variations of this implementation in the literature.
Once anti-parallel edge pixels are detected, corresponding centerline pixels are derived in a
straightforward manner by determining the midpoint between anti-parallel pixels. Road network topology
can then be constructed from the centerline pixels by using an appropriate linking or tracking strategy.
However, apar detection errors can confound tracking algorithms. Figure 12.23 shows the results of an
apar centerline detection algorithm using a width range of 5 to 15 pixels, and a gradient orientation
deflection angle threshold of 50 degrees.
Finding roads by anti-parallel edge detection is effective to the extent that 1) anti-parallel edges are
exclusive to roads and 2) all roads have anti-parallel edges. Buildings, road markings, sidewalks, shoulders,
vehicular traffic, medians, intersections, and random clutter can cause erroneous detection. There are a
variety of input parameters that can be adjusted heuristically to provide an acceptable compromise between
the rates of false positive versus false negative detection. Anti-parallel edge detection is an effective and
standard segmentation technique as a first step towards automated road extraction in high-resolution images.
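A compact sketch of row-wise apar centerline detection follows (Python/SciPy). It scans horizontal lines only, whereas a full implementation also scans vertically and merges the results; the gradient magnitude threshold is illustrative.

```python
import numpy as np
from scipy import ndimage

def apar_centerline(img, w_min=5, w_max=15, max_deflection_deg=50.0):
    """Return centerline pixels (r, c) as midpoints between edge pairs
    on the same row whose separation lies within [w_min, w_max] and
    whose gradient orientations oppose one another (differ by roughly
    180 degrees, within the deflection tolerance)."""
    g = img.astype(float)
    gx = ndimage.sobel(g, axis=1)            # horizontal gradient
    gy = ndimage.sobel(g, axis=0)            # vertical gradient
    mag = np.hypot(gx, gy)
    strong = mag > 0.25 * mag.max()          # illustrative threshold
    centers = []
    for r in range(img.shape[0]):
        cols = np.flatnonzero(strong[r])
        for i, p in enumerate(cols):
            for q in cols[i + 1:]:
                if not (w_min <= q - p <= w_max):
                    continue
                d = np.degrees(np.arctan2(gy[r, p], gx[r, p])
                               - np.arctan2(gy[r, q], gx[r, q]))
                d = abs((d + 180.0) % 360.0 - 180.0)   # wrap to [0, 180]
                if abs(d - 180.0) <= max_deflection_deg:
                    centers.append((r, (p + q) // 2))
    return centers
```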
A further class of road extraction methods is based on deformable contour models, or snakes, which delineate a road by minimizing an energy function of the form

\[ E_{snake} = \sum_{i=1}^{n} \big[ \alpha\, E_{cont}(v_i) + \beta\, E_{curv}(v_i) + \gamma\, E_{edge}(v_i) \big] \tag{12.15} \]

where
E_cont, E_curv are expressions of the first- and second-order continuity constraints (internal forces),
E_edge is an expression of the edge strength (external force), and
α, β, and γ are relative weights describing the importance of each energy term.

A brief description of these energy functions follows [Agouris et al., 2001].

Continuity term: If v_i = (x_i, y_i) is a point on the contour, the first energy term in (12.15) is defined as

\[ E_{cont}(v_i) = \big| \bar{d} - \| v_i - v_{i-1} \| \big| \tag{12.16} \]

where \bar{d} is the average distance between successive contour points,

\[ \bar{d} = \frac{1}{n} \sum_{i=1}^{n} \| v_i - v_{i-1} \| \tag{12.17} \]
The continuity component forces snake nodes to be evenly spaced, avoiding grouping at certain areas,
while at the same time minimizing the distance between them.
Curvature term: This term expresses the curvature of the snake contour, and allows for the manipulation of its flexibility and appearance:

\[ E_{curv}(v_i) = \| v_{i-1} - 2 v_i + v_{i+1} \|^{2} \tag{12.18} \]
Edge term: Continuity and curvature describe the geometry of the contour and are referred to as
internal forces of the snake. The third term describes the relation of the contour to the radiometric
content of the image, and is referred to as external force. In general, it forces points to move towards
image edges. An expression of such a force may be defined as

\[ E_{edge}(v_i) = -\| \nabla I(x_i, y_i) \| \tag{12.19} \]
The above model attracts the snake to image points with high gradient values. Since the gradient is a metric for the edges of an image, the snake is attracted to strong edge points. The gradient of the image is normalized within the neighborhood of each point, so that even small local differences in gradient values influence the solution.
The coefficients α, β, and γ in (12.15) are weights describing the relative importance of each energy
term in the solution. Increasing the relative values of α and β will result in putting more emphasis on the
geometric smoothness of the extracted line. This might be suitable for very noisy images, but might be
unsuitable when dealing with sharp angles in the object space. Increasing the relative value of γ places
more emphasis on the radiometric content of the image, regardless of the physical appearance of the
extracted outline. As is commonly the case in snake solutions, the selection of these parameters is
performed empirically.
Together, the three energy functions describe an ideal model of a road segment, namely a smooth
curve coinciding with a strong edge in the image. The objective of traditional snake-based road extraction
is to identify in an image a sequence of points describing a contour that approximates this ideal model.
Selecting seed points within the vicinity of an image feature initializes the snake contour. Using an
iterative procedure, nodes are repositioned to produce a new snake with a lower total energy than its
prior state. For road extraction in high resolution images, the single line snake model is easily extendable
to a dual line, or ribbon snake by including a width component in the model, as shown in Figure 12.24.
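A greedy implementation of the single-line snake, based on the discrete energies (12.15)-(12.19), can be sketched as follows (Python/SciPy; the weights, 3x3 search neighborhood, and stopping rule are illustrative, and a closed contour is assumed for simplicity):

```python
import numpy as np
from scipy import ndimage

def greedy_snake(img, nodes, alpha=1.0, beta=1.0, gamma=1.2, iters=100):
    """Each node in turn moves to the 3x3 neighborhood position with
    the lowest local energy; iteration stops when no node moves."""
    g = img.astype(float)
    grad = np.hypot(ndimage.sobel(g, axis=1), ndimage.sobel(g, axis=0))
    grad_max = grad.max() + 1e-9
    nodes = np.asarray(nodes, dtype=float)
    n = len(nodes)
    for _ in range(iters):
        d_bar = np.mean(np.linalg.norm(np.diff(nodes, axis=0), axis=1))
        moved = False
        for i in range(n):
            prev, nxt = nodes[i - 1], nodes[(i + 1) % n]
            best, best_e = nodes[i].copy(), np.inf
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = nodes[i] + (dr, dc)
                    r, c = int(v[0]), int(v[1])
                    if not (0 <= r < g.shape[0] and 0 <= c < g.shape[1]):
                        continue
                    e_cont = abs(d_bar - np.linalg.norm(v - prev))    # (12.16)
                    e_curv = np.linalg.norm(prev - 2 * v + nxt) ** 2  # (12.18)
                    e_edge = -grad[r, c] / grad_max                   # (12.19)
                    e = alpha * e_cont + beta * e_curv + gamma * e_edge
                    if e < best_e:
                        best, best_e = v, e
            if not np.array_equal(best, nodes[i]):
                nodes[i], moved = best, True
        if not moved:
            break
    return nodes
```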
Figure 12.25. Examples of fully automated road extraction strategies (reference scene on left, ground truth in middle,
and extraction results on right). (a) Strategy 1 (correctness = 91.2%, completeness = 83.2%, quality not provided); (b)
Strategy 2 (correctness = 94.0%, completeness = 91.9%, quality = 86.8%); (c) Strategy 3 (correctness = 27.5%,
completeness = 56.3%, quality = 25.0%). Images are provided courtesy of the respective publications for each
strategy. Please refer to the color appendix for the color version of this image.
12.4.3 Building Extraction
While roads generally lie on the terrain surface, buildings extend from the terrain as self-contained 3-
dimensional objects. Approaches to automated building extraction therefore typically combine
photogrammetric principles with CAD-based modeling techniques, commonly referred to as site modeling.
Extraction methods exploit the strong geometric regularities of collinearity, coplanarity, and parallel lines and planes that are inherent to buildings. They proceed by extracting from the image primitives that describe the building structure; these primitives may range from points and lines to planar elements (Figure 12.26). Extraction models also exploit the orientation of shadows cast by buildings, with approaches available for both known and unknown camera and sun orientations.
Figure 12.26. Building representations. (a) Points; (b) wire frame; (c) surface; (d) volumetric (courtesy of Rottensteiner, 2001).
The roof structure of a building is a fundamental model consideration in building extraction algorithms.
Roof structures can be separated into three broad categories: flat, peaked, and gable, as illustrated in
Figure 12.27. More complex roof structures are usually modeled by considering individual planar roof
components. Semi-automated extraction techniques can exploit geometric regularity combined with a DEM. The Auto Create tool used in SOCET SET is designed to complete a building structure based on a particular digitizing sequence. The circled numbers in the top views in Figure 12.27 demonstrate the digitization sequence for each roof type. For example, in Figure 12.27(a), the user digitizes the three points as shown, and the algorithm derives the z-value from the DEM (or from manual user input) to complete a flat-roof building. The peaked and gabled roofs require four and six digitized points, respectively.
Figure 12.27. Digitizing sequences for rapid extraction of buildings with specific roof structures. (a) flat; (b) peaked;
(c) gabled.
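The flat-roof case of Figure 12.27(a) reduces to simple vector arithmetic. The sketch below is one interpretation of that digitizing sequence, not the actual Auto Create algorithm: the fourth roof corner closes the parallelogram, and the walls drop vertically to a base elevation taken from the DEM (or entered manually).

```python
import numpy as np

def complete_flat_roof(p1, p2, p3, ground_z: float):
    """Complete a flat-roof building from three digitized roof corners
    (X, Y, Z): derive the fourth corner and the four base vertices.
    Returns the eight wireframe vertices, roof first."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    p4 = p1 + (p3 - p2)                       # close the parallelogram
    roof = [p1, p2, p3, p4]
    base = [np.array([p[0], p[1], ground_z]) for p in roof]
    return roof + base

# Illustrative call: three roof corners and a DEM-derived base elevation.
verts = complete_flat_roof((0, 0, 12.0), (20, 0, 12.0), (20, 10, 12.0), 2.5)
```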
Figure 12.28. Automatic building detection. (a) Orthoimage; (b) Canny edge detection of orthoimage; (c) DEM image
with 1m post spacing; (d) Canny edge detection of DEM image; (e) perspective view of DEM.
The measured shadow length could be used to derive an image space estimate of building height in conjunction with the solar elevation angle.
BUILD+SHAVE simply runs BUILD to produce 2-dimensional boxes, and then runs SHAVE on those
boxes to obtain shadow lengths for each box. The ground sample distance is computed at the center of
the box and this is multiplied by the length of the shadow in image space to obtain the length of the
shadow in object space, which can then be used with the solar elevation angle to derive an object space
height estimate for the building. Photogrammetric routines are then used to generate a 3-dimensional
object space wireframe model from the 2-dimensional box and height estimate. BUILD+SHAVE extraction
results from a panchromatic image of the Ft. Hood site are shown in Figure 12.29(c).
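The height computation described above amounts to a one-line trigonometric relation (the numbers in the example are illustrative, not taken from the Ft. Hood results):

```python
import math

def building_height_from_shadow(shadow_len_px: float, gsd_m: float,
                                solar_elev_deg: float) -> float:
    """Object space building height following the BUILD+SHAVE logic:
    shadow length in pixels times the ground sample distance gives the
    object space shadow length; the solar elevation angle converts it
    to a height."""
    return shadow_len_px * gsd_m * math.tan(math.radians(solar_elev_deg))

# A 40-pixel shadow at 0.5 m GSD under a 35-degree sun elevation
# implies a building roughly 14 m tall.
h = building_height_from_shadow(40, 0.5, 35.0)
```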
Novak, K., 1992. Rectification of Digital Imagery. Photogrammetric Engineering and Remote Sensing,
58(3): 339-344.
Price, K., 2000. Urban Street Grid Description and Verification, IEEE Workshop on Applications of
Computer Vision (WACV), Palm Springs, pp. 148-154.
Rottensteiner, F., 2001. Semi-Automatic Extraction of Buildings Based on Hybrid Adjustment Using 3D
Surface Models and Management of Building Data in a TIS, Ph.D. Dissertation, Vienna University of
Technology.
Schickler, W. and Z. Poth, 1996. Automatic Interior Orientation and Its Daily Use, International
Archives of Photogrammetry and Remote Sensing, XXXI(B3), pp. 746-751.
Schenk, T., J.C. Li, and C. Toth, 1991. Towards an Autonomous System for Orienting Digital
Stereopairs, Photogrammetric Engineering and Remote Sensing, 57(8): 1057-1064.
Shufelt, J., 1999a. Performance Evaluation and Analysis of Monocular Building Extraction from Aerial
Imagery, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(4): 311-326.
Shufelt, J., 1999b. Performance Evaluation and Analysis of Vanishing Point Detection Techniques, IEEE
Transactions on Pattern Analysis and Machine Intelligence 21(3): 282-288.
StereoGraphics Corporation, 1997. StereoGraphics Developers’ Handbook.
Tang, L., and C. Heipke, 1996. Automatic Relative Orientation of Aerial Images, Photogrammetric
Engineering and Remote Sensing, 62(1): 806-811.
Tsingas, V., 1994. A Graph-Theoretical Approach for Multiple Feature Matching and Its Application on
Digital Point Transfer, International Archives of Photogrammetry and Remote Sensing, XXX(3/2), pp.
865-871.
Wiedemann, C., C. Heipke, H. Mayer, and O. Jamet, 1998. Empirical Evaluation of Automatically
Extracted Road Axes, Empirical Evaluation Methods in Computer Vision (K. Bowyer, and P. Phillips,
editors), IEEE Computer Society Press, pp. 172-187.
Witkin, A., 1983. Scale-Space Filtering, Int. Joint Conference on Artificial Intelligence, pp. 1019-1022.
Zlotnick, A. and P. Carnine, 1993. Finding Road Seeds in Aerial Images, Computer Vision, Graphics,
and Image Processing 57(2): 243-260.