
Real-time Scene Stabilization and Mosaic Construction

M. Hansen, P. Anandan, K. Dana, G. van der Wal, P. Burt
David Sarnoff Research Center
CN 5300
Princeton, NJ 08543

Abstract

We describe a real-time system designed to construct a stable view of a scene through aligning images of an incoming video stream and dynamically constructing an image mosaic. This system uses a video processing unit developed by the David Sarnoff Research Center called the Vision Front End (VFE-100) for the pyramid-based image processing tasks required to implement this process. This paper includes a description of the multiresolution coarse-to-fine image registration strategy, the techniques used for mosaic construction, the implementation of this process on the VFE-100 system, and experimental results showing image mosaics constructed with the VFE-100.

1 Introduction

Imagery obtained from a sensor mounted on a moving platform frequently requires stabilization in one form or another. In some situations, it is necessary to correct for sensor unsteadiness to enhance the imagery for viewing purposes, as can occur when the sensor platform is vibrating or bouncing rapidly. Automated vision processes, such as moving target detection, typically require camera-induced motion to be removed in order to operate effectively. Other applications, such as surveillance, can utilize stabilized imagery both for image enhancement and for automated mapping and panoramic display construction.

One traditional solution to the image stabilization problem is to use mechanical stabilizers based on accelerometers, gyros, or mechanical dampers. While these techniques are useful for handling large motions caused by unstable sensor platforms, these systems are typically not precise. As a result, even after mechanical stabilization there may be significant residual image motion. Also, as the stabilization performance of mechanical systems increases, they tend to get physically bulky and very expensive. These shortcomings have led to the use of electronic stabilizers, which can offer potentially more precise stabilization at a reduced cost without the need for mechanical apparatus. The commercial "steady cam" systems available in consumer video recorders are a good example of the lower end of electronic stabilizers, which typically compensate for only certain types of translational image motion with low precision (e.g., [1]).

We describe a system for real-time stabilization that can remove full first-order (affine) deformations between images in a sequence, and assemble these aligned images within a single reference coordinate system to produce an image mosaic. This system performs stabilization using pyramid-based motion estimation and image warping. In order to achieve real-time performance at a minimal hardware cost, this system has been implemented using a pyramid-based image processing system called the Vision Front End (VFE), which is specially tailored for such applications.

This paper describes both the algorithm and hardware implementation of the scene stabilization system. Section 2 provides an overview of the affine image registration and mosaic construction process, then describes the goals of the scene stabilization system and the strategies used to construct the stabilized scene from the incoming video stream. The affine image registration process itself is described in detail in Section 3. An overview of the VFE-100 video processing system is provided in Section 4, along with further explanations of the scene stabilization implementation on the VFE-100. Finally, Section 5 shows experimental results using this system on video sequences taken from moving aerial and ground-based platforms, and Section 6 provides conclusions and comments about the system as well as areas for further research.

2 Stabilization Overview

The goal of this system is to provide a stable view of a scene from an unstable incoming video stream provided by an imaging sensor in motion. The two tasks in this process are (1) to estimate and correct for any affine deformations that exist between successive images in the video stream, and (2) to combine these aligned images into a suitable representation for display.

A simple form of scene stabilization can choose an arbitrary video frame as defining a reference coordinate

0-8186-6410-X/94 $4.00 © 1994 IEEE

Figure 1: Stabilized scene construction using an image mosaic.

system. Subsequent frames are then aligned to one another, and the affine transformations are cascaded from one frame to the next, yielding the appropriate transformation between that video frame and the reference. Then, each frame is warped to align with the reference and displayed. The net result of this simple stabilization display process will be imagery that appears motionless, except for border effects that will occur as a result of warping the imagery. Scene contents that were not visible in the reference frame will not be displayed in this scenario: the field-of-view of the stabilized video will always be restricted to the field-of-view of the reference frame. Also, the display will be undefined in areas where the incoming imagery does not overlap with the reference frame.

If interframe translational motion is large, the simple stabilization display described above will not be adequate to provide a meaningful stable scene. Large motions can occur when a camera is mounted on a ground-based vehicle moving over rough terrain, for example, or when the sensor is deliberately scanning, as would be the case for certain surveillance applications. In these situations, it is advantageous to modify the display in order to (1) keep the original (reference) field-of-view visible and updated with the most recent image information available, and (2) display the stable scene in a manner such that all (or at least a significant amount) of the imagery that has been stabilized is visible simultaneously. This leads to an image mosaic representation as shown in Figure 1.

The image mosaic is a display that is comprised of many images aligned to a fixed reference coordinate system. As in the case of simple stabilization, an arbitrary frame is chosen to define the 2D coordinate system for the stabilized scene. All further imagery is warped and aligned to the reference coordinate system, then inserted into the mosaic at the appropriate location.

The mosaic representation is superior to the simple stabilization display in several respects. It provides an expanded field-of-view display which can in many situations encompass all of the imagery acquired in the video stream in a single representation. The reference field-of-view of the scene is always visible at the center of the mosaic, so the original reference point for the stabilized scene is not lost. If large translational motion occurs within the video, changing the field-of-view significantly from the reference field-of-view, the mosaic representation expands the field-of-view rather than having to select one over the other.

3 Affine motion estimation

At the heart of the scene stabilization system is the technique to accurately estimate interframe motion. In order to support mosaic construction over many frames in real-time, the alignment must be within subpixel accuracy and the estimation must be fast.

We use a multiresolution, iterative process operating in a coarse-to-fine fashion on Laplacian pyramid images to estimate the affine motion parameters (see Figure 2). Under user control, the system can be made to compensate for any first-order motion transformation, such as translation, rotation, dilation, etc. During each iteration, an optical flow field is estimated between the images through local cross-correlation analysis, then an affine (or other first-order) motion model is fit to the flow field using weighted least-squares regression. The estimated affine transform is then used to warp the previous image to the current image, and the process is repeated between them.

3.1 Cross-correlation computation

Figure 3 describes the cross-correlation computation process based on the Laplacian pyramid images. Let Lc denote the spatial resolution, with Lc = 0 being the highest resolution, and let LLc denote the Laplacian image at resolution Lc. (All images throughout the paper will be referenced with boldface type.) After one of the Laplacian images LLc[t-1] has been prewarped by the motion estimate from the previous iteration (or with zero motion if the current iteration is the first), the images are shifted with respect to one another and image-wise multiplied, yielding a product image. For a shift of (i,j), the product image Ii,j is defined as

    Ii,j(x,y) = LLc[t](x,y) · LLc[t-1](x+i, y+j)    (1)

with i,j in [-N,N]. Integrating the image Ii,j fully will yield the cross-correlation value Ci,j between the two full Laplacian images at shift (i,j). Local cross-correlation values can be computed by integrating each product image Ii,j over local patches to yield cross-correlation "images" of the form Ci,j(x,y). However, in order to avoid border effects and make the results most representative of the information at the centers of the local patches, a weighted integration function W(x,y)

Figure 2: Motion estimation process overview.
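Both the frame-to-reference cascading described in Section 2 and the prewarp by the previous iteration's estimate in the loop of Figure 2 reduce to composing affine transforms, conveniently represented as 3 x 3 matrices in homogeneous coordinates. A minimal sketch follows; the helper names are ours and not part of the VFE-100 software:

```python
# Affine transforms as 3x3 homogeneous matrices.  Cascading the
# frame-to-frame estimates (Section 2) and prewarping by the previous
# iteration's estimate (Figure 2) are both just matrix composition.

def compose(a, b):
    """Return the transform 'apply b, then a' (matrix product a*b)."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def apply_affine(m, x, y):
    """Map the point (x, y) through the affine transform m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def cascade(transforms):
    """Fold frame-to-frame transforms into one frame-to-reference transform."""
    total = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity
    for t in transforms:
        total = compose(total, t)
    return total
```

With this representation, aligning frame k to the reference only requires composing k small interframe estimates, so each incoming frame is warped exactly once.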

is preferred over simple neighborhood averaging for local integration [2]. Therefore, the values for Ci,j(x,y) can be computed from the product image Ii,j with

    Ci,j(x,y) = Ii,j(x,y) ⊗ W(x,y)    (2)

where W(x,y) is the integration weighting function and ⊗ denotes convolution. (Note that the image Ci,j(x,y) is not denoted with bold type, because Ci,j(x,y) is referred to both as an image and as a cross-correlation function.)

The convolution with the kernel W(x,y) (typically a Gaussian) has the effect of smoothing the product images Ii,j into the cross-correlation images Ci,j. Depending on the size of this kernel, the resulting Ci,j will be oversampled to various degrees. Therefore, estimating flow based on analysis of Ci,j(x,y) directly will result in a correspondingly oversampled flow field. In order to keep computational costs to a minimum, a pyramid reduction process for the integration (see Figure 3) is used instead of performing convolutions of the product images at the correlation resolution level Lc. Different numbers of pyramid reduction steps can be used to achieve integration over correspondingly different spatial areas, with each pyramid level generated at an integration step reducing the size of the flow field (and the computational costs associated with this flow field) by a factor of four.

The critical parameters for the local cross-correlation process are: the spatial resolution level Lc used for the Laplacian images, the half-width N of the correlation search, and the spatial resolution Li chosen for integration, where Li = Lc + the number of integration levels used.

The value of Lc (the resolution level used for the correlation) determines the spatial frequency band used for the motion estimation, and hence the motion that will be detected during this iteration. A single pixel shift at level Lc corresponds to a shift of 2^Lc pixels at the highest resolution level. This shift dictates the overall range and precision of the estimates yielded from analyzing at this resolution level.

The size of the correlation search area N determines the maximum displacement (range) that can be estimated at spatial resolution Lc. Although larger values of N allow for a larger range of motions to be estimated, the potential for false matches also increases, and there is a quadratic increase in the attendant computational costs. Therefore, in practice, the values for N are restricted to 1 or 2.

The level of integration, Li, determines two things. First, it determines the amount of smoothing that has been performed on the correlation results. More smoothing leads to better signal-to-noise characteristics, but will correspondingly result in poorer estimates of the spatial location of the correlation peak. More significantly, it makes the implicit assumption that the correlation values (and the flow field) within the area of integration are locally smooth, which may not be the case everywhere in the image. Also, Li determines the size of the resulting flow field, since a flow estimate can be made for each position in the integrated correlation image excluding the borders. Therefore, the integration level Li is chosen just large enough to provide the necessary support for reliable and accurate flow estimates.

Since the Laplacian is a signed image with approximately zero local mean values, the correlation surface has both positive and negative values. This is similar to using a type of mean-normalized correlation on Gaussian-blurred images. Note also that the magnitudes of the correlation values are not normalized. While such normalization is completely within the scope of the algorithm, in practice it increases the burden on the computation and, as demonstrated in [3], the resulting increase in the accuracy of the flow field is small.
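The product-image computation and correlation search of Section 3.1 can be illustrated on small integer arrays. This is a toy sketch with our own function names: the real system forms the product images in hardware, weights them with a Gaussian W, and integrates via pyramid reduction rather than the full sums used here:

```python
def product_image(l0, l1, i, j):
    """I_ij(x, y) = l0(x, y) * l1(x + i, y + j), zero outside the border."""
    h, w = len(l0), len(l0[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            yy, xx = y + j, x + i
            if 0 <= yy < h and 0 <= xx < w:
                out[y][x] = l0[y][x] * l1[yy][xx]
    return out

def correlate(l0, l1, n=1):
    """Fully integrated cross-correlation C_ij for all shifts in [-n, n].
    The shift with the largest value is the integer displacement."""
    return {(i, j): sum(map(sum, product_image(l0, l1, i, j)))
            for i in range(-n, n + 1) for j in range(-n, n + 1)}
```

Replacing the full sums with Gaussian-weighted local integration, as in equation (2), turns each scalar C_ij into a correlation image C_ij(x, y) with one sample per local patch.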

Figure 3: Local cross-correlation computations used in the motion estimation process.

Figure 4: Local cross-correlation surface (N = 1) for a given position based on the integrated cross-correlation images after filtering and subsampling.

Figure 5: Vector estimation (1-D) from three correlation points using the symmetric triangle model to locate the correlation peak.
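The symmetric-triangle interpolation of Figure 5, together with the second-derivative acceptance test described in Section 3.2, can be sketched as follows. The function name is ours, and the closed form below is what falls out of fitting a symmetric triangle to three samples, offered as an assumption rather than a transcription of the paper's formula:

```python
def peak_offset(p1, p2, p3):
    """Subpixel 1-D peak location from three correlation samples taken at
    shifts -1, 0, +1, assuming a symmetric triangle about the peak.

    Returns None when the second-derivative test fails, i.e. when no
    maximum exists about the center and no flow vector should be made.
    """
    t = 2.0 * p2 - p1 - p3          # negated discrete second derivative
    if t <= 0:
        return None                 # no peak (also guards a flat surface)
    # Apex of a symmetric triangle passing through the three samples.
    return (p3 - p1) / (2.0 * (p2 - min(p1, p3)))
```

The same routine would be applied once to the horizontal and once to the vertical 1-D curve through the 3 x 3 surface, giving the two components of the subpixel flow vector.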

Considering this, correlation normalization was not included in the current algorithm.

3.2 Flow estimation

The flow estimation is based on an analysis of the cross-correlation surface defined by the cross-correlation values Ci,j(x,y) for i,j in [-N,N] at a given location (x,y) (see Figure 4). The flow value at (x,y) is determined by locating the peak in this correlation surface. The peak-detection process depends on the search range N. Below we describe the method used for analysis of the 3 x 3 correlation surface. In addition, we use the shape of the cross-correlation surface to determine the uncertainty in the flow estimate [4].

The cross-correlation surface for the N = 1 (3 x 3 correlation surface) case is shown in Figure 4. Assume that the correlation peak is somewhere within 1/2 a pixel of the center of the correlation surface. In order to obtain subpixel flow estimates, we use a type of interpolation of the available values to determine the peak. A further reduction in computation is achieved by separately estimating the horizontal and vertical locations of the peak. The one-dimensional horizontal and vertical correlation curves that are generated based on the Laplacian of Gaussian can be modeled as a symmetric triangle, as shown in Figure 5 (see [5] for a related analysis). Using this model, the 1D shift value S is computed from the 3 correlation values P1, P2, P3 with

    S = (P3 - P1) / (2 (P2 - min(P1, P3)))    (3)

What if the assumption of the shift being less than 1/2 a pixel away from the center does not hold? A discrete second derivative can be computed about the center of the correlation surface to determine if the correlation data is suitable for this sort of interpolation. Peaks at a shift of greater than a full pixel will usually result in no maximum being detected at the center of the surface. Shifts of greater than 1/2 a pixel but less

than 1 pixel will pass the second derivative test and can be interpolated, but the shift estimates in this case will not be as accurate as the measurements for pixels at less than 1/2 pixel shift.

Using the P1, P2, P3 nomenclature, a test using the second derivative about the center of the surface along one of the lines through the surface is given by T = 2P2 - P3 - P1. If T < 0, then there is no maximum detected around that position in the correlation surface and no vector should be estimated for that point. Likewise, the diagonally-oriented lines on the 3 x 3 surface passing through the center should be checked using this same test: if one of the diagonal orientations does not show a peak about the center, then no vector should be estimated for that location.

Our current implementation is restricted to a binary-valued confidence measure, based on the aforementioned test on the sign of T. Although a continuous-valued confidence measure would be desirable, our observation has been that in practice it is more important to remove the bad flow estimates altogether than to have a continuous weighting. In general, the relationship between a derivative measure such as T and the actual uncertainty associated with the flow vectors is highly non-linear. We are currently exploring various strategies for this "normalization" process.

3.3 Flow field regression

The final step of the alignment process is to fit the flow field to a linear, first-order motion model. A least-squares regression is used to fit the flow field to these models. The vector confidence values are used to weight each vector's influence on the regression: when vectors have 0-valued confidences, these vectors do not contribute to the regression at all, while positive-valued confidences allow the vectors to contribute to the regression in a weighted manner. As noted earlier, the current algorithm restricts the confidence measures to be binary valued.

One of the more important considerations for this algorithm is the type of motion model to use with the flow field during this regression step. A linear, first-order motion model, for example, allows for motion up to an affine distortion. In many cases, however, full affine distortions do not occur during the video sequence: a simple translation, or translation and rotation, or a similarity transform may be adequate to fully describe the motion occurring within the scene. If prior information can be used to restrict the motion model, it is useful to do so in order to avoid instabilities and ambiguities during estimation.

3.4 The affine registration process: practical considerations

In actual operation, the affine registration process is configured to perform coarse-to-fine, progressive-complexity motion estimation using various motion models. A table of the different parameters used to configure the registration process to perform affine registration is shown in Table 1.

Iteration  Correlation       Integration      Correlation      Motion
Number     Resolution Lc     Resolution Li    Search Area N    Model Used
1          5 (16 x 15)       6 (8 x 7)        2 (5 x 5 area)   Translation only
2          4 (32 x 30)       5 (16 x 15)      1 (3 x 3 area)   Translation only
3          3 (64 x 60)       5 (16 x 15)      1 (3 x 3 area)   Affine
4          3 (64 x 60)       5 (16 x 15)      1 (3 x 3 area)   Affine
5          2 (128 x 120)     5 (16 x 15)      1 (3 x 3 area)   Affine

Table 1: Description of the parameters used to perform affine registration with the described algorithm. Resolution levels are shown as pyramid levels followed by the actual image size in parentheses. The original image was 512 x 480 pixels in size.

This particular configuration of parameters shows the basic strategies that can be used for the image registration, and may be varied depending on the assumptions that can be made about the scene and the motion. Note that the entire process is performed in 5 iterations. The first two iterations are performed correlating images at Lc = 5 and Lc = 4, integrating to levels Li = 6 and Li = 5. The first iteration uses a cross-correlation half-width of N = 2 (5 x 5 correlation surface), while the second iteration performs the correlations using N = 1 (3 x 3 correlation surface). The goal of the first iteration is to identify very large displacements at level 5, then iterate again at level 4 to get a better estimate of the displacement. Also note that the coarse-level estimation uses only a translational motion model. This is because the effects of the linear terms are often small at coarse resolution; moreover, the reliable estimation of the linear terms requires higher-frequency spatial features.

Once a rough estimate of the translational shift in the imagery has been determined at levels Lc = 5 and 4, the processing is performed at levels Lc = 3 and Lc = 2, and an affine model is used for the motion estimation. At these resolution levels, there is typically enough structure present to perform more complex motion estimation from the flow fields determined. Note that, for all later iterations, the same integration level is used. The 16 x 15 flow field computed from Li = 5 images is normally sufficient for all types of first-order motion, and the computational requirements for a flow field of this size are modest.

It is worth noting that the registration algorithm never proceeds beyond Lc = 2. This restriction is imposed because our desired accuracy is achieved at Lc = 2, and additional processing time is unnecessary.

4 Real-time implementation

As mentioned earlier, the real-time scene stabilization system was designed based on a video processing system called the VFE-100, a commercially available image processing platform currently being sold by Sensar, Inc. The VFE-100 was specifically designed to perform multiresolution (pyramid) processing tasks such as image registration at real-time video rates. The VFE-100 is based on a prototype system that was developed by the David Sarnoff Research Center under contract with the US Army Missile Command (MICOM) to perform image alignment and motion detection for missile tracking applications. This prototype was augmented and commercialized by Sensar, Inc. into the more general-purpose image processing platform that was used to implement the scene stabilization system.

4.1 VFE-100 architecture

Most of the processing functions present within the VFE-100 system are based on pyramid techniques. To obtain the processing power required to perform many pyramid operations quickly, the VFE-100 contains six proprietary VLSI chips called PYR-1 pyramid chips, designed to perform the image convolutions and other arithmetic operations required for generating a variety of different image pyramids and reconstructing imagery based on those pyramids.

The overall block diagram of the VFE-100 system architecture is shown in Figure 6. The VFE-100 consists of a stereo digitizer, a number of frame stores, an affine warper, six PYR-1 pyramid chips, three image correlators, an ALU, and a display frame store with color graphics overlay. All of these devices are connected with a 16 x 16 crosspoint switch, which enables the processing units to communicate flexibly. Timing streams are provided with each data stream within the VFE-100, making the interconnections and processing between VFE-100 devices completely independent of the operations of other devices.

Hardware sequencing and other image processing tasks that cannot be performed in hardware are handled by an on-board TI TMS320C30 DSP. Software is written for the VFE-100's C30 using standard ANSI-compatible C programs. A special hardware control library provides a high-level interface for accessing the VFE-100 hardware from C programs without having to program hardware registers or other operations directly. The software run on the VFE-100 is written, compiled, and loaded into the VFE-100 through a host computer (currently either an IBM-PC or Sun SPARC) which can communicate with the C30 processor through a standard VME interface.

4.2 Scene stabilization with the VFE-100

The VFE-100 architecture was designed with this image alignment application in mind; therefore, this algorithm can be implemented on the VFE-100 efficiently. In fact, with the exception of the vector estimation and flow field regression steps of the alignment algorithm, all of the required image processing tasks can be performed using the VFE-100 hardware.

The utilization of the VFE-100 hardware is as follows. The warper is used when images are warped, both for the alignment (once for each iteration of processing) and for the final mosaic construction. The pyramid modules are used to generate the Laplacian pyramids used for the alignment and the Gaussian pyramids used for constructing the image mosaic. The correlation modules compute the product images (for the various shifts) between the two Laplacian images, with three of these modules present to make the correlation process even faster. The pyramid chips present in the correlation modules are there for performing the pyramid-based integration after the cross-correlation multiplication step.

The C30 DSP, as mentioned before, is responsible for sequencing the image processing hardware. In addition to that, however, the C30 also takes care of all computations that cannot be performed completely with the video hardware. For the scene stabilization algorithm, the most computationally intensive tasks for the C30 are the estimation of the flow field by analyzing the cross-correlation images generated by the correlation modules, and fitting a linear model to the estimated flow field. In addition, the C30 is responsible for keeping track of the various affine coefficients used in the alignment and estimation process, the mosaic construction information, and other bookkeeping tasks required for this process.
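The weighted least-squares regression of Section 3.3, which runs as scalar software on the C30, can be sketched as follows. The model decouples into two 3-parameter fits, u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y, each solved via its normal equations; the function names and the Gaussian-elimination helper are ours:

```python
def solve3(m, b):
    """Solve a 3x3 linear system m*x = b by Gaussian elimination
    with partial pivoting."""
    a = [row[:] + [bi] for row, bi in zip(m, b)]   # augmented copy
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                            # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

def fit_affine(points, flow, conf):
    """Weighted LS fit of u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y.
    conf holds the (binary) confidences: 0-valued vectors drop out of
    the normal equations entirely, exactly as described in Section 3.3."""
    m = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for (x, y), (u, v), w in zip(points, flow, conf):
        phi = (1.0, x, y)
        for r in range(3):
            for c in range(3):
                m[r][c] += w * phi[r] * phi[c]
            bu[r] += w * phi[r] * u
            bv[r] += w * phi[r] * v
    return solve3(m, bu), solve3(m, bv)
```

Restricting the motion model (translation only, similarity, etc.) amounts to shrinking the basis phi, which is how the coarse iterations of Table 1 avoid estimating unreliable linear terms.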
5 Experimental results

Demonstrating the efficacy of the scene stabilization algorithm in hard-copy form is extremely difficult because it is based on the processing of live video. Here, in order to show the idea of scene stabilization through automated mosaic construction, a small subset of frames from a video sequence has been provided, along with the mosaic that was generated from the (entire) sequence. The sample image frames provided in the figures are taken approximately 5 seconds apart. The scene stabilization system used to generate these results performed at a processing speed of 10 Hz on the VFE-100 system.

The video sequence is aerial photography of varying terrain, shown in Figure 7. The camera itself moves with the consistent translational motion of the aircraft and also suffers occasional rotations. Using the affine alignment configuration described in Table 1, all translational and rotational motion has been compensated for, and a panoramic view of the scenes from the first to the last frame has been constructed using the mosaic representation.

6 Conclusions

The real-time scene stabilization system described in this paper performs two basic functions. First, the system accurately estimates the affine deformations between images comprising a video sequence with low computational cost. Second, it compensates for these deformations and constructs an image mosaic of the sequence, providing a stable scene in a single 2D coordinate system. The implementation of this system using the Sensar VFE-100 operates with a throughput of 10-15 Hz.

From this initial real-time scene stabilization system, there are many avenues for expanding research. Ongoing research by the authors and others is focused on solving the case of scene stabilization where there is little if any interframe overlap in the video sequence. To solve this extreme stabilization case, a reference mosaic will be constructed not only for display, but also for use in alignment. The mosaic itself can be expanded using a technique called tiling, which divides the initial mosaic into a group of smaller mosaics that are linked through affine transformations [6].

Other research is focusing on better mosaic storage and construction. In the current implementation, the image mosaic must be shown at a reduced resolution because of display size and frame store limitations. Using a much larger display, a "viewport" can be used to scroll and display portions of a much larger, full-resolution mosaic. During mosaic construction, rather than just inserting aligned frames into the mosaic, these frames can be merged into the mosaic using multiresolution interpolation techniques [7]. This creates a seamless mosaic representation capable of compensating for changes in image intensity (due to camera automatic gain control, aperture, and so on) and other border effects that are visible in the current system.

Finally, another extension involves handling the parallax induced by the three-dimensional structure of the scene. A full 3D representation in terms of pointwise range or shape information would be desirable, but would require significant extensions of the hardware. Instead, we intend to represent the scene in terms of layers of surface patches corresponding to planes at different depths. Such a patchwise representation has the benefit of ease of implementation as well as more reliable estimation in terms of parametric motion models.

Acknowledgements

The work described in this paper was supported in part by Sensar Inc., and by the Advanced Research Projects Agency under contract DAAA15-93-C-0061. Besides the authors, many members of the vision groups at Sarnoff have contributed to this work. Specifically, we would like to acknowledge the contributions of Rob Bassman, Joe Sinniger, and Gary Greene for hardware design and development.

References

[1] K. Uomori, A. Morimura, A. Ishii, H. Sakaguchi, and Y. Kitamura, "Automatic image stabilizing system by full digital signal processing," IEEE Trans. Cons. Elec., vol. 36, no. 3, pp. 510-519, 1990.
[2] P. J. Burt, "Fast filter transforms for image processing," Comp. Graph. and Image Proc., vol. 16, pp. 20-51, 1981.
[3] P. J. Burt, C. Yen, and X. Xu, "Local correlation measures for motion analysis: a comparative study," in Proceedings of Conf. on Patt. Recog. and Image Proc., pp. 269-274, June 1982.
[4] P. J. Burt, C. Yen, and X. Xu, "Multi-resolution flow-through motion analysis," in Proceedings of IEEE Comp. Vision and Patt. Recog. Conf., pp. 246-252, 1983.
[5] K. Nishihara, "PRISM: a practical real-time stereo matcher," MIT AI Memo 780, 1984.
[6] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt, "Real-time scene stabilization and mosaic construction," in The 1994 Image Understanding Workshop, Nov. 1994.
[7] P. J. Burt and E. H. Adelson, "A multiresolution spline with application to image mosaics," ACM Trans. on Graphics, vol. 2, no. 4, pp. 217-236, 1983.

Figure 6: VFE-100 video processing system architecture.

Figure 7: Image mosaic generated by the VFE-100 system. Four frames of the 20 second video sequence, spaced approximately 5 seconds apart, are shown at the left. The completed mosaic, shown at right, represents the assembly of the full 20 second sequence. The processing speed of the VFE-100 was 10 Hz for this example.
