
International Journal of Computer Trends and Technology (IJCTT) – Volume 4, Issue 8 – August 2013

ISSN: 2231-2803 Page 2728

A Survey of Alignment Methods Used for Motion-Based Video Sequences
B. Kalaivanipriya, M.Sc., P. Narendran, M.Sc., PGDCA, M.Phil., B.Ed., PGDBA

M.Phil. Scholar, Department of Computer Science, Gobi Arts and Science College, Gobi, Tamil Nadu, India.
Associate Professor and HOD, Department of Computer Science, Gobi Arts and Science College, Gobi, Tamil Nadu, India.

Abstract— Temporal alignment methods are used in computer vision applications such as video synthesis, super-resolution imaging, human action recognition, 3D visualization, and temporal segmentation of human actions. Synchronization is performed before applying the surveyed methods. The discussion covers sequence-to-sequence alignment matching, spatio-temporal alignment matching, and temporal matching. These methods address the problem of matching two unsynchronized video sequences of the same dynamic scene recorded by different stationary, uncalibrated video cameras. Matching is performed in both space and time. The surveyed methods are based on the concept of space-time trajectories of moving objects. By exploiting the dynamic behavior of these space-time trajectories, sub-frame temporal correspondence between the two video sequences can be obtained.


Various methods can be adopted for aligning videos. This survey compares several methods for aligning two videos; the ideas can be generalized to the alignment of multiple video sequences. The sequences are recorded by uncalibrated video cameras with fixed internal parameters, although these parameters are not known. Efficient video alignment can be achieved by combining temporal changes, spatial information, and the inherent frame-to-frame transformations of the video sequences. Video alignment enables new applications that are not possible with image-to-image alignment alone. The video alignment problem has been widely studied, and successful solutions have been suggested. The problem is to estimate point correspondences between two video frames. Spatial alignment techniques can be classified into two categories: feature-based approaches and direct alignment. In feature-based alignment, features are detected and matched across the two images. Direct alignment comprises methods in which image intensities are matched directly.
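As an illustrative sketch of the second category (a hypothetical toy example, not the method of any surveyed paper), direct alignment can be reduced to its simplest form: search over small integer translations for the one that minimizes the sum of squared intensity differences between two images.

```python
# Illustrative sketch: direct (intensity-based) spatial alignment.
# Exhaustively searches small integer translations for the one minimizing
# the sum of squared differences (SSD) between two tiny grayscale images.
# All names and data here are hypothetical.

def ssd(a, b):
    """Sum of squared intensity differences between two images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def shift(img, dx, dy, fill=0):
    """Translate img by (dx, dy), filling uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys, xs = y - dy, x - dx
            if 0 <= ys < h and 0 <= xs < w:
                out[y][x] = img[ys][xs]
    return out

def align_direct(ref, moving, search=2):
    """Return the (dx, dy) that best maps `moving` onto `ref`."""
    best = min((ssd(ref, shift(moving, dx, dy)), dx, dy)
               for dx in range(-search, search + 1)
               for dy in range(-search, search + 1))
    return best[1], best[2]

ref = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
moving = shift(ref, 1, 1)         # simulate a camera offset of one pixel
print(align_direct(ref, moving))  # → (-1, -1): the translation mapping `moving` back onto `ref`
```

A real direct method would refine this coarse search with gradient-based sub-pixel estimation, but the objective is the same.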
This survey evaluates methods for aligning two videos. The issue is that the sequences may not be synchronized, so alignment in time is required as well. The task is to recover the transformation between corresponding points of the two video sequences. The temporal "redundancy" in successive frames of a video can be exploited to move a step away from traditional image alignment techniques, which exploit only the spatial coherence between the given images. Hence, by using the temporal behavior (frame-to-frame transformations) of the videos, alignment can be achieved even for frames from each sequence that have no spatial overlap. Several applications, such as surveillance and security systems, can benefit greatly from the alignment of multiple video sources. One main application is viewing multiple video inputs from high-security areas as a single combined video sequence, instead of dedicating a separate monitor to each input source. Other applications include the generation of super-resolution in time and space and of large-screen movies. The video alignment problem can be simplified to aligning corresponding frames using image alignment techniques. Still, there are cases where the common spatial information alone is not enough to determine the transformation.
The following survey describes several approaches, with different constraints, that try to solve the video alignment problem. As discussed above, simple image alignment techniques would not produce correct results. Some successful methods use the temporal variations (moving objects) between the sequences, while other methods link spatial and temporal information together to obtain better results. The surveyed methods enforce some constraints on the motion in the video sequences.

A. The Trajectory-Based Sequence Matching Method
In [1] the trajectory-based sequence matching method is discussed. There, feature-based image matching is generalized to feature-based sequence matching by extending the notion of features from feature points to feature trajectories. The approach is based on matching space-time trajectories of moving objects, analogous to matching interest points such as corners in regular feature-based matching techniques. The sequences are matched in space and time by matching all points along corresponding space-time trajectories. By exploiting the dynamic properties of these space-time trajectories, synchronization (sub-frame temporal correspondence)

between the two video sequences can be obtained. Moreover, the combinatorial complexity of spatial point matching in the large search space is significantly reduced by using trajectories rather than feature points. This allows information to be matched across sensors in situations that are difficult with image-to-image matching alone, including very wide baseline matching, matching under large scale differences, and matching across different sensing modalities such as IR and visible-light cameras.
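The core idea of matching whole trajectories rather than individual points can be sketched as follows (a toy illustration under assumed data; the actual method in [1] also recovers the spatial transformation and sub-frame offsets):

```python
# Sketch of trajectory-based temporal matching: find the integer frame
# offset that best aligns one space-time trajectory with another.
# The trajectory data and function names are hypothetical.

def traj_distance(t1, t2):
    """Mean squared distance between corresponding trajectory points."""
    pairs = list(zip(t1, t2))
    return sum((x1 - x2) ** 2 + (y1 - y2) ** 2
               for (x1, y1), (x2, y2) in pairs) / len(pairs)

def best_offset(ref, other, max_offset=5):
    """Search integer frame offsets; return the one aligning `other` to `ref`."""
    scores = {}
    for k in range(-max_offset, max_offset + 1):
        if k >= 0:
            a, b = ref[k:], other
        else:
            a, b = ref[:k], other[-k:]
        if len(a) >= 3:                      # require a minimal overlap
            scores[k] = traj_distance(a, b)  # zip truncates to the overlap
    return min(scores, key=scores.get)

# A point moving on a parabola; the second camera starts 3 frames late.
ref   = [(t, t * t) for t in range(20)]
other = ref[3:]
print(best_offset(ref, other))               # → 3
```

The real method works with continuous (sub-frame) time shifts and with multiple trajectories simultaneously, but the matching criterion is the same in spirit: coherent dynamic behavior along the trajectories.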


A. Canonical time warping
In general, a global transformation of the whole time series is not accurate in many situations, such as aligning long sequences. In these cases, local models have been shown to perform better, as discussed in [2]. Local canonical time warping extends CTW (canonical time warping) by allowing multiple local spatial deformations, and an energy function is derived to obtain the optimized result.

1) Local canonical time warping: The fundamental premise of peer-to-peer (P2P) systems is that individual peers voluntarily share resources. There is thus an inherent tension between collective welfare and individual rationality that threatens the viability of these systems. The work in [7] sits at the intersection of computer science and economics, targeting the design of distributed systems consisting of rational participants with selfish and diverse interests. In particular, [7] discusses major findings and open questions related to free-riding in P2P systems, such as challenges in the design of incentive mechanisms, factors affecting the degree of free-riding, and incentive mechanisms to encourage user cooperation.

B. Dynamic time warping

1) Unbiased bidirectional DTW: In [3], unbiased bidirectional dynamic time warping is discussed, an efficient technique for aligning video sequences related by changing temporal offsets. Formulating alignment as unbiased bidirectional dynamic time warping yields alignments that are not biased by the choice of reference sequence. Regularized dynamic time warping reduces the occurrence of false singularities and results in sub-frame-accuracy synchronization. Comparative analysis with a symmetric-transfer-error-based technique and a rank-constraint-based technique showed a significant improvement in video alignment with this technique. Unbiased bidirectional dynamic time warping has a lower or similar computational complexity compared to other dynamic time warping techniques. The technique was presented in a two-camera setup, and its performance was evaluated on two videos. The method can be extended to align more than two videos.
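For context on what [3] makes "unbiased", here is plain, classic DTW on 1-D signals (a minimal sketch, not the unbiased bidirectional algorithm itself): standard DTW warps a test sequence onto a chosen reference, so swapping the two sequences can yield a different alignment, and that is the reference bias the method removes.

```python
# Minimal classic dynamic time warping: dynamic-programming cost of
# aligning two 1-D sequences under a non-linear time warp.

def dtw(a, b):
    """Return the DTW cost between 1-D sequences a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] repeats
                                 D[i][j - 1],      # b[j-1] repeats
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, stretched at the start
print(dtw(a, b))                # → 0.0: DTW absorbs the temporal stretch
```

A rigid (shift-only) comparison of `a` and `b` would report a nonzero error; DTW's warping path absorbs the local stretch exactly.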
2) View-invariant dynamic time warping: In [4], dynamic time warping with a view-invariant measure is discussed; DTW is an extensively used method for warping two temporal signals. To perform non-linear time alignment, the method uses an optimum time compression or expansion function. Signature recognition, speech recognition, and gesture recognition are some applications of this method. To measure the misalignment between the temporal signals, a distance measure is computed for the two signals. Initially, eight corresponding points are specified between the first frames of the two videos, and their image coordinates are recorded. The feature points in the two videos are then tracked to obtain trajectories. Using the distance measure between points on the two trajectories, the classic DTW algorithm is executed. Finally, the time-warping function is generated by backtracking. Since the matching measure does not depend on the viewpoint, the method is not affected by changes in viewpoint. The method dynamically calculates a non-linear time-warping function between two two-dimensional (2D) trajectories.
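The final backtracking step can be sketched as follows, with a simple scalar distance standing in for the paper's view-invariant measure (an illustrative simplification; data and names are hypothetical):

```python
# DTW with backtracking: recover the time-warping function as a path of
# index pairs (i, j) through the cost matrix, as in the last step of [4].

def dtw_path(a, b):
    """Return the warping path [(i, j), ...] aligning a[i] with b[j]."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end of both sequences toward the start.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return path[::-1]

a = [0, 2, 4, 2]
b = [0, 0, 2, 4, 2]     # b lingers one extra frame at the start
print(dtw_path(a, b))   # a[0] is matched to both b[0] and b[1]
```

The returned path is the discrete time-warping function: reading off the `j` for each `i` gives the temporal correspondence between the two trajectories.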


A. Alignment manifold
In [5], the alignment manifold method is applied to the problem of aligning two spatio-temporal signals generated from the same dynamic scene or from similar dynamics. The two signals may be captured by different cameras at the same time or by the same camera at different times; the misalignment between them results from differences in calibration parameters, view angles, and viewpoints, as well as from temporal scaling and shifts. In this work, the spatial and temporal factors are considered jointly on a special manifold.
B. Timeline reconstruction method
In [6], the timeline reconstruction method is discussed, which provides an efficient and straightforward way to temporally align multiple video sequences. When scene points move along overlapping, three-dimensional, near-periodic trajectories, the method handles large time shifts and temporal dilations with no degradation in accuracy. By reducing the alignment problem to reconstructing a single timeline, the method tolerates a large proportion of outliers in the data, discontinuities in feature trajectories, high levels of noise, sequences with multiple frame rates, and the complete absence of stereo correspondences for moving features.
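Since [6] relates the time axes of the sequences linearly (a temporal dilation plus a shift, t2 = α·t1 + β), the timeline can be recovered from frame correspondences by a line fit; a sketch under that assumption, with hypothetical data (the paper itself uses a robust fit that rejects outliers):

```python
# Least-squares fit of a linear timeline t2 = alpha * t1 + beta from
# corresponding frame timestamps. Toy data; a robust version would add
# outlier rejection (e.g. RANSAC) as in [6].

def fit_timeline(t1, t2):
    """Least-squares estimate of (alpha, beta) in t2 = alpha * t1 + beta."""
    n = len(t1)
    mean1, mean2 = sum(t1) / n, sum(t2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(t1, t2))
    var = sum((a - mean1) ** 2 for a in t1)
    alpha = cov / var
    beta = mean2 - alpha * mean1
    return alpha, beta

# Sequence 2 runs at half the frame rate of sequence 1, starting 4 frames late.
t1 = list(range(10))
t2 = [0.5 * t + 4 for t in t1]
print(fit_timeline(t1, t2))   # → (0.5, 4.0): dilation and shift recovered
```

Once α and β are known, every frame of one sequence can be mapped to a (generally sub-frame) time in the other, which is exactly the temporal alignment sought.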

C. Super-Resolution in Time and Space
In [7], the super-resolution in time and space method is discussed. In spatial super-resolution, multiple low-resolution images with sub-pixel shifts are combined to obtain a single high-resolution image. This high-resolution image contains spatial features that are not visible in any of the input sequences. Such methods are also supported by sequence-to-sequence alignment. On the other hand, sequence-to-sequence

alignment also provides temporal alignment at sub-frame accuracy. This enables video applications such as super-resolution in time. Super-resolution in time combines information from several video sequences, recorded at sub-frame time shifts, into a single new video sequence of higher frame rate, i.e., higher temporal resolution. Such a sequence displays dynamic events that occur faster than the regular video frame rate. The following example gives a clearer idea of this method. When a wheel turns faster than a certain speed, it appears to rotate in the wrong direction in all the input video sequences. This visual effect is caused by temporal aliasing, and playing the recorded video in slow motion does not make it go away. Conversely, the reconstructed high-temporal-resolution sequence displays the correct motion of the wheel. Such temporal super-resolution cannot be obtained when the video cameras are synchronized using dedicated hardware, because all synchronized cameras then capture occurrences at the same time instants. Sequence-to-sequence alignment thus provides the basis for exceeding the spatial and temporal resolution of existing video cameras.
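The combination step can be sketched in its simplest form (a toy illustration with hypothetical timestamps; the actual method in [7] reconstructs the high-rate sequence rather than merely interleaving frames):

```python
# Sketch of temporal super-resolution: two cameras sample the same event
# at the same frame rate but with a known sub-frame time shift; once
# aligned, their frames can be merged by timestamp into one sequence of
# higher temporal resolution.

def interleave(sequences, offsets):
    """Merge frames from time-shifted sequences, ordered by true time."""
    frames = [(t + off, frame)
              for seq, off in zip(sequences, offsets)
              for t, frame in enumerate(seq)]
    return [frame for _, frame in sorted(frames)]

# Two 25 fps cameras, the second delayed by half a frame interval.
cam1 = ["A0", "A1", "A2"]
cam2 = ["B0", "B1", "B2"]
print(interleave([cam1, cam2], [0.0, 0.5]))
# → ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']: an effective 50 fps sequence
```

The recovered sub-frame offset is exactly what sequence-to-sequence alignment provides; without it, the frames cannot be ordered on a common time axis.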

D. Multisensory Alignment
In [7], the multisensory alignment method is discussed, which aligns frames obtained by sensors of different modalities, for example infra-red and visible light, that differ significantly in appearance. Features visible in one frame might not be visible in the other frame, and vice versa, which creates a problem for image alignment methods. However, when trajectories of moving objects are used as the features for matching two sequences, a similar frame appearance across the two sensors is no longer needed: coherent appearance information is replaced with the coherent dynamic behavior of feature trajectories. Differences in the appearance of the objects across the two sequences therefore do not affect the processing. The results obtained from spatio-temporal alignment are displayed after fusing the two sequences; the fused sequence clearly displays the features of both sequences.
In this work, several alignment methods for spatio-temporal alignment have been surveyed, with a detailed discussion of each surveyed method. Some of the above methods have not been proved theoretically. Some methods use temporal alignment only, while other methods focus on both spatial and temporal alignment and on sequence-to-sequence methods. The transformations between successive frames can be found using any image alignment technique, and the overall transformation between any two points from the two frame sequences can be obtained with respect to a reference frame chosen in the initial part of the video. The above-mentioned benefits of sequence-to-sequence matching/alignment, spatio-temporal alignment, and temporal alignment give rise to new video applications that are very difficult or even impossible to attain using existing image-to-image matching tools.

[1]. Y. Caspi, D. Simakov, and M. Irani, "Feature based sequence to sequence matching," Int. J. Comput. Vision, vol. 68, no. 1, pp. 53–64, 2006.

[2]. F. Zhou and F. de la Torre, "Canonical time warping for alignment of human behavior," in Advances in Neural Information Processing Systems, 2009.

[3]. C. Lu, M. Singh, I. Cheng, A. Basu, and M. Mandal, "Efficient video sequences alignment using unbiased bidirectional dynamic time warping," J. Vis. Commun. Image Represent., vol. 22, no. 7, pp. 606–614, Oct. 2011.

[4]. C. Rao, A. Gritai, M. Shah, and T. Syeda-Mahmood, "View-invariant alignment and matching of video sequences," in Proc. ICCV, 2003, pp. 939–945.

[5]. R. Li and R. Chellappa, "Aligning spatio-temporal signals on a special manifold," in Proc. Eur. Conf. Computer Vision, 2010.

[6]. F. L. C. Padua et al., "Linear sequence-to-sequence alignment," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 2, pp. 304–320, Feb. 2010.

[7]. Y. Caspi and M. Irani, "Spatio-temporal alignment of sequences," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 11, pp. 1409–1424, Nov. 2002.