
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 9, NO. 1, FEBRUARY 2015

Global Motion Assisted Low Complexity Video Encoding for UAV Applications

Malavika Bhaskaranand, Student Member, IEEE, and Jerry D. Gibson, Fellow, IEEE

Abstract—We design a video encoding scheme that is suited for applications such as unmanned aerial vehicle (UAV) video surveillance where the encoder complexity needs to be low. Our low complexity encoder predicts frames using the global motion information available in UAVs and thus achieves lower complexity and more than 40% BD-rate savings for fly-over videos compared to a complexity-constrained H.264 encoder with motion estimation restricted to 8×8 blocks and half pixel accuracy. We also incorporate a spectral entropy based bit allocation scheme into this encoder to achieve near constant quality within groups of pictures (GOPs) at the cost of small increases in delay and complexity, and a small drop in compression efficiency. Both these encoders with their corresponding low complexity “matched” decoders provide significant gains of more than 49% BD-rate savings over the Wyner-Ziv based DISCOVER codec which has a low complexity encoder and a high complexity decoder. Furthermore, for videos where the global motion is spatially consistent within 2×2 blocks, we show that the computational complexity of these proposed encoders can be significantly reduced with only about 1% BD-rate increase.

Index Terms—Complexity, global motion, low complexity video encoding, spectral entropy based bit allocation.

Manuscript received October 09, 2013; revised April 03, 2014; accepted July 18, 2014. Date of publication August 07, 2014; date of current version January 20, 2015. This research was supported in part by Raytheon Applied Signal Technology, Inc. The guest editor coordinating the review of this manuscript and approving it for publication was Dr. Vladan Velisavljevic.

The authors are with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA (e-mail: malavika@ece.ucsb.edu; gibson@ece.ucsb.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTSP.2014.2345563

I. INTRODUCTION

WITH a growing interest in unmanned aerial vehicles (UAVs) for a wide range of applications, there is an escalating need for incorporating video data compression units into UAVs. In addition to being used in military applications, UAVs are increasingly being used in commercial applications, particularly where the use of manned aircraft poses a severe danger to the pilots [1], [2]. The increasing quantity and quality of camera sensors pose challenges for the video compression systems on board the UAVs. At the same time, available transmission bandwidth has become more limited due to the increase in the number of UAVs sharing the same link [3]. These challenges are compounded by the stringent space, weight, and power constraints and the desire for longer endurance, greater functionality, and lower fuel consumption in UAVs [4]. Furthermore, unlike typical video sequences that have substantial local motion, the motion in the UAV fly-over videos is primarily global and due to the movement of the UAV and camera mounts, which is known. Therefore, there is a need for low complexity video encoders that can efficiently compress UAV fly-over videos with primarily global motion.

Traditional video compression standards such as H.264/AVC and the recently finalized High Efficiency Video Coding (HEVC) typically target applications such as entertainment video broadcast, storage and playback, and video streaming, where encoders can be complex but decoders need to be relatively simple [5], [6]. The bulk of the complexity in such encoders is due to the block motion estimation (ME) performed to compensate for the local motion and predict frames from previously encoded ones. Therefore, traditional compression schemes are not designed to address the more stringent encoder complexity constraints and the degrees of freedom present in UAV video applications.

In this paper, we design novel low complexity encoders suited for moderate to high frame-rate UAV video coding that use global motion compensation for frame prediction. The global motion information is assumed to be available from other modules or easily derivable from the known movement of the UAV and camera mounts, and is specified using the homography transformation with eight parameters per frame (unlike the six parameter affine transformation used in our earlier papers [7], [8]). This global motion prediction approach compensates the motion in the entire frame, unlike the block motion based prediction used in mainstream video codecs where the motion for each block can be specified separately using a motion vector (MV).

Our encoders with global motion compensation achieve lower complexity and more than 40% bitrate savings compared to a complexity-constrained H.264/AVC encoder with block ME restricted to 8×8 blocks and half pixel accuracy, since our encoders replace the highly complex block ME engine of the H.264 encoder with the relatively simpler global motion compensation and do not need to transmit block MVs. We also design a second set of encoders with global motion compensation and a spectral entropy based bit allocation scheme that achieve near constant quality across frames, but at the cost of a small increase in delay, a slight increase in complexity, and a modest loss in compression efficiency. We show that these encoders with global motion in conjunction with their corresponding low complexity “matched” decoders provide substantial gains over the Wyner-Ziv DISCOVER codec [10] which has a low complexity encoder and a high complexity decoder. Additionally, we demonstrate that the complexities of the proposed encoders can be significantly reduced without significant compression performance losses, for videos with global motion that is spatially consistent within 2×2 blocks

1932-4553 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

(not presented in our earlier work [7]–[9]). Furthermore, we include complexity analysis and performance results for a complexity-constrained H.264/AVC encoder in which motion vectors are initialized using the global motion information derived from the known movement of the UAV and camera mounts, also not presented in our earlier papers [7]–[9]. Furthermore, the analyses and discussions in this paper are more detailed and comprehensive than in our earlier papers [7]–[9].

This paper is organized as follows. Section II presents a brief overview of related research on low complexity video encoders and the use of global motion models for motion estimation and/or compensation in video codecs. Section III discusses the architecture of the proposed encoders and the global motion compensation and the spectral entropy based bit allocation scheme used in our encoders. Section IV enumerates and introduces the notation for all the different encoders analyzed and compared in this paper. Section V presents a theoretical analysis of the complexities of the proposed and other hybrid encoders in terms of the number of computations, number of memory accesses, and storage buffer sizes required. Section VI evaluates and compares the rate-distortion performances of our proposed encoders to that of the other H.264 based encoders and the Wyner-Ziv DISCOVER codec. Finally, Section VII summarizes the paper and draws conclusions.

II. RELATED PRIOR RESEARCH

A. Low Complexity Encoders

Many researchers have worked on low complexity versions of standard video encoders like H.264/AVC and MPEG-4. The low complexity baseline profile of the H.264/AVC standard discards complex coding toolsets such as B slices and CABAC entropy coding [5], resulting in performance loss. Works such as [11], [12] design encoders that can achieve multiple complexity-rate-distortion points by choosing lower complexity coding modes or by dynamically changing the block ME search range. Fast mode decision algorithms [13], [14] have also been proposed to reduce the complexity of H.264 encoders.

A complexity analysis [15] of the recently finalized HEVC standard states that the “implementation cost of a HEVC decoder is not expected to be higher than that of an H.264/AVC decoder”, but the encoder is “expected to be several times more complex than an H.264/AVC encoder.” Therefore, one can anticipate that low complexity HEVC encoders would be developed in a manner similar to those of the low complexity H.264/AVC encoders.

Outside the standards community, recent research on low complexity video encoders has primarily been based on the theory of Wyner-Ziv distributed source coding [16]. The first Wyner-Ziv based video coding algorithms were developed simultaneously by a group at Stanford University [17] and a group at the University of California, Berkeley [18]. Although it has been shown under i.i.d. and Gaussian assumptions that such architectures achieve the same performance as conventional motion compensated predictive video coding systems [19], these results have not been achieved by practical Wyner-Ziv based codecs. The DISCOVER codec, a Wyner-Ziv codec based on the Stanford architecture, has been shown to outperform H.264/AVC intra and sometimes H.264/AVC ‘zero-motion’ standard coding [10].

One of the drawbacks in many implementations of the Stanford architecture is their use of a feedback channel for encoder rate control [20]. The feedback channel requires low delay decoding at the receiver because the Wyner-Ziv frame is decoded/reconstructed several times in order to compute the total number of bits required for successful reconstruction of that frame [20]. There have been modifications proposed to avoid the feedback channel in [21]–[23]; however, these modifications increase the complexity of the encoder and result in a loss of quality, especially for high bit-rates. Since in our target applications, imposing instantaneous decoding of the video stream at the receiver can be too restrictive, the DISCOVER codec based on the Stanford architecture is not suitable.

B. Global Motion in Video Codecs

The MPEG4 Part 2 video coding standard supported global motion using a single affine transformation to describe the motion of an entire frame with respect to its reference frame. However, the global motion toolset was not widely adopted since the large complexity cost of global motion estimation did not provide commensurate gains in rate-distortion performance [24]. Therefore, the H.264/AVC standard that succeeded MPEG4 Part 2 did not support global motion compensation. There have been efforts [25], [26] to introduce additional global motion prediction modes in H.264/AVC codecs that provide up to 27% bitrate savings over the standard H.264/AVC codec.

Global motion has also been utilized within a temporal filtering framework to improve the performance of H.264/AVC codecs [27] and HEVC codecs [28]. Stojanovic and Ohm [29] have used global motion compensation between temporally distant frames to improve the compression efficiency of HEVC, particularly for video sequences with camera zoom. However, all these techniques use global motion within a block ME framework for standard video sequences and not for motion compensating entire frames in video sequences with primarily global motion.

Video codecs tailored for UAV applications have also been proposed that use the available global motion information. Gong et al. [30] use a homography model for the global motion, merge the first intraframe and subsequent interframe residues in a group of pictures (GOP) into a single “big image,” and code this image using JPEG2000. However, the residue data in the “big image” is not conducive to JPEG2000 compression since JPEG2000 is primarily designed for natural images. The work by Rodriguez et al. [31] uses the available global motion information to initialize MVs and thus simplify the block ME in an MPEG-4 encoder. Recently, Soares and Pinho [32] and Angelino et al. [33] have presented modifications of the H.264/AVC encoder to initialize MVs using the camera motion information from UAV sensors. However, these approaches still perform block ME, albeit at a lower complexity, and transmit the derived block MVs. Transmitting the global motion information instead of the derived MVs is more rate efficient for fly-over videos.

Fig. 1. Architecture of the proposed low complexity video encoders. The blocks that are different from a standard H.264/AVC encoder are indicated by shading.

III. PROPOSED ENCODER ARCHITECTURE

The block diagram of the proposed low complexity encoders is shown in Fig. 1. The input video sequence of resolution $W \times H$ is divided into groups of pictures (GOPs). The first frame in each GOP is intra coded and the remaining frames in the GOP are inter-predicted from global motion compensated reference frames. Unlike in conventional block ME, the global motion parameters apply to the entire frame and not to individual blocks. Then, the prediction residue is passed through a spatial 2-D transform and the transform coefficients are quantized and entropy coded. If the quantizer is to be designed using the spectral entropy bit allocation scheme developed in [34], then the transform coefficients are buffered, their variances are estimated, significant coefficients to be retained are selected, bits are allocated, and scalar quantizers that operate independently on the transform components are designed.

Our new, low complexity video encoders use the 4×4 H.264 integer transform [35] and the H.264 quantization framework. One of them uses the default H.264 quantization matrix (QM) while the other uses QMs designed using spectral entropy based bit allocation. The GOP size is set to 8 and interframes are predicted using 2 reference frames: the previous frame and the next available intraframe.

A. Global Motion Compensation and Parameter Derivation

In our application, the observed scene can be approximated to be planar since the camera mounted on the UAV is sufficiently far from the objects being observed. Therefore, the motion across frames, which is dominated by the camera movement, can be specified using the homography model [36], which assumes the observed scene to be planar. This planar assumption is violated only at the edges of high 3D objects such as buildings and results in higher prediction error for a very small fraction of pixels in the video sequences in our dataset.

The eight parameter homography model [36] can handle changes in view-point (perspective) in addition to rotation, scaling, translation, and shear and is represented as

$$ s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \mathbf{H} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (1) $$

where $(x, y)$ are the pixel co-ordinates in the input image to be homography transformed, $(x', y')$ are the co-ordinates after the homography transformation, and $s$ is an auxiliary scaling factor. Even though the homography matrix $\mathbf{H}$ is a 3×3 matrix, it has only 8 degrees of freedom due to the scaling factor. Therefore, two matrices that differ only by scale are equivalent.

UAVs have navigation systems that estimate the position and attitude of the UAV typically using inertial navigation sensors (INS) and/or global positioning sensors (GPS) [37]. The position and attitude obtained using these sensors could be used to estimate the camera motion and hence the global motion in the video sequence as follows. The homography matrix that transforms the image captured at time instant $t_1$ to the co-ordinates of the image captured at time instant $t_2$ can be derived as

$$ \mathbf{H} = \mathbf{K} \left( \mathbf{R} - \frac{\mathbf{t}\,\mathbf{n}^T}{d} \right) \mathbf{K}^{-1} \qquad (2) $$

where $\mathbf{K}$ is the camera calibration matrix, $\mathbf{R}$ is the camera rotation matrix, $d$ is the distance of the camera from the imaging plane (height of the UAV from the ground), $\mathbf{t}$ is the camera translation vector, and $\mathbf{n}$ is a unit normal on the imaging plane with respect to the camera axes at $t_1$ [31]. The camera calibration matrix has the form

$$ \mathbf{K} = \begin{bmatrix} f/p_w & 0 & c_x \\ 0 & f/p_h & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (3) $$

where $f$ is the focal length, $p_w$ and $p_h$ are the physical width and height of a pixel on the camera's sensor, and $c_x$ and $c_y$ are the horizontal and vertical offsets of the camera's center of projection in the image plane.

However, the motion estimates from the INS could be inaccurate due to accumulated errors especially in low-cost UAVs, and data from the GPS system could be unavailable due to the inaccessibility of external satellite signals [37]. To overcome these limitations, vision-based methods [37]–[40] have been developed that use the camera-captured video data to either improve the accuracy of the position and attitude estimated using INS/GPS or compute the position and attitude without relying on the INS/GPS data. Therefore, the homography transformation used for global motion compensation in our proposed encoders could be derived from the position and attitude estimated using such vision-based methods. Furthermore, when the position and attitude are estimated using vision-based methods such as [37], [38] that compute the homography transformation between video frames, the already computed homography parameters can be used for global motion compensation in our proposed encoders.

Given the homography matrix that describes the motion of the current frame with respect to the reference frame, the motion

across them is compensated by computing $(x', y')$ for each full-pixel position $(x, y)$ in the current frame using (1) and reading the data at location $(x', y')$ in the reference frame, approximated to the nearest half-pixel position with the pixel data being generated using the H.264/AVC interpolation filter. Moreover, in the cases when some portions of the observed scene towards the edges of the current frame are not present in the reference frame, the pixels in those regions of the current frame are predicted from the border pixels of the reference frame extrapolated using the nearest-neighbor algorithm. However, since our encoders use bidirectional prediction, these “new” regions in the current frame are present in either of the past or future reference frame and hence do not greatly affect the encoder rate-distortion performance for the video sequences studied here.

The homography matrix can be viewed as a concise description of the translation of each pixel at location $(x, y)$ in the current frame to the pixel at location $(x', y')$ in the reference frame. Therefore, if the global motion is spatially consistent within a small spatial neighborhood, we can approximate that all the pixels in the spatial neighborhood defined by $\{(x+i, y+j) : 0 \le i < p,\; 0 \le j < q\}$ in the current frame get translated by the same amount to match the spatial neighborhood $\{(x'+i, y'+j) : 0 \le i < p,\; 0 \le j < q\}$ in the reference frame. Such an approximation further reduces the computational complexity of the homography transformation since (1) needs to be evaluated only once for each spatial neighborhood. Later in Sections V and VI, we demonstrate that this approximation of the homography transformation for 2×2 blocks (i.e., $p = 2$ and $q = 2$) significantly reduces the motion compensation complexity with a very small loss in compression performance.

B. Spectral Entropy Bit Allocation

In this section, we present a bit allocation scheme based on the principles of spectral entropy developed by Campbell [41] and Yang et al. [42] as a basis for efficiently sampling frequency coefficients. Yang et al. [42] have shown that the Campbell bandwidth is the minimum average bandwidth for encoding the process across all possible distortion levels. In addition, Jung and Gibson [43] have obtained an expression for coefficient rate using the logarithm of the ratio of rate-distortion function slopes for the given source and a uniform source, where the logarithm is averaged over large distortions. These results indicate a relationship between coefficient rate and the rate-distortion function of a source. Therefore, we have developed a spectral entropy based bit allocation scheme [34], [44] starting with the spectral entropy based coefficient selection method developed by Yang et al. [42]. Here we include a brief derivation and discussion of our bit allocation scheme for completeness.

Consider a zero-mean, stationary, continuous-time random process $x(t)$. Using the K-L expansion in the time interval $[0, T]$, the process can be decomposed as

$$ x(t) = \sum_{i=1}^{\infty} x_i\,\phi_i(t), \qquad 0 \le t \le T \qquad (4) $$

where the $\phi_i(t)$'s are normalized eigenfunctions and the $x_i$'s are uncorrelated random variables with $E[x_i] = 0$ and $E[x_i^2] = \sigma_i^2$. Without loss of generality, we can assume that the components are ordered based on their energies, i.e., $\sigma_1^2 \ge \sigma_2^2 \ge \cdots$. Let there be $N$ sampling functions (blocks/frames) each with $n$ such components. Out of the total $Nn$ coefficients, let $N_c$ coefficients be coded. Then the spectral entropy-based coefficient selection [42], [45] dictates that the number of coefficients coded in each component should be proportional to the variance of that component, i.e., $n_i = N_c\,\sigma_i^2 / \sum_{j=1}^{n} \sigma_j^2$, where $\sum_{i=1}^{n} n_i = N_c$ and $0 \le n_i \le N$.

If $b_i$ is the average number of bits spent to code a coefficient of component $i$, the total number of bits spent is $B = \sum_{i=1}^{n} n_i b_i + B_{\text{map}}$, where $B_{\text{map}}$ is the number of bits required to code the binary significance map that indicates the significant coefficients. The coding distortion is generated by two sources: quantization and discarding coefficients. Hence the expected value of the distortion of the $i$th component can be written as

$$ E[D_i] = n_i\,\varepsilon_i^2\,\sigma_i^2\,2^{-2 b_i} + (N - n_i)\,\sigma_i^2 \qquad (5) $$

In this equation, the quantization error is computed assuming that the overload distortion is negligible and the high-resolution approximation holds, and $\varepsilon_i^2$ is a constant determined by the distribution of the normalized random variable $x_i / \sigma_i$ [46].

Hence, the problem of bit allocation is to find $b_i$ for $1 \le i \le n$ so as to minimize $D = \sum_{i=1}^{n} E[D_i]$ subject to the bit budget constraint that $\sum_{i=1}^{n} n_i b_i \le B - B_{\text{map}}$. Using Lagrangian optimization methods, the number of bits allocated to each of the coded coefficients of component $i$ can be shown to be

$$ b_i = \frac{B - B_{\text{map}}}{N_c} + \frac{1}{2} \log_2 \frac{\varepsilon_i^2\,\sigma_i^2}{\prod_{j=1}^{n} \left( \varepsilon_j^2\,\sigma_j^2 \right)^{n_j / N_c}} \qquad (6) $$

This is similar to the result of classical bit allocation [46] except that the geometric mean of the $\sigma_j^2$'s has been replaced by $\prod_{j=1}^{n} (\sigma_j^2)^{n_j/N_c}$ and the geometric mean of the $\varepsilon_j^2$'s has been replaced by $\prod_{j=1}^{n} (\varepsilon_j^2)^{n_j/N_c}$. The corresponding total distortion is

$$ D = N_c \left( \prod_{j=1}^{n} \left( \varepsilon_j^2\,\sigma_j^2 \right)^{n_j / N_c} \right) 2^{-2 (B - B_{\text{map}}) / N_c} + \sum_{i=1}^{n} (N - n_i)\,\sigma_i^2 \qquad (7) $$

The proposed bit allocation method examines the input transform coefficients and chooses to code only those that are significant for retaining signal fidelity. Therefore, this method adapts to the actual coefficient values that need to be coded. In contrast, the classical bit allocation method relies entirely on the energies of the transform components and hence is designed for a class of inputs, all having the same component energies but different coefficient values. Ortega and Ramchandran [47] have noted that “input-by-input” approaches that adapt to the source data being compressed are likely to be superior to “one size fits all” approaches that are designed to perform well on average for a class of inputs.
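The selection and allocation rules above can be sketched as follows. This is a minimal illustration under the stated high-resolution assumptions, not the authors' implementation: the function names and array-based interface are ours, `sigma2` holds the component energies $\sigma_i^2$, `B` is the bit budget net of the significance map, and the fractional $n_i$ are left unrounded for simplicity.

```python
import numpy as np

def select_coefficients(sigma2, Nc):
    """Spectral entropy-based selection: the number of coefficients coded
    in component i is proportional to its variance sigma_i^2."""
    return Nc * sigma2 / sigma2.sum()

def allocate_bits(sigma2, n, Nc, B, eps2=None):
    """Bits per coded coefficient of each component: the Lagrangian
    solution of (6), with the weighted geometric mean of
    eps_j^2 * sigma_j^2 (weights n_j / Nc) in place of the classical
    geometric mean."""
    if eps2 is None:
        eps2 = np.ones_like(sigma2)   # same normalized distribution for all i
    log_es = np.log2(eps2 * sigma2)
    return B / Nc + 0.5 * (log_es - np.sum((n / Nc) * log_es))
```

By construction the weighted terms cancel in the sum, so $\sum_i n_i b_i$ equals the budget `B` exactly, and components with larger energies receive both more coded coefficients and more bits per coefficient.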

For the case when all the components have the same normalized distribution, i.e., $\varepsilon_i^2 = \varepsilon^2$ for all $i$, (6) can be rewritten as

$$ b_i = \frac{B - B_{\text{map}}}{N_c} + \frac{1}{2} \left( H_s - \log_2 \sum_{j=1}^{n} \sigma_j^2 \right) + \frac{1}{2} \log_2 \sigma_i^2 \qquad (8) $$

where $H_s = -\sum_{j=1}^{n} (n_j/N_c) \log_2 (n_j/N_c)$ is the spectral entropy expressed in bits. In (8), the first term ❶ is an average bitrate, the second term ❷ depends on the source, and the last term ❸ depends on the current component being coded.

The corresponding classical bit allocation equation can be expressed as

$$ b_i = \frac{B}{Nn} + \frac{1}{2} \log_2 \sigma_i^2 - \frac{1}{2} \log_2 \left( \prod_{j=1}^{n} \sigma_j^2 \right)^{1/n} \qquad (9) $$

where, relative to (8), the component term $\frac{1}{2}\log_2 \sigma_i^2$ now appears in ❷ and the combination of the spectral entropy and the arithmetic mean of the $\sigma_j^2$'s in ❷ of (8) is replaced by the geometric mean of the $\sigma_j^2$'s in ❸.

In a two-dimensional discrete cosine transform (DCT) based coding scheme, Mester and Franke [48] used the spectral entropy (corresponding to $H_s$) and the energy (corresponding to $\sum_j \sigma_j^2$) of the transform coefficients to classify data blocks and adopted different coding strategies for the different classes. They used these two measures to estimate the amount of tolerable errors and sensitivity to quantization and/or truncation and thus develop an adaptation scheme for a threshold coding system. The spectral entropy and energy were treated as orthogonal entities in their work, whereas (8) provides a way to combine the average bitrate (first term), spectral entropy (second term), and energy (third term) for bit allocation.

TABLE I
NOMENCLATURE OF THE HYBRID ENCODERS EVALUATED.

IV. ENCODERS EVALUATED AND NOMENCLATURE

In this paper, we compare the complexity and compression performance of eight different encoders: two versions of the proposed encoder with global motion compensation and default H.264 QM; two versions of the proposed encoder with global motion compensation and spectral entropy QMs; two H.264 based encoders with block ME; one zero motion encoder without any motion estimation or compensation; and one Wyner-Ziv based DISCOVER encoder (used with a high complexity decoder). This section introduces each of these encoders and the notation used to refer to them in the rest of this paper.

The proposed encoder with global motion compensation and default H.264 QM is termed the GMh_1×1 encoder where ‘GM’ stands for global motion and ‘h’ for default H.264 QM, and ‘1×1’ indicates that the homography transformation in (1) is applied to each pixel or 1×1 block. Similarly, the proposed encoder with global motion compensation and QMs designed using the spectral entropy based bit allocation is referred to as the GMs_1×1 encoder where ‘s’ stands for spectral entropy QM. We also modify the global motion compensation in the GMh_1×1 and GMs_1×1 encoders to compute the homography transformation for 2×2 blocks (as discussed in Section III-A) instead of for each pixel and denote these encoders as GMh_2×2 and GMs_2×2. In Section V, we show that these GMh_2×2 and GMs_2×2 encoders have much lower complexity than the GMh_1×1 and GMs_1×1 encoders.

We compare our proposed encoders to a complexity-constrained H.264 encoder with block ME restricted to 8×8 blocks, half pixel MV accuracy, and no deblocking filter. This encoder is termed the BMh_8×8 encoder where ‘BM’ denotes block motion. We have found that for the test sequences presented in this paper, an ME search area of 32×32 in the BMh_8×8 encoder gives good results at reasonable complexity. We also include in our comparison a global motion assisted H.264 encoder that we denote as BGMh_8×8 where ‘BGM’ stands for global motion initialized block ME. The BGMh_8×8 encoder is the BMh_8×8 encoder with its MVs initialized (before block ME) using the available global motion parameters and is similar to the encoders proposed for UAV applications in [31]–[33]. Since the MVs in this encoder are initialized using global motion, the search range for block ME in the BGMh_8×8 encoder can be reduced to 16×16 to achieve compression performance similar to that of the BMh_8×8 encoder. As a baseline reference for the performances of the above encoders, we include an H.264 encoder with zero motion where all the MVs are set to zero and hence no bits are spent on transmitting MVs. We denote this encoder as ZMh_8×8 where ‘ZM’ represents zero motion and 8×8 represents the block sizes for which the decision of uni-directional or bi-directional prediction is made. Interframe prediction in all these encoders (GMh_1×1, GMh_2×2, GMs_1×1, GMs_2×2, BMh_8×8, BGMh_8×8, and ZMh_8×8) is performed using 2 reference frames. The nomenclature of these encoders is summarized in Table I.

We do not include analysis and results for the BMh_4 4 TABLE II


encoder with ME blocks allowed to be of size 4 4 or 8 8. The SUMMARY OF COMPLEXITY ANALYSIS OF THE HYBRID ENCODERS IN
BMh_4 4 encoder has poorer compression performance than TABLE I FOR 640 480 SEQUENCE AND 2 REFERENCE FRAMES.
the BMh_8 8 encoder because no rate-distortion optimized
mode decision is used in either encoder due to its complexity.
Further, the drop in performance with the use of ME on 4 4
blocks is because the bitrate increase due to the greater number
of MVs in BMh_4 4 compared to BMh_8 8 does not offset
the quality gains obtained due to better motion prediction. In
addition, the number of memory accesses required for ME on
4 4 blocks is about twice that required for ME on 8 8 blocks
and this is one of the main reasons why 4 4 blocks sizes are
not allowed for inter-prediction in HEVC [6].
All the encoders discussed so far in this section utilize some
form of inter-frame prediction around a 2D spatial transform
and hence are termed hybrid encoders. In this paper, we com- Here the term macroblock (MB) is used to refer to an image area
pare the performances of these hybrid encoders with the DIS- of 16 16 pixels. The memory accesses and computations are
COVER codec [10] that uses a low-density parity-check accu- evaluated only for the processing blocks that differ across the
mulate (LDPCA) code based channel encoder and a high com- encoders viz. block motion estimation, global motion compen-
plexity decoder with motion interpolation. sation, and spectral entropy-based QM design. The complexity
analysis is summarized in Table II with details provided in the
V. COMPLEXITY ANALYSIS respective subsections that follow.
The BGMh_8 8 encoder requires fewer computations and
The electronic components of UAV payloads are resource memory accesses than the BMh_8 8 encoder because the use
constrained largely due to power limitations. The dynamic of global motion to initialize MVs allows reduced ME search
power consumed in a circuit is proportional to its clock fre- area size and average number of block candidate searches.
quency. Reducing the number of computations per unit time The GMh_1 1 encoder also requires fewer computations and
would enable running the circuit at a lower clock frequency, memory accesses than the BMh_8 8 encoder since it replaces
thus resulting in lower power consumption [49]. Therefore, the number of computations is an important factor contributing to the complexity of a video codec system.

Another important factor contributing to the video encoder's power consumption is the number of external SDRAM memory accesses. For encoding 720p video, the SDRAM memory accesses will have a significant power consumption of about 40 – 120 mW with 1 reference frame, and higher with more reference frames [49]. An additional parameter frequently used to characterize the complexity of a video codec system is the size of the memory/storage required.

In addition to the number of computations, the number of memory accesses, and the memory (storage) requirements, the complexity of real-world systems depends on a number of other factors, such as the availability of hardware accelerators, that are highly dependent on the particular implementation of the system. However, the number of computations, the number of memory accesses, and the memory requirements are common and significant factors contributing to the complexity of real-world systems. Therefore, in this section, we present a theoretical analysis of the complexity in terms of the number of computations, the number of memory accesses (reads and writes), and the memory requirements as a necessary first step towards comparing the complexities of the various encoders presented in this paper. Complexity analyses of the implementations of these encoders in real-world systems are consigned to future work.

A. Complexity of Encoders

In this subsection, we evaluate the complexities of all the hybrid encoders listed in Table I in terms of the average number of computations required for processing one macroblock of data, the average number of memory reads and writes required per macroblock processed, and the memory (storage) requirements. The proposed GMh and GMs encoders replace the complex block ME with the simpler global motion compensation. The number of computations for global motion compensation can be reduced to a fourth by computing the homography once for each 2×2 block in the GMh_2×2 encoder instead of for every pixel in the GMh_1×1 encoder. The use of spectral entropy based QM design increases the number of computations, memory accesses, and storage requirements for the GMs_1×1 and GMs_2×2 encoders, but these encoders still require fewer memory accesses than the BMh_8×8 encoder. The ZMh_8×8 encoder is the least complex since it does not perform block ME or global motion compensation. All these hybrid encoders have similar storage requirements except the GMs_1×1 and GMs_2×2 encoders, which require about 300 kB of additional memory for buffering transform coefficients.

The Wyner-Ziv DISCOVER codec encoding time has been shown to be less than 1/6-th the encoding time of the zero motion H.264 encoder [50]. Therefore, we conjecture that the DISCOVER encoder has lower complexity than the hybrid encoders discussed above.

1) Computations: From (1), the global motion compensation using the homography seems to require 6 multiplications and 6 additions per pixel. This translates to 6(N/b)²R multiplications and 6(N/b)²R additions per MB, where N is the size of the MB, b is the block size for which the homography transformation is applied (b = 1 for the GMh_1×1 and GMs_1×1 encoders and b = 2 for the GMh_2×2 and GMs_2×2 encoders), and R is the number of reference frames used. However, on closer examination, we observe that for a video frame of width W pixels and height H pixels, the pixel co-ordinates x and y take only a finite set of W and H values respectively. Hence we can pre-compute the products of the homography entries with each possible x and each possible y, store them, and reuse them to reduce the computational complexity of global motion compensation.
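This table-based shortcut can be sketched as follows (a minimal illustration only, not the paper's MATLAB implementation; the function and table names are ours). Per pixel, the direct mapping uses 6 multiplications and 6 additions plus the perspective division, while the table-based form replaces the per-pixel multiplications with look-ups:

```python
def warp_direct(H, x, y):
    """Map pixel (x, y) through 3x3 homography H: 6 mults, 6 adds, 2 divides."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def build_tables(H, width, height):
    """Pre-compute H[i][0]*x for every column x and H[i][1]*y for every row y."""
    col = [[H[i][0] * x for x in range(width)] for i in range(3)]
    row = [[H[i][1] * y for y in range(height)] for i in range(3)]
    return col, row

def warp_with_tables(H, col, row, x, y):
    """Same mapping using only additions (and the final divisions)."""
    u = col[0][x] + row[0][y] + H[0][2]
    v = col[1][x] + row[1][y] + H[1][2]
    w = col[2][x] + row[2][y] + H[2][2]
    return u / w, v / w
```

Both paths produce the same mapped coordinates, which is what the reduced operation count relies on; the tables cost 3(W + H) stored values per homography.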

TABLE III
COMPUTATIONS PER MACROBLOCK FOR THE GMH_b×b AND GMS_b×b ENCODERS.

TABLE IV
MEMORY ACCESSES PER MACROBLOCK FOR THE GMH_b×b AND GMS_b×b ENCODERS.

Therefore, the number of additions required per MB for global motion compensation using the homography transformation is given by (10), and the number of multiplications required per MB is given by (11).

The spectral entropy based QM design in the GMs_b×b encoders involves computing the energy of the coefficients, which requires about N² additions and N² multiplications per MB. The numbers of computations per MB for the GMh_b×b and GMs_b×b encoders are listed in Table III for b = 1 and 2, and R = 2.

On the other hand, the complexity of evaluating a single candidate block during block ME is equal to the complexity of the sum of absolute differences (SAD) computation, i.e., 128 additions for an 8×8 block. For video sequences typical of UAV surveillance, and with half-pixel MV accuracy and a block ME search range of 32×32, the EPZS algorithm [51] requires about 11 candidate searches per block on average with 2 reference frames. Therefore, the BMh_8×8 encoder requires about 5632 additions per MB (11 candidates × 4 blocks × 128 additions).

The BGMh_8×8 encoder uses a reduced ME search area of 16×16 (compared to the 32×32 ME search area of the BMh_8×8 encoder) and hence requires only about 5 candidate searches per block on average with 2 reference frames (compared to the 11 candidate searches required on average per block for the BMh_8×8 encoder). The ME engine in the BGMh_8×8 encoder also includes the computation of global motion for 8×8 blocks that involves about 24 additions and 16 multiplications per MB.

Since the proposed GMh_1×1, GMh_2×2, GMs_1×1, and GMs_2×2 encoders with global motion compensation and the BMh_8×8 and BGMh_8×8 encoders all use half-pixel accurate motion compensation with the same H.264/AVC interpolation filter, the computational complexity of pixel interpolation in all these encoders is assumed to be the same and hence is excluded from the analysis here.

In order to compare the computational complexities of these various encoders that require different numbers of additions and multiplications, it is beneficial to approximate each multiplication operation by an equivalent number of addition operations that consume the same power. The ratio of the power consumed by the 16-bit multipliers proposed in [52] to that of the 16-bit adders considered in [53] is between 2 and 4 in the same silicon technology. Therefore, by approximating a multiplication to be power-equivalent to 3 additions, the number of equivalent additions required per MB can be computed for the GMh_1×1, GMh_2×2, GMs_1×1, GMs_2×2, and BGMh_8×8 encoders. Hence, the GMh_1×1, GMh_2×2, and GMs_2×2 encoders require fewer computations than the BMh_8×8 encoder, while the GMs_1×1 encoder requires a similar number of computations as the BMh_8×8 encoder. The BGMh_8×8 encoder requires fewer computations than the GMh_1×1 and GMs_1×1 encoders, but more than the GMh_2×2 and GMs_2×2 encoders.

For the ZMh_8×8 encoder, no computations are necessary for motion estimation or compensation, since all MVs are set to zero. Therefore, it is the least computationally intensive encoder among all the hybrid encoders.

2) Memory Accesses: In this subsection, we compute the number of external memory accesses assuming cache sizes to be smaller than the memory required to store the pixels of one video frame but large enough to store the block ME search area used by the BMh_8×8 and the BGMh_8×8 encoders.

The global motion compensation operation requires N²R memory reads (each accessing 8 bits) per MB, corresponding to the access of data from the R reference frames. This number does not depend on b, the size of the block for which the homography transformation is applied. Since the spectral entropy based QM design requires the energy of the coefficients to be computed before the QM is designed, it involves additional memory reads and writes of transform coefficient data, which typically have higher 14-bit precision compared to 8-bit pixel data. Therefore, treating one 14-bit memory access as equivalent to two 8-bit memory accesses, the spectral entropy based QM design necessitates an additional 2N² 8-bit memory writes per MB to store the coefficients and 2N² 8-bit memory reads per MB to access the coefficients for quantization after the QM is designed. The 8-bit memory accesses for the proposed encoders are tabulated in Table IV with N = 16 and R = 2. In terms of power consumed, memory writes are considered equivalent to memory reads since the two memory access operations consume similar amounts of power for a given memory bandwidth [54].

For computing the number of 8-bit memory accesses in the ME engine of the BMh_8×8 encoder, we assume ME blocks of size 8×8 and a search area of size 32×32 centered about the same location as the current block. If the overlapping regions of search areas within the same reference frame are reused for adjacent blocks (level C memory reuse as described in [55]), the number of 8-bit memory accesses per block is given by (12).

The assumption that the search area is centered about the same location as the current block is very restrictive and

can result in reduced coding efficiency for fast motion video sequences, since motion vectors (MVs) can exceed the given search range [56]. Hence, in most cases, the search area is centered about the MV predictor and a more sophisticated search area reuse algorithm such as [56] needs to be used. Since the memory reuse is much less when the search area is centered about the MV predictor than when it is centered about the same location as the current block, the average number of memory accesses is larger.

Considering the best scenario in which the search area is centered about the same location as the current block, the number of memory accesses per MB for ME can be computed from (12) for the BMh_8×8 encoder with search area 32×32 and for the BGMh_8×8 encoder with search area 16×16. The corresponding number for the GMh_b×b encoders is 512 and that for the GMs_b×b encoders is 1536. Consequently, the number of memory accesses required by the GMh_b×b encoders is less than that required by both the BMh_8×8 and BGMh_8×8 encoders. The GMs_b×b encoders require fewer memory accesses than the BMh_8×8 encoder, but more than that required by the BGMh_8×8 encoder.

The ZMh_8×8 encoder only requires the co-located blocks from the reference frame for motion compensation, and hence needs memory reads per MB only for these blocks. Therefore, it requires as many memory accesses as the GMh_b×b encoders, but fewer memory accesses than the GMs_b×b encoders.

TABLE V
MEMORY REQUIREMENTS FOR THE GMH_b×b AND GMS_b×b ENCODERS.

3) Memory Requirements: The external memory requirements of the encoders include W·H·R bytes for the reference frame buffers, W·H bytes for the current frame buffer, storage for the pre-computed quantities used in global motion compensation, and storage for constants that depend on the size of the 2-D spatial transform used. In addition to the buffers used by the GMh encoders, the GMs encoders require additional storage for the transform coefficient buffer and the spectral entropy QM look-up table. The buffer for storing the transform coefficients with 14-bit precision has to be 2·W·H bytes large. However, if the bottom half of this buffer is used to store the current frame (with 8-bit pixel data), the additional memory required for the transform coefficient buffer is only W·H bytes. The spectral entropy QM look-up table requires a comparatively small number of additional bytes. These memory requirements are presented in Table V for W = 640, H = 480, and R = 2.

The BMh_8×8, BGMh_8×8, and ZMh_8×8 encoders require the same W·H·(R + 1) bytes for the current and reference frame buffers and a small number of bytes for constants. In addition, the BGMh_8×8 encoder requires storage for global motion compensation with b = 8. Therefore, all the hybrid encoders evaluated here (BMh_8×8, BGMh_8×8, ZMh_8×8, GMh_1×1, GMh_2×2) have similar storage requirements except the GMs_1×1 and GMs_2×2 encoders, which require about 300 kB of extra storage.

B. Complexity of Global Motion Parameter Derivation

The homography model parameters of a video sequence can be derived as in (2) from the UAV position and attitude information that is available from the INS/GPS systems or vision-based estimation methods [37]–[40]. The vision-based position and attitude estimation methods, when employed, are considered part of the navigation system and hence their complexities are not included in the complexity analysis of our proposed encoders. In (2), the matrices are of size 3×3 and the vectors are of size 3×1. Since the camera calibration matrix is independent of the movement of the UAV and camera mount, it and its inverse can be pre-computed. Hence the homography matrix derivation requires 21 additions and 36 multiplications, which are power-equivalent to 129 additions. These 129 additions per frame for a 640×480 sequence correspond to about 0.1 additions per 16×16 MB. Assuming all the matrices and vectors in (2) are stored with 32-bit precision, 132 memory reads and 132 bytes of storage are required to compute the global motion parameters of each frame, which corresponds to 0.03 memory accesses per MB. Therefore, the derivation of the global motion parameters from the camera motion information is a very low complexity operation.

VI. COMPRESSION PERFORMANCE EVALUATION

In this section, we present results for the compression performance of the proposed encoders with global motion compensation used in conjunction with their corresponding low complexity “matched” decoders. We compare their performances with those of a complexity-constrained H.264 (BMh_8×8) encoder, a global motion assisted H.264 (BGMh_8×8) encoder, and a zero-motion H.264 (ZMh_8×8) encoder, also with their corresponding “matched” decoders. The different bitrate points for all these hybrid encoders are obtained at fixed H.264 quantization parameters (QPs) in the range 16 – 40. For all these encoders, the GOP size is set to 8 and the I frames are encoded using the same H.264 intraframe encoder. Implementations of all these encoders are in MATLAB r2011b.

We also include results for the Wyner-Ziv based DISCOVER codec [10], [57], which has a low complexity encoder and a high complexity decoder. We use a 2007 implementation of the DISCOVER codec provided on the project website [57]. Recent improvements to the DISCOVER codec, such as those proposed in [58], [59], have not been included since their implementations were not available. However, even with these improvements, the performance of the DISCOVER codec is very similar to or slightly better than that of the H.264/AVC codec with zero motion. For the Wyner-Ziv DISCOVER codec, the different bitrate points are achieved by varying the quantization index from 1 – 8. The side information at the decoder is derived via motion compensated interpolation using the 2 nearest reconstructed frames. The GOP size is set to 8 and I frames (also called key frames in Wyner-Ziv video coding literature) are coded using a H.264 intraframe encoder.

The video sequences used to evaluate the encoders range in resolution from 176×144 (QCIF) to 640×480 (VGA) with

Fig. 2. Rate-distortion performance for the 176×144 “aerial_beach1_crop” sequence. The encoder nomenclature remains the same as described in Table I. (a) Average PSNR vs. bitrate. (b) Average SSIM vs. bitrate.

Fig. 3. Quality variation across GOPs for the 176×144 “aerial_beach1_crop” sequence. The encoder nomenclature remains the same as described in Table I. (a) Standard deviation of PSNR vs. average PSNR. (b) Standard deviation of SSIM vs. average SSIM.
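Encoder pairs in these comparisons are ranked with the BD-rate measure of [64], which averages the log-rate gap between two rate-distortion curves over a common quality range. The following is only a rough illustration of the idea: a piecewise-linear stand-in for Bjontegaard's cubic-polynomial fit, assuming both encoders are sampled at the same PSNR points (all names are ours):

```python
import math

def bd_rate(psnr, rate_ref, rate_test):
    """Approximate BD-rate between two R-D curves sampled at the same PSNR
    points: trapezoidal average of the log10-rate gap over the PSNR span,
    mapped back to a fractional rate change. Negative values mean the test
    encoder saves bitrate at equal quality. Simplified piecewise-linear
    variant, not the cubic-fit procedure of [64]."""
    gap = [math.log10(rt) - math.log10(rr)
           for rr, rt in zip(rate_ref, rate_test)]
    span = psnr[-1] - psnr[0]
    # trapezoidal integral of the log-rate gap over the PSNR interval
    area = sum((gap[i] + gap[i + 1]) / 2 * (psnr[i + 1] - psnr[i])
               for i in range(len(psnr) - 1))
    return 10 ** (area / span) - 1
```

For identical curves the measure is 0; for a test encoder that needs a constant fraction of the reference rate at every quality point, it returns exactly that fractional change.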

frames captured at the rate of 24 to 30 frames per second. Although the global motion parameters for the proposed GMh and GMs encoders can be derived from the position and attitude of the UAV, we have found it difficult to obtain uncompressed video sequences with this metadata. Data obtained from the sensor data management system of the Air Force Research Laboratory [60] was found to contain incorrect camera information. Therefore, for the results presented in this section, we estimate the homography global motion parameters from the video sequence using the Scale Invariant Feature Transform (SIFT) [61] to find point correspondences between two video frames and the RANdom Sample And Consensus (RANSAC) algorithm [62] to determine the homography transformation that best describes the global motion between those two frames.

We evaluate the average rate-distortion performances of the encoders by plotting the average frame quality against the average bitrate. We also quantify the consistency of frame quality by computing the standard deviation of the frame quality within each GOP and plotting it against the average frame quality. The frame quality is measured using PSNR and SSIM [63], and encoder performances are compared using the BD-rate and BD-PSNR [64] measures.

In Figs. 2 and 3, we compare the performance of the various hybrid encoders listed in Table I with their respective “matched” decoders for 89 frames of the 176×144 (QCIF) “aerial_beach1_crop” sequence. We also include results for the Wyner-Ziv based DISCOVER codec [10]. Fig. 2 plots the average PSNR and SSIM against the average bit-rate, while Fig. 3 plots the standard deviation of PSNR and SSIM within GOPs against the average PSNR and SSIM.

From Fig. 2, we see that the BGMh_8×8 encoder performs as well as the BMh_8×8 encoder, but at lower complexity. This is because the BGMh_8×8 encoder uses smaller ME search areas centered around the MVs initialized using global motion parameters, but still transmits MVs.

Comparing the curves for the BMh_8×8 and the GMh_1×1 encoders, we note that the use of the known global motion instead of block motion gives a significant BD-rate saving of 40%, or equivalently a BD-PSNR improvement of 2.5 dB. This performance improvement is largely because the GMh_1×1 encoder does not need to transmit motion information, since it utilizes the global motion parameters derivable at both the encoder and decoder. In contrast, the BMh_8×8 encoder spends a significant fraction of its bitrate on coding MVs, ranging from 10% at higher bitrates to 80% at very low bitrates. The GMh_2×2 encoder provides compression performance similar (1% BD-rate increase) to that of the GMh_1×1 encoder, despite its much lower computational complexity. This is because for

TABLE VI
PERFORMANCE GAINS OF THE PROPOSED GMH_1×1 AND GMS_1×1 ENCODERS OVER THE BMH_8×8 ENCODER.

the “aerial_beach1_crop” sequence, the motion is spatially consistent within 2×2 blocks, i.e., the motions of all 4 pixels in each 2×2 block are very similar.

The use of spectral entropy based bit allocation in the GMs_1×1 encoder makes the quality across frames more constant, as seen in Fig. 3, but slightly degrades the average rate-distortion performance (BD-rate loss of 16% compared to the GMh_1×1 encoder). However, the GMs_1×1 encoder still outperforms the BMh_8×8 encoder, particularly at lower bitrates. The improvement in performance of the GMs_1×1 encoder with respect to that of the BMh_8×8 encoder is 30% in terms of BD-rate savings and 1.6 dB in terms of BD-PSNR increase. The compression performances of the GMs_1×1 and GMs_2×2 encoders are similar, once again because of the spatial consistency of the motion within 2×2 blocks.

The GMh_1×1 and GMs_1×1 encoders with corresponding low complexity “matched” decoders achieve BD-rate savings of 56% and 49% respectively over the DISCOVER codec, which has a low complexity encoder and a high complexity decoder. In addition, for the DISCOVER codec, the PSNR of frames within a GOP fluctuates considerably, as demonstrated in Fig. 3. This is because the quality of the side information derived at the decoder depends on the distance of the frame from the intra frames, and this affects the reconstructed frame quality. Unlike the DISCOVER codec, our proposed encoders do not require feedback channels and provide better compression efficiency than the BMh_8×8 encoder. The zero motion H.264 encoder (ZMh_8×8) has the poorest rate-distortion performance of all the encoders compared here and is included only for reference.

Similar results have been obtained for other test sequences with primarily global motion and are summarized in Table VI in terms of the BD-rate savings and BD-PSNR improvements of the GMh_1×1 and GMs_1×1 encoders with respect to the BMh_8×8 encoder. Since the performances of the BMh_8×8 and BGMh_8×8 encoders are similar, we choose to present the performance gains with respect to only the BMh_8×8 encoder. Along the same lines, we present results only for the GMh_1×1 and GMs_1×1 encoders, whose compression performances are similar to those of the GMh_2×2 and GMs_2×2 encoders respectively. In all cases, the proposed GMh_1×1 and GMs_1×1 encoders achieve significant compression performance improvement over the BMh_8×8 encoder.

In certain application scenarios, all the information about the UAV and camera motion required to derive the global motion parameters might not be available at the decoder, or the global motion parameters are derived from the video data. In such cases, the global motion parameters used in the video encoder need to be embedded in the video bitstream and transmitted. The transmission of an eight parameter homography matrix with single-precision requires 8 × 32 = 256 bits. If two homography matrices are transmitted for every frame, this would translate to an additional 15 kbps for a video sequence at 30 frames per second.

For the “aerial_beach1_crop” sequence for which results were presented in Figs. 2 and 3, the GMh_1×1 and GMs_1×1 encoders still achieve BD-rate savings of 34% and 24% respectively compared to the BMh_8×8 encoder, instead of the 40% and 30% BD-rate savings achieved without transmitting the homography parameters. Similar small drops in the performance gains of the GMh_1×1 and GMs_1×1 encoders over the BMh_8×8 encoder have been observed for the other test sequences when the homography parameters are transmitted.

Therefore, the additional bitrate required to transmit the global motion parameters to the decoder does not greatly affect the performance improvements achieved by the GMh_1×1 and GMs_1×1 encoders over the BMh_8×8 encoder. This additional bitrate can be reduced by predicting the global motion parameters for the current frame from the parameters of the previous frame using a method like that given in [65].

VII. DISCUSSION AND CONCLUSIONS

Motivated by UAV video applications, we have proposed low complexity (GMh_1×1) encoders that utilize the known global motion and are superior to the complexity-constrained H.264 (BMh_8×8) encoder with 8×8 block ME for fly-over videos, mainly due to two reasons: (a) they replace the highly complex block motion estimation engine with the relatively simpler global motion compensation and hence reduce the encoder complexity, and (b) they do not need to transmit MVs, which if transmitted can constitute a significant fraction of the bitstream. Our new (GMh_1×1) encoders achieve more than 40% BD-rate savings, or equivalently more than 1.7 dB BD-PSNR improvement, at lower complexity compared to the complexity-constrained H.264 (BMh_8×8) encoder. We have demonstrated that for videos with global motion that is spatially consistent within 2×2 blocks, the computational complexity of this encoder can be reduced by 75% to obtain the GMh_2×2 encoder without considerably affecting the compression performance.

We have also incorporated into these encoders a spectral entropy based QM design scheme that provides near constant quality within GOPs at the cost of small increases in delay and complexity, and a small drop in compression efficiency. Compared to the complexity-constrained H.264 encoder (BMh_8×8), these (GMs_1×1 and GMs_2×2) encoders with spectral entropy QM design still provide more than 28% BD-rate savings.

All our proposed encoders (GMh_1×1, GMh_2×2, GMs_1×1, GMs_2×2) provide significant compression gains of more than 49% BD-rate savings over the Wyner-Ziv DISCOVER codec. Furthermore, we have considered the case when the global motion parameters cannot be derived at the decoder and have shown that embedding the global motion parameters in the video bitstream only requires an additional 15 kbps and hence does not greatly reduce the compression performance gains of the proposed encoders.
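The 15 kbps overhead figure follows directly from the parameter count stated above; a quick check (function name is ours, values as given in the text):

```python
def homography_overhead_bps(n_params=8, bits_per_param=32,
                            matrices_per_frame=2, fps=30):
    """Bits per second spent on embedded homography parameters:
    8 single-precision (32-bit) parameters per matrix, two matrices
    per frame, 30 frames per second."""
    return n_params * bits_per_param * matrices_per_frame * fps

bits_per_matrix = 8 * 32              # 256 bits per homography matrix
overhead = homography_overhead_bps()  # 15360 bps, i.e. about 15 kbps
```

At 15360 bps, the overhead is small relative to the video bitrates considered, which is why the BD-rate gains drop only modestly when the parameters are transmitted.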

In our complexity analysis, we evaluate the encoders in terms of the following factors that most significantly contribute to their complexity and power consumption in real-world systems: the number of computations, memory accesses, and memory storage. The fewer computations and memory accesses required by our GMh_1×1, GMh_2×2, and GMs_2×2 encoders compared to the BMh_8×8 encoder can translate to lower power consumed for video encoding in UAVs. Moreover, the bitrate savings achieved at fixed video quality by our encoders can help reduce the power required to transmit the compressed video bitstream in UAVs, because a reduced bitrate directly translates into lower transmitted power.

The strength of our encoders is that they utilize the global motion information available in many UAV applications. In cases when the global motion parameters are not readily available or cannot be estimated accurately from UAV sensor data, they could be computed using fast image registration techniques that could possibly employ fast point-feature descriptors such as FAST-ER [66] and real-time RANSAC variations such as ARRSAC [67], or be refined using algorithms such as those in [68] from initial estimates derived from UAV sensor data. These estimated global motion parameters can be used to achieve the compression performance of any of our proposed encoders, albeit at a higher complexity due to the global motion parameter estimation or refinement. However, if the additional complexity of global motion parameter estimation or refinement is not acceptable, a H.264/AVC encoder like the BMh_8×8 encoder could be used at the cost of lower compression efficiency. Therefore, one needs to choose the appropriate encoder based on the complexity-compression trade-off flexibilities provided by the application and scenario.

REFERENCES

[1] R. Schneiderman, “Unmanned drones are flying high in the military/aerospace sector [Special reports],” IEEE Signal Process. Mag., vol. 29, no. 1, pp. 8–11, Jan. 2012.
[2] Z. Sarris, “Survey of UAV applications in civil markets,” in Proc. IEEE Mediterranean Conf. Control and Autom., 2001, pp. 1–11.
[3] D. L. Hench, P. N. Topiwala, and Z. Xiong, “Channel adaptive video compression for unmanned aerial vehicles (UAVs),” in Proc. SPIE 5558, Applicat. of Digital Image Process. XXVII, 2004, pp. 475–484.
[4] T. Klassen, “The UAV video problem: Using streaming video with unmanned aerial vehicles,” Military and Aerosp. Electron., vol. 20, no. 7, Jul. 2009.
[5] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[6] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[7] M. Bhaskaranand and J. D. Gibson, “Low-complexity video encoding for UAV reconnaissance and surveillance,” in Proc. Military Communicat. Conf. (MILCOM), 2011, pp. 1633–1638.
[8] M. Bhaskaranand and J. D. Gibson, “Global motion compensation and spectral entropy bit allocation for low complexity video coding,” in Proc. IEEE Int. Conf. Commun. (ICC), 2012, pp. 2043–2047.
[9] M. Bhaskaranand and J. D. Gibson, “Low complexity video encoding and high complexity decoding for UAV reconnaissance and surveillance,” in Proc. IEEE Int. Symp. Multimedia, Anaheim, CA, USA, Dec. 2013, pp. 163–170.
[10] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER codec: Architecture, techniques and evaluation,” in Proc. Picture Coding Symp. (PCS), 2007, vol. 17, pp. 1103–1120.
[11] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 5, pp. 645–658, May 2005.
[12] Y. Tan, W. Lee, J. Tham, and S. Rahardja, “Complexity-rate-distortion optimization for real-time H.264/AVC encoding,” in Proc. Int. Conf. Comput. Commun. Netw., Aug. 2009, pp. 1–6.
[13] L. Su, Y. Lu, F. Wu, S. Li, and W. Gao, “Real-time video coding under power constraint based on H.264 codec,” in Proc. SPIE 6508, Visual Commun. Image Process. (VCIP), Jan. 2007, vol. 6508, pp. 650802-1–650802-12.
[14] Y.-C. Lin, T. Fink, and E. Bellers, “Fast mode decision for H.264 based on rate-distortion cost estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2007, vol. 1, pp. I-1137–I-1140.
[15] F. Bossen, B. Bross, K. Suhring, and D. Flynn, “HEVC complexity and implementation analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1685–1696, Dec. 2012.
[16] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. IT-22, no. 1, pp. 1–10, Jan. 1976.
[17] A. Aaron, R. Zhang, and B. Girod, “Wyner-Ziv coding of motion video,” in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2002, vol. 1, pp. 240–244.
[18] R. Puri and K. Ramchandran, “PRISM: A new robust video coding architecture based on distributed compression principles,” in Proc. 40th Allerton Conf. Communicat., Control, Comput., Oct. 2002.
[19] P. Ishwar, V. Prabhakaran, and K. Ramchandran, “Towards a theory for video coding using distributed compression principles,” in Proc. Int. Conf. Image Process. (ICIP), 2003, vol. 2, pp. II-687–90.
[20] C. Brites, J. Ascenso, and F. Pereira, “Feedback channel in pixel domain Wyner-Ziv video coding: Myths and realities,” in Proc. 14th Eur. Signal Process. Conf., Sep. 2006.
[21] C. Brites and F. Pereira, “Encoder rate control for transform domain Wyner-Ziv video coding,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 2007, vol. 2, pp. II-5–II-8.
[22] M. Morbee, J. Prades-Nebot, A. Pizurica, and W. Philips, “Rate allocation algorithm for pixel-domain distributed video coding without feedback channel,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 2007, vol. 1, pp. 521–524.
[23] C. Yaacoub, J. Farah, and B. Pesquet-Popescu, “Feedback channel suppression in distributed video coding with adaptive rate allocation and quantization for multiuser applications,” EURASIP J. Wireless Commun. Netw., vol. 2008, no. 1, Art. no. 427247, pp. 1–13, Oct. 2008.
[24] A. M. Tourapis, F. Wu, and S. Li, “Direct macroblock coding for predictive (P) pictures in the H.264 standard,” in Proc. SPIE, 2004, vol. 5308, pp. 364–371.
[25] A. Smolic, Y. Vatis, H. Schwarz, P. Kauff, U. Goelz, and T. Wiegand, “Improved video coding using long-term global motion compensation,” in Proc. Visual Commun. and Image Process., Jan. 2004, pp. 343–354.
[26] A. Glantz, A. Krutz, and T. Sikora, “Adaptive global motion temporal prediction for video coding,” in Proc. Picture Coding Symp., 2010, pp. 202–205.
[27] M. Esche, A. Glantz, A. Krutz, and T. Sikora, “Adaptive temporal trajectory filtering for video compression,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 5, pp. 659–670, May 2012.
[28] A. Krutz, A. Glantz, M. Tok, M. Esche, and T. Sikora, “Adaptive global motion temporal filtering for high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1802–1812, Dec. 2012.
[29] A. Stojanovic and J.-R. Ohm, “Exploiting long-term redundancies in reconstructed video,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1042–1052, Dec. 2013.
[30] J. Gong, C. Zheng, J. Tian, and D. Wu, “An image-sequence compressing algorithm based on homography transformation for unmanned aerial vehicle,” in Proc. Int. Symp. Intell. Inf. Process. Trusted Comput., Oct. 2010, pp. 37–40.
[31] A. F. Rodriguez, B. B. Ready, and C. N. Taylor, “Using telemetry data for video compression on unmanned air vehicles,” in Proc. AIAA Guidance, Navigation, Control Conf. Exhibit, 2006.
[32] P. H. F. T. Soares and M. d. S. Pinho, “Video compression for UAV applications using a global motion estimation in the H.264 standard,” in Proc. Int. Workshop Telecomm., Santa Rita do Sapucai, Brazil, May 2013.
[33] C. Angelino, L. Cicala, M. D. Mizio, P. Leoncini, E. Baccaglini, M. Gavelli, N. Raimondo, and R. Scopigno, “Sensor aided H.264 video encoder for UAV applications,” in Proc. Picture Coding Symp. (PCS), Dec. 2013, pp. 173–176.
[34] M. Bhaskaranand and J. D. Gibson, “Spectral entropy-based bit allocation,” in Proc. Int. Symp. Inf. Theory its Applicat. (ISITA), 2010, pp. 243–248.

[35] H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 598–603, Jul. 2003.
[36] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[37] D. Lee, Y. Kim, and H. Bang, “Vision-based terrain referenced navigation for unmanned aerial vehicles using homography relationship,” J. Intell. Robot. Syst., vol. 69, no. 1–4, pp. 489–497, Jan. 2013.
[38] M. K. Kaiser, N. Gans, and W. Dixon, “Vision-based estimation for guidance, navigation, and control of an aerial vehicle,” IEEE Trans. Aerosp. Electron. Syst., vol. 46, no. 3, pp. 1064–1077, Jul. 2010.
[39] A. E. R. Shabayek, C. Demonceaux, O. Morel, and D. Fofi, “Vision based UAV attitude estimation: Progress and insights,” J. Intell. Robot. Syst., vol. 65, no. 1–4, pp. 295–308, Jan. 2012.
[40] F. Kendoul, “Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems,” J. Field Robot., vol. 29, no. 2, pp. 315–378, Mar. 2012.
[41] L. L. Campbell, “Minimum coefficient rate for stationary random processes,” Inf. Control, vol. 3, no. 4, pp. 360–371, 1960.
[42] W. Yang, J. Gibson, and T. He, “Coefficient rate and lossy source coding,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 381–386, Jan. 2005.
[43] J. Jung and J. Gibson, “The interpretation of spectral entropy based upon rate distortion functions,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2006, pp. 277–281.
[58] A. Abou-Elailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Successive refinement of motion compensated interpolation for transform-domain distributed video coding,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), Aug. 2011, vol. 1, pp. 11–15.
[59] C. Brites, J. Ascenso, and F. Pereira, “Learning based decoding approach for improved Wyner-Ziv video coding,” in Proc. Picture Coding Symp. (PCS), 2012, pp. 165–168.
[60] “UAV video data from the sensor data management system (SDMS) of the Air Force Research Lab (AFRL),” [Online]. Available: https://www.sdms.afrl.af.mil
[61] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[62] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, Jun. 1981.
[63] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[64] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T SC16/Q.6 VCEG-M33, 2001.
[65] M. Tok, A. Krutz, A. Glantz, and T. Sikora, “Lossy parametric motion model compression for global motion temporal filtering,” in Proc. Picture Coding Symp. (PCS), 2012, pp. 309–312.
[66] E. Rosten, R. Porter, and T. Drummond, “Faster and better: A machine learning approach to corner detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 105–119, Jan. 2010.
[44] M. Bhaskaranand and J. D. Gibson, “Spectral entropy-based quanti- [67] R. Raguram, J.-M. Frahm, and M. Pollefeys, “A comparative analysis
zation matrices for H.264/AVC video coding,” in Proc. 44th Asilomar of RANSAC techniques leading to adaptive real-time random sample
Conf. Signals, Syst. Comput., 2010, pp. 421–425. consensus,” in Computer Vision – ECCV. New York, NY, USA:
[45] W. Yang and J. Gibson, “Coefficient rate in transform coding,” in Proc., Springer, 2008, pp. 500–513.
35th Allerton Conf. Communicat., Control, Comput., Sep.-Oct. 29–1, [68] S. Baker and I. Matthews, “Lucas-Kanade 20 years on: A unifying
1997, pp. 128–137. framework,” Int. J. Comput. Vis., vol. 56, no. 3, pp. 221–255, Feb. 2004.
[46] A. Gersho and R. M. Gray, Vector Quantization and Signal Compres-
sion. Norwell, MA, USA: Kluwer, 1991.
[47] A. Ortega and K. Ramchandran, “Rate-distortion methods for image
and video compression,” IEEE Signal Process. Mag., vol. 15, no. 6, Malavika Bhaskaranand received her Ph.D. degree
pp. 23–50, Nov. 1998. in electrical and computer engineering from the Uni-
[48] R. Mester and U. Franke, “Spectral entropy-activity classification in versity of California, Santa Barbara, in 2013, under
adaptive transform coding,” IEEE J. Sel. Areas Commun., vol. 10, no. the supervision of Prof. Jerry D. Gibson. She received
5, pp. 913–917, Jun. 1992. the B.E. degree from the National Institute of Tech-
[49] M. Budagavi and M. Zhou, “Next generation video coding for mobile nology Karnataka, India, in 2004 and the M.S. de-
applications: Industry requirements and technologies,” in Proc. SPIE gree from the University of California, Santa Bar-
6508, Visual Communicat. and Image Process. (VCIP), 2007, pp. 650 bara, in 2008. She was a Senior Development Engi-
813-1–650 813-6. neer with the Media Processing Group, Ittiam Sys-
[50] “DISCOVER Codec Evaluation,” [Online]. Available: http://www. tems in Bangalore from 2004 to 2007 where she de-
img.lx.it.pt/~discover/home.html veloped video encoders, decoders, and transcoders on
[51] A. M. Tourapis, “Enhanced predictive zonal search for single and embedded platforms. Her current research interests include video compression,
multiple frame motion estimation,” in Proc. Visual Commun. Image video processing, and image processing.
Process. (VCIP), Jan. 2002, pp. 1069–1079.
[52] K.-H. Chen and Y.-S. Chu, “A low-power multiplier with the spurious
power suppression technique,” IEEE Trans. VLSI Syst., vol. 15, no. 7,
pp. 846–850, Jul. 2007. Jerry D. Gibson is Professor of Electrical and Com-
[53] B. Ramkumar and H. Kittur, “Low-power and area-efficient carry se- puter Engineering at the University of California,
lect adder,” IEEE Trans. VLSI Syst., vol. 20, no. 2, pp. 371–375, Feb. Santa Barbara. He has been an Associate Editor of the
2012. IEEE TRANSACTIONS ON COMMUNICATIONS and the
[54] A. Gupte, B. Amrutur, M. Mehendale, A. Rao, and M. Budagavi, IEEE TRANSACTIONS ON INFORMATION THEORY. He
“Memory bandwidth and power reduction using lossy reference frame was an IEEE Communications Society Distinguished
compression in video encoding,” IEEE Trans. Circuits Syst. Video Lecturer for 2007–2008. He is an IEEE Fellow, and
Technol., vol. 21, no. 2, pp. 225–230, Feb. 2011. he has received The Fredrick Emmons Terman Award
[55] J.-C. Tuan, T.-S. Chang, and C.-W. Jen, “On the data reuse and memory (1990), the 1993 IEEE Signal Processing Society
bandwidth analysis for full-search block-matching VLSI architecture,” Senior Paper Award, the 2009 IEEE Technical Com-
IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 61–72, mittee on Wireless Communications Recognition
Jan. 2002. Award, and the 2010 Best Paper Award from the IEEE TRANSACTIONS ON
[56] H. Shim and C.-M. Kyung, “Selective search area reuse algorithm for MULTIMEDIA. He is the author, coauthor, and editor of several books, the most
low external memory access motion estimation,” IEEE Trans. Circuits recent of which are The Mobile Communications Handbook (Editor, 3rd ed.,
Syst. Video Technol., vol. 19, no. 7, pp. 1044–1050, Jul. 2009. 2012), Rate Distortion Bounds for Voice and Video (Coauthor with Jing Hu,
[57] “DISCOVER Codec,” [Online]. Available: http://www.dis- NOW Publishers, 2014), and Information Theory and Rate Distortion Theory
coverdvc.org/ for Communications and Compression (Morgan-Claypool, 2014).