
MPEG-4 Compliant Video Encoding: Analysis and Rate Control Strategies

Paulo Nunes, Fernando Pereira


Instituto Superior Técnico - Instituto de Telecomunicações
Av. Rovisco Pais, 1049-001 Lisboa, Portugal
Phone: +351 21 841 84 60; Fax: +351 21 841 84 72
e-mail: {paulo.nunes, fernando.pereira}@lx.it.pt

Abstract

Any set of MPEG-4 elementary bitstreams building a video scene can only be considered profile@level compliant if it does not violate the MPEG-4 Video Buffering Verifier constraints for the chosen profile@level. This paper analyzes this mechanism, discussing its major features and drawbacks, notably in comparison with alternative solutions. Furthermore, rate control strategies to guarantee a compliant encoding are proposed.

1 Introduction

The MPEG-4 standard, MPEG’s most recent achievement, aims to define an audiovisual coding standard that addresses the emerging needs of the communication, interactive, and broadcasting service models, as well as of the mixed service models resulting from their technological convergence. In this context, the MPEG-4 standard has been designed to be generic in the sense that it is not targeted at a particular application but includes many coding tools and algorithms that can be used in a variety of applications under different situations, notably in terms of functionalities and bit rates. Since it is not reasonable that all terminals support the whole MPEG-4 toolbox, subsets of it have been defined through the concepts of profiles and levels. A subset of the syntax and semantics, corresponding to a subset of tools of the MPEG-4 visual standard [1], defines a visual profile, while sets of restrictions within each visual profile, in terms of computational resources, define the various levels. Profiles and levels will be at the center of MPEG-4 deployment.

In order that a set of visual bitstreams building a scene may be considered compliant with a given MPEG-4 visual profile@level, allowing interoperability, it must not contain any disallowed syntactic element and additionally it must not violate the Video Buffering Verifier mechanism constraints. This mechanism, based on virtual buffers, allows the encoder to limit the decoding computational resources required, notably the bitstream and picture memories and the decoding speed. Although the Video Buffering Verifier is essentially defined in terms of decoding operation, it is a task of the encoder to implement it in order to guarantee that it is not violated, by “shaping” the encoded data in a way that meets the relevant constraints. Such task is mainly dealt with by the rate control mechanism, which takes into account the status of the several Video Buffering Verifier buffers in order to optimally control the encoder.

This paper analyzes the three MPEG-4 Video Buffering Verifier mechanism models, the Video Reference Memory Verifier (VMV), the Video Complexity Verifier (VCV), and the Video Rate Buffer Verifier (VBV), highlighting their major features and drawbacks, notably in comparison with relevant alternative models. Additionally, this paper proposes video coding rate control strategies defined in terms of the reactions taken by the rate control module to guarantee a compliant encoding, considering the status of the various Video Buffering Verifier buffers.

2 The MPEG-4 video buffering verifier

The idea of using a Video Buffering Verifier to bound the decoding complexity of a given bitstream is not new, and was already adopted in the previous MPEG video coding standards, MPEG-1 and MPEG-2. There, the major purpose of the Video Buffering Verifier mechanism was to set some restrictions on the maximum variability of the number of bits per picture, especially in the case of constant bit rate operation, and thus on the complexity of the encoded video streams.

In MPEG-4, the several video objects composing a scene may vary in size along time and may be encoded at different VOP rates. To limit the decoding complexity of the corresponding bitstreams it is then also necessary to set some limits on the variability of the number of MB/s and on the picture memory required to store the decoded data. This constitutes the major novelty of the MPEG-4 Video Buffering Verifier mechanism, relative to the previous MPEG standards, since it does not only bound

0-7803-6514-3/00/$10.00 © 2000 IEEE

Authorized licensed use limited to: Qualcomm. Downloaded on June 25,2010 at 20:50:26 UTC from IEEE Xplore. Restrictions apply.
the bitstream buffer memory but also the MB decoding capacity and the MB picture memory.

The MPEG-4 Video Buffering Verifier mechanism consists of three different models, each one defining a set of rules and limits to verify whether a specific type of decoder resource is within the values allowed by the corresponding profile and level definition:

1. Video Reference Memory Verifier (VMV) - This model is used to verify if the picture memory required at the decoder for the decoding of a given scene does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VMV buffer size, which is the maximum number of decoded MBs that the decoder can store during the decoding process (maximum in the sense that it is there for sure). This buffer accumulates all the decoded MBs of all VOPs and stores them until they are no longer needed for the prediction of other VOPs.

2. Video Complexity Verifier (VCV) - This model is used to verify if the computational power, defined in terms of MB/s, required at the decoder does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VCV MB/s decoding rate and VCV buffer size and is applied to all MBs in the scene. If arbitrarily shaped VOs exist in the scene, an additional VCV buffer and VCV decoding rate are also defined, to be applied only to the boundary MBs. The ratio between the VCV buffer size and the VCV decoding rate defines the VCV latency, L, i.e. the time it takes to decode a full VCV, and sets the minimum latency of the decoding process. At each VOP decoding time, the number of MBs in the VOP is added to the VCV buffer, and its occupancy decreases at the constant VCV decoding rate until the VOP decoding has finished or a new VOP is added to the VCV buffer.

3. Video Rate Buffer Verifier (VBV) - This model is used to verify if the bitstream memory required at the decoder does not exceed the values specified for the corresponding profile and level definition. The model is defined in terms of the VBV buffer size, which is the maximum amount of bits that the decoder can store in the bitstream memory. The encoded bits for each VOP enter the VBV at constant or variable bit rate, and are instantaneously removed from this buffer at the VOP decoding time.

In order that a given set of elementary streams (ESs) building a visual scene may be considered compliant with a given profile and level, the encoder must guarantee that none of the above mentioned buffers overflows and, additionally, it must also guarantee that, in certain circumstances, the VBV never underflows. Bitstream compliance with a given profile@level guarantees that the resources required at the decoder do not exceed a certain pre-defined amount.

3 MPEG-4 video buffering verifier analysis

Although the Video Buffering Verifier mechanism is essentially described in terms of decoder operation, it is a task of the encoder to implement it and to guarantee that it is not violated. For this, the encoder has to “shape” the encoded data in a way that does not violate the constraints imposed by this mechanism. If any of the Video Buffering Verifier models tends to be violated, the encoder has to take appropriate countermeasures to avoid it. Such task is mainly dealt with by the rate control mechanism, which takes into account the scene characteristics, the encoding results, and the status of the several Video Buffering Verifier buffers for the best control of the encoder [2].

3.1 Analysis of the VMV model

The VMV occupancy due to a given VOP depends on its size in MB units and on the time interval during which the VOP stays in the VMV buffer. It is possible to identify, at least, two major types of picture memory required by any receiving terminal:

• Decoding memory to store the decoded VOPs and the corresponding predictions (reference VOPs), during the decoding process.

• Composition memory to store the decoded VOPs (composition units) and the resulting composed scene, during the composition and presentation process.

Depending on how the two mentioned types of picture memory are considered in the VMV model, there are several alternatives for the modeling of this capability. Below, several alternative VMV approaches are compared with the adopted MPEG-4 VMV model:

• Global memory approach - In this approach, the VMV model takes into account all the memory needed for the decoding and for the composition of the decoded VOPs. In this case, a given VOP consumes picture memory from the instant it starts being decoded until it has been presented and is no longer needed for the prediction of other VOPs.

• Decoding memory approach - In this approach, the VMV model takes only into account the memory needed for reconstructing the encoded VOPs. Here, a given VOP consumes picture memory from the instant it starts being decoded until it has been decoded and is no longer needed for the prediction of other VOPs.

• MPEG-4 VMV model approach - When the encoder does not use B-VOPs, the MPEG-4 VMV follows a global memory approach since it specifies that each VOP should stay in the VMV buffer until the following VOP starts being composed. When B-VOPs are used, the MPEG-4 VMV specifies that these VOPs should be released from the VMV buffer at their decoding times
plus the VCV latency, i.e. at the time corresponding to the maximum decoding duration of the B-VOP. Thus, B-VOPs are released from the VMV buffer before the following VOP (in composition order) starts being composed. This approach is neither a decoding memory approach, otherwise B-VOPs should be released immediately after their decoding has finished, nor a global memory approach, since B-VOPs are released from the VMV buffer before their presentation has finished.

The adopted MPEG-4 VMV solution underestimates the total picture memory needed at the decoder, notably the memory needed for the composition and presentation of B-VOPs, but overestimates the picture memory needed only for the decoding process. Since VOP composition is not normative in MPEG-4, it would cause no surprise if the decoding memory approach had been adopted.

3.2 VMV rate control strategies

For each target encoding time instant, the encoder verifies if the amount of picture memory required for decoding the given VOP or set of VOPs, during its lifetime at the decoder, does not exceed the maximum amount of memory available for the selected profile and level. For this, the encoder estimates the picture memory usage during the decoding period of each VOP, which can be given by the following expression:

$$ vmv(t) = vmv(t_0) + \min[(t - t_0) \cdot H, M_i] - \sum_{k \in A} M_k \cdot u(t - t_k) \qquad (1) $$

where $t_0$ and $t_i$ are, respectively, the start and end decoding times of the given VOP, $H$ is the VCV decoding rate, $M_i$ is the total number of MBs in the VOP, $u(t)$ is the unit step function, $A$ represents the set of VOPs to be released from memory in the interval $[t_0, t_i]$, and $t_k$ their corresponding releasing time instants; $vmv(t_0)$ denotes the VMV occupancy immediately before the decoding of VOP $i$ starts; $\min[(t - t_0) \cdot H, M_i]$ denotes the allocation of picture memory as the decoding process progresses (while the VCV occupancy decreases at decoding rate $H$, the VMV occupancy increases at the same rate); finally, the picture memory released during the decoding period of the given VOP is given by $\sum_{k \in A} M_k \cdot u(t - t_k)$.

By estimating in advance the local maxima of (1) the encoder is able to take preventive actions to avoid possible violations of the VMV, such as:

• Skip the encoding of one or more of the larger VOPs.

• Avoid B-VOPs (when applicable).

• Use VOPs with fewer MBs (in the bounding box), by merging VOs or reducing their number (this solution impacts on the ‘authoring’ of the scene).

• Signal the impossibility to encode the given scene with the amount of picture memory provided by the chosen profile@level combination (this may lead to the choice of a more powerful profile@level).

An overflow of the VMV buffer may lead to a situation where the decoder is not able to correctly decode part or all MBs in one or more VOPs (due to the lack of picture memory to store the decoded VOPs) and thus to a coding desynchronization between encoder and decoder.

3.3 Analysis of the VCV model

In MPEG-4 the load on the decoder is measured by the occupancy of the VCV buffer. There are several ways to measure the complexity of the encoded data, which can be defined by any of the following parameters:

• Number of MBs per second.

• Number of MBs per MB type (opaque, boundary, or transparent) per second.

• Number of MBs per MB coding type (I-MB, P-MB, B-MB, etc.) per second.

• Number of arithmetic instructions and memory Read/Write operations per second.

In terms of the VCV model operation, the use of these measures is similar. At each VOP decoding time, the adopted complexity measure for the incoming VOP (e.g. #MBs) is added to the VCV buffer occupancy, which decreases at a constant rate defined in the appropriate data complexity units by the profile@level combination (e.g. #MB/s), until the next VOP decoding time. The sizes of the VCV buffers, using the appropriate complexity measure units, define the maximum load that the decoder can accept, while the drain rates of the VCV buffers define the maximum decoding speed of the decoder.

There are several alternatives for the modeling of the decoder computational capability, notably depending on the number of buffers and decoding rates used. Below, several VCV model approaches, based on virtual buffers, are compared with the adopted MPEG-4 VCV model:

• Single buffer with single decoding rate - This is the simplest VCV model based on the virtual buffer approach. It consists of a single VCV buffer that accumulates the complexity of the encoded data and a single decoding rate that defines the speed at which the decoder can decode this data. In this case, decoding is performed on a first in first out (FIFO) basis. There are two major possible variations for this approach:

  ◦ Without MB weights - All MBs have the same weight in terms of decoding complexity. This approach has the advantage of being rather simple. Its major drawback comes from the fact that decoders have to be designed to deal with worst case scenarios (heaviest MB types) in terms of decoding capacity.
  ◦ With MB weights - The MBs are divided into classes, e.g. according to their coding type, and each MB class has an associated decoding complexity weight. The VCV occupancy due to a given VOP is a weighted sum of its MBs. As in the preceding case, the VCV buffer is emptied at a fixed VCV decoding rate. The main advantage of this approach is to model more closely the real decoding complexity of a given set of ESs building a visual scene. However, the definition of meaningful MB weights is not a straightforward task.

• Single buffer with multiple decoding rates - This approach also assumes a FIFO decoding; however, here each MB class has its own decoding rate. This approach is equivalent to the previous approach when the MB weights are the ratios between the different MB class decoding rates and a reference decoding rate.

• Multiple buffers with multiple decoding rates - This VCV model approach assumes some degree of parallelism in the decoder, which may happen in hardware-based decoders with dedicated hardware for decoding some types of MBs, e.g. a module for decoding MBs without shape information or completely opaque, and another module for decoding MBs with shape information (i.e. boundary MBs). The VCV model consists of N buffers, one for each MB class, and the associated decoding rates. The decoding time is determined by the last buffer to be emptied.

• MPEG-4 VCV model approach - The MPEG-4 VCV model follows a multiple buffers, multiple decoding rate approach with two buffers: the VCV, accumulating all MBs, and the Boundary VCV, accumulating only boundary MBs. However, these two MB classes are not mutually exclusive since boundary MBs are counted in both VCVs. With this approach, the MPEG-4 VCV model tries to include in the same model the parallel nature of a pure multiple buffer with multiple decoding rates with the serial nature of a single buffer with multiple decoding rates. Since the VOP MBs are organized in two classes (boundary and all), opaque and transparent MBs are treated exactly in the same way. This fact has a high impact on the measured complexity of certain scenes, where transparent MBs, although not carrying “rich content”, lead to an overestimation of the scene complexity and thus to the impossibility of compliantly encoding scenes which were expected to be encoded with a certain profile@level combination. This situation is illustrated in Figure 1 for the News sequence (4 VOs) in CIF format encoded at 15 Hz with the MPEG-4 core profile @ level 2; the high percentage of transparent MBs in the scene causes the VCV to overflow. In this case, the feedback mechanism that prevents the violation of the VCV model has been disabled, thus whenever the encoder detects a violation of the VCV model, the bitstream is generated but is signaled as non-compliant.

A VCV model where the three different types of MBs (transparent, opaque, and boundary) could be discriminated in terms of decoding complexity would be more adequate than the current MPEG-4 VCV model. In this context, the single buffer with single decoding rate with MB weights or the multiple buffers with multiple decoding rates solutions appear as good candidates for the VCV modeling. In the first case, the impact of transparent MBs could be reduced by assigning them a lower MB weight, while for the second case this could be done by defining a higher buffer size and higher decoding rate.

[Figure 1 - Sequence ‘News’ encoded with core profile @ level 2. Two plots, each titled “News CIF [4 VOs] @ 15 Hz”, with frame number (0 to 300) on the horizontal axis; the legend lists VCV buffer occupancy and VMV buffer occupancy.]

3.4 VCV rate control strategies

In order to produce valid bitstreams, the encoder must ensure that the two VCV buffers never overflow and that each VOP is completely decoded in time. To meet these requirements the encoder analyzes the set of VOPs to be encoded for each target encoding time (typically the multiples of $1/f_i$, where $f_i$ denotes the temporal coding rates for the various objects in the scene) and tests if the above conditions can be verified for this set of VOPs. Whenever the rate control mechanism detects a possible violation of the VCV model, it can take one of the following actions to decrease the occupancy of the VCV buffers:
• Skip one or more VOPs for that time instant.

• Reduce the number of MBs with shape information (in case this is the limiting factor), e.g. by changing the shape of the objects or merging objects (if this is an acceptable solution for the application in question).

• Decrease the size of some VOPs, e.g. by splitting an object into two or more objects it may be possible to reduce the total number of MBs in the resulting bounding boxes (if this is an acceptable solution for the application in question).

Notice that the overflow of one (or both) of the VCV buffers may lead to the incomplete decoding of one or more VOPs and thus to a coding desynchronization between encoder and decoder.

3.5 Analysis of the VBV model

The encoder needs to shape the encoded data in order to guarantee that the encoded bits for each VOP are available for decoding at the time corresponding to the VOP decoding time. This can be done by allocating the available channel rate in a pre-defined or rigid way (fixed bandwidth allocation) or dynamically, by allocating the available channel rate among the several VOs in the scene according to the VOs’ characteristics, e.g. size, activity, complexity, etc. (dynamic bandwidth allocation).

As for the VMV and VCV models, MPEG-4 Visual does not standardize any method for achieving proper VBV buffer control. This freedom allows the implementers to choose the best buffer control mechanism that guarantees proper VBV operation for each case. As shown in [3] and [4], joint rate control strategies, where the available bit rate is dynamically allocated among the several VOs in the scene, can achieve better performance when compared to independent rate control, where the available resources are rigidly allocated.

Although the MPEG-4 VBV specification states that in the case of a scene composed of multiple VOs, each with one or more VOLs, the VBV model should be applied independently to each VOL [1], the MPEG-4 VBV model does not prevent the dynamic allocation of resources and combined buffer control, provided that the sum of all individual buffers does not exceed the profile@level limits.

3.6 VBV rate control strategies

Whenever the rate control mechanism detects a possible violation of the VBV model (underflow or overflow), it should react with the most adequate action in the context of the application in question. This can be done in two distinct modes:

• Preventive mode - Whenever the VBV occupancy reaches certain high thresholds, the encoder takes adequate action(s) to neutralize the unwanted situation.

• Reactive mode - Whenever the coding decisions taken by the encoder lead to a violation of the VBV constraints, new coding decisions are adopted to avoid this situation.

The reactive mode is typically more effective in achieving proper VBV buffer control, since it usually involves choosing iteratively from multiple coding decisions the one that, fulfilling the VBV constraints, has the best tradeoff in terms of rate-distortion operation. However, the reactive mode is not well suited for real-time encoding since it may involve extra delays that are not acceptable for real-time conditions. Besides being less effective, the preventive mode cannot completely guarantee that the VBV will not be violated; the less severe the VBV thresholds to operate the preventive mode are, the higher is the risk that a VBV violation happens.

Typically, the encoder rate control can take one of the following actions to prevent or avoid VBV violation:

• Adjust the MB quantization parameters (texture losses/distortion).

• Adjust the shape encoding losses (distortion).

• Skip the encoding of the incoming VOP(s).

• Introduce stuffing bits (increase the VBV buffer occupancy if underflow is the problem).

Both the overflow and underflow of the VBV buffer must be avoided. Notice that the VBV buffer overflow may lead to a loss of data and thus to the incomplete decoding of one or more VOPs, while the VBV buffer underflow may prevent the decoder from decoding the incoming data on time; thus both may lead to a coding desynchronization between encoder and decoder.

4 Final remarks

This paper analyzed the MPEG-4 Video Buffering Verifier mechanism, highlighting its major features and drawbacks, and compared the adopted MPEG-4 models with other alternative models. This analysis showed that the MPEG-4 models can still be improved, notably the VMV and VCV models: the VMV to better reflect the fact that composition is not normative in MPEG-4, and the VCV to better reflect the real complexity of visual scenes containing a high percentage of transparent MBs.

[1] ISO/IEC 14496-2:1999, Information Technology - Coding of Audio-visual Objects - Part 2: Visual.
[2] P. Nunes, F. Pereira, “Implementing the MPEG-4 Natural Visual Profiles and Levels”, Doc. M4878, 48th MPEG meeting, Vancouver, July 1999.
[3] P. Nunes, F. Pereira, “Rate Control for Scenes with Multiple Arbitrarily Shaped Video Objects”, PCS’97, Berlin, September 1997.
[4] P. Nunes, F. Pereira, “Object-Based Rate Control for the MPEG-4 Visual Simple Profile”, WIAMIS’99, Berlin, May 1999.
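
As an illustration of the VMV check discussed in Section 3.2, the following Python sketch evaluates expression (1) and searches for its largest value over the decoding period of a VOP. It is only a sketch under stated assumptions: the function names, the convention u(0) = 1, and all numeric values (including the buffer limit) are hypothetical and are not taken from the MPEG-4 specification or from any profile@level table.

```python
from typing import List, Tuple

Release = Tuple[float, int]  # (t_k, M_k): release instant and size in MBs


def vmv_occupancy(t: float, t0: float, vmv_t0: float, H: float, M_i: int,
                  releases: List[Release]) -> float:
    """Evaluate expression (1) at time t, for t0 <= t <= t_i."""
    # Picture memory allocated so far: grows at rate H, capped at M_i.
    allocated = min((t - t0) * H, M_i)
    # Memory already given back; assumption: u(t - t_k) = 1 when t >= t_k.
    released = sum(M_k for t_k, M_k in releases if t >= t_k)
    return vmv_t0 + allocated - released


def vmv_peak(t0: float, t_i: float, vmv_t0: float, H: float, M_i: int,
             releases: List[Release]) -> float:
    """Largest occupancy reached in [t0, t_i].

    Between two releases the occupancy never decreases, so the peak is
    either the value just before a release instant or the value at t_i.
    """
    candidates = [vmv_occupancy(t_i, t0, vmv_t0, H, M_i, releases)]
    for t_k, _ in releases:
        if t0 <= t_k <= t_i:
            # Value just before t_k: only strictly earlier releases count.
            earlier = [(tj, Mj) for tj, Mj in releases if tj < t_k]
            candidates.append(vmv_occupancy(t_k, t0, vmv_t0, H, M_i, earlier))
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical scenario: a 396-MB VOP decoded at H = 1000 MB/s while
    # another 396-MB VOP leaves the buffer halfway through the period.
    peak = vmv_peak(0.0, 0.5, 792, 1000.0, 396, [(0.25, 396)])
    vmv_limit = 1188  # hypothetical VMV buffer size in MBs
    print("peak occupancy:", peak, "compliant:", peak <= vmv_limit)
```

An encoder following the strategies listed in Section 3.2 could run such a check before coding each VOP and, when the estimated peak exceeds the limit, fall back on one of the listed actions, such as skipping the VOP.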

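
The VCV operation described in Section 3.3, and the effect of MB weights on scenes with many transparent MBs, can be sketched in Python as follows. This is a simplified illustration, not the normative model: it tracks a single buffer only (the Boundary VCV is left out), and the function name, the MB counts, the decoding rate, the buffer size, and the 0.25 weight for transparent MBs are all hypothetical values chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

Vop = Tuple[float, Dict[str, int]]  # (decoding time, MB count per MB class)


def vcv_trace(vops: List[Vop], rate: float, size: float,
              weights: Optional[Dict[str, float]] = None
              ) -> List[Tuple[float, float, bool]]:
    """Return (time, occupancy, overflow) at each VOP decoding time.

    vops must be sorted by decoding time. Classes missing from `weights`
    (or all classes, when weights is None) count with weight 1.
    """
    occupancy = 0.0
    previous_time = None
    trace = []
    for time, mb_counts in vops:
        if previous_time is not None:
            # Drain at the constant decoding rate since the previous VOP.
            occupancy = max(0.0, occupancy - rate * (time - previous_time))
        # Add the complexity of the incoming VOP (weighted MB count).
        load = sum(count * (weights.get(mb_class, 1.0) if weights else 1.0)
                   for mb_class, count in mb_counts.items())
        occupancy += load
        trace.append((time, occupancy, occupancy > size))
        previous_time = time
    return trace


if __name__ == "__main__":
    # Hypothetical VOP whose bounding box is mostly transparent.
    vop = {"opaque": 100, "boundary": 50, "transparent": 200}
    vops = [(0.0, vop), (0.125, vop), (0.25, vop)]
    print(vcv_trace(vops, rate=1600.0, size=400.0))
    print(vcv_trace(vops, rate=1600.0, size=400.0,
                    weights={"transparent": 0.25}))
```

With these made-up numbers the unweighted count overflows the buffer from the second VOP onwards, while the lower weight for transparent MBs keeps the occupancy within the limit, which is the kind of behavior the weighted single-buffer alternative in Section 3.3 is meant to capture.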