
Final Exam Multimedia Communications

M. Arif Nugroho 2101120003

1. H.264
a. H.264 Basic process
H.264 divides video coding into two basic processes: encoding and decoding. The
encoder carries out prediction, transform and encoding processes to produce
a compressed H.264 bitstream. An H.264 video decoder carries out the
complementary processes of decoding, inverse transform and reconstruction
to produce a decoded video sequence.

Figure 1.1 H.264 encoder process

Figure 1.2 H.264 decoder process.

Encoding Process
The encoder forms a prediction of the current macroblock based on
previously-coded data, either from the current frame using intra
prediction or from other frames that have already been coded and
transmitted using inter prediction. The encoder subtracts the
prediction from the current macroblock to form a residual.

The prediction methods supported by H.264 are more flexible than
those in previous standards, enabling accurate predictions and hence
efficient video compression. Intra prediction uses 16 x 16 and 4 x 4
block sizes to predict the macroblock from surrounding, previously
coded pixels within the same frame (Figure 1.4).
The values of the previously-coded neighbouring pixels are
extrapolated to form a prediction of the current macroblock. Figure 1.5
shows an example. A 16 x 16 prediction block is formed, an
approximation of the original macroblock. Subtracting the prediction
from the original macroblock produces a residual block (also containing
16 x 16 samples).
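The extrapolation and subtraction steps can be sketched in a few lines. This is a minimal sketch assuming the H.264 intra 16 x 16 vertical mode (one of several modes; the others are horizontal, DC and plane), in which every row of the prediction is a copy of the reconstructed pixels directly above the macroblock. The function names and pixel values are hypothetical.

```python
# Minimal sketch: intra 16 x 16 prediction in vertical mode, then the residual.
# Function names and pixel values are hypothetical, for illustration only.
N = 16  # macroblock width and height in luma samples


def predict_vertical(above_row):
    """Extrapolate the 16 previously-coded pixels above the block downwards."""
    if len(above_row) != N:
        raise ValueError("need exactly 16 neighbouring pixels")
    return [list(above_row) for _ in range(N)]


def residual(original, prediction):
    """Residual = original - prediction, sample by sample (values may be negative)."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


above = [100] * N                                            # flat row of neighbours
original = [[100 + r for _ in range(N)] for r in range(N)]   # brightens downwards
res = residual(original, predict_vertical(above))
print(res[0][0], res[15][0])  # prints "0 15"
```

The better the extrapolated prediction matches the block, the closer the residual is to zero and the fewer bits it costs after transform and quantization.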
Inter prediction uses a range of block sizes from 16 x 16 down to 4 x 4
to predict pixels in the current frame from similar regions in previously
coded frames (Figure 1.6). These previously coded frames may occur
before or after the current frame in display order. In the example
shown in Figure 1.6, macroblock 1 (MB1) in the current frame is
predicted from a 16 x 16 region in the most recent past frame. MB2
is predicted from two previously coded frames. The upper 8 x 16 block
of samples, a partition, is predicted from a past frame and the lower
8 x 16 partition is predicted from a future frame.
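Finding the "similar region" in a reference frame is the encoder's motion search. H.264 does not mandate a search method, so the sketch below simply assumes an exhaustive (full-search) block matching with the sum of absolute differences (SAD) as the cost, applied to a hypothetical synthetic frame pair.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, bx, by, size, search_range):
    """Try every displacement within +/- search_range; return (dx, dy, sad) of the best."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def frame_with_square(x0, y0, size=16, side=4, value=200):
    """Hypothetical 16 x 16 frame: dark background with one bright square."""
    frame = [[0] * size for _ in range(size)]
    for y in range(y0, y0 + side):
        for x in range(x0, x0 + side):
            frame[y][x] = value
    return frame


ref = frame_with_square(4, 4)   # previously coded frame
cur = frame_with_square(6, 5)   # square has moved 2 right, 1 down
print(full_search(cur, ref, bx=6, by=5, size=4, search_range=3))  # (-2, -1, 0)
```

The returned displacement is the motion vector; the encoder sends it so that the decoder can build the same prediction.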

Figure 1.3 Flow Diagram

Figure 1.4 Intra Prediction

Figure 1.5 Original Macroblock, intra prediction, Residual MB

Figure 1.6 Inter Prediction

Transform and Quantization

A block of residual samples is transformed using a 4 x 4 or 8 x 8
integer transform, an approximation of the Discrete Cosine Transform
(DCT). The transform outputs a set of coefficients, each of which is a
weighting value for a standard basis pattern. When combined, the
weighted basis patterns re-create the block of residual samples.
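For the 4 x 4 case, the H.264 core transform can be written as W = Cf X Cf^T with a small integer matrix Cf; the normalising scale factors of a true DCT are folded into the quantization stage. A minimal pure-Python sketch (the helper names are hypothetical):

```python
# H.264 4 x 4 forward core transform matrix (integer approximation of the DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """W = CF . X . CF^T, using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[5] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])  # 80: a flat block has only a DC coefficient
```

Only additions, subtractions and doublings are needed, which is why an integer approximation was preferred over a floating-point DCT.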
The output of the transform, a block of transform coefficients, is
quantized (each coefficient is divided by an integer value).
Quantization reduces the precision of the transform coefficients
according to a quantization parameter (QP). For example, the original
coefficient values in Figure 1.7 are divided by a quantizer step size
(set by QP) and rounded to the nearest integer. Typically, the result is a
block in which most or all of the coefficients are zero, with a few non-zero
coefficients. Setting QP to a high value means that more coefficients
are set to zero, resulting in high compression at the expense of poor
decoded image quality. Setting QP to a low value means that more
non-zero coefficients remain after quantization, resulting in better
image quality at the decoder but also in lower compression.
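The trade-off between QP and the number of surviving coefficients can be shown with a small sketch. In H.264 the QP selects a step size Qstep that doubles for every increase of 6 in QP; this sketch skips that mapping and uses step sizes directly. The coefficient values are made up (they are not the values of Figure 1.7), and rounding halves away from zero is an assumption; real encoders use tuned rounding offsets.

```python
import math

# Made-up 4 x 4 block of transform coefficients (not the values of Figure 1.7).
COEFFS = [[126, -49, 12, 3],
          [-37, 14, -6, 2],
          [9, -4, 2, -1],
          [3, 1, -1, 0]]


def quantize(coeffs, qstep):
    """Divide by the step size and round to the nearest integer (halves away from zero)."""
    return [[int(math.copysign(math.floor(abs(c) / qstep + 0.5), c)) for c in row]
            for row in coeffs]


def nonzero(block):
    return sum(1 for row in block for c in row if c != 0)


for qstep in (8, 20):
    print(qstep, nonzero(quantize(COEFFS, qstep)))
# prints "8 8" then "20 5": a larger step leaves fewer non-zero coefficients
```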

Figure 1.7 Quantization Example

Bitstream Encoding

The video coding process produces a number of values that must be
encoded to form the compressed bitstream. These values include :
o Quantized transform coefficients.
o Information to enable the decoder to re-create the prediction.
o Information about the structure of the compressed data and the
compression tools used during encoding.
o Information about the complete video sequence.
These values and parameters (syntax elements) are converted into
binary codes using variable length coding or arithmetic coding. Each of
these encoding methods produces an efficient, compact binary
representation of the information. The encoded bitstream can then be
stored or transmitted.
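One concrete variable length code used by H.264 is the unsigned Exp-Golomb code ue(v), which codes many header and prediction syntax elements (the coefficients themselves use CAVLC or CABAC). A minimal encoder sketch, with a hypothetical function name:

```python
def ue_encode(k: int) -> str:
    """Unsigned Exp-Golomb codeword ue(v) for k >= 0, returned as a bit string."""
    if k < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(k + 1)[2:]                 # binary form of k + 1
    return "0" * (len(info) - 1) + info   # prefix of len(info) - 1 zeros


for k in range(5):
    print(k, ue_encode(k))
# prints 0 1, 1 010, 2 011, 3 00100, 4 00101 (one pair per line)
```

Small, frequent values get short codewords, which is what makes the representation compact.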
Decoder Process
Bitstream decoding
A video decoder receives the compressed H.264 bitstream, decodes
each of the syntax elements and extracts the information described
above, i.e. quantized transform coefficients, prediction information,
etc. This information is then used to reverse the coding process and
recreate a sequence of video images.
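Decoding a syntax element means reversing its variable length code. As a sketch, here is a reader for unsigned Exp-Golomb ue(v) codewords: count the leading zeros, then read that many bits plus one and subtract one. The function names are hypothetical and malformed (truncated) input is not handled.

```python
def ue_decode(bits: str, pos: int = 0):
    """Read one ue(v) codeword starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":      # count the zero prefix
        zeros += 1
    start = pos + zeros                  # position of the first '1'
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


def decode_all(bits: str):
    """Decode a string made only of concatenated ue(v) codewords."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = ue_decode(bits, pos)
        values.append(value)
    return values


print(decode_all("1" + "010" + "00100"))  # [0, 1, 3]
```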
Rescaling and inverse transform
The quantized transform coefficients are re-scaled. Each coefficient is
multiplied by an integer value to restore its original scale. In the
example of Figure 1.8, the quantized coefficients are each multiplied by
a quantizer step size of 8. The rescaled coefficients are similar but not
identical to the originals (Figure 1.7).
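Rescaling is a single multiplication per coefficient. In this sketch the quantized levels are hypothetical: they are what a step size of 8 with round-to-nearest gives for made-up coefficients whose first row is 126, -49, 12, 3. Comparing the output with those values shows why the rescaled coefficients are close to, but not equal to, the originals.

```python
def rescale(levels, qstep):
    """Decoder: multiply each quantized level by the step size."""
    return [[z * qstep for z in row] for row in levels]


levels = [[16, -6, 2, 0],
          [-5, 2, -1, 0],
          [1, -1, 0, 0],
          [0, 0, 0, 0]]
print(rescale(levels, 8)[0])  # [128, -48, 16, 0] versus the original 126, -49, 12, 3
```

The difference is the quantization error, and it cannot be recovered by the decoder.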

Figure 1.8 Rescaling example

Figure 1.9 Inverse transform : combining weighted basis patterns to create a 4 x 4 image block.
An inverse transform combines the standard basis patterns, weighted
by the re-scaled coefficients, to re-create each block of residual data.
Figure 1.9 shows how the inverse DCT or integer transform creates an
image block by weighting each basis pattern according to a coefficient
value and combining the weighted basis patterns. These blocks are
combined together to form a residual macroblock.
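The "weighted sum of basis patterns" view can be checked directly. This sketch uses the floating-point orthonormal 4 x 4 DCT rather than H.264's scaled integer transform, because its basis patterns are easy to write down; the function names and the test block are hypothetical.

```python
import math

N = 4


def c(k):
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def basis(u, v):
    """4 x 4 DCT basis pattern for vertical frequency u and horizontal frequency v."""
    return [[c(u) * math.cos((2 * y + 1) * u * math.pi / (2 * N)) *
             c(v) * math.cos((2 * x + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]


def forward_dct(block):
    """Each coefficient is the inner product of the block with one basis pattern."""
    return [[sum(block[y][x] * basis(u, v)[y][x] for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def inverse_by_basis(coeffs):
    """Weight every basis pattern by its coefficient and add the patterns together."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            pattern = basis(u, v)
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeffs[u][v] * pattern[y][x]
    return out


block = [[4 * y + x for x in range(N)] for y in range(N)]
rebuilt = inverse_by_basis(forward_dct(block))
print(round(rebuilt[2][3], 6))  # 11.0, the original sample value
```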
For each macroblock, the decoder forms an identical prediction to the
one created by the encoder, using inter prediction from previously-decoded
frames or intra prediction from previously-decoded samples in
the current frame. The decoder adds the prediction to the decoded
residual to reconstruct a decoded macroblock, which can then be
displayed as part of a video frame (Figure 1.10).
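The final reconstruction step is an addition followed by clipping to the valid sample range (0 to 255 for 8-bit video). A minimal sketch with hypothetical values:

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add prediction and decoded residual, clipping to the valid sample range."""
    top = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), top) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


print(reconstruct([[120, 250, 10]], [[-7, 12, -15]]))  # [[113, 255, 0]]
```

Because the decoder uses the same prediction as the encoder, any difference from the original frame comes only from quantization of the residual.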

Figure 1.10 Reconstruction flow diagram

2. a. Advantages and disadvantages of TCP/UDP video streaming.

UDP :
Disadvantages :
Complex error handling mechanism : UDP is an unreliable protocol. As a
result, packets may be lost during transit. To offer good-quality video,
these losses have to be mitigated. Retransmission, Forward Error
Correction and error concealment are techniques which may be used.
Network unfriendliness : UDP transmission is not elastic and hence not
TCP friendly. As a result, it either takes unfairly too much bandwidth or
leads to high packet loss in the presence of fluctuating bandwidth.
Unselective data loss : in a video stream, some frames and some data
fields are more important than others and need to be protected. Since
wireless errors can occur at any time, this important data may be lost,
leading to degradation in quality. If the more important frames or
data fields could be selectively protected, better video quality would be
achieved.
Firewall penetration : although some protocols make use of UDP
(STUN, SIP, RTP), applications using UDP experience more firewall
penetration problems than TCP applications.
Advantages :
UDP is fast : there is no connection setup, no retransmission and no
congestion control, and error checking is limited to a lightweight checksum.
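Forward Error Correction, listed above as a way to mitigate UDP packet loss, can be illustrated with the simplest scheme: one XOR parity packet per group of data packets, which lets the receiver rebuild any single lost packet without retransmission. This is a minimal sketch with hypothetical names and packets; practical video FEC uses stronger codes (for example Reed-Solomon).

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Sender: one parity packet protecting a group of data packets."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Receiver: rebuild a single lost packet (marked None) from the others and the parity."""
    if received.count(None) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet per group")
    rebuilt = reduce(xor_bytes, [p for p in received if p is not None], parity)
    return [rebuilt if p is None else p for p in received]


group = [b"AAAA", b"BBBB", b"CCCC"]               # hypothetical video packets
parity = make_parity(group)
print(recover([b"AAAA", None, b"CCCC"], parity))  # [b'AAAA', b'BBBB', b'CCCC']
```

The cost is one extra packet per group of three, paid whether or not a loss occurs.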
TCP :
Disadvantages :
High complexity of the TCP protocol; retransmissions and congestion
control can also add delay and throughput fluctuations.
Advantages :
Reliable transmission : TCP is a reliable protocol and hence effectively
addresses the synchronization and retransmission problems
mentioned above. There is no need for complex error concealment and
resilience mechanisms to be implemented in the client and server.
Network fairness : TCP is intrinsically friendly; it shares network
resources with other data traffic/flows in the presence of congestion.
There is no need to implement other mechanisms to achieve fairness.
It also adapts its transmission rate according to the available network
bandwidth, thereby allowing video applications to make full use of
the bandwidth.
Ease of deployment : using TCP in applications is easy, and TCP
applications more readily penetrate firewalls.
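The rate adaptation described under "Network fairness" comes from TCP's additive-increase/multiplicative-decrease (AIMD) behaviour. This is a simplified sketch of congestion avoidance only (it ignores slow start, timeouts and byte counting); the function name and parameters are hypothetical.

```python
def aimd(loss_events, cwnd=1.0, increase=1.0, decrease=0.5):
    """Congestion window (in segments) after each RTT; loss_events[i] marks a loss in RTT i."""
    history = []
    for loss in loss_events:
        if loss:
            cwnd = max(1.0, cwnd * decrease)  # multiplicative decrease on loss
        else:
            cwnd += increase                  # additive increase per loss-free RTT
        history.append(cwnd)
    return history


print(aimd([False, False, False, True, False]))  # [2.0, 3.0, 4.0, 2.0, 3.0]
```

The sawtooth this produces is fair between competing flows, but it is also the source of the throughput fluctuations a TCP video stream has to buffer against.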
b. UDP uses a checksum for error detection while transporting data packets
over a network. UDP at the sender computes the one's complement of the
sum of all the 16-bit words in the UDP header and data. The checksum is
also calculated over a few of the fields in the IP header (a pseudo-header)
in addition to the UDP header. The computed result is stored in the checksum
field of the UDP header. If the computed checksum is zero, this field must be
set to 0xFFFF.
In the destination computer, IP passes an incoming datagram to UDP if
the value in the protocol field of the IP header indicates UDP. When UDP
receives the datagram from IP, it examines the UDP checksum. All 16-bit
fields in the UDP header are added together, including the checksum. If this
sum equals 1111111111111111, the datagram has no errors. If one of the bits
in the computed sum is zero, it indicates that the datagram was
inadvertently altered during transmission and thus has some errors. If the
checksum field in the UDP header is zero, it means that the sender did not
calculate a checksum and the field can be ignored. If the checksum is valid
(or was not calculated), UDP at the destination computer examines
the destination port number and, if an application is bound to that port, the
datagram is transferred to an application message queue to buffer the
incoming datagrams before transferring them to the application. If the
checksum is not valid, the destination computer discards the UDP
datagram. Example :
UDP header has three fields that contain the following 16-bit values:
0110011001100101, 0101010101010110 and 0000111100001111. The
checksum can be calculated as follows :
The sum of the first and second 16-bit values is :
  0110011001100101
+ 0101010101010110
= 1011101110111011
Adding the third 16-bit value to the above sum gives :
  1011101110111011
+ 0000111100001111
= 1100101011001010
(there is no carry out of the most significant bit, so no wraparound is needed).
Taking the one's complement (inverting every bit) of 1100101011001010 gives
0011010100110101. Now the checksum computed by the sender's UDP is
0011010100110101. At the destination computer, the values of all the
four 16-bit fields, source & destination ports, length and checksum, are
added. If no errors were introduced in the datagram, the sum at the
receiver will be 1111111111111111. If one of the bits is a zero, an error
has been detected.
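The worked example can be reproduced in code. This is a minimal sketch that, like the example, sums only the three given 16-bit words; a real UDP checksum also covers the pseudo-header and the payload. The function names are hypothetical.

```python
def ones_complement_sum(words):
    """16-bit one's complement sum: any carry out of bit 15 is added back in."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def udp_checksum(words):
    """Sender: invert the sum; a result of zero is transmitted as all ones."""
    csum = ~ones_complement_sum(words) & 0xFFFF
    return csum if csum != 0 else 0xFFFF


words = [0b0110011001100101, 0b0101010101010110, 0b0000111100001111]
csum = udp_checksum(words)
print(f"{csum:016b}")                                 # 0011010100110101
print(f"{ones_complement_sum(words + [csum]):016b}")  # 1111111111111111 at the receiver
```

Flipping any single bit of the received words makes the receiver's sum differ from all ones, which is how the error is detected.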
3. a. MPEG-1/2 use fixed size block matching (FSBM), while MPEG-4 uses
variable size block matching (VSBM). (JPEG is a still-image standard and
performs no block matching.)
b. VSBM is an improved version of FSBM that varies the size of blocks to more
accurately match moving areas. This method was proposed by Chan, Yu and
Constantinides. VSBM is a scheme that starts with relatively large blocks,
which are then repeatedly divided; this is a so-called top-down approach. If
the best matching error for a block is above some threshold, the block is
divided into four smaller blocks, until the maximum number of blocks or
locally minimum errors are obtained. The application of such top-down
methods may generate block structures for an image that match real moving
objects, but it seems that an approach which more directly seeks out areas of
uniform motion might be more effective. For the same number of blocks per
frame as FSBM, the VSBM method results in a smaller mean square error (MSE),
or better prediction. More significantly, for a similar level of MSE as FSBM, the
VSBM technique can represent the inherent motion using fewer blocks, and
thus a reduced number of motion vectors.
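The top-down splitting rule can be sketched as a recursive quad-tree. This is a minimal sketch under stated assumptions: the matching error is the MSE of a full search, and splitting stops at an MSE threshold or a minimum block size (standing in for the "maximum number of blocks" stopping rule). All names and the synthetic frames are hypothetical.

```python
def block_error(cur, ref, bx, by, size, dx, dy):
    """Mean squared error between a current block and a displaced reference block."""
    err = 0
    for y in range(size):
        for x in range(size):
            d = cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
            err += d * d
    return err / (size * size)


def best_match(cur, ref, bx, by, size, rng):
    """Full search within +/- rng; return (mse, dx, dy) of the best displacement."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if 0 <= bx + dx <= w - size and 0 <= by + dy <= h - size:
                e = block_error(cur, ref, bx, by, size, dx, dy)
                if best is None or e < best[0]:
                    best = (e, dx, dy)
    return best


def vsbm(cur, ref, bx, by, size, threshold, min_size, rng):
    """Top-down VSBM: keep one vector if the match is good enough, else split into four."""
    e, dx, dy = best_match(cur, ref, bx, by, size, rng)
    if e <= threshold or size <= min_size:
        return [(bx, by, size, dx, dy)]
    half = size // 2
    blocks = []
    for oy in (0, half):
        for ox in (0, half):
            blocks += vsbm(cur, ref, bx + ox, by + oy, half, threshold, min_size, rng)
    return blocks


ref = [[x + 10 * y for x in range(16)] for y in range(16)]  # textured reference frame
cur = [row[:] for row in ref]
for y in range(4, 8):
    for x in range(4, 8):
        cur[y][x] = ref[y][x + 1]  # only this quadrant moves (one pixel left)
print(vsbm(cur, ref, 4, 4, 8, threshold=0.0, min_size=4, rng=2))
# [(4, 4, 4, 1, 0), (8, 4, 4, 0, 0), (4, 8, 4, 0, 0), (8, 8, 4, 0, 0)]
```

A fixed 8 x 8 grid (FSBM) would have to accept the imperfect match for this block; VSBM spends extra vectors only where the motion actually differs.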
4. a. Chroma subsampling benefits : chroma subsampling is a technique used
to reduce bandwidth in many video systems. Since the human visual
system is less sensitive to color detail than to brightness (luma) detail,
color resolution can be reduced to lower bandwidth. Video systems do
this via chroma subsampling.
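The bandwidth saving can be computed directly from the sample counts. In 4:2:2 each chroma plane has half the luma resolution horizontally; in 4:2:0 it has half in both directions. The sketch below counts samples per frame and shows one possible 4:2:0 downsampling filter (a plain 2 x 2 average, which is an assumption; real systems use various filters and chroma siting). Function names are hypothetical.

```python
def samples_per_frame(width, height, scheme):
    """Total luma + chroma samples per frame for common subsampling schemes."""
    luma = width * height
    per_chroma_plane = {"4:4:4": luma, "4:2:2": luma // 2, "4:2:0": luma // 4}[scheme]
    return luma + 2 * per_chroma_plane   # Y plus two chroma planes (Cb, Cr)


def subsample_420(plane):
    """Average each 2 x 2 neighbourhood of a chroma plane (even width/height assumed)."""
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]


full = samples_per_frame(1920, 1080, "4:4:4")
for scheme in ("4:2:2", "4:2:0"):
    print(scheme, samples_per_frame(1920, 1080, scheme) / full)
# 4:2:2 needs about 0.667 of the 4:4:4 samples, 4:2:0 needs 0.5
print(subsample_420([[1, 3], [5, 7]]))  # [[4.0]]
```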
b. Artefacts found in MPEG-encoded video :
Aliasing : occurs when a signal being sampled contains frequencies
that are too high (above half the sampling frequency, the Nyquist
frequency) to be successfully digitized at a given sampling
frequency. When sampled, these high frequencies fold back on top of
the lower frequencies, producing distortion. In most methods of video
digitizing, this will produce pronounced vertical lines in the picture.
This problem can be reduced by applying a low pass filter to the
video signal before it is digitized to remove the unwanted high
frequency components.
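The fold-back can be demonstrated numerically: a 7 Hz cosine sampled at 10 Hz (Nyquist frequency 5 Hz) produces exactly the same samples as a 3 Hz cosine, so after sampling the two cannot be told apart. The frequencies are made-up illustration values.

```python
import math


def sample_cosine(freq, fs, count):
    """Sample cos(2*pi*freq*t) at t = n / fs for n = 0 .. count - 1."""
    return [math.cos(2 * math.pi * freq * n / fs) for n in range(count)]


fs = 10.0              # sampling frequency in Hz
f_high = 7.0           # above the Nyquist frequency fs / 2 = 5 Hz
f_alias = fs - f_high  # folds back to 3 Hz

high = sample_cosine(f_high, fs, 20)
alias = sample_cosine(f_alias, fs, 20)
print(max(abs(a - b) for a, b in zip(high, alias)) < 1e-9)  # True: identical samples
```

This is why the low pass filter must be applied before digitizing: once sampled, the alias is indistinguishable from genuine low-frequency content.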
Quantisation Noise : this form of distortion occurs because, when
digitized, the continuously variable analogue waveform must be
quantized into a fixed, finite number of levels. It is the coarseness of
these levels that causes quantisation noise. A 24-bit colour picture
suffers from virtually no quantisation noise, since the number of
available colours is so high (about 16.7 million). Reasonable results can be
obtained from an 8-bit-per-pixel picture, especially if the picture is
greyscale rather than colour.
Overload : like quantisation noise, overload is related to the finite
number of levels that the signal can take. If a signal that is too high in
amplitude is digitized, the picture will appear bleached. For
example, if the signal level of a greyscale image is too high for the
conversion process to cope with, then all levels above the
maximum will be converted to white, causing a washed-out appearance.
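Both quantisation noise and overload come from the same digitizing step, sketched below: inputs above the maximum are clipped to the top code (overload, i.e. white), and every in-range input is rounded to the nearest of 2**bits levels, so the error never exceeds half a step. The function names and the normalised 0..1 signal range are assumptions.

```python
def digitize(v, bits, vmax=1.0):
    """Map an analogue level in [0, vmax] to an integer code with 2**bits levels."""
    top = (1 << bits) - 1
    clipped = min(max(v, 0.0), vmax)   # overload: anything above vmax becomes white
    return round(clipped / vmax * top)


def to_level(code, bits, vmax=1.0):
    """Reconstruct the analogue level that a code represents."""
    return code / ((1 << bits) - 1) * vmax


print(2 ** 24)          # 16777216 colours for 24-bit colour
print(digitize(1.7, 8)) # 255: an over-range signal is clipped to white
worst = max(abs(to_level(digitize(v / 1000, 3), 3) - v / 1000) for v in range(1001))
print(worst <= 0.5 / 7 + 1e-12)  # True: with 3 bits the error stays within half a step
```

With only 3 bits the step is 1/7 of the range and the noise is clearly visible; with 8 bits per channel it shrinks to 1/255.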
Video signal degradation : video in digital form degrades far less
gracefully than its analogue counterpart. While digital information
may in theory be duplicated an infinite number of times without any
degradation, once degradation does occur, it is very
noticeable. Due to the compression techniques used, a single bit
error in the data stream could, for example, cause a large block of
pixels to be displayed in a completely different colour to that intended.
Gibbs effect : this is most noticeable around artificial objects such
as plain-coloured large text and geometric shapes such as squares. It
shows up as a blurring or haze around the object, where the sudden
transition is made from the artificial object to the background. It is
caused by the discrete cosine transform used to code chrominance and
luminance information: coarse quantisation of the high-frequency DCT
coefficients leaves ringing around sharp edges. This phenomenon is also
apparent around more natural shapes like a human figure.
Blockiness : when video footage involving high speed motion is
digitized, the individual 8x8 blocks that make up the picture
become more pronounced.

5. PRISM codec improvement. New processes are proposed for PRISM
encoding and decoding. The proposed processes are :
a. Syndrome coding : syndrome coding is introduced to make it possible to
exploit temporal redundancy at the decoder. This module can achieve
compression without knowing the exact difference between the current
video block and the best predictor created at the decoder based on the
previously decoded frame. The compression is achieved through the careful
selection of the quantized DCT coefficient bitplanes that are transmitted
to the decoder. So, the syndrome encoder selects the bitplanes that will be
transmitted to the decoder and those that will be estimated at the decoder
based on good temporal prediction.
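The core idea of syndrome (coset) coding can be shown on a single integer. This is a simplified scalar sketch, not the actual PRISM scheme (which works on bitplanes of quantized DCT coefficients with channel codes): the encoder sends only the k least significant bits of a value, and the decoder picks the value with those bits that is closest to its own temporal prediction (the side information). Decoding is correct as long as the prediction error is smaller than half the coset spacing. All names and numbers are hypothetical.

```python
def syndrome(x, k):
    """Encoder: transmit only the k least significant bits of x (its coset index)."""
    return x % (1 << k)


def coset_decode(s, k, side_info):
    """Decoder: choose the member of coset s closest to the side information."""
    step = 1 << k
    below = side_info - ((side_info - s) % step)  # largest coset member <= side_info
    return min((below, below + step), key=lambda c: abs(c - side_info))


x = 37                                   # hypothetical quantized coefficient
s = syndrome(x, 3)                       # 5: only 3 bits are sent
print(coset_decode(s, 3, side_info=35))  # 37: correct while |x - side_info| < 4
```

The encoder never needs the predictor itself; it only needs to know roughly how good the decoder's prediction will be, which is what the block classification step estimates.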
b. Block classification : the number of bitplanes coded for a given DCT
coefficient should increase as the difference between the current block
and the best predictor created at the decoder based on the previously
decoded frame increases; this difference corresponds to correlation
noise. However, as the encoder cannot rely on motion estimation, the
correlation noise has to be predicted somehow based on the available
information from the previous frame. The approach is to classify each
block to be coded into one of 16 correlation classes, [0;15], relying on the
difference between the current block and the collocated one in the
previous frame, measured through the MSE. Blocks classified as 0 are
skipped (no bits are transmitted since there is very high correlation) and
those classified as 15 are intra encoded, since it is considered that there
is not enough temporal redundancy to be exploited.
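The classification step can be sketched as thresholding the MSE against the collocated block. The 15 class boundaries used here (powers of two) are purely hypothetical; the text does not give PRISM's actual thresholds.

```python
# Hypothetical thresholds: 15 increasing MSE boundaries separating classes 0..15.
THRESHOLDS = [2 ** i for i in range(15)]  # 1, 2, 4, ..., 16384


def mse(block, collocated):
    """Mean squared error between a block and the collocated block of the previous frame."""
    diffs = [a - b for row_a, row_b in zip(block, collocated) for a, b in zip(row_a, row_b)]
    return sum(d * d for d in diffs) / len(diffs)


def classify(block, collocated):
    """Return a correlation class in [0, 15]; higher means weaker temporal correlation."""
    error = mse(block, collocated)
    for cls, limit in enumerate(THRESHOLDS):
        if error < limit:
            return cls
    return 15


def coding_mode(cls):
    if cls == 0:
        return "skip"      # very high correlation: no bits sent
    if cls == 15:
        return "intra"     # too little temporal redundancy
    return "syndrome"      # number of bitplanes chosen according to the class


zero = [[0, 0], [0, 0]]
for block in ([[0, 0], [0, 0]], [[2, 2], [2, 0]], [[255, 255], [255, 255]]):
    cls = classify(block, zero)
    print(cls, coding_mode(cls))
# prints 0 skip, 2 syndrome, 15 intra (one pair per line)
```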
c. Cyclic redundancy coding : the decoder side information is constituted by
multiple video block candidates, corresponding to motion-compensated
blocks from the previously decoded frame. While any of the side
information candidates can help disambiguate the information received from
the syndrome encoder, not all can do it correctly because some of them
are too different from the original block. To detect that a block has been
successfully decoded, a 16-bit CRC, playing the role of a block
hash/signature, is transmitted to the decoder.
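The decoder's use of the CRC can be sketched as: decode with each side-information candidate, and accept the first result whose CRC matches the one sent by the encoder. The choice of CRC-16 polynomial (0x1021 with initial value 0xFFFF, the CRC-16/CCITT-FALSE variant) is an assumption, since the text does not name one; the block contents are hypothetical.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, initial value 0xFFFF (CRC-16/CCITT-FALSE)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pick_candidate(decoded_candidates, sent_crc):
    """Return the index of the first decoded candidate whose CRC matches, else None."""
    for index, block in enumerate(decoded_candidates):
        if crc16_ccitt(block) == sent_crc:
            return index
    return None


original = bytes(range(16))                          # hypothetical decoded block samples
sent = crc16_ccitt(original)                         # computed at the encoder
wrong = bytes([original[0] ^ 0x01]) + original[1:]   # decode from a poor candidate
print(pick_candidate([wrong, original], sent))       # 1
print(hex(crc16_ccitt(b"123456789")))                # 0x29b1 (standard check value)
```

A 16-bit CRC is guaranteed to catch any single-bit difference, and wrong candidates with larger differences slip through only with a probability of about 1 in 65536.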