Unit II

Part 2: Video Compression
• Principles of Video
• H.261
• H.263
• MPEG-1
• MPEG-2
• MPEG-4
• JPEG stands for 'Joint Photographic
Experts Group', - the term 'Joint' refers
to the link between the standardization
bodies that created these working
groups, ISO and ITU-T.
• One approach to compressing a video
source is to apply the JPEG algorithm to
each frame independently.
• This is known as moving JPEG or motion JPEG (MJPEG).

Video Compression
• If a typical movie scene has a minimum duration of 3 seconds then, assuming a frame refresh rate of 60 frames/s, each scene is composed of 180 frames. Hence, by sending only those segments of each frame that have movement associated with them, considerable additional savings in bandwidth can be made.
• There are two types of compressed frames: those that are compressed independently (I-frames) and those that are predicted from other frames (P- and B-frames).

In video telephony there are fine movements of the face and hands, yet the background information remains the same from frame to frame throughout the sequence.

In live streaming of a tennis match, the motion is confined to the players and the ball, while the remaining information (the stadium and the audience) stays similar throughout the video.

Video Compression (Example frame sequences: I and P frames)
• In the context of compression, video is also referred to as moving pictures, and the terms "frame" and "picture" are used interchangeably, since video is simply a sequence of digitized pictures.

Video Compression – I frames
• I-frames (intracoded frames) are encoded without reference to any other frames.
• Each frame is treated as a separate picture and the Y, Cr and Cb matrices are encoded separately using JPEG.
• For I-frames the level of compression is relatively small.
• They are good for the first frame relating to a new scene in a movie.
• I-frames must be repeated at regular intervals to avoid losing the whole picture, since a frame can be corrupted during transmission and hence lost. The number of frames/pictures between successive I-frames is therefore kept limited.

Video Frames

Video Compression – P frames
• The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame.
• P-frames are encoded using a combination of motion estimation and motion compensation.
• The accuracy of the prediction operation is determined by how well any movement between successive frames is estimated. This is known as motion estimation.
• Since the estimation is not exact, additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved. This is known as motion compensation.
• The number of P-frames between I-frames is limited to avoid error propagation.

Video Compression – Frame Sequences: I-, P- and B-frames
• Each frame is treated as a separate (digitized) picture and the Y, Cb and Cr matrices are encoded independently using the JPEG algorithm (DCT, quantization, entropy encoding) except that the quantization

Video Compression – PB-frames
• A fourth type of frame known as the PB-frame has also been defined. It does not refer to a new frame type as such but rather to the way two neighbouring P- and B-frames are encoded as if they were a single frame.

Video in Multimedia Applications
• Entertainment: broadcast television and VCR/DVD recordings
• Interpersonal: video telephony and video conferencing
• Interactive: access to stored video
• To understand the need for video compression, look at the bit rate of certain applications.

Video Compression
• Motion estimation involves comparing small segments of two consecutive frames for differences and, should a difference be detected, a search is carried out to determine to which neighbouring segment the original segment has moved.
• To limit the time for the search, the comparison is limited to a few segments.
• This works well in slow-moving applications like video telephony.
• For fast-moving video it will not work effectively. Hence B-frames (bidirectional frames) are used. Their contents are predicted using both past and future frames.
• B-frames provide the highest level of compression and, because they are not involved in the coding of other frames, they do not propagate errors.

I-frame Implementation Schematic Intraframe Coding The encoding procedure used for the macroblocks that make up an I-frame is the same as that used in the JPEG standard

I-frame Implementation Schematic (Simplified)

RGB and YCrCb
• Quality of video depends on (i) the digitization format and (ii) the frame refresh rate.
• Since the three component signals R, G and B are treated separately in digital television, it is possible to digitize the three signals separately to make up the picture.
• All three signals R, G and B should have the same resolution in terms of sampling rate and number of bits per sample.

Chroma Subsampling: downsampling chrominance

Chroma Subsampling
• Take advantage of an important property of the human psychovisual system.
• The human vision system is more sensitive to the luminance component than to the chrominance components. That is, the human eye is more sensitive to brightness than it is to colour.
• So it makes sense that we can drop some of the chrominance information.
• A subsampling format of 4:4:4 means that for every 4 luminance components, we have 4 Cb and 4 Cr components.
• No compression takes place in 4:4:4. It is not useful because one hour of uncompressed video occupies about 100 GB of space.

Chroma Subsampling
• In the 4:2:2 scheme, we have horizontal subsampling: the number of chroma samples is half that of the luma samples.
• In the 4:1:1 format, we have horizontal subsampling where, for every 4 Y components horizontally, we have one Cb and one Cr component.
• In the 4:2:0 scheme, we have horizontal and vertical subsampling. Every 2×2 pixel block will have 4 luma samples, and 1 Cb and 1 Cr sample.
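The subsampling schemes above can be illustrated with a small sketch. It is not taken from any standard: the function names are hypothetical, planes are plain lists of rows, and averaging each 2×2 block is just one possible way to form the single chroma sample (an encoder could equally pick one of the four values).

```python
def samples_per_frame(width, height, scheme):
    """Return (Y, Cb, Cr) sample counts for one frame under the given scheme."""
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    divisors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:1:1": (4, 1), "4:2:0": (2, 2)}
    h_div, v_div = divisors[scheme]
    chroma = (width // h_div) * (height // v_div)
    return width * height, chroma, chroma


def subsample_420(plane):
    """Reduce a chroma plane to one sample per 2x2 block (assumption: averaging)."""
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch assumes even plane dimensions")
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

For a 352×288 frame, `samples_per_frame(352, 288, "4:2:0")` gives 101376 luma samples and 25344 samples for each chroma plane, i.e. a 176×144 chroma resolution.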

Macroblock
The digitized contents of the Y matrix associated with each frame are divided into a two-dimensional matrix of 16 × 16 pixel blocks, each known as a macroblock.

Video Compression – P-frame encoding
• 4 DCT blocks for the luminance signal (in the example here) and 1 each for the two chrominance signals are used.
• To encode a P-frame, the contents of each macroblock in the frame (known as the target frame) are compared on a pixel-by-pixel basis with the contents of the preceding I- or P-frame (the reference frame).
• If a close match is found, then only the address of the macroblock is encoded.
• If a match is not found, the search is extended to cover an area around the macroblock in the reference frame.

Video Compression – P-frame Encoding
To encode a P-frame, the contents of each macroblock in the frame (target frame) are compared on a pixel-by-pixel basis with the contents of the corresponding macroblock in the preceding I- or P-frame.
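The pixel-by-pixel comparison and the extended search can be sketched as an exhaustive block-matching routine. This is a minimal illustration under stated assumptions: frames are lists of rows of luminance values, the match measure is the sum of absolute differences (SAD), and the function names, the ±4 pixel search range and the restriction to blocks lying wholly inside the frame are choices made for this example, not requirements taken from the slides.

```python
def sad(target, reference, ty, tx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(target[ty + i][tx + j] - reference[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def find_motion_vector(target, reference, ty, tx, size=16, search=4):
    """Search +/- `search` pixels around (ty, tx) in the reference frame.

    Returns ((dy, dx), cost) for the best-matching block that lies wholly
    inside the reference frame, or None if no candidate fits.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, reference, ty, tx, ry, rx, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best
```

A cost of 0 means the macroblock was found unchanged at offset (dy, dx); a small non-zero cost means a close match whose residual differences still have to be sent.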

Video Compression – B-frame Encoding

• To encode a B-frame, any motion is estimated with reference to both the preceding I- or P-frame and the succeeding P- or I-frame.
• The motion vector and difference matrices are computed using first the preceding frame as the reference frame and then the succeeding frame as the reference.
• A third motion vector and set of difference matrices are then computed using the target and the mean of the two other predicted sets of values.
• The set with the lowest set of difference matrices is chosen and encoded.
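The three-way choice for a B-frame macroblock can be sketched as follows. The sketch assumes the two motion-compensated predictions have already been found, represents each block as a flat list of pixel values, and measures the "lowest set of difference matrices" by the sum of absolute differences; the names and the cost measure are illustrative assumptions.

```python
def block_cost(target, prediction):
    """Sum of absolute differences between a target block and a prediction."""
    return sum(abs(t - p) for t, p in zip(target, prediction))


def choose_b_prediction(target, forward_pred, backward_pred):
    """Pick the prediction with the smallest differences.

    forward_pred comes from the preceding I/P frame, backward_pred from the
    succeeding P/I frame; the third candidate is the mean of the two.
    Returns (mode, differences).
    """
    mean_pred = [(f + b) // 2 for f, b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "interpolated": mean_pred,
    }
    mode = min(candidates, key=lambda name: block_cost(target, candidates[name]))
    differences = [t - p for t, p in zip(target, candidates[mode])]
    return mode, differences
```

For example, a target of `[10, 20, 30, 40]` with predictions `[8, 18, 28, 38]` and `[12, 22, 32, 42]` is matched exactly by the mean of the two, so the interpolated mode is chosen with all-zero differences.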

Decoding of I-, P- and B-frames
• I-frames: decoded immediately to recreate the original frame.
• P-frames: the received information is decoded and the result is used together with the decoded contents of the preceding I- or P-frame (two buffers are used).
• B-frames: the received information is decoded and the result is used together with the decoded contents of the preceding and succeeding P- or I-frame (three buffers are used).

PB-frames
A new frame type showing how two neighbouring P- and B-frames are encoded as if they were a single frame.

Implementation schematic – I-frames Intraframe Coding • The encoding procedure used for the macroblocks that make up an I-frame is the same as that used in the JPEG standard

Implementation Issues – P-frames
• In the case of P-frames, the encoding of each macroblock is dependent on the output of the motion estimation unit which, in turn, depends on the contents of the macroblock being encoded and the contents of the macroblock in the search area of the reference frame that produces the closest match. There are three possibilities:
1. If the two contents are the same, only the address of the macroblock in the reference frame is encoded.
2. If the two contents are very close, both the motion vector and the difference matrices associated with the macroblock in the reference frame are encoded.
3. If no close match is found, then the target macroblock is encoded in the same way as a macroblock in an I-frame.
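The three possibilities can be written as a small decision function. The slides do not say how "very close" is measured, so this sketch assumes the motion estimation unit reports a single mismatch value (for instance a sum of absolute differences) and that `close_threshold` is a tunable, hypothetical parameter.

```python
def choose_p_macroblock_mode(best_mismatch, close_threshold):
    """Map the motion estimation result onto the three P-frame cases.

    best_mismatch: mismatch of the closest block found in the search area,
                   or None if the search produced no candidate.
    close_threshold: largest mismatch still treated as "very close"
                     (an assumption of this sketch).
    """
    if best_mismatch is None or best_mismatch > close_threshold:
        return "intra"  # case 3: encode as in an I-frame
    if best_mismatch == 0:
        return "address_only"  # case 1: contents are the same
    return "motion_vector_and_differences"  # case 2: contents are very close
```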

Department of ECE, RMKCET

Inter-frame (P-frame) Coding
P-frames use "pseudo-differences" from the previous frame (they are "predicted"), so frames depend on each other.

Implementation Issues – B-frames

Implementation Issues – Bitstream format
For each macroblock it is necessary to identify the type of encoding that has been used. This is the role of the formatter.
• Type: indicates the type of frame encoded (I, P or B)
• Address: identifies the location of the macroblock in the frame
• Quantization value: the value used to quantize all the DCT coefficients in the macroblock
• Motion vector: the encoded vector
• Block representation: indicates which of the six 8×8 blocks that make up the macroblock are present
• B1, B2, ..., B6: JPEG-encoded DCT coefficients for those blocks that are present
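The fields listed above can be pictured as a simple record. This is only a sketch of the logical content: the class name, field names and Python types are hypothetical, and nothing here reflects the actual bit widths or ordering used in a real bitstream.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MacroblockRecord:
    frame_type: str                       # "I", "P" or "B"
    address: int                          # location of the macroblock in the frame
    quantization_value: int               # used for all DCT coefficients in the macroblock
    motion_vector: Optional[tuple] = None         # (dy, dx); absent for intracoded macroblocks
    block_present: tuple = (True,) * 6            # which of the six 8x8 blocks are present
    blocks: list = field(default_factory=list)    # encoded coefficients, present blocks only

    def is_consistent(self):
        """Check that one coefficient block is stored per block flagged as present."""
        return len(self.block_present) == 6 and len(self.blocks) == sum(self.block_present)
```

Keeping the presence flags separate from the coefficient data mirrors the idea that blocks with nothing to send are simply omitted.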
Digitization Format
• The digitization format defines the sampling rate to be used for the luminance and the two chrominance signals, and their relative position in each frame.
• The digitization format exploits the fact that the two chrominance signals can tolerate a reduced resolution relative to that used for the luminance signal.

H.261 Video Compression Standard (ITU-T)
• Standard defined by ITU-T for the provision of video telephony and video conferencing services over the ISDN.
• The network offers transmission channels in multiples of 64 kbps.
• Hence the standard is also known as p×64, where p can be 1 to 30.
• Digitization format used: QCIF (for video telephony) or CIF (for video conferencing).

H.261 Video Compression Standard (ITU-T)
• Spatial resolution of each format:
1. CIF: Y = 352×288, Cb = Cr = 176×144
2. QCIF: Y = 176×144, Cb = Cr = 88×72
• Progressive (non-interlaced) scanning is used, with a frame refresh rate of 30 fps for CIF and either 15 or 7.5 fps for QCIF.
• Only I- and P-frames are used in H.261, with three P-frames between each pair of I-frames.
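To see why compression is needed at these resolutions, the uncompressed bit rate of each format can be compared with the p×64 channel rates. The sketch below assumes 8 bits per sample, which the slides do not state, and takes 64 kbps as 64 000 bit/s; the function names are illustrative.

```python
def raw_bit_rate(luma, chroma, fps, bits_per_sample=8):
    """Uncompressed bit rate in bit/s for one Y plane plus Cb and Cr planes.

    luma and chroma are (width, height) tuples; bits_per_sample=8 is an
    assumption of this sketch.
    """
    samples = luma[0] * luma[1] + 2 * chroma[0] * chroma[1]
    return samples * bits_per_sample * fps


def px64_rate(p):
    """Channel rate in bit/s for a p x 64 kbps connection, with p from 1 to 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * 64_000


cif_rate = raw_bit_rate((352, 288), (176, 144), fps=30)   # about 36.5 Mbit/s
qcif_rate = raw_bit_rate((176, 144), (88, 72), fps=15)    # about 4.6 Mbit/s
cif_ratio = cif_rate / px64_rate(30)                       # roughly 19:1 even at p = 30
qcif_ratio = qcif_rate / px64_rate(1)                      # roughly 71:1 at p = 1
```

Under these assumptions, even the largest channel (p = 30, 1.92 Mbit/s) needs roughly a 19:1 reduction for CIF at 30 fps.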


H.261 Video Encoder


H.263 Video Compression Standard (ITU-T)
• Defined by ITU-T for use in a range of video applications over wireless networks and PSTNs.
• Applications include video telephony, security surveillance, interactive game playing, etc., all of which require the output of the video encoder to be transmitted across the network connection in real time as it is output by the encoder.
• To transmit digital signals over PSTN access circuits, a modem is required, with bit rates that used to be 28.8 kbps or 56 kbps.
• This has put a demand on the encoder to compress video at these very low bit rates.

• The H.263 encoder is based on that used in the H.261 standard.
• At bit rates lower than 64 kbps, the H.261 encoder gives relatively poor picture quality. Since it uses only I- and P-frames, at low bit rates it has to revert to using a high quantization threshold and a relatively low frame rate.
• High quantization leads to blocking artifacts and a low frame rate leads to jerky movements.
• To minimize these effects, H.263 uses advanced coding options.

Blocking Artifact and Jerky Movement

H.263 Features
• Two mandatory formats associated with digital video: QCIF and Sub-QCIF.
• Spatial resolution of each format:
1. QCIF: Y = 176×144, Cb = Cr = 88×72
2. S-QCIF: Y = 128×96, Cb = Cr = 64×48
• Progressive (non-interlaced) scanning is used, with a frame refresh rate of either 15 or 7.5 fps.
• Frame types: I-, P- and B-frames are used. To achieve a higher frame rate, neighbouring P- and B-frames are encoded as a single entity (PB-frame).

Unrestricted Motion Vectors
• The motion vectors associated with predicted macroblocks are normally restricted to a defined area in the reference frame around the location in the target frame of the macroblock being encoded.
• In the unrestricted motion vector mode, the motion vector is allowed to point outside the frame area: for those pixels of a potentially close-match macroblock that fall outside the frame boundary, the edge pixels are used instead.
• This gives an improvement in the level of compression.
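The use of edge pixels for positions outside the frame can be sketched by clamping coordinates to the frame boundary when a reference block is fetched. The function name and the list-of-rows frame layout are assumptions made for this example.

```python
def fetch_block_unrestricted(reference, top, left, size):
    """Read a size x size block whose position may lie partly outside the frame.

    Any coordinate beyond the frame boundary is clamped to the nearest edge
    pixel, so a motion vector may point outside the frame area.
    """
    height, width = len(reference), len(reference[0])

    def clamp(value, upper):
        return max(0, min(value, upper))

    return [
        [
            reference[clamp(top + i, height - 1)][clamp(left + j, width - 1)]
            for j in range(size)
        ]
        for i in range(size)
    ]
```

With a 2×2 frame `[[1, 2], [3, 4]]`, a block fetched at (-1, -1) is filled entirely with the top-left edge pixel 1.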

Error Resilience
• The target networks for the H.263 standard are wireless networks and the PSTN.
• With this type of network, there is a relatively high probability that transmission bit errors will be present in the bit stream received by the decoder.
• Short bursts of errors corrupt a string of macroblocks within a frame.
• It is not possible to identify the specific macroblocks that are corrupted, only that the related group of blocks (GOB) contains one or more macroblocks in error.

Error Resilience
• As the frame contents are predicted from information in other frames, it is highly probable that the same GOB in each of the following frames that are derived from the GOB in error will contain errors.
• This means that when an error in a GOB occurs, the error will persist for a number of frames, hence making the error more apparent to the viewer.

Error Resilience
• When an error in a GOB is detected, the decoder skips the remaining blocks in the affected GOB and searches for the resynchronization marker (start code) at the head of the next GOB.
• It then recommences decoding from the start of this GOB.
• In order to mask the error from the viewer, an error concealment scheme is incorporated into the decoder.
• For example, a common approach is to use the contents of the corresponding GOB from the preceding (decoded) frame.
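The skip-and-conceal behaviour can be sketched at the level of whole GOBs. In this illustration the received frame is already split at the GOB start codes, `decode_gob` is a hypothetical callable that raises `ValueError` when it detects an error, and concealment copies the corresponding GOB from the previous decoded frame.

```python
def decode_frame(received_gobs, previous_frame, decode_gob):
    """Decode one frame GOB by GOB, concealing any GOB found to be in error.

    received_gobs: one chunk of data per GOB (already split at start codes).
    previous_frame: list of decoded GOBs from the preceding frame.
    decode_gob: hypothetical decoder for a single GOB; raises ValueError on error.
    """
    frame = []
    for index, data in enumerate(received_gobs):
        try:
            frame.append(decode_gob(data))
        except ValueError:
            # Abandon the rest of this GOB, reuse the previous frame's GOB,
            # and resume decoding at the start of the next GOB.
            frame.append(previous_frame[index])
    return frame
```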

Error Resilience
• The PSTN provides only a relatively low bit rate transmission channel and, to conserve bandwidth, I-frames are inserted at relatively infrequent intervals.
• The lack of I-frames has the effect that errors within a GOB may propagate to other regions of the frame, due to the resulting errors in the motion estimation and motion compensation information.
• Although the initial error occurs in one GOB position, it rapidly spreads to other neighbouring GOBs. This may have an annoying effect on the viewer. It is shown in the next slide (Fig. 4.27).
• Schemes to minimize this effect are: Error Tracking, Independent Segment Decoding, and Reference Picture Selection.


Error Tracking
• With real-time applications such as video telephony, a two-way channel is required for the exchange of the compressed audio and video information generated by the codec in each terminal.
• When an error is detected, the return channel is used by the decoder to send a NAK message back to the encoder.
• The encoder identifies the macroblocks in those GOBs and later frames that are likely to be affected. It then proceeds to transmit the macroblocks in these frames in their intracoded form.

Independent Segment Decoding
• This scheme prevents errors in a GOB from affecting neighbouring GOBs in succeeding frames.
• To achieve this, each GOB is treated as a separate sub-video which is independent of the other GOBs in the frame.
• This means that motion estimation and motion compensation are limited to the boundary pixels of a GOB rather than of a frame.
• When an error in a GOB occurs, the same GOB in each successive frame is affected until an intracoded frame is sent by the encoder.


Reference Picture Selection
• This scheme is similar to the error tracking scheme. It can be operated in two different modes: NAK and ACK.
• In NAK mode the GOB in error will propagate for a number of frames, the number being determined by the round trip delay of the communication channel, that is, the time delay between the NAK being sent by the decoder and an inter-coded frame derived from the initial I-frame being received.
• When the NAK relating to frame 2 is received, the encoder selects (the decoded) GOB 3 of frame 1 as the reference to encode GOB 3 of the next frame (i.e. the 5th frame).


Reference Picture Selection
• With ACK mode, all frames received without errors are acknowledged by the decoder returning an ACK message.
• Only frames that have been acknowledged are used as reference frames.
• In this example, the lack of an ACK for frame 3 means that frame 2 must be used to encode frame 5 and frame 6.
• At this point the ACK for frame 4 is received and this is used to encode frame 7.
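The ACK-mode rule can be sketched as picking the most recent acknowledged frame as the reference. The function name is hypothetical, frames are identified by their sequence numbers, and returning `None` when nothing has been acknowledged yet (so the frame would have to be intracoded) is an assumption of this sketch.

```python
def select_reference(frame_to_encode, acked_frames):
    """Return the latest acknowledged frame that precedes frame_to_encode.

    acked_frames: set of frame numbers for which an ACK has arrived so far.
    Returns None if no earlier frame has been acknowledged (assumption: the
    encoder would then fall back to intracoding).
    """
    candidates = [f for f in acked_frames if f < frame_to_encode]
    return max(candidates) if candidates else None
```

Replaying the example above: with ACKs for frames 1 and 2 only, frames 5 and 6 are both encoded relative to frame 2; once the ACK for frame 4 arrives, frame 7 uses frame 4.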

Video Compression – MPEG-1 example frame sequence
• Uses a similar video compression technique to H.261. The digitization format used is the source intermediate format (SIF), with progressive scanning and a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).

MPEG
• MPEG-1: ISO Recommendation 11172. Uses a resolution of 352×288 pixels and is used for VHS-quality audio and video on CD-ROM at a bit rate of 1.5 Mbps.
• MPEG-2: ISO Recommendation 13818. Used in recording and transmission of studio-quality audio and video. Different levels of video resolution are possible:
  Low: 352×288 pixels, comparable with MPEG-1
  Main: 720×576 pixels, studio-quality video and audio, bit rate up to 15 Mbps
  High: 1920×1152 pixels, used in wide screen

MPEG
• MPEG-4: used for interactive multimedia applications over the Internet and over various entertainment networks.
• The MPEG-4 standard contains features that enable a user not only to passively access a video sequence (using, for example, start/stop) but also to manipulate the individual elements that make up a scene within a video.
• In MPEG-4 each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an audio-visual object (AVO) of interest.
• Each audio and video object has a separate object descriptor associated with it, which allows the object to be manipulated by the viewer prior to it being decoded and played out, provided the creator of the audio and/or video has made this facility available.

Video Compression – MPEG-1 video bitstream structure: composition
• The compressed bitstream produced by the video encoder is hierarchical: at the top level, the complete compressed video (sequence)


Video Compression – MPEG-1 video bitstream structure: format
• In order for the decoder to decompress the received bitstream, each data structure must be clearly identified within the bitstream.


Video Compression – MPEG-4 coding principles
• Content-based video coding principles, showing how a frame/scene is defined in the form of multiple video object planes.


Video Compression – MPEG-4 encoder/decoder schematic
• Before being compressed, each scene is defined in the form of a background and one or more foreground audio-visual objects (AVOs).

Video Compression – MPEG VOP

The audio associated with an AVO is compressed using one of the
algorithms described before; the choice depends on the available bit
rate of the transmission channel and the sound quality required.

