
Project Report

ON
“VIDEO COMPRESSION”

Submitted in partial fulfillment of the requirement for the degree


of
B. Tech. in Computer Science & Engineering.

Under Guidance of:
Mr. Roshan Singh
(Assistant Professor)

Submitted By:
Amit Saini (Roll No.)
Raj Kamal Sharma (Roll No.)
Sandeep Yadav (Roll No.)

MANAV RACHNA COLLEGE OF ENGINEERING


FARIDABAD
BATCH (2007-2011)

Table of Contents

Chapter 1. Introduction

1.1 Objective(s) of the System/Tool………………………………………………1

1.2 Scope of the System/Tool………………………………………………………2

1.3 Problem Definition of the System/Tool…………………………………………3

1.4 Hardware and Software Requirements…………………………………………3

Chapter 2. Problem Analysis

2.1 Literature Survey…………………………………………………………….4

2.1.1 Introduction.........................................................................................4
2.1.2 VIDEO COMPRESSION TECHNOLOGY……...............................5
2.1.3 COMPRESSION STANDARDS………………................................6
2.1.4 MPEG-1...............................................................................................9
2.1.5 MPEG-2..............................................................................................12
2.1.6 MPEG-4..............................................................................................17
2.1.7 MPEG-7..............................................................................................19
2.1.8 H.261...................................................................................................21
2.2 Methodology Adopted………………………………………………………..24

Chapter 3. Project Estimation and Implementation Plan

3.1 Cost and Benefit Analysis……………………………………………………26

3.2. Schedule Estimate…………………………………………………………………………28

3.3 PERT Chart/ Gantt Chart………………………………………………………………28

References………………………………………………………………………30

Chapter -1 Introduction

1.1. OBJECTIVE
1.1.1. NEED OF THE SYSTEM: Uncompressed video (and audio) data are huge. In
HDTV, the bit rate easily exceeds 1 Gbit/s, which poses serious problems for storage and
network communication. For example, one of the formats defined for HDTV broadcasting
within the United States is 1920 pixels horizontally by 1080 lines vertically, at 30
frames per second. Multiplying these numbers together, along with 8 bits for
each of the three primary colors, gives a total data rate of approximately 1.5
Gbit/s. Because of the 6 MHz channel bandwidth allocated, each channel will only
support a data rate of 19.2 Mbit/s, which is further reduced to 18 Mbit/s by the fact
that the channel must also carry audio, transport, and ancillary data. As
can be seen, this restriction in data rate means that the original signal must be
compressed by a factor of approximately 83:1. This number seems all the more
impressive when it is realized that the intent is to deliver very high-quality video to the
end user, with as few visible artifacts as possible.
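The arithmetic above can be checked directly (a short Python sketch, used here only for illustration):

```python
# Raw HDTV bit rate: 1920 x 1080 pixels, 30 frames/s, 3 colors x 8 bits each.
raw_bps = 1920 * 1080 * 30 * 3 * 8   # bits per second
print(raw_bps)                       # 1492992000, i.e. ~1.5 Gbit/s

# Usable channel capacity after audio, transport and ancillary data.
channel_bps = 18e6

# Required compression ratio.
ratio = raw_bps / channel_bps
print(round(ratio))                  # 83, matching the 83:1 figure above
```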
1.1.2. OUTCOME OF THE SYSTEM:

Video Compressor is multifunctional video compression software that helps you compress
video files to a smaller size. With comprehensive video format support, plentiful profiles
and handy tools provided, this Video Compressor is the ideal video file compressor and video
size compressor.
The system is a digital video compression system for compressing digitized video signals
in real time. The compressor receives digitized video frames divided into subframes and,
in a single pass, performs a two-dimensional spatial-domain-to-transform-domain
transformation of the picture elements of each subframe. It normalizes the resulting
coefficients by a normalization factor having a predetermined compression-ratio component
and an adaptive rate-buffer-capacity control feedback component, to provide compression,
then encodes the coefficients and stores them in a rate buffer memory asynchronously at a
high data transfer rate, from which they are read out at a slower, synchronous rate. The
compressor adaptively determines the rate-buffer-capacity control feedback component from
the instantaneous data content of the rate buffer memory relative to its capacity, and it
controls the absolute quantity of data resulting from the normalization step so that the
buffer memory is never completely emptied and never completely filled. In expansion, the
system essentially mirrors the steps performed during compression; an efficient, high-speed
decoder forms an important part of the design.

1.2 Scope of the System/Tool

What will the tool be able to do?

The system tool covers the following areas:


 Desktop Tools and Development Environment — Startup and shutdown, arranging the
desktop, and using tools to become more productive with MATLAB
 Data Import and Export — Retrieving and storing data, memory-mapping, and
accessing Internet files
 Mathematics — Mathematical operations
 Data Analysis — Data analysis, including data fitting, Fourier analysis, and time-series
tools
 Programming Fundamentals — The MATLAB language and how to develop
MATLAB applications
 Object-Oriented Programming — Designing and implementing MATLAB classes
 Graphics — Tools and techniques for plotting, graph annotation, printing, and
programming with Handle Graphics® objects
 3-D Visualization — Visualizing surface and volume data, transparency, and viewing
and lighting techniques
 Creating Graphical User Interfaces — GUI-building tools and how to write callback
functions
 External Interfaces — MEX-files, the MATLAB engine, and interfacing to Sun
Microsystems™ Java software, Microsoft® .NET Framework, COM, Web services, and the
serial port
There is reference documentation for all MATLAB functions:
 Function Reference — Lists all MATLAB functions, listed in categories or
alphabetically
 Handle Graphics Property Browser — Provides easy access to descriptions of graphics
object properties

 C/C++ and Fortran API Reference — Covers functions used by the MATLAB external
interfaces, providing information on syntax in the calling language, description, arguments,
return values, and examples
The MATLAB application can read data in various file formats, discussed in the following
sections:
 Recommended Methods for Importing Data
 Importing MAT-Files
 Importing Text Data Files
 Importing XML Documents
 Importing Excel Spreadsheets
 Importing Scientific Data Files
 Importing Images
 Importing Audio and Video
 Importing Binary Data with Low-Level I/O

1.4 Hardware And Software Requirements:


1.4.1 HARDWARE REQUIREMENTS:
512 MB RAM
10 GB hard disk
1.4.2 SOFTWARE REQUIREMENTS:
1. Operating system: Windows or Linux
2. MATLAB

Chapter-2 Problem Analysis

2.1 Literature Survey


2.1.1 Introduction (Background Work):
Video compression typically operates on square-shaped groups of neighboring pixels, often
called macroblocks. These blocks of pixels are compared from one frame to
the next, and the video compression codec (encode/decode scheme) sends only the differences
within those blocks. This works extremely well if the video has no motion. A still frame of
text, for example, can be repeated with very little transmitted data. In areas of video with more
motion, more pixels change from one frame to the next. When more pixels change, the video
compression scheme must send more data to keep up with the larger number of pixels that are
changing. If the video content includes an explosion, flames, a flock of thousands of birds, or
any other image with a great deal of high-frequency detail, the quality will decrease, or the
variable bit rate must be increased to render this added information with the same level of
detail.
Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial
(horizontal and vertical) directions of the moving pictures, and one dimension represents the
time domain. A data frame is a set of all pixels that correspond to a single time moment.
Basically, a frame is the same as a still picture.
Some forms of data compression are lossless. This means that when the data is decompressed,
the result is a bit-for-bit perfect match with the original. While lossless compression of video is
possible, it is rarely used, as lossy compression results in far higher compression ratios at an
acceptable level of quality.
One of the most powerful techniques for compressing video is interframe compression.
Interframe compression uses one or more earlier or later frames in a sequence to compress the
current frame, while intraframe compression uses only the current frame, which is effectively
image compression.

The most commonly used method works by comparing each frame in the video with the
previous one. If the frame contains areas where nothing has moved, the system simply issues a
short command that copies that part of the previous frame, bit-for-bit, into the next one. If
sections of the frame move in a simple manner, the compressor emits a (slightly longer)
command that tells the decompressor to shift, rotate, lighten, or darken the copy — a longer
command, but still much shorter than intraframe compression. Interframe compression works
well for programs that will simply be played back by the viewer, but can cause problems if the
video sequence needs to be edited.
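The comparison the paragraphs above describe can be sketched as follows; the "frames" are tiny made-up integer grids, and the 2x2 block size stands in for a real 16x16 macroblock:

```python
# Toy interframe coder: report a block only if it changed since the previous frame.
BLOCK = 2  # block size (real codecs use 16x16 macroblocks)

def changed_blocks(prev, cur):
    """Yield (row, col, block) for each BLOCKxBLOCK block that differs."""
    for r in range(0, len(cur), BLOCK):
        for c in range(0, len(cur[0]), BLOCK):
            block = [row[c:c + BLOCK] for row in cur[r:r + BLOCK]]
            ref   = [row[c:c + BLOCK] for row in prev[r:r + BLOCK]]
            if block != ref:
                yield (r, c, block)

prev = [[10, 10, 20, 20],
        [10, 10, 20, 20],
        [30, 30, 40, 40],
        [30, 30, 40, 40]]
cur  = [[10, 10, 99, 20],   # only the top-right block changed
        [10, 10, 20, 20],
        [30, 30, 40, 40],
        [30, 30, 40, 40]]

print(list(changed_blocks(prev, cur)))  # [(0, 2, [[99, 20], [20, 20]])]
```

Three of the four blocks are unchanged and need only a "copy" instruction, which is why still or low-motion scenes compress so well.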

2.1.2 Video Compression Technology:

At its most basic level, compression is performed when an input video stream is analyzed and
information that is indiscernible to the viewer is discarded. Each event is then assigned a code:
commonly occurring events are assigned short codes and rare events are assigned longer ones.
These steps are commonly called signal analysis, quantization and variable-length encoding
respectively. There are four main methods for compression: discrete cosine transform (DCT),
vector quantization (VQ), fractal compression, and discrete wavelet transform (DWT):

DCT: Discrete cosine transform is a lossy compression algorithm that samples an image at
regular intervals, analyzes the frequency components present in the sample, and discards those
frequencies which do not affect the image as the human eye perceives it. DCT is the basis of
standards such as JPEG, MPEG, H.261, and H.263.
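As an illustration of the transform itself, here is a naive 1-D DCT-II written out from its textbook definition (pure Python, not an optimized or normalized production implementation):

```python
import math

def dct(x):
    """Naive, unnormalized DCT-II of a sequence x."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

# A constant (flat) signal has all its energy in the DC (k = 0) coefficient:
# approximately [4.0, 0, 0, 0]. The near-zero high-frequency terms are what
# a lossy coder can afford to discard.
print(dct([1, 1, 1, 1]))
```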

VQ: Vector quantization is a lossy compression method that looks at an array of data, instead of
individual values. It can then generalize what it sees, compressing redundant data, while at the
same time retaining the desired object or data stream's original intent.

FC: Fractal compression is a form of VQ and is also lossy. Compression is
performed by locating self-similar sections of an image, then using a fractal algorithm to
generate the sections.

DWT: Like DCT, discrete wavelet transform mathematically transforms an image into
frequency components. The process is performed on the entire image, which differs from the
other methods (such as DCT) that work on smaller pieces of the desired data. The result is a
hierarchical representation of an image, where each layer represents a frequency band.
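The hierarchical, band-by-band nature of the wavelet approach can be illustrated with one level of the simplest wavelet, the Haar transform; this is a toy sketch, not the filter bank any particular codec uses:

```python
def haar_step(x):
    """One level of the Haar DWT: pairwise averages (low band) and
    pairwise differences (high band)."""
    avg  = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    diff = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return avg, diff

low, high = haar_step([8, 6, 2, 4])
print(low, high)   # [7.0, 3.0] [1.0, -1.0]
```

The detail band of a smooth signal is near zero and compresses well; repeating the step on `low` yields the hierarchical, multi-band representation described above.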

2.1.3 Compression Standards (Techniques for solving the problem):

MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group,
established in 1988 to develop standards for digital audio and video formats. Several MPEG
standards are in use or in development. Each compression standard was designed with
a specific application and bit rate in mind, although MPEG compression scales well with
increased bit rates. They include:

2.1.3.1 MPEG-1
Designed for bit rates up to 1.5 Mbit/s, MPEG-1 is a standard for the compression of moving
pictures and audio. It was based on CD-ROM video applications, and is a popular standard for
video on the Internet, transmitted as .mpg files. In addition, Layer 3 of MPEG-1 Audio is the
most popular standard for digital compression of audio, known as MP3. MPEG-1 is the
compression standard for VideoCD, the most popular video distribution format throughout
much of Asia.

2.1.3.2 MPEG-2
Designed for bit rates between 1.5 and 15 Mbit/s, MPEG-2 is the standard on which digital
television set-top boxes and DVD compression are based. It is based on MPEG-1, but designed
for the compression and transmission of digital broadcast television. The most significant
enhancement over MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2
scales well to HDTV resolution and bit rates, obviating the need for an MPEG-3.

2.1.3.4 MPEG-4
Standard for multimedia and Web compression. MPEG-4 is based on object-based
compression, similar in nature to the Virtual Reality Modeling Language. Individual objects
within a scene are tracked separately and compressed together to create an MPEG-4 file. This
results in very efficient compression that is very scalable, from low bit rates to very high. It
also allows developers to control objects independently in a scene, and therefore introduce
interactivity.

2.1.3.5 MPEG-7: This standard, currently under development, is also called the Multimedia
Content Description Interface. When released, the group hopes the standard will provide a
framework for multimedia content that will include information on content manipulation,
filtering and personalization, as well as the integrity and security of the content. Contrary to the
previous MPEG standards, which described actual content, MPEG-7 will represent information
about the content.

2.1.3.6 H.261: H.261 is an ITU-T standard designed for two-way communication over ISDN
lines (video conferencing) and supports data rates which are multiples of 64 Kbit/s. The
algorithm is based on DCT and can be implemented in hardware or software and uses
intraframe and interframe compression. H.261 supports CIF and QCIF resolutions.

MPEG-4's advantages include high compression, low bit rate and motion compensation
support; its disadvantages are latency and blocking artifacts. JPEG, JPEG2000, and MPEG-4
have all been used in video surveillance systems, with the choice depending on what is most
important in that particular application. H.264 is an advanced compression scheme which is
also starting to find its way into video surveillance systems. H.264 offers higher compression
at the expense of additional hardware complexity; FPGA-based solutions for H.264 exist,
although they are not examined in this report.

2.1.3.7 MPEG21:

The MPEG-21 standard, from the Moving Picture Expert Group, aims at defining an open
framework for multimedia applications. MPEG-21 is ratified in the standards ISO/IEC 21000 -
Multimedia framework (MPEG-21).

MPEG-21 is based on two essential concepts:

• definition of a Digital Item (a fundamental unit of distribution and transaction)


• users interacting with Digital Items

Digital Items can be considered the kernel of the Multimedia Framework, and users can be
considered as those who interact with them inside the Multimedia Framework. At its most basic
level, MPEG-21 provides a framework in which one user interacts with another, and the
object of that interaction is a Digital Item. Accordingly, the main objective of
MPEG-21 is to define the technology needed to support users in exchanging, accessing,
consuming, trading or manipulating Digital Items in an efficient and transparent way.

MPEG-21 also defines the storage of an MPEG-21 Digital Item in a file format based on the
ISO base media file format, with some or all of the Digital Item's ancillary data (such as
movies, images or other non-XML data) within the same file.

2.1.3.8 H.263:

H.263 is a video compression standard originally designed as a low-bit-rate compressed format
for video conferencing. It was developed by the ITU-T Video Coding Experts Group
(VCEG).

H.263 has since found many applications on the Internet: much Flash video content (as used
on sites such as YouTube, Google Video, MySpace, etc.) used to be encoded in Sorenson Spark
format, though many sites now use VP6 or H.264 encoding.

H.263 was developed as an evolutionary improvement based on experience from H.261, the
previous ITU-T standard for video compression, and the MPEG-1 and MPEG-2 standards. Its
first version was completed in 1995 and provided a suitable replacement for H.261 at all bit
rates. It was further enhanced in projects known as H.263v2; MPEG-4 Part 2 is H.263
compatible in the sense that a basic H.263 bit stream is correctly decoded by an MPEG-4
Video decoder.

2.1.3.9 H.264:

The next enhanced codec developed by ITU-T VCEG (in partnership with MPEG) after H.263
is the H.264 standard, also known as AVC and MPEG-4 part 10. As H.264 provides a
significant improvement in capability beyond H.263, the H.263 standard is now considered a
legacy design. Most new videoconferencing products now include H.264 as well as H.263 and
H.261 capabilities. H.264 is used in such applications as players for Blu-ray Discs, videos from
YouTube and the iTunes Store, web software such as the Adobe Flash Player and Microsoft
Silverlight, broadcast services for DVB and SBTVD, direct-broadcast satellite television
services, cable television services, and real-time videoconferencing.

2.1.4 MPEG-1
The ‘Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to
about 1.5 Mbit/s’ (ISO/IEC 11172), more commonly known as MPEG-1, standardizes the
storage and retrieval of moving pictures and audio on storage media, and forms the basis for
the Video CD and MP3 formats.
This part of the specification describes the coded representation for the compression of video
sequences.
The basic idea of MPEG video compression is to discard any unnecessary information. An
MPEG-1 encoder therefore analyses:
 how much movement there is in the current frame compared to the previous frame
 what changes of color have taken place since the last frame
 what changes in light or contrast have taken place since the last frame
 what elements of the picture have remained static since the last frame

The encoder then looks at each individual pixel to see if movement has taken place. If there has
been no movement, the encoder stores an instruction to repeat the same frame, or to repeat
the same frame but move it to a different position.

MPEG-1 uses three picture types:
1. I: intra frames
2. B: bidirectional frames
3. P: predicted frames

Audio, video and time codes are converted into one single stream.

MPEG-1 supports both 625- and 525-line video, bit rates from 1 to 1.5 Mbit/s, and 24-30
frames per second.

MPEG-1 compression treats video as a sequence of separate images. ‘Picture elements’, often
referred to as ‘pixels’, are the elements in the image. Each pixel consists of three components:
one for luminance (Y) and two for chrominance (Cb and Cr). MPEG-1 encodes Y pixels at
full resolution, as the Human Visual System (HVS) is most sensitive to luminance.

The main coding tools are:
 Quantization
 Predictive coding: the difference between the predicted pixel value and the real value is
coded.
 Motion compensation (MC): predicts the values of a block of pixels (1 block = 8x8
pixels) in an image from those of a known block of pixels. A vector describes the two-
dimensional movement; if no movement takes place, the vector is 0.
 Interframe coding
 Sequential coding
 Variable Length Coding (VLC)
 Image interpolation
Intra pictures (I-frames) are coded independently of other images.

MPEG-1 codes images progressively: interlaced images need to be converted into a
de-interlaced format before encoding; the video is then encoded, and the encoded video can
be converted back into an interlaced form.
To achieve a high compression ratio, an appropriate spatial resolution for the signal is
chosen, and block-based motion compensation is used to reduce the temporal redundancy.

Motion compensation is used for causal prediction of the current picture from a
previous picture, for non-causal prediction of the current picture from a future picture, or for
interpolative prediction from past and future pictures.
The difference signal, the prediction error, is further compressed using the discrete cosine
transform (DCT) to remove spatial correlation and is then quantized. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.
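The motion-compensation step can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD); the frames and the search range below are made up for illustration:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def block_at(frame, r, c, size):
    return [row[c:c + size] for row in frame[r:r + size]]

def motion_vector(prev, cur, r, c, size, search=1):
    """Find the (dr, dc) offset into `prev` that best matches the block
    of `cur` at (r, c), by exhaustive search within +/- `search` pixels."""
    target = block_at(cur, r, c, size)
    best = None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr <= len(prev) - size and 0 <= cc <= len(prev[0]) - size:
                cost = sad(block_at(prev, rr, cc, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dr, dc))
    return best[1]

prev = [[0, 0, 0, 0],
        [0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0]]
cur  = [[9, 9, 0, 0],   # the bright 2x2 patch moved up-left by one pixel
        [9, 9, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

print(motion_vector(prev, cur, 0, 0, 2))  # (1, 1): best match lies one
                                          # pixel down-right in prev
```

The coder then transmits this vector plus the (small) prediction error, instead of the block itself.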

MPEG-1 is a standard in 4 parts:

Part 1 addresses the problem of combining one or more data streams from the video and audio
parts of the MPEG-1 standard with timing information to form a single stream. This is an
important function because, once combined into a single stream, the data are in a form well
suited to digital storage or transmission.

Part 2 specifies a coded representation that can be used for compressing video sequences -
both 625-line and 525-line - to bit rates around 1.5 Mbit/s. Part 2 was developed to operate
principally from storage media offering a continuous transfer rate of about 1.5 Mbit/s.
Nevertheless it can be used more widely than this because the approach taken is generic. A
number of techniques are used to achieve a high compression ratio. The first is to select an
appropriate spatial resolution for the signal. The algorithm then uses block-based motion
compensation to reduce the temporal redundancy. Motion compensation is used for causal
prediction of the current picture from a previous picture, for non-causal prediction of the
current picture from a future picture, or for interpolative prediction from past and future
pictures. The difference signal, the prediction error, is further compressed using the discrete
cosine transform (DCT) to remove spatial correlation and is then quantized. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.
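The final variable-length coding stage can be illustrated with a small hand-built prefix code; the code table below is invented for the example (the real standards use fixed Huffman-style tables):

```python
# Toy VLC table: frequent symbols (e.g. zero coefficients) get the shortest
# codes. The codes are prefix-free, so the bit stream decodes unambiguously.
VLC = {0: '0', 1: '10', -1: '110', 2: '1110'}

def encode(symbols):
    return ''.join(VLC[s] for s in symbols)

def decode(bits):
    inverse, out, cur = {v: k for k, v in VLC.items()}, [], ''
    for b in bits:
        cur += b
        if cur in inverse:          # prefix-free: first match is the symbol
            out.append(inverse[cur])
            cur = ''
    return out

coeffs = [2, 0, 0, 1, 0, -1, 0, 0]
bits = encode(coeffs)
print(bits, len(bits))              # '11100010011000' -> 14 bits for 8 symbols
assert decode(bits) == coeffs       # and it decodes back exactly
```

Because zeros dominate quantized coefficient data, they get the one-bit code, which is where the bit-rate saving comes from.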

Part 3 specifies a coded representation that can be used for compressing audio sequences -
both mono and stereo. Input audio samples are fed into the encoder. The mapping creates a
filtered and subsampled representation of the input audio stream. A psychoacoustic model
creates a set of data to control the quantiser and coding. The quantiser and coding block creates
a set of coding symbols from the mapped input samples. The block 'frame packing' assembles
the actual bit stream from the output data of the other blocks, and adds other information (e.g.
error correction) if necessary.

Part 4 specifies how tests can be designed to verify whether bitstreams and decoders meet the
requirements as specified in parts 1, 2 and 3 of the MPEG-1 standard. These tests can be used
by:

• manufacturers of encoders, and their customers, to verify whether the encoder produces
valid bitstreams.
• manufacturers of decoders and their customers to verify whether the decoder meets the
requirements specified in parts 1, 2 and 3 of the standard for the claimed decoder
capabilities.
• applications to verify whether the characteristics of a given bitstream meet the
application requirements, for example whether the size of the coded picture does not
exceed the maximum value allowed for the application.

2.1.5 MPEG-2

MPEG-2 is an extension of the MPEG-1 international standard for digital compression of audio
and video signals. MPEG-2 is directed at broadcast formats at higher data rates; it provides
extra algorithmic 'tools' for efficiently coding interlaced video, supports a wide range of bit
rates and provides for multichannel surround sound coding.

1. INTRODUCTION:

The MPEG-2 standard [2] is capable of coding standard-definition television at bit rates from
about 3-15 Mbit/s and high-definition television at 15-30 Mbit/s. MPEG-2 extends the stereo
audio capabilities of MPEG-1 to multi-channel surround sound coding. MPEG-2 decoders will
also decode MPEG-1 bit streams.

2. VIDEO FUNDAMENTALS

Television services in Europe currently broadcast video at a frame rate of 25 Hz. Each frame
consists of two interlaced fields, giving a field rate of 50 Hz. The first field of each frame
contains only the odd numbered lines of the frame (numbering the top frame line as line 1).
The second field contains only the even numbered lines of the frame and is sampled in the
video camera 20 ms after the first field. It is important to note that one interlaced frame
contains fields from two instants in time. American television is similarly interlaced but with a
frame rate of just less than 30 Hz.

The red, green and blue (RGB) signals coming from a color television camera can be
equivalently expressed as luminance (Y) and chrominance (UV) components. The
chrominance bandwidth may be reduced relative to the luminance without significantly
affecting the picture quality. For standard definition video, CCIR recommendation 601 [3]
defines how the component (YUV) video signals can be sampled and digitized to form discrete
pixels. The terms 4:2:2 and 4:2:0 are often used to describe the sampling structure of the
digital picture. 4:2:2 means the chrominance is horizontally sub sampled by a factor of two
relative to the luminance; 4:2:0 means the chrominance is horizontally and vertically sub
sampled by a factor of two relative to the luminance.
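4:2:0 subsampling of a chrominance plane can be illustrated by averaging each 2x2 neighborhood (toy values, illustration only):

```python
def subsample_420(chroma):
    """Average each 2x2 block: half the resolution horizontally and vertically."""
    return [[(chroma[r][c] + chroma[r][c + 1] +
              chroma[r + 1][c] + chroma[r + 1][c + 1]) // 4
             for c in range(0, len(chroma[0]), 2)]
            for r in range(0, len(chroma), 2)]

cb = [[100, 102, 200, 202],
      [104, 106, 204, 206],
      [100, 100, 100, 100],
      [100, 100, 100, 100]]

print(subsample_420(cb))   # [[103, 203], [100, 100]]
```

A 4x4 chroma plane becomes 2x2, a fourfold reduction in chroma samples that the eye barely notices, which is exactly the redundancy 4:2:0 exploits.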

3. BIT RATE REDUCTION PRINCIPLES: A bit rate reduction system operates by


removing redundant information from the signal at the coder prior to transmission and re-
inserting it at the decoder. A coder and decoder pair is referred to as a 'codec'. In video signals,
two distinct kinds of redundancy can be identified.

Spatial and temporal redundancy: Pixel values are not independent, but are correlated with
their neighbors both within the same frame and across frames. So, to some extent, the value of
a pixel is predictable given the values of neighboring pixels.

Psychovisual redundancy:

The human eye has a limited response to fine spatial detail [4], and is less sensitive to detail
near object edges or around shot-changes. Consequently, controlled impairments introduced
into the decoded picture by the bit rate reduction process should not be visible to a human
observer.

Two key techniques employed in an MPEG codec are intra-frame Discrete Cosine Transform
(DCT) coding and motion-compensated inter-frame prediction. These techniques have been
successfully applied to video bit rate reduction prior to MPEG, notably for 625-line video
contribution standards at 34 Mbit/s and video conference systems at bit rates below 2 Mbit/s.

4. MPEG-2 DETAILS

In an MPEG-2 system, the DCT and motion-compensated interframe prediction are combined.
The coder subtracts the motion-compensated prediction from the source picture to form a
'prediction error' picture. The prediction error is transformed with the DCT, the coefficients are
quantized and these quantized values coded using a VLC. The coded luminance and
chrominance prediction error is combined with 'side information' required by the decoder, such
as motion vectors and synchronizing information, and formed into a bit stream for
transmission. In the decoder, the quantized DCT coefficients are reconstructed and inverse
transformed to produce the prediction error. This is added to the motion-compensated
prediction generated from previously decoded pictures to produce the decoded output.
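The quantization step in this loop is where information is actually discarded: each transform coefficient is divided by a step size and rounded, and the decoder can only reconstruct multiples of that step. A sketch with made-up coefficient values:

```python
def quantize(coeffs, step):
    """Divide each coefficient by the step size and round: lossy."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Decoder-side reconstruction: only multiples of `step` come back."""
    return [l * step for l in levels]

dct_coeffs = [312.0, -41.0, 12.0, 3.0, -1.5, 0.4]   # made-up values
levels = quantize(dct_coeffs, step=8)
print(levels)                       # [39, -5, 2, 0, 0, 0]: small terms vanish
print(dequantize(levels, step=8))   # [312, -40, 16, 0, 0, 0]: close, not exact
```

The runs of zeros produced here are what makes the subsequent variable-length coding effective; a larger step size gives more zeros and a lower bit rate at the cost of quality.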

Picture types

In MPEG-2, three 'picture types' are defined. The picture type defines which prediction modes
may be used to code each block.

'Intra' pictures (I-pictures) are coded without reference to other pictures. Moderate compression
is achieved by reducing spatial redundancy, but not temporal redundancy. They can be used
periodically to provide access points in the bitstream where decoding can begin.

'Predictive' pictures (P-pictures) can use the previous I- or P-picture for motion compensation
and may be used as a reference for further prediction. Each block in a P-picture can either be
predicted or intra-coded. By reducing spatial and temporal redundancy, P-pictures offer
increased compression compared to I-pictures.

'Bidirectionally-predictive' pictures (B-pictures) can use the previous and next I- or P-pictures
for motion-compensation, and offer the highest degree of compression. Each block in a B-
picture can be forward, backward or bidirectionally predicted or intra-coded. To enable
backward prediction from a future frame, the coder reorders the pictures from natural 'display'
order to 'bitstream' order so that the B-picture is transmitted after the previous and next pictures
it references. This introduces a reordering delay dependent on the number of consecutive B-
pictures.
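The display-to-bitstream reordering can be sketched as holding each B-picture until the later reference it needs has been sent (a simplification of real group-of-pictures handling):

```python
def bitstream_order(display):
    """Reorder I/P/B pictures so every B-picture follows both of its
    reference pictures in the transmitted stream."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith('B'):
            pending_b.append(pic)     # hold until the next reference arrives
        else:                         # I- or P-picture: a reference
            out.append(pic)
            out.extend(pending_b)     # now the held B-pictures can be sent
            pending_b = []
    return out + pending_b

display = ['I1', 'B2', 'B3', 'P4', 'B5', 'B6', 'P7']
print(bitstream_order(display))  # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The gap between a B-picture's display position and its transmission position is the reordering delay the text mentions; it grows with the number of consecutive B-pictures.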

Buffer control: By removing much of the redundancy from the source images, the coder
outputs a variable bit rate. The bit rate depends on the complexity and predictability of the
source picture and the effectiveness of the motion-compensated prediction.

MPEG-2 is a standard currently in 6 parts:

Part 1 of MPEG-2 addresses the combining of one or more elementary streams of video and
audio, as well as other data, into single or multiple streams which are suitable for storage or
transmission. This is specified in two forms: the Program Stream and the Transport Stream.
Each is optimized for a different set of applications. The Program Stream is similar to MPEG-1
Systems Multiplex. It results from combining one or more Packetised Elementary Streams
(PES), which have a common time base, into a single stream. The Program Stream is designed
for use in relatively error-free environments and is suitable for applications which may involve
software processing. Program stream packets may be of variable and relatively great length.

The Transport Stream combines one or more Packetized Elementary Streams (PES) with one
or more independent time bases into a single stream. Elementary streams sharing a common
timebase form a program. The Transport Stream is designed for use in environments where
errors are likely, such as storage or transmission in lossy or noisy media. Transport stream
packets are 188 bytes long.

Part 2 of MPEG-2 builds on the powerful video compression capabilities of the MPEG-1
standard to offer a wide range of coding tools. These have been grouped in profiles to offer
different functionalities. Since the final approval of MPEG-2 Video in November 1994, one
additional profile has been developed. This uses existing coding tools of MPEG-2 Video but is
capable of dealing with pictures having a colour resolution of 4:2:2 and a higher bitrate. Even
though MPEG-2 Video was not developed with studio applications in mind, a set of
comparison tests carried out by MPEG confirmed that MPEG-2 Video was at least as good as,
and in many cases better than, standards or specifications developed for high-bitrate or studio
applications.

The Multiview Profile (MVP) is an additional profile currently being developed. By using
existing MPEG-2 Video coding tools it is possible to efficiently encode two video
sequences issued from two cameras shooting the same scene with a small angle between them.

Part 3 of MPEG-2 - Digital Storage Media Command and Control (DSM-CC) is the
specification of a set of protocols which provides the control functions and operations specific
to managing MPEG-1 and MPEG-2 bitstreams. These protocols may be used to support
applications in both stand-alone and heterogeneous network environments. In the DSM-CC
model, a stream is sourced by a Server and delivered to a Client. Both the Server and the Client
are considered to be Users of the DSM-CC network. DSM-CC defines a logical entity called
the Session and Resource Manager (SRM) which provides a (logically) centralized
management of the DSM-CC Sessions and Resources.

Part 4 of MPEG-2 is the specification of a multichannel audio coding algorithm not
constrained to be backwards-compatible with MPEG-1 Audio. The standard was approved
in April 1997.

Part 5 of MPEG-2 was originally planned to be coding of video when input samples are 10
bits. Work on this part was discontinued when it became apparent that there was insufficient
interest from industry for such a standard.

Part 6 of MPEG-2 is the specification of the Real-time Interface (RTI) to Transport Stream
decoders which may be utilised for adaptation to all appropriate networks carrying Transport
Streams.

2.1.6 MPEG-4

Introduction

MPEG-4 was created because experts wanted higher compression than MPEG-2 offered,
together with good performance at low bit rates. Discussions began at the end of 1992 and
work on the standard started in July 1993.
MPEG-4 provides a standardized method of:

1. Audio-visual coding at very low bit rates.
2. Describing audio-visual objects in a scene.
3. Multiplexing and synchronizing the information associated with the objects.
4. Interacting with the audio-visual scene received by the end user.

Elementary Streams:

Each encoded media object has its own Elementary Stream (ES), which is sent to the decoder
and decoded individually, before composition. The following streams are created in
MPEG-4:
1. Scene Description Stream
2. Object Description Stream
3. Visual Stream
4. Audio Stream

Once the data has been encoded, the streams can be transmitted or stored separately and are
composed at the receiving end. Media objects are organized hierarchically to form audio-visual
scenes. Because of this organization, each object can be described and encoded independently
of the other objects in the scene, e.g. the background.

MPEG-4/BIFS:
• Allows users to change their viewpoint in a 3D scene or to interact with media objects.
• Allows different objects in the same scene to be coded at different levels of quality.

MPEG-4 ‘Systems’ also addresses:

1. A standard file format to enable the exchange and authoring of MPEG-4 content.
2. Interactivity (both client-side and server-side).
3. MPEG-J (MPEG-4 & Java).
4. The FlexMux tool, which allows the interleaving of multiple streams into a single
stream.
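To make the FlexMux idea concrete, here is a toy multiplexer in Python. It only illustrates the
concept of tagging and interleaving access units from several streams into one sequence; it is
not the actual FlexMux bitstream syntax, and the function names are invented for this sketch.

```python
def interleave(streams):
    """Merge several streams of access units into one sequence,
    tagging each unit with the id of the stream it came from."""
    queues = [list(s) for s in streams]
    muxed = []
    while any(queues):
        # Round-robin: take one access unit from each non-empty stream.
        for sid, q in enumerate(queues):
            if q:
                muxed.append((sid, q.pop(0)))
    return muxed

def demultiplex(muxed, n_streams):
    """Recover the per-stream sequences from the tagged sequence."""
    streams = [[] for _ in range(n_streams)]
    for sid, unit in muxed:
        streams[sid].append(unit)
    return streams
```

Because every unit carries its stream id, the receiver can restore the original streams exactly,
which is the property a real multiplexer must preserve.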

Profiles have been developed to create conformance points for MPEG-4 tools and toolsets, so
that interoperability between MPEG-4 products implementing the same Profiles and Levels can
be assured.

A Profile is a subset of the MPEG-4 Systems, Visual or Audio tool set, targeted at specific
applications. It limits the tool set a decoder has to implement, since many applications need
only a portion of the MPEG-4 toolset. Profiles specified in the MPEG-4 standard include:

a. Visual Profile
b. Natural Profile
c. Synthetic & Natural/Synthetic Hybrid Profiles
d. Audio Profile
e. Graphic Profile
f. Scene Graph Profile

The Systems part of MPEG-4 addresses the description of the relationship between the
audio-visual components that constitute a scene. The relationship is described at two main
levels.

• The Binary Format for Scenes (BIFS) describes the spatio-temporal arrangement of
the objects in the scene. Viewers may interact with the objects, e.g. by rearranging
them in the scene or by changing their own point of view in a 3D virtual environment.
The scene description provides a rich set of nodes for 2-D and 3-D composition
operators and graphics primitives.
• At a lower level, Object Descriptors (ODs) define the relationship between the
Elementary Streams pertinent to each object (e.g. the audio and the video stream of a
participant in a videoconference). ODs also provide additional information.

2.1.7 MPEG-7

Introduction
The MPEG standards are an evolving set of standards for video and audio compression.
MPEG-7 covers the most recent developments in multimedia search and retrieval, and is
designed to standardize the description of multimedia content in support of a wide range of
applications including DVD, CD and HDTV.
MPEG-7 is a multi-part specification, formally entitled ‘Multimedia Content Description
Interface’. It provides standardized tools for describing multimedia content, enabling
searching, filtering and browsing of that content.

ISO 15938-1 Systems


MPEG-7 descriptions exist in two formats:

Textual – XML, which allows editing, searching and filtering of a multimedia description; the
description can be located anywhere, not necessarily with the content.
Binary – suitable for storing, transmitting and streaming delivery of the multimedia
description.

MPEG-7 Systems provides the tools for:

a. The preparation of a binary coded representation of MPEG-7 descriptions, for efficient
storage and transmission.
b. Transmission techniques (both textual and binary formats).
c. Multiplexing of descriptions.
d. Synchronization of descriptions with content.
e. Intellectual property management and protection.
f. Terminal architecture.
g. Normative interfaces.

Descriptions may be represented in two forms:

• Textual (XML).
• Binary (BiM – Binary format for Metadata). The binary coded representation is useful
for efficient storage and transmission of content.

MPEG-7 data is obtained from transport or storage and handed to the delivery layer. This
layer extracts the elementary streams (consisting of individually accessible chunks called
access units) by undoing the transport/storage-specific framing and multiplexing, and retains
the timing information needed for synchronisation.

The elementary streams are forwarded to the compression layer, where the schema streams
(schemas describing the structure of MPEG-7 data) and the partial or full description streams
(streams describing the content) are decoded.

MPEG-7 tools

MPEG-7 uses the following tools:

• Descriptor (D): a representation of a feature, defined syntactically and semantically. A
single object may be described by several descriptors.

• Description Schemes (DS): specify the structure and semantics of the relations between
their components, which can be descriptors (D) or description schemes (DS).

• Description Definition Language (DDL): based on XML, used to define the structural
relations between descriptors. It allows the creation and modification of description
schemes and the creation of new descriptors (D).

• System tools: These tools deal with binarization, synchronization, transport and storage
of descriptors.
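The containment relationship between these tools can be sketched with two small Python
classes. The class names follow the standard's terminology, but the feature names and values
below are invented purely for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Descriptor:
    """A single feature, defined syntactically and semantically."""
    name: str      # e.g. "DominantColor" (illustrative name)
    value: object  # the feature value itself

@dataclass
class DescriptionScheme:
    """Relates components that are themselves descriptors or schemes."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

# One object (here a video segment) described by several descriptors:
segment = DescriptionScheme("VideoSegment", [
    Descriptor("DominantColor", (200, 30, 30)),
    Descriptor("MotionActivity", "high"),
])
```

Because schemes may nest other schemes, descriptions form a tree, which is exactly what the
DDL expresses in XML.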

2.1.8 H.261

H.261 is an ITU-T video coding standard, ratified in November 1988. It was originally
designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s, and
is one member of the H.26x family of video coding standards developed by the ITU-T Video
Coding Experts Group (VCEG). The coding algorithm was designed to operate at video bit
rates between 40 kbit/s and 2 Mbit/s. The standard supports two video frame sizes: CIF
(352x288 luma with 176x144 chroma) and QCIF (176x144 with 88x72 chroma), using a 4:2:0
sampling scheme. It also has a backward-compatible trick for sending still-picture graphics
with 704x576 luma resolution and 352x288 chroma resolution (added in a later revision in
1993).
The standard involves the following coding steps:

1. Loop filter
The prediction process may be modified by a two-dimensional spatial filter (FIL) which
operates on pixels within a predicted 8 by 8 block. The filter is separable into one-dimensional
horizontal and vertical functions. Both are non-recursive with coefficients of 1/4, 1/2, 1/4,
except at block edges where one of the taps would fall outside the block; in such cases the 1-D
filter is changed to have coefficients of 0, 1, 0. Full arithmetic precision is retained, with
rounding to 8-bit integer values at the 2-D filter output. Values whose fractional part is one half
are rounded up. The filter is switched on or off for all six blocks in a macroblock according to
the macroblock type.
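The filter just described can be sketched in Python. This follows the 1/4, 1/2, 1/4 coefficients,
the 0, 1, 0 edge case, and the round-half-up rule, keeping full precision (scaled integers) until
the final 2-D output:

```python
def loop_filter_1d(row):
    """Apply the non-recursive 1/4, 1/2, 1/4 filter to 8 samples.

    At the block edges, where a tap would fall outside the block, the
    coefficients degenerate to 0, 1, 0 (the edge sample passes through).
    Results are returned scaled by 4 so no precision is lost."""
    out = [4 * row[0]]                       # edge: coefficients 0, 1, 0
    for i in range(1, 7):                    # interior: 1/4, 1/2, 1/4
        out.append(row[i - 1] + 2 * row[i] + row[i + 1])
    out.append(4 * row[7])                   # edge: coefficients 0, 1, 0
    return out

def loop_filter_2d(block):
    """Separable 2-D loop filter on an 8x8 block of pixel values."""
    # Horizontal pass (values now scaled by 4).
    rows = [loop_filter_1d(r) for r in block]
    # Vertical pass (values now scaled by 16).
    cols = [loop_filter_1d([rows[y][x] for y in range(8)]) for x in range(8)]
    # Round to 8-bit integers; values with fractional part one half round up.
    return [[(cols[x][y] + 8) // 16 for x in range(8)] for y in range(8)]
```

A flat block passes through unchanged, while an isolated bright pel is spread over its 3x3
neighbourhood, which is exactly the smoothing the loop filter is meant to provide.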

2. Transformer
Transmitted blocks are first processed by a separable two-dimensional discrete cosine
transform of size 8 by 8. The output from the inverse transform ranges from –256 to +255 after
clipping to be represented with 9 bits. The transfer function of the inverse transform is given
by:
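For reference, the inverse transform defined in H.261 is the standard 8 by 8 inverse DCT:

```latex
f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7}
         C(u)\,C(v)\,F(u,v)\,
         \cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & \text{otherwise} \end{cases}
```

where F(u,v) are the transform coefficients and f(x,y) the reconstructed pel values.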

NOTE – Within the block being transformed, x = 0 and y = 0 refer to the pel nearest the left
and top edges of the picture, respectively.
The arithmetic procedures for computing the transforms are not defined, but the inverse
transform should meet the specified error tolerance.
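As a reference sketch (slow and direct, not a conformant fast implementation), the inverse
transform can be written straight from the standard 8 by 8 inverse DCT definition, clipping the
output to the 9-bit range –256..255 as required:

```python
import math

def idct_8x8(F):
    """Naive 8x8 inverse DCT, written directly from the definition.
    Output is clipped to [-256, 255] so it fits in 9 bits."""
    def c(k):
        # Normalisation factor: 1/sqrt(2) for the DC index, 1 otherwise.
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = []
    for x in range(8):
        row = []
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            row.append(max(-256, min(255, round(s / 4))))
        out.append(row)
    return out
```

For a block whose only non-zero coefficient is the DC term, every output pel takes the same
value, which is a quick sanity check on any DCT implementation.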

3. Quantization
The number of quantizers is 1 for the INTRA dc coefficient and 31 for all other coefficients.
Within a macro block the same quantizer is used for all coefficients except the INTRA dc one.
The decision levels are not defined. The INTRA dc coefficient is nominally the transform
value linearly quantized with a step size of 8 and no dead-zone. Each of the other 31 quantizers
is also nominally linear but with a central dead-zone around zero and with a step size of an
even value in the range 2 to 62.
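Since the decision levels are left undefined by the standard, the following Python sketch is
only one plausible reading: simple rounding with step 8 for the INTRA dc coefficient, and a
linear quantizer with a one-step central dead-zone for the other coefficients.

```python
def quantize_intra_dc(coef):
    """INTRA dc: uniform step of 8, no dead-zone (decision levels are
    not normative; plain rounding is used here)."""
    return round(coef / 8)

def quantize_ac(coef, step):
    """Other coefficients: nominally linear, even step size in 2..62,
    with a central dead-zone so small coefficients quantize to zero."""
    assert step % 2 == 0 and 2 <= step <= 62
    if abs(coef) < step:          # dead-zone around zero
        return 0
    sign = 1 if coef > 0 else -1
    return sign * (abs(coef) // step)
```

The dead-zone is what suppresses near-zero transform coefficients, which is where most of the
bit-rate saving in the quantizer comes from.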

4. Clipping of reconstructed picture
To prevent quantization distortion of transform coefficient amplitudes causing arithmetic
overflow in the encoder and decoder loops, clipping functions are inserted. The clipping
function is applied to the reconstructed picture which is formed by summing the prediction and
the prediction error as modified by the coding process. This clipper operates on resulting pel
values less than 0 or greater than 255, changing them to 0 and 255, respectively.
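The clipping stage is simple enough to state directly; a minimal sketch of reconstruction with
clipping (the function names are ours, not from the standard):

```python
def clip_pel(value):
    """Clip a reconstructed pel value to the 8-bit range [0, 255]."""
    return max(0, min(255, value))

def reconstruct_pel(prediction, prediction_error):
    """Sum the prediction and the coded prediction error, then clip,
    so quantization distortion cannot cause arithmetic overflow."""
    return clip_pel(prediction + prediction_error)
```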

A number of video compression tools have already been developed:

• Video compressor
• AVI compressor
• MP4 compressor
• MPEG compressor
• 3GP compressor
• YouTube compressor
• iPod compressor
• Flash Video compressor
• QuickTime compressor
• WMV compressor
• MKV compressor
• VOB compressor
• DVD compressor

2.2 Methodology Adopted

H.261

H.261 is an ITU-T video coding standard, ratified in November 1988. It was originally
designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s, and
is one member of the H.26x family of video coding standards developed by the ITU-T Video
Coding Experts Group (VCEG). The coding algorithm was designed to operate at video bit
rates between 40 kbit/s and 2 Mbit/s. The standard supports two video frame sizes: CIF
(352x288 luma with 176x144 chroma) and QCIF (176x144 with 88x72 chroma), using a 4:2:0
sampling scheme. It also has a backward-compatible trick for sending still-picture graphics
with 704x576 luma resolution and 352x288 chroma resolution (added in a later revision in
1993).
The standard involves the following coding steps:

1. Loop filter
The prediction process may be modified by a two-dimensional spatial filter (FIL) which
operates on pixels within a predicted 8 by 8 block. The filter is separable into one-dimensional
horizontal and vertical functions. Both are non-recursive with coefficients of 1/4, 1/2, 1/4,
except at block edges where one of the taps would fall outside the block; in such cases the 1-D
filter is changed to have coefficients of 0, 1, 0. Full arithmetic precision is retained, with
rounding to 8-bit integer values at the 2-D filter output. Values whose fractional part is one half
are rounded up. The filter is switched on or off for all six blocks in a macroblock according to
the macroblock type.

2. Transformer
Transmitted blocks are first processed by a separable two-dimensional discrete cosine
transform of size 8 by 8. The output from the inverse transform ranges from –256 to +255 after
clipping, so that it can be represented with 9 bits. The transfer function is that of the standard
8 by 8 inverse DCT.

NOTE – Within the block being transformed, x = 0 and y = 0 refer to the pel nearest the left
and top edges of the picture, respectively.
The arithmetic procedures for computing the transforms are not defined, but the inverse
transform should meet the specified error tolerance.

3. Quantization
The number of quantizers is 1 for the INTRA dc coefficient and 31 for all other coefficients.
Within a macro block the same quantizer is used for all coefficients except the INTRA dc one.
The decision levels are not defined. The INTRA dc coefficient is nominally the transform
value linearly quantized with a step size of 8 and no dead-zone. Each of the other 31 quantizers
is also nominally linear but with a central dead-zone around zero and with a step size of an
even value in the range 2 to 62.

4. Clipping of reconstructed picture
To prevent quantization distortion of transform coefficient amplitudes causing arithmetic
overflow in the encoder and decoder loops, clipping functions are inserted. The clipping
function is applied to the reconstructed picture which is formed by summing the prediction and
the prediction error as modified by the coding process. This clipper operates on resulting pel
values less than 0 or greater than 255, changing them to 0 and 255, respectively.

Chapter -3 Project Estimation and Implementation Plan
3.1 Cost and Benefit Analysis
3.1.1 ECONOMIC FEASIBILITY

Economic analysis is the most frequently used method for evaluating a candidate system.
More commonly known as cost-benefit analysis, the procedure is to determine the benefits and
savings that are expected from the candidate system and compare them with the costs. If the
benefits outweigh the costs, the decision is made to design and implement the system;
otherwise, further justification or alterations to the proposed system are made.

This project does not have many hardware requirements, so on the whole it costs little to
install the software.

From a purely economic point of view, manual handling is cheaper than a computerized
system, and this approach normally works well in an ordinary organization. The problems start
when the number of hardware components grows over time. A manual system needs various
registers to maintain the daily complaint and hardware entries, and in case of any misplacement
of a hardware component, the concerned registers have to be searched to verify the status of
that component. It is a very cumbersome job to maintain all this manually, whereas it is easy to
maintain in the proposed system.

3.1.2 COST ANALYSIS

• The cost of conducting the investigation was negligible, as the centre manager and
teachers of the centre provided most of the information.
• The cost of the essential hardware and software requirements is not very high.
• Moreover, hardware like a Pentium Core PC and software like MATLAB are easily
available in the market.

3.1.3 BENEFITS AND SAVINGS

• The cost of maintaining the proposed system is negligible.
• Money is saved as paperwork is minimized.
• Records are easily entered and retrieved.
• Time is saved as all the work can be done with a simple mouse click.
• The proposed system is fully automated and hence easy to use.
• Since the benefits outweigh the costs, the project is economically feasible.

3.2 Schedule Estimate

This is the table of ‘Activity’ and its estimated time duration, which are used to accomplish the
project.

Activity                        Completion Date   Duration (days)   Effort (man-hours)
A) Introduction                 20 AUG 2010       20                250
B) Problem Analysis             15 SEP 2010       25                370
C) Project Estimation &         20 OCT 2010       35                520
   Implementation Plan
D) Research Design              10 NOV 2010       20                300
E) System Interface Design      10 DEC 2010       30                450
F) Coding                       20 FEB 2011       30                600
G) Experiments Specification    10 MAR 2011       20                300
H) Conclusions                  25 MAR 2011       15                20
I) User Manual                  10 APR 2011       15                20

3.3 Gantt Chart


A Gantt chart is a horizontal bar chart developed as a production control tool in 1917 by Henry
L. Gantt, an American engineer and social scientist. Frequently used in project management, a
Gantt chart provides a graphical illustration of a schedule that helps to plan, coordinate, and
track specific tasks in a project.

Gantt charts may be simple versions created on graph paper or more complex automated
versions created using project management applications such as Microsoft Project or Excel.

A Gantt chart is constructed with a horizontal axis representing the total time span of the
project, broken down into increments (for example, days, weeks, or months) and a vertical axis
representing the tasks that make up the project (for example, if the project is outfitting your
computer with new software, the major tasks involved might be: conduct research, choose
software, install software). Horizontal bars of varying lengths represent the sequences, timing,
and time span for each task. Using the same example, you would put "conduct research" at the
top of the vertical axis and draw a bar on the graph that represents the amount of time you
expect to spend on the research, and then enter the other tasks below the first one and
representative bars at the points in time when you expect to undertake them. The bar spans may
overlap, as, for example, you may conduct research and choose software during the same time
span. As the project progresses, secondary bars, arrowheads, or darkened bars may be added to
indicate completed tasks, or the portions of tasks that have been completed. A vertical line is
used to represent the report date.
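The construction described above is easy to reproduce in miniature; the following Python
sketch renders a text-mode Gantt chart (the task names and week spans are invented for the
example):

```python
def render_gantt(tasks, total_weeks):
    """Render a minimal text Gantt chart: one row per task, with a '#'
    for each week the task is active. Tasks are (name, start, end)
    tuples in weeks, start inclusive and end exclusive; bars may
    overlap in time, just as on a real chart."""
    width = max(len(name) for name, _, _ in tasks)
    lines = []
    for name, start, end in tasks:
        bar = " " * start + "#" * (end - start)
        lines.append(f"{name.ljust(width)} |{bar}")
    # Time axis along the bottom (last digit of each week number).
    axis = "".join(str(week % 10) for week in range(total_weeks))
    lines.append(f"{' ' * width} |{axis}")
    return "\n".join(lines)

print(render_gantt(
    [("Analysis", 0, 5), ("Design", 4, 10),
     ("Coding", 8, 20), ("Testing", 18, 24)],
    total_weeks=24))
```

Note that the Analysis and Design bars overlap, illustrating the point made above about
concurrent tasks.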

Scheduling of the SDLC (Gantt chart)

[Gantt chart: the phases Analysis, Documentation, Design, Coding and Testing plotted against
a time axis of 0 to 35 weeks.]

References

[1] HUFFMAN, D. A. (1952). A method for the construction of minimum-redundancy codes.
Proceedings of the Institute of Radio Engineers, 40, pp. 1098-1101.
[2] CAPON, J. (1959). A probabilistic model for run-length coding of pictures. IRE
Transactions on Information Theory, IT-5 (4), pp. 157-163.
[3] APOSTOLOPOULOS, J. G. (2004). Video Compression. Streaming Media Systems
Group.
[4] The Moving Picture Experts Group home page. (3 Feb. 2006)
[5] CLARKE, R. J. (1995). Digital compression of still images and video. London: Academic
Press.
[6] http://www.irf.uka.de/seminare/redundanz/vortrag15/ (3 Feb. 2006)
[7] PEREIRA, F. The MPEG-4 Standard: Evolution or Revolution?
[8] MANNING, C. The digital video site.
[9] SEFERIDIS, V. E. and GHANBARI, M. (1993). General approach to block-matching
motion estimation. Optical Engineering, 32, pp. 1464-1474.
[10] GHARAVI, H. and MILLS, M. (1990). Block-matching motion estimation algorithms:
new results. IEEE Transactions on Circuits and Systems, 37, pp. 649-651.
[11] CHOI, W. Y. and PARK, R. H. (1989). Motion vector coding with conditional
transmission. Signal Processing, 18, pp. 259-267.
[12] Institut für Informatik – Universität Karlsruhe.