

Karishma Raut


Data compression is the art or science of representing information in a compact form. We create these compact representations by identifying & using structures that exist in the data. Data can be characters in a text file, numbers that are samples of speech or image waveforms, or sequences of numbers that are generated by other processes.


The reason we need data compression is that more & more of the information that we generate & use is in digital form, i.e. in the form of numbers represented by bytes of data. The no. of bytes required to represent multimedia data can be huge.

Ex: In order to digitally represent 1 sec of video without compression, we need more than 20 MB, or 160 Mbits.

The idea of compression by reducing redundancy suggests the general law of compression, which is to “assign short codes to common events & long codes to rare events”.

Data compression is done by changing the representation of the data from inefficient (long) to efficient (short).

Adv: Decreases storage space, transmission time & cost.

Compression performance

The following are measures of the performance of a compression technique.

1. Compression ratio: It is defined as the size of the o/p stream divided by the size of the i/p stream.

   Compression ratio = size of o/p stream / size of i/p stream

   A value of 0.6 means that the data occupies 60% of its original size after compression. Related figures are quoted in bits per sec (bps) or bits per pixel (bpp).

2. Compression factor: It is the inverse of the compression ratio.

   Compression factor = size of i/p stream / size of o/p stream

   A value greater than one indicates compression. The related expression 100 × (1 − compression ratio) gives the percentage saving, i.e. a value of 60 means that the o/p stream occupies 40% of the original size, a saving of 60%.

3. Compression gain:

   Compression gain = 100 log_e (reference size / compressed size)

   where the reference size is either the size of the i/p stream or the size of the compressed stream produced by some standard lossless compression method.

4. Speed of compression: It can be measured in cycles per byte (CPB). This is the average number of machine cycles it takes to compress one byte. This measure is important when compression is done by special hardware.

5. Other quantities: Mean square error (MSE) & peak signal to noise ratio (PSNR) are used to measure the distortion caused by lossy compression of images & movies.

6. Relative compression: It is used to measure the compression gain in lossless audio compression methods, such as MLP (Meridian Lossless Packing). It expresses the quality of compression by the number of bits by which each audio sample is reduced. (MLP also stands for the multilevel progressive method for image compression.)
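The size-based measures above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, sizes may be in any consistent unit (bytes or bits), and the gain formula assumes the commonly used definition 100 × log_e(reference size / compressed size).

```python
import math

def compression_ratio(input_size, output_size):
    # Size of o/p stream divided by size of i/p stream; below 1 means compression.
    return output_size / input_size

def compression_factor(input_size, output_size):
    # Inverse of the compression ratio; above 1 means compression.
    return input_size / output_size

def percent_saving(input_size, output_size):
    # 100 * (1 - compression ratio): percentage of the original size that is saved.
    return 100 * (1 - output_size / input_size)

def compression_gain(reference_size, compressed_size):
    # Assumed definition: 100 * natural log of (reference size / compressed size).
    return 100 * math.log(reference_size / compressed_size)

if __name__ == "__main__":
    print(compression_ratio(1000, 600))
    print(compression_factor(1000, 400))
    print(percent_saving(1000, 400))
```

For a 1000-byte input compressed to 400 bytes, the ratio is 0.4, the factor is 2.5 and the saving is 60%.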


Lossy & Lossless compression

No. | Lossless compression | Lossy compression
1 | The technique involves no loss of information. | Some loss of information.
2 | Original data can be recovered exactly from the compressed data. | Original data cannot be recovered exactly.
3 | It is used where the difference between original & reconstructed data is not tolerated. | It is used where the difference is tolerable.
4 | Higher compression ratios are generally not possible. | Higher compression ratios.
5 | Application: text compression, image processing in medical & satellite signals. | Application: storage & transmission of speech, video communication.

Fixed code → Basic techniques → RLE, move to front

Variable code → Statistical methods → Shannon-Fano, Huffman


Statistical methods

Statistical methods use variable-size codes for the symbols (characters or pixels) they operate on, with the shorter codes assigned to symbols or groups of symbols that appear more often in the data (have a higher probability of occurrence).

Designers & implementers of variable-size codes have to deal with the two problems of

(1) Assigning codes that can be decoded unambiguously &

(2) Assigning codes with the minimum average size.

Prefix codes

Tunstall codes

The Golomb code

Shannon-Fano code


Arithmetic etc.

Shannon-Fano Coding

Named after Claude Shannon & Robert Fano.

It was the first method developed for finding good variable-size codes.

We start with a set of n symbols with known probabilities of occurrence.

1. The symbols are first arranged in descending order of their probabilities.
2. The set of symbols is then divided into two subsets that have the same (or almost the same) total probability.
3. All symbols in one subset get assigned codes that start with a 0, while the codes of the symbols in the other subset start with a 1.
4. Each subset is then recursively divided into two subsets of roughly equal probabilities & the procedure is repeated.
5. The process continues until no subset with more than one symbol remains.

Equivalently, the procedure can be stated as:

1. List the symbols in order of decreasing probability.
2. Partition the set of symbols into two sets that are as close to being equiprobable as possible.
3. Assign ‘1’ to each symbol in the upper set & ‘0’ to each symbol in the lower set.
4. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
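The recursive splitting can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `shannon_fano` is hypothetical, the split point is chosen to make the two subsets as close to equiprobable as possible, and the upper subset is given ‘0’ (which subset gets 0 and which gets 1 is arbitrary).

```python
def shannon_fano(probabilities):
    """Return {symbol: code} for a {symbol: probability} mapping."""
    # Step 1: arrange the symbols in descending order of probability.
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return  # a single symbol cannot be partitioned further
        total = sum(probabilities[s] for s in group)
        running, best_cut, best_diff = 0.0, 1, float("inf")
        # Step 2: find the cut that makes the two subsets most nearly equiprobable.
        for i in range(1, len(group)):
            running += probabilities[group[i - 1]]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        upper, lower = group[:best_cut], group[best_cut:]
        # Step 3: one subset gets a 0 appended, the other a 1.
        for s in upper:
            codes[s] += "0"
        for s in lower:
            codes[s] += "1"
        # Step 4: repeat on each subset.
        split(upper)
        split(lower)

    split(symbols)
    return codes

if __name__ == "__main__":
    print(shannon_fano({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
```

With the probabilities 1/2, 1/4, 1/8, 1/8 every split is exactly equiprobable, so the code lengths 1, 2, 3, 3 equal the ideal lengths −log2(p).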


Huffman Coding

The algorithm starts by building a list of all the alphabet symbols in descending order of their probabilities. It then constructs a tree, with a symbol at every leaf, from the bottom up.

This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list & replaced with an auxiliary symbol representing the two original symbols. When the list is reduced to just one auxiliary symbol, the tree is complete.

The tree is traversed to determine the codes of the symbols.

In a Huffman code, no codeword is a prefix of another codeword (prefix property). All prefix codes are uniquely decodable.

The Huffman procedure is based on two observations:

1. In an optimum code, symbols that occur more frequently will have shorter code words than symbols that occur less frequently.
2. In an optimum code, the two symbols that occur least frequently will have code words of the same length.
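A minimal sketch of the bottom-up construction is given below. It is an illustration under assumptions: the function name `huffman_codes` is hypothetical, a heap replaces the sorted list, and ties between equal probabilities are broken by insertion order, so individual codewords may differ from a hand-built tree while the average length stays the same.

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Return {symbol: code} for a {symbol: probability} mapping."""
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(p, next(tie), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Select the two entries with the smallest probabilities.
        p0, _, low = heapq.heappop(heap)
        p1, _, high = heapq.heappop(heap)
        # Merge them into one auxiliary entry, prefixing one bit to each side.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

if __name__ == "__main__":
    probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
    codes = huffman_codes(probs)
    print(codes)
    print(sum(probs[s] * len(codes[s]) for s in probs))  # average code length
```

The two least probable symbols (a4 & a5 here) always end up with code words of the same length, as stated in the second observation.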

Standard Huffman
Min. Variance Huffman
Non-binary Huffman
Adaptive Huffman


The standard Huffman & min. variance Huffman codes are identical in terms of their redundancy. However, the variance of the length of the code words is significantly different.

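In some applications the available Txn rate is fixed. In the example here the Txn rate is 22,000 bits/sec, so the buffer must hold, each second, the bits generated in excess of that rate.

With the standard Huffman code:

We will be generating bits at a rate of 40,000 bits/sec.

Buffer size = 40,000 - 22,000 = 18,000 bits

With the min. variance Huffman code:

We will be generating bits at a rate of 30,000 bits/sec.

Buffer size = 30,000 - 22,000 = 8,000 bits

It is reasonable to elect the use of the second code, i.e. min. variance Huffman.

In the min. variance Huffman code, the variation in codeword length is less than in the standard Huffman code.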


The binary coding procedure can be easily extended to the non-binary case, i.e. an m-ary code alphabet (m ≠ 2).

Consider the general case of an m-ary code & an M-letter alphabet. Let m′ be the no. of letters that are combined in the first phase. Then m′ is the number between 2 & m which is congruent to M modulo (m – 1).
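The rule for the first phase can be checked with a short sketch. The function name `first_phase_count` is hypothetical; it simply searches the range 2..m for the one value congruent to M modulo (m − 1).

```python
def first_phase_count(M, m):
    """Number of letters combined in the first phase of m-ary Huffman coding.

    M is the size of the source alphabet, m the size of the code alphabet.
    """
    # The m - 1 consecutive integers 2..m contain exactly one value
    # congruent to M modulo (m - 1).
    for candidate in range(2, m + 1):
        if (candidate - M) % (m - 1) == 0:
            return candidate

if __name__ == "__main__":
    print(first_phase_count(6, 3))  # ternary code, six-letter alphabet
    print(first_phase_count(7, 3))  # ternary code, seven-letter alphabet
```

For a ternary code, a six-letter alphabet combines 2 letters in the first phase & a seven-letter alphabet combines 3; after that every phase combines m letters.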


Huffman coding requires knowledge of the probabilities of the source sequence. If these probabilities are unknown, then Huffman coding becomes a two-pass procedure. In the first pass the statistics are collected & in the second pass the source is coded.

For the adaptive Huffman codes two extra parameters are added to the binary tree.

Weight: The weight of each external node (leaf) is the number of times the symbol corresponding to the leaf has been encountered. The weight of each internal node is the sum of the weights of its offspring.

Node number: A unique number is assigned to every node. If the alphabet size is n, then there will be 2n – 1 internal & external nodes, which can be numbered as $y_1$ to $y_{2n-1}$.

1. If $w_j$ is the weight of node $y_j$, then the nodes are numbered such that $w_1 \le w_2 \le w_3 \le \dots \le w_{2n-1}$.
2. The nodes $y_{2j-1}$ & $y_{2j}$ are offspring of the same parent node for $1 \le j < n$, & the node number of the parent node is greater than $y_{2j-1}$ & $y_{2j}$.

These two characteristics are called the sibling property of the Huffman code. The tree which possesses this property is known as a Huffman tree.

In the adaptive Huffman codes neither Tx nor Rx knows anything about the source statistics at the start of Txn.

At the beginning the tree at both Tx & Rx consists of a single node that corresponds to all the symbols not yet transmitted (NYT) & has zero weight.

As Txn proceeds, the nodes corresponding to the transmitted symbols are added to the tree & the tree is reconfigured using the update procedure.

Before Txn starts, a fixed code for each symbol is agreed upon between Tx & Rx.

The fixed code is decided as follows:

If the source has an alphabet $(a_1, a_2, \dots, a_m)$ of size m, then pick e & r such that $m = 2^e + r$ & $0 \le r < 2^e$.

Then the letter $a_k$ is encoded as the (e+1)-bit binary representation of k – 1 if 1 ≤ k ≤ 2r; otherwise $a_k$ is encoded as the e-bit binary representation of k – r – 1.
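The fixed code can be sketched as below. The function name `fixed_code` is hypothetical; e is taken as the largest integer with 2^e ≤ m, which gives 0 ≤ r < 2^e.

```python
def fixed_code(k, m):
    """Fixed code for letter a_k (1 <= k <= m) of an m-letter alphabet."""
    e = m.bit_length() - 1   # largest e with 2**e <= m
    r = m - (1 << e)         # m = 2**e + r with 0 <= r < 2**e
    if 1 <= k <= 2 * r:
        # (e+1)-bit binary representation of k - 1
        return format(k - 1, "0{}b".format(e + 1))
    # e-bit binary representation of k - r - 1
    return format(k - r - 1, "0{}b".format(e))

if __name__ == "__main__":
    # For a 26-letter alphabet: e = 4, r = 10.
    print(fixed_code(1, 26))   # 00000
    print(fixed_code(2, 26))   # 00001
    print(fixed_code(22, 26))  # 1011
```

For m = 26 the first 20 letters get 5-bit codes & the remaining 6 letters get 4-bit codes.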

When a symbol is encountered for the 1st time, the NYT code is transmitted, followed by the fixed code for that symbol.

A node is then created for the symbol & the symbol is taken out of the NYT list.

Tx & Rx start with the same tree structure & both use the same updating procedure, & therefore the encoding & decoding processes remain synchronized with each other.


As the received binary string is read, the tree is traversed in a manner identical to that used in the encoding procedure. Once a leaf is encountered, the symbol corresponding to the leaf is decoded. If the leaf is the NYT node, then we check the next e bits to see if the resulting number is less than r.

If it is less than r, we read another bit to complete the code for the symbol. The index for the symbol is obtained by adding r + 1 to the decimal number corresponding to the e-bit binary string, or by adding 1 to the decimal number corresponding to the (e+1)-bit binary string.

Once the symbol has been decoded, the tree is updated & the next bit is used to start another traversal down the tree.

• Read the received bit.
• If it leads to an internal node, go on reading bits till we reach a leaf.
• If the leaf corresponds to an external symbol on the tree, decode it & update the tree.
• Else it is the NYT node. Then read the next e bits. If the decimal equivalent of those e bits is less than r, read one more bit. The index k of the symbol is then the decimal equivalent of the e+1 bits plus 1. Update the tree. k = (e+1 bits)₁₀ + 1
• If the decimal equivalent of the e bits is greater than or equal to r, then the index k of the symbol is the decimal equivalent plus r plus 1. Update the tree. k = (e bits)₁₀ + r + 1
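The NYT branch of the decoder (recovering the index k from the fixed code) can be sketched as follows. The function name `decode_fixed_code` is hypothetical, the tree traversal & update steps are left out, and the bit string is assumed to start right after the NYT code.

```python
def decode_fixed_code(bits, m):
    """Decode the fixed code at the start of `bits` for an m-letter alphabet.

    Returns (k, number_of_bits_consumed), where k is the symbol index.
    """
    e = m.bit_length() - 1   # largest e with 2**e <= m
    r = m - (1 << e)         # m = 2**e + r
    value = int(bits[:e], 2)             # decimal equivalent of the next e bits
    if value < r:
        # Read one more bit; the index is the (e+1)-bit value plus 1.
        return int(bits[:e + 1], 2) + 1, e + 1
    # Otherwise the index is the e-bit value plus r plus 1.
    return value + r + 1, e

if __name__ == "__main__":
    print(decode_fixed_code("1011", 26))   # (22, 4)
    print(decode_fixed_code("00000", 26))  # (1, 5)
```

The comparison is value < r versus value ≥ r, so the boundary case (value equal to r) takes the e-bit branch.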


It is more efficient to generate code words for groups or sequences of symbols rather than generating a separate code word for each symbol in a sequence. But this approach becomes impractical with Huffman coding, as it causes exponential growth in the size of the codebook.

Arithmetic coding fulfills this requirement. Here, a unique identifier or tag is generated for the sequence to be encoded. This tag corresponds to a binary fraction, which becomes the binary code for the sequence.

To map sequences of symbols into the unit interval [0, 1), a cumulative distribution function (cdf) is used.

To use this technique, we need to map the source symbols or letters to numbers:

$X(a_i) = i, \quad a_i \in A$

where $A = \{a_1, a_2, \dots, a_m\}$ is the alphabet for a discrete source & X is a random variable.

This mapping means that, given a probability model P for the source,

$P(X = i) = P(a_i)$

& the cdf is

$F_X(i) = \sum_{k=1}^{i} P(X = k)$

with $F_X(k) = 0$ for $k \le 0$ & $F_X(k) = 1$ for $k \ge m$.

The procedure for generating the tag works by reducing the size of the interval in which the tag resides as more and more elements of the sequence are received.

We start out by first dividing the unit interval [0, 1) into subintervals of the form $[F_X(i-1), F_X(i))$, $i = 1, \dots, m$.

The boundaries of the interval containing the tag for the sequence being encoded are updated as

Lower limit: $l^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n - 1)$

Upper limit: $u^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n)$

where $x_n$ is the value of the random variable corresponding to the nth observed symbol.

Tag = $\dfrac{l^{(n)} + u^{(n)}}{2}$
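The interval update can be sketched with exact fractions. This is an illustration under assumptions: the names `cdf_table` & `generate_tag` are hypothetical, symbols are numbered 1..m, the tag is taken as the midpoint of the final interval, and the three-letter model with probabilities 0.8, 0.02 & 0.18 is just example data.

```python
from fractions import Fraction

def cdf_table(probabilities):
    """Return [F(0), F(1), ..., F(m)] with F(0) = 0 for symbols 1..m."""
    table = [Fraction(0)]
    for p in probabilities:
        table.append(table[-1] + p)
    return table

def generate_tag(sequence, probabilities):
    F = cdf_table(probabilities)
    low, high = Fraction(0), Fraction(1)
    for x in sequence:
        width = high - low
        # l(n) = l(n-1) + (u(n-1) - l(n-1)) F(x_n - 1), and likewise for u(n)
        low, high = low + width * F[x - 1], low + width * F[x]
    return (low + high) / 2  # midpoint of the final interval

if __name__ == "__main__":
    probs = [Fraction(4, 5), Fraction(1, 50), Fraction(9, 50)]
    print(float(generate_tag([1, 3, 2, 1], probs)))  # 0.772352
```

For the sequence 1 3 2 1 the interval shrinks to [0.7712, 0.773504) & the tag is 0.772352.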

Deciphering the tag

1. Initialize $l^{(0)} = 0$, $u^{(0)} = 1$.

2. For each k find

   $t^* = \dfrac{\text{tag} - l^{(k-1)}}{u^{(k-1)} - l^{(k-1)}}$

3. Find the value of $x_k$ for which

   $F_X(x_k - 1) \le t^* < F_X(x_k)$

4. Update $l^{(k)}$ and $u^{(k)}$.

5. Continue until the entire sequence has been decoded.

There are two ways to know that the entire sequence has been decoded. The decoder may know the length of the sequence, or a special symbol may be used to indicate the end of transmission.
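The deciphering steps can be sketched as below for the case where the decoder knows the sequence length. The names `cdf_table` & `decode_tag` are hypothetical, symbols are numbered 1..m, and the model & tag are example data only.

```python
from fractions import Fraction

def cdf_table(probabilities):
    """Return [F(0), F(1), ..., F(m)] with F(0) = 0 for symbols 1..m."""
    table = [Fraction(0)]
    for p in probabilities:
        table.append(table[-1] + p)
    return table

def decode_tag(tag, length, probabilities):
    F = cdf_table(probabilities)
    low, high = Fraction(0), Fraction(1)      # step 1: l = 0, u = 1
    decoded = []
    for _ in range(length):
        width = high - low
        t = (tag - low) / width               # step 2: rescale the tag
        # step 3: find x with F(x - 1) <= t* < F(x)
        x = next(i for i in range(1, len(F)) if F[i - 1] <= t < F[i])
        decoded.append(x)
        # step 4: update the limits exactly as the encoder does
        low, high = low + width * F[x - 1], low + width * F[x]
    return decoded

if __name__ == "__main__":
    probs = [Fraction(4, 5), Fraction(1, 50), Fraction(9, 50)]
    print(decode_tag(Fraction("0.772352"), 4, probs))  # [1, 3, 2, 1]
```

Exact fractions are used here so that the comparisons in step 3 are not disturbed by rounding; a practical coder would use integer arithmetic with rescaling instead.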


There are many applications in which the o/p of the source consists of recurring patterns. An example is a text source in which certain words recur. To encode such a source, a list or dictionary is maintained which holds a collection of frequently occurring patterns from the source.

Whenever a pattern appears at the source o/p, it is checked for a matching entry in the dictionary.

If the pattern has a matching entry in the dictionary, then the source o/p pattern is simply encoded with the index of that entry in the dictionary instead of coding the pattern itself.

This achieves the compression & the technique is called the dictionary encoding method. The encoder & decoder have identical dictionaries.


Static dictionary encoding is used when considerable prior knowledge about the source is available in advance. The static dictionary contains those patterns which are anticipated to occur most frequently in the source o/p.

For pattern entries of a recurring nature, a highly efficient compression technique is obtained with the static dictionary method.

These schemes work well only for the applications & data for which they were designed. A static scheme is not suited to other applications & if it is used there, it may result in expansion rather than compression.

Ex. Digram coding


The technique is able to compress a variety of source o/ps. The dynamic dictionary technique adapts to the characteristics of the source o/p rather than staying fixed as in the static approach.

The dynamic dictionary method does not require initial knowledge of the source o/p for the compression purpose.

New patterns o/p from the source which do not exist in the dictionary are included dynamically into the dictionary.

LZ-77 (Lempel-Ziv)
LZ-78 (Lempel-Ziv)
LZW (Lempel-Ziv-Welch)

In the LZ-77 method the dictionary is a portion of the previously encoded sequence, viewed through a coding window. The window consists of two parts. The search buffer contains a portion of the recently encoded sequence. The look-ahead buffer contains the next portion of the sequence to be encoded.
To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer.
The distance of the pointer from the look-ahead buffer is called the offset. The encoder then examines the symbols following the symbol at the pointer location to see whether they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match.
The encoder searches for the longest match, & when the longest match is found the encoder encodes it with a triple <o, l, c>

where, o = offset

l = length of match

c = code word corresponding to the symbol in the look-ahead buffer that follows the match.
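A minimal sketch of the encoder search is given below. It is an illustration under assumptions: the function name `lz77_encode` and the buffer sizes (search buffer 7, look-ahead buffer 6) are hypothetical choices, the raw symbol is stored in place of its code word c, and a plain linear search is used instead of an efficient search structure.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Return a list of (offset, length, symbol) triples for `data`."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # One symbol must remain to be sent explicitly after the match.
        max_length = min(lookahead_size - 1, len(data) - pos - 1)
        # Try every starting point inside the search buffer.
        for start in range(max(0, pos - search_size), pos):
            length = 0
            # The match may run on into the look-ahead buffer itself.
            while length < max_length and data[start + length] == data[pos + length]:
                length += 1
            if length > best_length:
                best_offset, best_length = pos - start, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples

if __name__ == "__main__":
    for triple in lz77_encode("cabracadabrarrarrad"):
        print(triple)
```

For the sequence cabracadabrarrarrad the last two triples produced are (7, 4, 'r') & (3, 5, 'd'); in the second of these the match extends past the search buffer into the look-ahead buffer.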


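We receive the triples <0, 0, c(d)>, <7, 4, c(r)> & <3, 5, c(d)>.

Initially we have decoded c a b r a c a.

1. <0, 0, c(d)>: there is no match, so decode d. Sequence so far: cabraca d
2. <7, 4, c(r)>: copy 4 symbols starting 7 symbols back (abra), then decode r. Sequence so far: cabracad abrar
3. <3, 5, c(d)>: copy 5 symbols starting 3 symbols back (rarra), then decode d. Sequence so far: cabracadabrar rarrad

Decoded sequence: cabracadabrarrarrad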
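The decoding of the triples can be sketched as follows. The function name `lz77_decode` is hypothetical and the raw symbol stands in for its code word c. Copying one symbol at a time is what allows a match to run past the end of the already decoded text, as in the third triple.

```python
def lz77_decode(triples, initial=""):
    """Rebuild the sequence from (offset, length, symbol) triples."""
    out = list(initial)  # symbols that have already been decoded
    for offset, length, symbol in triples:
        start = len(out) - offset
        # Copy symbol by symbol so an overlapping match can reuse
        # symbols produced earlier in this same copy.
        for i in range(length):
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)

if __name__ == "__main__":
    triples = [(0, 0, "d"), (7, 4, "r"), (3, 5, "d")]
    print(lz77_decode(triples, initial="cabraca"))  # cabracadabrarrarrad
```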

Adv: It is an adaptive method, so no prior knowledge of the source is required.

Disadv:

1. Triples are encoded using a fixed-length code.
2. Large buffers are required, so efficient search algorithms are needed.
3. The method turns out to be inefficient if the period of repetition is larger than the size of the search buffer.

Application: ZIP packages


The LZ-78 algorithm overcomes the problems occurring in the LZ-77 algorithm by dropping the reliance on the search buffer & keeping an explicit dictionary. This dictionary has to be built at both the encoder & decoder in an identical manner. The i/p is coded as a double <i, c>

i = index corresponding to the dictionary entry that was the longest match to the i/p.

c = code for the character in the i/p following the matched portion of the i/p (1st unmatched character).

While the LZ-78 algorithm has the ability to capture patterns & hold them indefinitely, it also has a rather serious drawback: the dictionary keeps growing without bound.

Also as ‘double’ has to be encoded for each unmatched entry no match


Index 0 is used for the unmatched portion. The encoded double becomes a
new entry to the dictionary. Thus, each new entry into the dictionary is one new
symbol concatenated with an existing dictionary entry.
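The dictionary construction can be sketched as below. The names `lz78_encode` & `lz78_decode` are hypothetical, the raw character stands in for its code c, and an empty character is used (as an assumption of this sketch) when the i/p ends in the middle of a match.

```python
def lz78_encode(data):
    """Return a list of (index, character) doubles; index 0 means no match."""
    dictionary = {}   # pattern -> index, starting from 1
    output = []
    pattern = ""
    for ch in data:
        if pattern + ch in dictionary:
            pattern += ch            # keep extending the match
        else:
            output.append((dictionary.get(pattern, 0), ch))
            # New entry = existing entry + one new symbol.
            dictionary[pattern + ch] = len(dictionary) + 1
            pattern = ""
    if pattern:                      # i/p ended inside a match
        output.append((dictionary[pattern], ""))
    return output

def lz78_decode(doubles):
    entries = [""]                   # index 0 is the empty pattern
    out = []
    for index, ch in doubles:
        entry = entries[index] + ch
        out.append(entry)
        entries.append(entry)        # same dictionary growth as the encoder
    return "".join(out)

if __name__ == "__main__":
    doubles = lz78_encode("wabbawabba")
    print(doubles)
    print(lz78_decode(doubles))
```

The dictionary in this sketch grows by one entry per double & is never pruned, which is the unbounded growth noted above.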

LZW (Lempel-Ziv-Welch)

Welch proposed a technique for removing the necessity of encoding the second element of the pair <i, c>. That is, the encoder would only send the index to the dictionary.

In order to do this, the dictionary has to be primed with all the letters of the source alphabet. The i/p to the encoder is accumulated in a pattern ‘p’ as long as ‘p’ is contained in the dictionary. If the addition of another letter ‘a’ results in a pattern ‘p*a’ that is not in the dictionary, then the index of ‘p’ is transmitted to the receiver, the pattern ‘p*a’ is added to the dictionary & we start another pattern with the letter ‘a’.
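The encoding loop can be sketched as follows. The function name `lzw_encode` is hypothetical, and the initial dictionary (space = 1, a = 2, b = 3, o = 4, w = 5) is an assumed priming chosen for illustration; every letter of the i/p is assumed to be in the alphabet.

```python
def lzw_encode(data, alphabet):
    """Return the list of dictionary indices for `data`."""
    # Prime the dictionary with all letters of the source alphabet (1-based).
    dictionary = {s: i + 1 for i, s in enumerate(alphabet)}
    output = []
    pattern = ""
    for letter in data:
        if pattern + letter in dictionary:
            pattern += letter                  # p*a is known: keep accumulating
        else:
            output.append(dictionary[pattern])                  # send index of p
            dictionary[pattern + letter] = len(dictionary) + 1  # add p*a
            pattern = letter                   # start a new pattern with a
    if pattern:
        output.append(dictionary[pattern])     # flush the last pattern
    return output

if __name__ == "__main__":
    print(lzw_encode("wabba", [" ", "a", "b", "o", "w"]))  # [5, 2, 3, 3, 2]
```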


The encoder o/p sequence becomes the decoder i/p sequence. The decoder starts with the same dictionary as the encoder.

With reference to the current example, the index value 5 corresponds to the letter w, so we decode w as the first element of our sequence. At the same time, in order to mimic the dictionary construction procedure of the encoder, we begin construction of the next element of the dictionary. We start with the letter w. This pattern exists in the dictionary, so we do not add it to the dictionary & continue with the decoding process. The next decoder input is 2, which is the index corresponding to the letter a. We decode an ‘a’ & concatenate it with our current pattern to form the pattern wa. As this does not exist in the dictionary, we add it as the next entry of the dictionary & start a new pattern beginning with the letter a.
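The decoder can be sketched as below. The function name `lzw_decode` is hypothetical and the initial dictionary (space = 1, a = 2, b = 3, o = 4, w = 5) is an assumed priming matching the indices 5 → w & 2 → a used in the text. The sketch also handles the case where an index arrives before the decoder has completed that entry, which the walkthrough above does not cover.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the text from a non-empty list of LZW indices."""
    # Start with the same primed dictionary as the encoder (1-based indices).
    entries = {i + 1: s for i, s in enumerate(alphabet)}
    previous = entries[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in entries:
            current = entries[code]
        else:
            # The entry is still under construction: it must be the
            # previous pattern followed by its own first letter.
            current = previous + previous[0]
        out.append(current)
        # Mimic the encoder: previous pattern + first letter of the new one.
        entries[len(entries) + 1] = previous + current[0]
        previous = current
    return "".join(out)

if __name__ == "__main__":
    print(lzw_decode([5, 2, 3, 3, 2], [" ", "a", "b", "o", "w"]))  # wabba
```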


Waveform codecs: They try to produce audio samples that are as close as possible to the original samples. Ex. PCM, ADPCM, subband coding & adaptive transform coding.

Subband coding: It transforms the audio samples to the frequency domain in different freq. bands & codes each subband separately with ADPCM or a similar quantization method.

Adaptive transform coding (ATC): It transforms the audio samples to the frequency domain with the DCT.

Source codecs: They use a mathematical model of the source of data. The model depends on certain parameters & the encoder uses the i/p data to compute those parameters. Once the parameters are obtained, they are written on the compressed stream. The decoder i/ps the parameters & employs the mathematical model to reconstruct the original data.

For audio, such a codec is known as a vocoder (vocal coder).

How is audio compression implemented in MPEG-1?

The MPEG standards for audio compression (MPEG-1 and MPEG-2) define three coding schemes, known as Layer I, II & III. Layer III is referred to as MP3.

The standard defines the bit stream that should be presented to the decoder, leaving the design of the encoder to individual vendors. The basic strategy used in all 3 layers is as follows.

The i/p, consisting of 16-bit PCM words, is first transformed into the frequency domain. The frequency coefficients are quantized, coded and packed into the MPEG bit stream. Each layer is more complex than the previous layer and also provides higher compression. The three layers are backward compatible, i.e. a decoder for Layer III should be able to decode Layer I and II. A decoder for Layer II should be able to decode Layer I encoded audio.
MPEG-1 provides a 4:1 compression. In Layer I coding the time-frequency mapping is accomplished using a bank of 32 subband filters. The o/p of the subband filters is critically sampled, i.e. the o/p of each filter is downsampled by 32. The samples are divided into groups of 12 samples each.

Twelve samples from each of the 32 subband filters make up one frame of the Layer I coder. Once the frequency components are obtained, the algorithm examines each group of 12 samples to determine a scale factor. The subband o/p is divided by the scale factor before being linearly quantized. There are a total of 63 scale factors specified in the MPEG standard, so the specification of each scale factor requires 6 bits.

The o/ps of the quantization and bit allocation steps are combined into a frame as shown. The header is made up of 32 bits.

If the protection info is known, all 16 bits can be used for frame synchronization.

4 bits – bit rate index (kbits/sec)
2 bits – sampling frequency
1 bit – padding bit
2 bits – mode
2 bits – mode extension bits
1 bit – copyright bit
1 bit – original media

JPEG Standard

JPEG (Joint Photographic Experts Group) is one of the most widely used standards for lossy image compression. It is a joint collaboration of ISO and ITU.

The following describes the JPEG algorithm:

1. The DCT is used in the JPEG scheme. In this scheme the image to be compressed is level shifted by $2^{P-1}$.
2. The level shifting is performed by subtracting the value $2^{P-1}$ from each pixel value of the given image, where P is the number of bits required to represent one pixel in the image.
3. Therefore, if we have 8-bit images, then $2^{8-1} = 128$ and the pixel values vary between -128 and 127. The whole image is then divided into 8×8 blocks of pixels.
4. These 8×8 blocks are then transformed using the DCT. If a dimension of the image is not a multiple of 8, the encoder replicates the last column or row until the final size becomes a multiple of 8.
5. These additional rows and columns are removed during the decoding process. The DCT of a level-shifted block gives the DCT coefficients.
6. The algorithm then uses a uniform midtread quantizer to quantize the DCT coefficients. The quantized value of each DCT coefficient is represented by a label.
7. The reconstructed value is obtained from this label by multiplying it by the corresponding entry in the sample quantization table which is provided.
8. The step size generally increases as we move from the DC coefficient to the higher order coefficients. More quantization error is introduced in the higher frequency coefficients than in the lower frequency coefficients, as quantization error is an increasing function of step size.
9. The labels for the DC and AC coefficients are coded differently by JPEG. This results in higher compression but a more complex operation.
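The level shift & the midtread quantization of a single coefficient can be sketched as below. This is only an illustration: the function names are hypothetical, the DCT itself is left out, the label is assumed to be computed as floor(coefficient / step + 0.5), and the coefficient value & step size in the usage lines are example numbers rather than entries taken from a real quantization table.

```python
import math

def level_shift(pixels, bits_per_pixel=8):
    """Subtract 2**(P-1) from every pixel value."""
    offset = 1 << (bits_per_pixel - 1)
    return [p - offset for p in pixels]

def quantize(coefficient, step):
    """Uniform midtread quantizer: label of a DCT coefficient."""
    return math.floor(coefficient / step + 0.5)

def reconstruct(label, step):
    """Reconstructed value = label times the quantization table entry."""
    return label * step

if __name__ == "__main__":
    print(level_shift([0, 128, 255]))   # [-128, 0, 127]
    label = quantize(39.88, 16)
    print(label, reconstruct(label, 16))  # 2 32
```

With a step size of 16 the coefficient 39.88 is reconstructed as 32, an error of 7.88; a larger step size allows a larger error, which is why the higher frequency coefficients are reproduced less accurately.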