
Data Compression

ANISHA M. LAL
Data Compression

 Data compression aims to reduce the amount of data required to represent a
given quantity of information while preserving as much information as possible.
Image Compression
 The goal of image compression is to reduce the
amount of data required to represent a digital image.
Types of Compression

 Lossless
 Information preserving
 Low compression ratios

 Lossy
 Not information preserving
 High compression ratios

 Trade-off: image quality vs compression ratio


Compression Ratio

 Compression ratio: C = n1 / n2, where n1 is the number of information-carrying
units (e.g., bits) in the original representation and n2 is the number in the
compressed representation.
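A quick numeric illustration with hypothetical sizes (not from the slides):

# A 512x512, 8-bit grayscale image compressed to 32,768 bytes:
n1 = 512 * 512 * 1          # bytes in the original representation
n2 = 32_768                 # bytes in the compressed representation
ratio = n1 / n2             # compression ratio C = n1 / n2
print(ratio)                # 8.0, i.e. an 8:1 compression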
Data Redundancy

 Interpixel Redundancy
 Coding Redundancy
 Psychovisual Redundancy

 Compression attempts to reduce one or more of these redundancy types.
Redundancy Types

 Inter-pixel Redundancy: redundancy arising from statistical dependencies
among pixels, especially between neighbouring pixels.

 Coding Redundancy: an uncompressed image is usually coded with a fixed
number of bits per pixel. For example, an image with 256 gray levels is
represented by an array of 8-bit integers.

 Psycho-visual Redundancy: redundancy arising from the human eye's unequal
sensitivity to different image information. Eliminating some of the visually
less important information is therefore acceptable.
Lossless Compression Techniques

 Interpixel Redundancy
 Based upon frequency of occurrences
 Run length encoding
 Diatomic encoding
 Bit plane encoding

 Coding Redundancy
 Based upon probability of occurrences
 Huffman encoding
 Arithmetic encoding
Run length Encoding (Interpixel
Redundancy)
 Encodes repeating string of symbols (i.e., runs) using a
few bytes: (symbol, count)

1 1 1 1 1 0 0 0 0 0 0 1 → (1,5) (0,6) (1,1)

a a a b b b b b b c c → (a,3) (b,6) (c,2)

 Can compress any type of data, but does not achieve compression ratios as
high as other compression methods.
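A minimal run-length encoder/decoder sketch in Python (illustrative; the function names are mine, not from the slides):

def rle_encode(seq):
    """Encode a sequence as (symbol, run_length) pairs."""
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([s, 1])       # start a new run
    return [(s, n) for s, n in runs]

def rle_decode(runs):
    """Expand (symbol, run_length) pairs back to the original sequence."""
    out = []
    for s, n in runs:
        out.extend([s] * n)
    return out

# Example from the slide: 1 1 1 1 1 0 0 0 0 0 0 1 -> (1,5) (0,6) (1,1)
print(rle_encode([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))   # [(1, 5), (0, 6), (1, 1)]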
Bit-plane Encoding (Interpixel
Redundancy)
 Process each bit plane individually:

(1) Decompose the image into a series of binary images (one per bit plane).
(2) Compress each binary image (e.g., using run-length coding).
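A sketch of the decomposition step in Python/NumPy (assuming an 8-bit grayscale image; compression of each plane is left out):

import numpy as np

def bit_planes(img):
    """Split an 8-bit grayscale image into 8 binary bit-plane images.

    img: 2-D uint8 array; returns a list of 8 binary arrays,
    bit 0 (least significant) first.
    """
    return [(img >> b) & 1 for b in range(8)]

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
planes = bit_planes(img)
# Each plane could now be compressed separately, e.g. with run-length coding.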
Huffman Encoding (Coding
Redundancy)

 A variable-length coding technique.
 Symbols are encoded one at a time.
 There is a one-to-one correspondence between source symbols and code words.
 Optimal code (i.e., minimizes the number of code symbols per source symbol).
Huffman Encoding Technique
• Forward Pass
1. Sort probabilities per symbol
2. Combine the lowest two probabilities
3. Repeat Step 2 until only two probabilities remain.
Cont…
 Backward Pass
Assign code symbols going backwards
Huffman Decoding
 Coding/decoding can be implemented using a look-up
table.
 Decoding can be done unambiguously.
Cont…

 For example, a data stream contains only the five symbols A, B, C, D, E with
the following probabilities:
 P(A) = 0.16
 P(B) = 0.51
 P(C) = 0.09
 P(D) = 0.13
 P(E) = 0.11
Cont…

Symbol   Code
A        011
B        1
C        000
D        010
E        001
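A minimal Huffman construction sketch in Python for the probabilities above (illustrative; the exact 0/1 labels can differ from the slide's table depending on tie-breaking, but the code lengths agree):

import heapq, itertools

def huffman_codes(probs):
    """Build Huffman codes for a {symbol: probability} dict.

    Repeatedly merge the two least-probable nodes; the code tree is
    represented implicitly by growing each symbol's code string.
    """
    counter = itertools.count()                       # tie-breaker for equal probabilities
    heap = [[p, next(counter), {sym: ""}] for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)               # least probable node
        p2, _, c2 = heapq.heappop(heap)               # second least probable node
        for sym in c1:
            c1[sym] = "0" + c1[sym]
        for sym in c2:
            c2[sym] = "1" + c2[sym]
        heapq.heappush(heap, [p1 + p2, next(counter), {**c1, **c2}])
    return heap[0][2]

codes = huffman_codes({"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11})
print(codes)   # B gets a 1-bit code, the other four symbols get 3-bit codes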
Arithmetic Encoding (Coding
Redundancy)

 Sequences of source symbols are encoded together (instead of one at a time).

 No one-to-one correspondence between source symbols and code words.

 Slower than Huffman coding but typically achieves better compression.
Arithmetic Coding (cont’d)

 A sequence of source symbols is assigned a single arithmetic code word which
corresponds to a sub-interval in [0, 1).

Example: α1 α2 α3 α3 α4 → [0.06752, 0.0688), coded for instance as 0.068

 We start with the interval [0, 1); as the number of symbols in the message
increases, the interval used to represent it becomes smaller.
 Smaller intervals require more information units (i.e., bits) to be represented.
Arithmetic Coding (cont’d)

Encode message: α1 α2 α3 α3 α4

1) Start with the interval [0, 1)

2) Subdivide [0, 1) according to the probabilities of the symbols αi

3) Update (narrow) the interval as each source symbol is processed
Example

Encode: α1 α2 α3 α3 α4 → [0.06752, 0.0688), or 0.068
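A sketch of the interval-narrowing computation in Python. The probabilities P(α1) = P(α2) = P(α4) = 0.2 and P(α3) = 0.4 are an assumption (they are not shown in this extract), chosen because they reproduce the interval [0.06752, 0.0688):

def arithmetic_interval(message, probs):
    """Return the [low, high) sub-interval that encodes a message.

    probs: dict mapping each symbol to its probability; cumulative
    ranges are taken in the dict's iteration order.
    """
    # Pre-compute each symbol's cumulative [start, end) range inside [0, 1).
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p

    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s, e = ranges[sym]
        low, high = low + width * s, low + width * e   # narrow the interval
    return low, high

probs = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}
print(arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], probs))
# approximately (0.06752, 0.0688); any number in this interval, e.g. 0.068, encodes the message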
Lossy Compression
 Transform the image into a domain where compression
can be performed more efficiently (i.e., reduce interpixel
redundancies).
JPEG Compression

 Accepted as an international image compression standard in 1992.
 It uses the DCT for handling interpixel redundancy.

 Modes of operation:
(1) Sequential DCT-based encoding
(2) Progressive DCT-based encoding
(3) Lossless encoding
(4) Hierarchical encoding
JPEG Compression
(Sequential DCT-based encoding)

[Block diagram: 8×8 blocks → forward DCT → quantizer → entropy encoder → compressed data;
decoding reverses the chain: entropy decoder → dequantizer → inverse DCT]
JPEG Steps
1. Divide the image into 8x8 subimages;

For each subimage do:

2. Shift the gray levels to the range [-128, 127]
- the DCT requires the range to be centered around 0

3. Apply the DCT → 64 coefficients
- 1 DC coefficient: F(0,0)
- 63 AC coefficients: F(u,v)
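A sketch of steps 2 and 3 for a single 8×8 block, using a direct NumPy implementation of the orthonormal 2-D DCT-II (real codecs use fast transforms):

import numpy as np

def dct2(block):
    """2-D DCT-II of an 8x8 block (orthonormal), computed from the definition."""
    N = block.shape[0]
    n = np.arange(N)
    # 1-D DCT-II basis matrix C, with C[u, x] = alpha(u) * cos((2x+1) u pi / 2N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C *= np.sqrt(2.0 / N)
    C[0, :] /= np.sqrt(2.0)
    return C @ block @ C.T            # separable: transform rows, then columns

block = np.random.randint(0, 256, (8, 8)).astype(float)
shifted = block - 128                  # step 2: centre the data around 0
coeffs = dct2(shifted)                 # step 3: 64 coefficients
dc = coeffs[0, 0]                      # the DC coefficient F(0,0)
ac = coeffs.reshape(-1)[1:]            # the 63 AC coefficients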
Example

[Figure: an 8×8 sub-image shifted to [-128, 127] and its DCT coefficients (non-centered spectrum)]
JPEG Steps

4. Quantize the coefficients (i.e., reduce the amplitude of coefficients that do
not contribute much):

Fq(u,v) = round( F(u,v) / Q(u,v) )

Q(u,v): quantization table
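A sketch of step 4 in Python. The table below is the widely published baseline luminance quantization table; any valid Q(u,v) table could be substituted, and F stands for the 8×8 block of DCT coefficients from step 3:

import numpy as np

# Baseline luminance quantization table (values as commonly published for JPEG).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F, Q):
    """Step 4: divide each DCT coefficient by its table entry and round."""
    return np.round(F / Q).astype(int)

def dequantize(Fq, Q):
    """Decoder side: multiply back; the rounding error is where information is lost."""
    return Fq * Q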


Example

[Figure: the quantization table Q[i][j] and the quantized DCT coefficients of the example block]
JPEG Steps (cont’d)
5. Order the coefficients using zig-zag ordering
- places low-frequency coefficients (which are typically non-zero) first
- creates long runs of zeros (i.e., ideal for run-length encoding)

[Figure: zig-zag scan order over the 8×8 coefficient block]
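One way to generate and apply the zig-zag scan in Python (a sketch; the traversal direction alternates along each anti-diagonal):

import numpy as np

def zigzag(block):
    """Step 5: read an 8x8 coefficient block in zig-zag order (low to high frequency)."""
    n = block.shape[0]
    # Group indices by anti-diagonal u+v; reverse the direction on every other diagonal.
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return np.array([block[u, v] for u, v in order])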
JPEG Steps (cont’d)

6. Encode coefficients:

6.1 Form “intermediate” symbol sequence

6.2 DC coefficients: predictive encoding

6.3 AC coefficients: variable length coding


DC Coefficients Encoding

The DC coefficient is represented by two intermediate symbols:

symbol_1 = (SIZE): the number of bits needed to encode the amplitude
symbol_2 = (AMPLITUDE): the value itself

Predictive coding: instead of the DC value itself, encode the difference from
the DC coefficient of the previous block.
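A sketch of the DC predictive step in Python (the initial predictor is taken as 0 here, and the subsequent entropy coding of the (SIZE, AMPLITUDE) pairs is omitted):

def dc_intermediate_symbols(dc_values):
    """Encode each block's DC coefficient as the difference from the previous
    block's DC, then as a (SIZE, AMPLITUDE) pair.  SIZE is the number of bits
    needed for the magnitude of the difference."""
    symbols, prev = [], 0
    for dc in dc_values:
        diff = dc - prev
        prev = dc
        size = 0 if diff == 0 else abs(diff).bit_length()
        symbols.append((size, diff))          # symbol_1 = SIZE, symbol_2 = AMPLITUDE
    return symbols

# e.g. DC values 15, 12, 18 -> differences 15, -3, 6 -> [(4, 15), (2, -3), (3, 6)]
print(dc_intermediate_symbols([15, 12, 18]))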
