
Image Compression

CS474/674 – Prof. Bebis


Chapter 8 (except Sections 8.10-8.12)
Image Compression
• Reduce the amount of data while preserving as much
information as possible!
– Lower memory requirements.
– Faster transmission rates.
Data ≠ Information

• Data and information are not synonymous terms!

• Data is the means by which information is conveyed.

• The same information can be represented by different amounts of data!
Data ≠ Information - Example

Ex1: Your wife, Helen, will meet you at Logan Airport in Boston at 5 minutes past 6:00 pm tomorrow night.

Ex2: Your wife will meet you at Logan Airport at 5 minutes past 6:00 pm tomorrow night.

Ex3: Helen will meet you at Logan at 6:00 pm tomorrow night.
Image Compression (cont’d)

• Lossless
– No information loss
– Low compression ratios

• Lossy
– Information loss
– High compression ratios

Trade-off: information loss vs compression ratio


Compression Ratio

Compression ratio:  C = n1 / n2

where n1 is the amount of data in the original representation and n2 is the amount of data in the compressed representation.

Relative Data Redundancy

R_D = 1 - 1/C

Example: if C = 10 (i.e., 10:1 compression), then R_D = 0.9, i.e., 90% of the data in the original representation is redundant.
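A minimal Python sketch of these two definitions (the sizes below are illustrative):

```python
# Sketch: compression ratio C = n1/n2 and relative data redundancy RD = 1 - 1/C.
def compression_ratio(n1_bits, n2_bits):
    # n1_bits: size of the original representation, n2_bits: size of the compressed one
    return n1_bits / n2_bits

def relative_redundancy(C):
    return 1.0 - 1.0 / C

C = compression_ratio(80_000, 8_000)   # illustrative sizes: 10:1 compression
print(C, relative_redundancy(C))       # 10.0 0.9  (90% of the data is redundant)
```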
Types of Data Redundancy

(1) Coding Redundancy


(2) Interpixel Redundancy
(or Spatial Redundancy)
(3) Psychovisual Redundancy
(or Irrelevant Information Redundancy)

• The goal of data compression is to reduce one or more of these types of redundancy.
Coding Redundancy
• A code is a system of rules for representing data (e.g., image
pixels) in some alternative form for efficient storage,
transmission, secrecy etc.
– A code consists of a list of
symbols (e.g., letters, numbers).
– A code word is a sequence of
symbols used to represent the
data (e.g., pixel values).
– The length of a code word is the
number of symbols in the code
word
– Length could be fixed or variable.
Coding Redundancy (cont’d)
• Coding redundancy results from employing inefficient
coding schemes.

• How do we compare different coding schemes?

– Compute the average number of symbols Lavg per code word (data content).

– Coding schemes with lower Lavg are more efficient (i.e., require less memory).
Computing Lavg for Images
N x M image (symbols: 0/1 bits)

rk: k-th gray level
l(rk): # of bits for representing rk
P(rk): probability of rk

Recall the expected value: E(X) = Σ_x x P(X = x)

Lavg = E( l(rk) ) = Σ_{k=0}^{L-1} l(rk) P( l(rk) ) = Σ_{k=0}^{L-1} l(rk) P(rk)   bits/pixel

Average image size: N M Lavg bits
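A small Python sketch of this computation; the probabilities and code lengths below are assumed for illustration (an 8-level example chosen to be consistent with the 3 and 2.7 bits/pixel cases on the next slides):

```python
# Sketch: Lavg = sum_k l(r_k) * P(r_k); the values below are assumed, not from the slides.
def average_code_length(lengths, probs):
    assert abs(sum(probs) - 1.0) < 1e-9
    return sum(l * p for l, p in zip(lengths, probs))

probs        = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # assumed P(r_k), L = 8
fixed_len    = [3] * 8                                            # fixed-length code
variable_len = [2, 2, 2, 3, 4, 5, 6, 6]                           # assumed variable-length code

print(average_code_length(fixed_len, probs))      # 3.0 bits/pixel
print(average_code_length(variable_len, probs))   # 2.7 bits/pixel
```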


Coding Redundancy - Example

• Case 1: l(rk) = fixed length

L = 8 gray levels, fixed-length code: l(rk) = 3 bits for all k

Lavg = 3 bits/pixel

Average image size: 3NM bits


Coding Redundancy – Example (cont’d)
• Case 2: l(rk) = variable length

L = 8 gray levels, variable-length code

Lavg = 2.7 bits/pixel

Average image size: 2.7NM bits


Interpixel redundancy
• Interpixel redundancy results from pixel correlations (i.e.,
a pixel value can be reasonably predicted by its neighbors).

(figure: histograms and auto-correlation of two images)

Correlation:  f(x) ∘ g(x) = ∫ f(a) g(x + a) da

Auto-correlation: f(x) = g(x)
Interpixel redundancy (cont’d)
• Interpixel redundancy is typically addressed by applying
some transformation to the data first.

Example: Pixel values at line #100

Grayscale  →  (thresholding transformation)  →  Binary

11……………0000……………………..11…..000…..
Original: 1024 bytes
Thresholded: 1024 bits
Psychovisual redundancy
• The human eye is more sensitive to lower frequencies than to
higher frequencies in the visual spectrum.
• Discard data that is perceptually insignificant.

(figure: 256 gray levels | 16 gray levels | random noise + 16 gray levels)

Example: represent pixels using fewer bits!

Add a small random number to each pixel prior to quantization.

CR = 8/4 = 2:1
Data ≠ Information (revisited)

Goal: reduce the amount of data while preserving as much information as possible!

Question: What is the minimum amount of data that preserves the information content of an image?

We need some measure of information!


How do we measure information?
• We assume that information is generated by some
probabilistic process.
• Idea: associate information with probability:
– Events with high probability contain less information.
– Events with low probability contain more information.

• A random event E with probability P(E) contains

  I(E) = log(1/P(E)) = -log P(E)  units of information.

If P(E)=1, then I(E)=0! (no information)


How much information does a pixel contain?

• We assume that pixel values are generated by some


random process.

• How much information does a pixel value rk contain?

  I(rk) = -log P(rk)  units of information!

(assuming statistically independent random events)


How much information does an image contain?

• The average information content of an image is:

  E = Σ_{k=0}^{L-1} I(rk) P(rk)

using I(rk) = -log P(rk):

  H = - Σ_{k=0}^{L-1} P(rk) log P(rk)   units of info / pixel

Entropy: H (e.g., bits/pixel when using log base 2)
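A minimal sketch of computing a first-order entropy estimate from an image histogram (log base 2, so the result is in bits/pixel; the toy image is illustrative):

```python
import numpy as np

# Sketch: first-order entropy estimate H = -sum_k P(r_k) * log2 P(r_k), in bits/pixel.
def entropy(image):
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                       # skip gray levels that never occur
    return -np.sum(p * np.log2(p))

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # toy image
print(entropy(img))                    # close to 8 bits/pixel for uniform random pixels
```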
Entropy – Example

H=1.6614 bits/pixel H=8 bits/pixel H=1.566 bits/pixel

The amount of entropy, and thus information, in an image is far from intuitive!
Data Redundancy
• Data redundancy can be computed by comparing data to
information:
  R = Lavg - H
     (data)  (information)

Do not confuse R with RD (relative data redundancy).

where: Lavg is the average code word length and H is the entropy (both in bits/pixel).

Note: if Lavg = H, then R = 0 (no data redundancy)


Data Redundancy - Example

Lavg = 8 bits/pixel, H = 1.81 bits/pixel

R = Lavg - H = 8 - 1.81 = 6.19 bits/pixel


Entropy Estimation
• Estimating H reliably is not easy!

First-order estimate of H: use pixel frequencies.

Second-order estimate of H: use pixel block frequencies.

Which entropy estimate is more reliable?
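A minimal sketch of a second-order estimate, assuming horizontal pixel pairs and reporting the result per pixel (the toy image is illustrative):

```python
import numpy as np
from collections import Counter

# Sketch: second-order entropy estimate from horizontal pixel-pair frequencies,
# divided by 2 so it is expressed in bits/pixel.
def second_order_entropy(image):
    pairs = Counter(zip(image[:, :-1].ravel().tolist(),
                        image[:, 1:].ravel().tolist()))
    counts = np.array(list(pairs.values()), dtype=np.float64)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / 2.0

img = np.repeat(np.arange(256, dtype=np.uint8)[None, :], 64, axis=0)  # smooth toy image
print(second_order_entropy(img))   # much lower than the first-order estimate (8 bits/pixel)
```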


Entropy Estimation (cont’d)

• In general, differences between first-order and higher-order entropy
  estimates indicate the presence of interpixel redundancy.

• As mentioned earlier, interpixel redundancy can be addressed by
  applying some transformation to the data.
– Let’s look at a simple example!
Entropy Estimation - Example
• Consider a transformation that simply subtracts column i-1
from column i:
original image  →  (transformation)  →  difference image

max value: 243 (8 bits/pixel)           max value: 74 (7 bits/pixel)
• No information has been lost – why?
– Add column i to column i+1
Estimating Entropy – Example (cont’d)
• Could a better transformation be found?
• What is the entropy of the difference image?

Entropy of the difference image: 1.41 bits/pixel

Less than the 1st order entropy of the original image (1.41 < 1.81 bits/pixel)

• It is possible that a better transformation can be found, since the
  2nd order entropy estimate is even lower.
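A sketch of this idea: a column-difference transform and its effect on the first-order entropy estimate (the toy image below is illustrative):

```python
import numpy as np

# Sketch: column-difference transform and its effect on the first-order entropy.
def first_order_entropy(values):
    _, counts = np.unique(values.ravel(), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def column_difference(f):
    g = f.astype(np.int16)
    g[:, 1:] = g[:, 1:] - g[:, :-1]        # column 0 is kept as-is
    return g

def reconstruct(g):
    return np.cumsum(g, axis=1).astype(np.uint8)   # cumulative sum undoes the differences

f = np.tile(np.arange(0, 256, 2, dtype=np.uint8), (128, 1))   # toy image with smooth rows
g = column_difference(f)
assert np.array_equal(reconstruct(g), f)                       # no information has been lost
print(first_order_entropy(f), first_order_entropy(g))          # entropy drops after the transform
```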
General Image Compression and
Transmission Model

We will focus on the Source Encoder/Decoder only.


Encoder – Three Main Components

• Mapper: applies a transformation to the data to account for interpixel redundancies.
Encoder (cont’d)

• Quantizer: quantizes the data to account for psychovisual redundancies.
Encoder (cont’d)

• Symbol encoder: encodes the data to account for coding redundancies.
Decoder - Three Main Components

• The decoder applies the same steps in inverse order.

• Note: Quantization is irreversible in general!


Fidelity Criteria

• How close is the reconstructed image to the original image?

• Criteria
– Subjective: based on human observers.
– Objective: based on mathematically defined criteria.
Subjective Fidelity Criteria
Objective Fidelity Criteria

• Root mean square error (RMS)

• Signal-to-noise ratio (SNR)
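A sketch using the standard definitions, e_rms = sqrt( (1/MN) Σ Σ [f^(x,y) - f(x,y)]^2 ) and the mean-square SNR = Σ Σ f^(x,y)^2 / Σ Σ [f^(x,y) - f(x,y)]^2, where f^ is the reconstruction:

```python
import numpy as np

# Sketch: objective fidelity criteria between the original f and the reconstruction f_hat.
def rms_error(f, f_hat):
    diff = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def snr_ms(f, f_hat):
    f_hat = f_hat.astype(np.float64)
    noise = f_hat - f.astype(np.float64)
    return np.sum(f_hat ** 2) / np.sum(noise ** 2)   # mean-square signal-to-noise ratio

f = np.random.randint(0, 256, (8, 8))
f_hat = f + np.random.randint(-2, 3, (8, 8))          # toy reconstruction with small error
print(rms_error(f, f_hat), snr_ms(f, f_hat))
```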


Lossless Compression
Taxonomy of Lossless Methods

(figure: taxonomy of lossless methods, including run-length encoding)

(see “Image Compression Techniques” paper)


Huffman Coding
(addresses coding redundancy)
• A variable-length coding technique.

• Source symbols (e.g., gray levels) are encoded one at a time.


• One-to-one correspondence:
source symbols ↔ code words
• Optimal: minimizes code word length per source symbol.
Huffman Coding (cont’d)
• Forward Pass
1. Sort probabilities per symbol (e.g., gray-levels)
2. Combine the lowest two probabilities
3. Repeat Step 2 until only two probabilities remain.
Huffman Coding (cont’d)

• Backward Pass
Assign code symbols going backwards
Huffman Coding (cont’d)
• Lavg assuming binary coding:

• Lavg assuming Huffman coding:


Huffman Decoding

• Decoding can be performed unambiguously using a look-up table.
• Scan symbols one at a time until you find a match,
then repeat the process.
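A minimal sketch of both steps, along the lines described above: build the code by repeatedly combining the two lowest probabilities, then decode with a look-up table. The symbol probabilities below are illustrative:

```python
import heapq

# Sketch: build a Huffman code from symbol probabilities and decode with it.
def huffman_code(probs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, code):
    lookup = {w: s for s, w in code.items()}   # code words are prefix-free
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:                      # scan until a code word matches
            out.append(lookup[buf])
            buf = ""
    return out

probs = {"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.06, "f": 0.04}  # assumed
code = huffman_code(probs)
msg = list("abacab")
bits = "".join(code[s] for s in msg)
assert decode(bits, code) == msg
```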
Arithmetic (or Range) Coding
(addresses coding redundancy)

• Huffman coding encodes source symbols one at a time, which might not be efficient in general.
• Arithmetic coding assigns sequences of source symbols to
variable length code words.

• No one-to-one correspondence:
(source symbols ↔ code words)

• Slower than Huffman coding but can achieve higher compression.


Arithmetic Coding – Main Idea
• Maps a sequence of symbols to a real number (arithmetic
code) in the interval [0, 1).
α1 α2 α3 α3 α4

• The mapping is built incrementally (i.e., scanning source symbols in sequence) and depends on the source symbol probabilities.
Arithmetic Coding – Main Idea (cont’d)
Symbol sequence: α1 α2 α3 α3 α4
known probabilities P(αi)
– Start with the interval [0, 1).

– A sub-interval of [0,1) is chosen to encode the first symbol α1 in the
  sequence (based on P(α1)).

– A sub-interval within the previous sub-interval is chosen to encode the
  next symbol α2 in the sequence (based on P(α2)).

– Eventually, the whole symbol sequence is encoded by choosing some
  number within the final sub-interval.
Arithmetic Coding - Example

Subdivide [0,1) based on P(αi):
  α1: [0, 0.2)    α2: [0.2, 0.4)    α3: [0.4, 0.8)    α4: [0.8, 1.0)

Encode α1 α2 α3 α3 α4 by subdividing the current sub-interval for each
successive symbol.

final sub-interval: [0.06752, 0.0688)

arithmetic code: 0.068
(can choose any number within the final sub-interval)
Warning: finite precision arithmetic might cause problems due to truncations!
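A minimal sketch of the interval-narrowing step with floating-point arithmetic (fine for a short sequence like this one; real coders use integer, renormalized arithmetic). The probabilities 0.2, 0.2, 0.4, 0.2 are the ones implied by the interval boundaries in this example:

```python
# Sketch: arithmetic-encoding interval narrowing with floating-point arithmetic.
def arithmetic_interval(sequence, probs):
    # Cumulative sub-intervals of [0, 1) for each symbol.
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = (c, c + p)
        c += p
    low, high = 0.0, 1.0
    for s in sequence:
        lo_s, hi_s = cum[s]
        span = high - low
        low, high = low + span * lo_s, low + span * hi_s
    return low, high   # any number in [low, high) encodes the sequence

probs = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}   # probabilities implied by the example
low, high = arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], probs)
print(low, high)       # approximately [0.06752, 0.0688)
```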


Arithmetic Coding - Example (cont’d)

• The arithmetic code 0.068 can be encoded using Binary Fractions:
  0.068 ≈ 0.000100011 (9 bits)

• Huffman Code for α1 α2 α3 α3 α4:
  0100011001 (10 bits)

• Fixed Binary Code: 5 x 8 bits/symbol = 40 bits
Arithmetic Decoding - Example
Subdivide [0,1) based on P(αi); at each step keep subdividing the
sub-interval that contains the code:

              1.0      0.8      0.72     0.592    0.5728
        α4
              0.8      0.72     0.688    0.5856   0.57152
        α3
              0.4      0.56     0.624    0.5728   0.56896
        α2
              0.2      0.48     0.592    0.5664   0.56768
        α1
              0.0      0.4      0.56     0.56     0.5664

Decode 0.572  →  α3 α3 α1 α2 α4

A special EOF symbol can be used to terminate the iterations.
LZW Coding
(addresses interpixel redundancy)

• Requires no prior knowledge of symbol probabilities.

• Assigns sequences of source symbols to fixed-length code words.

• No one-to-one correspondence:

(source symbols ↔ code words)


• Included in GIF, TIFF and PDF file formats
LZW Coding
• LZW builds a codebook (or dictionary) of symbol
sequences (i.e., gray-level sequences) as it processes the
image pixels.
• Each symbol sequence is encoded by its dictionary
location.
Dictionary
  Location    Entry
  0           …
  1           …
  …           …
  240         10-120-51
  …           …
  511         -

Each dictionary location can be encoded by 9 bits in this example.
Therefore, the sequence of gray-levels 10-120-51 will be encoded by
9 bits instead of 3 bytes!
LZW Coding (cont’d)
Dictionary Initialization

First 256 entries are assigned to the gray levels 0,1,2,…,255

  39  39  126  126
  39  39  126  126
  39  39  126  126
  39  39  126  126

Dictionary
  Location    Entry
  0           0
  1           1
  …           …
  255         255
  256         -
  …           …
  511         -

As the encoder examines the image pixels, gray-level sequences that are
not in the dictionary are added to the dictionary.
LZW Coding (cont’d)
Dictionary Initialization

First 256 entries are assigned to the gray levels 0,1,2,…,255

  39  39  126  126
  39  39  126  126
  39  39  126  126
  39  39  126  126

- Is 39 in the dictionary? ............ Yes
- What about 39-39? .................. No
  * Add 39-39 at location 256

Dictionary
  Location    Entry
  0           0
  1           1
  …           …
  255         255
  256         39-39
  …           …
  511         -

So, 39-39 will be encoded by 256.
LZW Coding (cont’d)
39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126
Can be implemented efficiently using a queue:
1. Scan the next symbol and append it to the queue (the current sequence).
2. Check if the sequence in the queue exists in the dictionary.
3. If it does, go to step 1 (keep extending the sequence); otherwise,
   output the dictionary code for the longest matching prefix, add the
   new sequence to the dictionary, and restart the queue with the last
   symbol scanned.

Encoded size for the 4x4 example: 10 x 9 bits/symbol = 90 bits vs 16 x 8 bits/symbol = 128 bits
(see the sketch below)
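A minimal LZW encoder sketch along these lines (512-entry dictionary, entries 0-255 pre-assigned to the gray levels):

```python
# Sketch: LZW encoding of a gray-level sequence with a 512-entry dictionary.
def lzw_encode(pixels, max_entries=512):
    dictionary = {(g,): g for g in range(256)}   # entries 0-255: the gray levels themselves
    next_code = 256
    current = ()
    out = []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                  # keep extending the sequence
        else:
            out.append(dictionary[current])      # emit the code of the longest match
            if next_code < max_entries:          # add the new sequence to the dictionary
                dictionary[candidate] = next_code
                next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])
    return out

pixels = [39, 39, 126, 126] * 4                  # the 4x4 example from the slides
codes = lzw_encode(pixels)
print(codes, len(codes))                         # 10 codes x 9 bits = 90 bits
```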
Decoding LZW

• Decoding can be done using the dictionary again.


• For image transmission, there is no need to transmit the dictionary
  for decoding.
• The dictionary can be built on the fly by the decoder as it reads
  the received code words.
Run-length coding (RLC)
(addresses interpixel redundancy)
• Represent sequences of repeating symbols (a “run”) using a
compact representation (symbol, count) :
(i) symbol: the symbol itself
(ii) count: the number of times the symbol repeats

111110000001  →  (1, 5) (0, 6) (1, 1)
aaabbbbbbcc   →  (a, 3) (b, 6) (c, 2)
• Each pair (symbol, count) can be thought of as a "new" symbol which can
  be encoded using, for example, Huffman coding.
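A small sketch of this (symbol, count) encoding:

```python
# Sketch: run-length encoding into (symbol, count) pairs.
def run_length_encode(sequence):
    runs = []
    for s in sequence:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([s, 1])       # start a new run
    return [(s, n) for s, n in runs]

print(run_length_encode("111110000001"))   # [('1', 5), ('0', 6), ('1', 1)]
print(run_length_encode("aaabbbbbbcc"))    # [('a', 3), ('b', 6), ('c', 2)]
```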
Bit-plane coding
(addresses interpixel redundancy)

• Decompose an image into a series of bit planes (i.e., 8 bit planes for PGM images).

• Compress each bit plane separately (e.g., using RLC)
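A small sketch of the bit-plane decomposition (and its inverse) for an 8-bit image:

```python
import numpy as np

# Sketch: decompose an 8-bit image into its 8 bit planes and reassemble it.
def bit_planes(image):
    return [(image >> k) & 1 for k in range(8)]           # plane 0 = LSB, plane 7 = MSB

def reassemble(planes):
    return sum((p.astype(np.uint8) << k) for k, p in enumerate(planes))

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)   # toy image
planes = bit_planes(img)
assert np.array_equal(reassemble(planes), img)
# Each binary plane can now be compressed separately, e.g. with run-length coding.
```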


Lossy Methods - Taxonomy

(see “Image Compression Techniques” paper)


Lossy Compression – Transform Coding
• Transform the image into some other domain to address
interpixel redundancy.

Quantization is irreversible in general!


Example: DFT

Note that |F(u,v)| decreases as u, v increase!

Idea: Approximate f(x,y) using fewer terms (i.e., the largest F(u,v)
coefficients), keeping only a K x K block of coefficients, K << N:

  f(x,y) ≈ Σ_{u=0}^{K-1} Σ_{v=0}^{K-1} F(u,v) exp( j2π(ux + vy)/N )
What transformations can be used?

• Various transformations T(u,v) are possible, for example:


– DFT
– DCT (Discrete Cosine Transform)
– KLT (Karhunen-Loeve Transformation)
– PCA (Principal Component Analysis)

• JPEG uses DCT – let’s see why!


DCT (Discrete Cosine Transform)

Forward:
  C(u,v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

Inverse:
  f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} α(u) α(v) C(u,v) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

where:
  α(u) = sqrt(1/N) if u = 0,   sqrt(2/N) if u > 0   (and similarly for α(v))
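A direct (slow) Python sketch of these formulas, useful for checking them on a small block:

```python
import numpy as np

# Sketch: 2D DCT-II and its inverse, written directly from the formulas above.
# O(N^4), fine for 8x8 blocks; real codecs use fast separable transforms.
def alpha(u, N):
    return np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)

def dct2(f):
    N = f.shape[0]
    C = np.zeros((N, N))
    x = np.arange(N)
    for u in range(N):
        for v in range(N):
            basis = np.outer(np.cos((2 * x + 1) * u * np.pi / (2 * N)),
                             np.cos((2 * x + 1) * v * np.pi / (2 * N)))
            C[u, v] = alpha(u, N) * alpha(v, N) * np.sum(f * basis)
    return C

def idct2(C):
    N = C.shape[0]
    f = np.zeros((N, N))
    x = np.arange(N)
    for u in range(N):
        for v in range(N):
            basis = np.outer(np.cos((2 * x + 1) * u * np.pi / (2 * N)),
                             np.cos((2 * x + 1) * v * np.pi / (2 * N)))
            f += alpha(u, N) * alpha(v, N) * C[u, v] * basis
    return f

block = np.random.rand(8, 8)
assert np.allclose(idct2(dct2(block)), block)   # the transform pair is invertible
```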
DCT – Basis Functions

• Basis functions for a 4x4 image (i.e., cosines of different frequencies).
Compare DCT with other transformations
(figure: reconstructions using the DFT, WHT and DCT)

Image is divided into 8 x 8 sub-images (64 coefficients per sub-image).
Sub-images were reconstructed by truncating 50% of the smallest coefficients.

Reconstruction error (RMS):   DFT: 2.32    WHT: 1.78    DCT: 1.13


DCT - Sub-image size selection

Performed experiments using a large number of random images.

Reconstructions (75% truncation of coefficients)


original 2 x 2 sub-images 4 x 4 sub-images 8 x 8 sub-images
JPEG Compression

(figure: JPEG encoder and decoder block diagrams, including the entropy
encoder and entropy decoder stages)

Became an international image compression standard in 1992.
JPEG - Steps

1. Divide image into 8x8 sub-images.

For each sub-image do:


2. Shift the gray-levels to the range [-128, 127]
   (i.e., reduces the dynamic range requirements of the DCT)

3. Apply DCT; yields 64 coefficients:
   1 DC coefficient: F(0,0)
   63 AC coefficients: F(u,v)
Example

(figure: an 8x8 block shifted to [-128, 127] and its DCT spectrum)

Note: the low frequency components are around the upper-left corner of
the spectrum (not centered!).
JPEG Steps (cont’d)

4. Quantize the coefficients (i.e., reduce the amplitude of coefficients
   that do not contribute a lot):

   Cq(u,v) = round( C(u,v) / Q(u,v) )

   Q(u,v): quantization array
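A sketch of the quantize/de-quantize step; the Q array below is a made-up example of the form 1 + (1 + i + j) * quality, not the standard JPEG luminance table:

```python
import numpy as np

# Sketch: quantize / de-quantize an 8x8 block of DCT coefficients.
def make_q(quality=2):
    i, j = np.indices((8, 8))
    return 1 + (1 + i + j) * quality          # assumed, illustrative quantization array

def quantize(C, Q):
    return np.round(C / Q).astype(np.int32)   # small-magnitude coefficients become zero

def dequantize(Cq, Q):
    return Cq * Q                             # only approximates C: quantization is irreversible

Q = make_q(quality=2)
C = np.random.randn(8, 8) * 50                # toy DCT coefficients
Cq = quantize(C, Q)
print(np.count_nonzero(Cq == 0))
C_hat = dequantize(Cq, Q)
```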


Computing Q[i][j] - Example

• Quantization Array Q[i][j]


Example (cont’d)

(figure: C(u,v)  →  quantization with Q(u,v)  →  Cq(u,v))

Small magnitude coefficients have been truncated to zero!

"quality" controls how many of them will be truncated!
JPEG Steps (cont’d)
5. Order the coefficients using zig-zag ordering

Creates long runs of zeros (i.e., ideal for RLC)
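A small sketch that generates the zig-zag scan order for an 8x8 block:

```python
# Sketch: zig-zag scan order (traverse anti-diagonals, alternating direction).
def zigzag_indices(n=8):
    order = []
    for s in range(2 * n - 1):                     # s = i + j indexes each anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    return [block[i][j] for i, j in zigzag_indices(len(block))]

print(zigzag_indices()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```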


JPEG Steps (cont’d)

6. Encode coefficients:

6.1 Form “intermediate” symbol sequence.

6.2 Encode the "intermediate" symbol sequence into a binary sequence
    using Huffman coding.

Note: the DC coefficient is encoded differently from the AC coefficients.
Intermediate Symbol Sequence – DC coefficient

symbol_1 (SIZE) symbol_2 (AMPLITUDE)


Example: (6) (61)

SIZE: # bits needed to encode the amplitude
Amplitude Encoding of DC coefficient

symbol_1 symbol_2
(SIZE) (AMPLITUDE)

We use predictive coding:

The DC coefficient of every block other than the first is replaced by
the difference between the DC coefficient of the current block and
that of the previous block.
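A small sketch of this differential coding of the DC coefficients; the values below are illustrative:

```python
# Sketch: differential (predictive) coding of the DC coefficients of successive blocks.
def dc_differences(dc_values):
    diffs = [dc_values[0]]                                   # first DC coefficient sent as-is
    diffs += [cur - prev for prev, cur in zip(dc_values, dc_values[1:])]
    return diffs

def dc_reconstruct(diffs):
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)                              # undo the differencing
    return out

dc = [1260, 1255, 1248, 1250]          # toy DC coefficients of consecutive blocks
assert dc_reconstruct(dc_differences(dc)) == dc
```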
Intermediate Symbol Sequence – AC coefficient

symbol_1 (RUN-LENGTH, SIZE) symbol_2 (AMPLITUDE) end of block

RUN-LENGTH: run of zeros preceding the coefficient
SIZE: # bits for encoding the amplitude of the coefficient

Note: If RUN-LENGTH > 15, use the symbol (15,0), which stands for a run of 16 zeros, e.g.:
a run of 19 zeros followed by 2  →  (15, 0) (3, 2) (2)
AC Coefficient Encoding
Symbol_1: Variable Length Code (VLC), pre-computed Huffman codes
Symbol_2: Variable Length Integer (VLI), pre-computed codes
(table: VLC and VLI codes with their lengths in # bits)

Idea: smaller (and more common) values are assigned fewer bits and take
up less space than larger (and less common) values.

DC coefficients are encoded similarly.

(1,4) (12)  →  (111110110 1100)
                 VLC       VLI
Final Symbol Sequence
Effect of “Quality” parameter

(58k bytes) (21k bytes) (8k bytes)

lower compression higher compression


Effect of Quantization:
homogeneous 8 x 8 block
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Quantized coefficients          De-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Reconstructed

Reconstruction error is low!

Original
Effect of Quantization:
non-homogeneous 8 x 8 block
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Quantized coefficients          De-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Reconstructed

Reconstruction error is high!

Original
Case Study: Fingerprint Compression

• FBI is digitizing fingerprints at 500 dots per inch using 8 bits of
  grayscale resolution.
• A single fingerprint card (contains fingerprints from all 10 fingers)
  turns into about 10 MB of data.

A sample fingerprint image: 768 x 768 pixels = 589,824 bytes
Need to Preserve Fingerprint Details

The "white" spots in the middle of


the black ridges are sweat pores
which are admissible points of
identification in court.

These details are just a couple


pixels wide!
What compression scheme should be used?

• Lossless or lossy compression?

• In practice, lossless compression methods haven't done better than 2:1
  on fingerprints!

• Does JPEG work well for fingerprint compression?


Results using JPEG compression
file size 45853 bytes
compression ratio: 12.9

Fine details have been lost.

Image has an artificial "blocky" pattern superimposed on it.

Artifacts will affect the performance of fingerprint recognition.
WSQ Fingerprint Compression

• An image coding standard for digitized fingerprints employing the
  Discrete Wavelet Transform (Wavelet/Scalar Quantization or WSQ).

• Developed and maintained by:
  – FBI
  – Los Alamos National Lab (LANL)
  – National Institute of Standards and Technology (NIST)
Results using WSQ compression
file size 45621 bytes
compression ratio: 12.9

Fine details are better preserved.

No “blocky” artifacts.
WSQ Algorithm

Target bit rate can be set via a parameter, similar to the "quality"
parameter in JPEG.
Compression ratio

• FBI’s target bit rate is around 0.75 bits per pixel (bpp)

• This corresponds to a compression ratio of 8/0.75 = 10.7

• Let’s compare WSQ with JPEG …


Varying compression ratio (cont’d)
0.9 bpp compression
WSQ image, file size 47619 bytes, JPEG image, file size 49658 bytes,
compression ratio 12.4 compression ratio 11.9
Varying compression ratio (cont’d)
0.75 bpp compression
WSQ image, file size 39270 bytes JPEG image, file size 40780 bytes,
compression ratio 15.0 compression ratio 14.5
Varying compression ratio (cont’d)
0.6 bpp compression
WSQ image, file size 30987 bytes, JPEG image, file size 30081 bytes,
compression ratio 19.0 compression ratio 19.6
JPEG Modes

• JPEG supports several different modes:


– Sequential Mode
– Progressive Mode
– Hierarchical Mode
– Lossless Mode

(see “Survey” paper)


Sequential Mode

• Image is encoded in a single scan (left-to-right, top-to-bottom);
  this is the default mode.
Progressive JPEG
• Image is encoded in multiple scans.
• Produces a quick, roughly decoded image when
transmission time is long.
Progressive JPEG (cont’d)

• Main algorithms:
(1) Progressive spectral selection algorithm
(2) Progressive successive approximation algorithm
(3) Hybrid progressive algorithm
Progressive JPEG (cont’d)

(1) Progressive spectral selection algorithm


– Group DCT coefficients into several spectral bands
– Send low-frequency DCT coefficients first
– Send higher-frequency DCT coefficients next

Example:
Progressive JPEG (cont’d)

(2) Progressive successive approximation algorithm


– Send all DCT coefficients but with lower precision.
– Refine DCT coefficients in subsequent scans.

(3) Hybrid progressive algorithm: combines spectral selection and
    successive approximation.
Example
after 0.9s after 1.6s

after 3.6s after 7.0s


Hierarchical JPEG

• Hierarchical mode encodes the image at different resolutions:
  f4 (N/4 x N/4), f2 (N/2 x N/2), f (N x N).

• Image is transmitted in multiple passes with increased resolution at
  each pass.
Hierarchical JPEG (cont’d)

(figure: image pyramid f (N x N), f2 (N/2 x N/2), f4 (N/4 x N/4),
obtained by down-sampling and reconstructed by up-sampling)
Quiz #7
• When: 12/11/2023
• What: Image Compression

• Study the problems provided on the next slides to practice for the exams.
Practice Problem 1
Practice Problem 1 (cont’d)
Practice Problem 1 (cont’d)
Practice Problem 2
Practice Problem 2 (cont’d)
Consider a 2^n x 2^n binary image:
Practice Problem 2
Practice Problem 2
Practice Problem 3
Practice Problem 3 (cont’d)
