
Image Compression

CS474/674 – Prof. Bebis


Chapter 8 (except Sections 8.10-8.12)
Image Compression
• Reduce the amount of data while preserving as much
information as possible!
– Lower memory requirements.
– Faster transmission rates.
Data ≠ Information

• Data and information are not synonymous terms!

• Data is the means by which information is conveyed.

• The same information can be represented by different amounts of data!
Data ≠ Information - Example

Ex1: Your wife, Helen, will meet you at Logan Airport in Boston at 5 minutes past 6:00 pm tomorrow night.

Ex2: Your wife will meet you at Logan Airport at 5 minutes past 6:00 pm tomorrow night.

Ex3: Helen will meet you at Logan at 6:00 pm tomorrow night.
Image Compression (cont’d)

• Lossless
– No information loss
– Low compression ratios

• Lossy
– Information loss
– High compression ratios

Trade-off: information loss vs compression ratio


Compression Ratio

Compression ratio:  C = n1 / n2

where n1 is the amount of data in the original representation and n2 is the amount of data in the compressed representation.

Relative Data Redundancy

R_D = 1 - 1/C

Example: if C = 10 (i.e., 10:1 compression), then R_D = 0.9, i.e., 90% of the data in the original representation is redundant.
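A minimal Python sketch of these two definitions (the sizes below are illustrative):

```python
# Sketch: compression ratio C = n1/n2 and relative data redundancy RD = 1 - 1/C.
def compression_ratio(n1_bits, n2_bits):
    # n1_bits: size of the original representation, n2_bits: size of the compressed one
    return n1_bits / n2_bits

def relative_redundancy(C):
    return 1.0 - 1.0 / C

C = compression_ratio(80_000, 8_000)   # illustrative sizes: 10:1 compression
print(C, relative_redundancy(C))       # 10.0 0.9  (90% of the data is redundant)
```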
Types of Data Redundancy

(1) Coding Redundancy


(2) Interpixel Redundancy
(or Spatial Redundancy)
(3) Psychovisual Redundancy
(or Irrelevant Information Redundancy)

• The goal of data compression is to reduce one or more of these types of redundancy.
Coding Redundancy
• A code is a system of rules for representing data (e.g., image
pixels) in some alternative form for efficient storage,
transmission, secrecy etc.
– A code consists of a list of
symbols (e.g., letters, numbers).
– A code word is a sequence of
symbols used to represent the
data (e.g., pixel values).
– The length of a code word is the
number of symbols in the code
word
– Length could be fixed or variable.
Coding Redundancy (cont’d)
• Coding redundancy results from employing inefficient
coding schemes.

• How do we compare different coding schemes?

– Compute the average number of symbols Lavg per code word (data content).

– Coding schemes with lower Lavg are more efficient (i.e., require less memory).
Computing Lavg for Images
N x M image (symbols: 0/1 bits)

rk: k-th gray level
l(rk): # of bits for representing rk
P(rk): probability of rk

Recall the expected value: E(X) = Σ_x x P(X = x)

Lavg = E( l(rk) ) = Σ_{k=0}^{L-1} l(rk) P( l(rk) ) = Σ_{k=0}^{L-1} l(rk) P(rk)   bits/pixel

Average image size: N M Lavg bits
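A small Python sketch of this computation; the probabilities and code lengths below are assumed for illustration (an 8-level example chosen to be consistent with the 3 and 2.7 bits/pixel cases on the next slides):

```python
# Sketch: Lavg = sum_k l(r_k) * P(r_k); the values below are assumed, not from the slides.
def average_code_length(lengths, probs):
    assert abs(sum(probs) - 1.0) < 1e-9
    return sum(l * p for l, p in zip(lengths, probs))

probs        = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # assumed P(r_k), L = 8
fixed_len    = [3] * 8                                            # fixed-length code
variable_len = [2, 2, 2, 3, 4, 5, 6, 6]                           # assumed variable-length code

print(average_code_length(fixed_len, probs))      # 3.0 bits/pixel
print(average_code_length(variable_len, probs))   # 2.7 bits/pixel
```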


Coding Redundancy - Example

• Case 1: l(rk) = fixed length

L = 8 gray levels, fixed-length code: l(rk) = 3 bits for all k

Lavg = 3 bits/pixel

Average image size: 3NM bits


Coding Redundancy – Example (cont’d)
• Case 2: l(rk) = variable length

L = 8 gray levels, variable-length code

Lavg = 2.7 bits/pixel

Average image size: 2.7NM bits


Interpixel redundancy
• Interpixel redundancy results from pixel correlations (i.e.,
a pixel value can be reasonably predicted by its neighbors).

(figure: histograms and auto-correlation of two images)

Correlation:  f(x) ∘ g(x) = ∫ f(a) g(x + a) da

Auto-correlation: f(x) = g(x)
Interpixel redundancy (cont’d)
• Interpixel redundancy is typically addressed by applying
some transformation to the data first.

Example: Pixel values at line #100

Grayscale  →  (thresholding transformation)  →  Binary

11……………0000……………………..11…..000…..
Original: 1024 bytes
Thresholded: 1024 bits
Psychovisual redundancy
• The human eye is more sensitive to lower frequencies than to
higher frequencies in the visual spectrum.
• Discard data that is perceptually insignificant.

(figure: 256 gray levels | 16 gray levels | random noise + 16 gray levels)

Example: represent pixels using fewer bits!

Add a small random number to each pixel prior to quantization.

CR = 8/4 = 2:1
Data ≠ Information (revisited)

Goal: reduce the amount of data while preserving as much information as possible!

Question: What is the minimum amount of data that preserves the information content of an image?

We need some measure of information!


How do we measure information?
• We assume that information is generated by some
probabilistic process.
• Idea: associate information with probability:
– Events with high probability contain less information.
– Events with low probability contain more information.

• A random event E with probability P(E) contains

  I(E) = log(1/P(E)) = -log P(E)  units of information.

If P(E)=1, then I(E)=0! (no information)


How much information does a pixel contain?

• We assume that pixel values are generated by some


random process.

• How much information does a pixel value rk contain?

  I(rk) = -log P(rk)  units of information!

(assuming statistically independent random events)


How much information does an image contain?

• The average information content of an image is:

  E = Σ_{k=0}^{L-1} I(rk) P(rk)

using I(rk) = -log P(rk):

  H = - Σ_{k=0}^{L-1} P(rk) log P(rk)   units of info / pixel

Entropy: H (e.g., bits/pixel when using log base 2)
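A minimal sketch of computing a first-order entropy estimate from an image histogram (log base 2, so the result is in bits/pixel; the toy image is illustrative):

```python
import numpy as np

# Sketch: first-order entropy estimate H = -sum_k P(r_k) * log2 P(r_k), in bits/pixel.
def entropy(image):
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                       # skip gray levels that never occur
    return -np.sum(p * np.log2(p))

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # toy image
print(entropy(img))                    # close to 8 bits/pixel for uniform random pixels
```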
Entropy – Example

H=1.6614 bits/pixel H=8 bits/pixel H=1.566 bits/pixel

The amount of entropy, and thus information, in an image is far from intuitive!
Data Redundancy
• Data redundancy can be computed by comparing data to
information:
  R = Lavg - H
     (data)  (information)

Do not confuse R with RD (relative data redundancy).

where: Lavg is the average code word length and H is the entropy (both in bits/pixel).

Note: if Lavg = H, then R = 0 (no data redundancy)


Data Redundancy - Example

Lavg = 8 bits/pixel, H = 1.81 bits/pixel

R = Lavg - H = 8 - 1.81 = 6.19 bits/pixel


Entropy Estimation
• Estimating H reliably is not easy!

First-order estimate of H: use pixel frequencies.

Second-order estimate of H: use pixel block frequencies.

Which entropy estimate is more reliable?
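A minimal sketch of a second-order estimate, assuming horizontal pixel pairs and reporting the result per pixel (the toy image is illustrative):

```python
import numpy as np
from collections import Counter

# Sketch: second-order entropy estimate from horizontal pixel-pair frequencies,
# divided by 2 so it is expressed in bits/pixel.
def second_order_entropy(image):
    pairs = Counter(zip(image[:, :-1].ravel().tolist(),
                        image[:, 1:].ravel().tolist()))
    counts = np.array(list(pairs.values()), dtype=np.float64)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / 2.0

img = np.repeat(np.arange(256, dtype=np.uint8)[None, :], 64, axis=0)  # smooth toy image
print(second_order_entropy(img))   # much lower than the first-order estimate (8 bits/pixel)
```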


Entropy Estimation (cont’d)

• In general, differences between first-order and higher-order entropy
  estimates indicate the presence of interpixel redundancy.

• As mentioned earlier, interpixel redundancy can be addressed by
  applying some transformation to the data.
– Let’s look at a simple example!
Entropy Estimation - Example
• Consider a transformation that simply subtracts column i-1
from column i:
original image  →  (transformation)  →  difference image

max value: 243 (8 bits/pixel)           max value: 74 (7 bits/pixel)
• No information has been lost – why?
– Add column i to column i+1
Estimating Entropy – Example (cont’d)
• Could a better transformation be found?
• What is the entropy of the difference image?

Entropy of the difference image: 1.41 bits/pixel

Less than the 1st order entropy of the original image (1.41 < 1.81 bits/pixel)

• It is possible that a better transformation can be found, since the
  2nd order entropy estimate is even lower.
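A sketch of this idea: a column-difference transform and its effect on the first-order entropy estimate (the toy image below is illustrative):

```python
import numpy as np

# Sketch: column-difference transform and its effect on the first-order entropy.
def first_order_entropy(values):
    _, counts = np.unique(values.ravel(), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def column_difference(f):
    g = f.astype(np.int16)
    g[:, 1:] = g[:, 1:] - g[:, :-1]        # column 0 is kept as-is
    return g

def reconstruct(g):
    return np.cumsum(g, axis=1).astype(np.uint8)   # cumulative sum undoes the differences

f = np.tile(np.arange(0, 256, 2, dtype=np.uint8), (128, 1))   # toy image with smooth rows
g = column_difference(f)
assert np.array_equal(reconstruct(g), f)                       # no information has been lost
print(first_order_entropy(f), first_order_entropy(g))          # entropy drops after the transform
```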
General Image Compression and
Transmission Model

We will focus on the Source Encoder/Decoder only.


Encoder – Three Main Components

• Mapper: applies a transformation to the data to account for interpixel redundancies.
Encoder (cont’d)

• Quantizer: quantizes the data to account for psychovisual redundancies.
Encoder (cont’d)

• Symbol encoder: encodes the data to account for coding redundancies.
Decoder - Three Main Components

• The decoder applies the same steps in inverse order.

• Note: Quantization is irreversible in general!


Fidelity Criteria

• How close is the reconstructed image to the original image?

• Criteria
– Subjective: based on human observers.
– Objective: based on mathematically defined criteria.
Subjective Fidelity Criteria
Objective Fidelity Criteria

• Root mean square error (RMS)

• Signal-to-noise ratio (SNR)
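A sketch using the standard definitions, e_rms = sqrt( (1/MN) Σ Σ [f^(x,y) - f(x,y)]^2 ) and the mean-square SNR = Σ Σ f^(x,y)^2 / Σ Σ [f^(x,y) - f(x,y)]^2, where f^ is the reconstruction:

```python
import numpy as np

# Sketch: objective fidelity criteria between the original f and the reconstruction f_hat.
def rms_error(f, f_hat):
    diff = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def snr_ms(f, f_hat):
    f_hat = f_hat.astype(np.float64)
    noise = f_hat - f.astype(np.float64)
    return np.sum(f_hat ** 2) / np.sum(noise ** 2)   # mean-square signal-to-noise ratio

f = np.random.randint(0, 256, (8, 8))
f_hat = f + np.random.randint(-2, 3, (8, 8))          # toy reconstruction with small error
print(rms_error(f, f_hat), snr_ms(f, f_hat))
```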


Lossless Compression
Taxonomy of Lossless Methods

(figure: taxonomy of lossless methods, including run-length encoding)

(see “Image Compression Techniques” paper)


Huffman Coding
(addresses coding redundancy)
• A variable-length coding technique.

• Source symbols (e.g., gray levels) are encoded one at a time.


• One-to-one correspondence:
source symbols ↔ code words
• Optimal: minimizes code word length per source symbol.
Huffman Coding (cont’d)
• Forward Pass
1. Sort probabilities per symbol (e.g., gray-levels)
2. Combine the lowest two probabilities
3. Repeat Step 2 until only two probabilities remain.
Huffman Coding (cont’d)

• Backward Pass
Assign code symbols going backwards
Huffman Coding (cont’d)
• Lavg assuming binary coding:

• Lavg assuming Huffman coding:


Huffman Decoding

• Decoding can be performed unambiguously using a look-up table.
• Scan symbols one at a time until you find a match,
then repeat the process.
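A minimal sketch of both steps, along the lines described above: build the code by repeatedly combining the two lowest probabilities, then decode with a look-up table. The symbol probabilities below are illustrative:

```python
import heapq

# Sketch: build a Huffman code from symbol probabilities and decode with it.
def huffman_code(probs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two lowest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, code):
    lookup = {w: s for s, w in code.items()}   # code words are prefix-free
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:                      # scan until a code word matches
            out.append(lookup[buf])
            buf = ""
    return out

probs = {"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.06, "f": 0.04}  # assumed
code = huffman_code(probs)
msg = list("abacab")
bits = "".join(code[s] for s in msg)
assert decode(bits, code) == msg
```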
Arithmetic (or Range) Coding
(addresses coding redundancy)

• Huffman coding encodes source symbols one at a time, which might not be efficient in general.
• Arithmetic coding assigns sequences of source symbols to
variable length code words.

• No one-to-one correspondence:
(source symbols ↔ code words)

• Slower than Huffman coding but can achieve higher compression.


Arithmetic Coding – Main Idea
• Maps a sequence of symbols to a real number (arithmetic
code) in the interval [0, 1).
α1 α2 α3 α3 α4

• The mapping is built incrementally (i.e., scanning source symbols in sequence) and depends on the source symbol probabilities.
Arithmetic Coding – Main Idea (cont’d)
Symbol sequence: α1 α2 α3 α3 α4
known probabilities P(αi)
– Start with the interval [0, 1).

– A sub-interval of [0,1) is chosen to encode the first symbol α1 in the
  sequence (based on P(α1)).

– A sub-interval within the previous sub-interval is chosen to encode the
  next symbol α2 in the sequence (based on P(α2)).

– Eventually, the whole symbol sequence is encoded by choosing some
  number within the final sub-interval.
Arithmetic Coding - Example

Subdivide [0,1) based on P(αi):
  α1: [0, 0.2)    α2: [0.2, 0.4)    α3: [0.4, 0.8)    α4: [0.8, 1.0)

Encode α1 α2 α3 α3 α4 by subdividing the current sub-interval for each
successive symbol.

final sub-interval: [0.06752, 0.0688)

arithmetic code: 0.068
(can choose any number within the final sub-interval)
Warning: finite precision arithmetic might cause problems due to truncations!
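A minimal sketch of the interval-narrowing step with floating-point arithmetic (fine for a short sequence like this one; real coders use integer, renormalized arithmetic). The probabilities 0.2, 0.2, 0.4, 0.2 are the ones implied by the interval boundaries in this example:

```python
# Sketch: arithmetic-encoding interval narrowing with floating-point arithmetic.
def arithmetic_interval(sequence, probs):
    # Cumulative sub-intervals of [0, 1) for each symbol.
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = (c, c + p)
        c += p
    low, high = 0.0, 1.0
    for s in sequence:
        lo_s, hi_s = cum[s]
        span = high - low
        low, high = low + span * lo_s, low + span * hi_s
    return low, high   # any number in [low, high) encodes the sequence

probs = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}   # probabilities implied by the example
low, high = arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], probs)
print(low, high)       # approximately [0.06752, 0.0688)
```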


Arithmetic Coding - Example (cont’d)

• The arithmetic code 0.068 can be encoded using Binary Fractions:
  0.068 ≈ 0.000100011 (9 bits)

• Huffman Code for α1 α2 α3 α3 α4:
  0100011001 (10 bits)

• Fixed Binary Code: 5 x 8 bits/symbol = 40 bits
Arithmetic Decoding - Example
Subdivide [0,1) based on P(αi); at each step keep subdividing the
sub-interval that contains the code:

              1.0      0.8      0.72     0.592    0.5728
        α4
              0.8      0.72     0.688    0.5856   0.57152
        α3
              0.4      0.56     0.624    0.5728   0.56896
        α2
              0.2      0.48     0.592    0.5664   0.56768
        α1
              0.0      0.4      0.56     0.56     0.5664

Decode 0.572  →  α3 α3 α1 α2 α4

A special EOF symbol can be used to terminate the iterations.
LZW Coding
(addresses interpixel redundancy)

• Requires no prior knowledge of symbol probabilities.

• Assigns sequences of source symbols to fixed-length code words.

• No one-to-one correspondence:

(source symbols ↔ code words)


• Included in GIF, TIFF and PDF file formats
LZW Coding
• LZW builds a codebook (or dictionary) of symbol
sequences (i.e., gray-level sequences) as it processes the
image pixels.
• Each symbol sequence is encoded by its dictionary
location.
Dictionary
  Location    Entry
  0           …
  1           …
  …           …
  240         10-120-51
  …           …
  511         -

Each dictionary location can be encoded by 9 bits in this example.
Therefore, the sequence of gray-levels 10-120-51 will be encoded by
9 bits instead of 3 bytes!
LZW Coding (cont’d)
Dictionary Initialization

First 256 entries are assigned to the gray levels 0,1,2,…,255

  39  39  126  126
  39  39  126  126
  39  39  126  126
  39  39  126  126

Dictionary
  Location    Entry
  0           0
  1           1
  …           …
  255         255
  256         -
  …           …
  511         -

As the encoder examines the image pixels, gray-level sequences that are
not in the dictionary are added to the dictionary.
LZW Coding (cont’d)
Dictionary Initialization

First 256 entries are assigned to the gray levels 0,1,2,…,255

  39  39  126  126
  39  39  126  126
  39  39  126  126
  39  39  126  126

- Is 39 in the dictionary? ............ Yes
- What about 39-39? .................. No
  * Add 39-39 at location 256

Dictionary
  Location    Entry
  0           0
  1           1
  …           …
  255         255
  256         39-39
  …           …
  511         -

So, 39-39 will be encoded by 256.
LZW Coding (cont’d)
39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126
Can be implemented efficiently using a queue:
1. Scan the next symbol and append it to the queue (the current sequence).
2. Check if the sequence in the queue exists in the dictionary.
3. If it does, go to step 1 (keep extending the sequence); otherwise,
   output the dictionary code for the longest matching prefix, add the
   new sequence to the dictionary, and restart the queue with the last
   symbol scanned.

Encoded size for the 4x4 example: 10 x 9 bits/symbol = 90 bits vs 16 x 8 bits/symbol = 128 bits
(see the sketch below)
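A minimal LZW encoder sketch along these lines (512-entry dictionary, entries 0-255 pre-assigned to the gray levels):

```python
# Sketch: LZW encoding of a gray-level sequence with a 512-entry dictionary.
def lzw_encode(pixels, max_entries=512):
    dictionary = {(g,): g for g in range(256)}   # entries 0-255: the gray levels themselves
    next_code = 256
    current = ()
    out = []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                  # keep extending the sequence
        else:
            out.append(dictionary[current])      # emit the code of the longest match
            if next_code < max_entries:          # add the new sequence to the dictionary
                dictionary[candidate] = next_code
                next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])
    return out

pixels = [39, 39, 126, 126] * 4                  # the 4x4 example from the slides
codes = lzw_encode(pixels)
print(codes, len(codes))                         # 10 codes x 9 bits = 90 bits
```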
Decoding LZW

• Decoding can be done using the dictionary again.


• For image transmission, there is no need to transmit the dictionary
  for decoding.
• The dictionary can be built on the fly by the decoder as it reads
  the received code words.
Run-length coding (RLC)
(addresses interpixel redundancy)
• Represent sequences of repeating symbols (a “run”) using a
compact representation (symbol, count) :
(i) symbol: the symbol itself
(ii) count: the number of times the symbol repeats

111110000001  →  (1, 5) (0, 6) (1, 1)
aaabbbbbbcc   →  (a, 3) (b, 6) (c, 2)
• Each pair (symbol, count) can be thought of as a "new" symbol which can
  be encoded using, for example, Huffman coding.
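A small sketch of this (symbol, count) encoding:

```python
# Sketch: run-length encoding into (symbol, count) pairs.
def run_length_encode(sequence):
    runs = []
    for s in sequence:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([s, 1])       # start a new run
    return [(s, n) for s, n in runs]

print(run_length_encode("111110000001"))   # [('1', 5), ('0', 6), ('1', 1)]
print(run_length_encode("aaabbbbbbcc"))    # [('a', 3), ('b', 6), ('c', 2)]
```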
Bit-plane coding
(addresses interpixel redundancy)

• Decompose an image into a series of bit planes (i.e., 8 bit planes for PGM images).

• Compress each bit plane separately (e.g., using RLC)
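A small sketch of the bit-plane decomposition (and its inverse) for an 8-bit image:

```python
import numpy as np

# Sketch: decompose an 8-bit image into its 8 bit planes and reassemble it.
def bit_planes(image):
    return [(image >> k) & 1 for k in range(8)]           # plane 0 = LSB, plane 7 = MSB

def reassemble(planes):
    return sum((p.astype(np.uint8) << k) for k, p in enumerate(planes))

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)   # toy image
planes = bit_planes(img)
assert np.array_equal(reassemble(planes), img)
# Each binary plane can now be compressed separately, e.g. with run-length coding.
```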


Lossy Methods - Taxonomy

(see “Image Compression Techniques” paper)


Lossy Compression – Transform Coding
• Transform the image into some other domain to address
interpixel redundancy.

Quantization is irreversible in general!


Example: DFT

Note that |F(u,v)| decreases as u, v increase!

Idea: Approximate f(x,y) using fewer terms (i.e., the largest F(u,v)
coefficients), keeping only a K x K block of coefficients, K << N:

  f(x,y) ≈ Σ_{u=0}^{K-1} Σ_{v=0}^{K-1} F(u,v) exp( j2π(ux + vy)/N )
What transformations can be used?

• Various transformations T(u,v) are possible, for example:


– DFT
– DCT (Discrete Cosine Transform)
– KLT (Karhunen-Loeve Transformation)
– PCA (Principal Component Analysis)

• JPEG uses DCT – let’s see why!


DCT (Discrete Cosine Transform)

Forward:
  C(u,v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

Inverse:
  f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} α(u) α(v) C(u,v) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]

where:
  α(u) = sqrt(1/N) if u = 0,   sqrt(2/N) if u > 0   (and similarly for α(v))
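A direct (slow) Python sketch of these formulas, useful for checking them on a small block:

```python
import numpy as np

# Sketch: 2D DCT-II and its inverse, written directly from the formulas above.
# O(N^4), fine for 8x8 blocks; real codecs use fast separable transforms.
def alpha(u, N):
    return np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)

def dct2(f):
    N = f.shape[0]
    C = np.zeros((N, N))
    x = np.arange(N)
    for u in range(N):
        for v in range(N):
            basis = np.outer(np.cos((2 * x + 1) * u * np.pi / (2 * N)),
                             np.cos((2 * x + 1) * v * np.pi / (2 * N)))
            C[u, v] = alpha(u, N) * alpha(v, N) * np.sum(f * basis)
    return C

def idct2(C):
    N = C.shape[0]
    f = np.zeros((N, N))
    x = np.arange(N)
    for u in range(N):
        for v in range(N):
            basis = np.outer(np.cos((2 * x + 1) * u * np.pi / (2 * N)),
                             np.cos((2 * x + 1) * v * np.pi / (2 * N)))
            f += alpha(u, N) * alpha(v, N) * C[u, v] * basis
    return f

block = np.random.rand(8, 8)
assert np.allclose(idct2(dct2(block)), block)   # the transform pair is invertible
```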
DCT – Basis Functions

• Basis functions for a 4x4 image (i.e., cosines of different frequencies).
Compare DCT with other transformations
(figure: reconstructions using the DFT, WHT and DCT)

Image is divided into 8 x 8 sub-images (64 coefficients per sub-image).
Sub-images were reconstructed by truncating 50% of the smallest coefficients.

Reconstruction error (RMS):   DFT: 2.32    WHT: 1.78    DCT: 1.13


DCT - Sub-image size selection

Performed experiments using a large number of random images.

Reconstructions (75% truncation of coefficients)


original 2 x 2 sub-images 4 x 4 sub-images 8 x 8 sub-images
JPEG Compression

(figure: JPEG encoder and decoder block diagrams, including the entropy
encoder and entropy decoder stages)

Became an international image compression standard in 1992.
JPEG - Steps

1. Divide image into 8x8 sub-images.

For each sub-image do:


2. Shift the gray-levels to the range [-128, 127]
   (i.e., reduces the dynamic range requirements of the DCT)

3. Apply DCT; yields 64 coefficients:
   1 DC coefficient: F(0,0)
   63 AC coefficients: F(u,v)
Example

(figure: an 8x8 block shifted to [-128, 127] and its DCT spectrum)

Note: the low frequency components are around the upper-left corner of
the spectrum (not centered!).
JPEG Steps (cont’d)

4. Quantize the coefficients (i.e., reduce the amplitude of coefficients
   that do not contribute a lot):

   Cq(u,v) = round( C(u,v) / Q(u,v) )

   Q(u,v): quantization array
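A sketch of the quantize/de-quantize step; the Q array below is a made-up example of the form 1 + (1 + i + j) * quality, not the standard JPEG luminance table:

```python
import numpy as np

# Sketch: quantize / de-quantize an 8x8 block of DCT coefficients.
def make_q(quality=2):
    i, j = np.indices((8, 8))
    return 1 + (1 + i + j) * quality          # assumed, illustrative quantization array

def quantize(C, Q):
    return np.round(C / Q).astype(np.int32)   # small-magnitude coefficients become zero

def dequantize(Cq, Q):
    return Cq * Q                             # only approximates C: quantization is irreversible

Q = make_q(quality=2)
C = np.random.randn(8, 8) * 50                # toy DCT coefficients
Cq = quantize(C, Q)
print(np.count_nonzero(Cq == 0))
C_hat = dequantize(Cq, Q)
```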


Computing Q[i][j] - Example

• Quantization Array Q[i][j]


Example (cont’d)

(figure: C(u,v)  →  quantization with Q(u,v)  →  Cq(u,v))

Small magnitude coefficients have been truncated to zero!

"quality" controls how many of them will be truncated!
JPEG Steps (cont’d)
5. Order the coefficients using zig-zag ordering

Creates long runs of zeros (i.e., ideal for RLC)
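A small sketch that generates the zig-zag scan order for an 8x8 block:

```python
# Sketch: zig-zag scan order (traverse anti-diagonals, alternating direction).
def zigzag_indices(n=8):
    order = []
    for s in range(2 * n - 1):                     # s = i + j indexes each anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    return [block[i][j] for i, j in zigzag_indices(len(block))]

print(zigzag_indices()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```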


JPEG Steps (cont’d)

6. Encode coefficients:

6.1 Form “intermediate” symbol sequence.

6.2 Encode the "intermediate" symbol sequence into a binary sequence
    using Huffman coding.

Note: the DC coefficient is encoded differently from the AC coefficients.
Intermediate Symbol Sequence – DC coefficient

symbol_1 (SIZE) symbol_2 (AMPLITUDE)


Example: (6) (61)

SIZE: # bits needed to encode the amplitude
Amplitude Encoding of DC coefficient

symbol_1 symbol_2
(SIZE) (AMPLITUDE)

We use predictive coding:

The DC coefficient of every block other than the first is replaced by
the difference between the DC coefficient of the current block and
that of the previous block.
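A small sketch of this differential coding of the DC coefficients; the values below are illustrative:

```python
# Sketch: differential (predictive) coding of the DC coefficients of successive blocks.
def dc_differences(dc_values):
    diffs = [dc_values[0]]                                   # first DC coefficient sent as-is
    diffs += [cur - prev for prev, cur in zip(dc_values, dc_values[1:])]
    return diffs

def dc_reconstruct(diffs):
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)                              # undo the differencing
    return out

dc = [1260, 1255, 1248, 1250]          # toy DC coefficients of consecutive blocks
assert dc_reconstruct(dc_differences(dc)) == dc
```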
Intermediate Symbol Sequence – AC coefficient

symbol_1 (RUN-LENGTH, SIZE) symbol_2 (AMPLITUDE) end of block

RUN-LENGTH: run of zeros preceding the coefficient
SIZE: # bits for encoding the amplitude of the coefficient

Note: If RUN-LENGTH > 15, use the symbol (15,0), which stands for a run of 16 zeros, e.g.:
a run of 19 zeros followed by 2  →  (15, 0) (3, 2) (2)
AC Coefficient Encoding
Symbol_1: Variable Length Code (VLC), pre-computed Huffman codes
Symbol_2: Variable Length Integer (VLI), pre-computed codes
(table: VLC and VLI codes with their lengths in # bits)

Idea: smaller (and more common) values are assigned fewer bits and take
up less space than larger (and less common) values.

DC coefficients are encoded similarly.

(1,4) (12)  →  (111110110 1100)
                 VLC       VLI
Final Symbol Sequence
Effect of “Quality” parameter

(58k bytes) (21k bytes) (8k bytes)

lower compression higher compression


Effect of Quantization:
homogeneous 8 x 8 block
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Quantized coefficients          De-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Reconstructed

Reconstruction error is low!

Original
Effect of Quantization:
non-homogeneous 8 x 8 block
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Quantized coefficients          De-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Reconstructed

Reconstruction error is high!

Original
Case Study: Fingerprint Compression

• FBI is digitizing fingerprints at 500 dots per inch using 8 bits of
  grayscale resolution.
• A single fingerprint card (contains fingerprints from all 10 fingers)
  turns into about 10 MB of data.

A sample fingerprint image: 768 x 768 pixels = 589,824 bytes
Need to Preserve Fingerprint Details

The "white" spots in the middle of


the black ridges are sweat pores
which are admissible points of
identification in court.

These details are just a couple


pixels wide!
What compression scheme should be used?

• Lossless or lossy compression?

• In practice, lossless compression methods haven't done better than 2:1
  on fingerprints!

• Does JPEG work well for fingerprint compression?


Results using JPEG compression
file size 45853 bytes
compression ratio: 12.9

Fine details have been lost.

Image has an artificial "blocky" pattern superimposed on it.

Artifacts will affect the performance of fingerprint recognition.
WSQ Fingerprint Compression

• An image coding standard for digitized fingerprints employing the
  Discrete Wavelet Transform (Wavelet/Scalar Quantization or WSQ).

• Developed and maintained by:
  – FBI
  – Los Alamos National Lab (LANL)
  – National Institute of Standards and Technology (NIST)
Results using WSQ compression
file size 45621 bytes
compression ratio: 12.9

Fine details are better preserved.

No “blocky” artifacts.
WSQ Algorithm

Target bit rate can be set via a parameter, similar to the "quality"
parameter in JPEG.
Compression ratio

• FBI’s target bit rate is around 0.75 bits per pixel (bpp)

• This corresponds to a compression ratio of 8/0.75 = 10.7

• Let’s compare WSQ with JPEG …


Varying compression ratio (cont’d)
0.9 bpp compression
WSQ image, file size 47619 bytes, JPEG image, file size 49658 bytes,
compression ratio 12.4 compression ratio 11.9
Varying compression ratio (cont’d)
0.75 bpp compression
WSQ image, file size 39270 bytes JPEG image, file size 40780 bytes,
compression ratio 15.0 compression ratio 14.5
Varying compression ratio (cont’d)
0.6 bpp compression
WSQ image, file size 30987 bytes, JPEG image, file size 30081 bytes,
compression ratio 19.0 compression ratio 19.6
JPEG Modes

• JPEG supports several different modes:


– Sequential Mode
– Progressive Mode
– Hierarchical Mode
– Lossless Mode

(see “Survey” paper)


Sequential Mode

• Image is encoded in a single scan (left-to-right, top-to-bottom);
  this is the default mode.
Progressive JPEG
• Image is encoded in multiple scans.
• Produces a quick, roughly decoded image when
transmission time is long.
Progressive JPEG (cont’d)

• Main algorithms:
(1) Progressive spectral selection algorithm
(2) Progressive successive approximation algorithm
(3) Hybrid progressive algorithm
Progressive JPEG (cont’d)

(1) Progressive spectral selection algorithm


– Group DCT coefficients into several spectral bands
– Send low-frequency DCT coefficients first
– Send higher-frequency DCT coefficients next

Example:
Progressive JPEG (cont’d)

(2) Progressive successive approximation algorithm


– Send all DCT coefficients but with lower precision.
– Refine DCT coefficients in subsequent scans.

(3) Hybrid progressive algorithm: combines spectral selection and
    successive approximation.
Example
after 0.9s after 1.6s

after 3.6s after 7.0s


Hierarchical JPEG

• Hierarchical mode encodes the image at different resolutions:
  f4 (N/4 x N/4), f2 (N/2 x N/2), f (N x N).

• Image is transmitted in multiple passes with increased resolution at
  each pass.
Hierarchical JPEG (cont’d)

(figure: image pyramid f (N x N), f2 (N/2 x N/2), f4 (N/4 x N/4),
obtained by down-sampling and reconstructed by up-sampling)
Quiz #7
• When: 12/11/2023
• What: Image Compression

• Study the problems provided on the next slides to practice for the exams.
Practice Problem 1
Practice Problem 1 (cont’d)
Practice Problem 1 (cont’d)
Practice Problem 2
Practice Problem 2 (cont’d)
Consider a 2^n x 2^n binary image:
Practice Problem 2
Practice Problem 2
Practice Problem 3
Practice Problem 3 (cont’d)
