
Why compression?

Image data is by its nature multidimensional and tends to take up a lot of space

Storage requirements
2 hours of uncompressed video in HDTV resolution (30 frames/s, 1920 × 1080 pixels, 3 bytes/pixel):
2 * 60 * 60 * 30 * 1920 * 1080 * 3 bytes ≈ 1.22 TB (checked in the sketch after this list)

Computational expenses
Encryption of data is computationally expensive

Transmission time & costs


ISPs charge by the data amount, phone companies by the time unit

Limited hardware resources (memory, CPU, etc.)


Cell phones with cameras, USB memory sticks
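
As a rough check of the storage figure above, here is a minimal Python sketch; the frame rate, resolution and 3 bytes/pixel follow the slide's arithmetic, and the ≈1.22 TB figure assumes binary (2^40-byte) terabytes:

```python
# Raw size of 2 hours of HDTV video, as in the slide's example.
seconds = 2 * 60 * 60          # 2 hours
fps = 30                       # frames per second
width, height = 1920, 1080     # HDTV resolution
bytes_per_pixel = 3            # 24-bit RGB

total_bytes = seconds * fps * width * height * bytes_per_pixel
print(total_bytes)              # 1_343_692_800_000 bytes
print(total_bytes / 2**40)      # ~1.22 (binary terabytes)
print(total_bytes / 10**12)     # ~1.34 (decimal terabytes)
```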

Compression types

The fundamental task of image compression is to reduce the amount of data required to represent an image. This is done by transforming and removing image data redundancies
Mathematically this means transforming the data to a statistically uncorrelated set

There are two major categories of compression algorithms
• Lossless compression algorithms
  The original data is reconstructed perfectly
  Theoretical limit on maximal compression
  Practical compression ratios of <10:1 (still images)
• Lossy compression algorithms
  Decompression results in an approximation of the original image
  Maximal compression rate is a function of reconstruction quality
  Practical compression ratios of >10:1 (still images)

Compression applications

There are two major categories of compression applications
• Transmission compression
  Transmission time and throughput limit computation time
  Interactivity demands short response times
  Bandwidth limitations may require constant compression ratios
• Storage compression
  Data analysis may afford higher compression ratios

The same types of algorithms are used in both cases, but the application specifications determine how


Data usage patterns

Symmetric applications
  Data is typically compressed & decompressed the same number of times
  Compression and decompression involve roughly the same computations
  Video telephony

Asymmetric applications
  Data is typically compressed once and decompressed many times
  Compression is computationally expensive, decompression is done in real time
  Offline scene rendering in a computer game or movie
  Video decoding and rendering - 1/30 second round-trip time

Time dependent applications
  Data is typically time resolved and may be skipped upon failure
  Video transmission over networks with packet loss

Redundancy vs Compression ratio

Data are the means by which information is conveyed
Various amounts of data can represent the same amount of information

Data redundancy: $R_D = 1 - \frac{1}{C_R}$

Compression ratio: $C_R = \frac{n_1}{n_2}$

(n_1, n_2 – the number of information-carrying units, e.g. bits, in the original and the compressed representation)

Redundancy types
  Coding redundancy
  Interpixel redundancy
  Psychovisual redundancy
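
A minimal Python sketch of the two definitions above; the byte counts n1 and n2 are made-up values chosen only to illustrate the formulas:

```python
# Compression ratio C_R = n1 / n2 and data redundancy R_D = 1 - 1/C_R.
n1 = 1_000_000   # size of the original representation (assumed, e.g. bytes)
n2 = 100_000     # size of the compressed representation (assumed)

CR = n1 / n2          # 10.0  -> "10:1 compression"
RD = 1 - 1 / CR       # 0.9   -> 90% of the original data was redundant
print(f"C_R = {CR:.1f}, R_D = {RD:.2f}")
```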

Coding redundancy

Coding redundancy is due to inefficiencies in data representations

$p_r(r_k) = \frac{n_k}{n}, \quad k = 0, 1, \ldots, L-1$

$L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k)$

p_r(r_k) – the probability that a pixel r will have value r_k
L – the number of (possible) gray levels (pixel values)
n_k – the number of pixels with value r_k
n – the total number of pixels
l(r_k) – the number of bits required to store a pixel r with value r_k
L_avg – the average number of bits required to store a pixel

Interpixel redundancy

Interpixel redundancy is defined as failure to identify and utilize data relationships
If a pixel value can be reasonably predicted from its neighboring (or preceding/following) pixels, the image is said to contain interpixel redundancy

Interpixel redundancy depends on the resolution of the image
• The higher the (spatial) resolution of an image, the more probable it is that two neighboring pixels will depict the same object
• The higher the frame rate in a video stream, the more probable it is that the corresponding pixel in the following frame will depict the same object

These types of predictions are made more difficult by the presence of noise
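
To make the coding-redundancy formulas concrete, the following sketch computes L_avg for a hypothetical 4-level image under a fixed-length and a variable-length code; the probabilities and code lengths are illustrative, not from the lecture:

```python
# L_avg = sum_k l(r_k) * p_r(r_k) for two different codes.
probs = [0.6, 0.25, 0.1, 0.05]     # assumed p_r(r_k) for gray levels r_0..r_3

fixed_lengths = [2, 2, 2, 2]       # natural 2-bit binary code
variable_lengths = [1, 2, 3, 3]    # e.g. a Huffman-style code: 0, 10, 110, 111

L_avg_fixed = sum(l * p for l, p in zip(fixed_lengths, probs))   # 2.0 bits/pixel
L_avg_var = sum(l * p for l, p in zip(variable_lengths, probs))  # 1.55 bits/pixel

print(L_avg_fixed, L_avg_var)
print("C_R =", L_avg_fixed / L_avg_var)  # ~1.29 -> the coding redundancy removed
```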

Psychovisual redundancy

Psychovisual redundancy stems from the fact that the human eye does not respond with equal intensity to all visual information

The human visual system does not rely on quantitative analysis of individual pixel values when interpreting an image – an observer searches for distinct features and mentally combines them into recognizable groupings
In this process certain information is relatively less important than other information – this information is called psychovisually redundant

Psychovisually redundant image information can be identified and removed – a process referred to as quantization
Quantization eliminates data and therefore results in lossy data compression

Reconstruction errors introduced by quantization can be evaluated objectively or subjectively depending on the application needs & specifications

Quantization effects
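
As an illustration of quantization (not code from the lecture), this sketch requantizes an 8-bit grayscale image to a smaller, arbitrarily chosen number of gray levels:

```python
import numpy as np

def requantize(img: np.ndarray, levels: int) -> np.ndarray:
    """Map an 8-bit image (values 0..255) onto `levels` evenly spaced gray levels."""
    step = 256 / levels                    # width of each quantization bin
    bins = np.floor(img / step)            # index of the bin each pixel falls in
    return (bins * step + step / 2).astype(np.uint8)  # reconstruct at bin centers

# Example: a random "image" quantized from 256 down to 16 gray levels.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
print(requantize(img, 16))
```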

Image compression models

The channel encoder inserts a form of "controlled redundancy" (e.g. Hamming codes or parity bits) in the data representation in order to detect and/or correct errors introduced by channel noise
This increases the size of the data representation, and if the channel is noise free (i.e. not prone to errors) the channel encoder and decoder are omitted

The source encoder itself consists of three stages:
Mapper - transforms the image to a (non-visual) format designed to reduce interpixel redundancies
Quantizer - reduces psychovisual redundancies by quantizing data deemed less important for visual interpretation (omitted for lossless compression)
Symbol encoder - codes the data efficiently (typically using some form of variable-length coding scheme) and aims to reduce coding redundancies

Lossless compression

Provides limited compression capabilities - information theory provides a mathematical framework for (and good measures of) the levels of compression that can be achieved

Principally motivated by application specifications – data loss may be unacceptable
• Medical imaging – diagnostic quality must be maintained
• Satellite imaging – image acquisition costs are astronomical
• Legal and historical archival
• Future application requirements may be unknown

Huffman coding

Provides a data representation with the smallest possible number of code symbols per source symbol (when coding the symbols of an information source individually)

Huffman code construction is done in two steps (a sketch follows below)
1) Source reductions
   Create a series of source reductions by ordering the probabilities of the symbols and then combining the symbols of lowest probability to form new symbols recursively
   This step is sometimes referred to as "building a Huffman tree" and can be done statistically
2) Code assignment
   When only two symbols remain, retrace the steps in 1) and assign a code bit to the symbols in each step

Encoding and decoding are done using these codes in a lookup-table manner
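
A minimal Python sketch of the two-step construction described above: repeated merging of the two least probable symbols (source reduction), with one code bit assigned per branch as the merges are unwound; the example string is arbitrary:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    # Each heap entry: (weight, tie-breaker, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)     # the two least probable "symbols"
        w2, i2, c2 = heapq.heappop(heap)
        # Source reduction: merge them, prefixing one code bit per branch.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, i2, merged))
    return heap[0][2]

data = "aaaabbbcc d"
table = huffman_code(data)
encoded = "".join(table[s] for s in data)
print(table, encoded, sep="\n")
```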

Arithmetic coding

Provides a data representation where the entire symbol sequence is encoded as a single arithmetic code word (which is represented as an interval of real numbers between 0 and 1)

Arithmetic code construction is done in 3 steps (see the sketch below)
1) Subdivide the half-open interval [0,1) based on the probabilities of the source symbols
2) For each source symbol, recursively
   • Narrow the interval to the sub-interval designated by the encoded symbol
   • Subdivide the new interval among the source symbols based on probability
3) Append an end-of-message indicator

Encoding is done by choosing any number in the interval to represent the data
Decoding is done by retracing the steps in the code construction
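
A toy float-based sketch of the interval narrowing described above; a real arithmetic coder works with finite-precision integer arithmetic and usually an adaptive model, and the alphabet, probabilities and '!' end-of-message symbol here are all assumptions:

```python
# Illustrative arithmetic coder over a fixed symbol model.
probs = {"a": 0.5, "b": 0.3, "!": 0.2}   # '!' terminates the message

def ranges(probs):
    """Cumulative [low, high) sub-interval of [0,1) for each symbol."""
    out, low = {}, 0.0
    for s, p in probs.items():
        out[s] = (low, low + p)
        low += p
    return out

def encode(message):
    low, high = 0.0, 1.0
    for s in message + "!":                 # append the end-of-message indicator
        s_low, s_high = ranges(probs)[s]
        low, high = low + (high - low) * s_low, low + (high - low) * s_high
    return (low + high) / 2                 # any number in the final interval works

def decode(code):
    out = ""
    while True:
        for s, (s_low, s_high) in ranges(probs).items():
            if s_low <= code < s_high:
                if s == "!":
                    return out
                out += s
                code = (code - s_low) / (s_high - s_low)   # rescale and continue
                break

x = encode("abba")
print(x, decode(x))
```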


Lempel-Ziv-Welch (LZW) coding

Provides a data representation where fixed-length code words are assigned to variable-length sequences of source symbols
LZW reduces both coding redundancies and interpixel redundancies and requires no knowledge of the source symbol probabilities

LZW code construction is done by building a dictionary which translates encountered source symbol sequences to code symbols
This dictionary is created as the data stream is being processed and can be flushed (reinitialized) when it is full or when coding ratios decline
The size of the dictionary affects compression performance
• Small dictionaries are less likely to contain an encountered source symbol sequence
• Large dictionaries result in larger code words (lower compression ratios)

LZW is patent protected
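
A minimal sketch of the dictionary-building idea; real LZW implementations additionally emit fixed-length code words, bound the dictionary size and handle flushing, all of which are omitted here:

```python
def lzw_encode(data: str):
    """Textbook-style LZW: emit a code for the longest dictionary match, then
    add that match extended by one symbol to the dictionary."""
    # Initialize the dictionary with all single symbols (here: 8-bit characters).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, codes = "", []
    for c in data:
        if w + c in dictionary:
            w += c                          # keep extending the current match
        else:
            codes.append(dictionary[w])     # emit the code for the longest match
            dictionary[w + c] = next_code   # add the new sequence
            next_code += 1
            w = c
    if w:
        codes.append(dictionary[w])
    return codes

print(lzw_encode("ababababab"))   # repeated patterns end up as single codes
```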


Run-length coding

Provides a data representation where sequences of symbols of the same value are coded as "run-lengths"
• 1D – the image is treated as a row-by-row array of pixels
• 2D – blocks of pixels are viewed as source symbols

Works well for simple, artificial images
Does not work in the presence of noise
Can be combined with bit-plane coding (using Gray codes)

Input data 222222222226666666666666699999999999999999
Code       2,11 6,14 9,17

Predictive coding

Provides a data representation where code words express source symbol deviations from predicted values (usually values of neighboring pixels)
Predictive coding efficiently reduces interpixel redundancies
• 1D & 2D – pixels are predicted from neighboring pixels
• 3D – pixels are predicted between frames as well

Works well for all images with a high degree of interpixel redundancy
Works in the presence of noise (just not as efficiently)

Input data 222222222226666666666666699999999999999999
Code       200000000004000000000000030000000000000000

Predictive coding can be used in both lossless and lossy compression schemes
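
A small sketch of both codings applied to the slide's example sequence: run-length pairs for run-length coding, and first-order prediction (previous symbol) with difference coding for predictive coding:

```python
# Run-length coding and first-order (previous-symbol) predictive coding of the
# example sequence from the slide.
data = "222222222226666666666666699999999999999999"

def run_length_encode(s):
    """Produce (value, run length) pairs for consecutive equal symbols."""
    runs, prev, count = [], s[0], 1
    for c in s[1:]:
        if c == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = c, 1
    runs.append((prev, count))
    return runs

def predictive_encode(values):
    """First value as-is, then deviations from the previous value (the prediction)."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

print(run_length_encode(data))                    # matches the slide: 2,11 6,14 9,17
print(predictive_encode([int(c) for c in data]))  # [2, 0, ..., 4, 0, ..., 3, 0, ...]
```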


Lossless predictive coding

Lossy predictive coding


Lossy compression

Provides higher compression capabilities - reconstruction quality (and therefore also maximal compression ratios) is usually specified in terms of application fidelity requirements

Principally motivated by application specifications – higher compression ratios may be fundamentally required (e.g. video telephony)
• Distortions are tolerated because of the higher compression ratios
• The observer's visual system may not be able to discern these distortions
• Why include data that isn't needed?

Transform coding

In transform coding a linear transform is used to map the image data into a set of transform coefficients (which are then quantized and coded)
For reasons of computational complexity the image is subdivided into smaller subimages before the transform computation

The goal of the transformation process is to decorrelate (minimize the variance of) the pixels of the subimages as much as possible
This will result in a localization of the image information in a minimal number of transform coefficients - the transform coefficients carrying little information can then be more coarsely quantized or eliminated
The better the information compaction, the better the reconstruction approximations

Transform coding can be
• Adaptive – each subimage is treated in an individual manner
• Non-adaptive – all subimages are treated in the same way

Karhunen-Loève Transform (KLT)

The Karhunen-Loève Transform (KLT, also known as the Hotelling transform) minimizes the mean-square error between the original (sub)image and the (decompressed) reconstruction

This makes the KLT the optimal information compaction transformation for any image and any number of retained coefficients
The basis functions of the KLT depend on the image, however, making them impossible to precompute and therefore not a practical choice for image compression
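
A small numpy sketch of how a KLT basis can be estimated in practice: the eigenvectors of the covariance matrix of a population of (flattened) subimage blocks, ordered by decreasing variance; the block size and data are made up:

```python
import numpy as np

def klt_basis(blocks):
    """Estimate the KLT basis from data: eigenvectors of the covariance matrix of
    the (flattened) subimage blocks, ordered by decreasing eigenvalue (variance)."""
    X = blocks.reshape(len(blocks), -1).astype(float)
    X = X - X.mean(axis=0)                  # remove the mean vector
    cov = np.cov(X, rowvar=False)           # covariance over the block population
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvecs[:, order]

# Illustrative use: project random 4x4 "blocks" onto the first k KLT basis vectors.
rng = np.random.default_rng(1)
blocks = rng.normal(size=(500, 4, 4))
basis = klt_basis(blocks)
k = 4                                       # number of retained coefficients
X = blocks.reshape(500, -1)
coeffs = (X - X.mean(axis=0)) @ basis[:, :k]
print(coeffs.shape)                         # (500, 4): 16 pixels -> 4 coefficients
```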


Discrete Cosine Transform (DCT)

The Discrete Cosine Transform (DCT) offers a good approximation of the optimal information compaction ability of the KLT while still allowing for precomputation by having data-independent basis functions

Compared to the Fourier transform the DCT also has the added benefits of
• being completely real
• being better at handling discontinuities (the Gibbs phenomenon*)

For these reasons the DCT has been something of a standard in image compression systems

*) Causes increased blocking artifacts in the reconstructed subimage when using the Fourier transform for transform-based image compression

$C(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,\alpha(u)\,\alpha(v) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]$

$f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u,v)\,\alpha(u)\,\alpha(v) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]$

$\alpha(u) = \begin{cases} \sqrt{1/N} & u = 0 \\ \sqrt{2/N} & u = 1, 2, \ldots, N-1 \end{cases}$
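
A direct numpy transcription of the forward transform above (O(N^4), fine for small blocks; practical codecs use fast separable DCTs); the random 8x8 block is only for illustration:

```python
import numpy as np

def dct2(f):
    """Direct evaluation of the 2D DCT formula above for a square N x N block."""
    N = f.shape[0]
    alpha = np.full(N, np.sqrt(2.0 / N))
    alpha[0] = np.sqrt(1.0 / N)
    x = np.arange(N)
    # cos[(2x+1) u pi / 2N] as an N x N matrix indexed by (u, x)
    cos = np.cos((2 * x[None, :] + 1) * np.arange(N)[:, None] * np.pi / (2 * N))
    C = np.empty((N, N))
    for u in range(N):
        for v in range(N):
            C[u, v] = alpha[u] * alpha[v] * np.sum(f * np.outer(cos[u], cos[v]))
    return C

block = np.random.default_rng(2).integers(0, 256, size=(8, 8)).astype(float)
C = dct2(block)
print(C[0, 0], block.mean() * 8)   # DC coefficient equals N * mean for this normalization
```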




Subimage size selection

Both the transform coding reconstruction error and the computational complexity are functions of the subimage size

Usually an (integer power of two) size is chosen that will reduce the correlation between adjacent subimages in the reconstructed image


Bit allocation

The process of truncating, quantizing and coding the coefficients of the transformed subimage is referred to as bit allocation

The reconstruction error in a decompressed image is a function of
• The number of coefficients discarded
• The precision used to store the remaining coefficients
• The relative importance of these coefficients

Most transform coding systems select which coefficients to retain
• On the basis of maximum variance – zonal coding
• On the basis of maximum magnitude – threshold coding

Zonal coding - transform coefficients are selected using a fixed filter mask, which has been formed to fit the maximum variance of an average image
Knowing which transform coefficients are going to be retained makes it possible to optimize the transform computation – only the selected transform coefficients need to be computed

Threshold coding - coefficients are selected with respect to their magnitude
There are several ways of implementing threshold coding (a sketch of the N-largest variant follows below)
• Global thresholding – use the same (global) threshold value for all images
• Adaptive thresholding – the threshold value is based on the coefficients
• Local thresholding – the threshold value is based on the location of the coefficient
• N-largest – the N largest coefficients are retained (requires ordering)

Note that some threshold coding schemes result in a predefined reconstruction error rather than a specified compression ratio
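
A minimal numpy sketch of the N-largest variant of threshold coding, assuming the transform coefficients of a subimage are already available; the block contents and the choice of N are arbitrary:

```python
import numpy as np

def keep_n_largest(coeffs, n):
    """N-largest threshold coding: zero all but the n largest-magnitude coefficients."""
    flat = np.abs(coeffs).ravel()
    threshold = np.sort(flat)[-n]          # magnitude of the n-th largest coefficient
    mask = np.abs(coeffs) >= threshold
    return coeffs * mask

# Illustrative use on an 8x8 block of (made-up) transform coefficients:
rng = np.random.default_rng(3)
C = rng.normal(scale=10, size=(8, 8))
C_kept = keep_n_largest(C, n=8)            # retain 8 of the 64 coefficients
print(np.count_nonzero(C_kept))            # 8 (ties in magnitude could keep a few more)
```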



Wavelet coding

Like transform coding, wavelet coding is based on the premise that using a linear transform (here a wavelet transform) will result in transform coefficients that can be stored more efficiently than the pixels themselves
Because wavelets are computationally efficient and the wavelet basis functions are limited in duration, subdivision of the original image is unnecessary

Wavelet coding typically produces a more efficient compression than DCT-based systems
• Objectively – the mean square error of wavelet-based reconstructions is typically lower
• Subjectively – the blocking artifacts (characteristic of DCT-based systems at high compression ratios) are not present in wavelet reconstructions

The choice of wavelet greatly affects the compression efficiency
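
As a deliberately simple illustration, the sketch below computes one level of a 2D Haar wavelet decomposition with numpy; practical wavelet coders (e.g. JPEG 2000) use longer biorthogonal wavelets and several decomposition levels:

```python
import numpy as np

def haar2d_level(img):
    """One level of a 2D Haar wavelet transform: returns the approximation (LL)
    subband and three detail subbands. Assumes even width and height."""
    a = img.astype(float)
    # Transform rows: pairwise averages (lowpass) and differences (highpass).
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Transform the columns of both results.
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return LL, LH, HL, HH

img = np.random.default_rng(4).integers(0, 256, size=(8, 8))
LL, LH, HL, HH = haar2d_level(img)
print(LL.shape)   # (4, 4): most of the energy ends up in the LL subband
```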


Wavelet selection

