Digital Imaging
2006/2007
Lecture 3
Image processing II
Ioannis Ivrissimtzis 05 - Mar - 2007
Summary of the lecture
Frequency domain
  Basis change
  Walsh-Hadamard transform
  Tensor product transforms
  Discrete Cosine Transform
Image compression
  Encoding
  Huffman encoding
  JPEG
Spatial/Frequency domain
The analysis and processing of an image can be done in different
domains:
The spatial domain (up to now we always worked in this domain).
A frequency domain.
Basis change
Example: We have two numbers a,b.
We can represent them separately as a,b.
We can represent them by two different numbers (a+b)/2, (a-b)/2.
If we know a,b we can find (a+b)/2, (a-b)/2.
If we know (a+b)/2, (a-b)/2 we can find a,b (check).
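The check can be done in a few lines of code; a minimal sketch (the function names are ours):

```python
def to_new_basis(a, b):
    """Represent (a, b) by the mean and half the difference."""
    return ((a + b) / 2, (a - b) / 2)

def from_new_basis(s, d):
    """Recover (a, b): a = s + d, b = s - d."""
    return (s + d, s - d)

s, d = to_new_basis(7.0, 3.0)          # (5.0, 2.0)
assert from_new_basis(s, d) == (7.0, 3.0)
```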
Basis change
In matrix language we have
[a]   [1 0] [a]
[b] = [0 1] [b]

for the first representation and

[(a+b)/2]   [1/2  1/2] [a]
[(a-b)/2] = [1/2 -1/2] [b]

for the second.
Basis change
We use the terminology:
[a]   [1 0] [a]            [(a+b)/2]   [1/2  1/2] [a]
[b] = [0 1] [b]            [(a-b)/2] = [1/2 -1/2] [b]

In each case, the vector on the left holds the coefficients and the matrix
defines the basis.
Basis change
Why use the second more complicated basis?
In some applications we may have to work with an incomplete set of
coefficients.
A subset of the coefficients of the second basis may give better
information about the whole set of coefficients.
In the first basis the first coefficient is a, while in the second basis the
first coefficient is (a+b)/2, which is more representative.
In progressive image transmission we prefer the first 10% of the data
to give an approximation of the whole image, rather than an exact
description of a small part of it.
Basis change
Why use the second more complicated basis?
It could be convenient.
Consider the problem of allocating a fixed amount of money x to two
people.
In the first basis we have to work with two variables a,b related by the
equation a+b = x.
In the second basis the first coefficient is fixed at x/2 and the only
variable is the second coefficient, which controls the difference
between the money each person gets.
Example
The four matrices below form a basis for the 4x1 matrices.
We can write any other 4x1 matrix as a linear combination of them, in a
unique way.
[1]  [0]  [0]  [0]
[0]  [1]  [0]  [0]
[0], [0], [1], [0]
[0]  [0]  [0]  [1]
Example
Writing a 4x1 matrix in this basis is trivial. We have:
[a1]   [a1]   [0 ]   [0 ]   [0 ]
[a2]   [0 ]   [a2]   [0 ]   [0 ]
[a3] = [0 ] + [0 ] + [a3] + [0 ]
[a4]   [0 ]   [0 ]   [0 ]   [a4]

giving,

[a1]      [1]      [0]      [0]      [0]
[a2]      [0]      [1]      [0]      [0]
[a3] = a1 [0] + a2 [0] + a3 [1] + a4 [0]
[a4]      [0]      [0]      [0]      [1]
Example
For example:
[3]    [1]    [0]    [0]    [0]
[0]    [0]    [1]    [0]    [0]
[1] = 3[0] + 0[0] + 1[1] + 8[0]
[8]    [0]    [0]    [0]    [1]
We call this basis the natural basis.
Example
A different basis for the 4x1 matrices:
[1]  [ 1]  [ 1]  [ 1]
[1]  [ 1]  [-1]  [-1]
[1], [-1], [-1], [ 1]
[1]  [-1]  [ 1]  [-1]
We can write any other 4x1 matrix as a linear combination of the four
matrices above, in a unique way. We call this a basis change.
Example
How do we write a 4x1 matrix in the new basis?
[3]     [1]     [ 1]     [ 1]     [ 1]
[0]     [1]     [ 1]     [-1]     [-1]
[1] = ? [1] + ? [-1] + ? [-1] + ? [ 1]
[8]     [1]     [-1]     [ 1]     [-1]
Example
Let the coefficients be the unknowns x1 , x2 , x3 , x4 .
[3]      [1]      [ 1]      [ 1]      [ 1]
[0]      [1]      [ 1]      [-1]      [-1]
[1] = x1 [1] + x2 [-1] + x3 [-1] + x4 [ 1]
[8]      [1]      [-1]      [ 1]      [-1]
Example
We can rewrite the equation as a linear system in matrix form.
[3]   [1  1  1  1] [x1]
[0]   [1  1 -1 -1] [x2]
[1] = [1 -1 -1  1] [x3]
[8]   [1 -1  1 -1] [x4]
Example
To solve the system we invert the transformation matrix.
[x1]   [1  1  1  1]^(-1) [3]
[x2]   [1  1 -1 -1]      [0]
[x3] = [1 -1 -1  1]      [1]
[x4]   [1 -1  1 -1]      [8]
Example
The inverse of this matrix is
[1  1  1  1]^(-1)       [1  1  1  1]
[1  1 -1 -1]         1  [1  1 -1 -1]
[1 -1 -1  1]      =  -  [1 -1 -1  1]
[1 -1  1 -1]         4  [1 -1  1 -1]
In the literature, the original transformation matrix is sometimes
multiplied by 1/4 or 1/2 (1/N or 1/√N in general).
Example
We get,
[x1]     [1  1  1  1] [3]   [ 3  ]
[x2]   1 [1  1 -1 -1] [0]   [-3/2]
[x3] = - [1 -1 -1  1] [1] = [ 5/2]
[x4]   4 [1 -1  1 -1] [8]   [-1  ]
This is called the Walsh-Hadamard transform of the vector (3, 0, 1, 8).
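The same computation can be checked numerically; a sketch with NumPy, using the order-4 Walsh-Hadamard matrix from above:

```python
import numpy as np

# Sequency-ordered Walsh-Hadamard matrix of order 4
H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

v = np.array([3, 0, 1, 8])
# H is symmetric and H @ H = 4 I, so its inverse is H / 4
coeffs = H @ v / 4                    # array([ 3. , -1.5,  2.5, -1. ])
assert np.allclose(H @ coeffs, v)     # the coefficients reconstruct v
```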
Example
The natural basis
Example
The Walsh-Hadamard basis
Walsh-Hadamard transform
We can generalise the previous transform to larger matrices. For
example, the Walsh-Hadamard transform of order 8 is given by the
matrix
[1  1  1  1  1  1  1  1]
[1  1  1  1 -1 -1 -1 -1]
[1  1 -1 -1 -1 -1  1  1]
[1  1 -1 -1  1  1 -1 -1]
[1 -1 -1  1  1 -1 -1  1]
[1 -1 -1  1 -1  1  1 -1]
[1 -1  1 -1 -1  1 -1  1]
[1 -1  1 -1  1 -1  1 -1]
Walsh-Hadamard transform
Notice that the rows are ordered by the number of sign changes. We
say that they are ordered by sequency.
[1  1  1  1  1  1  1  1]   0
[1  1  1  1 -1 -1 -1 -1]   1
[1  1 -1 -1 -1 -1  1  1]   2
[1  1 -1 -1  1  1 -1 -1]   3
[1 -1 -1  1  1 -1 -1  1]   4
[1 -1 -1  1 -1  1  1 -1]   5
[1 -1  1 -1 -1  1 -1  1]   6
[1 -1  1 -1  1 -1  1 -1]   7
Walsh-Hadamard transform
The natural basis for 8x1 matrices
Walsh-Hadamard transform
The Walsh-Hadamard basis for 8x1 matrices
Walsh-Hadamard transform
Different components of the natural basis correspond to different pixels
of the image.
Different components of the Walsh-Hadamard basis correspond to
different “frequencies”.
The general W-H transform
The general Hadamard matrix Hn of order 2^n is defined recursively by:

     [ H(n-1)   H(n-1) ]
Hn = [ H(n-1)  -H(n-1) ]

with H0 = [1].
A sequency ordering of its rows will give the corresponding Walsh-
Hadamard transform.
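The recursion and the sequency reordering are straightforward to implement; a sketch (the function names are ours):

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order 2**n, built by the recursion above."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

def sequency_ordered(H):
    """Reorder the rows by their number of sign changes (sequency)."""
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

W = sequency_ordered(hadamard(3))   # the order-8 Walsh-Hadamard matrix
```

After reordering, row i of W has exactly i sign changes, matching the sequency table above.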
Two dimensional W-H transform
The 2D Walsh-Hadamard transform is the tensor product of the 1D
transform with itself.
Example: Every 4x4 greyscale image can be uniquely written in the
Walsh-Hadamard basis as linear combination of these 16 images.
The white squares denote
1’s and the black squares
denote -1’s.
Two dimensional W-H transform
[Figure: the sixteen 4x4 basis images of the 2D Walsh-Hadamard
transform, arranged in a grid labelled by the 1D basis vectors
(1,1,1,1), (1,1,-1,-1), (1,-1,-1,1), (1,-1,1,-1) along the rows and
columns.]

How do we compute these sixteen images? Take the corresponding elements
of the 1D basis and find their tensor product.
Two dimensional W-H transform
For example, the tensor product of (1,1,1,1) with itself gives the
constant basis image:

[1]              [1 1 1 1]
[1]              [1 1 1 1]
[1] (1 1 1 1)  = [1 1 1 1]
[1]              [1 1 1 1]
Tensor product transforms
How can we compute T(F), the W-H transform of a 2D image F?
It may seem that we have to solve a large and complicated linear
system.
In fact, the transform is computed directly by
T(F) = H · F · H′
where H is the W-H matrix of the 1D transform and H′ is the transpose
of H.
Tensor product transforms
That is, to find the transform of an image, we multiply it from the left
with the transformation matrix and from the right with its transpose.
[t11 t12 t13 t14] [a11 a12 a13 a14] [t11 t21 t31 t41]   [z11 z12 z13 z14]
[t21 t22 t23 t24] [a21 a22 a23 a24] [t12 t22 t32 t42] = [z21 z22 z23 z24]
[t31 t32 t33 t34] [a31 a32 a33 a34] [t13 t23 t33 t43]   [z31 z32 z33 z34]
[t41 t42 t43 t44] [a41 a42 a43 a44] [t14 t24 t34 t44]   [z41 z42 z43 z44]

The transformation matrix T, the original image A, the transpose of T,
and the transform Z of A.
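In code the whole 2D transform is just a pair of matrix products; a sketch for the 4x4 Walsh-Hadamard case (the 1/16 normalisation mirrors the 1/4 factor used in the 1D example):

```python
import numpy as np

H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

F = np.arange(16.0).reshape(4, 4)   # a toy 4x4 "image"
Z = H @ F @ H.T / 16                # forward 2D transform
F_back = H.T @ Z @ H                # inverse, since H @ H.T = 4 I
assert np.allclose(F_back, F)
```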
Tensor product transforms
To see why this happens we first need to introduce the notion of
orthogonality.
We say that a matrix is orthogonal if its inverse is equal to its transpose.
[b11 b12 b13 b14]^(-1)   [b11 b21 b31 b41]
[b21 b22 b23 b24]        [b12 b22 b32 b42]
[b31 b32 b33 b34]      = [b13 b23 b33 b43]
[b41 b42 b43 b44]        [b14 b24 b34 b44]
Tensor product transforms
Consider an image with all its pixels equal to 0, except one which has
value 1. Notice that this is an element of the natural basis. We have:
[b11 b12 b13 b14] [0 0 0 0] [b11 b21 b31 b41]
[b21 b22 b23 b24] [0 0 1 0] [b12 b22 b32 b42]
[b31 b32 b33 b34] [0 0 0 0] [b13 b23 b33 b43]
[b41 b42 b43 b44] [0 0 0 0] [b14 b24 b34 b44]

=

[0 0 b12 0] [b11 b21 b31 b41]   [b12·b13  b12·b23  b12·b33  b12·b43]
[0 0 b22 0] [b12 b22 b32 b42]   [b22·b13  b22·b23  b22·b33  b22·b43]
[0 0 b32 0] [b13 b23 b33 b43] = [b32·b13  b32·b23  b32·b33  b32·b43]
[0 0 b42 0] [b14 b24 b34 b44]   [b42·b13  b42·b23  b42·b33  b42·b43]
Tensor product transforms
This is equivalent to the tensor product of the two corresponding 1D basis
images:
[b12]                       [b12·b13  b12·b23  b12·b33  b12·b43]
[b22]                       [b22·b13  b22·b23  b22·b33  b22·b43]
[b32] (b13 b23 b33 b43)  =  [b32·b13  b32·b23  b32·b33  b32·b43]
[b42]                       [b42·b13  b42·b23  b42·b33  b42·b43]
Tensor product transforms
To put it all together, let B be an orthogonal matrix corresponding to a
1D basis. Then T = B^(-1) is the transform matrix.
Let A be an image. We have to show that Z = T·A·T′ is the transform of A
corresponding to the tensor product of the 1D basis B.
To see this, let I_uv and B_uv be the (u,v) elements of the 2D natural
and transform basis, respectively.
Tensor product transforms
Z = T·A·T′
z11·I11 + … + znn·Inn = T·A·T′
T^(-1) · (z11·I11 + … + znn·Inn) · (T′)^(-1) = A
z11·T^(-1)·I11·(T′)^(-1) + … + znn·T^(-1)·Inn·(T′)^(-1) = A
z11·B·I11·B′ + … + znn·B·Inn·B′ = A
z11·B11 + … + znn·Bnn = A
The latter means that the matrix Z gives indeed the coefficients for
writing A in the tensor product basis of B.
DCT
The Discrete Cosine Transform (DCT) is given by the matrix with
entries
T(u, v) = 1/√2                       for u = 0
T(u, v) = cos( (2v+1)·u·π / (2n) )   otherwise

where u, v = 0, 1, …, n-1.
In fact, there are several types of DCT and this is the type II, which is
the most popular.
Each type corresponds to different assumptions about the boundary of
the image.
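The definition translates directly into code; a sketch that also checks that the rows become orthonormal under the standard √(2/n) scaling (the function name is ours):

```python
import numpy as np

def dct_matrix(n):
    """DCT-II matrix: T[0, v] = 1/sqrt(2) and, for u > 0,
    T[u, v] = cos((2v + 1) * u * pi / (2n))."""
    T = np.empty((n, n))
    for u in range(n):
        for v in range(n):
            T[u, v] = (1 / np.sqrt(2) if u == 0
                       else np.cos((2 * v + 1) * u * np.pi / (2 * n)))
    return T

T = dct_matrix(4)
# With the standard sqrt(2/n) scaling the matrix becomes orthogonal
S = np.sqrt(2 / 4) * T
assert np.allclose(S @ S.T, np.eye(4))
```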
Example
The DCT matrix for n=4
    [ 1/√2        1/√2        1/√2        1/√2       ]
    [ cos(π/8)    cos(3π/8)   cos(5π/8)   cos(7π/8)  ]
T = [ cos(2π/8)   cos(6π/8)   cos(10π/8)  cos(14π/8) ]
    [ cos(3π/8)   cos(9π/8)   cos(15π/8)  cos(21π/8) ]
Spatial/Frequency domain
The DCT is an example of a transform from the spatial to the frequency
domain.
While the elements of the original matrix correspond to pixels, the
elements of the transform correspond to frequencies.
The decomposition of a signal into frequencies has many applications
(e.g. analysis, processing, compression). It allows us to see properties
of the signal which remain hidden in the spatial domain representation.
The frequency domain, even though it is more complicated, is the
natural domain of many signals (e.g. sound, light, electric signals),
because they consist of waves of different frequencies.
Example
The sound signal is decomposed into several frequencies.
Each frequency can be amplified or attenuated.
http://kaffeine.sourceforge.net/featurepics/equalizer.png
Example
The signal can be digital or analogue.
http://rolls.com/rollsproducts/
2D DCT
The 2D DCT is obtained in the usual way, Z = T·A·T′, where T is
the matrix of the 1D DCT.
For example, the basis of the 4x4 DCT consists of 16 matrices.
One such matrix is shown below.
0.250 0.135 -0.250 -0.326
0.135 0.073 -0.135 -0.176
-0.250 -0.135 0.250 0.326
-0.326 -0.176 0.326 0.426
The matrix is the (2,3) element of the basis of the 4x4 DCT. To display
it as an image we add 0.5, bringing the values into the range [0,1].
DCT frequencies
As is obvious from the definition of the DCT, the values of the u-th row
of the matrix lie on the underlying function:

cos( u·(2v+1)·π / (2n) )
We can see the rows of the matrix (which give the basis of the
transform) as a sample from these functions.
DCT frequencies
The underlying functions for n=8.
DCT frequencies
Notice that the low frequencies correspond to the top rows and the high
frequencies to the bottom rows.
Therefore, in the 2D DCT the low frequencies correspond to the top-left
of the image’s transform.
Example
DCT
The large coefficients are concentrated on the upper left corner of the
image of the DCT transform.
Filters for the DCT
The design of a filter for DCT depends on the distribution of the
frequencies in the frequency domain.
Ideal lowpass filter Ideal highpass filter
Ideal filters for the DCT
The design of a filter for DCT depends on the distribution of the
frequencies in the frequency domain.
Ideal bandpass filter Ideal bandreject filter
Encoding
We want to find a compact computer representation for a message
written in the form of a string of symbols
abababadaaabcdadbaab
The symbols are also called letters.
The set of all different letters is called the alphabet. The above string
uses an alphabet of 4 letters:
{a,b,c,d}
Encoding
The resulting computer representation will be a string of bits
0101000010111110010101
That is, a sequence of 0’s and 1’s.
The process of going from the initial string of symbols to the bit string
is called encoding.
The reverse process, from the bits to the string of symbols is called
decoding.
The aim is to find a representation as compact as possible, that is, to
use as few bits as possible.
Encoding
The coding methods we study here assign a bit string to each letter of
the alphabet.
This correspondence between letters and bit strings is called the code.
A simple example of a code is:

a → 00
b → 01
c → 10
d → 11
We say that the code of a is 00, the code of b is 01, …
Encoding
With the above code:
a → 00
b → 01
c → 10
d → 11
The string
“a b a b a b a d a a a b c d a d b a a b”
is encoded as
“0001000100010011000000011011001101000001”
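The fixed-length encoding is a direct table lookup; a minimal sketch:

```python
code = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
message = "abababadaaabcdadbaab"

encoded = ''.join(code[ch] for ch in message)
# 20 letters x 2 bits = 40 bits
assert encoded == "0001000100010011000000011011001101000001"
```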
Encoding
The bit string
“0001000100010011000000011011001101000001”
is decoded by taking the bits two-by-two and finding the corresponding
letter
“ 00 | 01 | 00 | 01 | 00 | 01 | 00 | 11 | 00 | 00 | 00 | 01 | … “
“a b a b a b a d a a a b … “
Fixed-length code
In the previous example all the letters were encoded by the same number
of bits (two bits for each letter). Such codes are called fixed-length
codes.
We want to improve on the previous code, that is, to use fewer bits for
encoding a given string.
The idea is to use fewer bits for the letters that appear more frequently
and more bits for the letters that are used less frequently.
Variable length code
Consider the code
a → 0
b → 10
c → 110
d → 111
The string
“a b a b a b a d a a a b c d a d b a a b”
is now encoded as
“0100100100111000101101110111100010”
that is, with 34 bits instead of the 40 bits of the previous code.
Prefix codes
The code in the previous slide is prefix free, and this property allows
us to decode it back to the original string.
Prefix free means that the code of a letter cannot be the beginning of
the code of another letter.
The code in the previous slide is prefix free because
the code of a is 0 and no other letter’s code starts with 0
the code of b is 10 and no other letter’s code starts with 10
the code of c is 110 and no other letter’s code starts with 110
the code of d is 111 and no other letter’s code starts with 111
Prefix codes
To decode a prefix free code:
We start from the beginning of the string and check the first bit, then the
first two bits and so on, until we find the code of a letter.
Add this letter to the decoded string. Because of the prefix free property
it can not be the beginning of the code of a different letter.
We continue with the next bits, until the end of the string.
If the process breaks down (i.e. we can not find a valid code), it means
that there was an error in the encoding process.
Prefix codes
In the previous example the string
“0100100100111000101101110111100010”
is decoded as
“0 | 10 | 0 | 10 | 0 | 10 | 0 | 111 | 0 | 0 | 0 | 10 | 110 | 111 | 0 | 111 | 10 | 0 | 0 | 10”
giving,
“a b a b a b a d a a a b c d a d b a a b”
Different letters may have codes of different length, but nevertheless
we do not need separators to indicate the end of a letter’s code.
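A decoder for a prefix-free code needs no separators: it consumes bits until the buffer matches a codeword. A sketch (the function name is ours):

```python
def prefix_decode(bits, code):
    """Decode a bit string using a prefix-free code {letter: bits}."""
    inverse = {v: k for k, v in code.items()}
    letters, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:           # prefix-freeness guarantees this
            letters.append(inverse[buf])  # match is a whole codeword
            buf = ""
    if buf:
        raise ValueError("bit string does not end on a codeword")
    return ''.join(letters)

code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
message = "abababadaaabcdadbaab"
encoded = ''.join(code[ch] for ch in message)
assert prefix_decode(encoded, code) == message
```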
Huffman coding
Huffman coding is an algorithm for finding an efficient prefix
code for a given string.
The Huffman code is optimal. It requires the least number of bits
compared to any other prefix code, under the assumption that every
letter is encoded-decoded separately.
Huffman coding
STEP 1: (preparatory)
• Count how many times each letter appears in the string.
• Divide these numbers by the total number of letters in the
string to find the probability of each letter.
This normalization is not necessary. The algorithm would also work
with the actual number of appearances of each letter.
If we do not know the string we encode, we may use an estimate of the
probabilities.
Huffman coding
STEP 2:
• Sort the probabilities of the letters in descending order.
• Combine the two letters with the lowest probability. The
probability of the compound letter is the sum of the
probabilities of its components.
• Sort again the letters and continue combining the two letters
with the lowest probability until there are only two letters.
It is important to keep a record of the letters combined at each step.
It will be needed at STEP 3 where this process will be reversed.
Huffman coding
STEP 3
• At this stage there are only two (possibly compound) letters.
Their first bit will be 0 and 1, respectively.
• While there is a compound letter, split it, and make the next bit
of the two constituent (possibly compound) letters to be 0 and 1,
respectively.
• Continue splitting, until there is no compound letter left and you
have recovered the initial letters.
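The three steps can be sketched with a priority queue. Tie-breaking between equal probabilities may differ from the worked example that follows, so individual codewords can differ, but the total encoded length is the same (the code is optimal either way). Function names are ours:

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code from {letter: frequency}."""
    # Heap entries are (total frequency, tiebreak id, tree); a tree is
    # either a letter or a pair (left subtree, right subtree).
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:                 # STEP 2: repeatedly combine the
        f1, _, t1 = heapq.heappop(heap)  # two least probable letters
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def split(tree, prefix):             # STEP 3: split compound letters,
        if isinstance(tree, tuple):      # appending one bit per level
            split(tree[0], prefix + "0")
            split(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    split(heap[0][2], "")
    return codes

freqs = {'a': 450, 'b': 100, 'c': 120, 'd': 30, 'e': 200, 'f': 100}
codes = huffman_code(freqs)
cost = sum(len(codes[x]) * f for x, f in freqs.items())  # 2230 bits
```

For the 1000-letter string of the example below this gives 2230 bits in total, i.e. 2.23 bits per letter on average.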
Example
Consider a string S consisting of 1000 letters from an alphabet of 6
letters: {a,b,c,d,e,f}
The frequency with which a letter appears in the message is shown in
the following table
a b c d e f
450 100 120 30 200 100
Exercise: Find the Huffman code for S.
Example
STEP 1
We divide by 1000 (the number of letters in the message) to find the
probability of each letter
a b c d e f
0.45 0.1 0.12 0.03 0.2 0.1
We sort them in descending order
a e c b f d
0.45 0.2 0.12 0.1 0.1 0.03
Example
STEP 2
We combine the two least probable letters and sort again
a e f+d c b
0.45 0.2 0.13 0.12 0.1
We combine the two least probable letters and sort again
a c+b e f+d
0.45 0.22 0.2 0.13
Example
STEP 2
We combine the two least probable letters and sort again
a e+(f+d) c+b
0.45 0.33 0.22
We combine the two least probable letters
(e+(f+d))+(c+b) a
0.55 0.45
Example
STEP 3
We now have only two letters and can proceed with STEP 3. Notice
that we do not need any information about the probabilities any more.
We assign the first bit of the two letters
(e+(f+d))+(c+b) a
0 1
We split the compound letter, assigning the second bit
e+(f+d) c+b a
00 01 1
Example
STEP 3
We split the compound letters, assigning the third bit
e f+d c b a
000 001 010 011 1
We split the compound letter, assigning the fourth bit
e f d c b a
000 0010 0011 010 011 1
Binary trees
Why does the Huffman algorithm produce prefix free codes?
An intuitive way to see this is by considering binary trees.
A binary tree is a special type of graph, that is, nodes connected with
edges.
One node of the binary tree, called the root is at the top.
Each node is connected to either none or exactly two nodes below it.
The node above is called the parent and the two nodes below are
called the children.
The nodes with no children are called leaves.
Binary trees
Example of a binary tree:

[Figure: the root at the top, with nodes at levels 1 to 4 below it.]
Binary trees
The leaves are shown as
white circles.
Binary trees
There is a natural way to assign a bit string to each node.
The left child has the bit string of its parent with an additional 0 at
the end.
The right child has the bit string of its parent with an additional 1 at
the end.

[Figure: the tree with nodes labelled 0, 1; 00, 01; 000, 001, 010, 011;
0010, 0011.]
Binary trees
The bit strings of the leaves of a binary tree form a prefix free code.
Indeed, consider any code that is the beginning of a leaf's code.
They are all in the path joining the leaf with the root.
Thus, they are internal nodes.
Binary trees
The Huffman code is given by the leaves of a binary tree.
Therefore, it is prefix free.

[Figure: the Huffman tree of the example, with leaves a; e, c, b; f, d.]
Huffman coding
STEP 2 of the algorithm determines a binary tree.
STEP 3 of the algorithm reconstructs it.
The algorithm is efficient (in fact, it is optimal) because it pushes the
letters with low probability to the bottom of the tree, while the letters
with high probability stay at the top levels of the tree.
Compression
Two data sets may represent the same information but have
different size in the computer’s memory.
For example the same picture can be encoded in different formats,
.jpg, .png, .bmp and have different file size.
If the same information is encoded by two data sets consisting of
n1 and n2 information units (e.g. bits), we say that the second set
has been compressed with a compression ratio
Cr = n1 / n2
Compression
The processing of the initial data into a new data set with smaller size
is called compression (or encoding), while the inverse process of
recovering the initial data is called decompression (decoding).
Decompression is necessary when the compressed data are in a form
that can not be immediately used.
Usually, the main consideration in compression is the compression
ratio. We are looking for large compression ratios, that is, to use as
few bits as possible.
Compression
Other considerations in compression are:
The complexity and the time and memory costs of the
compression-decompression algorithm.
The resiliency of the algorithm. That is, how do small errors in the
compressed data affect the decompressed data?
Compression
Sometimes, depending on the application, we are not interested in
recovering the exact initial data but only an approximation of it.
Compression algorithms that can recover the exact initial data are
called lossless. Algorithms that can not recover the exact initial data
are called lossy.
Generally, lossy algorithms achieve better compression ratios.
In image processing usually we can afford the loss of some information,
so, lossy algorithms are common.
Compression
To evaluate a lossy algorithm we need a measure of the information
lost by the compression. That is, we need an estimate of the
difference between the initial and the compressed and then
decompressed image.
The root-mean-square error.
The root-mean-square signal-to-noise ratio.
Subjective criteria:
Absolute rating scale (e.g. the rating scale of the Television
Allocations Study Organization).
Side by side comparison of the original and the compressed and
then decompressed image.
Compression
In image compression, the initial image is processed and a new
compressed image is obtained, with smaller size but which,
nevertheless, carries the same or a comparable amount of information.
That means that the initial data carried a certain amount of redundancy.
We can identify several types of redundancy, from higher type to lower:
Psychovisual redundancy
Interpixel redundancy
Coding redundancy
JPEG
JPEG is a compression algorithm using the Discrete Cosine
Transform and Huffman encoding.
Subdivide the image into pixel blocks of 8x8 size. The blocks are
processed one after the other from left to right and top to bottom.
Assuming that the values of the pixels are integers in the range
[0, 255], subtract 128 to bring them into the range [-128, 127]. The
reason is that the DCT maps the interval [-128, 127] to itself.
Apply the DCT. The DCT values are computed with 11-bit precision
(even though the input has 8-bit precision).
JPEG
Next, we scale and quantize the DCT values, using the scaling array
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
JPEG
The DCT coefficients at the top left of the array are scaled less.
The actual numbers were obtained experimentally.
JPEG
Create a sequence of DCT coefficients using the zig-zag pattern:
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
Finally, encode with Huffman encoding, using a special symbol for
the end of the non-zero coefficients.
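The zig-zag pattern above can be generated by walking the anti-diagonals of the block, alternating direction; a sketch (the function name is ours):

```python
def zigzag(n=8):
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):           # s indexes the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()               # even diagonals run bottom-left
        order.extend(diag)               # to top-right
    return order

zz = zigzag(8)
# zz[0] == (0, 0), zz[1] == (0, 1), zz[2] == (1, 0), ..., zz[63] == (7, 7)
```

Scanning the quantized DCT block in this order groups the (typically non-zero) low-frequency coefficients at the front and the zeros at the end, which is what makes the end-of-block symbol effective.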