
Digital Image Processing

Lectures by

Dr. Vrinda V. Nair

Lecture Notes prepared by:
Jerrin Thomas Panachakel

jerrin.panachakel@gmail.com
Contents

1 Image Processing 1

2 Digital Images 3
2.1 Digital Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Image Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Pixel Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Spatial Resolution . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Grayscale Resolution . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Issues with Digitising an Image . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Isopreference Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Image Processing Fundamentals 9
3.1 Zooming an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Neighbours of a Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 A Few Terms... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Digital Image Fundamentals 18
4.1 Distance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Mathematical operations on pixels . . . . . . . . . . . . . . . . . . . . . 21
4.2.1 Array and matrix operation . . . . . . . . . . . . . . . . . . . . 21
4.2.2 Linear and non-linear operations . . . . . . . . . . . . . . . . . 21
4.2.3 Arithmetic operations . . . . . . . . . . . . . . . . . . . . . . . 22

5 Operators (cont’d), Transforms, Probabilistic Models 23
5.1 Set operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2 Logical operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.3 Geometric spatial transformation . . . . . . . . . . . . . . . . . . . . . 24
5.4 Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.5 Vector and Matrix operation . . . . . . . . . . . . . . . . . . . . . . . . 25
5.6 Image transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.7 Probabilistic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

6 Histogram Processing 28
6.1 Histogram Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Lecture 1

Image Processing

In imaging science, image processing is any form of signal processing for which the input
is an image, such as a photograph or video frame; the output of image processing may
be either an image or a set of characteristics or parameters related to the image. Most
image-processing techniques treat the image as a two-dimensional signal and apply
standard signal-processing techniques to it. Image processing thus refers to the
processing of a 2-D picture by a computer. Closely related to image processing are
computer graphics and computer vision. In computer graphics, images are manually
made from physical models of objects, environments, and lighting, instead of being
acquired (via imaging devices such as cameras) from natural scenes, as in most animated
movies. Computer vision, on the other hand, is often considered high-level image
processing, in which a machine/computer/software intends to decipher the physical
contents of an image or a sequence of images (e.g., videos or 3-D full-body magnetic
resonance scans). Various image processing techniques include:

1. Image Enhancement: It refers to giving special importance to, or sharpening,
image features such as boundaries or contrast to make a graphic display more
useful for display and analysis. This process does not increase the inherent
information content of the data. It includes gray level and contrast manipulation,
noise reduction, edge crispening and sharpening, filtering, interpolation and
magnification, pseudo-colouring, and so on. Image enhancement is SUBJECTIVE in
nature, meaning that the degree to which an image is judged enhanced differs from
person to person. The same image that one person classifies as “highly enhanced”
may be classified as “poor” by another person. It also depends on the application;
for example, in analysing an echocardiogram, which is the ultrasound image of the
heart, the conservation of edges in the image matters more than noise reduction.

2. Image Restoration: It is concerned with filtering the observed image to
minimize the effect of degradations. The effectiveness of image restoration depends
on the extent and accuracy of the knowledge of the degradation process as well as
on the filter design. Image restoration differs from image enhancement in that the
latter is concerned with extraction of, or giving special significance to, image
features. Also, unlike image enhancement, which is subjective, image restoration is
objective in nature, in the sense that restoration techniques tend to be based on
mathematical or probabilistic models of image degradation; we try several
models and find out (hopefully!) which model is best suited. For instance, ultrasound
images are usually corrupted by speckle noise, whereas CCD (Charge Coupled
Device) camera images are affected by salt & pepper noise (due to faulty CCD
elements).

3. Image Compression: The bandwidth available to transmit an image and the
memory available to store an image are limited. Therefore, to reduce the band-
width and memory requirements, we go for image compression. Examples of image
compression standards include, but are not limited to, JPEG (Joint Photographic
Experts Group), which is based on the DCT (Discrete Cosine Transform), and JPEG
2000, which is based on the DWT (Discrete Wavelet Transform).

4. Morphological Processing: Morphological image processing is a collection of
non-linear operations related to the shape or morphology of features in an image.
Morphological operations rely only on the relative ordering of pixel values, not
on their numerical values, and therefore are especially suited to the processing of
binary images. Morphological operations can also be applied to greyscale images
whose light transfer functions are unknown and whose absolute pixel values are
therefore of no or minor interest.

5. Image Segmentation: Image segmentation is the process of partitioning a dig-
ital image into multiple segments (sets of pixels, also known as superpixels). The
goal of segmentation is to simplify and/or change the representation of an image
into something that is more meaningful and easier to analyze. Image segmen-
tation is typically used to locate objects and boundaries (lines, curves, etc.) in
images. More precisely, image segmentation is the process of assigning a label to
every pixel in an image such that pixels with the same label share certain visual
characteristics.

© Jerrin Thomas Panachakel
Lecture 2

Digital Images

2.1 Digital Representation
To make a one-dimensional analog signal digital, we go for sampling, quantization and
encoding. The same applies to analog images too, which can be considered as 2-D
signals. We perform sampling, quantization and encoding to make an analog image
digital. Analogous to the time index in analog 1-D signals, we have spatial co-ordinates
in 2-D images and instead of amplitude, we have intensity. Fig. 2.1 shows the process of
sampling an image. Fig. 2.1a shows the original ’analog’ image (there is no way we
can have a truly analog image in a digital computer, but assuming the sampling rate and
sampling depth to be very high, the intensity and spatial coordinates can be considered
continuous in nature). Fig. 2.1b shows the sampling grid used. Each element
(square) in the grid is called a pixel or a pel (picture element). Fig. 2.1c shows the
output of sampling. When measuring the value for a pixel, we take the average colour
of an area around the location of the pixel. A simplistic model is sampling a square;
this is called a box filter (which is used for obtaining Fig. 2.1c). A more physically
accurate measurement is to calculate a weighted Gaussian average, giving the value
exactly at the pixel coordinates a high weight and lower weight to the area around it.

(a) ’Analog’ Image (b) Sampling Grid (c) Sampled Image

Figure 2.1: Pixel values

It is interesting to see the exact pixel values of this sampled image; these are shown
in Fig. 2.1. The values of the pixels need to be stored in the computer’s memory, which
means that in the end the data ultimately end up in a binary representation;
the spatial continuity of the image is approximated by the spacing of the samples in the
sampling grid. The number of bits used per pixel is called the BPP (Bits Per Pixel). For
obtaining the values shown in Fig. 2.1, a sampling depth of 8 BPP was used. We can
make the following observations,

• The brighter the pixel, the higher the pixel value, and vice-versa.

• Although for an 8 BPP image the pixel values can range from 0 (darkest
pixel) to 255 (brightest pixel), here the range is from 8 to 250. The ratio of these
two numbers is called the dynamic range of the image. It is measured as a ratio,
or as a base-10 (decibel) or base-2 (doublings, bits or stops) logarithmic value.

• The ’standard’ coordinate system for images is slightly confusing, at least for
beginners: the first element is denoted as (0, 0) and the last as (10, 8), where the
first index denotes the row and the second index denotes the column. This is a
widely used format but there are other formats too.

It may be noted that, unlike in high-level programming languages such as C++,
Python etc., matrix indices in Matlab must be positive, and hence the first pixel element
will be (1, 1) instead of (0, 0)!
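The dynamic range mentioned above is easy to compute. A small Python sketch, using the values 8 and 250 from the example image (the lecture itself works in Matlab; Python is used here purely for illustration):

```python
import math

# Dynamic range of the example image: pixel values span 8 (darkest) to 250 (brightest).
lo, hi = 8, 250
ratio = hi / lo              # dynamic range as a plain ratio
stops = math.log2(ratio)     # base-2 "stops" / doublings
print(round(ratio, 2))   # 31.25
print(round(stops, 2))   # 4.97
```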


2.2 Image Resolution
Image resolution is the detail an image holds. The term applies to raster digital images,
film images, and other types of images. Higher resolution means more image detail.
Image resolution can be measured in various ways. Basically, resolution quantifies how
close lines can be to each other and still be visibly resolved. Resolution units can be
tied to physical sizes (e.g. lines per mm, lines per inch), to the overall size of a picture
(lines per picture height, also known simply as lines, TV lines, or TVL), or to angular
subtense. Line pairs are often used instead of lines; a line pair comprises a dark line
and an adjacent light line, while a line is either a dark line or a light line. A resolution
of 10 lines per millimetre means 5 dark lines alternating with 5 light lines, i.e. 5 line
pairs per millimetre (5 LP/mm).

2.2.1 Pixel Resolution
When the pixel counts are referred to as resolution, the convention is to describe the
pixel resolution with the set of two positive integer numbers, where the first number
is the number of pixel columns (width) and the second is the number of pixel rows
(height), for example as 7680 by 4320. Another popular convention is to cite resolution
as the total number of pixels in the image, typically given as number of megapixels,
which can be calculated by multiplying pixel columns by pixel rows and dividing by
one million. Other conventions include describing pixels per length unit or pixels per
area unit, such as pixels per inch or per square inch.
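The megapixel convention described above amounts to one multiplication and one division; a quick Python illustration using the 7680 by 4320 figure quoted in the text:

```python
# An image quoted as "7680 by 4320" (pixel columns x pixel rows):
width, height = 7680, 4320
megapixels = width * height / 1_000_000
print(megapixels)  # 33.1776
```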

2.2.2 Spatial Resolution
The measure of how closely lines can be resolved in an image is called spatial resolu-
tion, and it depends on properties of the system creating the image, not just the pixel
resolution in pixels per inch (ppi). For practical purposes the clarity of the image is
decided by its spatial resolution, not the number of pixels in an image. In effect, spatial
resolution refers to the number of independent pixel values per unit length.

2.2.3 Grayscale Resolution
It is the smallest discernible or perceptible change in gray level.

2.3 Issues with Digitising an Image
Image Frequency: Image frequency is a rather confusing term. Let’s begin from what
we learned in high school... What is frequency? A rate of change. The higher the
frequency, the higher the rate of change, and vice-versa. For 1-D signals, we measure the
change with respect to time, but we don’t have a time index in an image! Instead, as we
discussed in Lecture 1, we have spatial coordinates. Putting everything together, a high
frequency image will have pixel values varying rapidly, and the reverse for low frequency
images. For example, consider the signal y = cos(x), x = 0.0, 0.1, 0.2, ..., 0.9; then


Figure 2.2: Image Frequency: An Illustration

cos(x) = [1.0000, 0.9950, 0.9801, 0.9553, 0.9211, 0.8776, 0.8253, 0.7648, 0.6967, 0.6216].
Translating these values to the range [0, 255] using the formula
cos’(x) = ((cos(x) + 1) × 128) − 1, we get
cos’(x) = [255.0000, 254.3605, 252.4485, 249.2831, 244.8958, 239.3306, 232.6430,
224.8998, 216.1785, 206.5661]. Similarly, consider the signal z = cos(5x), and obtain
cos’(5x) for the same values of x. A plot of both, obtained using Matlab, is shown in
Fig. 2.2. As can be seen, the rate of change of pixel intensities for cos’(5x) is much
higher than that for cos’(x). So, in images too, higher frequency means a more rapid rate
of change. And where do we have high frequencies in an image? In the fine details and
edges, where there are rapid changes in pixel values. So, if we pass an image through
a low pass filter, we will lose the high frequencies present, and the perceivable change
will be that the fine details are lost, i.e. the edges are blurred. This is really
bad, but not in all cases. Let me explain that...
When do you say that your image is corrupted by noise? When some pixels have
values that do not match the surrounding pixel values. That is, we actually have
high frequencies there! Thus an easy way to remove such noise is to use a low pass
filter. This approach is used in many applications: to remove the noise from a noisy
image, we pass the image through a low pass filter, and hence we can remove the noise
at the expense of the high frequency information present in the image.
And now the question is: what is the unit of this image frequency? In image pro-
cessing, frequency has units of cycles per millimetre.
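The intensity mapping used in the cosine example is easy to reproduce. The lecture used Matlab; here is an equivalent Python sketch of the same computation:

```python
import math

# Map cos(x) from [-1, 1] to (roughly) [0, 255] with the formula from the text:
# cos'(x) = ((cos(x) + 1) * 128) - 1
def to_intensity(value):
    return (value + 1) * 128 - 1

xs = [i / 10 for i in range(10)]                    # x = 0.0, 0.1, ..., 0.9
low = [to_intensity(math.cos(x)) for x in xs]       # slowly varying intensities
high = [to_intensity(math.cos(5 * x)) for x in xs]  # rapidly varying intensities
print(round(low[0], 4), round(low[1], 4))  # 255.0 254.3605
```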
Now we have frequency and sampling, but how are they related?

• What if the sampling rate is low (“Hi! Nyquist....”)?
Aliasing: Fig. 2.3 illustrates what happens if a signal is sampled at regular time
intervals that are slightly less often than once per period of the original signal.
The blue curve is the original signal, and the red dots indicate the sampled values.
The red dots are what are recorded, since they represent the signal values at the
times the signal is sampled. The pattern of the red dots is a terrible representation
of the signal: the red sampled data looks like a sine wave at about one-tenth
the frequency of the original! This is aliasing. When the number of pixels in an
image is reduced while keeping the gray levels in the image constant, fine checkerboard
patterns are found at the edges of the image; this effect is called the checkerboard
effect. An example of spatial aliasing is the Moiré pattern one can observe in a


Figure 2.3: Aliasing in 1-D signals

poorly pixelized image of a brick wall.
Sometimes, the image frequency and the Nyquist criterion demand a non-feasible
sampling rate. In these cases, we go for spatial anti-aliasing filters. Anti-aliasing
filters are basically low pass filters and, as we discussed earlier, low pass filters
blur the image. What happens is that these filters remove the high frequencies
present in the image that would cause aliasing if sampled at the given sampling
rate. Hence, by removing high frequency components, the minimum sampling rate
imposed by the Nyquist criterion is reduced.

• What if my sampling depth or BPP is low?
When the number of gray levels (sampling depth) is low, an effect known as “false
contouring” occurs. It is called so because the resulting ridges resemble topographical
contours in a map. This effect arises when the number of brightness levels is lower
than that which humans can easily distinguish. If we use k bits per pixel, the
number of gray levels or brightness levels will be 2^k. If k = 1, the image is called
a binary image or a bitonal image.

• Suppose we scan a 6 in × 4 in photo at 110 dpi; what will be the resolution of the
image?
The resolution will be 660 × 440 (i.e. 6 × 110 by 4 × 110 pixels).
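A quick sanity check of this arithmetic in Python:

```python
# 6 in x 4 in photo scanned at 110 dpi (dots per inch).
dpi = 110
width_px, height_px = 6 * dpi, 4 * dpi
print(width_px, height_px)  # 660 440
```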

• What happens when a high resolution image is displayed on a low resolution
screen?
You cannot display a higher resolution than the maximum hardware capacity.
However, it is quite possible that you have set your screen resolution to less than
what the hardware (read ’monitor’) is capable of. In that case, an image having
higher resolution than this will be squeezed (opposite of stretched) to fit into this
resolution. Most image viewing programs and operating systems automatically
redraw or resize the image to fit on your screen or in the allowed space in the
viewer program. And they usually do a very good job. If everything else looks
sharp on your screen, regardless of the resolution, a super large image for which

your operating system or viewer program is only showing a quarter or less of the
pixels will still look as sharp as anything else.

Figure 2.4: Isopreference Curve

2.4 Isopreference Curves
T.S. Huang in 1965 attempted to quantify experimentally the effect on image quality
of varying the resolution (N) and the number of bits per pixel (k) simultaneously. Three
images were used for this: 1) an image of the face of a woman, which contains relatively
few details, 2) an image of a cameraman, which contains a medium amount of detail, and
3) an image of a crowd, which contains a relatively high amount of detail. Sets of these
three images with varying values of N and k were generated, and observers were asked
to rank the images according to their subjective quality. The result was plotted in the
N-k plane as shown in Fig. 2.4. Points on an isopreference curve correspond to images
whose N and k values are given by the coordinates and which have equal subjective
quality. As the values of N and k were increased, the curves shifted up and to the right,
as shown in the figure.
The isopreference curves tend to become vertical as the amount of detail increases.
This suggests that for images with a large amount of detail, only a few intensity levels
are required. For example, the curve corresponding to the crowd is almost vertical, which
indicates that for a fixed resolution (N), the perceived quality is independent of the
number of intensity levels used. Also, the curves of the other two images remain
constant over some interval where the number of samples is increased and the number of
intensity levels decreased. The most likely reason for this is that a decrease in k tends
to increase the apparent contrast, a visual effect that humans perceive as higher quality
of an image.

Lecture 3

Image Processing Fundamentals

3.1 Zooming an Image
Zooming is basically achieved by oversampling the image. Zooming is a two-step pro-
cess,

• STEP 1: Creation of new pixel locations

• STEP 2: Assignment of pixel values to these newly created locations (intensity
level assignment)

Interpolation can be used in STEP 2. Interpolation is the process of using known data
to estimate values at unknown locations. There are several approaches to achieving
zooming using interpolation; they are listed below in increasing order of complexity:

1. Nearest neighbour interpolation (Zero-order hold)

2. Bilinear interpolation (First-order hold)

3. Bicubic interpolation

4. Spline interpolation

5. Sinc Interpolation

1. Nearest neighbour interpolation (aka proximal interpolation): This is the simplest
and fastest method for pixel value assignment.
Consider an 8 BPP image with the following pixel values,

182 152  20
 34 162  76
 48 211 198

and suppose we want the image represented by these values to be zoomed 1.5 times (1.5X).
As discussed earlier, the first step involved is the creation of new pixel locations.
The dimension of the original image is 3 × 3; there is no way we can have a
(3 × 1.5) × (3 × 1.5) = 4.5 × 4.5 image, since a dimension is a count of the number
of pixels, which has to be an integer. So, what is done widely[1] is to take
the nearest integer greater than the product of the original dimension and the scale
(something we achieve using ceil() in Matlab). So, the row-expanded pixel matrix is

182 152   0  20   0
 34 162   0  76   0
 48 211   0 198   0

Once this is done, we go for column expansion, which gives us

182 152   0  20   0
 34 162   0  76   0
  0   0   0   0   0
 48 211   0 198   0
  0   0   0   0   0

Now that we have the expanded matrix, the next task is to replace all the 0s in the
matrix with values chosen so that effects such as blurring, edge halos etc. are
minimal. First we look at nearest neighbour interpolation. In this interpolation,
the newly created pixel locations are assigned the intensity values of the pixels
closest to them. In our example, the resulting matrix is

182 152 152  20  20
 34 162 162  76  76
 34 162 162  76  76
 48 211 211 198 198
 48 211 211 198 198

Won’t it be great to visualize this process? See Fig. 3.1. A major demerit of this
approach is that it causes distortion of straight edges.

(a) Original Image (b) Expanded Image (c) Interpolated Image

Figure 3.1: Zooming using Nearest Neighbour Interpolation
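The two steps of zooming can be sketched in plain Python. Note this is only an illustration: implementations differ in exactly which original pixel each new location copies (the zero-insertion scheme in the text fills each new location from the pixel above/to its left, while the sketch below maps each new coordinate back through the scale factor), so the output matches the 5 × 5 size but not necessarily the exact matrix shown above:

```python
import math

def nearest_neighbour_zoom(image, scale):
    """Zoom a 2-D list of pixel values: create ceil(n * scale) new locations
    per axis (STEP 1), then copy each new pixel from the nearest original
    pixel (STEP 2)."""
    rows, cols = len(image), len(image[0])
    new_rows, new_cols = math.ceil(rows * scale), math.ceil(cols * scale)
    return [[image[min(int(r / scale), rows - 1)][min(int(c / scale), cols - 1)]
             for c in range(new_cols)]
            for r in range(new_rows)]

img = [[182, 152, 20],
       [34, 162, 76],
       [48, 211, 198]]
zoomed = nearest_neighbour_zoom(img, 1.5)
print(len(zoomed), len(zoomed[0]))  # 5 5
```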

2. Bilinear Interpolation: Since different sources don’t agree on the details of bilinear
and bicubic interpolation, only the basics are given here. In bilinear interpolation,
the values of the four neighbouring pixels are used to decide the intensity value of
the pixel at the given location. If we need to find the value at location (x, y),
given by v(x, y), we use the following relation,

v(x, y) = ax + by + cxy + d (3.1)

The four coefficients are found by solving four equations in four unknowns,
obtained using the four nearest neighbours of (x, y).
Although bilinear interpolation is computationally more demanding than nearest
neighbour interpolation, it gives better results.

[1] When I say “widely”, I actually mean “in Matlab”. The built-in function in Matlab for resizing
an image, imresize(), follows the algorithm discussed here, though there are contributions on Matlab
Central with some changes, like using floor() instead of ceil()!
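For a location inside a unit square with known corner values, the four coefficients of Eq. 3.1 can be written in closed form by solving the four corner equations. A minimal Python sketch (the corner positions (0,0), (1,0), (0,1), (1,1) and the argument order are assumptions for illustration):

```python
def bilinear(x, y, v00, v01, v10, v11):
    """v(x, y) = a*x + b*y + c*x*y + d inside a unit square, with the four
    coefficients fixed by the corner values: v00 at (0,0), v01 at (0,1),
    v10 at (1,0), v11 at (1,1)."""
    d = v00
    a = v10 - v00
    b = v01 - v00
    c = v11 - v10 - v01 + v00
    return a * x + b * y + c * x * y + d

print(bilinear(0.5, 0.5, 10, 20, 30, 40))  # 25.0 (centre = average of corners)
```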

3. Bicubic Interpolation: In bicubic interpolation, the relation used is

v(x, y) = sum_{i=0}^{3} sum_{j=0}^{3} a_ij x^i y^j (3.2)

The sixteen coefficients are found by solving sixteen equations in sixteen unknowns,
obtained using the sixteen nearest neighbours of (x, y).
It is interesting to note that Eq. 3.2 attains the form of Eq. 3.1 when the limits
of both summations are changed from 0-to-3 to 0-to-1. Of all the techniques
discussed, bicubic interpolation gives the best results. A comparison of these
techniques is given in Fig. 3.2. The checkerboard effect is clearly visible in
Fig. 3.2(b); bicubic interpolation gives slightly sharper results than the other two.

(a) (b) (c) (d)

Figure 3.2: Zooming using various interpolation techniques
(a)Original Image (32 × 32) (b)Nearest Neighbour Interpolation (32X Zooming)
(c)Bilinear Interpolation (32X Zooming) (d)Bicubic Interpolation (32X Zooming)

3.2 Neighbours of a Pixel
• A pixel P with coordinates (x, y) has two vertical and two horizontal neighbours,
with coordinates from the set
{(x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1)}, as shown in Fig. 3.3. This set
of four pixels is denoted N4(P), known as the 4-neighbours of P. It may be
noted that each neighbouring pixel is at a unit distance from P.

• If P is a border pixel, some neighbouring pixels of P will lie outside the image.

• The four diagonal neighbours of P have coordinates from the set
{(x − 1, y − 1), (x − 1, y + 1), (x + 1, y − 1), (x + 1, y + 1)}. This is shown in Fig.
3.3. This set of pixels is denoted ND(P), the diagonal neighbours of P.


Figure 3.3: Neighbours of a Pixel

• The collection of four neighbours and diagonal neighbours (ND (P ) ∪ N4 (P )) is
known as the eight neighbours of P , denoted as N8 (P ).

3.3 A Few Terms...
• If we need to establish that two pixels are connected, we should show that 1.
they are neighbours, and 2. their intensity values (pixel values) conform to some
relation, for example, that their intensity values belong to the same set.
• Let V be the set of intensity values used to define adjacency. If we are considering
an 8 BPP image, the set of values that each pixel can take is
I = {0, 1, 2, 3, ..., 255}. In this case, V ⊂ I.
• 4-adjacency: Two pixels p and q are said to be 4-adjacent iff
– the values of p and q are from the set V, and
– q ∈ N4(p), i.e. q is one of the ’4-neighbours’ of p.
Let V = {1}, and let the intensity matrix be as given in Fig. 3.5. If the pixel shown in
red is p, then any of the pixels in blue can be q, i.e. they are 4-adjacent to p.

Figure 3.5: 4-Adjacency


• 8-adjacency: Two pixels p and q are said to be 8-adjacent iff
– the values of p and q are from the set V, and
– q ∈ N8(p), i.e. q is one of the ’8-neighbours’ of p.

Figure 3.6: 8-Adjacency

Let V = {1, 2}, and let the intensity matrix be as given in Fig. 3.6. If the pixel shown
in red is p, then any of the pixels in blue can be q, i.e. they are 8-adjacent to p.
• m-adjacency (mixed adjacency): Two pixels p and q are said to be m-
adjacent iff
– q ∈ N4(p), i.e. q is one of the ’4-neighbours’ of p, OR
– q ∈ ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from
V.

Figure 3.7: m-Adjacency

Let V = {1}, and let the intensity matrix be as given in Fig. 3.7. If the pixel shown in
red is p, then the pixel in black satisfies condition 1 for m-adjacency and the
pixel in blue satisfies condition 2.
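These adjacency tests are straightforward to code. Below is a Python sketch using the standard textbook (Gonzalez & Woods) form of the m-adjacency condition, in which the shared 4-neighbours of p and q must not have values from V; the helper names n4, nd and m_adjacent are my own:

```python
def n4(p):
    """4-neighbours of pixel p = (row, col)."""
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def nd(p):
    """Diagonal neighbours of pixel p."""
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}

def m_adjacent(p, q, img, V):
    """True if p and q are m-adjacent. `img` maps (row, col) -> intensity,
    and V is the set of intensity values defining adjacency."""
    if img.get(p) not in V or img.get(q) not in V:
        return False
    if q in n4(p):                      # condition 1: 4-adjacent
        return True
    # Condition 2: diagonal, and no shared 4-neighbour has a value from V.
    common = {r for r in n4(p) & n4(q) if img.get(r) in V}
    return q in nd(p) and not common

# 2x2 block of 1s: the diagonal pair is NOT m-adjacent, because both
# shared 4-neighbours have values in V.
img = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(m_adjacent((0, 0), (0, 1), img, {1}))  # True  (4-adjacent)
print(m_adjacent((0, 0), (1, 1), img, {1}))  # False (blocked diagonal)
```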
Before going any further, why do we actually need all this? Consider Fig. 3.4.
How many objects are there? We can find out using what we just discussed!
Other applications include, but are not limited to, separating the foreground of an
image from the background.
• A (digital) path (or curve) from pixel p with coordinates (x, y) to pixel q with
coordinates (s, t) is a sequence of distinct pixels with coordinates


(x0, y0), (x1, y1), ..., (xn, yn)

where (x0, y0) = (x, y), (xn, yn) = (s, t), and pixels (xi, yi) and (xi−1, yi−1) are
adjacent for 1 ≤ i ≤ n. Here, n is the length of the path. The path is said to
be a closed path if (x0, y0) = (xn, yn). We can define 4-, 8- or m-paths
depending on the type of adjacency specified.

(a) (b) (c)

Figure 3.8: Ambiguity in using 8- Adjacency path
(a): Intensity Matrix (b): One Possible Path (c): Another Possible Path

Now let’s see why we moved from 8-adjacency to m-adjacency. It’s because of the
ambiguity of the former: 8-adjacency sometimes gives us ambiguous or multiple
paths between the same pair of pixels, as shown in Fig. 3.8. Vrinda ma’am: “IF the question
is to find the shortest path, find all possible paths and mention which one
is the shortest.”
Problem 3.1
Find the shortest (a) 4-path, (b) 8-path and (c) m-path between the two pixels
with the intensity values in boldface in the matrix below. Given, V = {1, 2}.

4 2 3 2
3 3 1 3
2 3 2 2
2 1 2 3

(a): Referring to Fig. 3.9a, a 4-path does not exist between the given pair of
pixels.
(b): Referring to Fig. 3.9b, the length of the shortest 8-path is 4.
(c): Referring to Fig. 3.9c, the length of the shortest m-path is 5.

• Let S be a subset of pixels in an image. Two pixels p and q are said to be
connected in S if there exists a path between them consisting entirely of pixels in
S.


(a) 4- path (b) 8- path (c) m- path

Figure 3.9: Solution: PBLM. 3.1

• For any pixel p in S, the set of pixels that are connected to it in S is called a
connected component.

• If set S has only one connected component, then S is called a connected
set.

• Let R be a subset of pixels in an image. Then, R is a region of the image if R is
a connected set.

• Two regions R1 and R2 are said to be adjacent if their union forms a connected
set.

• Regions that are not adjacent are said to be disjoint.

• 4- and 8- adjacencies are considered when referring to regions.

• Suppose that an image contains K disjoint regions, Rk, k = 1, 2, ..., K. Let Ru
denote the union of all these K sets and (Ru)^C denote its complement. Then all
the points in Ru are called the foreground and those in (Ru)^C the background.

• The boundary (or border, or contour) of a region R is the set of points that are
adjacent to points in the complement of R, i.e., the border of a region is the
set of pixels in that region that have at least one background neighbour.

• The boundary of a finite region forms a closed path and is thus a “global” concept.

• Edges are formed from pixels with derivative values that exceed a preset thresh-
old.

• The idea of an edge is a “local” concept that is based on a measure of intensity-
level discontinuity at a point.

What is the difference between an edge and a boundary? In general
terms, an edge is a more “local” concept based on a measure of gray-level dis-
continuity at a point, whereas a region boundary is a “global” idea, which, in a
finite region, may form a closed path.


Problem 3.2
Consider the two image subsets, S1 and S2, shown in the figure. For
V = {1}, determine whether these two subsets are
(a): 4-adjacent
(b): 8-adjacent
(c): m-adjacent

(a): Not 4-adjacent. (b): 8-adjacent (ref. figure). (c): m-adjacent (ref.
figure).

Problem 3.3
Consider the two image subsets, S1 and S2, shown in the figure. For
V = {1}, determine the number of adjacent pairs between S1 and S2 for each of
(a): 4-adjacency
(b): 8-adjacency
(c): m-adjacency


(a): No. of 4-adjacent pairs: 2, with coordinates [(2, 4), (2, 5)] and [(3, 4), (3, 5)].
(b): No. of 8-adjacent pairs: 6, with coordinates [(1, 4), (0, 5)], [(1, 4), (2, 5)],
[(2, 4), (2, 5)], [(2, 4), (3, 5)], [(3, 4), (2, 6)], [(3, 5), (3, 6)].
(c): No. of m-adjacent pairs: 3, with coordinates [(2, 4), (2, 5)],
[(3, 4), (3, 5)], [(1, 4), (0, 5)].

Lecture 4

Digital Image Fundamentals

4.1 Distance Measures
For pixels p, q and z, with coordinates (x, y), (s, t) and (v, w), respectively, D is a
distance function or metric if

a) D(p, q) ≥ 0 (D(p, q) = 0 iff p = q)

b) D(p, q) = D(q, p), and

c) D(p, z) ≤ D(p, q) + D(q, z)

Different Metrics

1. Euclidean Distance:

• The Euclidean distance between p and q is defined as

De(p, q) = [(x − s)^2 + (y − t)^2]^(1/2) (4.1)

• The pixels having a distance of less than or equal to a value r from a pixel
location, say (x, y), are contained in a disk of radius r centered at (x, y), as
shown in Fig. 4.1.

2. D4 distance or city-block distance:

• The city-block distance between p and q is defined as

D4(p, q) = |x − s| + |y − t| (4.2)

• The pixels having a distance of less than or equal to a value r from a pixel
location, say (x, y), form a diamond centered at (x, y), as shown in Fig. 4.2.


Figure 4.1: Euclidean distance metric

Figure 4.2: City-block distance metric

• The pixels with D4 = 1 are the 4-neighbours of (x, y).

3. D8 distance or chessboard distance:

• The chessboard distance between p and q is defined as

D8(p, q) = max(|x − s|, |y − t|) (4.3)

• The pixels having a distance of less than or equal to a value r from a pixel
location, say (x, y), form a square centered at (x, y), as shown in Fig. 4.3.
• The pixels with D8 = 1 are the 8-neighbours of (x, y).

4. Dm distance

• Defined as the length of the shortest m-path between the two points.
• Unlike the other two distance metrics discussed, the Dm distance depends
on the values of the pixels along the path as well as the values of


Figure 4.3: Chessboard distance metric

(a) P1 = 0 and P3 = 0: Dm distance = 2. (b) P1 = 1: Dm distance = 3.
(c) P3 = 1 and P1 = 0: Dm distance = 3. (d) P1 = P3 = 1: Dm distance = 4.

Figure 4.4: m-paths between P and P4, with P = P2 = P4 = 1 and V = {1}.

the neighbouring pixels.
Refer to Section 3.3 for the definition of m-adjacency. The Dm distance between
P and P4 for several arrangements of pixels is shown in Fig. 4.4. Note that P
and P2 are no longer m-adjacent in Fig. 4.4b.


4.2 Mathematical operations on pixels
4.2.1 Array and matrix operation
• Array operation is carried out on a pixel-by-pixel basis.

• Matrix operation is carried out using matrix theory.

• Consider the following 2 × 2 images

    [a11 a12]        [b11 b12]
    [a21 a22]  and   [b21 b22]

The array (element-wise) product of these two images is

    [a11 a12] [b11 b12]   [a11 b11  a12 b12]
    [a21 a22] [b21 b22] = [a21 b21  a22 b22]

whereas the matrix product is given by

    [a11 a12] [b11 b12]   [a11 b11 + a12 b21  a11 b12 + a12 b22]
    [a21 a22] [b21 b22] = [a21 b11 + a22 b21  a21 b12 + a22 b22]
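The distinction is easy to see numerically. A short NumPy sketch (NumPy and the sample values are illustrative, not part of the notes):

```python
import numpy as np

# Numeric stand-ins for the symbolic 2 x 2 images (values are arbitrary)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

array_product  = A * B   # element-wise: [[a11 b11, a12 b12], [a21 b21, a22 b22]]
matrix_product = A @ B   # matrix theory: rows of A against columns of B

print(array_product)     # [[ 5 12] [21 32]]
print(matrix_product)    # [[19 22] [43 50]]
```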

4.2.2 Linear and non-linear operations
Let L be an operator. Then, L is said to be a linear operator iff

  L(A + B) = L(A) + L(B)   (4.4)

An example of a linear operator is the sum operator ∑, as illustrated below:

  ∑[ai fi(x, y) + aj fj(x, y)] = ∑ ai fi(x, y) + ∑ aj fj(x, y)   (4.5)
                               = ai ∑ fi(x, y) + aj ∑ fj(x, y)   (4.6)
                               = ai gi(x, y) + aj gj(x, y)   (4.7)

where gi(x, y) = ∑ fi(x, y). Thus, the sum operator satisfies Eq. 4.4 and hence it is
a linear operator. L is said to be a non-linear operator iff

  L(A + B) ≠ L(A) + L(B)   (4.8)
An example of a non-linear operator is the max operator, as illustrated below. Let

    A = [2 5]   and   B = [2 1]
        [7 3]             [4 5]

Jerrin
c Thomas Panachakel 21
4.2. MATHEMATICAL OPERATIONS ON PIXELS

Then,

L(A) = max(A) = 7 and (4.9)
L(B) = max(B) = 5 (4.10)
L(A) + L(B) = 12, and (4.11)
L(A + B) = max(A + B) = 11 (4.12)

Clearly,
L(A + B) 6= L(A) + L(B) (4.13)
and hence the operator is non-linear in nature.
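The same counterexample can be run directly. A minimal sketch with the matrices from the text (Python/NumPy is an illustrative choice):

```python
import numpy as np

# The two example images from the text
A = np.array([[2, 5],
              [7, 3]])
B = np.array([[2, 1],
              [4, 5]])

def L(img):
    """The max operator."""
    return int(img.max())

print(L(A) + L(B))   # 12
print(L(A + B))      # 11, so L(A + B) != L(A) + L(B)
```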

4.2.3 Arithmetic operations
• Arithmetic operations are array operations, i.e., they are carried out
between corresponding pairs of pixels.

• Four major arithmetic operations are,

addition: s(x, y) = f (x, y) + g(x, y)
subtraction: s(x, y) = f (x, y) − g(x, y)
multiplication: s(x, y) = f (x, y) × g(x, y)
division: s(x, y) = f (x, y) ÷ g(x, y)

Applications

Addition: Image enhancement. Let g(x, y) = f (x, y) + µ(x, y), where g(x, y) is ob-
tained when the pixel f (x, y) of the original noise-free image is corrupted
by zero-mean noise µ(x, y). If the noise at every pixel coordinate (x, y) is
uncorrelated, it can be shown that if K versions of g(x, y) are averaged, the
averaged image approaches the original noise-free image with very small
variance, provided K is sufficiently large. This is called image averaging
or signal averaging.
Subtraction: For enhancing the difference between two images.
Multiplication/division: Shading correction. Let f (x, y) be the “perfect” image, which when multi-
plied by the shading function h(x, y) gives the image g(x, y). i.e. g(x, y) =
f (x, y) × h(x, y). If the shading function h(x, y) is known or can be approx-
imated by imaging a constant intensity image, f (x, y) can be obtained from
g(x, y) by multiplying g(x, y) by the inverse of h(x, y), i.e., by dividing g by
h.
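The image-averaging argument above can be checked numerically. A sketch with a synthetic image and made-up noise parameters (NumPy is used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.uniform(0, 255, size=(64, 64))   # stand-in for the noise-free image

K = 100
sigma = 20.0
# K noisy observations g_k(x, y) = f(x, y) + zero-mean noise, averaged pixel-wise
g_bar = np.mean(
    [f + rng.normal(0.0, sigma, size=f.shape) for _ in range(K)],
    axis=0,
)

# The residual noise standard deviation drops to roughly sigma / sqrt(K) = 2
print(np.abs(g_bar - f).mean() < sigma)   # True
```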

Lecture 5

Operators (cont’d), Transforms,
Probabilistic Models

5.1 Set operators
• A set is specified by the contents of two braces {·},
e.g.: C = {w | w = −d, d ∈ D}

• The elements of the set as far as DIP is concerned can be pixel coordinates
(ordered pair) or pixel intensities.

• If every element of a set A is also an element of a set B, then A is said to be a
subset of B, denoted as A ⊆ B.

Union: C = A ∪ B; C contains the elements belonging to A, to B, or to both.

Intersection: D = A ∩ B; the elements belonging to both A and B.

Disjoint sets: Two sets A and B are said to be disjoint or mutually exclusive if they have no
common elements, i.e. A ∩ B = ∅

Complement: The complement of A is the set of elements that are not in A,
i.e. Ac = {w | w ∉ A}

Difference: A − B = {w | w ∈ A, w ∉ B} = A ∩ Bc
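These operations map directly onto Python's built-in sets. A small sketch with made-up pixel coordinates (the universe U needed for the complement is an assumption, since the notes do not define one):

```python
# Pixel-coordinate sets (the coordinates are made-up examples)
A = {(0, 0), (0, 1), (1, 1)}
B = {(1, 1), (2, 2)}
U = {(x, y) for x in range(3) for y in range(3)}   # universe of coordinates

union        = A | B          # C = A ∪ B
intersection = A & B          # D = A ∩ B
complement_A = U - A          # A^c, taken relative to the universe U
difference   = A - B          # A − B

# A − B = A ∩ B^c
assert difference == A & (U - B)
```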

5.2 Logical operations
• Performed on binary images.

• Analogous to union, complement, etc., we have the logical AND, OR, NOT, etc. here.

• Used in morphological image processing.


5.3 Geometric spatial transformation
• Geometric transformation modifies the spatial relationship between pixels in an
image.

• Consists of two basic operations

1. a spatial transformation of coordinates
2. intensity interpolation that assigns intensity values to the spatially trans-
formed pixels

• The transformation of coordinates may be expressed as

(x, y) = T {(v, w)} (5.1)
where (v, w) are pixel coordinates in the original image and (x, y) are the corre-
sponding pixel coordinates in the transformed image.

• The most commonly used coordinate transformation is the affine transformation (from
the Latin affinis: “connected with”).

• The general form of the affine transformation is

                              [t11 t12 0]
  [x y 1] = [v w 1] T = [v w 1] [t21 t22 0]   (5.2)
                              [t31 t32 1]

• Depending on the transformation matrix T , the transformation can be any one of those
described in Table 5.1.

5.4 Image Registration
• Used to align two or more images.

• For example, to align two images taken at the same time but using different imaging
systems (such as MRI and PET).

• Or to align two images taken using the same imaging system but at different
times, such as satellite images of a given location taken several days, months or years apart.

• We have the input image and the reference image (the image to which the input is
to be aligned). What we need to find is the transformation that will perform this
alignment or registration.

• For this, we use what is called tie points or control points which are corresponding
points whose location are known precisely in the input and the reference images.


Transformation Name    Affine Matrix T (rows)                       Coordinate Equations
Identity               [1 0 0; 0 1 0; 0 0 1]                        x = v, y = w
Scaling                [cx 0 0; 0 cy 0; 0 0 1]                      x = cx v, y = cy w
Rotation               [cos θ  sin θ  0; −sin θ  cos θ  0; 0 0 1]   x = v cos θ − w sin θ, y = v sin θ + w cos θ
Translation            [1 0 0; 0 1 0; tx ty 1]                      x = v + tx, y = w + ty
Shear (vertical)       [1 0 0; sv 1 0; 0 0 1]                       x = v + sv w, y = w
Shear (horizontal)     [1 sh 0; 0 1 0; 0 0 1]                       x = v, y = sh v + w

Table 5.1: Affine Transformations (each matrix is written row by row, rows separated by semicolons)
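The row-vector convention of Eq. 5.2 can be verified numerically. A minimal NumPy sketch (Python is an illustrative choice, and the 90° rotation is an arbitrary example):

```python
import numpy as np

theta = np.deg2rad(90.0)   # an arbitrary example angle

# Rotation matrix from Table 5.1, row-vector convention [x y 1] = [v w 1] T
T = np.array([
    [ np.cos(theta), np.sin(theta), 0.0],
    [-np.sin(theta), np.cos(theta), 0.0],
    [ 0.0,           0.0,           1.0],
])

v, w = 1.0, 0.0
x, y, _ = np.array([v, w, 1.0]) @ T

# x = v cos(theta) - w sin(theta) = 0, y = v sin(theta) + w cos(theta) = 1
```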

• There are several methods for selecting the tie points: 1) interactively; 2) using
algorithms that attempt to detect the points automatically; or 3) by embedding
artifacts (such as small metallic objects) in the imaging sensors, which produce
a set of known points, or reseau marks, directly on all images; these can be used
as guides for establishing the tie points.

• After spatial transformation, we have to apply intensity interpolation to assign
values to the pixels.
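The notes do not specify how the transformation is computed from the tie points; a common approach (an assumption here, not stated above) is to estimate the affine parameters by least squares. A sketch with made-up tie points:

```python
import numpy as np

# Tie points: (v, w) in the input image, (x, y) in the reference image.
# The points and the underlying transform are made up for the demo.
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
dst = src @ np.array([[2.0, 0.0], [0.0, 2.0]]) + np.array([3.0, 4.0])

# Solve [v w 1] T = [x y] for the 3 x 2 affine parameter block, least squares
A = np.hstack([src, np.ones((len(src), 1))])
T, *_ = np.linalg.lstsq(A, dst, rcond=None)

# The recovered transform maps the tie points onto the reference points
assert np.allclose(A @ T, dst)
```

At least three non-collinear tie-point pairs are needed to pin down the six affine parameters; extra points make the least-squares fit robust to small localisation errors.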

5.5 Vector and Matrix operation
• Images are represented as matrices, with each element corresponding to a pixel in-
tensity value at the spatial location given by the coordinates of the element.

• A color image in the RGB space will have a red, a blue and a green component
per pixel.

• Thus, each pixel may be considered as a vector with elements corresponding to
the component values, as shown below:

      [zR]
  z = [zG]
      [zB]

where z is the pixel under consideration and zR, zG and zB are the differ-
ent component values of this pixel.


• Once pixels are represented using matrices and vectors, we can make use of the tools
of vector-matrix theory. For example, the Euclidean distance D between a pixel
vector x and an arbitrary point a is defined by the inner product

  D(x, a) = [(x − a)^T (x − a)]^(1/2)   (5.3)
          = [(x1 − a1)² + (x2 − a2)² + · · · + (xn − an)²]^(1/2)   (5.4)

5.6 Image transformation
• Similar to one-dimensional signal processing, in image processing too, we trans-
form a given image from the spatial domain to a transform domain.
• For example, the JPEG compression standard uses the DCT, and JPEG-2000 uses
the wavelet transform.

Figure 5.1: Image transformation

• General block diagram is given in Fig. 5.1.

5.7 Probabilistic Methods
• Pixel intensity values can be treated as random quantities.
• Let zi, i = 0, 1, 2, ..., L − 1 denote the values of all possible intensities in an M × N
image (e.g. for an 8 Bits Per Pixel (BPP) image, L = 256 and hence the pixel
intensities vary from 0 to 255).
• The probability of occurrence of the intensity level zk, denoted by p(zk), is given
by

  p(zk) = nk / (MN)   (5.5)

where nk is the number of times the intensity zk occurs in the image and MN is
the number of pixels.
• From probability theory or otherwise, we know that

  ∑(k=0 to L−1) p(zk) = 1   (5.6)


Mean

  m = ∑(k=0 to L−1) zk p(zk)   (5.7)

Variance

  σ² = ∑(k=0 to L−1) (zk − m)² p(zk)

• Variance is a measure of the spread of the pixel intensity values about the mean and
hence is a useful measure of image contrast.

Lecture 6

Histogram Processing

Let T (r) be a transformation on the pixel values r of the input image that gives s, the pixel
values of the transformed image, i.e.,

  s = T (r)   (6.1)

where 0 ≤ r ≤ L − 1. We assume the following:

1. T (r) is a monotonically increasing function in the interval 0 ≤ r ≤ L − 1,
and

2. 0 ≤ T (r) ≤ L − 1 for 0 ≤ r ≤ L − 1.

It may be noted that we assume the transformation to be monotonically increas-
ing rather than strictly monotonically increasing. The difference is that a
function is said to be strictly monotonically increasing iff T (r2) > T (r1) for r2 > r1, and
monotonically increasing if T (r2) ≥ T (r1) for r2 > r1. That is, a monotonically increasing
transformation is allowed to map multiple values to the same element, whereas a strictly
monotonically increasing function allows only unique mappings (one-to-one mapping),
i.e., no two elements of the domain can map to the same element of the range.

One issue with a merely monotonically increasing function is that the inverse mapping
may not exist, since multiple elements may be mapped to the same element of the
range. So, if we want the inverse transformation r = T⁻¹(s), 0 ≤ s ≤ L − 1, to exist, we
change assumption one to “T (r) is a strictly monotonically increasing function
in the interval 0 ≤ r ≤ L − 1”.

6.1 Histogram Equalization
Consider the transformation

  sk = T (rk) = (L − 1) ∑(j=0 to k) pr(rj)   (6.2)

             = [(L − 1)/(MN)] ∑(j=0 to k) nj,   k = 0, 1, 2, ..., L − 1   (6.3)


This transformation is known as histogram equalization or histogram linearisa-
tion. The transformed image is obtained by replacing each pixel of intensity rk
in the input image with a pixel of intensity sk.
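Eq. 6.2 takes only a few lines to implement. A minimal sketch (the synthetic ramp image is a made-up test case, and rounding the output to integer levels is an implementation choice not specified in the notes):

```python
import numpy as np

def equalize(img, L=256):
    """Histogram equalization, Eq. 6.2: s_k = (L - 1) * sum_{j<=k} p_r(r_j)."""
    M, N = img.shape
    hist = np.bincount(img.ravel(), minlength=L)
    cdf = np.cumsum(hist) / (M * N)            # running sum of p_r(r_j)
    s = np.round((L - 1) * cdf).astype(img.dtype)
    return s[img]                              # replace each r_k by s_k

# A synthetic "dark" ramp image: intensities occupy only [0, 63]
dark = np.arange(64, dtype=np.uint8).repeat(16).reshape(32, 32)
eq = equalize(dark)
print(dark.max(), eq.max())   # 63 255: the dynamic range is stretched
```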
