
UNIT 1 DIGITAL IMAGE FUNDAMENTALS

ELEMENTS OF VISUAL PERCEPTION :

The field of digital image processing is built on a foundation of mathematical and probabilistic formulations, but human intuition and analysis play the main role in choosing between the various techniques, and that choice is largely made on subjective, visual judgements.
In human visual perception, the eyes act as the sensor or camera, neurons act as the connecting cable and the brain acts
as the processor. 
The basic elements of visual perceptions are: 
 
1. Structure of Eye
2. Image Formation in the Eye
3. Brightness Adaptation and Discrimination
Structure of Eye: 
 
The human eye is a slightly asymmetrical sphere with an average diameter of about 20 mm to 25 mm and a volume of roughly 6.5 cc. The eye works much like a camera: an external object is seen just as a camera takes a picture of it. Light enters the eye through a small opening called the pupil, a black-looking aperture that contracts when exposed to bright light, and is focused on the retina, which acts like camera film.
The lens, iris and cornea are nourished by a clear fluid, the aqueous humour, which fills the anterior chamber. The fluid flows from the ciliary body to the pupil and is absorbed through channels in the angle of the anterior chamber. The delicate balance of aqueous production and absorption controls the pressure within the eye.

The eye contains between 6 and 7 million cones, which are highly sensitive to colour. Humans see coloured images in daylight because of these cones; cone vision is also called photopic or bright-light vision.
Rods are far more numerous, between 75 and 150 million, and are distributed over the retinal surface. Rods are not involved in colour vision and are sensitive to low levels of illumination.
Image Formation in the Eye: 
The image is formed when the lens of the eye focuses an image of the outside world onto a light-sensitive membrane at the back of the eye called the retina. The lens focuses light onto the photoreceptive cells of the retina, which detect the photons of light and respond by producing neural impulses.
 
The distance between the lens and the retina is about 17mm and the focal length is approximately 14mm to 17mm.  
Brightness Adaptation and Discrimination: 
Digital images are displayed as a discrete set of intensities. The eye's ability to discriminate between black and white at different intensity levels is an important consideration when presenting image processing results.
 
The range of light intensity levels to which the human visual system can adapt is of the order of 10^10, from the scotopic threshold to the glare limit. In photopic vision alone, the range is about 10^6.
 

IMAGE ACQUISITION SYSTEM:


https://buzztech.in/image-acquisition-in-digital-image-processing/

Sampling & Quantization in Digital Image Processing


In digital image processing, signals captured from the physical world need to be translated into digital form by a “digitization” process. To become suitable for digital processing, an image function f(x,y) must be digitized both spatially and in amplitude. This digitization involves two main processes:

1. Sampling: Digitizing the co-ordinate value is called sampling.

2. Quantization: Digitizing the amplitude value is called quantization

Typically, a frame grabber or digitizer is used to sample and quantize the analogue video signal.

Sampling

An analogue image is continuous not just in its coordinates (x axis) but also in its amplitude (y axis); the part of digitization that deals with the coordinates is known as sampling. Sampling is done on the independent variable: for the equation y = sin(x), it is done on the x variable.
Looking at such a signal, we can see random variations caused by noise. By taking samples we reduce this noise: the more samples we take, the better the image quality and the more the noise is suppressed, and vice versa. However, sampling on the x axis alone does not convert the signal to digital form; the y axis must also be sampled, which is known as quantization.

Sampling is related to image pixels. The total number of pixels in an image is calculated as pixels = number of rows × number of columns. For example, if we have a total of 36 pixels, we could have a square image of 6 × 6. As more samples eventually result in more pixels, 36 pixels means that 36 samples of the continuous signal were taken along the x axis, corresponding to the 36 pixels of this image. The number of samples is also directly equal to the number of sensor elements on the CCD array.
Here is an example for image sampling and how it can be represented using a graph.

Quantization

Quantization is the counterpart of sampling: it is done on the y axis, while sampling is done on the x axis. Quantization is the process of transforming a real-valued sampled image into one that takes only a finite number of distinct values; under quantization, the amplitude values of the image are digitized. In simple words, when you quantize an image you divide the signal into quanta (partitions).
Now let's see how quantization is done. Levels are assigned to the values generated by the sampling process. In the image shown in the sampling explanation, although the samples had been taken, they still spanned a continuous range of gray-level values vertically. In the image shown below, these vertically ranging values have been quantized into 5 different levels or partitions, ranging from 0 (black) to 4 (white). The number of levels can vary according to the type of image you want.

Quantization is related to gray-level resolution. The quantized image above has 5 different levels of gray, which means the image formed from this signal would contain only 5 different shades: more or less a black-and-white image with a few shades of gray. To improve the quality of the image, we can increase the number of levels assigned to the sampled image. If we increase the number of levels to 256, we have an ordinary grayscale image. Whatever level we assign is called the gray level. Most digital image-processing devices quantize into k equal intervals; if b bits per pixel are used, the number of levels is k = 2^b.

The number of quantization levels should be high enough for human perception of fine shading details in the image. The main problem in an image quantized with an insufficient number of brightness levels is the occurrence of false contours. Here is an example of the image quantization process.
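To make the two steps concrete, here is a minimal NumPy sketch (the synthetic ramp image and the sampling/quantization factors are illustrative choices, not values from the text): sampling keeps every n-th coordinate, and quantization maps the amplitudes onto k = 2^b equally spaced gray levels.

import numpy as np

# synthetic continuous-tone image with values in [0, 1] (a smooth gray ramp)
x = np.linspace(0, 1, 256)
img = np.outer(x, x)

# Sampling: keep every 4th row and column (a coarser spatial grid, fewer pixels)
sampled = img[::4, ::4]

# Quantization: map the amplitudes to k = 2**b equally spaced gray levels
b = 3                                                # bits per pixel
k = 2 ** b                                           # number of gray levels
levels = np.minimum(np.floor(sampled * k), k - 1)    # integer level index 0..k-1
quantized = levels / (k - 1)                         # back to [0, 1] for display

Fewer samples and fewer levels give a blockier image with visible false contours; increasing either brings the digital image closer to the original.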
IMAGE FORMATION:

In modeling any image formation process, geometric primitives and transformations are crucial for projecting 3-D geometric features onto 2-D features. However, apart from geometric features, image formation also depends on discrete color and intensity values; it requires knowledge of the lighting of the environment, the camera optics, sensor properties, etc. Therefore, when talking about image formation in computer vision, this article focuses on photometric image formation.

2.1 Photometric Image Formation

Fig. 1 gives a simple explanation of image formation. The light from a source is reflected on a particular
surface. A part of that reflected light goes through an image plane that reaches a sensor plane via optics.

Fig: 1. Photometric Image Formation (Image credit: Szeliski, Computer Vision: Algorithms and Applications 2010)
Some factors that affect image formation are:

 The strength and direction of the light emitted from the source.

 The material and surface geometry along with other nearby surfaces.

 Sensor Capture properties

2.1.1 Reflection and Scattering

Images cannot exist without light. A light source can be a point source or an area source. When light hits a surface, three major reactions might occur:

1. Some light is absorbed. This depends on a factor ρ called the albedo; a surface with a low ρ absorbs more light.

2. Some light gets reflected diffusively, which is independent of viewing direction. It follows Lambert’s
cosine law that the amount of reflected light is proportional to cos(θ). E.g., cloth, brick.

3. Some light is reflected specularly, which depends on the viewing direction. E.g., mirror.
Fig. 2: Models of reflection (Image credits: Derek Hoiem, University of Illinois)

Apart from the above models of reflection, the most common model of light scattering is the Bidirectional
Reflectance Distribution Function (BRDF). It gives the measure of light scattered by a medium from
one direction into another. The scattering of the light can reveal the topography of the surface: smooth
surfaces reflect almost entirely in the specular direction, while with increasing roughness the light tends to
be scattered into all possible directions. Eventually, an object will appear equally bright throughout the outgoing
hemisphere if its surface is perfectly diffuse (i.e., Lambertian). Owing to this, the BRDF can give valuable
information about the nature of the target sample.

There are multiple other shading models and ray tracing approaches that are used in unison to properly
understand the environment by evaluating the appearance of the scene.
2.1.2 Color

From a viewpoint of color, we know visible light is only a small portion of a large electromagnetic spectrum.

Two factors are noticed when a colored light arrives at a sensor:

 Colour of the light

 Colour of the surface

The Bayer grid (or Bayer filter) is an important development for capturing the color of light. In a camera, not every
sensor element captures all three components (RGB) of light. Inspired by the human visual receptors, Bayer
proposed a grid in which 50% of the sensors are green, 25% red and 25% blue.

A demosaicing algorithm is then used to obtain a full-color image, in which the surrounding pixels are used to
estimate the missing values for a particular pixel.

Many other color filters besides the Bayer filter have been developed to sense color.
Fig. 3: (a) Bayer arrangement of filters on an image sensor. (b) The cross-section of the sensor.

2.2 Image sensing Pipeline (The digital camera)

The light originates from multiple light sources, gets reflected on multiple surfaces, and finally enters the
camera where the photons are converted into the (R, G, B) values that we see while looking at a digital image.

An image sensing pipeline in the camera follows the flowchart that is given in fig. 4.
In a camera, the light first falls on the lens (optics). Following that is the aperture and shutter which can be
specified or adjusted. Then the light falls on sensors which can be CCD or CMOS (discussed below), then the
image is obtained in an analog or digital form and we get the raw image.

Typically, cameras do not stop here. They apply the demosaicing algorithms mentioned above, sharpen the
image if required, and run any other important processing algorithms. After this, white balancing and
other digital signal processing tasks are performed, and the image is finally compressed to a suitable format and
stored.

Fig. 4: Image sensing pipeline in a camera (Image credit: Szeliski, Computer Vision: Algorithms and Applications 2010)
2.2.1 CCD vs CMOS

The camera sensor can be a CCD or a CMOS device. In a charge-coupled device (CCD), a charge is generated at each
sensing element; this photogenerated charge is moved from pixel to pixel and is converted into a voltage at
the output node. An analog-to-digital converter (ADC) then converts the value of each pixel to a digital value.

Complementary metal-oxide-semiconductor (CMOS) sensors work by converting charge to voltage
inside each element, as opposed to a CCD, which accumulates and shifts the charge. The CMOS output is
typically digitized on-chip, so a separate ADC stage is not needed. CMOS is the most widely used sensor in
today's cameras.

Fig. 5: CCD vs CMOS (Image credit: D. Litwiller, CMOS vs. CCD: Maturing technologies, maturing markets)

2.2.2 Properties of Digital Image Sensor

Let us look at some properties that you may see while clicking a picture on a camera.
Shutter Speed: It controls the amount of light reaching the sensor

Sampling Pitch: It defines the physical space between adjacent sensor cells on the imaging chip.

Fill Factor: It is the ratio of active sensing area size with respect to the theoretically available sensing area
(product of horizontal and vertical sampling pitches)

Chip Size: Entire size of the chip

Sensor Noise: Noise from various sources in the sensing process

Resolution: It tells you how many bits are specified for each pixel.

Post-processing: Digital image enhancement methods used before compression and storage.

Fundamentals of Image Formation


Image formation is the analog-to-digital conversion of an image, carried out with 2-D sampling and quantization by capturing devices such as cameras. In general, we see a 2-D view of the 3-D world, and analog image formation works in the same way: the 3-D world (the analog scene) is projected onto a 2-D plane to give our digital image.
Generally, a frame grabber or a digitizer is used for sampling and quantizing the analog signals.

Imaging:

The mapping of a 3-D world object onto a 2-D digital image plane is called imaging. To do so, each point on the 3-D object must correspond to a point on the image plane. Light reflects from every object that we see, which enables us to capture those light-reflecting points on the image plane.
Various factors determine the quality of the image, such as spatial factors and the lens of the capturing device.
 
Color and Pixelation:

In digital imaging, a frame grabber is placed at the image plane and acts like a sensor. The light reflected by the 3-D object is focused onto it, and the continuous image is pixelated; the light focused on the sensor generates an electronic signal.

Each pixel that is formed may be coloured or gray depending on the intensity of the sampled and quantized light that is reflected and the electronic signal generated from it.
All these pixels together form the digital image. The density of these pixels determines the image quality: the higher the density, the clearer and higher-resolution the image.

Forming a Digital Image:

To form or create a digital image, the continuous data must be converted into digital form. Two main steps are required:
 Sampling (2D): Sampling determines the spatial resolution of the digital image, and the sampling rate determines the quality of the digitized image. The magnitude of the sampled image is stored as a value in image processing. Sampling relates to the coordinate values of the image.
 Quantization: Quantization determines the number of gray levels in the digital image. The transition of the continuous values of the image function to their digital equivalents is called quantization. It relates to the intensity values of the image.
 A high number of quantization levels is needed for a human observer to perceive the fine shading details of the image; more quantization levels result in a clearer image.
Geometric Operations

 Filters and point operations change intensity; pixel position (and geometry) is unchanged
 Geometric operations change the image geometry
 Examples: translating, rotating, scaling an image

Figure: Examples of geometric operations
Geometric Operations

 Example applications of geometric operations:
 Zooming images or windows to arbitrary size
 Computer graphics: deform textures and map to arbitrary surfaces
 Definition: a geometric operation transforms image I to a new image I' by modifying the coordinates of image pixels
 The intensity value originally at (x,y) is moved to a new position (x',y')

Figure: Example of translation, a geometric operation that moves the value at (x, y) to (x + dx, y + dy)
Geometric Operations

 Since image coordinates can only be discrete values, some transformations may yield an (x',y') that is not discrete
 Solution: interpolate nearby values
Simple Mappings
 Translation: shift by a vector (dx, dy)
 Scaling: contracting or stretching along the x or y axis by a factor sx or sy
 Shearing: along the x and y axis by factors bx and by
 Rotation: rotating the image by an angle α

Image Flipping & Rotation by 90 degrees

 We can achieve 90- and 180-degree rotations easily
 Basic idea: look up a transformed pixel address instead of the current one
 To flip an image upside down: at pixel location (x, y), look up the color at location (x, 1 – y)
 For a horizontal flip: at pixel location (x, y), look up (1 – x, y)
 To rotate an image 90 degrees counterclockwise: at pixel location (x, y), look up (y, 1 – x)
 (These rules assume coordinates normalized to the range [0, 1].)
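As a quick illustration of the look-up idea, here is a small NumPy sketch. The rules above use normalized (x, y) coordinates; with array row/column indices the same flips and rotations become simple slicing operations (the stand-in 3 × 4 array is purely illustrative).

import numpy as np

img = np.arange(12).reshape(3, 4)   # stand-in image: rows index y, columns index x

flip_vertical   = img[::-1, :]      # upside down: row y reads from row (H - 1 - y)
flip_horizontal = img[:, ::-1]      # horizontal flip: column x reads from column (W - 1 - x)
rot90_ccw       = img.T[::-1, :]    # 90 degree counter-clockwise (same result as np.rot90(img))
rot180          = img[::-1, ::-1]   # 180 degree rotation

The same results are available as ready-made calls such as np.flipud, np.fliplr and np.rot90.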
Image Flipping, Rotation and Warping

 Image warping: we can use a function to select which pixel somewhere else in the image to look up
 For example, apply a function to both texel coordinates (x, y), such as

x' = x + y · sin(ω · x)

Homogeneous Coordinates
 Notation useful for converting scaling, translation and rotation into point-matrix multiplication
 To convert ordinary coordinates into homogeneous coordinates, append a third component equal to 1: (x, y) -> (x, y, 1)
Affine (3-Point) Mapping
 Can use homogeneous coordinates to rewrite translation, rotation, scaling, etc. as vector-matrix multiplication
 Affine mapping: can then derive the values of the matrix that achieve a desired transformation (or combination of transformations)
 The inverse of the transform matrix is the inverse mapping

 What's so special about affine mapping? It maps:
  straight lines -> straight lines
  triangles -> triangles
  rectangles -> parallelograms
  parallel lines -> parallel lines
  distance ratios on lines do not change
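Below is a sketch of an affine geometric operation implemented with homogeneous coordinates and inverse mapping: each output pixel looks up where it came from in the input, using nearest-neighbour interpolation. The particular transform (a 30° rotation plus a shift) and the function name affine_warp are illustrative choices, not anything prescribed by the text.

import numpy as np

def affine_warp(img, A):
    """Warp a 2-D gray image with a 3x3 affine matrix A (forward transform)."""
    H, W = img.shape
    A_inv = np.linalg.inv(A)                                 # inverse mapping: output -> input
    ys, xs = np.mgrid[0:H, 0:W]
    ones = np.ones_like(xs)
    dst = np.stack([xs.ravel(), ys.ravel(), ones.ravel()])   # homogeneous coordinates
    src = A_inv @ dst                                        # source position of every output pixel
    sx = np.rint(src[0]).astype(int)                         # nearest-neighbour interpolation
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    out = np.zeros_like(img)
    out.reshape(-1)[valid] = img[sy[valid], sx[valid]]
    return out

theta, dx, dy = np.deg2rad(30), 20.0, 10.0                   # rotation plus translation
A = np.array([[np.cos(theta), -np.sin(theta), dx],
              [np.sin(theta),  np.cos(theta), dy],
              [0.0,            0.0,           1.0]])

Using the inverse matrix here is exactly the point made above that the inverse of the transform matrix is the inverse mapping: it guarantees that every output pixel receives a value.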
Non-Linear Image Warps

Figure: Original image and its twirl, ripple and spherical warps


Twirl
 Notation: instead of using the texture colors at (x',y'), use the texture colors at a twirled (x,y) location
 Twirl: rotate the image by an angle α about a center or anchor point (xc,yc)
 The amount of rotation applied varies with the radial distance r from the center (up to rmax)
 The image is unchanged outside the radial distance rmax
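A possible implementation of the twirl is sketched below, using inverse mapping and nearest-neighbour lookup. The default α and the rule that the rotation decays to zero at rmax follow a common formulation of this warp (so that the image is left unchanged outside that radius); treat them as assumptions rather than values from the slides.

import numpy as np

def twirl(img, alpha=np.pi / 2, r_max=None):
    H, W = img.shape
    xc, yc = W / 2.0, H / 2.0                      # anchor point (image centre)
    r_max = r_max if r_max is not None else min(xc, yc)
    ys, xs = np.mgrid[0:H, 0:W]
    dx, dy = xs - xc, ys - yc
    r = np.hypot(dx, dy)                           # radial distance from the centre
    beta = np.arctan2(dy, dx) + alpha * (r_max - r) / r_max   # angle offset shrinks with r
    sx = np.where(r <= r_max, xc + r * np.cos(beta), xs)      # unchanged outside r_max
    sy = np.where(r <= r_max, yc + r * np.sin(beta), ys)
    sx = np.clip(np.rint(sx).astype(int), 0, W - 1)
    sy = np.clip(np.rint(sy).astype(int), 0, H - 1)
    return img[sy, sx]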
Ripple

 Ripple causes a wavelike displacement of the image along both the x and y directions
 Sample values for the parameters (in pixels) are:
  τx = 120
  τy = 250
  ax = 10
  ay = 15
Spherical Transformation

 Imitates viewing the image through a lens placed over it
 Lens parameters: center (xc, yc), lens radius rmax and refraction index ρ
 Sample values: ρ = 1.8 and rmax = half the image width
Types of Images
There are three types of images. They are as following:

1. Binary Images
It is the simplest type of image. It takes only two values, i.e., black and white, or 0 and 1. A binary image is a 1-bit image: only 1 binary digit is needed to represent each pixel. Binary images are mostly used for general shape or outline information.

For example: Optical Character Recognition (OCR).

Binary images are generated using a threshold operation: pixels above the threshold value are turned white ('1') and pixels below the threshold value are turned black ('0').

2. Gray-scale images
Grayscale images are monochrome images: they contain no colour information, only shades of a single tone. Each pixel holds one of the available gray levels.

A typical grayscale image uses 8 bits per pixel, giving 256 different gray levels. In medical imaging and astronomy, 12- or 16-bits-per-pixel images are used.
3. Colour images
Colour images are three-band monochrome images in which each band corresponds to a different colour; the actual information stored in the digital image is the gray-level information in each spectral band.

The images are represented with red, green and blue bands (RGB images), and each colour image has 24 bits/pixel, i.e. 8 bits for each of the three colour bands (RGB).

8-bit color format


The 8-bit colour format is used for storing image information in a computer's memory or in an image file. In this format, each pixel occupies one 8-bit byte and takes values in the range 0-255, where 0 is used for black, 255 for white and 127 for mid-gray. The 8-bit format described here is also known as a grayscale image. It was initially used by the UNIX operating system.

16-bit color format


The 16-bit colour format is also known as the high colour format. It provides 65,536 different colour shades and is used in systems developed by Microsoft. The 16-bit format is distributed across the Red, Green and Blue channels (RGB format).

In this RGB format there are 5 bits for R, 6 bits for G and 5 bits for B. The additional bit is given to green because the human eye is most sensitive to green.
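As a small illustration of the 5-6-5 layout, the sketch below packs an ordinary 8-bit-per-channel colour into a single 16-bit value and unpacks it again (the helper names are hypothetical, not part of any standard API):

def pack_rgb565(r, g, b):
    """r, g, b are integers in 0-255; returns one 16-bit value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(value):
    r = (value >> 11) & 0x1F          # 5 bits
    g = (value >> 5) & 0x3F           # 6 bits
    b = value & 0x1F                  # 5 bits
    return (r << 3, g << 2, b << 3)   # scale back towards 0-255

print(hex(pack_rgb565(255, 255, 255)))   # 0xffff, i.e. white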

24-bit color format


The 24-bit colour format is also known as the true colour format. The 24 bits are likewise distributed across Red, Green and Blue: since 24 divides evenly by 3, each colour gets 8 bits, i.e. 8 bits for R, 8 bits for G and 8 bits for B.

Relationships between pixels (Neighbours and Connectivity)

An image is denoted by f(x,y), and p, q are used to represent individual pixels of the image.

Neighbours of a pixel

A pixel p at (x,y) has 4 horizontal/vertical neighbours at (x+1,y), (x-1,y), (x,y+1) and (x,y-1). These are called the 4-neighbours of p: N4(p).

A pixel p at (x,y) has 4 diagonal neighbours at (x+1,y+1), (x+1,y-1), (x-1,y+1) and (x-1,y-1). These are called the diagonal neighbours of p: ND(p).

The 4-neighbours together with the diagonal neighbours of p are called the 8-neighbours of p: N8(p).
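A small Python helper, given here as a sketch, lists N4(p), ND(p) and N8(p) for a pixel p = (x, y) and discards coordinates that fall outside an image of the given size:

def neighbours(x, y, width, height):
    n4 = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]                   # N4(p)
    nd = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]   # ND(p)
    inside = lambda p: 0 <= p[0] < width and 0 <= p[1] < height
    n4 = [p for p in n4 if inside(p)]
    nd = [p for p in nd if inside(p)]
    return n4, nd, n4 + nd                                                  # N8(p) = N4(p) ∪ ND(p)

print(neighbours(0, 0, 5, 5))   # a corner pixel has only 2 + 1 neighbours inside the image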

Adjacency between pixels

Let V be the set of intensity values used to define adjacency.

In a binary image, V = {1} if we are referring to adjacency of pixels with value 1. In a gray-scale image the idea is the same, but the set V typically contains more elements. For example, for adjacency of pixels with possible intensity values 0 to 255, V could be any subset of these 256 values.

We consider three types of adjacency:

a) 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).

b) 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).

c) m-adjacency (mixed adjacency): Two pixels p and q with values from V are m-adjacent if
1. q is in N4(p), or
2. q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.

Connectivity between pixels

It is an important concept in digital image processing, used for establishing the boundaries of objects and the components of regions in an image.

Two pixels are said to be connected:

 if they are adjacent in some sense (neighbouring pixels; 4/8/m-adjacency), and

 if their gray levels satisfy a specified criterion of similarity (e.g. equal intensity levels).

There are three types of connectivity on the basis of adjacency. They are:

a) 4-connectivity: Two or more pixels are said to be 4-connected if they are 4-adjacent to each other.

b) 8-connectivity: Two or more pixels are said to be 8-connected if they are 8-adjacent to each other.

c) m-connectivity: Two or more pixels are said to be m-connected if they are m-adjacent to each other.
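To show connectivity in use, here is a sketch of a breadth-first flood fill that collects every pixel 4-connected to a seed pixel, where "similar" means having a value in the set V (for a binary image, V = {1}). The function name and the tiny example array are illustrative only.

from collections import deque

def connected_region_4(img, seed, V=frozenset({1})):
    """img is a 2-D list/array; seed is (row, col); returns the 4-connected region of the seed."""
    rows, cols = len(img), len(img[0])
    region, queue = set(), deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if img[r][c] not in V:
            continue
        region.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])   # the N4 neighbours
    return region

binary = [[0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 1]]
print(connected_region_4(binary, (0, 1)))   # {(0, 1), (0, 2), (1, 1)}; the lone 1 at (2, 3) is a separate region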

Distance Transform

Common Names: Distance transform

Brief Description
The distance transform is an operator normally only applied to binary images. The
result of the transform is a graylevel image that looks similar to the input image,
except that the graylevel intensities of points inside foreground regions are changed to
show the distance to the closest boundary from each point.
One way to think about the distance transform is to first imagine that foreground
regions in the input binary image are made of some uniform slow burning
inflammable material. Then consider simultaneously starting a fire at all points on the
boundary of a foreground region and letting the fire burn its way into the interior. If
we then label each point in the interior with the amount of time that the fire took to
first reach that point, then we have effectively computed the distance transform of that
region. Figure 1 shows a distance transform for a simple rectangular shape.

Figure 1 The distance transform of a simple shape. Note that we are using the
`chessboard' distance metric.

There is a dual to the distance transform described above which produces the distance
transform for the background region rather than the foreground region. It can be
considered as a process of inverting the original image and then applying the standard
transform as above.

How It Works
There are several different sorts of distance transform, depending upon which distance
metric is being used to determine the distance between pixels. The example shown in
Figure 1 uses the `chessboard' distance metric but both the Euclidean and `city block'
metrics can be used as well.
Even once the metric has been chosen, there are many ways of computing the distance
transform of a binary image. One intuitive but extremely inefficient way of doing it is
to perform multiple successive erosions with a suitable structuring element until all
foreground regions of the image have been eroded away. If each pixel is labeled with
the number of erosions that had to be performed before it disappeared, then this is just
the distance transform. The actual structuring element that should be used depends
upon which distance metric has been chosen. A 3×3 square element gives the
`chessboard' distance transform, a cross shaped element gives the `city block' distance
transform, and a disk shaped element gives the Euclidean distance transform. Of
course it is not actually possible to generate a good disk shaped element on a discrete
grid on a small scale, but there are algorithms that vary the structuring element on
each erosion so as to approximate a circular element.

The distance transform can be calculated much more efficiently using clever
algorithms in only two passes (e.g. Rosenfeld and Pfaltz 1968). This algorithm, which
is based on recursive morphology, will not be described here.
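In practice these efficient algorithms are available off the shelf. For instance, scipy.ndimage (an assumption here, since the text itself does not mention SciPy) provides Euclidean, city-block and chessboard distance transforms of a binary image, where the foreground is the set of non-zero pixels:

import numpy as np
from scipy import ndimage

binary = np.zeros((7, 7), dtype=np.uint8)
binary[1:6, 1:6] = 1                                                    # a small filled square as foreground

d_euclid = ndimage.distance_transform_edt(binary)                       # Euclidean
d_city   = ndimage.distance_transform_cdt(binary, metric='taxicab')     # city block
d_chess  = ndimage.distance_transform_cdt(binary, metric='chessboard')  # chessboard

print(d_chess)   # interior pixels show their chessboard distance to the nearest background pixel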

Guidelines for Use


The distance transform is very closely linked to both the medial axis transform and to
skeletonization. It can also be used to derive various other symmetries from binary
shapes. As such it is usually only used as a step on the way to producing these end
products (and in fact is often only produced in theory rather than in practice).

Here we illustrate the Euclidean distance transform with some examples.

The binary image

becomes

when a distance transform is applied (scaled by a factor of 5).


Similarly,

becomes

(scaled by a factor of 3).

And finally,

becomes

(scaled by a factor of 4).

The distance transform is sometimes very sensitive to small changes in the object. If,
for example, we change the above rectangle to

which contains a small black region in the center of the white rectangle, then the
distance transform becomes
(after brightening the image by a factor of 6). This can be of advantage when we want
to distinguish between similar objects like the two different rectangles above.
However, it can also cause problems when trying to classify objects into classes of
roughly the same shape. It also makes the distance transform very sensitive to noise.
For instance, if we add some `pepper noise' to the above rectangle, as in

the distance transform yields

(brightened by a factor of 15).

An example of applying the distance transform to a real world image is illustrated


with

To obtain a binary input image, we threshold the image at a value of 100, as shown in

The scaled (factor 6) distance transform is


Although the image gives a rough measure for the width of the object at each point, it
is quite inaccurate at places where the object is incorrectly segmented from the
background.

The last three examples show that it is important that the binary input image is a good
representation of the object that we want to process. Simple thresholding is often not
enough. It might be necessary to further process the image before applying the
distance transform.

Fundamental Steps in Digital Image Processing
1. Image Acquisition
This is the first step or process of the fundamental steps of digital image processing.
Image acquisition could be as simple as being given an image that is already in digital
form. Generally, the image acquisition stage involves preprocessing, such as scaling
etc.
2. Image Enhancement
Image enhancement is among the simplest and most appealing areas of digital image
processing. Basically, the idea behind enhancement techniques is to bring out detail
that is obscured, or simply to highlight certain features of interest in an image. Such as,
changing brightness & contrast etc.
3. Image Restoration
Image restoration is an area that also deals with improving the appearance of an image.
However, unlike enhancement, which is subjective, image restoration is objective, in the
sense that restoration techniques tend to be based on mathematical or probabilistic
models of image degradation.
4. Color Image Processing
Color image processing is an area that has been gaining its importance because of the
significant increase in the use of digital images over the Internet. This may include color
modeling and processing in a digital domain etc.
5. Wavelets and Multiresolution Processing
Wavelets are the foundation for representing images at various degrees of resolution. Images are subdivided successively into smaller regions for data compression and for pyramidal representation.
6. Compression
Compression deals with techniques for reducing the storage required to save an image or the bandwidth needed to transmit it. Compression is particularly important for images used on the internet.
7. Morphological Processing
Morphological processing deals with tools for extracting image components that are
useful in the representation and description of shape.
8. Segmentation
Segmentation procedures partition an image into its constituent parts or objects. In
general, autonomous segmentation is one of the most difficult tasks in digital image
processing. A rugged segmentation procedure brings the process a long way toward
successful solution of imaging problems that require objects to be identified individually.
9. Representation and Description
Representation and description almost always follow the output of a segmentation
stage, which usually is raw pixel data, constituting either the boundary of a region or all
the points in the region itself. Choosing a representation is only part of the solution for
transforming raw data into a form suitable for subsequent computer processing.
Description deals with extracting attributes that result in some quantitative information of
interest or are basic for differentiating one class of objects from another.
10. Object recognition
Recognition is the process that assigns a label, such as, “vehicle” to an object based on
its descriptors.
11. Knowledge Base:
Knowledge may be as simple as detailing regions of an image where the information of
interest is known to be located, thus limiting the search that has to be conducted in
seeking that information. The knowledge base also can be quite complex, such as an
interrelated list of all major possible defects in a materials inspection problem or an
image database containing high-resolution satellite images of a region in connection
with change-detection applications.

What is a color model?

A color model aims to facilitate the specification of colors in some standard way.

Different types of color models are used in different fields, for example in hardware and in applications such as animation.

 In digital image processing, the commonly used hardware-oriented model is the RGB model for color monitors and video cameras.
 The CMY (cyan, magenta, yellow) and CMYK (cyan, magenta, yellow, black) models are used for color printing.
 The HSI (hue, saturation, intensity) model deals with colors the way humans interpret them.

Let's discuss each color model and its applications, starting with the most popular one, the RGB model.
RGB Model

In the RGB model, each color appears in terms of its primary components of red, green and blue. This model is based on a Cartesian coordinate system, as you can see in Fig. 2.
Fig.2. RGB color cube

We will see how an image is divided into three components.


import cv2

image = cv2.imread('C:/Users/hp/images/1.jpg')   # OpenCV loads images in B, G, R channel order
image = cv2.resize(image, (300, 300))
b = image[:, :, 0:1]   # blue channel
g = image[:, :, 1:2]   # green channel
r = image[:, :, 2:3]   # red channel
cv2.imshow('B-RGB', b)
cv2.imshow('G-RGB', g)
cv2.imshow('R-RGB', r)
cv2.waitKey(0)
cv2.destroyAllWindows()

 Fig. 3. Input image


Fig.4. R, G and B component of an image

Application-

 The RGB model is widely used in the representation and display of images in
electronic systems like computers and televisions.
 It is also used in conventional photography as well.
 Image scanner which scans images and converts it to a digital image mostly
supports RGB color.
 It is used in web graphics.

CMY Model

This model contains the secondary colors. In this model, a secondary color illuminated by white light does not reflect the primary color that it subtracts.
For example, when cyan is illuminated with white light, no red light is reflected from the surface: the cyan subtracts the red light from the reflected white light (which itself is composed of red, green and blue light).
C = 1 - R,  M = 1 - G,  Y = 1 - B   (Eq. 1)

The formula given in equation 1 is used for inter-conversion between the RGB and CMY models (with the channel values normalised to the range [0, 1]).

Fig. 5. Input Image

Fig.6. C, M and Y components of an image

Application-

 It is used in color printing as it uses colored inks.


 It is used in most commercial printing like magazines, books, etc.
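Because the CMY values are simply the complements of the RGB values (Eq. 1), the conversion is a one-liner for a whole image. A minimal NumPy sketch, assuming channel values normalised to [0, 1]:

import numpy as np

rgb = np.random.rand(4, 4, 3)    # stand-in RGB image with values in [0, 1]
cmy = 1.0 - rgb                  # C = 1 - R, M = 1 - G, Y = 1 - B
rgb_back = 1.0 - cmy             # the conversion is its own inverse
assert np.allclose(rgb, rgb_back)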

HSI Model (Hue, Saturation, Intensity)


It is a very important and attractive color model because it represents colors the same way the human eye perceives them.
We have already read about Saturation and Intensity.

Now, what is Hue?

Hue is a color component that describes a pure color (e.g. pure yellow, orange or red).

The saturation component represents the degree to which a pure color is diluted with white light.

Hue is measured as an angle around the color circle:

 0 degrees – Red
 60 degrees – Yellow
 120 degrees – Green
 240 degrees – Blue
 300 degrees – Magenta

Intensity 

 Range is [0, 1]
 0 means black
 1 means white

The formula for converting RGB to HSI is quite complicated compared to the other color models.

3  Colour Models
Colour models provide a standard way to specify a
particular colour, by defining a 3D coordinate
system, and a subspace that contains all constructible colours within a particular
model. Any colour that can be specified using a model will correspond to a single
point within the subspace it defines. Each colour model is oriented towards either
specific hardware (RGB,CMY,YIQ), or image processing applications (HSI).

3.1  The RGB Model

In the RGB model, an image consists of three independent image planes, one in each
of the primary colours: red, green and blue. (The standard wavelengths for the three
primaries are as shown in figure 1.) A particular colour is specified by giving the
amount of each of the primary components present. Figure 5 shows the geometry of
the RGB colour model for specifying colours using a Cartesian coordinate system.
The greyscale spectrum, i.e. those colours made from equal amounts of each primary,
lies on the line joining the black and white vertices.
Figure

Figure 5: The RGB colour cube. The greyscale spectrum lies on the line joining the
black and white vertices.

This is an additive model, i.e. the colours present in the light add to form new colours,
and it is appropriate for the mixing of coloured light, for example. The image on the left
of figure 6 shows the additive mixing of the red, green and blue primaries to form the
three secondary colours yellow (red + green), cyan (blue + green) and magenta (red +
blue), and white (red + green + blue).

The RGB model is used for colour monitors and most video cameras.

3.2  The CMY Model

The CMY (cyan-magenta-yellow) model is a subtractive model appropriate to the
absorption of colours, for example due to pigments in paints. Whereas the RGB model
asks what is added to black to get a particular colour, the CMY model asks what is
subtracted from white. In this case, the primaries are cyan, magenta and yellow, with
red, green and blue as secondary colours (see the image on the right of figure 6).

When a surface coated with cyan pigment is illuminated by white light, no red light is
reflected; similarly, magenta absorbs green and yellow absorbs blue. The relationship
between the RGB and CMY models is given by:
[C]   [1]   [R]
[M] = [1] - [G]
[Y]   [1]   [B]

The CMY model is used by printing devices and filters.

Figure

Figure 6: The figure on the left shows the additive mixing of red, green and blue
primaries to form the three secondary colours yellow (red + green), cyan (blue +
green) and magenta (red + blue), and white (red + green + blue). The figure on the
right shows the three subtractive primaries, and their pairwise combinations to form
red, green and blue, and finally black by subtracting all three primaries from white.

3.2.1  Why does blue paint plus yellow paint give green?

As all schoolchildren know, the way to make green paint is to mix blue paint with
yellow. But how does this work? If blue paint absorbs all but blue light, and yellow
absorbs blue only, when combined no light should be reflected and black paint result.

However, what actually happens is that imperfections in the paint are exploited. In
practice, blue paint reflects not only blue but also some green. Since the yellow paint
also reflects green (since yellow = green + red), some green is reflected by both
pigments, and all other colours are absorbed, resulting in green paint.
3.3  The HSI Model

As mentioned above, colour may be specified by the three quantities hue, saturation
and intensity. This is the HSI model, and the entire space of colours that may be
specified in this way is shown in figure 7.

Figure

Figure 7: The HSI model, showing the HSI solid on the left, and the HSI triangle on the
right, formed by taking a horizontal slice through the HSI solid at a particular
intensity. Hue is measured from red, and saturation is given by distance from the
axis. Colours on the surface of the solid are fully saturated, i.e. pure colours, and the
greyscale spectrum is on the axis of the solid. For these colours, hue is undefined.

Conversion between the RGB model and the HSI model is quite complicated. The
intensity is given by

I = (R + G + B) / 3,
where the quantities R, G and B are the amounts of the red, green and blue
components, normalised to the range [0,1]. The intensity is therefore just the
average of the red, green and blue components. The saturation is given by:

S = 1 - min(R, G, B) / I = 1 - 3 · min(R, G, B) / (R + G + B),

where the min(R,G,B) term is really just indicating the amount of white present. If
any of R, G or B are zero, there is no white and we have a pure colour. The
expression for the hue, and details of the derivation may be found in reference [1].
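The intensity and saturation formulas above translate directly into code. Here is a small sketch for an RGB image with channels normalised to [0, 1] (hue is omitted, as in the text; the small epsilon guarding against division by zero is an added safeguard):

import numpy as np

def intensity_saturation(rgb):
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (R + G + B) / 3.0
    S = 1.0 - np.min(rgb, axis=-1) / np.maximum(I, 1e-12)   # S = 1 - min(R, G, B) / I
    return I, S

rgb = np.random.rand(4, 4, 3)    # stand-in image
I, S = intensity_saturation(rgb)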
3.4  The YIQ Model

The YIQ (luminance-inphase-quadrature) model is a recoding of RGB for colour television, and is a very important model for colour image processing. The importance of luminance was discussed in § 1.

The conversion from RGB to YIQ is given by:

[Y]   [0.299  0.587  0.114] [R]
[I] = [0.596 -0.275 -0.321] [G]
[Q]   [0.212 -0.523  0.311] [B]

The luminance (Y) component contains all the information required for black and
white television, and captures our perception of the relative brightness of particular
colours. That we perceive green as much lighter than red, and red lighter than blue,
is indicated by their respective weights of 0.587, 0.299 and 0.114 in the first row of
the conversion matrix above. These weights should be used when converting a
colour image to greyscale if you want the perception of brightness to remain the
same. This is not the case for the intensity component in an HSI image, as shown in
figure 8.

The Y component is the same as the CIE primary Y (see § 2.1).

Figure

Figure 8: Image (a) shows a colour test pattern, consisting of horizontal stripes of
black, blue, green, cyan, red, magenta and yellow, a colour ramp with constant
intensity, maximal saturation, and hue changing linearly from red through green to
blue, and a greyscale ramp from black to white. Image (b) shows the intensity for
image (a). Note how much detail is lost. Image (c) shows the luminance. This third
image accurately reflects the brightness variations perceived in the original image.
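The conversion matrix above is easy to apply per pixel with NumPy. The sketch below does so and pulls out the luminance plane (skimage.color.rgb2yiq offers an equivalent ready-made conversion):

import numpy as np

M = np.array([[0.299,  0.587,  0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.523,  0.311]])

def rgb_to_yiq(rgb):
    """rgb has shape (..., 3) with values in [0, 1]; returns Y, I, Q in the last axis."""
    return rgb @ M.T

yiq = rgb_to_yiq(np.random.rand(4, 4, 3))   # stand-in image
luminance = yiq[..., 0]                     # the perceptually weighted greyscale plane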

4  Applying Greyscale Transformations to Colour Images


Given all these different representations of colour, and hence colour images, the
question arises as to what is the best way to apply the image processing techniques we
have covered so far to these images? One possibility is to apply the transformations to
each colour plane in an RGB image, but what exactly does this mean? If we want to
increase the contrast in a dark image by histogram equalisation, can we just equalise
each colour independently? This will result in quite different colours in our
transformed image. In general it is better to apply the transformation to just the
intensity component of an HSI image, or the luminance component of a YIQ image,
thus leaving the chromaticity unaltered.

An example is shown in figure 9. When histogram equalisation is applied to each


colour plane of the RGB image, the final image is lighter, but also quite differently
coloured to the original. When histogram equalisation is only applied to the luminance
component of the image in YIQ format, the result is more like a lighter version of the
original image, as required.

Figure

Figure 9: The top image is a very dark image of a forest scene. The middle image is
the result of applying histogram equalisation to each of the red, green and blue
components of the original image. The bottom image is the result of converting the
image to YIQ format, and applying histogram equalisation to the luminance
component only.
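One way to carry out the luminance-only approach with scikit-image is sketched below: convert to YIQ, equalise only the Y plane, and convert back, leaving the chromaticity untouched ('forest.jpg' is just a placeholder file name):

import numpy as np
from skimage.io import imread
from skimage.color import rgb2yiq, yiq2rgb
from skimage.exposure import equalize_hist

dark = imread('forest.jpg') / 255.0          # float RGB in [0, 1]
yiq = rgb2yiq(dark)
yiq[..., 0] = equalize_hist(yiq[..., 0])     # equalise the luminance plane only
result = np.clip(yiq2rgb(yiq), 0, 1)         # I and Q (the chromaticity) are unchanged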
Basic Image Conversion


As we go through the world of digital images, as described in my article introduction to image processing, the most crucial thing to understand and learn is how to use basic image processing techniques.
Image processing, in this context, is the art of performing different methods, actions, or transformations on an image, whether enhancing it, manipulating it, or changing its content to suit our needs.

At its core, image processing consists of different mathematical operations performed on the image itself, where the image is changed according to the mathematical function that we have defined.

Since we treat the image as a NumPy array, it can be transformed and enhanced using different linear algebra techniques for arrays and matrices.

We start by reading and showing our previous example of a flower bouquet:

from skimage.io import imread, imshow

sample = imread('flower.jpg')
imshow(sample);

Figure 1: Sample Image (Image by Author)


We will now start trying the different basic image conversions!

Showing Grayscale

We can obtain the grayscale version of the image using the rgb2gray function, and display it by passing cmap='gray' to the plot we choose.
from skimage.color import rgb2gray
import matplotlib.pyplot as plt

sample_g = rgb2gray(sample)

fig, ax = plt.subplots(1, 2, figsize=(12, 12))
ax[0].imshow(sample)
ax[0].set_title('Original', fontsize=15)
ax[1].imshow(sample_g, cmap='gray')
ax[1].set_title('Grayscale', fontsize=15)
plt.tight_layout()
plt.show()

Figure 2: Grayscale (Image by Author)

Showing Monochrome or Binarized Image

We can also show the binarized representation of an image by applying thresholding. Remember that binarizing an image means making the value of each pixel in the array either one or zero. To do this, we can use a threshold such as the mean of the pixel values and test whether each pixel value is above or below that mean. Using this threshold we get an array of true/false values, which can then be converted into an unsigned-integer image with the img_as_uint function so that it can be displayed using imshow.
from skimage import img_as_uint

# use the mean of the pixel values as the threshold
mean1 = sample_g.mean()
sample_b = img_as_uint(sample_g > mean1)

fig, ax = plt.subplots(1, 2, figsize=(12, 12))
ax[0].imshow(sample)
ax[0].set_title('Original', fontsize=15)
ax[1].imshow(sample_b, cmap='gray')
ax[1].set_title('Binarize', fontsize=15)
plt.tight_layout()
plt.show()

Figure 3: Binarized (Image by Author)

Notice that to produce Figure 3 we used the mean of the pixel values as the threshold for binarizing the image. The mean is usually chosen as a threshold when we have a sense of where the majority of the pixel values lie. For this sample, the mean is pulled towards the whitest values because most of the pixels are white due to the background, and this is clearly visible in the resulting binarized image.
Showing Color Spaces

We can show the different color spaces of the image. Remember that the image array has 3 (or 4) dimensions in its NumPy shape; one of these is the number of color channels. Normally an image has 3 color channels, namely red, green and blue. We can show the different color channels of an image using the cmap parameter of the imshow function.
fig, ax = plt.subplots(1, 3, figsize=(15,5))
ax[0].imshow(sample[:,:,0], cmap='Reds')
ax[0].set_title('Red',fontsize=15)
ax[1].imshow(sample[:,:,1], cmap='Greens')
ax[1].set_title('Green',fontsize=15)
ax[2].imshow(sample[:,:,2], cmap='Blues')
ax[2].set_title('Blue',fontsize=15);
plt.tight_layout()
plt.show()

Figure 4: Color Channels (Image by Author)

As you can see, we sliced the NumPy array per channel, which is how we obtained the specific pixel values for each color channel.

Showing the HSV Values

We can also show the HSV values using scikit-image functions and cmap parameters. HSV refers to the hue, saturation and value of a pixel. These channels are useful for segmenting different items in a scene, which is important for object-detection models and many other applications. We can use the rgb2hsv function of skimage.
from skimage.color import rgb2hsv

sample_hsv = rgb2hsv(sample)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(sample_hsv[:,:,0], cmap='hsv')
ax[0].set_title('Hue',fontsize=15)
ax[1].imshow(sample_hsv[:,:,1], cmap='hsv')
ax[1].set_title('Saturation',fontsize=15)
ax[2].imshow(sample_hsv[:,:,2], cmap='hsv')
ax[2].set_title('Value',fontsize=15);
plt.tight_layout()
plt.show()

Figure 5: HSV Channels (Image by Author)

Grayscale, Binarized, Color Channel, and the HSV Channel are some of
the image conversion techniques that are usually used for a lot of
machine learning models. There are other variations of conversion in
scikit-image, some further examples seen below.

Using the HED Channel or Haematoxylin-Eosin-DAB (HED) color space
from skimage.color import rgb2hed

sample_hed = rgb2hed(sample)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(sample_hed[:,:,0],cmap='gray')
ax[0].set_title('1st Channel',fontsize=15)
ax[1].imshow(sample_hed[:,:,1],cmap='gray')
ax[1].set_title('2nd Channel',fontsize=15)
ax[2].imshow(sample_hed[:,:,2],cmap='gray')
ax[2].set_title('3rd Channel',fontsize=15);
plt.tight_layout()
plt.show()

Figure 6: HED Color Space (Image by Author)

Using the XYZ color space


from skimage.color import rgb2xyz

sample_xyz = rgb2xyz(sample)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(sample_xyz[:,:,0],cmap='gray')
ax[0].set_title('1st Channel',fontsize=15)
ax[1].imshow(sample_xyz[:,:,1],cmap='gray')
ax[1].set_title('2nd Channel',fontsize=15)
ax[2].imshow(sample_xyz[:,:,2],cmap='gray')
ax[2].set_title('3rd Channel',fontsize=15);
plt.tight_layout()
plt.show()

Figure 7: XYZ Color Space (Image by Author)

Using the YUV color space


from skimage.color import rgb2yuv

sample_yuv = rgb2yuv(sample)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(sample_yuv[:,:,0],cmap='gray')
ax[0].set_title('1st Channel',fontsize=15)
ax[1].imshow(sample_yuv[:,:,1],cmap='gray')
ax[1].set_title('2nd Channel',fontsize=15)
ax[2].imshow(sample_yuv[:,:,2],cmap='gray')
ax[2].set_title('3rd Channel',fontsize=15);
plt.tight_layout()
plt.show()
