The field of digital image processing is built on a foundation of mathematical and probabilistic formulation, but human
intuition and analysis play the main role in choosing between the various techniques, and that choice is
basically made on subjective, visual judgements.
In human visual perception, the eyes act as the sensor or camera, neurons act as the connecting cable and the brain acts
as the processor.
The basic elements of visual perceptions are:
1. Structure of Eye
2. Image Formation in the Eye
3. Brightness Adaptation and Discrimination
Structure of Eye:
The human eye is a slightly asymmetrical sphere with an average diameter of about 20 mm to 25 mm and a volume of about
6.5 cc. The eye works much like a camera: it forms an image of an external object in the same way a camera takes a picture. Light
enters the eye through a small opening called the pupil, a black-looking aperture that contracts when the eye is
exposed to bright light, and the light is focused on the retina, which acts like camera film.
The lens, iris, and cornea are nourished by a clear fluid (the aqueous humour) that fills the anterior chamber. The fluid flows from the ciliary body to the
pupil and is absorbed through channels in the angle of the anterior chamber. The delicate balance between aqueous
production and absorption controls the pressure within the eye.
The eye contains between 6 and 7 million cones, which are highly sensitive to color. Humans see colored images in
daylight because of these cones. Cone vision is also called photopic or bright-light vision.
Rods are far more numerous, between 75 and 150 million, and are distributed over the retinal surface. Rods are not
involved in color vision and are sensitive to low levels of illumination.
Image Formation in the Eye:
The image is formed when the lens of the eye focuses an image of the outside world onto a light-sensitive membrane at the back of the eye called
the retina. The lens focuses light on the photoreceptive cells of the retina, which detect the
photons of light and respond by producing neural impulses.
The distance between the lens and the retina is about 17mm and the focal length is approximately 14mm to 17mm.
Brightness Adaptation and Discrimination:
Digital images are displayed as a discrete set of intensities. The eye's ability to discriminate between black and white at different
intensity levels is an important consideration when presenting image processing results.
The range of light intensity levels to which the human visual system can adapt is of the order of 10^10, from the scotopic
threshold to the glare limit. In photopic vision alone, the range is about 10^6.
Typically, a frame grabber or digitizer is used to sample and quantize the analogue video signal.
Sampling
An analogue image is continuous not just in its coordinates (x axis) but also in its amplitude (y axis). The
part of digitization that deals with the coordinates is known as sampling. Sampling is done on the
independent variable; in the case of the equation y = sin(x), it is done on the x variable.
A continuous signal typically contains random variations caused by noise. In sampling we reduce this
noise by taking samples: the more samples we take, the better the quality of the image and the more the
noise is suppressed, and vice versa. However, sampling on the x axis alone does not convert the signal to
digital form; the y axis must also be sampled, which is known as quantization.
Sampling is related to the number of image pixels. The total number of pixels in an image can be calculated as
Pixels = total number of rows * total number of columns. For example, if we have a total of 36 pixels, we
have a square image of 6 x 6. As we know, more samples eventually result in more pixels, so 36 pixels
mean that 36 samples were taken from the continuous signal along the x axis. The number of samples is
also directly equal to the number of sensors on the CCD array.
Here is an example of image sampling and how it can be represented as a graph.
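To make the idea concrete, here is a minimal Python sketch (not taken from the text's figure) that samples the continuous signal y = sin(x) at two different rates; the 36-sample case loosely corresponds to the 6 x 6 pixel example above.

import numpy as np
import matplotlib.pyplot as plt

# "Continuous" reference signal y = sin(x), approximated on a very fine grid.
x_cont = np.linspace(0, 2 * np.pi, 1000)
y_cont = np.sin(x_cont)

for n_samples in (6, 36):          # 36 samples ~ the 6 x 6 pixel example above
    x_s = np.linspace(0, 2 * np.pi, n_samples)
    y_s = np.sin(x_s)              # sampling: evaluate the signal only at the chosen x positions
    plt.plot(x_cont, y_cont, label='continuous signal')
    plt.stem(x_s, y_s, linefmt='r-', markerfmt='ro', label=f'{n_samples} samples')
    plt.title(f'Sampling with {n_samples} samples')
    plt.legend()
    plt.show()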
Quantization
Quantization is the counterpart of sampling: it is done on the y axis, while sampling is done on the x axis.
Quantization is the process of transforming a real-valued sampled image into one taking only a finite number of
distinct values. Under quantization, the amplitude values of the image are digitized. In simple
words, when you quantize an image, you divide the signal into quanta (partitions).
Now let's see how quantization is done. We assign levels to the values generated by the sampling process. In
the sampling example above, although the samples had been taken, they still spanned a continuous range of
gray-level values vertically. After quantization, these vertically ranging values are mapped to a fixed number
of levels or partitions, for example 5 levels ranging from 0 (black) to 4 (white). The number of levels can vary
according to the type of image you want.
Quantization is related to gray-level resolution. The quantized image above uses
5 different levels of gray, which means the image formed from this signal would contain only 5 distinct tones:
more or less a black-and-white image with a few shades of gray in between.
To improve the quality of the image, we can increase the number of levels assigned to the sampled image. If we
increase this to 256 levels, we have an ordinary gray-scale image. Whatever number of levels we assign is called
the gray level. Most digital image-processing devices quantize into k equal intervals; if b bits per pixel are used,
the number of quantization levels is k = 2^b.
The number of quantization levels should be high enough for human perception of fine shading details in the
image. The main problem in an image quantized with an insufficient number of brightness levels is the
occurrence of false contours. Here is an example of the image quantization process.
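As a small illustration (a sketch, not from the text), the function below maps sampled amplitudes in [0, 1] onto k discrete levels; k = 5 matches the 5-level example above, and k = 256 corresponds to an ordinary 8-bit grayscale image.

import numpy as np

def quantize(values, k):
    # Map real-valued samples in [0, 1] to k equally spaced levels 0 .. k-1.
    levels = np.floor(values * k)
    return np.clip(levels, 0, k - 1).astype(int)

samples = np.array([0.02, 0.33, 0.48, 0.71, 0.99])   # hypothetical sampled amplitudes
print(quantize(samples, k=5))      # [0 1 2 3 4]  -> 5 grey levels, 0 = black, 4 = white
print(quantize(samples, k=256))    # 8 bits per pixel -> 256 grey levels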
IMAGE FORMATION:
In modeling any image formation process, geometric primitives and transformations are crucial for projecting 3-D
geometric features onto 2-D image features. However, apart from geometry, image formation also depends
on discrete color and intensity values: it depends on the lighting of the environment, the camera optics, sensor
properties, and so on. Therefore, when talking about image formation in Computer Vision, this article
focuses on photometric image formation.
Fig. 1 gives a simple explanation of image formation. Light from a source is reflected off a particular
surface. A part of that reflected light passes through the camera optics and reaches the sensor (image) plane.
Fig: 1. Photometric Image Formation (Image credit: Szeliski, Computer Vision: Algorithms and Applications 2010)
Some factors that affect image formation are:
The strength and direction of the light emitted from the source.
The material and surface geometry along with other nearby surfaces.
Images cannot exist without light. A light source can be a point source or an area source. When light hits a
surface, three major reactions can occur:
1. Some light is absorbed. That depends on the factor called ρ (albedo). Low ρ of the surface means more light
will get absorbed.
2. Some light gets reflected diffusively, which is independent of viewing direction. It follows Lambert’s
cosine law that the amount of reflected light is proportional to cos(θ). E.g., cloth, brick.
3. Some light is reflected specularly, which depends on the viewing direction. E.g., mirror.
Fig. 2: Models of reflection (Image credits: Derek Hoiem, University of Illinois)
Apart from the above models of reflection, the most common model of light scattering is the Bidirectional
Reflectance Distribution Function (BRDF). It gives the measure of light scattered by a medium from
one direction into another. The scattering of the light reveals the topography of the surface: smooth
surfaces reflect almost entirely in the specular direction, while with increasing roughness the light tends to
scatter into all possible directions. In the limit, an object appears equally bright throughout the outgoing
hemisphere if its surface is perfectly diffuse (i.e., Lambertian). Owing to this, the BRDF can give valuable
information about the nature of the target sample.
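The two reflection components described above can be sketched in a few lines of Python. This is a simplified Lambertian-plus-specular (Phong-style) model rather than a full BRDF, and the vectors and coefficients below are made-up examples.

import numpy as np

def shade(normal, light_dir, view_dir, albedo=0.7, k_spec=0.3, shininess=32):
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    diffuse = albedo * max(np.dot(n, l), 0.0)        # Lambert's cosine law: proportional to cos(theta)
    r = 2 * np.dot(n, l) * n - l                     # mirror reflection of the light direction
    specular = k_spec * max(np.dot(r, v), 0.0) ** shininess   # strong only near the mirror direction
    return diffuse + specular

# Surface facing up, light at 45 degrees, camera looking straight down.
print(shade(np.array([0, 0, 1.0]), np.array([0, 1.0, 1.0]), np.array([0, 0, 1.0])))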
There are multiple other shading models and ray tracing approaches that are used in unison to properly
understand the environment by evaluating the appearance of the scene.
2.1.2 Color
From a viewpoint of color, we know visible light is only a small portion of a large electromagnetic spectrum.
The Bayer grid/filter was an important development for capturing the color of light. In a camera, each
sensor site does not capture all three components (RGB) of light. Inspired by the human visual photoreceptors, Bayer
proposed a grid in which 50% of the sensors are green, 25% red, and 25% blue.
A demosaicing algorithm is then used to obtain a full-color image, where the surrounding pixels are used to
estimate the missing values at each pixel.
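The following is a deliberately crude sketch of that idea (not a real camera pipeline): it simulates an RGGB Bayer mosaic from an RGB array and fills in the missing values by simple local averaging; the image size and values are arbitrary.

import numpy as np
from scipy.ndimage import uniform_filter

def bayer_mosaic(rgb):
    # Simulate an RGGB Bayer sensor: each site keeps only one color value.
    mosaic = np.zeros_like(rgb)
    mosaic[0::2, 0::2, 0] = rgb[0::2, 0::2, 0]   # red sites
    mosaic[0::2, 1::2, 1] = rgb[0::2, 1::2, 1]   # green sites
    mosaic[1::2, 0::2, 1] = rgb[1::2, 0::2, 1]   # green sites
    mosaic[1::2, 1::2, 2] = rgb[1::2, 1::2, 2]   # blue sites
    return mosaic

def naive_demosaic(mosaic):
    # Crude demosaicing: average each sparse channel over a 3x3 window and
    # rescale by the fraction of sites carrying that channel (R 1/4, G 1/2, B 1/4).
    out = np.empty(mosaic.shape, dtype=float)
    for c, fraction in zip(range(3), (0.25, 0.5, 0.25)):
        out[..., c] = uniform_filter(mosaic[..., c].astype(float), size=3) / fraction
    return np.clip(out, 0, 255)

rgb = np.random.randint(0, 256, (8, 8, 3))        # hypothetical tiny image
print(naive_demosaic(bayer_mosaic(rgb)).shape)    # (8, 8, 3)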
Many other color filters besides the Bayer filter have been developed to sense color.
Fig. 3: (a) Bayer arrangement of filters on an image sensor. (b) The cross-section of the sensor.
The light originates from multiple light sources, gets reflected on multiple surfaces, and finally enters the
camera where the photons are converted into the (R, G, B) values that we see while looking at a digital image.
An image sensing pipeline in the camera follows the flowchart that is given in fig. 4.
In a camera, the light first falls on the lens (optics). Following that is the aperture and shutter which can be
specified or adjusted. Then the light falls on sensors which can be CCD or CMOS (discussed below), then the
image is obtained in an analog or digital form and we get the raw image.
Typically, cameras do not stop here. They apply the demosaicing algorithms mentioned above; the image is
sharpened if required, and other important processing algorithms are applied. After this, white balancing and
other digital signal processing tasks are performed, and the image is finally compressed to a suitable format and
stored.
Fig. 4: Image sensing pipeline in a camera (Image credit: Szeliski, Computer Vision: Algorithms and Applications 2010)
2.2.1 CCD vs CMOS
The camera sensor can be a CCD or a CMOS device. In a charge-coupled device (CCD), a charge is generated at each
sensing element; this photogenerated charge is moved from pixel to pixel and converted into a voltage at
the output node, and an analog-to-digital converter (ADC) then converts the value of each pixel to a digital value.
In a CMOS sensor, by contrast, the charge-to-voltage conversion takes place at each pixel site, which allows faster
readout at lower power.
Fig. 5: CCD vs CMOS (Image credit: D. Litwiller, CMOS vs. CCD: Maturing technologies, maturing markets)
Let us look at some properties that you may see while clicking a picture on a camera.
Shutter Speed: It controls the amount of light reaching the sensor
Sampling Pitch: It defines the physical space between adjacent sensor cells on the imaging chip.
Fill Factor: It is the ratio of active sensing area size with respect to the theoretically available sensing area
(product of horizontal and vertical sampling pitches)
Resolution: It tells you how many bits are specified for each pixel.
Post-processing: Digital image enhancement methods used before compression and storage.
Imaging:
The mapping of a 3D world object onto a 2D digital image plane is called imaging. In order to do so, each point on the 3D
object must correspond to a point on the image plane. Light reflects from every object that we see, which enables us to
capture all those light-reflecting points on our image plane.
Various factors determine the quality of the image like spatial factors or the lens of the capturing device.
Color and Pixelation:
In digital imaging, a frame grabber, which acts like a sensor, is placed at the image plane. Light reflected by the 3D object is
focused onto it, and the continuous image is pixelated; the light focused on the sensor generates an
electronic signal.
Each pixel that is formed may be colored or grey, depending on the intensity of the sampled and quantized light
that is reflected and the electronic signal that is generated from it.
All these pixels together form a digital image. The density of these pixels determines the image quality: the higher the
density, the clearer and higher-resolution the image we will get.
In order to form or create a digital image, we need to convert continuous data into a digital
form. Two main steps are required to do so:
Sampling (2D): Sampling determines the spatial resolution of the digital image, and the rate of sampling determines the quality of
the digitized image. The magnitude of the sampled image is stored as a value in image processing. It is related to
the coordinate values of the image.
Quantization: Quantization determines the number of grey levels in the digital image. The transition of the continuous values of
the image function to their digital equivalents is called quantization. It is related to the intensity values of the image.
A human observer requires a high number of quantization levels to perceive the fine shading details of an image. More
quantization levels result in a clearer image.
Geometric Operations
Geometric operations change where pixel values are placed in the image rather than the values themselves.
Example: the translation geometric operation moves the value at (x, y) to (x + dx, y + dy).
Another example is a sinusoidal warp of the form x' = x, y' = y * sin(ω * x).
Homogeneous Coordinates
A notation useful for converting scaling, translation and rotation into point-matrix multiplication.
To convert ordinary coordinates into homogeneous coordinates, append a third coordinate equal to 1: (x, y) -> (x, y, 1).
Affine (3-Point) Mapping
Homogeneous coordinates can be used to rewrite translation, rotation, scaling, etc. as vector-matrix multiplication (see the sketch after this list). An affine mapping maps:
straight lines -> straight lines
triangles -> triangles
rectangles -> parallelograms
parallel lines -> parallel lines
Distance ratios along lines do not change.
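A small NumPy sketch of this idea (illustrative only): translation, rotation and scaling each become a 3x3 matrix, so a combined affine mapping is a single matrix applied to points written as (x, y, 1).

import numpy as np

def translation(dx, dy):
    return np.array([[1, 0, dx], [0, 1, dy], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Compose: scale first, then rotate by 90 degrees, then translate.
A = translation(10, 5) @ rotation(np.pi / 2) @ scaling(2, 2)

p = np.array([1, 0, 1])     # the ordinary point (1, 0) in homogeneous form
print(A @ p)                # [10.  7.  1.] : (1, 0) -> (2, 0) -> (0, 2) -> (10, 7)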
Non‐Linear Image Warps
1. Binary Images
It is the simplest type of image. It takes only two values, i.e., black and white, or 0 and 1.
A binary image is a 1-bit image: only one binary digit is needed to represent each
pixel. Binary images are mostly used for general shape or outline.
Binary images are generated using a threshold operation: pixels above the
threshold value are turned white ('1') and pixels below the threshold value
are turned black ('0'), as in the sketch below.
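A minimal sketch of thresholding on a made-up 2x2 grey image (the values and the threshold of 128 are arbitrary):

import numpy as np

gray = np.array([[ 12, 200],
                 [130,  40]], dtype=np.uint8)   # hypothetical grey values

threshold = 128
binary = (gray >= threshold).astype(np.uint8)   # 1 = white (above threshold), 0 = black
print(binary)
# [[0 1]
#  [1 0]]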
2. Gray-scale images
Grayscale images are monochrome images, meaning they carry only one channel of intensity. Grayscale
images do not contain any information about color. Each pixel stores a single value drawn from the
available grey levels.
A normal grayscale image uses 8 bits/pixel, which gives 256 different grey levels.
In medical imaging and astronomy, 12- or 16-bit/pixel images are used.
3. Colour images
Colour images are three-band monochrome images in which each band contains a
different color, and together the bands store the actual information in the digital image. Color
images contain gray-level information in each spectral band.
The images are represented as red, green and blue components (RGB images), and each color image
has 24 bits/pixel, i.e. 8 bits for each of the three color bands (R, G, B).
In the 16-bit RGB565 format, there are 5 bits for R, 6 bits for G, and 5 bits for B. The additional bit is
given to green because, of the three colors, the human eye is most sensitive to green.
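As a small sketch of that 5-6-5 bit layout (the function name is just for illustration), 8-bit R, G, B values can be packed into a single 16-bit word:

def pack_rgb565(r, g, b):
    # Keep the top 5 bits of R and B and the top 6 bits of G, then pack them.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

print(hex(pack_rgb565(255, 255, 255)))   # 0xffff -> white
print(hex(pack_rgb565(255, 0, 0)))       # 0xf800 -> pure red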
Neighbours of a pixel
A pixel p at coordinates (x, y) has four horizontal and vertical neighbours, denoted N4(p), and four
diagonal neighbours, denoted ND(p); together they form the 8-neighbourhood N8(p) = N4(p) ∪ ND(p).
Two pixels p and q with values from a set V are m-adjacent if:
1. q is in N4(p), or
2. q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose
values are from V.
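These neighbourhoods are easy to enumerate in code; the sketch below simply lists the coordinate offsets (boundary checking is omitted):

def n4(x, y):
    # 4-neighbours: the horizontal and vertical neighbours of (x, y).
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    # Diagonal neighbours of (x, y).
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    # 8-neighbours: union of N4 and ND.
    return n4(x, y) + nd(x, y)

print(n4(2, 2))   # [(3, 2), (1, 2), (2, 3), (2, 1)]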
Distance Transform
Brief Description
The distance transform is an operator normally only applied to binary images. The
result of the transform is a graylevel image that looks similar to the input image,
except that the graylevel intensities of points inside foreground regions are changed to
show the distance to the closest boundary from each point.
One way to think about the distance transform is to first imagine that foreground
regions in the input binary image are made of some uniform slow burning
inflammable material. Then consider simultaneously starting a fire at all points on the
boundary of a foreground region and letting the fire burn its way into the interior. If
we then label each point in the interior with the amount of time that the fire took to
first reach that point, then we have effectively computed the distance transform of that
region. Figure 1 shows a distance transform for a simple rectangular shape.
Figure 1 The distance transform of a simple shape. Note that we are using the
`chessboard' distance metric.
There is a dual to the distance transform described above which produces the distance
transform for the background region rather than the foreground region. It can be
considered as a process of inverting the original image and then applying the standard
transform as above.
How It Works
There are several different sorts of distance transform, depending upon which distance
metric is being used to determine the distance between pixels. The example shown in
Figure 1 uses the `chessboard' distance metric but both the Euclidean and `city block'
metrics can be used as well.
Even once the metric has been chosen, there are many ways of computing the distance
transform of a binary image. One intuitive but extremely inefficient way of doing it is
to perform multiple successive erosions with a suitable structuring element until all
foreground regions of the image have been eroded away. If each pixel is labeled with
the number of erosions that had to be performed before it disappeared, then this is just
the distance transform. The actual structuring element that should be used depends
upon which distance metric has been chosen. A 3×3 square element gives the
`chessboard' distance transform, a cross shaped element gives the `city block' distance
transform, and a disk shaped element gives the Euclidean distance transform. Of
course it is not actually possible to generate a good disk shaped element on a discrete
grid on a small scale, but there are algorithms that vary the structuring element on
each erosion so as to approximate a circular element.
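The erosion-based method just described can be sketched directly with SciPy's binary erosion (for illustration only, since this is the inefficient approach); a 3x3 square structuring element gives the chessboard metric.

import numpy as np
from scipy.ndimage import binary_erosion

def distance_transform_by_erosion(binary):
    # Label each foreground pixel with the number of erosions needed to remove it.
    dist = np.zeros(binary.shape, dtype=int)
    current = binary.astype(bool)
    structure = np.ones((3, 3), dtype=bool)   # square element -> chessboard metric
    while current.any():
        dist += current                        # pixels still in the foreground survive one more erosion
        current = binary_erosion(current, structure)
    return dist

rect = np.zeros((7, 9), dtype=bool)
rect[1:6, 1:8] = True                          # a simple rectangular foreground, as in Figure 1
print(distance_transform_by_erosion(rect))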
The distance transform can be calculated much more efficiently using clever
algorithms in only two passes (e.g. Rosenfeld and Pfaltz 1968). This algorithm, which
is based on recursive morphology, will not be described here.
The distance transform is sometimes very sensitive to small changes in the object. If,
for example, we add a small black region in the center of the white rectangle, the
distance transform changes markedly. This can be of advantage when we want
to distinguish between similar objects, such as the plain rectangle and the rectangle with a small hole.
However, it can also cause problems when trying to classify objects into classes of
roughly the same shape. It also makes the distance transform very sensitive to noise.
For instance, if we add some `pepper noise' to the above rectangle, each noise pixel introduces new boundary points and the transform changes substantially near them.
The last three examples show that it is important that the binary input image is a good
representation of the object that we want to process. Simple thresholding is often not
enough. It might be necessary to further process the image before applying the
distance transform.
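For comparison, SciPy provides efficient implementations of these transforms directly; the short sketch below computes the chessboard, city-block ('taxicab') and Euclidean distance transforms of a small rectangle.

import numpy as np
from scipy.ndimage import distance_transform_cdt, distance_transform_edt

rect = np.zeros((7, 9), dtype=bool)
rect[1:6, 1:8] = True                                    # rectangular foreground region

chessboard = distance_transform_cdt(rect, metric='chessboard')
city_block = distance_transform_cdt(rect, metric='taxicab')
euclidean = distance_transform_edt(rect)

print(chessboard)   # interior pixels are labelled with their chessboard distance to the boundary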
Let's discuss each color model and its applications. We will start
with the most popular model, the RGB model.
RGB Model
Application:
The RGB model is widely used in the representation and display of images in
electronic systems like computers and televisions.
It is also used in conventional photography as well.
Image scanners, which scan images and convert them to digital images, mostly
support RGB color.
It is used in web graphics.
CMY Model
Application:
The CMY model is used in color printing, where cyan, magenta, and yellow pigments are deposited on paper.
Each CMY component ranges over [0, 1]: a value of 0 means white (no pigment) and a value of 1 in all three
components means black.
For reference, in the hue circle used by the HSI/HSV models the primary and secondary colors sit at fixed angles:
0 degrees – Red
120 degrees – Green
240 degrees – Blue
60 degrees – Yellow
300 degrees – Magenta
3 Colour Models
Colour models provide a standard way to specify a
particular colour, by defining a 3D coordinate
system, and a subspace that contains all constructible colours within a particular
model. Any colour that can be specified using a model will correspond to a single
point within the subspace it defines. Each colour model is oriented towards either
specific hardware (RGB,CMY,YIQ), or image processing applications (HSI).
3.1 The RGB Model
In the RGB model, an image consists of three independent image planes, one in each
of the primary colours: red, green and blue. (The standard wavelengths for the three
primaries are as shown in figure 1). A particular colour is specified by the
amount of each primary component present. Figure 5 shows the geometry of
the RGB colour model for specifying colours using a Cartesian coordinate system.
The greyscale spectrum, i.e. those colours made from equal amounts of each primary,
lies on the line joining the black and white vertices.
Figure 5: The RGB colour cube. The greyscale spectrum lies on the line joining the
black and white vertices.
This is an additive model, i.e. the colours present in the light add to form new colours,
and is appropriate for the mixing of coloured light for example. The image on the left
of figure 6 shows the additive mixing of red, green and blue primaries to form the
three secondary colours yellow (red + green), cyan (blue + green) and magenta (red +
blue), and white (red + green + blue).
The RGB model is used for colour monitors and most video cameras.
3.2 The CMY Model
When a surface coated with cyan pigment is illuminated by white light, no red light is
reflected, and similarly for magenta and green, and yellow and blue. The relationship
between the RGB and CMY models is given by:
    C       1       R
    M   =   1   -   G
    Y       1       B
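In code this relationship is a one-liner (a sketch, with R, G, B normalised to [0, 1]):

import numpy as np

def rgb_to_cmy(rgb):
    # C = 1 - R, M = 1 - G, Y = 1 - B for values normalised to [0, 1].
    return 1.0 - np.asarray(rgb, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> [0. 1. 1.] (magenta and yellow pigment)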
Figure 6: The figure on the left shows the additive mixing of red, green and blue
primaries to form the three secondary colours yellow (red + green), cyan (blue +
green) and magenta (red + blue), and white (red + green + blue). The figure on the
right shows the three subtractive primaries, and their pairwise combinations to form
red, green and blue, and finally black by subtracting all three primaries from white.
As all schoolchildren know, the way to make green paint is to mix blue paint with
yellow. But how does this work? If blue paint absorbs all but blue light, and yellow
absorbs blue only, when combined no light should be reflected and black paint result.
However, what actually happens is that imperfections in the paint are exploited. In
practice, blue paint reflects not only blue, but also some green. Since the yellow paint
also reflects green (since yellow = green + red), some green is reflected by both
pigments, and all other colours are absorbed, resulting in green paint.
3.3 The HSI Model
As mentioned above, colour may be specified by the three quantities hue, saturation
and intensity. This is the HSI model, and the entire space of colours that may be
specified in this way is shown in figure 7.
Figure 7: The HSI model, showing the HSI solid on the left, and the HSI triangle on the
right, formed by taking a horizontal slice through the HSI solid at a particular
intensity. Hue is measured from red, and saturation is given by distance from the
axis. Colours on the surface of the solid are fully saturated, i.e. pure colours, and the
greyscale spectrum is on the axis of the solid. For these colours, hue is undefined.
Conversion between the RGB model and the HSI model is quite complicated. The
intensity is given by

    I = (R + G + B) / 3,

where the quantities R, G and B are the amounts of the red, green and blue
components, normalised to the range [0,1]. The intensity is therefore just the
average of the red, green and blue components. The saturation is given by:

    S = 1 - 3 min(R, G, B) / (R + G + B),
where the min(R,G,B) term is really just indicating the amount of white present. If
any of R, G or B are zero, there is no white and we have a pure colour. The
expression for the hue, and details of the derivation may be found in reference [1].
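A brief sketch of those two formulas in Python (the hue expression is omitted here, as in the text; inputs are assumed normalised to [0, 1]):

def intensity_saturation(r, g, b):
    i = (r + g + b) / 3.0                                           # intensity: average of the components
    s = 0.0 if i == 0 else 1.0 - 3.0 * min(r, g, b) / (r + g + b)   # saturation
    return i, s

print(intensity_saturation(1.0, 0.0, 0.0))   # pure red:  I = 1/3, S = 1
print(intensity_saturation(0.5, 0.5, 0.5))   # mid grey:  I = 0.5, S = 0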
3.4 The YIQ Model
In the YIQ model, the RGB components are converted to a luminance component Y and two
chrominance components I and Q by:

    Y       0.299   0.587   0.114     R
    I   =   0.596  -0.275  -0.321     G
    Q       0.212  -0.523   0.311     B
The luminance (Y) component contains all the information required for black and
white television, and captures our perception of the relative brightness of particular
colours. That we perceive green as much lighter than red, and red lighter than blue,
is indicated by their respective weights of 0.587, 0.299 and 0.114 in the first row of
the conversion matrix above. These weights should be used when converting a
colour image to greyscale if you want the perception of brightness to remain the
same. This is not the case for the intensity component in an HSI image, as shown in
figure 8.
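A short sketch of that conversion, using the Y weights from the matrix above (these are the classic ITU-R BT.601 luminance weights):

import numpy as np

def rgb_to_luminance(rgb):
    # rgb: array of shape (..., 3), channels in R, G, B order, values in [0, 1].
    weights = np.array([0.299, 0.587, 0.114])
    return np.asarray(rgb, dtype=float) @ weights

print(rgb_to_luminance([0.0, 1.0, 0.0]))   # green -> 0.587 (perceived as light)
print(rgb_to_luminance([0.0, 0.0, 1.0]))   # blue  -> 0.114 (perceived as dark)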
Figure 8: Image (a) shows a colour test pattern, consisting of horizontal stripes of
black, blue, green, cyan, red, magenta and yellow, a colour ramp with constant
intensity, maximal saturation, and hue changing linearly from red through green to
blue, and a greyscale ramp from black to white. Image (b) shows the intensity for
image (a). Note how much detail is lost. Image (c) shows the luminance. This third
image accurately reflects the brightness variations perceived in the original image.
Figure 9: The top image is a very dark image of a forest scene. The middle image is
the result of applying histogram equalisation to each of the red, green and blue
components of the original image. The bottom image is the result of converting the
image to YIQ format, and applying histogram equalisation to the luminance
component only.
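The idea behind the bottom image can be sketched with scikit-image; since the original forest photograph is not available here, the snippet uses the library's bundled astronaut image as a stand-in.

from skimage import data, exposure
from skimage.color import rgb2yiq, yiq2rgb

rgb = data.astronaut() / 255.0                      # stand-in RGB image, values in [0, 1]
yiq = rgb2yiq(rgb)
yiq[..., 0] = exposure.equalize_hist(yiq[..., 0])   # equalise only the luminance (Y) channel
rgb_eq = yiq2rgb(yiq).clip(0, 1)                    # chrominance (I, Q) is left untouched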
Basic Image Conversion
Showing Grayscale
We can show the grayscale of the image by using the rgb2gray function.
We can also show the grayscale by using the cmap = ‘gray’ version of
the plot/graph we chose.
from skimage import io
from skimage.color import rgb2gray
import matplotlib.pyplot as plt

sample = io.imread('sample.jpg')   # load the RGB image (filename is illustrative)
sample_g = rgb2gray(sample)

fig, ax = plt.subplots(1, 2, figsize=(12, 12))
ax[0].imshow(sample)
ax[0].set_title('Original', fontsize=15)
ax[1].imshow(sample_g, cmap='gray')
ax[1].set_title('Grayscale', fontsize=15)
plt.tight_layout()
plt.show()
Notice that to produce the binarized image in Figure 3, we used the mean of the pixel values as
the threshold for binarizing the image. The mean is usually chosen as a
threshold when we have a sense of the majority of the pixel values. For
this sample, we know the mean will lie close to the whitest values,
since the majority of the pixel values in the sample are white due to the
background. This is clearly seen in the resulting binarized image.
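For reference, the binarization step itself is one line on the grayscale image computed above (a sketch in the same style as the earlier snippet):

sample_b = sample_g > sample_g.mean()          # True where the pixel is brighter than the mean

fig, ax = plt.subplots(1, 2, figsize=(12, 12))
ax[0].imshow(sample_g, cmap='gray')
ax[0].set_title('Grayscale', fontsize=15)
ax[1].imshow(sample_b, cmap='gray')
ax[1].set_title('Binarized (mean threshold)', fontsize=15)
plt.tight_layout()
plt.show()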
Showing Color Spaces
We can show the different color spaces of the image. Remember that
the image has 3 or 4 channels in the last dimension of its NumPy array shape.
By indexing the NumPy array per channel, we can get the specific pixel values
for each color space.
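As a small sketch of that indexing, the loop below shows each channel of the RGB image separately:

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
for i, channel in enumerate(['Red', 'Green', 'Blue']):
    ax[i].imshow(sample[:, :, i], cmap='gray')     # index the last axis to isolate one channel
    ax[i].set_title(channel, fontsize=15)
plt.tight_layout()
plt.show()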
We can also show the HSV values using the scikit-image function and
cmap parameters. HSV refers to the hue, saturation, and value of each
pixel. These channels are useful for segmenting different items in an
image, which is important for object detection models and many other
applications. We can use the rgb2hsv function of skimage.
from skimage.color import rgb2hsv

sample_hsv = rgb2hsv(sample)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(sample_hsv[:, :, 0], cmap='hsv')
ax[0].set_title('Hue', fontsize=15)
ax[1].imshow(sample_hsv[:, :, 1], cmap='hsv')
ax[1].set_title('Saturation', fontsize=15)
ax[2].imshow(sample_hsv[:, :, 2], cmap='hsv')
ax[2].set_title('Value', fontsize=15)
plt.tight_layout()
plt.show()
Grayscale, Binarized, Color Channel, and the HSV Channel are some of
the image conversion techniques that are usually used for a lot of
machine learning models. There are other variations of conversion in
scikit-image; some further examples are shown below.