
Unit 4

Basic concept of Computer Vision:-


Computer vision is a field of artificial intelligence (AI) that enables computers and systems
to derive meaningful information from digital images, videos and other visual inputs — and take
actions or make recommendations based on that information. If AI enables computers to think,
computer vision enables them to see, observe and understand.

As humans, we are able to perceive the three-dimensional world around us with such ease.
Imagine looking at a flower vase. We can effortlessly perceive each petal's shape and
translucency and can separate the flowers from the background. Computer vision aims at giving
computers the ability to understand the environment as we do. It focuses on looking at the world
through multiple images or videos and reconstructing properties like the shape of objects,
intensity, color distributions, etc.

Recent advancements in the field of deep learning are enabling computer vision methods to
understand and automate tasks that the human visual system can do. This article discusses the
introductory topics of Computer Vision, namely image formation and representation. Image
formation will briefly cover how an image is formed and the factors on which it depends. It will
also cover a pipeline of image sensing in a digital camera. The second half of the article will cover
image representation which will explain the various ways to represent the images and will focus
on certain operations that can be done on images.

1. Image formation

i) Photometric Image Formation:- Fig. 1 gives a simple explanation of image formation. The
light from a source is reflected off a particular surface. A part of that reflected light passes
through the camera optics and reaches the sensor (image) plane, where the image is formed.

Fig: 1. Photometric Image Formation

Some factors that affect image formation are:


 The strength and direction of the light emitted from the source.
 The material and surface geometry along with other nearby surfaces.
 Sensor capture properties.
ii) Reflection and Scattering:- Images cannot exist without light. Light sources can be a point
or an area light source. When the light hits a surface, three major reactions might occur-
 Some light is absorbed. This depends on a factor called ρ (albedo); a low surface ρ
means more light is absorbed.
 Some light is reflected diffusely, independently of the viewing direction. This follows
Lambert’s cosine law: the amount of reflected light is proportional to cos(θ).
E.g., cloth, brick.
 Some light is reflected specularly, which depends on the viewing direction. E.g., mirror.

Fig. 2: Models of reflection (Image credits: Derek Hoiem, University of Illinois)

Apart from the above models of reflection, the most common model of light scattering is
the Bidirectional Reflectance Distribution Function (BRDF). It gives the measure of light
scattered by a medium from one direction into another. The scattering of the light reveals the
topography of the surface: smooth surfaces reflect almost entirely in the specular direction,
while with increasing roughness the light tends to scatter in all possible directions. In the limit,
an object appears equally bright over the entire outgoing hemisphere if its surface is perfectly
diffuse (i.e., Lambertian). Owing to this, the BRDF can give valuable information about the
nature of the target sample.

There are multiple other shading models and ray tracing approaches that are used in
unison to properly understand the environment by evaluating the appearance of the scene.
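
The diffuse and specular behavior described above can be illustrated with a small numerical
sketch. Below is a minimal NumPy example assuming a simple Lambertian-plus-Phong-style model;
the parameter names (albedo, shininess) and the vectors are illustrative, not a full BRDF
implementation.

```python
import numpy as np

def shade(normal, light_dir, view_dir, albedo=0.8, shininess=32):
    """Toy shading model combining a diffuse and a specular term."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)

    # Diffuse (Lambertian) term: proportional to cos(theta) = n . l,
    # independent of the viewing direction.
    diffuse = albedo * max(np.dot(n, l), 0.0)

    # Specular (Phong-style) term: depends on how close the viewing
    # direction is to the mirror reflection of the light direction.
    r = 2 * np.dot(n, l) * n - l          # mirror reflection of l about n
    specular = max(np.dot(r, v), 0.0) ** shininess

    return diffuse + specular

print(shade(normal=np.array([0, 0, 1.0]),
            light_dir=np.array([0, 1, 1.0]),
            view_dir=np.array([0, 0, 1.0])))
```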

iii) Color:- From the viewpoint of color, we know that visible light is only a small portion of the
larger electromagnetic spectrum.
Two factors are noticed when colored light arrives at a sensor:
 Color of the light
 Color of the surface
2. Image Representation
After getting an image, it is important to devise ways to represent the image. There are various
ways by which an image can be represented. Let’s look at the most common ways to represent an
image.
The image types we will consider are:

1.Binary Images:-
 Binary images are the simplest type of images and can take on two values, typically black
and white, or ‘0’ and ‘1’.
 A binary image is referred to as a 1 bit/pixel image because it takes only 1 binary digit to
represent each pixel.
 These types of images are frequently used in computer vision applications where the only
information required for the task is general shape or outline information.
 For example, to position a robotic gripper to grasp an object, or in optical character
recognition (OCR). Binary images are often created from gray-scale images via a
threshold operation: every pixel above the threshold value is turned white (‘1’), and those
below it are turned black (‘0’). We define the characteristic function of an object in an
image to be 1 at pixels belonging to the object and 0 elsewhere (see the sketch after this list).

 Each pixel is stored as a single bit (0 or 1)


 A 640 x 480 monochrome image requires 37.5 KB of storage.
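
As an illustration of turning a gray-scale image into a binary image by thresholding, here is a
minimal NumPy sketch; the array values and the threshold of 128 are arbitrary examples.

```python
import numpy as np

# A small synthetic gray-scale image (values 0-255); in practice this
# would come from a file via an image library.
gray = np.array([[ 10,  20, 200, 210],
                 [ 15,  25, 220, 230],
                 [ 12, 180, 190,  30],
                 [ 11, 175, 185,  28]], dtype=np.uint8)

threshold = 128
binary = (gray >= threshold).astype(np.uint8)   # 1 = white, 0 = black
print(binary)

# Storage estimate for a 640 x 480 binary image: 1 bit per pixel.
print(640 * 480 / 8 / 1024, "KB")               # -> 37.5 KB
```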

2.Gray Scale Images:-


 They contain brightness information only, no color information.
 The number of bits used per pixel determines the number of different brightness levels
available.
 The typical image contains 8 bits/pixel, which allows us to have 256 (0-255) different
brightness (gray) levels.
 The 8-bit representation is typical because the byte, which corresponds to 8 bits of data,
is the standard small unit in the world of digital computers.
 Each pixel is usually stored as a byte (value between 0 and 255).
 A 640 x 480 gray scale image requires 307.2 KB of storage.

The figure below shows a grayscale image and a 6 × 6 detailed region, where brighter pixels
correspond to larger values.

Fig. A gray Scale Image and the pixel values in a 6 x 6 neighborhood
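
A short sketch of how such a neighborhood can be inspected in practice, assuming Pillow and
NumPy are available; "photo.png" and the pixel position (100, 100) are placeholders.

```python
import numpy as np
from PIL import Image   # Pillow; any image library would do

# Load an image and convert it to 8-bit gray scale.
gray = np.array(Image.open("photo.png").convert("L"))

print(gray.dtype, gray.shape)        # uint8, (rows, cols)
print(gray.min(), gray.max())        # values in [0, 255]

# A 6 x 6 neighborhood around some pixel (r, c); brighter pixels
# correspond to larger values, as in the figure above.
r, c = 100, 100
print(gray[r:r + 6, c:c + 6])
```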

3. Color Images:-
 Representation of color images is more complex and varied.
 The two most common ways of storing color image contents are:

1) RGB representation - in which each pixel is usually represented by a 24-bit number
containing the amount of its red (R), green (G), and blue (B) components.

2) Indexed representation - where a 2D array contains indices to a color palette (or lookup
table (LUT)).

24-Bit (RGB) Color Images:- Color images can be represented using three 2D arrays of the same
size, one for each color channel: red (R), green (G), and blue (B) (figure below). Each array
element contains an 8-bit value, indicating the amount of red, green, or blue at that point on a
[0, 255] scale. The combination of the three 8-bit values into a 24-bit number allows 2^24
(16,777,216, usually referred to as 16 million or 16 M) color combinations. An alternative
representation uses 32 bits per pixel and includes a fourth channel, called the alpha channel,
which provides a measure of transparency for each pixel and is widely used in image editing
effects. In Figure 4 we see a representation of a typical RGB color image.
The figure below illustrates that, in addition to referring to a row or column as a vector, we can
refer to a single pixel's red, green, and blue values as a color pixel vector (R, G, B).

Example of a 24-Bit Color Image

 Each pixel is represented by three bytes (e.g., RGB)


 Supports 256 x 256 x 256 possible combined colors (16,777,216)
 A 640 x 480 24-bit color image would require 921.6 KB of storage
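
A small NumPy sketch of the color pixel vector and the storage figures quoted above; the pixel
position and the chosen color are arbitrary examples.

```python
import numpy as np

# A 640 x 480 RGB image: three 8-bit channels per pixel.
height, width = 480, 640
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# The color pixel vector (R, G, B) at row 10, column 20:
rgb[10, 20] = (255, 128, 0)          # an orange-ish pixel
print(rgb[10, 20])                   # -> [255 128   0]

print(256 ** 3)                      # 16,777,216 possible colors
print(rgb.nbytes / 1000, "KB")       # 921.6 KB of storage
```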

Indexed Color Images :-A problem with 24-bit color representations is backward compatibility
with older hardware that may not be able to display the 16 million colors simultaneously. A
solution devised before 24-bit color displays and video cards were widely available consisted of
an indexed representation, in which a 2D array of the same size as the image contains indices
(pointers) to a color palette (or color map) of fixed maximum size (usually 256 colors). The
color map is simply a list of colors used in that image. Figure 6 shows an indexed color image
and a 4 × 4 detailed region, where each pixel shows the index and the values of R, G, and B at
the color palette entry that the index points to.
 One byte for each pixel
 Supports 256 colors out of the millions possible, with acceptable color quality
 Requires Color Look-Up Tables (LUTs)
 A 640 x 480 8-bit color image requires 307.2 KB of storage (the same as 8-bit grayscale)
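
The indexed representation can be sketched as follows; the 4-entry palette is a toy example
(real palettes typically hold up to 256 colors).

```python
import numpy as np

# A tiny color palette (look-up table) with 4 entries.
palette = np.array([[  0,   0,   0],     # 0: black
                    [255,   0,   0],     # 1: red
                    [  0, 255,   0],     # 2: green
                    [255, 255, 255]],    # 3: white
                   dtype=np.uint8)

# The indexed image stores one palette index per pixel (one byte each).
indexed = np.array([[0, 1, 1, 0],
                    [2, 3, 3, 2]], dtype=np.uint8)

# Expanding the indexed image back to RGB is a simple table look-up.
rgb = palette[indexed]
print(rgb.shape)        # (2, 4, 3)
```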

3.Linear filtering

Linear filtering of a signal can be seen as a controlled scaling of the signal components in
the frequency domain. Reducing the components in the center of the frequency domain (low
frequencies), gives the high-frequency components an increased relative importance, and thus
highpass filtering is performed. Filters can be made very selective. Any of the Fourier
coefficients can be changed independently of the others. For example, let H be a constant
function minus a pair of Dirac functions symmetrically centered in the Fourier domain with a
distance |ω1| from the center,

H = 1 - (δ(u - ω1) + δ(u + ω1))

This filter, known as a notch filter, will leave all frequency components untouched, except the
component that corresponds to the sinusoid in Fig. 1, which will be completely removed. A
weighting of the Dirac functions will control how much of the component is removed. For
example, the filter
H = 1 - 0.9 (δ(u - ω1) + δ(u + ω1))

will reduce the signal component to 10% of its original value. The result of applying this
filter to the signal F1 + F2 (Fig. 1, bottom) is shown in Fig. 2. The lower-frequency component is
almost invisible.
Filters for practical applications have to be more general than “remove the sinusoidal component
cos(ωᵀx).” In image enhancement, filters are designed to remove noise that is spread out all over
the frequency domain. It is a difficult task to design filters that remove as much noise as possible
without removing important parts of the signal.
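
A minimal 1-D sketch of the notch filter idea using NumPy's FFT, assuming two cosine components
(at 8 and 32 cycles) standing in for F1 + F2; the bin indices are specific to this synthetic
signal.

```python
import numpy as np

# Signal made of two sinusoids, analogous to F1 + F2 in the text.
n = np.arange(256)
f = np.cos(2 * np.pi * 8 * n / 256) + np.cos(2 * np.pi * 32 * n / 256)

F = np.fft.fft(f)

# Notch filter: zero out the bins at +/- omega_1 (here, 8 cycles),
# i.e. H = 1 - (delta(u - w1) + delta(u + w1)).
H = np.ones_like(F)
H[8] = 0.0
H[-8] = 0.0                      # the symmetric (negative-frequency) bin

g = np.real(np.fft.ifft(H * F))  # only the 32-cycle component remains
print(np.allclose(g, np.cos(2 * np.pi * 32 * n / 256)))   # True
```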

4.Image in frequency domain

Frequency Domain Filters:- Frequency domain filters are used for smoothing and sharpening
an image by removing its high- or low-frequency components. Sometimes it is possible to
remove both very high and very low frequencies. Frequency domain filters differ from spatial
domain filters in that they operate on the frequency content of the image. They are used for
two basic operations, i.e., smoothing and sharpening.
The main types are:

1. Lowpass filter (smoothing):- A lowpass filter is used to pass low-frequency signals. It
attenuates the frequencies that are higher than the cut-off frequency; the amount of attenuation
for each frequency depends on the design of the filter. Smoothing is a lowpass operation in the
frequency domain.

G(u, v) = H(u, v) . F(u, v)


Where, F(u, v) is the Fourier Transform of the original image and
H(u, v) is the Fourier Transform of the filtering mask

Following are some lowpass filters:

i) Ideal Lowpass Filter:- The ideal lowpass filter cuts off all the high-frequency components of
the Fourier transform beyond a cut-off distance D0 from the origin. Its transfer function is
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0, where D(u, v) is the distance of
the point (u, v) from the center of the frequency plane.
ii) Butterworth Lowpass Filter:- The Butterworth lowpass filter is used to remove high-frequency
noise with very minimal loss of signal components. Its transfer function of order n is
H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n)).
iii) Gaussian Lowpass Filter:- A Gaussian filter is a lowpass filter used for reducing noise (high-
frequency components) and blurring regions of an image. Its transfer function is
H(u, v) = e^(-D²(u, v) / (2 D0²)).
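
A sketch of Gaussian lowpass filtering in the frequency domain, following
G(u, v) = H(u, v) . F(u, v); the cut-off d0 and the random test image are illustrative.

```python
import numpy as np

def gaussian_lowpass(image, d0=30.0):
    """Smooth an image in the frequency domain:
    G(u, v) = H(u, v) * F(u, v), with H = exp(-D^2 / (2 * d0^2))."""
    rows, cols = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))       # center the spectrum

    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    D2 = U ** 2 + V ** 2                          # squared distance from center
    H = np.exp(-D2 / (2.0 * d0 ** 2))             # Gaussian transfer function

    G = H * F
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

# Example on a random "image"; a real image array would be used instead.
img = np.random.rand(128, 128)
smooth = gaussian_lowpass(img, d0=20.0)
print(smooth.shape)
```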

2. Highpass filter (sharpening):- A highpass filter is used to pass high frequencies; it
attenuates the frequencies that are lower than the cut-off frequency. Sharpening is a highpass
operation in the frequency domain. Like the lowpass filter, it also has standard forms such as
the ideal highpass filter, Butterworth highpass filter, and Gaussian highpass filter.
The mechanism of highpass filtering in the frequency domain is given by:

H(u, v) = 1 - H'(u, v)
Where, H(u, v) is the transfer function of the highpass filter and
H'(u, v) is the transfer function of the corresponding lowpass filter
Figure: High Pass Filter
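
The relation H(u, v) = 1 - H'(u, v) can be sketched by reusing the Gaussian lowpass transfer
function from the previous example; again, d0 and the random test image are illustrative.

```python
import numpy as np

def gaussian_highpass(image, d0=30.0):
    """Sharpening-style filter: H_hp(u, v) = 1 - H_lp(u, v), where H_lp
    is the Gaussian lowpass transfer function."""
    rows, cols = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))

    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    H_lp = np.exp(-(U ** 2 + V ** 2) / (2.0 * d0 ** 2))
    H_hp = 1.0 - H_lp                              # highpass from lowpass

    G = H_hp * F
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

edges_like = gaussian_highpass(np.random.rand(128, 128), d0=20.0)
print(edges_like.shape)
```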

Introduction to Frequency domain:-

In the frequency domain, a digital image is converted from the spatial domain to the frequency
domain. In the frequency domain, image filtering is used for image enhancement for a specific
application. The Fast Fourier Transform (FFT) is the tool used to convert an image from the
spatial domain to the frequency domain. For smoothing an image, a lowpass filter is applied, and
for sharpening an image, a highpass filter is applied. Both kinds of filters can be implemented
as ideal, Butterworth, or Gaussian filters.

The frequency domain is a space which is defined by the Fourier transform. The Fourier
transform has a very wide application in image processing. Frequency domain analysis is used to
indicate how the signal energy is distributed over a range of frequencies. The basic principle of
frequency domain analysis in image filtering is to compute the 2D discrete Fourier transform of
the image.
1.Fourier Series and Transform

Fourier Series:- A Fourier series represents a periodic signal as a weighted sum of sines and
cosines. The periodic signal is broken down into component signals with the following
properties:
1. The component signals are sines and cosines.
2. The component signals are harmonics of each other.

Fig.: Fourier series analysis of a step edge and its Fourier decomposition (worked example).
Fourier Transformation:-

Fourier transformation is a tool for image processing. It is used for decomposing an
image into sine and cosine components. The input image is in the spatial domain and the output
is represented in the Fourier or frequency domain. Fourier transformation is used in a wide
range of applications such as image filtering, image compression, image analysis, and image
reconstruction.

The formula for the 2D discrete Fourier transformation of an M x N image f(x, y) is:

F(u, v) = Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) e^(-j2π(ux/M + vy/N))

Example:-
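
A minimal sketch of computing the 2D discrete Fourier transform of an image with NumPy; the
8 x 8 random array stands in for a real image.

```python
import numpy as np

# f(x, y): a small spatial-domain "image".
f = np.random.rand(8, 8)

# F(u, v): its 2D discrete Fourier transform, computed with the FFT.
F = np.fft.fft2(f)

print(F.shape)                                 # same size as the input, complex values
print(np.allclose(np.abs(F[0, 0]), f.sum()))   # True: DC term = sum of pixel values

# The inverse transform reconstructs the original image.
print(np.allclose(np.fft.ifft2(F).real, f))    # True
```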

5.Image Sampling
Conversion of analog signal to digital signal:- The output of most image sensors is an
analog signal, and we cannot apply digital processing to it directly because we cannot store it:
storing a signal that can take infinitely many values would require infinite memory. So we have
to convert the analog signal into a digital signal. To create a digital image, we need to
convert continuous data into digital form. This is done in two steps:

 Sampling
 Quantization
We will discuss sampling now; quantization will be discussed later on. For now, we will briefly
cover the difference between these two steps and why both are needed.
Basic idea:
The basic idea behind converting an analog signal to a digital signal is to convert both of its
axes (x, y) into a digital format.

Since an image is continuous not just in its co-ordinates (x axis) but also in its amplitude (y
axis), the part that deals with digitizing the co-ordinates is known as sampling, and the part
that deals with digitizing the amplitude is known as quantization.

Sampling:-

Sampling has already been introduced in the tutorial on signals and systems, but we will discuss
it in a little more detail here. To summarize what we have covered about sampling so far:
 The term sampling refers to taking samples.
 In sampling we digitize the x axis.
 It is done on the independent variable.
 In the case of the equation y = sin(x), it is done on the x variable.
It is further divided into two parts: upsampling and downsampling.

If you look at the figure above, you will see that there are some random variations in
the signal. These variations are due to noise. In sampling we reduce this noise by taking samples.
Clearly, the more samples we take, the better the quality of the image and the more the noise is
reduced, and vice versa.
However, sampling on the x axis alone does not convert the signal to digital format; you must
also sample the y axis, which is known as quantization. Taking more samples means collecting
more data, and in the case of an image, it means more pixels.
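
A small sketch of sampling by keeping every k-th value, first for a 1-D signal and then for an
image; the sizes and the factor k are arbitrary.

```python
import numpy as np

# A densely sampled 1-D "signal", standing in for the continuous signal.
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x)

# Sampling: keep every k-th value along the x axis.
k = 50
x_sampled = x[::k]
y_sampled = y[::k]
print(len(x_sampled), "samples taken")    # 20 samples

# For an image, downsampling keeps every k-th pixel in both directions.
img = np.random.rand(480, 640)
img_small = img[::2, ::2]
print(img_small.shape)                    # (240, 320)
```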

6. Image Processing And Feature Extraction

Digital Image Processing:-

Digital image processing deals with the manipulation of digital images using a digital
computer. It is also used to enhance images and to extract useful information from them. DIP
focuses on developing computer systems that are able to perform processing on an image. The
input of such a system is a digital image; the system processes that image using efficient
algorithms and gives an image as output.
For example: Adobe Photoshop, MATLAB, etc.

Digital Image Processing allows users to perform the following tasks:

 Image sharpening and restoration: Common applications of image sharpening and
restoration are zooming, blurring, sharpening, grayscale conversion, edge detection,
image recognition, and image retrieval.
 Medical field: Common applications in the medical field are gamma-ray imaging,
PET scans, X-ray imaging, medical CT, UV imaging, etc.
 Remote sensing: The process of scanning the earth using satellites and observing its
surface and activities from space.
 Machine/Robot vision: It provides vision to robots so that they can see things,
identify them, etc.

Characteristics of Digital Image Processing

 It uses software, some of which is free of cost.
 It provides clear images.
 Digital image processing performs image enhancement so that information can be recovered
from images.
 It is used widely in many fields.
 It reduces the complexity of working with digital images.
 It is used to support a better quality of life.

Feature Extraction:-

Feature extraction is a part of the dimensionality reduction process, in which an initial
set of raw data is divided and reduced to more manageable groups, so that it is easier to
process. The most important characteristic of these large data sets is that they have a large
number of variables, and these variables require a lot of computing resources to process.
Feature extraction helps to get the best features from those big data sets by selecting
and combining variables into features, thus effectively reducing the amount of data. These
features are easy to process, but are still able to describe the actual data set with accuracy
and originality.
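
One standard way to combine variables into a smaller set of features is principal component
analysis (PCA). It is not named in the text above, but it illustrates the idea; below is a
minimal NumPy sketch with a random data matrix and an illustrative choice of n_components.

```python
import numpy as np

def pca_features(X, n_components=2):
    """Reduce a data matrix X (samples x variables) to fewer features by
    projecting onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

X = np.random.rand(100, 20)          # 100 samples, 20 raw variables
features = pca_features(X, n_components=3)
print(features.shape)                # (100, 3): fewer, combined features
```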

Why Feature Extraction is Useful?

 The technique of extracting the features is useful when you have a large data set and need
to reduce the number of resources without losing any important or relevant information.
 Feature extraction helps to reduce the amount of redundant data from the data set.
 In the end, the reduction of the data helps to build the model with less machine effort and
also increases the speed of learning and generalization steps in the machine
learning process.

Applications of Feature Extraction

 Bag of Words: Bag-of-Words is one of the most widely used techniques in natural language
processing. In this process the words (features) are extracted from a sentence,
document, website, etc. and then classified by their frequency of use. Feature extraction
is one of the most important parts of this whole process (a tiny sketch follows this list).
 Image Processing: Image processing is one of the most interesting domains. In
this domain you work with images in order to understand them. Many techniques are used
here, including feature extraction, together with algorithms that detect features such as
shapes, edges, or motion in a digital image or video.
 Auto-encoders: The main purpose of auto-encoders is efficient data coding, which is
unsupervised in nature; this process comes under unsupervised learning. Feature extraction
is applicable here to identify the key features of the data: the auto-encoder learns a
compact coding of the original data set from which new representations are derived.
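
A tiny sketch of the Bag-of-Words idea using only the Python standard library; the sentences are
made-up examples.

```python
from collections import Counter

# Toy "documents"; in practice these would come from real text data.
docs = ["the cat sat on the mat",
        "the dog sat on the log"]

# Vocabulary: all distinct words across the documents.
vocab = sorted({word for doc in docs for word in doc.split()})

# Each document becomes a vector of word counts (its bag of words).
for doc in docs:
    counts = Counter(doc.split())
    vector = [counts[word] for word in vocab]
    print(vector)
```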

7. Correlation:-

 Correlation explains how one or more variables are related to each other. These variables
can be input data features which have been used to forecast our target variable.
 Correlation is a statistical technique which determines how one variable moves/changes in
relation to another variable.
 It gives us an idea about the degree of the relationship between the two variables. It is a
bi-variate analysis measure which describes the association between different variables.
 In most businesses it is useful to express one quantity in terms of its relationship with
others.
 For example: the number of tests vs. the number of positive cases during the Corona pandemic.

Positive Correlation:- Two features (variables) can be positively correlated with each other. It
means that when the value of one variable increases, the value of the other variable(s) also
increases.

Positive Correlation

Negative Correlation:- Two features (variables) can be negatively correlated with each other. It
means that when the value of one variable increases, the value of the other variable(s)
decreases.

Negative Correlation

No Correlation:- Two features (variables) are not correlated with each other. It means that when
the value of one variable increases or decreases, the value of the other variable(s) does not
change in any consistent way.

No Correlation
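
A small sketch of measuring correlation with the Pearson correlation coefficient in NumPy; the
test/positive-case numbers are made up for illustration.

```python
import numpy as np

# Hypothetical data: number of tests carried out vs. number of positive
# cases (illustrative values only).
tests     = np.array([100, 200, 300, 400, 500], dtype=float)
positives = np.array([ 12,  25,  33,  48,  55], dtype=float)

# Pearson correlation coefficient: close to +1 for positive correlation,
# close to -1 for negative correlation, near 0 for no linear correlation.
r = np.corrcoef(tests, positives)[0, 1]
print(round(r, 3))                                       # close to +1 here

# A negatively correlated pair for comparison.
print(round(np.corrcoef(tests, -positives)[0, 1], 3))    # close to -1
```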
8. Edge Detection

Edge detection is a very old problem in computer vision which involves detecting the
edges in an image to determine object boundaries and thus separate the object of interest. One
of the most popular techniques for edge detection is Canny edge detection, which has been
the go-to method for most computer vision researchers and practitioners. Let’s have a
quick look at Canny edge detection.
Canny Edge Detection Algorithm:-
Canny Edge detection was invented by John Canny in 1983 at MIT. It treats edge
detection as a signal processing problem. The key idea is that if you observe the change in
intensity on each pixel in an image, it’s very high on the edges.
In this simple image below, the intensity change only happens on the boundaries. So, you
can very easily identify edges just by observing the change in intensity.

Now, look at this image. The intensity is not constant but the rate of change in intensity is
highest at the edges. (Calculus refresher: the rate of change can be calculated using the first
derivative (gradient).)

The Canny Edge Detector identifies edges in 4 steps:

I. Noise removal: Since this method depends on sudden changes in intensity, an image with a lot
of random noise would have the noise detected as edges. So, it is a very good idea to smooth the
image using a 5×5 Gaussian filter.

II. Gradient Calculation: In the next step, we calculate the gradient of intensity (the rate of
change in intensity) at each pixel in the image. We also calculate the direction of the
gradient, which is perpendicular to the edges. It is mapped to one of four directions
(horizontal, vertical, and the two diagonal directions).


III. Non-Maximal Suppression: Now, we want to remove the pixels (set their values to 0) which
are not edges. You might say that we can simply pick the pixels with the highest gradient values
and call those our edges. However, in real-world images the gradient does not simply peak at one
pixel; it is also very high on the pixels near the edge. So, we pick the local maxima in a 3×3
neighborhood along the direction of the gradient.

IV. Hysteresis Thresholding: In the next step, we need to decide on a threshold value of the
gradient below which all pixels would be suppressed (set to zero). However, the Canny edge
detector uses hysteresis thresholding, a very simple yet powerful idea. Instead of using just
one threshold, we use two thresholds:

 High threshold= A very high value is chosen in such a way that any pixel having gradient
value higher than this value is definitely an edge.
 Low threshold= A low value is chosen in such a way that any pixel having gradient value
below this value is definitely not an edge.

Pixels having gradients between these two thresholds are checked to see whether they are
connected to an edge; if yes, they are kept, else they are suppressed.
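
The four steps above are implemented inside OpenCV's cv2.Canny. A minimal usage sketch follows;
"photo.png" is a placeholder file name and the thresholds 100 and 200 are illustrative choices.

```python
import cv2

# Load the image as a single-channel gray-scale array.
gray = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)

# Step I: noise removal with a 5x5 Gaussian filter.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Steps II-IV (gradient, non-maximal suppression, hysteresis) are done
# inside cv2.Canny; the two arguments are the low and high thresholds.
edges = cv2.Canny(blurred, 100, 200)

cv2.imwrite("edges.png", edges)
```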
