Image Recognition
COMPILED BY:
CS/7HCS/00187
CS/7HCS/00188
CS/7HCS/00194
CS/7HCS/00202
CS/7HCS/00207
ABSTRACT
Over the years, advancements in technology have contributed to the growth of computer vision and image recognition concepts. From controlling a driverless car to carrying out face detection for biometric access, image recognition helps in processing and categorizing objects based on trained algorithms. This paper discusses image recognition, the ability of a computer-powered camera to identify and detect objects or features in a digital image or video. The paper also discusses how image recognition works, its benefits, its usage in business, its challenges, etc.
INTRODUCTION
Just like the phrase "what you see is what you get" suggests, human brains make vision easy. It takes no effort for humans to tell what a dog or a cat looks like, but this process is quite hard for a computer to imitate. It seems easy for human beings only because our brains are naturally incredibly good at recognizing images. The point here is that when it comes to identifying images, we humans can clearly recognize and distinguish different features of objects. This is because our brains have been trained unconsciously on similar sets of images, which has resulted in the ability to differentiate between things effortlessly. We are hardly conscious when we interpret the real world; encountering different entities of the visual world and distinguishing them with ease is no challenge to us. Our subconscious mind carries out all these processes without any hassle. Contrary to the human brain, a computer views visuals as an array of numerical values and looks for patterns in the digital image, be it a still, a video, or a graphic; it recognizes them and distinguishes the key features of the image.
Computer vision also uses image processing algorithms to analyze and understand visuals, from identifying pedestrians and vehicles on the road to categorizing and filtering millions of user-uploaded pictures with accuracy. Therefore, the two terms "computer vision" and "image recognition" are often used interchangeably. The goal of computer vision (CV) is to let a computer imitate human vision and take actions accordingly. For example, a CV system can be designed to sense a child running on the road and produce a warning signal to the driver. In contrast, image recognition is about the pixel and pattern analysis of an image to recognize the image as a particular object; computer vision means the system can "do something" with the recognized images. Image recognition is called the
labeling process applied to a segmented object of a scene. That is, image recognition presumes that objects in a scene have been segmented as individual elements (e.g. a bolt and a wrench). The typical constraint here is that images are acquired in a known viewing geometry. Approaches to recognition include:
1. Decision-theoretic methods, which use proper decision or discriminant functions for recognition.
BRIEF HISTORY OF IMAGE RECOGNITION
In the year 2001, an efficient algorithm for face detection was invented by Paul Viola and Michael Jones. Their demo, which showed faces being detected in real time on a webcam feed, was the most stunning demonstration of computer vision and its potential at the time. Soon, it was implemented in OpenCV, and face detection became synonymous with the Viola-Jones algorithm. Every few years a new idea comes along that forces people to pause and take note. In object detection, that idea came in 2005 with a paper by Navneet Dalal and Bill Triggs on Histograms of Oriented Gradients (HOG).
Every decade or so a new idea comes along that is so effective and powerful that you abandon everything that came before it and wholeheartedly embrace it. Deep Learning is that idea of this decade. Deep Learning algorithms had been around for a long time, but they became mainstream in computer vision with their resounding success at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of 2012, where an entry based on Deep Learning by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton shook the computer vision world with an astounding 85% accuracy, 11% better than the algorithm that won second place. In ILSVRC 2012, this was the only Deep Learning based entry. In 2013, all winning entries were based on Deep Learning, and by 2015 multiple Convolutional Neural Network (CNN) based entries dominated the competition.
With such huge success in image recognition, Deep Learning based object detection was
inevitable. Techniques like Faster R-CNN produce jaw-dropping results over multiple object
classes. We will learn about these later, but for now keep in mind that if you have not looked at Deep Learning based image recognition and object detection algorithms for your
applications, you may be missing out on a huge opportunity to get better results. With that overview, we are ready to return to the main goal of this paper: understanding image recognition.
Image recognition is the ability of a computer-powered camera to identify and detect objects or features in a digital image or video by viewing, examining, and analyzing images. To identify and detect images, computers use machine vision technologies. In other words, image recognition can be said to be the ability of a system or software to identify objects, people, places, and actions in images. It uses machine vision technologies with artificial intelligence and trained algorithms to recognize images through a camera system.
Typical image recognition tasks include:
• Face recognition
• Scene identification
Image recognition technology works by detecting salient regions, which are the portions that contain the most information about the image or the object. It does this by isolating the most informative portions or features in a selected image and localizing them, while ignoring the rest of the features that may not be of much interest. The process uses an image recognition algorithm,
also known as an image classifier, that takes an image as input and outputs what the image
contains. For an algorithm to know what an image contains, it has to be trained to learn the
differences between classes. For instance, if the goal of an image recognition system is to detect
and identify dogs, the image recognition algorithm needs to be trained with thousands of images
of dogs and thousands of images of backgrounds that do not contain any dogs.
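As a toy illustration of the training idea above (the feature values and class names are made up, and a real system would learn from thousands of images), a simple nearest-mean rule can separate two classes of feature vectors:

```python
# Toy sketch (not a real image classifier): each "image" is reduced to a
# 2-number feature vector, and a nearest-mean rule separates the two classes.
training = [
    ([0.1, 0.2], "background"), ([0.3, 0.1], "background"),
    ([9.8, 10.1], "dog"),       ([10.2, 9.7], "dog"),
]

def class_mean(label):
    """Average feature vector of all training examples with this label."""
    vectors = [v for v, l in training if l == label]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

means = {label: class_mean(label) for label in ("background", "dog")}

def classify(features):
    """Assign the class whose mean feature vector is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda label: sq_dist(features, means[label]))

print(classify([9.5, 9.9]))  # -> dog
print(classify([0.2, 0.4]))  # -> background
```

A trained classifier behaves like this `classify` function: new inputs are compared against what was learned from the labeled examples.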
Facebook can now perform face recognition at 98% accuracy, which is comparable to the ability of humans. Facebook can identify your friend's face with only a few tagged pictures. The efficacy of this technology depends on the ability to classify images. Classification is pattern matching with data; images are data in the form of 2-dimensional matrices. In fact, image recognition is classifying data into one category out of many. One common and important example is optical character recognition (OCR), which converts images of typed or handwritten text into machine-encoded text. The major steps in the image recognition process are to gather and organize data, build a predictive model, and use it to recognize images.
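The point that images are data in the form of 2-dimensional matrices can be made concrete with a tiny, made-up example:

```python
# A grayscale "image" is just a 2-D matrix of pixel intensities
# (0 = black, 255 = white). This tiny 5x5 matrix crudely sketches
# the letter "T" in bright pixels on a dark background.
image = [
    [255, 255, 255, 255, 255],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
]
rows = len(image)
cols = len(image[0])

# Classifiers usually consume the matrix flattened into one feature vector.
feature_vector = [pixel for row in image for pixel in row]
print(rows, cols, len(feature_vector))  # -> 5 5 25
```

An OCR system works on exactly this kind of data, only with far larger matrices and learned models rather than hand-drawn patterns.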
In the commercial world, the major applications of image recognition are face recognition, security and surveillance, visual geolocation, object recognition, gesture recognition, code recognition, industrial automation, medical image analysis, and driver assistance. These applications are revolutionizing the business world across many industries, and here's how:
• E-commerce
Image recognition has been highly adopted in e-commerce, including visual search for products, providing potential customers with a more engaging experience of the world around them. It presents a more interactive view of the world by making everything searchable.
• Face recognition
Payments can be authorized with Face ID. In the workplace, this can be used to determine whether a person is granted access to official work documents or simply to check in.
• Automotive industry
Self-driving cars are the buzz in the automotive industry and are already being tested in the U.S. and other parts of the world. These advancements in the automobile world are made possible by computer vision technology, which uses AI image recognition. Computer vision systems powered by deep learning are trained using thousands of images, such as road signs, pathways, moving objects, vehicles, and people, which are fed into the systems' neural networks. The systems get more intelligent as more training data is fed in, and this is how self-driving cars improve over time.
• Digital marketing
Image recognition can really help you with digital marketing. By integrating the application's programming interface into your text-based analytics platforms, you will be able to offer visual insights to your customers without expensive product creation, using logo detection. Image recognition can also help you monitor ROI (return on investment) and protect your brand. You will be able to track how a sponsorship is doing with image and logo detection, and this will help you determine how much revenue you will get in return. Therefore, integrating an image recognition application programming interface is an easy way of giving your customers visual insights.
When using image recognition, you can easily transpose digital information. As opposed to virtual reality, image recognition doesn't replace your environment with a digital one; instead, it adds more perks to it. In addition, you can easily organize your visual memory. Image recognition software can help you make mental notes through visuals. If you take an image, the computer vision system will match it with related background information, meaning that you can get information about wine bottles, books, DVDs, and many more by simply taking a photo of their covers or labels. When you have these images on your computer, you can then search for the related information.
Viewpoint Variation: In the real world, the entities within an image can be oriented in different directions, and when such images are fed to the system, the system predicts inaccurate values. In short, the system fails to understand that changing the alignment of the image (left, right, bottom, top) does not make it a different object, and that is why viewpoint creates challenges in image recognition.
Scale Variation: Variations in size affect the classification of the object. The closer you view the object, the bigger it appears.
Deformation: Objects do not change identity even if they are deformed. The system learns from the perfect image and forms a perception that a particular object can exist only in a specific shape. We know that in the real world shapes change, and as a result there are inaccuracies when the system encounters deformed objects.
Inter-class Variation: Certain objects vary within a class. They can be of different shapes and sizes but still represent the same class. For example, buttons, chairs, bottles, and bags come in many different variants.
• Drones:
areas.
• Manufacturing:
the premises. Monitoring the quality of final products to reduce defects.
• Autonomous Vehicles:
Autonomous vehicles with image recognition can identify activities on the road and take the necessary actions. Mini robots can help logistics industries locate and transfer objects from one place to another, while maintaining a database of product movement history to prevent products from being misplaced or stolen.
• Military Surveillance:
Image recognition with decision-making capabilities can help prevent infiltration and result in saving the lives of soldiers.
• Forest Activities: Unmanned Aerial Vehicles can monitor the forest, predict changes that can result in forest fires, and prevent poaching. They can also provide complete monitoring of vast lands that humans cannot access easily.
• The first and very obvious stage is obtaining the image that we will be working on. With the current state of technology, obtaining an image is very easy; a good picture can be taken with both a smartphone and a camera installed on a laptop. A computer can only analyze a digital input, so each photo has to be represented as a set of points (pixels). Only this kind of signal can then be converted into a decision on further action.
• The second step of the process is to find a mathematical description of the image. There are three main levels of image processing. Pre-processing is used to remove noise, sharpen the photo so that a significant object in the image is easier to recognize, or change the colors of the photo.
• The third step is the definition of the set of features that will be the most descriptive for the identified objects. This is called mid-level processing. The choice of these features strongly affects recognition, so they are selected specifically for each application. Very often, quantitative features are used, which can be conveniently expressed and placed on coordinate axes. For a human face, for example, the features can be its width, the distance between the corners of the lips, or the distance between the pupils. A "measured" face can then be placed in this feature space, which gives us a point from which to identify a specific person and distinguish them from others, even very similar ones.
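As a sketch of this idea (all measurements below are invented for illustration), each face becomes a point in feature space, and similarity becomes a distance between points:

```python
import math

# Hypothetical face measurements in arbitrary units (illustrative only):
# (face width, lip-corner distance, pupil distance).
alice  = (14.2, 4.9, 6.3)
alice2 = (14.1, 5.0, 6.2)   # another photo of the same person
bob    = (16.8, 5.6, 7.4)

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two photos of the same face land close together in feature space,
# while a different face lands farther away.
print(distance(alice, alice2) < distance(alice, bob))  # -> True
```

This is exactly why well-chosen features matter: the classifier's whole job reduces to geometry in this space.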
Now we are entering the next stage of the image recognition process: classification. It is a mathematical function that can assign each feature vector to one of the classes on which we will be making decisions later. Similar objects will be close to each other in the feature space, while different ones will be far away, which leads to compact, well-separated groups of points, each representing the objects of a given class. As you can guess, objects of different classes will be located at a considerable distance from each other. Such grouping is a prerequisite for being able to recognize objects effectively. If different classes are mixed due to an incorrect initial selection of features, recognition becomes unreliable.
• The last step is the actual recognition and decision making. This stage focuses on the proper description of the image in the form of a mathematical formula. It relies on the concept of indicator functions, which are created automatically by the image recognition algorithm through a learning process. The point is that, for every point in the feature space, the algorithm can determine to what extent the object meets the conditions for being classified into each class. A correctly constructed selection of indicator functions for all the classes, or correctly defined boundaries between clusters of classes, leads directly to the final stage of the whole process, recognition and decision making: the object is assigned to the class for which the indicator function has the highest value, or according to which side of the boundary between classes it falls on.
These steps can be further explained below:
Step 1: Preprocessing
Often an input image is pre-processed to normalize contrast and brightness effects. A very common preprocessing step is to subtract the mean of the image intensities and divide by the standard deviation. Sometimes gamma correction produces slightly better results. When dealing with color images, a color space transformation (e.g. RGB to LAB color space) may help get better results. As part of pre-processing, an input image or patch of an image is also cropped and resized to a fixed size. This is essential because the next step, feature extraction, is performed on images of a fixed size.
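The mean-and-standard-deviation normalization described above can be sketched in a few lines (the pixel values are made up):

```python
# Toy patch of 8 pixel intensities (values are illustrative).
pixels = [52.0, 55.0, 61.0, 59.0, 79.0, 61.0, 76.0, 61.0]

mean = sum(pixels) / len(pixels)
variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
std = variance ** 0.5

# Subtract the mean and divide by the standard deviation.
normalized = [(p - mean) / std for p in pixels]

# The result has zero mean and unit variance (up to floating-point error),
# which removes global brightness and contrast differences between images.
print(abs(sum(normalized)) < 1e-9)  # -> True
```

Two photos of the same scene taken under brighter or darker lighting become much more comparable after this step.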
Step 2: Feature Extraction
The input image contains too much extra information that is not necessary for classification. Therefore, the first step in image classification is to simplify the image by extracting the important information contained in it and leaving out the rest. For example, if you want to find shirt and coat buttons in images, you will notice a significant variation in the RGB pixel values. However, we can simplify the image by running an edge detector on it. You can still easily discern the circular shape of the buttons in the edge images, so we can conclude that edge detection retains the essential information while throwing away the non-essential. This step is called feature extraction. In traditional computer vision approaches, designing these features is crucial to the performance of the algorithm. It turns out we can do much better than simple edge detection and find features that are much more reliable. In our example of shirt and coat buttons, a good feature detector will not only capture the circular shape of the buttons but also information about how buttons are different from other circular objects like car tires. Some well-known features used in computer vision are Haar-like features and the Histogram of Oriented Gradients (HOG).
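The edge-detection intuition above can be illustrated on a single row of made-up pixel intensities, using the simplest possible detector, a finite difference between neighboring pixels:

```python
# A 1-D slice of pixel intensities across a dark-to-bright boundary.
row = [10, 10, 10, 200, 200, 200]

# Simple finite-difference edge detector: large responses mark edges.
edges = [abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
print(edges)  # -> [0, 0, 190, 0, 0]
```

The flat regions collapse to zeros while the single intensity jump stands out, which is exactly the sense in which edge detection keeps essential structure and discards the rest.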
A feature extraction algorithm converts an image of fixed size to a feature vector of fixed
size. In the case of pedestrian detection, the HOG feature descriptor is calculated for a 64×128
patch of an image and it returns a vector of size 3780. Notice that the original dimension of this
image patch was 64 x 128 x 3 = 24,576 which is reduced to 3780 by the HOG descriptor.
HOG is based on the idea that local object appearance can be effectively described by the distribution (histogram) of edge directions (oriented gradients). The steps for calculating the HOG descriptor of a 64×128 image patch are as follows:
1. Gradient calculation: Calculate the x and the y gradient images, gx and gy, from the original image.
2. Division into cells: Divide the image patch into non-overlapping 8×8 pixel cells.
3. Calculate the histogram of gradients in these 8×8 cells: At each pixel in an 8×8 cell we know the gradient (magnitude and direction), so we have 64 magnitudes and 64 directions, i.e. 128 numbers. A histogram of these gradients provides a more useful and compact representation. We next convert these 128 numbers into a 9-bin histogram (i.e. 9 numbers). The bins of the histogram correspond to the gradient directions 0, 20, 40, …, 160 degrees. Every pixel votes for either one or two bins in the histogram. If the direction of the gradient at a pixel is exactly 0, 20, 40, …, or 160 degrees, a vote equal to the magnitude of the gradient is cast by the pixel into that bin. A pixel where the direction of the gradient is not exactly 0, 20, 40, …, 160 degrees splits its vote among the two nearest bins based on the distance from the bins. E.g. a pixel where the magnitude of the gradient is 2 and the angle is 20 degrees will vote for the second bin with value 2; on the other hand, a pixel with gradient magnitude 2 and angle 30 degrees will vote 1 for both the second bin (corresponding to 20 degrees) and the third bin (corresponding to 40 degrees).
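The voting scheme just described can be sketched as a small helper function (`hog_votes` is a name invented here for illustration, not a library API):

```python
def hog_votes(magnitude, angle_deg, bin_width=20, n_bins=9):
    """Split a gradient's vote between the two nearest orientation bins.

    Bins are centered at 0, 20, ..., 160 degrees ("unsigned" gradients,
    so angles wrap around at 180 degrees).
    """
    angle = angle_deg % 180.0
    lower = int(angle // bin_width) % n_bins
    upper = (lower + 1) % n_bins          # wraps 160 -> 0
    frac = (angle - lower * bin_width) / bin_width
    votes = [0.0] * n_bins
    votes[lower] += magnitude * (1.0 - frac)
    votes[upper] += magnitude * frac
    return votes

print(hog_votes(2, 20))  # angle exactly on bin 1: full vote of 2 there
print(hog_votes(2, 30))  # halfway between bins 1 and 2: votes of 1 and 1
```

Summing the votes of all 64 pixels in an 8×8 cell yields that cell's 9-bin histogram.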
4. Block normalization: The histogram calculated in the previous step is not very robust to lighting changes; multiplying the image intensities by a constant factor scales the histogram bin values as well. To counter these effects we can normalize the histogram, i.e. think of the histogram as a vector of 9 elements and divide each element by the magnitude of this vector. In the original HOG paper, this normalization is not done over the 8×8 cell that produced the histogram, but over 16×16 blocks. The idea is the same, but now a larger, 36-element vector (the concatenated histograms of the four 8×8 cells in the block) is normalized.
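A quick sketch of why this normalization helps (toy 3-bin histogram, invented values): scaling the image intensities leaves the normalized histogram unchanged.

```python
import math

def normalize(hist):
    """Divide each bin by the vector's magnitude (L2 normalization)."""
    magnitude = math.sqrt(sum(v * v for v in hist))
    return [v / magnitude for v in hist]

hist = [2.0, 1.0, 2.0]            # toy 3-bin histogram
brighter = [2 * v for v in hist]  # same scene, doubled intensities

# After normalization the two histograms are identical, so the descriptor
# becomes invariant to this kind of global lighting change.
print(normalize(hist) == normalize(brighter))  # -> True
```

In the real descriptor the same operation is applied to the 36-element block vector rather than this 3-element toy.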
5. Feature vector: In the previous steps we figured out how to calculate a histogram over an 8×8 cell and then normalize it over a 16×16 block. To calculate the final feature vector for the entire image, the 16×16 block is moved in steps of 8 pixels (i.e. with 50% overlap with the previous block), and the 36 numbers (the four 9-bin histograms in the block) calculated at each step are concatenated to produce the final feature vector. The input image is 64×128 pixels in size, and we are moving the block 8 pixels at a time; therefore, we can place it at 7 positions in the horizontal direction and 15 positions in the vertical direction, giving 7 × 15 = 105 blocks and 105 × 36 = 3780 numbers in total.
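The 3780-dimension figure quoted earlier can be checked directly from the parameters given in the text:

```python
# Recompute the HOG feature-vector length for a 64x128 patch from the
# parameters in the text: 8x8 cells, 16x16 blocks (2x2 cells), 9 bins,
# block stride of 8 pixels (50% overlap).
patch_w, patch_h = 64, 128
cell, cells_per_block, n_bins, stride = 8, 2, 9, 8

block_px = cell * cells_per_block                       # 16 pixels
blocks_x = (patch_w - block_px) // stride + 1           # 7 positions
blocks_y = (patch_h - block_px) // stride + 1           # 15 positions
per_block = cells_per_block * cells_per_block * n_bins  # 36 numbers

feature_len = blocks_x * blocks_y * per_block
print(blocks_x, blocks_y, feature_len)  # -> 7 15 3780
```

So the 24,576 raw pixel values of the patch are compressed into a 3780-element descriptor.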
Step 3: Learning Algorithm for Classification
In the previous section, we learned how to convert an image into a feature vector. In this section, we will learn how a classification algorithm takes this feature vector as input and outputs a class label (e.g. cat or background). Before it can do this, the algorithm needs to be trained with thousands of examples of cats and backgrounds. Different learning algorithms learn differently, but the general principle is that they treat feature vectors as points in a higher-dimensional space and try to find planes or surfaces that partition this space in such a way that all examples belonging to the same class fall on one side of the plane or surface.
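A minimal sketch of this idea (the data is invented, and in practice an SVM or similar classifier would be used): the classic perceptron rule nudges a separating plane until all examples of a small, linearly separable toy set fall on the correct side.

```python
# Toy 2-D feature vectors with labels +1 ("cat") and -1 ("background").
data = [
    ((2.0, 1.0), 1), ((3.0, 2.5), 1),
    ((-1.0, -2.0), -1), ((-2.5, -0.5), -1),
]

w = [0.0, 0.0]  # normal vector of the separating plane (a line in 2-D)
b = 0.0         # offset

for _ in range(20):  # a few passes are plenty for this tiny set
    for (x1, x2), label in data:
        # If the point is on the wrong side, nudge the plane toward it.
        if label * (w[0] * x1 + w[1] * x2 + b) <= 0:
            w[0] += label * x1
            w[1] += label * x2
            b += label

# Every training point now lies on the correct side of the plane.
print(all(label * (w[0] * x1 + w[1] * x2 + b) > 0
          for (x1, x2), label in data))  # -> True
```

With HOG features the same picture holds, except the points live in 3780-dimensional space instead of the plane.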
Image recognition is the process of identifying and detecting an object or a feature in a digital image or video. This concept is used in many applications, such as systems for factory automation, toll booth monitoring, and security surveillance. Typical image recognition algorithms include:
• Face recognition
Machine learning and deep learning methods can also be a useful approach to image recognition.
Recognition Using Machine Learning
A machine learning approach to image recognition involves identifying and extracting key features from images and using them as input to a machine learning model. An example of this is classifying digits using histogram of oriented gradients (HOG) feature extraction and an SVM classifier.
Recognition Using Deep Learning
A deep learning approach to image recognition may involve the use of a convolutional neural network to automatically learn relevant features from sample images and automatically identify those features in new images.
SUMMARY/CONCLUSION
The image recognition market is estimated to grow from USD 15.95 billion in 2016 to USD 38.92 billion by 2021, at a CAGR of 19.5% between 2016 and 2021. Advancements in machine learning and the use of high-bandwidth data services are fueling the growth of this market. According to the report by MarketsandMarkets, the image recognition market is divided into hardware, software, and services. The hardware segment, dominated by smartphones and scanners, can play a huge role in the growth of the image recognition market. Image recognition has no doubt taken the world by storm, as its role in artificial intelligence (machine learning) cannot be over-emphasized, owing to the recent advancements in machine learning and an increase in the adoption of the technology across industries.