
PAPER ON IMAGE RECOGNITION

COMPILED BY:

CS/7HCS/00187

CS/7HCS/00188

CS/7HCS/00194

CS/7HCS/00202

CS/7HCS/00207

SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE, SCHOOL


OF PROFESSIONAL STUDIES, KWARA STATE UNIVERSITY, MALETE.

COURSE TITLE: SELECTED TOPICS IN COMPUTER SCIENCE

COURSE CODE: HCSC408

LECTURER: DR. AISHIAT YUSUF ASAJU


ABSTRACT

The recent advancement in artificial intelligence and machine learning has

contributed to the growth of computer vision and image recognition concepts. From

controlling a driver-less car to carrying out face detection for biometric access, image

recognition helps in processing and categorizing objects based on trained algorithms. This

paper discusses image recognition, the ability of a computer-powered camera to identify and detect objects or features in a digital image or video. The paper also discusses how image recognition works, its benefits, its usage in business, and its challenges.
INTRODUCTION

Just like the phrase “What-you-see-is-what-you-get” says, human brains make vision easy.

It takes no effort for humans to tell what a dog or a cat looks like, but this process is quite hard for a computer to imitate. It seems easy for human beings only because our brains are naturally and incredibly good at recognizing images. The point here is that when it comes to

identifying images, we humans can clearly recognize and distinguish different features of

objects. This is because our brains have been trained unconsciously on countless similar images, which has resulted in the ability to differentiate between things effortlessly.

We are hardly conscious when we interpret the real world. Encountering different entities of the visual world and distinguishing between them with ease is no challenge to us. Our subconscious mind carries out all of these processes without any hassle. Contrary to the human brain, a computer views visuals as an array of numerical values and looks for patterns in the digital image, be it a still image, a video, or a graphic; it recognizes them and distinguishes the key features of the image. The manner in which

a system interprets an image is completely different from humans.

Computer vision also uses image processing algorithms to analyze and understand

visuals from a single image or a sequence of images. An example of computer vision is

identifying pedestrians and vehicles on the road, or categorizing and filtering millions of user-uploaded pictures with accuracy. Consequently, the two terms “computer vision” and “image recognition” are often used interchangeably. The goal of computer vision (CV) is to let a computer

imitate human vision and take actions. For example, CV can be designed to sense a running child

on the road and produce a warning signal to the driver. In contrast, image recognition is about the

pixel and pattern analysis of an image to recognize the image as a particular object. Computer

vision means the system can “do something” with the recognized images. Image recognition can be described as the labeling process applied to a segmented object of a scene. That is, image recognition presumes that objects in a scene have already been segmented as individual elements (e.g. a bolt and a

wrench). The typical constraint here is that images are acquired in a known viewing geometry

(often perpendicular to the workspace).

Image recognition methodologies are distinguished into:

1. Decision-theoretic methods, which use appropriate decision or discriminant functions to match the objects to one of several prototypes, and

2. Structural methods, where an object is decomposed into a set of primitive element patterns of predefined length and direction.

BRIEF HISTORY OF IMAGE RECOGNITION

In the year 2001, an efficient algorithm for face detection was invented by Paul Viola and

Michael Jones. Their demo that showed faces being detected in real time on a webcam feed was

the most stunning demonstration of computer vision and its potential at the time. Soon, it was

implemented in OpenCV (the Open Source Computer Vision library), and face detection became synonymous with the Viola-Jones algorithm. Every few years a new idea comes along that forces people to pause

and take note. In object detection, that idea came in 2005 with a paper by Navneet Dalal and Bill

Triggs. Their feature descriptor, Histograms of Oriented Gradients (HOG), significantly

outperformed existing algorithms in pedestrian detection.

Every decade or so a new idea comes along that is so effective and powerful that you

abandon everything that came before it and wholeheartedly embrace it. Deep Learning is that

idea of this decade. Deep Learning algorithms had been around for a long time, but they became

mainstream in computer vision with their resounding success at the ImageNet Large Scale Visual

Recognition Challenge (ILSVRC) of 2012. In that competition, an algorithm based on Deep

Learning by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton shook the computer vision

world with an astounding 85% accuracy, 11% better than the runner-up algorithm. In ILSVRC 2012, this was the only Deep Learning based entry. In 2013, all winning

entries were based on Deep Learning and in 2015 multiple Convolutional Neural Network

(CNN) based algorithms surpassed the human recognition rate of 95%.

With such huge success in image recognition, Deep Learning based object detection was

inevitable. Techniques like Faster R-CNN produce jaw-dropping results over multiple object

classes. For now, keep in mind that if you have not looked at Deep Learning based image recognition and object detection algorithms for your applications, you may be missing out on a huge opportunity to get better results. With that overview, we are ready to return to a central topic of this paper: understanding image recognition using traditional computer vision techniques.

What is image recognition?

Image recognition is the ability of a computer powered camera to identify and detect

objects or features in a digital image or video. It is a method for capturing, processing,

examining, and understanding images. To identify and detect objects in images, computers use machine

vision technology that is powered by an artificial intelligence system.

In other words, image recognition can be said to be the ability of a system or software to

identify objects, people, places, and actions in images. It uses machine vision technologies with

artificial intelligence and trained algorithms to recognize images through a camera system.

Typical image recognition algorithms include the following:

• Optical character recognition

• Pattern matching and gradient matching

• Face recognition

• License plate matching

• Scene identification

How does image recognition work?

Image recognition technology works by detecting salient regions, which are portions that

contain the most information about the image or the object. It does this by isolating the most

informative portions or features in a selected image and localizing them, while ignoring the rest of

the features that may not be of much interest. The process uses an image recognition algorithm,

also known as an image classifier, that takes an image as input and outputs what the image

contains. For an algorithm to know what an image contains, it has to be trained to learn the

differences between classes. For instance, if the goal of an image recognition system is to detect

and identify dogs, the image recognition algorithm needs to be trained with thousands of images

of dogs and thousands of images of backgrounds that do not contain any dogs.

How does image recognition technology actually work?

Facebook can now perform face recognition at 98% accuracy, which is comparable to the

ability of humans. Facebook can identify your friend’s face with only a few tagged pictures. The

efficacy of this technology depends on the ability to classify images. Classification is pattern

matching with data. Images are data in the form of 2-dimensional matrices. In fact, image

recognition is classifying data into one category out of many. One common and important example is optical character recognition (OCR). OCR converts images of typed or handwritten text into machine-encoded text. The major steps in the image recognition process are to gather and organize data, build a predictive model, and use it to recognize images.
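To make these three steps concrete, here is a minimal, hedged sketch in Python using scikit-learn's bundled 8×8 handwritten-digit images as an OCR-style example; the dataset, the choice of an SVM classifier, and the parameters are assumptions made purely for illustration and are not prescribed by this paper.

```python
# Minimal sketch of the three major steps: gather/organize data,
# build a predictive model, and use it to recognize new images.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Step 1: gather and organize data (images and their labels).
digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten each 8x8 image into a vector
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 2: build a predictive model from the training images.
model = SVC(kernel="rbf", gamma=0.001)
model.fit(X_train, y_train)

# Step 3: use the model to recognize unseen images.
print("Predicted:", model.predict(X_test[:5]))
print("Actual:   ", y_test[:5])
print("Accuracy: ", model.score(X_test, y_test))
```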

How is image recognition used in business?

In the commercial world, the major applications of image recognition are face

recognition, security and surveillance, visual geolocation, object recognition, gesture recognition,

code recognition, industrial automation, medical image analysis, and driver assistance. These

applications are revolutionizing the business world, across many industries, and here’s how:

• E-commerce

Image recognition has been highly adopted in e-commerce, including search and

advertising. Today, mobile applications use the technology to identify specific

products, providing potential customers with a more engaging experience of the world

around them. It presents a more interactive view of the world by making everything

searchable.

• Business process management

Image recognition technology can assist in the identification process during

business operations. An example of this would be the replacement of traditional ID

cards with Face ID. In the workplace, this can be used to determine if a person is

granted access to official work documents or simply to check in. Another example

where image recognition is applicable for efficient business operations is in the

manufacturing process. Machines equipped with image recognition can automatically

detect defective products in the manufacturing pipeline.

• Automotive industry

Self-driving cars are the buzz in the automotive industry and are already being tested in the

U.S. and other parts of the world. These advancements in the automobile world are made

possible by computer vision technology which uses AI image recognition. Computer vision

systems powered by deep learning are trained using thousands of images of road signs, pathways, moving objects, vehicles, and people, which are fed into the system's neural networks. The systems become more intelligent as more training data is fed in, and this is how autonomous driving is enabled.

The Benefits of Image Recognition

Image recognition can really help you with digital marketing. By integrating the

application programming interface (API) with your text-based analytics platforms, you will be able to offer visual insights to your customers, using logo detection, without expensive product creation. Image recognition can also help you monitor ROI (return on investment) and protect

your brand. You will be able to track how a sponsorship is doing with image and logo detection

and this will help you determine how much revenue you will get in return. Therefore, integrating

an image recognition application programming interface (API) is an easy way of giving your customers

the best service.

When using image recognition, you can easily transpose digital information onto the real world. As opposed to virtual reality, image recognition doesn't replace your environment with a digital one; instead, it adds to it. In addition, you can easily organize your visual memory. Image recognition software can help you make mental notes through visuals. If you take an image, computer vision will match it with stored visual information, meaning that you can get

information about wine bottles, books, DVDs, and many more by simply taking a photo of their

covers or labels. When you have these images in your computer, you can then search for the

information they contain, relying on keywords, location, etc.

Challenges of Image Recognition

Viewpoint Variation: In the real world, the entities within an image are oriented in different directions, and when such images are fed to the system, the system predicts inaccurate values. In short, the system fails to understand that changing the orientation of the image (left, right, bottom, top) does not make it a different object, and that is why viewpoint variation creates challenges in image recognition.

Scale Variation: Variations in size affect the classification of the object. The closer you are to the object, the bigger it appears, and vice versa.

Deformation: An object remains the same object even when it is deformed. The system, however, learns from perfect images and forms the perception that a particular object can exist only in a specific shape. We

know that in the real world, shape changes and as a result, there are inaccuracies when the

system encounters a deformed image of an object.

Intra-class Variation: Certain objects vary within a class. They can be of different shapes and sizes but still represent the same class. For example, buttons, chairs, bottles, and bags come in different sizes and appearances.

Uses of Image Recognition

• Drones:

Drones equipped with image recognition capabilities can provide vision-based

automatic monitoring, inspection, and control of the assets located in remote

areas.

• Manufacturing:

Inspecting production lines, evaluating critical points on a regular basis within the premises, monitoring the quality of final products to reduce defects, and assessing the condition of workers can help manufacturing industries have complete control over the different activities in their systems.

• Autonomous Vehicles:

Autonomous vehicles with image recognition can identify activities on the road

and take necessary actions. Mini robots can help logistics industries locate and transfer objects from one place to another, while maintaining a database of product movement history to prevent products from being misplaced or stolen.

• Military Surveillance:

Detection of unusual activities in border areas and automatic decision-making capabilities can help prevent infiltration and save the lives of

soldiers.

• Forest Activities: Unmanned Aerial Vehicles can monitor the forest, predict

changes that can result in forest fires, and prevent poaching. They can also provide complete monitoring of vast lands that humans cannot access easily.

STEPS USED IN IMAGE RECOGNITION

• The first and very obvious stage is obtaining the image that we will be working on. With the

current state of technology, obtaining an image is very easy; a good picture can be taken with a smartphone or with a camera installed on a laptop. A computer can only analyze digital input, so each photo has to be represented as a set of points (pixels). Only this kind of signal can then be converted into a decision on further action. If we use a digital camera, the image is already digital, so we don't have to worry about this step.

• The second step of the process is to find a mathematical description of the image. There are

three main levels of image processing. Pre-processing is used to remove noise, to sharpen the photo so that a significant object in the image can be recognized more easily, or to convert the photo to grayscale if color is not an important feature in the problem being studied.

• The third step is the definition of the set of features that will be the most descriptive for

identified objects. It is called mid-level processing. The choice of these features strongly

affects recognition, so they are selected specifically for the application at hand. Very often,

quantitative features are used, which can be conveniently expressed and put on coordinate

axes such as the size of the feature or distance between features.

For a human face, for example, the features can be its width, the distance between the corners of the lips, or the distance between the pupils. The “measured” face can then be placed as a point in this feature space, which allows us to identify a specific person and distinguish them from others, even very similar ones.

• Now, we are entering the next stage of the image recognition process: classification. It is a mathematical function that can assign each feature vector to one of the classes on which we will be making decisions later. Similar objects will be close to each other in the feature space, while different ones will lie far apart, which leads to compact, well-separated groups of points, each representing

objects of a given class. As you can guess, objects of different classes will be located at a

considerable distance from other classes. Such grouping is a prerequisite for being able to

recognize objects effectively. If different classes are mixed due to incorrect initial selection of

features, it will be impossible to group them correctly.

• The last step is the actual recognition and decision making. This stage focuses on the proper description of the image in the form of a mathematical formula. It refers to the concept of indicator functions, which are created automatically by the image recognition algorithm through a learning process. It is important that, for every point, the algorithm can determine to what extent the object meets the conditions for being classified into the appropriate class. A correctly constructed set of indicator functions for all the classes, or correctly defined boundaries between clusters of classes, leads directly to the final stage of the whole process, recognition and decision making: the object is assigned to the class for which the indicator function has the highest value, or according to which side of the designated boundary it lies on.
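The decision rule described above can be illustrated with a small, hypothetical sketch: given indicator-function scores for each class in a scene, the object is assigned to the class with the highest score. The class names and score values below are invented purely for illustration.

```python
import numpy as np

# Hypothetical indicator-function scores for one segmented object
# (values are illustrative only): the higher the score, the better
# the object satisfies that class's conditions.
classes = ["bolt", "wrench", "background"]
scores = np.array([0.12, 0.81, 0.07])

# Decision making: assign the object to the class whose indicator function is largest.
predicted_class = classes[int(np.argmax(scores))]
print(predicted_class)  # -> "wrench"
```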

These steps can be further explained below:

Step 1: Preprocessing

Often an input image is pre-processed to normalize contrast and brightness effects. A very

common preprocessing step is to subtract the mean of image intensities and divide by the

standard deviation. Sometimes, gamma correction produces slightly better results. While dealing

with color images, a color space transformation ( e.g. RGB to LAB color space ) may help get

better results. As part of pre-processing, an input image or patch of an image is also cropped and

resized to a fixed size. This is essential because the next step, feature extraction, is performed on

a fixed-size image.
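The sketch below illustrates these pre-processing operations with OpenCV and NumPy; the synthetic test image, the 64×128 target size, and the gamma value are assumptions chosen only for this example.

```python
import cv2
import numpy as np

# Synthetic BGR test image standing in for a real photo; in practice this
# would come from cv2.imread("some_image.jpg") (the path is purely illustrative).
img = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)

# Crop/resize to a fixed size so the next step (feature extraction) sees a known shape.
patch = cv2.resize(img, (64, 128))  # (width, height)

# Optional color space transformation, e.g. BGR (OpenCV's default) to LAB.
lab = cv2.cvtColor(patch, cv2.COLOR_BGR2LAB)

# Normalize contrast and brightness: subtract the mean intensity, divide by the std.
norm = lab.astype(np.float32)
norm = (norm - norm.mean()) / (norm.std() + 1e-8)

# Optional gamma correction (the gamma value is an arbitrary choice for this sketch).
gamma = 0.8
gamma_corrected = np.power(patch.astype(np.float32) / 255.0, gamma)

print(norm.shape)  # (128, 64, 3): height 128, width 64, 3 channels
```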

Step 2: Feature Extraction

The input image has too much extra information that is not necessary for classification.

Therefore, the first step in image classification is to simplify the image by extracting the

important information contained in the image and leaving out the rest. For example, if you want

to find shirt and coat buttons in images, you will notice a significant variation in RGB pixel

values. However, we can simplify the image by running an edge detector on an image. You can

still easily discern the circular shape of the buttons in these edge images and so we can conclude

that edge detection retains the essential information while throwing away non-essential

information. This step is called feature extraction. In traditional computer vision approaches, designing these features is crucial to the performance of the algorithm. It turns out we can do much better than simple edge detection and find features that are much more reliable. In our

example of shirt and coat buttons, a good feature detector will not only capture the circular shape

of the buttons but also information about how buttons are different from other circular objects

like car tires. Some well-known features used in computer vision are Haar-like features

introduced by Viola and Jones, Histogram of Oriented Gradients ( HOG ), Scale-Invariant

Feature Transform ( SIFT ), Speeded Up Robust Feature ( SURF ) etc.
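To make the edge-detection example above concrete, the following sketch runs a Canny edge detector (one common choice; the thresholds and the synthetic circular "button" image are assumptions made for illustration) so that only the outline information is retained.

```python
import cv2
import numpy as np

# Synthetic grayscale test image with a filled circle standing in for a button.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(img, (100, 100), 40, 200, -1)  # center, radius, intensity, filled

# Edge detection discards most pixel values but keeps the circular outline,
# i.e. the essential shape information survives while the rest is thrown away.
edges = cv2.Canny(img, 100, 200)  # lower and upper hysteresis thresholds

print("non-zero edge pixels:", int(np.count_nonzero(edges)))
```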

Feature Extraction Using Histogram of Oriented Gradients ( HOG )

A feature extraction algorithm converts an image of fixed size to a feature vector of fixed

size. In the case of pedestrian detection, the HOG feature descriptor is calculated for a 64×128

patch of an image and it returns a vector of size 3780. Notice that the original dimension of this

image patch was 64 x 128 x 3 = 24,576 which is reduced to 3780 by the HOG descriptor.

HOG is based on the idea that local object appearance can be effectively described by the

distribution ( histogram ) of edge directions ( oriented gradients ). The steps for calculating the

HOG descriptor for a 64×128 image are listed below.

1. Gradient calculation: Calculate the x and y gradient images, g_x and g_y, from the original image. This can be done by filtering the original image with simple one-dimensional derivative kernels, [-1, 0, 1] in the horizontal direction and its transpose in the vertical direction.

2. Cells : Divide the image into 8×8 cells.

3. Calculate histogram of gradients in these 8×8 cells : At each pixel in an 8×8 cell we

know the gradient ( magnitude and direction ), and therefore we have 64 magnitudes and

64 directions i.e. 128 numbers. Histogram of these gradients will provide a more useful

and compact representation. We will next convert these 128 numbers into a 9-bin histogram (i.e. 9 numbers). The bins of the histogram correspond to gradient directions

0, 20, 40 … 160 degrees. Every pixel votes for either one or two bins in the histogram. If

the direction of the gradient at a pixel is exactly 0, 20, 40 … or 160 degrees, a vote equal

to the magnitude of the gradient is cast by the pixel into the bin. A pixel where the

direction of the gradient is not exactly 0, 20, 40 … 160 degrees splits its vote among the

two nearest bins based on the distance from the bin. E.g. A pixel where the magnitude of

the gradient is 2 and the angle is 20 degrees will vote for the second bin with value 2. On

the other hand, a pixel with gradient 2 and angle 30 will vote 1 for both the second bin (

corresponding to angle 20 ) and the third bin ( corresponding to angle 40 ).

4. Block normalization : The histogram calculated in the previous step is not very robust to

lighting changes. Multiplying image intensities by a constant factor scales the histogram

bin values as well. To counter these effects we can normalize the histogram — i.e. think

of the histogram as a vector of 9 elements and divide each element by the magnitude of

this vector. In the original HOG paper, this normalization is not done over the 8×8 cell

that produced the histogram, but over 16×16 blocks. The idea is the same, but now

instead of a 9 element vector you have a 36 element vector.

5. Feature Vector : In the previous steps we figured out how to calculate histogram over an

8×8 cell and then normalize it over a 16×16 block. To calculate the final feature vector

for the entire image, the 16×16 block is moved in steps of 8 ( i.e. 50% overlap with the

previous block ) and the 36 numbers ( corresponding to 4 histograms in a 16×16 block )

calculated at each step are concatenated to produce the final feature vector.

What is the length of the final vector ?

The input image is 64×128 pixels in size, and we are moving 8 pixels at a time.

Therefore, we can make 7 steps in the horizontal direction and 15 steps in the vertical

direction which adds up to 7 x 15 = 105 steps. At each step we calculated 36 numbers,

which makes the length of the final vector 105 x 36 = 3780.
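One quick way to check the 3780 figure is to compute a HOG descriptor directly. The sketch below uses OpenCV's HOGDescriptor with its default Dalal-Triggs parameters (64×128 window, 16×16 blocks, 8×8 block stride, 8×8 cells, 9 bins); the random test patch is only a placeholder for a real image crop.

```python
import cv2
import numpy as np

# Synthetic 64x128 grayscale patch (width 64, height 128) standing in for a real crop.
patch = np.random.randint(0, 256, (128, 64), dtype=np.uint8)

# Default OpenCV HOGDescriptor: 64x128 window, 16x16 blocks, 8x8 block stride,
# 8x8 cells, 9 orientation bins -- the Dalal-Triggs pedestrian-detection setup.
hog = cv2.HOGDescriptor()
descriptor = hog.compute(patch)

print(descriptor.size)  # 3780 = 105 blocks x 36 numbers per block
```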

Step 3 : Learning Algorithm For Classification

In the previous section, we learned how to convert an image to a feature vector. In this

section, we will learn how a classification algorithm takes this feature vector as input and outputs

a class label (e.g. cat or background).

Before a classification algorithm can do its magic, we need to train it by showing

thousands of examples of cats and backgrounds. Different learning algorithms learn differently,

but the general principle is that learning algorithms treat feature vectors as points in a higher-dimensional space, and try to find planes or surfaces that partition that space in such a way that all examples belonging to the same class lie on one side of the plane or surface.
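As a minimal sketch of this principle, the example below trains a linear support vector machine (one common choice of classifier, not the only one) on synthetic feature vectors standing in for the "cat" and "background" classes; the feature dimensions and distributions are invented for illustration only.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-ins for HOG-like feature vectors of two classes
# (label 1 = "cat", label 0 = "background"); real features would come from Step 2.
rng = np.random.default_rng(0)
cats = rng.normal(loc=1.0, scale=0.5, size=(200, 36))
backgrounds = rng.normal(loc=-1.0, scale=0.5, size=(200, 36))
X = np.vstack([cats, backgrounds])
y = np.array([1] * 200 + [0] * 200)

# The linear SVM finds a hyperplane that puts each class on one side.
clf = LinearSVC()
clf.fit(X, y)

new_vector = rng.normal(loc=1.0, scale=0.5, size=(1, 36))
print("predicted label:", clf.predict(new_vector)[0])  # expected: 1 ("cat")
```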

Recognition methods in image processing

Image recognition is the process of identifying and detecting an object or a feature in a

digital image or video. This concept is used in many applications like systems for factory

automation, toll booth monitoring, and security surveillance. Typical image recognition

algorithms include:

• Optical character recognition

• Pattern matching and gradient matching

• Face recognition

• License plate matching

• Scene identification or scene change detection

Machine learning and deep learning methods can also be useful approaches to image recognition.

Recognition Using Machine Learning

A machine learning approach to image recognition involves identifying and extracting

key features from images and using them as input to a machine learning model.

An example of this is classifying digits using HOG features and an SVM classifier.

Figure: Digit classification using histogram of oriented gradients (HOG) feature extraction (top) and SVMs (bottom).
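A hedged sketch of such a pipeline is given below, combining scikit-image's HOG implementation with a linear SVM on scikit-learn's small 8×8 digit images; the scaled-down HOG cell and block sizes are assumptions of this example, chosen to suit the tiny images, and are not parameters taken from the figure.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from skimage.feature import hog

digits = datasets.load_digits()

# Extract a HOG feature vector from each 8x8 digit image
# (small cells/blocks because the images themselves are tiny).
features = np.array([
    hog(img, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(2, 2))
    for img in digits.images
])

X_train, X_test, y_train, y_test = train_test_split(
    features, digits.target, test_size=0.25, random_state=0
)

# Train a linear SVM on the HOG features and evaluate on held-out digits.
clf = LinearSVC()
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```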

Image Recognition Using Deep Learning

A deep learning approach to image recognition may involve the use of a convolutional

neural network to automatically learn relevant features from sample images and automatically

identify those features in new images.
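A minimal sketch of such a network is shown below using Keras; the input size, number of classes, and layer sizes are arbitrary assumptions made for illustration rather than a reference architecture.

```python
import tensorflow as tf

# Tiny convolutional neural network for illustration: it learns its own features
# (convolution filters) directly from pixels instead of using hand-designed descriptors.
num_classes = 10                     # assumed number of object categories
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                  # assumed input image size
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # learn low-level features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # learn higher-level features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then call model.fit(images, labels, epochs=...) on labeled sample images.
```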

SUMMARY/CONCLUSION

The image recognition market is estimated to grow from USD 15.95 Billion in 2016 to

USD 38.92 Billion by 2021, at a CAGR of 19.5% between 2016 and 2021. Advancements in

machine learning and the use of high-bandwidth data services are fueling the growth of this

technology. Companies in different sectors such as e-commerce, automotive, healthcare, and

gaming are rapidly adopting image recognition.

According to the report by Markets and Markets, the image recognition market is divided

into hardware, software, and services. The hardware segment, dominated by smartphones and scanners, can play a huge role in the growth of the image recognition market. Image recognition has no doubt taken the world by storm, as its role in artificial intelligence and machine learning cannot be over-emphasized, owing to recent advancements in machine learning and an increase in the computational power of machines.
