SSRN Id3611339

International conference on Recent Trends in Artificial Intelligence, IOT, Smart Cities &
Applications: 2020
A REVIEW OF IMAGE DETECTION, RECOGNITION AND CLASSIFICATION
WITH THE HELP OF MACHINE LEARNING AND ARTIFICIAL
INTELLIGENCE
Riya Kumari1, Shikha Nikki2, Robin Beg3, Sawan Kumar Gope4,
Ritesh Ranjan Mallick5, Shashi Ranjan6, Arijit Dutta7
1-7Department of Computer Science & Engineering
Chaibasa Engineering College (Estd. by Govt. of Jharkhand & run by Techno India under PPP),
Jharkhand, India
Abstract:
Humans can easily detect and identify the objects in front of their eyes. The human visual system is fast and accurate, and it performs
complex tasks such as object identification and detection with ease. But consider a situation in which we have to find a ring on a table
cluttered with boxes of different sizes and other materials. Finding that ring will take a long time, and we will face some difficulties.
With a suitable computer algorithm, we could find the ring almost instantly. Similarly, with the availability of huge amounts of data and
suitable algorithms, we can train models on datasets to detect and classify multiple objects with high accuracy. In this era, DL, ML and AI
are trending. One of the most recognized fields of AI is Computer Vision, the science of computers and software systems that can recognize
and understand images. It also involves image recognition, object detection and more. In this paper, we briefly explain the concepts of
modern image detection, image classification and object recognition.
1. Introduction
Day by day, AI is becoming more advanced, but it still struggles when it comes to
interpreting images, that is, image detection, classification and recognition. These
three tasks might seem similar, but they are not the same, although each of them shares
one goal: to improve AI's ability to understand visual content. These differences lead
us to consider image detection, classification and recognition separately. The purpose
of object detection is to recognize and locate all known objects in a scene. In 3D
space, recovering the pose of an object is very important for robotic control systems.
The information from the object detector can be used for obstacle avoidance and other
interactions with the environment.
1.1. Image Detection
It is a computer technology that processes an image and detects the objects in it. Deep
learning is currently one of the most effective options for image detection. Training a
single deep neural network to solve several problems is more efficient than training
several networks that each solve a single problem; sharing parts of the deep neural
network in this way can improve its overall performance. When it comes to applying deep
learning to image detection, we use Python along with open-source libraries for object
and image detection such as OpenCV, Luminoth, ImageAI, and others. These libraries
simplify the learning process and offer a ready-to-use environment.
Keywords
AI, ML, DL, CNN, ConvNet, Keras, TensorFlow
Nomenclature
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
CNN Convolutional Neural Network
Electronic copy available at: https://ssrn.com/abstract=3611339

ICAISC 2020
1.2. Image Classification
It is the process of identifying the objects in an image. The neural network has to
process different images containing different objects, detect them, and classify them
by the type of the item in the picture. There are different types of deep learning
solutions for image classification. Among them, we use the Convolutional Neural Network
(CNN) for our analysis.
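As a rough illustration of how a CNN can turn pixels into class scores, the sketch below runs one forward pass through the four layer types a CNN typically uses (convolution, ReLU, pooling, fully connected) in plain Python. It is only a toy under stated assumptions: the 6x6 images, the hand-picked edge kernel, the fully connected weights and the class names "edge" and "flat" are all made up for this example, and nothing here is trained.

```python
import math

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def fully_connected(features, weights, biases):
    """One score per class: dot(features, class weights) + bias."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative (untrained) parameters; sized for 6x6 inputs only.
KERNEL = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]          # responds to vertical edges
CLASSES = ["edge", "flat"]
FC_WEIGHTS = [[0.5, 0.5, 0.5, 0.5], [-0.5, -0.5, -0.5, -0.5]]
FC_BIASES = [0.0, 1.0]

def classify(image):
    """Convolution -> ReLU -> pooling -> fully connected -> softmax."""
    fmap = max_pool(relu(conv2d(image, KERNEL)))
    features = [v for row in fmap for v in row]         # flatten 2x2 map
    probs = softmax(fully_connected(features, FC_WEIGHTS, FC_BIASES))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

EDGE_IMAGE = [[1, 1, 1, 0, 0, 0] for _ in range(6)]     # bright left, dark right
FLAT_IMAGE = [[1] * 6 for _ in range(6)]                # uniform brightness

print(classify(EDGE_IMAGE)[0])   # prints: edge
print(classify(FLAT_IMAGE)[0])   # prints: flat
```

In a real classifier the kernel and the fully connected weights would be learned from labeled images rather than written by hand.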
1.3. Image Recognition
It is the combination of object detection and classification: the ability of AI to
detect an object, classify it and recognize it. The significant steps in the image
recognition process are to gather and organize data, build a computing model, and use
it to recognize images. One example of an image recognition solution is face
recognition: to unlock our smartphones, we let them scan our face. For this, the
technique has to detect the face, then classify it as a human face, and finally decide
whether it belongs to the owner of the smartphone.
2. Deep Learning
Deep learning is the next evolution of machine learning. The term "Deep Learning" was
introduced to the machine learning community in 1986. Deep learning is a subset of
machine learning in which the learning method is based on data representation, or
feature learning. "Deep" refers to one or more hidden layers in this case. In deep
learning, data passes through multiple nonlinear transformations to obtain an output.
2.1. Convolutional Neural Network
A CNN, or ConvNet, is a type of feed-forward artificial neural network in which the
connectivity pattern between its neurons is inspired by the organization of the animal
visual cortex.
A Convolutional Neural Network has the following layers:
• Convolution
• ReLU Layer
• Pooling
• Fully Connected
We can use a variety of techniques to perform object detection. Popular deep
learning-based approaches using Convolutional Neural Networks (CNNs), such as R-CNN and
YOLO v2, automatically learn to detect objects within images.
We used these two approaches to get started with object detection using deep learning:
2.1.1. Create and train a custom object detector:
To train a custom object detector from scratch, we need to design a network
architecture to learn the features for the objects of interest. We also need to compile
a very large set of labeled data to train the CNN. The results of a custom object
detector can be remarkable. That said, we need to manually set up the layers and
weights in the CNN, which requires a lot of time and training data.
2.1.2. Use a pretrained object detector:
Many object detection workflows using deep learning leverage transfer learning, an
approach that enables us to start with a pretrained network and then fine-tune it for
our application. This method can provide faster results because the object detectors
have already been trained on thousands, or even millions, of images.
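The transfer-learning workflow of Section 2.1.2 (start from a pretrained network, then fine-tune it) can be sketched in miniature as follows. This is a toy under stated assumptions, not an object detector: the "pretrained" weights in `PRETRAINED_W` are made-up constants standing in for a network trained elsewhere, the four-sample dataset is invented, and only a small logistic-regression head is trained while the pretrained layer stays frozen.

```python
import math

# Stand-in for pretrained weights (made up for illustration). They are
# frozen: nothing below ever updates them.
PRETRAINED_W = [[1.0, -1.0], [-1.0, 1.0]]

def extract_features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(head, x):
    """Probability of class 1 from the frozen features and the new head."""
    w, b = head
    f = extract_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

def loss(head, data):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def fine_tune(data, epochs=200, lr=0.5):
    """Train only the new head (weights and bias) by gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            f = extract_features(x)
            err = predict((w, b), x) - y          # gradient w.r.t. the logit
            gw = [g + err * fi for g, fi in zip(gw, f)]
            gb += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

# Tiny task-specific dataset: label 1 when the first input exceeds the second.
data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]

untrained_head = ([0.0, 0.0], 0.0)
head = fine_tune(data)
print("loss before:", round(loss(untrained_head, data), 3))
print("loss after: ", round(loss(head, data), 3))
```

Because only the head's three parameters are updated, training needs far less data and time than learning every layer from scratch, which is the practical appeal of starting from a pretrained network.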
Whether we create a custom object detector or use a pretrained one, we will need to
decide what type of object detection network we want to use: a two-stage network or a
single-stage network.
2.2. Two-Stage Method:
The initial stage of two-stage networks, such as R-CNN and its variants, identifies
region proposals, or subsets of the image that might contain an object. The second
stage classifies the objects within the region proposals. Two-stage networks can
achieve very accurate object detection results; however, they are typically slower than
single-stage networks.
2.3. Single-Stage Networks:
In single-stage networks, such as YOLO v2, the CNN produces network predictions for
regions across the entire image using anchor boxes, and the predictions are decoded to
generate the final bounding boxes for the objects. Single-stage networks can be much
faster than two-stage networks, but they may not reach the same level of accuracy,
especially for scenes containing small objects.
3. Implementation Details
In this paper we are mainly concerned with object detection using image processing with
the help of TensorFlow, and we compare the results of the two most widely used
algorithms.
3.1. Basic concepts:
Object detection in TensorFlow can be done using various models. It is the procedure of
determining the class to which an object instance belongs and estimating the location
of the object by outputting a bounding box around it, together with an accuracy or
confidence reading. Both single-class and multi-class object detection can be performed
on an image. A CNN is a type of feed-forward neural network and works on the principle
of weight sharing. The image is convolved with filters, and an activation function is
applied to obtain feature maps. To reduce the spatial complexity of the network, the
feature maps are passed through a pooling layer to get abstracted feature maps. This
process is repeated for the desired number of filters, and the feature maps are
extracted accordingly. Eventually, these feature maps are processed by fully connected
layers to produce the image recognition output, which shows a confidence score for the
predicted class label.
3.2. TensorFlow:
TensorFlow is a free and open-source machine learning framework for all software
developers. It is used for implementing machine learning and deep learning
applications. The Google team created TensorFlow to develop and research ideas in
artificial intelligence. TensorFlow provides a Python API (its core is implemented
largely in C++), so it is considered an easy-to-understand framework. It is designed to
implement machine learning and deep learning concepts in a simple manner, and it
combines computational algebra with optimization techniques for easy calculation of
many mathematical expressions.
The following are the important features of TensorFlow:
• It defines, optimizes and calculates mathematical expressions easily with the help
of multi-dimensional arrays called tensors.
• It includes programming support for deep neural networks and machine learning
techniques.
• It offers highly scalable computation with various data sets.
3.2.1. Tensor Data Structure:
Tensors are used as the basic data structures in TensorFlow. Tensors represent the
connecting edges in a flow diagram called the Data Flow Graph. Tensors are defined as
multidimensional arrays or lists.
Tensors are identified by the following three parameters:
i. RANK: The unit of dimensionality described within a tensor is called its rank. It
identifies the number of dimensions of the tensor. The rank of a tensor can be
described as its order, or the number n of its dimensions.
ii. SHAPE: The size of the tensor along each dimension defines its shape; for a
matrix, this is the number of rows and columns.
iii. TYPE: The type describes the data type assigned to the tensor's elements.
A user needs to consider the following activities for building a tensor:
• Build an n-dimensional array
• Convert the n-dimensional array
3.3. Keras:
Keras is a high-level Python library that runs on top of the TensorFlow framework. It
focuses on deep learning techniques. It is used to create layers for neural networks
that retain the concepts of shape and mathematical description.
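To illustrate what it means for Keras layers to retain the concepts of shape and mathematical description, the sketch below stacks a few layer objects and propagates an input shape through them, reporting each layer's output shape and parameter count. It deliberately does not use Keras: `TinySequential` and the layer classes are hypothetical stand-ins written for this example, assuming stride-1 convolutions without padding and non-overlapping pooling.

```python
class Conv2D:
    """Convolution layer: `filters` kernels of size kernel_size x kernel_size."""
    def __init__(self, filters, kernel_size):
        self.filters, self.k = filters, kernel_size

    def output_shape(self, shape):
        h, w, _ = shape
        # No padding, stride 1: each spatial side shrinks by k - 1.
        return (h - self.k + 1, w - self.k + 1, self.filters)

    def param_count(self, shape):
        # One k x k x in_channels kernel plus one bias per filter.
        return self.k * self.k * shape[2] * self.filters + self.filters


class MaxPool2D:
    """Non-overlapping max pooling; has no trainable parameters."""
    def __init__(self, size):
        self.size = size

    def output_shape(self, shape):
        h, w, c = shape
        return (h // self.size, w // self.size, c)

    def param_count(self, shape):
        return 0


class Flatten:
    """Reshape (h, w, c) into a single vector."""
    def output_shape(self, shape):
        n = 1
        for d in shape:
            n *= d
        return (n,)

    def param_count(self, shape):
        return 0


class Dense:
    """Fully connected layer with `units` outputs."""
    def __init__(self, units):
        self.units = units

    def output_shape(self, shape):
        return (self.units,)

    def param_count(self, shape):
        return shape[0] * self.units + self.units   # weights + biases


class TinySequential:
    """Hypothetical stand-in for a sequential layer stack (not Keras)."""
    def __init__(self, input_shape, layers):
        self.input_shape, self.layers = input_shape, layers

    def summary(self):
        rows, shape = [], self.input_shape
        for layer in self.layers:
            params = layer.param_count(shape)       # depends on input shape
            shape = layer.output_shape(shape)
            rows.append((type(layer).__name__, shape, params))
        return rows


model = TinySequential((28, 28, 1),
                       [Conv2D(32, 3), MaxPool2D(2), Flatten(), Dense(10)])
for name, shape, params in model.summary():
    print(f"{name:10s} {str(shape):15s} {params}")
```

Tracking shapes this way is what lets a layer stack check that each layer's output fits the next layer's input before any training starts.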

The creation of a model in Keras can be divided into the following two types:
• Sequential API
• Functional API
The following are the steps for creating a deep learning model in Keras:
i. Load the data
ii. Preprocess the loaded data
iii. Define the model
iv. Compile the model
v. Fit the specified model
vi. Evaluate it
vii. Save the model
4. Conclusion
The CNN approach to object detection and classification has proved to be a highly
accurate (97.8%) and practical method. As we have seen, a CNN automatically detects
important and specific features without any human intervention; that is, it plays the
role of feature abstraction. The Convolutional Neural Network is also found to give
among the most accurate results in real-life problems. At the same time, it has some
flaws: high computational cost, slow training without a Graphics Processing Unit (GPU),
and a high requirement for training data. These drawbacks should be addressed to get
the best results from a CNN.
REFERENCES
Mukesh Tiwari and Dr. Rakesh Singhai, "A Review of Detection and Tracking of Object
from Image and Video Sequences," in International Journal of Computational Intelligence
Research, pp. 745-765, Nov. 2017.
Pratik Kalshetti, Ashish Jaiswal, Naman Rastogi and Prafull Gangawane, "Object
Detection."
Reagan L. Galvez, Argel A. Bandala, Elmer P. Dadios, Ryan Rhay P. Vicerra and Jose
Martin Z. Maningo, "Object Detection Using Convolutional Neural Network," Proceedings
of TENCON 2018 - 2018 IEEE Region 10 Conference, Jeju, Korea, 28-31 October 2018.
Farhana Sultana, Abu Sufian and Paramartha Dutta, "A Review of Object Detection Models
based on Convolutional Neural Network," 2nd ICCDC-2019, HIT, Haldia.
Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu and Xindong Wu, "Object Detection with Deep
Learning: A Review," arXiv:1807.05511v2 [cs.CV], 16 April 2019.