
CHAPTER ONE: GENERAL INTRODUCTION

1.1 Introduction
This chapter gives a general introduction to the project. It covers the background of the study,
the statement of the problem, the project questions, the aim and objectives of the study, the
significance of the study, the project artefact, the scope and limitations of the study, the
methodology, the project structure and the definition of terms.

1.2 Background of the Study


Object detection is a task in computer vision that involves detecting various objects in digital
images or videos. The objects detected include people, cars, chairs, stones and other everyday
objects. Fast, accurate algorithms for object detection would allow computers to drive cars
without specialized sensors, enable assistive devices to convey real-time scene information to
human users, and unlock the potential for general-purpose, responsive robotic systems. Many
detection systems repurpose classifiers to perform detection. To detect an object, these systems
take a classifier for that object and evaluate it at various locations and scales in a test
image. Systems like deformable parts models (DPM) use a sliding window approach where the
classifier is run at evenly spaced locations over the entire image.
Over the past decade, deep learning has drawn much greater attention and become an important
technology in the field of artificial intelligence. Object detection is considered one of the
noteworthy areas of deep learning and computer vision. It has numerous applications in computer
vision, such as object tracking, retrieval, video surveillance, image captioning, image
segmentation and medical imaging, among many others. Object tracking and capturing is one of the
major areas of research because of its growing commercial applications, such as surveillance
systems, mobile robots, medical therapy, security systems and driver assistance systems. Object
tracking, by definition, is to track an object (or multiple objects) over a sequence of images.
Tracking is usually performed in higher-level applications that require the location of the
object in every frame. The most popular application in this area is vision-based surveillance,
which helps to understand the movement patterns of people with suspicious actions. Traffic scene
analysis is also a well-known application, where tracking information is used to keep vehicles in
lane and prevent accidents. Thus, object detection and tracking under dynamic conditions is still
a challenge for real-time performance, which requires the computational complexity to be kept to
a minimum.
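
The sliding window approach mentioned in this section can be illustrated with a minimal Python
sketch. It is not taken from DPM or from any particular library. The function names, the square
window shape and the stride value are assumptions made only for illustration. The sketch
enumerates evenly spaced windows over an image at several scales and counts how many times a
classifier would have to be run.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(image_w: int, image_h: int,
                    window: int, stride: int) -> Iterator[Box]:
    """Yield square windows at evenly spaced locations over the image."""
    # Hypothetical helper: windows that would cross the image border are skipped.
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, window, window)


def count_classifier_calls(image_w: int, image_h: int,
                           window_sizes: List[int], stride: int) -> int:
    """Count how many windows a classifier would have to score over all scales."""
    return sum(
        sum(1 for _ in sliding_windows(image_w, image_h, size, stride))
        for size in window_sizes
    )


if __name__ == "__main__":
    # Assumed example values: a 640x480 frame, three window sizes, stride of 16 pixels.
    print(count_classifier_calls(640, 480, [64, 128, 256], 16))
```

The count grows with every additional scale and with every reduction in stride, which is one
reason why running a classifier at every location is costly for real-time use.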

Humans glance at an image and instantly know what objects are in the image, where they are,
and how they interact. The human visual system is fast and accurate, allowing us to perform
complex tasks like driving, identifying multiple objects and detecting obstacles with little
conscious thought. With the availability of large sets of data, faster GPUs, and better
algorithms, we can now train computers to detect and classify multiple objects within an image
with high accuracy. To do so, we need to understand terms such as object detection, object
localization, and the loss function for object detection and localization, and finally explore an
object detection algorithm known as “You Only Look Once” (YOLO). More recent approaches like
R-CNN use region proposals to generate candidate bounding boxes before making predictions.
Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during
training and test time, so it implicitly encodes contextual information about classes as well as
their appearance. Fast R-CNN, a top detection method, mistakes background patches in an image for
objects because it can’t see the larger context. YOLO makes less than half the number of
background errors compared to Fast R-CNN. YOLO also learns generalizable representations of
objects. When trained on natural images and tested on artwork, YOLO outperforms top detection
methods like DPM and R-CNN by a wide margin. Since YOLO is highly generalizable, it is less
likely to break down when applied to new domains or unexpected inputs. However, YOLO still lags
behind state-of-the-art detection systems in accuracy. While it can quickly identify objects in
images, it struggles to precisely localize some objects, especially small ones.

1.3 Statement of the Problem


Artificial intelligence brings important advances in which a machine is trained to learn and
behave like a human. A system should be able to identify an object easily and reliably among a
number of other objects. Real-time detection of an object with top-level classification and
localization accuracy remains challenging. Many difficulties in object detection still stand in
the way of effective solutions, which signals that object detection research is certainly not
done. Furthermore, the limited amount of annotated data currently available for object detection
proves to be another substantial hurdle. Object detection frameworks also have problems with
small objects, especially those bunched together with partial occlusions. Therefore, this project
work will use the YOLO model for identifying real-time objects among several other objects.

1.4 Project Questions


• What are the various models that can be used for simple object detection?
• In what way is the YOLO model more suitable for object detection?

1.5 Aim and Objectives of the Study

The main aim of this project work is to develop a system that identifies objects in real time
using a suitable deep learning model.
The following objectives have been identified to fulfill the stated aim of this project work:
1. To investigate and analyze the problems of real-time object detection.
2. To implement a system based on the YOLO model for handling the real-time object detection
problem.
3. To test the new system using some appropriate data.

1.6 Significance of the Study


Object detection recognizes objects in an image and marks each one with a bounding box, whereas
image classification simply categorizes whether an object is present in the image in terms of
likelihood (probability). Object detection is useful in any setting where computer vision is
needed to localize and identify objects in an image. It performs best in settings where objects
and scenery are more or less similar. Humans can detect and identify objects present in an image.
The human visual system is fast and accurate and can also perform complex tasks like identifying
multiple objects and detecting obstacles with little conscious thought. With the availability of
large sets of data, faster GPUs, and better algorithms, we can now train computers to detect and
classify multiple objects within an image with high accuracy. To do so, we need to understand
terms such as object detection, object localization, and the loss function for object detection
and localization, and finally explore an object detection algorithm known as “You Only Look
Once” (YOLO).

1.7 Project Artefact


A simple object detection system will be developed by the end of this project. The system will
allow identification of real-time objects using a suitable deep learning model called YOLO.

1.8 Scope and Limitation of the Study
The scope of this study is limited to locating important items, drawing rectangular bounding
boxes around them, and determining the class of each item discovered. Other decision parameters,
such as results on and comparisons with other datasets, are not covered.
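
The output described in this scope (a rectangular bounding box and a class for each item) could
be represented as in the following minimal Python sketch. The `Detection` type, its field names
and the 0.5 confidence threshold are hypothetical and are used only for illustration. They are
not part of YOLO or of any specific library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One detected item: a class label, a confidence and a rectangular box."""
    label: str          # class of the item, e.g. "person"
    confidence: float   # score in [0, 1] reported by the detector
    x: float            # left edge of the bounding box, in pixels
    y: float            # top edge of the bounding box, in pixels
    width: float        # box width, in pixels
    height: float       # box height, in pixels


def count_by_class(detections: List[Detection],
                   min_confidence: float = 0.5) -> Counter:
    """Count detected items per class, ignoring low-confidence detections."""
    # The 0.5 default is an assumed threshold, not a value from the project.
    return Counter(d.label for d in detections if d.confidence >= min_confidence)


if __name__ == "__main__":
    found = [
        Detection("person", 0.91, 34.0, 20.0, 80.0, 200.0),
        Detection("car", 0.78, 150.0, 90.0, 220.0, 120.0),
        Detection("person", 0.32, 300.0, 40.0, 60.0, 150.0),
    ]
    print(count_by_class(found))
```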

1.9 Methodology
This simple object detection project was handled using the YOLO model.
The model was investigated further through reputable academic journals, conference papers,
textbooks and other online sources. The overall development approach combined the YOLO
model with an iterative development model.
YOLO, an abbreviation of You Only Look Once, is an algorithm that uses neural networks
to provide real-time object detection. The algorithm is popular because of its speed and
accuracy. It has been used in various applications to detect traffic signals, people,
parking meters and animals. The algorithm detects and recognizes various objects in a
picture in real time. Object detection in YOLO is framed as a regression problem that
provides the bounding boxes and class probabilities of the detected objects.
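
The regression view of detection can be illustrated with a minimal Python sketch. It assumes the
grid formulation of the original YOLO paper, in which each grid cell predicts a box as
(x, y, w, h, confidence) together with class probabilities, x and y are offsets inside the cell,
and w and h are fractions of the image size. The function name `decode_cell` and the example
values are hypothetical and do not come from the project implementation.

```python
from typing import List, Tuple


def decode_cell(row: int, col: int, grid_size: int,
                box: Tuple[float, float, float, float, float],
                class_probs: List[float],
                image_w: int, image_h: int):
    """Turn one grid-cell prediction into a pixel box, a class index and a score."""
    x, y, w, h, box_conf = box
    # Box centre: cell index plus the offset inside the cell, scaled to pixels.
    center_x = (col + x) / grid_size * image_w
    center_y = (row + y) / grid_size * image_h
    # Box size is predicted as a fraction of the whole image.
    box_w = w * image_w
    box_h = h * image_h
    # Class-specific score: box confidence times the highest class probability.
    best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
    score = box_conf * class_probs[best_class]
    left = center_x - box_w / 2
    top = center_y - box_h / 2
    return (left, top, box_w, box_h), best_class, score


if __name__ == "__main__":
    # Assumed example: the centre cell of a 7x7 grid on a 448x448 image.
    print(decode_cell(3, 3, 7, (0.5, 0.5, 0.2, 0.4, 0.8), [0.1, 0.9], 448, 448))
```

All of these values come from a single pass of the network over the image, which is the sense in
which detection is treated as regression rather than as repeated classification.
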
The model was implemented using the Python programming language. Python is a
general-purpose, interpreted, interactive, object-oriented and high-level programming
language. It was created by Guido van Rossum and first released in 1991. It is widely used
in the area of artificial intelligence because of its rich set of libraries.
1.10 Project Structure
This project report is organized into five chapters. The first chapter contains the introduction,
background of the study, statement of the problem, project questions, project structure, aim and
objectives, significance of the study, project artefact, scope and limitations, and methodology.
The second chapter contains the literature review and references. The third chapter contains the
chapter introduction, a description of the existing system, a description of the proposed system,
the dataflow diagram, architectural design, system flowchart and system interface design. The
fourth chapter contains the chapter introduction, system design, implementation and testing.
Finally, chapter five contains the summary, conclusion and recommendations of the project work.

1.11 Definition of Terms
Object Detection is a computer vision technique that allows us to identify and locate objects in
an image or video. With this kind of identification and localization, object detection can be used
to count objects in a scene and determine and track their precise locations, all while accurately
labeling them.
Object Recognition is a computer vision technique for identifying objects in images or
videos. Object recognition is a key output of deep learning and machine learning algorithms.
When humans look at a photograph or watch a video, we can readily spot people, objects, scenes,
and visual details.
Deep learning is a subset of machine learning in artificial intelligence that uses multi-layer
neural networks capable of learning from data, including data that is unstructured or unlabeled.
It is also known as deep neural learning or deep neural networks.
Object is anything that has a fixed shape or form that you can touch or see.
YOLO Algorithm is an algorithm based on regression. Instead of selecting the interesting parts
of an image, it predicts classes and bounding boxes for the whole image in one run of
the algorithm.
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are
programmed to think like humans and mimic their actions. The term may also be applied to any
machine that exhibits traits associated with a human mind such as learning and problem-solving.
Machine learning is a method of data analysis that automates analytical model building. It is a
branch of artificial intelligence based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention.
Region Based Convolutional Neural Networks (R-CNN) are a family of machine learning
models for computer vision and specifically object detection.