
Object Detection Through YOLO Neural Network
UNIVERSITY COLLEGE OF ENGINEERING AND TECHNOLOGY FOR WOMEN, WARANGAL

Under the guidance of: T. Swapna


Team members:-
R. Ankitha --- 17568T0939
B. Sandhya --- 17568T0907
M. Mamata --- 185680966L
N. Shyamili --- 17568T0935
M. Sowjanya --- 17568T0928
Contents

1. Abstract
2. Introduction
3. Literature Survey
4. Existing problem
5. Proposed solution
6. Algorithm
7. Architecture
8. Software requirements specification
9. Flow chart
10. Result
11. Conclusion
12. Future scope
13. References
Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to
perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and
associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full
images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly
on detection performance.
Introduction
Keywords: Object Detection, Deep Learning, Neural Networks, YOLO Algorithm.
Literature Survey
Existing Problem
Object detection is an important yet challenging vision task.

• The latest techniques include:

YOLO
RetinaNet
R-CNN
Fast R-CNN
Faster R-CNN
Mask R-CNN
• In recent years, a number of successful single-object tracking systems have appeared, but in the presence of several objects, detection becomes difficult, and when objects are fully or partially occluded they are hidden from view, which further complicates detection. Decreasing illumination and changing acquisition angles add to the difficulty.
• A core problem in computer vision, image processing, and machine vision is determining whether or not the image data contains a specific object, feature, or function. One of the major problems is image classification, which involves labeling an image based on its content. For example, consider an image containing a tray, a fork, and a spoon: if the keyword "tray" is not present in the dataset, only the fork and spoon are detected, and the tray goes unclassified.
Proposed System

• Previous object detection algorithms use regions to localize the object within the image: the network does not look at the complete image, only at parts of the image with a high probability of containing the object. YOLO, or You Only Look Once, is an object detection algorithm quite different from the region-based algorithms above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for those boxes.
• In our proposed project, we use YOLO neural-network-based object detection with a modified loss function, combined with OpenCV, to detect each and every object clearly. The last step produces the bounding boxes and labeled images. The approach is easy to understand and takes less time to detect objects.
• Biggest advantages:
 Speed (45 frames per second, good for real-time use).
 The network learns a generalized representation of objects (this allowed the authors to train the network on real-world images while predictions on artwork remained fairly accurate).
 A faster version (with a smaller architecture) runs at 155 frames per second, but is less accurate.
 Open source.
 A single CNN solves both the localization and the classification problem.
Algorithm

1. The input image is divided into an S×S grid.

2. Each cell predicts B bounding boxes. Each bounding box contains five elements: (x, y, w, h) and a box confidence score.
3. YOLO detects only one object per grid cell, regardless of the number of bounding boxes.
4. Each cell also predicts C conditional class probabilities.
5. If no object exists, the confidence score is zero; otherwise the confidence score should be greater than or equal to the threshold value.
6. YOLO then draws bounding boxes around the detected objects and predicts the class to which each object belongs.
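The six steps above can be sketched in code. This is an illustrative decoding of a YOLO-style output tensor, not the project's actual code; the VOC-style shapes and the threshold of 0.5 are assumptions for the example.

```python
import numpy as np

S, B, C = 7, 2, 20          # grid size, boxes per cell, classes (PASCAL VOC values)

def decode(pred, threshold=0.5):
    """Decode a (S, S, B*5 + C) prediction tensor into confident boxes.

    Each cell holds B blocks of (x, y, w, h, confidence) followed by
    C conditional class probabilities, as in the steps above.
    """
    boxes = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]                 # C conditional class probabilities
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                # class-specific confidence = box confidence * class probability
                scores = conf * class_probs
                best = int(np.argmax(scores))
                if scores[best] >= threshold:
                    boxes.append((row, col, float(x), float(y), float(w),
                                  float(h), float(scores[best]), best))
    return boxes

# Example: a single confident box in grid cell (3, 3), class index 11
pred = np.zeros((S, S, B * 5 + C))
pred[3, 3, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]   # (x, y, w, h, confidence)
pred[3, 3, B * 5 + 11] = 0.95                 # conditional probability of class 11
print(decode(pred))
```

Only one cell clears the threshold here, so a single box is returned; every other cell has zero confidence, matching step 5.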
Architecture

• For a given image, YOLO predicts (x, y, w, h) and an objectness confidence for every bounding box, and a conditional class probability for every grid cell.
• Considering its spatial meaning, we can view the output as an S × S × (B ∗ 5 + C) tensor.
• In the VOC competition, S = 7, B = 2, and C = 20, so the output is 7 × 7 × 30.
• The architecture of YOLO (figure omitted): convolutional layers extract features, and a final fully connected layer predicts the bounding-box parameters, objectness confidence, and conditional class probabilities.
• As a small example (input image 100 × 100 × 3, output 3 × 3 × 8), the model is trained directly on this input-to-output mapping.
                      
 YOLO v3 has 75 convolutional layers, with skip connections and upsampling layers. No form of pooling is used; instead, a convolutional layer with stride 2 downsamples the feature maps. This helps prevent the loss of low-level features often attributed to pooling.
 YOLO v3 makes predictions across 3 different scales.
 The detection layers make detections on feature maps of three different sizes, with strides 32, 16, and 8 respectively. This means that, with an input of 416 × 416, detections are made on 13 × 13, 26 × 26, and 52 × 52 grids.
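As a quick sanity check of the numbers above, the relation between input resolution, stride, and detection-grid size (and the v1 output volume) can be computed directly. This is an illustrative sketch, not project code:

```python
# Each detection scale divides the input by its stride (YOLO v3: 32, 16, 8).
def grid_sizes(input_size, strides=(32, 16, 8)):
    assert all(input_size % s == 0 for s in strides), "input must be divisible by each stride"
    return [input_size // s for s in strides]

print(grid_sizes(416))        # the three grids for a 416 x 416 input: [13, 26, 52]

# YOLO v1 output volume: S x S x (B*5 + C); for VOC this is 7 x 7 x 30.
S, B, C = 7, 2, 20
print((S, S, B * 5 + C))      # (7, 7, 30)
```

The same function works for other common YOLO input sizes, e.g. `grid_sizes(608)` gives `[19, 38, 76]`.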
Loss:
Following is the loss function of YOLO. The first two terms handle bounding-box regression, the next two handle objectness classification, and the last term handles class classification.

Here, 1i and 1ij are indicators of whether the i-th grid cell and its j-th bounding box, respectively, are responsible for a prediction.
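The equation itself did not survive extraction; the following is a reconstruction of the standard five-term loss from the original YOLO paper, matching the description above (two box-regression terms, two objectness terms, one classification term):

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \,\in\, \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```

The square roots on w and h down-weight errors in large boxes relative to small ones, and λ_coord and λ_noobj rebalance the terms (5 and 0.5 in the paper).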
Evaluation
Non Maximum Suppression
• Some of the predicted regions severely overlap.
• Non Maximum Suppression (NMS) is a preparatory step for evaluation that removes overlapping regions using confidence.
• In object detection, multiple detections of a single GT are penalized.
• Choose the most confident bounding box and remove all other boxes that have a high IoU with it. Repeat until no boxes remain.
• NMS is applied to both the RPN and the detection network.
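The greedy procedure described above can be sketched in a few lines. This is an illustrative implementation with made-up boxes, not the project's code:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, drop heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping boxes plus one far away: the overlap collapses to one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # indices of the kept boxes
```

Here the first two boxes have IoU ≈ 0.68, so the lower-scoring one is suppressed and indices 0 and 2 survive.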
Performance:
• YOLO is the first deep-learning model to reach state-of-the-art accuracy in the context of real-time detection.
• Real-time means 30 frames per second or better; at 60 km/h, a car moves 0.55 m between detections.
• Compared with Fast R-CNN, YOLO has higher localization error but lower background error.
• Error categories:
  Correct: correct class and IoU > .5
  Localization: correct class, .1 < IoU < .5
  Similar: class is similar, IoU > .1
  Other: class is wrong, IoU > .1
  Background: IoU < .1 for any object
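The error categories above can be expressed as a small classifier. This is an illustrative sketch, not part of the project (the boundary case IoU = .5 is assigned to Localization here, which the category list leaves unspecified):

```python
def error_category(iou, correct_class, similar_class):
    """Bucket one detection into the error categories from the list above."""
    if correct_class and iou > 0.5:
        return "Correct"
    if correct_class and 0.1 < iou <= 0.5:
        return "Localization"
    if similar_class and iou > 0.1:
        return "Similar"
    if iou > 0.1:
        return "Other"
    return "Background"

print(error_category(0.7, True, False))    # Correct
print(error_category(0.3, True, False))    # Localization
print(error_category(0.05, False, False))  # Background
```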
Data Flow Chart
 Initially, the user is given the option to choose the type of file to be given to the system as input: the user can either choose File Selection or start the Camera.
 In the former, the user can choose either an image file or a video file; in the latter, the user starts the Camera module.
 Once the input is selected, pre-processing is done, where the S×S grid is formed. The resulting gridded image is sent to the Bounding Box Prediction process, where bounding boxes are drawn around the detected objects.
 Next, the result of the previous process is sent to Class Prediction, where the class to which each object belongs is predicted.
 It is then sent to the detection process, where a threshold is set in order to reduce clutter from too many bounding boxes and labels in the final output.
 Finally, an image (for image input) or a stream of images (for video or camera input), with bounding boxes and labels, is produced as the output.
Software requirements specification

Hardware Requirements:
 CPU: Intel Core i7 7700HQ
 GPU: NVIDIA GeForce GTX 1050
 Hard Disk: 120 GB
 RAM: 16 GB DDR4
 Integrated camera

Software Requirements:
• Coding language: Python
• IDE: Jupyter Notebook 6.2.0
• YOLOv3
• Darknet
• OpenCV
• COCO dataset (80 object classes)
Result
Outputs 1–5: sample detection results (screenshots omitted).
Conclusion

• We introduce YOLO, a unified model for object detection. Our model is simple to
construct and can be trained directly on full images. Unlike classifier-based
approaches, YOLO is trained on a loss function that directly corresponds to detection
performance and the entire model is trained jointly.
• Fast YOLO is the fastest general-purpose object detector in the literature and YOLO
pushes the state-of-the-art in real-time object detection. YOLO also generalizes well to
new domains making it ideal for applications that rely on fast, robust object detection.
Future scope
Tracking objects: In the field of security and surveillance, object detection will play an even more important role. With object tracking, it becomes easier to follow a person in a video. Object tracking can also be used to track the motion of a ball during a match, and it plays a crucial role in traffic monitoring as well.

Counting the crowd: Crowd counting, or people counting, is another significant application of object detection. During a big festival, or in a crowded mall, this application comes in handy, as it helps in analysing the crowd and measuring different groups.

Self-driving cars: Another notable application of object detection is self-driving cars. A self-driving car can navigate a street safely only if it can detect all the objects on the road, such as people, other cars, and road signs, in order to decide what action to take.

Detecting a vehicle: On a road full of speeding vehicles, object detection can help greatly by tracking a particular vehicle and even its number plate. If a car gets into an accident or breaks traffic rules, it is easier to identify that particular car using an object detection model, thereby reducing crime while enhancing security.
References
[1] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In Computer Vision – ECCV 2008, pages 2–15. Springer, 2008.
[2] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In International Conference on Computer Vision (ICCV), 2009.
[3] H. Cai, Q. Wu, T. Corradi, and P. Hall. The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs. arXiv preprint arXiv:1505.00110, 2015.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886–893. IEEE, 2005.
[5] T. Dean, M. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, J. Yagnik, et al. Fast, accurate detection of 100,000 object classes on a single machine. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1814–1821. IEEE, 2013.
[6] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531, 2013.
[7] J. Dong, Q. Chen, S. Yan, and A. Yuille. Towards unified object detection and semantic segmentation. In Computer Vision – ECCV 2014, pages 299–314. Springer, 2014.
[8] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2155–2162. IEEE, 2014.
[9] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, Jan. 2015.
[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
[11] S. Gidaris and N. Komodakis. Object detection via a multi-region & semantic segmentation-aware CNN model. CoRR, abs/1505.01749, 2015.
Thank You
