A 2019 Guide To Object Detection

12/31/2019 A 2019 Guide to Object Detection
KDnuggets Home » News » 2019 » Aug » Tutorials, Overviews » A 2019 Guide to Object Detection ( 19:n29 )
Tags: Computer Vision, Image Recognition, Object Detection
https://www.kdnuggets.com/2019/08/2019-guide-object-detection.html
By Derrick Mwiti, Data Analyst.

Photo by Fernando @cferdo on Unsplash
Object detection is a computer vision technique whose aim is to detect objects such as cars, buildings, and human beings, just to mention a few. The
objects can generally be identified from either pictures or video feeds.
Object detection has been applied widely in video surveillance, self-driving cars, and object/people tracking. In this piece, we’ll look at the basics of
object detection and review some of the most commonly-used algorithms and a few brand new approaches, as well.
How Object Detection Works

Object detection locates the presence of an object in an image and draws a bounding box around that object. This usually involves two processes:
classifying an object’s type, and then drawing a box around that object. We’ve covered image classification before, so let’s now review some of the
common model architectures used for object detection:
R-CNN
Fast R-CNN
Faster R-CNN
Mask R-CNN
SSD (Single Shot MultiBox Detector)
YOLO (You Only Look Once)
Objects as Points
Data Augmentation Strategies for Object Detection
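As a rough illustration of the two steps described above, the output of a detector for one image can be thought of as a list of records, each pairing a class label and a confidence score with a box. The `Detection` class and the `keep_confident` helper are hypothetical names chosen for this sketch; none of the models reviewed here prescribes this exact layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str     # predicted object class
    score: float   # confidence in [0, 1]
    x_min: float   # box corners in pixel coordinates
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        """Box area in square pixels (0 for a degenerate box)."""
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections whose score is below the threshold."""
    return [d for d in detections if d.score >= threshold]


detections = [
    Detection("car", 0.92, 10, 20, 110, 80),
    Detection("person", 0.30, 50, 40, 70, 100),
]
confident = keep_confident(detections)
```

With the default threshold, only the "car" detection is kept.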

R-CNN Model

This technique combines two main approaches: applying high-capacity convolutional neural networks to bottom-up region proposals so as to localize
and segment objects; and supervised pre-training for auxiliary tasks.
Rich feature hierarchies for accurate object detection and semantic segmentation
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The...

The pre-training is followed by domain-specific fine-tuning, which yields a significant performance boost. The authors of this paper named the algorithm
R-CNN (Regions with CNN features) because it combines region proposals with convolutional neural networks.
This model takes an image and extracts about 2000 bottom-up region proposals. It then computes the features for each proposal using a large CNN.
Thereafter, it classifies each region using class-specific linear Support Vector Machines (SVMs). This model achieves a mean average precision of 53.7%
on PASCAL VOC 2010.
The object detection system in this model has three modules. The first one is responsible for generating category-independent region proposals that
define the set of candidate detections available to the model’s detector. The second module is a large convolutional neural network responsible for
extracting a fixed-length feature vector from each region. The third module consists of a set of class-specific linear support vector machines.
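The three modules can be sketched as three functions chained together. This is only a toy outline of the data flow, not the authors' implementation: `propose_regions` returns random boxes instead of running a real proposal algorithm, `extract_features` computes simple pixel statistics instead of running a CNN, and the linear scorers in `classify` use random weights. All function names, the 16-dimensional feature size, and the three classes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def propose_regions(image, n=5):
    """Stand-in for module 1: n random boxes as (x1, y1, x2, y2)."""
    h, w = image.shape[:2]
    x1 = rng.integers(0, w // 2, size=n)
    y1 = rng.integers(0, h // 2, size=n)
    x2 = x1 + rng.integers(8, w // 2, size=n)
    y2 = y1 + rng.integers(8, h // 2, size=n)
    return np.stack([x1, y1, x2, y2], axis=1)


def extract_features(image, box, dim=16):
    """Stand-in for module 2: a fixed-length vector for one region.

    A real system would warp the crop to the CNN input size and run the
    network; here we tile per-channel mean and std to length `dim`.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    stats = np.concatenate([crop.mean(axis=(0, 1)), crop.std(axis=(0, 1))])
    return np.resize(stats, dim)


def classify(features, weights, biases):
    """Module 3: one linear scorer per class, in the spirit of linear SVMs."""
    return features @ weights.T + biases


image = rng.random((64, 64, 3))
boxes = propose_regions(image)
features = np.stack([extract_features(image, b) for b in boxes])
weights = rng.normal(size=(3, 16))  # 3 hypothetical classes, untrained
biases = np.zeros(3)
scores = classify(features, weights, biases)
best_class = scores.argmax(axis=1)  # highest-scoring class per region
```

Note that the feature extractor runs once per region, which is the per-proposal cost that the later models set out to reduce.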
This model uses selective search to generate region proposals. Selective search groups regions that are similar based on color, texture, shape, and size.
For feature extraction, the model computes a 4096-dimensional feature vector for each region proposal using the Caffe implementation of the CNN. The
features are computed by forward propagating a 227 × 227 RGB image through five convolutional layers and two fully connected layers. The model
explained in this paper achieves a 30% relative improvement over the previous results on PASCAL VOC 2012.
Some of the drawbacks of R-CNN are:
Training is a multi-stage pipeline: tuning a convolutional neural network on object proposals, fitting SVMs to the ConvNet features, and finally
learning bounding box regressors.
Training is expensive in space and time because features are extracted from every object proposal and written to disk; for deep networks such as VGG16, these features take up huge amounts of storage.
Object detection is slow because it performs a ConvNet forward pass for each object proposal.
Fast R-CNN

In this paper, a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection is proposed.
Fast R-CNN
This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN...

It’s implemented in Python and in C++ using Caffe. This model achieves a mean average precision of 66% on PASCAL VOC 2012, versus 62% for R-CNN.
https://arxiv.org/pdf/1504.08083.pdf
In comparison to R-CNN, Fast R-CNN has a higher mean average precision, uses single-stage training, can update all network layers during training, and
requires no disk storage for feature caching.
In its architecture, Fast R-CNN takes an image as input as well as a set of object proposals. It then processes the image with convolutional and max-pooling
layers to produce a convolutional feature map. For each region proposal, a region of interest (RoI) pooling layer then extracts a fixed-length feature vector
from the feature map.
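Region of interest pooling is what turns proposals of different sizes into vectors of the same length. The sketch below is a minimal NumPy version written for illustration, assuming integer box coordinates on the feature map and a region at least as large as the output grid; the function name `roi_max_pool` and the 2 × 2 output size are choices made here, not values from the paper.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x1, y1, x2, y2) of an (H, W, C) feature
    map into a fixed (out_h, out_w, C) grid, whatever the region's size."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w, c = region.shape
    # Bin edges that split the region into out_h x out_w roughly equal cells.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    pooled = np.empty((out_h, out_w, c), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled


feature_map = np.arange(36, dtype=float).reshape(6, 6, 1)
small = roi_max_pool(feature_map, (0, 0, 4, 4))   # 4 x 4 region
wide = roi_max_pool(feature_map, (1, 2, 6, 5))    # 3 x 5 region
```

Both calls return a 2 × 2 × 1 array even though the two regions differ in size, so the fully connected layers that follow always receive an input of the same shape.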
The feature vectors are then fed to fully connected layers. These then branch into two output layers. One produces softmax probability estimates over
several object classes, while the other produces four real-valued numbers for each of the object classes. These 4 numbers represent the position of the
bounding box for each of the objects.
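The two output branches can be sketched as two linear layers applied to the same per-region feature vectors. This is a minimal illustration with random, untrained weights; the names `detection_heads`, `w_cls`, and `w_box` are hypothetical, and the extra background entry in the softmax follows the paper's description of K object classes plus a background class.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def detection_heads(features, w_cls, w_box, n_classes):
    """features: (n_rois, d).

    Returns class probabilities of shape (n_rois, n_classes + 1), where the
    extra entry is background, and box values of shape (n_rois, n_classes, 4).
    """
    probs = softmax(features @ w_cls)
    boxes = (features @ w_box).reshape(len(features), n_classes, 4)
    return probs, boxes


rng = np.random.default_rng(0)
n_rois, d, n_classes = 4, 8, 3
features = rng.normal(size=(n_rois, d))
w_cls = rng.normal(size=(d, n_classes + 1))
w_box = rng.normal(size=(d, n_classes * 4))
probs, boxes = detection_heads(features, w_cls, w_box, n_classes)
```

Each row of `probs` sums to 1, and `boxes[i, k]` holds the four box values that region `i` predicts for class `k`.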

Faster R-CNN

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations...

This paper proposes a training mechanism that alternates between fine-tuning for the region proposal task and fine-tuning for object detection.
The Faster R-CNN model is composed of two modules: a deep convolutional network responsible for proposing the regions, and a Fast R-CNN detector
that uses the regions. The Region Proposal Network takes an image as input and outputs a set of rectangular object proposals. Each of the
rectangles has an objectness score.
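Because every proposal carries an objectness score, the detector can rank the proposals and keep only the most promising ones. The helper below is a minimal sketch of that ranking step; `top_proposals` is a hypothetical name and the boxes and scores are made up. The paper also applies non-maximum suppression to the proposals, which is omitted here.

```python
import numpy as np


def top_proposals(boxes, objectness, k):
    """Return the k boxes with the highest objectness score, best first."""
    order = np.argsort(-objectness)[:k]
    return boxes[order], objectness[order]


boxes = np.array([
    [0, 0, 10, 10],
    [5, 5, 20, 20],
    [1, 1, 4, 4],
    [8, 2, 30, 12],
])
objectness = np.array([0.1, 0.9, 0.4, 0.7])
kept_boxes, kept_scores = top_proposals(boxes, objectness, k=2)
```

Here the second and fourth boxes are kept, in that order.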
Mask R-CNN

Mask R-CNN
We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach...

The model presented in this paper is an extension of the Faster R-CNN architecture described above. It also allows for the estimation of human poses.
In this model, objects are classified and localized using a bounding box, and a segmentation mask marks the pixels that belong to each object instance. This
model extends Faster R-CNN by adding the prediction of segmentation masks on each Region of Interest. Faster R-CNN produces two outputs for each
candidate object, a class label and a bounding box; Mask R-CNN adds a third branch that outputs the object mask.
SSD: Single Shot MultiBox Detector

SSD: Single Shot MultiBox Detector
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD...

This paper presents a model to predict objects in images using a single deep neural network. The network generates scores for the presence of each
object category using small convolutional filters applied to feature maps.
This approach uses a feed-forward convolutional neural network that produces a collection of bounding boxes and scores for the presence of certain
objects. Convolutional feature layers are added to allow for feature detection at multiple scales. In this model, each feature map cell is linked to a set of
default bounding boxes. The figure below shows how SSD512 performs on animals, vehicles, and furniture.
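To make the idea of default boxes concrete, the sketch below generates one set of boxes per feature map cell, centred on the cell and varying only in aspect ratio. It is a simplified illustration: the function name `default_boxes`, the feature map sizes, the scales, and the three aspect ratios are example values chosen here, not the configuration used in the paper.

```python
import numpy as np


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes as (cx, cy, w, h), normalized to [0, 1], for a square
    feature map with fmap_size x fmap_size cells."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            cx = (col + 0.5) / fmap_size  # cell centre
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Same area for every ratio; only the shape changes.
                boxes.append([cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)])
    return np.array(boxes)


fine = default_boxes(fmap_size=4, scale=0.3)    # many small boxes
coarse = default_boxes(fmap_size=2, scale=0.6)  # few large boxes
all_boxes = np.concatenate([fine, coarse])
```

Combining a fine and a coarse feature map is what lets a single forward pass cover objects at several scales: the 4 × 4 map contributes 48 small boxes and the 2 × 2 map contributes 12 large ones.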
You Only Look Once (YOLO)

This paper proposes a single neural network to predict bounding boxes and class probabilities from an image in a single evaluation.
You Only Look Once: Unified, Real-Time Object Detection

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform...

The base YOLO model processes images in real time at 45 frames per second. YOLO frames object detection as a regression problem, which keeps its
pipeline simple and makes it extremely fast.
It can process streaming video in real time with less than 25 milliseconds of latency. During the training process, YOLO sees the entire image and is,
therefore, able to include the context in object detection.
In YOLO, each bounding box is predicted by features from the entire image. Each bounding box has 5 predictions: x, y, w, h, and confidence. (x, y)
represents the center of the bounding box relative to the bounds of the grid cell. w and h are the width and height of the box, predicted relative to the whole image.
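Since (x, y) is expressed relative to a grid cell while w and h are expressed relative to the whole image, turning a prediction into pixel coordinates takes a small conversion. The function below is a minimal sketch of that conversion; `decode_box` is a hypothetical name, and the 7 × 7 grid and 448 × 448 image in the example are illustrative values.

```python
def decode_box(x, y, w, h, row, col, grid_size, img_w, img_h):
    """Convert one YOLO-style prediction to pixel corners.

    x, y: box centre as offsets within grid cell (row, col), each in [0, 1].
    w, h: box width and height as fractions of the whole image.
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (col + x) / grid_size * img_w
    cy = (row + y) / grid_size * img_h
    bw = w * img_w
    bh = h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# A box centred in the middle cell of a 7 x 7 grid on a 448 x 448 image.
corners = decode_box(0.5, 0.5, 0.25, 0.5, row=3, col=3,
                     grid_size=7, img_w=448, img_h=448)
# corners == (168.0, 112.0, 280.0, 336.0)
```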
This model is implemented as a convolutional neural network and evaluated on the PASCAL VOC detection dataset. The convolutional layers of the
network are responsible for extracting the features, while the fully connected layers predict the coordinates and output probabilities.
The network architecture for this model is inspired by the GoogLeNet model for image classification. The network has 24 convolutional layers and 2
fully-connected layers. The main challenges of this model are that each grid cell can only predict one class, and it doesn’t perform well on small objects such as
birds.
On PASCAL VOC 2007, the faster variant of the model, Fast YOLO, achieves a mean average precision of 52.7%, while the full YOLO model reaches 63.4%.

Objects as Points

This paper proposes modeling an object as a single point. It uses key point estimation to find center points and regresses to all other object properties.
Objects as Points
Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly...

These properties include 3D location, pose orientation, and size. The authors call the resulting detector CenterNet, a center-point-based approach that they
report to be faster and more accurate than comparable bounding box detectors.
Properties such as object size and pose are regressed from features of the image at the center location. In this model, an image is fed to a convolutional
neural network which generates a heatmap. The maximum values in these heatmaps represent the centers of the objects in the image. In order to estimate
human poses, the model examines 2D joint locations and regresses them at the center point location.
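Reading object centers off a heatmap amounts to finding its local maxima. The sketch below keeps every cell that is at least as large as its eight neighbours and that exceeds a score threshold; the function name `heatmap_peaks`, the 0.5 threshold, and the toy heatmap are assumptions made for illustration.

```python
import numpy as np


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for cells that are >= all 8 neighbours
    and whose value exceeds the threshold."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Max over each 3 x 3 neighbourhood, built from the nine shifted views.
    neigh_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    mask = (heatmap == neigh_max) & (heatmap > threshold)
    rows, cols = np.nonzero(mask)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]


heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9   # a clear peak
heatmap[1, 2] = 0.6   # next to a stronger cell, so not a peak
heatmap[3, 4] = 0.8   # a second peak
heatmap[4, 0] = 0.3   # local maximum, but below the threshold
peaks = heatmap_peaks(heatmap)
# peaks == [(1, 1, 0.9), (3, 4, 0.8)]
```

In the full model, the other properties (such as size) would then be read from the network outputs at each of these peak locations.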
This model achieves 45.1% COCO average precision at 1.4 frames per second. The figure below shows how this compares with the results obtained in
other research papers.
Learning Data Augmentation Strategies for Object Detection

Data augmentation creates new image data by manipulating the original images, for example by rotating or resizing them.
Learning Data Augmentation Strategies for Object Detection
Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown...

While this isn’t itself a model architecture, this paper proposes learning transformations on one object detection dataset that can then be
transferred to other object detection datasets. The transformations are usually applied at training time.
In this model, an augmentation policy is defined as a set of n sub-policies, one of which is selected at random for each image during the training process. Some of the operations that
have been applied in this model include distorting color channels, distorting the images geometrically, and distorting only the pixel content found in the
bounding box annotations.
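The sketch below shows one toy operation from each of the three families, plus a policy that picks a sub-policy at random for each image. It is a simplified illustration rather than the learned policy from the paper: the operation names, the hand-written `POLICY`, and the brightness factor are all hypothetical, and the per-operation probabilities and magnitudes that the paper searches over are left out.

```python
import random
import numpy as np


def flip_horizontal(image, boxes):
    """Geometric op: mirror the image and its (x1, y1, x2, y2) boxes."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [(w - x2, y1, w - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def scale_brightness(image, boxes, factor=1.2):
    """Color op: change pixel values, leave the boxes untouched."""
    return np.clip(image * factor, 0.0, 1.0), list(boxes)


def invert_inside_boxes(image, boxes):
    """Box-only op: alter only the pixels inside each annotated box."""
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        out[y1:y2, x1:x2] = 1.0 - out[y1:y2, x1:x2]
    return out, list(boxes)


# Hypothetical policy: each sub-policy is a short sequence of operations.
POLICY = [
    [flip_horizontal, scale_brightness],
    [invert_inside_boxes],
    [scale_brightness, invert_inside_boxes],
]


def augment(image, boxes, rng=random):
    """Pick one sub-policy at random and apply its operations in order."""
    for op in rng.choice(POLICY):
        image, boxes = op(image, boxes)
    return image, boxes


image = np.zeros((4, 6))           # grayscale image with values in [0, 1]
image[0, 1] = 1.0
boxes = [(1, 0, 3, 2)]
flipped, flipped_boxes = flip_horizontal(image, boxes)
aug_image, aug_boxes = augment(image, boxes, rng=random.Random(0))
```

The geometric operation is the only one that has to update the annotations: flipping the 6-pixel-wide image moves the box from columns 1 to 3 to columns 3 to 5.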
Experimentation on the COCO dataset has shown that optimizing a data augmentation policy can improve detection accuracy by more than
2.3 mean average precision (mAP) points. This allows a single inference model to achieve an accuracy of 50.7 mAP.
Conclusion

We should now be up to speed on some of the most common—and a couple of recent—techniques for performing object detection in a variety of
contexts.
The papers/abstracts mentioned and linked to above also contain links to their code implementations. We’d be happy to see the kind of results you obtain
after testing them.
Let’s not limit ourselves. Object detection can also live inside your smartphone. Learn how Fritz can teach mobile apps to see, hear, sense,
and think.

Bio: Derrick Mwiti is a data analyst, a writer, and a mentor. He is driven by delivering great results in every task, and is a mentor at Lapid Leaders
Africa.
Original. Reposted with permission.
Related:
Object Detection with Luminoth
An Overview of Human Pose Estimation with Deep Learning
Large-Scale Evolution of Image Classifiers
© 2019 KDnuggets. About KDnuggets. Privacy policy. Terms of Service
