
Object Detection using Python OpenCV

Saturday, March 30, 2019 8:33 PM

Clipped from: https://circuitdigest.com/tutorial/object-detection-using-python-opencv

Object Detection using Python & OpenCV

We started with learning the basics of OpenCV, then did some basic image
processing and manipulation on images, followed by image segmentation and
many other operations, using OpenCV and the Python language. Here, in this
section, we will perform some simple object detection techniques using
template matching. We will find an object in an image and then we will
describe its features. Features are common attributes of an image such
as corners, edges etc. We will also take a look at some common and popular
object detection algorithms such as SIFT, SURF, FAST, BRIEF & ORB.

As explained in the previous tutorials, OpenCV is the Open Source Computer
Vision Library, which has C++, Python and Java interfaces and supports
Windows, Linux, Mac OS, iOS and Android. So it can be easily installed
on a Raspberry Pi with a Python and Linux environment. And a Raspberry Pi with
OpenCV and an attached camera can be used to create many real-time image
processing applications like face detection, face lock, object tracking, car
number plate detection, home security systems etc.

Object detection and recognition form the most important use cases for
computer vision; they are used to do powerful things such as:

• Labelling scenes
• Robot Navigation
• Self-driving cars
• Body recognition (Microsoft Kinect)
• Disease and cancer detection
• Facial recognition
• Handwriting recognition
• Identifying objects in satellite images

Object Detection VS Recognition

Object recognition is the second level of object detection, in which the
computer is able to recognize an object from multiple objects in an image
and may be able to identify it.

Now, we will perform some image processing functions to find an object
from an image.

Finding an Object from an Image

Here we will use template matching to find a character/object in an image,
using OpenCV's cv2.matchTemplate() function.

import cv2
import numpy as np

Load the input image and convert it to grayscale

image=cv2.imread('WaldoBeach.jpg')
cv2.imshow('people',image)
cv2.waitKey(0)
gray=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)

Load the template image

template=cv2.imread('waldo.jpg',0)

#result of template matching of object over an image
result=cv2.matchTemplate(gray,template,cv2.TM_CCOEFF)
min_val, max_val, min_loc, max_loc=cv2.minMaxLoc(result)

Create bounding box

top_left=max_loc
#increasing the size of bounding rectangle by 50 pixels
bottom_right=(top_left[0]+50,top_left[1]+50)
cv2.rectangle(image, top_left, bottom_right, (0,255,0),5)

cv2.imshow('object found',image)
cv2.waitKey(0)
cv2.destroyAllWindows()



In cv2.matchTemplate(gray, template, cv2.TM_CCOEFF), we pass in the grayscale
image in which to find the object, along with the template itself. Then we apply
the template matching method to find the object in the image; here cv2.TM_CCOEFF
is used.

The function returns an array, stored in result, which holds the outcome of
the template matching procedure.

And then we use cv2.minMaxLoc(result), which gives the coordinates of the
bounding box where the object was found in the image. Once we have those
coordinates we draw a rectangle over it, stretching the dimensions of the
box a little so the object fits easily inside the rectangle.

There are a variety of methods to perform template matching, and in this case
we are using cv2.TM_CCOEFF, which stands for correlation coefficient.
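
As an aside (a sketch, not part of the original tutorial, reusing the gray and template variables loaded above), switching the matching mode changes how the result is read; for squared-difference methods the best match is the minimum score, not the maximum:

# With cv2.TM_SQDIFF (or TM_SQDIFF_NORMED), LOWER values mean better matches,
# so the top-left corner of the best match comes from min_loc, not max_loc
result = cv2.matchTemplate(gray, template, cv2.TM_SQDIFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
top_left = min_loc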

cv2.matchTemplate takes a “sliding window” of the object and slides it over
the image from left to right and top to bottom, one pixel at a time. Then for
each location, we compute the correlation coefficient to determine how
“good” or “bad” the match is.

Regions with sufficiently high correlation can be considered matches; from
there, all we need is a call to cv2.minMaxLoc to find where the good
matches are.
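
As a further sketch (an addition beyond the original tutorial, continuing the earlier example's image, gray and template variables), if you want every location scoring above a threshold rather than only the single best one, a normalized method plus numpy's np.where works well:

import cv2
import numpy as np

# Find ALL locations whose normalized correlation exceeds a chosen threshold
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
h, w = template.shape
locations = np.where(result >= 0.8)  # 0.8 is an illustrative threshold
for y, x in zip(*locations):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)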


Feature Description Theory

In template matching we slide a template image across a source image until
a match is found. But it is not the best method for object recognition, as it
has severe limitations; this method isn't very resilient.

The following factors make template matching a bad choice for object
detection:

• Rotation renders this method ineffective.
• Size changes (known as scaling) affect this as well.
• Photometric changes (e.g. brightness, contrast, hue etc.)
• Distortion from viewpoint changes (affine).

One solution to this problem is image features.

Image features are interesting areas of an image that are somewhat
unique to that specific image. They are also called key point features or
interest points.

The sky is an uninteresting feature, whereas certain keypoints (marked in
red circles) can be used for the detection of the above image (interesting
features). The image shown above clearly shows the difference between an
interesting feature and an uninteresting feature.

Importance of feature detection



Features are important as they can be used to analyze, describe and match
images. They have extensive use in:

• Image alignment – e.g. panorama stitching (finding corresponding
matches so we can stitch images together)
• 3D reconstruction
• Robot navigation
• Object recognition
• Motion tracking
• And more!

What defines the interest points?

Interesting areas carry a lot of distinct and unique information about an
area. Typically, they are areas of high change of intensity, such as corners
or edges. But always be careful, as noise can appear “informative” when it
is not! So try to blur the image so as to reduce noise.
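
For instance, a light Gaussian blur before feature detection (an illustrative sketch, not from the original tutorial) helps keep noise from masquerading as features:

import cv2

image = cv2.imread('chess.jpg')
# A mild 5x5 Gaussian blur suppresses pixel noise that might otherwise
# look like "informative" high-frequency detail to a feature detector
smoothed = cv2.GaussianBlur(image, (5, 5), 0)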

Characteristic of Good or Interesting Features

Repeatable – They can be found in multiple pictures of the same scene.

Distinctive – Each feature is somewhat unique and different from other
features of the same scene.

Compactness/Efficiency – Significantly fewer features than pixels in the
image.

Locality – A feature occupies a small area of the image and is robust to
clutter and occlusion.



Corners as features

Corners are identified when shifting a window in any direction over that
point gives a large change in intensity.

Corners are not the best features for identifying images, but they certainly
have good use cases which make them handy to use.

So to identify corners in your image, imagine the green window we are
looking at, with the black one being the image we want to find corners in.
When we move the window only inside the black box, we see there is no
change in intensity; hence the image is flat, i.e. no corners are identified.

Now when we move the window in one direction, we see that there is a change
of intensity in one direction only; hence it's an edge, not a corner.

When we move the window to the corner, no matter in what direction we move
the window there is a change in intensity, and this is identified as a
corner.

So let's identify corners with the help of the Harris Corner Detection
algorithm, developed in 1988 for corner detection; it works fairly well.

The following OpenCV function is used for the detection of the corners.

cv2.cornerHarris(input image, block size, ksize, k)

Input image - Should be grayscale and of float32 type.

blockSize - The size of the neighborhood considered for corner detection.

ksize - Aperture parameter of the Sobel derivative used.

k - Harris detector free parameter in the equation.

Output – Array of corner locations (x, y)

Also, an important thing to note is that the Harris corner detection
algorithm requires a float32 image datatype, i.e. the image should be a
grayscale image of float32 type.

import cv2
import numpy as np

Load image then grayscale

image = cv2.imread('chess.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

The cornerHarris function requires the array datatype to be float32

gray = np.float32(gray)
harris_corners = cv2.cornerHarris(gray, 3, 3, 0.05)

We use dilation of the corner points to enlarge them

kernel = np.ones((7,7),np.uint8)
harris_corners = cv2.dilate(harris_corners, kernel, iterations = 2)

Threshold for an optimal value; it may vary depending on the image

image[harris_corners > 0.025 * harris_corners.max()] = [255, 127, 127]

cv2.imshow('Harris Corners', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

cornerHarris returns the locations of the corners; to visualize these tiny
locations we use dilation, which adds pixels to the edges of the corners.
To enlarge the corners we run the dilation twice, and then we do some
thresholding to change the colors of the corners.

The following function can also be used for corner detection, with the
parameters mentioned below:

cv2.goodFeaturesToTrack(input image, maxCorners, qualityLevel, minDistance)

• Input Image - 8-bit or floating-point 32-bit, single-channel image.
• maxCorners – Maximum number of corners to return. If there are
more corners than are found, the strongest of them are returned.
• qualityLevel – Parameter characterizing the minimal accepted quality
of image corners. The parameter value is multiplied by the best corner
quality measure (smallest eigenvalue). Corners with a quality measure
less than the product are rejected. For example, if the best corner has
a quality measure of 1500 and qualityLevel = 0.01, then all corners
with a quality measure less than 15 are rejected.
• minDistance – Minimum possible Euclidean distance between the
returned corners.

import cv2
import numpy as np

img = cv2.imread('chess.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

We specify the top 100 corners

corners = cv2.goodFeaturesToTrack(gray, 100, 0.01, 15)

for corner in corners:
    x, y = corner[0]
    x = int(x)
    y = int(y)
    cv2.rectangle(img, (x-10, y-10), (x+10, y+10), (0,255,0), 2)

cv2.imshow("Corners Found", img)


cv2.waitKey()
cv2.destroyAllWindows()

It also returns an array of corner locations like the previous method, so
we iterate through each corner position and plot a rectangle over it.



Problems with corners as features

Corner matching in images is tolerant of, or corner detection doesn't have
any problem with, images that are:

• Rotated
• Translated (i.e. shifted in the image)
• Subject to slight photometric changes, e.g. brightness or affine intensity

However, it is intolerant of:

• Large changes in intensity or photometric changes
• Scaling (i.e. enlarging or shrinking) – see the sketch below
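
To make the scaling problem concrete, here is a small illustrative sketch (an addition, not from the original tutorial) that counts corners at two scales; the counts typically differ noticeably:

import cv2

image = cv2.imread('chess.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect up to 500 corners at full size and at half size; corner counts and
# positions usually change with scale, showing corners are not scale invariant
for scale in (1.0, 0.5):
    resized = cv2.resize(gray, None, fx=scale, fy=scale)
    corners = cv2.goodFeaturesToTrack(resized, 500, 0.01, 10)
    count = 0 if corners is None else len(corners)
    print("scale =", scale, "corners found:", count)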



SIFT, SURF, FAST, BRIEF & ORB Algorithms

Scale Invariant Feature Transform (SIFT)

Corner detectors like the Harris corner detection algorithm are rotation
invariant, which means even if the image is rotated we can still get the
same corners. This is obvious, as corners remain corners in a rotated image
too. But when we scale the image, a corner may no longer be a corner, as
shown in the image above.

SIFT is used to detect interesting keypoints in an image using the difference
of Gaussians method; these are the areas of the image where variation
exceeds a certain threshold, and they are better than an edge descriptor.

Then we create a vector descriptor for these interesting areas. Scale
invariance is achieved via the following process:

i. Interest points are scanned at several different scales.
ii. The scale at which we meet a specific stability criterion is then
selected and encoded by the vector descriptor. Therefore, regardless of the
initial size, the more stable scale is found, which allows us to be scale
invariant.
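
To illustrate the difference-of-Gaussians idea in isolation (a rough sketch under simplifying assumptions, not SIFT's full octave/pyramid scheme):

import cv2
import numpy as np

gray = cv2.imread('paris.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Blur at a few increasing sigmas and subtract neighbouring blur levels;
# extrema across the resulting DoG layers are candidate keypoints at that scale
sigmas = [1.0, 1.6, 2.56, 4.1]  # illustrative scale steps
blurred = [cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas]
dog = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]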

Rotation invariance is achieved by obtaining the orientation assignment of
the keypoint using image gradient magnitudes. Once we know the 2D direction,
we can normalize this direction.

A full paper on SIFT can be read here:
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

You can also find a tutorial on the official OpenCV website.



Speeded Up Robust Features (SURF)

SURF is the speeded-up version of SIFT, as SIFT is quite computationally
expensive.

SURF was developed to improve the speed of a scale-invariant feature
detector. Instead of using the difference-of-Gaussians approach, SURF uses a
Hessian matrix approximation to detect interesting points and uses the sum
of Haar wavelet responses for orientation assignment.

A full paper on SURF can be read here:
http://www.vision.ee.ethz.ch/~surf/eccv06.pdf

Alternatives of SIFT and SURF

As SIFT and SURF are patented, they are not freely available for commercial
use; however, there are alternatives to these algorithms, which are
explained in brief here.

Features from Accelerated Segment Test (FAST)

• Key point detection only (no descriptor; we can use SIFT or SURF to
compute that)
• Used in real-time applications

Here you can find the papers on FAST

https://www.edwardrosten.com/work/rosten_2006_machine.pdf

Binary Robust Independent Elementary Features (BRIEF)

• Computes descriptors quickly (instead of using SIFT or SURF)
• It is quite fast.

Here you can find the paper on BRIEF

http://cvlabwww.epfl.ch/~lepetit/papers/calonder_pami11.pdf

Oriented FAST and Rotated BRIEF (ORB)

• Developed out of OpenCV Labs (not patented, so free to use!)
• Combines both FAST and BRIEF
Here you can find the paper on ORB

http://www.willowgarage.com/sites/default/files/orb_final.pdf



Using SIFT, SURF, FAST, BRIEF & ORB in OpenCV

Flow process for SIFT, SURF, FAST, BRIEF & ORB

Feature Detection implementation

The SIFT & SURF algorithms are patented by their respective creators, and
while they are free to use in academic and research settings, you should
technically be obtaining a license/permission from the creators if you are
using them in a commercial (i.e. for-profit) application.

Below we are explaining programming examples of all the algorithms
mentioned above.

SIFT

import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)



Create SIFT Feature Detector object
sift = cv2.xfeatures2d.SIFT_create()

#Detect key points
keypoints = sift.detect(gray, None)
print("Number of keypoints Detected: ", len(keypoints))

Draw rich key points on input image

image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - SIFT', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Console Output:

Number of keypoints Detected: 1893

Here the keypoints are (x, y) coordinates extracted using the SIFT detector
and drawn over the image using the cv2.drawKeypoints function.
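
Note that sift.detect only finds keypoint locations; to also get the 128-dimensional SIFT descriptors used for matching, you can call detectAndCompute instead (a small sketch continuing the example above):

keypoints, descriptors = sift.detectAndCompute(gray, None)
# Each SIFT descriptor is a 128-dimensional vector, one row per keypoint
print("Descriptor array shape: ", descriptors.shape)  # (n_keypoints, 128)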



SURF

import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Create SURF Feature Detector object; here we set the Hessian threshold to 500

# Only features whose Hessian is larger than hessianThreshold are retained by the detector
# You can increase the Hessian threshold value to decrease the number of keypoints
surf = cv2.xfeatures2d.SURF_create(500)

keypoints, descriptors = surf.detectAndCompute(gray, None)
print ("Number of keypoints Detected: ", len(keypoints))

Draw rich key points on input image

image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - SURF', image)
cv2.waitKey()
cv2.destroyAllWindows()

Console Output:

Number of keypoints Detected: 1548



FAST

import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Create FAST Detector object

fast = cv2.FastFeatureDetector_create()
# Obtain key points; by default non-max suppression is on
# (to turn it off, call fast.setNonmaxSuppression(False))
keypoints = fast.detect(gray, None)
print ("Number of keypoints Detected: ", len(keypoints))

Draw rich keypoints on input image

image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - FAST', image)
cv2.waitKey()
cv2.destroyAllWindows()

Console Output:


Number of keypoints Detected: 8960
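
As a quick usage note (an illustrative continuation, reusing the fast and gray objects from above), turning non-max suppression off keeps many adjacent responses and inflates the count considerably:

# Disable non-max suppression and re-detect; expect far more raw keypoints
fast.setNonmaxSuppression(False)
keypoints = fast.detect(gray, None)
print("Without non-max suppression: ", len(keypoints))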

BRIEF

import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Create FAST detector object

fast = cv2.FastFeatureDetector_create()

Create BRIEF extractor object

brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

# Determine key points
keypoints = fast.detect(gray, None)

Obtain descriptors and new final keypoints using BRIEF

keypoints, descriptors = brief.compute(gray, keypoints)
print ("Number of keypoints Detected: ", len(keypoints))

Draw rich keypoints on input image

image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - BRIEF', image)
cv2.waitKey()
cv2.destroyAllWindows()

Console Output:

Number of keypoints Detected: 8735
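
A follow-up worth noting (an added aside, reusing the descriptors from above): BRIEF produces compact binary descriptors, 32 bytes per keypoint by default, which is what makes it so quick to compute and match:

# Each BRIEF descriptor is a 32-byte binary string by default
print("Descriptor array shape: ", descriptors.shape)  # (n_keypoints, 32)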

ORB

import cv2
import numpy as np

image = cv2.imread('paris.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Create ORB object; we can specify the number of key points we desire

orb = cv2.ORB_create()

# Determine key points
keypoints = orb.detect(gray, None)

Obtain the descriptors

keypoints, descriptors = orb.compute(gray, keypoints)
print("Number of keypoints Detected: ", len(keypoints))

Draw rich keypoints on input image

image = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imshow('Feature Method - ORB', image)
cv2.waitKey()
cv2.destroyAllWindows()

Console Output:

Number of keypoints Detected: 500

We can specify the maximum number of keypoints to detect (the nfeatures
argument to cv2.ORB_create); the default value is 500, i.e. ORB
automatically detects the best 500 keypoints if no value is specified.
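
To show how these keypoints and descriptors are actually used for recognition, here is a short sketch (an addition with hypothetical filenames, not from the original tutorial) that matches ORB features between two images with a brute-force Hamming matcher:

import cv2

# Load a query object and a scene image (hypothetical filenames)
query = cv2.imread('object.jpg', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(query, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps mutual best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Draw the 20 best matches between the two images
output = cv2.drawMatches(query, kp1, scene, kp2, matches[:20], None, flags=2)
cv2.imshow('ORB Matches', output)
cv2.waitKey(0)
cv2.destroyAllWindows()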

So this is how object detection takes place in OpenCV; the same programs
can also be run on a Raspberry Pi with OpenCV installed and used as a
portable device, like smartphones with Google Lens.
