Aldel Education Trust’s

St. John College of Engineering and Management, Palghar


(A Christian Religious Minority Institution)
Approved by AICTE and DTE, Affiliated to University of Mumbai/MSBTE
St. John Technical Campus, Vevoor, Manor Road, Palghar (E), Dist. Palghar, Maharashtra-401404
NAAC Accredited with Grade ‘A’
DEPARTMENT OF COMPUTER ENGINEERING

Experiment 04
Title: Detecting and Recognizing Objects
Aim: To study object detection and recognition techniques, HOG descriptors, the scale issue, the location
issue, non-maximum (or non-maxima) suppression, support vector machines and people detection.

Theory :
Object detection and recognition techniques:
Detecting an object is the ability of a program to determine whether a certain region of an image
contains an as-yet unidentified object, and recognizing is the ability of a program to identify this
object. Recognition normally occurs only in areas of interest where an object has already been
detected; for example, in face recognition we attempt to recognize faces only in the areas of an
image that were found to contain a face in the first place.
When it comes to recognizing and detecting objects, there are a number of techniques used in
computer vision, which we'll be examining:
● Histogram of Oriented Gradients
● Image pyramids
● Sliding windows
Unlike feature detection algorithms, these are not mutually exclusive techniques; rather, they are
complementary. For example, you can compute Histogram of Oriented Gradients (HOG) descriptors
while applying the sliding windows technique.

HOG descriptors:
HOG is a feature descriptor, so it belongs to the same family of algorithms as SIFT, SURF
and ORB.
It is used in image and video processing to detect objects. Its internal mechanism is really clever:
an image is divided into small portions (cells) and the gradients within each portion are calculated.
We observed a similar approach when we talked about face recognition through LBPH.
HOG, however, calculates histograms that are not based on color values; rather, they are based on
gradients. As HOG is a feature descriptor, it is capable of delivering the type of information that
is vital for feature matching and object detection/recognition.
Turning histograms into descriptors is quite a complex process. First, a local histogram is
calculated for each cell. The cells are then grouped into larger regions called
blocks. These blocks can be made of any number of cells, but Dalal and Triggs found that 2x2
cell blocks yielded the best results when performing people detection.
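To make the cell and block structure concrete, here is a minimal NumPy sketch of a simplified HOG computation. It assumes a grayscale image, 8x8-pixel cells, 2x2-cell blocks moved one cell at a time, and 9 unsigned orientation bins (0 to 180 degrees), which match the defaults of OpenCV's `cv2.HOGDescriptor`. The function `hog_sketch` is hypothetical and deliberately simplified: unlike OpenCV, it does not interpolate votes between neighbouring bins, does not weight pixels inside a block, and uses plain L2 block normalisation instead of OpenCV's L2-Hys, so its values will not match OpenCV's output exactly.

```python
import numpy as np


def hog_sketch(gray, cell=8, block=2, bins=9):
    """Simplified HOG descriptor for a 2-D grayscale array (illustrative only)."""
    img = np.asarray(gray, dtype=np.float64)
    # Centred differences give the horizontal and vertical gradients
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, mapped to one of `bins` bins
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = (angle // (180.0 / bins)).astype(int) % bins

    # 1) One orientation histogram per cell; each pixel votes with its magnitude
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    # 2) Group cells into overlapping block x block blocks and normalise each
    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            features.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(features)
```

For a 64x128 window (an array of shape (128, 64)) this gives 7 x 15 blocks of 36 values each, i.e. 3780 numbers, the same length as the descriptor OpenCV's default `HOGDescriptor` produces for that window.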

The scale issue:


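HOG descriptors are not scale invariant: they are computed over cells of a fixed size in pixels, so the
same object shown at two different sizes produces different gradient signatures. Imagine, for example,
that your sample is a detail (say, a bike) extracted from a larger image, and you're trying to compare
the two pictures. You would not obtain the same gradient signatures and the detection would fail
(even though the bike is in both pictures).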
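A common answer to the scale issue is an image pyramid: the image is repeatedly downscaled so that a fixed-size detection window eventually matches objects of different sizes. The sketch below assumes a NumPy image array, a scale factor of 1.5 and a minimum size equal to a 64x128 detection window (illustrative values). `pyramid` is a hypothetical helper, and its nearest-neighbour resampling is a stand-in for `cv2.resize` or `cv2.pyrDown`, which a real pipeline would normally use.

```python
import numpy as np


def pyramid(image, scale=1.5, min_size=(64, 128)):
    """Yield progressively smaller copies of `image` (hypothetical helper).

    Stops before a level would be smaller than min_size = (width, height).
    """
    yield image
    while True:
        h, w = image.shape[:2]
        new_w, new_h = int(w / scale), int(h / scale)
        if new_w < min_size[0] or new_h < min_size[1]:
            break
        # Nearest-neighbour resampling: pick evenly spaced source rows/columns
        rows = (np.arange(new_h) * h / new_h).astype(int)
        cols = (np.arange(new_w) * w / new_w).astype(int)
        image = image[rows][:, cols]
        yield image
```

For a 640x480 image this yields four levels of 640x480, 426x320, 284x213 and 189x142 pixels (width x height); the next level would be only 94 pixels high, smaller than the 128-pixel window, so the loop stops there.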

The location issue:


Once we've resolved the scale problem, we have another obstacle in our path: a potentially
detectable object can be anywhere in the image, so we need to scan the entire image in portions
to identify areas of interest and, within these areas, try to detect objects. Even if a sample image
and the object in the image are of identical size, there needs to be a way to instruct OpenCV to
locate this object, so that the rest of the image is discarded and a comparison is made only on
potentially matching regions. To solve these problems, we need to familiarize ourselves with the
concepts of image pyramids and sliding windows.
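The location issue is handled with a sliding window: a fixed-size window is moved across the image in small steps, and each patch it covers is examined separately. Below is a minimal sketch, assuming a NumPy image, a 64x128 window and an 8-pixel step (all illustrative values); `sliding_window` is a hypothetical helper, not an OpenCV function.

```python
import numpy as np


def sliding_window(image, step=8, window=(64, 128)):
    """Yield (x, y, patch) for every window=(width, height) position,
    moving `step` pixels at a time, left to right, top to bottom."""
    win_w, win_h = window
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]
```

In a full detector this loop would run on every level of an image pyramid, each patch would be described with HOG and scored by a classifier, and coordinates found on a downscaled level would be multiplied back by the accumulated scale factor. Smaller steps locate objects more precisely but produce many more heavily overlapping windows, which is why non-maximum suppression is applied afterwards.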

Non-maximum (or non-maxima) suppression:


Non-maximum (or non-maxima) suppression is a technique that suppresses all the results that
relate to the same area of an image and do not have the maximum score for that area. This is
needed because windows located close to one another tend to have similarly high scores and to
overlap significantly, but we are only interested in the window with the best result, so overlapping
windows with lower scores are discarded.
When examining an image with sliding windows, you want to make sure to retain the best
window of a group of windows that all overlap around the same subject.
To do this, you choose a threshold, x: all the windows that have more than x in common (that is,
that overlap by more than x) are passed to the non-maximum suppression operation, which keeps
only the highest-scoring one.
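A minimal NumPy sketch of greedy non-maximum suppression follows. It assumes boxes in (x1, y1, x2, y2) form with one score each, and it measures "in common" as intersection over union (IoU) with a threshold of 0.3; both the overlap measure and the threshold are illustrative choices, and `non_max_suppression` is a hypothetical helper rather than part of OpenCV.

```python
import numpy as np


def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS (hypothetical helper).

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) array.
    Returns the indices of the kept boxes, best score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection rectangle of the best box with every remaining box
        ix1 = np.maximum(x1[best], x1[rest])
        iy1 = np.maximum(y1[best], y1[rest])
        ix2 = np.minimum(x2[best], x2[rest])
        iy2 = np.minimum(y2[best], y2[rest])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop every box that overlaps the kept box by more than the threshold
        order = rest[iou <= overlap_thresh]
    return keep
```

For example, windows at (10, 10, 60, 110) and (12, 12, 62, 112) have an IoU of about 0.89, so only the higher-scoring one survives, while a distant window at (200, 50, 250, 150) is kept as a separate detection.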

Support vector machines:


Explaining in detail what an SVM is and does is beyond the scope of this experiment, but suffice it
to say that an SVM is an algorithm that, given labeled training data, enables the classification of
this data by outputting an optimal hyperplane, which, in plain English, is the plane that separates
the differently classified data with the widest possible margin.
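To show what "outputting a hyperplane" means in practice, here is a minimal sketch of a linear soft-margin SVM trained with plain full-batch subgradient descent on the hinge loss, assuming labels of +1 and -1. The helpers `train_linear_svm` and `predict`, the regularisation strength and the learning rate are all illustrative assumptions; a real project would normally use a library implementation such as OpenCV's `cv2.ml.SVM_create()` instead.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Hypothetical helper: minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by subgradient descent. Labels y must be +1 / -1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[violators][:, None] * X[violators]).sum(axis=0) / len(y)
        grad_b = -y[violators].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    # The sign of w.x + b tells which side of the hyperplane a sample lies on
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

OpenCV's people detector applies the same idea at classification time: `cv2.HOGDescriptor_getDefaultPeopleDetector()` returns the coefficients of a linear classifier pretrained for people detection, and scoring a window amounts to a dot product between its HOG descriptor and those coefficients, plus a bias term.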

Program :
HOG descriptors:


People detection:
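import cv2


def is_inside(o, i):
    # True if rectangle o = (x, y, w, h) lies strictly inside rectangle i
    ox, oy, ow, oh = o
    ix, iy, iw, ih = i
    return ox > ix and oy > iy and ox + ow < ix + iw and oy + oh < iy + ih


def draw_person(image, person):
    # Draw a yellow (BGR 0, 255, 255) box, 2 px thick, around one detection
    x, y, w, h = (int(v) for v in person)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)


img = cv2.imread("download.jpg")
if img is None:
    raise FileNotFoundError("could not read download.jpg")

# HOG descriptor combined with OpenCV's pretrained people-detection SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Scan the image at multiple scales; returns bounding boxes and their weights
found, weights = hog.detectMultiScale(img)

# Keep only detections that are not fully contained in another detection
found_filtered = []
for ri, r in enumerate(found):
    for qi, q in enumerate(found):
        if ri != qi and is_inside(r, q):
            break
    else:
        found_filtered.append(r)

for person in found_filtered:
    draw_person(img, person)

cv2.imshow("people detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()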

Output :
People detection:

Conclusion: Hence we studied object detection and recognition techniques, HOG descriptors, the scale
issue, the location issue, non-maximum (or non-maxima) suppression, support vector machines and people
detection.

Om Bhamare BE B 05