
Unlike classification networks such as ResNet or VGG, an object detection algorithm has to identify multiple objects and specify the exact location of each, as shown in the image.
 This property of predicting the bounding boxes around the objects is known
as object localization.
Object localization requires predicting the height, width, and location of the bounding box around each object.
Before specifying the bounding box attributes of each object, the image is divided into S × S grid cells as shown in the picture.
If the center of an object falls in a grid cell, then that grid cell is responsible for predicting the object.
The target label y defines each of the grid cells.
y is a vector given by

y = \begin{bmatrix} p \\ b_x \\ b_y \\ b_h \\ b_w \\ c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}

b_x and b_y give the coordinates of the center of the bounding box, and b_h and b_w give its height and width.
p is known as the object confidence; it gives the probability that an object is present in the bounding box.
c_1, c_2, ..., c_n are the class confidences. For example, if you have two classes to identify, a car and a pedestrian, then c_1 gives the probability that the grid cell contains a car and c_2 gives the probability that it contains a pedestrian.

Note that the object confidence p is different from the class confidence c.
 p is the probability of the presence of an object
within the bounding box irrespective of the class
of object.
c is the probability of the object belonging to a particular class, given that an object is present in the box.
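To make the layout of y concrete, here is a minimal Python sketch of the label vector for a single grid cell, with the two classes ordered as (car, pedestrian); the b-values are illustrative placeholders, not taken from a real image.

```python
# Label vector for one grid cell, following the layout of y above
# (two classes: car, pedestrian). The b-values are illustrative only.
y_with_car = [
    1.0,   # p: an object is present in this cell
    0.45,  # b_x: x-coordinate of the box center
    0.60,  # b_y: y-coordinate of the box center
    0.30,  # b_h: height of the box
    0.55,  # b_w: width of the box
    1.0,   # c_1: the object is a car
    0.0,   # c_2: the object is not a pedestrian
]

# A cell containing no object center gets p = 0;
# the remaining entries are then ignored.
y_empty = [0.0] * 7
```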

IOU
Intersection over Union (IOU) is a measure of the accuracy of the predicted bounding box against the ground truth box (the actual bounding box).
It is the ratio of the area covered by the intersection of the ground truth box and the predicted box to the area covered by the union of these two boxes.
The maximum possible value of IOU is 1. If the measured IOU is greater than a set threshold, we can conclude that the predicted bounding box is close to the ground truth box.
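To make the definition concrete, here is a minimal Python sketch of the IOU computation, assuming each box is given by its corner coordinates (x1, y1, x2, y2); this coordinate convention is an assumption for the example, not something fixed by the text above.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height clamp to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: intersection 4, union 28, so IOU ≈ 0.143,
# well below a typical threshold such as 0.5.
print(iou((0, 0, 4, 4), (2, 2, 6, 6)))
```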

Consider a case where two objects share the same center, as shown in the image.

If the two objects fall in the same grid cell, then our model ends up predicting only one of them, which should not be the case.

The concept of anchor boxes, described next, eliminates this problem.

Anchor Boxes
Assuming the object in each grid cell fits in one of two fixed anchor boxes, and that there are two classes of objects to be identified, the target label y is defined as

\small y = \begin{bmatrix} p_1 & b_{x1} & b_{y1} & b_{h1} & b_{w1} & c_{11} & c_{12} & p_2 & b_{x2} & b_{y2} & b_{h2} & b_{w2} & c_{21} & c_{22} \end{bmatrix}^T

In general, if the image is divided into an S × S grid, each grid cell predicts B bounding boxes, and each box predicts C class confidences, then the dimension of the target y is S × S × B(5 + C), where the 5 accounts for p, b_x, b_y, b_h, and b_w.
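As a quick sanity check of this formula, here is a small sketch with illustrative values S = 7, B = 2, and C = 2; these numbers are assumptions for the example, not fixed by the text.

```python
import numpy as np

S, B, C = 7, 2, 2                    # grid size, boxes per cell, classes (illustrative)
y = np.zeros((S, S, B * (5 + C)))    # 5 accounts for p, b_x, b_y, b_h, b_w
print(y.shape)                       # (7, 7, 14)
```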

Hand Engineering
The dataset used to train an object detection model is slightly different from that of an object classification model.
Each image in the training data for object detection is divided manually into S × S grid cells.
If the center of the object of interest falls in a grid cell and its box fits one of the anchor boxes, then the p-value of that anchor box and the class value of that object are set to 1, along with the bounding box attributes.
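A minimal sketch of this labeling step, reusing the target tensor from the previous example; the object's attributes and the choice of matching anchor are illustrative assumptions, not values from a real dataset.

```python
import numpy as np

S, B, C = 7, 2, 2
y = np.zeros((S, S, B * (5 + C)))

# Hypothetical ground-truth object: box center (cx, cy), height, width,
# and class index, all normalized relative to the image. Class 0 = car here.
cx, cy, bh, bw, cls = 0.52, 0.31, 0.20, 0.35, 0

# Grid cell in which the object's center falls.
col, row = int(cx * S), int(cy * S)

# Assume anchor 0 best matches the object's shape; in practice you would
# pick the anchor box with the highest IOU against the ground-truth box.
a = 0
base = a * (5 + C)
y[row, col, base] = 1.0                             # p-value of the matched anchor
y[row, col, base + 1:base + 5] = [cx, cy, bh, bw]   # bounding box attributes
y[row, col, base + 5 + cls] = 1.0                   # class value set to 1
```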

Overview
In the You Only Look Once (YOLO) algorithm, you run the image through a CNN model and detect the objects in a single pass.
This algorithm identifies multiple bounding boxes for the same object. Hence, we use a method called non-max suppression to retain a single prediction box for each object in the image. The rest of the cards show you the step-by-step procedure of how the YOLO algorithm works.
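As a rough sketch of the idea behind non-max suppression (reusing the iou() helper from the IOU section; the exact procedure used by YOLO is covered in the later cards): keep the highest-confidence box, discard boxes that overlap it beyond a threshold, and repeat.

```python
def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after non-max suppression."""
    # Consider boxes in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```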
