(a) What are the dimensions of the input and the outputs of the first two layers for a mini-batch of 100 images?
(b) How many parameters are there in the Conv1 and MaxPool1 layers?
A convolutional layer with 32 filters, each of height and width 3, with 0 padding, which has both weights and biases (i.e. CONV3-32)
A 2 × 2 max-pooling layer with stride 2 and 0 padding (i.e. POOL-2)
A batch normalization layer (i.e. BATCHNORM)
Compute the output activation volume dimensions and number of parameters of the layers.
You can write the activation shapes in the format (H, W, C), where H, W, and C are the height, width, and channel dimensions, respectively.
1. What are the output activation volume dimensions and the number of parameters for CONV3-32?
2. What are the output activation volume dimensions and the number of parameters for POOL-2?
3. What are the output activation volume dimensions and the number of parameters for BATCHNORM?
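The shape and parameter-count bookkeeping for these three layers can be sketched with a few small helpers. The input volume used below (32 × 32 × 3) is a hypothetical stand-in, since this excerpt does not restate the network's input size:

```python
# Sketch: output shapes and parameter counts for CONV3-32, POOL-2, BATCHNORM.
# The input volume 32 x 32 x 3 is an assumption for illustration only.

def conv_out(h, w, c_in, k, n_filters, stride=1, pad=0):
    h_out = (h - k + 2 * pad) // stride + 1
    w_out = (w - k + 2 * pad) // stride + 1
    # each filter has k*k*c_in weights plus one bias
    params = (k * k * c_in + 1) * n_filters
    return (h_out, w_out, n_filters), params

def pool_out(h, w, c, k=2, stride=2):
    # max-pooling has no learnable parameters
    return ((h - k) // stride + 1, (w - k) // stride + 1, c), 0

def batchnorm_params(c):
    # one scale (gamma) and one shift (beta) per channel
    return 2 * c

shape, p = conv_out(32, 32, 3, k=3, n_filters=32)   # CONV3-32
print(shape, p)   # (30, 30, 32), (3*3*3 + 1)*32 = 896 parameters
shape, p = pool_out(*shape)                         # POOL-2
print(shape, p)   # (15, 15, 32), 0 parameters
print(batchnorm_params(shape[2]))                   # 2*32 = 64 parameters
```

Note that the batch normalization layer preserves the activation shape; only the per-channel scale and shift are learned.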
Suppose you want to redesign the AlexNet architecture to reduce the number of arithmetic
operations required for each backprop update. (i) Would you try to cut down on the number
of weights, units, or connections? Justify your answer. (ii) Would you modify the convolution
layers or the fully connected layers? Justify your answer.
In AlexNet, the input image is 227 × 227 × 3 and the first convolutional layer contains 96 filters with K = 11 and stride = 4.
1. What would be the width, height, and depth of the output with padding = 0?
2. What would be the width, height, and depth of the output with padding = 2?
Here's the calculation of the output dimensions for the first convolutional layer in
AlexNet:
Input Image:
Width: 227 pixels
Height: 227 pixels
Depth (channels): 3 (RGB)
Convolutional Layer Parameters:
Kernel size (K): 11 x 11
Number of filters (No): 96
Stride: 4
Padding: 0 or 2 (the two cases asked about above)
We can use the following formula to calculate the output height (H_out) and width (W_out) for convolutional layers:

H_out = (H_in - K + 2P) / S + 1,  W_out = (W_in - K + 2P) / S + 1

where P is the padding and S is the stride.
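Applying the standard output-size formula, (W − K + 2P) / S + 1, to AlexNet's first convolutional layer gives both requested cases:

```python
# Output spatial dimension of a convolution: (W - K + 2P) / S + 1
def conv_dim(w_in, k, stride, pad):
    return (w_in - k + 2 * pad) // stride + 1

# padding = 0: (227 - 11 + 0) / 4 + 1 = 55  ->  output is 55 x 55 x 96
print(conv_dim(227, 11, 4, 0))   # 55

# padding = 2: (227 - 11 + 4) / 4 + 1 = 56  ->  output is 56 x 56 x 96
print(conv_dim(227, 11, 4, 2))   # 56
```

In both cases the output depth equals the number of filters, 96.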
For the YOLO algorithm, assume we have the following two boxes: the lower-right box with the size of 2×3 and the upper-left box with the size of 2×2, and the size of the overlapping region is 1×1. What is the IoU (Intersection over Union) of the two boxes?
Box Dimensions:
Lower-right box: 2 x 3
Upper-left box: 2 x 2
Overlapping region: 1 x 1
Calculating IoU:
1. Intersection Area: The overlapping region is a square with a size of 1 x 1,
resulting in an area of 1 square unit.
2. Union Area:
o Calculate the area of each box separately:
Lower-right box: 2 (width) * 3 (height) = 6 square units
Upper-left box: 2 (width) * 2 (height) = 4 square units
o Since the overlapping region is counted once in each box's area, we subtract it from the sum to avoid double-counting.
o Union Area = Area of lower-right box + Area of upper-left box - Overlapping Area = 6 + 4 - 1 = 9 square units
3. IoU: Intersection Area / Union Area = 1 / 9 ≈ 0.1111 (rounded to four decimal places)
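The arithmetic above can be checked in a few lines:

```python
# IoU of the two boxes from the problem statement
lower_right = 2 * 3        # 6 square units
upper_left = 2 * 2         # 4 square units
intersection = 1 * 1       # 1 square unit
union = lower_right + upper_left - intersection   # 6 + 4 - 1 = 9
iou = intersection / union
print(round(iou, 4))       # 0.1111
```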
Suppose you are running non-max suppression during the YOLO algorithm on the predicted boxes as shown below. Assume that boxes with probability less than or equal to 0.4 are discarded, and the IoU threshold for deciding whether two boxes overlap is 0.5. How many boxes will remain after the non-max suppression stage?
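The figure with the predicted boxes is not reproduced in this excerpt, but the procedure itself can be sketched as a greedy loop: discard low-confidence boxes, then repeatedly keep the highest-scoring survivor and suppress anything that overlaps it too much. The thresholds below (discard p ≤ 0.4, overlap at IoU > 0.5) follow the problem statement; the example boxes are hypothetical.

```python
def iou(a, b):
    # boxes given as (x1, y1, x2, y2) corner coordinates
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thresh=0.4, iou_thresh=0.5):
    # 1. drop boxes with probability <= score_thresh
    # 2. greedily keep the highest-scoring box, suppress overlapping ones
    order = sorted((i for i, s in enumerate(scores) if s > score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For example, `nms([(0, 0, 2, 2), (0, 0, 2, 2), (5, 5, 7, 7)], [0.9, 0.8, 0.7])` keeps the first and third boxes: the duplicate of the top box is suppressed, while the disjoint box survives.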
To compute the Average Precision (AP) for each object class (A and B) and the Mean
Average Precision (mAP), we need to follow these steps:
For each detection, we need to determine whether it's a true positive (TP) or false positive
(FP) based on its IoU (Intersection over Union) with the ground truth bounding box. We'll
use a threshold of 0.5 for IoU.
We'll compute the area under the Precision-Recall curve for each object class.
First, let's calculate the IoU for each bounding box and ground truth. Then, we'll
determine whether each detection is a TP or FP based on the IoU threshold. Finally, we'll
compute Precision and Recall at different thresholds.
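The precision/recall bookkeeping in the steps above amounts to two ratios over the TP/FP/FN counts. The per-detection tables from the original problem are not reproduced in this excerpt, so the counts in the example call below are hypothetical (they happen to match Object A's final precision and recall):

```python
def precision_recall(tp, fp, fn):
    # precision = fraction of detections that are correct
    # recall = fraction of ground-truth objects that were found
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(2, 2, 0))   # (0.5, 1.0)
```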
To calculate the Intersection over Union (IoU) for each bounding box and ground truth,
we'll use the formula:

IoU = Area_of_Overlap / Area_of_Union

Where:
- Area_of_Overlap is the area of intersection between the bounding box and the ground
truth.
- Area_of_Union is the area of union between the bounding box and the ground truth.
Let's calculate IoU for each detection and ground truth:
For Object A:
For Object B:
Next, we'll determine whether each detection is a true positive (TP) or false positive (FP)
based on the IoU threshold of 0.5.
To determine whether each detection is a true positive (TP) or false positive (FP) based
on the IoU threshold of 0.5, we'll compare the calculated IoU values with the threshold:
For Object A:
For Object B:
For Object A:
- FP: 2
For Object B:
- FP: 4
Now, let's compute the Average Precision (AP) for each class and then compute the Mean
Average Precision (mAP).
Now that we have computed the Precision and Recall values for each object class, let's
compute the Average Precision (AP) for each class and then compute the Mean Average
Precision (mAP).
For Object A:
- Precision = 0.5
- Recall = 1.0
To compute AP for Object A, we'll approximate the area under the Precision-Recall curve. Since we only have one point, the area is simply the rectangle formed by the point and the axes: AP = Precision × Recall = 0.5 × 1.0 = 0.5.
For Object B:
- Precision = 0.0
- Recall = 0.0
Now, let's compute the Mean Average Precision (mAP) by averaging the APs for both
object classes:
mAP = (AP for Object A + AP for Object B) / Total number of object classes
= (0.5 + 0.0) / 2
= 0.25
So, the Mean Average Precision (mAP) for the given detections is 0.25.
This indicates the overall performance of the object detector across both classes,
considering both precision and recall.
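The AP and mAP arithmetic above can be checked with a short snippet:

```python
# With a single precision/recall point, AP is the rectangle under that point.
ap_a = 0.5 * 1.0    # Object A: Precision 0.5, Recall 1.0
ap_b = 0.0          # Object B: no true positives, so AP is 0
map_score = (ap_a + ap_b) / 2
print(map_score)    # 0.25
```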