
TRƯỜNG ĐẠI HỌC BÁCH KHOA HÀ NỘI

IMAGE PROCESSING IN MECHATRONICS


Machine Vision

Lecturer: Dr. Nguyễn Thành Hùng


Unit: Department of Mechatronics, School of Mechanical Engineering

Hà Nội, 2021 1
Chapter 7. Object Recognition

❖1. Introduction

❖2. Pattern Matching

❖3. Feature-based Methods

❖4. Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 2
1. Introduction
▪ Object recognition: localizing and classifying objects in an image.
▪ General concept:
➢ training datasets contain images with known, labelled objects;
➢ depending on the chosen algorithm, different types of information (colours, edges, geometric forms) are extracted from these images;
➢ for any new image, the same information is gathered and compared to the training dataset to find the most suitable classification (see the sketch after this slide).

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 3
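The train-then-compare workflow above can be sketched in a few lines of Python. This is only a hypothetical illustration: the grayscale-histogram feature and the nearest-neighbour comparison are assumptions of this sketch, not something prescribed by the slides.

    import numpy as np

    def extract_features(image):
        # One simple choice of "information" to extract: a normalized intensity histogram.
        hist, _ = np.histogram(image, bins=32, range=(0, 256))
        return hist / hist.sum()

    def classify(new_image, train_images, train_labels):
        # Gather the same information for the new image and compare it to every
        # labelled training image; return the label of the most similar one.
        query = extract_features(new_image)
        distances = [np.linalg.norm(query - extract_features(img)) for img in train_images]
        return train_labels[int(np.argmin(distances))]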
1. Introduction
▪ Applications:
➢ robots in industrial environments,
➢ face or handwriting recognition,
➢ autonomous systems such as modern cars, which use object recognition for pedestrian detection, emergency brake assistance, and so on,
➢ …

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 4
1. Introduction

▪ General Object Recognition Strategies


➢ Appearance-based method
➢ Feature-based method
➢ Interpretation Tree
➢ Pattern Matching
➢ Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 5
1. Introduction
▪ General Object Recognition Strategies: Appearance-based method
➢ Face or handwriting recognition
➢ Reference training images
➢ This dataset is compressed to obtain a lower-dimensional subspace, also called the eigenspace.
➢ Parts of the new input images are projected onto the eigenspace and correspondence is then examined.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 6
1. Introduction
▪ General Object Recognition Strategies: Feature-based Method
➢ Characteristic for each object
➢ Colours, contour lines, geometric forms or edges
➢ The basic concept of feature-based object recognition strategies is as follows:
• every input image is searched for a specific type of feature;
• this feature is then compared to a database containing models of the objects, in order to verify whether any objects are recognised.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 7
1. Introduction

▪ General Object Recognition Strategies: Feature-based method


➢ Features and their descriptors can be either found considering the whole image (global
feature) or after observing just small parts of the image (local feature).
➢ A histogram of pixel intensities or colours is a simple example of a global feature.
➢ It is not always reasonable to compare the whole image, as even slight changes in illumination, position (occlusion) or rotation lead to significant differences, and a correct recognition is no longer possible.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 8
1. Introduction

▪ General Object Recognition Strategies: Feature-based method


➢ Descriptors of local features are more robust against these problems and therefore
algorithms with local features often outperform global feature-based methods.

Two patches from different images are cut out and compared; they are considered a match if the error between the patches is below a certain threshold.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 9
1. Introduction

▪ General Object Recognition Strategies: Interpretation Tree


➢ An interpretation tree is a depth-first search algorithm for model matching.
➢ Algorithms based on this approach often try to recognise n-dimensional geometric objects; therefore, a database containing models with known features is necessary.
➢ The feature set might consist of distance, angle and direction constraints between points on the surface of the objects.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 10
1. Introduction

▪ General Object Recognition Strategies: Interpretation Tree

Procedure of an interpretation tree algorithm

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 11
1. Introduction

▪ General Object Recognition Strategies: Pattern Matching


➢ Methods of pattern matching, sometimes called template matching, are often used because of their simplicity.
➢ Template matching is a technique for finding small parts of an image which match a
template image.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 12
1. Introduction

▪ General Object Recognition Strategies: Pattern Matching


➢ One well-known application of template matching is traffic sign recognition: small parts of the input image are matched against a database of different traffic sign images.
➢ This approach has many disadvantages, such as problems with occlusion, rotation, scaling and different illumination.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 13
1. Introduction

▪ General Object Recognition Strategies: Artificial neural networks


➢ A model consists of several layers, each of which is composed of a certain number of neurons.

A neural network containing one input layer, two hidden layers and one output layer.
Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 14
1. Introduction

▪ General Object Recognition Strategies: Artificial neural networks


➢ An input layer and an output layer are the minimum number of layers a network can have, but normally hidden layers are included so that more complex tasks such as object recognition can be learned.
➢ All neurons of one layer are connected to all neurons of the next layer, which creates a huge network with millions of parameters.
➢ All of these connections have a weight which is updated during the learning phase. A neuron is activated if the sum of its input signals is above a certain threshold, and an activation function triggers the output.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 15
1. Introduction

▪ General Object Recognition Strategies: Artificial neural networks

A neural network containing one input layer, two hidden layers and one output layer.
Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 16
1. Introduction

▪ General Object Recognition Strategies: Artificial neural networks


➢ There are different types of networks, such as feed-forward and recurrent networks, with different numbers and types of hidden layers, while the input layer (e.g. the number of pixels) and the output layer (the number of classes) are fixed.
➢ Convolutional neural networks and their hidden layers are explained in more detail in Section 4. New inputs pass through the network in the same way: some neurons are activated based on the trained weights, and this finally leads to the most suitable classification.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 17
1. Introduction

▪ Performance Analysis
➢ Invariances and Robustness
➢ Complexity
➢ Reliability and Accuracy

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 18
1. Introduction
❖ Performance Analysis: Invariances and Robustness
▪ First, the algorithms are analysed to check which invariances they exhibit and what level of robustness they have.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 19
1. Introduction
❖ Performance Analysis: Complexity
▪ Second, the algorithms are compared with regard to complexity, especially in terms of computational load and memory usage.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 22
1. Introduction
❖ Performance Analysis: Reliability and Accuracy

The development of accuracy rates of traditional computer vision and deep learning regarding ImageNet.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 26
Chapter 7. Object Recognition

❖1. Introduction

❖2. Pattern Matching

❖3. Feature-based Methods

❖4. Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 28
2. Pattern Matching
❖ Template matching is a technique for finding areas of an image that match (are similar) to
a template image (patch).
❖ How does it work?
▪ We need two primary components:
▪ Source image (I): the image in which we expect to find a match to the template image.
▪ Template image (T): the patch image which will be compared to the source image; our goal is to detect the highest matching area:

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 29
2. Pattern Matching

❖ Template matching

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 30
2. Pattern Matching
❖ Template matching
▪ To identify the matching area, we have to compare the template image against the source
image by sliding it:

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 31
2. Pattern Matching
❖ Template matching
▪ By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how "good" or "bad" the match at that location is (or how similar the patch is to that particular area of the source image).
▪ For each location of T over I, you store the metric in the result matrix R. Each location (x,y) in R contains the match metric:

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 32
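A minimal OpenCV sketch of the sliding-and-scoring procedure described above. The file names source.png and template.png are placeholders assumed for this example, and TM_CCORR_NORMED is just one of the available metrics.

    import cv2

    I = cv2.imread("source.png", cv2.IMREAD_GRAYSCALE)    # source image I
    T = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # template (patch) T

    # R holds the match metric for every placement of T over I.
    R = cv2.matchTemplate(I, T, cv2.TM_CCORR_NORMED)

    # For TM_CCORR_NORMED the best match is the maximum of R.
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(R)
    top_left = max_loc
    h, w = T.shape
    bottom_right = (top_left[0] + w, top_left[1] + h)

    # Draw the matched rectangle (top-left corner plus template size) on the source image.
    cv2.rectangle(I, top_left, bottom_right, 255, 2)
    cv2.imwrite("result.png", I)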
2. Pattern Matching
❖ Template matching

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 33
2. Pattern Matching
❖ Template matching
▪ The image above is the result R of sliding the patch with a metric
TM_CCORR_NORMED. The brightest locations indicate the highest matches. As you can
see, the location marked by the red circle is probably the one with the highest value, so
that location (the rectangle formed by that point as a corner and width and height equal to
the patch image) is considered the match.

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 34
2. Pattern Matching
❖ Template matching
▪ What matching methods are available in OpenCV?

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 35
2. Pattern Matching
❖ Template matching
▪ What matching methods are available in OpenCV?

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 36
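The method formulas on these two slides did not survive extraction. OpenCV's matchTemplate offers six metrics, selected through the flags below; note that for the squared-difference methods the best match is the minimum of R, while for the correlation-based methods it is the maximum. A short sketch of handling both cases:

    import cv2

    methods = [
        cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED,   # lower is better
        cv2.TM_CCORR,  cv2.TM_CCORR_NORMED,    # higher is better
        cv2.TM_CCOEFF, cv2.TM_CCOEFF_NORMED,   # higher is better
    ]

    def best_location(result, method):
        # result: the matrix R returned by cv2.matchTemplate for the given method
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        # Squared-difference methods are minimized; the others are maximized.
        return min_loc if method in (cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED) else max_loc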
Chapter 7. Object Recognition

❖1. Introduction

❖2. Pattern Matching

❖3. Feature-based Methods

❖4. Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 37
3. Feature-based Methods

▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching

38
3. Feature-based Methods
❖ Feature detectors

Image pairs with extracted patches below. Notice how some patches
can be localized or matched with higher accuracy than others.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 39
3. Feature-based Methods
❖ Feature detectors
▪ The simplest possible matching criterion for comparing two image patches:

where I0 and I1 are the two images being compared, u = (u, v) is the displacement vector, w(x) is a
spatially varying weighting (or window) function, and the summation i is over all the pixels in the patch.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 40
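The criterion itself did not survive extraction; restated in LaTeX from the variable definitions above, it is the weighted summed squared difference used by Szeliski:

    E_{\mathrm{WSSD}}(\mathbf{u}) = \sum_i w(\mathbf{x}_i)\,\bigl[ I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \bigr]^2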
3. Feature-based Methods
❖ Feature detectors

Aperture problems for different image patches: (a) stable (“corner-like”) flow; (b) classic aperture problem
(barber-pole illusion); (c) textureless region. The two images I0 (yellow) and I1 (red) are overlaid. The red
vector u indicates the displacement between the patch centers and the w(xi) weighting function (patch
window) is shown as a dark circle.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 41
3. Feature-based Methods
❖ Feature detectors
▪ auto-correlation function or surface

Three auto-correlation surfaces E_AC(Δu) shown as both grayscale images and surface plots: (a) the original image is marked with three red crosses to denote where the auto-correlation surfaces were computed; (b) this patch is from the flower bed (good unique minimum); (c) this patch is from the roof edge (one-dimensional aperture problem); and (d) this patch is from the cloud (no good peak). Each grid point in figures b–d is one value of Δu.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 42
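The auto-correlation surface E_AC(Δu) referred to in the caption compares a patch of I0 against itself under a small displacement Δu; written in the same notation as the matching criterion above:

    E_{\mathrm{AC}}(\Delta\mathbf{u}) = \sum_i w(\mathbf{x}_i)\,\bigl[ I_0(\mathbf{x}_i + \Delta\mathbf{u}) - I_0(\mathbf{x}_i) \bigr]^2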
3. Feature-based Methods
❖ Feature detectors
▪ auto-correlation function or surface

Uncertainty ellipse corresponding to an eigenvalue analysis of the auto-correlation matrix A.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 43
3. Feature-based Methods
❖ Feature detectors
▪ Forstner–Harris

Interest operator responses: (a) Sample image, (b) Harris response, and (c) DoG response. The circle sizes
and colors indicate the scale at which each interest point was detected. Notice how the two detectors
tend to respond at complementary locations.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 44
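The Harris response shown in (b) is computed from the auto-correlation (second moment) matrix A built from the image gradients I_x, I_y. A commonly used form of the Harris measure, with α typically chosen around 0.04–0.06, is:

    A = \sum_i w(\mathbf{x}_i)
        \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
    \qquad
    R = \det(A) - \alpha\,\operatorname{trace}^2(A)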
3. Feature-based Methods
❖ Feature detectors
▪ Adaptive non-maximal suppression (ANMS)

Adaptive non-maximal suppression (ANMS) (Brown, Szeliski, and Winder 2005): The upper two images show the strongest 250 and 500 interest points, while the lower two images show the interest points selected with adaptive non-maximal suppression, along with the corresponding suppression radius r. Note how the latter features have a much more uniform spatial distribution across the image.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 45
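A minimal NumPy sketch of the ANMS idea described in the caption: each keypoint is assigned the radius to its nearest significantly stronger neighbour, and the points with the largest radii are kept. The O(n²) loop, the robustness factor of 0.9 and the function name are assumptions of this sketch.

    import numpy as np

    def anms(points, strengths, num_keep=500, c_robust=0.9):
        # points: (N, 2) array of keypoint coordinates; strengths: (N,) corner responses
        n = len(points)
        radii = np.full(n, np.inf)
        for i in range(n):
            stronger = c_robust * strengths > strengths[i]      # significantly stronger points
            if np.any(stronger):
                d = np.linalg.norm(points[stronger] - points[i], axis=1)
                radii[i] = d.min()                              # suppression radius r_i
        return np.argsort(-radii)[:num_keep]                    # indices of the kept keypoints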
3. Feature-based Methods
❖ Feature detectors
▪ Scale invariance

Multi-scale oriented patches (MOPS) extracted at five pyramid levels (Brown, Szeliski, and
Winder 2005). The boxes show the feature orientation and the region from which the
descriptor vectors are sampled.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 46
3. Feature-based Methods
❖ Feature detectors
▪ Scale invariance

Scale-space feature detection using a sub-octave Difference of Gaussian pyramid (Lowe 2004): (a) Adjacent
levels of a sub-octave Gaussian pyramid are subtracted to produce Difference of Gaussian images; (b) extrema
(maxima and minima) in the resulting 3D volume are detected by comparing a pixel to its 26 neighbors.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 47
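The Difference of Gaussian images in (a) are obtained by subtracting adjacent levels of the Gaussian pyramid, i.e. (Lowe 2004):

    D(x, y, \sigma) = \bigl( G(x, y, k\sigma) - G(x, y, \sigma) \bigr) * I(x, y)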
3. Feature-based Methods
❖ Feature detectors
▪ Rotational invariance and orientation estimation

A dominant orientation estimate can be computed by creating a histogram of all the gradient orientations (weighted by their magnitudes or after thresholding out small gradients) and then finding the significant peaks in this distribution (Lowe 2004).

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 48
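A minimal NumPy sketch of the dominant-orientation estimate described above: gradient orientations in a patch are accumulated into a magnitude-weighted histogram, and every peak within 80% of the maximum is returned. The 36-bin histogram and the 80% peak rule follow Lowe (2004); the function name and interface are assumptions of this sketch.

    import numpy as np

    def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
        # patch: 2D grayscale array centred on the keypoint
        gy, gx = np.gradient(patch.astype(float))                # image gradients
        mag = np.hypot(gx, gy)                                   # gradient magnitudes
        ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0             # orientations in [0, 360)
        hist, edges = np.histogram(ang, bins=num_bins, range=(0.0, 360.0), weights=mag)
        peaks = np.flatnonzero(hist >= peak_ratio * hist.max())  # significant peaks
        return (edges[peaks] + edges[peaks + 1]) / 2.0           # bin centres, in degrees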
3. Feature-based Methods
❖ Feature detectors
▪ Rotational invariance and orientation estimation

Affine region detectors used to match two images taken from dramatically different viewpoints
(Mikolajczyk and Schmid 2004)
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 49
3. Feature-based Methods
❖ Feature detectors
▪ Affine invariance

Affine normalization using the second moment matrices, as described by Mikolajczyk, Tuytelaars, Schmid et al. (2005): after image coordinates are transformed using the matrices A_0^{-1/2} and A_1^{-1/2}, they are related by a pure rotation R, which can be estimated using a dominant orientation technique.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 50
3. Feature-based Methods
❖ Feature detectors
▪ Affine invariance

Maximally stable extremal regions (MSERs) extracted and matched from a number of images
(Matas, Chum, Urban et al. 2004)

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 51
3. Feature-based Methods

▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching

52
3. Feature-based Methods
❖ Feature descriptors

Feature matching: how can we extract local descriptors that are invariant to inter-image variations and yet
still discriminative enough to establish correct correspondences?

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 53
3. Feature-based Methods
❖ Feature descriptors
▪ Bias and gain normalization (MOPS)

MOPS descriptors are formed using an 8×8 sampling of bias and gain normalized intensity values, with a
sample spacing of five pixels relative to the detection scale (Brown, Szeliski, and Winder 2005). This low
frequency sampling gives the features some robustness to interest point location error and is achieved by
sampling at a higher pyramid level than the detection scale.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 54
3. Feature-based Methods
❖ Feature descriptors
▪ Scale invariant feature transform (SIFT)
A schematic representation of Lowe’s
(2004) scale invariant feature transform
(SIFT): (a) Gradient orientations and
magnitudes are computed at each pixel
and weighted by a Gaussian fall-off
function (blue circle). (b) A weighted
gradient orientation histogram is then
computed in each subregion, using trilinear
interpolation. While this figure shows an 8
× 8 pixel patch and a 2 × 2 descriptor array,
Lowe’s actual implementation uses 16 × 16
patches and a 4 × 4 array of eight-bin
histograms.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 55
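In practice the SIFT detector and descriptor are rarely re-implemented by hand; OpenCV ships an implementation (cv2.SIFT_create, available in OpenCV 4.4 and later). A minimal sketch, assuming a grayscale image file image.png (a placeholder name):

    import cv2

    img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()

    # keypoints carry location, scale and orientation; descriptors is an N x 128 array.
    keypoints, descriptors = sift.detectAndCompute(img, None)

    out = cv2.drawKeypoints(img, keypoints, None)
    cv2.imwrite("sift_keypoints.png", out)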
3. Feature-based Methods
❖ Feature descriptors
▪ Gradient location-orientation histogram (GLOH)

The gradient location-orientation histogram (GLOH) descriptor uses log-polar bins instead of square bins to compute orientation histograms (Mikolajczyk and Schmid 2005).

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 56
3. Feature-based Methods
❖ Feature descriptors

Spatial summation blocks for SIFT, GLOH, and some newly developed feature descriptors (Winder and Brown 2007): (a)
The parameters for the new features, e.g., their Gaussian weights, are learned from a training database of (b) matched
real-world image patches obtained from robust structure from motion applied to Internet photo collections (Hua, Brown,
and Winder 2007).

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 57
3. Feature-based Methods

▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching

58
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

Recognizing objects in a cluttered scene (Lowe 2004). Two of the training images in the database are shown on the left.
These are matched to the cluttered scene in the middle using SIFT features, shown as small squares in the right image.
The affine warp of each recognized database image onto the scene is shown as a larger parallelogram in the right image.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 59
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

False positives and negatives: The black digits 1 and 2 are features being matched against a database of features in other images. At the current threshold setting (the solid circles), the green 1 is a true positive (good match), the blue 1 is a false negative (failure to match), and the red 3 is a false positive (incorrect match). If we set the threshold higher (the dashed circles), the blue 1 becomes a true positive but the brown 4 becomes an additional false positive.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 60
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

The number of matches correctly and incorrectly estimated by a feature matching algorithm, showing the number of
true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The columns sum up to the actual
number of positives (P) and negatives (N), while the rows sum up to the predicted number of positives (P’) and
negatives (N’). The formulas for the true positive rate (TPR), the false positive rate (FPR), the positive predictive value
(PPV), and the accuracy (ACC) are given in the text.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 61
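The formulas referred to in the caption ("given in the text") are the standard confusion-matrix rates:

    \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
    \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
    \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
    \mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN}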
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 62
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

ROC curve and its related rates: (a) The ROC curve plots the true positive rate against the false positive rate for a particular combination of feature extraction and matching algorithms. Ideally, the true positive rate should be close to 1, while the false positive rate is close to 0. The area under the ROC curve (AUC) is often used as a single (scalar) measure of algorithm performance. Alternatively, the equal error rate is sometimes used. (b) The distribution of positives (matches) and negatives (non-matches) as a function of inter-feature distance d. As the threshold θ is increased, the number of true positives (TP) and false positives (FP) increases.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 63
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 64
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 65
3. Feature-based Methods
❖ Feature matching
▪ Nearest neighbor distance ratio

where d1 and d2 are the nearest and second nearest neighbor distances, DA is the target
descriptor, and DB and DC are its closest two neighbors

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 66
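The equation itself did not survive extraction; from the variable definitions above, the nearest neighbor distance ratio is

    \mathrm{NNDR} = \frac{d_1}{d_2} = \frac{\lVert D_A - D_B \rVert}{\lVert D_A - D_C \rVert}

A common way to apply it in practice is Lowe's ratio test on the two best matches returned by a brute-force matcher. A minimal OpenCV sketch, where the 0.8 default is the threshold suggested by Lowe (2004) and the descriptor arrays are assumed to come from a detector such as SIFT:

    import cv2

    def nndr_match(des1, des2, ratio=0.8):
        # des1, des2: descriptor arrays, e.g. from cv2.SIFT_create().detectAndCompute
        bf = cv2.BFMatcher(cv2.NORM_L2)
        matches = bf.knnMatch(des1, des2, k=2)   # two nearest neighbours per query descriptor
        # Keep a match only if its distance is clearly smaller than the second-best distance.
        return [pair[0] for pair in matches
                if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]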
3. Feature-based Methods

❖ Feature matching

Performance of the feature descriptors evaluated by Mikolajczyk and Schmid (2005), shown for three matching
strategies: (a) fixed threshold; (b) nearest neighbor; (c) nearest neighbor distance ratio (NNDR). Note how the
ordering of the algorithms does not change that much, but the overall performance varies significantly between the
different matching strategies.

Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 67
Chapter 7. Object Recognition

❖1. Introduction

❖2. Pattern Matching

❖3. Feature-based Methods

❖4. Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 68
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network
▪ (Deep) convolutional neural networks (CNN): the term deep means that there is at least one hidden layer, and convolutional implies the use of convolution layers. The basic principles of CNNs are inspired by the biological visual cortex of humans.
▪ The architecture of an example CNN can be seen on Slide 70. Input images with 28x28 pixels are convolved with a filter to obtain 3D feature maps. The succeeding sub-sampling layer, often called a pooling layer, further reduces the amount of data. This procedure is continued until a one-dimensional vector, which represents the different classes, is obtained.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 69
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network

One example architecture of a convolutional neural network using subsampling and convolution hidden layers.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 70
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Like most object recognition algorithms, CNNs need training to adapt all the weights of the neurons. During that phase, different levels of features are extracted (see Slide 72).
▪ Low-level features contain colour, lines or contrast, whereas edges and corners belong to mid-level features. High-level features already include class-specific forms or sections.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 71
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network

Intermediate results from hidden layers. From left to right: low-level, mid-level and high-level features.

Simon Achatz, State of the art of object recognition techniques, Technische Universitat Muchen. 72
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Architecture Overview
➢ Regular Neural Nets: Neural Networks receive an input (a single vector), and
transform it through a series of hidden layers.
➢ Regular Neural Nets don’t scale well to full images.
➢ 3D volumes of neurons: width, height, depth.
➢ A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an
input 3D volume to an output 3D volume with some differentiable function that
may or may not have parameters.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 73
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Architecture Overview

A regular 3-layer Neural Network.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 74
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Architecture Overview

A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers.
Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this
example, the red input layer holds the image, so its width and height would be the dimensions of the image,
and the depth would be 3 (Red, Green, Blue channels).
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 75
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Three main types of layers to build ConvNet architectures:
▪ Convolutional Layer
▪ Pooling Layer, and
▪ Fully-Connected Layer

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 76
4. Artificial Neural Networks

➢ Example Architecture: Overview: a simple ConvNet for CIFAR-10 classification

The activations of an example ConvNet architecture.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 77
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Example Architecture: Overview: a simple ConvNet for CIFAR-10 classification
▪ INPUT [32x32x3]: an image of width 32, height 32, and with three color
channels R,G,B.
▪ CONV layer will compute the output of neurons that are connected to local
regions in the input → volume [32x32x12] if we decided to use 12 filters.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 78
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Example Architecture: Overview: a simple ConvNet for CIFAR-10 classification
▪ RELU layer: elementwise activation function → the size of the volume
unchanged ([32x32x12]).
▪ POOL layer: downsampling operation → volume such as [16x16x12].
▪ FC (i.e. fully-connected) layer will compute the class scores, resulting in
volume of size [1x1x10].

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 79
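A minimal PyTorch sketch of exactly this INPUT → CONV → RELU → POOL → FC stack. PyTorch is my choice here, not something the slides prescribe; the 12 filters, 3x3 kernel and padding of 1 reproduce the [32x32x12] and [16x16x12] volumes quoted above.

    import torch
    import torch.nn as nn

    class TinyCifarNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.conv = nn.Conv2d(3, 12, kernel_size=3, padding=1)  # [32x32x3] -> [32x32x12]
            self.relu = nn.ReLU()                                   # size unchanged
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)       # [32x32x12] -> [16x16x12]
            self.fc = nn.Linear(16 * 16 * 12, num_classes)          # -> [1x1x10] class scores

        def forward(self, x):
            x = self.pool(self.relu(self.conv(x)))
            return self.fc(torch.flatten(x, start_dim=1))

    scores = TinyCifarNet()(torch.randn(1, 3, 32, 32))   # output shape: (1, 10)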
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ A ConvNet architecture is in the simplest case a list of Layers that transform the
image volume into an output volume (e.g. holding the class scores)
➢ There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far
the most popular)
➢ Each Layer accepts an input 3D volume and transforms it to an output 3D
volume through a differentiable function
➢ Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL
don’t)
➢ Each Layer may or may not have additional hyperparameters (e.g.
CONV/FC/POOL do, RELU doesn’t)
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 80
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Input image

4x4x3 RGB Image

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 81
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Convolutional Layer

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 82
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Convolutional Layer

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 83
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Convolutional Layer

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 84
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Convolutional Layer
Parameters that control the behavior of each convolutional layer:
▪ Stride
▪ Padding (zero-padding)
▪ Number of filters (depth of the next layer)
▪ Size of the filter
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 85
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Stride

A stride of 1 with a 3x3 filter on a 7x7 image; a stride of 2 with a 3x3 filter on a 7x7 image.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 86
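The spatial output size follows the standard formula below, which reproduces the stride examples above: with W = 7, F = 3 and P = 0, a stride of S = 1 gives (7 − 3)/1 + 1 = 5, and S = 2 gives (7 − 3)/2 + 1 = 3. For a "same" convolution with stride 1, choose P = (F − 1)/2.

    O = \frac{W - F + 2P}{S} + 1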
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Stride

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 87
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Padding (Zero-padding)

An image padded with zero-padding of size 2.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 88
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Padding (Zero-padding)
• same convolution: preserves the dimensions of the image
• wide convolution: adds zero-padding
• narrow convolution: uses no zero-padding

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 89
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Number of filters (depth of the next layer)
• Example: a 6x6x3 image with four 3x3 filters.
• After convolving, we obtain a 4x4xn volume, where n depends on the number of filters (in other words, on the number of feature detectors) used. In this case, n is 4.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 90
4. Artificial Neural Networks

➢ Convolutional Layer
▪ Size of the filter
• The size of the filter is usually an odd number, so that the filter has a "central pixel" ("central vision") that defines its position.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 91
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Activation Function

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 92
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Activation Function
▪ Sigmoid Activation Function

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 93
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Activation Function
▪ ReLU Activation Function

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 94
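The two activation functions named above, as a minimal NumPy sketch (the elementwise ReLU is what the RELU layer applies to a whole feature map):

    import numpy as np

    def sigmoid(x):
        # squashes any real value into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # keeps positive values and zeroes out negative ones, elementwise
        return np.maximum(0.0, x)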
4. Artificial Neural Networks

➢ Activation Function
▪ ReLU Activation Function

Applying ReLU to an image.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 95
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ The Pooling Layer: spatial pooling reduces the dimensionality of each feature map while retaining the most important information of the image.
▪ Average pooling
▪ Max pooling
▪ Sum pooling

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 96
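A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single feature map; average and sum pooling would simply replace .max() with .mean() or .sum(). The function name and interface are assumptions of this sketch.

    import numpy as np

    def max_pool2d(x, size=2, stride=2):
        # x: a single 2D feature map
        h, w = x.shape
        out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
        out = np.empty((out_h, out_w), dtype=x.dtype)
        for i in range(out_h):
            for j in range(out_w):
                window = x[i * stride:i * stride + size, j * stride:j * stride + size]
                out[i, j] = window.max()   # keep the strongest activation in each window
        return out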
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ The Pooling Layer

Max-pooling in 2D image.
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 97
4. Artificial Neural Networks

➢ The Pooling Layer

Max-pooling on a 3D volume, which is the case we normally deal with.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 98
4. Artificial Neural Networks

➢ The Pooling Layer

Applying max/sum pooling to the image after ReLU has been applied.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 99
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


▪ Layers used to build ConvNets
➢ Fully Connected Layer (FC)

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 100
4. Artificial Neural Networks

➢ Fully Connected Layer (FC)

Fully connected layer.


Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 101
4. Artificial Neural Networks

➢ Fully Connected Layer (FC)


▪ Softmax function: takes a vector of arbitrary real-valued scores and squashes
it to a vector of values between zero and one that sum to one.

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 102
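A minimal, numerically stable NumPy sketch of the softmax described above; subtracting the maximum score does not change the result but avoids overflow.

    import numpy as np

    def softmax(scores):
        # scores: 1D vector of arbitrary real-valued class scores
        shifted = scores - np.max(scores)   # for numerical stability
        exp = np.exp(shifted)
        return exp / exp.sum()              # values in (0, 1) that sum to 1

    print(softmax(np.array([2.0, 1.0, 0.1])))   # approx. [0.659 0.242 0.099]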
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ LeNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 103
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ AlexNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 104
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ ZFNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 105
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ Inception-v4

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 106
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ VGGNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 107
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ VGGNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 108
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ ResNet

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 109
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ Example: LeNet

110
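The LeNet figures did not survive extraction. As a hedged sketch only, the classic LeNet-5 layout (two convolution + pooling stages followed by three fully connected layers, for 32x32 single-channel digit images) can be written in PyTorch as follows; PyTorch and the tanh/average-pooling choices are assumptions of this sketch, not taken from the slides.

    import torch
    import torch.nn as nn

    class LeNet5(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
                nn.Tanh(),
                nn.AvgPool2d(2),                  # -> 14x14x6
                nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10x16
                nn.Tanh(),
                nn.AvgPool2d(2),                  # -> 5x5x16
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
                nn.Linear(120, 84), nn.Tanh(),
                nn.Linear(84, num_classes),
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    logits = LeNet5()(torch.randn(1, 1, 32, 32))   # output shape: (1, 10)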
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ Example: LeNet

111
4. Artificial Neural Networks

❖ CNN - Convolutional Neural Network


➢ Example: LeNet

112
