
Module 3 - Convolutional Neural Networks

An RGB image is nothing but a matrix of pixel values with three channels, whereas a grayscale image is the
same but has a single channel.
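As a quick sketch of that channel difference, the NumPy shapes below use a made-up 4×4 image size (the sizes are illustrative, not from the notes):

```python
import numpy as np

# A toy 4x4 RGB image: Height x Width x Channels = 4 x 4 x 3
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
print(rgb.shape)        # (4, 4, 3) -- three channels

# The grayscale version of the same image has a single channel.
gray = np.zeros((4, 4), dtype=np.uint8)
print(gray.shape)       # (4, 4)
```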
 Convolution layers, max-pool layers [Kernel, Padding, Stride, Flatten, ReLU, Dense/fully connected]
 ELU, Gradient Descent
 Training CNNs; CNN architectures: VGG, GoogLeNet, ResNet (transfer learning)
 Dropout, normalization
 Update rules, data augmentation

History

CNNs were first developed and used around the 1980s. The most a CNN could do at that time was recognize
handwritten digits, and it was mostly used in the postal sector to read zip codes, pin codes, etc. The important thing
to remember about any deep learning model is that it requires a large amount of data and a lot of computing
resources to train. This was a major drawback for CNNs at the time, so they remained limited to the postal
sector and failed to enter the wider world of machine learning.

Yann LeCun, director of Facebook’s AI Research Group, is the pioneer of convolutional neural networks. He
built the first convolutional neural network, called LeNet, in 1989. LeNet was used for character recognition tasks
like reading zip codes and digits.

In 2012, computer vision took a quantum leap when a group of researchers from the University of Toronto
developed an AI model that surpassed the best image recognition algorithms, and by a large margin. The
AI system, which became known as AlexNet (named after its main creator, Alex Krizhevsky), won the 2012
ImageNet computer vision contest with an impressive 85 percent accuracy. The runner-up scored a modest 74
percent on the test.

Why not a fully connected network in place of a CNN?

How does CNN recognize images?

Why CNN

In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep neural networks most
commonly applied to analyzing visual imagery. A ConvNet uses a special technique called convolution. In
mathematics, convolution is an operation on two functions that produces a third function expressing how the
shape of one is modified by the other. In a CNN, a neuron in a layer is connected only to a small region of the
layer before it. The role of the ConvNet is to reduce the images into a form that is easier to process, without
losing features that are critical for getting a good prediction.

Size of the image: Height × Width × Channels
Key terms in CNN

1) Kernel: A filter that is used to extract features from the images. It is a matrix that moves over the input
data, performs the dot product with a sub-region of the input, and produces the output. It moves based on the
stride value.

The image above shows what a convolution is: we take a filter/kernel (a 3×3 matrix) and apply it to the input
image to get the convolved feature, which is passed on to the next layer. For an input of size i and a kernel of
size k, the output size is

O = (i − k) + 1

O = (5 − 3) + 1 = 3

5) ReLU Layer: The negative values in the feature maps are removed and replaced with zeros, to keep the
values from summing to zero. The ReLU function activates a node only when its input is above a certain
quantity: while the input is below zero the output is zero, but once the input rises above the threshold, it has
a linear relationship with the dependent variable.

2) Stride: The filter is moved across the image left to right, top to bottom, with a one-pixel column change on
horizontal movements and a one-pixel row change on vertical movement. The amount of movement between
applications of the filter to the input image is referred to as the stride. The default stride is (1, 1) for the height
and width movement.

O = (i − k)/s + 1
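A minimal sketch of a strided filter, assuming a square input and kernel; `conv2d_stride` and the 7×7 input are hypothetical illustrations, not from the notes:

```python
import numpy as np

def conv2d_stride(image, kernel, s):
    """Valid convolution where the window jumps s pixels per step."""
    i, k = image.shape[0], kernel.shape[0]
    o = (i - k) // s + 1               # O = (i - k)/s + 1
    out = np.zeros((o, o))
    for r in range(o):
        for c in range(o):
            out[r, c] = np.sum(image[r*s:r*s+k, c*s:c*s+k] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))
print(conv2d_stride(image, kernel, s=2).shape)   # (3, 3): (7 - 3)/2 + 1
print(conv2d_stride(image, kernel, s=1).shape)   # (5, 5): stride 1 default
```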
3) Padding: Pixels are added around the outside of the image, copying the pixels from the edges, so that the
convolutional kernel can also process the edge pixels. It fixes the border-effect problem. With padding p, the
output size is

O = (i − k + 2p)/s + 1

6) Flattening: Once the pooled feature map is obtained, the next step is to flatten it. It involves transforming
the entire pooled feature map matrix into a single column, which is then fed to the neural network for
processing. This happens in the fully connected layer, where the actual classification happens.
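Copying the edge pixels outward is what NumPy's `np.pad` with `mode="edge"` does; the 5×5 input here is a made-up example. With i = 5, k = 3, p = 1, s = 1 the formula gives O = 5, so the convolved output keeps the original size:

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)

# Copy edge pixels outward by one ring (p = 1), as described above.
padded = np.pad(image, pad_width=1, mode="edge")
print(padded.shape)                # (7, 7)

# O = (i - k + 2p)/s + 1 with i=5, k=3, p=1, s=1
i, k, p, s = 5, 3, 1, 1
print((i - k + 2 * p) // s + 1)    # 5: output matches the input size
```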

4) Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This is to decrease the
computational power required to process the data by reducing the dimensions. There are two types of pooling:
average pooling and max pooling. In Max Pooling, we find the maximum value of a pixel from a portion of
the image covered by the kernel. On the other hand, Average Pooling returns the average of all the
values from the portion of the image covered by the Kernel.
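Both pooling variants can be sketched with one helper; `pool2d` is an illustrative function, and the 2×2 window with a matching stride is an assumption (the notes do not fix the window size):

```python
import numpy as np

def pool2d(feature, size=2, mode="max"):
    """Pool non-overlapping size x size windows (stride = window size)."""
    o = feature.shape[0] // size
    out = np.zeros((o, o))
    for r in range(o):
        for c in range(o):
            window = feature[r*size:(r+1)*size, c*size:(c+1)*size]
            out[r, c] = window.max() if mode == "max" else window.mean()
    return out

feature = np.array([[1., 3., 2., 4.],
                    [5., 6., 7., 8.],
                    [3., 2., 1., 0.],
                    [1., 2., 3., 4.]])
print(pool2d(feature, mode="max"))       # max of each 2x2 window
print(pool2d(feature, mode="average"))   # mean of each 2x2 window
```

Either way the 4×4 feature map shrinks to 2×2, which is the dimensionality reduction the layer is there for.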
7) Dense/fully connected: A layer that is deeply connected with its preceding layer, meaning each neuron of
the layer is connected to every neuron of the preceding layer. Dense layers are used when any feature in a
data point may be associated with any other feature. Between two layers of sizes n1 and n2 there can be
n1 × n2 connections, which is why these layers are referred to as dense.
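The n1 × n2 connection count shows up directly as the weight matrix of a dense layer. The sizes below (n1 = 8, n2 = 4) and the random input are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n2 = 8, 4                       # sizes of two consecutive layers
W = rng.standard_normal((n2, n1))   # one weight per connection: n1*n2 = 32
b = np.zeros(n2)

x = rng.standard_normal(n1)         # e.g. a flattened pooled feature map
y = W @ x + b                       # every output neuron sees every input
print(W.size)                       # 32 dense connections
print(y.shape)                      # (4,)
```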

Convolutional neural networks are composed of multiple layers of artificial neurons. The first layer usually
extracts basic features such as horizontal or diagonal edges. This output is passed on to the next layer, which
detects more complex features such as corners or combinations of edges. As we move deeper into the network, it
can identify even more complex features such as objects, faces, etc. Based on the activation map of the final
convolution layer, the classification layer outputs a set of confidence scores (values between 0 and 1) that specify
how likely the image is to belong to a “class.”
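One common way to produce such confidence scores is a softmax over the final layer's raw outputs; this is a standard sketch, not a method the notes name, and the three-class logits are invented:

```python
import numpy as np

def softmax(logits):
    """Turn raw classification scores into confidences in (0, 1)
    that sum to 1."""
    z = logits - np.max(logits)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
scores = softmax(logits)
print(scores.round(3))               # highest logit -> highest confidence
print(scores.sum().round(3))         # 1.0
```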
