
Convolutional Neural Networks (CNN)

Essentially, CNNs are feedforward neural networks, the simplest type of neural network. Neurons within the same layer are not connected to one another: each layer receives its input from the previous layer and passes its output on to the next, so information flows in only one direction through the multilayer structure and is never transmitted backwards between layers. With the exception of the input and output layers, all of the layers in between are hidden layers, and there may be one or more of them. CNNs are commonly used for pattern classification because they can take the original image directly as input without complicated preprocessing.

Principles/Conventions to build a CNN architecture

Convolutional neural networks are constructed by keeping the feature space broad and shallow at the beginning of the network, then narrowing it and making it deeper as the network progresses. Here are a few rules to follow, with this principle in mind, as you build your CNN architecture.

1. Always start with smaller filters to capture as much local information as possible, then gradually increase the filter width to reduce the generated feature space and to represent information at a more global, high-level, and representative scale.

2. In principle it is recommended to start with fewer channels, so that low-level features are detected first; these are then combined into more complex shapes (with a greater number of channels) that help distinguish between classes.


3. To help learn more levels of global abstract structure, the number of filters is increased, which increases the depth of the feature space. A further benefit of a deeper and narrower feature space is that it shrinks the feature space to a size that fits into the dense (fully connected) layers.

4. When dealing with moderate or small-sized images, we generally use 3x3, 5x5, and 7x7 filter sizes for the convolutional layers. For max pooling, we use 2x2 or 3x3 filter sizes with a stride of 2. A large image can first be shrunk down to a moderate size by using larger filters and strides, and then processed following the conventions above.

5. When you believe the borders of the image might be important, or when you simply wish to deepen your network architecture, you can use padding = same: it keeps the dimensions unchanged after the convolution, allowing you to use more convolutions without shrinking the feature maps.

6. As you add layers, the network will eventually start to over-fit. Once sufficient accuracy is achieved on the validation set, regularization methods such as L1/L2 penalties, dropout, batch normalization, and data augmentation can be used to reduce over-fitting. A minimal sketch that follows these conventions is given after this list.
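The following is a minimal sketch of an architecture built along these lines. PyTorch is an assumed framework choice (the text does not prescribe one), and the layer sizes are illustrative only: small 3x3 filters with padding = same, a channel count that grows with depth, 2x2 max pooling with stride 2, and dropout before the dense classifier.

import torch
import torch.nn as nn

model = nn.Sequential(
    # broad and shallow at first: few channels, 'same' padding keeps 32x32
    nn.Conv2d(3, 16, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 32x32 -> 16x16
    # deeper and narrower as we go: more channels, smaller spatial size
    nn.Conv2d(16, 32, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 16x16 -> 8x8
    nn.Conv2d(32, 64, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 8x8 -> 4x4
    nn.Flatten(),
    nn.Dropout(0.5),                                   # regularization (rule 6)
    nn.Linear(64 * 4 * 4, 10),                         # dense classifier, 10 classes
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(model(x).shape)           # torch.Size([1, 10])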

Structure of Convolutional Neural Networks


Essentially, the CNN structure consists of the input image, the convolution step, and the resulting feature maps (the depth dimension). Suppose the procedure above produces a feature map of depth 5; the five feature maps, which are five matrices, are then concatenated into a single vector by expanding them row by row. This vector feeds the fully connected layer, which is a BP (back-propagation) neural network: the entries of the feature maps can be interpreted as neurons arranged in a matrix, similar to the neurons in a BP network. The calculations performed by the convolution and pooling steps are described in the sections below.

Biologically based visual cognition mechanisms are the basis for the creation of convolutional neural networks. CNNs are currently a research hotspot in many scientific fields, especially image classification, because the network can take the image directly as input without complex pre-processing, which makes it widely applicable. It can be used for image classification, target recognition, target detection, semantic segmentation, and so on.

Convolution layer

The first thing to do is define convolution: it is an operation that generates a third function from two functions f and g. In a CNN, a new output is obtained by combining the convolution kernel with the input, and the result is called an activation map (also called a feature map). If the input is an array of 32*32*3 pixel values, the filter (sometimes called a neuron or kernel) acts on a receptive field of the same depth. The filter is an array whose elements are called weights or parameters; the important point is that the filter depth must match the depth of the input so that the arithmetic works out, so the filter size here is 5 x 5 x 3. The filter is slid across the image starting from the upper left corner, one pixel at a time, and at each position the filter values are multiplied by the pixel values underneath them in the original image (a dot product). The products are added together to produce a single number. Since this filter can be placed at 28 x 28 positions, the results form a 28 x 28 array.
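The sliding-window dot product just described can be sketched directly in NumPy (an illustrative choice of library): a 5x5x3 filter slid over a 32x32x3 image yields a 28x28 feature map.

import numpy as np

image  = np.random.rand(32, 32, 3)   # input volume (height, width, depth)
kernel = np.random.rand(5, 5, 3)     # filter depth must match the input depth

out = np.zeros((28, 28))             # 32 - 5 + 1 = 28 valid positions per axis
for i in range(28):
    for j in range(28):
        patch = image[i:i+5, j:j+5, :]         # receptive field under the filter
        out[i, j] = np.sum(patch * kernel)     # element-wise product, then sum

print(out.shape)                     # (28, 28)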

Fully connected layer

The fully connected layer looks at the output of the previous layer (the activation maps of the higher-level features) and determines which features correspond most closely to each category. It does this by calculating the dot product of its weights with the previous layer's output, so that features which match a category well, combined with the appropriate weights, produce high scores for that category.

(In other words, the fully connected layer functions as a classifier. The feature vector extracted by the preceding layers is used, together with the weights for each category, to determine the likelihood of each category being output.) The fully connected layer also integrates each previously obtained feature map into a single value: a large value indicates that the feature being looked for is present, independent of its position. In this way, the earlier layers handle the extraction of features, while the fully connected layer handles the categorization.
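A small sketch of this classifier role follows, written in NumPy (an assumption; the sizes and names are illustrative): the flattened feature vector is multiplied by one weight row per category, and a softmax turns the scores into likelihoods.

import numpy as np

features = np.random.rand(64 * 4 * 4)              # flattened feature maps
W = np.random.rand(10, features.size)              # one weight row per category
b = np.zeros(10)

scores = W @ features + b                          # dot product per category
probs = np.exp(scores) / np.sum(np.exp(scores))    # softmax: likelihood per category
print(probs.argmax())                              # index of the predicted category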

Pooling layer

For an individual feature with a high activation value, what matters is its position relative to other features rather than its absolute position. The pooling layer exploits this: it greatly reduces the size of the input volume (the length and width change, but the depth does not).

Two main purposes:

1. The number of activations, and hence the weight parameters of the following layers, is reduced by 75% when a 2x2 pooling layer is used, which reduces computational cost.

2. Overfitting can be controlled.

There are several options for the pooling layer:

1. Max pooling: a filter (typically 2x2) and a stride of the same length are used; the filter is applied to each sub-area of the input, and the maximum value in that area becomes the output (a small sketch follows this list).

2. Average pooling.

3. L2-norm pooling.
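Here is a brief sketch of 2x2 max pooling with stride 2, written in NumPy for illustration: each output value keeps only the largest activation in its 2x2 window, so the spatial size is halved in each direction and 75% of the values are discarded.

import numpy as np

fmap = np.random.rand(28, 28)                           # one feature map
pooled = fmap.reshape(14, 2, 14, 2).max(axis=(1, 3))    # max over each 2x2 window
print(fmap.size, '->', pooled.size)                     # 784 -> 196, a 75% reduction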

Dropout layer

This layer helps prevent the network's weights from being fitted so closely to the training samples that new samples are handled poorly. There is nothing complicated about Dropout: a Dropout layer sets a random subset of activations to 0 in the forward pass, which is equivalent to removing a random set of activations from the layer. As a result, even with some activations discarded, the network must still be able to produce the appropriate classification or output for the given sample. A mechanism such as this prevents the neural network from fitting the training samples too tightly, alleviating overfitting. A second important point is that the Dropout layer is used only during training, not during testing.
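The training-only behaviour can be seen in the short sketch below, using PyTorch's Dropout module as an assumed example: activations are zeroed at random in training mode and passed through unchanged in evaluation (test) mode.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the activations are zeroed (survivors are rescaled)

drop.eval()
print(drop(x))   # identity: dropout is disabled at test time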

Common uses for CNNs

CNNs are commonly used for image classification, for example to identify satellite images that contain roads or to classify handwritten letters and digits. They are also highly effective at image segmentation and signal processing. A CNN can be used for language understanding in Natural Language Processing (NLP) and for speech recognition; however, recurrent neural networks (RNNs) are more commonly employed for NLP tasks. A CNN can also be implemented in a U-Net architecture, which is essentially a network composed of two almost identical CNN halves. With U-Nets, the output can have a similar size to the input, which is useful for example in image segmentation and resizing.

Supplement: the concept of hyperparameters

Unlike the parameters learned during training, hyperparameters are values that are set beforehand in machine learning. The hyperparameters of a learning machine must be tuned in order to improve its performance; the aim is to select a set of optimal hyperparameters. The key hyperparameters are the following:

● Optimization algorithm learning rate: this controls how strongly the network weights are updated by the optimization algorithm. The appropriate learning rate depends on the type of optimization algorithm employed, such as SGD, Adam, Adagrad, AdaDelta, or RMSProp.

● Number of iterations:
A neural network is trained by passing the entire training set through the network multiple times. If the difference between the test error rate and the training error rate is small, it may be worth increasing the number of iterations or adjusting the network structure; if the gap is large, continuing to train will mainly increase overfitting, so the number of iterations or the network should be adjusted instead.

● Batch size:
Convolutional neural networks learn more effectively when smaller batches are used, and the batch size is normally selected in the range [16, 128]. Changing the batch size can have a significant impact on the CNN.

● Activation function:
Because activation functions introduce nonlinearity, the model can in theory approximate any function. As a rule, CNNs work well with rectified linear (ReLU) activations. Other activation functions, such as Sigmoid and Tanh, can also be chosen depending on the task at hand.

● The number of hidden layers:
Too many hidden layers lead to a large amount of computation and slow training, and beyond a certain depth the training accuracy no longer increases.

● The number of hidden layer units:
Too few units lead to under-fitting (the error on the training samples is extremely large); with too many, an appropriate regularization method should be adopted to prevent overfitting.

● Weight initialization: uniform distribution, normal distribution.

● Dropout method: a common regularization method that reduces overfitting, with a typical default rate of 0.5. (A brief configuration sketch follows this list.)
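As a rough illustration of how these hyperparameters appear in practice, the sketch below sets them up in PyTorch (an assumed framework; the placeholder network and all values are illustrative, not prescribed by the text).

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder network

learning_rate = 1e-3     # optimizer learning rate (Adam chosen here)
batch_size = 64          # within the suggested [16, 128] range
num_epochs = 20          # number of full passes over the training set

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()
# Weight initialization and the activation function are fixed when the layers are
# built; a Dropout layer with the common default p=0.5 could be added for regularization.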
After each convolutional layer there is a nonlinear layer; generally we use the ReLU (Rectified Linear Unit) layer. Because the convolutional layer performs only linear operations (element-wise multiplications and summations), the nonlinear layer is what introduces nonlinear features. The role of the nonlinear activation function is to add nonlinear factors, improve the ability of the neural network to express a model, and solve problems that a linear model cannot solve (such as linearly inseparable classification problems). The ReLU layer works particularly well because it lets the network train much faster (due to its computational efficiency) without a significant change in accuracy.
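A minimal illustration of the ReLU nonlinearity applied element-wise to a feature map is given below (NumPy, for illustration only): negative values are set to zero and everything else passes through unchanged.

import numpy as np

feature_map = np.array([[-1.5,  0.3],
                        [ 2.0, -0.7]])
relu = np.maximum(feature_map, 0)    # ReLU(x) = max(0, x), applied element-wise
print(relu)                          # [[0.  0.3] [2.  0. ]]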

Mathematical Derivations

Input:  u^l = conv2(W^l, x^(l-1), 'valid') + b^l

Output:  x^l = f(u^l)

The above input and output formulas hold for each convolutional layer, and each convolutional layer has its own weight matrix; here W^l, x^(l-1), u^l, and b^l are in matrix form. For the last fully connected layer, taken to be layer L, the output y is in vector form, the expected output is d, and there is a total error formula.

Total error:  E = (1/2) * ||d - y||_2^2

conv2( ) is the convolution function in Matlab; its third parameter 'valid' indicates the type of convolution operation, and the convolution described above is of the 'valid' type. W^l is the convolution kernel matrix, x^(l-1) is the input matrix, b^l is the bias, and f is the activation function. In the total error, d and y are the vectors of expected output and network output, respectively. ||.||_2 denotes the vector 2-norm, given by ||v||_2 = sqrt(v_1^2 + v_2^2 + ... + v_n^2). The formulas for the input and output of the fully connected layer neurons are exactly the same as those of the BP network, with f again the activation function.
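The formulas above can be checked numerically with a short sketch; here SciPy's convolve2d plays the role of Matlab's conv2 with the 'valid' option, and a sigmoid stands in for the activation function f (the sizes and values are illustrative assumptions).

import numpy as np
from scipy.signal import convolve2d

x_prev = np.random.rand(6, 6)        # x^(l-1): input to the layer
W = np.random.rand(3, 3)             # W^l: convolution kernel matrix
b = 0.1                              # b^l: bias

u = convolve2d(x_prev, W, mode='valid') + b    # input:  u^l = conv2(W^l, x^(l-1), 'valid') + b^l
x = 1.0 / (1.0 + np.exp(-u))                   # output: x^l = f(u^l), with a sigmoid as f

d = np.random.rand(x.size)           # expected output, as a vector
y = x.ravel()                        # network output, as a vector
E = 0.5 * np.sum((d - y) ** 2)       # total error E = (1/2) * ||d - y||_2^2
print(u.shape, E)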
