Essentially, CNNs are feedforward neural networks, the simplest type of neural network. There are no connections between neurons in the same layer: each layer receives its input from the previous layer and provides its output to the next. The structure is multilayer and unidirectional; information is not transmitted in both directions within a layer, and layers do not communicate except through this forward path. With the exception of the input and output layers, all layers in the middle are hidden layers, and there may be one or more hidden layers. CNNs are commonly used for pattern classification because they do not require complicated preprocessing of the input data.
Convolutional neural networks are constructed by keeping the feature space broad and
shallow at the beginning of the network, then narrowing it and making it deeper as the network
progresses. Here are a few rules to help guide you as you build your CNN architecture:
1. Start with smaller filters to capture as much local information as possible, then gradually increase the filter width to reduce the generated feature space.
2. In principle, start with fewer channels, so that the network detects low-level features that are combined to form more complex shapes.
3. As the network deepens, the number of channels should be increased in order to increase the depth of the feature space. Another benefit of a deeper and narrower feature space is that it shrinks the feature space enough to fit into the dense (fully connected) layers.
4. When dealing with moderate or small-sized images, we generally use 3x3, 5x5, or 7x7 filter sizes for the convolutional layer. For max-pooling, we use 2x2 or 3x3 filter sizes with a stride of 2. A large image can be shrunk down to a moderate size by using larger filter sizes and strides.
5. When you believe the borders of the image might be important, or when you just wish to lengthen your network architecture, you can use padding = same: it keeps the dimensions unchanged after the convolution, thereby allowing you to use more convolutional layers.
6. As you add layers, the model becomes prone to over-fitting. Once sufficient accuracy is achieved on the validation set, regularization methods such as L1/L2 regularization, dropout, and batch normalization can be used.
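As a sketch of how these sizing rules play out numerically, the spatial output size of a convolution or pooling step follows (n + 2p - f)/s + 1. The pipeline below is illustrative, not taken from the text:

```python
def out_size(n, f, s=1, p=0):
    """Spatial output size for an n x n input, f x f filter, stride s, padding p."""
    return (n + 2 * p - f) // s + 1

# Illustrative stack: 32x32 input, 3x3 convs with padding=same, 2x2 pools with stride 2.
n = 32
n = out_size(n, f=3, s=1, p=1)   # 3x3 conv, padding=same -> stays 32
n = out_size(n, f=2, s=2)        # 2x2 max-pool, stride 2 -> 16
n = out_size(n, f=3, s=1, p=1)   # 3x3 conv, padding=same -> stays 16
n = out_size(n, f=2, s=2)        # 2x2 max-pool, stride 2 -> 8
print(n)  # -> 8
```

The feature space narrows spatially at each pooling step while the channel count (not tracked here) would typically grow, exactly as rules 1-3 suggest.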
In the CNN structure diagram above, the procedure described yields fifth-layer feature maps of depth 5; the five feature maps, which are five matrices, are then connected into a single vector by row expansion. The fully connected layer is made up of a BP (back-propagation) neural network. The feature map values can be interpreted as neurons arranged in a matrix, similar to the neurons in a BP neural network. The calculation process for the pooling and convolution layers is as described above.
Convolutional neural networks were created on the basis of biologically inspired visual cognition mechanisms. Currently, CNNs are a research hotspot in many scientific fields, especially image classification, because the network can take the image as direct input without complex pre-processing, which makes it widely applicable. CNNs can be used for image classification, target recognition, target detection, semantic segmentation, and more.
Convolution layer
The first thing to define is convolution: it generates a third function from two other functions f and g. A new output is obtained by combining the convolution kernel with the input, and the structure of this output is the activation map (also called a feature map). Suppose the input is an array of 32x32x3 pixel values, and a filter (sometimes called a neuron or kernel) acts on a receptive field. The filter is an array of numbers (its elements are called weights or parameters); the important thing is that the filter depth must match the depth of the input (so the math works out), so the filter size here is 5x5x3. The filter is slid from the upper left corner of the image to the right, one pixel at a time, and at each position its weights are multiplied element-wise by the corresponding pixel values in the original image (a dot product); the products are summed to give a single number. Sliding the 5x5 filter over the 32x32 input yields a 28x28 array of results (28 = 32 - 5 + 1).
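The sliding dot product described above can be sketched in plain Python. A single channel is used for brevity; a 5x5 filter on a 32x32 input gives 28x28 by the same logic:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D sliding dot product: move the kernel over the image one
    pixel at a time and sum the element-wise products at each position."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 6x6 image of ones with a 3x3 kernel of ones gives a 4x4 map (6 - 3 + 1 = 4).
img = [[1] * 6 for _ in range(6)]
ker = [[1] * 3 for _ in range(3)]
fmap = conv2d_valid(img, ker)
print(len(fmap), len(fmap[0]))  # -> 4 4
```

Each entry of `fmap` is 9 here, the sum of the nine products in one receptive field.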
The fully connected layer looks at the output of the previous layer (the activation maps of the higher-level features) and determines which features match each category most closely. By calculating the dot product of its weights with the previous layer's output, the fully connected layer produces the classification scores: features that are well matched to a category correspond to large weights. (That is to say, the fully connected layer functions as a classifier: the feature vector extracted by the upper layers is used to determine the weight of each category and its likelihood of being the output.) The fully connected layer also integrates each previously obtained feature map into a single value; a large value indicates that the feature we are looking for is present, independent of its position. The fully connected layer thus performs part of the feature-extraction role and part of the categorizing role.
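The classifier role described above reduces to a dot product of the flattened feature vector with one weight row per category. The feature values and weights below are made up for illustration:

```python
def fully_connected(features, weights, biases):
    """Score each category as the dot product of its weight row with the
    flattened feature vector, plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

features = [0.9, 0.1, 0.4]        # flattened feature map values (illustrative)
weights = [[1.0, 0.0, 0.5],       # weight row for category A (illustrative)
           [0.0, 1.0, 0.5]]       # weight row for category B (illustrative)
scores = fully_connected(features, weights, [0.0, 0.0])
best = max(range(len(scores)), key=lambda k: scores[k])
print(best)  # -> 0 (category A matches the features best)
```

Category A wins because its weight row lines up with the large first feature, which is exactly the "well matched features with specific weights" idea in the text.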
Pooling layer
What matters for an individual feature with a high activation value (the characteristic value of that feature) is its position relative to other features, rather than its absolute position. In this layer, the input volume is greatly reduced in size (the length and width change, but the depth does not). One benefit:
1. The number of values, and therefore of downstream weight parameters and computations, is reduced by 75% when using 2x2 pooling layers.
Common pooling operations:
1. Max pooling: a filter (typically 2x2) and a stride of the same length are applied to each sub-region of the input, and the maximum value of each sub-region becomes the output.
2. Average pooling.
3. L2-norm pooling.
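A minimal sketch of 2x2 max pooling with stride 2, which keeps the depth but quarters the number of spatial values (the 75% reduction mentioned above):

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the maximum of each 2x2 sub-region."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
pooled = max_pool_2x2(fmap)
print(pooled)  # -> [[6, 8], [3, 4]]  (16 values reduced to 4, i.e. by 75%)
```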
Dropout layer
Dropout prevents the network's weights from fitting the training samples so closely that new samples are handled poorly. There is nothing complicated about dropout: a dropout layer sets a random set of activations to 0 in the forward pass, which is equivalent to removing those activations from the layer. As a result, the network must be able to offer the appropriate classification or output for a given sample even when some activations are discarded. A mechanism like this prevents the neural network from fitting the training samples too tightly, alleviating overfitting. A second important aspect is that the dropout layer is used only during training, not at test time.
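The dropout forward pass described above can be sketched as follows. The scaling of the surviving activations follows the common "inverted dropout" convention, which is an assumption here, not something stated in the text:

```python
import random

def dropout_forward(activations, drop_prob, training=True):
    """During training, zero each activation with probability drop_prob and
    rescale the survivors (inverted dropout, assumed convention); at test
    time, pass activations through unchanged."""
    if not training:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
acts = [0.5, 1.2, -0.3, 0.8]
train_out = dropout_forward(acts, drop_prob=0.5)                 # some entries zeroed
test_out = dropout_forward(acts, drop_prob=0.5, training=False)  # unchanged
print(test_out)  # -> [0.5, 1.2, -0.3, 0.8]
```

The `training=False` branch is the point emphasized in the text: dropout is active only in training.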
CNNs are commonly used for image classification, for example to identify satellite images that contain roads or to classify handwritten letters and digits. CNNs are also highly effective at tasks such as image segmentation and signal processing. A CNN can be used for language understanding in Natural Language Processing (NLP) and for speech recognition; however, recurrent neural networks (RNNs) are more commonly employed for NLP tasks. There is also the possibility of implementing a CNN in a U-Net architecture, which is essentially comprised of two almost identical CNN halves. With U-Nets, we can achieve output sizes similar to that of the input.
Rather than being trained, hyperparameters are values set beforehand to improve the performance and effect of the optimization. To improve the performance of a learning machine, select a set of optimal hyperparameters. Here are the key hyperparameters:
● Learning rate of the optimization algorithm: This refers to the step size of the network's weight updates in the optimization algorithm. Depending on the algorithm chosen (SGD, Adam, Adagrad, AdaDelta, or RMSProp), the appropriate learning rate differs, and some of these methods adapt it automatically.
● Number of iterations:
A neural network is trained by presenting the entire training set to the network multiple times. As long as the difference between the test error rate and the training error rate remains small, it may be worthwhile to increase the number of iterations or to adjust the network structure.
● Batch size:
Convolutional neural networks learn more effectively with smaller batches, normally selected in the range [16, 128]. Changing the batch size can have a significant effect on training.
● Activation function:
Because activation functions are nonlinear, the model can in theory approximate any function. As a rule, CNN networks work well with the rectifier (ReLU) function. You also have the option of choosing other activation functions, such as Sigmoid and Tanh, depending on the task.
● Number of hidden units:
Too few units lead to under-fitting (the error on the training samples is extremely large); with too many, the amount of computation becomes large and training slows to the point where accuracy can no longer increase, and an appropriate regularization method should be adopted to prevent overfitting.
● Dropout: A common regularization method that alleviates overfitting, with a default rate of 0.5.
After the convolutional layer comes a nonlinear layer. Generally we use the ReLU (Rectified Linear Unit) layer: the convolutional layer performs only linear operations (element-wise multiplications and summations), and the nonlinear layer introduces nonlinear features. The role of the nonlinear activation function is to add nonlinear factors, improve the neural network's ability to express the model, and solve problems that a linear model cannot (such as linearly inseparable classification problems). The ReLU layer works much better in practice because it allows the network to train much faster without a significant loss in accuracy.
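ReLU itself is just an element-wise maximum with zero:

```python
def relu(values):
    """Rectified Linear Unit: negative activations become 0, positives pass through."""
    return [max(0.0, v) for v in values]

print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # -> [0.0, 0.0, 0.0, 1.5, 3.0]
```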
Mathematical Derivations
Input: net^l = conv2(X^{l-1}, W^l, 'valid') + b^l
Output: X^l = f(net^l)
The input and output formulas above hold for each convolutional layer l, and each convolutional layer has a different weight matrix; W^l, b^l, net^l, and X^l are in matrix form. For the last of the fully connected layers, denoted layer L, the output y^L is in vector form, and the expected output is the vector t.
Total error: E = (1/2) ||t - y^L||^2
conv2( ) is the convolution function in Matlab; its third parameter 'valid' indicates the type of convolution operation, and the convolution described above is of the 'valid' type. f is the activation function. In the total error E, t and y^L are the vectors of the expected output and the network output, respectively. The formula for the input and output of the fully connected layer neurons is exactly the same as in an ordinary BP neural network.
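The total error, half the squared Euclidean distance between the expected output vector and the network output vector, can be computed directly. The example vectors below are made up:

```python
def total_error(expected, output):
    """E = 1/2 * sum over components of (expected - output)^2."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(expected, output))

t = [1.0, 0.0, 0.0]   # expected (one-hot) output, illustrative
y = [0.8, 0.1, 0.1]   # network output, illustrative
print(total_error(t, y))  # ≈ 0.03
```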