
Convolutional Neural Networks (CNN)

Essentially, CNNs are feedforward neural networks, the simplest type of neural network. Neurons within the same layer are not connected to one another: each layer receives its input from the previous layer and passes its output on to the next, so information flows in only one direction through the multilayer structure and is never transmitted backwards between layers. With the exception of the input and output layers, all of the layers in between are hidden layers, and there may be one or more of them. CNNs are commonly used for pattern classification because they can take the original image directly as input without complicated preprocessing.

Principles/Conventions to build a CNN architecture

Convolutional neural networks are constructed by keeping the feature space broad and shallow at the beginning of the network, then narrowing it and making it deeper as the network progresses. Here are a few rules to follow, with this principle in mind, as you build your CNN architecture.

1. Always start with smaller filters to capture as much local information as possible, then gradually increase the filter width to reduce the generated feature space and to represent information at a more global, high-level, and representative scale.

2. In principle it is recommended to start with fewer channels, so that low-level features are detected first; these are then combined into more complex shapes (with a greater number of channels) that help distinguish between classes.


3. To help learn more levels of global abstract structure, the number of filters is increased, which increases the depth of the feature space. A further benefit of a deeper and narrower feature space is that it shrinks the feature space to a size that fits into the dense (fully connected) layers.

4. When dealing with moderate or small-sized images, we generally use 3x3, 5x5, and 7x7 filter sizes for the convolutional layers. For max pooling, we use 2x2 or 3x3 filter sizes with a stride of 2. A large image can first be shrunk down to a moderate size by using larger filters and strides, and then processed following the conventions above.

5. When you believe the borders of the image might be important, or when you simply wish to deepen your network architecture, you can use padding = same: it keeps the dimensions unchanged after the convolution, allowing you to use more convolutions without shrinking the feature maps.

6. As you add layers, the network will eventually start to over-fit. Once sufficient accuracy is achieved on the validation set, regularization methods such as L1/L2 penalties, dropout, batch normalization, and data augmentation can be used to reduce over-fitting. A minimal sketch that follows these conventions is given after this list.
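The following is a minimal sketch of an architecture built along these lines. PyTorch is an assumed framework choice (the text does not prescribe one), and the layer sizes are illustrative only: small 3x3 filters with padding = same, a channel count that grows with depth, 2x2 max pooling with stride 2, and dropout before the dense classifier.

import torch
import torch.nn as nn

model = nn.Sequential(
    # broad and shallow at first: few channels, 'same' padding keeps 32x32
    nn.Conv2d(3, 16, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 32x32 -> 16x16
    # deeper and narrower as we go: more channels, smaller spatial size
    nn.Conv2d(16, 32, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 16x16 -> 8x8
    nn.Conv2d(32, 64, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),             # 8x8 -> 4x4
    nn.Flatten(),
    nn.Dropout(0.5),                                   # regularization (rule 6)
    nn.Linear(64 * 4 * 4, 10),                         # dense classifier, 10 classes
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(model(x).shape)           # torch.Size([1, 10])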

Structure of Convolutional Neural Networks


Essentially, the CNN structure consists of the input image, the convolution step, and the resulting feature maps (the depth dimension). Suppose the procedure above produces a feature map of depth 5; the five feature maps, which are five matrices, are then concatenated into a single vector by expanding them row by row. This vector feeds the fully connected layer, which is a BP (back-propagation) neural network: the entries of the feature maps can be interpreted as neurons arranged in a matrix, similar to the neurons in a BP network. The calculations performed by the convolution and pooling steps are described in the sections below.

Biologically based visual cognition mechanisms are the basis for the creation of convolutional neural networks. CNNs are currently a research hotspot in many scientific fields, especially image classification, because the network can take the image directly as input without complex pre-processing, which makes it widely applicable. It can be used for image classification, target recognition, target detection, semantic segmentation, and so on.

Convolution layer

The first thing to do is define convolution: it is an operation that generates a third function from two functions f and g. In a CNN, a new output is obtained by combining the convolution kernel with the input, and the result is called an activation map (also called a feature map). If the input is an array of 32*32*3 pixel values, the filter (sometimes called a neuron or kernel) acts on a receptive field of the same depth. The filter is an array whose elements are called weights or parameters; the important point is that the filter depth must match the depth of the input so that the arithmetic works out, so the filter size here is 5 x 5 x 3. The filter is slid across the image starting from the upper left corner, one pixel at a time, and at each position the filter values are multiplied by the pixel values underneath them in the original image (a dot product). The products are added together to produce a single number. Since this filter can be placed at 28 x 28 positions, the results form a 28 x 28 array.
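The sliding-window dot product just described can be sketched directly in NumPy (an illustrative choice of library): a 5x5x3 filter slid over a 32x32x3 image yields a 28x28 feature map.

import numpy as np

image  = np.random.rand(32, 32, 3)   # input volume (height, width, depth)
kernel = np.random.rand(5, 5, 3)     # filter depth must match the input depth

out = np.zeros((28, 28))             # 32 - 5 + 1 = 28 valid positions per axis
for i in range(28):
    for j in range(28):
        patch = image[i:i+5, j:j+5, :]         # receptive field under the filter
        out[i, j] = np.sum(patch * kernel)     # element-wise product, then sum

print(out.shape)                     # (28, 28)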

Fully connected layer

The fully connected layer looks at the output of the previous layer (the activation maps of the higher-level features) and determines which features correspond most closely to each category. It does this by calculating the dot product of its weights with the previous layer's output, so that features which match a category well, combined with the appropriate weights, produce high scores for that category.

(In other words, the fully connected layer functions as a classifier. The feature vector extracted by the preceding layers is used, together with the weights for each category, to determine the likelihood of each category being output.) The fully connected layer also integrates each previously obtained feature map into a single value: a large value indicates that the feature being looked for is present, independent of its position. In this way, the earlier layers handle the extraction of features, while the fully connected layer handles the categorization.
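A small sketch of this classifier role follows, written in NumPy (an assumption; the sizes and names are illustrative): the flattened feature vector is multiplied by one weight row per category, and a softmax turns the scores into likelihoods.

import numpy as np

features = np.random.rand(64 * 4 * 4)              # flattened feature maps
W = np.random.rand(10, features.size)              # one weight row per category
b = np.zeros(10)

scores = W @ features + b                          # dot product per category
probs = np.exp(scores) / np.sum(np.exp(scores))    # softmax: likelihood per category
print(probs.argmax())                              # index of the predicted category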

Pooling layer

For an individual feature with a high activation value, what matters is its position relative to other features rather than its absolute position. The pooling layer exploits this: it greatly reduces the size of the input volume (the length and width change, but the depth does not).

Two main purposes:

1. The number of activations, and hence the weight parameters of the following layers, is reduced by 75% when a 2x2 pooling layer is used, which reduces computational cost.

2. Overfitting can be controlled.

There are several options for the pooling layer:

1. Max pooling: a filter (typically 2x2) and a stride of the same length are used; the filter is applied to each sub-area of the input, and the maximum value in that area becomes the output (a small sketch follows this list).

2. Average pooling.

3. L2-norm pooling.
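Here is a brief sketch of 2x2 max pooling with stride 2, written in NumPy for illustration: each output value keeps only the largest activation in its 2x2 window, so the spatial size is halved in each direction and 75% of the values are discarded.

import numpy as np

fmap = np.random.rand(28, 28)                           # one feature map
pooled = fmap.reshape(14, 2, 14, 2).max(axis=(1, 3))    # max over each 2x2 window
print(fmap.size, '->', pooled.size)                     # 784 -> 196, a 75% reduction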

Dropout layer

This layer helps prevent the network's weights from being fitted so closely to the training samples that new samples are handled poorly. There is nothing complicated about Dropout: a Dropout layer sets a random subset of activations to 0 in the forward pass, which is equivalent to removing a random set of activations from the layer. As a result, even with some activations discarded, the network must still be able to produce the appropriate classification or output for the given sample. A mechanism such as this prevents the neural network from fitting the training samples too tightly, alleviating overfitting. A second important point is that the Dropout layer is used only during training, not during testing.
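The training-only behaviour can be seen in the short sketch below, using PyTorch's Dropout module as an assumed example: activations are zeroed at random in training mode and passed through unchanged in evaluation (test) mode.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the activations are zeroed (survivors are rescaled)

drop.eval()
print(drop(x))   # identity: dropout is disabled at test time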

Common uses for CNNs

CNNs are commonly used for image classification, for example to identify satellite images that contain roads or to classify handwritten letters and digits. They are also highly effective at image segmentation and signal processing. A CNN can be used for language understanding in Natural Language Processing (NLP) and for speech recognition; however, recurrent neural networks (RNNs) are more commonly employed for NLP tasks. A CNN can also be implemented in a U-Net architecture, which is essentially a network composed of two almost identical CNN halves. With U-Nets, the output can have a similar size to the input, which is useful for example in image segmentation and resizing.

Supplement: the concept of hyperparameters

Unlike the parameters learned during training, hyperparameters are values that are set beforehand in machine learning. The hyperparameters of a learning machine must be tuned in order to improve its performance; the aim is to select a set of optimal hyperparameters. The key hyperparameters are the following:

● Optimization algorithm learning rate: this controls how strongly the network weights are updated by the optimization algorithm. The appropriate learning rate depends on the type of optimization algorithm employed, such as SGD, Adam, Adagrad, AdaDelta, or RMSProp.

● Number of iterations:
A neural network is trained by passing the entire training set through the network multiple times. If the difference between the test error rate and the training error rate is small, it may be worth increasing the number of iterations or adjusting the network structure; if the gap is large, continuing to train will mainly increase overfitting, so the number of iterations or the network should be adjusted instead.

● Batch size:
Convolutional neural networks learn more effectively when smaller batches are used, and the batch size is normally selected in the range [16, 128]. Changing the batch size can have a significant impact on the CNN.

● Activation function:
Because activation functions introduce nonlinearity, the model can in theory approximate any function. As a rule, CNNs work well with rectified linear (ReLU) activations. Other activation functions, such as Sigmoid and Tanh, can also be chosen depending on the task at hand.

● The number of hidden layers:
Too many hidden layers lead to a large amount of computation and slow training, and beyond a certain depth the training accuracy no longer increases.

● The number of hidden layer units:
Too few units lead to under-fitting (the error on the training samples is extremely large); with too many, an appropriate regularization method should be adopted to prevent overfitting.

● Weight initialization: uniform distribution, normal distribution.

● Dropout method: a common regularization method that reduces overfitting, with a typical default rate of 0.5. (A brief configuration sketch follows this list.)
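As a rough illustration of how these hyperparameters appear in practice, the sketch below sets them up in PyTorch (an assumed framework; the placeholder network and all values are illustrative, not prescribed by the text).

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder network

learning_rate = 1e-3     # optimizer learning rate (Adam chosen here)
batch_size = 64          # within the suggested [16, 128] range
num_epochs = 20          # number of full passes over the training set

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()
# Weight initialization and the activation function are fixed when the layers are
# built; a Dropout layer with the common default p=0.5 could be added for regularization.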
After each convolutional layer there is a nonlinear layer; generally we use the ReLU (Rectified Linear Unit) layer. Because the convolutional layer performs only linear operations (element-wise multiplications and summations), the nonlinear layer is what introduces nonlinear features. The role of the nonlinear activation function is to add nonlinear factors, improve the ability of the neural network to express a model, and solve problems that a linear model cannot solve (such as linearly inseparable classification problems). The ReLU layer works particularly well because it lets the network train much faster (due to its computational efficiency) without a significant change in accuracy.
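A minimal illustration of the ReLU nonlinearity applied element-wise to a feature map is given below (NumPy, for illustration only): negative values are set to zero and everything else passes through unchanged.

import numpy as np

feature_map = np.array([[-1.5,  0.3],
                        [ 2.0, -0.7]])
relu = np.maximum(feature_map, 0)    # ReLU(x) = max(0, x), applied element-wise
print(relu)                          # [[0.  0.3] [2.  0. ]]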

Mathematical Derivations

Input:  u^l = conv2(W^l, x^(l-1), 'valid') + b^l

Output:  x^l = f(u^l)

The above input and output formulas hold for each convolutional layer, and each convolutional layer has its own weight matrix; here W^l, x^(l-1), u^l, and b^l are in matrix form. For the last fully connected layer, taken to be layer L, the output y is in vector form, the expected output is d, and there is a total error formula.

Total error:  E = (1/2) * ||d - y||_2^2

conv2( ) is the convolution function in Matlab; its third parameter 'valid' indicates the type of convolution operation, and the convolution described above is of the 'valid' type. W^l is the convolution kernel matrix, x^(l-1) is the input matrix, b^l is the bias, and f is the activation function. In the total error, d and y are the vectors of expected output and network output, respectively. ||.||_2 denotes the vector 2-norm, given by ||v||_2 = sqrt(v_1^2 + v_2^2 + ... + v_n^2). The formulas for the input and output of the fully connected layer neurons are exactly the same as those of the BP network, with f again the activation function.
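The formulas above can be checked numerically with a short sketch; here SciPy's convolve2d plays the role of Matlab's conv2 with the 'valid' option, and a sigmoid stands in for the activation function f (the sizes and values are illustrative assumptions).

import numpy as np
from scipy.signal import convolve2d

x_prev = np.random.rand(6, 6)        # x^(l-1): input to the layer
W = np.random.rand(3, 3)             # W^l: convolution kernel matrix
b = 0.1                              # b^l: bias

u = convolve2d(x_prev, W, mode='valid') + b    # input:  u^l = conv2(W^l, x^(l-1), 'valid') + b^l
x = 1.0 / (1.0 + np.exp(-u))                   # output: x^l = f(u^l), with a sigmoid as f

d = np.random.rand(x.size)           # expected output, as a vector
y = x.ravel()                        # network output, as a vector
E = 0.5 * np.sum((d - y) ** 2)       # total error E = (1/2) * ||d - y||_2^2
print(u.shape, E)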
