
2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI)

Research and Discussion on Image Recognition and Classification Algorithm Based on Deep Learning

Dong Yu-nan, Liang Guang-sheng
School of Electrical and Electronic Engineering, North China Electric Power University
Beijing, China
2546262573@qq.com

Abstract—Machine learning, as the core of artificial intelligence, is the fundamental reason for computer intelligence and has been widely used in the field of artificial intelligence. With the improvement of computers' ability to process data, deep learning has come to the fore within machine learning, and more and more researchers have joined its theoretical and applied research; image recognition and classification are among its most important applications. This paper first compares deep learning with traditional machine learning methods, then introduces the development of deep learning, studies and analyses deep learning network structures such as the deep belief network, the convolutional neural network and the recursive neural network, and expounds the application of deep learning to image recognition and classification. The problems encountered in applying deep learning to image recognition and classification are discussed together with the corresponding solutions. Finally, the research status of deep learning in image recognition and classification is summarized and its prospects are considered.

Keywords—artificial intelligence; deep learning; convolutional neural network; image recognition and classification
I. INTRODUCTION

Deep learning is a subclass of machine learning. Compared with traditional machine learning, deep learning is better suited to big data processing: algorithm performance grows as the volume of data grows. Unlike traditional machine learning, deep learning does not rely on hand-crafted application features. Instead, it attempts to obtain higher-level features directly from the data, building a deep machine learning model through multiple transformations of those features [1].

Deep learning has experienced decades of development since its beginning. From the proposal of the MP model and the Hebb learning rule to the perceptron of the American scientist Rosenblatt, the perceptron was of milestone significance for the development of neural networks. However, because the single-layer perceptron cannot solve nonlinear classification problems, the development of artificial neural networks fell into a trough. This situation lasted until 1986, when the back propagation (BP) algorithm, suitable for training multilayer perceptrons, was put forward; it solved the nonlinear problem, led artificial neural networks out of their plight, and brought them renewed attention. Given the limited computing power of the time, however, many models could not be trained, and it was not until 2006 that the deep learning approach of "training layer by layer with unsupervised learning, then fine-tuning with the supervised back propagation algorithm" was proposed, bringing deep learning into a stage of explosive growth. In 2016, AlphaGo beat Lee Se-dol 4-1, causing a worldwide sensation, and in 2017 AlphaGo Zero, an upgraded version of AlphaGo based on reinforcement learning algorithms, was launched.

II. TYPICAL DEEP LEARNING NETWORK STRUCTURE

A. Deep Belief Network

Geoffrey Hinton proposed the deep belief network in 2006, and it has become one of the mainstream frameworks of deep learning. By taking the hidden layer of one restricted Boltzmann machine as the visible layer of the next, several restricted Boltzmann machines are stacked to form a deep belief network [2-3]. The model of the restricted Boltzmann machine is shown in Fig.1.

Figure 1. Restricted Boltzmann machine
The restricted Boltzmann machine (RBM) has two layers: the lower layer is the visible layer and the upper layer is the hidden layer. The connections between the two layers are bi-directional; there are no connections between neurons in the same layer, and every neuron in one layer is connected to every neuron in the adjacent layer. Training an RBM essentially means finding the probability distribution under which the training samples are most probable, and the training yields the best weights. After pre-training, the RBMs are stacked in turn: once RBM1 is trained, its hidden layer is used as the visible layer of RBM2 to train RBM2; once RBM2 is trained, its hidden layer is used as the visible layer of RBM3 to train RBM3. Back propagation through conditional probabilities then updates the weights and biases of the neurons and performs global fine-tuning, forming a deep belief network that learns through probabilities, as shown in Fig.2.

Figure 2. Deep belief network
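To make the stacking procedure concrete, the following minimal NumPy sketch pre-trains a stack of RBMs greedily with one-step contrastive divergence (CD-1). The layer sizes, learning rate, epoch count and random data are illustrative assumptions, not settings from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine trained with one-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-layer bias
        self.b_h = np.zeros(n_hidden)    # hidden-layer bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step to reconstruct the visible layer.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # CD-1 update: difference between data and reconstruction statistics.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Greedy layer-wise pre-training: the hidden layer of each trained RBM
# becomes the visible layer of the next, forming a deep belief network.
data = rng.random((64, 784))          # stand-in for 64 flattened images
layer_sizes = [784, 256, 64]
rbms, x = [], data
for n_v, n_h in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_v, n_h)
    for _ in range(10):
        rbm.train_step(x)
    x = rbm.hidden_probs(x)           # feed activations up to the next RBM
    rbms.append(rbm)
```

After this pre-training, the whole stack would be fine-tuned globally with back propagation, as described above.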
Deep belief networks are widely used in image recognition and classification because they are highly flexible and easy to extend.

B. Convolutional Neural Network

The convolutional neural network is a feedforward neural network, built around the convolution operation and possessing a deep structure. It is one of the most important deep learning algorithms [4-6].

Convolutional neural networks are a hot research topic in the field of image recognition and classification. Local receptive fields, weight sharing and pooling layers are their key ideas. In image recognition and processing, feature extraction is performed by the convolution layers, and the number of weights is reduced through the weight-sharing and pooling structure, which lowers the complexity of image feature extraction and data reconstruction. When processing high-resolution images, a fully connected neural network must handle a huge number of parameters, because every neuron in one layer can connect to every neuron in the next. The convolutional neural network instead adopts sparse connections and weight sharing, with the "convolution kernel" as the intermediary, greatly reducing the number of parameters between the image input layer and the hidden layers [7-9].

The basic structure of a convolutional neural network includes the convolution layer, the pooling layer and the fully connected layer. Convolution and pooling layers form stacked units that extract features layer by layer, and the final fully connected layer completes the image classification. During classification, the learned convolution kernels act as filters over each small region of the image to obtain that region's feature values. Because of the limited size of the convolution kernel, the feature map after convolution is still very large, so the image must be pooled, that is, down-sampled, to reduce the data dimension. The training process of a convolutional neural network is shown in Fig.3.

Figure 3. Training process of convolutional neural network

Training is divided into a forward propagation stage and a back propagation stage. First, the weights of the network are initialized, and the input image data are convolved with the convolution kernels over local receptive fields. Then, through the convolution operation and the activation function, the output of the convolution layer is finally obtained.
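As a concrete illustration of this structure, here is a minimal PyTorch sketch of a convolution-pooling-fully-connected network. The channel counts, the 28x28 single-channel input and the ten output classes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    """Minimal conv -> pool -> conv -> pool -> fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # shared 3x3 kernels
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)                              # down-sample by 2
        self.fc = nn.Linear(16 * 7 * 7, num_classes)             # final classifier

    def forward(self, x):                     # x: (batch, 1, 28, 28)
        x = self.pool(F.relu(self.conv1(x)))  # -> (batch, 8, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # -> (batch, 16, 7, 7)
        x = x.flatten(1)                      # flatten features for the FC layer
        return self.fc(x)

logits = SimpleCNN()(torch.randn(4, 1, 28, 28))  # forward propagation stage
print(logits.shape)                              # torch.Size([4, 10])
```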

The output of the convolution layer enters the pooling layer, which performs the down-sampling operation to reduce the data dimension and avoid overfitting. Common pooling methods include maximum pooling, mean pooling and random pooling, as shown in Fig.4. The extracted feature data are then passed into the fully connected layer and classified to obtain the final result.

Figure 4. Schematic diagram of the pooling process

If the output of the convolutional neural network is inconsistent with our expectation, back propagation training is carried out: the error between the output and the expected result is calculated, the error is propagated back layer by layer, the error of each layer is computed, and the weights of each layer are updated.
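The effect of the two most common pooling methods can be shown directly. This short sketch, an illustration rather than code from the paper, applies PyTorch's built-in maximum and mean pooling to a toy 4x4 feature map, halving each spatial dimension (random pooling has no standard built-in and is omitted).

```python
import torch
import torch.nn.functional as F

fmap = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # one 4x4 feature map

max_pooled = F.max_pool2d(fmap, kernel_size=2)   # keeps the largest value per 2x2 window
avg_pooled = F.avg_pool2d(fmap, kernel_size=2)   # averages each 2x2 window

print(fmap.squeeze())        # 4x4 input
print(max_pooled.squeeze())  # 2x2: [[5., 7.], [13., 15.]]
print(avg_pooled.squeeze())  # 2x2: [[2.5, 4.5], [10.5, 12.5]]
```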
The convolutional neural network reduces the number of weights and the complexity of the parameters, making the network easy to optimize. At the same time, the complexity of the model is reduced, and with it the risk of over-fitting. This is of great significance for the recognition and classification of images: convolutional neural networks adapt well to the structure of images while keeping the network structure simple and adaptive [10-12].

C. Recurrent Neural Network

With a tree-like hierarchical structure, the recursive neural network is an artificial neural network whose nodes recursively process a whole input sequence according to their connection order; it is one of the algorithms of deep learning [4]. Its network structure is shown in Fig.5. On top of the traditional input, output and hidden layers, the hidden layer has an additional closed loop: the input of the hidden layer can be understood as a summary of the current input and the past memory. Recursive neural networks divide into temporal recursive neural networks and structural recursive neural networks. The neurons of a temporal recursive network are connected to form a directed graph, while a structural recursive network recurs through similar neural network structures to construct a more complex network. Recursive neural networks can be trained with both supervised and unsupervised learning: during supervised learning, the weights are updated with the back propagation algorithm, while the unsupervised variant is used for feature learning of structural information [13]. The major difference between a recurrent neural network and a feedforward neural network is that the recurrent network has a certain "memory". This has the advantage that it can process input data with clear contextual relations and can handle inputs of varying length; however, it requires many training parameters and lacks feature learning ability.

Figure 5. Structure of recurrent neural network
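The "memory" of the temporal variant can be seen in the recurrence h_t = tanh(W_ih x_t + W_hh h_(t-1) + b): every hidden state combines the current input with the previous hidden state. A minimal PyTorch sketch follows, with illustrative sizes.

```python
import torch
import torch.nn as nn

# Temporal recurrent network: the hidden state carries the past "memory".
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)    # 4 sequences, 10 time steps, 8 features each
h0 = torch.zeros(1, 4, 16)   # initial memory (one layer, batch of 4)
out, hn = rnn(x, h0)         # out holds the hidden state at every time step
print(out.shape, hn.shape)   # torch.Size([4, 10, 16]) torch.Size([1, 4, 16])
```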
III. APPLICATION OF DEEP LEARNING IN IMAGE RECOGNITION AND CLASSIFICATION

Deep learning is widely applied in the field of image recognition and classification; indeed, the convolutional neural network was originally proposed to solve the problem of image recognition and classification [14]. As the pioneering work on convolutional neural networks, LeNet [15] was put forward in the 1990s. It established the basic convolution layer, pooling layer and fully connected layer structure, and laid the foundation of the convolutional neural network.

With the development of the ImageNet project, deep learning has greatly improved the accuracy of image recognition and classification. In the 2012 ImageNet large-scale visual recognition challenge, Hinton and his student Alex Krizhevsky at the University of Toronto proposed AlexNet [16] and won that year's competition. AlexNet is a deeper and wider development of LeNet. ReLU is used instead of Sigmoid as the activation function, which not only accelerates training but also alleviates the gradient dispersion problem when the network is deep. At the same time, Dropout is introduced to reduce the co-adaptation of neurons and, to a certain extent, avoid the over-fitting caused by training [17]. However, due to the limited depth of the model, its ability to describe and extract image features is still insufficient.

GoogLeNet, the champion network of ILSVRC 2014, is a brand-new deep learning structure proposed by Christian Szegedy. Compared with previous structures, it is notable for introducing the Inception module, which uses resources more efficiently to improve the training effect [18]. The Inception module now includes Inception v1, Inception v2, Inception v3, Inception v4 and Inception-ResNet [19-22]. The Inception module enables a convolutional neural network to perform multiple convolution and pooling operations on the input images in parallel, avoiding the need to stack ever more convolutional layers and make the network ever deeper for better performance. Each release of the Inception module is an evolution of the previous one, improving image classification accuracy while improving parameter utilization. However, as the model grows deeper, the gradient dispersion problem becomes more and more serious and the network becomes harder to train.
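A simplified sketch of the parallel-branch idea is given below. Real Inception modules also insert 1x1 "bottleneck" convolutions before the larger kernels, so the branch widths here are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1, 3x3, 5x5 convolutions and 3x3 max pooling,
    concatenated along the channel dimension (Inception-v1 style, simplified)."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b2 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.b3 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # Each branch sees the same input; the results are concatenated.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

y = InceptionBlock(32)(torch.randn(1, 32, 28, 28))
print(y.shape)  # torch.Size([1, 64, 28, 28]) - spatial size preserved
```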

In addition to the above models, there are also VGGNet [23], ResNet [24], SENet and others, all of which have achieved very good results in the ILSVRC competition. At the same time, with the progress of convolutional neural networks, the scale and complexity of the images they can process have increased significantly, and their application fields have broadened to medical image analysis, food recognition, behavior recognition, target detection and so on.

IV. PROBLEMS AND SOLUTIONS

With the extensive application of deep-learning-based image recognition and classification in computer vision, both the amount and the complexity of the data to be processed keep increasing, and many problems have emerged that are worth consideration and discussion.

A. Falling into Local Optima

When applying multi-layer neural networks, it is easy to fall into local optima from several points of view, for example when doing gradient descent, as shown in Fig.6.

Figure 6. Schematic diagram of the gradient descent method

Obviously, the minimum of the function in Fig.6 is at D, while B, F and H are all local minima. If we never reach D and stop as soon as we find B, we are stuck in a local optimum. In multi-dimensional spaces there is also the saddle point problem in addition to local optima. When limited data are used to train deep networks, a network trapped in a local optimum often performs worse than a shallow one. To avoid falling into local optima, the weights can be re-initialized and different initial weights used for training; a momentum term can be set so that the optimizer can cross some local optima; and stochastic gradient descent can be used instead of true gradient descent, or simulated annealing can be applied. These methods only reduce the possibility of falling into local optima to some extent; the local optimization problem still limits the development of deep structures and requires us to keep exploring better methods.
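The momentum remedy mentioned above can be sketched in a few lines. The toy one-dimensional loss below is an illustrative assumption, not a function from the paper; it has several shallow dips, and the momentum term helps the iterate roll through them rather than stopping at the first one.

```python
import torch

def loss_fn(x):
    # Toy non-convex loss: a parabola with superimposed ripples,
    # producing several local minima along the way.
    return x**2 + 2.0 * torch.sin(3.0 * x)

x = torch.tensor([2.5], requires_grad=True)
opt = torch.optim.SGD([x], lr=0.05, momentum=0.9)  # momentum accumulates past gradients

for _ in range(200):
    opt.zero_grad()
    loss_fn(x).backward()
    opt.step()
print(x.item())  # typically ends in one of the lower dips; depends on lr and momentum
```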
B. Gradient Disappearance Problem

The fundamental cause of gradient disappearance is an improper choice of activation function. Deep networks have many layers, and with back propagation the chain rule multiplies derivatives layer by layer. When an activation such as the sigmoid function is used, the error decays as it propagates back from the output layer, shrinking with every layer it crosses, so the lower layers receive almost no effective training, which impairs the normal working of the network.

Good solutions to the gradient disappearance problem already exist. For CNN networks, the ReLU function effectively avoids it; for RNN networks, the LSTM likewise overcomes gradient vanishing.
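The decay can be demonstrated numerically. In this sketch (illustrative, with an arbitrary depth of 20) the gradient through a chain of sigmoids shrinks by a factor of at most 0.25 per layer, while the ReLU chain passes it through unchanged on the active path.

```python
import torch

depth = 20

# Gradient through a chain of sigmoid layers: each layer multiplies the
# back-propagated error by sigmoid'(z) <= 0.25, so it decays geometrically.
z = torch.ones(1, requires_grad=True)
out = z
for _ in range(depth):
    out = torch.sigmoid(out)
out.backward()
print(z.grad)   # on the order of 1e-13 for depth 20

# The same chain built from ReLU keeps the gradient at 1 on the active path.
z = torch.ones(1, requires_grad=True)
out = z
for _ in range(depth):
    out = torch.relu(out)
out.backward()
print(z.grad)   # tensor([1.])
```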
C. Overfitting Problem

Overfitting means that the trained deep learning model performs very well on the training samples but poorly on the test samples. It arises when the number of training samples is too small for the complexity of the model, so that the model is insufficiently trained; when the characteristics of the training and test samples differ; or when the training samples are too noisy or are trained on too many times, so that the model memorizes the noise and cannot correctly map inputs to outputs on the test data. In view of these problems, we can match the number of training samples to the complexity of the model, reducing the model's complexity or adjusting the loss function, batch size and so on until they match. If the training samples are too noisy, preprocessing can reduce the noise. If over-fitting occurs because the number of training samples is small, the number of samples can be increased by rotating, scaling, cropping and otherwise transforming the image samples, thereby reducing the occurrence of over-fitting.
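The rotate/scale/crop enlargement of a small training set is a standard augmentation pipeline. The sketch below uses torchvision; the specific transforms and their parameters are illustrative assumptions.

```python
from torchvision import transforms

# Each epoch draws a new randomly rotated, scaled, cropped and flipped
# variant of every image, enlarging the effective training set.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random scale + crop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Typically passed as the `transform` argument of a torchvision dataset,
# e.g. datasets.ImageFolder(root, transform=augment).
```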

V. SUMMARY AND PROSPECT

At present, image classification algorithms based on deep learning, especially convolutional neural network algorithms, achieve very good results on simple image classification tasks. For complex images, however, especially those subject to variation (in face recognition, for example, a face changes constantly with time, lighting and angle), how to recognize complex images better, faster and more accurately has become the focus of current research. In feature learning, training on labeled data still dominates, but as the amount of data grows it becomes more and more unrealistic to label all of it; labeling data automatically, or training networks on unlabeled data, therefore becomes increasingly urgent. On the other hand, in image recognition and classification, how to accelerate training while maintaining recognition accuracy is also a hot research topic. Some training parameters of convolutional neural networks are currently set by human experience or experiment, with no systematic tuning procedure. It is hoped that in the future there will be methods for adjusting the relevant parameters automatically, so that networks can truly achieve global optimization and the time required for training can be reduced. On this basis, a more comprehensive and in-depth study of image classification algorithms based on deep learning is necessary.

ACKNOWLEDGMENT

This paper was completed under the kind care and careful guidance of Mr. Liang, from topic selection through the research itself. His serious scientific attitude, rigorous academic spirit and excelsior work style deeply inspired me. I would like to extend my sincere thanks and high respect to Mr. Liang.

REFERENCES
[1] Goodfellow, I., Bengio, Y., Courville, A. Deep Learning (Vol. 1). Cambridge: MIT Press, 2016: 326-366.
[2] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, L., Wang, G., Cai, J. Recent advances in convolutional neural networks. arXiv preprint arXiv:1512.07108, 2015.
[3] Cheng, J., Wang, P., Li, G., Hu, Q., Lu, H. Recent advances in efficient computation of deep convolutional neural networks. Frontiers of Information Technology & Electronic Engineering, 2018, 19(01): 64-77.
[4] He, K., Zhang, X., Ren, S., et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[5] Hinton, G. E. Learning multiple layers of representation. Trends in Cognitive Sciences, 2007(10).
[6] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems, 2012.
[7] Zagoruyko, S., Komodakis, N. Learning to compare image patches via convolutional neural networks. Computer Vision and Pattern Recognition, 2015.
[8] Li, P., Liu, Y., Sun, M. Recursive autoencoders for ITG-based translation. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013: 567-577.
[9] Simonyan, K., Zisserman, A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations, 2015.
[10] He, K., Zhang, X., Ren, S., et al. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2016: 770-778.
[11] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems, Curran Associates Inc., 2012: 1097-1105.
[12] Szegedy, C., Liu, W., Jia, Y., et al. Going deeper with convolutions. 2014: 1-9.
[13] Szegedy, C., Liu, W., Jia, Y., et al. Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2015: 1-9.
[14] Ioffe, S., Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. 2015: 448-456.
[15] Szegedy, C., Vanhoucke, V., Ioffe, S., et al. Rethinking the Inception architecture for computer vision. IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2016: 2818-2826.
[16] Szegedy, C., Ioffe, S., Vanhoucke, V., et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. 2016.
[17] Simonyan, K., Zisserman, A. Very deep convolutional networks for large-scale image recognition. Computer Science, 2014.
[18] He, K., Zhang, X., Ren, S., et al. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2016: 770-778.
[19] Hu, J., Shen, L., Sun, G. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 2017.
[20] Flitton, G., Breckon, T. P., Megherbi, N. A comparison of 3D interest point descriptors with application to airport baggage object detection in complex CT imagery. Pattern Recognition, 2013(9).
[21] Schmidhuber, J. Deep learning in neural networks: an overview. Neural Networks, 2014.
[22] LeCun, Y., Kavukcuoglu, K., Farabet, C. Convolutional networks and applications in vision. IEEE International Symposium on Circuits and Systems, 2010.
[23] Smirnov, E. A., Timoshenko, D. M., Andrianov, S. N. Comparison of regularization methods for ImageNet classification with deep convolutional neural networks. AASRI Procedia, 2014.
[24] Sun, Y., Wang, X., Tang, X. Deep learning face representation from predicting 10,000 classes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
