Alvaro David Orjuela-Cañón
Juan Carlos Figueroa-García
Julián David Arias-Londoño (Eds.)
Applications of
Computational Intelligence
First IEEE Colombian Conference, ColCACI 2018
Medellín, Colombia, May 16–18, 2018
Revised Selected Papers
Communications in Computer and Information Science 833
Commenced Publication in 2007
Founding and Former Series Editors:
Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu,
Dominik Ślęzak, and Xiaokang Yang
Editorial Board
Simone Diniz Junqueira Barbosa
Pontifical Catholic University of Rio de Janeiro (PUC-Rio),
Rio de Janeiro, Brazil
Joaquim Filipe
Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh
Indian Statistical Institute, Kolkata, India
Igor Kotenko
St. Petersburg Institute for Informatics and Automation of the Russian
Academy of Sciences, St. Petersburg, Russia
Krishna M. Sivalingam
Indian Institute of Technology Madras, Chennai, India
Takashi Washio
Osaka University, Osaka, Japan
Junsong Yuan
University at Buffalo, The State University of New York, Buffalo, USA
Lizhu Zhou
Tsinghua University, Beijing, China
More information about this series at http://www.springer.com/series/7899
Editors
Alvaro David Orjuela-Cañón, Universidad Antonio Nariño, Bogotá, Colombia
Juan Carlos Figueroa-García, Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Julián David Arias-Londoño, Universidad de Antioquia, Medellín, Colombia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Organization
General Chair
Alvaro David Orjuela-Cañón, Universidad Antonio Nariño, Colombia

Technical Co-chairs
Julián David Arias-Londoño, Universidad de Antioquia, Colombia
Juan Carlos Figueroa-García, Universidad Distrital Francisco José de Caldas, Colombia

Publication Chairs
Diana Briceño, Universidad Distrital Francisco José de Caldas, Colombia
Alvaro David Orjuela-Cañón, Universidad Antonio Nariño, Colombia

Financial Chair
José David Cely, Universidad Distrital Francisco José de Caldas, Colombia

Webmaster
Fabian Martinez, IEEE Colombia, Colombia
Diego Ruiz, Universidad Central, Colombia

Program Committee
Alvaro David Orjuela-Cañón, Universidad Antonio Nariño, Colombia
Julián David Arias-Londoño, Universidad de Antioquia, Colombia
Juan Carlos Figueroa-García, Universidad Distrital Francisco José de Caldas, Colombia
Jose Manoel de Seixas, Universidade Federal do Rio de Janeiro, Brazil
Danton Ferreira, Universidade Federal de Lavras, Brazil
Escuela Superior Politecnica del Litoral (ESPOL), Campus Gustavo Galindo Velasco,
09-01-5863, Guayaquil, Ecuador
{jefehern,agabad}@espol.edu.ec
1 Introduction
A restricted Boltzmann machine (RBM) is a generative neural-network model
used to represent the distribution of random observations and to extract features
of those observations in an unsupervised setting. In an RBM, the independence
structure between its variables is modeled through the use of latent variables (see
Fig. 1). Restricted Boltzmann machines were introduced in [17], although it was
not until Hinton proposed contrastive divergence (CD) as a training technique [9]
that their true potential was unveiled.
Restricted Boltzmann machines have proven powerful enough to be effective
in diverse settings. Applications of RBM include: collaborative filtering [14],
acoustic modeling [4], human-motion modeling [19,20], and music generation [2].
Restricted Boltzmann machine variations have gained popularity over the
past few years. Two variations relevant to this work are: the RNN-RBM [4] and
the conditional RBM [19]. The RNN-RBM estimates the density of multivariate
time series data by pre-training an RBM to extract features and then training a
c Springer Nature Switzerland AG 2018
A. D. Orjuela-Cañón et al. (Eds.): ColCACI 2018, CCIS 833, pp. 3–13, 2018.
https://doi.org/10.1007/978-3-030-03023-0_1
4 J. Hernandez and A. G. Abad
[Fig. 1. An RBM: visible units v1–v4 connected to hidden units h1–h3.]
recurrent neural network (RNN) to make predictions; this allows the parameters
of the RBM to be kept constant, serving as a prior for the data distribution, while
the biases are modified by the RNN to convey temporal information. Conditional
RBMs, on the other hand, estimate the density of multivariate time series data
by connecting past and present units to hidden variables. However, this makes
the conditional RBM unsuitable for traditional CD training [12].
In this work, we focus on the use of a modified RBM model that does not
keep its parameters constant in each time step (unlike the RNN-RBM); and
that adds hidden units for past interactions and lacks connections between past
and future visible units (unlike the conditional RBM). Our model is advanta-
geous because of two factors: (1) its structure allows the modeling of temporal
and spatial structures within a sequence through the extraction of meaningful
features that can be used for prediction and classification tasks; and (2) its topol-
ogy can be easily changed or learned because it is controlled by a single set of
parameters, allowing for many models to be readily tested and compared. We
show the performance of our model by applying it to the problem of Human
Action Recognition (HAR), i.e. classifying human actions in video sequences.
The present work is an extended version of our paper [8].
The rest of this work is organized as follows. Section 2 presents a review
of RBMs, describing their energy function and their training method through CD.
Section 3 introduces our proposed model, called the p-RBM, which can be viewed
as an ensemble of RBMs with the property of recalling past interactions. In
Sect. 4 we apply our model to recognize human actions in video sequences. Con-
clusions and future research directions are provided in Sect. 5.
matrix W ∈ Rn×m represents the interaction between visible and hidden units;
a ∈ Rn and b ∈ Rm are the biases for the visible and hidden units, respectively;
and Z is the partition function defined by

Z(a, b, W) = \sum_{v,h} e^{-E(v,h)}.    (3)
[Figure: the p-RBM, with visible and hidden layers at time steps t, t−1, t−2.]
A_{ij} = \begin{cases} \alpha^{|i-j|}, & i, j \le p \\ 1, & \text{elsewhere,} \end{cases}    (8)
For α = 1 the model becomes fully connected, with all connections having the
same significance. For α = 0 the model is completely disconnected. Thus, the
matrix A exerts some control over the topology of the model.
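The topology rule of Eq. (8) is straightforward to express in code. The NumPy sketch below is an illustration, not the authors' implementation; in particular, the default matrix size of p + 2 is an assumption based on the block structure described later in the text.

```python
import numpy as np

def connection_matrix(p, alpha, size=None):
    """Build the topology matrix A of Eq. (8):
    A[i, j] = alpha ** |i - j| when i, j <= p, and 1 elsewhere.
    `size` defaults to p + 2 (an assumption, matching the p + 2
    block vectors mentioned in the text)."""
    n = size if size is not None else p + 2
    A = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if i <= p and j <= p:
                A[i, j] = alpha ** abs(i - j)
    return A
```

With alpha = 1 every entry is 1 (a fully connected model); with alpha = 0 the off-diagonal entries inside the p-block vanish, disconnecting the time steps, which matches the behavior described above.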
The energy function for our p-RBM model is given by
can, for instance, model high-dimensional time series and non-linear dynamical
systems, or reconstruct video from incomplete or corrupted versions.
The joint probability distribution under the model is given by the following
Boltzmann-like distribution
p(\tilde{v}, \tilde{h}) = \frac{1}{Z(\tilde{W})} e^{-E(\tilde{v}, \tilde{h})},    (11)

where E(ṽ, h̃) is given in Eq. (10), and Z(W̃) is the partition function defined
as

Z(\tilde{W}) = \sum_{\tilde{v}, \tilde{h}} e^{-E(\tilde{v}, \tilde{h})}.    (12)
Equation (15) shows that, as with the RBM, the derivative of the log-likelihood
of the p-RBM can be written as the sum of two expectations. The first term is
the expectation of the derivative of the energy under the conditional distribution
of the hidden variables given an example {vt , vt−1 , · · · , vt−p } from the training
set T . The second term is the expectation of the derivative of the energy under
the p-RBM distribution [5]. Note that Eq. (15) can be written as
\Delta w = \frac{\partial \ln L(w \mid T)}{\partial w} = -\left\langle \frac{\partial E}{\partial w} \right\rangle_{\text{Data}} + \left\langle \frac{\partial E}{\partial w} \right\rangle_{\text{Model}}.    (16)
Equation (16) is the update step corresponding to the p-RBM model and is
analogous to Eq. (4).
Contrastive Divergence for the Model. When considering our model, the
k-step contrastive divergence (CDk ) becomes
CD_k(w, \tilde{v}^{(0)}) = -\sum_{\tilde{h}^{(0)}} p(\tilde{h}^{(0)} \mid \tilde{v}^{(0)}) \frac{\partial E}{\partial w} + \sum_{\tilde{h}^{(k)}} p(\tilde{h}^{(k)} \mid \tilde{v}^{(k)}) \frac{\partial E}{\partial w},    (17)
where a Gibbs chain has been run k times on the second term in order to
approximate the expectation of the derivative of the energy under the model,
given in Eq. (15).
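For readers unfamiliar with CD-k, the scheme can be sketched for a plain binary RBM; the p-RBM applies the same positive-phase/negative-phase estimate blockwise. This is an illustrative sketch with our own variable names, not the paper's implementation. For the standard RBM energy, ∂E/∂W = −vh, so the gradient estimate of Eqs. (16)–(17) reduces to ⟨vh⟩_data − ⟨vh⟩_model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(v0, W, a, b, k=1):
    """One CD-k gradient estimate for a plain binary RBM.
    Returns (dW, da, db), the data-model difference of expectations
    approximating Eqs. (16)-(17)."""
    ph0 = sigmoid(v0 @ W + b)                 # positive phase: p(h=1 | v0)
    vk = v0.copy()
    for _ in range(k):                        # k steps of Gibbs sampling
        h = (sigmoid(vk @ W + b) > rng.random(b.shape)).astype(float)
        vk = (sigmoid(h @ W.T + a) > rng.random(a.shape)).astype(float)
    phk = sigmoid(vk @ W + b)                 # negative phase after k steps
    dW = np.outer(v0, ph0) - np.outer(vk, phk)
    da = v0 - vk
    db = ph0 - phk
    return dW, da, db
```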
Training the Model. In order to define a training rule for the model, we
propose a way to sample for the CD. The sampling of a block vector of hidden
variables can be done from the following block vector of probabilities
P(\tilde{h} \mid \tilde{v}) = \left[\, p(h_t \mid \tilde{v}) \ \cdots \ p(h_{t-p} \mid \tilde{v}) \ \ 1 \,\right].    (18)
We may obtain the expected values of the hidden vectors by repeatedly sam-
pling from Eq. (18). For our purposes it is better to place them in a block matrix
with the same dimension as W̃. We call this matrix E[h̃|ṽ] and it is formed by
vertically stacking p + 2 block vectors of the form
\left[\, E[h_t \mid \tilde{v}] \ \cdots \ E[h_{t-p} \mid \tilde{v}] \ \ 1 \,\right].    (20)
Note that each item in the block matrix can be learned independently once
the CD step is finalized. This results in a model that is highly suitable for
parallelization during training, especially on a GPU, whose multicore structure
allows the concurrent handling of multiple computational threads.
Algorithm 1 describes the detailed procedure for training the p-RBM model.
Human action recognition (HAR) is an active research topic with a large num-
ber of applications, such as: video surveillance, video retrieval, assisted living,
and human-computer interaction. HAR focuses on the identification of spatio-
temporal motion changes using invariant features extracted from 2-D images.
Traditional approaches to HAR include the use of hidden Markov models (HMM)
on handcrafted features, such as in [3] where a fast skeletonization technique was
used for feature generation. Other approaches include the use of local spatial
and temporal features obtained from detecting image structures within video
segments of high variability [11]; they are then combined with support vector
machines in [15] to discriminate among six actions.
Unsupervised-learning approaches to HAR have also been proposed in the
literature. For instance, in [13] the authors used a Gaussian-RBM to extract
spatial features from motion-sensor data to feed an HMM; in [6] a deep Boltz-
man machine was used in combination with a feed-forward neural network to
We trained our model on the KTH dataset introduced in [15]. This dataset
contains six human actions performed by 25 subjects. For our purposes we turned
the problem into a binary classification problem: discriminating between walking
and not walking. Preprocessing included: downsizing the images from a
resolution of 160 × 120 to 28 × 28 pixels; background subtraction using a simple
Gaussian-mixture-based algorithm; and pixel normalization. Additionally, we
created 8,144 sequences for the training set and 4,026 for the testing set. We
chose the following set-up for our network: p = 10 for the moving window,
n = 784 for the size of the visible layer, m = 256 for the size of the hidden layer,
a mixing rate of k = 1, and α = 1 (i.e., a fully connected model).
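The preprocessing steps above can be sketched as follows. This is an illustrative, dependency-free NumPy pipeline, not the authors' code: nearest-neighbour resizing stands in for the image downsizing, and a median-frame background estimate stands in for the Gaussian-mixture subtraction actually used.

```python
import numpy as np

def downsize(frame, out_h=28, out_w=28):
    """Nearest-neighbour resize from (h, w) to (out_h, out_w)."""
    h, w = frame.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[np.ix_(rows, cols)]

def preprocess(frames):
    """Sketch of the pipeline: downsize each 160x120 frame to 28x28,
    subtract a static background estimate (the median frame, a simple
    proxy for the Gaussian-mixture subtraction in the paper), and
    normalize pixels to [0, 1]."""
    small = np.stack([downsize(f) for f in frames]).astype(float)
    background = np.median(small, axis=0)
    fg = np.abs(small - background)           # foreground magnitude
    return fg / max(fg.max(), 1e-8)           # pixel normalization
```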
Our model can be used to reconstruct the sequences in the dataset (see Fig. 3,
in which we reconstruct a sample from the training set). The features learned by
the model are significant as the model is able to reconstruct and de-noise unseen
images from new categories (see Fig. 4).
For the classification problem we extracted the features (the h̃ block vector)
from all video sequences in the training set and used them to train a multilayer
perceptron (MLP) with the architecture described in Table 1. The use of such
a simple architecture is motivated by the fact that the features extracted by
the p-RBM already summarize the spatial and temporal context of the action
performed.
We evaluated the performance of our model by constructing a confusion
matrix on the test set, shown in Table 2. The model obtains an overall accuracy
of 0.9284.
Spatial and Temporal Feature Extraction Using an RBM Model 11
Table 1. Architecture of the MLP that sits on top of the p-RBM used to perform
classification. Here BS stands for Batch Size.
Table 2. Confusion matrix of predicted actions for the KTH dataset on the test set.
Table 3. Comparison of the p-RBM+MLP results with other methods on the KTH
dataset.
References
1. Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep
learning for human action recognition. In: Salah, A.A., Lepri, B. (eds.) HBU 2011.
LNCS, vol. 7065, pp. 29–39. Springer, Heidelberg (2011). https://doi.org/10.1007/
978-3-642-25446-8 4
2. Boulanger-Lewandowski, N., Bengio, Y., Vincent, P.: Modeling temporal depen-
dencies in high-dimensional sequences: application to polyphonic music generation
and transcription (2012)
3. Chen, H.S., Chen, H.T., Chen, Y.W., Lee, S.Y.: Human action recognition using
star skeleton. In: Proceedings of the 4th ACM International Workshop on Video
Surveillance and Sensor Networks, VSSN 2006, pp. 171–178. ACM, New York
(2006). https://doi.org/10.1145/1178782.1178808
4. Dahl, G., Ranzato, M.A., Mohamed, A.R., Hinton, G.E.: Phone recognition with
the mean-covariance restricted Boltzmann machine. In: Lafferty, J.D., Williams,
C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Infor-
mation Processing Systems, vol. 23, pp. 469–477. Curran Associates Inc. (2010)
1 Introduction
For a long time, object recognition in images was carried out in a task-specific
and imprecise way: to detect an object and recognize it, features had to be
extracted manually, using different techniques depending on the task [1], such as
Principal Component Analysis (PCA), Independent Component Analysis (ICA),
Local Binary Patterns (LBP), the wavelet transform, the Gabor transform,
Histograms of Oriented Gradients (HoG), and the Scale-Invariant Feature
Transform (SIFT), among others. As a consequence, such a system could not
recognize objects outside the context for which the chosen technique had been
tuned. With the arrival of deep learning, feature extraction from images is now
done automatically [2], thanks to convolutional neural networks (CNNs), which
allow the recognition of thousands of objects in images. Training these networks,
however, requires a large number of images containing the object the network is
meant to recognize.

A simple CNN can recognize objects in images with an accuracy of 57%, but in
search of a better system there are more complex networks with 80% accuracy.
Training these more complex CNNs requires high computing capacity, due to the
number of images and neurons the network needs; thanks to the Graphics
Processing Unit (GPU) it is possible to do this more quickly [3], but it remains a
process that takes days or even weeks to complete, even on computers with
exceptional capabilities. For this reason, large companies and universities release
complex, highly accurate network architectures to the public so that they can be
used and evaluated by anyone who needs them. These networks let users reuse
the entire architecture while leaving the last classification layer free. Retraining
this last layer is called transfer learning, since the network is used as a feature
extractor and the classes needed by each user's application are added; in this way
the time and computation invested in training are minimal, a matter of a few
minutes, so that a computer with average specifications is capable of executing
these tasks.

The objective of this project is to enable the use of pre-trained models for
different applications without training a CNN from scratch, allowing many more
people to make use of convolutional neural networks on mid-range computers.
Section 2 explains the generalities of convolutional neural networks. Section 3
discusses the neural network used for transfer learning, Inception-v3, and its
architecture. Section 4 describes the concept of transfer learning and then
discusses bottlenecks: what they are and what they are used for. Section 7
explains the classifier trained for this application, and finally Sect. 9 shows the
results of the real-time tests.
Convolutional neural networks are deep networks composed of many
interconnected neurons organized into many different layers [4]. Their specialty,
as the name implies, is the convolution operation, which is equivalent to passing
a filter over the image; each neuron in the input layer applies a different filter.
The convolution layers use a particular activation function, the rectified linear
unit (ReLU), which returns zero when the values coming from the neurons are
negative and behaves linearly for positive ones. The next layer is max pooling,
which keeps the highest values and reduces the dimensions of the layers as the
network progresses. The last layer is the fully connected layer, whose activation
function is the softmax; it classifies all the data that comes out of the network
and finally makes the prediction for the image, as shown in Fig. 1.
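The layer types just described can be sketched as plain NumPy operations. This is a minimal illustration of the forward pass, not an efficient or complete implementation.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most
    deep-learning libraries)."""
    kh, kw = kernel.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)       # zero for negatives, linear for positives

def max_pool(x, s=2):
    """Keep the highest value in each s x s window."""
    H, W = x.shape[0] // s, x.shape[1] // s
    return x[:H*s, :W*s].reshape(H, s, W, s).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())         # shift for numerical stability
    return e / e.sum()              # class probabilities summing to 1
```

A prediction is then just the chaining of these stages: convolve, apply ReLU, pool, flatten, multiply by the fully connected weights, and take the softmax.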
To develop this work, a pre-trained model must be selected. The most widely
used models were weighted under the following criteria: number of network
parameters, network weight (file size), top-1 accuracy, and the computing
capacity required for processing.

The first criterion, the number of parameters in the network, gives an idea of
how many calculations the computer must perform to process an image through
the entire network. The second criterion is the amount of disk space the network
occupies, which should not be high. The third criterion is top-1 accuracy, which
indicates how good the model's predictions are and should be as high as possible.
The fourth and last criterion reflects the amount of memory required to process
an image, which should be as small as possible because only a 2 GB GPU is
available.

Based on the criteria described above, the following rating table was built (see
Fig. 2) to select the most suitable pre-trained model for this project. Scores of 1,
3, and 9 were assigned for the different comparison items.
Model          Number of parameters   Weight   Top-1 accuracy   Computing capacity required   Total
AlexNet              3                   9           1                      9                  22
VGG-16               1                   1           3                      1                   6
Inception-v3         9                   3           9                      3                  24
Figure 3 shows that the model with the highest score, and the one that meets all
the established requirements, is Inception-v3; therefore, it is the model selected
for this project.
4 Inception-V3
The Inception network, trained by Google, has a top-1 accuracy of 78% and is
capable of classifying 1,000 categories. It was selected for this work because of
its wide acceptance in the community, its ease of use, and its high accuracy. The
Inception block is shown in Fig. 4.

The block in Fig. 4 represents each module of Inception-V3; it is not a typical
network with one layer after another. The concept behind Inception is a
micro-network within another network [6]: at the output of each layer there are
three convolution layers (1×1, 3×3, and 5×5) and a max-pooling operation,
whose outputs are concatenated to form the output of that module and the input
of the next one. The complete architecture of the Inception-V3 network is shown
in Fig. 5.
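The concatenation of parallel branches can be illustrated with a toy module. The branch callables below merely stand in for the real 1×1, 3×3, and 5×5 convolutions and the max-pooling path; the point is only the channel-wise concatenation.

```python
import numpy as np

def inception_module(x, branches):
    """Toy Inception-style module: apply each branch to the same input
    and concatenate the branch outputs along the channel (last) axis.
    All branches must preserve the spatial dimensions, as the padded
    convolutions in Inception do."""
    return np.concatenate([b(x) for b in branches], axis=-1)
```

For example, on an input of shape (8, 8, 4), branches producing 4, 4, and 2 channels yield an output of shape (8, 8, 10), which then feeds the next module.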
5 Transfer Learning
In practice, as already mentioned, training a CNN takes a great deal of time and
processing capacity, and the latter is not easily accessible, so transfer learning is
a solution that has been adopted very quickly. The use of pre-trained models is
common practice, as some of them have been trained with more than 1,000,000
images covering 1,000 categories. There are several cases of transfer learning,
explained below:
– A CNN as a fixed feature extractor, where, as the name implies, the network
extracts the features of each image in the data set, and the final result is used to
train a new classifier, whether a Support Vector Machine (SVM) or a softmax
layer.
– Fine-tuning the CNN, where not only the classification layer is trained but the
weights of the pre-trained network are also modified using backpropagation; the
neurons of the higher layers can be modified while the first layers are left to
perform edge-detection tasks and the like.
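The first case, training only a new softmax head on fixed features, can be sketched as follows. Here `features` would be the bottleneck vectors produced by the frozen network; the toy data in the test is hypothetical, and the function names are ours.

```python
import numpy as np

def train_softmax_head(features, labels, n_classes, lr=0.1, epochs=200):
    """Transfer-learning case 1: the CNN body is frozen and only a new
    softmax layer is trained on the extracted features (bottlenecks).
    Plain batch gradient descent on the cross-entropy loss."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[labels]                 # one-hot targets
    for _ in range(epochs):
        logits = features @ W
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = e / e.sum(axis=1, keepdims=True)  # softmax over classes
        W -= lr * features.T @ (probs - Y) / n    # cross-entropy gradient
    return W
```

Because only W is updated, training takes minutes on modest hardware, which is exactly the appeal of transfer learning described above.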
6 Data Augmentation
A lack of images in particular categories can lead to overtraining, with the
network simply memorizing the categories, or leave the data set without enough
images for proper training. Data augmentation offers an alternative and a
solution to these problems: an extensive data set can be generated simply by
processing and applying different filters to a few images, increasing their number
and diversity.

A script was developed that applies 24 operations and filters to a single image,
including rotations, scaling, noise, filters, and flips, among others. In this way,
from only 20 photos taken of the object, 500 images can be obtained for the
training data set, allowing the classifier to generalize across images of an object
seen from different angles. The operations performed are shown in Fig. 6.
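A few of these operations can be sketched with NumPy. The actual script applies 24 operations; only a representative handful (flips, 90-degree rotations, additive noise) is shown here as an illustration.

```python
import numpy as np

def augment(img, rng):
    """Return a small set of augmented variants of a [0, 1] grayscale
    image: the original, flips, rotations, and a noisy copy."""
    return [
        img,
        np.fliplr(img),                                    # horizontal flip
        np.flipud(img),                                    # vertical flip
        np.rot90(img, 1),                                  # 90-degree rotation
        np.rot90(img, 3),                                  # 270-degree rotation
        np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1),  # Gaussian noise
    ]
```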
Application of Transfer Learning for Object Recognition 19
7 Platforms
In this section, different deep learning platforms are compared in order to select the
most appropriate for the implementation of the computational tool to be developed. It
also describes the available hardware and the libraries used.
For the selection of a platform, different characteristics were collected, as shown
in Fig. 7. The first criterion is that the platform be open source, which reduces
the implementation cost of the tool; an open-source platform also has a large
community contributing to the software, a great help for people who want to
develop applications. The second criterion is that the software can be installed
on as many operating systems as possible, to generalize the use of the tool. The
third criterion is that the software offer a Python programming interface, since
Python is popular around the world and provides a huge community for
development; it is also a high-level and very user-friendly language. The fourth
criterion is that the platform be able to process the networks and their respective
calculations on a GPU, to reduce training times. Finally, it is desirable that the
software allow the visualization of neural networks, so that the architecture of
the pre-trained models is known when performing transfer learning and so that
debugging is possible in case of training errors. Scores of 1, 3, and 9 were
assigned for the different comparison items.

According to the scores assigned for the selected criteria, the platforms were
weighted according to their functionalities and characteristics, making it possible
to identify the platform that best meets the criteria.
Figure 8 shows that the platform with the highest score, and the one that meets
all the established requirements, is TensorFlow; therefore, it is the software
selected to work on this project.