
Artificial Neural Networks

in 45 Minutes

artificial intelligence

Carlos Sposito Araujo, M.Sc.

Copyright © 2018 Carlos Sposito Araujo

All rights reserved

Cover: Luiza Rocha e Sposito


For my three children, who for months endured my single subject of conversation, "neural networks",
while I was in the early stages of my studies and absolutely certain that I held in my hands the
solution to all the problems and puzzles of the universe!
Inciting curiosity

Consider a gym. It receives periodic enrollments but, unfortunately, it also receives periodic
enrollment locks. Imagine that the gym's administrator wished to predict, for each new client, the
period of time that would elapse between his enrollment and his enrollment lock. With this forecast
in hand, he would be able to work on convincing those whose abandonment was expected soon. This
information, however, would only be useful if the moment of detachment were known well in advance.
If possible, already on enrollment day.
Another situation: a telephone company. Whenever a customer delays payment for a few days, the
company's control system automatically triggers a postal charge, generating costs. Considering that
some of these late-paying users are not bad payers, but forgetful customers or traveling customers
who will pay their bills as soon as they remember or return from travel, this postal charge cost
would not have to occur. The most effective approach would be to send collection letters only to
the real bad payers.
Is it possible for the company to separate the mass of customers into two groups, good payers and
bad payers?
Yes! These two cases — gym center and telephone company — are real and have been solved
with the use of neural networks.
Who is this book for?

If you just want to satisfy your curiosity about artificial neural networks (ANNs), or even want to
acquire an initial foundation that allows you to deepen your knowledge later, then this book may be
a good option.
However, if you have already passed the initial phase of study and have the mathematical foundation
behind ANN concepts, look for a more in-depth bibliography, such as the one recommended at the end
of this book.
Here, the goal will be to open the door to the basic understanding of ANNs.
From the beginning

In any study you do of ANNs, whether in books or scientific articles, it's almost certain that
there will be a basic initial explanation, if the author considers that the reader doesn't yet have
sufficient technical knowledge of the subject.
In most texts, that explanation will read very much like this:
“ANNs are techniques belonging to the set of studies known as Computational
Intelligence (formerly called Artificial Intelligence) that are based on mathematical
models influenced by the neural structures of human intelligence and that have the
capacity to expand knowledge through the experience of past cases.”
If the previous paragraph left you with more doubts than clarity, don't be alarmed. We'll get there
soon, at an easy pace.
So even if you are up to date on the fundamentals of mathematical functions, don't skip this
beginning. The goal is to link concepts, increasing complexity step by step.
An initial view

To think about functions we need to look at the idea of sets.


Although it's a primitive concept, that is, one without a mathematical definition, we can consider a
set to be a collection of symbols or objects, as shown in Figure 1.
There are situations in which there is a correspondence between the elements of two sets. In this
case, we say that there is a relationship between them.
Some relationships follow these three conditions:
1st condition - The relation between elements of two sets must occur in only one direction.
Visually, when using Venn diagrams, arrows can start from only one of the sets, as shown in
Figure 2. In this case, the set from which the relations start is called the domain; the set where
the relations arrive is the codomain.

2nd condition - Every starting set element must relate to an arrival set element. However, it's
possible that some codomain elements won't receive relationships from domain elements. Visually, in
the Venn diagrams, every starting set (domain) element must have an arrow leaving it and reaching
an arrival set (codomain) element. However, there may be arrival set elements with no arrow
arriving at them, as shown in Figure 2.

3rd condition - Each domain element must be related to one and only one codomain element. In
Venn diagrams there can be no more than one arrow leaving the same starting set element.
However, more than one arrow may arrive at the same arrival set element, each of these arrows
starting at a different starting set element, as shown in Figure 2.
If the three conditions are met, the relationship becomes a function.
***

In functions, since not all codomain elements need to maintain a relationship with domain elements,
as the 2nd condition allows, a distinction is made between the codomain and its subset containing
only the elements that receive the relations coming from the domain. This subset is called the
image (or range).
In Figure 2, the codomain is the set

{A, C, E, F, H, J}

The image set, however, doesn’t have the element “H”:

{A, C, E, F, J}.
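As a small illustration of my own (not from the book), the relation of Figure 2 can be sketched as a Python dictionary; the domain labels are hypothetical, while the codomain and image sets are the ones listed above.

```python
# A relation sketched as a dictionary: each domain element (key) maps to
# exactly one codomain element (value), which is the 3rd condition for a function.
relation = {"b": "A", "d": "C", "g": "E", "i": "F", "k": "J"}  # hypothetical domain labels

domain = set(relation.keys())
codomain = {"A", "C", "E", "F", "H", "J"}   # the codomain listed in the text
image = set(relation.values())              # only the elements actually reached

print(sorted(image))              # ['A', 'C', 'E', 'F', 'J'] -- "H" is left out
print(image.issubset(codomain))  # True: the image is a subset of the codomain
```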
A little algebra

Mathematically speaking, we say that a function f from A to B, written in the form y=f(x), is a
relation between sets A and B in which each element x ∈ A is associated with one and only one
element y ∈ B.
This relation y=f(x) indicates that each y value is obtained from a value of x previously chosen.
This demonstrates that y depends on x, as it’s necessary to know the x value before the y value
can be defined. Therefore, we say that x is the independent variable and y is the dependent
variable.
Let's look at a case study.
We know that a vehicle's fuel consumption depends on its speed (among other variables that won't
be considered here, for simplicity). Thus, we say that fuel is consumed as a function of the
vehicle's speed. The speed chosen by the driver at every moment is the factor that controls this
relationship; it's the independent variable. Consumption is the variable dependent on the speed.

The Brazilian magazine Quatro Rodas® conducted a survey with the VW Fox® vehicle, as shown
in Table 1.
We can define this function in tabular form (through the data collected in the survey) or through
the analytical expression

y = 99.731 - 1.0108 * x

where y represents the fuel consumption and x the vehicle speed.
It's easier to visualize this function and the relation between its two variables through Graphic 1,
where the domain is between 49.7 mph and 74.6 mph (limits defined in the survey), the
codomain is between 0 U.S. MPG and 50 U.S. MPG and the image is between 24.5 U.S. MPG and
49.6 U.S. MPG.
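As a minimal sketch of my own (not from the magazine survey), the analytical expression above can be evaluated in Python; the two speeds used below are the domain limits quoted in the text.

```python
def fuel_consumption(speed_mph: float) -> float:
    """Fuel consumption (U.S. MPG) as a function of speed (mph): y = 99.731 - 1.0108 * x."""
    return 99.731 - 1.0108 * speed_mph

# Evaluating at the domain limits mentioned in the text.
print(round(fuel_consumption(49.7), 1))  # about 49.5 U.S. MPG
print(round(fuel_consumption(74.6), 1))  # about 24.3 U.S. MPG
```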

Thus, with what we have seen so far, a function can be explained as a rule that describes how one
quantity (the dependent variable) is determined by other quantities (the independent variables),
always in a unique way (as stated in the 3rd condition).
Functions with two independent variables

There are situations that need to be described by functions that have not one, but two independent
variables, both associated with one dependent variable. This means that, in order to calculate the
value of the dependent variable, we must first determine the values of the two independent variables.
Mathematically, we write functions with two independent variables in the form z=f(x,y), or with
any other letters you prefer, although this is the most usual form.
In this case, with two independent variables, the domain won't be a set of points on the x axis, as
in functions with only one independent variable, but rather a set of ordered pairs (x,y) that
combine the points of the domain of each independent variable within a plane.
See this practical example.
According to Prof. Dr. Turibio Barros, we can calculate the caloric expenditure of a run on the
horizontal plane through the function

spent = 0.0118 * speed * weight

where, in the representation z=f(x,y), x is the speed, y is the runner's weight and z is the
caloric expenditure. In other words, caloric expenditure is a function of the runner's speed
and weight.
Applying numbers for a better understanding, let us imagine the runner's speed as 7.8 mph and
the runner's weight as 171 lbs, which generates the ordered pair (7.8, 171) in the function

z = f(7.8, 171)

By replacing the values in the function, we would have

spent = 0.0118 * 7.8 * 171 =

15.7 kcal/min
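The same two-variable function can be sketched in Python (an illustration of mine, not part of the original study):

```python
def caloric_expenditure(speed_mph: float, weight_lbs: float) -> float:
    """Caloric expenditure (kcal/min) as a function of speed and weight: z = f(x, y)."""
    return 0.0118 * speed_mph * weight_lbs

# The ordered pair (7.8, 171) used in the text.
print(round(caloric_expenditure(7.8, 171), 1))  # 15.7 kcal/min
```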

To graph this function we need a three-dimensional view, using the x, y and z axes.
In Graphic 2, the runner's speed is between 5 mph and 7.5 mph; the runner's weight is between
154 lbs and 198 lbs. Both intervals were chosen arbitrarily to make the graphic easier to assemble
and don't represent real limits.
This function's domain is the set of ordered pairs that combine the x-axis points between 5 mph
and 7.5 mph with the y-axis points between 154 lbs and 198 lbs. The z values, the caloric
expenditure, lie on the surface drawn in Graphic 2.
Functions with three or more independent variables

The mathematical representation of a function with three independent variables (and one dependent
variable, don't forget) is, in general,

w = f(x, y, z).

By extension, a function with four independent variables can be written as

s = f(x, y, z, w)

and a function with n independent variables can be represented by

y = f(x1, x2, x3, ..., xn),

or with any other letters you prefer to represent the variables, as we saw earlier. The important
thing is to relate the independent variables to the dependent variable.
Having seen this, the following question arises: what would a graphic representation of a function
with three, four or more independent variables look like?
There wouldn't be one!
After all, we can only visualize graphics up to the limit at which we see the world, that is, three
dimensions.
However, this visual impossibility doesn't hinder the mathematical construction of functions of n
variables. We simply won't have the convenience of observing graphically something with this level
of complexity, its representation being restricted to algebra.

***

In the field of physical evaluation we find a good example of a function with three independent
variables. The study "Prediction of body composition in female athletes"[1], developed to estimate
the lean body mass (LBM) of female athletes aged between 18 and 23 years, uses as independent
variables the total weight and the perimeters of the neck and thigh:

LBM = 0.757*TBM + 0.981*PN - 0.516*PT + 0.79

where TBM is the total body mass, PN is the neck perimeter and PT is the thigh perimeter.
In order to obtain the value of the dependent variable LBM, we would first have to define the
values of the three independent variables TBM, PN and PT.
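As a plain illustration, the expression can be written as a small Python function; the numeric values below are hypothetical, only to show that all three independent variables must be fixed before the dependent variable can be computed.

```python
def lean_body_mass(tbm: float, pn: float, pt: float) -> float:
    """LBM estimate from total body mass (TBM) and the neck (PN) and thigh (PT) perimeters."""
    return 0.757 * tbm + 0.981 * pn - 0.516 * pt + 0.79

# Hypothetical values (units as in the original study), just to exercise the function.
print(round(lean_body_mass(60.0, 32.0, 55.0), 1))  # 49.2
```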
***

Thinking about functions with four independent variables, we can look to the electrical
engineering area and the sizing of electrical conductors through the unit voltage drop
limit criterion. In this situation we have

Uunit = (e(%) · V) / (IB · L)

where e(%) is the allowable voltage drop, V is the circuit voltage, IB is the electric current and
L is the circuit length. That is, to obtain the value of the dependent variable Uunit it would be
necessary to previously define the values of the four independent variables e(%), V, IB and L.
In a practical situation, replacing the variables in the function, we would have

Uunit = (0.04 * 220 V) / (24.5 A * 0.015 km) =

23.9 V/(A·km)

And what about the graphic of this function?


It would require a five-dimensional drawing.
No way!!!

***

From the physics field we can borrow a function with five independent variables, the famous
Newton's Law of Universal Gravitation, which would generate a six-dimensional graphic, if that
were possible.
The force F exerted by a particle of mass m0, positioned at the origin of the xyz coordinates, on
another particle of mass m at the point (x, y, z), is given by the function:

F(m0, m, x, y, z) = (G · m0 · m) / (x² + y² + z²)

where G is the universal gravitational constant.


Ugly graphics and beautiful graphics

In general, graphics that represent well-known functions are beautiful and elegant (always bearing
in mind that this is a very personal opinion). Let's look at some:
Whereas these functions are beautiful, elegant and visually pleasing, what would a horrific,
visually unattractive function look like?
Let's look at a very simple case of a function that we might consider ungainly, with only one
independent variable. Table 2 shows its values:
This is the graphic generated from the table:
Suppose that, although it isn't a beautiful and elegant graphic, it represents a real situation. In
this case, we could infer other values of the dependent variable y from independent-variable values
such as x = 2.5 or x = -0.7.
For this, we would have two ways: to find the y value on the graphic, or to put the x value into
the corresponding function, no matter how odd its expression.
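Since Table 2 isn't reproduced here, the sketch below uses made-up (x, y) pairs only to illustrate these two ways: reading a value off the tabulated curve by interpolation, or plugging x into whatever expression the function has.

```python
# Hypothetical tabulated values standing in for Table 2 (the real table isn't shown here).
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]
ys = [ 2.3, -0.5, 4.1, 1.2, 5.0]

def interpolate(x: float) -> float:
    """Linear interpolation between tabulated points -- one way to 'read the graphic'."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the tabulated domain")

print(round(interpolate(2.5), 2))   # estimate for x = 2.5
print(round(interpolate(-0.7), 2))  # estimate for x = -0.7
```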
Practical use of ugly and inelegant functions

Let's go back to the first case discussed at the beginning of this book, the prediction of the time
elapsed between enrollment and enrollment lock. What data would you use for this? The student's
age? The distance from his residence to the gym? His professional level? His marital status? His
weight? His number of children?
To make the situation more complex, imagine that there is a strong possibility that there are
some neighbors among the regulars — that is, people with the same distance from residence to
gym, one of the proposed data items. Imagine that they have different ages, weights and
professions. Imagine that they could have been attending the gym at the same time until they
locked their enrollments, each on a possibly different date.
Could it be possible to create a mathematical function that associates the age, the distance from
the residence to the gym, the weight, the professional level and the number of children, and whose
dependent variable indicates the length of stay in the gym? When we enter the values of a certain
student, the function should return, in the dependent variable, the expected time interval between
the start of his training program and the lock of his enrollment.
Imagine how "dowdy" a function like this would have to be...
ANNs? We are getting closer and closer

This problem of predicting adherence time in a gym activity program was exactly the
subject of my master's research, "Identification of Adherence Factors in Physical Activity
Programs in Gyms Using Computational Intelligence"[2], with Prof. Dr. Renan Moritz Varnier
Rodrigues de Almeida as my adviser. After analyzing the 63 variables available in the gym
database, we arrived at a function with 11 independent variables, including address, gender,
profession, marital status, and the dates of birth, enrollment and locking. The dependent variable
indicated, as a result, whether or not there would be a minimum stay of six months. This
dichotomy, "less than six months / more than six months", was a strategic decision based on
studies demonstrating that this point — six months — is an important watershed for
adherence to physical activity programs[3].

***

Another study in which I participated, with Costa G.G., Alvarenga A.V. and Pereira W.C.A.,
“Contour Classification in Digital Mammograms Using Artificial Neural Networks of Type ART-
2”[4], focused on the analysis of nodules detected in mammograms, aiding the medical diagnosis
with a “second opinion”.
Considering that malignant nodules have shape characteristics (irregular contours with
extensions from their borders) that differ from benign nodules (regular contours and circular or
oval shape), the function used six independent variables, all related to graphic nodule
characteristics. The dependent variable, also dichotomous, classified the nodule as “regular” or
“irregular”.

***

In the “Artificial Neural Networks for Infant Mortality Modeling”[5] study, developed in the
Biomedical Engineering Program of COPPE-UFRJ by Gismondi R.C., Almeida R.M.V.R. and
Infantosi A.F.C, 43 independent variables were used, including social, economic, environmental
and health indicators from 59 Brazilian municipalities, with the infant mortality rate of these
municipalities as the dependent variable. Here, the study's aim wasn't to predict, but to
classify.

***

The study "Neural Networks for Preventing Default in Telephony Operators"[6], the second
example discussed at the beginning of this book, developed as a doctoral thesis in the Civil
Engineering Program of COPPE-UFRJ by Carlos André Reis Pinheiro, aimed at identifying in
advance the behavior of defaulting clients, classifying them by their type of tendency toward
non-payment and making it possible for the company to take preventive actions. The function used
27 independent variables, most of them related to monthly averages of invoices, traffic minutes
and days of delay, as well as indicators of traffic, pulses, days of delay and defaults. The
dependent variable, also dichotomous, determined whether the client was good or bad.

***

All these studies have in common the use of a mathematical function with n independent
variables, n being as large as necessary, plus a dependent variable to return the result.
We arrived!

Finally, we come to the book’s key point.


All the studies presented previously used neural networks.
Looking carefully, ANNs can be summarized, nothing more and nothing less, as

“dowdy” functions of n variables.

But why call these functions "dowdy"?


Although Mathematics has terms outside the ordinary citizen's vocabulary, such as cevian,
apothem, holomorphic foliations, ergodic theory and nilpotent algebra, the term "dowdy" is not a
concept used in this universe.
In searching for a term that expresses how inelegant and ugly these functions can be, at least
compared to what we are accustomed to in the academic environment of elegant graphics and
functions, I relied on the Cambridge Academic Content Dictionary© definition of the word "dowdy":

“not attractive or fashionable”

The definition seems to capture the unattractive aesthetic of these functions.
And how is such a "dowdy" function created?

Through the cases presented, we have seen that ANNs are basically mathematical functions with
multiple independent variables — usually with no relations "visible to the naked eye" between
most of them — plus a dependent variable that returns the result.
For this "dowdy" function to be created, some steps must be taken:
Step 1 - choose the ANN model
Each group of problems has characteristics and objectives — control; ranking; prediction; or
approximation — that make it tend toward a particular ANN model that will solve it more
efficiently and accurately.
The choice of model is critical to the success of the study.
Step 2 - select the software that will manage the neural network (ANN)
The most practical way to generate the ANN function, and later manage its use, is through
existing software. In general, these applications are prepared to manage more than one
ANN model.
The most widely used software packages are:

Matlab® (supported ANN models: Feedforward; Radial Basis; Dynamic; Learning
Vector Quantization; Competitive Layers; and Self-Organizing Maps);

NeuroDimension® (37 different ANN models, such as Multilayer Perceptron;
Probabilistic; and Support Vector Machines, among others);

Wolfram Mathematica® (main supported ANN models: Feedforward; Radial Basis
Function; Dynamic; Perceptrons; Vector Quantization; Unsupervised Networks; and
Hopfield Networks).
Step 3 - make data about past cases available
The success of an ANN in solving a problem is directly related to the amount of input and
output data available from situations that have already occurred in the past. The greater the
supply of cases, the greater the chance that the software will create a function that represents
the past problem and resolves future cases.
In the study on adherence to the gym training program we had the data of hundreds of
former members, including the most important information for the study: both the enrollment
day and the enrollment-lock day. That is, the length of time they were active at the gym,
which was the key point of the research.
Step 4 - ANN training
With the data about past cases of the problem, we should randomly divide this mass into two
groups.
One of the groups will be used to feed the chosen software's algorithm, in the phase called
ANN training. By reading and rereading these data a large number of times, sorted and
reordered internally in different ways, the application will shape the "dowdy" function.
At the end of this operation — which may take minutes, hours or days of processing,
depending on the computer capacity, the amount of data and the number of independent
variables — the algorithm will have created a function able to return, in the dependent
variable, values as close as possible to the actual historical values of the introduced data set.
Step 5 - ANN validation
With the function provisionally created, tests will be performed with the data from the second
group — those that were not previously used in the ANN training phase (Step 4).
The need to separate the data into two groups is meant to prevent any influence (bias) at this
stage, since artificially good results would occur if the same data used to create the "dowdy"
function (Step 4) were used again.
In this validation phase, comparing the true answers of the real cases with the answers
generated by the function, we obtain the percentage of correct answers. This will indicate
whether the ANN (in other words, the "dowdy" function) is really generating satisfactory
results within the minimum expected hit rate. If not, one should return to the training phase
(Step 4) to try to refine the function; a new validation phase should then confirm the
improvement in the function's efficiency.
Step 6 - effective use of the ANN
After the function's test results are approved, the ANN will be ready for full use, when it can
then receive new values in the independent variables and deliver responses in the dependent
variable.
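As a rough illustration of Steps 3 to 6 (not taken from any of the studies cited here), the sketch below uses scikit-learn and synthetic data; the library, the toy dichotomous rule and all parameter choices are assumptions of mine.

```python
# A minimal sketch of Steps 3-6: past cases -> random split -> training -> validation -> use.
# Assumes scikit-learn is installed; the data are synthetic, only to illustrate the workflow.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))              # 500 past cases, 11 independent variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # dichotomous dependent variable (toy rule)

# Step 4: one group trains the network; Step 5: the other group validates it.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)                                   # training phase
print("validation accuracy:", net.score(X_valid, y_valid))  # percentage of correct answers

# Step 6: effective use -- new values in the independent variables, a response out.
new_case = rng.normal(size=(1, 11))
print("prediction for a new case:", net.predict(new_case))
```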
A little history of ANNs

Now, with a basic understanding of what goes on inside the "black box" of an ANN, you can
better follow the historical sequence of its development.
ANNs, considered non-algorithmic computational models, had as their initial inspiration the
complex network of neurons in the human brain, with its axons, dendrites and synapses.
The first model was proposed in 1943 by the physiologist Warren McCulloch and the
mathematician Walter Pitts in the landmark work "A Logical Calculus of the Ideas Immanent in
Nervous Activity"[7], in which the two researchers studied the analogies between a neuron and a
binary electronic process[8].
However, McCulloch and Pitts' study was more an attempt to describe an artificial
model of a neuron and its computational capacities than a presentation of any computational
learning technique. The McCulloch-Pitts neuron can be modeled as a particular case of a linear
discriminator of binary inputs.
In the late 1950s, at another important moment in the history of ANN development, Frank
Rosenblatt, in his paper "The Perceptron: a Probabilistic Model for Information Storage and
Organization in the Brain"[9], refined McCulloch and Pitts' ideas, creating a network with
several neurons, also of the linear discriminator type, and calling it the perceptron. In this model,
the neurons were arranged in layers: the first contained the neurons that received the inputs; the
last contained the neurons that delivered the outputs; and the intermediate layers, called hidden
layers, did the true processing.
In 1960, Bernard Widrow and Marcian Hoff refined the perceptron, creating ADALINE
(ADAptive LInear NEuron)[10]. The improvements introduced were the weights, which multiplied
the inputs, and their summation, in addition to the inclusion of a bias term added to the sum.
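The computation just described, inputs multiplied by weights and summed together with a bias, can be sketched in a couple of lines of Python (a plain illustration with made-up numbers, not Widrow and Hoff's original formulation):

```python
def adaline_sum(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: the core computation of an ADALINE neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical inputs, weights and bias, only to show the calculation.
print(round(adaline_sum([1.0, 0.0, 1.0], [0.5, -0.3, 0.8], bias=0.1), 2))  # 1.4
```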
The next step was the improvement of ADALINE itself, with the development of MADALINE
(Many ADALINEs), which used several ADALINEs in parallel with a single output whose result
was based on intermediate rules.
In the 1970s and early 1980s, ANN studies were not entirely paralyzed, thanks to Kohonen's
studies[11] on Self-Organizing Maps and Paul Werbos' studies[12] on the development
of the backpropagation algorithm.
Research returned strongly from the mid-1980s at the hands of Gail Carpenter and Stephen
Grossberg[13], developers of the unsupervised ART (Adaptive Resonance Theory) model, based
on the processing of human cognitive information. In the following years, variants of the ART
model were developed: ART 1, ART 2, ART 3, ARTMAP, Fuzzy ART, ART 2-A and dART.
Some practical applications

For better illustration and understanding, in addition to the applications presented in the
course of the book — adherence to gym training; sending charges to debtors in a telephone
company; contour classification in mammograms; and modeling of infant mortality — some other
applications used on a daily basis are presented below.

Prediction of patients' length of stay

Considering the importance of resource prediction for the effectiveness of a company's planning
- and a hospital unit fits this situation -, Mobley and colleagues[14] studied the application
of mathematical models to aid the analysis and prediction of patients' length of stay in a
post-coronary care unit, using an ANN Backpropagation Multilayer Perceptron and a linear
logistic transfer function model. The variables used were taken from the form completed at the
time of the patient's admission to the unit. The ANN Backpropagation Multilayer Perceptron was
tested with two different topologies - two and three intermediate layers -, with similar results
in both: 72% accuracy in predicting length of stay, when a maximum error of one day was
considered. The logistic transfer function results, for the same maximum error of one day, were
64% accurate.

Electrocardiogram

Recognizing the importance of the electrocardiogram (ECG) in clinical practice, Maglaveras and
colleagues[15] reviewed trends in ECG pattern recognition, especially nonlinear transformations
and the use of ANN-based techniques for pattern recognition and classification, with algorithms
being tested for the detection of ischemic heart beats and recognition of atrial fibrillation.

Soares and Nadal[16], considering the ECG crucial for the diagnosis of cardiac
integrity, studied a method of automatic detection of ST-segment alterations using an ANN
Backpropagation Multilayer Perceptron trained with the Levenberg-Marquardt algorithm[17] for
pattern classification, in which parameter extraction and dimensionality reduction were performed
through Principal Component Analysis - a statistical method also used by Muniz and
Nadal[18] to distinguish the vertical component of the ground reaction force in gait tests with
patients with lower-limb fractures. In this case, the ANN Backpropagation Multilayer
Perceptron was trained with six different topologies, varying the hidden layer with 6, 10, 15, 20,
25 and 30 neurons. The use of Principal Component Analysis reduced the initial 90
input parameters (ST-T segment size) to only five, decreasing the number of input-layer neurons
to this same value.
The output layer worked with three neurons (ST+, ST-, N). The topology with the best
performance was the one that used 15 neurons in the hidden layer. For the ST+ segment
evaluation, the results indicated an accuracy of 89% and a sensitivity of 93%. Regarding ST-
alterations, the accuracy was 78% and the sensitivity 80%. For the normal segments, the
accuracy was 77%. These values are compatible with equivalent automatic systems found in the
literature, including more sophisticated systems employing similar methodology.
Also in a study of ST-segment changes, considered a good predictor of myocardial
infarction and sudden death, Frenkel and Nadal[19] investigated four methods of ST
representation, two based on morphological parameters and two based on Principal Component
Analysis, comparing their performances:
Methods based on morphological parameters
Direct analysis of the ST[20] segment in a single amplitude measure of the point located
104ms after the R wave;
RST method[21], two dependent samples in the RR interval delimiting the ST segment,
using the mean value of all samples located between them.
Principal Component Analysis based methods
The coefficient of the first Principal Component;
The coefficients of the first six Principal Components used as inputs to an ANN
Backpropagation Multilayer Perceptron.
The results pointed to the possibility of using any of the four methods for large ST-segment
changes, above 300 μV. However, when there were moderate changes, between 100 μV and 300
μV, the ANN Backpropagation Multilayer Perceptron had the best result, with a sensitivity of 84%
and a positive predictive value of 75%.

Diagnosis of erythemato-squamous diseases

David West and Vivian West[22] investigated the accuracy of ANN models in the diagnosis of
erythemato-squamous diseases that have very similar visual presentations, as well as similar
histopathological characteristics obtained from biopsy: psoriasis; seborrheic dermatitis; lichen
planus; chronic dermatitis; pityriasis rosea; and pityriasis rubra pilaris. For this study, the
authors used 34 dermatological variables applied to the ANN Backpropagation Multilayer
Perceptron and ANN Self-Organizing Maps models, and to subsets of specialist ANNs.
Skull trauma

To create a medical decision support system for cases of skull trauma, Li and colleagues[23]
compared three mathematical models: linear regression; ANN Backpropagation Multilayer
Perceptron; and ANN Radial Basis Function. These models used the following variables: type of
fracture; Glasgow Coma Scale[24]; seizure episodes; and the degree of recommendation for open
skull surgery. The results showed values of sensitivity, specificity and area under the ROC
(Receiver Operating Characteristic) curve of, respectively: ANN Backpropagation Multilayer
Perceptron, 88%, 80% and 0.897; ANN Radial Basis Function, 80%, 80% and 0.880; and linear
regression, 73%, 68% and 0.761. According to the authors, these results suggest that ANNs may
be a better solution for complex nonlinear medical decision support systems than conventional
statistical techniques, such as linear regression.

Childhood dysfluency and stuttering


Considering the great challenge that differentiating between cases of childhood disfluency and
stuttering has posed for decades, Geetha and colleagues[25] used an ANN Backpropagation
Multilayer Perceptron to achieve this discrimination. To do so, they used data from children
between two and six years of age, divided into two groups: the first, with 25 children, was used
to train the ANN Backpropagation Multilayer Perceptron; the second, with 26 children, was used
to predict the diagnosis. Despite the low number of cases during the training of the ANN
Backpropagation Multilayer Perceptron, the prediction had an accuracy of 92%.

Insulin administration

As a rule, insulin administration for diabetics follows parameters based on the experience and
intuition of the attending physician, and there is insufficient information in the scientific
literature to address the practical aspects of dose application. To study the subject, Gogou and
colleagues[26] used an ANN Backpropagation Multilayer Perceptron fed with information
obtained from experts in the UK and Greece through previously submitted questionnaires.
The ANN Backpropagation Multilayer Perceptron was trained with 100 cases and tested with
another 100 patient cases. The system correctly classified 92% of the test cases, showing itself
to be applicable to this problem.

Hemodynamic changes
Small changes that occur in a patient's physiology are difficult to detect, particularly in
intensive care units, where the environment is bombarded by an avalanche of control signals sent
simultaneously by various devices. Parmanto and colleagues[27] used an ANN for the classification
and detection of hemodynamic changes, since their early discovery, accompanied by appropriate
intervention, can lead to efficient patient care. Unlike many studies in Biomedical Engineering,
in which the data used are static and the ANN Backpropagation Multilayer Perceptron and the ANN
Radial Basis Function are powerful tools for medical decision support systems, this study used an
ANN Time-delay[28] to process data that are necessarily dynamic and updated in real time. The
ANN Time-delay was able to identify the hemodynamic conditions in 1138 (93%) of a total of 1224
situations; the remaining cases were 56 transition situations (5%) and 30 cases of noisy data
(2%), indicating the ANN Time-delay model for this type of dynamic application.

Transient ischemic attack and stroke

In order to study the prevalence of transient ischemic attack and stroke in populations, Barnes
and colleagues[29] used an ANN Backpropagation Multilayer Perceptron trained with
questionnaire data. Although the transient ischemic attack is a subjective phenomenon, both for
the patient and for the health professional, which may result in inconsistent interpretations, the
authors believe that the ANN concept constructed for this study facilitated the identification of
transient ischemic attacks and strokes.
A few more points

Although the book's purpose — the presentation of basic ideas about ANNs — has been fulfilled,
there are still some points that deserve to be mentioned:

- In some ANN models, the data used during real application continue to update and adjust
the function parameters, improving its accuracy more and more.

- ANN models differ radically from traditional programming techniques, since a software
structure like this does not depend on a programmer's a priori knowledge of possible
solutions.

- It's important to understand that, given the non-algorithmic way in which the ANN function is
created, one should not expect 100% success in the validation phase, much less during actual
application with new data. Keep in mind that ANN applications are meant precisely for those
problems where there is no straight, mathematically exact path to their solution.

- The idea of a rather complex function, whose parameters are adjusted as closely as possible to
an already past reality, will never be a guarantee of success in the future. In many practical
problems it's preferable to work with 70% or 80% of correct answers than to obtain no answer
at all.

- In some problems, as in the telephone company study, the information — dates, invoices,
minute counts — is collected automatically, without manual intervention. In other situations, as
in the gym study, the data used are typed in by hand, generally without later checks to guarantee
their accuracy. In the latter case, part of the data should be discarded after a detailed analysis
of inconsistencies.

- An important feature of ANNs is their generalization capability: they can extract useful output
from an imperfect, incomplete or noisy data set, thanks to the parallel processing of the input
data. This makes them quite fault tolerant, precisely because an error in one neuron can be
compensated for by the correct outputs of its neighboring elements.

- There are two types of ANN learning, supervised and unsupervised (a short sketch contrasting
the two follows this list):

In supervised learning, two sets of data are used, one of inputs and one of the
corresponding outputs. In the training phase, the inputs are presented to the ANN and it is
checked whether the calculated outputs correspond to the previously known outputs. If
not, the ANN adjusts its weights in order to store the required knowledge. This
phase is repeated with the same input and output data until the ANN hit rate falls
within a range considered satisfactory (as stated above, always below 100%).

In unsupervised learning, also known as self-supervised, there is no output data set, only
the set of inputs. In this case the ANN works on the input data by classifying them according
to its own criteria, that is, the neurons are used as classifiers and the data as classification
elements.
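To make the contrast concrete, here is a minimal sketch using scikit-learn; the library choice, the synthetic data and the use of k-means as a stand-in for an unsupervised, self-organizing method (it is not an ANN) are all assumptions of mine for illustration.

```python
# Supervised vs. unsupervised learning on the same synthetic inputs.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)   # known outputs, available only in the supervised case

# Supervised: inputs AND their known outputs are shown to the network during training.
supervised = MLPClassifier(hidden_layer_sizes=(6,), max_iter=2000, random_state=1).fit(X, y)
print("supervised hit rate on the training data:", supervised.score(X, y))

# Unsupervised: only the inputs are available; the algorithm groups them by its own criteria.
# (k-means is used here only as an illustrative non-ANN stand-in for self-organizing methods.)
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print("cluster assigned to the first case:", unsupervised.labels_[0])
```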
Thanks

When I think about artificial neural networks, the first person who comes to mind is Prof. Dr.
Renan Moritz Varnier Rodrigues de Almeida, my adviser in the Biomedical Engineering master's
program at COPPE-UFRJ. As soon as I told him about my interest in this subject, he returned
minutes later with several articles on studies that his laboratory had published in scientific
journals. Some time later, we were working together on my dissertation.
Specifically regarding this book, I thank my friends — in alphabetical order — Alexandre
Seixas, Bethania Teixeira, Carlos Gatts, Fernando Marcellino, Luiz Antonio Pereira and Newton
Mansur. All of them spent time on a meticulous reading, offering important contributions to
improving the text.
At home, I had the ever-critical reading of my children Luiza, Julia and Luiz Carlos, and my wife,
Odette. All of them made great contributions that even led me to change the book's course a little.
Thank you all!
Recommended reading

Applying Neural Networks: A Practical Guide


Kevin Swingler

Neural Network Learning: Theoretical Foundations


Martin Anthony, Peter L. Bartlett

An Introduction to Neural Networks


Kevin Gurney

Make Your Own Neural Network


Tariq Rashid

Introduction to the Math of Neural Networks


Jeff Heaton

Code Your Own Neural Network: A step-by-step explanation


Steven C. Shaffer
References

[1]
Mayhew, J.L., Piper, F.C., Koss, J.A., Montaldi, D.H., 1983, “Prediction of body composition in
female athletes”, Journal of Sports Medicine and Physical Fitness, v. 23, n. 3, p. 333-340.

[2]
Araujo, C.A.S., 2010, “Identificação dos fatores de aderência em programas de atividade
física em academias utilizando inteligência computacional” [master's dissertation], COPPE-UFRJ,
Biomedical Engineering Program.

[3]
Dishman, R.K., 1991, “Increasing and maintaining exercise and physical activity”, Behavior
Therapy, v. 22, n. 3, p. 345-378.
doi:10.1016/S0005-7894(05)80371-5

Fallon, E.A., Hausenblasb, H.A., Nigg, C.R., 2005, “The transtheoretical model and exercise
adherence: examining construct associations in later stages of change”, Psychology of Sport
and Exercise, v. 6, n. 6, p. 629-641.
doi:10.1016/j.psychsport.2005.01.003

Robison, J.I., Rogers, M.A., 1994, “Adherence to exercise programmes. Recommendations”,


Sports Medicine, v. 17, n. 1, p. 39-52.
doi:10.2165/00007256-199417010-00004

[4]
Costa, G.G., Alvarenga, A.G., Sposito-Araujo, C.A., Silva, R.M., 2007, “Classificação do
contorno em mamografias digitalizadas utilizando redes neurais artificiais do tipo ART-2”, XII
Congresso Brasileiro de Física Médica, Foz do Iguaçu, Brasil.

[5]
Gismondi, R.C., Almeida, R.M.V.R., Infantosi, A.F.C., 2002, “Artificial neural networks for
infant mortality modelling”, Computer Methods and Programs in Biomedicine, v. 69, n. 3, p.
237-247.
doi:10.1016/S0169-2607(02)00006-8

[6]
Pinheiro, C.A.R., 2005, “Redes neurais para prevenção de inadimplência em operadoras de
telefonia” [doctoral thesis], COPPE-UFRJ, Civil Engineering Program.
[7]
Mcculloch, W.S., Pitts, W., 1943, “A logical calculus of the ideas immanent in nervous
activity”, Bulletin of Mathematical Biophysics, v. 5, n. 4, p. 115-133.
doi:10.1007/BF02478259

[8]
Azevedo, F.M., Brasil, L.M., Oliveira, R.C.L., 2000, “Redes neurais: com aplicações em
controle e em sistemas especialistas”. 1 ed. Florianópolis, Brasil, Visual Books.

Oliveira Junior, H.A. (org), 2007, “Inteligência computacional: aplicada à administração,


economia e engenharia em Matlab”. 1 ed. São Paulo, Brasil, Thomson.

[9]
Rosenblatt, F., 1958, “The perceptron: a probabilistic model for information storage and
organization in the brain”, Psychological Review, v. 65, n. 6, p. 386-408.

[10]
Widrow, B., Hoff, M., 1960, Adaptive switching circuits, 1 ed. New York, USA, Institute
of Radio Engineers.

[11]
Kohonen, T., 1982, “Self-organized formation of topologically correct feature maps”,
Biological Cybernetics, v. 43, n. 1, p. 59-69.
doi:10.1007/BF00337288

Kohonen, T., 1982, “Analysis of a simple self-organizing process”, Biological Cybernetics, v.


44, n. 2, p. 135-140.
doi:10.1007/BF00317973

Kohonen, T., 1989, Self-organization and associative memory, 3 ed. New York, USA,
Springer-Verlag.

Kohonen, T., 1987, “Adaptive, associative, and self-organizing functions in neural computing”,
Applied Optics, v. 26, n. 23, p. 4910-4918.
doi:10.1364/AO.26.004910

Kohonen, T., 1988, “The ‘neural’ phonetic typewriter”, Computer, v. 21, n. 3, p. 11-22.
doi:10.1109/2.28

[12]
Werbos, P.J., 1987, “Building and understanding adaptive systems: a statistical/numerical
approach to factory automation and brain research”, IEEE Transactions on Systems, Man,
and Cybernetics, v. 17, n. 1, p. 7-20.
doi:10.1109/TSMC.1987.289329

Werbos, P.J., 1988, “Generalization of backpropagation with application to a recurrent gas


market model”, Neural Networks, v. 1, n. 4, p. 179-189.
doi:10.1016/0893-6080(88)90007-X

Werbos, P.J., 1990, “Consistency of HDP applied to a simple reinforcement learning problem”,
Neural Networks, v. 3, n. 2, p. 179-189.
doi:10.1016/0893-6080(90)90088-3

[13]
Carpenter, G.A., 1989, “Neural network models for pattern recognition and associative
memory”, Neural Networks, v. 2, n. 4, p. 243-257.
doi:10.1016/0893-6080(89)90035-X

Carpenter, G.A., Grossberg, S., 1987, “A Massively parallel architecture for a self-organizing
neural pattern recognition machine”, Computer Vision, Graphics, and Image Processing, v.
37, n. 1, p. 54-115.
doi:10.1016/S0734-189X(87)80014-2

Carpenter, G.A., Grossberg, S., 1987, “ART 2: self-organization of stable category recognition
codes for analog input patterns”, Applied Optics, v. 26, n. 23, p. 4919-4930.
doi:10.1364/AO.26.004919

Carpenter, G.A., Grossberg, S., 1988, “The ART of adaptive pattern recognition by a self-
organizing neural network”, Computer, v. 21, n. 3, p. 77-88.
doi:10.1109/2.33

Carpenter, G.A., Grossberg, S., 1990, “ART 3: hierarchical search using chemical transmitters
in self-organizing pattern recognition architectures”, Neural Networks, v. 3, n. 2, p. 129-152.
doi:10.1016/0893-6080(90)90085-Y

Carpenter, G.A., Grossberg, S., 2002, “Adaptive resonance theory”. In: ARBIB, M.A. (ed), The
handbook of brain theory and neural networks, 2 ed., part III, Cambridge, USA, MIT Press.

Carpenter, G.A., Grossberg, S., Mehanian, C., 1989, “Invariant recognition of cluttered
scenes by a self-organizing ART architecture: CORT-X boundary segmentation”, Neural
Networks, v. 2, n. 3, p. 169-181.
doi:10.1016/0893-6080(89)90002-6

Carpenter, G.A., Grossberg, S., Reynolds, J.H., 1991, “ARTMAP: supervised real-time learning
and classification of nonstationary data by a self-organizing neural network”, Neural
Networks, v. 4, n. 5, p. 565-588.
doi:10.1016/0893-6080(91)90012-T

Carpenter, G.A., Grossberg, S., Rosen, D.B., 1991, “Fuzzy ART: fast stable learning and
categorization of analog patterns by an adaptive resonance system”, Neural Networks, v. 4, n.
6, p. 759-771.
doi:10.1016/0893-6080(91)90056-B

Carpenter, G.A., Grossberg, S., Rosen, D.B., 1991, “ART 2-A: an adaptive resonance algorithm
for rapid category learning and recognition”, Neural Networks, v. 4, n. 4, p. 493-504.
doi:10.1109/IJCNN.1991.155329

[14]
Mobley, B.A., Leasure, R., Davidson, L., 1995, “Artificial neural network predictions of
lengths of stay on a post-coronary care unit”, Heart Lung, v. 24, n. 3, p. 251-256.
doi:10.1016/S0147-9563(05)80045-7

[15]
Maglaveras, N., Stamkopoulos, T., Diamantaras, K., Pappas, C., Strintzis, M., 1998, “ECG
pattern recognition and classification using non-linear transformations and neural networks: a
review”, International Journal of Medical Informatics, v. 52, n. 1, p. 191-208.
doi:10.1016/S1386-5056(98)00138-5

[16]
Soares, P.P.S., Nadal, J., 1999, “Aplicação de uma rede neural feedforward com algoritmo
de Levenberg-Marquardt para classificação de alterações do segmento ST do
eletrocardiograma”. In: Proceedings of the IV Brazilian Conference on Neural Networks, p.
384-389, São José dos Campos, ITA, Jul.

[17]
Hagan, M.T., Menhaj, M.B., 1994, “Training feedforward networks with the Marquardt
algorithm”, IEEE Transactions on Neural Networks, v. 5, n. 6, p. 989-993.
doi:10.1109/72.329697

[18]
Muniz, A.M.S., Nadal, J., 2009, “Application of principal component analysis in vertical
ground reaction force to discriminate normal and abnormal gait”, Gait & Posture, v. 29, n. 1,
p. 31-35.
doi:10.1016/j.gaitpost.2008.05.015

[19]
Frenkel, D., Nadal, J., 2000, “Comparação de métodos de representação do segmento ST na
detecção automática de isquemias miocárdicas”, Revista Brasileira de Engenharia
Biomédica, v. 16, n. 3, p. 153-162.

[20]
Akselrod, S., Norymberg, M., Peled, I., et al., 1987, “Computerised analysis of ST segment
changes in ambulatory electrocardiograms”, Medical and Biological Engineering and
Computing, v. 25, n. 5, p. 513-519.
doi:10.1007/BF02441743

[21]
Benhorim, J., Badilini, F., Moss, A.J., et al., 1996, “New approach to detection of ischemic-
type ST segment depression”. In: MOSS, A.J., STERN, S. (eds), Noninvasive
electrocardiology: clinical aspects of holter monitoring, chapter 19, London, England, W. B.
Saunders.
doi:10.1002/clc.4960200326

[22]
West, D., West, V., 2000, “Improving diagnostic accuracy using a hierarchical neural
network to model decision subtasks”, International Journal of Medical Informatics, v. 57, n.
1, p. 41-55.
doi:10.1016/S1386-5056(99)00059-3

[23]
Li, Y., Liu, L., Chiu, W., Jian, W., 2000, “Neural network modeling for surgical decisions on
traumatic brain injury patients”, International Journal of Medical Informatics, v. 57, n. 1, p.
1-9.
doi:10.1016/S1386-5056(99)00054-4

[24]
Teasdale, G.M., Jennett, B., 1974, “Assessment of Coma and impaired consciousness. A
practical scale”, Lancet, v. 304, n. 7872, p. 81-84.
doi:10.1016/S0140-6736(74)91639-0

Teasdale, G.M., Murray, L., 2000, “Revisiting the Glasgow Coma scale and Coma score”,
Intensive Care Medicine, v. 26, n. 2, p. 153-154.
doi:10.1007/s001340050037

[25]
Geetha, Y.V., Pratibha, K., Ashok, R., Ravindra, S.K., 2000, “Classification of childhood
disfluencies using neural networks”, Journal of Fluency Disorders, v. 25, n. 2, p. 99-117.
doi:10.1016/S0094-730X(99)00029-7

[26]
Gogou, G., Maglaveras, N., Ambrosiadou, B.V., et al., 2001, “A neural network approach in
diabetes management by insulin administration”, Journal of Medical Systems, v. 25, n. 2, p.
119-131.
doi:10.1023/A:1005672631019

[27]
Parmanto, B., Deneault, L.G., Denault, A.Y., 2001, “Detection of hemodynamic changes in
clinical monitoring by time-delay neural networks”, International Journal of Medical
Informatics, v. 63, n. 1-2, p. 91-99.
doi:10.1016/S1386-5056(01)00174-5

[28]
Lin, D.T., Dayhoff, J.E., Ligomenides, P.A., 1995, “Trajectory production with the adaptive
time-delay neural network”, Neural Networks, v. 8, n. 3, p. 447-461.
doi:10.1016/0893-6080(94)00104-T

Waibel, A., Hanazawa, T., Hinton, G., et al., 1989, “Phoneme recognition using time-delay
neural networks”, IEEE Transactions on Acoustics, Speech, and Signal Processing, v. 37, n.
3, p. 328-339.
doi:10.1109/29.21701

[29]
Barnes, R.W., Toole, J.F., Nelson, J.J., et al., 2006, “Neural networks for ischemic stroke”,
Journal of Stroke and Cerebrovascular Diseases, v. 15, n. 5, p. 223-227.
doi:10.1016/j.jstrokecerebrovasdis.2006.05.008
