
INTRODUCTION TO NEURAL NETWORKS

AI Sciences Publishing

How to contact us
Please address comments and questions concerning this book
to our customer service by email at:
contact@aisciences.net

Our goal is to provide high-quality books for your technical learning in Data Science and Artificial Intelligence subjects.

Thank you so much for buying this book.

If you notice any problem, please let us know by sending us an email at review@aisciences.net before writing any review online. It will be very helpful for us to improve the quality of our books.

Table of Contents

Table of Contents ............................................................ 4


About Us ....................................................................................... 10
About our Books ......................................................................... 11
To Contact Us: ............................................................................. 11
From AI Sciences Publishing .........................................12

Introduction to Artificial Neural Network .....................17


A Brief History of Neural Network ............................................. 17
Artificial Neural Network vs. Biological Neural Network? ........ 18
Real – Biological Neurons .......................................................... 18
Artificial Neurons ........................................................................ 19
What Is Artificial Neural Network? .............................. 22
Let Us Introduce ........................................................................ 22
Artificial Neural Network Layers ............................................... 23
a. Input Layer:............................................................................... 24
b. Hidden Layer ............................................................................ 24
c. Output Layer ............................................................................ 25
Structure of a Neural Network ................................................... 25
Learning Process ........................................................................ 27
Supervised Learning .................................................................... 27
Unsupervised Learning ............................................................... 28
Reinforcement Learning ............................................................. 28
Why Neural Networks? ................................................. 30
Let Us Introduce ......................................................................... 31
Fundamentals of Artificial Neural Networks ............................. 32
Network Topology ...................................................................... 33
Feed forward Network................................................................ 33
Feedback Network....................................................................... 35
Single Layer Recurrent Network ............................................... 36
Multilayer Recurrent Network ................................................. 37

Activation Functions .................................................................. 39
Linear Activation Function ........................................................ 39
Sigmoid Activation Function ..................................................... 40
Binary threshold signal function ................................................ 40
Bipolar threshold signal function .............................................. 41
Linear threshold (RAMP) signal function ................................ 41
Adjustments of Weights or Learning........................................... 41
Learning Paradigms ....................................................................41
Supervised Learning .................................................................... 42
Unsupervised Learning ............................................................... 43
Semi-Supervised Machine Learning .......................................... 44
Reinforcement Learning ............................................................. 45
Major Variants of Artificial Neural Network ................ 47
Multilayer Perceptron (MLP) ..................................................... 48
Activation Function ..................................................................... 49
Layers ............................................................................................. 50
Learning ......................................................................................... 50
Terminology.................................................................................. 52
Applications .................................................................................. 53
Convolutional neural networks ................................................... 53
The convolutional layer............................................................... 54
The Pooling Layer........................................................................ 55
The Output Layer ........................................................................ 57
Recurrent Neural Networks ....................................................... 57
Recurrent Neural Network Extensions ...................................... 60
Long Short-Term Memory ......................................................... 62
Deep Belief Networks ................................................................ 64
Deep Reservoir Computing ........................................................ 65

Tools and Technologies ................................................ 66


Major libraries ............................................................................ 66
OpenNN – Open Neural Network: ......................................... 66
Neural Network Libraries by Sony: .......................................... 66
Theano – Latest version: Theano 0.7 ......................................... 67
Torch - Torch | Scientific computing for LuaJIT. ................ 67

Caffe – Caffe: A Deep Learning Framework .......................... 67
TensorFlow ................................................................................... 67
MXNet - MXNet with Documentation ................................... 68
Keras .............................................................................................. 68
Lasagne .......................................................................................... 68
Blocks............................................................................................. 69
Pylearn2 ......................................................................................... 69
DeepPy .......................................................................................... 69
Deepnet ......................................................................................... 69
Gensim .......................................................................................... 69
nolearn ........................................................................................... 70
Passage ........................................................................................... 70
The Microsoft Cognitive Toolkit (CNTK) ............................... 70
FANN ............................................................................................ 70
Programming language support ................................................. 71
Python............................................................................................ 71
Java ................................................................................................. 71
Lisp ................................................................................................. 72
Prolog............................................................................................. 72
C++................................................................................................ 73
AIML ............................................................................................. 73
Practical implementations ............................................. 76
Text Classification ...................................................................... 76
Text Classification Using Neural Networks ............................ 76
Image Processing ........................................................................ 91
Recognizing Objects with Deep Learning ............................... 91
Building our Bird Classifier ...................................................... 106
Testing Our Network ................................................................ 111
Major NN projects ....................................................... 115
Recognition of Braille Alphabet Using Neural Networks ......... 115
Shuttle Landing Control ............................................................ 115
Music Classification by Genre Using Neural Networks ........... 115
Face Recognition Using Neural Network ................................. 116
Concept Learning and Classification - Hayes-Roth Data Set ... 116

Predicting Poker Hands with Neural Networks ....................... 116
Predicting Relative Performance of Computer Processors with
Neural Networks ....................................................................... 117
Predicting Survival of Patients Using Haberman's Data Set ..... 117
Predicting the Class of Breast Cancer with Neural Networks .. 117
Breast Tissue Classification Using Neural Networks ............... 117
Classification of Animal Species Using Neural Networks ........ 117
Car Evaluation Using Neural Networks ................................... 118
Lenses Classification Using Neural Networks ......................... 118
Balance Scale Classification Using Neural Networks ............... 119
Blood Transfusion Service Center ............................................. 119
Predicting the Result of Football Match with Neural
Networks ................................................................................... 120
Predicting the Workability of High-Performance Concrete ...... 120
Concrete Compressive Strength Test ........................................ 121
Glass Identification Using Neural Networks ............................ 121
Teaching Assistant Evaluation.................................................. 122
Predicting Protein Localization Sites Using Neural Networks . 122
Predicting the Religion of European States Using Neural
Networks ................................................................................... 122
Predicting the Burned Area of Forest Fires Using Neural
Networks ................................................................................... 124
Wine Classification Using Neural Networks ............................ 125
NeurophRM: Integration of the Neuroph Framework into
RapidMiner ............................................................................... 125

Open sources resources ................................................ 127

Issues and Challenges .................................................. 128


Uncertainty ................................................................................ 128
Lots and Lots of Data ................................................................ 128
Overfitting in Neural Networks ................................................ 129
Hyperparameter Optimization .................................................. 130
Requires High-Performance Hardware .................................... 130
Neural Networks Are Essentially a Blackbox ........................... 131
Lack of Flexibility and Multitasking ......................................... 132

Applications of ANN .................................................... 134


Speech Recognition ................................................................... 134

Character Recognition .............................................................. 134
Signature Verification Application ............................................ 135
Human Face Recognition ......................................................... 135
Image Compression .................................................................. 136
Stock Market Prediction ............................................................ 136
Traveling Salesman Problem ................................................... 137
Future in NN ............................................................................ 137

Summary ....................................................................... 139

Thank you ! ................................................................... 141

 Do you want to discover, learn and understand the methods
and techniques of artificial intelligence, data science,
computer science, machine learning, deep learning or
statistics?
 Would you like to have books that you can read very fast and
understand very easily?
 Would you like to practice AI techniques?
If the answers are yes, you are in the right place. The AI
Sciences book series is perfectly suited to your expectations!
Our books are the best on the market for beginners,
newcomers, students and anyone who wants to learn more
about these subjects without going into too much theoretical
and mathematical detail. Our books are among the best sellers
on Amazon in the field.
About Us

We are a group of experts, PhD students and young practitioners of Artificial Intelligence, Computer Science, Machine Learning and Statistics. Some of us work in big companies like Google, Facebook, Microsoft, KPMG, BCG and Mazars.
We decided to produce a series of books mainly dedicated to
beginners and newcomers on the techniques and methods of
Machine Learning, Statistics, Artificial Intelligence and Data
Science. Initially, our objective was to help only those who
wish to understand these techniques more easily and to be able
to start without too much theory and without a long reading.
Today we also publish more complete books on some topics
for a wider audience.
About our Books

Our books have had phenomenal success and they are today among the best sellers on Amazon. Our books have helped many people to progress and especially to understand these techniques, which are sometimes, rightly or wrongly, considered complicated.

The books we produce are short and very pleasant to read. They focus on the essentials so that beginners can quickly understand and practice effectively. You will never regret having chosen one of our books.

We also offer you completely free books on our website: visit our site and subscribe to our email list at www.aisciences.net. By subscribing to our mailing list, we also offer you all our new books for free, on an ongoing basis.
To Contact Us:

 Website: www.aisciences.net
 Email: contact@aisciences.net
Follow us on social media and share our publications
 Facebook: @aisciencesllc
 LinkedIn: AI Sciences

From AI Sciences Publishing

WWW.AISCIENCES.NET
eBooks, free eBook offers and online learning courses.

Did you know that AI Sciences offers a free eBook version of every book published? Please subscribe to our email list to be notified about our free eBook promotions. Get in touch with us at contact@aisciences.net for more details.

At www.aisciences.net, you can also read a collection of free books and receive exclusive free eBooks.

WWW.AISCIENCES.NET
Did you know that AI Sciences offers also online courses?
We want to help you in your career and take control of your
future with powerful and easy to follow courses in Data
Science, Machine Learning, Deep learning, Statistics and all
Artificial Intelligence subjects.

Most courses in Data Science and Artificial Intelligence simply bombard you with dense theory. Our courses don't throw complex maths at you, but focus on building up your intuition for infinitely better results down the line.

Please visit our website and subscribe to our email list to be notified about our free courses and promotions. Get in touch with us at academy@aisciences.net for more details.

© Copyright 2016 by AI Sciences
All rights reserved.
First Printing, 2016

Edited by Davies Company


Ebook Converted and Cover by Pixels Studio
Published by AI Sciences LLC

ISBN-13: 978-1985134560
ISBN-10: 198513456X

The contents of this book may not be reproduced, duplicated or


transmitted without the direct written permission of the author.

Under no circumstances will any legal responsibility or blame be held


against the publisher for any reparation, damages, or monetary loss
due to the information herein, either directly or indirectly.

Legal Notice:

You cannot amend, distribute, sell, use, quote or paraphrase any part
or the content within this book without the consent of the author.

Disclaimer Notice:

Please note the information contained within this document is for


educational and entertainment purposes only. No warranties of any
kind are expressed or implied. Readers acknowledge that the author
is not engaging in the rendering of legal, financial, medical or
professional advice. Please consult a licensed professional before
attempting any techniques outlined in this book.

By reading this document, the reader agrees that under no


circumstances is the author responsible for any losses, direct or
indirect, which are incurred as a result of the use of information
contained within this document, including, but not limited to, errors,
omissions, or inaccuracies.

Introduction to Artificial Neural Network
An Artificial Neural Network (ANN) is a computational model. It is based on the structure and functions of biological neural networks, and it works in a way similar to how the human (animal) brain processes information. It includes a large number of connected processing units called neurons that work together to process information and generate meaningful results from it. In this book, we will take you through a complete introduction to Artificial Neural Networks: their structure, the layers of an ANN, applications, algorithms, tools and technology, practical implementations, and the benefits and limitations of ANNs.

A Brief History of Neural Network

The history of neural networks arguably began in the late 1800s with scientific efforts to study the workings of the human brain. In 1890, William James published the first work about brain activity patterns.

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work. To describe how neurons in the brain might function, they modeled a simple neural network using electrical circuits that same year. However, the technology available at the time did not allow them to do very much.

Artificial Neural Network vs. Biological Neural Network?

Real – Biological Neurons

The human brain is a neural network. The fundamental element of this network is called a neuron. Our brain has about 10^11 neurons, and each of these neurons is connected to approximately 10^4 other neurons.

The structure of a neuron in the brain comprises four important parts:

Dendrite: It receives signals from other surrounding neurons. It may have any number of branches, where each dendrite branch is connected to one neuron.

Soma (the cell body): It is the body of the cell and contains the nucleus. It sums all the incoming signals to generate an input.

Axon: When the sum reaches a certain threshold value, the neuron fires a signal which travels down the axon and is transmitted to other neurons via the synapse terminals.

Synapses: The points of interconnection of one neuron with other neurons. The synapses of a neuron are connected to the dendrites of neighboring neurons. The amount of signal transmitted depends upon the strength (synaptic weights) of the connections.

Artificial Neurons

Our basic computational element (the model neuron) is frequently called a node or unit. It receives input from some other units, or perhaps from an external source. Each input has an associated weight w, which can be adjusted in order to model synaptic learning. The unit computes some function f of the weighted sum of its inputs:

yi = f(neti),  where  neti = Σj wij xj

Its output, in turn, can serve as input to other units.

A Simple Artificial Neuron

• The weighted sum is called the net input to unit i, usually written neti.
• Note that wij refers to the weight from unit j to unit i (not the other way around).
• The function f is the unit's activation function. In the simplest case, f is the identity function, and the unit's output is just its net input. This is called a linear unit.

An artificial neuron is a mathematical function conceived as a model of a biological neuron. Artificial neurons are the elementary units of an artificial neural network. An artificial neuron receives one or more inputs, each associated with a weight, plus a bias, and sums them to produce an output (or activation, representing the neuron's action potential, which is transmitted along its axon).
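As an illustration of the description above, here is a minimal sketch of a single artificial neuron in Python with NumPy. The input values, weights, bias and the choice of tanh as the activation function are illustrative assumptions, not values from the text.

import numpy as np

def neuron(inputs, weights, bias, f=np.tanh):
    # The net input is the weighted sum of the inputs plus the bias;
    # the unit's output is the activation function f applied to it.
    net = np.dot(weights, inputs) + bias
    return f(net)

# Example: three inputs with their associated weights and a bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.1])
print(neuron(x, w, bias=0.1))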

What Is Artificial Neural Network?

Let Us Introduce

To better understand artificial neural computing, it is important to first know how a conventional 'serial' computer and its software process information. A serial computer has a central processor that can address an array of memory locations where data and instructions are stored.

Computations are made by the processor reading an instruction, as well as any data the instruction requires, from memory addresses; the instruction is then executed and the results are saved in a specified memory location as required. In a serial system (and a standard parallel one too) the computational steps are deterministic, sequential and logical, and the state of a given variable can be tracked from one operation to the next.

In comparison, ANNs are not sequential or fundamentally deterministic. There are no complex central processors; rather, there are many simple ones which, for the most part, simply take the weighted sum of their inputs from other processors. ANNs do not execute programmed instructions; they respond in parallel (either simulated or real) to the pattern of inputs presented to them. There are also no separate memory addresses for storing data. Instead, information is contained in the overall activation 'state' of the network. 'Knowledge' is thus represented by the network itself, which is quite literally more than the sum of its individual components.

An Artificial Neural Network is an information processing technique. It works the way the human brain processes information. An ANN incorporates a large number of connected processing units that cooperate to process information and produce meaningful results from it.

Neural networks find wide application in data mining across many sectors, for instance economics and forensic science, and in pattern recognition. After careful training, they can also be used for classification of large amounts of data. We can apply neural networks not only to classification; they can also be applied to regression of continuous target values.

Artificial Neural Network Layers

An artificial neural network is typically organized in layers. Layers are made up of a number of interconnected ‘nodes’, each of which contains an ‘activation function’. A neural network may contain the following 3 layers:
a. Input Layer
b. Hidden Layer
c. Output Layer
Patterns are presented to the network via the 'input layer',
which communicates to one or more 'hidden layers' where the
actual processing is done via a system of weighted
'connections'. The hidden layers then link to an 'output layer'
where the answer is output as shown in the figure below.

a. Input Layer:

The purpose of the input layer is to receive as input the values


of the explanatory attributes for each observation. Usually, the
number of input nodes in an input layer is equal to the number
of explanatory variables. ‘Input layer’ presents the patterns to
the network, which communicates to one or more ‘hidden
layers’.

The nodes of the input layer are passive, meaning they do not change the data. They receive a single value on their input and duplicate the value to their many outputs. From the input layer, each value is duplicated and sent to all the hidden nodes.

b. Hidden Layer

The hidden layers apply given transformations to the input values inside the network. Incoming arcs from input nodes or from other hidden nodes connect to each node, and each node connects via outgoing arcs to output nodes or to other hidden nodes. In the hidden layer, the actual processing is done via a system of weighted ‘connections’. There may be one or more hidden layers. The values entering a hidden node are multiplied by weights, a set of predetermined numbers stored in the program. The weighted inputs are then added to produce a single number.

c. Output Layer

The hidden layers then link to an ‘output layer‘. Output layer


receives connections from hidden layers or from input layer. It
returns an output value that corresponds to the prediction of
the response variable. In classification problems, there is
usually only one output node. The active nodes of the output
layer combine and change the data to produce the output
values.

The ability of the neural network to provide useful data


manipulation lies in the proper selection of the weights. This is
different from conventional information processing.
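To make the three-layer picture concrete, here is a minimal sketch of an input/hidden/output network defined with Keras (one of the libraries discussed later in this book). The layer sizes and activation choices are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(4,)),               # input layer: 4 explanatory attributes
    layers.Dense(8, activation="relu"),     # hidden layer: weighted connections + activation
    layers.Dense(1, activation="sigmoid"),  # output layer: one node for a binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()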

Structure of a Neural Network

A neural network has at least two physical components,


namely, the processing elements and the connections between
them. The processing elements are called neurons, and the
connections between the neurons are known as links.

The structure of a neural network is also referred to as its ‘architecture’ or ‘topology’. It consists of the number of layers, the elementary units in each layer, and the weight adjustment mechanism that interconnects them. The choice of structure determines the results that will be obtained; it is the most critical part of the implementation of a neural network.

The simplest structure is the one in which units are distributed in two layers: an input layer and an output layer. Each unit in the input layer has a single input and a single output which is equal to the input. The output unit has all the units of the input layer connected to its input, with a combination function and a transfer function. There may be more than one output unit. In this case, the resulting model is a linear or logistic regression, depending on whether the transfer function is linear or logistic. The weights of the network are the regression coefficients.

By adding one or more hidden layers between the input and output layers, and units in those layers, the predictive power of the neural network increases. But the number of hidden layers should be kept as small as possible. This ensures that the neural network does not merely store all the information from the learning set but can generalize from it, to avoid overfitting.

Overfitting can occur when the weights make the system learn details of the learning set instead of discovering the underlying structure. This happens when the size of the learning set is too small in relation to the complexity of the model.

Whether a hidden layer is present or not, the output layer of the network can sometimes have many units, when there are many classes to predict.

Learning Process

Basically, learning means adapting to change as and when there is a change in the environment. An ANN is a complex system, or more precisely a complex adaptive system, which can change its internal structure based on the information passing through it.

A learning rule or learning process is a method or mathematical logic which improves the artificial neural network's performance; usually this rule is applied repeatedly over the network. It works by updating the weights and bias levels of a network when the network is simulated in a specific data environment. A learning rule may take the existing condition (weights and bias) of the network and compare the expected result with the actual result of the network to give new and improved values for the weights and bias. Depending on the complexity of the actual model being simulated, the learning rule of the network can be as simple as an XOR gate or a mean squared error criterion, or it can be the result of multiple differential equations. The learning rule is one of the factors which decides how fast and how accurately the artificial network can be developed. Depending upon the process used to develop the network, there are three main models of machine learning:
Supervised Learning

A learning algorithm falls under this category if the desired output for the network is also provided with the input while training the network. By providing the neural network with both an input and output pair, it is possible to calculate an error based on its target output and actual output. It can then use that error to make corrections to the network by updating its weights.

Unsupervised Learning

In this paradigm the neural network is only given a set of inputs


and it's the neural network's responsibility to find some kind
of pattern within the inputs provided without any external aid.
This type of learning paradigm is often used in data mining and
is also used by many recommendation algorithms due to their
ability to predict a user's preferences based on the preferences
of other similar users it has grouped together.

Reinforcement Learning

Reinforcement learning is similar to supervised learning in


that some feedback is given, however instead of providing a
target output a reward is given based on how well the system
performed. The aim of reinforcement learning is to maximize
the reward the system receives through trial-and-error. This
paradigm relates strongly with how learning works in nature,
for example an animal might remember the actions it's
previously taken which helped it to find food (the reward).

The possibility of learning has attracted the most interest in neural networks. Given a specific task to solve, and a class of functions F, learning means using a set of observations to find f* ∈ F which solves the task in some optimal sense.

This entails defining a cost function C: F → R such that, for the optimal solution f*, C(f*) ≤ C(f) for every f ∈ F – i.e., no solution has a cost less than the cost of the optimal solution.

The cost function C is an important idea in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost.

Let us see different learning rules used in neural networks (a short code sketch of the perceptron rule follows the list):

 Hebbian learning rule – It identifies how to modify the weights of the nodes of a network.
 Perceptron learning rule – The network starts its learning by assigning a random value to each weight.
 Delta learning rule – The modification in the synaptic weight of a node is equal to the multiplication of the error and the input.
 Correlation learning rule – The correlation rule is a supervised learning rule.
 Outstar learning rule – It can be used when it is assumed that the nodes or neurons in a network are arranged in a layer.
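As a small illustration, here is a sketch of the perceptron learning rule from the list above: start from random weights, then nudge each weight by the error times the input after every example. The AND-gate data and the learning rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])                  # a linearly separable target (logical AND)

w = rng.normal(size=2)                      # random initial weights
b = 0.0
lr = 0.1
for _ in range(20):
    for xi, ti in zip(X, y):
        out = 1 if xi @ w + b >= 0 else 0   # threshold unit
        w += lr * (ti - out) * xi           # weight update: error times input
        b += lr * (ti - out)

print(w, b)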

Why Neural Networks?

Objective of this Chapter


At the end of this chapter, the reader
should have learnt:
 Fundamentals of Artificial Neural Networks
 Network topology
 Activation functions
 Adjustments of weights or learning
 Learning paradigms

Let Us Introduce

Neural networks take a different approach to problem solving than conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That limits the problem-solving capability of conventional computers to problems that we already understand and know how to solve. However, computers would be a great deal more useful if they could do things that we don't exactly know how to do.

Neural networks process information in a way similar to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise valuable time is wasted or, even worse, the network may function incorrectly. The problem is that there is no way of knowing whether the system is faulty or not, unless an error occurs.

Neural nets are widely used in pattern recognition because of their ability to generalize and to respond to unexpected inputs/patterns. During training, neurons are taught to recognize various specific patterns and whether or not to fire when a given pattern is found. If a pattern is received during the execution stage that is not associated with an output, the neuron selects the output that corresponds to the pattern, from the set of patterns it has been taught, that is least different from the input. This is called generalization.

For example:

A 4-input neuron is trained to fire when the input is 1111 and not to fire when the input is 0000. After applying the generalization rule, the neuron will also fire when the input is 0111, 1011, 1101, 1110 or 1111, but will not fire when the input is 0000, 0001, 0010, 0100 or 1000. Some other inputs (like 0011) will produce an unpredictable output, since they are equally far from 0000 and 1111.

Pattern reconstruction is considerably more complicated and something that is extremely hard to do on conventional computers. For pattern reconstruction, feed-forward networks are insufficient; feedback is required in order to create a dynamic system that will produce the appropriate pattern. The output of every neuron is connected to the inputs of the neighboring neurons. These kinds of networks are called auto-associative networks.

An interesting experiment was carried out involving a neural network controlling a vehicle. The experiment was designed to compare human driving behavior with neural network driving behavior. The results showed a striking similarity between them: according to the results, neural nets can approximate human driving behavior with a maximum error of 5%.
Fundamentals of Artificial Neural Networks

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According to its topology, an ANN can be classified into the following types:

Feed forward Network

It is a non-recurrent network with processing units/nodes arranged in layers, where all the nodes in a layer are connected with the nodes of the previous layer. The connections have different weights. There is no feedback loop, which means the signal can only flow one way, from input to output. It may be divided into the following two types:

i. Single layer feed forward network:

This is a feed forward ANN having just a single weighted layer. In other words, the input layer is fully connected to the output layer.

The simplest kind of neural network is a single-layer network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. In this way it can be considered the simplest kind of feed-forward network. The sum of the products of the weights and the inputs is calculated in each node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units.

ii. Multilayer feed forward network:


This is a feed forward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, these are called hidden layers.

This class of networks consists of multiple layers of computational units, interconnected in a feed-forward way.

Each neuron in one layer has directed connections to the neurons of the subsequent layer. The first layer is the input layer, the last layer is the output layer, and the layers between the input and the output layers are the hidden layers. A hidden layer is internal to the network and has no direct connection with the external environment. There can be more than one hidden layer; however, theoretical work has shown that one hidden layer is sufficient to approximate any complex nonlinear function. The complexity of the network increases with the number of hidden layers. When the number of hidden layers is large, the effectiveness of the output response increases.

Multi-layer networks use a variety of learning techniques, the most popular being back-propagation. Here, the output values are compared with the correct answer to compute the value of some predefined error function. The algorithm then adjusts the weights of each connection in order to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the computations is small. In this case, one would say that the network has learned a certain target function. To adjust the weights properly, one applies a general method for non-linear optimization called gradient descent.
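The following is a minimal sketch of this back-propagation / gradient descent procedure on a tiny multilayer feed-forward network with one hidden layer, written with NumPy. The XOR-style data, layer sizes and learning rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for cycle in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)           # hidden layer activations
    out = sigmoid(h @ W2 + b2)         # network output
    err = out - y                      # compare output with the correct answer
    # backward pass: propagate the error and reduce it by a small amount
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(3))   # after enough training cycles the error becomes small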

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the signal can flow in both directions using loops. This makes it a non-linear dynamic system, which changes continuously until it reaches a state of equilibrium. It may be divided into the following kinds:

Recurrent networks: These are feedback networks with closed loops. Following are the two kinds of recurrent networks.

Single Layer Recurrent Network

If the feedback of the output of the processing elements is directed back as input to the processing elements in the same layer, it is called parallel feedback. Recurrent networks are feedback networks with closed loops. In a recurrent network, the output of a processing element can be directed back to the processing element itself, to the other processing elements, or both.

Multilayer Recurrent Network

Generally, a Recurrent Multi-Layer Network consists of


multiple layers of nodes. Each of these layers is feed-forward
except for the last layer, which can have feedback connections.
Here the output of a processing element can be directed back
to the nodes in the preceding layer or the same layer or both.
This forms a multilayer recurrent network.

Some variants of the recurrent network are:

i. Fully recurrent network:


It is the simplest neural network architecture because all nodes
are connected to all other nodes and each node works as both
input and output.

ii. Jordan network:


It is a closed loop network in which the output will go to the
input again as feedback as shown in the following diagram.

iii. Hopfield Network

Hopfield neural network consists of a single layer which


contains one or more fully connected recurrent neurons. The
Hopfield network is commonly used for auto-association and
optimization tasks.

iv. Long Short Term Memory

Long Short Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in subsequent work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term


dependency problem. Remembering information for long
periods of time is practically their default behavior, not
something they struggle to learn!

v. Elman Network

vi. Hierarchical RNN, etc.

Activation Functions

An activation function may be described as the extra transformation applied over the net input to get the desired output. In an ANN, we can apply activation functions over the input to obtain the correct output. The following are some activation functions of interest:

Linear Activation Function

It is also called the identity function, as it performs no input editing. It can be defined as

F(x) = x

This activation function is unbounded. It is a generalized model that allows signal functions other than the threshold function to be used. The output remains the same as the input.

Sigmoid Activation Function

It is of two kinds, as follows:

i. Binary sigmoidal function: This activation function maps the input into the range between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the larger the input, the higher the output. It can be defined as

F(x) = 1 / (1 + e^(-x))

ii. Bipolar sigmoidal function: This activation function maps the input into the range between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the sigmoid function. It can be defined as

F(x) = (2 / (1 + e^(-x))) - 1

Binary threshold signal function

The function is defined as

F(x) = 1 if x >= xth, and 0 if x < xth

where xth represents the threshold value. The output of this function is binary, i.e. either 0 or 1.

Bipolar threshold signal function

The function is defined as

F(x) = 1 if x >= xth, and -1 if x < xth

where xth represents the threshold value. The output of this function is bipolar, i.e. either 1 or -1.

Linear threshold (RAMP) signal function

This is a bounded version of the linear threshold function, which is defined as:

F(x) = 0 if x <= 0,  F(x) = x if 0 < x < 1,  and F(x) = 1 if x >= 1
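For reference, here are plain NumPy versions of the activation functions just described; x_th is the threshold value from the text, and the exact bounds of the RAMP function follow the common definition assumed above.

import numpy as np

def linear(x):                      return x                              # identity
def binary_sigmoid(x):              return 1.0 / (1.0 + np.exp(-x))       # output in (0, 1)
def bipolar_sigmoid(x):             return 2.0 / (1.0 + np.exp(-x)) - 1.0 # output in (-1, 1)
def binary_threshold(x, x_th=0.0):  return np.where(x >= x_th, 1, 0)
def bipolar_threshold(x, x_th=0.0): return np.where(x >= x_th, 1, -1)
def ramp(x):                        return np.clip(x, 0.0, 1.0)           # linear threshold (RAMP)

x = np.linspace(-3, 3, 7)
print(binary_sigmoid(x))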

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of adjusting the weights of the connections between the neurons of a specified network. Learning in an ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.
Learning Paradigms

Supervised Learning

Supervised learning is the task of inferring a function from labelled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal).

A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow the algorithm to correctly determine the class labels for unseen examples. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

The objective is to approximate the mapping function so well that whenever there is new input data (x), you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Examples of supervised learning algorithms (a short code sketch follows this list):


1. Logistic Regression
2. Decision trees
3. Support vector machine (SVM)
4. K-Nearest Neighbors
5. Naive Bayes
6. Random forest
7. Linear regression
8. Polynomial regression
9. SVM for regression
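A minimal sketch of this supervised workflow, using scikit-learn's small neural-network classifier; the Iris dataset and model settings are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)            # labelled (input, output) examples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)                    # learn the mapping Y = f(X)
print("accuracy on unseen examples:", clf.score(X_test, y_test))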

Unsupervised Learning

In the data science world, an unsupervised learning task tries to discover hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution.

Unsupervised learning is where you only have input data (X) and no corresponding output variables. The objective of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data.

This is called unsupervised learning because, unlike the supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, for example grouping customers by purchasing behavior.

Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, for example people that buy X also tend to buy Y.

Some prominent examples of unsupervised learning


algorithms are:

1. K-means for clustering problems.


2. Apriori algorithm for association rule learning
problems.
3. Hierarchical clustering
4. Hidden Markov models
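A minimal clustering sketch with scikit-learn's k-means; the two random blobs of points are illustrative, not a dataset from the text.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # two groups of unlabeled points
               rng.normal(3, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # no labels supplied
print(km.cluster_centers_)                    # the discovered group centres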

Semi-Supervised Machine Learning

Problems where you have a lot of input data (X) and just a
portion of the data is labelled (Y) are called semi-supervised
learning problems.

These problems sit in the middle of both supervised and


unsupervised learning.

A good example is a photo archive where only a portion of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.

Many real-world machine learning problems fall into this area. This is because it can be expensive or time-consuming to label data, as it may require access to domain experts, while unlabeled data is cheap and easy to collect and store.

You can use unsupervised learning techniques to discover and learn the structure in the input variables.

You can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the resulting model to make predictions on new data.

Reinforcement Learning

In reinforcement learning, data x is usually not given, but


generated by an agent’s interactions with the environment. At
each point in time t, the agent performs an action yt and the
environment generates an observation xt and an instantaneous
cost ct, according to some (usually unknown) dynamics. The
aim is to discover a policy for selecting actions that minimizes
some measure of a long-term cost, i.e. the expected cumulative
cost. The environment’s dynamics and the long-term cost for
each policy are usually unknown, but can be estimated. ANNs
are frequently used in reinforcement learning as part of the
overall algorithm. Tasks that fall within the paradigm of reinforcement learning include control problems, games and other sequential decision-making tasks.
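The following toy sketch illustrates that loop with tabular Q-learning rather than a neural network: the agent acts, the environment returns the next state and a reward, and a value table is updated by trial and error. The 5-state chain environment and all constants are illustrative assumptions.

import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != n_states - 1:            # the reward sits at the right-most state
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # trial-and-error update
        s = s_next

print(Q.round(2))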

Major Variants of Artificial Neural
Network

Objective of this Chapter


At the end of this chapter, the reader
should have learnt:
 Multilayer perceptron
 Convolutional Neural Networks
 Recurrent Neural Networks
 Recurrent Neural Network Extensions
 Long Short-Term Memory
 Deep belief Networks
 Deep Reservoir Computing

Multilayer Perceptron (MLP)

The field of artificial neural networks is often just called neural networks or multi-layer perceptrons, after perhaps the most useful kind of neural network. A perceptron is a single-neuron model that was a precursor to larger neural networks.

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP comprises at least three layers of nodes. Except for the input nodes, every node is a neuron that uses a nonlinear activation function. An MLP uses a supervised learning technique called back propagation for training. Its multiple layers and non-linear activations distinguish an MLP from a linear perceptron. It can distinguish data that is not linearly separable.

An MLP is a network of simple neurons called perceptrons. The basic concept of a single perceptron was introduced by Rosenblatt in 1958. The perceptron computes a single output from multiple real-valued inputs by forming a linear combination according to its input weights and then possibly putting the output through some nonlinear activation function.

Mathematically this can be written as

y = Ҩ(Σi ωi xi + b) = Ҩ(ω·x + b)

where ω denotes the vector of weights, x is the vector of inputs, b the bias and Ҩ is the activation function.

An MLP (or Artificial Neural Network - ANN) with a single


hidden layer can be represented graphically as follows:

Multilayer perceptrons are sometimes colloquially referred to
as "vanilla" neural networks, especially when they have a single
hidden layer.

Activation Function

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of every neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model. In multilayer perceptrons, some of the artificial neurons use a nonlinear activation function. This nonlinear activation function was developed to model the frequency of action potentials, or firing, of biological neurons.

The two common activation functions, which are both sigmoids, are described by

y(vi) = tanh(vi)   and   y(vi) = 1 / (1 + e^(-vi))

The first is a hyperbolic tangent that ranges from -1 to 1, while the other is the logistic function, which is similar in shape but ranges from 0 to 1. Here yi is the output of the ith node (neuron) and vi is the weighted sum of the input connections.

Alternative activation functions have been proposed, including the rectifier and softplus functions. More specialized activation functions include radial basis functions (used in radial basis networks, another class of supervised neural network models).

Layers

The multilayer perceptron consists of at least three layers, as follows:
i. an input layer
ii. an output layer
iii. one or more hidden layers
of nonlinearly-activating nodes, which is why it can be considered a deep neural network (DNN). Since MLPs are fully connected, each node in one layer connects with a certain weight wij to every node in the following layer.

Learning

Learning occurs in the perceptron by changing the connection weights after each piece of data is processed, based on the amount of error in the output compared with the expected result. This is an example of supervised learning, and it is carried out through back propagation, a generalization of the least mean squares algorithm in the linear perceptron.

We represent the error in output node j for the nth data point (training example) by

e_j(n) = d_j(n) - y_j(n)

where d is the target value and y is the value produced by the perceptron. The node weights are adjusted based on corrections that minimize the error in the entire output, given by

E(n) = (1/2) Σj e_j(n)^2

Using gradient descent, the change in each weight is

Δw_ji(n) = -ɳ (∂E(n) / ∂v_j(n)) y_i(n)

where yi is the output of the previous neuron and ɳ is the learning rate, which is chosen to ensure that the weights quickly converge to a response, without oscillations.

The derivative to be calculated depends on the induced local field vj, which itself varies. It is easy to show that for an output node this derivative can be simplified to

-∂E(n)/∂v_j(n) = e_j(n) ɸ'(v_j(n))

where ɸ' is the derivative of the activation function described above, which itself does not vary. The analysis is more difficult for the change in weights to a hidden node, but it can be shown that the relevant derivative is

-∂E(n)/∂v_j(n) = ɸ'(v_j(n)) Σk ( -∂E(n)/∂v_k(n) ) w_kj(n)

This depends on the change in the weights of the kth nodes, which represent the output layer. So, to change the hidden layer weights, the output layer weights change according to the derivative of the activation function, and in this way the algorithm back-propagates the activation function's derivative.

Terminology

"Multilayer perceptron" does not signify a single perceptron


that has various layers. Or maybe, it involves numerous
perceptrons that are systematized into layers. An option is
"multilayer perceptron network". In addition, MLP
"perceptrons" are not perceptrons in the severest conceivable
logic. Genuine perceptrons are formally a unique instance of
artificial neurons that utilization a threshold activation
function, for example, the Heaviside step function. MLP
perceptrons can utilize subjective activation functions. A
genuine perceptron performs binary classification (either one
of the available option), a MLP neuron is allowed to either
perform classification or regression, contingent on its
activation function. The term multilayer perceptron" later was
applied without respect to nature of the nodes/layers, which
can be a collection of haphazardly defined artificial neurons,
and not perceptrons explicitly. This interpretation avoids the

52
loosening of the definition of "perceptron" to mean an
artificial neuron in common.

Applications

MLPs are useful in research for their ability to solve problems stochastically, which often allows approximate solutions for extremely difficult problems like fitness approximation.

MLPs are universal function approximators, as shown by Cybenko's theorem, so they can be used to create mathematical models by regression analysis. As classification is a particular case of regression when the response variable is categorical, MLPs make good classifier algorithms.

MLPs were a popular machine learning solution in the 1980s, finding applications in diverse fields such as speech recognition, image recognition, and machine translation software, but subsequently faced strong competition from much simpler (and related) support vector machines. Interest in back propagation networks returned due to the successes of deep learning.

Convolutional neural networks

There are three basic components needed to define a basic convolutional network:

1. The convolutional layer
2. The pooling layer [optional]
3. The output layer

Let’s see each of these in a little more detail.

The convolutional layer

Assume we have an image of dimension 6*6. We define a weight matrix which extracts certain features from the image.

We have initialized the weight as a 3*3 matrix. This weight is now slid across the image such that all the pixels are covered at least once, to give a convolved output. The value 429 shown above is obtained by adding the values produced by element-wise multiplication of the weight matrix and the highlighted 3*3 part of the input image. Similarly, this weight matrix is passed over all positions and a fresh 4*4 matrix is created.

The 6*6 image is now converted into a 4*4 image. Think of the weight matrix like a paint brush painting a wall: the brush first paints the wall horizontally, then comes down and paints the next row horizontally. Pixel values are used again as the weight matrix moves along the image. This essentially enables parameter sharing in a convolutional neural network.

Let’s see what this looks like on a real image.
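Here is a minimal NumPy sketch of that sliding operation: a 3*3 weight matrix moved over a 6*6 image with stride 1 and no padding, giving a 4*4 convolved output. The pixel and weight values are illustrative, not those of the book's figure.

import numpy as np

image = np.arange(36, dtype=float).reshape(6, 6)     # toy 6*6 "image"
weights = np.array([[1, 0, -1],
                    [1, 0, -1],
                    [1, 0, -1]], dtype=float)        # toy 3*3 weight matrix

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # element-wise multiply the 3*3 patch by the weights and sum the result
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * weights)

print(out)   # the 6*6 image has become a 4*4 convolved output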

The Pooling Layer

Occasionally, when the images are very large, we want to reduce the number of trainable parameters. It is then preferred to periodically introduce pooling layers between successive convolution layers. Pooling is done for the sole purpose of reducing the spatial size of the image. Pooling is done independently on each depth dimension, so the depth of the image remains unaffected. The most common form of pooling layer is max-pooling.

Here we have taken the stride as 2, with a pooling size also of 2. The max operation is applied to every depth dimension of the convolved output. As you can see, the 4*4 convolved output has been converted to 2*2 after the max pooling operation.
Let’s see how max-pooling looks on a real image.

As you can see, we have taken the convolved image and applied max pooling to it. The max pooled image still preserves the information that it’s a car on a street. If you observe carefully, the dimensions of the image have been halved. This helps to reduce the parameters to a great extent.

Similarly, other forms of pooling can also be applied, like average pooling or L2 norm pooling.

The Output Layer

After multiple layers of convolution and padding, we need the
output in the form of a class. The convolution and pooling
layers only extract features and reduce the number of
parameters coming from the original images. To produce the
final output, we apply a fully connected layer whose output size
equals the number of classes we want; it would be hard to reach
that number with convolution layers alone. Convolution layers
produce 3D activation maps, whereas we just need an output
stating whether or not an image belongs to a particular class.
The output layer has a loss function such as categorical cross-
entropy to measure the error in prediction. Once the forward
pass is complete, backpropagation begins to update the weights
and biases so as to reduce the error and loss.
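As a rough sketch of what happens at the output layer (the
shapes and values below are invented for illustration), the
activation maps are flattened, passed through a fully connected
layer with one output per class, converted to probabilities with
softmax, and scored with categorical cross-entropy:

import numpy as np

np.random.seed(0)
num_classes = 3
activation_maps = np.random.rand(2, 2, 8)        # made-up 3D activation maps
features = activation_maps.reshape(-1)           # flatten to a 1D feature vector

W = np.random.randn(features.size, num_classes) * 0.01   # fully connected weights
b = np.zeros(num_classes)

logits = features.dot(W) + b
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax over the classes

y_true = np.array([0.0, 1.0, 0.0])               # one-hot label for the true class
loss = -np.sum(y_true * np.log(probs))           # categorical cross-entropy
print(probs, loss)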

Recurrent Neural Networks

A recurrent neural network (RNN) is a type of artificial neural
network where connections between elements (neurons) form
a directed cycle. This allows it to exhibit dynamic temporal
behaviour. Unlike feed-forward neural networks, RNNs can use
their internal memory to process arbitrary sequences of inputs.
This makes them applicable to tasks such as unsegmented,
connected handwriting recognition or speech recognition.

The idea behind RNNs is to make use of sequential
information. In a traditional neural network we assume that all
inputs (and outputs) are independent of each other. But for
many tasks that's a very bad idea. If you want to predict the
next word in a sentence, you had better know which words
came before it. RNNs are called recurrent because they perform
the same task for every element of a sequence, with the output
depending on the previous computations. Another way to think
about RNNs is that they have a "memory" which captures
information about what has been computed so far. In theory
RNNs can make use of information in arbitrarily long
sequences, but in practice they are limited to looking back only
a few steps. Here is what a typical RNN looks like:

The figure above shows an RNN being unrolled (or unfolded)
into a complete network. By unrolling we simply mean that we
write out the network for the whole sequence. For example, if
the sequence we are interested in is a sentence of 5 words, the
network would be unrolled into a 5-layer neural network, one
layer for each word. The formulas that govern the computation
happening in an RNN are as follows:

•  x_t is the input at time step t. For example, x_1 could be a
one-hot vector corresponding to the second word of a
sentence.

•  s_t is the hidden state at time step t. It's the "memory" of
the network. s_t is calculated based on the previous hidden
state and the input at the current step: s_t = f(U x_t + W s_{t-1}).
The function f is typically a nonlinearity such as tanh or ReLU.
s_{-1}, which is needed to compute the first hidden state, is
usually initialized to all zeroes.

•  o_t is the output at step t. For example, if we wanted to
predict the next word in a sentence, it would be a vector of
probabilities across our vocabulary: o_t = softmax(V s_t)
(sketched in code below).
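A minimal NumPy sketch of these formulas for a single time
step, assuming tanh as the nonlinearity f and made-up sizes,
could look like this:

import numpy as np

vocab_size, hidden_size = 8, 4
np.random.seed(0)
U = np.random.randn(hidden_size, vocab_size) * 0.1    # input-to-hidden weights
W = np.random.randn(hidden_size, hidden_size) * 0.1   # hidden-to-hidden weights
V = np.random.randn(vocab_size, hidden_size) * 0.1    # hidden-to-output weights

x_t = np.zeros(vocab_size)
x_t[2] = 1.0                       # one-hot input at time step t
s_prev = np.zeros(hidden_size)     # s_{t-1}, initialized to zeroes

s_t = np.tanh(U.dot(x_t) + W.dot(s_prev))              # s_t = f(U x_t + W s_{t-1})
o_t = np.exp(V.dot(s_t)) / np.sum(np.exp(V.dot(s_t)))  # o_t = softmax(V s_t)
print(o_t)                         # probabilities over the vocabulary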

There are a few things to note here:

•  You can think of the hidden state s_t as the memory of the
network. s_t captures information about what happened in all
the previous time steps. The output at step o_t is computed
solely based on the memory at time t. As briefly mentioned
above, it's a bit more complicated in practice because s_t
typically can't capture information from too many time steps
ago.

•  Unlike a traditional deep neural network, which uses
different parameters at each layer, an RNN shares the same
parameters (U, V, W above) across all the steps. This reflects
the fact that we are performing the same task at each step, just
with different inputs, and it greatly reduces the total number of
parameters we need to learn.

•  The figure above has outputs at every time step, but
depending on the task this may not be necessary. For instance,
when predicting the sentiment of a sentence we may only care
about the final output, not the sentiment after every single
word. Similarly, we may not need inputs at every time step. The
main feature of an RNN is its hidden state, which captures
some information about a sequence.

RNNs have shown great success in many NLP tasks. At this
point I should mention that the most commonly used type of
RNN is the LSTM, which is much better at capturing long-term
dependencies than vanilla RNNs are.

Recurrent Neural Network Extensions

Over the years researchers have developed more sophisticated
types of RNNs to deal with some of the shortcomings of the
vanilla RNN model. We will cover them in further detail in a
later topic, but I want this unit to serve as a brief overview so
that you are familiar with the taxonomy of models.

Bidirectional RNNs are based on the idea that the output at
time t may depend not only on the previous elements in the
sequence, but also on future elements. For example, to predict
a missing word in a sequence you want to look at both the left
and the right context. Bidirectional RNNs are quite simple: they
are just two RNNs stacked on top of each other. The output is
then computed based on the hidden states of both RNNs.

Deep (Bidirectional) RNNs are similar to Bidirectional RNNs,
except that we now have multiple layers per time step. In
practice this gives us a higher learning capacity (but we also
need a lot of training data).

LSTM networks are quite popular these days, and we briefly
talked about them in the section above. LSTMs don't have a
fundamentally different architecture from RNNs, but they use
a different function to compute the hidden state. The memory
in an LSTM is called a cell, and you can think of cells as black
boxes that take as input the previous state h_{t-1} and the
present input x_t. Internally these cells decide what to keep in
(and what to erase from) memory. They then combine the
previous state, the current memory, and the input. It turns out
that these kinds of units are very effective at capturing long-
term dependencies.

Long Short-Term Memory

Long short-term memory (LSTM) units (or blocks) are building
blocks for the layers of a recurrent neural network (RNN). An
RNN consisting of LSTM units is often referred to as an
"LSTM network". A common LSTM unit is composed of a
cell, an input gate, an output gate and a forget gate. The cell is
responsible for "remembering" values over arbitrary time
intervals; hence the word "memory" in LSTM. Each of the
three gates can be thought of as a "conventional" artificial
neuron, just like the ones in a multilayer (or feed-forward)
neural network: that is, they compute an activation (using an
activation function) of a weighted sum. Intuitively, they can be
understood as regulators of the flow of values that pass
through the connections of the LSTM; hence the name "gate".
There are connections between these gates and the cell.

The expression long short-term refers to the fact that LSTM is
a model for short-term memory which can last for a long
period of time. An LSTM is well suited to classify, process and
predict time series given time lags of unknown size and
duration between important events. LSTMs were developed to
deal with the exploding and vanishing gradient problems
encountered when training traditional RNNs. Relative
insensitivity to gap length gives LSTM an advantage over other
RNNs, hidden Markov models (HMMs) and other sequence
learning methods in numerous applications.

There are numerous architectures of LSTM units. A common
architecture consists of a memory cell, an input gate, an output
gate and a forget gate.

An LSTM (memory) cell stores a value (or state) for either a
long or a short period of time. This is achieved by using an
identity (or no) activation function for the memory cell. In this
way, when an LSTM network (that is, an RNN composed of
LSTM units) is trained with backpropagation through time, the
gradient does not tend to vanish.

The LSTM gates compute an activation, often using the logistic
function. Intuitively, the input gate controls the extent to which
a new value flows into the cell, the forget gate controls the
extent to which a value remains in the cell, and the output gate
controls the extent to which the value in the cell is used to
compute the output activation of the LSTM unit.

There are connections into and out of these gates, some of
which are recurrent. The weights of these connections, which
need to be learned during training, determine how the gates
operate. Each of the gates has its own parameters, that is,
weights and biases, possibly coming from other units outside
the LSTM unit.
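A minimal sketch of a single LSTM step, with made-up sizes
and biases omitted for brevity, that uses the logistic function
for the three gates as described above might look like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 5, 3
np.random.seed(0)
# one weight matrix per gate plus one for the candidate cell value,
# each acting on the concatenation of h_{t-1} and x_t
W_i, W_f, W_o, W_c = [np.random.randn(hidden_size, hidden_size + input_size) * 0.1
                      for _ in range(4)]

x_t = np.random.randn(input_size)     # present input
h_prev = np.zeros(hidden_size)        # previous hidden state h_{t-1}
c_prev = np.zeros(hidden_size)        # previous cell state

z = np.concatenate([h_prev, x_t])
i = sigmoid(W_i.dot(z))               # input gate: how much new value flows in
f = sigmoid(W_f.dot(z))               # forget gate: how much of the old cell remains
o = sigmoid(W_o.dot(z))               # output gate: how much of the cell is exposed
c_tilde = np.tanh(W_c.dot(z))         # candidate cell value
c_t = f * c_prev + i * c_tilde        # new cell state
h_t = o * np.tanh(c_t)                # new hidden state / output
print(h_t)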
Deep Belief Networks

A deep belief network (DBN) is a generative graphical model
in machine learning, or alternatively a class of deep neural
network, composed of multiple layers of latent variables
("hidden units"), with connections between the layers but not
between units within each layer.

When trained on a set of examples without supervision, a DBN
can learn to probabilistically reconstruct its inputs. The layers
then act as feature detectors. After this learning step, a DBN
can be further trained with supervision to perform
classification.

DBNs can be viewed as a composition of simple, unsupervised
networks such as Restricted Boltzmann Machines (RBMs) or
autoencoders, in which each sub-network's hidden layer serves
as the visible layer for the next. An RBM is an undirected,
generative energy-based model with a "visible" input layer, a
hidden layer, and connections between but not within layers.
This composition leads to a fast, layer-by-layer unsupervised
training procedure, where contrastive divergence is applied to
each sub-network in turn, starting from the "lowest" pair of
layers (the lowest visible layer is a training set).
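As a rough sketch of the layer-by-layer idea (all names, sizes
and data below are illustrative), one contrastive-divergence
(CD-1) update for a single binary RBM layer could look like the
following; a DBN would repeat this for each pair of layers,
feeding the hidden activations of one trained RBM in as the
visible data of the next:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = np.random.randn(n_visible, n_hidden) * 0.1              # RBM weights (biases omitted)

v0 = (np.random.rand(10, n_visible) > 0.5).astype(float)    # a made-up batch of binary data

# one step of contrastive divergence (CD-1)
h0_prob = sigmoid(v0.dot(W))                                # hidden given visible
h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(float)
v1_prob = sigmoid(h0.dot(W.T))                              # reconstruct the visible layer
h1_prob = sigmoid(v1_prob.dot(W))                           # hidden given the reconstruction

# move the weights toward the data statistics and away from the reconstruction statistics
W += lr * (v0.T.dot(h0_prob) - v1_prob.T.dot(h1_prob)) / v0.shape[0]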

Teh's observation that DBNs can be trained greedily, one layer
at a time, led to one of the first effective deep learning
algorithms. There are many attractive implementations and
uses of DBNs in real-life applications and scenarios (e.g.,
electroencephalography and drug discovery).

Deep Reservoir Computing

Reservoir computing is a framework for computation that may
be viewed as an extension of neural networks. Typically an
input signal is fed into a fixed (random) dynamical system called
a reservoir, and the dynamics of the reservoir map the input to
a higher dimension. A simple readout mechanism is then
trained to read the state of the reservoir and map it to the
desired output. The main benefit is that training is performed
only at the readout stage while the reservoir stays fixed. Liquid-
state machines and echo state networks are the two main
categories of reservoir computing.
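A minimal echo state network sketch (the sizes and the toy
next-value prediction task below are made up) shows the idea:
the input and reservoir weights stay fixed and random, and only
the linear readout is fitted, here with ridge regression:

import numpy as np

np.random.seed(0)
n_in, n_res, T = 1, 50, 300
W_in = np.random.randn(n_res, n_in) * 0.5
W_res = np.random.randn(n_res, n_res)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))   # keep the spectral radius below 1

u = np.sin(np.linspace(0, 20, T)).reshape(-1, 1)           # made-up input signal
target = np.roll(u, -1, axis=0)                            # toy task: predict the next value

# drive the fixed reservoir with the input and collect its states
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in.dot(u[t]) + W_res.dot(x))
    states[t] = x

# train only the readout with ridge regression; the reservoir is never modified
ridge = 1e-6
W_out = np.linalg.solve(states.T.dot(states) + ridge * np.eye(n_res),
                        states.T.dot(target))
prediction = states.dot(W_out)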

The extension of the reservoir computing framework towards
deep learning, with the introduction of deep reservoir
computing and of the deep Echo State Network (deepESN)
model, makes it possible to develop efficiently trained models
for hierarchical processing of temporal data, while at the same
time enabling investigation of the inherent role of layered
composition in recurrent neural networks.

Results show that a single CA (Cellular Automata) reservoir
system produces results comparable to state-of-the-art work. A
system composed of two layered reservoirs shows a clear
improvement over a single CA reservoir. This shows potential
for further research and offers valuable insight into how to
design CA reservoir systems.

Tools and Technologies

Major libraries

Some of the major libraries used in the implementation of
neural networks are:

OpenNN – Open Neural Network:

OpenNN is an open source class library written in the C++
programming language which implements neural networks, a
key area of machine learning research. The library supports any
number of layers of non-linear processing units for supervised
learning. This deep architecture allows the design of neural
networks with universal approximation properties. The main
advantage of OpenNN is its high performance. It is written in
C++ for better memory management and greater processing
speed, and it implements CPU parallelization by means of
OpenMP and GPU acceleration with CUDA.

Neural Network Libraries by Sony:

Neural Network Libraries allows you to define a computation
graph (neural network) intuitively with less code. Dynamic
computation graph support permits flexible runtime network
construction, and the library can use both the static and the
dynamic graph paradigms. The library is built with portability
in mind, with CIs run for Linux and Windows. Most of the
code is written in C++11, and by implementing the C++11
core API you can deploy it onto embedded devices. It offers a
clean function abstraction as well as a code template generator
for creating new functions, which lets developers write a new
function with a small amount of code. A new device code can
be added as a plugin without any alteration of the library code;
CUDA is in fact implemented as a plugin extension.

Theano – Latest version: Theano 0.7

This is a very extensible neural network library for use with
Python. It is capable of running on both CPU and GPU, and it
is regarded as having some of the best documentation among
the available neural network libraries.

Torch - Torch | Scientific computing for LuaJIT.

This too is a very flexible library, with capabilities and
performance often comparable to Theano. However, it is used
with the Lua language, which is not a widely known language
and lacks many of the standard data processing libraries that
languages like Python have.

Caffe – Caffe: A Deep Learning Framework

Written for C++ users with CUDA, this library is particularly
optimized for vision tasks. I believe it is frequently among the
fastest libraries when benchmarked on vision tasks.

TensorFlow

Link to the detailed documentation of TensorFlow:
https://www.tensorflow.org/.
Recently open-sourced by Google, TensorFlow can be thought
of as more or less one of these neural net libraries with a
modular interface on top of it. Some have criticized it for not
being as fast as some of the other heavily optimized libraries,
but in practice it performs competitively.

MXNet - MXNet with Documentation

This library provides a simple and modular way of building up
a neural network and training it. It is also often among the
fastest libraries available. On the other hand, I have found it to
be somewhat lacking in flexibility and short on documentation.

Keras

Keras is a prominent open source library written in Python for
building neural networks. It is capable of running on top of
MXNet, Deeplearning4j, TensorFlow, Microsoft Cognitive
Toolkit (CNTK) or Theano. The library contains numerous
implementations of commonly used neural network building
blocks such as layers, objectives, activation functions and
optimizers, plus a host of tools to make working with image
and text data easier.
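For instance, a tiny Keras model assembled from these building
blocks (the layer sizes here are arbitrary examples) might look
like this:

from keras.models import Sequential
from keras.layers import Dense

# a small fully connected network: the layer sizes are arbitrary examples
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5, batch_size=32) would then train it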

Lasagne

This is a lightweight library for building and training artificial
neural networks in Theano. It supports Convolutional Neural
Networks (CNNs) as well as recurrent networks, including
Long Short-Term Memory (LSTM). It provides transparent
support for CPUs and GPUs thanks to Theano's expression
compiler. You can use it if you want the flexibility of Theano
but don't want to write neural network layers from scratch
every time.

Blocks

Blocks is a framework that helps you build neural network
models on top of Theano.

Pylearn2

Pylearn2 is a library that contains many of the models and
training algorithms, such as Stochastic Gradient Descent, that
are commonly used in deep learning. Its functionality is built
on top of Theano.

DeepPy

DeepPy is another Python deep learning framework, built on
top of NumPy.

Deepnet

This is a GPU-based implementation of deep learning
algorithms written in Python. It includes feed-forward neural
nets, Restricted Boltzmann Machines (RBMs), Deep Belief
Nets (DBNs), autoencoders, Deep Boltzmann Machines and
convolutional neural nets.

Gensim

Gensim is a deep learning toolkit written in the Python
programming language. It was designed for handling large text
collections using efficient algorithms.

nolearn

This library contains a number of wrappers and abstractions
around existing neural network libraries. Just as Keras wraps
Theano and TensorFlow to provide a friendly API, nolearn
provides wrappers and abstractions around Lasagne, along
with a handful of machine learning utility modules.

Passage

Passage is a library well suited to text analysis with RNNs.

The Microsoft Cognitive Toolkit(CNTK)

The Microsoft Cognitive Toolkit (CNTK) is likewise a deep-
learning toolkit that describes neural networks as a series of
computational steps via a directed graph. CNTK allows you to
easily realize and combine popular model types such as feed-
forward DNNs, convolutional nets (CNNs), and recurrent
networks (RNNs/LSTMs). It implements stochastic gradient
descent (SGD, error backpropagation) learning with automatic
differentiation and parallelization across multiple GPUs and
servers. CNTK has been available under an open-source
license since April 2015.

FANN

Fast Artificial Neural Network Library (FANN) is a free open
source neural network library which implements multilayer
artificial neural networks in C, with support for both fully
connected and sparsely connected networks. Cross-platform
execution in both fixed and floating point is supported. It
includes a framework for easy handling of training data sets. It
is easy to use, versatile, well documented, and fast. Bindings to
more than 20 programming languages are available. An easy-
to-read introductory article and a reference manual accompany
the library, with examples and recommendations on how to use
it. Several graphical user interfaces are also available for the
library.

Programming language support

Python

Python is one of the most widely used programming languages
in Artificial Intelligence because of its simplicity. It works
seamlessly with the data structures and other frequently used
AI algorithms.

The choice of Python for AI projects also stems from the fact
that there are plenty of useful libraries available: for instance,
NumPy provides scientific computation capability, SciPy
supports advanced computing, and PyBrain offers machine
learning in Python.

You will also have no difficulty learning Python for AI, as there
are plenty of resources available online.

Java

Java is also a good choice. It is an object-oriented programming
language that focuses on providing all the high-level features
needed to work on AI projects; it is portable, and it offers built-
in garbage collection. The Java community is also a plus, as
there will be someone to help you with your queries and
problems.

Java is also a great choice because it offers an easy way to
develop algorithms, and AI is full of algorithms, be they search
algorithms, natural language processing algorithms or neural
network algorithms. Not to mention that Java also allows for
scalability, which is a must-have feature for AI projects.

Lisp

Lisp performs very well in the AI field because of its excellent
prototyping capabilities and its support for symbolic
expressions. It is a powerful programming language and is used
in major AI projects such as Macsyma, DART, and CYC.

Lisp is frequently used in the machine learning / ILP sub-field
because of its usability and symbolic structure. Peter Norvig,
the well-known computer scientist who works extensively in
the AI field, and co-author of the famous AI book "Artificial
Intelligence: A Modern Approach," explains in a Quora answer
why Lisp is one of the best programming languages for AI
development.

Prolog

Prolog stands alongside Lisp when it comes to worth and
usability. According to the book Prolog Programming for
Artificial Intelligence, Prolog provides a small set of simple
mechanisms that can be enormously useful for AI
programming. For instance, it offers pattern matching,
automatic backtracking, and tree-based data structuring
mechanisms. Combining these mechanisms provides a flexible
framework to work with.

Prolog is widely used in expert systems for AI and is also useful
for working on medical projects.

C++

C++ is among the fastest programming languages available. Its
ability to work close to the hardware level allows developers to
improve their programs' execution time. C++ is extremely
useful for AI projects that are time-sensitive; search engines,
for example, can make extensive use of C++.

In AI, C++ can be used for statistical AI approaches like those
found in neural networks. Algorithms can also be written in
C++ for fast execution, and AI in games is mostly coded in
C++ for faster execution and response time.

AIML

AIML (Artificial Intelligence Markup Language) is an XML
dialect used to implement Artificial Linguistic Internet
Computer Entity (A.L.I.C.E.)-type chatterbots. The language
has categories representing a unit of knowledge, patterns
describing possible utterances addressed to a chatbot, and
templates of possible answers.

Information Processing Language (IPL) was the first language
created for artificial intelligence. It includes features intended
to support programs that can perform general problem
solving, such as lists, associations, schemas (frames), dynamic
memory allocation, data types, recursion, associative retrieval,
functions as arguments, generators (streams), and cooperative
multitasking.

Smalltalk has been used extensively for simulations, neural
networks, machine learning and genetic algorithms. It uses a
pure and elegant form of object-oriented programming based
on message passing.

Stanford Research Institute Problem Solver (STRIPS) is a
language for expressing automated planning problem
instances. It defines the initial state, the goal states, and a set of
actions. For each action, pre-conditions (what must hold
before the action is performed) and post-conditions (what
holds after the action is performed) are specified.

Planner is a hybrid between procedural and logical languages.
It gives a procedural interpretation to logical sentences, where
implications are interpreted with pattern-directed inference.

POP-11 is a reflective, incrementally compiled programming
language with many of the features of an interpreted language.
It is the core language of the Poplog programming
environment, developed originally by the University of Sussex
and more recently by the School of Computer Science at the
University of Birmingham, which hosts the Poplog website. It
is often used to introduce symbolic programming techniques
to programmers of more conventional languages like Pascal,
who find POP syntax more familiar than that of Lisp. One of
POP-11's features is that it supports first-class functions.

Haskell is also a capable programming language for AI. Lazy
evaluation and the list and LogicT monads make it easy to
express non-deterministic algorithms, which is often required.
Infinite data structures are great for search trees. The language's
features allow a compositional way of expressing algorithms.
The only drawback is that working with graphs is a bit harder
at first because of purity.

Wolfram Language contains a wide range of integrated
machine learning capabilities, from highly automated functions
like Predict and Classify to functions based on specific
methods and diagnostics. The functions work on many types
of data, including numerical, categorical, time series, textual,
and image data.

Other programming languages that can be used for artificial
neural networks include:
MATLAB
Perl
Julia
Practical implementations
Text Classification

Text Classification Using Neural Networks

Understanding the way chatbots work is very important. An
important piece of machinery inside a chatbot is the text
classifier. Let's look at the inner workings of an artificial neural
network (ANN) for text classification.

We'll use 2 layers of neurons (1 hidden layer) and a "bag of
words" approach to organize our training data.

Text classification comes in 3 flavors: pattern matching,
algorithms, and neural nets.

Although the algorithmic approach using Multinomial Naive
Bayes is surprisingly effective, it suffers from 3 fundamental
flaws:
•  The algorithm produces a score rather than a probability.
We want a probability so we can ignore predictions below
some threshold. This is akin to a 'squelch' dial on a VHF radio.
•  The algorithm 'learns' from examples of what is in a class,
but not what isn't. Learning patterns of what does not belong
to a class is frequently very important.
•  Classes with disproportionately large training sets can
produce distorted classification scores, forcing the algorithm
to adjust scores relative to class size. This is not ideal.

As with its 'Naive' counterpart, this classifier isn't attempting to
understand the meaning of a sentence, it's just trying to classify
it. In fact, so-called "AI chatbots" do not understand language,
but that's a topic for another time.

Let's examine our text classifier one section at a time. We will
take the following steps:
1. refer to the libraries we need
2. provide training data
3. organize our data
4. iterate: code + test the results + tune the model
5. abstract

The code is below, and we're using an IPython notebook, which
is a super productive way of working on data science projects.
The code syntax is Python.

We begin by importing our natural language toolkit, nltk. We
need a way to reliably tokenize sentences into words and a way
to stem words.

# use natural language toolkit

import nltk
from nltk.stem.lancaster import LancasterStemmer
import os
import json
import datetime
stemmer = LancasterStemmer()

And our training data, 12 sentences belonging to


3 classes (‘intents’).

# 3 classes of training data
training_data = []
training_data.append({"class":"greeting",
"sentence":"how are you?"})
training_data.append({"class":"greeting",
"sentence":"how is your day?"})
training_data.append({"class":"greeting",
"sentence":"good day"})
training_data.append({"class":"greeting",
"sentence":"how is it going today?"})

training_data.append({"class":"goodbye",
"sentence":"have a nice day"})
training_data.append({"class":"goodbye",
"sentence":"see you later"})
training_data.append({"class":"goodbye",
"sentence":"have a nice day"})
training_data.append({"class":"goodbye",
"sentence":"talk to you soon"})

training_data.append({"class":"sandwich",
"sentence":"make me a sandwich"})
training_data.append({"class":"sandwich",
"sentence":"can you make a sandwich?"})
training_data.append({"class":"sandwich",
"sentence":"having a sandwich today?"})
training_data.append({"class":"sandwich",
"sentence":"what's for lunch?"})
print ("%s sentences in training data" %
len(training_data))

12 sentences in training data


We can now organize our data structure for documents, classes
and words.
words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our training data
for pattern in training_data:
    # tokenize each word in the sentence
    w = nltk.word_tokenize(pattern['sentence'])
    # add to our words list
    words.extend(w)
    # add to documents in our corpus
    documents.append((w, pattern['class']))
    # add to our classes list
    if pattern['class'] not in classes:
        classes.append(pattern['class'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = list(set(words))

# remove duplicates
classes = list(set(classes))

print (len(documents), "documents")


print (len(classes), "classes", classes)
print (len(words), "unique stemmed words",
words)

12 documents
3 classes ['greeting', 'goodbye', 'sandwich']
26 unique stemmed words ['sandwich', 'hav', 'a',
'how', 'for', 'ar', 'good', 'mak', 'me', 'it',
'day', 'soon', 'nic', 'lat', 'going', 'you',
'today', 'can', 'lunch', 'is', "'s", 'see',
'to', 'talk', 'yo', 'what']

Observe that each word is stemmed and lower-cased. Stemming
helps the machine equate words like "have" and "having". We
don't care about case.

Our training data is transformed into a "bag of words" for each
sentence.

# create our training data
training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)

# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # create our bag of words array
    for w in words:
        bag.append(1) if w in pattern_words else bag.append(0)

    training.append(bag)
    # output is a '0' for each tag and '1' for current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    output.append(output_row)

# sample training/output
i = 0
w = documents[i][0]

print ([stemmer.stem(word.lower()) for word in
w])
print (training[i])
print (output[i])

['how', 'ar', 'you', '?']


[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]

The above step is a classic in text classification: each training
sentence is reduced to an array of 0's and 1's against the array
of unique words in the corpus.

['how', 'are', 'you', '?']

is stemmed:

['how', 'ar', 'you', '?']

then transformed to the input: a 1 for each word in the bag (the
? is ignored)

[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and the output: the first class

[1, 0, 0]

Note that a sentence could be given multiple classes, or none.
Make sure the above makes sense and play with the code until
you fully understand it.

Your first step in machine learning is to have clean data.

Next we have the core functions for our 2-layer neural
network.

We use the numpy library because it makes matrix
multiplication fast.

We use a sigmoid function to normalize values and its
derivative to measure the error rate, iterating and adjusting until
the error rate is acceptably small.

Below we also implement our bag-of-words function,
transforming an input sentence into an array of 0's and 1's. This
matches exactly the transform used for our training data, and it
is crucial to get this right.

import numpy as np
import time

# compute sigmoid nonlinearity
def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

# convert output of sigmoid function to its derivative
def sigmoid_output_to_derivative(output):
    return output*(1-output)

def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=False):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words
    bag = [0]*len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)

    return(np.array(bag))

def think(sentence, show_details=False):
    x = bow(sentence.lower(), words, show_details)
    if show_details:
        print ("sentence:", sentence, "\n bow:", x)
    # input layer is our bag of words
    l0 = x
    # matrix multiplication of input and hidden layer
    l1 = sigmoid(np.dot(l0, synapse_0))
    # output layer
    l2 = sigmoid(np.dot(l1, synapse_1))
    return l2

And now we code our neural network training function to
create the synaptic weights. Don't get too excited: this is mostly
matrix multiplication, straight from middle-school math class.

def train(X, y, hidden_neurons=10, alpha=1, epochs=50000, dropout=False, dropout_percent=0.5):

    print ("Training with %s neurons, alpha:%s, dropout:%s %s" % (hidden_neurons, str(alpha), dropout, dropout_percent if dropout else '') )
    print ("Input matrix: %sx%s    Output matrix: %sx%s" % (len(X), len(X[0]), 1, len(classes)) )
    np.random.seed(1)

    last_mean_error = 1
    # randomly initialize our weights with mean 0
    synapse_0 = 2*np.random.random((len(X[0]), hidden_neurons)) - 1
    synapse_1 = 2*np.random.random((hidden_neurons, len(classes))) - 1

    prev_synapse_0_weight_update = np.zeros_like(synapse_0)
    prev_synapse_1_weight_update = np.zeros_like(synapse_1)

    synapse_0_direction_count = np.zeros_like(synapse_0)
    synapse_1_direction_count = np.zeros_like(synapse_1)

    for j in iter(range(epochs+1)):

        # Feed forward through layers 0, 1, and 2
        layer_0 = X
        layer_1 = sigmoid(np.dot(layer_0, synapse_0))

        if(dropout):
            layer_1 *= np.random.binomial([np.ones((len(X), hidden_neurons))], 1-dropout_percent)[0] * (1.0/(1-dropout_percent))

        layer_2 = sigmoid(np.dot(layer_1, synapse_1))

        # how much did we miss the target value?
        layer_2_error = y - layer_2

        if (j % 10000) == 0 and j > 5000:
            # if this 10k iteration's error is greater than the last iteration, break out
            if np.mean(np.abs(layer_2_error)) < last_mean_error:
                print ("delta after "+str(j)+" iterations:" + str(np.mean(np.abs(layer_2_error))) )
                last_mean_error = np.mean(np.abs(layer_2_error))
            else:
                print ("break:", np.mean(np.abs(layer_2_error)), ">", last_mean_error )
                break

        # in what direction is the target value?
        # were we really sure? if so, don't change too much.
        layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)

        # how much did each l1 value contribute to the l2 error (according to the weights)?
        layer_1_error = layer_2_delta.dot(synapse_1.T)

        # in what direction is the target l1?
        # were we really sure? if so, don't change too much.
        layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)

        synapse_1_weight_update = (layer_1.T.dot(layer_2_delta))
        synapse_0_weight_update = (layer_0.T.dot(layer_1_delta))

        if(j > 0):
            synapse_0_direction_count += np.abs(((synapse_0_weight_update > 0)+0) - ((prev_synapse_0_weight_update > 0) + 0))
            synapse_1_direction_count += np.abs(((synapse_1_weight_update > 0)+0) - ((prev_synapse_1_weight_update > 0) + 0))

        synapse_1 += alpha * synapse_1_weight_update
        synapse_0 += alpha * synapse_0_weight_update

        prev_synapse_0_weight_update = synapse_0_weight_update
        prev_synapse_1_weight_update = synapse_1_weight_update

    now = datetime.datetime.now()

    # persist synapses
    synapse = {'synapse0': synapse_0.tolist(), 'synapse1': synapse_1.tolist(),
               'datetime': now.strftime("%Y-%m-%d %H:%M"),
               'words': words,
               'classes': classes
              }
    synapse_file = "synapses.json"

    with open(synapse_file, 'w') as outfile:
        json.dump(synapse, outfile, indent=4, sort_keys=True)
    print ("saved synapses to:", synapse_file)

We are now ready to build our neural network model; we will
save it as a JSON structure representing our synaptic weights.
You should experiment with different values of 'alpha' (the
gradient descent parameter) and see how it affects the error
rate. This parameter helps our error adjustment find the lowest
error rate:

synapse_0 += alpha * synapse_0_weight_update

We use 20 neurons in our hidden layer; you can adjust this
easily. These parameters will vary depending on the dimensions
and shape of your training data. Tune them until the error
settles around ~10^-3, a reasonable error rate.

X = np.array(training)
y = np.array(output)

start_time = time.time()

train(X, y, hidden_neurons=20, alpha=0.1, epochs=100000, dropout=False, dropout_percent=0.2)

elapsed_time = time.time() - start_time
print ("processing time:", elapsed_time, "seconds")

Training with 20 neurons, alpha:0.1, dropout:False
Input matrix: 12x26    Output matrix: 1x3
delta after 10000 iterations:0.0062613597435
delta after 20000 iterations:0.00428296074919
delta after 30000 iterations:0.00343930779307
delta after 40000 iterations:0.00294648034566

delta after 50000 iterations:0.00261467859609
delta after 60000 iterations:0.00237219554105
delta after 70000 iterations:0.00218521899378
delta after 80000 iterations:0.00203547284581
delta after 90000 iterations:0.00191211022401
delta after 100000 iterations:0.00180823798397
saved synapses to: synapses.json
processing time: 6.501226902008057 seconds

The synapses.json file contains all of our synaptic weights; this
is our model.

The classify() function below is all that's needed for
classification once the synapse weights have been calculated:
about 15 lines of code. The catch: if there's a change to the
training data, our model will need to be recalculated. For a very
large dataset this could take a non-trivial amount of time.
We can now generate the probability of a sentence belonging
to one (or more) of our classes. This is super fast because it's a
dot-product calculation in our previously defined think()
function.

# probability threshold
ERROR_THRESHOLD = 0.2
# load our calculated synapse values
synapse_file = 'synapses.json'
with open(synapse_file) as data_file:
    synapse = json.load(data_file)
    synapse_0 = np.asarray(synapse['synapse0'])
    synapse_1 = np.asarray(synapse['synapse1'])

def classify(sentence, show_details=False):
    results = think(sentence, show_details)

    results = [[i, r] for i, r in enumerate(results) if r > ERROR_THRESHOLD]
    results.sort(key=lambda x: x[1], reverse=True)
    return_results = [[classes[r[0]], r[1]] for r in results]
    print ("%s \n classification: %s" % (sentence, return_results))
    return return_results

classify("sudo make me a sandwich")


classify("how are you today?")
classify("talk to you tomorrow")
classify("who are you?")
classify("make me some lunch")
classify("how was your lunch today?")
print()
classify("good day", show_details=True)

sudo make me a sandwich
 [['sandwich', 0.99917711814437993]]
how are you today?
 [['greeting', 0.99864563257858363]]
talk to you tomorrow
 [['goodbye', 0.95647479275905511]]
who are you?
 [['greeting', 0.8964283843977312]]
make me some lunch
 [['sandwich', 0.95371924052636048]]
how was your lunch today?
 [['greeting', 0.99120883810944971], ['sandwich', 0.31626066870883057]]

Experiment with other sentences and different probabilities;
you can then add training data and improve/expand the model.
Notice the solid predictions even with limited training data.

Some sentences will produce multiple predictions (above a
threshold). You will need to establish the right threshold level
for your application. Not all text classification scenarios are the
same: some predictive situations require more confidence than
others.

The last classification shows some internal details:

found in bag: good
found in bag: day
sentence: good day
 bow: [0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
good day
 [['greeting', 0.99664077655648697]]

Notice the bag of words (bow) for the sentence: 2 words
matched our corpus. The neural net also learns from the 0's,
the non-matching words.

A low-probability classification is easily demonstrated by
providing a sentence where 'a' (a common word) is the only
match, for example:

found in bag: a
sentence: a burrito!
 bow: [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
a burrito!
 [['sandwich', 0.61776860634647834]]

Image Processing

Here we will learn how to develop programs that identify
objects in photos using deep learning. In other words, we're
going to explain the black magic that allows Google Photos to
search your photos based on what is in the picture.

Recognizing Objects with Deep Learning

Any 3-year-old child can recognize a photo of a bird, but
figuring out how to make a computer recognize objects has
puzzled the very best computer scientists for over 50 years.

In the last few years, we've finally found a good approach to
object recognition using deep convolutional neural networks.
The concepts are completely understandable if you break them
down one by one.
So let's do it: let's write a program that can identify birds!

Starting Simple

Before we learn how to identify pictures of birds, let's learn
how to identify something much simpler: the handwritten
number "8".
We have seen how neural networks can solve complex
problems by chaining together lots of simple neurons.

We have also seen that the idea of machine learning is that the
same generic algorithms can be reused with different data to
solve different problems. So let's modify this same neural
network to recognize handwritten text. But to make the task
really simple, we'll only try to recognize one character: the
numeral "8".

Machine learning only works when you have data, preferably a
lot of data. So we need lots and lots of handwritten "8"s to get
started. Fortunately, researchers have created the MNIST data
set of handwritten numbers for this very purpose. MNIST
provides 60,000 images of handwritten digits, each as an 18x18
image. Here are some "8"s from the data set:

Some 8s from the MNIST data set

If you think about it, everything is just numbers.

Now we need to feed images into our neural network. How in
the world do we feed images into a neural network instead of
just numbers?
The answer is surprisingly simple. A neural network takes
numbers as input. To a computer, an image is really just a grid
of numbers that represent how dark each pixel is.

To feed an image into our neural network, we simply treat the
18x18 pixel image as an array of 324 numbers. To handle 324
inputs, we just widen our neural network to have 324 input
nodes.
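As a tiny sketch of that flattening step (the pixel values below
are random stand-ins for a real image of an "8"):

import numpy as np

image = np.random.randint(0, 256, size=(18, 18))   # stand-in for an 18x18 grayscale "8"
inputs = image.reshape(-1) / 255.0                 # 324 numbers, scaled to the range 0..1
print(inputs.shape)                                # (324,)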

Notice that our neural network also has two outputs now
(rather than just one). The first output will predict the
likelihood that the image is an "8" and the second will predict
the likelihood that it isn't. By having a separate output for each
type of object we want to recognize, we can use a neural
network to classify objects into groups.
Our neural network is much bigger than last time (324 inputs
rather than 3!). But any modern computer can handle a neural
network with a few hundred nodes without blinking. This
would even work fine on your cell phone.
All that is left is to train the neural network with images of "8"s
and not-"8"s so it learns to tell them apart. When we feed in an
"8", we'll tell it the probability that the image is an "8" is 100%
and the probability that it's not an "8" is 0%, and vice versa for
the counter-examples.

Here are some selected images from our training data:

We can train this kind of neural network in a few minutes on a
modern laptop. When the process is complete, we'll have a
neural network that can recognize pictures of "8"s with pretty
high accuracy.

Tunnel Vision

The good news is that our "8" recognizer really does work well
on simple images where the digit is right in the middle of the
image.

But now the bad news:
Our "8" recognizer totally fails to work when the digit isn't
perfectly centered in the image. Just the slightest position
change ruins everything.

This is because our network only learned the pattern of a
perfectly-centered "8". It has absolutely no idea what an off-
center "8" is. It knows exactly one pattern and one pattern only.
That's not very useful in the real world. Real world problems
are never that clean and simple. So we need to figure out how
to make our neural network work in cases where the "8" isn't
perfectly centered.

Brute Force Idea #1: Searching with a Sliding Window
We already created a really good program for finding an "8"
centered in an image. What if we just scan all around the image
for possible "8"s in smaller sections, one section at a time, until
we find one?

This approach is called a sliding window. It's the brute force
solution. It works well in some limited cases, but it's really
inefficient. You have to check the same image over and over
looking for objects of different sizes. We can do better than
this!

Brute Force Idea #2: More Data and a Deep Neural Net
When we trained our network, we only showed it "8"s that
were perfectly centered. What if we train it with more data,
including "8"s in all different positions and sizes all around the
image?
We don't even need to collect any new training data. We can
just write a script to generate new images with the "8"s in all
kinds of different positions in the image, as sketched below.

We generated synthetic training data by creating different
versions of the training images we already had. This is a very
useful technique!
Using this technique, we can easily create an endless supply of
training data.
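A minimal sketch of such a script, assuming each training image
is an 18x18 NumPy array, could simply shift every image by a
few pixels to create new training examples (np.roll wraps pixels
around the edges, which is fine for a rough sketch):

import numpy as np

def make_shifted_copies(image, max_shift=3):
    # create shifted versions of one training image (a simple form of augmentation)
    copies = []
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(image, dx, axis=1), dy, axis=0)
            copies.append(shifted)
    return copies

image = np.zeros((18, 18))
image[5:13, 7:11] = 1.0            # a crude stand-in for a centered "8"
augmented = make_shifted_copies(image)
print(len(augmented))              # 49 shifted variants from one original image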

More data makes the problem harder for our neural network
to solve, but we can compensate by making our network larger
and thus able to learn more complicated patterns.
To make the network bigger, we just stack up layer upon layer
of nodes:

We call this a "deep neural network" because it has more layers
than a traditional neural network.
This idea has been around since the late 1960s. But until
recently, training such a large neural network was just too slow
to be useful. Once we figured out how to use 3D graphics cards
(which were designed to do matrix multiplication really fast)
instead of normal computer processors, working with large
neural networks suddenly became practical. In fact, the exact
same NVIDIA GeForce GTX 1080 video card that you use to
play Overwatch can be used to train neural networks incredibly
quickly.

But even though we can make our neural network really big
and train it quickly with a 3D graphics card, that still isn't going
to get us all the way to a solution. We need to be smarter about
how we feed images into our neural network.
Think about it. It doesn't make sense to train a network to
recognize an "8" at the top of a picture separately from training
it to recognize an "8" at the bottom of a picture, as if those were
two totally different objects.
There should be a way to make the neural network smart
enough to know that an "8" anywhere in the picture is the same
thing, without all that extra training. Luckily, there is!

The Solution is Convolution

As a human, you instinctively know that pictures have a
hierarchy or conceptual structure. Consider this picture:

As a human, you instantly recognize the hierarchy in this
picture:
• The ground is covered in grass and concrete
• There is a child
• The child is sitting on a bouncy horse
• The bouncy horse is on top of the grass

Most importantly, we recognize the idea of a child no matter
what surface the child is on. We don't have to re-learn the idea
of a child for every possible surface it could appear on.
But right now our neural network can't do this. It thinks that
an "8" in a different part of the image is a completely different
thing. It does not understand that moving an object around in
the image does not make it something else. This means it has
to re-learn the identity of every object in every possible
position, which is terribly inefficient.
We need to give our neural network an understanding of
translation invariance: an "8" is an "8" no matter where in the
picture it shows up.

We'll do this using a technique called convolution. The idea of
convolution is inspired partly by computer science and partly
by biology (i.e. mad scientists literally poking cat brains with
weird probes to figure out how cats process images).
How Convolution Works
Rather than feeding entire images into our neural network as
one grid of numbers, we're going to do something a lot smarter
that takes advantage of the idea that an object is the same no
matter where it appears in an image.
Here is how the process works, step by step:
Step 1: Break the image into overlapping image tiles
Similar to our sliding window search above, let's pass a sliding
window over the entire original image and save each result as
a separate, tiny picture tile:

This process converts our original image into 77 equally-sized
tiny image tiles.

Step 2: Feed each image tile into a small neural network

Previously, we fed a single image into a neural network to check
whether it was an "8". We'll do the exact same thing here, but
we'll do it for each individual image tile:

Repeat this process 77 times, once for each tile.
However, there's one big twist: we'll keep the same neural
network weights for every single tile in the original image. In
other words, we are treating every image tile the same way. If
something interesting appears in any given tile, we'll mark that
tile as interesting.

Step 3: Store the results from each tile in a new array. We don't
want to lose track of the arrangement of the original tiles, so
we store the result of processing each tile in a grid with the
same arrangement as the original image. It looks like this:

In other words, we've started with a large original image and
we ended up with a slightly smaller array that records which
sections of our original image were the most interesting.

Step 4: Downsampling
The result of Step 3 was an array that maps out which parts of
the original image are the most interesting. But that array is still
pretty large:

To reduce the size of this array, we downsample it using an
algorithm called max pooling. It sounds fancy, but it isn't at all.
We just look at each 2x2 square of the array and keep the
biggest number:

The idea here is that if we found something interesting in any
of the four input tiles that make up each 2x2 grid square, we
just keep the most interesting bit. This reduces the size of our
array while keeping the most important bits.
Final step: Make a prediction
So far, we've reduced a giant image down to a fairly small array.
That array is just a bunch of numbers, so we can use it as the
input to another neural network. This final neural network
decides whether or not the image is a match. To distinguish it
from the convolution step, we call it a "fully connected"
network.
So from start to finish, our whole five-step pipeline looks like
this:
A few more notes:
The image processing pipeline is a sequence of steps:
convolution, max pooling, and finally a fully-connected
network.
When solving problems in the real world, these steps can be
combined and stacked as many times as you want. You can
have two, three or even ten convolution layers, and you can
throw in max pooling wherever you want to reduce the size of
your data.
The basic idea is to start with a large image and continually boil
it down, step by step, until you finally have a single result. The
more convolution steps you have, the more complicated the
features your network will be able to learn to recognize.
For example, the first convolution step might learn to recognize
sharp edges, the second convolution step might recognize
beaks using its knowledge of sharp edges, the third step might
recognize entire birds using its knowledge of beaks, and so on.
Here's what a more realistic deep convolutional network (like
the ones you would find in a research paper) looks like:

In this case, they start with a 224 x 224 pixel image, apply
convolution and max pooling twice, apply convolution three
more times, apply max pooling, and then have two fully-
connected layers. The end result is that the image is classified
into one of 1000 categories!

Constructing the Right Network

How do you know which steps you need to combine to make
your image classifier work well?
Honestly, you have to answer this by doing a lot of
experimentation and testing. You might have to train 100
networks before you find the optimal structure and parameters
for the problem you are solving. Machine learning involves a
lot of trial and error!

Building our Bird Classifier

Now we finally know enough to write a program that can
decide whether an image is a bird or not.
As usual, we need some data to get started. The free CIFAR10
data set contains 6,000 pictures of birds and 52,000 pictures of
things that are not birds. But to get even more data we'll also
add in the Caltech-UCSD Birds-200-2011 data set, which has
another 12,000 bird pictures.

Here are a few of the birds from our combined data set:

And here are some of the 52,000 non-bird images:

This data set will work fine for our purposes, but 72,000 low-
resolution images is still pretty small for real-world applications.
If you want Google-level performance, you need millions of
large images. In machine learning, having more data is almost
always more important than having better algorithms. Now
you know why Google is so happy to offer you unlimited photo
storage: they want all of your image data.

To build our classifier, we'll use TFLearn. TFLearn is a wrapper
around Google's TensorFlow deep learning library that exposes
a simplified API. It lets us build convolutional neural networks
with just a few lines of code describing the layers of our
network. Once we have a trained neural network, we can use
it. Here's a script that builds and trains the network, and then
takes in a single image file and predicts whether it is a bird or
not.

# -*- coding: utf-8 -*-
"""
Based on the tflearn example located here:
https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py
"""
from __future__ import division, print_function, absolute_import

# Import tflearn and some helpers
import tflearn
from tflearn.data_utils import shuffle
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import pickle

# Load the data set
X, Y, X_test, Y_test = pickle.load(open("full_dataset.pkl", "rb"))

# Shuffle the data
X, Y = shuffle(X, Y)

# Make sure the data is normalized
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Create extra synthetic training data by flipping, rotating and
# blurring the images in our data set.
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)

# Define our network architecture:

# Input is a 32x32 image with 3 color channels (red, green and blue)
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Step 1: Convolution
network = conv_2d(network, 32, 3, activation='relu')

# Step 2: Max pooling
network = max_pool_2d(network, 2)

# Step 3: Convolution again
network = conv_2d(network, 64, 3, activation='relu')

# Step 4: Convolution yet again
network = conv_2d(network, 64, 3, activation='relu')

# Step 5: Max pooling again
network = max_pool_2d(network, 2)

# Step 6: Fully-connected 512-node neural network
network = fully_connected(network, 512, activation='relu')

# Step 7: Dropout - throw away some data randomly during training
# to prevent over-fitting
network = dropout(network, 0.5)

# Step 8: Fully-connected neural network with two outputs
# (0=isn't a bird, 1=is a bird) to make the final prediction
network = fully_connected(network, 2, activation='softmax')

# Tell tflearn how we want to train the network
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)

# Wrap the network in a model object
model = tflearn.DNN(network, tensorboard_verbose=0,
                    checkpoint_path='bird-classifier.tfl.ckpt')

# Train it! We'll do 100 training passes and monitor it as it goes.
model.fit(X, Y, n_epoch=100, shuffle=True,
          validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96,
          snapshot_epoch=True,
          run_id='bird-classifier')

# Save the model to a file when training is complete
model.save("bird-classifier.tfl")
print("Network trained and saved as bird-classifier.tfl!")

# Now let's use the trained network on a single image. (In a standalone
# script you would rebuild the same network and call
# model.load("bird-classifier.tfl") first; here, parsing the image path
# into args.image with argparse is a sketch added for completeness.)
import argparse

import numpy as np
import scipy.misc
import scipy.ndimage

parser = argparse.ArgumentParser()
parser.add_argument("image", help="path of the image file to classify")
args = parser.parse_args()

# Load the image file
img = scipy.ndimage.imread(args.image, mode="RGB")

# Scale it to 32x32
img = scipy.misc.imresize(img, (32, 32),
                          interp="bicubic").astype(np.float32,
                                                   casting='unsafe')

# Predict
prediction = model.predict([img])

# Check the result. Output index 1 is the "is a bird" class.
is_bird = np.argmax(prediction[0]) == 1

if is_bird:
    print("That's a bird!")
else:
    print("That's not a bird!")

If you are training with a good video card with enough RAM
(like an Nvidia GeForce GTX 980 Ti or better), this will be
done in less than an hour. If you are training with a normal
CPU, it might take a lot longer.

As it trains, the accuracy will increase. After the first pass, I got
75.4% accuracy. After just 10 passes, it was already up to
91.7%. After 50 or so passes, it capped out around 95.5%
accuracy and additional training didn’t help, so I stopped it
there.
Congrats! Our program can now recognize birds in images!

Testing Our Network

But to really see how effective our network is, we need to test it
with lots of images. The data set we created held back 15,000 images
for validation. When we ran those 15,000 images through the network,
it predicted the correct answer 95% of the time.
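For what it's worth, TFLearn can compute this number for us. A minimal sketch, assuming the model, X_test and Y_test objects from the training script above and using TFLearn's DNN.evaluate() helper:

accuracy = model.evaluate(X_test, Y_test)   # accuracy on the held-out set
print("Validation accuracy: %.4f" % accuracy[0])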

But how precise is "95% accuracy", really?

Our network claims to be 95% accurate, yet that number on its own
can mean all sorts of different things.
For instance, what if 5% of our training images were birds and
the other 95% were not birds? A program that predicted "not
a bird" every single time would be 95% accurate, and also 100% useless.
We need to look more closely at the numbers than just the
overall accuracy. To judge how good a classification system
really is, we need to look closely at how it fails, not just the
percentage of the time that it fails.

Instead of thinking about our predictions as "right" and
"wrong", let's break them down into four separate categories.
 First, here are some of the birds that our network
correctly identified as birds. We call these True
Positives:

Wow! Our network can recognize lots of different kinds of
birds successfully!
 Second, here are images that our network correctly
identified as “not a bird”. These are called True
Negatives:

Horses and trucks don’t fool us!
 Third, here are some images that we thought were
birds but were not really birds at all. These are
our False Positives:

Lots of planes were mistaken for birds! That makes sense.


 And finally, here are some images of birds that we
didn’t correctly recognize as birds. These are
our False Negatives:

Using our validation set of 15,000 images, here's how many
times our predictions fell into each category:

Why do we break our results down like this? Because not all
mistakes are created equal.
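Here is a minimal sketch, not taken from the book's project, of how those four counts could be tallied with NumPy; the y_true and y_pred arrays below are hypothetical 0/1 labels (1 = bird, 0 = not a bird):

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # hypothetical ground truth
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # hypothetical predictions

true_positives  = np.sum((y_pred == 1) & (y_true == 1))
true_negatives  = np.sum((y_pred == 0) & (y_true == 0))
false_positives = np.sum((y_pred == 1) & (y_true == 0))
false_negatives = np.sum((y_pred == 0) & (y_true == 1))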

Imagine if we were writing a program to detect cancer from an
MRI image. If we were detecting cancer, we’d rather have false
positives than false negatives. False negatives would be the
worst possible case — that’s when the program told someone
they definitely didn’t have cancer but they actually did.
Instead of just looking at overall accuracy, we calculate
precision and recall. These two metrics give us a much clearer
picture of how well we did:

This tells us that 97% of the time we guessed "Bird", we were
right! But it also tells us that we only found 90% of the actual
birds in the data set. In other words, we might not find every
bird, but we are pretty sure about it when we do find one!
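As a small, self-contained illustration (the counts below are made up purely to reproduce roughly 97% precision and 90% recall; they are not the actual numbers from the validation table):

# precision = TP / (TP + FP), recall = TP / (TP + FN)
true_positives, false_positives, false_negatives = 453, 14, 50   # made-up counts
precision = true_positives / (true_positives + false_positives)  # ~0.97
recall = true_positives / (true_positives + false_negatives)     # ~0.90
print("Precision: %.2f  Recall: %.2f" % (precision, recall))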

Major NN projects

Recognition of Braille Alphabet Using Neural Networks

The main goal is to train a neural network to recognize which
character of the Braille alphabet is given as input. For testing,
the Serbian Cyrillic Braille alphabet was used.
As each letter is represented by six dots, we have a matrix of
dimension 3x2. Each component of the matrix is one input, so there
are 6 inputs, one per dot.
The number of outputs varies depending on the architecture.
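A hypothetical sketch (not taken from the Neuroph project itself) of how a 3x2 Braille cell could be flattened into the six binary inputs described above:

import numpy as np

# One Braille cell: 3 rows x 2 columns, 1 = raised dot, 0 = no dot.
cell = np.array([[1, 0],
                 [1, 1],
                 [0, 0]])

inputs = cell.flatten()   # -> array([1, 0, 1, 1, 0, 0]), the 6 network inputs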
Shuttle Landing Control

Implementing a Space Shuttle landing control mechanism with the
Neuroph framework by training a neural network on the Shuttle
Landing Control data set.
The central goal of this experiment is to train the neural network
to predict the conditions under which an automatic landing would
be preferable to manual control of the spacecraft.
Music Classification by Genre Using Neural
Networks

Music classification is a pattern recognition problem that involves
extracting features and building a classifier. Artificial neural
networks have found considerable success in the area of pattern
recognition: they can be trained to learn the criteria used to
classify, and can do so in a generalized manner, by repeatedly
showing the network inputs grouped into classes. Neural networks
therefore offer a fresh solution for music classification, so a new
music classification method based on a BP (backpropagation) neural
network is proposed in this experiment.
Face Recognition Using Neural Network

The main goal is to train the neural network to identify a face
from any picture. The network takes some of the image's parameters
as input and tries to predict which person has the corresponding
characteristics.
Concept Learning and Classification - Hayes-Roth
Data Set

A sample multivariate classification problem using Neuroph. In this
assignment we test Neuroph 2.4 with the Hayes-Roth data set. Quite
a few architectures are tried out, and it is decided which ones
represent a good solution to the problem and which ones do not.

Predicting Poker Hands with Neural Networks

The core goal is to train the neural network to predict which
poker hand we have on the basis of the cards given as input
attributes. The database was acquired from the Intelligent Systems
Research Unit of the Department of Computer Science at Carleton
University in Canada.
The data set comprises more than 25,000 instances, but because of
software limitations the project worked with a shorter version of
1,003 instances.

Predicting Relative Performance of Computer
Processors with Neural Networks

The goal is to train the neural network to predict the relative
performance of a CPU from a set of input features, and then to
compare that result with the published relative performance and
with the relative performance estimated using the linear
regression method.
Predicting Survival of Patients Using Haberman's Data Set

Predicting the survival of patients who have undergone surgery for
breast cancer. The objective is to train the neural network to
predict whether a patient survived after breast cancer surgery,
given other characteristics as input.
Predicting the Class of Breast Cancer with Neural
Networks

The main goal is to train the neural network to predict whether
a breast cancer is malignant or benign, given other attributes
as input.
Breast Tissue Classification Using Neural Networks

The objective is to train the neural network to predict which of
six classes a sample of freshly excised breast tissue belongs to,
given its other characteristics as input.
Classification of Animal Species Using Neural
Networks

The purpose of this experiment is to study the feasibility of
classifying animal species using neural networks. An animal
class is made up of animals that are all alike in important ways,
so we need to train a neural network to predict which class a
particular animal belongs to. Once we have decided on a problem to
solve using neural networks, we want to gather data for training
purposes. The training data set includes a wide variety of cases,
each comprising values for a range of input and output variables.
Another variant of this type of project is classification of
animal species on the basis of 17 Boolean-valued attributes.

Car Evaluation Using Neural Networks

This project tests Neuroph with the Car data set, which can be
found here:
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation.
Several architectures will be tried out, and it will be determined
which ones represent a good solution to the problem and which ones
do not. The Car Evaluation data set was derived from a simple
hierarchical decision model.

The model evaluates cars according to the following concept
structure:
o car acceptability
o overall price
o buying price, price of the maintenance (maint)
o comfort
o number of doors, capacity in terms of persons to carry, the
size of the luggage boot, safety of the car
Lenses Classification Using Neural Networks

The Neuroph framework is used to train a neural network on the
database for fitting contact lenses (the Lenses data set). The
data set is taken from a paper by Cendrowska (1988) on the
inductive analysis of a set of ophthalmic data. The Lenses data
set tries to predict whether a person will need soft contact
lenses, hard contact lenses or no contacts, based on related
features of the client.
The data set has 4 features (age of the patient, spectacle
prescription, presence of astigmatism, and tear production rate)
along with an associated three-valued class that gives the
suitable lens prescription for the patient (hard contact lenses,
soft contact lenses, no lenses).
Balance Scale Classification Using Neural Networks

Using the Neuroph framework to train a neural network on the
Balance Scale data set. The Balance Scale data set was generated
to model psychological experimental results. Each example is
classified according to how the balance scale behaves: it tips to
the right, tips to the left, or is balanced.
The attributes are the left weight, the left distance, the right
weight, and the right distance. The correct way to find the class
is to compare (left-distance * left-weight) with (right-distance *
right-weight): the larger product wins, and if they are equal the
scale is balanced, as the sketch below restates in code.
The main objective of this experiment is to train a neural network
to classify these 3 types of balance scale.
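For reference, here is the classification rule quoted above written out as code; the parameter names are assumptions, not the exact column names of the data set:

def balance_class(left_weight, left_distance, right_weight, right_distance):
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return "tip to the left"
    if right_torque > left_torque:
        return "tip to the right"
    return "balanced"

print(balance_class(2, 3, 3, 2))   # equal products -> "balanced"

The neural network, of course, is never shown this rule; it has to discover it from the labeled examples.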
Blood Transfusion Service Center

Teach the neural network to predict whether a blood donor gave
blood in March 2007, based on features that are provided as input
parameters.

Predicting the Result of Football Match with Neural
Networks

The main goal of this problem is to build and train a neural
network to predict whether the home team wins, the visiting team
wins, or the match ends in a draw in the Barclays Premier League,
given some characteristics as input. First we need a data set; for
this problem we picked results from the 2011/12 Premier League
season. Because of the great number of matches, we randomly
sampled 106 results. Each result has 8 input and 3 output
attributes.

Input attributes are:


1. Home team goalkeeper rating
2. Home team defence rating
3. Home team midfield rating
4. Home team attack rating
5. Visitor team goalkeeper rating
6. Visitor team defence rating
7. Visitor team midfield rating
8. Visitor team attack rating

Output attributes are:


1. Home team wins
2. Draw
3. Visitor team wins
Predicting the Workability of High-Performance
Concrete

These days, the mix design of high-performance concrete is more
complex because it involves many variables and includes various
mineral and chemical admixtures. Up to the present time, the
construction industry has had to rely on relatively few human
experts for approval when solving the high-performance concrete
mix design problem, which usually requires a costly human expert.
However, the situation may be improved with artificial
intelligence that mimics the way the human brain thinks and gives
suggestions. The usefulness of artificial intelligence in solving
difficult problems has come to be recognized, and its development
is being pursued in many fields.
Concrete Compressive Strength Test

Concrete is the most important material in civil engineering.
Its compressive strength is a highly nonlinear function of age
and ingredients.
For this task, we will use the Neuroph framework and the Concrete
Compressive Strength data set.
Glass Identification Using Neural Networks

The goal of this project is to use the Neuroph framework to train
a neural network on the Glass Identification data set and
categorize glass into a set of predefined classes. The Glass
Identification data set was created to help in criminological
investigation: glass left at the scene of a crime can be used as
evidence, but only if it is correctly identified. Each example is
classified as one of the following:
building_windows_float_processed,
building_windows_non_float_processed,
vehicle_windows_float_processed,
vehicle_windows_non_float_processed, containers, tableware
and headlamps.

The features are RI: refractive index, Na: Sodium (unit
measurement: weight percent in corresponding oxide, as are
attributes 4-10), Mg: Magnesium, Al: Aluminum, Si: Silicon, K:
Potassium, Ca: Calcium, Ba: Barium, Fe: Iron.
The main aim of this experiment is to train a neural network to
classify glass into these 7 types.
Teaching Assistant Evaluation

The main goal is to train the neural network with data, which
can be found online, to classify the quality of the teachers'
performance. The data set consists of evaluations of teaching
performance over three regular semesters and two summer
semesters of 164 teaching assistant (TA) assignments at the
Mathematics Department of the University of Wisconsin-Madison.
The scores were divided into 3 roughly equal-sized classes
("low", "medium", and "high") to form the class variable.
Predicting Protein Localization Sites Using Neural
Networks

The main aim of this project is to create and train a neural
network to predict protein localization sites. The first step in
any machine learning approach is to get the data set; here we use
data for predicting protein localization sites in eukaryotic
cells.
Predicting the Religion of European States Using
Neural Networks

The aim of this ML problem is to create and train a neural
network to predict the religion of European countries, given some
features as input. As usual, we need a data set. The data used in
this experiment can be found at the Europe Data Center and refer
to 49 European countries. Each country has 26 input features and
1 output attribute, the religion.

Input features are:


1. Region of Europe where the country is
2. Total Area that the country covers (in thousands of
square km)
3. Population (in round millions)
4. Language of the country
5. Number of vertical bars in the flag
6. Number of horizontal stripes in the flag
7. Number of different colors in the flag
8. If red is present or not in the flag
9. If green is present or not in the flag
10. If blue is present or not in the flag
11. If gold is present or not in the flag
12. If yellow is present or not in the flag
13. If white is present or not in the flag
14. If black is present or not in the flag
15. If orange is present or not in the flag
16. Major color in the flag (tie-breaks decided by taking the
topmost shade, if that fails then the most central shade,
and if that fails the leftmost shade)
17. Number of circles in the flag
18. Number of upright crosses
19. Number of diagonal crosses
20. Number of sun or star symbols
21. If a crescent moon symbol is present or not
22. If any triangles are present or not

23. If an animate image (e.g., an eagle, a tree, a human
hand) is present or not
24. If any letters or writing on the flag (e.g., a motto or
slogan) is present or not
25. Which color is in the top-left corner (moving right to
decide tie-breaks)
26. Which color is in the bottom-right corner (moving left
to decide tie-breaks)

The output attribute is the religion of each country.

Predicting the Burned Area of Forest Fires Using Neural Networks

Our main goal here is to use the twelve input features (in the
original data set) to predict the burned area of forest fires.
The output "area" was first transformed with a ln(x+1) function.
Then several data mining methods were applied and, after fitting
the models, the outputs were post-processed with the inverse of
the ln(x+1) transform; the transform and its inverse are sketched
below. Four different input setups were used, and the experiments
were performed using 10-fold cross-validation repeated over 30
runs. Two regression metrics were measured: MAD and RMSE. A
Gaussian support vector machine (SVM) fed with only 4 direct
weather conditions (temp, RH, wind and rain) obtained the best
MAD value: 12.71 +- 0.01 (mean and 95% confidence interval using
a Student's t-distribution). The best RMSE was attained by the
naive mean predictor. An analysis of the regression error
characteristic (REC) curve shows that the SVM model predicts more
examples within a lower admitted error. In effect, the SVM model
is better at predicting small fires, which are the majority.
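A minimal sketch of the ln(x+1) target transform mentioned above and its inverse, using NumPy (np.log1p and np.expm1 compute exactly these functions); the area values are hypothetical:

import numpy as np

area = np.array([0.0, 0.76, 12.18, 746.28])   # hypothetical burned areas (ha)
y = np.log1p(area)         # ln(x+1), applied to the target before fitting
area_back = np.expm1(y)    # inverse transform, applied to the predictions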

Wine Classification Using Neural Networks

In this project we try to build a neural network that can classify
wines from three wineries by thirteen attributes:
1. Alcohol
2. Malic Acid
3. Ash
4. Ash Alcalinity
5. Magnesium
6. Total Phenols
7. Flavanoids
8. Nonflavanoid Phenols
9. Proanthocyanins
10. Color Intensity
11. Hue
12. OD280/OD315 of diluted wines
13. Proline

This is a pattern recognition problem, where inputs are associated
with different classes, and we would like to build a neural
network that not only classifies the known wines properly, but can
also generalize to accurately classify wines that were not used to
design the solution. The thirteen attributes will act as inputs to
the neural network, and the corresponding target for each sample
will be a 3-element row vector with a 1 in the position of the
associated winery, #1, #2 or #3, as sketched below.
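A hedged sketch of that target encoding (a plain one-hot vector; the winery labels are assumed to be the integers 1-3):

import numpy as np

def winery_to_target(winery):
    target = np.zeros(3)
    target[winery - 1] = 1.0   # 1 in the position of the winery, 0 elsewhere
    return target

print(winery_to_target(2))     # -> [0. 1. 0.]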
NeurophRM: Integration of the Neuroph
Framework into RapidMiner

The study of artificial neural networks (NNs) is ubiquitous in the
research literature, and their application attracts interest in
many research fields, including computer science, artificial
intelligence, optimization, data mining, statistics, and even
bioinformatics, medicine, and many more.

Despite some shortcomings of NNs, such as the lack of
interpretability of the trained model, they remain a broadly used
technique and are included in most data analytics frameworks.
Since the neural network model is hard to understand, software
packages, especially commercial ones, typically simplify the NN
model, reducing it to a handful of parameters that users can
modify. There are only a few software products that offer a full
range of customizable neural network models, and they require
proficiency in the neural network paradigm. In the open-source
community, there are currently several stable neural network
frameworks that offer experts the tools for full customization of
NN models.

Since RapidMiner is an open-source framework, a connection to one
of these NN frameworks would draw the attention of more users by
providing a more customizable and powerful NN tool for various
data mining tasks. This is especially true for NN experts, who
would certainly find RapidMiner a useful tool for overall data
analysis and for all the logistical support around NN models,
including preprocessing, evaluation, comparison with different
algorithms, and so on.

Open source resources

Here are some resources for learning about artificial neural
networks. Most of them are free and open, though some may cost you
a bit:

 Coursera — Machine Learning (Andrew Ng)


 Coursera — Neural Networks for Machine Learning
(Geoffrey Hinton)
 Udacity — Intro to Machine Learning (Sebastian
Thrun)
 Udacity — Machine Learning (Georgia Tech)
 Udacity — Deep Learning (Vincent Vanhoucke)
 Machine Learning (mathematicalmonk)
 Practical Deep Learning For Coders (Jeremy Howard
& Rachel Thomas)
 Stanford CS231n — Convolutional Neural Networks
for Visual Recognition (Winter 2016) (class link)
 Stanford CS224n — Natural Language Processing
with Deep Learning (Winter 2017) (class link)
 Oxford Deep NLP 2017 (Phil Blunsom et al.)
 Reinforcement Learning (David Silver)
 Practical Machine Learning Tutorial with Python
(sentdex)

Issues and Challenges

Uncertainty

A drawback of artificial neural networks is that the uncertainty
in the predictions they generate is rarely computed. Failing to
account for such uncertainty makes it impossible to gauge the
quality of ANN predictions, which severely limits their
usefulness. In an effort to address this, a few researchers have
applied Bayesian techniques to ANN training.

Lots and Lots of Data

Deep learning algorithms are trained to learn progressively from
data. Large data sets are required to make sure that the machine
delivers the expected results. Just as the human brain needs many
experiences to learn and infer information, an artificial neural
network needs an abundant amount of data. The more powerful the
abstraction you want, the more parameters need to be tuned, and
more parameters require more data.
For instance, a speech recognition program would call for data
from multiple languages, demographics and time scales; researchers
feed terabytes of data for the algorithm to learn a single
dialect. This is a time-consuming process that requires tremendous
data processing capability. To some extent, the feasibility of
solving a problem through deep learning depends on the
availability of a huge corpus of data to train on.

The complexity of a neural network can be expressed through the
number of parameters. In the case of deep neural networks, this
number can be in the range of millions, tens of millions and in
some cases even hundreds of millions. Let's call this number P.
Since you want to be confident of the model's ability to
generalize, a rough rule of thumb for the number of data points is
at least P*P.
Overfitting in Neural Networks

At times, there is a sharp difference between the error on the
training data set and the error encountered on a new, unseen data
set. This happens in overly complex models, for example those with
too many parameters relative to the number of observations. The
efficacy of a model is judged by its ability to perform well on
unseen data, not by its performance on the training data fed to
it.

(Figure: the training error, in blue, and the validation error, in
red, as a function of the number of training cycles, illustrating
overfitting.) In general, a model is trained by maximizing its
performance on a particular training data set. The model thus
memorizes the training examples but does not learn to generalize
to new situations or unseen observations of the data set.
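A common countermeasure is to watch the validation error during training and stop when it no longer improves. Here is a hedged sketch of such validation-based early stopping; train_one_epoch() and validation_error() are hypothetical helpers standing in for whatever framework is actually used:

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    train_one_epoch()                # one pass over the training data
    val_err = validation_error()     # error on the held-out validation set
    if val_err < best_val:
        best_val, bad_epochs = val_err, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation error stopped improving
            break                    # stop before the model overfits further

Dropout, as used in the bird-classifier script earlier, is another standard defense against overfitting.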

Hyperparameter Optimization

Hyperparameters are the parameters whose values are set before the
learning process begins. Changing such a parameter by even a small
amount can produce a large change in the performance of your
model.
Relying on the default values and not performing hyperparameter
optimization can have a substantial impact on model performance.
Likewise, tuning only a few hyperparameters by hand, rather than
searching systematically with proven techniques, also limits
performance. A simple search is sketched below.
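A hedged sketch of the simplest systematic approach, a grid search over two hyperparameters; build_and_score() is a hypothetical helper that would train a model with the given settings and return its validation score:

best_score, best_params = None, None
for learning_rate in (0.1, 0.01, 0.001):
    for dropout_prob in (0.3, 0.5, 0.7):
        score = build_and_score(learning_rate=learning_rate,
                                dropout_prob=dropout_prob)
        if best_score is None or score > best_score:
            best_score, best_params = score, (learning_rate, dropout_prob)
print("Best score %.3f with lr=%s, dropout=%s"
      % (best_score, best_params[0], best_params[1]))

Random search and Bayesian optimization are more efficient alternatives when the number of hyperparameters grows.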

Requires High-Performance Hardware

Training a deep learning solution requires a lot of data, and to
solve real-world problems the machine needs adequate processing
power. To achieve better efficiency and shorter training times,
data scientists switch to multi-core, high-performance GPUs and
similar processing units. These processing units are expensive and
consume a lot of power.

Industry-scale deep learning systems require high-end data centers
such as Facebook's Oregon data center, while smart devices such as
drones, robots and other mobile devices need small but efficient
processing units. Deploying a deep learning solution in the real
world thus becomes a costly and power-hungry affair.

Neural Networks Are Essentially a Black Box

We know our model's parameters, we feed it labeled data, and we
know how the layers are put together, but we usually do not
understand how the network arrives at a particular solution.
Neural networks are essentially black boxes, and researchers have
had a hard time understanding how they reach their conclusions.
The inability of neural networks to reason at an abstract level
makes it difficult to implement high-level cognitive functions.
Also, their operation is largely invisible to humans, making them
unsuitable for domains in which verification of the process is
important.

On the other hand, Murray Shanahan, Professor of Cognitive
Robotics at Imperial College London, and his team have produced a
paper on Deep Symbolic Reinforcement Learning, which demonstrates
progress in overcoming the above-mentioned hurdles.
Lack of Flexibility and Multitasking

Deep learning models, once trained, can deliver an extremely
efficient and accurate solution to a particular problem. However,
at present, neural network architectures are highly specialized to
particular spheres of application.

Most of our systems work this way: they are extremely good at
solving one problem, and even solving a closely related problem
requires retraining and re-evaluation. Researchers are working
hard on developing deep learning models that can multitask without
reworking the whole architecture.
There have been small advances in this direction using Progressive
Neural Networks, and there is also substantial progress toward
Multi-Task Learning (MTL). Researchers from the Google Brain team
and the University of Toronto presented a paper on MultiModel, a
neural network architecture that draws on the success of vision,
language and audio networks to concurrently solve a number of
problems spanning multiple domains, including image recognition,
translation and speech recognition.

Deep learning may be one of the chief research domains of
artificial intelligence, but it is definitely not flawless. While
exploring new and less charted territories of cognitive
technology, it is natural to come across hurdles and
complications; such is the case with any technological progress.
The future will answer the question: "Is deep learning our best
route to real AI?"

Applications of ANN
Speech Recognition

Speech plays a prominent role in human-to-human interaction, so it
is natural for people to expect speech interfaces to computers.
Today, however, communicating with machines still requires
sophisticated languages that are hard to learn and use. A simple
way to ease this communication barrier would be communication in a
spoken language that the machine can understand.

Great progress has been made in this field; however, such systems
still face the problems of limited vocabulary or grammar, along
with the need to retrain the system for different speakers in
diverse conditions. ANNs play a key role in this domain. The
following ANNs have been used for speech recognition:

 Multilayer networks
 Multilayer networks with recurrent connections
 Kohonen self-organizing feature map

The handiest network for this is the Kohonen self-organizing
feature map, which takes short segments of the speech waveform as
input. It maps phonemes of the same kind to the same region of the
output array, a technique known as feature extraction. After the
features are extracted, acoustic models used as back-end
processing help identify the utterance.

Character Recognition

It is a fascinating problem which comes under the general
domain of Pattern Recognition. Countless neural networks
have been developed for automatic recognition of handwritten
characters, either letters or digits. Following are some ANNs
which have been used for character recognition:

 Multilayer neural networks such as Backpropagation neural networks.
 Neocognitron

Though back-propagation neural networks may have several hidden
layers, the pattern of connections from one layer to the next is
localized. Similarly, the neocognitron also has several hidden
layers and is trained layer by layer for this type of application.

Signature Verification Application

Signatures are one of the most useful ways to authorize and
authenticate a person in legal transactions. Signature
verification is a non-vision-based technique.

The leading approach in this application is to extract the
features, or rather the geometric feature set, representing the
signature. With these feature sets, we train the neural network
using an efficient training algorithm. The trained neural network
then classifies the signature as genuine or forged during the
verification stage.

Human Face Recognition

This is one of the biometric approaches to recognizing a given
face. It is a challenging task because of the need to characterize
"non-face" images. However, if a neural network is well trained,
it can divide images into two classes: images that contain faces
and images that do not.

First, each input image must be preprocessed. Then the
dimensionality of the image must be reduced, and finally it must
be classified using a neural network training algorithm. The
following neural networks are used for training with the
preprocessed images:

A fully connected multilayer feed-forward neural network trained
with the back-propagation algorithm.
For dimensionality reduction, Principal Component Analysis (PCA)
is used, as in the sketch below.
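A minimal PCA sketch using NumPy, assuming images is an N x D matrix of flattened, preprocessed face images; it keeps the top k principal components:

import numpy as np

def pca_reduce(images, k):
    centered = images - images.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T        # N x k reduced representation

The reduced vectors, rather than the raw pixels, are then fed to the classifier network.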
Image Compression

Neural networks can receive and process vast amounts of
information at once, which makes them useful for image
compression. With the explosion of the Internet and more sites
using more images in their content, using neural networks for
image compression is worth a look.
Stock Market Prediction

The day-to-day business of the stock market is extremely
complicated, and many factors weigh into whether a given stock
will go up or down on any particular day. Since neural networks
can examine a lot of information quickly and sort it all out, they
can be used to help predict stock prices.

Traveling Salesman Problem

Interestingly enough, neural networks can solve the traveling
salesman problem, but only to a certain degree of approximation.
Medicine, electronic noses, security, and loan applications: these
are applications that are still in their proof-of-concept stage,
with the exception of a neural network that decides whether or not
to grant a loan, which has already been used more successfully
than many humans.

The Future of NNs

All current NN technologies will most likely be vastly improved
upon in the future. Everything from handwriting and speech
recognition to stock market prediction will become more
sophisticated as researchers develop better training methods and
network architectures.
NNs might, in the future, allow:

 Robots that can see, feel, and predict the world around them
 Improved stock prediction
 Common usage of self-driving cars
 Composition of music
 Handwritten documents to be automatically
transformed into formatted word processing
documents
 Trends found in the human genome to aid in the
understanding of the data compiled by the human
genome project

 Self-diagnosis of medical problems using neural
networks
 And much more!

Summary

A brief history of neural networks gave us a basic understanding
of how it all started in the 1980s, and we saw how the concept was
developed into a practical model in its early stages.

The comparison between artificial and biological neural networks
gave a gist of how real biological neurons work and how they
differ from artificial neurons.

These topics were followed by a detailed study of artificial
neural networks. Any beginner in the field will now understand
concepts such as the different layers of an artificial neural
network and how it is structured.
The section on the learning process covered the different types of
learning mechanisms used in ANNs.

The two topologies of ANNs are feed-forward networks and feedback
networks.
Weight tuning and learning let us improve network performance, and
activation functions allow us to refine the networks.
The learning paradigms include four types of learning mechanisms:
supervised learning, unsupervised learning, semi-supervised
learning, and reinforcement learning.

The major variants of ANNs are the multilayer perceptron (MLP),
convolutional neural networks, recurrent neural networks, long
short-term memory networks, deep reservoir computing, deep belief
networks, and others.

The tools and technologies used for ANNs were covered in the
sections on major libraries and programming language support.
Practical implementations showed us various fields in which ANNs
are widely used, such as text classification and image processing.
The major NN projects showed real-life implementations of ANNs in
different sectors, and the open source resources listed in this
document are good places to continue learning about ANNs.
There are many issues and challenges faced by ANN users today, and
a brief overview of these was given.
Finally, the applications of ANNs are large in number and variety,
and some of them were described in this document.

Thank you!
Thank you for buying this book! It is intended to help you
understand machine learning. If you enjoyed this book and felt
that it added value to your life, we ask that you please take the
time to review it.
Your honest feedback would be greatly appreciated. It
really does make a difference.

We are a very small publishing company and our survival depends on
your reviews.
Please, take a minute to write us an honest review.

If you want to help us produce more material like this, then
please leave an honest review on Amazon. It really does make a
difference.

https://www.amazon.com/dp/B07FTPKJMM
