Introduction to Neural Networks
AI Sciences Publishing
How to contact us
Please address comments and questions concerning this book
to our customer service by email at:
contact@aisciences.net
Table of Contents
Activation Functions
    Linear Activation Function
    Sigmoid Activation Function
    Binary Threshold Signal Function
    Bipolar Threshold Signal Function
    Linear Threshold (RAMP) Signal Function
Adjustments of Weights or Learning
Learning Paradigms
    Supervised Learning
    Unsupervised Learning
    Semi-Supervised Machine Learning
    Reinforcement Learning
Major Variants of Artificial Neural Network
    Multilayer Perceptron (MLP)
        Activation Function
        Layers
        Learning
        Terminology
        Applications
    Convolutional Neural Networks
        The Convolutional Layer
        The Pooling Layer
        The Output Layer
    Recurrent Neural Networks
        Recurrent Neural Network Extensions
        Long Short-Term Memory
    Deep Belief Networks
    Deep Reservoir Computing
Tools and Technologies
    Caffe: A Deep Learning Framework
    TensorFlow
    MXNet
    Keras
    Lasagne
    Blocks
    Pylearn2
    DeepPy
    Deepnet
    Gensim
    nolearn
    Passage
    The Microsoft Cognitive Toolkit (CNTK)
    FANN
    Programming Language Support
        Python
        Java
        Lisp
        Prolog
        C++
        AIML
Practical Implementations
    Text Classification
        Text Classification Using Neural Networks
    Image Processing
        Recognizing Objects with Deep Learning
        Building Our Bird Classifier
        Testing Our Network
Major NN Projects
    Recognition of the Braille Alphabet Using Neural Networks
    Shuttle Landing Control
    Music Classification by Genre Using Neural Networks
    Face Recognition Using Neural Networks
    Concept Learning and Classification - Hayes-Roth Data Set
    Predicting Poker Hands with Neural Networks
    Predicting Relative Performance of Computer Processors with Neural Networks
    Predicting Survival of Patients Using Haberman's Data Set
    Predicting the Class of Breast Cancer with Neural Networks
    Breast Tissue Classification Using Neural Networks
    Classification of Animal Species Using Neural Networks
    Car Evaluation Using Neural Networks
    Lenses Classification Using Neural Networks
    Balance Scale Classification Using Neural Networks
    Blood Transfusion Service Center
    Predicting the Result of a Football Match with Neural Networks
    Predicting the Workability of High-Performance Concrete
    Concrete Compressive Strength Test
    Glass Identification Using Neural Networks
    Teaching Assistant Evaluation
    Predicting Protein Localization Sites Using Neural Networks
    Predicting the Religion of European States Using Neural Networks
    Predicting the Burned Area of Forest Fires Using Neural Networks
    Wine Classification Using Neural Networks
    NeurophRM: Integration of the Neuroph Framework into RapidMiner
Applications of ANN
    Character Recognition
    Signature Verification Application
    Human Face Recognition
    Image Compression
    Stock Market Prediction
    Traveling Salesman's Problem
Future in NN
Do you want to discover, learn and understand the methods
and techniques of artificial intelligence, data science,
computer science, machine learning, deep learning or
statistics?
Would you like to have books that you can read very fast and
understand very easily?
Would you like to practice AI techniques?
If the answers are yes, you are in the right place. The AI
Sciences book series is perfectly suited to your expectations!
Our books are the best on the market for beginners,
newcomers, students and anyone who wants to learn more
about these subjects without going into too much theoretical
and mathematical detail. Our books are among the best sellers
on Amazon in the field.
About Us
Our books have had phenomenal success and are today among the best
sellers on Amazon. They have helped many people to progress and,
above all, to understand these techniques, which are sometimes
considered, rightly or wrongly, to be complicated.
The books we produce are short and very pleasant to read. They
focus on the essentials so that beginners can quickly understand
and practice effectively. You will never regret having chosen one
of our books.
We also offer completely free books on our website: visit our site
and subscribe to our email list at www.aisciences.net.
By subscribing to our mailing list, you will also receive all our
new books for free, on an ongoing basis.
To Contact Us:
Website: www.aisciences.net
Email: contact@aisciences.net
Follow us on social media and share our publications
Facebook: @aisciencesllc
LinkedIn: AI Sciences
From AI Sciences Publishing
WWW.AISCIENCES.NET
eBooks, free eBook offers, and online learning courses.
Did you know that AI Sciences offers free eBook versions of every
book published? Please subscribe to our email list to stay informed
about our free eBook promotions. Get in touch with us at
contact@aisciences.net for more details.
WWW.AISCIENCES.NET
Did you know that AI Sciences also offers online courses?
We want to help you in your career and take control of your future
with powerful and easy-to-follow courses in Data Science, Machine
Learning, Deep Learning, Statistics and all Artificial Intelligence
subjects.
© Copyright 2016 by AI Sciences
All rights reserved.
First Printing, 2016
ISBN-13: 978-1985134560
ISBN-10: 198513456X
Legal Notice:
You may not amend, distribute, sell, use, quote or paraphrase any
part of the content within this book without the consent of the author.
Disclaimer Notice:
Introduction to Artificial Neural Networks
An Artificial Neural Network (ANN) is a computational model based
on the structure and functions of biological neural networks. It
processes information much the way the human (or animal) brain
does: it includes a large number of connected processing units,
called neurons, that work together to process information and
generate meaningful results from it. In this book, we will take you
through a complete introduction to Artificial Neural Networks:
their structure and layers, applications, algorithms, tools and
technologies, practical implementations, and the benefits and
limitations of ANNs.
around the same time. In any case, the technology available at the
time did not allow them to do very much.
Soma (the cell body): It contains the nucleus. It sums all the
incoming signals to generate an input.
Axon: When the sum reaches a certain threshold value, the neuron
fires a signal which travels down the axon and is transmitted to
other neurons via the synaptic terminals.
Artificial Neurons
• The weighted sum is called the net input to unit i, usually
  written net_i.
• Note that w_ij refers to the weight from unit j to unit i (not
  the other way around).
• The function f is the unit's activation function. In the simplest
  case, f is the identity function, and the unit's output is just
  its net input. This is called a linear unit.
What Is an Artificial Neural Network?
Let Us Introduce
An Artificial Neural Network is an information processing technique
that works the way the human brain processes information. An ANN
incorporates a large number of connected processing units that work
together to process information and produce meaningful results from it.
a. Input Layer:
The nodes of the input layer are passive, meaning they do not
change the data. Each node receives a single value on its input and
duplicates that value to its many outputs, sending a copy to all of
the hidden nodes.
b. Hidden Layer
values inside the network. Each hidden node receives incoming arcs
from input nodes or from other hidden nodes, and has outgoing arcs
to output nodes or to other hidden nodes. In the hidden layer, the
actual processing is done via a system of weighted 'connections'.
There may be one or more hidden layers. The values entering a
hidden node are multiplied by weights, a set of predetermined
numbers stored in the program. The weighted inputs are then added
to produce a single number.
c. Output Layer
‘architecture’ or ‘topology’. It consists of the number of layers,
the elementary units, and the interconnecting weight-adjustment
mechanism. The choice of structure determines the results that will
be obtained, and it is the most critical part of implementing a
neural network.
Learning Process
Unsupervised Learning
Reinforcement Learning
The cost function C is an important concept in learning, as it is a
measure of how far away a particular solution is from an optimal
solution to the problem to be solved. Learning algorithms search
through the solution space to find a function that has the smallest
possible cost.
Why Neural Networks?
Let Us Introduce
the execution stage with an input that is not associated with an
output, the neuron selects, from the set of patterns it has been
taught, the output corresponding to the pattern that is least
different from the input. This is called generalization.
For example:
Network Topology
fully connected to the output layer.
Each neuron in one layer has directed connections to the neurons of
the subsequent layer.
The first layer is the input layer, the last layer is the output
layer, and the layers between them are hidden layers. A hidden
layer is internal to the network and has no direct connection with
the external environment. There can be more than one hidden layer;
however, theoretical work has shown that a single hidden layer is
sufficient to approximate any complex nonlinear function. The
complexity of the network increases with the number of hidden
layers, and when the number of hidden layers is large, the
effectiveness of the output response increases.
Multilayer networks use a variety of learning techniques, the most
popular being back-propagation. Here, the output values are
compared with the correct answer to compute the value of some
predefined error function. The algorithm then adjusts the weights
of each connection in order to reduce the value of the error
function by some small amount. After repeating this process for a
sufficiently large number of training cycles, the network will
usually converge to a state where the error of the computations is
small. In this case, one would say that the network has learned a
certain target function. To adjust the weights properly, one
applies a general method for non-linear optimization called
gradient descent.
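As a rough illustration, here is a minimal sketch of one
gradient-descent update for a single linear unit with squared error;
real back-propagation repeats this reasoning layer by layer, and the
names (alpha, d) are illustrative:

import numpy as np

def gradient_descent_step(weights, x, d, alpha=0.1):
    y = np.dot(weights, x)   # forward pass: the unit's output
    error = d - y            # difference from the target value d
    # dE/dw_j = -(d - y) * x_j, so step each weight against the gradient
    return weights + alpha * error * x

w = np.array([0.0, 0.0])
for _ in range(100):
    w = gradient_descent_step(w, x=np.array([1.0, 2.0]), d=3.0)
print(np.dot(w, [1.0, 2.0]))  # approaches the target 3.0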
Feedback Network
which changes continuously until it reaches a state of equilibrium.
Feedback networks may be divided into the following kinds:
MultiLayer Recurrent Network
iii. Hopfield Network
many people in subsequent work. They work tremendously well on a
large variety of problems and are now widely used.
v. Elman Network
Activation Functions
F(x) = x
Sigmoid Activation Function
function is binary, i.e., the output is either 0 or 1.
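The following minimal Python sketches illustrate these activation
functions; the threshold value theta and the exact conventions vary
from text to text, so treat these as illustrative definitions:

import numpy as np

def linear(x):
    return x                            # identity: F(x) = x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # smooth squash into (0, 1)

def binary_threshold(x, theta=0.0):
    return np.where(x >= theta, 1, 0)   # outputs 1 above the threshold, else 0

def bipolar_threshold(x, theta=0.0):
    return np.where(x >= theta, 1, -1)  # outputs +1 or -1 instead of 1 or 0

def ramp(x):
    return np.clip(x, 0.0, 1.0)         # linear threshold (RAMP), clipped to [0, 1]

print(sigmoid(0.0), binary_threshold(0.5), ramp(1.7))  # 0.5 1 1.0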
Supervised Learning
Y = f(X)
thought of as a teacher supervising the learning process. We know
the correct answers; the algorithm iteratively makes predictions on
the training data and is corrected by the teacher. Learning stops
when the algorithm achieves an acceptable level of performance.
Unsupervised Learning
Unsupervised learning is where you only have input data (X) and no
corresponding output variables. The goal of unsupervised learning
is to model the underlying structure or distribution of the data in
order to learn more about it.
own devices to discover and present the interesting structure in
the data.
Problems where you have a lot of input data (X) and only some of
the data is labelled (Y) are called semi-supervised learning
problems.
A good example is a photo archive where only some of the images are
labeled (e.g., dog, cat, person) and the majority are unlabeled.
Reinforcement Learning
Major Variants of Artificial Neural
Network
Multilayer Perceptron (MLP)
Multilayer perceptrons are sometimes colloquially referred to
as "vanilla" neural networks, especially when they have a single
hidden layer.
Activation Function
Alternative activation functions have been proposed, including the
rectifier and softplus functions. More specialized activation
functions include radial basis functions (used in radial basis
networks, another class of supervised neural network models).
Layers
Learning
where d is the target value and y is the value produced by the
perceptron. The node weights are adjusted based on corrections that
minimize the error in the entire output, given by
This depends on the change in weights of the kth nodes, which
represent the output layer. So, to change the hidden layer weights,
the output layer weights change according to the derivative of the
activation function, and thus this algorithm represents a
back-propagation of the activation function.
Terminology
loosening of the definition of "perceptron" to mean an artificial
neuron in general.
Applications
Let’s look at each of these in slightly more detail.
We have
The 6x6 image is now converted into a 4x4 image. Think of the
weight matrix like a paint brush painting a wall. The brush first
paints the wall horizontally and then comes down and paints the
next row horizontally. Pixel values are reused as the weight matrix
slides along the image. This essentially enables parameter sharing
in a convolutional neural network.
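A minimal NumPy sketch of this sliding weight matrix: a 3x3 kernel
moved over a 6x6 image with stride 1 produces a 4x4 output, and the
same shared kernel is reused at every position (the kernel values
here are arbitrary):

import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply the kernel with the current patch and sum the result
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0          # a simple averaging filter
print(convolve2d(image, kernel).shape)  # (4, 4)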
of succeeding convolution layers. Pooling is done for the sole
purpose of reducing the spatial size of the image. Pooling is done
independently on each depth dimension, so the depth of the image
remains unaffected. The most common form of pooling layer is max
pooling.
preserves the information that it’s a car on a street. If you
observe carefully, the dimensions of the image have been halved.
This helps reduce the parameters to a great extent.
recognition.
x_t is the input at time step t. For example, x_1 could be a
one-hot vector corresponding to the second word of a sentence.
You can think of the hidden state s_t as the memory of the network.
s_t captures information about what happened in all the previous
time steps. The output at step t, o_t, is computed based solely on
the memory at time t. As briefly discussed above, it’s a bit more
complex in practice, as s_t typically can’t capture information
from too many time steps ago.
The above figure has outputs at every time step, but depending on
the task this may not be necessary. For instance, when predicting
the sentiment of a sentence we may only care about the final
output, not the sentiment after every single word. Similarly, we
may not need inputs at every time step. The key feature of an RNN
is its hidden state, which captures some information about a
sequence.
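As a sketch, a single vanilla-RNN time step can be written as
s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t); the matrix
names U, W and V follow the usual convention and are not tied to any
specific library:

import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    s_t = np.tanh(U @ x_t + W @ s_prev)            # new hidden state (the "memory")
    logits = V @ s_t
    o_t = np.exp(logits) / np.sum(np.exp(logits))  # softmax output at this step
    return s_t, o_t

vocab_size, hidden_size = 5, 4
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, vocab_size))
W = rng.normal(size=(hidden_size, hidden_size))
V = rng.normal(size=(vocab_size, hidden_size))

s = np.zeros(hidden_size)
x = np.eye(vocab_size)[2]      # one-hot vector for word index 2
s, o = rnn_step(x, s, U, W, V)
print(o.sum())                 # 1.0: a probability distribution over the vocabulary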
RNNs have shown great success in many NLP tasks. At this point I
should mention that the most commonly used type of RNNs are LSTMs,
which are much better at capturing long-term dependencies than
vanilla RNNs are.
Bidirectional RNNs are built on the idea that the output at time t
may depend not only on the previous elements in the sequence, but
also on future elements. For example, to predict a missing word in
a sequence you want to look at both the left and the right context.
Bidirectional RNNs are quite simple to understand: they are just
two RNNs stacked on top of each other. The output is then computed
based on the hidden states of both RNNs.
Deep (Bidirectional) RNNs are similar to Bidirectional RNNs, except
that we now have multiple layers per time step. In practice this
gives us greater learning capacity (but we also need a lot of
training data).
LSTM networks are quite popular these days, and we briefly talked
about them in the section above. LSTMs don’t have a fundamentally
different architecture from RNNs, but they use a different function
to compute the hidden state. The memory in an LSTM is known as a
cell, and you can think of cells as black boxes that take as input
the previous state h_{t-1} and the current input x_t. Internally,
these cells decide what to keep in (and what to erase from) memory.
They then combine the previous state, the current memory, and the
input. It turns out that these types of units are very effective at
capturing long-term dependencies.
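For the curious, here is a minimal sketch of one LSTM cell step with
forget, input and output gates; it follows the standard formulation,
and every parameter name below is illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wo, Wc):
    z = np.concatenate([h_prev, x_t])       # previous state and current input together
    f = sigmoid(Wf @ z)                     # forget gate: what to erase from memory
    i = sigmoid(Wi @ z)                     # input gate: what to write to memory
    o = sigmoid(Wo @ z)                     # output gate: what to expose
    c_t = f * c_prev + i * np.tanh(Wc @ z)  # new cell ("memory") state
    h_t = o * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
Wf, Wi, Wo, Wc = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, Wf, Wi, Wo, Wc)
print(h.shape, c.shape)  # (4,) (4,)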
predict time series given time lags of unknown size and duration
between important events. LSTMs were developed to deal with the
exploding and vanishing gradient problems encountered when training
traditional RNNs. Relative insensitivity to gap length gives LSTMs
an advantage over other RNNs, hidden Markov models (HMMs), and
other sequence learning methods in many applications.
weights and biases, possibly from other units outside the LSTM
unit.
Deep Belief Networks
implementations and uses of DBNs in real-life applications
and scenarios (e.g., electroencephalography, drug discovery).
Deep Reservoir Computing
Tools and Technologies
Major libraries
TensorFlow
with a modular GUI on top of it. Some have criticized it for not
being as fast as some of the other heavily optimized libraries, but
this is not really the case.
Keras
Lasagne
Blocks
Pylearn2
DeepPy
Deepnet
Gensim
nolearn
Passage
Passage is one of the best-suited libraries for text analysis with
RNNs.
FANN
framework for easy handling of training data sets. It is easy to
use, versatile, well documented, and fast. Bindings to more than 20
programming languages are available. An easy-to-read introduction
article and a reference manual accompany the library, with examples
and recommendations on how to use it. Several graphical user
interfaces are also available for the library.
Python
The choice of Python for AI projects also stems from the fact that
there are plenty of useful libraries that can be used in AI. For
instance, NumPy provides scientific computation capability, SciPy
supports advanced computing, and PyBrain provides machine learning
in Python.
Java
community is also an advantage, as there will always be someone to
help you with your queries and problems.
Lisp
Lisp does very well in the AI field because of its excellent
prototyping capabilities and its support for symbolic expressions.
It is a powerful programming language and is used in major AI
projects such as Macsyma, DART, and CYC.
Prolog
structuring mechanisms. Combining these mechanisms offers
a flexible framework to work with.
C++
AIML
memory allocation, data types, recursion, associative retrieval,
functions as arguments, generators (streams), and cooperative
multitasking.
Haskell is also a very worthy programming language for AI. Lazy
evaluation and the list and LogicT monads make it easy to express
non-deterministic algorithms, which is often needed. Infinite data
structures are great for search trees. The language's features
permit a compositional way of expressing algorithms. The only
disadvantage is that working with graphs is a bit harder at first
because of purity.
Practical implementations
Text Classification
As with its ‘Naive’ counterpart, this classifier isn’t attempting
to understand the meaning of a sentence; it’s only trying to
classify it. In fact, so-called “AI chat-bots” do not understand
language either, but that’s not the topic for now.
import nltk
from nltk.stem.lancaster import LancasterStemmer
import os
import json
import datetime
stemmer = LancasterStemmer()
# 3 classes of training data
training_data = []
training_data.append({"class": "greeting", "sentence": "how are you?"})
training_data.append({"class": "greeting", "sentence": "how is your day?"})
training_data.append({"class": "greeting", "sentence": "good day"})
training_data.append({"class": "greeting", "sentence": "how is it going today?"})
training_data.append({"class": "goodbye", "sentence": "have a nice day"})
training_data.append({"class": "goodbye", "sentence": "see you later"})
training_data.append({"class": "goodbye", "sentence": "have a nice day"})
training_data.append({"class": "goodbye", "sentence": "talk to you soon"})
training_data.append({"class": "sandwich", "sentence": "make me a sandwich"})
training_data.append({"class": "sandwich", "sentence": "can you make a sandwich?"})
training_data.append({"class": "sandwich", "sentence": "having a sandwich today?"})
training_data.append({"class": "sandwich", "sentence": "what's for lunch?"})

print("%s sentences in training data" % len(training_data))
words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our training data
for pattern in training_data:
    # tokenize each word in the sentence
    w = nltk.word_tokenize(pattern['sentence'])
    # add to our words list
    words.extend(w)
    # add to documents in our corpus
    documents.append((w, pattern['class']))
    # add to our classes list
    if pattern['class'] not in classes:
        classes.append(pattern['class'])

# stem and lowercase each word, then remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = list(set(words))

# remove duplicates
classes = list(set(classes))
12 documents
3 classes ['greeting', 'goodbye', 'sandwich']
26 unique stemmed words ['sandwich', 'hav', 'a', 'how', 'for', 'ar', 'good', 'mak', 'me', 'it', 'day', 'soon', 'nic', 'lat', 'going', 'you', 'today', 'can', 'lunch', 'is', "'s", 'see', 'to', 'talk', 'yo', 'what']
Our training data is transformed into “bag of words” for each
sentence.
# create our training data: one bag of words per sentence
training = []
output = []
output_empty = [0] * len(classes)

for doc in documents:
    # bag of words: 1 if the stemmed word appears in the sentence, else 0
    pattern_words = [stemmer.stem(word.lower()) for word in doc[0]]
    bag = [1 if w in pattern_words else 0 for w in words]
    training.append(bag)
    # output is a '0' for each tag and '1' for current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    output.append(output_row)

# sample training/output
i = 0
w = documents[i][0]
print([stemmer.stem(word.lower()) for word in w])
print(training[i])
print(output[i])
is stemmed:
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]
The first step in machine learning is to have clean data.
Below we also implement our bag-of-words model function,
transforming an input sentence into an array of 0’s and 1’s. This
matches exactly the transform we used for our training data, and it
is crucial to get this right.
import numpy as np
import time

def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words
print ("found in bag: %s" %
w)
return(np.array(bag))
def train(X, y, hidden_neurons=10, alpha=1, epochs=50000,
          dropout=False, dropout_percent=0.5):

    last_mean_error = 1
    # randomly initialize our weights with mean 0
    synapse_0 = 2 * np.random.random((len(X[0]), hidden_neurons)) - 1
    synapse_1 = 2 * np.random.random((hidden_neurons, len(classes))) - 1

    prev_synapse_0_weight_update = np.zeros_like(synapse_0)
    prev_synapse_1_weight_update = np.zeros_like(synapse_1)

    synapse_0_direction_count = np.zeros_like(synapse_0)
    synapse_1_direction_count = np.zeros_like(synapse_1)

    for j in iter(range(epochs + 1)):
        # feed forward through layers 0, 1 and 2
        layer_0 = X
        layer_1 = sigmoid(np.dot(layer_0, synapse_0))
        if dropout:
            # randomly silence a fraction of hidden units and rescale the rest
            layer_1 *= np.random.binomial(
                [np.ones((len(X), hidden_neurons))],
                1 - dropout_percent)[0] * (1.0 / (1 - dropout_percent))
        layer_2 = sigmoid(np.dot(layer_1, synapse_1))
        else:
            print("break:", np.mean(np.abs(layer_2_error)), ">", last_mean_error)
            break

        synapse_1_weight_update = layer_1.T.dot(layer_2_delta)
        synapse_0_weight_update = layer_0.T.dot(layer_1_delta)

        synapse_1 += alpha * synapse_1_weight_update
        synapse_0 += alpha * synapse_0_weight_update

        prev_synapse_0_weight_update = synapse_0_weight_update
        prev_synapse_1_weight_update = synapse_1_weight_update
    now = datetime.datetime.now()

    # persist synapses
    synapse = {'synapse0': synapse_0.tolist(),
               'synapse1': synapse_1.tolist(),
               'datetime': now.strftime("%Y-%m-%d %H:%M"),
               'words': words,
               'classes': classes
               }
    synapse_file = "synapses.json"
We use 20 neurons in our hidden layer; you can adjust this easily.
These parameters will vary depending on the dimensions and shape of
your training data; tune them until the error comes down to around
10^-3, which is a reasonable error rate.
X = np.array(training)
y = np.array(output)
start_time = time.time()
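With everything in place, a plausible invocation of the train()
function sketched above looks like the following; the hyperparameter
values shown are illustrative and can be tuned as described:

# illustrative call, assuming the train() signature sketched earlier;
# tune alpha, epochs and dropout for your own data
train(X, y, hidden_neurons=20, alpha=0.1, epochs=100000,
      dropout=False, dropout_percent=0.2)

elapsed_time = time.time() - start_time
print("processing time:", elapsed_time, "seconds")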
delta after 50000 iterations:0.00261467859609
delta after 60000 iterations:0.00237219554105
delta after 70000 iterations:0.00218521899378
delta after 80000 iterations:0.00203547284581
delta after 90000 iterations:0.00191211022401
delta after 100000 iterations:0.00180823798397
saved synapses to: synapses.json
processing time: 6.501226902008057 seconds
# probability threshold
ERROR_THRESHOLD = 0.2

# load our calculated synapse values
synapse_file = 'synapses.json'
with open(synapse_file) as data_file:
    synapse = json.load(data_file)
    synapse_0 = np.asarray(synapse['synapse0'])
    synapse_1 = np.asarray(synapse['synapse1'])
    return_results = [[classes[r[0]], r[1]] for r in results]
    print("%s \n classification: %s" % (sentence, return_results))
    return return_results
are the same: some predictive situations require more confidence than
others.
Image Processing
Any 3-year-old child can recognize a photo of a bird, but figuring
out how to make a computer recognize objects has puzzled the very
best computer scientists for over 50 years.
Starting Simple
We have also seen that the idea of machine learning is that the
same generic algorithms can be reused with different data to solve
different problems. So let's modify this same neural network to
recognize handwritten text. But to make the task really simple,
we'll only try to recognize one character: the numeral "8".
Machine learning only works when you have data, and preferably a
lot of data. So we need lots and lots of handwritten "8"s to get
started. Fortunately, researchers created the MNIST data set of
handwritten numbers for this very purpose. MNIST provides 60,000
images of handwritten digits, each as an 18x18 image. Here are some
"8"s from the data set:
Some 8s from the MNIST data set
Notice that our neural network also has two outputs now (instead of
just one). The first output will predict the likelihood that the
image is an "8" and the second output will predict the likelihood
it isn't an "8". By having a separate output for each type of
object we want to recognize, we can use a neural network to
classify objects into groups.
Our neural network is also much bigger than last time (324 inputs
instead of 3!). But any modern computer can handle a neural network
with a few hundred nodes without blinking. This would even work
fine on your cell phone.
All that's left is to train the neural network with images of "8"s
and not-"8"s so it learns to tell them apart. When we feed in an
"8", we'll tell it the probability the image is an "8" is 100% and
the probability it's not an "8" is 0%. Vice versa for the
counter-example images.
Tunnel Vision
The good news is that our "8" recognizer really does work well on
simple images where the digit is right in the middle of the image:
But now the really bad news:
Our "8" recognizer totally fails to work when the digit isn't
perfectly centered in the image. Just the slightest position change
ruins everything:
This approach is called a sliding window. It's the brute force
solution. It works well in some limited cases, but it's really
inefficient. You have to check the same image over and over looking
for objects of different sizes. We can do better than this!
Brute Force Idea #2: More data and a Deep Neural Net
When we trained our network, we only showed it "8"s that were
perfectly centered. What happens if we train it with more data,
including "8"s in all different positions and sizes all around the
image?
We don't even need to collect any new training data. We can just
write a script to generate new images with the "8"s in all kinds of
different positions in the image:
We generated Synthetic Training Data by creating different versions
of the training images we already had. This is a very useful
technique! Using it, we can easily create an endless supply of
training data.
We call this a "deep neural network" because it has more layers
than a traditional neural network.
This idea has been around since the late 1960s. But until recently,
training such a large neural network was just too slow to be
useful. But once we figured out how to use 3D graphics cards (which
were designed to do matrix multiplication really fast) instead of
normal computer processors, working with large neural networks
suddenly became practical. In fact, the exact same NVIDIA GeForce
GTX 1080 video card that you use to play Overwatch can be used to
train neural networks incredibly quickly.
But even though we can make our neural network really big and train
it quickly with a 3D graphics card, that still isn't going to get
us all the way to a solution. We need to be cleverer about how we
feed images into our neural network.
Think about it: it doesn't make sense to train a network to
recognize an "8" at the top of a picture separately from training
it to recognize an "8" at the bottom of a picture, as if those were
two totally different objects.
There should be a way to make the neural network smart enough to
know that an "8" anywhere in the picture is the same thing, without
all that extra training. Luckily… there is!
As a human, you instantly recognize the hierarchy in this
picture:
• The ground is covered in grass and concrete
• There is a child
• The child is sitting on a bouncy horse
• The bouncy horse is on top of the grass
We’ll do this using a technique called Convolution. The idea of
convolution is inspired partly by computer science and partly
by biology (i.e. mad scientists literally poking cat brains with
weird probes to figure out how cats process images).
How Convolution Works
Rather than feeding entire images into our neural network as one
grid of numbers, we're going to do something a lot smarter that
takes advantage of the idea that an object is the same no matter
where it appears in an image. Here's how the process works, step by
step:
Step 1: Break the image into overlapping image tiles
Similar to our sliding window search above, let’s pass a sliding
window over the entire original image and save each result as
a separate, tiny picture tile:
Previously, we fed a single image into a neural network to check if
it was an "8". We'll do the exact same thing here, but we'll do it
for each individual image tile:
In other words, we started with a large original image and ended
with a slightly smaller array that stores information about which
sections of the original image were the most interesting.
Step 4: Downsampling
The result of Step 3 was an array that maps out which parts of the
original image are the most interesting. But that array is still
quite large:
To reduce the size of this array, we downsample it using an
algorithm called max pooling. It sounds fancy, but it isn't at all.
We'll just look at each 2x2 square of the array and keep the
biggest number:
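Here is a minimal NumPy sketch of exactly that 2x2 max pooling:

import numpy as np

def max_pool_2x2(arr):
    h, w = arr.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            # keep only the biggest number in each 2x2 square
            out[i // 2, j // 2] = np.max(arr[i:i+2, j:j+2])
    return out

a = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [6, 1, 0, 2],
              [3, 8, 4, 4]], dtype=float)
print(max_pool_2x2(a))
# [[4. 5.]
#  [8. 4.]]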
A few more steps:
The image processing pipeline is a sequence of steps: convolution,
max pooling, and finally a fully-connected network.
When solving problems in the real world, these steps can be
combined and stacked as many times as you want! You can have two,
three, or even ten convolution layers. You can throw in max pooling
wherever you want to reduce the size of your data.
The basic idea is to start with a large image and continually boil
it down, step by step, until you finally have a single result. The
more convolution steps you have, the more complicated the features
your network will be able to learn to recognize.
For instance, the initial convolution step might learn to identify
sharp edges, the second convolution step might identify beaks
using its knowledge of sharp edges, the third step might
recognize entire birds using its knowledge of beaks, etc.
Here’s what a more realistic deep convolutional network (like
you would find in a research paper) looks like:
In this case, they start with a 224 x 224 pixel image, apply
convolution and max pooling twice, apply convolution three more
times, apply max pooling, and then have two fully-connected layers.
The end result is that the image is classified into one of 1000
categories!
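A rough TFLearn-style sketch of that kind of stack might look like
the following; the filter counts and layer sizes here are
illustrative, not the exact network from the paper:

from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

network = input_data(shape=[None, 224, 224, 3])
network = conv_2d(network, 64, 3, activation='relu')    # convolution
network = max_pool_2d(network, 2)                       # max pooling
network = conv_2d(network, 128, 3, activation='relu')   # convolution
network = max_pool_2d(network, 2)                       # max pooling
network = conv_2d(network, 256, 3, activation='relu')   # three more convolutions
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 2)                       # max pooling
network = fully_connected(network, 4096, activation='relu')     # two fully-connected layers
network = fully_connected(network, 1000, activation='softmax')  # one of 1000 categories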
we’ll also add in the Caltech-UCSD Birds-200–2011 data set
that has another 12,000 bird pictures.
Here are a few of the birds from our combined data set:
This data set will work fine for our purposes, but 72,000 low-res
images is still pretty small for real-world applications. If you
want Google-level performance, you need millions of large images.
In machine learning, having more data is almost always more
important than having better algorithms. Now you know why Google is
so happy to offer you unlimited photo storage: they want all of
your image data.
To build our own classifier, we'll use TFLearn. TFLearn is a
wrapper around Google's TensorFlow deep learning library that
exposes a simplified API. It makes building convolutional neural
networks as easy as writing a few lines of code to define the
layers of our network.
Once we have a trained neural network, we can use it. Here's a
simple script that takes in a single image file and predicts
whether it is a bird or not.
"""
Based on the tflearn example located here:
https://github.com/tflearn/tflearn/blob/master/e
xamples/images/convnet_cifar10.py
"""
from __future__ import division, print_function,
absolute_import
# Make sure the data is normalized
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Define our network input, wiring in the preprocessing and augmentation
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
# Step 1: Convolution
network = conv_2d(network, 32, 3, activation='relu')
# Step 7: Dropout - throw away some data randomly during training
# to prevent over-fitting
network = dropout(network, 0.5)

# Tell tflearn how we want to train the network
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
# Scale it to 32x32
img = scipy.misc.imresize(img, (32, 32),
                          interp="bicubic").astype(np.float32, casting='unsafe')
# Predict
prediction = model.predict([img])

# Check the result (assuming index 1 is the "bird" class)
is_bird = np.argmax(prediction[0]) == 1

if is_bird:
    print("That's a bird!")
else:
    print("That's not a bird!")
If you are training with a good video card with enough RAM
(like an Nvidia GeForce GTX 980 Ti or better), this will be
done in less than an hour. If you are training with a normal
CPU, it might take a lot longer.
As it trains, the accuracy will increase. After the first pass, I got
75.4% accuracy. After just 10 passes, it was already up to
91.7%. After 50 or so passes, it capped out around 95.5%
accuracy and additional training didn’t help, so I stopped it
there.
Congrats! Our program can now recognize birds in images!
On the other hand to really see how effective our network is,
we need to check it with lots of images. The data set created
by me held back 15,000 images for validation. On running
those 15,000 images through the network, it predicted the
correct answer 95% of the time.
Horses and trucks don’t fool us!
Third, here are some images that we thought were
birds but were not really birds at all. These are
our False Positives:
Imagine if we were writing a program to detect cancer from an
MRI image. If we were detecting cancer, we’d rather have false
positives than false negatives. False negatives would be the
worst possible case — that’s when the program told someone
they definitely didn’t have cancer but they actually did.
Instead of just looking at overall accuracy, we calculate Precision
and Recall metrics. These give us a clearer picture of how well we
did:
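A minimal sketch of how these two metrics are computed from true
positives, false positives and false negatives; the counts below are
placeholders, not our bird results:

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of everything we called a bird, how much really was one?
    recall = tp / (tp + fn)     # of all the real birds, how many did we find?
    return precision, recall

p, r = precision_recall(tp=900, fp=50, fn=100)
print("precision=%.2f recall=%.2f" % (p, r))  # precision=0.95 recall=0.90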
Major NN projects
criteria used to classify, and can do so in a generalized manner,
by repeatedly showing a neural network inputs classified into
groups. Neural networks provide a fresh approach to music
classification, so in this experiment a new music classification
method based on a BP neural network is proposed.
Face Recognition Using Neural Network
Predicting Relative Performance of Computer
Processors with Neural Networks
The purpose of this experiment is to study the feasibility of
classifying animal species using neural networks. An animal class
is made up of animals that are all alike in important ways. Hence
we need to train a neural network so that it can predict which
class a particular species belongs to. Once we have decided on a
problem to solve using neural networks, we will want to gather data
for training purposes. The training data set includes a wide
variety of cases, each comprising values for a range of input and
output variables.
Another variant for this type of project is classification of
animal species on the basis of 17 Boolean-valued attributes.
This project tests Neuroph with the Car data set, which can be
found here:
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation.
Several architectures will be tried out, and it will be determined
which ones represent a good solution to the problem and which ones
do not. The Car Evaluation data set was derived from a simple
hierarchical decision model.
The Neuroph framework is used to train a neural network on the
database for fitting contact lenses (the Lenses data set). The data
set is taken from a paper by Cendrowska (1988) on the inductive
analysis of a set of ophthalmic data. The Lenses data set tries to
predict whether a person will need soft contact lenses, hard
contact lenses or no contacts, by determining relevant features of
the client.
The data set has 4 features (age of the patient, spectacle
prescription, presence of astigmatism, and tear production rate)
along with an associated three-valued class that gives the suitable
lens prescription for the patient (hard contact lenses, soft
contact lenses, no lenses).
Balance Scale Classification Using Neural Networks
Predicting the Result of Football Match with Neural
Networks
various mineral and chemical admixtures. Up to the present time,
the construction industry has had to rely on relatively few human
experts for approvals when solving the high-performance concrete
mix design problem, which usually requires costly human expertise.
However, the situation may be improved with the application of
artificial intelligence, which mimics the way the human brain
thinks and gives suggestions. The usefulness of artificial
intelligence in solving difficult problems has become recognized,
and its development is being pursued in many fields.
Concrete Compressive Strength Test
The features are RI: refractive index, Na: sodium (unit of
measurement: weight percent in corresponding oxide, as for
attributes 4-10), Mg: magnesium, Al: aluminum, Si: silicon, K:
potassium, Ca: calcium, Ba: barium, Fe: iron.
The main aim of this experiment is to train a neural network to
classify glass into these 7 types.
Teaching Assistant Evaluation
The main goal is to train a neural network with data, which can be
found online, to classify the quality of teaching performance. The
data set consists of evaluations of teaching performance over three
regular semesters and two summer semesters of 164 teaching
assistant (TA) assignments at the Mathematics Department of the
University of Wisconsin-Madison. The scores were divided into 3
roughly equal-sized classes ("low", "medium", and "high") to form
the class variable.
Predicting Protein Localization Sites Using Neural
Networks
set. The data used in this experiment can be found at the Europe
Data Center. The data collected refer to 49 European countries.
Each country has 26 input features and 1 output attribute, which is
the religion.
23. Whether an animated image (e.g., an eagle, a tree, a human
hand) is present or not
24. Whether any letters or writing (e.g., a motto or slogan) are
present on the flag or not
25. Which color is in the top-left corner (moving right to decide
tie-breaks)
26. Which color is in the bottom-right corner (moving left to
decide tie-breaks)
Our main goal here is to use the twelve input features (in the
original data set) to predict the burned area of forest fires. The
output "area" was first transformed with a ln(x+1) function. Then,
several data mining methods were applied. After fitting the models,
the outputs were post-processed with the inverse of the ln(x+1)
transform. Four distinct input setups were used. The experiments
were conducted using 10-fold cross-validation with 30 runs. Two
regression metrics were measured: MAD and RMSE. A Gaussian support
vector machine (SVM) fed with only 4 direct weather conditions
(temp, RH, wind and rain) obtained the best MAD value: 12.71 +-
0.01 (mean and 95% confidence interval using a t-student
distribution). The best RMSE was attained by the naive mean
predictor. An analysis of the regression error characteristic (REC)
curve shows that the SVM model predicts more examples within a
lower admitted error. In effect, the SVM model is better at
predicting small fires, which are the majority.
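A minimal NumPy sketch of the ln(x+1) transform and its inverse; the
area values and the placeholder "model" step are illustrative:

import numpy as np

area = np.array([0.0, 4.6, 12.7, 88.5])  # burned area in hectares (made-up values)
y = np.log1p(area)                       # ln(x + 1): compresses the long right tail

# ... a regression model would be fitted on y instead of area ...
y_pred = y                               # placeholder for the model's predictions

area_pred = np.expm1(y_pred)             # inverse transform: exp(y) - 1
print(np.allclose(area_pred, area))      # True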
Wine Classification Using Neural Networks
The study of artificial neural networks (NNs) is ubiquitous in the
research literature, and their application spans many research
fields, including computer science, artificial intelligence,
optimization, data mining, statistics, and even bioinformatics,
medicine, and many more.
Despite some shortcomings that NNs have, like the limited
interpretability of the built model, they are still a broadly used
technique, included in most data analytics frameworks. Since the
neural network model is hard to understand, software packages,
especially commercial ones, typically simplify the NN model,
reducing it to several parameters that users can modify. There are
only a few software products that offer a full range of
customizable neural network models, and they require proficiency in
understanding the neural network paradigm. In the open-source
community, there are currently several stable neural network
frameworks that offer experts the tools for full customization of
NN models.
Open Source Resources
Issues and Challenges
Uncertainty
The complexity of a neural network can be expressed through the
number of parameters. In the case of deep neural networks, this
number can be in the range of millions, tens of millions, and in
some cases even hundreds of millions. Let's call this number P.
Since you want to be certain of the model's ability to generalize,
a good rule of thumb for the number of data points is at least P*P.
Overfitting in Neural Networks
a model is normally trained by maximizing its performance on a
particular training data set. The model thus memorizes the training
examples but does not learn to generalize to new situations or
unseen observations of the data set.
Hyperparameter Optimization
Facebook's Oregon data center is shown in the figure above.
Industry-scale deep learning systems require high-end data centers,
while smart devices such as drones, robots, and other mobile
devices need small but efficient processing units. Deploying deep
learning solutions to the real world thus becomes a costly and
power-consuming affair.
Conversely, Murray Shanahan, Professor of Cognitive Robotics at
Imperial College London, has produced a paper with his team which
discusses Deep Symbolic Reinforcement Learning, which demonstrates
advances toward solving the above-mentioned hurdles.
Lack of Flexibility and Multitasking
technology, it is very common to come across some hurdles and
complications; the same is the case with any technological
progress. The future will reveal the answer to the question "Is
deep learning our best solution towards real AI?"
Applications of ANN
Speech Recognition
Multilayer networks
Multilayer networks with recurrent connections
Kohonen self-organizing feature map
Character Recognition
It is a fascinating problem which falls under the general domain of
pattern recognition. Many neural networks have been developed for
the automatic recognition of handwritten characters, either letters
or digits. The following are some ANNs which have been used for
character recognition:
It is one of the biometric approaches for recognizing a given face.
It is a distinctive task because of the difficulty of
characterizing "non-face" images. However, if a neural network is
well trained, it can divide images into two classes: images that
contain faces and images that do not.
Traveling Salesman's Problem
Future in NN
Self-diagnosis of medical problems using neural
networks
And much more!
Summary
short-term memory, deep reservoir computing, deep belief networks,
etc.
There are many issues and challenges faced by ANN users nowadays; a
brief section outlined these.
The applications of ANNs are large in number and variety. Some were
mentioned in this document.
Thank You!
Thank you for buying this book! It is intended to help you
understand machine learning. If you enjoyed this book and felt that
it added value to your life, we ask that you please take the time
to review it.
Your honest feedback would be greatly appreciated. It
really does make a difference.
If you notice any problems, please let us know by sending us an
email at review@aisciences.net before writing any review online. It
will be very helpful for us to improve the quality of our books.
https://www.amazon.com/dp/B07FTPKJMM