
EIE 520

ARTIFICIAL NEURAL NETWORKS


Lead Lecturer: Prof. E. Adetiba, R.Engr.(COREN), Ph.D
Co-Lecturer: Mrs. Comfort Lawal, Mr. John Abubarka
Time: Two hours weekly
Venue: 500L Hall, EIE

Module 1
Introduction to Artificial Neural Network

1.1 Artificial Neural Network (ANN) Definition


An ANN is an information processing model inspired by the biological nervous system. ANNs were developed as generalizations of mathematical models of neural biology, based on the following assumptions:
1. Neurons are simple units in a nervous system at which information processing occurs.
2. Incoming information consists of signals that are passed between neurons through connection links.
3. Each connection link has a corresponding weight that multiplies the transmitted signal.
4. Each neuron applies an activation function to its net input (the sum of its weighted input signals) to determine the output signal.
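
As a quick illustration of assumptions 3 and 4 above, the sketch below computes a single neuron's net input as the sum of weighted input signals and passes it through a step activation function; the input values, weights, bias and choice of activation are illustrative assumptions, not part of the definition itself.

```python
import numpy as np

# Illustrative single artificial neuron (all values are made up for this sketch).
x = np.array([0.5, -1.0, 2.0])    # incoming signals from other neurons
w = np.array([0.8,  0.2, 0.4])    # one connection weight per link (assumption 3)
b = -0.3                          # bias term

net = np.dot(w, x) + b            # net input: sum of weighted input signals
y = 1.0 if net >= 0 else 0.0      # step activation determines the output signal (assumption 4)
print(net, y)                     # 0.7  1.0
```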

Figure 1.1: Biological Neuron (Source: Fundamentals of Neural Networks: Architectures, Algorithms, and Applications by Laurene Fausett)
1.2 Artificial Neural Network (ANN) Characteristics
 Some ANN properties that are inspired by the biological neural network are:
1. Non-linearity: an interconnection of nonlinear neurons, with the non-linearity distributed throughout the network.
2. Input-Output mapping: the ability to learn with or without a teacher.
3. Adaptability: the free parameters (synaptic connections) can adapt to changes in the surrounding environment.
4. Evidential response: able to make decisions with a measure of confidence.
5. Fault Tolerance: exhibits graceful degradation, which implies that if a fault is small the degree of degradation will be small.
6. VLSI implementability: the neurons in an ANN are parallel computation units and can be implemented on very large scale integrated circuits (e.g. ASIC, DSP, FPGA, TPU) and parallel computing platforms (e.g. multi-GPU, multi-core CPU).

Figure 1.2: Artificial Neuron

Figure 1.3: Biological Neural Network (input signals, connection weights, output signals)

Figure 1.4: Typical Artificial Neural Network (ANN) architectures.
A) Shallow Architecture (input layer, one hidden layer, output layer)
B) Deep Architecture (input layer, multiple hidden layers, output layer)


1.3.1 Similarity/Analogy of Biological Neural Network
with ANN
S/N   Biological Neural Network   Artificial Neural Network (ANN)
1     Soma                        Neuron
2     Dendrite                    Input
3     Axon                        Output
4     Synapse                     Weight
1.3.2 Differences Between the Biological and Artificial Neural Networks
1. Parameters: The human brain has about 10 million times more synapses than an ANN; ResNet, one of the deepest ANN architectures, has about 60 million synapses (parameters).
2. Topology: The human brain has no layers, whereas an ANN can have several layers.
3. Operation Mode: The human brain works asynchronously, while an ANN works synchronously.
4. Learning Algorithm: ANNs use gradient descent learning (and other learning algorithms), while how the human brain learns is not yet known.
5. Power consumption: The human brain uses very little power compared to an ANN.
6. Learning Phases: The human brain never stops learning as long as it is alive, while an ANN is first trained and then tested.
1.4 History of ANN
 The history of ANN began nearly simultaneously with that of programmable electronic computers, and it is divided into five timelines:
i) The beginning of ANN (1940s).
ii) The first golden age (1950s and 1960s).
iii) The quiet years (1970s).
iv) The renaissance or renewed enthusiasm era (1980s and 1990s).
v) The modern era (2000s – till date).

Figure 1.5: Some Founding Fathers of ANN (From left to right and top to bottom: John
von Neumann, Donald O. Hebb, Marvin Minsky, Bernard Widrow, Seymour Papert,
Teuvo Kohonen, John Hopfield, Yann LeCun, Geoffrey Hinton, Yoshua Bengio).
1.4.1 The beginning of ANN (1940s)
(a) McCulloch-Pitts Neurons:
 The first artificial neuron was designed by Warren McCulloch and Walter Pitts [McCulloch & Pitts, 1943]. These neurons are most widely used as logic circuits. Their subsequent work [Pitts & McCulloch, 1943] addressed translation- and rotation-invariant pattern recognition.
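
A minimal sketch of how a McCulloch-Pitts style neuron can act as a logic circuit: with unit weights and a suitable firing threshold it realizes the AND or OR function over binary inputs. The threshold values used here are the usual textbook choices, not taken from the 1943 paper.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of binary inputs reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# With weights (1, 1): threshold 2 realizes AND, threshold 1 realizes OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              mcculloch_pitts((x1, x2), (1, 1), threshold=2),   # AND
              mcculloch_pitts((x1, x2), (1, 1), threshold=1))   # OR
```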

(b) Hebb Learning:
 Donald Hebb, a psychologist at McGill University, developed the first learning law for ANN in 1949. Rochester, Holland, Haibt & Duda (1956) refined Hebb's work to allow computer simulations of ANNs. McClelland and Rumelhart further expanded the work of Hebb in 1988.
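
A hedged sketch of Hebb's learning law in its simplest form, Δw_i = x_i · y: a weight grows when its input and the target output are active together. The bipolar AND training set and NumPy implementation below are illustrative assumptions, not drawn from the cited works.

```python
import numpy as np

def hebb_train(samples, targets):
    """Simplest Hebbian update: w <- w + x * y for each (input, target) pair."""
    w = np.zeros(samples.shape[1])
    b = 0.0
    for x, y in zip(samples, targets):
        w = w + x * y
        b = b + y
    return w, b

# Bipolar AND function used as an illustrative training set
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
print(hebb_train(X, T))   # weights [2. 2.] and bias -2.0
```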
1.4.2 The first golden age (1950s and 1960s)
 Although ANNs are often seen today as an alternative to (or complement of) traditional computing, it should be noted that John von Neumann was keenly interested in modeling the brain [von Neumann, 1958].
(a) Perceptrons:
 Frank Rosenblatt (1958, 1959, 1960), together with several other researchers [Block, 1962; Minsky & Papert, 1969], introduced and developed a large class of ANNs called Perceptrons.
 The Perceptron learning rule uses an iterative weight adjustment that is more powerful than the Hebb rule. However, the limitations of what the Perceptron can learn were demonstrated by Minsky and Papert (1969).
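
A hedged sketch of the perceptron's iterative weight adjustment: whenever the predicted output disagrees with the target, the weights and bias are moved in the direction of the error. The learning rate, epoch count and OR training set are illustrative assumptions; on a non-separable problem such as XOR this rule would not converge.

```python
import numpy as np

def perceptron_train(X, t, lr=1.0, epochs=10):
    """Perceptron rule: w <- w + lr * (target - output) * x, applied per example."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) + b >= 0 else 0   # threshold output unit
            w = w + lr * (target - y) * x
            b = b + lr * (target - y)
    return w, b

# Linearly separable OR function as an illustrative training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 1])
print(perceptron_train(X, t))   # converges to weights [1. 1.] and bias -1.0
```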
(b) Adaline:
 Bernard Widrow and his student Marcian Hoff developed a learning rule called Least Mean Squares (LMS), or the delta rule, which is closely related to the perceptron learning rule [Widrow & Hoff, 1960].
 The similarity of the models developed in psychology by Rosenblatt to those developed in electrical engineering by Widrow and Hoff is evidence of the interdisciplinary nature of ANN research.
 The Widrow-Hoff learning rule for a single-layer network is the precursor of the backpropagation rule for multilayer networks.
 ADALINE stands for ADAptive LInear NEuron. It has been applied to adaptive antenna systems [Widrow, Mantey, Griffiths & Goode, 1967], rotation-invariant pattern recognition, and a variety of control systems such as broom balancing and backing up a truck.
 MADALINEs are multilayer extensions of ADALINEs [Widrow & Hoff, 1960; Widrow & Lehr, 1990].
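
A hedged sketch of the Widrow-Hoff (LMS, or delta) rule: unlike the perceptron rule, the update uses the error of the linear net input itself, so it minimizes the mean squared error. The learning rate, epoch count and bipolar AND data are illustrative assumptions.

```python
import numpy as np

def adaline_train(X, t, lr=0.1, epochs=50):
    """LMS / delta rule: w <- w + lr * (target - net) * x, using the linear output."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            net = np.dot(w, x) + b            # ADALINE's linear output
            w = w + lr * (target - net) * x
            b = b + lr * (target - net)
    return w, b

# Bipolar AND as an illustrative target; classification afterwards uses sign(net)
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])
print(adaline_train(X, t))   # approaches weights [0.5 0.5] and bias -0.5
```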
1.4.3 The quiet years (1970s)
(a) Kohonen:
 Teuvo Kohonen's early work at Helsinki University of Technology dealt with associative memory neural networks [Kohonen, 1972].
 He later worked on self-organising feature maps that use a topological structure for the cluster units.
 These networks have been applied to speech recognition [Kohonen, Torkkola, Shozakai, Kangas & Venta, 1987; Kohonen, 1988], the "Traveling Salesman Problem" [Angeniol, Vaubois & Le Texier, 1988] and the composition of music [Kohonen, 1989b].
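
A minimal sketch of a Kohonen-style self-organizing map update, under simplifying assumptions (a one-dimensional grid of cluster units, a fixed learning rate, and a neighbourhood of radius 1): only the winning unit and its grid neighbours are pulled toward each input vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, dim = 5, 2
weights = rng.random((n_units, dim))          # 1-D map of cluster units

def som_step(x, weights, lr=0.3, radius=1):
    """Move the winning unit and its grid neighbours toward the input vector."""
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    for j in range(max(0, winner - radius), min(n_units, winner + radius + 1)):
        weights[j] += lr * (x - weights[j])
    return weights

for x in rng.random((200, dim)):              # illustrative 2-D inputs
    weights = som_step(x, weights)
print(weights)                                # units spread over the input space
```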
(b) Anderson:
 James Anderson started his research in ANN with associative memory nets at Brown University [Anderson, 1968, 1972]. He developed this into the "Brain-in-a-Box" model [Anderson, Silverstein, Ritz, & Jones, 1977]. Applications of this model include medical diagnosis and the learning of multiplication tables.

(c) Grossberg:
 The works of Stephen Grossberg at the Center for Adaptive Systems at Boston University (although very mathematical and biological) are widely known in the ANN community [Grossberg, 1976, 1980, 1982, 1987, 1988].
(d) Carpenter:
 Stephen Grossberg and Gail Carpenter developed a theory
of self-organizing neural networks called Adaptive
Resonance Theory for binary input patterns (ART1) and
for continuously valued inputs (ART2).

1.4.4 The renaissance or renewed enthusiasm era (1980s and 1990s)

(a) Backpropagation:
 The failure of the single-layer Perceptron to solve some simple problems, such as the XOR function, and the lack of a general method for training multilayer networks led to the "quiet years" of the 1970s.
 Werbos (1974) developed a method for propagating errors at the output units back to the hidden units, but it did not gain wide publicity.

 David Parker (1985) and LeCun (1986) also discovered this method independently before it became widely known.
 Parker's work was subsequently refined and publicised by David Rumelhart of the University of California at San Diego and James McClelland of Carnegie Mellon University [Rumelhart, Hinton & Williams, 1986a, 1986b; McClelland & Rumelhart, 1988].
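
A hedged sketch of propagating output errors back to a hidden layer, in the spirit of the Rumelhart-Hinton-Williams formulation: a forward pass through a sigmoid hidden layer, then the chain rule carries the output error back to both weight matrices. The network size, learning rate, iteration count and XOR training set are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)     # XOR: not solvable by a single layer

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)     # input  -> hidden weights
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)     # hidden -> output weights
lr = 0.5

for _ in range(10000):
    H = sigmoid(X @ W1 + b1)                        # forward pass through the hidden layer
    Y = sigmoid(H @ W2 + b2)                        # network output
    dY = (Y - T) * Y * (1 - Y)                      # output error term
    dH = (dY @ W2.T) * H * (1 - H)                  # error propagated back to the hidden units
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(Y, 2))                               # typically close to the XOR targets
```

XOR is a convenient test here precisely because it is the problem a single-layer perceptron cannot solve, so a correct result indicates the hidden layer and the backpropagated error are doing useful work.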
(b) Hopfield nets:
 John Hopfield of the California Institute of Technology, with David Tank of AT&T, developed ANNs based on fixed weights and adaptive activations [Hopfield, 1982, 1984; Hopfield & Tank, 1985, 1986; Tank & Hopfield, 1987].
 These networks serve as associative memory networks and can be used to solve problems like the "Traveling Salesman Problem".
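
A hedged sketch of a Hopfield-style associative memory: the weights are fixed by a Hebbian outer-product rule that stores bipolar patterns, and asynchronous activation updates recall a stored pattern from a corrupted cue. The stored patterns and the noisy probe below are illustrative.

```python
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])          # illustrative bipolar memories

# Fixed weights from the Hebbian outer-product rule, with no self-connections
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

def recall(state, steps=20):
    """Asynchronous updates: each unit settles to the sign of its net input."""
    state = state.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

noisy = np.array([1, -1, -1, -1, 1, -1])              # corrupted copy of the first pattern
print(recall(noisy))                                  # settles back on [1 -1 1 -1 1 -1]
```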

(c) Neocognitron:
 Kunihiko Fukushima developed an early self-organizing network called the cognitron for character recognition [Fukushima, 1975]. The cognitron failed to recognize position- or rotation-distorted characters; the neocognitron was created to correct this deficiency [Fukushima, 1988; Fukushima, Miyake & Ito, 1983].
(d) Boltzmann machine:
 Some researchers developed nondeterministic ANNs in which the weights or activations are changed on the basis of a probability density function [Kirkpatrick, Gelatt & Vecchi, 1983].

(e) Hardware implementation:
 One major reason for the renewed interest in ANN, apart from the solution to the problem of how to train a multilayer ANN, is improved computational capability.
 Optical neural networks were developed by Farhat, Psaltis, Prata & Paek (1985), and VLSI ANNs were developed by Sivilotti, Mahowald & Mead (1987).
(f) Scientific Meetings:
The American Institute of Physics began what has now become an annual meeting in 1985, and by 1987 the first IEEE International Conference on Neural Networks drew more than 1,800 attendees.
DARPA provided funding to Robert Hecht-Nielsen and Todd Gutschow to develop several digital neurocomputers, and Hecht-Nielsen (1990) developed the counterpropagation network.
(g) Recurrent Neural Network Framework:
The Support Vector Machine (SVM), which has a shallow architecture, was developed by Cortes and Vapnik (1995), and the great success of SVM almost killed the renewed enthusiasm for ANN research.
A Recurrent Neural Network (RNN) framework named Long Short-Term Memory (LSTM) was proposed by Hochreiter & Schmidhuber in 1997.
LeNet was invented by Yann LeCun in 1990 as the first widely recognized Convolutional Neural Network (CNN); it was inspired by the Neocognitron.
• LeNet was further popularised in LeCun et al. (1998a) and led to the introduction of the MNIST database (LeCun et al., 1998b). The MNIST database has now become the standard benchmark in the digit recognition field. LeCun et al. also published "Gradient-Based Learning Applied to Document Recognition" in 1998.
1.4.5 The modern era (2000s – Till Date)
The current waves of AI stemmed from previous research endeavors in the field. For instance, the evolution of backpropagation networks into Deep Learning, which is a rebranding of ANN research, had to wait for three related developments: i) much faster computers (e.g. GPUs), ii) massively bigger training data sets (e.g. MNIST, ImageNet), and iii) incremental improvements in learning algorithms.
The Deep Belief Network (DBN) was developed and published by Hinton et al. in 2006. It is a generative graphical model based on the restricted Boltzmann machine.
A deep autoencoder-based approach named Greedy Layer-wise Training of Deep Networks was developed by Bengio et al. in 2007.
AlexNet, invented by Krizhevsky et al. (2012), started the era of CNN application to ImageNet classification, providing proof that CNNs can perform well on the historically difficult ImageNet dataset.
To cope with the computationally expensive training process, AlexNet training was split across two Graphics Processing Units (GPUs), which marked a new era of training deep learning networks on GPUs.
The VGG deep network was developed by Simonyan and Zisserman (2014) to show that simplicity is a promising direction in deep learning, with VGG using 3x3 convolutional layers, 2x2 pooling layers, and a number of filters that doubles after each pooling layer.
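
A hedged sketch of the VGG-style pattern just described, written as a plain layer list: stacks of 3x3 convolutions separated by 2x2 pooling, with the filter count doubling after each pooling stage. The stage count and starting filter count are illustrative, not the published VGG configuration.

```python
def vgg_style_config(stages=4, base_filters=64):
    """Build a layer list: 3x3 conv stacks, 2x2 max pooling, filters doubling per stage."""
    layers, filters = [], base_filters
    for _ in range(stages):
        layers += [("conv3x3", filters), ("conv3x3", filters), ("maxpool2x2", None)]
        filters *= 2                      # double the filter count after each pooling layer
    return layers

for layer in vgg_style_config():
    print(layer)                          # e.g. ('conv3x3', 64) ... ('maxpool2x2', None) ...
```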
Despite VGG's admirable features, GoogLeNet, invented by Szegedy et al. (2015), won the ImageNet competition (ILSVRC 2014), largely because GoogLeNet introduced the inception module into deep learning architectures.
The Residual Network (ResNet) was invented in 2015 by He et al., who explored deeper structures with simple layers, following the simple-layer path that VGG introduced. ResNet is a 152-layer network built around the newly introduced residual block.
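
A hedged sketch of the idea behind the residual block: the block learns a residual mapping F(x) and adds the input back through a skip connection, so information (and gradients) can pass along the identity path. The dimensions, random weights and ReLU activations below are illustrative assumptions, not the published ResNet layer configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
W1 = rng.normal(0, 0.1, (dim, dim))
W2 = rng.normal(0, 0.1, (dim, dim))

def residual_block(x):
    """y = F(x) + x: a small two-step transformation plus the identity shortcut."""
    h = np.maximum(0, x @ W1)       # first transformation with ReLU
    fx = h @ W2                     # residual mapping F(x)
    return np.maximum(0, fx + x)    # skip connection adds the input back

x = rng.normal(size=dim)
print(residual_block(x).shape)      # (8,): the output keeps the input's dimensionality
```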
He et al. (2016) further validated that residual blocks are essential for the smooth propagation of information and extended ResNet to a 1000-layer version, with success on the CIFAR dataset.
In 2016, AlphaGo beat Lee Sedol, an 18-time world champion, at the game of Go (Silver et al., 2016). A year later, AlphaGo beat a team of the world's top five players. However, AlphaGo Zero, learning by sheer trial and error, went on to beat AlphaGo 100 games to none (Silver et al., 2017). Notably, both AlphaGo and AlphaGo Zero relied on reinforcement learning and deep learning to achieve these feats.
With the enormous resources currently being committed to ANN research by industry and government, it is likely that the roller-coaster ride of ANN through winter and summer seasons is now over.
Representative Publications of the Evolution of ANN
• McCulloch, Warren S. and Pitts, Walter, “A logical calculus of the ideas immanent in nervous activity,” The
bulletin of mathematical biophysics 5.4 (1943): 115-133.
• Rosenblatt, Frank, “The perceptron: a probabilistic model for information storage and organization in the
brain,” Psychological review 65.6 (1958): 386.
• Novikoff, A. B., “On convergence proofs on perceptrons,” Symposium on the Mathematical Theory of
Automata 12 (1962), 615-622.
• Minsky, Marvin and Papert, Seymour, “Perceptrons: An Introduction to Computational Geometry,” (1969)
• Rumelhart, David E.; Hinton, Geoffrey E. and Williams, Ronald J., “Learning representations by back-
propagating errors,” Cognitive modeling 5.3 (1988): 1.
• Hopfield, John J., “Neural networks and physical systems with emergent collective computational abilities,”
Proceedings of the national academy of sciences 79.8 (1982): 2554-2558.
• Ackley, David H.; Hinton, Geoffrey E. and Sejnowski, Terrence J., “A learning algorithm for Boltzmann
machines,” Cognitive science 9.1 (1985): 147-169.
• Cortes, Corinna and Vapnik, Vladimir, “Support-vector networks,” Machine learning 20.3 (1995): 273-297.
• LeCun, Yann; Bottou, Léon; Bengio, Yoshua and Haffner, Patrick, “Gradient-based learning applied to
document recognition,” Proceedings of the IEEE 86 no. 11 (1998): 2278-2324.
• Hinton, Geoffrey E., Osindero, Simon and Teh, Yee-Whye, “A fast learning algorithm for deep belief nets,”
Neural computation 18.7 (2006): 1527-1554.
• Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan and Larochelle, Hugo, “Greedy layer-wise training of deep
networks,” Advances in neural information processing systems 19 (2007): 153.
• Krizhevsky Alex; Sutskever, Ilya and Hinton, Geoffrey E., “Imagenet classification with deep convolutional
neural networks,” Advances in neural information processing systems pp. 1097-1105. 2012.
• etc
1.5 Applications of ANN
 As engineers, you must understand which problems are well suited to ANN solutions and which are not. You must also know which ANN structure is most applicable to a given problem.

1.5.1 Problems Not Suited to ANN Solution


(1) Problems that are easily written out as flowcharts are not suitable for an ANN solution; the traditional programming approach is appropriate for these.
(2) If the algorithm for the problem is an unchanging business rule, there is no reason to use an ANN.
(3) Problems for which you must know exactly how the solution was derived are often not suitable for an ANN solution. An ANN cannot explain its reasoning for solving a problem on which it was trained; it cannot explain how it followed a series of steps to arrive at the answer.
1.5.2 Problems that are Suitable for ANN Solution

 Note that neural networks can often solve problems with fewer lines of code than a traditional programming algorithm. ANNs are particularly useful for solving problems that cannot be expressed as a series of steps, such as:
(1) Classification
(2) Pattern recognition
(3) Regression
(4) Data mining
(5) Transfer Learning
(6) Feature Learning
(7) Clustering
etc
Module 1 Assignment
1) Write short articles, with citations and references, on five (5) different contributions that have been made in ANN research within the following timelines:
a) 2001 – 2010
b) 2011 – till date.
Note: The assignment should be submitted on
Moodle as instructed.
