THE ARTIFICIAL NEURON

Contents

1. The artificial neuron model
2. The artificial neuron activation function
3. Decision boundary
4. The artificial neuron learning
5. Network architecture/topology
6. Neural processing

Models of Neuron
 A neuron is an information-processing unit, made up of:
   A set of synapses, or connecting links
     each characterized by a weight or strength
   An adder
     sums the input signals, weighted by the synapses
     acts as a linear combiner
   An activation function
     also called a squashing function
     squashes (limits) the output to a finite range of values
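
The three components above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the function names, the bias term, the logistic squashing function and the sample numbers are all assumptions chosen for the example.

```python
import math

def logistic(v):
    """One possible squashing function; limits the output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    # Adder / linear combiner: v = sum_j w_j * x_j + bias
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation (squashing) function limits the output range
    return activation(v)

# Hypothetical inputs and synaptic weights, for illustration only
y = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, logistic)
print(y)
```

Swapping `logistic` for another function changes only the last stage; the synapses and the adder stay the same.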

Nonlinear model of a neuron (I)

Analogy

 Inputs represent synapses
 Weights represent the strengths of the synaptic links
 Wi represents dendrite secretion
 The summation block represents the addition of the secretions
 The output represents the axon voltage

Nonlinear model of a neuron (II)

Types of Activation Function

Activation Functions...

 Threshold or step function (McCulloch & Pitts model)
 Linear: neurons using a linear activation function are known in the literature as ADALINEs (Widrow 1960)
 Sigmoidal functions: functions that more closely describe the nonlinear behavior of biological neurons
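
The three families listed above can be written as small Python functions. This is a sketch under assumptions: the names, the threshold parameter `theta`, the slope parameter `a`, and the convention that the step function outputs 1 when the input equals the threshold are choices made for this example.

```python
import math

def step(v, theta=0.0):
    """Threshold unit: 1 if v reaches the threshold theta, else 0."""
    return 1 if v >= theta else 0

def linear(v):
    """Linear (identity) activation: the output equals the input."""
    return v

def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * v))

# Compare the three functions on a few sample activation values
for v in (-2.0, 0.0, 2.0):
    print(v, step(v), linear(v), round(sigmoid(v), 3))
```

The step function jumps between two values, the linear function is unbounded, and the sigmoid moves smoothly between 0 and 1.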

Types of Activation Function

Identity Function

f(x) = x for all x

Types of Activation Function

Binary step function with threshold θ

Also known as the Heaviside function

Activation Functions - sigmoid

Activation Function value range

Stochastic Model of a Neuron

 So far we have introduced only deterministic models of ANNs
 A stochastic (probabilistic) model can also be defined
 If x denotes the state of a neuron, then P(v) denotes the probability that the neuron fires, where v is the induced activation potential (bias + linear combination)

Stochastic Model of a Neuron

 T is a pseudo-temperature used to control the noise level (and therefore the uncertainty in firing)
 As T → 0, the stochastic model reduces to the deterministic model
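
The formula for P(v) does not survive in the text of these slides. A common choice, assumed here, is the logistic form P(v) = 1 / (1 + exp(-v / T)), with the neuron state taking the value +1 (fire) or -1 (not fire). Under those assumptions, the effect of the pseudo-temperature T can be sketched as follows; the function names are illustrative.

```python
import math
import random

def firing_probability(v, T):
    """Assumed logistic form: P(v) = 1 / (1 + exp(-v / T))."""
    z = v / T
    # Two branches keep exp() from overflowing when T is very small
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def stochastic_state(v, T, rng=random):
    """Return +1 with probability P(v), otherwise -1 (assumed state values)."""
    return 1 if rng.random() < firing_probability(v, T) else -1

# Lowering T pushes P(v) towards 0 or 1, i.e. towards a deterministic unit
for T in (10.0, 1.0, 0.01):
    print(T, firing_probability(0.5, T), firing_probability(-0.5, T))
```

At large T the firing probability stays close to 0.5 whatever the sign of v; at small T it approaches a step function of v.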

Decision boundaries

 In simple cases, divide the feature space by drawing a hyperplane across it
 This hyperplane is known as a decision boundary
 Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions)
 Problems that can be classified in this way are linearly separable

E.g. Decision Surface of a Perceptron

 A perceptron is able to represent some useful functions
 AND(x1,x2): choose weights w0=-1.5, w1=1, w2=1
 But functions that are not linearly separable (e.g. XOR) are not representable
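
The AND weights above can be checked directly. The sketch below assumes the perceptron outputs 1 when w0 + w1*x1 + w2*x2 > 0 and 0 otherwise (the slide does not state the firing rule). The grid search over XOR weights is only an illustration on a hypothetical grid, not a proof of non-separability.

```python
from itertools import product

def perceptron(x1, x2, w0, w1, w2):
    """Output 1 if w0 + w1*x1 + w2*x2 > 0, else 0 (assumed firing rule)."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

# Truth table produced by the weights given on the slide
and_table = {(x1, x2): perceptron(x1, x2, -1.5, 1, 1)
             for x1, x2 in product((0, 1), repeat=2)}
print(and_table)
# {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# Try every weight triple on a coarse grid; none reproduces XOR
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
xor_solutions = [
    (w0, w1, w2)
    for w0, w1, w2 in product(grid, repeat=3)
    if all(perceptron(x1, x2, w0, w1, w2) == t for (x1, x2), t in xor.items())
]
print(len(xor_solutions))
# 0
```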

Linear Separability

Rugby players & Ballet dancers

Training the neuron

The artificial neuron learning

 Supervised Learning
 Unsupervised Learning

Supervised Learning

 The desired response d of the system is provided by a teacher; the distance ρ[d,o] between the desired response d and the actual output o serves as an error measure
 Estimate the negative error gradient direction and reduce the error accordingly
 Modify the synaptic weights to perform a stochastic minimization of the error in multidimensional weight space
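
As a worked illustration of the gradient step, assume (the slide does not say so) that ρ[d,o] is the squared error of a single neuron with a differentiable activation f. The inputs x_j, weights w_j and learning rate η are symbols introduced here for the example:

```latex
\begin{align}
E &= \tfrac{1}{2}\,(d - o)^2, \qquad o = f(v), \qquad v = \sum_j w_j x_j \\
\frac{\partial E}{\partial w_j} &= -(d - o)\, f'(v)\, x_j \\
\Delta w_j &= -\eta\, \frac{\partial E}{\partial w_j} = \eta\,(d - o)\, f'(v)\, x_j
\end{align}
```

Each weight moves in the direction of the negative gradient, so a positive error d - o increases the weights on active inputs and a negative error decreases them.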

Unsupervised Learning
(Learning without a teacher)

 The desired response is unknown, so no explicit error information can be used to improve network behavior, e.g. when finding the cluster boundaries of input patterns
 Suitable weight self-adaptation mechanisms have to be embedded in the trained network
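
One possible self-adaptation mechanism is winner-take-all competitive learning, sketched below. The slides do not name a specific rule, so the technique, the function names, the learning rate and the toy data are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def winner(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))

def train_competitive(points, n_units, lr=0.1, epochs=20, seed=0):
    rng = random.Random(seed)
    # Start each unit on a randomly chosen training point
    weights = [list(p) for p in rng.sample(points, n_units)]
    for _ in range(epochs):
        for x in points:
            k = winner(weights, x)
            # Only the winning unit moves, a fraction lr towards the input
            weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return weights

# Toy data: two well-separated groups, no class labels supplied
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
weights = train_competitive(points, n_units=2)
labels = [winner(weights, p) for p in points]
print(weights, labels)
```

No target values are used anywhere: the cluster assignment comes only from how the weights adapt to the inputs.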

Neuron Learning

Block diagram of basic learning modes: (a) supervised learning, (b) unsupervised learning

Training

Simple network

Learning algorithm

Learning algorithm

 Epoch: presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1])
 Error: the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = 0 - 1 = -1

Learning algorithm

 Target value, T: when training a network we present it not only with the input but also with the value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1
 Output, O: the output value from the neuron
 Ij: inputs being presented to the neuron
 Wj: weight from input neuron (Ij) to the output neuron
 LR: the learning rate. This dictates how quickly the network converges. It is set by experimentation; a typical value is 0.1

Training the neuron

Learning in Neural Networks

 Learn values of weights from I/O pairs
   Start with random weights
   Load a training example’s input
   Observe the computed output
   Modify the weights to reduce the difference
   Iterate over all training examples
   Terminate when the weights stop changing OR when the error is very small
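
The steps above can be sketched for the AND function using the T, O, Ij, Wj and LR terms defined earlier. The update rule Wj = Wj + LR * Ij * (T - O) is the usual perceptron rule and is assumed here, since the slide that presumably showed it did not survive as text. The bias weight, the step activation and the zero initial weights (used instead of random ones so the run is reproducible) are also assumptions.

```python
def step(v):
    """Step activation: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def predict(w, b, inputs):
    """Output O of the neuron for one input vector."""
    return step(b + sum(wj * ij for wj, ij in zip(w, inputs)))

def train_and(lr=0.1, max_epochs=100):
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = [0.0, 0.0]  # Wj; zeros instead of random values (assumption)
    b = 0.0         # bias weight (assumption)
    for epoch in range(1, max_epochs + 1):
        total_error = 0
        for inputs, target in samples:          # one epoch = all four inputs
            error = target - predict(w, b, inputs)   # Error = T - O
            # Assumed perceptron rule: Wj = Wj + LR * Ij * Error
            w = [wj + lr * ij * error for wj, ij in zip(w, inputs)]
            b += lr * error
            total_error += abs(error)
        if total_error == 0:                    # weights stopped changing
            return w, b, epoch
    return w, b, max_epochs

w, b, epochs = train_and()
print(w, b, epochs)
print([predict(w, b, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

Training stops at the first epoch in which every example is classified correctly, which matches the "weights stop changing" criterion.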

Network Architecture

 Single-layer feedforward networks
   input layer and output layer
   single (computation) layer
   feedforward, acyclic
 Multilayer feedforward networks
   hidden layers
   hidden neurons, or hidden units
   enable the network to extract higher-order statistics
   e.g. a 10-4-2 network, a 100-30-10-3 network
   fully connected layered network
 Recurrent networks
   at least one feedback loop
   with or without hidden neurons

Feedforward Networks (static)

Feedforward Networks

 One input and one output layer
 One or more hidden layers
 Each hidden layer is built from artificial neurons
 Each element of the preceding layer is connected with each element of the next layer
 There are no interconnections between artificial neurons in the same layer
 Finding the weights is a task that depends on the problem the specific network is meant to solve
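
A forward pass through a small fully connected feedforward network can be sketched as below. The 10-4-2 layer sizes echo the example from the architecture slide; the random weights, the bias stored as the first entry of each row, and the sigmoid activation are assumptions made for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_layer(n_in, n_out, rng):
    """One fully connected layer: per neuron, a bias followed by n_in weights."""
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, x):
    """Propagate x layer by layer; there are no connections within a layer."""
    for layer in layers:
        # Every neuron receives every output of the preceding layer
        x = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
             for row in layer]
    return x

rng = random.Random(0)
sizes = [10, 4, 2]  # 10 inputs, 4 hidden neurons, 2 output neurons
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward(network, [0.5] * 10)
print(output)
```

The weights here are random placeholders; in practice they would be found by training on the problem at hand.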

Feedback Networks
(Recurrent or dynamic systems)

Feedback Networks
(Recurrent or dynamic systems)

 The interconnections go in two directions between artificial neurons, or include feedback
 The Boltzmann machine, a generalization of Hopfield nets, is an example of a recurrent net. Another example of recurrent nets: Adaptive Resonance Theory (ART) nets

Neural network as directed Graph

Neural network as directed Graph

 The block diagram can be simplified using the idea of a signal-flow graph
   a node is associated with a signal
   a directed link is associated with a transfer function
 Synaptic links
   governed by a linear input-output relation
   signal xj is multiplied by synaptic weight wkj
 Activation links
   governed by a nonlinear input-output relation
   nonlinear activation function

Feedback

 The output determines, in part, its own value via feedback
 The behavior depends on the feedback weight w
   stable, linear divergence, or exponential divergence
   we are interested in the case |w| < 1: infinite memory
   the output depends on inputs from the infinite past
 A NN with a feedback loop is a recurrent network
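
The three regimes can be reproduced with a single-loop feedback system. The signal-flow graph is not in the text, so the recurrence used here, y(n) = w * (x(n) + y(n-1)) with a unit delay in the feedback path, is an assumption; the constant input is chosen only to make the behavior easy to see.

```python
def feedback_response(w, x, steps):
    """Simulate y(n) = w * (x + y(n-1)) with y(-1) = 0 and constant input x."""
    y, out = 0.0, []
    for _ in range(steps):
        y = w * (x + y)
        out.append(y)
    return out

print(feedback_response(0.5, 1.0, 4))  # |w| < 1: stable
# [0.5, 0.75, 0.875, 0.9375]
print(feedback_response(1.0, 1.0, 5))  # w = 1: linear divergence
# [1.0, 2.0, 3.0, 4.0, 5.0]
print(feedback_response(2.0, 1.0, 4))  # |w| > 1: exponential divergence
# [2.0, 6.0, 14.0, 30.0]
```

For |w| < 1 every past input still contributes to the output, but with a weight that shrinks geometrically, which is why the memory is infinite yet the output stays bounded.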

Neural Processing

 Recall
   The process of computation of an output o for a given input x performed by the ANN
   Its objective is to retrieve the information, i.e., to decode the stored content, which must have been encoded in the network previously
 Autoassociation
   When a network is presented with a pattern similar to a member of the stored set, autoassociation associates the input pattern with the closest stored pattern
   Example: reconstruction of an incomplete or noisy image

Neural Processing

 Heteroassociation
   The network associates the input pattern with pairs of stored patterns

Association response: (a) autoassociation and (b) heteroassociation
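
The input-output behavior of the two association modes can be illustrated without a network at all. In the sketch below a lookup by Hamming distance stands in for the trained network; this is a deliberate simplification, and the function names and patterns are hypothetical.

```python
def hamming(a, b):
    """Number of positions in which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def autoassociate(stored, probe):
    """Return the stored pattern closest to the (possibly noisy) probe."""
    return min(stored, key=lambda p: hamming(p, probe))

def heteroassociate(pairs, probe):
    """Return the output pattern paired with the closest stored input."""
    key = min(pairs, key=lambda p: hamming(p, probe))
    return pairs[key]

stored = [(1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1)]
pairs = {stored[0]: (1, 0), stored[1]: (0, 1)}
noisy = (1, 0, 1, 0, 0, 0)  # first stored pattern with one bit flipped

print(autoassociate(stored, noisy))
# (1, 1, 1, 0, 0, 0)
print(heteroassociate(pairs, noisy))
# (1, 0)
```

Autoassociation returns a cleaned-up version of the input itself, while heteroassociation returns a different pattern that was stored alongside it.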

Neural Processing

Classification
 A set of patterns is already divided into a number of classes, or categories
 When an input pattern is presented, the classifier recalls the information regarding the class membership of the input pattern
 The classes are expressed by discrete-valued output vectors, thus the output neurons of the classifier employ binary activation functions
 A special case of heteroassociation

Neural Processing

Recognition
 If the desired response is the class number, but the input pattern does not exactly correspond to any of the patterns in the stored set, the processing is called recognition

Classification response: (a) classification and (b) recognition

Neural Processing

Clustering
 Unsupervised classification of patterns/objects without providing information about the actual classes
 The network must discover for itself any existing patterns, regularities, separating properties, etc.
 While discovering these, the network undergoes changes to its parameters, which is called self-organization

Neural Processing

Two-dimensional patterns: (a) clustered and (b) no apparent clusters

Summary

 Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization)
 Less need to determine relevant factors a priori when building a neural network
 Lots of training data are needed
 High tolerance to noisy data. In fact, noisy data can enhance post-training performance
 Difficult to verify or discern learned relationships, even with special knowledge-extraction utilities developed for neural nets