HNS Cont.
• Handling fuzzy, probabilistic, noisy and inconsistent data is possible with computer programs only under specific circumstances:
• it requires highly sophisticated programming, and the context of such data must first have been analyzed in detail.
• We have a native ability to handle uncertainty.
• The biological processing unit, the brain, is highly parallel, small and compact, and dissipates little power.
Cont.
4) Use different sets of data: the network is trained on a set of training data, and its generalization ability is tested using a new set of testing data (see the sketch after this list).
5) If the network doesn’t perform well enough, go
back to stage 3 and try harder.
6) If the network still doesn’t perform well enough,
go back to stage 2 and try harder.
7) If the network still doesn’t perform well enough,
go back to stage 1 and try harder.
8) Problem solved: move on to the next problem.
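As a rough illustration of stage 4, here is a NumPy sketch in which synthetic data and a least-squares linear model stand in for the network; everything in it is a hypothetical stand-in. It holds out a test set so that generalization is measured on patterns never seen during training.

```python
import numpy as np

# Stage 4, illustrated: hold out a test set so that generalization is
# measured on data the network has never seen. All data is synthetic.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 3))            # 100 patterns, 3 features
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

X_train, X_test = X[:80], X[80:]                 # 80/20 train/test split
y_train, y_test = y[:80], y[80:]

# Stand-in "network": a linear model fitted by least squares.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_err = np.mean((X_train @ w - y_train) ** 2)
test_err = np.mean((X_test @ w - y_test) ** 2)   # generalization estimate
print(f"train MSE: {train_err:.4f}, test MSE: {test_err:.4f}")
```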
Cont.
• There are two important aspects of the network’s
operation to consider:
• Learning: The network must learn decision
boundaries from a set of training patterns so that
these training patterns are classified correctly.
• Generalization: After training, the network must
also be able to generalize, i.e. correctly classify test
patterns it has never seen before.
• Usually we want the neural network to learn in a
way that produces good generalization.
Cont.
• Sometimes, the training data may contain
errors (e.g., noise in the experimental
determination of the input values, or
incorrect classifications).
• In this case, learning the training data
perfectly may make the generalization
worse. There is an important tradeoff
between learning and generalization that
arises quite often.
Neuron Models
• When the input is a vector, each element of the input is multiplied by its corresponding weight (a dot product) and the weighted values are fed to the summing junction.
• Then the output y is given by (with f the transfer function):

$$y = f\Big(b + \sum_i w_i x_i\Big)$$
A layer of neurons:
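A minimal NumPy sketch of this computation for a single neuron and for a layer of neurons; the tanh transfer function and all numbers are arbitrary illustrative choices.

```python
import numpy as np

def neuron_output(x, w, b, f=np.tanh):
    """Single neuron: weighted sum of inputs plus bias, passed through f."""
    return f(np.dot(w, x) + b)

def layer_output(x, W, b, f=np.tanh):
    """A layer of neurons: each row of W holds one neuron's weights."""
    return f(W @ x + b)

x = np.array([0.5, -1.0, 2.0])          # input vector
w = np.array([0.2, 0.4, -0.1])          # one neuron's weights
print(neuron_output(x, w, b=0.3))

W = np.array([[0.2, 0.4, -0.1],
              [1.0, 0.0, 0.5]])         # 2 neurons, 3 inputs each
print(layer_output(x, W, b=np.array([0.3, -0.2])))
```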
TRANSFER (ACTIVATION)
FUNCTIONS
1- Linear neurons
• These are simple but computationally limited
• If we can make them learn we may get insight
into more complicated neurons.
$$y = b + \sum_i x_i w_i$$
2- Binary threshold neurons

$$z = b + \sum_i x_i w_i, \qquad \theta = -b$$

$$y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
3- Rectified linear neurons

$$y = \begin{cases} z & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
4- Sigmoid neurons
• They give a real-valued output that is a smooth and bounded function of their total input.
• Typically they use the logistic function:

$$z = b + \sum_i x_i w_i, \qquad y = \frac{1}{1 + e^{-z}}$$

• They have positive derivatives, which makes learning easy.

[Plot: y rises smoothly from 0 toward 1 as z increases, passing through 0.5 at z = 0.]
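A small NumPy sketch of the logistic function and its derivative (the names logsig and logsig_deriv are our own):

```python
import numpy as np

def logsig(z):
    """Logistic sigmoid: smooth, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logsig_deriv(z):
    """Derivative y * (1 - y): always positive, which helps learning."""
    y = logsig(z)
    return y * (1.0 - y)

z = np.linspace(-6, 6, 5)
print(logsig(z))        # rises smoothly from near 0 to near 1
print(logsig_deriv(z))  # strictly positive, largest at z = 0
```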
Cont.
• Each output neuron produces a value between 0 and 1 independently, e.g. 0.3, 0.7, 0.8, 0.9, ..., so several outputs can be large at once.
• To solve this problem, a generalization of the logistic sigmoid was developed: the softmax activation function.
• The softmax function has the effect of pushing the maximum output value close to 1 and the rest close to 0.
$$y_i = \frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$$
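A minimal NumPy sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the formula above.

```python
import numpy as np

def softmax(z):
    """Softmax over a group of outputs: exponentiate, then normalize to sum to 1."""
    e = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow
    return e / e.sum()

y = softmax(np.array([0.3, 0.7, 0.8, 0.9]))
print(y, y.sum())   # sums to 1; the contrast sharpens as the z values spread apart
```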
Radial basis function:

$$a(n) = e^{-n^2}$$

Triangular basis function:

$$a(n) = \begin{cases} 1 - |n| & \text{if } -1 \le n \le 1 \\ 0 & \text{otherwise} \end{cases}$$
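These correspond to MATLAB's radbas and tribas transfer functions; a NumPy sketch:

```python
import numpy as np

def radbas(n):
    """Radial basis: a(n) = exp(-n^2), peaking at 1 when n = 0."""
    return np.exp(-n ** 2)

def tribas(n):
    """Triangular basis: 1 - |n| on [-1, 1], zero elsewhere."""
    return np.where(np.abs(n) <= 1, 1 - np.abs(n), 0.0)

n = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(radbas(n))   # [0.018, 0.779, 1.0, 0.779, 0.018]
print(tribas(n))   # [0.0, 0.5, 1.0, 0.5, 0.0]
```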
In-class Assignment
• Given a single-input neuron with a weight of 2.3 and a bias of -3, calculate the output produced for an input of 2 by each of the following transfer functions:
I. Hard limit
II. Linear
III. Log-sigmoid
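A sketch to check the arithmetic: the net input is n = 2.3 * 2 - 3 = 1.6, which each transfer function then maps to an output.

```python
import numpy as np

w, b, p = 2.3, -3.0, 2.0
n = w * p + b                       # net input: 2.3 * 2 - 3 = 1.6

hardlim = 1.0 if n >= 0 else 0.0    # I.   hard limit   -> 1
linear = n                          # II.  linear       -> 1.6
logsig = 1.0 / (1.0 + np.exp(-n))   # III. log-sigmoid  -> ~0.832

print(hardlim, linear, round(logsig, 3))
```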
Learning Algorithm
• The learning algorithm is a prescribed set of well-defined rules for the solution of a learning problem.
• In every learning algorithm, we must specify the cost function.
• Cost function: a way of using your training data to determine values for your parameters that produce an output function that is as accurate as possible.
• The learning paradigm is a model of the environment in which the neural network operates.
• There are three major learning paradigms.
1- Supervised Learning
• A teacher is present during the learning process
& the desired output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference
between the network’s computed output and the
expected output.
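One concrete choice for this cost is the mean squared error; a minimal sketch with made-up outputs and targets:

```python
import numpy as np

def mse_cost(computed, expected):
    """Supervised cost: mean squared difference between output and target."""
    return np.mean((computed - expected) ** 2)

targets = np.array([0.0, 1.0, 1.0])   # desired outputs from the "teacher"
outputs = np.array([0.2, 0.8, 0.6])   # what the network actually computed
print(mse_cost(outputs, targets))     # 0 only when every pattern is matched
```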
2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system learns on its own by discovering and adapting to the structural features in the input patterns.
• The cost function is determined by the task
formulation.
• Most applications fall within the domain of
estimation problems such as statistical modeling,
compression, filtering, blind source separation and
clustering.
3- Reinforcement Learning
• There is a teacher.
• There is no expected outcome presented to the network.
• The teacher helps by indicating whether a computed output is right or wrong.
• A reward is given for a right output & a penalty is given for a wrong one.
• Data is usually not given, but generated by an
agent's interactions with the environment.
Cont.
• Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.
Example
• "Given this data, a friend has a house 750 square
feet - how much can they be expected to get?"
There are different approaches that can be used
to solve this,
• A Straight line through data
• Maybe $150 000
• A Second order polynomial
• Maybe $200 000
• Each of these approaches represent a way of
doing supervised learning.
Cont.
• So, training data is provided in which the actual price of each house is known.
• The algorithm uses this to learn to predict prices of houses for any other set of data.
• We call this a regression problem because:
• it predicts a continuous-valued output (the price),
• which has no real discrete definition.
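A sketch of both fits using NumPy's polyfit; the housing numbers below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: house size in square feet, price in $1000s.
size = np.array([500, 700, 800, 1000, 1200, 1500])
price = np.array([100, 140, 155, 190, 220, 270])

line = np.polyfit(size, price, 1)   # straight line through the data
quad = np.polyfit(size, price, 2)   # second-order polynomial

# The two models can give different predictions for a 750 sq ft house.
print(np.polyval(line, 750), np.polyval(quad, 750))
```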
Example 2
• The following graph shows the number of times a breast tumor is benign or malignant versus tumor size:
Example 2 (cont.)
• The graph shows that we have 5 tumors of each kind.
• We want to find a way to classify whether a tumor is benign or malignant using our trained network.
• Can you estimate the diagnosis based on tumor size?
• This is an example of a classification problem:
• classify data into one of two discrete classes - malignant or not.
• In classification problems, we may have a discrete number of possible values for the output, e.g. 0 - benign, 1 - type 1, 2 - type 2, 3 - type 3.
• In classification problems we can plot data in different ways.
Cont.
• Based on that data, you can try and define separate classes by:
• drawing a straight line between the two groups, or
• using a more complex function to define the two groups.
• Then, when you have an individual with a specific tumor size and a specific age, you can use that information to place them into one of your classes.
• You might have many features to consider: clump thickness, uniformity of cell size, uniformity of cell shape, etc. (a toy two-feature sketch follows).
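A toy sketch of the straight-line approach with two features (tumor size and age); every value, including the boundary weights, is made up for illustration.

```python
import numpy as np

# Hypothetical patients: columns are (tumor size, age); 0 = benign, 1 = malignant.
X = np.array([[1.0, 30], [1.5, 40], [2.0, 35],    # benign
              [4.0, 55], [4.5, 60], [5.0, 50]])   # malignant
t = np.array([0, 0, 0, 1, 1, 1])

# Straight-line boundary w . x + b = 0 with hand-picked weights.
w = np.array([1.0, 0.02])
b = -3.5
predicted = (X @ w + b >= 0).astype(int)   # which side of the line?
print(predicted, (predicted == t).all())   # classifies this toy data perfectly
```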
Supervised Learning
• A programmer specifies the number of units in each layer and the connectivity between units, so the only unknown is the set of weights associated with the connections.
Learning Rules
• A learning rule, also known as a training algorithm, is a procedure for modifying the weights and biases of a network.
• These learning types may use different learning
rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even
further according to the rule used.
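For instance, the Hebbian rule strengthens a weight whenever its input and the neuron's output are active together; a minimal sketch (the learning rate eta and all values are illustrative):

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1):
    """Hebbian rule: delta_w = eta * y * x, strengthening weights whose
    input is active while the neuron's output is active."""
    return w + eta * y * x

w = np.zeros(3)
x = np.array([1.0, 0.0, 1.0])    # input pattern
y = 1.0                          # suppose the neuron fired for this input
print(hebbian_update(w, x, y))   # -> [0.1, 0.0, 0.1]
```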
Perceptrons (cont.)
• It is made up of only input neurons and output neurons.
• Input neurons usually have two states: ON and OFF.
• A simple threshold activation function is used for the output neurons.
• It uses supervised training.
• Example:
Cont.
• Based on that simple example, we can now develop the learning rule for a perceptron.
• The perceptron usually uses a hard-limit activation function, as shown in the following figures.
Perceptrons
One perceptron neuron
A layer of Perceptrons
Multilayer Perceptron
In-class Assignment
• Train a network to sort oranges from apples based on 3 features: shape, texture and weight. Prototype oranges (p1) and apples (p2) are:
Cont.
• The perceptron learning rule (e.g. learnp in Matlab) calculates the desired changes to the perceptron's weights and biases given an input vector p and the associated error e.
• The target vector t must contain values of either 0 or 1, as perceptrons (with hardlim transfer functions) can only output such values.
• By increasing the number of epochs, i.e. the number of times learnp is executed, the perceptron has a better chance of getting closer to the target values, & hence converging.
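A sketch of one learnp-style update (e = t - a, delta_w = e * p, delta_b = e), using a made-up pattern/target pair:

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0

def learnp_step(w, b, p, t):
    """One perceptron-rule update: e = t - a, w += e*p, b += e."""
    a = hardlim(np.dot(w, p) + b)
    e = t - a
    return w + e * p, b + e

w, b = np.zeros(2), 0.0
p, t = np.array([2.0, 2.0]), 0.0     # hypothetical pattern and target
w, b = learnp_step(w, b, p, t)
print(w, b)   # weights move away from p because the output was too high
```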
Example
• Then, the decision boundary is given by n = p1 + p2 - 1 = 0.
• To plot it, set p1 = 0 (giving p2 = 1), then set p2 = 0 (giving p1 = 1).
• Consider the input p = [2, 0]^T: n = 2 + 0 - 1 = 1 > 0.
• The decision boundary (shown in blue in the figure) is orthogonal to the weight vector 1w; that means our classes are linearly separable.
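A quick NumPy check of this example, assuming the boundary n = p1 + p2 - 1 = 0 comes from weights w = [1, 1] and bias b = -1:

```python
import numpy as np

w = np.array([1.0, 1.0])   # assumed weight vector 1w behind p1 + p2 - 1 = 0
b = -1.0
p = np.array([2.0, 0.0])

n = w @ p + b              # 2 + 0 - 1 = 1 > 0: p is on the positive side
print(n, 1 if n >= 0 else 0)
```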
Perceptron Implementation
• Orthogonal means that the weight vector is at a 90° angle to the decision boundary.
• Example: implement an AND logic gate.
• Answer: it has the following input/target pairs:
{p1 = [0, 0]^T, t1 = 0}, {p2 = [0, 1]^T, t2 = 0}, {p3 = [1, 0]^T, t3 = 0}, {p4 = [1, 1]^T, t4 = 1}
Cont.
• First we need to select a decision boundary.
• Then we choose a weight vector orthogonal to the decision boundary; any weight vector along that direction will do, for example:
Cont.
• The process of finding new weights (and biases)
can be repeated until there are no errors.
• Note that the perceptron learning rule is guaranteed
to converge in a finite number of steps for all
problems that can be solved by a perceptron.
• These include all classification problems that are "linearly separable".