
Artificial Intelligence

Neural Networks
What is Learning?
The word "learning" has many different meanings.
It is used, at least, to describe
• memorizing something
• learning facts through observation and exploration
• development of motor and/or cognitive skills
through practice
• organization of new knowledge into general,
effective representations
Learning
• Study of processes that lead to self-
improvement of machine performance.
• It implies the ability to use knowledge to
create new knowledge or to integrate new
facts into an existing knowledge structure
• Learning typically requires repetition and
practice to reduce differences between
desired and actual performance
What is Learning?

Herbert Simon: “Learning is any process by
which a system improves performance from
experience.”
Learning
Definition:
A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience.

Learning & Adaptation
• ”Modification of a behavioral tendency by
experience.” (Webster 1984)
• ”A learning machine, broadly defined as any
device whose actions are influenced by past
experiences.” (Nilsson 1965)
• ”Any change in a system that allows it to
perform better the second time on repetition of
the same task or on another task drawn from
the same population.” (Simon 1983)

Negative Features of Human
Learning
• It’s slow (5-6 years for motor skills, 12-20 years for
abstract reasoning)
• Inefficient
• Expensive
• There is no copy process
• Learning strategy is often a function of knowledge
available to learner

Applications of ML
• Learning to recognize spoken words
• Learning to drive an autonomous
vehicle
• Learning to classify objects
• Learning to play world-class
backgammon
• Designing the morphology and
control structure of electro-
mechanical artefacts
Motivating Problems
• Handwritten Character Recognition
Motivating Problems
• Fingerprint Recognition (e.g., border
control)

Motivating Problems
• Face Recognition (security access to
buildings etc)

Different kinds of learning
• Supervised learning:
– Someone gives us examples and the right answer
for those examples
– We have to predict the right answer for unseen
examples
• Unsupervised learning:
– We see examples but get no feedback
– We need to find patterns in the data
• Reinforcement learning:
– We take actions and get rewards
– Have to learn how to get high rewards
Learning with a Teacher

[Diagram: the Environment supplies a state x to a Teacher, which produces the desired response, and to the Learning system, which produces the actual response; the difference (desired minus actual) is the error signal fed back to the Learning system.]
Unsupervised Learning

[Diagram: the Environment supplies a state to the Learning system; there is no teacher and no error signal.]
The red and the black
• Imagine that we were given all these points, and we
needed to guess a function of their x, y coordinates that
would have one output for the red ones and a different
output for the black ones.
What’s the right hypothesis?
• In this case, it seems like we could do pretty well by
defining a line that separates the two classes.
Now, what’s the right hypothesis?
• Now, what if we have a slightly different configuration of
points? We can't divide them conveniently with a line.
Now, what’s the right hypothesis?
• But this parabola-like curve seems like it might
be a reasonable separator.
Design a Learning System
Step 0:

– Let’s treat the learning system as a black box

[Diagram: examples go into a box labelled “Learning System”, which produces an output Z.]
Design a Learning System

[Figure: sample handwritten digit images, e.g. 2, 3, 6, 7, 8, 9.]
Design a Learning System
Step 2: Representing Experience

– So, what would D be like? There are many possibilities.
– Assuming our system is to recognise 10 digits only, then D can be a 10-d
binary vector, each component corresponding to one of the digits:
D = (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)

X = (1,1,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0, …, 1); 64-d vector
D = (0,0,0,0,0,1,0,0,0,0)

X = (1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0, …, 1); 64-d vector
D = (0,0,0,0,0,0,0,0,1,0)
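This representation can be sketched in a few lines of Python; the helper name `one_hot` is ours, not the slides':

```python
# Sketch of one training target for the digit recogniser. D is a 10-d
# one-hot vector with a 1 in the slot of the digit the image shows.
def one_hot(digit, n_classes=10):
    """Return the target vector D for the given digit."""
    d = [0] * n_classes
    d[digit] = 1
    return d

D = one_hot(5)          # target for the digit 5
print(D)                # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

The corresponding X would simply be the 64 pixel values flattened into one list.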
Example of supervised learning:
classification
• We lend money to people
• We have to predict whether they will pay us back or not
• People have various (say, binary) features:
– do we know their Address? do they have a Criminal record? high Income?
Educated? Old? Unemployed?
• We see examples: (Y = paid back, N = not)
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
• Next person is +a, -c, +i, -e, +o, -u. Will we get paid back?
Learning by Examples
Concept: ”days on which my friend Aldo enjoys his
favourite water sports”
Task: predict the value of ”Enjoy Sport” for an
arbitrary day based on the values of the other
attributes

Sky    Temp  Humid   Wind    Water  Forecast  Enjoy Sport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes
Decision trees

high Income?
  yes -> Criminal record?
           yes -> NO
           no  -> YES
  no  -> NO
Constructing a decision tree, one step at a time

Training data:
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N

First split on address?
  yes: +a, -c, +i, +e, +o, +u: Y
       +a, -c, +i, -e, -o, -u: Y
       +a, -c, -i, -e, +o, -u: N
       +a, +c, +i, -e, +o, -u: N
  no:  -a, +c, -i, +e, -o, -u: N
       -a, -c, +i, +e, -o, -u: Y
       -a, +c, +i, -e, -o, -u: N
       -a, -c, +i, -e, -o, +u: Y

On the "no" branch, split on criminal?
  yes: -a, +c, -i, +e, -o, -u: N       (all N)
       -a, +c, +i, -e, -o, -u: N
  no:  -a, -c, +i, +e, -o, -u: Y       (all Y)
       -a, -c, +i, -e, -o, +u: Y

On the "yes" branch, split on criminal?
  yes: +a, +c, +i, -e, +o, -u: N       (all N)
  no:  still mixed, so split again on income?
       yes: +a, -c, +i, +e, +o, +u: Y  (all Y)
            +a, -c, +i, -e, -o, -u: Y
       no:  +a, -c, -i, -e, +o, -u: N  (all N)

Address was maybe not the best attribute to start with…
Different approach: nearest neighbor(s)
• Next person is -a, +c, -i, +e, -o, +u. Will we get paid back?
• Nearest neighbor: simply look at most similar example in
the training data, see what happened there
+a, -c, +i, +e, +o, +u: Y (distance 4)
-a, +c, -i, +e, -o, -u: N (distance 1)
+a, -c, +i, -e, -o, -u: Y (distance 5)
-a, -c, +i, +e, -o, -u: Y (distance 3)
-a, +c, +i, -e, -o, -u: N (distance 3)
-a, -c, +i, -e, -o, +u: Y (distance 3)
+a, -c, -i, -e, +o, -u: N (distance 5)
+a, +c, +i, -e, +o, -u: N (distance 5)
• Nearest neighbor is second, so predict N
• k nearest neighbors: look at the k nearest neighbors and take a vote
– E.g., the 5 nearest neighbors have 3 Ys and 2 Ns, so predict Y
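The vote above can be sketched in a few lines of Python. The tuples encode the loan examples with 1 for + and 0 for -, and `hamming`/`knn_predict` are our own illustrative names:

```python
from collections import Counter

# k-nearest-neighbour voting over the six binary features (a, c, i, e, o, u),
# using Hamming distance (number of differing features).
train = [
    ((1, 0, 1, 1, 1, 1), 'Y'), ((0, 1, 0, 1, 0, 0), 'N'),
    ((1, 0, 1, 0, 0, 0), 'Y'), ((0, 0, 1, 1, 0, 0), 'Y'),
    ((0, 1, 1, 0, 0, 0), 'N'), ((0, 0, 1, 0, 0, 1), 'Y'),
    ((1, 0, 0, 0, 1, 0), 'N'), ((1, 1, 1, 0, 1, 0), 'N'),
]

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def knn_predict(query, data, k=1):
    neighbours = sorted(data, key=lambda ex: hamming(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

query = (0, 1, 0, 1, 0, 1)               # -a, +c, -i, +e, -o, +u
print(knn_predict(query, train, k=1))    # 'N' (nearest example is at distance 1)
print(knn_predict(query, train, k=5))    # 'Y' (3 Ys outvote 2 Ns)
```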
Neural Networks
• They can represent complicated hypotheses in high-dimensional
continuous spaces.
• They are attractive as a computational model because they are composed
of many small computing units.
• They were motivated by the structure of neural systems in parts of the
brain. Now it is understood that they are not an exact model of neural
function, but they have proved to be useful from a purely practical
perspective.
If…then rules
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no then recommendation = soft
Approaches to Machine Learning

• Numerical approaches
– Build numeric model with parameters based on
successes
• Structural approaches
– Concerned with the process of defining
relationships by creating links between concepts

Learning methods
• Decision rules:
– If income < $30,000 then reject
• Bayesian network:
– P(good | income, credit history,….)
• Neural Network:
• Nearest Neighbor:
– Take the same decision as for the customer in
the data base that is most similar to the
applicant
Classification
• Assign object/event to one of a given finite set of categories.
– Medical diagnosis
– Credit card applications or transactions
– Fraud detection in e-commerce
– Worm detection in network packets
– Spam filtering in email
– Recommended articles in a newspaper
– Recommended books, movies, music, or jokes
– Financial investments
– DNA sequences
– Spoken words
– Handwritten letters
– Astronomical images

Problem Solving / Planning /
Control
• Performing actions in an environment in order to
achieve a goal.
– Solving calculus problems
– Playing checkers, chess, or backgammon
– Balancing a pole
– Driving a car or a jeep
– Flying a plane, helicopter, or rocket
– Controlling an elevator
– Controlling a character in a video game
– Controlling a mobile robot

Another Example:
Handwriting Recognition
• Background concepts:
– Pixel information
• Categorisations:
– (Matrix, Letter) pairs
– Both positive & negative
• Positive example:
– This is a letter S: [image]
• Negative example:
– This is a letter Z: [image]
• Task
– Correctly categorise
• An unseen example
– Into 1 of 26 categories
History
• Roots of work on NN are in:
• Neurobiological studies (more than one century ago):
• How do nerves behave when stimulated by different magnitudes
of electric current? Is there a minimal threshold needed for
nerves to be activated? Given that no single nerve cell is long
enough, how do different nerve cells communicate with each
other?
• Psychological studies:
• How do animals learn, forget, recognize and perform other types
of tasks?
• Psycho-physical experiments helped to understand how individual
neurons and groups of neurons work.
• McCulloch and Pitts introduced the first mathematical model of
a single neuron, widely applied in subsequent work.
History
• Widrow and Hoff (1960): Adaline
• Minsky and Papert (1969): limitations of single-layer perceptrons (and
they erroneously claimed that the limitations hold for multi-layer
perceptrons)
Stagnation in the 70's:
• Individual researchers continue laying foundations
• von der Malsburg (1973): competitive learning and self-organization
Big neural-nets boom in the 80's
• Grossberg: adaptive resonance theory (ART)
• Hopfield: Hopfield network
• Kohonen: self-organising map (SOM)
Applications
• Classification:
– Image recognition
– Speech recognition
– Diagnostic
– Fraud detection
– …
• Regression:
– Forecasting (prediction on base of past history)
– …
• Pattern association:
– Retrieve an image from corrupted one
– …
• Clustering:
– clients profiles
– disease subtypes
– …
Real Neurons
• Cell structures
– Cell body
– Dendrites
– Axon
– Synaptic terminals

Non Symbolic Representations
• Decision trees can be easily read
– A disjunction of conjunctions (logic)
– We call this a symbolic representation
• Non-symbolic representations
– More numerical in nature, more difficult to read
• Artificial Neural Networks (ANNs)
– A Non-symbolic representation scheme
– They embed a giant mathematical function
• To take inputs and compute an output which is interpreted
as a categorisation
– Often shortened to “Neural Networks”
• Don’t confuse them with real neural networks (in heads)
Complicated Example:
Categorising Vehicles

• Input to function: pixel data from vehicle images
– Output: numbers: 1 for a car; 2 for a bus; 3 for a tank

[Figure: four example input images, with outputs 3, 2, 1 and 1.]

Real Neural Learning
• Synapses change size and strength with
experience.
• Hebbian learning: When two connected
neurons are firing at the same time, the
strength of the synapse between them
increases.
• “Neurons that fire together, wire together.”

Neural Network

[Diagram: a feed-forward network with an input layer, two hidden layers (Hidden 1, Hidden 2) and an output layer.]

Simple Neuron

[Diagram: inputs X1…Xn, weighted by W1…Wn, are summed (Σ) and passed through a transfer function f to produce the output.]
Neuron Model
• A neuron has more than one input x1, x2, …, xm
• Each input is associated with a weight w1, w2, …, wm
• The neuron has a bias b
• The net input of the neuron is
n = w1x1 + w2x2 + … + wmxm + b
i.e. n = Σi wi xi + b
Neuron output

• The neuron output is
y = f (n)
• f is called the transfer function
Transfer Function

• We have 3 common transfer functions

– Hard limit transfer function

– Linear transfer function

– Sigmoid transfer function
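A minimal sketch of the three functions in Python, assuming the usual conventions (hard limit outputs 1 when n >= 0, and the sigmoid is the log-sigmoid):

```python
import math

# The three common transfer functions, each applied to the net input n.
def hardlim(n):
    """Hard limit: output 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def linear(n):
    """Linear: the output equals the net input."""
    return n

def sigmoid(n):
    """Log-sigmoid: squashes n into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

print(hardlim(-0.5), linear(-0.5), round(sigmoid(-0.5), 3))  # 0 -0.5 0.378
```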


Exercises

• The input to a single-input neuron is 2.0, its
weight is 2.3 and the bias is –3.
• What is the output of the neuron if it has
transfer function as:
– Hard limit
– Linear
– Sigmoid
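One way to check this exercise numerically, assuming hardlim(n) = 1 for n >= 0 and the log-sigmoid:

```python
import math

# Net input n = w*x + b, then each transfer function applied to n.
x, w, b = 2.0, 2.3, -3.0
n = w * x + b                            # 2.3 * 2.0 - 3.0 = 1.6
print(round(n, 2))                       # net input: 1.6
print(1 if n >= 0 else 0)                # hard limit output: 1
print(round(n, 2))                       # linear output: 1.6
print(round(1 / (1 + math.exp(-n)), 3))  # sigmoid output: 0.832
```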
Architecture of ANN
• Feed-Forward networks
Allow the signals to travel one way from input to
output
• Feed-Back Networks
The signals can travel in loops in the network; the
output is connected back to the input of the network
Learning Rule

• The learning rule modifies the weights of the

connections.

• The learning process is divided into

Supervised and Unsupervised learning


Perceptron

[Diagram: inputs X1…Xn, weighted by W1…Wn, are summed (Σ) and passed through a transfer function f to produce the output, the same structure as the simple neuron.]
Perceptron

• The perceptron is first given a random
weight vector
• The perceptron is given chosen data pairs (input and
desired output)
• The perceptron learning rule changes the weights
according to the error in the output
Perceptron

• The weight-adapting procedure is an iterative
method and should reduce the error to zero
• The output of the perceptron is
Y = f(n)
= f(w1x1 + w2x2 + … + wnxn)
= f(Σ wi xi) = f(WᵀX)
Perceptron Learning Rule

W_new = W_old + (t - a) X

Where W_new is the new weight
W_old is the old value of the weight
X is the input value
t is the desired value of the output
a is the actual value of the output
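A minimal sketch of this rule as a training loop, assuming a hard-limit output and treating the bias as a weight whose input is always 1; the AND data set below is our own example, not from the slides:

```python
# Perceptron training: W_new = W_old + (t - a) * X, bias updated likewise.
def hardlim(n):
    return 1 if n >= 0 else 0

def train_perceptron(samples, w, b, epochs=10):
    """samples: list of (input vector x, target t) pairs."""
    for _ in range(epochs):
        for x, t in samples:
            a = hardlim(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = t - a
            w = [wi + error * xi for wi, xi in zip(w, x)]
            b = b + error            # bias update: its "input" is always 1
    return w, b

# Learn logical AND (linearly separable, so the rule converges)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data, [0.0, 0.0], 0.0)
```

After training, the learned weights classify all four AND cases correctly.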


Example
• Consider a perceptron that has two real-valued
inputs and an output unit. All the initial
weights and the bias equal 0.1. Assume the
teacher has said that the output should be 0 for
the input: x1 = 5 and x2 = - 3. Find the
optimum weights for this problem.
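A hedged worked sketch of the first update for this example, assuming a hard-limit output and the rule W_new = W_old + (t - a)X, with the bias updated as a weight whose input is 1:

```python
# One perceptron update for x = (5, -3), target t = 0, all weights 0.1.
w1, w2, b = 0.1, 0.1, 0.1
x1, x2, t = 5.0, -3.0, 0

n = w1 * x1 + w2 * x2 + b        # 0.5 - 0.3 + 0.1 = 0.3
a = 1 if n >= 0 else 0           # actual output 1, but the target is 0
error = t - a                    # -1
w1, w2, b = w1 + error * x1, w2 + error * x2, b + error
# updated values: w1 = -4.9, w2 = 3.1, b = -0.9

# Check: the updated weights already classify this input correctly.
n = w1 * x1 + w2 * x2 + b        # -24.5 - 9.3 - 0.9 = -34.7 < 0
print(1 if n >= 0 else 0)        # output 0, matching the target
```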
Example
• Convert the classification problem into a
perceptron neural network model
(start with w1=1, b=3 and w2=2, or any
other values).
• x1 = [0 2], t1=1; x2 = [1 0], t2=1;
x3 = [0 –2], t3=0; x4 = [2 0], t4=0
Example Perceptron

• Example calculation: x1=-1, x2=1, x3=1, x4=-1
– S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
• 0 > -0.1 (the threshold), so the output from the ANN is +1
– So the image is categorised as “bright”
The First Neural Networks

AND
X1  X2  Y
1   1   1
1   0   0
0   1   0
0   0   0
Threshold(Y) = 2
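This unit can be sketched directly: weight-1 inputs are summed and compared against the threshold of 2, reproducing the AND table above:

```python
# McCulloch-Pitts style unit: fires when the weighted input sum
# (both weights are 1) reaches the threshold.
def mcculloch_pitts_and(x1, x2, threshold=2):
    return 1 if x1 + x2 >= threshold else 0

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, '->', mcculloch_pitts_and(x1, x2))  # fires only for (1, 1)
```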
Simple Networks

[Diagram: a threshold unit with t = 0.0 takes inputs x and y, each with weight W = 1, plus a bias input of -1 with weight W = 1.5; it fires only when both x and y are 1, implementing AND.]
Exercises
• Design a neural network to recognize the
problem of
• X1=[2 2] , t1=0
• X=[1 -2], t2=1
• X3=[-2 2], t3=0
• X4=[-1 1], t4=1
Start with initial weights w=[0 0] and bias =0
Problems
• Four one-dimensional data points belonging to two
classes are
X = [1 -0.5 3 -2]
T = [1 -1 1 -1]
W = [-2.5 1.75]
Example

-1 -1 -1 -1 -1 -1 -1 -1
-1 -1 +1 +1 +1 +1 -1 -1
-1 -1 -1 -1 -1 +1 -1 -1
-1 -1 -1 +1 +1 +1 -1 -1
-1 -1 -1 -1 -1 +1 -1 -1
-1 -1 -1 -1 -1 +1 -1 -1
-1 -1 +1 +1 +1 +1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
Example

-1 -1 -1 -1 -1 -1 -1 -1
-1 -1 +1 +1 +1 +1 -1 -1
-1 -1 -1 -1 -1 +1 -1 -1
-1 -1 -1 +1 +1 +1 -1 -1
-1 +1 -1 -1 -1 +1 -1 -1
-1 -1 -1 -1 -1 +1 -1 -1
-1 -1 +1 +1 +1 +1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
AND Network
• This example shows how we construct a network
for the AND operation. The network draws a line
to separate the two classes; this is called
classification.
Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space
consisting of real-valued m-dimensional vectors. The plane splits
the input space into two regions, each of them describing one class:

w1x1 + w2x2 + … + wmxm + w0 >= 0   (i.e. Σi wi xi + w0 >= 0)

[Diagram: in the (x1, x2) plane, the decision boundary w1x1 + w2x2 + w0 = 0 separates the decision region for C1 (where w1x1 + w2x2 + w0 >= 0) from the region for C2.]
Perceptron: Limitations
• The perceptron can only model linearly separable
classes, like (those described by) the following
Boolean functions:
• AND
• OR
• COMPLEMENT
• It cannot model the XOR.

• You can experiment with these functions in the


Matlab practical lessons.
Multi-layer Network
• Consider a network of 3 layers:
– Input layer
– Hidden layer(s)
– Output layer
• Each layer can have a different number of neurons
Multi-layer feed-forward NN

FFNNs overcome the limitation of single-layer NNs: they can
handle non-linearly separable learning tasks.

[Diagram: input layer, hidden layer and output layer, connected layer to layer.]
Types of decision regions

[Diagram 1: a network with a single node computes w0 + w1x1 + w2x2 and decides by its sign: w0 + w1x1 + w2x2 >= 0 on one side of the decision line, < 0 on the other.]

[Diagram 2: a one-hidden-layer network with four hidden units, one per line L1-L4, each feeding the output with weight 1 and an output threshold of -3.5, realizes a convex region: the intersection of the four half-planes.]
Learning rule

• The perceptron learning rule cannot be
applied to a multi-layer network
• We use the Backpropagation algorithm in the
learning process
Backprop

Forward step: network activation and error computation

Backward step: error propagation
Bp Algorithm
• The weight change rule is
wij(new) = wij(old) + η · error · f'(inputi)
• Where η is the learning factor (< 1)
• Error is the error between the actual and trained
value
• f' is the derivative of the sigmoid function = f(1 - f)
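For a single sigmoid unit this update rule can be sketched as follows; the learning rate value and the training loop are our own illustration:

```python
import math

# Update for one sigmoid unit: w_new = w_old + eta * error * f'(n) * x,
# with f'(n) = f(n) * (1 - f(n)); the bias is updated likewise (input 1).
def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def update(w, b, x, t, eta=0.5):
    f = sigmoid(w * x + b)          # forward pass
    error = t - f                   # trained value minus actual value
    grad = f * (1.0 - f)            # sigmoid derivative f(1 - f)
    return w + eta * error * grad * x, b + eta * error * grad

w, b = 0.0, 0.0
for _ in range(200):                # repeat the update on one example
    w, b = update(w, b, x=1.0, t=1.0)
print(sigmoid(w * 1.0 + b))         # the output has moved close to t = 1
```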
Delta Rule
• Each observation contributes a variable amount to the
output
• The scale of the contribution depends on the input
• Output errors can be blamed on the weights
• A least mean squares (LMS) error function can be
defined (ideally it should be zero):
E = ½ (t - y)²
Calculation of Network Error
• Could calculate Network error as
– Proportion of mis-categorised examples
• But there are multiple output units, with numerical output
– So we use a more sophisticated measure, the sum of squared errors:
E = ½ Σd Σk (tk - ok)²
• Not as complicated as it looks
– Square the difference between target and observed
• Squaring ensures we get a positive number
• Add up all the squared differences
– For every output unit and every example in the training set
Example
• For a network with one neuron in the input layer and
one neuron in the hidden layer, the following values are
given:
X=1, w1=1, b1=-2, w2=1, b2=1, η=1 and t=1
Where X is the input value
w1 is the weight connecting input to hidden
w2 is the weight connecting hidden to output
b1 and b2 are the biases
t is the training value
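The forward pass for these values can be sketched as below; the slide does not state the transfer function, so we assume the sigmoid in both layers:

```python
import math

# Forward pass through the 1-1-1 network with the values given above.
def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

x, w1, b1, w2, b2, t = 1.0, 1.0, -2.0, 1.0, 1.0, 1.0

a1 = sigmoid(w1 * x + b1)   # hidden output: sigmoid(1 - 2) ~ 0.269
y = sigmoid(w2 * a1 + b2)   # network output: sigmoid(0.269 + 1) ~ 0.781
error = t - y               # ~ 0.219, to be propagated backwards
print(round(a1, 3), round(y, 3), round(error, 3))  # 0.269 0.781 0.219
```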
Momentum in Backpropagation
• For each weight
– Remember what was added in the previous epoch

• In the current epoch


– Add on a small amount of the previous Δ

• The amount is determined by


– The momentum parameter, denoted α
– α is taken to be between 0 and 1
How Momentum Works
• If the direction of the weight change doesn’t change
– Then the movement of the search gets bigger
– The additional amount compounds in each epoch
– May mean that narrow local minima are avoided
– May also mean that the convergence rate speeds up
• Caution:
– May not have enough momentum to get out of local minima
– Also, too much momentum might carry the search
• Back out of the global minimum, into a local minimum
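The compounding effect can be sketched numerically; `eta`, `alpha` and the constant gradient below are illustrative values, not from the slides:

```python
# Momentum: each weight change is the gradient step plus alpha times
# the previous change, so steps grow while the direction is constant.
def momentum_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    delta = eta * grad + alpha * prev_delta
    return w + delta, delta               # new weight, delta to remember

w, prev = 0.0, 0.0
for grad in [1.0, 1.0, 1.0]:              # direction never changes...
    w, prev = momentum_update(w, grad, prev)
    print(round(prev, 3))                 # ...so the steps grow: 0.1, 0.19, 0.271
```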
Building Neural Networks
• Define the problem in terms of neurons
– think in terms of layers
• Represent information as neurons
– operationalize neurons
– select their data type
– locate data for testing and training
• Define the network
• Train the network
• Test the network
Application: FACE RECOGNITION

• The problem:
– Face recognition of persons of a known group
in an indoor environment.
• The approach:
– Learn face classes over a wide range of poses
using neural network.
Navigation of a car

• Done by Pomerleau. The network takes inputs from a 34x36 video image
and a 7x36 range finder. Output units represent “drive straight”, “turn
left” or “turn right”. After training about 40 times on 1200 road images,
the car drove around the CMU campus at 5 km/h (using a small workstation
on the car). This was almost twice the speed of any other non-NN
algorithm at the time.

Automated driving at 70 mph on a
public highway

[Diagram: a 30x32-pixel camera image forms the inputs; 30x32 weights feed into each of four hidden units; the hidden units connect to 30 output units for steering.]
Exercises
• Perform one iteration of backpropagation for a
network of two layers. The first layer has one neuron
with weight 1 and bias –2. The transfer function
in the first layer is f = n².
• The second layer has only one neuron with
weight 1 and bias 1. The f in the second layer is 1/n.
• The input to the network is x = 1 and t = 1,
with error function E = ½(t - y)².
[Diagram: a 2-2-1 network. Inputs X1 and X2 feed two hidden neurons through weights w11, w12, w21, w22 with biases b1 and b2; the hidden neurons feed a single output neuron through w13 and w23 with bias b3.]

Process one iteration of the backpropagation algorithm
using the initial weights [b1 = -0.5, w11 = 2, w12 = 2, w13 = 0.5, b2 = 0.5,
w21 = 1, w22 = 2, w23 = 0.25, and b3 = 0.5] and input vector [2, 2.5] and t = 8.
Consider a transfer function f(n) = n². Perform
one iteration of backpropagation with α = 0.9 for a
neural network of two neurons in the input layer and
one neuron in the output layer. The input values are
X = [1 -1] and t = 8; the weight values between the
input and hidden layer are w11 = 1, w12 = -2,
w21 = 0.2, and w22 = 0.1. The weights between the
hidden and output layers are w1 = 2 and w2 = -2.
The biases in the input layer are b1 = -1 and b2 = 3.

[Diagram: inputs X1 and X2 feed two hidden neurons via w11, w12, w21, w22; the hidden neurons feed the output via w1 and w2.]
• Kakuro . . . is a kind of game puzzle. The object of
the puzzle is to insert a digit from 1 to 9 inclusive
into each white cell such that the sum of the
numbers in each entry matches the clue associated
with it and that no digit is duplicated in any entry.
Briefly describe how you’d use Constraint
Satisfaction Problem methods to solve Kakuro
puzzles intelligently.
