
MACHINE LEARNING

Neural Network & Deep Learning

Presented by: Dr. S. Nadeem Ahsan


(Slides adapted from the presentation of Andrew Ng (https://www.coursera.org/instructor/andrewng), an open lecture (https://www.youtube.com/watch?v=XBh8HNh1sq0), and Tom Mitchell's book on Machine Learning)
Welcome!!

Lecture 08-09
Learning Outcomes
• Neural Networks
• Different Architectures of Neural Networks
• Backpropagation Algorithm
• Gradient Descent Approach
• Deep Learning
• Convolutional Neural Networks
Biological Inspirations
• Some numbers…
– The human brain contains about 10 billion nerve cells (neurons)
– Each neuron is connected to other neurons through about 10,000 synapses
• Properties of the brain
– It can learn, reorganize itself from experience
– It adapts to the environment
– It is robust and fault tolerant
Biological Neuron

• A neuron has
– A branching input (dendrites)
– A branching output (the axon)
• The information circulates from the dendrites to the axon via the cell body
• Axon connects to dendrites via synapses
– Synapses vary in strength
– Synapses may be excitatory or inhibitory
What is an artificial neuron ?
Definition: a non-linear, parameterized function with a restricted output range.

$y = f\left(w_0 + \sum_{i=1}^{n-1} w_i x_i\right)$

where $x_1, x_2, x_3, \ldots$ are the inputs, $w_i$ are the weights, $w_0$ is the bias, and $f$ is the activation function.
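As a concrete illustration (not from the original slides), here is a minimal Python sketch of this neuron, assuming a logistic activation for $f$; the inputs and weights are arbitrary:

```python
import numpy as np

def neuron(x, w, w0, f):
    """Artificial neuron: y = f(w0 + sum_i w_i * x_i)."""
    return f(w0 + np.dot(w, x))

# Example with a logistic activation and three arbitrary inputs/weights
logistic = lambda a: 1.0 / (1.0 + np.exp(-a))
y = neuron(x=np.array([1.0, 0.5, -0.2]),
           w=np.array([0.4, -0.3, 0.8]),
           w0=0.1, f=logistic)
print(y)  # a single bounded output in (0, 1)
```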
Activation Functions
• Linear: $y = x$

• Logistic: $y = \dfrac{1}{1 + \exp(-x)}$

• Hyperbolic tangent: $y = \dfrac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$

(Slide figures: plots of the linear, logistic, and hyperbolic tangent activation functions.)
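The three activations above translate directly into Python; a minimal numpy-based sketch (the sample points are arbitrary):

```python
import numpy as np

# The three activation functions shown on the slide
def linear(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_act(x):
    # identical to np.tanh(x), written out as on the slide
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-10, 10, 5)
print(linear(x), logistic(x), tanh_act(x), sep="\n")
```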
Neural Networks
• A mathematical model to solve engineering problems
– A group of highly connected neurons realizing compositions of non-linear functions
• Tasks
– Classification
– Discrimination
– Estimation
• 2 major types of networks
– Feed forward Neural Networks
– Recurrent Neural Networks
Learning
• The procedure of estimating the parameters of the neurons so that the whole network can perform a specific task

• 3 types of learning
– Supervised learning
– Unsupervised learning
– Reinforcement learning

• The Learning process (supervised)


– Present the network with a number of inputs and their corresponding desired outputs
– See how closely the actual outputs match the desired ones
– Modify the parameters to better approximate the desired outputs
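A minimal sketch of this supervised loop, using a single linear neuron and a least-mean-square style update (the data, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # inputs
t = np.array([0.0, 1.0, 1.0, 2.0])                              # desired outputs
w, b, eta = np.zeros(2), 0.0, 0.1

for epoch in range(100):
    for x, target in zip(X, t):      # present each input and its desired output
        y = w @ x + b                # actual output of the network
        error = target - y           # how closely does it match the desired one?
        w += eta * error * x         # modify the parameters to better
        b += eta * error             # approximate the desired output

print(w, b)  # approaches w = [1, 1], b = 0 for this linearly realizable task
```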
Characteristics Of Neural Networks
1) NNs exhibit mapping capabilities; that is, they can map input patterns to their associated output patterns.

2) NNs can predict new outcomes on which they were not previously trained.

3) NNs possess the capability to generalize; thus, they can predict new outcomes from past trends.

4) NNs can process information in parallel, at high speed, and in a distributed manner.
Classification of Learning Methods
1. Supervised Learning (Error Based)
- Error Correction (Gradient Descent)
- Least Mean Square
- Backpropagation
- Stochastic
2. Unsupervised Learning
- Hebbian
- Competitive
3. Reinforcement Learning
NN Learning Rules
1. Hebbian Method
2. Gradient Descent Learning
3. Competitive Learning
4. Stochastic Learning
Neural Network Architecture
The following are the three fundamentally different classes of network architecture:
1) Single-Layer Feedforward Networks
2) Multilayer Feedforward Networks
3) Recurrent Networks
Taxonomy of Neural Network Architectures

1. ADALINE (Adaptive Linear Neural Element)
2. ART (Adaptive Resonance Theory)
3. AM (Associative Memory)
4. BAM (Bidirectional Associative Memory)
5. BSB (Brain-State-in-a-Box)
6. CCN (Cascade Correlation Network)
7. CPN (Counter Propagation Network)
8. LVQ (Learning Vector Quantization)
9. MADALINE (Many ADALINE)
10. Hamming Network
11. Hopfield Network
12. Boltzmann Machine
13. Cauchy Machine
Classification of Some NN Systems with respect to Learning Methods and Architecture Types

Type of Architecture      | Gradient Descent              | Hebbian            | Competitive | Stochastic
Single-Layer Feedforward  | ADALINE, Hopfield, Perceptron | AM, Hopfield       | LVQ, SOFM   | —
Multilayer Feedforward    | CCN, MLFF, RBF                | Neocognitron       | —           | —
Recurrent NN              | RNN                           | BAM, BSB, Hopfield | ART         | Boltzmann Machine, Cauchy Machine

Neural Network Classification for Fixed Patterns
Current Models of NN

1. Deep Learning Architectures
2. Multilayer Feedforward Networks
3. Radial Basis Function Networks
4. Self-Organizing Networks
Feed Forward Neural Networks
• The information is propagated from the inputs to the outputs
• Computes $N_o$ non-linear functions of $n$ input variables by compositions of $N_c$ algebraic functions
• Time plays no role (no cycles between outputs and inputs)

(Slide figure: inputs $x_1, x_2, \ldots, x_n$ feeding a 1st hidden layer, a 2nd hidden layer, and an output layer.)
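A minimal numpy sketch of this forward pass through the two hidden layers drawn on the slide; the layer sizes and random weights are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

n, h1, h2, out = 4, 5, 3, 2          # layer sizes (arbitrary for illustration)
W1, b1 = rng.normal(size=(h1, n)), np.zeros(h1)
W2, b2 = rng.normal(size=(h2, h1)), np.zeros(h2)
W3, b3 = rng.normal(size=(out, h2)), np.zeros(out)

x = rng.normal(size=n)               # inputs x1 ... xn
a1 = sigmoid(W1 @ x + b1)            # 1st hidden layer
a2 = sigmoid(W2 @ a1 + b2)           # 2nd hidden layer
y = sigmoid(W3 @ a2 + b3)            # output layer
print(y)
```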
Recurrent Neural Networks
• Can have arbitrary topologies
• Can model systems with internal states (dynamic ones)
• Delays are associated with a specific weight
• Training is more difficult
• Performance may be problematic
– Stable outputs may be more difficult to evaluate
– Unexpected behavior (oscillation, chaos, …)

(Slide figure: a recurrent network over inputs $x_1, x_2$ with delay weights on the feedback connections.)
Building a Neural Network

1. “Select structure”: design the way that the neurons are interconnected
2. “Select weights”: decide the strengths with which the neurons are interconnected
– Weights are selected so as to get a “good match” to a “training set”
– “Training set”: a set of inputs and desired outputs
– Often done with a “learning algorithm”
Multiple Output Units: One-vs-Rest
Neural Network Classification
Representing Boolean Functions
Combining Representations to Create Non-Linear Functions
Layering Representations
Forward-Propagating Local Input Signals

• Forward propagation gives all the a's and z's

Back-Propagating Local Error Signals

• Back-propagation gives all the δ's

(Slide figures: input signals flowing forward through the network, and error signals δ flowing backward from the targets $t_1, t_2$.)
Backpropagation
Calculations for a Reverse Pass of Back Propagation
Neural Network (Backpropagation)-Example (1/2)
Neural Network (Backpropagation)-Example (2/2)
Gradient Descent

Learning in a backpropagation network happens in two steps:
1. First, each pattern is presented to the network and propagated forward to the output.
2. Second, a method called gradient descent is used to minimize the total error on the patterns in the training set.

In gradient descent, weights are changed in proportion to the negative of the error derivative with respect to each weight:

$\Delta w_{ij} = -\eta \, \dfrac{\partial E}{\partial w_{ij}}$
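A minimal sketch of this update rule in Python, assuming the gradient of $E$ with respect to each weight is already available (e.g., from backpropagation); the example error function is illustrative:

```python
import numpy as np

def gradient_descent_step(weights, grad_E, learning_rate=0.1):
    """Change each weight in proportion to the negative error derivative:
    w <- w - eta * dE/dw."""
    return weights - learning_rate * grad_E

# Example: one step on E(w) = w1^2 + w2^2, whose gradient is 2w
w = np.array([1.0, -2.0])
w = gradient_descent_step(w, grad_E=2 * w)
print(w)  # moves toward the minimum at (0, 0): [0.8, -1.6]
```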
Plot of Error Function “E(w)”
DEEP LEARNING

Convolutional Neural Networks


Deep Learning Journey
Why Deep Learning?
As the amount of data increases, traditional machine learning techniques become insufficient in terms of performance, while deep learning continues to give better performance (e.g., accuracy).

Why now?
1) Algorithm Advancements
2) GPU Computing
3) Availability of Large Training Data
DEEP LEARNING
• We know it is good to learn a small model.
• From this fully connected model, do we really need all the edges?
• Can some of these weights be shared?
Deep Learning vs. Machine Learning
Popular Deep Learning Algorithms
Convolutional Neural Network
Consider Learning an Image
• Some patterns are much smaller than the whole image
– A small region can be represented with fewer parameters (e.g., a “beak” detector)
• The same pattern appears in different places, so the detectors can be compressed
– Instead of training a lot of “small” detectors that each must “move around”, detectors for the same pattern in different places (e.g., an “upper-left beak” detector and a “middle beak” detector) can be compressed to the same parameters.
A Convolutional Layer
A CNN is a neural network with some convolutional layers (and some other layers). A convolutional layer has a number of filters, each of which performs a convolution operation.

(Slide figure: a “beak detector” as a filter sliding over the image.)
Convolution

The filter entries are the network parameters to be learned.

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2 (3 x 3):
-1  1 -1
-1  1 -1
-1  1 -1

Each filter detects a small pattern (3 x 3).
Convolution with Filter 1, stride = 1

Slide Filter 1 across the 6 x 6 image one pixel at a time; at each position, take the dot product between the filter and the 3 x 3 image patch it covers. The first position gives 3, the next gives -1.
Convolution with Filter 1, stride = 2

If the stride is 2, the filter jumps two pixels at a time; the first two positions give 3 and -3.
Convolution with Filter 1, stride = 1 (all positions)

Applying Filter 1 at every position of the 6 x 6 image yields a 4 x 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
Repeat this for each filter. Filter 2 (stride = 1) produces a second 4 x 4 output:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Together, the two 4 x 4 images form a 2 x 4 x 4 feature map.
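The walkthrough above can be reproduced with a short numpy sketch (not the slides' code); it prints the two 4 x 4 feature maps:

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])
filter2 = np.array([[-1, 1,-1],
                    [-1, 1,-1],
                    [-1, 1,-1]])

def convolve(img, flt, stride=1):
    """Slide the filter over the image; each output is a dot product."""
    k = flt.shape[0]
    size = (img.shape[0] - k) // stride + 1
    out = np.empty((size, size), dtype=int)
    for i in range(size):
        for j in range(size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * flt)
    return out

print(convolve(image, filter1))  # top-left entry is the dot product 3
print(convolve(image, filter2))  # second 4 x 4 feature map
```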
Color Image: RGB 3 Channels

A color image has 3 channels (R, G, B), so the 6 x 6 image becomes a 6 x 6 x 3 tensor. Each filter is correspondingly 3 channels deep (3 x 3 x 3), and the convolution sums over all channels.

(Slide figure: the 6 x 6 image and the two 3 x 3 filters, each stacked three channels deep.)
Convolution vs. Fully Connected

Convolution can be seen as a fully connected layer with most connections removed and the remaining weights shared. Flatten the 6 x 6 image into a 36-dimensional vector $x_1, \ldots, x_{36}$; in the fully connected view every output connects to all 36 inputs, whereas each convolution output connects to only a few of them.
Example: the first output of Filter 1 (value 3) connects only to inputs 1, 2, 3, 7, 8, 9, 13, 14, 15 of the flattened image (with values 1, 0, 0, 0, 1, 0, 0, 0, 1). Fewer parameters: each output connects to only 9 inputs, not fully connected.

The next output of Filter 1 (value -1) connects to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 and uses the same 9 weights. Shared weights: even fewer parameters.

The whole CNN

Input image → Convolution → Max Pooling → Convolution → Max Pooling (this pair of steps can repeat many times) → Flattened → Fully Connected Feedforward network → output (“cat”, “dog”, …)
Max Pooling

Starting from the two 4 x 4 feature maps produced by Filter 1 and Filter 2:

Filter 1 output:      Filter 2 output:
 3 -1 -3 -1           -1 -1 -1 -1
-3  1  0 -3           -1 -1 -2  1
-3 -3  0  1           -1 -1 -2  1
 3 -2 -2 -1           -1  0 -4  3
Why Pooling

• Subsampling pixels does not change the object (a subsampled bird is still a bird)
• We can subsample the pixels to make the image smaller → fewer parameters to characterize the image
A CNN Compresses a Fully Connected Network in Two Ways:
• Reducing the number of connections
• Sharing weights on the edges
• Max pooling further reduces the complexity
Max Pooling

The 6 x 6 image is convolved into 4 x 4 feature maps; 2 x 2 max pooling then keeps the maximum of each 2 x 2 block, producing a new but smaller 2 x 2 image per filter:

Filter 1 pooled:   Filter 2 pooled:
3 0                -1 1
3 1                 0 3

Each filter is a channel.
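A minimal numpy sketch of 2 x 2 max pooling; applied to the Filter 1 feature map it reproduces the pooled 2 x 2 result above:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Keep the maximum of each size x size block."""
    h, w = fmap.shape
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

feature_map = np.array([[ 3, -1, -3, -1],
                        [-3,  1,  0, -3],
                        [-3, -3,  0,  1],
                        [ 3, -2, -2, -1]])
print(max_pool(feature_map))  # [[3, 0], [3, 1]]
```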
The whole CNN

After Convolution → Max Pooling we get a new image, smaller than the original; the number of channels equals the number of filters (here two channels: the 2 x 2 pooled maps 3 0 / 3 1 and -1 1 / 0 3). The Convolution → Max Pooling step can repeat many times.
The whole CNN

Convolution → Max Pooling produces a new image; after repeating these steps, the final image is flattened and passed to a fully connected feedforward network that outputs the class (“cat”, “dog”, …).
Flattening

The 2 x 2 x 2 pooled maps (3 0 / 3 1 and -1 1 / 0 3) are flattened into a single 8-dimensional vector, which is fed to the fully connected feedforward network.
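A minimal sketch of the flattening step, assuming the two pooled maps from the earlier example (the flattening order here is numpy's row-major default, an assumption):

```python
import numpy as np

pooled = np.array([[[3, 0], [3, 1]],    # Filter 1 channel
                   [[-1, 1], [0, 3]]])  # Filter 2 channel
flat = pooled.flatten()                 # 2 x 2 x 2 -> 8-dimensional vector
print(flat)                             # [ 3  0  3  1 -1  1  0  3]
```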
Review Questions
1. What are the three different types of neural network architectures?
2. What is the major challenge of the gradient descent algorithm?
3. What is the learning rate in a neural network, and why should its value be small?
4. What are filtering and max pooling in a CNN?
5. What are the advantages of deep learning?
Thank you
