
GEN101

Introductory
Artificial Intelligence
College of Engineering

Neural Networks
Types of Machine
Learning – Recap
Neurons
o Basic unit of a neural network
o Very simple processing units
o A neuron takes inputs, does some math with them, and produces one output

[Figure: a biological neuron (dendrites, synapses, axon, cell body) alongside an artificial neuron with inputs x1 = 3, x2 = 4, x3 = 2, weights w1 = 1, w2 = 0, w3 = 1, bias b = +1, and activation f(x) = max(x, 0).]

1- Each input is multiplied by a weight: x1 → x1·w1, x2 → x2·w2, x3 → x3·w3
2- All the weighted inputs are added together with a bias b: x1·w1 + x2·w2 + x3·w3 + b
3- The sum is passed through an activation function: y = f(x1·w1 + x2·w2 + x3·w3 + b)

With the values in the figure: y = f(3·1 + 4·0 + 2·1 + 1) = f(6) = 6
Neurons (continued)
o The same neuron with different weights: inputs x1 = 3, x2 = 4, x3 = 2, weights w1 = -1, w2 = 0, w3 = -1, bias b = +1, activation f(x) = max(x, 0)
o The same three steps apply: multiply each input by its weight, add the weighted inputs together with the bias b, and pass the sum through the activation function
o Now y = f(3·(-1) + 4·0 + 2·(-1) + 1) = f(-4) = 0
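Below is a minimal Python sketch of this computation; the function name is my own, and the inputs, weights, and bias are the values read off the figures above.

```python
def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of the inputs plus bias, passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(total, 0.0)  # f(x) = max(x, 0)

# First case: weights (1, 0, 1), bias +1  ->  f(3*1 + 4*0 + 2*1 + 1) = f(6) = 6
print(neuron([3, 4, 2], [1, 0, 1], 1))

# Second case: weights (-1, 0, -1), bias +1  ->  f(-3 + 0 - 2 + 1) = f(-4) = 0
print(neuron([3, 4, 2], [-1, 0, -1], 1))
```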
Neuron Model
Model Parameters
• Neurons have a set of parameters, called weights (like the slope in linear regression)
   • Called w's
• The neuron model also has a bias (like the y-intercept in linear regression)
   • Called b
• Together, the w's and the b create a line in the space that divides it into regions (classes)
• Finally, the neuron has a non-linear activation function
   • Called f()
   • The non-linearity will allow us to create curves instead of lines to divide the space

[Figure: the neuron diagram again: inputs x1, x2, x3 multiplied by weights w1, w2, w3, summed with the bias b, and passed through the activation f(x) = max(x, 0) to produce the output y.]
Neuron Example
Weights, Bias, and Activations

Inputs (X)    Weights (W)    Product (X·W)
   5.4             8              43.2
 -10.2             5             -51.0
  -0.1            22              -2.2
 101.4            -5            -507.0
   0.0             2               0.0
  12.0            -3             -36.0

Linear combination (sum of the products):  -553.0
Bias:                                        10.0
Sum plus bias:                             -543.0
Output (activation, f(x) = max(x, 0)):        0
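The same worked example expressed as a short Python sketch; the variable names are illustrative, and the numbers are taken from the table above.

```python
inputs  = [5.4, -10.2, -0.1, 101.4, 0.0, 12.0]
weights = [8, 5, 22, -5, 2, -3]
bias    = 10.0

products = [x * w for x, w in zip(inputs, weights)]   # [43.2, -51.0, -2.2, -507.0, 0.0, -36.0]
linear_combination = sum(products)                    # -553.0
pre_activation = linear_combination + bias            # -543.0
output = max(pre_activation, 0.0)                     # ReLU -> 0.0
print(products, linear_combination, pre_activation, output)
```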
Neuron Activation Functions
Connecting
Neurons to Build
Neural Networks
Neural Network
Brain Analogy

• A neural network is nothing more than a bunch of neurons connected together.
• It is a system that consists of a large number of neurons, each of which can process information on its own, so that instead of a CPU processing each piece of information one after the other, many simple units work in parallel.
Simple Neural Network

[Figure: a fully connected network with inputs x1 … xn, Hidden Layer 1 (neurons Ha, Hb, …, Hz), Hidden Layer 2 (neurons V1, V2, …), and an output neuron y1, with a weight on every connection.]

This network has:
• n inputs (x1, x2, …, xn)
• a hidden layer with neurons Ha, Hb, …, Hz
• a second hidden layer with neurons V1, V2, …
• an output layer with one or more neurons (y)
• Every neuron is connected to all neurons in the next layer → called fully connected

• A neural network can have any number of layers, with any number of neurons in those layers
• A hidden layer is any layer between the input (first) layer and the output (last) layer. There can be multiple hidden layers!
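As a rough illustration of a forward pass through such a fully connected network, here is a minimal numpy sketch. The layer sizes, random weights, and the use of ReLU on every layer (including the output, for simplicity) are assumptions for the example, not values from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative sizes: 4 inputs, two hidden layers of 5 and 3 neurons, 1 output.
sizes = [4, 5, 3, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Pass the input through every fully connected layer."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(a @ W + b)   # each layer: weighted sums plus biases, then activation
    return a

x = np.array([3.0, 4.0, 2.0, 1.0])
print(forward(x))
```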
Neural Networks in Practice
Designing Neural Network Architectures
Number of Hidden Layers

[Figure: a network with inputs x1 … xn, four hidden layers (Hidden Layer 1 through Hidden Layer 4), and a single output y.]
o The number of hidden layers is dependent on the problem


o You’re essentially trying to design the best network architecture — not too big, not too small, just right
o Generally, 1–5 hidden layers will serve you well for most problems
Designing Neural Network Architectures
Neurons per Hidden Layer

[Figure: a network with inputs x1 … xn, two hidden layers (h1 … hn and v1, v2), and a single output y.]
o In general, using the same number of neurons for all hidden layers will suffice
o For some datasets, having a large first layer and following it up with smaller layers will lead to better performance as
the first layer can learn a lot of lower-level features that can feed into higher order features in subsequent layers.
o Usually, you will get more of a performance boost from adding more layers than adding more neurons in each layer
o Remember: when choosing the number of layers/neurons, if the number is too small, your network will not be able to learn the underlying patterns in your data and will thus be useless (a short sketch of these choices in code follows below).
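As a loose illustration of these design choices, the sketch below expresses an architecture as a list of hidden-layer sizes; the specific sizes and the helper function are illustrative assumptions.

```python
# Same number of neurons in every hidden layer (usually sufficient):
uniform_hidden_layers = [64, 64, 64]

# Large first layer followed by smaller layers (can help on some datasets,
# letting early layers learn low-level features that feed higher-order ones):
tapered_hidden_layers = [128, 64, 32]

def layer_shapes(n_inputs, hidden_sizes, n_outputs):
    """Return the (in, out) shape of every weight matrix in the network."""
    sizes = [n_inputs] + hidden_sizes + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_shapes(10, uniform_hidden_layers, 1))
print(layer_shapes(10, tapered_hidden_layers, 1))
```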
Designing Neural Network Architectures
Activation Functions

Sigmoid
• Towards the ends of the function, y values react very little to changes in x.
• The derivative values in these regions are very small and converge to 0. This is called the vanishing gradient, and learning is minimal.
• When slow learning occurs, the optimization algorithm cannot get maximum performance from the neural network model.

Tanh
• Its derivative is steeper compared to Sigmoid.
• It is more efficient because it has a wider range, [-1, 1], for faster learning.
• The problem of gradients at the ends of the function continues.

ReLU
• It doesn't saturate.
• It converges faster than some other activation functions (sigmoid and tanh).
• The most commonly used activation function, because of its simplicity during the optimization process (backpropagation).
• It is not computationally expensive.

Leaky ReLU
• Can be used as an improvement over the ReLU activation function.
• It has all the properties of ReLU, and it overcomes the ReLU drawbacks.
• Unlike ReLU, the value of its hyper-parameter (α) is defined prior to training and hence cannot be adjusted during training time. The value of α chosen might therefore not be the most optimal one.
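For concreteness, here is a small numpy sketch of the four activation functions compared above; the Leaky ReLU slope α = 0.01 is an illustrative choice of the hyper-parameter.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # saturates at both ends -> vanishing gradient

def tanh(x):
    return np.tanh(x)                       # like sigmoid but with output range [-1, 1]

def relu(x):
    return np.maximum(x, 0.0)               # does not saturate for positive x, cheap to compute

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)    # small slope for negative x instead of 0

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), sep="\n")
```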
Experimenting with Neural Network Designs
Explore the effect of changes
• https://playground.tensorflow.org/
• Compare model performance by changing the activation functions
• Changing the activation function from linear to other non-linear functions adds non-linearity to the network, which makes the network strong enough to capture the relationships within the data
Training
Neural Networks
Searching for the Best Parameters
By Optimizing the RMSE
Introduction to Optimization with Gradient Descent
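A minimal sketch of gradient descent minimizing the RMSE of a one-parameter linear model y ≈ w·x; the toy data, learning rate, and starting point are illustrative assumptions, not from the slides.

```python
import numpy as np

# Toy data roughly following y = 2x (illustrative).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 2.1, 3.9, 6.2, 7.9])

def rmse(w):
    return np.sqrt(np.mean((y - w * x) ** 2))

w = 0.0       # starting guess for the parameter
lr = 0.05     # learning rate (step size)
for step in range(200):
    # Gradient of the MSE with respect to w; minimizing MSE also minimizes RMSE.
    grad = np.mean(-2 * x * (y - w * x))
    w -= lr * grad            # take a step opposite to the gradient
print(w, rmse(w))             # w ends up close to 2
```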
How to Train a Neural
Network?

Finding the best weights and biases
We use Backpropagation
Backpropagation Example

Consider f(x, y, z) = (x + y) · z, computed in two steps: q = x + y, then f = q · z.

Forward pass with x = -2, y = 5, z = -4:
   q = x + y = 3
   f = q · z = -12

We need to find: ∂f/∂x, ∂f/∂y, ∂f/∂z.

Backward pass (working from the output back to the inputs):
   ∂f/∂f = 1
   ∂f/∂z = q = 3
   ∂f/∂q = z = -4
   ∂f/∂x = ∂f/∂q · ∂q/∂x = -4 · 1 = -4
   ∂f/∂y = ∂f/∂q · ∂q/∂y = -4 · 1 = -4

[Figure: the computational graph for f = (x + y) · z with the forward values and the gradients annotated on each edge.]
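The same worked example in code: a forward pass through the two gates, then the chain rule applied from the output back to the inputs. The variable names are my own; the values match the slide.

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass: apply the chain rule from the output back to the inputs.
df_df = 1.0          # gradient of f with respect to itself
df_dz = q * df_df    # multiply gate: d(q*z)/dz = q  ->  3
df_dq = z * df_df    # multiply gate: d(q*z)/dq = z  -> -4
df_dx = 1.0 * df_dq  # add gate passes the gradient through unchanged -> -4
df_dy = 1.0 * df_dq  # -> -4

print(q, f, df_dx, df_dy, df_dz)   # 3.0 -12.0 -4.0 -4.0 3.0
```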
Backpropagation Example
Another Example

Consider a sigmoid neuron with two inputs:
   f(w, x) = 1 / (1 + exp(-(w0·x0 + w1·x1 + w2)))

Forward pass with w0 = 2.00, x0 = -1.00, w1 = -3.00, x1 = -2.00, w2 = -3.00:
   w0·x0 = -2.00
   w1·x1 = 6.00
   w0·x0 + w1·x1 + w2 = 1.00
   ×(-1) → -1.00
   exp → 0.37
   +1 → 1.37
   1/x → 0.73

Backward pass, one gate at a time (local gradient × upstream gradient):
   ∂f/∂f = 1.00
   1/x gate: (-1/1.37²) · 1.00 = -0.53
   +1 gate: 1 · (-0.53) = -0.53
   exp gate: exp(-1.00) · (-0.53) = -0.20
   ×(-1) gate: (-1) · (-0.20) = 0.20
   + gate: the gradient 0.20 is passed unchanged to each addend, so ∂f/∂w2 = 0.20
   × gates:
      ∂f/∂w0 = x0 · 0.20 = -0.20      ∂f/∂x0 = w0 · 0.20 = 0.40
      ∂f/∂w1 = x1 · 0.20 = -0.40      ∂f/∂x1 = w1 · 0.20 = -0.60

Sigmoid shortcut: the last four gates together form the sigmoid function σ(x) = 1/(1 + e^(-x)), whose derivative is (1 - σ(x))·σ(x). Treating them as a single sigmoid gate gives the same local gradient directly: (1 - 0.73)·(0.73) ≈ 0.20.

[Figure: the computational graph of the sigmoid neuron with the forward values (top) and the gradients (bottom) annotated at every gate.]
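The sigmoid-neuron example in code, using the sigmoid shortcut (1 - σ)·σ for the local gradient; the variable names are my own, and the printed values match the slide up to rounding.

```python
import math

w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# Forward pass
s = w0 * x0 + w1 * x1 + w2          # 1.0
f = 1.0 / (1.0 + math.exp(-s))      # sigmoid(1.0) ≈ 0.73

# Backward pass using the sigmoid shortcut: dsigma/ds = (1 - sigma) * sigma
ds = (1.0 - f) * f                  # ≈ 0.20, the gradient flowing into the sum
dw0, dx0 = x0 * ds, w0 * ds         # ≈ -0.20, 0.40
dw1, dx1 = x1 * ds, w1 * ds         # ≈ -0.40, -0.60
dw2 = 1.0 * ds                      # ≈ 0.20

print(round(f, 2), round(dw0, 2), round(dx0, 2),
      round(dw1, 2), round(dx1, 2), round(dw2, 2))
```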
Applications and Examples

Applications
• Aerospace
• Automotive
• Electronics
• Manufacturing
• Mechanics
• Robotics
• Telecommunication

Examples
• Image processing, style transfer
• Machine translation
• Replacement of all previously discussed algorithms
• Object identification in photos and videos
• Speech recognition and synthesis
Advantages & Disadvantages
Classification

K-Nearest Neighbors
Advantages:
• The algorithm is simple and easy to implement.
• There's no need to build a model, tune several parameters, or make additional assumptions.
• The algorithm is versatile. It can be used for classification, regression, and search.
• Robust to noisy data by averaging the k nearest neighbors.
Disadvantages:
• The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increases.
• Classifying unknown records is relatively expensive.
• It does not build models explicitly.

Decision Trees
Advantages:
• Inexpensive to construct and easy to visualize.
• Extremely fast at classifying unknown records.
• Easy to interpret for small-sized trees.
• Accuracy is comparable to other classification techniques for many simple data sets.
• Non-linear patterns in the data can be captured easily.
Disadvantages:
• Over-fitting of the data is possible.
• A small variation in the input data can result in a different decision tree. This can be reduced by using feature engineering techniques.
• We have to balance the dataset before training the model.

Neural Networks
Advantages:
• Adaptive learning
• Self-organization
• Real-time operation
• Prognosis
• Fault tolerance
Disadvantages:
• Hardware dependence
• Unexplained behavior of the network
• Determination of the proper network structure
• Difficulty of showing the problem to the network
• The duration of the network is unknown
Using a Trained
Neural Network in
Practice

Forward Pass
Optimizing
Performance to
Train Neural
Networks

Gradient Descent
