
GEN101

Introductory Artificial Intelligence
College of Engineering

Neural Networks
Types of Machine Learning – Recap
Neurons

o Basic unit of a neural network
o Very simple processing units
o A neuron takes inputs, does some math with them, and produces one output

[Figure: a biological neuron (dendrites, synapses, axon, cell body) next to an artificial neuron with inputs x1, x2, x3, weights W1 = 3, W2 = 4, W3 = 2, bias b = +1, and activation f(x) = max(x, 0)]

1- Each input is multiplied by a weight: x1 → x1·w1, x2 → x2·w2, x3 → x3·w3
2- All the weighted inputs are added together with a bias b
3- The sum is passed through an activation function

Worked example: with inputs x1 = 1, x2 = 0, x3 = 1, the sum is 1·3 + 0·4 + 1·2 + 1 = 6, so the output is y = f(6) = 6.

Neurons

Same neuron, different inputs:

[Figure: the same neuron with inputs x1 = -1, x2 = 0, x3 = -1, weights W1 = 3, W2 = 4, W3 = 2, bias b = +1, and activation f(x) = max(x, 0)]

1- Each input is multiplied by a weight
2- All the weighted inputs are added together with a bias b
3- The sum is passed through an activation function

Worked example: the sum is (-1)·3 + 0·4 + (-1)·2 + 1 = -4, so the output is y = f(-4) = max(-4, 0) = 0 (see the Python sketch below).
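The three steps above can be written in a few lines of Python. This is a minimal sketch (not code from the course), using the weights W1 = 3, W2 = 4, W3 = 2 and bias b = 1 as read from the slide figure:

```python
def relu(x):
    # The activation function from the slide: f(x) = max(x, 0)
    return max(x, 0.0)

def neuron(inputs, weights, bias):
    # 1- each input is multiplied by a weight
    # 2- all the weighted inputs are added together with the bias b
    # 3- the sum is passed through the activation function
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(total)

weights = [3, 4, 2]   # W1, W2, W3 as shown in the figure
bias = 1

print(neuron([1, 0, 1], weights, bias))    # f(1*3 + 0*4 + 1*2 + 1) = f(6)  -> 6
print(neuron([-1, 0, -1], weights, bias))  # f(-3 + 0 - 2 + 1)      = f(-4) -> 0
```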
Neuron Model
Model Parameters

• Neurons have a set of parameters, called weights (like the slope in linear regression)
• Called w's
• The neuron model also has a bias (like the y-intercept in linear regression)
• Called b
• Together, the w's and the b create a line in the space to divide it into regions (classes)
• Finally, the neuron has a non-linear activation function
• Called f()
• The nonlinearity will allow us to create curves instead of lines to divide the space

[Figure: the neuron model with inputs x1, x2, x3, weights W1, W2, W3, bias b, and activation f(x) = max(x, 0) producing output y]
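In symbols, the neuron model described by these bullets can be written as follows (standard formulation; the ReLU f is the one shown in the figure):

```latex
y = f\left(\sum_{i} w_i x_i + b\right), \qquad f(x) = \max(x, 0)
```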
Neuron Example
Weights, Bias, and Activations

Inputs (X)   Weights (W)   Product (X*W)
    5.4            8            43.2
  -10.2            5           -51.0
   -0.1           22            -2.2
  101.4           -5          -507.0
    0.0            2             0.0
   12.0           -3           -36.0

Linear combination (sum of the products): -553.0
Bias: 10.0
Output (activation): f(-553.0 + 10.0) = max(-543.0, 0) = 0
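The table's arithmetic can be checked with a short NumPy sketch (a minimal illustration, assuming NumPy is available; the numbers are copied from the table above):

```python
import numpy as np

x = np.array([5.4, -10.2, -0.1, 101.4, 0.0, 12.0])  # inputs (X)
w = np.array([8, 5, 22, -5, 2, -3])                  # weights (W)
b = 10.0                                             # bias

z = np.dot(x, w) + b       # linear combination plus bias: -553.0 + 10.0 = -543.0
y = np.maximum(z, 0.0)     # ReLU activation: max(-543.0, 0) = 0.0
print(z, y)
```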
Neuron Activation Functions
Connecting Neurons to Build Neural Networks
Neural Network
Brain Analogy

• A neural network is nothing more than a bunch of neurons connected together.
• It is a system that consists of a large number of neurons, each of which can process information on its own, so that instead of having a single CPU process each piece of information one after the other, many neurons work on the information in parallel.
Simple Neural Network

This network has:
• n inputs (x1, x2, …, xn)
• a hidden layer with n neurons (Ha, Hb, …, Hz)
• a second hidden layer with n neurons (V1, V2, …, Vn)
• an output layer with one or more neurons (y1)
• Every neuron is connected to all the neurons in the next layer, so the network is called fully connected
• A neural network can have any number of layers, with any number of neurons in those layers

[Figure: inputs x1 … xn feeding Hidden Layer 1 (Ha, Hb, …, Hz), Hidden Layer 2 (V1, …, Vn), and a single output y1, with a weight W on every connection]

There can be multiple hidden layers!
A hidden layer is any layer between the input (first) layer and the output (last) layer.
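As a rough sketch of how such a fully connected network computes its output (the layer sizes and random weights below are illustrative, not taken from the slide), a forward pass is just the neuron computation repeated, one layer at a time:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, params):
    # params is a list of (W, b) pairs, one per layer; each layer applies a
    # weighted sum plus bias, then the activation (used on the output layer
    # too, for simplicity)
    a = x
    for W, b in params:
        a = relu(a @ W + b)
    return a

# Illustrative sizes: 3 inputs, two hidden layers of 4 neurons each, 1 output
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]
params = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

print(forward(np.array([1.0, 0.0, 1.0]), params))
```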
Neural Networks in Practice
Designing Neural Network Architectures
Number of Hidden Layers

[Figure: a network with inputs x1 … xn, four hidden layers (Hidden Layer 1 to Hidden Layer 4), and one output y]

o The number of hidden layers depends on the problem
o You're essentially trying to design the best network architecture: not too big, not too small, just right
o Generally, 1-5 hidden layers will serve you well for most problems (see the sketch below)
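For example, a minimal Keras sketch (assuming TensorFlow/Keras is installed; the feature count and layer width below are placeholder values, not from the slides) that builds a network with a chosen number of hidden layers might look like this:

```python
from tensorflow import keras

n_features = 10        # hypothetical number of inputs
n_hidden_layers = 3    # within the 1-5 range suggested above

layers = [keras.Input(shape=(n_features,))]
for _ in range(n_hidden_layers):
    layers.append(keras.layers.Dense(32, activation="relu"))
layers.append(keras.layers.Dense(1))   # single output neuron

model = keras.Sequential(layers)
model.compile(optimizer="adam", loss="mse")
model.summary()
```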
Designing Neural Network Architectures
Neurons per Hidden Layer

[Figure: a network with inputs x1 … xn, Hidden Layer 1 (h1 … hn), Hidden Layer 2 (v1, v2), and one output y]

o In general, using the same number of neurons for all hidden layers will suffice
o For some datasets, having a large first layer and following it up with smaller layers will lead to better performance, as the first layer can learn a lot of lower-level features that feed into higher-order features in subsequent layers (see the sketch below)
o Usually, you will get more of a performance boost from adding more layers than from adding more neurons in each layer
o Remember: when choosing the number of layers/neurons, if the number is too small, your network will not be able to learn the underlying patterns in your data and will thus be useless
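As an illustration of these width choices, here is a sketch using scikit-learn's MLPClassifier (an assumption; the course does not prescribe a library, and the layer widths are arbitrary examples):

```python
from sklearn.neural_network import MLPClassifier

# Same number of neurons in every hidden layer - a reasonable default
mlp_uniform = MLPClassifier(hidden_layer_sizes=(64, 64, 64), activation="relu")

# Large first layer followed by smaller layers, as described above
mlp_tapered = MLPClassifier(hidden_layer_sizes=(256, 64, 16), activation="relu")
```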
Designing Neural Network Architectures
Activation Functions

Sigmoid
• Towards the ends of the function, the y values react very little to changes in x
• The derivative values in these regions are very small and converge to 0; this is called the vanishing gradient, and learning becomes minimal
• When slow learning occurs, the optimization algorithm cannot get maximum performance from the neural network model

Tanh
• Its derivative is steeper than that of the sigmoid
• It is more efficient because it has a wider output range, [-1, 1], which allows faster learning
• The problem of vanishing gradients at the ends of the function remains

ReLU
• It doesn't saturate
• It converges faster than some other activation functions (sigmoid and tanh)
• It is the most commonly used activation function, because of its simplicity during the optimization process (backpropagation)
• It is not computationally expensive

Leaky ReLU
• Can be used as an improvement over the ReLU activation function
• It has all the properties of ReLU, and it overcomes the ReLU drawbacks
• Unlike ReLU, it has a hyper-parameter (α) that is defined prior to training and hence cannot be adjusted during training, so the chosen value of α might not be the most optimal one
Experimenting with Neural Network Designs
Explore the Effect of Changes

• https://playground.tensorflow.org/
• Compare the model performance by changing the activation functions
• Changing the activation function from linear to other non-linear functions adds non-linearity to the network, which makes the network strong enough to capture the relationships within the data
Training Neural Networks
Searching for the Best Parameters
By Optimizing the RMSE
Introduction to Optimization with Gradient Descent
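As a preview of what gradient descent does, here is a minimal sketch (the toy data and learning rate are invented for illustration) that fits a single weight and bias by repeatedly stepping against the gradient of the squared error, which also minimizes the RMSE:

```python
import numpy as np

# Toy data (hypothetical): roughly y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w, b = 0.0, 0.0   # parameters to learn
lr = 0.01         # learning rate

for step in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

rmse = np.sqrt(np.mean(((w * x + b) - y) ** 2))
print(w, b, rmse)   # w and b approach the best-fit line; rmse is small
```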
