
Artificial Neural Networks

https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/

https://www.simplilearn.com/tutorials/deep-learning-tutorial/perceptron
Neurons and History of ANN
• Artificial Neural Network Fundamentals
• Artificial neural networks (ANNs) describe a specific class of machine learning
algorithms designed to acquire their own knowledge by extracting useful
patterns from data.
• ANNs are function approximators, mapping inputs to outputs, and are
composed of many interconnected computational units, called neurons.
• Each individual neuron possesses little intrinsic approximation capability;
however, when many neurons function cohesively together, their combined
effects show remarkable learning performance.
Biological Model

• ANNs are engineered computational models inspired by the (human & animal) brain.
• While some researchers used ANNs to study animal brains, most
researchers view neural networks as being inspired by, not models of,
neurological systems.
• Figure 1 shows the basic functional unit of the brain, a biological neuron.

What is a Neural Network?
• The term ‘neural’ originates from the basic functional unit of the human (animal) nervous system: the ‘neuron’, or nerve cell, found in the brain and other parts of the human (animal) body.
• A neural network is a group of algorithms that identifies the underlying relationships in a set of data, in a way similar to the human brain.
• Neural networks adapt to changing input, so the network gives the best result without redesigning the output procedure.
Working of a Biological Neuron
• Parts of Neuron and their Functions
• The typical nerve cell of the human brain comprises four parts –
• Function of Dendrite
• It receives signals from other neurons.
• Soma (cell body)
• It sums all the incoming signals to generate input.
• Axon Structure
• When the sum reaches a threshold value, the neuron fires, and the signal travels
down the axon to the other neurons.
• Synapses Working
• The point of interconnection of one neuron with other neurons. The amount of
signal transmitted depends upon the strength (synaptic weights) of the connections.
Artificial Neural Networks (ANN) and Biological Neural Networks (BNN) - Difference

• ANN: It is short for Artificial Neural Network.
  BNN: It is short for Biological Neural Network.
• ANN: Processing speed is fast compared to the Biological Neural Network.
  BNN: Slow in processing information.
• ANN: Allocation of storage to a new process is strictly irreplaceable, as the old location is saved for the previous process.
  BNN: Allocation of storage to a new process is easy, as it is added just by adjusting the interconnection strengths.
• ANN: Processes operate in sequential mode.
  BNN: Processes can operate in massively parallel fashion.
• ANN: If any information gets corrupted in memory, it cannot be retrieved.
  BNN: Information is distributed throughout the network into sub-nodes, so even if it gets corrupted it can be retrieved.
• ANN: The activities are continuously monitored by a control unit.
  BNN: There is no control unit to monitor the information being processed in the network.
Artificial Neural Network (ANN) and Biological Neural Network (BNN) - Comparison

• The Biological Neural Network's dendrites are analogous to the weighted inputs, based on their synaptic interconnections, in the Artificial Neural Network.
• The cell body is comparable to the artificial neuron unit in the Artificial Neural Network, comprising a summation and a threshold unit.
• The axon carries the output, which is analogous to the output unit in the case of an Artificial Neural Network. So, an ANN is modeled on the working of basic biological neurons.
APPLICATIONS OF NEURAL NETWORKS

• Neural networks can perform the following tasks:


• Text Classification and Categorization
• Identify faces
• Recognize speech
• Read handwritten text
• Control robots
How Does an Artificial Neural Network Work?
Nodes
• As mentioned previously, biological neurons are connected in hierarchical networks, with the outputs of some neurons being the inputs to others.
• We can represent these networks as connected layers of nodes.
• Each node takes multiple weighted inputs, applies an activation function to the summation of these inputs, and in doing so generates an output.
• We will break this down further, but to help things along, consider the diagram below:

• Figure 2. Node with inputs


What is Weight (Artificial Neural Network)?

• Weight is the parameter within a neural network that transforms input data within the network's hidden
layers.
• A neural network is a series of nodes, or neurons. Within each node is a set of inputs, weights, and a bias value.
• As an input enters the node, it gets multiplied by a weight value and the resulting output is either observed, or
passed to the next layer in the neural network.
• Often the weights of a neural network are contained within the hidden layers of the network.
• It is helpful to imagine a theoretical neural network to understand how weights work. Within a neural
network there's an input layer, that takes the input signals and passes them to the next layer.
• Next, the neural network contains a series of hidden layers which apply transformations to the input
data. It is within the nodes of the hidden layers that the weights are applied.
• For example, a single node may take the input data and multiply it by an assigned weight value, then add a bias before passing the data to the next layer (a minimal sketch of this computation is shown below).
• The final layer of the neural network is also known as the output layer. The output layer often tunes the inputs from the hidden layers to produce the desired numbers in a specified range.
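As a concrete illustration, here is a minimal Python sketch of that single-node computation; the input, weight, and bias values are made up for the example:

```python
# Minimal sketch of one node: weighted inputs plus a bias (hypothetical values).
inputs = [0.5, 0.3, 0.2]    # signals arriving at the node
weights = [0.4, 0.7, 0.2]   # one weight per input, learned during training
bias = 0.1                  # constant offset added to the weighted sum

# Multiply each input by its weight, sum the results, then add the bias.
weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
print(weighted_sum)         # this value would next be passed to an activation function
```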
The Bias
• Biases, which are constant, are an additional input into the next
layer that will always have the value of 1.
• Bias units are not influenced by the previous layer (they do not have
any incoming connections) but they do have outgoing connections
with their own weights.
• The bias unit guarantees that even when all the inputs are zero, there will still be an activation in the neuron.
• Every neuron in the hidden layers is associated with a bias term.
• The bias term helps us control the firing threshold in each neuron.
• It acts like the intercept in a linear equation (y = sum(wx) + b).
• If sum(wx) does not cross the threshold but the neuron needs to fire, the bias can be adjusted to lower that neuron's threshold and make it fire. Using bias, the network learns a richer set of patterns.
• The bias term is also considered as input though it does not come
from data
Weight vs. Bias
• Weights and bias are both learnable parameters inside the network.
• A teachable neural network will randomize both the weight and bias values before learning
initially begins.
• As training continues, both parameters are adjusted toward the desired values and the correct
output.
• The two parameters differ in the extent of their influence upon the input data.
• Simply, bias represents how far off the predictions are from their intended value.
• Biases make up the difference between the function's output and its intended output.
• A low bias suggests that the network is making more assumptions about the form of the output, whereas a high bias value makes fewer assumptions about the form of the output.
• Weights, on the other hand, can be thought of as the strength of the connection.
• Weight affects the amount of influence a change in the input will have upon the output.
• A low weight value means the input has little effect on the output, whereas a larger weight value changes the output more significantly.
How Does an Artificial Neural Network Work?
• Artificial Neural Networks can be viewed as weighted directed graphs in which artificial neurons are
nodes, and directed edges with weights are connections between neuron outputs and neuron inputs.
• The Artificial Neural Network receives information from the external world in the form of patterns and images, represented as vectors.
• These inputs are designated by the notation x(n) for n number of inputs.
• Each input is multiplied by its corresponding weights.
• Weights are the information used by the neural network to solve a problem.
• Typically weight represents the strength of the interconnection between neurons inside the Neural
Network.
• The weighted inputs are all summed up inside the computing unit (artificial neuron).
• In case the weighted sum is zero, a bias is added to make the output non-zero, or to scale up the system response. The bias has its own weight, and its input is always equal to 1.
• The sum can correspond to any numerical value, from 0 up to infinity.
• To limit the response so that it arrives at the desired value, a threshold value is set.
• For this, the sum is passed through an activation function.
• The activation function is the transfer function used to obtain the desired output. There are linear as well as nonlinear activation functions.
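In equation form, the computation just described for a single artificial neuron with inputs x_1, ..., x_n, weights w_1, ..., w_n, bias b, and activation function φ is usually written as:

$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$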
Types of Neural Networks in Artificial Intelligence
• Based on the connection pattern — Types: Feedforward, Recurrent
  Feedforward - the graph has no loops.
  Recurrent - loops occur because of feedback connections.
• Based on the number of hidden layers — Types: Single-layer, Multi-layer
  Single layer - having one hidden layer, e.g., the single perceptron.
  Multilayer - having multiple hidden layers, e.g., the multilayer perceptron.
• Based on the nature of weights — Types: Fixed, Adaptive
  Fixed - weights are fixed a priori and not changed at all.
  Adaptive - weights are updated and change during training.
• Based on the memory unit — Types: Static, Dynamic
  Static - memoryless unit; the current output depends only on the current input, e.g., a feedforward network.
  Dynamic - memory unit; the output depends on the current input as well as the previous output, e.g., a recurrent neural network.
Neural Network Architecture Types
• Perceptron Model in Neural Networks
The perceptron model has two input units and one output unit with no hidden layers. It is also known as a ‘single-layer perceptron.’
• Radial Basis Function Neural Network
• These networks are similar to the feed-forward Neural Network, except that a radial basis function is used as the activation function of these neurons.
• Multilayer Perceptron Neural Network
• These networks use more than one hidden layer of neurons, unlike single-layer perceptron. These are also known as Deep Feedforward
Neural Networks.
• Recurrent Neural Network
• Type of Neural Network in which hidden layer neurons have self-connections. Recurrent Neural Networks possess memory. At any instance,
the hidden layer neuron receives activation from the lower layer and its previous activation value.
• Long Short-Term Memory Neural Network (LSTM)
• The type of Neural Network in which memory cell is incorporated into hidden layer neurons is called LSTM network.
• Hopfield Network
• A fully interconnected network of neurons in which each neuron is connected to every other neuron. The network is trained with input
patterns by setting a value of neurons to the desired pattern. Then its weights are computed. The weights are not changed. Once trained for
one or more patterns, the network will converge to the learned patterns. It is different from other Neural Networks.
• Boltzmann Machine Neural Network
• These networks are similar to the Hopfield network, except that some neurons are input neurons while others are hidden. The weights are initialized randomly and learned through a stochastic learning procedure.
Learning Techniques in Neural Networks
• Supervised Learning
• In this learning, the training data is input to the network, the desired output is known, and the weights are adjusted until the output yields the desired value.
• Unsupervised Learning
• The input data is used to train the network, whose desired output is not known. The network classifies the input data and adjusts the weights by extracting features from the input data.
• Reinforcement Learning
• Here, the desired output value is unknown, but the network receives feedback on whether its output is right or wrong. It is a form of semi-supervised learning.
Artificial Neural Network Architecture
• A typical Neural Network contains a large number of artificial neurons, called units, arranged in a series of layers. A typical Artificial Neural Network comprises the following layers –
•Input layer - It contains those units (Artificial Neurons) which
receive input from the outside world on which the network will
learn, recognize about, or otherwise process.
•Output layer - It contains units that give the network's response, based on the information it has learned for the task.
•Hidden layer - These units are in between input and output
layers. The hidden layer's job is to transform the input into
something that the output unit can use somehow.

In fully connected neural networks, each hidden neuron links completely to every neuron in the previous (input) layer and in the next (output) layer.
What are the 4 Different Techniques of Neural Networks?
• Classification Neural Network
• A Neural Network can be trained to classify a given pattern or dataset into a predefined class.
It uses Feedforward Networks.
• Prediction Neural Network
• A Neural Network can be trained to produce outputs that are expected from a given input.
E.g., - Stock market prediction.
• Clustering Neural Network
• The Neural Network can identify unique features of the data and classify them into different categories without any prior knowledge of the data. The following networks are used for clustering: Competitive networks, Adaptive Resonance Theory networks, and Kohonen Self-Organizing Maps.
• Association Neural Network
• The Neural Network is trained to remember a particular pattern. When a noisy pattern is presented to the network, the network associates it with the closest pattern in its memory or discards it. E.g., Hopfield Networks, which perform recognition, classification, clustering, etc.
Neural Networks for Pattern Recognition
• Pattern recognition is the study of how machines can observe the
environment, learn to distinguish patterns of interest from their background,
and make sound and reasonable decisions about the patterns' categories.
• Some examples of patterns are fingerprint images, a handwritten word, a human face, and a speech signal. Given an input pattern, its recognition involves the following tasks –
• Supervised classification - the input pattern is identified as a member of a predefined class.
• Unsupervised classification - the pattern is assigned to a hitherto unknown class.
• So, the recognition problem here is essentially a classification or categorization task. The design of pattern recognition systems usually involves the following three aspects:
• Data acquisition and preprocessing
• Data representation
• Decision making
Artificial Neural Network (ANN)
• Artificial Neural Networks (ANNs) are composed of multiple nodes. These nodes act like the biological neurons of the human brain.
• The neurons are linked together and they interact with each other.
• The nodes can take input data and perform simple operations on the basis of
that data.
• The result of these operations is passed to other neurons. The output at each
node is called its activation or node value.
• Each link contains its own weight.
• ANNs are capable of learning, which takes place by altering weight values.
Single-layer Neural Networks (Perceptrons)
To build up towards the (useful) multi-layer Neural Networks, we will start by considering the (not really useful) single-layer Neural Network. This is called a Perceptron.

The Perceptron
Input is multi-dimensional (i.e. input can be a vector):

input x = ( I1, I2, .., In)

Input nodes (or units) are connected (typically fully) to a node (or multiple nodes) in the next layer.

A node in the next layer takes a weighted sum of all its inputs:

Summed input = ∑ᵢ wᵢ Iᵢ
Example

input x = ( I1, I2, I3) = ( 5, 3.2, 0.1 )

Summed input = ∑ᵢ wᵢ Iᵢ = 5·w1 + 3.2·w2 + 0.1·w3
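A quick Python check of the same summed input, using made-up weight values:

```python
# Worked example of the summed input (the weights here are hypothetical).
x = [5, 3.2, 0.1]      # input vector (I1, I2, I3)
w = [0.2, 0.5, 0.9]    # example weights w1, w2, w3
summed_input = sum(xi * wi for xi, wi in zip(x, w))
print(summed_input)    # 5*0.2 + 3.2*0.5 + 0.1*0.9 = 2.69
```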


Perceptron

• Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the
original MCP neuron. A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm
enables neurons to learn and processes elements in the training set one at a time.
• Perceptron has the following characteristics:
• Perceptron is an algorithm for Supervised Learning of single layer binary linear classifiers.
• Optimal weight coefficients are automatically learned.
• Weights are multiplied with the input features and decision is made if the neuron is fired or not.
• Activation function applies a step rule to check if the output of the weighting function is greater than zero.
• Linear decision boundary is drawn enabling the distinction between the two linearly separable classes +1 and -1.
• If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise, there is no output.
• https://www.computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html
Types of Perceptron
• There are two types of Perceptron: Single layer and Multilayer.
• Single layer - Single layer perceptron can learn only linearly separable patterns
• Multilayer - Multilayer perceptrons, or feed-forward neural networks with two or more layers, have greater processing power.
• The Perceptron algorithm learns the weights for the input signals in order to draw a linear
decision boundary.
• This enables you to distinguish between the two linearly separable classes +1 and -1.
• Perceptron Learning Rule
• Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients. The input features are then multiplied with these weights to determine if a
neuron fires or not.
Perceptron Function
• A Perceptron is a function that maps its input “x,” multiplied by the learned weight coefficients, to an output value “f(x).”
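The equation itself appears as an image in the original slide; a standard form of the perceptron function, consistent with the variable definitions listed below, is:

$$f(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} w_i x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$$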

• In the equation given above:


• “w” = vector of real-valued weights
• “b” = bias (an element that adjusts the boundary away from origin without any
dependence on the input value)
• “x” = vector of input x values

• “m” = number of inputs to the Perceptron


• The output can be represented as “1” or “0.” It can also be represented as “1” or “-1”
depending on which activation function is used.
Single-layer Perceptron or ANN
• The Perceptron rule is used to train a single-layer neural network.
• Weights are updated based on a unit step function.
Inputs of a Perceptron

• A Perceptron accepts inputs, moderates them with certain weight values, then applies the transformation function to output the final result.
• A Boolean output is based on inputs such as salaried, married, age,
past credit profile, etc. It has only two values: Yes and No or True and
False. The summation function “∑” multiplies all inputs of “x” by weights “w” and then adds them up as follows: ∑ wᵢxᵢ
Activation Functions of Perceptron

• The activation function applies a step rule (convert the numerical output into +1 or
-1) to check if the output of the weighting function is greater than zero or not.
• For example:
• If ∑ wᵢxᵢ > 0, then final output “o” = 1 (issue bank loan)
• Else, final output “o” = -1 (deny bank loan)
• Step function gets triggered above a certain value of the neuron output; else it
outputs zero.
• Sign Function outputs +1 or -1 depending on whether neuron output is greater than
zero or not.
• Sigmoid is the S-curve and outputs a value between 0 and 1.
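A minimal Python sketch of these three activation functions (the step-function threshold value is illustrative):

```python
import math

def step(z, threshold=0.0):
    """Step function: fires (1) when z exceeds the threshold, otherwise 0."""
    return 1 if z > threshold else 0

def sign(z):
    """Sign function: +1 if the neuron output is greater than zero, otherwise -1."""
    return 1 if z > 0 else -1

def sigmoid(z):
    """Sigmoid (S-curve): squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(step(0.7), sign(-2.3), sigmoid(0.0))   # -> 1 -1 0.5
```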
Single-layer Perceptrons can learn only linearly separable patterns. For classification, we use the activation function as a threshold to predict the class. For regression, we do not need the activation function (thresholding), or we can use a linear function to predict a continuous value.

The input is typically a feature vector x multiplied by weights w and added to a bias b: y = φ(w · x + b),

where w denotes the vector of weights, x is the vector of inputs, b is the bias, and φ is the non-linear activation function.

For weight updating, the perceptron learns through backpropagation; we will see that in detail in a later section.
Output:

• The figure shows how the decision function squashes wᵀx to either +1 or -1, and how it can be used to discriminate between two linearly separable classes.
Multi-layer Perceptron / Multilayer Neural Network
• A fully connected multi-layer neural network is called a Multilayer Perceptron
(MLP).
• It has 3 layers including one hidden layer. If it has more than 1 hidden layer, it
is called a deep ANN.
• An MLP is a typical example of a feed forward artificial neural network.
• The number of layers and the number of neurons are referred to as hyper
parameters of a neural network, and these need tuning.
• The weight adjustment training is done via back propagation. Deeper neural
networks are better at processing data.
• However, deeper layers can lead to vanishing gradient problems. Special
algorithms are required to solve this issue.
Multi-layer Perceptron (MLP)
Putting together the structure
• Hopefully the previous explanations have given you a good overview of how a
given node/neuron/perceptron in a neural network operates.
• However, as you are probably aware, there are many such interconnected
nodes in a fully fledged neural network.
• These structures can come in many different forms, but the most common simple neural network structure consists of an input layer, a hidden layer, and an output layer.
• An example of such a structure can be seen below:

Figure: Three-layer neural network


Putting together the structure
• The three layers of the network can be seen in the above figure – Layer 1
represents the input layer, where the external input data enters the network.
• Layer 2 is called the hidden layer as this layer is not part of the input or
output.
• Note: neural networks can have many hidden layers, but in this case for
simplicity I have just included one.
• Finally, Layer 3 is the output layer. You can observe the many connections
between the layers, in particular between Layer 1 (L1) and Layer 2 (L2).
• As can be seen, each node in L1 has a connection to all the nodes in L2.
Likewise for the nodes in L2 to the single output node L3.
• Each of these connections will have an associated weight.
Notation
The feed-forward pass
• To demonstrate how to calculate the output from the input in neural networks,
let’s start with the specific case of the three layer neural network that was
presented above.
• Below it is presented in equation form, then it will be demonstrated with a
concrete example and some Python code:
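The original slide's equations and code are not reproduced here; below is a minimal sketch, assuming sigmoid activations and made-up example weights and biases, of how the feed-forward calculation for the three-layer network above could be written in Python:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 (input) has 3 nodes, Layer 2 (hidden) has 3 nodes, Layer 3 (output) has 1 node.
# W1[i, j] is the weight from input node j to hidden node i; b1 holds the hidden-layer biases.
W1 = np.array([[0.2, 0.2, 0.2],
               [0.4, 0.4, 0.4],
               [0.6, 0.6, 0.6]])
b1 = np.array([0.8, 0.8, 0.8])

# W2 connects the 3 hidden nodes to the single output node; b2 is the output bias.
W2 = np.array([[0.5, 0.5, 0.5]])
b2 = np.array([0.2])

def feed_forward(x):
    # Hidden-layer activations: weighted sum of the inputs plus bias, through the sigmoid.
    h = sigmoid(W1 @ x + b1)
    # Output-layer activation: weighted sum of the hidden activations plus bias.
    return sigmoid(W2 @ h + b2)

x = np.array([1.5, 2.0, 3.0])   # an example input vector
print(feed_forward(x))
```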
ARTIFICIAL NEURAL NETWORK: BUILDING BLOCKS

The processing of an ANN depends upon the following three building blocks:
• Network Topology
• Adjustments of Weights or Learning
• Activation Function
Network Topology
• Types of Neural Networks
• Feed Forward Neural Network
• Signals travel in one direction only, from input to output, in a feed-forward neural network. There is no feedback or loops. The output of any layer does not affect that same layer in such networks. Feed-forward neural networks are straightforward networks that associate inputs with outputs. They have fixed inputs and outputs. They are mostly used in pattern generation, pattern recognition, and classification.
Network Topology
• Feedback Neural Network
• Signals can travel in both directions in feedback neural networks. Feedback neural networks are very powerful and can get very complicated. Feedback neural networks are dynamic: the ‘state’ in such a network keeps changing until it reaches an equilibrium point. It remains at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback neural network architecture is also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organisations. Feedback loops are allowed in such networks. They are used in content-addressable memories.
The structure of an ANN - The artificial neuron
• The biological neuron is simulated in an ANN by an activation function. In
classification tasks (e.g. identifying spam e-mails) this activation function has to
have a “switch on” characteristic – in other words, once the input is greater than
a certain value, the output should change state i.e. from 0 to 1, from -1 to 1 or
from 0 to >0.
• This simulates the “turning on” of a biological neuron.
• A common activation function that is used is the sigmoid function
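For reference, the sigmoid function has the standard form

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

which rises smoothly from values near 0 for large negative inputs to values near 1 for large positive inputs, giving the “switch on” behaviour described above.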
ACTIVATION FUNCTIONS
• Activation functions are mathematical equations that determine the output of a
neural network.
• The function is attached to each neuron in the network, and determines whether it
should be activated (“fired”) or not, based on whether each neuron’s input is relevant
for the model’s prediction.
• Activation functions also help normalize the output of each neuron to a range between 0 and 1 or between -1 and 1.
• An additional aspect of activation functions is that they must be computationally
efficient because they are calculated across thousands or even millions of neurons for
each data sample.
• Modern neural networks use a technique called back propagation to train the model,
which places an increased computational strain on the activation function, and its
derivative function.
ACTIVATION FUNCTIONS
• Sigmoid / Logistic
• Smooth gradient, preventing “jumps” in output values. Output values are bound between 0 and 1, normalizing the output of each neuron.
• Clear predictions: for X above 2 or below -2, the Y value (the prediction) tends to the edge of the curve, very close to 1 or 0. This enables clear predictions.
• Disadvantages
• Vanishing gradient: for very high or very low values of X, there is almost no
change to the prediction, causing a vanishing gradient problem.
• This can result in the network refusing to learn further, or being too slow to
reach an accurate prediction.
• Outputs are not zero-centered, and it is computationally expensive.
ACTIVATION FUNCTIONS
• TanH / HYPERBOLIC TANGENT
• Advantages: Zero-centered, making it easier to model inputs that have strongly negative, neutral, and strongly positive values.
• Otherwise like the Sigmoid function.
• Disadvantages: Like the Sigmoid function
Simplest Neural Network
• This simplest NN model contains only one neuron. We can treat a neuron (node) as a logistic unit with a Sigmoid (logistic) activation function, which outputs a value computed by the sigmoid activation function.
• In NN terminology, the parameters θ are called ‘weights’.
• Depending on the problems, you can decide whether to use the bias units or not.
• Neural Network (NN)
• Layer 1 is called Input Layer that inputs features.
• Last Layer is called Output Layer that outputs the final value computed by hypothesis H.
• The layer between the Input Layer and the Output Layer is called the Hidden Layer, which is a block in which we group neurons together.
Implementing Logic Gates using Neural Networks
• The input values, i.e., x1, x2, and 1, are multiplied by their respective weights, W1, W2, and W0. The resulting values are then fed to the summation neuron, where we obtain the summed value W1·x1 + W2·x2 + W0.
• Now, this value is fed to a neuron which has a non-linear function (sigmoid in our case) for scaling the output to a desirable range.
• The scaled output of sigmoid is 0 if the output is less than 0.5 and 1 if the
output is greater than 0.5.
• Our main aim is to find the value of weights or the weight vector which will
enable the system to act as a particular gate.
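A minimal Python sketch of such a gate neuron, following the description above (the function name and structure are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_neuron(x1, x2, w1, w2, w0):
    """Weighted sum of (x1, x2, 1), squashed by sigmoid, then thresholded at 0.5."""
    z = w1 * x1 + w2 * x2 + w0 * 1   # the constant input 1 carries the bias weight W0
    return 1 if sigmoid(z) > 0.5 else 0
```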
Implementing AND gate

• The AND gate operation is a simple multiplication operation between the inputs. If any of the inputs is 0, the output is 0. In order to achieve 1 as the output, both inputs should be 1. The truth table below conveys the same information.
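One working choice of weights that makes the gate neuron sketched above behave as an AND gate; since sigmoid(z) > 0.5 exactly when z > 0, any weights for which only the (1,1) input makes the sum positive would also work:

```python
# AND truth table: (0,0)->0, (0,1)->0, (1,0)->0, (1,1)->1
for x1 in (0, 1):
    for x2 in (0, 1):
        # w1 = 1, w2 = 1, w0 = -1.5: the weighted sum exceeds 0 only when both inputs are 1.
        print(x1, x2, gate_neuron(x1, x2, w1=1, w2=1, w0=-1.5))
```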
Moving on to XOR gate
• For the XOR gate, the truth table on the left side of the image below shows that only if the two inputs are complementary is the output 1. If the inputs are the same (0,0 or 1,1), then the output is 0. When plotted in the x-y plane on the right, the points show that they are not linearly separable.
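Because no single line separates the XOR outputs, one neuron is not enough. A standard two-layer construction combines an OR gate and a NAND gate in a hidden layer with an AND gate at the output; a sketch reusing the gate_neuron function from above (the weight values are again just one working choice):

```python
def xor_gate(x1, x2):
    # Hidden layer: OR and NAND, each a single gate neuron.
    or_out = gate_neuron(x1, x2, w1=1, w2=1, w0=-0.5)      # fires when at least one input is 1
    nand_out = gate_neuron(x1, x2, w1=-1, w2=-1, w0=1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden outputs gives XOR.
    return gate_neuron(or_out, nand_out, w1=1, w2=1, w0=-1.5)

# XOR truth table: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_gate(x1, x2))
```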
