
NEURAL NETWORKS

Neuron
• In the context of a NN, a neuron is the most fundamental unit of
processing.
• It’s also called a perceptron.
• A NN is based on the way a human brain works.
So, it simulates the way the biological neurons signal to one another.

• Apart from the living world, in the realm of Computer Science’s ANNs, a
neuron is a collection of a set of inputs, a set of weights, and an activation
function.
• It translates these inputs into a single output. Another layer of neurons picks
this output as its input and this goes on and on.
In essence, we can say that each neuron is a mathematical function that closely
simulates the functioning of a biological neuron.
History of NNs
Key developments and milestones by time period:
• 1940s-1950s: McCulloch and Pitts propose the artificial neuron concept; development of the Perceptron for binary classification tasks.
• 1960s-1970s: Limited success, leading to a decline in interest.
• 1980s-1990s: Resurgence with advances in training algorithms, including backpropagation; introduction of the Multi-Layer Perceptron (MLP); challenges with training deep networks due to the vanishing gradient problem.
• 2000s-2010s: Breakthroughs in deep learning with CNNs and RNNs; AlexNet (2012) sets a milestone in image recognition; application of deep learning to various fields, including NLP.
• 2010s-Present: Continued advancement with GANs and Transformers; deep learning's impact on computer vision, speech recognition, autonomous systems, and more; ongoing research to explore new architectures and training techniques.
Architecture of a biological neuron: 3 basic parts:
• Cell body
• Axon (a cell extension)
• Dendrites (cell extensions)

• The nucleus in the cell body controls the cell's functioning.
• The axon extension (a long tail) transmits messages from the cell.
• The dendrite extensions (like tree branches) receive messages for the cell.
• Thus, biological neurons communicate with each other by sending chemicals,
called neurotransmitters, across a tiny space, called a synapse, b/w the axons
and dendrites of adjacent neurons.
Neural Network(NN)
• NNs are a series of algorithms that mimic the operations of the human
brain to recognize relationships b/w vast amounts of data. In this
sense, NNs refer to systems of neurons, either organic or artificial in
nature.
• They reflect the behavior of the human brain, allowing computer programs
to recognize patterns and solve common problems in the fields of AI,
ML and DL.
• They are used in a variety of applications in financial services, from
forecasting and marketing research to fraud detection and risk
assessment.
• NNs with several process layers are known as "deep" networks and
are used for DL algorithms.
Pros and Cons of NNs
Pros:
• Can often work more efficiently and for longer than humans
• Ability to handle complex data
• Can be programmed to learn from prior outcomes to strive to make smarter future calculations
• Are continually being expanded in new fields with more difficult problems
• When an element of the NN fails, its parallel nature can continue without any problem
• Robustness to noisy or incomplete data
• Can handle high-dimensional data
• Can uncover hidden patterns and insights

Cons:
• Still rely on hardware that may require labour and expertise to maintain
• Need for large amounts of labeled training data
• May take long periods of time to develop the code and algorithms
• Usually report an estimated range or estimated amount that may not actualize
• Need training to operate
• Lack of transparency in decision-making
• Sensitivity to input data quality and preprocessing
• Ethical considerations in sensitive decision-making
Artificial Neural Networks (ANNs)
NNs, also known as Artificial Neural Networks (ANNs) or simulated neural networks (SNNs),
are a subset of ML and are at the heart of DL algorithms.
• ANNs are composed of node layers: an input layer, one or more
hidden layers, and an output layer.
• Each node, or artificial neuron, connects to another and has an associated
weight and threshold.
• If the output of any individual node is above the specified threshold value, that
node is activated, sending data to the next layer of the network. Otherwise, no
data is passed along to the next layer of the network.
• NNs can adapt to changing input; so the network generates the best possible
result without needing to redesign the output criteria.
• The concept of NN, which has its roots in AI, is swiftly gaining popularity in the
development of trading systems.
Components of an ANN
An input layer: The first layer, which accepts input data and transmits it to the layers below.
One or more hidden layers: Layers that process data via interconnected neurons b/w the input and output layers.
An output layer: The layer at the end of the network that generates predictions or outputs.
Neurons: The fundamental processing units; they take in inputs, apply weights and biases, and then, through an activation function, produce an output (see the sketch after this list).
Weights: Parameters that control the intensity of connectivity between neurons and thus the direction of information flow.
Biases: Additional neuronal parameters that adjust the network's behavior and shift the activation function.
Activation function: A non-linear function applied to the weighted sum of inputs in each neuron, introducing non-linearity to the network.
Forward propagation: The method of creating predictions by transferring input data from the input layer to the output layer through the network.
Backpropagation: The process of computing gradients of the error with respect to weights and biases to adjust them during training.
Loss function: A function that gauges the performance of the network by calculating the difference between predicted results and actual labels.
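To make the weighted-sum-plus-activation idea concrete, here is a minimal sketch of a single artificial neuron in Python. The input values, weights, and bias are illustrative assumptions, not values from any trained network:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # weighted sum of inputs, shifted by the bias ...
    z = np.dot(weights, inputs) + bias
    # ... passed through a non-linear activation function
    return sigmoid(z)

x = np.array([0.5, 0.3, 0.2])    # example inputs (assumed values)
w = np.array([0.4, 0.7, -0.2])   # example weights (assumed values)
b = 0.1                          # example bias (assumed value)
print(neuron(x, w, b))           # a single output in (0, 1)
```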
The Architecture of Artificial Neural Networks
A simple ANN consists of:
• Input layer: It contains those units (artificial neurons) which receive input
from the outside world, on which the network will learn, recognize, or
otherwise process.
• Hidden layer: These units sit in between the input and output layers. The
hidden layer's job is to transform the input into something that the output
units can use.
• Output layer: It contains units that produce the network's response to the
task it has learned.
Artificial NN vs Biological NN
• Origin: ANN is designed and developed by humans; BNN is found in living organisms.
• Structure: ANN is composed of layers of interconnected artificial neurons (nodes); BNN comprises interconnected biological neurons.
• Complexity: ANN can have complex architectures and layers; BNN varies in complexity across species.
• Learning Mechanism: ANN learns through backpropagation and training; BNN learns through adaptation and experience.
• Speed: ANN can perform computations quickly; BNN is slower due to chemical and biological processes.
• Scalability: ANN can be scaled up or down as needed; BNN is limited by biological constraints.
• Processing Power: ANN can process vast amounts of data rapidly; BNN's processing power varies across organisms.
• Fault Tolerance: ANN is resistant to noise and incomplete data; BNN is susceptible to noise and errors.
Applications of Neural Networks:

Top applications of NNs:
• Neural networks for ML
• Face recognition
• Neuro-fuzzy models and their applications
• NNs for data-intensive applications
General overview of how ANNs work

1. Data Preparation
2. Network Architecture Design
3. Data Pre-processing
4. Training the ANN
5. Validation and Tuning
6. Deployment and Testing
7. Decision-Making and Insights


Characteristics of ANN:

1. Non-linearity: capture complex non-linear relationships b/w input
variables and output predictions.
2. Adaptability and Learning: learn from data and adapt their internal
parameters (weights and biases) to optimize performance.
3. Parallel Processing: can process information in parallel across multiple
neurons and layers.
4. Robustness: can handle noisy or incomplete data, making them resilient
to certain imperfections.
5. Feature Extraction: can automatically extract relevant features from the
input data.
Types of NN

1. Perceptron model
• Proposed by Frank Rosenblatt (and later famously analyzed by Minsky and
Papert), it is one of the simplest and oldest models of the neuron.
• The smallest unit of a NN; it does certain computations to detect
features or business intelligence in the input data.
• It accepts weighted inputs and applies the activation function to
obtain the output as the final result.
• Also known as a TLU (threshold logic unit).
• It is a supervised learning algorithm that classifies the data into two
categories; thus it is a binary classifier.
• A perceptron separates the input space into 2 categories
by a hyperplane represented as w · x + b = 0, where w is the weight vector,
x is the input vector, and b is the bias.

Advantages:
• Can implement logic gates like AND, OR, or NAND (see the sketch below).

Disadvantages:
• Can only learn linearly separable problems, such as the boolean AND
problem. For non-linear problems, such as the boolean XOR problem, it
does not work.
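As an illustration of the points above, here is a minimal sketch of the perceptron learning rule trained on the linearly separable boolean AND problem. The learning rate and epoch count are arbitrary assumptions:

```python
import numpy as np

# Boolean AND: inputs and target labels (0 or 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias (threshold)
lr = 0.1          # learning rate (assumed)

for epoch in range(10):               # a few passes over the data suffice here
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0   # threshold activation
        error = target - pred
        w += lr * error * xi          # perceptron learning rule
        b += lr * error

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```

Running the same loop on XOR targets never converges, which is exactly the linear-separability limitation noted above.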
2. Feed Forward NNs
• The simplest form of NNs, where input data travels in one direction only, passing through
artificial neural nodes and exiting through output nodes.
• Hidden layers may or may not be present; input and output layers are always present.
• Based on this, they can be further classified as single-layered or multi-layered feed-
forward NNs.
• The number of layers depends on the complexity of the function. It has uni-directional
forward propagation but no backward propagation.
• Weights are static here.
• An activation function is fed by inputs which are multiplied by weights.
For example, with a simple threshold activation: the neuron is activated and produces 1 as
an output if its value is above the threshold (usually 0); if it is below the threshold, the
neuron is not activated, which is output as -1 (see the sketch below).
• They are fairly simple to maintain and are equipped to deal with data which contains a
lot of noise.
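A minimal sketch of a single forward pass with the ±1 threshold activation described above; the layer sizes and the fixed ("static") weight values are assumptions for illustration:

```python
import numpy as np

def threshold(z):
    # +1 if above the threshold of 0, otherwise -1
    return np.where(z > 0, 1, -1)

# assumed static weights for a tiny 3-input, 2-output layer
W = np.array([[0.5, -0.2, 0.1],
              [-0.3, 0.8, 0.4]])
b = np.array([0.0, -0.1])

x = np.array([1.0, 0.5, -1.0])   # example input
out = threshold(W @ x + b)       # one uni-directional pass, no feedback
print(out)                       # -> [ 1 -1 ]
```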
Advantages:
• Less complex, easy to design & maintain
• Fast and speedy [one-way propagation]
• Highly responsive to noisy data

Disadvantages:
• Cannot be used for DL [due to absence of dense layers and back propagation]
Applications:
• Simple classification (where traditional ML-based classification algorithms have limitations)
• Face recognition [simple, straightforward image processing]
• Computer vision [where target classes are difficult to classify]
• Speech recognition
3. Multilayer Perceptron
• Complex neural nets where input data travels through various layers of
artificial neurons.
• Every single node is connected to all neurons in the next layer, which makes it
a fully connected NN.
• Input and output layers are present, along with multiple hidden layers, i.e. at
least three or more layers in total.
• It has bi-directional propagation, i.e. forward propagation and backward
propagation.
• Inputs are multiplied with weights and fed to the activation function, and in
backpropagation the weights are modified to reduce the loss.
In simple words, weights are machine-learnt values from NNs. They self-adjust
depending on the difference between predicted outputs and training targets.
• Nonlinear activation functions are used, followed by softmax as an output
layer activation function (see the sketch below).
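A minimal sketch of such a fully connected MLP in Keras (assuming TensorFlow is installed; the layer sizes and the 784-dimensional input are illustrative assumptions):

```python
from tensorflow import keras

# input layer -> two dense (fully connected) hidden layers -> softmax output
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),             # e.g. a flattened 28x28 image
    keras.layers.Dense(128, activation="relu"),   # hidden layer 1 (nonlinear)
    keras.layers.Dense(64, activation="relu"),    # hidden layer 2 (nonlinear)
    keras.layers.Dense(10, activation="softmax")  # output layer: class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```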
Applications:
• Speech recognition
• Machine translation
• Complex classification

Advantages:
• Used for DL [due to the presence of dense fully connected layers and back propagation]

Disadvantages:
• Comparatively complex to design and maintain
• Comparatively slow (depends on number of hidden layers)
Approaches for knowledge extraction from Multilayer
Perceptrons
4. Convolutional NN (CNN)
• Contains a 3-D arrangement of neurons instead of the standard 2-D array.
• The first layer is called a convolutional layer.
• Each neuron in the convolutional layer only processes the information from a small
part of the visual field.
• Input features are taken in batch-wise, like a filter.
• The network understands the images in parts and can compute these operations
multiple times to complete the full image processing.
• Processing involves conversion of the image from RGB or HSI scale to grey-scale.
• Propagation is uni-directional where the CNN contains one or more convolutional layers
followed by pooling, and bidirectional where the output of the convolution layer goes to a
fully connected NN for classifying the images.
• Filters are used to extract certain parts of the image.
• In an MLP the inputs are multiplied with weights and fed to the activation function.
• Convolution uses ReLU (rectified linear unit), while the MLP uses a nonlinear
activation function followed by softmax (a sketch follows below).
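A minimal sketch of this convolution, pooling, then fully connected pattern in Keras (assuming TensorFlow; the 28x28 grayscale input and filter counts are illustrative assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),               # one grayscale image channel
    keras.layers.Conv2D(16, (3, 3), activation="relu"),  # convolutional layer + ReLU
    keras.layers.MaxPooling2D((2, 2)),                   # pooling layer
    keras.layers.Flatten(),                              # hand off to the dense part
    keras.layers.Dense(64, activation="relu"),           # fully connected layer
    keras.layers.Dense(10, activation="softmax")         # class probabilities
])
model.summary()
```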
Applications:
• Image processing
• Computer vision
• Speech recognition
• Machine translation

Advantages:
• Used for DL with few parameters
• Fewer parameters to learn as compared to a fully connected layer

Disadvantages:
• Comparatively complex to design and maintain
• Comparatively slow [depends on the number of hidden layers]
5. Recurrent Neural Networks (RNN)
• Designed to save the output of a layer: in an RNN the output is fed back to
the input to help in predicting the outcome of the layer.
• The first layer is typically a feed-forward NN, followed by an RNN layer
where some information from the previous time-step is remembered by a
memory function.
• Forward propagation is implemented in this case. The network stores
information required for its future use.
• If the prediction is wrong, the learning rate is employed to make small
changes, gradually moving the network towards the right prediction during
backpropagation (see the sketch below).
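A minimal sketch of the recurrence itself: each step's hidden state is computed from the current input and the previous hidden state. The shapes, random weights, and toy sequence are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights (assumed sizes)
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights (the feedback loop)
b = np.zeros(4)

h = np.zeros(4)                       # initial hidden state ("memory")
for x_t in [np.ones(3), np.zeros(3), np.ones(3)]:   # a toy input sequence
    # the previous output h is fed back in alongside the new input
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h)
```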
Applications of RNNs:
• Text processing like auto-suggest, grammar checks, etc.
• Text-to-speech processing
• Image tagger
• Sentiment analysis
• Translation
Improvement over RNN:
LSTM (Long Short-Term Memory) Networks

• LSTM networks are a type of RNN that uses special units in addition to standard units.
• LSTM units include a 'memory cell' that can maintain information in memory for long
periods of time.
• A set of gates is used to control when information enters the memory, when it is
output, and when it is forgotten.
• There are three types of gates, viz. the input gate, output gate and forget gate. The
input gate decides how much information from the current sample will be kept in
memory; the output gate regulates the amount of data passed to the next layer; and the
forget gate controls how much of the stored memory is discarded.
• This architecture lets them learn longer-term dependencies.
This is one implementation of LSTM cells; many other architectures exist (a sketch of
one cell step follows below).
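A minimal sketch of one LSTM cell step in NumPy, showing the three gates acting on the memory cell. The stacked parameter layout, sizes, and random values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters for the input (i), forget (f),
    # output (o) gates and the candidate cell state (g)
    z = W @ x + U @ h_prev + b          # all pre-activations at once
    n = h_prev.shape[0]
    i = sigmoid(z[0*n:1*n])             # input gate: how much new info to store
    f = sigmoid(z[1*n:2*n])             # forget gate: how much old memory to keep
    o = sigmoid(z[2*n:3*n])             # output gate: how much memory to expose
    g = np.tanh(z[3*n:4*n])             # candidate values for the memory cell
    c = f * c_prev + i * g              # updated memory cell
    h = o * np.tanh(c)                  # new hidden state
    return h, c

n_h, n_x = 4, 3                         # assumed sizes
rng = np.random.default_rng(1)
W = rng.normal(size=(4*n_h, n_x)) * 0.1
U = rng.normal(size=(4*n_h, n_h)) * 0.1
b = np.zeros(4*n_h)
h = c = np.zeros(n_h)
for x in [np.ones(n_x)] * 3:            # toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```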
6. Modular Neural Network
• Has a number of different networks that function independently and
perform sub-tasks.
• The different networks do not really interact with or signal each other
during the computation process.
• They work independently towards achieving the output.
• As a result, a large and complex computational process is done
significantly faster by breaking it down into independent components.
• The computation speed increases because the networks are not
interacting with or even connected to each other.
Applications of Modular NN:
• Stock market prediction systems
• Adaptive MNN for character recognition
• Compression of high-level input data

Advantages:
• Efficient
• Independent training
• Robustness

Disadvantages:
• Moving target problems
What are the Five Algorithms to Train a Neural Network?

1. Hebbian Learning Rule
2. Self-Organizing Kohonen Rule
3. Hopfield Network Law
4. LMS algorithm (Least Mean Square)
5. Competitive Learning

• The NN learns by adjusting its weights and bias (threshold) iteratively to yield
the desired output.
• These are also called free parameters.
• For learning to take place, the NN is trained first.
• Training is performed using a defined set of rules, also known as a learning
algorithm.
Training Algorithms

Gradient Descent Algorithm
• The simplest training algorithm, used in the case of a supervised training model.
• In case the actual output is different from the target output, the difference, or error, is found.
• It changes the weights of the network in such a manner as to minimize this error.

Back Propagation Algorithm
• It is an extension of the gradient-based delta learning rule.
• Here, after finding an error (the difference between desired and actual output), the error is propagated backward
from the output layer to the input layer via the hidden layer.
• It is used in the case of multi-layer neural networks (see the sketch below).
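A minimal sketch of the gradient descent weight update on a single linear neuron with a squared-error loss; the toy data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# toy supervised data: the target relationship is y = 2*x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

w = 0.0      # initial weight
lr = 0.05    # learning rate (assumed)

for step in range(100):
    y_pred = w * X                    # forward pass
    error = y_pred - y                # difference from the target output
    grad = np.mean(2 * error * X)     # d(mean squared error)/dw
    w -= lr * grad                    # change the weight to minimize the error

print(w)   # approaches 2.0
```

Backpropagation applies this same update to every layer by propagating the error gradient backward through the network.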
Learning Techniques in Neural Networks

1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
4. Offline Learning
5. Online Learning
Competitive learning
• Is a form of unsupervised learning in ANNs, in which nodes compete for
the right to respond to a subset of the input data.
• A variant of Hebbian learning, competitive learning works by increasing
the specialization of each node in the network. It is well suited to finding
clusters within data.
• Models and algorithms based on the principle of competitive learning
include vector quantization & self-organizing maps.
• In this model, there are hierarchical sets of units in the network with
inhibitory and excitatory connections.
• The excitatory connections are b/w individual layers and the inhibitory
connections are between units in layered clusters.
• Units in a cluster are either active or inactive.
There are three basic elements to a competitive learning rule:

• A set of neurons that are all the same except for some randomly
distributed synaptic weights, and which therefore respond
differently to a given set of input patterns.
• A limit imposed on the "strength" of each neuron.
• A mechanism that permits the neurons to compete for the right
to respond to a given subset of inputs, such that only one output
neuron (or only one neuron per group) is active (i.e., "on") at a
time. The neuron that wins the competition is called a "winner-
take-all" neuron (a sketch follows below).
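A minimal sketch of winner-take-all competitive learning (essentially the update used in vector quantization and self-organizing maps, without the neighborhood function): each input activates only the closest weight vector, which then specializes toward that input. The data, unit count, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))             # toy input patterns
W = rng.normal(size=(3, 2))               # 3 competing units, random initial weights
lr = 0.1                                  # learning rate (assumed)

for x in X:
    dists = np.linalg.norm(W - x, axis=1) # each unit's distance to the input
    winner = np.argmin(dists)             # only the closest unit is "on"
    W[winner] += lr * (x - W[winner])     # the winner moves toward the input

print(W)   # each row drifts toward a cluster of the inputs
```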
Supervised Learning vs. Unsupervised Learning
Building an Image Classification Model with ANN
• First, we need to load a dataset.
• In this image classification model we will tackle Fashion MNIST.
• It has a format of 60,000 grayscale images of 28 x 28 pixels each, with 10 classes. Let's
import some necessary libraries to start with this task (a sketch follows below).
The MNIST database (Modified National Institute of Standards and Technology DB) is a large
DB of handwritten digits that is commonly used for training various image processing
systems.
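A minimal sketch of the imports and dataset loading, assuming TensorFlow/Keras is available (Fashion MNIST ships with Keras):

```python
import numpy as np
from tensorflow import keras

# Fashion MNIST: 60,000 training + 10,000 test grayscale 28x28 images, 10 classes
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

X_train = X_train / 255.0   # scale pixel values to [0, 1]
X_test = X_test / 255.0

print(X_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)
```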
Step 1: Data Preparation
Example: Imagine we have a dataset of hand-drawn digits (0-9) in the form of 28x28
pixel grayscale images. Each image is labeled with the correct digit (0, 1, 2, ..., 9).

Step 2: Input Layer
Example: Each image is flattened into a 1D array of 784 pixels (28x28 = 784).
The input layer of the ANN has 784 neurons, one for each pixel.

Step 3: Hidden Layers
Example: Let's use a single hidden layer with 128 neurons.
Each neuron in the hidden layer calculates a weighted sum of inputs from the input
layer and applies an activation function (e.g., ReLU).
Step 4: Weights and Biases
Example: Initially, weights and biases are set to small random values. During training, they
are adjusted to minimize prediction errors.
The weights determine how strongly each input pixel influences the hidden layer neurons.

Step 5: Activation Function
Example: In the hidden layer, the ReLU activation function is applied to each neuron's
weighted sum of inputs.
ReLU(x) = max(0, x) introduces non-linearity into the network.

Step 6: Output Layer
Example: In this case, we're classifying digits (0-9), so the output layer has 10 neurons (one
for each possible digit).
The softmax activation function is applied to produce probability scores for each digit class.
Step 7: Training
Example: During training, we use a labeled dataset to adjust weights and biases.
We use an optimization algorithm like gradient descent to minimize the difference between
predicted and actual digit labels.

Step 8: Prediction
Example: Once the ANN is trained, we can use it to classify new hand-drawn digits.
For instance, if we present the ANN with an image of a handwritten "7," it will produce
probability scores for all digits, and the digit with the highest score will be the predicted
digit. A sketch putting steps 2-8 together follows below.
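A minimal end-to-end sketch of steps 2-8 on the handwritten-digit version of MNIST, assuming TensorFlow/Keras; the epoch count follows no particular tuning, and the layer sizes match the example above:

```python
import numpy as np
from tensorflow import keras

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),    # Step 2: 784 input neurons
    keras.layers.Dense(128, activation="relu"),    # Steps 3-5: hidden layer + ReLU
    keras.layers.Dense(10, activation="softmax")   # Step 6: 10 class probabilities
])
model.compile(optimizer="adam",                    # Step 7: gradient-based training
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5)

probs = model.predict(X_test[:1])                  # Step 8: probability scores
print(np.argmax(probs))                            # the highest-scoring digit wins
```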
Here's a simplified example to illustrate the process:
Suppose we have an image of a handwritten digit "3." The pixel values are converted into a
784-dimensional vector and passed through the ANN. After processing through the hidden
layer and applying the softmax function in the output layer, the ANN may produce
probabilities like this:
Digit 0: 0.02
Digit 1: 0.05
Digit 2: 0.03
Digit 3: 0.85
Digit 4: 0.01
Digit 5: 0.02
Digit 6: 0.01
Digit 7: 0.01
Digit 8: 0.01
Digit 9: 0.00

In this case, the ANN predicts that the image most likely
represents the digit "3" because it has the highest probability
score (0.85).
Why do we use Activation Functions with Neural Networks?
• An activation function is used to determine the output of a NN, e.g. yes or no. It maps the resulting values
into a range such as 0 to 1 or -1 to 1, etc. (depending upon the function).
• The activation functions can be basically divided into 2 types:
A) Linear Activation Function
B) Non-linear Activation Functions

A) Linear or Identity Activation Function
The function is a line, i.e. linear. Therefore, the output of the function is not confined to any range.
Equation: f(x) = x
Range: (-infinity, infinity)

B) The non-linear activation functions are the most used activation functions. They are mainly divided on
the basis of their range or curves:
1. Sigmoid or Logistic activation function
2. Tanh or hyperbolic tangent activation function
3. ReLU (Rectified Linear Unit) activation function
4. Leaky ReLU
and many others.
Description of important Activation Functions:

Softmax activation function:
It is most commonly used as an activation function for the last layer
of the neural network in the case of multi-class classification. The
SoftMax function is an extension of the Sigmoid function.

Sigmoid / Logistic activation function:
This function takes any real value as input and outputs values in the
range of 0 to 1. It is a function whose graph has an "S" shape (see the
sketch below).
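A minimal sketch of the activation functions named above, implemented directly in NumPy:

```python
import numpy as np

def linear(x):
    return x                              # identity: f(x) = x, range (-inf, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # "S"-shaped, range (0, 1)

def tanh(x):
    return np.tanh(x)                     # "S"-shaped, range (-1, 1)

def relu(x):
    return np.maximum(0, x)               # ReLU(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))             # shift for numerical stability
    return e / e.sum()                    # probabilities summing to 1

z = np.array([1.0, -2.0, 0.5])
print(sigmoid(z), relu(z), softmax(z))
```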
