Neuron
• In the context of a NN, a neuron is the most fundamental unit of
processing.
• It’s also called a perceptron.
• A NN is based on the way a human brain works.
So, it simulates the way the biological neurons signal to one another.
• In the realm of Computer Science's ANNs, a neuron is a unit that combines a set of inputs, a set of weights, and an activation
function.
• It translates these inputs into a single output. Another layer of neurons picks
this output as its input and this goes on and on.
In essence, we can say that each neuron is a mathematical function that closely
simulates the functioning of a biological neuron.
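That mathematical function can be sketched in a few lines of Python. This is a minimal illustration; the sigmoid activation and all numeric values here are assumptions chosen for the example, not taken from the notes:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs plus a bias,
    passed through a sigmoid activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Example: three inputs, three weights, one bias (illustrative values)
output = neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5], bias=0.1)
print(output)  # a single value in (0, 1), ready to feed the next layer
```

The single output value is exactly what the next layer of neurons would pick up as one of its inputs.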
History of NNs
| Time Period | Key Developments and Milestones |
|---|---|
| 1940s-1950s | McCulloch and Pitts propose the artificial neuron concept; development of the Perceptron for binary classification tasks |
| 1960s-1970s | Limited success, leading to a decline in interest |
| 1980s-1990s | Resurgence with advances in training algorithms, including backpropagation; introduction of the Multi-Layer Perceptron (MLP); challenges with training deep networks due to the vanishing gradient problem |
| 2000s-2010s | Breakthroughs in deep learning with CNNs and RNNs; AlexNet (2012) sets a milestone in image recognition; application of deep learning to various fields, including NLP |
| 2010s-Present | Continued advancement with GANs and Transformers; deep learning's impact on computer vision, speech recognition, autonomous systems, and more; ongoing research to explore new architectures and training techniques |
Architecture of a biological neuron: 3 basic parts:
• Cell body
• Axon (cell extension)
• Dendrites (cell extensions)
| Aspect | ANN | Biological Neural Network |
|---|---|---|
| Structure | Comprises interconnected artificial neurons | Comprises interconnected biological neurons |
| Complexity | Can have complex architectures and layers | Varies in complexity across species |
| Learning Mechanism | Learns through backpropagation and training | Learns through adaptation and experience |
| Speed | Can perform computations quickly | Slower due to chemical and biological processes |
| Processing Power | Can process vast amounts of data rapidly | Processing power varies across organisms |
| Fault Tolerance | Resistant to noise and incomplete data | Susceptible to noise and errors |
Types of Neural Networks:
1. Perceptron
Disadvantages:
• A perceptron can only learn linearly separable problems such as the Boolean AND problem. For non-linear problems such as the Boolean XOR problem, it does not work.
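This limitation can be demonstrated with a small sketch of the classic perceptron learning rule. The encoding (inputs 0/1, labels -1/+1), learning rate, and epoch count are illustrative choices, not from the notes:

```python
def train_perceptron(data, epochs=20, lr=1.0):
    """Classic perceptron learning rule on 2-input Boolean data
    labelled -1 / +1. Returns the learned (w1, w2, bias)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
            if out != target:  # update weights only on mistakes
                w1 += lr * target * x1
                w2 += lr * target * x2
                b += lr * target
    return w1, w2, b

def predict(w, x1, x2):
    w1, w2, b = w
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

w_and = train_perceptron(AND)
print([predict(w_and, x1, x2) for (x1, x2), _ in AND])  # matches all AND labels

w_xor = train_perceptron(XOR)
# XOR is not linearly separable, so no choice of (w1, w2, b) can ever
# classify all four points correctly, no matter how long we train.
print([predict(w_xor, x1, x2) for (x1, x2), _ in XOR])
```

AND converges because the perceptron convergence theorem guarantees a solution for linearly separable data; XOR never does.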
2. Feed Forward NNs
• The simplest form of NNs where input data travels in one direction only, passing through
artificial neural nodes and exiting through output nodes.
• Hidden layers may or may not be present, but input and output layers are always present.
• Based on this, they can be further classified as a single-layered or multi-layered feed-
forward NN.
• Number of layers depends on the complexity of the function. It has uni-directional forward
propagation but no backward propagation.
• Weights are static here.
• Inputs are multiplied by weights and fed to an activation function.
For example: the neuron is activated and produces 1 as output if the weighted sum is above the threshold (usually 0); if it is below the threshold, the neuron is not activated and its output is taken as -1.
• They are fairly simple to maintain and are well equipped to deal with data that contains a lot of noise.
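The threshold behaviour described above can be sketched as follows; the specific inputs and weights are illustrative assumptions:

```python
def step(z, threshold=0.0):
    """Sign-style activation: +1 if above the threshold, else -1."""
    return 1 if z > threshold else -1

def feed_forward(x, weights):
    """One-directional pass: multiply inputs by weights, sum, activate.
    There is no backward propagation; the weights stay static."""
    z = sum(xi * wi for xi, wi in zip(x, weights))
    return step(z)

print(feed_forward([1.0, 2.0], [0.5, 0.3]))   # 0.5 + 0.6 = 1.1 > 0, so +1
print(feed_forward([1.0, 2.0], [-0.5, 0.1]))  # -0.5 + 0.2 = -0.3 <= 0, so -1
```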
Advantages:
• Less complex, easy to design & maintain
• Fast and speedy [one-way propagation]
• Highly responsive to noisy data
Disadvantages:
• Cannot be used for DL [due to absence of dense layers and back propagation]
Applications:
• Speech Recognition
• Machine Translation
• Complex Classification
3. Multi-Layer Perceptron (MLP)
Advantages:
• Used for DL [due to the presence of dense fully connected layers and back propagation]
Disadvantages:
• Comparatively complex to design and maintain
• Comparatively slow (depends on number of hidden layers)
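A hidden layer is what lets a multi-layer network solve non-linearly-separable problems such as Boolean XOR, which a single perceptron cannot. The following sketch uses hand-picked weights as an illustrative construction (not a trained network):

```python
def step(z):
    """Binary threshold activation."""
    return 1 if z > 0 else 0

def mlp_xor(x1, x2):
    """Two hidden units (OR and AND) plus one output unit compute XOR:
    XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2)."""
    h_or = step(x1 + x2 - 0.5)       # fires when at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only when both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```

In practice the weights are not hand-picked but learned by backpropagation; the point here is that the extra layer makes the non-linear decision boundary representable at all.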
Approaches for knowledge extraction from Multilayer
Perceptrons
4. Convolutional NN(CNN)
• Contains a 3-D arrangement of neurons instead of the standard 2-D array.
• The first layer is called a convolutional layer.
• Each neuron in the convolutional layer only processes the information from a small
part of the visual field.
• Input features are taken in batches, as if passing through a filter.
• The network understands the images in parts and can compute these operations
multiple times to complete the full image processing.
• Processing involves conversion of the image from RGB or HSI scale to grey-scale.
• Propagation is uni-directional when the CNN contains one or more convolutional layers followed by pooling, and bidirectional when the output of the convolution layer goes to a fully connected NN for classifying the images, as shown in the diagram.
• Filters are used to extract certain parts of the image.
• In MLP the inputs are multiplied with weights and fed to the activation function.
• Convolution uses ReLU (rectified linear unit), and the MLP (multilayer perceptron) part uses a nonlinear activation function followed by softmax.
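The filter idea can be sketched with a tiny "valid" 2-D convolution followed by ReLU. The image, the kernel, and the edge-detection framing are illustrative assumptions:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most DL
    libraries): slide the kernel over the image and take dot products,
    then apply ReLU to each feature-map value."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0, s))  # ReLU
        out.append(row)
    return out

# A tiny grey-scale "image" with a vertical edge, and a vertical-edge filter
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # strong responses only where the edge lies
```

Each neuron in the feature map only sees a small 2x2 patch of the input, which is exactly the "small part of the visual field" idea from the bullets above.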
Applications:
• Image processing
• Computer Vision
• Speech Recognition
• Machine translation
5. Long Short-Term Memory (LSTM) Networks
• LSTM networks are a type of RNN that uses special units in addition to standard units.
• LSTM units include a ‘memory cell’ that can maintain information in memory for long
periods of time.
• A set of gates is used to control when information enters the memory, when it is output, and when it is forgotten.
• There are three types of gates, viz. the input gate, the output gate and the forget gate. The input gate decides how much information from the last sample will be kept in memory; the output gate regulates the amount of data passed to the next layer; and the forget gate controls the rate at which stored memory decays.
• This architecture lets them learn longer-term dependencies.
This is one of the implementations of LSTM cells, many other architectures exist.
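One common formulation of a single LSTM time-step can be sketched with scalars (real cells use vectors and weight matrices; the fixed weights and input sequence here are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One time-step of a scalar LSTM cell. For each gate, w holds a
    tuple of (input weight, recurrent weight, bias)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    c = f * c_prev + i * g  # memory cell: keep some old info, admit some new
    h = o * math.tanh(c)    # output gate scales what leaves the cell
    return h, c

# Illustrative fixed weights; run the cell over a short input sequence
w = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
     "o": (0.4, 0.1, 0.0), "g": (1.0, 0.5, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, w)
print(h, c)  # hidden state and cell state after three steps
```

Because `c` is carried forward and only partially overwritten each step, information can persist across many time-steps, which is what the "memory cell" bullet above describes.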
6. Modular Neural Network
• Has a number of different networks that function independently and
perform sub-tasks.
• The different networks do not really interact with or signal each other
during the computation process.
• They work independently towards achieving the output.
Advantages:
• Efficient
• Independent training
• Robustness
Disadvantages:
• Moving target problems
What are the Five Algorithms to Train a Neural Network?
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
4. Offline Learning
5. Online Learning
Competitive learning
• Is a form of unsupervised learning in ANNs, in which nodes compete for
the right to respond to a subset of the input data.
• A variant of Hebbian learning, competitive learning works by increasing
the specialization of each node in the network. It is well suited to finding
clusters within data.
• Models and algorithms based on the principle of competitive learning
include vector quantization & self-organizing maps.
• In this model, there are hierarchical sets of units in the network with
inhibitory and excitatory connections.
• The excitatory connections are between individual layers and the inhibitory
connections are between units in layered clusters.
• Units in a cluster are either active or inactive.
There are three basic elements to a competitive learning rule :
• A set of neurons that are all the same except for some randomly
distributed synaptic weights, and which therefore respond
differently to a given set of input patterns.
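The winner-take-all competition can be sketched as a single competitive-learning update step; the data points, initial weights, and learning rate are illustrative assumptions:

```python
def euclidean_sq(a, b):
    """Squared Euclidean distance between two weight/input vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def competitive_step(units, x, lr=0.5):
    """One competitive-learning update: the unit whose weights are closest
    to input x wins the competition and moves its weights toward x; all
    other units are left unchanged (winner-take-all)."""
    winner = min(range(len(units)), key=lambda k: euclidean_sq(units[k], x))
    units[winner] = [w + lr * (xi - w) for w, xi in zip(units[winner], x)]
    return winner

# Two units with different initial weights compete for two input clusters
units = [[0.0, 0.0], [1.0, 1.0]]
for x in [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.0, 0.8]]:
    win = competitive_step(units, x)
    print("input", x, "won by unit", win)
print(units)  # each unit has specialized toward one cluster
```

Repeated over a dataset, this is essentially vector quantization: each unit's weight vector becomes the prototype of the cluster it keeps winning.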
Example: Imagine we have a dataset of hand-drawn digits (0-9) in the form of 28x28
pixel grayscale images.
Each image is labeled with the correct digit (0, 1, 2, ..., 9).
Example: Each image is flattened into a 1D array of 784 pixels (28x28 = 784).
The input layer of the ANN has 784 neurons, one for each pixel.
Example: Initially, weights and biases are set to small random values. During training, they
are adjusted to minimize prediction errors.
The weights determine how strongly each input pixel influences the hidden layer neurons.
Example: In the hidden layer, the ReLU activation function is applied to each neuron's
weighted sum of inputs.
ReLU(x) = max(0, x) - It introduces non-linearity into the network.
Example: In this case, we're classifying digits (0-9), so the output layer has 10 neurons (one
for each possible digit).
The softmax activation function is applied to produce probability scores for each digit class.
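The softmax step can be sketched directly; the raw scores below are illustrative, not from the worked example:

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1.
    Subtracting the max score keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw scores for the 10 digit classes (0-9)
scores = [1.0, 2.0, 0.5, 4.0, 0.1, 1.2, 0.3, 0.8, 0.2, 0.0]
probs = softmax(scores)
print(probs)
print(sum(probs))               # probabilities sum to 1
print(probs.index(max(probs)))  # class 3 had the largest raw score
```

Softmax preserves the ordering of the scores, so the class with the largest raw score always gets the highest probability.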
Step 7: Training
Example: During training, we use a labeled dataset to adjust weights and biases.
We use an optimization algorithm like gradient descent to minimize the difference between
predicted and actual digit labels.
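The core of gradient descent can be shown on a toy one-parameter loss; the loss function, learning rate, and step count are illustrative assumptions (a real network minimizes a loss over thousands of weights):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping opposite its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Toy loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3, so gradient descent should approach 3.
w = gradient_descent(lambda w: 2 * (w - 3), x0=0.0)
print(w)  # close to 3.0
```

Each update shrinks the remaining error by a constant factor here, which is why a modest number of steps is enough on this convex toy problem.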
Step 8: Prediction
Example: Once the ANN is trained, we can use it to classify new hand-drawn digits.
For instance, if we present the ANN with an image of a handwritten "7," it will produce
probability scores for all digits, and the digit with the highest score will be the predicted
digit.
Here's a simplified example to illustrate the process:
Suppose we have an image of a handwritten digit "3." The pixel values are converted into a
784-dimensional vector and passed through the ANN. After processing through the hidden
layer and applying the softmax function in the output layer, the ANN may produce
probabilities like this:
Digit 0: 0.02
Digit 1: 0.05
Digit 2: 0.03
Digit 3: 0.85
Digit 4: 0.01
Digit 5: 0.02
Digit 6: 0.01
Digit 7: 0.01
Digit 8: 0.01
Digit 9: 0.00
In this case, the ANN predicts that the image most likely
represents the digit "3" because it has the highest probability
score (0.85).
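The prediction step above is just an argmax over the probability scores, which can be sketched directly from the listed values:

```python
# Probability scores from the worked example above, indexed by digit 0-9
probs = [0.02, 0.05, 0.03, 0.85, 0.01, 0.02, 0.01, 0.01, 0.01, 0.00]

# Prediction = the digit (index) with the highest probability score
predicted_digit = max(range(len(probs)), key=lambda d: probs[d])
confidence = probs[predicted_digit]
print(predicted_digit, confidence)  # 3 0.85
```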
Why do we use activation functions in neural networks?
• An activation function determines the output of the NN, e.g. yes or no. It maps the resulting values into a range such as 0 to 1 or -1
to 1 (depending upon the function).
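The two ranges mentioned correspond to two common activations, sigmoid (0 to 1) and tanh (-1 to 1), which can be checked directly (the sample inputs are illustrative):

```python
import math

def sigmoid(x):
    """Maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Maps any real value into the open interval (-1, 1)."""
    return math.tanh(x)

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(x, round(sigmoid(x), 4), round(tanh(x), 4))
# sigmoid stays between 0 and 1; tanh stays between -1 and 1
```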
B) Nonlinear activation functions are the most used activation functions. They are mainly divided on the basis of their range or curves: