ML-3
Artificial Neural Networks and Their Applications
As you read this article, which organ in your body is thinking about it? It's
the brain, of course! But do you know how the brain works? Well, it
has neurons or nerve cells that are the primary units of both the brain and the
nervous system. These neurons receive sensory input from the outside
world, which they process and then provide the output, which might act as the
input to the next neuron.
Each of these neurons is connected to other neurons in complex arrangements
at synapses. Now, are you wondering how this is related to Artificial Neural
Networks? Let’s check out what they are in detail and how they learn
information.
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units.
These units are arranged in a series of layers that together constitute the
whole Artificial Neural Network in a system. A layer can have a dozen units or millions of units, depending on how complex the network needs to be to learn the hidden patterns in the dataset.
Commonly, an Artificial Neural Network has an input layer, an output layer, as
well as hidden layers. The input layer receives data from the outside
world, which the neural network needs to analyze or learn about. Then, this data
passes through one or multiple hidden layers that transform the input into data
that is valuable for the output layer. Finally, the output layer provides an output
in the form of a response of the Artificial Neural Networks to the input data
provided.
In the majority of neural networks, units are interconnected from one layer to
another. Each of these connections has weights that determine the influence of
one unit on another unit. As the data transfers from one unit to another, the
neural network learns more and more about the data, which eventually results in
an output from the output layer.
Neural Networks Architecture
The structure and operation of human neurons serve as the basis for artificial neural networks, which are also known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it receives input from external sources and passes it to the hidden layer, which is the second layer.
In the hidden layer, each neuron receives input from the previous layer neurons,
computes the weighted sum, and sends it to the neurons in the next layer. These
connections are weighted, meaning the effect of each input from the previous layer is scaled up or down by the weight assigned to it, and these weights are adjusted during the training process to improve model performance.
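To make the forward pass concrete, here is a minimal NumPy sketch of an input vector flowing through one hidden layer to an output layer. The layer sizes, weight values, and function names are illustrative choices, not taken from the text above.

import numpy as np

def forward_pass(x, W_hidden, b_hidden, W_out, b_out):
    """One forward pass through a network with a single hidden layer."""
    # Weighted sum of inputs going into the hidden layer
    hidden_sum = x @ W_hidden + b_hidden
    # Non-linear activation (sigmoid) applied element-wise
    hidden_act = 1.0 / (1.0 + np.exp(-hidden_sum))
    # Weighted sum of hidden activations going into the output layer
    return hidden_act @ W_out + b_out

# Toy example: 3 input features, 4 hidden units, 1 output unit
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))
W_hidden, b_hidden = rng.normal(size=(3, 4)), np.zeros(4)
W_out, b_out = rng.normal(size=(4, 1)), np.zeros(1)
print(forward_pass(x, W_hidden, b_hidden, W_out, b_out))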
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share many similarities in structure and function.
Structure: The structure of artificial neural networks is inspired by
biological neurons. A biological neuron has a cell body or soma to
process the impulses, dendrites to receive them, and an axon that transfers
them to other neurons. The input nodes of artificial neural networks
receive input signals, the hidden layer nodes compute these input signals,
and the output layer nodes compute the final output by processing the
hidden layer's results using activation functions.
Biological Neuron → Artificial Neuron
Dendrite → Inputs
Cell nucleus or soma → Nodes
Synapses → Weights
Axon → Output
Synapses: Synapses are the links between biological neurons that enable
the transmission of impulses from dendrites to the cell body. In artificial neurons, synapses correspond to the weights that connect the nodes of one layer to the nodes of the next layer. The strength of each link is determined by its weight
value.
Learning: In biological neurons, learning happens in the cell body, or soma, whose nucleus helps to process the impulses.
An action potential is produced and travels through the axons if the
impulses are powerful enough to reach the threshold. This becomes
possible by synaptic plasticity, which represents the ability of synapses to
become stronger or weaker over time in reaction to changes in their
activity. In artificial neural networks, backpropagation is a technique used
for learning, which adjusts the weights between nodes according to the
error or differences between predicted and actual outcomes.
Biological Neuron → Artificial Neuron
Synaptic plasticity → Backpropagation
Activation: In biological neurons, activation is the firing rate of the
neuron which happens when the impulses are strong enough to reach the
threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and determines whether the neuron is activated.
[Figure: Biological neurons to artificial neurons]
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example,
suppose you want to teach an ANN to recognize a cat. Then it is shown
thousands of different images of cats so that the network can learn to identify a
cat. Once the neural network has been trained enough using images of cats, then
you need to check if it can identify cat images correctly. This is done by making
the ANN classify the images it is provided by deciding whether they are cat
images or not. The output obtained by the ANN is compared with a human-provided label of whether the image is a cat image or not. If the ANN misclassifies an image, backpropagation is used to adjust whatever it has learned during training. Backpropagation works by fine-tuning the weights of the
connections in ANN units based on the error rate obtained. This process
continues until the artificial neural network can correctly recognize a cat in an
image with minimal possible error rates.
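The following toy sketch illustrates the idea of error-driven weight adjustment for a single sigmoid neuron trained with plain gradient descent. The "cat / not-cat" feature values, learning rate, and epoch count are made up purely for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "cat / not-cat" labels for 2-feature inputs (illustrative data only)
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)
b = 0.0
lr = 0.5

for epoch in range(1000):
    pred = sigmoid(X @ w + b)          # forward pass
    error = pred - y                   # difference from the true labels
    w -= lr * (X.T @ error) / len(y)   # adjust weights against the error
    b -= lr * error.mean()

print(np.round(sigmoid(X @ w + b)))    # expected to recover [1, 1, 0, 0]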
What are the types of Artificial Neural Networks?
Feedforward Neural Network : The feedforward neural network is one
of the most basic artificial neural networks. In this ANN, the data or the
input provided travels in a single direction. It enters into the ANN
through the input layer and exits through the output layer while hidden
layers may or may not exist. So the feedforward neural network propagates signals in the forward direction only and usually does not use backpropagation.
Convolutional Neural Network : A Convolutional neural network has
some similarities to the feed-forward neural network, where the
connections between units have weights that determine the influence of
one unit on another unit. But a CNN has one or more than one
convolutional layer that uses a convolution operation on the input and
then passes the result obtained in the form of output to the next layer.
CNNs have applications in speech and image processing and are particularly useful in computer vision.
Modular Neural Network: A Modular Neural Network contains a
collection of different neural networks that work independently towards
obtaining the output with no interaction between them. Each of the
different neural networks performs a different sub-task by obtaining
unique inputs compared to other networks. The advantage of this modular
neural network is that it breaks down a large and complex computational
process into smaller components, thus decreasing its complexity while
still obtaining the required output.
Radial basis function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. An RBF network has two layers. In the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function networks are
normally used to model the data that represents any underlying trend or
function.
Recurrent Neural Network: The Recurrent Neural Network saves the
output of a layer and feeds this output back to the input to better predict
the outcome of the layer. The first layer in the RNN is quite similar to the
feed-forward neural network and the recurrent neural network starts once
the output of the first layer is computed. After this layer, each unit will
remember some information from the previous step so that it can act as a
memory cell in performing computations.
Applications of Artificial Neural Networks
Social Media: Artificial Neural Networks are used heavily in Social
Media. For example, let’s take the ‘People you may know’ feature on
Facebook that suggests people that you might know in real life so that
you can send them friend requests. Well, this magical effect is achieved
by using Artificial Neural Networks that analyze your profile, your
interests, your current friends, and also their friends and various other
factors to calculate the people you might potentially know.
Marketing and Sales: When you log onto E-commerce sites like
Amazon and Flipkart, they will recommend you products to buy based on
your previous browsing history. Similarly, suppose you love Pasta, then
Zomato, Swiggy, etc. will show you restaurant recommendations based
on your tastes and previous order history. This is true across all new-age
marketing segments like Book sites, Movie services, Hospitality sites,
etc. and it is done by implementing personalized marketing .
Healthcare: Artificial Neural Networks are used in Oncology to train
algorithms that can identify cancerous tissue at the microscopic level at
the same accuracy as trained physicians. Various rare diseases may
manifest in physical characteristics and can be identified in their
premature stages by using Facial Analysis on the patient photos.
Personal Assistants: Personal assistants like Alexa and Siri use Natural Language Processing to interact with users and formulate a response
accordingly. Natural Language Processing uses artificial neural networks
that are made to handle many tasks of these personal assistants such as
managing the language syntax, semantics, correct speech, the
conversation that is going on, etc.
Conclusion
In conclusion, an Artificial Neural Network acts like a brain. It has various interconnected layers, such as the input layer, the hidden layers, and the output layer. The connections between layers are weighted, meaning the effect of each input from the previous layer is scaled by its weight. Artificial Neural Networks have various applications in today's world. They are used in almost every sector, particularly social media, healthcare, marketing, and sales.
Activation Function:
An activation function in machine learning is a mathematical function used in
neural networks to decide whether a neuron should be activated or not. It
adds non-linearity to the model, allowing it to learn complex patterns in data.
Without activation functions, a neural network would behave like a simple
linear model. Common types include ReLU, which outputs only positive
values, Sigmoid, which squashes values between 0 and 1, and Tanh, which
ranges from -1 to 1. These functions are applied after calculating the weighted
sum of inputs in each neuron. They help the network understand and learn from
data more effectively. Choosing the right activation function can greatly impact
model performance.
Types of Activation Functions:
Sigmoid Activation Function
The Sigmoid function is one of the oldest and most commonly used activation
functions in machine learning. It takes any real number and maps it between 0
and 1, which makes it useful when we want to predict probabilities —
especially in binary classification problems. The function has an S-shaped
curve and is defined by the formula $\sigma(x) = \frac{1}{1 + e^{-x}}$. It is smooth and continuously differentiable, which is helpful
during training. However, the sigmoid function has a major drawback: for very
large or small input values, it flattens out and causes vanishing gradients,
which slows or even stops learning in deep networks.
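A one-line NumPy version of the sigmoid, just to show how it maps any real input into the (0, 1) range (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Maps any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # roughly [0.00005, 0.5, 0.99995]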
Softmax Activation Function
The Softmax function is used mostly in the output layer of a neural network
when solving multi-class classification problems. It takes a vector of raw
scores (called logits) and converts them into probabilities that sum up to 1. The
formula is $\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$, where $x_i$ is the input to a neuron. This
way, the model not only tells you which class it predicts, but also gives the
confidence for each class. It’s not used in hidden layers, only the output layer
when you're dealing with three or more class labels.
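A small NumPy sketch of softmax; subtracting the maximum logit is a common numerical-stability trick and does not change the result. The example scores are made up.

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the probabilities are unchanged
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw logits for three classes
probs = softmax(scores)
print(probs, probs.sum())            # probabilities that sum to 1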
Tanh Activation Function
The Tanh function, short for hyperbolic tangent, is similar to the sigmoid
function but outputs values in the range -1 to 1 instead of 0 to 1. This helps the
network’s data stay centered around zero, which often results in faster
convergence during training. The formula is $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$, and it also has an S-
shaped curve. Like sigmoid, tanh suffers from the vanishing gradient problem
for extreme input values, but it generally performs better than sigmoid in hidden
layers because of its zero-centered output.
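NumPy provides tanh directly; the short sketch below checks the built-in against the formula above on a few arbitrary inputs:

import numpy as np

x = np.array([-2.0, 0.0, 2.0])
print(np.tanh(x))                                        # approx [-0.96, 0.0, 0.96]
# Equivalent to the explicit formula (e^x - e^-x) / (e^x + e^-x)
print((np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x)))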
ReLU (Rectified Linear Unit)
ReLU is currently the most widely used activation function in deep learning. It
is defined very simply: if the input is positive, it returns that value; if it’s
negative, it returns zero. This makes it much faster and simpler to compute than
sigmoid or tanh. ReLU is especially powerful because it helps reduce the
likelihood of vanishing gradients and allows for faster training of deep neural
networks. However, a problem can occur where some neurons only output zero
— this is known as the “dying ReLU” problem, meaning those neurons stop
learning permanently.
Leaky ReLU
To fix the “dying ReLU” problem, Leaky ReLU was introduced. Unlike
standard ReLU which outputs zero for any negative input, Leaky ReLU allows
a small, non-zero output for negative inputs, usually $0.01 \times x$.
This means that neurons can still learn even if they receive negative signals. It
adds just a slight complexity over ReLU but helps prevent entire parts of the
network from becoming inactive. Leaky ReLU performs better in some models
where ReLU struggles due to many dead neurons.
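A short NumPy sketch contrasting ReLU and Leaky ReLU on the same inputs; the 0.01 slope is the usual default mentioned above, and the sample values are arbitrary:

import numpy as np

def relu(x):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope (alpha) for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0.    0.    0.    2.  ]
print(leaky_relu(x))  # [-0.03  -0.005  0.     2.  ]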
PReLU (Parametric ReLU)
PReLU, or Parametric ReLU, is an advanced version of Leaky ReLU where
the slope for negative inputs is not fixed but instead learned during training.
This means the model can adapt how much it wants to allow negative input to
flow through. It combines the benefits of Leaky ReLU with flexibility,
potentially improving model accuracy. However, it slightly increases the
model’s complexity and the risk of overfitting, especially in smaller datasets,
because it adds more parameters to learn.
PReLU in Simple Terms
PReLU is a special type of activation function that is similar to ReLU and
Leaky ReLU, but smarter.
In normal ReLU, if the input is negative, the output is just zero. In Leaky
ReLU, we allow a small fixed value to pass through for negative inputs (like
0.01 × input).
But in PReLU, the amount that passes through for negative inputs is not fixed
— the model learns it during training! This means the network itself decides
how much of the negative input to keep, making it more flexible and adaptive.
Simple Example:
If input is positive, output = input (same as ReLU)
If input is negative, output = a × input, where a is learned by the model
So instead of choosing a number like 0.01 (like in Leaky ReLU), PReLU finds
the best value of a while training.
Why use PReLU?
Helps prevent neurons from dying
Learns the best slope for negative inputs
Can improve accuracy in some deep models
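A minimal NumPy sketch of the PReLU rule. Here the slope a is passed in by hand just to show the computation; in a real model a deep-learning framework would treat a as a trainable parameter and update it by backpropagation (libraries such as PyTorch and Keras provide a PReLU layer for this).

import numpy as np

def prelu(x, a):
    # Like Leaky ReLU, but the negative slope `a` is a learnable parameter
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -1.0, 0.5, 3.0])
print(prelu(x, a=0.25))  # during training, the framework would learn `a`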
Perceptron in Machine Learning
A Perceptron is the simplest type of artificial neural network and one of the
basic building blocks of machine learning. It is a single-layer model that takes
multiple inputs, applies weights to them, sums them up, and passes the result
through an activation function to produce an output—usually a binary decision
like yes/no or 0/1. The perceptron is mainly used for linear classification
problems, meaning it tries to separate data into two classes using a straight line
(or hyperplane). Despite its simplicity, it laid the foundation for more complex
neural networks and helped start the field of deep learning.
Single-Layer Perceptron
A Single-Layer Perceptron is the most basic type of neural network, consisting
of only one layer of neurons (called the output layer). It takes multiple input
features, multiplies each by a weight, sums them up, and passes the result
through an activation function (usually a step or sign function) to produce an
output. This model can only solve linearly separable problems, meaning it can
draw a straight line (or hyperplane) to separate two classes. While simple and
fast, it cannot handle more complex problems where the data isn’t linearly
separable.
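A small sketch of the classic perceptron learning rule on the linearly separable AND problem; the learning rate and epoch count are arbitrary illustrative choices:

import numpy as np

# Perceptron learning rule on the linearly separable AND problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0   # step activation
        update = lr * (target - pred)       # error-driven update
        w += update * xi
        b += update

print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected [0, 0, 0, 1]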
Multi-Layer Perceptron (MLP)
A Multi-Layer Perceptron is a more advanced neural network that contains
one or more hidden layers between the input and output layers. Each layer
consists of several neurons, and the network uses nonlinear activation functions
like ReLU or Sigmoid to learn complex patterns. The addition of hidden layers
allows MLPs to solve nonlinear problems by creating complex decision
boundaries. MLPs are trained using backpropagation, which adjusts weights
across all layers to minimize prediction error. This makes MLPs powerful
models widely used in classification, regression, and many other machine
learning tasks.
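As a hedged example, scikit-learn's MLPClassifier (assuming scikit-learn is installed) can fit a one-hidden-layer MLP on a non-linearly separable toy dataset; the hidden-layer size and dataset choice are illustrative only.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Non-linearly separable toy data
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One hidden layer with 16 ReLU units, trained with backpropagation
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=42)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))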
CNN
A Convolutional Neural Network (CNN) is a special type of neural network
mainly used for processing images and visual data. Unlike regular neural
networks, CNNs automatically detect important features like edges, shapes, and
textures by applying small filters called convolutions across the input image.
These filters scan the image step-by-step, creating feature maps that highlight
important parts. CNNs also use pooling layers to reduce the size of the data
while keeping important information, which helps the network learn faster and
avoid overfitting. By stacking multiple convolution and pooling layers, CNNs
can understand complex patterns and objects in images. They are widely used in
tasks like image recognition, object detection, and facial recognition. CNNs
have shown great success because they require fewer parameters than fully
connected networks, making them efficient. Their ability to learn spatial
hierarchies from data makes them very powerful for computer vision problems.
Overall, CNNs help computers “see” and understand images much like humans
do.
Key Components of a Convolutional Neural Network
1. Convolutional Layers: These layers apply convolutional operations to
input images using filters or kernels to detect features such as edges,
textures and more complex patterns. Convolutional operations help
preserve the spatial relationships between pixels.
2. Pooling Layers: They downsample the spatial dimensions of the input,
reducing the computational complexity and the number of parameters in
the network. Max pooling is a common pooling operation where we
select a maximum value from a group of neighboring pixels.
3. Activation Functions: They introduce non-linearity to the model by
allowing it to learn more complex relationships in the data.
4. Fully Connected Layers: These layers are responsible for making
predictions based on the high-level features learned by the previous
layers. They connect every neuron in one layer to every neuron in the
next layer.
How to Train a Convolutional Neural Network?
CNNs are trained using a supervised learning approach. This means that the
CNN is given a set of labeled training images. The CNN learns to map the input
images to their correct labels.
The training process for a CNN involves the following steps:
1. Data Preparation: The training images are preprocessed to ensure that
they are all in the same format and size.
2. Loss Function: A loss function is used to measure how well the CNN is
performing on the training data. The loss function is typically calculated
by taking the difference between the predicted labels and the actual labels
of the training images.
3. Optimizer: An optimizer is used to update the weights of the CNN in
order to minimize the loss function.
4. Backpropagation: Backpropagation is a technique used to calculate the
gradients of the loss function with respect to the weights of the CNN. The
gradients are then used to update the weights of the CNN using the
optimizer.
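A hedged Keras sketch (assuming TensorFlow is installed) that strings these steps together: data preparation, a loss function, an optimizer, and backpropagation happening inside fit(). The layer sizes and the choice of the MNIST digits dataset are illustrative, not a prescribed recipe.

import tensorflow as tf

# Data preparation: 28x28 grayscale digit images, scaled to [0, 1] with a channel dim
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),                  # downsample feature maps
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax")  # class probabilities
])

# Loss function + optimizer; backpropagation runs inside fit()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128,
          validation_data=(x_test, y_test))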
RNN:
A Recurrent Neural Network (RNN) is a type of neural network designed to
work with sequence data like sentences, time series, or speech. Unlike regular
networks, RNNs have a special memory that lets them remember information
from previous steps while processing new data. This memory helps the network
understand context, like the meaning of words in a sentence based on what
came before. RNNs process input one step at a time and pass their hidden state
along the sequence to keep track of past information. They are useful in tasks
such as language translation, speech recognition, and text prediction. However,
standard RNNs can struggle with very long sequences due to the vanishing
gradient problem, where early information fades away during training. To fix
this, advanced versions like LSTM and GRU were created to remember
important details longer. RNNs are powerful because they handle data where
order and context matter. Overall, they help machines understand and generate
sequences naturally.
Key Components of RNNs
There are mainly two components of RNNs that we will discuss.
1. Recurrent Neurons
The fundamental processing unit in an RNN is the recurrent unit. Recurrent units hold a hidden state that maintains information about previous inputs in a sequence.
Recurrent units can "remember" information from prior steps by feeding back
their hidden state, allowing them to capture dependencies across time.
2. RNN Unfolding:
RNN Unfolding means “stretching out” a Recurrent Neural Network over time
to show how it processes a sequence step by step. Instead of thinking of the
RNN as a loop, we imagine a chain of repeated neural network units—one for
each time step in the input sequence. Each unit takes the input at that step and
the hidden state from the previous step, then passes its hidden state to the next
unit. This helps us understand how information flows through the network over
time and is useful when training the RNN using backpropagation through time
(BPTT). So, unfolding simply shows the RNN working like many connected
copies of itself across the sequence.
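A minimal NumPy sketch of a single recurrent unit unrolled over a short sequence; the weight matrices, sizes, and input sequence are random and purely illustrative:

import numpy as np

# One recurrent unit unrolled over a short input sequence (illustrative sizes)
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))      # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))      # hidden-to-hidden weights
b_h = np.zeros(4)

sequence = rng.normal(size=(5, 3))  # 5 time steps, 3 features each
h = np.zeros(4)                     # initial hidden state (the "memory")

for t, x_t in enumerate(sequence):
    # New hidden state mixes the current input with the previous hidden state
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    print(f"step {t}: hidden state = {np.round(h, 2)}")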
Here are the key components of a Recurrent Neural Network (RNN)
explained simply:
1. Input Layer: This is where the sequence data (like words or time steps)
enters the network, one element at a time.
2. Hidden State: A core part of RNNs, the hidden state acts like the
network’s memory. It stores information from previous inputs to help
understand the current input in context.
3. Weights: These are the parameters the network learns during training,
including weights for input-to-hidden connections and hidden-to-hidden
connections, which control how input and past information combine.
4. Activation Function: Usually a non-linear function (like tanh or ReLU)
applied to the weighted sum of inputs and previous hidden state, helping
the network learn complex patterns.
5. Output Layer: Produces the output for each time step, such as a
predicted word or value, based on the hidden state.
6. Sequence Processing: RNN processes data sequentially, feeding the
hidden state from one step into the next, enabling it to handle time-
dependent information.
7. Loss Function: Measures the difference between the predicted output
and actual target to guide learning during training.
Types of RNN:
One to One
One to Many
Many to One
Many to Many
1. Confusion Matrix
A confusion matrix is a table that helps visualize how well a classification
model is performing. It compares the actual labels with the predicted labels
and shows four values:
True Positive (TP): Correctly predicted positive cases
True Negative (TN): Correctly predicted negative cases
False Positive (FP): Incorrectly predicted positive cases (Type I error)
False Negative (FN): Incorrectly predicted negative cases (Type II error)
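The four counts can be computed directly from label arrays, as in this small NumPy sketch with made-up labels:

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

TP = np.sum((y_true == 1) & (y_pred == 1))
TN = np.sum((y_true == 0) & (y_pred == 0))
FP = np.sum((y_true == 0) & (y_pred == 1))
FN = np.sum((y_true == 1) & (y_pred == 0))
print(TP, TN, FP, FN)  # 3 3 1 1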
2. Precision
Precision tells us, out of all the cases where the model predicted positive, how
many were actually positive.
$\text{Precision} = \frac{TP}{TP + FP}$
High precision means fewer false alarms.
3. Recall (Sensitivity)
Recall measures how many of the actual positive cases the model correctly
identified.
$\text{Recall} = \frac{TP}{TP + FN}$
High recall means the model misses fewer positive cases.
4. Accuracy
Accuracy shows the overall correctness of the model — the proportion of all
correct predictions (both positive and negative) out of all predictions.
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
It is useful when classes are balanced.
5. F-Score (F1 Score)
F1 Score is the harmonic mean of precision and recall, balancing the two. It’s
useful when you want to consider both false positives and false negatives.
$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
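Reusing the made-up confusion-matrix counts from the earlier sketch (TP = 3, TN = 3, FP = 1, FN = 1), the four metrics work out as follows:

TP, TN, FP, FN = 3, 3, 1, 1   # counts from the confusion-matrix example above

precision = TP / (TP + FP)                          # 0.75
recall = TP / (TP + FN)                             # 0.75
accuracy = (TP + TN) / (TP + TN + FP + FN)          # 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.75

print(precision, recall, accuracy, f1)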
6. ROC Curve (Receiver Operating Characteristic Curve)
The ROC curve plots the True Positive Rate (Recall) against the False
Positive Rate (FPR) at different classification thresholds. The area under the
ROC curve (AUC) indicates how well the model distinguishes between classes
— closer to 1 is better. It helps to choose the best threshold for classification.
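A hedged scikit-learn sketch (assuming scikit-learn is installed) that computes the ROC points and the AUC from made-up labels and scores:

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5]  # predicted probabilities

# True positive rate and false positive rate at each candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
print("thresholds:", thresholds)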