
DEEP LEARNING NOTES

1. Compare Artificial Neuron with Biological Neuron.

● Initialization: In an ANN, the initial state and weights are assigned randomly. In a biological
neural network, the strengths of connections between neurons and the structure of
connections don't start as random; the initial state is genetically derived and is a byproduct
of evolution.
● Learning: The weights of an ANN are randomly initialized and then adjusted via an
optimization algorithm. In the brain, the interconnections change configuration when it
experiences new stimuli.
● Scale: A typical ANN consists of hundreds, thousands, or millions of neurons, and in some
exceptional cases (e.g. GPT-3) billions; the biological neural network (BNN) of the human
brain consists of billions of neurons.

What is forward propagation?

Forward propagation, also known as forward pass, is a fundamental concept in deep learning
and refers to the process of computing the output of a neural network given a set of input data.
It is the first step in the training and inference phases of deep learning.
During forward propagation, input data is fed through the neural network from the input layer to
the output layer, passing through one or more hidden layers. At each layer, the inputs are
transformed using activation functions and weights associated with the connections between
neurons. The output of each neuron in a layer serves as the input to the neurons in the next
layer until the final output layer is reached, which produces the predicted output for the given
input data.
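As a minimal sketch of a forward pass in NumPy (the layer sizes and sigmoid activation here are illustrative choices, not part of any particular library's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Pass input x through each layer: affine transform, then activation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # output of this layer feeds the next layer
    return a

# Hypothetical 2-3-1 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
print(forward(np.array([0.5, -0.2]), weights, biases))
```
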
What is backward propagation?

Backpropagation is a key algorithm used in deep learning for training artificial neural networks. It
involves iteratively updating the weights of the network by propagating error information
backwards through the network from the output layer to the input layer. The goal is to minimize
the difference between the network's predictions and the actual outputs.
For a single training example, the backpropagation algorithm calculates the gradient of the error
function with respect to every weight in the network. It does so efficiently by applying the chain
rule of calculus layer by layer, which is what makes gradient-descent training of deep networks
tractable.
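A worked chain-rule example for a single sigmoid neuron with squared-error loss (the input, target, and parameter values are illustrative):

```python
import numpy as np

# One sigmoid neuron: y_hat = sigmoid(w*x + b), loss L = (y_hat - y)^2 / 2
x, y = 1.5, 1.0
w, b = 0.4, 0.1

z = w * x + b
y_hat = 1.0 / (1.0 + np.exp(-z))

# Chain rule: dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
dL_dyhat = y_hat - y             # derivative of the squared-error loss
dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
dz_dw = x                        # derivative of the affine transform
grad_w = dL_dyhat * dyhat_dz * dz_dw
print(grad_w)
```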

What is an Artificial Neuron?

An artificial neuron is the basic computational unit of a neural network. It receives one or more
inputs, computes a weighted sum of them, adds a bias term, and passes the result through an
activation function to produce an output.

What do you mean by artificial neural network ?

An Artificial Neural Network (ANN) is a computational model inspired by the structure and
function of the biological neural networks in the human brain. It is a type of machine learning
algorithm that is used to recognize patterns and relationships in data.

The artificial neural network is composed of interconnected nodes (neurons) that process
information in parallel. Each neuron receives input signals from the previous layer of neurons or
directly from the input data, and applies a mathematical operation to them to produce an output
signal that is passed to the next layer of neurons.

What is the difference between machine learning and deep learning?

Machine learning is the broader field of algorithms that learn patterns from data, and it often
relies on manually engineered features. Deep learning is a subset of machine learning that uses
neural networks with many layers to learn feature representations automatically from raw data;
it typically requires more data and compute, but scales better to complex tasks such as vision
and speech.

What is gradient descent?

Gradient descent (GD) is an iterative first-order optimization algorithm used to find a local
minimum (or maximum) of a given function. It is one of the most commonly used optimization
algorithms in machine learning (ML) and deep learning (DL), where it is used to minimize a
cost/loss function (e.g. in a linear regression). The basic idea behind gradient descent is to take
small steps in the direction of steepest descent of the loss function with respect to the model
parameters, i.e. along the negative gradient.
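A minimal sketch of gradient descent on a one-dimensional function, f(w) = (w - 3)^2, whose gradient 2(w - 3) is known in closed form (the learning rate is an illustrative value):

```python
# Gradient descent on f(w) = (w - 3)^2; the gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1  # step size (illustrative value)
for step in range(100):
    grad = 2 * (w - 3)
    w -= learning_rate * grad  # step in the direction of steepest descent
print(w)  # converges toward the minimum at w = 3
```
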
What do you mean by Mean Absolute error and Mean Square error ?

Mean Absolute Error (MAE) is a metric that measures the average absolute difference
between the predicted and actual values. It is calculated by taking the absolute difference
between the predicted and actual values for each data point, and then taking the average of
these differences.

The formula for MAE is:

MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|

where n is the number of data points, y_i is the actual value of the i-th data point, and ŷ_i is the
predicted value of the i-th data point.

Mean Square Error (MSE), on the other hand, measures the average squared difference
between the predicted and actual values. It is calculated by taking the square of the difference
between the predicted and actual values for each data point, and then taking the average of
these squared differences.

The formula for MSE is:

MSE = (1/n) Σ_{i=1..n} (y_i − ŷ_i)²

where n is the number of data points, y_i is the actual value of the i-th data point, and ŷ_i is the
predicted value of the i-th data point.
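A quick illustration of both metrics in NumPy, using made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values (illustrative)

mae = np.mean(np.abs(y_true - y_pred))  # average absolute difference
mse = np.mean((y_true - y_pred) ** 2)   # average squared difference
print(mae, mse)  # 0.75, 0.875
```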

What is a perceptron? State the perceptron learning rule.

A perceptron is the simplest artificial neuron: it computes a weighted sum of its inputs, adds a
bias, and applies a step function to produce a binary output, so it can classify linearly separable
data.

The perceptron learning rule updates each weight after every training example as
w_i ← w_i + η (t − o) x_i,
where η is the learning rate, t is the target output, o is the perceptron's actual output, and x_i is
the i-th input.

What is a multilayer perceptron? (Illustrate with a diagram.)

A multilayer perceptron (MLP) is a class of fully connected feedforward artificial neural networks.

A multilayer perceptron is a neural network connecting multiple layers in a directed graph, which
means that the signal path through the nodes only goes one way. Each node, apart from the
input nodes, has a nonlinear activation function. An MLP uses backpropagation as a supervised
learning technique. Since there are multiple layers of neurons, MLP is a deep learning
technique.
MLP is widely used for solving problems that require supervised learning as well as research
into computational neuroscience and parallel distributed processing. Applications include
speech recognition, image recognition and machine translation.
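A minimal MLP sketch, assuming PyTorch is available (the layer sizes are illustrative, e.g. for flattened 28×28 images and 10 output classes):

```python
import torch.nn as nn

# A small MLP: input layer -> two hidden layers with nonlinear
# activations -> output layer (sizes are illustrative)
mlp = nn.Sequential(
    nn.Linear(784, 128),  # input -> hidden layer 1
    nn.ReLU(),
    nn.Linear(128, 64),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(64, 10),    # hidden layer 2 -> output (e.g., 10 classes)
)
```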

Can we train a neural network model by initializing all the weights to 0 ?

Initializing all weights to zero in a neural network is not recommended as it can lead to poor
performance and slow convergence during training.
When all the weights are initialized to the same value, every neuron in a layer produces the
same output, so identical gradients are computed for them during backpropagation. The
neurons therefore remain symmetric: they all learn the same features and produce the same
output, which leads to poor performance.
It is important to use proper weight initialization methods to ensure that the network can learn
and generalize effectively.
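As an illustration, one common choice is Xavier/Glorot initialization, sketched here in NumPy (the fan-in/fan-out sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 64, 32

# All-zero init: every neuron in the layer behaves identically (symmetric)
W_zero = np.zeros((fan_out, fan_in))

# Xavier/Glorot uniform initialization breaks the symmetry while keeping
# activation variance roughly constant across layers
limit = np.sqrt(6.0 / (fan_in + fan_out))
W_xavier = rng.uniform(-limit, limit, size=(fan_out, fan_in))
```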

What is the activation function ? Explain various activation functions.

The activation function decides whether a neuron should be activated or not by calculating the
weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to
introduce non-linearity into the output of a neuron.
Activation functions also make backpropagation possible, since they supply the gradients that
are propagated along with the error to update the weights and biases.
Types of activation functions

Linear Function
● Equation : The linear function has the equation of a straight line, i.e. y = x
● Range : -inf to +inf
● Uses : The linear activation function is typically used in just one place: the output layer
(e.g. for regression).

Binary Step Function

● Equation : f(x) = 1 if x ≥ threshold, else f(x) = 0
● Range : The function produces 1 (or true) when the input passes a threshold, whereas it
produces 0 (or false) when it does not.
● Uses : The binary step function can be used as an activation function when creating a
binary classifier.

Sigmoid Function
● It is a function which is plotted as an ‘S’ shaped graph.
● Equation : A = 1/(1 + e^(-x))
● Value Range : 0 to 1
● Uses : Usually used in the output layer of a binary classification, where the result is
either 0 or 1.
Hyperbolic Tangent Function

● The activation that almost always works better than the sigmoid function is the tanh
function, also known as the hyperbolic tangent function. It is a mathematically shifted
version of the sigmoid; the two are similar and can be derived from each other.
● Equation :- tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = 2·sigmoid(2x) − 1
● Value Range :- -1 to +1
● Uses :- Usually used in hidden layers of a neural network. Since its values lie between -1
and 1, the mean of a hidden layer's activations comes out to be 0 or very close to it, which
helps center the data.

RELU Function
● It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly
implemented in the hidden layers of a neural network.
● Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
● Value Range :- [0, inf)
● Uses :- ReLu is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations.
SoftMax Function
The softmax function is also a type of sigmoid function, but it is handy when we are trying to
handle multi-class classification problems.
● Equation :- softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
● Nature :- non-linear
● Uses :- Usually used when handling multiple classes. The softmax function is commonly
found in the output layer of image classification problems, where it converts raw scores
into a probability distribution over the classes.
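The functions above can be sketched compactly in NumPy (the max-subtraction in softmax is a standard numerical-stability trick):

```python
import numpy as np

def linear(x):       return x
def binary_step(x):  return np.where(x >= 0, 1, 0)
def sigmoid(x):      return 1.0 / (1.0 + np.exp(-x))
def tanh(x):         return np.tanh(x)
def relu(x):         return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()         # outputs sum to 1: a probability distribution

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), relu(x), softmax(x))
```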

State and explain various loss functions in deep learning.

1. Mean Squared Error (MSE) - This loss function is commonly used in regression
problems, where the goal is to predict a continuous value. MSE measures the average
squared difference between the predicted and actual values.
2. Mean Absolute Error (MAE) - This loss function is similar to MSE, but it measures the
absolute difference between the predicted and actual values. It is also commonly used in
regression problems.
3. Binary Cross-Entropy - This loss function is used in binary classification problems, where
the output is either 0 or 1. It measures the difference between the predicted and actual
values using the logarithmic loss function.
4. Categorical Cross-Entropy - This loss function is used in multi-class classification
problems, where the output can belong to one of several classes. It measures the
difference between the predicted and actual values using the logarithmic loss function.
5. Hinge Loss - This loss function is commonly used in binary classification problems where
the output is either -1 or 1. It penalizes predictions that fall on the wrong side of, or within,
the classification margin, which makes it well suited to maximum-margin classifiers.
6. Kullback-Leibler Divergence - This loss function is used to measure the difference
between two probability distributions. It is commonly used in probabilistic models, such
as autoencoders and variational autoencoders.
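NumPy sketches of two of these losses, binary cross-entropy and hinge loss (the clipping epsilon is an illustrative safeguard against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; penalizes scores on the wrong side of the margin
    return np.mean(np.maximum(0, 1 - y_true * scores))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -1.5, -0.3])))
```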

What is meant by vanishing and exploding gradients?

Vanishing gradients occur when the gradients calculated during backpropagation become
increasingly small as they propagate backward through the layers of the network. This can
cause the weights of the earlier layers to update very slowly or not at all, which can significantly
slow down or even halt the learning process. The problem is especially prevalent in deep
networks with sigmoid or tanh activation functions, which have derivatives that are very small for
large or small inputs.
Conversely, exploding gradients occur when the gradients calculated during backpropagation
become increasingly large as they propagate backward through the layers of the network. This
can cause the weights of the earlier layers to update too quickly, leading to unstable training and
divergence.
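A quick back-of-the-envelope illustration: the sigmoid derivative is at most 0.25, so a product of one such factor per layer shrinks rapidly with depth:

```python
# The sigmoid derivative peaks at 0.25, so backpropagating through many
# sigmoid layers multiplies together many factors of at most 0.25.
max_sigmoid_grad = 0.25
for depth in (5, 10, 20):
    print(depth, max_sigmoid_grad ** depth)
# 5 -> ~9.8e-04, 10 -> ~9.5e-07, 20 -> ~9.1e-13: the gradient vanishes
```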

State the steps of the back propagation algorithm.

Here are the steps involved in the backpropagation algorithm:


1. Forward propagation: In this step, the input is passed through the network, and the
output is calculated by applying the weights and biases to the input using the activation
functions at each layer.
2. Calculate the loss: The loss function is then calculated by comparing the predicted
output to the actual output.
3. Backward propagation: In this step, the gradient of the loss function with respect to the
weights of the network is calculated using the chain rule of calculus. The gradient is
calculated for each layer of the network, starting from the output layer and moving
backward toward the input layer.
4. Update the weights: After calculating the gradient, the weights are updated using an
optimization algorithm such as stochastic gradient descent (SGD) or Adam optimizer.
The goal is to update the weights in the direction of the negative gradient so as to
minimize the loss function.
5. Repeat: The process is repeated for a certain number of iterations or until the loss
function converges to a minimum.
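Putting the five steps together, a minimal NumPy sketch of the training loop for a two-layer network on toy regression data (the sizes, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # toy inputs
y = (X[:, :1] + X[:, 1:]) * 0.5      # toy target

W1, b1 = rng.normal(size=(2, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
lr = 0.1

for epoch in range(200):
    # 1. Forward propagation
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2
    # 2. Calculate the loss (MSE)
    loss = np.mean((y_hat - y) ** 2)
    # 3. Backward propagation (chain rule, output layer -> input layer)
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
    d_h = (d_yhat @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # 4. Update the weights (gradient descent step)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    # 5. Repeat until the loss converges
```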

What do you mean by transfer learning?

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing
knowledge gained while solving one problem and applying it to a different but related problem.
For example, knowledge gained while learning to recognize cars could apply when trying to
recognize trucks.
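A common transfer-learning sketch, assuming PyTorch and torchvision (0.13+) are available: load a network pretrained on ImageNet, freeze its feature extractor, and replace the final layer for the new, related task:

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet (assumes torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for the new task (e.g., 2 classes: cars vs. trucks)
model.fc = nn.Linear(model.fc.in_features, 2)
```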

What do you understand about the learning rate in a neural network ?


The learning rate is a hyperparameter in a neural network that determines the step size at which
the optimizer updates the weights of the network during training. In other words, the learning
rate controls how much the weights are adjusted in response to the gradient of the loss function.
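In code, the learning rate is simply the scalar multiplying the gradient in each update (the values here are illustrative):

```python
# One gradient-descent weight update: the learning rate scales the step
learning_rate = 0.01   # hyperparameter (illustrative value)
weight = 0.5
grad = 0.8             # gradient of the loss w.r.t. this weight
weight = weight - learning_rate * grad  # small lr -> small adjustment
```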

What happens if the learning rate is too high or too low?


If the learning rate is too high, the optimizer may take large steps during the weight update
process, causing the weights to overshoot the optimal values. This can result in the loss
function oscillating or diverging, making the network unstable and preventing it from converging
to a minimum. In extreme cases, the loss may explode to very high values, making the network
unusable.
On the other hand, if the learning rate is too low, the optimizer may take very small steps during
the weight update process, slowing down the training process and making it more likely to get
stuck in local minima. The network may also converge to a suboptimal solution, resulting in
lower accuracy or performance.

Is there any difference between deep learning and neural networks?

Yes. A neural network is a model built from layers of interconnected artificial neurons, and it
may be shallow (a single hidden layer) or deep. Deep learning is the subfield of machine
learning concerned with training neural networks that have many hidden layers, so every deep
learning model is a neural network, but not every neural network is deep.

What is dropout in deep learning? Why is dropout effective in deep networks?

Dropout is a regularization technique in deep learning that is used to prevent overfitting and
improve the generalization performance of the model. It works by randomly dropping out (i.e.,
setting to zero) some of the neurons in a layer during training, with a certain probability.
Dropout is effective in deep networks because it provides a simple and effective way to
regularize the network and prevent overfitting, while also promoting the learning of more robust
and generalizable features. It is widely used in many state-of-the-art deep learning models and
is considered a key tool in the deep learning practitioner's toolkit.
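A sketch of "inverted" dropout, the variant used in practice, in NumPy (the drop probability p is an illustrative value):

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling the survivors so the expected activation stays unchanged."""
    if not training:
        return activations  # dropout is disabled at inference time
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask

h = np.ones((2, 4))
print(dropout(h, p=0.5))  # about half the units zeroed, the rest scaled by 2
```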
