
DEEP LEARNING VIVA PREPARATION: BY PROF. VANDANA KATE

Introduction to Machine Learning:


With the help of Machine Learning, we can develop intelligent systems that are capable of taking
decisions on an autonomous basis. These algorithms learn from the past instances of data through
statistical analysis and pattern matching. Then, based on the learned data, it provides us with the
predicted results.

NEED OF MACHINE LEARNING

• Ever since the technical revolution, we’ve been generating an immeasurable amount of data. As
per research, we generate around 2.5 quintillion bytes of data every single day! It was estimated
that by 2020, 1.7 MB of data would be created every second for every person on earth.
• With the availability of so much data, it is finally possible to build predictive models that can
study and analyze complex data to find useful insights and deliver more accurate results.
• Top Tier companies such as Netflix and Amazon build such Machine Learning models by
using tons of data in order to identify profitable opportunities and avoid unwanted risks.

Here’s a list of reasons why Machine Learning is so important:

• Increase in Data Generation: Due to excessive production of data, we need a method that can
be used to structure, analyze and draw useful insights from data. This is where Machine
Learning comes in. It uses data to solve problems and find solutions to the most complex tasks
faced by organizations.
• Improve Decision Making: By making use of various algorithms, Machine Learning can be
used to make better business decisions. For example, Machine Learning is used to forecast
sales, predict downfalls in the stock market, identify risks and anomalies, etc.
DEEP LEARNING VIVA PREPARATION: BY PROF. VANDANA KATE

• Uncover patterns & trends in data: Finding hidden patterns and extracting key insights from
data is the most essential part of Machine Learning. By building predictive models and using
statistical techniques, Machine Learning allows you to dig beneath the surface and explore the
data at a minute scale. Understanding data and extracting patterns manually will take days,
whereas Machine Learning algorithms can perform such computations in less than a second.
• Solve complex problems: From detecting the genes linked to the deadly ALS disease to
building self-driving cars, Machine Learning can be used to solve the most complex problems.

To give you a better understanding of how important Machine Learning is, let’s list down a couple of
Machine Learning Applications:

• Netflix’s Recommendation Engine: The core of Netflix is its famous recommendation engine.
Over 75% of what you watch is recommended by Netflix, and these recommendations are made
by implementing Machine Learning.
• Facebook’s Auto-tagging feature: The logic behind Facebook’s DeepFace face verification
system is Machine Learning and Neural Networks. DeepFace studies the facial features in an
image to tag your friends and family.

Facebook and other social media platforms offer automatic friend-tagging suggestions.
Facebook uses face detection and image recognition to automatically find the face of the
person that matches its database, and then suggests tagging that person, based on DeepFace.

Facebook’s Deep Learning project DeepFace is responsible for recognizing faces and
identifying which person is in the picture. It also provides Alt Tags (Alternative Tags) for
images already uploaded on Facebook. For example, if we inspect such an image on Facebook,
the alt-tag carries a description.

• Amazon’s Alexa: The famous Alexa, which is based on Natural Language Processing and
Machine Learning, is an advanced virtual assistant that does more than just play songs from
your playlist. It can book you an Uber, connect with the other IoT devices at home, track your
health, etc.

A few of the major applications of Machine Learning here are:

• Speech Recognition
• Speech to Text Conversion
• Natural Language Processing
• Text to Speech Conversion

• Google’s Spam Filter: Gmail makes use of Machine Learning to filter out spam messages. It
uses Machine Learning algorithms and Natural Language Processing to analyze emails in real-
time and classify them as either spam or non-spam.
• Traffic Alert:

“Despite the heavy traffic, you are on the fastest route.” But how does it know that?

Google uses historic data for that route, collected over time, along with a few techniques acquired
from other companies. Everyone using Maps provides their location, average speed, and the route
they are traveling, which in turn helps Google collect massive data about the traffic, letting it
predict the upcoming traffic and adjust your route accordingly.

• Transportation and Commuting (Uber)


If you have used an app to book a cab, you are already using Machine Learning to an extent. The
app provides a personalized experience that is unique to you: it automatically detects your location
and provides options to go home, to the office, or to any other frequent place, based on your
history and patterns.

• Products Recommendations

Suppose you check an item on Amazon, but you do not buy it then and there. But the next day, you’re
watching videos on YouTube and suddenly you see an ad for the same item. You switch to Facebook,
there also you see the same ad. So how does this happen?

Well, this happens because Google tracks your search history, and recommends ads based on your
search history. This is one of the coolest applications of Machine Learning. In fact, 35% of
Amazon’s revenue is generated by Product Recommendations.

Self Driving Cars


Well, here is one of the coolest applications of Machine Learning: it’s here, and people are already
using it. Machine Learning plays a very important role in self-driving cars, and I’m sure you have
heard about Tesla, the leader in this business. Tesla’s current Artificial Intelligence is driven by
hardware manufacturer NVIDIA and is based on an unsupervised learning algorithm.

NVIDIA stated that they didn’t train their model to detect people or any object as such. The model
works on Deep Learning, and it crowdsources data from all of its vehicles and their drivers. It uses
internal and external sensors, which are part of the IoT. According to data gathered by McKinsey,
automotive data will hold a tremendous value of $750 billion.

Introduction To Machine Learning

The term Machine Learning was first coined by Arthur Samuel in the year 1959. Looking back, that
year was probably the most significant in terms of technological advancements.

If you browse the net for ‘what is Machine Learning’, you’ll get at least 100 different
definitions. However, the very first formal definition was given by Tom M. Mitchell:

“A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by P, improves with experience
E.”
In simple terms, Machine Learning is a subset of Artificial Intelligence (AI) that provides machines
the ability to learn automatically and improve from experience without being explicitly programmed
to do so. In that sense, it is the practice of getting machines to solve problems by gaining the ability
to think.

Machine Learning is the most popular technique for predicting the future or classifying information
to help people make necessary decisions. Machine Learning algorithms are trained on instances
or examples, through which they learn from past experience and analyze historical data.

Therefore, as it trains over the examples, again and again, it is able to identify patterns in order to
make predictions about the future.

Machine Learning Definitions


Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used to learn
patterns from data and draw significant information from it. It is the logic behind a Machine Learning
model. An example of a Machine Learning algorithm is the Linear Regression algorithm.

Model: A model is the main component of Machine Learning. A model is trained by using a
Machine Learning Algorithm. An algorithm maps all the decisions that a model is supposed to take
based on the given input, in order to get the correct output.

Predictor Variable: One or more features of the data that can be used to predict the output.

Response Variable: It is the feature or the output variable that needs to be predicted by using the
predictor variable(s).

Training Data: The Machine Learning model is built using the training data. The training data helps
the model to identify key trends and patterns essential to predict the output.

Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict
an outcome. This is done by the testing data set.
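To make the training/testing split concrete, here is a minimal sketch using scikit-learn’s train_test_split; the synthetic data and the 80/20 ratio are illustrative assumptions, not part of the original text.

```python
# Minimal sketch: splitting a dataset into training and testing sets.
# The synthetic data and the 80/20 split ratio are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 3)              # 100 samples, 3 predictor variables
y = np.random.randint(0, 2, size=100)   # binary response variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # hold out 20% as testing data

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```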

Figure: What Is Machine Learning? – Introduction To Machine Learning – Edureka

To sum it up, take a look at the above figure. A Machine Learning process begins by feeding the
machine lots of data, by using this data the machine is trained to detect hidden insights and trends.
These insights are then used to build a Machine Learning Model by using an algorithm in order to
solve a problem.

The next topic in this introduction is the Machine Learning Process.

Machine Learning Process


The Machine Learning process involves building a Predictive model that can be used to find a
solution for a Problem Statement. To understand the Machine Learning process let’s assume that you
have been given a problem that needs to be solved by using Machine Learning.

Figure: Machine Learning Process – Introduction To Machine Learning – Edureka

The problem is to predict the occurrence of rain in your local area by using Machine Learning.

The below steps are followed in a Machine Learning process:

Step 1: Define the objective of the Problem Statement



At this step, we must understand what exactly needs to be predicted. In our case, the objective is to
predict the possibility of rain by studying weather conditions. At this stage, it is also essential to take
mental notes on what kind of data can be used to solve this problem or the type of approach you must
follow to get to the solution.

Step 2: Data Gathering

At this stage, you must be asking questions such as,

• What kind of data is needed to solve this problem?


• Is the data available?
• How can I get the data?

Once you know the type of data that is required, you must understand how you can derive this data.
Data collection can be done manually or by web scraping. However, if you’re a beginner just
looking to learn Machine Learning, you don’t have to worry about getting the data; there are
thousands of data resources on the web, so you can simply download a data set and get going.

Coming back to the problem at hand, the data needed for weather forecasting includes measures such
as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such
data must be collected and stored for analysis.

Step 3: Data Preparation

The data you collected is almost never in the right format. You will encounter a lot of inconsistencies
in the data set such as missing values, redundant variables, duplicate values, etc. Removing such
inconsistencies is very essential because they might lead to wrongful computations and predictions.
Therefore, at this stage, you scan the data set for any inconsistencies and you fix them then and there.

Step 4: Exploratory Data Analysis

Grab your detective glasses because this stage is all about diving deep into data and finding all the
hidden data mysteries. EDA or Exploratory Data Analysis is the brainstorming stage of Machine
Learning. Data Exploration involves understanding the patterns and trends in the data. At this stage,
all the useful insights are drawn and correlations between the variables are understood.

For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if the
temperature has fallen low. Such correlations must be understood and mapped at this stage.

Step 5: Building a Machine Learning Model

All the insights and patterns derived during Data Exploration are used to build the Machine Learning
Model. This stage always begins by splitting the data set into two parts: training data and testing
data. The training data will be used to build and analyze the model. The logic of the model is based
on the Machine Learning Algorithm that is being implemented.

In the case of predicting rainfall, since the output will be in the form of True (if it will rain tomorrow)
or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic Regression.
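As a hedged sketch of this idea, the snippet below trains a Logistic Regression classifier on a toy weather table; the feature names, values, and labels are hypothetical placeholders, not a real dataset.

```python
# Toy sketch: Logistic Regression as a rain/no-rain classifier.
# Features (humidity %, temperature C, pressure hPa) are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[85, 22, 1002],
                    [40, 31, 1015],
                    [90, 19,  998],
                    [35, 33, 1018]])
y_train = np.array([1, 0, 1, 0])        # 1 = rain, 0 = no rain

model = LogisticRegression().fit(X_train, y_train)
print(model.predict([[80, 21, 1000]]))  # e.g., [1] -> rain likely
```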

Choosing the right algorithm depends on the type of problem you’re trying to solve, the data set and
the level of complexity of the problem. In the upcoming sections, we will discuss the different types
of problems that can be solved by using Machine Learning.

Step 6: Model Evaluation & Optimization

After building a model by using the training data set, it is finally time to put the model to a test. The
testing data set is used to check the efficiency of the model and how accurately it can predict the
outcome. Once the accuracy is calculated, any further improvements in the model can be
implemented at this stage. Methods like parameter tuning and cross-validation can be used to
improve the performance of the model.

Step 7: Predictions

Once the model is evaluated and improved, it is finally used to make predictions. The final output can
be a Categorical variable (eg. True or False) or it can be a Continuous Quantity (eg. the predicted
value of a stock).

In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.

So that was the entire Machine Learning process. Now it’s time to learn about the different ways in
which Machines can learn.

QUESTION AND ANSWERS

1. What is Deep Learning?

Deep Learning involves taking large volumes of structured or unstructured data and using complex
algorithms to train neural networks. It performs complex operations to extract hidden patterns and
features (for instance, distinguishing the image of a cat from that of a dog).

2. What is a Neural Network?

Neural Networks replicate the way humans learn, inspired by how the neurons in our brains fire, only
much simpler.

The most common Neural Networks consist of three network layers:

1. An input layer
2. A hidden layer (this is the most important layer where feature extraction takes place, and
adjustments are made to train faster and function better)
3. An output layer
Each layer contains neurons called “nodes,” which perform various operations. Neural Networks are
used in deep learning algorithms like CNN, RNN, GAN, etc.

3. What Is a Multi-layer Perceptron(MLP)?

As in other Neural Networks, MLPs have an input layer, a hidden layer, and an output layer. An MLP
has the same structure as a single-layer perceptron but with one or more hidden layers. A single-layer
perceptron can classify only linearly separable classes with binary output (0, 1), but an MLP can
classify nonlinear classes.

Except for the input layer, each node in the other layers uses a nonlinear activation function: the
inputs are multiplied by the weights, summed together with the bias, and passed through the
activation function to produce the output. MLP uses a supervised learning method called
“backpropagation.” In backpropagation, the neural network calculates the error with the help of a cost
function and propagates this error backward from where it came, adjusting the weights to train the
model more accurately.

4. What Is Data Normalization, and Why Do We Need It?

The process of standardizing and reforming data is called “Data Normalization.” It’s a pre-processing
step to eliminate data redundancy. Often, data comes in, and you get the same information in
different formats. In these cases, you should rescale values to fit into a particular range, achieving
better convergence.

5. What is the Boltzmann Machine?

One of the most basic Deep Learning models is the Boltzmann Machine, resembling a simplified
version of the Multi-Layer Perceptron. This model features a visible input layer and a hidden layer:
just a two-layer neural net that makes stochastic decisions as to whether a neuron should be on or off.
Nodes are connected across layers, but no two nodes of the same layer are connected.

6. What Is the Role of Activation Functions in a Neural Network?



At the most basic level, an activation function decides whether a neuron should fire or not. It
accepts the weighted sum of the inputs plus a bias as its input. The step function, Sigmoid, ReLU,
Tanh, and Softmax are examples of activation functions.
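For intuition, here is a minimal NumPy sketch of the activation functions named above (a sketch for study purposes, not a production implementation):

```python
# Minimal NumPy implementations of common activation functions.
import numpy as np

def step(z):    return np.where(z >= 0, 1, 0)    # binary threshold
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)
def relu(z):    return np.maximum(0, z)          # 0 for negative inputs

def softmax(z):                                  # outputs sum to 1
    e = np.exp(z - np.max(z))                    # subtract max for stability
    return e / e.sum()

z = np.array([-1.0, 0.5, 2.0])
print(relu(z), np.tanh(z), softmax(z))
```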

7. What Is the Cost Function?

Also referred to as “loss” or “error,” the cost function is a measure of how good your model’s
performance is. It’s used to compute the error of the output layer during backpropagation. We push
that error backward through the neural network and use it during the different training iterations.

8. What Is Gradient Descent?

Gradient Descent is an optimization algorithm used to minimize the cost function, i.e., to minimize
the error. The aim is to find the local or global minimum of a function. The gradient determines the
direction the model should take to reduce the error.
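A toy sketch of the idea, minimizing J(theta) = theta^2 whose gradient is 2*theta; the learning rate and step count are arbitrary choices:

```python
# Toy gradient descent: step against the gradient of J(theta) = theta^2.
def gradient_descent(lr=0.1, steps=50):
    theta = 5.0                  # arbitrary starting point
    for _ in range(steps):
        grad = 2 * theta         # dJ/dtheta for J(theta) = theta^2
        theta -= lr * grad       # move in the direction that reduces the error
    return theta

print(gradient_descent())  # approaches the minimum at theta = 0
```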

9. What Do You Understand by Backpropagation?

Backpropagation is a technique to improve the performance of the network. It propagates the error
backward through the network and updates the weights to reduce the error.

10. What Is the Difference Between a Feedforward Neural Network and Recurrent Neural Network?

In a Feedforward Neural Network, signals travel in one direction, from input to output. There are no
feedback loops; the network considers only the current input. It cannot memorize previous inputs
(CNNs are an example).

A Recurrent Neural Network’s signals travel in both directions, creating a looped network. It
considers the current input with the previously received inputs for generating the output of a layer
and can memorize past data due to its internal memory.

11. What Are the Applications of a Recurrent Neural Network (RNN)?

The RNN can be used for sentiment analysis, text mining, and image captioning. Recurrent Neural
Networks can also address time series problems such as predicting the prices of stocks in a month or
quarter.

12. What Are the Softmax and ReLU Functions?

Softmax is an activation function that generates outputs between zero and one. It normalizes each
output such that the total sum of the outputs is equal to one. Softmax is often used for output layers.

ReLU (or Rectified Linear Unit) is the most widely used activation function. It gives an output of X if
X is positive and 0 otherwise. ReLU is often used for hidden layers.

13. What Are Hyperparameters?

With neural networks, you’re usually working with hyperparameters once the data is formatted
correctly. A hyperparameter is a parameter whose value is set before the learning process begins. It
determines how a network is trained and the structure of the network (such as the number of hidden
units, the learning rate, epochs, etc.).

14. What Will Happen If the Learning Rate Is Set Too Low or Too High?

When your learning rate is too low, training of the model will progress very slowly as we are making
minimal updates to the weights. It will take many updates before reaching the minimum point.

If the learning rate is set too high, the loss function shows undesirable divergent behavior due to
drastic updates in the weights. The model may fail to converge to a good output, or may even
diverge if the updates are too chaotic for the network to train.

15. What Is Dropout and Batch Normalization?

Dropout is a technique of dropping out hidden and visible units of a network randomly to prevent
overfitting of data (typically dropping 20 percent of the nodes). It doubles the number of iterations
needed to converge the network.

Batch normalization is the technique to improve the performance and stability of neural networks by
normalizing the inputs in every layer so that they have mean output activation of zero and standard
deviation of one.
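A hedged Keras sketch of where Dropout and BatchNormalization typically sit in a small network; the layer sizes are arbitrary, and the 20 percent rate mirrors the text above:

```python
# Sketch: Dropout and BatchNormalization in a small Keras model.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    BatchNormalization(),  # normalize the previous layer's activations
    Dropout(0.2),          # randomly drop 20% of the units during training
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```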

16. What Is the Difference Between Batch Gradient Descent and Stochastic Gradient Descent?

Batch Gradient Descent:

• Computes the gradient using the entire dataset.
• Takes time to converge because the volume of data is huge and the weights update slowly.

Stochastic Gradient Descent:

• Computes the gradient using a single sample.
• Converges much faster than batch gradient descent because it updates the weights more frequently.

17. What is Overfitting and Underfitting, and How to Combat Them?

Overfitting occurs when the model learns the details and noise in the training data to the degree that it
adversely impacts the performance of the model on new data. It is more likely to occur with
nonlinear models that have more flexibility when learning a target function. For example, if a model
is looking at cars and trucks but only recognizes trucks that have a specific box shape, it might not
be able to recognize a flatbed truck, because it saw only one particular kind of truck in training.
The model performs well on training data, but not in the real world.

Underfitting refers to a model that is neither well-trained on the data nor able to generalize to new
data. This usually happens when there is too little or incorrect data to train the model. An underfit
model shows both poor performance and poor accuracy.

To combat overfitting and underfitting, you can resample the data to estimate the model accuracy
(e.g., k-fold cross-validation) and keep a separate validation dataset to evaluate the model.
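For example, a minimal scikit-learn sketch of k-fold cross-validation (synthetic data, five folds chosen arbitrarily):

```python
# Sketch: estimating model accuracy with k-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)  # k = 5 folds
print(scores.mean())  # average accuracy across the folds
```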

18. How Are Weights Initialized in a Network?

There are two methods here: we can either initialize the weights to zero or assign them randomly.

Initializing all weights to 0: This makes your model similar to a linear model. All the neurons and
every layer perform the same operation, giving the same output and making the deep net useless.

Initializing all weights randomly: Here, the weights are assigned randomly by initializing them very
close to 0. It gives better accuracy to the model since every neuron performs different computations.
This is the most commonly used method.

19. What Are the Different Layers on CNN?

There are four layers in CNN:

1. Convolutional Layer - the layer that performs a convolutional operation, creating several smaller
picture windows to go over the data.
2. ReLU Layer - it brings non-linearity to the network and converts all the negative pixels to zero.
The output is a rectified feature map.
3. Pooling Layer - pooling is a down-sampling operation that reduces the dimensionality of the
feature map.
4. Fully Connected Layer - this layer recognizes and classifies the objects in the image.
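A minimal Keras sketch wiring these four layers together; the input shape and filter counts are illustrative assumptions:

```python
# Sketch: the four CNN layers in order - convolution (+ ReLU), pooling,
# then a fully connected classifier.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # conv + ReLU
    MaxPooling2D((2, 2)),             # pooling: down-samples the feature maps
    Flatten(),                        # unroll feature maps into a vector
    Dense(10, activation="softmax"),  # fully connected classification layer
])
model.summary()
```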

20. What is Pooling on CNN, and How Does It Work?

Pooling is used to reduce the spatial dimensions of a CNN. It performs down-sampling operations to
reduce the dimensionality and creates a pooled feature map by sliding a filter matrix over the input
matrix.
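A toy NumPy sketch of 2x2 max pooling with stride 2 (the input matrix is made up):

```python
# Toy sketch: 2x2 max pooling with stride 2 over a 4x4 input.
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 8]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = x[2*i:2*i+2, 2*j:2*j+2]  # slide a 2x2 window
        pooled[i, j] = window.max()       # keep the maximum in each window

print(pooled)  # [[6. 4.] [7. 9.]]
```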

22. What Are Vanishing and Exploding Gradients?

While training an RNN, your slope can become either too small or too large; this makes the training
difficult. When the slope is too small, the problem is known as a “Vanishing Gradient.” When the
slope tends to grow exponentially instead of decaying, it’s referred to as an “Exploding Gradient.”
Gradient problems lead to long training times, poor performance, and low accuracy.

23. What Is the Difference Between Epoch, Batch, and Iteration in Deep Learning?

• Epoch - Represents one iteration over the entire dataset (everything put into the training model).
• Batch - Refers to when we cannot pass the entire dataset into the neural network at once, so we
divide the dataset into several batches.

• Iteration - If we have 10,000 images as data and a batch size of 200, then an epoch runs 50
iterations (10,000 divided by 200).
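The arithmetic is simply the dataset size divided by the batch size:

```python
# Iterations per epoch = dataset size / batch size.
dataset_size = 10_000
batch_size = 200
print(dataset_size // batch_size)  # 50 iterations per epoch
```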

24. Why is Tensorflow the Most Preferred Library in Deep Learning?

Tensorflow provides both C++ and Python APIs, making it easier to work with, and it has a faster
compilation time compared to other Deep Learning libraries like Keras and Torch. Tensorflow
supports both CPU and GPU computing devices.

25. What Do You Mean by Tensor in Tensorflow?

A tensor is a mathematical object represented as an array of higher dimensions. These arrays of data,
with different dimensions and ranks, fed as input to the neural network are called “Tensors.”
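A small TensorFlow sketch of tensors of different ranks:

```python
# Sketch: tensors of rank 0, 1, and 2 in TensorFlow.
import tensorflow as tf

scalar = tf.constant(3.0)              # rank 0
vector = tf.constant([1.0, 2.0])       # rank 1
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])     # rank 2

print(scalar.shape, vector.shape, matrix.shape)  # () (2,) (2, 2)
```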

28. Explain Generative Adversarial Network.

Suppose there is a wine shop purchasing wine from dealers, which they resell later. But some dealers
sell fake wine. In this case, the shop owner should be able to distinguish between fake and authentic
wine.

The forger will try different techniques to sell fake wine and make sure specific techniques go past
the shop owner’s check. The shop owner would probably get some feedback from wine experts that
some of the wine is not original. The owner would have to improve how he determines whether a
wine is fake or authentic.

The forger’s goal is to create wines that are indistinguishable from the authentic ones, while the shop
owner intends to accurately tell whether the wine is real or fake.

Let us map this example onto the GAN setup. A noise vector comes into the forger, who generates
the fake wine.

Here the forger acts as a Generator.

The shop owner acts as a Discriminator.

The Discriminator gets two inputs; one is the fake wine, while the other is the real authentic wine.
The shop owner has to figure out whether it is real or fake.

So, there are two primary components of Generative Adversarial Network (GAN) named:

1. Generator
2. Discriminator
The generator is a CNN that keeps producing images that grow closer in appearance to the real
images, while the discriminator tries to determine the difference between real and fake images. The
ultimate aim is to make the discriminator learn to identify real and fake images.

29. What Is an Auto-encoder?



This Neural Network has three layers, in which the number of input neurons is equal to the number
of output neurons. The network's target output is the same as its input. It uses dimensionality
reduction to restructure the input: it works by compressing the input to a latent-space representation
and then reconstructing the output from this representation.
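A hedged Keras sketch of such a network, with matching input and output sizes and a small bottleneck; the 784/32 dimensions are illustrative (e.g., flattened 28x28 images):

```python
# Sketch: a minimal autoencoder whose target output equals its input.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation="relu", input_shape=(784,)),  # encoder -> latent space
    Dense(784, activation="sigmoid"),                  # decoder reconstructs input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()  # trained with fit(X, X): the input is also the target
```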

30. What Is Bagging and Boosting?

Bagging and Boosting are ensemble techniques that train multiple models using the same learning
algorithm and then combine their results.

With Bagging, we take a dataset and split it into training data and test data. Then we randomly select
samples to place into the bags and train a model on each bag separately.

With Boosting, the emphasis is on selecting the data points that give wrong outputs, in order to
improve the accuracy.

31. What are 3 major categories of neural networks?

The three most important types of neural networks are Artificial Neural Networks (ANN),
Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).

32. What is neural network and its types?

Neural Networks are artificial networks used in Machine Learning that work in a fashion similar to
the human nervous system. Many units are connected in various ways so that the network mimics
and works like the human brain. Neural networks are basically used in computational models.

33. What is CNN and DNN?



A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the
input and output layers. They can model complex non-linear relationships. Convolutional Neural
Networks (CNN) are an alternative type of DNN that allow modelling both time and space
correlations in multivariate signals.

34. How does CNN differ from ANN?

CNN is a specific kind of ANN that has one or more layers of convolutional units. The class
of ANNs covers several architectures, including Convolutional Neural Networks (CNN), Recurrent
Neural Networks (RNN) such as LSTM and GRU, Autoencoders, and Deep Belief Networks.

35. Why is CNN better than MLP?

A Multilayer Perceptron (MLP) is great for MNIST, which is a simpler and more straightforward
dataset, but it lags behind CNN when it comes to real-world computer vision applications,
specifically image classification.

36. How Does an LSTM Network Work?

Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning
long-term dependencies; remembering information for long periods is its default behavior. There are
three steps in an LSTM network:

• Step 1: The network decides what to forget and what to remember.
• Step 2: It selectively updates cell state values.
• Step 3: The network decides what part of the current state makes it to the output.
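A minimal Keras sketch of an LSTM model; the shapes (sequences of 10 timesteps with 8 features) and unit counts are illustrative assumptions:

```python
# Sketch: an LSTM layer whose gates implement the three steps above.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(16, input_shape=(10, 8)),   # gates decide what to forget/keep/output
    Dense(1, activation="sigmoid"),  # e.g., one prediction per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```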

37. Recurrent Neural Networks

Applications of Recurrent Neural Networks

• Text processing like auto suggest, grammar checks, etc.


• Text to speech processing

• Image tagger
• Sentiment Analysis
• Translation

A Recurrent Neural Network is designed to save the output of a layer and feed it back to the
input, to help predict the outcome of the layer. The first layer is typically a feedforward neural
network, followed by a recurrent layer in which some information from the previous time-step is
remembered by a memory function. Forward propagation is implemented in this case, and the
network stores information required for future use. If the prediction is wrong, the learning rate is
employed to make small changes, so that the network gradually moves toward making the right
prediction during backpropagation.

Advantages of Recurrent Neural Networks

1. They can model sequential data, where each sample can be assumed to depend on historical
samples.
2. They can be used with convolutional layers to extend the effective pixel neighbourhood.

Disadvantages of Recurrent Neural Networks

1. Gradient vanishing and exploding problems.
2. Training recurrent neural nets can be a difficult task.
3. It is difficult to process long sequential data using ReLU as an activation function.

38. Improvement over RNN: LSTM (Long Short-Term Memory) Networks

LSTM networks are a type of RNN that use special units in addition to standard units. LSTM units
include a ‘memory cell’ that can maintain information in memory for long periods of time. A set of
gates is used to control when information enters the memory, when it is output, and when it is
forgotten. There are three types of gates: the input gate, the output gate, and the forget gate. The
input gate decides how much information from the last sample is kept in memory; the output gate
regulates the amount of data passed to the next layer; and the forget gate controls the rate at which
stored memory is erased. This architecture lets LSTMs learn longer-term dependencies.

39. Sequence to sequence models

A sequence to sequence model consists of two Recurrent Neural Networks: an encoder that processes
the input and a decoder that processes the output. The encoder and decoder work simultaneously,
either using the same parameters or different ones. This model, in contrast to the plain RNN, is
particularly applicable in cases where the length of the input data is not equal to the length of the
output data. While they share similar benefits and limitations with RNNs, these models are applied
mainly in chatbots, machine translation, and question-answering systems.

40. Convolutional Neural Network

Applications on Convolution Neural Network



• Image processing
• Computer Vision
• Speech Recognition
• Machine translation

A convolutional neural network contains a three-dimensional arrangement of neurons instead of the
standard two-dimensional array. The first layer is called the convolutional layer. Each neuron in the
convolutional layer only processes information from a small part of the visual field. Input features
are taken in batches, like a filter. The network understands the images in parts and can compute
these operations multiple times to complete the full image processing. Processing involves
converting the image from the RGB or HSI scale to grayscale. Further changes in the pixel values
then help to detect the edges, and images can be classified into different categories.

Propagation is unidirectional: a CNN contains one or more convolutional layers followed by
pooling, and the output of the convolutional layers goes to a fully connected neural network for
classifying the images. Filters are used to extract certain parts of the image. In an MLP, the inputs
are multiplied by weights and fed to the activation function. Convolutional layers use ReLU, while
an MLP uses a nonlinear activation function followed by softmax. Convolutional neural networks
show very effective results in image and video recognition, semantic parsing, and paraphrase
detection.

Advantages of Convolutional Neural Networks:

1. Useful for deep learning with few parameters.
2. Fewer parameters to learn compared to a fully connected layer.

Disadvantages of Convolutional Neural Networks:

• Comparatively complex to design and maintain.
• Comparatively slow (depending on the number of hidden layers).

41. What is the difference between a Perceptron and Logistic Regression?

A Multi-Layer Perceptron (MLP) is one of the most basic neural networks that we use for
classification. For a binary classification problem, we know that the output can be either 0 or 1. This
is just like our simple logistic regression, where we use a logit function to generate a probability
between 0 and 1.

So, what’s the difference between the two?

Simply put, it is just the difference in the threshold function! When we restrict the logistic regression
model to give us either exactly 1 or exactly 0, we get a Perceptron model:

42. Can we have the same bias for all neurons of a hidden layer?

Essentially, you can have a different bias value at each layer or at each neuron as well. However, it is
best if we have a bias matrix for all the neurons in the hidden layers as well.

A point to note is that both these strategies would give you very different results.

43. What if we do not use any activation function(s) in a neural network?

The main aim of this question is to understand why we need activation functions in a neural network.
You can start off by giving a simple explanation of how neural networks are built:

Step 1: Calculate the sum of all the inputs (X) according to their weights and include the bias term:

Z = (weights * X) + bias

Step 2: Apply an activation function to calculate the expected output:

Y = Activation(Z)

Steps 1 and 2 are performed at each layer. If you recollect, this is nothing but forward propagation!
Now, what if there is no activation function?

Our equation for Y essentially becomes:

Y = Z = (weights * X) + bias

Wait – isn’t this just a simple linear equation? Yes – and that is why we need activation functions. A
linear equation will not be able to capture the complex patterns in the data – this is even more evident
in the case of deep learning problems.

In order to capture non-linear relationships, we use activation functions, and that is why a neural
network without an activation function is just a linear regression model.
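This collapse is easy to demonstrate numerically; the toy check below shows that two stacked linear layers are exactly equivalent to one linear layer (the matrix shapes are arbitrary):

```python
# Toy check: without activations, W2 @ (W1 @ x) == (W2 @ W1) @ x,
# so a "deep" network collapses to a single linear map.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.random((4, 3)), rng.random((2, 4))
x = rng.random(3)

two_layers = W2 @ (W1 @ x)    # two layers, no activation in between
one_layer = (W2 @ W1) @ x     # one equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True
```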

44. In a neural network, what if all the weights are initialized with the same value?

In simplest terms, if all the neurons have the same value of weights, each hidden unit will get exactly
the same signal. While this might work during forward propagation, the derivative of the cost
function during backward propagation would be the same every time.

In short, there is no learning happening by the network! What do you call the phenomenon of the
model being unable to learn any patterns from the data? Yes, underfitting.

Therefore, if all weights have the same initial value, this would lead to underfitting.

Note: This question might further lead to questions on exploding and vanishing gradients, which I
have covered below.

45. List the supervised and unsupervised tasks in Deep Learning.

Now, this can be one tricky question. There might be a misconception that deep learning can only
solve unsupervised learning problems. This is not the case. Some examples of supervised deep
learning tasks include:

• Image classification
• Text classification
• Sequence tagging

On the other hand, there are some unsupervised deep learning techniques as well:

• Word embeddings (like Skip-gram and Continuous Bag of Words): Understanding Word
Embeddings: From Word2Vec to Count Vectors
• Autoencoders: Learn How to Enhance a Blurred Image using an Autoencoder!

46. What is the role of weights and bias in a neural network?

This is a question best explained with a real-life example. Consider that you want to go out today to
play a cricket match with your friends. Now, a number of factors can affect your decision-making,
like:

• How many of your friends can make it to the game?


• How much equipment can all of you bring?
• What is the temperature outside?

And so on. These factors can change your decision greatly or not too much. For example, if it is
raining outside, then you cannot go out to play at all. Or if you have only one bat, you can share it
while playing as well. The magnitude by which these factors can affect the game is called the weight
of that factor.

Factors like the weather or temperature might have a higher weight, and other factors like equipment
would have a lower weight.

However, does this mean that we can play a cricket match with only one bat? No – we would need 1
ball and 6 wickets as well. This is where bias comes into the picture. Bias lets you assign some
threshold which helps you activate a decision-point (or a neuron) only when that threshold is crossed.

47. How does forward propagation and backpropagation work in deep learning?

Now, this can be answered in two ways. If you are on a phone interview, you cannot perform all the
calculus in writing and show the interviewer. In such cases, it is best to explain it as follows:

• Forward propagation: The inputs are provided with weights to the hidden layer. At each
hidden layer, we calculate the output of the activation at each node, and this propagates further
to the next layer until the final output layer is reached. Since we start from the inputs and move
toward the final output layer, we move forward; hence it is called forward propagation.
• Backpropagation: We minimize the cost function by understanding how it changes as we
change the weights and biases in the neural network. This change is obtained by calculating the
gradient at each hidden layer (using the chain rule). Since we start from the final cost function
and go back through each hidden layer, we move backward; hence it is called backpropagation.

For an in-person interview, it is best to take up the marker, create a simple neural network with 2
inputs, a hidden layer, and an output layer, and explain it.

Forward propagation, then backpropagation at layer L2 (for all weights) and then at layer L1 (for all
weights), were illustrated with worked equations in the original figures. You need not explain with
respect to the bias terms as well, though you might need to expand the equations by substituting the
actual derivatives.
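As a hedged stand-in for those figures, here is a NumPy sketch of one forward and one backward pass for such a network (2 inputs, one hidden layer, one output), assuming sigmoid activations and squared error; all values are illustrative:

```python
# Sketch: one forward and backward pass for a tiny 2-input network.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.1]); y = 1.0        # one training example (made up)
W1 = np.array([[0.2, 0.4], [0.3, 0.1]])  # hidden layer weights (L1)
W2 = np.array([0.6, 0.9])                # output layer weights (L2)

# forward propagation
h = sigmoid(W1 @ x)                      # hidden activations
y_hat = sigmoid(W2 @ h)                  # network output

# backpropagation via the chain rule, layer by layer
delta_out = (y_hat - y) * y_hat * (1 - y_hat)  # error at the output
grad_W2 = delta_out * h                        # gradients for L2 weights
delta_hid = delta_out * W2 * h * (1 - h)       # error pushed back to L1
grad_W1 = np.outer(delta_hid, x)               # gradients for L1 weights

lr = 0.5
W2 -= lr * grad_W2                             # gradient descent updates
W1 -= lr * grad_W1
print(y_hat)
```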

48. Why should we use Batch Normalization?

Batch Normalization is one of the techniques used for reducing the training time of a deep learning
algorithm. Just as normalizing our input helps improve our logistic regression model, we can
normalize the activations of the hidden layers in our deep learning model as well: we basically
normalize the activations a[1] and a[2]. This means we normalize the inputs to a layer and then
apply the activation functions to the normalized inputs.

Here is an article that explains Batch Normalization and other techniques for improving Neural
Networks: Neural Networks – Hyperparameter Tuning, Regularization & Optimization.

49. List the activation functions you have used so far in your projects and how you would
choose one.

The most common activation functions are:

• Sigmoid
• Tanh
• ReLU
• Softmax

While it is not important to know all the activation functions, you can always score points by
knowing the range of these functions and how they are used. Here is a handy table for you to follow:
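As a quick reference, the standard output ranges are:

• Sigmoid: range (0, 1); commonly used for binary classification output layers.
• Tanh: range (-1, 1); zero-centered, often used in hidden layers.
• ReLU: range [0, ∞); the default choice for hidden layers.
• Softmax: outputs in (0, 1) that sum to 1; used for multi-class output layers.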

Here is a great guide on how to use these and other activation functions: Fundamentals of Deep
Learning – Activation Functions and When to Use Them?

50. Why does a Convolutional Neural Network (CNN) work better with image data?

The key to this question lies in the Convolution operation. Unlike humans, the machine sees the
image as a matrix of pixel values. Instead of interpreting a shape like a petal or an ear, it just
identifies curves and edges.

Thus, instead of looking at the entire image, it helps to just read the image in parts. Doing this for a
300 x 300 pixel image would mean dividing the matrix into smaller 3 x 3 matrices and dealing with
them one by one. This is convolution.

Mathematically, we just perform a small operation on the matrix to help us detect features in the
image – like boundaries, colors, etc.

Z=X*f

Here, we are convolving (* operation – not multiplication) the input matrix X with another small
matrix f, called the kernel/filter to create a new matrix Z. This matrix is then passed on to the other
layers.

If you have a board or screen in front of you, you can always illustrate this with a simple example.
You can learn more about how CNNs work here.

51. Why do RNNs work better with text data?



The main component that differentiates Recurrent Neural Networks (RNN) from the other models is
the addition of a loop at each node. This loop brings the recurrence mechanism in RNNs. In a basic
Artificial Neural Network (ANN), each input is given the same weight and fed to the network at the
same time. So, for a sentence like “I saw the movie and hated it”, it would be difficult to capture the
information which associates “it” with the “movie”.

The addition of a loop is to denote preserving the previous node’s information for the next node, and
so on. This is why RNNs are much better for sequential data, and since text data also is sequential in
nature, they are an improvement over ANNs.

52. In a CNN, if the input size 5 X 5 and the filter size is 7 X 7, then what would be the size of the
output?

This is a pretty intuitive answer. As we saw above, we perform the convolution on ‘x’ one step at a
time, to the right, and in the end, we got Z with dimensions 2 X 2, for X with dimensions 3 X 3.

Thus, to make the input size compatible with the filter size, we make use of padding: adding 0s to
the input matrix so that its new size becomes at least 7 X 7. The output size is then given by the
formula:

Dimension of image = (n, n) = 5 X 5

Dimension of filter = (f,f) = 7 X 7

Padding = 1 (adding 1 pixel with value 0 all around the edges)

Dimension of output will be (n+2p-f+1) X (n+2p-f+1) = 1 X 1
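The formula can be wrapped in a small helper to check such cases (the stride parameter is an added generalization, defaulting to 1):

```python
# Helper encoding the output-size formula (n + 2p - f)/s + 1.
def conv_output_size(n, f, p=0, s=1):
    """n: input size, f: filter size, p: padding, s: stride."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 7, p=1))  # 1, matching the worked example above
print(conv_output_size(3, 2))       # 2, the 3x3 input / 2x2 filter case
```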

53. What’s the difference between valid and same padding in a CNN?

This question has more chances of being a follow-up question to the previous one. Or if you have
explained how you used CNNs in a computer vision task, the interviewer might ask this question
along with the details of the padding parameters.

• Valid Padding: When we do not use any padding. The resultant matrix after convolution will
have dimensions (n – f + 1) X (n – f + 1)
• Same padding: Adding padded elements all around the edges such that the output matrix will
have the same dimensions as that of the input matrix
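A quick Keras check of the two padding modes (the 5x5 input and 3x3 filter are arbitrary):

```python
# Sketch: output shapes under 'valid' vs 'same' padding in Keras.
from tensorflow.keras.layers import Input, Conv2D

inp = Input(shape=(5, 5, 1))
valid = Conv2D(1, (3, 3), padding="valid")(inp)  # (n - f + 1) -> 3x3
same  = Conv2D(1, (3, 3), padding="same")(inp)   # matches the input -> 5x5
print(valid.shape, same.shape)  # (None, 3, 3, 1) (None, 5, 5, 1)
```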

54. What do you mean by exploding and vanishing gradients?

The key here is to make the explanation as simple as possible. As we know, the gradient descent
algorithm tries to minimize the error by taking small steps towards the minimum value. These steps
are used to update the weights and biases in a neural network.

However, at times, the steps become too large and this results in larger updates to weights and bias
terms – so much so as to cause an overflow (or a NaN) value in the weights. This leads to an unstable
algorithm and is called an exploding gradient.

On the other hand, when the steps are too small, there are only minimal changes in the weights and
bias terms, even negligible changes at times. We thus might end up training a deep learning model
with almost the same weights and biases each time and never reach the minimum error. This is
called the vanishing gradient.

A point to note is that both these issues are specifically evident in Recurrent Neural Networks – so be
prepared for follow-up questions on RNN!

55. What are the applications of transfer learning in Deep Learning?

I am sure you would have a doubt as to why a relatively simple question was included in the
Intermediate Level. The reason is the sheer volume of subsequent questions it can generate!

The use of transfer learning has been one of the key milestones in deep learning. Training a large
model on a huge dataset, and then using the final parameters on smaller simpler datasets has led to
defining breakthroughs in the form of Pretrained Models. Be it Computer Vision or NLP, pretrained
models have become the norm in research and in the industry.

56. How backpropagation is different in RNN compared to ANN?

In Recurrent Neural Networks, we have an additional loop at each node. This loop essentially
includes a time component in the network as well. It helps capture sequential information from the
data, which would not be possible in a generic artificial neural network.

This is why backpropagation in RNNs is called Backpropagation Through Time: we perform
backpropagation at each time step.

You can find a detailed explanation of RNNs here: Fundamentals of Deep Learning – Introduction to
Recurrent Neural Networks.

57. How does LSTM solve the vanishing gradient challenge?

The LSTM model is considered a special case of RNN. The vanishing and exploding gradient
problems we saw earlier are a disadvantage of the plain RNN model.

In LSTMs, we add a forget gate, which is basically a memory unit that retains information across
timesteps and discards the information that is not needed. This also necessitates input and output
gates that take the results of the forget gate into account.

58. Why is GRU faster as compared to LSTM?

As you can see, the LSTM model can become quite complex. In order to retain the functionality of
keeping information across time without making the model overly complex, we need GRUs.

Basically, in GRUs, instead of having an additional Forget gate, we combine the input and Forget
gates into a single Update Gate:

It is this reduction in the number of gates that makes GRU less complex and faster than LSTM. You
can learn about GRUs, LSTMs and other sequence models in detail here: Must-Read Tutorial to
Learn Sequence Modeling & Attention Models.

59. What is Deep Reinforcement Learning?

Deep reinforcement learning combines artificial neural networks with a reinforcement learning
architecture that enables software-defined agents to learn the best possible actions in a virtual
environment in order to attain their goals; in other words, it is the combination of reinforcement
learning (RL) and deep learning. This field of research has been able to solve a wide range of
complex decision-making tasks that were previously out of reach for a machine. Deep RL thus
opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and
many more. Research in this area focuses in particular on generalization and on how deep RL can
be used for practical applications.

60. Explain the Autoencoder architecture.

An autoencoder is a neural network architecture capable of discovering structure within data in order
to develop a compressed representation of the input; it learns how to compress the data based on its
attributes. Autoencoders are an unsupervised learning technique in which we leverage neural
networks for the task of representation learning. Specifically, we design a neural network
architecture such that we impose a bottleneck in the network, which forces a compressed knowledge
representation of the original input. If the input features were each independent of one another, this
compression and subsequent reconstruction would be a very difficult task. However, if some sort of
structure exists in the data (i.e., correlations between input features), this structure can be learned
and consequently leveraged when forcing the input through the network's bottleneck.

61. What is Reinforcement Learning (RL)?

In Reinforcement Learning (RL), agents are trained on a reward and punishment mechanism.
The agent is rewarded for correct moves and punished for wrong ones. In doing so, the agent tries
to minimize wrong moves and maximize right ones.


62. What is ‘Overfitting’ in Machine learning?

In machine learning, ‘overfitting’ occurs when a statistical model describes random error or noise
instead of the underlying relationship. Overfitting is normally observed when a model is excessively
complex, having too many parameters relative to the number of training samples. A model that has
been overfit exhibits poor predictive performance.

63. Why does overfitting happen?

Overfitting is possible because the criteria used for training the model are not the same as the
criteria used to judge the efficacy of the model.

64. What are Bayesian Networks (BN)?

A Bayesian Network is used to represent a graphical model of the probability relationships
among a set of variables.

65. What are the types of RNN?

The main reason recurrent nets are so exciting is that they allow us to operate over sequences of
vectors: a sequence in the input, in the output, or, in the most general case, in both. A few examples
may make this more concrete:

In the standard diagram of these RNN types, each rectangle represents a vector and arrows represent
functions. Input vectors are red, output vectors are blue, and green holds the RNN's state.

One-to-one:

This is also called a plain neural network. It maps a fixed-size input to a fixed-size output,
independent of previous information/output.

Example: Image classification.

One-to-Many:

It deals with a fixed size of information as input that gives a sequence of data as output.

Example: Image Captioning takes the image as input and outputs a sentence of words.

Many-to-One:

It takes a sequence of information as input and outputs a fixed size of the output.

Example: sentiment analysis where any sentence is classified as expressing the positive or
negative sentiment.

Many-to-Many:

It takes a sequence of information as input and recurrently processes it to output a sequence of data.

Example: Machine Translation, where the RNN reads any sentence in English and then
outputs the sentence in French.

Bidirectional Many-to-Many:

Synced sequence input and output. Notice that in every case there are no pre-specified constraints
on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied
as many times as we like.

Example: Video classification where we wish to label every frame of the video.

66. What do you mean by Dropout?

Dropout is a cheap regularization technique used to reduce overfitting in neural networks. We
randomly drop out a set of nodes at each training step. As a result, we create a different model for
each training case, and all of these models share weights. It's a form of model averaging.

67. What do you understand by Boltzmann Machine?

A Boltzmann machine (also known as a stochastic Hopfield network with hidden units) is a type of
recurrent neural network in which nodes make binary decisions with some bias. Boltzmann
machines can be strung together to create more sophisticated systems such as deep belief networks,
and they can be used to optimize the solution to a problem.

Some important points about Boltzmann Machine-

o It uses a recurrent structure.


o It consists of stochastic neurons, each of which takes one of two possible states, either 1 or 0.
o The neurons are either in an adaptive state (free state) or a clamped state (frozen state).
o If we apply simulated annealing to a discrete Hopfield network, it becomes a Boltzmann Machine.

68. Explain gradient descent.


Gradient descent is an optimization algorithm used to minimize some function by repeatedly moving in the direction of steepest descent, as specified by the negative of the gradient. It is an iterative algorithm: in every iteration, we compute the gradient of the cost function with respect to each parameter and update the parameters via the following rule:

Θ := Θ − α ∇J(Θ)

where

Θ is the parameter vector,

α is the learning rate, and

J(Θ) is the cost function.

In machine learning, it is used to update the parameters of our model. Parameters represent
the coefficients in linear regression and weights in neural networks.
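
As an illustration, a minimal NumPy sketch of this rule on a made-up linear-regression problem (the data and hyperparameters are invented for the example):

```python
import numpy as np

# Toy data for y ≈ 3x + noise (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

theta = np.zeros(1)   # parameter vector Θ
alpha = 0.1           # learning rate α

for _ in range(200):
    pred = X @ theta
    grad = 2 * X.T @ (pred - y) / len(y)  # ∇J(Θ) for a mean-squared-error cost
    theta = theta - alpha * grad          # Θ := Θ − α ∇J(Θ)

print(theta)  # converges to ≈ [3.0]
```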

69. Explain the following variants of Gradient Descent: Stochastic, Batch, and Mini-batch.

o Stochastic Gradient Descent


Stochastic gradient descent calculates the gradient and updates the parameters using only a single training example at a time.

o Batch Gradient Descent

Batch gradient descent calculates the gradient over the whole dataset and performs just one update at each iteration.

o Mini-batch Gradient Descent

Mini-batch gradient descent is a variation of stochastic gradient descent: instead of a single training example, a mini-batch of samples is used. It is one of the most popular optimization algorithms, as sketched below.
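
A sketch of the mini-batch loop on the same kind of toy linear-regression data as above (batch size, epochs, and learning rate are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

theta, alpha, batch_size = np.zeros(1), 0.1, 16  # assumed hyperparameters

for epoch in range(50):
    idx = rng.permutation(len(y))            # shuffle once per epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]    # indices of one mini-batch
        grad = 2 * X[b].T @ (X[b] @ theta - y[b]) / len(b)
        theta -= alpha * grad                # one update per mini-batch
```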

70. What are the main benefits of Mini-batch Gradient Descent?

• It is computationally efficient compared to stochastic gradient descent.


• It improves generalization by finding flat minima.
• It improves convergence by using mini-batches. We can approximate the gradient of
the entire training set, which might help to avoid local minima.
71. Explain the different layers of CNN.

There are four layered concepts that we should understand in CNN (Convolutional Neural Network); a code sketch follows the list:

• Convolution
This layer comprises a set of independent filters, all initialized randomly. These filters then become parameters that are learned by the network during training.
• ReLU
The ReLU layer is used with the convolutional layer; it applies the element-wise non-linearity max(0, x) to the convolution's output.
• Pooling
It reduces the spatial size of the representation to lower the number of parameters and the amount of computation in the network. This layer operates on each feature map independently.
• Fully Connected
Neurons in a fully connected layer have connections to all activations in the previous layer, as in regular neural networks. Their activations can be easily computed with a matrix multiplication followed by a bias offset.
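
A minimal Keras sketch that stacks the four layer types in order (the input shape, filter count, and class count are assumed for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # assumed grayscale image input
    layers.Conv2D(16, (3, 3)),                # Convolution: 16 learned filters
    layers.ReLU(),                            # ReLU: element-wise max(0, x)
    layers.MaxPooling2D((2, 2)),              # Pooling: shrink spatial size
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # Fully connected: class scores
])
model.summary()
```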

72. What are the different layers of Autoencoders? Explain briefly.

An autoencoder contains three layers:



o Encoder
The encoder compresses the input into a latent-space representation. It encodes the input image as a compressed representation in a reduced dimension; the compressed image is a distorted version of the original image.
o Code
The code layer represents the compressed input that is fed to the decoder.
o Decoder
The decoder layer decodes the encoded image back to its original dimension. The decoded image is a lossy reconstruction of the original image, produced automatically from the latent-space representation. A minimal sketch follows.
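
A minimal sketch of the three parts in Keras, assuming a flattened 28×28 image and an arbitrary 32-dimensional code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                  # flattened 28x28 image (assumed)
code = layers.Dense(32, activation="relu")(inputs)   # Encoder -> Code: 32-dim latent
outputs = layers.Dense(784, activation="sigmoid")(code)  # Decoder: back to 784 dims

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")    # reconstruction loss
```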

73. Difference between CNN and RNN

S.no | CNN | RNN
1 | CNN stands for Convolutional Neural Network. | RNN stands for Recurrent Neural Network.
2 | CNN is considered to be more potent than RNN. | RNN offers less feature compatibility when compared to CNN.
3 | CNN is ideal for image and video processing. | RNN is ideal for text and speech analysis.
4 | It is suitable for spatial data like images. | RNN is used for temporal data, also called sequential data.
5 | The network takes fixed-size inputs and generates fixed-size outputs. | RNN can handle arbitrary input/output lengths.
6 | CNN is a type of feed-forward artificial neural network with variations of multilayer perceptrons designed to use minimal amounts of preprocessing. | RNN, unlike feed-forward neural networks, can use its internal memory to process arbitrary sequences of inputs.
7 | CNNs use connectivity patterns between the neurons, inspired by the organization of the animal visual cortex, whose individual neurons are arranged so that they can respond to overlapping regions in the visual field. | Recurrent neural networks use time-series information: what a user spoke last impacts what he will speak next.

The following diagram shows a schematic representation of a CNN and an RNN.

74. Choice of optimizer

Momentum:

I guess almost all of us are familiar with the word 'momentum'. As gradient descent is comparable to finding a valley, momentum can be compared to a ball rolling downhill. Momentum helps us accelerate Gradient Descent (GD) when we have surfaces that curve more steeply in one direction than in another. It also dampens the oscillations. For updating the weights, it takes the gradient of the current step as well as the gradients of previous time steps, and it speeds up gradient descent by converging faster:

V_t = γ V_{t−1} + α ∇J(Θ)
Θ := Θ − V_t

where V_t, the exponentially weighted average of past gradients, carries the momentum, and γ is the momentum coefficient (typically 0.9).

Thus, we observe that the weight parameters are updated using the gradients of previous runs as well as the current one.
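
A NumPy sketch of this update on a toy quadratic cost (the cost function, γ = 0.9, and α are assumptions for illustration):

```python
import numpy as np

def grad_J(theta):
    # Placeholder gradient of a simple quadratic cost J(Θ) = ||Θ||²
    return 2 * theta

theta = np.array([5.0, -3.0])
v = np.zeros_like(theta)        # V_0 = 0
gamma, alpha = 0.9, 0.05        # assumed momentum coefficient and learning rate

for _ in range(100):
    v = gamma * v + alpha * grad_J(theta)  # V_t = γ V_{t−1} + α ∇J(Θ)
    theta = theta - v                      # Θ := Θ − V_t
```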

Adagrad — Adaptive Gradient Algorithm:

Adagrad is an adaptive algorithm for gradient-based optimization that alters the learning
rate to a lower value for parameters associated with frequently occurring features, and
larger updates (i.e. high learning rates) for parameters associated with infrequent features.
For this reason, it is well-suited for dealing with sparse data.

Previously, we performed updates on the weights with the same learning rate for every weight. Adagrad instead adapts the learning rate for every parameter:

Θ_{t+1,i} = Θ_{t,i} − (η / √(G_{t,ii} + ε)) · g_{t,i}

where g_{t,i} = ∇_Θ J(Θ_{t,i}) is the partial derivative of the cost function w.r.t. the parameter Θ_i at time step t.

G_t is a diagonal matrix that contains the sum of the squares of the past gradients w.r.t. all parameters Θ along its diagonal, and ε is a small smoothing term. We can now vectorize the implementation by performing an element-wise matrix-vector product ⊙ between √(G_t + ε) and g_t:

Θ_{t+1} = Θ_t − (η / √(G_t + ε)) ⊙ g_t

One of Adagrad’s main benefits is that it eliminates the need to manually tune the learning
rate. Most implementations use a default value of 0.01 and leave it at that.

Adagrad's main weakness is its accumulation of the squared gradients in the denominator: since every added term is positive, the accumulated sum keeps growing during training. This in turn causes the learning rate to shrink and eventually become infinitesimally small, at which point the algorithm is no longer able to acquire additional knowledge. The following algorithms aim to resolve this flaw.
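
A per-parameter Adagrad step in NumPy on the same toy cost (η = 0.01 per the default mentioned above; the gradient function is a placeholder):

```python
import numpy as np

def grad_J(theta):
    return 2 * theta  # placeholder gradient of J(Θ) = ||Θ||²

theta = np.array([5.0, -3.0])
G = np.zeros_like(theta)     # running sum of squared gradients (diagonal of G_t)
eta, eps = 0.01, 1e-8        # default learning rate; eps avoids division by zero

for _ in range(100):
    g = grad_J(theta)
    G += g ** 2                          # accumulated sum only grows
    theta -= eta / np.sqrt(G + eps) * g  # per-parameter adaptive step
```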

RMSProp:

RMSProp (Root Mean Square Propagation) was devised by Geoffrey Hinton in Lecture 6e of his Coursera class. RMSProp resolves the disadvantage of Adagrad: the learning rate gets adjusted automatically, and it chooses a different learning rate for each parameter.

Like Adagrad, RMSprop divides the learning rate by an average of squared gradients, but the average is exponentially decaying:

E[g²]_t = γ E[g²]_{t−1} + (1 − γ) g_t²
Θ_{t+1} = Θ_t − (η / √(E[g²]_t + ε)) · g_t

Hinton suggests γ be set to 0.9, while a good default value for the learning
rate η is 0.001.
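
The same toy setup with RMSProp's decaying average, using the suggested γ = 0.9 and η = 0.001 (the cost is still a placeholder):

```python
import numpy as np

def grad_J(theta):
    return 2 * theta  # placeholder gradient of J(Θ) = ||Θ||²

theta = np.array([5.0, -3.0])
Eg2 = np.zeros_like(theta)           # running average E[g²]
gamma, eta, eps = 0.9, 0.001, 1e-8   # suggested defaults

for _ in range(1000):
    g = grad_J(theta)
    Eg2 = gamma * Eg2 + (1 - gamma) * g ** 2   # decaying average of g², not a sum
    theta -= eta / np.sqrt(Eg2 + eps) * g      # so the step size never vanishes
```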

Adam:

Adaptive Moment Estimation (Adam) is another adaptive learning method. In addition to storing an exponentially decaying average of past squared gradients v_t like Adagrad and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum. Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface. We compute the decaying averages of past gradients m_t and past squared gradients v_t as follows:

m_t = β₁ m_{t−1} + (1 − β₁) g_t
v_t = β₂ v_{t−1} + (1 − β₂) g_t²

β₁ and β₂ are the hyperparameters. m_t and v_t are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients respectively, hence the name of the method. As m_t and v_t are initialized as vectors of 0's, the authors of Adam observe that they are biased towards zero, especially during the initial time steps, and especially when the decay rates are small (i.e. β₁ and β₂ are close to 1).

They counteract these biases by computing bias-corrected first and second moment estimates:

m̂_t = m_t / (1 − β₁ᵗ)
v̂_t = v_t / (1 − β₂ᵗ)

They then use these to update the parameters, just as we have seen in Adadelta and RMSprop, which yields the Adam update rule:

Θ_{t+1} = Θ_t − (η / (√v̂_t + ε)) · m̂_t
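
Putting the pieces together, a NumPy sketch of the full Adam loop on the same toy cost (β₁ = 0.9, β₂ = 0.999, η = 0.001, ε = 1e-8 are the commonly cited defaults; the gradient function is a placeholder):

```python
import numpy as np

def grad_J(theta):
    return 2 * theta  # placeholder gradient of J(Θ) = ||Θ||²

theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)   # first-moment estimate m_t
v = np.zeros_like(theta)   # second-moment estimate v_t
beta1, beta2, eta, eps = 0.9, 0.999, 0.001, 1e-8  # common defaults

for t in range(1, 1001):
    g = grad_J(theta)
    m = beta1 * m + (1 - beta1) * g        # decaying mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2   # decaying mean of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction for m_t
    v_hat = v / (1 - beta2 ** t)           # bias correction for v_t
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)  # Adam update rule
```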
