
NOORUL ISLAM CENTRE FOR HIGHER EDUCATION

(Declared as Deemed-to-be University under Section 3 of the UGC Act 1956)
(Accredited by NAAC with 'A' Grade)
Kumaracoil - 629180, Kanyakumari District, Tamil Nadu, India

Subject Name: Deep Learning


Subject Code: CS25B7
QUESTION BANK

Submitted by,

Abhishek Raj
Adharsh T S
Ameera Beegum
Bindhu Sabu
Leena
Vishnu Prasad

Unit 1

1. Define Machine Learning

Machine Learning (ML) is the study of algorithms that improve automatically through experience and the use of data. It is viewed as a part of Artificial Intelligence. ML algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.

2. Define ANN

Neural computing is an information processing paradigm, inspired by biological systems, composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron.

 A set of input connections brings in activations from other neurons.
 A processing unit sums the inputs, and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
 An output line transmits the result to other neurons.
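As a rough illustration (not part of the original answer), the computation performed by a single artificial neuron can be sketched in Python/NumPy; the weights, bias and input values below are arbitrary assumptions:

# Sketch of one artificial neuron: weighted sum of inputs followed by a
# non-linear (sigmoid) activation function.
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias      # processing unit: sum of weighted inputs
    return 1.0 / (1.0 + np.exp(-z))         # squashing/transfer function

inputs = np.array([0.5, -1.0, 2.0])         # activations from other neurons
weights = np.array([0.8, 0.2, -0.4])
print(neuron(inputs, weights, bias=0.1))    # output transmitted to other neurons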

3. What are the basic elements of ANN

An artificial neuron consists of three basic components:

a) Weights,
b) Thresholds, and
c) A single activation function.

An Artificial Neural Network (ANN) model based on biological neural systems is shown in the figure.

4. List any four learning procedures available in ANN.


(i) Supervised learning
(ii) Unsupervised learning
(iii) Reinforced learning
(iv) Hebbian learning

5. What is meant by linear regression

Linear regression analysis is used to predict the value of a variable based


on the value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other variable's value is
called the independent variable.

6. What is gradient descent.

Gradient Descent is an iterative optimization algorithm that finds the best-fit line for a given training dataset by repeatedly adjusting the parameters in the direction that reduces the error. For some combination of the slope m and intercept c we obtain the least error (MSE); that combination of m and c gives us the best-fit line.
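A minimal Python/NumPy sketch (not part of the original answer) of batch gradient descent searching for m and c; the data, learning rate and iteration count are assumptions chosen for illustration:

# Fit y = m*x + c by repeatedly stepping against the gradient of the MSE.
import numpy as np

def fit_line(x, y, lr=0.01, epochs=5000):
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + c
        dm = (-2.0 / n) * np.sum(x * (y - y_pred))   # dMSE/dm
        dc = (-2.0 / n) * np.sum(y - y_pred)         # dMSE/dc
        m -= lr * dm                                 # move opposite to the gradient
        c -= lr * dc
    return m, c

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])    # generated from y = 2x + 1
print(fit_line(x, y))                  # approaches (2.0, 1.0)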

7. What are the types of gradient descent.

There are three types of gradient descent learning algorithms: batch gradient
descent, stochastic gradient descent and mini-batch gradient descent.
(i) Batch gradient descent.
(ii) Stochastic gradient descent.
(iii) Mini-batch gradient descent

8. What is meant by SVM

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems.
(i) Primarily, it is used for Classification problems in Machine Learning.
(ii) The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily place new data points in the correct category in the future. This best decision boundary is called a hyperplane.

9. What are the types of SVM

(i) Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

(ii) Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
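An illustrative sketch (assuming scikit-learn is available; the dataset and settings are assumptions, not part of the original answer) contrasting the two kinds of SVM classifier:

# Linear vs non-linear SVM on a dataset that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)         # curved boundary (kernel trick)

print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy:", rbf_svm.score(X_test, y_test))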

10. What is the multilayer perceptron model

A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to mean any feedforward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation).

11. List four limitations of single layer perceptrons.

(i) Uses only a binary activation function
(ii) Can be used only for linear networks
(iii) Since it uses supervised learning, an optimal solution is provided
(iv) Training time is more
(v) Cannot solve linearly inseparable problems

12. List four requirements of learning laws.

(i) The learning law should lead to convergence of weights
(ii) The learning or training time should be small for capturing the information from the training pairs
(iii) Learning should use local information
(iv) The learning process should be able to capture the complex non-linear mapping available between the input and output pairs
(v) Learning should be able to capture as many patterns as possible
(vi) Storage of the pattern information gathered at the time of learning should be high for the given network

13. Explain Hebbian Learning.

Hebbian learning is an unsupervised learning rule in which the connection weight between two neurons is increased when both neurons are activated at the same time ("neurons that fire together wire together"). The weight change is proportional to the product of the input and output activations, Δw = η · x · y, where η is the learning rate.

14. What is a convex region?

Select any two points in a region and draw a straight line between these two points. If the points selected and the line joining them both lie inside the region, then that region is known as a convex region.

Convex regions can be created by multiple decision lines arising from multi-layer networks. A single-layer network cannot be used to solve linearly inseparable problems. Hence we go for a multi-layer network, thereby creating convex regions which solve the inseparable problem.

15. What is Supervised learning?

Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern.
A teacher is assumed to be present during the training process, when a comparison is made between the network's computed output and the correct expected output, to determine the error.
The error can then be used to change the network parameters, which results in an improvement in performance.

16. State the difference between Supervised and unsupervised learning?

Supervised learning:
Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern. A teacher is assumed to be present during the training process, when a comparison is made between the network's computed output and the correct expected output to determine the error. The error can then be used to change the network parameters, which results in an improvement in performance.

Unsupervised learning:
In this learning method the target output is not presented to the network. It is as if there is no teacher to present the desired patterns, and hence the system learns on its own by discovering and adapting to structural features in the input patterns.

17. Define Gradient descent learning?

Gradient descent learning adjusts the network weights in the direction of the negative gradient of the error (loss) function, so that the error is reduced step by step until a minimum is reached.

18. What is meant by perceptron model.

A perceptron is a simple model of a biological neuron in an artificial neural network. Perceptron is also the name of an early algorithm for supervised learning of binary classifiers.

The perceptron is a mathematical model of a biological neuron. While in actual neurons the dendrites receive electrical signals from the axons of other neurons, in the perceptron these electrical signals are represented as numerical values.
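A minimal sketch (not from the original answer) of the perceptron learning rule with a step activation; the AND data, learning rate and epoch count are assumptions for illustration:

# Train a single perceptron on the (linearly separable) AND function.
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])                 # one weight per input connection
    b = 0.0                                  # bias (threshold)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            output = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            error = target - output
            w += lr * error * xi             # weights change only on a mistake
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, x) + b > 0 else 0 for x in X])   # expected: [0, 0, 0, 1]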

19. What is a shallow network?

Shallow neural networks is a term used to describe neural networks that usually have only one hidden layer, as opposed to deep neural networks, which have several hidden layers, often of various types.
Such a model can, for example, combine regression and CNN layers to perform ordinal regression on age benchmark datasets.
These models construct a non-linear mapping from input to output. They use convolution technology to extract data features.

20. What is logistic regression


(i) Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables.

(ii) Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

(iii) Logistic Regression is very similar to Linear Regression except in how they are used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.

(iv) In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
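A brief sketch (not part of the original answer) of the "S"-shaped logistic (sigmoid) function; the weight, bias and inputs are made-up values used only for illustration:

# Map real-valued scores to probabilities in (0, 1) and threshold at 0.5.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 2.0, -1.0                        # hypothetical fitted weight and bias
x = np.array([-2.0, 0.0, 0.5, 3.0])
probs = sigmoid(w * x + b)              # probabilities of the positive class
labels = (probs >= 0.5).astype(int)     # discrete 0/1 predictions
print(probs, labels)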
PART B

1. Define Artificial Neural Networks? What is ANN model? What are the
basic elements of ANN?

An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron.

 The McCulloch-Pitts Neuron: this is a simplified model of real neurons, known as a Threshold Logic Unit.
 A set of input connections brings in activations from other neurons.
 A processing unit sums the inputs, and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
 An output line transmits the result to other neurons.

An artificial neuron consists of three basic components:

a. Weights
b. Thresholds, and
c. A single activation function.

An Artificial Neural Network (ANN) model based on biological neural systems is shown in the figure.

2. Explain different learning procedures available in ANN

Supervised Learning
Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern. A teacher is assumed to be present during the training process, when a comparison is made between the network's computed output and the correct expected output to determine the error. The error can then be used to change the network parameters, which results in an improvement in performance.
Unsupervised learning:
In this learning method the target output is not presented to the network. It is as if there is no teacher to present the desired patterns, and hence the system learns on its own by discovering and adapting to structural features in the input patterns.

Reinforced learning:
In this method a teacher, though available, does not present the expected answer but only indicates whether the computed output is correct or incorrect. The information provided helps the network in the learning process.

Hebb learning

Gradient Descent Learning

3. What is the Perceptron model? Explain the types of perceptron model.

A perceptron is a simple model of a biological neuron in an artificial neural network. Perceptron is also the name of an early algorithm for supervised learning of binary classifiers.

The perceptron is a mathematical model of a biological neuron. While in actual neurons the dendrites receive electrical signals from the axons of other neurons, in the perceptron these electrical signals are represented as numerical values.

Perceptron network is capable of performing pattern classification into
two or more categories. The perceptron is trained using the perceptron learning rule.
We will first consider classification into two categories and then the general
multiclass classification later. For classification into only two categories, all we need
is a single output neuron. Here we will use bipolar neurons. The simplest architecture
that could do the job consists of a layer of N input neurons, an output layer with a
single output neuron, and no hidden layers. This is the same architecture as we saw
before for Hebb learning. However, we will use a different transfer function here for
the output neurons.

Single Layer Perceptrons:

 Uses only Binary Activation function


 Can be used only for linear networks
 Uses supervised learning, optimal solution is provided
 Training time is more
 Cannot solve linear inseparable problem

Multi Layer Perceptrons:

In a multi-layer perceptron network, between the input and output layers there are one or more additional layers known as hidden layers.

4. What is SVM? Explain the types of SVM.

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems.
It is used for Classification problems in Machine Learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put new data points in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the diagram below, in which there are two different categories that are classified using a decision boundary or hyperplane:

SVM can be of two types:

Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed Support Vectors. Since these vectors support the hyperplane, they are called support vectors.

5. Explain different training procedures in ANN.

Supervised Learning
Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern. A teacher is assumed to be present during the training process, when a comparison is made between the network's computed output and the correct expected output to determine the error. The error can then be used to change the network parameters, which results in an improvement in performance.

Unsupervised learning:
In this learning method the target output is not presented to the network. It is as if there is no teacher to present the desired patterns, and hence the system learns on its own by discovering and adapting to structural features in the input patterns.

Reinforced learning:
In this method a teacher, though available, does not present the expected answer but only indicates whether the computed output is correct or incorrect. The information provided helps the network in the learning process.

Hebb learning

Gradient Descent Learning

Stochastic:
A stochastic process is a probability model describing a collection of
time-ordered random variables that represent the possible sample paths.

Competitive learning:
Competitive learning is a form of unsupervised learning in artificial
neural networks, in which nodes compete for the right to respond to a subset of the
input data. A variant of Hebbian learning, competitive learning works by increasing
the specialization of each node in the network.

UNIT-2
Part A

1) What is the probabilistic theory of Deep learning?


Probability is the science of quantifying uncertain things. Most machine learning and deep learning systems utilize a lot of data to learn about patterns in the data. Whenever data is utilized in a system rather than sole logic, uncertainty grows, and whenever uncertainty grows, probability becomes relevant.

2) What is BPN?
Backpropagation in a neural network is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.

3) What are the four major steps of BPN Algorithm?


1. Initialization of Bias, Weights
2. Feedforward process
3. Back Propagation of Errors
4. Updating of weights & biases

4) What are the Merits of BPN Algorithm?


• Has a smooth effect on weight correction
• Computing time is less if the weights are small
• 100 times faster than the perceptron model
• Has a systematic weight updating procedure

5) List any four de-merits of the BPN Algorithm?


• Network gets trapped in Local Minima
• Temporal Instability
• Network Paralysis
• Training time is more for Complex problems

6) What is Multi layer Network?


A multi-layer neural network contains more than one layer of artificial
neurons or nodes. They differ widely in design. It is important to note that while
single-layer neural networks were useful early in the evolution of AI, the vast
majority of networks used today have a multi-layer model.

7) What are the limitations of Single layer Network?


• Uses only Binary Activation function
• Can be used only for Linear Networks

• Since it uses supervised learning, an optimal solution is provided
• Training time is more
• Cannot solve linearly inseparable problems

8) What is meant by Regularization?


A fundamental problem in machine learning is how to make an algorithm
that will perform well not just on the training data, but also on new inputs. Many
strategies used in machine learning are explicitly designed to reduce the test error,
possibly at the expense of increased training error. These strategies are known
collectively as regularization.

9) What is meant by L2 Regularization?

One of the simplest and most common kinds of parameter norm penalty is the L2 parameter norm penalty, also commonly called weight decay. This regularization strategy drives the weights closer to the origin by adding a regularization term.
L2 regularization is also known as ridge regression or Tikhonov regularization. To simplify, we assume no bias parameter, so θ is just w. Such a model has the following total objective function.
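The objective function referred to above is not reproduced in the text; its standard form (a sketch, with α denoting the regularization strength and J the unregularized cost) is

\tilde{J}(w; X, y) = \frac{\alpha}{2} w^{\top} w + J(w; X, y)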

10) What is meant by L1 Regularization?


While L2 weight decay is the most common form of weight decay, there
are other ways to penalize the size of the model parameters. Another option is to
use L1 regularization. L1 regularization on the model parameter w is defined as
the sum of absolute values of the individual parameters.

11) State any two differences between L1 and L2 Regularization?

 L1 regularization adds the sum of the absolute values of the weights as the penalty term in the cost function, whereas L2 regularization adds the sum of the squared values of the weights.
 L1 regularization can be helpful in feature selection by eradicating the unimportant features, whereas L2 regularization is not recommended for feature selection.

12) What is convolutional neural network ?


Within Deep Learning, a Convolutional Neural Network or CNN is a type
of artificial neural network, which is widely used for image/object recognition and
classification. Deep Learning thus recognizes objects in an image by using a CNN.
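An illustrative Python/NumPy sketch (not from the original answer) of the core 2-D convolution operation that a CNN layer applies; the toy image and filter values are assumptions:

# Slide a small filter over an image and sum the element-wise products.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)     # simple vertical-edge detector
print(conv2d(image, edge_filter))                   # 3x3 feature map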

13) What is GAN?


Generative Adversarial Networks (GANs) are deep-learning-based generative models. Generative modelling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.
14) What is Semi-Supervised learning?
Semi-supervised machine learning is a combination of supervised and
unsupervised learning. It uses a small amount of labelled data and a large amount
of unlabelled data, which provides the benefits of both unsupervised and
supervised learning while avoiding the challenges of finding a large amount of
labelled data.

15) What is the difference between supervised and semi-supervised learning?
Supervised learning aims to learn a function that, given a sample of data and desired outputs, approximates a function that maps inputs to outputs. Semi-supervised learning aims to label unlabelled data points using knowledge learned from a small number of labelled data points.

16) What is meant by Batch Normalization?


It is a method of adaptive reparameterization, motivated by the difficulty
of training very deep models. In Deep networks, the weights are updated for each
layer. So the output will no longer be on the same scale as the input (even though
input is normalized).
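An illustrative Python/NumPy sketch (not part of the original answer) of how one layer's activations are normalized over a mini-batch; the batch values, gamma and beta are assumptions:

# Normalize each feature over the mini-batch, then apply the learned scale and shift.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # gamma and beta are learned parameters

x = np.array([[1.0, 200.0], [2.0, 220.0], [3.0, 240.0]])   # batch of 3, 2 features
print(batch_norm(x, gamma=1.0, beta=0.0))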

17) State any two difference between shallow Network and Deep learning
Network?
 A shallow network has one hidden layer (or very few hidden layers) and takes input only as vectors.
 A DL network can have raw data like images or text as inputs, and deep nets have many hidden layers with more neurons in each layer.

18) What is the difference between single layer and Multilayer Network?
In a multilayer perceptron (MLP), single-layer perceptrons (SLPs) are arranged in layers and connected to each other, with the outputs of the SLPs in the output layer being the outputs of the MLP. Here, each SLP is represented by a disc in the figure.

19) What is the difference between CNN and GAN?

Both FCC-GAN models learn the distribution much more quickly than the CNN model. FCC-GAN models generate clearly recognizable digits, while the CNN model does not. After epoch 50, all models generate good images, though FCC-GAN models still outperform the CNN model in terms of image quality.

20) What is conditional GAN?

An important extension to the GAN is in their use for conditionally
generating an output. The generative model can be trained to generate new
examples from the input domain, where the input, the random vector from the
latent space, is provided with (conditioned by) some additional input. The
additional input could be a class value, such as male or female in the generation of
photographs of people, or a digit, in the case of generating images of handwritten
digits.

Part B

1) Explain shallow networks with a generic model. State the difference between a shallow network and a deep learning network?

Shallow neural networks give us a basic idea about deep neural networks; they consist of only 1 or 2 hidden layers. Understanding a shallow neural network gives us an insight into what exactly is going on inside a deep neural network. A neural network is built using various hidden layers. Now that we know the computations that occur in a particular layer, let us understand how the whole neural network computes the output for a given input X. These can also be called the forward-propagation equations.
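The forward-propagation equations themselves are not reproduced in the text; for a generic shallow network with one hidden layer (hidden activation g, output activation σ), their standard form is

z^{[1]} = W^{[1]} X + b^{[1]}, \quad a^{[1]} = g(z^{[1]}), \quad z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad a^{[2]} = \sigma(z^{[2]}) = \hat{y}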

2) Explain the history of Deep learning?

 The chain rule that underlies the back-propagation algorithm was invented in
the seventeenth century (Leibniz, 1676; L’Hôpital, 1696)
 Beginning in the 1940s, the function approximation techniques were used to
motivate machine learning models such as the perceptron
 The earliest models were based on linear models. Critics including Marvin
Minsky pointed out several of the flaws of the linear model family, such as its
inability to learn the XOR function, which led to a backlash against the entire
neural network approach
 Efficient applications of the chain rule based on dynamic programming began to
appear in the 1960s and 1970s
 Werbos (1981) proposed applying chain rule techniques for training artificial
neural networks. The idea was finally developed in practice after being
independently rediscovered in different ways.
 Following the success of back-propagation, neural network research gained
popularity and reached a peak in the early 1990s. Afterwards, other machine
learning techniques became more popular until the modern deep learning
renaissance that began in 2006
 The core ideas behind modern feed forward networks have not changed
substantially since the 1980s. The same back-propagation algorithm and the same
approaches to gradient descent are still in use

3) Explain BPN Network, What are the four Major Steps of BPN Algorithm.
State Merits and De-Merits of BPN Algorithm?

Backpropagation in a neural network is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.

It uses a supervised training process; it has a systematic procedure for training the network and is used in error detection and correction. The Generalized Delta Law / Continuous Perceptron Law / Gradient Descent Law is used in this network.
The Generalized Delta rule minimizes the mean squared error of the calculated output. The Delta law has a faster convergence rate when compared with the Perceptron Law; it is the extended version of the Perceptron Training Law.

The limitation of this law is the local minima problem. Due to this the convergence speed reduces, but it is still better than the perceptron's.

Even though multi-level perceptrons can be used, BPN is more flexible and efficient. The weights between the input and the hidden layer are denoted Wij, and the weights between the first hidden layer and the next layer are denoted Vjk. This network is valid only for differentiable output functions.

The training process used in backpropagation involves three stages, which are listed below:

1. Feedforward of the input training pair
2. Calculation and backpropagation of the associated error
3. Adjustment of weights

The algorithm for BPN is classified into four major steps as follows:
1. Initialization of biases and weights
2. Feedforward process
3. Backpropagation of errors
4. Updating of weights and biases
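A minimal Python/NumPy sketch (not part of the original answer) of these four steps on a tiny 2-4-1 network with sigmoid units; the XOR data, learning rate and epoch count are assumptions chosen only for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialization of biases and weights
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

for _ in range(10000):
    # Step 2: feedforward process
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Step 3: backpropagation of errors (squared error with sigmoid units)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Step 4: updating of weights and biases
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))   # usually close to [[0], [1], [1], [0]]; depends on the random start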

Merits
• Has a smooth effect on weight correction
• Computing time is less if the weights are small
• 100 times faster than the perceptron model
• Has a systematic weight updating procedure

Demerits
• Learning phase requires intensive calculations
• Selection of number of Hidden layer neurons is an issue
• Selection of number of Hidden layers is also an issue
• Network gets trapped in Local Minima
• Temporal Instability
• Network Paralysis
• Training time is more for Complex problems

4) Explain Multilayer Networks? What is the need for a Multilayer Network? Briefly give an example of a Multilayer Network.

A multi-layer neural network contains more than one layer of artificial neurons or nodes. They differ widely in design. It is important to note that while single-layer neural networks were useful early in the evolution of AI, the vast majority of networks used today have a multi-layer model.

Need for Multilayer Networks


 Single layer networks cannot be used to solve linearly inseparable problems & can only be used to solve linearly separable problems
 Single layer networks cannot solve complex problems
 Single layer networks cannot be used when a large input-output data set is available
 Single layer networks cannot capture the complex information available in the training pairs
Hence, to overcome the above limitations, we use Multi-Layer Networks.

Multi-Layer Networks
 Any neural network which has at least one layer in between input and output
layers is called Multi-Layer Networks
 Layers present in between the input and output layers are called Hidden Layers
 The input layer neural units just collect the inputs and forward them to the next higher layer
 Hidden layer and output layer neural units process the information fed to them and produce an appropriate output
 Multi-layer networks provide optimal solutions for arbitrary classification problems
 Multi-layer networks use linear discriminants, where the inputs are non-linear

Backpropagation Networks (BPN) are a good example of multilayer networks.

Backpropagation in a neural network is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.

5) What are Generative Adversarial Networks? Explain with the GAN model architecture.

Generative Adversarial Networks, or GANs for short, are an approach to generative modelling using deep learning methods. Generative modelling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

Generative Adversarial Networks, or GANs, are a deep-learning-based generative model.

More generally, GANs are a model architecture for training a generative model, and it is most common to use deep learning models in this architecture.

The GAN architecture was first described in the 2014 paper by Ian
Goodfellow, et al. titled “Generative Adversarial Networks.”

A standardized approach called Deep Convolutional Generative Adversarial Networks, or DCGAN, that led to more stable models was later formalized by Alec Radford, et al. in the 2015 paper titled "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks".

Generative modelling is an unsupervised learning problem, as we discussed in the previous section, although a clever property of the GAN architecture is that the training of the generative model is framed as a supervised learning problem.

At a limit, the generator generates perfect replicas from the input domain
every time, and the discriminator cannot tell the difference and predicts “unsure”
(e.g. 50% for real and fake) in every case. This is just an example of an idealized
case; we do not need to get to this point to arrive at a useful generator model.

UNIT - 3

PART-A

1. Define Linear Factor model.

Linear factor models are used as building blocks of mixture models or of larger, deep probabilistic models.
A linear factor model is defined by the use of a stochastic linear decoder function that generates x by adding noise to a linear transformation of h. It allows us to discover explanatory factors that have a simple joint distribution. A linear factor model describes the data-generation process as follows.
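The data-generation process mentioned above is not written out in the text; in its standard form (a sketch) the explanatory factors h are sampled from a factorial prior and then decoded linearly with added noise:

h \sim p(h) = \prod_i p(h_i), \qquad x = W h + b + \text{noise}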

2. What is dimensionality reduction?

Dimensionality Reduction:
 High dimensionality is challenging and redundant
 It is natural to try to reduce dimensionality
 We reduce the dimensionality by feature combination i.e., combine
old features X to create new features Y as given below:

3. What is mean by PCA?

Principal Component Analysis, or simply PCA, is a statistical procedure concerned with elucidating the covariance structure of a set of variables. It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.

Since patterns in data can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. reduce the number of dimensions, without much loss of information.

4. State four advantages of PCA.

1. Removes Correlated Features


2. Improves Algorithm Performance
3. Reduces Overfitting
4. Improves Visualization

5. State disadvantages of PCA.

1. Independent variables become less interpretable: Principal


Components are the linear combination of your original features. Principal
Components are not as readable and interpretable as original features.

2. Data standardization is must before PCA: You must standardize your


data before implementing PCA, otherwise PCA will not be able to find the
optimal Principal Components.

3. Information Loss: It may miss some information as compared to the original list of features.

6. What is linear discrimination analysis?


Linear Discriminant Analysis as its name suggests is a linear model for
classification and dimensionality reduction.

7. What is the need for LDA?

 Logistic Regression performs well for binary classification but fails in the case of multi-class classification problems with well-separated classes, while LDA handles these quite efficiently.

 LDA can also be used in data pre-processing to reduce the number of


features just as PCA which reduces the computing cost significantly.

8. What are the limitations of LDA?

 Linear decision boundaries may not effectively separate non-linearly


separable classes. More flexible boundaries are desired.

 In cases where the number of features exceeds the number of observations, LDA might not perform as desired. This is called the Small Sample Size (SSS) problem; regularization is required.

9. List the steps involved in LDA.


There are three key steps:
(i) Calculate the separability between different classes. This is also known as the between-class variance and is defined as the distance between the means of the different classes.
(ii) Calculate the within-class variance. This is the distance between the mean and the samples of every class.
(iii) Construct the lower-dimensional space that maximizes Step 1 (between-class variance) and minimizes Step 2 (within-class variance).
10.What are the advantages of LDA?

1. Simple prototype classifier: Distance to the class mean is used, it’s simple
to interpret.
2. Decision boundary is linear: It’s simple to implement and the
classification is robust.
3. Dimension reduction: It provides informative low-dimensional view on
the data, which is both useful for visualization and feature engineering.

11. What is an autoencoder?
An autoencoder is an unsupervised Artificial Neural Network that attempts to encode the data by compressing it into lower dimensions (the bottleneck layer, or code) and then decoding the data to reconstruct the original input. The bottleneck layer (or code) holds the compressed representation of the input data.
In an autoencoder the number of output units must be equal to the number of input units, since we are attempting to reconstruct the input data. Autoencoders usually consist of an encoder and a decoder. The encoder encodes the provided data into a lower dimension, which is the size of the bottleneck layer, and the decoder decodes the compressed data into its original form.

13. What are types of auto encoder?


Types of Autoencoders are:

28
 Deep Autoencoder
 Sparse Autoencoder
 Under complete Autoencoder
 Variational Autoencoder
 LSTM Autoencoder

14. List four hyperparameters for an autoencoder.

Hyperparameters of an Autoencoder:
 Code size, or the number of units in the bottleneck layer
 Input and output size, which is the number of features in the data
 Number of neurons or nodes per layer
 Number of layers in the encoder and decoder
 Activation function
 Optimization function

15.What is ResNet?

ResNet, the winner of the ILSVRC-2015 competition, is a deep network with over 100 layers. Residual networks (ResNet) are similar to VGG nets; however, in addition to the sequential approach they also use "skip connections" and "batch normalization", which help to train deep layers without hampering the performance. After VGG Nets, as CNNs were going deeper, it was becoming hard to train them because of the vanishing gradients problem, which makes the derivative infinitesimally small. Therefore, the overall performance saturates or even degrades. The idea of skip connections came from the highway network, where gated shortcut connections were used.

16.What is inception net?

The Inception network, also known as GoogLeNet, was proposed by developers at Google in "Going Deeper with Convolutions" in 2014. The motivation for InceptionNet comes from the presence of sparse features and salient parts in the image that can have a large variation in size. Due to this, the selection of the right kernel size becomes extremely difficult, as big kernels are needed for global features and small kernels when the features are locally located. InceptionNet resolves this by stacking multiple kernels at the same level. Typically it uses 5x5, 3x3 and 1x1 filters in one go.

17.What is hyper parameter optimization?


Hyperparameter optimization in machine learning intends to find the
hyperparameters of a given machine learning algorithm that deliver the best
performance as measured on a validation set.
Hyperparameters, in contrast to model parameters, are set by the machine
learning engineer before training. The number of trees in a random forest is a
hyperparameter while the weights in a neural network are model parameters
learned during training.
Hyperparameter optimization finds a combination of hyperparameters
that returns an optimal model which reduces a predefined loss function and
in turn increases the accuracy on given independent data

18.List four hyper parameter methods.

 Manual Hyperparameter Tuning


 Grid Search
 Random Search
 Bayesian Optimization
 Gradient-based Optimization
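An illustrative sketch (assuming scikit-learn is available; the estimator, dataset and parameter grid are assumptions, not part of the original answer) of the grid search method:

# Try every combination in the grid with 5-fold cross-validation and keep the best.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)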

19. What is meant by batch normalization?

Batch normalization is a technique for training very deep neural networks that
normalizes the contributions to a layer for every mini-batch. This has the impact
of settling the learning process and drastically decreasing the number of training
epochs required to train deep neural networks.

20.List three popular manifold algorithms


Three popular manifold learning algorithms:
i) IsoMap (Isometric Mapping):
Isomap seeks a lower-dimensional representation that maintains ‘geodesic
distances’ between the points. A geodesic distance is a generalization of
distance for curved surfaces.
ii) Locally Linear Embeddings (LLE):
LLE uses a variety of tangent linear patches to model a manifold.
iii) t-SNE:
t-SNE is one of the most popular choices for high-dimensional visualization,
and stands for t-distributed Stochastic Neighbor Embeddings.
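An illustrative sketch (assuming scikit-learn is available; the dataset and settings are assumptions, not part of the original answer) applying two of these manifold learners:

# Reduce the 64-dimensional digits data to 2 dimensions with Isomap and t-SNE.
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap, TSNE

X, _ = load_digits(return_X_y=True)
iso_2d = Isomap(n_components=2).fit_transform(X)    # preserves geodesic distances
tsne_2d = TSNE(n_components=2).fit_transform(X)     # preserves local neighbourhoods
print(iso_2d.shape, tsne_2d.shape)                   # both (1797, 2)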

21.What is manifold learning?

Manifold learning for dimensionality reduction has recently gained much


attention to assist image processing tasks such as segmentation, registration,
tracking, recognition, and computational anatomy. It overcomes the
drawbacks of PCA. The goal of the manifold-learning algorithms is to
recover the original domain structure, up to some scaling and rotation.

PART-B

1. Explain auto encoder. What are the types of auto encoder? What are the
hyper parameters of an auto encoder?

An autoencoder is an unsupervised Artificial Neural Network that attempts to encode the data by compressing it into lower dimensions (the bottleneck layer, or code) and then decoding the data to reconstruct the original input. The bottleneck layer (or code) holds the compressed representation of the input data. In an autoencoder the number of output units must be equal to the number of input units, since we are attempting to reconstruct the input data. Autoencoders usually consist of an encoder and a decoder. The encoder encodes the provided data into a lower dimension, which is the size of the bottleneck layer, and the decoder decodes the compressed data into its original form.
Types of Autoencoders are:
 Deep Autoencoder
 Sparse Autoencoder
 Under complete Autoencoder
 Variational Autoencoder
 LSTM Autoencoder

Hyperparameters of an Autoencoder

 Code size or the number of units in the bottleneck layer


 Input and output size, which is the number of features in the data
 Number of neurons or nodes per layer
 Number of layers in the encoder and decoder
 Activation function
 Optimization function
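An illustrative sketch (assuming TensorFlow/Keras is available; the layer sizes and 784-dimensional input are assumptions, not part of the original answer) of a small dense autoencoder with a 32-unit bottleneck:

from tensorflow.keras import layers, models

input_dim, code_size = 784, 32            # e.g. flattened 28x28 images

encoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_size, activation="relu"),       # bottleneck / code layer
])
decoder = models.Sequential([
    layers.Input(shape=(code_size,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),    # output size equals input size
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")     # trained to reconstruct its input
autoencoder.summary()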

2. Explain Principal Component analysis? What are steps involved in


PCA? What are the advantages and disadvantages of PCA?

Principal Component Analysis:


Principal Component Analysis, or simply PCA, is a statistical procedure
concerned with elucidating the covariance structure of a set of variables. In
particular it allows us to identify the principal directions in which the data
varies. It is a way of identifying patterns in data, and expressing the data in
such a way as to highlight their similarities and differences.
Since patterns in data can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. by reducing the number of dimensions, without much loss of information.

Principal component analysis can be broken down into five steps

 Standardize the range of continuous initial variables


 Compute the covariance matrix to identify correlations
 Compute the eigenvectors and eigenvalues of the covariance matrix
to identify the principal components
 Create a feature vector to decide which principal components to
keep
 Recast the data along the principal component axes
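A minimal Python/NumPy sketch (not part of the original answer) of the five steps above on a small made-up data matrix:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# 1. Standardize the variables
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
# 2. Compute the covariance matrix
cov = np.cov(Xs, rowvar=False)
# 3. Eigenvectors and eigenvalues of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
# 4. Feature vector: keep the component(s) with the largest eigenvalues
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:1]]        # keep only the first principal component
# 5. Recast the data along the principal component axes
X_pca = Xs @ components
print(X_pca)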

Advantages of PCA.

1. Removes Correlated Features: In a real-world scenario, it is very


common that we get thousands of features in our dataset. Hence the data set
should be reduced. We need to find out the correlation among the features
(correlated variables). Finding correlation manually in thousands of features
is nearly impossible, frustrating and time-consuming. PCA performs this
task effectively. After implementing the PCA on your dataset, all the
Principal Components are independent of one another. There is no
correlation among them.

2. Improves Algorithm Performance: With so many features, the


performance of your algorithm will drastically degrade. PCA is a very
common way to speed up your Machine Learning algorithm by getting rid of
correlated variables which don't contribute in any decision making. The
training time of the algorithms reduces significantly with a smaller number
of features. So, if the input dimensions are too high, then using PCA to speed
up the algorithm is a reasonable choice.

3. Reduces Overfitting: Overfitting mainly occurs when there are


too many variables in the dataset. So, PCA helps in overcoming the over
fitting issue by reducing the number of features.

4. Improves Visualization:

Disadvantages of PCA

 Independent variables become less interpretable: After implementing


PCA on the dataset, your original features will turn into Principal
Components. Principal Components are the linear combination of your
original features. Principal Components are not as readable and
interpretable as original features.

 Data standardization is must before PCA: You must standardize your


data before implementing PCA; otherwise PCA will not be able to find
the optimal Principal Components.

 Information Loss: Although Principal Components try to cover


maximum variance among the features in a dataset, if we don't select the
number of Principal Components with care, it may miss some
information as compared to the original list of features.

3. Explain linear discriminant Analysis? What is the need of LDA? What
are the steps involved in LDA? What are the advantages and disadvantages
of LDA?

Linear Discriminant Analysis as its name suggests is a linear model for


classification and dimensionality reduction. Most commonly used for feature
extraction in pattern classification problems.

Need for LDA:

 Logistic Regression performs well for binary classification but fails in the case of multi-class classification problems with well-separated classes, while LDA handles these quite efficiently.
 LDA can also be used in data pre-processing to reduce the number of features, just as PCA does, which reduces the computing cost significantly.
Limitations:
 Linear decision boundaries may not effectively separate non-linearly
separable classes. More flexible boundaries are desired.
 In cases where the number of features exceeds the number of observations, LDA might not perform as desired. This is called the Small Sample Size (SSS) problem. Regularization is required.

Linear Discriminant Analysis or Normal Discriminant Analysis or


Discriminant Function Analysis is a dimensionality reduction technique that is
commonly used for supervised classification problems. It is used for modelling
differences in groups i.e. separating two or more classes. It is used to project the
features in higher dimension space into a lower dimension space. For example,
we have two classes and we need to separate them efficiently. Classes can have
multiple features. Using only a single feature to classify them may result in
some overlapping. So, we will keep on increasing the number of features for
proper classification.

Steps involved in LDA:


There are three key steps.
(i) Calculate the separability between different classes. This is also known as the between-class variance and is defined as the distance between the means of the different classes.
(ii) Calculate the within-class variance. This is the distance between the mean and the samples of every class.
(iii) Construct the lower-dimensional space that maximizes Step 1 (between-class variance) and minimizes Step 2 (within-class variance).
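An illustrative sketch (assuming scikit-learn is available; the dataset is an assumption, not part of the original answer) of LDA used both for dimensionality reduction and classification:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)                    # project onto 2 discriminant axes
print(X_2d.shape)                                  # (150, 2)
print(lda.score(X, y))                             # accuracy as a classifier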

Advantages of LDA:

1. Simple prototype classifier: Distance to the class mean is used, it’s


simple to interpret.
2. Decision boundary is linear: It’s simple to implement and the
classification is robust.
3. Dimension reduction: It provides informative low-dimensional view on
the data, which is both useful for visualization and feature engineering.

4. Explain manifold learning? What are the categories of manifold


learning? Explain manifold learning Algorithms.

Manifold Learnings:

 Manifold learning for dimensionality reduction has recently gained


much attention to assist image processing tasks such as segmentation,
registration, tracking, recognition, and computational anatomy.
 The drawbacks of PCA in handling dimensionality reduction problems
for non-linear weird and curved shaped surfaces necessitated
development of more advanced algorithms like Manifold Learning.
 This kind of data representation selectively chooses data points from a
low-dimensional manifold that is embedded in a high-dimensional
space in an attempt to generalize linear frameworks like PCA.
 Manifolds give a look of flat and featureless space that behaves like
Euclidean space. Manifold learning problems are unsupervised where
it learns the high-dimensional structure of the data from the data itself,
without the use of predetermined classifications and loss of
importance of information regarding some characteristic of the
original variables.
 The goal of the manifold-learning algorithms is to recover the original
domain structure, up to some scaling and rotation. The nonlinearity of
these algorithms allows them to reveal the domain structure even
when the manifold is not linearly embedded. It uses some scaling and
rotation for this purpose.

Manifold learning algorithms are divided in to two categories:

 Global methods: Allows high-dimensional data to be mapped


from high-dimensional to low-dimensional such that the global
properties are preserved. Examples include Multidimensional
Scaling (MDS), Isomaps covered in the following sections.

 Local methods: Allows high-dimensional data to be mapped to


low dimensional such that local properties are preserved.
Examples are Locally linear embedding (LLE), Laplacian
eigenmap (LE), Local tangent space alignment (LSTA), Hessian
Eigenmapping (HLLE).

Three popular manifold learning algorithms:

IsoMap (Isometric Mapping): Isomap seeks a lower-dimensional


representation that maintains ‘geodesic distances’ between the points. A
geodesic distance is a generalization of distance for curved surfaces. Hence,
instead of measuring distance in pure Euclidean distance with the Pythagorean
theorem-derived distance formula, Isomap optimizes distances along a
discovered manifold
Locally Linear Embeddings: Locally Linear Embeddings use a variety of tangent linear patches to model a manifold. It can be thought of as performing a PCA on each of these neighbourhoods locally, producing a linear hyperplane, then comparing the results globally to find the best nonlinear embedding. The goal of LLE is to 'unroll' or 'unpack' in distorted fashion the structure of the data, so often LLE will tend to have a high density in the center with extending rays.
t-SNE:
t-SNE is one of the most popular choices for high-dimensional visualization,
and stands for t-distributed Stochastic Neighbor Embeddings. The algorithm
converts relationships in original space into t-distributions, or normal
distributions with small sample sizes and relatively unknown standard
deviations. This makes t-SNE very sensitive to the local structure, a common
theme in manifold learning. It is considered to be the go-to visualization method
because of many advantages it possesses

5. Explain (i) Alexnet (ii) VGG-16 (iii) Resnet (iv) Inceptionnet

 Convolutional neural networks are one of the variants of neural


networks where hidden layers consist of convolutional layers, pooling
layers, fully connected layers, and normalization layers. A stack of
distinct layers that transform input volume into output volume with the
help of a differentiable function is known as CNN
Architecture. Numerous variations of such arrangements have developed
over the years resulting in several CNN architectures. The most common
amongst them are:

i. AlexNet

ii. VGGNet

iii. ResNet

iv. GoogleNet / Inception

(i) AlexNet:

The AlexNet model was proposed in 2012 in the research paper named "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky and his colleagues. AlexNet was the first convolutional network which used a GPU to boost performance.
1. AlexNet architecture consists of 5 convolutional layers, 3 max-


pooling layers, 2 normalization layers, 2 fully connected layers,
and 1 softmax layer. 

2. Each convolutional layer consists of convolutional filters and a


nonlinear activation function ReLU. 

3. The pooling layers are used to perform max pooling. 

4.  Input size is fixed due to the presence of fully connected layers.

5. The input size is mentioned in most places as 224x224x3, but due to some padding it works out to be 227x227x3.

6. AlexNet overall has 60 million parameters.

Model Details 

The model which won the competition was tuned with specific details-

1. ReLU is an activation function 

2. Used Normalization layers which are not common anymore 

3. Batch size of 128 

4. SGD Momentum as learning algorithm 

5. Heavy Data Augmentation with things like flipping, jittering,


cropping, color normalization, etc. 

6. Ensembling of models to get the best results. 

(ii) VGG-16
The major shortcoming of too many hyper-parameters of AlexNet was solved by VGG Net by replacing the large kernel-sized filters (11 and 5 in the first and second convolution layers, respectively) with multiple 3x3 kernel-sized filters one after another.

 The architecture developed by Simonyan and Zisserman was the 1st runner up of the Visual Recognition Challenge of 2014.
 The architecture consists of 3x3 convolutional filters with a stride of 1 and 2x2 max pooling layers with a stride of 2.
 Padding is kept the same to preserve the dimension.
 There are 16 layers in the network, where the input image is in RGB format with a dimension of 224x224x3, followed by 5 blocks of convolution (filters: 64, 128, 256, 512, 512) and max pooling.
 The output of these layers is fed into three fully connected layers and a softmax function in the output layer.
 In total there are 138 million parameters in VGG Net.
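An illustrative sketch (assuming TensorFlow/Keras and the pretrained ImageNet weights are available; not part of the original answer) that loads the 16-layer VGG model:

from tensorflow.keras.applications import VGG16

model = VGG16(weights="imagenet", include_top=True)   # 224x224x3 input, 1000 classes
model.summary()                                        # shows roughly 138 million parameters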

iii) ResNet:

ResNet, the winner of the ILSVRC-2015 competition, is a deep network with over 100 layers. Residual networks (ResNet) are similar to VGG nets; however, in addition to the sequential approach they also use "skip connections" and "batch normalization", which help to train deep layers without hampering the performance. After VGG Nets, as CNNs were going deeper, it was becoming hard to train them because of the vanishing gradients problem, which makes the derivative infinitesimally small. Therefore, the overall performance saturates or even degrades. The idea of skip connections came from the highway network, where gated shortcut connections were used.
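An illustrative sketch (assuming TensorFlow/Keras; the input shape and filter count are assumptions, not part of the original answer) of a single residual block with a skip connection:

from tensorflow.keras import layers, models

def residual_block(x, filters=64):
    shortcut = x                                      # the skip connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                   # add the skipped input back
    return layers.ReLU()(y)

inputs = layers.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = models.Model(inputs, outputs)
model.summary()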

iv) Inception Net:


The Inception network, also known as GoogLeNet, was proposed by developers at Google in "Going Deeper with Convolutions" in 2014. The motivation for InceptionNet comes from the presence of sparse features and salient parts in the image that can have a large variation in size. Due to this, the selection of the right kernel size becomes extremely difficult, as big kernels are needed for global features and small kernels when the features are locally located. InceptionNet resolves this by stacking multiple kernels at the same level. Typically it uses 5x5, 3x3 and 1x1 filters in one go.

UNIT - 4

PART-A
1. What is meant by optimization?

Optimization is the problem of finding a set of inputs to an objective function that results in a maximum or minimum function evaluation. It is the challenging problem that underlies many machine learning algorithms, from fitting logistic regression models to training artificial neural networks.

In Deep Learning, with the help of a loss function, the performance of the model is estimated/evaluated. This loss is used to train the network so that it performs better. Essentially, we try to minimize the loss function. The process of minimizing any mathematical function is called optimization.

2. What is the need for optimization?


 Presence of local minima reduces the model performance
 Presence of saddle points creates vanishing gradient or exploding gradient issues
 To select appropriate weight values and other associated model parameters
 To minimize the loss value (training error)

3. What is convex optimization?

Convex optimization is a kind of optimization which deals with the study


of problem of minimizing convex functions. Here the optimization function
is convex function.
All Linear functions are convex, so linear programming problems are
convex problems. When we have a convex objective and a convex feasible
region, then there can be only one optimal solution, which is globally
optimal.
Definition: A set C ⊆ R^n is convex if for all x, y ∈ C and any α ∈ [0, 1], αx + (1 − α)y ∈ C.

4. Explain non convex optimization?


• The Objective function is a non- convex function
• All non-linear problems can be modelled by using non convex functions.
(Linear functions are convex)
• It has multiple feasible regions and multiple locally optimal points.
• There can’t be a general algorithm to solve it efficiently in all cases
• Neural networks are universal function approximators, to do this, they
need to be able to approximate non-convex functions.
5. What are the reasons for non-convexity?

• Presence of many Local Minima


• Presence of Saddle Points
• Very Flat Regions
• Varying Curvature

6. State four methods to solve non-convex problem.

 Stochastic gradient descent


 Mini-batching
 SVRG
 Momentum

7. What is STN?

The Spatial Transformer Network (STN) helps to crop out and scale-normalize the appropriate region, which can simplify the subsequent classification task and lead to better classification performance. The Spatial Transformer Network contains three parts, namely the Localization network, Grid Generator and Sampler. These networks are used for performing transformations such as cropping, rotation, etc. on the given input images.

8. What are the advantages of STN?

 Helps in learning explicit spatial transformations like translation,


rotation, scaling, cropping, non-rigid deformations, etc. of features.
 Can be used in any networks and at any layer and learnt in an end-
to-end trainable manner.
 Provides improvement in the performance of existing models
9. What is RNN?

 RNNs are very powerful, because they combine two properties:


 Distributed hidden state that allows them to store a lot of information
about the past efficiently.
 Non-linear dynamics that allows them to update their hidden state in
complicated ways.
 With enough neurons and time, RNNs can compute anything that can
be computed by your computer.

10.What is LSTM?

LSTMs are a special kind of RNN, capable of learning long-term dependencies; remembering information for long periods is their default behavior.
All RNNs have the form of a chain of repeating modules of a neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
LSTMs also have a chain-like structure, but the repeating module has a different structure: instead of a single neural network layer, there are four interacting layers communicating in a special way.

11.What are the steps involved in LSTM?

Step 1: Decide how much past data it should remember. The first step in
the LSTM is to decide which information should be omitted from the cell in
that particular time step. The sigmoid function determines this.
Step 2: Decide how much this unit adds to the current state. In the second
layer, there are two parts. One is the sigmoid function, and the other is the
tanh function.
Step 3: Decide what part of the current cell state makes it to the output. The third step is to decide what the output will be.
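A minimal Python/NumPy sketch (not from the original answer) of one LSTM time step showing the three gates; the sizes and random weights are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate / candidate, acting on the concatenation [h_prev, x]
Wf, Wi, Wc, Wo = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))
bf, bi, bc, bo = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)            # Step 1: forget gate decides what to discard
    i = sigmoid(Wi @ z + bi)            # Step 2: input gate ...
    c_tilde = np.tanh(Wc @ z + bc)      # ... and tanh candidate for the cell state
    c = f * c_prev + i * c_tilde        # new cell state
    o = sigmoid(Wo @ z + bo)            # Step 3: output gate decides what to emit
    h = o * np.tanh(c)                  # new hidden state (the output)
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid))
print(h, c)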

12.List any four applications of LSTM?

 Robot control
 Time series prediction
 Speech recognition
 Rhythm learning
 Music composition
 Grammar learning
 Handwriting recognition

13.What is computational neuroscience?

Computational neuroscience is the field of study in which mathematical


tools and theories are used to investigate brain function.
The term “computational neuroscience” has two different definitions:
1. Using a computer to study the brain
2. Studying the brain as a computer
Computational and Artificial Neuroscience deals with the study or
understanding of how signals are transmitted through and from the human
brain.

14.What is BNN?

The human brain consists of a very large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and it is the massive interaction between all cells and their parallel processing that makes the brain's abilities possible.
The various parts of a biological neural network (BNN) are: dendrites, soma or cell body, axon, myelin sheath, nodes of Ranvier, synapse and terminal buttons.

15.Explain artificial neuron model?


An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron.
 A set of input connections brings in activations from other neurons.
 A processing unit sums the inputs and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
 An output line transmits the result to other neurons (a minimal code sketch is given below).
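A minimal Python sketch of this neuron model (the weights, bias and inputs below are made up for illustration):

    import numpy as np

    # A single artificial neuron: weighted sum of inputs plus a bias,
    # passed through a non-linear activation (sigmoid used here).
    def neuron(inputs, weights, bias):
        z = np.dot(weights, inputs) + bias       # summation unit
        return 1.0 / (1.0 + np.exp(-z))          # squashing/transfer function

    x = np.array([0.5, -1.0, 2.0])               # activations from other neurons
    w = np.array([0.4, 0.3, -0.2])               # connection weights (illustrative)
    print(neuron(x, w, bias=0.1))                # result transmitted to other neurons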

16.What are the basic elements of ANN?

An artificial neuron consists of three basic components: weights, thresholds and a single activation function. An artificial neural network (ANN) model is based on biological neural systems.

17.List any four applications of computational neuro science.

 Deep Learning, Artificial Intelligence and Machine Learning


 Human psychology  Medical sciences
 Mental models
 Computational anatomy
 Information theory

18.State the difference between convex and non-convex optimization

Convex optimization

 Convex optimization is the branch of optimization that deals with minimizing convex functions; here the objective function is a convex function.
 All linear functions are convex, so linear programming problems are convex problems.
 It is much easier to analyze and test algorithms in such a context.

Non-convex optimization

 The objective function is a non-convex function; all non-linear problems can be modelled using non-convex functions.
 There can't be a general algorithm to solve such problems efficiently in all cases.
 Neural networks are universal function approximators; to achieve this, they need to be able to approximate non-convex functions.

19.What are optimizers?

Optimizers are algorithms or methods used to change the attributes of the neural network, such as its weights and learning rate, so that the loss is reduced. Optimizers are used to solve optimization problems by minimizing a function: the goal of an optimizer is to minimize the objective function (the loss function computed on the training data set). Simply put, optimization means minimizing the training error.

20.What is the goal of computational neuro science?

The goal of computational neuroscience is to explain how electrical and


chemical signals are used in the brain to represent and process information.
It explains the biophysical mechanisms of computation in neurons, computer
simulations of neural circuits, and models of learning.

PART- B
1. Explain optimization in Deep Learning. What is the need for optimization, and what are the types of optimization?

Optimization in Deep Learning: In deep learning, the performance of the model is estimated/evaluated with the help of a loss function. This loss is used to train the network so that it performs better. Essentially, we try to minimize the loss function: a lower loss means the model performs better.

The process of minimizing any mathematical function is called optimization.
Optimizers are algorithms or methods used to change the attributes of the neural network, such as its weights and learning rate, so that the loss is reduced. Optimizers are used to solve optimization problems by minimizing a function: the goal of an optimizer is to minimize the objective function (the loss function computed on the training data set). Simply put, optimization means minimizing the training error.

Need for Optimization:

 Presence of Local Minima reduces the model performance


 Presence of Saddle Points which create Vanishing Gradient or Exploding Gradient issues
 To select appropriate weight values and other associated model
parameters
 To minimize the loss value (Training error)

Types of optimizations

 Gradient Descent

 Stochastic Gradient Descent (SGD)

 SVRG (Stochastic Variance Reduced Gradient)

 Mini Batch — Stochastic Gradient Descent.

 Momentum-based Optimizer (a minimal sketch of SGD with momentum is given below).
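The following NumPy sketch (an illustration, not part of the syllabus text) shows mini-batch stochastic gradient descent with momentum on a toy least-squares problem; the data and hyperparameters are made up.

    import numpy as np

    # Toy regression data: y = X @ true_w + noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=200)

    w = np.zeros(3)            # parameters to learn
    velocity = np.zeros(3)     # momentum buffer
    lr, beta, batch = 0.05, 0.9, 16

    for step in range(500):
        idx = rng.choice(len(X), size=batch, replace=False)    # mini-batch
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch    # gradient of MSE
        velocity = beta * velocity + grad                      # momentum accumulation
        w -= lr * velocity                                     # parameter update

    print(w)   # should be close to true_w, i.e. the training error is minimized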

2. Explain RNNs. What is the need for RNNs? How do we specify inputs and targets to an RNN?
A recurrent neural network is a type of artificial neural network commonly used in speech recognition and natural language processing. RNNs are used in deep learning and in the development of models that simulate neuron activity in the human brain. They are especially powerful in use cases where context is critical to predicting an outcome, and they are also distinct from other types of artificial neural networks because they use feedback loops to process a sequence of data that informs the final output. These feedback loops allow information to persist; this effect is often described as memory.

Figure: Recurrent Neural Network

 RNNs are very powerful, because they combine two properties:

 Distributed hidden state that allows them to store a lot of


information about the past efficiently.

 Non-linear dynamics that allow them to update their hidden state in


complicated ways.

 With enough neurons and time, RNNs can compute anything that can be
computed by your computer.

Need for RNN


 Normal Networks cannot handle sequential data
 They consider only the current input
 Normal Neural networks cannot memorize previous inputs

Providing Input to RNN:

We can specify inputs in several ways:


 Specify the initial states of all the units.
 Specify the initial states of a subset of the units.
 Specify the states of the same subset of the units at every time step.

Providing Targets to RNN:

We can specify targets in several ways:


– Specify desired final activities of all the units
– Specify desired activities of all units for the last few steps
• Good for learning attractors
• It is easy to add in extra error derivatives as we backpropagate.
– Specify the desired activity of a subset of the units
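A minimal NumPy sketch of the recurrence that gives an RNN its memory (sizes, weights and inputs are placeholders, not from the text); the initial hidden state is one way of specifying inputs, and the final hidden state is one place a target could be attached.

    import numpy as np

    # Vanilla RNN forward pass: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
    rng = np.random.default_rng(0)
    input_size, hidden_size, steps = 4, 8, 5

    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)                    # specified initial state
    inputs = rng.normal(size=(steps, input_size))

    for x_t in inputs:                           # process the sequence step by step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)   # hidden state carries the past

    print(h)   # final activities, against which a desired target could be specified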

3. Explain LSTM networks. What are the steps involved in LSTM networks? What are the applications of LSTM networks?

Long Short Term Memory Network’s ( LSTM):

LSTMs are a special kind of RNN capable of learning long-term dependencies; remembering information for long periods is their default behaviour. All RNNs have the form of a chain of repeating modules of a neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
LSTMs also have a chain-like structure, but the repeating module has a different structure: instead of a single neural network layer, there are four interacting layers that communicate with one another. Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time (hundreds of time steps). They designed a memory cell using logistic and linear units with multiplicative interactions.

To preserve information for a long time in the activities of an RNN, we use a


circuit that implements an analogue memory cell.

– A linear unit that has a self-link with a weight of 1 will maintain its
state.
– Information is stored in the cell by activating its write gate.
– Information is retrieved by activating the read gate.
– We can backpropagate through this circuit because logistic units have nice derivatives.

Steps Involved in LSTM Networks:

Step 1: Decide how much past data it should remember.


The first step in the LSTM is to decide which information should
be omitted from the cell in that particular time step. The sigmoid function
determines this. It looks at the previous state (ht-1) along with the current
input xt and computes the function.

Step 2: Decide how much this unit adds to the current state.
In the second layer, there are two parts. One is the sigmoid
function, and the other is the tanh function. In the sigmoid function, it
decides which values to let through (0 or 1). tanh function gives
weightage to the values which are passed, deciding their level of
importance (-1 to 1).

Step 3: Decide what part of the current cell state makes it to the output.
The third step is to decide what the output will be. First, we run a
sigmoid layer, which decides what parts of the cell state make it to the
output. Then, we put the cell state through tanh to push the values to be
between -1 and 1 and multiply it by the output of the sigmoid gate.
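The three steps can be written out gate by gate. The NumPy sketch below is an illustration with random placeholder weights (biases omitted for brevity), not a full LSTM implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    hidden, inp = 3, 2
    Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(hidden, hidden + inp)) for _ in range(4))

    h_prev, c_prev = np.zeros(hidden), np.zeros(hidden)   # previous hidden and cell state
    x_t = rng.normal(size=inp)                            # current input
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]

    f_t = sigmoid(Wf @ z)             # Step 1: forget gate decides what to omit
    i_t = sigmoid(Wi @ z)             # Step 2: input gate (sigmoid part)
    c_hat = np.tanh(Wc @ z)           # Step 2: candidate values (tanh part)
    c_t = f_t * c_prev + i_t * c_hat  # updated cell state
    o_t = sigmoid(Wo @ z)             # Step 3: output gate
    h_t = o_t * np.tanh(c_t)          # output / new hidden state
    print(h_t)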

Applications of LSTM include:


• Robot control
• Time series prediction
• Speech recognition
• Rhythm learning
• Music composition
• Grammar learning
• Handwriting recognition

4. Explain BNN. Briefly explain the artificial neuron model and the elements of an ANN. Explain computational and artificial neuroscience.

The Biological Neurons:


The human brain consists of a very large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and it is the massive interaction between all cells and their parallel processing that makes the brain's abilities possible.
Artificial neuron model
An artificial neuron is a mathematical function conceived as a simple
model of a real (biological) neuron.
 The McCulloch-Pitts neuron: this is a simplified model of real neurons, known as a Threshold Logic Unit.
 A set of input connections brings in activations from other neurons.
 A processing unit sums the inputs and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
 An output line transmits the result to other neurons.

Basic Elements of ANN: A neuron consists of three basic components: weights, thresholds and a single activation function. An artificial neural network (ANN) model based on biological neural systems is shown in the figure.

The goal of computational neuroscience is to explain how electrical and


chemical signals are used in the brain to represent and process information.
It explains the biophysical mechanisms of computation in neurons, computer
simulations of neural circuits, and models of learning.

Computational and Artificial Neuro-Science:

Computational neuroscience is the field of study in which mathematical tools and theories are used to investigate brain function. The term "computational neuroscience" has two different definitions:
1. Using a computer to study the brain.
2. Studying the brain as a computer.
Computational and artificial neuroscience deals with the study or understanding of how signals are transmitted through and from the human brain.

A better understanding of how decisions are made in the human brain by processing data or signals will help us develop intelligent algorithms or programs to solve complex problems. Hence, we need to understand the basics of Biological Neural Networks (BNN).

5. Explain artificial neuroscience. How is it related to the biological neuron? Explain the biological neuron. What are the applications of artificial neuroscience?

Artificial Neuro-Science: Artificial neuroscience deals with the study or understanding of how signals are transmitted through and from the human brain. A better understanding of how decisions are made in the human brain by processing data or signals will help us develop intelligent algorithms or programs to solve complex problems. Hence, we need to understand the basics of Biological Neural Networks (BNN).

The Biological Neurons: The human brain consists of a very large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and it is the massive interaction between all cells and their parallel processing that makes the brain's abilities possible. Figure 1 represents a human biological nervous unit. The various parts of a biological neural network (BNN) are described below.

 Dendrites are branching fibres that extend from the cell body or soma.

 Soma or cell body of a neuron contains the nucleus and other structures,
support chemical processing and production of neurotransmitters.

 Axon is a single fibre that carries information away from the soma to the synaptic sites of other neurons (dendrites and somas), muscles, or glands. The axon hillock is the site of summation for incoming information. At any moment, the collective influence of all neurons that conduct impulses to a given neuron will determine whether or not an action potential will be initiated at the axon hillock and propagated along the axon.

 Myelin sheath consists of fat-containing cells that insulate the axon from
electrical activity. This insulation acts to increase the rate of transmission
of signals. A gap exists between each myelin sheath cell along the axon.
Since fat inhibits the propagation of electricity, the signals jump from one
gap to the next.
 Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells.
Since fat serves as a good insulator, the myelin sheaths speed the rate of
transmission of an electrical impulse along the axon.

 Synapse is the point of connection between two neurons or a neuron and


a muscle or a gland. Electrochemical communication between neurons takes place at these junctions.

 Terminal buttons of a neuron are the small knobs at the end of an axon that release chemicals called neurotransmitters.

Applications of Computational Neuro Science:

 Deep Learning, Artificial Intelligence and Machine Learning


 Human psychology
 Medical sciences
 Mental models
 Computational anatomy
 Information theory

UNIT - 5

PART-A
1. What is Imagenet?

 ImageNet is an image database organized according to the WordNet


hierarchy (currently only the nouns), in which each node of the hierarchy is
depicted by hundreds and thousands of images.
 ImageNet gives researchers a common set of images to benchmark their
models and algorithms. ImageNet is useful for many computer vision
applications such as object recognition, image classification and object
localization.

2. What is wordnet?

WordNet is a database of English words linked together by semantic


relationships.

3. What is Synset?

 Words of similar meaning are grouped together into a synonym


set, simply called synset.
 Hypernyms are synsets that are more general. Thus, "organism" is
a hypernym of "plant".
 Hyponyms are synsets that are more specific. Thus, "aquatic" is a
hyponym of "plant".

4. What is WaveNet?
 WaveNet is a deep generative model of raw audio waveforms.
 WaveNets are able to generate speech which mimics any human
voice and which sounds more natural than the best existing Text-
to-Speech systems, reducing the gap with human performance by
over 50%.

5. Describe the workflow of WaveNet?

 Input is fed into a causal 1D convolution.


 The output is then fed to 2 different dilated 1D convolution layers
with sigmoid and tanh activations.
 The element-wise multiplication of 2 different activation values
results in a skip connection and the element-wise addition of a skip
connection and output of causal 1D results in the residual.

6. What is NLP?

 Natural Language Processing (NLP) is the sub-field of Computer


Science especially Artificial Intelligence (AI) that is concerned
about enabling computers to understand and process human
language.
 Technically, the main task of NLP would be to program computers
for analyzing and processing huge amount of natural language data.

7. List four phases of NLP?


The four phases of Natural Language Processing are:
1. Morphological Processing
2. Syntax Analysis
3. Semantic Analysis
4. Pragmatic Analysis
8. What are the different types of NLP based on working?

o Speech Recognition — The translation of spoken language into


text.
o Natural Language Understanding (NLU) — The computer’s ability
to understand what we say.
o Natural Language Generation (NLG) — The generation of natural
language by a computer.

9. What are the applications of NLP?

 Spam Filters
 Algorithmic Trading
 Answering Questions
 Summarizing information, etc.

10. How images are labeled in ImageNet?

In the early stages of the ImageNet project, a few people tried to label the images collected for ImageNet. But then the researchers came to know about an Amazon service called Mechanical Turk, which meant that image labelling could be crowdsourced via this service.
Humans all over the world would label the images for a small fee. Humans make mistakes, and therefore we must have checks in place to overcome them. Each human is given a task of 100 images.
In each task, 6 "gold standard" images are placed with known labels. At most 2 errors are allowed on these standard images; otherwise the task has to be restarted. In addition, the same image is labelled by three different humans.
When there is disagreement, such ambiguous images are resubmitted to another human with a tighter quality threshold (only one allowed error on the standard images).

11. How images of ImageNet get licensed?

Images for ImageNet were collected from various online sources.


ImageNet doesn't own the copyright for any of the images.
For public access, ImageNet provides image thumbnails and URLs from
where the original images were downloaded.
Researchers can use these URLs to download the original images.
However, those who wish to use the images for non-commercial or educational purposes can create an account on ImageNet and request access. This allows direct download of images from ImageNet, which is useful when the original sources of images are no longer available.

12. What is Morphological Processing?

It is the first phase of NLP. The purpose of this phase is to break chunks
of language input into sets of tokens corresponding to paragraphs, sentences and
words.
For example, a word like “uneasy” can be broken into two sub-word
tokens as “un-easy”.

13. What is Syntax Analysis?


 It is the second phase of NLP. The purpose of this phase is twofold: to check whether a sentence is well formed, and to break it up into a structure that shows the syntactic relationships between the different words.

 For example, the sentence like “The school goes to the boy” would
be rejected by syntax analyzer or parser.

14. What is Semantic Analysis?


It is the third phase of NLP. The purpose of this phase is to draw exact
meaning, or you can say dictionary meaning from the text. The text is checked
for meaningfulness.
For example, semantic analyzer would reject a sentence like “Hot ice-
cream”.

15. What is Pragmatic Analysis?


It is the fourth phase of NLP. Pragmatic analysis simply fits the actual
objects/events, which exist in a given context with object references obtained
during the last phase (semantic analysis).
For example, the sentence “Put the banana in the basket on the shelf” can
have two semantic interpretations and pragmatic analyzer will choose between
these two possibilities

16. What is Word2Vec?

Word2vec is a two-layer neural net that processes text by “vectorizing”
words. Its input is a text corpus and its output is a set of vectors: feature vectors
that represent words in that corpus. While Word2vec is not a deep neural
network, it turns text into a numerical form that deep neural networks can
understand.

17. What is the purpose of Word2Vec?

The purpose and usefulness of Word2vec is to group the vectors of


similar words together in vector space. That is, it detects similarities
mathematically.
Word2vec creates vectors that are distributed numerical representations
of word features, features such as the context of individual words. It does so
without human intervention.

18. What are the applications of Word2Vec?

Word2vec’s applications extend beyond parsing sentences in the wild. It


can be applied just as well to genes, code, likes, playlists, social media graphs
and other verbal or symbolic series in which patterns may be discerned.

19. State the applications of Deep learning?


Deep learning finds a lot of usefulness in the fields of Biomedicine and Bioinformatics.
Similarly, for other applications such as facial recognition and scene matching, appropriate deep-learning-based algorithms such as AlexNet, VGG, Inception, ResNet, or deep-learning-based LSTMs or RNNs can be used.

20. What is meant by TTS?

Text-to-speech (TTS), a process usually referred to as speech synthesis, is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances.

Part-B:
1. What is ImageNet? Explain how images are labelled and licensed in ImageNet.

ImageNet:

 ImageNet is an image database organized according to the WordNet


hierarchy (currently only the nouns), in which each node of the
hierarchy is depicted by hundreds and thousands of images.

 ImageNet gives researchers a common set of images to benchmark


their models and algorithms. ImageNet is useful for many computer
vision applications such as object recognition, image classification and
object localization.

Labeling images in ImageNet:

 In the early stages of the ImageNet project, a few people tried to label the images collected for ImageNet. But then the researchers came to know about an Amazon service called Mechanical Turk, which meant that image labelling could be crowdsourced via this service.

 Humans all over the world would label the images for a small fee.
Humans make mistakes and therefore we must have checks in place to
overcome them. Each human is given a task of 100 images.

 In each task, 6 "gold standard" images are placed with known labels.

 At most 2 errors are allowed on these standard images, otherwise the


task has to be restarted. In addition, the same image is labelled by
three different humans.

 When there's disagreement, such ambiguous images are resubmitted to


another human with tighter quality threshold (only one allowed error
on the standard images).

Licensing images of ImageNet:

 Images for ImageNet were collected from various online sources.


ImageNet doesn't own the copyright for any of the images.

 For public access, ImageNet provides image thumbnails and URLs from
where the original images were downloaded.

 Researchers can use these URLs to download the original images. However, those who wish to use the images for non-commercial or educational purposes can create an account on ImageNet and request access. This allows direct download of images from ImageNet, which is useful when the original sources of images are no longer available.

2. What is WaveNet? Explain the WaveNet overall model and also describe the workflow of WaveNet.

WaveNet:
 WaveNet is a deep generative model of raw audio waveforms.

 WaveNets are able to generate speech which mimics any human voice
and which sounds more natural than the best existing Text-to-Speech
systems, reducing the gap with human performance by over 50%.

WaveNet Overall Model:

 Text-to-speech (TTS), a process usually referred to as speech synthesis, is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances.

 WaveNet changes this paradigm by directly modelling the raw


waveform of the audio signal, one sample at a time. As well as
yielding more natural-sounding speech, using raw waveforms means
that WaveNet can model any kind of audio, including music.

 The WaveNet proposes an autoregressive learning with the help of
convolutional networks with some tricks. Basically, we have a
convolution window sliding on the audio data, and at each step try to
predict the next sample value i.e., it builds a network that learns the
causal relationships between consecutive timesteps (as shown in
figure).

The workflow of WaveNet:

 Input is fed into a causal 1D convolution.


 The output is then fed to 2 different dilated 1D convolution layers with
sigmoid and tanh activations.
 The element-wise multiplication of the 2 different activation values results in a skip connection, and the element-wise addition of the skip connection and the output of the causal 1D convolution results in the residual (a sketch of one such block is given below).
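The sketch below is a hedged PyTorch illustration of one such block, with the causal convolution, the gated tanh/sigmoid dilated convolutions, and the skip and residual outputs; the channel sizes and dilation are placeholders and this is not the official WaveNet implementation.

    import torch
    import torch.nn as nn

    channels, dilation = 16, 2

    causal = nn.Conv1d(1, channels, kernel_size=2, padding=1)
    filt = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation, padding=dilation)
    gate = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation, padding=dilation)
    skip_proj = nn.Conv1d(channels, channels, kernel_size=1)
    res_proj = nn.Conv1d(channels, channels, kernel_size=1)

    x = torch.randn(1, 1, 100)                  # (batch, channels, time) raw audio
    h = causal(x)[..., :x.size(-1)]             # trim the right padding to stay causal
    z = torch.tanh(filt(h)[..., :h.size(-1)]) * torch.sigmoid(gate(h)[..., :h.size(-1)])
    skip = skip_proj(z)                         # skip connection output
    residual = h + res_proj(z)                  # residual output fed to the next block
    print(skip.shape, residual.shape)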

3. What is Natural Language Processing (NLP)? Explain different phases


of NLP?
Natural Language Processing (NLP):

 Natural Language Processing (NLP) is the sub-field of Computer


Science especially Artificial Intelligence (AI) that is concerned about
enabling computers to understand and process human language.
 Technically, the main task of NLP would be to program computers for
analyzing and processing huge amount of natural language data.
Applications of NLP:

 Spam Filters
 Algorithmic Trading
 Answering Questions
 Summarizing information, etc.

Phases of NLP:
Natural Language Processing Phases:

Morphological Processing:

 It is the first phase of NLP. The purpose of this phase is to break


chunks of language input into sets of tokens corresponding to
paragraphs, sentences and words.
 For example, a word like “uneasy” can be broken into two sub-word
tokens as “un-easy”.
Syntax Analysis:

 It is the second phase of NLP. The purpose of this phase is twofold: to check whether a sentence is well formed, and to break it up into a structure that shows the syntactic relationships between the different words.
 For example, the sentence like “The school goes to the boy” would
be rejected by syntax analyzer or parser.
Semantic Analysis:

 It is the third phase of NLP. The purpose of this phase is to draw
exact meaning, or you can say dictionary meaning from the text.
The text is checked for meaningfulness.
 For example, semantic analyzer would reject a sentence like “Hot
ice-cream”.
Pragmatic Analysis:
 It is the fourth phase of NLP. Pragmatic analysis simply fits the
actual objects/events, which exist in a given context with object
references obtained during the last phase (semantic analysis).
 For example, the sentence “Put the banana in the basket on the
shelf” can have two semantic interpretations and pragmatic
analyzer will choose between these two possibilities.
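As an illustration only (not from the original text), the first two phases can be approximated with NLTK: tokenisation for morphological processing and part-of-speech tagging as a step towards syntax analysis. This assumes nltk is installed and its 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded via nltk.download(...); semantic and pragmatic analysis need more machinery than shown here.

    import nltk

    sentence = "The school goes to the boy"
    tokens = nltk.word_tokenize(sentence)   # morphological processing: split into tokens
    tags = nltk.pos_tag(tokens)             # syntactic information: part-of-speech tags
    print(tokens)
    print(tags)   # a parser would use these tags to accept or reject the structure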

4. Explain about Word2Vec?

Word2Vec:
 Word2vec is a two-layer neural net that processes text by
“vectorizing” words. Its input is a text corpus and its output is a set
of vectors: feature vectors that represent words in that corpus.
While Word2vec is not a deep neural network, it turns text into a
numerical form that deep neural networks can understand.

The purpose of Word2Vec:


 The purpose and usefulness of Word2vec is to group the vectors of
similar words together in vector space. That is, it detects
similarities mathematically.
 Word2vec creates vectors that are distributed numerical
representations of word features, features such as the context of
individual words. It does so without human intervention.

The applications of Word2Vec:


 Word2vec’s applications extend beyond parsing sentences in the wild.
It can be applied just as well to genes, code, likes, playlists, social
media graphs and other verbal or symbolic series in which patterns
may be discerned.

 Word2vec is similar to an autoencoder, encoding each word in a
vector, but rather than training against the input words through
reconstruction, as a restricted Boltzmann machine does, word2vec
trains words against other words that neighbor them in the input
corpus. It does so in one of two ways: either using context to predict a
target word (a method known as continuous bag of words, or CBOW),
or using a word to predict a target context, which is called skip-gram.

 When the feature vector assigned to a word cannot be used to


accurately predict that word’s context, the components of the vector
are adjusted. Each word’s context in the corpus is the teacher sending
error signals back to adjust the feature vector. The vectors of words
judged similar by their context are nudged closer together by adjusting
the numbers in the vector.

 Similar things and ideas are shown to be “close”. Their relative


meanings have been translated to measurable distances. Qualities
become quantities, and algorithms can do their work. But similarity is
just the basis of many associations that Word2vec can learn. For
example, it can gauge relations between words of one language, and
map them to another.

 The main idea of word2Vec is to design a model whose parameters


are the word vectors. Then, train the model on a certain objective. At
every iteration we run our model, evaluate the errors, and follow an

62
update rule that has some notion of penalizing the model parameters
that caused the error. Thus, we learn our word vectors.
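A small, hedged example of training word vectors with gensim's Word2Vec (this assumes gensim 4.x; the toy corpus is made up and far too small for meaningful vectors, it only shows the API flow).

    from gensim.models import Word2Vec

    sentences = [
        ["deep", "learning", "uses", "neural", "networks"],
        ["word2vec", "learns", "word", "vectors"],
        ["neural", "networks", "learn", "representations"],
    ]

    # sg=1 selects skip-gram; sg=0 would select CBOW.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

    print(model.wv["neural"][:5])                    # first few components of a word vector
    print(model.wv.most_similar("neural", topn=3))   # nearest words in the learned space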

5. Explain Applications of Deep Learning Networks?

 Deep learning finds a lot of usefulness in the fields of Biomedicine and Bioinformatics. Deep learning algorithms can be used for detecting fractures or anatomical changes in human bones or bone joints, thereby enabling early prediction of various diseases such as arthritis.

 Normal neural networks fail because of errors in the stages of image segmentation and feature extraction. To avoid this we can build a convolution-based model.

 In this example a CNN-based network is considered. The input to this model is knee thermographs. Thermography is an image that senses or captures the heat intensity coming from a particular region. Based on the patient's pressure points, the colour in the thermograph varies: red regions denote higher-pressure locations and yellow regions denote lower-pressure locations. So, from the thermogram we can understand the effects of joint/bone wear and tear or damage that has occurred at a particular spot.
 The Convolution Filter is made to move over the image. The stride value
here considered for this case study is 1. We have used Max Pooling and
in the fully connected layer we have used Softmax aggregator.

 The above diagram is a schematic representation of a Deep Learning
based network which can be used for human knee Joint deformities
Identification purpose.
 Figure below shows the general Anatomical structure of a Human Knee
Joint.

Pooling layers reduce the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each map but retains the important information. This decreases the computational power required to process the data by reducing the dimensions.
Types of Pooling:
• Max Pooling
• Average Pooling
• Sum Pooling
After convolution and pooling, the image is flattened into a column vector.

 The flattened output is fed to a feed-forward neural network and backpropagation is applied at every iteration of training.

Steps Involved:

 Provide the input image to the convolution layer.
 Choose parameters, apply filters with strides and padding if required. Perform convolution on the image and apply ReLU activation to the matrix.
 Perform pooling to reduce the dimensionality.
 Add as many convolutional layers as needed until satisfied.
 Flatten the output and feed it into a fully connected layer (FC layer).
 Output the class using an activation function (logistic regression with cost functions) and classify the image.
A minimal code sketch of such a network is given below.
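The following Keras sketch is an illustration of these steps only; the input shape, filter counts and number of classes are placeholders, not values taken from the case study.

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(64, 64, 3)),             # e.g. a small thermograph image
        layers.Conv2D(16, kernel_size=3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),            # pooling reduces dimensionality
        layers.Conv2D(32, kernel_size=3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),                            # flatten into a column vector
        layers.Dense(64, activation="relu"),         # fully connected layer
        layers.Dense(2, activation="softmax"),       # output class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()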

Similarly, for other applications such as facial recognition and scene matching, appropriate deep-learning-based algorithms such as AlexNet, VGG, Inception, ResNet, or deep-learning-based LSTMs or RNNs can be used. These networks have to be explained with the necessary diagrams and appropriate explanations.

***********END**********

