(Autonomous)
Cheeryal (V), Keesara (M), Medchal District – 501 301 (T.S.)
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
(2021-2022)
CONTENTS
S.No Topic
1 Cover Page
2 Syllabus copy
8 Brief notes on the importance of the course and how it fits into the curriculum
9 Prerequisites, if any
14 Detailed notes
15 Additional topics
17 Question Bank
18 Assignment Questions
20 Tutorial problems
21 Known gaps, if any, and inclusion of the same in lecture schedule
B Teaching Evaluation
25 Student List
Distribution List:
Prepared by:                            Updated by:
1) Name: M SANTHOSH KUMAR               1) Name:
2) Sign:                                2) Sign:
3) Design: Assistant Professor          3) Design:
4) Date: 19-08-21                       4) Date:
Verified by:                            *For Q.C only:
1) Name:                                1) Name:
2) Sign:                                2) Sign:
3) Design:                              3) Date:
4) Date:
REFERENCE BOOK(S)
1. System Modeling and Simulation: An Introduction, Frank L. Severance, Wiley Publisher.
2. System Simulation, Geoffrey Gordon, Prentice-Hall of India Private Limited, Second Edition.
PROGRAM OUTCOMES
PO1.Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2.Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO3.Design/development of solutions: Design solutions for complex engineering problems
and design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
PO4.Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data, and
synthesis of the information to provide valid conclusions.
PO5.Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.
PO6.The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant
to the professional engineering practice.
PO7.Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and
need for sustainable development.
PO8.Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
PO9.Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
PO10.Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
PO11.Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.
PO12.Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
PSO 2: To follow best practices, namely SEI-CMM levels and Six Sigma (which vary from
time to time), for software development projects using open-ended programming environments
to produce software deliverables as per customer needs.
Course: Simulation and Modelling (18CS4108), Professional Elective
8. Brief Importance of the Course and how it fits into the curriculum
a. What role does this course play within the Program?
A simulation is the imitation of the operation of a real-world process or system over time.
Simulations require the use of models; the model represents the key characteristics or behaviors of the
selected system or process, whereas the simulation represents the evolution of the model over time.
Often, computers are used to execute the simulation.
b. How is the course unique or different from other courses of the Program?
Simulation is a technique for practice and learning that can be applied to many different disciplines
and trainees. It is a technique (not a technology) to replace and amplify real experiences with guided
ones, often “immersive” in nature, that evoke or replicate substantial aspects of the real world in a fully
interactive fashion.
c. What essential knowledge or skills should they gain from this experience?
Practicing in a safe environment
Understanding human behavior
Improving teamwork
Providing confidence
Giving insight into trainees’ own behavior
d. What knowledge or skills from this course will students need to have mastered to
perform well in future classes or later (Higher Education / Jobs)?
As robots, automation and artificial intelligence perform more tasks and there is massive
disruption of jobs, experts say a wider array of education and skills-building programs will be created
to meet new demands. There are two uncertainties: Will well-prepared workers be able to keep up in
the race with AI tools? And will market capitalism survive?
e. Why is this course important for students to take?
Well-designed simulations and games have been shown to improve decision-making and critical
thinking skills as well as teaching discipline-specific concepts. Active learning also helps students
develop interpersonal and communications skills.
f. What is/are the prerequisite(s) for this course?
Programming for Problem Solving, Object Oriented Programming using Java, and Probability
and Statistics (listed in Section 9 below).
n. What unique contributions to students’ learning experience does this course make?
It helps in executing mini and major projects that involve Simulation during the later years
of the program.
o. What is the value of taking this course? How exactly does it enrich the program?
Teaching employees new skills, techniques or processes can be challenging for many companies.
Often training depends on when another employee’s schedule is free or taking that person away
from production or billable work. If machines or equipment are needed for training, then those
machines are not in production during the training time, as well as the person operating the
equipment.
p. What are the major career options that require this course?
Specific occupations that employ simulation and modeling include:
Simulation developer
Computer game simulation engineer
Simulation technical support
9. Prerequisites
Programming for Problem Solving
Object Oriented Programming using Java
Probability and Statistics
10. Instructional Learning Outcomes
Upon completing this course, it is expected that a student will be able to do the following:
UNIT-I
1. Understand different concepts of simulation
2. Understand the Advantages and disadvantages of simulation
3. Understand the Recent applications of simulation
4. Understand Discrete and Continuous Systems
5. Understand the System Modeling, Types of Models
UNIT-II
1. Understand the random number generation
2. Understand the Generation of Pseudo-Random Numbers
3. Understand the Techniques of generating random numbers
4. Understand the tests for random numbers
5. Understand the Inverse-Transform Technique
6. Understand the Acceptance-Rejection Technique
UNIT-III
1. Understand the Numerical integration vs. continuous system simulation
2. Understand the Selection of an integration formula
3. Understand the Runge-Kutta integration formulas
4. Understand the Simulation of a water reservoir system
5. Understand the Fixed time-step vs. event-to-event model
6. Understand the generation of non-uniformly distributed random numbers
UNIT-IV
1. Understand Rudiments of queuing theory
2. Understand Simulation of a single-server queue
3. Understand Simulation of a two-server queue
4. Understand Simulation of more general queues
5. Understand Analysis of activity network
UNIT-V
1. Understand Length of simulation runs
2. Understand Variance reduction techniques
3. Understand Experimental layout, Validation
4. Understand Continuous and discrete simulation languages
5. Understand Continuous simulation languages
6. Understand Block-structured continuous simulation languages
11. Class Time Table
(Time-table grid and full lecture schedule not reproduced; surviving rows below.)
Day 20: Trading off Bias and Variance to Minimize Mean Square Error (LCD)
Day 29: Hidden Units, Rectified Linear Units and their Generalizations, Logistic Sigmoid and Hyperbolic Tangent (LCD)
Total no. of classes: 62
14. Detailed Notes
UNIT-I
Linear Algebra:
Scalar:
Scalar is a single number.
Lower case variable is used to represent a scalar.
Ex: Let s ∈ R be the slope of the line.
Let n ∈ N be the number of units.
Vector:
Vector is an array of numbers.
The numbers are arranged in an order.
Each individual number is identified using an index.
Lower case and bold variable is used to represent a vector.
If each element is in R, and the vector has n elements, then the
vector lies in the set formed by taking the Cartesian product of R n
times, denoted as R^n.
Ex: x = (x1, x2, …, xn)^T (written as a column)
Matrices:
A matrix is a 2-D array of numbers, so each element is identified by indices
instead of one.
A ∈ R^(m×n): A is a real-valued matrix of height m and width n.
A1,1 is the upper-left entry of A.
Am,n is the bottom-right entry of A.
Ai,: is the i-th row of A.
A:,i is the i-th column of A.
f(A)i,j gives element (i, j) of the matrix computed by applying the function f
to A.
Tensors:
An array of numbers arranged on a regular grid with a
variable number of axes is known as a tensor.
Ai,j,k
Transpose:
(A^T)i,j = Aj,i
Vectors are matrices that contain only one column.
Transpose of a vector is a matrix with only one row.
Scalar can be thought of as a matrix with single element.
Transpose of a scalar is itself. a = aT.
We can add two matrices of the same shape:
C = A + B
We can also add a scalar to, or multiply a scalar by, a matrix (applied element-wise).
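The scalar/vector/matrix conventions above can be sketched in NumPy (the library choice is mine; the notes themselves are tool-agnostic, and all values are made up):

```python
import numpy as np

s = 3.5                          # scalar: a single number
x = np.array([1.0, 2.0, 3.0])    # vector in R^3
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # matrix A in R^(3x2): height m=3, width n=2

print(A[0, 0])      # A_{1,1}, the upper-left entry
print(A[-1, -1])    # A_{m,n}, the bottom-right entry
print(A[0, :])      # first row A_{1,:}
print(A[:, 0])      # first column A_{:,1}
print(A.T.shape)    # transpose swaps the indices: (2, 3)

B = np.ones((3, 2))
C = A + B           # matrix addition, element-wise
D = s * A + 2.0     # scalar multiplication and scalar addition (broadcast)
```

Note that indexing in NumPy is 0-based, while the notes use the mathematical 1-based convention.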
…
Marginal Probability (sum rule):
∀x ∈ X, P(X = x) = ∑y P(X = x, Y = y)
Conditional Probability
Independence
Expectation
Linearity of Expectations:
Covariance matrix:
Bernoulli Distribution
Gaussian Distribution
Multivariate Gaussian
Empirical Distribution
Bayes’ Rule
Change of Variables
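Several of the topics listed above (the sum rule, conditional probability, Bayes' rule) can be illustrated on a toy 2×2 joint distribution; the numbers below are invented for the example:

```python
import numpy as np

# joint[i, j] = P(X = i, Y = j), a made-up joint distribution
joint = np.array([[0.1, 0.3],
                  [0.2, 0.4]])

p_x = joint.sum(axis=1)           # sum rule: P(X=x) = sum_y P(X=x, Y=y)
p_y = joint.sum(axis=0)

p_y_given_x0 = joint[0] / p_x[0]  # conditional: P(Y | X=0) = P(X=0, Y) / P(X=0)

# Bayes' rule: P(X=0 | Y=1) = P(Y=1 | X=0) P(X=0) / P(Y=1)
bayes = (p_y_given_x0[1] * p_x[0]) / p_y[1]
print(bayes)                      # equals joint[0, 1] / p_y[1], as it must
```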
UNIT-2
Directional Curvature
Taylor series approximation
Critical points:
All positive eigenvalues: local minimum. All negative eigenvalues: local maximum. Some positive and some negative eigenvalues: saddle point.
Newton’s method
Poor conditioning
Why may convergence not happen?
• For most problems, there exists a linear subspace of monotonically decreasing values.
• For some problems, there are obstacles between this subspace and the SGD path.
• Factored linear models capture many qualitative aspects of deep network training.
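Poor conditioning can be seen on a toy quadratic f(x) = ½ xᵀHx whose Hessian eigenvalues differ by a factor of 100 (the matrix, step size, and iteration count below are my own choices for illustration):

```python
import numpy as np

H = np.diag([1.0, 100.0])    # Hessian with condition number 100
x = np.array([1.0, 1.0])
lr = 0.015                   # step size must stay below 2/100 or the large
                             # eigenvalue direction diverges

for _ in range(100):
    grad = H @ x             # gradient of f(x) = 0.5 * x^T H x
    x = x - lr * grad

# the large-eigenvalue coordinate converges quickly; the small one crawls
print(x)
```

The step size is capped by the largest eigenvalue, so progress along the small-eigenvalue direction is slow; this is the practical cost of a large condition number.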
UNIT-3
Neural Networks:
● An artificial neural network (ANN) is a machine learning approach that models the human brain
and consists of a number of artificial neurons.
● An activation function is applied to these inputs which results in activation level of neuron
(output value of the neuron).
● Knowledge about the learning task is given in the form of examples called training
examples.
● An architecture: a set of neurons and links connecting neurons. Each link has a
weight.
● a learning algorithm: used for training the NN by modifying the weights in order
to model a particular learning task correctly on the training examples.
Neuron
● The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights W1, W2, …, Wm
2. An adder function (linear combiner) for computing the weighted sum of the inputs
(real numbers)
3. An activation function for limiting the amplitude of the neuron output. Here ‘b’
denotes bias.
● The bias b has the effect of applying a transformation to the weighted sum u
v=u+b
● The bias is an external parameter of the neuron. It can be modeled by adding an extra
input.
● v is called induced field of the neuron
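The neuron model just described (adder, bias, activation) can be sketched as follows; the weights, bias, and inputs are arbitrary made-up values:

```python
import numpy as np

def neuron(x, w, b, phi):
    u = np.dot(w, x)      # adder: weighted sum of the inputs
    v = u + b             # induced field: the bias shifts the weighted sum
    return phi(v)         # activation function limits the output amplitude

step = lambda v: 1.0 if v >= 0 else -1.0   # one possible activation

x = np.array([0.5, -1.0, 2.0])   # inputs (toy values)
w = np.array([0.4, 0.3, 0.1])    # link weights W1, W2, W3 (toy values)
y = neuron(x, w, b=-0.2, phi=step)
print(y)
```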
Neuron Models
The choice of activation function determines the neuron model.
Examples:
step function:
ramp function:
Gaussian function:
Step Function Ramp Function Sigmoid Function
The Gaussian function is the probability density function of the normal distribution, sometimes
also called the frequency curve.
Network Architectures:
● Three different classes of network architectures
− single-layer feed-forward
− multi-layer feed-forward
− recurrent
− The architecture of a neural network is linked with the learning algorithm used to
train
Single Layer Feed-forward:
(Diagram: inputs x1 … xn with weights w1 … wn feed the induced field v; output y = φ(v).)
− A perceptron uses a step function that returns +1 if the weighted sum of its inputs is ≥ 0 and −1
otherwise.
Perceptron for Classification
● The perceptron is used for binary classification.
● First train a perceptron for a classification task.
− Find suitable weights in such a way that the training examples are correctly
classified.
− Geometrically try to find a hyper-plane that separates the examples of the two
classes.
● The perceptron can only model linearly separable classes.
● When the two classes are not linearly separable, it may be desirable to obtain a linear
separator that minimizes the mean squared error.
● Given training examples of classes C1, C2 train the perceptron in such a way that :
− If the output of the perceptron is +1 then the input is assigned to class C1
− If the output is -1 then the input is assigned to C2
The Boolean function OR is linearly separable.
For XOR, the two classes (true and false) cannot be separated using a line. Hence XOR is not
linearly separable.
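The perceptron training rule described above can be sketched on the linearly separable OR function; the learning rate, initialization, and epoch count are my own choices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, 1], dtype=float)    # OR, with targets in {-1, +1}

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                          # a few epochs suffice here
    for x, target in zip(X, t):
        y = 1.0 if np.dot(w, x) + b >= 0 else -1.0
        if y != target:                      # update weights only on mistakes
            w += lr * target * x
            b += lr * target

preds = [1.0 if np.dot(w, x) + b >= 0 else -1.0 for x in X]
print(preds)   # [-1.0, 1.0, 1.0, 1.0]
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this loop terminates with all examples correctly classified; the same loop run on XOR targets would never converge.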
Multi layer feed-forward NN (FFNN)
● FFNN is a more general network architecture, where there are hidden layers between input
and output layers.
● Hidden nodes do not directly receive inputs nor send outputs to the external environment.
● FFNNs overcome the limitation of single-layer NN.
● They can handle non-linearly separable learning tasks.
Hidden Layer
FFNN for XOR
● The ANN for XOR has two hidden nodes that realize this non-linear separation and use
the sign (step) activation function.
● Arrows from input nodes to two hidden nodes indicate the directions of the weight vectors
(1,-1) and (-1,1).
● The output node is used to combine the outputs of the two hidden nodes.
Since we are representing two states by 0 (false) and 1 (true), we will map negative outputs (–1, –
0.5) of hidden and output layers to 0 and positive output (0.5) to 1.
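A concrete instantiation of this XOR network can be sketched as follows. The hidden weight directions (1, −1) and (−1, 1) come from the text above; the bias values of −0.5 and the output combination are my own assumed choices that make the construction work:

```python
def step(v):
    # step activation mapped to {0, 1}, matching the 0/1 encoding above
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 - 1 * x2 - 0.5)    # weight vector (1, -1); fires only for (1, 0)
    h2 = step(-1 * x1 + 1 * x2 - 0.5)   # weight vector (-1, 1); fires only for (0, 1)
    return step(h1 + h2 - 0.5)          # output node: OR of the two hidden units

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 1, 1, 0]
```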
FFNN NEURON MODEL
● The classical learning algorithm of FFNN is based on the gradient descent method.
● For this reason the activation function used in FFNN are continuous functions of the
weights, differentiable everywhere.
● The activation function for node i may be defined as a simple form of the sigmoid
function in the following manner:
φ(Vi) = 1 / (1 + e^(−A·Vi)),
where A > 0, Vi = Σj Wij · Yj, such that Wij is the weight of the link from node i to node j
and Yj is the output of node j.
Training Algorithm: Backpropagation
● The Backpropagation algorithm learns in the same way as single perceptron.
● It searches for weight values that minimize the total error of the network over the set of
training examples (training set).
● Backpropagation consists of the repeated application of the following two passes:
− Forward pass: In this step, the network is activated on one example and the error of
(each neuron of) the output layer is computed.
− Backward pass: in this step the network error is used for updating the weights. The
error is propagated backwards from the output layer through the network layer by
layer. This is done by recursively computing the local gradient of each neuron.
Backpropagation
● Back-propagation training algorithm
● The total mean squared error is the average of the network errors of the training examples.
Weight Update Rule
● The Backprop weight update rule is based on the gradient descent method:
− It takes a step in the direction yielding the maximum decrease of the network error
E.
− This direction is the opposite of the gradient of E.
● Iteration of the Backprop algorithm is usually terminated when the sum of squares of
errors of the output values for all training data in an epoch is less than some threshold such
as 0.01
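The forward-pass / backward-pass / weight-update cycle just described can be sketched for a single sigmoid neuron (a toy stand-in for a full network; the data, learning rate, and epoch count are invented for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

X = np.array([0.0, 1.0])    # two training inputs
t = np.array([0.0, 1.0])    # their targets
w, b, lr = 0.0, 0.0, 1.0

for epoch in range(2000):
    for x, target in zip(X, t):
        y = sigmoid(w * x + b)       # forward pass: activate on one example
        err = y - target             # error of the output neuron
        grad = err * y * (1 - y)     # local gradient (sigmoid derivative)
        w -= lr * grad * x           # backward pass: step opposite grad E
        b -= lr * grad

print(sigmoid(b), sigmoid(w + b))    # outputs approach 0 and 1 respectively
```

In a multi-layer network the same local-gradient computation is applied recursively, layer by layer, from the output back to the input.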
Convolutional layers are the major building blocks used in convolutional neural networks.
The innovation of convolutional neural networks is the ability to automatically learn a large
number of filters in parallel specific to a training dataset under the constraints of a specific
predictive modeling problem, such as image classification. The result is highly specific
features that can be detected anywhere on input images.
convolution motivation and pooling:
No zero padding, stride 1:
Zi,j,k = ∑l,m,n Vl, j+m−1, k+n−1 · Ki,l,m,n
No zero padding, stride s:
Zi,j,k = c(K, V, s)i,j,k = ∑l,m,n Vl, s(j−1)+m, s(k−1)+n · Ki,l,m,n
Convolution with a stride greater than 1 pixel is equivalent to convolution with stride 1 followed by
downsampling.
Some zero padding and stride 1:
Without zero padding, the width of the representation shrinks by one pixel less than the kernel width at
each layer. We are forced to choose between shrinking the spatial extent of the network rapidly and
using small kernels. Zero padding allows us to control the kernel width and the size of the output
independently.
Locally connected (unshared) layer:
Zi,j,k = ∑l,m,n Vl, i+m−1, j+n−1 · Wi,j,k,l,m,n
Comparison on local connections, convolution and full connection Useful when we know that
each feature should be a function of a small part of space, but no reason to think that the same
feature should occur accross all the space. eg: look for mouth only in the bottom half of the
image.
It can also be useful to make versions of convolution or locally connected layers in which the
connectivity is further restricted, e.g. constrain each output channel i to be a function of only a
subset of the input channels.
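The strided "convolution" (in deep learning practice, cross-correlation) formula above can be sketched for a single 2-D channel; this naive loop implementation and its toy inputs are mine:

```python
import numpy as np

def conv2d(V, K, s=1, p=0):
    """Single-channel cross-correlation with stride s and zero padding p."""
    V = np.pad(V, p)                       # zero padding on all sides
    kh, kw = K.shape
    oh = (V.shape[0] - kh) // s + 1
    ow = (V.shape[1] - kw) // s + 1
    Z = np.zeros((oh, ow))
    for j in range(oh):
        for k in range(ow):
            # Z[j, k] = sum_{m,n} V[s*j + m, s*k + n] * K[m, n]
            Z[j, k] = np.sum(V[s*j:s*j+kh, s*k:s*k+kw] * K)
    return Z

V = np.arange(16.0).reshape(4, 4)
K = np.ones((3, 3))
print(conv2d(V, K).shape)             # (2, 2): width shrinks by kernel-1
print(conv2d(V, K, s=1, p=1).shape)   # (4, 4): padding preserves the size
print(conv2d(V, K, s=2, p=1).shape)   # (2, 2): stride 2 downsamples
```

The three shape printouts match the text: without padding the representation shrinks by one less than the kernel width, padding restores the size, and stride > 1 is equivalent to stride 1 followed by downsampling.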
Tiled Convolution:
Learn a set of kernels that we rotate through as we move through space. Immediately
neighboring locations will have different filters, but the memory requirement for storing the
parameters will increase by a factor of the size of this set of kernels. Comparison on locally
connected layers, tiled convolution and standard convolution:
Locally connected layers and tiled convolutional layers with max pooling: the detector units of
these layers are driven by different filters. If the filters learn to detect different transformed
versions of the same underlying feature, then the max-pooled units become invariant to the
learned transformation.
Structured outputs:
Even if we understand the convolutional neural network theoretically, quite a few of us still get
confused about its input and output shapes while fitting data to the network. This guide
will help you understand the input and output shapes for the convolutional neural network.
Input Shape
You always have to give a 4D array as input to the CNN. So input data has a shape
of (batch_size, height, width, depth), where the first dimension represents the batch size of
the image and the other three dimensions represent dimensions of the image which are height,
width, and depth. For some of you who are wondering what is the depth of the image, it’s
nothing but the number of color channels. For example, an RGB image would have a depth
of 3, and the greyscale image would have a depth of 1.
Output Shape
The output of the CNN is also a 4D array. Where batch size would be the same as input
batch size but the other 3 dimensions of the image might change depending upon the values
of filter, kernel size, and padding we use.
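How the spatial dimensions change follows a standard formula; this small helper (the function name is my own) shows how kernel size, padding, and stride interact:

```python
def conv_output_size(n_in, kernel, padding=0, stride=1):
    """Output length along one spatial dimension of a conv layer."""
    return (n_in + 2 * padding - kernel) // stride + 1

# e.g. a 32x32 input and a 3x3 kernel:
print(conv_output_size(32, 3))                        # 30: no padding ("valid")
print(conv_output_size(32, 3, padding=1))             # 32: size-preserving ("same")
print(conv_output_size(32, 3, padding=1, stride=2))   # 16: strided downsampling
```

The batch size and the number of output channels (set by the number of filters) make up the other two dimensions of the 4-D output array.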
Data types:
Different types of convolutional neural networks
The advancements in computer vision with deep learning have been constructed and
perfected over time, primarily around one particular algorithm: the convolutional neural
network.
While convolutional neural networks (CNNs) have dominated the field of object recognition,
they can easily be deceived by small perturbations, also known as adversarial attacks.
This can lead to the failure of computer vision models and make them susceptible to
cyberattacks. CNNs' vulnerability to image perturbations has become a pressing concern for
the machine learning community, while researchers and scientists work towards
building computer vision models that generalise images like humans.
To address this vulnerability, researchers from MIT, Harvard University and MIT-IBM
Watson AI Lab have proposed VOneNets — a new class of hybrid CNN vision models — in
a recent paper. According to the researchers, this novel architecture leverages “biologically-
constrained neural networks along with deep learning techniques” to create more model
robustness against white-box adversarial attacks.
Recurrent neural networks can be built in many different ways. Much as almost any
function can be considered a feedforward neural network, essentially any function
involving recurrence can be considered a recurrent neural network.
Many recurrent neural networks use Eq. 10.5 or a similar equation to define the values of
their hidden units. To indicate that the state is the hidden units of the network, we now
rewrite Eq. 10.4 using the variable h to represent the state:
h(t) = f(h(t−1), x(t); θ)
Typical RNNs will add extra architectural features, such as output layers that read
information out of the state h to make predictions.
When the recurrent network is trained to perform a task that requires predicting the future
from the past, the network typically learns to use h(t) as a kind of lossy summary of the
task-relevant aspects of the past sequence of inputs up to t. This summary is in general
necessarily lossy, since it maps an arbitrary-length sequence (x(t), x(t−1), x(t−2), …,
x(2), x(1)) to a fixed-length vector h(t). Depending on the training criterion, this summary
might selectively keep some aspects of the past sequence with more precision than other
aspects. For example, if the RNN is used in statistical language modeling, typically to predict
the next word given previous words, it may not be necessary to store all of the information in
the input sequence up to time t, but rather only enough information to predict the rest of the
sentence. The most demanding situation is when we ask h(t) to be rich enough to allow one
to approximately recover the input sequence, as in autoencoder frameworks.
A recurrent neural network (RNN) is a type of artificial neural network which uses sequential
data or time series data. These deep learning algorithms are commonly used for ordinal or
temporal problems, such as language translation, natural language processing (nlp), speech
recognition, and image captioning; they are incorporated into popular applications such as
Siri, voice search, and Google Translate. Like feedforward and convolutional neural networks
(CNNs), recurrent neural networks utilize training data to learn. They are distinguished by
their “memory” as they take information from prior inputs to influence the current input and
output. While traditional deep neural networks assume that inputs and outputs are
independent of each other, the output of recurrent neural networks depend on the prior
elements within the sequence. While future events would also be helpful in determining the
output of a given sequence, unidirectional recurrent neural networks cannot account for these
events in their predictions.
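The recurrence described above can be sketched directly: the hidden state h(t) is updated from h(t−1) and x(t), so a sequence of any length is mapped to a fixed-size vector. The sizes, tanh nonlinearity, and random weights below are conventional choices of mine, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
Wxh = rng.normal(0.0, 0.1, (4, 3))   # input -> hidden weights
Whh = rng.normal(0.0, 0.1, (4, 4))   # hidden -> hidden weights (the recurrence)
b = np.zeros(4)

def rnn_forward(xs):
    h = np.zeros(4)                            # h(0), the initial state
    for x in xs:                               # arbitrary-length sequence...
        h = np.tanh(Wxh @ x + Whh @ h + b)     # h(t) = f(h(t-1), x(t); theta)
    return h                                   # ...summarized as fixed-size h(t)

seq = [rng.normal(size=3) for _ in range(7)]
print(rnn_forward(seq).shape)   # (4,) regardless of sequence length
```

This is exactly the "lossy summary" property: whatever the length of `seq`, the output lives in the same 4-dimensional state space.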
Recurrent Neural Network vs. Feedforward Neural Network
Let’s take an idiom, such as “feeling under the weather”, which is commonly used when
someone is ill, to aid us in the explanation of RNNs. In order for the idiom to make sense, it
needs to be expressed in that specific order. As a result, recurrent networks need to account
for the position of each word in the idiom and they use that information to predict the next
word in the sequence.
Comparison of Recurrent Neural Networks (on the left) and Feedforward Neural Networks
(on the right)
Looking at the visual below, the “rolled” visual of the RNN represents the whole neural
network, or rather the entire predicted phrase, like “feeling under the weather.” The
“unrolled” visual represents the individual layers, or time steps, of the neural network. Each
layer maps to a single word in that phrase, such as “weather”. Prior inputs, such as “feeling”
and “under”, would be represented as a hidden state in the third timestep to predict the output
in the sequence, “the”.
Through this process, RNNs tend to run into two problems, known as exploding gradients
and vanishing gradients. These issues are defined by the size of the gradient, which is the
slope of the loss function along the error curve. When the gradient is too small, it continues to
become smaller, updating the weight parameters until they become insignificant—i.e. 0.
When that occurs, the algorithm is no longer learning. Exploding gradients occur when the
gradient is too large, creating an unstable model. In this case, the model weights will grow
too large, and they will eventually be represented as NaN. One solution to these issues is to
reduce the number of hidden layers within the neural network, eliminating some of the
complexity in the RNN model.
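The mechanism behind both problems is repeated multiplication through the unrolled time steps; a one-line numeric sketch (the factors and depth are arbitrary illustrative values):

```python
T = 50                           # number of unrolled time steps
for factor in (0.5, 1.5):        # per-step gradient scaling factor
    grad = 1.0
    for _ in range(T):
        grad *= factor           # one multiplication per time step
    print(factor, grad)          # 0.5 shrinks toward 0; 1.5 blows up
```

A per-step factor below 1 drives the gradient toward zero (vanishing), while a factor above 1 grows it geometrically (exploding), which is why depth in time makes RNNs harder to train than feedforward networks of similar size.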
Bidirectional recurrent neural networks(RNN) are really just putting two independent RNNs
together. The input sequence is fed in normal time order for one network, and in reverse time
order for another. The outputs of the two networks are usually concatenated at each time step,
though there are other options, e.g. summation.
This structure allows the networks to have both backward and forward information about the
sequence at every time step. The concept seems easy enough. But when it comes to actually
implementing a neural network which utilizes bidirectional structure, confusion arises…
The Confusion
The first confusion is about the way to forward the outputs of a bidirectional RNN to a
dense neural network. For normal RNNs we could just forward the outputs at the last time
step, and the following picture I found via Google shows similar technique on a bidirectional
RNN.
A confusing formulation
If we pick the output at the last time step, the reverse RNN will have only seen the last input
(x_3 in the picture). It will hardly provide any predictive power.
The second confusion is about the returned hidden states. In seq2seq models, we'll want
hidden states from the encoder to initialize the hidden states of the decoder. Intuitively, if we
can only choose hidden states at one time step (as in PyTorch), we'd want the one at which
the RNN just consumed the last input in the sequence. But if the hidden states of time step n
(the last one) are returned, as before, we'll have the hidden states of the reversed RNN with
only one step of inputs seen.
1. Encoder
A stack of several recurrent units (LSTM or GRU cells for better performance)
where each accepts a single element of the input sequence, collects information
for that element and propagates it forward.
2. Encoder Vector
This is the final hidden state produced from the encoder part of the model. It is
calculated using the formula above.
This vector aims to encapsulate the information for all input elements in order to
help the decoder make accurate predictions.
It acts as the initial hidden state of the decoder part of the model.
3. Decoder
A stack of several recurrent units where each predicts an output y_t at a time step t.
Each recurrent unit accepts a hidden state from the previous unit and produces
an output as well as its own hidden state.
We are just using the previous hidden state to compute the next one.
The output y_t at time step t is computed using the formula:
We calculate the outputs using the hidden state at the current time step together with the
respective weight W(S). Softmax is used to create a probability vector which will help us
determine the final output (e.g. word in the question-answering problem).
The power of this model lies in the fact that it can map sequences of different lengths
to each other. As you can see the inputs and outputs are not correlated and their lengths
can differ. This opens a whole new range of problems which can now be solved using such
architecture.
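The decoder output step described above can be sketched numerically: the hidden state at step t is projected by the weight matrix W(S) and passed through softmax to give a probability vector over the output vocabulary. The state values, weight matrix, and vocabulary size below are toy assumptions of mine:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

s_t = np.array([0.2, -1.0, 0.5])                     # decoder hidden state at step t
W_S = np.random.default_rng(1).normal(size=(5, 3))   # projects hidden -> vocab of 5

y_t = softmax(W_S @ s_t)
print(y_t.sum())                  # 1.0: softmax yields a valid probability vector
print(int(y_t.argmax()))          # index of the most likely output token
```

In a question-answering setting, the argmax index would be mapped back to a word in the output vocabulary.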
We could stack multiple layers of RNNs on top of each other. This results in a flexible
mechanism, due to the combination of several simple layers. In particular, data might be
relevant at different levels of the stack. For instance, we might want to keep high-level
data about financial market conditions (bear or bull market) available, whereas at a lower
level we only record shorter-term temporal dynamics.
In simple words, a recursive neural network is a member of the deep neural network
family. If the same set of weights is recursively applied over a structured input, the result is a
recursive neural network, and this repeats for all the nodes, as explained above. Recursive
neural networks form an architectural class that operates mainly on structured inputs,
specifically on directed acyclic graphs.
It has a deep, tree-like structure. Recursive neural networks are used when there is a need to
parse a complete sentence. They allow the branching of connections and hierarchical
structures.
Recursive neural networks are mainly used for the prediction of structured outputs over
variable-sized input structures; they traverse a given structure in topological order. They are
also used for scalar predictions. A point to note is that the recursive neural network does not
just respond to structured inputs; it also works on contexts. Each time series is processed
separately. Interestingly, recursive networks were first introduced when a need arose to learn
distributed representations of various structured data.
A Recursive Neural Network is a type of deep neural network. So, with this, you can expect
& get a structured prediction by applying the same number of sets of weights on structured
inputs. With this type of processing, you get a typical deep neural network known as
a recursive neural network. These networks are non-linear in nature.
Recursive networks are adaptive models that are capable of learning deep structured
representations. Let's discuss their connection with deep learning concepts.
Performance metrics:
It is very tempting to jump right into research and to implement a cutting edge deep learning
solution. However, that is the point where I usually tell myself to stay pragmatic and build a
decent baseline first. Many data scientists underestimate the importance of having a baseline.
I love baseline models for their ability to deliver 90% of value for 10% of the effort. An 80%
accurate model in 2 days is better than an 81.5% accurate model in 4 weeks and that is what
is important when working with clients. The beauty of a decent baseline model is that it is
very hard to beat and the cutting edge models will achieve just a marginal improvement over
it. There are a few requirements for a good baseline model:
1. Baseline model should be simple. Simple models are less likely to overfit. If you
see that your baseline is already overfitting, it makes no sense to go for more
complex modelling, as the complexity will kill the performance.
When creating a machine learning model, you'll be presented with design choices as to how
to define your model architecture. Oftentimes, we don't immediately know what the optimal
model architecture should be for a given model, and thus we'd like to be able to explore a
range of possibilities. In true machine learning fashion, we'll ideally ask the machine to
perform this exploration and select the optimal model architecture automatically. Parameters
which define the model architecture are referred to as hyperparameters, and thus this process
of searching for the ideal model architecture is referred to as hyperparameter tuning.
These hyperparameters might address model design questions such as the number of layers,
the number of units per layer, or the learning rate.
To be absolutely clear: hyperparameters are not model parameters, and they cannot be
directly trained from the data. Model parameters are learned during training when we
optimize a loss function using something like gradient descent. The process for learning
parameter values is shown generally below.
Whereas the model parameters specify how to transform the input data into the desired
output, the hyperparameters define how our model is actually structured. Unfortunately,
there is no way to calculate “which way should I update my hyperparameter to reduce the
loss?” (i.e., gradients) in order to find the optimal model architecture; thus, we generally resort
to experimentation to figure out what works best.
In general, this process includes:
1. Define a model
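The experimentation loop can be sketched as a grid search; the search space and the stand-in scoring function below are hypothetical (real code would train a model and measure validation error at each setting).

```python
from itertools import product

search_space = {
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "hidden_units": [16, 64, 256],
}

def validation_error(lr, units):
    # Hypothetical stand-in for "train the model, then evaluate on a
    # validation set"; its minimum is at lr=1e-2, units=64 by construction.
    return abs(lr - 1e-2) + abs(units - 64) / 1000

# Grid search: evaluate every combination and keep the lowest-error setting.
best = min(
    (validation_error(lr, u), lr, u)
    for lr, u in product(search_space["learning_rate"],
                         search_space["hidden_units"])
)
print(best)   # (0.0, 0.01, 64): the setting with the lowest validation error
```

Since no gradient of the validation loss with respect to the hyperparameters exists, each candidate is simply tried; random search works the same way but samples the space instead of enumerating it.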
Debugging Strategies:
1. Study the system for a longer duration in order to understand it. This helps the
debugger construct different representations of the system being debugged,
depending on the need. The system is also studied actively to find recent
changes made to the software.
2. Backward analysis of the problem involves tracing the program backward
from the location of the failure message in order to identify the region of faulty code.
A detailed study of the region is conducted to find the cause of the defects.
3. Forward analysis of the program involves tracing the program forward using
breakpoints or print statements at different points in the program and studying the
results. The region where the wrong outputs are obtained is the region that needs
to be examined to find the defect.
4. Using past experience, debug the software by drawing on previously solved
problems of a similar nature. The success of this approach depends on the expertise
of the debugger.
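Strategy 3 (forward analysis) can be sketched as follows; the two-stage pipeline and its bug are invented purely for illustration:

```python
# Forward analysis: place probes (print statements) after each stage and find
# the FIRST stage whose intermediate output goes wrong.

def normalize(xs):
    """Scale values so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def buggy_scale(xs):
    return [x * 100 for x in xs[:-1]]   # bug: silently drops the last element

data = [2, 3, 5]
step1 = normalize(data)
print("after normalize:", step1)        # probe 1: 3 values summing to 1 -- correct
step2 = buggy_scale(step1)
print("after scale:", step2)            # probe 2: only 2 values -- defect is here
```

The probe after `normalize` shows correct output while the probe after `buggy_scale` does not, so the region to examine is narrowed to the second stage.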
UNIT-V
Large-Scale Deep Learning
Large-scale deep learning requires a large number of neurons, and hence a high-performance
hardware and software infrastructure, typically GPU computing or the CPUs of many machines
networked together. Careful specialization of numerical computation routines can yield a large
payoff. Other strategies, besides choosing whether to use fixed- or floating-point arithmetic,
include optimizing data structures to avoid cache misses and using vector instructions.
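The payoff from specialized, vectorized routines can be sketched with NumPy; the naive double loop below computes the same matrix-vector product as the single vectorized call, but with far more interpreter overhead and poorer memory locality (sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
x = rng.normal(size=256)

def matvec_loop(W, x):
    """Naive matrix-vector product with explicit Python loops (slow)."""
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        s = 0.0
        for j in range(W.shape[1]):
            s += W[i, j] * x[j]
        out[i] = s
    return out

y_slow = matvec_loop(W, x)
y_fast = W @ x                        # one specialized, vectorized routine
print(np.allclose(y_slow, y_fast))    # True: same math, far fewer interpreter steps
```

The `@` call dispatches to an optimized BLAS routine that walks contiguous memory and uses the CPU's vector instructions, which is exactly the kind of specialization the text refers to.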
Graphics processing units (GPUs) are specialized hardware components
that were originally developed for graphics applications.
Graphics cards are designed to have a high degree of parallelism and high
memory bandwidth, at the cost of a lower clock speed and less branching
capability relative to traditional CPUs.
Neural networks usually involve large and numerous buffers of parameters,
activation values, and gradient values, each of which must be completely updated
during every step of training.
memory operations are faster if they can be coalesced. Coalesced reads or writes
occur when several threads can each read or write a value that they need
simultaneously, as part of a single memory transaction. Different models of GPUs
are able to coalesce different kinds of read or write patterns. Typically, memory
operations are easier to coalesce if among n threads, thread i accesses byte i + j of
memory, and j is a multiple of some power of 2.
Each thread in a group executes the same instruction simultaneously. This means
that branching can be difficult on a GPU. Threads are divided into small groups
called warps. Each thread in a warp executes the same instruction during each
cycle, so if different threads within the same warp need to execute different code
paths, these different code paths must be traversed sequentially rather than in
parallel.
Key strategies for reducing the cost of inference include model compression and exploiting
dynamic structure in the graph describing the computation needed to process an input.
Natural Language Processing
A language model defines a probability distribution over sequences of tokens
in a natural language. Depending on how the model is designed, a token may
be a word, a character, or even a byte.
Training n-gram models is straightforward because the maximum likelihood
estimate can be computed simply by counting how many times each possible n
gram occurs in the training set.
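A minimal sketch of this counting procedure for bigrams (n = 2), using a toy corpus invented for illustration:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))   # count every adjacent word pair
unigrams = Counter(corpus[:-1])              # context counts for the denominator

def p_ml(w_prev, w):
    """Maximum-likelihood estimate P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(p_ml("the", "cat"))   # "the" occurs 3 times as a context, "the cat" twice -> 2/3
```

The estimate is just a ratio of counts, which is why training an n-gram model reduces to a single pass over the corpus.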
When P_{n-1} is non-zero but P_n is zero, the test log-likelihood is −∞. To avoid such
catastrophic outcomes, most n-gram models employ some form of smoothing.
Smoothing techniques
One basic technique consists of adding non-zero probability mass to all of the
possible next symbol values. This method can be justified as Bayesian inference
with a uniform or Dirichlet prior over the count parameters.
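A minimal sketch of this add-one (Laplace) smoothing on bigram counts, with an illustrative toy corpus:

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
vocab = set(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus[:-1])

def p_laplace(w_prev, w):
    """Add 1 to every count so no next symbol has probability zero."""
    return (bigrams[(w_prev, w)] + 1) / (context[w_prev] + len(vocab))

# The unseen bigram ("cat", "the") now gets non-zero probability:
print(p_laplace("cat", "the") > 0)              # True
# And the distribution over next words still sums to 1:
print(sum(p_laplace("the", w) for w in vocab))  # 1.0 (up to float rounding)
```

Adding one pseudo-count per symbol corresponds to the uniform-prior Bayesian justification mentioned above; replacing the 1 with a smaller constant gives a Dirichlet prior instead.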
Classical n-gram models are particularly vulnerable to the curse of dimensionality.
There are |V|^n possible n-grams and |V| is often very large. Even with a massive
training set and modest n, most n-grams will not occur in the training set. One way
to view a classical n-gram model is that it is performing nearest-neighbor lookup.
In other words, it can be viewed as a local non-parametric predictor, similar to k-
nearest neighbors.
To improve the statistical efficiency of n-gram models, class-based language
models (Brown et al., 1992; Ney and Kneser, 1993; Niesler et al., 1998) introduce
the notion of word categories and then share statistical strength between words that
are in the same category. The idea is to use a clustering algorithm to partition the
set of words into clusters or classes, based on their co-occurrence frequencies with
other words.
Neural language models or NLMs are a class of language model designed
to overcome the curse of dimensionality problem for modeling natural language
sequences by using a distributed representation of words (Bengio et al., 2001).
Unlike class-based n-gram models, neural language models are able to recognize
that two words are similar without losing the ability to encode each word as distinct
from the other. Neural language models share statistical strength between one
word (and its context) and other similar words and contexts. The distributed
representation the model learns for each word enables this sharing by allowing the
model to treat words that have features in common similarly.
Word embeddings. In this interpretation, we view the raw symbols as points in a
space of dimension equal to the vocabulary size. The word representations embed
those points in a feature space of lower dimension. In the original space, every
word is represented by a one-hot vector, so every pair of words is at Euclidean
distance √2 from each other.
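The √2 claim is easy to verify numerically; the toy vocabulary size and the random embedding matrix below are illustrative.

```python
import numpy as np

V = 5                                  # toy vocabulary size
one_hot = np.eye(V)                    # row i is the one-hot vector for word i

# Any two DISTINCT one-hot vectors differ in exactly two coordinates (+1, -1),
# so their Euclidean distance is sqrt(1 + 1) = sqrt(2): no notion of similarity.
d = np.linalg.norm(one_hot[0] - one_hot[3])
print(np.isclose(d, np.sqrt(2)))       # True, regardless of which pair we pick

# A (here random, untrained) embedding maps the V points into fewer dimensions,
# where related words CAN end up close together after training.
E = np.random.default_rng(0).normal(size=(V, 2))   # V words -> 2-d features
print(E[0].shape)                      # (2,): the embedding of word 0
```

In the one-hot space all pairwise distances are identical, which is exactly why a learned lower-dimensional embedding is needed to express similarity.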
In many applications, V contains hundreds of thousands of words. The naive
approach to representing such a distribution is to apply an affine transformation
from a hidden representation to the output space, then apply the softmax function.
Suppose we have a vocabulary V with size |V|. The weight matrix describing the
linear component of this affine transformation is very large, because its output
dimension is |V|.
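A small NumPy sketch of this naive output layer (hidden size and vocabulary size are illustrative) shows why the weight matrix dominates the cost:

```python
import numpy as np

d, V = 128, 20_000                     # hidden size, vocabulary size (toy values)
rng = np.random.default_rng(0)
U = rng.normal(scale=0.01, size=(V, d))   # the |V| x d output weight matrix
b = np.zeros(V)
h = rng.normal(size=d)                 # hidden representation of the context

logits = U @ h + b                     # affine map: one score per vocabulary word
p = np.exp(logits - logits.max())      # numerically stable softmax
p /= p.sum()

print(U.size)                          # 2560000 parameters in this single layer
print(p.shape)                         # (20000,): a distribution over all words
```

Even at this toy scale the output layer holds 2.56 million parameters, and every prediction touches all of them; with hundreds of thousands of words the matrix, not the rest of the network, becomes the bottleneck.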
Boltzmann Machine
Boltzmann Machine is a kind of recurrent neural network where the nodes make binary decisions
and are present with certain biases. Several Boltzmann machines can be collaborated together to
make even more sophisticated systems such as a deep belief network. Named after the famous
Austrian physicist Ludwig Boltzmann, whose Boltzmann distribution underlies it, this type of
network was further developed in the 1980s by Geoffrey Hinton. It derives its idea from the
world of thermodynamics to conduct work
toward desired states. It consists of a network of symmetrically connected, neuron-like units that
make decisions stochastically whether to be active or not.
The main purpose of the Boltzmann Machine is to optimize the solution of a problem. It optimizes
the weights and quantities related to the particular problem assigned to it. This method is used
when the main objective is to create mapping and learn from the attributes and target variables in
the data. When the objective is to identify an underlying structure or the pattern within the data,
unsupervised learning methods for this model are considered to be more useful. Some of the most
popular unsupervised learning methods are Clustering, Dimensionality reduction, Anomaly
detection and Creating generative models.
Each of these techniques has a different objective of detecting patterns such as identifying latent
grouping, finding irregularities in the data, or generating new samples from the available data.
These networks can also be stacked layer-wise to build deep neural networks that capture highly
complicated statistics. The use of Restricted Boltzmann Machines has gained popularity in the
domain of imaging and image processing as well since they are capable of modelling continuous
data that are common to natural images. They also are being used to solve complicated quantum
mechanical many-particle problems or classical statistical physics problems like the Ising and
Potts classes of models.
The architecture of the Boltzmann Machine comprises a shallow, two-layer neural network that
also constitutes the building blocks of the deep network. The first layer of this model is called the
visible or input layer and the second is the hidden layer. They consist of neuron-like units called
nodes, and the nodes are where calculations take place. The nodes are interconnected to each
other across layers, but no two nodes of the same layer are linked. There is therefore no
intra-layer communication, which is one of the restrictions in a restricted Boltzmann machine.
Each node processes its input and makes a stochastic decision about whether or not to
transmit that input. When data is fed as input, these nodes learn all the parameters, their
patterns and the correlations between them, on their own, and form an efficient system. Hence a
Boltzmann Machine is also termed an Unsupervised Deep Learning model.
This model can then be trained to monitor and study abnormal behaviour depending
on what it has learnt. The weights that modify the inputs are randomly initialized. Each visible
node takes a low-level feature from an item in the dataset to be learned, multiplies it by a
weight, and adds a bias; the result of those two operations is fed into an activation function,
which in turn produces the node’s output, also known as the strength of the signal passing
through it. The outputs of the first hidden layer would be passed as inputs to the second hidden
layer, and so on through as many hidden layers as are created, until they reach a final
classifying layer. For simple feed-forward movements, the nodes function as an autoencoder.
Learning is typically very slow in Boltzmann machines with a high number of hidden layers,
as large networks may take a long time to approach their equilibrium distribution, especially
when the weights are large and the distribution is highly multimodal.
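The stochastic binary decisions and layer-to-layer (but not intra-layer) connections described above can be sketched in NumPy; the sizes, random weights, and single Gibbs sampling step below are illustrative, not a full training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # only cross-layer weights
b_v = np.zeros(n_visible)              # visible-layer biases
b_h = np.zeros(n_hidden)               # hidden-layer biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = rng.integers(0, 2, size=n_visible).astype(float)   # a binary data vector

# One Gibbs step: sample hidden units given visible, then reconstruct visible.
p_h = sigmoid(v @ W + b_h)                             # P(h_j = 1 | v)
h = (rng.random(n_hidden) < p_h).astype(float)         # stochastic binary decision
p_v = sigmoid(h @ W.T + b_v)                           # P(v_i = 1 | h)
v_recon = (rng.random(n_visible) < p_v).astype(float)  # reconstructed visible units

print(h.shape, v_recon.shape)   # (3,) (6,)
```

Each unit turns on with a probability given by its weighted input, which is the "stochastic decision whether to be active or not" from the description; training would compare `v` with `v_recon` to adjust `W`.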
15. Additional Topics:
16. Question Papers
VASAVI COLLEGE OF ENGINEERING(AUTONOMOUS), IBRAHIMBAGH, HYDERABAD-500031
Department of Information Technology
BE(CBCS) VIII Semester (2019-2020) – I – Internal Examination
Title of the course: Deep Learning (Professional Elective-VI) (PE850IT)
Maximum Marks: 20 Duration: 60 min.
Date: 23-02-2020 Time: 02:30 PM to 03:30 PM
1. Define a Neural Network. 1 1 1 1
2. State the drawback of a single-layer Perceptron. 1 1 1 1
3. Mention some applications of Deep Learning. 1 1 1 1
4. State the importance of Early Stopping. 1 1 2 1
5. Give the equation for the expected squared error of the ensemble predictor. 1 1 2 1
6. Differentiate between L1 and L2 regularizations. 1 2 2 1
b) Write an Early Stopping meta-algorithm to determine at what objective value we start to
overfit, then continue training until that value is reached. 2 3 3 2
Summary of the percentage for each of the criteria BTL (Blooms Taxonomy Level) from the questions
framed.
1. Fundamental knowledge from Level-1 (Recall) & 2 (understand) : 60 %
2. Knowledge on application and analysis from Level-3(Apply) & 4 (Analyze) : 40 %
3. Critical thinking and ability to design from Level-5 (Estimate) & 6 (Create or Design): 00%
10. Define a Perceptron. 1 1 1 1
11. Mention any two applications of Multilayer Neural Networks. 1 1 1 1
12. What is Representation Learning? 1 1 1 1
13. State the importance of Regularization. 1 1 2 1
14. How does Early Stopping act as a regularizer? 1 1 2 1
15. How is Data Augmentation effective for Object Recognition? 1 2 2 1
VASAVI COLLEGE OF ENGINEERING(AUTONOMOUS), IBRAHIMBAGH, HYDERABAD-500031
Department of Information Technology
BE(CBCS) VIII Semester (2019-2020) – II – Internal Examination
Title of the course: Deep Learning (Professional Elective-VI) (PE850IT)
Maximum Marks: 20 Duration: 30 min.
Date: 27-06-2020(FN) Time: 11:00 AM to 11:30 AM
22. What will be the output of the following matrix after applying an average pooling
operation with a 2×2 window and stride = 2? 1 3 3 2
23. Number of layers in the AlexNet, VGG, GoogLeNet and ResNet models? 1 2 3 1
24. What is the number of parameters in a max-pooling layer? 1 1 3 1
a) Number of filters times dimension of each filter
b) Number of filters
c) One d) Zero
25. What is the output dimension of the resulting image when a 7 × 7 kernel is applied
to a 9 × 9 image? 1 2 3 2
a) 3 × 3 b) 4 × 4 c) 5 × 5 d) 2 × 2
26. What is the difference between the back-propagation algorithm and the
back-propagation through time (BPTT) algorithm? 1 2 4 1
a) Unlike back-propagation, in BPTT we add the gradients for the
corresponding weight for each time step
b) Unlike back-propagation, in BPTT we subtract the gradients for the
corresponding weight for each time step
c) No difference
d) None of the above
27. What technique is followed to deal with the problem of Exploding Gradients in
Recurrent Neural Networks (RNNs)? 1 1 4 1
a) Parameter tying b) Gradient clipping c) Using modified architectures
like LSTMs and GRUs d) Using dropout
28. LSTMs provide more controllability and better results compared to RNNs, but also
come with more complexity and operating cost. 1 1 4 1
a) True b) False
29. What is the number of zeros in the derivative of h_t w.r.t. s_t, where
h_t, s_t ∈ R^n? 1 2 4 1
a) n−1 b) n c) n²−n d) 0
30. In LSTM, during forward propagation, the gates control the flow of information. 1 1 4 1
a) True b) False
31. Why is an RNN (Recurrent Neural Network) used for machine translation, say
translating English to French? 1 1 4 1
a) It is applicable when the input/output is a sequence (e.g., a
sequence of words)
b) It is strictly more powerful than a Convolutional Neural
Network (CNN)
c) RNNs do not have the problem of vanishing gradients
d) None of the above
32. RNNs can be used with convolutional layers to extend the effective pixel
neighborhood. 1 1 4 1
a) True b) False
33. Given below is the representation of an LSTM. What is the number of operations
that take place at a given timestep, t? 1 2 4 1
a) 4 b) 3 c) 6 d) 2
34. Keras is a deep learning framework on which tool? 1 1 5 1
a) R b) TensorFlow c) SAS d) Azure
35. How do calculations work in TensorFlow? 1 1 5 1
a) Through vector multiplications b) Through RDDs
c) Through computational graphs d) Through MapReduce tasks
36. Why does TensorFlow use computational graphs? 1 1 5 1
a) Tensors are nothing but computational graphs
b) Graphs are easy to plot
c) There is no such concept of computational graphs in TensorFlow
d) Calculations can be done in parallel
37. Which tool is a deep learning wrapper on TensorFlow? 1 1 5 1
a) Python b) Keras c) PyTorch d) Azure
38. How do we perform calculations in TensorFlow? 1 1 5 1
a) We launch the computational graph in TensorFlow
b) We launch the session inside a computational graph
c) By creating multiple tensors
d) By creating DataFrames
Code No: 138DU R16
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
B. Tech IV Year II Semester Examinations, September - 2020
NEURAL NETWORKS AND DEEP LEARNING
(Common to CSE, IT)
---
1. List and explain the various activation functions used in modeling of artificial neuron.
Also explain their suitability with respect to applications. [15]
2. Describe the characteristics of Continuous Hopfield memory and discuss how it can be
used to solve the Travelling Salesman Problem. [15]
3. Explain the architecture and algorithm of the full CPN with a diagram. [15]
4. Give the architecture of the Kohonen self-organizing map and explain how it is used
to cluster the input vectors. [15]
5. Give an example of learning XOR function to explain a fully functioning feed
forward network. [15]
6. Explain in detail about the concept of gradient based learning. [15]
7. Write an early stopping meta-algorithm for determining the best amount of time to train.
[15]
8. Discuss the application of second-order methods to the training of deep networks. [15]
---ooOoo---
QUESTION BANK
UNIT-I
CO1: Describe various deep learning algorithms used across various domains.
DESCRIPTIVE QUESTIONS
1. List out the historical developments in deep learning.
7. Suppose 30% of the women in a class received an A on the test and 25% of the men received an
A. The class is 60% women. Given that a person chosen at random received an A, what is the
probability this person is a woman?
8. Find the mean and variance of the Uniform distribution.
9. Explain structured probabilistic models along with an example.
10. Write a short note on Expectations, Mean and Variance.
11. What is the expected value, mean and variance of the sum of three dice thrown together?
UNIT-II
CO2: Design the feed forward neural network using appropriate techniques.
DESCRIPTIVE QUESTIONS
1. Explain briefly about gradient descent optimization along with an example.
2. Illustrate constrained optimization.
3. Using the least squares method, find the regression line for the following data.
X Y
0 1
1 2
2 4
3 3.5
4 5
5 4
6 7
7 9
8 12
9 17
8. Suppose the data x1, x2, . . . , xn is drawn from a N(µ, σ²) distribution (Normal
distribution), where µ and σ are unknown. Find the maximum likelihood estimate
for the pair (µ, σ²).
9. How does cross validation reduce bias and variance?
UNIT-III
CO3: Develop the conditional random fields and its use in designing the deep neural network.
DESCRIPTIVE QUESTIONS
1. How do Neural Networks solve the XOR problem using the Back-propagation method?
2. What is a training set and how is it used to train neural networks?
3.
4. You are given the following neural network, which takes two binary-valued inputs x1, x2 ∈ {0, 1},
and whose activation function is the threshold function (h(x) = 1 if x > 0; 0 otherwise). Which of
the following logical functions does it compute?
5. Describe the procedure of obtaining local minima using the Gradient Descent method.
6. The XOR function (exclusive or) returns true only when one of the arguments is true and
the other is false. Otherwise, it always returns false. Do you think it is possible to implement
this function using a single unit? A neural network of several units? Explain.
7. Illustrate how data augmentation will improve the performance of a deep learning model.
8. Illustrate how Bayesian inference over the weights will resolve the noise robustness of a
neural network model.
11. Explain Newton's second-order derivative method for arriving at the optimal point of a
loss function.
12. How does sampling the weights of each fully connected layer provide an appropriate
initialization of the parameters of a neural network model?
13. Illustrate how Batch Normalization stabilizes the learning process for faster
convergence rates.
UNIT-IV
DESCRIPTIVE QUESTIONS
1. Discuss the variants of the basic convolution function.
2. Suppose you have a convolutional network with the following architecture:
• The input is an RGB image of size 256 × 256.
• The first layer is a convolution layer with 32 feature maps and filters of size 3 ×3. It
uses a stride of 1, so it has the same width and height as the original image.
• The next layer is a pooling layer with a stride of 2 (so it reduces the size of each
dimension by a factor of 2) and pooling groups of size 3 × 3.
Determine the size of the receptive field for a single unit in the pooling layer.
3. An input image has been converted into a matrix of size 12 X 12 along with a filter of
size 3 X 3 with a Stride of 1. Determine the size of the convoluted matrix?
4. Illustrate hyper-parameter tuning to minimize the generalization error of a neural network
model?
5. Illustrate Sequence-to-Sequence Recurrent Neural Network architectures?
6. Determine what the following recurrent neural network computes. More precisely,
determine the function computed by the output unit at the final time step; the other
outputs are not important. All of the biases are 0. You may assume the inputs are integer
valued and the length of the input sequence is even?
7. Draw the unfolding graph for the Recurrent Neural Network model.
8. Illustrate how autoencoders are trained to reconstruct the input data in representation
learning mechanism?
10. Analyze the equation using Newton’s method to find the roots of the equation.
12. A manufacturing company's factories X and Y produce mobile phones. The manufacturing cost is
C(x,y)= .
If the company's objective is to produce 1900 units per month while minimizing the total
monthly cost of production, how many units should be produced at each factory?
13. Explain bi-directional RNNs.
UNIT-V
CO5: Optimize the deep neural network and to experiment various tools.
DESCRIPTIVE QUESTIONS
1. Explain in detail about Large Scale Deep Learning
2. What is convolutional Boltzmann machine?
3. What is linear factor model?
4. Illustrate reparameterization in Variational Autoencoders to achieve back
propagation through random operations?
5. Discuss the applications of Deep Learning in Computer Vision
6. Illustrate how autoencoders are trained to reconstruct the input data in
representation learning mechanism?
7. Explain how Deep Belief Network will be formed through the process of
gradient descent and back propagation of Restricted Boltzmann machines?
8. Discuss the important debugging tests in deep learning.
9. What are encoders and decoders?
A=
A=
A=
Assignment-II
4. Explain structured probabilistic models along with an example.
5. Write short notes on Expectation, Mean, and Variance.
Ex: What is the expected value, mean and variance of the sum of three dice thrown together?
Assignment-III
6. Explain briefly about gradient descent optimization along with an example.
7. Explain L1 and L2 parameter regularization of norm penalties.
Assignment-IV
8. Discuss the role of hyperparameters in a deep learning application.
9. Discuss the variants of the basic convolution function.
Assignment V
3. The concept of Eigen values and vectors is applicable to?
A. Scalar matrix B. Identity matrix
C. Upper triangular matrix D. Square matrix
4. The rank of a 3 x 3 matrix C (= AB), found by multiplying a non-zero
column matrix A of size 3 x 1 and a non-zero row matrix B of size 1 x 3, is
A. 0 B. 1 C. 2 D. 3
5. How many of the following matrices have an eigenvalue 1?
A.1 B. 2 C. 3 D. 4
6. For any square matrix A, AA^T is a
a) Unit matrix b) Symmetric matrix
c) Skew symmetric matrix d) Diagonal matrix
7. The eigenvalues of a 4 × 4 matrix A are given as 2, 3, 13, and 7. Then det A is
a) 546 b) 19 c) 25 d) cannot be determined
8. The eigen vector (s) of the matrix is (are)
a) b) c) d)
9. Let A be a matrix such that A^k = 0. What is the inverse of I − A?
a) 0 b) 1 c) A d)
e) Inverse is not guaranteed to exist
10. Let A and B be real symmetric matrices of size . Then which one of the
following is true?
a) b) c)
d)
11. If M is a square matrix with a zero determinant, which of the following assertion
(s) is (are) correct?
S1: Each row of M can be represented as a linear combination of the other rows
S2: Each column of M can be represented as a linear combination of the other
columns
S3: MX = 0 has a nontrivial solution
S4: M has an inverse
a) S3 and S2 b) S1 and S4 c) S1 and S3 d) S1, S2 and S3
12. Let A be a n×n matrix. Which of the following properties would necessarily imply that
A is singular?
I. The columns of A are linearly dependent.
II. A has a singular value that is 0.
III. Az = 0, for some z ≠ 0.
a) II only b) I and II only c) I and III only d) II and III only
e) I, II and III
13. Every m × n matrix has a singular value decomposition.
a) True b) False c) Invalid statement
14. Which of the following are true about principal components analysis (PCA)?
a) The principal components are eigenvectors of the centered data matrix.
b) The principal components are eigenvectors of the sample covariance matrix.
c) The principal components are right singular vectors of the centered data matrix.
d) The principal components are right singular vectors of the sample covariance matrix.
15. A man buys 10 bulbs, each with independent exponentially distributed lifetimes with the
same mean, with the intention of using one bulb at a time and replacing it with another as soon as
it fails. The distribution of the total duration of the 10 bulbs taken together is
a) Exponential b) Normal c) Uniform d) Bernoulli
16. Let A and B be events on the same sample space, with P (A) = 0.6 and P (B) = 0.7. Can these
two events be disjoint?
a) Yes b) No
17. Alice has 2 kids and one of them is a girl. What is the probability that the other child is
also a girl? You can assume that there are an equal number of males and females in the world.
a) 0.5 b) 0.25 c) 0.333 d) 0.75
18. Given two Boolean random variables, A and B, where P(A) = ½, P(B) = 1/3, and P(A | ¬B)
= ¼, what is P(A | B)?
a) 1/6 b) ¼ c) ¾ d) 1
Unit-2
1. What does it mean if your model has over-fit the data?
a) It has memorized the correct answers to the test data.
b) It hasn’t captured enough detail.
c) It has captured details in the training data that are irrelevant to the question.
2. How might a learning algorithm find a best line? (more than one option can be correct)
a) Use an iterative method like gradient descent
b) Trial and error.
c) Plot all possible lines and pick the one that looks best.
d) Set the derivative of the loss function equal to 0 and solve.
e) Brute Force search.
3. Why is Gradient Descent considered an iterative approach?
a) Because we are using continuous updates to converge to a minimum.
b) Because we are using step-wise updates to converge on a minimum
Unit-3
1. Which of the following guidelines is applicable to the initialization of the weight vector in a
fully connected neural network?
3. ___________ refers to a model that can neither model the training data nor generalize to new
data.
a) good fitting b) overfitting c) underfitting d) all of the above
4. Underdetermined problems are those problems that have [ ]
a. infinitely many solutions b. finite solutions
c. unique solution d. None of the above
5. Dropout is a [ ]
a. optimization technique b. regularization technique
c. adversarial technique d. None of the above
6. RMSProp addresses the problem caused by accumulated gradients in [ ]
a. Adam b. Adadelta
c. Momentum d. AdaGrad
7. The sparsity property induced by L1 regularization has been used extensively
as a ---------------------- mechanism.
8. Dataset augmentation has been a particularly effective technique for ----------.
9. Optimization algorithms that use the entire training set are called ------------ gradient
methods.
10. --------------------------- and its variants are probably the most used optimization
algorithms for deep learning in particular.
Unit-4
1. Pooling layers are used to accomplish which of the following? [ ]
4. We can use Recurrent Neural Network in all the above scenarios
13. Which of the following does not suffer from the vanishing gradient problem? [ ]
1. 1-layer feed forward networks
2. Very deep feed forward networks
3. Recurrent neural networks
4. Convolutional neural networks
14. -------------------------------provide a way to specialize neural networks to work
with data that has a clear grid-structured topology and to scale such models to very large
size.
15. --------------------- are a family of neural networks for processing sequential data.
16. The two basic approaches in choosing hyperparameters are --------------------------.
Unit-5
5. The ------------------------ model takes advantage of the observation that most variations in the
data can be captured by the latent variables, up to some small residual reconstruction error.
6. The ------------------- is an autoencoder that receives a corrupted data point as input and is
trained to predict the original, uncorrupted data point as its output
20. Tutorial Problems
NA
Assignment-1
Assignment-2
Assignment-3
Assignment-4
Assignment-5
21. Known gaps
-No-
23. References, Journals, websites and E-links
REFERENCE BOOKS:
Deep Learning, Goodfellow, I., Bengio, Y., and Courville, A., MIT Press, 2016. (Units I-V)
1. https://www.deeplearningbook.org/contents/TOC.html
2. https://analyticsindiamag.com/
3. https://onlinecourses.nptel.ac.in/noc22_cs35
Teaching Evaluation
NA
25. Student List
Section-A
S.No Roll No Name
1 18R11A0501 Adavikolanu Swapna
2 18R11A0502 Andugula Shashaank
3 18R11A0503 Awari Deekshitha
4 18R11A0504 B Deevena Angeline Sunayana
5 18R11A0505 Bhamidipati Shiridi Prasad Revanth
6 18R11A0506 Ch Siri Sowmya
7 18R11A0507 Cheripalli Sreeja
8 18R11A0509 Errabelli Rushyanth
9 18R11A0510 G N Harshita
10 18R11A0511 Gajji Varun Kumar
11 18R11A0512 Sri Sai Pranavi Ganti
12 18R11A0513 H S Shreya
13 18R11A0514 Jangam Nagarjuna Goud
14 18R11A0515 Kanne Nithesh Sai
15 18R11A0516 Kodi Akhil Yadav
16 18R11A0517 Kola Snehitha
17 18R11A0518 Komuravelli Karthik
18 18R11A0519 Korada Santosh Kumar
19 18R11A0520 Kunchala Sairam
20 18R11A0521 L A Prithviraj Kumar
21 18R11A0522 Lahari Basavaraju
22 18R11A0523 Linga Jaya Krishna
23 18R11A0524 M Sree Charan Reddy
24 18R11A0525 Mambeti Sairam
25 18R11A0526 Mamilla Ramya
26 18R11A0527 Mohammad Afroz Khan
27 18R11A0528 Mohammed Abdul Ameen Siddiqui
28 18R11A0529 Muddula Anusha
29 18R11A0530 Musale Aashish
30 18R11A0531 Mutyala Santosh
31 18R11A0532 Pariti Divya
32 18R11A0533 Paruchuri Harsha Vardhan
33 18R11A0534 Patri Sai Sindhura
34 18R11A0535 Pinnem Tarun Kumar
35 18R11A0536 Pirangi Nithin Kalyan
36 18R11A0537 Poojaboina Preethi
37 18R11A0538 Puranam Satya Sai Rama Tarun
38 18R11A0539 S Guna Sindhuja
39 18R11A0540 Sangaraju Greeshma
40 18R11A0541 Syed Zainuddin
41 18R11A0542 Telukuntla Rajkumar
42 18R11A0543 Thorupunuri Jancy
43 18R11A0544 Thumu Ram Sai Teja Reddy
44 18R11A0545 Vadakattu Harish
45 18R11A0546 Vaishnavi Sabna
46 18R11A0547 Vemuri Madhu Venkata Sai
47 18R11A0548 Yarram Reddy Venkata Srivani Reddy
48 19R15A0501 Bhulaxmi Kalpana
49 19R15A0502 Challa Divya Reddy
50 19R15A0503 Adla Likitha
51 19R15A0504 Gopaladas Vinayalatha
52 19R15A0505 Ganji Charan Kumar
Section-D
26. Group-wise Students List: Section-D
VIPRAGHNA VISHWANATH
18R11A05G4 INJEY DIVYA 18R11A05K1 SRIKAKULAPU
18R11A05G5 JYOTI GOUDA 18R11A05K2 YALALA SHALINI
KOMMERA VAMSHI KOLANUCHELIMI SAI
18R11A05G7 KRISHNA REDDY 19R15A0516 CHARAN
KONAKANCHI
18R11A05G8 MAHALAKSHMI 19R15A0517 CH NIKHIL
KORUKOPPULA SAI
18R11A05G9 KRISHNA 19R15A0518 KANDI PAVAN
18R11A05H0 KOTTAM CHANDRA SHEKAR 19R15A0519 CHITYALA SIRISHA
18R11A05H1 MADHAVI YADAV 19R15A0520 VAGALDAS ARAVIND