Soft Computing MCQ (9 Files Merged)

Artificial Intelligence Questions and
Answers – Fuzzy Logic – 1

This set of Artificial Intelligence MCQs focuses on “Fuzzy Logic – 1”.
1. Fuzzy logic is a form of

a) Two-valued logic
b) Crisp set logic
c) Many-valued logic
d) Binary set logic
View Answer
Answer: c
Explanation: In fuzzy logic, set membership is defined by a degree, so a statement
can take many truth values rather than just two.
2. Traditional set theory is also known as Crisp Set theory.

a) True
b) False
View Answer
Answer: a
Explanation: In traditional set theory, set membership is fixed and exact: a member
is either in the set or not, so there are only two crisp values, true or false. In
fuzzy logic there are many values: a member belongs to the set with some weight, say x.
3. The truth values of traditional set theory are ____________ and those of a fuzzy set are
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
View Answer
Answer: a
Explanation: Refer to the definitions of fuzzy set and crisp set: crisp membership is
exactly 0 or 1, while fuzzy membership can be any value between 0 and 1.
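The contrast between the two kinds of truth value can be sketched in Python. The
temperature thresholds (20, 30 and 40 degrees) and the function names are illustrative
assumptions, not part of the question.

```python
def crisp_hot(temp_c: float) -> int:
    """Crisp set: a temperature is either hot (1) or not hot (0)."""
    return 1 if temp_c >= 30.0 else 0


def fuzzy_hot(temp_c: float) -> float:
    """Fuzzy set: membership rises linearly from 0 at 20 C to 1 at 40 C (assumed shape)."""
    return min(1.0, max(0.0, (temp_c - 20.0) / 20.0))


for t in (15.0, 25.0, 35.0, 45.0):
    print(t, crisp_hot(t), fuzzy_hot(t))
```

The crisp function only ever returns 0 or 1, while the fuzzy one returns intermediate
degrees such as 0.25 for 25 degrees.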
4. Fuzzy logic is an extension of crisp set logic that handles the concept of
partial truth.
a) True
b) False
View Answer
Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.
6. "The room temperature is hot." Here "hot" (a linguistic variable) can be
represented by _______ .
a) Fuzzy Set
b) Crisp Set
View Answer
Answer: a
Explanation: Fuzzy logic deals with linguistic variables.
7. The values of set membership are represented by

a) Discrete Set
b) Degree of truth
c) Probabilities
d) Both b & c
View Answer
Answer: b
Explanation: Both probabilities and degrees of truth range between 0 and 1, but set
membership values are degrees of truth, not probabilities.
8. What is meant by probability density function?

a) Probability distributions
b) Continuous variable
c) Discrete variable
d) Probability distributions for Continuous variables
View Answer
Answer: d
Explanation: None.
9. The Japanese were the first to use fuzzy logic in practice, on high-speed trains in
Sendai.
a) True
b) False
View Answer
Answer: a
Explanation: None.
10. Which of the following is used for probability theory sentences?

a) Conditional logic
b) Logic
c) Extension of propositional logic
d) None of the mentioned
View Answer
Answer: c
Explanation: The version of probability theory we present uses an extension of
propositional logic for its sentences.

1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
View Answer
Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement, respectively.
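A minimal Python sketch of these three operators, using the minimum, maximum and
complement definitions from the explanation. The function names are illustrative, and
membership degrees are assumed to lie in the range 0 to 1.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND, defined as the minimum of the two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR, defined as the maximum of the two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT, defined as the complement 1 - a."""
    return 1.0 - a


print(fuzzy_and(0.25, 0.75), fuzzy_or(0.25, 0.75), fuzzy_not(0.25))
```

When the inputs are restricted to 0 and 1, these definitions reduce to the ordinary
Boolean operators.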
2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
View Answer
Answer: a
Explanation: None.
3. Where can Bayes' rule be used?

a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
View Answer
Answer: d
Explanation: Bayes' rule can be used to answer probabilistic queries conditioned
on one piece of evidence.
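As a sketch of such a query, the function below applies Bayes' rule to a binary
hypothesis and one piece of evidence. The parameter names (prior, likelihood,
false_alarm) and the numbers are assumptions chosen for illustration.

```python
def posterior(prior: float, likelihood: float, false_alarm: float) -> float:
    """P(H | e) by Bayes' rule for a binary hypothesis H.

    prior       = P(H)
    likelihood  = P(e | H)
    false_alarm = P(e | not H)
    """
    evidence = likelihood * prior + false_alarm * (1.0 - prior)  # P(e)
    return likelihood * prior / evidence


print(posterior(prior=0.5, likelihood=0.8, false_alarm=0.2))
```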
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
View Answer
Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
a) IF-THEN-ELSE rules
b) IF-THEN rules
c) Both a & b
View Answer
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
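The rule form above can be sketched in Python as a small rule table. This is only an
illustration: the membership shapes, the fan speeds and the weighted-average way of
combining rule actions (in the style of a zero-order Sugeno system) are assumptions,
not something the explanation prescribes.

```python
def hot(temp_c: float) -> float:
    """Assumed membership of 'hot': 0 at 20 C, rising linearly to 1 at 40 C."""
    return min(1.0, max(0.0, (temp_c - 20.0) / 20.0))


def cold(temp_c: float) -> float:
    """Assumed membership of 'cold', taken as the complement of 'hot'."""
    return 1.0 - hot(temp_c)


# Each rule reads: IF temperature IS <property> THEN fan speed = <action>.
RULES = [
    (cold, 200.0),   # IF temperature IS cold THEN run the fan at 200 rpm
    (hot, 1000.0),   # IF temperature IS hot THEN run the fan at 1000 rpm
]


def fan_speed(temp_c: float) -> float:
    """Combine the rule actions, weighting each by how strongly its rule fires."""
    strengths = [(prop(temp_c), action) for prop, action in RULES]
    total = sum(s for s, _ in strengths)
    return sum(s * action for s, action in strengths) / total


print(fan_speed(30.0))
```

At 30 degrees both rules fire with strength 0.5, so the result lies halfway between
the two actions.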
6. Like relational databases, there also exist fuzzy relational databases.

a) True
b) False
View Answer
Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.
7. ______________ is/are the way/s to represent uncertainty.

a) Fuzzy Logic
b) Probability
c) Entropy
d) All of the mentioned
View Answer
Answer: d
Explanation: Entropy is the amount of uncertainty involved in data, represented by
H(data).
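A short sketch of how H(data) could be computed, assuming Shannon entropy in bits over
the observed symbol frequencies (the explanation does not say which entropy measure is
meant).

```python
import math
from collections import Counter


def entropy(data) -> float:
    """Shannon entropy H(data) in bits, from the relative frequency of each symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(entropy("aabb"))  # two equally likely symbols
print(entropy("aaaa"))  # a single symbol, no uncertainty
```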
8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
View Answer
Answer: b
Explanation: The definition in the question is that of ecorithms. Fuzzy sets and fuzzy
relational databases are representations, not learning algorithms.
9. Under which condition is a variable directly influenced by all the others?
a) Partially connected
b) Fully connected
c) Local connected
View Answer
Answer: b
Explanation: None.
10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
View Answer
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks
lead to the consequence that a node is conditionally independent of its other
predecessors, given its parents.

Artificial Intelligence Questions and
Answers – Neural Networks – 1

This set of Artificial Intelligence MCQs focuses on “Neural Networks – 1”.
1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001
View Answer
Answer: c
Explanation: The truth table before generalization is:

Inputs  Output
000     $
001     $
010     $
011     $
100     $
101     $
110     0
111     1

where $ represents don't-know cases, for which the output is random.
After generalization, the truth table becomes:

Inputs  Output
000     0
001     1
010     0
011     1
100     0
101     1
110     0
111     1

The output follows the third input bit, so it is zero when and only when the input is
000, 010, 110 or 100.
2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
View Answer
Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.
3. An auto-associative network is:

a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
View Answer
Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths (loops) does not have to be one.
4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
View Answer
Answer: a
Explanation: The output is found by multiplying the weights by their respective
inputs, summing the results and multiplying the sum by the constant of proportionality
of the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
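The same arithmetic can be checked with a few lines of Python. The function name
linear_neuron is illustrative.

```python
def linear_neuron(weights, inputs, k):
    """Weighted sum of the inputs, scaled by the linear transfer constant k."""
    return k * sum(w * x for w, x in zip(weights, inputs))


print(linear_neuron([1, 2, 3, 4], [4, 10, 5, 20], 2))  # prints 238
```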
5. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
View Answer
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because a lot of the operation is done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by example.
6. Which of the following is true for neural networks?

(i) The training time depends on the size of the network.
(ii) Neural networks can be simulated on a conventional computer.
(iii) Artificial neurons are identical in operation to biological ones.
a) All of the mentioned
b) (ii) is true
c) (i) and (ii) are true
View Answer
Answer: c
Explanation: The training time depends on the size of the network: with more neurons,
the number of possible 'states' increases. Neural networks can be simulated on a
conventional computer, but the main advantage of neural networks (parallel execution)
is then lost. Artificial neurons are not identical in operation to biological ones.
7. What are the advantages of neural networks over conventional computers?

(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’
rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
View Answer
Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.

8. Single layer associative neural networks do not have the ability to:
(i) perform pattern recognition
(ii) find the parity of a picture
(iii) determine whether two or more shapes in a picture are connected or not
a) (ii) and (iii) are true
b) (ii) is true
c) All of the mentioned
View Answer
Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
9. Which is true for neural networks?
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
View Answer
Answer: d
Explanation: All of the mentioned are characteristics of a neural network.
10. Neuro software is:

a) A software used to analyze neurons
b) It is powerful and easy neural network
c) Designed to aid experts in real world
d) It is software used by Neuro surgeon
View Answer
Answer: b
Explanation: None.

1. Why is the XOR problem exceptionally interesting to neural network researchers?

a) Because it can be expressed in a way that allows you to use a neural network
b) Because it is complex binary operation that cannot be solved using neural networks
c) Because it can be solved by a single layer perceptron
d) Because it is the simplest linearly inseparable problem that exists.
View Answer
Answer: d
Explanation: None.
2. What is back propagation?

a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
View Answer
Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
3. Why are linearly separable problems of interest to neural network researchers?

a) Because they are the only class of problem that network can solve successfully
b) Because they are the only class of problem that Perceptron can solve successfully
c) Because they are the only mathematical functions that are continuous
d) Because they are the only mathematical functions you can draw
View Answer
Answer: b
Explanation: Linearly separable problems are of interest to neural network researchers
because they are the only class of problem that a perceptron can solve successfully.
4. Which of the following is not the promise of artificial neural network?

a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
View Answer
Answer: a
Explanation: The artificial Neural Network (ANN) cannot explain result.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.
6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
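View Answer
Answer: a
Explanation: A perceptron applies a hard threshold to the weighted sum of its inputs,
so it outputs either a 1 or a 0.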
7. The name for the function in question 6 is

a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
View Answer
Answer: b
Explanation: Also known as the step function, so option a is also right. It is a hard
thresholding function, either on or off with no in-between.
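A minimal sketch of such a thresholded unit in Python. The weights and the threshold of
1.5, which make the unit behave as a two-input AND gate, are assumed values chosen for
illustration.

```python
def step(total: float, threshold: float) -> int:
    """Hard threshold: 1 if the total exceeds the threshold, otherwise 0."""
    return 1 if total > threshold else 0


def perceptron(weights, inputs, threshold) -> int:
    """Add up the weighted inputs and pass the sum through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)), threshold)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([1, 1], [a, b], 1.5))
```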
8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough
View Answer
Answer: c
Explanation: None.
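The hand-coded construction described in option c can be sketched as follows: two
perceptrons each partition off a linear part of the input space (OR and NAND), and a
third combines their results (AND). The specific weights and thresholds are one
possible choice, not the only one.

```python
def perceptron(weights, inputs, threshold) -> int:
    """Threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0


def xor(a: int, b: int) -> int:
    """XOR built from three hand-set perceptrons."""
    h_or = perceptron([1, 1], [a, b], 0.5)       # fires unless both inputs are 0
    h_nand = perceptron([-1, -1], [a, b], -1.5)  # fires unless both inputs are 1
    return perceptron([1, 1], [h_or, h_nand], 1.5)  # AND of the two hidden units


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```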
9. The network that involves backward links from output to the input and hidden
layers is called a ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
View Answer
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.
10. Which of the following is an application of NN (Neural Network)?

a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
View Answer
Answer: d
Explanation: All of the mentioned options are applications of neural networks.

Artificial Intelligence Questions and
Answers – Learning – 3

This set of Artificial Intelligence MCQs focuses on “Learning – 3”.
1. Which is not a desirable property of a logical rule-based system?

a) Locality
b) Attachment
c) Detachment
d) Truth-Functionality
e) Global attribute
View Answer
Answer: b
Explanation: Locality: In logical systems, whenever we have a rule of the form A =>
B, we can conclude B, given evidence A, without worrying about any other rules.
Detachment: Once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived. That is, it can be detached from its
justification. Truth-functionality: In logic, the truth of complex sentences can be
computed from the truth of the components. There is, however, no Attachment
property in a rule-based system. A global attribute defines a particular problem
space as user specific, changing according to the user's plan for the problem.
2. How is Fuzzy Logic different from conventional control methods?

a) IF and THEN Approach
b) FOR Approach
c) WHILE Approach
d) DO Approach
e) Else If approach
View Answer
Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.
3. In unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
View Answer
Answer: b
Explanation: The problem of unsupervised learning involves learning patterns in the
input when no specific output values are supplied, so there is no expected output
against which to test a result. The agent is not told what the correct output should
be; it can only look for structure in the inputs.
4. Inductive learning involves finding a

a) Consistent Hypothesis
b) Inconsistent Hypothesis
c) Regular Hypothesis
d) Irregular Hypothesis
e) Estimated Hypothesis
View Answer
Answer: a
Explanation: Inductive learning involves finding a consistent hypothesis that agrees
with examples. The difficulty of the task depends on the chosen representation.
5. Computational learning theory analyzes the sample complexity and computational
complexity of
a) Unsupervised Learning
b) Inductive learning
c) Forced based learning
d) Weak learning
e) Knowledge based learning
View Answer
Answer: b
Explanation: Computational learning theory analyzes the sample complexity and
computational complexity of inductive learning. There is a tradeoff between the
expressiveness of the hypothesis language and the ease of learning.
6. If a hypothesis says it should be positive, but in fact, it is negative, we call it

a) A consistent hypothesis
b) A false negative hypothesis
c) A false positive hypothesis
d) A specialized hypothesis
e) A true positive hypothesis
View Answer
Answer: c
Explanation: A consistent hypothesis agrees with the examples. If the hypothesis says
it should be negative but in fact it is positive, it is a false negative. If a
hypothesis says it should be positive, but in fact it is negative, it is a false
positive. A specialized hypothesis requires certain restrictive or special conditions.
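The four cases distinguished in the explanation can be written out as a small Python
function. The name outcome is illustrative.

```python
def outcome(predicted: bool, actual: bool) -> str:
    """Classify a single prediction against the true label."""
    if predicted and actual:
        return "true positive"
    if predicted and not actual:
        return "false positive"   # hypothesis says positive, in fact negative
    if not predicted and actual:
        return "false negative"   # hypothesis says negative, in fact positive
    return "true negative"


print(outcome(predicted=True, actual=False))
```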
7. Neural Networks are complex ______________ with many parameters.

a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
e) Power Functions
View Answer
Answer: b
Explanation: Neural network parameters can be learned from noisy data, and the
networks have been used for thousands of applications. The mapping they learn varies
from problem to problem and is in general nonlinear.
8. A perceptron is a ______________.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
View Answer
Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
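A sketch of such a weight update rule, assuming the classic perceptron learning rule
w <- w + rate * (target - output) * x with a bias term, trained here on the linearly
separable AND function. The learning rate, epoch limit and function names are
illustrative assumptions.

```python
def predict(weights, bias, inputs) -> int:
    """Threshold unit: output 1 if the weighted sum plus bias exceeds 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=50, rate=1):
    """Repeat the update rule until every example is classified correctly."""
    weights, bias = [0] * len(examples[0][0]), 0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if mistakes == 0:  # every example is fit exactly, so stop early
            break
    return weights, bias


AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_EXAMPLES)
for inputs, target in AND_EXAMPLES:
    print(inputs, predict(weights, bias, inputs), target)
```

On data that is not linearly separable, such as XOR, the same loop would simply run
out of epochs without fitting all the examples.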
9. Which of the following statements is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
View Answer
Answer: a
Explanation: Not all formal languages are context-free.
10. Which of the following statements is not true?

a) The union and concatenation of two context-free languages is context-free
b) The reverse of a context-free language is context-free, but the complement need not
be
c) Every regular language is context-free because it can be described by a regular
grammar
d) The intersection of a context-free language and a regular language is always
context-free
e) The intersection of two context-free languages is context-free
View Answer
Answer: e
Explanation: The union and concatenation of two context-free languages are
context-free, but their intersection need not be.

1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
View Answer
Answer: d
Explanation: Factors which affect the performance of a learner system do not include
good data structures.
2. The different learning methods do not include:

a) Memorization
b) Analogy
c) Deduction
d) Introduction
View Answer
Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.
3. Which of the following is the model used for learning?

a) Decision trees
b) Neural networks
c) Propositional and FOL rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Decision trees, neural networks, propositional rules and FOL rules are
all models used for learning.
4. Automated vehicle is an example of ______.

a) Supervised learning
b) Unsupervised learning
c) Active learning
d) Reinforcement learning
View Answer
Answer: a
Explanation: In an automated vehicle, a set of vision inputs and the corresponding
actions are available to the learner, hence it is an example of supervised learning.
5. Which of the following is an example of active learning?
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
View Answer
Answer: a
Explanation: In active learning, not only is a teacher available, but the learner can
also ask for suitable perception-action pair examples to improve performance.
6. In which of the following types of learning does the teacher return reward and
punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
View Answer
Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher
returns reward or punishment to the learner.
7. Decision trees are appropriate for the problems where:

a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned
View Answer
Answer: d
Explanation: Decision trees can be used in all the conditions stated.
8. Which of the following is not an application of learning?

a) Data mining
b) WWW
c) Speech recognition
d) None of the mentioned
View Answer
Answer: d
Explanation: All of the mentioned options are applications of learning.
9. Which of the following is the component of learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Goal, model, learning rules and experience are the components of a
learning system.
10. Which of the following is also called exploratory learning?

b) Active learning
c) Unsupervised learning
View Answer
Answer: c
Explanation: In unsupervised learning no teacher is available, hence it is also called
exploratory learning.

1. What will take place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech
View Answer
Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.
2. Which modifies the performance element so that it makes better decisions?

a) Performance element
b) Changing element
c) Learning element
View Answer
Answer: c
Explanation: A learning element modifies the performance element so that it can make
better decisions.
3. How many issues affect the design of a learning element?

a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.
4. What is used in determining the nature of the learning problem?

a) Environment
b) Feedback
c) Problem
View Answer
Answer: b
Explanation: The type of feedback is used in determining the nature of the learning
problem that the agent faces.
5. How many types of machine learning are there?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of machine learning are supervised, unsupervised and
reinforcement.
6. Which is used for utility functions in game playing algorithm?

a) Linear polynomial
b) Weighted polynomial
c) Polynomial
d) Linear weighted polynomial
View Answer
Answer: d
Explanation: Linear weighted polynomial is used for learning element in the game
playing programs.
7. Which is used to choose among multiple consistent hypotheses?

a) Razor
b) Ockham razor
c) Learning element
View Answer
Answer: b
Explanation: Ockham's razor prefers the simplest hypothesis consistent with the data.
8. What will happen if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
View Answer
Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
View Answer
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.
10. How does a decision tree reach its decision?

a) Single test
b) Two test
c) Sequence of test
d) No test
View Answer
Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests.
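To illustrate, a decision tree can be written as nested attribute tests in Python. The
attributes (outlook, humidity, windy) and the decisions are a hypothetical example,
not taken from the questions.

```python
def decide(example: dict) -> str:
    """Walk a hand-written decision tree: each branch tests one attribute in turn."""
    if example["outlook"] == "sunny":            # first test
        return "no" if example["humidity"] == "high" else "yes"  # second test
    if example["outlook"] == "rainy":
        return "no" if example["windy"] else "yes"
    return "yes"                                 # any other outlook needs no further test


print(decide({"outlook": "sunny", "humidity": "high", "windy": False}))
```

The input is an object described by a set of attributes, and the value returned at a
leaf is the decision.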
1: An ANN is composed of a large number of highly interconnected processing
elements (neurons) working in unison to solve problems.
A.
True
B.
False
C.
D.
Option: A
Explanation :
2:
Artificial neural networks are used for
A.
Pattern Recognition
B.
Classification
C.
Clustering
D.
All of these

Option: D
Explanation :
3:
A Neural Network can answer
A.
For Loop questions
B.
what-if questions
C.
IF-Then-Else Analysis Questions
D.
None of these
Option: B
Explanation :
4:
Ability to learn how to do tasks based on the data given for training or initial
experience
A.
Self Organization
B.
Adaptive Learning
C.
Fault tolerance
D.
Robustness
Option: B
Explanation :
5:
Feature of ANN in which ANN creates its own organization or representation of
information it receives during learning time is
A.
Adaptive Learning
B.
Self Organization
C.
What-If Analysis
D.
Supervised Learning
Option: B
Explanation :
6:
In artificial Neural Network interconnected processing elements are called
A.
nodes or neurons
B.
weights
C.
axons
D.
Soma
Option: A
Explanation :
7:
Each connection link in ANN is associated with ________ which has information
about the input signal.
A.
neurons
B.
weights
C.
bias
D.
activation function
Option: B
Explanation :
8:
Neurons or artificial neurons have the capability to model networks of original
neurons as found in brain
A.
True
B.
False
C.
D.
Option: A
Explanation :
9:
The internal state of a neuron is called __________, and is a function of the inputs
the neuron receives
A.
Weight
B.
activation or activity level of neuron
C.
Bias
D.
None of these
Option: B
Explanation :
10:
Neuron can send ________ signal at a time.
A.
multiple
B.
one
C.
none
D.
any number of
Option: B
Explanation :

1:
Artificial intelligence is
A.
It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations
B.
Computational procedure that takes some value as input and produces some
value as output.
C.
Science of making machines perform tasks that would require intelligence
when performed by humans
D.
None of these
Option: C
Explanation :
2:
Expert systems
A.
Combining different types of method or information
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution
C.
an information base filled with the knowledge of an expert formulated in terms
of if-then rules
D.
None of these
Option: C
Explanation :
3:
Falsification is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
Showing a universal law or rule to be invalid by providing a counter example
C.
A set of attributes in a database table that refers to data in another table
D.
None of these
Option: B
Explanation :
4:
Evolutionary computation is
A.
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution.
C.
Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D.
None of these
Option: B
Explanation :
5:
Extendible architecture is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
C.
D.
None of these
Option: A
Explanation :

6:
Massively parallel machine is
A.
A programming language based on logic
B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk
C.
Describes the structure of the contents of a database.
D.
None of these
Option: B
Explanation :
7:
Search space
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query.
C.
Worth of the output of a machine learning program that makes it understandable
for humans
D.
None of these
Option: A
Explanation :
8:
n(log n) refers to
A.
A measure of the desired maximal complexity of data mining algorithms
B.
A database containing volatile data used for the daily operation of an
organization
C.
Relational database management system
D.
None of these
Option: A
Explanation :
9:
Perceptron is
A.
General class of approaches to a problem.
B.
Performing several computations simultaneously
C.
Structures in a database that are statistically relevant
D.
Simple forerunner of modern neural networks, without hidden layers
Option: D
Explanation :
10:
Prolog is
A.
A programming language based on logic
B.
C.
Describes the structure of the contents of a database
D.
None of these
Option: A
Explanation :

11:
Shallow knowledge
A.
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it
understandable for humans
D.
None of these
Option: B
Explanation :
12:
Quantitative attributes are
A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data
B.
Attributes of a database table that can take only numerical values
C.
Tools designed to query a database
D.
None of these
Option: B
Explanation :
13:
Subject orientation
A.
The science of collecting, organizing, and applying numerical facts
B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.
C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data
D.
None of these
Option: C
Explanation :
14:
Vector
A.
Does not need the control of a human operator during execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars
C.
The validation of a theory on the basis of a finite number of examples
D.
None of these
Option: B
Explanation :
15:
Transparency
A.
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it understandable
for humans
D.
None of these

Option: C
Explanation :

1:
Core of soft Computing is
A.
Fuzzy Computing, Neural Computing, Genetic Algorithms
B.
Fuzzy Networks and Artificial Intelligence
C.
Artificial Intelligence and Neural Science
D.
Neural Science and Genetic Science
Option: A
Explanation :
2:
Who initiated the idea of Soft Computing
A.
Charles Darwin
B.
Lotfi A. Zadeh
C.
Rechenberg
D.
McCulloch
Option: B
Explanation :
3:
Fuzzy Computing
A.
mimics human behaviour
B.
doesn't deal with 2 valued logic
C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic
D.
All of the above
Option: D
Explanation :
4:
Neural Computing
A.
mimics human brain
B.
information processing paradigm
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
5:
Genetic algorithms are
A.
a part of Evolutionary Computing
B.
inspired by Darwin's theory about evolution - "survival of the fittest"
C.
adaptive heuristic search algorithms based on the evolutionary ideas of
natural selection and genetics
D.
All of the above
Option: D
Explanation :

6:
What are the 2 types of learning
A.
Improvised and unimprovised
B.
supervised and unsupervised
C.
Layered and unlayered
D.
None of the above
Option: B
Explanation :
7:
Supervised Learning is
A.
learning with the help of examples
B.
learning without teacher
C.
learning with the help of teacher
D.
learning with computers as supervisor
Option: C
Explanation :
8:
Unsupervised learning is
A.
learning without computers
B.
problem based learning
C.
learning from environment
D.
learning from teachers
Option: C
Explanation :
9:
Conventional Artificial Intelligence is different from soft computing in the sense
A.
Conventional Artificial Intelligence deals with predicate logic whereas soft
computing deals with fuzzy logic
B.
Conventional Artificial Intelligence methods are limited by symbols whereas
soft computing is based on empirical data
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
10:
In supervised learning
A.
classes are not predefined
B.
classes are predefined
C.
classes are not required
D.
classification is not done
Option: B
Explanation :

1:
Membership function defines the fuzziness in a fuzzy set irrespective of the
elements in the set, which are discrete or continuous.
A.
True
B.
False
Option: A
Explanation :
2:
The membership functions are generally represented in
A.
Tabular Form
B.
Graphical Form
C.
Mathematical Form
D.
Logical Form
Option: B
Explanation :
3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of
A.
knowledge
B.
examples
C.
learning
D.
experience
Option: D
Explanation :
4: Three main basic features involved in characterizing membership function are
A.
Intuition, Inference, Rank Ordering
B.
Fuzzy Algorithm, Neural network, Genetic Algorithm
C.
Core, Support, Boundary
D.
Weighted Average, Center of Sums, Median
Option: C
Explanation :
5:
The region of universe that is characterized by complete membership in the set is
called
A.
Core
B.
Support
C.
Boundary
D.
Fuzzy
Option: A
Explanation :
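The three features named in questions 4 and 5 can be illustrated on a small discrete fuzzy set. This is a minimal sketch, assuming the usual textbook definitions (core: membership equal to 1, support: membership greater than 0, boundary: membership strictly between 0 and 1); the set `warm` and its values are made up for illustration.

```python
# Hypothetical discrete fuzzy set: temperature (deg C) -> membership in "warm".
warm = {10: 0.0, 15: 0.3, 20: 1.0, 25: 1.0, 30: 0.6, 35: 0.0}

def core(fuzzy_set):
    """Elements with complete membership (mu == 1)."""
    return {x for x, mu in fuzzy_set.items() if mu == 1.0}

def support(fuzzy_set):
    """Elements with non-zero membership (mu > 0)."""
    return {x for x, mu in fuzzy_set.items() if mu > 0.0}

def boundary(fuzzy_set):
    """Elements with partial membership (0 < mu < 1)."""
    return {x for x, mu in fuzzy_set.items() if 0.0 < mu < 1.0}

print(core(warm), support(warm), boundary(warm))
```

Under these definitions the boundary is simply the support minus the core.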

6: A fuzzy set whose membership function has at least one element x in the universe
whose membership value is unity is called
A.
sub normal fuzzy sets
B.
normal fuzzy set
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
7:
In a Fuzzy set a prototypical element has a value
A.
1
B.
0
C.
infinite
D.
Not defined
Option: A
Explanation :
8:
A fuzzy set wherein no membership function has its value equal to 1 is called
A.
normal fuzzy set
B.
subnormal fuzzy set.
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
9: A fuzzy set has a membership function whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing then strictly monotonically decreasing with increasing
values for elements in the universe
A.
convex fuzzy set
B.
concave fuzzy set
C.
Non concave Fuzzy set
D.
Non Convex Fuzzy set
Option: A
Explanation :
10:
The membership values of the membership function are not strictly
monotonically increasing or decreasing, or strictly monotonically increasing then
decreasing.
A.
Convex Fuzzy Set
B.
Non convex fuzzy set
C.
Normal Fuzzy set
D.
Sub normal fuzzy set
Option: B
Explanation :
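Questions 6 to 10 distinguish normal from subnormal and convex from non-convex fuzzy sets. The sketch below checks both properties on a discrete, ordered universe. It assumes the common min-based reading of convexity (membership rises, then falls, with no second rise), which also allows plateaus and is therefore slightly looser than the "strictly monotonic" wording of the questions. The example sets are invented.

```python
def is_normal(fuzzy_set):
    """Normal: at least one element has membership exactly 1."""
    return max(fuzzy_set.values()) == 1.0

def is_convex(fuzzy_set):
    """Convex: once membership has started to fall, it never rises again."""
    values = [fuzzy_set[x] for x in sorted(fuzzy_set)]
    falling = False
    for previous, current in zip(values, values[1:]):
        if current < previous:
            falling = True
        elif current > previous and falling:
            return False  # a second rise makes the set non-convex
    return True

triangle = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.5}   # normal and convex
two_humps = {1: 0.8, 2: 0.3, 3: 0.9}          # subnormal and non-convex

print(is_normal(triangle), is_convex(triangle))
print(is_normal(two_humps), is_convex(two_humps))
```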

11:
Match the Column
List I
List II
1 Subnormal Fuzzy Set
2 Normal Fuzzy Set
3 Non Convex Normal Fuzzy Set
4 Convex Normal Fuzzy Set
A.
a b c d
2 1 4 3
B.
a b c d
1 2 3 4
C.
a b c d
4 3 2 1
D.
a b c d
3 2 1 4
Option: A
Explanation :
12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to
A.
infinite
B.
1
C.
0
D.
0.5
Option: D
Explanation :
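As a worked illustration of question 12, the crossover points of a triangular membership function are the two inputs where the membership equals 0.5. The function and its parameters (`a`, `b`, `c` for left foot, peak, right foot) are assumptions chosen for this sketch, not taken from the question bank.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# For a = 10, b = 20, c = 30 the crossover points lie halfway up each side.
left_crossover = (10 + 20) / 2
right_crossover = (20 + 30) / 2
print(triangular(left_crossover, 10, 20, 30), triangular(right_crossover, 10, 20, 30))
```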

Questions
1. Which of the following is/are found in Genetic Algorithms?
(i)
evolution
(ii)
selection
(iii)
reproduction
(iv)
mutation
: Your answer is
(a)
i & ii only
(b)
i, ii & iii only
(c)
ii, iii & iv only

(d)
all of the above
2. Matching between terminologies of Genetic Algorithms and Genetics:
Genetic Algorithms               Genetics (biology)
(a) representation structures    (i) external disturbance, such as cosmic radiation
(b) crossover                    (ii) chromosomes
(c) mutation                     (iii) survivability
(d) selection                    (iv) sexual reproduction
: Your answer is
(a) _____
(b) _____
(c) _____
(d) _____
3. Where are Genetic Algorithms applicable?
(i)
real time application
(ii)
biology
(iii)
Artificial Life
(iv)
economics
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
4. Which of the following is/are the prerequisite(s) when Genetic
Algorithms are applied to solve problems?
(i)
encoding of solutions
(ii)
well-understood search space
(iii)
method of evaluating the suitability of the solutions
(iv)
contain only one optimal solution
: Your answer is
(a)
i & ii only
(b)
ii & iii only
(c)
i & iii only

(d)
iii & iv only
5. Which of the following statement(s) is/are true?
(i)
Genetic Algorithm is a randomised parallel search algorithm, based
on the principles of natural selection, the process of evolution.
(ii)
GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii)
GAs are used for solving optimization problems and modeling
evolutionary phenomena in the natural world.
(iv)
Despite their utility, GAs remain a poorly understood topic.
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
6. If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)
(i)
The crossover operation is not successful.
(ii)
Solution is about to be reached.
(iii)
Diversity is so poor that the parents involved in the crossover
operation are similar.
(iv)
The search space of the problem is not ideal for GAs to operate.
: Your answer is
(a)
ii, iii & iv only
(b)
ii & iii only
(c)
i, iii & iv only
(d)
all of the above
7. Which of the following comparisons is true?
: Your answer is
(a)
In the event of restricted access to information, GAs win out in that
they require much less information to operate than other searches.
(b)
Under any circumstances, GAs always outperform other algorithms.
(c)
The quality of solutions offered by GAs for any problem is
always better than that provided by other searches.
(d)
GAs could be applied to any problem, whereas certain algorithms
are applicable to limited domains.
(i)
Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii)
Alife is a kind of Artificial Intelligence (AI).
(iii)
Alife pursues a two-fold goal: increasing our understanding of
nature and enhancing our insight into artificial models, thereby
providing us with the ability to improve their performance.
(iv)
Alife extends our studies of biology, life-as-we-know-it, to the larger
domain of possible life, life-as-it-could-be.
: Your answer is
(a)
i & ii only
(b)
iii & iv only
(c)
i, ii & iii only
(d)
all of the above
9. Where is Artificial Life applicable?
(i)
film (movie, video) production
(ii)
biology
(iii)
robotics
(iv)
air traffic control

: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
10. Who can benefit from Alife?
(i)
children
(ii)
designers
(iii)
artists
(iv)
patients
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above

: Answers
Q1.
Which of the following is/are found in Genetic Algorithms?
The correct answer is (d).
An initial population evolves to some optimal solutions. Selection biases for
better individuals, judged by their fitness values; two individuals are chosen
for reproducing offspring. By combining portions of good individuals, this
process is likely to create even better individuals.
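The loop described in the answer to Q1 (evolution of a population through selection, reproduction and mutation) can be sketched as follows. This is a toy illustration under stated assumptions, not a reference implementation: the problem (maximise the number of 1 bits), tournament selection, one-point crossover, the elitism step and all parameter values are choices made for this example.

```python
import random

def fitness(chromosome):
    """Toy fitness (assumption): count of 1 bits."""
    return sum(chromosome)

def select(population, rng):
    """Tournament selection: the fitter of two random individuals wins."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent1, parent2, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [
            mutate(crossover(select(population, rng), select(population, rng), rng),
                   mutation_rate, rng)
            for _ in range(pop_size - 1)
        ]
        population = children + [best]  # elitism: keep the best individual so far
        best = max(population, key=fitness)
    return best

print(fitness(evolve()))
```

Because of the elitism step, the best fitness never decreases from one generation to the next, but nothing here guarantees that the optimum is found.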
Q2.
Matching between terminologies of Genetic Algorithms and Genetics:
The correct answer is :
(a) (ii)
(b) (iv)
(c) (i)
(d) (iii)
Q3.
Where are Genetic Algorithms applicable?
The correct answer is (b).
Genetic Algorithms can be used to evolve strategies for interaction in the
Prisoner's Dilemma in economics. GAs are used as a computational method in
Alife - simulation of living systems starting with single cells and evolving to
organisms, societies or even whole economic systems. These features
compete for the limited resources in this virtual world. In biology, GAs are
used in protein structure prediction, protein folding, stability of DNA hairpins
and modeling of the immune system.
(Figures: DNA structures; protein structures)
GAs cannot be applied in real-time systems, where the response time is critical.
GAs cannot guarantee to find a solution, and the time spent in
evaluation of the fitness function and other genetic operations is substantially
large, especially in a poorly understood, complex search space.
Q4.
Which of the following is/are the requirement(s) when Genetic
Algorithms are applied to solve problems?
The correct answer is (c).
The problem is mapped into a set of strings with each string representing a
potential solution (i.e. chromosomes). A fitness function is required to
compare and tell which solution is better. GA performance is heavily
dependent on the representation chosen.
GAs are designed to efficiently search large, non-linear, poorly understood
search spaces where expert knowledge is scarce or difficult to encode and
where traditional techniques fail. However, domain knowledge guides GAs to
obtain the optimal solutions. Moreover, GAs are powerful enough to solve for
a set of (nearly) optimal solutions.
Q5.
Which of the following statement(s) is/are true?
The search space is too complex for exhaustive search such that GAs
successfully find robust solutions after evaluating only a few percent of the
full parameter space.
It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.
Their probabilistic nature and reliance on frequent interactions of members of
a large population make a complete analytic understanding of GAs extremely
difficult.
Q6.
If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)
When the crossover operation does not produce significantly different offspring,
it shows that the parents involved are almost identical. Hence, it means that
a solution is about to be reached. However, the solution derived is not
necessarily the optimal solution. From here, we could see that mutation is
necessary to maintain the diversity of the population so that GAs would not be
trapped in partial solutions.
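The point made in the answer to Q6 can be checked directly: with identical parents, one-point crossover returns copies of the parents, and only mutation introduces something new. The helper names below are hypothetical and the bit strings are invented for the illustration.

```python
def one_point_crossover(parent1, parent2, point):
    """Swap the tails of two parents at the given cut point."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def flip_bit(chromosome, index):
    """Mutation: flip a single bit."""
    mutated = list(chromosome)
    mutated[index] = 1 - mutated[index]
    return mutated

parent = [1, 0, 1, 1, 0, 0]
child1, child2 = one_point_crossover(parent, list(parent), point=3)
print(child1 == parent, child2 == parent)   # crossover adds no diversity here
print(flip_bit(child1, 0) == parent)        # mutation does
```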
Q7.
Which of the following comparisons is true?
The correct answer is (a).
• This is true since GAs require only the information needed to
evaluate the fitness function for the possible solutions
(individuals in the search space). Other searches generally
require more information, such as differentiability of the
problem function, which might be hard to obtain.
• This holds true in most circumstances. However, if the search
space is small enough, other searches like hill-climbing or
heuristic search, which are very effective in exploring small spaces,
would perform just as well.
• GAs have only been developed for a couple of decades while
traditional searches have been investigated for a longer time.
Thus GAs do not necessarily produce a better quality solution.
• Evidently certain algorithms are only applicable to limited
domains. However, certain difficulties, like encoding of
problems, might hinder the use of GAs.
Q8.
Alife is characterised by a bottom-up synthesis approach, so that the robotics
work tends to aim for insect-like capability rather than human, and complex
behaviours are developed by putting together simpler ones. Artificial
forms of evolution such as Genetic Algorithms and Genetic Programming are
widely used to evolve solutions or behaviours rather than designing them in a
top-down fashion as in Artificial Intelligence.
Q9.
Where is Artificial Life applicable?
Alife is applicable in many fields, such as a walking robot
shown on the right.
Q10.
Who can benefit from Alife?
Children can use various computational tools (including LEGO/Logo
and Electronic Bricks) to build artificial creatures, exploring
some of the central ideas of Alife.
GAs can be applied to the design of laminated composite structures, circuit
designs and the improvement of Pareto optimal designs. Genetic programming
can help artists to create many pictures. Medical problems can also be
detected: Medibrains.
1. Which type of model has memory associated with it?
a) GAN
b) Autoencoder
c) RNN
d) CNN
2) RNN model works with

a) random data
b) nominal data
c) ordinal data
d) sequential data
3) Which of the data is an example of sequential data

a) MNIST data
b) house rate prediction data
c) weather forecasting data
d) CIFAR10 dataset
4) LSTM layer is used to

a) avoid the problem of exploding gradients
b) avoid the problem of vanishing gradients
c) to retain the previous state of the model
d) to work with ordinal type of data
5) Which method is used to avoid the exploding gradient problem

a) LSTM
b) TBTT
c) forget cell
d) autoencoders
Type of RNN
applications of RNN
Forget Cell
What should the value of |Whh| be so that the model does not get stuck in the exploding or vanishing gradient problem?
a) <1
b) >1
c) =1
d) =0
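The role of |Whh| can be seen with a deliberately simplified calculation. The sketch below assumes a scalar recurrent weight and ignores the activation derivative, so the gradient reaching a step `steps` time steps back is scaled by roughly |Whh| raised to that power. Real BPTT involves Jacobian matrices, so this is only an intuition aid.

```python
def gradient_scale(w_hh, steps):
    """Scalar simplification (assumption): gradient factor after `steps` time steps."""
    return abs(w_hh) ** steps

for w_hh in (0.9, 1.0, 1.1):
    # |Whh| < 1 shrinks towards 0 (vanishing), |Whh| > 1 grows (exploding),
    # |Whh| = 1 keeps the magnitude unchanged.
    print(w_hh, gradient_scale(w_hh, 50))
```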
Which of the following is a part of LSTM?

a) stride
b) zero padding
c) discriminator
d) Forget cell
Which of the methods uses trainable parameters for converting string data into numerical data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these
Which of the methods has the least relationship between the encoded data and the string type of data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these
This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:32 GMT -05:00
https://www.coursehero.com/file/75294158/unit-5pdf/
Powered by TCPDF (www.tcpdf.org)
1. High bias means -
Underfit
Overfit
2. How to check CPU time using Python?
import ClockTime
import time
3. Python uses -
Interpreter
Compiler
4. How to check the version of TensorFlow?
tf.__version__
tf._version_
tf.version
5. Output of print(tf.test.gpu_device_name()) if only one GPU is available is -
/device:GPU:0
/device:GPU:1
6. Matrix multiplication is -
@
*
**
7. For the square of a tensor, can I use -
tf.square()
**
^2
8. Element-wise matrix multiplication -
a*b
a**b
a@b
9. Weights and biases in the sequential model are assigned by either
calling the model with inputs or specifying the input shape during the
creation of the model.
True
False
Weights are created once model is declared
10. Size of weights for the following code is:
model = keras.Sequential([layers.Dense(3)])
y = model(tf.ones((10, 5)))
print(model.weights[0].shape) -
5 x 3
10 x 5
10 x 3
3 x 5
11. Model.compile() in Keras requires -
All of above
optimizer
loss
metrics
12. Model.save() saves the model's -
All of above
Model Architecture
Optimizer State
Weight & Bias matrix
13. Correct library to load a saved model in Keras is -
keras.models.load_model()
keras.Sequential.load_model()
keras.layers.Dense.load_model()
14. Which of the following is the correct library to load a pre-trained NN? -
tf.keras.Models
tf.keras.applications
tf.keras.layers
tf.keras.preprocessing
15. Which of the following is NOT a data-augmentation layer? -
RandomTranslate()
RandomCrop()
RandomFlip()
RandomRotation()
16. Which of the following is NOT a building block of LSTM -
logic gate
input gate
Forget gate
output gate
https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
17. Which of the following weight matrices leads to the VANISHING gradient
problem in BPTT? -
|Whh| < 1
|Whh| > 1
|Whh| = 1
|Whh| = 0
18. Which of the following is NOT a gate in LSTM?
Multiplication gate
Input Gate
Forget gate
Output gate
19. A gate in LSTM has an activation function -
Tanh
sigmoid
threshold
linear
20. Which of the following |Whh| leads to the exploding gradient problem?
|Whh| > 1
|Whh| < 1
|Whh| = 1
|Whh| = 0
21. Which of the following is a correct library for the Embedding layer in RNN?
I. tf.keras.layers
II. tf.keras.applications
III. tf.keras.layers.experimental.preprocessing
IV. tf.keras.Models
22. LSTM stands for -
Long Short Term Memory
Length Short Term Memory
Long Sequential Term Memory
Length Short Term Memory
23. GRU stands for -
Gated Recurrent Unit
Graphical Recurrent Unit
Generalized Recurrent Unit
Gated Recurrence Unit
24. Which of the following is a correct library to import
text_dataset_from_directory() -
I. tf.keras.preprocessing
II. tf.keras.layers.experimental.preprocessing
III. sklearn.preprocessing
IV. tf.keras.modes.preprocessing
25. Which of the following can be used to solve the vanishing gradient
problem of BPTT?
LSTM
LSTM or GRU both can be used
GRU
Dropout
26. Rescaling and Resizing are preprocessing layers that can be imported
from the library -
tf.keras.layers.experimental.preprocessing
tf.keras.layers.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing
27. A dataset 'x_train' contains 50 batches, each of size 32.
The number of batches in x_new = x_train.take(20) is -
20
50
30
32
28. A dataset 'x_train' contains 50 batches, each of size 32.
The number of batches in x_new = x_train.skip(20) is -
30
50
20
32
29. Which of the following is the correct use of three LSTM layers?
A) tf.keras.layers.LSTM(128,return_sequences=True);
tf.keras.layers.LSTM(64);
tf.keras.layers.LSTM(32)
B) tf.keras.layers.LSTM(128,return_sequences=True);
tf.keras.layers.LSTM(64,return_sequence=True);
tf.keras.layers.LSTM(32)
C) tf.keras.layers.LSTM(128,return_sequences=True);
tf.keras.layers.LSTM(64);
tf.keras.layers.LSTM(32,return_sequences=True)
D) tf.keras.layers.LSTM(128);
tf.keras.layers.LSTM(64);
tf.keras.layers.LSTM(32)
30. Which of the following activation functions is used in GRU?
both sigmoid and tanh
Sigmoid
Tanh
Relu
Softmax
31. Which of the following is the best suited application of RNN?
Text Classification
Time series forecasting
Text Generation
Image classification
Regression
32. Which of the following neural network layers are supported by Keras?
All of above
Conv2D
Conv2DTranspose
GlobalAveragePooling1D
LSTM
33. Which of the following is false about the radial basis function neural
network?
It resembles RNNs, which have feedback loops.
It uses a radial basis function as the activation function.
While outputting, it considers the distance of a point with respect to the center.
34. Which of the following is NOT a parameter of the data augmentation
layer RandomFlip() -
vertical_and_horizontal
horizontal
vertical
horizontal_and_vertical
35. Data augmentation layers are available in which directory?
tf.keras.layers.experimental.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing
tf.data.experimental.preprocessing
36. Which of the following layers is NOT a type of recurrent neural
network layer?
tf.keras.layers.experimental.preprocessing.TextVectorization()
tf.keras.layers.LSTM()
tf.keras.layers.GRU()
tf.keras.layers.Bidirectional()
37. Which of the following is NOT a method of the TextVectorization
layer of TensorFlow?
ngrams()
adapt()
get_vocabulary()
set_vocabulary()
38. TextVectorization layer is available in ...…
tf.keras.layers.experimental.preprocessing
tf.keras.preprocessing
tf.keras.layers.preprocessing
sklearn.preprocessing
39. Which of the following is used to stop the exploding and vanishing
gradient -
LSTM
GRU
Bidirectional
Dropout
None of the above.
40. Which of the following methods uses an exponentially weighted
linear function of past observations?
Simple Exponential Smoothing
Holt Winter’s Exponential Smoothing
Vector Autoregression
Autoregression
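For question 40, the phrase "exponentially weighted linear function of past observations" can be made concrete with the standard simple exponential smoothing recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1). The sketch below is a plain-Python illustration; the initialisation s_0 = x_0 and the sample series are assumptions made for the example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series; each value weights older data by (1 - alpha)**k."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))
```

Unrolling the recurrence shows that an observation k steps in the past enters the current value with weight alpha * (1 - alpha)**k, which is where the exponential weighting comes from.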
41. Which of the following time series analysis methods does NOT support
both trend and signal components?
I. Seasonal Autoregressive Integrated Moving Average (SARIMA)
II. Autoregressive Integrated Moving Average (ARIMA)
III. Seasonal Autoregressive Integrated Moving Average with exogenous
variable (SARIMAX)
IV. Holt Winter’s Exponential Smoothing
42. Which of the following is the correct method to load an autoregression
model?
I. from statsmodels.tsa.ar_model import AutoReg
II. from statsmodels.tsa.arima_model import ARMA
III. from statsmodels.tsa.arima_model import AutoReg
IV. from statsmodels.tsa.ar_model import AutoRegression
43. Which of the following is NOT a multivariate time series analysis method?
I. Vector Autoregression
II. AutoRegression
III. Vector Autoregression Moving-Average with exogenous variable
IV. Vector Autoregression Moving-Average
44. Which of the following time series analysis methods does NOT support
both trend and signal components?
I. Autoregressive Integrated Moving Average (ARIMA)
II. Seasonal Autoregressive Integrated Moving Average with exogenous
variable (SARIMAX)
III. Holt Winter’s Exponential Smoothing
IV. Vector Autoregression
45. Which of the following methods supports multivariate time series analysis?
I. Vector Autoregression
II. Seasonal Autoregressive Integrated Moving Average with exogenous
variable (SARIMAX)
III. Holt Winter’s Exponential Smoothing
IV. Autoregressive Integrated Moving Average (ARIMA)
46. Which of the following GANs uses unpaired data for prediction?
CycleGAN
Pix2Pix
DCGAN
FGSM
47. What is the full form of FGSM?
Fast Gradient Sign Method
Fast Gradient Sigmoid Method
Fourier Gradient Signature Method
Fast Gravity Sign Magnitude
48. Which of the following is NOT a GAN network?
FGSM
DeepDream
Pix2Pix
CycleGAN
49. Which of the following is TRUE for an adversarial example?
I. These examples are added to train data intentionally to fool Neural Networks
II. These examples are used for validation
III. These examples are used for training.
IV. None of above
50. FGSM is a type of _____________________ attack on NN.
Black box
White box
Gray box
Pink box
51. Which of the following is NOT an inference attack on NN?
Fuzzy Inference
Membership Inference
Attribute Inference
Model Inference
Input Inference
52. Which of the following is NOT an attack on NN?
Phishing
Poisoning
Backdooring
Trojaning
53. In which of the following attacks does an attacker want to extract the
training data of a model?
Attribute Inference
Input Inference
Membership Inference
Model Inference
54. Which of the following networks can be used to convert one image into
the form of a painting of another image?
Neural style transfer
Pix-to-Pix transfer
DCGAN
FGSM
55. Which of the following is the correct library to import a data
augmentation layer?
tf.keras.layers.experimental.preprocessing
tf.keras.layer.preprcoessing
tf.keras.preprocessing
tf.keras.models.preprocessing
56. The technique of randomly rotating, cropping, zooming etc. the input
image to stop overtraining is called -
Data Augmentation
Early Stopping
Feature Scaling
Cross Validation
57. Which of the following networks is used to generate a new image that
looks real?
GAN
CNN
Perceptron
RNN
58. In which of the following privacy attacks on an ML model do attackers
want to extract the training data of the model?
Membership Inference
Input Inference
Attribute Inference
Model Inference
59. Which of the following attacks is on the data of an ML model?
Data poisoning
Adversarial attacks
Backdooring
Trojaning
60. Which of the following layers can be used to convert text data to an
index vector?
TextVectorization()
Text2Vec()
text_data_from_directory()
text_to_int()
61. Which of the following is the correct output of applying
AveragePooling2D((3,3)) on the input image [[1,2,3],[4,5,6],[7,8,9]]
[[[[5.0]]]]
[[[5.0]]]
[5.0]
[[5.0]]
62. What is the correct output shape of applying Conv2D(32,7) on an
input image of size 32x32x3?
26x26x32
32x32x32
32x32x3
25x25x32
63. Which of the following methods of matplotlib can be used to plot
different graphs in the same figure?
subplot()
plot()
imshow()
grid()
64. Which of the following operators is NOT supported in Python TensorFlow?
#
^
**
@
65. Which of the following weights are desirable in the BPTT algorithm?
|W|=1
|W|<1
|W|>1
|W|=0
66. Which of the following layers is NOT a data augmentation layer?
RandomContrast()
RandomRotate()
RandomFlip()
RandomZoom()
RandomCrop()
67. How many training parameters are in the following model: 1) input
layer 28x28x3, 2) Conv2D(64,7), 3) Dropout(0.5), 4) Flatten() -
9408
9472
25088
32832
68. What is the output shape of the following model: 1) input layer 20x20x3,
2) Conv2D(7,14), 3) MaxPooling2D((2,2)), 4) Dropout(0.5), 5) Flatten() -
49 x 1
63 x 1
7 x 1
343 x 1
69. Which of the following networks may be used to colourize a binary image?
DeepDream
Pix2Pix
Neural Style Transfer
CycleGAN
70. Which of the following is the correct library to convert the predicted
value of MobileNetV2 to the correct label?
I. tf.keras.applications.mobilenet_v2.decode_predictions
II. tf.keras.applications.mobilenet_v2.preprocess_input
III. tf.keras.applications.mobilenetV2.decode_predictions
IV. tf.keras.applications.mobilenet_v2.MobileNetV2.decode_predictions
71. Which of the following commands is used to compile a function into a
callable TensorFlow graph in version 2.3.0?
tf.function()
tf.Graph()
tf.Variable()
tf.Constant()
72. Which of the following is NOT a correct tensor in TensorFlow?
Encode Tensor
String Tensor
Ragged Tensor
Sparse Tensor
73. Which of the following code returns the sum of the numbers in string x
separated by spaces, for example x='1 2 3 4 5' -
I. a=tf.strings.to_number(tf.strings.split(x,sep=' '))
add=tf.reduce_sum(a)
II. a=tf.strings.to_Int(tf.strings.split(x,sep=' '))
add=tf.sum(a)
III. a=tf.strings.ParseInt(tf.strings.split(x,sep=' '))
add=tf.reduce_sum(a)
IV. a=tf.strings.to_number(tf.strings.split(x,sep=' '))
add=tf.sum(a)
74. Datatype of ‘hist.history’ of the following object is:
hist=model.fit(train_data,train_label,epochs=15) -
Dictionary
Tensor
List
String
75. In Keras, the input_dim parameter is set on which layer of the neural
network?
Input layers
Hidden layers
Dropout layers
Output Layers
76. Which of the following is true about dropout?
Dropout is a regularization technique
Dropout does not reduce overfitting.
Dropout solves vanishing gradient problem.
All of the above.
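The shape and parameter questions above (Conv2D(32,7) on 32x32x3, the 28x28x3 model with Conv2D(64,7), and the 20x20x3 model with Conv2D(7,14) followed by 2x2 max pooling) reduce to a few lines of arithmetic. The sketch below assumes the Keras defaults of stride 1 with 'valid' padding for convolution and a pooling stride equal to the pool size; the helper names are hypothetical and no TensorFlow import is needed.

```python
def conv2d_output_shape(height, width, filters, kernel, stride=1):
    """Output (H, W, C) of a 'valid' convolution with a square kernel."""
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    return out_h, out_w, filters

def conv2d_param_count(in_channels, filters, kernel):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def max_pool_output_shape(height, width, channels, pool):
    """Output (H, W, C) of non-overlapping pooling; leftover rows/columns are dropped."""
    return height // pool, width // pool, channels

print(conv2d_output_shape(32, 32, filters=32, kernel=7))
print(conv2d_param_count(in_channels=3, filters=64, kernel=7))

h, w, c = conv2d_output_shape(20, 20, filters=7, kernel=14)
h, w, c = max_pool_output_shape(h, w, c, pool=2)
print(h * w * c)  # length of the vector produced by Flatten()
```

Dropout and Flatten add no trainable parameters, so the parameter count of the 28x28x3 model comes from the convolution alone.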
77. Which types of layers are used in the Discriminator?
All of the above
Conv2D
LSTM
Conv2DTranspose
78. In conditional GAN, a conditional parameter is added to ……
Generator
Both Discriminator and Generator
Discriminator
None of the above
79. Which of the following is false about LSTM?
I. LSTM is an extension for RNN which extends its memory.
II. LSTM solves the exploding gradients issue in RNN.
III. None of the above
IV. LSTM enables RNN to learn long term dependencies.
80. Which of the following is an application of RNN?
All of the above
NLP
Audio and video analysis
Stock market prediction
4/24/22, 2:20 PM [MCQ] Soft Computing - Last Moment Tuitions
[MCQ] Soft Computing
Fuzzy Set Theory
Module 1
1. What is Fuzzy Logic?
A. a method of reasoning that resembles human reasoning
B. a method of question that resembles human answer
C. a method of giving answer that resembles human answer.
D. None of the Above
Ans : A
Explanation: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning.
2. How many outputs does Fuzzy Logic produce?
A. 2
B. 3
C. 4
D. 5
https://lastmomenttuitions.com/mcqs/it-engineering/mcq-soft-computing/ 1/12
Ans : A
Explanation: The conventional logic block that a computer can understand takes precise
input and produces a definite output as TRUE or FALSE, which is equivalent to a human's YES
or NO.
3. Fuzzy Logic can be implemented in?
A. Hardware
B. software
C. Both A and B
Ans : C
Explanation: It can be implemented in hardware, software, or a combination of both.
4. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
A. Either 0 or 1, between 0 & 1
B. Between 0 & 1, either 0 or 1
C. Between 0 & 1, between 0 & 1
D. Either 0 or 1, either 0 or 1
Ans : A
Explanation: Refer to the definition of Fuzzy set and Crisp set.
5. How many main parts are there in Fuzzy Logic Systems Architecture?
A. 3
B. 4
C. 5
D. 6
Ans : B
Explanation: It has four main parts.
6. Each element of X is mapped to a value between 0 and 1. It is called _____.
A. membership value
B. degree of membership
C. membership value
D. Both A and B
Ans : D
Explanation: Each element of X is mapped to a value between 0 and 1. It is called
membership value or degree of membership.
7. How many levels of fuzzifier are there?
A. 4
B. 5
C. 6
D. 7
Ans : B
Explanation: There are 5 levels of fuzzifier.
8. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
A. AND
B. OR
C. NOT
D. All of the above
Ans : D
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually
defined as the minimum, maximum, and complement.
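The operator definitions in this explanation translate directly into code. A minimal sketch, assuming membership degrees are plain floats in [0, 1]; the function names and the sample degrees are made up for illustration.

```python
def fuzzy_and(a, b):
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)

def fuzzy_or(a, b):
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)

def fuzzy_not(a):
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a

hot, humid = 0.7, 0.4  # hypothetical membership degrees
print(fuzzy_and(hot, humid), fuzzy_or(hot, humid), fuzzy_not(hot))
```

With crisp inputs (0 or 1) these reduce to the ordinary Boolean operators.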
9. The room temperature is hot. Here the hot (use of linguistic variable is used) can be
represented by _______
A. Fuzzy Set
B. Crisp Set
C. Both A and B
Ans : A
10. What action to take when IF (temperature=Warm) AND (target=Warm) THEN?
A. Heat
B. No_Change
C. Cool
Ans : B
Explanation: IF (temperature=Warm) AND (target=Warm) THEN No_change
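11. What is the form of Fuzzy logic?
a) Two-valued logic
b) Crisp set logic
c) Many-valued logic
d) Binary set logic
Answer: c
Explanation: With fuzzy logic set membership is defined by certain value. Hence it could
have many values to be in the set.
12. Traditional set theory is also known as Crisp Set theory.
a) True
b) False
Answer: a
Explanation: In traditional set theory, set membership is fixed or exact: either the member is in
the set or not. There are only two crisp values, true or false. In case of fuzzy logic there are
many values. With weight say x the member is in the set.
13. The truth values of traditional set theory is ____________ and that of fuzzy set is
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
Answer: a
Explanation: Refer to the definition of Fuzzy set and Crisp set.
14. Fuzzy logic is extension of Crisp set with an extension of handling the concept of
Partial Truth.
a) True
b) False
Answer: a
Explanation: None.
15. The room temperature is hot. Here the hot (use of linguistic variable is used) can be
represented by _______
a) Fuzzy Set
b) Crisp Set
c) Fuzzy & Crisp Set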
16. The values of the set membership is represented by ___________
a) Discrete Set
b) Degree of truth
c) Probabilities
d) Both Degree of truth & Probabilities
Answer: b
17. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai.
a) True
b) False
Explanation: None.
18. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
Answer: d
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually
defined as the minimum, maximum, and complement.
19. There are also other operators, more linguistic in nature, called __________ that can be
applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
Explanation: None.
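Hedges modify the membership degree of a linguistic term. The question bank does not define them numerically, so the sketch below uses a common textbook convention as an assumption: "very" squares the membership (concentration) and "somewhat" takes its square root (dilation).

```python
import math

def very(mu):
    """Concentration hedge (assumed convention): mu squared."""
    return mu ** 2

def somewhat(mu):
    """Dilation hedge (assumed convention): square root of mu."""
    return math.sqrt(mu)

tall = 0.5  # hypothetical membership of a person in the set "tall"
print(very(tall), somewhat(tall))
```

Under this convention "very tall" is harder to satisfy than "tall", while "somewhat tall" is easier.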
20. Fuzzy logic is usually represented as ___________
b) IF-THEN rules
c) Both IF-THEN-ELSE rules & IF-THEN rules
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying
this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic
usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative
matrices.
a) True
b) False
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational
databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's
dissertation.
a) Fuzzy Logic
b) Probability
c) Entropy
Answer: d
Explanation: Entropy is amount of uncertainty involved in data. Represented by H(data).
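The explanation refers to entropy as H(data). One standard way to compute it is Shannon entropy over the observed symbol frequencies, H = -sum(p * log2(p)); the choice of base 2 (bits) and the sample data are assumptions made for this sketch.

```python
import math
from collections import Counter

def entropy(data):
    """Shannon entropy in bits of the empirical distribution of `data`."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(entropy(["a", "a", "b", "b"]))  # evenly split: maximal uncertainty for 2 symbols
print(entropy(["a", "a", "a", "a"]))  # a single symbol: no uncertainty
```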
23. ____________ are algorithms that learn from their more complex environments (hence
eco) to generalize, approximate and simplify solution logic.
b) Ecorithms
c) Fuzzy Set
Answer: c
Explanation: Local structure is usually associated with linear rather than exponential growth
in complexity.
24. Membership function defines the fuzziness in a fuzzy set irrespective of the elements
in the set, which are discrete or continuous.
a.) True
b.) False
Answer: A
25.The membership functions are generally represented in
a.) Tabular form
b) Graphical form
c) Mathematical form
d) Logical form
Ans: B
26.Membership function can be thought of as a technique to solve empirical problems
on the basis of
a) knowledge
b) example
c) learning
d) experience
Ans: D
27.Three main basic features involved in characterizing membership function are
a)Intuition, Inference, Rank Ordering
b)Fuzzy Algorithm, Neural network, Genetic Algorithm
c)Core, Support , Boundary
d)Weighted Average, center of Sums, Median
Ans : C
28. A fuzzy set whose membership function has a value of unity for at least one element x
in the universe is called
a) sub normal fuzzy sets
b) normal fuzzy set
c) convex fuzzy set
d) concave fuzzy set
Ans: B
29. In a Fuzzy set a prototypical element has a value
a) 1
b) 0
c) infinite
d) not defined
Ans: A
30. A fuzzy set wherein no membership function has its value equal to 1 is called
a) Normal fuzzy set
b) Sub normal fuzzy set
c) convex fuzzy set
d) non convex fuzzy set
Ans: B
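The normal / sub normal distinction in questions 28 to 30 can be checked mechanically for a discrete fuzzy set. The sketch below is illustrative only: the dictionary representation, the function names, and the sample sets are assumptions.

```python
# A discrete fuzzy set is represented here as {element: membership degree}.
# Representation, names and sample values are assumptions for illustration.

def height(fuzzy_set):
    """Largest membership degree attained by any element."""
    return max(fuzzy_set.values())


def is_normal(fuzzy_set):
    """Normal: at least one element has membership exactly 1 (a prototypical element)."""
    return height(fuzzy_set) == 1.0


tall = {"150cm": 0.0, "170cm": 0.5, "190cm": 1.0}   # has a prototypical element
warm = {"10C": 0.2, "20C": 0.8, "30C": 0.6}         # never reaches 1

print(is_normal(tall))  # True  -> normal fuzzy set
print(is_normal(warm))  # False -> sub normal fuzzy set
```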
31. A fuzzy set has a membership function whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly monotonically
increasing then strictly monotonically decreasing, with increasing values for elements in
the universe
a) Convex fuzzy set
b) Concave fuzzy set
c) Non Concave fuzzy set
d) Non Convex fuzzy set
Ans : A
32. The membership values of the membership function are not strictly monotonically
increasing or decreasing, or strictly monotonically increasing then decreasing.
a) Convex fuzzy set
b) non convex fuzzy set
c) normal fuzzy set
d) sub normal fuzzy set
Ans : B
33. Activation models are?
a) dynamic
b) static
c) deterministic
d) none of the mentioned
Answer: c
Explanation: Input/output patterns & the activation values may be considered as sample
functions of random process.
34. If xb(t) represents differentiation of state x(t), then a stochastic model can be
represented by?
a) xb(t)=deterministic model
b) xb(t)=deterministic model + noise component
c) xb(t)=deterministic model*noise component
d) none of the mentioned
Answer: b
Explanation: Noise is assumed to be additive in nature in stochastic models.
35. What is equilibrium in neural systems?
a) deviation in present state, when small perturbations occur
b) settlement of network, when small perturbations occur
c) change in state, when small perturbations occur
Answer: b
Explanation: Follows from the basic definition of equilibrium.
36. What is the condition in stochastic models, if xb(t) represents differentiation of state
x(t)?
a) xb(t)=0
b) xb(t)=1
c) xb(t)=n(t), where n is noise component
d) xb(t)=n(t)+1
Answer: c
Explanation: xb(t)=0 is condition for deterministic models, so option c is radical choice.
37. What is asynchronous update in a network?
a) update to all units is done at the same time
b) change in state of any one unit drives the whole network
c) change in state of any number of units drives the whole network
Answer: b
Explanation: In asynchronous update, a change in state of any one unit drives the whole
network.
38. Learning is a?
a) slow process
b) fast process
c) can be slow or fast in general
d) can’t say
Answer: a
Explanation: Learning is a slow process.
39. What are the requirements of learning laws?
a) convergence of weights
b) learning time should be as small as possible
c) learning should use only local weights
d) all of the mentioned
Answer: d
Explanation: These are all basic requirements of learning laws.
40. Memory decay affects what kind of memory?
a) short term memory in general
b) older memory in general
c) can be short term or older
Answer: a
Explanation: Memory decay affects short term memory rather than older memories.
41. What are the requirements of learning laws?
a) learning should be able to capture more & more patterns
b) learning should be able to grasp complex nonlinear mappings
c) convergence of weights
d) all of the mentioned
Answer: d
Explanation: These are all basic requirements of learning laws.
42. How is pattern information distributed?
a) it is distributed all across the weights
b) it is distributed in localised weights
c) it is distributed in certain proctive weights only
Answer: a
Explanation: pattern information is highly distributed all across the weights.
Final ML - Practice it
Machine learning (Lovely Professional University)
1 The most common issue when using ML is
Lack of skilled resources

Choice of appropriate algorithm
Poor quality of data
Inadequate infrastructure
2 An active learner
Both a and b
interacts with the environment at training time by posing queries
None of these
observes the information provided by the environment
3 Which of the following decision metrics is used in the CART algorithm?

Gini Index
Information Gain
Gain Ratio
None of these
4 The incorporation of prior knowledge that biases the learning mechanism is

known as
None of the above
Learning by memorization
Inductive Bias
Generalization
5 Which of the following sentences are true?

The best pruned tree is not the one that minimizes the number of encoding
In pre-pruning a tree is ‘pruned’ by halting its construction early
None of these
A pruning set of class labeled tuples is used to estimate cost
6 According to inductive bias in decision tree learning, which of the statements is
correct?
Avoid Overfitting
Shorter trees are preferred.
Avoid underfitting
Longer trees are preferred

7 Which of the following features is used to identify a well-posed learning problem?

None of these
Training Experience
Performance Measure
Class of task
8 Design of learning system consists of

Choice of Training Experience
All of these
Performance Measure
Choice of function approximation algorithm
9 Which of the following algorithm can handle continuous data for decision tree?
CART
ID3
C4.5
None of these
10 Empirical Risk Minimization with inductive bias method

avoids the overfitting problem
increases training error
increases testing error
avoids the underfitting problem
11 In decision tree learning, each branch corresponds to

an attribute
attribute value
Classification value
Regression Value
12 Choose the correct statements about C4.5

It deals with continuous data and missing data
Root node is one with maximum information gain.
Gini Index is used to find root node
Root node is one with maximum Gain ratio.
13 Choose the correct statements for avoiding overfitting in decision tree?

Pre-pruning
Post pruning
Optimistic pruning

Pessimistic pruning
14 A computer program is said to learn from experience E with respect to some

class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience
Supervised learning problem
Un Supervised learning problem
None of these
Well posed learning problem
15 Which of the following is a disadvantage of decision trees?

None of the above
Decision trees are prone to be over fit
Factor analysis
Decision trees are robust to outliers
16 The field of study that gives computers the capability to learn without being
explicitly programmed
Artificial Intelligence
Deep Learning
Machine Learning
None of these
17 In which approach, multiple classifiers are trained using bootstrap samples?

Decision Tree
Bagging
Boosting
Stacking
18 Features that need to be identified for a well-posed learning problem:

Performance measure
Class of tasks
None of these
Training experience
19 Consider a dataset with 6 instances where Outlook = Sunny. Of these 6 instances, 3
belong to the Yes decision and 3 belong to No. Compute the Gini index for
Outlook = Sunny.
1
0.48
0
0.5
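The Gini index for a node is 1 minus the sum of squared class proportions. A small sketch of that arithmetic, with a hypothetical helper name, applied to the 3 Yes / 3 No split described in the question:

```python
def gini_index(class_counts):
    """Gini impurity: 1 - sum(p_k^2), where p_k is the proportion of class k."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Outlook = Sunny: 3 instances of Yes, 3 instances of No.
print(gini_index([3, 3]))  # 0.5
```

A pure node (all instances in one class) gives 0, and an even two-class split gives the two-class maximum of 0.5.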

20 PAC stands for

Partition Approximately Correct
Probably Approximately Correct
Probability Applied Correctly
None of these
21 According to the brute-force MAP learning algorithm, which of the statements is
correct?
None of these
The probability of data D given hypothesis h is 1 if D is inconsistent and 0 otherwise.
The probability of data D given hypothesis h is 1 if D is consistent and 0 otherwise.
The probability of data D given hypothesis h is 0 if D is consistent and 1 otherwise.
22 Choose the correct statement:

E[error(Gibbs)]<2*E[error(bayesoptimal)]
E[error(Gibbs)]<=2* E[error(bayesoptimal)]
E[error(Gibbs)]>2*E[error(bayesoptimal)]
E[error(Gibbs)]=2 * E[error(bayesoptimal)]
23 Which algorithm is used to deal with missing data?

Maximum a posteriori hypothesis
Bayes optimal classifier
EM algorithm
Gibbs algorithm
24 Which of the following is least used in ranking loss?

0-1 ranking loss
Kendall tau loss
Normalized Discounted Cumulative loss
None of the mentioned
25 Which of the following is invalid according to the all-pairs algorithm if
class={red,green,blue,orange}, where red=1, green=2, blue=3, orange=4
red vs green
green vs orange
blue vs red
red vs orange
26 Mona receives emails, 18% of which are spam. The spam
filter is 93% reliable, i.e., 93% of the mails it marks as spam are actually spam
and 93% of spam mails are correctly labelled as spam. If a mail is marked as spam by
her spam filter, determine the probability that it is really spam.
84
50
39
63

27 Consider the following ranking: Target: 1, 2, 3, 4, 5, 6 Obtained: 1, 2, 4, 3, 5, 6

How many concordant and discordant pairs are available?
15, 1
14,2
15,0
14,1
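Counting concordant and discordant pairs can be done by comparing the relative order of every pair of items in the two rankings. The sketch below assumes there are no ties and uses hypothetical function names; the tau formula shown is the simple (C - D) / (C + D) form.

```python
from itertools import combinations


def concordant_discordant(target, obtained):
    """Count item pairs ordered the same way (concordant) or oppositely (discordant)."""
    pos_t = {item: i for i, item in enumerate(target)}
    pos_o = {item: i for i, item in enumerate(obtained)}
    concordant = discordant = 0
    for a, b in combinations(target, 2):
        # A positive product means both rankings agree on the order of a and b.
        if (pos_t[a] - pos_t[b]) * (pos_o[a] - pos_o[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant


def kendall_tau(target, obtained):
    """Kendall tau without ties: (C - D) / (C + D)."""
    c, d = concordant_discordant(target, obtained)
    return (c - d) / (c + d)


print(concordant_discordant([1, 2, 3, 4, 5, 6], [1, 2, 4, 3, 5, 6]))  # (14, 1)
print(kendall_tau(["A", "B", "C", "D", "E"], ["A", "B", "C", "D", "E"]))  # 1.0
```

Six items give 15 pairs in total; swapping the adjacent items 3 and 4 flips exactly one of them.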
28 Let us consider four classes = {red, green, blue, yellow}, where red is considered
as 2, green as 1, blue as 3 and yellow as 4. Consider the following outputs:
h12=+1, h13=-1, h14=-1, h23=+1, h24=-1, h34=-1. Based upon the above data, which
class will be predicted by the all-pairs algorithm?
red
green
blue
yellow
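One common way to read all-pairs (one-vs-one) outputs is majority voting. The sketch below assumes the convention that h_ij = +1 is a vote for class i and h_ij = -1 is a vote for class j; the question does not state the convention, so treat this as an assumption, along with the function name.

```python
from collections import Counter


def all_pairs_predict(pair_outputs, class_names):
    """Majority vote over pairwise classifiers.

    Assumed convention: h_ij = +1 votes for class i, h_ij = -1 votes for class j.
    """
    votes = Counter()
    for (i, j), h in pair_outputs.items():
        votes[i if h > 0 else j] += 1
    winner, _ = votes.most_common(1)[0]
    return class_names[winner]


class_names = {1: "green", 2: "red", 3: "blue", 4: "yellow"}
pair_outputs = {
    (1, 2): +1, (1, 3): -1, (1, 4): -1,
    (2, 3): +1, (2, 4): -1, (3, 4): -1,
}

print(all_pairs_predict(pair_outputs, class_names))  # yellow
```

Under that convention class 4 collects three votes (from h14, h24 and h34) while every other class collects one.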
29
94, 113, 92
110, 141, 100
119, 133, 118
none of the mentioned
30 Suppose that for some document xyz, the term frequency of word j is 50, the
document frequency is 2000, and the total number of documents is 10. What will
be the TF-IDF?
10,000
0.025

-115
-382
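A sketch of the arithmetic, assuming the plain definition TF-IDF = tf * log10(N / df) with no smoothing. Both the base-10 logarithm and the unsmoothed form are assumptions; libraries often use natural logarithms and smoothing terms.

```python
import math


def tf_idf(term_frequency, document_frequency, total_documents):
    """TF-IDF with idf = log10(N / df); base 10 and no smoothing are assumptions."""
    idf = math.log10(total_documents / document_frequency)
    return term_frequency * idf


# Values from the question: tf = 50, df = 2000, N = 10.
print(round(tf_idf(50, 2000, 10)))  # -115
```

The result is negative only because the question gives a document frequency larger than the total number of documents; with df no greater than N the idf term is never negative.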
31 Consider the following data, D: {10, 12, 12, 14, 14}. What will be the jackknife bias of
the mode?
12
0
13
14
32 Consider the following confusion matrix. What is the precision of the model?
0.94
0.75
0.4
0.57
33 Consider the following data, which shows 5 hypotheses for robot movement. For
each hypothesis, the probability given the training data (D) is given, as well as the
probability of F, L and R given the hypothesis (hi), where F stands for Forward, L
stands for Left and R stands for Right. Using the Bayes optimal classifier, find the
direction of movement of the robot.
Front
Left
All of the above

Right
34 Consider the two rankings: R1 = {A, B, C, D, E}, R2 = {A, B, C, D, E}. What will be
the tau coefficient?
0.75
1
0.5
0
35 Consider the following data, D: {1,3,3,5,7}, h=3. Using Parzen window
estimation, what will be the probability at X=4?
3/5
1/15
1/5
3/50
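A sketch of the one-dimensional Parzen window estimate, assuming the rectangular (hypercube) kernel that counts a sample when |x - xi| / h <= 1/2. The kernel choice and the function name are assumptions; a Gaussian kernel would give a different number.

```python
def parzen_density(x, data, h):
    """1-D Parzen estimate with a rectangular window of width h.

    p(x) = (1 / (n * h)) * #{xi : |x - xi| / h <= 1/2}
    """
    inside = sum(1 for xi in data if abs(x - xi) / h <= 0.5)
    return inside / (len(data) * h)


# Samples 3, 3 and 5 fall within 1.5 of x = 4, so the estimate is 3 / (5 * 3).
print(parzen_density(4, [1, 3, 3, 5, 7], 3))  # 0.2
```

In d dimensions the same idea divides by the window volume h**d instead of h.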
36 If the data is three-dimensional and h=4, what will be the volume of the region?
4
12
81
64
37 If value of k is very high in KNN algorithm, model is

Overfitting
Underfitting
None of these
Perfectfit
38 Sample complexity of non-uniform learnability depends upon:

Accuracy score, confidence score
Accuracy score, confidence score, hypothesis class
Accuracy score, confidence score, hypothesis class and distribution of data
Accuracy score, confidence score, distribution of data
39 Sample complexity of consistency learnability depends upon:

Accuracy score, confidence score, hypothesis class and distribution of data
Accuracy score, confidence score
Accuracy score, confidence score, distribution of data
Accuracy score, confidence score, hypothesis class
40 Choose the correct statement:

In the F-beta score, beta times more importance is given to recall.
In the F-beta score, beta^2 times more importance is given to precision.
In the F-beta score, beta times more importance is given to precision.
In the F-beta score, beta^2 times more importance is given to recall.
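For reference, the usual F-beta formula is (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below uses a hypothetical function name and made-up precision/recall values to show how beta shifts the balance between the two.

```python
def f_beta(precision, recall, beta):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With beta = 2, a model with high recall scores better than one with high precision.
high_recall = f_beta(precision=0.5, recall=1.0, beta=2)
high_precision = f_beta(precision=1.0, recall=0.5, beta=2)
print(high_recall > high_precision)  # True
```

With beta = 1 the formula reduces to the ordinary F1 score, the harmonic mean of precision and recall.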
41 Natarajan dimension is the generalization of

Consistency Learnability
Rademacher complexity
VC-dimension
Non-uniform learnability
42 VC dimension is used for

Infinite hypothesis and multiclass classification problem
Finite hypothesis and multiclass classification problem
infinite hypothesis and binary classification problem.
Finite hypothesis and binary classification problem

In Jackknife method, leave one out method is used.
If dataset is small, jackknife method increases computational complexity
If dataset is small, bootstrap method increases computational complexity.
In bootstrap method to estimate bias and variance, Leave one out method is used.
44 According to no free lunch theorem:

One classifier can be preferred over another without prior knowledge
All classifiers do not perform equally if performance is averaged over all objective functions
All classifiers perform equally if performance is averaged over all objective functions.
One feature can be preferred over another without prior knowledge
45 What is used to measure the uniform convergence?

Natarajan dimension
VC-dimension
Rademacher complexity
All of these
46 If, in the density estimation formula, the value of the volume is kept constant, then which
of the techniques is used?
One vs All
KNN
Parzen window
One vs One
47 A training set is called epsilon-representative if

For every h, Ls(h)-Ld(h)>=epsilon
For every h, Ls(h)-Ld(h)<=epsilon
For every h, |Ls(h)-Ld(h)|>=epsilon
For every h, |Ls(h)-Ld(h)|<=epsilon


In boosting samples are not taken with replacement
In bagging, samples are taken without replacement
In boosting samples are taken without replacement
In bagging samples are not taken without replacement
49 According to the ugly duckling theorem:

One feature cannot be preferred over another without prior knowledge
All classifiers perform equally if performance is averaged over all objective functions.
All classifiers do not perform equally if performance is averaged over all objective functions
One classifier can be preferred over another without prior knowledge
50 In the structural risk minimization, prior knowledge is added to model by

All of these.
Applying feature extraction
Adding appropriate weights
Selecting relevant features

To improve the non-uniform algorithm, SRM is used
If non-uniform algorithm fails, we can predict whether it is due to approximation error or
estimation error
If non-uniform algorithm fails, we can’t predict whether it is due to approximation error or
estimation error
If PAC model fails it is due to approximation error

As the hypothesis class decreases, approximation error decreases and estimation error
increases.
As the hypothesis class decreases, approximation error increases and estimation error
decreases.
As the hypothesis class increases, approximation error decreases and estimation error
increases.
As the hypothesis class increases, approximation error increases and estimation error
decreases.
53 Axis-aligned rectangles have the VC dimension

3
4
1
2

54 Choose the correct statement with respect to All pairs algorithm.

Number of instances in each training set is less than the number of instances in the original
training set.
Number of instances in each training set is equal to number of instances in original training
set.
Number of instances in each training set is greater than number of instances in original
training set.
None of these
55 If value of k is very small in KNN algorithm, model is

Overfitting
Perfectfit
Underfitting
None of these
1. What is Machine Learning (ML)?

A. The autonomous acquisition of knowledge through the use of manual
programs
B. The selective acquisition of knowledge through the use of computer
programs
C. The selective acquisition of knowledge through the use of manual
programs
D. The autonomous acquisition of knowledge through the use of computer
programs
Correct option is D
2. Father of Machine Learning (ML)

A. Geoffrey Chaucer
B. Geoffrey Hill
C. Geoffrey Everest Hinton
D. None of the above
Correct option is C
3. Which is FALSE regarding regression?

A. It may be used for interpretation
B. It is used for prediction
C. It discovers causal relationships
D. It relates inputs to outputs
Correct option is C
4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software

B. AI is a software that can emulate the human mind

C. ML is an alternate way of programming intelligent machines
D. All of the above
Correct option is D
5. The factors that affect the performance of a learner system do not include which of the
following?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A
6. In general, to have a well-defined learning problem, we must identify which of the
following
A. The class of tasks
B. The measure of performance to be improved
C. The source of experience
D. All of the above
Correct option is D
7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E
8. Different learning methods do not include which of the following?

A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B
9. In language understanding, the levels of knowledge do not include which of the following?

A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A
10. Designing a machine learning approach involves:-

A. Choosing the type of training experience
B. Choosing the target function to be learned
C. Choosing a representation for the target function

D. Choosing a function approximation algorithm

E. All of the above
Correct option is E
11. Concept learning infers a ________-valued function from training examples of
its input and output.
A. Decimal
B. Hexadecimal
C. Boolean
D. All of the above
Correct option is C
12. Which of the following is not a supervised learning?

A. Naive Bayesian
B. PCA
C. Linear Regression
D. Decision Tree
Correct option is B
13. What is Machine Learning?

(i) Artificial Intelligence
(ii) Deep Learning
(iii) Data Statistics
A. Only (i)
B. (i) and (ii)
C. All
D. None
Correct option is B
14. What kind of learning algorithm is used for “facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B
15. Which of the following is not type of learning?

A. Unsupervised Learning
B. Supervised Learning
C. Semi-unsupervised Learning
D. Reinforcement Learning
Correct option is C
16. Real-Time Decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following
A. Supervised Learning: Classification
B. Reinforcement Learning

C. Unsupervised Learning: Clustering

D. Unsupervised Learning: Regression
Correct option is B
17. Targeted Marketing, Recommender Systems, and Customer Segmentation are
applications in which of the following
B. Unsupervised Learning: Clustering
C. Unsupervised Learning: Regression
Correct option is B
18. Fraud Detection, Image Classification, Diagnostic, and Customer Retention are
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
Correct option is B
19. Which of the following is not a symbolic function among the various function
representations of Machine Learning?
A. Rules in propositional logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function among the various function
representations of Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C
21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only
A. Negative
B. Positive
C. Negative or Positive
Correct option is B
22. FIND-S algorithm ignores

A. Negative
B. Positive

C. Both
Correct option is A
23. The Candidate-Elimination Algorithm represents the ________.

A. Solution Space
B. Version Space
C. Elimination Space
D. All of the above
Correct option is B
24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B
26. A drawback of the FIND-S is that it assumes the consistency within the training
set
A. True
B. False
Correct option is A
27. What strategies can help reduce overfitting in decision trees?

(i) Enforce a maximum depth for the tree
(ii) Enforce a minimum number of samples in leaf nodes
(iii) Pruning
(iv) Make sure each leaf node is one pure class
A. All
B. (i), (ii) and (iii)
C. (i), (iii), (iv)
D. None
Correct option is B
28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression

D. Classification
Correct option is B
29. To find the minimum or the maximum of a function, we set the gradient to zero
because which of the following
A. Depends on the type of problem
B. The value of the gradient at extrema of a function is always zero
C. Both (A) and (B)
D. None of these
Correct option is B
30. Which of the following is a disadvantage of decision trees?

A. Decision trees are prone to be overfit
B. Decision trees are robust to outliers
C. Factor analysis
Correct option is A
31. What is perceptron?

A. A single layer feed-forward neural network with pre-processing
B. A neural network that contains feedback
C. A double layer auto-associative neural network
D. An auto-associative neural network
Correct option is A

• The training time depends on the size of the
• Neural networks can be simulated on a conventional
• Artificial neurons are identical in operation to biological
A. All
B. Only (ii)
C. (i) and (ii)
D. None
Correct option is C
• They have the ability to learn by
• They are more fault
• They are more suited for real time operation due to their high "computational"
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D
34. What is Neuro software?

A. It is software used by Neurosurgeon

B. Designed to aid experts in real world
C. It is powerful and easy neural network
D. A software used to analyze neurons
Correct option is C

A. Each node computes its weighted input
B. Node could be in excited state or non-excited state
C. It has set of nodes and connections
D. All of the above
Correct option is D
36. What is the objective of backpropagation algorithm?

A. To develop learning algorithm for multilayer feedforward neural network, so that
network can be trained to capture the mapping implicitly
B. To develop learning algorithm for multilayer feedforward neural network
C. To develop learning algorithm for single layer feedforward neural network
D. All of the above
Correct option is A

Single layer associative neural networks do not have the ability to:-
• Perform pattern recognition

• Find the parity of a picture
• Determine whether two or more shapes in a picture are connected or not
A. (ii) and (iii)
B. Only (ii)
C. All
D. None
Correct option is A
38. The backpropagation law is also known as generalized delta rule

A. True
B. False
Correct option is A

• On average, neural networks have higher computational rates than conventional
computers.
• Neural networks learn by
• Neural networks mimic the way the human brain
A. All
B. (ii) and (iii)
C. (i), (ii) and (iii)
D. None

Correct option is A
39. What is true regarding backpropagation rule?

A. Error in output is propagated backwards only to determine weight updates
B. There is no feedback of signal at any stage
C. It is also called generalized delta rule
D. All of the above
Correct option is D
40. There is feedback in final stage of backpropagation

A. True
B. False
Correct option is B
41. An auto-associative network is

A. A neural network that has only one loop
C. A single layer feed-forward neural network with pre-processing
D. A neural network that contains no loops
Correct option is B
42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
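The arithmetic behind question 42 is a weighted sum followed by a linear transfer function. A minimal sketch, with a hypothetical function name and no bias term (the question mentions none):

```python
def linear_neuron_output(weights, inputs, k):
    """Output of a neuron with linear transfer function f(s) = k * s, no bias term."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


# Weighted sum: 1*4 + 4*8 + 3*5 = 51; output: 3 * 51 = 153.
print(linear_neuron_output([1, 4, 3], [4, 8, 5], 3))  # 153
```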
43. Which of the following is true regarding backpropagation rule?

A. Hidden layers' output is not at all important; they are only meant for supporting
input and output layers
B. Actual output is determined by computing the outputs of units for each hidden
layer
C. It is a feedback neural network
Correct option is B

A. It is another name given to the curvy function in the perceptron
B. It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
C. It is another name given to the curvy function in the perceptron
Correct option is B

45. The general limitations of back propagation rule is/are

A. Scaling
B. Slow convergence
C. Local minima problem
D. All of the above
Correct option is D
46. What is the meaning of generalized in statement “backpropagation is a

generalized delta rule” ?
A. Because delta is applied to only input and output layers, thus making it more
simple and generalized
B. It has no significance
C. Because delta rule can be extended to hidden layer units
Correct option is C
47. Neural Networks are complex ________ functions with many parameters

A. Linear
B. Non linear
C. Discrete
D. Exponential
Correct option is B
48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D
49. Backpropagation learning is based on the gradient descent along error surface.
A. True
B. False
Correct option is A
50. In backpropagation rule, how to stop the learning process?

A. No heuristic criteria exist
B. On basis of average gradient value
C. There is convergence involved
D. None of these
Correct option is B
51. Applications of NN (Neural Network)

A. Risk management
B. Data validation
C. Sales forecasting
D. All of the above

Correct option is D
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A
53. Decision Tree is a display of an Algorithm?

A. True
B. False
Correct option is A
54. Which of the following is/are the decision tree nodes?

A. End Nodes
B. Decision Nodes
C. Chance Nodes
D. All of the above
Correct option is D
55. End Nodes are represented by which of the following

A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is B
56. Decision Nodes are represented by which of the following

B. Triangles
C. Circles
D. Squares
Correct option is D
57. Chance Nodes are represented by which of the following

B. Triangles
C. Circles
D. Squares
Correct option is C
58. Advantage of Decision Trees

A. Possible Scenarios can be added
B. Use a white box model, if given result is provided by a model
C. Worst, best and expected values can be determined for different scenarios

D. All of the above

Correct option is D
59. ________ terms are required for building a Bayes model.

A. 1
B. 2
C. 3
D. 4
Correct option is C
60. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependent & Dependent
D. Dependent
Correct option is A
61. Why it is needed to make probabilistic systems feasible in the world?

A. Feasibility
B. Reliability
C. Crucial robustness
Correct option is C
62. Bayes rule can be used for:-

A. Solving queries
B. Increasing complexity
C. Answering probabilistic query
D. Decreasing complexity
Correct option is C
63. ________ provides a way and means of weighing up the desirability of goals and the
likelihood of achieving
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A
64. Which of the following provided by the Bayesian Network?

A. Complete description of the problem
B. Partial description of the domain
C. Complete description of the domain
D. All of the above
Correct option is C

65. Probability provides a way of summarizing the ________ that comes from our laziness and
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B
66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C
67. Causal chain (For example, Smoking cause cancer) gives rise to:-
A. Conditionally Independence
B. Conditionally Dependence
C. Both
Correct option is A
68. The bayesian network can be used to answer any query by using:-
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B
69. Bayesian networks allow compact specification of:-

A. Joint probability distributions
B. Belief
C. Propositional logic statements
D. All of the above
Correct option is A
70. The compactness of the bayesian network can be described by

A. Fully structured
B. Locally structured
C. Partially structured
D. All of the above
Correct option is B
71. The Expectation-Maximization Algorithm has been used to identify conserved

domains in unaligned proteins only. State True or False.
A. True
B. False

Correct option is B
72. Which of the following is correct about the Naive Bayes?

A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. Both
D. All of the above
Correct option is C
73. Which of the following is false regarding EM Algorithm?

A. The alignment provides an estimate of the base or amino acid composition of
each column in the site
B. The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the
sequences
C. The row-by-column composition of the site already available is used to estimate
the probability
Correct option is C
74. Naïve Bayes Algorithm is a ______ learning algorithm.
A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
Correct option is A
75. The EM algorithm includes two repeated steps; step 2 is ______.
A. The normalization
B. The maximization step
C. The minimization step
Correct option is B
76. Examples of Naïve Bayes Algorithm is/are

A. Spam filtration
B. Sentiment analysis
C. Classifying articles
D. All of the above
Correct option is D
77. In the intermediate steps of the “EM Algorithm”, the number of each base in each
column is determined and then converted to fractions.
A. True
B. False
Correct option is A

78. Naïve Bayes algorithm is based on ______ and is used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
Correct option is A
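
A minimal sketch of how Bayes' theorem and the feature-independence assumption combine in a Naïve Bayes text classifier. The training messages, the function names, and the add-one (Laplace) smoothing are illustrative assumptions, not part of the question bank.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (list_of_words, label). Returns counts needed for prediction."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)          # prior
        total_words = sum(word_counts[label].values())
        for w in words:
            # Words are treated as independent given the class (the "naive" part);
            # add-one smoothing avoids zero probabilities for unseen words.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data for illustration only.
TRAIN = [
    (["win", "money", "now"], "spam"),
    (["free", "money", "offer"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "meeting", "tomorrow"], "ham"),
]
MODEL = train_naive_bayes(TRAIN)
print(predict(MODEL, ["free", "money"]))
```

Working in log space keeps the product of many small probabilities from underflowing.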
79. Types of Naïve Bayes Model:

A. Gaussian
B. Multinomial
C. Bernoulli
D. All of the above
Correct option is D
80. Disadvantages of Naïve Bayes Classifier:

A. Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features
B. It performs well in Multi-class predictions as compared to the other algorithms
C. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets
D. It is the most popular choice for text classification problems.
Correct option is A
81. The benefit of Naïve Bayes:-

A. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets
B. It is the most popular choice for text classification problems.
C. It can be used for Binary as well as Multi-class classification
D. All of the above
Correct option is D
82. In which of the following types of sampling the information is carried out under
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B
83. Full form of MDL?

A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these
Correct option is A
84. For the analysis of ML algorithms, we need

A. Computational learning theory
B. Statistical learning theory

C. Both A & B
D. None of these
Correct option is C
85. PAC stands for
A. Probably Approximately Correct
B. Probably Approx Correct
C. Probably Approximate Computation
D. Probably Approx Computation
Correct option is A
86. The ______ of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A
87. Statement: True error defined over entire instance space, not just training data
A. True
B. False
Correct option is A
88. Which areas does computational learning theory (CLT) comprise?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. All of these
Correct option is D
88. Which area of CLT tells us how many examples we need to find a good hypothesis?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A
89. Which area of CLT tells us how much computational power we need to find a good
hypothesis?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B

90. Which area of CLT tells us how many mistakes we will make before finding a good
hypothesis?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C
91. (For question no. 9 and 10) Can we say that concept described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A
92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A
93. The VC dimension of hypothesis space H1 is larger than the VC dimension of
hypothesis space H2. Which of the following can be inferred from this?
A. The number of examples required for learning a hypothesis in H1 is larger than
the number of examples required for H2
B. The number of examples required for learning a hypothesis in H1 is smaller than
the number of examples required for H2
C. No relation to number of samples required for PAC learning.
Correct option is A
94. For a particular learning task, if the requirement of error parameter changes from
0.1 to 0.01. How many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
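
The "10 times" answer follows from the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)): the bound scales with 1/epsilon. A small sketch, assuming conjunctions of n = 10 Boolean literals (|H| = 3^n) and delta = 0.05; the function name and the chosen values are illustrative.

```python
import math

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

n = 10          # number of Boolean attributes (assumed for illustration)
H = 3 ** n      # each attribute appears positively, negated, or not at all

m_loose = pac_sample_bound(H, epsilon=0.1, delta=0.05)
m_tight = pac_sample_bound(H, epsilon=0.01, delta=0.05)
print(m_loose, m_tight, m_tight / m_loose)
```

Tightening epsilon from 0.1 to 0.01 multiplies the bound by roughly ten, while |H| and delta only enter through logarithms.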
95. Computational complexity of classes of learning problems depends on which of
the following?
A. The size or complexity of the hypothesis space considered by learner
B. The accuracy to which the target concept must be approximated
C. The probability that the learner will output a successful hypothesis
D. All of these
Correct option is D
96. The instance-based learner is a
A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A
97. When to consider nearest neighbour algorithms?

A. Instances map to points in R^n
B. Not more than 20 attributes per instance
C. Lots of training data
D. None of these
E. A, B & C
Correct option is E
98. What are the advantages of the nearest neighbour algorithm?
A. Training is very fast
B. Can learn complex target functions
C. Don't lose information
D. All of these
Correct option is D
99. What are the difficulties with k-nearest neighbour algo?

A. Calculate the distance of the test case from all training cases
B. Curse of dimensionality
C. Both A & B
D. None of these
Correct option is C
100. What if the target function is real valued in kNN algo?

A. Calculate the mean of the k nearest neighbours
B. Calculate the SD of the k nearest neighbour
C. None of these
Correct option is A
101. What is/are true about Distance-weighted KNN?

A. The weight of the neighbour is considered
B. The distance of the neighbour is considered
C. Both A & B
D. None of these
Correct option is C
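
The last two answers can be sketched together: for a real-valued target, plain k-NN averages the k nearest values, while the distance-weighted variant lets closer neighbours count more. The 1/d^2 weighting, the function names, and the toy data are assumptions for illustration (requires Python 3.8+ for `math.dist`).

```python
import math

def k_nearest(train, query, k):
    """train: list of (point, value). Returns the k pairs closest to query."""
    return sorted(train, key=lambda pv: math.dist(pv[0], query))[:k]

def knn_regress(train, query, k):
    """Plain k-NN regression: mean of the k nearest target values."""
    neighbours = k_nearest(train, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)

def weighted_knn_regress(train, query, k):
    """Distance-weighted k-NN: each neighbour is weighted by 1 / distance^2."""
    num = den = 0.0
    for point, value in k_nearest(train, query, k):
        d = math.dist(point, query)
        if d == 0:
            return value  # exact match: use its value directly
        w = 1 / d ** 2
        num += w * value
        den += w
    return num / den

# Hypothetical training pairs (x, x^2).
TRAIN = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]
print(knn_regress(TRAIN, (1.2,), 2))
print(weighted_knn_regress(TRAIN, (1.2,), 2))
```

For the query 1.2 the weighted estimate sits much closer to the value at x = 1.0, because that neighbour is four times nearer than x = 2.0.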
102. What is/are advantage(s) of Distance-weighted k-NN over k-NN?

A. Robust to noisy training data
B. Quite effective when a sufficiently large set of training data is provided
C. Both A & B
D. None of these
Correct option is C

103. What is/are advantage(s) of Locally Weighted Regression?

A. Pointwise approximation of complex target function
B. Earlier data has no influence on the new ones
C. Both A & B
D. None of these
Correct option is C
104. The quality of the result depends on (LWR)

A. Choice of the function
B. Choice of the kernel function K
C. Choice of the hypothesis space H
D. All of these
Correct option is D
105. How many types of layers are there in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and Output layer
106. The neurons in the hidden layer contain Gaussian transfer functions
whose outputs are ______ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
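
A small sketch of the Gaussian transfer function used by a hidden RBF neuron. Strictly speaking the output decays exponentially with squared distance rather than being inversely proportional in the algebraic sense; the point of the answer is that output falls as distance from the centre grows. The `width` parameter and function name are illustrative assumptions.

```python
import math

def gaussian_rbf(x, centre, width=1.0):
    """Output of a Gaussian RBF neuron: 1 at the centre, decaying with distance."""
    distance = math.dist(x, centre)
    return math.exp(-(distance ** 2) / (2 * width ** 2))

for point in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]:
    print(point, gaussian_rbf(point, (0.0, 0.0)))
```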
107. PNN/GRNN networks have one neuron for each point in the training file,
While RBF network have a variable number of neurons that is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A
108. Which network is more accurate when the size of the training set is
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A
109. What is/are true about RBF network?

A. A kind of supervised learning
B. Design of NN as curve fitting problem
C. Use of multidimensional surface to interpolate the test data
D. All of these
Correct option is D
110. Application of CBR

A. Design
B. Planning
C. Diagnosis
D. All of these
Correct option is D
111. What is/are advantages of CBR?

A. A local approx. is found for each test case
B. Knowledge is in a form understandable to human
C. Fast to train
D. All of these
Correct option is D
112. In the k-NN algorithm, given a set of training examples and a value of k < size of training set
(n), the algorithm predicts the class of a test example to be the
A. Least frequent class among the classes of the k closest training examples
B. Most frequent class among the classes of the k closest training examples
C. Class of the closest point
D. Most frequent class among the classes of the k farthest training examples.
Correct option is B
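
The majority-vote rule in the answer above can be sketched in a few lines. The function name and the toy points are illustrative assumptions; ties are broken by whichever class `Counter.most_common` lists first.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Return the most frequent label among the k training points closest to query."""
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster data.
TRAIN = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((6, 7), "blue"), ((7, 6), "blue")]
print(knn_classify(TRAIN, (2, 2), 3))
print(knn_classify(TRAIN, (6, 5), 3))
```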
113. Which of the following statements is true about PCA?

(i) We must standardize the data before applying PCA
(ii) We should select the principal components which explain the highest variance
(iii) We should select the principal components which explain the lowest variance
(iv) We can use PCA for visualizing the data in lower dimensions
A. (i), (ii) and (iv).
B. (ii) and (iv)
C. (iii) and (iv)
D. (i) and (iii)
Correct option is A
114. Genetic algorithm is a

A. Search technique used in computing to find true or approximate solution to
optimization and search problem
B. Sorting technique used in computing to find true or approximate solution to
optimization and sort problem
C. Both A & B
D. None of these
Correct option is A

115. GA techniques are inspired by

A. Evolutionary
B. Cytology
C. Anatomy
D. Ecology
Correct option is A
116. When would the genetic algorithm terminate?

A. Maximum number of generations has been produced
B. Satisfactory fitness level has been reached for the population
C. Both A & B
D. None of these
Correct option is C
117. The algorithm operates by iteratively updating a pool of hypotheses,
called the
A. Population
B. Fitness
C. None of these
Correct option is A
118. What is the correct representation of GA?

A. GA(Fitness, Fitness_threshold, p)
B. GA(Fitness, Fitness_threshold, p, r )
C. GA(Fitness, Fitness_threshold, p, r, m)
D. GA(Fitness, Fitness_threshold)
Correct option is C
119. Genetic operators includes

A. Crossover
B. Mutation
C. Both A & B
D. None of these
Correct option is C
120. Produces two new offspring from two parent string by copying selected
bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
121. Each schema represents the set of bit strings containing the indicated
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s
Correct option is D
122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0010, 0010
C. 0100, 0110
D. 0100, 0010
Correct option is A
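
A short sketch of schema matching, where "*" is read as a "don't care" position. The helper names `matches` and `expand` are illustrative.

```python
from itertools import product

def matches(schema, bits):
    """True if the bit string agrees with the schema at every non-'*' position."""
    return len(schema) == len(bits) and all(s in ("*", b) for s, b in zip(schema, bits))

def expand(schema):
    """List every bit string represented by the schema."""
    choices = [("0", "1") if s == "*" else (s,) for s in schema]
    return ["".join(combo) for combo in product(*choices)]

print(expand("0*10"))
```

A schema with j wildcard positions represents 2^j bit strings, so 0*10 covers exactly two.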
123. Correct(h) is the percent of all training examples correctly classified by
hypothesis h. The fitness function is then equal to
A. Fitness(h) = (correct(h))^2
B. Fitness(h) = (correct(h))^3
C. Fitness(h) = (correct(h))
D. Fitness(h) = (correct(h))^4
Correct option is A
124. Statement: In Genetic Programming, individuals in the evolving population
are computer programs rather than bit strings.
A. True
B. False
Correct option is A
125. In ______ evolution, evolution over many generations was directly influenced by
the experiences of individual organisms during their lifetime
A. Baldwin
B. Lamarckian
C. Bayes
D. None of these
Correct option is B
126. Search through the hypothesis space cannot be characterized. Why?

A. Hypotheses are created by crossover and mutation operators that allow radical
changes between successive generations
B. Hypotheses are not created by crossover and mutation
C. None of these
Correct option is A
127. ILP stand for

A. Inductive Logical programming
B. Inductive Logic Programming
C. Inductive Logical Program
D. Inductive Logic Program
Correct option is B
128. What is/are the requirement for the Learn-One-Rule method?

A. Input, accepts a set of +ve and -ve training examples.
B. Output, delivers a single rule that covers many +ve examples and few -ve.
C. Output rule has a high accuracy but not necessarily a high coverage.
D. A & B
E. A, B & C
Correct option is E
129. A ______ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A
130. Ground literal is a literal that

A. Contains only variables
B. Does not contain any functions
C. Does not contain any variables
D. Contains only functions
Correct option is C
131. ______ emphasizes learning feedback that evaluates the learner’s
performance without providing standards of correctness in the form of
behavioural targets.
A. Reinforcement learning
C. None of these
Correct option is A
132. Features of Reinforcement learning

A. Set of problem rather than set of techniques
B. RL is training by rewards and punishments
C. RL is learning from trial and error with the world
D. All of these
Correct option is D
133. Which type of feedback used by RL?

A. Purely Instructive feedback
B. Purely Evaluative feedback
C. Both A & B
D. None of these
Correct option is B
134. What is/are the problem solving methods for RL?

A. Dynamic programming
B. Monte Carlo Methods
C. Temporal-difference learning
D. All of these

Correct option is D
135. The FIND-S Algorithm
A. Starts from the most specific hypothesis
B. Considers negative examples
C. Considers both negative and positive examples
D. None of these
Correct option is A
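
A minimal sketch of FIND-S for conjunctive hypotheses over discrete attributes: it starts from the most specific hypothesis, ignores negative examples, and generalizes an attribute to "?" only when a positive example disagrees with it. The weather-style attribute values and the function name are illustrative assumptions.

```python
def find_s(examples):
    """examples: list of (attribute_tuple, is_positive). Returns the learned hypothesis."""
    hypothesis = None  # None stands for the most specific hypothesis (matches nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S never looks at negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Keep constraints the example satisfies; relax the rest to "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical training data for illustration.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(EXAMPLES))
```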
136. The hypothesis space has a general-to-specific ordering of hypotheses, and the search can
be efficiently organized by taking advantage of a naturally occurring structure over the
hypothesis space
A. TRUE
B. FALSE
Correct option is A
137. The Version space is:
A. The subset of all hypotheses consistent with the training examples is called the
version space with respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept
B. The version space consists of only specific hypotheses
C. None of these
Correct option is A
138. The Candidate-Elimination Algorithm

A. The key idea in the Candidate-Elimination algorithm is to output a
description of the set of all hypotheses consistent with the training examples
B. Candidate-Elimination algorithm computes the description of this set
without explicitly enumerating all of its members
C. This is accomplished by using the more-general-than partial ordering
and maintaining a compact representation of the set of consistent hypotheses
D. All of these
Correct option is D
139. Concept learning is basically acquiring the definition of a general category
from given sample positive and negative training examples of the category
A. TRUE
B. FALSE
Correct option is A
140. The hypothesis h1 is more-general-than hypothesis h2 (h1 > h2) if and
only if h1≥h2 is true and h2≥h1 is false. We also say h2 is more-specific-than h1
A. The statement is true
B. The statement is false
C. We cannot say
D. None of these
Correct option is A
141. The List-Then-Eliminate Algorithm

A. The List-Then-Eliminate algorithm initializes the version space to
contain all hypotheses in H, then eliminates any hypothesis found
inconsistent with any training example
B. The List-Then-Eliminate algorithm does not initialize the version space
C. None of these
Correct option is A
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A
143. Which modifies the performance element so that it makes better
decisions?
A. Performance element
B. Changing element
C. Learning element
D. None of the mentioned
Correct option is C
144. The assumption that any hypothesis found to approximate the target function
well over a sufficiently large set of training examples will also approximate the
target function well over other unobserved examples is called:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these
Correct option is A
145. Feature of ANN in which ANN creates its own organization or

representation of information it receives during learning time is
A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning
Correct option is B

A. Single test
B. Two test
C. Sequence of test

D. No test
Correct option is C

A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
Correct option is C
148. Tree/Rule based classification algorithms generate which rule to perform

the classification.
A. if-then.
B. then
C. do
D. Answer
Correct option is A
149. What is Gini Index?

A. It is a type of index structure
B. It is a measure of purity
C. None of the options
Correct option is B
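
In decision-tree learning the Gini index is an impurity measure, 1 minus the sum of squared class proportions: 0 for a pure node and larger for mixed nodes. A minimal sketch, with an illustrative function name and labels:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure node
print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly mixed two-class node
```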
150. What is not a RNN in machine learning?

A. One output to many inputs
B. Many inputs to a single output
C. RNNs for nonsequential input
D. Many inputs to many outputs
Correct option is A
151. Which of the following sentences are correct in reference to Information

gain?
A. It is biased towards multi-valued attributes
B. ID3 makes use of information gain
C. The approach used by ID3 is greedy
D. All of these
Correct option is D
152. A Neural Network can answer

A. For Loop questions
B. what-if questions
C. If-Then-Else analysis questions
D. None of these
Correct option is B
153. Artificial neural networks are used for
A. Pattern Recognition
B. Classification
C. Clustering
D. All of these
Correct option is D
154. Which of the following are the advantage/s of Decision Trees?

B. Use a white box model, If given result is provided by a model
D. All of the mentioned
Correct option is D
155. What is the mathematical likelihood that something will occur?

A. Classification
B. Probability
C. Naïve Bayes Classifier
D. None of the other
Correct option is B
156. What does the Bayesian network provide?
A. Complete description of the domain
B. Partial description of the domain
C. Complete description of the problem
D. None of the mentioned
Correct option is A

A. Solving queries
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D
158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B
159. What is needed to make probabilistic systems feasible in the world?

A. Reliability
B. Crucial robustness
C. Feasibility
Correct option is B

160. It was shown that the Naive Bayesian method

A. Can be much more accurate than the optimal Bayesian method
B. Is always worse off than the optimal Bayesian method
C. Can be almost optimal only when attributes are independent
D. Can be almost optimal when some attributes are dependent
Correct option is C
161. What is the consequence between a node and its predecessors while
creating Bayesian network?
A. Functionally dependent
B. Dependent
C. Conditionally independent
D. Both conditionally dependent & dependent
Correct option is C
162. How can the compactness of the Bayesian network be described?
A. Locally structured
B. Fully structured
C. Partial structure
Correct option is A
163. How can the entries in the full joint probability distribution be calculated?
A. Using variables
B. Using information
C. Both Using variables & information
Correct option is B
164. How can the Bayesian network be used to answer any query?
Correct option is B
165. Sample Complexity is

A. The sample complexity is the number of training-samples that we
need to supply to the algorithm, so that the function returned by the
algorithm is within an arbitrarily small error of the best possible
function, with probability arbitrarily close to 1
B. How many training examples are needed for learner to converge to a
successful hypothesis.
C. All of these
Correct option is C
166. PAC stands for
A. Probably Approximately Correct
B. Probability Applied Correctly
C. Partition Approximately Correct
Correct option is A
167. Which of the following will be true about k in k-NN in terms of variance
A. When you increase k, the variance will increase
B. When you decrease k, the variance will increase
C. Can't say
D. None of these
Correct option is B
168. Which of the following option is true about k-NN algorithm?

A. It can be used for classification
B. It can be used for regression
C. It can be used in both classification and regression
Correct option is C
169. In k-NN it is very likely to overfit due to the curse of dimensionality.
Which of the following options would you consider to handle this problem?
1) Dimensionality reduction 2) Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C
170. When you find noise in data which of the following option would you
consider in k- NN
A. I will increase the value of k
B. I will decrease the value of k
C. Noise can not be dependent on value of k
D. None of these
Correct option is A
171. Which of the following will be true about k in k-NN in terms of Bias?
A. When you increase k, the bias will increase
B. When you decrease k, the bias will increase
C. Can't say
D. None of these
Correct option is A
172. What is used to mitigate overfitting in a test set?

A. Overfitting set
B. Training set
C. Validation dataset
D. Evaluation set

Correct option is C
173. A radial basis function is a

A. Activation function
B. Weight
C. Learning rate
D. none
Correct option is A
174. Mistake Bound is

A. How many training examples are needed for learner to converge to a successful
hypothesis.
B. How much computational effort is needed for a learner to converge to a
successful hypothesis
C. How many training examples will the learner misclassify before converging to a
successful hypothesis
D. None of these
Correct option is C
175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D
176. Adding more basis functions in a linear model… (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A
177. Which of these are types of crossover

A. Single point
B. Two point
C. Uniform
D. All of these
Correct option is D
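
The three crossover types can be sketched on bit strings as follows. The function names, the way crossover positions are passed in, and the parent strings are illustrative assumptions; in a real GA the positions and mask would be chosen at random.

```python
def single_point(p1, p2, point):
    """Swap the tails of the two parents after one crossover point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, start, end):
    """Swap the middle segment [start, end) between the two parents."""
    return (p1[:start] + p2[start:end] + p1[end:],
            p2[:start] + p1[start:end] + p2[end:])

def uniform(p1, p2, mask):
    """Per-bit choice: where mask is '1' the first child copies p1, otherwise p2."""
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

P1, P2 = "11101001000", "00001010101"
print(single_point(P1, P2, 5))
print(two_point(P1, P2, 2, 7))
print(uniform(P1, P2, "10011010011"))
```

Each operator returns two offspring, and every bit position of each child is inherited from exactly one parent.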
178. A feature F1 can take certain value: A, B, C, D, E, & F and represents grade
of students from a college. Which of the following statement is true in following
case?
A. Feature F1 is an example of nominal
B. Feature F1 is an example of ordinal
C. It doesn't belong to any of the above categories.
Correct option is B
179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is A
180. Genetic algorithms are heuristic methods that do not guarantee an

optimal solution to a problem
A. TRUE
B. FALSE
Correct option is A
181. Which of the following statements about regularization is not correct?

A. Using too large a value of lambda can cause your hypothesis to
underfit the data
B. Using too large a value of lambda can cause your hypothesis to
overfit the data
C. Using a very large value of lambda cannot hurt the performance of
your hypothesis.
Correct option is A
182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A
183. Genetic Algorithm are a part of

A. Evolutionary Computing
B. inspired by Darwin’s theory about evolution – “survival of the fittest”
C. are adaptive heuristic search algorithm based on the evolutionary
ideas of natural selection and genetics
D. All of the above
Correct option is D
184. Genetic algorithms belong to the family of methods in the

A. artificial intelligence area

B. optimization
C. complete enumeration family of methods
D. Non-computer based (human) solutions area
Correct option is A
185. For a two player chess game, the environment encompasses the
opponent
A. True
B. False
Correct option is A
186. Which among the following is not a necessary feature of a reinforcement

learning solution to a learning problem?
A. exploration versus exploitation dilemma
B. trial and error approach to learning
C. learning based on rewards
D. representation of the problem as a Markov Decision Process
Correct option is D
187. Which of the following sentence is FALSE regarding reinforcement

learning
A. It relates inputs to outputs
B. It is used for prediction
C. It may be used for interpretation
D. It discovers causal relationships.
Correct option is D
188. The EM algorithm is guaranteed to never decrease the value of its

objective function on any iteration
A. TRUE
B. FALSE
Correct option is A
189. Consider the following modification to the tic-tac-toe game: at the end of
game, a coin is tossed and the agent wins if a head appears regardless of
whatever has happened in the game. Can reinforcement learning be used to learn
an optimal policy of playing Tic-Tac-Toe in this case?
A. Yes
B. No
Correct option is B
190. Out of the two repeated steps in EM algorithm, the step 2 is _
A. the maximization step

B. the minimization step
C. the optimization step
D. the normalization step

Correct option is A
191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
Correct option is A
192. A chess agent trained by using Reinforcement Learning can be trained by
playing against a copy of the same agent.
A. True
B. False
Correct option is A
193. The EM iteration alternates between performing an expectation (E) step,

which creates a function for the expectation of the log-likelihood evaluated using
the current estimate for the parameters, and a maximization (M) step, which
computes parameters maximizing the expected log-likelihood found on the E step.
A. TRUE
B. FALSE
Correct option is A
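
The E and M steps described above can be sketched for a one-dimensional mixture of two Gaussians. The data, the starting parameters, and the function names are illustrative assumptions; the sketch also assumes every component keeps a non-zero variance. Recording the log-likelihood before each iteration shows that it never decreases.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances):
    # E step: responsibility of each component for each data point,
    # computed with the current parameter estimates.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M step: parameters that maximize the expected log-likelihood.
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / n)
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v

# Hypothetical data drawn by hand around 1 and 5.
DATA = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]
LOG_LIKELIHOODS = []
for _ in range(10):
    LOG_LIKELIHOODS.append(log_likelihood(DATA, weights, means, variances))
    weights, means, variances = em_step(DATA, weights, means, variances)
print(means, LOG_LIKELIHOODS[0], LOG_LIKELIHOODS[-1])
```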
194. Expectation–maximization (EM) algorithm is an

A. Iterative
B. Incremental
C. None
Correct option is A
195. Feature need to be identified by using Well Posed Learning Problem:

A. Class of tasks
B. Performance measure
C. Training experience
D. All of these
Correct option is D
196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing
checkers
B. Experience obtained by playing games against itself
C. Both a & b
D. None of these
Correct option is C
197. Learning symbolic representations of concepts known as:

A. Artificial Intelligence

B. Machine Learning
C. Both a & b
D. None of these
Correct option is A
198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Both a & b
Correct option is A
199. The autonomous acquisition of knowledge through the use of computer

programs is called
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B
200. Learning that enables massive quantities of data is known as

B. Machine Learning
C. Deep learning
D. All of these
Correct option is B
201. A different learning method does not include

A. Memorization
B. Analogy
C. Deduction
D. Introduction
Correct option is D
202. Types of learning used in machine learning

A. Supervised
B. Unsupervised
C. Reinforcement
D. All of these
Correct option is D
203. A computer program is said to learn from experience E with respect to

some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E. This defines a
A. Supervised learning problem
B. Un Supervised learning problem

C. Well posed learning problem

D. All of these
Correct option is C
204. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D

A. 1
B. 2
C. 3
D. 4
Correct option is C
205. A model can learn based on the rewards it received for its previous action
is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C
206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C
207. A learning method in which a training data contains a small amount of

labeled data and a large amount of unlabeled data is known
as
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
Correct option is B
208. Methods used for the calibration in Supervised Learning

A. Platt Calibration
B. Isotonic Regression

C. All of these
D. None of above
Correct option is C
209. The basic design issues for designing a learning system

A. Choosing the Training Experience
B. Choosing the Target Function
C. Choosing a Function Approximation Algorithm
D. Estimating Training Values
E. All of these
Correct option is E
210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C
211. A learning method that is used to solve a particular computational

program, multiple models such as classifiers or experts are strategically generated
and combined is called as
E. Ensemble learning
Correct option is E
212. In a learning system, the component that takes as input the current
hypothesis (currently learned function) and outputs a new problem for the
Performance System to explore.
A. Critic
B. Generalizer
D. Experiment generator
E. All of these
Correct option is D
213. Learning method that is used to improve the classification, prediction,

function approximation etc of a model
Correct option is E

214. In a learning system the component that takes as input the history or
trace of the game and produces as output a set of training examples of the target
function is known as:
A. Critic
B. Generalizer
D. All of these
Correct option is A
215. The most common issue when using ML is

A. Lack of skilled resources
B. Inadequate Infrastructure
C. Poor Data Quality
D. None of these
Correct option is C
216. How to ensure that your model is not over fitting

A. Cross validation
B. Regularization
C. All of these
D. None of these
Correct option is C
217. A way to ensemble multiple classifications or regression

A. Stacking
B. Bagging
C. Blending
D. Boosting
Correct option is A
218. How well a model is going to generalize in new environment is known as

A. Data Quality
B. Transparent
C. Implementation
D. None of these
Correct option is B
219. Common classes of problems in machine learning is

A. Classification
B. Clustering
C. Regression
D. All of these
Correct option is D
A. Decision Tree

B. Regression
C. Classification
D. Random Forest
Correct option is D
221. Cost complexity pruning algorithm is used in?

A. CART
B. C4.5
C. ID3
D. All of these
Correct option is A
222. Which one of these is not a tree based learner?

A. CART
B. C4.5
C. ID3
D. Bayesian Classifier
Correct option is D
223. Which one of these is a tree based learner?

A. Rule based
B. Bayesian Belief Network
C. Bayesian classifier
D. Random Forest
Correct option is D
224. What is the approach of basic algorithm for decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A
225. Which of the following classifications would best suit the student
performance classification systems?
A. If-then analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A
226. What are two steps of tree pruning work?

A. Pessimistic pruning and Optimistic pruning
B. Post pruning and Pre pruning
C. Cost complexity pruning and time complexity pruning
D. None of these
Correct option is B

227. How will you counter over-fitting in decision tree?

A. By pruning the longer rules
B. By creating new rules
C. Both 'By pruning the longer rules' and 'By creating new rules'
D. None of these
Correct option is A
228. Which of the following sentences are true?

A. In pre-pruning a tree is ‘pruned’ by halting its construction early
B. A pruning set of class labeled tuples is used to estimate cost
C. The best pruned tree is the one that minimizes the number of
encoding
D. All of these
Correct option is D

A. Factor analysis
C. Decision trees are prone to be over fit
Correct option is C
230. In which of the following scenarios is gain ratio preferred over
Information Gain?
A. When a categorical variable has a very large number of categories
B. When a categorical variable has a very small number of categories
C. The number of categories is not the reason
D. None of these
Correct option is A
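A minimal sketch of why gain ratio helps with many-valued attributes, assuming a toy dataset invented for illustration. The helper names (entropy, information_gain, gain_ratio) and the data are hypothetical and not taken from any library. A row-ID attribute with a unique value per row gets the largest information gain, but its gain ratio is cut down by its large split information.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on an attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by split information (entropy of the attribute)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Toy data (made up): 8 rows with a binary class label.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6, 7, 8]  # unique per row: very many categories
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "sun", "sun"]

for name, attr in (("row_id", row_id), ("weather", weather)):
    print(name, information_gain(attr, labels), gain_ratio(attr, labels))
```

Information gain ranks row_id above weather, while gain ratio reverses the order, which is the behaviour option A describes.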
231. Major pruning techniques used in decision tree are

A. Minimum error
B. Smallest tree
C. Both a & b
D. None of these
Correct option is B
232. What does the central limit theorem state?

A. If the sample size increases sampling distribution must approach
normal distribution
B. If the sample size decreases then the sample distribution must
approach normal distribution.
C. If the sample size increases then the sampling distribution must
approach an exponential
D. If the sample size decreases then the sampling distribution must
approach an exponential
Correct option is A

233. The difference between the expected value of an estimator and the true
value of the parameter is called?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A
234. In which of the following types of sampling the information is carried out
under the opinion of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D
235. Which of the following is a subset of population?

A. Distribution
B. Sample
C. Data
D. Set
Correct option is B
236. The sampling error is defined as?

A. Difference between population and parameter
B. Difference between sample and parameter
C. Difference between population and sample
D. Difference between parameter and sample
Correct option is C
237. Machine learning is interested in the best hypothesis h from some space
H, given observed training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis
C. Most specific hypothesis
D. None of these
Correct option is B
238. Practical difficulties with Bayesian Learning :

A. Initial knowledge of many probabilities is required
B. No consistent hypothesis
C. Hypotheses make probabilistic predictions
D. None of these
Correct option is A

239. Bayes’ theorem states that the relationship between the probability of the
hypothesis before getting the evidence P(H) and the probability of the hypothesis
after getting the evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A
240. A doctor knows that Cold causes fever 50% of the time. Prior probability
of any patient having cold is 1/50,000. Prior probability of any patient having
fever is 1/20. If a patient has fever, what is the probability he/she has cold?
A. P(C|F) = 0.0003
B. P(C|F) = 0.0004
C. P(C|F) = 0.0002
D. P(C|F) = 0.0045
Correct option is C
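The arithmetic behind this answer can be checked with a short sketch that applies Bayes' theorem from question 239 to the numbers in question 240. The function and variable names are chosen here for illustration.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_fever_given_cold = 0.5   # P(F|C): cold causes fever 50% of the time
p_cold = 1 / 50_000        # P(C): prior probability of cold
p_fever = 1 / 20           # P(F): prior probability of fever

p_cold_given_fever = posterior(p_fever_given_cold, p_cold, p_fever)
print(round(p_cold_given_fever, 6))  # 0.0002
```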
241. Which of the following will be true about k in K-Nearest Neighbor in
terms of Bias?
C. Can't say
D. None of these
Correct option is A
consider in K- Nearest Neighbor?
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A
243. In K-Nearest Neighbor it is very likely to overfit due to the curse of
dimensionality. Which of the following options would you consider to handle this
problem?
1. Dimensionality Reduction
2. Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C
244. Radial basis function learning is closely related to distance-weighted regression,
but it is
A. lazy learning
B. eager learning
C. concept learning
D. none of these
Correct option is B
245. Radial basis function networks provide a global approximation to the
target function, represented by ________ of many local kernel functions.
A. a series combination
B. a linear combination
C. a parallel combination
D. a non linear combination
Correct option is B
246. The most significant phase in a genetic algorithm is

A. Crossover
B. Mutation
C. Selection
D. Fitness function
Correct option is A
247. The crossover operator produces two new offspring from

A. Two parent strings, by copying selected bits from each parent
B. One parent strings, by copying selected bits from selected parent
C. Two parent strings, by copying selected bits from one parent
D. None of these
Correct option is A
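A minimal sketch of single-point crossover on bit strings, assuming parents of equal length. The function name, the parent strings, and the cut point are illustrative choices. Each child copies its head from one parent and its tail from the other, so both parents contribute bits to each offspring.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings at the given cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

parent_a = "11101001000"
parent_b = "00001010101"
children = single_point_crossover(parent_a, parent_b, point=5)
print(children)
```

Two-point and uniform crossover follow the same idea but use a different mask to decide which parent supplies each bit.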
248. The evolution over time of the population within a GA can be
mathematically characterized based on the concept of
A. Schema
B. Crossover
C. Don't care
D. Fitness function
Correct option is A
249. In a genetic algorithm, the process of selecting parents which mate and
recombine to create offspring for the next generation is known as:
A. Tournament selection
B. Rank selection
C. Fitness sharing
D. Parent selection
Correct option is D
250. Crossover operations are performed in genetic programming by replacing
A. Randomly chosen sub tree of one parent program by a sub tree from
the other parent program.
B. Randomly chosen root node tree of one parent program by a sub tree
from the other parent program
C. Randomly chosen root node tree of one parent program by a root
node tree from the other parent program
D. None of these
Correct option is A
1) If you remove any one of the red points from the data, will the
decision boundary change?
A) Yes
B) No
2) [True or False] If you remove the non-red circled points from the data,
the decision boundary will change?
A) True
B) False
3) What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

4) When the C parameter is set to infinity, which of the following holds
true?
A) The optimal hyperplane if exists, will be the one that completely
separates the data
B) The soft-margin classifier will separate the data
C) None of the above
5) What do you mean by a hard margin?

A) The SVM allows very low error in classification
B) The SVM allows high amount of error in classification
C) None of the above
6) The minimum time complexity for training an SVM is O(n^2). According to
this fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Solution: A
Datasets which have a clear classification boundary will function best with
SVMs.
7) The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Solution: D
The SVM effectiveness depends upon how you choose the basic 3
requirements mentioned above in such a way that it maximises your
efficiency, reduces error and overfitting.
8) Support vectors are the data points that lie closest to the decision
surface.
A) TRUE
B) FALSE
Solution: A
They are the points closest to the hyperplane and the hardest ones to
classify. They also have a direct bearing on the location of the decision
surface.
9) SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
Solution: C
When the data has noise and overlapping points, there is a problem in
drawing a clear hyperplane without misclassifying.
10) Suppose you are using RBF kernel in SVM with high Gamma value.
What does this signify?
A) The model would consider even far away points from hyperplane for
modeling
B) The model would consider only the points close to the hyperplane for
modeling
C) The model would not be affected by distance of points from hyperplane
for modeling
D) None of the above
Solution: B
The gamma parameter in SVM tuning signifies the influence of points either
near or far away from the hyperplane.
For a low gamma, the model will be too constrained and include all points
of the training dataset, without really capturing the shape.
For a higher gamma, the model will capture the shape of the dataset well.
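A minimal sketch of how gamma controls the reach of a training point, assuming the common RBF form k(x, y) = exp(-gamma * ||x - y||^2). The points and gamma values are made up for illustration. With a low gamma a distant point still has noticeable similarity, while with a high gamma only nearby points matter.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

origin = (0.0, 0.0)
near = (0.5, 0.0)   # squared distance 0.25
far = (3.0, 0.0)    # squared distance 9.0

for gamma in (0.1, 10.0):
    print(gamma, rbf_kernel(origin, near, gamma), rbf_kernel(origin, far, gamma))
```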

11) The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
Solution: C
The cost parameter decides how much an SVM should be allowed to
“bend” with the data. For a low cost, you aim for a smooth decision surface
and for a higher cost, you aim to classify more points correctly. It is also
simply referred to as the cost of misclassification.
12) Suppose you are building an SVM model on data X. The data X can be
error prone, which means that you should not trust any specific data point
too much. Now suppose you want to build an SVM model with a
quadratic kernel function (polynomial degree 2) that uses the slack penalty parameter C
as one of its hyperparameters. Based on that, answer the
following question.
What would happen when you use a very large value of C (C -> infinity)?
Note: For small C, the model was also classifying all data points correctly
A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say

D) None of these
Solution: A
For large values of C, the penalty for misclassifying points is very high, so
the decision boundary will perfectly separate the data if possible.
13) What would happen when you use very small C (C~0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these
Solution: A
The classifier can maximize the margin between most of the points, while
misclassifying a few points, because the penalty is so low.
14) If I am using all features of my dataset and I achieve 100% accuracy on
my training set, but ~70% on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Solution: C

If we’re achieving 100% training accuracy very easily, we need to check to
verify if we’re overfitting our data.
15) Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Solution: D
SVMs are versatile models that can be used for a wide range of real
world problems ranging from regression to clustering and handwriting
recognition.
Question Context: 16 – 18
Suppose you have trained an SVM with a linear decision boundary. After
training the SVM, you correctly infer that your model is underfitting.
16) Which of the following options would you most likely consider when iterating on the
SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features

Solution: C
The best option here would be to create more features for the model.
17) Suppose you gave the correct answer in previous question. What do
you think that is actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
Solution: C
Adding features makes the model more flexible, which lowers the bias and increases the variance.
18) In the above question, suppose you want to change one of the SVM's
hyperparameters so that the effect is the same as in the previous questions, i.e.
the model will not underfit. What would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
Solution: A
Increasing the C parameter is the right thing to do here: a larger C penalizes
misclassification more heavily, which weakens the regularization and lets the
model fit the training data more closely.
19) We usually use feature normalization before using the Gaussian kernel
in SVM. What is true about feature normalization?
1. We do feature normalization so that new feature will dominate other
2. Sometimes, feature normalization is not feasible in case of categorical
variables
3. Feature normalization always helps when we use Gaussian kernel in
SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Solution: B
Statements one and two are correct.
Question Context: 20-22
Suppose you are dealing with a 4-class classification problem and you want
to train an SVM model on the data, using the one-vs-all method.
Now answer the questions below.
20) How many times we need to train our SVM model in such case?
A) 1
B) 2
C) 3
D) 4
Solution: D
For a 4 class problem, you would have to train the SVM at least 4 times if
you are using a one-vs-all method.
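A minimal sketch of why one-vs-all needs one binary model per class, assuming a small made-up label list. The helper name one_vs_all_targets is hypothetical. Each class gets its own binary target vector, and one SVM would be trained on each.

```python
def one_vs_all_targets(labels):
    """Build one binary target vector per class: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in classes}

labels = ["a", "b", "c", "d", "a", "c"]  # 4 distinct classes
targets = one_vs_all_targets(labels)
print(len(targets))  # 4 binary problems, so 4 SVMs to train
```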
21) Suppose the classes are equally distributed in the data. Now, say
training once in the one-vs-all setting takes the SVM 10 seconds. How
many seconds would it require to train the one-vs-all method end to end?
A) 20
B) 40
C) 60
D) 80
Solution: B
It would take 10×4 = 40 seconds

22) Suppose your problem has changed now, and the data has only 2
classes. How many times would we need to train the SVM in
this case?
A) 1
B) 2
C) 3
D) 4
Solution: A
Training the SVM only one time would give you appropriate results
Question context: 23 – 24
Suppose you are using an SVM with a polynomial kernel of degree 2. Now
suppose you have applied it to the data and found that it fits the
data perfectly, meaning training and testing accuracy are both 100%.
23) Now, suppose you increase the complexity (or degree of polynomial) of
this kernel. What do you think will happen?
A) Increasing the complexity will overfit the data
B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these

Solution: A
Increasing the complexity of the model would make the algorithm overfit the
data.
24) In the previous question after increasing the complexity you found that
training accuracy was still 100%. According to you what is the reason
behind that?
1. Since data is fixed and we are fitting more polynomial terms or
parameters, the algorithm starts memorizing everything in the data
2. Since data is fixed and SVM doesn’t need to search in big hypothesis
space
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the given statements are correct.
25) What is/are true about kernel in SVM?
1. A kernel function maps low dimensional data to high dimensional space
2. It’s a similarity function
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the given statements are correct.
Q- When comparing multiple regularised machine learning models for a
given task, which of the following are reasonable ways to pick the best one,
in terms of its ability to generalise to unseen data? (Here λ refers to the
regularisation parameter as usual.)
(A) Pick the one with lowest training error, with λ having been chosen so as
to minimise training error.
(b) Pick the one with lowest error on a separate test set, with λ having
been chosen so as to minimise training error.
(c) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise error on this test set.
(d) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise cross-validation error on the training set.
(E) Pick the one with lowest cross-validation error on the training set, with λ
having been chosen so as to minimise cross-validation error on the training
set.

6. When doing MAP estimation of the parameters of a linear regression
model (assuming that the optimisation can be done exactly), increasing the
value of the noise precision β
(a) will never decrease the training error.
(B) will never increase the training error.
(C) will never decrease the testing error.
(d) will never increase the testing error.
(e) may either increase or decrease the training error.
(F) may either increase or decrease the testing error.
7. Which of the following are characteristics of data sampled from a
Gaussian distribution?
(a) The sample mean systematically underestimates the true mean.
(B) The sample variance systematically underestimates the true variance.
(c) Both the sample mean and variance are unbiased estimators of the true
values.
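A minimal sketch of the bias in the divide-by-n sample variance, assuming "sample variance" in option (B) means the maximum-likelihood estimator that divides by n. For an exact check, the sketch enumerates every size-2 sample drawn with replacement from a small discrete population (an illustrative stand-in, since the same bias factor (n-1)/n holds for Gaussian data).

```python
import itertools
import statistics

population = [1, 2, 3, 4, 5, 6]
true_variance = statistics.pvariance(population)

n = 2
samples = list(itertools.product(population, repeat=n))

# Average each estimator over every possible sample of size n.
mean_biased = statistics.fmean([statistics.pvariance(s) for s in samples])   # divides by n
mean_unbiased = statistics.fmean([statistics.variance(s) for s in samples])  # divides by n - 1

print(true_variance, mean_biased, mean_unbiased)
```

On average the divide-by-n estimator recovers only (n-1)/n of the true variance, while dividing by n-1 removes the bias.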
1. Which of the following would be incompatible with a frequentist (non-Bayesian) view of
probability?
(a) The use of a non-Gaussian noise model in probabilistic regression.
(b) The use of probabilistic modelling for regression.
(C) The use of prior distributions on the parameters in a probabilistic model.
(D) The idea of assuming a probability distribution over models.
2. Four different people are doing bias-variance estimates on regularised linear regression
models. They come to you and make the following claims about certain experiments they've
done. Which of these claims are definitely incorrect? (Here λ refers to the regularisation
parameter as usual.)
(a) 'I increased λ and the model started underfitting the data, whilst the variance went down'.
(b) 'I decreased λ and the model started overfitting the data, whilst the bias went up'.
(C) 'I decreased λ and the model started overfitting the data, whilst the variance went up'.
(D) 'I increased λ and the model started underfitting the data, whilst the bias went down'.
3. Consider a binary classification problem. Suppose I have trained a model on a linearly
separable training set, and now I get a new labeled data point which is correctly classified by
the model, and far away from the decision boundary. If I now add this new point to my earlier
training set and re-train via gradient descent, initialising the parameters to those of the original
model, in which cases will the learnt decision boundary remain exactly the same?
(A)When my model is a perceptron.
(b) When my model is logistic regression.
(c) When my model is Fisher's linear discriminant.
(d) When my model is a linear discriminant trained via least squares.
4. Suppose your model is demonstrating high variance across different training sets. Which of
the following is NOT a valid way to try and reduce the variance?
(a) Increase the amount of training data in each training set.
(b) Improve the optimisation algorithm being used for error minimisation.
(c) Decrease the model complexity.
(d) Reduce the noise in the training data.
1. A _________ is a decision support tool that uses a tree-like graph or
model of decisions and their possible consequences, including chance
event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks

2. Decision Tree is a display of an algorithm.
a) True
b) False
3. Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class
label
c) Both a) & b)
4. Decision Trees can be used for Classification Tasks.
a) True
b) False
5. How many types of learning are available in machine learning?

a) 1
b) 2
c) 3
d) 4
6. Choose from the following that are Decision Tree nodes

a) Decision Nodes
b) Weighted Nodes
c) Chance Nodes
d) End Nodes
7. Decision Nodes are represented by,

a) Disks
b) Squares
c) Circles
d) Triangles
8. Chance Nodes are represented by,

a) Disks
b) Squares
c) Circles
d) Triangles
9. End Nodes are represented by,

a) Disks
b) Squares
c) Circles

d) Triangles

a) Single test
b) Two test
c) Sequence of test
d) No test
11. What is the other name of informed search strategy?

a) Simple search
b) Heuristic search
c) Online search
12. How many types of informed search method are in artificial
intelligence?
a) 1
b) 2
c) 3
d) 4
13. Which search uses the problem specific knowledge beyond the definition of
the problem?
a) Informed search
b) Depth-first search
c) Breadth-first search
d) Uninformed search
14. Which function will select the lowest expansion node at first for
evaluation?
a) Greedy best-first search
b) Best-first search
c) Both a & b
15. What is the heuristic function of greedy best-first search?

a) f(n) != h(n)
b) f(n) < h(n)
c) f(n) = h(n)
d) f(n) > h(n)
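A minimal sketch of greedy best-first search, assuming a small made-up graph and heuristic table. The names graph, h, and greedy_best_first are illustrative. The frontier is ordered purely by h(n), that is f(n) = h(n), so the path cost so far is ignored.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Always expand the frontier node with the smallest h(n); return a path or None."""
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(frontier, (h[neighbor], neighbor, path + [neighbor]))
    return None

# Hypothetical graph and heuristic values (estimated distance to goal G).
graph = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"]}
h = {"S": 5, "A": 4, "B": 2, "C": 1, "G": 0}

path = greedy_best_first(graph, h, "S", "G")
print(path)
```

In this toy graph the search follows the lower heuristic values through B and C even though the route through A has fewer steps, which illustrates that greedy best-first search is not guaranteed to be optimal.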
16. Which search uses only the linear space for searching?
a) Best-first search

b) Recursive best-first search
c) Depth-first search
17. Which method is used to search better by learning?

c) Metalevel state space
18. Which search is complete and optimal when h(n) is consistent?

c) Both a & b
d) A* search
19. Which is used to improve the performance of heuristic search?

a) Quality of nodes
b) Quality of heuristic function
c) Simple form of nodes
20. Which search method will expand the node that is closest to the goal?
b) Greedy best-first search
c) A* search
21. Which data structure is used to give better heuristic estimates?

a) Forwards state-space
b) Backward state-space
c) Planning graph algorithm
22. Which is used to extract solution directly from the planning graph?
a) Planning algorithm
b) Graph plan
c) Hill-climbing search
23. What are present in the planning graph?

a) Sequence of levels
b) Literals
c) Variables

d) Heuristic estimates
24. What is the starting level of planning graph?

a) Level 3
b) Level 2
c) Level 1
d) Level 0
25. What are present in each level of planning graph?

a) Literals
b) Actions
c) Variables
d) Both a & b
26. Which kind of problem is suitable for planning graph?

a) Propositional planning problem
b) Planning problem
c) Action problem
27. What is meant by persistence actions?

a) Allow a literal to remain false
b) Allow a literal to remain true
c) Both a & b
28. When is further expansion unnecessary for a planning graph?

a) Identical
b) Replicate
c) Not identical
29. How many conditions are available between two actions in mutex
relation?
a) 1
b) 2
c) 3
d) 4
30. What is called inconsistent support?

a) If two literals are not negation of other
b) If two literals are negation of other
c) Mutually exclusive

1. What is Machine Learning (ML)?

A. The autonomous acquisition of knowledge through the use of manual programs
B. The selective acquisition of knowledge through the use of computer programs
C. The selective acquisition of knowledge through the use of manual programs
D. The autonomous acquisition of knowledge through the use of computer
programs
Correct option is D
2. Father of Machine Learning (ML)

A. Geoffrey Chaucer
B. Geoffrey Hill
C. Geoffrey Everest Hinton
Correct option is C
3. Which is FALSE regarding regression?

A. It may be used for interpretation
B. It is used for prediction
C. It discovers causal relationships
D. It relates inputs to outputs
Correct option is C
4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software
B. AI is a software that can emulate the human mind
C. ML is an alternate way of programming intelligent machines
D. All of the above
Correct option is D
5. Which of the following is not a factor that affects the performance of a
learner system?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A
6. In general, to have a well-defined learning problem, we must identify which of the
following
A. The class of tasks
B. The measure of performance to be improved
C. The source of experience
D. All of the above

Correct option is D
7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E
8. Which of the following is not a learning method?

A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B
9. Which of the following is not a level of knowledge in language understanding?

A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A
10. Designing a machine learning approach involves:-

A. Choosing the type of training experience
B. Choosing the target function to be learned
C. Choosing a representation for the target function
D. Choosing a function approximation algorithm
E. All of the above
Correct option is E
11. Concept learning infers a ________-valued function from training examples of
its input and output.
A. Decimal
B. Hexadecimal
C. Boolean
D. All of the above
Correct option is C
12. Which of the following is not a supervised learning?

A. Naive Bayesian
B. PCA
C. Linear Regression

D. Decision Tree
Correct option is B
13. What is Machine Learning?

(i) Artificial Intelligence
(ii) Deep Learning
(iii) Data Statistics
A. Only (i)
B. (i) and (ii)
C. All
D. None
Correct option is B
14. What kind of learning algorithm is used for “Facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B
15. Which of the following is not type of learning?

A. Unsupervised Learning
C. Semi-unsupervised Learning
Correct option is C
16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following
B. Reinforcement Learning
D. Unsupervised Learning: Regression
Correct option is B
17. Targeted marketing, Recommender Systems, and Customer Segmentation are
applications of which of the following
B. Unsupervised Learning: Clustering
C. Unsupervised Learning: Regression
Correct option is B

18. Fraud Detection, Image Classification, Diagnostics, and Customer Retention are
applications of which of the following
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
Correct option is B
19. Which of the following is not a symbolic function among the various function representations?
A. Rules in propositional Logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function among the various function representations?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C
21. FIND-S Algorithm starts from the most specific hypothesis and generalizes it by
considering only ________ examples
A. Negative
B. Positive
C. Negative or Positive
Correct option is B
22. FIND-S algorithm ignores

A. Negative
B. Positive
C. Both
Correct option is A
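A minimal sketch of FIND-S as described in questions 21 and 22, assuming attribute tuples with a "yes"/"no" label. The example data and the use of None for the initial most specific hypothesis are illustrative choices. Only positive examples generalize the hypothesis, and negative examples are skipped.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    hypothesis = None  # stands for the most specific hypothesis, which matches nothing
    for attributes, label in examples:
        if label != "yes":
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Keep matching attribute values; generalize mismatches to "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical training examples (sky, temperature, humidity, wind).
examples = [
    (("sunny", "warm", "normal", "strong"), "yes"),
    (("sunny", "warm", "high", "strong"), "yes"),
    (("rainy", "cold", "high", "strong"), "no"),
    (("sunny", "warm", "high", "weak"), "yes"),
]

print(find_s(examples))
```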
23. The Candidate-Elimination Algorithm represents the ________.

A. Solution Space
B. Version Space
C. Elimination Space
D. All of the above

Correct option is B
24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally ________
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with ________ knowledge
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B
26. A drawback of the FIND-S is that it assumes the consistency within the training
set
A. True
B. False
Correct option is A
27. What strategies can help reduce overfitting in decision trees?

(i) Enforce a maximum depth for the tree
(ii) Enforce a minimum number of samples in leaf nodes
(iii) Pruning
(iv) Make sure each leaf node is one pure class
A. All
B. (i), (ii) and (iii)
C. (i), (iii), (iv)
D. None
Correct option is B
28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B
29. To find the minimum or the maximum of a function, we set the gradient to zero
because
A. Depends on the type of problem

B. The value of the gradient at extrema of a function is always zero
C. Both (A) and (B)
D. None of these
Correct option is B

A. Decision trees are prone to be overfit
C. Factor analysis
Correct option is A
31. What is perceptron?

A. A single layer feed-forward neural network with pre-processing
C. A double layer auto-associative neural network
D. An auto-associative neural network
Correct option is A

• The training time depends on the size of the
• Neural networks can be simulated on a conventional
• Artificial neurons are identical in operation to biological
A. All
B. Only (ii)
C. (i) and (ii)
D. None
Correct option is C

• They have the ability to learn by
• They are more fault
• They are more suited for real time operation due to their high 'computational'
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D
34. What is Neuro software?

A. It is software used by Neurosurgeon

B. Designed to aid experts in real world
C. It is powerful and easy neural network
D. A software used to analyze neurons
Correct option is C

A. Each node computes its weighted input
B. Node could be in excited state or non-excited state
C. It has set of nodes and connections
D. All of the above
Correct option is D
36. What is the objective of backpropagation algorithm?

A. To develop learning algorithm for multilayer feedforward neural network, so that
network can be trained to capture the mapping implicitly
B. To develop learning algorithm for multilayer feedforward neural network
C. To develop learning algorithm for single layer feedforward neural network
D. All of the above
Correct option is A

• Perform pattern recognition

• Find the parity of a picture
• Determine whether two or more shapes in a picture are connected or not
A. (ii) and (iii)
B. Only (ii)
C. All
D. None
Correct option is A
38. The backpropagation law is also known as generalized delta rule

A. True
B. False
Correct option is A

• On average, neural networks have higher computational rates than conventional
computers.
• Neural networks learn by
• Neural networks mimic the way the human brain
A. All
B. (ii) and (iii)

C. (i), (ii) and (iii)
D. None
Correct option is A
39. What is true regarding backpropagation rule?

A. Error in output is propagated backwards only to determine weight updates
B. There is no feedback of signal at any stage
C. It is also called generalized delta rule
D. All of the above
Correct option is D
40. There is feedback in final stage of backpropagation

A. True
B. False
Correct option is B
41. An auto-associative network is

A. A neural network that has only one loop
C. A single layer feed-forward neural network with pre-processing
D. A neural network that contains no loops
Correct option is B
42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
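The answer to question 42 can be verified with a short sketch: the neuron forms the weighted sum of its inputs and the linear transfer function multiplies that sum by the constant of proportionality. The variable names are chosen here for illustration.

```python
weights = [1, 4, 3]
inputs = [4, 8, 5]
k = 3  # constant of proportionality of the linear transfer function

weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # 4 + 32 + 15 = 51
output = k * weighted_sum
print(output)  # 153
```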
43. Which of the following is true regarding backpropagation rule?
A. Hidden layers output is not at all important, they are only meant for supporting
input and output layers
B. Actual output is determined by computing the outputs of units for each hidden
layer
C. It is a feedback neural network
Correct option is B


B. It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
C. It is another name given to the curvy function in the perceptron
Correct option is B
45. The general limitations of back propagation rule is/are

A. Scaling
B. Slow convergence
C. Local minima problem
D. All of the above
Correct option is D
46. What is the meaning of generalized in statement “backpropagation is a
generalized delta rule” ?
A. Because delta is applied to only input and output layers, thus making it more
simple and generalized
B. It has no significance
C. Because delta rule can be extended to hidden layer units
Correct option is C
47. Neural Networks are complex ______ functions with many parameters

A. Linear
B. Non linear
C. Discrete
D. Exponential
Correct option is B
48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D
49. Backpropagation learning is based on the gradient descent along error surface.
A. True
B. False
Correct option is A
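Illustration of gradient descent along the error surface (a minimal sketch for a single linear unit, not full backpropagation; the names `delta_rule_step` and `squared_error` and the sample numbers are assumptions made for the example):

```python
def squared_error(weights, inputs, target):
    """E = 0.5 * (target - output)^2 for a single linear unit."""
    output = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (target - output) ** 2


def delta_rule_step(weights, inputs, target, lr):
    """One gradient-descent step on the squared error of a linear unit."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # dE/dw_i = -(target - output) * x_i, so each weight moves by +lr * error * x_i
    return [w + lr * error * x for w, x in zip(weights, inputs)]


w = [0.0, 0.0]
x, t = [1.0, 2.0], 1.0
before = squared_error(w, x, t)
for _ in range(20):
    w = delta_rule_step(w, x, t, lr=0.1)
after = squared_error(w, x, t)
print(before, after)  # the error shrinks as the weights follow the negative gradient
```

Backpropagation applies the same kind of update to hidden-layer weights by passing the error terms backwards through the network.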
50. In backpropagation rule, how to stop the learning process?

A. No heuristic criteria exist
B. On basis of average gradient value

C. There is convergence involved

D. None of these
Correct option is B
51. Applications of NN (Neural Network)

A. Risk management
B. Data validation
C. Sales forecasting
D. All of the above
Correct option is D
52. A network that involves backward links from the output to the input and hidden
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A
53. Decision Tree is a display of an Algorithm?

A. True
B. False
Correct option is A
54. Which of the following is/are the decision tree nodes?

A. End Nodes
B. Decision Nodes
C. Chance Nodes
D. All of the above
Correct option is D
55. End Nodes are represented by which of the following

B. Triangles
C. Circles
D. Squares
Correct option is B
56. Decision Nodes are represented by which of the following

B. Triangles
C. Circles
D. Squares
Correct option is D

57. Chance Nodes are represented by which of the following

B. Triangles
C. Circles
D. Squares
Correct option is C
58. Advantage of Decision Trees

B. Use a white box model, if given result is provided by a model
D. All of the above
Correct option is D
59. ______ terms are required for building a Bayes model.

A. 1
B. 2
C. 3
D. 4
Correct option is C
60. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependent & Dependent
D. Dependent
Correct option is A
61. What is needed to make probabilistic systems feasible in the world?

A. Feasibility
B. Reliability
C. Crucial robustness
Correct option is C
62. Bayes rule can be used for:-

A. Solving queries
C. Answering probabilistic query
D. Decreasing complexity
Correct option is C
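Illustration of answering a probabilistic query with Bayes rule (a minimal sketch; the function name `bayes_posterior` and the numbers in the example are hypothetical, chosen only to show the calculation):

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H) using Bayes rule."""
    # Total probability of the evidence E
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical query: P(disease | positive test) with a 1% prior,
# a 90% true-positive rate and a 5% false-positive rate
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(posterior, 4))
```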

63. ______ provides ways and means of weighing up the desirability of goals and the
likelihood of achieving them
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A
64. Which of the following is provided by the Bayesian Network?

A. Complete description of the problem
B. Partial description of the domain
C. Complete description of the domain
D. All of the above
Correct option is C
65. Probability provides a way of summarizing the ______ that comes from our laziness
and ignorance
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B
66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C
67. Causal chain (For example, Smoking cause cancer) gives rise to:-
A. Conditional independence
B. Conditional dependence
C. Both
Correct option is A
68. The bayesian network can be used to answer any query by using:-
D. All of the above
Correct option is B

69. Bayesian networks allow compact specification of:-

A. Joint probability distributions
B. Belief
C. Propositional logic statements
D. All of the above
Correct option is A
70. The compactness of the bayesian network can be described by

A. Fully structured
B. Locally structured
C. Partially structured
D. All of the above
Correct option is B
71. The Expectation-Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
A. True
B. False
Correct option is B
72. Which of the following is correct about the Naive Bayes?

A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. Both
D. All of the above
Correct option is C
73. Which of the following is false regarding EM Algorithm?

A. The alignment provides an estimate of the base or amino acid composition of
each column in the site
B. The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the
sequences
C. The row-by-column composition of the site already available is used to estimate
the probability
Correct option is C
74. Naïve Bayes Algorithm is a learning algorithm.

A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
Correct option is A

75. EM algorithm includes two repeated steps; here step 2 is ______.

A. The normalization
B. The maximization step
C. The minimization step
Correct option is B
76. Examples of Naïve Bayes Algorithm is/are

A. Spam filtration
B. Sentiment analysis
C. Classifying articles
D. All of the above
Correct option is D
77. In the intermediate steps of “EM Algorithm”, the number of each base in each
column is determined and then converted to
A. True
B. False
Correct option is A
78. Naïve Bayes algorithm is based on ______ and is used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
Correct option is A
79. Types of Naïve Bayes Model:

A. Gaussian
B. Multinomial
C. Bernoulli
D. All of the above
Correct option is D
80. Disadvantages of Naïve Bayes Classifier:

A. Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features
B. It performs well in Multi-class predictions as compared to the other
C. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
D. It is the most popular choice for text classification problems.
Correct option is A
81. The benefit of Naïve Bayes:-

A. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of

B. It is the most popular choice for text classification problems.

C. It can be used for Binary as well as Multi-class
D. All of the above
Correct option is D
82. In which of the following types of sampling the information is carried out under
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B
83. Full form of MDL?

A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these
Correct option is A
84. For the analysis of ML algorithms, we need

A. Computational learning theory
B. Statistical learning theory
C. Both A & B
D. None of these
Correct option is C
85. PAC stands for

A. Probably Approximately Correct
B. Probably Approx Correct
C. Probably Approximate Computation
D. Probably Approx Computation
Correct option is A
86. The ______ of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A
87. Statement: True error defined over entire instance space, not just training data
A. True

B. False
Correct option is A
88. What areas is CLT comprised of?

C. Mistake Bound
D. All of these
Correct option is D
88. What area of CLT tells “How many examples we need to find a good hypothesis
?”?
C. Mistake Bound
D. None of these
Correct option is A
89. What area of CLT tells “How much computational power we need to find a good
hypothesis ?”?
C. Mistake Bound
D. None of these
Correct option is B
90. What area of CLT tells “How many mistakes we will make before finding a good
hypothesis ?”?
C. Mistake Bound
D. None of these
Correct option is C
91. (For questions 91 and 92) Can we say that concepts described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A
92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A
93. The VC dimension of hypothesis space H1 is larger than the VC dimension of

hypothesis space H2. Which of the following can be inferred from this?
A. The number of examples required for learning a hypothesis in H1 is larger than
the number of examples required for H2
B. The number of examples required for learning a hypothesis in H1 is smaller than
the number of examples required for H2
C. No relation to number of samples required for PAC learning.
Correct option is A
94. For a particular learning task, if the requirement of error parameter changes from
0.1 to 0.01. How many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
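Worked check for question 94 (a minimal sketch, assuming the standard bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)); the function name, the choice n = 10 and delta = 0.05 are illustrative assumptions):

```python
import math


def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite hypothesis space."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon


n = 10               # number of Boolean attributes (illustrative)
h_size = 3 ** n      # conjunctions of Boolean literals: |H| = 3^n
m_coarse = pac_sample_bound(h_size, epsilon=0.1, delta=0.05)
m_fine = pac_sample_bound(h_size, epsilon=0.01, delta=0.05)
ratio = m_fine / m_coarse
print(math.ceil(m_coarse), math.ceil(m_fine), ratio)
```

Because the bound is proportional to 1/epsilon, tightening the error parameter from 0.1 to 0.01 multiplies the required sample count by 10.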
95. Computational complexity of classes of learning problems depends on which of

the following?
A. The size or complexity of the hypothesis space considered by learner
B. The accuracy to which the target concept must be approximated
C. The probability that the learner will output a successful hypothesis
D. All of these
Correct option is D
96. The instance-based learner is a

A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A
97. When to consider nearest neighbour algorithms?

A. Instances map to points in R^n
B. Not more than 20 attributes per instance
C. Lots of training data
D. None of these
E. A, B & C
Correct option is E
98. What are the advantages of the nearest neighbour algorithm?

A. Training is very fast

B. Can learn complex target functions
C. Don't lose information
D. All of these
Correct option is D
99. What are the difficulties with k-nearest neighbour algo?

A. Calculate the distance of the test case from all training cases
B. Curse of dimensionality
C. Both A & B
D. None of these
Correct option is C
100. What if the target function is real valued in kNN algo?

A. Calculate the mean of the k nearest neighbours
B. Calculate the SD of the k nearest neighbour
C. None of these
Correct option is A
101. What is/are true about Distance-weighted KNN?

A. The weight of the neighbour is considered
B. The distance of the neighbour is considered
C. Both A & B
D. None of these
Correct option is C
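Illustration for questions 100 and 101 (a minimal sketch of distance-weighted k-NN for a real-valued target; the function name, the inverse-square weighting and the toy data are assumptions chosen for the example):

```python
import math


def distance_weighted_knn(train, query, k):
    """Predict a real value as the inverse-square-distance weighted mean
    of the k nearest neighbours. `train` is a list of (features, target)."""
    # Keep the k training points closest to the query
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for features, target in neighbours:
        d = math.dist(features, query)
        if d == 0.0:
            return target  # exact match: use its target directly
        w = 1.0 / d ** 2   # closer neighbours get larger weights
        weighted_sum += w * target
        weight_total += w
    return weighted_sum / weight_total


train = [((0.0,), 0.0), ((1.0,), 10.0), ((5.0,), 100.0)]
pred_mid = distance_weighted_knn(train, (0.5,), k=2)
pred_exact = distance_weighted_knn(train, (1.0,), k=2)
print(pred_mid, pred_exact)
```

With equal weights this reduces to the plain mean of the k nearest neighbours described in question 100.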
102. What is/are advantage(s) of Distance-weighted k-NN over k-NN?

A. Robust to noisy training data
B. Quite effective when a sufficiently large set of training data is provided
C. Both A & B
D. None of these
Correct option is C
103. What is/are advantage(s) of Locally Weighted Regression?

A. Pointwise approximation of complex target function
B. Earlier data has no influence on the new ones
C. Both A & B
D. None of these
Correct option is C
104. The quality of the result depends on (LWR)

A. Choice of the function
B. Choice of the kernel function K
C. Choice of the hypothesis space H
D. All of these

Correct option is D
105. How many types of layer in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and Output layer
106. The neurons in the hidden layer contain a Gaussian transfer function whose
outputs are ______ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
107. PNN/GRNN networks have one neuron for each point in the training file,
While RBF network have a variable number of neurons that is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A
108. Which network is more accurate when the size of the training set is between
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A
109. What is/are true about RBF network?

A. A kind of supervised learning
B. Design of NN as curve fitting problem
C. Use of multidimensional surface to interpolate the test data
D. All of these
Correct option is D
110. Application of CBR

A. Design
B. Planning
C. Diagnosis

D. All of these
Correct option is D
111. What is/are advantages of CBR?

A. A local approx. is found for each test case
B. Knowledge is in a form understandable to human
C. Fast to train
D. All of these
Correct option is D
112. In the k-NN algorithm, given a set of training examples and a value of k < size of
training set (n), the algorithm predicts the class of a test example to be the
A. Least frequent class among the classes of the k closest training examples
B. Most frequent class among the classes of the k closest training examples
C. Class of the closest training example
D. Most frequent class among the classes of the k farthest training examples.
Correct option is B
113. Which of the following statements is true about PCA?

(i) We must standardize the data before applying PCA
(ii) We should select the principal components which explain the highest variance
(iii) We should select the principal components which explain the lowest variance
(iv) We can use PCA for visualizing the data in lower dimensions
A. (i), (ii) and (iv).
B. (ii) and (iv)
C. (iii) and (iv)
D. (i) and (iii)
Correct option is A
114. Genetic algorithm is a

A. Search technique used in computing to find true or approximate solution to
optimization and search problem
B. Sorting technique used in computing to find true or approximate solution to
optimization and sort problem
C. Both A & B
D. None of these
Correct option is A
115. GA techniques are inspired by

A. Evolutionary
B. Cytology
C. Anatomy

D. Ecology
Correct option is A
116. When would the genetic algorithm terminate?

A. Maximum number of generations has been produced
B. Satisfactory fitness level has been reached for the population
C. Both A & B
D. None of these
Correct option is C
117. The algorithm operates by iteratively updating a pool of hypotheses,
called the
A. Population
B. Fitness
C. None of these
Correct option is A
118. What is the correct representation of GA?

A. GA(Fitness, Fitness_threshold, p)
B. GA(Fitness, Fitness_threshold, p, r )
C. GA(Fitness, Fitness_threshold, p, r, m)
D. GA(Fitness, Fitness_threshold)
Correct option is C
119. Genetic operators includes

A. Crossover
B. Mutation
C. Both A & B
D. None of these
Correct option is C
120. Produces two new offspring from two parent string by copying selected
bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
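Illustration of the crossover and mutation operators from questions 119 and 120 (a minimal sketch on bit strings; the function names and the sample parent strings are illustrative assumptions, not a definitive GA implementation):

```python
import random


def single_point_crossover(parent_a, parent_b, point):
    """Produce two offspring by swapping the tails of two equal-length bit strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def point_mutation(bits, rng):
    """Flip one randomly chosen bit of a single parent."""
    i = rng.randrange(len(bits))
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]


offspring = single_point_crossover("11101001000", "00001010101", point=5)
mutated = point_mutation("0000", random.Random(0))
print(offspring, mutated)
```

Crossover takes two parents and copies selected bits from each; mutation takes one parent and makes a small random change.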
121. Each schema the set of bit strings containing the indicated as
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s

Correct option is D
122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0010, 0010
C. 0100, 0110
D. 0100, 0010
Correct option is A
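Worked check for the schema 0*10 (a minimal sketch; the helper names `matches` and `expand` are illustrative, and `*` is treated as a "don't care" position):

```python
from itertools import product


def matches(schema, bits):
    """True if the bit string agrees with the schema at every non-* position."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits)
    )


def expand(schema):
    """List every bit string represented by the schema."""
    choices = [("0", "1") if s == "*" else (s,) for s in schema]
    return ["".join(p) for p in product(*choices)]


members = expand("0*10")
print(members)
print(matches("0*10", "0110"), matches("0*10", "0100"))
```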
123. Correct(h) is the percent of all training examples correctly classified by
hypothesis h; then the Fitness function is equal to
A. Fitness(h) = (correct(h))^2
B. Fitness(h) = (correct(h))^3
C. Fitness(h) = (correct(h))
D. Fitness(h) = (correct(h))^4
Correct option is A
124. Statement: In Genetic Programming, individuals in the evolving population
are computer programs rather than bit strings
A. True
B. False
Correct option is A
125. ______ evolution over many generations was directly influenced by the
experiences of individual organisms during their lifetime
A. Baldwin
B. Lamarckian
C. Bayes
D. None of these
Correct option is B
126. Search through the hypothesis space cannot be characterized. Why?

A. Hypotheses are created by crossover and mutation operators that allow radical
changes between successive generations
B. Hypotheses are not created by crossover and mutation
C. None of these
Correct option is A
127. ILP stands for

A. Inductive Logical programming
B. Inductive Logic Programming
C. Inductive Logical Program
D. Inductive Logic Program
Correct option is B

128. What is/are the requirement for the Learn-One-Rule method?

A. Input, accepts a set of +ve and -ve training examples.
B. Output, delivers a single rule that covers many +ve examples and few -ve.
C. Output rule has a high accuracy but not necessarily a high coverage
D. A & B
E. A, B & C
Correct option is E
129. A ______ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A

130. Ground literal is a literal that
A. Contains only variables
B. does not contain any functions
C. does not contain any variables
D. Contains only functions
Correct option is C
131. ______ emphasizes learning feedback that evaluates the learner's performance
without providing standards of correctness in the form of behavioural targets
A. Reinforcement learning
C. None of these
Correct option is A
132. Features of Reinforcement learning

A. Set of problem rather than set of techniques
B. RL is training by reward and
C. RL is learning from trial and error with the
D. All of these
Correct option is D
133. Which type of feedback used by RL?

A. Purely Instructive feedback
B. Purely Evaluative feedback
C. Both A & B
D. None of these

Correct option is B
134. What is/are the problem solving methods for RL?

A. Dynamic programming
B. Monte Carlo Methods
C. Temporal-difference learning
D. All of these
Correct option is D
135. The FIND-S Algorithm

A. Starts from the most specific hypothesis
B. It considers negative examples
C. It considers both negative and positive examples
D. None of these
Correct option is A
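Illustration of FIND-S (a minimal sketch for conjunctive hypotheses over attribute tuples; the function name, the use of `None` for "no positive example seen yet" and the small weather-style data set are assumptions for the example):

```python
def find_s(examples):
    """Return the maximally specific hypothesis covering all positive examples.
    `examples` is a list of (attribute_tuple, is_positive); "?" means any value."""
    hypothesis = None  # stands for the most specific hypothesis (covers nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Generalize only the attributes that disagree with this positive example
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
learned = find_s(examples)
print(learned)
```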
136. The hypothesis space has a general-to-specific ordering of hypotheses, and the
search can be efficiently organized by taking advantage of a naturally occurring structure
over the hypothesis space
A. TRUE
B. FALSE
Correct option is A
137. The Version space is:
A. The subset of all hypotheses consistent with the training examples is called the
version space with respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept
B. The version space consists of only specific hypotheses
C. None of these
Correct option is A
138. The Candidate-Elimination Algorithm

A. The key idea in the Candidate-Elimination algorithm is to output a description
of the set of all hypotheses consistent with the training examples
B. Candidate-Elimination algorithm computes the description of this set without
explicitly enumerating all of its members
C. This is accomplished by using the more-general-than partial ordering and
maintaining a compact representation of the set of consistent hypotheses
D. All of these
Correct option is D

139. Concept learning is basically acquiring the definition of a general category
from given sample positive and negative training examples of the category
A. TRUE
B. FALSE
Correct option is A
140. The hypothesis h1 is more-general-than hypothesis h2 (h1 > h2) if and
only if h1≥h2 is true and h2≥h1 is false. We also say h2 is more-specific-than h1
A. The statement is true
B. The statement is false
C. We cannot say
D. None of these
Correct option is A
141. The List-Then-Eliminate Algorithm

A. The List-Then-Eliminate algorithm initializes the version space to contain all
hypotheses in H, then eliminates any hypothesis found inconsistent with any
training example
B. The List-Then-Eliminate algorithm does not initialize the version space
C. None of these
Correct option is A
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A
143. Which modifies the performance element so that it makes better
decisions?
A. Performance element
B. Changing element
C. Learning element
Correct option is C
144. Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target
function well over other unobserved examples is called:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these

Correct option is A
145. Feature of ANN in which ANN creates its own organization or
representation of information it receives during learning time is
A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning
Correct option is B

A. Single test
B. Two test
C. Sequence of test
D. No test
Correct option is C

• Factor analysis
• Decision trees are robust to outliers
• Decision trees are prone to be overfit
Correct option is C
148. Tree/Rule based classification algorithms generate which rule to perform

the classification.
A. if-then.
B. then
C. do
Correct option is A
149. What is Gini Index?

A. It is a type of index structure
B. It is a measure of purity
C. None of the options
Correct option is B
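Illustration of the Gini index as used in decision-tree learning (a minimal sketch; the function name `gini_impurity` is illustrative, and it uses the common form 1 minus the sum of squared class proportions, so 0 means a pure node):

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0  # treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["yes", "yes", "yes"])
mixed = gini_impurity(["yes", "no"])
print(pure, mixed)
```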
150. What is not a RNN in machine learning?

A. One output to many inputs
B. Many inputs to a single output
C. RNNs for nonsequential input
D. Many inputs to many outputs
Correct option is A

151. Which of the following sentences are correct in reference to Information
gain?
A. It is biased towards multi-valued attributes
B. ID3 makes use of information gain
C. The approach used by ID3 is greedy
D. All of these
Correct option is D
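Illustration of information gain and its bias towards multi-valued attributes (a minimal sketch; the function names and the four-row toy data set, including the ID-like first column, are assumptions for the example):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Expected entropy after the split, weighted by partition size
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


rows = [("id1", "x"), ("id2", "x"), ("id3", "y"), ("id4", "y")]
labels = ["yes", "no", "yes", "no"]
gain_id = information_gain(rows, labels, 0)     # unique value per row
gain_other = information_gain(rows, labels, 1)  # two values, uninformative
print(gain_id, gain_other)
```

The ID-like attribute gets the maximum gain even though it cannot generalize, which is the bias named in option A; ID3 greedily picks the attribute with the highest gain at each node.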
152. A Neural Network can answer

A. For Loop questions
B. what-if questions
C. If-Then-Else Analysis Questions
D. None of these
Correct option is B
153. Artificial neural network used for

A. Pattern Recognition
B. Classification
C. Clustering
D. All
Correct option is D
154. Which of the following are the advantage/s of Decision Trees?

B. Use a white box model, If given result is provided by a model
Correct option is D
155. What is the mathematical likelihood that something will occur?

A. Classification
B. Probability
C. Naïve Bayes Classifier
D. None of the other
Correct option is B
156. What does the Bayesian network provide?

A. Complete description of the domain
B. Partial description of the domain
C. Complete description of the problem
D. None of the mentioned
Correct option is A

157. Bayes rule can be used for
A. Solving queries
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D
158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B
159. What is needed to make probabilistic systems feasible in the world?

A. Reliability
B. Crucial robustness
C. Feasibility
Correct option is B
160. It was shown that the Naive Bayesian method

A. Can be much more accurate than the optimal Bayesian method
B. Is always worse off than the optimal Bayesian method
C. Can be almost optimal only when attributes are independent
D. Can be almost optimal when some attributes are dependent
Correct option is C
161. What is the consequence between a node and its predecessors while
creating Bayesian network?
A. Functionally dependent
B. Dependent
C. Conditionally independent
D. Both Conditionally dependent & Dependent
Correct option is C
162. How can the compactness of the Bayesian network be described?

A. Locally structured
B. Fully structured
C. Partial structure
Correct option is A
163. How can the entries in the full joint probability distribution be calculated?
A. Using variables

B. Using information
C. Both Using variables & information
Correct option is B
164. How can the Bayesian network be used to answer any query?
Correct option is B
165. Sample Complexity is

A. The sample complexity is the number of training-samples that we need to
supply to the algorithm, so that the function returned by the algorithm is
within an arbitrarily small error of the best possible function, with probability
arbitrarily close to 1
B. How many training examples are needed for learner to converge to a
successful hypothesis.
C. All of these
Correct option is C
166. PAC stands for

A. Probably Approximately Correct
B. Probability Applied Correctly
C. Partition Approximately Correct
Correct option is A
167. Which of the following will be true about k in k-NN in terms of variance?
A. When you increase k, the variance will increase
B. When you decrease k, the variance will increase
C. Can't say
D. None of these
Correct option is B
168. Which of the following options is true about the k-NN algorithm?

A. It can be used for classification
B. It can be used for regression
C. It can be used in both classification and regression
Correct option is C
169. In k-NN it is very likely to overfit due to the curse of dimensionality. Which
of the following option would you consider to handle such problem? 1).
Dimensionality Reduction 2). Feature selection

A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C
consider in k- NN
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A
171. Which of the following will be true about k in k-NN in terms of Bias?
C. Can't say
D. None of these
Correct option is A
172. What is used to mitigate overfitting in a test set?

A. Overfitting set
B. Training set
C. Validation dataset
D. Evaluation set
Correct option is C
173. A radial basis function is a

A. Activation function
B. Weight
C. Learning rate
D. none
Correct option is A
174. Mistake Bound is

A. How many training examples are needed for learner to converge to a successful
hypothesis.
B. How much computational effort is needed for a learner to converge to a
successful hypothesis.
C. How many training examples will the learner misclassify before converging to a
successful hypothesis.
D. None of these
Correct option is C

175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D
176. Adding more basis functions in a linear model… (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A
177. Which of these are types of crossover

A. Single point
B. Two point
C. Uniform
D. All of these
Correct option is D
178. A feature F1 can take certain value: A, B, C, D, E, & F and represents grade
of students from a college. Which of the following statement is true in following
case?
A. Feature F1 is an example of nominal
B. Feature F1 is an example of ordinal
C. It doesn't belong to any of the above category.
Correct option is B
179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is A
180. Genetic algorithms are heuristic methods that do not guarantee an

optimal solution to a problem
A. TRUE

B. FALSE
Correct option is A
181. Which of the following statements about regularization is not correct?

A. Using too large a value of lambda can cause your hypothesis to underfit the data
B. Using too large a value of lambda can cause your hypothesis to overfit the data
C. Using a very large value of lambda cannot hurt the performance of your
hypothesis.
Correct option is A
182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A
183. Genetic Algorithms are a part of

A. Evolutionary Computing
B. inspired by Darwin’s theory about evolution – “survival of the fittest”
C. are adaptive heuristic search algorithm based on the evolutionary ideas of
D. All of the above
Correct option is D
184. Genetic algorithms belong to the family of methods in the

A. artificial intelligence area
B. optimization
C. complete enumeration family of methods
D. Non-computer based (human) solutions area
Correct option is A
185. For a two player chess game, the environment encompasses the opponent
A. True
B. False
Correct option is A
186. Which among the following is not a necessary feature of a reinforcement

learning solution to a learning problem?
A. exploration versus exploitation dilemma
B. trial and error approach to learning
C. learning based on rewards

D. representation of the problem as a Markov Decision Process

Correct option is D
187. Which of the following sentence is FALSE regarding reinforcement learning

A. It relates inputs to
B. It is used for
C. It may be used for
D. It discovers causal relationships.
Correct option is D
188. The EM algorithm is guaranteed to never decrease the value of its

objective function on any iteration
A. TRUE
B. FALSE
Correct option is A
189. Consider the following modification to the tic-tac-toe game: at the end of
game, a coin is tossed and the agent wins if a head appears regardless of
whatever has happened in the game. Can reinforcement learning be used to learn
an optimal policy of playing Tic-Tac-Toe in this case?
A. Yes
B. No
Correct option is B
190. Out of the two repeated steps in the EM algorithm, step 2 is ______
A. the maximization step
B. the minimization step
C. the optimization step
D. the normalization step
Correct option is A
191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
Correct option is A
192. A chess agent trained by using Reinforcement Learning can be trained by
playing against a copy of the same agent
A. True

B. False
Correct option is A
193. The EM iteration alternates between performing an expectation (E) step,
which creates a function for the expectation of the log-likelihood evaluated using
the current estimate for the parameters, and a maximization (M) step, which
computes parameters maximizing the expected log-likelihood found on the E step.
A. TRUE
B. FALSE
Correct option is A
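Illustration of the E and M steps from question 193 (a minimal sketch that estimates the two means of a 1-D mixture of two Gaussians, assuming equal mixing weights and a known common standard deviation; the function name, the starting values and the data are illustrative assumptions):

```python
import math


def em_two_means(data, mu1, mu2, iterations=50, sigma=1.0):
    """Estimate the means of a two-component Gaussian mixture with EM."""
    for _ in range(iterations):
        # E step: probability that each point was generated by component 1
        resp = []
        for x in data:
            p1 = math.exp(-((x - mu1) ** 2) / (2 * sigma ** 2))
            p2 = math.exp(-((x - mu2) ** 2) / (2 * sigma ** 2))
            resp.append(p1 / (p1 + p2))
        # M step: re-estimate each mean as a responsibility-weighted average
        w1 = sum(resp)
        w2 = len(data) - w1
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w2
    return mu1, mu2


data = [-5.2, -4.9, -5.1, 4.8, 5.0, 5.3]
est1, est2 = em_two_means(data, mu1=-1.0, mu2=1.0)
print(est1, est2)
```

Step 1 computes expectations under the current parameters and step 2 maximizes, which is why the algorithm is described as iterative.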
194. Expectation–maximization (EM) algorithm is an

A. Iterative
B. Incremental
C. None
Correct option is A
195. Features that need to be identified for a well-posed learning problem:

A. Class of tasks
B. Performance measure
C. Training experience
D. All of these
Correct option is D
196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing checkers
B. Experience obtained by playing games against itself
C. Both a & b
D. None of these
Correct option is C
197. Learning symbolic representations of concepts known as:

B. Machine Learning
C. Both a & b
D. None of these
Correct option is A
198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning

D. Both a & b
Correct option is A
199. The autonomous acquisition of knowledge through the use of computer

programs is called
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B
200. Learning that enables massive quantities of data is known as

B. Machine Learning
C. Deep learning
D. All of these
Correct option is B
201. A different learning method does not include

A. Memorization
B. Analogy
C. Deduction
D. Introduction
Correct option is D
202. Types of learning used in machine learning

A. Supervised
B. Unsupervised
C. Reinforcement
D. All of these
Correct option is D
203. A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E
A. Supervised learning problem
B. Un Supervised learning problem
C. Well posed learning problem
D. All of these
Correct option is C
A. Decision Tree

B. Regression
C. Classification
D. Random Forest
Correct option is D

A. 1
B. 2
C. 3
D. 4
Correct option is C
205. Learning in which a model learns based on the rewards it received for its
previous actions is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C
206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C
207. A learning method in which a training data contains a small amount of

labeled data and a large amount of unlabeled data is known
as
Correct option is C
208. Methods used for the calibration in Supervised Learning

A. Platt Calibration
B. Isotonic Regression
C. All of these
D. None of above
Correct option is C

209. The basic design issues for designing a learning

A. Choosing the Training Experience
B. Choosing the Target Function
C. Choosing a Function Approximation Algorithm
D. Estimating Training Values
E. All of these
Correct option is E
210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
D. All of these
Correct option is C
211. A learning method that is used to solve a particular computational

program, multiple models such as classifiers or experts are strategically generated
and combined is called as
Correct option is E
212. In a learning system the component that takes as input the current
hypothesis (currently learned function) and outputs a new problem for the
Performance System to explore.
A. Critic
B. Generalizer
D. Experiment generator
E. All of these
Correct option is D
213. Learning method that is used to improve the classification, prediction,

function approximation etc of a model
Correct option is E

214. In a learning system the component that takes as input the history or trace
of the game and produces as output a set of training examples of the target
function is known as:
A. Critic
B. Generalizer
D. All of these
Correct option is A
215. The most common issue when using ML is

A. Lack of skilled resources
B. Inadequate Infrastructure
C. Poor Data Quality
D. None of these
Correct option is C
216. How to ensure that your model is not over fitting

A. Cross validation
B. Regularization
C. All of these
D. None of these
Correct option is C
217. A way to ensemble multiple classifications or regression

A. Stacking
B. Bagging
C. Blending
D. Boosting
Correct option is A
218. How well a model is going to generalize in new environment is known as

A. Data Quality
B. Transparent
C. Implementation
D. None of these
Correct option is B
219. Common classes of problems in machine learning is

A. Classification
B. Clustering
C. Regression
D. All of these
Correct option is D

A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D
221. Cost complexity pruning algorithm is used in?

A. CART
B. C4.5
C. ID3
D. All of
Correct option is A
222. Which one of these is not a tree based learner?

A. CART
B. C4.5
C. ID3
D. Bayesian Classifier
Correct option is D
223. Which one of these is a tree based learner?

A. Rule based
B. Bayesian Belief Network
C. Bayesian classifier
D. Random Forest
Correct option is D
224. What is the approach of basic algorithm for decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A
225. Which of the following classifications would best suit the student
performance classification systems?
A. If-.then-analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A

226. What are two steps of tree pruning work?

A. Pessimistic pruning and Optimistic pruning
B. Post pruning and Pre pruning
C. Cost complexity pruning and time complexity pruning
D. None of these
Correct option is B
227. How will you counter over-fitting in decision tree?

A. By pruning the longer rules
B. By creating new rules
C. Both ‘By pruning the longer rules’ and ‘By creating new rules’
D. None of these
Correct option is A
228. Which of the following sentences are true?

A. In pre-pruning a tree is ‘pruned’ by halting its construction early
B. A pruning set of class labeled tuples is used to estimate cost
C. The best pruned tree is the one that minimizes the number of encoding
D. All of these
Correct option is D

A. Factor analysis
C. Decision trees are prone to be over fit
Correct option is C
230. In which of the following scenario a gain ratio is preferred over

Information Gain?
A. When a categorical variable has very large number of category
B. When a categorical variable has very small number of category
C. Number of categories is not the reason
D. None of these
Correct option is A
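The reasoning behind question 230 can be sketched numerically. Gain ratio divides information gain by the split information (the entropy of the attribute itself), which penalizes attributes with many categories. The toy data and the names `record_id` and `weather` below are assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) for splitting labels on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # grows with the number of categories
    return gain, gain / split_info

# Hypothetical toy data: four rows, two classes
labels = ["yes", "yes", "no", "no"]
record_id = ["a", "b", "c", "d"]          # one category per row
weather = ["sun", "sun", "rain", "rain"]  # two categories

id_gain, id_ratio = gain_and_ratio(record_id, labels)
weather_gain, weather_ratio = gain_and_ratio(weather, labels)
print(id_gain, id_ratio)            # 1.0 0.5
print(weather_gain, weather_ratio)  # 1.0 1.0
```

Both attributes reach the same information gain here, but the gain ratio is lower for the attribute with one category per row, which is why it is preferred when an attribute has a very large number of categories.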
231. Major pruning techniques used in decision tree are

A. Minimum error
B. Smallest tree
C. Both a & b
D. None of these
Correct option is B
232. What does the central limit theorem state?

A. If the sample size increases, the sampling distribution must approach a normal
distribution
B. If the sample size decreases, the sampling distribution must approach a
normal distribution
C. If the sample size increases, the sampling distribution must approach an
exponential distribution
D. If the sample size decreases, the sampling distribution must approach an
exponential distribution
Correct option is A
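A small simulation can make the central limit theorem in question 232 concrete. This is only a sketch: it assumes the statement refers to the sampling distribution of the mean, and the Uniform(0, 1) population, the sample size of 30 and the 2000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, num_samples):
    """Draw repeated samples from Uniform(0, 1) and return their means."""
    return [
        statistics.mean([random.random() for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

means = sample_means(sample_size=30, num_samples=2000)

# Theory predicts the means cluster around 0.5 with a spread
# close to sqrt(1/12) / sqrt(30), roughly 0.053.
print(statistics.mean(means), statistics.stdev(means))
```

Plotting a histogram of `means` should show a roughly bell-shaped curve even though the underlying population is flat.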
233. The difference between the expected sample value and the estimated
value of the parameter is called?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A
234. In which of the following types of sampling the information is carried out
under the opinion of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D
235. Which of the following is a subset of population?

A. Distribution
B. Sample
C. Data
D. Set
Correct option is B
236. The sampling error is defined as?

A. Difference between population and parameter
B. Difference between sample and parameter
C. Difference between population and sample
D. Difference between parameter and sample
Correct option is C
237. Machine learning is interested in the best hypothesis h from some space
H, given observed training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis

C. Most specific hypothesis

D. None of these
Correct option is B
238. Practical difficulties with Bayesian Learning :

A. Initial knowledge of many probabilities is required
B. No consistent hypothesis
C. Hypotheses make probabilistic predictions
D. None of these
Correct option is A
239. Bayes’ theorem states that the relationship between the probability of the
hypothesis before getting the evidence P(H) and the probability of the hypothesis
after getting the evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A
240. A doctor knows that Cold causes fever 50% of the time. Prior probability of
any patient having cold is 1/50,000. Prior probability of any patient having fever is
1/20. If a patient has fever, what is the probability he/she has cold?
A. P(C/F)= 0.0003
B. P(C/F)=0.0004
C. P(C/F)= 0.0002
D. P(C/F)=0.0045
Correct option is C
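The arithmetic in question 240 follows directly from the rule in question 239. A minimal sketch, where the function name `posterior` is an assumption chosen for illustration:

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Values from question 240: P(F|C) = 1/2, P(C) = 1/50000, P(F) = 1/20
p_cold_given_fever = posterior(Fraction(1, 2), Fraction(1, 50000), Fraction(1, 20))
print(float(p_cold_given_fever))  # 0.0002
```

Exact fractions are used so that the result, 1/5000, is not affected by floating-point rounding.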
241. Which of the following will be true about k in K-Nearest Neighbor in terms
of Bias?
C. Can't say
D. None of these
Correct option is A
consider in K- Nearest Neighbor?
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A

243. In K-Nearest Neighbor it is very likely to overfit due to the curse of

dimensionality. Which of the following option would you consider to handle such
problem?
• Dimensionality Reduction
• Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C
244. Radial basis functions is closely related to distance-weighted regression,

but it is
A. lazy learning
B. eager learning
C. concept learning
D. none of these
Correct option is B
245. Radial basis function networks provide a global approximation to the

target function, represented by ______ of many local kernel functions.
A. a series combination
B. a linear combination
C. a parallel combination
D. a non linear combination
Correct option is B
246. The most significant phase in a genetic algorithm is

A. Crossover
B. Mutation
C. Selection
D. Fitness function
Correct option is A
247. The crossover operator produces two new offspring from

A. Two parent strings, by copying selected bits from each parent
B. One parent strings, by copying selected bits from selected parent
C. Two parent strings, by copying selected bits from one parent
D. None of these
Correct option is A
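The crossover operator in question 247 can be sketched as follows. This shows single-point crossover only, which is one of several crossover variants; the function name and the example bit strings are assumptions for illustration.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Produce two offspring by swapping the tails of two parent bit strings."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

children = single_point_crossover("11101001000", "00001010101", point=5)
print(children)  # ('11101010101', '00001001000')
```

Each offspring copies its first five bits from one parent and the remaining bits from the other, so both parents contribute selected bits to both children.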
248. Mathematically characterize the evolution over time of the population

within a GA based on the concept of
A. Schema

B. Crossover
C. Don't care
D. Fitness function
Correct option is A
249. In genetic algorithm process of selecting parents which mate and

recombine to create off-springs for the next generation is known as:
A. Tournament selection
B. Rank selection
C. Fitness sharing
D. Parent selection
Correct option is D
250. Crossover operations are performed in genetic programming by replacing

A. Randomly chosen sub tree of one parent program by a sub tree from the
other parent program.
B. Randomly chosen root node tree of one parent program by a sub tree from
the other parent program
C. Randomly chosen root node tree of one parent program by a root node tree
from the other parent program
D. None of these
Correct option is A

What is Machine Learning (ML)?
The autonomous acquisition of knowledge through the use of manual programs
The selective acquisition of knowledge through the use of computer programs
The selective acquisition of knowledge through the use of manual programs
The autonomous acquisition of knowledge through the use of computer programs
Correct option is D
Father of Machine Learning (ML)
Geoffrey Chaucer
Geoffrey Hill
Geoffrey Everest Hinton
None of the above
Correct option is C
Which is FALSE regarding regression?
It may be used for interpretation
It is used for prediction
It discovers causal relationships
It relates inputs to outputs
Correct option is C
Choose the correct option regarding machine learning (ML) and artificial intelligence (AI)
ML is a set of techniques that turns a dataset into a software
AI is a software that can emulate the human mind
ML is an alternate way of programming intelligent machines
All of the above

Correct option is D
Which of the following is not a factor that affects the performance of a learner system?
Good data structures
Representation scheme used
Training scenario
Type of feedback
Correct option is A
In general, to have a well-defined learning problem, we must identify which of the following
The class of tasks
The measure of performance to be improved
The source of experience
All of the above
Correct option is D
Successful applications of ML
Learning to recognize spoken words
Learning to drive an autonomous vehicle
Learning to classify new astronomical structures
Learning to play world-class backgammon
All of the above
Correct option is E
Which of the following does not include different learning methods
Analogy
Introduction
Memorization
Deduction

Correct option is B
In language understanding, which of the following is not a level of knowledge?
Empirical
Logical
Phonological
Syntactic
Correct option is A
Designing a machine learning approach involves:-
Choosing the type of training experience
Choosing the target function to be learned
Choosing a representation for the target function
Choosing a function approximation algorithm
All of the above
Correct option is E
Concept learning infers a ______-valued function from training examples of its input and output.
Decimal
Hexadecimal
Boolean
All of the above
Correct option is C
Which of the following is not a supervised learning?
Naïve Bayesian
PCA

Linear Regression
Decision Tree
Correct option is B
What is Machine Learning?
Artificial Intelligence
Deep Learning
Data Statistics
Only (i)
(i) And (ii)
All
None
Correct option is B
What kind of learning algorithm is used for “Facial identities or facial expressions”?
Prediction
Recognition Patterns
Generating Patterns
Recognizing Anomalies
Correct option is B
Which of the following is not type of learning?
Unsupervised Learning
Supervised Learning
Semi-unsupervised Learning
Reinforcement Learning
Correct option is C

Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot Navigation are applications of
which of the following
Supervised Learning: Classification
Unsupervised Learning: Clustering
Unsupervised Learning: Regression
Correct option is B
Targeted marketing, Recommender Systems, and Customer Segmentation are applications in which of
the following
Correct option is B
Fraud Detection, Image Classification, Diagnostic, and Customer Retention are applications in which of
the following
Correct option is B
Which of the following is not a symbolic function in the various function representations of Machine
Learning?
Rules in propositional Logic
Hidden-Markov Models (HMM)
Rules in first-order predicate logic
Decision Trees

Correct option is B
Which of the following is not a numerical function in the various function representations of Machine
Learning?
Neural Network
Support Vector Machines
Case-based
Linear Regression
Correct option is C
FIND-S Algorithm starts from the most specific hypothesis and generalize it by considering only
Negative
Positive
Negative or Positive
None of the above
Correct option is B
FIND-S algorithm ignores
Negative
Positive
Both
None of the above
Correct option is A
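The behaviour described in the two FIND-S questions above can be sketched in a few lines. This is a simplified illustration rather than a full implementation: `None` stands in for the most specific hypothesis, `"?"` means "any value", and the toy attributes (sky, temperature, wind) are assumptions.

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis and generalize it
    just enough to cover each positive example; negatives are ignored."""
    hypothesis = None  # None stands for the most specific hypothesis
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # negative examples are skipped entirely
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            # Keep matching constraints, relax mismatches to "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical toy data: (sky, temperature, wind), label
data = [
    (("sunny", "warm", "strong"), True),
    (("rainy", "cold", "strong"), False),
    (("sunny", "warm", "weak"), True),
]
print(find_s(data))  # ['sunny', 'warm', '?']
```

Because the negative example never influences the result, an inconsistent training set would go unnoticed, which is the drawback raised in a later question.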
The Candidate-Elimination Algorithm represents the ______.
Solution Space
Version Space
Elimination Space
All of the above

Correct option is B
Inductive learning is based on the knowledge that if something happens a lot it is likely to be generally
True
False
Correct option is A
Inductive learning takes examples and generalizes rather than starting with ______ knowledge.
Inductive
Existing
Deductive
None of these
Correct option is B
A drawback of the FIND-S is that it assumes the consistency within the training set
True
False
Correct option is A
What strategies can help reduce overfitting in decision trees?
Enforce a maximum depth for the tree
Enforce a minimum number of samples in leaf nodes
Pruning
Make sure each leaf node is one pure class
All
(i), (ii) and (iii)
(i), (iii), (iv)
None
Correct option is B

Which of the following is a widely used and effective machine learning algorithm based on the idea of
bagging?
Decision Tree
Random Forest
Regression
Classification
Correct option is B
To find the minimum or the maximum of a function, we set the gradient to zero because which of the
following
Depends on the type of problem
The value of the gradient at extrema of a function is always zero
Both (A) and (B)
None of these
Correct option is B
Which of the following is a disadvantage of decision trees?
Decision trees are prone to be overfit
Decision trees are robust to outliers
Factor analysis
None of the above
Correct option is A
What is perceptron?
A single layer feed-forward neural network with pre-processing
A neural network that contains feedback
A double layer auto-associative neural network
An auto-associative neural network

Correct option is A
Which of the following is true for neural networks?
The training time depends on the size of the
Neural networks can be simulated on a conventional
Artificial neurons are identical in operation to biological
All
Only (ii)
(i) And (ii)
None
Correct option is C
What are the advantages of neural networks over conventional computers?
They have the ability to learn by
They are more fault
They are more suited for real time operation due to their high ‘computational’
(i) and (ii)
(i) and (iii)
Only (i)
All
None
Correct option is D
What is Neuro software?
It is software used by Neurosurgeon
Designed to aid experts in real world
It is powerful and easy neural network

A software used to analyze neurons
Correct option is C
Which is true for neural networks?
Each node computes its weighted input
Node could be in excited state or non-excited state
It has set of nodes and connections
All of the above
Correct option is D
What is the objective of backpropagation algorithm?
To develop learning algorithm for multilayer feedforward neural network, so that network can be
trained to capture the mapping implicitly
To develop learning algorithm for multilayer feedforward neural network
To develop learning algorithm for single layer feedforward neural network
All of the above
Correct option is A
Which of the following is true?
Perform pattern recognition
Find the parity of a picture
Determine whether two or more shapes in a picture are connected or not
(ii) And (iii)
Only (ii)
All
None
Correct option is A

The backpropagation law is also known as generalized delta rule
True
False
Correct option is A
Which of the following is true?
On average, neural networks have higher computational rates than conventional computers.
Neural networks learn by
Neural networks mimic the way the human brain
All
(ii) and (iii)
(i), (ii) and (iii)
None
Correct option is A
What is true regarding backpropagation rule?
Error in output is propagated backwards only to determine weight updates
There is no feedback of signal at any stage
It is also called generalized delta rule
All of the above
Correct option is D
There is feedback in final stage of backpropagation
True
False
Correct option is B
An auto-associative network is

A neural network that has only one loop
A neural network that contains feedback
A single layer feed-forward neural network with pre-processing
A neural network that contains no loops
Correct option is B
A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the constant of
proportionality being equal to 3. The inputs are 4, 8 and 5 respectively. What will be the output?
139
153
162
160
Correct option is B
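The answer to the 3-input neuron question can be verified with a short calculation. A minimal sketch, assuming the linear transfer function is f(a) = k * a with k the constant of proportionality; the function name is hypothetical.

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with linear transfer function f(a) = k * a."""
    activation = sum(w * x for w, x in zip(weights, inputs))  # weighted sum
    return k * activation

output = linear_neuron(weights=[1, 4, 3], inputs=[4, 8, 5], k=3)
print(output)  # 153
```

The weighted sum is 1*4 + 4*8 + 3*5 = 51, and multiplying by the constant 3 gives 153.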
Which of the following is true regarding backpropagation rule?
Hidden layers output is not all important, they are only meant for supporting input and output layers
Actual output is determined by computing the outputs of units for each hidden layer
It is a feedback neural network
None of the above
Correct option is B
What is back propagation?
It is another name given to the curvy function in the perceptron
It is the transmission of error back through the network to allow weights to be adjusted so that the
network can learn
It is another name given to the curvy function in the perceptron
None of the above
Correct option is B

The general limitations of back propagation rule is/are
Scaling
Slow convergence
Local minima problem
All of the above
Correct option is D
What is the meaning of generalized in statement “backpropagation is a generalized delta rule” ?
Because delta is applied to only input and output layers, thus making it more simple and generalized
It has no significance
Because delta rule can be extended to hidden layer units
None of the above
Correct option is C
Neural Networks are complex ______ functions with many parameters
Linear
Non linear
Discrete
Exponential
Correct option is A
The general tasks that are performed with backpropagation algorithm
Pattern mapping
Prediction
Function approximation
All of the above
Correct option is D
Backpropagation learning is based on the gradient descent along error surface.

True
False
Correct option is A
In backpropagation rule, how to stop the learning process?
No heuristic criteria exist
On basis of average gradient value
There is convergence involved
None of these
Correct option is B
Applications of NN (Neural Network)
Risk management
Data validation
Sales forecasting
All of the above
Correct option is D
The network that involves backward links from output to the input and hidden layers is known as
Recurrent neural network
Self organizing maps
Perceptrons
Single layered perceptron
Correct option is A
Decision Tree is a display of an Algorithm?
True
False
Correct option is A

Which of the following is/are the decision tree nodes?
End Nodes
Decision Nodes
Chance Nodes
All of the above
Correct option is D
End Nodes are represented by which of the following
Solar street light
Triangles
Circles
Squares
Correct option is B
Decision Nodes are represented by which of the following
Solar street light
Triangles
Circles
Squares
Correct option is D
Chance Nodes are represented by which of the following
Solar street light
Triangles
Circles
Squares
Correct option is C

Advantage of Decision Trees
Possible Scenarios can be added
Use a white box model, if given result is provided by a model
Worst, best and expected values can be determined for different scenarios
All of the above
Correct option is D
______ terms are required for building a Bayes model.
Correct option is C
Which of the following is the consequence between a node and its predecessors while creating bayesian
network?
Conditionally independent
Functionally dependent
Both Conditionally dependent & Dependent
Dependent
Correct option is A
Why is it needed to make probabilistic systems feasible in the world?
Feasibility
Reliability
Crucial robustness
None of the above
Correct option is C

Bayes rule can be used for:-
Solving queries
Increasing complexity
Answering probabilistic query
Decreasing complexity
Correct option is C
______ provides ways and means of weighing up the desirability of goals and the likelihood of achieving them.
Utility theory
Decision theory
Bayesian networks
Probability theory
Correct option is A
Which of the following is provided by the Bayesian Network?
Complete description of the problem
Partial description of the domain
Complete description of the domain
All of the above
Correct option is C
Probability provides a way of summarizing the ______ that comes from our laziness and
Belief
Uncertainty
Joint probability distributions
Randomness
Correct option is B

The entries in the full joint probability distribution can be calculated as
Using variables
Both Using variables & information
Using information
All of the above
Correct option is C
Causal chain (For example, Smoking cause cancer) gives rise to:-
Conditionally Independence
Conditionally Dependence
Both
None of the above
Correct option is A
The bayesian network can be used to answer any query by using:-
Full distribution
Joint distribution
Partial distribution
All of the above
Correct option is B
Bayesian networks allow compact specification of:-
Joint probability distributions
Belief
Propositional logic statements
All of the above
Correct option is A
The compactness of the bayesian network can be described by

Fully structured
Locally structured
Partially structured
All of the above
Correct option is B
The Expectation-Maximization Algorithm has been used to identify conserved domains in unaligned
proteins only. State True or False.
True
False
Correct option is B
Which of the following is correct about the Naïve Bayes?
Assumes that all the features in a dataset are independent
Assumes that all the features in a dataset are equally important
Both
All of the above
Correct option is C
Which of the following is false regarding EM Algorithm?
The alignment provides an estimate of the base or amino acid composition of each column in the site
The column-by-column composition of the site already available is used to estimate the probability of
finding the site at any position in each of the sequences
The row-by-column composition of the site already available is used to estimate the probability
None of the above
Correct option is C
Naïve Bayes Algorithm is a ______ learning algorithm.
Supervised

Reinforcement
Unsupervised
None of these
Correct option is A
EM algorithm includes two repeated steps, here the step 2 is ______.
The normalization
The maximization step
The minimization step
None of the above
Correct option is B
Examples of Naïve Bayes Algorithm is/are
Spam filtration
Sentimental analysis
Classifying articles
All of the above
Correct option is D
In the intermediate steps of “EM Algorithm”, the number of each base in each column is determined
and then converted to
True
False
Correct option is A
Naïve Bayes algorithm is based on ______ and used for solving classification problems.
Bayes Theorem
Candidate elimination algorithm
EM algorithm

None of the above
Correct option is A
Types of Naïve Bayes Model:
Gaussian
Multinomial
Bernoulli
All of the above
Correct option is D
Disadvantages of Naïve Bayes Classifier:
Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between
It performs well in Multi-class predictions as compared to the other
Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
It is the most popular choice for text classification problems.
Correct option is A
The benefit of Naïve Bayes:-
Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
It is the most popular choice for text classification problems.
It can be used for Binary as well as Multi-class
All of the above
Correct option is D
In which of the following types of sampling the information is carried out under the opinion of an
expert?
Convenience sampling
Judgement sampling

Quota sampling
Purposive sampling
Correct option is B
Full form of MDL?
Minimum Description Length
Maximum Description Length
Minimum Domain Length
None of these
Correct option is A
For the analysis of ML algorithms, we need
Computational learning theory
Statistical learning theory
Both A & B
None of these
Correct option is C
PAC stand for
Probably Approximately Correct
Probably Approx Correct
Probably Approximate Computation
Probably Approx Computation
Correct option is A
The ______ of hypothesis h with respect to target concept c and distribution D is the probability that h
will misclassify an instance drawn at random according to D.
True Error

Type 1 Error
Type 2 Error
None of these
Correct option is A
Statement: True error defined over entire instance space, not just training data
True
False
Correct option is A
What areas does CLT (computational learning theory) comprise?
Sample Complexity
Computational Complexity
Mistake Bound
All of these
Correct option is D
What area of CLT tells “How many examples we need to find a good hypothesis ?”?
Sample Complexity
Mistake Bound
None of these
Correct option is A
What area of CLT tells “How much computational power we need to find a good hypothesis ?”?
Sample Complexity
Mistake Bound
None of these

Correct option is B
What area of CLT tells “How many mistakes we will make before finding a good hypothesis ?”?
Sample Complexity
Mistake Bound
None of these
Correct option is C
(For question no. 9 and 10) Can we say that concept described by conjunctions of Boolean literals are
PAC learnable?
Yes
No
Correct option is A
How large is the hypothesis space when we have n Boolean attributes?
|H| = 3^n
|H| = 2^n
|H| = 1^n
|H| = 4^n
Correct option is A
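The count in the question above can be checked by brute force for small n. This sketch assumes the hypotheses are conjunctions of Boolean literals, as in the preceding question, so each attribute is either required true, required false, or left unconstrained; the function name is hypothetical.

```python
from itertools import product

def count_conjunctions(n):
    """Count conjunctions over n Boolean attributes, where each attribute
    appears as a positive literal, a negated literal, or not at all."""
    states = ("positive", "negated", "absent")
    return sum(1 for _ in product(states, repeat=n))

for n in range(1, 5):
    assert count_conjunctions(n) == 3 ** n

print(count_conjunctions(4))  # 81
```

Three independent choices per attribute give 3 * 3 * ... * 3 hypotheses, which is where the exponent comes from.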
The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis space H2. Which
of the following can be inferred from this?
The number of examples required for learning a hypothesis in H1 is larger than the number of examples
required for H2
The number of examples required for learning a hypothesis in H1 is smaller than the number of
examples required for
No relation to number of samples required for PAC learning.
Correct option is A

For a particular learning task, if the requirement of error parameter changes from 0.1 to 0.01. How
many more samples will be required for PAC learning?
Same
2 times
1000 times
10 times
Correct option is D
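The factor of 10 in the question above follows from the 1/epsilon term in the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m >= (ln|H| + ln(1/delta)) / epsilon. A minimal sketch; the values of delta and |H| below are arbitrary assumptions and cancel out of the ratio.

```python
import math

def sample_bound(epsilon, delta, hypothesis_count):
    """Number of samples sufficient for a consistent learner over a finite
    hypothesis space: (ln|H| + ln(1/delta)) / epsilon."""
    return (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon

# delta and |H| are hypothetical values chosen for illustration
m_loose = sample_bound(epsilon=0.1, delta=0.05, hypothesis_count=3 ** 10)
m_tight = sample_bound(epsilon=0.01, delta=0.05, hypothesis_count=3 ** 10)
print(round(m_tight / m_loose))  # 10
```

Only epsilon changes between the two calls, so the bound scales by 0.1 / 0.01 = 10 regardless of the other parameters.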
Computational complexity of classes of learning problems depends on which of the following?
The size or complexity of the hypothesis space considered by learner
The accuracy to which the target concept must be approximated
The probability that the learner will output a successful hypothesis
All of these
Correct option is D
The instance-based learner is a
Lazy-learner
Eager learner
Can't say
Correct option is A
When to consider nearest neighbour algorithms?
Instances map to points in R^n
Not more than 20 attributes per instance
Lots of training data
None of these
A, B & C
Correct option is E

What are the advantages of the nearest neighbour algorithm?
Training is very fast
Can learn complex target functions
Don't lose information
All of these
Correct option is D
What are the difficulties with the k-nearest neighbour algorithm?
Calculate the distance of the test case from all training cases
Curse of dimensionality
Both A & B
None of these
Correct opt

CS 189 Spring 2016
Introduction to Machine Learning
Final
• Please do not open the exam before you are instructed to do so.
• The exam is closed book, closed notes except your two-page cheat sheet.
• Electronic devices are forbidden on your person, including cell phones, iPods, headphones, and laptops.
Turn your cell phone off and leave all electronics at the front of the room, or risk getting a zero on
the exam.
• You have 3 hours.
• Please write your initials at the top right of each page (e.g., write “JS” if you are Jonathan Shewchuk). Finish
this by the end of your 3 hours.
• Mark your answers on front of each page, not the back. We will not scan the backs of each page, but you may
use them as scratch paper. Do not attach any extra sheets.
• The total number of points is 150. There are 30 multiple choice questions worth 3 points each, and 6 written
questions worth a total of 60 points.
• For multiple-choice questions, fill in the boxes for ALL correct choices: there may be more than one correct
choice, but there is always at least one correct choice. NO partial credit on multiple-choice questions: the
set of all correct answers must be checked.
First name
Last name
SID
First and last name of student to your left
First and last name of student to your right

Q1. [90 pts] Multiple Choice

Check the boxes for ALL CORRECT CHOICES. Every question should have at least one box checked. NO PARTIAL
CREDIT: the set of all correct answers (only) must be checked.
(1) [3 pts] What strategies can help reduce overfitting in decision trees?
Pruning
Enforce a minimum number of samples in leaf nodes
Make sure each leaf node is one pure class
Enforce a maximum depth for the tree
(2) [3 pts] Which of the following are true of convolutional neural networks (CNNs) for image analysis?
Filters in earlier layers tend to include edge detectors
They have more parameters than fully-connected networks with the same number of layers and the same numbers of neurons in each layer
Pooling layers reduce the spatial resolution of the image
A CNN can be trained for unsupervised learning tasks, whereas an ordinary neural net cannot
(3) [3 pts] Neural networks
optimize a convex cost function
always output values between 0 and 1
can be used for regression as well as classification
can be used in an ensemble
(4) [3 pts] Which of the following are true about generative models?
They model the joint distribution P(class = C AND sample = x)
The perceptron is a generative model
Linear discriminant analysis is a generative model
They can be used for classification
(5) [3 pts] Lasso can be interpreted as least-squares linear regression where
weights are regularized with the ℓ1 norm
the weights have a Gaussian prior
weights are regularized with the ℓ2 norm
the solution algorithm is simpler
(6) [3 pts] Which of the following methods can achieve zero training error on any linearly separable dataset?
Decision tree
15-nearest neighbors
Hard-margin SVM
Perceptron
(7) [3 pts] The kernel trick
can be applied to every classification algorithm
is commonly used for dimensionality reduction
changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points

(8) [3 pts] Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with
exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum
possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?
2
n
3
n + 1
(9) [3 pts] In latent semantic indexing, we compute a low-rank approximation to a term-document matrix. Which
of the following motivate the low-rank reconstruction?
Finding documents that are related to each other, e.g. of a similar genre
The low-rank approximation provides a lossless method for compressing an input matrix
In many applications, some principal components encode noise rather than meaningful structure
Low-rank approximation enables discovery of nonlinear relations
(10) [3 pts] Which of the following are true about subset selection?
Subset selection can substantially decrease the bias of support vector machines
Subset selection can reduce overfitting
Ridge regression frequently eliminates some of the features
Finding the true best subset takes exponential time
(11) [3 pts] In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU
speed up the gradient calculation in backpropagation, as compared to linear units
help to learn nonlinear decision boundaries
are applied only to the output units
always output values between 0 and 1
(12) [3 pts] Suppose we are given data comprising points of several different classes. Each class has a different
probability distribution from which the sample points are drawn. We do not have the class labels. We use
k-means clustering to try to guess the classes. Which of the following circumstances would undermine its
effectiveness?
Some of the classes are not normally distributed
The variance of each distribution is small in all directions
Each class has the same mean
You choose k = n, the number of sample points
(13) [3 pts] Which of the following are true of spectral graph partitioning methods?
They find the cut with minimum weight
They minimize a quadratic function subject to one constraint: the partition must be balanced
They use one or more eigenvectors of the Laplacian matrix
The Normalized Cut was invented at Stanford
(14) [3 pts] Which of the following can help to reduce overfitting in an SVM classifier?
Use of slack variables
High-degree polynomial features
Normalizing the data
Setting a very low learning rate

(15) [3 pts] Which value of k in the k-nearest neighbors algorithm generates the solid decision boundary depicted
here? There are only 2 classes. (Ignore the dashed line, which is the Bayes decision boundary.)
k = 1
k = 2
k = 10
k = 100
(16) [3 pts] Consider one layer of weights (edges) in a convolutional neural network (CNN) for grayscale images,
connecting one layer of units to the next layer of units. Which type of layer has the fewest parameters to be
learned during training? (Select one.)
A convolutional layer with 10 3 × 3 filters
A convolutional layer with 8 5 × 5 filters
A max-pooling layer that reduces a 10 × 10 image to 5 × 5
A fully-connected layer from 20 hidden units to 4 output units
(17) [3 pts] In the kernelized perceptron algorithm with learning rate ε = 1, the coefficient a_i corresponding to a
training example x_i represents the weight for K(x_i, x). Suppose we have a two-class classification problem with
y_i ∈ {1, −1}. If y_i = 1, which of the following can be true for a_i?
a_i = −1
a_i = 1
a_i = 0
a_i = 5
(18) [3 pts] Suppose you want to split a graph G into two subgraphs. Let L be G’s Laplacian matrix. Which of the
following could help you find a good split?
The eigenvector corresponding to the second-largest eigenvalue of L
The left singular vector corresponding to the second-largest singular value of L
The eigenvector corresponding to the second-smallest eigenvalue of L
The left singular vector corresponding to the second-smallest singular value of L
(19) [3 pts] Which of the following are properties that a kernel matrix always has?
Invertible
All the entries are positive
At least one negative eigenvalue
Symmetric

(20) [3 pts] How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary
least squares regression? (Select one.)
Ridge has larger bias, larger variance
Ridge has smaller bias, larger variance
Ridge has larger bias, smaller variance
Ridge has smaller bias, smaller variance
(21) [3 pts] Both PCA and Lasso can be used for feature selection. Which of the following statements are true?
Lasso selects a subset (not necessarily a strict subset) of the original features
PCA and Lasso both allow you to specify how many features are chosen
PCA produces features that are linear combinations of the original features
PCA and Lasso are the same if you use the kernel trick
(22) [3 pts] Which of the following are true about forward subset selection?
O(2^d) models must be trained during the algorithm, where d is the number of features
It finds the subset of features that give the lowest test error
It greedily adds the feature that most improves cross-validation accuracy
Forward selection is faster than backward selection if few features are relevant to prediction
(23) [3 pts] You’ve just finished training a random forest for spam classification, and it is getting abnormally bad
performance on your validation set, but good performance on your training set. Your implementation has no
bugs. What could be causing the problem?
Your decision trees are too deep
You have too few trees in your ensemble
You are randomly sampling too many features when you choose a split
Your bagging implementation is randomly sampling sample points without replacement
(24) [3 pts] Consider training a decision tree given a design matrix X and labels y, where
$$X = \begin{bmatrix} 6 & 3 \\ 2 & 7 \\ 9 & 6 \\ 4 & 2 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$
Let f1 denote feature 1, corresponding to the first column of X, and let f2 denote feature 2, corresponding to the second
column. Which of the following splits at the root node gives the highest information gain? (Select one.)
f1 > 2
f2 > 3
f1 > 4
f2 > 6
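The information gain of each candidate split in question (24) can be checked numerically. The sketch below assumes the data reads X = [[6, 3], [2, 7], [9, 6], [4, 2]] with labels y = [1, 0, 1, 0], and uses entropy in bits; the helper names `entropy` and `information_gain` are illustrative, not part of the exam.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value > threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(labels) - weighted

# Assumed reading of the design matrix and labels from question (24).
X = [[6, 3], [2, 7], [9, 6], [4, 2]]
y = [1, 0, 1, 0]
f1 = [row[0] for row in X]
f2 = [row[1] for row in X]

gains = {
    "f1 > 2": information_gain(f1, y, 2),
    "f2 > 3": information_gain(f2, y, 3),
    "f1 > 4": information_gain(f1, y, 4),
    "f2 > 6": information_gain(f2, y, 6),
}
best_split = max(gains, key=gains.get)
```

Under these assumptions the split f1 > 4 separates the two classes perfectly, so it attains the full 1 bit of information gain, while f2 > 3 gains nothing.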
(25) [3 pts] In terms of the bias-variance decomposition, a 1-nearest neighbor classifier has __________ than a
3-nearest neighbor classifier.
higher variance
higher bias
lower variance
lower bias

(26) [3 pts] Which of the following are true about bagging?
In bagging, we choose random subsamples of the input points with replacement
The main purpose of bagging is to decrease the bias of learning algorithms.
Bagging is ineffective with logistic regression, because all of the learners learn exactly the same decision boundary
If we use decision trees that have one sample point per leaf, bagging never gives lower training error than one ordinary decision tree
(27) [3 pts] An advantage of searching for an approximate nearest neighbor, rather than the exact nearest neighbor,
is that
it sometimes makes exhaustive search much faster
the nearest neighbor classifier is sometimes much more accurate
it sometimes makes searching in a k-d tree much faster
you find all the points within a distance of (1 + ε)r from the query point, where r is the distance from the query point to its nearest neighbor
(28) [3 pts] In the derivation of the spectral graph partitioning algorithm, we relax a combinatorial optimization
problem to a continuous optimization problem. This relaxation has the following effects.
The combinatorial problem requires an exact bisection of the graph, but the continuous algorithm can produce (after rounding) partitions that aren’t perfectly balanced
The combinatorial problem requires finding eigenvectors, whereas the continuous problem requires only matrix multiplication
The combinatorial problem cannot be modified to accommodate vertices that have different masses, whereas the continuous problem can
The combinatorial problem is NP-hard, but the continuous problem can be solved in polynomial time
(29) [3 pts] The firing rate of a neuron
determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
is more analogous to the output of a unit in a neural net than the output voltage of the neuron
only changes very slowly, taking a period of several seconds to make large adjustments
can sometimes exceed 30,000 action potentials per second
(30) [3 pts] In algorithms that use the kernel trick, the Gaussian kernel
gives a regression function or predictor function that is a linear combination of Gaussians centered at the sample points
is equivalent to lifting the d-dimensional sample points to points in a space whose dimension is exponential in d
is less prone to oscillating than polynomials, assuming the variance of the Gaussians is large
has good properties in theory but is rarely used in practice
(31) 3 bonus points! The following Berkeley professors were cited in this semester’s lectures (possibly self-cited)
for specific research contributions they made to machine learning.
David Culler
Michael Jordan
Jitendra Malik
Leo Breiman
Anca Dragan
Jonathan Shewchuk

Q2. [8 pts] Feature Selection

A newly employed former CS 189/289A student trains the latest Deep Learning classifier and obtains state-of-the-art
accuracy. However, the classifier uses too many features! The boss is overwhelmed and asks for a model with fewer
features.
Let’s try to identify the most important features. Start with a simple dataset in R^2.
(1) [4 pts] Describe the training error of a Bayes optimal classifier that can see only the first feature of the data.
Describe the training error of a Bayes optimal classifier that can see only the second feature.
The first feature yields a training error of 50% (like random guessing). The second feature offers a training error of
zero.
(2) [4 pts] Based on this toy example, the student decides to fit a classifier on each feature individually, then
rank the features by their classifier’s accuracy, take the best k features, and train a new classifier on those k
features. We call this approach variable ranking. Unfortunately, the classifier trained on the best k features
obtains horrible accuracy, unless k is very close to d, the original number of features!
Construct a toy dataset in R^2 for which variable ranking fails. In other words, a dataset where a variable is
useless by itself, but potentially useful alongside others. Use + for data points in Class 1, and O for data points
in Class 2.
An XOR dataset is unpredictable from either feature alone. (This extends to n dimensions, with the n-bit parity string.)
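The XOR answer can be checked by brute force. The sketch below assumes the four corner points of the unit square as the toy dataset (one possible XOR layout) and enumerates every classifier that sees a single binary feature; the function name is illustrative.

```python
from itertools import product

# Assumed XOR dataset in R^2: label 1 (class +) when exactly one coordinate is 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in points]

def best_single_feature_accuracy(feature_index):
    """Best training accuracy of any classifier that sees one feature only.

    With a binary feature there are only four such classifiers: each maps
    the feature values {0, 1} to a label in {0, 1}.
    """
    best = 0.0
    for out0, out1 in product([0, 1], repeat=2):
        predict = {0: out0, 1: out1}
        correct = sum(predict[p[feature_index]] == y
                      for p, y in zip(points, labels))
        best = max(best, correct / len(points))
    return best

acc_f1 = best_single_feature_accuracy(0)
acc_f2 = best_single_feature_accuracy(1)

# With both features available, the rule "x1 XOR x2" classifies every point.
acc_both = sum((a ^ b) == y for (a, b), y in zip(points, labels)) / len(points)
```

Each feature alone reaches only 50% accuracy, so variable ranking would discard both, even though together they classify the data perfectly.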

Q3. [10 pts] Gradient Descent for k-means Clustering

Recall the loss function for k-means clustering with k clusters, sample points $x_1, \ldots, x_n$, and centers $\mu_1, \ldots, \mu_k$:
$$L = \sum_{j=1}^{k} \sum_{x_i \in S_j} \|x_i - \mu_j\|^2,$$
where $S_j$ refers to the set of data points that are closer to $\mu_j$ than to any other cluster mean.
(1) [4 pts] Instead of updating $\mu_j$ by computing the mean, let’s minimize L with batch gradient descent while
holding the sets $S_j$ fixed. Derive the update formula for $\mu_1$ with learning rate (step size) $\epsilon$.
$$\frac{\partial L}{\partial \mu_1} = \frac{\partial}{\partial \mu_1} \sum_{x_i \in S_1} (x_i - \mu_1)^\top (x_i - \mu_1) = \sum_{x_i \in S_1} 2(\mu_1 - x_i).$$
Therefore the update formula is
$$\mu_1 \leftarrow \mu_1 + \epsilon \sum_{x_i \in S_1} (x_i - \mu_1).$$
(Note: writing $2\epsilon$ instead of $\epsilon$ is fine.)
(2) [2 pts] Derive the update formula for $\mu_1$ with stochastic gradient descent on a single sample point $x_i$. Use
learning rate $\epsilon$.
$\mu_1 \leftarrow \mu_1 + \epsilon (x_i - \mu_1)$ if $x_i \in S_1$, otherwise no change.
(3) [4 pts] In this part, we will connect the batch gradient descent update equation with the standard k-means
algorithm. Recall that in the update step of the standard algorithm, we assign each cluster center to be the
mean (centroid) of the data points closest to that center. It turns out that a particular choice of the learning
rate $\epsilon$ (which may be different for each cluster) makes the two algorithms (batch gradient descent and the
standard k-means algorithm) have identical update steps. Let’s focus on the update for the first cluster, with
center $\mu_1$. Calculate the value of $\epsilon$ so that both algorithms perform the same update for $\mu_1$. (If you do it right,
the answer should be very simple.)
In the standard algorithm, we assign $\mu_1 \leftarrow \sum_{x_i \in S_1} \frac{1}{|S_1|} x_i$.
Comparing to the answer in (1), we set $\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i = \mu_1 + \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$ and solve for $\epsilon$.
$$\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i - \sum_{x_i \in S_1} \frac{1}{|S_1|} \mu_1 = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$$
$$\sum_{x_i \in S_1} \frac{1}{|S_1|} (x_i - \mu_1) = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1).$$
Thus $\epsilon = \frac{1}{|S_1|}$.
(Note: answers that differ by a constant factor are fine if consistent with answer for (1).)
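A quick numerical check of part (3): with the step size set to 1/|S_1|, one batch gradient step lands exactly on the centroid, whatever the starting center. This is a minimal sketch assuming NumPy; the points and the starting center are made-up values, and `batch_gd_step` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
S1 = rng.normal(size=(7, 2))      # hypothetical points assigned to cluster 1
mu1 = np.array([3.0, -1.0])       # arbitrary current center

def batch_gd_step(mu, points, eps):
    """One step mu <- mu + eps * sum_i (x_i - mu), with S_1 held fixed."""
    return mu + eps * np.sum(points - mu, axis=0)

mu_gd = batch_gd_step(mu1, S1, eps=1.0 / len(S1))   # epsilon = 1 / |S_1|
mu_kmeans = S1.mean(axis=0)                         # standard k-means update
```

The two results agree up to floating-point rounding, which matches the derivation: the sum of (x_i − μ_1) scaled by 1/|S_1| is exactly the centroid minus μ_1.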

Q4. [10 pts] Kernels

(1) [2 pts] What is the primary motivation for using the kernel trick in machine learning algorithms?
If we want to map sample points to a very high-dimensional feature space, the kernel trick can save us from
having to compute those features explicitly, thereby saving a lot of time.
(Alternative solution: the kernel trick enables the use of infinite-dimensional feature spaces.)
(2) [4 pts] Prove that for every design matrix $X \in \mathbb{R}^{n \times d}$, the corresponding kernel matrix is positive semidefinite.
For every vector $z \in \mathbb{R}^n$,
$$z^\top K z = z^\top X X^\top z = |X^\top z|^2,$$
which is clearly nonnegative.
(3) [2 pts] Suppose that a regression algorithm contains the following line of code.
$$w \leftarrow w + X^\top M X X^\top u$$
Here, $X \in \mathbb{R}^{n \times d}$ is the design matrix, $w \in \mathbb{R}^d$ is the weight vector, $M \in \mathbb{R}^{n \times n}$ is a matrix unrelated to X,
and $u \in \mathbb{R}^n$ is a vector unrelated to X. We want to derive a dual version of the algorithm in which we express
the weights w as a linear combination of samples $X_i$ (rows of X) and a dual weight vector a contains the
coefficients of that linear combination. Rewrite the line of code in its dual form so that it updates a correctly
(and so that w does not appear).
$$a \leftarrow a + M X X^\top u$$
(4) [2 pts] Can this line of code for updating a be kernelized? If so, show how. If not, explain why.
Yes:
$$a \leftarrow a + M K u$$
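To see that the dual update tracks the primal one, the sketch below draws random X, M, u, and a (all made-up values), sets w = Xᵀa, applies both updates, and compares. It assumes NumPy and the linear kernel K = XXᵀ used in the solution above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 3
X = rng.normal(size=(n, d))   # design matrix, rows are sample points
M = rng.normal(size=(n, n))   # matrix unrelated to X
u = rng.normal(size=n)        # vector unrelated to X

a = rng.normal(size=n)        # dual weights
w = X.T @ a                   # primal weights as a combination of the rows of X
K = X @ X.T                   # linear kernel matrix

w_new = w + X.T @ M @ X @ X.T @ u   # primal update from part (3)
a_new = a + M @ K @ u               # kernelized dual update from part (4)

# Smallest eigenvalue of K, to check positive semidefiniteness (part 2).
min_eig = np.linalg.eigvalsh(K).min()
```

After the update, Xᵀ a_new reproduces w_new, and the smallest eigenvalue of K is nonnegative up to rounding. Replacing K with another kernel matrix changes only the line that builds K.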

Q5. [12 pts] Let’s PCA

 
You are given a design matrix
$$X = \begin{bmatrix} 6 & -4 \\ -3 & 5 \\ -2 & 6 \\ 7 & -3 \end{bmatrix}.$$
Let’s use PCA to reduce the dimension from 2 to 1.
(1) [6 pts] Compute the covariance matrix for the sample points. (Warning: Observe that X is not centered.)
Then compute the unit eigenvectors, and the corresponding eigenvalues, of the covariance matrix. Hint: If
you graph the points, you can probably guess the eigenvectors (then verify that they really are eigenvectors).

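After centering X (the column means are 2 and 1), the covariance matrix is
$$X^\top X = \begin{bmatrix} 82 & -80 \\ -80 & 82 \end{bmatrix}.$$
Its unit eigenvectors are $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ with eigenvalue 2 and $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$ with eigenvalue 162. (Note: either eigenvector
can be replaced with its negation.)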
(2) [3 pts] Suppose we use PCA to project the sample points onto a one-dimensional space. What one-dimensional
subspace are we projecting onto? For each of the four sample points in X (not the centered version of X!),
write the coordinate (in principal coordinate space, not in R^2) that the point is projected to.
We are projecting onto the subspace spanned by $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$. (Equivalently, onto the space spanned by $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Equivalently,
onto the line x + y = 0.) The projections are $(6, -4) \to \frac{10}{\sqrt{2}}$, $(-3, 5) \to -\frac{8}{\sqrt{2}}$, $(-2, 6) \to -\frac{8}{\sqrt{2}}$, $(7, -3) \to \frac{10}{\sqrt{2}}$.
(3) [3 pts] Given a design matrix X that is taller than it is wide, prove that every right singular vector of X with
singular value $\sigma$ is an eigenvector of the covariance matrix with eigenvalue $\sigma^2$.
If v is a right singular vector of X, then there is a singular value decomposition $X = U D V^\top$ such that v is a column
of V. Here each of U and V has orthonormal columns, V is square, and D is square and diagonal. The covariance
matrix is $X^\top X = V D U^\top U D V^\top = V D^2 V^\top$. This is an eigendecomposition of $X^\top X$, so each singular vector in V
with singular value $\sigma$ is an eigenvector of $X^\top X$ with eigenvalue $\sigma^2$.
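The numbers in this question can be reproduced with a few lines of NumPy. This is a sketch, not part of the exam solution: it uses the unnormalized covariance XᵀX of the centered points, as the solution does, and fixes the sign of the principal direction to match [1, −1]/√2.

```python
import numpy as np

X = np.array([[6.0, -4.0], [-3.0, 5.0], [-2.0, 6.0], [7.0, -3.0]])
Xc = X - X.mean(axis=0)            # center the sample points
cov = Xc.T @ Xc                    # unnormalized covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
v = eigvecs[:, -1]                 # principal direction (largest eigenvalue)
if v[0] < 0:                       # either sign is valid; pick [1, -1] / sqrt(2)
    v = -v

coords = X @ v                     # project the uncentered points, as part (2) asks

# Part (3): squared singular values of the centered X equal the eigenvalues.
sing = np.linalg.svd(Xc, compute_uv=False)
```

Here `eigvals` comes out as 2 and 162, `coords` as 10/√2, −8/√2, −8/√2, 10/√2, and the squared singular values in `sing` match the eigenvalues.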
Q6. [10 pts] Trees

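[Figure: two depictions of the same k-d tree on sample points numbered 1 to 17. At left, the points in the plane with their splitting lines and the query point marked by a bold X; at right, the tree itself.]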
(1) [5 pts] Above, we have two depictions of the same k-d tree, which we have built to solve nearest neighbor
queries. Each node of the tree at right represents a rectangular box at left, and also stores one of the sample
points that lie inside that box. (The root node represents the whole plane R^2.) If a treenode stores sample point
i, then the line passing through point i (in the diagram at left) determines which boxes the child treenodes
represent.
Simulate running an exact 1-nearest neighbor query, where the bold X is the query point. Recall that the query
algorithm visits the treenodes in a smart order, and keeps track of the nearest point it has seen so far.
• Write down the numbers of all the sample points that serve as the “nearest point seen so far” sometime
while the query algorithm is running, in the order they are encountered.
• Circle all the subtrees in the k-d tree at upper right that are never visited during this query. (This is why
k-d tree search is usually faster than exhaustive search.)
Nearest point seen so far: first 5, then 12, then 10.
The unvisited subtrees are rooted at 2, 13, 7, and 17.
(2) [5 pts] We are building a decision tree for a 2-class classification problem. We have n training points, each having
d real-valued features. At each node of the tree, we try every possible univariate split (i.e. for each feature, we
try every possible splitting value for that feature) and choose the split that maximizes the information gain.
Explain why it is possible to build the tree in O(ndh) time, where h is the depth of the tree’s deepest node.
Your explanation should include an analysis of the time to choose one node’s split. Assume that we can radix
sort real numbers in linear time.
Consider choosing the split at a node whose box contains n′ sample points. For each of the d features, we can radix
sort the sample points by that feature in O(n′) time. Then we can compute the entropy for the first split (separating
the first sample in the sorted list from the others) in O(n′) time, then walk through the list and update the entropy
for each successive split in O(1) time, for a total of O(n′) time for each of the d features. So it takes O(n′d) time
overall to choose a split.
Each sample point participates in at most h treenodes, so each sample point contributes at most O(dh) to the running
time, for a total running time of at most O(ndh).
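The per-node argument can be sketched in code: sort one feature, then sweep left to right while keeping running class counts, so each candidate split is scored in O(1). The sketch below is illustrative only. It uses Python's comparison sort as a stand-in for the radix sort the question assumes, takes 0/1 labels, and minimizes the weighted child entropy, which is equivalent to maximizing information gain; all function names are hypothetical.

```python
import math

def binary_entropy(pos, total):
    """Entropy (bits) of a set with `pos` positives out of `total` points."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split_one_feature(values, labels):
    """Return (threshold, weighted_entropy) of the best split on one feature."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stand-in for radix sort
    total_pos = sum(labels)
    left_pos = 0
    best = (None, float("inf"))
    for rank in range(n - 1):
        i, nxt = order[rank], order[rank + 1]
        left_pos += labels[i]             # O(1) update of the running count
        if values[i] == values[nxt]:
            continue                      # cannot split between equal values
        left_n, right_n = rank + 1, n - rank - 1
        h = (left_n * binary_entropy(left_pos, left_n)
             + right_n * binary_entropy(total_pos - left_pos, right_n)) / n
        if h < best[1]:
            best = ((values[i] + values[nxt]) / 2, h)
    return best

def best_split(X, y):
    """Try all d features; assumes at least one feature can be split."""
    candidates = []
    for j in range(len(X[0])):
        thr, h = best_split_one_feature([row[j] for row in X], y)
        if thr is not None:
            candidates.append((h, j, thr))
    h, j, thr = min(candidates)
    return j, thr, h
```

For example, `best_split([[6, 3], [2, 7], [9, 6], [4, 2]], [1, 0, 1, 0])` picks feature 0 with a threshold halfway between 4 and 6, which leaves both children pure.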
Q7. [10 pts] Self-Driving Cars and Backpropagation

You want to train a neural network to drive a car. Your training data consists of grayscale 64 × 64 pixel images. The
training labels include the human driver’s steering wheel angle in degrees and the human driver’s speed in miles per
hour. Your neural network consists of an input layer with 64 × 64 = 4,096 units, a hidden layer with 2,048 units,
and an output layer with 2 units (one for steering angle, one for speed). You use the ReLU activation function for
the hidden units and no activation function for the outputs (or inputs).
(1) [2 pts] Calculate the number of parameters (weights) in this network. You can leave your answer as an
expression. Be sure to account for the bias terms.
4097 × 2048 + 2049 × 2
(2) [3 pts] You train your network with the cost function $J = \frac{1}{2} |y - z|^2$. Use the following notation.
• x is a training image (input) vector with a 1 component appended to the end, y is a training label (input)
vector, and z is the output vector. All vectors are column vectors.
• r(γ) = max{0, γ} is the ReLU activation function, r′ (γ) is its derivative (1 if γ > 0, 0 otherwise), and
r(v) is r(·) applied component-wise to a vector.
• g is the vector of hidden unit values before the ReLU activation functions are applied, and h = r(g) is
the vector of hidden unit values after they are applied (but we append a 1 component to the end of h).
• V is the weight matrix mapping the input layer to the hidden layer; g = V x.
• W is the weight matrix mapping the hidden layer to the output layer; z = W h.
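Derive $\partial J / \partial W_{ij}$.
$$\frac{\partial J}{\partial W_{ij}} = (z - y)^\top \frac{\partial z}{\partial W_{ij}} = (z_i - y_i) h_j$$
(3) [1 pt] Write $\partial J / \partial W$ as an outer product of two vectors. $\partial J / \partial W$ is a matrix with the same dimensions as W;
it’s just like a gradient, except that W and $\partial J / \partial W$ are matrices rather than vectors.
$$\frac{\partial J}{\partial W} = (z - y) h^\top$$
(4) [4 pts] Derive $\partial J / \partial V_{ij}$.
$$\frac{\partial J}{\partial V_{ij}} = (z - y)^\top \frac{\partial z}{\partial V_{ij}}$$
$$= (z - y)^\top W \frac{\partial h}{\partial V_{ij}}$$
$$= (z - y)^\top W [0, \ldots, r'(g_i)\, x_j, \ldots, 0]^\top$$
$$= ((z - y)^\top W)_i \, r'(g_i)\, x_j.$$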
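The two gradient formulas can be sanity-checked against finite differences on a tiny network. The sketch below assumes NumPy and uses made-up sizes (5 inputs, 4 hidden units, 2 outputs) in place of 4,096, 2,048, and 2; `numeric_grad` is an illustrative helper. The last column of W multiplies the appended 1 in h, which does not depend on V, so it is left out of the V gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 4, 2                  # tiny stand-ins for the real sizes

x = np.append(rng.normal(size=n_in), 1.0)     # input with a 1 appended
y = rng.normal(size=n_out)
V = rng.normal(size=(n_hid, n_in + 1))
W = rng.normal(size=(n_out, n_hid + 1))

def forward(V, W):
    g = V @ x
    h = np.append(np.maximum(g, 0.0), 1.0)    # ReLU, then append the bias 1
    z = W @ h
    return g, h, z

def cost(V, W):
    _, _, z = forward(V, W)
    return 0.5 * np.sum((y - z) ** 2)

g, h, z = forward(V, W)
grad_W = np.outer(z - y, h)                        # dJ/dW = (z - y) h^T
delta = (W[:, :n_hid].T @ (z - y)) * (g > 0)       # ((z - y)^T W)_i r'(g_i)
grad_V = np.outer(delta, x)                        # dJ/dV_ij = delta_i x_j

def numeric_grad(f, A, step=1e-6):
    """Central finite differences of the scalar f() with respect to matrix A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        old = A[idx]
        A[idx] = old + step
        up = f()
        A[idx] = old - step
        down = f()
        A[idx] = old                               # restore the entry
        G[idx] = (up - down) / (2 * step)
    return G

num_W = numeric_grad(lambda: cost(V, W), W)
num_V = numeric_grad(lambda: cost(V, W), V)
```

Both analytic gradients should agree with the numerical ones to within the finite-difference error.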
5/3/2021 Reinforcement Learning - Ai Quiz Questions
QUIZ TOPIC - REINFORCEMENT LEARNING
1. Reinforcement learning is-
A. Unsupervised learning
B. Supervised learning
C. Reward-based learning
D. None
2. Which of the following is an application of reinforcement learning?
A. Topic modeling
B. Recommendation system
C. Pattern recognition
D. Image classification
3. Upper confidence bound is a
A. Reinforcement algorithm
B. Supervised algorithm
C. Unsupervised algorithm
D. None
4. Which of the following is true about reinforcement learning?
A. The agent gets rewards or penalty according to the action
B. It’s an online learning
C. The target of an agent is to maximize the rewards
D. All of the above
5. You have a task which is to show relevant ads to target users. Which
algorithm should you use for this task?
A. K means clustering
B. Naive Bayes
C. Support vector machine
D. Upper confidence bound
6. Hidden Markov Model is used in-
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
https://www.aionlinecourse.com/ai-quiz-questions/machine-learning/reinforcement-learning 1/2
7. Which algorithm is used in robotics and industrial automation?
A. Thompson sampling
B. Naive Bayes
C. Decision tree
D. All of the above
8. Thompson sampling is a-
A. Probabilistic algorithm
B. Based on Bayes inference rule
C. Reinforcement learning algorithm
9. Which of the following is false about Upper confidence bound?
A. It’s a Deterministic algorithm
B. It does not allow delayed feedback
C. It is not based on Bayes inference
D. None
10. The multi-armed bandit problem is a generalized use case for-
A. Reinforcement learning
B. Supervised learning
C. Unsupervised learning
D. All of the above
ML interview questions

Machine Learning/Data Science Interview

Cheat sheets
Aqeel Anwar
Version: 0.1.0.1
This document contains cheat sheets on various topics asked during a Machine Learning/Data
science interview. This document is constantly updated to include more topics.
Table of Contents
Machine Learning
1. Bias-Variance Trade-off
2. Imbalanced Data in Classification
3. Principal Component Analysis
4. Bayes’ Theorem and Classifier
5. Regression Analysis
6. Regularization in ML
7. Convolutional Neural Network
8. Famous CNNs
9. Ensemble Methods in Machine Learning
Behavioral Interview
1. How to prepare for behavioral interview?
2. How to answer a behavioral question?
Cheat Sheet – Bias-Variance Tradeoff

What is Bias?
• Error between average model prediction and ground truth
• The bias of the estimated function tells us the capacity of the underlying model to
predict the values
What is Variance?
• Average variability in the model prediction for the given dataset
• The variance of the estimated function tells you how much the function can adjust
to the change in the dataset
High Bias
• Overly-simplified model
• Under-fitting
• High error on both test and train data
High Variance
• Overly-complex model
• Over-fitting
• Low error on train data and high on test
• Starts modelling the noise in the input
[Figure: bias-variance trade-off plots, marking the point of minimum error.]
Bias variance Trade-off
• Increasing bias reduces variance and vice-versa
• Error = bias² + variance + irreducible error
• The best model is where the total error is minimized.
• Compromise between bias and variance
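The decomposition can be illustrated with a small simulation: refit a simple and a complex model on many noisy resamples of the same inputs, then measure how far the average prediction is from the truth (bias²) and how much individual fits scatter around that average (variance). Everything below is an assumed setup for illustration only: the sine ground truth, the noise level, the polynomial degrees 1 and 9, and the function name.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
x_s = 2.0 * x - 1.0                  # rescale to [-1, 1] for a well-conditioned fit
truth = np.sin(2 * np.pi * x)        # hypothetical ground-truth function
noise_sd = 0.3                       # standard deviation of the irreducible noise

def bias2_and_variance(degree, trials=500):
    """Estimate bias^2 and variance of a degree-`degree` polynomial fit."""
    preds = np.empty((trials, x.size))
    for t in range(trials):
        y = truth + rng.normal(scale=noise_sd, size=x.size)  # fresh noisy dataset
        coeffs = np.polyfit(x_s, y, degree)
        preds[t] = np.polyval(coeffs, x_s)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - truth) ** 2)     # average model vs. ground truth
    variance = np.mean(preds.var(axis=0))         # spread across datasets
    return bias2, variance

b_simple, v_simple = bias2_and_variance(degree=1)    # overly simple model
b_complex, v_complex = bias2_and_variance(degree=9)  # overly complex model
```

In this setup the straight line shows the larger bias² and the degree-9 polynomial the larger variance, which is the trade-off the bullets describe.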
Source: https://www.cheatsheets.aqeel-anwar.com
Cheat Sheet – Imbalanced Data in Classification

[Figure: scatter plot of the two classes. Blue: Label 1, Green: Label 0]
Accuracy = Correct Predictions / Total Predictions
Classifier that always predicts label blue yields prediction accuracy of 90%
Accuracy doesn’t always give the correct insight about your trained model
Accuracy: %age correct prediction | Correct prediction over total predictions | One value for entire network
Precision: Exactness of model | From the detected cats, how many were actually cats | Each class/label has a value
Recall: Completeness of model | Correctly detected cats over total cats | Each class/label has a value
F1 Score: Combines Precision/Recall | Harmonic mean of Precision and Recall | Each class/label has a value
Performance metrics associated with Class 1
Confusion matrix (rows: predicted labels, columns: actual labels)
              Actual 1              Actual 0
Predicted 1   True Positive (TP)    False Positive (FP)
Predicted 0   False Negative (FN)   True Negative (TN)
Reading a cell name such as True Negative: the first word answers "Is your prediction correct?" (True: your prediction is correct) and the second answers "What did you predict?" (Negative: you predicted 0).
Precision = TP / (TP + FP)
Recall, Sensitivity, True +ve rate = TP / (TP + FN)
Specificity = TN / (TN + FP)
False +ve rate = FP / (TN + FP)
F1 score = 2 x (Prec x Rec) / (Prec + Rec)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
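The formulas above translate directly into code. The sketch below is illustrative: the function name is made up, returning 0 for an undefined ratio is an assumption, and the 90/10 class split is a hypothetical dataset chosen to match the "always predicts blue gives 90% accuracy" remark.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for the positive class (label 1) from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0   # assumption: report 0 when undefined

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical data: 90 blue (label 1) and 10 green (label 0) samples,
# scored for a classifier that always predicts blue.
always_blue = classification_metrics(tp=90, fp=10, fn=0, tn=0)
```

Accuracy comes out at 0.9 even though specificity is 0 and the false positive rate is 1, which is why accuracy alone can mislead on imbalanced data.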
Possible solutions
1. Data Replication: Replicate the available data until the number of samples are comparable
2. Synthetic Data: Images: Rotate, dilate, crop, add noise to existing input images and create new data
3. Modified Loss: Modify the loss to reflect greater error when misclassifying smaller sample set:
loss = a * loss_green + b * loss_blue, with a > b
4. Change the algorithm: Increase the model/algorithm complexity so that the two classes are perfectly
separable (Con: Overfitting)
[Figure: two panels illustrating increased model complexity. Blue: Label 1, Green: Label 0]
Before: No straight line (y=ax) passing through origin can perfectly separate data. Best solution: line y=0, predict all labels blue
After: Straight line (y=ax+b) can perfectly separate data. Green class will no longer be predicted as blue
Cheat Sheet – PCA Dimensionality Reduction

What is PCA?
• Based on the dataset find a new set of orthogonal feature vectors in such a way that the
data spread is maximum in the direction of the feature vector (or dimension)
• Rates the feature vector in the decreasing order of data spread (or variance)
• The datapoints have maximum variance in the first feature vector, and minimum variance
in the last feature vector
• The variance of the datapoints in the direction of feature vector can be termed as a
measure of information in that direction.
Steps
1. Standardize the datapoints
2. Find the covariance matrix from the given datapoints
3. Carry out eigen-value decomposition of the covariance matrix
4. Sort the eigenvalues and eigenvectors
Dimensionality Reduction with PCA

• Keep the first m out of n feature vectors rated by PCA. These m vectors will be the best m
vectors preserving the maximum information that could have been preserved with m
vectors on the given dataset
Steps:
1. Carry out steps 1-4 from above
2. Keep first m feature vectors from the sorted eigenvector matrix
3. Transform the data for the new basis (feature vectors)
4. The importance of the feature vector is proportional to the magnitude of the eigen value
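The steps listed above can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `pca_reduce` and the synthetic three-feature dataset are made up, and the code assumes no feature is constant (so the standard deviation is nonzero).

```python
import numpy as np

def pca_reduce(data, m):
    """Reduce `data` (samples x n features) to its first m principal components."""
    # Step 1: standardize each feature (zero mean, unit variance).
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(z, rowvar=False)
    # Step 3: eigen-decomposition (eigh, since the covariance is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort eigenvalues and eigenvectors by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the first m feature vectors and transform the data to the new basis.
    return z @ eigvecs[:, :m], eigvals

# Hypothetical data: two strongly correlated features plus one independent one.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t,
                        2 * t + 0.1 * rng.normal(size=200),
                        rng.normal(size=200)])
reduced, eigvals = pca_reduce(data, m=2)
```

The sample variance of the data along each kept direction equals the corresponding eigenvalue, which is the sense in which the eigenvalue measures the importance of a feature vector.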
[Figures 1-3: scatter plots of the datapoints against Feature # 1 (F1) and Feature # 2 (F2), showing the rotated axes New Feature # 1 and New Feature # 2 and the variance along each.]
Figure 1: Datapoints with feature vectors as x and y-axis
Figure 2: The cartesian coordinate system is rotated to maximize the standard deviation along any one axis (new feature # 2)
Figure 3: Remove the feature vector with minimum standard deviation of datapoints (new feature # 1) and project the data on new feature # 2
Cheat Sheet – Bayes Theorem and Classifier

What is Bayes’ Theorem?
• Describes the probability of an event, based on prior knowledge of conditions that might be
related to the event.
• How the probability of an event changes when we have knowledge of another event: the prior P(A)
is updated to the posterior P(A|B), usually a better estimate than P(A)
Bayes’ Theorem
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Posterior probability: P(A|B). Likelihood: P(B|A). Prior probability: P(A). Evidence: P(B).
Example
• Probability of fire P(F) = 1%
• Probability of smoke P(S) = 10%
• Prob of smoke given there is a fire P(S|F) = 90%
• What is the probability that there is a fire given we see a smoke P(F|S)?
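The example can be worked through by plugging the three given probabilities into Bayes’ theorem. The variable names below are illustrative.

```python
p_fire = 0.01               # P(F), prior probability of fire
p_smoke = 0.10              # P(S), evidence
p_smoke_given_fire = 0.90   # P(S | F), likelihood

# Bayes' theorem: P(F | S) = P(S | F) * P(F) / P(S)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
```

Seeing smoke raises the probability of fire from the 1% prior to a 9% posterior.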
Maximum Aposteriori Probability (MAP) Estimation

The MAP estimate of the random variable y, given that we have observed iid (x1, x2, x3, …), is
the y that maximizes the product of prior and likelihood. We try to accommodate our prior knowledge when estimating.
$$\hat{y}_{MAP} = \arg\max_y \; P(y) \prod_i P(x_i \mid y)$$
Maximum Likelihood Estimation (MLE)

The MLE estimate of the random variable y, given that we have observed iid (x1, x2, x3, …), is
the y that maximizes only the likelihood. We assume we don’t have any prior knowledge of the quantity being estimated.
$$\hat{y}_{MLE} = \arg\max_y \; \prod_i P(x_i \mid y)$$
MLE is a special case of MAP where our prior is uniform (all values are equally likely)
Naïve Bayes’ Classifier (Instantiation of MAP as classifier)

Suppose we have two classes, y=y1 and y=y2. Say we have more than one evidence/features (x1,
x2, x3, … ), using Bayes’ theorem
The naïve Bayes classifier assumes the features (x1, x2, x3, …) are conditionally independent given the class, i.e.
Cheat Sheet – Regression Analysis

What is Regression Analysis?
Fitting a function f(.) to datapoints yi=f(xi) under some error function. Based on the estimated
function and error, we have the following types of regression
1. Linear Regression:
Fits a line minimizing the sum of squared errors
over the datapoints.
2. Polynomial Regression:
Fits a polynomial of order k (k+1 unknowns) minimizing
the sum of squared errors over the datapoints.
3. Bayesian Regression:
For each datapoint, fits a Gaussian distribution by
minimizing the mean-squared error. As the number of
data points xi increases, it converges to point
estimates, i.e. the variance of the fitted Gaussian shrinks toward zero.
4. Ridge Regression:
Can fit either a line or a polynomial, minimizing the sum
of squared errors over the datapoints plus the
weighted L2 norm of the function parameters beta.
5. LASSO Regression:
Can fit either a line or a polynomial, minimizing the
sum of squared errors over the datapoints plus the
weighted L1 norm of the function parameters beta.
6. Logistic Regression:
Can fit either a line, or polynomial with sigmoid
activation minimizing the binary cross-entropy loss for
each datapoint. The labels y are binary class labels.
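To make the difference between plain linear regression and ridge regression concrete, here is a minimal sketch for a single input variable. It assumes the ridge penalty is applied to the slope only (not the intercept); the function name fit_line and the toy data are hypothetical.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = b0 + b1 * x by minimizing
    sum of squared errors + ridge_lambda * b1**2 (intercept left unpenalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / (sxx + ridge_lambda)  # ridge_lambda = 0 gives ordinary least squares
    b0 = y_mean - b1 * x_mean
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

ols = fit_line(xs, ys)                       # (1.0, 2.0)
ridge = fit_line(xs, ys, ridge_lambda=10.0)  # (3.0, 1.0): the slope is shrunk
print(ols, ridge)
```

The ridge fit trades a larger squared error for a smaller slope, which is the shrinkage effect that the weighted L2 norm introduces.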
Visual Representation:
Figure: four panels plotting y against x for Linear Regression, Polynomial Regression, Bayesian Linear Regression and Logistic Regression (the logistic panel separates Label 0 from Label 1).
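Summary:
Regression ; What does it fit? ; Estimated function ; Error function
Linear ; A line in n dimensions ; ŷ = β0 + β1 x1 + … + βn xn ; Σ (yi − ŷi)²
Polynomial ; A polynomial of order k ; ŷ = β0 + β1 x + … + βk x^k ; Σ (yi − ŷi)²
Bayesian Linear ; Gaussian distribution for each point ; Gaussian with a linear mean ; Σ (yi − ŷi)²
Ridge ; Linear/polynomial ; as for linear/polynomial ; Σ (yi − ŷi)² + λ ‖β‖₂²
LASSO ; Linear/polynomial ; as for linear/polynomial ; Σ (yi − ŷi)² + λ ‖β‖₁
Logistic ; Linear/polynomial with sigmoid ; ŷ = sigmoid(linear/polynomial in x) ; −Σ [yi log ŷi + (1 − yi) log(1 − ŷi)]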
Cheat Sheet – Regularization in ML
What is Regularization in ML?

• Regularization is an approach to address over-fitting in ML.
• An overfitted model fails to generalize its estimates to test data.
• When the underlying model to be learned is low bias/high
variance, or when we have a small amount of data, the
estimated model is prone to over-fitting.
• Regularization reduces the variance of the model.
Figure 1. Overfitting
Types of Regularization:
1. Modify the loss function:
• L2 Regularization: Prevents the weights from getting too large (as measured by the L2 norm). The
larger the weights, the more complex the model and the higher the chance of overfitting.
• L1 Regularization: Prevents the weights from getting too large (as measured by the L1 norm). The
larger the weights, the more complex the model and the higher the chance of overfitting. L1
regularization introduces sparsity in the weights: it forces more weights to be exactly zero rather
than reducing the average magnitude of all weights.
• Entropy: Used for models that output probabilities. Forces the probability distribution
towards the uniform distribution.
2. Modify data sampling:

• Data augmentation: Create more data from available data by randomly cropping, dilating,
rotating, adding small amount of noise etc.
• K-fold Cross-validation: Divide the data into k groups. Train on (k-1) groups and test on 1
group. Try all k possible combinations.
3. Change training approach:

• Injecting noise: Add random noise to the weights while they are being learned. This pushes the
model to be relatively insensitive to small variations in the weights, which acts as regularization.
• Dropout: Generally used for neural networks. Connections between consecutive layers are
randomly dropped based on a dropout-ratio and the remaining network is trained in the
current iteration. In the next iteration, another random set of connections is dropped.
Figure 2. K-fold CV (5-fold cross-validation: each of the five groups serves once as the test set while the remaining four are used for training)
Figure 3. Drop-out (original network with 16 connections; with dropout-ratio = 30%, 11 connections (70%) stay active in each iteration)
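Three of the regularization ideas above (norm penalties, k-fold cross-validation and dropout) can be sketched in a few lines of plain Python. All function names and the toy values are hypothetical, and the dropout helper only draws the random mask; it does not train a network.

```python
import random

def l1_penalty(weights, lam):
    """Weighted L1 norm added to the loss (encourages sparse weights)."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Weighted squared L2 norm added to the loss (discourages large weights)."""
    return lam * sum(w * w for w in weights)

def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = start + fold_size if fold < k - 1 else n_samples
        yield indices[:start] + indices[stop:], indices[start:stop]

def dropout_mask(n_connections, dropout_ratio, rng):
    """Return a 0/1 mask; each connection is dropped with probability dropout_ratio."""
    return [0 if rng.random() < dropout_ratio else 1 for _ in range(n_connections)]

weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, 0.1), l2_penalty(weights, 0.1))

folds = list(k_fold_splits(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)

mask = dropout_mask(16, 0.3, random.Random(0))
print(mask, "active connections:", sum(mask))
```

A fresh mask would be drawn at every training iteration, so a different random subset of connections is trained each time.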

Cheat Sheet – Famous CNNs

AlexNet – 2012
Why: AlexNet was born out of the need to improve the results of
the ImageNet challenge.
What: The network consists of 5 Convolutional (CONV) layers and 3
Fully Connected (FC) layers. The activation used is the Rectified
Linear Unit (ReLU).
How: Data augmentation is carried out to reduce over-fitting. Uses
Local Response Normalization.
VGGNet – 2014
Why: VGGNet was born out of the need to reduce the # of
parameters in the CONV layers and improve on training time
What: There are multiple variants of VGGNet (VGG16, VGG19, etc.)
How: The important point to note here is that all the conv kernels are
of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.
ResNet – 2015
Why: Neural Networks are notorious for not being able to find a
simpler mapping when it exists. ResNet solves that.
What: There are multiple versions of ResNetXX architectures where
‘XX’ denotes the number of layers. The most used ones are ResNet50
and ResNet101. Since the vanishing gradient problem was taken care of
(more about it in the How part), CNN started to get deeper and deeper
How: ResNet architecture makes use of shortcut connections to solve
the vanishing gradient problem. The basic building block of ResNet is
a Residual block that is repeated throughout the network.
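Figure 1. ResNet Block (the input x passes through two weight layers to give f(x); a shortcut connection adds x, giving f(x)+x)
Figure 2. Inception Block (the previous layer feeds parallel 1x1 conv, 3x3 conv, 5x5 conv and 3x3 maxpool branches, combined with 1x1 convs and joined by filter concatenation)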
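The residual block of Figure 1 can be sketched with plain lists standing in for tensors. This is an illustration under stated assumptions: the two "weight layers" are modelled as small dense matrices W1 and W2, and a ReLU is applied after the addition as in the original ResNet design; the helper names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """Compute relu(f(x) + x) with f(x) = W2 @ relu(W1 @ x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    return relu([a + b for a, b in zip(fx, x)])

x = [1.0, 2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights f(x) = 0, so the block reduces to the identity
# mapping for non-negative inputs.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 0.5]
```

Because the shortcut carries x through unchanged, the weight layers only have to learn the residual f(x), which is what makes the simple identity mapping easy to represent.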

Inception – 2014
Why: Larger kernels are preferred for more global features, on the other
hand, smaller kernels provide good results in detecting area-specific
features. For effective recognition of such a variable-sized feature, we
need kernels of different sizes. That is what Inception does.
What: The Inception network architecture consists of several inception
modules of the following structure. Each inception module consists of
four operations in parallel, 1x1 conv layer, 3x3 conv layer, 5x5 conv
layer, max pooling
How: Inception increases the network space from which the best
network is to be chosen via training. Each inception module can
capture salient features at different levels.
Cheat Sheet – Convolutional Neural Network

Convolutional Neural Network:
The data gets into the CNN through the input layer and passes
through various hidden layers before getting to the output layer.
The output of the network is compared to the actual labels in
terms of loss or error. The partial derivatives of this loss w.r.t the
trainable weights are calculated, and the weights are updated
through one of the various methods using backpropagation.
CNN Template:
Most of the commonly used hidden layers (not all) follow a
pattern
1. Layer function: Basic transforming function such as
convolutional or fully connected layer.
a. Fully Connected: Linear functions between the input and the
output.
b. Convolutional Layers: These layers are applied to 2D (3D) input feature maps. The trainable weights are a 2D (3D)
kernel/filter that moves across the input feature map, generating dot products with the overlapping region of the input
feature map.
c. Transposed Convolutional (DeConvolutional) Layer: Usually used to increase the size of the output feature map
(upsampling). The idea behind the transposed convolutional layer is to undo (not exactly) the convolutional layer.
Figure: Fully Connected Layer (input nodes x1, x2, x3; output node y1 = w11*x1 + w21*x2 + w31*x3 + b1) and Convolutional Layer (input map, kernel, output map)
2. Pooling: Non-trainable layer to change the size of the feature map

a. Max/Average Pooling: Decrease the spatial size of the input layer based on
selecting the maximum/average value in receptive field defined by the kernel
b. UnPooling: A non-trainable layer used to increase the spatial size of the input
layer based on placing the input pixel at a certain index in the receptive field
of the output defined by the kernel.
3. Normalization: Usually used just before the activation functions to keep
unbounded activations from pushing the output layer values too high
a. Local Response Normalization LRN: A non-trainable layer that square-normalizes the pixel values in a feature map
within a local neighborhood.
b. Batch Normalization: A trainable approach to normalizing the data by learning scale and shift variable during training.
4. Activation: Introduces non-linearity so the CNN can
efficiently learn complex non-linear mappings.
a. Non-parametric/Static functions: Linear, ReLU, tanh, sigmoid
b. Parametric functions: ELU, Leaky ReLU
c. Bounded functions: tanh, sigmoid
5. Loss function: Quantifies how far off the CNN prediction
is from the actual labels.
a. Regression Loss Functions: MAE, MSE, Huber loss
b. Classification Loss Functions: Cross entropy, Hinge loss
MSE Loss: mse = (x − x̂)²
MAE Loss: mae = |x − x̂|
Huber Loss (plotted for γ = 1.9): ½ (x − x̂)² if |x − x̂| < γ, else γ|x − x̂| − ½ γ²
Hinge Loss: max(0, 1 − x̂) if x = 1; max(0, 1 + x̂) if x = −1
Cross Entropy Loss: −y log(p) − (1 − y) log(1 − p)
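The loss functions named in this section can be written out directly. The sketch below evaluates each one for a single prediction; the function names are illustrative, the default γ = 1.9 is the value shown on the Huber plot, and the hinge loss uses the compact form max(0, 1 − x·x̂), which is equivalent to the two-case definition for labels x in {−1, +1}.

```python
import math

def mse(x, x_hat):
    return (x - x_hat) ** 2

def mae(x, x_hat):
    return abs(x - x_hat)

def huber(x, x_hat, gamma=1.9):
    """Quadratic for small errors, linear for large ones."""
    err = abs(x - x_hat)
    if err < gamma:
        return 0.5 * err ** 2
    return gamma * err - 0.5 * gamma ** 2

def hinge(x, x_hat):
    """x is the true label in {-1, +1}, x_hat the raw model score."""
    return max(0.0, 1.0 - x * x_hat)

def binary_cross_entropy(y, p):
    """y is the true label in {0, 1}, p the predicted probability of class 1."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

print(mse(1.0, 3.0), mae(1.0, 3.0))      # 4.0 2.0
print(huber(0.0, 0.5), huber(0.0, 3.0))  # quadratic branch, then linear branch
print(hinge(1, 0.3), hinge(-1, 0.3))
print(binary_cross_entropy(1, 0.5))      # log(2)
```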
Cheat Sheet – Ensemble Learning in ML

What is Ensemble Learning? Wisdom of the crowd
Combine multiple weak models/learners into one predictive model to reduce bias, variance and/or improve accuracy.
Types of Ensemble Learning: N number of weak learners

1.Bagging: Trains N different weak models (usually of the same type – homogenous) with N subsets of the
input dataset (in standard bagging, bootstrap samples drawn with replacement) in parallel. In the test phase, each model is evaluated. The label with the greatest number of predictions is
selected as the prediction. Bagging methods reduce the variance of the prediction.
2.Boosting: Trains N different weak models (usually of the same type – homogenous) with the complete dataset in a
sequential order. The datapoints wrongly classified by the previous weak model are given more weight so that they can
be classified properly by the next weak learner. In the test phase, each model is evaluated and, based on the test error of
each weak model, the prediction is weighted for voting. Boosting methods decrease the bias of the prediction.
3.Stacking: Trains N different weak models (usually of different types – heterogenous) with one of the two subsets of the
dataset in parallel. Once the weak learners are trained, they are used to train a meta learner that combines their
predictions and carries out the final prediction using the other subset. In the test phase, each model predicts its label, and this set of
labels is fed to the meta learner, which generates the final prediction.
The block diagrams, and comparison table for each of these three methods can be seen below.
Ensemble Method – Boosting
Step #1: Assign equal (uniform) weights to all the datapoints in the complete dataset.
Step #2a: Train weak model #1 with equal weights on all the datapoints.
Step #2b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha1). Use alpha to increase the weights of wrongly classified points and decrease the weights of correctly classified points.
Step #3a: Train weak model #2 with the adjusted weights on all the datapoints in the dataset.
Step #3b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha2). Use alpha to increase the weights of wrongly classified points and decrease the weights of correctly classified points.
The same two sub-steps are repeated for each further weak model (the figure shows weak models #3 and #4).
Step #(n+1)a: Train a weak model with the adjusted weights on all the datapoints in the dataset.
Step #(n+2): In the test phase, predict from each weak model and vote their predictions weighted by the corresponding alpha to get the final prediction.
Ensemble Method – Bagging
Step #1: Create N subsets from the original dataset, one for each weak model (the figure shows subsets #1 to #4).
Step #2: Train each weak model with an independent subset, in parallel.
Step #3: In the test phase, predict from each weak model and vote their predictions to get the final prediction.
Ensemble Method – Stacking
Step #1: Create 2 subsets from the original dataset, one for training the weak models (Subset #1 – Weak Learners) and one for the meta-model (Subset #2 – Meta Learner).
Step #2: Train each weak model with the weak learner dataset.
Step #3: Train a meta-learner for which the input is the outputs of the weak models for the Meta Learner dataset.
Step #4: In the test phase, feed the input to the weak models, collect the output and feed it to the meta model. The output of the meta model is the final prediction.
Parameter ; Bagging ; Boosting ; Stacking
Focuses on ; Reducing variance ; Reducing bias ; Improving accuracy
Nature of weak learners is ; Homogenous ; Homogenous ; Heterogenous
Weak learners are aggregated by ; Simple voting ; Weighted voting ; Learned voting (meta-learner)
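The "simple voting" and "weighted voting" rows of the comparison table can be illustrated as follows. This is only a sketch of the aggregation step, assuming the weak models have already produced their labels; the function names and the example labels and alphas are hypothetical.

```python
from collections import Counter, defaultdict

def simple_vote(predictions):
    """Bagging-style aggregation: the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, alphas):
    """Boosting-style aggregation: each model's vote counts with its alpha."""
    scores = defaultdict(float)
    for label, alpha in zip(predictions, alphas):
        scores[label] += alpha
    return max(scores, key=scores.get)

weak_predictions = ["cat", "dog", "cat"]
print(simple_vote(weak_predictions))                     # cat (2 votes against 1)
print(weighted_vote(weak_predictions, [0.2, 0.9, 0.3]))  # dog (0.9 against 0.5)
```

In stacking, this fixed rule is replaced by a trained meta-learner that takes the weak models' outputs as its input features.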
1/4 How to prepare for behavioral interview?
Collect stories, assign keywords, practice
the STAR format
Keywords: List important keywords that will be populated with your personal
stories. Most common keywords are given in the table below
Conflict Resolution ; Negotiation ; Compromise to achieve goal ; Creativity ; Flexibility ; Convincing
Handling Crisis ; Challenging Situation ; Another team priorities not aligned ; Working with difficult people ; Adjust to a colleague style ; Take Stand
Handling –ve feedback ; Coworker view of you ; Working with a deadline ; Your strength ; Your weakness ; Influence Others
Handling failure ; Handling unexpected situation ; Converting challenge to opportunity ; Decision without enough data ; Conflict Resolution ; Mentorship/Leadership
Stories
1. List all the organizations you have been a part of. For example
1. Academia: BSc, MSc, PhD
2. Industry: Jobs, Internship
3. Societies: Cultural, Technical, Sports
2. Think of stories from step 1 that can fall into one of the keyword categories. The
more stories the better. You should have at least 10-15 stories.
3. Create a summary table by assigning multiple keywords to each story. This will help
you filter the stories when a question is asked in the interview. An example can be
seen below
Story 1: [Convincing] [Take Stand] [influence other]
Story 2: [Mentorship] [Leadership]
Story 3: [Conflict resolution] [Negotiation]
Story 4: [decision-without-enough-data]
STAR Format
Write down the stories in the STAR format as explained in the 2/4 part of this cheat
sheet. This will help you practice the organization of story in a meaningful way.
Icon Source: www.flaticon.com
2/4 How to prepare for behavioral interview?
Direct*, meaningful*, personalized*, logical*
*(Respective colors are used to identify these characteristics in the example)
Example: “Tell us about a time when you had to convince senior executives”
S Situation: Explain the situation and provide necessary context for your story.
“I worked as an intern in XYZ company in the summer of 2019. The project details provided to me were elaborate. After some initial brainstorming and research, I realized that the project approach could be modified to make it more efficient in terms of the underlying KPIs. I decided to talk to my manager about it.”
T Task: Explain the task and your responsibility in the situation.
“I had an hour-long call with my manager and explained to him in detail the proposed approach and how it could improve the KPIs. I was able to convince him. He asked me if I would be able to present my proposed approach for approval in front of the higher executives. I agreed to it. I was working out of the ABC(city) office and the executives needed to fly in from the XYZ(city) office.”
A Action: Walk through the steps and actions you took to address the issue.
“I did a quick background check on the executives to know better about their area of expertise so that I could convince them accordingly. I prepared an elaborate 15-slide presentation starting with explaining their approach, moving on to my proposed approach and finally comparing them on preliminary results.”
R Result: State the outcome of the result of your actions.
“After some active discussion we were able to establish that the proposed approach was better than the initial one. The executives proposed a few small changes to my approach and really appreciated my stand. At the end of my internship, I was selected among the 3 out of 68 interns who got to meet the senior vice president of the company over lunch.”

3/4 How to answer a behavioral question?
Understand, Extract, Map, Select and Apply
Example: “Tell us about a time when you had to convince senior executives”
Understand: Understand the question
Example: A story where I was able to convince my seniors. Maybe they had something in mind,
and I had a better approach and tried to convince them
Extract: Extract keywords and tags
Extract useful keywords that encapsulate the gist of the question
Example: [Convincing], [Creative], [Leadership]
Map: Map the keyword to your stories
Shortlist all the stories that fall under the keywords extracted from the previous step
Example: Story1, Story2, Story3, Story4, … , Story N
Select: Select the best story
From the shortlisted stories, pick the one that best describes the question and has not been used
so far in the interview
Example: Story3
Apply: Apply the STAR method
Apply the STAR method on the selected story to answer the question
Example: See Cheat Sheet 2/4 for details

4/4 Behavioral Interview Cheat Sheet
Summarizing the behavioral interview
How to prepare for the interview
1 Gather important topics as keywords
Understand and collect all the important topics commonly asked in the interview
2 Collect your stories
Based on all the organizations you have been a part of, think of all the stories that fall under the keywords above
3 Practice stories in STAR format
Practice each story using the STAR format. You will have to answer the question following this format.
4 Assign keywords to stories
Assign each of your stories one or more keywords. This will help you recall them quickly
5 Create a summary table
Create a summary table mapping stories to their associated keywords. This will be used during the behavioral question
How to answer a question during interview
U Understand the question
Understand the question and clarify any confusion that you have
E Extract the keywords
Try to extract one or more of the keywords from the question
M Map the keywords to stories
Based on the keywords extracted, find the stories using the summary table created during preparation (Step 5)
S Select a story
Since each keyword may be assigned to multiple stories, select the one that is most relevant and has not been used.
A Apply the STAR format
Once the story has been shortlisted, apply STAR format on the story to answer the question.
Follow the Author:

Follow the author for more machine learning/data science content at
• Medium: https://aqeel-anwar.medium.com
• LinkedIn: https://www.linkedin.com/in/aqeelanwarmalik/
Version History
• Version 0.1.0.1 - Apr 05, 2021
Fixed minor typo issues in Bayes’ Theorem, Regression analysis and Classifier and
PCA dimensionality reduction cheat sheets.
• Version 0.1.0.0 - Mar 30, 2021

Initial draft with nine basics of ML and two behavioral interview cheat sheets.
Advance ML - practice

Q- Let us assume we implement an AND function with a single neuron. Below is a
tabular representation of an AND function. What would be the weights and
bias?

A. Bias = -1.5, w1 = 1, w2 = 1
B. Bias = 1.5, w1 = 2, w2 = 2
C. Bias = 1, w1 = 1.5, w2 = 1.5
D. None of these
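Each candidate set of weights can be checked against the AND truth table. The sketch below assumes a step activation that outputs 1 when the weighted sum plus bias is greater than 0; the question does not state the activation, so this threshold is an assumption, and the function names are hypothetical.

```python
def neuron(x1, x2, w1, w2, bias):
    """Single neuron with a step activation (fires when the weighted sum is positive)."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def implements_and(w1, w2, bias):
    """True if the neuron reproduces AND on all four binary input pairs."""
    return all(
        neuron(a, b, w1, w2, bias) == (a and b)
        for a in (0, 1)
        for b in (0, 1)
    )

candidates = {
    "A": (1, 1, -1.5),
    "B": (2, 2, 1.5),
    "C": (1.5, 1.5, 1),
}
for name, (w1, w2, bias) in candidates.items():
    print(name, implements_and(w1, w2, bias))
```

Under this assumption only the weights of option A fire exclusively for the input (1, 1); options B and C have a positive bias, so they already fire for (0, 0).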
Q-What are the steps for using a gradient descent algorithm?

1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weight and bias
5. Go to each neuron which contributes to the error and change its respective
values to reduce the error
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 4, 3, 1, 5, 2
D. 3, 2, 1, 5, 4
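The five steps can be located in a small training loop. The sketch below fits a single linear neuron y = w*x + b to toy data with batch gradient descent; the comments map each line to the step numbers of the question. The data, learning rate and function name are illustrative assumptions.

```python
import random

def train(xs, ys, learning_rate=0.05, epochs=2000, seed=0):
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # step 4: random weight and bias
    for _ in range(epochs):                        # step 2: reiterate
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b                       # step 3: forward pass
            err = pred - y                         # step 1: error vs. actual value
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= learning_rate * grad_w / len(xs)      # step 5: change the values
        b -= learning_rate * grad_b / len(xs)      #         to reduce the error
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

In execution order the loop initializes first, then repeats forward pass, error calculation and update, which is how the steps are commonly sequenced.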

Q- Suppose you are inputting an image of size (150 x 150 x 3) with filter size=2,
stride=1, padding=0. What would be the output size of the image?
A. 150x150
B. 149x 149
C. 148x 148
D. 147 x 147
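The spatial output size of a convolution follows the standard formula (n + 2p − f) / s + 1, rounded down. A minimal sketch, with a hypothetical helper name:

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output width/height for input size n, filter size f, given stride and padding."""
    return (n + 2 * padding - f) // stride + 1

size = conv_output_size(150, f=2, stride=1, padding=0)
print(f"{size} x {size}")  # 149 x 149
```

The same helper shows how padding preserves size: a 3x3 filter with padding 1 and stride 1 keeps a 150-pixel side at 150.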
Q- Which of the following metrics will best analyze the performance of any
model?
A. Precision
B. Recall
C. F-Score
Q- The number of nodes in the input layer is 20 and in the hidden layer is 5. Then what
would be the maximum number of connections that exist between the input layer
and the output layer?
A. 100
B. 25
C. less than 100
D. Greater than 100
Q-Why do we use cross validation:

A. to check the accuracy of the model
B. to check the robustness of the model
C. to analyze ROC curve
D. all of the above
Q- if loss='categorical_crossentropy', then which type of classification is used?
A. Binary classification
B. Multiclassification

Q-A perceptron is a –
a. A single layer feed-forward neural network with pre-processing
b. An auto-associative neural network
c. A double layer auto-associative neural network
d. A neural network that contains feedback
Q- Which of the following is true

1. On average, neural networks have higher computational rates than
conventional computers
2. Neural networks learn by example
3. Neural networks mimic the way the human brain works
A. All of these
B. 1 and 2 are true
C. 1,2 and 3 are true
D. None of these
Q-What is back propagation

B. It is the transmission of error back through the network to adjust the
inputs
C. It is the transmission of error back through the network to allow
weights to be adjusted so that network can learn
D. None of these
Q-Neural networks are complex ---------------- with many parameters

a. Linear functions
b. Nonlinear functions
c. Discrete functions
d. Exponential functions
Q-Which one of the following gives higher accuracy:

A. Random forest
b. SVM

Q-Which tool is NOT suited for building ANN models?

Python
TensorFlow
Keras
Excel
Q-How can we improve the calculation speed in TensorFlow, without losing accuracy?
Using GPU
By doing random sampling on Tensors
By removing few nodes from computational graphs
by removing the hidden layers
Q-How do calculations work in TensorFlow?

Through vector multiplications
Through RDDs
Through Computational Graphs
Through map reduce tasks
Q-Which tool is best suited for solving Deep Learning problems?

R
Sklearn
Excel
TensorFlow
Q-A tensor is similar to

Data Array
ANN Model
SQL query
Python code
Which of the following will be used to convert a NumPy array to a TensorFlow
tensor?

O tf.convert_to_tensor()
O np.array()
O tf.make_ndarray()
O tf.constant()
Which of the following must be initialized in Tensorflow?
O Placeholders
O Variables
O Sessions
O All of the above
What will be the output of the following?

import tensorflow as tf
c = tf.constant([[1,2,3],[4,5,6]])
print("Python List input: {}".format(c.get_shape()))
O Python list input: (2, 3)
O Python list input: (3, 2)
O Python list input: (3, 3)
O None of the mentioned
Which of the following function is used for ragged data?

O tf.ragged.RaggedTensor()
O tf.ragged.Tensor()
O tf.RaggedTensor()
O tf.ragged()
The parameters that are required to be learnt in minimizing the objective function
in supervised learning
O Only weight
O Only bias
Both of the mentioned
What would be the output of the following?
import numpy as np
shape=(3,4,2)
input=np.zeros(shape)
print(input)
Options:

[[[0. 0.] [0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.] [0. 0.]]]
O [[[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]]]
O [[[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]]]
Which of these statements about deep learning programming frameworks are

true?
Deep learning programming frameworks require cloud-based machines to run.
O Even if a project is currently open source, good governance of the project
helps ensure that it remains open even in the long term, rather than
become closed or modified to benefit only one company.
O A programming framework does not allow you to code up deep learning
algorithms with typically fewer lines of code than a lower-level language such
as Python
"Grouping of people based on their performance" is an example of:

O Clustering
O Classification
O Regression
Consider the following statement "it takes less time to navigate the regions
having a ge
i) Gradient Descent Algorithm
ii) Momentum based Gradient Descent Algorithm
Only 1
Only 2
O Both (i) and (ii)
Which of the following is true in terms of seed?

validation_generator = data_generator.flow_from_directory(
train_data_dir, target_size= (img_width, img_height),
batch_size= batch_size, shuffle = True, class_mode = 'categorical',
seed = 42, subset= 'validation')
O a fixed value set drawn from a random distribution

O to produce the same random tensor for a given shape and dtype.
Both a and b
What will be the output of the following?

import numpy as np
c = tf.constant(np.array([
[[1,2,3],
[4,5,6]],
[[1,1,1],
])
print("3d NumPy array input: {}".format(c.get_shape()))
O 3d NumPy array input: (4, 2, 3)

3d NumPy array input: (2, 2, 3)
O 3d NumPy array input: (2, 4, 3)
O 3d NumPy array input: (2,2,2)
What will be the output of the given code?

import tensorflow as tf
h=tf.constant("Deep")
w=tf.constant(" Learning")
o=h+w
print(o)
O Deep Learning
O tf.Tensor(Deep Learning, shape=(1,1), dtype=string)
tf.Tensor(Deep Learning, shape=(), dtype=string)
O Error
What would be the output of the following?

t=tf.constant([[5.0,6.0],[7.0,8.0]])
v1=tf.Variable(t,name='hello')
v2=tf.Variable(t+1, name='hello')
print(v1==v2)
O tf.Tensor( [[False False] [True True]], shape=(2, 2), dtype=bool)
O tf.Tensor( [[True True] [True True]], shape=(2, 2), dtype=bool)
tf.Tensor( [[False False] [False False]], shape=(2, 2), dtype=bool)
None of the mentioned.

What does validation_split=0.20 mean in the given statement?

model.fit(inputX, inputY, validation_split=0.20, epochs=10, batch_size=10)
O to use 50% of the data before shuffling for validation and rest 50% for
training
to use 80% of the data for validation before shuffling
to use last 20% of the data for validation before shuffling
None of the mentioned
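The effect of a validation split can be illustrated without Keras. The sketch below mimics the behaviour described in the Keras documentation, where the validation samples are taken from the end of the provided data before any shuffling; split_for_validation is a hypothetical helper written for this illustration, not a Keras function.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples, in their original order."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

samples = list(range(10))
train_part, validation_part = split_for_validation(samples, 0.20)
print(train_part)       # [0, 1, 2, 3, 4, 5, 6, 7]
print(validation_part)  # [8, 9]
```

Because the split happens before shuffling, data that is sorted by class should be shuffled by the caller first, otherwise the held-out part may contain only the last class.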
Which of the following statements are true: Feature Engineering is

1. A process of putting domain knowledge into the creation of feature
extractors.
2. Used to reduce the complexity of data.
O Only 1
O Only 2
Both are true
O Both are false
Consider the statement "Given a person's credentials and background
information, your system should assess whether the person should be granted a loan". Which
technique is applicable to this scenario
Machine Learning
O Deep Learning
O Reinforcement Learning
O All of the above.

The effect of using loss in the following statement is?

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
O to compute the quantity that a model should seek to minimize during
training.
O to return the sum of the per-sample losses in the batch
O Both of the mentioned
Suppose we have a neural network with ReLU activation function. Now, we
replace the ReLU activations by linear ones. Would this new neural network be able to
approximate an AND function?
Yes
O No
1. Which tool is best suited for solving Deep Learning problems
• R
• Sk-learn
• Excel
• TensorFlow
2. A tensor is similar to
• Data Array
• ANN Model
• SQL query
• Python code
3. How calculations work in TensorFlow
• Through vector multiplications

• Through RDDs
• Through Computational Graphs
• Through map reduce tasks

4. In TensorFlow, what is the use of a session?
• The current work space session for storing the code

• We launch the graph in a session
• A session is used to download the data
• A session is used for exporting data out of TensorFlow
5. What does feed_dict do?
• Feeds external data into computational graphs

• Creates a new place holder
• Creates a new tensor
• Creates a new session
6. out=tf.add(tf.matmul(X,W), b)
• Logistic Regression Equation

• Deep ANN equation
• Random Forest Equation
• Linear Regression equation
7. tf.reduce_sum(tf.square(out-Y))
• Linear Model equation

• Maximum Entropy loss function
• Squared Error loss function
• Feed_dict process
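The two TensorFlow expressions in questions 6 and 7 can be mirrored in plain Python to see what they compute. This sketch assumes W is a single column of weights and uses lists instead of tensors; the function names and numbers are illustrative only.

```python
def linear_output(X, W, b):
    """Counterpart of tf.add(tf.matmul(X, W), b) for a single-column W."""
    return [sum(x * w for x, w in zip(row, W)) + b for row in X]

def squared_error(out, Y):
    """Counterpart of tf.reduce_sum(tf.square(out - Y))."""
    return sum((o - y) ** 2 for o, y in zip(out, Y))

X = [[1.0, 2.0], [3.0, 4.0]]  # two samples, two features
W = [0.5, -1.0]
b = 2.0
Y = [1.0, 0.0]

out = linear_output(X, W, b)
print(out)                    # [0.5, -0.5]
print(squared_error(out, Y))  # 0.5
```

The first expression is a weighted sum of the inputs plus a bias with no activation, and the second sums the squared differences between the outputs and the targets.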
8. How can we improve the calculation speed in TensorFlow, without losing

accuracy?
• Using GPU
• By doing random sampling on Tensors
• By removing few nodes from computational graphs
• by removing the hidden layers
9. Keras is a deep learning framework on which tool

• R
• TensorFlow
• SAS
• Azure
10. What is the meaning of model=Sequential() in Keras?
• No such code in Keras

• Keras should be used only for sequential models like RNNs
• Keras builds sequential models
• creates a computational graph
11. Which tool is NOT suited for building ANN models
• Python
• TensorFlow
• Excel
• Keras
12. Can we have multidimensional tensors
• No, a tensor can have a maximum of two dimensions

• Possible only in image data
• Yes possible
• Possible only in geo tagged data
13. Why does TensorFlow use computational graphs?
• Tensors are nothing but computational graphs

• Graphs are easy to plot
• There is no such concept of computational graphs in TensorFlow
• Calculations can be done in parallel
14. How do we perform calculations in TensorFlow?

• We launch the computational graph in a session

• We launch the session inside a computational graph
• By creating multiple tensors
• By creating data frames
15. How do you feed external data into placeholders?
• by using import data command

• by using feed_dict
• by using read data function
• Not possible
16. out=tf.sigmoid(tf.add(tf.matmul(X,W), b))
• Logistic Regression Equation

• Deep ANN equation
• Random Forest Equation
• Linear Regression equation
17. C=-tf.reduce_sum(Y*tf.log(out))
• C is a logistic regression line equation

• C is a squared error loss function
• C is a cross entropy loss function
• C is a linear regression line equation
18. Can we use GPU for faster computations in TensorFlow
• No, not possible

• Possible only on cloud
• Possible only with small datasets
• Yes, possible
19. Which tool is a deep learning wrapper on TensorFlow

• Python
• Keras
• PyTorch
• Azure
20. How are deep learning models built in Keras
• by using sequential models

• by using feed_dict
• by creating place holders and computational graphs
• by creating data frames
Which of the following statement(s) correctly represents a real neuron
in TensorFlow?
• A neuron has a single input and a single output only
• A neuron has multiple inputs but a single output only
• A neuron has a single input but multiple outputs
• A neuron has multiple inputs and multiple outputs
• All of the above statements are valid
What are the steps for using a gradient descent algorithm in
TensorFlow?
1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weight and bias
5. Go to each neuron which contributes to the error and change its
respective values to reduce the error
• 1, 2, 3, 4, 5
• 5, 4, 3, 2, 1
• 3, 2, 1, 5, 4
• 4, 3, 1, 5, 2

“Convolutional Neural Networks can perform various types of transformation
(rotations or scaling) in an input”. Is the statement true or false in
TensorFlow?
• True
• false
Which of the following techniques performs operations similar to dropout in a neural network in TensorFlow?
• Bagging
• Boosting
• Stacking
• None of these
Which of the following is true about model capacity (where model capacity means the ability of the neural network to approximate complex functions) in TensorFlow?
• As the number of hidden layers increases, model capacity increases
• As the dropout ratio increases, model capacity increases
• As the learning rate increases, model capacity increases
• None of these
If you increase the number of hidden layers in a Multi-Layer Perceptron, the classification error on test data always decreases in TensorFlow. True or false?
• True
• False
What is the sequence of the following tasks in a perceptron in TensorFlow?
1. Initialize the weights of the perceptron randomly
2. Go to the next batch of the dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
• 1, 2, 3, 4
• 4, 3, 2, 1
• 3, 1, 2, 4
• 1, 4, 3, 2

Suppose that you have to minimize the cost function by changing the parameters. Which of the following techniques could be used for this in TensorFlow?
• Exhaustive search
• Random search
• Bayesian optimization
• Any of these
Can a neural network model the function (y=1/x) in TensorFlow?
• Yes
• No
In which neural network architecture does weight sharing occur in TensorFlow?
• Convolutional neural network
• Recurrent neural network
• Fully connected neural network
• Both a and b
Batch Normalization is helpful because:
• It normalizes (changes) all the input before sending it to the next layer
• It returns the normalized mean and standard deviation of the weights
• It is a very efficient backpropagation technique
• None of these
Instead of trying to achieve absolute zero error, we set a metric called Bayes error, which is the error we hope to achieve. What could be the reason for using Bayes error in TensorFlow?
• Input variables may not contain complete information about the output variable
• The system (that creates the input-output mapping) may be stochastic
• Limited training data
• All of the above
In a neural network, which of the following techniques is used to deal with overfitting in TensorFlow?
• Dropout
• Regularization
• Batch Normalization
Y = ax^2 + bx + c (polynomial equation of degree 2). Can this equation be represented by a neural network with a single hidden layer and a linear threshold?
• Yes
• No
A numeric variable can store numeric values with a maximum of eight digits.
• True
• False
What is a dead unit in a neural network?
• A unit which does not update during training by any of its neighbours
• A unit which does not respond completely to any of the training patterns
• The unit which produces the biggest sum-squared error
• None of these
Which of the following statements is the best description of early stopping?
• Train the network until a local minimum in the error function is reached
• Simulate the network on a test dataset after every epoch of training. Stop training when the generalization error starts to increase
• Add a momentum term to the weight update in the Generalized Delta Rule, so that training converges more quickly
• A faster version of backpropagation, such as the ‘Quickprop’ algorithm
What if we use a learning rate that’s too large?
• Network will converge
• Network will not converge
• Can’t say
In TensorFlow, knowing the weight and bias of each neuron is the most important step. If you can somehow get the correct value of weight and bias for each neuron, you can approximate any function. What would be the best way to approach this?
• Assign random values and pray to God they are correct
• Search every possible combination of weights and biases until you get the best value
• Iteratively check, after assigning a value, how far you are from the best values, and slightly change the assigned values to make them better
The number of neurons in the output layer must match the number of classes (where the number of classes is greater than 2) in a supervised learning task in TensorFlow. True or false?
• True
• False
When a pooling layer is added to a convolutional neural network, translation invariance is preserved. True or false?
• True
• False
Which gradient technique is more advantageous when the data is too big to handle in RAM simultaneously?
• Full Batch Gradient Descent
• Stochastic Gradient Descent
For a classification task, instead of random weight initialization in a neural network, we set all the weights to zero. Which of the following statements is true?
• There will not be any problem and the neural network will train properly
• The neural network will train, but all the neurons will end up recognizing the same thing
• The neural network will not train, as there is no net gradient change
For an image recognition problem (recognizing a cat in a photo), which architecture of neural network would be better suited to solve the problem?
• Multi Layer Perceptron
• Convolutional Neural Network
• Recurrent Neural Network
• Perceptron
What are the factors to consider when selecting the depth of a neural network?
1. Type of neural network
2. Input data
3. Computation power
4. Learning rate
5. The output function to map
• 1, 2, 4, 5
• 2, 3, 4, 5
• 1, 3, 4, 5
• All of these
An increase in the size of a convolutional kernel would necessarily increase the performance of a convolutional network.
• True
• False
TensorFlow is imported as?
• Run TensorFlow
• import tensorflow as tf
• import tensorflow
• Run tf
NumPy is imported as?
• Run numpy
• import numpy as np
• import numpy
• Run numpy
Although machine learning is an interesting concept, there are limited business applications in which it is useful.
• True
• False
Which of the following is a technique frequently used in TensorFlow and machine learning?
• Classification of data into categories based on attributes.
• Grouping similar objects into clusters of related events.
• Identifying relationships between events to predict when one will follow the other.
• All of the above are common machine learning techniques.
The k-NN algorithm does more computation at test time than at train time.
• True
• False
Which of the following distance metrics cannot be used in k-NN?
• Manhattan
• Minkowski
• Tanimoto
• Jaccard
• All can be used
Which of the following options is true about the k-NN algorithm?
• It can be used for classification
• It can be used for regression
• It can be used for both classification and regression
For practical implementation, what type of approximation is used on the Boltzmann law?
• max field approximation
• min field approximation
• hopfield approximation
• none
Can false minima be reduced through deterministic updates?
• Yes
• No
What was the second stage in the perceptron model called?
• Sensory units
• Summing unit
• Association unit
• Output unit
Is delta learning of the unsupervised type?
• Yes
• No
What leads to minimization of error between the desired and actual outputs?
• Stability
• Convergence
• Either stability or convergence
• None of the mentioned
Assume a convolutional neural network is trained on the ImageNet dataset (an object recognition dataset). This trained model is then given a completely white image as an input. The output probabilities for this input would be equal for all classes. True or false?
• True
• False
The problem you are trying to solve has a small amount of data. Fortunately, you have a pre-trained neural network that was trained on a similar problem. Which of the following methodologies would you choose to make use of this pre-trained network?
• Re-train the model for the new dataset
• Assess on every layer how the model performs and only select a few of them
• Fine-tune the last couple of layers only
• Freeze all the layers except the last, re-train the last layer
Which of the following is true regarding the backpropagation algorithm?
• Also known as the generalized delta rule.
• The error is propagated backwards to determine weight updates
• No feedback at any stage
• All of the above mentioned
Considering backpropagation, which of the following options is true?

• It is a feedback neural network
• Actual output determined by the output of each hidden layer
• Hidden layers' output is not important at all; they are only meant for supporting the input and output layers
What are the general limitations of back propagation rule?

• No feedback at any stage
• Slow convergence
• Scaling
• All of the mentioned
A format will modify both the stored value and the displayed value.
• Correct
• Incorrect
1) Which of the following statement(s) correctly represents a real neuron in TensorFlow?
• A. A neuron has a single input and a single output only
• B. A neuron has multiple inputs but a single output only
• C. A neuron has a single input but multiple outputs
• D. All of the above statements are valid
2) Which of the following techniques performs operations similar to dropout in a neural network in TensorFlow?
• A. Stacking
• B. Bagging
• C. Boosting
• D. None of these
3) Can a neural network model the function (y=1/x) in TensorFlow?
• A. True
• B. False
4) In which neural network architecture does weight sharing occur in TensorFlow?
• A. Fully connected neural network
• B. Recurrent neural network
• C. Convolutional neural network
• D. Both B and C
5) In a neural network, which of the following techniques is used to deal with overfitting in TensorFlow?
• A. Dropout
• B. Regularization
• C. Batch Normalization
• D. All of the above
6) Y = ax^2 + bx + c (polynomial equation of degree 2). Can this equation be represented by a neural network with a single hidden layer and a linear threshold?
• A. Yes
• B. No
7) A numeric variable can store numeric values with a maximum of eight digits.
• A. True
• B. False
8) Identify the dead unit in a neural network.
• A. The unit which produces the biggest sum-squared error
• B. A unit which does not respond completely to any of the training patterns
• C. A unit which does not update during training by any of its neighbours
• D. None of these
9) What if we use a learning rate that’s too large?
• A. Network will converge
• B. Network will not converge
• C. Can’t say
10) Which of the following functions shouldn't be used at the output layer to classify an image?
• A. tanh
• B. ReLU
• C. sigmoid
• D. None of these
11) If the input layer has 10 nodes and the hidden layer has 5 nodes, what is the maximum number of connections from the input layer to the hidden layer?
• A. Twenty
• B. Sixty
• C. Fifty
• D. It is random
12) From the following choices, where can deep learning be used?
• A. Detection of exotic particles
• B. Protein structure prediction
• C. Prediction of chemical reactions
13) The network that involves feedback links from the output to the input and hidden layers is called a ____
• A. Self-organizing map
• B. Multi-layered perceptron
• C. Recurrent neural network
14) Feature Columns handle a variety of input data types without _______ to the model.
• A. Changes
• B. user help
• C. documentation
• D. None of these
15) Why do we use TPU?
• A. To visualize model
• B. For debugging purpose only
• C. To accelerate the development
• D. TPU does not exist
16) What do you mean by TensorBoard?
• A. TensorBoard provides the visualization and tooling needed for machine learning experimentation
• B. TensorBoard is a metric tool which compares models in terms of their accuracy
• C. TensorBoard does not exist
• D. TensorBoard is used to rank the best performing Tensors
17) Which of the following products isn't built using TensorFlow?
• A. Hand Writing Recognition
• B. Teachable Machine
• C. Nsynth
• D. Pandas
18) What is the full form of TPU?
• A. Two processing unit
• B. Truer processing unit
• C. Test processing unit
• D. Tensor processing unit
19) What is the full form of XLA in TensorFlow?
• A. Accelerated Linear Algebra
• B. Unknown Linear Algebra
• C. Xtreme Linear Algebra
• D. X Linear Algebra
20) Can TensorFlow be deployed in container software?
• A. True
• B. False
21) Which of the following are dashboards in TensorFlow?
• A. Scalar Dashboard
• B. Histogram Dashboard
• C. Distributer Dashboard
22) Identify the types of Tensors.
• A. Variable Tensor
• B. Constant Tensor
• C. Placeholder Tensor
23) Who discovered tensors?
• A. Gargi-Curbastro
• B. Gregorio Ricci-Curbastro
• C. Both A and B
• D. None of these
24) Which of the following is true regarding the backpropagation algorithm?
• A. Also known as the generalized delta rule.
• B. No feedback at any stage
• C. The error is propagated backwards to determine weight updates
25) What are the general limitations of the backpropagation rule?
• A. No feedback at any stage
• B. Slow convergence
• C. Scaling
1. TensorFlow is a Python-based library which is used for creating machine learning applications.
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
View Answer
2. How many types of Tensors are there?
A. 2
B. 3
C. 4
D. 5
View Answer
3. Which of the following are main advantages of TensorFlow?
A. It has auto differentiation capabilities
B. It has platform flexibility
C. It is easily customizable and open-source
D. All of the above
View Answer
4. TensorFlow architecture works in ________ parts.
A. 1
B. 2
C. 3
D. 4
View Answer
5. __________ provides a high-level API which makes neural network building and training fast and easy.
A. TensorLayer
B. TFLearn
C. PrettyTensor
D. Sonnet
View Answer
6. Variables in TensorFlow are also known as?
A. tensor variable
B. tensor keywords
C. tensor attributes
D. tensor objects
View Answer
7. Which of the following defines specific input data that does not change with time?
A. tf.Variable
B. tf.placeholder
C. Both A and B
View Answer
8. Can TensorFlow be deployed in container software?
A. Yes
B. No
C. Can be yes or no
D. Can not say
View Answer
9. Which of the following is true about TensorFlow?
A. TensorFlow is based on the Theano library.
B. It is produced by Google
C. TensorFlow does not have any option at run time
D. All of the above
View Answer
10. DeepSpeech is an open-source engine used to convert Speech into Text.
A. TRUE
B. FALSE
D. Can not say
1) TensorFlow was developed by
• A. Oracle Team
• B. IBM Team
• C. Microsoft Team
• D. Google Brain Team
2) TensorFlow was first introduced on _______
• A. October 9, 2015
• B. October 9, 2016
• C. November 8, 2015
• D. November 9, 2015
3) TensorFlow is written in which language?
• A. C++
• B. CUDA
• C. Python
• D. All of the Above
4) Tensorflow supports ______ of the following platforms.
• A. Linux
• B. macOS
• C. Windows & Android
5) Which of the following techniques performs operations similar to dropout in a neural network in TensorFlow?
• A. Bagging
• B. Boosting
• C. Stacking
• D. None of the above
6) In a neural network, which of the following techniques is used to deal with overfitting in TensorFlow?
• A. Dropout
• B. Regularization
• C. Batch Normalization
• D. All of the above
7) Tensorflow is similar to ______
• A. SQL query
• B. Data Array
• C. ANN Model
• D. Python code
8) Why do we use TPU?
• A. TPU does not exist

• B. To visualize model
• C. To accelerate the development
• D. For debugging purpose only
9) What is the full form of TPU?
• A. Tensor processing unit

• B. Truer processing unit
• C. Two processing unit
• D. Test processing unit

10) Who discovered tensors?
• A. Gregorio Ricci-Curbastro
• B. Gargi-Curbastro
• C. Both A and B
11) How many types of Tensors are there?
• A. One
• B. Two
• C. Three
• D. Four
12) Variables in TensorFlow are also known as ?
• A. tensor objects
• B. tensor variable
• C. tensor attributes
• D. tensor keywords
13) Which of the following is true about TensorFlow?
• A. It is produced by Google
• B. TensorFlow is based on the Theano library.
• C. TensorFlow does not have any option at run time
14) TensorFlow is a free and open-source ______
• A. PHP
• B. Java
• C. Python
• D. Angular
15) Tensorflow supports which python version?
• A. Python 3.0
• B. Python 3.3
• C. Python 3.5
• D. Python 3.6
16) Why does TensorFlow use computational graphs?
• A. Graphs are easy to plot
• B. Calculations can be done in parallel
• C. Tensors are nothing but computational graphs
• D. All of the above
17) Which of the following tools is a deep learning wrapper on TensorFlow?
• A. Creo
• B. Keras
• C. Python
• D. Arduino
18) TensorFlow is mainly used for ______
• A. Classification and Perception
• B. Discovering and Understanding
• C. Prediction and Creation
19) Which of the following statement(s) correctly represents a real neuron in TensorFlow?
• A. A neuron has a single input and a single output only
• B. A neuron has multiple inputs but a single output only
• C. A neuron has a single input but multiple outputs
• D. All of the above statements are valid
20) What if we use a learning rate that’s too large?
• A. Network will converge
• B. Network will not converge
• C. Both A and B
21) What is the full form of XLA in TensorFlow?
• A. X Linear Algebra
• B. Xtreme Linear Algebra
• C. Unknown Linear Algebra
• D. Accelerated Linear Algebra
1. TensorFlow is a free and open-source ............. based library for machine learning.
• Python
• Java
• PHP
• Angular
2. TensorFlow is developed by .......................
• IBM Team
• Microsoft Team
• Google Brain team
View Answer
Google Brain team
Exp: TensorFlow is developed by the Google Brain team.
3. TensorFlow was initially released in .................
• November 9, 2015
• October 9, 2015
View Answer
November 9, 2015
Exp: TensorFlow was initially released on November 9, 2015, about 5.5 years
ago.
4. Tensorflow is written in which language?
• C++
• Python
• CUDA
View Answer
All of the above
Exp: Tensorflow is written in C++, Python, & CUDA programming languages.

5. TensorFlow attracts the largest popularity on GitHub compared to other deep learning frameworks.
• True
• False
View Answer
True
Exp: Yes! TensorFlow attracts the largest popularity on GitHub compared to other deep learning frameworks.
6. Tensorflow supports which python version?
• Python 3.0
• Python 3.3
• Python 3.5
• Python 3.6–3.9
View Answer
Python 3.6–3.9
Exp: Tensorflow supports Python 3.6 to 3.9 version.
7. Tensorflow supports which of the following platforms?
• Linux
• macOS
• Windows & Android
View Answer
All of the above
Exp: Tensorflow supports 64-bit Linux, macOS, Windows & Android platforms.
8. Tensorflow is a symbolic math library based on .............

• Dataflow
• Differentiable programming
• Both Dataflow & Differentiable programming
View Answer
Both Dataflow & Differentiable programming
Exp: Tensorflow is a symbolic math library based on both dataflow &
differentiable programming.
9. There are ........... main tensor types you can create in TensorFlow.
• 2
• 3
• 4
• 5
View Answer
4
Exp: There are 4 main tensor types you can create in TensorFlow. These are tf.Variable, tf.constant, tf.placeholder, & tf.SparseTensor.
10. What is the advantage of TensorFlow?
• It has excellent community support.
• It is designed to use various backends (GPUs, ASICs, etc.) and is also highly parallel.
• It has a unique approach that allows monitoring the training progress of our models and tracking several metrics.
View Answer
All of the above
Exp: The advantages of TensorFlow are: it has excellent community support; it is designed to use various backends (GPUs, ASICs, etc.) and is also highly parallel; it has a unique approach that allows monitoring the training progress of our models and tracking several metrics; and its performance is high, matching the best in the industry.
11. What are the disadvantages of TensorFlow?
• Missing symbolic loops
• No support for Windows
• No GPU support other than Nvidia
View Answer
All of the above
Exp: The disadvantages of TensorFlow are as follows: missing symbolic loops, no support for Windows, no GPU support other than Nvidia, no support for OpenCL, and errors that are hard to find and difficult to debug.
12. What are the Features of TensorFlow?
• Flexible & Open Source

• Easily Trainable & Layered Components
• Open Source & Responsive Construct
View Answer
All of the above
Exp: The main features of TensorFlow are - Responsive Construct, Flexible,
Easily Trainable, Large Community, Open Source, Feature Columns, Layered
Components, & Event Logger (With TensorBoard) and many others.
13. TensorFlow has only supported 64-bit Python 3.5.x or Python 3.6.x on
Windows.
• True
• False
View Answer
True

14. TensorFlow managers handle the full lifecycle of Servables, except ..............
• Serving Servables
• Metrics Servables
• Loading Servables
• Unloading Servables
View Answer
Metrics Servables
Exp: TensorFlow managers handle the full lifecycle of Servables, including loading, serving, and unloading Servables.
15. When was Tensorflow 2.0 released?
• September 2019
• October 2019
• August 2019
• November 2019
View Answer
September 2019
Exp: Tensorflow 2.0 was released on September 30, 2019.
16. Why does TensorFlow use computational graphs?
• Graphs are easy to plot
• Calculations can be done in parallel
• Tensors are nothing but computational graphs
View Answer
Calculations can be done in parallel
Exp: Tensorflow uses computational graphs because calculations can be done
in parallel.

17. What is the use of a session in TensorFlow?
• We launch the graph in a session

• A session is used to download the data
• The current work space session for storing the code
• A session is used for exporting data out of TensorFlow
View Answer
We launch the graph in a session
Exp: Basically, we launch the graph in a session in TensorFlow.
18. What are the different dashboards in TensorFlow?
• Scalar Dashboard
• Histogram Dashboard
• Distributer Dashboard
View Answer
All of the above
Exp: Different types of dashboards are available in TensorFlow, such as the Scalar Dashboard, Histogram Dashboard, Distributor Dashboard, Image Dashboard, and Audio Dashboard.
19. Which of the following tools is a deep learning wrapper on TensorFlow?
• Keras
• Azure
• Python
• PyTorch
View Answer
Keras
Exp: Keras is a deep learning wrapper on TensorFlow.

20. Can we use GPU for faster computations in TensorFlow?
• Yes
• No
View Answer
Yes
Exp: Yes! we can use GPU for faster computations in TensorFlow.
Question-1 = Why is the convolutional layer important in convolutional neural networks?
Solution = Because if we do not use a convolutional layer, we will end up with a
massive number of parameters that will need to be optimized and it will be
super computationally expensive.
Question-2 = The following is a typical architecture of a convolutional neural
network.
Solution = False
Question-3 = For unsupervised learning, which of the following deep neural
networks would you choose? Select all that apply
Solution = Autoencoders, Restricted Boltzmann Machines.
Question-4 = Recurrent Neural Networks are networks with loops, that don’t
just take a new input at a time, but also take as input the output from the data
point at the previous instance.
Solution = True
Question-5 = Which of the following statements is correct?
Solution = An autoencoder is an unsupervised neural network model that uses
backpropagation by setting the target variable to be the same as the input.
1. _________ is a high level API built on TensorFlow.
A. PyBrain
B. Keras
C. PyTorch
D. Theano
View Answer
2. Is keras a library?

A. Yes
B. No
C. Can be yes or no
D. Can not say
View Answer
3. Who invented keras?
A. Michael Berthold
B. Adam Paszke
C. Sam Gross
D. François Chollet
View Answer
4. __________ is a regularization technique for neural network models proposed by Srivastava et al.; it is a technique where randomly selected neurons are ignored during training.
A. Callout
B. Digout
C. Dropout
D. Knimeout
View Answer
5. What is true about Keras?
A. Keras is an API designed for human beings, not machines.

B. Keras follows best practices for reducing cognitive load
C. it provides clear and actionable feedback upon user error
D. All of the above
View Answer
6. A flatten operation on a tensor reshapes the tensor to have a shape that is equal to the number of elements contained in the tensor.
A. TRUE
B. FALSE
D. Can not say
View Answer
7. What are advanced activation functions in keras ?
A. LeakyReLU
B. PReLU
C. Both A and B
View Answer
8. Which of the following are correct initializers in keras?
A. keras.initializers.Initializer()
B. keras.initializers.Zeros()
C. keras.initializers.Ones()
D. All of the above
View Answer
9. A ____________ requires the shape of the input (input_shape) to understand the structure of the input data.
A. Keras layer
B. Keras Module
C. Keras Model
D. Keras Time
View Answer
10. Which of the following returns all the layers of the model as list?
A. model.inputs
B. model.layers

C. model.outputs
D. model.get_weights
Q1. Which of the following statement(s) correctly represents a real neuron?

A. A neuron has a single input and a single output only
B. A neuron has multiple inputs but a single output only
C. A neuron has a single input but multiple outputs
D. A neuron has multiple inputs and multiple outputs
E. All of the above statements are valid
Solution: (E)
A neuron can have a single Input / Output or multiple Inputs / Outputs.
Q2. Below is a mathematical representation of a neuron.
The different components of the neuron are

denoted as:
• x1, x2,…, xN: These are inputs to the neuron. These can either be the
actual observations from input layer or an intermediate value from one
of the hidden layers.
• w1, w2,…,wN: The Weight of each input.
• bi: Is termed as Bias units. These are constant values added to the input
of the activation function corresponding to each weight. It works similar
to an intercept term.
• a: Is termed as the activation of the neuron which can be represented
as
• and y: is the output of the neuron

Considering the above notations, will a line equation (y = mx + c) fall into the
category of a neuron?
A. Yes
B. No
Solution: (A)
A single neuron with no non-linearity can be considered as a linear regression
function.
Q3. Let us assume we implement an AND function to a single neuron. Below is

a tabular representation of an AND function:
X1 X2 X1 AND X2
0 0 0
0 1 0
1 0 0
1 1 1
The activation function of our neuron is denoted as:
What would be the weights and bias?

(Hint: For which values of w1, w2 and b does our neuron implement an AND
function?)
A. Bias = -1.5, w1 = 1, w2 = 1
B. Bias = 1.5, w1 = 2, w2 = 2
C. Bias = 1, w1 = 1.5, w2 = 1.5
D. None of these
Solution: (A)
A.
1. f(-1.5*1 + 1*0 + 1*0) = f(-1.5) = 0

2. f(-1.5*1 + 1*0 + 1*1) = f(-0.5) = 0
3. f(-1.5*1 + 1*1 + 1*0) = f(-0.5) = 0
4. f(-1.5*1 + 1*1+ 1*1) = f(0.5) = 1
Therefore option A is correct
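The check in the solution can be repeated in a few lines of Python. The activation formula is not reproduced in this text, so the sketch assumes a step function that returns 1 for inputs of zero or more and 0 otherwise, which is consistent with f(-0.5) = 0 and f(0.5) = 1 above; the function names are illustrative only.

```python
def step(z):
    # Assumed threshold activation: 1 when z >= 0, otherwise 0.
    return 1 if z >= 0 else 0

def neuron(x1, x2, bias, w1, w2):
    # The bias is multiplied by a constant input of 1, as in the worked solution.
    return step(bias * 1 + w1 * x1 + w2 * x2)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Option A: Bias = -1.5, w1 = 1, w2 = 1
option_a = [neuron(x1, x2, -1.5, 1, 1) for x1, x2 in inputs]
# Option B: Bias = 1.5, w1 = 2, w2 = 2
option_b = [neuron(x1, x2, 1.5, 2, 2) for x1, x2 in inputs]
# Option C: Bias = 1, w1 = 1.5, w2 = 1.5
option_c = [neuron(x1, x2, 1, 1.5, 1.5) for x1, x2 in inputs]

and_truth = [x1 & x2 for x1, x2 in inputs]
```

Under this assumed activation, only option A reproduces the AND column; options B and C have a positive bias, so they output 1 for every input pair.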
Q4. A network is created when multiple neurons are stacked together. Let us take an example of a neural network simulating an XNOR function.
You can see that the last neuron takes input from two neurons before it. The
activation function for all the neurons is given by:

Suppose X1 is 0 and X2 is 1, what will be the output for the above neural
network?
A. 0
B. 1
Solution: (A)
Output of a1: f(0.5*1 + -1*0 + -1*1) = f(-0.5) = 0
Output of a2: f(-1.5*1 + 1*0 + 1*1) = f(-0.5) = 0
Output of a3: f(-0.5*1 + 1*0 + 1*0) = f(-0.5) = 0
So the correct answer is A
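The same calculation can be run for all four input pairs. The network diagram is not available in this text, so the sketch below takes the biases and weights directly from the three lines of the solution and assumes the same step activation (1 for inputs of zero or more, 0 otherwise); the function name `xnor_network` is made up for the example.

```python
def step(z):
    # Assumed threshold activation: 1 when z >= 0, otherwise 0.
    return 1 if z >= 0 else 0

def xnor_network(x1, x2):
    a1 = step(0.5 * 1 + -1 * x1 + -1 * x2)   # fires only when both inputs are 0 (NOR)
    a2 = step(-1.5 * 1 + 1 * x1 + 1 * x2)    # fires only when both inputs are 1 (AND)
    a3 = step(-0.5 * 1 + 1 * a1 + 1 * a2)    # fires when either hidden neuron fires (OR)
    return a3

outputs = {(x1, x2): xnor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```

With these weights the output is 1 exactly when the two inputs are equal, which is the XNOR function, and the pair (0, 1) from the question gives 0.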
Q5. In a neural network, knowing the weight and bias of each neuron is the
most important step. If you can somehow get the correct value of weight and
bias for each neuron, you can approximate any function. What would be the
best way to approach this?
A. Assign random values and pray to God they are correct
B. Search every possible combination of weights and biases till you get the best
value
C. Iteratively check, after assigning a value, how far you are from the best values, and slightly change the assigned values to make them better
D. None of these
Solution: (C)
Option C is the description of gradient descent.
Q6. What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Reiterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias

5. Go to each neuron which contributes to the error and change its respective values to reduce the error
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 3, 2, 1, 5, 4
D. 4, 3, 1, 5, 2
Solution: (D)
Option D is correct
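The order 4, 3, 1, 5, 2 can be traced in a small training loop. This is a minimal sketch for a single linear neuron fitted to made-up data from y = 2x + 1 with a mean squared error loss; the data, learning rate, epoch count, and function name are all assumptions chosen for illustration.

```python
import random

def train(data, lr=0.05, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Step 4: initialize random weight and bias.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    # Step 2: reiterate (here for a fixed number of epochs).
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            pred = w * x + b          # Step 3: pass the input through and read the output.
            err = pred - y            # Step 1: error between predicted and actual value.
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        # Step 5: change the values that contribute to the error so that it shrinks.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training set sampled from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train(data)
```

On this toy data the learned weight and bias approach 2 and 1. A real network repeats the same loop with many weights and uses backpropagation to compute the gradients.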
Q7. Suppose you have inputs as x, y, and z with values -2, 5, and -4 respectively.
You have a neuron ‘q’ and neuron ‘f’ with functions:
q=x+y
f=q*z
Graphical representation of the functions is as follows:
What is the gradient of f with respect to x, y, and z?

(HINT: To calculate gradient, you must find (df/dx), (df/dy) and (df/dz))
A. (-3,4,4)
B. (4,4,3)
C. (-4,-4,3)
D. (3,-4,-4)
Solution: (C)
Option C is correct.
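The answer follows from the chain rule: since f = q*z and q = x + y, df/dz = q, and df/dx = df/dy = z. A short sketch of the forward and backward pass, with an illustrative function name, is:

```python
def forward_backward(x, y, z):
    # Forward pass through the two nodes.
    q = x + y
    f = q * z
    # Backward pass: local derivatives combined with the chain rule.
    df_dq = z           # d(q*z)/dq
    df_dz = q           # d(q*z)/dz
    df_dx = df_dq * 1   # dq/dx = 1
    df_dy = df_dq * 1   # dq/dy = 1
    return f, (df_dx, df_dy, df_dz)

f, grads = forward_backward(-2.0, 5.0, -4.0)
```

Here q = 3 and f = -12, and the gradient is (-4, -4, 3), matching option C.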

Q8. Now let’s revise the previous slides. We have learned that:
• A neural network is a (crude) mathematical representation of a brain,

which consists of smaller components called neurons.
• Each neuron has an input, a processing function, and an output.
• These neurons are stacked together to form a network, which can be
used to approximate any function.
• To get the best possible neural network, we can use techniques like
gradient descent to update our neural network model.
Given above is a description of a neural network. When does a neural network
model become a deep learning model?
A. When you add more hidden layers and increase depth of neural network
B. When there is higher dimensionality of data
C. When the problem is an image recognition problem
D. None of these
Solution: (A)
More depth means the network is deeper. There is no strict rule of how many
layers are necessary to make a model deep, but still if there are more than 2
hidden layers, the model is said to be deep.
Q9. A neural network can be considered as multiple simple equations stacked

together. Suppose we want to replicate the function for the below mentioned
decision boundary.
Using two simple inputs h1 and h2

What will be the final equation?

A. (h1 AND NOT h2) OR (NOT h1 AND h2)
B. (h1 OR NOT h2) AND (NOT h1 OR h2)
C. (h1 AND h2) OR (h1 OR h2)
D. None of these
Solution: (A)
As you can see, combining h1 and h2 in an intelligent way can get you a
complex equation easily. Refer Chapter 9 of this book
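The decision-boundary figure is not reproduced in this text, so the sketch below does not verify the boundary itself. It only tabulates the expression in option A, treating h1 and h2 as booleans, to show which combinations it accepts.

```python
def option_a(h1, h2):
    # (h1 AND NOT h2) OR (NOT h1 AND h2)
    return (h1 and not h2) or (not h1 and h2)

table = [(h1, h2, option_a(h1, h2))
         for h1 in (False, True)
         for h2 in (False, True)]
```

The expression is true exactly when one of h1 and h2 is true and the other is false, that is, it is the exclusive OR of the two simple inputs.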
Q10. “Convolutional Neural Networks can perform various types of

transformation (rotations or scaling) in an input”. Is the statement correct True
or False?
A. True
B. False
Solution: (B)
Data preprocessing steps (viz. rotation, scaling) are necessary before you give the data to a neural network, because the neural network cannot do them itself.
Q11. Which of the following techniques perform similar operations as dropout

in a neural network?
A. Bagging
B. Boosting

C. Stacking
D. None of these
Solution: (A)
Dropout can be seen as an extreme form of bagging in which each model is
trained on a single case and each parameter of the model is very strongly
regularized by sharing it with the corresponding parameter in all the other
models.
Q12. Which of the following gives non-linearity to a neural network?

A. Stochastic Gradient Descent
B. Rectified Linear Unit
C. Convolution function
Solution: (B)
Rectified Linear unit is a non-linear activation function.
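A linear function f satisfies f(a + b) = f(a) + f(b). The minimal sketch below shows that ReLU breaks this property for one pair of inputs; the values of `a` and `b` are arbitrary.

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


a, b = 2.0, -3.0
lhs = relu(a + b)        # relu(-1.0)
rhs = relu(a) + relu(b)  # relu(2.0) + relu(-3.0)

# For a linear function these two values would be equal.
print(lhs, rhs)
```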
Q13. In training a neural network, you notice that the loss does not decrease
in the few starting epochs.
The reasons for this could be:
1. The learning rate is low
2. Regularization parameter is high
3. Stuck at local minima

What according to you are the probable reasons?
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. Any of these
Solution: (D)
The problem can occur due to any of the reasons mentioned.
Q14. Which of the following is true about model capacity (where model
capacity means the ability of neural network to approximate complex
functions) ?
A. As number of hidden layers increase, model capacity increases
B. As dropout ratio increases, model capacity increases
C. As learning rate increases, model capacity increases
D. None of these
Solution: (A)
Only option A is correct.
Q15. If you increase the number of hidden layers in a Multi Layer Perceptron,
the classification error of test data always decreases. True or False?
A. True
B. False
Solution: (B)
This is not always true. Overfitting may cause the error to increase.
Q16. You are building a neural network where it gets input from the previous
layer as well as from itself.
Which of the following architectures has feedback connections?

A. Recurrent Neural network
B. Convolutional Neural Network
C. Restricted Boltzmann Machine
D. None of these
Solution: (A)
Option A is correct.
Q17. What is the sequence of the following tasks in a perceptron?
1. Initialize weights of perceptron randomly

2. Go to the next batch of dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
A. 1, 2, 3, 4
B. 4, 3, 2, 1
C. 3, 1, 2, 4
D. 1, 4, 3, 2
Solution: (D)

Sequence D is correct.
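The order 1, 4, 3, 2 can be sketched as a training loop. This is an illustrative sketch only: the AND dataset, the learning rate, the seed and the function names are assumptions, not part of the question.

```python
import random


def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, lr=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize weights (and bias) randomly.
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            # Step 4: for a sample input, compute an output.
            output = predict(weights, bias, x)
            # Step 3: if the prediction does not match, change the weights.
            if output != target:
                error = target - output
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
            # Step 2: the loop then moves on to the next sample (batch).
        if mistakes == 0:
            break
    return weights, bias


# Hypothetical toy dataset: logical AND, which is linearly separable.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_data)
print([predict(weights, bias, x) for x, _ in and_data])
```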
Q18. Suppose that you have to minimize the cost function by changing the
parameters. Which of the following technique could be used for this?
A. Exhaustive Search
B. Random Search
C. Bayesian Optimization
D. Any of these
Solution: (D)
Any of the above mentioned techniques can be used to change parameters.
Q19. First Order Gradient descent would not work correctly (i.e. may get stuck)
in which of the following graphs?
A.

B.
C.
D. None of these
Solution: (B)
This is a classic example of saddle point problem of gradient descent.
Q20. The below graph shows the accuracy of a trained 3-layer convolutional
neural network vs the number of parameters (i.e. number of feature kernels).

The trend suggests that as you increase the width of a neural network, the
accuracy increases till a certain threshold value, and then starts decreasing.
What could be the possible reason for this decrease?
A. Even if the number of kernels increases, only a few of them are used for prediction
B. As the number of kernels increases, the predictive power of the neural network
decreases
C. As the number of kernels increases, they start to correlate with each other,
which in turn encourages overfitting
D. None of these
Solution: (C)
As mentioned in option C, the possible reason could be kernel correlation.
Q21. Suppose we have one hidden layer neural network as shown above. The
hidden layer in this network works as a dimensionality reducer. Now instead
of using this hidden layer, we replace it with a dimensionality reduction
technique such as PCA.
Would the network that uses a dimensionality reduction technique always
give the same output as the network with the hidden layer?
A. Yes
B. No
Solution: (B)
Because PCA works on correlated features, whereas hidden layers work on
predictive capacity of features.
Q22. Can a neural network model the function (y=1/x)?

A. Yes
B. No
Solution: (A)
Option A is true, because the activation function can be a reciprocal function.
Q23. In which neural net architecture, does weight sharing occur?

A. Convolutional neural Network
B. Recurrent Neural Network
C. Fully Connected Neural Network
D. Both A and B

Solution: (D)
Option D is correct.
Q24. Batch Normalization is helpful because

A. It normalizes (changes) all the input before sending it to the next layer
B. It returns back the normalized mean and standard deviation of weights
C. It is a very efficient backpropagation technique
D. None of these
Solution: (A)
To read more about batch normalization, refer to this video.
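As a rough illustration of option A, the sketch below normalizes one batch of activations to zero mean and (almost) unit variance before they would be passed to the next layer. The names `gamma`, `beta` and `eps` follow common convention for the learnable scale, learnable shift and numerical-stability constant; the input values are made up.

```python
import math


def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations, then scale and shift."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical outputs of one neuron over a batch
normalized = batch_normalize(activations)
print(normalized)
```

This sketch covers only the training-time forward pass for a single feature; real implementations also track running statistics for use at inference time.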
Q25. Instead of trying to achieve absolute zero error, we set a metric called
bayes error which is the error we hope to achieve. What could be the reason
for using bayes error?
A. Input variables may not contain complete information about the output
variable
B. System (that creates input-output mapping) may be stochastic
C. Limited training data
D. All the above
Solution: (D)
In reality achieving accurate prediction is a myth. So we should hope to achieve
an “achievable result”.
Q26. The number of neurons in the output layer should match the number of
classes (Where the number of classes is greater than 2) in a supervised learning
task. True or False?
A. True
B. False

Solution: (B)
It depends on the output encoding. If it is one-hot encoding, then it's true. But you
can have two outputs for four classes, and take the binary values as four
classes (00, 01, 10, 11).
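The binary encoding mentioned in the solution can be sketched as follows. The 0.5 threshold and the function name are assumptions made for illustration.

```python
def decode_binary_outputs(outputs, threshold=0.5):
    """Read thresholded output neurons as the bits of a class index."""
    class_index = 0
    for value in outputs:
        bit = 1 if value >= threshold else 0
        class_index = class_index * 2 + bit
    return class_index


# Two output neurons are enough to distinguish four classes.
print(decode_binary_outputs([0.1, 0.2]))  # bits 00
print(decode_binary_outputs([0.1, 0.9]))  # bits 01
print(decode_binary_outputs([0.8, 0.3]))  # bits 10
print(decode_binary_outputs([0.9, 0.7]))  # bits 11
```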
Q27. In a neural network, which of the following techniques is used to deal

with overfitting?
A. Dropout
B. Regularization
C. Batch Normalization
D. All of these
Solution: (D)
All of the techniques can be used to deal with overfitting.
Q28. Y = ax^2 + bx + c (polynomial equation of degree 2)

Can this equation be represented by a neural network of single hidden layer
with linear threshold?
A. Yes
B. No
Solution: (B)
The answer is no because having a linear threshold restricts your neural network
and in simple terms, makes it a consequential linear transformation function.
Q29. What is a dead unit in a neural network?

A. A unit which doesn’t update during training by any of its neighbour
B. A unit which does not respond completely to any of the training patterns
C. The unit which produces the biggest sum-squared error
D. None of these

Solution: (A)
Option A is correct.
Q30. Which of the following statement is the best description of early
stopping?
A. Train the network until a local minimum in the error function is reached
B. Simulate the network on a test dataset after every epoch of training. Stop
training when the generalization error starts to increase
C. Add a momentum term to the weight update in the Generalized Delta Rule,
so that training converges more quickly
D. A faster version of backpropagation, such as the ‘Quickprop’ algorithm
Solution: (B)
Option B is correct.
Q31. What if we use a learning rate that’s too large?

A. Network will converge
B. Network will not converge
C. Can’t Say
Solution: (B)
Option B is correct because the error rate would become erratic and explode.
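A small sketch of this effect, using plain gradient descent on the one-dimensional function f(x) = x^2. The function and the two learning rates are chosen only for illustration.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 starting from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        grad = 2 * x    # derivative of x**2
        x -= lr * grad  # each step multiplies x by (1 - 2 * lr)
    return x


small_lr_result = gradient_descent(lr=0.1)  # factor 0.8 per step: shrinks toward 0
large_lr_result = gradient_descent(lr=1.5)  # factor -2 per step: oscillates and grows
print(small_lr_result, large_lr_result)
```

With the large learning rate every update overshoots the minimum by more than the previous distance, so the iterate flips sign and doubles in magnitude at each step.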
Q32. The network shown in Figure 1 is trained to recognize the characters H

and T as shown below:

What would be the output of the network?
A.
B.
C.
D. Could be A or B depending on the weights of neural network
Solution: (D)
Without knowing what are the weights and biases of a neural network, we
cannot comment on what output it would give.
Q33. Suppose a convolutional neural network is trained on ImageNet dataset
(Object recognition dataset). This trained model is then given a completely
white image as an input. The output probabilities for this input would be equal
for all classes. True or False?
A. True
B. False
Solution: (B)

There would be some neurons which do not activate for white pixels as
input. So the class probabilities won't be equal.
Q34. When pooling layer is added in a convolutional neural network,
translation invariance is preserved. True or False?
A. True
B. False
Solution: (A)
Translation invariance is induced when you use pooling.
Q35. Which gradient technique is more advantageous when the data is too big
to handle in RAM simultaneously?
A. Full Batch Gradient Descent
B. Stochastic Gradient Descent
Solution: (B)
Q36. The graph represents gradient flow of a four-hidden layer neural network
which is trained using sigmoid activation function per epoch of training. The
neural network suffers with the vanishing gradient problem.

Which of the following statements is true?

A. Hidden layer 1 corresponds to D, Hidden layer 2 corresponds to C, Hidden
layer 3 corresponds to B and Hidden layer 4 corresponds to A
B. Hidden layer 1 corresponds to A, Hidden layer 2 corresponds to B, Hidden
layer 3 corresponds to C and Hidden layer 4 corresponds to D
Solution: (A)
This is a description of a vanishing gradient problem. As the backprop algorithm
goes to starting layers, learning decreases.
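One way to see why the earliest layers learn slowest: the derivative of the sigmoid never exceeds 0.25, and backpropagation multiplies one such factor per layer it passes through. The sketch below ignores the weight terms (a simplifying assumption) and only tracks this upper bound.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


# The slope of the sigmoid is largest at z = 0.
max_slope = sigmoid_grad(0.0)

# Upper bound on the gradient scaling after passing back through 1..4 sigmoid layers.
bounds = [max_slope ** depth for depth in range(1, 5)]
print(max_slope, bounds)
```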
Q37. For a classification task, instead of random weight initializations in a

neural network, we set all the weights to zero. Which of the following
statements is true?
A. There will not be any problem and the neural network will train properly
B. The neural network will train but all the neurons will end up recognizing the
same thing
C. The neural network will not train as there is no net gradient change

D. None of these
Solution: (B)
Q38. There is a plateau at the start. This is happening because the neural
network gets stuck at local minima before going on to global minima.
To avoid this, which of the following strategy should work?

A. Increase the number of parameters, as the network would not get stuck at
local minima
B. Decrease the learning rate by 10 times at the start and then use momentum
C. Jitter the learning rate, i.e. change the learning rate for a few epochs
D. None of these
Solution: (C)
Option C can be used to take a neural network out of local minima in which it is
stuck.
Q39. For an image recognition problem (recognizing a cat in a photo), which

architecture of neural network would be better suited to solve the problem?
A. Multi Layer Perceptron

B. Convolutional Neural Network

C. Recurrent Neural network
D. Perceptron
Solution: (B)
Convolutional Neural Network would be better suited for image related
problems because of its inherent nature for taking into account changes in
nearby locations of an image
Q40. Suppose while training, you encounter this issue. The error suddenly
increases after a couple of iterations.
You determine that there must be a problem with the data. You plot the data and
find that the original data is somewhat skewed and that may be
causing the problem.
What will you do to deal with this challenge?

A. Normalize
B. Apply PCA and then Normalize
C. Take Log Transform of the data
D. None of these
Solution: (B)
First you would remove the correlations of the data and then zero center it.
Q41. Which of the following is a decision boundary of Neural Network?
A) B
B) A
C) D
D) C
E) All of these
Solution: (E)
A neural network is said to be a universal function approximator, so it can
theoretically represent any decision boundary.
Q42. In the graph below, we observe that the error has many “ups and
downs”

Should we be worried?
A. Yes, because this means there is a problem with the learning rate of neural
network.
B. No, as long as there is a cumulative decrease in both training and validation
error, we don’t need to worry.
Solution: (B)
Option B is correct. In order to decrease these “ups and downs” try to increase
the batch size.
Q43. What are the factors to select the depth of neural network?
1. Type of neural network (eg. MLP, CNN etc)

2. Input data
3. Computation power, i.e. Hardware capabilities and software capabilities
4. Learning Rate
5. The output function to map
A. 1, 2, 4, 5
B. 2, 3, 4, 5
C. 1, 3, 4, 5
D. All of these

Solution: (D)
All of the above factors are important to select the depth of neural network
Q44. Consider the scenario. The problem you are trying to solve has a small
amount of data. Fortunately, you have a pre-trained neural network that was
trained on a similar problem. Which of the following methodologies would you
choose to make use of this pre-trained network?
A. Re-train the model for the new dataset
B. Assess on every layer how the model performs and only select a few of them
C. Fine tune the last couple of layers only
D. Freeze all the layers except the last, re-train the last layer
Solution: (D)
If the dataset is mostly similar, the best method would be to train only the last
layer, as all previous layers work as feature extractors.
Q45. Increase in size of a convolutional kernel would necessarily increase the

performance of a convolutional network.
A. True
B. False
Solution: (B)
1. Which of the following is a subset of machine learning?
• Numpy
• SciPy
• Deep Learning
View Answer
Correct Answer:

Deep Learning
2. With how many layers are deep learning algorithms constructed?
• 2
• 3
• 4
• 5
View Answer
Correct Answer:
3
3. The first layer is called the?
• inner layer
• outer layer
• hidden layer
View Answer
Correct Answer:
inner layer
4. CNN is mostly used when there is an?
• structured data
• unstructured data
• Both A and B
View Answer
Correct Answer:
unstructured data
5. Which of the following is/are Common uses of RNNs?
• Businesses: Help securities traders to generate analytic reports
• Detect fraudulent credit-card transaction
• Provide a caption for images
• All of the above
View Answer
Correct Answer:
All of the above
6. Which neural network has only one hidden layer between the input and
output?
• Shallow neural network

• Deep neural network
• Feed-forward neural networks
• Recurrent neural networks
View Answer
Correct Answer:
Shallow neural network
7. RNNs stands for?
• Receives neural networks
• Report neural networks
• Recording neural networks
• Recurrent neural networks
View Answer
Correct Answer:
Recurrent neural networks
8. Deep learning algorithms are _______ more accurate than machine learning
algorithm in image classification.
• 33%
• 37%
• 40%
• 41%
View Answer
Correct Answer:
41%
9. Which of the following is well suited for perceptual tasks?

• Feed-forward neural networks

• Convolutional neural networks
• Reinforcement Learning
View Answer
Correct Answer:
Convolutional neural networks
10. Which of the following is/are Limitations of deep learning?
• Data labeling
• Obtain huge training datasets
• both 1 and 2
View Answer
Correct Answer:
both 1 and 2
11. The input image has been converted into a matrix of size 28 X 28 and a
kernel/filter of size 7 X 7 with a stride of 1. What will be the size of the
convoluted matrix?
• 20x20
• 21x21
• 22x22
• 25x25
View Answer
Correct Answer:
22x22
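The answer follows from the usual output-size formula for a convolution, floor((n + 2p - k) / s) + 1. The question does not mention padding, so the sketch assumes none.

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Side length of the output for an n x n input and a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1


size = conv_output_size(n=28, k=7, stride=1)
print(f"{size}x{size}")
```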
12. Which of the following statements is true when you use 1×1 convolutions
in a CNN?
• It can help in dimensionality reduction
• It can be used for feature pooling
• It suffers less overfitting due to small kernel size
• All of the above
View Answer
Correct Answer:
All of the above
13. Which of the following functions can be used as an activation function in
the output layer if we wish to predict the probabilities of k classes (p1, p2, ..., pk)
such that the sum of p over all k classes equals 1?
• Softmax
• ReLu
• Sigmoid
• Tanh
View Answer
Correct Answer:
Softmax
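A minimal softmax sketch, showing that the outputs are positive and sum to 1. The input scores are arbitrary, and subtracting the maximum is a common numerical-stability step that does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of real-valued scores into class probabilities."""
    shift = max(logits)  # avoids overflow in exp for large scores
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```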
14. The number of nodes in the input layer is 10 and the hidden layer is 5. The
maximum number of connections from the input layer to the hidden layer are
• 50
• less than 50
• more than 50
• It is an arbitrary value
View Answer
Correct Answer:
50
15. In which of the following applications can we use deep learning to solve
the problem?
• Protein structure prediction
• Prediction of chemical reactions
• Detection of exotic particles
• All of the above
View Answer
Correct Answer:
All of the above
16. Assume a simple MLP model with 3 neurons and inputs= 1,2,3. The
weights to the input neurons are 4,5 and 6 respectively. Assume the activation
function is a linear constant value of 3. What will be the output ?
• 32
• 64
• 96
• 128
View Answer
Correct Answer:
96
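The arithmetic behind this answer, reading "a linear constant value of 3" as the activation f(z) = 3z (an interpretation, since the question wording is loose):

```python
inputs = [1, 2, 3]
weights = [4, 5, 6]

# Weighted sum of the inputs: 1*4 + 2*5 + 3*6
weighted_sum = sum(x * w for x, w in zip(inputs, weights))

# Linear activation that multiplies its input by the constant 3.
output = 3 * weighted_sum
print(weighted_sum, output)
```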
17. In a simple MLP model with 8 neurons in the input layer, 5 neurons in the
hidden layer and 1 neuron in the output layer. What is the size of the weight
matrices between hidden output layer and input hidden layer?
• [1 X 5] , [5 X 8]
• [5 x 1] , [8 X 5]
• [8 X 5] , [5 X 1]
• [8 X 5] , [ 1 X 5]
View Answer
Correct Answer:
[5 x 1] , [8 X 5]
18. Which of the following would have a constant input in each epoch of
training a Deep Learning model?
• Weight between input and hidden layer

• Weight between hidden and output layer
• Biases of all hidden layer neurons
• Activation function of output layer
View Answer

Correct Answer:
Weight between input and hidden layer
19. In a CNN, does having max pooling always decrease the parameters?
• True
• False
• Can be true or false
• Cannot say
View Answer
Correct Answer:
False
20. Sentiment analysis using Deep Learning is a many-to-one prediction task
• True
• False
• Can be true or false
• Cannot say
View Answer
Correct Answer:
True
21. Which, if any, of the following propositions is true about fully-connected
neural networks (FCNN)?
• In a FCNN, there are connections between neurons of a same layer.

• In a FCNN, the most common weight initialization scheme is the zero
initialization, because it leads to faster and more robust training.
• A FCNN with only linear activations is a linear network.
View Answer
Correct Answer:
A FCNN with only linear activations is a linear network.
22. What does a Boltzmann machine consist of?
• fully connected network with both hidden and visible units

• asynchronous operation
• stochastic update
• all of the mentioned
View Answer
Correct Answer:
all of the mentioned
23. In which neural net architecture, does weight sharing occur?
• Convolutional neural Network

• Recurrent Neural Network
• Fully Connected Neural Network
• Both 1 and 2
View Answer
Correct Answer:
Both 1 and 2
24. Which of the following methods DOES NOT prevent a model from
overfitting to the training set?
• Early stopping
• Dropout
• Data augmentation
• Pooling
View Answer
Correct Answer:
Pooling
25. Assume that your machine has a large enough RAM dedicated to training
neural networks. Compared to using stochastic gradient descent for your
optimization, choosing a batch size that fits your RAM will lead to:
• a more precise but slower update.

• a more precise and faster update.
• a less precise but faster update.
• a less precise and slower update.

View Answer
Correct Answer:
a more precise but slower update.
Question 1
For which purpose is a Convolutional Neural Network used?
Mainly to process and analyse digital images, with some success cases
involving processing voice and natural language.
It is a multi-purpose algorithm that can be used for Unsupervised Learning.
Mainly to process and analyse financial models, predicting future trends.
It is a multi-purpose algorithm that can be used for Supervised Learning.

A CNN has some components and parameters which work well with images.
That's why it's mainly used to analyse and predict images.
Question 2
What is the biggest advantage of using a CNN?
Little dependence on pre-processing, decreasing the need for human effort in
developing its functionalities.
It is easy to understand and fast to implement.
It has the highest accuracy among all algorithms that predict images.
It works well both for Supervised and Unsupervised Learning.

With little dependence on pre-processing, this algorithm requires less human
effort. It is actually a self-learner, which makes the pre-processing phase
easier.
Convolutional Neural Network has 5 basic components: Convolution, ReLU,
Pooling, Flattening and Full Connection. Based on this information, please
answer the questions below.
Question 3
Which answer explains better the Convolution?
Detect key features in images, respecting their spatial boundaries.
It is the first step to use CNN.
Understand the model features and selecting the best.
It is a technique to standardize the dataset.

This is the component which detects features in images, preserving the
relationship between pixels by learning image features using small squares of
input data.
Question 4
Which answer explains better the ReLU?
Helps in the detection of features, decreasing the non-linearity of the image,
converting negative pixels to zero. This behavior allows you to detect
variations of attributes.
It is used to find the best features considering their correlation.

Helps in the detection of features, increasing the non-linearity of the image,
converting positive pixels to zero. This behavior allows you to detect variations
of attributes.
A technique that allows you to find outliers.

Usually an image is highly non-linear, which means varied pixel values. This is a
scenario in which it is very difficult for an algorithm to make correct predictions. ReLU
comes to decrease the non-linearity and make the job easier.
Question 5
Which answer explains better the Pooling?
It assists in the detection of features, even if they are distorted, in addition to
decreasing the attribute sizes, resulting in decreased computational need. It is
also very useful for extracting dominant attributes.
It assists in the detection of distorted features, in order to find dominant
attributes.
Creates a pool of data in order to improve the accuracy of the algorithm
predicting images.
Decreases the feature size, in order to decrease the computational power that
is needed.
As a result of pooling, even if the picture were a little tilted, the largest number
in a certain region of the feature map would have been recorded and hence,
the feature would have been preserved. Also, as another benefit, reducing the
size by a very significant amount will use less computational power.
Question 6
Which answer explains better the Flattening?
Once we have the pooled feature map, this component transforms the
information into a vector. It's the input we need to get on with Artificial Neural
Networks.
Transform images to vectors to make it easier to predict.
Delete unnecessary features to make our dataset cleaner.
It is the last step of CNN.

In the flattening procedure, we basically take the elements in a pooled feature
map and put them in a vector form. This becomes the input layer for the
upcoming ANN.
Question 7
Which answer explains better the Full Connection?
Full Connection acts by placing different weights in each synapse in order to
minimize errors. This step can be repeated until an expected result is achieved.
Full Connection acts by placing different weights in each synapse in order to
minimize errors. No iteration is needed, since we can get the best results in our
first attempt.

It is the last step of CNN, where we connect the results of the earlier
components to create an output.
It is a component that connects different algorithms in order to increase the
accuracy.
It works like an ANN, assigning random weights to each synapse; the input layer
is weight-adjusted and put into an activation function. The output of this is
then compared to the true values and the error generated is back-propagated,
i.e. the weights are re-adjusted and all the processes repeated. This is done
until the error or cost function is minimised.
Question 8
What are the Pooling Types? What are their characteristics?
Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel and suppresses the Noises, while Average
pooling only returns the measure of that portion.
Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel, while Average pooling returns the measure
of that portion and suppresses the Noises.
Max Pooling and Minimum Pooling. Max pooling returns the maximum value
of the portion covered by the kernel and suppresses the Noises, while
Minimum pooling only returns the smallest value of that portion.

Max Pooling and Std Pooling. Max pooling returns the maximum value of the
portion covered by the kernel, while Std Pooling returns the standard deviation
of that portion.
It is recommended to use Max Pooling most of the time.
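To make the two pooling types concrete, the sketch below applies 2x2 max pooling and 2x2 average pooling (stride 2) to a small feature map. The feature map values and the helper name `pool2x2` are invented for illustration.

```python
def pool2x2(matrix, reducer):
    """Apply a reducer (e.g. max or mean) to each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(matrix) - 1, 2):
        row = []
        for c in range(0, len(matrix[0]) - 1, 2):
            window = [matrix[r][c], matrix[r][c + 1],
                      matrix[r + 1][c], matrix[r + 1][c + 1]]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 7],
]

max_pooled = pool2x2(feature_map, max)   # keeps the strongest response per window
avg_pooled = pool2x2(feature_map, mean)  # keeps the average response per window
print(max_pooled)
print(avg_pooled)
```

Both variants shrink the 4x4 map to 2x2; they differ only in which summary of each window is kept.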
Question 9
CNN is divided in two big steps. Feature Learning and Classification. What
happens in each step?
Feature Learning has Convolution, ReLU and Pooling components, with
numerous iterations between them before moving to Classification, which uses
the Flattening and Full Connection components.
Feature Learning has Flattening and Full Connection components, with
numerous iterations between them before moving to Classification, which uses
the Convolution, ReLU and Pooling components.
During Feature Learning, CNN uses appropriate algorithms for it, while during
classification it changes the algorithm in order to achieve the expected result.
option4
During Feature Learning, the algorithm is learning about its dataset.
Components like Convolution, ReLU and Pooling work for that. Once the
features are known, the classification happens using the Flattening and Full
Connection components.
Question 10

What is the difference between CNN and ANN?
CNN has one or more layers of convolution units, which receive their input from
multiple units.
CNN uses a simpler algorithm than ANN.
CNN is the easiest way to use Neural Networks.
They complete each other, so in order to use ANN, you need to start with CNN.
The only difference is the Convolutional component, which is what makes CNN
good at analysing and predicting data like images. The other steps are the same.
Question 11
What is the benefit of using CNN instead of ANN?
Reduce the number of units in the network, which means fewer parameters to
learn and reduced chance of overfitting. Also they consider the context
information in the small neighborhoods. This feature is very important to
achieve a better prediction in data like images.
Increase the number of units in the network, which means more parameters to
learn and increased chance of overfitting. Also they consider the context
information in the small neighborhoods. This feature is very important to
achieve a better prediction.
There is no benefit, ANN is always better.
CNN has better results since you have more computational power.
Since digital images are a bunch of pixels with high values, it makes sense to use
CNN to analyse them. CNN decreases their values, which is better for the training
phase with less computational power and less information loss.
Question 12
What does 'Shared Weights' mean in CNN?
It is what makes CNN 'convolutional'. Forcing the neurons of one layer to share
weights, the forward pass becomes the equivalent of convolving a filter over
the image to produce a new image. Then the training phase becomes a task of
learning filters, deciding what features you should look for in the data.
Sharing weights among the features makes it easier and faster for CNN to predict
the correct image.
It means that CNN uses the weights of each feature in order to find the best
model to make predictions, sharing the results and returning the average.
It calculates the features' weights and compares them with other algorithms in order
to find the best parameters.

This feature is what makes CNN better at analysing images than ANN. The
Convolutional component of CNN simplifies the image structures and the
algorithm can predict better.

1. Which of the following is a subset of machine learning?
A. Numpy
B. SciPy
C. Deep Learning
D. All of the above
View Answer
Ans : C
Explanation: Deep learning is computer software that mimics the network of
neurons in a brain. It is a subset of machine learning and is called deep
learning.
2. With how many layers are deep learning algorithms constructed?
A. 2
B. 3
C. 4
D. 5
View Answer
Ans : B
Explanation: Deep learning algorithms are constructed with 3 connected layers:
inner layer, outer layer, hidden layer.
3. The first layer is called the?
A. inner layer
B. outer layer
C. hidden layer
View Answer
Ans : A

Explanation: The first layer is called the Input Layer. The last layer is called the
Output Layer. All layers in between are called Hidden Layers.
4. RNNs stands for?
A. Receives neural networks

B. Report neural networks
C. Recording neural networks
D. Recurrent neural networks
View Answer
Ans : D
Explanation: Recurrent neural networks (RNNs): RNN is a multi-layered neural
network that can store information in context nodes, allowing it to learn data
sequences and output a number or another sequence.
5. Which of the following is/are Common uses of RNNs?
A. Businesses: Help securities traders to generate analytic reports

B. Detect fraudulent credit-card transaction
C. Provide a caption for images
D. All of the above
View Answer
Ans : D
Explanation: All of the above are Common uses of RNNs.
6. Which of the following is well suited for perceptual tasks?
A. Feed-forward neural networks

B. Recurrent neural networks
C. Convolutional neural networks
View Answer
Ans : C

Explanation: CNN is a multi-layered neural network with a unique architecture
designed to extract increasingly complex features of the data at each layer to
determine the output. CNNs are well suited for perceptual tasks.
7. CNN is mostly used when there is an?
A. structured data
B. unstructured data
C. Both A and B
View Answer
Ans : B
Explanation: CNN is mostly used when there is an unstructured data set (e.g.,
images) and the practitioners need to extract information from it.
8. Which neural network has only one hidden layer between the input and
output?
A. Shallow neural network

B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
View Answer
Ans : A
Explanation: Shallow neural network: The Shallow neural network has only one
hidden layer between the input and output.
9. Which of the following is/are Limitations of deep learning?
A. Data labeling
B. Obtain huge training datasets
C. Both A and B
View Answer

Ans : C
Explanation: Both A and B are Limitations of deep learning.
10. Deep learning algorithms are _______ more accurate than machine
learning algorithm in image classification.
A. 33%
B. 37%
C. 40%
D. 41%
View Answer
Ans : D

1. Fuzzy logic is a form of

a) Two-valued logic
b) Crisp set logic
c) Many-valued logic
d) Binary set logic
View Answer
Answer: c
Explanation: With fuzzy logic set membership is defined by certain value. Hence it
could have many values to be in the set.

2. Traditional set theory is also known as Crisp Set theory.

a) True
b) False
View Answer
Answer: a
Explanation: In traditional set theory, set membership is fixed or exact: either the
member is in the set or it is not. There are only two crisp values, true or false. In
fuzzy logic there are many values: the member is in the set with some weight, say x.
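3. The truth values of traditional set theory are ____________ and those of a fuzzy set are
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1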
View Answer
Answer: a
Explanation: Refer the definition of Fuzzy set and Crisp set.
4. Fuzzy logic is an extension of Crisp set with an extension of handling the concept of
Partial Truth.
a) True
b) False
View Answer
Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.
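6. The room temperature is hot. Here the hot (use of linguistic variable is used) can be
represented by _______ .
a) Fuzzy Set
b) Crisp Set
View Answer
Answer: a
Explanation: Fuzzy logic deals with linguistic variables.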
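A fuzzy set for the linguistic value "hot" can be sketched as a membership function that returns a degree between 0 and 1. The breakpoints of 20 and 35 degrees Celsius and the linear ramp between them are illustrative assumptions, not values from the question.

```python
def hot_membership(temp_c: float, low: float = 20.0, high: float = 35.0) -> float:
    """Degree (0..1) to which a temperature counts as 'hot'."""
    if temp_c <= low:
        return 0.0  # definitely not hot
    if temp_c >= high:
        return 1.0  # definitely hot
    return (temp_c - low) / (high - low)  # partially hot


for t in (15.0, 27.5, 40.0):
    print(t, hot_membership(t))
```

A crisp set would force a single cut-off temperature; the fuzzy version lets 27.5 degrees be "half hot".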
7. The values of the set membership is represented by

a) Discrete Set
b) Degree of truth
c) Probabilities
d) Both b & c
View Answer
Answer: b
Explanation: Both probabilities and degree of truth range between 0 and 1.
8. What is meant by probability density function?

a) Probability distributions
b) Continuous variable
c) Discrete variable
d) Probability distributions for Continuous variables
View Answer
Answer: d
Explanation: None.
9. Japanese were the first to utilize fuzzy logic practically on high-speed trains in
Sendai.
a) True
b) False
View Answer
Answer: a
Explanation: None.
10. Which of the following is used for probability theory sentences?

a) Conditional logic
b) Logic
c) Extension of propositional logic
View Answer
Answer: c
Explanation: The version of probability theory we present uses an extension of
propositional logic for its sentences.

1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
View Answer
Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement, respectively.
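The min/max/complement definitions can be sketched in a few lines of Python. This is a minimal illustration, assuming membership degrees are plain floats in [0, 1]; the function names and the sample degrees are hypothetical.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a


# Hypothetical degrees: the room is "hot" to degree 0.75 and "humid" to degree 0.5.
hot, humid = 0.75, 0.5
print(fuzzy_and(hot, humid))  # 0.5
print(fuzzy_or(hot, humid))   # 0.75
print(fuzzy_not(hot))         # 0.25
```

With crisp inputs (0 or 1) these functions reduce to the ordinary Boolean operators.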
2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
View Answer
Answer: a
Explanation: None.
3. Where can Bayes' rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
View Answer
Answer: d
Explanation: Bayes rule can be used to answer the probabilistic queries conditioned
on one piece of evidence.
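As a rough sketch of such a query, the snippet below applies Bayes' rule to one piece of evidence. The function name, parameter names and the numbers are illustrative assumptions, not taken from the quiz.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) from P(H), P(E | H) and P(E | not H), using Bayes' rule."""
    # Total probability of the evidence over both hypotheses.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prior, 90% chance of the evidence when the
# hypothesis holds, 5% chance of the evidence when it does not.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(posterior, 4))
```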
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
View Answer
Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
b) IF-THEN rules
c) Both a & b
View Answer
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.

6. Fuzzy relational databases exist, just as relational databases do.
a) True
b) False
View Answer
Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.

a) Fuzzy Logic
b) Probability
c) Entropy
d) All of the mentioned
View Answer
Answer: d
Explanation: Entropy is the amount of uncertainty involved in data, represented by
H(data).
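A minimal sketch of H(data), assuming the usual Shannon definition in bits over the empirical distribution of the observed symbols (the quiz itself does not state which formula it means):

```python
import math
from collections import Counter


def entropy(data) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `data`."""
    counts = Counter(data)
    n = len(data)
    # Sum p * log2(1 / p) over every distinct symbol.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy("aabb"))  # two equally likely symbols: 1.0 bit
print(entropy("aaaa"))  # a single certain symbol: 0.0 bits
```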
8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
b) Ecorithms
c) Fuzzy Set
View Answer
Answer: b
Explanation: Ecorithms, a term coined by Leslie Valiant, are algorithms that learn
from interaction with their environment rather than being fully designed in advance.
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
View Answer
Answer: b
Explanation: None.
10. What is the consequence between a node and its predecessors while creating
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
View Answer
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks
lead to the consequence that a node is conditionally independent of its other
predecessors, given its parents.
1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001
View Answer
Answer: c
Explanation: The truth table before generalization is:
Inputs Output
000 $
001 $
010 $
011 $
100 $
101 $
110 0
111 1
where $ represents don’t know cases and the output is random.
After generalization, the truth table becomes:
Inputs Output
000 0
001 1
010 0
011 1
100 0
101 1
110 0
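111 1
The generalized output therefore follows the third input bit, so it is zero for 000,
010, 100 and 110.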
2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
View Answer
Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.
3. An auto-associative network is:

a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
View Answer
Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths(loops) does not have to be one.
4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
View Answer
Answer: a
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
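The same arithmetic can be checked with a short Python sketch. The function name `linear_neuron` and the parameter `k` (the constant of proportionality) are illustrative choices.

```python
def linear_neuron(weights, inputs, k):
    """Weighted sum of the inputs, scaled by the linear transfer constant k."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


# Values from the question: weights 1, 2, 3, 4; inputs 4, 10, 5, 20; k = 2.
print(linear_neuron([1, 2, 3, 4], [4, 10, 5, 20], 2))  # 238
```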
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
View Answer
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because many of their operations are done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by examples.

(i) The training time depends on the size of the network.
(ii) Neural networks can be simulated on a conventional computer.
(iii) Artificial neurons are identical in operation to biological ones.
a) All of the mentioned
b) (ii) is true
c) (i) and (ii) are true
View Answer
Answer: c
Explanation: The training time depends on the size of the network: the more neurons
there are, the greater the number of possible ‘states’. Neural
networks can be simulated on a conventional computer, but the main advantage of
neural networks – parallel execution – is lost. Artificial neurons are not identical in
operation to the biological ones.

(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’
rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
View Answer
Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.

Single layer associative neural networks do not have the ability to:
(i) perform pattern recognition
(ii) find the parity of a picture
(iii) determine whether two or more shapes in a picture are connected or not
a) (ii) and (iii) are true
b) (ii) is true
c) All of the mentioned
View Answer
Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
View Answer
Answer: d
Explanation: All mentioned are the characteristics of neural network.
10. Neuro software is:

a) A software used to analyze neurons
b) It is powerful and easy neural network
c) Designed to aid experts in real world
d) It is software used by Neuro surgeon
View Answer
Answer: b
Explanation: None.

1. Why is the XOR problem exceptionally interesting to neural network researchers?

a) Because it can be expressed in a way that allows you to use a neural network
b) Because it is complex binary operation that cannot be solved using neural networks
c) Because it can be solved by a single layer perceptron
d) Because it is the simplest linearly inseparable problem that exists.
View Answer
Answer: d
Explanation: None.
2. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
View Answer
Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
3. Why are linearly separable problems of interest of neural network researchers?

a) Because they are the only class of problem that network can solve successfully
b) Because they are the only class of problem that Perceptron can solve successfully
c) Because they are the only mathematical functions that are continuous
d) Because they are the only mathematical functions you can draw
View Answer
Answer: b
Explanation: Linearly separable problems are of interest to neural network researchers
because they are the only class of problem that a Perceptron can solve successfully.
4. Which of the following is not the promise of artificial neural network?

a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
View Answer
Answer: a
Explanation: The artificial Neural Network (ANN) cannot explain result.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters;
the nonlinearity comes from the activation functions of the units.
6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
View Answer
Answer: a
Explanation: This is how a perceptron with a hard threshold behaves.
7. The name for the function in question 6 is

a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
View Answer
Answer: b
Explanation: Also known as the step function, so option a is also right. It is a hard
thresholding function, either on or off with no in-between.
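The thresholding behaviour described in questions 6 and 7 can be sketched as follows. The weights and threshold below are hand-picked for illustration (they make the unit compute logical AND) and are not part of the quiz.

```python
def perceptron(weights, inputs, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0  # hard (Heaviside-style) step


# Hypothetical AND unit: both inputs must be 1 for the sum to exceed 1.5.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([1, 1], [a, b], threshold=1.5))
```

The output is always exactly 0 or 1; no intermediate values are produced.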
8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough
View Answer
Answer: c
Explanation: None.
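A small sketch of the idea in question 8: three hand-coded threshold units combine to compute XOR. The weights and thresholds are chosen by hand for illustration (one OR unit, one NAND unit, and an AND unit that combines them); nothing here is learned.

```python
def perceptron(weights, inputs, threshold):
    """Single threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0


def xor(a, b):
    """XOR built from three hand-coded perceptrons."""
    h_or = perceptron([1, 1], [a, b], 0.5)       # fires unless both inputs are 0
    h_nand = perceptron([-1, -1], [a, b], -1.5)  # fires unless both inputs are 1
    return perceptron([1, 1], [h_or, h_nand], 1.5)  # AND of the two hidden units


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

Each hidden unit draws one straight line through the input space, and the output unit combines the two half-planes.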
9. The network that involves backward links from the output to the input and hidden
layers is called ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
View Answer
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.
10. Which of the following is an application of NN (Neural Network)?

a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
View Answer
Answer: d
Explanation: All mentioned options are applications of Neural Network

1. Which is not a desirable property of a logical rule-based system?

a) Locality
b) Attachment
c) Detachment
d) Truth-Functionality
e) Global attribute
View Answer
Answer: b
Explanation: Locality: in logical systems, whenever we have a rule of the form A =>
B, we can conclude B given evidence A, without worrying about any other rules.
Detachment: once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived; that is, it can be detached from its
justification. Truth-functionality: in logic, the truth of complex sentences can be
computed from the truth of the components. Attachment, however, is not a property
of a rule-based system. A global attribute defines a particular problem
space as user specific and changes according to the user’s plan for the problem.
2. How is Fuzzy Logic different from conventional control methods?

a) IF and THEN Approach
b) FOR Approach
c) WHILE Approach
d) DO Approach
e) Else If approach
View Answer
Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.
3. In an Unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
View Answer
Answer: b
Explanation: Unsupervised learning involves learning patterns in the
input when no specific output values are supplied, so there is no expected
output against which to test the result. The agent is not told what the correct
output should be and must find structure in the inputs on its own.
4. Inductive learning involves finding a

a) Consistent Hypothesis
b) Inconsistent Hypothesis
c) Regular Hypothesis
d) Irregular Hypothesis
e) Estimated Hypothesis
View Answer
Answer: a
Explanation: Inductive learning involves finding a consistent hypothesis that agrees
with examples. The difficulty of the task depends on the chosen representation.
5. Computational learning theory analyzes the sample complexity and computational
complexity of
a) Unsupervised Learning
b) Inductive learning
c) Forced based learning
d) Weak learning
e) Knowledge based learning
View Answer
Answer: b
Explanation: Computational learning theory analyzes the sample complexity and
computational complexity of inductive learning. There is a tradeoff between the
expressiveness of the hypothesis language and the ease of learning.
6. If a hypothesis says it should be positive, but in fact, it is negative, we call it

a) A consistent hypothesis
b) A false negative hypothesis
c) A false positive hypothesis
d) A specialized hypothesis
e) A true positive hypothesis
View Answer
Answer: c
Explanation: A consistent hypothesis agrees with the examples. If the hypothesis says it should
be negative but in fact it is positive, it is a false negative. If a hypothesis says it should
be positive but in fact it is negative, it is a false positive. A specialized hypothesis
requires certain restrictive or special conditions.
7. Neural Networks are complex ______________ with many parameters.

a) Linear Functions
b) Nonlinear Functions
e) Power Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.
The parameters can be learned from noisy data, and neural networks have been used
for thousands of applications.
8. A perceptron is a ______________.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
View Answer
Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
9. Which of the following statement is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
View Answer
Answer: a
Explanation: Not all formal languages are context-free.
10. Which of the following statement is not true?

a) The union and concatenation of two context-free languages is context-free
b) The reverse of a context-free language is context-free, but the complement need not
be
c) Every regular language is context-free because it can be described by a regular
grammar
d) The intersection of a context-free language and a regular language is always
context-free
e) The intersection of two context-free languages is context-free
View Answer
Answer: e
Explanation: The union and concatenation of two context-free languages is
context-free, but the intersection need not be.

1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
View Answer
Answer: d
Explanation: Factors which affect the performance of a learner system do not include
good data structures.
2. Different learning methods do not include:

a) Memorization
b) Analogy
c) Deduction
d) Introduction
View Answer
Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.
3. Which of the following is the model used for learning?

a) Decision trees
b) Neural networks
c) Propositional and FOL rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Decision trees, Neural networks, Propositional rules and FOL rules all
are the models of learning.
4. Automated vehicle is an example of ______.

a) Supervised learning
b) Unsupervised learning
c) Active learning
View Answer
Answer: a
Explanation: In an automated vehicle, a set of vision inputs and the corresponding actions are
available to the learner, hence it is an example of supervised learning.
5. Following is an example of active learning:
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
View Answer
Answer: a
Explanation: In active learning, not only is the teacher available, but the learner can also
ask for suitable perception-action pair examples to improve performance.
6. In which of the following learning the teacher returns reward and punishment to
learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
View Answer
Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher returns
reward or punishment to the learner.
7. Decision trees are appropriate for the problems where:

a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned
View Answer
Answer: d
Explanation: Decision trees can be used in all the conditions stated.
8. Which of the following is not an application of learning?

a) Data mining
b) WWW
c) Speech recognition
d) None of the mentioned
View Answer
Answer: d
Explanation: All mentioned options are applications of learning.
9. Which of the following is the component of learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Goal, model, learning rules and experience are the components of
learning system.
10. Following is also called as exploratory learning:

b) Active learning
c) Unsupervised learning
View Answer
Answer: c
Explanation: In unsupervised learning no teacher is available, hence it is also called
exploratory learning.

1. What takes place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech
View Answer
Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.
2. Which modifies the performance element so that it makes better decisions?

a) Performance element
b) Changing element
c) Learning element
View Answer
Answer: c
Explanation: A learning element modifies the performance element so that it can make
better decisions.
3. How many things are concerned in design of a learning element?

a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.
4. What is used in determining the nature of the learning problem?

a) Environment
b) Feedback
c) Problem
View Answer
Answer: b
Explanation: The type of feedback is used in determining the nature of the learning
problem that the agent faces.
5. How many types of machine learning are there?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of machine learning are supervised, unsupervised and
reinforcement.
6. Which is used for utility functions in game playing algorithm?

a) Linear polynomial
b) Weighted polynomial
c) Polynomial
d) Linear weighted polynomial
View Answer
Answer: d
Explanation: Linear weighted polynomial is used for learning element in the game
playing programs.
7. Which is used to choose among multiple consistent hypotheses?

a) Razor
b) Ockham razor
c) Learning element
View Answer
Answer: b
Explanation: Ockham razor prefers the simplest hypothesis consistent with the data
intuitively.
8. What will happen if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
View Answer
Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
View Answer
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.

10. How does a decision tree reach its decision?
a) Single test
b) Two test
c) Sequence of test
d) No test
View Answer
Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests
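A minimal sketch of a decision tree as a sequence of attribute tests. The tree, its attribute names and its values are hypothetical and only illustrate the input/output shape described in questions 9 and 10.

```python
# Hypothetical tree: each internal node tests one attribute; leaves are decisions.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "windy", "branches": {True: "no", False: "yes"}},
    },
}


def decide(node, example):
    """Follow one test per level until a leaf (a plain decision) is reached."""
    while isinstance(node, dict):
        value = example[node["attribute"]]  # perform the test at this node
        node = node["branches"][value]      # descend along the matching branch
    return node


print(decide(TREE, {"outlook": "sunny", "humidity": "high"}))  # no
print(decide(TREE, {"outlook": "overcast"}))                   # yes
```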
1: ANN is composed of large number of highly interconnected processing
elements(neurons) working in unison to solve problems.
A.
True
B.
False
Option: A
Explanation :
2:
Artificial neural network used for
A.
Pattern Recognition
B.
Classification
C.
Clustering
D.
All of these

Option: D
Explanation :
3:
A Neural Network can answer
A.
For Loop questions
B.
what-if questions
C.
IF-Then-Else Analysis Questions
D.
None of these
Option: B
Explanation :
4:
Ability to learn how to do tasks based on the data given for training or initial
experience
A.
Self Organization
B.
Adaptive Learning
C.
Fault tolerance
D.
Robustness
Option: B
Explanation :
5:
Feature of ANN in which ANN creates its own organization or representation of
information it receives during learning time is
A.
Adaptive Learning
B.
Self Organization
C.
What-If Analysis
D.
Supervised Learning
Option: B
Explanation :
6:
In artificial Neural Network interconnected processing elements are called
A.
nodes or neurons
B.
weights
C.
axons
D.
Soma
Option: A
Explanation :
7:
Each connection link in ANN is associated with ________ which has information
about the input signal.
A.
neurons
B.
weights
C.
bias
D.
activation function
Option: B
Explanation :
8:
Neurons or artificial neurons have the capability to model networks of original
neurons as found in brain
A.
True
B.
False
Option: A
Explanation :
9:
Internal state of neuron is called __________, is the function of the inputs the
neurons receives
A.
Weight
B.
activation or activity level of neuron
C.
Bias
D.
None of these
Option: B
Explanation :
10:
Neuron can send ________ signal at a time.
A.
multiple
B.
one
C.
none
D.
any number of
Option: B
Explanation :

1:
Artificial intelligence is
A.
It uses machine-learning techniques; programs can learn from past
experience and adapt themselves to new situations
B.
Computational procedure that takes some value as input and produces some
value as output.
C.
Science of making machines perform tasks that would require intelligence
when performed by humans
D.
None of these
Option: C
Explanation :
2:
Expert systems
A.
B.
of the theory of evolution
C.
an information base filled with the knowledge of an expert formulated in terms
of if-then rules
D.
None of these
Option: C
Explanation :
3:
Falsification is
A.
modules
B.
C.
D.
None of these
Option: B
Explanation :
4:
Evolutionary computation is
A.
B.
of the theory of evolution.
C.
Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D.
None of these
Option: B
Explanation :
5:
Extendible architecture is
A.
modules
B.
C.
D.
None of these
Option: A
Explanation :

6:
Massively parallel machine is
A.
B.
C.
Describes the structure of the contents of a database.
D.
None of these
Option: B
Explanation :
7:
Search space
A.
B.
The information stored in a database that can be retrieved with a single query.
C.
Worth of the output of a machine learning program that makes it understandable
for humans
D.
None of these
Option: A
Explanation :
8:
n(log n) is referred to
A.
A measure of the desired maximal complexity of data mining algorithms
B.
A database containing volatile data used for the daily operation of an
organization
C.
Relational database management system
D.
None of these
Option: A
Explanation :
9:
Perceptron is
A.
General class of approaches to a problem.
B.
Performing several computations simultaneously
C.
Structures in a database that are statistically relevant
D.
Simple forerunner of modern neural networks, without hidden layers
Option: D
Explanation :
10:
Prolog is
A.
B.
C.
Describes the structure of the contents of a database
D.
None of these
Option: A
Explanation :

11:
Shallow knowledge
A.
B.
The information stored in a database that can be retrieved with a single query
C.
D.
None of these
Option: B
Explanation :
12:
Quantitative attributes are
A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data
B.
Attributes of a database table that can take only numerical values
C.
Tools designed to query a database
D.
None of these
Option: B
Explanation :
13:
Subject orientation
A.
The science of collecting, organizing, and applying numerical facts
B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.
C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data
D.
None of these
Option: C
Explanation :
14:
Vector
A.
It does not need the control of the human operator during its execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars
C.
The validation of a theory on the basis of a finite number of examples
D.
None of these
Option: B
Explanation :
15:
Transparency
A.
B.
The information stored in a database that can be retrieved with a single query
C.
D.
None of these

Option: C
Explanation :

1:
Core of soft Computing is
A.
Fuzzy Computing, Neural Computing, Genetic Algorithms
B.
Fuzzy Networks and Artificial Intelligence
C.
Artificial Intelligence and Neural Science
D.
Neural Science and Genetic Science
Option: A
Explanation :
2:
Who initiated the idea of Soft Computing
A.
Charles Darwin
B.
Lotfi A. Zadeh
C.
Rechenberg
D.
McCulloch
Option: B
Explanation :
3:
Fuzzy Computing
A.
mimics human behaviour
B.
doesn't deal with 2 valued logic
C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic
D.
All of the above
Option: D
Explanation :
4:
Neural Computing
A.
mimics human brain
B.
information processing paradigm
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
5:
Genetic Algorithms are
A.
a part of Evolutionary Computing
B.
inspired by Darwin's theory about evolution - "survival of the fittest"
C.
are adaptive heuristic search algorithm based on the evolutionary ideas of
D.
All of the above
Option: D
Explanation

6:
What are the 2 types of learning
A.
Improvised and unimprovised
B.
supervised and unsupervised
C.
Layered and unlayered
D.
None of the above
Option: B
Explanation :
7:
Supervised Learning is
A.
learning with the help of examples
B.
learning without teacher
C.
learning with the help of teacher
D.
learning with computers as supervisor
Option: C
Explanation :
8:
Unsupervised learning is
A.
learning without computers
B.
problem based learning
C.
learning from environment
D.
learning from teachers
Option: C
Explanation :
9:
Conventional Artificial Intelligence is different from soft computing in the sense
A.
Conventional Artificial Intelligence deals with predicate logic whereas soft
computing deals with fuzzy logic
B.
Conventional Artificial Intelligence methods are limited by symbols whereas
soft computing is based on empirical data
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
10:
In supervised learning
A.
classes are not predefined
B.
classes are predefined
C.
classes are not required
D.
classification is not done
Option: B
Explanation :

1:
Membership function defines the fuzziness in a fuzzy set irrespective of the
elements in the set, which are discrete or continuous.
A.
True
B.
False
Option: A
Explanation :
2:
The membership functions are generally represented in
A.
Tabular Form
B.
Graphical Form
C.
Mathematical Form
D.
Logical Form
Option: B
Explanation :
3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of
A.
knowledge
B.
examples
C.
learning
D.
experience
Option: D
Explanation :
4: Three main basic features involved in characterizing membership function are
A.
Intuition, Inference, Rank Ordering
B.
Fuzzy Algorithm, Neural network, Genetic Algorithm
C.
Core, Support, Boundary
D.
Weighted Average, center of Sums, Median
Option: C
Explanation :
5:
The region of universe that is characterized by complete membership in the set is
called
A.
Core
B.
Support
C.
Boundary
D.
Fuzzy
Option: A
Explanation :

6: A fuzzy set whose membership function has at least one element x in the universe
whose membership value is unity is called
A.
sub normal fuzzy sets
B.
normal fuzzy set
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
7:
In a Fuzzy set a prototypical element has a value
A.
1
B.
0
C.
infinite
D.
Not defined
Option: A
Explanation :
8:
A fuzzy set wherein no membership function has its value equal to 1 is called
A.
normal fuzzy set
B.
subnormal fuzzy set.
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
9: A fuzzy set has a membership function whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing then strictly monotonically decreasing with increasing
values for elements in the universe
A.
convex fuzzy set
B.
concave fuzzy set
C.
Non concave Fuzzy set
D.
Non Convex Fuzzy set
Option: A
Explanation :
10:
The membership values of the membership function are not strictly
monotonically increasing or decreasing or strictly monotonically increasing then
decreasing.
A.
Convex Fuzzy Set
B.
Non convex fuzzy set
C.
Normal Fuzzy set
D.
Sub normal fuzzy set
Option: B
Explanation :

11:
Match the Column
List I
List II
1 Subnormal Fuzzy Set
2 Normal Fuzzy Set
3 Non Convex Normal Fuzzy Set
4 Convex Normal Fuzzy Set
A.
a b c d
2 1 4 3
B.
a b c d
1 2 3 4
C.
a b c d
4 3 2 1
D.
a b c d
3 2 1 4
Option: A
Explanation :
12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to
A.
infinite
B.
1
C.
0
D.
0.5
Option: D
Explanation :
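The membership-function terms used in questions 4 to 12 (core, support, boundary, normality, crossover points) can be illustrated with a small sketch. The trapezoidal shape, its parameters (0, 2, 4, 6) and the sampled universe are assumptions chosen for the example.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Sample the universe 0.0, 0.5, ..., 6.0 and evaluate the membership function.
universe = [i * 0.5 for i in range(13)]
mu = {x: trapezoid(x, 0, 2, 4, 6) for x in universe}

core = [x for x, m in mu.items() if m == 1.0]            # complete membership
support = [x for x, m in mu.items() if m > 0.0]          # nonzero membership
boundary = [x for x, m in mu.items() if 0.0 < m < 1.0]   # partial membership
crossover = [x for x, m in mu.items() if m == 0.5]       # membership exactly 0.5
is_normal = max(mu.values()) == 1.0                      # at least one element at 1

print(core)       # [2.0, 2.5, 3.0, 3.5, 4.0]
print(crossover)  # [1.0, 5.0]
print(is_normal)  # True
```

This set is also convex in the sense of question 9: its membership values rise, stay at 1, then fall as x increases.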

Questions
1. Which of the following(s) is/are found in Genetic Algorithms?
(i)
evolution
(ii)
selection
(iii)
reproduction
(iv)
mutation
: Your answer is
(a)
i & ii only
(b)
i, ii & iii only
(c)
ii, iii & iv only

(d)
all of the above
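The components named in question 1 (selection, reproduction via crossover, mutation, repeated over generations) can be sketched as a toy genetic algorithm. Everything below is an illustrative assumption: the OneMax fitness (count of 1 bits), tournament selection, one-point crossover, the parameter values and the function names.

```python
import random


def fitness(chrom):
    """Toy OneMax fitness: the number of 1 bits in the chromosome."""
    return sum(chrom)


def select(pop, rng):
    """Tournament selection: pick two individuals, keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(p1, p2, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]


def mutate(chrom, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(fitness(c) for c in pop)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # keep the best individual unchanged
        children = [
            mutate(crossover(select(pop, rng), select(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
        history.append(max(fitness(c) for c in pop))
    return max(pop, key=fitness), history


best, history = run_ga()
print(fitness(best), history[0], history[-1])
```

Because the best individual is carried over each generation, the best fitness in `history` never decreases; a genetic algorithm still gives no guarantee of reaching the global optimum.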
2. Matching between terminologies of Genetic Algorithms and Genetics:

Genetic Algorithms               Genetics (biology)
(a) representation structures    (i) external disturbance, such as cosmic radiation
(b) crossover                    (ii) chromosomes
(c) mutation                     (iii) survivability
(d) selection                    (iv) sexual reproduction
: Your answer is
(a) _____
(b) _____
(c) _____
(d) _____
12.Where are Genetic Algorithms applicable?
(i)
real time application
(ii)
biology
(iii)
Artificial Life
(iv)
economics
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
4. Which of the following is/are the pre-requisite(s) when Genetic

(i)
encoding of solutions
(ii)
well-understood search space
(iii)
method of evaluating the suitability of the solutions
(iv)
contain only one optimal solution
: Your answer is
(a)
i & ii only
(b)
ii & iii only
(c)
i & iii only

(d)
iii & iv only
(i)
Genetic Algorithm is a randomised parallel search algorithm, based

on the principles of natural selection, the process of evolution.
(ii)
GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii)
GAs are used for solving optimization problems and modeling

evolutionary phenomena in the natural world.
(iv)
Despite their utility, GAs remain a poorly understood topic.
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
6. If crossover between chromosomes in the search space does not produce

significantly different offspring, what does it imply? (if offspring
(i)
The crossover operation is not successful.
(ii)
Solution is about to be reached.

(iii)
Diversity is so poor that the parents involved in the crossover

operation are similar.
(iv)
The search space of the problem is not ideal for GAs to operate.
: Your answer is
(a)
ii, iii & iv only
(b)
ii & iii only
(c)
i, iii & iv only
(d)
all of the above
7. Which of the following comparisons is true?
: Your answer is
(a)
In the event of restricted access to information, GAs win out in that
they require much less information to operate than other searches.
(b)
Under any circumstances, GAs always outperform other algorithms.
(c)
The quality of solutions offered by GAs for any problem is
always better than that provided by other searches.
(d)
GAs could be applied to any problem, whereas certain algorithms
are applicable to limited domains.
(i)
Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii)
Alife is a kind of Artificial Intelligence (AI).
(iii)
Alife pursues a two-fold goal: increasing our understanding of

nature and enhancing our insight into artificial models, thereby
providing us with the ability to improve their performance.
(iv)
Alife extends our studies of biology, life-as-we-know-it, to the larger

domain of possible life, life-as-it-could-be.
: Your answer is
(a)
i & ii only
(b)
iii & iv only
(c)
i, ii & iii only
(d)
all of the above
9. Where is Artificial Life applicable?
(i)
film (movie, video) production
(ii)
biology
(iii)
robotics
(iv)
air traffic control

: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above
10. Who can benefit from Alife?
(i)
children
(ii)
designers
(iii)
artists
(iv)
patients
: Your answer is
(a)
i, ii & iii only
(b)
ii, iii & iv only
(c)
i, iii & iv only
(d)
all of the above

: Answers
Q1.
Which of the following is/are found in Genetic Algorithms?
An initial population evolves to some optimal solutions. Selection favours
better individuals, judged by their fitness values; two individuals are chosen
for reproducing offspring. By combining portions of good individuals, this
process is likely to create even better individuals.
Q2.
Matching between terminologies of Genetic Algorithms and

Genetics:
The correct answer is :
(a)
(ii)
(b)
(iv)
(c)
(i)
(d)
(iii)
Q3.
Where are Genetic Algorithms applicable?
Genetic Algorithms can be used to evolve strategies for interaction in the
Prisoner's Dilemma in economics. GAs are used as a computational method in
Alife: the simulation of living systems starting with single cells and evolving to
organisms, societies or even whole economic systems. These features
compete for the limited resources in this virtual world. In biology, GAs are
used in protein structure prediction, protein folding, stability of DNA hairpins
and modeling of the immune system.
Figures: DNA structures, protein structures.
GAs cannot be applied in real-time systems, where the response time is critical,
because GAs cannot guarantee to find a solution. The time spent in
evaluation of the fitness function and other genetic operations is substantially
large, especially in a poorly understood, complex search space.
Q4.
Which of the following is/are the requirement(s) when Genetic

The problem is mapped into a set of strings with each string representing a
potential solution (i.e. chromosomes). A fitness function is required to
compare and tell which solution is better. GA performance is heavily
dependent on the representation chosen.
GAs are designed to efficiently search large, non-linear, poorly understood
search spaces where expert knowledge is scarce or difficult to encode and
where traditional techniques fail. However, domain knowledge guides GAs to
obtain the optimal solutions. Moreover, GAs are powerful enough to solve for
a set of (nearly) optimal solutions.
Q5.
The search space is too complex for exhaustive search, such that GAs
successfully find robust solutions after evaluating only a few percent of the
full parameter space.
It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.
Their probabilistic nature and reliance on frequent interactions of members of
a large population make a complete analytic understanding of GAs extremely
difficult.
Q6.
If crossover between chromosomes in the search space does not produce

significantly different offspring, what does it imply? (if offspring
When the crossover operation does not produce significantly different offspring,
it shows that the parents involved are almost identical. Hence, it means that
a solution is about to be reached. However, the solution derived is not
necessarily the optimal solution. From here, we can see that mutation is
necessary to maintain the diversity of the population so that GAs are not
trapped in partial solutions.
Q7.
Which of the following comparisons is true?
The correct answer is (a).

(a) This is true since GAs require only the information needed to
evaluate the fitness function for the possible solutions
(individuals in the search space). Other searches generally
require more information, such as differentiability of the
problem function, which may be hard to obtain.
(b) This holds true in most circumstances. However, if the search
space is small enough, other searches such as hill-climbing or
heuristic search, which are very effective in exploring small spaces,
would perform just as well.
(c) GAs have only been developed for a couple of decades while
traditional searches have been investigated for a longer time.
Thus GAs do not necessarily produce a better quality solution.
(d) Evidently certain algorithms are only applicable to limited
domains. However, certain difficulties, like encoding of
problems, might hinder the use of GAs.
Q8.
Alife is characterised by a bottom-up synthesis approach, so that the robotics
work tends to aim for insect-like capability rather than human, and complex
behaviours are developed by putting together simpler ones. Artificial
forms of evolution such as Genetic Algorithms and Genetic Programming are
widely used to evolve solutions or behaviours rather than designing them in a
top-down fashion, as in Artificial Intelligence.
Q9.
Where is Artificial Life applicable?
Alife is applicable in many fields, for example in building a walking robot.
Q10.
Who can benefit from Alife?
Children can use various computational tools (including LEGO/Logo
and Electronic Bricks) to build artificial creatures, exploring
some of the central ideas of Alife.
GAs can be applied to the design of laminated composite structures, circuit
designs and the improvement of Pareto optimal designs. Genetic programming
can help artists to create many pictures. Medical problems can also be
detected: Medibrains.
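The answers above describe selection, crossover and mutation, and note that crossover between near-identical parents produces nothing new. The following is a minimal sketch of that loop, assuming a toy "count the ones" fitness function, tournament selection and simple elitism. All names and parameter values are illustrative choices and do not come from the source.

```python
import random


def fitness(bits):
    """Toy fitness (an assumption for illustration): number of 1 bits."""
    return sum(bits)


def tournament(pop, rng, k=2):
    """Selection: pick k random individuals and keep the fittest."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, rate))
        pop = children
    return max(pop, key=fitness)


best = run_ga()
print(fitness(best), "of", len(best), "bits set")
```

Note that `crossover` applied to two identical parents returns the same chromosome, which is why mutation is needed to restore diversity once the population has converged.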
SOFT COMPUTING
UNIT – I
1. The structural constituent of a human brain is known as ------------------
a) Neuron b)Cells c)Chromosomes d)Genes
2. Neural networks are also known as -----------------------
a)Artificial Neural Network b)Artificial Neural Systems

c)Both A and B d) None of the above
3. Neurons are also known as -----------------
a)Neurodes b)Processing elements c)Nodes d)All the above
4. In the neuron, attached to the soma are long irregularly shaped filaments called--------------
a)Dendrites b)Axon c)Synapse d)Cerebellum
5. Signum function is defined as -------------------
a) φ(I) =+1, I>0, -1, I<=0
b) φ(I)=0
c) φ(I)=+1,I>0
d) φ(I)=-1,I<=0
6. To generate the final output, the sum is passed on to a non-linear filter φ called
a)Smash function b)sum function c)Activation function d)Output function
7. ---------------function is a continuous function that varies gradually between the asymptotic values 0
and 1 or -1 and +1
a)Activation function b)Thresholding function c)Signum function d)Sigmoidal function
8.-----------------produce negative output values
a)Hyperbolic tangent function b)Parabolic tangent function
c)Tangent function d)None of the above
9.-------------------- carrying the weights connect every input neuron to the output neuron but not
vice-versa.
a)Feed forward network

b)Fast forward network
c)Fast network
d)Forward network
10.------------- has no feedback loop
a)Neural network b)Recurrent Network c)Multilayer Network d)Feed forward network
11. In the ---------------- learning method, the target output is not presented to the network
a) Supervised learning b)Unsupervised learning
c)Reinforced learning d)Hebbian learning
12. Combining a number of ADALINEs gives ----------------
a) MULTILINE b)MULTIPLE LINE c)MADALINE d)MANYLINE
13.Neural network applications -----------------
a) Pattern Recognition b)Optimization Problem c)Forecasting d)All the above
14.------------------ is a Systematic method for training multilayer artificial neural network
a)Back propagation b)Forward propagation c)Speed propagation d)Multilayer propagation
15. --------------------- is a computational model
a) neuron b) cell c)Perceptron d)Nucleus
16.Intermediary layer is present in ----------------------
a)Multilayer feedforward perceptron model
b)Multilayer perceptron model
c)Multilayer Feedforward model
d)None of the above
17.Linear Activation Operator equation is ---------------
a) O=gI,g=tanφ
b) O=gI,g=sinφ
c) O=gI,g=cosφ
d) O=gI,g=-tanφ
18.--------------- is never assured of finding the global minimum, as it is in the simple single-layer delta rule case.
a)Back propagation b)Front Propagation c)Propagation d)None above
19.The test of neural network is known as--------------
a)Inference Engine b)Checking c)Deriving d)None

20.Application of Back Propagation
a)Design of Journal Bearing b)Classification of soil
c)Hot Extrusion of soil d)All the above
21. Reinforced learning is also known as ----------------
a)Output based learning b)Error based learning
c)Back propagation learning d)None
22.---------------------learning follows “Winner takes all” strategy
a)Stochastic learning b)Competitive learning c)Hebbian learning d)BackPropagation learning
23.------------------ is an earlier neural network architecture
a)Rosenblatt Perceptron b)Rosen Perceptron c)Roshon Perceptron d)None
24. Rosenblatt’s Perceptron network has three units: sensory unit, association unit and --------------
a)Output unit b) Response unit c) feedback unit d) Result unit
25.ADALINE stands for --------------------------
a)Adaptive Linear Neural Element Network
b)Adaptive Line Neural Network
c)Adapt Line Neural Element Network
d)Adaptive Linear Neural Network
PART-B
1. Explain model of artificial neuron

2. Differentiate Learning methods supervised, unsupervised, and reinforced learning
3. Explain Rosenblatt’s Perceptron
4. Explain ADALINE network
5. Explain Single layer ANN
6. Explain any one application of Back propagation networks
PART-C
1. Explain neural network architecture

2. Explain back propagation learning briefly
3. Explain basic concepts of neural network
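The activation functions named in Part A of this unit (signum, sigmoidal, hyperbolic tangent) and the weighted-sum model of an artificial neuron can be sketched as follows. This is a minimal illustration, assuming the signum convention from question 5 (+1 for I > 0, -1 for I <= 0); the function names and the example weights are illustrative only.

```python
import math


def signum(i):
    """Signum as in question 5: +1 for I > 0, otherwise -1."""
    return 1 if i > 0 else -1


def sigmoid(i):
    """Sigmoidal function: varies gradually between the asymptotes 0 and 1."""
    return 1.0 / (1.0 + math.exp(-i))


def hyperbolic_tangent(i):
    """Varies between -1 and +1, so it can produce negative outputs."""
    return math.tanh(i)


def neuron(inputs, weights, activation):
    """Artificial neuron: weighted sum of the inputs passed through phi."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return activation(net)


print(neuron([1.0, 1.0], [0.5, -1.0], signum))   # net input is -0.5
print(sigmoid(0.0))
print(hyperbolic_tangent(-1.0))
```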
UNIT-2
1.----------------is a store house of associated patterns which are encoded in some form
a)Associative memory b) Commutative memory
c)Neural networks d)Memory
2. If the associated pattern pairs (x,y) are different and if the model recalls a y given an x or vice
versa, then it is termed as -------------
a) Auto associative memory b) Hetero associative memory
c) neuro associative memory d) none
3. Autoassociative correlation memories are known as ---------------
a) Auto correlators b) Hetero Correlators c)Neuro Correlators d) None
4.--------------- recalls an output given an input in one feedforward pass
a)Static networks b) Dynamic networks c)Recurrent networks d) None
5.BAM stands for ----------------
a)Bidirectional Associative Memory b)v Associative Memory
c)Biconventional Associative Memory d) None
6.----------------- associates patterns in bipolar forms that are real-coded
a)Simplified Bidirectional Associative Memory b)Bipolar form
c)Bidirectional form d)None
7)---------------------- uses bipolar coding
a)Fabric defect identification b)Recognition of Characters
c)Design of Journal Bearing d) Classification of soil
8)Self-organizing network also known as ---------------------
a)Back Propagation network b)Training free counter propagation network
c)Propagation network d)none
9)Kosko proposed an energy function for the two states -----------------
a)E(A,B) = A M B^T
b)E(A,B) = -A M B^T
c)E(A,B) = -A B^T
d)E(A,B) = A B^T
10) BAM was introduced by ----------------------
a) Cruz b) Stubberd c)Kosko d)Rosenblatt
11)The algorithm which computes operator M is known as ------------------
a)Memory algorithm b)Recording Algorithm c)Transfer Algorithm d)None
12) Real coding is used by -----------------
a)Recognition of characters b)Fabric defect identification
c)Optimization d)Classification of soil
13)ART stands for --------------------
a)Adaptive Resonance Theory b)Adaptive Recent Theory
c)Adapt Resonance Theory d)Adaptive Retail Theory
14)A program --------------- is written in fortran for cluster formation
a) Vecquent b)Vecant c)Vector d)Quantization
15)----------------- networks were developed by carpenter and grossberg
a)ART b)ARP c)ARC d)ARD
16)------------------ of the network means that a pattern should not oscillate among different cluster
units at different stages of training
a)Stability b)Mobility c)Versatility d)Plasticity
17)------------------- is the analogue version of ART
a)ART2 b)ART1 c)ART2A d)ARTMAP
18)----------------- test is incorporated into the adaptive backward network
a)Vigilance b)Indulgence c)Revailance d)None
19)In ---------------- learning the weights are adjusted only when the external input matches one of
the stored prototypes
a)Supervised b)UnSupervised c)Match-based d)None
20)Kim et al. proposed a ------------------ method using ART2 architecture.
a)Pattern Recognition b) Chinese Recognition method

c)Character Recognition d)None
21)--------------- learning weight update during resonance occurs rapidly
a)Error-based b) Fast c)Slow d)Match-based
22)Comparison layer and recognition layer constitute -----------
a)Attenuation b)Attenuated System c)Synaptic System d)None
23)ART1 is an elegant theory that addresses ------------------
a)Stability – plasticity dilemma
b)Stability dilemma
c)Plasticity dilemma
d)None
24)Supervised version of ART -----------------
a)ARTMAP
b)Fuzzy art
c)Fuzzy Artmap
d)ART1
25)Slow learning is used as -----------------
a)ART1
b)ART2
c)ARTMAP
d)Fuzzy ART
PART-B
1.Explain Auto Correlators
2.Explain Hetero Correlators
3.Explain any one application of associative memory
4.Explain Simplified ART architecture
5.Distinguish ART1 and ART2
6.Explain any one application of ART

PART-C
7.Explain Exponential BAM
8.Explain Classical ART network
9.Explain ART1 algorithm
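The BAM questions in this unit refer to the operator M and to Kosko's energy function E(A,B) = -A M B^T. The following is a minimal sketch, assuming bipolar pattern pairs and the usual correlation encoding M = sum of X_i^T Y_i. The helper names and the two stored pairs are illustrative, and the recall step thresholds at zero as a simplification.

```python
def correlation_matrix(pairs):
    """Recording step: M = sum over stored pairs of X^T Y (bipolar vectors)."""
    n, p = len(pairs[0][0]), len(pairs[0][1])
    m = [[0] * p for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(p):
                m[i][j] += x[i] * y[j]
    return m


def energy(a, m, b):
    """Energy of the state (A, B): E = -A M B^T."""
    return -sum(a[i] * m[i][j] * b[j]
                for i in range(len(a)) for j in range(len(b)))


def recall(a, m):
    """One forward pass: threshold A M to a bipolar vector (0 maps to -1 here)."""
    return [1 if sum(a[i] * m[i][j] for i in range(len(a))) > 0 else -1
            for j in range(len(m[0]))]


x1, y1 = [1, -1, 1], [1, -1]
x2, y2 = [-1, 1, 1], [1, 1]
M = correlation_matrix([(x1, y1), (x2, y2)])

print(recall(x1, M), recall(x2, M))            # recovers y1 and y2
print(energy(x1, M, y1), energy(x1, M, y2))    # stored pair has lower energy
```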
UNIT-3
1.Fuzziness means -------------
a)Vagueness b)Clear c)Precise d)Certainty
2.---------------- are pictorial representations to denote a set
a)Flow chart b)Venn diagram c)DFD d)ER diagrams
3.The number of elements in a set is called its -------------
a)modality b)plasticity c)Cardinality d)elasticity
4.A set with a single element is called -----------
a)Single set b)Singleton set c)1 set d)none
5.A -------------- of a set A is the set of all possible subsets that are derivable from A including null set
a)Power set b)Impower set c)Rational set d)Irrational set
6.The membership function of a fuzzy set need not always be described by ----------------
a)continuous b)Discrete c)crisp d)specific
7.Fuzzy relation is a fuzzy set defined on the Cartesian product of -----------
a)single set b)crisp set c)union set d)intersection set
8.Raising a fuzzy set to its second power is called --------------
a)concentration b)intersection c)conjunction d)disjunction
9.Taking a square root of fuzzy set is called -------------------
a)Dilemma b)Dual c)Dilation d)none
10.Fuzzy relation associates ------------ to a varying degree of membership.
a)records b)tuples c)fields d)none
11.In case of => operator, the proposition occurring before the “=>” symbol is called---------
a. antecedent b.consequent c.conjunction d.disjunction
12. A truth table comprises rows known as -------------
a. interpretations b.contradiction c.conjunction d.disjunction
13.A formula which has all its interpretations recording true is known as a ----------------
a.disjunction b.conjunction c.tautology d.antecedent
14.In propositional logic, ---------------- is widely used for inferring facts.
a.pones b.modus c.modus ponens d.pons
15.------------------ represent objects that do not change values
a.constants b.variables c.predicates d.subject
16.------------------------ are representative of associations between objects that are constants or

variables and acquire truth values.
a.Subject b.Predicate c.Quantifier d.Functions
17.----------------- truth values are multivalued.
a.crisp logic b.boolean logic c.fuzzy logic d.none
18.Fuzzy logic propositions are also quantified by --------------
a.fuzzy b.fuzzy qualifiers c.fuzzy quantifiers d.none
19.Fuzzy inference also referred to as --------------
a.approximate reasoning b.reasoning c.fixed reasoning d.none
20.Conversion of a fuzzy set to single crisp value is called -----------------
a.fuzzification b.defuzzification c.fuzzy logic d.fuzzy rule
21.--------------- obtains centre of area occupied by the fuzzy set
a.center b.center of gravity c.center of area d.center point
22.The ---------------- is the arithmetic average of mean values of all intervals
a.mean b.mean of maxima c.maximum d.mean interval
23.The ------------------ are obtained by computing the minimum of the membership functions of the
antecedents.
a.rule base b.rule strengths c.rules d.none
24.Relative quantifiers are defined as ---------

a.0 to 10 b.0 to 1 c.0 d.1
25.Fuzzy cruise controller has --------------- inputs
a.2 b.3 c.1 d.0
PART-B
1.Explain fuzzy set
2.Explain crisp set
3.Explain fuzzy relations
4.Distinguish between crisp logic and predicate logic
5.Explain fuzzy quantifiers
6.Explain fuzzy logic
7.Explain fuzzy inference
PART-C
1.Explain Fuzzy System
2.Explain any one of applications of Fuzzy systems
3.Explain fuzzy rule based systems.
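Several Part A questions in this unit name operations on fuzzy sets: concentration (second power), dilation (square root), rule strength (minimum of antecedent memberships) and defuzzification by centre of gravity. The following is a minimal sketch, assuming discrete fuzzy sets stored as lists of membership values; the function names are illustrative.

```python
def concentration(mu):
    """Raise every membership value to the second power."""
    return [m ** 2 for m in mu]


def dilation(mu):
    """Take the square root of every membership value."""
    return [m ** 0.5 for m in mu]


def rule_strength(*antecedent_memberships):
    """Strength of a rule: minimum of its antecedent membership values."""
    return min(antecedent_memberships)


def centre_of_gravity(xs, mu):
    """Defuzzification: membership-weighted average of the universe values."""
    return sum(x * m for x, m in zip(xs, mu)) / sum(mu)


print(concentration([0.0, 0.5, 1.0]))
print(dilation([0.0, 0.25, 1.0]))
print(rule_strength(0.7, 0.4))
print(centre_of_gravity([1, 2, 3], [0.5, 1.0, 0.5]))
```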

UNIT-IV
PART-A
1.--------------- mimic the principle of natural genetics
a.Genetic programming b.Genetic Algorithm c.Genetic Evolution d.none
2.------------ mimics the behaviour of social insects
a.Swarm intelligence b.Ant colony c.Genetic Algorithm d.none
3.In genes, possible settings of traits are called -------------------
a.locus b.alleles c.genome d.genotype
4.------------------ means that the element of DNA is modified.
a.Recombination b.Selection c.Mutation d.none
5.The -------------- of an organism is measured by means of success of organism in life
a.Strength b.fitness c.Gene d.Chromosome
6.The space for all possible feasible solutions is called ------------------
a.space b.search c.search space d.area
7.------------- is a way of representing individual genes
a.conversion b.encoding c.coding d.none
8.In --------------, every chromosome is a string of numbers
a.hexadecimal encoding b.octal encoding c.Permutation encoding d.none
9.------------ is the first operator applied on population.
a.Reproduction b.Recombination c.Mutation d.none
10.------------------ means that the genes from the already discovered good individuals are exploited
a.Diversity b.Population diversity c.Unity in diversity d.none
11.-------------is the degree to which the better individuals are favoured
a.Selective pressure b.Reproduction pressure c.Recombination pressure d.Mutation
12.The selection method which is less noisy is -----------
a.stochastic remainder solution b.Boltzmann solution c.Remainder solution d.none
13.The ----------------- refers to the proportion of individuals in the population which are
replaced in each generation.
a.gap b.generation gap c.generation interval d.interval
14.Crossover operator proceeds in ------------- steps
a.4 b.3 c.5 d.2.
15.Matrix crossover is also known as ------------
a.One dimensional b.Two dimensional c.Three dimensional d.none
16.------------------performs linear inversion with a specified probability of 0.75.
a.Linear+end-inversion b.Discrete inversion c.Continuous inversion d.Mass inversion
17.---------------- of bit involves changing bits from 0 to 1 and 1 to 0.
a.Mutation b.Crossover c.Inversion d.Segregation
18.-------------------- is a process in which a given bit pattern is transformed into another bit pattern by
means of logical bit-wise operation.
a.Inversion b.Conversion c.Masking d.Segregation
19.In ------------------, inversion was applied with specified inversion probability p to each new
individual when it is created.
a.Discrete b.Continuous c.Mass inversion d.none
20.The ------------- causes all the bits in the first operand to be shifted to the left by the number of
positions indicated by the second operand.
a.Shift right b.Shift left c.Shift operator d.none
21.A --------------- returns 1 if one of the bits has a value of 1 and the other has a value of 0;
otherwise it returns a value of 0.
a.bit wise or b.bit wise and c.not d.none
22.Population size, Mutation rate and cross over rate are together referred to as ---------------
a.control parameters b.central parameters c.connection parameters d.none
23.------------- selection is motivated by the slow cooling of molten metal to achieve the minimum function value in a
minimization problem.
a.Boltzmann selection b.Tournament selection c.Roulette-wheel selection d.none
24.---------------is not a particular method of selecting the parents.
a.Steady-state b.Elitism c.Boltzmann selection d.Tournament Selection
25.Reproduction operator is also known as ---------

a.Recombination b.Selection c.Regeneration d.none
PART-B
1.Explain biological background of genetic algorithm
2.Explain Working principle of genetic algorithm
3.Explain any two types of encoding
4.Explain inheritance operators
5.Explain Mutation operator
6.Explain Bit-wise operator
PART-C
1.Explain Reproduction operator
2.Explain Inversion and Deletion
3.Explain Generation Cycle
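The bit-wise operators referred to in Part A (mutation as a bit flip, masking, shift left, and the operator that returns 1 only when the two bits differ) can be sketched on integer-encoded chromosomes. This is a minimal illustration, assuming chromosomes are stored as Python integers with bit 0 as the rightmost bit; the function names are illustrative.

```python
def flip_bit(chromosome, position):
    """Mutation of one bit: change 0 to 1 or 1 to 0 at `position`."""
    return chromosome ^ (1 << position)


def shift_left(value, count):
    """Shift all bits of `value` to the left by `count` positions."""
    return value << count


def apply_mask(value, mask):
    """Masking: keep only the bits of `value` selected by `mask`."""
    return value & mask


def exclusive_or(a, b):
    """Returns 1 in each position where exactly one of the two bits is 1."""
    return a ^ b


def one_point_crossover(a, b, point):
    """Child takes the high bits of `a` and the lowest `point` bits of `b`."""
    low_mask = (1 << point) - 1
    return (a & ~low_mask) | (b & low_mask)


print(bin(flip_bit(0b1010, 0)))                    # 0b1011
print(bin(shift_left(0b0011, 2)))                  # 0b1100
print(bin(apply_mask(0b1101, 0b0110)))             # 0b100
print(bin(exclusive_or(0b1100, 0b1010)))           # 0b110
print(bin(one_point_crossover(0b1111, 0b0000, 2))) # 0b1100
```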

UNIT-5
PART-A
1.Hybrid systems is combination of neural networks, fuzzy logic and --------------
a.Genetic Algorithm b.Genetic Programming c.Genetic d.none
2.In -------------, one technology calls the other as a subroutine to process or manipulate
information needed by it.
a.Auxiliary hybrid systems b.Embedded hybrid systems
c.sequential hybrid systems d.none
3.------------ hybrid systems make use of technologies in a pipeline fashion.
a.auxiliary hybrid systems b.embedded hybrid systems
4.In -------------- hybrid systems, the technologies participating are integrated in such a manner that
they appear intertwined.
a.auxiliary hybrid systems b.embedded hybrid systems
5.------------- deals with uncertainty problems with its own merits and demerits
a.neuro –fuzzy b.neuro-genetic c.fuzzy –genetic d.none
6.Neural network can learn various tasks from -------------
a.training b.testing c.learning d.none
7.-------------exhibit non-linear functions to any desired degree of accuracy
8.---------------- use to determine the weights of a multilayer feedforward network with

backpropagation learning
9.------------------ fuzzy input vectors to crisp outputs
a.Fuzzy – backpropagation b.neuro –fuzzy c.neuro-genetic d.fuzzy –genetic
10.----------------is a neuro-fuzzy hybrid in which the host is a recurrent network with a kind of
competitive learning.
a.Fuzzy ARTMAP b.Fuzzy art c.ARTMAP d.none

11.FAM Stands for ------------
a.Fuzzy Associative Memory b.Fuzzy association memory
c.Fuzzy Assist Memory d.none
12.---------------maps fuzzy sets and can encode fuzzy rules.
a.FAM b.Fuzzy c.ART d.none
13.Fuzzy truck backer-upper system is application of ---------------
a.FAM b.Fuzzy ART c.ART d.none
14.----------------- applicable on fuzzy optimization problems
a.Fuzzy-genetic b.neuro – fuzzy c.fuzzy-logic d.fuzzy-backpropagation
15.--------------learning have reported difficulties in learning the topology of the networks whose
weights they optimize
a.Gradient descent learning b.descent learning c.Gradient learning d.none
16.Applying neuronal learning capabilities to fuzzy systems is known as ---------
a.NN driven fuzzy reasoning b.fuzzy driven nn reasoning
c.neural network reasoning d.none
17.---------- can be applicable to mathematical relationship
a. neuro-fuzzy b.fuzzy-neuro c.neuro-network d.none
18.------------- is a multilayer feedforward network architecture with gradient learning.
a.backpropagation b.forward propagation c.Propagation d.none
19. Recurrent network architectures adopting -------------
a.hebbian learning b.supervised learning c.unsupervised learning d.reinforced learning
20.------------ set have no crisp boundaries
a.fuzzy b.boolean c.crisp set d.none
21.GA-NN also known as -----------
a.GANN b.NNGA c.GA d.none
22.Image recognition under noisy conditions is an application of --------
a.Fuzzy b.Fuzzy art c.art d.none
23.Genetic algorithm uses ------------- to determine optimization

a.fitness function b.fit function c.strength function d.none
24.------------proposed neuro –fuzzy system
a.lee and lie b.kosko c.gradient d.lee
25.Knowledge-based evaluation and earthquake damage evaluation is application of -----------
a.fuzzy-backpropagation b.neuro-fuzzy c.fuzzy d.none
PART-B
1.Explain neuro-fuzzy hybrids
2.Explain neuro-genetic hybrids
3.Explain fuzzy-genetic hybrids
4.Explain fuzzy-backpropagation network
5.Explain FAM
PART-C
1.Explain Hybrid Systems
2.Explain Fuzzy ARTMAP
3.Explain GA based backpropagation network
