
Artificial Intelligence Questions and Answers – Fuzzy Logic – 1


This set of Artificial Intelligence MCQs focuses on “Fuzzy Logic – 1”.

1. Fuzzy logic is a form of


a) Two-valued logic
b) Crisp set logic
c) Many-valued logic
d) Binary set logic
Answer: c
Explanation: In fuzzy logic, set membership is defined by a degree rather than a yes/no
choice, so a truth value can take many values.

2. Traditional set theory is also known as Crisp Set theory.


a) True
b) False
Answer: a
Explanation: In traditional set theory, set membership is fixed and exact: an element is
either in the set or not, so there are only two crisp values, true or false. In fuzzy
logic there are many values: an element belongs to the set with some weight x.

3. The truth values of traditional set theory are ____________ and those of a fuzzy set are
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
Answer: a
Explanation: Refer to the definitions of fuzzy set and crisp set: crisp membership is
either 0 or 1, while fuzzy membership lies between 0 and 1.

4. Fuzzy logic is an extension of crisp set logic that handles the concept of
partial truth.
a) True
b) False

Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.

6. “The room temperature is hot.” Here “hot” (a linguistic variable) can be
represented by _______.
a) Fuzzy Set
b) Crisp Set

Answer: a
Explanation: Fuzzy logic deals with linguistic variables.
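
As a rough illustration of how a linguistic value such as "hot" can be modeled as a fuzzy set, the sketch below compares a piecewise-linear membership function with a crisp threshold. The function names and the 20 °C, 30 °C and 35 °C breakpoints are assumptions chosen for the example, not values taken from the question.

```python
def hot_membership(temp_c: float, low: float = 20.0, high: float = 35.0) -> float:
    """Degree (between 0 and 1) to which a temperature counts as 'hot'.

    The breakpoints `low` and `high` are illustrative assumptions.
    """
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    # Linear ramp between the two breakpoints.
    return (temp_c - low) / (high - low)


def hot_crisp(temp_c: float, threshold: float = 30.0) -> int:
    """Crisp-set version for comparison: membership is either 0 or 1."""
    return 1 if temp_c >= threshold else 0
```

With these assumed breakpoints, a reading halfway up the ramp is "hot" to degree 0.5, whereas the crisp version can only answer 0 or 1.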

7. The value of set membership is represented by


a) Discrete Set
b) Degree of truth
c) Probabilities
d) Both b & c
Answer: b
Explanation: Both probabilities and degrees of truth range between 0 and 1, but set
membership in fuzzy logic is expressed as a degree of truth, not as a probability.

8. What is meant by probability density function?


a) Probability distributions
b) Continuous variable
c) Discrete variable
d) Probability distributions for Continuous variables

Answer: d
Explanation: None.
9. The Japanese were the first to utilize fuzzy logic practically, on high-speed trains in
Sendai.
a) True
b) False
Answer: a
Explanation: None.

10. Which of the following is used for probability theory sentences?


a) Conditional logic
b) Logic
c) Extension of propositional logic
d) None of the mentioned
Answer: c
Explanation: The version of probability theory used here takes an extension of
propositional logic as the language for its sentences.

Artificial Intelligence Questions and Answers – Fuzzy Logic – 2
This set of Artificial Intelligence MCQs focuses on “Fuzzy Logic – 2”.

1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement.
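
The minimum, maximum and complement definitions mentioned in the explanation can be sketched in a few lines. The function names are hypothetical; other operator families exist, and this only shows the usual min/max/complement choice.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a
```

When the inputs are restricted to 0 and 1, these definitions reduce to the ordinary Boolean operators.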

2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
d) None of the mentioned

Answer: a
Explanation: None.
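
To make hedges more concrete: a common textbook convention, assumed here rather than stated in the question, models "very" as squaring a membership degree and "somewhat" as taking its square root. The function names are illustrative.

```python
import math


def very(mu: float) -> float:
    """Concentration hedge: squaring lowers every partial membership."""
    return mu ** 2


def somewhat(mu: float) -> float:
    """Dilation hedge: the square root raises every partial membership."""
    return math.sqrt(mu)
```

Under this convention, a temperature that is "hot" to degree 0.8 is "very hot" only to degree 0.64, while full and zero membership are unchanged.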

3. Where can Bayes' rule be used?


a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
Answer: d
Explanation: Bayes' rule can be used to answer probabilistic queries conditioned
on one piece of evidence.
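
A minimal sketch of such a query, assuming a binary hypothesis H and a single piece of evidence E. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def bayes_posterior(prior: float, likelihood: float, false_alarm: float) -> float:
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of the evidence over both cases of H.
    evidence = likelihood * prior + false_alarm * (1.0 - prior)
    return likelihood * prior / evidence
```

For example, `bayes_posterior(0.01, 0.9, 0.05)` combines a 1% prior with a test that fires 90% of the time when H holds and 5% of the time when it does not.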
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned

Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
a) IF-THEN-ELSE rules
b) IF-THEN rules
c) Both a & b
d) None of the mentioned
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
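
The rule form above can be sketched with two rules over a single input. Everything specific here is an assumption made for illustration: the temperature breakpoints, the fan-speed actions, and the choice to blend rule outputs by a weighted average of their firing strengths.

```python
def ramp_up(x: float, low: float, high: float) -> float:
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fan_speed(temp_c: float) -> float:
    """Evaluate two illustrative rules and blend their actions.

    IF temperature IS hot  THEN fan speed is 100
    IF temperature IS cold THEN fan speed is 0
    """
    hot = ramp_up(temp_c, 20.0, 35.0)   # firing strength of the first rule
    cold = 1.0 - hot                    # firing strength of the second rule
    # Weighted average of the rule actions by firing strength.
    return (hot * 100.0 + cold * 0.0) / (hot + cold)
```

Because both rules fire partially at intermediate temperatures, the output changes smoothly instead of switching at a single threshold.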

6. Like relational databases, fuzzy relational databases also exist.


a) True
b) False

Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.

7. ______________ is/are the way/s to represent uncertainty.


a) Fuzzy Logic
b) Probability
c) Entropy
d) All of the mentioned
Answer: d
Explanation: Fuzzy logic and probability both represent uncertainty, and entropy is the
amount of uncertainty involved in data, written H(data).
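
As a small worked example of H(data), the sketch below computes Shannon entropy in bits from the symbol frequencies of a sequence. The function name `entropy` is an illustrative choice.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(data: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `data`."""
    counts = Counter(data)
    total = len(data)
    # Sum of -p * log2(p) over the observed symbols.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence with two equally frequent symbols has an entropy of one bit, and a sequence with a single repeated symbol has zero entropy, since there is no uncertainty about the next symbol.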

8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
d) None of the mentioned
Answer: b
Explanation: Ecorithms are algorithms that learn from their more complex environments
to generalize, approximate and simplify solution logic.
9. Which kind of network results when each variable can be influenced directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
Answer: b
Explanation: None.

10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks
lead to the consequence that a node is conditionally independent of its other
predecessors, given its parents.
Artificial Intelligence Questions and Answers – Neural Networks – 1
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 1”.

1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001

Answer: c
Explanation: The truth table before generalization is:
Inputs Output
000 $
001 $
010 $
011 $
100 $
101 $
110 0
111 1
where $ represents don’t know cases and the output is random.
After generalization, the truth table becomes:
Inputs Output
000 0
001 1
010 0
011 1
100 0
101 1
110 0
111 1

2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback

Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.

3. An auto-associative network is:


a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths (loops) does not have to be one.

4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Answer: a
Explanation: The output is found by multiplying the weights by their respective
inputs, summing the results and multiplying by the constant of proportionality of the
transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
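
The same arithmetic can be checked with a short sketch of a neuron with a linear transfer function. The function name and the parameter `k` (the constant of proportionality) are illustrative.

```python
from typing import Sequence


def linear_neuron(weights: Sequence[float], inputs: Sequence[float], k: float = 1.0) -> float:
    """Weighted sum of the inputs, scaled by the constant of proportionality k."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum
```

Calling `linear_neuron([1, 2, 3, 4], [4, 10, 5, 20], k=2)` reproduces the weighted sum 119 scaled by 2.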
5. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
d) None of the mentioned
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because much of the operation is done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by example.

6. Which of the following is true for neural networks?


(i) The training time depends on the size of the network.
(ii) Neural networks can be simulated on a conventional computer.
(iii) Artificial neurons are identical in operation to biological ones.
a) All of the mentioned
b) (ii) is true
c) (i) and (ii) are true
d) None of the mentioned
Answer: c
Explanation: The training time depends on the size of the network: a larger network has
more neurons and therefore more possible ‘states’. Neural networks can be simulated on
a conventional computer, but the main advantage of neural networks, parallel execution,
is lost. Artificial neurons are not identical in operation to biological ones.

7. What are the advantages of neural networks over conventional computers?


(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’
rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned

Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.

8. Which of the following is true?


Single layer associative neural networks do not have the ability to:
(i) perform pattern recognition
(ii) find the parity of a picture
(iii) determine whether two or more shapes in a picture are connected or not
a) (ii) and (iii) are true
b) (ii) is true
c) All of the mentioned
d) None of the mentioned

Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
9. Which is true for neural networks?
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
Answer: d
Explanation: All mentioned are the characteristics of neural network.

10. Neuro software is:


a) A software used to analyze neurons
b) It is powerful and easy neural network
c) Designed to aid experts in real world
d) It is software used by Neuro surgeon

Answer: b
Explanation: None.

Artificial Intelligence Questions and Answers – Neural Networks – 2
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 2”.

1. Why is the XOR problem exceptionally interesting to neural network researchers?


a) Because it can be expressed in a way that allows you to use a neural network
b) Because it is complex binary operation that cannot be solved using neural networks
c) Because it can be solved by a single layer perceptron
d) Because it is the simplest linearly inseparable problem that exists.

Answer: d
Explanation: None.

2. What is back propagation?


a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
d) None of the mentioned

Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
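
A minimal sketch of the idea, assuming the smallest possible network: one input, one sigmoid hidden unit and one sigmoid output unit, with a squared-error loss and no biases. The names `forward` and `backprop_step` and the learning rate are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1: float, w2: float, x: float) -> float:
    """Output of the tiny network: x -> hidden unit -> output unit."""
    return sigmoid(w2 * sigmoid(w1 * x))


def backprop_step(w1: float, w2: float, x: float, target: float, lr: float = 0.5):
    """One gradient-descent update for the loss 0.5 * (output - target) ** 2."""
    # Forward pass, keeping the intermediate activations.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    # Error signal at the output unit.
    delta_out = (y - target) * y * (1.0 - y)
    # The error is sent back through w2 to the hidden unit (chain rule).
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    # Each weight moves against its own gradient.
    return w1 - lr * delta_hidden * x, w2 - lr * delta_out * h
```

The point to notice is that the hidden weight `w1` is adjusted using an error signal computed at the output and passed backwards, which is where the name comes from.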

3. Why are linearly separable problems of interest to neural network researchers?

a) Because they are the only class of problem that a network can solve successfully
b) Because they are the only class of problem that a Perceptron can solve successfully
c) Because they are the only mathematical functions that are continuous
d) Because they are the only mathematical functions you can draw

Answer: b
Explanation: Linearly separable problems are of interest to neural network researchers
because they are the only class of problem that a Perceptron can solve successfully.

4. Which of the following is not the promise of artificial neural network?


a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
Answer: a
Explanation: An artificial neural network (ANN) cannot explain its results.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.

6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
Answer: a
Explanation: This is the basic thresholding behaviour of a perceptron.

7. The name for the function in question 6 is


a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
Answer: b
Explanation: Also known as the step function, so option a is also correct. It is a hard
thresholding function, either on or off with no in-between.
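
The hard threshold described here can be sketched as follows. The function names are illustrative, and the strict "greater than" comparison follows the wording "if it exceeds a certain value".

```python
from typing import Sequence


def heaviside(x: float, threshold: float = 0.0) -> int:
    """Hard threshold: 1 if x exceeds the threshold, otherwise 0."""
    return 1 if x > threshold else 0


def perceptron_output(weights: Sequence[float], inputs: Sequence[float], threshold: float) -> int:
    """Sum the weighted inputs and pass the total through the step function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return heaviside(total, threshold)
```

With weights `[1, 1]` and a threshold of 1.5, for instance, the unit behaves like a logical AND of two binary inputs.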

8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough

Answer: c
Explanation: None.
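
The hand-coded construction referred to in option c can be sketched with three threshold units: one computing OR, one computing NAND, and one combining them with AND. The weights and biases are one possible choice, picked by hand for this example.

```python
from typing import Sequence


def unit(weights: Sequence[float], bias: float, inputs: Sequence[int]) -> int:
    """A single threshold unit: fires when the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def xor(a: int, b: int) -> int:
    """XOR built from three hand-weighted threshold units."""
    h_or = unit([1, 1], -0.5, [a, b])      # fires unless both inputs are 0
    h_nand = unit([-1, -1], 1.5, [a, b])   # fires unless both inputs are 1
    return unit([1, 1], -1.5, [h_or, h_nand])  # AND of the two hidden units
```

Each hidden unit draws one straight line through the input space, and the output unit keeps only the region between the two lines.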
9. The network that involves backward links from the output to the input and hidden
layers is called ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.

10. Which of the following is an application of NN (Neural Network)?


a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned

Answer: d
Explanation: All mentioned options are applications of Neural Network

Artificial Intelligence Questions and Answers – Learning – 3
This set of Artificial Intelligence MCQs focuses on “Learning – 3”.

1. Which is not a desirable property of a logical rule-based system?


a) Locality
b) Attachment
c) Detachment
d) Truth-Functionality
e) Global attribute
Answer: b
Explanation: Locality: In logical systems, whenever we have a rule of the form A =>
B, we can conclude B, given evidence A, without worrying about any other rules.
Detachment: Once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived. That is, it can be detached from its
justification. Truth-functionality: In logic, the truth of complex sentences can be
computed from the truth of the components. There is, however, no Attachment
property in a rule-based system. A global attribute defines a particular problem
space as user specific and changes according to the user’s plan for the problem.

2. How is Fuzzy Logic different from conventional control methods?


a) IF and THEN Approach
b) FOR Approach
c) WHILE Approach
d) DO Approach
e) Else If approach
Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.

3. In unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
Answer: b
Explanation: The problem of unsupervised learning involves learning patterns in the
input when no specific output values are supplied. There is no specific output against
which to test the result, so the agent is not told what the correct behaviour is and
must find structure in the inputs on its own.

4. Inductive learning involves finding a


a) Consistent Hypothesis
b) Inconsistent Hypothesis
c) Regular Hypothesis
d) Irregular Hypothesis
e) Estimated Hypothesis
Answer: a
Explanation: Inductive learning involves finding a consistent hypothesis that agrees
with examples. The difficulty of the task depends on the chosen representation.
5. Computational learning theory analyzes the sample complexity and computational
complexity of
a) Unsupervised Learning
b) Inductive learning
c) Forced based learning
d) Weak learning
e) Knowledge based learning
Answer: b
Explanation: Computational learning theory analyzes the sample complexity and
computational complexity of inductive learning. There is a tradeoff between the
expressiveness of the hypothesis language and the ease of learning.

6. If a hypothesis says it should be positive, but in fact, it is negative, we call it


a) A consistent hypothesis
b) A false negative hypothesis
c) A false positive hypothesis
d) A specialized hypothesis
e) A true positive hypothesis
Answer: c
Explanation: A consistent hypothesis agrees with the examples. If the hypothesis says it
should be negative but in fact it is positive, it is a false negative. If a hypothesis
says it should be positive, but in fact it is negative, it is a false positive. In a
specialized hypothesis we need to have certain restrictive or special conditions.
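
The four possible outcomes when a hypothesis is compared with the true label can be written out directly. The function name is an illustrative choice.

```python
def classify_outcome(predicted: bool, actual: bool) -> str:
    """Name the outcome of comparing a hypothesis's prediction with the truth."""
    if predicted and actual:
        return "true positive"
    if predicted and not actual:
        return "false positive"   # hypothesis says positive, in fact negative
    if not predicted and actual:
        return "false negative"   # hypothesis says negative, in fact positive
    return "true negative"
```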

7. Neural Networks are complex ______________ with many parameters.


a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
e) Power Functions
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.
The parameters can be learned from noisy data, and the networks have been used for
thousands of applications.

8. A perceptron is a ______________.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
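
The weight update rule mentioned above can be sketched as the classic perceptron learning rule. The function names, the learning rate and the epoch limit are assumptions made for this example, and the AND data set in the usage note is just one linearly separable case.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 20, lr: float = 1.0):
    """Perceptron learning rule: nudge the weights only on misclassified samples."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(w, b, x)
            if delta != 0:
                errors += 1
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
        if errors == 0:  # every sample is classified correctly
            break
    return w, b
```

Training on the four rows of the AND truth table, `[((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]`, ends with weights that classify all four rows correctly. On XOR the loop would simply run out of epochs, since no separating line exists.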
9. Which of the following statements is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
Answer: a
Explanation: Not all formal languages are context-free.

10. Which of the following statements is not true?


a) The union and concatenation of two context-free languages is context-free
b) The reverse of a context-free language is context-free, but the complement need not
be
c) Every regular language is context-free because it can be described by a regular
grammar
d) The intersection of a context-free language and a regular language is always
context-free
e) The intersection of two context-free languages is context-free
Answer: e
Explanation: The union and concatenation of two context-free languages are
context-free, but their intersection need not be.

Artificial Intelligence Questions and Answers – Learning – 2
This set of Artificial Intelligence MCQs focuses on “Learning – 2”.

1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
Answer: d
Explanation: The factors that affect the performance of a learner system do not include
good data structures.

2. Learning methods do not include:


a) Memorization
b) Analogy
c) Deduction
d) Introduction

Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.

3. Which of the following is a model used for learning?


a) Decision trees
b) Neural networks
c) Propositional and FOL rules
d) All of the mentioned

Answer: d
Explanation: Decision trees, Neural networks, Propositional rules and FOL rules all
are the models of learning.

4. An automated vehicle is an example of ______.


a) Supervised learning
b) Unsupervised learning
c) Active learning
d) Reinforcement learning
Answer: a
Explanation: In an automated vehicle, a set of vision inputs and the corresponding
actions are available to the learner, hence it is an example of supervised learning.
5. Which of the following is an example of active learning?
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
d) None of the mentioned
Answer: a
Explanation: In active learning, not only is a teacher available, but the learner can
also ask for suitable perception-action pair examples to improve its performance.

6. In which of the following types of learning does the teacher return reward and
punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher returns
reward or punishment to the learner.

7. Decision trees are appropriate for the problems where:


a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned

Answer: d
Explanation: Decision trees can be used in all the conditions stated.

8. Which of the following is not an application of learning?


a) Data mining
b) WWW
c) Speech recognition
d) None of the mentioned

Answer: d
Explanation: All mentioned options are applications of learning.
9. Which of the following is a component of a learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
Answer: d
Explanation: Goal, model, learning rules and experience are the components of
learning system.

10. Which of the following is also called exploratory learning?


a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
Answer: c
Explanation: In unsupervised learning no teacher is available, hence it is also called
exploratory learning.

Artificial Intelligence Questions and Answers – Learning – 1
This set of Artificial Intelligence MCQs focuses on “Learning – 1”.

1. What will take place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech

Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.

2. Which element modifies the performance element so that it makes better decisions?


a) Performance element
b) Changing element
c) Learning element
d) None of the mentioned
Answer: c
Explanation: A learning element modifies the performance element so that it can make
better decisions.

3. How many main issues affect the design of a learning element?


a) 1
b) 2
c) 3
d) 4
Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.

4. What is used in determining the nature of the learning problem?


a) Environment
b) Feedback
c) Problem
d) All of the mentioned
Answer: b
Explanation: The type of feedback is used in determining the nature of the learning
problem that the agent faces.
5. How many types of machine learning are there?
a) 1
b) 2
c) 3
d) 4
Answer: c
Explanation: The three types of machine learning are supervised, unsupervised and
reinforcement.

6. Which is used for utility functions in game playing algorithm?


a) Linear polynomial
b) Weighted polynomial
c) Polynomial
d) Linear weighted polynomial
Answer: d
Explanation: A linear weighted polynomial is used as the utility function in
game-playing programs.

7. Which is used to choose among multiple consistent hypotheses?


a) Razor
b) Ockham's razor
c) Learning element
d) None of the mentioned
Answer: b
Explanation: Intuitively, Ockham's razor prefers the simplest hypothesis consistent
with the data.

8. What is a learning problem called if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
d) None of the mentioned
Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.

10. How does a decision tree reach its decision?


a) Single test
b) Two test
c) Sequence of test
d) No test
Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests.
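
A sequence of tests can be sketched by walking a nested dictionary from the root to a leaf. The tree, its attribute names and its decisions below are entirely made up for illustration, and the dictionary layout is just one simple representation.

```python
# A hypothetical tree: internal nodes test one attribute, leaves hold a decision.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}


def decide(tree, example: dict):
    """Follow one branch per test until a leaf (a non-dict value) is reached."""
    node = tree
    while isinstance(node, dict):
        value = example[node["attribute"]]   # perform the test at this node
        node = node["branches"][value]       # move to the matching subtree
    return node
```

The input is an object described by a set of attributes, such as `{"outlook": "sunny", "windy": False}`, and the value returned is the decision stored at the leaf that the tests lead to.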
1: An ANN is composed of a large number of highly interconnected processing
elements (neurons) working in unison to solve problems.

A. True
B. False

Answer: A

2: Artificial neural networks are used for

A. Pattern Recognition
B. Classification
C. Clustering
D. All of these

Answer: D

3: A Neural Network can answer

A. For Loop questions
B. what-if questions
C. IF-Then-Else Analysis Questions
D. None of these

Answer: B

4: Ability to learn how to do tasks based on the data given for training or initial
experience

A. Self Organization
B. Adaptive Learning
C. Fault tolerance
D. Robustness

Answer: B

5: Feature of ANN in which the ANN creates its own organization or representation of
the information it receives during learning time is

A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning

Answer: B
6: In an artificial neural network, interconnected processing elements are called

A. nodes or neurons
B. weights
C. axons
D. Soma

Answer: A

7: Each connection link in an ANN is associated with ________, which carry information
about the input signal.

A. neurons
B. weights
C. bias
D. activation function

Answer: B

8: Neurons or artificial neurons have the capability to model networks of original
neurons as found in the brain

A. True
B. False

Answer: A

9: The internal state of a neuron, called its __________, is a function of the inputs
the neuron receives

A. Weight
B. activation or activity level of neuron
C. Bias
D. None of these

Answer: B

10: A neuron can send ________ signal at a time.

A. multiple
B. one
C. none
D. any number of

Answer: B

1: Artificial intelligence is

A. It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some
value as output.
C. Science of making machines perform tasks that would require intelligence
when performed by humans
D. None of these

Answer: C

2: Expert systems

A. Combining different types of method or information
B. Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution
C. An information base filled with the knowledge of an expert formulated in terms
of if-then rules
D. None of these

Answer: C

3: Falsification is

A. Modular design of a software application that facilitates the integration of new
modules
B. Showing a universal law or rule to be invalid by providing a counter example
C. A set of attributes in a database table that refers to data in another table
D. None of these

Answer: B

4: Evolutionary computation is

A. Combining different types of method or information
B. Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution.
C. Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D. None of these

Answer: B

5: Extendible architecture is

A. Modular design of a software application that facilitates the integration of new
modules
B. Showing a universal law or rule to be invalid by providing a counter example
C. A set of attributes in a database table that refers to data in another table
D. None of these

Answer: A

6: A massively parallel machine is

A. A programming language based on logic
B. A computer where each processor has its own operating system, its own
memory, and its own hard disk
C. Describes the structure of the contents of a database.
D. None of these

Answer: B

7: Search space

A. The large set of candidate solutions possible for a problem
B. The information stored in a database that can be retrieved with a single query.
C. Worth of the output of a machine learning program that makes it understandable
for humans
D. None of these

Answer: A

8: n(log n) refers to

A. A measure of the desired maximal complexity of data mining algorithms
B. A database containing volatile data used for the daily operation of an
organization
C. Relational database management system
D. None of these

Answer: A

9: Perceptron is

A. General class of approaches to a problem.
B. Performing several computations simultaneously
C. Structures in a database that are statistically relevant
D. Simple forerunner of modern neural networks, without hidden layers

Answer: D

10: Prolog is

A. A programming language based on logic
B. A computer where each processor has its own operating system, its own
memory, and its own hard disk
C. Describes the structure of the contents of a database
D. None of these

Answer: A

11:
Shallow knowledge

A
. The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be, retrieved with a single query

C.
Worth of the output of a machine learning program that makes it
understandable for humans

D
. None of these

Option: B

12:
Quantitative attributes are

A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data

B.
Attributes of a database table that can take only numerical values

C.
Tools designed to query a database

D.
None of these
Option: B

13:
Subject orientation

A.
The science of collecting, organizing, and applying numerical facts

B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.

C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data

D.
None of these

Option: C

14:
Vector

A.
It does not need the control of a human operator during its execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars

C.
The validation of a theory on the basis of a finite number of examples

D.
None of these

Option: B

15:
Transparency

A.
The large set of candidate solutions possible for a problem

B.
The information stored in a database that can be retrieved with a single query

C.
Worth of the output of a machine learning program that makes it
understandable for humans

D.
None of these

Option: C

1:
Core of soft Computing is

A.
Fuzzy Computing, Neural Computing, Genetic Algorithms

B.
Fuzzy Networks and Artificial Intelligence

C.
Artificial Intelligence and Neural Science

D.
Neural Science and Genetic Science

Option: A

2:
Who initiated the idea of Soft Computing

A.
Charles Darwin

B.
Lotfi A. Zadeh
C.
Rechenberg

D.
McCulloch

Option: B

3:
Fuzzy Computing

A.
mimics human behaviour

B.
doesn't deal with two-valued logic

C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic

D.
All of the above

Option: D
4:
Neural Computing

A.
mimics human brain

B.
information processing paradigm

C.
Both (a) and (b)

D.
None of the above

Option: C

5:
Genetic Algorithms are

A.
a part of Evolutionary Computing

B.
inspired by Darwin's theory about evolution - "survival of the fittest"

C.
adaptive heuristic search algorithms based on the evolutionary ideas of
natural selection and genetics

D.
All of the above

Option: D

6:
What are the 2 types of learning

A.
Improvised and unimprovised

B.
supervised and unsupervised

C.
Layered and unlayered

D.
None of the above

Option: B

7:
Supervised Learning is
A.
learning with the help of examples

B.
learning without teacher

C.
learning with the help of teacher

D.
learning with computers as supervisor

Option: C

8:
Unsupervised learning is

A.
learning without computers

B.
problem based learning

C.
learning from environment

D.
learning from teachers
Option: C

9:
Conventional Artificial Intelligence is different from soft computing in the sense

A.
Conventional Artificial Intelligence deals with predicate logic, whereas soft
computing deals with fuzzy logic

B.
Conventional Artificial Intelligence methods are limited by symbols, whereas
soft computing is based on empirical data

C.
Both (a) and (b)

D.
None of the above

Option: C

10:
In supervised learning

A.
classes are not predefined
B.
classes are predefined

C.
classes are not required

D.
classification is not done

Option: B

1:
Membership function defines the fuzziness in a fuzzy set irrespective of the
elements in the set, which are discrete or continuous.

A.
True

B.
False

Option: A
2:
The membership functions are generally represented in

A.
Tabular Form

B.
Graphical Form

C.
Mathematical Form

D.
Logical Form

Option: B

3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of

A.
knowledge

B.
examples

C.
learning
D.
experience

Option: D

4: Three main basic features involved in characterizing membership function are

A.
Intuition, Inference, Rank Ordering

B.
Fuzzy Algorithm, Neural network, Genetic Algorithm

C.
Core, Support , Boundary

D.
Weighted Average, center of Sums, Median

Option: C

5:
The region of universe that is characterized by complete membership in the set is
called

A.
Core
B.
Support

C.
Boundary

D.
Fuzzy

Option: A
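The three features named in the questions above (core, support, boundary) can be illustrated on a small discrete fuzzy set. The following is only a sketch: the set `warm`, its temperature values, and the helper names are hypothetical and chosen for illustration, assuming the usual definitions (core: membership equal to 1, support: membership greater than 0, boundary: membership strictly between 0 and 1).

```python
def core(fuzzy_set):
    """Elements with complete membership (membership value exactly 1)."""
    return {x for x, mu in fuzzy_set.items() if mu == 1.0}


def support(fuzzy_set):
    """Elements with non-zero membership."""
    return {x for x, mu in fuzzy_set.items() if mu > 0.0}


def boundary(fuzzy_set):
    """Elements with partial membership (strictly between 0 and 1)."""
    return {x for x, mu in fuzzy_set.items() if 0.0 < mu < 1.0}


# Hypothetical fuzzy set "warm" over a few temperatures (degrees Celsius).
warm = {10: 0.0, 15: 0.25, 20: 1.0, 25: 1.0, 30: 0.5, 35: 0.0}

print(core(warm))      # elements 20 and 25
print(support(warm))   # elements 15, 20, 25 and 30
print(boundary(warm))  # elements 15 and 30
```

In this sketch the boundary is simply the support minus the core, which matches the idea that the boundary holds the elements with partial membership.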

6: A fuzzy set whose membership function has at least one element x in the universe
whose membership value is unity is called

A.
subnormal fuzzy set

B.
normal fuzzy set

C.
convex fuzzy set

D.
concave fuzzy set

7:
In a Fuzzy set a prototypical element has a value

A.
1

B.
0

C.
infinite

D.
Not defined

Option: A

8:
A fuzzy set wherein no membership function has its value equal to 1 is called

A.
normal fuzzy set

B.
subnormal fuzzy set.

C.
convex fuzzy set
D.
concave fuzzy set

Option: B

9: A fuzzy set whose membership function has membership values that are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing and then strictly monotonically decreasing with increasing
values for elements in the universe is called a

A.
convex fuzzy set

B.
concave fuzzy set

C.
Non concave Fuzzy set

D.
Non Convex Fuzzy set

Option: A
10:
The membership values of the membership function are not strictly
monotonically increasing or decreasing, nor strictly monotonically increasing and then
decreasing.

A.
Convex Fuzzy Set

B.
Non convex fuzzy set

C.
Normal Fuzzy set

D.
Sub normal fuzzy set

Option: B
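The normal/subnormal and convex/non-convex distinctions from the questions above can be checked mechanically on sampled membership values. This is a minimal sketch under stated assumptions: the membership values are listed in increasing order of the universe elements, and convexity is tested with the common condition that no interior value lies below the smaller of two values on either side of it (this admits plateaus, so it is slightly looser than the "strictly monotonic" wording of the questions). The function names are hypothetical.

```python
from itertools import combinations


def is_normal(values):
    """A fuzzy set is normal if at least one membership value equals 1."""
    return max(values) == 1.0


def is_convex(values):
    """Convex if, for every i < j < k, values[j] >= min(values[i], values[k])."""
    return all(
        values[j] >= min(values[i], values[k])
        for i, j, k in combinations(range(len(values)), 3)
    )


rising_then_falling = [0.0, 0.5, 1.0, 0.5, 0.0]  # one peak
two_peaks = [0.0, 1.0, 0.3, 1.0, 0.0]            # dips between two peaks
low_peak = [0.2, 0.8, 0.4]                       # never reaches 1

print(is_normal(rising_then_falling), is_convex(rising_then_falling))
print(is_normal(two_peaks), is_convex(two_peaks))
print(is_normal(low_peak), is_convex(low_peak))
```

Under these assumptions the first list is a convex normal set, the second is a non-convex normal set, and the third is a convex subnormal set.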

11:
Match the Column

List I
List II

1 Subnormal Fuzzy Set

2 Normal Fuzzy Set

3 Non Convex Normal Fuzzy Set

4 Convex Normal Fuzzy Set

A.
a b c d
2 1 4 3

B.
a b c d

1 2 3 4

C.
a b c d

4 3 2 1

D.
a b c d

3 2 1 4

Option: A

12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to

A.
infinite

B.
1

C.
0
D.
0.5

Option: D
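As an illustration of crossover points, consider a triangular membership function. This shape is an assumption made for the example (the question does not name any particular membership function), and the parameters `a`, `b`, `c` and the function names are hypothetical. For a triangle with feet at `a` and `c` and peak at `b`, the membership value 0.5 is reached halfway up each slope.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def crossover_points(a, b, c):
    """Elements whose membership value is 0.5 (midpoints of both slopes)."""
    return ((a + b) / 2, (b + c) / 2)


left, right = crossover_points(0, 10, 30)
print(left, triangular(left, 0, 10, 30))
print(right, triangular(right, 0, 10, 30))
```

Both printed membership values are 0.5, which is the defining property of a crossover point.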


Questions

1. Which of the following is/are found in Genetic Algorithms?

(i) evolution
(ii) selection
(iii) reproduction
(iv) mutation

: Your answer is

(a) i & ii only
(b) i, ii & iii only
(c) ii, iii & iv only
(d) all of the above

2. Matching between terminologies of Genetic Algorithms and Genetics:

Genetic Algorithms                 Genetics (biology)

(a) representation structures      (i) external disturbance, such as cosmic radiation
(b) crossover                      (ii) chromosomes
(c) mutation                       (iii) survivability
(d) selection                      (iv) sexual reproduction

: Your answer is

(a) _____
(b) _____
(c) _____
(d) _____

3. Where are Genetic Algorithms applicable?

(i) real time application
(ii) biology
(iii) Artificial Life
(iv) economics

: Your answer is

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

4. Which of the following is/are the pre-requisite(s) when Genetic
Algorithms are applied to solve problems?

(i) encoding of solutions
(ii) well-understood search space
(iii) method of evaluating the suitability of the solutions
(iv) contain only one optimal solution

: Your answer is

(a) i & ii only
(b) ii & iii only
(c) i & iii only
(d) iii & iv only

5. Which of the following statement(s) is/are true?

(i) Genetic Algorithm is a randomised parallel search algorithm, based
on the principles of natural selection, the process of evolution.
(ii) GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii) GAs are used for solving optimization problems and modeling
evolutionary phenomena in the natural world.
(iv) Despite their utility, GAs remain a poorly understood topic.

: Your answer is

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

6. If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)

(i) The crossover operation is not successful.
(ii) Solution is about to be reached.
(iii) Diversity is so poor that the parents involved in the crossover
operation are similar.
(iv) The search space of the problem is not ideal for GAs to operate.

: Your answer is

(a) ii, iii & iv only
(b) ii & iii only
(c) i, iii & iv only
(d) all of the above

7. Which of the following comparisons is true?

: Your answer is

(a) In the event of restricted access to information, GAs win out in that
they require much less information to operate than other search methods.
(b) Under any circumstances, GAs always outperform other algorithms.
(c) The quality of the solutions offered by GAs for any problem is
always better than that provided by other search methods.
(d) GAs could be applied to any problem, whereas certain algorithms
are applicable to limited domains.

8. Which of the following statement(s) is/are true?

(i) Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii) Alife is a kind of Artificial Intelligence (AI).
(iii) Alife pursues a two-fold goal: increasing our understanding of
nature and enhancing our insight into artificial models, thereby
providing us with the ability to improve their performance.
(iv) Alife extends our studies of biology, life-as-we-know-it, to the larger
domain of possible life, life-as-it-could-be.

: Your answer is

(a) i & ii only
(b) iii & iv only
(c) i, ii & iii only
(d) all of the above

9. Where is Artificial Life applicable?

(i) film (movie, video) production
(ii) biology
(iii) robotics
(iv) air traffic control

: Your answer is

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

10. Who can benefit from Alife?

(i) children
(ii) designers
(iii) artists
(iv) patients

: Your answer is

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above


: Answers

Q1. Which of the following is/are found in Genetic Algorithms?

The correct answer is (d).

An initial population evolves to some optimal solutions. Selection biases for
better individuals, judged by their fitness values; two individuals are chosen
for reproducing offspring. By combining portions of good individuals, this
process is likely to create even better individuals.
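The cycle described in this answer (a population that evolves through selection, reproduction and mutation) can be sketched in a few lines. Everything specific here is an assumption made for illustration and does not come from the quiz: the bit-string encoding, the "count the ones" fitness function, tournament selection, one-point crossover, keeping the best individual each generation, and all parameter values.

```python
import random


def fitness(bits):
    """Toy fitness (assumed for illustration): the number of 1 bits."""
    return sum(bits)


def tournament(pop, rng):
    """Selection: pick two individuals at random and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)]  # keep the current best individual
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, n_bits - 1)           # one-point crossover
            child = p1[:cut] + p2[cut:]                # reproduction
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                   # mutation
            next_pop.append(child)
        pop = next_pop                                 # the population evolves
    return initial_best, max(pop, key=fitness)


initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Because the best individual is carried over unchanged each generation, the best fitness in this sketch can never decrease; without that choice a GA gives no such guarantee.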

Q2. Matching between terminologies of Genetic Algorithms and Genetics:

The correct answer is:

(a) (ii)
(b) (iv)
(c) (i)
(d) (iii)

Q3. Where are Genetic Algorithms applicable?

The correct answer is (b).

Genetic Algorithms can be used to evolve strategies for interaction in the
Prisoner's Dilemma in economics. GAs are used as a computational method in
Alife: the simulation of living systems starting with single cells and evolving to
organisms, societies or even whole economic systems. These features
compete for the limited resources in this virtual world. In biology, GAs are
used in protein structure prediction, protein folding, stability of DNA hairpins
and modeling of the immune system.

[Figures: DNA structures; protein structures]

GAs cannot be applied in real-time systems, where the response time is critical:
GAs cannot guarantee to find a solution, and the time spent in
evaluating the fitness function and in other genetic operations is substantial,
especially in a poorly understood, complex search space.

Q4. Which of the following is/are the pre-requisite(s) when Genetic
Algorithms are applied to solve problems?

The correct answer is (c).

The problem is mapped into a set of strings with each string representing a
potential solution (i.e. chromosomes). A fitness function is required to
compare and tell which solution is better. GA performance is heavily
dependent on the representation chosen.

GAs are designed to efficiently search large, non-linear, poorly understood
search spaces where expert knowledge is scarce or difficult to encode and
where traditional techniques fail. However, domain knowledge guides GAs to
obtain the optimal solutions. Moreover, GAs are powerful enough to solve for
a set of (nearly) optimal solutions.
Q5. Which of the following statement(s) is/are true?

The correct answer is (c).

The search space is too complex for exhaustive search, so GAs
successfully find robust solutions after evaluating only a few percent of the
full parameter space.

It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.

Their probabilistic nature and reliance on frequent interactions of members of
a large population make a complete analytic understanding of GAs extremely
difficult.

Q6. If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)

The correct answer is (b).

When the crossover operation does not produce significantly different offspring,
it shows that the parents involved are almost identical. Hence, it means that
a solution is about to be reached. However, the solution derived is not
necessarily the optimal solution. From here, we can see that mutation is
necessary to maintain the diversity of the population so that GAs do not become
trapped in partial solutions.
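The effect described in this answer can be reproduced with a tiny example: when two parents are almost identical, one-point crossover at the midpoint returns offspring that are the same as the parents, and only mutation introduces new genes. The bit strings, the helper names, and the mutation rates below are hypothetical values chosen for illustration.

```python
import random


def one_point_crossover(p1, p2, cut):
    """Swap the tails of two parents at position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def hamming(a, b):
    """Number of positions at which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


parent_a = [1, 0, 1, 1, 0, 1, 0, 0]
parent_b = [1, 0, 1, 1, 0, 1, 0, 1]  # differs from parent_a in one gene only

# Each child takes one half from each parent, as in the question.
child_1, child_2 = one_point_crossover(parent_a, parent_b, cut=4)
print(hamming(child_1, parent_b), hamming(child_2, parent_a))  # no new material

rng = random.Random(0)
mutant = mutate(child_1, rate=0.25, rng=rng)
print(hamming(mutant, child_1))  # mutation may flip some genes
```

With low diversity, crossover alone only reshuffles genes that the population already shares, which is why the answer points to mutation as the way to keep the search from stalling.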

Q7. Which of the following comparisons is true?

The correct answer is (a).

(a) This is true since GAs require only the information needed to
evaluate the fitness function for the possible solutions
(individuals in the search space). Other searches generally
require more information, such as differentiability of the
problem function, which might be hard to obtain.

(b) This holds true in most circumstances. However, if the search
space is small enough, other searches like hill-climbing or
heuristic search, which are very effective in exploring small spaces,
would perform just as well.

(c) GAs have only been developed for a couple of decades, while
traditional searches have been investigated for a longer time.
Thus GAs do not necessarily produce a better quality solution.

(d) Evidently certain algorithms are only applicable to limited
domains. However, certain difficulties, like the encoding of
problems, might hinder the use of GAs.

Q8. Which of the following statement(s) is/are true?

The correct answer is (b).

Alife is characterised by a bottom-up synthesis approach, so that the robotics
work tends to aim for insect-like capability rather than human, and complex
behaviours are developed by putting together simpler ones. Artificial
forms of evolution such as Genetic Algorithms and Genetic Programming are
widely used to evolve solutions or behaviours rather than designing them in a
top-down fashion as in Artificial Intelligence.

Q9. Where is Artificial Life applicable?

The correct answer is (d).

Alife is applicable in many fields, for example in a walking robot.
Q10. Who can benefit from Alife?

The correct answer is (d).

Children can use various computational tools (including LEGO/Logo
and Electronic Bricks) to build artificial creatures, exploring
some of the central ideas of Alife.

GAs can be applied to the design of laminated composite structures, circuit
designs and the improvement of Pareto optimal designs. Genetic programming
can help artists to create many pictures. Medical problems can also be
detected (Medibrains).
1) Which type of model has memory associated with it?
a) GAN
b) Autoencoder
c) RNN
d) CNN

2) RNN model works with


a) random data
b) nominal data
c) ordinal data
d) sequential data

3) Which of the data is an example of sequential data


a) MNIST data
b) house rate prediction data
c) weather forecasting data
d) CIFAR10 dataset

4) LSTM layer is used to


a) avoid the problem of exploding gradients
b) avoid the problem of vanishing gradients
c) to retain the previous state of the model
d) to work with ordinal type of data

5) Which method is used to avoid the exploding gradient problem


a) LSTM
b) TBTT
c) forget cell
d) autoencoders

Type of RNN
applications of RNN
Forget Cell

What should the value of |Whh| be so that the model does not get stuck in the exploding or vanishing gradient problem?
a) <1
b) >1
c) =1
d) =0
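Why the size of |Whh| matters can be seen in a deliberately simplified setting. The sketch below assumes a scalar, linear recurrence h_t = w_hh * h_(t-1) + x_t, so a gradient propagated back over T time steps is scaled by |w_hh| ** T. Real RNNs also multiply in activation derivatives and use weight matrices, so this is only an illustration of the trend, and `gradient_scale` is a hypothetical helper name.

```python
def gradient_scale(w_hh, steps):
    """Factor by which a gradient is scaled after `steps` steps of backpropagation
    through the scalar linear recurrence h_t = w_hh * h_(t-1) + x_t."""
    return abs(w_hh) ** steps


for w in (0.5, 1.0, 1.5):
    print(w, gradient_scale(w, steps=50))
```

In this toy model the factor shrinks toward zero for |w_hh| < 1 (vanishing gradient), grows without bound for |w_hh| > 1 (exploding gradient), and stays at 1 only when |w_hh| = 1.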

Which of the following is a part of LSTM?


a) stride
b) zero padding
c) discriminator
d) Forget cell

Which method uses trainable parameters for converting string data into numerical data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these
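The three encodings listed in the options can be contrasted on a toy sentence. This is a sketch only: the sentence, the helper names, and the embedding size are hypothetical, and the embedding table is filled with placeholder zeros where a real model would hold trainable values that are updated during training.

```python
def build_vocab(tokens):
    """Assign each distinct word a unique integer, in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(index, size):
    """Vector of zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)

# Representing each word with a unique number.
ids = [vocab[t] for t in tokens]
print(ids)

# One hot encoding: one fixed, non-trainable vector per word.
print(one_hot(vocab["cat"], len(vocab)))

# Word embedding: a lookup table with one row per word. The rows are the
# trainable parameters; the zeros here are placeholders, not learned values.
embedding_dim = 2
embedding_table = [[0.0] * embedding_dim for _ in vocab]
print([embedding_table[i] for i in ids][:2])
```

Only the embedding table carries parameters that training would adjust; the unique-number and one-hot encodings are fixed once the vocabulary is built.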

Which method has the least relationship between the encoded data and the string type of data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:32 GMT -05:00

https://www.coursehero.com/file/75294158/unit-5pdf/
1. High bias means-
Underfit
Overfit

2. How to check CPU time using python?
import ClockTime
import time

3. Python uses -
Interpreter
Compiler

4. How to check the version of tensorflow?
tf.__version__
tf._version_
tf.version

5. Output of print(tf.test.gpu_device_name()) if only 1 GPU is available is-
/device:GPU:0
/device:GPU:1

6. Matrix multiplication is-
@
*
**

7. For the square of a tensor I can use-
tf.square()
**
^2

8. Element-wise matrix multiplication-
a*b
a**b
a@b

9. Weights and biases in the sequential model are assigned either by calling the model with inputs or by specifying the input shape during the creation of the model.
True
False
Weights are created once model is declared

10. Size of weights for the following code is : model=keras.Sequential([layers.Dense(3)]) y=model(tf.ones((10,5))) print(model.weights[0].shape)-
5 x 3
10 x 5
10 x 3
3 x 5

11. Model.compile() in keras requires-
All of above
optimizer
loss
metrics

12. Model.save() saves the model's-
All of above
Model Architecture
Optimizer State
Weight & Bias matrix

13. Correct library to load a saved model in keras is-
keras.models.load_model()
keras.Sequential.load_model()
keras.layers.Dense.load_model()

14. Which of the following is the correct library to load a pre-trained NN?-
tf.keras.Models
tf.keras.applications
tf.keras.layers
tf.keras.preprocessing

15. Which of the following is NOT a data-augmentation layer?-
RandomTranslate()
RandomCrop()
RandomFlip()
RandomRotation()

16. Which of the following is NOT a building block of LSTM-
logic gate
input gate
Forget gate
output gate
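The weight-shape question above (a fully connected layer with 3 units applied to an input of shape 10 x 5) can be worked through without any framework. The sketch below uses plain Python lists; `dense_kernel_shape` and `matmul` are hypothetical helpers, and the kernel value 0.5 is an arbitrary placeholder. It relies on the usual convention that a dense layer's kernel has shape (input features, units).

```python
def dense_kernel_shape(input_shape, units):
    """Kernel shape of a fully connected layer: (features_in, units)."""
    return (input_shape[-1], units)


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


inputs = [[1.0] * 5 for _ in range(10)]        # a batch of 10 rows, 5 features
kernel_shape = dense_kernel_shape((10, 5), units=3)
kernel = [[0.5] * kernel_shape[1] for _ in range(kernel_shape[0])]
outputs = matmul(inputs, kernel)

print(kernel_shape)                    # shape of the weight matrix
print(len(outputs), len(outputs[0]))   # shape of the layer output
```

The batch size (10) never appears in the kernel shape; it only shows up in the output, which has one row per input row and one column per unit.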

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00

https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
17. Which of the following weight matrices leads to the VANISHING gradient problem in BPTT?-
|Whh|<1
|Whh|>1
|Whh| =1
|Whh| =0

18. Which of the following is NOT a gate in LSTM?
Multiplication gate
Input Gate
Forget gate
Output gate

19. A Gate in LSTM has an activation function-
Tanh
sigmoid
threshold
linear

20. Which of the following |Whh| leads to the Exploding gradient problem?
|Whh| > 1
|Whh| < 1
|Whh| = 1
|Whh| = 0

21. Which of the following is a correct library for the Embedding layer in RNN?
I. tf.keras.layers
II. tf.keras.applications
III. tf.keras.layers.experimental.preprocessing
IV. tf.keras.Models

22. LSTM stands for -
Long Short Term Memory
Length Short Term Memory
Long Sequential Term Memory
Length Short Term Memory

23. GRU stands for -
Gated Recurrent Unit
Graphical Recurrent Unit
Generalized Recurrent Unit
Gated Recurrence Unit

24. Which of the following is a correct library to import text_dataset_from_directory()-
I. tf.keras.preprocessing
II. tf.keras.layers.experimental.preprocessing
III. sklearn.preprocessing
IV. tf.keras.modes.preprocessing

25. Which of the following can be used to solve the vanishing gradient problem of BPTT?
LSTM
LSTM or GRU both can be used
GRU
Dropout

26. Rescaling and Resizing are preprocessing layers that can be imported from library -
tf.keras.layers.experimental.preprocessing
tf.keras.layers.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing

27. A dataset 'x_train' contains 50 batches, each having size 32. Number of batches in x_new=x_train.take(20) is -
20
50
30
32

28. A dataset 'x_train' contains 50 batches, each having size 32. Number of batches in x_new=x_train.skip(20) is -
30
50
20
32

29. Which of the following is the correct use of three LSTM layers?
A) tf.keras.layers.LSTM(128,return_sequences=True);
   tf.keras.layers.LSTM(64);
   tf.keras.layers.LSTM(32)
B) tf.keras.layers.LSTM(128,return_sequences=True);
   tf.keras.layers.LSTM(64,return_sequences=True);
   tf.keras.layers.LSTM(32)
C) tf.keras.layers.LSTM(128,return_sequences=True);
   tf.keras.layers.LSTM(64);
   tf.keras.layers.LSTM(32,return_sequences=True)
D) tf.keras.layers.LSTM(128);
   tf.keras.layers.LSTM(64);
   tf.keras.layers.LSTM(32)

30. Which of the following activation functions is used in GRU?
both sigmoid and tanh
Sigmoid
Tanh
Relu
Softmax

31. Which of the following is the best suitable application of RNN?
Text Classification
Time series forecasting
Text Generation
Image classification
Regression

32. Which of the following neural network layers are supported by keras?
All of above
Conv2D
Conv2DTranspose
GlobalAveragePooling1D
LSTM

33. Which of the following is false about the radial basis function neural network?
It resembles RNNs, which have feedback loops.
None of the above.
It uses a radial basis function as the activation function.
While outputting, it considers the distance of a point with respect to the center.

34. Which of the following is NOT a parameter of the data augmentation layer RandomFlip()-
vertical_and_horizontal
horizontal
vertical
horizontal_and_vertical

35. Data augmentation layers are available in which directory?
tf.keras.layers.experimental.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing
tf.data.experimental.preprocessing

36. Which of the following layers is NOT a type of recurrent neural network layer?
tf.keras.layers.experimental.preprocessing.TextVectorization()
tf.keras.layers.LSTM()
tf.keras.layers.GRU()
tf.keras.layers.Bidirectional()

37. Which of the following is NOT a method of the TextVectorization layer of tensorflow?
ngrams()
adapt()
get_vocabulary()
set_vocabulary()

38. TextVectorization layer is available in ...…
tf.keras.layers.experimental.preprocessing
tf.keras.preprocessing
tf.keras.layers.preprocessing
sklearn.preprocessing

39. Which of the following is used to stop exploding and Vanishing Gradient-
LSTM
GRU
Bidirectional
Dropout

40. Which of the following Methods uses an exponentially weighted linear function of past observations?
Simple Exponential Smoothing
Holt Winter’s Exponential Smoothing
Vector Autoregression
Autoregression
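An "exponentially weighted linear function of past observations" can be written out directly. The sketch below uses the common recursive form s_t = alpha * x_t + (1 - alpha) * s_(t-1) for simple exponential smoothing; initialising the first smoothed value to the first observation is an assumption (other initialisations are in use), and the series and `alpha` are arbitrary illustrative values.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    The first smoothed value is set to the first observation (an assumption).
    """
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))
```

Unrolling the recursion shows that each smoothed value weights the observation k steps back by alpha * (1 - alpha) ** k, so older observations count exponentially less.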

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00

https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
IV. Autoregressive Integrated Moving Avera
41. Which of the following is time ge (ARIMA)
series analysis method NOT su
pport both Trends or signal co 46. Which of following GANs uses
mponent? unpaierd data for prediction?
I. Seasonal Autoregressive Integrated Movi CycleGAN
ng Average (SARIMAX) Pix2Pix
II. Autoregressive Integrated Moving DCGAN
Average (ARIMA) FGSM
III. Seasonal Autoregressive Integrated Movi
ng Average with exogenous variable(S A
RIMAX) 47. What is the full form of FGS
IV. Holt Winter’s Exponential Smoothing M?
Fast Gradient Sign Method
42. Which of following is the corre Fast Gradient Sigmoid Method
ct method to load autoregressio Fourier Gradient Signature Methd
n model? Fast Gravity Sign Magnitude
I. from statsmodels.tsa.ar_model import Au
toReg 48. Which of the following is NOT
II. from statsmodels.tsa.arima_model import GANs networks?
ARMA
III. from statsmodels.tsa.arima_model import FGSM
AutoReg DeepDream
IV. from statsmodels.tsa.ar_model import Au Pix2Pix
toRegression CycleGAN

43. Which of the following is NOT 49. Which of following is TRUE fo


a multivariate time series anal r adversary example?
ysis method? I. These examples are added to train data
I. Vector Autoregression - intensely to fool Neural Networks
II. AutoRegression II. These examples are used for validation
III. Vector Autoregression Moving-Average ` III. These examples are used for training.
with exogenous variable IV. None of above
IV. Vector Autoregression Moving-Average
50. FGSM is a type of __________
44. Which of the following is time ___________ attack on NN.
series analysis method NOT su Black box
pport both Trends or signal co White box
mponent? Gray box
I. Autoregressive Integrated Moving Pink box
Average (ARIMA)
II. Seasonal Autoregressive Integrated
Moving Average with exogenous 51. Which of following is NOT infe
variable(SARIMAX) rence attack in NN?
III. Holt Winter’s Exponential Smoothing Fuzzy Inference
IV. Vector Autoregression Membership Inference
Attribute Inference
45. Which of the following is meth
Model Inference
od supports multivariate time s
Input Inference
eries analysis?
I. Vector Autoregression
II. Seasonal Autoregressive Integrated 52. Which of following is NOT an
Moving Average with exogenous attack on NN?
variable(S ARIMAX) Phissing
III. Holt Winter’s Exponential Smoothing Pisioning

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00

https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
Backdooring Attribute Inference
Torjoning Model Inference

53. In which of following attack, a 59. Which of following attack is on


n attacker wants to extract trai Data of ML model?
ning data of a model? Data poisioning
Attribute Inference Adversarial attacks
Input Inference Backdooring
Membership Inference Torjoning
Model Inference
60. Which of following layer can b
54. Which of the following network e used to convert Text data to
s can be used to convert one i Index vector?
mage in a form on a painting TextVectorization()
of another image? Text2Vec()
Neural style transfer text_data_from_directory()
Pix-to-Pix transfer text_to_int()
DCGAN
FGSM 61. Which of following is the corre
ct output by applying AvergaeP
55. Which of the following is corre ooling2D((3,3)) on input image
ct library to import Data augm [[1,2,3],[4,5,6],[7,8,9]]
entation layer? [[[[5.0]]]]
tf.keras.layers.experimental.preprocessing [[[5.0]]]
tf.keras.layer.preprcoessing [5.0]
tf.keras.preprocessing
tf.keras.models.preprocessing
[[5.0]]

56. Which of following techniques 62. What is the correct outputshap


used to randomly rotate, crop, e by applying Conv2D(32,7) on
zoom etc. to input image to sto input image of size 32x32x3?
p overtraining is called as - 26x26x32
Data Augmentation 32x32x32
Early Stopping 32x32x3
Feature Scaling 25x25x32
Cross Validation
63. Which of the following method
57. Which of the following network of matplotlib can be used to p
s used to generate a new image lot different graphs in shame fi
that looks like real ? gure?
GAN subplot()
CNN plot()
Perceptron imshow()
RNN grid()

58. Which of following privacy atta 64. Which of the following operato
ck on ML model, where attack rs NOT supported in python te
ers want to extract training dat nsorflow?
a of model? #
Membership Inference ^
Input Inference **

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00

https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
@ I.
II.
tf.keras.application.mobilenet_v2.decode_predictions
tf.keras.application.mobilenet_v2.preprocess_input
III. tf.keras.application.mobilenetV2.decode_predictions
65. Which of following weights are IV. tf.keras.application.mobilenet_v2.MobileNetV2.decode_predictions

desirable is BPTT algorithm?


71. Which of following command is
|W|=1
used to compiles a function in
|W|<1
to a callable TensorFlow graph
|W|>1
in version 2.3.0?
|W|=0
tf.function()
tf.Graph()
66. Which of following layers are
tf.Variable()
NOT a data augmentation late
tf.Constant()
r?
RAndomContrast()
72. Which of following is NOT cor
RandomRotate()
rect tensor in TensorFlow?
RandomFlip()
Encode Tensor
RandomZoom()
String Tensor
RandomCrop()
Ragged Tensor
Sparse Tensor
67. How many training parameters
in following model 1. input la
73. Which of following code return
yer 28x28x3, 2) Conv2D(64,7) ,
sum of numbers in string x se
3) Dropout(0.5), 4) Flatten()-
parated by space for example x
9408
='1 2 3 4 5'-
9472 I. a=tf.strings.to_number(tf.strings.split(x,sep=' '))
25088 add=tf.reduce_sum(a)
II. a=tf.strings.to_Int(tf.strings.split(x,sep=' '))
32832 add=tf.sum(a)
III. a=tf.strings.ParseInt(tf.strings.split(x,sep=' '))
add=tf.reduce_sum(a)
68. What is the output shape follo IV. a=tf.strings.to_number(tf.strings.split(x,sep=' '))
wing model 1. input layer 20x2 add=tf.sum(a)
0x3 2) Conv2D(7,14) , 3) Maxp
ooling2D((2,2)) 3) Dropout(0.5), 74. Datatype of ‘hist.history’ of foll
4) Flatten()- owing object is hist=model.fit(tr
49 x 1 ain_data,train_label,epochs=15)-
63 x 1 Dictionary
7 x 1 Tensor
343 x 1 List
String
69. Which of following networks m
ay be used to colourize binary 75. In keras, input_dim parameter
image? is set on which layer of the ne
ural network?
DeepDream Input layers
Pix2Pix Hidden layers
Neural Style Transfer Dropout layers
CycleGAN Output Layers

70. Which of following is the corre 76. Which of the following is true
ct library to convert predicted about dropout?
value of mobilenetV2 to correct Dropout is a regularization technique
label? Dropout does not reduce overfitting.
Dropout solves vanishing gradient problem.

This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00

https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
All of the above.

77. Which types of layers are used in Discriminator?
All of the above
Conv2D
LSTM
Conv2DTranspose

78. In conditional GAN, a conditional parameter is added to ……
Generator
Both Discriminator and Generator
Discriminator
None of the above

79. Which of the following is false about LSTM?
I. LSTM is an extension for RNN which extends its memory.
II. LSTM solves the exploding gradients issue in RNN.
III. None of the above
IV. LSTM enables RNN to learn long term dependencies.

80. Which of the following is an application of RNN?
All of the above
NLP
Audio and video analysis
Stock market prediction

4/24/22, 2:20 PM [MCQ] Soft Computing - Last Moment Tuitions


[MCQ] Soft Computing

Fuzzy Set Theory

Module 1

1. What is Fuzzy Logic?

A. a method of reasoning that resembles human reasoning

B. a method of question that resembles human answer

C. a method of giving answer that resembles human answer.

D. None of the Above

View Answer

Ans : A

Explanation: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning.

2. How many outputs does Fuzzy Logic produce?

A. 2

B. 3

C. 4

D. 5

https://lastmomenttuitions.com/mcqs/it-engineering/mcq-soft-computing/ 1/12

View Answer

Ans : A

Explanation: The conventional logic block that a computer can understand takes precise

input and produces a definite output as TRUE or FALSE, which is equivalent to a human’s YES

or NO.

3. Fuzzy Logic can be implemented in?

A. Hardware

B. software

C. Both A and B

D. None of the Above

View Answer

Ans : C

Explanation: It can be implemented in hardware, software, or a combination of both.

4. The truth values of traditional set theory are ____________ and those of a fuzzy set are __________

A. Either 0 or 1, between 0 & 1

B. Between 0 & 1, either 0 or 1

C. Between 0 & 1, between 0 & 1

D. Either 0 or 1, either 0 or 1

View Answer

Ans : A

Explanation: Refer to the definitions of Fuzzy set and Crisp set.

5. How many main parts are there in Fuzzy Logic Systems Architecture?

A. 3

B. 4

C. 5

D. 6

View Answer

Ans : B

Explanation: It has four main parts.

6. Each element of X is mapped to a value between 0 and 1. It is called _____.

A. membership value

B. degree of membership

C. membership value

D. Both A and B

View Answer

Ans : D


Explanation: each element of X is mapped to a value between 0 and 1. It is called

membership value or degree of membership.

7. How many levels of fuzzifier are there?

A. 4

B. 5

C. 6

D. 7

View Answer

Ans : B

Explanation: There are 5 levels of fuzzifier.

8. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the

following.

A. AND

B. OR

C. NOT

D. All of the above

View Answer

Ans : D

Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually

defined as the minimum, maximum, and complement.
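As an illustration of the explanation above, here is a minimal sketch of the three operators using the common minimum, maximum, and complement definitions. The membership degrees and function names are made up for the example.

```python
# Standard (Zadeh) fuzzy operators on membership degrees in [0, 1].

def fuzzy_and(a, b):
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)

def fuzzy_or(a, b):
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)

def fuzzy_not(a):
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a

# Hypothetical degrees: the room is "hot" to degree 0.75 and "humid" to degree 0.5.
hot, humid = 0.75, 0.5

print(fuzzy_and(hot, humid))  # degree of "hot AND humid"
print(fuzzy_or(hot, humid))   # degree of "hot OR humid"
print(fuzzy_not(hot))         # degree of "NOT hot"
```

With crisp inputs (0 or 1) these functions reduce to the ordinary Boolean operators, which is why fuzzy logic is described as an extension of crisp logic.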

9. The room temperature is hot. Here, hot (a linguistic variable) can be

represented by _______

A. Fuzzy Set

B. Crisp Set

C. Both A and B

D. None of the Above

View Answer

Ans : A

Explanation: Fuzzy logic deals with linguistic variables.

10. Which action completes the rule IF (temperature=Warm) AND (target=Warm) THEN ___?

A. Heat

B. No_Change

C. Cool

D. None of the Above

View Answer

Ans : B

Explanation: IF (temperature=Warm) AND (target=Warm) THEN No_change


11. What is the form of Fuzzy logic?

a) Two-valued logic

b) Crisp set logic

c) Many-valued logic

d) Binary set logic

View Answer Answer: c

Explanation: With fuzzy logic, set membership is defined by a certain value. Hence it could

have many values to be in the set.

12. Traditional set theory is also known as Crisp Set theory.

a) True

b) False

View Answer Answer: a

Explanation: In traditional set theory, set membership is fixed or exact: either the member is in

the set or not. There are only two crisp values, true or false. In the case of fuzzy logic there are

many values. With weight, say x, the member is in the set.

13. The truth values of traditional set theory are ____________ and those of a fuzzy set are

__________

a) Either 0 or 1, between 0 & 1

b) Between 0 & 1, either 0 or 1

c) Between 0 & 1, between 0 & 1

d) Either 0 or 1, either 0 or 1

View Answer Answer: a

Explanation: Refer to the definitions of Fuzzy set and Crisp set.

14. Fuzzy logic is an extension of crisp set theory that handles the concept of

Partial Truth.

a) True

b) False

View Answer Answer: a

Explanation: None.

15. The room temperature is hot. Here, hot (a linguistic variable) can be

represented by _______


a) Fuzzy Set

b) Crisp Set

c) Fuzzy & Crisp Set

d) None of the mentioned

View Answer Answer: a

Explanation: Fuzzy logic deals with linguistic variables.

16. The values of the set membership are represented by ___________

a) Discrete Set

b) Degree of truth

c) Probabilities

d) Both Degree of truth & Probabilities

View Answer Answer: b

Explanation: Both probabilities and degrees of truth range between 0 and 1.

17. The Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai.

a) True

b) False

View Answer Answer: a

Explanation: None.

18. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the

following.

a) AND

b) OR

c) NOT

d) All of the mentioned

View Answer Answer: d

Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually

defined as the minimum, maximum, and complement.

19. There are also other operators, more linguistic in nature, called __________ that can be

applied to fuzzy set theory.

a) Hedges

b) Lingual Variable

c) Fuzz Variable

d) None of the mentioned

View Answer Answer: a

Explanation: None.


20. Fuzzy logic is usually represented as ___________

a) IF-THEN-ELSE rules

b) IF-THEN rules

c) Both IF-THEN-ELSE rules & IF-THEN rules

d) None of the mentioned

View Answer Answer: b

Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying

this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic

usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative

matrices.

Rules are usually expressed in the form:

IF variable IS property THEN action
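The rule form above can be sketched in code using the thermostat rule from question 10. The triangular membership function and its breakpoints (15, 22.5, 30 degrees) are assumptions chosen purely for illustration; the source does not define them.

```python
# Evaluating one fuzzy IF-THEN rule:
# IF (temperature = Warm) AND (target = Warm) THEN No_Change

def triangular(x, left, peak, right):
    """Triangular membership function rising from left to peak, falling to right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def warm(t):
    """Degree to which temperature t (in Celsius) is Warm; breakpoints are hypothetical."""
    return triangular(t, 15.0, 22.5, 30.0)

def no_change_strength(temperature, target):
    """Firing strength of the rule; AND is taken as the minimum."""
    return min(warm(temperature), warm(target))

print(no_change_strength(22.5, 22.5))   # both fully Warm
print(no_change_strength(18.75, 22.5))  # temperature only partly Warm
print(no_change_strength(10.0, 22.5))   # temperature not Warm at all
```

A full controller would evaluate several such rules and combine their firing strengths before defuzzifying; this sketch stops at a single rule.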


21. Like relational databases, fuzzy relational databases also exist.

a) True

b) False

View Answer Answer: a

Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational

databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's

dissertation.

22. ______________ is/are the way/s to represent uncertainty.

a) Fuzzy Logic

b) Probability

c) Entropy

d) All of the mentioned

View Answer Answer: d

Explanation: Entropy is the amount of uncertainty involved in data. It is represented by H(data).
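To make H(data) concrete, the sketch below computes Shannon entropy in bits from the empirical label frequencies. It assumes the explanation refers to Shannon entropy; the sample labels are invented for the example.

```python
import math
from collections import Counter

def entropy(data):
    """Shannon entropy (in bits) of the empirical distribution of the items in data."""
    counts = Counter(data)
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A 50/50 split is maximally uncertain for two outcomes; a pure sample has no uncertainty.
print(entropy(["yes", "no", "yes", "no"]))
print(entropy(["yes", "yes", "yes", "yes"]))
```

The first call returns one bit of uncertainty and the second returns zero, matching the intuition that entropy measures how unpredictable the data is.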

23. ____________ are algorithms that learn from their more complex environments (hence

eco) to generalize, approximate and simplify solution logic.

a) Fuzzy Relational DB

b) Ecorithms

c) Fuzzy Set

d) None of the mentioned


View Answer Answer: b

Explanation: Ecorithms, a term introduced by Leslie Valiant, are algorithms that learn from

interaction with their environment; the other options do not describe learning algorithms.

24. A membership function defines the fuzziness in a fuzzy set irrespective of whether the elements

in the set are discrete or continuous.

a) True

b) False

Answer: A

25. The membership functions are generally represented in

a) Tabular form

b) Graphical form

c) Mathematical form

d) Logical form

Ans: B

26. A membership function can be thought of as a technique to solve empirical problems

on the basis of

a) knowledge

b) example

c) learning

d) experience

Ans: D

27. Three main basic features involved in characterizing a membership function are

a) Intuition, Inference, Rank Ordering

b) Fuzzy Algorithm, Neural Network, Genetic Algorithm

c) Core, Support, Boundary

d) Weighted Average, Center of Sums, Median

Ans : C

28. A fuzzy set whose membership function has at least one element x in the universe

whose membership value is unity is called

a) sub normal fuzzy set

b) normal fuzzy set

c) convex fuzzy set

d) concave fuzzy set

Ans: B


29. In a Fuzzy set a prototypical element has a value

a) 1

b) 0

c) infinite

d) not defined

Ans: A

30. A fuzzy set wherein no membership function has its value equal to 1 is called

a) Normal fuzzy set

b) Sub normal fuzzy set

c) convex fuzzy set

d) non convex fuzzy set

Ans: B


31. A fuzzy set has a membership function whose membership values are strictly

monotonically increasing, or strictly monotonically decreasing, or strictly monotonically

increasing then strictly monotonically decreasing with increasing values for elements in

the universe

a) Convex fuzzy set

b) Concave fuzzy set

c) Non Concave fuzzy set

d) Non Convex fuzzy set

Ans : A

32. The membership values of the membership function are not strictly monotonically

increasing or decreasing or strictly monotonically increasing then decreasing.

a) Convex fuzzy set

b) non convex fuzzy set

c) normal fuzzy set

d) sub normal fuzzy set

Ans : B
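Questions 27 to 32 can be illustrated on a small discrete fuzzy set. The sketch below assumes a fuzzy set is stored as a dictionary from element to membership degree, and it tests convexity with the usual min-based condition rather than the strict monotonic wording of question 31. All names and sample values are hypothetical.

```python
# Core, support, boundary, normality and convexity of a discrete fuzzy set.

def core(fs):
    """Elements with full membership (degree 1)."""
    return {x for x, mu in fs.items() if mu == 1.0}

def support(fs):
    """Elements with non-zero membership."""
    return {x for x, mu in fs.items() if mu > 0.0}

def boundary(fs):
    """Elements with partial membership, strictly between 0 and 1."""
    return {x for x, mu in fs.items() if 0.0 < mu < 1.0}

def is_normal(fs):
    """A normal fuzzy set has at least one element with membership 1."""
    return len(core(fs)) > 0

def is_convex(fs):
    """Convex if mu(y) >= min(mu(x), mu(z)) for all x <= y <= z in the universe."""
    mus = [fs[x] for x in sorted(fs)]
    n = len(mus)
    return all(
        mus[j] >= min(mus[i], mus[k])
        for i in range(n) for j in range(i, n) for k in range(j, n)
    )

warm = {10: 0.0, 15: 0.5, 20: 1.0, 25: 0.5, 30: 0.0}    # normal and convex
bumpy = {10: 0.2, 15: 0.9, 20: 0.3, 25: 0.8, 30: 0.1}   # subnormal and non-convex

print(core(warm), support(warm), boundary(warm))
print(is_normal(warm), is_convex(warm))
print(is_normal(bumpy), is_convex(bumpy))
```

The set named warm rises to a single peak of 1 and falls again, so it is both normal and convex, while bumpy never reaches 1 and has two peaks, so it is subnormal and non-convex.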

33. Activation models are?

a) dynamic

b) static


c) deterministic

d) none of the mentioned

Answer: c

Explanation: Input/output patterns & the activation values may be considered as sample

functions of random process.

34. If xb(t) represents differentiation of state x(t), then a stochastic model can be

represented by?

a) xb(t)=deterministic model

b) xb(t)=deterministic model + noise component

c) xb(t)=deterministic model*noise component

d) none of the mentioned

Answer: b

Explanation: Noise is assumed to be additive in nature in stochastic models.

35. What is equilibrium in neural systems?

a) deviation in present state, when small perturbations occur

b) settlement of network, when small perturbations occur

c) change in state, when small perturbations occur

d) none of the mentioned

Answer: b

Explanation: Follows from the basic definition of equilibrium.

36. What is the condition in Stochastic models, if xb(t) represents differentiation of state

x(t)?

a) xb(t)=0

b) xb(t)=1

c) xb(t)=n(t), where n is noise component

d) xb(t)=n(t)+1

Answer: c

Explanation: xb(t)=0 is the condition for deterministic models, so option c is the logical choice.

37. What is asynchronous update in a network?

a) update to all units is done at the same time

b) change in state of any one unit drives the whole network

c) change in state of any number of units drives the whole network

d) none of the mentioned

Answer: b

Explanation: In asynchronous update, change in state of any one unit drives the whole

network.


38. Learning is a?

a) slow process

b) fast process

c) can be slow or fast in general

d) can’t say

Answer: a

Explanation: Learning is a slow process.

39. What are the requirements of learning laws?

a) convergence of weights

b) learning time should be as small as possible

c) learning should use only local weights

d) all of the mentioned

Answer: d

Explanation: These are some of the basic requirements of learning laws.

40. Memory decay affects what kind of memory?

a) short term memory in general

b) older memory in general

c) can be short term or older

d) none of the mentioned

Answer: a

Explanation: Memory decay affects short term memory rather than older memories.


41. What are the requirements of learning laws?

a) learning should be able to capture more & more patterns

b) learning should be able to grasp complex nonlinear mappings

c) convergence of weights

d) all of the mentioned

Answer: d

Explanation: These are some of the basic requirements of learning laws.

42. How is pattern information distributed?

a) it is distributed all across the weights

b) it is distributed in localised weights

c) it is distributed in certain proctive weights only


d) none of the mentioned

Answer: a

Explanation: Pattern information is highly distributed all across the weights.

lOMoARcPSD|7609677

Final ML - Practice it

Machine learning (Lovely Professional University)

StuDocu is not sponsored or endorsed by any college or university


Downloaded by Saksham Sharma (sakshamsharma0308@gmail.com)

1 The most common issue when using ML is

Lack of skilled resources


Choice of appropriate algorithm
Poor quality of data
Inadequate infrastructure

2 An active learner
Both a and b
interacts with the environment at training time by posing queries
None of these
observes the information provided by the environment

3 Which of the following decision metrics is used in the CART algorithm?


Gini Index
Information Gain
Gain Ratio
None of these

4 The incorporation of prior knowledge that biases the learning mechanism is known as
None of the above
Learning by memorization
Inductive Bias
Generalization

5 Which of the following sentences are true?


The best pruned tree is not the one that minimizes the number of encoding
In pre-pruning a tree is ‘pruned’ by halting its construction early

None of these
A pruning set of class labeled tuples is used to estimate cost

6 According to inductive bias in decision tree learning, which of the statements is correct?
Avoid Overfitting
Shorter trees are preferred.

Avoid underfitting

Longer trees are preferred


7 Which of the following features is used to identify a well-posed learning problem?


None of these
Training Experience

Performance Measure

Class of task

8 Design of learning system consists of


Choice of Training Experience
All of these

Performance Measure
Choice of function approximation algorithm

9 Which of the following algorithms can handle continuous data for decision trees?
CART
ID3
C4.5
None of these

10 Empirical Risk Minimization with inductive bias method


avoids the overfitting problem
increases training error
increases testing error
avoids the underfitting problem

11 In decision tree learning, each branch corresponds to


an attribute
attribute value
Classification value
Regression Value

12 Choose the correct statements about C4.5


It deals with continuous data and missing data
Root node is one with maximum information gain.
Gini Index is used to find root node
Root node is one with maximum Gain ratio.

13 Choose the correct statements for avoiding overfitting in decision tree?


Pre-pruning
Post pruning
Optimistic pruning


Pessimistic pruning

14 A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E. This describes a
Supervised learning problem
Un Supervised learning problem
None of these
Well posed learning problem

15 Which of the following is a disadvantage of decision trees?


None of the above
Decision trees are prone to overfitting
Factor analysis
Decision trees are robust to outliers

16 The field of study that gives computers the capability to learn without being
explicitly programmed
Artificial Intelligence
Deep Learning
Machine Learning
None of these

17 In which approach, multiple classifiers are trained using bootstrap samples?


Decision Tree
Bagging
Boosting
Stacking

18 Features that need to be identified for a well-posed learning problem:


Performance measure
Class of tasks
None of these
Training experience

19 Consider a dataset with 6 instances in Outlook = Sunny. Out of the 6 instances, 3
instances belong to the Yes decision and 3 belong to No. Compute the Gini index for
Outlook = Sunny.
1
0.48
0
0.5
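The Gini index in question 19 is one minus the sum of squared class proportions. A minimal sketch follows; the function name is hypothetical.

```python
def gini_index(class_counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# Outlook = Sunny: 3 instances of Yes and 3 instances of No.
print(gini_index([3, 3]))

# A pure node (all instances in one class) has zero impurity.
print(gini_index([6, 0]))
```

For the 3/3 split the result is 0.5, the maximum impurity possible with two classes.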


20 PAC stands for


Partition Approximately Correct
Probability Approximately Correct
Probability Applied Correctly
None of these

21 According to the brute-force MAP learning algorithm, which of the statements is correct?
None of these
The probability of data D given hypothesis h is 1 if D is inconsistent and 0 otherwise.
The probability of data D given hypothesis h is 1 if D is consistent and 0 otherwise.
The probability of data D given hypothesis h is 0 if D is consistent and 1 otherwise.

22 Choose the correct statement:


E[error(Gibbs)]<2*E[error(bayesoptimal)]
E[error(Gibbs)]<=2* E[error(bayesoptimal)]
E[error(Gibbs)]>2*E[error(bayesoptimal)]
E[error(Gibbs)]=2 * E[error(bayesoptimal)]

23 Which algorithm is used to deal with missing data?


Maximum a posteriori hypothesis
Bayes optimal classifier
EM algorithm
Gibbs algorithm

24 Which of the following is least used in ranking loss?


0-1 ranking loss
Kendall tau loss
Normalized Discounted Cumulative loss
None of the mentioned

25 Which of the following is invalid according to the all-pairs algorithm if
class={red, green, blue, orange}, where red=1, green=2, blue=3, orange=4
red vs green
green vs orange
blue vs red
red vs orange

26 18% of the emails Mona receives are spam. The spam
filter is 93% reliable, i.e., 93% of the mails it marks as spam are actually spam
and 93% of spam mails are correctly labelled as spam. If a mail is marked as spam by
her spam filter, determine the probability that it is really spam.
84
50
39
63


27 Consider the following ranking: Target: 1, 2, 3, 4, 5, 6 Obtained: 1, 2, 4, 3, 5, 6.
How many concordant and discordant pairs are available?
15, 1
14,2
15,0
14,1
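Counting concordant and discordant pairs, as in question 27, can be sketched as follows. The code assumes both rankings contain the same items without ties, and it also computes the Kendall tau coefficient as (concordant - discordant) / (total pairs). The function names are hypothetical.

```python
from itertools import combinations

def concordant_discordant(target, obtained):
    """Count item pairs ordered the same way (concordant) or the opposite way (discordant)."""
    position = {item: rank for rank, item in enumerate(obtained)}
    concordant = discordant = 0
    for a, b in combinations(target, 2):  # a comes before b in the target ranking
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant

def kendall_tau(target, obtained):
    """Kendall tau for rankings without ties."""
    c, d = concordant_discordant(target, obtained)
    return (c - d) / (c + d)

pairs_q27 = concordant_discordant([1, 2, 3, 4, 5, 6], [1, 2, 4, 3, 5, 6])
tau_identical = kendall_tau(list("ABCDE"), list("ABCDE"))
print(pairs_q27, tau_identical)
```

Six items give 15 pairs in total; only the pair (3, 4) is swapped, so there are 14 concordant pairs and 1 discordant pair. Two identical rankings give a tau of 1.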

28 Let us consider four classes={red, green, blue, yellow} where red is considered
as 2, green as 1, blue as 3 and yellow as 4. Consider the following outputs:
h12=+1, h13=-1, h14=-1, h23=+1, h24=-1, h34=-1. Based upon the above data, which
class will be predicted in all pairs?
red
green
blue
yellow
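The all-pairs (one-vs-one) vote for question 28 can be tallied with a short sketch. It assumes the common convention that h_ij = +1 is a vote for class i and h_ij = -1 is a vote for class j; the question does not state the convention, so treat this as an assumption.

```python
from collections import Counter

def all_pairs_predict(outputs, labels):
    """Majority vote over pairwise classifiers; outputs maps (i, j) to +1 or -1."""
    votes = Counter()
    for (i, j), h in outputs.items():
        votes[i if h > 0 else j] += 1
    winner = max(votes, key=votes.get)
    return labels[winner], dict(votes)

# Class numbering from the question: green=1, red=2, blue=3, yellow=4.
labels = {1: "green", 2: "red", 3: "blue", 4: "yellow"}
outputs = {(1, 2): +1, (1, 3): -1, (1, 4): -1, (2, 3): +1, (2, 4): -1, (3, 4): -1}

predicted, votes = all_pairs_predict(outputs, labels)
print(predicted, votes)
```

Under this convention class 4 collects three votes and every other class collects one, so the predicted class is yellow.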

29

94, 113, 92
110, 141, 100
119, 133, 118

none of the mentioned

30 Suppose that for some document xyz, the term frequency of word j is 50, the
document frequency is 2000, and the total number of documents is 10. Then what will
be the TF-IDF?
10,000
0.025


-115
-382
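The calculation in question 30 can be reproduced with a short sketch. It assumes the plain formulation TF-IDF = tf * log10(N / df); other variants (natural logarithm, smoothing) give different numbers. Because the stated document frequency exceeds the total number of documents, the logarithm is negative here.

```python
import math

def tf_idf(term_frequency, document_frequency, total_documents):
    """TF-IDF with a base-10 logarithm and no smoothing (one common variant)."""
    idf = math.log10(total_documents / document_frequency)
    return term_frequency * idf

score = tf_idf(term_frequency=50, document_frequency=2000, total_documents=10)
print(round(score))
```

With these inputs the idf is log10(0.005), roughly -2.3, and the product rounds to -115.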

31 Consider the following data, D: {10, 12, 12, 14, 14} what will be jackknife bias of
the mode?
12
0
13
14

32 Consider the following confusion matrix. What is the precision of the model?

0.94

0.75
0.4
0.57

33 Consider the following data which shows 5 hypotheses for robot movement. For
each hypothesis, the probability given the training data (D) is given, as well as the probability of
F, L and R under each hypothesis (hi), where F stands for Forward, L
stands for Left and R stands for Right. Using the Bayes optimal classifier, find the
direction of movement of the robot.
2/2

Front
Left
All of the above


Right

34 Consider the two rankings: R1 = {A, B, C, D, E}, R2 = {A, B, C, D, E}. What will be the
tau coefficient?
0.75
1
0.5
0

35 Consider the following data, D: {1,3,3,5,7} , h=3 using the parzen window
estimation, what will be the probability at X=4.
3/5
1/15
1/5
3/50

36 If data is three dimensional and h=4, what will be the volume of region?
4
12
81
64
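Questions 35 and 36 both rest on the Parzen window estimate p(x) = k / (n * V), where V = h^d is the volume of a hypercube of side h. The sketch below assumes the standard hypercube kernel, which counts a sample when |x - x_i| / h <= 1/2, and handles the one-dimensional case; the function names are hypothetical.

```python
def window_volume(h, dimensions):
    """Volume of a hypercube window with side length h."""
    return h ** dimensions

def parzen_density(data, x, h):
    """One-dimensional Parzen estimate with a hypercube (rectangular) kernel."""
    n = len(data)
    k = sum(1 for xi in data if abs(x - xi) / h <= 0.5)  # samples inside the window
    return k / (n * window_volume(h, 1))

density_q35 = parzen_density([1, 3, 3, 5, 7], x=4, h=3)
volume_q36 = window_volume(4, 3)
print(density_q35, volume_q36)
```

For question 35 the window around X=4 covers the samples 3, 3 and 5, so k = 3 and the estimate is 3 / (5 * 3) = 1/5. For question 36 the volume is 4^3 = 64.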

37 If the value of k is very high in the KNN algorithm, the model is


Overfitting
Underfitting
None of these
Perfect fit

38 Sample complexity of non-uniform learnability depends upon:


Accuracy score, confidence score
Accuracy score, confidence score, hypothesis class
Accuracy score, confidence score, hypothesis class and distribution of data
Accuracy score, confidence score, distribution of data

39 Sample complexity of consistency learnability depends upon:


Accuracy score, confidence score, hypothesis class and distribution of data
Accuracy score, confidence score
Accuracy score, confidence score, distribution of data
Accuracy score, confidence score, hypothesis class

40 Choose the correct statement:


In Fbeta score, beta times more importance is given to recall.
In Fbeta score, beta^2 times more importance is given to precision.


In Fbeta score, beta times more importance is given to precision


In Fbeta score, beta^2 times more importance is given to recall.
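For reference when weighing the options above, the F-beta score is usually defined as (1 + beta^2) * P * R / (beta^2 * P + R). A minimal sketch with made-up precision and recall values:

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical model with low precision and high recall.
p, r = 0.5, 1.0
print(f_beta(p, r, beta=1.0))  # F1: plain harmonic mean
print(f_beta(p, r, beta=2.0))  # beta > 1 pulls the score toward recall
print(f_beta(p, r, beta=0.5))  # beta < 1 pulls the score toward precision
```

Raising beta moves the score toward the recall value and lowering it moves the score toward the precision value, which is the behaviour the question is probing.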

41 Natarajan dimension is the generalization of


Consistency Learnability
Rademacher complexity
VC-dimension
Non-uniform learnability

42 VC dimension is used for


Infinite hypothesis and multiclass classification problem
Finite hypothesis and multiclass classification problem
infinite hypothesis and binary classification problem.
Finite hypothesis and binary classification problem

43 Choose the correct statement:


In Jackknife method, leave one out method is used.
If dataset is small, jackknife method increases computational complexity
If dataset is small, bootstrap method increases computational complexity.
In bootstrap method to estimate bias and variance, Leave one out method is used.

44 According to no free lunch theorem:


One classifier can be preferred over another without prior knowledge
All classifiers do not perform equally if performance is averaged over all objective functions
All classifiers perform equally if performance is averaged over all objective functions.
One feature can be preferred over another without prior knowledge

45 What is used to measure the uniform convergence?


Natarajan dimension
VC-dimension
Rademacher complexity
All of these

46 In the density estimation formula, if the value of the volume is kept constant, then which
of the techniques is used?
One vs All
KNN
Parzen window
One vs One

47 A training set is called epsilon-representative if


For every h, Ls(h)-Ld(h)>=epsilon
For every h, Ls(h)-Ld(h)<=epsilon
For every h, |Ls(h)-Ld(h)|>=epsilon
For every h, |Ls(h)-Ld(h)|<=epsilon


48 Choose the correct statement:


In boosting samples are not taken with replacement
In bagging, samples are taken without replacement
In boosting samples are taken without replacement
In bagging samples are not taken without replacement

49 According to ugly duckling theorem:


One feature cannot be preferred over another without prior knowledge
All classifiers perform equally if performance is averaged over all objective functions.
All classifiers do not perform equally if performance is averaged over all objective functions
One classifier can be preferred over another without prior knowledge

50 In the structural risk minimization, prior knowledge is added to model by


All of these.
Applying feature extraction
Adding appropriate weights
Selecting relevant features

51 Choose the correct statement:


To improve the non-uniform algorithm, SRM is used
If non-uniform algorithm fails, we can predict whether it is due to approximation error or
estimation error
If non-uniform algorithm fails, we can’t predict whether it is due to approximation error or
estimation error
If PAC model fails it is due to approximation error

52 Choose the correct statement:


As the hypothesis class decreases, approximation error decreases and estimation error
increases.
As the hypothesis class decreases, approximation error increases and estimation error
decreases.
As the hypothesis class increases, approximation error decreases and estimation error
increases.
As the hypothesis class increases, approximation error increases and estimation error
decreases.

53 Axis-aligned rectangles have a VC dimension of


3
4
1
2


54 Choose the correct statement with respect to All pairs algorithm.


Number of instances in each training set is less than the number of instances in original
training set.
Number of instances in each training set is equal to number of instances in original training
set.
Number of instances in each training set is greater than number of instances in original
training set.
None of these

55 If the value of k is very small in the KNN algorithm, the model is


Overfitting
Perfect fit
Underfitting
None of these

1. What is Machine Learning (ML)?


A. The autonomous acquisition of knowledge through the use of manual
programs
B. The selective acquisition of knowledge through the use of computer
programs
C. The selective acquisition of knowledge through the use of manual
programs
D. The autonomous acquisition of knowledge through the use of computer
programs
Correct option is D

2. Father of Machine Learning (ML)


A. Geoffrey Chaucer
B. Geoffrey Hill
C. Geoffrey Everest Hinton
D. None of the above
Correct option is C

3. Which is FALSE regarding regression?


A. It may be used for interpretation
B. It is used for prediction
C. It discovers causal relationships
D. It relates inputs to outputs
Correct option is C

4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software


B. AI is a software that can emulate the human mind


C. ML is an alternate way of programming intelligent machines
D. All of the above
Correct option is D

5. Which of the following is not a factor affecting the performance of a learner system?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A

6. In general, to have a well-defined learning problem, we must identify which of the
following
A. The class of tasks
B. The measure of performance to be improved
C. The source of experience
D. All of the above
Correct option is D

7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E

8. Which of the following is not a learning method?


A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B

9. In language understanding, which of the following is not a level of knowledge?


A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A

10. Designing a machine learning approach involves:-


A. Choosing the type of training experience
B. Choosing the target function to be learned
C. Choosing a representation for the target function


D. Choosing a function approximation algorithm


E. All of the above
Correct option is E

11. Concept learning infers a ______-valued function from training examples of
its input and output.
A. Decimal
B. Hexadecimal
C. Boolean
D. All of the above
Correct option is C

12. Which of the following is not a supervised learning?


A. Naive Bayesian
B. PCA
C. Linear Regression
D. Decision Tree
Correct option is B

13. What is Machine Learning?


(i) Artificial Intelligence
(ii) Deep Learning
(iii) Data Statistics
A. Only (i)
B. (i) and (ii)
C. All
D. None
Correct option is B

14. What kind of learning algorithm is used for “Facial identities or facial expressions”?
A. Prediction
B. Recognizing Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B

15. Which of the following is not type of learning?


A. Unsupervised Learning
B. Supervised Learning
C. Semi-unsupervised Learning
D. Reinforcement Learning
Correct option is C

16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following
A. Supervised Learning: Classification
B. Reinforcement Learning


C. Unsupervised Learning: Clustering


D. Unsupervised Learning: Regression
Correct option is B

17. Targeted marketing, Recommender Systems, and Customer Segmentation are
applications in which of the following
A. Supervised Learning: Classification
B. Unsupervised Learning: Clustering
C. Unsupervised Learning: Regression
D. Reinforcement Learning
Correct option is B

18. Fraud Detection, Image Classification, Diagnostic, and Customer Retention are
applications in which of the following
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
C. Unsupervised Learning: Clustering
D. Reinforcement Learning
Correct option is B

19. Which of the following is not a symbolic function representation in
Machine Learning?
A. Rules in propositional logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B

20. Which of the following is not a numerical function representation in
Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C

21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only ________ examples.
A. Negative
B. Positive
C. Negative or Positive
D. None of the above
Correct option is B

22. The FIND-S algorithm ignores ________ examples.


A. Negative
B. Positive


C. Both
D. None of the above
Correct option is A
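
The behaviour described in questions 21 and 22 can be sketched in a few lines of Python. This is a minimal illustration, assuming attribute-value examples and a conjunctive hypothesis in which "?" matches any value. The function name `find_s` and the weather-style sample data (modelled on the familiar EnjoySport textbook example) are assumptions made for illustration, not part of the question set.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis that covers
    every positive example. Each example is (attributes, is_positive)."""
    hypothesis = None  # None stands for the most specific hypothesis (covers nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            # The first positive example is adopted as-is.
            hypothesis = list(attributes)
        else:
            # Generalize only where the hypothesis disagrees with the example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


# Hypothetical training data: (sky, air, humidity, wind, water, forecast), label
training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

learned = find_s(training_data)
print(learned)
```

Because the negative example is skipped, it has no effect on the result. This is also why FIND-S relies on the training set being consistent (question 26).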

23. The Candidate-Elimination Algorithm represents the ________.


A. Solution Space
B. Version Space
C. Elimination Space
D. All of the above
Correct option is B

24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally ________.
A. True
B. False
Correct option is A

25. Inductive learning takes examples and generalizes rather than starting
with ________ knowledge.
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B

26. A drawback of FIND-S is that it assumes consistency within the training
set.
A. True
B. False
Correct option is A

27. What strategies can help reduce overfitting in decision trees?


(i) Enforce a maximum depth for the tree
(ii) Enforce a minimum number of samples in leaf nodes
(iii) Pruning
(iv) Make sure each leaf node is one pure class
A. All
B. (i), (ii) and (iii)
C. (i), (iii), (iv)
D. None
Correct option is B

28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression


D. Classification
Correct option is B

29. To find the minimum or the maximum of a function, we set the gradient to zero
because:
A. Depends on the type of problem
B. The value of the gradient at extrema of a function is always zero
C. Both (A) and (B)
D. None of these
Correct option is B

30. Which of the following is a disadvantage of decision trees?


A. Decision trees are prone to overfitting
B. Decision trees are robust to outliers
C. Factor analysis
D. None of the above
Correct option is A

31. What is a perceptron?


A. A single layer feed-forward neural network with pre-processing
B. A neural network that contains feedback
C. A double layer auto-associative neural network
D. An auto-associative neural network
Correct option is A

32. Which of the following is true for neural networks?


(i) The training time depends on the size of the network
(ii) Neural networks can be simulated on a conventional computer
(iii) Artificial neurons are identical in operation to biological ones
A. All
B. Only (ii)
C. (i) and (ii)
D. None
Correct option is C

33. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high "computational" rates
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D

34. What is Neuro software?


A. It is software used by Neurosurgeon


B. Designed to aid experts in real world
C. It is powerful and easy neural network
D. A software used to analyze neurons
Correct option is C

35. Which is true for neural networks?


A. Each node computes its weighted input
B. Node could be in excited state or non-excited state
C. It has set of nodes and connections
D. All of the above
Correct option is D

36. What is the objective of backpropagation algorithm?


A. To develop learning algorithm for multilayer feedforward neural network, so that
network can be trained to capture the mapping implicitly
B. To develop learning algorithm for multilayer feedforward neural network
C. To develop learning algorithm for single layer feedforward neural network
D. All of the above
Correct option is A

37. Which of the following is true?


Single layer associative neural networks do not have the ability to:

(i) Perform pattern recognition
(ii) Find the parity of a picture
(iii) Determine whether two or more shapes in a picture are connected or not
A. (ii) and (iii)
B. Only (ii)
C. All
D. None
Correct option is A

38. The backpropagation law is also known as generalized delta rule


A. True
B. False
Correct option is A

38. Which of the following is true?


(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
A. All
B. (ii) and (iii)
C. (i), (ii) and (iii)
D. None


Correct option is A

39. What is true regarding backpropagation rule?


A. Error in output is propagated backwards only to determine weight updates
B. There is no feedback of signal at any stage
C. It is also called generalized delta rule
D. All of the above
Correct option is D

40. There is feedback in final stage of backpropagation


A. True
B. False
Correct option is B

41. An auto-associative network is


A. A neural network that has only one loop
B. A neural network that contains feedback
C. A single layer feed-forward neural network with pre-processing
D. A neural network that contains no loops
Correct option is B

42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
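
The arithmetic behind question 42 can be checked with a short sketch. It assumes the usual reading of a "linear transfer function with constant of proportionality k": the output is k times the weighted sum of the inputs. The function name `linear_neuron` is chosen here for illustration.

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function: k * sum(w_i * x_i)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


# Values from the question: weights 1, 4, 3; inputs 4, 8, 5; constant 3.
output = linear_neuron([1, 4, 3], [4, 8, 5], 3)
print(output)  # weighted sum is 4 + 32 + 15 = 51, so the output is 3 * 51 = 153
```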

43. Which of the following is true regarding the backpropagation rule?

A. Hidden layers' output is not at all important; they are only meant for supporting
input and output layers
B. Actual output is determined by computing the outputs of units for each hidden
layer
C. It is a feedback neural network
D. None of the above
Correct option is B

44. What is back propagation?


A. It is another name given to the curvy function in the perceptron
B. It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
C. It is another name given to the curvy function in the perceptron
D. None of the above
Correct option is B


45. The general limitations of back propagation rule is/are


A. Scaling
B. Slow convergence
C. Local minima problem
D. All of the above
Correct option is D

46. What is the meaning of "generalized" in the statement “backpropagation is a
generalized delta rule”?
A. Because delta is applied to only input and output layers, thus making it more
simple and generalized
B. It has no significance
C. Because delta rule can be extended to hidden layer units
D. None of the above
Correct option is C

47. Neural networks are complex ________ functions with many parameters.

A. Linear
B. Non-linear
C. Discrete
D. Exponential
Correct option is B

48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D

49. Backpropagation learning is based on gradient descent along the error surface.
A. True
B. False
Correct option is A

50. In backpropagation rule, how to stop the learning process?


A. No heuristic criteria exist
B. On basis of average gradient value
C. There is convergence involved
D. None of these
Correct option is B

51. Applications of NN (Neural Network)


A. Risk management
B. Data validation
C. Sales forecasting
D. All of the above


Correct option is D

52. The network that involves backward links from output to the input and hidden
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A

53. Decision Tree is a display of an Algorithm?


A. True
B. False
Correct option is A

54. Which of the following is/are the decision tree nodes?


A. End Nodes
B. Decision Nodes
C. Chance Nodes
D. All of the above
Correct option is D

55. End Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is B

56. Decision Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is D

57. Chance Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is C

58. Advantage of Decision Trees


A. Possible Scenarios can be added
B. Use a white box model, if given result is provided by a model
C. Worst, best and expected values can be determined for different scenarios


D. All of the above


Correct option is D

59. ________ terms are required for building a Bayes model.


A. 1
B. 2
C. 3
D. 4
Correct option is C

60. Which of the following is the consequence between a node and its predecessors
while creating a Bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both conditionally dependent & dependent
D. Dependent
Correct option is A

61. What is needed to make probabilistic systems feasible in the world?


A. Feasibility
B. Reliability
C. Crucial robustness
D. None of the above
Correct option is C

62. Bayes rule can be used for:-


A. Solving queries
B. Increasing complexity
C. Answering probabilistic query
D. Decreasing complexity
Correct option is C
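
As a worked illustration of answering a probabilistic query with Bayes' rule, the sketch below computes P(A | B) = P(B | A) · P(A) / P(B). The three terms it needs are one conditional probability and two unconditional probabilities. The diagnostic-test numbers are hypothetical and chosen only to make the arithmetic concrete.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical query: probability of a condition given a positive test.
prior = 0.01                # P(condition)
sensitivity = 0.9           # P(positive | condition)
false_positive_rate = 0.05  # P(positive | no condition)

# P(positive), obtained by summing over both cases (total probability).
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

posterior = bayes_posterior(sensitivity, prior, evidence)
print(round(posterior, 4))
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is the kind of query the rule is meant to answer.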

63. ________ provides ways and means of weighing up the desirability of goals and the
likelihood of achieving them.
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A

64. Which of the following is provided by the Bayesian network?


A. Complete description of the problem
B. Partial description of the domain
C. Complete description of the domain
D. All of the above
Correct option is C


65. Probability provides a way of summarizing the ________ that comes from our laziness and
ignorance.
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B

66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C

67. A causal chain (for example, smoking causes cancer) gives rise to:
A. Conditional independence
B. Conditional dependence
C. Both
D. None of the above
Correct option is A

68. The bayesian network can be used to answer any query by using:-
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B

69. Bayesian networks allow compact specification of:-


A. Joint probability distributions
B. Belief
C. Propositional logic statements
D. All of the above
Correct option is A

70. The compactness of the bayesian network can be described by


A. Fully structured
B. Locally structured
C. Partially structured
D. All of the above
Correct option is B

71. The Expectation-Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
A. True
B. False


Correct option is B

72. Which of the following is correct about the Naive Bayes?


A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. Both
D. All of the above
Correct option is C

73. Which of the following is false regarding EM Algorithm?


A. The alignment provides an estimate of the base or amino acid composition of
each column in the site
B. The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the
sequences
C. The row-by-column composition of the site already available is used to estimate
the probability
D. None of the above
Correct option is C

74. Naïve Bayes Algorithm is a ________ learning algorithm.


A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
Correct option is A

75. The EM algorithm includes two repeated steps; step 2 is ________.

A. The normalization
B. The maximization step
C. The minimization step
D. None of the above
Correct option is B

76. Examples of Naïve Bayes Algorithm is/are


A. Spam filtration
B. Sentiment analysis
C. Classifying articles
D. All of the above
Correct option is D

77. In the intermediate steps of the “EM Algorithm”, the number of each base in each
column is determined and then converted to fractions.
A. True
B. False
Correct option is A


78. Naïve Bayes algorithm is based on ________ and used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
Correct option is A

79. Types of Naïve Bayes Model:


A. Gaussian
B. Multinomial
C. Bernoulli
D. All of the above
Correct option is D

80. Disadvantages of Naïve Bayes Classifier:


A. Naïve Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features
B. It performs well in multi-class predictions as compared to the other algorithms
C. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets
D. It is the most popular choice for text classification problems.
Correct option is A

81. The benefit of Naïve Bayes:-


A. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets
B. It is the most popular choice for text classification problems.
C. It can be used for binary as well as multi-class classification
D. All of the above
Correct option is D

82. In which of the following types of sampling the information is carried out under
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B

83. Full form of MDL?


A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these
Correct option is A

84. For the analysis of ML algorithms, we need


A. Computational learning theory
B. Statistical learning theory


C. Both A & B
D. None of these
Correct option is C

85. PAC stands for

A. Probably Approximately Correct
B. Probably Approx Correct
C. Probably Approximate Computation
D. Probably Approx Computation
Correct option is A

86. The ________ of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A

87. Statement: True error defined over entire instance space, not just training data
A. True
B. False
Correct option is A

88. What areas does CLT comprise?


A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. All of these
Correct option is D

88. What area of CLT tells “How many examples do we need to find a good hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A

89. What area of CLT tells “How much computational power do we need to find a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B


90. What area of CLT tells “How many mistakes will we make before finding a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C

91. (For questions 91 and 92) Can we say that concepts described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A

92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A

93. The VC dimension of hypothesis space H1 is larger than the VC dimension of
hypothesis space H2. Which of the following can be inferred from this?
A. The number of examples required for learning a hypothesis in H1 is larger than
the number of examples required for H2
B. The number of examples required for learning a hypothesis in H1 is smaller than
the number of examples required for H2
C. No relation to number of samples required for PAC learning.
Correct option is A

94. For a particular learning task, if the required error parameter changes from
0.1 to 0.01, how many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
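
Questions 92 and 94 can be tied together with the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)). The sketch below assumes that bound and |H| = 3^n for conjunctions of n Boolean literals. The function name and the choices n = 10 and δ = 0.05 are illustrative assumptions.

```python
import math


def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis
    space: m >= (1 / epsilon) * (ln|H| + ln(1 / delta))."""
    return (math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon


n = 10                # number of Boolean attributes (illustrative)
h_size = 3 ** n       # each attribute appears positive, negated, or not at all
delta = 0.05          # allowed probability of failure (illustrative)

m_loose = pac_sample_bound(h_size, epsilon=0.1, delta=delta)
m_tight = pac_sample_bound(h_size, epsilon=0.01, delta=delta)

print(math.ceil(m_loose))  # examples sufficient at epsilon = 0.1
print(m_tight / m_loose)   # the bound scales as 1 / epsilon
```

Since ε appears only as the factor 1/ε, tightening the error parameter from 0.1 to 0.01 multiplies the bound by 10.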

95. Computational complexity of classes of learning problems depends on which of
the following?
A. The size or complexity of the hypothesis space considered by learner
B. The accuracy to which the target concept must be approximated
C. The probability that the learner will output a successful hypothesis
D. All of these
Correct option is D

96. The instance-based learner is a


A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A

97. When to consider nearest neighbour algorithms?


A. Instances map to points in R^n
B. Not more than 20 attributes per instance
C. Lots of training data
D. None of these
E. A, B & C
Correct option is E

98. What are the advantages of the nearest neighbour algorithm?


A. Training is very fast
B. Can learn complex target functions
C. Don't lose information
D. All of these
Correct option is D

99. What are the difficulties with the k-nearest neighbour algorithm?


A. Calculate the distance of the test case from all training cases
B. Curse of dimensionality
C. Both A & B
D. None of these
Correct option is C

100. What if the target function is real-valued in the k-NN algorithm?


A. Calculate the mean of the k nearest neighbours
B. Calculate the SD of the k nearest neighbour
C. None of these
Correct option is A

101. What is/are true about Distance-weighted KNN?


A. The weight of the neighbour is considered
B. The distance of the neighbour is considered
C. Both A & B
D. None of these
Correct option is C
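
Questions 100 and 101 describe two ways of predicting a real-valued target with k-NN: the plain mean of the k nearest neighbours, and a distance-weighted mean. The sketch below is one possible implementation, assuming Euclidean distance and weights of 1/d², which is a common choice but not the only one. The function name and sample points are illustrative.

```python
import math


def knn_regress(train, query, k, weighted=False):
    """Predict a real value for `query` from (point, value) pairs in `train`."""
    # Keep the k training points closest to the query.
    neighbours = sorted(train, key=lambda pv: math.dist(pv[0], query))[:k]
    if not weighted:
        return sum(value for _, value in neighbours) / len(neighbours)

    numerator = 0.0
    denominator = 0.0
    for point, value in neighbours:
        d = math.dist(point, query)
        if d == 0:
            return value  # exact match: use its value directly
        w = 1.0 / d ** 2  # closer neighbours get larger weights
        numerator += w * value
        denominator += w
    return numerator / denominator


# Hypothetical one-dimensional training set.
train = [((0.0,), 0.0), ((1.0,), 1.0), ((3.0,), 3.0)]
plain = knn_regress(train, (1.5,), k=3)
weighted = knn_regress(train, (1.5,), k=3, weighted=True)
print(plain, weighted)
```

With weighting, the neighbour at distance 0.5 dominates the two at distance 1.5, so the prediction moves toward its value.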

102. What is/are advantage(s) of Distance-weighted k-NN over k-NN?


A. Robust to noisy training data
B. Quite effective when a sufficient large set of training data is provided
C. Both A & B
D. None of these
Correct option is C


103. What is/are advantage(s) of Locally Weighted Regression?


A. Pointwise approximation of complex target function
B. Earlier data has no influence on the new ones
C. Both A & B
D. None of these
Correct option is C

104. In locally weighted regression (LWR), the quality of the result depends on


A. Choice of the function
B. Choice of the kernel function K
C. Choice of the hypothesis space H
D. All of these
Correct option is D

105. How many types of layers are there in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A (input layer, hidden layer, and output layer)

106. The neurons in the hidden layer contain Gaussian transfer functions
whose outputs are ________ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
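
A small sketch of a Gaussian hidden unit makes the relationship in question 106 concrete. It assumes the common form exp(-r² / (2σ²)), where r is the Euclidean distance from the input to the neuron's centre. The function name and the default width σ = 1 are illustrative assumptions. Strictly speaking the output decays with the squared distance rather than being inversely proportional to it, but the qualitative point is the same: the farther the input, the smaller the activation.

```python
import math


def gaussian_rbf(x, centre, sigma=1.0):
    """Activation of a Gaussian radial basis unit centred at `centre`."""
    r = math.dist(x, centre)  # distance from the neuron's centre
    return math.exp(-(r ** 2) / (2 * sigma ** 2))


# Activations at distances 0, 1 and 2 from a centre at the origin.
activations = [gaussian_rbf((d, 0.0), (0.0, 0.0)) for d in (0.0, 1.0, 2.0)]
print(activations)
```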

107. PNN/GRNN networks have one neuron for each point in the training file,
while RBF networks have a variable number of neurons that is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A

108. Which network is more accurate when the size of the training set is
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A

109. What is/are true about RBF network?


A. A kind of supervised learning


B. Design of NN as curve fitting problem


C. Use of multidimensional surface to interpolate the test data
D. All of these
Correct option is D

110. Application of CBR


A. Design
B. Planning
C. Diagnosis
D. All of these
Correct option is D

111. What is/are advantages of CBR?


A. A local approx. is found for each test case
B. Knowledge is in a form understandable to human
C. Fast to train
D. All of these
Correct option is D

112. In the k-NN algorithm, given a set of training examples and a value of k smaller than the
size of the training set (n), the algorithm predicts the class of a test example to be the

A. Least frequent class among the classes of the k closest training examples
B. Most frequent class among the classes of the k closest training examples
C. Class of the closest training example
D. Most frequent class among the classes of the k farthest training examples
Correct option is B

113. Which of the following statements is true about PCA?


(i) We must standardize the data before applying PCA
(ii) We should select the principal components which explain the highest variance
(iii) We should select the principal components which explain the lowest variance
(iv) We can use PCA for visualizing the data in lower dimensions
A. (i), (ii) and (iv).
B. (ii) and (iv)
C. (iii) and (iv)
D. (i) and (iii)
Correct option is A

114. Genetic algorithm is a


A. Search technique used in computing to find true or approximate solution to
optimization and search problem
B. Sorting technique used in computing to find true or approximate solution to
optimization and sort problem
C. Both A & B
D. None of these
Correct option is A


115. GA techniques are inspired by


A. Evolutionary
B. Cytology
C. Anatomy
D. Ecology
Correct option is A

116. When would the genetic algorithm terminate?


A. Maximum number of generations has been produced
B. Satisfactory fitness level has been reached for the population
C. Both A & B
D. None of these
Correct option is C

117. The algorithm operates by iteratively updating a pool of hypotheses,
called the ________.
A. Population
B. Fitness
C. None of these
Correct option is A

118. What is the correct representation of GA?


A. GA(Fitness, Fitness_threshold, p)
B. GA(Fitness, Fitness_threshold, p, r )
C. GA(Fitness, Fitness_threshold, p, r, m)
D. GA(Fitness, Fitness_threshold)
Correct option is C

119. Genetic operators include


A. Crossover
B. Mutation
C. Both A & B
D. None of these
Correct option is C

120. The operator that produces two new offspring from two parent strings by copying
selected bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
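
The two genetic operators from questions 119 and 120 can be sketched on bit strings as follows. This is a minimal illustration, assuming single-point crossover and a mutation that flips one chosen bit. Real implementations usually pick the crossover point and the mutated position at random; they are passed in explicitly here so the result is reproducible. The function names and parent strings are illustrative.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def flip_bit(bits, index):
    """Point mutation: invert the bit at `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


parent_a = "11101001000"
parent_b = "00001010101"
offspring = single_point_crossover(parent_a, parent_b, point=5)
mutant = flip_bit(parent_a, index=3)
print(offspring, mutant)
```

Each offspring copies its first five bits from one parent and the remaining bits from the other, so no bit values are invented by crossover. Only mutation introduces new ones.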

121. Each schema represents the set of bit strings containing the indicated ________.
A. 0s, 1s
B. only 0s
C. only 1s


D. 0s, 1s, *s
Correct option is D

122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0010, 0010
C. 0100, 0110
D. 0100, 0010
Correct option is A
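
The schema in question 122 can be expanded mechanically. The sketch below assumes the usual convention that `*` is a "don't care" position matching either 0 or 1. The helper names `matches_schema` and `expand_schema` are illustrative.

```python
from itertools import product


def matches_schema(schema, bits):
    """True if `bits` agrees with `schema` at every position that is not '*'."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits)
    )


def expand_schema(schema):
    """List every bit string of the schema's length that the schema matches."""
    candidates = ("".join(c) for c in product("01", repeat=len(schema)))
    return [bits for bits in candidates if matches_schema(schema, bits)]


members = expand_schema("0*10")
print(members)
```

A schema with k don't-care positions matches 2^k strings, so `0*10` matches exactly two.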

123. If correct(h) is the percent of all training examples correctly classified by
hypothesis h, then the fitness function is equal to
A. Fitness(h) = (correct(h))^2
B. Fitness(h) = (correct(h))^3
C. Fitness(h) = (correct(h))
D. Fitness(h) = (correct(h))^4
Correct option is A

124. Statement: In Genetic Programming, individuals in the evolving population
are computer programs rather than bit strings.
A. True
B. False
Correct option is A

125. According to ________ evolution, evolution over many generations was directly
influenced by the experiences of individual organisms during their lifetime.
A. Baldwin
B. Lamarckian
C. Bayes
D. None of these
Correct option is B

126. Search through the hypothesis space cannot be characterized. Why?


A. Hypotheses are created by crossover and mutation operators that allow radical
changes between successive generations
B. Hypotheses are not created by crossover and mutation
C. None of these
Correct option is A

127. ILP stands for


A. Inductive Logical programming
B. Inductive Logic Programming
C. Inductive Logical Program
D. Inductive Logic Program
Correct option is B

128. What is/are the requirement for the Learn-One-Rule method?


A. Input, accepts a set of +ve and -ve training examples.


B. Output, delivers a single rule that covers many +ve examples and few -ve.
C. Output rule has a high accuracy but not necessarily a high coverage
D. A & B
E. A, B & C
Correct option is E

129. A ________ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A

130. Ground literal is a literal that


A. Contains only variables
B. Does not contain any functions
C. Does not contain any variables
D. Contains only functions
Correct option is C

131. ________ emphasizes learning feedback that evaluates the learner's
performance without providing standards of correctness in the form of
behavioural targets.
A. Reinforcement learning
B. Supervised Learning
C. None of these
Correct option is A

132. Features of Reinforcement learning


A. Set of problem rather than set of techniques
B. RL is training by reward and punishment
C. RL is learning from trial and error with the world
D. All of these
Correct option is D

133. Which type of feedback is used by RL?


A. Purely Instructive feedback
B. Purely Evaluative feedback
C. Both A & B
D. None of these
Correct option is B

134. What is/are the problem solving methods for RL?


A. Dynamic programming
B. Monte Carlo Methods
C. Temporal-difference learning
D. All of these


Correct option is D

135. The FIND-S Algorithm


A. Starts from the most specific hypothesis
B. It considers negative examples
C. It considers both negative and positive examples
D. None of these
Correct option is A

136. The hypothesis space has a general-to-specific ordering of hypotheses, and the search can
be efficiently organized by taking advantage of a naturally occurring structure over the
hypothesis space.
A. TRUE
B. FALSE
Correct option is A

137. The Version space is:

A. The subset of all hypotheses consistent with the training examples is called the
version space with respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept
B. The version space consists of only specific hypotheses
C. None of these
Correct option is A

138. The Candidate-Elimination Algorithm


A. The key idea in the Candidate-Elimination algorithm is to output a
description of the set of all hypotheses consistent with the training examples
B. The Candidate-Elimination algorithm computes the description of this set
without explicitly enumerating all of its members
C. This is accomplished by using the more-general-than partial ordering
and maintaining a compact representation of the set of consistent hypotheses
D. All of these
Correct option is D

139. Concept learning is basically acquiring the definition of a general category
from given sample positive and negative training examples of the category.
A. TRUE
B. FALSE
Correct option is A

140. The hypothesis h1 is more-general-than hypothesis h2 (h1 > h2) if and
only if h1 ≥ h2 is true and h2 ≥ h1 is false. We also say h2 is more-specific-than h1.
A. The statement is true
B. The statement is false
C. We cannot say


D. None of these
Correct option is A

141. The List-Then-Eliminate Algorithm


A. The List-Then-Eliminate algorithm initializes the version space to
contain all hypotheses in H, then eliminates any hypothesis found
inconsistent with any training example
B. The List-Then-Eliminate algorithm does not initialize the version space
C. None of these
Correct option is A

142. What will take place as the agent observes its interactions with the world?
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A

143. Which element modifies the performance element so that it makes better
decisions?
A. Performance element
B. Changing element
C. Learning element
D. None of the mentioned
Correct option is C

144. The claim that any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target
function well over other unobserved examples is called the:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these
Correct option is A

145. The feature of an ANN in which it creates its own organization or
representation of the information it receives during learning time is
A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning
Correct option is B

146. How does the decision tree reach its decision?

A. Single test
B. Two tests
C. Sequence of tests


D. No test
Correct option is C

147. Which of the following is a disadvantage of decision trees?


A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
Correct option is C

148. Tree/Rule based classification algorithms generate which rule to perform
the classification?
A. if-then
B. then
C. do
Correct option is A

149. What is Gini Index?


A. It is a type of index structure
B. It is a measure of purity
C. None of the options
Correct option is B
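
In decision-tree learning, the Gini index of a node is commonly computed as 1 − Σ pᵢ², where pᵢ is the fraction of examples in class i. It is 0 for a pure node and grows as the classes become more mixed. The sketch below assumes that definition. The function name and sample labels are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


pure_node = gini_impurity(["yes", "yes", "yes", "yes"])
mixed_node = gini_impurity(["yes", "yes", "no", "no"])
print(pure_node, mixed_node)
```

A split is preferred when it lowers the weighted Gini index of the child nodes relative to the parent.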

150. What is not a RNN in machine learning?


A. One output to many inputs
B. Many inputs to a single output
C. RNNs for nonsequential input
D. Many inputs to many outputs
Correct option is A

151. Which of the following sentences are correct in reference to information
gain?
A. It is biased towards multi-valued attributes
B. ID3 makes use of information gain
C. The approach used by ID3 is greedy
D. All of these
Correct option is D
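
The statements in question 151 can be illustrated with a short sketch of entropy and information gain, the measure ID3 uses to greedily pick the next attribute. The tiny dataset is hypothetical: the second attribute is a unique identifier, which shows why information gain favours attributes with many values. Function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting on one attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute_index]].append(label)
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: (outlook, record id) with a yes/no class label.
rows = [("sunny", "id1"), ("sunny", "id2"), ("rainy", "id3"), ("rainy", "id4")]
labels = ["no", "yes", "no", "yes"]

gain_outlook = information_gain(rows, labels, 0)
gain_record_id = information_gain(rows, labels, 1)
print(gain_outlook, gain_record_id)
```

The identifier splits the data into pure single-example groups and so receives the maximum gain, even though it would be useless for predicting new examples.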

152. A Neural Network can answer


A. For Loop questions
B. what-if questions
C. If-Then-Else analysis questions
D. None of these
Correct option is B

153. An artificial neural network is used for


A. Pattern Recognition
B. Classification
C. Clustering
D. All
Correct option is D

154. Which of the following are the advantage/s of Decision Trees?


A. Possible Scenarios can be added
B. Use a white box model, If given result is provided by a model
C. Worst, best and expected values can be determined for different scenarios
D. All of the mentioned
Correct option is D

155. What is the mathematical likelihood that something will occur?


A. Classification
B. Probability
C. Naïve Bayes Classifier
D. None of the other
Correct option is B

156. What does the Bayesian network provide?

A. Complete description of the domain
B. Partial description of the domain
C. Complete description of the problem
D. None of the mentioned
Correct option is A

157. Where can the Bayes rule be used?


A. Solving queries
B. Increasing complexity
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D

158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B

159. What is needed to make probabilistic systems feasible in the world?


A. Reliability
B. Crucial robustness
C. Feasibility
D. None of the mentioned
Correct option is B


160. It was shown that the Naive Bayesian method


A. Can be much more accurate than the optimal Bayesian method
B. Is always worse off than the optimal Bayesian method
C. Can be almost optimal only when attributes are independent
D. Can be almost optimal when some attributes are dependent
Correct option is C

161. What is the consequence between a node and its predecessors while
creating a Bayesian network?
A. Functionally dependent
B. Dependent
C. Conditionally independent
D. Both conditionally dependent & dependent
Correct option is C

162. How can the compactness of the Bayesian network be described?


A. Locally structured
B. Fully structured
C. Partial structure
D. All of the mentioned
Correct option is A

163. How can the entries in the full joint probability distribution be calculated?
A. Using variables
B. Using information
C. Both Using variables & information
D. None of the mentioned
Correct option is B

164. How can the Bayesian network be used to answer any query?
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the mentioned
Correct option is B

165. Sample Complexity is


A. The sample complexity is the number of training-samples that we
need to supply to the algorithm, so that the function returned by the
algorithm is within an arbitrarily small error of the best possible
function, with probability arbitrarily close to 1
B. How many training examples are needed for learner to converge to a
successful hypothesis.
C. All of these
Correct option is C

166. PAC stands for

A. Probably Approximately Correct
B. Probability Applied Correctly
C. Partition Approximately Correct
Correct option is A

167. Which of the following will be true about k in k-NN in terms of variance?
A. When you increase k, the variance will increase
B. When you decrease k, the variance will increase
C. Can't say
D. None of these
Correct option is B

168. Which of the following options is true about the k-NN algorithm?
A. It can be used for classification
B. It can be used for regression
C. It can be used in both classification and regression
Correct option is C

169. In k-NN it is very likely to overfit due to the curse of dimensionality.
Which of the following options would you consider to handle such a problem?
1) Dimensionality reduction
2) Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

170. When you find noise in the data, which of the following options would you
consider in k-NN?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise cannot be dependent on the value of k
D. None of these
Correct option is A
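
The effect of k on noisy data can be seen in a small sketch. It assumes a one-dimensional feature, majority voting and a made-up training set with one deliberately mislabeled point; `knn_predict` is an illustrative helper, not a library function.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to the query (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "a" lies around 0-2, class "b" around 8-10.
# The point (1.1, "b") is a deliberately mislabeled (noisy) example.
train = [(0.0, "a"), (0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"),
         (1.1, "b"),
         (8.0, "b"), (8.5, "b"), (9.0, "b"), (9.5, "b"), (10.0, "b")]

query = 1.12
pred_k1 = knn_predict(train, query, k=1)  # follows the single noisy neighbour
pred_k5 = knn_predict(train, query, k=5)  # the noisy label is outvoted
print(pred_k1, pred_k5)
```

With k = 1 the prediction copies the noisy label, while k = 5 recovers the surrounding class, which is why a larger k is the usual response to label noise.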

171. Which of the following will be true about k in k-NN in terms of bias?
A. When you increase k, the bias will increase
B. When you decrease k, the bias will increase
C. Can't say
D. None of these
Correct option is A

172. What is used to mitigate overfitting in a test set?


A. Overfitting set
B. Training set
C. Validation dataset
D. Evaluation set

Correct option is C

173. A radial basis function is a


A. Activation function
B. Weight
C. Learning rate
D. none
Correct option is A

174. Mistake Bound is


A. How many training examples are needed for learner to converge to a successful
hypothesis.
B. How much computational effort is needed for a learner to converge to a
successful hypothesis
C. How many training examples will the learner misclassify before converging to a
successful hypothesis
D. None of these
Correct option is C

175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D

176. Adding more basis functions in a linear model… (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A

177. Which of these are types of crossover


A. Single point
B. Two point
C. Uniform
D. All of these
Correct option is D
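
The three crossover types listed above can be sketched on bit strings as follows. This is a minimal illustration, assuming equal-length string chromosomes and a swap probability of 0.5 for the uniform variant; the function names are chosen here for readability.

```python
import random

def single_point_crossover(p1, p2, rng):
    """Swap the tails of two equal-length strings after one random cut point."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point_crossover(p1, p2, rng):
    """Swap the segment lying between two random cut points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform_crossover(p1, p2, rng):
    """At each position, swap the parents' bits with probability 0.5."""
    pairs = [(y, x) if rng.random() < 0.5 else (x, y) for x, y in zip(p1, p2)]
    return "".join(a for a, _ in pairs), "".join(b for _, b in pairs)

rng = random.Random(0)  # fixed seed so repeated runs give the same offspring
parent1, parent2 = "00000000", "11111111"
offspring = {
    "single point": single_point_crossover(parent1, parent2, rng),
    "two point": two_point_crossover(parent1, parent2, rng),
    "uniform": uniform_crossover(parent1, parent2, rng),
}
for name, (child1, child2) in offspring.items():
    print(name, child1, child2)
```

In every variant each bit position of the two children holds one bit from each parent, so no genetic material is created or lost by crossover alone.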

178. A feature F1 can take certain values: A, B, C, D, E, & F, and represents the
grade of students from a college. Which of the following statements is true in the
following case?
A. Feature F1 is an example of nominal
B. Feature F1 is an example of ordinal
C. It doesn't belong to any of the above categories.

Correct option is B

179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to be),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is C

180. Genetic algorithms are heuristic methods that do not guarantee an
optimal solution to a problem
A. TRUE
B. FALSE
Correct option is A

181. Which of the following statements about regularization is correct?

A. Using too large a value of lambda can cause your hypothesis to
underfit the
B. Using too large a value of lambda can cause your hypothesis to
overfit the
C. Using a very large value of lambda cannot hurt the performance of
your hypothesis.
D. None of the above
Correct option is A

182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A

183. Genetic algorithms

A. are a part of evolutionary computing
B. are inspired by Darwin's theory of evolution: "survival of the fittest"
C. are adaptive heuristic search algorithms based on the evolutionary
ideas of natural selection and genetics
D. All of the above
Correct option is D

184. Genetic algorithms belong to the family of methods in the


A. artificial intelligence area

B. optimization
C. complete enumeration family of methods
D. Non-computer based (human) solutions area
Correct option is A

185. For a two player chess game, the environment encompasses the
opponent
A. True
B. False
Correct option is A

186. Which among the following is not a necessary feature of a reinforcement
learning solution to a learning problem?
A. exploration versus exploitation dilemma
B. trial and error approach to learning
C. learning based on rewards
D. representation of the problem as a Markov Decision Process
Correct option is D

187. Which of the following sentences is FALSE regarding reinforcement
learning?
A. It relates inputs to
B. It is used for
C. It may be used for
D. It discovers causal relationships.
Correct option is D

188. The EM algorithm is guaranteed to never decrease the value of its
objective function on any iteration
A. TRUE
B. FALSE
Correct option is A

189. Consider the following modification to the tic-tac-toe game: at the end of
the game, a coin is tossed and the agent wins if a head appears, regardless of
whatever has happened in the game. Can reinforcement learning be used to learn
an optimal policy of playing tic-tac-toe in this case?
A. Yes
B. No
Correct option is B

190. Out of the two repeated steps in the EM algorithm, step 2 is ______
A. the maximization step
B. the minimization step
C. the optimization step
D. the normalization step

Correct option is A

191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
Correct option is A

192. A chess agent trained by using Reinforcement Learning can be trained by
playing against a copy of the same
A. True
B. False
Correct option is A

193. The EM iteration alternates between performing an expectation (E) step,
which creates a function for the expectation of the log-likelihood evaluated using
the current estimate for the parameters, and a maximization (M) step, which
computes parameters maximizing the expected log-likelihood found on the E step.
A. TRUE
B. FALSE
Correct option is A
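
The alternation of E and M steps, and the fact that the log-likelihood never decreases (question 188), can be sketched for a mixture of two one-dimensional Gaussians. The data, the initial guesses and the iteration count are assumptions of this sketch, and no safeguard against collapsing variances is included.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=15):
    """EM for a two-component 1-D Gaussian mixture; also returns the log-likelihood trace."""
    # Crude starting values (an assumption of this sketch).
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_likelihoods = []
    for _ in range(iterations):
        # E step: responsibility of each component for each point.
        responsibilities = []
        log_likelihood = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([p / total for p in joint])
        log_likelihoods.append(log_likelihood)
        # M step: re-estimate parameters from the weighted points.
        for j in range(2):
            nj = sum(r[j] for r in responsibilities)
            means[j] = sum(r[j] * x for r, x in zip(responsibilities, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(responsibilities, data)) / nj
            weights[j] = nj / len(data)
    return means, variances, weights, log_likelihoods

# Hypothetical data: two well-separated clusters around 1 and 5.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, variances, weights, log_likelihoods = em_two_gaussians(data)
print(means, log_likelihoods[0], log_likelihoods[-1])
```

The recorded log-likelihood values form a non-decreasing sequence, up to floating-point rounding once the estimates have converged.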

194. Expectation–maximization (EM) algorithm is an


A. Iterative
B. Incremental
C. None
Correct option is A

195. Features that need to be identified for a well-posed learning problem:


A. Class of tasks
B. Performance measure
C. Training experience
D. All of these
Correct option is D

196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing
checkers
B. Experience obtained by playing games against
C. Both a & b
D. None of these
Correct option is C

197. Learning symbolic representations of concepts is known as:


A. Artificial Intelligence

B. Machine Learning
C. Both a & b
D. None of these
Correct option is A

198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Both a & b
Correct option is A

199. The autonomous acquisition of knowledge through the use of computer
programs is called
A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

200. Learning that enables massive quantities of data is known as


A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

201. A different learning method does not include


A. Memorization
B. Analogy
C. Deduction
D. Introduction
Correct option is D

202. Types of learning used in machine


A. Supervised
B. Unsupervised
C. Reinforcement
D. All of these
Correct option is D

203. A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience
A. Supervised learning problem
B. Unsupervised learning problem

C. Well posed learning problem


D. All of these
Correct option is C

204. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D

205. How many types are available in machine learning?


A. 1
B. 2
C. 3
D. 4
Correct option is C

205. A model can learn based on the rewards it received for its previous action
is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C

206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
A. Artificial Intelligence
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C

207. A learning method in which the training data contains a small amount of
labeled data and a large amount of unlabeled data is known as
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
Correct option is B

208. Methods used for the calibration in Supervised Learning


A. Platt Calibration
B. Isotonic Regression

C. All of these
D. None of above
Correct option is C

209. The basic design issues for designing a learning


A. Choosing the Training Experience
B. Choosing the Target Function
C. Choosing a Function Approximation Algorithm
D. Estimating Training Values
E. All of these
Correct option is E

210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C

211. A learning method in which multiple models, such as classifiers or experts,
are strategically generated and combined to solve a particular computational
problem is called
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

212. In a learning system, the component that takes as input the current
hypothesis (currently learned function) and outputs a new problem for the
Performance System to explore.
A. Critic
B. Generalizer
C. Performance system
D. Experiment generator
E. All of these
Correct option is D

213. Learning method that is used to improve the classification, prediction,
function approximation, etc. of a model
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

214. In a learning system the component that takes as input the history or
trace of the game and produces as output a set of training examples of the target
function is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is A

215. The most common issue when using ML is


A. Lack of skilled resources
B. Inadequate Infrastructure
C. Poor Data Quality
D. None of these
Correct option is C

216. How to ensure that your model is not over fitting


A. Cross validation
B. Regularization
C. All of these
D. None of these
Correct option is C
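
Cross validation, the first option above, relies on splitting the data into folds so that every example is used for validation exactly once. A minimal sketch of the index bookkeeping, assuming contiguous folds and no shuffling (`k_fold_indices` is an illustrative helper, not a library call):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    # The first (n_samples % k) folds get one extra example.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be fitted on each `train` list and scored on the matching `validation` list; a large gap between the two scores is the usual sign of overfitting.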

217. A way to ensemble multiple classifications or regression


A. Stacking
B. Bagging
C. Blending
D. Boosting
Correct option is A

218. How well a model is going to generalize in new environment is known as


A. Data Quality
B. Transparent
C. Implementation
D. None of these
Correct option is B

219. Common classes of problems in machine learning are


A. Classification
B. Clustering
C. Regression
D. All of these
Correct option is D

220. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree

B. Regression
C. Classification
D. Random Forest
Correct option is D

221. Cost complexity pruning algorithm is used in?


A. CART
B. C4.5
C. ID3
D. All of
Correct option is A

222. Which one of these is not a tree based learner?


A. CART
B. C4.5
C. ID3
D. Bayesian Classifier
Correct option is D

223. Which one of these is a tree based learner?


A. Rule based
B. Bayesian Belief Network
C. Bayesian classifier
D. Random Forest
Correct option is D

224. What is the approach of basic algorithm for decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A

225. Which of the following classifications would best suit the student
performance classification systems?
A. If-then analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A

226. What are the two steps of tree pruning?


A. Pessimistic pruning and Optimistic pruning
B. Post pruning and Pre pruning
C. Cost complexity pruning and time complexity pruning
D. None of these
Correct option is B

227. How will you counter over-fitting in decision tree?


A. By pruning the longer rules
B. By creating new rules
C. Both 'By pruning the longer rules' and 'By creating new rules'
D. None of
Correct option is A

228. Which of the following sentences are true?


A. In pre-pruning a tree is ‘pruned’ by halting its construction early
B. A pruning set of class labeled tuples is used to estimate cost
C. The best pruned tree is the one that minimizes the number of
encoding
D. All of these
Correct option is D

229. Which of the following is a disadvantage of decision trees?


A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be over fit
D. None of the above
Correct option is C

230. In which of the following scenarios is gain ratio preferred over
information gain?
A. When a categorical variable has a very large number of categories
B. When a categorical variable has a very small number of categories
C. The number of categories is not the reason
D. None of these
Correct option is A

231. Major pruning techniques used in decision tree are


A. Minimum error
B. Smallest tree
C. Both a & b
D. None of these
Correct option is C

232. What does the central limit theorem state?


A. If the sample size increases, the sampling distribution must approach a
normal distribution
B. If the sample size decreases, the sampling distribution must
approach a normal distribution.
C. If the sample size increases, the sampling distribution must
approach an exponential
D. If the sample size decreases, the sampling distribution must
approach an exponential
Correct option is A
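
A small simulation can illustrate the statement in option A. The sketch below assumes samples drawn from Uniform(0, 1), a sample size of 30 and a fixed random seed; these are choices made here for illustration only.

```python
import random
import statistics

def sample_means(sample_size, n_samples, rng):
    """Draw n_samples samples of the given size from Uniform(0, 1); return their means."""
    return [statistics.fmean([rng.random() for _ in range(sample_size)])
            for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(sample_size=30, n_samples=2000, rng=rng)

grand_mean = statistics.fmean(means)
spread = statistics.stdev(means)
expected_spread = (1 / 12 / 30) ** 0.5  # sigma / sqrt(n) for Uniform(0, 1)
# For a normal distribution about 68% of values fall within one standard deviation.
within_one_sd = sum(abs(m - grand_mean) <= spread for m in means) / len(means)
print(grand_mean, spread, expected_spread, within_one_sd)
```

Although the underlying data are uniform rather than normal, the sample means cluster around 0.5 with a spread close to sigma / sqrt(n), and roughly 68% of them lie within one standard deviation, as a normal distribution would predict.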

233. The difference between the expected value of an estimator and the true
value of the parameter is called?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A

234. In which of the following types of sampling is the selection carried out
based on the opinion of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D

235. Which of the following is a subset of population?


A. Distribution
B. Sample
C. Data
D. Set
Correct option is B

236. The sampling error is defined as?


A. Difference between population and parameter
B. Difference between sample and parameter
C. Difference between population and sample
D. Difference between parameter and sample
Correct option is C

237. Machine learning is interested in the best hypothesis h from some space
H, given observed training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis
C. Most specific hypothesis
D. None of these
Correct option is B

238. Practical difficulties with Bayesian Learning :


A. Initial knowledge of many probabilities is required
B. No consistent hypothesis
C. Hypotheses make probabilistic predictions
D. None of these
Correct option is A

239. Bayes’ theorem states that the relationship between the probability of the
hypothesis before getting the evidence P(H) and the probability of the hypothesis
after getting the evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A

240. A doctor knows that Cold causes fever 50% of the time. Prior probability
of any patient having cold is 1/50,000. Prior probability of any patient having
fever is 1/20. If a patient has fever, what is the probability he/she has cold?
A. P(C|F) = 0.0003
B. P(C|F) = 0.0004
C. P(C|F) = 0.0002
D. P(C|F) = 0.0045
Correct option is C
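
The answer follows directly from Bayes' theorem as stated in question 239, with H = cold and E = fever. A short check using the numbers from the question (the variable names are chosen here for readability):

```python
# Quantities given in question 240.
p_fever_given_cold = 0.5   # P(F|C): cold causes fever 50% of the time
p_cold = 1 / 50_000        # P(C): prior probability of cold
p_fever = 1 / 20           # P(F): prior probability of fever

# Bayes' theorem: P(C|F) = P(F|C) * P(C) / P(F)
p_cold_given_fever = p_fever_given_cold * p_cold / p_fever
print(round(p_cold_given_fever, 4))  # 0.0002
```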

241. Which of the following will be true about k in K-Nearest Neighbor in
terms of bias?
A. When you increase k, the bias will increase
B. When you decrease k, the bias will increase
C. Can't say
D. None of these
Correct option is A

242. When you find noise in the data, which of the following options would you
consider in K-Nearest Neighbor?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise cannot be dependent on the value of k
D. None of these
Correct option is A

243. In K-Nearest Neighbor it is very likely to overfit due to the curse of
dimensionality. Which of the following options would you consider to handle such a
problem?
1. Dimensionality reduction
2. Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

244. Radial basis function learning is closely related to distance-weighted
regression, but it is
A. lazy learning

B. eager learning
C. concept learning
D. none of these
Correct option is B

245. Radial basis function networks provide a global approximation to the
target function, represented by ______ of many local kernel functions.
A. a series combination
B. a linear combination
C. a parallel combination
D. a non linear combination
Correct option is B

246. The most significant phase in a genetic algorithm is


A. Crossover
B. Mutation
C. Selection
D. Fitness function
Correct option is A

247. The crossover operator produces two new offspring from


A. Two parent strings, by copying selected bits from each parent
B. One parent strings, by copying selected bits from selected parent
C. Two parent strings, by copying selected bits from one parent
D. None of these
Correct option is A

248. The evolution over time of the population within a GA can be
characterized mathematically based on the concept of
A. Schema
B. Crossover
C. Don't care
D. Fitness function
Correct option is A

249. In a genetic algorithm, the process of selecting parents which mate and
recombine to create offspring for the next generation is known as:
A. Tournament selection
B. Rank selection
C. Fitness sharing
D. Parent selection
Correct option is D

250. Crossover operations are performed in genetic programming by replacing


A. Randomly chosen sub tree of one parent program by a sub tree from
the other parent program.

B. Randomly chosen root node tree of one parent program by a sub tree
from the other parent program
C. Randomly chosen root node tree of one parent program by a root
node tree from the other parent program
D. None of these
Correct option is A

1) If you remove any one of the following red points from the data, will the
decision boundary change?
A) Yes
B) No

2) [True or False] If you remove the non-red circled points from the data,
the decision boundary will change?
A) True
B) False

3) What do you mean by generalization error in terms of the SVM?


A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

4) When the C parameter is set to infinite, which of the following holds
true?
A) The optimal hyperplane, if it exists, will be the one that completely
separates the data
B) The soft-margin classifier will separate the data
C) None of the above

5) What do you mean by a hard margin?


A) The SVM allows very low error in classification
B) The SVM allows high amount of error in classification
C) None of the above

6) The minimum time complexity for training an SVM is O(n²). According to
this fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter

Solution: A

Datasets which have a clear classification boundary will function best with
SVM’s.

7) The effectiveness of an SVM depends upon:

A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above

Solution: D

The SVM effectiveness depends upon how you choose the basic 3
requirements mentioned above in such a way that it maximises your
efficiency, reduces error and overfitting.

8) Support vectors are the data points that lie closest to the decision
surface.

A) TRUE
B) FALSE

Solution: A

They are the points closest to the hyperplane and the hardest ones to
classify. They also have a direct bearing on the location of the decision
surface.

9) The SVM’s are less effective when:

A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Solution: C

When the data has noise and overlapping points, there is a problem in
drawing a clear hyperplane without misclassifying.

10) Suppose you are using RBF kernel in SVM with high Gamma value.
What does this signify?
A) The model would consider even far away points from hyperplane for
modeling
B) The model would consider only the points close to the hyperplane for
modeling
C) The model would not be affected by distance of points from hyperplane
for modeling
D) None of the above

Solution: B

The gamma parameter in SVM tuning signifies the influence of points either
near or far away from the hyperplane.

For a low gamma, the model will be too constrained and include all points
of the training dataset, without really capturing the shape.

For a higher gamma, the model will capture the shape of the dataset well.
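
The role of gamma can be made concrete by evaluating the RBF kernel directly. The sketch below assumes the common form K(x, z) = exp(-gamma * ||x - z||^2); the points and gamma values are made up for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical reference point with one nearby and one distant point.
reference = (0.0, 0.0)
near, far = (0.5, 0.0), (3.0, 0.0)

low_gamma, high_gamma = 0.01, 10.0
far_low = rbf_kernel(reference, far, low_gamma)     # distant point still counts
far_high = rbf_kernel(reference, far, high_gamma)   # distant point is ignored
near_high = rbf_kernel(reference, near, high_gamma)
print(far_low, far_high, near_high)
```

With a low gamma even the distant point keeps a similarity close to 1, while with a high gamma the similarity drops to practically zero for anything but very close points, so only nearby points shape the decision boundary.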

11) The cost parameter in the SVM means:

A) The number of cross-validations to be made


B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Solution: C

The cost parameter decides how much an SVM should be allowed to
“bend” with the data. For a low cost, you aim for a smooth decision surface
and for a higher cost, you aim to classify more points correctly. It is also
simply referred to as the cost of misclassification.

12) Suppose you are building an SVM model on data X. The data X can be
error prone, which means that you should not trust any specific data point
too much. Now suppose that you want to build an SVM model which has a
quadratic kernel function (polynomial degree 2) and uses the slack variable C
as one of its hyperparameters. Based upon that, give the answer for the
following question.

What would happen when you use a very large value of C (C -> infinity)?

Note: For small C, the model was also classifying all data points correctly

A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say

D) None of these

Solution: A

For large values of C, the penalty for misclassifying points is very high, so
the decision boundary will perfectly separate the data if possible.

13) What would happen when you use very small C (C~0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these

Solution: A

The classifier can maximize the margin between most of the points, while
misclassifying a few points, because the penalty is so low.

14) If I am using all features of my dataset and I achieve 100% accuracy on
my training set, but ~70% on validation set, what should I look out for?

A) Underfitting
B) Nothing, the model is perfect
C) Overfitting

Solution: C

If we’re achieving 100% training accuracy very easily, we need to check
whether we’re overfitting our data.

15) Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above

Solution: D

SVM’s are highly versatile models that can be used for practically all real
world problems ranging from regression to clustering and handwriting
recognitions.

Question Context: 16 – 18

Suppose you have trained an SVM with a linear decision boundary. After
training the SVM, you correctly infer that your SVM model is underfitting.

16) Which of the following options would you be more likely to consider when
iterating on the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features

Solution: C

The best option here would be to create more features for the model.

17) Suppose you gave the correct answer in previous question. What do
you think that is actually happening?

1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance

A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4

Solution: C

Better model will lower the bias and increase the variance

18) In the above question, suppose you want to change one of the SVM's
hyperparameters so that the effect would be the same as in the previous questions,
i.e. the model will not underfit. What would you do?

A) We will increase the parameter C
B) We will decrease the parameter C

C) Changing C has no effect
D) None of these

Solution: A

Increasing the C parameter would be the right thing to do here, as it weakens
the regularization and lets the model fit the training data more closely.

19) We usually use feature normalization before using the Gaussian kernel
in SVM. What is true about feature normalization?

1. We do feature normalization so that new feature will dominate other
2. Sometimes, feature normalization is not feasible in case of categorical
variables
3. Feature normalization always helps when we use Gaussian kernel in
SVM

A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3

Solution: B

Statements one and two are correct.

Question Context: 20-22

Suppose you are dealing with a 4-class classification problem and you want
to train an SVM model on the data, using the one-vs-all method.
Now answer the questions below.

20) How many times do we need to train our SVM model in such a case?

A) 1
B) 2
C) 3
D) 4

Solution: D

For a 4 class problem, you would have to train the SVM at least 4 times if
you are using a one-vs-all method.

21) Suppose you have the same distribution of classes in the data. Now, say
that training once in the one-vs-all setting takes the SVM 10 seconds. How
many seconds would it require to train the one-vs-all method end to end?

A) 20
B) 40
C) 60
D) 80

Solution: B

It would take 10×4 = 40 seconds
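
The counting behind questions 20 and 21 can be sketched as follows: one-vs-all builds one binary target vector per class, and one binary classifier is trained on each. The labels below are hypothetical and `one_vs_all_targets` is an illustrative helper; no SVM is actually trained here.

```python
def one_vs_all_targets(labels):
    """Return one binary target vector per class: 1 for that class, 0 for the rest."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}

labels = ["a", "b", "c", "d", "a", "b", "c", "d"]  # hypothetical 4-class labels
targets = one_vs_all_targets(labels)

n_models = len(targets)   # one binary classifier per class
seconds_per_model = 10    # figure taken from question 21
total_seconds = n_models * seconds_per_model
print(n_models, total_seconds)  # 4 40
```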

22) Suppose your problem has changed now. Now, data has only 2
classes. What would you think how many times we need to train SVM in
such case?

A) 1
B) 2
C) 3
D) 4

Solution: A

Training the SVM only one time would give you appropriate results

Question context: 23 – 24

Suppose you are using an SVM with a polynomial kernel of degree 2. Now
suppose that you have applied this to the data and found that it fits the
data perfectly, that is, training and testing accuracy are both 100%.

23) Now, think that you increase the complexity(or degree of polynomial of
this kernel). What would you think will happen?

A) Increasing the complexity will overfit the data


B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these

Solution: A

Increasing the complexity of the model would make the algorithm overfit the
data.

24) In the previous question after increasing the complexity you found that
training accuracy was still 100%. According to you what is the reason
behind that?

1. Since data is fixed and we are fitting more polynomial terms or
parameters, the algorithm starts memorizing everything in the data
2. Since data is fixed and SVM doesn’t need to search in a big hypothesis
space

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.

25) What is/are true about kernel in SVM?

1. A kernel function maps low dimensional data to a high dimensional space
2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.
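
Both statements can be checked on a small example. The sketch below assumes the homogeneous quadratic kernel K(x, z) = (x . z)^2 on two-dimensional inputs, whose explicit feature map into three dimensions is well known; the input vectors are made up.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2) into 3-D space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, z)                      # computed in 2-D
explicit_value = dot(feature_map(x), feature_map(z))   # computed in 3-D
print(kernel_value, explicit_value)
```

The two numbers agree, so the kernel returns the inner product (a similarity score) in the higher-dimensional space without ever constructing that space explicitly.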

Q- When comparing multiple regularised machine learning models for a
given task, which of the following are reasonable ways to pick the best one,
in terms of its ability to generalise to unseen data? (Here λ refers to the
regularisation parameter as usual.)

(A) Pick the one with lowest training error, with λ having been chosen so as
to minimise training error.

(b) Pick the one with lowest error on a separate test set, with λ having
been chosen so as to minimise training error.

(c) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise error on this test set.

(d) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise cross-validation error on the training set.

(E) Pick the one with lowest cross-validation error on the training set, with λ
having been chosen so as to minimise cross-validation error on the training
set.


6. When doing MAP estimation of the parameters of a linear regression
model (assuming that the optimisation can be done exactly), increasing the
value of the noise precision β

(a) will never decrease the training error.

(B)will never increase the training error.

(C)will never decrease the testing error.

(d) will never increase the testing error.

(e) may either increase or decrease the training error.


(F)may either increase or decrease the testing error.

7. Which of the following are characteristics of data sampled from a
Gaussian distribution?

(a) The sample mean systematically underestimates the true mean.


(B)The sample variance systematically underestimates the true variance.
(c) Both the sample mean and variance are unbiased estimators of the true
values.
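
The difference between the two variance estimators can be seen on a tiny sample. This is a minimal sketch using Python's standard statistics module; the sample values are hypothetical. Dividing by n gives the smaller, downward-biased estimate, while dividing by n - 1 gives the unbiased one.

```python
import statistics

# Hypothetical toy sample, chosen only for illustration.
sample = [1, 2, 3, 4]

# Divides the summed squared deviations by n (the maximum-likelihood estimate).
biased_variance = statistics.pvariance(sample)

# Divides by n - 1 (Bessel's correction), which removes the downward bias.
unbiased_variance = statistics.variance(sample)

print(biased_variance, unbiased_variance)
```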

1. Which of the following would be incompatible with a frequentist (non-Bayesian) view of
probability?

(a) The use of a non-Gaussian noise model in probabilistic regression.

(b) The use of probabilistic modelling for regression.

(C) The use of prior distributions on the parameters in a probabilistic model.


(D)The idea of assuming a probability distribution over models.


2. Four different people are doing bias-variance estimates on regularised linear regression
models. They come to you and make the following claims about certain experiments they've
done. Which of these claims are definitely incorrect? (Here λ refers to the regularisation
parameter as usual.)

(a) 'I increased λ and the model started underfitting the data, whilst the variance went down'.
(b) 'I decreased λ and the model started overfitting the data, whilst the bias went up'.

(C) 'I decreased λ and the model started overfitting the data, whilst the variance went up'.

(D) 'I increased λ and the model started underfitting the data, whilst the bias went down'.

3. Consider a binary classification problem. Suppose I have trained a model on a linearly
separable training set, and now I get a new labeled data point which is correctly classified by
the model, and far away from the decision boundary. If I now add this new point to my earlier
training set and re-train via gradient descent, initialising the parameters to those of the original
model, in which cases will the learnt decision boundary remain exactly the same?

(A)When my model is a perceptron.

(b) When my model is logistic regression.

(c) When my model is Fisher's linear discriminant.

(d) When my model is a linear discriminant trained via least squares.

4. Suppose your model is demonstrating high variance across different training sets. Which of
the following is NOT a valid way to try and reduce the variance?

(a) Increase the amount of training data in each training set.

(b)Improve the optimisation algorithm being used for error minimisation.

(c)Decrease the model complexity.

(d) Reduce the noise in the training data.

1. A _________ is a decision support tool that uses a tree-like graph or
model of decisions and their possible consequences, including chance
event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks


2. Decision Tree is a display of an algorithm.


a) True
b) False

3. Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class
label
c) Both a) & b)
d) None of the mentioned
4. Decision Trees can be used for Classification Tasks.
a) True
b) False

5. How many types of learning are available in machine learning?


a) 1
b) 2
c) 3
d) 4

6. Choose from the following that are Decision Tree nodes


a) Decision Nodes
b) Weighted Nodes
c) Chance Nodes
d) End Nodes

7. Decision Nodes are represented by,


a) Disks
b) Squares
c) Circles
d) Triangles

8. Chance Nodes are represented by,


a) Disks
b) Squares
c) Circles
d) Triangles

9. End Nodes are represented by,


a) Disks
b) Squares
c) Circles


d) Triangles

10. How does the decision tree reach its decision?


a) Single test
b) Two test
c) Sequence of test
d) No test

11. What is the other name of informed search strategy?


a) Simple search
b) Heuristic search
c) Online search
d) None of the mentioned

12. How many types of informed search methods are there in artificial
intelligence?
a) 1
b) 2
c) 3
d) 4

13. Which search uses problem-specific knowledge beyond the
definition of the problem?
a) Informed search
b) Depth-first search
c) Breadth-first search
d) Uninformed search

14. Which function will select the lowest expansion node at first for
evaluation?
a) Greedy best-first search
b) Best-first search
c) Both a & b
d) None of the mentioned

15. What is the heuristic function of greedy best-first search?


a) f(n) != h(n)
b) f(n) < h(n)
c) f(n) = h(n)
d) f(n) > h(n)

16. Which search uses only the linear space for searching?
a) Best-first search


b) Recursive best-first search


c) Depth-first search
d) None of the mentioned

17. Which method is used to search better by learning?


a) Best-first search
b) Depth-first search
c) Metalevel state space
d) None of the mentioned

18. Which search is complete and optimal when h(n) is consistent?


a) Best-first search
b) Depth-first search
c) Both a & b
d) A* search

19. Which is used to improve the performance of heuristic search?


a) Quality of nodes
b) Quality of heuristic function
c) Simple form of nodes
d) None of the mentioned

20. Which search method will expand the node that is closest to the goal?
a) Best-first search
b) Greedy best-first search
c) A* search
d) None of the mentioned

21. Which data structure is used to give better heuristic estimates?


a) Forwards state-space
b) Backward state-space
c) Planning graph algorithm
d) None of the mentioned

22. Which is used to extract solution directly from the planning graph?
a) Planning algorithm
b) Graph plan
c) Hill-climbing search
d) All of the mentioned

23. What are present in the planning graph?


a) Sequence of levels
b) Literals
c) Variables


d) Heuristic estimates

24. What is the starting level of planning graph?


a) Level 3
b) Level 2
c) Level 1
d) Level 0

25. What are present in each level of planning graph?


a) Literals
b) Actions
c) Variables
d) Both a & b

26. Which kind of problem is suitable for planning graph?


a) Propositional planning problem
b) Planning problem
c) Action problem
d) None of the mentioned

27. What is meant by persistence actions?


a) Allow a literal to remain false
b) Allow a literal to remain true
c) Both a & b
d) None of the mentioned

28. When is further expansion unnecessary for a planning graph?


a) Identical
b) Replicate
c) Not identical
d) None of the mentioned

29. How many conditions are available between two actions in mutex
relation?
a) 1
b) 2
c) 3
d) 4

30. What is called inconsistent support?


a) If two literals are not negation of other
b) If two literals are negation of other
c) Mutually exclusive
d) None of the mentioned


1. What is Machine Learning (ML)?


A. The autonomous acquisition of knowledge through the use of manual programs
B. The selective acquisition of knowledge through the use of computer programs
C. The selective acquisition of knowledge through the use of manual programs
D. The autonomous acquisition of knowledge through the use of computer
programs
Correct option is D

2. Father of Machine Learning (ML)


A. Geoffrey Chaucer
B. Geoffrey Hill
C. Geoffrey Everest Hinton
D. None of the above
Correct option is C

3. Which is FALSE regarding regression?


A. It may be used for interpretation
B. It is used for prediction
C. It discovers causal relationships
D. It relates inputs to outputs
Correct option is C

4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software
B. AI is a software that can emulate the human mind
C. ML is an alternate way of programming intelligent machines
D. All of the above
Correct option is D

5. Which of the following is not a factor that affects the performance of a learner
system?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A

6. In general, to have a well-defined learning problem, we must identify which of the
following
A. The class of tasks
B. The measure of performance to be improved
C. The source of experience
D. All of the above


Correct option is D

7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E

8. Which of the following is not a learning method?


A. Analogy
B. Introduction
C. Memorization
D. Deduction
Correct option is B

9. In language understanding, which of the following is not a level of knowledge?


A. Empirical
B. Logical
C. Phonological
D. Syntactic
Correct option is A

10. Designing a machine learning approach involves:-


A. Choosing the type of training experience
B. Choosing the target function to be learned
C. Choosing a representation for the target function
D. Choosing a function approximation algorithm
E. All of the above
Correct option is E

11. Concept learning infers a ______-valued function from training examples of
its input and output.
A. Decimal
B. Hexadecimal
C. Boolean
D. All of the above
Correct option is C

12. Which of the following is not a supervised learning?


A. Naive Bayesian
B. PCA
C. Linear Regression


D. Decision Tree


Correct option is B

13. What is Machine Learning?


(i) Artificial Intelligence
(ii) Deep Learning
(iii) Data Statistics
A. Only (i)
B. (i) and (ii)
C. All
D. None
Correct option is B

14. What kind of learning algorithm for “Facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B

15. Which of the following is not type of learning?


A. Unsupervised Learning
B. Supervised Learning
C. Semi-unsupervised Learning
D. Reinforcement Learning
Correct option is C

16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following
A. Supervised Learning: Classification
B. Reinforcement Learning
C. Unsupervised Learning: Clustering
D. Unsupervised Learning: Regression
Correct option is B

17. Targeted marketing, Recommender Systems, and Customer Segmentation are
applications in which of the following
A. Supervised Learning: Classification
B. Unsupervised Learning: Clustering
C. Unsupervised Learning: Regression
D. Reinforcement Learning
Correct option is B


18. Fraud Detection, Image Classification, Diagnostic, and Customer Retention are
applications in which of the following
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
C. Unsupervised Learning: Clustering
D. Reinforcement Learning
Correct option is B

19. Which of the following is not a symbolic function in the various function
representations of Machine Learning?
A. Rules in propositional logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B

20. Which of the following is not a numerical function in the various function
representations of Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C

21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only ______ examples.
A. Negative
B. Positive
C. Negative or Positive
D. None of the above
Correct option is B

22. FIND-S algorithm ignores


A. Negative
B. Positive
C. Both
D. None of the above
Correct option is A
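
The behaviour described in questions 21 and 22 can be sketched in a few lines. This is a minimal illustration, assuming attribute-value examples and "?" as the wildcard; the function name find_s and the toy weather data are hypothetical.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples.

    examples: list of (attribute_tuple, is_positive) pairs.
    None stands for the initial most specific hypothesis, which matches nothing.
    """
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Generalize every attribute that disagrees with the new positive example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical toy training set, chosen only for illustration.
training = [
    (("Sunny", "Warm", "Normal"), True),
    (("Sunny", "Warm", "High"), True),
    (("Rainy", "Cold", "High"), False),
]

learned = find_s(training)
print(learned)
```

Because the negative example is never inspected, a noisy or inconsistent training set can lead to a hypothesis that wrongly covers negatives, which is the drawback raised in question 26.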

23. The Candidate-Elimination Algorithm represents the ______.


A. Solution Space
B. Version Space
C. Elimination Space
D. All of the above


Correct option is B

24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally true.
A. True
B. False
Correct option is A

25. Inductive learning takes examples and generalizes rather than starting
with ______ knowledge.
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B

26. A drawback of the FIND-S is that it assumes the consistency within the training
set
A. True
B. False
Correct option is A

27. What strategies can help reduce overfitting in decision trees?


(i) Enforce a maximum depth for the tree
(ii) Enforce a minimum number of samples in leaf nodes
(iii) Pruning
(iv) Make sure each leaf node is one pure class
A. All
B. (i), (ii) and (iii)
C. (i), (iii), (iv)
D. None
Correct option is B

28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B

29. To find the minimum or the maximum of a function, we set the gradient to zero
because which of the following
A. Depends on the type of problem


B. The value of the gradient at extrema of a function is always zero


C. Both (A) and (B)
D. None of these
Correct option is B

30. Which of the following is a disadvantage of decision trees?


A. Decision trees are prone to be overfit
B. Decision trees are robust to outliers
C. Factor analysis
D. None of the above
Correct option is A

31. What is perceptron?


A. A single layer feed-forward neural network with pre-processing
B. A neural network that contains feedback
C. A double layer auto-associative neural network
D. An auto-associative neural network
Correct option is A

32. Which of the following is true for neural networks?


(i) The training time depends on the size of the
(ii) Neural networks can be simulated on a conventional
(iii) Artificial neurons are identical in operation to biological
A. All
B. Only (ii)
C. (i) and (ii)
D. None
Correct option is C



33. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by
(ii) They are more fault
(iii) They are more suited for real time operation due to their high "computational"
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D

34. What is Neuro software?


A. It is software used by Neurosurgeon


B. Designed to aid experts in real world


C. It is powerful and easy neural network
D. A software used to analyze neurons
Correct option is C

35. Which is true for neural networks?


A. Each node computes its weighted input
B. Node could be in excited state or non-excited state
C. It has set of nodes and connections
D. All of the above
Correct option is D

36. What is the objective of backpropagation algorithm?


A. To develop learning algorithm for multilayer feedforward neural network, so that
network can be trained to capture the mapping implicitly
B. To develop learning algorithm for multilayer feedforward neural network
C. To develop learning algorithm for single layer feedforward neural network
D. All of the above
Correct option is A

37. Which of the following is true?


Single layer associative neural networks do not have the ability to:-

(i) Perform pattern recognition
(ii) Find the parity of a picture
(iii) Determine whether two or more shapes in a picture are connected or not
A. (ii) and (iii)
B. Only (ii)
C. All
D. None
Correct option is A

38. The backpropagation law is also known as generalized delta rule


A. True
B. False
Correct option is A

38. Which of the following is true?


(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by
(iii) Neural networks mimic the way the human brain
A. All
B. (ii) and (iii)


C. (i), (ii) and (iii)


D. None
Correct option is A

39. What is true regarding backpropagation rule?


A. Error in output is propagated backwards only to determine weight updates
B. There is no feedback of signal at any stage
C. It is also called generalized delta rule
D. All of the above
Correct option is D

40. There is feedback in final stage of backpropagation


A. True
B. False
Correct option is B

41. An auto-associative network is


A. A neural network that has only one loop
B. A neural network that contains feedback
C. A single layer feed-forward neural network with pre-processing
D. A neural network that contains no loops
Correct option is B

42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
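
The arithmetic behind question 42 can be checked directly. The sketch below uses the weights, inputs and proportionality constant stated in the question; the variable names are illustrative.

```python
weights = [1, 4, 3]
inputs = [4, 8, 5]
proportionality_constant = 3  # linear transfer function: output = constant * weighted sum

# Weighted sum: 1*4 + 4*8 + 3*5 = 51
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# Linear transfer function scales the weighted sum: 3 * 51 = 153
output = proportionality_constant * weighted_sum

print(weighted_sum, output)
```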

43. Which of the following is true regarding backpropagation rule?


A. Hidden layers output is not all important, they are only meant for supporting
input and output layers
B. Actual output is determined by computing the outputs of units for each hidden
layer
C. It is a feedback neural network
D. None of the above
Correct option is B

44. What is back propagation?


A. It is another name given to the curvy function in the perceptron


B. It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
C. It is another name given to the curvy function in the perceptron
D. None of the above
Correct option is B

45. The general limitations of back propagation rule is/are


A. Scaling
B. Slow convergence
C. Local minima problem
D. All of the above
Correct option is D

46. What is the meaning of generalized in the statement “backpropagation is a
generalized delta rule”?
A. Because delta is applied to only input and output layers, thus making it more
simple and generalized
B. It has no significance
C. Because delta rule can be extended to hidden layer units
D. None of the above
Correct option is C

47. Neural Networks are complex ______ functions with many parameters
A. Linear
B. Non linear
C. Discrete
D. Exponential
Correct option is A

48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D

49. Backpropagation learning is based on gradient descent along the error surface.
A. True
B. False
Correct option is A

50. In backpropagation rule, how to stop the learning process?


A. No heuristic criteria exist
B. On basis of average gradient value


C. There is convergence involved


D. None of these
Correct option is B

51. Applications of NN (Neural Network)


A. Risk management
B. Data validation
C. Sales forecasting
D. All of the above
Correct option is D

52. The network that involves backward links from output to the input and hidden
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A

53. Decision Tree is a display of an Algorithm?


A. True
B. False
Correct option is A

54. Which of the following is/are the decision tree nodes?


A. End Nodes
B. Decision Nodes
C. Chance Nodes
D. All of the above
Correct option is D

55. End Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is B

56. Decision Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is D


57. Chance Nodes are represented by which of the following


A. Solar street light
B. Triangles
C. Circles
D. Squares
Correct option is C

58. Advantage of Decision Trees


A. Possible Scenarios can be added
B. Use a white box model, if given result is provided by a model
C. Worst, best and expected values can be determined for different scenarios
D. All of the above
Correct option is D

59. ______ terms are required for building a Bayes model.


A. 1
B. 2
C. 3
D. 4
Correct option is C

60. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependant & Dependant
D. Dependent
Correct option is A

61. What is needed to make probabilistic systems feasible in the world?


A. Feasibility
B. Reliability
C. Crucial robustness
D. None of the above
Correct option is C

62. Bayes rule can be used for:-


A. Solving queries
B. Increasing complexity
C. Answering probabilistic query
D. Decreasing complexity
Correct option is C


63. ______ provides ways and means of weighing up the desirability of goals and the
likelihood of achieving them.
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A

64. Which of the following provided by the Bayesian Network?


A. Complete description of the problem
B. Partial description of the domain
C. Complete description of the domain
D. All of the above
Correct option is C

65. Probability provides a way of summarizing the ______ that comes from our laziness
and ignorance.
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B

66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C

67. Causal chain (For example, Smoking cause cancer) gives rise to:-
A. Conditionally Independence
B. Conditionally Dependence
C. Both
D. None of the above
Correct option is A

68. The bayesian network can be used to answer any query by using:-
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B


69. Bayesian networks allow compact specification of:-


A. Joint probability distributions
B. Belief
C. Propositional logic statements
D. All of the above
Correct option is A

70. The compactness of the bayesian network can be described by


A. Fully structured
B. Locally structured
C. Partially structured
D. All of the above
Correct option is B

71. The Expectation-Maximization Algorithm has been used to identify conserved
domains in unaligned proteins only. State True or False.
A. True
B. False
Correct option is B

72. Which of the following is correct about the Naive Bayes?


A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. Both
D. All of the above
Correct option is C

73. Which of the following is false regarding EM Algorithm?


A. The alignment provides an estimate of the base or amino acid composition of
each column in the site
B. The column-by-column composition of the site already available is used to
estimate the probability of finding the site at any position in each of the
sequences
C. The row-by-column composition of the site already available is used to estimate
the probability
D. None of the above
Correct option is C

74. Naïve Bayes Algorithm is a ______ learning algorithm.


A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
Correct option is A


75. The EM algorithm includes two repeated steps; step 2 is ______.
A. The normalization
B. The maximization step
C. The minimization step
D. None of the above
Correct option is B

76. Examples of Naïve Bayes Algorithm is/are


A. Spam filtration
B. Sentimental analysis
C. Classifying articles
D. All of the above
Correct option is D

77. In the intermediate steps of “EM Algorithm”, the number of each base in each
column is determined and then converted to
A. True
B. False
Correct option is A

78. Naïve Bayes algorithm is based on ______ and used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
Correct option is A
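
Bayes' theorem, on which Naïve Bayes rests, can be illustrated with a one-feature spam example. This is a minimal sketch; the function name and all probabilities are hypothetical values chosen for illustration, not figures from the question set.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem for a binary class: P(class | evidence)."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: P(spam) = 0.2, P(word | spam) = 0.5, P(word | not spam) = 0.1
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_given_not=0.1)

print(round(p_spam_given_word, 4))
```

With several features, Naïve Bayes multiplies one such likelihood per feature, which is where the independence assumption from question 72 enters.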

79. Types of Naïve Bayes Model:


A. Gaussian
B. Multinomial
C. Bernoulli
D. All of the above
Correct option is D

80. Disadvantages of Naïve Bayes Classifier:


A. Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between
B. It performs well in Multi-class predictions as compared to the other
C. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
D. It is the most popular choice for text classification problems.
Correct option is A

81. The benefit of Naïve Bayes:-


A. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of


B. It is the most popular choice for text classification problems.


C. It can be used for Binary as well as Multi-class
D. All of the above
Correct option is D

82. In which of the following types of sampling the information is carried out under
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B

83. Full form of MDL?


A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these
Correct option is A

84. For the analysis of ML algorithms, we need


A. Computational learning theory
B. Statistical learning theory
C. Both A & B
D. None of these
Correct option is C

85. PAC stands for
A. Probably Approximately Correct
B. Probably Approx Correct
C. Probably Approximate Computation
D. Probably Approx Computation
Correct option is A

86. The ______ of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A

87. Statement: True error defined over entire instance space, not just training data
A. True


B. False
Correct option is A

88. What are the areas CLT is comprised of?


A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. All of these
Correct option is D

88. What area of CLT tells “How many examples we need to find a good hypothesis
?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A

89. What area of CLT tells “How much computational power we need to find a good
hypothesis ?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B

90. What area of CLT tells “How many mistakes we will make before finding a good
hypothesis ?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C

91. (For question no. 9 and 10) Can we say that concept described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A

92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n


D. |H| = 4^n
Correct option is A

93. The VC dimension of hypothesis space H1 is larger than the VC dimension of
hypothesis space H2. Which of the following can be inferred from this?
A. The number of examples required for learning a hypothesis in H1 is larger than
the number of examples required for H2
B. The number of examples required for learning a hypothesis in H1 is smaller than
the number of examples required for H2
C. No relation to number of samples required for PAC learning.
Correct option is A

94. For a particular learning task, if the requirement of error parameter changes from
0.1 to 0.01. How many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
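
The factor of 10 follows from the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)), which scales as 1/epsilon. The sketch below is a minimal check of that scaling; the number of attributes and the value of delta are hypothetical, and |H| = 3^n is taken from question 92.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Real-valued lower bound m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon

n_attributes = 10              # hypothetical number of Boolean attributes
h_size = 3 ** n_attributes     # conjunctions of Boolean literals: |H| = 3^n
delta = 0.05                   # hypothetical confidence parameter

bound_loose = pac_sample_bound(h_size, 0.1, delta)
bound_tight = pac_sample_bound(h_size, 0.01, delta)

# Ratio of the two bounds; |H| and delta cancel out, leaving 0.1 / 0.01.
ratio = bound_tight / bound_loose

print(ratio)
```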

95. Computational complexity of classes of learning problems depends on which of
the following?
A. The size or complexity of the hypothesis space considered by learner
B. The accuracy to which the target concept must be approximated
C. The probability that the learner will output a successful hypothesis
D. All of these
Correct option is D

96. The instance-based learner is a


A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A

97. When to consider nearest neighbour algorithms?


A. Instances map to points in ℝ^n
B. Not more than 20 attributes per instance
C. Lots of training data
D. None of these
E. A, B & C
Correct option is E

98. What are the advantages of the nearest neighbour algorithm?


A. Training is very fast


B. Can learn complex target functions


C. Don't lose information
D. All of these
Correct option is D

99. What are the difficulties with k-nearest neighbour algo?


A. Calculate the distance of the test case from all training cases
B. Curse of dimensionality
C. Both A & B
D. None of these
Correct option is C

100. What if the target function is real valued in kNN algo?


A. Calculate the mean of the k nearest neighbours
B. Calculate the SD of the k nearest neighbour
C. None of these
Correct option is A

101. What is/are true about Distance-weighted KNN?


A. The weight of the neighbour is considered
B. The distance of the neighbour is considered
C. Both A & B
D. None of these
Correct option is C
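
Questions 100 and 101 can be illustrated together. The sketch below assumes scalar inputs, absolute difference as the distance, and inverse squared distance as the weight; the function name knn_regress and the training pairs are hypothetical.

```python
def knn_regress(train, query, k, weighted=False):
    """Predict a real value from the k nearest neighbours of a scalar query.

    train: list of (x, y) pairs. With weighted=False the prediction is the plain
    mean of the neighbours' targets; with weighted=True each neighbour is
    weighted by 1 / distance^2.
    """
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    if not weighted:
        return sum(y for _, y in neighbours) / len(neighbours)
    numerator = 0.0
    denominator = 0.0
    for x, y in neighbours:
        distance = abs(x - query)
        if distance == 0:
            return y  # exact match: use its target directly
        weight = 1.0 / distance ** 2
        numerator += weight * y
        denominator += weight
    return numerator / denominator

# Hypothetical training pairs, chosen only for illustration.
train = [(1.0, 2.0), (2.0, 4.0), (4.0, 8.0), (10.0, 20.0)]

plain = knn_regress(train, query=3.0, k=3)
distance_weighted = knn_regress(train, query=3.0, k=3, weighted=True)

print(plain, distance_weighted)
```

In this toy case the farther neighbour at x = 1.0 pulls the plain mean down, while the weighted version gives it less influence.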

102. What is/are advantage(s) of Distance-weighted k-NN over k-NN?


A. Robust to noisy training data
B. Quite effective when a sufficient large set of training data is provided
C. Both A & B
D. None of these
Correct option is C

103. What is/are advantage(s) of Locally Weighted Regression?


A. Pointwise approximation of complex target function
B. Earlier data has no influence on the new ones
C. Both A & B
D. None of these
Correct option is C

104. The quality of the result depends on (LWR)


A. Choice of the function
B. Choice of the kernel function K
C. Choice of the hypothesis space H
D. All of these


Correct option is D

105. How many types of layer in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and Output layer

106. The neurons in the hidden layer contain Gaussian transfer functions whose
outputs are ______ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
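
A Gaussian hidden unit can be sketched as below. The output is largest at the centre and shrinks as the input moves away, which is the sense in which the question calls the relationship "inverse". The function name `gaussian_rbf` and the width parameter `sigma` are assumptions for illustration.

```python
import math

def gaussian_rbf(x, centre, sigma=1.0):
    """Gaussian activation: 1.0 at the centre, decaying with distance from it."""
    r = math.dist(x, centre)  # Euclidean distance to the neuron's centre
    return math.exp(-(r ** 2) / (2 * sigma ** 2))

centre = (0.0, 0.0)
for point in [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]:
    # The printed activation drops as the point moves away from the centre.
    print(point, round(gaussian_rbf(point, centre), 4))
```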

107. PNN/GRNN networks have one neuron for each point in the training file,
while RBF networks have a variable number of neurons that is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A

108. Which network is more accurate when the size of the training set is
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A

109. What is/are true about RBF network?


A. A kind of supervised learning
B. Design of NN as curve fitting problem
C. Use of multidimensional surface to interpolate the test data
D. All of these
Correct option is D

110. Application of CBR


A. Design
B. Planning
C. Diagnosis

D. All of these
Correct option is D

111. What is/are advantages of CBR?


A. A local approx. is found for each test case
B. Knowledge is in a form understandable to human
C. Fast to train
D. All of these
Correct option is D

112. In the k-NN algorithm, given a set of training examples and a value of k < size of
training set (n), the algorithm predicts the class of a test example to be the

A. Least frequent class among the classes of the k closest training examples
B. Most frequent class among the classes of the k closest training examples
C. Class of the closest training example
D. Most frequent class among the classes of the k farthest training examples
Correct option is B
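
The majority-vote rule described in this question can be sketched as follows. The function name `knn_classify` and the toy points are assumptions made for the example; ties are broken by whichever label `Counter` encounters first.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the most frequent label among the k training points nearest to `query`."""
    # Sort by Euclidean distance to the query and keep the k closest points.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
         ((5, 5), "blue"), ((5, 6), "blue")]
print(knn_classify(train, (0.5, 0.5), k=3))  # all three neighbours are "red"
print(knn_classify(train, (5, 5.5), k=3))    # two "blue" votes against one "red"
```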

113. Which of the following statements is true about PCA?


(i) We must standardize the data before applying PCA
(ii) We should select the principal components which explain the highest variance
(iii) We should select the principal components which explain the lowest variance
(iv) We can use PCA for visualizing the data in lower dimensions
A. (i), (ii) and (iv).
B. (ii) and (iv)
C. (iii) and (iv)
D. (i) and (iii)
Correct option is A

114. Genetic algorithm is a


A. Search technique used in computing to find true or approximate solution to
optimization and search problem
B. Sorting technique used in computing to find true or approximate solution to
optimization and sort problem
C. Both A & B
D. None of these
Correct option is A

115. GA techniques are inspired by


A. Evolutionary
B. Cytology
C. Anatomy

D. Ecology
Correct option is A

116. When would the genetic algorithm terminate?


A. Maximum number of generations has been produced
B. Satisfactory fitness level has been reached for the population
C. Both A & B
D. None of these
Correct option is C

117. The algorithm operates by iteratively updating a pool of hypotheses, called the
A. Population
B. Fitness
C. None of these
Correct option is A

118. What is the correct representation of GA?


A. GA(Fitness, Fitness_threshold, p)
B. GA(Fitness, Fitness_threshold, p, r )
C. GA(Fitness, Fitness_threshold, p, r, m)
D. GA(Fitness, Fitness_threshold)
Correct option is C

119. Genetic operators include


A. Crossover
B. Mutation
C. Both A & B
D. None of these
Correct option is C

120. The operation that produces two new offspring from two parent strings by copying
selected bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
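
The two genetic operators named above can be sketched on bit strings as follows. This shows single-point crossover and a single-bit mutation only; the function names and the explicit `point` and `index` arguments are assumptions for illustration, and parents are assumed to have at least two bits when the cut point is drawn at random.

```python
import random

def single_point_crossover(parent_a, parent_b, point=None, rng=random):
    """Swap the tails of two equal-length bit strings after position `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if point is None:
        # Pick a cut strictly inside the string so both parents contribute.
        point = rng.randint(1, len(parent_a) - 1)
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

def mutate(bits, index):
    """Flip the single bit at `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

print(single_point_crossover("11101001000", "00001010101", point=5))
# -> ('11101010101', '00001001000')
print(mutate("0000", 2))  # -> '0010'
```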

121. Each schema represents a set of bit strings and is itself a string composed of
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s

Correct option is D

122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0010, 0010
C. 0100, 0110
D. 0100, 0010
Correct option is A
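
The schema in this question can be checked mechanically. In the sketch below, `*` is treated as a "don't care" position; the helper names `matches` and `expand` are assumptions for the example.

```python
from itertools import product

def matches(schema, bits):
    """True if `bits` agrees with `schema` at every position that is not '*'."""
    return len(schema) == len(bits) and all(
        s in ("*", b) for s, b in zip(schema, bits)
    )

def expand(schema):
    """List every bit string represented by `schema`."""
    # Each '*' can be 0 or 1; fixed positions keep their single value.
    choices = ["01" if s == "*" else s for s in schema]
    return ["".join(bits) for bits in product(*choices)]

print(expand("0*10"))           # -> ['0010', '0110']
print(matches("0*10", "0100"))  # -> False
```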

123. If correct(h) is the percent of all training examples correctly classified by
hypothesis h, then the fitness function is equal to
A. Fitness(h) = (correct(h))^2
B. Fitness(h) = (correct(h))^3
C. Fitness(h) = correct(h)
D. Fitness(h) = (correct(h))^4
Correct option is A

124. Statement: In Genetic Programming, individuals in the evolving population
are computer programs rather than bit strings
A. True
B. False
Correct option is A

125. According to the ______ theory, evolution over many generations was directly
influenced by the experiences of individual organisms during their lifetime
A. Baldwin
B. Lamarckian
C. Bayes
D. None of these
Correct option is B

126. Search through the hypothesis space cannot be characterized. Why?


A. Hypotheses are created by crossover and mutation operators that allow radical
changes between successive generations
B. Hypotheses are not created by crossover and mutation
C. None of these
Correct option is A

127. ILP stands for


A. Inductive Logical programming
B. Inductive Logic Programming
C. Inductive Logical Program
D. Inductive Logic Program
Correct option is B

128. What is/are the requirement(s) for the Learn-One-Rule method?

A. Input: accepts a set of +ve and -ve training examples.
B. Output: delivers a single rule that covers many +ve examples and few -ve.
C. Output rule has a high accuracy but not necessarily a high coverage.
D. A & B
E. A, B & C
Correct option is E

129. A ______ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A

130. A ground literal is a literal that
A. Contains only variables
B. Does not contain any functions
C. Does not contain any variables
D. Contains only functions
Correct option is C

131. ______ emphasizes learning feedback that evaluates the learner's
performance without providing standards of correctness in the form of
behavioural targets
A. Reinforcement learning
B. Supervised Learning
C. None of these
Correct option is A

132. Features of Reinforcement learning


A. Set of problem rather than set of techniques
B. RL is training by rewards and punishments
C. RL is learning from trial and error with the world
D. All of these
Correct option is D

133. Which type of feedback is used by RL?


A. Purely Instructive feedback
B. Purely Evaluative feedback
C. Both A & B
D. None of these

Correct option is B

134. What is/are the problem solving methods for RL?


A. Dynamic programming
B. Monte Carlo Methods
C. Temporal-difference learning
D. All of these
Correct option is D

135. The FIND-S Algorithm


A. Starts from the most specific hypothesis
B. It considers negative examples
C. It considers both negative and positive examples
D. None of these
Correct option is A

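
FIND-S can be sketched in a few lines for conjunctive hypotheses over discrete attributes. This is a minimal illustration under assumptions: `None` stands for the most specific (empty) hypothesis, `"?"` for "any value", and the weather-style examples are made up for the demonstration.

```python
def find_s(examples):
    """Return the maximally specific hypothesis that covers every positive example.

    `examples` is a list of (attribute_tuple, is_positive) pairs.
    Returns None if there is no positive example.
    """
    hypothesis = None  # most specific hypothesis: covers nothing yet
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # generalize only where the example disagrees
    return hypothesis

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(examples))  # -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```
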
136. The hypothesis space has a general-to-specific ordering of hypotheses, and the
search can be efficiently organized by taking advantage of a naturally occurring structure
over the hypothesis space

A. TRUE
B. FALSE
Correct option is A

137. The Version space is:

A. The subset of all hypotheses consistent with the training examples is called the
version space with respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept
B. The version space consists of only specific hypotheses
C. None of these
Correct option is A

138. The Candidate-Elimination Algorithm


A. The key idea in the Candidate-Elimination algorithm is to output a description
of the set of all hypotheses consistent with the training examples
B. Candidate-Elimination algorithm computes the description of this set without
explicitly enumerating all of its members
C. This is accomplished by using the more-general-than partial ordering and
maintaining a compact representation of the set of consistent hypotheses
D. All of these
Correct option is D

139. Concept learning is basically acquiring the definition of a general category
from given sample positive and negative training examples of the category
A. TRUE
B. FALSE
Correct option is A

140. The hypothesis h1 is more-general-than hypothesis h2 (h1 > h2) if and
only if h1 ≥ h2 is true and h2 ≥ h1 is false. We also say h2 is more-specific-than h1
A. The statement is true
B. The statement is false
C. We cannot say
D. None of these
Correct option is A

141. The List-Then-Eliminate Algorithm


A. The List-Then-Eliminate algorithm initializes the version space to contain all
hypotheses in H, then eliminates any hypothesis found inconsistent with any
training example
B. The List-Then-Eliminate algorithm does not initialize the version space
C. None of these
Correct option is A

142. What will take place as the agent observes its interactions with the world?
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A

143. Which modifies the performance element so that it makes better
decisions?
A. Performance element
B. Changing element
C. Learning element
D. None of the mentioned
Correct option is C

144. Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target
function well over other unobserved example is called:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these

Correct option is A

145. Feature of ANN in which ANN creates its own organization or


representation of information it receives during learning time is
A. Adaptive Learning
B. Self Organization
C. What-If Analysis
D. Supervised Learning
Correct option is B

146. How does the decision tree reach its decision?


A. Single test
B. Two test
C. Sequence of test
D. No test
Correct option is C

147. Which of the following is a disadvantage of decision trees?


A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be overfit
D. None of the above
Correct option is C

148. Tree/Rule based classification algorithms generate which rule to perform


the classification.
A. if-then.
B. then
C. do
D. Answer
Correct option is A

149. What is Gini Index?


A. It is a type of index structure
B. It is a measure of purity
C. None of the options
Correct option is B
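
In decision-tree induction the Gini index is commonly computed as an impurity score, 1 minus the sum of squared class proportions, so a pure node scores 0. A minimal sketch follows; the function name `gini_impurity` is an assumption for the example.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure node -> 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly mixed node -> 0.5
```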

150. What is not a RNN in machine learning?


A. One output to many inputs
B. Many inputs to a single output
C. RNNs for nonsequential input
D. Many inputs to many outputs
Correct option is A

151. Which of the following sentences are correct in reference to Information


gain?
A. It is biased towards multi-valued attributes
B. ID3 makes use of information gain
C. The approach used by ID3 is greedy
D. All of these
Correct option is D
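
The bias of information gain towards multi-valued attributes can be seen in a small sketch. The function names, the dictionary row format, and the four-row dataset are assumptions for illustration: a unique `id` column splits the data into pure single-row groups and so earns the maximum gain, even though it is useless for prediction.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

rows = [
    {"id": 1, "outlook": "sunny", "play": "yes"},
    {"id": 2, "outlook": "sunny", "play": "no"},
    {"id": 3, "outlook": "rainy", "play": "yes"},
    {"id": 4, "outlook": "rainy", "play": "no"},
]
print(information_gain(rows, "outlook", "play"))  # uninformative split
print(information_gain(rows, "id", "play"))       # one row per value: maximal gain
```

Gain ratio corrects for this by dividing the gain by the entropy of the split itself, which is large when an attribute has many distinct values.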

152. A Neural Network can answer


A. For Loop questions
B. what-if questions
C. If-Then-Else analysis questions
D. None of these
Correct option is B

153. Artificial neural network used for


A. Pattern Recognition
B. Classification
C. Clustering
D. All of these
Correct option is D

154. Which of the following are the advantage/s of Decision Trees?


A. Possible Scenarios can be added
B. Use a white box model, If given result is provided by a model
C. Worst, best and expected values can be determined for different scenarios
D. All of the mentioned
Correct option is D

155. What is the mathematical likelihood that something will occur?


A. Classification
B. Probability
C. Naïve Bayes Classifier
D. None of the other
Correct option is B

156. What does the Bayesian network provide?

A. Complete description of the domain
B. Partial description of the domain
C. Complete description of the problem
D. None of the mentioned
Correct option is A

157. Where can the Bayes rule be used?

A. Solving queries
B. Increasing complexity
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D

158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B

159. What is needed to make probabilistic systems feasible in the world?


A. Reliability
B. Crucial robustness
C. Feasibility
D. None of the mentioned
Correct option is B

160. It was shown that the Naive Bayesian method


A. Can be much more accurate than the optimal Bayesian method
B. Is always worse off than the optimal Bayesian method
C. Can be almost optimal only when attributes are independent
D. Can be almost optimal when some attributes are dependent
Correct option is C

161. What is the consequence between a node and its predecessors while
creating Bayesian network?
A. Functionally dependent
B. Dependant
C. Conditionally independent
D. Both Conditionally dependant & Dependant
Correct option is C

162. How can the compactness of the Bayesian network be described?


A. Locally structured
B. Fully structured
C. Partial structure
D. All of the mentioned
Correct option is A

163. How can the entries in the full joint probability distribution be calculated?
A. Using variables

B. Using information
C. Both Using variables & information
D. None of the mentioned
Correct option is B

164. How can the Bayesian network be used to answer any query?
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the mentioned
Correct option is B

165. Sample Complexity is


A. The sample complexity is the number of training-samples that we need to
supply to the algorithm, so that the function returned by the algorithm is
within an arbitrarily small error of the best possible function, with probability
arbitrarily close to 1
B. How many training examples are needed for learner to converge to a
successful hypothesis.
C. All of these
Correct option is C

166. PAC stands for


A. Probably Approximately Correct
B. Probability Applied Correctly
C. Partition Approximately Correct
Correct option is A

167. Which of the following will be true about k in k-NN in terms of variance?
A. When you increase k, the variance will increase
B. When you decrease k, the variance will increase
C. Can't say
D. None of these
Correct option is B

168. Which of the following option is true about k-NN algorithm?


A. It can be used for classification
B. It can be used for regression
C. It can be used in both classification and regression
Correct option is C

169. In k-NN it is very likely to overfit due to the curse of dimensionality. Which
of the following option would you consider to handle such problem? 1).
Dimensionality Reduction 2). Feature selection

A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

170. When you find noise in data which of the following option would you
consider in k- NN
A. I will increase the value of k
B. I will decrease the value of k
C. Noise can not be dependent on value of k
D. None of these
Correct option is A

171. Which of the following will be true about k in k-NN in terms of Bias?
A. When you increase k, the bias will increase
B. When you decrease k, the bias will increase
C. Can't say
D. None of these
Correct option is A

172. What is used to mitigate overfitting in a test set?


A. Overfitting set
B. Training set
C. Validation dataset
D. Evaluation set
Correct option is C

173. A radial basis function is a


A. Activation function
B. Weight
C. Learning rate
D. none
Correct option is A

174. Mistake Bound is


A. How many training examples are needed for learner to converge to a successful
hypothesis.
B. How much computational effort is needed for a learner to converge to a
successful hypothesis
C. How many training examples will the learner misclassify before converging to a
successful hypothesis
D. None of these
Correct option is C

175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D

176. Adding more basis functions in a linear model… (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A

177. Which of these are types of crossover


A. Single point
B. Two point
C. Uniform
D. All of these
Correct option is D

178. A feature F1 can take certain value: A, B, C, D, E, & F and represents grade
of students from a college. Which of the following statement is true in following
case?
A. Feature F1 is an example of a nominal variable
B. Feature F1 is an example of an ordinal variable
C. It doesn't belong to any of the above categories.
Correct option is B

179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is C

180. Genetic algorithms are heuristic methods that do not guarantee an


optimal solution to a problem
A. TRUE

B. FALSE
Correct option is A

181. Which of the following statements about regularization is correct?


A. Using too large a value of lambda can cause your hypothesis to underfit the data
B. Using too large a value of lambda can cause your hypothesis to overfit the data
C. Using a very large value of lambda cannot hurt the performance of your
hypothesis.
D. None of the above
Correct option is A

182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A

183. Genetic Algorithm are a part of


A. Evolutionary Computing
B. inspired by Darwin’s theory about evolution – “survival of the fittest”
C. are adaptive heuristic search algorithm based on the evolutionary ideas of
natural selection and genetics
D. All of the above
Correct option is D

184. Genetic algorithms belong to the family of methods in the


A. artificial intelligence area
B. optimization
C. complete enumeration family of methods
D. Non-computer based (human) solutions area
Correct option is A

185. For a two player chess game, the environment encompasses the opponent
A. True
B. False
Correct option is A

186. Which among the following is not a necessary feature of a reinforcement


learning solution to a learning problem?
A. exploration versus exploitation dilemma
B. trial and error approach to learning
C. learning based on rewards

D. representation of the problem as a Markov Decision Process


Correct option is D

187. Which of the following sentence is FALSE regarding reinforcement learning


A. It relates inputs to
B. It is used for
C. It may be used for
D. It discovers causal relationships.
Correct option is D

188. The EM algorithm is guaranteed to never decrease the value of its


objective function on any iteration
A. TRUE
B. FALSE
Correct option is A

189. Consider the following modification to the tic-tac-toe game: at the end of
game, a coin is tossed and the agent wins if a head appears regardless of
whatever has happened in the game.Can reinforcement learning be used to learn
an optimal policy of playing Tic-Tac-Toe in this case?
A. Yes
B. No
Correct option is B

190. Out of the two repeated steps in the EM algorithm, step 2 is ______

A. the maximization step
B. the minimization step
C. the optimization step
D. the normalization step
Correct option is A

191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
Correct option is A

192. A chess agent trained by using Reinforcement Learning can be trained by
playing against a copy of the same agent
A. True

B. False
Correct option is A

193. The EM iteration alternates between performing an expectation (E) step,
which creates a function for the expectation of the log-likelihood evaluated using
the current estimate for the parameters, and a maximization (M) step, which
computes parameters maximizing the expected log-likelihood found on the E step
A. TRUE
B. FALSE
Correct option is A

194. Expectation–maximization (EM) algorithm is an


A. Iterative
B. Incremental
C. None
Correct option is A

195. Features that need to be identified in a well-posed learning problem:


A. Class of tasks
B. Performance measure
C. Training experience
D. All of these
Correct option is D

196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing checkers
B. Experience obtained by playing games against itself
C. Both a & b
D. None of these
Correct option is C

197. Learning symbolic representations of concepts known as:


A. Artificial Intelligence
B. Machine Learning
C. Both a & b
D. None of these
Correct option is A

198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning

D. Both a & b
Correct option is A

199. The autonomous acquisition of knowledge through the use of computer


programs is called
A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

200. Learning that enables massive quantities of data is known as


A. Artificial Intelligence
B. Machine Learning
C. Deep learning
D. All of these
Correct option is B

201. A different learning method does not include


A. Memorization
B. Analogy
C. Deduction
D. Introduction
Correct option is D

202. Types of learning used in machine learning


A. Supervised
B. Unsupervised
C. Reinforcement
D. All of these
Correct option is D

203. A computer program is said to learn from experience E with respect to


some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience
A. Supervised learning problem
B. Un Supervised learning problem
C. Well posed learning problem
D. All of these
Correct option is C

204. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree

B. Regression
C. Classification
D. Random Forest
Correct option is D

205. How many types are available in machine learning?


A. 1
B. 2
C. 3
D. 4
Correct option is C

205. A model can learn based on the rewards it received for its previous action
is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C

206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
A. Artificial Intelligence
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C

207. A learning method in which the training data contains a small amount of
labeled data and a large amount of unlabeled data is known as
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
Correct option is B

208. Methods used for the calibration in Supervised Learning


A. Platt Calibration
B. Isotonic Regression
C. All of these
D. None of above
Correct option is C

209. The basic design issues for designing a learning system


A. Choosing the Training Experience
B. Choosing the Target Function
C. Choosing a Function Approximation Algorithm
D. Estimating Training Values
E. All of these
Correct option is E

210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C

211. A learning method in which multiple models, such as classifiers or experts, are
strategically generated and combined to solve a particular computational
problem is called
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

212. In a learning system, the component that takes as input the current
hypothesis (currently learned function) and outputs a new problem for the
Performance System to explore is known as:
A. Critic
B. Generalizer
C. Performance system
D. Experiment generator
E. All of these
Correct option is D

213. Learning method that is used to improve the classification, prediction,


function approximation etc of a model
A. Supervised Learning
B. Semi Supervised Learning
C. Unsupervised Learning
D. Reinforcement Learning
E. Ensemble learning
Correct option is E

214. In a learning system the component that takes as input the history or trace
of the game and produces as output a set of training examples of the target
function is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is A

215. The most common issue when using ML is


A. Lack of skilled resources
B. Inadequate Infrastructure
C. Poor Data Quality
D. None of these
Correct option is C

216. How to ensure that your model is not over fitting


A. Cross validation
B. Regularization
C. All of these
D. None of these
Correct option is C

217. A way to ensemble multiple classification or regression models


A. Stacking
B. Bagging
C. Blending
D. Boosting
Correct option is A

218. How well a model is going to generalize in new environment is known as


A. Data Quality
B. Transparent
C. Implementation
D. None of these
Correct option is B

219. Common classes of problems in machine learning is


A. Classification
B. Clustering
C. Regression
D. All of these
Correct option is D

220. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D

221. Cost complexity pruning algorithm is used in?


A. CART
B. C4.5
C. ID3
D. All of these
Correct option is A

222. Which one of these is not a tree based learner?


A. CART
B. C4.5
C. ID3
D. Bayesian Classifier
Correct option is D

223. Which one of these is a tree based learner?


A. Rule based
B. Bayesian Belief Network
C. Bayesian classifier
D. Random Forest
Correct option is D

224. What is the approach of basic algorithm for decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A

225. Which of the following classifications would best suit the student
performance classification systems?
A. If-then analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A

226. What are the two approaches to tree pruning?


A. Pessimistic pruning and Optimistic pruning
B. Post pruning and Pre pruning
C. Cost complexity pruning and time complexity pruning
D. None of these
Correct option is B

227. How will you counter over-fitting in decision tree?


A. By pruning the longer rules
B. By creating new rules
C. Both 'By pruning the longer rules' and 'By creating new rules'
D. None of these
Correct option is A

228. Which of the following sentences are true?


A. In pre-pruning a tree is ‘pruned’ by halting its construction early
B. A pruning set of class-labeled tuples is used to estimate cost complexity
C. The best pruned tree is the one that minimizes the number of encoding bits
D. All of these
Correct option is D

229. Which of the following is a disadvantage of decision trees?


A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to be over fit
D. None of the above
Correct option is C

230. In which of the following scenario a gain ratio is preferred over


Information Gain?
A. When a categorical variable has very large number of category
B. When a categorical variable has very small number of category
C. Number of categories is not the reason
D. None of these
Correct option is A

231. Major pruning techniques used in decision tree are


A. Minimum error
B. Smallest tree
C. Both a & b
D. None of these
Correct option is C

232. What does the central limit theorem state?

A. If the sample size increases, then the sampling distribution must approach a
normal distribution
B. If the sample size decreases, then the sampling distribution must approach a
normal distribution
C. If the sample size increases, then the sampling distribution must approach an
exponential distribution
D. If the sample size decreases, then the sampling distribution must approach
an exponential distribution
Correct option is A

233. The difference between the expected value of the sample estimate and the true
value of the parameter is called?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A

234. In which of the following types of sampling the information is carried out
under the opinion of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D

235. Which of the following is a subset of population?


A. Distribution
B. Sample
C. Data
D. Set
Correct option is B

236. The sampling error is defined as?


A. Difference between population and parameter
B. Difference between sample and parameter
C. Difference between population and sample
D. Difference between parameter and sample
Correct option is C

237. Machine learning is interested in the best hypothesis h from some space
H, given observed training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis

C. Most specific hypothesis


D. None of these
Correct option is B

238. Practical difficulties with Bayesian Learning :


A. Initial knowledge of many probabilities is required
B. No consistent hypothesis
C. Hypotheses make probabilistic predictions
D. None of these
Correct option is A

239. Bayes’ theorem states that the relationship between the probability of the
hypothesis before getting the evidence P(H) and the probability of the hypothesis
after getting the evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A

240. A doctor knows that Cold causes fever 50% of the time. Prior probability of
any patient having cold is 1/50,000. Prior probability of any patient having fever is
1/20. If a patient has fever, what is the probability he/she has cold?
A. P(C|F) = 0.0003
B. P(C|F) = 0.0004
C. P(C|F) = 0.0002
D. P(C|F) = 0.0045
Correct option is C
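
The arithmetic behind this answer can be checked directly with Bayes' theorem, P(C|F) = P(F|C) P(C) / P(F). The sketch below uses exact fractions to avoid rounding; the function name `posterior` is an assumption for the example.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_fever_given_cold = Fraction(1, 2)   # cold causes fever 50% of the time
p_cold = Fraction(1, 50000)           # prior probability of cold
p_fever = Fraction(1, 20)             # prior probability of fever

p_cold_given_fever = posterior(p_fever_given_cold, p_cold, p_fever)
print(p_cold_given_fever, float(p_cold_given_fever))  # -> 1/5000 0.0002
```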

241. Which of the following will be true about k in K-Nearest Neighbor in terms
of Bias?
A. When you increase k, the bias will increase
B. When you decrease k, the bias will increase
C. Can't say
D. None of these
Correct option is A

242. When you find noise in data which of the following option would you
consider in K- Nearest Neighbor?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A


243. In K-Nearest Neighbor it is very likely to overfit due to the curse of
dimensionality. Which of the following options would you consider to handle this
problem?
1. Dimensionality reduction
2. Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C

244. Radial basis function learning is closely related to distance-weighted regression,
but it is
A. lazy learning
B. eager learning
C. concept learning
D. none of these
Correct option is B

245. Radial basis function networks provide a global approximation to the
target function, represented by ______ of many local kernel functions.
A. a series combination
B. a linear combination
C. a parallel combination
D. a non linear combination
Correct option is B

246. The most significant phase in a genetic algorithm is


A. Crossover
B. Mutation
C. Selection
D. Fitness function
Correct option is A

247. The crossover operator produces two new offspring from


A. Two parent strings, by copying selected bits from each parent
B. One parent strings, by copying selected bits from selected parent
C. Two parent strings, by copying selected bits from one parent
D. None of these
Correct option is A
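
Question 247 describes crossover as copying selected bits from each of two parent strings. A minimal sketch of one common variant, single-point crossover, is shown below; the function name and the example bit strings are assumptions chosen for illustration, and other variants (two-point, uniform) select the bits differently.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Return two offspring built by swapping the tails of two parents.

    Bits before `point` come from one parent, bits from `point` onward
    come from the other parent.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

child_1, child_2 = single_point_crossover("11101001000", "00001010101", 5)
print(child_1)  # 11101010101
print(child_2)  # 00001001000
```

Each offspring carries bits from both parents, which is what distinguishes option A from options B and C.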

248. The evolution over time of the population within a GA can be
mathematically characterized based on the concept of
A. Schema


B. Crossover
C. Don't care
D. Fitness function
Correct option is A

249. In a genetic algorithm, the process of selecting parents which mate and
recombine to create offspring for the next generation is known as:
A. Tournament selection
B. Rank selection
C. Fitness sharing
D. Parent selection
Correct option is D

250. Crossover operations are performed in genetic programming by replacing


A. Randomly chosen sub tree of one parent program by a sub tree from the
other parent program.
B. Randomly chosen root node tree of one parent program by a sub tree from
the other parent program
C. Randomly chosen root node tree of one parent program by a root node tree
from the other parent program
D. None of these
Correct option is A


What is Machine Learning (ML)?

The autonomous acquisition of knowledge through the use of manual programs

The selective acquisition of knowledge through the use of computer programs

The selective acquisition of knowledge through the use of manual programs

The autonomous acquisition of knowledge through the use of computer programs

Correct option is D

Father of Machine Learning (ML)

Geoffrey Chaucer

Geoffrey Hill

Geoffrey Everest Hinton

None of the above

Correct option is C

Which is FALSE regarding regression?

It may be used for interpretation

It is used for prediction

It discovers causal relationships

It relates inputs to outputs

Correct option is C

Choose the correct option regarding machine learning (ML) and artificial intelligence (AI)

ML is a set of techniques that turns a dataset into a software

AI is a software that can emulate the human mind

ML is an alternate way of programming intelligent machines

All of the above


Correct option is D

Which of the following is not a factor affecting the performance of a learner system?

Good data structures

Representation scheme used

Training scenario

Type of feedback

Correct option is A

In general, to have a well-defined learning problem, we must identify which of the following?

The class of tasks

The measure of performance to be improved

The source of experience

All of the above

Correct option is D

Successful applications of ML

Learning to recognize spoken words

Learning to drive an autonomous vehicle

Learning to classify new astronomical structures

Learning to play world-class backgammon

All of the above

Correct option is E

Which of the following is not a learning method?

Analogy

Introduction

Memorization

Deduction


Correct option is B

In language understanding, which of the following is not a level of knowledge?

Empirical

Logical

Phonological

Syntactic

Correct option is A

Designing a machine learning approach involves:-

Choosing the type of training experience

Choosing the target function to be learned

Choosing a representation for the target function

Choosing a function approximation algorithm

All of the above

Correct option is E

Concept learning infers a ______-valued function from training examples of its input and output.

Decimal

Hexadecimal

Boolean

All of the above

Correct option is C

Which of the following is not a supervised learning?

Naïve Bayesian

PCA


Linear Regression

Decision Tree

Correct option is B

What is Machine Learning?

(i) Artificial Intelligence

(ii) Deep Learning

(iii) Data Statistics

Only (i)

(i) and (ii)

All

None

Correct option is B

What kind of learning algorithm is used for “facial identities or facial expressions”?

Prediction

Recognizing Patterns

Generating Patterns

Recognizing Anomalies

Correct option is B

Which of the following is not a type of learning?

Unsupervised Learning

Supervised Learning

Semi-unsupervised Learning

Reinforcement Learning

Correct option is C


Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot Navigation are applications of
which of the following?

Supervised Learning: Classification

Reinforcement Learning

Unsupervised Learning: Clustering

Unsupervised Learning: Regression

Correct option is B

Targeted marketing, Recommender Systems, and Customer Segmentation are applications of which of
the following?

Supervised Learning: Classification

Unsupervised Learning: Clustering

Unsupervised Learning: Regression

Reinforcement Learning

Correct option is B

Fraud Detection, Image Classification, Diagnostic, and Customer Retention are applications in which of
the following

Unsupervised Learning: Regression

Supervised Learning: Classification

Unsupervised Learning: Clustering

Reinforcement Learning

Correct option is B

Which of the following is not a symbolic function representation in machine
learning?

Rules in propositional logic

Hidden-Markov Models (HMM)

Rules in first-order predicate logic

Decision Trees


Correct option is B

Which of the following is not a numerical function representation in machine
learning?

Neural Network

Support Vector Machines

Case-based

Linear Regression

Correct option is C

The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only ______ examples

Negative

Positive

Negative or Positive

None of the above

Correct option is B

The FIND-S algorithm ignores ______ examples

Negative

Positive

Both

None of the above

Correct option is A
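
The two FIND-S questions above (generalize on positive examples, ignore negative ones) can be illustrated with a short sketch. It assumes hypotheses are conjunctions of attribute constraints where `"?"` means "any value"; the function name and the toy weather data are illustrative assumptions, not part of the question bank.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positives.

    `examples` is a list of (attribute_tuple, label) pairs. The hypothesis
    starts as None (matches nothing) and is generalized only as far as
    each positive example requires. Negative examples are skipped.
    """
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive: copy it exactly
        else:
            # Keep constraints that still match, relax the rest to "?".
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
learned = find_s(training_data)
print(learned)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Because negatives are never inspected, the result is only trustworthy when the training set is consistent, which is the drawback a later question in this set points out.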

The Candidate-Elimination Algorithm represents the ______.

Solution Space

Version Space

Elimination Space

All of the above


Correct option is B

Inductive learning is based on the knowledge that if something happens a lot it is likely to be generally true.

True

False

Correct option is A

Inductive learning takes examples and generalizes rather than starting with ______ knowledge

Inductive

Existing

Deductive

None of these

Correct option is B

A drawback of the FIND-S is that it assumes the consistency within the training set

True

False

Correct option is A

What strategies can help reduce overfitting in decision trees?

(i) Enforce a maximum depth for the tree

(ii) Enforce a minimum number of samples in leaf nodes

(iii) Pruning

(iv) Make sure each leaf node is one pure class

All

(i), (ii) and (iii)

(i), (iii), (iv)

None

Correct option is B


Which of the following is a widely used and effective machine learning algorithm based on the idea of
bagging?

Decision Tree

Random Forest

Regression

Classification

Correct option is B

To find the minimum or the maximum of a function, we set the gradient to zero because which of the
following

Depends on the type of problem

The value of the gradient at extrema of a function is always zero

Both (A) and (B)

None of these

Correct option is B

Which of the following is a disadvantage of decision trees?

Decision trees are prone to overfitting

Decision trees are robust to outliers

Factor analysis

None of the above

Correct option is A

What is perceptron?

A single layer feed-forward neural network with pre-processing

A neural network that contains feedback

A double layer auto-associative neural network

An auto-associative neural network


Correct option is A

Which of the following is true for neural networks?

(i) The training time depends on the size of the network

(ii) Neural networks can be simulated on a conventional computer

(iii) Artificial neurons are identical in operation to biological ones

All

Only (ii)

(i) and (ii)

None

Correct option is C


What are the advantages of neural networks over conventional computers?

(i) They have the ability to learn by example

(ii) They are more fault tolerant

(iii) They are more suited for real time operation due to their high 'computational' rates

(i) and (ii)

(i) and (iii)

Only (i)

All

None

Correct option is D

What is Neuro software?

It is software used by Neurosurgeon

Designed to aid experts in real world

It is powerful and easy neural network


A software used to analyze neurons

Correct option is C

Which is true for neural networks?

Each node computes its weighted input

Node could be in excited state or non-excited state

It has set of nodes and connections

All of the above

Correct option is D

What is the objective of backpropagation algorithm?

To develop learning algorithm for multilayer feedforward neural network, so that network can be
trained to capture the mapping implicitly

To develop learning algorithm for multilayer feedforward neural network

To develop learning algorithm for single layer feedforward neural network

All of the above

Correct option is A

Which of the following is true?

Single layer associative neural networks do not have the ability to:-

(i) Perform pattern recognition

(ii) Find the parity of a picture

(iii) Determine whether two or more shapes in a picture are connected or not

(ii) and (iii)

Only (ii)

All

None

Correct option is A


The backpropagation law is also known as generalized delta rule

True

False

Correct option is A

Which of the following is true?

(i) On average, neural networks have higher computational rates than conventional computers.

(ii) Neural networks learn by example

(iii) Neural networks mimic the way the human brain works

All

(ii) and (iii)

(i), (ii) and (iii)

None

Correct option is A

What is true regarding backpropagation rule?

Error in output is propagated backwards only to determine weight updates

There is no feedback of signal at any stage

It is also called generalized delta rule

All of the above

Correct option is D

There is feedback in final stage of backpropagation

True

False

Correct option is B

An auto-associative network is


A neural network that has only one loop

A neural network that contains feedback

A single layer feed-forward neural network with pre-processing

A neural network that contains no loops

Correct option is B

A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the constant of
proportionality being equal to 3. The inputs are 4, 8 and 5 respectively. What will be the output?

139

153

162

160

Correct option is B
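
The answer to the 3-input neuron question follows from taking the weighted sum of the inputs and multiplying by the constant of proportionality of the linear transfer function. A minimal sketch, with an illustrative function name and no bias term (the question does not mention one):

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function f(s) = k * s."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum

# Values from the question: weights 1, 4, 3; inputs 4, 8, 5; constant 3.
output = linear_neuron([1, 4, 3], [4, 8, 5], 3)
print(output)  # 153
```

The weighted sum is 1·4 + 4·8 + 3·5 = 51, and 3 × 51 = 153.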

Which of the following is true regarding the backpropagation rule?

Hidden layer outputs are not important at all; they are only meant for supporting input and output layers

Actual output is determined by computing the outputs of units for each hidden layer

It is a feedback neural network

None of the above

Correct option is B

What is back propagation?

It is another name given to the curvy function in the perceptron

It is the transmission of error back through the network to allow weights to be adjusted so that the
network can learn

It is another name given to the curvy function in the perceptron

None of the above

Correct option is B


The general limitations of back propagation rule is/are

Scaling

Slow convergence

Local minima problem

All of the above

Correct option is D

What is the meaning of generalized in statement “backpropagation is a generalized delta rule” ?

Because delta is applied to only input and output layers, thus making it more simple and generalized

It has no significance

Because delta rule can be extended to hidden layer units

None of the above

Correct option is C

Neural networks are complex ______ functions with many parameters

Linear

Non linear

Discrete

Exponential

Correct option is A

The general tasks that are performed with backpropagation algorithm

Pattern mapping

Prediction

Function approximation

All of the above

Correct option is D

Backpropagation learning is based on gradient descent along the error surface.


True

False

Correct option is A

In backpropagation rule, how to stop the learning process?

No heuristic criteria exist

On basis of average gradient value

There is convergence involved

None of these

Correct option is B

Applications of NN (Neural Network)

Risk management

Data validation

Sales forecasting

All of the above

Correct option is D

The network that involves backward links from output to the input and hidden layers is known as

Recurrent neural network

Self organizing maps

Perceptrons

Single layered perceptron

Correct option is A

Decision Tree is a display of an Algorithm?

True

False

Correct option is A


Which of the following is/are the decision tree nodes?

End Nodes

Decision Nodes

Chance Nodes

All of the above

Correct option is D

End Nodes are represented by which of the following

Solar street light

Triangles

Circles

Squares

Correct option is B

Decision Nodes are represented by which of the following

Solar street light

Triangles

Circles

Squares

Correct option is D

Chance Nodes are represented by which of the following

Solar street light

Triangles

Circles

Squares

Correct option is C


Advantage of Decision Trees

Possible Scenarios can be added

Use a white box model, if given result is provided by a model

Worst, best and expected values can be determined for different scenarios

All of the above

Correct option is D

Terms are required for building a bayes model.

Correct option is C

Which of the following is the consequence between a node and its predecessors while creating bayesian
network?

Conditionally independent

Functionally dependent

Both Conditionally dependant & Dependant

Dependent

Correct option is A

What is needed to make probabilistic systems feasible in the world?

Feasibility

Reliability

Crucial robustness

None of the above

Correct option is C


Bayes rule can be used for:-

Solving queries

Increasing complexity

Answering probabilistic query

Decreasing complexity

Correct option is C

Which of the following provides ways and means of weighing up the desirability of goals and the likelihood of achieving them?

Utility theory

Decision theory

Bayesian networks

Probability theory

Correct option is A

Which of the following is provided by a Bayesian network?

Complete description of the problem

Partial description of the domain

Complete description of the domain

All of the above

Correct option is C

Probability provides a way of summarizing the ______ that comes from our laziness and ignorance.

Belief

Uncertainty

Joint probability distributions

Randomness

Correct option is B


The entries in the full joint probability distribution can be calculated as

Using variables

Both Using variables & information

Using information

All of the above

Correct option is C

Causal chain (for example, smoking causes cancer) gives rise to:-

Conditional independence

Conditional dependence

Both

None of the above

Correct option is A

The bayesian network can be used to answer any query by using:-

Full distribution

Joint distribution

Partial distribution

All of the above

Correct option is B

Bayesian networks allow compact specification of:-

Joint probability distributions

Belief

Propositional logic statements

All of the above

Correct option is A

The compactness of the bayesian network can be described by


Fully structured

Locally structured

Partially structured

All of the above

Correct option is B

The Expectation-Maximization Algorithm has been used to identify conserved domains in unaligned
proteins only. State True or False.

True

False

Correct option is B

Which of the following is correct about the Naïve Bayes?

Assumes that all the features in a dataset are independent

Assumes that all the features in a dataset are equally important

Both

All of the above

Correct option is C

Which of the following is false regarding EM Algorithm?

The alignment provides an estimate of the base or amino acid composition of each column in the site

The column-by-column composition of the site already available is used to estimate the probability of
finding the site at any position in each of the sequences

The row-by-column composition of the site already available is used to estimate the probability

None of the above

Correct option is C

Naïve Bayes Algorithm is a ______ learning algorithm.

Supervised


Reinforcement

Unsupervised

None of these

Correct option is A

The EM algorithm includes two repeated steps; step 2 is ______.

The normalization

The maximization step

The minimization step

None of the above

Correct option is B

Examples of Naïve Bayes Algorithm is/are

Spam filtration

Sentiment analysis

Classifying articles

All of the above

Correct option is D

In the intermediate steps of the “EM Algorithm”, the number of each base in each column is determined
and then converted to fractions.

True

False

Correct option is A

Naïve Bayes algorithm is based on ______ and is used for solving classification problems.

Bayes Theorem

Candidate elimination algorithm

EM algorithm


None of the above

Correct option is A

Types of Naïve Bayes Model:

Gaussian

Multinomial

Bernoulli

All of the above

Correct option is D

Disadvantages of Naïve Bayes Classifier:

Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features

It performs well in multi-class predictions as compared to the other algorithms

Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets

It is the most popular choice for text classification problems.

Correct option is A

The benefit of Naïve Bayes:-

Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets

It is the most popular choice for text classification problems.

It can be used for binary as well as multi-class classification

All of the above

Correct option is D

In which of the following types of sampling the information is carried out under the opinion of an
expert?

Convenience sampling

Judgement sampling


Quota sampling

Purposive sampling

Correct option is B

Full form of MDL?

Minimum Description Length

Maximum Description Length

Minimum Domain Length

None of these

Correct option is A

For the analysis of ML algorithms, we need

Computational learning theory

Statistical learning theory

Both A & B

None of these

Correct option is C

PAC stands for

Probably Approximately Correct

Probably Approx Correct

Probably Approximate Computation

Probably Approx Computation

Correct option is A

The ______ of hypothesis h with respect to target concept c and distribution D is the probability that h
will misclassify an instance drawn at random according to D.

True Error


Type 1 Error

Type 2 Error

None of these

Correct option is A

Statement: True error defined over entire instance space, not just training data

True

False

Correct option is A

What areas does computational learning theory (CLT) comprise?

Sample Complexity

Computational Complexity

Mistake Bound

All of these

Correct option is D

What area of CLT tells “How many examples we need to find a good hypothesis ?”?

Sample Complexity

Computational Complexity

Mistake Bound

None of these

Correct option is A

What area of CLT tells “How much computational power we need to find a good hypothesis ?”?

Sample Complexity

Computational Complexity

Mistake Bound

None of these


Correct option is B

What area of CLT tells “How many mistakes we will make before finding a good hypothesis ?”?

Sample Complexity

Computational Complexity

Mistake Bound

None of these

Correct option is C

(For question no. 9 and 10) Can we say that concepts described by conjunctions of Boolean literals are
PAC learnable?

Yes

No

Correct option is A

How large is the hypothesis space when we have n Boolean attributes?

|H| = 3^n

|H| = 2^n

|H| = 1^n

|H| = 4^n

Correct option is A

The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis space H2. Which
of the following can be inferred from this?

The number of examples required for learning a hypothesis in H1 is larger than the number of examples
required for H2

The number of examples required for learning a hypothesis in H1 is smaller than the number of
examples required for H2

No relation to number of samples required for PAC learning.

Correct option is A


For a particular learning task, if the requirement of error parameter changes from 0.1 to 0.01. How
many more samples will be required for PAC learning?

Same

2 times

1000 times

10 times

Correct option is D
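
The "10 times" answer can be checked numerically. The sketch below assumes the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)), and uses |H| = 3^n for conjunctions of n Boolean literals as in the earlier question. The function name and the choices n = 10 and δ = 0.05 are illustrative assumptions.

```python
import math

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite H.

    Implements m >= (1 / epsilon) * (ln|H| + ln(1 / delta)).
    """
    total = math.log(hypothesis_space_size) + math.log(1 / delta)
    return math.ceil(total / epsilon)

n = 10                 # number of Boolean attributes (assumed)
h_size = 3 ** n        # each attribute: positive literal, negated, or absent
m_loose = pac_sample_bound(h_size, epsilon=0.1, delta=0.05)
m_tight = pac_sample_bound(h_size, epsilon=0.01, delta=0.05)
print(m_loose, m_tight, m_tight / m_loose)
```

Since ε appears only as the factor 1/ε, dividing ε by 10 multiplies the bound by 10, up to rounding from the ceiling.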

Computational complexity of classes of learning problems depends on which of the following?

The size or complexity of the hypothesis space considered by learner

The accuracy to which the target concept must be approximated

The probability that the learner will output a successful hypothesis

All of these

Correct option is D

The instance-based learner is a

Lazy-learner

Eager learner

Can't say

Correct option is A

When to consider nearest neighbour algorithms?

Instances map to points in R^n

Not more than 20 attributes per instance

Lots of training data

None of these

A, B & C

Correct option is E


What are the advantages of the nearest neighbour algorithm?

Training is very fast

Can learn complex target functions

Don't lose information

All of these

Correct option is D

What are the difficulties with the k-nearest neighbour algorithm?

Calculate the distance of the test case from all training cases

Curse of dimensionality

Both A & B

None of these

Correct opt


CS 189 Introduction to Machine Learning
Spring 2016 Final
• Please do not open the exam before you are instructed to do so.
• The exam is closed book, closed notes except your two-page cheat sheet.

• Electronic devices are forbidden on your person, including cell phones, iPods, headphones, and laptops.
Turn your cell phone off and leave all electronics at the front of the room, or risk getting a zero on
the exam.
• You have 3 hours.

• Please write your initials at the top right of each page (e.g., write “JS” if you are Jonathan Shewchuk). Finish
this by the end of your 3 hours.

• Mark your answers on front of each page, not the back. We will not scan the backs of each page, but you may
use them as scratch paper. Do not attach any extra sheets.
• The total number of points is 150. There are 30 multiple choice questions worth 3 points each, and 6 written
questions worth a total of 60 points.
• For multiple-choice questions, fill in the boxes for ALL correct choices: there may be more than one correct
choice, but there is always at least one correct choice. NO partial credit on multiple-choice questions: the
set of all correct answers must be checked.

First name

Last name

SID

First and last name of student to your left

First and last name of student to your right


Q1. [90 pts] Multiple Choice


Check the boxes for ALL CORRECT CHOICES. Every question should have at least one box checked. NO PARTIAL
CREDIT: the set of all correct answers (only) must be checked.

(1) [3 pts] What strategies can help reduce overfitting in decision trees?

[ ] Pruning
[ ] Enforce a minimum number of samples in leaf nodes
[ ] Make sure each leaf node is one pure class
[ ] Enforce a maximum depth for the tree

(2) [3 pts] Which of the following are true of convolutional neural networks (CNNs) for image analysis?

[ ] Filters in earlier layers tend to include edge detectors
[ ] They have more parameters than fully-connected networks with the same number of layers
    and the same numbers of neurons in each layer
[ ] Pooling layers reduce the spatial resolution of the image
[ ] A CNN can be trained for unsupervised learning tasks, whereas an ordinary neural net cannot

(3) [3 pts] Neural networks

[ ] optimize a convex cost function
[ ] always output values between 0 and 1
[ ] can be used for regression as well as classification
[ ] can be used in an ensemble

(4) [3 pts] Which of the following are true about generative models?

[ ] They model the joint distribution P(class = C AND sample = x)
[ ] The perceptron is a generative model
[ ] Linear discriminant analysis is a generative model
[ ] They can be used for classification

(5) [3 pts] Lasso can be interpreted as least-squares linear regression where

[ ] weights are regularized with the ℓ1 norm
[ ] the weights have a Gaussian prior
[ ] weights are regularized with the ℓ2 norm
[ ] the solution algorithm is simpler

(6) [3 pts] Which of the following methods can achieve zero training error on any linearly separable dataset?

[ ] Decision tree
[ ] 15-nearest neighbors
[ ] Hard-margin SVM
[ ] Perceptron

(7) [3 pts] The kernel trick

[ ] can be applied to every classification algorithm
[ ] is commonly used for dimensionality reduction
[ ] changes ridge regression so we solve a d × d linear system instead of an n × n system,
    given n sample points with d features
[ ] exploits the fact that in many learning algorithms, the weights can be written as a linear
    combination of input points


(8) [3 pts] Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with
exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum
possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?

[ ] 2
[ ] n
[ ] 3
[ ] n + 1

(9) [3 pts] In latent semantic indexing, we compute a low-rank approximation to a term-document matrix. Which
of the following motivate the low-rank reconstruction?

[ ] Finding documents that are related to each other, e.g. of a similar genre
[ ] The low-rank approximation provides a lossless method for compressing an input matrix
[ ] In many applications, some principal components encode noise rather than meaningful structure
[ ] Low-rank approximation enables discovery of nonlinear relations

(10) [3 pts] Which of the following are true about subset selection?

[ ] Subset selection can substantially decrease the bias of support vector machines
[ ] Subset selection can reduce overfitting
[ ] Ridge regression frequently eliminates some of the features
[ ] Finding the true best subset takes exponential time

(11) [3 pts] In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU

[ ] speed up the gradient calculation in backpropagation, as compared to linear units
[ ] help to learn nonlinear decision boundaries
[ ] are applied only to the output units
[ ] always output values between 0 and 1

(12) [3 pts] Suppose we are given data comprising points of several different classes. Each class has a different
probability distribution from which the sample points are drawn. We do not have the class labels. We use
k-means clustering to try to guess the classes. Which of the following circumstances would undermine its
effectiveness?

[ ] Some of the classes are not normally distributed
[ ] The variance of each distribution is small in all directions
[ ] Each class has the same mean
[ ] You choose k = n, the number of sample points

(13) [3 pts] Which of the following are true of spectral graph partitioning methods?

[ ] They find the cut with minimum weight
[ ] They minimize a quadratic function subject to one constraint: the partition must be balanced
[ ] They use one or more eigenvectors of the Laplacian matrix
[ ] The Normalized Cut was invented at Stanford

(14) [3 pts] Which of the following can help to reduce overfitting in an SVM classifier?

[ ] Use of slack variables
[ ] High-degree polynomial features
[ ] Normalizing the data
[ ] Setting a very low learning rate


(15) [3 pts] Which value of k in the k-nearest neighbors algorithm generates the solid decision boundary depicted
here? There are only 2 classes. (Ignore the dashed line, which is the Bayes decision boundary.)

[ ] k = 1
[ ] k = 2
[ ] k = 10
[ ] k = 100

(16) [3 pts] Consider one layer of weights (edges) in a convolutional neural network (CNN) for grayscale images,
connecting one layer of units to the next layer of units. Which type of layer has the fewest parameters to be
learned during training? (Select one.)

[ ] A convolutional layer with 10 3 × 3 filters
[ ] A convolutional layer with 8 5 × 5 filters
[ ] A max-pooling layer that reduces a 10 × 10 image to 5 × 5
[ ] A fully-connected layer from 20 hidden units to 4 output units
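
The layer types in question (16) can be compared by counting learnable weights. This is a rough sketch under stated assumptions: a single input channel (grayscale), bias terms ignored, and helper names that are purely illustrative. Including biases would add one parameter per filter or output unit without changing the ranking.

```python
def conv_weights(num_filters, height, width, in_channels=1):
    """Weights in a convolutional layer (biases ignored)."""
    return num_filters * height * width * in_channels

def dense_weights(n_in, n_out):
    """Weights in a fully-connected layer (biases ignored)."""
    return n_in * n_out

counts = {
    "conv, 10 filters of 3x3": conv_weights(10, 3, 3),   # 90
    "conv, 8 filters of 5x5": conv_weights(8, 5, 5),     # 200
    "max-pool 10x10 -> 5x5": 0,                          # nothing is learned
    "fully connected 20 -> 4": dense_weights(20, 4),     # 80
}
fewest = min(counts, key=counts.get)
print(fewest)  # max-pool 10x10 -> 5x5
```

Max-pooling applies a fixed operation, so it contributes no trainable parameters regardless of the image size.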

(17) [3 pts] In the kernelized perceptron algorithm with learning rate ε = 1, the coefficient a_i corresponding to a
training example x_i represents the weight for K(x_i, x). Suppose we have a two-class classification problem with
y_i ∈ {1, −1}. If y_i = 1, which of the following can be true for a_i?

[ ] a_i = −1
[ ] a_i = 1
[ ] a_i = 0
[ ] a_i = 5

(18) [3 pts] Suppose you want to split a graph G into two subgraphs. Let L be G’s Laplacian matrix. Which of the
following could help you find a good split?

[ ] The eigenvector corresponding to the second-largest eigenvalue of L
[ ] The left singular vector corresponding to the second-largest singular value of L
[ ] The eigenvector corresponding to the second-smallest eigenvalue of L
[ ] The left singular vector corresponding to the second-smallest singular value of L

(19) [3 pts] Which of the following are properties that a kernel matrix always has?

□ Invertible
□ All the entries are positive
□ At least one negative eigenvalue
■ Symmetric


(20) [3 pts] How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary
least squares regression? (Select one.)

□ Ridge has larger bias, larger variance
□ Ridge has smaller bias, larger variance
■ Ridge has larger bias, smaller variance
□ Ridge has smaller bias, smaller variance

(21) [3 pts] Both PCA and Lasso can be used for feature selection. Which of the following statements are true?

■ Lasso selects a subset (not necessarily a strict subset) of the original features
□ PCA and Lasso both allow you to specify how many features are chosen
■ PCA produces features that are linear combinations of the original features
□ PCA and Lasso are the same if you use the kernel trick

(22) [3 pts] Which of the following are true about forward subset selection?

□ $O(2^d)$ models must be trained during the algorithm, where d is the number of features
□ It finds the subset of features that give the lowest test error
■ It greedily adds the feature that most improves cross-validation accuracy
■ Forward selection is faster than backward selection if few features are relevant to prediction

(23) [3 pts] You’ve just finished training a random forest for spam classification, and it is getting abnormally bad
performance on your validation set, but good performance on your training set. Your implementation has no
bugs. What could be causing the problem?

■ Your decision trees are too deep
■ You have too few trees in your ensemble
■ You are randomly sampling too many features when you choose a split
■ Your bagging implementation is randomly sampling sample points without replacement
(24) [3 pts] Consider training a decision tree given a design matrix
$X = \begin{bmatrix} 6 & 3 \\ 2 & 7 \\ 9 & 6 \\ 4 & 2 \end{bmatrix}$ and labels $y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$. Let $f_1$ denote
feature 1, corresponding to the first column of X, and let $f_2$ denote feature 2, corresponding to the second
column. Which of the following splits at the root node gives the highest information gain? (Select one.)

□ $f_1 > 2$
□ $f_2 > 3$
■ $f_1 > 4$
□ $f_2 > 6$
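
The comparison in question (24) can be checked numerically. The sketch below is only an illustration: the helper names `entropy` and `information_gain` are hypothetical, and it assumes the design matrix rows (6, 3), (2, 7), (9, 6), (4, 2) with labels 1, 0, 1, 0 and entropy measured in bits.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def information_gain(X, y, feature, threshold):
    """Entropy of y minus the weighted entropy after the split X[feature] > threshold."""
    right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
    left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
    n = len(y)
    after = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(y) - after

# Assumed data from question (24).
X = [[6, 3], [2, 7], [9, 6], [4, 2]]
y = [1, 0, 1, 0]

# Candidate splits: name -> (feature index, threshold).
splits = {"f1 > 2": (0, 2), "f2 > 3": (1, 3), "f1 > 4": (0, 4), "f2 > 6": (1, 6)}
gains = {name: information_gain(X, y, f, t) for name, (f, t) in splits.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

Under these assumptions the split on the first feature at 4 separates the two classes completely, so it attains the full one bit of information gain.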

(25) [3 pts] In terms of the bias-variance decomposition, a 1-nearest neighbor classifier has ____ than a
3-nearest neighbor classifier.

■ higher variance
□ higher bias
□ lower variance
■ lower bias


(26) [3 pts] Which of the following are true about bagging?

■ In bagging, we choose random subsamples of the input points with replacement
□ The main purpose of bagging is to decrease the bias of learning algorithms.
□ Bagging is ineffective with logistic regression, because all of the learners learn exactly the same decision boundary
■ If we use decision trees that have one sample point per leaf, bagging never gives lower training error than one ordinary decision tree

(27) [3 pts] An advantage of searching for an approximate nearest neighbor, rather than the exact nearest neighbor,
is that

□ it sometimes makes exhaustive search much faster
□ the nearest neighbor classifier is sometimes much more accurate
□ you find all the points within a distance of $(1 + \epsilon) r$ from the query point, where $r$ is the distance from the query point to its nearest neighbor
■ it sometimes makes searching in a k-d tree much faster

(28) [3 pts] In the derivation of the spectral graph partitioning algorithm, we relax a combinatorial optimization
problem to a continuous optimization problem. This relaxation has the following effects.

■ The combinatorial problem requires an exact bisection of the graph, but the continuous algorithm can produce (after rounding) partitions that aren’t perfectly balanced
□ The combinatorial problem requires finding eigenvectors, whereas the continuous problem requires only matrix multiplication
□ The combinatorial problem cannot be modified to accommodate vertices that have different masses, whereas the continuous problem can
■ The combinatorial problem is NP-hard, but the continuous problem can be solved in polynomial time

(29) [3 pts] The firing rate of a neuron

□ determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
■ is more analogous to the output of a unit in a neural net than the output voltage of the neuron
□ only changes very slowly, taking a period of several seconds to make large adjustments
□ can sometimes exceed 30,000 action potentials per second

(30) [3 pts] In algorithms that use the kernel trick, the Gaussian kernel

■ gives a regression function or predictor function that is a linear combination of Gaussians centered at the sample points
□ is equivalent to lifting the d-dimensional sample points to points in a space whose dimension is exponential in d
■ is less prone to oscillating than polynomials, assuming the variance of the Gaussians is large
□ has good properties in theory but is rarely used in practice

(31) 3 bonus points! The following Berkeley professors were cited in this semester’s lectures (possibly self-cited)
for specific research contributions they made to machine learning.

□ David Culler
■ Michael Jordan
■ Jitendra Malik
■ Leo Breiman
□ Anca Dragan
■ Jonathan Shewchuk


Q2. [8 pts] Feature Selection


A newly employed former CS 189/289A student trains the latest Deep Learning classifier and obtains state-of-the-art
accuracy. However, the classifier uses too many features! The boss is overwhelmed and asks for a model with fewer
features.

Let’s try to identify the most important features. Start with a simple dataset in R².

(1) [4 pts] Describe the training error of a Bayes optimal classifier that can see only the first feature of the data.
Describe the training error of a Bayes optimal classifier that can see only the second feature.

The first feature yields a training error of 50% (like random guessing). The second feature offers a training error of
zero.

(2) [4 pts] Based on this toy example, the student decides to fit a classifier on each feature individually, then
rank the features by their classifier’s accuracy, take the best k features, and train a new classifier on those k
features. We call this approach variable ranking. Unfortunately, the classifier trained on the best k features
obtains horrible accuracy, unless k is very close to d, the original number of features!
Construct a toy dataset in R² for which variable ranking fails. In other words, a dataset where a variable is
useless by itself, but potentially useful alongside others. Use + for data points in Class 1, and O for data points
in Class 2.

An XOR dataset is unpredictable from either feature alone. (This extends to n dimensions, with the n-bit parity string.)
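
To make the failure of variable ranking concrete, here is a minimal sketch on the four XOR points. The helper `best_single_feature_accuracy` is a hypothetical name introduced for illustration; it assumes binary features and scores the best possible rule that looks at one feature only.

```python
# Four XOR points: the label is 1 exactly when the two features differ.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [a ^ b for a, b in X]

def best_single_feature_accuracy(feature):
    """Best training accuracy of any classifier that sees only one binary feature."""
    correct = 0
    for value in (0, 1):
        labels = [yi for xi, yi in zip(X, y) if xi[feature] == value]
        # The best rule predicts the majority label for each feature value.
        correct += max(labels.count(0), labels.count(1))
    return correct / len(y)

acc_f1 = best_single_feature_accuracy(0)
acc_f2 = best_single_feature_accuracy(1)

# With both features, a lookup table on (x1, x2) classifies every point correctly.
table = {xi: yi for xi, yi in zip(X, y)}
acc_both = sum(table[xi] == yi for xi, yi in zip(X, y)) / len(y)
print(acc_f1, acc_f2, acc_both)
```

Each feature alone does no better than guessing, so ranking features by individual accuracy would discard both, even though together they determine the label.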


Q3. [10 pts] Gradient Descent for k-means Clustering


Recall the loss function for k-means clustering with k clusters, sample points $x_1, \ldots, x_n$, and centers $\mu_1, \ldots, \mu_k$:

$$L = \sum_{j=1}^{k} \sum_{x_i \in S_j} \|x_i - \mu_j\|^2,$$

where $S_j$ refers to the set of data points that are closer to $\mu_j$ than to any other cluster mean.

(1) [4 pts] Instead of updating $\mu_j$ by computing the mean, let’s minimize $L$ with batch gradient descent while
holding the sets $S_j$ fixed. Derive the update formula for $\mu_1$ with learning rate (step size) $\epsilon$.

$$\frac{\partial L}{\partial \mu_1} = \frac{\partial}{\partial \mu_1} \sum_{x_i \in S_1} (x_i - \mu_1)^\top (x_i - \mu_1) = \sum_{x_i \in S_1} 2(\mu_1 - x_i).$$

Therefore the update formula is

$$\mu_1 \leftarrow \mu_1 + \epsilon \sum_{x_i \in S_1} (x_i - \mu_1).$$

(Note: writing $2\epsilon$ instead of $\epsilon$ is fine.)

(2) [2 pts] Derive the update formula for $\mu_1$ with stochastic gradient descent on a single sample point $x_i$. Use
learning rate $\epsilon$.
$\mu_1 \leftarrow \mu_1 + \epsilon (x_i - \mu_1)$ if $x_i \in S_1$, otherwise no change.

(3) [4 pts] In this part, we will connect the batch gradient descent update equation with the standard k-means
algorithm. Recall that in the update step of the standard algorithm, we assign each cluster center to be the
mean (centroid) of the data points closest to that center. It turns out that a particular choice of the learning
rate $\epsilon$ (which may be different for each cluster) makes the two algorithms (batch gradient descent and the
standard k-means algorithm) have identical update steps. Let’s focus on the update for the first cluster, with
center $\mu_1$. Calculate the value of $\epsilon$ so that both algorithms perform the same update for $\mu_1$. (If you do it right,
the answer should be very simple.)

In the standard algorithm, we assign $\mu_1 \leftarrow \sum_{x_i \in S_1} \frac{1}{|S_1|} x_i$.

Comparing to the answer in (1), we set $\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i = \mu_1 + \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$ and solve for $\epsilon$.

$$\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i - \sum_{x_i \in S_1} \frac{1}{|S_1|} \mu_1 = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$$

$$\frac{1}{|S_1|} \sum_{x_i \in S_1} (x_i - \mu_1) = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1).$$

Thus $\epsilon = \frac{1}{|S_1|}$.

(Note: answers that differ by a constant factor are fine if consistent with answer for (1).)
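
As a quick numerical check of part (3), the sketch below takes one batch gradient step with step size 1/|S1| and compares it with the centroid. The sample points, the starting center, and the function name `batch_step` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S1 = rng.normal(size=(5, 2))     # hypothetical points currently assigned to cluster 1
mu1 = np.array([3.0, -1.0])      # arbitrary current center

def batch_step(mu, points, eps):
    """One batch gradient step: mu <- mu + eps * sum_i (x_i - mu)."""
    return mu + eps * (points - mu).sum(axis=0)

mu_gd = batch_step(mu1, S1, eps=1.0 / len(S1))   # step size 1 / |S_1|
mu_kmeans = S1.mean(axis=0)                      # standard k-means update
print(mu_gd, mu_kmeans)
```

With this step size a single gradient step lands on the mean regardless of where the center started, which is why the two update rules coincide.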


Q4. [10 pts] Kernels


(1) [2 pts] What is the primary motivation for using the kernel trick in machine learning algorithms?
If we want to map sample points to a very high-dimensional feature space, the kernel trick can save us from
having to compute those features explicitly, thereby saving a lot of time.
(Alternative solution: the kernel trick enables the use of infinite-dimensional feature spaces.)

(2) [4 pts] Prove that for every design matrix $X \in \mathbb{R}^{n \times d}$, the corresponding kernel matrix is positive semidefinite.
For every vector $z \in \mathbb{R}^n$,
$$z^\top K z = z^\top X X^\top z = |X^\top z|^2,$$
which is clearly nonnegative.

(3) [2 pts] Suppose that a regression algorithm contains the following line of code.

$$w \leftarrow w + X^\top M X X^\top u$$

Here, $X \in \mathbb{R}^{n \times d}$ is the design matrix, $w \in \mathbb{R}^d$ is the weight vector, $M \in \mathbb{R}^{n \times n}$ is a matrix unrelated to X,
and $u \in \mathbb{R}^n$ is a vector unrelated to X. We want to derive a dual version of the algorithm in which we express
the weights w as a linear combination of samples $X_i$ (rows of X) and a dual weight vector a contains the
coefficients of that linear combination. Rewrite the line of code in its dual form so that it updates a correctly
(and so that w does not appear).

$$a \leftarrow a + M X X^\top u$$

(4) [2 pts] Can this line of code for updating a be kernelized? If so, show how. If not, explain why.
Yes:
$$a \leftarrow a + M K u$$
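
The relationship between the primal and dual updates can be sanity-checked numerically. This is a sketch under stated assumptions: random matrices stand in for X, M, and u, the kernel is the linear kernel K = XXᵀ, and the weights start at w = Xᵀa.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))      # design matrix (random stand-in)
M = rng.normal(size=(n, n))      # matrix unrelated to X
u = rng.normal(size=n)           # vector unrelated to X
a = rng.normal(size=n)           # dual weights
w = X.T @ a                      # primal weights implied by the dual weights

K = X @ X.T                      # linear kernel matrix

w_new = w + X.T @ M @ X @ X.T @ u   # primal update
a_new = a + M @ K @ u               # kernelized dual update

# The invariant w = X^T a is preserved by the pair of updates.
print(np.allclose(w_new, X.T @ a_new))
# K is positive semidefinite: its smallest eigenvalue is nonnegative up to rounding.
print(np.linalg.eigvalsh(K).min())
```

Because the dual update touches the sample points only through K, swapping in another kernel matrix requires no other change to the line.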


Q5. [12 pts] Let’s PCA


You are given a design matrix $X = \begin{bmatrix} 6 & -4 \\ -3 & 5 \\ -2 & 6 \\ 7 & -3 \end{bmatrix}$. Let’s use PCA to reduce the dimension from 2 to 1.

(1) [6 pts] Compute the covariance matrix for the sample points. (Warning: Observe that X is not centered.)
Then compute the unit eigenvectors, and the corresponding eigenvalues, of the covariance matrix. Hint: If
you graph the points, you can probably guess the eigenvectors (then verify that they really are eigenvectors).

The covariance matrix is $\dot{X}^\top \dot{X} = \begin{bmatrix} 82 & -80 \\ -80 & 82 \end{bmatrix}$, where $\dot{X}$ is the centered design matrix (the mean (2, 1) subtracted from each row of X).

Its unit eigenvectors are $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ with eigenvalue 2 and $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$ with eigenvalue 162. (Note: either eigenvector
can be replaced with its negation.)

(2) [3 pts] Suppose we use PCA to project the sample points onto a one-dimensional space. What one-dimensional
subspace are we projecting onto? For each of the four sample points in X (not the centered version of X!),
write the coordinate (in principal coordinate space, not in R²) that the point is projected to.

We are projecting onto the subspace spanned by $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$. (Equivalently, onto the space spanned by $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Equivalently,
onto the line $x + y = 0$.) The projections are $(6, -4) \to \frac{10}{\sqrt{2}}$, $(-3, 5) \to -\frac{8}{\sqrt{2}}$, $(-2, 6) \to -\frac{8}{\sqrt{2}}$, $(7, -3) \to \frac{10}{\sqrt{2}}$.
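
The numbers in parts (1) and (2) can be reproduced with a few lines of NumPy. This is a sketch rather than the course's reference solution; it assumes the design matrix rows (6, −4), (−3, 5), (−2, 6), (7, −3) and the unnormalized covariance matrix used in the answer above.

```python
import numpy as np

X = np.array([[6, -4], [-3, 5], [-2, 6], [7, -3]], dtype=float)

Xc = X - X.mean(axis=0)                # center the sample points
C = Xc.T @ Xc                          # unnormalized covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order

v = eigvecs[:, -1]                     # principal direction (largest eigenvalue)
if v[0] < 0:                           # fix the sign so that v = [1, -1] / sqrt(2)
    v = -v

coords = X @ v                         # principal coordinates of the uncentered points
print(C)
print(eigvals)
print(coords * np.sqrt(2))
```

The sign fix matters only for presentation: negating the eigenvector negates every coordinate, which the solution notes is equally valid.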

(3) [3 pts] Given a design matrix X that is taller than it is wide, prove that every right singular vector of X with
singular value $\sigma$ is an eigenvector of the covariance matrix with eigenvalue $\sigma^2$.

If v is a right singular vector of X, then there is a singular value decomposition $X = U D V^\top$ such that v is a column
of V. Here each of U and V has orthonormal columns, V is square, and D is square and diagonal. The covariance
matrix is $X^\top X = V D U^\top U D V^\top = V D^2 V^\top$. This is an eigendecomposition of $X^\top X$, so each singular vector in V
with singular value $\sigma$ is an eigenvector of $X^\top X$ with eigenvalue $\sigma^2$.


Q6. [10 pts] Trees


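[Figure: two depictions of the same k-d tree. Left: sample points numbered 1 to 17 in the plane, with the splitting lines through the points and the query point marked by a bold X. Right: the corresponding tree, whose nodes store the sample points.]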

(1) [5 pts] Above, we have two depictions of the same k-d tree, which we have built to solve nearest neighbor
queries. Each node of the tree at right represents a rectangular box at left, and also stores one of the sample
points that lie inside that box. (The root node represents the whole plane R2 .) If a treenode stores sample point
i, then the line passing through point i (in the diagram at left) determines which boxes the child treenodes
represent.
Simulate running an exact 1-nearest neighbor query, where the bold X is the query point. Recall that the query
algorithm visits the treenodes in a smart order, and keeps track of the nearest point it has seen so far.
• Write down the numbers of all the sample points that serve as the “nearest point seen so far” sometime
while the query algorithm is running, in the order they are encountered.
• Circle all the subtrees in the k-d tree at upper right that are never visited during this query. (This is why
k-d tree search is usually faster than exhaustive search.)

Nearest point seen so far: first 5, then 12, then 10.

The unvisited subtrees are rooted at 2, 13, 7, and 17.

(2) [5 pts] We are building a decision tree for a 2-class classification problem. We have n training points, each having
d real-valued features. At each node of the tree, we try every possible univariate split (i.e. for each feature, we
try every possible splitting value for that feature) and choose the split that maximizes the information gain.
Explain why it is possible to build the tree in O(ndh) time, where h is the depth of the tree’s deepest node.
Your explanation should include an analysis of the time to choose one node’s split. Assume that we can radix
sort real numbers in linear time.

Consider choosing the split at a node whose box contains n′ sample points. For each of the d features, we can radix
sort the sample points by that feature in O(n′) time, which is O(n′d) time in total. For one feature, we can compute
the entropy for the first split (separating the first sample in the sorted list from the others) in O(n′) time, then
walk through the list and update the entropy for each successive split in O(1) time, summing to a total of O(n′) time
for each of the d features. So it takes O(n′d) time overall to choose a split.

Each sample point participates in at most h treenodes, so each sample point contributes at most dh to the running
time, for a total running time of at most O(ndh).
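
The per-feature scan described above can be sketched as follows. The function names are hypothetical, Python's built-in sort stands in for the radix sort assumed in the question, and the code minimizes the weighted child entropy, which is equivalent to maximizing information gain because the parent entropy is the same for every split.

```python
import math

def entropy2(pos, total):
    """Binary entropy (bits) of a set with `pos` positives out of `total` points."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split_one_feature(values, labels):
    """Return (threshold, weighted child entropy) of the best split `value > threshold`.

    After sorting, each candidate split is evaluated in O(1) by moving one
    point from the right child to the left child and updating two counters.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # stand-in for radix sort
    n = len(values)
    total_pos = sum(labels)
    left_n = left_pos = 0
    best = (None, float("inf"))
    for rank in range(n - 1):
        i = order[rank]
        left_n += 1
        left_pos += labels[i]
        if values[i] == values[order[rank + 1]]:
            continue  # cannot split between equal values
        right_n, right_pos = n - left_n, total_pos - left_pos
        cost = (left_n * entropy2(left_pos, left_n)
                + right_n * entropy2(right_pos, right_n)) / n
        if cost < best[1]:
            best = (values[i], cost)
    return best

# Example data (made up): one feature with values 6, 2, 9, 4 and labels 1, 0, 1, 0.
threshold, cost = best_split_one_feature([6, 2, 9, 4], [1, 0, 1, 0])
print(threshold, cost)
```

Running this scan once per feature gives the O(n′d) cost per node, provided the sort itself is linear as the question assumes.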


Q7. [10 pts] Self-Driving Cars and Backpropagation


You want to train a neural network to drive a car. Your training data consists of grayscale 64 × 64 pixel images. The
training labels include the human driver’s steering wheel angle in degrees and the human driver’s speed in miles per
hour. Your neural network consists of an input layer with 64 × 64 = 4,096 units, a hidden layer with 2,048 units,
and an output layer with 2 units (one for steering angle, one for speed). You use the ReLU activation function for
the hidden units and no activation function for the outputs (or inputs).

(1) [2 pts] Calculate the number of parameters (weights) in this network. You can leave your answer as an
expression. Be sure to account for the bias terms.

4097 × 2048 + 2049 × 2

(2) [3 pts] You train your network with the cost function $J = \frac{1}{2} |y - z|^2$. Use the following notation.
• x is a training image (input) vector with a 1 component appended to the end, y is a training label (input)
vector, and z is the output vector. All vectors are column vectors.
• r(γ) = max{0, γ} is the ReLU activation function, r′ (γ) is its derivative (1 if γ > 0, 0 otherwise), and
r(v) is r(·) applied component-wise to a vector.
• g is the vector of hidden unit values before the ReLU activation functions are applied, and h = r(g) is
the vector of hidden unit values after they are applied (but we append a 1 component to the end of h).
• V is the weight matrix mapping the input layer to the hidden layer; g = V x.
• W is the weight matrix mapping the hidden layer to the output layer; z = W h.
Derive $\partial J / \partial W_{ij}$.

$$\frac{\partial J}{\partial W_{ij}} = (z - y)^\top \frac{\partial z}{\partial W_{ij}} = (z_i - y_i)\, h_j$$

(3) [1 pt] Write ∂J/∂W as an outer product of two vectors. ∂J/∂W is a matrix with the same dimensions as W ;
it’s just like a gradient, except that W and ∂J/∂W are matrices rather than vectors.

$$\frac{\partial J}{\partial W} = (z - y)\, h^\top$$

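(4) [4 pts] Derive $\partial J / \partial V_{ij}$.

$$\begin{aligned}
\frac{\partial J}{\partial V_{ij}} &= (z - y)^\top \frac{\partial z}{\partial V_{ij}} \\
&= (z - y)^\top W \frac{\partial h}{\partial V_{ij}} \\
&= (z - y)^\top W \, [0, \ldots, r'(g_i)\, x_j, \ldots, 0]^\top \\
&= ((z - y)^\top W)_i \, r'(g_i)\, x_j.
\end{aligned}$$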
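
The two gradient formulas can be checked against finite differences on a small network. The sketch below is an illustration only: the layer sizes 5, 4, and 2 are small stand-ins for 4,096, 2,048, and 2, the data are random, and `forward` and `numeric_grad` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 4, 2                  # small stand-ins for 4096, 2048, 2
x = np.append(rng.normal(size=n_in), 1.0)     # input with a 1 component appended
y = rng.normal(size=n_out)                    # label vector
V = rng.normal(size=(n_hid, n_in + 1))        # input-to-hidden weights, g = V x
W = rng.normal(size=(n_out, n_hid + 1))       # hidden-to-output weights, z = W h

def forward(V, W):
    g = V @ x
    h = np.append(np.maximum(g, 0.0), 1.0)    # ReLU, then append the bias unit
    z = W @ h
    return g, h, z, 0.5 * np.sum((y - z) ** 2)

g, h, z, J = forward(V, W)
grad_W = np.outer(z - y, h)                               # (z - y) h^T
grad_V = np.outer(((z - y) @ W)[:n_hid] * (g > 0), x)     # ((z - y)^T W)_i r'(g_i) x_j

def numeric_grad(param, which):
    """Central finite differences of J with respect to one weight matrix."""
    grad = np.zeros_like(param)
    step = 1e-6
    for idx in np.ndindex(param.shape):
        plus, minus = param.copy(), param.copy()
        plus[idx] += step
        minus[idx] -= step
        if which == "W":
            grad[idx] = (forward(V, plus)[3] - forward(V, minus)[3]) / (2 * step)
        else:
            grad[idx] = (forward(plus, W)[3] - forward(minus, W)[3]) / (2 * step)
    return grad

print(np.allclose(grad_W, numeric_grad(W, "W"), atol=1e-5))
print(np.allclose(grad_V, numeric_grad(V, "V"), atol=1e-5))
```

Note that only the first `n_hid` entries of (z − y)ᵀW are used for the V gradient, since the last column of W multiplies the appended constant 1, which does not depend on V.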


5/3/2021 Reinforcement Learning - Ai Quiz Questions
https://www.aionlinecourse.com/ai-quiz-questions/machine-learning/reinforcement-learning

QUIZ TOPIC - REINFORCEMENT LEARNING

1. Reinforcement learning is

A. Unsupervised learning ✗
B. Supervised learning ✗
C. Reward-based learning ✓
D. None ✗

2. Which of the following is an application of reinforcement learning?

A. Topic modeling ✗
B. Recommendation system ✓
C. Pattern recognition ✗
D. Image classification ✗

3. Upper confidence bound is a

A. Reinforcement algorithm ✓
B. Supervised algorithm ✗
C. Unsupervised algorithm ✗
D. None ✗

4. Which of the following is true about reinforcement learning?

A. The agent gets rewards or penalties according to the action ✗
B. It is online learning ✗
C. The target of an agent is to maximize the rewards ✗
D. All of the above ✓

5. You have a task which is to show relevant ads to target users. Which
algorithm should you use for this task?

A. K means clustering ✗
B. Naive Bayes ✗
C. Support vector machine ✗
D. Upper confidence bound ✓

6. Hidden Markov Model is used in

A. Supervised learning ✗
B. Unsupervised learning ✗
C. Reinforcement learning ✗
D. All of the above ✓

7. Which algorithm is used in robotics and industrial automation?

A. Thompson sampling ✓
B. Naive Bayes ✗
C. Decision tree ✗
D. All of the above ✗

8. Thompson sampling is a

A. Probabilistic algorithm ✗
B. Based on Bayes inference rule ✗
C. Reinforcement learning algorithm ✗
D. All of the above ✓

9. Which of the following is false about Upper confidence bound?

A. It’s a deterministic algorithm ✗
B. It does not allow delayed feedback ✗
C. It is not based on Bayes inference ✗
D. None ✓

10. The multi-armed bandit problem is a generalized use case for

A. Reinforcement learning ✓
B. Supervised learning ✗
C. Unsupervised learning ✗
D. All of the above ✗

ML interview questions

Machine learning (Lovely Professional University)


Machine Learning/Data Science Interview


Cheat sheets
Aqeel Anwar
Version: 0.1.0.1

This document contains cheat sheets on various topics asked during a Machine Learning/Data Science interview.
This document is constantly updated to include more topics.

Table of Contents
Machine Learning
1. Bias-Variance Trade-off
2. Imbalanced Data in Classification
3. Principal Component Analysis
4. Bayes’ Theorem and Classifier
5. Regression Analysis
6. Regularization in ML
7. Convolutional Neural Network
8. Famous CNNs
9. Ensemble Methods in Machine Learning
Behavioral Interview
1. How to prepare for behavioral interview?
2. How to answer a behavioral question?


Cheat Sheet – Bias-Variance Tradeoff


What is Bias?
• Error between average model prediction and ground truth
• The bias of the estimated function tells us the capacity of the underlying model to
predict the values
What is Variance?
• Average variability in the model prediction for the given dataset
• The variance of the estimated function tells you how much the function can adjust
to the change in the dataset
High Bias
• Overly-simplified model
• Under-fitting
• High error on both test and train data

High Variance
• Overly-complex model
• Over-fitting
• Low error on train data and high on test
• Starts modelling the noise in the input

[Figure: bias, variance, and total error plotted against model complexity, with the minimum-error point marked. Regions: Under-fitting (high bias, low variance), Just Right, Over-fitting (low bias, high variance). Annotations: "Preferred if size of dataset is small" and "Preferred if size of dataset is large".]
Bias variance Trade-off
• Increasing bias reduces variance and vice-versa
• Error = bias² + variance + irreducible error
• The best model is where the error is minimized.
• Compromise between bias and variance
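
The trade-off can be observed in a small simulation. The sketch below is an assumption-laden illustration, not part of the original sheet: it takes sin(2πx) as the ground truth, adds Gaussian noise, and compares a degree-1 polynomial (overly simple) with a degree-7 polynomial (overly complex) fitted to many noisy copies of the same 20 inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
truth = np.sin(2 * np.pi * x)          # assumed ground-truth function
noise_std, n_trials = 0.3, 500

def bias2_and_variance(degree):
    """Fit a polynomial to many noisy copies of the data; return (bias^2, variance)."""
    preds = np.empty((n_trials, x.size))
    for t in range(n_trials):
        y_noisy = truth + rng.normal(scale=noise_std, size=x.size)
        preds[t] = np.polyval(np.polyfit(x, y_noisy, degree), x)
    # Bias: gap between the average prediction and the ground truth.
    bias2 = np.mean((preds.mean(axis=0) - truth) ** 2)
    # Variance: spread of the predictions across the resampled data sets.
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

bias2_simple, var_simple = bias2_and_variance(1)     # overly simple model
bias2_complex, var_complex = bias2_and_variance(7)   # overly complex model
print(bias2_simple, var_simple)
print(bias2_complex, var_complex)
```

In this setup the straight line has the larger squared bias and the higher-degree polynomial has the larger variance, matching the two columns of the sheet.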
Source: https://www.cheatsheets.aqeel-anwar.com


Cheat Sheet – Imbalanced Data in Classification


[Figure: scatter plot of an imbalanced data set. Blue: Label 1, Green: Label 0]

Accuracy = Correct Predictions / Total Predictions

Classifier that always predicts label blue yields prediction accuracy of 90%

Accuracy doesn’t always give the correct insight about your trained model
• Accuracy: percentage of correct predictions. Correct predictions over total predictions. One value for the entire network.
• Precision: exactness of the model. From the detected cats, how many were actually cats. Each class/label has a value.
• Recall: completeness of the model. Correctly detected cats over total cats. Each class/label has a value.
• F1 Score: combines Precision/Recall. Harmonic mean of Precision and Recall. Each class/label has a value.

Performance metrics associated with Class 1

Confusion matrix (rows: predicted labels, columns: actual labels):

                 Actual 1               Actual 0
Predicted 1      True Positive (TP)     False Positive (FP)
Predicted 0      False Negative (FN)    True Negative (TN)

How to read a cell name, e.g. True Negative: the first word answers "Is your prediction correct?" (True: your prediction is correct) and the second word answers "What did you predict?" (Negative: you predicted 0).

Precision = TP / (TP + FP)
Recall, Sensitivity, True +ve rate = TP / (TP + FN)
Specificity = TN / (TN + FP)
False +ve rate = FP / (TN + FP)
F1 score = 2 x (Prec x Rec) / (Prec + Rec)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
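
The metric definitions translate directly into code. The following is a minimal sketch; the function name `classification_metrics` and the counts passed to it are hypothetical, chosen to mimic a 90/10 imbalanced data set in which the minority class is treated as positive.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for the positive class, computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity or true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 10 positives, of which only 2 are found, and 90 negatives.
m = classification_metrics(tp=2, fp=1, fn=8, tn=89)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is high while recall and F1 are low, which is the situation the sheet warns about when accuracy is used alone on imbalanced data.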

Possible solutions
1. Data Replication: Replicate the available data until the number of samples are comparable
2. Synthetic Data: Images: Rotate, dilate, crop, add noise to existing input images and create new data
3. Modified Loss: Modify the loss to reflect greater error when misclassifying smaller sample set:
   loss = a * loss_green + b * loss_blue, with a > b
4. Change the algorithm: Increase the model/algorithm complexity so that the two classes are perfectly
separable (Con: Overfitting)

[Figure: increasing model complexity. Left panel: no straight line (y=ax) passing through origin can perfectly separate data; best solution: line y=0, predict all labels blue. Right panel: a straight line (y=ax+b) can perfectly separate data; green class will no longer be predicted as blue.]


Cheat Sheet – PCA Dimensionality Reduction


What is PCA?
• Based on the dataset find a new set of orthogonal feature vectors in such a way that the
data spread is maximum in the direction of the feature vector (or dimension)
• Rates the feature vector in the decreasing order of data spread (or variance)
• The datapoints have maximum variance in the first feature vector, and minimum variance
in the last feature vector
• The variance of the datapoints in the direction of feature vector can be termed as a
measure of information in that direction.
Steps
1. Standardize the datapoints
2. Find the covariance matrix from the given datapoints
3. Carry out eigen-value decomposition of the covariance matrix
4. Sort the eigenvalues and eigenvectors

Dimensionality Reduction with PCA


• Keep the first m out of n feature vectors rated by PCA. These m vectors will be the best m
vectors preserving the maximum information that could have been preserved with m
vectors on the given dataset
Steps:
1. Carry out steps 1-4 from above
2. Keep first m feature vectors from the sorted eigenvector matrix
3. Transform the data for the new basis (feature vectors)
4. The importance of the feature vector is proportional to the magnitude of the eigen value

Figure 1: Datapoints with feature vectors as x and y-axis
Figure 2: The cartesian coordinate system is rotated to maximize the standard deviation along any one axis (new feature # 2)
Figure 3: Remove the feature vector with minimum standard deviation of datapoints (new feature # 1) and project the data on new feature # 2


Cheat Sheet – Bayes Theorem and Classifier


What is Bayes’ Theorem?
• Describes the probability of an event, based on prior knowledge of conditions that might be
related to the event.
• How the probability of an event changes when we have knowledge of another event:
P(A) becomes P(A|B), which is usually a better estimate than P(A)

Bayes’ Theorem

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the evidence.

Example
• Probability of fire P(F) = 1%
• Probability of smoke P(S) = 10%
• Prob of smoke given there is a fire P(S|F) = 90%
• What is the probability that there is a fire given we see smoke, P(F|S)?
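
The example can be worked through by substituting the three given probabilities into Bayes' theorem, with fire F in the role of A and smoke S in the role of B:

```latex
P(F \mid S) = \frac{P(S \mid F)\, P(F)}{P(S)}
            = \frac{0.90 \times 0.01}{0.10}
            = 0.09
```

So seeing smoke raises the probability of fire from the prior of 1% to a posterior of 9%.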

Maximum A Posteriori Probability (MAP) Estimation


The MAP estimate of the random variable y, given that we have observed iid (x1, x2, x3, … ), is the
y that maximizes the product of prior and likelihood. We try to accommodate our prior knowledge when estimating.

$$\hat{y}_{MAP} = \arg\max_{y} \; P(y) \prod_{i} P(x_i \mid y)$$

Maximum Likelihood Estimation (MLE)


The MLE estimate of the random variable y, given that we have observed iid (x1, x2, x3, … ), is the
y that maximizes only the likelihood. We assume we don’t have any prior knowledge of the quantity being estimated.

$$\hat{y}_{MLE} = \arg\max_{y} \; \prod_{i} P(x_i \mid y)$$
MLE is a special case of MAP where our prior is uniform (all values are equally likely)

Naïve Bayes’ Classifier (Instantiation of MAP as classifier)


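Suppose we have two classes, y=y1 and y=y2. Say we have more than one evidence/feature (x1,
x2, x3, … ). Using Bayes’ theorem,

$$P(y \mid x_1, x_2, x_3, \ldots) = \frac{P(x_1, x_2, x_3, \ldots \mid y)\, P(y)}{P(x_1, x_2, x_3, \ldots)}$$

Naïve Bayes assumes the features (x1, x2, x3, … ) are independent given the class, i.e.

$$P(x_1, x_2, x_3, \ldots \mid y) = \prod_{i} P(x_i \mid y)$$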


Cheat Sheet – Regression Analysis


What is Regression Analysis?
Fitting a function f(.) to datapoints yi=f(xi) under some error function. Based on the estimated
function and error, we have the following types of regression
1. Linear Regression:
Fits a line minimizing the sum of mean-squared error
for each datapoint.
2. Polynomial Regression:
Fits a polynomial of order k (k+1 unknowns) minimizing
the sum of mean-squared error for each datapoint.
3. Bayesian Regression:
For each datapoint, fits a gaussian distribution by
minimizing the mean-squared error. As the number of
data points xi increases, it converges to point
estimates i.e.
4. Ridge Regression:
Can fit either a line, or polynomial minimizing the sum
of mean-squared error for each datapoint and the
weighted L2 norm of the function parameters beta.
5. LASSO Regression:
Can fit either a line, or polynomial minimizing the
sum of mean-squared error for each datapoint and the
weighted L1 norm of the function parameters beta.
6. Logistic Regression:
Can fit either a line, or polynomial with sigmoid
activation minimizing the binary cross-entropy loss for
each datapoint. The labels y are binary class labels.
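
As an illustration of item 4, ridge regression for a linear model has a closed-form solution. The sketch below assumes the objective is the squared error plus lam times the squared L2 norm of the weights; the function name `ridge_fit`, the penalty weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 via the regularized normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                                      # synthetic inputs
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=30)    # noisy linear targets

w_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)   # a larger penalty shrinks the weights
print(w_ols, w_ridge)
```

Replacing the squared L2 penalty with an L1 penalty gives LASSO (item 5), which has no closed form and is usually solved iteratively.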
Visual Representation:
[Figure: four panels plotting y against x for Linear Regression, Polynomial Regression, Bayesian Linear Regression, and Logistic Regression (with regions for Label 0 and Label 1).]

Summary:
What does it fit? Estimated function Error Function
Linear A line in n dimensions
Polynomial A polynomial of order k
Bayesian Linear Gaussian distribution for each point
Ridge Linear/polynomial
LASSO Linear/polynomial
Logistic Linear/polynomial with sigmoid

Cheat Sheet – Regularization in ML

What is Regularization in ML?

• Regularization is an approach to address over-fitting in ML.
• Overfitted model fails to generalize estimations on test data
• When the underlying model to be learned is low bias/high variance, or when we have small amount of data, the
estimated model is prone to over-fitting.
• Regularization reduces the variance of the model
[Figure 1. Overfitting: bias, variance, and error versus model complexity, with Under-fitting, Just Right, and Over-fitting regions.]
Types of Regularization:
1. Modify the loss function:
• L2 Regularization: Prevents the weights from getting too large (defined by L2 norm). The larger
the weights, the more complex the model is and the higher the chance of overfitting.

• L1 Regularization: Prevents the weights from getting too large (defined by L1 norm). The larger
the weights, the more complex the model is and the higher the chance of overfitting. L1 regularization
introduces sparsity in the weights. It forces more weights to be zero, rather than reducing the
average magnitude of all weights

• Entropy: Used for the models that output probability. Forces the probability distribution
towards uniform distribution.

2. Modify data sampling:


• Data augmentation: Create more data from available data by randomly cropping, dilating,
rotating, adding small amount of noise etc.
• K-fold Cross-validation: Divide the data into k groups. Train on (k-1) groups and test on 1
group. Try all k possible combinations.
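
A small sketch of the K-fold split described above, using NumPy only. The helper name `k_fold_indices` and the fixed seed are assumptions for the example; libraries such as scikit-learn provide their own implementations.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle once
    folds = np.array_split(indices, k)     # k roughly equal groups
    for i in range(k):
        test_idx = folds[i]                # 1 group for testing
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx          # (k-1) groups for training

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every sample ends up in the test group exactly once across the k iterations.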

3. Change training approach:


• Injecting noise: Add random noise to the weights when they are being learned. It pushes the
model to be relatively insensitive to small variations in the weights, hence regularization
• Dropout: Generally used for neural networks. Connections between consecutive layers are
randomly dropped based on a dropout-ratio and the remaining network is trained in the
current iteration. In the next iteration, another set of random connections are dropped.
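
The dropout idea can be sketched as a random mask applied on each iteration. Note the assumptions: this sketch drops whole units (the common "inverted dropout" form) rather than individual connections as in the description above, and the function name and rescaling by the keep probability are choices made for the example.

```python
import numpy as np

def dropout(activations, drop_ratio, rng):
    """Zero a random subset of activations and rescale the survivors."""
    keep_prob = 1.0 - drop_ratio
    mask = rng.random(activations.shape) < keep_prob  # True = kept
    return activations * mask / keep_prob, mask

rng = np.random.default_rng(0)
a = np.ones((4, 5))                 # hypothetical layer activations
out, mask = dropout(a, 0.3, rng)    # a new mask is drawn on every call
print(mask.mean())                  # fraction of units kept this iteration
```

Rescaling by `keep_prob` keeps the expected activation unchanged, so no adjustment is needed at test time, when no mask is applied.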
[Figure 2. K-fold CV: 5-fold cross-validation; the test fold moves through each of the five positions while the remaining folds are used for training.]
[Figure 3. Drop-out: original network with 16 connections; with dropout-ratio = 30%, 11 connections (70%) stay active in each iteration.]


Source: https://www.cheatsheets.aqeel-anwar.com


Cheat Sheet – Famous CNNs


AlexNet – 2012
Why: AlexNet was born out of the need to improve the results of
the ImageNet challenge.
What: The network consists of 5 Convolutional (CONV) layers and 3
Fully Connected (FC) layers. The activation used is the Rectified
Linear Unit (ReLU).
How: Data augmentation is carried out to reduce over-fitting. Uses
Local Response Normalization.

VGGNet – 2014
Why: VGGNet was born out of the need to reduce the # of
parameters in the CONV layers and improve on training time
What: There are multiple variants of VGGNet (VGG16, VGG19, etc.)
How: The important point to note here is that all the conv kernels are
of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.

ResNet – 2015
Why: Neural Networks are notorious for not being able to find a
simpler mapping when it exists. ResNet solves that.
What: There are multiple versions of ResNetXX architectures where
‘XX’ denotes the number of layers. The most used ones are ResNet50
and ResNet101. Since the vanishing gradient problem was taken care of
(more about it in the How part), CNNs started to get deeper and deeper.
How: ResNet architecture makes use of shortcut connections to solve
the vanishing gradient problem. The basic building block of ResNet is
a Residual block that is repeated throughout the network.
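
A minimal sketch of the shortcut connection, output = f(x) + x. As a simplifying assumption it uses two dense weight matrices (`w1`, `w2`, hypothetical names) instead of convolutional layers, which is enough to show why the identity mapping is easy to represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Two weight layers compute f(x); the shortcut adds x back."""
    f_x = relu(x @ w1) @ w2
    return relu(f_x + x)

x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
# With all-zero weights f(x) = 0, so the block simply passes relu(x) through.
print(residual_block(x, zeros, zeros))
```

Because the block only has to learn the residual f(x), driving the weights towards zero recovers the simpler mapping instead of destroying the signal.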
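[Figure 1. ResNet Block: the input x passes through two weight layers to give f(x); a shortcut connection adds x to produce f(x)+x.]
[Figure 2. Inception Block: the previous layer feeds parallel 1x1 Conv, 3x3 Conv, 5x5 Conv and Maxpool branches (with additional 1x1 Conv layers), whose outputs are joined by filter concatenation.]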


Inception – 2014
Why: Larger kernels are preferred for more global features; on the other
hand, smaller kernels provide good results in detecting area-specific
features. For effective recognition of such variable-sized features, we
need kernels of different sizes. That is what Inception does.
What: The Inception network architecture consists of several inception
modules of the following structure. Each inception module consists of
four operations in parallel: a 1x1 conv layer, a 3x3 conv layer, a 5x5 conv
layer, and max pooling.
How: Inception increases the network space from which the best
network is to be chosen via training. Each inception module can
capture salient features at different levels.

Source: https://www.cheatsheets.aqeel-anwar.com


Cheat Sheet – Convolutional Neural Network


Convolutional Neural Network:
The data gets into the CNN through the input layer and passes
through various hidden layers before getting to the output layer.
The output of the network is compared to the actual labels in
terms of loss or error. The partial derivatives of this loss w.r.t the
trainable weights are calculated, and the weights are updated
through one of the various methods using backpropagation.

CNN Template:
Most of the commonly used hidden layers (not all) follow a
pattern
1. Layer function: Basic transforming function such as
convolutional or fully connected layer.
a. Fully Connected: Linear functions between the input and the output.
b. Convolutional Layers: These layers are applied to 2D (3D) input feature maps. The trainable weights are a 2D (3D)
kernel/filter that moves across the input feature map, generating dot products with the overlapping region of the input
feature map.
c. Transposed Convolutional (DeConvolutional) Layer: Usually used to increase the size of the output feature map
(Upsampling). The idea behind the transposed convolutional layer is to undo (not exactly) the convolutional layer.
[Figure: Fully Connected Layer (input nodes x1, x2, x3; output node y1 = w11*x1 + w21*x2 + w31*x3 + b1) and Convolutional Layer (Input Map, Kernel, Output Map).]

2. Pooling: Non-trainable layer to change the size of the feature map


a. Max/Average Pooling: Decrease the spatial size of the input layer based on
selecting the maximum/average value in receptive field defined by the kernel
b. UnPooling: A non-trainable layer used to increase the spatial size of the input
layer based on placing the input pixel at a certain index in the receptive field
of the output defined by the kernel.
3. Normalization: Usually used just before the activation functions to limit the
unbounded activation from increasing the output layer values too high
a. Local Response Normalization LRN: A non-trainable layer that square-normalizes the pixel values in a feature map
within a local neighborhood.
b. Batch Normalization: A trainable approach to normalizing the data by learning scale and shift variable during training.
4. Activation: Introduce non-linearity so CNN can efficiently map non-linear complex mapping.
a. Non-parametric/Static functions: Linear, ReLU
b. Parametric functions: ELU, tanh, sigmoid, Leaky ReLU
c. Bounded functions: tanh, sigmoid
5. Loss function: Quantifies how far off the CNN prediction is from the actual labels.
a. Regression Loss Functions: MAE, MSE, Huber loss
b. Classification Loss Functions: Cross entropy, Hinge loss

[Figure: plots of the five loss functions listed below; the error-based losses are plotted over -2.0 to 2.0 and the cross entropy loss over probabilities 0.0 to 1.0.]
MSE Loss: $\text{mse} = (x - \hat{x})^2$
MAE Loss: $\text{mae} = |x - \hat{x}|$
Huber Loss ($\gamma = 1.9$): $\frac{1}{2}(x - \hat{x})^2$ if $|x - \hat{x}| < \gamma$, else $\gamma|x - \hat{x}| - \frac{1}{2}\gamma^2$
Hinge Loss: $\max(0, 1 - \hat{x})$ if $x = 1$; $\max(0, 1 + \hat{x})$ if $x = -1$
Cross Entropy Loss: $-y\log(p) - (1 - y)\log(1 - p)$
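
The five loss functions on this sheet can be written directly in NumPy. This is a sketch for reference: the function names are chosen for the example, the default `gamma=1.9` follows the value shown for the Huber plot, and the clipping constant `eps` is an added assumption to avoid log(0).

```python
import numpy as np

def mse(x, x_hat):
    return (x - x_hat) ** 2

def mae(x, x_hat):
    return np.abs(x - x_hat)

def huber(x, x_hat, gamma=1.9):
    """Quadratic for small errors, linear for large ones."""
    err = np.abs(x - x_hat)
    return np.where(err < gamma, 0.5 * err ** 2, gamma * err - 0.5 * gamma ** 2)

def hinge(x, x_hat):
    """x is the label in {-1, +1}; x_hat is the raw score."""
    return np.maximum(0.0, 1.0 - x * x_hat)

def cross_entropy(y, p, eps=1e-12):
    """Binary cross entropy for label y in {0, 1} and probability p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

print(mse(1.0, 3.0), mae(1.0, 3.0), huber(0.0, 3.0), hinge(-1, 0.5), cross_entropy(1, 0.5))
```

Writing the hinge loss as max(0, 1 - x * x_hat) covers both label cases of the piecewise definition in one expression.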

Source: https://www.cheatsheets.aqeel-anwar.com


Cheat Sheet – Ensemble Learning in ML


What is Ensemble Learning? Wisdom of the crowd
Combine multiple weak models/learners into one predictive model to reduce bias, variance and/or improve accuracy.

Types of Ensemble Learning: N number of weak learners


1. Bagging: Trains N different weak models (usually of the same type – homogenous) on N subsets of the
input dataset in parallel (in classic bagging these are bootstrap samples drawn with replacement, so they may overlap).
In the test phase, each model is evaluated. The label with the greatest number of predictions is
selected as the prediction. Bagging methods reduce the variance of the prediction.

2. Boosting: Trains N different weak models (usually of the same type – homogenous) with the complete dataset in a
sequential order. The datapoints wrongly classified by the previous weak model are given more weight so that they can
be classified properly by the next weak learner. In the test phase, each model is evaluated and, based on the test error of
each weak model, the prediction is weighted for voting. Boosting methods decrease the bias of the prediction.

3. Stacking: Trains N different weak models (usually of different types – heterogenous) with one of the two subsets of the
dataset in parallel. Once the weak learners are trained, they are used to train a meta learner that combines their
predictions and carries out the final prediction using the other subset. In the test phase, each model predicts its label; this set of
labels is fed to the meta learner, which generates the final prediction.
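
The two voting schemes mentioned above (simple voting for bagging, weighted voting for boosting) can be sketched as follows. The prediction matrix and the `alphas` values are made-up illustration data, and the helper names are assumptions for the example.

```python
import numpy as np

def majority_vote(predictions):
    """predictions: (n_models, n_samples) array of integer class labels."""
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])

def weighted_vote(predictions, alphas):
    """Each model's vote counts with its own weight alpha."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    return np.array([
        np.bincount(col, weights=alphas, minlength=n_classes).argmax()
        for col in predictions.T
    ])

# Three hypothetical weak models predicting labels for three samples.
preds = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [1, 0, 1]])
alphas = np.array([0.2, 0.3, 0.9])   # hypothetical per-model weights

print(majority_vote(preds))          # simple voting (bagging)
print(weighted_vote(preds, alphas))  # weighted voting (boosting)
```

In the first sample the two schemes disagree: two of three models vote for class 0, but the single model voting for class 1 carries more weight than the other two combined.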

The block diagrams, and comparison table for each of these three methods can be seen below.
Ensemble Method – Bagging
Step #1: Create N subsets from the original dataset, one for each weak model.
Step #2: Train each weak model with an independent subset, in parallel.
Step #3: In the test phase, predict from each weak model and vote their predictions to get the final prediction.

Ensemble Method – Boosting
Step #1: Assign equal weights to all the datapoints in the dataset.
Step #2a: Train a weak model (Weak Model #1) with equal weights on all the datapoints.
Step #2b: Based on the final error on the trained weak model, calculate a scalar alpha (alpha1). Use alpha to increase the weights of wrongly classified points, and decrease the weights of correctly classified points.
Step #3a: Train a weak model (Weak Model #2) with adjusted weights on all the datapoints in the dataset.
Step #3b: Based on the final error on the trained weak model, calculate a scalar alpha (alpha2). Use alpha to increase the weights of wrongly classified points, and decrease the weights of correctly classified points.
...
Step #(n+1)a: Train a weak model with adjusted weights on all the datapoints in the dataset.
Step #(n+2): In the test phase, predict from each weak model and vote their predictions weighted by the corresponding alpha to get the final prediction.

Ensemble Method – Stacking
Step #1: Create 2 subsets from the original dataset, one for training the weak models (Subset #1 – Weak Learners) and one for the meta-model (Subset #2 – Meta Learner).
Step #2: Train each weak model with the weak learner dataset.
Step #3: Train a meta-learner for which the input is the outputs of the weak models for the Meta Learner dataset.
Step #4: In the test phase, feed the input to the weak models, collect the output and feed it to the meta model. The output of the meta model is the final prediction.

Parameter | Bagging | Boosting | Stacking
Focuses on | Reducing variance | Reducing bias | Improving accuracy
Nature of weak learners is | Homogenous | Homogenous | Heterogenous
Weak learners are aggregated by | Simple voting | Weighted voting | Learned voting (meta-learner)

Source: https://www.cheatsheets.aqeel-anwar.com


How to prepare for a behavioral interview? (1/4)
Collect stories, assign keywords, practice the STAR format

Keywords: List important keywords that will be populated with your personal
stories. The most common keywords are given in the table below.

Conflict Resolution | Negotiation | Compromise to achieve goal | Creativity | Flexibility | Convincing
Handling Crisis | Another team priorities not aligned | Challenging Situation | Adjust to a colleague style | Working with difficult people | Take Stand
Handling –ve feedback | Coworker view of you | Working with a deadline | Your strength | Your weakness | Influence Others
Handling failure | Handling unexpected situation | Converting challenge to opportunity | Decision without enough data | Conflict Resolution | Mentorship/Leadership

Stories
1. List all the organizations you have been a part of. For example
1. Academia: BSc, MSc, PhD
2. Industry: Jobs, Internship
3. Societies: Cultural, Technical, Sports
2. Think of stories from step 1 that can fall into one of the keyword categories. The
more stories the better. You should have at least 10-15 stories.
3. Create a summary table by assigning multiple keywords to each story. This will help
you filter the stories when a question is asked in the interview. An example can be
seen below
Story 1: [Convincing] [Take Stand] [influence other]
Story 2: [Mentorship] [Leadership]
Story 3: [Conflict resolution] [Negotiation]
Story 4: [decision-without-enough-data]

STAR Format
Write down the stories in the STAR format as explained in the 2/4 part of this cheat
sheet. This will help you practice the organization of story in a meaningful way.

Icon Source: www.flaticon.com

Source: https://www.cheatsheets.aqeel-anwar.com


How to prepare for a behavioral interview? (2/4)
Direct*, meaningful*, personalized*, logical*
*(Respective colors are used to identify these characteristics in the example)

Example: “Tell us about a time when you had to convince senior executives”

S – Situation: Explain the situation and provide necessary context for your story.
“I worked as an intern at XYZ company in the summer of 2019. The project details
provided to me were elaborate. After some initial brainstorming and research, I
realized that the project approach could be modified to make it more efficient in
terms of the underlying KPIs. I decided to talk to my manager about it.”

T – Task: Explain the task and your responsibility in the situation.
“I had an hour-long call with my manager and explained to him in detail the proposed
approach and how it could improve the KPIs. I was able to convince him. He
asked me if I would be able to present my proposed approach for approval in front of
the higher executives. I agreed to it. I was working out of the ABC (city) office and
the executives needed to fly in from the XYZ (city) office.”

A – Action: Walk through the steps and actions you took to address the issue.
“I did a quick background check on the executives to learn more about their areas
of expertise so that I could convince them accordingly. I prepared an elaborate
15-slide presentation, starting with explaining their approach, moving on to my proposed
approach and finally comparing them on preliminary results.”

R – Result: State the outcome of your actions.
“After some active discussion we were able to establish that the proposed approach
was better than the initial one. The executives proposed a few small changes
to my approach and really appreciated my stand. At the end of my internship, I was
selected among the 3 out of 68 interns who got to meet the senior vice president
of the company over lunch.”

Icon Source: www.flaticon.com


Source: https://www.cheatsheets.aqeel-anwar.com


How to answer a behavioral question? (3/4)
Understand, Extract, Map, Select and Apply
Example: “Tell us about a time when you had to convince senior executives”

Understand: Understand the question
Example: A story where I was able to convince my seniors. Maybe they had something in mind,
and I had a better approach and tried to convince them.

Extract: Extract keywords and tags
Extract useful keywords that encapsulate the gist of the question.
Example: [Convincing], [Creative], [Leadership]

Map: Map the keywords to your stories
Shortlist all the stories that fall under the keywords extracted in the previous step.
Example: Story1, Story2, Story3, Story4, … , Story N

Select: Select the best story
From the shortlisted stories, pick the one that best answers the question and has not been used
so far in the interview.
Example: Story3

Apply: Apply the STAR method
Apply the STAR method to the selected story to answer the question.
Example: See Cheat Sheet 2/4 for details

Icon Source: www.flaticon.com


Source: https://www.cheatsheets.aqeel-anwar.com


Behavioral Interview Cheat Sheet (4/4)
Summarizing the behavioral interview

How to prepare for the interview

1 Gather important topics as keywords
Understand and collect all the important topics commonly asked in the interview.

2 Collect your stories
Based on all the organizations you have been a part of, think of all the stories that fall under the keywords above.

3 Practice stories in STAR format
Practice each story using the STAR format. You will have to answer the question following this format.

4 Assign keywords to stories
Assign each of your stories one or more keywords. This will help you recall them quickly.

5 Create a summary table
Create a summary table mapping stories to their associated keywords. This will be used during the behavioral question.

How to answer a question during the interview

U Understand the question
Understand the question and clarify any confusion that you have.

E Extract the keywords
Try to extract one or more of the keywords from the question.

M Map the keywords to stories
Based on the keywords extracted, find the stories using the summary table created during preparation (Step 5).

S Select a story
Since each keyword may be assigned to multiple stories, select the one that is most relevant and has not been used.

A Apply the STAR format
Once the story has been selected, apply the STAR format to the story to answer the question.
Icon Source: www.flaticon.com

Source: https://www.cheatsheets.aqeel-anwar.com


Follow the Author:


Follow the author for more machine learning/data science content at

• Medium: https://aqeel-anwar.medium.com
• LinkedIn: https://www.linkedin.com/in/aqeelanwarmalik/

Version History
• Version 0.1.0.1 - Apr 05, 2021
Fixed minor typos in the Bayes' Theorem, Regression Analysis, Classifier and
PCA dimensionality reduction cheat sheets.

• Version 0.1.0.0 - Mar 30, 2021


Initial draft with nine basics of ML and two behavioral interview cheat sheets.


Advance ML - practice

Machine learning (Lovely Professional University)


Q- Let us assume we implement an AND function with a single neuron. Below is a
tabular representation of an AND function. What would be the weights and
bias?

X1 | X2 | X1 AND X2
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1


A. Bias = -1.5, w1 = 1, w2 = 1
B. Bias = 1.5, w1 = 2, w2 = 2
C. Bias = 1, w1 = 1.5, w2 = 1.5
D. None of these
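
The options above can be checked by brute force. This sketch assumes a step activation that fires when the weighted sum is greater than zero; the question does not state the activation, so that threshold is an assumption, and the helper names are chosen for the example.

```python
def neuron(x1, x2, w1, w2, bias):
    """Single neuron with an assumed step activation (fires if sum > 0)."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def implements_and(w1, w2, bias):
    """True if the neuron reproduces every row of the AND truth table."""
    return all(neuron(a, b, w1, w2, bias) == out for (a, b), out in AND_TABLE.items())

print(implements_and(1, 1, -1.5))    # option A
print(implements_and(2, 2, 1.5))     # option B
print(implements_and(1.5, 1.5, 1))   # option C
```

Under this assumption only a negative bias works: with a positive bias the neuron already fires for the input (0, 0).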

Q-What are the steps for using a gradient descent algorithm?


1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weight and bias
5. Go to each neuron that contributes to the error and change its respective
values to reduce the error

A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 4, 3, 1, 5, 2
D. 3, 2, 1, 5, 4
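
As an illustration, the loop below runs the five steps in the order 4, 3, 1, 5, 2, which is the usual arrangement of a gradient descent loop. It is a sketch for a single linear neuron; the toy data, learning rate and iteration count are assumptions made for the example.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (assumed for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

rng = np.random.default_rng(0)
w = rng.normal(size=1)            # step 4: initialize random weight
b = rng.normal()                  # step 4: and random bias
lr = 0.1

for _ in range(2000):             # step 2: reiterate
    y_pred = X @ w + b            # step 3: pass the input through the network
    error = y_pred - y            # step 1: error between prediction and actual value
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    w -= lr * grad_w              # step 5: change the values that contribute
    b -= lr * grad_b              #         to the error so that it shrinks

print(w, b)
```

The loop uses a fixed iteration count for simplicity; in practice step 2 usually stops when the error no longer improves.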


Q- Suppose you are inputting an image of size (150 x 150 x 3) with filter size=2,
stride=1, padding=0. What would be the output size of the image?

A. 150x150
B. 149x 149
C. 148x 148
D. 147 x 147
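
The spatial output size of a convolution follows the standard formula (input + 2 * padding - filter) / stride + 1, applied per spatial dimension. A small sketch, with the function name chosen for the example:

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1

side = conv_output_size(150, filter_size=2, stride=1, padding=0)
print(f"{side} x {side}")
```

The channel dimension (3 here) does not enter the formula; it is consumed by the depth of the filter.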

Q- Which of the following metrics will best analyze the performance of any
model?
A. Precision
B. Recall
C. F-Score
D. None of the mentioned

Q- The number of nodes in the input layer is 20 and in the hidden layer is 5. What
would be the maximum number of connections between the input layer
and the hidden layer?
A. 100
B. 25
C. less than 100
D. Greater than 100

Q-Why do we use cross validation:


A. to check the accuracy of the model
B. to check the robustness of the model
C. to analyze ROC curve
D. all of the above

Q- if loss='categorical_crossentropy', then which type of classification is used?

A. Binary classification
B. Multiclassification


Q-A perceptron is a –
a. A single layer feed-forward neural network with pre-processing
b. An auto-associative neural network
c. A double layer auto-associative neural network
d. A neural network that contains feedback

Q- Which of the following is true


1. On average, neural networks have higher computational rates than
conventional computers
2. Neural networks learn by example
3. Neural networks mimic the way the human brain works
A. All of these
B. 1 and 2 are true
C. 1,2 and 3 are true
D. None of these

Q-What is back propagation


A. It is another name given to the curvy function in the perceptron
B. It is the transmission of error back through the network to adjust the
inputs
C. It is the transmission of error back through the network to allow
weights to be adjusted so that network can learn
D. None of these

Q-Neural networks are complex ---------------- with many parameters


a. Linear functions
b. Nonlinear functions
c. Discrete functions
d. Exponential functions

Q-Which one of the following gives higher accuracy:


A. Random forest
b. SVM


Q-Which tool is NOT suited for building ANN models?

Python
TensorFlow
Keras
Excel

Q-How can we improve the calculation speed in TensorFlow, without losing accuracy?

Using GPU
By doing random sampling on Tensors
By removing few nodes from computational graphs
By removing the hidden layers

Q-How do calculations work in TensorFlow?

Through vector multiplications
Through RDDs
Through Computational Graphs
Through map reduce tasks

Q-Which tool is best suited for solving Deep Learning problems?

R
Sklearn
Excel
TensorFlow

Q-A tensor is similar to

Data Array
ANN Model
SQL query
Python code

Which of the following will be used to convert a NumPy array to a TensorFlow tensor?

• tf.convert_to_tensor()
• np.array()
• tf.make_ndarray()
• tf.constant()

Which of the following must be initialized in TensorFlow?
• Placeholders
• Variables
• Sessions
• All of the above

What will be the output of the following?

import numpy as np
import tensorflow as tf
c = tf.constant([[1,2,3],[4,5,6]])
print("Python List input: {}".format(c.get_shape()))
• Python List input: (2, 3)
• Python List input: (3, 2)
• Python List input: (3, 3)
• None of the mentioned

Which of the following functions is used for ragged data?

• tf.ragged.RaggedTensor()
• tf.ragged.Tensor()
• tf.RaggedTensor()
• tf.ragged()

The parameters that are required to be learnt in minimizing the objective function
in supervised learning are:
• Only weight
• Only bias
• Both of the mentioned
• None of the mentioned
What would be the output of the following?
import numpy as np
shape = (3,4,2)
input = np.zeros(shape)
print(input)
Options:
• [[[0. 0.] [0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.] [0. 0.]]]
• [[[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]]]
• [[[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]] [[0. 0.] [0. 0.] [0. 0.]]]
• None of the mentioned

Which of these statements about deep learning programming frameworks are true?
• Deep learning programming frameworks require cloud-based machines to run.
• Even if a project is currently open source, good governance of the project
helps ensure that it remains open even in the long term, rather than
become closed or modified to benefit only one company.
• A programming framework does not allow you to code up deep learning
algorithms with typically fewer lines of code than a lower-level language such
as Python.
• None of the mentioned

"Grouping of people based on their performance" is an example of:

• Clustering
• Classification
• Regression
• None of the mentioned

Consider the following statement "it takes less time to navigate the regions
having a ge
i) Gradient Descent Algorithm
ii) Momentum based Gradient Descent Algorithm
• Only (i)
• Only (ii)
• Both (i) and (ii)
• None of the mentioned

Which of the following is true in terms of seed?

validation_generator = data_generator.flow_from_directory(
train_data_dir, target_size=(img_width, img_height),
batch_size=batch_size, shuffle=True, class_mode='categorical',
seed=42, subset='validation')
• a fixed value set drawn from a random distribution
• to produce the same random tensor for a given shape and dtype.
• Both a and b
• None of the mentioned

What will be the output of the following?

import numpy as np
import tensorflow as tf
c = tf.constant(np.array([
[[1,2,3],
[4,5,6]],
[[1,1,1],
])
print("3d NumPy array input: {}".format(c.get_shape()))

• 3d NumPy array input: (4, 2, 3)
• 3d NumPy array input: (2, 2, 3)
• 3d NumPy array input: (2, 4, 3)
• 3d NumPy array input: (2, 2, 2)

What will be the output of the given code?

import tensorflow as tf
h = tf.constant("Deep")
w = tf.constant(" Learning")
o = h + w
print(o)
• Deep Learning
• tf.Tensor(Deep Learning, shape=(1,1), dtype=string)
• tf.Tensor(Deep Learning, shape=(), dtype=string)
• Error

What would be the output of the following?

import tensorflow as tf
t = tf.constant([[5.0,6.0],[7.0,8.0]])
v1 = tf.Variable(t, name='hello')
v2 = tf.Variable(t+1, name='hello')
print(v1 == v2)
• tf.Tensor([[False False] [True True]], shape=(2, 2), dtype=bool)
• tf.Tensor([[True True] [True True]], shape=(2, 2), dtype=bool)
• tf.Tensor([[False False] [False False]], shape=(2, 2), dtype=bool)
• None of the mentioned


What does validation_split=0.20 mean in the given statement?

model.fit(inputX, inputY, validation_split=0.20, epochs=10, batch_size=10)
• to use 50% of the data before shuffling for validation and the remaining 50% for
training
• to use 80% of the data for validation before shuffling
• to use the last 20% of the data for validation before shuffling
• None of the mentioned
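
Keras documents `validation_split` as holding out the last fraction of the samples, taken before any shuffling. The NumPy sketch below imitates that documented behavior on toy arrays; it is not Keras's internal code, and the helper name is an assumption for the example.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Hold out the LAST fraction of the samples, with no shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10)          # toy inputs
y = x * 2                  # toy targets
(train_x, train_y), (val_x, val_y) = split_like_validation_split(x, y, 0.20)
print(train_x, val_x)
```

Because the held-out part is always the tail of the arrays, data that is sorted (for example by class) should be shuffled before it is passed to `fit`.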

Which of the following statements are true? Feature Engineering is

1. A process of putting domain knowledge into the creation of feature
extractors.
2. Used to reduce the complexity of data.
• Only 1
• Only 2
• Both are true
• Both are false

Consider the statement "Given a person's credentials and background
information, your system should assess whether the person should be granted a loan". Which
technique is applicable to this scenario?
• Machine Learning
• Deep Learning
• Reinforcement Learning
• All of the above.


What is the effect of using loss in the following statement?

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
• to compute the quantity that a model should seek to minimize during
training.
• to return the sum of the per-sample losses in the batch
• Both of the mentioned
• None of the mentioned

Suppose we have a neural network with ReLU activation function. Now, we
replace the ReLU activations by linear. Would this new neural network be able to
approximate an AND function?
• Yes
• No

1. Which tool is best suited for solving Deep Learning problems?

• R
• Sk-learn
• Excel
• TensorFlow

2. A tensor is similar to

• Data Array
• ANN Model
• SQL query
• Python code

3. How do calculations work in TensorFlow?

• Through vector multiplications
• Through RDDs
• Through Computational Graphs
• Through map reduce tasks


4. In TensorFlow, what is the use of a session?

• The current work space session for storing the code
• We launch the graph in a session
• A session is used to download the data
• A session is used for exporting data out of TensorFlow

5. What does feed_dict do?

• Feeds external data into computational graphs
• Creates a new placeholder
• Creates a new tensor
• Creates a new session

6. out=tf.add(tf.matmul(X,W), b)

• Logistic Regression equation
• Deep ANN equation
• Random Forest equation
• Linear Regression equation

7. tf.reduce_sum(tf.square(out-Y))

• Linear Model equation
• Maximum Entropy loss function
• Squared Error loss function
• Feed_dict process

8. How can we improve the calculation speed in TensorFlow, without losing accuracy?

• Using GPU
• By doing random sampling on Tensors
• By removing few nodes from computational graphs
• By removing the hidden layers

9. Keras is a deep learning framework on which tool?

• R
• TensorFlow
• SAS
• Azure

10. What is the meaning of model=Sequential() in Keras?

• No such code in Keras
• Keras should be used only for sequential models like RNNs
• Keras builds sequential models
• Creates a computational graph

11. Which tool is NOT suited for building ANN models?

• Python
• TensorFlow
• Excel
• Keras

12. Can we have multidimensional tensors?

• No, a tensor can have a maximum of two dimensions
• Possible only in image data
• Yes, possible
• Possible only in geo-tagged data

13. Why does TensorFlow use computational graphs?

• Tensors are nothing but computational graphs
• Graphs are easy to plot
• There is no such concept of computational graphs in TensorFlow
• Calculations can be done in parallel

14. How do we perform calculations in TensorFlow?

• We launch the computational graph in a session
• We launch the session inside a computational graph
• By creating multiple tensors
• By creating data frames

15. How do you feed external data into placeholders?

• by using the import data command
• by using feed_dict
• by using the read data function
• Not possible

16. out=tf.sigmoid(tf.add(tf.matmul(X,W), b))

• Logistic Regression equation
• Deep ANN equation
• Random Forest equation
• Linear Regression equation

17. C=-tf.reduce_sum(Y*tf.log(out))

• C is a logistic regression line equation
• C is a squared error loss function
• C is a cross entropy loss function
• C is a linear regression line equation
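
The TensorFlow expressions quoted in questions 6, 7, 16 and 17 have direct NumPy analogues, sketched below on a tiny made-up input. The function names and example values are assumptions for illustration; the TensorFlow calls themselves are not executed here.

```python
import numpy as np

def linear(X, W, b):
    """Analogue of tf.add(tf.matmul(X, W), b)."""
    return X @ W + b

def logistic(X, W, b):
    """Analogue of tf.sigmoid(tf.add(tf.matmul(X, W), b))."""
    return 1.0 / (1.0 + np.exp(-linear(X, W, b)))

def squared_error(out, Y):
    """Analogue of tf.reduce_sum(tf.square(out - Y))."""
    return np.sum((out - Y) ** 2)

def cross_entropy(out, Y):
    """Analogue of -tf.reduce_sum(Y * tf.log(out))."""
    return -np.sum(Y * np.log(out))

# Hypothetical single sample with two features.
X = np.array([[1.0, 2.0]])
W = np.array([[0.5], [-0.25]])
b = np.array([0.0])
Y = np.array([[1.0]])

print(linear(X, W, b), logistic(X, W, b))
print(squared_error(linear(X, W, b), Y), cross_entropy(logistic(X, W, b), Y))
```

The only difference between the linear and logistic expressions is the sigmoid wrapped around the same matrix product.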

18. Can we use GPU for faster computations in TensorFlow?

• No, not possible
• Possible only on cloud
• Possible only with small datasets
• Yes, possible

19. Which tool is a deep learning wrapper on TensorFlow?

• Python
• Keras
• PyTorch
• Azure

20. How are deep learning models built in Keras?

• by using sequential models
• by using feed_dict
• by creating placeholders and computational graphs
• by creating data frames

Which of the subsequent declaration(s) effectively represents an actual neuron


in TensorFlow?
• A neuron has a single enter and a single output best
• A neuron has multiple inputs but a single output only
• A neuron has a single input, however, more than one outputs
• A neuron has multiple inputs and more than one outputs
• All of the above statements are valid

What are the steps for using a gradient descent algorithm in TensorFlow?
1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weight and bias
5. Go to each neuron which contributes to the error and change its
respective values to reduce the error
• 1, 2, 3, 4, 5
• 5, 4, 3, 2, 1
• 3, 2, 1, 5, 4
• 4, 3, 1, 5, 2
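
The five numbered steps can be seen together in a small loop. The following is a minimal sketch in plain Python, assuming a single linear neuron fitted to made-up data for y = 2x + 1; the learning rate, epoch count and starting values are hypothetical choices, and the comments map each line back to the step numbers used in the question.

```python
# Toy data generated from y = 2x + 1 (made up for this sketch).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0   # step 4: initialize weight and bias (fixed here, random in practice)
lr = 0.05         # hypothetical learning rate

for epoch in range(2000):          # step 2: repeat until the weights are good enough
    grad_w = grad_b = 0.0
    for x, y in data:
        pred = w * x + b           # step 3: pass the input through the "network"
        err = pred - y             # step 1: error between prediction and actual value
        grad_w += 2 * err * x / len(data)
        grad_b += 2 * err / len(data)
    w -= lr * grad_w               # step 5: change the values that contributed to the error
    b -= lr * grad_b

print(round(w, 3), round(b, 3))    # close to the true slope 2 and intercept 1
```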


“Convolutional Neural Networks can perform various types of transformation
(rotations or scaling) on an input.” Is this statement true or false in
TensorFlow?
• True
• False

Which of the following techniques performs operations similar to dropout in a
neural network in TensorFlow?
• Bagging
• Boosting
• Stacking
• None of these

Which of the following is true about model capacity (where model capacity
means the ability of the neural network to approximate complex functions) in
TensorFlow?
• As the number of hidden layers increases, model capacity increases
• As the dropout ratio increases, model capacity increases
• As the learning rate increases, model capacity increases
• None of these

If you increase the number of hidden layers in a Multi-Layer Perceptron, the
classification error on test data always decreases in TensorFlow. True or
false?
• True
• False

What is the sequence of the following tasks in a perceptron in TensorFlow?
1. Initialize the weights of the perceptron randomly
2. Go to the next batch of the dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
• 1, 2, 3, 4
• 4, 3, 2, 1
• 3, 1, 2, 4
• 1, 4, 3, 2
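
The four perceptron tasks can be traced in a short training loop. This is a minimal sketch in plain Python, assuming a toy logical-AND dataset, a step activation and a learning rate of 0.1, none of which come from the quiz; the comments refer to the task numbers in the question.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Toy dataset: logical AND (an assumption for this example).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Task 1: initialize the weights (and bias) of the perceptron randomly.
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
lr = 0.1

def output(x):
    """Task 4: for a sample input, compute an output."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0

for epoch in range(1000):
    mistakes = 0
    for x, target in samples:       # task 2: go to the next sample (batch of one)
        pred = output(x)            # task 4
        if pred != target:          # task 3: prediction does not match, change weights
            mistakes += 1
            w[0] += lr * (target - pred) * x[0]
            w[1] += lr * (target - pred) * x[1]
            b += lr * (target - pred)
    if mistakes == 0:               # a full pass with no errors: stop
        break

print([output(x) for x, _ in samples])
```

Because AND is linearly separable, the perceptron update rule is guaranteed to stop making mistakes after a finite number of updates.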


Suppose that you have to minimize the cost function by changing the
parameters. Which of the following techniques could be used for this in
TensorFlow?
• Exhaustive search
• Random search
• Bayesian Optimization
• Any of these

Can a neural network model the function (y=1/x) in TensorFlow?
• Yes
• No

In which neural network architecture does weight sharing occur in
TensorFlow?
• Convolutional neural network
• Recurrent neural network
• Fully connected neural network
• Both a and b

Batch Normalization is helpful because?
• It normalizes (changes) all the input before sending it to the next layer
• It returns the normalized mean and standard deviation of the weights
• It is a very efficient backpropagation technique
• None of these

Instead of trying to achieve absolute zero error, we set a metric called Bayes
error, which is the error we hope to achieve. What could be the reason for
using Bayes error in TensorFlow?
• Input variables may not contain complete information about the output
variable
• The system (that creates the input-output mapping) may be stochastic
• Limited training data
• All of the above

In a neural network, which of the following techniques is used to deal with
overfitting in TensorFlow?


• Dropout
• Regularization
• Batch Normalization
• All of the above

Y = ax^2 + bx + c (polynomial equation of degree 2). Can this equation be
represented by a neural network with a single hidden layer and a linear
threshold?
• Yes
• No

A numeric variable can store numeric values with a maximum of eight digits.
• True
• False

What is a dead unit in a neural network?
• A unit which doesn't update during training by any of its neighbours
• A unit which does not respond completely to any of the training
patterns
• The unit which produces the biggest sum-squared error
• None of these

Which of the following statements is the best description of early stopping?
• Train the network until a local minimum in the error function is
reached
• Simulate the network on a test dataset after every epoch of training.
Stop training when the generalization error starts to increase
• Add a momentum term to the weight update in the Generalized Delta
Rule, so that training converges more quickly
• A faster version of backpropagation, such as the 'Quickprop'
algorithm

What if we use a learning rate that's too large?
• Network will converge
• Network will not converge


• Can’t Say
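
The effect of the step size can be seen on a one-dimensional problem. This is a minimal sketch, assuming the toy objective f(w) = w**2 (gradient 2*w), a starting point of 1.0 and two hypothetical learning rates; it illustrates the behaviour and is not a statement about any particular network.

```python
def descend(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 by gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = descend(lr=0.1)  # each step multiplies w by 0.8, so w shrinks toward 0
large = descend(lr=1.5)  # each step multiplies w by -2, so |w| doubles every step
print(abs(small) < 1.0, abs(large) > 1.0)
```

With the larger rate every update overshoots the minimum by more than the previous error, so the iterates move away from the minimum instead of settling.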

In TensorFlow, knowing the weight and bias of each neuron is the most
important step. If you can somehow get the correct value of weight and bias for
each neuron, you can approximate any function. What would be the best way
to approach this?
• Assign random values and pray to God they are correct
• Search every possible combination of weights and biases until you get the
best value
• Iteratively check, after assigning a value, how far you are from the best
values, and slightly change the assigned values to make them better

The number of neurons in the output layer must match the number of classes
(where the number of classes is greater than 2) in a supervised learning task
in TensorFlow. True or false?
• True
• False

When a pooling layer is added to a convolutional neural network, translation
invariance is preserved. True or false?
• True
• False

Which gradient technique is more advantageous when the data is too big to
handle in RAM simultaneously?
• Full Batch Gradient Descent
• Stochastic Gradient Descent

For a classification task, instead of random weight initialization in a neural
network, we set all the weights to zero. Which of the following statements is
true?
• There will not be any problem and the neural network will train
properly
• The neural network will train, but all the neurons will end up
recognizing the same thing
• The neural network will not train as there is no net gradient change


For an image recognition problem (recognizing a cat in a photo), which
architecture of neural network would be better suited to solve the problem?
• Multi Layer Perceptron
• Convolutional Neural Network
• Recurrent Neural Network
• Perceptron

What are the factors for selecting the depth of a neural network?
1. Type of neural network
2. Input data
3. Computation power
4. Learning rate
5. The output function to map
• 1, 2, 4, 5
• 2, 3, 4, 5
• 1, 3, 4, 5
• All of these

An increase in the size of a convolutional kernel would necessarily increase
the performance of a convolutional network.
• True
• False

TensorFlow is imported as?


• Run TensorFlow
• import tensorflow as tf
• import tensorflow
• Run tf

NumPy is imported as?


• Run numpy
• import numpy as np
• import numpy
• Run numpy


Although machine learning is an interesting concept, there are limited
business applications in which it is useful.
• True
• False

Which of the following is a technique frequently used in TensorFlow and
machine learning?
• Classification of data into categories based on attributes.
• Grouping similar objects into clusters of related events.
• Identifying relationships between events to predict when one will
follow the other.
• All of the above are common machine learning techniques.

The k-NN algorithm does more computation at test time than at train time.
• True
• False
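
Where the work happens in k-NN can be seen in a toy implementation. The class below is a minimal sketch, not a library API: `TinyKNN`, its methods and the sample points are all hypothetical, and squared Euclidean distance is an arbitrary choice for the illustration.

```python
from collections import Counter

class TinyKNN:
    """Minimal k-NN classifier (illustrative only, not a library API)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" only stores the data; no model is computed here.
        self.points, self.labels = list(points), list(labels)
        return self

    def predict(self, query):
        # The work happens at prediction time: one distance per stored point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, query)), label)
            for p, label in zip(self.points, self.labels)
        )
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

model = TinyKNN(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict((0.5, 0.5)), model.predict((5.5, 5.5)))
```

In this sketch `fit` runs in time proportional to copying the data, while every call to `predict` scans the whole stored dataset.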

Which of the following distance metrics cannot be used in k-NN?
• Manhattan
• Minkowski
• Tanimoto
• Jaccard
• All can be used

Which of the following options is true about the k-NN algorithm?
• It can be used for classification
• It can be used for regression
• It can be used for both classification and regression

For practical implementation, what type of approximation is used on the
Boltzmann law?
• max field approximation
• min field approximation
• hopfield approximation
• none

False minima can be reduced by deterministic updates?


• Yes
• No

What was the second stage in the perceptron model called?
• Sensory units
• Summing unit
• Association unit
• Output unit

Delta learning is of the unsupervised kind?


• Yes
• No

What leads to minimization of error between the desired & actual outputs?
• Stability
• Convergence
• Either stability or convergence
• None of the mentioned

Assume a convolutional neural network is trained on the ImageNet dataset
(an object recognition dataset). This trained model is then given a completely
white image as an input. The output probabilities for this input would be equal
for all classes. True or false?
• True
• False

The problem you are trying to solve has a small amount of data. Fortunately,
you have a pre-trained neural network that was trained on a similar problem.
Which of the following methodologies would you choose to make use of this
pre-trained network?
• Re-train the model for the new dataset
• Assess on every layer how the model performs and only select a
few of them
• Fine-tune the last couple of layers only
• Freeze all the layers except the last, re-train the last layer

Which of the following is true regarding the backpropagation algorithm?


• Also known as generalized delta rule.


• The error is propagated backwards to determine weight updates
• No feedback at any stage
• All of the above mentioned

Considering backpropagation, which of the following options is true?


• It is a feedback neural network
• Actual output determined by the output of each hidden layer
• Hidden layers' output is not important in itself; they are only meant for
supporting input and output layers

What are the general limitations of back propagation rule?


• No feedback at any stage
• Slow convergence
• Scaling
• All of the mentioned

A format will modify both the stored value and the displayed value.
• Correct
• Incorrect

1) Which of the following statement(s) correctly represents a real neuron in
TensorFlow?

• A. A neuron has a single input and a single output only
• B. A neuron has multiple inputs but a single output only
• C. A neuron has a single input but multiple outputs
• D. All of the above statements are valid
2) Which of the following techniques performs operations similar to dropout in
a neural network in TensorFlow?

• A. Stacking
• B. Bagging
• C. Boosting
• D. None of these
3) Can a neural network model the function (y=1/x) in TensorFlow?

• A. True
• B. False


4) In which neural network architecture does weight sharing occur in
TensorFlow?

• A. Fully connected neural network
• B. Recurrent neural network
• C. Convolutional neural network
• D. Both B and C
5) In a neural network, which of the following techniques is used to deal with
overfitting in TensorFlow?

• A. Dropout
• B. Regularization
• C. Batch Normalization
• D. All of the above
6) Y = ax^2 + bx + c (polynomial equation of degree 2). Can this equation be
represented by a neural network with a single hidden layer and a linear
threshold?

• A. Yes
• B. No
7) A numeric variable can store numeric values with a maximum of eight digits.

• A. True
• B. False
8) Which of the following describes a dead unit in a neural network?

• A. The unit which produces the biggest sum-squared error
• B. A unit which does not respond completely to any of the training
patterns
• C. A unit which doesn't update during training by any of its neighbours
• D. None of these
9) What if we use a learning rate that's too large?

• A. Network will converge
• B. Network will not converge
• C. Can't say
10) Which of the following functions shouldn't be used at the output layer to
classify an image?

• A. tanh


• B. ReLU
• C. sigmoid
• D. None of these
11) The number of nodes in the input layer is 10 and in the hidden layer is 5.
What is the maximum number of connections from the input layer to the hidden
layer?

• A. Twenty
• B. Sixty
• C. Fifty
• D. It is random
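
The count asked for in question 11 follows from simple multiplication when the two layers are fully connected. A minimal sketch, with a hypothetical helper name:

```python
def max_connections(n_inputs, n_hidden):
    """In a fully connected layer every input node links to every hidden node."""
    return n_inputs * n_hidden

print(max_connections(10, 5))  # 10 * 5 = 50
```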
12) From the following choices, where can deep learning be used?

• A. Detection of exotic particles
• B. Protein structure prediction
• C. Prediction of chemical reactions
• D. All of the above
13) The network that involves feedback links from the output to the input and
hidden layers is called ____

• A. Self organizing maps
• B. Multi layered perceptron
• C. Recurrent neural network
• D. All of the above
14) Feature Columns handle a variety of input data types without _______ to
the model.

• A. Changes
• B. user help
• C. documentation
• D. None of these
15) Why do we use TPU?

• A. To visualize model
• B. For debugging purpose only
• C. To accelerate the development
• D. TPU does not exist
16) What do you mean by TensorBoard?

• A. TensorBoard provides the visualization and tooling needed for
machine learning experimentation


• B. TensorBoard is a metric tool which compares models in terms of their
accuracy
• C. TensorBoard does not exist
• D. TensorBoard is used to rank the best performing Tensors
17) Which of the following products isn't built using TensorFlow?

• A. Hand Writing Recognition
• B. Teachable Machine
• C. Nsynth
• D. Pandas
18) What is the full form of TPU?

• A. Two processing unit
• B. Truer processing unit
• C. Test processing unit
• D. Tensor processing unit
19) What is the full form of XLA in TensorFlow?

• A. Accelerated Linear Algebra
• B. Unknown Linear Algebra
• C. Xtreme Linear Algebra
• D. X Linear Algebra
20) Can TensorFlow be deployed in container software?

• A. True
• B. False
21) Which of the following are dashboards in TensorFlow?

• A. Scalar Dashboard
• B. Histogram Dashboard
• C. Distribution Dashboard
• D. All of the above
22) Identify the types of Tensors?

• A. Variable Tensor
• B. Constant Tensor
• C. Placeholder Tensor
• D. All of the above
23) Who discovered tensors?

• A. Gargi-Curbastro


• B. Gregorio Ricci-Curbastro
• C. Both A and B
• D. None of these
24) Which of the following is true regarding the backpropagation algorithm?

• A. Also known as generalized delta rule.
• B. No feedback at any stage
• C. The error is propagated backwards to determine weight updates
• D. All of the above
25) What are the general limitations of the back propagation rule?

• A. No feedback at any stage
• B. Slow convergence
• C. Scaling
• D. All of the above
• 1. TensorFlow is a Python-based library which is used for creating machine
learning applications.

• A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
• View Answer

• 2. How many types of Tensors are there?

• A. 2
B. 3
C. 4
D. 5
• View Answer

• 3. Which of the following are main advantages of TensorFlow?

• A. It has auto differentiation capabilities


B. It has platform flexibility
C. It is easily customizable and open-source
D. All of the above
• View Answer


• 4. TensorFlow architecture works in ________ parts.

• A. 1
B. 2
C. 3
D. 4
• View Answer

• 5. __________ provides a high-level API which makes neural network
building and training fast and easy.

• A. TensorLayer
B. TFLearn
C. PrettyTensor
D. Sonnet
• View Answer

• 6. Variables in TensorFlow are also known as ?

• A. tensor variable
B. tensor keywords
C. tensor attributes
D. tensor objects
• View Answer

• 7. Which of the following defines specific input data that does not
change with time?

• A. tf.Variable
B. tf.placeholder
C. Both A and B
D. None of the above
• View Answer


• 8. Can TensorFlow be deployed in container software?

• A. Yes
B. No
C. Can be yes or no
D. Can not say
• View Answer

• 9. Which of the following is true about TensorFlow?

• A. The TensorFlow is based on Theano library.


B. It is produced by Google
C. TensorFlow does not have any option at run time
D. All of the above
• View Answer

• 10. DeepSpeech is an open-source engine used to convert speech into text.

• A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
1) TensorFlow was developed by

• A. Oracle Team

• B. IBM Team
• C. Microsoft Team
• D. Google Brain Team
2) TensorFlow was first introduced on _______

• A. October 9, 2015

• B. October 9, 2016
• C. November 8, 2015
• D. November 9, 2015
3) Tensorflow is written in which language?

• A. C++
• B. CUDA


• C. Python
• D. All of the Above
4) Tensorflow supports ______ of the following platforms.

• A. Linux
• B. macOS
• C. Windows & Android
• D. All of the Above
5) Which of the following techniques performs operations similar to dropout in
a neural network in TensorFlow?

• A. Bagging
• B. Boosting
• C. Stacking
• D. None Of Above
6) In a neural network, which of the following techniques is used to deal
with overfitting in TensorFlow?

• A. Dropout
• B. Regularization
• C. Batch Normalization
• D. All of the above
7) Tensorflow is similar to ______

• A. SQL query
• B. Data Array
• C. ANN Model
• D. Python code
8) Why do we use TPU?

• A. TPU does not exist


• B. To visualize model
• C. To accelerate the development
• D. For debugging purpose only
9) What is the full form of TPU?

• A. Tensor processing unit


• B. Truer processing unit
• C. Two processing unit
• D. Test processing unit


10) Who discovered tensors?

• A. Gregorio Ricci-Curbastro
• B. Gargi-Curbastro
• C. Both A and B
• D. None Of Above
11) How many types of Tensors are there?

• A. One
• B. Two
• C. Three
• D. Four
12) Variables in TensorFlow are also known as ?

• A. tensor objects
• B. tensor variable
• C. tensor attributes
• D. tensor keywords
13) Which of the following is true about TensorFlow?

• A. It is produced by Google
• B. The TensorFlow is based on Theano library.
• C. TensorFlow does not have any option at run time
• D. All of the Above
14) TensorFlow is a free and open-source ______

• A. PHP
• B. Java
• C. Python
• D. Angular
15) Tensorflow supports which python version?

• A. Python 3.0
• B. Python 3.3
• C. Python 3.5
• D. Python 3.6
16) Why does TensorFlow use computational graphs?

• A. Graphs are easy to plot


• B. Calculations can be done in parallel



• C. Tensors are nothing but computational graphs
• D. All of the above
17) Which of the following tools is a deep learning wrapper on TensorFlow?

• A. Creo
• B. Keras
• C. Python
• D. Arduino
18) TensorFlow is mainly used for ______

• A. Classification and Perception



• B. Discovering and Understanding
• C. Prediction and Creation
• D. All of the Above
19) Which of the following statement(s) correctly represents a real neuron in
TensorFlow?

• A. A neuron has a single input and a single output only
• B. A neuron has multiple inputs but a single output only
• C. A neuron has a single input but multiple outputs
• D. All of the above statements are valid
20) What if we use a learning rate that's too large?

• A. Network will converge
• B. Network will not converge
• C. Both A and B
• D. None Of Above
21) What is the full form of XLA in TensorFlow?

• A. X Linear Algebra
• B. Xtreme Linear Algebra
• C. Unknown Linear Algebra
• D. Accelerated Linear Algebra

1. TensorFlow is a free and open-source ............. based library for machine


learning.

• Python


• Java
• PHP
• Angular

2. TensorFlow is developed by .............

• IBM Team
• Microsoft Team
• Google Brain team
• None of the above
View Answer
Google Brain team
Exp; TensorFlow is developed by the Google Brain team.

3. TensorFlow was initially released in .................

• November 9, 2015
• November 8, 2015
• October 9, 2015
• November 9, 2016
View Answer
November 9, 2015
Exp: TensorFlow was initially released on November 9, 2015.

4. Tensorflow is written in which language?

• C++
• Python
• CUDA
• All of the above
View Answer
All of the above
Exp: Tensorflow is written in C++, Python, & CUDA programming languages.


5. TensorFlow has the largest popularity on GitHub compared to the other
deep learning frameworks.

• True
• False
View Answer
True
Exp: Yes! TensorFlow has the largest popularity on GitHub compared to the
other deep learning frameworks.

6. Tensorflow supports which python version?

• Python 3.0
• Python 3.3
• Python 3.5
• Python 3.6–3.9
View Answer
Python 3.6–3.9
Exp: Tensorflow supports Python 3.6 to 3.9 version.

7. Tensorflow supports which of the following platforms?

• Linux
• macOS
• Windows & Android
• All of the above
View Answer
All of the above
Exp: Tensorflow supports 64-bit Linux, macOS, Windows & Android platforms.

8. Tensorflow is a symbolic math library based on .............


• Dataflow
• Differentiable programming
• Both Dataflow & Differentiable programming
• None of the above
View Answer
Both Dataflow & Differentiable programming
Exp: Tensorflow is a symbolic math library based on both dataflow &
differentiable programming.

9. There are ........... main tensor types you can create in TensorFlow.

• 2
• 3
• 4
• 5
View Answer
4
Exp: There are 4 main tensor types you can create in TensorFlow: tf.Variable,
tf.constant, tf.placeholder, and tf.SparseTensor.

10. What is the Advantage of TensorFlow?

• It has excellent community support.


• It is designed to use various backends (GPUs, ASICs, etc.) and is also
highly parallel.
• It has a unique approach that allows monitoring the training progress
of our models and tracking several metrics.
• All of the above
View Answer
All of the above
Exp: The advantages of TensorFlow are: it has excellent community support; it
is designed to use various backends (GPUs, ASICs, etc.) and is also highly
parallel; it has a unique approach that allows monitoring the training progress
of our models and tracking several metrics; and its performance is high,
matching the best in the industry.

11. What are the disadvantages of TensorFlow?

• Missing Symbolic loops


• No support for Windows
• No GPU support for Nvidia
• All of the above
View Answer
All of the above
Exp: The disadvantages of TensorFlow are as follows: missing symbolic loops,
no support for Windows, no GPU support for Nvidia, no support for OpenCL,
and errors that are hard to find and difficult to debug.

12. What are the Features of TensorFlow?

• Flexible & Open Source


• Easily Trainable & Layered Components
• Open Source & Responsive Construct
• All of the above
View Answer
All of the above
Exp: The main features of TensorFlow are - Responsive Construct, Flexible,
Easily Trainable, Large Community, Open Source, Feature Columns, Layered
Components, & Event Logger (With TensorBoard) and many others.

13. TensorFlow has only supported 64-bit Python 3.5.x or Python 3.6.x on
Windows.

• True
• False
View Answer
True


14. TensorFlow managers handle the full lifecycle of Servables, except
..............

• Serving Servables
• Metrics Servables
• Loading Servables
• Unloading Servables
View Answer
Metrics Servables
Exp: TensorFlow managers handle the full lifecycle of Servables, including
loading, serving, and unloading them.

15. When was Tensorflow 2.0 released?

• September 2019
• October 2019
• August 2019
• November 2019
View Answer
September 2019
Exp: Tensorflow 2.0 was released on September 30, 2019.

16. Why does TensorFlow use computational graphs?

• Graphs are easy to plot


• Calculations can be done in parallel
• Tensors are nothing but computational graphs
• All of the above
View Answer
Calculations can be done in parallel
Exp: Tensorflow uses computational graphs because calculations can be done
in parallel.


17. What is the use of a session in TensorFlow?

• We launch the graph in a session


• A session is used to download the data
• The current work space session for storing the code
• A session is used for exporting data out of TensorFlow
View Answer
We launch the graph in a session
Exp: Basically, we launch the graph in a session in TensorFlow.

18. What are the different dashboards in TensorFlow?

• Scalar Dashboard
• Histogram Dashboard
• Distribution Dashboard
• All of the above
View Answer
All of the above
Exp: Several types of dashboards are available in TensorFlow, such as the
Scalar Dashboard, Histogram Dashboard, Distribution Dashboard, Image
Dashboard, and Audio Dashboard.

19. Which of the following tools is a deep learning wrapper on TensorFlow?

• Keras
• Azure
• Python
• PyTorch
View Answer
Keras
Exp: Keras is a deep learning wrapper on TensorFlow.


20. Can we use GPU for faster computations in TensorFlow?

• Yes
• No
View Answer
Yes
Exp: Yes! we can use GPU for faster computations in TensorFlow.

Question-1 = Why is the convolutional layer important in convolutional neural
networks?
Solution = Because if we do not use a convolutional layer, we will end up with a
massive number of parameters that will need to be optimized and it will be
super computationally expensive.
Question-2 = The following is a typical architecture of a convolutional neural
network. [Figure: network architecture diagram]
Solution = False
Question-3 = For unsupervised learning, which of the following deep neural
networks would you choose? Select all that apply
Solution = Autoencoders, Restricted Boltzmann Machines.
Question-4 = Recurrent Neural Networks are networks with loops, that don’t
just take a new input at a time, but also take as input the output from the data
point at the previous instance.
Solution = True
Question-5 = Which of the following statements is correct?
Solution = An autoencoder is an unsupervised neural network model that uses
backpropagation by setting the target variable to be the same as the input.
1. _________ is a high level API built on TensorFlow.

A. PyBrain
B. Keras
C. PyTorch
D. Theano
View Answer

2. Is keras a library?


A. Yes
B. No
C. Can be yes or no
D. Can not say
View Answer

3. Who invented keras?

A. Michael Berthold
B. Adam Paszke
C. Sam Gross
D. François Chollet
View Answer

4. __________ is a regularization technique for neural network models
proposed by Srivastava et al. in which randomly selected neurons are
ignored during training.

A. Callout
B. Digout
C. Dropout
D. Knimeout
View Answer

5. What is true about Keras?

A. Keras is an API designed for human beings, not machines.


B. Keras follows best practices for reducing cognitive load
C. it provides clear and actionable feedback upon user error
D. All of the above
View Answer

6. A flatten operation on a tensor reshapes the tensor to have a shape that is
equal to the number of elements contained in the tensor.


A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
View Answer
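
What a flatten operation does to the element count can be checked on nested lists. The following is a minimal sketch in plain Python, with a hypothetical `flatten` helper standing in for a tensor library's reshape; the sample "tensor" of shape (2, 2, 2) is made up.

```python
def flatten(nested):
    """Recursively flatten a nested list into a single 1-D list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2): 8 elements
flat = flatten(tensor)
print(len(flat), flat)  # the 1-D length equals the number of elements, 2*2*2
```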

7. What are advanced activation functions in keras ?

A. LeakyReLU
B. PReLU
C. Both A and B
D. None of the above
View Answer

8. Which of the following are correct initializers in keras?

A. keras.initializers.Initializer()
B. keras.initializers.Zeros()
C. keras.initializers.Ones()
D. All of the above
View Answer

9. A ____________ requires shape of the input (input_shape) to understand


the structure of the input data.

A. Keras layer
B. Keras Module
C. Keras Model
D. Keras Time
View Answer

10. Which of the following returns all the layers of the model as list?

A. model.inputs
B. model.layers


C. model.outputs
D. model.get_weights

Q1. Which of the following statement(s) correctly represents a real neuron?
A. A neuron has a single input and a single output only
B. A neuron has multiple inputs but a single output only
C. A neuron has a single input but multiple outputs
D. A neuron has multiple inputs and multiple outputs
E. All of the above statements are valid
Solution: (E)
A neuron can have a single Input / Output or multiple Inputs / Outputs.

Q2. Below is a mathematical representation of a neuron.

The different components of the neuron are denoted as:

• x1, x2,…, xN: The inputs to the neuron. These can either be the
actual observations from the input layer or an intermediate value from one
of the hidden layers.
• w1, w2,…, wN: The weight of each input.
• bi: The bias units. These are constant values added to the input
of the activation function corresponding to each weight. They work like
an intercept term.
• a: The activation of the neuron, which can be represented
as a = f(w1*x1 + w2*x2 + … + wN*xN + bias)
• y: The output of the neuron


Considering the above notations, will a line equation (y = mx + c) fall into the
category of a neuron?
A. Yes
B. No
Solution: (A)
A single neuron with no non-linearity can be considered as a linear regression
function.
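
The correspondence between a line and a neuron without a non-linearity can be written out directly. A minimal sketch, assuming an identity activation, one input, and hypothetical values for the slope m and intercept c:

```python
def linear_neuron(x, weight, bias):
    """Single input, identity activation: output = weight * x + bias."""
    return weight * x + bias

m, c = 2.0, -1.0  # hypothetical slope and intercept
print([linear_neuron(x, m, c) for x in (0.0, 1.0, 2.0)])  # [-1.0, 1.0, 3.0]
```

Here the weight plays the role of the slope m and the bias plays the role of the intercept c.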

Q3. Let us assume we implement an AND function with a single neuron. Below is
a tabular representation of an AND function:
X1 X2 X1 AND X2
0 0 0
0 1 0
1 0 0
1 1 1
The activation function of our neuron is denoted as:
f(x) = 0 for x < 0, and f(x) = 1 for x >= 0

What would be the weights and bias?


(Hint: For which values of w1, w2 and b does our neuron implement an AND
function?)
A. Bias = -1.5, w1 = 1, w2 = 1
B. Bias = 1.5, w1 = 2, w2 = 2
C. Bias = 1, w1 = 1.5, w2 = 1.5
D. None of these
Solution: (A)
With option A:
1. f(-1.5*1 + 1*0 + 1*0) = f(-1.5) = 0
2. f(-1.5*1 + 1*0 + 1*1) = f(-0.5) = 0
3. f(-1.5*1 + 1*1 + 1*0) = f(-0.5) = 0
4. f(-1.5*1 + 1*1 + 1*1) = f(0.5) = 1
Therefore option A is correct.
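
The four evaluations above can be reproduced in a few lines. This is a minimal sketch, assuming a threshold activation that returns 1 for inputs greater than or equal to 0 and 0 otherwise (inferred from the worked values); the function names are hypothetical.

```python
def step(z):
    """Threshold activation assumed from the worked solution: 1 if z >= 0 else 0."""
    return 1 if z >= 0 else 0

def neuron(x1, x2, bias, w1, w2):
    """Single neuron: threshold of bias plus the weighted inputs."""
    return step(bias * 1 + w1 * x1 + w2 * x2)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
option_a = [neuron(x1, x2, bias=-1.5, w1=1, w2=1) for x1, x2 in inputs]
option_b = [neuron(x1, x2, bias=1.5, w1=2, w2=2) for x1, x2 in inputs]
print(option_a)  # [0, 0, 0, 1] matches the AND truth table
print(option_b)  # [1, 1, 1, 1] does not
```

With a positive bias, as in options B and C, the neuron already fires for the input (0, 0), so it cannot reproduce AND under this activation.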

Q4. A network is created when we stack multiple neurons together. Let us take
an example of a neural network simulating an XNOR function.

You can see that the last neuron takes input from the two neurons before it. The
activation function for all the neurons is given by:
f(x) = 0 for x < 0, and f(x) = 1 for x >= 0


Suppose X1 is 0 and X2 is 1, what will be the output for the above neural
network?
A. 0
B. 1
Solution: (A)
Output of a1: f(0.5*1 + -1*0 + -1*1) = f(-0.5) = 0
Output of a2: f(-1.5*1 + 1*0 + 1*1) = f(-0.5) = 0
Output of a3: f(-0.5*1 + 1*0 + 1*0) = f(-0.5) = 0
So the correct answer is A
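
The three neuron outputs can be checked for every input pair, not only for X1 = 0 and X2 = 1. The sketch below takes the biases and weights from the solution lines above and assumes the same threshold activation (1 for inputs greater than or equal to 0, otherwise 0); the function names are hypothetical.

```python
def step(z):
    """Threshold activation assumed from the worked solution."""
    return 1 if z >= 0 else 0

def xnor_network(x1, x2):
    a1 = step(0.5 * 1 + -1 * x1 + -1 * x2)   # fires only when both inputs are 0
    a2 = step(-1.5 * 1 + 1 * x1 + 1 * x2)    # fires only when both inputs are 1
    a3 = step(-0.5 * 1 + 1 * a1 + 1 * a2)    # fires when either hidden neuron fired
    return a3

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xnor_network(x1, x2))
```

The network outputs 1 exactly when the two inputs are equal, which is the XNOR function.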

Q5. In a neural network, knowing the weight and bias of each neuron is the
most important step. If you can somehow get the correct value of weight and
bias for each neuron, you can approximate any function. What would be the
best way to approach this?
A. Assign random values and pray to God they are correct
B. Search every possible combination of weights and biases till you get the best
value
C. Iteratively check, after assigning a value, how far you are from the best
values, and slightly change the assigned values to make them better
D. None of these
Solution: (C)
Option C is the description of gradient descent.

Q6. What are the steps for using a gradient descent algorithm?

1. Calculate error between the actual value and the predicted value
2. Reiterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias
5. Go to each neuron which contributes to the error and change its
respective values to reduce the error
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 3, 2, 1, 5, 4
D. 4, 3, 1, 5, 2
Solution: (D)
Option D is correct
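
The order 4, 3, 1, 5, 2 can be sketched on a one-weight, one-bias model. This is only an illustration under stated assumptions: the toy data (`y = 2x + 1`), the mean squared error loss, the learning rate and the function name `train` are not from the original question.

```python
import random


def train(data, lr=0.05, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Step 4: initialize random weight and bias.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    # Step 2: reiterate until the weights are good enough (fixed epoch count here).
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            pred = w * x + b                    # Step 3: forward pass
            err = pred - y                      # Step 1: error vs. actual value
            grad_w += 2 * err * x / len(data)   # Step 5: contribution of each
            grad_b += 2 * err / len(data)       #         parameter to the error
        w -= lr * grad_w                        # Step 5: change values to reduce error
        b -= lr * grad_b
    return w, b


data = [(x, 2 * x + 1) for x in (-2, -1, 0, 1, 2)]
w, b = train(data)
print(w, b)
```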

Q7. Suppose you have inputs as x, y, and z with values -2, 5, and -4 respectively.
You have a neuron ‘q’ and neuron ‘f’ with functions:
q=x+y
f=q*z
Graphical representation of the functions is as follows:

What is the gradient of f with respect to x, y, and z?
(HINT: To calculate gradient, you must find (df/dx), (df/dy) and (df/dz))
A. (-3,4,4)
B. (4,4,3)
C. (-4,-4,3)
D. (3,-4,-4)
Solution: (C)
Option C is correct. By the chain rule, df/dx = z = -4, df/dy = z = -4, and
df/dz = q = x + y = 3.
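
The gradient can be checked with a hand-written forward and backward pass. This is a sketch only; the variable names mirror the question, and the intermediate names `df_dq` and so on are illustrative.

```python
# Forward pass with the values given in the question.
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule).
df_dq = z          # f = q * z, so df/dq = z
df_dz = q          # and df/dz = q
df_dx = df_dq * 1  # q = x + y, so dq/dx = 1
df_dy = df_dq * 1  # and dq/dy = 1

gradient = (df_dx, df_dy, df_dz)
print(gradient)
```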

Q8. Now let’s revise the previous slides. We have learned that:

• A neural network is a (crude) mathematical representation of a brain,
which consists of smaller components called neurons.
• Each neuron has an input, a processing function, and an output.
• These neurons are stacked together to form a network, which can be
used to approximate any function.
• To get the best possible neural network, we can use techniques like
gradient descent to update our neural network model.
Given above is a description of a neural network. When does a neural network
model become a deep learning model?
A. When you add more hidden layers and increase depth of neural network
B. When there is higher dimensionality of data
C. When the problem is an image recognition problem
D. None of these
Solution: (A)
More depth means the network is deeper. There is no strict rule of how many
layers are necessary to make a model deep, but still if there are more than 2
hidden layers, the model is said to be deep.

Q9. A neural network can be considered as multiple simple equations stacked
together. Suppose we want to replicate the function for the below mentioned
decision boundary.

Using two simple inputs h1 and h2

What will be the final equation?


A. (h1 AND NOT h2) OR (NOT h1 AND h2)
B. (h1 OR NOT h2) AND (NOT h1 OR h2)
C. (h1 AND h2) OR (h1 OR h2)
D. None of these
Solution: (A)
As you can see, combining h1 and h2 in an intelligent way can get you a
complex equation easily. Refer Chapter 9 of this book

Q10. “Convolutional Neural Networks can perform various types of
transformation (rotations or scaling) in an input”. Is the statement true or
false?
A. True
B. False
Solution: (B)
Data preprocessing steps (viz. rotation, scaling) are necessary before you give
the data to a neural network, because the neural network cannot do it itself.

Q11. Which of the following techniques perform similar operations as dropout
in a neural network?
A. Bagging
B. Boosting
C. Stacking
D. None of these
Solution: (A)
Dropout can be seen as an extreme form of bagging in which each model is
trained on a single case and each parameter of the model is very strongly
regularized by sharing it with the corresponding parameter in all the other
models. Refer here

Q12. Which of the following gives non-linearity to a neural network?
A. Stochastic Gradient Descent
B. Rectified Linear Unit
C. Convolution function
D. None of the above
Solution: (B)
Rectified Linear unit is a non-linear activation function.
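
One way to see the non-linearity is to check additivity: a linear function g satisfies g(a + b) = g(a) + g(b), and ReLU does not. The sketch below uses illustrative input values that are not part of the original question.

```python
def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


a, b = 2.0, -3.0
lhs = relu(a + b)         # relu(-1.0)
rhs = relu(a) + relu(b)   # relu(2.0) + relu(-3.0)
print(lhs, rhs)           # the two values differ, so ReLU is not linear
```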

Q13. In training a neural network, you notice that the loss does not decrease
in the few starting epochs.

The reasons for this could be:

1. The learning rate is low
2. Regularization parameter is high
3. Stuck at local minima

What according to you are the probable reasons?
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. Any of these
Solution: (D)
The problem can occur due to any of the reasons mentioned.

Q14. Which of the following is true about model capacity (where model
capacity means the ability of neural network to approximate complex
functions) ?
A. As number of hidden layers increase, model capacity increases
B. As dropout ratio increases, model capacity increases
C. As learning rate increases, model capacity increases
D. None of these
Solution: (A)
Only option A is correct.

Q15. If you increase the number of hidden layers in a Multi Layer Perceptron,
the classification error of test data always decreases. True or False?
A. True
B. False
Solution: (B)
This is not always true. Overfitting may cause the error to increase.

Q16. You are building a neural network where it gets input from the previous
layer as well as from itself.

Which of the following architectures has feedback connections?
A. Recurrent Neural network
B. Convolutional Neural Network
C. Restricted Boltzmann Machine
D. None of these
Solution: (A)
Option A is correct.

Q17. What is the sequence of the following tasks in a perceptron?

1. Initialize weights of perceptron randomly
2. Go to the next batch of dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
A. 1, 2, 3, 4
B. 4, 3, 2, 1
C. 3, 1, 2, 4
D. 1, 4, 3, 2
Solution: (D)
Sequence D is correct.
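
The sequence 1, 4, 3, 2 can be sketched as a training loop. The example below is an illustration under assumptions that are not in the original: it learns the AND function, uses the classic perceptron update rule with a learning rate of 0.1, and treats each sample as a "batch"; all names are hypothetical.

```python
import random


def predict(weights, bias, x):
    """Threshold unit: 1 if the weighted sum is non-negative, else 0."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else 0


def train_perceptron(samples, lr=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # Task 1: initialize weights of the perceptron randomly.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:                 # Task 2: go to the next sample
            output = predict(weights, bias, x)    # Task 4: compute an output
            if output != target:                  # Task 3: change weights on a mismatch
                delta = lr * (target - output)
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
                mistakes += 1
        if mistakes == 0:                         # every sample is classified correctly
            break
    return weights, bias


and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print(weights, bias)
```

Because AND is linearly separable, the perceptron rule is guaranteed to stop making mistakes after a finite number of updates.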

Q18. Suppose that you have to minimize the cost function by changing the
parameters. Which of the following techniques could be used for this?
A. Exhaustive Search
B. Random Search
C. Bayesian Optimization
D. Any of these
Solution: (D)
Any of the above mentioned techniques can be used to change parameters.

Q19. First Order Gradient descent would not work correctly (i.e. may get stuck)
in which of the following graphs?

A.

B.

C.
D. None of these
Solution: (B)
This is a classic example of saddle point problem of gradient descent.

Q20. The below graph shows the accuracy of a trained 3-layer convolutional
neural network vs the number of parameters (i.e. number of feature kernels).

The trend suggests that as you increase the width of a neural network, the
accuracy increases till a certain threshold value, and then starts decreasing.
What could be the possible reason for this decrease?
A. Even if the number of kernels increases, only a few of them are used for
prediction
B. As the number of kernels increases, the predictive power of the neural
network decreases
C. As the number of kernels increases, they start to correlate with each other,
which in turn promotes overfitting
D. None of these
Solution: (C)
As mentioned in option C, the possible reason could be kernel correlation.

Q21. Suppose we have a neural network with one hidden layer, as shown above. The
hidden layer in this network works as a dimensionality reducer. Now instead
of using this hidden layer, we replace it with a dimensionality reduction
technique such as PCA.

Would the network that uses a dimensionality reduction technique always
give the same output as the network with the hidden layer?
A. Yes
B. No
Solution: (B)
Because PCA works on correlated features, whereas hidden layers work on
predictive capacity of features.

Q22. Can a neural network model the function (y=1/x)?
A. Yes
B. No
Solution: (A)
Option A is true, because the activation function can be a reciprocal function.

Q23. In which neural net architecture does weight sharing occur?
A. Convolutional neural Network
B. Recurrent Neural Network
C. Fully Connected Neural Network
D. Both A and B
Solution: (D)
Option D is correct.

Q24. Batch Normalization is helpful because
A. It normalizes (changes) all the input before sending it to the next layer
B. It returns back the normalized mean and standard deviation of weights
C. It is a very efficient backpropagation technique
D. None of these
Solution: (A)
To read more about batch normalization, refer to this video

Q25. Instead of trying to achieve absolute zero error, we set a metric called
Bayes error, which is the error we hope to achieve. What could be the reason
for using Bayes error?
A. Input variables may not contain complete information about the output
variable
B. System (that creates input-output mapping) may be stochastic
C. Limited training data
D. All the above
Solution: (D)
In reality achieving accurate prediction is a myth. So we should hope to achieve
an “achievable result”.

Q26. The number of neurons in the output layer should match the number of
classes (Where the number of classes is greater than 2) in a supervised learning
task. True or False?
A. True
B. False
Solution: (B)
It depends on the output encoding. If it is one-hot encoding, then it's true.
But you can have two outputs for four classes, and take the binary values as
the four classes (00, 01, 10, 11).

Q27. In a neural network, which of the following techniques is used to deal
with overfitting?
A. Dropout
B. Regularization
C. Batch Normalization
D. All of these
Solution: (D)
All of the techniques can be used to deal with overfitting.

Q28. Y = ax^2 + bx + c (polynomial equation of degree 2)
Can this equation be represented by a neural network of single hidden layer
with linear threshold?
A. Yes
B. No
Solution: (B)
The answer is no because having a linear threshold restricts your neural network
and in simple terms, makes it a consequential linear transformation function.

Q29. What is a dead unit in a neural network?
A. A unit which doesn’t update during training by any of its neighbour
B. A unit which does not respond completely to any of the training patterns
C. The unit which produces the biggest sum-squared error
D. None of these
Solution: (A)
Option A is correct.

Q30. Which of the following statements is the best description of early
stopping?
A. Train the network until a local minimum in the error function is reached
B. Simulate the network on a test dataset after every epoch of training. Stop
training when the generalization error starts to increase
C. Add a momentum term to the weight update in the Generalized Delta Rule,
so that training converges more quickly
D. A faster version of backpropagation, such as the ‘Quickprop’ algorithm
Solution: (B)
Option B is correct.
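
A minimal sketch of the rule in option B follows. It assumes a list of held-out errors measured after each epoch (the values are made up for illustration) and adds a `patience` parameter, which is a common refinement and not part of the original option; `early_stopping_epoch` is a hypothetical name.

```python
def early_stopping_epoch(held_out_errors, patience=2):
    """Return the epoch whose weights would be kept under early stopping."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(held_out_errors):
        if error < best_error:
            # Held-out error is still improving: remember this epoch.
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch


# Illustrative error curve: improves, then starts to rise (overfitting).
errors = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
kept = early_stopping_epoch(errors)
print(kept)
```

In practice the held-out set used for this decision is usually a validation set kept separate from the final test set.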

Q31. What if we use a learning rate that’s too large?
A. Network will converge
B. Network will not converge
C. Can’t Say
Solution: (B)
Option B is correct because the error rate would become erratic and explode.
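
The effect can be seen on a one-parameter toy problem. The sketch below minimizes f(w) = w^2, whose gradient is 2w; the function, starting point and learning rates are illustrative assumptions, not part of the original question.

```python
def run_gd(lr, steps=20, w=1.0):
    """Plain gradient descent on f(w) = w**2 (gradient 2*w)."""
    for _ in range(steps):
        w -= lr * 2 * w   # each step multiplies w by (1 - 2*lr)
    return w


small = run_gd(lr=0.1)   # factor 0.8 per step: w shrinks toward the minimum at 0
large = run_gd(lr=1.1)   # factor -1.2 per step: w oscillates and grows
print(small, large)
```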

Q32. The network shown in Figure 1 is trained to recognize the characters H
and T as shown below:

What would be the output of the network?

A.

B.

C.
D. Could be A or B depending on the weights of neural network
Solution: (D)
Without knowing the weights and biases of a neural network, we
cannot comment on what output it would give.

Q33. Suppose a convolutional neural network is trained on the ImageNet dataset
(an object recognition dataset). This trained model is then given a completely
white image as an input. The output probabilities for this input would be equal
for all classes. True or False?
A. True
B. False
Solution: (B)
There would be some neurons which do not activate for white pixels as
input. So the class probabilities won’t be equal.

Q34. When a pooling layer is added in a convolutional neural network,
translation invariance is preserved. True or False?
A. True
B. False
Solution: (A)
Translation invariance is induced when you use pooling.

Q35. Which gradient technique is more advantageous when the data is too big
to handle in RAM simultaneously?
A. Full Batch Gradient Descent
B. Stochastic Gradient Descent
Solution: (B)
Option B is correct.

Q36. The graph represents gradient flow of a four-hidden layer neural network
which is trained using sigmoid activation function per epoch of training. The
neural network suffers from the vanishing gradient problem.

Which of the following statements is true?
A. Hidden layer 1 corresponds to D, Hidden layer 2 corresponds to C, Hidden
layer 3 corresponds to B and Hidden layer 4 corresponds to A
B. Hidden layer 1 corresponds to A, Hidden layer 2 corresponds to B, Hidden
layer 3 corresponds to C and Hidden layer 4 corresponds to D
Solution: (A)
This is a description of a vanishing gradient problem. As the backprop algorithm
goes to starting layers, learning decreases.

Q37. For a classification task, instead of random weight initializations in a
neural network, we set all the weights to zero. Which of the following
statements is true?
A. There will not be any problem and the neural network will train properly
B. The neural network will train but all the neurons will end up recognizing the
same thing
C. The neural network will not train as there is no net gradient change
D. None of these
Solution: (B)
Option B is correct.

Q38. There is a plateau at the start. This is happening because the neural
network gets stuck at local minima before going on to global minima.

To avoid this, which of the following strategies should work?
A. Increase the number of parameters, as the network would not get stuck at
local minima
B. Decrease the learning rate by 10 times at the start and then use momentum
C. Jitter the learning rate, i.e. change the learning rate for a few epochs
D. None of these
Solution: (C)
Option C can be used to take a neural network out of local minima in which it is
stuck.

Q39. For an image recognition problem (recognizing a cat in a photo), which
architecture of neural network would be better suited to solve the problem?
A. Multi Layer Perceptron
B. Convolutional Neural Network
C. Recurrent Neural network
D. Perceptron
Solution: (B)
Convolutional Neural Network would be better suited for image related
problems because of its inherent nature for taking into account changes in
nearby locations of an image

Q40. Suppose while training, you encounter this issue. The error suddenly
increases after a couple of iterations.

You determine that there must be a problem with the data. You plot the data and
find that the original data is somewhat skewed, which may be
causing the problem.

What will you do to deal with this challenge?
A. Normalize
B. Apply PCA and then Normalize
C. Take Log Transform of the data
D. None of these
Solution: (B)
First you would remove the correlations of the data and then zero center it.

Q41. Which of the following is a decision boundary of Neural Network?

A) B
B) A
C) D
D) C
E) All of these
Solution: (E)
A neural network is said to be a universal function approximator, so it can
theoretically represent any decision boundary.

Q42. In the graph below, we observe that the error has many “ups and
downs”.

Should we be worried?
A. Yes, because this means there is a problem with the learning rate of neural
network.
B. No, as long as there is a cumulative decrease in both training and validation
error, we don’t need to worry.
Solution: (B)
Option B is correct. In order to decrease these “ups and downs” try to increase
the batch size.

Q43. What are the factors to select the depth of neural network?

1. Type of neural network (e.g. MLP, CNN, etc.)
2. Input data
3. Computation power, i.e. Hardware capabilities and software capabilities
4. Learning Rate
5. The output function to map
A. 1, 2, 4, 5
B. 2, 3, 4, 5
C. 1, 3, 4, 5
D. All of these
Solution: (D)
All of the above factors are important to select the depth of neural network

Q44. Consider the scenario. The problem you are trying to solve has a small
amount of data. Fortunately, you have a pre-trained neural network that was
trained on a similar problem. Which of the following methodologies would you
choose to make use of this pre-trained network?
A. Re-train the model for the new dataset
B. Assess on every layer how the model performs and only select a few of them
C. Fine tune the last couple of layers only
D. Freeze all the layers except the last, re-train the last layer
Solution: (D)
If the dataset is mostly similar, the best method would be to train only the last
layer, as all previous layers work as feature extractors.

Q45. An increase in the size of a convolutional kernel would necessarily
increase the performance of a convolutional network.
A. True
B. False
Solution: (B)

1. Which of the following is a subset of machine learning?

• Numpy
• SciPy
• Deep Learning
• All of the above

View Answer
Correct Answer:

Deep Learning

2. With how many layers are deep learning algorithms constructed?

• 2
• 3
• 4
• 5

View Answer
Correct Answer:
4

3. The first layer is called the?

• input layer
• outer layer
• hidden layer
• None of the above

View Answer
Correct Answer:
input layer

4. CNN is mostly used when there is?

• structured data
• unstructured data
• Both A and B
• None of the above

View Answer
Correct Answer:
unstructured data

5. Which of the following is/are common uses of RNNs?

• Help securities traders to generate analytic reports
• Detect fraudulent credit-card transaction
• Provide a caption for images
• All of the above

View Answer
Correct Answer:
All of the above

6. Which neural network has only one hidden layer between the input and
output?

• Shallow neural network
• Deep neural network
• Feed-forward neural networks
• Recurrent neural networks

View Answer
Correct Answer:
Shallow neural network

7. RNN stands for?

• Receives neural networks
• Receives neural networks
• Recording neural networks
• Recurrent neural networks

View Answer
Correct Answer:
Recurrent neural networks

8. Deep learning algorithms are _______ more accurate than machine learning
algorithms in image classification.

• 33%
• 37%
• 40%
• 41%

View Answer
Correct Answer:
41%

9. Which of the following is well suited for perceptual tasks?

• Feed-forward neural networks
• Recurrent neural networks
• Convolutional neural networks
• Reinforcement Learning

View Answer
Correct Answer:
Convolutional neural networks

10. Which of the following is/are Limitations of deep learning?

• Data labeling
• Obtain huge training datasets
• both 1 and 2
• None of the above

View Answer
Correct Answer:
both 1 and 2

11. The input image has been converted into a matrix of size 28 X 28 and a
kernel/filter of size 7 X 7 is applied with a stride of 1. What will be the
size of the convolved matrix?

• 20x20
• 21x21
• 22x22
• 25x25

View Answer
Correct Answer:
22x22
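
The answer follows from the usual output-size formula for a convolution, (n + 2p - k) / s + 1, with no padding assumed here (the question does not mention any). The function name `conv_output_size` is illustrative.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output width/height of a convolution over an n x n input with a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1


size = conv_output_size(n=28, k=7, stride=1)
print(f"{size}x{size}")
```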

12. Which of the following statements is true when you use 1×1 convolutions
in a CNN?

• It can help in dimensionality reduction
• It can be used for feature pooling
• It suffers less overfitting due to small kernel size
• All of the above

View Answer
Correct Answer:
All of the above

13. Which of the following functions can be used as an activation function in
the output layer if we wish to predict the probabilities of n classes
(p1, p2, ..., pn) such that the sum of p over all n classes equals 1?

• Softmax
• ReLu
• Sigmoid
• Tanh

View Answer
Correct Answer:
Softmax
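
A small sketch shows why softmax fits: it maps arbitrary scores to positive values that sum to 1. The input scores are illustrative, and subtracting the maximum score is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                            # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```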

14. The number of nodes in the input layer is 10 and the hidden layer is 5. The
maximum number of connections from the input layer to the hidden layer are

• 50
• less than 50
• more than 50
• It is an arbitrary value

View Answer
Correct Answer:
50

15. In which of the following applications can we use deep learning to solve
the problem?

• Protein structure prediction
• Prediction of chemical reactions
• Detection of exotic particles
• All of the above

View Answer
Correct Answer:
All of the above

16. Assume a simple MLP model with 3 neurons and inputs = 1, 2, 3. The
weights to the input neurons are 4, 5 and 6 respectively. Assume the activation
function is a linear constant value of 3. What will be the output?

• 32
• 64
• 96
• 128

View Answer
Correct Answer:
96
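
The arithmetic can be checked directly. The sketch reads "a linear constant value of 3" as the activation f(x) = 3x, which is an interpretation of the question's wording rather than something it states explicitly.

```python
inputs = [1, 2, 3]
weights = [4, 5, 6]

# Weighted sum of the inputs: 1*4 + 2*5 + 3*6.
weighted_sum = sum(x * w for x, w in zip(inputs, weights))

# Assumed linear activation f(x) = 3 * x.
output = 3 * weighted_sum
print(weighted_sum, output)
```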

17. Consider a simple MLP model with 8 neurons in the input layer, 5 neurons in
the hidden layer and 1 neuron in the output layer. What are the sizes of the
weight matrices between the hidden and output layers and between the input and
hidden layers?

• [1 X 5] , [5 X 8]
• [5 x 1] , [8 X 5]
• [8 X 5] , [5 X 1]
• [8 X 5] , [ 1 X 5]

View Answer
Correct Answer:
[5 x 1] , [8 X 5]

18. Which of the following would have a constant input in each epoch of
training a Deep Learning model?

• Weight between input and hidden layer
• Weight between hidden and output layer
• Biases of all hidden layer neurons
• Activation function of output layer

View Answer
Correct Answer:
Weight between input and hidden layer

19. In a CNN, does max pooling always decrease the number of parameters?

• True
• False
• Can be true or false
• Cannot say

View Answer
Correct Answer:
False

20. Sentiment analysis using Deep Learning is a many-to-one prediction task

• True
• False
• Can be true or false
• Cannot say

View Answer
Correct Answer:
True

21. Which, if any, of the following propositions is true about fully-connected
neural networks (FCNN)?

• In a FCNN, there are connections between neurons of a same layer.
• In a FCNN, the most common weight initialization scheme is the zero
initialization, because it leads to faster and more robust training.
• A FCNN with only linear activations is a linear network.
• None of the above

View Answer
Correct Answer:
A FCNN with only linear activations is a linear network.
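
The collapse of stacked linear layers into a single linear map can be illustrated with scalars. The weights and biases below are arbitrary illustrative values, and both function names are hypothetical.

```python
def two_linear_layers(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    """Two stacked layers with linear (identity) activations."""
    h = w1 * x + b1        # hidden layer
    return w2 * h + b2     # output layer


def collapsed(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    """The same function written as a single linear layer."""
    return (w2 * w1) * x + (w2 * b1 + b2)


for x in (-2.0, 0.0, 1.5, 4.0):
    print(x, two_linear_layers(x), collapsed(x))
```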

22. What does a Boltzmann machine consist of?

• fully connected network with both hidden and visible units
• asynchronous operation
• stochastic update
• all of the mentioned

View Answer
Correct Answer:
all of the mentioned

23. In which neural net architecture does weight sharing occur?

• Convolutional neural Network
• Recurrent Neural Network
• Fully Connected Neural Network
• Both 1 and 2

View Answer
Correct Answer:
Both 1 and 2

24. Which of the following methods DOES NOT prevent a model from
overfitting to the training set?

• Early stopping
• Dropout
• Data augmentation
• Pooling

View Answer
Correct Answer:
Pooling

25. Assume that your machine has a large enough RAM dedicated to training
neural networks. Compared to using stochastic gradient descent for your
optimization, choosing a batch size that fits your RAM will lead to:

• a more precise but slower update.
• a more precise and faster update.
• a less precise but faster update.
• a less precise and slower update.

View Answer
Correct Answer:
a more precise but slower update.

Question 1
For what purpose is a Convolutional Neural Network used?

Mainly to process and analyse digital images, with some success cases
involving processing voice and natural language.

It is a multi-purpose algorithm that can be used for Unsupervised Learning.

Mainly to process and analyse financial models, predicting future trends.

It is a multi-purpose algorithm that can be used for Supervised Learning.

A CNN has some components and parameters which work well with images.
That's why it's mainly used to analyse and predict images.
Question 2
What is the biggest advantage of using a CNN?

Little dependence on pre-processing, decreasing the need for human effort in
developing its functionalities.

It is easy to understand and fast to implement.

It has the highest accuracy among all algorithms that predict images.

It works well both for Supervised and Unsupervised Learning.

With little dependence on pre-processing, this algorithm requires less human
effort. It is actually a self-learner, which makes the pre-processing phase
easier.

A Convolutional Neural Network has 5 basic components: Convolution, ReLU,
Pooling, Flattening and Full Connection. Based on this information, please
answer the questions below.
Question 3
Which answer best explains Convolution?

Detect key features in images, respecting their spatial boundaries.

It is the first step to use CNN.

Understand the model features and select the best.

It is a technique to standardize the dataset.

This is the component which detects features in images, preserving the
relationship between pixels by learning image features using small squares of
input data.
Question 4
Which answer best explains ReLU?

Helps in the detection of features, increasing the non-linearity of the image,
converting negative pixels to zero. This behavior allows you to detect
variations of attributes.

It is used to find the best features considering their correlation.

Helps in the detection of features, increasing the non-linearity of the image,
converting positive pixels to zero. This behavior allows you to detect variations
of attributes.

A technique that allows you to find outliers.

An image is usually highly non-linear, which means its pixel values vary
widely, while the convolution operation itself is linear. ReLU sets negative
values to zero, which restores non-linearity to the feature maps and helps the
algorithm make correct predictions.
Question 5
Which answer best explains Pooling?

It assists in the detection of features, even if they are distorted, in addition to
decreasing the attribute sizes, resulting in decreased computational need. It is
also very useful for extracting dominant attributes.

It assists in the detection of distorted features, in order to find dominant
attributes.

Creates a pool of data in order to improve the accuracy of the algorithm
predicting images.

Decreases the feature size, in order to decrease the computational power that
is needed.

As a result of pooling, even if the picture were a little tilted, the largest number
in a certain region of the feature map would have been recorded and hence
the feature would have been preserved. As another benefit, reducing the
size by a very significant amount uses less computational power.
Question 6
Which answer best explains Flattening?

Once we have the pooled feature map, this component transforms the
information into a vector. It's the input we need to get on with Artificial Neural
Networks.

Transform images to vectors to make it easier to predict.

Delete unnecessary features to make our dataset cleaner.

It is the last step of CNN.

In the flattening procedure, we basically take the elements in a pooled feature
map and put them in a vector form. This becomes the input layer for the
upcoming ANN.
Question 7
Which answer best explains Full Connection?

Full Connection acts by placing different weights in each synapse in order to
minimize errors. This step can be repeated until an expected result is achieved.

Full Connection acts by placing different weights in each synapse in order to
minimize errors. No iteration is needed, since we can get the best results in our
first attempt.

It is the last step of CNN, where we connect the results of the earlier
components to create an output.

It is a component that connects different algorithms in order to increase the
accuracy.

It works like an ANN: random weights are assigned to each synapse, and the
input layer is weight-adjusted and passed into an activation function. The
output of this is then compared to the true values and the error generated is
back-propagated, i.e. the weights are re-adjusted and all the processes
repeated. This is done until the error or cost function is minimised.
Question 8
What are the Pooling Types? What are their characteristics?

Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel and suppresses noise, while Average
pooling only returns the average of that portion.

Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel, while Average pooling returns the average
of that portion and suppresses noise.

Max Pooling and Minimum Pooling. Max pooling returns the maximum value
of the portion covered by the kernel and suppresses noise, while
Minimum pooling only returns the smallest value of that portion.

Max Pooling and Std Pooling. Max pooling returns the maximum value of the
portion covered by the kernel, while Std Pooling returns the standard deviation
of that portion.

It is recommended to use Max Pooling most of the time.
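
The two pooling types can be compared on a small feature map. This is a sketch under assumptions: the 4 x 4 values, the 2 x 2 window with a stride equal to the window size, and the helper name `pool2d` are all illustrative.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 8],
]

max_pooled = pool2d(feature_map, 2, max)                         # largest value per window
avg_pooled = pool2d(feature_map, 2, lambda w: sum(w) / len(w))   # mean value per window
print(max_pooled)
print(avg_pooled)
```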
Question 9
CNN is divided into two big steps: Feature Learning and Classification. What
happens in each step?

Feature Learning has Convolution, ReLU and Pooling components, with
numerous iterations between them before moving to Classification, which uses
the Flattening and Full Connection components.

Feature Learning has Flattening and Full Connection components, with
numerous iterations between them before moving to Classification, which uses
the Convolution, ReLU and Pooling components.

During Feature Learning, CNN uses appropriate algorithms for it, while during
classification it changes the algorithm in order to achieve the expected result.

option4

During Feature Learning, the algorithm is learning about its dataset.
Components like Convolution, ReLU and Pooling work for that. Once the
features are known, the classification happens using the Flattening and Full
Connection components.
Question 10
What is the difference between CNN and ANN?

CNN has one or more layers of convolution units, which receive their input from
multiple units.

CNN uses a simpler algorithm than ANN.

CNN is the easiest way to use Neural Networks.

They complete each other, so in order to use ANN, you need to start with CNN.

The only difference is the Convolutional component, which is what makes CNN
good at analysing and predicting data like images. The other steps are the same.
Question 11
What is the benefit of using a CNN instead of an ANN?

It reduces the number of units in the network, which means fewer parameters to
learn and a reduced chance of overfitting. It also considers the context
information in small neighborhoods. This feature is very important for
achieving better predictions on data like images.

It increases the number of units in the network, which means more parameters to
learn and an increased chance of overfitting. It also considers the context
information in small neighborhoods. This feature is very important for
achieving better predictions.

There is no benefit; an ANN is always better.

A CNN has better results since you have more computational power.
Since digital images are grids of pixels holding many values, it makes sense to
use a CNN to analyse them. A CNN reduces these values, which suits a training
phase with less computational power and less information loss.
Question 12
What does 'Shared Weights' mean in a CNN?

It is what makes a CNN 'convolutional'. By forcing the neurons of one layer to
share weights, the forward pass becomes the equivalent of convolving a filter
over the image to produce a new image. The training phase then becomes a task of
learning filters, deciding what features you should look for in the data.

Sharing weights among the features makes it easier and faster for the CNN to
predict the correct image.

It means that the CNN uses the weights of each feature in order to find the best
model to make predictions, sharing the results and returning the average.

It calculates the feature's weights and compares them with other algorithms in
order to find the best parameters.

This feature is what makes a CNN better at analysing images than an ANN. The
convolutional component of a CNN simplifies the image structures so that the
algorithm can predict better.
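
A small one-dimensional sketch may make weight sharing concrete. It assumes a made-up signal and a hand-picked two-weight filter (both hypothetical), and compares the number of learnable weights with a fully connected layer that produces the same number of outputs.

```python
def conv1d_shared(signal, kernel):
    """Slide one kernel over the signal: every output unit reuses the same weights."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
kernel = [-1, 1]  # hypothetical edge-detecting filter: only 2 learnable weights

feature_map = conv1d_shared(signal, kernel)

# Learnable weights with sharing vs. a dense layer of the same output size.
shared_params = len(kernel)
dense_params = len(signal) * len(feature_map)
```

Here the shared filter needs 2 weights, while the dense alternative would need one weight per input/output pair, which illustrates why sharing reduces the number of parameters to learn.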

1. Which of the following is a subset of machine learning?

A. Numpy
B. SciPy
C. Deep Learning
D. All of the above
View Answer
Ans : C

Explanation: Deep learning is software that mimics the network of neurons in a
brain. It is a subset of machine learning, and it is called deep learning
because it uses deep neural networks.

2. With how many layers are deep learning algorithms constructed?

A. 2
B. 3
C. 4
D. 5
View Answer
Ans : B

Explanation: Deep learning algorithms are constructed with 3 connected layers:
inner layer, outer layer, hidden layer.

3. The first layer is called the?

A. inner layer
B. outer layer
C. hidden layer
D. None of the above
View Answer
Ans : A

Explanation: The first layer is called the Input Layer (the "inner layer" of
option A). The last layer is called the Output Layer. All layers in between are
called Hidden Layers.

4. RNNs stands for?

A. Receives neural networks


B. Report neural networks
C. Recording neural networks
D. Recurrent neural networks
View Answer
Ans : D

Explanation: Recurrent neural networks (RNNs): An RNN is a multi-layered neural
network that can store information in context nodes, allowing it to learn data
sequences and output a number or another sequence.

5. Which of the following is/are Common uses of RNNs?

A. Help securities traders to generate analytic reports


B. Detect fraudulent credit-card transaction
C. Provide a caption for images
D. All of the above
View Answer
Ans : D

Explanation: All of the above are Common uses of RNNs.

6. Which of the following is well suited for perceptual tasks?

A. Feed-forward neural networks


B. Recurrent neural networks
C. Convolutional neural networks
D. Reinforcement Learning
View Answer
Ans : C

Explanation: CNN is a multi-layered neural network with a unique architecture
designed to extract increasingly complex features of the data at each layer to
determine the output. CNNs are well suited for perceptual tasks.

7. CNN is mostly used when there is?

A. structured data
B. unstructured data
C. Both A and B
D. None of the above
View Answer
Ans : B

Explanation: CNN is mostly used when there is an unstructured data set (e.g.,
images) and the practitioners need to extract information from it.

8. Which neural network has only one hidden layer between the input and
output?

A. Shallow neural network


B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
View Answer
Ans : A

Explanation: Shallow neural network: The Shallow neural network has only one
hidden layer between the input and output.

9. Which of the following is/are Limitations of deep learning?

A. Data labeling
B. Obtain huge training datasets
C. Both A and B
D. None of the above
View Answer

Ans : C

Explanation: Both A and B are Limitations of deep learning.

10. Deep learning algorithms are _______ more accurate than machine
learning algorithms in image classification.

A. 33%
B. 37%
C. 40%
D. 41%
View Answer
Ans : D

Artificial Intelligence Questions and
Answers – Fuzzy Logic – 1
This set of Artificial Intelligence MCQs focuses on “Fuzzy Logic – 1”.

1. Fuzzy logic is a form of


a) Two-valued logic
b) Crisp set logic
c) Many-valued logic
d) Binary set logic
View Answer

Answer: c
Explanation: In fuzzy logic, set membership is defined by a degree (a value), so
an element can take many possible degrees of membership in a set.

2. Traditional set theory is also known as Crisp Set theory.


a) True
b) False
View Answer

Answer: a
Explanation: In traditional set theory, set membership is fixed or exact: either
the member is in the set or it is not. There are only two crisp values, true or
false. In fuzzy logic there are many values: the member is in the set with some
weight, say x.

3. The truth values of traditional set theory are ____________ and those of a fuzzy set are
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
View Answer

Answer: a
Explanation: Refer to the definitions of fuzzy set and crisp set.

4. Fuzzy logic is an extension of crisp set logic that handles the concept of
Partial Truth.
a) True
b) False
View Answer

Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.

6. The room temperature is hot. Here, hot (a linguistic variable) can be
represented by _______ .
a) Fuzzy Set
b) Crisp Set
View Answer

Answer: a
Explanation: Fuzzy logic deals with linguistic variables.
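
As a rough illustration of how a linguistic value such as "hot" can be represented, the sketch below contrasts a fuzzy membership function with a crisp threshold. The ramp shape and the temperature limits are assumptions chosen for the example, not values from the question.

```python
def hot_membership(temp_c, cold_limit=20.0, hot_limit=35.0):
    """Degree (0..1) to which a temperature counts as 'hot' (simple ramp shape)."""
    if temp_c <= cold_limit:
        return 0.0
    if temp_c >= hot_limit:
        return 1.0
    return (temp_c - cold_limit) / (hot_limit - cold_limit)

def hot_crisp(temp_c, threshold=30.0):
    """Crisp set: a temperature is either hot (1) or not hot (0)."""
    return 1 if temp_c >= threshold else 0

fuzzy_degree = hot_membership(29.0)  # partially hot
crisp_value = hot_crisp(29.0)        # simply "not hot"
```

With these assumed limits, 29 degrees is hot to a degree of 0.6 in the fuzzy set but is plainly outside the crisp set.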

7. The values of the set membership are represented by


a) Discrete Set
b) Degree of truth
c) Probabilities
d) Both b & c
View Answer

Answer: b
Explanation: Both probabilities and degrees of truth range between 0 and 1, but
fuzzy set membership is expressed as a degree of truth.

8. What is meant by probability density function?


a) Probability distributions
b) Continuous variable
c) Discrete variable
d) Probability distributions for Continuous variables
View Answer

Answer: d
Explanation: None.
9. The Japanese were the first to utilize fuzzy logic practically, on high-speed
trains in Sendai.
a) True
b) False
View Answer
Answer: a
Explanation: None.

10. Which of the following is used for probability theory sentences?


a) Conditional logic
b) Logic
c) Extension of propositional logic
d) None of the mentioned
View Answer

Answer: c
Explanation: The version of probability theory we present uses an extension of
propositional logic for its sentences.

Artificial Intelligence Questions and
Answers – Fuzzy Logic – 2
This set of Artificial Intelligence MCQs focuses on “Fuzzy Logic – 2”.

1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
View Answer

Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement, respectively.
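
The usual definitions named in this explanation can be written directly. This is a minimal sketch; the function names are chosen for illustration.

```python
def fuzzy_and(a, b):
    """Fuzzy AND: the minimum of the two membership degrees."""
    return min(a, b)

def fuzzy_or(a, b):
    """Fuzzy OR: the maximum of the two membership degrees."""
    return max(a, b)

def fuzzy_not(a):
    """Fuzzy NOT: the complement of the membership degree."""
    return 1.0 - a

hot, humid = 0.7, 0.4
hot_and_humid = fuzzy_and(hot, humid)
hot_or_humid = fuzzy_or(hot, humid)
not_humid = fuzzy_not(humid)
```

When the degrees are restricted to 0 and 1, these operators give the same results as the Boolean AND, OR and NOT.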

2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
d) None of the mentioned
View Answer

Answer: a
Explanation: None.

3. Where can the Bayes rule be used?


a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
View Answer

Answer: d
Explanation: Bayes rule can be used to answer the probabilistic queries conditioned
on one piece of evidence.
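
A probabilistic query conditioned on one piece of evidence can be worked through numerically. The sketch below applies Bayes' rule to a made-up diagnostic test; the prior, likelihood and false-positive rate are hypothetical numbers used only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H) and P(E | not H), using Bayes' rule."""
    # Total probability of observing the evidence.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of cases have the condition, the test detects it
# 90% of the time and raises a false alarm 5% of the time.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

With these assumed numbers the posterior is about 0.154, far lower than the 0.9 detection rate, because the condition is rare to begin with.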
4. What does the Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
View Answer

Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
a) IF-THEN-ELSE rules
b) IF-THEN rules
c) Both a & b
d) None of the mentioned
View Answer
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
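
The rule form above can be illustrated with a tiny controller. This is a sketch under stated assumptions: two hypothetical rules for a fan, a ramp membership function with made-up limits, and a weighted average of the rule outputs as a simple way to combine them (a simplification, not the only way fuzzy systems resolve rules).

```python
def is_hot(temp_c, low=25.0, high=35.0):
    """Degree to which the temperature IS hot (assumed ramp between low and high)."""
    return min(1.0, max(0.0, (temp_c - low) / (high - low)))

def fan_speed(temp_c, slow_rpm=200.0, fast_rpm=1000.0):
    """Combine two rules by weighting each action with its rule's firing strength.

    Rule 1: IF temperature IS hot     THEN fan speed is fast_rpm
    Rule 2: IF temperature IS NOT hot THEN fan speed is slow_rpm
    """
    hot = is_hot(temp_c)
    not_hot = 1.0 - hot  # fuzzy complement
    return (hot * fast_rpm + not_hot * slow_rpm) / (hot + not_hot)

speed_at_30 = fan_speed(30.0)
```

Because both rules fire partially at 30 degrees, the resulting speed lies between the two rule actions instead of jumping from one to the other.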

6. Like relational databases, there exist fuzzy relational databases.


a) True
b) False
View Answer

Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.

7. ______________ is/are the way/s to represent uncertainty.


a) Fuzzy Logic
b) Probability
c) Entropy
d) All of the mentioned
View Answer

Answer: d
Explanation: Entropy is the amount of uncertainty involved in data, represented
by H(data).
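
As one concrete reading of H(data), the sketch below computes Shannon entropy in bits from the relative frequencies of the symbols in a sequence. The function name and sample strings are illustrative.

```python
from collections import Counter
from math import log2

def entropy(data):
    """Shannon entropy H(data) in bits, from the relative frequency of each symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * log2(n / total) for n in counts.values())

h_mixed = entropy("aabb")    # two equally likely symbols
h_uniform = entropy("aaaa")  # a single symbol, no uncertainty
```

A sequence with two equally likely symbols carries one bit of uncertainty per symbol, while a sequence with a single repeated symbol carries none.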

8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
d) None of the mentioned
View Answer

Answer: b
Explanation: An ecorithm is an algorithm that learns from its environment
(hence "eco").
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
View Answer
Answer: b
Explanation: None.

10. What is the consequence between a node and its predecessors while creating
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
View Answer

Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian
networks lead to the consequence that a node can be conditionally independent of
its predecessors.
Artificial Intelligence Questions and
Answers – Neural Networks – 1
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 1”.

1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001
View Answer

Answer: c
Explanation: The truth table before generalization is:
Inputs Output
000 $
001 $
010 $
011 $
100 $
101 $
110 0
111 1
where $ represents don’t know cases and the output is random.
After generalization, the truth table becomes:
Inputs Output
000 0
001 1
010 0
011 1
100 0
101 1
110 0
111 1
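
One way to read the generalized truth table is that the neuron ends up copying the only input bit that explains both training examples. The sketch below checks that reading; the assumption that generalization settles on a single-bit rule is ours, made for illustration.

```python
training = {"110": 0, "111": 1}

def consistent_bits(examples):
    """Return the positions of input bits that alone reproduce every training output."""
    width = len(next(iter(examples)))
    return [
        i for i in range(width)
        if all(int(bits[i]) == out for bits, out in examples.items())
    ]

candidates = consistent_bits(training)  # positions counted from the left, starting at 0

# Generalize by copying the consistent bit for every possible input.
bit = candidates[0]
generalized = {format(v, "03b"): int(format(v, "03b")[bit]) for v in range(8)}
zero_inputs = [bits for bits, out in generalized.items() if out == 0]
```

Only the last bit separates the two training inputs, and copying it gives zero exactly for 000, 010, 100 and 110, which matches option c.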

2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
View Answer

Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.

3. An auto-associative network is:


a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
View Answer

Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths(loops) does not have to be one.

4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
View Answer

Answer: a
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
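
The same arithmetic can be checked with a short function. This sketch assumes, as the explanation does, that the linear transfer function simply multiplies the weighted sum by the constant of proportionality.

```python
def linear_neuron(inputs, weights, k):
    """Weighted sum of the inputs, scaled by the constant of proportionality k."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum

output = linear_neuron(inputs=[4, 10, 5, 20], weights=[1, 2, 3, 4], k=2)
```

The weighted sum is 119, so the output is 238; option c (119) is what you get if the transfer function is forgotten.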
5. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
d) None of the mentioned
View Answer
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because a lot of the operation is done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by example.

6. Which of the following is true for neural networks?


(i) The training time depends on the size of the network.
(ii) Neural networks can be simulated on a conventional computer.
(iii) Artificial neurons are identical in operation to biological ones.
a) All of the mentioned
b) (ii) is true
c) (i) and (ii) are true
d) None of the mentioned
View Answer

Answer: c
Explanation: The training time depends on the size of the network: the greater the
number of neurons, the greater the number of possible ‘states’. Neural
networks can be simulated on a conventional computer but the main advantage of
neural networks – parallel execution – is lost. Artificial neurons are not identical in
operation to the biological ones.

7. What are the advantages of neural networks over conventional computers?


(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii)They are more suited for real time operation due to their high ‘computational’
rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
View Answer

Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.

8. Which of the following is true?


Single layer associative neural networks do not have the ability to:
(i) perform pattern recognition
(ii) find the parity of a picture
(iii)determine whether two or more shapes in a picture are connected or not
a) (ii) and (iii) are true
b) (ii) is true
c) All of the mentioned
d) None of the mentioned
View Answer

Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
9. Which is true for neural networks?
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
View Answer
Answer: d
Explanation: All mentioned are the characteristics of neural network.

10. Neuro software is:


a) A software used to analyze neurons
b) It is powerful and easy neural network
c) Designed to aid experts in real world
d) It is software used by Neuro surgeon
View Answer

Answer: b
Explanation: None.

Artificial Intelligence Questions and
Answers – Neural Networks – 2
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 2”.

1. Why is the XOR problem exceptionally interesting to neural network researchers?


a) Because it can be expressed in a way that allows you to use a neural network
b) Because it is complex binary operation that cannot be solved using neural networks
c) Because it can be solved by a single layer perceptron
d) Because it is the simplest linearly inseparable problem that exists.
View Answer

Answer: d
Explanation: None.

2. What is back propagation?


a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
d) None of the mentioned
View Answer

Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.

3. Why are linearly separable problems of interest to neural network researchers?


a) Because they are the only class of problem that network can solve successfully
b) Because they are the only class of problem that Perceptron can solve successfully
c) Because they are the only mathematical functions that are continuous
d) Because they are the only mathematical functions you can draw
View Answer

Answer: b
Explanation: Linearly separable problems are of interest to neural network
researchers because they are the only class of problem that a Perceptron can
solve successfully.

4. Which of the following is not the promise of artificial neural network?


a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
View Answer

Answer: a
Explanation: The artificial Neural Network (ANN) cannot explain result.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.

6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
View Answer

Answer: a
Explanation: A perceptron outputs 1 when the weighted sum of its inputs exceeds
the threshold and 0 otherwise.

7. The name for the function in question 6 is


a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
View Answer

Answer: b
Explanation: Also known as the step function, so option a is also right. It is a
hard thresholding function, either on or off with no in-between.

8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough
View Answer

Answer: c
Explanation: None.
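
The hand-coded case mentioned in option c can be sketched with three threshold units. The weights and biases below are one hypothetical choice among many: two units each cut off a linear part of the input space (OR and NAND), and a third combines them with AND.

```python
def step(x):
    """Hard threshold: 1 if the net input is positive, otherwise 0."""
    return 1 if x > 0 else 0

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor(a, b):
    # Hidden unit 1: fires for a OR b.
    h_or = perceptron((a, b), weights=(1, 1), bias=-0.5)
    # Hidden unit 2: fires unless both inputs are 1 (NAND).
    h_nand = perceptron((a, b), weights=(-1, -1), bias=1.5)
    # Output unit: fires only when both hidden units fire (AND).
    return perceptron((h_or, h_nand), weights=(1, 1), bias=-1.5)

truth_table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
```

No single unit of this kind can produce the XOR truth table, but the combination of three does.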
9. The network that involves backward links from output to the input and hidden
layers is called as ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
View Answer
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.

10. Which of the following is an application of NN (Neural Network)?


a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
View Answer

Answer: d
Explanation: All mentioned options are applications of Neural Network

Artificial Intelligence Questions and
Answers – Learning – 3
This set of Artificial Intelligence MCQs focuses on “Learning – 3”.

1. Which is not a desirable property of a logical rule-based system?


a) Locality
b) Attachment
c) Detachment
d) Truth-Functionality
e) Global attribute
View Answer

Answer: b
Explanation: Locality: In logical systems, whenever we have a rule of the form A =>
B, we can conclude B, given evidence A, without worrying about any other rules.
Detachment: Once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived. That is, it can be detached from its
justification. Truth-functionality: In logic, the truth of complex sentences can be
computed from the truth of the components. However, there is no Attachment
property in a rule-based system. Global attribute defines a particular problem
space as user specific and changes according to the user’s plan for the problem.

2. How is Fuzzy Logic different from conventional control methods?


a) IF and THEN Approach
b) FOR Approach
c) WHILE Approach
d) DO Approach
e) Else If approach
View Answer

Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.

3. In unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
View Answer

Answer: b
Explanation: The problem of unsupervised learning involves learning patterns in the
input when no specific output values are supplied. There is no specific output
against which to test the result, so the agent is not told what the correct
outcome is and has to find structure in the inputs on its own.

4. Inductive learning involves finding a


a) Consistent Hypothesis
b) Inconsistent Hypothesis
c) Regular Hypothesis
d) Irregular Hypothesis
e) Estimated Hypothesis
View Answer
Answer: a
Explanation: Inductive learning involves finding a consistent hypothesis that agrees
with examples. The difficulty of the task depends on the chosen representation.
5. Computational learning theory analyzes the sample complexity and computational
complexity of
a) Unsupervised Learning
b) Inductive learning
c) Forced based learning
d) Weak learning
e) Knowledge based learning
View Answer
Answer: b
Explanation: Computational learning theory analyzes the sample complexity and
computational complexity of inductive learning. There is a tradeoff between the
expressiveness of the hypothesis language and the ease of learning.

6. If a hypothesis says it should be positive, but in fact, it is negative, we call it


a) A consistent hypothesis
b) A false negative hypothesis
c) A false positive hypothesis
d) A specialized hypothesis
e) A true positive hypothesis
View Answer

Answer: c
Explanation: A consistent hypothesis agrees with the examples. If the hypothesis
says it should be negative but in fact it is positive, it is a false negative. If a
hypothesis says it should be positive, but in fact it is negative, it is a false
positive. In a specialized hypothesis we need to have certain restrictions or
special conditions.
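
The four possible outcomes can be spelled out in a small helper. The function name and labels are chosen for illustration.

```python
def classify_outcome(predicted_positive, actually_positive):
    """Name the outcome of comparing a hypothesis's prediction with the truth."""
    if predicted_positive and actually_positive:
        return "true positive"
    if predicted_positive and not actually_positive:
        return "false positive"   # says positive, but in fact negative
    if not predicted_positive and actually_positive:
        return "false negative"   # says negative, but in fact positive
    return "true negative"

outcome = classify_outcome(predicted_positive=True, actually_positive=False)
```

The case asked about in this question, a positive prediction for an example that is actually negative, is the false positive branch.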

7. Neural Networks are complex ______________ with many parameters.


a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
e) Power Functions
View Answer

Answer: b
Explanation: Neural networks are complex nonlinear functions with many
parameters. Their parameters can be learned from noisy data, and they have been
used for thousands of applications.

8. A perceptron is a ______________.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
View Answer

Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
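
A weight update rule of this kind can be sketched on a linearly separable problem such as logical AND. The learning rate, epoch count and initial weights below are assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Threshold unit: 1 if the weighted sum plus bias is positive, otherwise 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: move each weight by learning_rate * error * input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so the rule can fit it exactly.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
```

Running the same procedure on XOR would never settle on weights that classify all four examples, since XOR is not linearly separable.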
9. Which of the following statements is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
View Answer
Answer: a
Explanation: Not all formal languages are context-free.

10. Which of the following statements is not true?


a) The union and concatenation of two context-free languages is context-free
b) The reverse of a context-free language is context-free, but the complement need not
be
c) Every regular language is context-free because it can be described by a regular
grammar
d) The intersection of a context-free language and a regular language is always
context-free
e) The intersection of two context-free languages is context-free
View Answer

Answer: e
Explanation: The union and concatenation of two context-free languages is
context-free; but intersection need not be.

Artificial Intelligence Questions and
Answers – Learning – 2
This set of Artificial Intelligence MCQs focuses on “Learning – 2”.

1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
View Answer
Answer: d
Explanation: Factors which affect the performance of a learner system do not
include good data structures.

2. Different learning methods do not include:


a) Memorization
b) Analogy
c) Deduction
d) Introduction
View Answer

Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.

3. Which of the following is the model used for learning?


a) Decision trees
b) Neural networks
c) Propositional and FOL rules
d) All of the mentioned
View Answer

Answer: d
Explanation: Decision trees, Neural networks, Propositional rules and FOL rules all
are the models of learning.

4. Automated vehicle is an example of ______.


a) Supervised learning
b) Unsupervised learning
c) Active learning
d) Reinforcement learning
View Answer

Answer: a
Explanation: In an automated vehicle, a set of vision inputs and the corresponding
actions are available to the learner; hence it is an example of supervised learning.
5. Following is an example of active learning:
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
d) None of the mentioned
View Answer
Answer: a
Explanation: In active learning, not only is the teacher available, but the learner
can also ask for suitable perception-action pair examples to improve performance.

6. In which of the following types of learning does the teacher return reward and
punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
View Answer

Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher
returns reward or punishment to the learner.

7. Decision trees are appropriate for the problems where:


a) Attributes are both numeric and nominal
b) Target function takes on a discrete number of values.
c) Data may have errors
d) All of the mentioned
View Answer

Answer: d
Explanation: Decision trees can be used in all the conditions stated.

8. Which of the following is not an application of learning?


a) Data mining
b) WWW
c) Speech recognition
d) None of the mentioned
View Answer

Answer: d
Explanation: All mentioned options are applications of learning.
9. Which of the following is the component of learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Goal, model, learning rules and experience are the components of
learning system.

10. Which of the following is also called exploratory learning?


a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
View Answer
Answer: c
Explanation: In unsupervised learning no teacher is available, hence it is also
called exploratory learning.

Artificial Intelligence Questions and
Answers – Learning – 1
This set of Artificial Intelligence MCQs focuses on “Learning – 1”.

1. What will take place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech
View Answer

Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.

2. Which modifies the performance element so that it makes better decisions?


a) Performance element
b) Changing element
c) Learning element
d) None of the mentioned
View Answer

Answer: c
Explanation: A learning element modifies the performance element so that it can
make better decisions.

3. How many things are concerned in design of a learning element?


a) 1
b) 2
c) 3
d) 4
View Answer

Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.

4. What is used in determining the nature of the learning problem?


a) Environment
b) Feedback
c) Problem
d) All of the mentioned
View Answer
Answer: b
Explanation: The type of feedback is used in determining the nature of the learning
problem that the agent faces.
5. How many types are available in machine learning?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of machine learning are supervised, unsupervised and
reinforcement.

6. Which is used for utility functions in game playing algorithm?


a) Linear polynomial
b) Weighted polynomial
c) Polynomial
d) Linear weighted polynomial
View Answer

Answer: d
Explanation: Linear weighted polynomial is used for learning element in the game
playing programs.

7. Which is used to choose among multiple consistent hypotheses?


a) Razor
b) Ockham razor
c) Learning element
d) None of the mentioned
View Answer

Answer: b
Explanation: Ockham's razor prefers the simplest hypothesis consistent with the
data.

8. What will happen if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
d) None of the mentioned
View Answer

Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
View Answer
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.

10. How does the decision tree reach its decision?


a) Single test
b) Two test
c) Sequence of test
d) No test
View Answer

Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests.
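
A hand-built tree shows what "a sequence of tests" means in practice. The attributes and the decision below are hypothetical, chosen only to illustrate an object described by a set of attributes being tested one attribute at a time.

```python
def should_wait(example):
    """Hypothetical decision tree: decide whether to wait for a restaurant table."""
    # Test 1: how many patrons are in the restaurant?
    if example["patrons"] == "none":
        return False
    if example["patrons"] == "some":
        return True
    # Test 2 (only reached when the restaurant is full): is the customer hungry?
    return example["hungry"]

decision_full_hungry = should_wait({"patrons": "full", "hungry": True})
decision_empty = should_wait({"patrons": "none", "hungry": True})
```

Each call follows one path from the root to a leaf, so the number of tests performed depends on the attribute values of the input.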
1: ANN is composed of large number of highly interconnected processing
elements(neurons) working in unison to solve problems.

A.
True

B.
False


Option: A

Explanation :

2:
Artificial neural networks are used for

A.
Pattern Recognition

B.
Classification

C.
Clustering

D.
All of these



Option: D

Explanation :

3:
A Neural Network can answer

A.
For Loop questions

B.
what-if questions

C.
IF-Then-Else Analysis Questions

D.
None of these


Option: B

Explanation :

4:
Ability to learn how to do tasks based on the data given for training or initial
experience

A.
Self Organization

B.
Adaptive Learning
C.
Fault tolerance

D.
Robustness


Option: B

Explanation :

5:
Feature of ANN in which ANN creates its own organization or representation of
information it receives during learning time is

A.
Adaptive Learning

B.
Self Organization

C.
What-If Analysis

D.
Supervised Learning


Option: B

Explanation :
6:
In artificial Neural Network interconnected processing elements are called

A.
nodes or neurons

B.
weights

C.
axons

D.
Soma

Option: A

7:
Each connection link in ANN is associated with ________ which has information
about the input signal.

A.
neurons

B.
weights
C.
bias

D.
activation function

Option: B

8:
Artificial neurons have the capability to model networks of biological
neurons as found in the brain

A.
True

B.
False

Option: A

9:
The internal state of a neuron, which is a function of the inputs the
neuron receives, is called __________

A.
Weight
B.
activation or activity level of neuron

C.
Bias

D.
None of these

Option: B

10:
A neuron can send ________ signal at a time.

A.
multiple

B.
one

C.
none

D.
any number of

Option: B

1:
Artificial intelligence is

A.
It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations

B.
Computational procedure that takes some value as input and produces some
value as output.

C.
The science of making machines perform tasks that would require intelligence
when performed by humans

D.
None of these

Option: C

2:
Expert systems

A.
Combining different types of method or information

B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution

C.
An information base filled with the knowledge of an expert formulated in terms
of if-then rules

D.
None of these

Option: C

3:
Falsification is

A.
Modular design of a software application that facilitates the integration of new
modules

B.
Showing a universal law or rule to be invalid by providing a counter example

C.
A set of attributes in a database table that refers to data in another table
D.
None of these

Option: B

4:
Evolutionary computation is

A.
Combining different types of method or information

B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution.

C.
Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.

D.
None of these

Option: B

5:
Extendible architecture is
A.
Modular design of a software application that facilitates the integration of new
modules

B.
Showing a universal law or rule to be invalid by providing a counter example

C.
A set of attributes in a database table that refers to data in another table

D.
None of these

Option: A

6:
Massively parallel machine is

A.
A programming language based on logic

B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk

C.
Describes the structure of the contents of a database.
D.
None of these

Option: B

7:
Search space

A.
The large set of candidate solutions possible for a problem

B.
The information stored in a database that can be retrieved with a single query

C.
Worth of the output of a machine learning program that makes it understandable
for humans

D.
None of these

Option: A

8:
n(log n) refers to

A.
A measure of the desired maximal complexity of data mining algorithms
B.
A database containing volatile data used for the daily operation of an
organization

C.
Relational database management system

D.
None of these

Option: A

9:
Perceptron is

A.
General class of approaches to a problem.

B.
Performing several computations simultaneously

C.
Structures in a database that are statistically relevant

D.
Simple forerunner of modern neural networks, without hidden layers

Option: D

10:
Prolog is

A.
A programming language based on logic

B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk

C.
Describes the structure of the contents of a database

D.
None of these

Option: A

11:
Shallow knowledge

A.
The large set of candidate solutions possible for a problem

B.
The information stored in a database that can be retrieved with a single query

C.
Worth of the output of a machine learning program that makes it
understandable for humans

D.
None of these

Option: B

12:
Quantitative attributes are

A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data

B.
Attributes of a database table that can take only numerical values

C.
Tools designed to query a database

D.
None of these

Option: B

13:
Subject orientation

A.
The science of collecting, organizing, and applying numerical facts

B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.

C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data

D.
None of these

Option: C

14:
Vector

A.
It does not need the control of a human operator during execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars

C.
The validation of a theory on the basis of a finite number of examples

D.
None of these

Option: B

15:
Transparency

A.
The large set of candidate solutions possible for a problem

B.
The information stored in a database that can be retrieved with a single query

C.
Worth of the output of a machine learning program that makes it
understandable for humans

D.
None of these

Option: C

1:
The core of soft computing is

A.
Fuzzy Computing, Neural Computing, Genetic Algorithms

B.
Fuzzy Networks and Artificial Intelligence

C.
Artificial Intelligence and Neural Science

D.
Neural Science and Genetic Science

Option: A

2:
Who initiated the idea of Soft Computing?

A.
Charles Darwin

B.
Lotfi A. Zadeh
C.
Rechenberg

D.
McCulloch

Option: B

3:
Fuzzy Computing

A.
mimics human behaviour

B.
doesn't deal with 2-valued logic

C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic

D.
All of the above

Option: D

4:
Neural Computing

A.
mimics human brain

B.
information processing paradigm

C.
Both (a) and (b)

D.
None of the above

Option: C

5:
Genetic algorithms are

A.
a part of Evolutionary Computing

B.
inspired by Darwin's theory of evolution ("survival of the fittest")

C.
adaptive heuristic search algorithms based on the evolutionary ideas of
natural selection and genetics

D.
All of the above

Option: D

6:
What are the 2 types of learning?

A.
Improvised and unimprovised

B.
supervised and unsupervised

C.
Layered and unlayered

D.
None of the above

Option: B

7:
Supervised Learning is
A.
learning with the help of examples

B.
learning without teacher

C.
learning with the help of teacher

D.
learning with computers as supervisor

Option: C

8:
Unsupervised learning is

A.
learning without computers

B.
problem based learning

C.
learning from environment

D.
learning from teachers

Option: C

9:
Conventional Artificial Intelligence is different from soft computing in the sense that

A.
Conventional Artificial Intelligence deals with predicate logic, whereas soft
computing deals with fuzzy logic

B.
Conventional Artificial Intelligence methods are limited by symbols, whereas
soft computing is based on empirical data

C.
Both (a) and (b)

D.
None of the above

Option: C

10:
In supervised learning

A.
classes are not predefined
B.
classes are predefined

C.
classes are not required

D.
classification is not done

Option: B

1:
A membership function defines the fuzziness in a fuzzy set irrespective of whether
the elements in the set are discrete or continuous.

A.
True

B.
False

Option: A

2:
The membership functions are generally represented in

A.
Tabular Form

B.
Graphical Form

C.
Mathematical Form

D.
Logical Form

Option: B

3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of

A.
knowledge

B.
examples

C.
learning
D.
experience

Option: D

4: Three main basic features involved in characterizing membership function are

A.
Intuition, Inference, Rank Ordering

B.
Fuzzy Algorithm, Neural network, Genetic Algorithm

C.
Core, Support, Boundary

D.
Weighted Average, center of Sums, Median

Option: C

5:
The region of the universe that is characterized by complete membership in the set is
called

A.
Core
B.
Support

C.
Boundary

D.
Fuzzy

Option: A

6: A fuzzy set whose membership function has at least one element x in the universe
whose membership value is unity is called

A.
sub normal fuzzy sets

B.
normal fuzzy set

C.
convex fuzzy set

D.
concave fuzzy set

Option: B

7:
In a fuzzy set, a prototypical element has a membership value of

A.
1

B.
0

C.
infinite

D.
Not defined

Option: A

8:
A fuzzy set in which no element has a membership value equal to 1 is called

A.
normal fuzzy set

B.
subnormal fuzzy set.

C.
convex fuzzy set
D.
concave fuzzy set

Option: B

9: A fuzzy set whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing then strictly monotonically decreasing with increasing
values of the elements in the universe is called a

A.
convex fuzzy set

B.
concave fuzzy set

C.
Non concave Fuzzy set

D.
Non Convex Fuzzy set

Option: A

10:
A fuzzy set whose membership values are not strictly
monotonically increasing, nor strictly decreasing, nor strictly monotonically
increasing then decreasing is called a

A.
Convex Fuzzy Set

B.
Non convex fuzzy set

C.
Normal Fuzzy set

D.
Sub normal fuzzy set

Option: B

11:
Match the Column

List I
List II

1 Subnormal Fuzzy Set

2 Normal Fuzzy Set

3 Non Convex Normal Fuzzy Set

4 Convex Normal Fuzzy Set

A.
a b c d
2 1 4 3

B.
a b c d

1 2 3 4

C.
a b c d

4 3 2 1

D.
a b c d

3 2 1 4

Option: A

12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to

A.
infinite

B.
1

C.
0
D.
0.5

Option: D
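
The membership-function terms used in the questions above (core, support, boundary, crossover points, normal and convex fuzzy sets) can be illustrated on a small discrete fuzzy set. This is only a sketch: the sets `warm` and `bimodal` and all function names are hypothetical, and convexity is checked with the usual min-based definition, which covers the "increasing, decreasing, or increasing then decreasing" cases described in question 9.

```python
def core(mu):
    # Elements with complete membership (value 1)
    return {x for x, m in mu.items() if m == 1.0}

def support(mu):
    # Elements with non-zero membership
    return {x for x, m in mu.items() if m > 0.0}

def boundary(mu):
    # Elements with partial membership (strictly between 0 and 1)
    return {x for x, m in mu.items() if 0.0 < m < 1.0}

def crossover_points(mu):
    # Elements whose membership value equals 0.5
    return {x for x, m in mu.items() if m == 0.5}

def is_normal(mu):
    # Normal: at least one element has membership 1; otherwise subnormal
    return any(m == 1.0 for m in mu.values())

def is_convex(mu):
    # Convex: for ordered x < y < z, mu(y) >= min(mu(x), mu(z))
    vals = [mu[x] for x in sorted(mu)]
    n = len(vals)
    return all(
        vals[j] >= min(vals[i], vals[k])
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )

# Hypothetical fuzzy sets over temperatures in degrees Celsius
warm = {10: 0.0, 15: 0.5, 20: 1.0, 25: 0.5, 30: 0.0}
bimodal = {10: 0.2, 15: 0.8, 20: 0.3, 25: 0.9}

print(core(warm), support(warm), boundary(warm))
print(is_normal(warm), is_convex(warm))
print(is_normal(bimodal), is_convex(bimodal))
```

Here `warm` is a convex normal fuzzy set, while `bimodal` is both subnormal and non-convex.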

Questions

1. Which of the following is/are found in Genetic Algorithms?

(i) evolution
(ii) selection
(iii) reproduction
(iv) mutation

Your answer is:

(a) i & ii only
(b) i, ii & iii only
(c) ii, iii & iv only
(d) all of the above

2. Matching between terminologies of Genetic Algorithms and Genetics:

Genetic Algorithms                 Genetics (biology)
(a) representation structures      (i) external disturbance, such as cosmic radiation
(b) crossover                      (ii) chromosomes
(c) mutation                       (iii) survivability
(d) selection                      (iv) sexual reproduction

Your answer is:

(a) _____
(b) _____
(c) _____
(d) _____

3. Where are Genetic Algorithms applicable?

(i) real time application
(ii) biology
(iii) Artificial Life
(iv) economics

Your answer is:

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

4. Which of the following is/are the pre-requisite(s) when Genetic
Algorithms are applied to solve problems?

(i) encoding of solutions
(ii) well-understood search space
(iii) method of evaluating the suitability of the solutions
(iv) contain only one optimal solution

Your answer is:

(a) i & ii only
(b) ii & iii only
(c) i & iii only
(d) iii & iv only

5. Which of the following statement(s) is/are true?

(i) Genetic Algorithm is a randomised parallel search algorithm, based
on the principles of natural selection, the process of evolution.
(ii) GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii) GAs are used for solving optimization problems and modeling
evolutionary phenomena in the natural world.
(iv) Despite their utility, GAs remain a poorly understood topic.

Your answer is:

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

6. If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)

(i) The crossover operation is not successful.
(ii) Solution is about to be reached.
(iii) Diversity is so poor that the parents involved in the crossover
operation are similar.
(iv) The search space of the problem is not ideal for GAs to operate.

Your answer is:

(a) ii, iii & iv only
(b) ii & iii only
(c) i, iii & iv only
(d) all of the above

7. Which of the following comparisons is true?

Your answer is:

(a) In the event of restricted access to information, GAs win out in that
they require much less information to operate than other search methods.
(b) Under any circumstances, GAs always outperform other algorithms.
(c) The qualities of solutions offered by GAs for any problems are
always better than those provided by other search methods.
(d) GAs could be applied to any problem, whereas certain algorithms
are applicable to limited domains.

8. Which of the following statement(s) is/are true?

(i) Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii) Alife is a kind of Artificial Intelligence (AI).
(iii) Alife pursues a two-fold goal: increasing our understanding of
nature and enhancing our insight into artificial models, thereby
providing us with the ability to improve their performance.
(iv) Alife extends our studies of biology, life-as-we-know-it, to the larger
domain of possible life, life-as-it-could-be.

Your answer is:

(a) i & ii only
(b) iii & iv only
(c) i, ii & iii only
(d) all of the above

9. Where is Artificial Life applicable?

(i) film (movie, video) production
(ii) biology
(iii) robotics
(iv) air traffic control

Your answer is:

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above

10. Who can benefit from Alife?

(i) children
(ii) designers
(iii) artists
(iv) patients

Your answer is:

(a) i, ii & iii only
(b) ii, iii & iv only
(c) i, iii & iv only
(d) all of the above


Answers

Q1.

Which of the following(s) is/are found in Genetic Algorithms?

The correct answer is (d).

An initial population evolves to some optimal solutions. Selection is biased
toward better individuals, judged by their fitness values; two individuals are chosen
for reproducing offspring. By combining portions of good individuals, this
process is likely to create even better individuals.
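
The loop described above (selection by fitness, reproduction by combining parts of two parents, and mutation) can be sketched as follows. This is a minimal illustration, not a reference implementation: the OneMax fitness function (count of 1-bits), binary tournament selection, one-point crossover, and all parameter values are assumptions chosen only to keep the example short.

```python
import random

def fitness(bits):
    # Hypothetical toy objective (OneMax): number of 1-bits in the chromosome
    return sum(bits)

def select(population, rng):
    # Binary tournament: selection is biased toward the fitter of two individuals
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    # One-point crossover: combine a prefix of one parent with a suffix of the other
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        population = [
            mutate(crossover(select(population, rng),
                             select(population, rng), rng), rate, rng)
            for _ in range(pop_size)
        ]
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best

best = evolve()
print(fitness(best), best)
```

Because the search is probabilistic, the result depends on the seed, and nothing guarantees that the all-ones optimum is found.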


Q2.

Matching between terminologies of Genetic Algorithms and Genetics:

The correct answer is: (a) (ii), (b) (iv), (c) (i), (d) (iii).

Q3.

Where are Genetic Algorithms applicable?

The correct answer is (b).

Genetic Algorithms can be used to evolve strategies for interaction in the
Prisoner's Dilemma in economics. GAs are used as a computational method in
Alife: the simulation of living systems starting with single cells and evolving to
organisms, societies or even whole economic systems. These features
compete for the limited resources in this virtual world. In biology, GAs are
used in protein structure prediction, protein folding, stability of DNA hairpins
and modeling of the immune system.

[Figures: DNA structures; Protein structures]

GAs cannot be applied in real-time systems, where the response time is critical,
because GAs cannot guarantee to find a solution. The time spent in
evaluation of the fitness function and other genetic operations is substantial,
especially in a poorly understood, complex search space.

Q4.

Which of the following is/are the requirement(s) when Genetic
Algorithms are applied to solve problems?

The correct answer is (c).

The problem is mapped into a set of strings, with each string (i.e. chromosome)
representing a potential solution. A fitness function is required to
compare solutions and tell which is better. GA performance is heavily
dependent on the representation chosen.

GAs are designed to efficiently search large, non-linear, poorly understood
search spaces where expert knowledge is scarce or difficult to encode and
where traditional techniques fail. However, domain knowledge guides GAs to
obtain the optimal solutions. Moreover, GAs are powerful enough to solve for
a set of (nearly) optimal solutions.

Q5.

Which of the following statement(s) is/are true?

The correct answer is (c).

The search space is too complex for exhaustive search; GAs
successfully find robust solutions after evaluating only a few percent of the
full parameter space.

It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.

Their probabilistic nature and reliance on frequent interactions of members of
a large population make a complete analytic understanding of GAs extremely
difficult.

Q6.

If crossover between chromosomes in the search space does not produce
significantly different offspring, what does it imply? (if offspring
consist of one half of each parent)

The correct answer is (b).

When the crossover operation does not produce significantly different offspring,
it shows that the parents involved are almost identical. Hence, it means that
a solution is about to be reached. However, the solution derived is not
necessarily the optimal solution. From here, we can see that mutation is
necessary to maintain the diversity of the population so that GAs are not
trapped in partial solutions.
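
The effect described above can be checked on a toy case. In the sketch below, two identical parents are crossed at the midpoint (each offspring takes one half from each parent), which reproduces the parents exactly; a bit-flip mutation then reintroduces diversity. The chromosome values and function names are hypothetical, and a mutation rate of 1.0 is used only to make the outcome deterministic.

```python
import random

def one_point_crossover(p1, p2, point):
    # Each child takes the first `point` genes from one parent, the rest from the other
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def hamming(a, b):
    # Number of positions at which two chromosomes differ
    return sum(x != y for x, y in zip(a, b))

def flip_mutation(bits, rate, rng):
    # Flip each bit independently with probability `rate`
    return [1 - b if rng.random() < rate else b for b in bits]

parent_a = [1, 0, 1, 1, 0, 0, 1, 0]
parent_b = list(parent_a)  # converged population: the parents are identical

child_1, child_2 = one_point_crossover(parent_a, parent_b, point=len(parent_a) // 2)
print(hamming(child_1, parent_a), hamming(child_2, parent_a))

# Mutation is the only operator here that can create new genetic material
mutant = flip_mutation(child_1, 1.0, random.Random(0))
print(hamming(mutant, parent_a))
```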


Q7.

Which of the following comparison is true?

The correct answer is (a).

(a) This is true, since GAs require only the information needed to
evaluate the fitness function for the possible solutions
(individuals in the search space). Other searches generally
require more information, such as differentiability of the
problem function, which may be hard to obtain.

(b) This holds true in most circumstances. However, if the search
space is small enough, other searches like hill-climbing or
heuristic search, which are very effective in exploring small spaces,
would perform just as well.

(c) GAs have only been developed for a couple of decades, while
traditional searches have been investigated for a longer time.
Thus GAs do not necessarily produce a better quality solution.

(d) Evidently certain algorithms are only applicable to limited
domains. However, certain difficulties, like encoding of
problems, might hinder the use of GAs.

Q8.

Which of the following statement(s) is/are true?

The correct answer is (b).

Alife is characterised by a bottom-up synthesis approach, so that the robotics
work tends to aim for insect-like capability rather than human, and complex
behaviours are developed by putting together simpler ones. Artificial
forms of evolution such as Genetic Algorithms and Genetic Programming are
widely used to evolve solutions or behaviours rather than designing them in a
top-down fashion as in Artificial Intelligence.

Q9.

Where is Artificial Life applicable?

The correct answer is (d).

Alife is applicable in many fields, for example in walking robots.

Q10.

Who can be benefited from Alife?

The correct answer is (d).

Children can use various computational tools (including LEGO/Logo
and Electronic Bricks) to build artificial creatures, exploring
some of the central ideas of Alife.

GAs can be applied to the design of laminated composite structures, circuit
designs and the improvement of Pareto optimal designs. Genetic programming
can help artists to create many pictures. Medical problems can also be
detected: Medibrains.

SOFT COMPUTING

UNIT – I

1. The structural constituent of a human brain is known as ------------------

a) Neuron b)Cells c)Chromosomes d)Genes

2.Neural networks are also known as -----------------------

a)Artificial Neural Network b)Artificial Neural Systems


c)Both A and B d) None of the above

3. Neurons are also known as -----------------

a)Neurodes b)Processing elements c)Nodes d)All the above

4. In the neuron, attached to the soma are long irregularly shaped filaments called--------------

a)Dendrites b)Axon c)Synapse d)Cerebellum

5. Signum function is defined as -------------------

a) φ(I) =+1, I>0, -1, I<=0

b) φ(I)=0

c) φ(I)=+1,I>0

d) φ(I)=-1,I<=0

6. To generate the final output, the sum is passed on to a non-linear filter φ called

a)Smash function b)sum function c)Activation function d)Output function

7. ---------------function is a continuous function that varies gradually between the asymptotic values 0
and 1 or -1 and +1

a)Activation function b)Thresholding function c)Signum function d)Sigmoidal function

8.-----------------produce negative output values

a)Hyperbolic tangent function b)Parabolic tangent function

c)Tangent function d)None of the above

9.-------------------- carrying the weights connect every input neuron to the output neuron but not
vice-versa.

a)Feed forward network


b)Fast forward network
c)Fast network
d)Forward network

10.------------- has no feedback loop

a)Neural network b)Recurrent Network c)Multilayer Network d)Feed forward network

11. In the learning method, the target output is not presented to the network ----------------

a) Supervised learning b)Unsupervised learning

c)Reinforced learning d)Hebbian learning

12. Combining a number of ADALINE is ----------------

a) MULTILINE b)MULTIPLE LINE C)MADALINE d)MANYLINE

13.Neural network applications -----------------

a) Pattern Recognition b)Optimization Problem c)Forecasting d)All the above

14.------------------ is a Systematic method for training multilayer artificial neural network

a)Back propagation b)Forward propagation c)Speed propagation d)Multilayer propagation

15. --------------------- is a computational model

a) neuron b) cell c)Perceptron d)Nucleus

16.Intermediary layer is present in ----------------------

a)Multilayer feedforward perceptron model

b)Multilayer perceptron model

c)Multilayer Feedforward model

d)None of the above

17.Linear Activation Operator equation is ---------------

a) O=gI,g=tanφ

b) O=gI,g=sinφ

c) O=gI,g=cosφ

d) O=gI,g=-tanφ

18.--------------- is never assured of finding the global minimum as in the simple layer delta rule case.

a)Back propagation b)Front Propagation c)Propagation d)None above

19.The test of neural network is known as--------------

a)Inference Engine b)Checking c)Deriving d)None


20.Application of Back Propagation

a)Design of Journal Bearing b)Classification of soil

c)Hot Extrusion of soil d)All the above

21. Reinforced learning also known as ----------------

a)Output based learning b)Error based learning

c)Back propagation learning d)None

22.---------------------learning follows “Winner takes all” strategy

a)Stochastic learning b)Competitive learning c)Hebbian learning d)BackPropagation learning

23.------------------ is an earlier neural network architecture

a)Rosenblatt Perceptron b)Rosen Perceptron c)Roshon Perceptron d)None

24. Rosenblatt’s Perceptron network has three units: sensory unit, association unit and --------------

a)Output unit b) Response unit c) feedback unit d) Result unit

25.ADALINE stands for --------------------------

a)Adaptive Linear Neural Element Network

b)Adaptive Line Neural Network

c)Adapt Line Neural Element Network

d)Adaptive Linear Neural Network
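
The activation functions named in questions 5 to 8 above can be sketched in a few lines. The signum function follows the definition given in question 5, option a); the sigmoidal function varies gradually between 0 and 1, and the hyperbolic tangent between -1 and +1, so it can produce negative outputs. The slope parameter `alpha`, the helper `neuron_output`, and the sample inputs and weights are assumptions added for illustration.

```python
import math

def signum(i):
    # phi(I) = +1 if I > 0, -1 if I <= 0 (as defined in question 5, option a)
    return 1 if i > 0 else -1

def sigmoid(i, alpha=1.0):
    # Continuous function with asymptotic values 0 and 1; alpha is an assumed slope parameter
    return 1.0 / (1.0 + math.exp(-alpha * i))

def tanh_activation(i):
    # Continuous function with asymptotic values -1 and +1
    return math.tanh(i)

def neuron_output(inputs, weights, activation):
    # The weighted sum of the inputs is passed through the activation function
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total)

print(neuron_output([1.0, 0.0], [0.5, 0.5], signum))
print(sigmoid(0.0), tanh_activation(-1.0))
```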

PART-B

1. Explain the model of an artificial neuron
2. Differentiate the supervised, unsupervised, and reinforced learning methods
3. Explain Rosenblatt’s Perceptron
4. Explain ADALINE network
5. Explain Single layer ANN
6. Explain any one application of Back propagation networks

PART-C

1. Explain neural network architecture
2. Explain back propagation learning briefly
3. Explain basic concepts of neural network

UNIT-2

1.----------------is a store house of associated patterns which are encoded in some form

a)Associative memory b) Commutative memory

c)Neural networks d)Memory

2. If the associated pattern pairs (x,y) are different and if the model recalls a y given an x or vice
versa, then it is termed as -------------

a) Auto associative memory b) Hetero associative memory

c) neuro associative memory d) none

3. Autoassociative correlation memories are known as ---------------

a) Auto correlators b) Hetero Correlators c)Neuro Correlators d) None

4.--------------- recalls an output given an input in one feedforward pass

a)Static networks b) Dynamic networks c)Recurrent networks d) None

5.BAM stands for ----------------

a)Bidirectional Associative Memory b)v Associative Memory

c)Biconventional Associative Memory d) None

6.----------------- associates patterns in bipolar forms that are real-coded

a)Simplified Bidirectional Associative Memory b)Bipolar form

c)Bidirectional form d)None

7)---------------------- uses bipolar coding

a)Fabric defect identification b)Recognition of Characters

c)Design of Journal Bearing d) Classification of soil

8)Self-organizing network also known as ---------------------

a)Back Propagation network b)Training free counter propagation network

c)Propagation network d)none

9)Kosko proposed an energy function for the two states -----------------

a)E(A,B) = A M B^T

b)E(A,B) = -A M B^T

c)E(A,B) = -A B^T

d)E(A,B) = A B^T

10) BAM was introduced by ----------------------

a) Cruz b) Stubberd c)Kosko d)Rosenblatt

11)The algorithm which computes operator M is known as ------------------

a)Memory algorithm b)Recording Algorithm c)Transfer Algorithm d)None

12) Real coding is used by -----------------

a)Recognition of characters b)Fabric defect identification

c)Optimization d)Classification of soil

13)ART stands for --------------------

a)Adaptive Resonance Theory b)Adaptive Recent Theory

c)Adapt Resonance Theory d)Adaptive Retail Theory

14)A program --------------- is written in fortran for cluster formation

a) Vecquent b)Vecant c)Vector d)Quantization

15)----------------- networks were developed by carpenter and grossberg

a)ART b)ARP c)ARC d)ARD

16)------------------ of the network means that a pattern should not oscillate among different cluster
units at different stages of training

a)Stability b)Mobility c)Versatility d)Plasticity

17)------------------- is the analog version of ART

a)ART2 b)ART1 c)ART2A d)ARTMAP

18)----------------- test is incorporated into the adaptive backward network

a)Vigilance b)Indulgence c)Revailance d)None

19)In ---------------- learning the weights are adjusted only when the external input matches one of
the stored prototypes

a)Supervised b)UnSupervised c)Match-based d)None

20)Kim et al. proposed a ------------------ method using the ART2 architecture.

a)Pattern Recognition b) Chinese Recognition method

c)Character Recognition d)None

21)--------------- learning weight update during resonance occurs rapidly

a)Error-based b) Fast c)Slow d)Match-based

22)Comparison layer and recognition layer constitute -----------

a)Attenuation b)Attenuated System c)Synaptic System d)None

23)ART1 is an elegant theory that addresses ------------------

a)Stability – plasticity dilemma

b)Stability dilemma

c)Plasticity dilemma

d)None

24)Supervised version of ART -----------------

a)ARTMAP

b)Fuzzy art

c)Fuzzy Artmap

d)ART1

25)Slow learning is used as -----------------

a)ART1

b)ART2

c)ARTMAP

d)Fuzzy ART
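
Questions 9 to 11 above refer to Kosko's bidirectional associative memory (BAM), its correlation matrix M, and its energy function. The sketch below assumes the standard formulation, with M computed as the sum of outer products of the bipolar pattern pairs and E(A, B) = -A M B^T. The two pattern pairs and the function names are hypothetical, and the code uses plain Python lists rather than a matrix library.

```python
def correlation_matrix(pairs):
    # Recording step: M = sum over pairs of (X^T Y), using bipolar (+1/-1) vectors
    n, p = len(pairs[0][0]), len(pairs[0][1])
    m = [[0] * p for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(p):
                m[i][j] += x[i] * y[j]
    return m

def recall_forward(a, m, b_previous):
    # One feedforward pass: B = threshold(A M); a zero sum keeps the previous state
    sums = [sum(a[i] * m[i][j] for i in range(len(a))) for j in range(len(m[0]))]
    return [1 if s > 0 else -1 if s < 0 else prev
            for s, prev in zip(sums, b_previous)]

def energy(a, m, b):
    # E(A, B) = -A M B^T
    return -sum(a[i] * m[i][j] * b[j]
                for i in range(len(a)) for j in range(len(b)))

# Hypothetical bipolar pattern pairs (X, Y)
pairs = [
    ([1, -1, 1, -1], [1, 1, -1]),
    ([1, 1, -1, -1], [-1, 1, 1]),
]
M = correlation_matrix(pairs)
recalled = recall_forward(pairs[0][0], M, [1, 1, 1])
print(M)
print(recalled, energy(pairs[0][0], M, recalled))
```

For these two pairs a single forward pass recalls each stored Y from its X; larger or more correlated pattern sets may need repeated forward and backward passes before the state settles.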

PART-B

1.Explain Auto Correlators

2.Explain Hetero Correlators

3.Explain any one application of associative memory

4.Explain Simplified ART architecture

5.Distinguish ART1 and ART2

6.Explain any one application of ART


PART-C

7.Explain Exponential BAM

8.Explain Classical ART network

9.Explain ART1 algorithm

UNIT-3

1.Fuzziness means -------------

a)Vagueness b)Clear c)Precise d)Certainty

2.---------------- are pictorial representations to denote a set

a)Flow chart b)Venn diagram c)DFD d)ER diagrams

3.The number of elements in a set is called its -------------

a)modality b)placiticity c)Cardinality d)elasticity

4.A set with a single element is called -----------

a)Single set b)Singleton set c)1 set d)none

5.A -------------- of a set A is the set of all possible subsets that are derivable from A including null set

a)Power set b)Impower set c)Rational set d)Irrational set

6.The membership function of a fuzzy set need not always be described by ----------------

a)continuous b)Discrete c)crisp d)specific

7.Fuzzy relation is a fuzzy set defined on the Cartesian product of -----------

a)single set b)crisp set c)union set d)intersection set

8.Raising a fuzzy set to its second power is called --------------

a)concentration b)intersection c)conjunction d)disjunction

9.Taking a square root of fuzzy set is called -------------------

a)Dilemma b)Dual c)dialama d)none

10.Fuzzy relation associates ------------ to a varying degree of membership.

a)records b)tuples c)fields d)none

11.In case of the => operator, the proposition occurring before the “=>” symbol is called the ---------

a. antecedent b.consequent c.conjunction d.disjunction

12. A truth table comprises rows known as -------------

a. interpretations b.contradiction c.conjunction d.disjunction

13.A formula which has all its interpretations recording true is known as a ----------------

a.disjunction b.conjunction c.tautology d.antecedent

14.In propositional logic, ---------------- is widely used for inferring facts.

a.pones b.modus c.modus ponens d.pons

15.------------------ represent objects that do not change values

a.constants b.variables c.predicates d.subject

16.------------------------ are representative of associations between objects that are constants or
variables and acquire truth values.

a.Subject b.Predicate c.Quantifier d.Functions

17.----------------- truth values are multivalued.

a.crisp logic b.boolean logic c.fuzzy logic d.none

18.Fuzzy logic propositions are also quantified by --------------

a.fuzzy b.fuzzy qualifiers c.fuzzy quantifiers d.none

19.Fuzzy inference is also referred to as --------------

a.approximate reasoning b.reasoning c.fixed reasoning d.none

20.Conversion of a fuzzy set to a single crisp value is called -----------------

a.fuzzification b.defuzzification c.fuzzy logic d.fuzzy rule

21.--------------- obtains the centre of the area occupied by the fuzzy set

a.center b.center of gravity c.center of area d.center point

22.The ---------------- is the arithmetic average of the mean values of all intervals

a.mean b.mean of maxima c.maximum d.mean interval

23.The ------------------ are obtained by computing the minimum of the membership functions of the
antecedents.

a.rule base b.rule strengths c.rules d.none

24.Relative quantifiers are defined over ---------

a.0 to 10 b.0 to 1 c.0 d.1

25.The fuzzy cruise controller has --------------- inputs

a.2 b.3 c.1 d.0

PART-B

1.Explain fuzzy set

2.Explain crisp set

3.Explain fuzzy relations

4.Distinguish between crisp logic and predicate logic

5.Explain fuzzy quantifiers

6.Explain fuzzy logic

7.Explain fuzzy inference

PART-C

1.Explain Fuzzy System

2.Explain any one of applications of Fuzzy systems

3.Explain fuzzy rule based systems.
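
Several Part-A items in this unit (concentration, dilation, rule strength, and defuzzification by centre of gravity or mean of maxima) describe simple computations. The sketch below illustrates them on a small discrete fuzzy set. The set `warm`, its values, and all function names are assumptions made for illustration only. The discrete forms of the two defuzzifiers are simplifications of the continuous definitions.

```python
import math

# Hypothetical discrete fuzzy set: element -> membership degree in [0, 1].
warm = {10: 0.0, 20: 0.25, 30: 1.0, 40: 1.0, 50: 0.25}

def concentration(fs):
    """Raise every membership degree to its second power."""
    return {x: mu ** 2 for x, mu in fs.items()}

def dilation(fs):
    """Take the square root of every membership degree."""
    return {x: math.sqrt(mu) for x, mu in fs.items()}

def rule_strength(*antecedent_degrees):
    """Strength of a rule: minimum of its antecedent membership degrees."""
    return min(antecedent_degrees)

def centre_of_gravity(fs):
    """Defuzzify as the membership-weighted average of the elements.

    Assumes at least one element has non-zero membership.
    """
    total = sum(fs.values())
    return sum(x * mu for x, mu in fs.items()) / total

def mean_of_maxima(fs):
    """Defuzzify as the average of the elements with maximal membership."""
    peak = max(fs.values())
    maxima = [x for x, mu in fs.items() if mu == peak]
    return sum(maxima) / len(maxima)

if __name__ == "__main__":
    print(concentration(warm))
    print(dilation(warm))
    print(rule_strength(0.7, 0.4))
    print(centre_of_gravity(warm), mean_of_maxima(warm))
```

Concentration lowers every partial membership degree and dilation raises it, while degrees of exactly 0 or 1 are left unchanged.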


UNIT-IV

PART-A

1.--------------- mimics the principles of natural genetics

a.Genetic programming b.Genetic Algorithm c.Genetic Evolution d.none

2.------------ mimics the behaviour of social insects

a.Swarm intelligence b.Ant colony c.Genetic Algorithm d.none

3.In genes, the possible settings of a trait are called -------------------

a.locus b.alleles c.genome d.genotype

4.------------------ means that an element of the DNA is modified.

a.Recombination b.Selection c.Mutation d.none

5.The -------------- of an organism is measured by the success of the organism in its life

a.Strength b.fitness c.Gene d.Chromosome

6.The space for all possible feasible solutions is called ------------------

a.space b.search c.search space d.area

7.------------- is a way of representing individual genes

a.conversion b.encoding c.coding d.none

8.In --------------, every chromosome is a string of numbers

a.hexadecimal encoding b.octal encoding c.Permutation encoding d.none

9.------------ is the first operator applied on population.

a.Reproduction b.Recombination c.Mutation d.none

10.------------------ means that the genes from the already discovered good individuals are exploited

a.Diversity b.Population diversity c.Unity in diversity d.none

11.-------------is the degree to which the better individuals are favoured

a.Selective pressure b.Reproduction pressure c.Recombination pressure d.Mutation

12.The selection method which is less noisy is -----------

a.stochastic remainder selection b.Boltzmann selection c.Remainder selection d.none

13.The ----------------- refers to the proportion of individuals in the population that are
replaced in each generation.

a.gap b.generation gap c.generation interval d.interval

14.Crossover operator proceeds in ------------- steps

a.4 b.3 c.5 d.2

15.Matrix crossover is also known as ------------

a.One dimensional b.Two dimensional c.Three dimensional d.none

16.------------------performs linear inversion with a specified probability of 0.75.

a.Linear+end-inversion b.Discrete inversion c.Continuous inversion d.Mass inversion

17.---------------- of bit involves changing bits from 0 to 1 and 1 to 0.

a.Mutation b.Crossover c.Inversion d.Segregation

18.-------------------- is a process in which a given bit pattern is transformed into another bit pattern by
means of logical bit-wise operation.

a.Inversion b.Conversion c.Masking d.Segregation

19.In ------------------, inversion is applied with a specified inversion probability p to each new
individual when it is created.

a.Discrete b.Continuous c.Mass inversion d.none

20.The ------------- causes all the bits in the first operand to be shifted to the left by the number of
positions indicated by the second operand.

a.Shift right b.Shift left c.Shift operator d.none

21.A --------------- returns 1 if one of the bits has a value of 1 and the other has a value of 0;
otherwise it returns 0.

a.bit wise or b.bit wise and c.not d.none

22.Population size, Mutation rate and cross over rate are together referred to as ---------------

a.control parameters b.central parameters c.connection parameters d.none

23.------------- selection simulates the slow cooling of molten metal to achieve the minimum function value in a
minimization problem.

a.Boltzmann selection b.Tournament selection c.Roulette-wheel selection d.none

24.---------------is not a particular method of selecting the parents.

a.Steady-state b.Elitism c.Boltzmann selection d.Tournament Selection

25.The reproduction operator is also known as ---------

a.Recombination b.Selection c.Regeneration d.none

PART-B

1.Explain biological background of genetic algorithm

2.Explain Working principle of genetic algorithm

3.Explain any two types of encoding

4.Explain inheritance operators

5.Explain Mutation operator

6.Explain Bit-wise operator

PART-C

1.Explain Reproduction operator

2.Explain Inversion and Deletion

3.Explain Generation Cycle
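
The operators named in this unit (crossover, bit mutation, inversion, selection, and the bit-wise operations) can be illustrated on binary-encoded chromosomes. The following is a minimal sketch rather than a reference implementation. All function names, the string encoding, and the use of a seeded `random.Random` are assumptions made for illustration, and the selection function assumes that at least one individual has positive fitness.

```python
import random

def one_point_crossover(parent_a, parent_b, site):
    """Swap the tails of two equal-length bit strings at a cross site."""
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit (0 to 1, 1 to 0) independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in chromosome)

def invert_segment(chromosome, start, end):
    """Reverse the order of the bits in positions start..end-1."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]

def roulette_wheel_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(one_point_crossover("11111", "00000", 2))
    print(bit_flip_mutation("1010", 0.5, rng))
    print(invert_segment("110100", 1, 5))
    print(roulette_wheel_select(["111", "100", "000"], lambda c: c.count("1"), rng))
    # Bit-wise operations on integers, using Python's built-in operators.
    print(bin(0b0011 << 2))       # shift left by two positions
    print(bin(0b1100 ^ 0b1010))   # exclusive OR: 1 where the bits differ
    print(bin(0b1011 & 0b0110))   # masking with bit-wise AND
```

Fitness here is simply the number of 1 bits, which is an arbitrary choice for the demo.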


UNIT-V

PART-A

1.A hybrid system is a combination of neural networks, fuzzy logic and --------------

a.Genetic Algorithm b.Genetic Programming c.Genetic d.none

2.In -------------, one technology calls the other as a subroutine to process or manipulate
information needed by it.

a.Auxiliary hybrid systems b.Embedded hybrid systems

c.sequential hybrid systems d.none

3.------------ hybrid systems make use of technologies in a pipeline fashion.

a.auxiliary hybrid systems b.embedded hybrid systems

c.sequential hybrid systems d.none

4.In -------------- hybrid systems, the participating technologies are integrated in such a manner that
they appear intertwined.

a.auxiliary hybrid systems b.embedded hybrid systems

c.sequential hybrid systems d.none

5.------------- deals with uncertainty problems with its own merits and demerits

a.neuro-fuzzy b.neuro-genetic c.fuzzy-genetic d.none

6.Neural networks can learn various tasks from -------------

a.training b.testing c.learning d.none

7.-------------exhibit non-linear functions to any desired degree of accuracy

a.neuro-fuzzy b.neuro-genetic c.fuzzy-genetic d.none

8.---------------- is used to determine the weights of a multilayer feedforward network with
backpropagation learning

a.neuro-fuzzy b.neuro-genetic c.fuzzy-genetic d.none

9.------------------ maps fuzzy input vectors to crisp outputs

a.fuzzy-backpropagation b.neuro-fuzzy c.neuro-genetic d.fuzzy-genetic

10.----------------is a neuro-fuzzy hybrid in which the host is a recurrent network with a kind of
competitive learning.

a.Fuzzy ARTMAP b.Fuzzy ART c.ARTMAP d.none


11.FAM stands for ------------

a.Fuzzy Associative Memory b.Fuzzy association memory

c.Fuzzy Assist Memory d.none

12.---------------maps fuzzy sets and can encode fuzzy rules.

a.FAM b.Fuzzy c.ART d.none

13.The fuzzy truck backer-upper system is an application of ---------------

a.FAM b.Fuzzy ART c.ART d.none

14.----------------- is applicable to fuzzy optimization problems

a.fuzzy-genetic b.neuro-fuzzy c.fuzzy-logic d.fuzzy-backpropagation

15.-------------- learning is reported to have difficulties in learning the topology of the networks whose
weights it optimizes

a.Gradient descent learning b.descent learning c.Gradient learning d.none

16.Applying neuronal learning capabilities to fuzzy systems is known as ---------

a.NN driven fuzzy reasoning b.fuzzy driven nn reasoning

c.neural network reasoning d.none

17.---------- is applicable to mathematical relationships

a. neuro-fuzzy b.fuzzy-neuro c.neuro-network d.none

18.------------- is a multilayer feedforward network architecture with gradient learning.

a.backpropagation b.forward propagation c.Propagation d.none

19.Recurrent network architectures adopt -------------

a.hebbian learning b.supervised learning c.unsupervised learning d.reinforced learning

20.------------ sets have no crisp boundaries

a.fuzzy b.boolean c.crisp set d.none

21.GA-NN is also known as -----------

a.GANN b.NNGA c.GA d.none

22.Image recognition under noisy conditions is an application of --------

a.Fuzzy b.Fuzzy ART c.ART d.none

23.A genetic algorithm uses the ------------- to determine optimization

a.fitness function b.fit function c.strength function d.none

24.------------ proposed the neuro-fuzzy system

a.Lee and Lie b.Kosko c.gradient d.Lee

25.Knowledge-based evaluation and earthquake damage evaluation are applications of -----------

a.fuzzy-backpropagation b.neuro-fuzzy c.fuzzy d.none

PART-B

1.Explain neuro-fuzzy hybrids

2.Explain neuro-genetic hybrids

3.Explain fuzzy-genetic hybrids

4.Explain fuzzy-backpropagation network

5.Explain FAM

PART-C

1.Explain Hybrid Systems

2.Explain Fuzzy ARTMAP

3.Explain GA based backpropagation network
