Answer: c
Explanation: With fuzzy logic, set membership is defined by a degree (a value between
0 and 1). Hence a member can belong to the set to many different degrees.
Answer: a
Explanation: In traditional set theory, set membership is fixed or exact: either the
member is in the set or it is not. There are only two crisp values, true or false. In
fuzzy logic there are many values: a member is in the set with some weight, say x.
3. The truth values of traditional set theory are ____________ and those of a fuzzy set are
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
View Answer
Answer: a
Explanation: Refer to the definitions of fuzzy set and crisp set.
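The contrast in question 3 can be sketched in a few lines of Python. This is only an illustration: the 30 degree threshold and the 20-to-40 degree ramp for "hot" are assumed values, not part of the quiz.

```python
def crisp_hot(temp_c, threshold=30.0):
    """Crisp set: membership is either 0 or 1 (threshold is an assumed value)."""
    return 1 if temp_c >= threshold else 0


def fuzzy_hot(temp_c, low=20.0, high=40.0):
    """Fuzzy set: membership rises linearly from 0 at `low` to 1 at `high`."""
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    return (temp_c - low) / (high - low)


for t in (15, 25, 30, 35, 45):
    print(t, crisp_hot(t), fuzzy_hot(t))
```

The crisp function only ever returns 0 or 1, while the fuzzy one returns values between 0 and 1, which is the distinction option (a) describes.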
4. Fuzzy logic is an extension of crisp set theory that handles the concept of
Partial Truth.
a) True
b) False
View Answer
Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.
6. The room temperature is hot. Here "hot" (a linguistic variable) can be
represented by _______ .
a) Fuzzy Set
b) Crisp Set
View Answer
Answer: a
Explanation: Fuzzy logic deals with linguistic variables.
Answer: b
Explanation: Both probabilities and degrees of truth range between 0 and 1.
Answer: d
Explanation: None.
9. Japanese were the first to utilize fuzzy logic practically on high-speed trains in
Sendai.
a) True
b) False
View Answer
Answer: a
Explanation: None.
Answer: c
Explanation: The version of probability theory we present uses an extension of
propositional logic for its sentences.
1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
View Answer
Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement, respectively.
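A minimal sketch of these three operators, using the minimum, maximum and complement definitions named in the explanation. The membership degrees 0.75 and 0.5 are made-up example values.

```python
def fuzzy_and(a, b):
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a


hot, humid = 0.75, 0.5  # assumed example degrees of membership
print(fuzzy_and(hot, humid))  # 0.5
print(fuzzy_or(hot, humid))   # 0.75
print(fuzzy_not(hot))         # 0.25
```

When the inputs are restricted to 0 and 1, these definitions reduce to the ordinary Boolean operators.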
2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
d) None of the mentioned
View Answer
Answer: a
Explanation: None.
Answer: d
Explanation: Bayes rule can be used to answer the probabilistic queries conditioned
on one piece of evidence.
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
View Answer
Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
a) IF-THEN-ELSE rules
b) IF-THEN rules
c) Both a & b
d) None of the mentioned
View Answer
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
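The rule form above can be sketched as follows. This is an assumption-laden illustration: the membership ramps, the two rules and the weighted-average way of combining them (a simple Sugeno-style scheme) are chosen for the example and are not given in the quiz.

```python
def hot(temp_c, low=20.0, high=40.0):
    """Degree to which the temperature IS hot (assumed linear ramp)."""
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    return (temp_c - low) / (high - low)


def cold(temp_c):
    """Degree to which the temperature IS cold (complement of hot)."""
    return 1.0 - hot(temp_c)


# Each rule reads: IF temperature IS <property> THEN fan speed = <value in percent>.
RULES = [(cold, 0.0), (hot, 100.0)]


def fan_speed(temp_c):
    """Combine the rule outputs, weighting each by how strongly its rule fires."""
    strengths = [prop(temp_c) for prop, _ in RULES]
    weighted = sum(s * out for s, (_, out) in zip(strengths, RULES))
    return weighted / sum(strengths)


print(fan_speed(20), fan_speed(30), fan_speed(40))  # 0.0 50.0 100.0
```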
Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.
Answer: d
Explanation: Entropy is the amount of uncertainty involved in data, represented by
H(data).
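As a rough illustration, H(data) can be computed as below, assuming the explanation refers to Shannon entropy measured in bits (the quiz does not say which definition it means).

```python
import math
from collections import Counter


def entropy(data):
    """Shannon entropy in bits of the empirical distribution of `data`."""
    counts = Counter(data)
    n = len(data)
    # Sum of p * log2(1 / p) over the observed values.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy("aaaa"))  # 0.0, no uncertainty
print(entropy("aabb"))  # 1.0, one bit of uncertainty
print(entropy("abcd"))  # 2.0, two bits of uncertainty
```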
8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
d) None of the mentioned
View Answer
Answer: c
Explanation: Local structure is usually associated with linear rather than exponential
growth in complexity.
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
View Answer
Answer: b
Explanation: None.
10. What is the consequence between a node and its predecessors while creating a
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
View Answer
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks
lead to the consequence that a node can be conditionally independent of its
predecessors.
Artificial Intelligence Questions and
Answers – Neural Networks – 1
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 1”.
1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001
View Answer
Answer: c
Explanation: The truth table before generalization is:
Inputs Output
000 $
001 $
010 $
011 $
100 $
101 $
110 0
111 1
where $ represents don’t know cases and the output is random.
After generalization, the truth table becomes:
Inputs Output
000 0
001 1
010 0
011 1
100 0
101 1
110 0
111 1
2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
View Answer
Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.
Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths(loops) does not have to be one.
4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
View Answer
Answer: a
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
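The same arithmetic can be checked with a short Python sketch, using the weights, inputs and proportionality constant from the question.

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
k = 2  # constant of proportionality of the linear transfer function

# Weighted sum of the inputs, then the linear transfer function.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
output = k * weighted_sum

print(weighted_sum, output)  # 119 238
```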
5. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
d) None of the mentioned
View Answer
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because a lot of the operation is done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by examples.
Answer: c
Explanation: The training time depends on the size of the network: with more
neurons, the number of possible ‘states’ increases. Neural
networks can be simulated on a conventional computer but the main advantage of
neural networks – parallel execution – is lost. Artificial neurons are not identical in
operation to the biological ones.
Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.
Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
9. Which is true for neural networks?
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
View Answer
Answer: d
Explanation: All mentioned are the characteristics of neural network.
Answer: b
Explanation: None.
Answer: d
Explanation: None.
Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
Answer: b
Explanation: Linearly separable problems are of interest to neural network researchers
because they are the only class of problems that a perceptron can solve successfully.
Answer: a
Explanation: An artificial neural network (ANN) cannot explain its results.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.
6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
View Answer
Answer: b
Explanation: Also known as the step function – so answer 1 is also right. It is a hard
thresholding function, either on or off with no in-between.
8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough
View Answer
Answer: c
Explanation: None.
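Option (c) can be illustrated with hand-coded threshold units. The sketch below is one possible wiring, with weights and thresholds picked by hand for this example: one unit computes OR, one computes AND, and a third combines them into XOR.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def xor(a, b):
    """XOR built from three hand-coded perceptrons (weights chosen by hand)."""
    h_or = perceptron((a, b), (1, 1), 0.5)    # fires when at least one input is 1
    h_and = perceptron((a, b), (1, 1), 1.5)   # fires only when both inputs are 1
    return perceptron((h_or, h_and), (1, -1), 0.5)  # OR but not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

Each unit draws a single straight line through the input space, and the output unit combines the two half-planes, which is why no single perceptron can do the same job.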
9. The network that involves backward links from output to the input and hidden
layers is called as ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
View Answer
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.
Answer: d
Explanation: All mentioned options are applications of Neural Network
Answer: b
Explanation: Locality: In logical systems, whenever we have a rule of the form A =>
B, we can conclude B, given evidence A, without worrying about any other rules.
Detachment: Once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived. That is, it can be detached from its
justification. Truth-functionality: In logic, the truth of complex sentences can be
computed from the truth of the components. However, a rule-based system has no
Attachment property. A global attribute defines a particular problem
space as user specific and changes according to the user’s plan for the problem.
Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.
3. In an Unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
View Answer
Answer: b
Explanation: The problem of unsupervised learning involves learning patterns in the
input when no specific output values are supplied. We cannot expect specific
outputs against which to test the result. The agent is not told what the correct
output is, so it has to find structure in the inputs on its own.
Answer: c
Explanation: A consistent hypothesis agrees with all the examples. If the hypothesis
says an example should be negative but in fact it is positive, it is a false negative.
If the hypothesis says it should be positive but in fact it is negative, it is a false
positive. A specialized hypothesis requires certain restrictions or special conditions.
Answer: b
Explanation: Neural network parameters can be learned from noisy data, and neural
networks have been used for thousands of applications. What they model varies from
problem to problem, so they use nonlinear functions.
8. A perceptron is a ——————————–.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
View Answer
Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
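A minimal sketch of such a weight update rule is given below, assuming the classic perceptron learning rule and using the AND function as the linearly separable data. The learning rate, epoch limit and function names are choices made for this example.

```python
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(w, b, x):
    """Hard-threshold output of a two-input perceptron."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights by lr * (target - prediction) * input."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(w, b, x)
            if delta != 0:
                w[0] += lr * delta * x[0]
                w[1] += lr * delta * x[1]
                b += lr * delta
                errors += 1
        if errors == 0:  # every sample classified correctly, so stop early
            break
    return w, b


w, b = train_perceptron(AND_SAMPLES)
print([predict(w, b, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

Running the same loop on XOR data would never reach zero errors, since XOR is not linearly separable.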
9. Which of the following statements is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
View Answer
Answer: a
Explanation: Not all formal languages are context-free.
Answer: e
Explanation: The union and concatenation of two context-free languages is context-
free; but intersection need not be.
1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
View Answer
Answer: d
Explanation: Factors which affect the performance of a learner system do not include
good data structures.
Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.
Answer: d
Explanation: Decision trees, Neural networks, Propositional rules and FOL rules all
are the models of learning.
Answer: a
Explanation: In an automatic vehicle, a set of vision inputs and the corresponding
actions are available to the learner; hence it is an example of supervised learning.
5. Following is an example of active learning:
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
d) None of the mentioned
View Answer
Answer: a
Explanation: In active learning, not only is a teacher available, but the learner can
also ask for suitable perception-action pair examples to improve performance.
6. In which of the following types of learning does the teacher return reward and
punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
View Answer
Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher
returns a reward or punishment to the learner.
Answer: d
Explanation: Decision trees can be used in all the conditions stated.
Answer: d
Explanation: All mentioned options are applications of learning.
9. Which of the following is the component of learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Goal, model, learning rules and experience are the components of
learning system.
1. What will take place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech
View Answer
Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.
Answer: c
Explanation: A learning element modifies the performance element so that it can make
better decisions.
Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.
Answer: d
Explanation: Linear weighted polynomial is used for learning element in the game
playing programs.
Answer: b
Explanation: Intuitively, Ockham's razor prefers the simplest hypothesis consistent
with the data.
8. What will happen if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
d) None of the mentioned
View Answer
Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
View Answer
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.
Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests.
1: ANN is composed of large number of highly interconnected processing
elements(neurons) working in unison to solve problems.
A.
True
B.
False
C.
D.
Option: A
Explanation :
2:
Artificial neural network used for
A.
Pattern Recognition
B.
Classification
C.
Clustering
D.
All of these
Explanation :
3:
A Neural Network can answer
A.
For Loop questions
B.
what-if questions
C.
IF-Then-Else Analysis Questions
D.
None of these
Option: B
Explanation :
4:
Ability to learn how to do tasks based on the data given for training or initial
experience
A.
Self Organization
B.
Adaptive Learning
C.
Fault tolerance
D.
Robustness
Option: B
Explanation :
5:
Feature of ANN in which ANN creates its own organization or representation of
information it receives during learning time is
A.
Adaptive Learning
B.
Self Organization
C.
What-If Analysis
D.
Supervised Learning
Option: B
Explanation :
6:
In artificial Neural Network interconnected processing elements are called
A.
nodes or neurons
B.
weights
C.
axons
D.
Soma
Option: A
Explanation :
7:
Each connection link in ANN is associated with ________ which has information
about the input signal.
A.
neurons
B.
weights
C.
bias
D.
activation function
Option: B
Explanation :
8:
Neurons or artificial neurons have the capability to model networks of original
neurons as found in brain
A.
True
B.
False
C.
D.
Option: A
Explanation :
9:
The internal state of a neuron is called __________, which is a function of the inputs
the neuron receives
A.
Weight
B.
activation or activity level of neuron
C.
Bias
D.
None of these
Option: B
Explanation :
10:
Neuron can send ________ signal at a time.
A.
multiple
B.
one
C.
none
D.
any number of
Option: B
Explanation :
A.
It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations
B.
Computational procedure that takes some value as input and produces some
value as output.
C.
Science of making machines perform tasks that would require intelligence
when performed by humans
D.
None of these
Option: C
Explanation :
2:
Expert systems
A.
Combining different types of method or information
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution
C.
an information base filled with the knowledge of an expert formulated in terms
of if-then rules
D.
None of these
Option: C
Explanation :
3:
Falsification is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
Showing a universal law or rule to be invalid by providing a counter example
C.
A set of attributes in a database table that refers to data in another table
D.
None of these
Option: B
Explanation :
4:
Evolutionary computation is
A.
Combining different types of method or information
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution.
C.
Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D.
None of these
Option: B
Explanation :
5:
Extendible architecture is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
Showing a universal law or rule to be invalid by providing a counter example
C.
A set of attributes in a database table that refers to data in another table
D.
None of these
Option: A
Explanation :
A.
A programming language based on logic
B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk
C.
Describes the structure of the contents of a database.
D.
None of these
Option: B
Explanation :
7:
Search space
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query.
C.
Worth of the output of a machine learning program that makes it understandable
for humans
D.
None of these
Option: A
Explanation :
8:
n(log n) is referred to
A.
A measure of the desired maximal complexity of data mining algorithms
B.
A database containing volatile data used for the daily operation of an
organization
C.
Relational database management system
D.
None of these
Option: A
Explanation :
9:
Perceptron is
A.
General class of approaches to a problem.
B.
Performing several computations simultaneously
C.
Structures in a database that are statistically relevant
D.
Simple forerunner of modern neural networks, without hidden layers
Option: D
Explanation :
10:
Prolog is
A.
A programming language based on logic
B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk
C.
Describes the structure of the contents of a database
D.
None of these
Option: A
Explanation :
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it
understandable for humans
D.
None of these
Option: B
Explanation :
12:
Quantitative attributes are
A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data
B.
Attributes of a database table that can take only numerical values
C.
Tools designed to query a database
D.
None of these
Option: B
Explanation :
13:
Subject orientation
A.
The science of collecting, organizing, and applying numerical facts
B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.
C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data
D.
None of these
Option: C
Explanation :
14:
Vector
A.
It does not need the control of a human operator during its execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars
C.
The validation of a theory on the basis of a finite number of examples
D.
None of these
Option: B
Explanation :
15:
Transparency
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it
understandable for humans
D.
None of these
Explanation :
A.
Fuzzy Computing, Neural Computing, Genetic Algorithms
B.
Fuzzy Networks and Artificial Intelligence
C.
Artificial Intelligence and Neural Science
D.
Neural Science and Genetic Science
Option: A
Explanation :
2:
Who initiated the idea of Soft Computing
A.
Charles Darwin
B.
Lotfi A. Zadeh
C.
Rechenberg
D.
McCulloch
Option: B
Explanation :
3:
Fuzzy Computing
A.
mimics human behaviour
B.
doesn't deal with 2 valued logic
C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic
D.
All of the above
Option: D
Explanation :
4:
Neural Computing
A.
mimics human brain
B.
information processing paradigm
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
5:
Genetic Algorithms are a part of
A.
Evolutionary Computing
B.
inspired by Darwin's theory about evolution - "survival of the fittest"
C.
are adaptive heuristic search algorithm based on the evolutionary ideas of
natural selection and genetics
D.
All of the above
Option: D
Explanation
A.
Improvised and unimprovised
B.
supervised and unsupervised
C.
Layered and unlayered
D.
None of the above
Option: B
Explanation :
7:
Supervised Learning is
A.
learning with the help of examples
B.
learning without teacher
C.
learning with the help of teacher
D.
learning with computers as supervisor
Option: C
Explanation :
8:
Unsupervised learning is
A.
learning without computers
B.
problem based learning
C.
learning from environment
D.
learning from teachers
Option: C
Explanation :
9:
Conventional Artificial Intelligence is different from soft computing in the sense
A.
Conventional Artificial Intelligence deals with predicate logic whereas soft
computing deals with fuzzy logic
B.
Conventional Artificial Intelligence methods are limited by symbols whereas
soft computing is based on empirical data
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
10:
In supervised learning
A.
classes are not predefined
B.
classes are predefined
C.
classes are not required
D.
classification is not done
Option: B
Explanation :
A.
True
B.
False
C.
D.
Option: A
Explanation :
2:
The membership functions are generally represented in
A.
Tabular Form
B.
Graphical Form
C.
Mathematical Form
D.
Logical Form
Option: B
Explanation :
3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of
A.
knowledge
B.
examples
C.
learning
D.
experience
Option: D
Explanation :
A.
Intuition, Inference, Rank Ordering
B.
Fuzzy Algorithm, Neural network, Genetic Algorithm
C.
Core, Support , Boundary
D.
Weighted Average, center of Sums, Median
Option: C
Explanation :
5:
The region of universe that is characterized by complete membership in the set is
called
A.
Core
B.
Support
C.
Boundary
D.
Fuzzy
Option: A
Explanation :
A.
sub normal fuzzy sets
B.
normal fuzzy set
C.
convex fuzzy set
D.
concave fuzzy set
7:
In a Fuzzy set a prototypical element has a value
A.
1
B.
0
C.
infinite
D.
Not defined
Option: A
Explanation :
8:
A fuzzy set wherein no membership function has its value equal to 1 is called
A.
normal fuzzy set
B.
subnormal fuzzy set.
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
9: A fuzzy set has a membership function whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing then strictly monotonically decreasing with increasing
values for elements in the universe
A.
convex fuzzy set
B.
concave fuzzy set
C.
Non concave Fuzzy set
D.
Non Convex Fuzzy set
Option: A
Explanation :
10:
The membership values of the membership function are not strictly
monotonically increasing or decreasing or strictly monotonically increasing then
decreasing.
A.
Convex Fuzzy Set
B.
Non convex fuzzy set
C.
Normal Fuzzy set
D.
Sub normal fuzzy set
Option: B
Explanation :
List I
List II
A.
a b c d
2 1 4 3
B.
a b c d
1 2 3 4
C.
a b c d
4 3 2 1
D.
a b c d
3 2 1 4
Option: A
Explanation :
12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to
A.
infinite
B.
1
C.
0
D.
0.5
Option: D
Explanation :
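The terms used in the questions above (core, support, boundary, crossover points) can be illustrated on a small example. The sketch assumes a triangular membership function over the integers 0 to 8; the shape and the universe are illustrative choices, not taken from the quiz.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


universe = range(0, 9)
mu = {x: triangular(x, 0, 4, 8) for x in universe}

core = [x for x in universe if mu[x] == 1.0]           # complete membership
support = [x for x in universe if mu[x] > 0.0]         # nonzero membership
boundary = [x for x in universe if 0.0 < mu[x] < 1.0]  # partial membership
crossover = [x for x in universe if mu[x] == 0.5]      # membership equal to 0.5

print(core, support, boundary, crossover)
```

Because at least one element reaches a membership of 1, this example set is normal; scaling every value below 1 would make it subnormal.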
Questions
(i)
evolution
(ii)
selection
(iii)
reproduction
(iv)
mutation
: Your answer is
(a)
i & ii only
(b)
(c)
(a) (i)
(b) (ii)
crossover chromosomes
(c) (iii)
mutation survivability
(d) (iv)
: Your answer is .3
4. (a)
5. _____
6. (b)
7. _____
8. (c)
9. _____
10.(d)
11._____
(i)
(ii)
biology
(iii)
Artificial Life
(iv)
economics
: Your answer is
(a)
(b)
(c)
(d)
(i)
encoding of solutions
(ii)
(iii)
(iv)
: Your answer is
(a)
i & ii only
(b)
(c)
(i)
(ii)
GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii)
(iv)
: Your answer is
(a)
(b)
(c)
(d)
(i)
(ii)
(iv)
The search space of the problem is not ideal for GAs to operate.
: Your answer is
(a)
(b)
(c)
(d)
: Your answer is
(a)
(b)
(c)
(d)
(i)
Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii)
(iii)
(iv)
: Your answer is
(a)
i & ii only
(b)
(c)
(d)
(i)
(ii)
biology
(iii)
robotics
(iv)
(a)
(b)
(c)
(d)
(i)
children
(ii)
designers
(iii)
artists
(iv)
patients
: Your answer is
(a)
(b)
(c)
(d)
Q1.
Q2.
(a)
(ii)
(b)
(iv)
(c)
(i)
(d)
(iii)
Q3.
Q4.
The problem is mapped into a set of strings with each string representing a
potential solution (i.e. chromosomes). A fitness function is required to
compare and tell which solution is better. GA performance is heavily
dependent on the representation chosen.
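The ingredients named in this answer (string-encoded chromosomes and a fitness function) can be sketched as a tiny genetic algorithm. Everything specific here is an assumption made for illustration: the OneMax fitness (count of 1 bits), tournament selection, one-point crossover, elitism and all parameter values.

```python
import random


def fitness(chromosome):
    """Toy fitness: number of 1 bits in the string (the OneMax problem)."""
    return sum(chromosome)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

A different encoding of the same problem would change which crossover and mutation steps are useful, which is the sense in which performance depends on the representation chosen.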
Q5.
The search space is too complex for exhaustive search such that GAs
successfully find robust solutions after evaluating only a few percent of the
full parameter space.
It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.
Q6.
Q7.
Q8.
Q9.
Q10.
1. Which type of model has memory associated with it?
a) GAN
b) Autoencoder
c) RNN
d) CNN
Type of RNN
applications of RNN
Forget Cell
What should the value of |Whh| be so that the model does not get stuck in the exploding or vanishing gradient problem?
a) <1
b) >1
c) =1
d) =0
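The role of |Whh| can be seen in a deliberately simplified scalar sketch. It assumes a one-dimensional recurrent weight and ignores the activation-function derivatives, so the gradient passed back through T time steps scales roughly as |Whh| to the power T.

```python
def gradient_scale(w_hh, steps):
    """Factor by which a gradient is scaled after `steps` steps back in time
    (scalar simplification: repeated multiplication by |w_hh| only)."""
    return abs(w_hh) ** steps


for w in (0.5, 1.0, 1.5):
    print(w, gradient_scale(w, 50))
```

With |Whh| below 1 the factor shrinks towards zero (vanishing gradient), above 1 it blows up (exploding gradient), and at exactly 1 it stays constant.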
Which method uses trainable parameters to convert string data into numerical data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these
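The three encodings listed in the options can be contrasted in plain Python. This is a sketch under stated assumptions: the three-word vocabulary and the embedding size are made up, and the embedding table is only randomly initialised here, whereas in a real model its values would be trainable parameters updated during training.

```python
import random

vocab = ["cat", "dog", "fish"]  # assumed toy vocabulary
index = {word: i for i, word in enumerate(vocab)}


def integer_encode(word):
    """Represent each word with a unique number (no trainable parameters)."""
    return index[word]


def one_hot(word):
    """One-hot encoding: a vector of zeros with a single 1 (no trainable parameters)."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


# Word embedding: a lookup table of dense vectors, one row per word.
rng = random.Random(0)
embedding_dim = 4
embedding_table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]


def embed(word):
    """Look up the dense vector for a word in the embedding table."""
    return embedding_table[index[word]]


print(integer_encode("fish"), one_hot("dog"), embed("cat"))
```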
Which of the methods has the least relationship between the encoded data and the string type of data?
a) one hot encoding
b) representing each word with unique number
c) word embedding
d) All of these
This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:32 GMT -05:00
https://www.coursehero.com/file/75294158/unit-5pdf/
1. High bias means-
Underfit
Overfit
2. How to check CPU time using Python?
import ClockTime
import time
3. Python uses -
Interpreter
Compiler
4. How to check the version of tensorflow?
tf.__version__
tf._version_
tf.version
5. Output of print(tf.test.gpu_device_name()) if only 1 GPU is available is-
/device:GPU:0
/device:GPU:1
6. Matrix multiplication is-
@
*
**
7. To square a tensor, I can use-
tf.square()
**
^2
8. Element-wise matrix multiplication-
a*b
a**b
a@b
9. Weights and biases in the Sequential model are assigned either by calling the model with inputs or by specifying the input shape during the creation of the model.
True
False
Weights are created once model is declared
10. Size of weights for the following code is: model = keras.Sequential([layers.Dense(3)]); y = model(tf.ones((10, 5))); print(model.weights[0].shape)-
5 x 3
10 x 5
10 x 3
3 x 5
11. Model.compile() in Keras requires-
All of above
optimizer
loss
metrics
12. Model.save() saves the model's-
All of above
Model Architecture
Optimizer State
Weight & bias matrices
13. Correct library to load a saved model in Keras is-
keras.models.load_model()
keras.Sequential.load_model()
keras.layers.Dense.load_model()
14. Which of the following is the correct library to load a pre-trained NN?-
tf.keras.Models
tf.keras.applications
tf.keras.layers
tf.keras.preprocessing
15. Which of the following is NOT a data-augmentation layer?-
RandomTranslate()
RandomCrop()
RandomFlip()
RandomRotation()
16. Which of the following is NOT a building block of LSTM-
logic gate
input gate
Forget gate
output gate
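For question 10, the weight shape follows from how a fully connected layer multiplies its input. The helpers below are hypothetical names written for this illustration in plain Python (no TensorFlow calls); they only encode the rule that the kernel has shape (input features, units).

```python
def dense_kernel_shape(input_shape, units):
    """Kernel shape of a fully connected layer: (features_in, units)."""
    return (input_shape[-1], units)


def dense_output_shape(input_shape, units):
    """Output shape: same leading dimensions, last dimension replaced by units."""
    return tuple(input_shape[:-1]) + (units,)


# Input of shape (10, 5) fed to a layer with 3 units, as in question 10.
print(dense_kernel_shape((10, 5), 3))  # (5, 3)
print(dense_output_shape((10, 5), 3))  # (10, 3)
```

The kernel shape depends on the number of input features, which is why the weights can only be created once the input shape is known (question 9).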
This study source was downloaded by 100000795234702 from CourseHero.com on 04-24-2022 03:52:09 GMT -05:00
https://www.coursehero.com/file/75294130/ml-ete-sanjay-sir-pdf/
17. Which of the following weight matrix leads to the VANISHING gradient problem in BPTT?-
|Whh|<1
|Whh|>1
|Whh| =1
|Whh| =0
18. Which of following is NOT the gate in LSTM?
Multiplication gate
Input Gate
Forget gate
Output gate
19. A Gate in LSTM has an activation function-
Tanh
sigmoid
threshold
linear
20. Which of the following |Whh| leads to Exploding gradient problem?
|Whh| > 1
|Whh| < 1
|Whh| = 1
|Whh| = 0
21. Which of following is a correct library for Embedding layer in RNN?
I. tf.keras.layers
II. tf.keras.applications
III. tf.keras.layers.experimental.preprocessing
IV. tf.keras.Models
22. LSTM stands for -
Long Short Term Memory
Length Short Term Memory
Long Sequential Term Memory
Length Short Term Memory
24. Which of the following is a correct library to import text_dataset_from_directory()-
I. tf.keras.preprocessing
II. tf.keras.layers.experimental.preprocessing
III. sklearn.preprocessing
IV. tf.keras.modes.preprocessing
25. Which of the following can be used to solve the vanishing gradient problem of BPTT?
LSTM
LSTM or GRU both can be used
GRU
Dropout
26. Rescaling and Resizing are preprocessing layers that can be imported from library -
tf.keras.layers.experimental.preprocessing
tf.keras.layers.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing
27. A dataset 'x_train' contains 50 batches, each of size 32. Number of batches in x_new=x_train.take(20) is -
20
50
30
32
28. A dataset 'x_train' contains 50 batches, each of size 32. Number of batches in x_new=x_train.skip(20) is -
30
50
20
32
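Questions 17 and 20 rest on how a gradient is rescaled at every step of backpropagation through time. The sketch below is a deliberately simplified scalar model: it assumes a single recurrent weight and an activation derivative of 1, so the gradient is multiplied by |Whh| once per time step. The function name and the values 0.5, 1.0, 1.5 and 50 steps are hypothetical.

```python
def gradient_factor(w_hh: float, steps: int) -> float:
    """Scale picked up by a gradient after `steps` backward steps
    in a scalar RNN (activation derivative assumed to be 1)."""
    return abs(w_hh) ** steps

for w in (0.5, 1.0, 1.5):
    # |Whh| < 1 shrinks towards 0, |Whh| = 1 stays put, |Whh| > 1 blows up.
    print(w, gradient_factor(w, steps=50))
```

In a real RNN the relevant quantity is the norm (or largest singular value) of the recurrent weight matrix combined with the activation derivative, but the qualitative behaviour is the same.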
B) tf.keras.layers.LSTM(128,return_sequences=True);
tf.keras.layers.LSTM(64,return_sequence=True);
tf.keras.layers.LSTM(32)
C) tf.keras.layers.LSTM(128,return_sequences=True);
tf.keras.layers.LSTM(64);
tf.keras.layers.LSTM(32,return_sequences=True)
D) tf.keras.layers.LSTM(128);
tf.keras.layers.LSTM(64);
tf.keras.layers.LSTM(32)
vertical_and_horizontal
horizontal
vertical
horizontal_and_vertical
35. Data augmentation layers are available in which directory?
tf.keras.layers.experimental.preprocessing
tf.keras.preprocessing
tf.keras.models.layers.preprocessing
tf.data.experimental.preprocessing
41. Which of the following time series analysis methods does NOT support both trend and seasonal components?
I. Seasonal Autoregressive Integrated Moving Average (SARIMAX)
II. Autoregressive Integrated Moving Average (ARIMA)
III. Seasonal Autoregressive Integrated Moving Average with exogenous variable (SARIMAX)
IV. Holt Winter’s Exponential Smoothing
42. Which of following is the correct method to load autoregression model?
I. from statsmodels.tsa.ar_model import AutoReg
II. from statsmodels.tsa.arima_model import ARMA
III. from statsmodels.tsa.arima_model import AutoReg
IV. from statsmodels.tsa.ar_model import AutoRegression
IV. Autoregressive Integrated Moving Average (ARIMA)
46. Which of following GANs uses unpaired data for prediction?
CycleGAN
Pix2Pix
DCGAN
FGSM
47. What is the full form of FGSM?
Fast Gradient Sign Method
Fast Gradient Sigmoid Method
Fourier Gradient Signature Method
Fast Gravity Sign Magnitude
48. Which of the following is NOT GANs networks?
FGSM
DeepDream
Pix2Pix
CycleGAN
Backdooring
Trojaning
58. Which of the following is a privacy attack on an ML model in which attackers want to extract the training data of the model?
Membership Inference
Input Inference
Attribute Inference
Model Inference
64. Which of the following operators is NOT supported in python tensorflow?
#
^
**
@
65. Which of following weights are
70. Which of following is the correct library to convert predicted value of mobilenetV2 to correct label?
I. tf.keras.application.mobilenet_v2.decode_predictions
II. tf.keras.application.mobilenet_v2.preprocess_input
III. tf.keras.application.mobilenetV2.decode_predictions
IV. tf.keras.application.mobilenet_v2.MobileNetV2.decode_predictions
76. Which of the following is true about dropout?
Dropout is a regularization technique
Dropout does not reduce overfitting.
Dropout solves vanishing gradient problem.
All of the above.
4/24/22, 2:20 PM [MCQ] Soft Computing - Last Moment Tuitions
Module 1
Ans : A
Explanation: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning.
A. 2
B. 3
C. 4
D. 5
https://lastmomenttuitions.com/mcqs/it-engineering/mcq-soft-computing/ 1/12
Ans : A
Explanation: The conventional logic block that a computer can understand takes precise
input and produces a definite output as TRUE or FALSE, which is equivalent to human’s YES
or NO.
A. Hardware
B. software
C. Both A and B
Ans : C
4. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
D. Either 0 or 1, either 0 or 1
Ans : A
5. How many main parts are there in Fuzzy Logic Systems Architecture?
A. 3
B. 4
C. 5
D. 6
Ans : B
A. membership value
B. degree of membership
C. membership value
D. Both A and B
Ans : D
A. 4
B. 5
C. 6
D. 7
Ans : B
8. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
A. AND
B. OR
C. NOT
Ans : D
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually
9. The room temperature is hot. Here the hot (use of linguistic variable is used) can be
represented by _______
A. Fuzzy Set
B. Crisp Set
C. Both A and B
Ans : A
A. Heat
B. No_Change
C. Cool
Ans : B
a) Two-valued logic
c) Many-valued logic
Explanation: With fuzzy logic set membership is defined by certain value. Hence it could
a) True
b) False
Explanation: Traditional set theory set membership is fixed or exact: either the member is in
the set or not. There is only two crisp values true or false. In case of fuzzy logic there are
13. The truth values of traditional set theory is ____________ and that of fuzzy set is
__________
d) Either 0 or 1, either 0 or 1
14. Fuzzy logic is extension of Crisp set with an extension of handling the concept of
Partial Truth.
a) True
b) False
Explanation: None.
15. The room temperature is hot. Here the hot (use of linguistic variable is used) can be
represented by _______
a) Fuzzy Set
b) Crisp Set
a) Discrete Set
b) Degree of truth
c) Probabilities
17. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai.
a) True
b) False
Explanation: None.
18. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually
19. There are also other operators, more linguistic in nature, called __________ that can be
a) Hedges
b) Lingual Variable
c) Fuzz Variable
Explanation: None.
a) IF-THEN-ELSE rules
b) IF-THEN rules
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying
this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic
usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative
matrices.
21. Like relational databases, there also exist fuzzy relational databases.
a) True
b) False
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational
databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's
dissertation.
a) Fuzzy Logic
b) Probability
c) Entropy
23. ____________ are algorithms that learn from their more complex environments (hence
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
Explanation: Local structure is usually associated with linear rather than exponential growth
in complexity.
24. Membership function defines the fuzziness in a fuzzy set irrespective of the elements
a.) True
b.) False
Answer: A
b) Graphical form
c) Mathematical form
d) Logical form
Ans: B
on the basis of
a) knowledge
b) example
c) learning
d) experience
Ans: D
Ans : C
28. A fuzzy set whose membership function has at least one element x in the universe
is unity is called
Ans: B
a) 1
b) 0
c) infinite
d) not defined
Ans: A
30. A fuzzy set wherein no membership function has its value equal to 1 is called
Ans: B
31. A fuzzy set has a membership function whose membership values are strictly
increasing then strictly monotonically decreasing with increasing values for elements in
the universe
Ans : A
32. The membership values of the membership function are nor strictly monotonically
Ans : B
a) dynamic
b) static
c) deterministic
Answer: c
Explanation: Input/output patterns & the activation values may be considered as sample
34. If xb(t) represents differentiation of state x(t), then a stochastic model can be
represented by?
a) xb(t)=deterministic model
Answer: b
Answer: b
x(t)?
a) xb(t)=0
b) xb(t)=1
d) xb(t)=n(t)+1
Answer: c
Answer: b
Explanation: In asynchronous update, a change in the state of any one unit drives the whole
network.
38. Learning is a?
a) slow process
b) fast process
d) can’t say
Answer: a
a) convergence of weights
Answer: d
Explanation: These are all some of the basic requirements of learning laws.
Answer: a
Explanation: Memory decay affects short term memory rather than older memories.
c) convergence of weights
Answer: d
Explanation: These are all some of the basic requirements of learning laws.
Answer: a
Final ML - Practice it
2 An active learner
Both a and b
interacts with the environment at training time by posing queries
None of these
observes the information provided by the environment
None of these
A pruning set of class labeled tuples is used to estimate cost
Avoid underfitting
Performance Measure
Class of task
Performance Measure
Choice of function approximation algorithm
9 Which of the following algorithm can handle continuous data for decision tree?
CART
ID3
C4.5
None of these
Pessimistic pruning
16 The field of study that gives computers the capability to learn without being
explicitly programmed
Artificial Intelligence
Deep Learning
Machine Learning
None of these
26 Mona receives emails, 18% of which are spam. The spam filter is 93% reliable,
i.e., 93% of the mails it marks as spam are actually spam and 93% of spam mails
are correctly labelled as spam. If a mail is marked as spam by her spam filter,
determine the probability that it is really spam.
84
50
39
63
29
94, 113, 92
110, 141, 100
119, 133, 118
30 Suppose that for some document xyz, the term frequency of word j is 50, the
document frequency is 2000, and the total number of documents is 10. What will
the TF-IDF be?
10,000
0.025
-115
-382
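The arithmetic behind question 30 can be checked with a short sketch. It assumes the common definition TF-IDF = tf × log10(N / df); the source does not state which variant or log base it intends, so the base-10 logarithm is an assumption, and the function name is hypothetical. The question's figures are used as given, even though a document frequency larger than the number of documents could not occur in a real corpus.

```python
import math

def tf_idf(tf: float, df: float, n_docs: float) -> float:
    """TF-IDF under the assumed definition tf * log10(N / df)."""
    return tf * math.log10(n_docs / df)

score = tf_idf(tf=50, df=2000, n_docs=10)
print(round(score))  # rounds to -115 under the base-10 assumption
```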
31 Consider the following data, D: {10, 12, 12, 14, 14}. What will be the jackknife bias of
the mode?
12
0
13
14
32 Consider the following confusion matrix. What is the precision of the model?
0.94
0.75
0.4
0.57
33 Consider the following data, which shows 5 hypotheses for robot movement. For
each hypothesis, the probability given the training data (D) is given, as well as the
probability of F, L and R under each hypothesis (hi), where F stands for Forward, L
stands for Left and R stands for Right. Using the Bayes optimal classifier, find the
direction of movement of the robot.
2/2
Front
Left
All of the above
Right
35 Consider the following data, D: {1,3,3,5,7}, with h=3. Using Parzen window
estimation, what will be the probability at X=4?
3/5
1/15
1/5
3/50
36 If data is three dimensional and h=4, what will be the volume of region?
4
12
81
64
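Questions 35 and 36 can be worked through with a small sketch. It assumes the standard unit-hypercube Parzen window, where a sample counts if it lies within h/2 of the query point, and the estimate is k / (n · V) with V = h^d. The function names are hypothetical.

```python
def parzen_density(x: float, data: list, h: float) -> float:
    """1-D Parzen estimate with a hypercube window of width h."""
    # Count samples whose scaled distance to x is at most 1/2.
    k = sum(1 for xi in data if abs(x - xi) / h <= 0.5)
    return k / (len(data) * h)

def window_volume(h: float, d: int) -> float:
    """Volume of a d-dimensional hypercube with edge length h."""
    return h ** d

p = parzen_density(4, [1, 3, 3, 5, 7], h=3)   # samples 3, 3 and 5 fall in the window
v = window_volume(4, 3)
print(p, v)
```

With these assumptions three of the five samples fall inside the window, giving 3 / (5 · 3) = 1/5, and a three-dimensional window with h = 4 has volume 4³ = 64.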
4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software
5. Which of the following is not a factor that affects the performance of the learner
system?
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A
7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E
14. What kind of learning algorithm for “Facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B
16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following
A. Supervised Learning: Classification
B. Reinforcement Learning
18. Fraud Detection, Image Classification, Diagnostic, and Customer Retention are
applications in which of the following
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
C. Unsupervised Learning: Clustering
D. Reinforcement Learning
Correct option is B
19. Which of the following is not a symbolic function in the various function
representations of Machine Learning?
A. Rules in propositional Logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function in the various function
representations of Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C
21. FIND-S Algorithm starts from the most specific hypothesis and generalize it by
considering only
A. Negative
B. Positive
C. Negative or Positive
D. None of the above
Correct option is B
C. Both
D. None of the above
Correct option is A
24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B
26. A drawback of the FIND-S is that it assumes the consistency within the training
set
A. True
B. False
Correct option is A
28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B
29. To find the minimum or the maximum of a function, we set the gradient to zero
because which of the following
A. Depends on the type of problem
B. The value of the gradient at extrema of a function is always zero
C. Both (A) and (B)
D. None of these
Correct option is B
33. What are the advantages of neural networks over conventional computers?
• They have the ability to learn by
• They are more fault
• They are more suited for real time operation due to their high "computational"
A. (i) and (ii)
B. (i) and (iii)
C. Only (i)
D. All
E. None
Correct option is D
Correct option is A
42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
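The answer to question 42 follows from a weighted sum scaled by the constant of proportionality. A minimal sketch, with a hypothetical helper name and no bias term (the question mentions none):

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function f(s) = k * s."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum

output = linear_neuron(weights=[1, 4, 3], inputs=[4, 8, 5], k=3)
print(output)  # 3 * (1*4 + 4*8 + 3*5) = 3 * 51 = 153
```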
48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D
49. Backpropagation learning is based on the gradient descent along error surface.
A. True
B. False
Correct option is A
Correct option is D
52. The network that involves backward links from output to the input and hidden
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A
60. Which of the following is the consequence between a node and its predecessors
while creating bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependent & Dependent
D. Dependent
Correct option is A
63. ______ provides ways and means of weighing up the desirability of goals and the
likelihood of achieving them
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A
65. Probability provides a way of summarizing the ______ that comes from our laziness and
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B
66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C
67. Causal chain (For example, Smoking cause cancer) gives rise to:-
A. Conditionally Independence
B. Conditionally Dependence
C. Both
D. None of the above
Correct option is A
68. The bayesian network can be used to answer any query by using:-
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B
Correct option is B
77. In the intermediate steps of “EM Algorithm”, the number of each base in each
column is determined and then converted to
A. True
B. False
Correct option is A
78. Naïve Bayes algorithm is based on ______ and used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
Correct option is A
82. In which of the following types of sampling the information is carried out under
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B
C. Both A & B
D. None of these
Correct option is C
86. ______ of hypothesis h with respect to target concept c and distribution D is the probability
that h will misclassify an instance drawn at random according to D.
A. True Error
B. Type 1 Error
C. Type 2 Error
D. None of these
Correct option is A
87. Statement: True error defined over entire instance space, not just training data
A. True
B. False
Correct option is A
88. What area of CLT tells “How many examples we need to find a good hypothesis
?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A
89. What area of CLT tells “How much computational power we need to find a good
hypothesis ?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B
90. What area of CLT tells “How many mistakes we will make before finding a good
hypothesis ?”?
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C
91. (For question no. 9 and 10) Can we say that concept described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A
92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A
94. For a particular learning task, if the requirement of error parameter changes from
0.1 to 0.01. How many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
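The factor of ten in question 94 comes from the 1/ε term in the sample-complexity bound. The sketch below assumes the standard bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)). The values n = 10 and δ = 0.05 are hypothetical; |H| = 3^n is the hypothesis-space size from question 92.

```python
import math

def pac_sample_bound(epsilon: float, delta: float, h_size: int) -> float:
    """Lower bound on training examples: (ln|H| + ln(1/delta)) / epsilon."""
    return (math.log(h_size) + math.log(1 / delta)) / epsilon

h_size = 3 ** 10                     # conjunctions over n = 10 Boolean attributes
m_loose = pac_sample_bound(0.1, 0.05, h_size)
m_tight = pac_sample_bound(0.01, 0.05, h_size)
ratio = m_tight / m_loose
print(math.ceil(m_loose), math.ceil(m_tight), ratio)
```

Because only ε changes between the two calls, the ratio depends on neither |H| nor δ.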
A. Lazy-learner
B. Eager learner
C. Can't say
Correct option is A
105. How many types of layer in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and Output layer
106. The neurons in the hidden layer contain Gaussian transfer functions
whose outputs are ______ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
107. PNN/GRNN networks have one neuron for each point in the training file,
While RBF network have a variable number of neurons that is usually
A. less than the number of training
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A
108. Which network is more accurate when the size of training set between
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A
112 In k-NN algorithm, given a set of training examples and the value of k < size of training set
(n), the algorithm predicts the class of a test example to be the.
What is/are advantages of CBR?
120. Producing two new offspring from two parent strings by copying selected
bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
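The operator described in question 120 can be sketched as single-point crossover on bit strings. The parent strings and the crossover point below are illustrative assumptions, and the function name is hypothetical.

```python
def single_point_crossover(parent_a: str, parent_b: str, point: int):
    """Swap the tails of two equal-length bit strings at `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

c1, c2 = single_point_crossover("11101001000", "00001010101", point=5)
print(c1, c2)
```

Each offspring copies its first five bits from one parent and the remaining bits from the other; mutation, by contrast, would flip a bit within a single string.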
121. Each schema the set of bit strings containing the indicated as
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s
Correct option is D
122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0100, 0110
C. 0100, 0010
Correct option is A
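In a schema, `*` acts as a "don't care" position that matches either bit. A short sketch (the helper name is hypothetical) that enumerates every bit string a schema represents:

```python
from itertools import product

def expand_schema(schema: str):
    """List all bit strings matched by a schema over the alphabet 0, 1, *."""
    # Each '*' can be 0 or 1; fixed positions keep their single value.
    slots = ["01" if ch == "*" else ch for ch in schema]
    return ["".join(bits) for bits in product(*slots)]

matches = expand_schema("0*10")
print(matches)  # ['0010', '0110']
```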
B. Output, delivers a single rule that covers many +ve examples and few -ve.
C. Output rule has a high accuracy but not necessarily a high
D. A & B
E. A, B & C
Correct option is E
129. ______ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A
Correct option is D
1.
A. TRUE
B. FALSE
Correct option is A
A. The subset of all hypotheses is called the version space with respect to the
hypothesis space H and the training examples D, because it contains all plausible
versions of the target
B. The version space consists of only specific
C. None of these
D.
Correct option is A
D. None of these
Correct option is A
142. What will take place as the agent observes its interactions with the world?
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A
144. Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target
function well over other unobserved example is called:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these
Correct option is A
D. No test
Correct option is C
A. Pattern Recognition
B. Classification
C. Clustering
D. All
Correct option is D
158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B
161. What is the consequence between a node and its predecessors while
creating Bayesian network?
A. Functionally dependent
B. Dependent
C. Conditionally independent
D. Both Conditionally dependent & Dependent
Correct option is C
163. How the entries in the full joint probability distribution can be calculated?
A. Using variables
B. Using information
C. Both Using variables & information
D. None of the mentioned
Correct option is B
164. How the Bayesian network can be used to answer any query?
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the mentioned
Correct option is B
167. Which of the following will be true about k in k-NN in terms of variance
A. When you increase k, the variance increases
B. When you decrease k, the variance increases
C. Can't say
D. None of these
Correct option is B
170. When you find noise in data which of the following option would you
consider in k- NN
A. I will increase the value of k
B. I will decrease the value of k
C. Noise can not be dependent on value of k
D. None of these
Correct option is A
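The effect of k on noisy data in question 170 can be seen on a toy one-dimensional data set. Everything below is a hypothetical illustration: the points, the single mislabelled sample at 2.5, and the helper name are assumptions.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Class 0 clusters near 1-3 and class 1 near 7-9; (2.5, 1) is a noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (2.5, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

pred_k1 = knn_predict(train, query=2.6, k=1)  # follows the noisy neighbour
pred_k3 = knn_predict(train, query=2.6, k=3)  # noise is outvoted
print(pred_k1, pred_k3)
```

With k = 1 the prediction copies the mislabelled point, while k = 3 lets the two correctly labelled neighbours outvote it. This is the same trade-off as in the variance and bias questions: a larger k lowers variance at the cost of higher bias.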
171. Which of the following will be true about k in k-NN in terms of Bias?
A. When you increase k, the bias increases
B. When you decrease k, the bias increases
C. Can't say
D. None of these
Correct option is A
Correct option is C
175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D
176. Adding more basis functions in a linear model… (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A
178. A feature F1 can take the values A, B, C, D, E, & F and represents the grade
of students from a college. Which of the following statements is true in this
case?
A. Feature F1 is an example of nominal
B. Feature F1 is an example of ordinal
C. It doesn't belong to any of the above category.
Correct option is B
179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to be),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is C
182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A
B. optimization
C. complete enumeration family of methods
D. Non-computer based (human) solutions area
Correct option is A
185. For a two player chess game, the environment encompasses the
opponent
A. True
B. False
Correct option is A
189. Consider the following modification to the tic-tac-toe game: at the end of
game, a coin is tossed and the agent wins if a head appears regardless of
whatever has happened in the game. Can reinforcement learning be used to learn
an optimal policy of playing Tic-Tac-Toe in this case?
A. Yes
B. No
Correct option is B
Correct option is A
191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
Correct option is B
196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing
checkers
B. Experience obtained by playing games against
C. Both a & b
D. None of these
Correct option is C
B. Machine Learning
C. Both a & b
D. None of these
Correct option is A
198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Both a & b
Correct option is A
204. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D
205. A model can learn based on the rewards it received for its previous action
is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C
206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
A. Artificial Intelligence
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C
C. All of these
D. None of above
Correct option is C
210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C
212. In a learning system the component that takes as input the current
hypothesis (currently learned function) and outputs a new problem for the
Performance System to explore.
A. Critic
B. Generalizer
C. Performance system
D. Experiment generator
E. All of these
Correct option is D
214. In a learning system the component that takes as input the history or
trace of the game and produces as output a set of training examples of the target
function is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is A
220. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D
224. What is the approach of basic algorithm for decision tree induction?
A. Greedy
B. Top Down
C. Procedural
D. Step by Step
Correct option is A
225. Which of the following classifications would best suit the student
performance classification systems?
A. If-then analysis
B. Market-basket analysis
C. Regression analysis
D. Cluster analysis
Correct option is A
233. The difference between the expected sample value and the estimated
value of the parameter is called?
A. Bias
B. Error
C. Contradiction
D. Difference
Correct option is A
234. In which of the following types of sampling the information is carried out
under the opinion of an expert?
A. Quota sampling
B. Convenience sampling
C. Purposive sampling
D. Judgment sampling
Correct option is D
237. Machine learning is interested in the best hypothesis h from some space
H, given observed training data D. Here best hypothesis means
A. Most general hypothesis
B. Most probable hypothesis
C. Most specific hypothesis
D. None of these
Correct option is B
239. Bayes’ theorem states that the relationship between the probability of the
hypothesis before getting the evidence P(H) and the probability of the hypothesis
after getting the evidence P(H∣E) is
A. [P(E∣H)P(H)] / P(E)
B. [P(E∣H) P(E) ] / P(H)
C. [P(E) P(H) ] / P(E∣H)
D. None of these
Correct option is A
240. A doctor knows that Cold causes fever 50% of the time. Prior probability
of any patient having cold is 1/50,000. Prior probability of any patient having
fever is 1/20. If a patient has fever, what is the probability he/she has cold?
A. P(C∣F) = 0.0003
B. P(C∣F) = 0.0004
C. P(C∣F) = 0.0002
D. P(C∣F) = 0.0045
Correct option is C
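The arithmetic behind question 240 can be checked with the form of Bayes' theorem given in question 239. This is a minimal sketch; the function name `posterior` is illustrative and not part of the original quiz.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Values from question 240: P(F|C) = 1/2, P(C) = 1/50,000, P(F) = 1/20.
p_cold_given_fever = posterior(Fraction(1, 2), Fraction(1, 50_000), Fraction(1, 20))
print(float(p_cold_given_fever))  # 0.0002
```

Exact fractions are used so that the result, 1/5000, is not affected by floating-point rounding.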
242. When you find noise in the data, which of the following options would you
consider in K-Nearest Neighbor?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise cannot be dependent on value of k
D. None of these
Correct option is A
B. eager learning
C. concept learning
D. none of these
Correct option is B
B. Randomly chosen root node tree of one parent program by a sub tree
from the other parent program
C. Randomly chosen root node tree of one parent program by a root
node tree from the other parent program
D. None of these
Correct option is A
1) If you remove any one of the red points from the data, will the
decision boundary change?
A) Yes
B) No
2) [True or False] If you remove the non-red circled points from the data,
the decision boundary will change?
A) True
B) False
Solution: A
Datasets which have a clear classification boundary will function best with
SVM’s.
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Solution: D
The effectiveness of an SVM depends on how you choose the three
settings listed above, so that the model is efficient while error and
overfitting are reduced.
8) Support vectors are the data points that lie closest to the decision
surface.
A) TRUE
B) FALSE
Solution: A
They are the points closest to the hyperplane and the hardest ones to
classify. They also have a direct bearing on the location of the decision
surface.
Solution: C
When the data has noise and overlapping points, there is a problem in
drawing a clear hyperplane without misclassifying.
10) Suppose you are using RBF kernel in SVM with high Gamma value.
What does this signify?
A) The model would consider even far away points from hyperplane for
modeling
B) The model would consider only the points close to the hyperplane for
modeling
C) The model would not be affected by distance of points from hyperplane
for modeling
D) None of the above
Solution: B
The gamma parameter in SVM tuning signifies the influence of points either
near or far away from the hyperplane.
For a low gamma, the model will be too constrained and include all points
of the training dataset, without really capturing the shape.
For a higher gamma, the model will capture the shape of the dataset well.
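The effect of gamma can be seen in the kernel value itself. The sketch below assumes the standard RBF form K(x, y) = exp(-gamma * ||x - y||^2); the points and gamma values are made up for illustration. With a high gamma the similarity to a distant point collapses towards zero, so only nearby points influence the model.

```python
import math

def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Illustrative points: one close to the origin, one farther away.
origin, near, far = (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)

for gamma in (0.1, 10.0):
    # Print the similarity of the origin to the near and the far point.
    print(gamma, rbf_kernel(origin, near, gamma), rbf_kernel(origin, far, gamma))
```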
Solution: C
12) Suppose you are building an SVM model on data X. The data X can be
error prone, which means that you should not trust any specific data point
too much. Suppose you want to build an SVM model with a quadratic kernel
(polynomial degree 2) that uses the slack penalty C as one of its
hyperparameters. Based on that, answer the following question.
What would happen when you use a very large value of C (C -> infinity)?
Note: for small C, the model was also classifying all data points correctly.
A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say
D) None of these
Solution: A
For large values of C, the penalty for misclassifying points is very high, so
the decision boundary will perfectly separate the data if possible.
13) What would happen when you use very small C (C~0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these
Solution: A
The classifier can maximize the margin between most of the points, while
misclassifying a few points, because the penalty is so low.
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Solution: C
15) Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Solution: D
SVMs are highly versatile models that can be used for a wide range of
real-world problems, from regression to clustering and handwriting
recognition.
Question Context: 16 – 18
Suppose you have trained an SVM with a linear decision boundary. After
training, you correctly infer that your SVM model is underfitting.
16) Which of the following options would you most likely consider when
iterating on the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
Solution: C
The best option here would be to create more features for the model.
17) Suppose you gave the correct answer in previous question. What do
you think that is actually happening?
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
Solution: C
Better model will lower the bias and increase the variance
Solution: A
19) We usually use feature normalization before using the Gaussian kernel
in SVM. What is true about feature normalization?
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Solution: B
Suppose you are dealing with a 4-class classification problem and you want
to train an SVM model on the data using the one-vs-all method.
Now answer the questions below.
20) How many times do we need to train our SVM model in such a case?
A) 1
B) 2
C) 3
D) 4
Solution: D
For a 4 class problem, you would have to train the SVM at least 4 times if
you are using a one-vs-all method.
21) Suppose the classes are equally distributed in the data. Say that
training once in the one-vs-all setting takes 10 seconds. How
many seconds would it take to train the one-vs-all method end to end?
A) 20
B) 40
C) 60
D) 80
Solution: B
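The counts in questions 20 and 21 follow from training one binary classifier per class. The helper below is a small sketch of that bookkeeping; the function name is hypothetical, and it assumes every fit takes the same time.

```python
def one_vs_all_cost(num_classes, seconds_per_fit):
    """Return (number of fits, total seconds) for one-vs-all training.

    Assumes one binary classifier per class when there are more than two
    classes, and a single classifier for a two-class problem.
    """
    fits = num_classes if num_classes > 2 else 1
    return fits, fits * seconds_per_fit

print(one_vs_all_cost(4, 10))  # (4, 40)
```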
22) Suppose your problem has changed and the data now has only 2
classes. How many times would we need to train the SVM in
such a case?
A) 1
B) 2
C) 3
D) 4
Solution: A
Training the SVM only one time would give you appropriate results
Question context: 23 – 24
Suppose you are using an SVM with a polynomial kernel of degree 2. You
have applied it to the data and found that it fits the data perfectly,
that is, training and testing accuracy are both 100%.
23) Now suppose you increase the complexity (the degree of the polynomial of
this kernel). What do you think will happen?
Solution: A
Increasing the complexity of the model would make the algorithm overfit the
data.
24) In the previous question after increasing the complexity you found that
training accuracy was still 100%. According to you what is the reason
behind that?
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
(b) Pick the one with lowest error on a separate test set, with λ having
been chosen so as to minimise training error.
(c) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise error on this test set.
(d) Pick the one with lowest error on a separate test set, with λ having been
chosen so as to minimise cross-validation error on the training set.
(e) Pick the one with lowest cross-validation error on the training set, with λ
having been chosen so as to minimise cross-validation error on the training
set.
2. Four different people are doing bias-variance estimates on regularised linear regression
models. They come to you and make the following claims about certain experiments they've
done. Which of these claims are definitely incorrect? (Here λ refers to the regularisation
parameter as usual.)
(a) 'I increased λ and the model started underfitting the data, whilst the variance went down'.
(b) 'I decreased λ and the model started overfitting the data, whilst the bias went up'.
(c) 'I decreased λ and the model started overfitting the data, whilst the variance went up'.
(d) 'I increased λ and the model started underfitting the data, whilst the bias went down'.
4. Suppose your model is demonstrating high variance across different training sets. Which of
the following is NOT a valid way to try and reduce the variance?
3. Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class
label
c) Both a) & b)
d) None of the mentioned
4. Decision Trees can be used for Classification Tasks.
a) True
b) False
d) Triangles
13. Which search uses problem-specific knowledge beyond the
definition of the problem?
a) Informed search
b) Depth-first search
c) Breadth-first search
d) Uninformed search
14. Which function will select the lowest expansion node at first for
evaluation?
a) Greedy best-first search
b) Best-first search
c) Both a & b
d) None of the mentioned
16. Which search uses only the linear space for searching?
a) Best-first search
20. Which search method will expand the node that is closest to the goal?
a) Best-first search
b) Greedy best-first search
c) A* search
d) None of the mentioned
22. Which is used to extract solution directly from the planning graph?
a) Planning algorithm
b) Graph plan
c) Hill-climbing search
d) All of the mentioned
d) Heuristic estimates
29. How many conditions are available between two actions in mutex
relation?
a) 1
b) 2
c) 3
d) 4
4. Choose the correct option regarding machine learning (ML) and artificial
intelligence (AI)
A. ML is a set of techniques that turns a dataset into a software
B. AI is a software that can emulate the human mind
C. ML is an alternate way of programming intelligent machines
D. All of the above
Correct option is D
5. Factors that affect the performance of a learner system do not
include:
A. Good data structures
B. Representation scheme used
C. Training scenario
D. Type of feedback
Correct option is A
Correct option is D
7. Successful applications of ML
A. Learning to recognize spoken words
B. Learning to drive an autonomous vehicle
C. Learning to classify new astronomical structures
D. Learning to play world-class backgammon
E. All of the above
Correct option is E
14. What kind of learning algorithm is used for “Facial identities or facial expressions”?
A. Prediction
B. Recognition Patterns
C. Generating Patterns
D. Recognizing Anomalies
Correct option is B
16. Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot
Navigation are applications of which of the following?
A. Supervised Learning: Classification
B. Reinforcement Learning
C. Unsupervised Learning: Clustering
D. Unsupervised Learning: Regression
Correct option is B
18. Fraud Detection, Image Classification, Diagnostics, and Customer Retention are
applications of which of the following?
A. Unsupervised Learning: Regression
B. Supervised Learning: Classification
C. Unsupervised Learning: Clustering
D. Reinforcement Learning
Correct option is B
19. Which of the following is not a symbolic function representation in
Machine Learning?
A. Rules in propositional logic
B. Hidden-Markov Models (HMM)
C. Rules in first-order predicate logic
D. Decision Trees
Correct option is B
20. Which of the following is not a numerical function representation in
Machine Learning?
A. Neural Network
B. Support Vector Machines
C. Case-based
D. Linear Regression
Correct option is C
21. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only ________ examples.
A. Negative
B. Positive
C. Negative or Positive
D. None of the above
Correct option is B
Correct option is B
24. Inductive learning is based on the knowledge that if something happens a lot it is
likely to be generally
A. True
B. False
Correct option is A
25. Inductive learning takes examples and generalizes rather than starting
with ________ knowledge.
A. Inductive
B. Existing
C. Deductive
D. None of these
Correct option is B
26. A drawback of FIND-S is that it assumes consistency within the training
set.
A. True
B. False
Correct option is A
28. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Random Forest
C. Regression
D. Classification
Correct option is B
29. To find the minimum or the maximum of a function, we set the gradient to zero
because which of the following
A. Depends on the type of problem
42. A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the
constant of proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
A. 139
B. 153
C. 162
D. 160
Correct option is B
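The answer to question 42 can be verified by computing the weighted sum and applying the linear transfer function. This is a small sketch; the function name `linear_neuron` is illustrative.

```python
def linear_neuron(weights, inputs, k):
    """Linear transfer function: output = k * (weighted sum of inputs)."""
    return k * sum(w * x for w, x in zip(weights, inputs))

# Weighted sum: 1*4 + 4*8 + 3*5 = 51; output: 3 * 51 = 153.
print(linear_neuron([1, 4, 3], [4, 8, 5], 3))  # 153
```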
48. The general tasks that are performed with backpropagation algorithm
A. Pattern mapping
B. Prediction
C. Function approximation
D. All of the above
Correct option is D
49. Backpropagation learning is based on gradient descent along the error surface.
A. True
B. False
Correct option is A
52. The network that involves backward links from output to the input and hidden
layers is known as
A. Recurrent neural network
B. Self organizing maps
C. Perceptrons
D. Single layered perceptron
Correct option is A
60. Which of the following describes the relationship between a node and its predecessors
when creating a Bayesian network?
A. Conditionally independent
B. Functionally dependent
C. Both Conditionally dependent & Dependent
D. Dependent
Correct option is A
63. ________ provides ways and means of weighing up the desirability of goals and the
likelihood of achieving them.
A. Utility theory
B. Decision theory
C. Bayesian networks
D. Probability theory
Correct option is A
65. Probability provides a way of summarizing the ________ that comes from our laziness
and ignorance.
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
Correct option is B
66. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
Correct option is C
67. A causal chain (for example, smoking causes cancer) gives rise to:
A. Conditional independence
B. Conditional dependence
C. Both
D. None of the above
Correct option is A
68. The bayesian network can be used to answer any query by using:-
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the above
Correct option is B
77. In the intermediate steps of “EM Algorithm”, the number of each base in each
column is determined and then converted to
A. True
B. False
Correct option is A
78. The Naïve Bayes algorithm is based on ________ and is used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
Correct option is A
82. In which of the following types of sampling is the information collected according to
the opinion of an expert?
A. Convenience sampling
B. Judgement sampling
C. Quota sampling
D. Purposive sampling
Correct option is B
87. Statement: True error is defined over the entire instance space, not just the training data.
A. True
B. False
Correct option is A
88. What area of CLT tells “How many examples do we need to find a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is A
89. What area of CLT tells “How much computational power do we need to find a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is B
90. What area of CLT tells “How many mistakes will we make before finding a good
hypothesis?”
A. Sample Complexity
B. Computational Complexity
C. Mistake Bound
D. None of these
Correct option is C
91. Can we say that concepts described by conjunctions of
Boolean literals are PAC learnable?
A. Yes
B. No
Correct option is A
92. How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Correct option is A
94. For a particular learning task, if the required error parameter changes from
0.1 to 0.01, how many more samples will be required for PAC learning?
A. Same
B. 2 times
C. 1000 times
D. 10 times
Correct option is D
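The factor of ten in question 94 can be illustrated with the usual sample-complexity bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)). The sketch below assumes that bound; the choices n = 10 and delta = 0.05 are illustrative and not taken from the quiz.

```python
import math

def pac_sample_bound(epsilon, delta, hypothesis_space_size):
    """Number of samples sufficient under m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

n = 10            # number of Boolean attributes (illustrative)
size_h = 3 ** n   # conjunctions of Boolean literals, as in question 92
m_loose = pac_sample_bound(0.1, 0.05, size_h)
m_tight = pac_sample_bound(0.01, 0.05, size_h)

# The ratio is close to 10 because the bound scales with 1/epsilon.
print(m_loose, m_tight, m_tight / m_loose)
```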
Correct option is D
105. How many types of layers are there in radial basis function neural networks?
A. 3
B. 2
C. 1
D. 4
Correct option is A, Input layer, Hidden layer, and Output layer
106. The neurons in the hidden layer contain Gaussian transfer functions whose
outputs are ________ proportional to the distance from the centre of the neuron.
A. Directly
B. Inversely
C. equal
D. None of these
Correct option is B
107. PNN/GRNN networks have one neuron for each point in the training file,
while RBF networks have a variable number of neurons that is usually
A. less than the number of training points
B. greater than the number of training points
C. equal to the number of training points
D. None of these
Correct option is A
108. Which network is more accurate when the size of the training set is
small to medium?
A. PNN/GRNN
B. RBF
C. K-means clustering
D. None of these
Correct option is A
D. All of these
Correct option is A
112 In k-NN algorithm, given a set of training examples and the value of k < size of
training set (n), the algorithm predicts the class of a test example to be the. What is/are
advantages of CBR?
D. Ecology
Correct option is A
120. Producing two new offspring from two parent strings by copying selected
bits from each parent is called
A. Mutation
B. Inheritance
C. Crossover
D. None of these
Correct option is C
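The crossover operation from question 120 can be sketched for the single-point case, where each offspring takes its head from one parent and its tail from the other. The parent strings and the crossover point below are illustrative, and other crossover variants (two-point, uniform) choose the copied bits differently.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings at index `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

print(single_point_crossover("11101001000", "00001010101", 5))
# ('11101010101', '00001001000')
```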
121. Each schema represents the set of bit strings containing the indicated
A. 0s, 1s
B. only 0s
C. only 1s
D. 0s, 1s, *s
Correct option is D
122. 0*10 represents the set of bit strings that includes exactly
A. 0010, 0110
B. 0100, 0110
C. 0100, 0010
Correct option is A
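A schema such as 0*10 can be expanded mechanically by letting every `*` take both bit values. The helper below is a minimal sketch with an illustrative name; it confirms that 0*10 matches exactly 0010 and 0110.

```python
from itertools import product

def schema_matches(schema):
    """Enumerate every bit string matched by a schema over {0, 1, *}."""
    slots = [("0", "1") if ch == "*" else (ch,) for ch in schema]
    return ["".join(bits) for bits in product(*slots)]

print(schema_matches("0*10"))  # ['0010', '0110']
```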
129. A ________ is any predicate (or its negation) applied to any set of terms.
A. Literal
B. Null
C. Clause
D. None of these
Correct option is A
Correct option is B
1.
A. TRUE
B. FALSE
Correct option is A
A. The subset of all hypotheses is called the version space with respect to the
hypothesis space H and the training examples D, because it contains all plausible
versions of the target
B. The version space consists of only specific
C. None of these
D.
Correct option is A
142. What will take place as the agent observes its interactions with the world?
A. Learning
B. Hearing
C. Perceiving
D. Speech
Correct option is A
144. Any hypothesis found to approximate the target function well over a
sufficiently large set of training examples will also approximate the target
function well over other unobserved examples. This is called the:
A. Inductive Learning Hypothesis
B. Null Hypothesis
C. Actual Hypothesis
D. None of these
Correct option is A
A. Solving queries
B. Increasing complexity
C. Decreasing complexity
D. Answering probabilistic query
Correct option is D
158. How many terms are required for building a Bayes model?
A. 2
B. 3
C. 4
D. 1
Correct option is B
161. What is the relationship between a node and its predecessors when
creating a Bayesian network?
A. Functionally dependent
B. Dependent
C. Conditionally independent
D. Both Conditionally dependent & Dependent
Correct option is C
163. How can the entries in the full joint probability distribution be calculated?
A. Using variables
B. Using information
C. Both Using variables & information
D. None of the mentioned
Correct option is B
164. How can the Bayesian network be used to answer any query?
A. Full distribution
B. Joint distribution
C. Partial distribution
D. All of the mentioned
Correct option is B
167. Which of the following is true about k in k-NN in terms of variance?
A. When you increase k, the variance increases
B. When you decrease k, the variance increases
C. Can't say
D. None of these
Correct option is B
169. k-NN is very likely to overfit due to the curse of dimensionality. Which
of the following options would you consider to handle this problem? 1)
Dimensionality reduction 2) Feature selection
A. 1
B. 2
C. 1 and 2
D. None of these
Correct option is C
170. When you find noise in the data, which of the following options would you
consider in k-NN?
A. I will increase the value of k
B. I will decrease the value of k
C. Noise can not be dependent on value of k
D. None of these
Correct option is A
171. Which of the following is true about k in k-NN in terms of bias?
A. When you increase k, the bias increases
B. When you decrease k, the bias increases
C. Can't say
D. None of these
Correct option is A
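The role of k in questions 167 to 171 can be seen on a toy example. The sketch below is a minimal 1-D k-NN with majority voting; the data set, including the single mislabeled point, is hypothetical. With k = 1 the noisy neighbour decides the prediction (high variance), while a larger k averages it out at the cost of a smoother, more biased decision rule.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (1-D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: class 0 near 0-3, class 1 near 10-13,
# plus one mislabeled (noisy) point at x = 1.5.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (1.5, 1),
         (10, 1), (11, 1), (12, 1), (13, 1)]

print(knn_predict(train, 1.6, k=1))  # 1: the noisy neighbour decides
print(knn_predict(train, 1.6, k=5))  # 0: the vote smooths the noise out
```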
175. All of the following are suitable problems for genetic algorithms EXCEPT
A. dynamic process control
B. pattern recognition with complex patterns
C. simulation of biological models
D. simple optimization with few variables
Correct option is D
176. Adding more basis functions in a linear model... (Pick the most probable
option)
A. Decreases model bias
B. Decreases estimation bias
C. Decreases variance
D. Doesn't affect bias and variance
Correct option is A
178. A feature F1 can take the values A, B, C, D, E, and F, and represents the grades
of students from a college. Which of the following statements is true in this
case?
A. Feature F1 is an example of a nominal variable
B. Feature F1 is an example of an ordinal variable
C. It doesn't belong to any of the above categories.
Correct option is B
179. You observe the following while fitting a linear regression to the data: As
you increase the amount of training data, the test error decreases and the
training error increases. The train error is quite low (almost what you expect it to be),
while the test error is much higher than the train error. What do you think is the
main reason behind this behaviour? Choose the most probable option.
A. High variance
B. High model bias
C. High estimation bias
D. None of the above
Correct option is C
B. FALSE
Correct option is A
182. Consider the following: (a) Evolution (b) Selection (c) Reproduction (d)
Mutation Which of the following are found in genetic algorithms?
A. All
B. a, b, c
C. a, b
D. b, d
Correct option is A
185. For a two-player chess game, the environment encompasses the opponent.
A. True
B. False
Correct option is A
189. Consider the following modification to the tic-tac-toe game: at the end of
the game, a coin is tossed and the agent wins if a head appears, regardless of
whatever has happened in the game. Can reinforcement learning be used to learn
an optimal policy for playing tic-tac-toe in this case?
A. Yes
B. No
Correct option is B
191. Suppose the reinforcement learning player was greedy, that is, it always
played the move that brought it to the position that it rated the best. Might it
learn to play better, or worse, than a non greedy player?
A. Worse
B. Better
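Correct option is A. A purely greedy player never explores, so it can settle on suboptimal moves and typically learns to play worse.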
B. False
Correct option is A
196. A computer program that learns to play checkers might improve its
performance as:
A. Measured by its ability to win at the class of tasks involving playing checkers
B. Experience obtained by playing games against itself
C. Both a & b
D. None of these
Correct option is C
198. The field of study that gives computers the capability to learn without
being explicitly programmed
A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Both a & b
Correct option is A
204. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Correct option is D
205. Learning in which a model learns from the rewards it received for its previous actions
is known as:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Concept learning
Correct option is C
206. A subset of machine learning that involves systems that think and learn
like humans using artificial neural networks.
A. Artificial Intelligence
B. Machine Learning
C. Deep Learning
D. All of these
Correct option is C
210. In Machine learning the module that must solve the given performance
task is known as:
A. Critic
B. Generalizer
C. Performance system
D. All of these
Correct option is C
241. Which of the following is true about k in K-Nearest Neighbor in terms
of bias?
A. When you increase k, the bias increases
B. When you decrease k, the bias increases
C. Can't say
D. None of these
Correct option is A
B. Crossover
C. Don't care
D. Fitness function
Correct option is A
Correct option is D
Geoffrey Chaucer
Geoffrey Hill
Correct option is C
Correct option is C
Choose the correct option regarding machine learning (ML) and artificial intelligence (AI)
Correct option is D
Factors that affect the performance of a learner system do not include:
Training scenario
Type of feedback
Correct option is A
In general, to have a well-defined learning problem, we must identify which of the following
Correct option is D
Successful applications of ML
Correct option is E
Analogy
Introduction
Memorization
Deduction
Correct option is B
Empirical
Logical
Phonological
Syntactic
Correct option is A
Correct option is E
Concept learning infers a ________-valued function from training examples of its input and output.
Decimal
Hexadecimal
Boolean
Correct option is C
Naïve Bayesian
PCA
Linear Regression
Correct option is B
Artificial Intelligence
Deep Learning
Data Statistics
Only (i)
All
None
Correct option is B
Prediction
Recognition Patterns
Generating Patterns
Correct option is B
Unsupervised Learning
Supervised Learning
Semi-unsupervised Learning
Reinforcement Learning
Correct option is C
Real-Time decisions, Game AI, Learning Tasks, Skill Acquisition, and Robot Navigation are applications of
which of the following?
Reinforcement Learning
Correct option is B
Targeted marketing, Recommender Systems, and Customer Segmentation are applications of which of
the following?
Reinforcement Learning
Correct option is B
Fraud Detection, Image Classification, Diagnostics, and Customer Retention are applications of which of
the following?
Reinforcement Learning
Correct option is B
Which of the following is not a symbolic function representation in Machine
Learning?
Decision Trees
Correct option is B
Which of the following is not a numerical function representation in Machine
Learning?
Neural Network
Case-based
Linear Regression
Correct option is C
The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only ________ examples.
Negative
Positive
Negative or Positive
Correct option is B
Negative
Positive
Both
Correct option is A
Solution Space
Version Space
Elimination Space
Correct option is B
Inductive learning is based on the knowledge that if something happens a lot it is likely to be generally
True
False
Correct option is A
Inductive learning takes examples and generalizes rather than starting with
Inductive
Existing
Deductive
None of these
Correct option is B
A drawback of the FIND-S is that it assumes the consistency within the training set
True
False
Correct option is A
Pruning
All
None
Correct option is B
Which of the following is a widely used and effective machine learning algorithm based on the idea of
bagging?
Decision Tree
Random Forest
Regression
Classification
Correct option is B
To find the minimum or the maximum of a function, we set the gradient to zero because which of the
following
None of these
Correct option is B
Factor analysis
Correct option is A
What is perceptron?
Correct option is A
All
Only (ii)
None
Correct option is C
They are more suited for real-time operation due to their high 'computational'
Only (i)
All
None
Correct option is D
Correct option is C
Correct option is D
To develop learning algorithm for multilayer feedforward neural network, so that network can be
trained to capture the mapping implicitly
Correct option is A
Single layer associative neural networks do not have the ability to:-
Only (ii)
All
None
Correct option is A
True
False
Correct option is A
On average, neural networks have higher computational rates than conventional computers.
All
None
Correct option is A
Correct option is D
True
False
Correct option is B
An auto-associative network is
Correct option is B
A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with the constant of
proportionality being equal to 3. The inputs are 4, 8 and 5 respectively. What will be the output?
139
153
162
160
Correct option is B
Hidden layers output is not all important, they are only meant for supporting input and output layers
Actual output is determined by computing the outputs of units for each hidden layer
Correct option is B
It is the transmission of error back through the network to allow weights to be adjusted so that the
network can learn
Correct option is B
Scaling
Slow convergence
Correct option is D
Because delta is applied to only input and output layers, thus making it more simple and generalized
It has no significance
Correct option is C
Linear
Non linear
Discrete
Exponential
Correct option is A
Pattern mapping
Prediction
Function approximation
Correct option is D
True
False
Correct option is A
None of these
Correct option is B
Risk management
Data validation
Sales forecasting
Correct option is D
The network that involves backward links from output to the input and hidden layers is known as
Perceptrons
Correct option is A
True
False
Correct option is A
End Nodes
Decision Nodes
Chance Nodes
Correct option is D
Triangles
Circles
Squares
Correct option is B
Triangles
Circles
Squares
Correct option is D
Triangles
Circles
Squares
Correct option is C
Worst, best and expected values can be determined for different scenarios
Correct option is D
Correct option is C
Which of the following is the consequence between a node and its predecessors while creating a Bayesian
network?
Conditionally independent
Functionally dependent
Dependent
Correct option is A
Feasibility
Reliability
Crucial robustness
Correct option is C
Solving queries
Increasing complexity
Decreasing complexity
Correct option is C
Provides way and means of weighing up the desirability of goals and the likelihood of achieving
Utility theory
Decision theory
Bayesian networks
Probability theory
Correct option is A
Correct option is C
65. Probability provides a way of summarizing the ______ that comes from our laziness and
Belief
Uncertainty
Randomness
Correct option is B
Using variables
Using information
Correct option is C
Causal chain (For example, Smoking cause cancer) gives rise to:-
Conditional Independence
Conditional Dependence
Both
Correct option is A
Full distribution
Joint distribution
Partial distribution
Correct option is B
Belief
Correct option is A
Fully structured
Locally structured
Partially structured
Correct option is B
The Expectation-Maximization Algorithm has been used to identify conserved domains in unaligned
proteins only. State True or False.
True
False
Correct option is B
Both
Correct option is C
The alignment provides an estimate of the base or amino acid composition of each column in the site
The column-by-column composition of the site already available is used to estimate the probability of
finding the site at any position in each of the sequences
The row-by-column composition of the site already available is used to estimate the probability
Correct option is C
Supervised
Reinforcement
Unsupervised
None of these
Correct option is A
The normalization
Correct option is C
Spam filtration
Sentiment analysis
Classifying articles
Correct option is D
In the intermediate steps of “EM Algorithm”, the number of each base in each column is determined
and then converted to
True
False
Correct option is A
Naïve Bayes algorithm is based on ______ and used for solving classification problems.
Bayes Theorem
EM algorithm
Correct option is A
Gaussian
Multinomial
Bernoulli
Correct option is D
Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between
Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
Correct option is A
Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
Correct option is D
In which of the following types of sampling the information is carried out under the opinion of an
expert?
Convenience sampling
Judgement sampling
Quota sampling
Purposive sampling
Correct option is B
None of these
Correct option is A
Both A & B
None of these
Correct option is C
Correct option is A
86. ______ of hypothesis h with respect to target concept c and distribution D is the probability that h
will misclassify an instance drawn at random according to D.
True Error
Type 1 Error
Type 2 Error
None of these
Correct option is A
Statement: True error defined over entire instance space, not just training data
True
False
Correct option is A
Sample Complexity
Computational Complexity
Mistake Bound
All of these
Correct option is D
Which area of CLT tells us “How many examples do we need to find a good hypothesis?”
Sample Complexity
Computational Complexity
Mistake Bound
None of these
Correct option is A
Which area of CLT tells us “How much computational power do we need to find a good hypothesis?”
Sample Complexity
Computational Complexity
Mistake Bound
None of these
Correct option is B
Which area of CLT tells us “How many mistakes will we make before finding a good hypothesis?”
Sample Complexity
Computational Complexity
Mistake Bound
None of these
Correct option is C
(For questions no. 9 and 10) Can we say that concepts described by conjunctions of Boolean literals are
PAC learnable?
Yes
No
Correct option is A
|H| = 3^n
|H| = 2^n
|H| = 1^n
|H| = 4^n
Correct option is A
The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis space H2. Which
of the following can be inferred from this?
The number of examples required for learning a hypothesis in H1 is larger than the number of examples
required for H2
The number of examples required for learning a hypothesis in H1 is smaller than the number of
examples required for
Correct option is A
For a particular learning task, if the requirement of error parameter changes from 0.1 to 0.01. How
many more samples will be required for PAC learning?
Same
2 times
1000 times
10 times
Correct option is D
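The factor of 10 can be illustrated with a small sketch. It assumes the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)); the values of δ and the number of literals are arbitrary illustrative choices, not part of the question.

```python
import math

def pac_sample_bound(epsilon, delta, hypothesis_count):
    """Bound m >= (1/epsilon) * (ln|H| + ln(1/delta)) for a finite hypothesis space."""
    return (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon

# Assumed setting: conjunctions over 10 Boolean literals, so |H| = 3^10.
hypothesis_count = 3 ** 10
delta = 0.05  # assumed confidence parameter

m_loose = pac_sample_bound(0.1, delta, hypothesis_count)
m_tight = pac_sample_bound(0.01, delta, hypothesis_count)
ratio = m_tight / m_loose
print(round(ratio, 6))  # the bound scales as 1/epsilon
```

Because the bound is proportional to 1/ε, the ratio does not depend on the assumed δ or |H|.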
All of these
Correct option is D
Lazy-learner
Eager learner
Can't say
Correct option is A
None of these
A, B & C
Correct option is E
All of these
Correct option is D
Calculate the distance of the test case from all training cases
Curse of dimensionality
Both A & B
None of these
Correct opt
CS 189 Introduction to Machine Learning
Spring 2016 Final
• Please do not open the exam before you are instructed to do so.
• The exam is closed book, closed notes except your two-page cheat sheet.
• Electronic devices are forbidden on your person, including cell phones, iPods, headphones, and laptops.
Turn your cell phone off and leave all electronics at the front of the room, or risk getting a zero on
the exam.
• You have 3 hours.
• Please write your initials at the top right of each page (e.g., write “JS” if you are Jonathan Shewchuk). Finish
this by the end of your 3 hours.
• Mark your answers on front of each page, not the back. We will not scan the backs of each page, but you may
use them as scratch paper. Do not attach any extra sheets.
• The total number of points is 150. There are 30 multiple choice questions worth 3 points each, and 6 written
questions worth a total of 60 points.
• For multiple-choice questions, fill in the boxes for ALL correct choices: there may be more than one correct
choice, but there is always at least one correct choice. NO partial credit on multiple-choice questions: the
set of all correct answers must be checked.
First name
Last name
SID
(1) [3 pts] What strategies can help reduce overfitting in decision trees?
(2) [3 pts] Which of the following are true of convolutional neural networks (CNNs) for image analysis?
Filters in earlier layers tend to include edge detectors
They have more parameters than fully-connected networks with the same number of layers and the same numbers of neurons in each layer
Pooling layers reduce the spatial resolution of the image
A CNN can be trained for unsupervised learning tasks, whereas an ordinary neural net cannot
(4) [3 pts] Which of the following are true about generative models?
They model the joint distribution P(class = C AND sample = x)
The perceptron is a generative model
They can be used for classification
Linear discriminant analysis is a generative model
weights are regularized with the ℓ1 norm
the weights have a Gaussian prior
weights are regularized with the ℓ2 norm
the solution algorithm is simpler
(6) [3 pts] Which of the following methods can achieve zero training error on any linearly separable dataset?
can be applied to every classification algorithm
is commonly used for dimensionality reduction
changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points
(8) [3 pts] Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with
exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum
possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?
2
n
3
n+1
(9) [3 pts] In latent semantic indexing, we compute a low-rank approximation to a term-document matrix. Which
of the following motivate the low-rank reconstruction?
Finding documents that are related to each other, e.g. of a similar genre
The low-rank approximation provides a lossless method for compressing an input matrix
(10) [3 pts] Which of the following are true about subset selection?
Subset selection can substantially decrease the bias of support vector machines
Subset selection can reduce overfitting
Ridge regression frequently eliminates some of the features
Finding the true best subset takes exponential time
(11) [3 pts] In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU
speed up the gradient calculation in backpropagation, as compared to linear units
help to learn nonlinear decision boundaries
are applied only to the output units
always output values between 0 and 1
(12) [3 pts] Suppose we are given data comprising points of several different classes. Each class has a different
probability distribution from which the sample points are drawn. We do not have the class labels. We use
k-means clustering to try to guess the classes. Which of the following circumstances would undermine its
effectiveness?
Some of the classes are not normally distributed
The variance of each distribution is small in all directions
Each class has the same mean
You choose k = n, the number of sample points
(13) [3 pts] Which of the following are true of spectral graph partitioning methods?
They find the cut with minimum weight
They minimize a quadratic function subject to one constraint: the partition must be balanced
They use one or more eigenvectors of the Laplacian matrix
The Normalized Cut was invented at Stanford
(14) [3 pts] Which of the following can help to reduce overfitting in an SVM classifier?
(15) [3 pts] Which value of k in the k-nearest neighbors algorithm generates the solid decision boundary depicted
here? There are only 2 classes. (Ignore the dashed line, which is the Bayes decision boundary.)
k = 1
k = 2
k = 10
k = 100
(16) [3 pts] Consider one layer of weights (edges) in a convolutional neural network (CNN) for grayscale images,
connecting one layer of units to the next layer of units. Which type of layer has the fewest parameters to be
learned during training? (Select one.)
(17) [3 pts] In the kernelized perceptron algorithm with learning rate ε = 1, the coefficient a_i corresponding to a
training example x_i represents the weight for K(x_i, x). Suppose we have a two-class classification problem with
y_i ∈ {1, −1}. If y_i = 1, which of the following can be true for a_i?
a_i = −1
a_i = 1
a_i = 0
a_i = 5
(18) [3 pts] Suppose you want to split a graph G into two subgraphs. Let L be G’s Laplacian matrix. Which of the
following could help you find a good split?
The eigenvector corresponding to the second-largest eigenvalue of L
The left singular vector corresponding to the second-largest singular value of L
The eigenvector corresponding to the second-smallest eigenvalue of L
The left singular vector corresponding to the second-smallest singular value of L
(19) [3 pts] Which of the following are properties that a kernel matrix always has?
(20) [3 pts] How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary
least squares regression? (Select one.)
Ridge has larger bias, larger variance
Ridge has smaller bias, larger variance
Ridge has larger bias, smaller variance
Ridge has smaller bias, smaller variance
(21) [3 pts] Both PCA and Lasso can be used for feature selection. Which of the following statements are true?
Lasso selects a subset (not necessarily a strict subset) of the original features
PCA and Lasso both allow you to specify how many features are chosen
PCA produces features that are linear combinations of the original features
PCA and Lasso are the same if you use the kernel trick
(22) [3 pts] Which of the following are true about forward subset selection?
O(2^d) models must be trained during the algorithm, where d is the number of features
It finds the subset of features that give the lowest test error
It greedily adds the feature that most improves cross-validation accuracy
Forward selection is faster than backward selection if few features are relevant to prediction
(23) [3 pts] You’ve just finished training a random forest for spam classification, and it is getting abnormally bad
performance on your validation set, but good performance on your training set. Your implementation has no
bugs. What could be causing the problem?
Your decision trees are too deep
You have too few trees in your ensemble
You are randomly sampling too many features when you choose a split
Your bagging implementation is randomly sampling sample points without replacement
(24) [3 pts] Consider training a decision tree given a design matrix $X = \begin{bmatrix} 6 & 3 \\ 2 & 7 \\ 9 & 6 \\ 4 & 2 \end{bmatrix}$ and labels $y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$. Let f1 denote
feature 1, corresponding to the first column of X, and let f2 denote feature 2, corresponding to the second
column. Which of the following splits at the root node gives the highest information gain? (Select one.)
f1 > 2
f2 > 3
f1 > 4
f2 > 6
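The information gain of each candidate split can be computed directly. The sketch below assumes the design matrix rows are (6, 3), (2, 7), (9, 6), (4, 2) with labels 1, 0, 1, 0, as read from the question; the helper names are illustrative.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(rows, labels, feature, threshold):
    """Parent entropy minus the weighted entropy of the two children."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(labels) - children

rows = [(6, 3), (2, 7), (9, 6), (4, 2)]  # assumed reading of X
labels = [1, 0, 1, 0]                    # assumed reading of y

# Candidate splits from the question: (feature index, threshold).
splits = {"f1 > 2": (0, 2), "f2 > 3": (1, 3), "f1 > 4": (0, 4), "f2 > 6": (1, 6)}
gains = {name: information_gain(rows, labels, f, t) for name, (f, t) in splits.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

Under these assumptions, the split f1 > 4 separates the two classes perfectly, so its gain equals the full parent entropy of 1 bit.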
(25) [3 pts] In terms of the bias-variance decomposition, a 1-nearest neighbor classifier has ______ than a
3-nearest neighbor classifier.
Bagging is ineffective with logistic regression, because all of the learners learn exactly the same decision boundary
If we use decision trees that have one sample point per leaf, bagging never gives lower training error than one ordinary decision tree
(27) [3 pts] An advantage of searching for an approximate nearest neighbor, rather than the exact nearest neighbor,
is that
it sometimes makes exhaustive search much faster
the nearest neighbor classifier is sometimes much more accurate
(28) [3 pts] In the derivation of the spectral graph partitioning algorithm, we relax a combinatorial optimization
problem to a continuous optimization problem. This relaxation has the following effects.
The combinatorial problem requires an exact bisection of the graph, but the continuous algorithm can produce (after rounding) partitions that aren’t perfectly balanced
The combinatorial problem requires finding eigenvectors, whereas the continuous problem requires only matrix multiplication
The combinatorial problem cannot be modified to accommodate vertices that have different masses, whereas the continuous problem can
The combinatorial problem is NP-hard, but the continuous problem can be solved in polynomial time
determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
is more analogous to the output of a unit in a neural net than the output voltage of the neuron
only changes very slowly, taking a period of several seconds to make large adjustments
can sometimes exceed 30,000 action potentials per second
(30) [3 pts] In algorithms that use the kernel trick, the Gaussian kernel
gives a regression function or predictor function that is a linear combination of Gaussians centered at the sample points
is equivalent to lifting the d-dimensional sample points to points in a space whose dimension is exponential in d
is less prone to oscillating than polynomials, assuming the variance of the Gaussians is large
has good properties in theory but is rarely used in practice
(31) 3 bonus points! The following Berkeley professors were cited in this semester’s lectures (possibly self-cited)
for specific research contributions they made to machine learning.
Let’s try to identify the most important features. Start with a simple dataset in R^2.
(1) [4 pts] Describe the training error of a Bayes optimal classifier that can see only the first feature of the data.
Describe the training error of a Bayes optimal classifier that can see only the second feature.
The first feature yields a training error of 50% (like random guessing). The second feature offers a training error of
zero.
(2) [4 pts] Based on this toy example, the student decides to fit a classifier on each feature individually, then
rank the features by their classifier’s accuracy, take the best k features, and train a new classifier on those k
features. We call this approach variable ranking. Unfortunately, the classifier trained on the best k features
obtains horrible accuracy, unless k is very close to d, the original number of features!
Construct a toy dataset in R^2 for which variable ranking fails. In other words, a dataset where a variable is
useless by itself, but potentially useful alongside others. Use + for data points in Class 1, and O for data points
in Class 2.
An XOR Dataset is unpredictable with either feature. (This extends to n-dimensions, with the n-bit parity string.)
where $S_j$ refers to the set of data points that are closer to $\mu_j$ than to any other cluster mean.
(1) [4 pts] Instead of updating $\mu_j$ by computing the mean, let’s minimize L with batch gradient descent while
holding the sets $S_j$ fixed. Derive the update formula for $\mu_1$ with learning rate (step size) $\epsilon$.
$$\frac{\partial L}{\partial \mu_1} = \frac{\partial}{\partial \mu_1} \sum_{x_i \in S_1} (x_i - \mu_1)^\top (x_i - \mu_1) = \sum_{x_i \in S_1} 2(\mu_1 - x_i).$$
(2) [2 pts] Derive the update formula for $\mu_1$ with stochastic gradient descent on a single sample point $x_i$. Use
learning rate $\epsilon$.
$\mu_1 \leftarrow \mu_1 + \epsilon (x_i - \mu_1)$ if $x_i \in S_1$, otherwise no change.
(3) [4 pts] In this part, we will connect the batch gradient descent update equation with the standard k-means
algorithm. Recall that in the update step of the standard algorithm, we assign each cluster center to be the
mean (centroid) of the data points closest to that center. It turns out that a particular choice of the learning
rate $\epsilon$ (which may be different for each cluster) makes the two algorithms (batch gradient descent and the
standard k-means algorithm) have identical update steps. Let’s focus on the update for the first cluster, with
center $\mu_1$. Calculate the value of $\epsilon$ so that both algorithms perform the same update for $\mu_1$. (If you do it right,
the answer should be very simple.)
In the standard algorithm, we assign $\mu_1 \leftarrow \sum_{x_i \in S_1} \frac{1}{|S_1|} x_i$.
Comparing to the answer in (1), we set $\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i = \mu_1 + \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$ and solve for $\epsilon$.
$$\sum_{x_i \in S_1} \frac{1}{|S_1|} x_i - \sum_{x_i \in S_1} \frac{1}{|S_1|} \mu_1 = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1)$$
$$\sum_{x_i \in S_1} \frac{1}{|S_1|} (x_i - \mu_1) = \epsilon \sum_{x_i \in S_1} (x_i - \mu_1).$$
Thus $\epsilon = \frac{1}{|S_1|}$.
(Note: answers that differ by a constant factor are fine if consistent with answer for (1).)
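The equivalence can be checked numerically. The sketch below uses made-up points for the cluster S1 and an arbitrary starting center (both assumptions for illustration) and applies one batch update of the form µ1 ← µ1 + ε Σ (xi − µ1) with ε = 1/|S1|.

```python
import numpy as np

# Hypothetical cluster members and an arbitrary current center.
S1 = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
mu1 = np.array([10.0, -7.0])

# One batch gradient step with the learning rate derived above.
eps = 1.0 / len(S1)
mu1_gd = mu1 + eps * np.sum(S1 - mu1, axis=0)

# The standard k-means update: the centroid of the cluster.
mu1_kmeans = S1.mean(axis=0)

print(mu1_gd, mu1_kmeans)
```

Whatever the starting center, a single step with this learning rate lands on the centroid, which is why the two update rules coincide.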
(2) [4 pts] Prove that for every design matrix $X \in \mathbb{R}^{n \times d}$, the corresponding kernel matrix is positive semidefinite.
For every vector $z \in \mathbb{R}^n$,
$$z^\top K z = z^\top X X^\top z = |X^\top z|^2,$$
which is clearly nonnegative.
(3) [2 pts] Suppose that a regression algorithm contains the following line of code.
$w \leftarrow w + X^\top M X X^\top u$
Here, $X \in \mathbb{R}^{n \times d}$ is the design matrix, $w \in \mathbb{R}^d$ is the weight vector, $M \in \mathbb{R}^{n \times n}$ is a matrix unrelated to X,
and $u \in \mathbb{R}^n$ is a vector unrelated to X. We want to derive a dual version of the algorithm in which we express
the weights w as a linear combination of samples $X_i$ (rows of X) and a dual weight vector a contains the
coefficients of that linear combination. Rewrite the line of code in its dual form so that it updates a correctly
(and so that w does not appear).
$a \leftarrow a + M X X^\top u$
(4) [2 pts] Can this line of code for updating a be kernelized? If so, show how. If not, explain why.
Yes:
$a \leftarrow a + M K u$
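A quick numerical check of the dual update is sketched below. It assumes the linear kernel K = XXᵀ and the relation w = Xᵀa, with randomly generated X, M, u and a standing in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))   # design matrix (random stand-in)
M = rng.normal(size=(n, n))   # matrix unrelated to X
u = rng.normal(size=n)        # vector unrelated to X
a = rng.normal(size=n)        # dual weights

w = X.T @ a                   # primal weights as a combination of sample points
K = X @ X.T                   # linear kernel matrix

w_new = w + X.T @ M @ X @ X.T @ u   # primal line of code
a_new = a + M @ K @ u               # kernelized dual line of code

# The dual update should reproduce the primal update via w = X^T a.
dual_matches_primal = np.allclose(w_new, X.T @ a_new)

# K should be positive semidefinite (up to round-off).
min_eigenvalue = np.linalg.eigvalsh(K).min()
print(dual_matches_primal, min_eigenvalue)
```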
(1) [6 pts] Compute the covariance matrix for the sample points. (Warning: Observe that X is not centered.)
Then compute the unit eigenvectors, and the corresponding eigenvalues, of the covariance matrix. Hint: If
you graph the points, you can probably guess the eigenvectors (then verify that they really are eigenvectors).
The covariance matrix (computed from the centered X) is $X^\top X = \begin{bmatrix} 82 & -80 \\ -80 & 82 \end{bmatrix}$.
Its unit eigenvectors are $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ with eigenvalue 2 and $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$ with eigenvalue 162. (Note: either eigenvector
can be replaced with its negation.)
(2) [3 pts] Suppose we use PCA to project the sample points onto a one-dimensional space. What one-dimensional
subspace are we projecting onto? For each of the four sample points in X (not the centered version of X!),
write the coordinate (in principal coordinate space, not in R^2) that the point is projected to.
We are projecting onto the subspace spanned by $\begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$. (Equivalently, onto the space spanned by $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Equivalently,
onto the line x + y = 0.) The projections are $(6, -4) \to \frac{10}{\sqrt{2}}$, $(-3, 5) \to -\frac{8}{\sqrt{2}}$, $(-2, 6) \to -\frac{8}{\sqrt{2}}$, $(7, -3) \to \frac{10}{\sqrt{2}}$.
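These numbers can be reproduced with a few lines of NumPy. The sketch takes the four sample points from the projections listed above, and fixes the sign of the principal eigenvector so that its first component is positive (a convention chosen here, since either sign is valid).

```python
import numpy as np

# The four sample points, as listed in the solution.
X = np.array([[6.0, -4.0], [-3.0, 5.0], [-2.0, 6.0], [7.0, -3.0]])

# Center the data, then form X^T X as in the solution (no division by n).
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(S)

# Principal direction = eigenvector of the largest eigenvalue.
v = eigvecs[:, -1]
if v[0] < 0:          # sign convention chosen for this sketch
    v = -v

# Principal coordinates of the original (uncentered) points.
coords = X @ v
print(S, eigvals, v, coords)
```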
(3) [3 pts] Given a design matrix X that is taller than it is wide, prove that every right singular vector of X with
singular value $\sigma$ is an eigenvector of the covariance matrix with eigenvalue $\sigma^2$.
If v is a right singular vector of X, then there is a singular value decomposition $X = U D V^\top$ such that v is a column
of V. Here each of U and V has orthonormal columns, V is square, and D is square and diagonal. The covariance
matrix is $X^\top X = V D U^\top U D V^\top = V D^2 V^\top$. This is an eigendecomposition of $X^\top X$, so each singular vector in V
with singular value $\sigma$ is an eigenvector of $X^\top X$ with eigenvalue $\sigma^2$.
[Figure: two depictions of the same k-d tree on sample points numbered 1 to 17. Left: the points in the plane with the splitting lines. Right: the tree.]
(1) [5 pts] Above, we have two depictions of the same k-d tree, which we have built to solve nearest neighbor
queries. Each node of the tree at right represents a rectangular box at left, and also stores one of the sample
points that lie inside that box. (The root node represents the whole plane R^2.) If a treenode stores sample point
i, then the line passing through point i (in the diagram at left) determines which boxes the child treenodes
represent.
Simulate running an exact 1-nearest neighbor query, where the bold X is the query point. Recall that the query
algorithm visits the treenodes in a smart order, and keeps track of the nearest point it has seen so far.
• Write down the numbers of all the sample points that serve as the “nearest point seen so far” sometime
while the query algorithm is running, in the order they are encountered.
• Circle all the subtrees in the k-d tree at upper right that are never visited during this query. (This is why
k-d tree search is usually faster than exhaustive search.)
(2) [5 pts] We are building a decision tree for a 2-class classification problem. We have n training points, each having
d real-valued features. At each node of the tree, we try every possible univariate split (i.e. for each feature, we
try every possible splitting value for that feature) and choose the split that maximizes the information gain.
Explain why it is possible to build the tree in O(ndh) time, where h is the depth of the tree’s deepest node.
Your explanation should include an analysis of the time to choose one node’s split. Assume that we can radix
sort real numbers in linear time.
Consider choosing the split at a node whose box contains n′ sample points. For each of the d features, we can radix sort
the sample points in O(n′) time, which is O(n′ d) time over all features. Then we can compute the entropy for the first split
(separating the first sample in the sorted list from the others) in O(n′) time, then we can walk through the list and update
the entropy for each successive split in O(1) time, summing to a total of O(n′) time for each of the d features. So it takes
O(n′ d) time overall to choose a split.
Each sample point participates in at most h treenodes, so each sample point contributes at most dh to the running
time, for a total running time of at most O(ndh).
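The linear scan described above can be sketched for a single feature as follows. This is an illustration under two assumptions: labels are 0/1, and Python's comparison sort stands in for the radix sort (so the sort here costs O(n′ log n′) rather than O(n′)). The function names are hypothetical.

```python
import math

def entropy_from_counts(positives, total):
    """Binary entropy (bits) from a count of positives out of total points."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split_one_feature(values, labels):
    """Scan sorted values once, updating class counts in O(1) per candidate split."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)  # stand-in for radix sort
    total_pos = sum(labels)
    parent = entropy_from_counts(total_pos, n)

    best_gain, best_threshold = 0.0, None
    left_pos = 0
    for rank in range(n - 1):
        i, nxt = order[rank], order[rank + 1]
        left_pos += labels[i]                 # O(1) update of the running count
        if values[i] == values[nxt]:
            continue                          # cannot split between equal values
        left_n, right_n = rank + 1, n - rank - 1
        children = (left_n * entropy_from_counts(left_pos, left_n)
                    + right_n * entropy_from_counts(total_pos - left_pos, right_n)) / n
        gain = parent - children
        if gain > best_gain:
            best_gain = gain
            best_threshold = (values[i] + values[nxt]) / 2
    return best_gain, best_threshold

# Illustrative data: one feature column and binary labels.
gain, threshold = best_split_one_feature([6, 2, 9, 4], [1, 0, 1, 0])
print(gain, threshold)
```

Repeating this scan for each of the d features gives the O(n′ d) cost per node once sorting is linear.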
(1) [2 pts] Calculate the number of parameters (weights) in this network. You can leave your answer as an
expression. Be sure to account for the bias terms.
(2) [3 pts] You train your network with the cost function $J = \frac{1}{2} |y - z|^2$. Use the following notation.
• x is a training image (input) vector with a 1 component appended to the end, y is a training label (input)
vector, and z is the output vector. All vectors are column vectors.
• r(γ) = max{0, γ} is the ReLU activation function, r′ (γ) is its derivative (1 if γ > 0, 0 otherwise), and
r(v) is r(·) applied component-wise to a vector.
• g is the vector of hidden unit values before the ReLU activation functions are applied, and h = r(g) is
the vector of hidden unit values after they are applied (but we append a 1 component to the end of h).
• V is the weight matrix mapping the input layer to the hidden layer; g = V x.
• W is the weight matrix mapping the hidden layer to the output layer; z = W h.
Derive ∂J/∂Wij .
$$\frac{\partial J}{\partial W_{ij}} = (z - y)^\top \frac{\partial z}{\partial W_{ij}} = (z_i - y_i) h_j$$
(3) [1 pt] Write ∂J/∂W as an outer product of two vectors. ∂J/∂W is a matrix with the same dimensions as W ;
it’s just like a gradient, except that W and ∂J/∂W are matrices rather than vectors.
$$\frac{\partial J}{\partial W} = (z - y) h^\top$$
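$$\frac{\partial J}{\partial V_{ij}} = (z - y)^\top \frac{\partial z}{\partial V_{ij}} = (z - y)^\top W \frac{\partial h}{\partial V_{ij}} = (z - y)^\top W [0, \ldots, r'(g_i)\, x_j, \ldots, 0]^\top = ((z - y)^\top W)_i \, r'(g_i) \, x_j.$$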
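The gradient formulas above can be checked against finite differences. The sketch below uses small, arbitrary layer sizes and random weights (assumptions for illustration), with a 1 appended to x and to h as in the notation.

```python
import numpy as np

def forward(V, W, x):
    g = V @ x                                  # hidden pre-activations
    h = np.append(np.maximum(g, 0.0), 1.0)     # ReLU, then append the bias 1
    z = W @ h                                  # linear output layer
    return g, h, z

def cost(V, W, x, y):
    _, _, z = forward(V, W, x)
    return 0.5 * np.sum((y - z) ** 2)

def gradients(V, W, x, y):
    g, h, z = forward(V, W, x)
    delta = z - y
    dW = np.outer(delta, h)                    # dJ/dW = (z - y) h^T
    back = (W.T @ delta)[:-1]                  # ((z - y)^T W)_i, bias column dropped
    dV = np.outer(back * (g > 0), x)           # times r'(g_i) x_j
    return dW, dV

def numeric_grad(param, V, W, x, y, eps=1e-6):
    """Central finite differences; param is V or W and is perturbed in place."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        up = cost(V, W, x, y)
        param[idx] = old - eps
        down = cost(V, W, x, y)
        param[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.append(rng.normal(size=3), 1.0)   # 3 inputs plus appended 1
y = rng.normal(size=2)                   # 2 outputs
V = rng.normal(size=(3, 4))              # 3 hidden units
W = rng.normal(size=(2, 4))              # 3 hidden units plus bias

dW, dV = gradients(V, W, x, y)
dW_num = numeric_grad(W, V, W, x, y)
dV_num = numeric_grad(V, V, W, x, y)
print(np.abs(dW - dW_num).max(), np.abs(dV - dV_num).max())
```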
QUIZ
Quiz Topic - REINFORCEMENT LEARNING
Category: Artificial Intelligence
Clustering
B. Recommendation system
C. Pattern recognition
D. Image classification
A. Reinforcement algorithm
B. Supervised algorithm
C. Unsupervised algorithm
D. None
5. You have a task, which is to show relevant ads to target users. Which
algorithm should you use for this task?
A. K means clustering
B. Naive Bayes
C. Support vector machine
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
B. Naive Bayes
C. Decision tree
8. Thompson sampling is a-
A. Probabilistic algorithm
B. Based on Bayes inference rule
C. Reinforcement learning algorithm
D. None
A. Reinforcement learning
B. Supervised learning
C. Unsupervised learning
D. All of the above
This document contains cheat sheets on various topics asked during a Machine Learning/Data science
interview. This document is constantly updated to include more topics.
Table of Contents
Machine Learning
1. Bias-Variance Trade-off
5. Regression Analysis
6. Regularization in ML
8. Famous CNNs
Behavioral Interview
1. How to prepare for behavioral interview?
[Figure: bias-variance trade-off plot with the Minimum Error point marked.]
Accuracy doesn’t always give the correct insight about your trained model
Accuracy: %age correct prediction. Correct predictions over total predictions. One value for entire network.
Precision: Exactness of model. From the detected cats, how many were actually cats. Each class/label has a value.
Recall: Completeness of model. Correctly detected cats over total cats. Each class/label has a value.
F1 Score: Combines Precision/Recall. Harmonic mean of Precision and Recall. Each class/label has a value.
Confusion matrix cells: True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
Precision = TP / (TP + FP)
False +ve rate = FP / (TN + FP)
F1 score = 2 x (Prec x Rec) / (Prec + Rec)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
Specificity = TN / (TN + FP)
Recall, Sensitivity, True +ve rate = TP / (TP + FN)
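The formulas above translate directly into code. The sketch below is a minimal illustration for one class; the confusion-matrix counts are made up, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from the four confusion-matrix counts of one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also sensitivity / true positive rate
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Made-up counts for an imbalanced problem: 12 positives, 88 negatives.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
print(metrics)
```

With these counts the accuracy is high even though a third of the positives are missed, which is the point of the opening remark about accuracy.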
Possible solutions
1. Data Replication: Replicate the available data until the number of samples is comparable
2. Synthetic Data: Images: Rotate, dilate, crop, add noise to existing input images and create new data
3. Modified Loss: Modify the loss to reflect greater error when misclassifying the smaller sample set: loss = a * loss_green + b * loss_blue, with a > b
4. Change the algorithm: Increase the model/algorithm complexity so that the two classes are perfectly
separable (Con: Overfitting)
[Figure: the same two-class data (Blue: Label 1, Green: Label 0) before and after increasing model complexity. Left: No straight line (y=ax) passing through origin can perfectly separate data. Best solution: line y=0, predict all labels blue. Right: Straight line (y=ax+b) can perfectly separate data. Green class will no longer be predicted as blue.]
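The modified loss in item 3 can be sketched as a class-weighted loss. The choice of binary cross-entropy as the per-class loss is an assumption made here for illustration (the sheet only gives the weighted sum); label 0 is the smaller (green) class and label 1 the larger (blue) class, as in the figure legend.

```python
import math

def weighted_loss(y_true, p_pred, a, b):
    """Mean of a * loss_green + b * loss_blue, using cross-entropy per point.

    y_true: 0 (green, minority) or 1 (blue, majority).
    p_pred: predicted probability of label 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        if y == 0:
            total += a * -math.log(1.0 - p)   # green points weighted by a
        else:
            total += b * -math.log(p)         # blue points weighted by b
    return total / len(y_true)

# With a > b, an equally confident mistake costs more on the minority class.
loss_green_mistake = weighted_loss([0], [0.9], a=5.0, b=1.0)
loss_blue_mistake = weighted_loss([1], [0.1], a=5.0, b=1.0)
print(loss_green_mistake, loss_blue_mistake)
```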
Source: https://www.cheatsheets.aqeel-anwar.com
[Figure 1 and Figure 2: scatter plots over Feature # 1 (F1) and Feature # 2, showing the New Feature # 1 and New Feature # 2 directions and the variance along them.]
P(A|B)
• How the probability of an event changes when we have knowledge of another event
• P(A|B) is usually a better estimate than P(A)
Bayes’ Theorem
P(A|B) = P(B|A) x P(A) / P(B), where P(A|B) is the posterior probability, P(B|A) the likelihood, P(A) the prior probability, and P(B) the evidence
Example
• Probability of fire P(F) = 1%
• Probability of smoke P(S) = 10%
• Prob of smoke given there is a fire P(S|F) = 90%
• What is the probability that there is a fire given we see a smoke, P(F|S)?
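The fire and smoke example can be worked through with Bayes' theorem, P(F|S) = P(S|F) x P(F) / P(S). A minimal sketch using the numbers given above (the function name is illustrative):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

p_fire = 0.01               # P(F), prior
p_smoke = 0.10              # P(S), evidence
p_smoke_given_fire = 0.90   # P(S|F), likelihood

p_fire_given_smoke = posterior(p_smoke_given_fire, p_fire, p_smoke)
print(p_fire_given_smoke)   # 0.9 * 0.01 / 0.10, i.e. about 9%
```

Seeing smoke raises the estimated probability of fire from the 1% prior to about 9%.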
Naive Bayes assumes the features (x1, x2, x3, …) are independent of each other, i.e.
Summary:
What does it fit? Estimated function Error Function
Linear A line in n dimensions
Polynomial A polynomial of order k
Bayesian Linear Gaussian distribution for each point
Ridge Linear/polynomial
LASSO Linear/polynomial
Logistic Linear/polynomial with sigmoid
Cheat Sheet – Regularization in ML
• L1 Regularization: Prevents the weights from getting too large (defined by L1 norm). The larger
the weights, the more complex the model and the higher the chance of overfitting. L1 regularization
introduces sparsity in the weights: it forces more weights to be zero rather than reducing the
average magnitude of all weights.
• Entropy: Used for the models that output probability. Forces the probability distribution
towards uniform distribution.
VGGNet – 2014
Why: VGGNet was born out of the need to reduce the # of
parameters in the CONV layers and improve on training time
What: There are multiple variants of VGGNet (VGG16, VGG19, etc.)
How: The important point to note here is that all the conv kernels are
of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.
ResNet – 2015
Why: Neural Networks are notorious for not being able to find a
simpler mapping when it exists. ResNet solves that.
What: There are multiple versions of ResNetXX architectures where
‘XX’ denotes the number of layers. The most used ones are ResNet50
and ResNet101. Since the vanishing gradient problem was taken care of
(more about it in the How part), CNN started to get deeper and deeper
How: ResNet architecture makes use of shortcut connections to solve
the vanishing gradient problem. The basic building block of ResNet is
a Residual block that is repeated throughout the network.
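A residual block can be sketched in a few lines. This is a simplified illustration, not the actual ResNet block: it assumes two dense weight layers with a ReLU between them in place of convolutions, and omits batch normalization and the final activation.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, W1, W2):
    """Return f(x) + x, where f is two weight layers with a ReLU in between."""
    fx = W2 @ relu(W1 @ x)   # the learned residual f(x)
    return fx + x            # shortcut connection adds the input back

x = np.array([1.0, -2.0, 3.0])

# With all-zero weights the block reduces to the identity mapping,
# which is the "simpler mapping" the shortcut makes easy to represent.
zeros = np.zeros((3, 3))
out_identity = residual_block(x, zeros, zeros)
print(out_identity)
```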
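[Figure: a residual block (weight layers compute f(x), which is added to the shortcut input x to give f(x)+x) and a module in which 1x1, 3x3 and 5x5 Conv branches from the previous layer are joined by filter concatenation.]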
CNN Template:
Most of the commonly used hidden layers (not all) follow a
pattern.
1. Layer function: Basic transforming function such as a
convolutional or fully connected layer.
a. Fully Connected: Linear functions between the input and the
output.
b. Convolutional Layers: These layers are applied to 2D (3D) input feature maps. The trainable weights are a 2D (3D)
kernel/filter that moves across the input feature map, generating dot products with the overlapping region of the input
feature map.
c. Transposed Convolutional (DeConvolutional) Layer: Usually used to increase the size of the output feature map
(upsampling). The idea behind the transposed convolutional layer is to undo (not exactly) the convolutional layer.
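The sliding dot product of a convolutional layer can be sketched in plain Python as follows. This is a minimal single-channel version without padding; the function name conv2d_valid and the sample values are chosen only for this illustration.

```python
def conv2d_valid(feature_map, kernel, stride=1):
    """Slide the kernel over the feature map and take the dot product with each overlapping region."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(feature_map) - kh) // stride + 1
    out_w = (len(feature_map[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    total += feature_map[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out

feature_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
kernel = [[1, 1], [1, 1]]  # a 2x2 kernel that sums each region

result = conv2d_valid(feature_map, kernel)
for row in result:
    print(row)
```

A 4x4 input with a 2x2 kernel and stride 1 gives a 3x3 output feature map.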
[Figure: Fully Connected Layer, where inputs x1, x2, x3 are combined as y1 = w11*x1 + w21*x2 + w31*x3 + b1; Convolutional Layer, where a kernel slides over the input feature map.]
2. Boosting: Trains N different weak models (usually of the same type, i.e. homogeneous) with the complete dataset in
sequential order. The datapoints wrongly classified by the previous weak model are given more weight so that they can
be classified properly by the next weak learner. In the test phase, each model is evaluated and, based on the test error of
each weak model, its prediction is weighted for voting. Boosting methods decrease the bias of the prediction.
3. Stacking: Trains N different weak models (usually of different types, i.e. heterogeneous) with one of the two subsets of the
dataset in parallel. Once the weak learners are trained, they are used to train a meta learner that combines their
predictions and carries out the final prediction using the other subset. In the test phase, each model predicts its label, and this set of
labels is fed to the meta learner, which generates the final prediction.
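The re-weighting step of boosting can be sketched as below. The formulas are those of AdaBoost, which is an assumption here since the text does not name a specific boosting algorithm; the function name boosting_round and the toy labels are illustrative only.

```python
import math

def boosting_round(weights, y_true, y_pred):
    """One AdaBoost-style round: return (alpha, updated and normalized weights).

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the weak model on the current weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Scalar alpha: larger when the weak model makes fewer weighted mistakes.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Increase weights of wrongly classified points, decrease the others.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]      # only the last point is misclassified
weights = [0.2] * 5              # equal weights to start with

alpha, new_weights = boosting_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After one round the misclassified point carries half of the total weight, so the next weak learner is pushed to classify it correctly.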
The block diagrams, and comparison table for each of these three methods can be seen below.
[Figure: block diagrams of the ensemble methods. The steps shown in each diagram are listed below.]
Ensemble Method – Boosting
Step #1: Assign equal (uniform) weights to all the datapoints in the complete dataset.
Step #2a: Train weak model #1 with equal weights on all the datapoints.
Step #2b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha1). Use alpha to increase the weights of wrongly classified points and decrease the weights of correctly classified points.
Step #3a: Train weak model #2 with the adjusted weights on all the datapoints in the dataset.
Step #3b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha2) and adjust the weights in the same way.
Step #(n+1)a: Train a weak model with the adjusted weights on all the datapoints in the dataset.
Step #(n+2): In the test phase, predict from each weak model and vote their predictions, weighted by the corresponding alpha, to get the final prediction.
Ensemble Method – Bagging
Step #1: Create N subsets from the original dataset, one for each weak model.
Step #2: Train each weak model with an independent subset, in parallel.
Step #3: In the test phase, predict from each weak model and vote their predictions to get the final prediction.
Ensemble Method – Stacking
The input dataset is split into Subset #1 (weak learners) and Subset #2 (meta learner).
Step #2: Train each weak model with the weak learner dataset.
Step #3: Train a meta-learner for which the input is the outputs of the weak models for the meta learner dataset.
Keywords
List important keywords that will be populated with your personal stories. The most common keywords are given in the table below.
Conflict Resolution | Negotiation | Compromise to achieve goal | Creativity | Flexibility | Convincing
Handling Crisis | Challenging Situation | Another team priorities not aligned | Working with difficult people | Adjust to a colleague style | Take Stand
Handling –ve feedback | Coworker view of you | Working with a deadline | Your strength | Your weakness | Influence Others
Handling failure | Handling unexpected situation | Converting challenge to opportunity | Decision without enough data | Conflict Resolution | Mentorship/Leadership
Stories
1. List all the organizations you have been a part of. For example:
a. Academia: BSc, MSc, PhD
b. Industry: Jobs, Internship
c. Societies: Cultural, Technical, Sports
2. Think of stories from step 1 that can fall into one of the keyword categories. The
more stories the better. You should have at least 10-15 stories.
3. Create a summary table by assigning multiple keywords to each story. This will help
you filter the stories when a question is asked in the interview. An example can be
seen below.
Story 1: [Convincing] [Take Stand] [influence other]
Story 2: [Mentorship] [Leadership]
Story 3: [Conflict resolution] [Negotiation]
Story 4: [decision-without-enough-data]
STAR Format
Write down the stories in the STAR format as explained in the 2/4 part of this cheat
sheet. This will help you practice the organization of story in a meaningful way.
Example: “Tell us about a time when you had to convince senior executives”
S: Situation
Explain the situation and provide necessary context for your story.
“I worked as an intern in XYZ company in the summer of 2019. The project details provided to me were elaborate. After some initial brainstorming and research, I realized that the project approach could be modified to make it more efficient in terms of the underlying KPIs. I decided to talk to my manager about it.”
T: Task
Explain the task and your responsibility in the situation.
“[...] and explained to him in detail the proposed approach and how it could improve the KPIs. I was able to convince him. He asked me if I would be able to present my proposed approach for approval in front of the higher executives. I agreed to it. I was working out of the ABC (city) office and the executives needed to fly in from the XYZ (city) office.”
A: Action
Walk through the steps and actions you took to address the issue.
“[...] executives to know better about their area of expertise so that I could convince them accordingly. I prepared an elaborate 15-slide presentation, starting with explaining their approach, moving on to my proposed approach and finally comparing them on preliminary results.”
R: Result
State the outcome or result of your actions.
“[...] was better than the initial one. The executives proposed a few small changes to my approach and really appreciated my stand. At the end of my internship, I was selected among the 3 out of 68 interns who got to meet the senior vice president of the company over lunch.”
How to answer a behavioral question? (3/4)
Understand, Extract, Map, Select and Apply
Example: “Tell us about a time when you had to convince senior executives”
Behavioral Interview Cheat Sheet (4/4)
Summarizing the behavioral interview
2. Based on all the organizations you have been a part of,
think of all the stories that fall under the keywords above.
3. Practice each story using the STAR format. You will have
to answer the question following this format.
• Medium: https://aqeel-anwar.medium.com
• LinkedIn: https://www.linkedin.com/in/aqeelanwarmalik/
Version History
• Version 0.1.0.1 - Apr 05, 2021
Fixed minor typos in the Bayes' Theorem, Regression Analysis, Classifier and
PCA dimensionality reduction cheat sheets.
Advance ML - practice
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 4, 3, 1, 5, 2
D. 3, 2, 1, 5, 4
Q- Suppose you are inputting an image of size (150 x 150 x 3) with filter size=2,
stride=1, padding=0. What would be the output size of the image?
A. 150x150
B. 149x 149
C. 148x 148
D. 147 x 147
Q- Which of the following metrics will best analyze the performance of any
model?
A. Precision
B. Recall
C. F-Score
D. None of the mentioned
Q- The number of nodes in the input layer is 20 and in the hidden layer is 5. What
is the maximum number of connections that can exist between the input layer
and the hidden layer?
A. 100
B. 25
C. less than 100
D. Greater than 100
A. Binary classification
B. Multiclassification
Q-A perceptron is a –
a. A single layer feed-forward neural network with pre-processing
b. An auto-associative neural network
c. A double layer auto-associative neural network
d. A neural network that contains feedback
O tf.convert_to_tensor()
O np.array()
O tf.make_ndarray()
O tf.constant()
Which of the following must be initialized in Tensorflow?
O Placeholders
O Variables
O Sessions
O All of the above
[[[0.0.] [o. 0.] [o. o.] [o. o.]] [[o. o.] [o. o.] [o. o.] [o. o.]] [[o.o.] [o. o.] [o. o.] [o.
o.]]]
O [[[0.0.] [0.0.] [o. o.]] [[0. o.] [o. o.) [0.0.]]]
O [[[0.0.] [0.0.] [0. o.]] [[o. o.] [o. o.] [o. o.]] [[o. o.] [o. o.] [o. o.]]]
O None of the mentioned
O to produce the same random tensor for a given shape and dtype.
O Both a and b
O None of the mentioned
• R
• Sk-learn
• Excel
• TensorFlow
2. A tensor is similar to
• Data Array
• ANN Model
• SQL query
• Python code
6. out=tf.add(tf.matmul(X,W), b)
7. tf.reduce_sum(tf.square(out-Y))
• Using GPU
• By doing random sampling on Tensors
• By removing few nodes from computational graphs
• by removing the hidden layers
View Answer
• R
• TensorFlow
• SAS
• Azure
View Answer
• Python
• TensorFlow
• Excel
• Keras
View Answer
16 out=tf.sigmoid(tf.add(tf.matmul(X,W), b))
17. C=-tf.reduce_sum(Y*tf.log(out))
• Python
• Keras
• PyTorch
• Azure
View Answer
What are the steps for using a gradient descent algorithm in
TensorFlow?
1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weight and bias
5. Go to each neuron which contributes to the error and change its
respective values to reduce the error
• 1, 2, 3, 4, 5
• 5, 4, 3, 2, 1
• 3, 2, 1, 5, 4
• 4, 3, 1, 5, 2
If you increase the number of hidden layers in a Multi-Layer Perceptron, the
classification error of test data always decreases in TensorFlow. True or
false?
• True
• False
Suppose that you have to minimize the cost function by changing the
parameters. Which of the following techniques could be used for this in
TensorFlow?
• Exhaustive search
• Random search
• Bayesian Optimization
• Any of these
• Dropout
• Regularization
• Batch Normalization
• All of the above
A numeric variable can store numeric values with a maximum of eight digits.
• True
• False
• Can’t Say
In TensorFlow, knowing the weight and bias of each neuron is the most
important step. If you can somehow get the correct value of weight and bias for
each neuron, you can approximate any function. What would be the best
way to approach this?
• Assign random values and pray to God they are correct
• Search every possible combination of weights and biases until you get the
best value
• Iteratively check that after assigning a value how far you are from
the best values, and slightly change the assigned values to
make them better
The number of neurons in the output layer must match the number
of classes (where the number of classes is greater than 2) in a supervised
learning task in TensorFlow. True or false?
• True
• False
Which gradient technique is more advantageous when the data is too big to handle in
RAM simultaneously?
• Full Batch Gradient Descent
• Stochastic Gradient Descent
What are the factors to select the depth of the neural network?
1. Type of neural network
2. Input data
3. Computation power
4. Learning rate
5. The output function to map
• 1, 2, 4, 5
• 2, 3, 4, 5
• 1, 3, 4, 5
• All of these
The k-NN algorithm does more computation at test time than at train time.
• True
• False
Which of the following options is true about the k-NN algorithm?
• It can be used for classification
• It can be used for regression
• It can be used in both classification and regression
• Yes
• No
What was the second stage in the perceptron model called?
• Sensory units
• Summing unit
• Association unit
• Output unit
What leads to minimization of error between the desired & actual outputs?
• Stability
• Convergence
• Either stability or convergence
• None of the mentioned
The problem you are trying to solve has a small amount of data. Fortunately,
you have a pre-trained neural network that was trained on a
similar problem. Which of the following methodologies would you choose to
make use of this pre-trained network?
• Re-train the model for the new dataset
• Assess on every layer how the model performs and only select a
few of them
• Fine tune the last couple of layers only
• Freeze all the layers except the last, re-train the last layer
A format will modify both the stored value and the displayed value.
• Correct
• Incorrect
• A. Stacking
• B. Bagging
• C. Boosting
• D. None of these
3) Can a neural network model the function (y=1/x) in TensorFlow?
• A. True
• B.False
• A. Dropout
• B.Regularization
• C.Batch Normalization
• D.All of the above
6) Y = ax^2 + bx + c (polynomial equation of degree 2)Can this equation be
represented via a neural network of a single hidden layer with linear
threshold?
• A. Yes
• B.No
7) A numeric variable can store numeric values with a maximum of eight digits.
• A. True
• B.False
8) Identify the dead unit in a neural network?
• A. tanh
• B.ReLU
• C.sigmoid
• D.None of these
11) The number of nodes in the i/p layer is 10 and that in the hidden layer is 5. What
is the max. number of connections from the i/p layer to the hidden layer?
• A. Twenty
• B.Sixty
• C.Fifty
• D.It is random
12) From the following choices where can deep learning be used?
• A. Changes
• B.user help
• C.documentation
• D.None of these
15) Why do we use TPU?
• A. To visualize model
• B.For debugging purpose only
• C.To accelerate the development
• D.TPU does not exist
16) What do you by TensorBoard?
• A. True
• B.False
21) Which of the following dashboards in TensorFlow?
• A. Scalar Dashboard
• B.Histogram Dashboard
• C.Distributer Dashboard
• D.All of the above
22) Identify the type of Tensors?
• A. Variable Tensor
• B.Constant Tensor
• C.Place Holder Tensor.
• D.All of the above
23) Who discovered tensors?
• A. Gargi-Curbastro
• B. Gregorio Ricci-Curbastro
• C. Both 1 and 2
• D. None of these
24) What of the following is accurate in regard to backpropagation algorithm?
• A. TRUE
• B. FALSE
• C. Can be true or false
• D. Can not say
View Answer
• A. 2
• B. 3
• C. 4
• D. 5
View Answer
• A. 1
• B. 2
• C. 3
• D. 4
View Answer
• A. TensorLayer
• B. TFLearn
• C. PrettyTensor
• D. Sonnet
View Answer
• A. tensor variable
• B. tensor keywords
• C. tensor attributes
• D. tensor objects
View Answer
7. Which of the following defines specific input data that does not
change with time?
• A. tf.variable
• B. tf.placeholder
• C. Both A and B
• D. None of the above
View Answer
• A. Yes
• B. No
• C. Can be yes or no
• D. Can not say
View Answer
• A. TRUE
• B. FALSE
• C. Can be true or false
• D. Can not say
1) TensorFlow was developed by
• A. Oracle Team
• B. IBM Team
• C. Microsoft Team
• D. Google Brain Team
2) TensorFlow was first introduced on _______
• A. October 9, 2015
• B. October 9, 2016
• C. November 8, 2015
• D. November 9, 2015
3) Tensorflow is written in which language?
• A. C++
• B. CUDA
• C. Python
• D. All of the Above
4) Tensorflow supports ______ of the following platforms.
• A. Linux
• B. macOS
• C. Windows & Android
• D. All of the Above
5) Which of the following techniques performs operations similar to
dropout in a neural network in TensorFlow?
• A. Bagging
• B. Boosting
• C. Stacking
• D. None Of Above
6) In a neural network, which of the following techniques is used to deal
with overfitting in TensorFlow?
• A. Dropout
• B. Regularization
• C. Batch Normalization
• D. All of the above
7) Tensorflow is similar to ______
• A. SQL query
• B. Data Array
• C. ANN Model
• D. Python code
8) Why do we use TPU?
• A. Gregorio Ricci-Curbastro
• B. Gargi-Curbastro
• C. Both A and B
• D. None Of Above
11) How many types of Tensors are there?
• A. One
• B. Two
• C. Three
• D. Four
12) Variables in TensorFlow are also known as ?
• A. tensor objects
• B. tensor variable
• C. tensor attributes
• D. tensor keywords
13) Which of the following is true about TensorFlow?
• A. It is produced by Google
• B. The TensorFlow is based on Theano library.
• C. TensorFlow does not have any option at run time
• D. All of the Above
14) TensorFlow is a free and open-source ______
• A. PHP
• B. Java
• C. Python
• D. Angular
15) Tensorflow supports which python version?
• A. Python 3.0
• B. Python 3.3
• C. Python 3.5
• D. Python 3.6
16) Why tensorflow uses computational graphs?
• A. Creo
• B. Keras
• C. Python
• D. Arduino
18) TensorFlow is mainly used for ______
• A. X Linear Algebra
• B. Xtreme Linear Algebra
• C. Unknown Linear Algebra
• D. Accelerated Linear Algebra
• Python
• Java
• PHP
• Angular
• IBM Team
• Microsoft Team
• Google Brain team
• None of the above
View Answer
Google Brain team
Exp; TensorFlow is developed by the Google Brain team.
• November 9, 2015
• November 8, 2015
• October 9, 2015
• November 9, 2016
View Answer
November 9, 2015
Exp: TensorFlow was initially released on November 9, 2015, about 5.5 years
ago.
• C++
• Python
• CUDA
• All of the above
View Answer
All of the above
Exp: Tensorflow is written in C++, Python, & CUDA programming languages.
• True
• False
View Answer
True
Exp: Yes! Tensorflow attracts the largest popularity on GitHub compared to the
other deep learning frameworks.
• Python 3.0
• Python 3.3
• Python 3.5
• Python 3.6–3.9
View Answer
Python 3.6–3.9
Exp: Tensorflow supports Python 3.6 to 3.9 version.
• Linux
• macOS
• Windows & Android
• All of the above
View Answer
All of the above
Exp: Tensorflow supports 64-bit Linux, macOS, Windows & Android platforms.
• Dataflow
• Differentiable programming
• Both Dataflow & Differentiable programming
• None of the above
View Answer
Both Dataflow & Differentiable programming
Exp: Tensorflow is a symbolic math library based on both dataflow &
differentiable programming.
9. There are ........... main tensor type you can create in TensorFlow.
• 2
• 3
• 4
• 5
View Answer
4
Exp: There are 4 main tensor types you can create in TensorFlow. These are
tf.Variable, tf.constant, tf.placeholder, & tf.SparseTensor.
of our models and tracking several metrics, & Its performance is high and
matching the best in the industry.
13. TensorFlow has only supported 64-bit Python 3.5.x or Python 3.6.x on
Windows.
• True
• False
View Answer
True
• Serving Servables
• Metrics Servables
• Loading Servables
• Unloading Servables
View Answer
Metrics Servables
Exp: TensorFlow managers handle the full lifecycle of Servables, including
loading Servables, serving Servables, and unloading Servables.
• September 2019
• October 2019
• August 2019
• November 2019
View Answer
September 2019
Exp: Tensorflow 2.0 was released on September 30, 2019.
• Scalar Dashboard
• Histogram Dashboard
• Distributer Dashboard
• All of the above
View Answer
All of the above
Exp: Different types of dashboards are available in TensorFlow, such
as the Scalar Dashboard, Histogram Dashboard, Distributor Dashboard, Image
Dashboard, and Audio Dashboard.
• Keras
• Azure
• Python
• PyTorch
View Answer
Keras
Exp: Keras tool is a deep learning wrapper on TensorFlow.
• Yes
• No
View Answer
Yes
Exp: Yes! we can use GPU for faster computations in TensorFlow.
A. PyBrain
B. Keras
C. PyTorch
D. Theano
View Answer
2. Is keras a library?
A. Yes
B. No
C. Can be yes or no
D. Can not say
View Answer
A. Michael Berthold
B. Adam Paszke
C. Sam Gross
D. François Chollet
View Answer
A. Callout
B. Digout
C. Dropout
D. Knimeout
View Answer
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
View Answer
A. LeakyReLU
B. PReLU
C. Both A and B
D. None of the above
View Answer
A. keras.initializers.Initializer()
B. keras.initializers.Zeros()
C. keras.initializers.Ones()
D. All of the above
View Answer
A. Keras layer
B. Keras Module
C. Keras Model
D. Keras Time
View Answer
10. Which of the following returns all the layers of the model as list?
A. model.inputs
B. model.layers
C. model.outputs
D. model.get_weights
• x1, x2,…, xN: These are inputs to the neuron. These can either be the
actual observations from input layer or an intermediate value from one
of the hidden layers.
• w1, w2,…,wN: The Weight of each input.
• bi: Is termed as Bias units. These are constant values added to the input
of the activation function corresponding to each weight. It works similar
to an intercept term.
• a: Is termed as the activation of the neuron which can be represented
as
• and y: is the output of the neuron
Considering the above notations, will a line equation (y = mx + c) fall into the
category of a neuron?
A. Yes
B. No
Solution: (A)
A single neuron with no non-linearity can be considered as a linear regression
function.
(Hint: For which values of w1, w2 and b does our neuron implement an AND
function?)
A. Bias = -1.5, w1 = 1, w2 = 1
B. Bias = 1.5, w1 = 2, w2 = 2
C. Bias = 1, w1 = 1.5, w2 = 1.5
D. None of these
Solution: (A)
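The choice of option A can be checked with a short sketch. The threshold activation used here (output 1 when the weighted sum is positive, else 0) is an assumption, since the activation function is not reproduced in the text.

```python
def step(z):
    """Assumed threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def neuron(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    """Single neuron with the weights and bias from option A."""
    return step(w1 * x1 + w2 * x2 + bias)

# Evaluate the neuron on all four binary input pairs.
truth_table = {(a, b): neuron(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Only the input (1, 1) gives a positive weighted sum (1 + 1 - 1.5 = 0.5), so the neuron fires only when both inputs are 1, which is the AND function.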
A.
Q4. A network is created when we stack multiple neurons together. Let us take
an example of a neural network simulating an XNOR function.
You can see that the last neuron takes input from two neurons before it. The
activation function for all the neurons is given by:
Suppose X1 is 0 and X2 is 1, what will be the output for the above neural
network?
A. 0
B. 1
Solution: (A)
Output of a1: f(0.5*1 + -1*0 + -1*1) = f(-0.5) = 0
Output of a2: f(-1.5*1 + 1*0 + 1*1) = f(-0.5) = 0
Output of a3: f(-0.5*1 + 1*0 + 1*0) = f(-0.5) = 0
So the correct answer is A
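The same calculation can be run for every input pair. The weights below are read off the worked solution above; the threshold activation f (1 for a positive input, 0 otherwise) is an assumption that is consistent with f(-0.5) = 0.

```python
def f(z):
    """Assumed threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xnor_net(x1, x2):
    """Two hidden neurons feeding one output neuron; the first term of each sum is the bias."""
    a1 = f(0.5 * 1 + -1 * x1 + -1 * x2)   # fires only when both inputs are 0
    a2 = f(-1.5 * 1 + 1 * x1 + 1 * x2)    # fires only when both inputs are 1
    a3 = f(-0.5 * 1 + 1 * a1 + 1 * a2)    # fires when either hidden neuron fires
    return a3

outputs = {(x1, x2): xnor_net(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for inputs, output in outputs.items():
    print(inputs, "->", output)
```

For X1 = 0 and X2 = 1 the network outputs 0, matching option A, and the full table matches XNOR.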
Q5. In a neural network, knowing the weight and bias of each neuron is the
most important step. If you can somehow get the correct value of weight and
bias for each neuron, you can approximate any function. What would be the
best way to approach this?
A. Assign random values and pray to God they are correct
B. Search every possible combination of weights and biases till you get the best
value
C. Iteratively check that after assigning a value how far you are from the best
values, and slightly change the assigned values to make them better
D. None of these
Solution: (C)
Option C is the description of gradient descent.
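A minimal sketch of this iterative procedure, fitting a single neuron w*x + b to points on the line y = 2x + 1. The function name fit_line, the learning rate, the epoch count and the mean-squared-error loss are all assumptions chosen for this illustration.

```python
import random

def fit_line(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)       # initialize random weight and bias
    n = len(xs)
    for _ in range(epochs):                             # reiterate until the weights settle
        preds = [w * x + b for x in xs]                 # pass the inputs through the model
        errors = [p - y for p, y in zip(preds, ys)]     # error between predicted and actual
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w                                # slightly change the values
        b -= lr * grad_b                                # in the direction that reduces the error
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

Each pass measures how far the current values are from the target and nudges them, rather than guessing randomly or searching every combination.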
Q6. What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Reiterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias
Q7. Suppose you have inputs as x, y, and z with values -2, 5, and -4 respectively.
You have a neuron ‘q’ and neuron ‘f’ with functions:
q=x+y
f=q*z
Graphical representation of the functions is as follows:
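As a sketch of how such a two-node graph is evaluated, the code below runs the forward pass and then applies the chain rule backwards. The function name forward_backward is hypothetical, and computing the partial derivatives is an illustration of the graph, not a statement of what the question asks.

```python
def forward_backward(x, y, z):
    """Forward pass through q = x + y, f = q * z, then gradients of f by the chain rule."""
    # Forward pass
    q = x + y
    f = q * z
    # Backward pass
    df_dq = z             # f = q * z, so df/dq = z
    df_dz = q             # and df/dz = q
    df_dx = df_dq * 1.0   # q = x + y, so dq/dx = 1
    df_dy = df_dq * 1.0   # and dq/dy = 1
    return f, (df_dx, df_dy, df_dz)

f_value, grads = forward_backward(-2.0, 5.0, -4.0)
print(f_value, grads)
```

With x = -2, y = 5 and z = -4, the intermediate value is q = 3 and the output is f = -12.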
Q8. Now let’s revise the previous slides. We have learned that:
C. Stacking
D. None of these
Solution: (A)
Dropout can be seen as an extreme form of bagging in which each model is
trained on a single case and each parameter of the model is very strongly
regularized by sharing it with the corresponding parameter in all the other
models. Refer here
Q13. In training a neural network, you notice that the loss does not decrease
in the few starting epochs.
Q14. Which of the following is true about model capacity (where model
capacity means the ability of neural network to approximate complex
functions) ?
A. As number of hidden layers increase, model capacity increases
B. As dropout ratio increases, model capacity increases
C. As learning rate increases, model capacity increases
D. None of these
Solution: (A)
Only option A is correct.
Q15. If you increase the number of hidden layers in a Multi Layer Perceptron,
the classification error of test data always decreases. True or False?
A. True
B. False
Solution: (B)
This is not always true. Overfitting may cause the error to increase.
Q16. You are building a neural network where it gets input from the previous
layer as well as from itself.
Sequence D is correct.
Q18. Suppose that you have to minimize the cost function by changing the
parameters. Which of the following technique could be used for this?
A. Exhaustive Search
B. Random Search
C. Bayesian Optimization
D. Any of these
Solution: (D)
Any of the above mentioned technique can be used to change parameters.
Q19. First Order Gradient descent would not work correctly (i.e. may get stuck)
in which of the following graphs?
A.
B.
C.
D. None of these
Solution: (B)
This is a classic example of saddle point problem of gradient descent.
Q20. The below graph shows the accuracy of a trained 3-layer convolutional
neural network vs the number of parameters (i.e. number of feature kernels).
The trend suggests that as you increase the width of a neural network, the
accuracy increases till a certain threshold value, and then starts decreasing.
What could be the possible reason for this decrease?
A. Even if number of kernels increase, only few of them are used for prediction
B. As the number of kernels increase, the predictive power of neural network
decrease
C. As the number of kernels increase, they start to correlate with each other
which in turn helps overfitting
D. None of these
Solution: (C)
As mentioned in option C, the possible reason could be kernel correlation.
Q21. Suppose we have one hidden layer neural network as shown above. The
hidden layer in this network works as a dimensionality reductor. Now instead
of using this hidden layer, we replace it with a dimensionality reduction
technique such as PCA.
Solution: (D)
Option D is correct.
Q25. Instead of trying to achieve absolute zero error, we set a metric called
bayes error which is the error we hope to achieve. What could be the reason
for using bayes error?
A. Input variables may not contain complete information about the output
variable
B. System (that creates input-output mapping) may be stochastic
C. Limited training data
D. All the above
Solution: (D)
In reality achieving accurate prediction is a myth. So we should hope to achieve
an “achievable result”.
Q26. The number of neurons in the output layer should match the number of
classes (Where the number of classes is greater than 2) in a supervised learning
task. True or False?
A. True
B. False
Solution: (B)
It depends on the output encoding. If it is one-hot encoding, then it's true. But you
can have two outputs for four classes, and take the binary values as the four
classes (00, 01, 10, 11).
Solution: (A)
Option A is correct.
Q30. Which of the following statement is the best description of early
stopping?
A. Train the network until a local minimum in the error function is reached
B. Simulate the network on a test dataset after every epoch of training. Stop
training when the generalization error starts to increase
C. Add a momentum term to the weight update in the Generalized Delta Rule,
so that training converges more quickly
D. A faster version of backpropagation, such as the `Quickprop’ algorithm
Solution: (B)
Option B is correct.
A.
B.
C.
D. Could be A or B depending on the weights of neural network
Solution: (D)
Without knowing the weights and biases of a neural network, we
cannot comment on what output it would give.
There would be some neurons which do not activate for white pixels as
input. So the classes won't be equal.
Q35. Which gradient technique is more advantageous when the data is too big
to handle in RAM simultaneously?
A. Full Batch Gradient Descent
B. Stochastic Gradient Descent
Solution: (B)
Option B is correct.
Q36. The graph represents gradient flow of a four-hidden layer neural network
which is trained using sigmoid activation function per epoch of training. The
neural network suffers with the vanishing gradient problem.
D. None of these
Solution: (B)
Option B is correct.
Q38. There is a plateau at the start. This is happening because the neural
network gets stuck at local minima before going on to global minima.
Q40. Suppose while training, you encounter this issue. The error suddenly
increases after a couple of iterations.
You determine that there must be a problem with the data. You plot the data and
find that the original data is somewhat skewed, which may be
causing the problem.
A. Normalize
B. Apply PCA and then Normalize
C. Take Log Transform of the data
D. None of these
Solution: (B)
First you would remove the correlations of the data and then zero center it.
A) B
B) A
C) D
D) C
E) All of these
Solution: (E)
A neural network is said to be a universal function approximator, so it can
theoretically represent any decision boundary.
Q42. In the graph below, we observe that the error has many “ups and
downs”
Should we be worried?
A. Yes, because this means there is a problem with the learning rate of neural
network.
B. No, as long as there is a cumulative decrease in both training and validation
error, we don’t need to worry.
Solution: (B)
Option B is correct. In order to decrease these “ups and downs” try to increase
the batch size.
Q43. What are the factors to select the depth of neural network?
Solution: (D)
All of the above factors are important to select the depth of neural network
Q44. Consider the scenario. The problem you are trying to solve has a small
amount of data. Fortunately, you have a pre-trained neural network that was
trained on a similar problem. Which of the following methodologies would you
choose to make use of this pre-trained network?
A. Re-train the model for the new dataset
B. Assess on every layer how the model performs and only select a few of them
C. Fine tune the last couple of layers only
D. Freeze all the layers except the last, re-train the last layer
Solution: (D)
If the dataset is mostly similar, the best method would be to train only the last
layer, as all the previous layers work as feature extractors.
• Numpy
• SciPy
• Deep Learning
• All of the above
View Answer
Correct Answer:
Deep Learning
• 2
• 3
• 4
• 5
View Answer
Correct Answer:
4
• inner layer
• outer layer
• hidden layer
• None of the above
View Answer
Correct Answer:
inner layer
• structured data
• unstructured data
• Both A and B
• None of the above
View Answer
Correct Answer:
unstructured data
6. Which neural network has only one hidden layer between the input and
output?
View Answer
Correct Answer:
Shallow neural network
View Answer
Correct Answer:
Recurrent neural networks
8. Deep learning algorithms are _______ more accurate than machine learning
algorithm in image classification.
• 33%
• 37%
• 40%
• 41%
View Answer
Correct Answer:
41%
View Answer
Correct Answer:
Convolutional neural networks
• Data labeling
• Obtain huge training datasets
• both 1 and 2
• None of the above
View Answer
Correct Answer:
both 1 and 2
11. The input image has been converted into a matrix of size 28 X 28 and a
kernel/filter of size 7 X 7 with a stride of 1. What will be the size of the
convoluted matrix?
• 20x20
• 21x21
• 22x22
• 25x25
View Answer
Correct Answer:
22x22
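The answer follows from the standard output-size formula for a convolution, (n + 2*padding - k) / stride + 1, rounded down. A small sketch, assuming zero padding since the question does not mention any; the function name conv_output_size is chosen for this example.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output width/height of a convolution over an n x n input with a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1

size = conv_output_size(28, 7, stride=1)
print(f"{size}x{size}")
```

For a 28 x 28 input and a 7 x 7 kernel with stride 1 this gives (28 - 7) / 1 + 1 = 22.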
12. Which of the following statements is true when you use 1×1 convolutions
in a CNN?
View Answer
Correct Answer:
All of the above
• Softmax
• ReLu
• Sigmoid
• Tanh
View Answer
Correct Answer:
Softmax
14. The number of nodes in the input layer is 10 and the hidden layer is 5. The
maximum number of connections from the input layer to the hidden layer is
• 50
• less than 50
• more than 50
• It is an arbitrary value
View Answer
Correct Answer:
50
15. In which of the following applications can we use deep learning to solve
the problem?
View Answer
Correct Answer:
All of the above
16. Assume a simple MLP model with 3 neurons and inputs 1, 2 and 3. The
weights to the input neurons are 4, 5 and 6 respectively. Assume the activation
function is linear with a constant of proportionality of 3. What will be the output?
• 32
• 64
• 96
• 128
View Answer
Correct Answer:
96
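One way to read this question is as a single linear unit: the inputs are multiplied by their weights, summed, and the sum is scaled by the constant of the linear activation. A short sketch under that reading; the function name linear_neuron and the parameter name slope are assumptions made for illustration:

```python
def linear_neuron(inputs, weights, slope=1.0):
    """Weighted sum of the inputs passed through a linear activation f(x) = slope * x."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return slope * weighted_sum


# 3 * (1*4 + 2*5 + 3*6) = 3 * 32
print(linear_neuron([1, 2, 3], [4, 5, 6], slope=3))  # 96
```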
17. Consider a simple MLP model with 8 neurons in the input layer, 5 neurons in the
hidden layer and 1 neuron in the output layer. What are the sizes of the weight
matrices between the hidden and output layers and between the input and hidden layers?
• [1 X 5] , [5 X 8]
• [5 x 1] , [8 X 5]
• [8 X 5] , [5 X 1]
• [8 X 5] , [ 1 X 5]
View Answer
Correct Answer:
[5 x 1] , [8 X 5]
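The shapes can be read off the layer sizes. The sketch below assumes the convention that rows index the source layer and columns index the target layer, which matches the answer given; weight_shapes is a hypothetical helper name:

```python
def weight_shapes(layer_sizes):
    """Shape (source units, target units) of the weight matrix between consecutive layers."""
    return [(src, dst) for src, dst in zip(layer_sizes, layer_sizes[1:])]


input_hidden, hidden_output = weight_shapes([8, 5, 1])
print(hidden_output, input_hidden)  # (5, 1) (8, 5)
```

With the opposite convention (rows index the target layer) both shapes would simply be transposed.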
18. Which of the following would have a constant input in each epoch of
training a Deep Learning model?
View Answer
Correct Answer:
Weight between input and hidden layer
• True
• False
• Can be true or false
• Cannot say
View Answer
Correct Answer:
False
20. Sentiment analysis using Deep Learning is a many-to-one prediction task
• True
• False
• Can be true or false
• Cannot say
View Answer
Correct Answer:
True
View Answer
Correct Answer:
A FCNN with only linear activations is a linear network.
View Answer
Correct Answer:
all of the mentioned
View Answer
Correct Answer:
Both 1 and 2
24. Which of the following methods DOES NOT prevent a model from
overfitting to the training set?
• Early stopping
• Dropout
• Data augmentation
• Pooling
View Answer
Correct Answer:
Pooling
25. Assume that your machine has a large enough RAM dedicated to training
neural networks. Compared to using stochastic gradient descent for your
optimization, choosing a batch size that fits your RAM will lead to:
View Answer
Correct Answer:
a more precise but slower update.
Question 1
For what purpose is a Convolutional Neural Network used?
Mainly to process and analyse digital images, with some success cases
It has the highest accuracy among all algorithms that predict images.
With little dependence on pre-processing, this algorithm requires less human
effort. It is effectively a self-learner, which makes the pre-processing phase
easier.
Convolutional Neural Network has 5 basic components: Convolution, ReLU,
Pooling, Flattening and Full Connection. Based on this information, please
answer the questions below.
Question 3
Which answer explains better the Convolution?
variations of attributes.
converting positive pixels to zero. This behavior allows you to detect variations
of attributes.
attributes.
predicting images.
Decrease the feature size, in order to decrease the computational power that
is needed.
As a result of pooling, even if the picture were a little tilted, the largest number
in a certain region of the feature map would have been recorded and hence,
the feature would have been preserved. As another benefit, reducing the
size by a very significant amount uses less computational power.
Question 6
Which answer explains better the Flattening?
Once we have the pooled feature map, this component transforms the
information into a vector. It's the input we need to get on with Artificial Neural
Networks.
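As a rough illustration of flattening, the sketch below turns a stack of pooled feature maps into one vector. The nested-list representation and the name flatten are assumptions chosen to keep the example dependency-free:

```python
def flatten(pooled_maps):
    """Concatenate every value of every pooled feature map into one flat list."""
    return [value for fmap in pooled_maps for row in fmap for value in row]


pooled_maps = [
    [[6, 4], [7, 9]],  # first pooled feature map
    [[1, 0], [2, 3]],  # second pooled feature map
]
print(flatten(pooled_maps))  # [6, 4, 7, 9, 1, 0, 2, 3]
```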
minimize errors. This step can be repeated until an expected result is achieved.
minimize errors. No iteration is needed, since we can get the best results in our
first attempt.
It is the last step of CNN, where we connect the results of the earlier
accuracy.
It works like an ANN: random weights are assigned to each synapse, and the inputs
are weighted and passed through an activation function. The output of this is
then compared to the true values and the error generated is back-propagated,
i.e. the weights are re-adjusted and the whole process is repeated. This is done
until the error or cost function is minimised.
Question 8
What are the Pooling Types? What are their characteristics?
Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel and suppresses noise, while Average
Max Pooling and Average Pooling. Max pooling returns the maximum value of
the portion covered by the kernel, while Average pooling returns the measure
Max Pooling and Minimum Pooling. Max pooling returns the maximum value
of the portion covered by the kernel and suppresses noise, while
Max Pooling and Std Pooling. Max pooling returns the maximum value of the
portion covered by the kernel, while Std Pooling returns the standard deviation
of that portion.
It is recommended to use Max Pooling most of the time.
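The difference between the two common pooling types can be seen on a small feature map. This is a minimal sketch assuming non-overlapping square windows; pool2d, mean and the sample values are illustrative only:

```python
def pool2d(matrix, size, reducer):
    """Apply `reducer` to each non-overlapping size x size window of `matrix`."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]
max_pooled = pool2d(feature_map, 2, max)   # keeps the strongest response per window
avg_pooled = pool2d(feature_map, 2, mean)  # keeps the average response per window
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.5, 5.0]]
```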
Question 9
A CNN is divided into two big steps: Feature Learning and Classification. What
happens in each step?
During Feature Learning, CNN uses algorithms appropriate to it, while during
classification it changes the algorithm in order to achieve the expected result.
During Feature Learning, the algorithm is learning about its dataset.
Components like Convolution, ReLU and Pooling work for that. Once the
features are known, the classification happens using the Flattening and Full
Connection components.
Question 10
CNN has one or more layers of convolution units, which receive their input from
multiple units.
They complete each other, so in order to use ANN, you need to start with CNN.
The only difference is the Convolutional component, which is what makes CNN
good at analysing and predicting data like images. The other steps are the same.
Question 11
What is the benefit of using a CNN instead of an ANN?
Reduce the number of units in the network, which means fewer parameters to
learn and reduced chance of overfitting. Also they consider the context
Increase the number of units in the network, which means more parameters to
learn and increased chance of overfitting. Also they consider the context
CNN has better results since you have more computational power.
Since digital images are a bunch of pixels with high values, it makes sense to use
CNN to analyse them. CNN decreases their values, which is better for the training
phase with less computational power and less information loss.
Question 12
What does 'Shared Weights' mean in CNN?
It is what makes CNN 'convolutional'. By forcing the neurons of one layer to share
weights, the forward pass becomes the equivalent of convolving a filter over
the image to produce a new image. The training phase then becomes a task of
learning filters, deciding what features you should look for in the data.
Sharing weights among the features makes it easier and faster for CNN to predict
It means that CNN uses the weights of each feature in order to find the best
model to make predictions, sharing the results and returning the average.
It calculates the feature's weights and compares them with other algorithms in order
A. Numpy
B. SciPy
C. Deep Learning
D. All of the above
View Answer
Ans : C
A. 2
B. 3
C. 4
D. 5
View Answer
Ans : B
A. inner layer
B. outer layer
C. hidden layer
D. None of the above
View Answer
Ans : A
Explanation: The first layer is called the Input Layer. The last layer is called the
Output Layer. All layers in between are called Hidden Layers.
A. structured data
B. unstructured data
C. Both A and B
D. None of the above
View Answer
Ans : B
Explanation: CNN is mostly used when there is an unstructured data set (e.g.,
images) and the practitioners need to extract information from it.
8. Which neural network has only one hidden layer between the input and
output?
Explanation: Shallow neural network: The Shallow neural network has only one
hidden layer between the input and output.
A. Data labeling
B. Obtain huge training datasets
C. Both A and B
D. None of the above
View Answer
Ans : C
10. Deep learning algorithms are _______ more accurate than machine
learning algorithms in image classification.
A. 33%
B. 37%
C. 40%
D. 41%
View Answer
Ans : D
Answer: c
Explanation: In fuzzy logic, set membership is defined by a certain value. Hence an
element can belong to the set to many different degrees.
Answer: a
Explanation: In traditional set theory, set membership is fixed or exact: either the
member is in the set or not. There are only two crisp values, true or false. In
fuzzy logic there are many values: the member is in the set with some weight, say x.
3. The truth values of traditional set theory is ____________ and that of fuzzy set is
__________
a) Either 0 or 1, between 0 & 1
b) Between 0 & 1, either 0 or 1
c) Between 0 & 1, between 0 & 1
d) Either 0 or 1, either 0 or 1
View Answer
Answer: a
Explanation: Refer the definition of Fuzzy set and Crisp set.
4. Fuzzy logic is an extension of crisp set theory that handles the concept of
partial truth.
a) True
b) False
View Answer
Answer: a
Explanation: None.
5. How many types of random variables are available?
a) 1
b) 2
c) 3
d) 4
View Answer
Answer: c
Explanation: The three types of random variables are Boolean, discrete and
continuous.
6. The room temperature is hot. Here "hot" (a linguistic variable) can be
represented by _______ .
a) Fuzzy Set
b) Crisp Set
View Answer
Answer: a
Explanation: Fuzzy logic deals with linguistic variables.
Answer: b
Explanation: Both probabilities and degrees of truth range between 0 and 1.
Answer: d
Explanation: None.
9. Japanese were the first to utilize fuzzy logic practically on high-speed trains in
Sendai.
a) True
b) False
View Answer
Answer: a
Explanation: None.
Answer: c
Explanation: The version of probability theory we present uses an extension of
propositional logic for its sentences.
1. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the
following.
a) AND
b) OR
c) NOT
d) EX-OR
View Answer
Answer: a, b, c
Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic,
usually defined as the minimum, maximum, and complement.
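The minimum, maximum and complement definitions can be written down directly. A minimal sketch; the membership degrees for "hot" and "humid" are made-up values used only to show the operators:

```python
def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a


hot, humid = 0.75, 0.5  # illustrative membership degrees in [0, 1]
print(fuzzy_and(hot, humid))  # 0.5
print(fuzzy_or(hot, humid))   # 0.75
print(fuzzy_not(hot))         # 0.25
```

With degrees restricted to 0 and 1 these reduce to the ordinary Boolean operators.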
2. There are also other operators, more linguistic in nature, called __________ that
can be applied to fuzzy set theory.
a) Hedges
b) Lingual Variable
c) Fuzz Variable
d) None of the mentioned
View Answer
Answer: a
Explanation: None.
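Hedges such as "very" or "somewhat" act on a membership degree. The sketch below assumes one common textbook convention, which the question itself does not state: "very" squares the degree (concentration) and "somewhat" takes its square root (dilation).

```python
import math


def very(mu: float) -> float:
    """Concentration hedge: makes membership harder to reach."""
    return mu ** 2


def somewhat(mu: float) -> float:
    """Dilation hedge: makes membership easier to reach."""
    return math.sqrt(mu)


tall = 0.25  # illustrative degree to which someone is "tall"
print(very(tall))      # 0.0625
print(somewhat(tall))  # 0.5
```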
Answer: d
Explanation: Bayes rule can be used to answer the probabilistic queries conditioned
on one piece of evidence.
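A query conditioned on one piece of evidence can be worked through with Bayes' rule, P(h | e) = P(e | h) P(h) / P(e). A small sketch for a binary hypothesis; the function name and the numbers are assumptions chosen for illustration:

```python
def bayes_posterior(prior, likelihood, likelihood_if_false):
    """P(h | e) for a binary hypothesis h given a single piece of evidence e."""
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# P(h) = 0.01, P(e | h) = 0.9, P(e | not h) = 0.1
posterior = bayes_posterior(0.01, 0.9, 0.1)
print(round(posterior, 4))  # 0.0833
```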
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
View Answer
Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. Fuzzy logic is usually represented as
a) IF-THEN-ELSE rules
b) IF-THEN rules
c) Both a & b
d) None of the mentioned
View Answer
Answer: b
Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in
applying this is that the appropriate fuzzy operator may not be known. For this reason,
fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as
fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
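A rule of this form fires to the degree that its condition is true. The following is a hedged sketch of the rule "IF temperature IS hot THEN run the fan fast"; the ramp bounds (20 and 35 degrees) and the maximum speed are invented for the example:

```python
def hot(temp_c: float) -> float:
    """Ramp membership for "hot": 0 at or below 20 C, 1 at or above 35 C (assumed bounds)."""
    return min(1.0, max(0.0, (temp_c - 20.0) / 15.0))


def fan_speed(temp_c: float, max_rpm: float = 1200.0) -> float:
    # IF temperature IS hot THEN run the fan fast:
    # the action is scaled by how strongly the condition holds.
    return hot(temp_c) * max_rpm


print(fan_speed(27.5))  # 600.0
```

A full controller would combine several such rules and defuzzify the result; this shows a single rule only.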
Answer: a
Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy
relational databases. The first fuzzy relational database, FRDB, appeared in Maria
Zemankova’s dissertation.
Answer: d
Explanation: Entropy is the amount of uncertainty involved in the data, represented by
H(data).
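As an illustration of H(data), the sketch below computes Shannon entropy in bits from the empirical frequencies of the symbols in a sequence. Reading H as Shannon entropy is an assumption, since the explanation does not name the formula:

```python
import math
from collections import Counter


def entropy(data) -> float:
    """Shannon entropy in bits of the empirical distribution of `data`."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(entropy("aabb"))  # 1.0
```

A sequence with a single repeated symbol has zero entropy, and four equally likely symbols give two bits.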
8. ____________ are algorithms that learn from their more complex environments
(hence eco) to generalize, approximate and simplify solution logic.
a) Fuzzy Relational DB
b) Ecorithms
c) Fuzzy Set
d) None of the mentioned
View Answer
Answer: c
Explanation: Local structure is usually associated with linear rather than exponential
growth in complexity.
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
View Answer
Answer: b
Explanation: None.
10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Conditionally dependent
b) Dependent
c) Conditionally independent
d) Both a & b
View Answer
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks
lead to the consequence that a node can be conditionally independent of its
predecessors.
Artificial Intelligence Questions and
Answers – Neural Networks – 1
This set of Artificial Intelligence MCQs focuses on “Neural Networks – 1”.
1. A 3-input neuron is trained to output a zero when the input is 110 and a one when
the input is 111. After generalization, the output will be zero when and only when the
input is:
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
d) 100 or 111 or 101 or 001
View Answer
Answer: c
Explanation: The truth table before generalization is:
Inputs Output
000 $
001 $
010 $
011 $
100 $
101 $
110 0
111 1
where $ represents don’t know cases and the output is random.
After generalization, the truth table becomes:
Inputs Output
000 0
001 1
010 0
011 1
100 0
101 1
110 0
111 1
2. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
View Answer
Answer: a
Explanation: The perceptron is a single layer feed-forward neural network. It is not an
auto-associative network because it has no feedback and is not a multiple layer neural
network because the pre-processing stage is not made of neurons.
Answer: b
Explanation: An auto-associative network is equivalent to a neural network that
contains feedback. The number of feedback paths(loops) does not have to be one.
4. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
View Answer
Answer: a
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
5. Which of the following is true?
(i) On average, neural networks have higher computational rates than conventional
computers.
(ii) Neural networks learn by example.
(iii) Neural networks mimic the way the human brain works.
a) All of the mentioned are true
b) (ii) and (iii) are true
c) (i), (ii) and (iii) are true
d) None of the mentioned
View Answer
Answer: a
Explanation: Neural networks have higher computational rates than conventional
computers because a lot of the operation is done in parallel. That is not the case when
the neural network is simulated on a computer. The idea behind neural nets is based
on the way the human brain works. Neural nets cannot be programmed; they can only
learn by example.
Answer: c
Explanation: The training time depends on the size of the network; when the number of
neurons is greater, the number of possible 'states' is increased. Neural
networks can be simulated on a conventional computer but the main advantage of
neural networks – parallel execution – is lost. Artificial neurons are not identical in
operation to the biological ones.
Answer: d
Explanation: Neural networks learn by example. They are more fault tolerant because
they are always able to respond and small changes in input do not normally cause a
change in output. Because of their parallel architecture, high computational rates are
achieved.
Answer: a
Explanation: Pattern recognition is what single layer neural networks are best at but
they don’t have the ability to find the parity of a picture or to determine whether two
shapes are connected or not.
9. Which is true for neural networks?
a) It has set of nodes and connections
b) Each node computes its weighted input
c) Node could be in excited state or non-excited state
d) All of the mentioned
View Answer
Answer: d
Explanation: All mentioned are the characteristics of neural network.
Answer: b
Explanation: None.
Answer: d
Explanation: None.
Answer: c
Explanation: Back propagation is the transmission of error back through the network
to allow weights to be adjusted so that the network can learn.
Answer: b
Explanation: Linearly separable problems are of interest to neural network researchers
because they are the only class of problem that a perceptron can solve successfully.
Answer: a
Explanation: An artificial neural network (ANN) cannot explain its results.
5. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
View Answer
Answer: b
Explanation: Neural networks are complex nonlinear functions with many parameters.
6. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
View Answer
Answer: b
Explanation: Also known as the step function – so answer 1 is also right. It is a hard
thresholding function, either on or off with no in-between.
8. Having multiple perceptrons can actually solve the XOR problem satisfactorily:
this is because each perceptron can partition off a linear part of the space itself, and
they can then combine their results.
a) True – this works always, and these multiple perceptrons learn to classify even
complex problems.
b) False – perceptrons are mathematically incapable of solving linearly inseparable
functions, no matter what you do
c) True – perceptrons can do this but are unable to learn to do it – they have to be
explicitly hand-coded
d) False – just having a single perceptron is enough
View Answer
Answer: c
Explanation: None.
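The idea behind the answer can be sketched with three threshold units whose weights are set by hand rather than learned. The particular weights and thresholds below are one possible choice, not taken from the question:

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def xor(a, b):
    # Each first-layer unit cuts the input space with one straight line.
    or_out = perceptron([a, b], [1, 1], 0.5)       # fires unless both inputs are 0
    nand_out = perceptron([a, b], [-1, -1], -1.5)  # fires unless both inputs are 1
    # The second-layer unit combines the two half-planes with an AND.
    return perceptron([or_out, nand_out], [1, 1], 1.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

No single call to perceptron with fixed weights can reproduce this table, because XOR is not linearly separable.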
9. The network that involves backward links from output to the input and hidden
layers is called as ____.
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
View Answer
Answer: c
Explanation: RNN (Recurrent neural network) topology involves backward links from
output to the input and hidden layers.
Answer: d
Explanation: All mentioned options are applications of Neural Network
Answer: b
Explanation: Locality: In logical systems, whenever we have a rule of the form A =>
B, we can conclude B, given evidence A, without worrying about any other rules.
Detachment: Once a logical proof is found for a proposition B, the proposition can be
used regardless of how it was derived. That is, it can be detached from its
justification. Truth-functionality: In logic, the truth of complex sentences can be
computed from the truth of the components. However, a rule-based system has no
Attachment property. A global attribute defines a particular problem
space as user specific and changes according to the user's plan for the problem.
Answer: a
Explanation: FL incorporates a simple, rule-based IF X AND Y THEN Z approach to
solving a control problem rather than attempting to model a system mathematically.
3. In unsupervised learning
a) Specific output values are given
b) Specific output values are not given
c) No specific Inputs are given
d) Both inputs and outputs are given
e) Neither inputs nor outputs are given
View Answer
Answer: b
Explanation: The problem of unsupervised learning involves learning patterns in the
input when no specific output values are supplied. There is no specific
output against which to test the result, so the agent is not told what the correct
outcome should be.
Answer: c
Explanation: A consistent hypothesis agrees with the examples. If the hypothesis says it should
be negative but in fact it is positive, it is a false negative. If a hypothesis says it should
be positive, but in fact it is negative, it is a false positive. A specialized hypothesis
needs certain restrictions or special conditions.
Answer: b
Explanation: Neural network parameters can be learned from noisy data, and neural
networks have been used for thousands of applications; they vary from problem to
problem and thus use nonlinear functions.
8. A perceptron is a ________.
a) Feed-forward neural network
b) Back-propagation algorithm
c) Back-tracking algorithm
d) Feed Forward-backward algorithm
e) Optimal algorithm with Dynamic programming
View Answer
Answer: a
Explanation: A perceptron is a feed-forward neural network with no hidden units that
can represent only linearly separable functions. If the data are linearly separable,
a simple weight update rule can be used to fit the data exactly.
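The weight update rule mentioned in the explanation can be sketched as the classic perceptron learning rule: after each misclassified example, move the weights by the error times the input. The function name, learning rate and epoch limit below are assumptions; the training data is the linearly separable AND function:

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Fit weights and a bias with the perceptron rule w <- w + lr * (target - output) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            delta = target - output
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
                bias += lr * delta
        if errors == 0:  # every example classified correctly: stop early
            break
    return weights, bias


def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

On data that is not linearly separable, such as XOR, the same loop would run out of epochs without reaching zero errors.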
9. Which of the following statements is true?
a) Not all formal languages are context-free
b) All formal languages are Context free
c) All formal languages are like natural language
d) Natural languages are context-oriented free
e) Natural language is formal
View Answer
Answer: a
Explanation: Not all formal languages are context-free.
Answer: e
Explanation: The union and concatenation of two context-free languages are
context-free, but their intersection need not be.
1. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
View Answer
Answer: d
Explanation: Factors which affect the performance of a learner system do not include
good data structures.
Answer: d
Explanation: Different learning methods include memorization, analogy and
deduction.
Answer: d
Explanation: Decision trees, Neural networks, Propositional rules and FOL rules all
are the models of learning.
Answer: a
Explanation: In an automated vehicle, a set of vision inputs and corresponding actions is
available to the learner, hence it is an example of supervised learning.
5. Following is an example of active learning:
a) News Recommender system
b) Dust cleaning machine
c) Automated vehicle
d) None of the mentioned
View Answer
Answer: a
Explanation: In active learning, not only is the teacher available, but the learner can
also ask for suitable perception-action pair examples to improve performance.
6. In which of the following types of learning does the teacher return reward and
punishment to the learner?
a) Active learning
b) Reinforcement learning
c) Supervised learning
d) Unsupervised learning
View Answer
Answer: b
Explanation: Reinforcement learning is the type of learning in which the teacher returns
reward or punishment to the learner.
Answer: d
Explanation: Decision trees can be used in all the conditions stated.
Answer: d
Explanation: All mentioned options are applications of learning.
9. Which of the following is a component of a learning system?
a) Goal
b) Model
c) Learning rules
d) All of the mentioned
View Answer
Answer: d
Explanation: Goal, model, learning rules and experience are the components of
learning system.
1. What will take place as the agent observes its interactions with the world?
a) Learning
b) Hearing
c) Perceiving
d) Speech
View Answer
Answer: a
Explanation: Learning will take place as the agent observes its interactions with the
world and its own decision making process.
Answer: c
Explanation: A learning element modifies the performance element so that it can make
better decision.
Answer: c
Explanation: The three main issues that affect the design of a learning element are
components, feedback and representation.
Answer: d
Explanation: A linear weighted polynomial is used for the learning element in
game-playing programs.
Answer: b
Explanation: Intuitively, Ockham's razor prefers the simplest hypothesis consistent
with the data.
8. What will happen if the hypothesis space contains the true function?
a) Realizable
b) Unrealizable
c) Both a & b
d) None of the mentioned
View Answer
Answer: a
Explanation: A learning problem is realizable if the hypothesis space contains the true
function.
9. What takes input as an object described by a set of attributes?
a) Tree
b) Graph
c) Decision graph
d) Decision tree
View Answer
Answer: d
Explanation: Decision tree takes input as an object described by a set of attributes and
returns a decision.
Answer: c
Explanation: A decision tree reaches its decision by performing a sequence of tests.
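A decision tree's sequence of tests can be written as nested conditions on the attributes of the input object. The attributes (outlook, humidity, windy) and the decisions in this sketch are a made-up example, not taken from the questions:

```python
def decide_play(day: dict) -> str:
    """Return a decision for an object described by a set of attributes."""
    # First test: the outlook attribute selects a branch.
    if day["outlook"] == "sunny":
        # Second test on the sunny branch: humidity.
        return "no" if day["humidity"] == "high" else "yes"
    if day["outlook"] == "rainy":
        # Second test on the rainy branch: wind.
        return "no" if day["windy"] else "yes"
    return "yes"  # any other outlook needs no further test


print(decide_play({"outlook": "sunny", "humidity": "high", "windy": False}))  # no
```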
1: An ANN is composed of a large number of highly interconnected processing
elements (neurons) working in unison to solve problems.
A.
True
B.
False
C.
D.
Option: A
Explanation :
2:
Artificial neural network used for
A.
Pattern Recognition
B.
Classification
C.
Clustering
D.
All of these
Explanation :
3:
A Neural Network can answer
A.
For Loop questions
B.
what-if questions
C.
IF-Then-Else Analysis Questions
D.
None of these
Option: B
Explanation :
4:
Ability to learn how to do tasks based on the data given for training or initial
experience
A.
Self Organization
B.
Adaptive Learning
C.
Fault tolerance
D.
Robustness
Option: B
Explanation :
5:
Feature of ANN in which ANN creates its own organization or representation of
information it receives during learning time is
A.
Adaptive Learning
B.
Self Organization
C.
What-If Analysis
D.
Supervised Learning
Option: B
Explanation :
6:
In artificial Neural Network interconnected processing elements are called
A.
nodes or neurons
B.
weights
C.
axons
D.
Soma
Option: A
Explanation :
7:
Each connection link in ANN is associated with ________ which has information
about the input signal.
A.
neurons
B.
weights
C.
bias
D.
activation function
Option: B
Explanation :
8:
Neurons or artificial neurons have the capability to model networks of original
neurons as found in brain
A.
True
B.
False
C.
D.
Option: A
Explanation :
9:
The internal state of a neuron, called its __________, is a function of the inputs the
neuron receives
A.
Weight
B.
activation or activity level of neuron
C.
Bias
D.
None of these
Option: B
Explanation :
10:
Neuron can send ________ signal at a time.
A.
multiple
B.
one
C.
none
D.
any number of
Option: B
Explanation :
A.
It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations
B.
Computational procedure that takes some value as input and produces some
value as output.
C.
Science of making machines perform tasks that would require intelligence
when performed by humans
D.
None of these
Option: C
Explanation :
2:
Expert systems
A.
Combining different types of method or information
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution
C.
an information base filled with the knowledge of an expert formulated in terms
of if-then rules
D.
None of these
Option: C
Explanation :
3:
Falsification is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
Showing a universal law or rule to be invalid by providing a counter example
C.
A set of attributes in a database table that refers to data in another table
D.
None of these
Option: B
Explanation :
4:
Evolutionary computation is
A.
Combining different types of method or information
B.
Approach to the design of learning algorithms that is structured along the lines
of the theory of evolution.
C.
Decision support systems that contain an information base filled with the
knowledge of an expert formulated in terms of if-then rules.
D.
None of these
Option: B
Explanation :
5:
Extendible architecture is
A.
Modular design of a software application that facilitates the integration of new
modules
B.
Showing a universal law or rule to be invalid by providing a counter example
C.
A set of attributes in a database table that refers to data in another table
D.
None of these
Option: A
Explanation :
A.
A programming language based on logic
B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk
C.
Describes the structure of the contents of a database.
D.
None of these
Option: B
Explanation :
7:
Search space
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query.
C.
Worth of the output of a machine learning program that makes it understandable
for humans
D.
None of these
Option: A
Explanation :
8:
n(log n) is referred to
A.
A measure of the desired maximal complexity of data mining algorithms
B.
A database containing volatile data used for the daily operation of an
organization
C.
Relational database management system
D.
None of these
Option: A
Explanation :
9:
Perceptron is
A.
General class of approaches to a problem.
B.
Performing several computations simultaneously
C.
Structures in a database that are statistically relevant
D.
Simple forerunner of modern neural networks, without hidden layers
Option: D
Explanation :
10:
Prolog is
A.
A programming language based on logic
B.
A computer where each processor has its own operating system, its own
memory, and its own hard disk
C.
Describes the structure of the contents of a database
D.
None of these
Option: A
Explanation :
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it
understandable for humans
D.
None of these
Option: B
Explanation :
12:
Quantitative attributes are
A.
A reference to the speed of an algorithm, which is quadratically dependent
on the size of the data
B.
Attributes of a database table that can take only numerical values
C.
Tools designed to query a database
D.
None of these
Option: B
Explanation :
13:
Subject orientation
A.
The science of collecting, organizing, and applying numerical facts
B.
Measure of the probability that a certain hypothesis is incorrect given certain
observations.
C.
One of the defining aspects of a data warehouse, which is specially built
around all the existing applications of the operational data
D.
None of these
Option: C
Explanation :
14:
Vector
A.
They do not need the control of a human operator during their execution
B.
An arrow in a multi-dimensional space. It is a quantity usually characterized
by an ordered set of scalars
C.
The validation of a theory on the basis of a finite number of examples
D.
None of these
Option: B
Explanation :
15:
Transparency
A.
The large set of candidate solutions possible for a problem
B.
The information stored in a database that can be retrieved with a single query
C.
Worth of the output of a machine learning program that makes it
understandable for humans
D.
None of these
Explanation :
A.
Fuzzy Computing, Neural Computing, Genetic Algorithms
B.
Fuzzy Networks and Artificial Intelligence
C.
Artificial Intelligence and Neural Science
D.
Neural Science and Genetic Science
Option: A
Explanation :
2:
Who initiated the idea of Soft Computing
A.
Charles Darwin
B.
Lotfi A. Zadeh
C.
Rechenberg
D.
McCulloch
Option: B
Explanation :
3:
Fuzzy Computing
A.
mimics human behaviour
B.
doesn't deal with two-valued logic
C.
deals with information which is vague, imprecise, uncertain, ambiguous,
inexact, or probabilistic
D.
All of the above
Option: D
Explanation :
4:
Neural Computing
A.
mimics human brain
B.
information processing paradigm
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
5:
Genetic Algorithms are a part of
A.
Evolutionary Computing
B.
inspired by Darwin's theory about evolution - "survival of the fittest"
C.
are adaptive heuristic search algorithm based on the evolutionary ideas of
natural selection and genetics
D.
All of the above
Option: D
Explanation :
A.
Improvised and unimprovised
B.
supervised and unsupervised
C.
Layered and unlayered
D.
None of the above
Option: B
Explanation :
7:
Supervised Learning is
A.
learning with the help of examples
B.
learning without teacher
C.
learning with the help of teacher
D.
learning with computers as supervisor
Option: C
Explanation :
8:
Unsupervised learning is
A.
learning without computers
B.
problem based learning
C.
learning from environment
D.
learning from teachers
Option: C
Explanation :
9:
Conventional Artificial Intelligence is different from soft computing in the sense
A.
Conventional Artificial Intelligence deals with predicate logic, whereas soft
computing deals with fuzzy logic
B.
Conventional Artificial Intelligence methods are limited by symbols, whereas
soft computing is based on empirical data
C.
Both (a) and (b)
D.
None of the above
Option: C
Explanation :
10:
In supervised learning
A.
classes are not predefined
B.
classes are predefined
C.
classes are not required
D.
classification is not done
Option: B
Explanation :
A.
True
B.
False
C.
D.
Option: A
Explanation :
2:
The membership functions are generally represented in
A.
Tabular Form
B.
Graphical Form
C.
Mathematical Form
D.
Logical Form
Option: B
Explanation :
3:
Membership function can be thought of as a technique to solve empirical problems
on the basis of
A.
knowledge
B.
examples
C.
learning
D.
experience
Option: D
Explanation :
A.
Intuition, Inference, Rank Ordering
B.
Fuzzy Algorithm, Neural Network, Genetic Algorithm
C.
Core, Support, Boundary
D.
Weighted Average, Center of Sums, Median
Option: C
Explanation :
5:
The region of universe that is characterized by complete membership in the set is
called
A.
Core
B.
Support
C.
Boundary
D.
Fuzzy
Option: A
Explanation :
A.
sub normal fuzzy sets
B.
normal fuzzy set
C.
convex fuzzy set
D.
concave fuzzy set
7:
In a Fuzzy set a prototypical element has a value
A.
1
B.
0
C.
infinite
D.
Not defined
Option: A
Explanation :
8:
A fuzzy set in which no element has a membership value equal to 1 is called
A.
normal fuzzy set
B.
subnormal fuzzy set.
C.
convex fuzzy set
D.
concave fuzzy set
Option: B
Explanation :
9: A fuzzy set has a membership function whose membership values are strictly
monotonically increasing, or strictly monotonically decreasing, or strictly
monotonically increasing then strictly monotonically decreasing with increasing
values for elements in the universe
A.
convex fuzzy set
B.
concave fuzzy set
C.
Non concave Fuzzy set
D.
Non Convex Fuzzy set
Option: A
Explanation :
10:
The membership values of the membership function are not strictly
monotonically increasing or decreasing, or strictly monotonically increasing then
decreasing.
A.
Convex Fuzzy Set
B.
Non convex fuzzy set
C.
Normal Fuzzy set
D.
Sub normal fuzzy set
Option: B
Explanation :
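The convexity idea in questions 9 and 10 can be checked mechanically for a discrete fuzzy set. The sketch below is illustrative only: the function name `is_convex` and the sample grade lists are assumptions, and it uses the common non-strict test (every interior grade is at least the smaller of any two grades on either side of it), which also accepts flat stretches that the "strictly monotonic" wording above would exclude.

```python
from itertools import combinations


def is_convex(grades):
    """Return True if the membership grades describe a convex fuzzy set.

    `grades` lists the membership values in increasing order of the
    elements of the universe (hypothetical input format).
    """
    # For every ordered triple i < j < k, the middle grade must not dip
    # below the smaller of the two outer grades.
    return all(
        grades[j] >= min(grades[i], grades[k])
        for i, j, k in combinations(range(len(grades)), 3)
    )


# Rises then falls: convex.
print(is_convex([0.0, 0.3, 1.0, 0.6, 0.1]))
# Rises, dips, rises again: non-convex.
print(is_convex([0.2, 0.9, 0.4, 0.8, 0.1]))
```

The triple-wise check is cubic in the number of elements, which is fine for small textbook examples; a single left-to-right pass would be the natural choice for larger sets.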
List I
List II
A.
a b c d
2 1 4 3
B.
a b c d
1 2 3 4
C.
a b c d
4 3 2 1
D.
a b c d
3 2 1 4
Option: A
Explanation :
12: The crossover points of a membership function are defined as the elements in the
universe for which a particular fuzzy set has values equal to
A.
infinite
B.
1
C.
0
D.
0.5
Option: D
Explanation :
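The features named in this set of questions (core, support, boundary, crossover points, normal versus subnormal sets) can be made concrete on a small discrete fuzzy set. This is a minimal sketch under stated assumptions: the set `warm`, its grades, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical discrete fuzzy set: element of the universe -> membership grade.
warm = {10: 0.0, 15: 0.25, 20: 0.5, 25: 1.0, 30: 0.5, 35: 0.0}


def core(fs):
    # Elements with complete membership (grade equal to 1).
    return {x for x, mu in fs.items() if mu == 1.0}


def support(fs):
    # Elements with non-zero membership.
    return {x for x, mu in fs.items() if mu > 0.0}


def boundary(fs):
    # Elements with partial membership (strictly between 0 and 1).
    return {x for x, mu in fs.items() if 0.0 < mu < 1.0}


def crossover_points(fs):
    # Elements whose membership grade is exactly 0.5.
    return {x for x, mu in fs.items() if mu == 0.5}


def is_normal(fs):
    # A normal fuzzy set has at least one element with grade 1;
    # otherwise the set is subnormal.
    return any(mu == 1.0 for mu in fs.values())


print(core(warm), support(warm), boundary(warm))
print(crossover_points(warm), is_normal(warm))
```

The exact comparisons with 1.0 and 0.5 work here because the grades are written as literals; grades produced by arithmetic would normally be compared with a small tolerance.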
Questions
(i)
evolution
(ii)
selection
(iii)
reproduction
(iv)
mutation
: Your answer is
(a)
i & ii only
(b)
(c)
(a) (i)
(b) (ii)
crossover chromosomes
(c) (iii)
mutation survivability
(d) (iv)
: Your answer is
(a) _____
(b) _____
(c) _____
(d) _____
(i)
(ii)
biology
(iii)
Artificial Life
(iv)
economics
: Your answer is
(a)
(b)
(c)
(d)
(i)
encoding of solutions
(ii)
(iii)
(iv)
: Your answer is
(a)
i & ii only
(b)
(c)
(i)
(ii)
GAs are exhaustive, giving out all the optimal solutions to a given
problem.
(iii)
(iv)
: Your answer is
(a)
(b)
(c)
(d)
(i)
(ii)
(iv)
The search space of the problem is not ideal for GAs to operate.
: Your answer is
(a)
(b)
(c)
(d)
: Your answer is
(a)
(b)
(c)
(d)
(i)
Artificial Life is analytic, trying to break down complex phenomena
into their basic components.
(ii)
(iii)
(iv)
: Your answer is
(a)
i & ii only
(b)
(c)
(d)
(i)
(ii)
biology
(iii)
robotics
(iv)
(a)
(b)
(c)
(d)
(i)
children
(ii)
designers
(iii)
artists
(iv)
patients
: Your answer is
(a)
(b)
(c)
(d)
Q1.
Q2.
(a)
(ii)
(b)
(iv)
(c)
(i)
(d)
(iii)
Q3.
Q4.
The problem is mapped into a set of strings, with each string representing a
potential solution (i.e. a chromosome). A fitness function is required to
compare solutions and tell which one is better. GA performance is heavily
dependent on the representation chosen.
Q5.
The search space is too complex for exhaustive search, yet GAs can
find robust solutions after evaluating only a few percent of the
full parameter space.
It can never be guaranteed that GAs will find an optimal solution or even any
solution at all.
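The points made in the answers to Q4 and Q5 (solutions encoded as strings, a fitness function to compare them, and no guarantee of an optimum) can be sketched as a toy genetic algorithm. Everything in this sketch is an assumption made for illustration: the OneMax fitness (count of 1 bits), tournament selection, one-point crossover, single-individual elitism, and all parameter values. It is not a reference implementation.

```python
import random


def fitness(chromosome):
    # Toy fitness (OneMax): the number of 1 bits in the string.
    return sum(chromosome)


def tournament(pop, rng, k=3):
    # Pick k individuals at random and return the fittest of them.
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    # One-point crossover: head of parent a joined to tail of parent b.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


def run_ga(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Encoding: each candidate solution is a fixed-length bit string.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # keep the best individual unchanged
        children = [
            mutate(
                crossover(tournament(pop, rng), tournament(pop, rng), rng),
                rng,
                mutation_rate,
            )
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=fitness)


best = run_ga()
print(best, fitness(best))
```

Because the elite individual is copied forward, the best fitness never decreases from one generation to the next, but nothing in the procedure guarantees that the all-ones string is reached.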
Q6.
Q7.
Q8.
Q9.
Q10.
SOFT COMPUTING
UNIT – I
4. In the neuron, attached to the soma are long irregularly shaped filaments called--------------
b) φ(I)=0
c) φ(I)=+1,I>0
d) φ(I)=-1,I<=0
6. To generate the final output, the sum is passed on to a non-linear filter φ called
7. ---------------function is a continuous function that varies gradually between the asymptotic values 0
and 1 or -1 and +1
9.-------------------- carrying the weights connect every input neuron to the output neuron but not
vice-versa.
11. In the learning method, the target output is not presented to the network ----------------
a) O=gI,g=tanφ
b) O=gI,g=sinφ
c) O=gI,g=cosφ
d) O=gI,g=-tanφ
18.--------------- is never assured of finding the global minimum as in the simple layer delta rule case.
24. Rosenblatt’s Perceptron network has three units: sensory unit, association unit and
--------------
a) Output unit b) Response unit c) Feedback unit d) Result unit
PART-B
PART-C
1.----------------is a store house of associated patterns which are encoded in some form
2. If the associated pattern pairs (x,y) are different and if the model recalls a y given an x or vice
versa, then it is termed as -------------
a)E(A,B)=AMB^T
b)E(A,B)=-AMB^T
c)E(A,B)=-AB^T
d)E(A,B)=AB^T
16)------------------ of the network means that a pattern should not oscillate among different cluster
units at different stages of training
19)In ---------------- learning the weights are adjusted only when the external input matches one of
the stored prototypes
b)Stability dilemma
c)Plasticity dilemma
d)None
a)ARTMAP
b)Fuzzy ART
c)Fuzzy ARTMAP
d)ART1
a)ART1
b)ART2
c)ARTMAP
d)Fuzzy ART
PART-B
2.Explain Heterocorrelators
UNIT-3
5.A -------------- of a set A is the set of all possible subsets that are derivable from A including null set
6.The membership function of a fuzzy set need not always be described by ----------------
11.In case of => operator, the proposition occurring before the “=>” symbol is called---------
a. antecedent b.consequent c.conjunction d.disjunction
13.A formula which has all its interpretations recording true is known as a ----------------
23.The ------------------ are obtained by computing the minimum of the membership functions of the
antecedents.
PART-B
PART-C
PART-A
10.------------------ means that the genes from the already discovered good individuals are exploited
13.The ----------------- refers to the proportion of individuals in the population which are
replaced in each generation.
a.gap b.generation gap c.generation interval d.interval
18.-------------------- is a process in which a given bit pattern is transformed into another bit pattern by
means of logical bit-wise operation.
19.In ------------------, inversion was applied with specified inversion probability p to each new
individual when it is created.
20.The -------------causes all the bits in the first operand to be shifted to the left by the number of
positions indicated by the second operand.
21.A --------------- returns 1 if one of the bits has a value of 1 and the other has a value of 0;
otherwise it returns a value of 0.
22.Population size, mutation rate and crossover rate are together referred to as ---------------
23.-------------selection is slow cooling of molten metal to achieve the minimum function value in a
minimization problem.
PART-B
PART-C
PART-A
2.In -------------, one technology calls the other as a subroutine to process or manipulate
information needed by it.
4.In --------------hybrid systems the participating technologies are integrated in such a manner that
they appear intertwined.
5.------------- deals with uncertainty problems with its own merits and demerits
10.----------------is a neuro-fuzzy hybrid in which the host is a recurrent network with a kind of
competitive learning.
15.--------------learning have reported difficulties in learning the topology of the networks whose
weights they optimize
PART-B
5.Explain FAM
PART-C