BATC-601

1. The choice of the model assessment metric should be tied to ------------- rather than ---------
 algorithmic considerations, operational expedience
 operational considerations, algorithmic expedience
 None of these
 algorithmic expedience, operational considerations
2. Which of the following confusion matrix measures uses all quadrants of the confusion
matrix?
 PCC
 Precision
 Recall
 None of these
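
For reference, the measures named in this question can be computed from the four cells of a binary confusion matrix. The sketch below is illustrative only: the counts are made-up values, the function names are hypothetical, and PCC is assumed to mean percent correct classification.

```python
# Hypothetical cell counts for a binary confusion matrix (illustrative values only).
tp, fp, fn, tn = 40, 10, 5, 45


def pcc(tp, fp, fn, tn):
    """Percent correct classification: correct decisions over all records."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


print(pcc(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```

Comparing the arguments each function takes shows which cells of the matrix each measure depends on.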
3. Which of the following algorithms is best suited for reducing the number of inputs for
predictive models?
 All of these
 K-Means clustering
 Principal Component Analysis (PCA)
 Kohonen Self-Organizing Maps (SOM)
4. Software provides useful information about clusters but fails to explain the -----------
 meaning of cluster
 None of these
 Both (a) and (b)
 how clusters are formed by algorithm
5. Predictive analytics is the process of
 just cleaning data
 information retrieval to make useful predictions about future outcomes
 just compressing data
 guessing about present output without any data
6. In a boosting algorithm, final predictions are made based on the ----------- of predictions from
all models
 median
 None of these
 weighted average
 average
7. In K-Means, what is the number of clusters in the data?
 It is always 2
 It must be pre-specified
 It is always 3
 The algorithm determines it dynamically
8. Which of the following is NOT a step in CRISP-DM?
 Business understanding
 Data understanding
 Customer understanding
 Modelling
9. What sampling technique do statisticians typically use to assess model stability?
 Cross Validation
 Curse of dimensionality
 Temporal Sequencing
 Rule of Thumb
10. Which of the following is a standard data mining methodology?

 SPSS
 Mineset
 CRISP-DM
 Clementine

11. Which of the following statements is correct?


 Missing Not at Random (MNAR) means the missing value can be inferred in
general by the mere fact that the value is missing
 Missing Completely at Random (MCAR) implies a conditional relationship between
the missing value and other variables
 Missing at Random (MAR) means that there is no way to determine what the value
should have been
 All of these
12. Which of the following statements is incorrect?
 Inputs need not be numeric for Kohonen Self-Organizing Maps (SOM) algorithm.
 Inputs must be numeric for K-Means clustering algorithm.
 Kohonen Self-Organizing Maps (SOM) needs all data to be populated, there can be
no missing values.
 When using Principal Component Analysis (PCA), any categorical variable to be
included in the model, must be converted to a number.
13. As a rule of thumb or guiding principle, the ANOVA method works --------------- when there
are -------------- clusters.
 worst, large no. of
 best, small no. of
 best, large no. of
 worst, small no. of
14. Which of the following statements is true?
 Statistics is often based on non-parametric algorithms; no guaranteed optimum.
 Statistics algorithms are not as efficient or stable for small data.
 In statistics, models are typically nonlinear.
 In statistics, data is typically smaller, the model is important.
15. Dummy variable --------- is helpful to reduce bias with dummy variables.
 removal
 inclusion
 None of these
 Scaling

16. What is the correct number of two-way combinations/interactions possible if the number of
variables is 5?
 The number of possible two-way interactions is 20
 The number of possible two-way interactions is 5.
 The number of possible two-way interactions is 2.
 The number of possible two-way interactions is 10
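
As an aside, two-way interactions are unordered pairs of distinct variables, so they can be counted by enumeration or with the binomial coefficient n(n-1)/2. The sketch below uses a smaller, hypothetical set of four variables (the names are invented for illustration); the same calls apply to any number of variables.

```python
from itertools import combinations
from math import comb

# Hypothetical variable names, chosen only for illustration.
variables = ["age", "income", "tenure", "visits"]

# Enumerate every unordered pair of distinct variables.
pairs = list(combinations(variables, 2))

# The enumeration agrees with the closed form C(n, 2) = n * (n - 1) / 2.
n = len(variables)
assert len(pairs) == comb(n, 2) == n * (n - 1) // 2

for a, b in pairs:
    print(f"{a} x {b}")
```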
17. What are the challenges in using Predictive Analytics?
 Predictive models require data in the form of two-dimensional data (rows and
columns).
 All of these
 Often, deployment of predictive models requires a shift in resources for an
organization.
 The models become too complex because of overfitting
18. What is the value of skew in a normal distribution?
 0
 1
 Less than 1
 Greater than 1
19. Assume that we have records of each visit by a customer to a medical shop. Which of
the following will be a derived variable?
 Customer's name
 Doctor who prescribed the medicine
 Date of birth
 Average spend in the last month
20. Which of the following is a property of normal distribution?
 The distribution is asymmetric.
 The mean, median and mode are all the same value
 The median and the mode are not the same value
 The mean and the median are not the same value
21. Models that are accurate, on the data used to train the models, generally show
 overfitting
 All of the above
 randomness
 underfitting

22. Which of these is false about correlations between two variables?


 one variable's meaning is related to another's
 none of the above
 measures the numerical relationship of one variable to another
 Both of these
23. What is true about a distribution measured by kurtosis?
 Kurtosis is always negative
 A leptokurtic distribution is one in which the kurtosis value is more than 4
 A platykurtic distribution is one in which the kurtosis value is greater than 3
 Normal distribution will have a Kurtosis value of 2
24. Which of the following statements is incorrect?
 Descriptive modeling algorithms try to find relationships between inputs
 Descriptive modeling algorithms are also called unsupervised learning methods.
 Descriptive modeling algorithms discover the best way to segment the data
 Descriptive modeling algorithms try to find relationships that associate inputs to one
or more target variables
25. Which of the following is a property of the uniform distribution?
 The mean and the midpoint of the distribution are not the same
 The distribution is infinite
 The distribution is finite, with a minimum and maximum value
 The distribution is asymmetric about the mean.
26. Which of the following is a property of normal distribution?
 Approximately 95% of the data will fall between the mean and +/-3 standard
deviations from the mean
 Approximately 95% of the data will fall between the mean and +/-4 standard
deviations from the mean
 Approximately 95% of the data will fall between the mean and +/-2 standard
deviations from the mean
 Approximately 95% of the data will fall between the mean and +/-1 standard
deviation from the mean.

27. -----------------is an important requirement for building good bagged ensembles.


 Exact fitting the model
 Underfitting the model
 Overfitting the model
 None of these
28. If the distribution has spikes, what is a good corrective action?
 Log10 transform
 Binning into regions centered on spikes
 Flip transform
 Power transform
29. Which of the following is NOT a Single-Variable Selection Technique?
 Linear regression forward selection (1 step)
 Chi-square Test
 Simpson's Paradox
 ANOVA
30. The most frequently used metric to assess model accuracy in classification problems is
 ROC
 None of the above
 PCC
 AUC
