Batc 601

BATC-601
1. The choice of the model assessment metric should be tied to-------------rather than---------
 algorithmic considerations, operational expedience
 operational considerations, algorithmic expedience
 None of these
 algorithmic expedience, operational considerations
2. Which of the following confusion matrix measures uses all quadrants of the confusion
matrix?
 PCC
 Precision
 Recall
 None of these
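As a small illustration for question 2, the sketch below computes three measures from the four cells of a binary confusion matrix. The function names and the sample counts are assumptions made up for this example, and PCC is taken here to mean percent correct classification (overall accuracy).

```python
def pcc(tp, fp, fn, tn):
    """Percent correct classification: correct predictions over all predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)


# Hypothetical counts: true positives, false positives,
# false negatives, true negatives.
tp, fp, fn, tn = 40, 10, 5, 45

print("PCC:", pcc(tp, fp, fn, tn))
print("Precision:", precision(tp, fp))
print("Recall:", recall(tp, fn))
```

Note which arguments each function takes: only one of the three needs every cell of the matrix.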
3. Which of the following algorithms is best suited for reducing the number of inputs for
predictive models?
 All of these
 K-Means clustering
 Principal Component Analysis (PCA)
 Kohonen Self-Organizing Maps (SOM)
4. Software provides useful information about clusters but fails to explain-----------
 meaning of cluster
 None of these
 Both (a) and (b)
 how clusters are formed by algorithm
5. Predictive analytics is the process of
 just cleaning data
 information retrieval to make useful predictions about future outcomes
 just compressing data
 guessing about present output without any data
6. In a boosting algorithm, final predictions are made based on----------- of predictions from
all models
 median
 None of these
 weighted average
 average
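For question 6, the ways of combining predictions from several models can be compared with a short sketch. The predictions and weights below are hypothetical, and `weighted_average` is a helper written for this example rather than part of any library.

```python
from statistics import mean, median


def weighted_average(predictions, weights):
    """Combine model predictions, giving each model its own weight."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total_weight


# Hypothetical scores from three models for one record, with
# hypothetical weights reflecting how much each model is trusted.
predictions = [1.0, 0.0, 1.0]
weights = [0.5, 0.3, 0.2]

print("average:", mean(predictions))
print("median:", median(predictions))
print("weighted average:", weighted_average(predictions, weights))
```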
7. In K-Means, what is the number of clusters in the data?
 It is always 2
 It must be pre-specified
 It is always 3
 Algorithm will determine the same dynamically
8. Which of the following is NOT a step in CRISP-DM?
 Business understanding
 Data understanding
 Customer understanding
 Modelling
9. What sampling technique do statisticians typically use to assess model stability?
 Cross Validation
 Curse of dimensionality
 Temporal Sequencing
 Rule of Thumb
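For question 9, one common resampling scheme, k-fold cross-validation, can be sketched as follows. This is a minimal illustration under assumptions: `kfold_indices` is a hypothetical helper, rows are assigned to folds in round-robin order without shuffling, and no model is actually fitted.

```python
def kfold_indices(n_rows, k):
    """Return k (train, test) index pairs; each row is in a test fold exactly once."""
    # Assign row i to fold i % k (round-robin, no shuffling).
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


for train, test in kfold_indices(10, 5):
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` set and scored on the matching `test` set; similar scores across the folds suggest a stable model.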
10. Which of the following is a standard data mining methodology?
 SPSS
 Mineset
 CRISP-DM
 Clementine
11. Which of the following statements is correct?
 Missing Not at Random (MNAR) means the missing value can be inferred in
general by the mere fact that the value is missing
 Missing Completely at Random (MCAR) implies a conditional relationship between
the missing value and other variables
 Missing at Random (MAR) means that there is no way to determine what the value
should have been
 All of these
12. Which of the following statements is incorrect?
 Inputs need not be numeric for Kohonen Self-Organizing Maps (SOM) algorithm.
 Inputs must be numeric for K-Means clustering algorithm.
 Kohonen Self-Organizing Maps (SOM) needs all data to be populated, there can be
no missing values.
 When using Principal Component Analysis (PCA), any categorical variable to be
included in the model, must be converted to a number.
13. As a rule of thumb or guiding principle, the ANOVA method works--------------- when there
are --------------clusters.
 worst, large no. of
 best, small no. of
 best, large no. of
 worst, small no. of
14. Which of the following statements is true?
 Statistics is often based on non-parametric algorithms; no guaranteed optimum.
 Statistics algorithms are not as efficient or stable for small data.
 In statistics, models are typically nonlinear.
 In statistics, data is typically smaller, the model is important.
15. Dummy variable --------- is helpful to reduce bias with dummy variables.
 removal
 inclusion
 None of these
 Scaling
16. What is the correct number of two-way combinations/interactions possible, if the number of
variables is 5?
 The number of possible two-way interactions is 20.
 The number of possible two-way interactions is 5.
 The number of possible two-way interactions is 2.
 The number of possible two-way interactions is 10.
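Question 16 can be checked by enumerating the pairs directly. The sketch below assumes that a two-way interaction is an unordered pair of distinct variables; the variable names are placeholders.

```python
from itertools import combinations
from math import comb


def two_way_interactions(variables):
    """List every unordered pair of distinct variables."""
    return list(combinations(variables, 2))


# Placeholder names for five input variables.
variables = ["x1", "x2", "x3", "x4", "x5"]
pairs = two_way_interactions(variables)

for pair in pairs:
    print(pair)
print("count:", len(pairs), "closed form n*(n-1)/2:", comb(len(variables), 2))
```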
17. What are the challenges in using Predictive Analytics?
 Predictive models require data in the form of two-dimensional data (rows and
columns).
 All of these
 Often, deployment of predictive models requires a shift in resources for an
organization.
 The models become too complex because of overfitting
18. What is the value of skew in a normal distribution?
 0
 1
 Less than 1
 Greater than 1
19. Assume that we have records of each visit by a customer to a medical shop. Which of
the following will be a derived variable?
 Customer's name
 Doctor who prescribed the medicine
 Date of birth
 Average spend in the last month
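To make the idea in question 19 concrete: a derived variable is computed from the stored fields rather than recorded at the time of the visit. The sketch below uses made-up visit records and a hypothetical helper, `average_spend`, to aggregate them per customer and month.

```python
from datetime import date

# Hypothetical visit records: one row per visit to the shop.
visits = [
    {"customer": "C001", "visit_date": date(2024, 3, 2), "amount": 120.0},
    {"customer": "C001", "visit_date": date(2024, 3, 18), "amount": 80.0},
    {"customer": "C001", "visit_date": date(2024, 2, 10), "amount": 50.0},
    {"customer": "C002", "visit_date": date(2024, 3, 5), "amount": 30.0},
]


def average_spend(records, customer, year, month):
    """Average amount per visit for one customer in one calendar month."""
    amounts = [
        r["amount"]
        for r in records
        if r["customer"] == customer
        and r["visit_date"].year == year
        and r["visit_date"].month == month
    ]
    # Return 0.0 when the customer made no visits in that month.
    return sum(amounts) / len(amounts) if amounts else 0.0


print(average_spend(visits, "C001", 2024, 3))
```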
20. Which of the following is a property of normal distribution?
 The distribution is asymmetric.
 The mean, median and mode are all the same value
 The median and the mode are not the same value
 The mean and the median are not the same value
21. Models that are accurate, on the data used to train the models, generally show
 overfitting
 All of the above
 randomness
 underfitting
22. Which of these is false about correlations between two variables?
 one variable's meaning is related to another's
 none of the above
 measures the numerical relationship of one variable to another's
 Both of these
23. What is true about a distribution measured by kurtosis?
 Kurtosis is always negative
 A leptokurtic distribution is one in which the kurtosis value is more than 4
 A platykurtic distribution is one in which the kurtosis value is greater than 3
 Normal distribution will have a Kurtosis value of 2
24. Which of the following statements is incorrect?
 Descriptive modeling algorithms try to find relationships between inputs
 Descriptive modeling algorithms are also called unsupervised learning methods.
 Descriptive modeling algorithms discover the best way to segment the data
 Descriptive modeling algorithms try to find relationships that associate inputs to one
or more target variables
25. Which of the following is a property of Uniform Distribution?
 The mean and the midpoint of the distribution are not the same
 The distribution is infinite
 The distribution is finite, with a minimum and maximum value
 The distribution is asymmetric about the mean.
26. Which of the following is a property of normal distribution?
 Approximately 95% of the data will fall between the mean and +/-3 standard
deviations from the mean
deviation from the mean.
27. -----------------is an important requirement for building good bagged ensembles.
 Exactly fitting the model
 Underfitting the model
 Overfitting the model
 None of these
28. If the distribution has spikes, what is a good corrective action?
 Log10 transform
 Binning into regions centered on spikes
 Flip transform
 Power transform
29. Which of the following is NOT a Single-Variable Selection Technique?
 Linear regression forward selection (1 step)
 Chi-square Test
 Simpson's Paradox
 ANOVA
30. The most frequently used metric to assess model accuracy in classification problems is
 ROC
 None of the above
 PCC
 AUC
