WEEK-1

1) State True or False: Statement: Descriptive analytics is the conventional form of business intelligence and data analysis.
A. True
B. False

2) Which of the following is not an example of predictive analytics?

A. Linear regression
B. Time series analysis and forecasting
C. Bar Graphs
D. Data mining

3) State True or False: Statement: Data can be numerical and categorical, but cannot be continuous or discrete.
A. True
B. False

4) Which of the following is not an example of ratio data?

A. Height
B. Year
C. Age
D. Weight

5) Which of the following is a command to have a message appear on the screen?

A. Print
B. Input
C. Write
D. Msg

6) What is the output of this print() function call: print('$100 $200 $300'.count('$'), '$100 $200 $300'.count('$', 5, 10), '$100 $200 $300'.count('$', 5))
A. 3 1 0
B. 3 1 1
C. 3 1 2
D. 3 1 3
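The call can be checked directly. This sketch only uses Python's built-in `str.count(sub, start, end)`, where `start` and `end` behave like slice indices (the end index is excluded).

```python
# The string from the question; '$' sits at indices 0, 5 and 10.
s = '$100 $200 $300'

total = s.count('$')            # whole string -> 3
in_slice = s.count('$', 5, 10)  # searches s[5:10] == '$200 ' -> 1
from_five = s.count('$', 5)     # searches s[5:] == '$200 $300' -> 2

print(total, in_slice, from_five)  # prints: 3 1 2
```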
7) The median is not applicable to ______ data.
A. Ordinal
B. Interval
C. Nominal
D. None of the above

8) State True or False: Statement: Data can be generated by machines but not by humans.
A. True
B. False

9) Which one of the following is not a classification of data analytics?

A. Diagnostic analytics
B. Deceptive analytics
C. Predictive analytics
D. Prescriptive analytics

10) To get the 2nd, 4th and 7th rows of a DataFrame “df” (with the default integer index) in Python, we can write:

A. df.loc[[2,3,5]]
B. df.loc[[1,3,6]]
C. df.iloc[2,4,7]
D. None of the above
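A small pandas sketch, assuming `df` uses the default integer index (0, 1, 2, ...); the eight-row frame and its `value` column are made up for illustration. The question counts rows from 1, while pandas positions start at 0, so the 2nd, 4th and 7th rows sit at positions 1, 3 and 6.

```python
import pandas as pd

# Hypothetical 8-row DataFrame with the default RangeIndex (labels 0..7).
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70, 80]})

by_label = df.loc[[1, 3, 6]]      # label-based; equals positions only for a default index
by_position = df.iloc[[1, 3, 6]]  # position-based; works for any index

print(by_label["value"].tolist())     # [20, 40, 70]
print(by_position["value"].tolist())  # [20, 40, 70]
```

Note that `df.iloc[2, 4, 7]` (without the inner list) is not a row selection: pandas treats the three values as separate axis indexers and raises a "too many indexers" error on a two-dimensional DataFrame. With a custom index (for example dates), only the `iloc` form would still pick the 2nd, 4th and 7th rows.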

WEEK-2

1) Which of the following is not a method for describing a sample space?

A. roster or listing
B. tree diagram
C. Offset builder notation
D. Venn diagram

2) A club of 4 is to be selected from a group of 12 people. How many possible clubs can be selected?
A. 395
B. 425
C. 495
D. 525
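A club is an unordered group, so the count is a combination, C(12, 4) = 12!/(4! 8!). A minimal check with the standard library's `math.comb`:

```python
from math import comb

# Order of members does not matter, so count combinations: C(12, 4).
clubs = comb(12, 4)
print(clubs)  # 495

# Same value from the formula 12*11*10*9 / 4!
assert clubs == (12 * 11 * 10 * 9) // 24
```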

3) Eight individuals are candidates for positions of president, vice president, and treasurer of an
organisation. How many possibilities of selections exist?
A. 300
B. 330
C. 336
D. 339
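Here the three positions are distinct, so order matters and the count is a permutation, P(8, 3) = 8 × 7 × 6. A short sketch with `math.perm`, cross-checked against the combination count:

```python
from math import comb, perm

# President, vice president and treasurer are different posts: order matters.
ordered = perm(8, 3)  # 8 * 7 * 6
print(ordered)  # 336

# Each unordered trio of people can fill the three posts in 3! = 6 ways.
assert ordered == comb(8, 3) * 6
```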
4) A college plans to interview 8 students for possible offers of graduate assistantships. The college
has three assistantships available. How many groups of three can the college select?
A. 126
B. 56
C. 136
D. 130

5) A company plans to interview 10 recent graduates for possible employment. The company has three positions open. How many groups of three can the company select?
A. 90
B. 100
C. 120
D. 130
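Questions 4 and 5 both ask for unordered groups, so both use nCr = n!/(r!(n − r)!). This sketch writes the formula out with factorials (the helper name `n_choose_r` is introduced here for illustration) and cross-checks it against `math.comb`:

```python
from math import comb, factorial

def n_choose_r(n: int, r: int) -> int:
    """Number of unordered groups of r drawn from n, via n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_choose_r(8, 3))   # question 4: 56
print(n_choose_r(10, 3))  # question 5: 120

# Cross-check against the standard library.
assert n_choose_r(8, 3) == comb(8, 3)
assert n_choose_r(10, 3) == comb(10, 3)
```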

6) An individual outcome of an experiment is called

A. the sample space
B. a sample point
C. an experiment
D. an individual

7) Two events having nonzero probabilities

A. can be both mutually exclusive and independent
B. cannot be both mutually exclusive and independent
C. are always mutually exclusive
D. are always independent

8) On average, 5% of items supplied by manufacturer X are defective. If a batch of 10 items is inspected, what is the probability that 2 items are defective?
A. 0.065
B. 0.075
C. 0.085
D. 0.095
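Each inspected item is an independent trial with defect probability 0.05, so the number of defectives follows a binomial distribution with n = 10 and p = 0.05. A minimal sketch of the probability mass function (the helper name `binom_pmf` is ours):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 10 items inspected, each defective with probability 0.05.
prob_two = binom_pmf(2, n=10, p=0.05)
print(round(prob_two, 4))  # 0.0746
```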

9) A question paper contains 90 multiple choice questions. There are 4 alternative answers (A, B, C
or D) out of which only one is correct. Mr X answers these questions randomly (i.e. without
preparation). What is the probability that X gets a score of at least 10 marks?
A. 0.9997
B. 0.7894
C. 0
D. 0.001
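The number of correct guesses is Binomial(90, 0.25), and "at least 10" is the complement of "at most 9". This sketch assumes one mark per correct answer and no negative marking; neither is stated in the question.

```python
from math import comb

n, p = 90, 0.25  # 90 questions, 1-in-4 chance of guessing each correctly

# Assumes 1 mark per correct answer and no negative marking.
# P(X >= 10) = 1 - P(X <= 9)
p_at_most_9 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10))
p_at_least_10 = 1 - p_at_most_9
print(round(p_at_least_10, 4))  # 0.9997
```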

10) State True or False: Statement: A distribution can be either discrete or continuous; it cannot be both at the same time.
A. True
B. False

WEEK-3

1) State True or False: Statement: The specific value of a random variable is called an estimator.
A. True
B. False

2) If the true proportion of customers who weigh less than 100 kg is 0.4, what is the probability that a sample of size 100 yields a sample proportion between 0.3 and 0.4?
A. 0.961
B. 0.827
C. 0.706
D. 0.479
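For a large sample, the sample proportion is approximately normal with mean p and standard error sqrt(p(1 − p)/n). This sketch applies that approximation without a continuity correction, using only `statistics.NormalDist` from the standard library:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 100
se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion, about 0.049

# Normal approximation (no continuity correction) for P(0.3 < p_hat < 0.4).
dist = NormalDist(mu=p, sigma=se)
prob = dist.cdf(0.4) - dist.cdf(0.3)
print(round(prob, 3))  # 0.479
```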

3) Stratified random sampling is a method of selecting a sample in which

A. the sample is first divided into strata, and then random samples are taken from each
stratum
B. various strata are selected from the sample
C. the population is first divided into strata, and then random samples are drawn from
each stratum
D. None of these alternatives is correct.

4) The Poisson probability distribution is which of the following types?

A. Continuous probability distribution
B. Discrete probability distribution
C. Uniform probability distribution
D. Normal probability distribution

5) Consider a binomial experiment with p = 0.5 and a sample size of 100. The expected value of this distribution is

A. 0.50
B. 0.30
C. 100
D. 50

6) A hypergeometric probability distribution is identical to

A. the Poisson probability distribution
B. the binomial probability distribution
C. the normal distribution
D. None of these alternatives is correct

7) Why should one not go for sampling?

A. Less costly to administer than a census.
B. The person authorising the study is comfortable with the sample.
C. Because the research process is sometimes destructive
D. None of the above

8) A car distributor in city Y makes, on average, 2.5 car sales per day. Find the probability that on a randomly selected day, they will sell 5 cars:

A. 0.0668
B. 0.544
C. 0.082
D. 0.205

9) In question 8, find the probability that on a randomly selected day, they will sell no cars:

A. 0.0668
B. 0.544
C. 0.082
D. 0.205

10) In question 8, find the probability that on a randomly selected day, they will sell at most 2 cars:

A. 0.0668
B. 0.544
C. 0.082
D. 0.205
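Questions 8 to 10 all use a Poisson distribution with mean λ = 2.5 sales per day, P(X = k) = e^(−λ) λ^k / k!. A minimal sketch (the helper name `poisson_pmf` is ours); "at most 2" is the sum of the probabilities for 0, 1 and 2 sales.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.5  # average car sales per day

p_five = poisson_pmf(5, lam)                                # question 8
p_none = poisson_pmf(0, lam)                                # question 9
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))  # question 10

print(round(p_five, 4), round(p_none, 3), round(p_at_most_two, 3))
# 0.0668 0.082 0.544
```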

WEEK-4

1) In hypothesis testing, if the null hypothesis is rejected,

A. no conclusions can be drawn from the test
B. the alternative hypothesis is true
C. the data must have been accumulated incorrectly
D. the sample size has been too small

2) The level of significance is the

A. maximum allowable probability of Type II error
B. maximum allowable probability of Type I error
C. same as the confidence coefficient
D. same as the p-value
3) When the following hypotheses are being tested at a level of significance of alpha

H0: Mu ≥ 500

Ha: Mu < 500

the null hypothesis will be rejected if the p-value is

A. ≤ alpha
B. > alpha
C. > Beta/2
D. 1 - (alpha/2)

4) In a two-tailed hypothesis test, the test statistic is determined to be t = -2.692. The sample size is 45. The p-value for this test is

A. -0.005
B. +0.005
C. -0.01
D. +0.01

5) In a lower one-tailed hypothesis test, the p-value is determined to be 0.2. If the sample size for this test is 51, the t statistic has a value of

A. 0.849
B. -0.849
C. 1.299
D. -1.299
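Questions 4 and 5 both use Student's t distribution with n − 1 degrees of freedom. The standard library has no t-distribution functions, so this sketch integrates the t density numerically with Simpson's rule rather than calling a statistics package; `t_pdf` and `t_cdf` are helper names introduced here for illustration. Question 5 is answered by inverting the CDF with bisection.

```python
from math import exp, lgamma, log, pi

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3                # P(0 <= T <= |x|)
    return 0.5 + half if x >= 0 else 0.5 - half

# Question 4: two-tailed test with t = -2.692 and n = 45, so df = 44.
p_two_tailed = 2 * t_cdf(-2.692, df=44)
print(round(p_two_tailed, 3))          # 0.01

# Question 5: lower-tailed p-value of 0.2 with n = 51, so df = 50.
# Solve t_cdf(t) = 0.2 for t by bisection on [-5, 0].
lo, hi = -5.0, 0.0
for _ in range(50):
    mid = (lo + hi) / 2
    if t_cdf(mid, df=50) < 0.2:
        lo = mid                       # CDF too small: the answer lies to the right
    else:
        hi = mid
t_stat = (lo + hi) / 2
print(round(t_stat, 3))                # -0.849
```

A p-value is a probability and therefore never negative, and in a lower-tailed test with p < 0.5 the t statistic must be negative.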

6) A machine is designed to fill toothpaste tubes with 5.8 ounces of toothpaste. The manufacturer does
not want any underfilling or overfilling. The correct hypotheses to be tested are

A. H0: Mu ≠ 5.8, Ha: Mu = 5.8
B. H0: Mu = 5.8, Ha: Mu ≠ 5.8
C. H0: Mu > 5.8, Ha: Mu ≤ 5.8
D. H0: Mu ≥ 5.8, Ha: Mu < 5.8

7) The quality-control manager at a Li-battery factory needs to determine whether the mean life of a large shipment of Li-batteries is equal to the specified value of 375 hours. The process standard deviation is known to be 100 hours. A random sample of 64 batteries indicates a sample mean life of 350 hours. State the null hypothesis.

A. Mu = 375
B. Mu ≤ 375
C. Mu = 350
D. Mu ≥ 350

8) In question 7, at the alpha = 0.05 level of significance, is there any evidence that the mean life is different from 375 hours?

A. Yes, there is
B. No, there is not
C. None of the above

9) In question 7, the computed p-value is:

A. 0.0456
B. 0.456
C. 0.0228
D. 0.228

10) In question 7, the 95% confidence interval estimate of the population mean life of the batteries is:

A. 325.5 to 379.5
B. 325.5 to 374.5
C. 320.5 to 379.5
D. 320.5 to 374.5
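Because the population standard deviation is known, questions 7 to 10 call for a two-tailed z test on the mean. This sketch computes the z statistic, its two-tailed p-value and the 95% confidence interval using the standard library's `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 375, 100, 64, 350, 0.05

se = sigma / sqrt(n)               # 100 / 8 = 12.5 hours
z = (xbar - mu0) / se              # (350 - 375) / 12.5 = -2.0
std_normal = NormalDist()

p_value = 2 * std_normal.cdf(-abs(z))       # two-tailed, since Ha: Mu != 375
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)

print(z, round(p_value, 4), p_value < alpha)  # -2.0 0.0455 True
print(round(ci[0], 1), round(ci[1], 1))       # 325.5 374.5
```

The interval excludes 375, which agrees with rejecting H0 at the 5% level.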
WEEK-5

1) The F ratio in a completely randomized ANOVA is the ratio of

A. MST/MSE
B. MSTR/MSE
C. MSE/MSTR
D. MSE/MST

2) A term that means the same as the term "variable" in an ANOVA procedure is

A. factor
B. treatment
C. replication
D. variance within

3) An ANOVA procedure is applied to four samples, each consisting of 30 observations, taken from four populations. The numerator and denominator degrees of freedom (respectively) for the critical value of F are

A. 3 and 30
B. 4 and 30
C. 3 and 119
D. 3 and 116

4) Which of the following is not a required assumption for the analysis of variance?

A. The random variable of interest for each population has a normal probability distribution
B. The variance associated with the random variable must be the same for each population
C. At least 2 populations are under consideration
D. Populations have equal means

5) An ANOVA gives the following results:
Sum of squares between treatments (SSTR) = 6,750
Sum of squares of error (SSE) = 8,000
Total number of elements (nT) = 20
H0: m1 = m2 = m3 = m4
Ha: at least one mean is different
The mean square between treatments (MSTR) equals

A. 400
B. 500
C. 1,687.5
D. 2,250

6) With reference to the data given in question 5, the mean square within treatments (MSE) equals

A. 400
B. 500
C. 1687.5
D. 2250

7) With reference to the data given in question 5, the test statistic for testing the null hypothesis equals

A. 0.22
B. 0.84
C. 4.22
D. 4.5
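With k = 4 treatments (from H0) and nT = 20 observations, the between-treatments degrees of freedom are k − 1 and the within-treatments degrees of freedom are nT − k. Each mean square is its sum of squares divided by its degrees of freedom, and F = MSTR / MSE:

```python
sstr, sse = 6750, 8000  # sums of squares from question 5
k, n_total = 4, 20      # four means in H0, 20 observations in all

df_between = k - 1       # 3
df_within = n_total - k  # 16

mstr = sstr / df_between  # mean square between treatments
mse = sse / df_within     # mean square within treatments (error)
f_stat = mstr / mse

print(mstr, mse, f_stat)  # 2250.0 500.0 4.5
```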
8) An ANOVA procedure is applied to data obtained from 6 samples where each sample contains 20 observations. The degrees of freedom for the critical value of F are

A. 6 numerator and 20 denominator degrees of freedom
B. 5 numerator and 20 denominator degrees of freedom
C. 5 numerator and 114 denominator degrees of freedom
D. 6 numerator and 114 denominator degrees of freedom
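In questions 3 and 8, the degrees of freedom for the F critical value are k − 1 (numerator) and nT − k (denominator), where nT is the total number of observations across all k samples. A tiny helper (the name `anova_f_df` is ours):

```python
from typing import Tuple

def anova_f_df(k: int, n_total: int) -> Tuple[int, int]:
    """(numerator, denominator) degrees of freedom for a one-way ANOVA F test."""
    return k - 1, n_total - k

print(anova_f_df(4, 4 * 30))  # question 3: (3, 116)
print(anova_f_df(6, 6 * 20))  # question 8: (5, 114)
```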

9) The critical F value with 6 numerator and 60 denominator degrees of freedom at alpha = 0.05 is

A. 3.74
B. 2.25
C. 2.37
D. 1.96

10) The ANOVA procedure is a statistical approach for determining whether or not

A. the means of two samples are equal
B. the means of two or more samples are equal
C. the means of more than two samples are equal
D. the means of two or more populations are equal

WEEK-6

A. 0.887
B. 0.956
C. 0.945
D. 0.932

2) With reference to the data given in question 1, test the null hypothesis: "There is no significant relationship between the variables." We will:
A. Accept the null hypothesis
B. Reject the null hypothesis
C. Can’t state any conclusion
D. None of the above

3) State True or False, in the context of regression analysis:

Statement: "The variance of the error term is the same for all values of the independent variable."

True
False

4) A regression analysis between sales (Y in $1,000) and advertising (X in dollars) resulted in the following equation:

Y = 30,000 + 5X

The above equation implies that an:

A. increase of $5 in advertising is associated with an increase of $5,000 in sales
B. increase of $1 in advertising is associated with an increase of $5 in sales
C. increase of $1 in advertising is associated with an increase of $35,000 in sales
D. increase of $1 in advertising is associated with an increase of $5,000 in sales

5) In a regression and correlation analysis, if r² = 1, then the sum of squares of error (SSE)
A. SSE must also be equal to one
B. SSE must be equal to zero
C. SSE can be any positive value
D. SSE must be negative

6) In a regression analysis, if the sum of squares of error (SSE) = 200 and the sum of squares of regression (SSR) = 300, then the coefficient of determination is

A. 0.6667
B. 0.4000
C. 0.6000
D. 1.5000
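The coefficient of determination is the explained share of the total variation, r² = SSR / SST, where SST = SSR + SSE. A minimal sketch (the helper name `r_squared` is ours):

```python
def r_squared(ssr: float, sse: float) -> float:
    """Coefficient of determination: explained variation over total variation."""
    sst = ssr + sse  # total sum of squares
    return ssr / sst

print(r_squared(ssr=300, sse=200))  # 0.6
```

Setting `sse=0` gives exactly 1, which is the situation described in question 5.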

7) Regression analysis was applied between demand for a product (Y) and the price of the product (X), and the following estimated regression equation was obtained:

Y = 120 - 10X

Based on the above estimated regression equation, if price is increased by 2 units, then demand is expected to

A. increase by 120 units
B. increase by 100 units
C. increase by 20 units
D. decrease by 20 units

8) Regression analysis was applied between sales (Y in $1,000) and advertising (X in $100), and the following estimated regression equation was obtained:

Y = 80 + 6.2 X

Based on the above estimated regression line, if advertising is $10,000, then the point estimate for sales (in
dollars) is

A. $62,080
B. $142,000
C. $700
D. $700,000
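The key step is converting units on both sides: $10,000 of advertising is X = 100 (units of $100), and the fitted Y is in units of $1,000. A short sketch of the arithmetic:

```python
advertising_dollars = 10_000
x = advertising_dollars / 100  # X is measured in units of $100 -> 100.0
y = 80 + 6.2 * x               # fitted Y, in units of $1,000
sales_dollars = y * 1_000
print(round(sales_dollars))    # 700000
```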

9) In regression analysis, if the dependent variable is measured in dollars, the independent variable

A. must also be in dollars
B. must be in some units of currency
C. can be any units
D. cannot be in dollars

10) If the coefficient of correlation is 0.90, then the coefficient of determination

A. is also 0.9
B. is either 0.81 or -0.81
C. can be either negative or positive
D. must be 0.81

WEEK-7

1) In a regression analysis, the error term is a random variable with a mean or expected value of
A. zero
B. one
C. any positive value
D. any value

2) If the coefficient of determination is a positive value, then the coefficient of correlation

A. must also be positive
B. must be zero
C. can be either negative or positive
D. must be larger than 1

3) Larger values of r² imply that the observations are more closely grouped about the

A. average value of the independent variables
B. average value of the dependent variable
C. least squares line
D. origin

4) The interval estimate of the mean value of y for a given value of x is

A. prediction interval estimate
B. confidence interval estimate
C. average regression
D. x versus y correlation interval

5) In a regression analysis, the coefficient of determination is 0.4225. The coefficient of correlation in this situation is

A. ±0.65
B. ±0.1785
C. any positive value
D. any value
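The magnitude of r is sqrt(r²) = sqrt(0.4225) = 0.65, but r² alone does not fix the sign: r takes the sign of the fitted slope. Since the question gives no slope, the sketch below uses two hypothetical slope values to show both cases (the helper name `correlation_from_r2` is ours):

```python
from math import copysign, sqrt

def correlation_from_r2(r2: float, slope: float) -> float:
    """|r| is sqrt(r^2); the sign of r matches the sign of the fitted slope."""
    return copysign(sqrt(r2), slope)

print(round(correlation_from_r2(0.4225, slope=2.0), 4))   # 0.65
print(round(correlation_from_r2(0.4225, slope=-2.0), 4))  # -0.65
```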

6) In a regression and correlation analysis, if r² = 1, then

A. SSE must also be equal to one
B. SSE must be equal to zero
C. SSE can be any positive value
D. SSE must be negative

7) If the coefficient of correlation is 0.8, the percentage of variation in the dependent variable
explained by the variation in the independent variable is

A. 0.80%
B. 80%
C. 0.64%
D. 64%

8) If the coefficient of determination is equal to 1, then the coefficient of correlation

A. must also be equal to 1
B. can be either -1 or +1
C. can be any value between -1 to +1
D. must be -1

9) If all the points of a scatter diagram lie on the least squares regression line, then the
coefficient of determination for these variables based on these data is

A. 0
B. 1
C. either 1 or -1, depending upon whether the relationship is positive or negative
D. could be any value between -1 and 1

10) A simple linear regression equation (y = mx + c) will always pass through the point _____

A. (0,0)
B. (1,1)
C. (Ymean, Xmean)
D. (Xmean, Ymean)
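For ordinary least squares, the intercept is c = ȳ − m·x̄, so substituting x = x̄ always returns ȳ. The sketch below fits a line to made-up data (used only to illustrate the property) and checks that the line passes through the point of means:

```python
from statistics import fmean

def least_squares(xs, ys):
    """Return (slope m, intercept c) of the ordinary least squares line y = m*x + c."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # this choice of c forces the line through (x_bar, y_bar)
    return m, c

# Hypothetical data, used only to illustrate the property.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = least_squares(xs, ys)
x_bar, y_bar = fmean(xs), fmean(ys)
print(abs((m * x_bar + c) - y_bar) < 1e-12)  # True
```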

WEEK-8

1) Which of the following methods do we use to best fit the data in logistic regression?

A. Least Square Error
B. Maximum Likelihood
C. Jaccard distance
D. All of these

2) Which of the following evaluation metrics cannot be applied to logistic regression output to compare it with the target?

A. AUC-ROC
B. Accuracy
C. Logloss
D. Mean-Squared-Error

3) Let f(x) denote the logistic function. The range of f(x) for any real value of x is

A. (0,1)
B. (-1 , 1)
C. All positive integers
D. All negative integers
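The logistic function is f(x) = 1 / (1 + e^(−x)). Since e^(−x) is always positive and finite, the denominator lies strictly between 1 and infinity, so f(x) lies strictly between 0 and 1. A small sketch, written in a numerically stable form:

```python
from math import exp

def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)  # for negative x, use the equivalent form e^x / (1 + e^x)
    return z / (1.0 + z)

for x in (-10, -1, 0, 1, 10):
    print(x, round(logistic(x), 5))
```

Mathematically the range is the open interval (0, 1); in floating point, a very large positive x (around 40 or more) rounds to exactly 1.0.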

4) Which of the following options is true?

A. Linear regression error values have to be normally distributed, but in the case of logistic regression this is not required
B. Logistic regression error values have to be normally distributed, but in the case of linear regression this is not required
C. Both linear regression and logistic regression error values have to be normally distributed
D. Neither linear regression nor logistic regression error values have to be normally distributed

5) For the figure given below, which decision boundary is overfitting the training data?

A. A
B. B
C. C
D. None of these

6) Select the correct alternatives from the following based on the figure:

1. The training error in the first plot is the highest compared with the second and third plots.

2. The best model for this regression problem is the last (third) plot because it has the minimum training error (zero).

3. The second model is more robust than the first and third because it will perform best on unseen data.

4. The third model is overfitting more than the first and second.

5. All will perform the same because we have not seen the testing data.

A. 1 and 3
B. 1 and 4
C. 1,3 and 4
D. 5

7) For categorical data with ‘n’ categories, the number of dummy variables will be ________
A. n
B. n-1
C. n+1
D. 2n
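With an intercept in the model, one category must act as the baseline. Otherwise the n indicator columns always sum to 1 and are perfectly collinear with the intercept (the "dummy variable trap"). A pandas sketch with a made-up three-category column; `drop_first=True` is one way to keep n − 1 dummies:

```python
import pandas as pd

# Hypothetical categorical column with n = 3 categories.
colors = pd.Series(["red", "green", "blue", "green", "red"], name="color")

full = pd.get_dummies(colors)                    # one indicator column per category
coded = pd.get_dummies(colors, drop_first=True)  # n - 1 columns; "blue" becomes the baseline

print(full.columns.tolist())   # ['blue', 'green', 'red']
print(coded.columns.tolist())  # ['green', 'red']
```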

8) In binary logistic regression,

A. The dependent variable is continuous
B. The dependent variable is divided into two equal subcategories
C. The dependent variable consists of two categories
D. There is no dependent variable

9) If the number of false negatives is 5 and the number of true positives is 20, the value of recall will be equal to _______

A. 0.2
B. 0.6
C. 0.8
D. 0.3

10) If the precision is 0.6 and the recall is 0.4, the value of the F-measure will be

A. 0.48
B. 1
C. 0.24
D. None of these
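Recall is TP / (TP + FN), and the F-measure (F1) is the harmonic mean of precision and recall, 2PR / (P + R). A minimal sketch for questions 9 and 10 (the helper names are ours):

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model caught: TP / (TP + FN)."""
    return tp / (tp + fn)

def f_measure(precision: float, recall_value: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)

print(recall(tp=20, fn=5))            # 0.8
print(round(f_measure(0.6, 0.4), 2))  # 0.48
```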

WEEK-9

1) The following confusion matrix was obtained from a classifier.

A. 35%
B. 27%
C. 75%
D. 80%

2) For the given confusion matrix, what is the number of False Positives for the Apple class?

A. 17
B. 8
C. 7
D. 4

3) For the above given confusion matrix, what is the number of True Negatives for
the Apple class?

A. 4
B. 10
C. 8
D. 36

4) For the above given confusion matrix, what is the number of False Negatives for the Apple class?

A. 3
B. 1
C. 2
D. 4

5) For the above given confusion matrix, what is the F1-score of the classifier for the Apple class?
A. 0.5
B. 0.4
C. 0.2
D. 0.6
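The quiz's confusion matrix is not reproduced here, so this sketch uses a hypothetical 3 × 3 matrix (the labels Orange and Mango and all counts are invented). It assumes rows are actual classes and columns are predicted classes, and shows how the one-vs-rest counts for a single class (questions 2 to 5) are read off a multi-class matrix.

```python
# Hypothetical 3-class confusion matrix (NOT the matrix from the quiz).
# Assumed convention: rows = actual class, columns = predicted class.
labels = ["Apple", "Orange", "Mango"]
cm = [
    [5, 2, 1],  # actual Apple
    [3, 6, 0],  # actual Orange
    [1, 1, 7],  # actual Mango
]

def one_vs_rest(cm, i):
    """TP, FP, FN, TN for class index i, treating every other class as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(len(cm))) - tp  # predicted i, actually another class
    fn = sum(cm[i]) - tp                             # actually i, predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

tp, fp, fn, tn = one_vs_rest(cm, labels.index("Apple"))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn, round(f1, 3))  # 5 4 3 14 0.588
```

If the matrix is laid out the other way (rows = predicted), FP and FN swap roles, so check the convention before reading off counts.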

6) In ROC analysis, a classifier is called ‘good’ if it has ______

A. Low TPR and Low FPR
B. Low TPR and High FPR
C. High TPR and Low FPR
D. High TPR and High FPR

7) State True or False: Standardization of features is not required before training a logistic regression model.
True

False

8) For the given confusion matrix, determine the sensitivity for the model.

A. 33%
B. 67%
C. 50%
D. None of these

9) For the given confusion matrix, determine the specificity for the model

A. 53%
B. 67%
C. 47%
D. 33%

10) According to the ROC curve and AUC below, choose the correct alternative for the effectiveness of classifiers A and B.

A. A=B
B. A <B
C. A>B
D. None of these

WEEK-10

1) State True or False: Statement: The null hypothesis for the chi-square test of independence assumes that all the proportions are equal.
True

False

2) A statistical test conducted to determine whether to reject or not reject a hypothesized probability distribution for a population is known as a ________
A. contingency test
B. probability test
C. goodness of fit test
D. None of these

3) What is the minimum number of variables/features required to perform clustering?

A. 0
B. 1
C. 2
D. 3

4) An important application of the chi-square distribution is

A. making inferences about a single population variance
B. testing for goodness of fit
C. testing for the independence of two variables
D. All of these alternatives are correct.

5) The bigger the chi-square statistic, _________ the p-value

A. bigger
B. smaller
C. does not vary
D. None of these

6) State True or False: The shape of the chi-square distribution is left-skewed.

True

False

7) Determine the chi-square statistic for the following data.

A. 0
B. 52.75
C. 32
D. None of these

8) For the given data, if the significance level is 5% and the degrees of freedom = 4, what is the critical chi-square value?

A. 11.07
B. 12.592
C. 9.488
D. 7.815
9) The degrees of freedom for a contingency table with 6 rows and 3 columns is

A. 18
B. 15
C. 6
D. 10
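Both calculations in questions 8 and 9 can be checked with the standard library. For an even number of degrees of freedom k, the chi-square upper tail has the closed form P(X > x) = e^(−x/2) Σ_{i=0}^{k/2−1} (x/2)^i / i!. This sketch inverts that closed form by bisection (the helper names are ours, and the approach only covers even k, which is enough for k = 4). For an r × c contingency table, the degrees of freedom are (r − 1)(c − 1).

```python
from math import exp

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an EVEN number of degrees of freedom."""
    if df % 2:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total

def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Upper-tail critical value: solve chi2_sf_even_df(x, df) = alpha by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid  # tail still heavier than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2

crit = chi2_critical_even_df(0.05, df=4)
print(round(crit, 3))  # question 8: 9.488

rows, cols = 6, 3
table_df = (rows - 1) * (cols - 1)
print(table_df)  # question 9: 10
```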

10) In K-means clustering, K stands for _________

A. Number of data points
B. Error function parameters
C. Number of clusters
D. None of these
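To make the role of K concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the usual K-means procedure, on made-up 2-D data. K is passed in as the number of clusters to form. For reproducibility it simply uses the first K points as the starting centroids; this is a simplification, and practical implementations use random or k-means++ initialisation.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; k is the number of clusters to form."""
    centroids = list(points[:k])  # simplified initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical 2-D data with two well-separated groups.
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (0.5, 1.5), (9.5, 8.0)]
centroids, clusters = k_means(data, k=2)
print(len(clusters), [len(c) for c in clusters])  # 2 [3, 3]
```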
