PGPX- AD Course

Question Bank

1. A stock-broker knows from his past experience that the probability that a client owns
stocks is 0.70 and the probability that a client owns bonds is 0.60. The probability that
the client owns bonds if she already owns stocks is 0.55. What is the probability that
she owns both of these securities?

Solution:
P(Stocks ∩ Bonds) = P(Bonds | Stocks) * P(Stocks)
= 0.385

2. In a particular class, the heights of male students are normally distributed with mean
178 cm and variance 150 cm². Calculate the probability that a randomly chosen male
student has height less than 186 cm.
a. 0.505
b. 0.693
c. 0.306
d. 0.743

Solution:
z = (186 - 178)/sqrt(150) = 0.6532
P(X < 186) = P(Z < 0.6532) = NORM.S.DIST(0.6532, TRUE)
= 0.743
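
The same calculation can be cross-checked outside Excel. The following is a minimal Python sketch, assuming only the standard library's `statistics.NormalDist`; the variable names are illustrative and not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist

# Heights ~ N(mean = 178 cm, variance = 150 cm^2), as stated in Question 2.
heights = NormalDist(mu=178, sigma=sqrt(150))

z = (186 - 178) / sqrt(150)       # standardised value of 186 cm
p_below_186 = heights.cdf(186)    # P(X < 186)

print(round(z, 4), round(p_below_186, 3))  # 0.6532 0.743
```

Note that the standard deviation, not the variance, is passed to the distribution.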

3. The probability of having a side effect from a certain vaccine is 0.005. If 1000
persons are inoculated, calculate the probability that at most 2 persons have a side effect.
a. 0.124
b. 0.033
c. 0.007
d. 0.040

Solution:
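X = number of persons with a side effect ~ Bin(1000, 0.005)
P(X <= 2) = BINOM.DIST(2, 1000, 0.005, 1) = 0.124
Correct answer is 0.124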
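
As a hedged illustration of where 0.124 comes from, the sketch below computes the exact binomial probability and, for comparison, the Poisson approximation with rate 1000 * 0.005 = 5. The helper names `binom_cdf` and `poisson_cdf` are hypothetical, written here for illustration only.

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

exact = binom_cdf(2, 1000, 0.005)        # exact binomial answer
approx = poisson_cdf(2, 1000 * 0.005)    # Poisson approximation, rate = n * p

print(round(exact, 3), round(approx, 3))  # 0.124 0.125
```

The two values differ slightly in the third decimal place, so the listed option 0.124 corresponds to the exact binomial calculation.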

4. If X ~ N(µ = 100 , σ2 = 16) then P(X > 110 ) =

Solution:
σ = sqrt(16) = 4, so P(X > 110) = 1 - NORM.DIST(110, 100, 4, 1) = P(Z > 2.5) = 0.0062
Correct answer is 0.006

5. A woman aged exactly 70 years has an insurance policy, which will pay her family Rs.
50000 if she dies in the next two years. The probability that she dies in the next year is
0.025, whereas the probability that she dies in the year after that is 0.014. What is the
probability that she dies within the next two years?

Solution:
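P(die within 2 years) = P(die in 1st year) + P(survive 1st year) * P(die in 2nd year | survive 1st year)
= 0.025 + (1 - 0.025) * 0.014
= 0.03865 ≈ 0.0386
(Here 0.014 is read as the probability of dying in the second year given survival through the first.)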

6. An ambulance driver has 3 routes to reach the hospital. In order to save a patient's life,
he needs to reach the hospital within 12 minutes. During any instance, the driver chooses
any one of the 3 routes with equal probability. The probability that he arrives on time
using route A, B and C are 60% , 62% and 70% respectively. If during a random
instance the driver arrived on time, what is the probability that he had chosen route
C?

Solution:
P(C | on time) = P(on time | C) * P(C) / Sum over routes of [P(on time | route) * P(route)]
= (0.70 * 1/3) / ((0.60 + 0.62 + 0.70) * 1/3)
= 0.364
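
Bayes' rule as used here can be sketched in a few lines of Python. This is an illustrative helper, not the original solution method; the function name `posterior` and its argument names are assumptions.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(H_i | E) = P(E | H_i) P(H_i) / sum_j P(E | H_j) P(H_j)."""
    joint = [pr * lk for pr, lk in zip(priors, likelihoods)]
    total = sum(joint)  # P(E), by the law of total probability
    return [j / total for j in joint]

routes = ["A", "B", "C"]
priors = [1 / 3, 1 / 3, 1 / 3]      # each route chosen with equal probability
on_time = [0.60, 0.62, 0.70]        # P(on time | route)

post = dict(zip(routes, posterior(priors, on_time)))
print(round(post["C"], 4))  # 0.3646
```

The same helper applies to the other Bayes questions in this bank: `posterior([0.5, 0.5], [1, 0.25])[0]` gives 0.8 for the two-coin problem (Question 13), and `posterior([0.2, 0.3, 0.5], [0.001, 0.002, 0.004])[0]` gives about 0.0714 for the portal problem (Question 43).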

7. Suppose a fast-food restaurant sells two types of sandwiches, sweet-sandwich and
chilli-sandwich. Suppose that on a typical weekday, the demand for sweet sandwich is
approximately normally distributed with a mean of 500 and standard deviation of 80
and the demand for chilli-sandwich is normally distributed with a mean of 130 and
standard deviation of 30. The price of a sweet sandwich is 75 Rupees and that of a
chilli sandwich is 85 Rupees. Assume that the demands for the two types of
sandwiches are independent of each other. What is the least revenue (in rupees) that
the restaurant can earn with 80% probability?
(answer correct and rounded up to 2 decimal places)

Solution:

Revenue from sweet-sandwich~ N(75*500, sd=75*80 )=N(37500,sd=6000)

Revenue from chilli-sandwich~ N(85*130, sd=85*30 )=N(11050, sd=2550)

Total revenue ~ N( 48550, sd=sqrt(6000^2+ 2550^2))

20th percentile of this distribution = norm.inv(.2, 48550, sqrt(6000^2 + 2550^2)) = 43063.139
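
The steps above (scale each demand by its price, add the means, add the variances, then take the 20th percentile) can be sketched in Python as follows. This is a minimal illustration using the standard library; variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

price_sweet, price_chilli = 75, 85

# Mean revenue: price * mean demand, summed over the two products.
mean_rev = price_sweet * 500 + price_chilli * 130
# Independent demands: variances add, so sd = sqrt(sd1^2 + sd2^2).
sd_rev = sqrt((price_sweet * 80) ** 2 + (price_chilli * 30) ** 2)

revenue = NormalDist(mean_rev, sd_rev)
floor_80 = revenue.inv_cdf(0.20)  # revenue level exceeded with probability 0.80

print(mean_rev, round(floor_80, 2))  # 48550 43063.14
```

The 20th percentile is used because "at least r with 80% probability" means P(Revenue >= r) = 0.80, i.e. P(Revenue < r) = 0.20.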

8. A manufacturing facility needs to open a new assembly line in four months or there
will be significant cost overruns. The manager of this project believes that there are
four possible values for the random variable X (the number of months from now it
will take to complete this project): 2, 2.5, 3, and 3.5. It is currently believed that the
probabilities of these four possibilities are in the ratio 1 to 2 to 3 to 2. That is, X = 2.5
is twice as likely as X = 2 and X = 3 is 1.5 times as likely as X = 2.5. Calculate a) the
probability that the project will be completed in 3 months or less and b) the expected
completion time of the project.

Solution:

Let P(X=2)=p, then P(X=2.5)=2p, P(X=3)=3p, P(X=3.5)=2p

Since total probability=1, p+2p+3p+2p=8p=1, so p=1/8


x           2       2.5     3       3.5
P(X = x)    0.125   0.250   0.375   0.250

P(X<=3) = .125+.250+.375=.75,

E[X] = sum of x * P(X = x) = 2*.125 + 2.5*.250 + 3*.375 + 3.5*.250 = 2.875
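
A short Python sketch of the same calculation, using exact fractions so that the ratio 1:2:3:2 is normalised without rounding. The variable names are illustrative only.

```python
from fractions import Fraction

values = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(7, 2)]  # months
weights = [1, 2, 3, 2]                                               # ratio 1:2:3:2

total = sum(weights)
pmf = {x: Fraction(w, total) for x, w in zip(values, weights)}

p_at_most_3 = sum(p for x, p in pmf.items() if x <= 3)  # part (a)
expected = sum(x * p for x, p in pmf.items())           # part (b)

print(p_at_most_3, float(expected))  # 3/4 2.875
```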

9. A researcher will be administering a survey questionnaire form on a sample of 15
individuals randomly picked from a population. The form needs to be filled by each
respondent. Suppose based on past studies it is believed that the probability of a form
being incorrectly filled by a respondent in such surveys is 0.3. Assuming the
respondents fill the forms independently of each other, the probability that the
researcher will find 9 or more incorrectly filled forms is

Solution:

1-Binom.Dist (8,15,0.3,1) = 0.0152

10. The sizes of claims, which arise from a certain portfolio of insurance policies are
known to be normally distributed with mean µ = Rs. 5000 and standard deviation σ =
Rs. 500. What is the probability that a future claim will be greater than Rs. 4500?

Solution:

1- norm.dist(4500, 5000,500,1) = 0.8413

11. A contestant on a game show is asked two questions. The probability that she gets
the first question correct is 0.4, the second question correct is 0.5, and both correct is
0.2. The probability that she gets both questions wrong is

Solution:

1-(.4+.5-.2)=0.3
12. There are three accounts, out of which exactly one is known to be an over-funded
account. A Chartered Accountant (CA) has been given the task of auditing to find the
overfunded account and the amount of overfunding. However, it is not known which
account is overfunded. So, the CA audits each account in turn until she finds the
overfunded account, and after auditing this one she stops. What is the probability that the CA
ends up auditing all the 3 accounts?
a. 1/3
b. 1/2
c. 1/27
d. 2/3

Solution:

P(auditing 3 accounts)= P( 1st account picked out of 3 is not over funded) * P( 2nd
account picked out of remaining 2 is not overfunded)* P(3rd account picked out of
remaining 1 is over funded)
= (1-1/3)*(1-1/2)* 1
=1/3

13. A bag has two coins, the first one is a double-headed coin (i.e. has heads on both
sides) and the second one is a fair coin (i.e. equal chance for head and tail). A coin is
selected at random from the bag and this coin is tossed independently two times
successively. Given that the result of both the tosses are heads, what is the probability
that the first coin was picked?
a. 3/4
b. 2/3
c. 2/5
d. 4/5

Solution:

P(first | HH) = P(HH|first)*P(first) / (P(HH|first)*P(first) + P(HH|second)*P(second))

= (1/2*1)/(1/2*1 + 1/2*1/4)

=4/5
14. A TV channel wants to attract more advertising companies to play ads during the
airing of an ongoing TV series. So, the TV channel decides to demonstrate to the
prospective advertising companies based on a random sample of households that more
than 50% of households tune in to the programme. If p= the population proportion of
households who tune in to the programme, then the TV channel should formulate the
null and alternative hypotheses as
a. H0 : p = 0.5 , H1 : p ≠ 0.5.
b. H0 : p ≤ 0.5 , H1 : p < 0.5
c. H0 : p ≤ 0.5 , H1 : p > 0.5
d. H0 : p ≥ 0.5 , H1 : p < 0.5

15. Scientists at a weather station wish to test whether μ = “the average annual rainfall”
has increased from its long-term value of 22 cm. They formulate the hypothesis as
follows: H0 : μ ≤ 22, H1 : μ > 22. Assuming that the past 10 years of data on annual
rainfall can be treated as a random sample, they compute the P-Value to be 0.0149. If α
denotes the significance level, which one of these conclusions is correct?

a. H0 will be rejected at α = 0.05 but not rejected at both α = 0.01 and α = 0.1
b. H0 will be rejected at both α = 0.05 and 0.1 but not rejected at α = 0.01
c. H0 will be rejected at both α = 0.05 and α = 0.01, but not rejected at α = 0.1
d. H0 will be rejected at α = 0.01 but not rejected at both α = 0.05 and α = 0.1

16. A biscuit manufacturing company has a process in place to ensure that the proportion
(p) of manufactured biscuit packets whose weight deviates from the required
specification of 200 gm, is not more than 1%. Based on a random sample of 100
packets taken after every hour of production, they test the hypothesis
H0 : p ≤ 0.01 , H1: p > 0.01
Specifically, their rule is to “reject H0 if 3 or more packets in the sample deviate from
the required specification”. In case of a rejection, they will have to stall the
production process, examine and reset it by fixing any issues, before the next
production starts. Compute the power of this test when p = 0.025. (Assuming that the
number of packets produced every hour is very large, use the exact sampling
distribution of the test statistic for the computation)
(answer correct and rounded up to 4 decimal places)

Solution:
1-BINOM.DIST(2,100,0.025,1)= 0.4578
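
The power is simply the probability that the rejection rule fires when the true proportion is 0.025. A minimal Python sketch, assuming a hypothetical helper `reject_probability` (not part of the original solution):

```python
from math import comb

def reject_probability(n, cutoff, p):
    """P(X >= cutoff) for X ~ Binomial(n, p): the chance that the rule rejects H0."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(cutoff))

power = reject_probability(100, 3, 0.025)            # true p = 0.025
size_at_boundary = reject_probability(100, 3, 0.01)  # type I error rate at p = 0.01

print(round(power, 4))  # 0.4578
```

The same function evaluated at a value of p inside H0 gives a type I error probability; for example, `reject_probability(4, 2, 1/3)` reproduces the 0.4074 of Question 50.
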
17. Which two of the following statements are True?
a. Type I error occurs when the null hypothesis is true but alternative is
accepted
b. The probability of type I error under null hypothesis can be ensured to not
exceed the significance level only if the sample size is chosen to be
sufficiently large
c. The lower limit of the 95% confidence interval for the population proportion
p, given that n = 300; and p^ =0.10 is 0.1339.
d. A 99% confidence interval for population mean μ will be necessarily
wider than the 95% confidence interval constructed using the same
approach.

18. A survey was conducted to determine the “average number of hours per week IIMA
students spend on sports” (μ). A simple random sample of 10 students was selected
and the following data shows the number of hours each of them spent on sports during
a week
8, 4, 7, 5, 9, 7, 6, 9, 5, 7.
Assuming that the t-distribution assumption is reasonable and finite population
correction=1, a 95% confidence interval for μ is
(Write confidence interval correct and rounded up to 4 decimal places)

Solution:
Sample mean= xbar= average(8, 4, 7, 5, 9, 7, 6, 9, 5, 7)=6.7,

Sample standard deviation=s= stdev(8, 4, 7, 5, 9, 7, 6, 9, 5, 7)= 1.702939

n = 10, SEhat(xbar)= s/sqrt(n)=0.538516

1-alpha=.95, alpha = .05, t_(alpha/2) = t.inv(.975, 10-1)=2.26215

95% CI = [ xbar- t_(alpha/2) * SEhat(xbar), xbar+ t_(alpha/2) * SEhat(xbar)]

95% CI = [5.4818 , 7.9182]
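
The interval can be re-derived with a short Python sketch. Python's standard library has no t-quantile function, so the critical value 2.26215 is copied from the solution above (t.inv(.975, 9)) rather than computed; that shortcut is an assumption of this example.

```python
from math import sqrt
from statistics import mean, stdev

hours = [8, 4, 7, 5, 9, 7, 6, 9, 5, 7]
t_crit = 2.26215  # t.inv(.975, 10-1), taken from the solution, not computed here

xbar = mean(hours)
se = stdev(hours) / sqrt(len(hours))  # stdev() uses the n-1 denominator
lower, upper = xbar - t_crit * se, xbar + t_crit * se

print(round(lower, 4), round(upper, 4))  # 5.4818 7.9182
```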


19. An opinion poll was conducted based on a simple random sample of 1000 voters from
a city. The opinion poll result reported a 99% confidence interval for the proportion of
voters who favour party A to be: [0.400, 0.490]. Based on this 99% confidence
interval, which one of these conclusions can be reached?
a. There is a 99% chance that voters will choose party A
b. It is likely that party A will gain votes between 20 % and 35 %.
c. It is unlikely that party A will gain more than 50% of the votes
d. It is likely that party A will gain less than 40% of the votes.

20. Select the option that best describes the issue with the survey question “On a scale of
1 to 10 how satisfied are you with the quality of food and ambience at the college
mess?”
a. Order effect
b. No response
c. Leading Question
d. Double Barrelled question

Questions 21 to 24

A famous south Indian fast food chain serves only Idli and Dosa. Its customers were asked
whether they preferred Idli or whether they preferred Dosa. 60% said that they preferred Idli
and the remaining preferred Dosa. 70% of the customers were male. 80% of the males
preferred Idli.

21. If a customer is picked at random, the events A=“customer prefers Idli” and
B=“customer prefers Dosa” are
a. Mutually exclusive
b. None of the other choices
c. Independent
d. Both null sets

22. What is the probability that a randomly selected customer is not male? (enter full
numeric answer without rounding)

Solution: 1-0.7 = 0.3

23. What is the probability a randomly selected customer is Male and prefers
Dosa? (enter full numeric answer without rounding)

Solution:
P(Idli and Male)= P(Idli | Male)*P(Male)= .8*.7= 0.56
P(Dosa and male)= P(Male)-P(Idli and Male)= .7- .56=.14
24. Given that a randomly selected customer prefers Dosa, what is the probability that this
customer is not male? (enter full numeric answer without rounding)

Solution:
P(Dosa)=1-.6=.4
P(not male and Dosa)= P(Dosa)- P(male and Dosa)=P(Dosa)- P(Dosa|male)P(male)
= 0.4 - (1-.8)*0.7 = 0.4 - 0.14 = 0.26
P(not male|Dosa)= P(not male and Dosa)/P(Dosa)= .26/.4 = .65
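
Questions 22 to 24 all follow from the 2 x 2 joint table of gender and preference. The sketch below builds that table with exact fractions; the dictionary layout and names are illustrative assumptions.

```python
from fractions import Fraction as F

p_idli, p_male, p_idli_given_male = F(6, 10), F(7, 10), F(8, 10)

# Fill the joint table from the three given quantities.
joint = {
    ("male", "idli"): p_male * p_idli_given_male,
    ("male", "dosa"): p_male * (1 - p_idli_given_male),
}
joint[("not male", "idli")] = p_idli - joint[("male", "idli")]
joint[("not male", "dosa")] = (1 - p_idli) - joint[("male", "dosa")]

p_not_male_given_dosa = joint[("not male", "dosa")] / (1 - p_idli)

print(float(joint[("male", "dosa")]), float(p_not_male_given_dosa))  # 0.14 0.65
```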

Questions 25 to 28
The service manager for a new appliances store reviewed sales records of the past 20
sales of new microwaves to determine the number of warranty repairs he will be
called on to perform in the next 90 days. Corporate reports indicate that the
probability any one of their new microwaves needs a warranty repair in the first 90
days is 0.05. The manager assumes that calls for warranty repair are independent of
one another and is interested in predicting the number of microwaves requiring
warranty repairs in the next 90 days for this batch of 20 new microwaves sold.

25. What is the probability that two or more of the 20 new microwaves sold will require a
warranty repair in the first 90 days? (enter answer correct and rounded up to 4
decimal places)

Solution:
1-binom.dist(1, 20, 0.05,1)=.26416

26. The expected number of microwaves requiring warranty repairs in the next 90 days is ____.
(enter full numeric answer without rounding)
Solution:
np=20*0.05=1

27. The variance of the number of microwaves requiring warranty repairs in the next 90
days is ____. (enter full numeric answer without rounding)

Solution:
np(1-p)=20*.05*(1-.05)=0.95

28. If there are 2 or fewer microwaves requiring warranty repairs in the next 90 days, then a
labour cost of Rs 600 per microwave will be incurred. However, if there are 3 or
more microwaves requiring warranty repairs in the next 90 days, the per microwave
labour cost reduces due to economies of scale and will be Rs 400 per microwave. The
expected total labour cost (in rupees) due to warranty repairs incurred on these 20
microwaves in the next 90 days will be ____.

(enter answer correct and rounded up to 2 decimal places)

Solution:
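Let X ~ Bin(20, 0.05) be the number of repairs. Total labour cost = 600*x if x <= 2 and 400*x if x >= 3.
Expected total labour cost = sum over x = 0, ..., 20 of P(X = x) * (total labour cost for x repairs) = 550.94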
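
The sum-product can be spelled out in a few lines of Python. This is a sketch under the reading that the per-microwave rate (Rs 600 or Rs 400) applies to every repaired microwave in the batch; the function names are hypothetical.

```python
from math import comb

n, p = 20, 0.05

def pmf(x):
    """P(X = x) for X ~ Binomial(20, 0.05)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def total_cost(x):
    """Total labour cost in rupees when x microwaves need repair."""
    rate = 600 if x <= 2 else 400
    return rate * x

expected_cost = sum(pmf(x) * total_cost(x) for x in range(n + 1))
print(round(expected_cost, 2))  # 550.94
```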

29. The defining property of a simple random sample is that:


a. the fewest sample units necessary for statistical significance is chosen
b. the easiest method to access the sampling units is chosen
c. every fourth subject is chosen as a sample
d. every sample of a particular size has the same chance of being chosen
30. Which one of the following statements is not true?

a. We can measure the accuracy of judgmental samples by applying some
simple rules of probability
b. A list of all members of the population from which we can choose a sample is
called a sampling frame
c. A judgmental sample is a sample in which the sampling units are chosen
according to the sampler’s judgment
d. The finite population correction factor is a correction for the standard error
when the sample size is fairly large relative to the population size

31. The 95% confidence interval estimate 18.5 ± 2.5 was calculated for a population
mean in which the sample standard deviation s was computed as 7.5. The analyst
realized that there was a calculation error and that the sample standard deviation
should have been 15. The 95% confidence interval estimate after incorporating this
correction will be

a. 18.5 ± 15
b. 37 ± 15
c. 37 ± 5
d. 18.5 ± 5

32. The p–value of a sample is the probability of realizing a sample with

a. at most as much evidence in favor of the null hypothesis as the sample actually
observed
b. at least as much evidence in favor of the alternative hypothesis as the
sample actually observed
c. at most as much evidence in favor of the alternative hypothesis as the sample
actually observed
d. at least as much evidence in favor of the null hypothesis as the sample actually
observed
Questions 33 to 35

A new online auction site specializes in selling automotive parts for classic cars. The founder
of the company believes that the price (in USD) received for a particular item increases with
its age (i.e., the age of the car on which the item can be used in years) and with the number of
bidders. A part of the multiple regression output based on 25 observations obtained using a
package is shown below. Assume that the standard regression model assumptions hold.
Regression Coefficients
                     Estimate    SE        t-value   p-value
Constant             -1242.99    331.204   -3.7529   0.0010
Age of Item          75.017      10.65     7.0459    0.0000
Number of Bidders    13.973      10.44     1.3380    0.0400
R-square = 0.83

33. Which one of the following statements is not a correct interpretation of the above
output?

a. We can conclude that the slope parameter for the number of bidders is
significantly different from 0, at 5% significance level, but cannot conclude
the same at 1% significance level
b. Every unit increase in age is associated with a 75.071 USD increase in
price on an average
c. The degrees of freedom for the t-distribution used for testing the slope
parameters is 22
d. The model is unable to explain 17% of the variability in prices

34. As per the model, the point estimate for the expected price for an item aged 150 years,
when the number of bidders is 10 will be

Solution:

-1242.99 +75.017 *150 + 13.973*10=10149.29

35. Let β1 be the slope for the variable ‘Age of the item’. To test the hypothesis
H0: β1 ≥ 80 vs H1: β1 < 80,
the P-value should be computed as
a. P(t ≤ (75.017 − 80)/√10.65), where t follows t-distribution with 22 degrees of freedom
b. P(t ≤ 75.017/10.65), where t follows t-distribution with 22 degrees of freedom
c. P(t ≤ (75.017 − 80)/10.65), where t follows t-distribution with 22 degrees of freedom
d. P(t ≥ (75.017 − 80)/10.65), where t follows t-distribution with 22 degrees of freedom

Solution:
Correct answer is c (test statistic = (estimate − 80)/SE; lower tail because H1: β1 < 80)

36. Which one of these statements is not true about regression diagnostics?

a. While looking for violation in linearity or constant variance assumption, it is more
appropriate to look at plots based on standardized residuals than just residuals.
b. If the scatter in the normal probability plot is not close to a straight line then it is
an indication of deviation from the normality assumption
c. It is appropriate to interpret the P-values only after the model assumptions have
been checked through residual diagnostics and are verified to hold.
d. To construct and interpret prediction intervals, verifying model assumptions is not
a necessary prerequisite.

Questions 37 to 40
A multiple regression study was conducted to relate the current annual salaries (in USD) of
200 individuals to their years of experience and education levels. The education level is coded
as 1 if education is up to high school, 2 if it is college but not graduated, 3 if completed college
with an undergraduate degree and 4 if completed college with a graduate degree or higher. The
estimation results for the regression model, which yielded an R-square value of 88.6% are
shown in the table below.

                    Estimate    SE        t-value   P-value
Constant            28885.90    1574.70   18.34     < 0.0001
Years Experience    2013.65     85.41     23.58     < 0.0001
Education2          5682.82     1465.92   3.88      < 0.0001
Education3          18146.72    1431.39   12.68     < 0.0001
Education4          26122.39    2244.11   11.64     < 0.0001

37. Which one of the following conclusions can be reached based on the information
provided?

a. If education level had been used as a numeric variable instead of as a categorical
variable, the model would have yielded a lower R-square value than 88.6%
b. Every additional year of experience will lead to an increase in salary of USD 2013.65
on an average
c. An individual with two years of experience and education level 3 is estimated to
earn on an average USD 7975.6 less than an individual with two years of
experience and education level 4.
d. An individual with education level 3 is estimated to earn about USD 7975.6 less
than an individual with education level 4.

38. A 95% symmetric prediction interval for the salary for an individual with certain
given number of years of experience and education level is computed from the model
to be [41607.2, 72594.6]. The point estimate for the expected salary for this individual
is
Solution:

(41607.2+72594.6)/2= 57100.9

39. A 95% symmetric prediction interval for the salary for an individual with certain
given number of years of experience and education level is computed from the model
to be [41607.2, 72594.6]. Which one of the following statements is true?

a. The estimated margin of error for this interval is more than $30000.
b. A 90% symmetric prediction interval for the same years of experience and
education level will have a larger margin of error.
c. The estimated margin of error for this interval is more than $14000 and
$15000.
d. The estimated margin of error for this interval is more than $15000 and
$16000

40. Based on the estimated model, the predicted salary for an individual with 2 years of
experience and education level up to high school is
Solution:

28885.9 + 2013.65*2= 32913.2

41. In a certain premium restaurant, a customer can either order à la carte or buffet. In the
previous month, 30% of the customers who visited the restaurant ordered à la carte and
the remaining ordered buffet. Of those who ordered à la carte, 46% were men and the
remaining were women. Also, 60% of all customers in the previous month were
women and the remaining were men. The restaurant is planning to send a free “buffet
coupon” to one randomly chosen male customer of the previous month. What is the
probability that the randomly chosen male customer had buffet in the previous month?

Solution:

W=Women, M=Men, A= à la carte, B=buffet
Given: P(W)=0.6, so P(M)=.4, P(A)=.3, P(M|A)=.46
We want P(B|M) = P(B and M)/P(M)
P(M)= P(B and M) + P( A and M)
So, P(B and M)= P(M) – P(A and M)=P(M)-P(A)*P(M|A)
= .4 - .3 * .46 = .262
So, P(B|M) = P(B and M)/P(M)= .262/.4= 0.655
42. Suppose that the size of each claim arising under a certain mediclaim insurance policy
can be modeled as a normal distribution with a mean of ₹ 4000 and a standard
deviation of ₹ 600. A preliminary investigation of a particular claim reveals that its size
will be at least ₹ 3400. Given this information, what is the probability that the size of
this claim will be greater than ₹ 4000?
Solution:
We want P(X > 4000 | X > 3400)
P (X > 4000) = 1-NORM.DIST(4000,4000,600,1) = 0.5
P(X>3400) = 1-NORM.DIST(3400,4000,600,1) = 0.8413
P (X > 4000 | X > 3400) = P (X > 4000) / P(X > 3400) = 0.5 / 0.8413 = 0.5943
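
A minimal Python check of this conditional probability, using the standard library's `NormalDist` (variable names are illustrative). Since the event X > 4000 is contained in X > 3400, the joint probability in the numerator is just P(X > 4000).

```python
from statistics import NormalDist

claim = NormalDist(mu=4000, sigma=600)

p_gt_4000 = 1 - claim.cdf(4000)   # P(X > 4000) = 0.5
p_gt_3400 = 1 - claim.cdf(3400)   # P(X > 3400) = P(Z > -1)

conditional = p_gt_4000 / p_gt_3400
print(round(conditional, 4))  # 0.5943
```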

43. A recruitment agency, as a part of its social media campaign, places an advertisement in one of 3
online portals with probabilities of 0.2, 0.3 and 0.5 respectively. The probability that
the recruitment agency gets an enquiry from an advertisement in the first online
portal is 0.001. The probabilities of getting an enquiry for an advertisement in the
second and third portals are 0.002 and 0.004 respectively. Given that the agency has
just received an enquiry, calculate the probability that it came from an advertisement in
the first online portal.

Solution:

Let E = enquiry received.

Portal               P(portal)    P(E | portal)
First portal (A)     0.2          0.001
Second portal (B)    0.3          0.002
Third portal (C)     0.5          0.004

P(A | E) = (0.2*0.001) / (0.2*0.001 + 0.3*0.002 + 0.5*0.004) = 0.0002/0.0028
= 0.071428571

44. In a particular residential colony it is estimated that there is a 25% chance that any
specified house will be burgled over a period of two years, independently for each
house. There are nine houses in the colony. Calculate the probability that two or
more houses will be burgled over the period of two years.
Solution:

X ~ Bin(9 , 0.25)
P(X < =1) = BINOM.DIST(1, 9 , 0.25 , 1) = 0.3003 ~ 0.30
P(X > = 2) = 1 - P(X < 2) = 1 - P(X < = 1) = 0.7

45. In an organization, an HR officer wants to recruit a candidate to fill a single job vacancy.
5 candidates appear for the interview. Out of the 5 candidates, exactly 2 candidates are
suitable for the job and the other 3 are not suitable. The HR officer interviews the
candidates in random order until he finds one candidate who is suitable for the job. As
soon as he finds one suitable candidate he stops interviewing any more candidates.
Calculate the probability that the HR officer conducts either exactly 1 interview or
exactly 3 interviews.
Solution:
Let X= number of interviews conducted
Possible values of X are 1,2,3,4 (note 5 is not possible)
P(X=1) = 2/5
P(X=2) = 3/5*2/4=6/20=3/10
P(X=3)= 3/5*2/4*2/3=1/5
P(X=4)= 3/5*2/4*1/3*1=1/10
P(X=1)+P(X=3)= 2/5+1/5=3/5=.6
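
The distribution of X can also be verified by brute force: list all 5! equally likely interview orders and record the position of the first suitable candidate. The sketch below does this in Python; the candidate labels are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# S = suitable, U = unsuitable (labels invented for illustration)
candidates = ["S1", "S2", "U1", "U2", "U3"]
orders = list(permutations(candidates))  # 120 equally likely orders

def interviews_needed(order):
    """Position of the first suitable candidate in the given order."""
    for position, name in enumerate(order, start=1):
        if name.startswith("S"):
            return position

counts = {}
for order in orders:
    k = interviews_needed(order)
    counts[k] = counts.get(k, 0) + 1

pmf = {k: Fraction(v, len(orders)) for k, v in sorted(counts.items())}
answer = pmf[1] + pmf[3]
print(pmf, answer)
```

The enumeration gives probabilities 2/5, 3/10, 1/5 and 1/10 for X = 1, 2, 3, 4, matching the sequential multiplication above, and never produces X = 5.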

46. A market research analyst is studying the amount of money spent by customers for
online-shopping, during a month. He believes that the mean of the amount of money
spent by the customers during the month for online-shopping is Rs 3800 and the
standard deviation is Rs. 1500. Assuming Central Limit Theorem approximation, the
probability that the total amount spent by 100 customers during the month exceeds Rs.
400000, is:
Solution:
Let T= total amount spent by 100 customers
By Central Limit Theorem, T follows approximately N(mean = 3800*100 = 380000, sd = 1500*sqrt(100) = 15000)
1-norm.dist(400000,380000,15000,1) = 0.0912
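
A short Python sketch of the CLT approximation for a total: the mean scales with n, while the standard deviation scales with sqrt(n). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 100, 3800, 1500

# Total of n independent amounts: mean n*mu, standard deviation sigma*sqrt(n).
total = NormalDist(mu=n * mu, sigma=sigma * sqrt(n))
p_exceeds = 1 - total.cdf(400_000)

print(round(p_exceeds, 4))  # 0.0912
```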

47. Which one of these is not a statement of independence between events A and B?
a) P(A and B) = P(A) x P(B)
b) P(A | B) = P(A)
c) P(B | A) = P(B)
d) P(A or B) = P(A) + P(B)
48. In the estimation of proportions, which one of the following will reduce the margin of
error (at 95% confidence) by 50% ?
a) Decrease the sample size from n=400 to n = 200

b) Increase the sample size from n=200 to n=400

c) Increase the sample size from n=200 to n = 800

d) Decrease the sample size from n = 800 to n = 200

Note: Margin of error is inversely proportional to the square root of the sample size, so
halving the margin of error requires four times the sample size

49. A few weeks prior to the 2018 Karnataka assembly elections, a popular opinion poll
estimated a 57% vote share for the Congress party (i.e. the percentage of voters who
will vote for Congress party) and reported an estimated standard error of 0.005. The
survey used simple random sampling (SRSWOR), and the voting population of
Karnataka can be assumed to be large. The sample size used in the survey was not
explicitly reported but a PGPX student from IIMA, who had learnt concepts of
estimation of proportions, succeeded in deciphering the sample size used in the
survey. The sample size used in the survey was:
(Write the answer rounded to the next nearest whole number)
Solution:

standard error is computed as sqrt(0.57*(1-0.57)/n) = 0.005

hence n = 0.57*(1-0.57)/0.005^2 = 9804
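
The inversion of the standard error formula can be sketched in Python. Exact fractions are used here (an implementation choice of this example) so that rounding up to a whole number is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil

p_hat = Fraction("0.57")   # estimated vote share
se = Fraction("0.005")     # reported standard error

# se = sqrt(p_hat * (1 - p_hat) / n)  =>  n = p_hat * (1 - p_hat) / se^2
n = p_hat * (1 - p_hat) / se**2
sample_size = ceil(n)

print(sample_size)  # 9804
```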

50. Let p denote the probability of heads in a coin toss. To test H0 : p ≤ 1/3 vs H1 : p >
1/3, the following test rule is determined: “Reject H0 if 4 independent tosses of the
coin result in 2 or more heads”. The probability of Type 1 error (when p=1/3) is:
(Write the answer rounded and correct up to 4 decimal places)
Solution:

Type 1 error = 1 - BINOM.DIST(1,4,1/3,1) = 0.4074

The correct answer is: 0.4074

51. Which one of the following issues does “standard error” address?
a) Sampling error
b) Non- response
c) Coverage Error
d) Measurement error
52. A random sample of 30 orders out of all pizza orders from a local restaurant during a
week, resulted in the average cost and standard deviation of the cost in the sample to be
Rs. 800 and Rs. 500, respectively. To estimate the average cost of pizza during the
week, the estimated margin of error for the 99% t-distribution based confidence
interval, is:
(Write the answer rounded and correct up to 2 decimal places)
Solution:

Margin of error = T.INV(1-0.01/2, 30-1) * 500/SQRT(30) = 251.622

The correct answer is: 251.622

53. Which one of the following statements is FALSE?


a) Type 1 error happens when we reject a true null hypothesis.
b) For population proportion, the CLT-based 90% symmetric confidence interval
centered at the sample proportion, is shorter than the corresponding 95%
confidence interval.
c) Type 2 error happens when we reject a false null hypothesis.
d) In Simple Random Sampling, every unit in the sampling frame has an equal
probability of getting chosen in the sample.

54. The sports department at a large university wants to estimate the mean number of hours
students spend per day on athletic activities, based on a simple random sample. What
should be the sample size so that the university can be 99% confident that the sample
mean is within 0.5 hours of the population mean?
Assume that the population standard deviation is known to not exceed 2.3 hours, the
number of students in the university is very large, and that the CLT approximation
holds for the sample mean.
(Write answers rounded to the next nearest whole number)
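Solution:
n >= (z_(alpha/2) * σ / E)^2 = (NORM.S.INV(0.995) * 2.3 / 0.5)^2 = (2.5758 * 2.3 / 0.5)^2 = 140.39
Rounding up to the next whole number, n = 141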
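
A minimal Python sketch of the standard sample-size calculation for Question 54, assuming the worst-case standard deviation of 2.3 hours and the normal (CLT) critical value; variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

confidence, sigma_max, half_width = 0.99, 2.3, 0.5

# Two-sided critical value: z such that P(-z < Z < z) = 0.99.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Require z * sigma / sqrt(n) <= half_width, i.e. n >= (z * sigma / half_width)^2.
n_required = ceil((z * sigma_max / half_width) ** 2)

print(round(z, 4), n_required)  # 2.5758 141
```

Using the upper bound on the population standard deviation makes the resulting sample size conservative.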

55. A study was conducted to estimate the prevalence (p) of Osteoporosis among a large
population of women aged 80 or older in a country. In other words, p= the proportion
of women in that age group in the country who had osteoporosis. The study was based
on a simple random sample and reported a 99% confidence interval [0.55, 0.75] for p,
using normal approximation to the distribution of sample proportion. Based on the
given 99% confidence interval, with which one of these statements would you agree?
a) Prevalence is more than 65%

b) Prevalence is under 20%

c) Prevalence is under 20%

d) Prevalence is more than 50%

56. An article in a prestigious science journal argues that the adult urban population is
increasingly skimping on their sleep. A researcher undertook the study to estimate the
mean sleeping hours of adult urban people. He takes a random sample of 70 adult
residents of a metro city and concludes with 90% confidence that the mean sleep time
of adult residents in this metro city lies between 4.62 hours and 6.38 hours. Later, the
researcher considers it more appropriate to use a 95% confidence interval instead of the
90% confidence interval. Assuming that he uses the t-distribution based confidence
intervals:

The upper limit of the 95% confidence interval is:

The lower limit of the 95% confidence interval is:

(Write answers correct and rounded up to 3 decimal places)

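Solution:
Sample mean = (4.62 + 6.38)/2 = 5.5, and the 90% margin of error = (6.38 - 4.62)/2 = 0.88
SEhat(xbar) = 0.88 / T.INV(0.95, 70-1) = 0.5278
95% margin of error = T.INV(0.975, 70-1) * 0.5278 = 1.053
95% CI = [5.5 - 1.053, 5.5 + 1.053] = [4.447, 6.553]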

Questions 57 to 63

A recruitment agency wants to recruit a candidate for ABC company for the post of
“data analyst”. The agency gave an advertisement on their portal and received several
applications which were suitable for the post. The agency had also asked all the
candidates about their desired annual salary (in lacs of Rupees) along with the
curriculum vitae. ABC company believes that the desired annual salary of a candidate
depends upon various factors, namely Education (defined as 1 if the candidate had
specialization in Data Science, 0 otherwise), Work Experience (in years) and
Proficiency (defined as 1 if the candidate knows multiple computer languages, 0
otherwise). Applications were received from 85 candidates after the advertisements. A
multiple linear regression model was fitted with desired annual salary (Y in lacs of
Rupees) as the dependent variable, as a function of Education, Work Experience and
Proficiency, which resulted in the following output: (Assume that the standard
regression assumptions hold true.)

Predictor          Estimate   Standard Error
Intercept          28.25      13.24
Education          1.23       1.12
Work Experience    5.65       1.62
Proficiency        13.64      1.54

R² = 82.17%

57. The degrees of freedom associated with the error sum of squares, is
Solution:
degrees of freedom = 85 - 4 = 81
The correct answer is: 81

58. Which one of these statements is the correct interpretation for the estimated coefficient
of Work Experience?
a) Controlling for the other variables, one year increase in Work Experience is
associated with an increase of Rs. 565000 in desired annual salary
b) Controlling for the other variables, one year increase in Work Experience is
associated with an increase of Rs. 5.65, on an average in desired annual salary.
c) Controlling for the other variables, one year increase in Work Experience leads to
an increase of Rs. 565000, on an average in desired annual salary.
d) Controlling for the other variables, one year increase in Work Experience is
associated with an increase of Rs. 565000, on an average in desired annual
salary.

59. Which one of these statements is not correct?


a) The fitted model explains 82.17% of the variation in the desired annual salaries in
the data.
b) The value of the t statistic to test the null hypothesis that Work Experience has no
effect, is 3.487.
c) Controlling for the other variables, Proficiency has a positive linear association
with the desired annual salary of the candidate.
d) As per the model, the average desired salary of candidates with knowledge of
multiple computer languages is Rs.154000 more than average desired salary of
candidates without knowledge of multiple computer languages.
60. Suppose the agency randomly chose two candidates from the 85, namely Nidhi and
Jaya. Both of them have specialization in Data Science and have Proficiency in
multiple computer languages. However, Nidhi has 4 years of Work Experience and
Jaya has 6 years of Work Experience. As per the model, what is the difference between
their desired annual salaries (in lacs of Rupees) on average?
(Enter full numeric answer without rounding off)
Solution:
As per regression formula,

For Nidhi: 28.25 + 1.23 + 5.65*4 +13.64 = 65.72


For Jaya: 28.25 + 1.23 + 5.65*6 + 13.64 = 77.02
Difference = 11.3
The difference between their desired annual salary is Rs. 1130000.
The correct answer is: 1130000
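
The arithmetic in question 60 can be checked with a short script. This is a minimal sketch that only plugs the published coefficients into the fitted equation; the names `COEF` and `predicted_salary` are illustrative and not part of the original material.

```python
# Estimated coefficients from the regression output (salary in lacs of Rupees).
COEF = {
    "intercept": 28.25,
    "education": 1.23,
    "experience": 5.65,
    "proficiency": 13.64,
}


def predicted_salary(education, experience, proficiency):
    """Fitted mean desired annual salary, in lacs of Rupees."""
    return (
        COEF["intercept"]
        + COEF["education"] * education
        + COEF["experience"] * experience
        + COEF["proficiency"] * proficiency
    )


# Both candidates have Education = 1 and Proficiency = 1.
nidhi = predicted_salary(education=1, experience=4, proficiency=1)
jaya = predicted_salary(education=1, experience=6, proficiency=1)
difference = jaya - nidhi

print(round(nidhi, 2), round(jaya, 2), round(difference, 2))
```

Because the two candidates differ only in Work Experience, the difference reduces to 5.65 * (6 - 4); the intercept and the two dummy-variable terms cancel.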

61. Construct the symmetric t-distribution-based 95% confidence interval for the effect of
Education on the desired annual salary (in lacs of Rupees).
The lower limit of the 95% confidence interval is :
The upper limit of the 95% confidence interval is :
(enter the answers correct and rounded to 3 decimal places)
Solution:

Lower limit = 1.23 - T.INV(1-0.05/2 , 81) * 1.12 = -0.9984 ~ -0.998

Upper limit = 1.23 + T.INV(1-0.05/2 , 81) * 1.12 = 3.4584 ~ 3.458

62. Is this statement true or false?

"The correlation coefficient of Proficiency with desired annual salary in the data is not
more than 91%"
a) True
b) False

63. As per the model, the average desired annual salary (in lacs of Rupees) of a
candidate with specialization in Data Science, no work experience and no knowledge of
multiple computer languages is
(Enter the full numeric answer without rounding off)
Solution:

Average desired annual salary
= 28.25 + 1.23*Education + 5.65*Work Experience + 13.64*Proficiency
= 28.25 + 1.23*1 + 5.65*0 + 13.64*0 = 29.48

The correct answer is: 2948000

Question 64 to 66

The CEO of JM Pvt. Ltd. had been observing over the past several months that at least 30%
of the employees have been coming to the office during weekends to complete their
work. He had some reasons to doubt that bad quality of sleep might be affecting their
ability to complete their work on time. He thought of an idea to design a space for the
employees where they can have a power nap after the lunch hour. Before introducing
these changes, he wanted to conduct a statistical test to determine whether there is
enough evidence to support this change. He carried out an experiment on a sample of
100 randomly chosen employees. In that experiment, he gave the chosen
employees an option of taking a power nap after lunch during the upcoming 5
weekdays, and subsequently checked what percentage of them came to the office during
the weekend to finish their work. He observed that 20% of employees in the sample
came to the office during the subsequent weekend to finish their work. Based on this
data, he wanted to test the following hypothesis: H0 : p≥ 0.3 and H1: p<0.3, where p is
the proportion of employees in the company who will come to the office during
weekends to complete their work after implementation of the power nap initiative. He
wants to carry out the test at 5% level of significance.
(Assume Normal approximation)
64. The P-value that the CEO needs to use for this test, is
Solution:

SE =SQRT(0.3*(1-0.3)/100) =0.0458

P-value =NORM.DIST(0.2,0.3,0.0458,1) = 0.0145

The correct answer is: 0.0145
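
The same P-value can be reproduced outside Excel. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for `NORM.DIST`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.3      # boundary value of p under H0
p_hat = 0.2   # observed sample proportion
n = 100       # sample size

# Standard error of the sample proportion, evaluated at the null value p0.
se = sqrt(p0 * (1 - p0) / n)

# Left-tailed test (H1: p < 0.3): the P-value is the lower-tail probability.
z = (p_hat - p0) / se
p_value = NormalDist().cdf(z)

print(round(se, 4), round(z, 3), round(p_value, 4))
```

The script keeps the unrounded standard error, so its last digit can differ slightly from a hand calculation that uses SE = 0.0458; both give a P-value of about 0.0145, well below the 5% level.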

65. Suppose the CEO incorrectly computed the P-value to be 0.025, then which one of the
following conclusions would follow?

a) Reject H0 at α = 0.01 but not at α = 0.05.


b) Reject H0 at α = 0.05 and α = 0.01 but not at α = 0.1.
c) Reject H0 at α = 0.05 and α = 0.1 but not at α = 0.01.
d) Reject H0 at α = 0.01 , α = 0.05 and α = 0.1.
66. Which one of these statements is correct?

a) The power of the test when p=0.1 is higher than the power of the test when
p=0.2.
b) The power of the test when p=0.2 is higher than the power of the test when p=0.1.
c) Type I error occurs when the CEO chooses to go ahead with the plan.
d) Type II error occurs when the CEO chooses to go ahead with the plan.
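
The power comparison in question 66 can be explored numerically. This is a sketch under the same Normal approximation as question 64, assuming the test rejects H0 when the sample proportion falls below a cut-off fixed at the 5% level; the helper name `power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.3, 100, 0.05
std_normal = NormalDist()

# Reject H0 when the sample proportion is below this cut-off
# (inv_cdf(alpha) is negative, so the cut-off lies below p0).
cutoff = p0 + std_normal.inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)


def power(p_true):
    """P(reject H0) when the true proportion is p_true."""
    se_true = sqrt(p_true * (1 - p_true) / n)
    return std_normal.cdf((cutoff - p_true) / se_true)


power_at_01 = power(0.1)
power_at_02 = power(0.2)
print(round(cutoff, 4), round(power_at_01, 4), round(power_at_02, 4))
```

The further the true p lies below 0.3, the more likely the sample proportion is to fall under the cut-off, so the power at p = 0.1 exceeds the power at p = 0.2.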

Question 67 and 68

Zeeka space agency, responsible for protecting a planet from asteroid collisions, developed
space-to-space missiles to be loaded in satellites. These missiles are meant to hit and
destroy the asteroid.
Suppose p is the probability of a missile hitting a targeted asteroid. The agency planned
a 2-step missile trial for testing H0: p ≤ 0.1 vs H1: p > 0.1.
At the first step, 12 missiles will be fired independently. If three or more out of the 12
missiles hit the target then H0 is rejected and the study is terminated .
On the other hand, if less than 3 out of the 12 missiles hit the target then an additional
12 missiles will be fired in the second step. If a total of five or more out of the 24
missiles hit the target in the two steps, then H0 is rejected. If less than five out of the 24
missiles hit the targets in the 2-step missile trial, we fail to reject the H0.
(Do the calculation based on exact sampling distribution i.e., binomial distribution)

67. Calculate the probability of Type I error for the two-step testing procedure, when
p=0.1.
(enter the answer correct and rounded up to 3 decimal places)
Solution:

For calculating the Type I error, we take p = 0.1:

P(Type I error)
= 1 - BINOM.DIST(2,12,0.1,TRUE)
+ BINOM.DIST(0,12,0.1,FALSE) * (1 - BINOM.DIST(4,12,0.1,TRUE))
+ BINOM.DIST(1,12,0.1,FALSE) * (1 - BINOM.DIST(3,12,0.1,TRUE))
+ BINOM.DIST(2,12,0.1,FALSE) * (1 - BINOM.DIST(2,12,0.1,TRUE))
= (1 - 0.8891) + 0.2824*0.0043 + 0.3766*0.0256 + 0.2301*0.1109
= 0.14727 ~ 0.147

The correct answer is: 0.147
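
The two-step rejection probability can also be computed directly from the binomial distribution. The sketch below uses only `math.comb`; the function names `binom_pmf`, `binom_sf` and `reject_probability` are illustrative, and the default arguments encode the rule stated above (12 + 12 missiles, cut-offs 3 and 5).

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def reject_probability(p, n1=12, n2=12, cut1=3, cut_total=5):
    """P(reject H0) under the two-step rule when the hit probability is p."""
    # Step 1: reject immediately if at least cut1 of the first n1 missiles hit.
    prob = binom_sf(cut1, n1, p)
    # Step 2: otherwise the second batch must bring the total up to cut_total.
    for x1 in range(cut1):
        prob += binom_pmf(x1, n1, p) * binom_sf(cut_total - x1, n2, p)
    return prob


type_one_error = reject_probability(0.1)
print(round(type_one_error, 3))
```

Evaluating `reject_probability` at a value of p above 0.1 gives the power of the same rule at that alternative.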


68. Suppose the test rule is modified, by increasing the number of missiles fired in the first
step, but without changing the number of missiles fired in the second step or without
changing any of the cut-offs in the earlier testing rule. Then, the power of the resulting
test, in comparison to the earlier test rule, will increase.
Select one:
a) True
b) False

69. The emergency ward of a multispeciality hospital wants to estimate the average waiting
time for its patients before starting a treatment. A researcher observes the waiting-times
in the emergency ward for 40 randomly chosen patients. The average waiting time in
the sample is 5 minutes with a sample standard deviation of 1 minute. Using the t
distribution approximation, he constructed a 95% symmetric confidence interval.
The lower limit of the 95% confidence interval is:
The upper limit of the 95% confidence interval is:
(Write answers correct and rounded to 2 decimal places)
Solution:

n = 40

xbar = 5

SD = 1

SE(xbar) = 1 / sqrt(40) = 0.158114

95% CI

t_(alpha/2 , n-1) = T.INV(1-0.05/2,40-1) = 2.0227

Lower limit = 5 - 2.0227 * 0.158114 = 4.6802 ~ 4.68

Upper limit = 5 + 2.0227 * 0.158114 = 5.3198 ~ 5.32
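
For readers without Excel, the interval can be reproduced with the Python standard library alone. Since the standard library has no Student's t quantile function, the sketch below approximates `T.INV` by integrating the t density with Simpson's rule and inverting by bisection; this is an illustrative numerical substitute, not the method used in the original solution, and the helper names are assumptions.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(t, df, steps=2000):
    """CDF for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Quantile for prob > 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, xbar, sd = 40, 5.0, 1.0
se = sd / sqrt(n)
t_crit = t_quantile(0.975, n - 1)
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(round(t_crit, 4), round(lower, 2), round(upper, 2))
```

The interval is symmetric about the sample mean, so the two limits always sum to twice the mean (here 10), which is a quick check on hand calculations.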

70. Which one of these is a good suggestion for improving the accuracy of estimation of
the population parameter?
a) Decrease the confidence level at a given sample size.

b) Increase the confidence level at a given sample size.

c) Decrease the sample size at a given confidence level

d) Increase the sample size at a given confidence level


71. Which one of these is not a statement about the statistical methodology used to address
testing or estimation?
a) Type I error is 5%.

b) Sample size used for estimation is 15.

c) Sample proportion is 0.1.

d) The estimator is unbiased.

72. Observing data over a long period of time, the mean population score for an entrance
test to a college is known to be 450 and the population standard deviation is known to
be 100. Suppose that an external agency plans to draw a sample of 25 randomly chosen
participants who recently received their test score, and plans to use the sample mean
test score as the estimator of the population mean. Then,
the expected value of the estimator is
the standard error of the estimator is :
(enter the full numeric answer without rounding off)
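Solution:

Expected value of the estimator = population mean = 450

Standard error of the estimator = 100 / SQRT(25) = 20
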
73. When asked questions concerning personal hygiene, people commonly lie. This is an
example of:

Select one:

a) sampling error
b) coverage error
c) non-response error
d) measurement error

74. During a countrywide promotional campaign for a new soft drink, a company places a
scratch label on every bottle. 10% of the bottles have the winning prize label. Hoping
to win a prize, a child decides to try a bottle of new drink each day for one full week.
What is the probability that the child will win a prize at least on one of the days?

(enter answer correct and rounded up to 3 decimal places)


Solution:

n = 7, p = 0.1

P(X ≥ 1) = 1 – P(X = 0) = 1 - BINOM.DIST(0,7,0.1,TRUE)

= 1 – 0.4783 = 0.5217 ~ 0.522

The correct answer is: 0.522
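
The complement rule used here is easy to verify numerically. This is a minimal sketch assuming, as the solution does, that each day's bottle is an independent draw with a 10% chance of winning; the variable names are illustrative.

```python
p_win = 0.1   # probability that a single bottle carries a winning label
days = 7      # one bottle per day for a week

# P(no win on any day) = (1 - p)^n for independent draws.
p_no_win = (1 - p_win) ** days

# Complement rule: P(at least one win) = 1 - P(no win at all).
p_at_least_one = 1 - p_no_win

print(round(p_no_win, 4), round(p_at_least_one, 3))
```

Working through the complement avoids summing the seven separate cases of exactly 1, 2, ..., 7 wins.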

75. In an Entrance examination for graduation in Mathematics and Statistics, of the 120
students who appeared for the examination, 65 passed in Mathematics, 75 passed in
Statistics and 35 passed in both the tests. If a student is selected at random then the
probability that the student has failed in both the tests is:

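Solution:

P(passed in at least one test) = (65 + 75 - 35) / 120 = 105/120 = 0.875

P(failed in both tests) = 1 - 0.875 = 0.125
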
76. In a study analyzing the salaries of employees from a particular industry, to
which of these does the central limit theorem not apply?

Select one:

a) Distribution of total salary of 100 randomly chosen employees.


b) Distribution of average salary of 100 randomly chosen employees.
c) The histogram of salaries for 100 randomly chosen employees.
d) Distribution of proportion of employees among 100 randomly chosen employees,
with annual salary greater than Rs. 200000.

77. Which issue is definitely present and common to all of the survey designs below (1, 2
and 3)?

1. A business journal conducts a census of all its readers to predict the next likely
prime minister.
2. To determine the percentage of people in favour of an education policy, a radio
talk show surveys a random sample of its listeners for their views.
3. A police detective, interested in determining the extent of drug usage among
youth, studies a randomly chosen sample of high school students.
Select one:

a) Coverage error
b) Measurement error
c) Non response due to sensitive question
d) Sampling error

78. A curious professor who teaches a data analysis course at a premier management
institute used data on students from past batches and built a simple linear regression
model, with the explanatory variable being the average quiz score of a student (i.e.
average taken over multiple quizzes taken by a student) and the dependent variable
being the final exam score obtained by the student in the course. For an average quiz
score of 5, he calculated the 95% prediction interval to be [15, 25]. Which one of the
following statements is the correct interpretation of the interval?

Select one:

a) the professor can expect 95% of the students in a new batch to obtain a final exam
score between 15 and 25.
b) the professor can expect that the mean final exam score of students from a new
batch to lie between 15 and 25, with 95% confidence.
c) the professor can expect that the mean final exam score of students from a new
batch with an average quiz score of 5, to lie between 15 and 25, with 95%
confidence.
d) the professor can expect 95% of the students in a new batch with an average
quiz score of 5, to obtain a final exam score between 15 and 25.
