
Introduction to Probability and Statistics (IPS)

Instructors: Profs. Banerjee, Bhattacharya, Mukhoti and Ranjan


(PGP-I / TERM-I / AY 2018-19)
Endterm - September 18, 2018
Total points - 100
Duration: 3 hours
Allowed: Only a calculator. This is a closed-book, closed-notes examination.
Instruction: Attempt all questions. All answers should be properly justified, except for MCQs. For non-MCQ
problems, no marks can be claimed without proper justification.

Full Name:

Roll No.:

Section:

Question   Points   Score
1          40
2          5
3          9
4          6
5          7
6          15
7          10
8          8
Total:     100
Multiple Choice type questions
1. (40 points) MCQ (20 × 2 = 40):
(1) Assume the amount of soft drink I consume on any given day is independent of the consumption
on any other day, and is normally distributed with µ = 13 oz and σ = 2. If I currently have two
six-packs of 16-oz bottles, what is the probability that I need to buy more soft drink before the end
of the second week (14 days)?
(a) less than 0.0
(b) less than 0.16 CORRECT
(c) 0.2357
(d) none of the above
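Two six-packs of 16-oz bottles hold 12 × 16 = 192 oz. Let X be the total amount consumed in 14 days. Then,
X ∼ N (14 × 13, 14 × 2²) = N (182, 56). Required probability is
1 − P (X ≤ 192) = 1 − Φ((192 − 182)/√56) = 1 − Φ(1.336) = 1 − 0.9092 = 0.0908. Correct option is (b).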
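The tail probability above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the exam.

```python
from statistics import NormalDist

# Values from question (1); variable names are illustrative.
daily_mean, daily_sd, days = 13.0, 2.0, 14
stock_oz = 2 * 6 * 16  # two six-packs of 16-oz bottles = 192 oz

# A sum of independent normals is normal: means add and variances add.
total = NormalDist(mu=days * daily_mean, sigma=(days * daily_sd**2) ** 0.5)

z = (stock_oz - total.mean) / total.stdev
p_need_more = 1 - total.cdf(stock_oz)  # P(X > 192)
print(f"z = {z:.3f}, P(X > 192) = {p_need_more:.4f}")
```

Because z is not rounded before the lookup, the software value may differ from the table-based figure in the last decimal place.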

(2) Let X denote the number of Canon digital cameras sold during a particular week by a certain store.
The pmf of X is

x 0 1 2 3 4
pX (x) 0.1 0.2 0.3 0.25 0.15

Sixty percent of all customers who purchase these cameras also buy an extended warranty. Let Y
denote the number of purchasers during this week who buy an extended warranty. The expected
number of warranties sold is:
(a) 0.6 ∗ X
(b) 1.29 CORRECT
(c) 2.4
(d) none of the above
E(Y ) = E(E(Y | X)) = E(0.6X) = 0.6 × 2.15 = 1.29. Correct option is (b).
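The iterated-expectation step can be verified by brute force. This sketch assumes, as the solution implicitly does, that each purchaser buys a warranty independently with probability 0.6, so that Y given X = x is Binomial(x, 0.6); the variable names are illustrative.

```python
from math import comb

pmf_x = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}
p_warranty = 0.6

# Shortcut: E(Y) = E(E(Y | X)) = 0.6 * E(X).
e_x = sum(x * p for x, p in pmf_x.items())
e_y_shortcut = p_warranty * e_x

# Direct check: sum k * P(Y = k | X = x) * P(X = x) over all (x, k).
# Assumption: Y | X = x ~ Binomial(x, 0.6).
e_y_direct = sum(
    p_x * sum(
        k * comb(x, k) * p_warranty**k * (1 - p_warranty) ** (x - k)
        for k in range(x + 1)
    )
    for x, p_x in pmf_x.items()
)
print(f"E(X) = {e_x:.2f}, E(Y) = {e_y_shortcut:.2f} (direct: {e_y_direct:.2f})")
```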

(3) The Central Limit Theorem:


(a) Requires some knowledge of the frequency distribution.
(b) Permits us to use sample statistics to make inference(s) about population parameters. CORRECT
(c) Relates the shape of a sampling distribution of the proportion to the proportion of the sample.
(d) Requires a sample to contain fewer than 30 observations

(4) Consider a sample of size 250, from a population with known standard deviation of 13.7, the mean
is found to be 112.4. The 95% confidence interval for the mean is :
(a) (110.702, 114.098) CORRECT
(b) (120.702, 224.098)
(c) (130.702, 234.098)
(d) (108.702, 116.098)

(5) For sample size n = 9025, sample proportion p̂ = 0.32, the 95% confidence interval for population
proportion p is:
(a) (0.3104, 0.3296) CORRECT
(b) (0.3004, 0.3196)
(c) (0.3204, 0.3396)
(d) (0.3004, 0.3296)
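The intervals in questions (4) and (5) both have the form estimate ± z × standard error. A minimal sketch of that calculation follows; `z_interval` is a hypothetical helper name, and the 95% quantile is taken from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(center, std_error, confidence=0.95):
    """Two-sided z-interval: center -/+ z_{alpha/2} * std_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * std_error
    return center - half_width, center + half_width


# Question (4): known sigma = 13.7, n = 250, sample mean 112.4.
mean_ci = z_interval(112.4, 13.7 / sqrt(250))

# Question (5): p_hat = 0.32, n = 9025.
prop_ci = z_interval(0.32, sqrt(0.32 * 0.68 / 9025))

print(tuple(round(v, 3) for v in mean_ci))
print(tuple(round(v, 4) for v in prop_ci))
```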

(6) Which of the following t-distributions would be expected to have the most area in the tails?
(a) Sample Mean = 0.83, df = 12.
(b) Sample Mean = 15, df = 19.
(c) Sample Mean =15, n = 19.
(d) Sample Mean = 8.3, n = 12. CORRECT

(7) If X1, X2, X3 are 3 observations from N (µ, σ²), then for testing H0 : µ = µ0 Vs H1 : µ > µ0,

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$ : (mark the correct statement)
(a) will not be symmetric about its mean
(b) will have unit variance
(c) will follow Z−distribution
(d) will not have finite V (T ) CORRECT

(8) An advertiser is believed to exaggerate claims about a company’s product, (high performance, larger
measurable average). An agency wants to prove that this advertiser’s claims are exaggerated. There
are data available. The correct hypothesis test will be:
(a) Two-tailed test
(b) Right-tailed test
(c) Left-tailed test CORRECT
(d) None of these

(9) Which of the following is correct for a testing of hypothesis problem if the null hypothesis is rejected
at 1% level of significance?
(a) The p-value of the test is more than 0.01.
(b) The null hypothesis can be rejected at 5% level of significance as well. CORRECT
(c) One has committed a Type-II error if the null hypothesis is in reality, true.
(d) P(Type-I error) + P(Type-II error) is definitely less than 0.01.

(10) When conducting a test about the population mean with sample size 15, using sample mean and
sample standard deviation, the cut-off is:
(a) Z-value
(b) t-value with df = n
(c) t-value with df = n + 1
(d) None of the above CORRECT; df should be n − 1.

(11) An increase in α, the level of significance, implies:


(a) An increase in the probability of the type I error to occur CORRECT
(b) A decrease in the probability of type I error to occur
(c) No change in any of the type I or type II error
(d) None of the above

(12) Which of the following is not a valid null-alternative hypothesis pair?


(a) H0 : µ ≤ 21 Vs. H1 : µ > 21
(b) H0 : µ = 21 Vs. H1 : µ > 21
(c) H0 : µ ≤ 21 Vs. H1 : µ = 22
(d) H0 : µ ≤ 21 Vs. H1 : µ = 21 CORRECT

(13) Suppose the sample proportion of students in a college who watch Game of Thrones is p̂, computed
using a sample of 100 students. Assuming the true population proportion is p, 0 < p < 1, the
standard error of p̂ is
(a) more than 0.5
(b) more than 0.05
(c) at most 0.5
(d) at most 0.05 CORRECT
Standard error is √(p(1 − p)/100) ≤ √(1/4)/10 = 0.05.
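The bound used above comes from completing the square in p(1 − p). A short derivation for n = 100:

```latex
\[
p(1-p) \;=\; \tfrac14 - \left(p - \tfrac12\right)^2 \;\le\; \tfrac14
\quad\Longrightarrow\quad
\mathrm{SE}(\hat p) \;=\; \sqrt{\frac{p(1-p)}{100}}
\;\le\; \frac{\sqrt{1/4}}{10} \;=\; \frac{0.5}{10} \;=\; 0.05 ,
\]
```

with equality only at p = 1/2.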

(14) The Student body managing the Pi-Shop wants to test whether the proportion (p) of students
opting for fat-free ice-cream in contrast to regular one is 30%. A random sample of 625 students
asked about their buying preferences resulted in the 95% confidence interval of p as (0.283, 0.356).
Which of the following statements is incorrect?
(a) The null hypothesis is rejected at 5% level of significance. CORRECT
(b) They failed to reject the null hypothesis at 5% level of significance.
(c) The sample proportion was observed to be approximately 0.32.
(d) They would even fail to reject the null hypothesis of p = 0.35 at 5% level of significance.

(15) Which of the following statements are not correct assumptions for developing pooled confidence
intervals and for testing hypotheses about the difference between two population means (µ1 − µ2 )?
(a) Both populations are normally distributed
(b) The samples selected from the two populations are independent random samples.
(c) The two population variances are equal (σ1² = σ2²).
(d) The degrees of freedom of the t distribution is n1 + n2 − 1. CORRECT

(16) Toys are entering the virtual world, and Mattel recently developed a digital version of its famous
Barbie. The average price of the virtual doll is reported to be $60. A competing product sells for an
average of $65. Suppose both averages are sample estimates based on independent random samples
of 25 outlets selling Barbie software and 20 outlets selling the competing virtual doll, and suppose
the sample standard deviation for Barbie is $14 and for the competing doll it is $12. The correct
hypothesis for testing the equality of average prices would be:
(a) H0 : µx ≥ µy vs. Ha : µx < µy
(b) H0 : µx − µy = 0 vs. Ha : µx − µy > 0
(c) H0 : µx − µy = 0 vs. Ha : µx − µy ≠ 0 CORRECT
(d) H0 : µx ≤ µy vs. Ha : µx > µy

(17) Which of the following is incorrect for a pooled t-test?


(a) The P-value is the tail probability of a t-distribution
(b) For large degrees of freedom, the t-critical value can be approximated by Z-value
(c) Even if the distribution of the data is not normal, sampling distribution of the test statistic can
be approximated by t- (with appropriate degrees of freedom) using the Central Limit Theorem.
CORRECT
(d) Two-sided tests use tα/2 and one-sided tests use tα as the critical values

(18) Satterthwaite approximation is used for:


(a) Testing means of two dependent Normal populations with unknown variances.
(b) Testing means of two independent Normal populations with known variances.
(c) Testing means of two independent Normal populations with unknown and unequal variances.
CORRECT
(d) Testing means of two independent Normal populations with unknown but equal variances.

(19) Consider a dataset having 15 observations for a dependent variable y and an independent predictor
variable x. The mean values of y and x are respectively 0.788 and −0.083. The equation of the
linear regression line fitted using the least squares method is given by ŷ = 0.9666 + 2.1212x. Which
of the following statements is correct in this context?
(a) If a new observation x = −0.083, y = 0.788 is added to the dataset and a simple linear
regression model is fit to the updated dataset with 16 observations, the slope of the fitted line
will change.
(b) If a new observation x = −0.083, y = 0.788 is added to the dataset and a simple linear regression
model is fit to the updated dataset with 16 observations, the intercept of the fitted line will
change.
(c) If a new observation x = −0.083, y = 0.788 is added to the dataset and a simple linear regression
model is fit to the updated dataset with 16 observations, the fitted line will remain unchanged.
CORRECT
(d) For unit change in the value of x, the change in y is 0.9666.

(20) Car dealers are often interested in determining the trade value of a car based on the odometer
reading (number of miles driven). A simple linear regression line was fit using the trade value (in
$1000s) as response (y) and the odometer reading (in 1000s of miles) as the independent variable
(x). The fitted regression line using the method of least squares is ŷ = 17.250 − 0.0669x, and
the coefficient of determination (R²) is 0.6483. In the context of this problem, which one of the
following statements is not correct?
(a) Trade values and Odometer readings are negatively correlated with each other.
(b) 64.83% of the variability in trade values is explained by odometer readings.
(c) The estimated trade value of a car with odometer reading between 30,000 and 40,000 miles is
less than $14,000. CORRECT
(d) Correlation coefficient between x and y is rxy = −0.8052.
At x = 30, y = 17.250 − 0.0669 ∗ 30 = 15.243, and at x = 40, y = 17.250 − 0.0669 ∗ 40 = 14.574.
Hence the estimated value in the given range is higher than $14,000.
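The fitted values and the correlation quoted in option (d) can be reproduced from the reported line and R². This is a minimal sketch; `predict` is a hypothetical helper, and the sign of r is taken from the sign of the slope, which holds for simple linear regression.

```python
from math import copysign, sqrt

intercept, slope, r_squared = 17.250, -0.0669, 0.6483


def predict(odometer_thousand_miles):
    """Estimated trade value in $1000s for a reading in 1000s of miles."""
    return intercept + slope * odometer_thousand_miles


pred_30, pred_40 = predict(30), predict(40)

# In simple linear regression, r has the sign of the slope and |r| = sqrt(R^2).
r = copysign(sqrt(r_squared), slope)

print(f"{pred_30:.3f} {pred_40:.3f} {r:.4f}")
```

Since the line is decreasing, every estimate for readings between 30,000 and 40,000 miles lies between the two endpoint predictions.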

Short-answer type questions
2. (5 points) A particular brand of dishwasher soap is sold in three sizes: 25oz, 40oz, and 65 oz. Twenty
percent of all purchasers select a 25 oz box, fifty percent select a 40 oz box, and the remaining thirty
percent choose a 65 oz box. Let X1 and X2 denote the package sizes selected by two independently
selected purchasers. Find the distribution of the average package size (i.e., (X1 + X2 )/2).

Sample space of X̄ = (X1 + X2 )/2 is given by Ω = {25, 32.5, 40, 45, 52.5, 65}.

P(X̄ = 25) = P(X1 + X2 = 50)
= P(X1 = 25)P(X2 = 25)
= 0.2 × 0.2 = 0.04. (1)

P(X̄ = 32.5) = P(X1 + X2 = 65)
= 2P(X1 = 25)P(X2 = 40)
= 2 × 0.2 × 0.5 = 0.2. (2)

P(X̄ = 40) = P(X1 + X2 = 80)
= P(X1 = 40)P(X2 = 40)
= 0.5 × 0.5 = 0.25. (3)

P(X̄ = 45) = P(X1 + X2 = 90)
= 2P(X1 = 25)P(X2 = 65)
= 2 × 0.2 × 0.3 = 0.12. (4)

P(X̄ = 52.5) = P(X1 + X2 = 105)
= 2P(X1 = 40)P(X2 = 65)
= 2 × 0.5 × 0.3 = 0.3. (5)

P(X̄ = 65) = P(X1 + X2 = 130)
= P(X1 = 65)P(X2 = 65)
= 0.3 × 0.3 = 0.09. (6)
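The six probabilities can be obtained mechanically by enumerating all ordered pairs (X1, X2) and accumulating the product of their probabilities. A minimal sketch, with illustrative variable names:

```python
from collections import defaultdict
from itertools import product

size_pmf = {25: 0.2, 40: 0.5, 65: 0.3}  # package size (oz) -> probability

# Independence: P(X1 = a, X2 = b) = P(X1 = a) * P(X2 = b).
avg_pmf = defaultdict(float)
for (x1, p1), (x2, p2) in product(size_pmf.items(), repeat=2):
    avg_pmf[(x1 + x2) / 2] += p1 * p2

for value in sorted(avg_pmf):
    print(f"P(Xbar = {value}) = {avg_pmf[value]:.2f}")
```

The factor 2 in equations (2), (4) and (5) appears here as two ordered pairs landing on the same average.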

3. (9 points) Direct market companies are turning to the Internet for new opportunities. A recent study
by Gruppo, Levey, & Co. showed that 73% of all direct marketers conduct transactions on the Internet.
Suppose a random sample of 300 direct marketing companies is taken.

(a) What is the probability that between 210 and 234 (inclusive) direct marketing companies are turning
to the Internet for new opportunities? [6 points]
Let p̂ denote the sample proportion of companies turning to the Internet, and let X denote the
number of companies out of 300 turning to the Internet.

$$
\begin{aligned}
P(210 \le X \le 234) &= P\left(\frac{210}{300} \le \hat{p} \le \frac{234}{300}\right) \\
&= P(0.70 \le \hat{p} \le 0.78) \\
&= P\left(\frac{0.70 - 0.73}{\sqrt{0.73 \times 0.27/300}} \le Z \le \frac{0.78 - 0.73}{\sqrt{0.73 \times 0.27/300}}\right) \\
&= \Phi(1.95) - \Phi(-1.17) \\
&= 0.9744 - (1 - 0.8790) = 0.8534.
\end{aligned}
$$

(b) What is the probability that 78% or more direct marketing companies are turning to the Internet
for new opportunities? [3 points]

P(p̂ ≥ 0.78) = P(Z ≥ 1.95) = 1 − Φ(1.95) = 0.0256.
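Both parts use the normal approximation to the sampling distribution of p̂, without a continuity correction. The sketch below repeats that approximation with `statistics.NormalDist`; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.73, 300

# Normal approximation: p_hat ~ N(p, p(1 - p)/n), no continuity correction.
p_hat_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

prob_between = p_hat_dist.cdf(234 / n) - p_hat_dist.cdf(210 / n)  # part (a)
prob_at_least_78 = 1 - p_hat_dist.cdf(0.78)                       # part (b)

print(f"(a) {prob_between:.4f}  (b) {prob_at_least_78:.4f}")
```

The z-scores are not rounded to two decimals here, so the output may differ from the table-based answers in the fourth decimal place.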

4. (6 points) Oscar T. Grady is the production manager for Citrus Groves Inc., located just north of Ocala,
Florida. Oscar is concerned that the last 3 years’ late freezes have damaged the 2500 orange trees that
the Citrus Groves owns. In order to determine the extent of damage to the trees, Oscar has sampled
the number of oranges produced per tree for 42 trees and found that the average production was 525
oranges per tree with a standard deviation of 30 oranges per tree.
(a) Estimate the standard error of the mean for this finite population. [2 points]
$$\text{Standard error} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{30}{\sqrt{42}} = 4.6291.$$
(b) Construct a 98% confidence interval for the mean per-tree output of all 2500 trees. [4 points]
The 98% confidence interval for the mean is given by

$$\bar{X} \mp z_{0.01}\,\frac{\hat{\sigma}}{\sqrt{n}},$$

which evaluates to

$$525 \mp 2.33 \times 4.6291 = (514.215,\ 535.785).$$
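A minimal sketch of the two calculations follows, using the critical value 2.33 from the solution. The last two lines are an addition not found in the solution: they apply the usual finite population correction factor √((N − n)/(N − 1)), only to show how little it would change the standard error when n/N is this small.

```python
from math import sqrt

n, N = 42, 2500          # sample size and number of trees
x_bar, s = 525.0, 30.0   # sample mean and standard deviation
z_crit = 2.33            # z_{0.01} as used in the solution

se = s / sqrt(n)
ci = (x_bar - z_crit * se, x_bar + z_crit * se)
print(f"SE = {se:.4f}, 98% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# Assumption (not in the posted solution): finite population correction.
fpc = sqrt((N - n) / (N - 1))
se_fpc = se * fpc
print(f"FPC = {fpc:.4f}, corrected SE = {se_fpc:.4f}")
```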

5. (7 points) When an election for political office takes place, the TV networks cancel regular programming
and instead provide election coverage. When the ballots are counted, the results are reported. However,
for important offices such as President, the networks actively compete to see which will be the first to
predict a winner. This is done through exit polls, wherein a random sample of voters who exit the polling
booth is asked for whom they voted. From the data, the sample proportion of voters supporting the
candidates is computed.
(a) Paris Flash network conducts an exit poll during general elections in the country of San Theodoros,
where there are two Presidential candidates – General Alcazar and General Tapioca. Out of 765
voters interviewed in the exit poll, 407 people said that they have voted for General Alcazar. Can
Paris Flash conclude from these data that General Alcazar will win the elections? Give statistical
validations for your answer. Use level of significance to be α = 0.05. [4 points]
Let p denote the true proportion of votes obtained by General Alcazar. In order to conclude that he
is the winner, we have to test
H0 : p ≤ 0.5 vs. H1 : p > 0.5.
The test statistic is given by

$$T = \frac{\hat{p} - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{n}}}.$$

We reject the null hypothesis at level-α if

obsvd.(T ) > zα .

Now, the observed value of p̂ is 407/765 = 0.532. Hence,

$$\text{obsvd.}(T) = \frac{0.532 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{765}}} = 1.77 > z_{0.05} = 1.645.$$

Hence we reject H0 at 5% level of significance and conclude that the data show evidence that
General Alcazar will win the elections.
(b) Ottokar Tribune network is more cautious, and they want to be sure enough before making any
prediction announcement to the general public. Find the minimum number of voters they should
include in their exit polls interview if they want to keep a 4% margin of error at 95% level of
confidence. [3 points]
The margin of error at 95% level of confidence is given by

$$z_{0.025}\sqrt{\frac{p(1-p)}{n}} \le z_{0.025}\sqrt{\frac{0.25}{n}}.$$

We need the RHS of the above expression to be at most 0.04. Hence, with z0.025 = 1.96, n should satisfy

$$n \ge \frac{z_{0.025}^2 \times 0.25}{0.04^2} = 600.25.$$

Rounding up, Ottokar Tribune must interview at least 601 voters.
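Both parts of this question reduce to a few lines of arithmetic. The sketch below is one way to reproduce them; it takes the normal quantiles from `statistics.NormalDist` instead of a printed table, and all variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Part (a): right-tailed z-test of H0: p <= 0.5 against H1: p > 0.5.
n, votes, p0, alpha = 765, 407, 0.5, 0.05
p_hat = votes / n
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = std_normal.inv_cdf(1 - alpha)
reject_h0 = z_obs > z_crit
print(f"z = {z_obs:.2f}, critical = {z_crit:.3f}, reject H0: {reject_h0}")

# Part (b): worst case p(1 - p) = 0.25, margin of error at most 0.04.
margin, confidence = 0.04, 0.95
z_half = std_normal.inv_cdf(1 - (1 - confidence) / 2)
n_min = ceil(z_half**2 * 0.25 / margin**2)
print(f"minimum sample size: {n_min}")
```

Using p(1 − p) = 0.25 is the conservative choice: it guarantees the margin for every value of p.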

6. (15 points) The following is the consolidated balance sheet of HeraFeriwala Pvt. Ltd., a retail sales
company operating mainly in street-side sales of garments and low-cost metal utensils.

Consolidated Balance Sheet of HeraFeriwala Pvt. Ltd. (in units of INR 100000)
Assets                    Mar 31, 2016    Mar 31, 2017
Current Assets:
  Cash                    100             400
  Marketable Securities   200             300
  Inventories             250             200
  Accounts Receivable     300             500
Total Current Assets      850             1400

Table 1: Balance sheet (Current ratio in both the years : 1.0625)

While auditing, it was found from the profit and loss statement that the sales for current as well as
previous year are 500 (in units of INR 100000). Also the company reported that 80% of their sales was
in cash in the current year, among which 100 was converted into marketable securities, in addition to
existing 200 (carried from previous year). The auditor smells a possible overstatement in the balance sheet
due to a mismatch between the sales and current assets figures.
(a) How would you proceed to verify the auditor’s suspicion about the company’s claim? To substantiate the
claim, she inspects the actual invoices, which are receivable. The number of invoices for the current
year was 24222. The auditor took a sample of 47 invoices and the sample total was found to be
INR 116807.1 (actual and not in units of INR 100000). Set up the appropriate null and alternative
hypotheses and justify them. [3 points]
Let µ denote the mean receivable amount. According to the balance sheet, for 24222 invoices
receivable, µ is claimed to be

µ0 = 500 × 10⁵/24222 = 2064.239.

As the suspecting auditor, she would want to verify whether the sample suggests a significantly lower
value of µ. Hence she should test

H0 : µ ≥ 2064.239 vs. H1 : µ < 2064.239.

(b) Carry out the test with sample standard deviation 1224.505 at 5% level of significance. State your
assumptions clearly. What is your conclusion? [4 points]
The sample mean is observed to be x̄ = 116807.1/47 = 2485.257. The observed sample standard
deviation is given by s = 1224.505.
We assume that the receivable amounts are independent of each other and each one of them follows a
probability distribution with finite mean and finite variance (not necessarily the same mean and/or
variance). The test statistic based on the sample mean X̄ will then follow a Normal distribution
for large samples via CLT.
The observed value of the test statistic is given by

$$\frac{2485.257 - 2064.239}{1224.505/\sqrt{47}} = 2.357.$$

Since this is a left-tailed test, we would reject H0 at 5% level if the observed value of the test
statistic is less than −z0.05 = −1.645. Since 2.357 > −1.645, we fail to reject H0 at 5% level of
significance and conclude that there is no significant evidence to support the auditor’s suspicion.

[NOTE: Since the sample mean came out to be greater than µ0 = 2064.239, we would fail
to reject H0 in the left-tailed test (trivially!). One can actually argue this without even knowing the
standard deviation. Just the finite variance assumption would suffice.]
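The numbers in parts (a) and (b) can be reproduced as follows. This is a minimal sketch of the large-sample z-test described above, with illustrative variable names; the critical value −1.645 is the one quoted in the solution.

```python
from math import sqrt

# Part (a): claimed mean receivable per invoice (amounts in INR).
receivables_total = 500 * 10**5   # 500 in units of INR 100000
n_invoices = 24222
mu0 = receivables_total / n_invoices

# Part (b): sample of 47 invoices.
n, sample_total, s = 47, 116807.1, 1224.505
x_bar = sample_total / n
z_obs = (x_bar - mu0) / (s / sqrt(n))

z_crit = -1.645                   # left-tailed test at the 5% level
reject_h0 = z_obs < z_crit
print(f"mu0 = {mu0:.3f}, x_bar = {x_bar:.3f}, z = {z_obs:.3f}, reject H0: {reject_h0}")
```

As the note observes, a sample mean above µ0 makes the statistic positive, so a left-tailed test cannot reject regardless of the value of s.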

(c) To strengthen her findings, the auditor decides to check the inventory records. The company holds
a large number of warehouses across the country. Out of their 560 warehouses, 420 are smaller
ones and 140 are large warehouses. Among the smaller ones, 160 are located in West Bengal, 120 in
Maharashtra and the rest in Orissa. The larger ones are equally distributed in the suburban areas
of the cities Bhubaneswar, Chennai, Faridabad, Durg and Vishakhapatnam. Due to restrictions
on cost and time, the auditor selected 36 warehouses in Kolkata and inspected them. She found
that out of 36, 13 have marginally misreported the inventory amount in the current year whereas
9 reported wrong inventory figures in the previous year. Carry out a suitable test for the auditor
to understand if the proportion of inventory misreporting has aggravated since last year or not
(Assume 5% level). [6 points]
Let p1 and p2 denote the proportion of misreportings in the current year and in the previous year
respectively. We have to test
H0 : p1 ≤ p2 vs. H1 : p1 > p2 .
Let the corresponding sample proportions be denoted by p̂1 and p̂2 respectively. We reject H0 at
level-α if

$$\text{obsvd.}\left(\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\right) > z_\alpha,$$

where n1 and n2 respectively denote the sample sizes for the current and the previous year, and p̂
is given by

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}.$$
Here, n1 = n2 = 36, and observed values of sample proportions are p̂1 = 13/36 = 0.361, p̂2 =
9/36 = 0.25, which gives p̂ = 22/72 = 0.306.
Thus, the observed value of the test statistic is given by

$$\frac{0.361 - 0.25}{\sqrt{0.306 \times (1 - 0.306) \times 2/36}} = 1.024 < z_{0.05} = 1.645.$$

Hence we fail to reject H0 at 5% level of significance and conclude that we do not have significant
evidence that misreporting has aggravated since last year.
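The pooled two-proportion z-statistic can be sketched in a few lines. The code follows the formula in the solution and treats the two years as independent samples, which is the solution's working assumption rather than an established fact; variable names are illustrative.

```python
from math import sqrt

n1 = n2 = 36
x1, x2 = 13, 9                     # misreporting counts: current, previous year

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)     # equals (n1*p1_hat + n2*p2_hat)/(n1 + n2)

z_obs = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_crit = 1.645                     # right-tailed test at the 5% level
reject_h0 = z_obs > z_crit
print(f"p_pool = {p_pool:.3f}, z = {z_obs:.3f}, reject H0: {reject_h0}")
```

With unrounded proportions the statistic can differ from the hand calculation in the third decimal place; the decision is unaffected.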
(d) Do you think that the test you have carried out to verify the proportion of inventory misreporting
over the years, is conclusive at 5% level? Justify your answer. [2 points]
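No. The sample is not a random sample of the 560 warehouses: only 36 warehouses in Kolkata were
inspected. She should have considered a stratified sampling scheme, and also accounted for the
correlation arising from observing the same warehouses in both years.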

7. (10 points) The investigation and quantification of chaos in financial markets have gained momentum
in recent times with the advent of advanced models of chaos and complexity. Several tools and statistics
have been developed in an attempt to capture the linear and non-linear dynamics of the financial markets
around the globe (e.g., Lyapunov exponent, Hurst exponents, etc.). The following table shows the Hurst
exponent based index computed using the data on the daily closing value in the period January 2005
to July 2011 for ten stock exchanges from developed markets (USA, UK, ...) and ten from emerging
markets (Hungary, Brazil, ...). Conduct a pooled t-test for comparing the difference between the average
index values at level α = 0.05. [Use tα/2 = 2.01. Explicitly state the hypothesis, test statistic, critical
value, decision rule and the conclusion in the context of the question.]

Developed Emerging
0.055 0.064
0.062 0.088
0.060 0.073
0.062 0.090
0.057 0.086
0.055 0.067
0.068 0.088
0.065 0.078
0.069 0.061
0.069 0.065

Let µ1 and µ2 denote the average index values for developed markets (X) and emerging markets (Y )
respectively.
We have to test
H0 : µ1 − µ2 = 0 vs. H1 : µ1 − µ2 ≠ 0.

The test statistic corresponding to the pooled t-test is given by

$$T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}},$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is the pooled variance. Here the sample sizes are n1 = n2 = 10. We reject H0 at level-α if

$$|T| > t_{\alpha/2,\,n_1 + n_2 - 2}.$$
From the data, we get,
X̄ = 0.0622, Ȳ = 0.0760, Sp = 0.0089.
The observed value of the test statistic is given by −3.4529, whose absolute value is greater than tα/2,18 =
2.01. Hence we reject H0 at 5% level of significance and conclude that the average index values in the
two different markets are not equal.
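The summary statistics and the test statistic can be recomputed directly from the table. A minimal sketch using only the standard library follows; the critical value is the one supplied in the question, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

developed = [0.055, 0.062, 0.060, 0.062, 0.057, 0.055, 0.068, 0.065, 0.069, 0.069]
emerging = [0.064, 0.088, 0.073, 0.090, 0.086, 0.067, 0.088, 0.078, 0.061, 0.065]

n1, n2 = len(developed), len(emerging)

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(developed) + (n2 - 1) * variance(emerging)) / (n1 + n2 - 2)
sp = sqrt(sp2)

t_obs = (mean(developed) - mean(emerging)) / (sp * sqrt(1 / n1 + 1 / n2))
t_crit = 2.01  # critical value supplied in the question
reject_h0 = abs(t_obs) > t_crit
print(f"Sp = {sp:.4f}, T = {t_obs:.4f}, reject H0: {reject_h0}")
```

Standard t-tables list approximately 2.101 for the 0.025 upper tail with 18 degrees of freedom; using that value instead of 2.01 leaves the decision unchanged.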

8. (8 points) The CEO of Royal Jelly, a baby-food producer, claims that her company’s product is superior
to that of her leading competitor because babies gain weight faster with her product (this is a good thing
for babies). To test this claim, a survey was undertaken. Mothers of newborn babies were asked which
baby food they intended to feed their babies. Those who responded Royal Jelly or the leading competitor
were asked to keep track of their babies’ weight gain over the next 2 months. There were 15 mothers who
fed their babies with Royal Jelly and 25 mothers who fed their babies with the product of the leading
competitor. Each baby’s weight gain (in ounces) was recorded, resulting in the following summary
statistics:

               Sample size    Mean      Variance
Royal Jelly    15             60.019    4.310
Other          25             54.882    9.839

Preliminary analysis of the data also revealed that the true population variances for weight gains in the
two cases are not equal to each other. Based on this information, can we conclude that Royal Jelly is
indeed superior in terms of weight gain? Use level of significance to be 0.05.
Let µ1 and µ2 denote the mean gain in weight for babies having Royal Jelly and the other
product respectively. We have to test

H0 : µ1 − µ2 ≤ 0 vs. H1 : µ1 − µ2 > 0.

The test statistic is given by

$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}.$$

We reject H0 at level-α if obsvd.(T ) > tα,ν , where

$$\nu \approx \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}.$$

From the given information, we get ν ≈ 37.54, and the observed value of the test statistic is 6.225 >
t0.05,37.54 = 1.686. Hence we reject H0 and conclude that Royal Jelly is indeed superior.
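The unequal-variance statistic and the Satterthwaite degrees of freedom follow from the summary table alone. A minimal sketch is given below; the critical value 1.686 is copied from the solution rather than computed, and the variable names are illustrative.

```python
from math import sqrt

n1, mean1, var1 = 15, 60.019, 4.310   # Royal Jelly
n2, mean2, var2 = 25, 54.882, 9.839   # leading competitor

a, b = var1 / n1, var2 / n2           # S1^2/n1 and S2^2/n2

t_obs = (mean1 - mean2) / sqrt(a + b)

# Satterthwaite approximation to the degrees of freedom.
nu = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

t_crit = 1.686                        # t_{0.05, nu} as quoted in the solution
reject_h0 = t_obs > t_crit
print(f"T = {t_obs:.3f}, nu = {nu:.2f}, reject H0: {reject_h0}")
```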
