STAT 5101: Foundations of Data Science

2021-22 Term 1
Assignment #2 [NEW version]
Due: Oct 13th, 2021 (Wed) at 9:30pm
Total Score: 100 points
This assignment covers material from Chapter 4 to Chapter 5 of the lecture notes.
You are encouraged to show your calculation steps in detail, so that you can earn partial
credit in case of an incorrect answer.
How to turn in the assignment? During the lecture

Problem 1 [8 points]: Suppose screening test A has been used to detect disease D. Based on
historical performance, it's known that 21% of patients who are A positive will have disease
D, while 99% of patients who are A negative will NOT have disease D. It is known that 20%
of the population test A positive.
(a) [2 points] What’s the rate of disease D in the population?
A new screening test (Test B) is now proposed, and we would like to verify whether it performs
better than Test A. It is known that 23% of patients who are B positive will have disease D,
and that 10% of the population test B positive.
(b) [3 points] Based on the result in part (a), calculate the conditional probability of
disease D given B negative.
(c) [3 points] Compute the relative risk of disease D given A positive, and the relative risk
of disease D given B positive. What conclusion can you draw from the two values?
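
The arithmetic behind Problem 1 can be checked with a short script. This is only a sketch for verifying hand calculations, not a model answer: the variable names are illustrative, and the relative risk is assumed here to mean Pr(D | test positive) / Pr(D | test negative), which should be checked against the definition given in the lecture notes.

```python
# Inputs taken from the problem statement.
p_a_pos = 0.20               # Pr(A positive)
p_d_given_a_pos = 0.21       # Pr(D | A positive)
p_d_given_a_neg = 1 - 0.99   # Pr(D | A negative)

# Law of total probability over the A-positive / A-negative split.
p_d = p_d_given_a_pos * p_a_pos + p_d_given_a_neg * (1 - p_a_pos)

p_b_pos = 0.10               # Pr(B positive)
p_d_given_b_pos = 0.23       # Pr(D | B positive)

# Same identity for Test B, rearranged to solve for Pr(D | B negative).
p_d_given_b_neg = (p_d - p_d_given_b_pos * p_b_pos) / (1 - p_b_pos)

# Assumed definition: risk among positives divided by risk among negatives.
rr_a = p_d_given_a_pos / p_d_given_a_neg
rr_b = p_d_given_b_pos / p_d_given_b_neg

print(f"Pr(D) = {p_d:.4f}")
print(f"Pr(D | B negative) = {p_d_given_b_neg:.4f}")
print(f"RR(A) = {rr_a:.2f}, RR(B) = {rr_b:.2f}")
```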

Problem 2 [19 points]: Suppose the genotype of Betty is unknown, but the genotypes of her
four grandparents are given by the chart below:

Grandparents: BA AO AB AB

Parents: Father ?? Mother ??

Child: Betty ??

(a) [2 points] What is the probability that the genotypes of both Father and Mother are AB?
(b) [3 points] What is the probability that Betty’s genotype is AB?
People with genotype AA or AO have phenotype A.
(c) [3 points] What is the probability that Betty’s phenotype is A?
(d) [5 points] Are the blood types of Betty and Betty’s brother independent? Why?
What is the probability that both Betty and her brother have phenotype A?
(e) [3 points] Given that Mother’s phenotype is A, what is the conditional probability
that Betty’s phenotype is also A?
(f) [3 points] Given that Betty’s phenotype is A, what is the conditional probability that
Mother’s phenotype is also A?
Problem 3 [25 points]: Consider a discrete random variable 𝑋 with cumulative
distribution function (CDF) 𝐹(𝑥) specified below:

(a) [4 points] Sketch the probability mass function (PMF) of 𝑋.
(b) [2 points] Calculate Pr(2 < 𝑋 ≤ 3) and Pr(𝑋 = 3).
(c) [2 points] Calculate Pr(𝑋 = 4) and Pr(3 ≤ 𝑋 ≤ 5).
(d) [2 points] Calculate Pr(𝑋 = 6) and Pr(3 < 𝑋 ≤ 7).
(e) [4 points] Compute the population mean 𝜇 = E(𝑋), the population variance
𝜎² = Var(𝑋), and E[(𝑋 + 2)²].
(f) [3 points] Compute Pr(𝜇 − 𝜎 < 𝑋 ≤ 𝜇 + 𝜎).
(g) [3 points] If we transform the random variable 𝑋 to 𝑌 by 𝑌 = 3𝑋 − 2,
find the expected value of 𝑌 and the variance of 𝑌.
(h) [5 points] If we transform the random variable 𝑋 to 𝑍 by 𝑍 = 𝑋², find the
PMF of 𝑍 and the CDF of 𝑍, and plot the CDF.

Problem 4 [16 points]: There are 135 students in STAT1012 this semester. Suppose the
probability that a student lives on Lamma Island is 0.02. Let X be the total number of
STAT1012 students who live on Lamma Island.
(a) [2 points] What probability distribution (including its parameters) does X follow?
(b) [3 points] What is the probability that at most 3 STAT1012 students live on Lamma
Island?
(c) [3 points] Sketch the cumulative distribution function (CDF) of X up to x = 3.5.
(d) [3 points] What are the expected value and variance of 𝑋? What is the shape of the
distribution of 𝑋 (right-skewed, left-skewed or symmetric)?

Optional (you may choose to answer either Parts (e) and (f), or Part (g))

(e) [2 points] Do you think X can be well approximated by a Poisson distribution? Explain.
(f) [3 points] Approximate the probability in part (b) using the Poisson approximation.

(g) [5 points] Suppose we randomly choose 10 students in the class and find that 5 of them
live on Lamma Island. What conclusion can we draw from this?
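
A numerical check for the probabilities in Problem 4 could look like the sketch below. It assumes that students' residences are independent, so that a binomial model with n = 135 and p = 0.02 applies, and it takes the Poisson rate as n times p. The helper names `binom_pmf` and `poisson_pmf` are illustrative and do not come from the lecture notes.

```python
from math import comb, exp, factorial

n, p = 135, 0.02  # class size and per-student probability from the problem


def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Pr(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Pr(X <= 3) under the assumed binomial model.
binom_cdf_3 = sum(binom_pmf(k, n, p) for k in range(4))

# Poisson approximation with rate lam = n * p.
lam = n * p
pois_cdf_3 = sum(poisson_pmf(k, lam) for k in range(4))

# Mean and variance of the binomial model.
mean = n * p
var = n * p * (1 - p)

print(f"binomial Pr(X <= 3) = {binom_cdf_3:.4f}")
print(f"Poisson  Pr(X <= 3) = {pois_cdf_3:.4f}")
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

Printing the two probabilities side by side gives a quick sense of how close the approximation is for a large n and a small p.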

Problem 5 [14 points]: Let the independent random variables 𝑋 and 𝑌 have binomial
distributions with parameters 𝑛𝑋 = 2, 𝑝𝑋 = 𝑝 and 𝑛𝑌 = 4, 𝑝𝑌 = 𝑝, respectively. Suppose
Pr(𝑋 ≥ 1) = 5/9.
(a) [3 points] Find 𝑝 and Pr(𝑌 ≥ 2).
(b) [4 points] Compute Pr(𝑋 = 𝑌).
(c) [4 points] Compute Pr(𝑋 > 𝑌).
(d) [4 points] Find the modes of 𝑋 and 𝑌. Are the distributions of 𝑋 and 𝑌
right-skewed, left-skewed or symmetric?
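
The quantities in Problem 5 can be cross-checked with exact rational arithmetic. The sketch below solves Pr(𝑋 ≥ 1) = 1 − (1 − 𝑝)² = 5/9 by hand in a comment and then enumerates the joint distribution, using the stated independence of 𝑋 and 𝑌. The helper `binom_pmf` is an illustrative name, not something from the course material.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """List of Pr(K = k), k = 0..n, for K ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# 1 - (1 - p)^2 = 5/9  =>  (1 - p)^2 = 4/9  =>  1 - p = 2/3 (taking the root in [0, 1]).
p = 1 - Fraction(2, 3)

pmf_x = binom_pmf(2, p)
pmf_y = binom_pmf(4, p)

# Sanity check that this p reproduces the given condition.
assert 1 - pmf_x[0] == Fraction(5, 9)

p_y_ge_2 = sum(pmf_y[2:])

# Independence lets the joint probability factor as a product.
p_equal = sum(pmf_x[k] * pmf_y[k] for k in range(3))
p_x_gt_y = sum(
    pmf_x[i] * pmf_y[j] for i in range(3) for j in range(5) if i > j
)

print("p =", p)
print("Pr(Y >= 2) =", p_y_ge_2)
print("Pr(X = Y) =", p_equal)
print("Pr(X > Y) =", p_x_gt_y)
```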

Optional (You may choose to answer Problem 6 or Problem 7)


Problem 6 [18 points]: Toss a fair one-dollar coin twice and a fair five-dollar coin three
times. Let 𝑋 be the number of heads for the one-dollar coin and 𝑌 be the
number of heads for the five-dollar coin.
(a) [2 points] What probability distribution (including its parameters) does 𝑋 follow? What
about 𝑌?
(b) [3 points] If we let the random variable 𝑍 = 𝑋 + 𝑌, what are the expected value and
variance of 𝑍?
(c) [6 points] Find the probability mass function (PMF) of 𝑍.
(d) [2 points] What probability distribution does 𝑍 follow? What conclusion can
you draw based on the result?
(e) [5 points] If we let the random variable 𝑊 = 𝑋 − 𝑌, find the
PMF of 𝑊. What are the expected value and variance of
𝑊?
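
One way to check a hand-derived PMF for 𝑍 = 𝑋 + 𝑌 or 𝑊 = 𝑋 − 𝑌 is to enumerate all value pairs and multiply probabilities, which is valid here because the two coins are tossed independently. The sketch below assumes 𝑋 and 𝑌 are binomial with success probability 1/2. The helper names `binom_pmf`, `combine`, `mean` and `variance` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Dict k -> Pr(K = k) for K ~ Binomial(n, p)."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def combine(pmf_a, pmf_b, op):
    """PMF of op(A, B) for independent A and B, by full enumeration."""
    out = defaultdict(Fraction)  # Fraction() is 0
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[op(a, b)] += pa * pb
    return dict(out)


def mean(pmf):
    return sum(v * pr for v, pr in pmf.items())


def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * pr for v, pr in pmf.items())


half = Fraction(1, 2)
pmf_x = binom_pmf(2, half)  # heads in two tosses of the one-dollar coin
pmf_y = binom_pmf(3, half)  # heads in three tosses of the five-dollar coin

pmf_z = combine(pmf_x, pmf_y, lambda a, b: a + b)
pmf_w = combine(pmf_x, pmf_y, lambda a, b: a - b)

for name, pmf in (("Z", pmf_z), ("W", pmf_w)):
    print(name, {k: str(v) for k, v in sorted(pmf.items())},
          "mean", mean(pmf), "variance", variance(pmf))
```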

Problem 7 [18 points]: The number of cracks in a section of highway that are significant
enough to require repair is assumed to follow a Poisson distribution. Let 𝑋 be the number
of cracks in 15 km. We are told that the probability of 4 cracks in 15 km
equals the probability of 5 cracks in 15 km, i.e. Pr(𝑋 = 4) = Pr(𝑋 = 5).
(a) [3 points] Find the expected value of 𝑋 (the expected number of cracks in 15 km) and the
variance of 𝑋.
(b) [3 points] Find the number of cracks in 15 km that has the largest probability.
(c) [4 points] What is the probability that at least one crack requires repair in 5 km of the
highway?
(d) [4 points] Let 𝑌 be the number of cracks in 5 km. Sketch the cumulative
distribution function (CDF) of 𝑌 up to 𝑦 = 4.5.
(e) [4 points] If we must order the material to fix the cracks beforehand, how many
packages of the material (one package per crack) should we order to ensure that all
the cracks in 5 km can be fixed with at least 95% probability?
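
The chain of calculations in Problem 7 can be checked numerically with the sketch below. It makes two assumptions: the rate is found by solving Pr(𝑋 = 4) = Pr(𝑋 = 5) by hand (shown in a comment), and cracks occur at a constant rate along the highway, so that the Poisson rate for 5 km is one third of the rate for 15 km. The helper `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Pr(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Pr(X = 4) = Pr(X = 5)  =>  lam^4 / 4! = lam^5 / 5!  =>  lam = 5.
lam_15km = 5.0
assert abs(poisson_pmf(4, lam_15km) - poisson_pmf(5, lam_15km)) < 1e-12

# Assumption: constant crack rate per km, so the rate scales with length.
lam_5km = lam_15km * 5 / 15

# Pr(at least one crack in 5 km) = 1 - Pr(no cracks in 5 km).
p_at_least_one = 1 - exp(-lam_5km)

# Smallest k such that Pr(Y <= k) >= 0.95.
k = 0
cdf = poisson_pmf(0, lam_5km)
while cdf < 0.95:
    k += 1
    cdf += poisson_pmf(k, lam_5km)
packages = k

print(f"rate per 5 km = {lam_5km:.4f}")
print(f"Pr(Y >= 1) = {p_at_least_one:.4f}")
print(f"packages needed = {packages} (Pr(Y <= k) = {cdf:.4f})")
```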

- End of Assignment -