
Nanyang Business School

AB1202 – STATISTICS AND ANALYSIS

Tutorial: 5
Topics: Bivariate Distributions and Correlations

1. Please download the data file “W5-Question 1.csv” from the course website. Then you can
use the following command in R to load the data into R for calculation.

q1 <- read.csv(file.choose(), header = TRUE)  # store the loaded data; the name q1 is arbitrary

The file contains a discrete bivariate distribution of X and Y. To answer the questions
below, you must know how to sum the probabilities of only those rows that satisfy a
certain condition; let’s call this a conditional sum. To impose a condition in R, we use “[]”.
Again, if you don’t know how, please refer to the lecture video/ppt.
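
As a reminder of how a conditional sum with “[]” works, here is a minimal sketch on a made-up toy distribution. It does not use the course data file, and the column names X, Y, and P are assumptions; check the real column names with names() after loading the file.

```r
# Hypothetical toy joint distribution (NOT the data in "W5-Question 1.csv").
# The column names X, Y, and P are assumptions made for this sketch.
d <- data.frame(
  X = c(1, 1, 2, 2),
  Y = c(1, 2, 1, 2),
  P = c(0.1, 0.2, 0.3, 0.4)
)

# A conditional sum keeps only the probabilities of rows where the condition is TRUE.
p_x_eq_1 <- sum(d$P[d$X == 1])              # Pr(X = 1)
p_both   <- sum(d$P[d$X <= 1 & d$Y <= 1])   # Pr(X <= 1 and Y <= 1)
p_x_gt_y <- sum(d$P[d$X > d$Y])             # Pr(X > Y)
```

The expression inside “[]” is a logical vector with one TRUE/FALSE per row, so conditions can be combined with & (and) or | (or).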

Determine each of the following probabilities:


(1) Pr(X=2)
(2) Pr(Y≥3)
(3) Pr(X≤2 and Y≤2)
(4) Pr(X=Y)
(5) Pr(X>Y)

2. Please download the data file “W5-Question 2.csv” from the course website. Then you can
use the following command in R to load the data into R for calculation.

q2 <- read.csv(file.choose(), header = TRUE)  # store the loaded data; the name q2 is arbitrary

The file contains data from a survey of 200 households (use the “View()” command to
view the loaded csv file). The first variable X is the number of members in a
randomly selected household from the survey, and the second variable Y is the number of
cars owned by the household. The last column gives the frequency of each (X, Y)
combination. The 200 surveyed households are equally likely to be selected.

To proceed, you must first calculate the probabilities for the different values of X and Y.
You can do that based on the frequency column in the data. If you don’t know how, please
refer to the lecture video/ppt.
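
The conversion from frequencies to probabilities can be sketched as follows, using a made-up toy survey of 10 households rather than the course data. The column names X, Y, and Freq are assumptions; the real file may use different names.

```r
# Hypothetical toy survey of 10 households (NOT the data in "W5-Question 2.csv").
# The column names X, Y, and Freq are assumptions made for this sketch.
h <- data.frame(
  X    = c(1, 1, 2, 2),
  Y    = c(0, 1, 0, 1),
  Freq = c(2, 3, 1, 4)
)

# Every household is equally likely, so probability = frequency / total count.
h$P <- h$Freq / sum(h$Freq)

# A joint probability is then read off the row that matches both values.
p_joint <- h$P[h$X == 2 & h$Y == 1]   # Pr(X = 2, Y = 1)
```

A quick sanity check is that sum(h$P) equals 1.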

(1) What is the joint probability Pr(X=4,Y=0)?


(2) What is the marginal probability function f(X=4)?
(3) What is the conditional probability function of Y given X=4?
(4) What is the conditional mean of Y given X=4?
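
For reference, a marginal probability, a conditional distribution, and a conditional mean can be computed along the following lines. This is only a sketch on a made-up toy distribution; the data frame j and its column names X, Y, and P are assumptions, not the survey data.

```r
# Hypothetical toy joint distribution (NOT the survey data).
# The column names X, Y, and P are assumptions made for this sketch.
j <- data.frame(
  X = c(1, 1, 2, 2),
  Y = c(0, 1, 0, 1),
  P = c(0.2, 0.3, 0.1, 0.4)
)

rows <- j$X == 2                        # rows where the condition X = 2 holds
f_x2 <- sum(j$P[rows])                  # marginal probability Pr(X = 2)
cond_y <- j$P[rows] / f_x2              # Pr(Y = y | X = 2): joint divided by marginal
names(cond_y) <- j$Y[rows]              # label each probability with its Y value
cond_mean <- sum(j$Y[rows] * cond_y)    # E(Y | X = 2): Y values weighted by cond_y
```

The conditional probabilities in cond_y must sum to 1, which is a useful check on the calculation.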

3. Suppose that X and Y are negatively correlated. Is Var(X+Y) larger or smaller than Var(X-Y)?

4. Suppose that X and Y are random variables such that Var(X)=9, Var(Y)=4, and ρ(X,Y)=−1/6.

Determine:
(1) Var(X+Y)
(2) Var(X-3Y+4)
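
Variances of this kind follow from the identity Var(aX + bY + c) = a²Var(X) + b²Var(Y) + 2ab·Cov(X,Y), with Cov(X,Y) = ρ(X,Y)·SD(X)·SD(Y). The sketch below wraps it in a helper, var_linear, which is a hypothetical function written for illustration and not part of the course material. The numbers are toy values, not those of the question.

```r
# Hypothetical helper (not from the course material):
# Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y),
# where Cov(X, Y) = rho * SD(X) * SD(Y). The constant c does not change the variance.
var_linear <- function(a, b, var_x, var_y, rho) {
  cov_xy <- rho * sqrt(var_x) * sqrt(var_y)
  a^2 * var_x + b^2 * var_y + 2 * a * b * cov_xy
}

# Toy values for illustration only (NOT the values given in the question).
v_sum  <- var_linear(1,  1, var_x = 4, var_y = 1, rho = -0.25)   # Var(X + Y)
v_diff <- var_linear(1, -1, var_x = 4, var_y = 1, rho = -0.25)   # Var(X - Y)
```

With these toy values the covariance is negative, so the covariance term lowers Var(X + Y) and raises Var(X - Y).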

5. Let X be a random variable (RV) representing the weather on a given day. For simplicity,
suppose X can only be rain, sunny, or cloudy. Let Y be the average temperature (in degrees
Celsius) of the day, and Z be the amount of iced tea sold at Canteen-1 on that day.
(1) How would you interpret the conditional distribution Pr(X|Y=20)?
How is it different from Pr(X)?
(2) What does E(Y|X=“sunny”) mean?
(3) Use common sense to compare E(Z|Y=35) and E(Z). Which one should be larger?

6. [Correlation does not imply causality] Mr. Pearl is an experienced community worker. He
is heading a government program that promotes regular exercise and awareness of the
health impact of obesity. Under this program, he monitors the cholesterol level (X) and
cardio exercise (Y, in number of hours per week) of most residents in a district. Surprisingly,
he finds a high positive correlation between X and Y. This result seems to suggest that
more exercise is related to a higher cholesterol level, which is rather counterintuitive to
him. To understand what went wrong, he interviewed local residents enrolled in the
program. He found that, compared with younger residents, older residents tend to
exercise more and also have a higher cholesterol level.

Based on the interviews, what reason(s) can Mr. Pearl use to re-examine the high correlation
between X and Y?
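
To explore how a third variable can change a correlation, one option is to compare the correlation over all residents with the correlation inside each age group. The sketch below uses invented numbers purely for illustration; they are not data from the program, and the variable names are assumptions.

```r
# Hypothetical toy data (invented for illustration, NOT from the program).
age_group   <- c("young", "young", "young", "old", "old", "old")
exercise    <- c(1, 2, 3, 6, 7, 8)                 # cardio hours per week
cholesterol <- c(5.0, 4.8, 4.6, 6.6, 6.4, 6.2)     # cholesterol level

# Correlation over all residents pooled together.
pooled <- cor(exercise, cholesterol)

# Correlation computed separately inside each age group.
young        <- age_group == "young"
within_young <- cor(exercise[young], cholesterol[young])
within_old   <- cor(exercise[!young], cholesterol[!young])
```

In this toy data the pooled correlation is positive while the correlation inside each age group is negative, so the sign depends on whether age is taken into account.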
