
Lecture 4: Joint Probability Distribution & Limit Theorems

Wisnu Setiadi Nugroho

Universitas Gadjah Mada



1 Joint Probability Distributions
Bivariate distributions
More than two random variables

2 Limit Theorems
Introduction
Chebyshev’s Theorem
Law of Large Numbers
Central Limit Theorem

3 Sampling Distributions
Introduction



Dealing with more than one random variable

In practice, we often deal with more than one random variable.
These variables may not be independent of each other.
For example:
▶ height and weight
▶ being poor and finishing elementary school
In this section, we study the joint distribution of two random variables: bivariate distributions.



Joint probability function

Definition
Let X and Y be random variables. If both X and Y are discrete, then

f(x, y) = P(X = x, Y = y)

is called the joint probability function (joint pmf) of X and Y.



Joint probability function

Definition
If both X and Y are continuous, then f(x, y) is called the joint probability density function (joint pdf) of X and Y iff

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx



Example

Suppose that there are 8 red balls, 10 yellow balls, and 20 blue balls in
a bucket. A total of 10 balls are randomly selected from this bucket.
Let X = number of red balls and Y = number of blue balls. Find the
joint probability function of the bivariate random variable (X , Y ).
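A minimal sketch of this pmf in Python (assuming the 10 balls are drawn without replacement, so that (X, Y) is multivariate hypergeometric):

```python
from math import comb

def joint_pmf(x, y, n_red=8, n_yellow=10, n_blue=20, n_draw=10):
    """Joint pmf of (X, Y) = (# red, # blue) when n_draw balls are
    drawn without replacement from the bucket (an assumption here)."""
    n_yellow_drawn = n_draw - x - y
    if x < 0 or y < 0 or n_yellow_drawn < 0:
        return 0.0
    if x > n_red or y > n_blue or n_yellow_drawn > n_yellow:
        return 0.0
    total = n_red + n_yellow + n_blue
    return comb(n_red, x) * comb(n_blue, y) * comb(n_yellow, n_yellow_drawn) / comb(total, n_draw)

# sanity check: the probabilities over all (x, y) should sum to 1
print(sum(joint_pmf(x, y) for x in range(11) for y in range(11)))
print(joint_pmf(2, 5))  # e.g. P(X = 2, Y = 5)
```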



Joint probability function

Theorem
If X and Y are two random variables with joint probability function f(x, y), then:
▶ f(x, y) ≥ 0 for all x and y
▶ If X and Y are discrete, then Σ_{x,y} f(x, y) = 1, where the sum is over all values (x, y) that are assigned nonzero probabilities.
▶ If X and Y are continuous, then ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.



Marginal distribution function

Suppose that we are given the joint probability function (pmf or pdf).
We can obtain the probability distribution of one of the components through its marginal distribution.



Marginal distribution function

Definition
The marginal pmf or pdf of X, denoted fX(x), is defined by:

fX(x) = ∫_{−∞}^{∞} f(x, y) dy    if X, Y are continuous
fX(x) = Σ_{all y} f(x, y)          if X, Y are discrete

Similarly, fY(y) is defined by:

fY(y) = ∫_{−∞}^{∞} f(x, y) dx    if X, Y are continuous
fY(y) = Σ_{all x} f(x, y)          if X, Y are discrete



Example

Find the marginal probability functions of the random variables X and Y, given the joint pmf:

            y = −2    y = 0    y = 1    y = 4
  x = −1     0.2       0.1      0        0.2
  x = 3      0.1       0.2      0.1      0
  x = 5      0.1       0        0        0
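A quick numerical check of the marginals, obtained by summing the joint pmf table over the other variable (NumPy is used here only for convenience):

```python
import numpy as np

# joint pmf from the table above: rows are x = -1, 3, 5; columns are y = -2, 0, 1, 4
x_vals = np.array([-1, 3, 5])
y_vals = np.array([-2, 0, 1, 4])
joint = np.array([
    [0.2, 0.1, 0.0, 0.2],
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.0, 0.0],
])

f_X = joint.sum(axis=1)  # marginal pmf of X: sum over y
f_Y = joint.sum(axis=0)  # marginal pmf of Y: sum over x
print(dict(zip(x_vals, f_X)))  # {-1: 0.5, 3: 0.4, 5: 0.1}
print(dict(zip(y_vals, f_Y)))  # {-2: 0.4, 0: 0.3, 1: 0.1, 4: 0.2}
```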



Conditional Probability Distribution

The conditional probability distribution of the random variable X given Y = y is given by:

f(x | y) = f(x | Y = y) = f(x, y) / fY(y)             if X, Y are continuous
f(x | y) = f(x | Y = y) = P(X = x, Y = y) / fY(y)     if X, Y are discrete



Example

Let

f(x, y) = C(x + y)   for x, y ≥ 0 and x + y < 1
f(x, y) = 0           otherwise

1) Show the range of (X, Y), RXY, in the x-y plane.
2) Find the constant C.
3) Find the marginal pdfs fX(x) and fY(y).
4) Find P(Y < 2X²).
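A minimal numerical sketch (assuming the density is f(x, y) = C(x + y) on the triangle x, y ≥ 0, x + y < 1) that recovers C by requiring the density to integrate to 1:

```python
from scipy import integrate

# integrate (x + y) over the triangle x >= 0, y >= 0, x + y < 1
mass, _ = integrate.dblquad(lambda y, x: x + y, 0, 1, lambda x: 0, lambda x: 1 - x)
C = 1 / mass
print(mass, C)  # mass = 1/3, so C = 3
```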



Expected Value

Definition
Let f(x, y) be the joint probability function of (X, Y). The expected value of the product XY is:

E(XY) = Σ_{x,y} x y f(x, y)                          if X, Y are discrete
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy      if X, Y are continuous



Properties of expected value

Let X and Y be two random variables. Then:
▶ E[aX + bY] = aE[X] + bE[Y]
▶ If X and Y are independent, then E[XY] = E[X] E[Y]



Conditional Expectation

Definition
Let X and Y be jointly distributed with pmf or pdf f(x, y). Then, the conditional expectation of X given Y = y is:

E(X | Y = y) = Σ_x x f(x | y)               if X, Y are discrete
E(X | Y = y) = ∫_{−∞}^{∞} x f(x | y) dx     if X, Y are continuous
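As a quick illustration with the joint pmf table from the earlier example, the conditional pmf of X given Y = 0 and the corresponding conditional expectation can be computed as follows:

```python
import numpy as np

x_vals = np.array([-1, 3, 5])
joint_y0 = np.array([0.1, 0.2, 0.0])   # column y = 0 of the joint pmf table
f_Y0 = joint_y0.sum()                   # marginal P(Y = 0) = 0.3

cond_pmf = joint_y0 / f_Y0              # f(x | Y = 0) = [1/3, 2/3, 0]
E_X_given_Y0 = (x_vals * cond_pmf).sum()
print(cond_pmf, E_X_given_Y0)           # E[X | Y = 0] = 5/3 ≈ 1.67
```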



Example

Let

f(x, y) = (x + y)/2   for x > 0, y > 0, 3x + y > 3
f(x, y) = 0            otherwise

Find:
▶ E[X]



Covariance and Correlation

Definition
The covariance between two random variables X and Y is defined by:

σXY = Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − µX µY

where µX = E(X) and µY = E(Y). The correlation coefficient is defined by:

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
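A small simulation sketch (the linear data-generating process below is an arbitrary assumption, not from the slides) showing how sample covariance and correlation estimate these quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)    # Y depends linearly on X plus noise

cov_xy = np.cov(x, y, ddof=1)[0, 1]      # sample covariance
rho_xy = np.corrcoef(x, y)[0, 1]         # sample correlation coefficient
print(cov_xy, rho_xy)                    # roughly 2 and 2/sqrt(5) ≈ 0.894
```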



Covariance

If small values of X, i.e. (X − µX) < 0, tend to occur together with small values of Y, i.e. (Y − µY) < 0, then the covariance is positive.
If small values of X, (X − µX) < 0, tend to occur together with large values of Y, (Y − µY) > 0, then the covariance is negative.
Covariance is thus a signed measure of how X and Y vary together.
If X and Y are independent, then Cov(X, Y) = 0.



Correlation

Correlation measures the linear relationship between the random variables X and Y. If Y = aX + b with a ≠ 0, then |ρ(X, Y)| = 1 (ρ = 1 if a > 0 and ρ = −1 if a < 0).
Unlike covariance, the correlation coefficient of X and Y is dimensionless.



Properties
The properties of the covariance and the correlation coefficient:
▶ −1 ≤ ρ ≤ 1
▶ If X and Y are independent, then ρ = 0.
▶ If Y = aX + b, then ρXY = 1 if a > 0 and ρXY = −1 if a < 0.
▶ If U = a1 X + b1 and V = a2 Y + b2, then Cov(U, V) = a1 a2 Cov(X, Y), and ρUV = ρXY if a1 a2 > 0, ρUV = −ρXY otherwise.
▶ Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
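A simulation sketch (with assumed, artificially correlated variables) that checks the last identity numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)        # correlated with x by construction
a, b = 2.0, -3.0

lhs = np.var(a * x + b * y, ddof=1)
rhs = (a**2 * np.var(x, ddof=1) + b**2 * np.var(y, ddof=1)
       + 2 * a * b * np.cov(x, y, ddof=1)[0, 1])
print(lhs, rhs)                          # the two values agree up to sampling noise
```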





Sums of random variables

Let X1, X2, . . . , Xk be k random variables with means µ1, µ2, . . . , µk and variances σ1², σ2², . . . , σk². Then:

E(X1 + X2 + . . . + Xk) = µ1 + µ2 + . . . + µk.



Sums of random variables

Suppose that X1, X2, . . . , Xk are independent random variables. Then:

Var(X1 + X2 + . . . + Xk) = σ1² + σ2² + . . . + σk².

If X1, X2, . . . , Xk are not independent, then:

Var(X1 + X2 + . . . + Xk) = σ1² + σ2² + . . . + σk² + 2 Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} Cov(Xi, Xj).





Motivation

The frequency interpretation of probability depends on the long-run proportion of times an outcome would occur in repeated experiments.
Some binomial probabilities can be approximated using the normal distribution (via limiting arguments).
Many random variables that we encounter in nature have distributions close to the normal probability distribution.





Chebyshev’s Theorem

The theorem gives a lower bound for the area under a curve between two points that:
▶ are on opposite sides of the mean
▶ are equidistant from the mean
Why is this result important?
▶ We do not need to know the distribution of the underlying population.
▶ We only need to know its mean and variance.
The theorem was developed by the Russian mathematician Pafnuty Chebyshev (1821-1894).



Chebyshev’s Theorem

Theorem
Let the random variable X have mean µ and standard deviation σ. Then for any constant K > 0,

P(|X − µ| < Kσ) ≥ 1 − 1/K².

Equivalently, for any ϵ > 0,

P(|X − µ| ≥ ϵ) ≤ E[(X − µ)²] / ϵ² = Var(X) / ϵ².



Chebyshev’s Theorem

The theorem says that the probability that a random variable X differs from its mean by at least K standard deviations is less than or equal to 1/K².
Suppose that we do not know the population distribution. We can still use Chebyshev's theorem to learn something about it:
▶ At least (1 − 1/k²) · 100% of the observations will lie within k standard deviations of the mean.
▶ For example, with k = 2, at least 75% of the observations lie within two standard deviations of the mean.
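A minimal simulation sketch (using an exponential population as an arbitrary non-normal example) comparing the empirical proportion within k standard deviations to the Chebyshev lower bound 1 − 1/k²:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)  # a deliberately non-normal population
mu, sigma = x.mean(), x.std()

for k in (1.5, 2, 3):
    empirical = np.mean(np.abs(x - mu) < k * sigma)
    print(k, empirical, 1 - 1 / k**2)   # empirical proportion vs Chebyshev lower bound
```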



Chebyshev’s Theorem and Empirical Rule

If the distribution is bell-shaped, then we have the following empirical rule:
▶ approximately 68% of the observations lie within one standard deviation of the mean
▶ approximately 95% of the observations lie within two standard deviations of the mean
▶ approximately 99.7% of the observations lie within three standard deviations of the mean



Example

Suppose that a random variable X has mean 24 and variance 9. Obtain a bound on the probability that X assumes values between 16.5 and 31.5.
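One way to work this out (a sketch of the computation): here σ = √9 = 3, and both 16.5 and 31.5 lie 7.5 = 2.5σ away from µ = 24. Taking K = 2.5 in Chebyshev's theorem gives P(16.5 ≤ X ≤ 31.5) ≥ P(|X − 24| < 2.5σ) ≥ 1 − 1/2.5² = 0.84.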



Example

Let X be a random variable that represents the systolic blood pressure of the population of 18- to 74-year-old men. Suppose that X has mean 129 mm Hg and standard deviation 19.8 mm Hg.
▶ What is the bound on the probability that the systolic blood pressure in this population will assume values between 89.4 and 168.6 mm Hg?
▶ In addition, assume that the distribution of X is approximately normal. Using the normal table, find P(89.4 ≤ X ≤ 168.6).
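A quick numerical sketch for both parts (the normal part assumes X ~ N(129, 19.8²); scipy is used only for the normal cdf):

```python
from scipy.stats import norm

mu, sigma = 129.0, 19.8
lo, hi = 89.4, 168.6
k = (hi - mu) / sigma                    # both endpoints are k = 2 std devs from the mean

chebyshev_bound = 1 - 1 / k**2           # at least 0.75
normal_prob = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)  # ≈ 0.9545
print(k, chebyshev_bound, normal_prob)
```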





Law of large numbers

We can now use Chebyshev’s theorem to prove the weak law of large
numbers.
The law says that if the sample size n is large, the sample mean rarely
deviates from the mean of the distribution of X .



Law of large numbers

Theorem
Let X1, . . . , Xn be a set of pairwise independent random variables with E(Xi) = µ and Var(Xi) = σ². Then for any c > 0,

P(µ − c ≤ X̄ ≤ µ + c) ≥ 1 − σ²/(n c²),

and as n → ∞, this probability approaches 1. Equivalently, writing Sn = X1 + . . . + Xn,

P(|Sn/n − µ| < ϵ) → 1   as n → ∞.



Law of large numbers

Without knowing the underlying distribution of Sn, the weak law of large numbers states that:
▶ the probability that the sample mean X̄ = Sn/n differs from the population mean by less than an arbitrary constant ϵ > 0
▶ tends to 1 as n tends to ∞.
This law is also referred to as the law of averages; a simulation sketch is shown below.
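A minimal simulation sketch (with a Uniform(0, 1) population chosen as an arbitrary example, so µ = 0.5) showing the sample mean settling near the population mean as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)
population_mean = 0.5                     # mean of a Uniform(0, 1) population

for n in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.uniform(0, 1, size=n)
    print(n, sample.mean())               # sample means cluster around 0.5 as n grows
```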





Central limit theorem

The Central Limit Theorem is one of the most important results in probability theory.
The theorem states that the standardized (z-transformed) sample mean is asymptotically standard normal.
No matter what the shape of the original distribution, the sampling distribution of the mean approaches a normal probability distribution as the sample size grows.



Central limit theorem

Theorem
If X1, . . . , Xn is a random sample from an infinite population with mean µ, variance σ², and moment-generating function MX(t), then the limiting distribution of

Zn = (X̄ − µ) / (σ/√n)

as n → ∞ is the standard normal probability distribution.

In other words, Zn is approximately N(0, 1) for large n.
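A simulation sketch (with an Exponential(1) population chosen as an arbitrary, skewed example, so µ = σ = 1) showing that the standardized sample mean behaves approximately like a standard normal variable:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 30, 100_000
mu, sigma = 1.0, 1.0                       # mean and std dev of an Exponential(1) population

samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

print(z.mean(), z.std())                   # close to 0 and 1
print(np.mean(z <= 1.96))                  # close to the normal cdf at 1.96 ≈ 0.975
```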



Example

A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 8 ounces and a standard deviation of 0.4 ounces. What is the approximate probability that the average of 36 randomly chosen fills exceeds 8.1 ounces?
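One way to work this out numerically under the CLT approximation (a sketch, with scipy used for the normal tail probability):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 8.0, 0.4, 36
se = sigma / sqrt(n)                       # standard error of the sample mean = 1/15
z = (8.1 - mu) / se                        # = 1.5
print(z, norm.sf(z))                       # P(sample mean > 8.1) ≈ 0.0668
```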





Motivation

A sample is a set of random variables.
Thus, a sample statistic (e.g., the sample mean or the sample variance) is also random.
Consequently, a sample statistic has a probability distribution.
We call it the sampling distribution (which is usually different from the population distribution).



Motivation

The sampling distribution of a statistic provides a theoretical model of the relative frequency histogram of the likely values of the statistic that one would observe through repeated sampling.



Sample

Definition
A sample is a set of observable random variables X1 , . . . , Xn . The number
n is called the sample size.



Random Sample

Definition
A random sample of size n from a population is a set of n independent
and identically distributed (iid) observable random variables X1 , . . . , Xn .



Statistic

Definition
A function T of observable random variables X1 , . . . , Xn that does not
depend on any unknown parameters is called a statistic.



Sampling distribution

Definition
The probability distribution of a sample statistic is called the sampling
distribution.



Sample mean and sample variance

Theorem
Let X1, . . . , Xn be a random sample of size n from a population with mean µ and variance σ². Then, E(X̄) = µ and Var(X̄) = σ²/n.
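A small simulation sketch (with an assumed Uniform(0, 1) population, so µ = 0.5 and σ² = 1/12) checking both identities:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 25, 200_000
samples = rng.uniform(0, 1, size=(reps, n))
xbar = samples.mean(axis=1)                # one sample mean per replication

print(xbar.mean(), 0.5)                    # E(sample mean) ≈ µ = 0.5
print(xbar.var(ddof=1), (1 / 12) / n)      # Var(sample mean) ≈ σ²/n = 1/300
```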

