
Lecture 4: Joint Probability Distribution & Limit Theorems

Wisnu Setiadi Nugroho

Universitas Gadjah Mada



1 Joint Probability Distributions
Bivariate distributions
More than two random variables

2 Limit Theorems
Introduction
Chebyshev’s Theorem
Law of Large Numbers
Central Limit Theorem

3 Sampling Distributions
Introduction



Dealing with more than one random variable

In practice, we often deal with more than one random variable.
These variables may not be independent of each other.
For example:
▶ height and weight
▶ being poor and finishing elementary school
In this section, we study the joint distribution of two random variables: bivariate distributions.



Joint probability function

Definition
Let X and Y be random variables. If both X and Y are discrete, then

f(x, y) = P(X = x, Y = y)

is called the joint probability function (joint pmf) of X and Y.



Joint probability function

Definition
If both X and Y are continuous, then f(x, y) is called the joint probability density function (joint pdf) of X and Y iff

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx



Example

Suppose that there are 8 red balls, 10 yellow balls, and 20 blue balls in
a bucket. A total of 10 balls are randomly selected from this bucket.
Let X = number of red balls and Y = number of blue balls. Find the
joint probability function of the bivariate random variable (X , Y ).
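A minimal sketch of this pmf in Python (assuming the 10 balls are drawn without replacement, so that (X, Y) is multivariate hypergeometric):

```python
from math import comb

def joint_pmf(x, y, n_red=8, n_yellow=10, n_blue=20, n_draw=10):
    """Joint pmf of (X, Y) = (# red, # blue) when n_draw balls are
    drawn without replacement from the bucket (an assumption here)."""
    n_yellow_drawn = n_draw - x - y
    if x < 0 or y < 0 or n_yellow_drawn < 0:
        return 0.0
    if x > n_red or y > n_blue or n_yellow_drawn > n_yellow:
        return 0.0
    total = n_red + n_yellow + n_blue
    return comb(n_red, x) * comb(n_blue, y) * comb(n_yellow, n_yellow_drawn) / comb(total, n_draw)

# sanity check: the probabilities over all (x, y) should sum to 1
print(sum(joint_pmf(x, y) for x in range(11) for y in range(11)))
print(joint_pmf(2, 5))  # e.g. P(X = 2, Y = 5)
```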



Joint probability function

Theorem
If X and Y are two random variables with joint probability function f(x, y), then:
▶ f(x, y) ≥ 0 for all x and y
▶ If X and Y are discrete, then Σ_{x,y} f(x, y) = 1, where the sum is over all values (x, y) that are assigned nonzero probabilities.
▶ If X and Y are continuous, then ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.



Marginal distribution function

Suppose that we are given the joint probability function (pmf or pdf).
We can obtain the probability distribution of one of the components through its marginal distribution.



Marginal distribution function

Definition
The marginal pmf or pdf of X, denoted fX(x), is defined by:

fX(x) = ∫_{−∞}^{∞} f(x, y) dy    if X, Y are continuous
fX(x) = Σ_{all y} f(x, y)          if X, Y are discrete

Similarly, fY(y) is defined by:

fY(y) = ∫_{−∞}^{∞} f(x, y) dx    if X, Y are continuous
fY(y) = Σ_{all x} f(x, y)          if X, Y are discrete



Example

Find the marginal probability functions of the random variables X and Y, given the joint pmf:

            y = −2    y = 0    y = 1    y = 4
  x = −1     0.2       0.1      0        0.2
  x = 3      0.1       0.2      0.1      0
  x = 5      0.1       0        0        0
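A quick numerical check of the marginals, obtained by summing the joint pmf table over the other variable (NumPy is used here only for convenience):

```python
import numpy as np

# joint pmf from the table above: rows are x = -1, 3, 5; columns are y = -2, 0, 1, 4
x_vals = np.array([-1, 3, 5])
y_vals = np.array([-2, 0, 1, 4])
joint = np.array([
    [0.2, 0.1, 0.0, 0.2],
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.0, 0.0],
])

f_X = joint.sum(axis=1)  # marginal pmf of X: sum over y
f_Y = joint.sum(axis=0)  # marginal pmf of Y: sum over x
print(dict(zip(x_vals, f_X)))  # {-1: 0.5, 3: 0.4, 5: 0.1}
print(dict(zip(y_vals, f_Y)))  # {-2: 0.4, 0: 0.3, 1: 0.1, 4: 0.2}
```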



Conditional Probability Distribution

The conditional probability distribution of the random variable X given Y = y is given by:

f(x | y) = f(x | Y = y) = f(x, y) / fY(y)             if X, Y are continuous
f(x | y) = f(x | Y = y) = P(X = x, Y = y) / fY(y)     if X, Y are discrete



Example

Let

f(x, y) = C(x + y)   for x, y ≥ 0 and x + y < 1
f(x, y) = 0           otherwise

1) Show the range of (X, Y), RXY, in the x-y plane.
2) Find the constant C.
3) Find the marginal pdfs fX(x) and fY(y).
4) Find P(Y < 2X²).
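A minimal numerical sketch (assuming the density is f(x, y) = C(x + y) on the triangle x, y ≥ 0, x + y < 1) that recovers C by requiring the density to integrate to 1:

```python
from scipy import integrate

# integrate (x + y) over the triangle x >= 0, y >= 0, x + y < 1
mass, _ = integrate.dblquad(lambda y, x: x + y, 0, 1, lambda x: 0, lambda x: 1 - x)
C = 1 / mass
print(mass, C)  # mass = 1/3, so C = 3
```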



Expected Value

Definition
Let f(x, y) be the joint probability function of (X, Y). The expected value of the product XY is:

E(XY) = Σ_{x,y} x y f(x, y)                          if X, Y are discrete
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy      if X, Y are continuous



Properties of expected value

Let X and Y be two random variables. Then:
▶ E[aX + bY] = aE[X] + bE[Y]
▶ If X and Y are independent, then E[XY] = E[X] E[Y]



Conditional Expectation

Definition
Let X and Y be jointly distributed with pmf or pdf f(x, y). Then, the conditional expectation of X given Y = y is:

E(X | Y = y) = Σ_x x f(x | y)               if X, Y are discrete
E(X | Y = y) = ∫_{−∞}^{∞} x f(x | y) dx     if X, Y are continuous
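As a quick illustration with the joint pmf table from the earlier example, the conditional pmf of X given Y = 0 and the corresponding conditional expectation can be computed as follows:

```python
import numpy as np

x_vals = np.array([-1, 3, 5])
joint_y0 = np.array([0.1, 0.2, 0.0])   # column y = 0 of the joint pmf table
f_Y0 = joint_y0.sum()                   # marginal P(Y = 0) = 0.3

cond_pmf = joint_y0 / f_Y0              # f(x | Y = 0) = [1/3, 2/3, 0]
E_X_given_Y0 = (x_vals * cond_pmf).sum()
print(cond_pmf, E_X_given_Y0)           # E[X | Y = 0] = 5/3 ≈ 1.67
```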



Example

Let

f(x, y) = (x + y)/2   for x > 0, y > 0, 3x + y > 3
f(x, y) = 0            otherwise

Find:
▶ E[X]



Covariance and Correlation

Definition
The covariance between two random variables X and Y is defined by:

σXY = Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − µX µY

where µX = E(X) and µY = E(Y). The correlation coefficient is defined by:

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
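A small simulation sketch (the linear data-generating process below is an arbitrary assumption, not from the slides) showing how sample covariance and correlation estimate these quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)    # Y depends linearly on X plus noise

cov_xy = np.cov(x, y, ddof=1)[0, 1]      # sample covariance
rho_xy = np.corrcoef(x, y)[0, 1]         # sample correlation coefficient
print(cov_xy, rho_xy)                    # roughly 2 and 2/sqrt(5) ≈ 0.894
```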



Covariance

If small values of X, i.e. (X − µX) < 0, tend to occur together with small values of Y, i.e. (Y − µY) < 0, then the covariance is positive.
If small values of X, (X − µX) < 0, tend to occur together with large values of Y, (Y − µY) > 0, then the covariance is negative.
Covariance is thus a signed measure of how X and Y vary together.
If X and Y are independent, then Cov(X, Y) = 0.



Correlation

Correlation measures the linear relationship between the random variables X and Y. If Y = aX + b with a ≠ 0, then |ρ(X, Y)| = 1 (ρ = 1 if a > 0 and ρ = −1 if a < 0).
Unlike covariance, the correlation coefficient of X and Y is dimensionless.



Properties
The properties of the covariance and the correlation coefficient:
▶ −1 ≤ ρ ≤ 1
▶ If X and Y are independent, then ρ = 0.
▶ If Y = aX + b, then ρXY = 1 if a > 0 and ρXY = −1 if a < 0.
▶ If U = a1 X + b1 and V = a2 Y + b2, then Cov(U, V) = a1 a2 Cov(X, Y), and ρUV = ρXY if a1 a2 > 0, ρUV = −ρXY otherwise.
▶ Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
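A simulation sketch (with assumed, artificially correlated variables) that checks the last identity numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)        # correlated with x by construction
a, b = 2.0, -3.0

lhs = np.var(a * x + b * y, ddof=1)
rhs = (a**2 * np.var(x, ddof=1) + b**2 * np.var(y, ddof=1)
       + 2 * a * b * np.cov(x, y, ddof=1)[0, 1])
print(lhs, rhs)                          # the two values agree up to sampling noise
```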





Sums of random variables

Let X1, X2, . . . , Xk be k random variables with means µ1, µ2, . . . , µk and variances σ1², σ2², . . . , σk². Then:

E(X1 + X2 + . . . + Xk) = µ1 + µ2 + . . . + µk.



Sums of random variables

Suppose that X1, X2, . . . , Xk are independent random variables. Then:

Var(X1 + X2 + . . . + Xk) = σ1² + σ2² + . . . + σk².

If X1, X2, . . . , Xk are not independent, then:

Var(X1 + X2 + . . . + Xk) = σ1² + σ2² + . . . + σk² + 2 Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} Cov(Xi, Xj).





Motivation

The frequency interpretation of probability depends on the long-run proportion of times an outcome would occur in repeated experiments.
Some binomial probabilities can be approximated using the normal distribution (via limiting arguments).
Many random variables that we encounter in nature have distributions close to the normal probability distribution.





Chebyshev’s Theorem

The theorem gives a lower bound for the area under a curve between two points that:
▶ are on opposite sides of the mean
▶ are equidistant from the mean
Why is this result important?
▶ We do not need to know the distribution of the underlying population.
▶ We only need to know its mean and variance.
The theorem was developed by the Russian mathematician Pafnuty Chebyshev (1821-1894).



Chebyshev’s Theorem

Theorem
Let the random variable X have mean µ and standard deviation σ. Then for any constant K > 0,

P(|X − µ| < Kσ) ≥ 1 − 1/K².

Equivalently, for any ϵ > 0,

P(|X − µ| ≥ ϵ) ≤ E[(X − µ)²] / ϵ² = Var(X) / ϵ².



Chebyshev’s Theorem

The theorem says that the probability that a random variable X differs from its mean by at least K standard deviations is less than or equal to 1/K².
Suppose that we do not know the population distribution. We can still use Chebyshev's theorem to learn something about it:
▶ At least (1 − 1/k²) · 100% of the observations will lie within k standard deviations of the mean.
▶ For example, with k = 2, at least 75% of the observations lie within two standard deviations of the mean.
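A minimal simulation sketch (using an exponential population as an arbitrary non-normal example) comparing the empirical proportion within k standard deviations to the Chebyshev lower bound 1 − 1/k²:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)  # a deliberately non-normal population
mu, sigma = x.mean(), x.std()

for k in (1.5, 2, 3):
    empirical = np.mean(np.abs(x - mu) < k * sigma)
    print(k, empirical, 1 - 1 / k**2)   # empirical proportion vs Chebyshev lower bound
```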



Chebyshev’s Theorem and Empirical Rule

If the distribution is bell-shaped, then we have the following empirical rule:
▶ approximately 68% of the observations lie within one standard deviation of the mean
▶ approximately 95% of the observations lie within two standard deviations of the mean
▶ approximately 99.7% of the observations lie within three standard deviations of the mean



Example

Suppose that a random variable X has mean 24 and variance 9. Obtain a bound on the probability that X assumes values between 16.5 and 31.5.
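One way to work this out (a sketch of the computation): here σ = √9 = 3, and both 16.5 and 31.5 lie 7.5 = 2.5σ away from µ = 24. Taking K = 2.5 in Chebyshev's theorem gives P(16.5 ≤ X ≤ 31.5) ≥ P(|X − 24| < 2.5σ) ≥ 1 − 1/2.5² = 0.84.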



Example

Let X be a random variable that represents the systolic blood pressure of the population of 18- to 74-year-old men. Suppose that X has mean 129 mm Hg and standard deviation 19.8 mm Hg.
▶ What is the bound on the probability that the systolic blood pressure in this population will assume values between 89.4 and 168.6 mm Hg?
▶ In addition, assume that the distribution of X is approximately normal. Using the normal table, find P(89.4 ≤ X ≤ 168.6).
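A quick numerical sketch for both parts (the normal part assumes X ~ N(129, 19.8²); scipy is used only for the normal cdf):

```python
from scipy.stats import norm

mu, sigma = 129.0, 19.8
lo, hi = 89.4, 168.6
k = (hi - mu) / sigma                    # both endpoints are k = 2 std devs from the mean

chebyshev_bound = 1 - 1 / k**2           # at least 0.75
normal_prob = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)  # ≈ 0.9545
print(k, chebyshev_bound, normal_prob)
```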





Law of large numbers

We can now use Chebyshev’s theorem to prove the weak law of large
numbers.
The law says that if the sample size n is large, the sample mean rarely
deviates from the mean of the distribution of X .



Law of large numbers

Theorem
Let X1, . . . , Xn be a set of pairwise independent random variables with E(Xi) = µ and Var(Xi) = σ². Then for any c > 0,

P(µ − c ≤ X̄ ≤ µ + c) ≥ 1 − σ²/(n c²),

and as n → ∞, this probability approaches 1. Equivalently, writing Sn = X1 + . . . + Xn,

P(|Sn/n − µ| < ϵ) → 1   as n → ∞.



Law of large numbers

Without knowing the underlying distribution of Sn, the weak law of large numbers states that:
▶ the probability that the sample mean X̄ = Sn/n differs from the population mean by less than an arbitrary constant ϵ > 0
▶ tends to 1 as n tends to ∞.
This law is also referred to as the law of averages; a simulation sketch is shown below.
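A minimal simulation sketch (with a Uniform(0, 1) population chosen as an arbitrary example, so µ = 0.5) showing the sample mean settling near the population mean as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)
population_mean = 0.5                     # mean of a Uniform(0, 1) population

for n in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.uniform(0, 1, size=n)
    print(n, sample.mean())               # sample means cluster around 0.5 as n grows
```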





Central limit theorem

The Central Limit Theorem is one of the most important results in probability theory.
The theorem states that the standardized (z-transformed) sample mean is asymptotically standard normal.
No matter what the shape of the original distribution, the sampling distribution of the mean approaches a normal probability distribution as the sample size grows.



Central limit theorem

Theorem
If X1, . . . , Xn is a random sample from an infinite population with mean µ, variance σ², and moment-generating function MX(t), then the limiting distribution of

Zn = (X̄ − µ) / (σ/√n)

as n → ∞ is the standard normal probability distribution.

In other words, Zn is approximately N(0, 1) for large n.
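A simulation sketch (with an Exponential(1) population chosen as an arbitrary, skewed example, so µ = σ = 1) showing that the standardized sample mean behaves approximately like a standard normal variable:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 30, 100_000
mu, sigma = 1.0, 1.0                       # mean and std dev of an Exponential(1) population

samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

print(z.mean(), z.std())                   # close to 0 and 1
print(np.mean(z <= 1.96))                  # close to the normal cdf at 1.96 ≈ 0.975
```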



Example

A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 8 ounces and a standard deviation of 0.4 ounces. What is the approximate probability that the average of 36 randomly chosen fills exceeds 8.1 ounces?
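One way to work this out numerically under the CLT approximation (a sketch, with scipy used for the normal tail probability):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 8.0, 0.4, 36
se = sigma / sqrt(n)                       # standard error of the sample mean = 1/15
z = (8.1 - mu) / se                        # = 1.5
print(z, norm.sf(z))                       # P(sample mean > 8.1) ≈ 0.0668
```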





Motivation

A sample is a set of random variables.
Thus, a sample statistic (e.g., the sample mean or the sample variance) is also random.
Consequently, a sample statistic has a probability distribution.
We call it the sampling distribution (which is usually different from the population distribution).



Motivation

The sampling distribution of a statistic provides a theoretical model of the relative frequency histogram of the likely values of the statistic that one would observe through repeated sampling.



Sample

Definition
A sample is a set of observable random variables X1 , . . . , Xn . The number
n is called the sample size.



Random Sample

Definition
A random sample of size n from a population is a set of n independent
and identically distributed (iid) observable random variables X1 , . . . , Xn .



Statistic

Definition
A function T of observable random variables X1 , . . . , Xn that does not
depend on any unknown parameters is called a statistic.



Sampling distribution

Definition
The probability distribution of a sample statistic is called the sampling
distribution.



Sample mean and sample variance

Theorem
Let X1, . . . , Xn be a random sample of size n from a population with mean µ and variance σ². Then, E(X̄) = µ and Var(X̄) = σ²/n.
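A small simulation sketch (with an assumed Uniform(0, 1) population, so µ = 0.5 and σ² = 1/12) checking both identities:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 25, 200_000
samples = rng.uniform(0, 1, size=(reps, n))
xbar = samples.mean(axis=1)                # one sample mean per replication

print(xbar.mean(), 0.5)                    # E(sample mean) ≈ µ = 0.5
print(xbar.var(ddof=1), (1 / 12) / n)      # Var(sample mean) ≈ σ²/n = 1/300
```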

