
Functions

Matt Golder & Sona Golder

Pennsylvania State University

Random Variables

Whenever an experiment is conducted, we are typically interested in the

realization of some variable, X.

The realizations of X are governed by the rules of probability regarding

potential outcomes in the sample space.

X is referred to as a random variable – it is a numerical description of the

outcome of an experiment.

Random Variables

To deﬁne a random variable, we have to assign a real number to each point in

the sample space of the experiment that denotes the value of the variable X.

This function is called a random variable (or stochastic variable) or more

precisely a random function (stochastic function).

Formally, a random variable is a real-valued function for which the domain is

the sample space.


Random Variables

Example: Suppose that a coin is tossed twice so that the sample space is

S = {HH, HT, TH, TT}.

Let X represent the number of heads that can come up.

With each sample point we can associate a number for X.

Thus, X is a random variable.

Table: Random Variable: Tossing a Coin Twice

Sample Point HH HT TH TT

X 2 1 1 0

Random Variables

A discrete random variable X is one that can assume only a ﬁnite or

countably inﬁnite number of distinct values.

A continuous random variable is one that can assume an uncountably infinite number of values.

Random Variables

Statistical inference involves making an inference about a population. The

event of interest to us will often correspond to the value of a random variable.

The collection of probabilities associated with the diﬀerent values of a random

variable is called the probability distribution of the random variable.

The intuition of probability as it relates to discrete and continuous random

variables is identical; the only thing that is diﬀerent is the math.

Let’s start by looking at discrete probability distributions.


Probability Mass Function

Deﬁnition: The probability that X takes on the value x, Pr(X = x), is deﬁned

as the sum of the probabilities of all sample points in S that are assigned the

value x. Pr(X = x) is sometimes denoted by Pr(x) or p(x).

Probability distributions are functions that assign probabilities to each value x

of the random variable X. Thus, the probability distribution of X is also the

probability function of X i.e. f(x).

For a discrete random variable, the probability distribution is called the

probability mass function (pmf).

For a continuous random variable, it is called the probability density function

(pdf).

Probability Mass Function

The pmf, f(x), of a discrete random variable X satisﬁes the following

properties:

1. 0 ≤ f(x) ≤ 1

2. Σ_x f(x) = 1, where the summation is over all values of x with nonzero probability.

Example: A couple plan to have 3 children, and are interested in the number of

girls they might have.

We can redeﬁne “the number of girls” as a random variable X that takes on

the values 0, 1, 2, and 3 i.e. X ∈ {0, 1, 2, 3}.

Associated with each value is a probability, derived from the original sample

space.

Probability Mass Function

Figure: Probability Mass Function for Number of Girls in a Three-Child Family

I (if the probability of a boy on each birth is 0.52)

[Tree diagram omitted: each outcome e with its probability Pr(e): BBB .14, BBG .13, BGB .13, BGG .12, GBB .13, GBG .12, GGB .12, GGG .11]

x      0    1    2    3
Pr(x)  .14  .39  .36  .11

What is the probability that the couple will have more than one girl?

Pr(X > 1) = p(2) + p(3) = 0.36 + 0.11 = 0.47
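The table above can be checked with a few lines of Python (a minimal sketch; the probabilities are the slide's rounded values):

```python
# pmf for the number of girls, taken from the table above
pmf = {0: 0.14, 1: 0.39, 2: 0.36, 3: 0.11}

# the probabilities lie in [0, 1] and sum to 1
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# Pr(X > 1) = p(2) + p(3)
print(round(sum(p for x, p in pmf.items() if x > 1), 2))  # 0.47
```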


Probability Mass Function

Figure: Probability Mass Function for Number of Girls in a Three-Child Family

II (if the probability of a boy on each birth is 0.52)

[Bar chart omitted: Pr(X = x) on the vertical axis (0 to 0.4) plotted against x = 0, 1, 2, 3.]

Probability Mass Function

Example: Suppose we have a sample of two U.S. voters who could support Obama or McCain. The sample space is S = {OO, OM, MO, MM}. Let's assume that the probability that a voter supports Obama is given by Pr(Vi = O) = o. It follows from this that Pr(Vi = M) = 1 − o.

Table: Obama vs McCain

Sample Point   Obama Voters   Probability of Sample Point
OO             2              o^2
OM             1              o(1 − o)
MO             1              (1 − o)o
MM             0              (1 − o)^2


Probability Mass Function

Let f(x) be the probability that x respondents prefer Obama:

Table: PMF: Obama vs McCain

Obama Voters (x)   f(x)
0                  (1 − o)^2
1                  2o(1 − o)
2                  o^2

The probability that either one or both of the voters prefer Obama is Pr(X = 1 ∪ X = 2) = f(1) + f(2) = 2o(1 − o) + o^2.

Cumulative Probability Function

If X is at least ordinal, we may be interested in the probability that X assumes

a value less than or equal to a certain value, k.

Definition (Cumulative Probability Function): If x1, x2, . . . , xm are values of X such that x1 < x2 < . . . < xm, then the cumulative probability function of xk is given by:

Pr(X ≤ xk) = Σ_{x ≤ xk} Pr(X = x) = 1 − Σ_{x > xk} Pr(X = x)

or, equivalently,

Pr(X ≤ xk) = F(xk) = f(x1) + f(x2) + . . . + f(xk) = Σ_{i=1}^{k} f(xi)

Cumulative Probability Function

There are some important facts regarding cumulative probability functions to

note:

1. If xm is the largest value of X, then F(xm) = 1.

2. F(−∞) = 0 and F(∞) = 1.

3. F(xi) − F(xi−1) = f(xi), where xi > xi−1 > xi−2 > . . ..


Cumulative Probability Function

If X takes on only a ﬁnite number of values, x1, x2, . . . , xn, then the

cumulative distribution function is given by:

F(x) =
  0                       −∞ < x < x1
  f(x1)                   x1 ≤ x < x2
  f(x1) + f(x2)           x2 ≤ x < x3
  ...
  f(x1) + . . . + f(xn)   xn ≤ x < ∞

Cumulative Probability Function

The cumulative probability for the number of girls in a three-child family, X, is

F(x) =
  0                               −∞ < x < 0
  0.14                            0 ≤ x < 1
  0.14 + 0.39 = 0.53              1 ≤ x < 2
  0.14 + 0.39 + 0.36 = 0.89       2 ≤ x < 3
  0.14 + 0.39 + 0.36 + 0.11 = 1   3 ≤ x < ∞
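The cumulative probabilities are just running sums of the pmf, which a short Python sketch makes concrete (pmf values as on the slide):

```python
pmf = [0.14, 0.39, 0.36, 0.11]  # f(0), f(1), f(2), f(3)

cdf, total = [], 0.0
for p in pmf:
    total += p                  # F(x) accumulates the pmf up to x
    cdf.append(round(total, 2))

print(cdf)  # [0.14, 0.53, 0.89, 1.0]
```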

Cumulative Probability Function

Figure: Cumulative Probability Function for Number of Girls in a Three-Child

Family

[Step-function plot omitted: F(x) jumps from 0 to 0.14 at x = 0, to 0.53 at x = 1, to 0.89 at x = 2, and to 1.00 at x = 3; the jump sizes are the pmf values 0.14, 0.39, 0.36, and 0.11. Axes: F(X) against X, Number of Girls.]


Continuous Probability Functions

A continuous random variable is one that can assume an uncountably infinite number of values.

Because there are an uncountably infinite number of possible values, the probability of any particular value is zero, so we cannot build up a distribution by summing the probabilities of individual values.

Thus, we can’t really talk about probabilities; instead we talk about densities

and use the term probability density function (pdf).

Although the probability that a continuous random variable takes on any

particular value is zero, the probability that the variable takes on a value within

a particular interval is nonzero and can be found.

Probability and Cumulative Density Functions

The probabilities that the value of X will fall within any interval are given by

the corresponding area under the curve, f(x)

Figure: Probability Density Function for Clock-Dial Experiment

[Plot omitted: f(x) = 1/12 is constant over the dial positions x = 1, 2, . . . , 12.]

The probability that the clock hand stops between 8 and 9 is 1/12.

Probability and Cumulative Density Functions

The general idea is that the probability density function for a continuous random variable X can be represented by a curve, and that the probability that X assumes a value in the interval from a to b (a < b) is given by the area under this curve bounded by a and b.

We can develop the idea of a pdf by contrasting it with a pmf.


Discrete Case

Figure: Probability Mass Function

[Plot omitted: spikes of height f(x) at the points x1, . . . , x11 on the x-axis.]

Pr(x4 < x ≤ x10) = f(x5) + f(x6) + . . . + f(x10) = Σ_{i=5}^{10} f(xi) = F(x10) − F(x4)

where F(x) represents the cumulative density function.

Continuous Case

Figure: Probability Density Function

[Plot omitted: the area under the density curve f(x) between x4 and x10 is shaded.]

We now need to use the continuous version of summation – integration.

Pr(x4 < x < x10) = ∫_{x4}^{x10} f(x) dx

Since the probability that x = x10 is zero, Pr(x4 < x < x10) and Pr(x4 < x ≤ x10) are equivalent.


Probability and Cumulative Density Functions

In general, if X is a continuous random variable, then the probability that it

assumes a value in the interval from a to b is determined by

Pr(a < x < b) = ∫_a^b f(x) dx

where f(x) is the relevant probability density function.

Some important facts:

Since X must assume some value, this implies:

Pr(−∞ < x < ∞) = ∫_{−∞}^{∞} f(x) dx = 1

Probability and Cumulative Density Functions

The probability that X will assume any value less than or equal to some

speciﬁc x is

Pr(X ≤ x) = F(x) = ∫_{−∞}^{x} f(x) dx

where F(x) is the cumulative probability of x.

Figure: Probability Density Function and its Relationship to F(x)

[Plot omitted: the shaded area under f(x) to the left of x0 equals F(x0).]

Probability and Cumulative Density Functions

Figure: Cumulative Density Function

[Plot omitted: F(x) rises from 0 to 1, with two marked values F(x1) and F(x2) at points x2 < x1.]

The probability density function, f(x), is the derivative of the cumulative density function, F(x):

f(x) = ∂F(x)/∂x = F′(x)


Probability and Cumulative Density Functions

F(−∞) = 0 and F(∞) = 1.

For a < b,

F(b) − F(a) = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = ∫_a^b f(x) dx = Pr(a < x < b).

Figure: Cumulative Density Function

[Plot omitted: the shaded area under f(x) between a and b equals F(b) − F(a).]

Some Integration Rules

∫ x^n dx = x^{n+1}/(n + 1) + c   (n ≠ −1)

∫ x^{−1} dx = ln|x| + c

∫ k dx = kx + c

∫ e^x dx = e^x + c

Uniform Distribution

Figure: Uniform Density Function

[Plot omitted: f(x) = 1/(b − a) is constant between a and b and zero elsewhere.]

A uniform density function on the interval [a, b] is a constant, i.e., f(x) = k.

1 = ∫_a^b f(x) dx = ∫_a^b k dx = kx |_a^b = (b − a)k

so that

k = 1/(b − a)


Uniform Distribution

Figure: Finding Probabilities Using a Uniform Density Function

[Plot omitted: the rectangle under f(x) = 1/(b − a) between c and d is shaded.]

Pr(c ≤ X ≤ d) = Area of Shaded Rectangle = (d − c)/(b − a)

Uniform Distribution

Pr(c ≤ X ≤ d) = (d − c)/(b − a)

Proof.

Pr(c ≤ X ≤ d) = F(d) − F(c) = ∫_a^d f(x) dx − ∫_a^c f(x) dx

= x/(b − a) |_a^d − x/(b − a) |_a^c

= [d/(b − a) − a/(b − a)] − [c/(b − a) − a/(b − a)]

= (d − a)/(b − a) − (c − a)/(b − a)

= (d − c)/(b − a)

where we take advantage of the fact that f(x) = 1/(b − a).
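The rectangle-area formula can be wrapped in a tiny helper and cross-checked against a Riemann sum of the constant density (a sketch; `uniform_prob` is our own name, not a library function):

```python
def uniform_prob(a, b, c, d):
    """Pr(c <= X <= d) for X ~ Uniform[a, b], assuming a <= c <= d <= b."""
    return (d - c) / (b - a)

# cross-check with a Riemann sum of f(x) = 1/(b - a) over [c, d]
a, b, c, d, n = 0.0, 5.0, 2.0, 4.5, 100000
h = (d - c) / n
riemann = sum((1.0 / (b - a)) * h for _ in range(n))

print(round(uniform_prob(a, b, c, d), 6), round(riemann, 6))  # 0.5 0.5
```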

Uniform Distribution

Example: Suppose that X is a random variable with values between 0 and 5. Then X has a uniform distribution given by

f(x) = 1/(5 − 0) = 1/5

What is Pr(2 ≤ X ≤ 4.5)?

Pr(2 ≤ X ≤ 4.5) = ∫_2^{4.5} f(x) dx = (4.5 − 2)/(5 − 0) = 1/2


Uniform Distribution

Example: A random variable X that has a standard uniform density function takes on values between 0 and 1.

f(x) = 1/(1 − 0) = 1

What is Pr(0 ≤ X ≤ 0.5)?

Pr(0 ≤ X ≤ 0.5) = ∫_0^{0.5} f(x) dx = (0.5 − 0)/(1 − 0) = 1/2

Figure: Finding Probabilities Using a Standard Uniform Density Function

[Plot omitted: the rectangle under f(x) = 1 between 0 and 1/2 is shaded.]


Exponential Distribution

Example: What is Pr(0 ≤ X ≤ 2) if X has a standard exponential density function? The standard exponential density function is f(x) = e^{−x}.

Answer:

Pr(0 ≤ X ≤ 2) = ∫_0^2 e^{−x} dx = −e^{−x} |_0^2 = −e^{−2} − (−e^{−0}) = −e^{−2} − (−1) = 1 − e^{−2} = 1 − 1/e^2 ≈ 1 − 1/7.39 ≈ 0.86

where we take advantage of the fact that e^0 = 1.
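The exponential-tail calculation can be verified numerically (a sketch using a midpoint Riemann sum as a stand-in for the exact integral):

```python
import math

exact = 1 - math.exp(-2)  # Pr(0 <= X <= 2) = 1 - e^{-2}

# midpoint Riemann sum of e^{-x} over [0, 2]
n = 100000
h = 2.0 / n
approx = sum(math.exp(-(i + 0.5) * h) * h for i in range(n))

print(round(exact, 4), round(approx, 4))  # 0.8647 0.8647
```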


Example

Given f(x) = cx^2, 0 ≤ x ≤ 2, and f(x) = 0 elsewhere, find the value of c for which f(x) is a valid density function.

We require a value for c such that

∫_{−∞}^{∞} f(x) dx = ∫_0^2 cx^2 dx = cx^3/3 |_0^2 = (8/3)c = 1

Thus, (8/3)c = 1, and we find that c = 3/8. Thus, f(x) = (3/8)x^2.


Example

Given f(x) = (3/8)x^2, find P(1 ≤ X ≤ 2).

P(1 ≤ X ≤ 2) = (3/8) ∫_1^2 x^2 dx = (3/8)(x^3/3) |_1^2 = (3/8)(8/3 − 1/3) = 7/8

Now find P(1 < X < 2).

Because X has a continuous distribution, it follows that P(X = 1) = P(X = 2) = 0 and, therefore, that

P(1 ≤ X ≤ 2) = P(1 < X < 2) = 7/8
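Both results can be reproduced exactly with Python's `fractions` module (a sketch; the antiderivative x^3/3 is hard-coded rather than derived symbolically):

```python
from fractions import Fraction

# ∫_0^2 x^2 dx = 8/3, so c * (8/3) = 1
integral_0_2 = Fraction(2)**3 / 3 - Fraction(0)**3 / 3
c = 1 / integral_0_2
print(c)  # 3/8

# P(1 <= X <= 2) = c * ∫_1^2 x^2 dx = (3/8)(8/3 - 1/3)
p = c * (Fraction(2)**3 / 3 - Fraction(1)**3 / 3)
print(p)  # 7/8
```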


Two Random Variables

The distribution of a single random variable is known as a univariate

distribution.

But we might be interested in the intersection of two events, in which case we

need to look at joint distributions.

The joint distributions of two or more random variables are termed bivariate or

multivariate distributions.

Example: We might be interested in the possible outcomes of tossing a coin

and rolling a die. The 12 sample points associated with this experiment are

equiprobable and correspond to the 12 numerical events (x, y). Because all

pairs (x, y) occur with the same relative frequency, we assign probability (1/2) × (1/6) = 1/12 to each sample point.

Jointly Discrete Random Variables

Figure: Bivariate Probability Function where x = Rolling a Die and y = Tossing a Coin

[3-D plot omitted: spikes of height 1/12 at each of the 12 points (x, y), x = 1, . . . , 6, y = 0, 1.]

We can write the joint (or bivariate) probability (mass) function for X and Y as:

p(x, y) = f(x, y) ≡ Pr(X = x ∩ Y = y),  y = 0, 1,  x = 1, 2, . . . , 6.


Jointly Discrete Random Variables

If X and Y are discrete random variables with joint probability function

f(x, y) = f(X = x, Y = y), then

1. f(x, y) ≥ 0 ∀ x, y

2. Σ_x Σ_y f(x, y) = 1, where the sum is over all values (x, y) that are assigned nonzero probabilities.

Joint and Marginal Distributions

Table: Joint Probability Table (Crosstab): Joint and Marginal Distributions

          Value of X
          1     2     3     4     5     6     fY(y)
Y = 0     1/12  1/12  1/12  1/12  1/12  1/12  1/2
Y = 1     1/12  1/12  1/12  1/12  1/12  1/12  1/2
fX(x)     1/6   1/6   1/6   1/6   1/6   1/6   1

The probability that X assumes a certain value and Y assumes a certain value is called the joint probability of x and y and is written f(x, y).

The probability that X will assume a certain value alone (ignoring the value of Y) is called the marginal probability of x and is written fX(x).

fX(xi) = Σ_j f(xi, yj) = f(xi, y1) + f(xi, y2) + · · · .

Conditional Distributions

The multiplicative law gave us the probability of the intersection of A ∩ B as

P(A ∩ B) = P(A)P(B|A)

It follows directly from the multiplicative law of probability that the bivariate probability for the intersection of (x, y) is

f(x, y) = fX(x) f_{Y|X}(y|x) = fY(y) f_{X|Y}(x|y)

fX(x) and fY(y) are the univariate or marginal probability distributions for X and Y individually.

f_{Y|X}(y|x) is the conditional probability that the random variable Y equals y given that X takes on the value x, and f_{X|Y}(x|y) is the conditional probability that the random variable X equals x given that Y takes on the value y.


Conditional Distributions

Definition: If X and Y are jointly discrete random variables with joint probability function f(x, y) and marginal probability functions fX(x) and fY(y), respectively, then the conditional discrete probability function of X given Y is

p(x|y) = P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = f_{X|Y}(x|y) = f(x, y)/fY(y)

provided that fY(y) > 0.

Example: The probability of rolling a six on the die given that you have tossed a head with the coin is

f_{X|Y}(6|1) = f(6, 1)/fY(1) = (1/12)/(1/2) = 1/6.
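The same conditional probability falls out of the joint table directly; a sketch in Python that builds the 12-cell joint pmf:

```python
# joint pmf for (die, coin): every cell has probability 1/12
joint = {(x, y): 1 / 12 for x in range(1, 7) for y in (0, 1)}

f_y1 = sum(p for (x, y), p in joint.items() if y == 1)  # marginal fY(1) = 1/2
cond = joint[(6, 1)] / f_y1                              # fX|Y(6|1)

print(round(cond, 4))  # 0.1667, i.e. 1/6
```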

Conditional Distributions

Figure: X = Number of Girls and Y = Number of Runs Defined on the Sample Space

Outcome e   Pr(e)   X value   Y value
BBB         .14     0         1
BBG         .13     1         2
BGB         .13     1         3
BGG         .12     2         2
GBB         .13     1         2
GBG         .12     2         3
GGB         .12     2         2
GGG         .11     3         1

Conditional Distributions

Figure: Bivariate Probability Function where X = Number of Girls and Y = Number of Runs

[3-D plot omitted: spikes of height p(x, y) over the (x, y) grid, x = 0, 1, 2, 3, y = 1, 2, 3.]


Conditional Distributions

Figure: Joint Probability Table for x = Number of Girls and y = Number of Runs

         y = 1   y = 2   y = 3   p(x)
x = 0    .14     0       0       .14
x = 1    0       .26     .13     .39
x = 2    0       .24     .12     .36
x = 3    .11     0       0       .11
p(y)     .25     .50     .25     1.00

The joint probability of (1, 2) i.e. one girl and a run of two is f(1, 2) = 0.26.

The marginal probability of one girl is fX(1) = 0 + 0.26 + 0.13 = 0.39.

The conditional probability of 1 girl given a run of two is:

f_{X|Y}(1|2) = f(1, 2)/fY(2) = 0.26/0.50 = 0.52

Independence

Two events A and B are said to be independent if P(A∩ B) = P(A) ×P(B).

If X and Y are discrete random variables with joint probability function f(x, y)

and marginal probability functions fX(x) and fY (y), respectively, then X and

Y are independent if and only if

P(X = x, Y = y) = P(X = x)P(Y = y)

or equivalently

f(x, y) = fX(x)fY (y)

for all pairs of real numbers (x, y).

Independence

Are rolling a die (X) and tossing a coin (Y) independent?

Does f(1, 0) = fX(1)fY(0)?

Yes: f(1, 0) = 1/12 and fX(1)fY(0) = (1/6) × (1/2) = 1/12.

X and Y are independent random variables.

Whenever X and Y are independent, the rows (and columns) of the joint probability table f(x, y) will be proportional.


Independence

Figure: X = Number of Girls and Y = Number of Runs Defined on the Sample Space

Outcome e   Pr(e)   X value   Y value
BBB         .14     0         1
BBG         .13     1         2
BGB         .13     1         3
BGG         .12     2         2
GBB         .13     1         2
GBG         .12     2         3
GGB         .12     2         2
GGG         .11     3         1

Are X and Y independent?

Independence

Figure: Joint Probability Table for x = Number of Girls and y = Number of Runs

         y = 1   y = 2   y = 3   p(x)
x = 0    .14     0       0       .14
x = 1    0       .26     .13     .39
x = 2    0       .24     .12     .36
x = 3    .11     0       0       .11
p(y)     .25     .50     .25     1.00

If X and Y were independent, then f(0, 1) would equal fX(0)fY(1). However, this is not the case: 0.14 ≠ 0.14 × 0.25 = 0.035. X and Y are dependent.
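The cell-by-cell independence check can be automated (a sketch; the joint probabilities are the slide's rounded values):

```python
# f(x, y): x = number of girls, y = number of runs (values from the table)
joint = {
    (0, 1): 0.14, (0, 2): 0.00, (0, 3): 0.00,
    (1, 1): 0.00, (1, 2): 0.26, (1, 3): 0.13,
    (2, 1): 0.00, (2, 2): 0.24, (2, 3): 0.12,
    (3, 1): 0.11, (3, 2): 0.00, (3, 3): 0.00,
}
fx = {x: sum(p for (a, b), p in joint.items() if a == x) for x in range(4)}
fy = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (1, 2, 3)}

# independent iff f(x, y) = fX(x) fY(y) in every cell
independent = all(abs(joint[x, y] - fx[x] * fy[y]) < 1e-9
                  for x in range(4) for y in (1, 2, 3))
print(independent)  # False: f(0, 1) = 0.14 but fX(0) fY(1) = 0.14 * 0.25 = 0.035
```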


Jointly Continuous Random Variables

Bivariate probability density functions give the joint probability of certain

events, f(x, y).

If X and Y are jointly continuous random variables with a joint density function given by f(x, y) = f(X = x, Y = y), then

1. f(x, y) ≥ 0 ∀ x, y

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

The probability that X lies between a1 and a2 while Y lies between b1 and b2 is:

P(a1 ≤ X ≤ a2, b1 ≤ Y ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x, y) dx dy

Marginal Distributions

Definition: Let X and Y be jointly continuous random variables with a joint density function given by f(x, y) = f(X = x, Y = y). Then the marginal density functions of X and Y, respectively, are given by

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

fY(y) = ∫_{−∞}^{∞} f(x, y) dx

Conditional Distributions

Deﬁnition: Let X and Y be jointly continuous random variables with joint

density f(x, y) and marginal densities fX(x) and fY (y).

For any y such that fY(y) > 0, the conditional density of X given Y = y is given by:

f_{X|Y}(x|y) = f(x, y)/fY(y)

and for any x such that fX(x) > 0, the conditional density of Y given X = x is given by:

f_{Y|X}(y|x) = f(x, y)/fX(x)


Independence

Deﬁnition: If X and Y are continuous random variables with joint density

function f(x, y) and marginal densities fX(x) and fY (y), respectively, then X

and Y are independent if and only if

f(x, y) = fX(x)fY (y)

for all pairs of real numbers (x, y).

Mathematical Expectations

The probability distribution for a random variable is a theoretical model for the

empirical distribution of data associated with a real population.

Some of the most important characteristics of probability distributions are

termed expected values or mathematical expectations.

We are often interested in the mean and variance of probability distributions.

These numerical descriptive measures provide the parameters for the probability

distribution, f(x).

Recall that while we use X̄ and s^2 to denote the mean and variance in our sample (sampling distribution), we use the Greek letters µ and σ^2 for our probability (population) distribution.

Expected Value of X (Discrete Random Variable)

The sample mean is x̄ = Σ_{i=1}^{m} (fi/n) xi. This varies from sample to sample even when the sample size n is large.

x     f(x)     Relative Frequency in Sample
x1    f(x1)    f1/n
x2    f(x2)    f2/n
.     .        .
.     .        .
xm    f(xm)    fm/n


Expected Value of X (Discrete Random Variable)

The mean (and variance) are derived in a similar way in that they depend on

the values of x and the associated probabilities.

However, instead of using experimentally derived frequencies, we use the

mathematically derived probability distribution.

The expected value, or mean, of a theoretical probability function is

E(X) = Σ_{i=1}^{m} xi f(xi).

Expected Value of X (Discrete Random Variable)

Deﬁnition: Let X be a discrete random variable with probability function f(x).

Then the expected value of X, E(X), is deﬁned as:

E(X) = Σ_x x f(x)

In other words, E(X) is equal to the probability-weighted mean of the values of X. E(X) can be thought of as Σ (value × probability) over all possible values.

If f(x) is an accurate description of the population frequency distribution, then

E(X) = µ, the population mean.

Expected Value of X (Discrete Random Variable)

The term “expected value” is used to emphasize the relation between the

population mean and one’s anticipation about the outcome of an experiment.

In effect, the expected value of X is equivalent to the relative-frequency-weighted mean (the x̄ computed with the f/n weights above) that we would expect to get if we could repeat whatever experiment we are looking at an infinite number of times.

It obviously follows from this that E(X) need not be a value taken on by X.


Expected Value of X (Discrete Random Variable)

Example: Suppose that a game is to be played with a single die. Let X be the

random variable giving the amount of money to be won on any toss.

Table: Die Rolling Wager

Roll j     1     2     3     4     5     6
xj         0     20    0     40    0     −30
f(xj)      1/6   1/6   1/6   1/6   1/6   1/6

E(X) = 0(1/6) + 20(1/6) + 0(1/6) + 40(1/6) + 0(1/6) + (−30)(1/6) = 5
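A quick check of the wager's expected value (a minimal sketch; payoffs taken from the table above):

```python
# payoff x_j for each face j of a fair die
payoffs = {1: 0, 2: 20, 3: 0, 4: 40, 5: 0, 6: -30}

expected = sum(payoffs.values()) / 6  # each face has probability 1/6
print(expected)  # 5.0
```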

Expected Value of X (Continuous Random Variable)

The intuition for a continuous random variable is the same as for the discrete

case: E(X) is a “typical” value, gained by “summing” across values of X . . .

Definition: Let X be a continuous random variable with probability function f(x). Then the expected value of X, E(X), is defined as:

E(X) = ∫_{−∞}^{∞} x f(x) dx

Expected Value of X (Continuous Random Variable)

Example: The density function of a random variable X is given by f(x) = (1/2)x for 0 < x < 2, and 0 otherwise. Find E(X).

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^2 x (x/2) dx = ∫_0^2 (x^2/2) dx = x^3/6 |_0^2 = 4/3


Expected Value of X (Continuous Random Variable)

Example: Consider f(x) = (1/2)(x + 1) for −1 < x < 1 (with f(x) = 0 otherwise):

What is the expected value (mean) of X?

Expected Value of X (Continuous Random Variable)

E(X) = ∫_{−1}^{1} x [(x + 1)/2] dx

= ∫_{−1}^{1} (1/2)(x^2 + x) dx

= (1/2) ∫_{−1}^{1} x^2 dx + (1/2) ∫_{−1}^{1} x dx

= (1/2) [x^3/3] |_{−1}^{1} + (1/2) [x^2/2] |_{−1}^{1}

= (1/2) [x^3/3 + x^2/2] |_{−1}^{1}

= (1/2) [(1^3/3 + 1^2/2) − ((−1)^3/3 + (−1)^2/2)]

= 1/3

Some Theorems on Expectation

1. If c is any constant, then E(c) = c and E(cX) = cE(X).

2. If X and Y are any random variables, then E(X + Y) = E(X) + E(Y).

3. If X and Y are independent random variables, then E(XY) = E(X)E(Y).

4. If a and b are any constants, then E(aX + b) = aE(X) + b.


Some Theorems on Expectation

Proof.

E(X + Y) = Σ_i Σ_j f(xi, yj)(xi + yj)

= Σ_i Σ_j f(xi, yj) xi + Σ_i Σ_j f(xi, yj) yj

= Σ_i xi Σ_j f(xi, yj) + Σ_j yj Σ_i f(xi, yj)

= Σ_i xi fX(xi) + Σ_j yj fY(yj)

= E(X) + E(Y)

Some Theorems on Expectation

Proof.

E(aX + b) = Σ_{i=1}^{n} (a xi + b) f(xi)

= Σ_{i=1}^{n} a xi f(xi) + Σ_{i=1}^{n} b f(xi)

= a Σ_{i=1}^{n} xi f(xi) + b Σ_{i=1}^{n} f(xi)

= a Σ_{i=1}^{n} xi f(xi) + b = aE(X) + b

where we took advantage of the fact that Σ_{i=1}^{n} f(xi) = 1.

Expectations

Example: What is the expected value of g(X) = 2 + 3X, where X is a random

variable obtained by rolling a die?

From the theorems above, we know that if X is a random variable and a and b

are constants, then

E(aX +b) = aE(X) +b

E(X) = 1

1

6

+ 2

1

6

+ 3

1

6

+ 4

1

6

+ 5

1

6

+ 6

1

6

=

21

6

= 3.5

E(g(X)) = 2 + 3E(X) = 2 + 3(3.5) = 12.5.
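Linearity of expectation makes this a one-liner to verify (a minimal sketch):

```python
faces = range(1, 7)
e_x = sum(faces) / 6                             # E(X) = 21/6 = 3.5
e_g_linear = 2 + 3 * e_x                         # E(2 + 3X) = 2 + 3 E(X)
e_g_direct = sum(2 + 3 * x for x in faces) / 6   # term-by-term computation

print(e_g_linear, e_g_direct)  # 12.5 12.5
```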


Expectations

This method generalizes quite easily:

E(g(X)) = Σ_i g(xi) f(xi)   (X discrete)

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx   (X continuous)

In terms of our example, therefore, we have:

E(g(X)) = [2 + 3(1)](1/6) + [2 + 3(2)](1/6) + [2 + 3(3)](1/6) + [2 + 3(4)](1/6) + [2 + 3(5)](1/6) + [2 + 3(6)](1/6)

= [2 + 3](1/6) + [2 + 6](1/6) + [2 + 9](1/6) + [2 + 12](1/6) + [2 + 15](1/6) + [2 + 18](1/6)

= 5/6 + 8/6 + 11/6 + 14/6 + 17/6 + 20/6 = 75/6 = 12.5

Expectations

Example: What is the expected value of X when X has a standard uniform distribution, i.e. f(x) = 1 for 0 ≤ X ≤ 1?

E(X) = ∫_0^1 x f(x) dx = ∫_0^1 x · 1 dx = x^2/2 |_0^1 = 1/2 − 0 = 1/2


Expectations

Example: What is the expected value of X when X is distributed according to a standard exponential, i.e. f(x) = e^{−x}?

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^∞ x e^{−x} dx

This last part follows from the fact that the exponential distribution only goes from 0 to ∞.

Expectations

E(X) = ∫_0^∞ x e^{−x} dx

To solve this, we need to do integration by parts (with u = x and dv = e^{−x} dx):

E(X) = ∫_0^∞ x e^{−x} dx = [−x e^{−x} − e^{−x}] |_0^∞

= (−∞ e^{−∞} − e^{−∞}) − (−0 · e^{−0} − e^{−0})

= −∞/e^∞ − 0 + 0 + 1 = 1 − ∞/e^∞

Expectations

It turns out that

lim_{x→∞} x/e^x = 0

since the exponential e^x grows faster than x. Substituting in, we therefore have:

E(X) = 1 − ∞/e^∞ = 1 − 0 = 1
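The integration-by-parts result E(X) = 1 can be confirmed numerically (a sketch; the infinite upper limit is truncated at 50, where the integrand is negligible):

```python
import math

# midpoint Riemann sum of x * e^{-x} over [0, 50]
T, n = 50.0, 200000
h = T / n
mean = sum((i + 0.5) * h * math.exp(-(i + 0.5) * h) * h for i in range(n))

print(round(mean, 4))  # 1.0
```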


Expectations

Example: A woman leaves for work between 8 a.m. and 8.30 a.m., and takes

between 40 and 60 minutes to get there.

Let the random variable X denote her time of departure, and the random

variable Y the travel time.

Assuming that these variables are independent and uniformly distributed, ﬁnd

the following:

1. Her expected arrival time.

2. The probability that she arrives before 9 a.m.

Expectations

The expected arrival time is E(X + Y) = E(X) + E(Y).

E(X) = ∫_{8.00}^{8.30} x f(x) dx = ∫_{8.00}^{8.30} x (1/30) dx = x^2/60 |_{8.00}^{8.30}

Let's change to minutes past 8.

E(X) = x^2/60 |_0^{30} = 900/60 − 0 = 15

A similar method shows that E(Y) = 50. Thus, E(X) + E(Y) = 65. If we start at 8 a.m. and then add 65 minutes, we see that the expected arrival time is 9.05 a.m.

Expectations

What's the probability that she arrives before 9 a.m., i.e. P(X + Y ≤ 9 a.m.)?

Let's convert this to minutes past 8 a.m. In other words, we want to know P(X + Y ≤ 60).

Put diﬀerently, we want to know FZ(60) where Z = X +Y and FZ is the

cumulative density function of Z.

We can think about this graphically.


Expectations

Figure: Travel Times

[Plot omitted: X (departure, 0 to 30 minutes past 8) on the horizontal axis, with marks at 20 and 30, and Y (travel time, 40 to 60 minutes) on the vertical axis.]

The dotted rectangle indicates all the points where the joint density of X and

Y are nonzero.

The joint density of X and Y is uniformly distributed over this rectangle.

The grey triangle represents all the points that correspond to the event Z ≤ 60.

Expectations

Figure: Travel Times

[Plot omitted: the same rectangle as above, with the triangle corresponding to X + Y ≤ 60 shaded grey.]

The area of the whole rectangle is 20 × 30 = 600 and the area of the triangle is (1/2)(20 × 20) = 200.

Thus, the probability that the woman arrives before 9 a.m. is 200/600 = 1/3.
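A Monte Carlo simulation gives the same answer (a sketch; times are in minutes past 8 a.m., and 1/3 is the analytic target from the area argument above):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
trials = 200000
hits = sum(
    1 for _ in range(trials)
    if random.uniform(0, 30) + random.uniform(40, 60) <= 60
)
print(round(hits / trials, 2))  # close to 1/3
```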

Expectations: Two Random Variables

Definition: Let g(X, Y) be a function of the discrete random variables X and Y, which have bivariate probability function f(x, y). Then the expected value of g(X, Y) is

E[g(X, Y)] = Σ_i Σ_j g(xi, yj) f(xi, yj)

Definition: Let g(X, Y) be a function of the continuous random variables X and Y, which have bivariate density function f(x, y). Then the expected value of g(X, Y) is

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy


Expectations: Two Random Variables

Example: In planning a three-child family, suppose that annual clothing costs

are:

R = g(x, y) = 10 +x +y

What is the expected cost E(R)?

E[g(X, Y )] = Σ_{i=1}^{∞} Σ_{j=1}^{∞} g(x_i, y_j) f(x_i, y_j)

where R = g(x, y) = 10 +x +y.

Expectations: Two Random Variables

This is the joint probability table for f(x, y).

          Y = 1   Y = 2   Y = 3
X = 0     0.14    0       0
X = 1     0       0.26    0.13
X = 2     0       0.24    0.12
X = 3     0.11    0       0

Now we need to calculate g(x, y).

Expectations: Two Random Variables

This is the joint probability table for g(x, y).

          Y = 1           Y = 2           Y = 3
X = 0     (10+0+1)0.14    0               0
X = 1     0               (10+1+2)0.26    (10+1+3)0.13
X = 2     0               (10+2+2)0.24    (10+2+3)0.12
X = 3     (10+3+1)0.11    0               0

If we sum up all these values across rows and columns, we have E(R) = 13.44.
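The same sum can be written as a few lines of Python (a sketch; the dictionary below just transcribes the slide's joint probability table, with zero cells omitted):

```python
# Joint pmf f(x, y) from the slide's table (zeros omitted).
f = {(0, 1): 0.14,
     (1, 2): 0.26, (1, 3): 0.13,
     (2, 2): 0.24, (2, 3): 0.12,
     (3, 1): 0.11}

def g(x, y):
    return 10 + x + y  # annual clothing cost R

# E(R) = sum over all cells of g(x, y) f(x, y)
ER = sum(g(x, y) * p for (x, y), p in f.items())
print(round(ER, 2))  # 13.44
```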


Expectations: Two Random Variables

Example: Let X and Y have a joint density function given by f(x, y) = 2x for

0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, 0 otherwise. Find E(XY ).

E(XY ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy

= ∫_{0}^{1} ∫_{0}^{1} xy(2x) dx dy

= ∫_{0}^{1} y [2x³/3]_{0}^{1} dy = ∫_{0}^{1} (2/3) y dy

= (2/3) [y²/2]_{0}^{1} = 1/3

Expectations: Two Random Variables

Example: Let X and Y have a joint density function given by f(x, y) = 2x for

0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, 0 otherwise. Find E(X).

It turns out that if X and Y are two continuous random variables with joint

density function f(x, y), then the means or expectations of X and Y are

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy

E(Y ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy

Expectations: Two Random Variables

Thus, in our example, we have:

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy

= ∫_{0}^{1} ∫_{0}^{1} x(2x) dx dy

= ∫_{0}^{1} [2x³/3]_{0}^{1} dy = ∫_{0}^{1} (2/3) dy

= (2/3) [y]_{0}^{1} = 2/3
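Both integrals can be sanity-checked by simulation (a Python sketch; since the marginal CDF of X is F(x) = x² on [0, 1], X can be drawn by inverse transform as √U, while Y is an independent Uniform(0, 1)):

```python
import random

random.seed(1)
N = 400_000

sum_xy = sum_x = 0.0
for _ in range(N):
    x = random.random() ** 0.5  # inverse CDF of f(x) = 2x: X = sqrt(U)
    y = random.random()         # Y ~ Uniform(0, 1), independent of X
    sum_xy += x * y
    sum_x += x

# Estimates should be close to the exact values 1/3 and 2/3.
print(round(sum_xy / N, 3), round(sum_x / N, 3))
```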


Variance: Discrete Random Variable

Deﬁnition: If X is a discrete random variable with mean E(X) = µ, then the

variance of a random variable X is deﬁned to be the expected value of

(X −µ)². That is:

Variance, σ² = E{[X −E(X)]²} = E[(X −µ)²] = Σ_x (x −µ)² f(x)

The standard deviation of X, σ, is the positive square root of σ².

There is an alternative way of calculating the variance:

Variance, σ² = E(X²) −[E(X)]² = E(X²) −µ² = Σ_x x² f(x) −µ²

Variance: Discrete Random Variable

Table: Calculating the Mean and Variance of X = Number of Girls

Given probability distribution; calculation of µ = Σ_x x f(x); calculation of σ² = Σ_x (x −µ)² f(x) and of σ² = Σ_x x² f(x) −µ².

 x   f(x)   x f(x)   (x −µ)   (x −µ)²   (x −µ)² f(x)   x² f(x)
 0   0.14   0        −1.44    2.07      0.29           0
 1   0.39   0.39     −0.44    0.19      0.08           0.39
 2   0.36   0.72      0.56    0.31      0.11           1.44
 3   0.11   0.33      1.56    2.43      0.27           0.99
            µ = 1.44                    σ² = 0.75      Σ_x x² f(x) = 2.82
                                                       µ² = 2.07
                                                       Diﬀerence = σ² = 0.75
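The table's two variance formulas are easy to verify mechanically (a Python sketch over the distribution above):

```python
# p(x) for X = number of girls, from the table.
f = {0: 0.14, 1: 0.39, 2: 0.36, 3: 0.11}

mu = sum(x * p for x, p in f.items())                      # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in f.items())     # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in f.items()) - mu ** 2   # E(X^2) - mu^2

print(round(mu, 2), round(var_def, 2), round(var_alt, 2))  # 1.44 0.75 0.75
```

Both routes give the same value, 0.7464, which the table reports rounded to 0.75.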

Variance: Discrete Random Variable

Bikes Sold Example: On the basis of past experience, the buyer for a large

sports store estimates that the number of 10-speed bicycles sold next year will

be somewhere between 40 and 90 with the distribution shown in the ﬁrst two

columns of the table on the next slide.

Questions:

What is the mean number sold? What is the estimated standard

deviation?

If 60 are ordered, what is the chance they will all be sold? What is the

chance some will be left over?

To be almost sure (95%) of having enough bicycles, how many should be

ordered?


Variance: Discrete Random Variable

Table: Calculating the Mean and Variance of X = Number of Bikes Sold

Given probability distribution; calculation of µ = Σ_x x p(x); calculation of σ² = Σ_x (x −µ)² p(x) and of σ² = Σ_x x² p(x) −µ².

 x    p(x)   x p(x)   (x −µ)   (x −µ)²   (x −µ)² p(x)   x² p(x)
 40   0.05   2        −22      484       24.2           80
 50   0.15   7.5      −12      144       21.6           375
 60   0.41   24.6     −2       4         1.64           1476
 70   0.34   23.8      8       64        21.76          1666
 80   0.04   3.2       18      324       12.96          256
 90   0.01   0.9       28      784       7.84           81
             µ = 62                      σ² = 90        Σ_x x² p(x) = 3934
                                                        µ² = 3844
                                                        Diﬀerence = σ² = 90

Variance: Discrete Random Variable

Answers:

The mean is 62 and the standard deviation is √90 ≈ 9.5.

Pr(X ≥ 60) = 0.41 + 0.34 + 0.04 + 0.01 = 0.80, i.e. an 80% chance.

Pr(X < 60) = 1 −Pr(X ≥ 60) = 0.20, i.e. a 20% chance.

Notice the distribution of p(x): Pr(X ≤ 70) = 0.95. So, to be 95% sure of having enough bicycles, we should order 70 bicycles.
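All three answers can be reproduced with a few lines of Python (a sketch over the table's distribution):

```python
p = {40: 0.05, 50: 0.15, 60: 0.41, 70: 0.34, 80: 0.04, 90: 0.01}

mu = sum(x * px for x, px in p.items())                          # mean
sd = (sum(x * x * px for x, px in p.items()) - mu ** 2) ** 0.5   # sqrt(variance)
p_sell_all = sum(px for x, px in p.items() if x >= 60)           # P(X >= 60)

# Smallest order size n with P(X <= n) >= 0.95:
cum, order = 0.0, None
for x in sorted(p):
    cum += p[x]
    if cum >= 0.95 - 1e-9:  # small tolerance for floating-point rounding
        order = x
        break

print(round(mu, 2), round(sd, 1), round(p_sell_all, 2), order)
```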

Variance: Continuous Random Variable

Deﬁnition: If X is a continuous random variable with mean E(X) = µ, then

the variance of a random variable X is deﬁned to be the expected value of

(X −µ)². That is:

Variance, σ² = E{[X −E(X)]²} = E[(X −µ)²] = ∫_{−∞}^{∞} (x −µ)² f(x) dx

The standard deviation of X, σ, is the positive square root of σ².


Variance: Continuous Random Variable

Example: The density function of a random variable X is given by f(x) = x/2 for 0 < x < 2, 0 otherwise. Find the variance and standard deviation of this variable. We already found that µ = E(X) = 4/3 in one of our earlier examples.

σ² = E[(X −4/3)²] = ∫_{−∞}^{∞} (x −4/3)² f(x) dx

= ∫_{0}^{2} (x −4/3)² (x/2) dx = 2/9

and so the standard deviation is

σ = √(2/9) = √2/3
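The integral for σ² can also be approximated numerically (a Python sketch using a midpoint Riemann sum; no external libraries assumed):

```python
# sigma^2 = integral over (0, 2) of (x - 4/3)^2 (x/2) dx, exact value 2/9.
n = 100_000
h = 2 / n
mu = 4 / 3

var = 0.0
for i in range(n):
    x = (i + 0.5) * h               # midpoint of the i-th subinterval
    var += (x - mu) ** 2 * (x / 2) * h

sd = var ** 0.5
print(round(var, 6), round(sd, 4))  # 0.222222 0.4714
```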

Some Theorems on Variance

1 If c is any constant, then

Var(c) = 0 and Var(cX) = c² Var(X)

2 The quantity E[(X −a)²] is a minimum when a = µ = E(X).

3 If X and Y are independent random variables, then

(a) Var(X +Y ) = Var(X) + Var(Y )
(b) Var(X −Y ) = Var(X) + Var(Y )

Theorem (a) indicates that the variance of a sum of independent variables equals the sum of their variances.

4 If X is a random variable and a and b are constants, then

Var(aX +b) = a² Var(X)
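Theorems 1 and 4 hold exactly for sample variances as well, which makes them easy to check empirically (a Python sketch on simulated data; the helper var() computes a population-style sample variance):

```python
import random

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(100_000)]

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

a, b = 3.0, 7.0
lhs = var([a * x + b for x in xs])  # Var(aX + b)
rhs = a * a * var(xs)               # a^2 Var(X)

print(var([b] * 10))                # Var(c) = 0 for a constant
print(abs(lhs - rhs) < 1e-6 * rhs)  # True: Var(aX + b) = a^2 Var(X)
```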

Some Theorems on Variance

Proof.

Var(X −Y ) = Var(X) + Var(Y )

Deﬁne a = −1. Then

Var(X +aY ) = Var(X) + Var(aY ) = Var(X) +a² Var(Y ) = Var(X) + Var(Y )


Some Theorems on Variance

Proof.

Var(aX +b) = E{[(aX +b) −E(aX +b)]²}

and since E(aX +b) = aµ +b,

Var(aX +b) = E[(aX −aµ)²] = a² E[(X −µ)²] = a² Var(X)

Covariance (Variance of Joint Distributions)

If X and Y are two continuous random variables with joint density function

f(x, y), then we have already seen that:

E(XY ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy

E(Y ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy

The variances of X and Y are:

Var(X) = σ²_X = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x −µ_X)² f(x, y) dx dy

Var(Y ) = σ²_Y = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y −µ_Y)² f(x, y) dx dy

Covariance

Rather than talk about the joint variance of X and Y , we talk about the

covariance of X and Y .

Deﬁnition: If X and Y are random variables with means µ_X and µ_Y, respectively, then the covariance of X and Y , often denoted σ_XY, is calculated as:

Cov(X, Y ) = σ_XY = E[(X −E(X))(Y −E(Y ))] = E[(X −µ_X)(Y −µ_Y)]

We can also write the covariance as

Cov(X, Y ) = σ_XY = E(XY ) −E(X)E(Y )


Covariance

Proof.

Cov(X, Y ) = E[(X −E(X))(Y −E(Y ))]
= E[(X −µ_X)(Y −µ_Y)]
= E(XY −µ_X Y −µ_Y X +µ_X µ_Y)
= E(XY ) −µ_X E(Y ) −µ_Y E(X) +µ_X µ_Y
= E(XY ) −E(X)E(Y )
= E(XY ) −µ_X µ_Y

Covariance (Intuition)

Figure: Dependent and Independent Observations

[Panel (a): X and Y linearly dependent, with points clustered around an upward-sloping line through (µ_X, µ_Y). Panel (b): X and Y with little dependence.]

Covariance is a measure of association between two variables – it indicates how

the values of one variable depend on the values of another.

Why? Well, it has to do with the fact that the average value of

(x −µX)(y −µY ) provides a measure of the linear dependence between X and

Y .

Covariance (Intuition)

Suppose we locate a plotted point (x, y) on Figure a, and measured the

deviations (x −µX) and (y −µY ).

Both deviations assume the same algebraic sign for any point (x, y) and their

product (x −µX)(y −µY ) is positive.

Points to the right of µX yield pairs of positive deviations, while points to the

left yield negative deviations.

The average of the product of the deviations (x −µX)(y −µY ) is large and

positive.

If the linear relationship indicated in Figure a had sloped down and to the right,

all corresponding pairs of deviations would have the opposite sign, and the

average value of (x −µX)(y −µY ) would have been a large negative number.


Covariance (Intuition)

The situation just described does not occur for Figure b, where little

dependence exists between X and Y .

Their corresponding deviations (x −µX) and (y −µY ) will assume the same

algebraic sign for some points and opposite signs for others.

Thus, the product (x −µX)(y −µY ) will be positive for some points, negative

for others, and will average to some value near 0.

From this we can see that the average value of (x −µX)(y −µY ) provides a

measure of the linear dependence between X and Y .

Covariance

The larger the absolute value of the covariance, the greater the linear

dependence between X and Y .

The covariance between two variables can be anywhere between −∞ and +∞.

Positive values indicate that X increases as Y increases, and negative values

indicate that X decreases as Y increases.

A zero value of the covariance indicates that the variables are uncorrelated, and

that there is no linear dependence between X and Y .

Covariance and Statistical Independence

If X and Y are independent random variables, then

Cov(X, Y ) = 0

Thus, independent variables must be uncorrelated.

But Cov(X, Y ) = 0 does not mean that X and Y are independent random

variables.


Covariance and Statistical Independence

Example: Consider the following joint probability.

          Y = −1   Y = 0   Y = 1   f(x)
X = −1    1/16     3/16    1/16    5/16
X = 0     3/16     0       3/16    6/16
X = 1     1/16     3/16    1/16    5/16
f(y)      5/16     6/16    5/16    1

If X and Y were independent, we would need f_XY(0, 0) = f_X(0) f_Y(0). But

0 ≠ (6/16) ×(6/16)

As a result, we can see that X and Y are not independent.

Covariance and Statistical Independence

Cov(X, Y ) = E(XY ) −E(X)E(Y )

E(XY ) = Σ_{i=1}^{∞} Σ_{j=1}^{∞} x_i y_j f(x_i, y_j)

= (−1)(−1)(1/16) + (−1)(0)(3/16) + (−1)(1)(1/16) + (0)(−1)(3/16) + (0)(0)(0) + (0)(1)(3/16) + (1)(−1)(1/16) + (1)(0)(3/16) + (1)(1)(1/16)

= (1/16) −(1/16) −(1/16) + (1/16) = 0

E(X) = −1 ×(5/16) + 0 ×(6/16) + 1 ×(5/16) = 0

E(Y ) = −1 ×(5/16) + 0 ×(6/16) + 1 ×(5/16) = 0

Thus,

Cov(X, Y ) = 0 −0(0) = 0
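The whole calculation can be done exactly with Python's fractions module (a sketch transcribing the joint table above):

```python
from fractions import Fraction as F

# Joint pmf from the table (X = rows, Y = columns), in sixteenths.
f = {(-1, -1): F(1, 16), (-1, 0): F(3, 16), (-1, 1): F(1, 16),
     (0, -1): F(3, 16),  (0, 0): F(0),      (0, 1): F(3, 16),
     (1, -1): F(1, 16),  (1, 0): F(3, 16),  (1, 1): F(1, 16)}

# Marginal pmfs.
fx = {x: sum(p for (a, _), p in f.items() if a == x) for x in (-1, 0, 1)}
fy = {y: sum(p for (_, b), p in f.items() if b == y) for y in (-1, 0, 1)}

EXY = sum(x * y * p for (x, y), p in f.items())
cov = EXY - sum(x * p for x, p in fx.items()) * sum(y * p for y, p in fy.items())

# Zero covariance, yet not independent: f(0, 0) = 0 != fX(0) fY(0).
independent = all(f[(x, y)] == fx[x] * fy[y] for x in fx for y in fy)
print(cov == 0, independent)  # True False
```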

Variance Again

When X and Y are independent random variables, we have

Var(X +Y ) = Var(X) + Var(Y )

Var(X −Y ) = Var(X) + Var(Y )

More generally, though, if X and Y are random variables, then

Var(X +Y ) = Var(X) + Var(Y ) + 2Cov(X, Y )

Var(X −Y ) = Var(X) + Var(Y ) −2Cov(X, Y )

Similarly, for a weighted sum, we have:

Var(aX +bY ) = a² Var(X) +b² Var(Y ) + 2ab Cov(X, Y )


Variance Again

Proof.

Var(X +Y ) = E{[(X +Y ) −E(X +Y )]²}
= E{[X −E(X) +Y −E(Y )]²}
= E[X −E(X)]² +E[Y −E(Y )]² + 2E{[X −E(X)][Y −E(Y )]}
= Var(X) + Var(Y ) + 2Cov(X, Y )

Variance Again

          Y = −1   Y = 0   Y = 1   f(x)
X = −1    1/16     3/16    1/16    5/16
X = 0     3/16     0       3/16    6/16
X = 1     1/16     3/16    1/16    5/16
f(y)      5/16     6/16    5/16    1

What’s Var(X +Y )?

One way to answer this is to deﬁne a new variable Z = X +Y .

Variance Again

 z_i   f_Z(z_i)
 −2    1/16
 −1    6/16
  0    2/16
  1    6/16
  2    1/16

Var(Z) = E(Z²) −[E(Z)]² = Σ z_i² f_Z(z_i) −[Σ z_i f_Z(z_i)]²

= (−2)²(1/16) + (−1)²(6/16) + 0 + (1)²(6/16) + (2)²(1/16) −[(−2)(1/16) + (−1)(6/16) + 0 + (1)(6/16) + (2)(1/16)]²

= 20/16 −0 = 20/16 = 1.25


Variance Again

Or we can start from the following deﬁnition:

Var(X +Y ) = Var(X) + Var(Y ) + 2Cov(X, Y )

Since we know that Cov(X, Y ) = 0, we need only calculate Var(X) and

Var(Y ).

Var(X) = Σ x_i² f_X(x_i) −[Σ x_i f_X(x_i)]²

= (−1)²(5/16) + 0 + (1)²(5/16) −[(−1)(5/16) + 0 + (1)(5/16)]²

= 10/16 −(0)² = 10/16

Variance Again

Var(Y ) = 10/16

Thus,

Var(X +Y ) = 10/16 + 10/16 = 20/16 = 1.25
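Both routes give the same answer, and the agreement can be confirmed exactly in Python (a sketch reusing the joint table, with fractions to avoid rounding; zero cells omitted):

```python
from fractions import Fraction as F

f = {(-1, -1): F(1, 16), (-1, 0): F(3, 16), (-1, 1): F(1, 16),
     (0, -1): F(3, 16),  (0, 1): F(3, 16),
     (1, -1): F(1, 16),  (1, 0): F(3, 16),  (1, 1): F(1, 16)}

def E(g):
    return sum(g(x, y) * p for (x, y), p in f.items())

# Route 1: work directly with Z = X + Y.
var_z = E(lambda x, y: (x + y) ** 2) - E(lambda x, y: x + y) ** 2

# Route 2: Var(X) + Var(Y) + 2 Cov(X, Y).
vx = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
vy = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

print(var_z, vx + vy + 2 * cov)  # 5/4 5/4
```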

Correlation Coeﬃcient

It is diﬃcult to know whether a covariance is large or small because its value

depends upon the scale of measurement.

Cov(aX, bY ) = E{[aX −E(aX)][bY −E(bY )]}
= E{[a(X −E(X))][b(Y −E(Y ))]}
= ab E{[X −E(X)][Y −E(Y )]}
= ab Cov(X, Y )


Correlation Coeﬃcient

This problem can be eliminated by standardizing its value and using the

correlation coeﬃcient, ρ

Correlation Coeﬃcient, ρ = Cov(X, Y )/(σ_X σ_Y) = σ_XY/(σ_X σ_Y)

where σX and σY are the standard deviations of X and Y respectively.

−1 ≤ ρ ≤ 1

When ρ = 1, then we have perfect correlation, with all points falling on a

straight line with positive slope. ρ = 0 implies zero covariance and no

correlation.

ρ < 0 implies that Y decreases as X increases, with ρ = −1 implying perfect

correlation with all points falling on a straight line with negative slope.
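The scale invariance that motivates ρ is easy to see numerically: rescaling X and Y changes the covariance by the factor ab but leaves ρ untouched (a Python sketch on simulated data with true ρ = 1/√2):

```python
import random

random.seed(3)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]  # Y = X + noise, so rho = 1/sqrt(2)

def corr(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    su = (sum((a - mu) ** 2 for a in u) / n) ** 0.5
    sv = (sum((b - mv) ** 2 for b in v) / n) ** 0.5
    return cov / (su * sv)

r1 = corr(xs, ys)
r2 = corr([100 * x + 5 for x in xs], [0.01 * y - 3 for y in ys])  # rescaled
print(round(r1, 2), round(r2, 2))  # both near 0.71
```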

Correlation Coeﬃcient: Things to Be Aware Of

The correlation coeﬃcient ρ is a measure of linear dependence only. It

does not necessarily indicate if there is a systematic relationship between

two variables X and Y .

As with covariance, if X and Y are independent, then ρ = 0. But ρ = 0

does not mean that X and Y are independent.

ρ might be large but this does not mean that Y necessarily changes by a

lot when X changes by a lot, or vice versa.

Figure: Correlation vs Responsiveness

[Panels (a) and (b), each plotting Y against X]

Correlation Coeﬃcient: Things to Be Aware Of

The strength of the correlation coeﬃcient ρ depends on (i) the strength

of the linear relationship between X and Y AND (ii) the variance of X

and Y . This is obviously problematic if we want to measure the strength

of the linear relationship between X and Y .

Figure: Correlation Coeﬃcient and Variation in X

[Two panels plotting positions of members of congress, one on foreign policy and one on civil rights]


Conditional Expectations

If X and Y are any two random variables, then the conditional expectation of

Y given that X = x is deﬁned to be:

E(Y |X = x) = Σ_{i=1}^{∞} y_i f_{Y|X}(y_i|x) in the discrete case

E(Y |X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy in the continuous case

In terms of the example just given, the conditional expectation of Y given that X is equal to −1 is

E(Y |x = −1) = −1 ×(1/16)/(5/16) + 0 + 1 ×(1/16)/(5/16) = 0

Also,

E(Y |x = 0) = −1 ×(3/16)/(6/16) + 0 + 1 ×(3/16)/(6/16) = 0

and

E(Y |x = 1) = −1 ×(1/16)/(5/16) + 0 + 1 ×(1/16)/(5/16) = 0
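The three conditional expectations above can be computed mechanically from the joint table (a Python sketch with exact fractions; zero cells omitted):

```python
from fractions import Fraction as F

# Joint pmf from the earlier table (zeros omitted).
f = {(-1, -1): F(1, 16), (-1, 0): F(3, 16), (-1, 1): F(1, 16),
     (0, -1): F(3, 16),  (0, 1): F(3, 16),
     (1, -1): F(1, 16),  (1, 0): F(3, 16),  (1, 1): F(1, 16)}

def cond_EY(x):
    fx = sum(p for (a, _), p in f.items() if a == x)  # marginal f_X(x)
    # E(Y | X = x) = sum over y of y * f(x, y) / f_X(x)
    return sum(y * p for (a, y), p in f.items() if a == x) / fx

print(all(cond_EY(x) == 0 for x in (-1, 0, 1)))  # True
```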

Conditional Expectations

Conditional expectations have the following properties:

(1) E(Y |X = x) = E(Y ) when X and Y are independent

and

(2) E(Y ) = ∫_{−∞}^{∞} E(Y |X = x) f_X(x) dx = E[E(Y |X)]

where on the right-hand side the inside expectation is with respect to the

conditional distribution of Y given X and the outside expectation is with

respect to the distribution of X.

Conditional Expectations

Example: The average travel time to a city is c hours by car or b hours by bus.

A woman cannot decide whether to drive or take the bus, and so tosses a coin.

What is her expected travel time?

We are dealing with the joint distribution of the outcome of the toss, X, and

the travel time Y , where Y = Ycar if X = 0 and Y = Ybus if X = 1. Let’s

assume that X and Y are independent so that by the ﬁrst property above we

have:

E(Y |X = 0) = E(Ycar|X = 0) = E(Ycar) = c

E(Y |X = 1) = E(Ybus|X = 1) = E(Ybus) = b

Then by the second property above

E(Y ) = E(Y |X = 0)P(X = 0) +E(Y |X = 1)P(X = 1) = (c +b)/2


Conditional Variances

If X and Y are any two random variables, then the conditional variance of Y

given that X = x is deﬁned to be:

V (Y |X = x) = E[(Y −E(Y |X = x))² |X = x]
= E(Y ² |X = x) −[E(Y |X = x)]²

Summary

Mathematical Expectations

              Population                                   Sample
Mean          µ = E(X)                                     X̄ = (1/N) Σ_{i=1}^{N} X_i
Variance      σ² = E[(X −µ)²]                              s² = (1/(N−1)) Σ_{i=1}^{N} (X_i −X̄)²
Covariance    Cov_XY = E[(X −µ_X)(Y −µ_Y)]                 Cov_XY = Σ_{i=1}^{N} (X_i −X̄)(Y_i −Ȳ) / (N −1)
Correlation   ρ_XY = E[(X −µ_X)(Y −µ_Y)] / (σ_X σ_Y)       r_XY = Σ_{i=1}^{N} (X_i −X̄)(Y_i −Ȳ) / ((N −1) s_X s_Y)

