200052 INTRODUCTION TO ECONOMIC METHODS

SUMMARY NOTES - WEEK 4

Required Reading:
Ref. File 4: Sections 4.7 to 4.9
Ref. File 5: Introduction and Sections 5.1 to 5.4

4. PROBABILITY THEORY CONTINUED

4.9 Sampling With and Without Replacement

Definition (Random Sample from a Statistical Population)


A random sample of ‘n’ elements from a statistical
population is such that every possible combination of ‘n’
elements from the population has an equal probability of
being in the sample.

Many experiments involve taking a random sample from a
finite population. If we sample with replacement, we
effectively return each observation to the population
before making the next selection. In this way the
population from which we are sampling remains the same
from one selection to the next; provided sampling is
random, the successive outcomes will be independent.

If we sample without replacement from a finite
population, the outcome of any one selection will depend
on the outcomes of all previous selections; the population
is reduced with each selection.

Example 4.16:
Suppose that in a given street 50 residents voted in the last
election. Of these, 15 voted for party ‘A’, 30 voted for
party ‘B’ and 5 voted for neither party ‘A’ nor ‘B’.
Suppose that one evening a candidate for the next election
visits the residents of the street to introduce herself. What
is the probability that the first two eligible voters she
meets voted for party ‘A’ at the last election? (3/35)

Here sampling is without replacement. Define the following
events:

A1: first person voted for party ‘A’
A2: second person voted for party ‘A’

We require

P(A1 ∩ A2) = P(A1) P(A2 | A1) = (15/50)(14/49) = 3/35
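This calculation can be checked exactly in a few lines of code; the sketch below (using Python's fractions module, so there is no floating-point rounding) simply restates the multiplication rule:

```python
from fractions import Fraction

# P(A1): 15 of the 50 voters chose party 'A'
p_a1 = Fraction(15, 50)
# P(A2 | A1): one 'A' voter has already been met, so 14 'A' voters remain among 49
p_a2_given_a1 = Fraction(14, 49)

# Multiplication rule: P(A1 and A2) = P(A1) * P(A2 | A1)
p_both = p_a1 * p_a2_given_a1
print(p_both)  # 3/35
```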

Example 4.17:
Consider the experiment of successively drawing 2 cards
from a deck of 52 playing cards. Define the following
events:

A1: ace on first draw
A2: ace on second draw

What is the probability of selecting 2 aces if sampling
(drawing) is (i) without replacement, and (ii) with
replacement? (1/221, 1/169)

Without replacement:

P(A1 ∩ A2) = P(A1) P(A2 | A1) = (4/52)(3/51) = 12/2652 = 1/221

With replacement:

P(A1 ∩ A2) = P(A1) P(A2) = (4/52)(4/52) = 1/169
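The contrast between the two sampling schemes can be checked the same way; a minimal sketch in Python using exact fractions:

```python
from fractions import Fraction

# Without replacement: the second draw conditions on the first (4 aces, then 3 of 51)
p_without = Fraction(4, 52) * Fraction(3, 51)
# With replacement: the two draws are independent, so probabilities simply multiply
p_with = Fraction(4, 52) * Fraction(4, 52)

print(p_without, p_with)  # 1/221 1/169
```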

Note: If we simultaneously select a sample of ‘n’ elements,
we are effectively sampling without replacement.

4.10 Probability Trees

Tree diagrams can be a useful aid in calculating the
probabilities of intersections of events (i.e. joint
probabilities).

Example 4.18:
Greasy Mo’s take-away food store offers special $10 meal
deals consisting of a small pizza or a kebab, together with
a can of soft drink, a milkshake or a cup of fruit juice.
Past experience has shown that 60% of meal deal buyers
choose a pizza (‘P’), 40% choose kebabs (‘K’), 75% choose
soft drink (‘S’), 20% choose a milkshake (‘M’) and 5%
choose fruit juice (‘J’). Assume the events ‘P’ and ‘K’ are
independent of the events ‘S’, ‘M’ and ‘J’. What is the
probability that a meal deal customer (chosen at random)
will choose a pizza and fruit juice? (0.03)

The tree diagram for this example can be drawn as below.

            S: 0.75    P(P ∩ S) = 0.6(0.75) = 0.45
  P: 0.6    M: 0.2     P(P ∩ M) = 0.6(0.2) = 0.12
            J: 0.05    P(P ∩ J) = 0.6(0.05) = 0.03

            S: 0.75    P(K ∩ S) = 0.4(0.75) = 0.30
  K: 0.4    M: 0.2     P(K ∩ M) = 0.4(0.2) = 0.08
            J: 0.05    P(K ∩ J) = 0.4(0.05) = 0.02

Thus P(P ∩ J) = 0.03, etc.
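The tree calculation amounts to multiplying along each branch; a small sketch (the dictionaries and event labels are illustrative, and independence of meal and drink choices is assumed, as in the example):

```python
# Branch probabilities from the tree; meal and drink choices are assumed independent
meals = {'P': 0.6, 'K': 0.4}
drinks = {'S': 0.75, 'M': 0.2, 'J': 0.05}

# Each leaf of the tree is a joint probability: P(meal and drink) = P(meal) * P(drink)
joint = {(m, d): pm * pd for m, pm in meals.items() for d, pd in drinks.items()}

print(round(joint[('P', 'J')], 2))    # 0.03, the answer to Example 4.18
print(round(sum(joint.values()), 6))  # 1.0 -- the six leaves exhaust the sample space
```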



5. PROBABILITY DISTRIBUTIONS OF DISCRETE
RANDOM VARIABLES

5.1 Probability Distributions and Random Variables

A probability distribution can be considered a theoretical
model for a relative frequency distribution of data from a
real life population.

For example, the probability distribution normally used
for the experiment of tossing a fair coin once and noting
whether a head (‘H’) or tail (‘T’) results can be written

P(H) = 1/2
P(T) = 1/2

This can be interpreted as saying that if the coin tossing
experiment were repeated many times, we would expect the
relative frequency of each outcome to approach one half.

A probability distribution thus specifies the probabilities
associated with the various outcomes of a statistical
experiment. It can take the form of a table, a graph or
some formula.

From now on we shall be concerned with the
characteristics of probability distributions. However, to
facilitate our study we shall now represent simple events
and events associated with statistical experiments by
values of random variables.

Definition (Random Variable)


A random variable X is a rule that assigns to each simple
event of a statistical experiment a unique numerical value.

The above definition can also be expressed in the following
slightly more mathematical way.

Alternative Definition (Random Variable)


A random variable X is a real valued function for which
the domain is the sample space of a statistical experiment.

Remember that by the term random experiment we mean
an experiment which gives rise to random outcomes.

In most statistical experiments of interest, outcomes give
rise to quantitative data that can be considered values of
the random variable being studied.

For example, if the experiment consists of selecting a
household at random and noting the number of children
in the household, we would naturally define

X = random variable representing the number of children
in a household.

X could thus take the values 0, 1, 2, 3, ... corresponding to
possible outcomes of the experiment.

In experiments which give rise to categorical or qualitative
data, a random variable can normally also be defined.

Example 5.1:
Consider the experiment of selecting a person at random
and noting their hair colour.

Here we could define X to be the random variable
representing hair colour, where

X  1 if the person’s hair colour is blonde


X 2 “ “ brown
X 3 “ “ grey
X 4 “ “ black
X 5 “ “ white
X 6 “ “ red

There are two basic types of random variables.

Definition (Discrete Random Variable)

A discrete random variable can only assume a finite or
countably infinite number of values.

(By countably infinite we mean that the values can be
listed in order, although the list is infinitely long.)

Definition (Continuous Random Variable)


A continuous random variable can assume any value in an
interval (finite or infinite).
8

Some examples of discrete random variables:


the number of errors on a typed page
the number of cars owned by a household

Some examples of continuous random variables:


the length of time between bus arrivals at a bus stop
the weight of an individual

At this stage we will concentrate on discrete random
variables.

Definition (Discrete Probability Distribution)


A discrete probability distribution lists a probability for, or
provides a means (e.g. a rule or formula) of assigning a
probability to, each value a discrete random variable can
take.

Suppose our random variable is called X. Then P(X = x)
represents the probability that the random variable takes
on the particular value ‘x’ (as a result of the outcome of
an experiment).

Properties of the Discrete Probability Distribution of a
Random Variable X:

• 0 ≤ P(X = x) ≤ 1 for all values of ‘x’

• Σ P(X = x) = 1, summing over all x

Example 5.2:
Consider again the experiment of tossing a fair die once
and noting the number of dots on the upward facing side
(X).

We have

P(X = 1) = P(X = 2) = P(X = 3)
= P(X = 4) = P(X = 5) = P(X = 6) = 1/6

and Σ P(X = x) = 1, summing over all x

At this point we can also introduce the concept of a
cumulative distribution function (or simply distribution
function) of a random variable (discrete or continuous).

Definition (Cumulative Distribution Function)

The cumulative distribution function of a random variable
X, denoted F(x), is defined as

F(x) = P(X ≤ x)

where ‘x’ is any real number.

(In the above definition, ‘x’ represents any real number,
not just the values that the random variable can take.)

Thus a cumulative distribution function shows the
probability of the random variable taking on values less
than or equal to some value ‘x’.
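As an illustration, the cumulative distribution function of a discrete random variable can be sketched in a few lines; a fair die is assumed here as the example distribution:

```python
from fractions import Fraction

# Fair die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def F(x):
    """F(x) = P(X <= x), defined for ANY real x, not just values X can take."""
    return sum(p for value, p in pmf.items() if value <= x)

print(F(3))    # 1/2
print(F(2.5))  # 1/3 -- same as F(2), since no die face lies in (2, 2.5]
print(F(6))    # 1
```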

5.2 Expected Values of Random Variables

It is of interest to have a measure of the centre of the
probability distribution of a random variable X. This role
is filled by the expected value of X.

Definition (Expected Value of a Discrete Random
Variable)
The expected value of a discrete random variable X is
defined as

E(X) = Σ x P(X = x), summing over all x

(A weighted average of all the values X can take)
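For instance, the weighted-average formula gives the familiar expected value of a fair die; a quick sketch:

```python
from fractions import Fraction

# Fair die: E(X) = sum over all x of x * P(X = x)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 7/2, i.e. 3.5
```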

If the statistical experiment considered generates values of
the random variable that coincide with the values in the
population considered, and the theoretical probability
distribution of the random variable matches the population
relative frequency distribution, then the mean of the
theoretical distribution of X will be the same as the
population mean μ. That is, E(X) = μ.

We will generally assume that our model (i.e. the
probability distribution) is correct, so the above holds.

Example 5.3:
Suppose you buy a lottery ticket for $10. The sole prize in
the lottery is $100,000 and 100,000 tickets are sold. If the
lottery is fair (i.e. each ticket sold has an equal chance of
winning), what will be your expected net gain (or loss)
from buying the lottery ticket?

(See video for solution)

Theorem (Expected Value of a Function of a Discrete
Random Variable)
Consider a function g(X) of a discrete random variable X.
The expected value of this function, if it exists, is given by

E[g(X)] = Σ g(x) P(X = x), summing over all x

There are several important properties related to expected
values.

Theorem 5.2 (Various Properties of Expected Values)

• If ‘c’ is any constant then

E(c) = c

• If ‘c’ is any constant and g(X) is any function of a
discrete or continuous random variable X then

E[cg(X)] = c E[g(X)]

• If gi(X) (i = 1, ..., k) are ‘k’ functions of a discrete or
continuous random variable X then

E[g1(X) + ... + gk(X)] = E[g1(X)] + ... + E[gk(X)]

• If h(X) and g(X) are two functions of a discrete or
continuous random variable X such that h(X) ≤ g(X)
for all X, then

E[h(X)] ≤ E[g(X)]

For example, E(X + Y) = E(X) + E(Y)

Note:
Two discrete random variables X and Y are independent if

P(X = x | Y = y) = P(X = x) for all values of x and y

(or equivalently P(Y = y | X = x) = P(Y = y) for all values
of x and y)

5.3 The Variance of a Random Variable

To gauge the dispersion of a random variable X about its
expected value or mean we can calculate the expected
value of its squared distance (X - E(X))² from the mean.
This is called the variance of the random variable X,
denoted Var(X).

Definition (Variance of a Random Variable)

The variance of any random variable X (discrete or
continuous) is given by

Var(X) = E[(X - E(X))²]

If X is a discrete random variable that can take ‘n’
different values (x1, x2, ..., xn), the above definition
specializes to

Var(X) = Σ [xi - E(X)]² P(X = xi), summing over i = 1, ..., n

Definition (Standard Deviation of a Random Variable)

The standard deviation of any random variable X (discrete
or continuous) is given by

SD(X) = √Var(X) = √E[(X - E(X))²]

Again assuming the probability distribution of X is an
accurate representation of the population relative
frequency distribution of X, we can write Var(X) = σ²,
where σ² is the population variance.

An alternative way of writing (and calculating) Var(X) is

Var(X) = E(X²) - [E(X)]²
       = (Σ x² P(X = x)) - [E(X)]²   (if X is discrete, summing over all x)
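Both expressions for the variance can be evaluated side by side; a sketch for a fair die, using exact arithmetic so the equality is verified without rounding error:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
mean = sum(x * p for x, p in pmf.items())       # E(X) = 7/2

# Definition: Var(X) = E[(X - E(X))^2]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Shortcut: Var(X) = E(X^2) - [E(X)]^2
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(var_def, var_alt)  # 35/12 35/12 -- the two forms agree
```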

Example 5.4:
Suppose a lottery offers 3 prizes: $1,000, $2,000 and
$3,000. 10,000 tickets are sold and each ticket has an
equal chance of winning a prize. Calculate the variance
and standard deviation of the random variable X
representing the value of the prize won by a ticket.

(See video for solution)



If we wish to determine the variance of a linear function
Y = g(X) = a + bX of a random variable X, the following
rule can be used

Var(Y) = Var(a + bX) = b²Var(X)

5.4 The Binomial Distribution

The binomial distribution is a discrete probability
distribution based on ‘n’ repetitions of an experiment
whose outcomes are represented by a Bernoulli random
variable.

(a) Bernoulli Experiments

A Bernoulli experiment (or trial) is such that only 2
outcomes are possible. These outcomes can be denoted
success (‘S’) and failure (‘F’), with probabilities ‘p’ and
(1 - p), respectively.

A Bernoulli random variable Y is usually defined so that it
takes the value 1 if the outcome of a Bernoulli experiment
is a success, and the value 0 if the outcome is a failure.

Thus

P(Y = 1) = p
P(Y = 0) = 1 - p

The mean and variance of a Bernoulli random variable
defined in the above way are

E(Y) = p
Var(Y) = p(1 - p)

An example of a Bernoulli experiment is the tossing of a
fair coin, denoting a head as a success (Y = 1) and a tail as
a failure (Y = 0), with p = 1/2.

(b) Binomial Experiments

Definition (Binomial Experiment)


A binomial experiment fulfils the following requirements:

(i) There are ‘n’ repetitions or ‘trials’ of a Bernoulli
experiment for which there are only two
outcomes, ‘success’ or ‘failure’.
(ii) All trials are performed under identical
conditions.
(iii) The trials are independent.
(iv) The probability of success ‘p’ is the same for each
trial.
(v) The random variable of interest, say X, is the
number of successes observed in the ‘n’ trials.

Theorem (The Binomial Probability Function)

Let X represent the number of successes in a binomial
experiment consisting of ‘n’ trials and with a probability
‘p’ of success on each trial. The probability of ‘x’
successes in such an experiment is given by

P(X = x) = nCx p^x (1 - p)^(n-x)   for x = 0, 1, 2, 3, ..., n

(See reference file for proof if interested)
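The probability function translates directly into code; a sketch using math.comb for nCx, with the air conditioning numbers from Example 5.5 (n = 10, p = 0.7) as inputs:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 5.5 setup: n = 10 units installed, p = 0.7 chance each needs servicing
n, p = 10, 0.7
print(round(binom_pmf(5, n, p), 4))   # P(exactly 5 require servicing)
print(round(binom_pmf(0, n, p), 6))   # P(none do) = 0.3^10
print(round(binom_pmf(10, n, p), 4))  # P(all do) = 0.7^10

# Sanity check: the probabilities over x = 0, ..., n sum to 1
print(round(sum(binom_pmf(x, n, p) for x in range(n + 1)), 6))  # 1.0
```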

Example 5.5:
A company that supplies reverse-cycle air conditioning
units has found from experience that 70% of the units it
installs require servicing within the first 6 weeks of
operation. In a given week the firm installs 10 air
conditioning units. Calculate the probability that, within 6
weeks
 5 of the units require servicing
 none of the units require servicing
 all of the units require servicing

(See video for solution)



(c) Cumulative Binomial Probabilities

The calculation of cumulative binomial probabilities of the
form P(X ≤ c) is often tedious, even using a calculator.
However, tables to determine such probabilities are
available. (See Reference files appendix Table 3)

(Extract of Appendix 3)
CUMULATIVE BINOMIAL PROBABILITIES: P(X ≤ x | p, n)

                                          p
n   x   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40   ....  0.70
1   0   0.9500  0.9000  0.8500  0.8000  0.7500  0.7000  0.6500  0.6000 ....  0.3000
    1   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000       1.0000

2   0   0.9025  0.8100  0.7225  0.6400  0.5625  0.4900  0.4225  0.3600       0.0900
    1   0.9975  0.9900  0.9775  0.9600  0.9375  0.9100  0.8775  0.8400 ....  0.5100
    2   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000       1.0000

3   0   0.8574  0.7290  0.6141  0.5120  0.4219  0.3430  0.2746  0.2160       0.0270
    1   0.9928  0.9720  0.9393  0.8960  0.8438  0.7840  0.7183  0.6480 ....  0.2160
    2   0.9999  0.9990  0.9966  0.9920  0.9844  0.9730  0.9571  0.9360       0.6570
    3   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000       1.0000

...

10  0   0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.0060       0.0000
    1   0.9139  0.7361  0.5443  0.3758  0.2440  0.1493  0.0860  0.0464       0.0001
    2   0.9885  0.9298  0.8202  0.6778  0.5256  0.3828  0.2616  0.1673       0.0016
    3   0.9990  0.9872  0.9500  0.8791  0.7759  0.6496  0.5138  0.3823       0.0106
    4   0.9999  0.9984  0.9901  0.9672  0.9219  0.8497  0.7515  0.6331       0.0473
    5   1.0000  0.9999  0.9986  0.9936  0.9803  0.9527  0.9051  0.8338 ....  0.1503
    6   1.0000  1.0000  0.9999  0.9991  0.9965  0.9894  0.9740  0.9452       0.3504
    7   1.0000  1.0000  1.0000  0.9999  0.9996  0.9984  0.9952  0.9877       0.6172
    8   1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  0.9995  0.9983       0.8507
    9   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999       0.9718
    10  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000       1.0000
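Entries of such a table can be reproduced by summing the binomial probability function; a sketch checking the n = 10, p = 0.70, x = 5 entry:

```python
from math import comb

def binom_cdf(c, n, p):
    """Cumulative probability P(X <= c) = sum of P(X = x) for x = 0, ..., c."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1))

# The n = 10, p = 0.70 row: P(X <= 5)
print(round(binom_cdf(5, 10, 0.70), 4))  # 0.1503
```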

Example 5.6:
Referring to the previous air conditioning unit example,
calculate the probability that within 6 weeks of installation

 less than 8 of the air conditioners require servicing.


 4 or more of the air conditioners require servicing.

(See video for solution)



Example 5.7:
Again referring to the previous air conditioning unit
example, use the cumulative binomial tables to calculate
the probability that within 6 weeks of installation

 5 units require servicing


 10 units require servicing

(See video for solution)



(d) Characteristics of the Binomial Distribution

Theorem (Mean and Variance of a Binomial Random
Variable)
Let X represent the number of successes in a binomial
experiment consisting of ‘n’ trials, and where the
probability of success on each trial is ‘p’. Then

E(X) = np
Var(X) = np(1 - p)

For example, the mean and variance of the binomial
distribution of the previous air conditioning unit example
are 10(0.7) = 7 and 10(0.7)(0.3) = 2.1, respectively.
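The shortcut formulas E(X) = np and Var(X) = np(1 - p) can be checked against the long way round, weighting each value by its probability; the air conditioning parameters are reused:

```python
from math import comb

n, p = 10, 0.7
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Direct computation from the full distribution
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(mean, 6))      # 7.0  (= np)
print(round(variance, 6))  # 2.1  (= np(1 - p))
```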

Each combination of ‘n’ and ‘p’ gives a particular
binomial distribution. We say ‘n’ and ‘p’ are the
parameters of the binomial distribution.

If p = 0.5, the binomial distribution is symmetric.

Example 5.8:
Suppose n = 5 and p = 0.5

(Probability histogram: bars at X = 0, 1, 2, 3, 4, 5 with
heights 0.0313, 0.1563, 0.3125, 0.3125, 0.1563 and 0.0313,
respectively)
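The histogram's heights are just the binomial probabilities for n = 5, p = 0.5; a one-line check (0.5^5 = 1/32 is exact in binary, so the printed values are exact):

```python
from math import comb

# Binomial probabilities for n = 5, p = 0.5
pmf = [comb(5, x) * 0.5 ** 5 for x in range(6)]
print(pmf)  # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
```

The symmetry is visible directly: the list reads the same forwards and backwards.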

The binomial distribution will be skewed to the left (i.e.
‘negatively skewed’) if p > 0.5, and skewed to the right
(i.e. ‘positively skewed’) if p < 0.5. In either case the
tendency to be skewed diminishes as ‘n’ increases.

(See the diagrams in the reference file.) This is a
characteristic which is useful in approximating binomial
probabilities, as we shall see later.

MAIN POINTS

• If we sample without replacement from a finite
population, the outcome on any draw will depend on the
outcomes of all previous draws.

• Sampling with replacement from a finite population is
‘equivalent’ to sampling from an infinite population.

• Tree diagrams can facilitate the calculation of joint
probabilities (i.e. the probabilities of intersections of
events).

• A probability distribution can be interpreted as a model
for the relative frequency distribution of some real
statistical population. In any given situation, the model
may or may not represent the relative frequency
distribution exactly.

• It is convenient to associate the outcomes of a statistical
experiment with values of a random variable (e.g. X).
We can then think in terms of the probability
distribution of the random variable.

• The mean (expected value) and variance of a discrete
random variable are given by

E(X) = Σ x P(X = x)   (= μ)

Var(X) = E[(X - E(X))²]
       = Σ (x - E(X))² P(X = x)   (= σ²)

with sums taken over all x.

• The binomial distribution is a model for the relative
frequency (probability) distribution of numbers of
successes in ‘n’ trials of a Bernoulli experiment.

• The binomial distribution can be represented by the
probability function

P(X = x) = nCx p^x (1 - p)^(n-x)

where ‘n’ is the number of trials, ‘x’ the number of
successes and ‘p’ the probability of success at each trial.
