
School of Economics and Finance

ECN121 Statistical Methods in Economics

Lecture 2
We often use random variables to describe the important aspects of the outcomes of experiments. In today’s lecture we introduce a first important type of random variable: the discrete random variable. We will learn how to find probabilities concerning discrete random variables and how to compute the so-called first and second moments of a discrete random variable: the mean (expected value) and the variance. We conclude the lecture by looking at two particular discrete probability distributions: the Binomial distribution and the Bernoulli distribution.

1. Random Variable: a random variable is a variable that assumes numerical values determined by the outcome of an experiment, where one and only one numerical value is assigned to each experimental outcome

 Formal Definition: consider a sample space S. A random variable X is a function defined over S. Hence X = X(s), where s is an elementary event varying in S

o Discrete Random Variable: this is a random variable that assumes either a finite number of possible values or values that take the form of a countable sequence or list such as 0, 1, 2, 3… (a countably infinite list)
Examples of discrete random variables:
 Number X of the next three customers entering a store who will make a purchase (X could be 0, 1, 2 or 3)
 Rating X, on a 1 through 5 scale, given by a student to a QM course
 Number X of fire alarms in the Queen’s Building in the last two months

Example 1
Suppose we carry out the experiment: throw a die twice. Let’s define the random variable X = sum of the
numbers of the two faces showing.
In this case each outcome is a pair (result of first throw, result of second throw), and the sample space can be arranged in a table whose entries give the value of X (rows: first throw, columns: second throw):

      1   2   3   4   5   6
 1    2   3   4   5   6   7
 2    3   4   5   6   7   8
 3    4   5   6   7   8   9
 4    5   6   7   8   9  10
 5    6   7   8   9  10  11
 6    7   8   9  10  11  12

The random variable X therefore takes values in the set:
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
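As a quick check (this sketch is ours, not part of the course material; variable names are illustrative), the 36 outcomes and the values taken by X can be enumerated in Python:

```python
from itertools import product

# All 36 equally likely outcomes (first throw, second throw) of two die throws
outcomes = list(product(range(1, 7), repeat=2))

# X = sum of the two faces; collect the distinct values it can take
values_of_X = sorted({a + b for a, b in outcomes})
print(values_of_X)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```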

Example 2
Refer to the three children family example. Let’s define a random variable X = number of girls
In this case from the sample space S we have:

Elements of S:    GGG  BBB  GBB  GGB  BGB  GBG  BBG  BGG
Number of girls:    3    0    1    2    1    2    1    2

Notice that X now defines a new, smaller sample space {0, 1, 2, 3}. If we reorganise the events according to the value of X we have:

Value of X Favourable outcome
X=3 GGG
X=2 GGB, GBG, BGG
X=1 GBB, BGB, BBG
X=0 BBB
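The grouping above can be reproduced with a short script (our own illustrative sketch, not part of the notes):

```python
from itertools import product

# The 8 equally likely outcomes of a three children family (G = girl, B = boy)
outcomes = ["".join(s) for s in product("GB", repeat=3)]

# Group the outcomes by the value of X = number of girls
favourable = {}
for s in outcomes:
    favourable.setdefault(s.count("G"), []).append(s)

print({x: favourable[x] for x in sorted(favourable, reverse=True)})
```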

2. Discrete Probability Distributions: the value assumed by a discrete random variable depends on the
outcome of an experiment. Because the outcome of the experiment will be uncertain, the value assumed
by the random variable will also be uncertain. However, it is often useful to know the probabilities that
are associated with the different values that the random variable can take on.

 Probability Distribution: the probability distribution of a discrete random variable is a table, graph,
or formula that gives the probability associated with each possible value that the random variable can
assume

 Formal Definition: consider a discrete random variable X defined over a sample space S and taking values in a set 𝒳. Then, the probability distribution of X is the collection {P(X = x), x ∈ 𝒳}

o Computing the distribution of a random variable: the key formula is:

P(X = x) = P({s ∈ S; X(s) = x}) = Σ_{s: X(s) = x} P({s})

Example
Let’s go back to the experiment: sum of faces of two dice
Given that each outcome of the experiment has the same probability 1/36, the distribution of the random
variable is given by:

Values of X Favourable outcome {s;X(s)=x} P(X=x)


2 (1,1) 1/36
3 (1,2), (2,1) 2/36
4 (1,3), (2,2), (3,1) 3/36
5 (1,4), (2,3), (3,2), (4,1) 4/36
6 (1,5), (2,4), (3,3), (4,2), (5,1) 5/36
7 (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) 6/36
8 (2,6), (3,5), (4,4), (5,3), (6,2) 5/36
9 (3,6), (4,5), (5,4), (6,3) 4/36
10 (4,6), (5,5), (6,4) 3/36
11 (5,6), (6,5) 2/36
12 (6,6) 1/36
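The whole table can be reproduced by counting favourable outcomes, exactly as the key formula prescribes (an illustrative sketch in Python; names are our own):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# For each value x of X = sum of two faces, count how many of the
# 36 equally likely outcomes are favourable, then divide by 36.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {x: Fraction(c, 36) for x, c in counts.items()}

print(dist[7])             # 1/6  (i.e. 6/36)
print(sum(dist.values()))  # 1
```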

Example
Let’s go back to the random variable X = number of girls in a three children family.
Each outcome has the same probability 1/8, so the probability distribution is given by:

Value of X Favourable outcome {s; X(s)=x} P(X=x)


X=3 GGG 1/8
X=2 GGB, GBG, BGG 3/8
X=1 GBB, BGB, BBG 3/8
X=0 BBB 1/8

 Properties of a discrete probability distribution: in general, a discrete probability distribution p(x) must satisfy two conditions:

o Condition 1
p(x) is in [0,1] for any value of x

This condition states that the probability P(X=x) of each value taken by the random variable X is a
value between 0 and 1 inclusive

o Condition 2
Σₓ p(x) = 1
This condition simply states that the sum of all the probabilities in the distribution must add up to 1

Examples
 p(0) = p(3) = 1/8, p(1)=p(2)=3/8 is a (discrete) probability distribution over {0,1,2,3}: all the
probabilities are non-negative and sum to 1

 p(0)=0.14, p(1)=0.39, p(2)=0.36 and p(3)=0.11 is a probability distribution: all the probabilities are
non-negative and sum to 1

 p(0)=-0.1, p(1) = 0.6, p(2)=0.2, p(3)=0.3 is not a probability distribution: p(0) is negative!

 p(0)=0.1, p(1)=0.6, p(2)=0.2, p(3)=0.3 is not a probability distribution: the candidate probabilities are
non-negative but do not sum up to 1
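The two conditions can be packaged into a small checker, which we then apply to the four candidates above (a sketch of ours; the function name is illustrative):

```python
def is_probability_distribution(p, tol=1e-9):
    """Check Condition 1 (each p(x) in [0, 1]) and Condition 2 (probabilities sum to 1)."""
    return all(0 <= v <= 1 for v in p.values()) and abs(sum(p.values()) - 1) < tol

# The four candidate distributions from the examples above:
print(is_probability_distribution({0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}))      # True
print(is_probability_distribution({0: 0.14, 1: 0.39, 2: 0.36, 3: 0.11}))  # True
print(is_probability_distribution({0: -0.1, 1: 0.6, 2: 0.2, 3: 0.3}))     # False: p(0) < 0
print(is_probability_distribution({0: 0.1, 1: 0.6, 2: 0.2, 3: 0.3}))      # False: sums to 1.2
```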

 Graphical Representation of Discrete Probability Distributions: probability distributions can be represented graphically through bar diagrams. The values of the random variable are plotted on the horizontal axis while the probabilities of each value are plotted on the vertical axis. See the examples at the end of these notes

3. Mean and Expected Value: suppose that the experiment described by a random variable X is repeated an indefinitely large number of times. If the values of the random variable X observed on the repetitions are recorded, we obtain the population of all possible observed values of the random variable X. This population has a mean, which we denote by μ and which is also called the expected value of X

 Definition: the mean indicates the central tendency of the distribution. The mean of the distribution of a discrete random variable X is a centrality parameter defined as:
E(X) = μ = Σₓ x p(x)
where E(X) stands for ‘expected value of X’

 Important: do not confuse the sample mean X̄ introduced last semester in the Spreadsheet and Data in Economics course with the population mean E(X). These are conceptually very different objects:
o The population mean E(X) is computed from a probability distribution which is supposed to describe a population in an exact way
o The sample mean X̄ is computed using frequencies from a sample extracted from the aforementioned population: it is an approximation of the population mean

Example
Let’s go back to the example concerning the experiment: sum of faces of two dice. In this case the expected
value is given by:

Values of X   P(X=x)   x·p(x)


2 1/36 (2)(1/36)
3 2/36 (3)(2/36)
4 3/36 (4)(3/36)
5 4/36 (5)(4/36)
6 5/36 (6)(5/36)
7 6/36 (7)(6/36)
8 5/36 (8)(5/36)
9 4/36 (9)(4/36)
10 3/36 (10)(3/36)

11 2/36 (11)(2/36)
12 1/36 (12)(1/36)
sum 252/36 = 7
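The sum in the table can be verified numerically (our own illustrative sketch):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# E(X) = sum over x of x * p(x), for X = sum of the faces of two dice
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
mean = sum(x * Fraction(c, 36) for x, c in counts.items())
print(mean)  # 7
```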

Example
Let’s go back to the random variable X = number of girls in a three children family. Recall that each outcome has the same probability 1/8. In this case we have:

Value of X   P(X=x)   x·p(x)


X=3 1/8 (3)(1/8)
X=2 3/8 (2)(3/8)
X=1 3/8 (1)(3/8)
X=0 1/8 (0)(1/8)

Hence the population mean is given by:


E(X) = 0·(1/8) + 1·(3/8) + 2·(3/8) + 3·(1/8) = 0 + 3/8 + 6/8 + 3/8 = 12/8 = 3/2 = 1.5
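The same computation in Python (an illustrative sketch of ours):

```python
from fractions import Fraction

# Distribution of X = number of girls in a three children family
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean = sum(x * px for x, px in p.items())
print(mean)  # 3/2
```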

4. Variance and Standard Deviation: just as the population of all possible observed values of a discrete random variable X has a mean μ, this population also has a variance σ² and a standard deviation σ. Recall that the variance of a population is the average of the squared deviations of the different population values from the population mean.

 Definition of variance: the variance of the discrete distribution {p(x), x ∈ 𝒳} is a measure of spread defined as:
σ² = Σₓ (x − μ)² p(x)
By using the expected value notation, the population variance of X is given by:
Var(X) = E[(X − E(X))²] = Σₓ (x − E(X))² p(x)

 Definition of Standard Deviation: the population standard deviation of X is the positive square root of the variance of X, and it is denoted by the symbol σ:
σ = √Var(X) = √σ²

 A practical formula to compute the variance: computing the variance directly can sometimes be cumbersome. The following formula is more convenient for practical computations:
σ² = Σₓ x² p(x) − μ²
or, by using the concept of expected value:
Var(X) = E(X²) − E²(X) = Σₓ x² p(x) − E²(X)
(see the proof at the end of these notes)

Example
X: number of girls in a three children family. Each outcome has probability 1/8

Preferred (most convenient) method, using the shortcut formula:

Value of X   P(X=x)   x·p(x)   x²·p(x)


X=3 1/8 (3)(1/8) (3)2(1/8)
X=2 3/8 (2)(3/8) (2)2(3/8)
X=1 3/8 (1)(3/8) (1)2(3/8)
X=0 1/8 (0)(1/8) (0)2(1/8)

Var(X) = E(X²) − E²(X) = (9/8 + 12/8 + 3/8 + 0) − (3/2)² = 3 − 9/4 = 3/4 = 0.75

By using the standard (definition-based) formula, and given that E(X) = 1.5, we get:
Var(X) = (0 − 3/2)²(1/8) + (1 − 3/2)²(3/8) + (2 − 3/2)²(3/8) + (3 − 3/2)²(1/8) = 9/32 + 3/32 + 3/32 + 9/32 = 24/32 = 0.75

 Chebyshev’s Theorem: the variance and the standard deviation measure the spread of the population of all possible observed values of the random variable. Chebyshev’s theorem tells us that, for any value of k greater than 1, at least 100(1 − 1/k²)% of all possible observed values of the random variable X lie in the interval [μ − kσ, μ + kσ]

o Stated in terms of probability we have:

P(X falls in the interval [μ − kσ, μ + kσ]) ≥ 1 − 1/k²


Example
X: number of girls in a three children family.
We know that for this probability distribution the mean value is μ = 1.5 and the standard deviation is:
σ = √Var(X) = √0.75 ≈ 0.866
If we set k = 2 we can calculate the interval:
[μ − kσ, μ + kσ] = [1.5 − 2(0.866), 1.5 + 2(0.866)] = [−0.232, 3.232]
Thus, Chebyshev’s theorem tells us that:
P(X falls in the interval [−0.232, 3.232]) ≥ 1 − 1/2² = 3/4
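The bound can be compared with the actual probability in code (our own sketch; here the whole distribution lies inside the interval, so the actual probability is 1, comfortably above the 3/4 bound):

```python
# Distribution of X = number of girls in a three children family
p = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}
mu = sum(x * px for x, px in p.items())
sigma = sum((x - mu) ** 2 * px for x, px in p.items()) ** 0.5

k = 2
lo, hi = mu - k * sigma, mu + k * sigma

# Actual probability mass inside [mu - k*sigma, mu + k*sigma]
actual = sum(px for x, px in p.items() if lo <= x <= hi)
print(actual >= 1 - 1 / k ** 2)  # True
```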

 Empirical Rule: if the probability distribution of a discrete random variable can be approximated by a normal curve, then the empirical rule for normally distributed populations describes the population of all possible values of X. Specifically, approximately 68.26%, 95.44% and 99.73% of all possible observed values of X fall in the intervals [μ − σ, μ + σ], [μ − 2σ, μ + 2σ] and [μ − 3σ, μ + 3σ] respectively

5. The Binomial Distribution: there are many types of discrete random variables, and one of the most commonly used distributions is the so-called Binomial distribution. Let’s investigate this distribution by starting with an example

 Problem: suppose that historical sales records indicate that 40% of all customers who enter a discount
department store make a purchase. What is the probability that two of the next three customers will make
a purchase?

Answer: in order to find this probability we first note that the experiment of observing three customers
making a purchase decision has several distinguishing characteristics:

1. the experiment consists of three identical trials (each trial consists of a customer making a purchase
decision);
2. two outcomes are possible on each trial: the customer makes a purchase (success); the customer does not
make a purchase (failure);
3. since 40% of all customers make a purchase we can reasonably assume that the probability of purchase
(success) P(S)=0.4
4. We assume that the customers make independent purchase decisions. The sample space in this
experiment is given by:
SSS SSF SFS FSS
FFS FSF SFF FFF
Two out of three customers make a purchase if one of the sample space outcomes SSF; SFS and FSS occurs.

Since the trials (purchase decisions) are independent, we can simply multiply the probabilities associated with the individual trial outcomes to find the probability of a sequence of outcomes:
P(SSF) = P(S)P(S)P(F) = (0.4)(0.4)(0.6) = (0.4)²(0.6)

P(SFS) = P(S)P(F)P(S) = (0.4)(0.6)(0.4) = (0.4)²(0.6)
P(FSS) = P(F)P(S)P(S) = (0.6)(0.4)(0.4) = (0.4)²(0.6)

Hence we get that the probability that two out of the next three customers make a purchase is:
P(SSF) + P(SFS) + P(FSS) = (0.4)²(0.6) + (0.4)²(0.6) + (0.4)²(0.6) = 3(0.4)²(0.6) = 0.288
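The enumeration argument can be reproduced directly in code (an illustrative sketch of ours, not the course's own material):

```python
from itertools import product

p_outcome = {"S": 0.4, "F": 0.6}

# Enumerate all 8 sequences of purchase decisions for three customers and
# add up the probabilities of those with exactly two successes.
total = 0.0
for seq in product("SF", repeat=3):
    if seq.count("S") == 2:
        prob = 1.0
        for c in seq:
            prob *= p_outcome[c]
        total += prob

print(round(total, 3))  # 0.288
```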

What should we note from this example?

a. the number 3 in the solution is the number of sample space outcomes in which exactly two of the three customers make a purchase;
b. 0.4 is the probability of a purchase (p);
c. 0.6 is the probability of no purchase (1 − p)

Hence, the probability that two of the next three customers make a purchase is:

(the number of ways to arrange 2 successes among three trials) × p²(1 − p)¹

Generalisation
If we denote the number of successes by x and the number of trials by n, then we can generalise the calculation of the probability as:

(the number of ways to arrange x successes among n trials) × p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)

To summarise:
A binomial experiment has the following characteristics:
1) the experiment consists of n identical trials;
2) each trial results in a success or a failure;
3) the probability of a success on any trial is p and remains constant from trial to trial. This implies that the
probability of failure (1-p) on any trial remains constant from trial to trial;
4) the trials are independent (that is the result of the trials have nothing to do with each other)

Finally, if we define the random variable

X = the total number of successes in n trials of a binomial experiment

then we call X a binomial random variable, and the probability of obtaining x successes in n trials is:

p(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)

The binomial distribution is said to be defined by two parameters: the probability of success p and the number of trials n

Example: see lecture slides

Examples of Binomial Distribution:

Trial                              S/F   p     n                     X
Tossing a fair coin                H/T   1/2   number of tosses      total number of heads
Sex of kth child                   F/M   1/2   number of children    number of girls
Pure guessing on an MCQ with       R/W   1/3   number of questions   mark (number of
three possible answers                                               correct answers)

 Mean and Variance of the Binomial Distribution: the mean of the binomial distribution with parameters (n, p) is:
μ = np
and its variance is:
σ² = np(1 − p)

6. The Bernoulli Distribution: the random variable X has a Bernoulli distribution with parameter p ∈ [0,1] if and only if X takes the value 0 or 1, with P(X = 1) = p and P(X = 0) = 1 − p

 Mean and Variance of the Bernoulli Distribution: if X has the Bernoulli distribution with parameter p, then:
E(X) = μ = p
Var(X) = p(1 − p)

Proof: E(X) = E(X²) = 0·(1 − p) + 1·p = p
and
Var(X) = E(X²) − E²(X) = p − p² = p(1 − p)

Readings
 Newbold P., Carlson W.L., Thorne B.M., (2013), Statistics for Business and Economics, 8th edition,
Chapter 4, sections 4.1 – 4.5
 Doane D., Seward L., (2013), Applied Statistics in Business and Economics, 4th ed., chapter 6
 Lind D., Marchal W., Wathen S., (2012), Statistical Techniques in Business and Economics, 15th ed.,
chapter 6
 Nieuwenhuis G., (2009), Statistical Methods for Business and Economics, Chapter 8. Sections 8.2.1,
8.4.1, chapter 9, sections 9.1 and 9.2.1
 Bowerman B., O’Connell R., Murphree E., (2009), Business Statistics in Practice, chapter 5, sections
5.1-5.3
 Aczel A., Sounderpandian J., (2009), Complete Business Statistics, chapter 3
 Ronald J. Wonnacott, Thomas H. Wonnacott, Introductory Statistics for Business and Economics, chapter
4, sections 4.1-4.3

[Figure: bar diagrams of the two probability distributions. Left: X = sum of the faces of two dice (sum of faces 2–12 on the horizontal axis, probability on the vertical axis). Right: X = number of girls in a three children family (number of girls 0–3 on the horizontal axis, probability on the vertical axis)]

Appendix
Prove that Var(X) = E(X²) − E²(X)

We start from:
Var(X) = E[X − E(X)]²
By expanding the square we get:
Var(X) = E[X² − 2XE(X) + E²(X)]
By applying the expectation operator (we talk more about the expectation operator next week!) we get:
Var(X) = E(X²) − 2E(X)E(X) + E²(X)
or
Var(X) = E(X²) − 2E²(X) + E²(X)
By simplifying this expression we get:
Var(X) = E(X²) − E²(X)
