Statistics Chapter One

Chapter One provides an overview of basic probability theory, defining key concepts such as probability, events, and outcomes, and outlining the importance of probability in various fields. It discusses different approaches to probability, including classical, long-term relative frequency, and subjective probability, along with their limitations. Additionally, the chapter explains sample space and how to calculate probabilities based on sample outcomes through examples.


CHAPTER ONE

1. OVERVIEW OF BASIC PROBABILITY THEORY

1.1. AIMS AND OBJECTIVES

Probability theory forms the basis for inferential statistics as well as for other fields that require
quantitative assessment of chance occurrences, such as quality control, management decision analysis,
and areas of the natural sciences, engineering, economics, etc.

After completing this unit, you will be able to:

 define probability
 define important terms in probability
 identify the approaches to probability
 list the sample space of an experiment
 identify the types of events
 calculate probabilities using different rules.
1.2 THE CONCEPT OF PROBABILITY

1.2.1 PROBABILITY DEFINED

Since life is full of uncertainties, people have always been interested in evaluating probabilities. The
theory of probability is an indispensable tool in the analysis of situations involving uncertainty.

Probability can be defined as

- A mathematical means of studying uncertainty and variability.


- A number that conveys the strength of our belief in the occurrence of an uncertain event

From the above definitions you can distinguish probability from chance or possibility, as the latter cannot
be quantified. Probability is a number between zero and one inclusive. A probability of zero represents
something that cannot happen, and a probability of one represents something that is certain to happen.
The closer a probability is to zero, the more improbable it is that something will happen; the closer it is
to one, the more sure we are that it will happen. When the probability is 0.5, uncertainty is at its
maximum.

1.2.2. WHY PROBABILITY


The results obtained using descriptive analysis can’t be generalized outside the observed data in question.
Any question relating to the population from which the observed data (sample) were drawn can’t be
answered within the descriptive statistics framework. So what should we do? In order to generalize beyond
the data in hand, we need to model the distribution of data of the population and not just describe the
observed data in hand. Such a general model is provided by probability theory. It turns out that the model
provided by probability theory owes a lot to the earlier-developed descriptive statistics (such as the
concepts of location, dispersion and shape). E.g. the frequency curve becomes a density function
purporting to model observable real-world phenomena.

Important terms
1. Experiment
A process that leads to the occurrence of one and only one of several possible observations, or a
process of observation that has an uncertain outcome. E.g. tossing a coin; answering a question
where the answer can be correct or incorrect; drawing a card from a deck of playing cards.

2. Event
A collection of one or more outcomes of an experiment, or an experimental outcome that may or may
not occur. If the experiment is tossing a coin, the events are Head or Tail.

3. Outcome
A particular result of an experiment. In the case of tossing a coin, if the head faces up, we consider
head to be the outcome of the experiment.

1.3. APPROACHES IN PROBABILITY


1.3.1 Objective Probability

A) Classical Probability

It is probability based on the symmetry of games of chance or similar situations. This probability is based
on the idea that certain occurrences are equally likely, e.g. the numbers 1, 2, 3, 4, 5, and 6 on a fair die
are equally likely to occur, i.e. they have an equal chance of occurrence.

If a random experiment can result in N mutually exclusive and equally likely outcomes, and if NA of the
outcomes result in the occurrence of the event A, then the probability of A is defined by P(A) = NA/N.
Examples:
1. Flipping a fair coin twice and observing the faces that show up: the set of all equally likely
outcomes is S = {(H,T), (T,H), (H,H), (T,T)}, with N = 4. Let the event A be observing at least one
head (H); then A = {(H,T), (T,H), (H,H)}. Since NA = 3, P(A) = 3/4.
2. A fair die is thrown. Find the probabilities that the face on the die is (a) maximum, (b) prime, (c)
a multiple of 3, (d) a multiple of 7.
Solution:
There are 6 possible outcomes when a die is tossed. We assume that all 6 faces are equally likely, so the
classical definition of probability applies. The sample space is S = {1, 2, 3, 4, 5, 6}, N = 6.
a) Let A be the event that the face is maximum. Thus, A = {6}, NA = 1. Therefore, P(A) = NA/N = 1/6.
b) Let B be the event that the face is prime. Thus, B = {2, 3, 5}, NB = 3. Therefore, P(B) = NB/N = 3/6 = 1/2.
c) Let C be the event that the face is a multiple of three. Thus, C = {3, 6}, NC = 2.
Therefore, P(C) = NC/N = 2/6 = 1/3.
d) Let D be the event that the face is a multiple of 7. Thus, D = ϕ, ND = 0. Therefore, P(D) = ND/N =
0/6 = 0 (an impossible event).
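The classical rule P(A) = NA/N can be sketched in a few lines of Python; the function name and event sets below are illustrative, not from the text:

```python
from fractions import Fraction

# Sample space for one throw of a fair die (the example above).
sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    # Classical definition: P(A) = N_A / N for equally likely outcomes.
    return Fraction(len(event & space), len(space))

print(classical_probability({6}, sample_space))        # maximum: 1/6
print(classical_probability({2, 3, 5}, sample_space))  # prime: 1/2
print(classical_probability({3, 6}, sample_space))     # multiple of 3: 1/3
print(classical_probability(set(), sample_space))      # multiple of 7: 0
```

Using `Fraction` keeps the answers exact, matching the hand-computed ratios above.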

Limitations of the classical approach


A) It is applicable only to situations with a finite number of possible outcomes. It cannot be applied
when there is difficulty counting the total outcomes or the outcomes favorable to the event. For example,
it is difficult to count the fish in the ocean, and thus difficult to find the probability of catching a
fish of some weight, say more than one kilogram.
B) The equally likely condition renders the definition circular.
Someone could flip a coin indefinitely without ever turning up heads. The definition is applicable only to
situations where an apparent objective symmetry exists, which raises the question of circularity. For this
reason, we would get into trouble if we tried to employ it for a biased coin or for subjective
probability issues (e.g. the probability that Ethiopia's inflation next year will be slightly below 12%).

B) Long-term Relative Frequency Probability

The probability of an event happening in the long run is determined by observing what fraction of the
time similar events happened in the past. We often think of a probability in terms of the percentage of the
time the event would occur in many repetitions of the experiment. Suppose that A is an event that might
occur when a particular experiment is performed. Then the probability that the event A will occur, P(A),
can be interpreted as the number that would be approached by the relative frequency of the event A if
we performed the experiment an indefinitely large number of times.

Therefore, probability is interpreted as stemming from the observed stability of empirical frequencies. In a
coin-flipping experiment, for instance, the probability of A = {H} is 1/2, not because there are two equally
likely outcomes, but because repeated series of large numbers of trials demonstrate that the empirical
frequency of occurrence of A converges to the limit 1/2 as the number of trials n goes to infinity.

Symbolically, P(A) = lim(n→∞) nA/n, where nA is the number of occurrences of A in n trials; or,

Probability of an event happening = (Number of times the event occurred in the past) / (Total number of observations)
Example

If a truck operator experienced 5 accidents out of 50 trucks last year, then the probability that a truck
will have an accident next year can be estimated as 5/50 = 0.10.
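The convergence that underlies the relative-frequency view can be seen by simulation; this sketch (illustrative, not part of the text) flips a simulated fair coin and watches the fraction of heads settle near 0.5:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def relative_frequency(n_trials):
    # Fraction of trials in which the event 'head' occurred.
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))  # the estimate approaches 0.5 as n grows
```

With only 10 flips the estimate is noisy; with 100,000 it is very close to 1/2, which is the sense in which the empirical frequency "converges".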

3
Limitations of the frequency approach
1) What is meant by the limit as n goes to infinity?
2) How can we generate an infinite sequence of trials?
3) What happens with phenomena where repeated trials are not possible?

1.3.2 Subjective Probability

When there is little or no past experience on which to base a probability, personal judgment, experience,
intuition, expertise or any other subjective evaluation criterion is applied to estimate or assign the
probability. This is subjective probability.

It is also called personal probability. Unlike objective probability, one person's subjective probability may
very well differ from another person's subjective probability of the same event.

E.g. a physician assessing the probability of a patient's recovery and an expert in the national bank
assessing the probability of currency devaluation are both making personal judgments based on what they
know and feel about the situation; another group of physicians or experts may arrive at different
probabilities, even though they may employ identical techniques, approaches and information.

Both classical and long-term relative frequency probabilities are objective in the sense that no personal
judgment is involved. But one thing that needs to be clear is that whatever the kind of probability involved
(subjective or objective), the same set of mathematical rules holds for manipulating and analyzing
probability.

1.4 SAMPLE SPACE AND SAMPLE SPACE OUTCOME

In order to calculate and interpret probabilities it is important to understand and use the idea of a
sample space.

SAMPLE SPACE AND EVENTS : UNIVARIATE CASE


Suppose we toss a single coin and observe whether it comes up heads or tails.
 Population: is an infinite sequence of tosses of a single coin. We are not certain about whether the
result will be a head or a tail. Therefore, this coin toss is an example of a random trial or
experiment.
 Random trial/experiment: is an activity having two or more possible outcomes, with uncertainty
in advance as to which outcome will prevail.

4
 Basic outcomes/sample points/simple events: are the different possible outcomes of the random
trial.
 Sample space: is the set of all basic outcomes for the random trial.
In our example of a single coin toss, the sample space, which we denote by S, contains the basic outcomes
H (heads) and T (tails). The set of basic outcomes can be written as S = {H, T}. These basic outcomes are
mutually exclusive (only one can occur at a time) and collectively exhaustive (at least one of them must occur).

BIVARIATE CASE
Suppose we flip two coins simultaneously & record whether they come up heads or tails. Now we have
four simple events: S = {HH, HT, TH, T T}.
 Event (E) - is a subset of sample points. E.g. E1 = event ‘at least one head’ = {HH, HT, TH};
E2 = event ‘both faces same’ = {HH, TT}.
 Complementary event to Ej - is the set of outcomes not contained in the event Ej. E.g. the
complementary event to E1 is E1′ = {TT}, and the complementary event to E2 is E2′ = {HT, TH}.
 Intersection of Ei and Ej - is the set of sample points that belongs to both events Ei
and Ej. E.g. E1 ∩ E2′ = {HT, TH} = E2′, and E1′ ∩ E2′ = ϕ.


 When the intersection of two events is the null event, they are said to be mutually
exclusive. It should be obvious that the intersection of an event and its complement is the null
event.
 Union - is the set of sample points that belongs to at least one of the events Ei and Ej.
E.g. E1′ ∪ E2′ = {HT, TH, TT}. The union of all elementary events gives the sample space itself.
 Universal event - is an event that contains the entire sample space.
 Event space - is the set of all possible events that can occur in a random trial or
experiment, including both the universal event and the null event.
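These set relations can be checked directly with Python's built-in set type; a sketch with illustrative variable names for the two-coin events above:

```python
# Sample space for flipping two coins, as in the bivariate case above.
S  = {"HH", "HT", "TH", "TT"}
E1 = {"HH", "HT", "TH"}   # event 'at least one head'
E2 = {"HH", "TT"}         # event 'both faces same'

E1_c = S - E1             # complement of E1: {"TT"}
E2_c = S - E2             # complement of E2: {"HT", "TH"}

print(E1 & E2)            # intersection of E1 and E2
print(E1 | E2 == S)       # the union of these two events covers S: True
print(E1_c & E2_c)        # null event: the two complements are mutually exclusive
```

The empty intersection of the complements mirrors the ϕ result in the bullet above.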

Example 1

A newly married couple plans to have two children. Naturally, they are curious about whether their
children will be boys or girls. Therefore, we consider the experiment of having two children.

In order to find the sample space of this experiment, we let ‘B’ denote that a child is a boy and ‘G’
denote that a child is a girl.

5
This experiment is a two-step process, i.e. having the first child, which could be a boy or a girl, and
having the second child, which could also be either a boy or a girl.

This can be constructed by a tree diagram. Each branch of the tree leads us to a distinct sample
space outcome.

Sample space outcomes (tree diagram):

1st Child   2nd Child   Outcome
Boy (B)     Boy (B)     BB
Boy (B)     Girl (G)    BG
Girl (G)    Boy (B)     GB
Girl (G)    Girl (G)    GG

We see that there are four sample space outcomes. Therefore the sample space (i.e. the set of all of the
distinct sample space outcomes) is {BB, BG, GB, GG}.

In order to consider the probabilities of these outcomes, suppose that boys and girls are equally likely
each time a child is born. This says that each of the sample space outcomes is equally likely, i.e.

P(BB) = P(BG) = P(GB) = P(GG) = 1/4. This says that there is a 25% chance that each of these outcomes
will occur. Since we are certain that there is no other option or combination remaining, the probability
that the couple will have any one of the sample space outcomes is one, i.e. P(BB) + P(BG) + P(GB) +
P(GG) = 1.

Notice that these probabilities sum to one, i.e. the sum of the probabilities of all sample space outcomes is
one.


Example 2

6
A student takes a quiz that consists of three true-or-false questions. If we consider our experiment to be
answering the three questions, each question can be answered correctly or incorrectly.

Let C denote answering a question correctly and I denote answering a question incorrectly. Then we
can depict a tree diagram of the sample space outcomes for the experiment.
[Tree diagram: Step I - answering the 1st question (C or I); Step II - answering the 2nd question;
Step III - answering the 3rd question. The eight branches end in the sample space outcomes
CCC, CCI, CIC, CII, ICC, ICI, IIC, III.]

This diagram portrays the experiment as a three-step process

Step I – answering the 1st question (Correctly or incorrectly) (C or I)

Step II – answering the 2nd question (Correctly or incorrectly).

Step III – answering the 3rd question (Correctly or incorrectly).

The tree diagram has eight different branches and the eight distinct sample space outcomes are listed at
the end of the branches. We see the sample space is

CCC CCI CIC CII ICC ICI IIC III

Now suppose that the student was totally unprepared for the test and has to blindly guess the answer
to each question; that is, the student has a 50-50 chance (0.5 probability) of correctly answering each
question. This means that each of the eight sample space outcomes is equally likely to occur.

i.e

7
P(CCC) = P(CCI) = … = P(III) = 1/8

Here also the sum of the probabilities of the sample space outcomes is one.

In general, the sum of the probabilities of all the sample space outcomes is equal to 1.
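The eight quiz outcomes and their equal probabilities can be enumerated mechanically; a sketch using `itertools.product` (illustrative, not from the text):

```python
from itertools import product

# Each question is answered Correctly (C) or Incorrectly (I); three questions.
outcomes = ["".join(branch) for branch in product("CI", repeat=3)]
print(outcomes)
# ['CCC', 'CCI', 'CIC', 'CII', 'ICC', 'ICI', 'IIC', 'III']

# Blind guessing makes every outcome equally likely.
p_each = 1 / len(outcomes)
print(p_each)                    # 0.125, i.e. 1/8
print(p_each * len(outcomes))    # the probabilities sum to 1.0
```

`product("CI", repeat=3)` plays the role of the tree diagram: each 3-letter string is one path through the three branching steps.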

Finding Probabilities by using Sample Space

If all of the sample space outcomes are equally likely, then the probability that an event will occur is
equal to:

(The number of sample space outcomes that correspond to the event) / (The total number of sample space outcomes)

Consider the couple planning to have two children. To find the probability of two boys, we first have to
find the sample space outcome corresponding to the event of having the first child a boy and the
second child also a boy.

There is only one sample space outcome corresponding to this event, i.e. BB, so the probability is
1/4 = 0.25. The probability that the couple will have a boy and a girl is similarly calculated by first
identifying the sample space outcomes corresponding to the event of having a boy and a girl. The
sample space outcomes are BG and GB, so the probability is 2/4 = 0.5.

Often it may be practically impossible to list all possible sample space outcomes of an
experiment. Under such circumstances we can find the probability of an event by identifying the
number of sample space outcomes (without listing them) corresponding to the event.

Example - Suppose that 650,000 of 1,000,000 households in Addis subscribe to a newspaper called
Addis Zemen, and consider randomly selecting one of the households in this city. That is, consider
selecting one household and giving each and every household in the city the same chance of being
selected. Let A be the event that the randomly selected household subscribes to the Addis Zemen.
Then, since the sample space of this experiment consists of 1,000,000 equally likely sample space
outcomes (households), it follows that

P(A) = (The number of households that subscribe to the Addis Zemen) / (The total number of households in the city)
     = 650,000/1,000,000 = 0.65

Now also suppose that 500,000 households in the city subscribe to the Ethiopian Herald (H), and
further suppose that 250,000 households subscribe to both newspapers.

We consider randomly selecting one household in the city, and we define the following events:

A = the randomly selected household subscribes to the Addis Zemen.

Ā = the randomly selected household does not subscribe to the Addis Zemen.

H = the randomly selected household subscribes to the Ethiopian Herald.

H̄ = the randomly selected household does not subscribe to the Herald.

Using the notation A∩H to denote both A and H occurring, we also define:

A∩H = the randomly selected household subscribes to both the Addis Zemen and the Herald.

Since 650,000 of the 1,000,000 households subscribe to the Addis Zemen (that is, correspond to the
event A occurring), 350,000 households do not subscribe to the Zemen (Ā), i.e. 1,000,000 – 650,000.

Similarly, since 500,000 households subscribe to the Herald (H), 500,000 households do not subscribe to
the Herald (H̄).

Next consider the events

A∩H̄ = the randomly selected household subscribes to the Zemen and does not subscribe to the Herald;

Ā∩H = the randomly selected household does not subscribe to the Zemen and does subscribe to the Herald.

A summary of the numbers of households corresponding to the events A, Ā, H, H̄ and A∩H:

Events                                Subscribe to Herald   Does not subscribe to Herald   Total
Subscribe to Addis Zemen              250,000                                              650,000
Does not subscribe to Addis Zemen                                                          350,000
Total                                 500,000               500,000                        1,000,000

Define the event Ā∩H̄:

Ā∩H̄ = the randomly selected household subscribes to neither newspaper.

9
Since 650,000 households subscribe to the Addis Zemen (A) and 250,000 households subscribe to both the
Zemen and the Herald (A∩H), it follows that 650,000 – 250,000 = 400,000 households subscribe to the Addis
Zemen but do not subscribe to the Herald (A∩H̄). This subtraction is illustrated in the table below.

By similar logic:

a. 500,000 – 250,000 = 250,000 households do not subscribe to the Addis Zemen but do subscribe to
the Herald (Ā∩H).
b. 350,000 – 250,000 = 100,000 households do not subscribe to the Addis Zemen and also do not
subscribe to the Herald (Ā∩H̄).

Subtracting to find the numbers of households corresponding to the events A∩H̄, Ā∩H and Ā∩H̄:

Event    H                              H̄                               Total
A        250,000                        650,000 – 250,000 = 400,000      650,000
Ā        500,000 – 250,000 = 250,000    350,000 – 250,000 = 100,000      350,000
Total    500,000                        500,000                          1,000,000

A contingency table summarizing the subscription data for the Addis Zemen and the Herald:

Event                                   Subscribe to Herald (H)   Does not subscribe to Herald (H̄)   Total
Subscribe to Addis Zemen (A)            250,000                   400,000                             650,000
Does not subscribe to Addis Zemen (Ā)   250,000                   100,000                             350,000
Total                                   500,000                   500,000                             1,000,000

10
Now, since we will randomly select one household (making all the households equally likely to be
chosen), the probability of any of the previously defined events is the ratio of the number of
households corresponding to the event’s occurrence to the total number of households in the city.

Therefore:

P(A) = 650,000/1,000,000 = 0.65

P(H) = 500,000/1,000,000 = 0.5

P(A∩H) = 250,000/1,000,000 = 0.25

Next, letting A∪H denote either A or H, we consider finding the probability of the event

A∪H = the randomly selected household subscribes to either the Addis Zemen or the Herald (i.e.
subscribes to at least one of the two newspapers).

We see that the households subscribing to either the Addis Zemen or the Herald are:

a) the 400,000 households that subscribe only to the Addis Zemen (A∩H̄),
b) the 250,000 households that subscribe only to the Herald (Ā∩H), and
c) the 250,000 households that subscribe to both the Addis Zemen and the Herald (A∩H).

Therefore, since a total of 900,000 households subscribe to either the Addis Zemen or the Herald, it
follows that:
P(A∪H) = 900,000/1,000,000 = 0.9

i.e. 90% of the households in the city subscribe to either the Addis Zemen or the Herald.

Notice that P(A∪H) = 0.9 does not equal

P(A) + P(H) = 0.65 + 0.5 = 1.15

Logically, the reason for this is that both P(A) = 0.65 and P(H) = 0.5 count the 25% of households
that subscribe to both newspapers. Therefore, the sum of P(A) and P(H) counts this 25% of the households
once too often. It follows that if we subtract P(A∩H) = 0.25 from the sum of P(A) and P(H), we obtain
P(A∪H), i.e.

11
P(A∪H) = P(A) + P(H) – P(A∩H) = 0.65 + 0.5 – 0.25 = 0.90
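The subtraction above is just the general addition rule; a quick numeric check with the newspaper figures (variable names are illustrative):

```python
total = 1_000_000
n_zemen, n_herald, n_both = 650_000, 500_000, 250_000

P_A  = n_zemen  / total    # P(A)   = 0.65
P_H  = n_herald / total    # P(H)   = 0.5
P_AH = n_both   / total    # P(A∩H) = 0.25

# Addition rule for events that are not mutually exclusive:
# P(A∪H) = P(A) + P(H) - P(A∩H)
P_union = P_A + P_H - P_AH
print(round(P_union, 10))  # 0.9
```

Adding P(A) and P(H) alone would double-count the 250,000 joint subscribers, which is exactly what the subtracted P(A∩H) term corrects.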

The Intersection and Union of Two Events

Given two events A and B:

1) The intersection of A and B is the event consisting of the sample space outcomes belonging to both
A and B, denoted A∩B. Furthermore, P(A∩B) denotes the probability that both A and B will
simultaneously occur.
2) The union of A and B is the event consisting of the sample space outcomes belonging to either A or B,
denoted A∪B. Furthermore, P(A∪B) denotes the probability that either A or B will occur.

1.5. RULES OF PROBABILITY (BASIC AXIOMS/ THEOREMS)


1) The probability of any basic outcome, or of an event consisting of a set of basic outcomes, must be between
zero and one. That is, for any outcome Oi or event Ei containing a set of outcomes, we have
0 ≤ P(Oi) ≤ 1
0 ≤ P(Ei) ≤ 1 ----------------------------------------- (1)
If P(Oi) = 1 or P(Ei) = 1, the respective outcome or event is certain to occur; if P(Oi) = 0 or P(Ei) = 0, the
outcome or event is certain not to occur. It follows that probability cannot be negative.
2) For any event Ej of the sample space S (and of the event space E),

P(Ej) = Σ(i=1 to J) P(Oi) ---------------------------- (2)

where J is the number of basic outcomes or sample points contained in the event Ej. In other words, the
probability that an event will occur is the sum of the probabilities of the basic outcomes contained in that
event. This follows from the fact that an event is said to occur when one of the basic outcomes or sample
points it contains occurs.
3) Since it is certain that at least one of the sample points or elementary events in the sample space will
occur, P(S) = 1. And since the null event cannot occur, P(ϕ) = 0, where ϕ is the null event. These results
follow from the fact that P(S) is the sum of the probabilities of all the simple or basic events.
4) If two events A and B are mutually exclusive (they are disjoint, so that A∩B = ϕ and hence
P(A∩B) = 0), then the probability of either A or B is P(A or B) = P(A∪B) = P(A) + P(B). This is called
the addition rule.
5) If two events A and B are not mutually exclusive, then P(A or B) = P(A∪B) = P(A) + P(B) – P(A∩B).
6) If A is an event from sample space S and A′ is its complement, then P(A∪A′) = 1.
7) If two events A and B are independent, the probability of both A and B occurring is given by
P(A∩B) = P(A)*P(B).
8) If two events are dependent, the probability of both occurring simultaneously is given by P(A∩B) =
P(A)*P(B/A) or P(B)*P(A/B). It follows that for dependent events A and B, P(A) will not equal P(A/B),
nor will P(B) equal P(B/A).
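A minimal sketch checking rules (4), (6) and (7) numerically, using fair-die events chosen purely for illustration:

```python
# Equal weights for a fair die (classical assignment).
P = {face: 1/6 for face in range(1, 7)}

def prob(event):
    # Axiom (2): an event's probability is the sum over its basic outcomes.
    return sum(P[f] for f in event)

A, B = {1, 2}, {5, 6}          # disjoint (mutually exclusive) events

# Rule 4: addition rule for mutually exclusive events.
print(abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12)   # True

# Rule 6: an event plus its complement is certain.
A_comp = set(P) - A
print(abs(prob(A) + prob(A_comp) - 1) < 1e-12)          # True

# Rule 7: independent events, e.g. two separate fair-die throws.
# P(first die shows 1 AND second die shows 6) = (1/6)*(1/6) = 1/36
print(abs((1/6) * (1/6) - 1/36) < 1e-12)                # True
```

The small tolerance (`1e-12`) only absorbs floating-point rounding; the equalities hold exactly in theory.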

1.5.1. PROBABILITIES UNDER SPECIFIC AXIOMS


There are three types of probabilities: marginal probability, joint probability and conditional probability.
Marginal probability is the probability of an event that is not conditional upon the occurrence of any
other event.
Joint probability is the probability of joint events. Let E1 and E2 be events; then the joint probability of
these two events is P(E1∩E2).
Conditional probability is the probability of an event that is conditional upon the occurrence of another
event. Assume A and B are statistically dependent events; then we can write the conditional probabilities of
these events as P(A/B) [i.e. the probability of A given that B has occurred] and/or P(B/A) [i.e. the
probability of B given that A has occurred]. Based on theorems (7) and (8) above, the marginal
probability and the conditional probability are equal when the events are not statistically
dependent, and differ otherwise.

Example 1
Consider a firm that produces and delivers parts for an oil-drilling-rig operating firm.
The relevant random trial is the delivery of a part. Two characteristics of the experiment are of interest:
first, whether the correct part was delivered and, second, the number of days it took to get the part to the
drilling site.

                      Time of Delivery
Order Status          Same day (S)   Next day (N)   More than one day (M)   Sum
Correct (C)           0.6            0.24           0.12                    0.96
Incorrect order (I)   0.025          0.01           0.005                   0.04
Sum                   0.625          0.25           0.125                   1

Probabilities have been assigned to the six elementary events either purely subjectively or using frequency
data. These probabilities, represented by the numbers in the central enclosed rectangle, must sum to unity
because they cover the entire sample space - at least one of the sample points must occur. They are called
joint probabilities because each is the probability of an intersection of two events - an ‘order status’ event
(C or I) and a ‘delivery time’ event (S, N, or M). The probabilities in the right-most column and along the
bottom row are called marginal probabilities. Those in the right margin give the probabilities of the events
‘correct’ and ‘incorrect’. They are the sums of the joint probabilities along the respective rows and they
must sum to unity because the order delivered must be either correct or incorrect.

The marginal probabilities along the bottom row are the probabilities of the events ‘same day delivery’
(S), ‘next day delivery’ (N) and ‘more than one day to deliver’ (M). They are the sums of the joint
probabilities in the respective columns and must also sum to unity because all orders are delivered
eventually. You can read from the table that the probability of the correct order being delivered in
less than two days is 0.6 + 0.24 = 0.84, and the probability of unsatisfactory performance (either incorrect
order or two or more days to deliver) is (0.12 + 0.025 + 0.01 + 0.005) = 0.16 = (1 - 0.84).

CONDITIONAL PROBABILITY
Let us continue the above example and ask ‘what the probability is of sending the correct order when the
delivery is made on the same day’. Note that this is different from asking ‘what the probability is of both
sending the correct order and delivering on the same day’.
It is the probability of getting the order correct conditional upon delivering on the same day and is thus
called a conditional probability. There are two things that can happen when delivery is on the same day -
the order sent can be correct, or the incorrect order can be sent. As you can see from the table, a probability
weight of 0.6 + 0.025 = 0.625 is assigned to same-day delivery. Of this probability weight, the fraction
0.6/0.625 = 0.96 is assigned to the event ‘correct order’ and the fraction 0.025/0.625 = 0.04 is assigned to the
event ‘incorrect order’. The probability of getting the order correct conditional upon same-day delivery is
thus 0.96, and we define the conditional probability as
P(C/S) = [P(C∩ S)]/P(S) ------------------------ (3)
where P(C|S) is the probability of C occurring conditional upon the occurrence of S, P(C ∩ S) is the joint
probability of C and S (the probability that both C and S will occur), and P(S) is the marginal or
unconditional probability of S (the probability that S will occur whether or not C occurs).

The definition of conditional probability also implies, from manipulation of (3), that
P(C∩S) = P(C/S)*P(S) ----------------------------- (4)
Thus, if we know that the conditional probability of C given S is equal to 0.96 and that the marginal
probability of S is 0.625, but are not given the joint probability of C and S, we can calculate that joint
probability as the product of 0.625 and 0.96 - namely 0.6.
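Equations (3) and (4) can be sketched against the delivery table; the dictionary keys below are illustrative shorthand, not from the text:

```python
# Joint probabilities from the order-status / delivery-time table above.
joint = {
    ("C", "S"): 0.600, ("C", "N"): 0.240, ("C", "M"): 0.120,
    ("I", "S"): 0.025, ("I", "N"): 0.010, ("I", "M"): 0.005,
}

def P_delivery(d):
    # Marginal probability of a delivery-time event (column sum).
    return sum(p for (_, dd), p in joint.items() if dd == d)

def P_given(order, delivery):
    # Equation (3): P(order | delivery) = P(order ∩ delivery) / P(delivery).
    return joint[(order, delivery)] / P_delivery(delivery)

print(round(P_delivery("S"), 3))                      # P(S)   = 0.625
print(round(P_given("C", "S"), 3))                    # P(C|S) = 0.96
# Equation (4): recover the joint probability from the two pieces.
print(round(P_given("C", "S") * P_delivery("S"), 3))  # P(C∩S) = 0.6
```

The last line is the manipulation described above: multiplying the conditional probability by the marginal recovers the joint probability.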

STATISTICAL INDEPENDENCE (UNCONDITIONAL PROBABILITY)


From application of (3) to the left-most column in the main body of the table we see that the conditional
probability distribution of the event ‘order status’ given the event ‘same day delivery’ is P(C/S) = 0.96 and
P (I/S) = 0.04 which is the same as the marginal probability distribution of the event ‘order status’. Further
calculations using (3) reveal that the probability distributions of ‘order status’ conditional upon the events
‘next day delivery’ and ‘more than one day delivery’ are
P(C/N) = 0.24/0.25 = 0.96 = P(C/M) = 0.12/0.125 = 0.96

P(I/N) = 0.01/0.25 = 0.04 = P(I/M) = 0.005/0.125, which are the same as the marginal or
unconditional probability distribution of ‘order status’. Moreover, the probability distributions of ‘time of
delivery’ conditional upon the events ‘correct order’ and ‘incorrect order’ are, respectively,
P(S/C) = 0.6/0.96 = 0.625 = P(S/I) = 0.025/0.04; P(N/C) = 0.24/0.96 = 0.25 = P(N/I) = 0.01/0.04; and
P(M/C) = 0.12/0.96 = 0.125 = P(M/I) = 0.005/0.04 = 0.125, which are the same as the marginal or
unconditional probability distribution of ‘time of delivery’.

Since the conditional probability distributions are the same as the corresponding marginal probability
distributions, the probability of getting the correct order is the same whether delivery is on the same day or
on a subsequent day - that is, independent of the day of delivery.

And the probability of delivery on a particular day is independent of whether or not the order is correctly
filled. Under these conditions the two events ‘order status’ and ‘time of delivery’ are said to be
statistically independent.
Statistical independence means that the marginal and conditional probabilities are the same, so that
P(C/S) = P(C) ---------------------- (5)
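Equation (6), introduced below as the product-of-marginals test, gives a mechanical independence check: every joint probability must equal the product of its two marginals. A sketch over the delivery table (names illustrative):

```python
# Joint probabilities from the order-status / delivery-time table above.
joint = {
    ("C", "S"): 0.600, ("C", "N"): 0.240, ("C", "M"): 0.120,
    ("I", "S"): 0.025, ("I", "N"): 0.010, ("I", "M"): 0.005,
}

# Marginals: row sums for order status, column sums for delivery time.
row = {c: sum(p for (cc, _), p in joint.items() if cc == c) for c in ("C", "I")}
col = {d: sum(p for (_, dd), p in joint.items() if dd == d) for d in ("S", "N", "M")}

# Statistical independence: P(c ∩ d) = P(c) * P(d) for every cell.
independent = all(abs(joint[(c, d)] - row[c] * col[d]) < 1e-9
                  for c in row for d in col)
print(independent)   # True: 'order status' and 'time of delivery' are independent
```

For instance, 0.96 × 0.625 = 0.6, the top-left joint entry, and the same holds for all six cells.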
Example 2
Suppose that we are looking at the behavior of two stocks listed on the New York Stock Exchange - Stock
A and Stock B - to observe whether over a given interval the prices of the stocks increased, decreased or
stayed the same. The sample space, together with the probabilities assigned to the sample points based on
several years of data on the price movements of the two stocks can be presented in tabular form as
follows:
                  Stock A
Stock B           Increase (A1)   No change (A2)   Decrease (A3)   Sum
Increase (B1)     0.2             0.05             0.05            0.3
No change (B2)    0.15            0.1              0.15            0.4
Decrease (B3)     0.05            0.05             0.2             0.3
Sum               0.4             0.2              0.4             1

The conditional probability that the price of stock A will increase, given that the price of stock B
increases is
P(A1/B1) = [P(A1∩B1)]/P(B1) = 0.2/0.3 ≈ 0.667, which is greater than the unconditional probability of an
increase in the price of stock A, the total of the A1 column, equal to 0.4. This says that the probability
that the price of stock A will increase is greater if the price of stock B also increases.
Now consider the probability that the price of stock A will fall, conditional on a fall in the price of stock
B. This equals P(A3/B3) = [P(A3∩B3)]/P(B3) = 0.2/0.3 ≈ 0.667, which is greater than the 0.4
unconditional probability of a decline in the price of stock A given by the total at the bottom of the
A3 column.
The probability that the price of stock A will decline conditional upon the price of stock B not declining is
[P(A3∩B1) + P(A3∩B2)]/[P(B1) + P(B2)] = [0.05 + 0.15]/[0.3 + 0.4] = 0.2/0.7 ≈ 0.286, which is smaller than
the 0.4 unconditional probability of the price of stock A declining regardless of what happens to the price
of stock B. The price of stock A is more likely to decline if the price of stock B declines and less likely to
decline if the price of stock B does not decline. A comparison of these conditional probabilities with
the relevant unconditional ones makes it clear that the prices of stock A and stock B move together.
They are statistically dependent.
There is an easy way to determine if the two variables in a bi-variate sample space are statistically
independent. From the definition of statistical independence (5) and the definition of conditional
probability as portrayed in equation (4) we have
P(C∩ S) = P(C/S)*P(S) = P(C)*P(S) ---------------------------- (6)
This means that when there is statistical independence, the joint probabilities in the tables above can be
obtained by multiplying together the two relevant marginal probabilities. In the delivery case, for example,
the joint probability of ‘correct order’ and ‘next day’ is equal to the product of the two marginal
probabilities 0.96 and 0.25, which yields the entry 0.24. The variables ‘order status’ and ‘time of delivery’
are statistically independent. On the other hand, if we multiply the marginal probability of A1 and the
marginal probability of B1 in the stock price change example we obtain 0.4*0.3 = 0.12, which is less
than 0.20, the actual entry in the joint probability distribution table. This indicates that the price changes of
the two stocks are statistically dependent.
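The same cell-by-cell test of equation (6), applied to the stock example, detects the dependence. A minimal Python sketch (joint probabilities taken from the table above):

```python
# Joint probability table for the stock example: stock A price change
# (A1 = increase, A2 = no change, A3 = decrease) versus stock B (B1-B3).
joint = {
    ("A1", "B1"): 0.20, ("A2", "B1"): 0.05, ("A3", "B1"): 0.05,
    ("A1", "B2"): 0.15, ("A2", "B2"): 0.10, ("A3", "B2"): 0.15,
    ("A1", "B3"): 0.05, ("A2", "B3"): 0.05, ("A3", "B3"): 0.20,
}

# Marginal probabilities for each stock.
p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in ("A1", "A2", "A3")}
p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in ("B1", "B2", "B3")}

# Conditional P(A1/B1) = P(A1∩B1)/P(B1) exceeds the marginal P(A1) = 0.4.
p_a1_given_b1 = joint[("A1", "B1")] / p_b["B1"]
print(round(p_a1_given_b1, 3))  # 0.667

# Independence would require P(A∩B) = P(A)*P(B) in every cell;
# here P(A1)*P(B1) = 0.4*0.3 = 0.12 differs from 0.20, so the test fails.
independent = all(abs(p - p_a[a] * p_b[b]) < 1e-9
                  for (a, b), p in joint.items())
print(independent)  # False
```

A single cell that violates the product rule is enough to establish statistical dependence; all cells must satisfy it for independence.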

Example 3
A bright young economics student at Moscow University in 1950 criticized the economic policies of the
great leader Joseph Stalin. He was arrested and sentenced to banishment for life to a work camp in the
east. In those days 70 percent of those banished were sent to Siberia and 30 percent were sent to Mongolia.
It was widely known that a major difference between Siberia and Mongolia was that 50 percent of the
men in Siberia wore fur hats, while only 10 percent of the men in Mongolia wore fur hats. The student
was loaded on a railroad box car without windows and shipped east. After many days the train stopped and
he was let out at an unknown location.
As the train pulled away he found himself alone on the prairie with a single man who would guide him to
the work camp where he would spend the rest of his life. The man was wearing a fur hat.
1) What is the probability he was in Siberia?
2) What is the probability he was in Siberia, if the man he found had not worn a fur hat?
In presenting your answer, calculate all joint and marginal probabilities. Hint: Portray the sample space in
rectangular fashion with location represented along one dimension and whether or not a fur hat is worn
along the other.
Solution:
Let S = Siberia; M = Mongolia; Worn Fur Hat = WFH and Not Worn Fur Hat = NFH
The given information is:
- Probability that a banished person is sent to Siberia = P(S) = 0.7
- Probability that a banished person is sent to Mongolia = P(M) = 0.3
- Probability that a man in Siberia wears a fur hat = P(WFH/S) = 0.5
- Probability that a man in Mongolia wears a fur hat = P(WFH/M) = 0.1
1) We are required to find the probability of being in Siberia given that the man the student found had
worn a fur hat, P(S/WFH) = [P(S∩WFH)]/[P(WFH)]. We first need the joint probability of S and
WFH, P(S∩WFH), and the marginal probability of WFH, P(WFH). P(WFH) is in turn the sum of the
joint probabilities of S and M with WFH: P(S∩WFH) + P(M∩WFH).
P(S∩WFH) = P(S)*P(WFH/S) = 0.7*0.5 = 0.35
P(M∩WFH) = P(M)*P(WFH/M) = 0.3*0.1 = 0.03. Then P(WFH) = 0.35 + 0.03 = 0.38
              Worn Fur Hat   Not Worn a Fur Hat   Total
                 (WFH)             (NFH)
Location  S      0.35              0.35            0.7
          M      0.03              0.27            0.3
Total            0.38              0.62            1

Therefore, P(S/WFH) = [P(S∩WFH)]/[P(WFH)] = 0.35/0.38 ≈ 0.92


2) Here we need to find the probability that the student is in Siberia given that the man he found had
not worn a fur hat:
P(S/NFH) = [P(S∩NFH)]/[P(NFH)]. Since we have already obtained P(S∩WFH), P(M∩WFH) and P(WFH),
P(S∩NFH) = P(S) - P(S∩WFH) = 0.7 - 0.35 = 0.35
P(NFH) = 1 - P(WFH) = 1 - 0.38 = 0.62
Therefore, P(S/NFH) = 0.35/0.62 ≈ 0.56
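The arithmetic of this solution follows the Bayes' rule pattern: posterior = joint probability divided by marginal probability. A short Python sketch with the priors and likelihoods as given in the example:

```python
# Priors: where those banished were sent.
p_s, p_m = 0.7, 0.3                       # Siberia, Mongolia
# Likelihoods: probability of wearing a fur hat in each location.
p_wfh_given_s, p_wfh_given_m = 0.5, 0.1

# Joint probabilities via the multiplication rule P(A∩B) = P(A)*P(B/A).
p_s_and_wfh = p_s * p_wfh_given_s         # 0.7*0.5 = 0.35
p_m_and_wfh = p_m * p_wfh_given_m         # 0.3*0.1 = 0.03
p_wfh = p_s_and_wfh + p_m_and_wfh         # marginal P(WFH) = 0.38

# 1) Posterior probability of Siberia given that the guide wore a fur hat.
p_s_given_wfh = p_s_and_wfh / p_wfh
print(round(p_s_given_wfh, 2))            # 0.92

# 2) Posterior probability of Siberia given no fur hat.
p_s_and_nfh = p_s - p_s_and_wfh           # 0.7 - 0.35 = 0.35
p_nfh = 1 - p_wfh                         # 1 - 0.38 = 0.62
p_s_given_nfh = p_s_and_nfh / p_nfh
print(round(p_s_given_nfh, 2))            # 0.56
```

Note how the fur hat raises the probability of Siberia from the prior 0.7 to about 0.92, while its absence lowers it to about 0.56: the evidence shifts the probability in the direction of the location where fur hats are more common.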
