Statistics Chapter One
Topics covered
Probability theory forms the basis for inferential statistics as well as for other fields that require
quantitative assessment of chance occurrences, such as quality control and management decision analysis,
and for areas of the natural sciences, engineering, economics, etc.
define probability
define important terms in probability
identify the approaches in probability
list sample space of an experiment
identify the types of events
calculate probabilities using different rules.
1.2 THE CONCEPT OF PROBABILITY
Since life is full of uncertainties, people have always been interested in evaluating probabilities. The
theory of probability is an indispensable tool in the analysis of situations involving uncertainty.
From the above definitions you can differentiate probability from chances or possibilities, as the latter
cannot be quantified. Probability is a number between zero and one inclusive. A probability of zero represents
something that cannot happen and a probability of one represents something that is certain to happen.
The closer a probability is to zero, the more improbable it is that something will happen; the closer the
probability is to one, the more sure we are that it will happen. When the probability is 0.5, uncertainty
reaches its maximum.
concepts of location, dispersion and shape). E.g. the frequency curve becomes a density function
purporting to model observable real world phenomena.
Important terms
1. Experiment
A process that leads to the occurrence of one and only one of several possible observations, or
a process of observation that has an uncertain outcome. E.g. tossing a coin; answering a question
where the answer can be correct or incorrect; drawing a card from a deck of playing cards.
2. Event
A collection of one or more outcomes of an experiment or an experimental outcome that may or may
not occur. If the experiment is tossing a coin the events are Head, or Tail.
3. Outcome
A particular result of an experiment. In the case of tossing a coin, if the head faces up we consider head as
the outcome of the experiment.
A) Classic Probability
It is probability based on the symmetry of games of chance or similar situations. This probability is based
on the idea that certain occurrences are equally likely. E.g. the numbers 1, 2, 3, 4, 5 and 6 on a fair die are
equally likely to occur, i.e. they have an equal chance of occurrence.
If a random experiment can result in N mutually exclusive and equally likely outcomes and if NA of the
outcomes result in the occurrence of the event A, then the probability of A is defined by P(A) = NA/N
Examples:
1. A fair coin is flipped twice and the face that shows up is observed. The set of all equally likely
outcomes is S = {(H T), (T H), (H H), (T T)}, with N = 4. Let the event A be observing at least one
head (H); then A = {(H T), (T H), (H H)}. Since NA = 3, P(A) = ¾
2. A fair die is thrown. Find the probabilities that the face on the die is (a) Maximum (b) Prime (c)
Multiple of 3 (d) Multiple of 7
Solution:
There are 6 possible outcomes when a die is tossed. We assume that all 6 faces are equally likely, so the
classical definition of probability applies here. The sample space is S = {1, 2, 3, 4, 5, 6}, N = 6
a) Let A be the event that the face is the maximum. Thus, A = {6}, NA = 1. Therefore, P(A) = NA/N = 1/6
b) Let B be the event that the face is prime. Thus, B = {2, 3, 5}, NB = 3. Therefore, P(B) = NB/N = 3/6
= ½
c) Let C be the event that the face is a multiple of three. Thus, C = {3, 6}, NC = 2.
Therefore, P(C) = NC/N = 2/6 = 1/3
d) Let D be the event that the face is a multiple of 7. Thus, D = ϕ (the empty set), ND = 0. Therefore,
P(D) = ND/N = 0/6 = 0 (not possible).
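The die example can be checked mechanically. The following is a minimal sketch, assuming a fair six-sided die; the helper name classical_probability is hypothetical and simply applies P(A) = NA/N by counting.

```python
from fractions import Fraction

# Sample space for one throw of a die; all faces are assumed equally likely.
S = {1, 2, 3, 4, 5, 6}

def classical_probability(event, sample_space):
    """Return NA / N, counting only outcomes that lie in the sample space."""
    favourable = event & sample_space
    return Fraction(len(favourable), len(sample_space))

A = {max(S)}                          # (a) face is the maximum
B = {x for x in S if x in (2, 3, 5)}  # (b) face is prime
C = {x for x in S if x % 3 == 0}      # (c) face is a multiple of 3
D = {x for x in S if x % 7 == 0}      # (d) face is a multiple of 7 (empty set)

p_A = classical_probability(A, S)
p_B = classical_probability(B, S)
p_C = classical_probability(C, S)
p_D = classical_probability(D, S)
print(p_A, p_B, p_C, p_D)  # prints: 1/6 1/2 1/3 0
```

Using exact fractions rather than floats keeps the results in the same form as the hand calculation.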
The probability of an event happening in the long term is determined by observing what fraction of the
time similar events happened in the past. We often think of a probability in terms of the percentage of the
time the event would occur in many repetitions of the experiment. Suppose that A is an event that might
occur when a particular experiment is performed. Then the probability that the event A will occur, P(A),
can be interpreted as the number that would be approached by the relative frequency of the event A if
we performed the experiment an indefinitely large number of times.
Therefore, probability is interpreted as stemming from the observed stability of empirical frequencies. In a
coin flipping experiment, for instance, the probability of A = {H} is ½, not because there are two equally
likely outcomes but because repeated series of large numbers of trials demonstrate that the empirical
frequency of occurrence of A converges to the limit ½ as the number of trials n goes to infinity.
Symbolically, if nA is the number of times the event A occurs in n trials,
P(A) = lim (n → ∞) nA/n
or,
Probability of an event happening = (Number of times the event occurred in the past) / (Total number of observations)
Example
If a truck operator experienced accidents with 5 of its 50 trucks last year, then the probability that a truck
will have an accident next year can be estimated as 5/50 = 0.10
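The idea that a relative frequency settles near the probability can be illustrated with a small simulation. This is only a sketch: the coin is simulated with a pseudo-random generator, the seed and the trial counts are arbitrary choices, and the function name relative_frequency is hypothetical.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Simulate n_trials flips of a fair coin and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# Relative frequency of heads for increasingly long series of trials.
freqs = {n: relative_frequency(n) for n in (10, 100, 10_000, 100_000)}
for n, f in freqs.items():
    print(n, f)
```

For short series the relative frequency can be far from 0.5; for long series it is typically close to 0.5, which is the stability the frequency approach relies on.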
Limitations of the frequency approach
1) What is meant by the limit as ‘n’ goes to infinity?
2) How can we generate an infinite sequence of trials?
3) What happens to phenomena where repeated trials are not possible?
When there is little or no past experience on which to base a probability, personal judgment, experience,
intuition, expertise or any other subjective evaluation criterion is applied to estimate or assign the
probability. This probability is subjective probability.
It is also called personal probability. Unlike objective probability, one person’s subjective probability may
very well differ from another person’s subjective probability of the same event.
E.g. a physician assessing the probability of a patient’s recovery and an expert in the national bank
assessing the probability of currency devaluation are both making a personal judgment based on what they
know and feel about the situation. Another group of physicians or experts may arrive at a different
probability, even though both employ identical techniques, approaches and information.
Both classic and long-term relative frequency probabilities are objective in the sense that no personal
judgment is involved. One thing needs to be clear, however: whatever the kind of probability involved
(subjective or objective), the same set of mathematical rules holds for manipulating and analyzing
probability.
In order to calculate and interpret probabilities it is important to understand and use the idea of
sample space.
Basic outcomes/sample points/simple events: the different possible outcomes of the random
trial.
Sample space: the set of all basic outcomes for the random trial.
In our example of a single coin toss, the sample space, which we denote by S, contains the basic outcomes
H (heads) and T (tails). The set of basic outcomes can be written as S = {H, T}. These basic outcomes are
mutually exclusive (only one can occur at a time) and collectively exhaustive (at least one of them must occur).
BIVARIATE CASE
Suppose we flip two coins simultaneously and record whether they come up heads or tails. Now we have
four simple events: S = {HH, HT, TH, TT}.
Event (E) - is a subset of sample points. E.g. E1 = event ‘at least one head’ = {HH,
HT, TH}. E2 = event ‘both faces same’ = {HH, TT}.
Complementary event to Ej - is the set of outcomes not contained in an event Ej. E.g.
{HT, TH, TT} is the complement of the event {HH}. The union of all elementary events gives the sample
space itself.
Universal event - is an event that contains the entire sample space.
Event space - is the set of all possible events that can occur in a random trial or
experiment, including both the universal event and the null event.
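These definitions map directly onto set operations. The sketch below, assuming the two-coin experiment just described, builds the sample space, the events E1 and E2, a complement, and the event space (every subset of S, from the null event to the universal event). The variable names are illustrative only.

```python
from itertools import combinations, product

# Sample space for flipping two coins: HH, HT, TH, TT.
S = {"".join(p) for p in product("HT", repeat=2)}

E1 = {o for o in S if "H" in o}      # at least one head
E2 = {o for o in S if o[0] == o[1]}  # both faces the same
complement_of_HH = S - {"HH"}        # outcomes not contained in {HH}

# Event space: all subsets of S, including the null event and S itself.
event_space = [set(c) for r in range(len(S) + 1)
               for c in combinations(sorted(S), r)]

print(sorted(E1), sorted(E2), sorted(complement_of_HH), len(event_space))
```

With four basic outcomes the event space has 2^4 = 16 events.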
Example 1
A newly married couple plans to have two children. Naturally, they are curious about whether their
children will be boys or girls. Therefore, we consider the experiment of having two children.
In order to find the sample space of this experiment of having two children, we let ‘B’ denote that a
child is a boy and ‘G’ denote that a child is a girl.
This experiment is a two-step process, i.e. having the first child, which could be a boy or a girl, and
having the second child, which could also be either a boy or a girl.
This can be represented by a tree diagram. Each branch of the tree leads us to a distinct sample
space outcome.
First child      Second child     Sample space outcome
Boy (B)          Boy (B)          BB
Boy (B)          Girl (G)         BG
Girl (G)         Boy (B)          GB
Girl (G)         Girl (G)         GG
We see that there are four sample space outcomes. Therefore the sample space (i.e. the set of all of the
distinct sample space outcomes) is {BB, BG, GB, GG}.
In order to consider the probabilities of these outcomes, suppose that boys and girls are equally likely
each time a child is born. This says that each of the sample space outcomes is equally likely, i.e.
P(BB) = P(BG) = P(GB) = P(GG) = 1/4. This says that there is a 25% chance that each of these outcomes
will occur. Since we are certain that there is no other option or combination remaining, the probability
that the couple will have any one of the sample space outcomes is one, i.e. P(BB) + P(BG) + P(GB) +
P(GG) = 1
Notice that these probabilities sum to one, i.e. the sum of the probabilities of all sample space outcomes is
one.
Example 2
A student takes a quiz that consists of three true or false questions. If we consider our experiment to be
answering the three questions, each question can be answered correctly or incorrectly.
Let C denote answering a question correctly and I denote answering a question incorrectly. Then we
can depict a tree diagram of the sample space outcomes for the experiment.
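Step I             Step II            Step III           Sample space
(1st question)     (2nd question)     (3rd question)     outcome
Correct (C)        Correct (C)        Correct (C)        CCC
Correct (C)        Correct (C)        Incorrect (I)      CCI
Correct (C)        Incorrect (I)      Correct (C)        CIC
Correct (C)        Incorrect (I)      Incorrect (I)      CII
Incorrect (I)      Correct (C)        Correct (C)        ICC
Incorrect (I)      Correct (C)        Incorrect (I)      ICI
Incorrect (I)      Incorrect (I)      Correct (C)        IIC
Incorrect (I)      Incorrect (I)      Incorrect (I)      III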
The tree diagram has eight different branches and the eight distinct sample space outcomes are listed at
the end of the branches. We see the sample space is S = {CCC, CCI, CIC, CII, ICC, ICI, IIC, III}.
Now suppose that the student was totally unprepared for the test and has to blindly guess the answer
to each question, that is, the student has a 50-50 chance or 0.5 probability of correctly answering each
question. This means that each of the eight sample space outcomes is equally likely to occur,
i.e.
P(CCC) = P(CCI) = ... = P(III) = 1/8
Here also the sum of the probabilities of the sample space outcomes is one.
In general, the sum of the probabilities of all the sample space outcomes is equal to 1.
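The tree diagram for the quiz can be reproduced by enumerating every path through the three steps. The sketch below assumes, as in the text, a 0.5 chance of a correct guess on each question, and additionally assumes that the guesses do not influence one another, so the probability of a path is the product of its step probabilities.

```python
from fractions import Fraction
from itertools import product

# Probability of each answer at a single step (blind guessing assumed).
p_step = {"C": Fraction(1, 2), "I": Fraction(1, 2)}

# Each path through the three-step tree is one sample space outcome.
outcomes = ["".join(path) for path in product("CI", repeat=3)]

prob = {}
for outcome in outcomes:
    p = Fraction(1)
    for answer in outcome:
        p *= p_step[answer]  # multiply along the branch
    prob[outcome] = p

total = sum(prob.values())
print(outcomes)
print(prob["CCC"], total)  # prints: 1/8 1
```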
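If all of the sample space outcomes are equally likely, then the probability that an event will occur is
equal to:
P(event) = (number of sample space outcomes corresponding to the event) / (total number of sample space outcomes)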
Consider the couple planning to have two children. To find the probability of two boys, we first have to
find the sample space outcomes corresponding to the event of having the first child a boy and the
second child also a boy.
There is only one sample space outcome corresponding to this event, i.e. BB, so the probability will be
1/4 = 0.25. The probability that the couple will have a boy and a girl is similarly calculated by first
identifying the sample space outcomes corresponding to the event of having a boy and a girl. The
sample space outcomes are BG and GB, so the probability will be 2/4 = 0.5
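The counting in the two-children example can be sketched as follows, assuming the four outcomes are equally likely. The helper prob is hypothetical: it takes a condition and counts the outcomes that satisfy it.

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space outcomes for two children: BB, BG, GB, GG.
S = ["".join(p) for p in product("BG", repeat=2)]

def prob(event):
    """Count outcomes for which event(outcome) is true and divide by the total."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

p_two_boys = prob(lambda o: o == "BB")
p_boy_and_girl = prob(lambda o: set(o) == {"B", "G"})
print(p_two_boys, p_boy_and_girl)  # prints: 1/4 1/2
```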
Oftentimes it may be practically impossible to list all possible sample space outcomes of an
experiment. Under such circumstances we can find the probability of an event by counting the
number of sample space outcomes corresponding to the event, without listing them.
Example - Suppose that 650,000 of 1,000,000 households in Addis subscribe to a newspaper called
Addis Zemen, and consider randomly selecting one of the households in this city. That is, consider
selecting one household while giving each and every household in the city the same chance of being
selected. Let A be the event that the randomly selected household subscribes to the Addis Zemen.
Then, since the sample space of this experiment consists of 1,000,000 equally likely sample space
outcomes (households), it follows that
P(A) = 650,000/1,000,000 = 0.65
Now also suppose that 500,000 households in the city subscribe to the Ethiopian Herald (H) and
further suppose that 250,000 households subscribe to both newspapers.
We consider randomly selecting one household in the city, and we define the following events:
Ā = the randomly selected household does not subscribe to the Addis Zemen.
A∩H = the randomly selected household subscribes to both Addis Zemen and Herald.
Since 650,000 of the 1,000,000 households subscribe to the Addis Zemen (that is, correspond to the
event A occurring), 350,000 households do not subscribe to Zemen (Ā), i.e. 1,000,000 – 650,000.
Similarly, since 500,000 households subscribe to Herald (H), 500,000 households do not subscribe to
Herald (H̄).
A∩H̄ = the randomly selected household subscribes to Zemen and does not subscribe to Herald;
Ā∩H = the randomly selected household does not subscribe to Zemen and does subscribe to Herald.
A summary of the number of households corresponding to the events A, Ā, H, H̄ and A∩H:
Event      H            H̄            Total
A          250,000                   650,000
Ā                                    350,000
Total      500,000      500,000      1,000,000
Since 650,000 households subscribe to the Addis Zemen (A) and 250,000 households subscribe to both
Zemen and Herald (A∩H), it follows that 650,000 – 250,000 = 400,000 households subscribe to Addis
Zemen but do not subscribe to Herald (A∩H̄).
By similar logic:
a. 500,000 – 250,000 = 250,000 households do not subscribe to Addis Zemen but do subscribe to
Herald (Ā∩H)
b. 350,000 – 250,000 = 100,000 households do not subscribe to the Addis Zemen and also do not
subscribe to the Herald (Ā∩H̄)
A contingency table summarizing subscription data for Addis Zemen and Herald
Event      H            H̄            Total
A          250,000      400,000      650,000
Ā          250,000      100,000      350,000
Total      500,000      500,000      1,000,000
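Now since we will randomly select one household (making all the households equally likely to be
chosen), the probability of any of the previously defined events is the ratio of the number of
households corresponding to the event’s occurrence to the total number of households in the city.
Therefore
P(A) = 650,000/1,000,000 = 0.65        P(Ā) = 350,000/1,000,000 = 0.35
P(H) = 500,000/1,000,000 = 0.50        P(H̄) = 500,000/1,000,000 = 0.50
P(A∩H) = 250,000/1,000,000 = 0.25      P(A∩H̄) = 400,000/1,000,000 = 0.40
P(Ā∩H) = 250,000/1,000,000 = 0.25      P(Ā∩H̄) = 100,000/1,000,000 = 0.10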
Next, letting A∪H denote either A or H, we consider finding the probability of the event
A∪H = the randomly selected household subscribes to either the Addis Zemen or Herald (i.e.
subscribes to at least one of the two newspapers).
Adding the counts of the households that subscribe to at least one of the newspapers gives
P(A∪H) = (250,000 + 400,000 + 250,000)/1,000,000 = 0.90
i.e. 90% of the households in the city subscribe to either Addis Zemen or Herald.
Note that P(A∪H) is not the sum P(A) + P(H) = 0.65 + 0.5 = 1.15. Logically the reason for this is that
both P(A) = 0.65 and P(H) = 0.5 count the 25% of the households that subscribe to both newspapers.
Therefore the sum of P(A) and P(H) counts this 25% of the households once too often.
It follows that if we subtract P(A∩H) = 0.25 from the sum of P(A) and P(H) then we will obtain
P(A∪H), i.e.
P(A∪H) = P(A) + P(H) – P(A∩H)
= 0.65 + 0.5 - 0.25 = 0.90
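The newspaper example can be reproduced from the three given counts. This is a sketch under the figures stated in the example (1,000,000 households, 650,000 Addis Zemen subscribers, 500,000 Herald subscribers, 250,000 subscribing to both); the dictionary keys are illustrative labels.

```python
from fractions import Fraction

N = 1_000_000          # households in the city
n_A = 650_000          # subscribe to Addis Zemen
n_H = 500_000          # subscribe to Herald
n_A_and_H = 250_000    # subscribe to both

# Fill the four cells of the contingency table by subtraction.
counts = {
    ("A", "H"): n_A_and_H,
    ("A", "not H"): n_A - n_A_and_H,
    ("not A", "H"): n_H - n_A_and_H,
}
counts[("not A", "not H")] = N - sum(counts.values())

# P(A or H) two ways: adding the three relevant cells, and the addition rule.
p_union_by_cells = Fraction(
    counts[("A", "H")] + counts[("A", "not H")] + counts[("not A", "H")], N)
p_union_by_rule = Fraction(n_A, N) + Fraction(n_H, N) - Fraction(n_A_and_H, N)

print(counts)
print(float(p_union_by_cells), float(p_union_by_rule))  # prints: 0.9 0.9
```

Exact fractions are used so that the two routes to P(A∪H) can be compared without floating-point rounding.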
1) The intersection of A and B is the event consisting of the sample space outcomes belonging to both
A and B, denoted A∩B. Furthermore, P(A∩B) denotes the probability that both A and B will
occur simultaneously.
2) The union of A and B is the event consisting of the sample space outcomes belonging to either A or B
(or both). The union is denoted A∪B. Furthermore, P(A∪B) denotes the probability that either A or B
(or both) will occur.
The probability of an event Ej is P(Ej) = P(o1) + P(o2) + ... + P(oJ), where o1, ..., oJ denote the basic
outcomes in Ej and J is the number of basic events or sample points contained in the event Ej. In other
words, the probability that an event will occur is the sum of the probabilities of the basic outcomes
contained in that event. This follows from the fact that an event is said to occur when one of the basic
outcomes or sample points it contains occurs.
3) Since it is certain that at least one of the sample points or elementary events in the sample space will
occur, P(S) = 1. And the null event cannot occur, so P(ϕ) = 0, where ϕ is the null event. These results follow
from the fact that P(S) is the sum of the probabilities of all the simple or basic events.
4) If two events A and B are mutually exclusive (they are disjoint, so that A∩B = ϕ and P(A∩B)
= 0), then the probability of either A or B is P(A or B) = P(A∪B) = P(A) + P(B). This is called the
addition rule.
5) If two events A and B are not mutually exclusive, then P(A or B) = P(A∪B) = P(A) + P(B) – P(A∩B)
6) If A is an event from the sample space S and A’ is its complement, then P(A∪A’) = 1, so that P(A’) = 1 – P(A)
7) If two events A and B are independent, the probability of both A and B is given by P(A∩B) = P(A)*P(B)
8) If two events are dependent, the probability of both occurring simultaneously is given by P(A∩B) =
P(A)*P(B/A) or = P(B)*P(A/B). It follows that for any dependent events A and B, P(A) is never
equal to P(A/B) and P(B) is never equal to P(B/A).
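The rules above can be checked on a small case. The sketch below uses one throw of a fair die with three events chosen purely for illustration (they are not taken from the text): A = even face, B = face at most 2, and C = odd face, which is the complement of A.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # one throw of a fair die

def P(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # even face (illustrative choice)
B = {1, 2}     # face at most 2 (illustrative choice)
C = {1, 3, 5}  # odd face, the complement of A

rule4 = P(A | C) == P(A) + P(C)               # mutually exclusive events
rule5 = P(A | B) == P(A) + P(B) - P(A & B)    # general addition rule
rule6 = P(A | C) == 1 and P(C) == 1 - P(A)    # complement rule
rule7 = P(A & B) == P(A) * P(B)               # A and B happen to be independent
p_B_given_A = P(A & B) / P(A)                 # conditional probability P(B/A)
rule8 = P(A & B) == P(A) * p_B_given_A        # multiplication rule

print(rule4, rule5, rule6, rule7, rule8, p_B_given_A)
```

Here P(B/A) equals P(B) = 1/3, which is what independence of A and B means in rule 7.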
Example 1
Take a certain firm that is engaged in producing and delivering parts for firms operating oil drilling rigs.
The relevant random trial is the delivery of a part. Two characteristics of the experiment are of interest -
first, whether the correct part was delivered and second, the number of days it took to get the part to the
drilling site.
                                    Time of Delivery
Order Status           Same day (S)   Next day (N)   More than one day (M)   Sum
Correct (C)            0.6            0.24           0.12                    0.96
Incorrect order (I)    0.025          0.01           0.005                   0.04
Sum                    0.625          0.25           0.125                   1
Probabilities have been assigned to the six elementary events either purely subjectively or using frequency
data. Those probabilities, represented by the numbers in the central enclosed rectangle, must sum to unity
because they cover the entire sample space - at least one of the sample points must occur. They are called
joint probabilities because each is the probability of an intersection of two events - an ‘order status’ event
(C or I) and a ‘delivery time’ event (S, N, or M). The probabilities in the right-most column and along the
bottom row are called marginal probabilities. Those in the right margin give the probabilities of the events
‘correct’ and ‘incorrect’. They are the sums of the joint probabilities along the respective rows and they must
sum to unity because the order delivered must be either correct or incorrect.
The marginal probabilities along the bottom row are the probabilities of the events ‘same day delivery’
(S), ‘next day delivery’ (N) and ‘more than one day to deliver’ (M). They are the sums of the joint
probabilities in the respective columns and must also sum to unity because all orders are delivered
eventually. You can read from the table that the probability of the correct order being delivered in
less than two days is 0.6 + 0.24 = 0.84 and the probability of unsatisfactory performance (either incorrect
order or two or more days to deliver) is (0.12 + 0.025 + 0.01 + 0.005) = 0.16 = (1 - 0.84).
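The marginal probabilities can be recomputed from the six joint probabilities in the delivery table. The sketch below stores the joint probabilities in a dictionary keyed by (order status, delivery time) and sums along rows and columns; exact fractions are an implementation choice to avoid rounding.

```python
from fractions import Fraction as F

# Joint probabilities from the delivery table: (order status, delivery time).
joint = {
    ("C", "S"): F("0.6"),   ("C", "N"): F("0.24"), ("C", "M"): F("0.12"),
    ("I", "S"): F("0.025"), ("I", "N"): F("0.01"), ("I", "M"): F("0.005"),
}

row_marginal = {}  # probabilities of 'correct' and 'incorrect'
col_marginal = {}  # probabilities of each delivery time
for (status, time), p in joint.items():
    row_marginal[status] = row_marginal.get(status, 0) + p
    col_marginal[time] = col_marginal.get(time, 0) + p

p_correct_within_two_days = joint[("C", "S")] + joint[("C", "N")]
p_unsatisfactory = 1 - p_correct_within_two_days

print({k: float(v) for k, v in row_marginal.items()})
print({k: float(v) for k, v in col_marginal.items()})
print(float(p_correct_within_two_days), float(p_unsatisfactory))
```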
CONDITIONAL PROBABILITY
Let us continue the above example and ask ‘what the probability is of sending the correct order when the
delivery is made on the same day’. Note that this is different from asking ‘what the probability is of both
sending the correct order and delivering on the same day’.
It is the probability of getting the order correct conditional upon delivering on the same day and is thus
called a conditional probability. There are two things that can happen when delivery is on the same day -
the order sent can be correct, or the incorrect order can be sent. As you can see from the table, a probability
weight of 0.6 + 0.025 = 0.625 is assigned to same-day delivery. Of this probability weight, the fraction
0.6/0.625 = 0.96 is assigned to the event ‘correct order’ and the fraction 0.025/0.625 = 0.04 is assigned to the
event ‘incorrect order’. The probability of getting the order correct conditional upon same day delivery is
thus 0.96 and we define the conditional probability as
P(C/S) = [P(C∩S)]/P(S) ------------------------ (3)
where P(C/S) is the probability of C occurring conditional upon the occurrence of S, P(C∩S) is the joint
probability of C and S (the probability that both C and S will occur), and P(S) is the marginal or
unconditional probability of S (the probability that S will occur whether or not C occurs).
The definition of conditional probability also implies, from manipulation of (3), that
P(C∩S) = P(C/S)*P(S) ----------------------------- (4)
Thus, if we know that the conditional probability of C given S is equal to 0.96 and that the marginal
probability of S is 0.625 but are not given the joint probability of C and S, we can calculate that joint
probability as the product of 0.625 and 0.96 - namely 0.6.
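Equations (3) and (4) can be sketched in a few lines using the delivery table. The function names marginal_time and conditional are hypothetical helpers, not part of any library.

```python
from fractions import Fraction as F

# Joint probabilities from the delivery table: (order status, delivery time).
joint = {
    ("C", "S"): F("0.6"),   ("C", "N"): F("0.24"), ("C", "M"): F("0.12"),
    ("I", "S"): F("0.025"), ("I", "N"): F("0.01"), ("I", "M"): F("0.005"),
}

def marginal_time(t):
    """Marginal probability of delivery time t (sum down the column)."""
    return sum(p for (_, time), p in joint.items() if time == t)

def conditional(status, t):
    """Equation (3): P(status / t) = P(status and t) / P(t)."""
    return joint[(status, t)] / marginal_time(t)

p_C_given_S = conditional("C", "S")
p_I_given_S = conditional("I", "S")

# Equation (4): recover the joint probability from conditional and marginal.
recovered_joint = p_C_given_S * marginal_time("S")

print(float(p_C_given_S), float(p_I_given_S), float(recovered_joint))
# prints: 0.96 0.04 0.6
```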
The conditional probability distributions of ‘order status’ given each delivery time are
P(C/S) = 0.6/0.625 = 0.96 = P(C/N) = 0.24/0.25 = P(C/M) = 0.12/0.125 and P(I/S) = 0.025/0.625 = 0.04 =
P(I/N) = 0.01/0.25 = P(I/M) = 0.005/0.125, which are the same as the marginal or
unconditional probability distribution of ‘order status’. Moreover, the probability distributions of ‘time of
delivery’ conditional upon the events ‘correct order’ and ‘incorrect order’ are, respectively,
P(S/C) = 0.6/0.96 = 0.625 = P(S/I) = 0.025/0.04; P(N/C) = 0.24/0.96 = 0.25 = P(N/I) = 0.01/0.04; and
P(M/C) = 0.12/0.96 = 0.125 = P(M/I) = 0.005/0.04, which are the same as the marginal or
unconditional probability distribution of ‘time of delivery’.
Since the conditional probability distributions are the same as the corresponding marginal probability
distributions, the probability of getting the correct order is the same whether delivery is on the same day or
on a subsequent day - that is, independent of the day of delivery.
And the probability of delivery on a particular day is independent of whether or not the order is correctly
filled. Under these conditions the two events ‘order status’ and ‘time of delivery’ are said to be
statistically independent.
Statistical independence means that the marginal and conditional probabilities are the same, so that
P(C/S) = P(C) ---------------------- (5)
Example 2
Suppose that we are looking at the behavior of two stocks listed on the New York Stock Exchange - Stock
A and Stock B - to observe whether over a given interval the prices of the stocks increased, decreased or
stayed the same. The sample space, together with the probabilities assigned to the sample points based on
several years of data on the price movements of the two stocks can be presented in tabular form as
follows:
                                  Stock A
Stock B            Increase (A1)   No change (A2)   Decrease (A3)   Sum
Increase (B1)      0.2             0.05             0.05            0.3
No change (B2)     0.15            0.1              0.15            0.4
Decrease (B3)      0.05            0.05             0.2             0.3
Sum                0.4             0.2              0.4             1
The conditional probability that the price of stock A will increase, given that the price of stock B
increases, is
P(A1/B1) = [P(A1∩B1)]/P(B1) = 0.2/0.3 = 0.667, which is greater than the unconditional probability of an
increase in the price of stock A, the total of the A1 column, equal to 0.4. This says that the probability
that the price of stock A will increase is greater if the price of stock B also increases.
Now consider the probability that the price of stock A will fall, conditional on a fall in the price of stock
B. This equals P(A3/B3) = [P(A3∩B3)]/P(B3) = 0.2/0.3 = 0.667, which is greater than the 0.4
unconditional probability of a decline in the price of stock A given by the total at the bottom of the
A3 column.
The probability that the price of stock A will decline conditional upon the price of stock B not declining is
[P(A3∩ B1) + P(A3∩ B2)]/[P(B1) + P(B2)] = [0.05 + 0.15]/[0.3+0.4] = 20/70 = 0.286 which is smaller than
the 0.4 unconditional probability of the price of stock A declining regardless of what happens to the price
of stock B. The price of stock A is more likely to decline if the price of stock B declines and less likely to
decline if the price of stock B does not decline. A comparison of these conditional probabilities with
the relevant unconditional ones makes it clear that the prices of stock A and stock B move together.
They are statistically dependent.
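The three conditional probabilities in the stock example can be recomputed from the joint table. In this sketch the helper p_A_given_B is hypothetical; it accepts a set of B events so that ‘stock B does not decline’ can be written as {B1, B2}.

```python
from fractions import Fraction as F

# Joint probabilities from the stock table: (stock B event, stock A event).
joint = {
    ("B1", "A1"): F("0.2"),  ("B1", "A2"): F("0.05"), ("B1", "A3"): F("0.05"),
    ("B2", "A1"): F("0.15"), ("B2", "A2"): F("0.1"),  ("B2", "A3"): F("0.15"),
    ("B3", "A1"): F("0.05"), ("B3", "A2"): F("0.05"), ("B3", "A3"): F("0.2"),
}

def p_A_given_B(a, b_events):
    """P(a / one of b_events) = joint probability over marginal probability."""
    numerator = sum(joint[(b, a)] for b in b_events)
    denominator = sum(p for (b, _), p in joint.items() if b in b_events)
    return numerator / denominator

p_A1_given_B1 = p_A_given_B("A1", {"B1"})
p_A3_given_B3 = p_A_given_B("A3", {"B3"})
p_A3_given_not_B3 = p_A_given_B("A3", {"B1", "B2"})
p_A3 = sum(p for (_, a), p in joint.items() if a == "A3")  # unconditional

print(round(float(p_A1_given_B1), 3), round(float(p_A3_given_B3), 3),
      round(float(p_A3_given_not_B3), 3), float(p_A3))
# prints: 0.667 0.667 0.286 0.4
```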
There is an easy way to determine if the two variables in a bi-variate sample space are statistically
independent. From the definition of statistical independence (5) and the definition of conditional
probability as portrayed in equation (4) we have
P(C∩ S) = P(C/S)*P(S) = P(C)*P(S) ---------------------------- (6)
This means that when there is statistical independence, the joint probabilities in the tables above can be
obtained by multiplying together the two relevant marginal probabilities. In the delivery case, for example,
the joint probability of ‘correct order’ and ‘next day’ is equal to the product of the two marginal
probabilities 0.96 and 0.25, which yields the entry 0.24. The variables ‘order status’ and ‘time of delivery’
are statistically independent. On the other hand, if we multiply the marginal probability of A1 and the
marginal probability of B1 in the stock price change example we obtain 0.4 × 0.3 = 0.12, which is less
than 0.20, the actual entry in the joint probability distribution table. This indicates that the price changes of
the two stocks are statistically dependent.
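The test based on equation (6) can be sketched as a small function that compares every joint probability with the product of its two marginals. The function name is_independent is hypothetical, and exact fractions are used so that equality can be tested without rounding problems.

```python
from fractions import Fraction as F

def is_independent(joint):
    """True if every joint probability equals the product of its marginals."""
    rows, cols = {}, {}
    for (r, c), p in joint.items():
        rows[r] = rows.get(r, 0) + p
        cols[c] = cols.get(c, 0) + p
    return all(p == rows[r] * cols[c] for (r, c), p in joint.items())

delivery = {
    ("C", "S"): F("0.6"),   ("C", "N"): F("0.24"), ("C", "M"): F("0.12"),
    ("I", "S"): F("0.025"), ("I", "N"): F("0.01"), ("I", "M"): F("0.005"),
}
stocks = {
    ("B1", "A1"): F("0.2"),  ("B1", "A2"): F("0.05"), ("B1", "A3"): F("0.05"),
    ("B2", "A1"): F("0.15"), ("B2", "A2"): F("0.1"),  ("B2", "A3"): F("0.15"),
    ("B3", "A1"): F("0.05"), ("B3", "A2"): F("0.05"), ("B3", "A3"): F("0.2"),
}

delivery_independent = is_independent(delivery)
stocks_independent = is_independent(stocks)
print(delivery_independent, stocks_independent)  # prints: True False
```

With probabilities estimated from data, exact equality would rarely hold, so a tolerance or a formal statistical test would be needed instead; that is outside the scope of this sketch.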
Example 3
A bright young economics student at Moscow University in 1950 criticized the economic policies of the
great leader Joseph Stalin. He was arrested and sentenced to banishment for life to a work camp in the
east. In those days 70 percent of those banished were sent to Siberia and 30 percent were sent to Mongolia.
It was widely known that a major difference between Siberia and Mongolia was that fifty percent of the
men in Siberia wore fur hats, while only 10 percent of the people in Mongolia wore fur hats. The student
was loaded on a railroad box car without windows and shipped east. After many days the train stopped and
he was let out at an unknown location.
As the train pulled away he found himself alone on the prairie with a single man who would guide him to
the work camp where he would spend the rest of his life. The man was wearing a fur hat.
1) What is the probability he was in Siberia?
2) What is the probability he was in Siberia, if the man he found had not worn a fur hat?
In presenting your answer, calculate all joint and marginal probabilities. Hint: Portray the sample space in
rectangular fashion with location represented along one dimension and whether or not a fur hat is worn
along the other.
Solution:
Let S = Siberia; M = Mongolia; WFH = Worn Fur Hat and NFH = Not Worn Fur Hat.
The given information is:
Probability that a banished person is sent to Siberia = P(S) = 0.7
Probability that a banished person is sent to Mongolia = P(M) = 0.3
Probability that a person in Siberia wears a fur hat = P(WFH/S) = 0.5
Probability that a person in Mongolia wears a fur hat = P(WFH/M) = 0.1
1) We are required to find the probability of being in Siberia given that the man the student found was
wearing a fur hat, P(S/WFH) = [P(S∩WFH)]/P(WFH). We first need to find the joint probability of S and
WFH, P(S∩WFH), and the marginal probability of WFH, P(WFH). P(WFH) is in turn the sum of
the probabilities of the joint events of S and M with WFH, P(S∩WFH) + P(M∩WFH).
P(S∩WFH) = P(S)*P(WFH/S) = 0.7*0.5 = 0.35
P(M∩WFH) = P(M)*P(WFH/M) = 0.3*0.1 = 0.03. Then P(WFH) = 0.35 + 0.03 = 0.38
Location     Worn Fur Hat (WFH)   Not Worn a Fur Hat (NFH)   Total
S            0.35                 0.35                       0.7
M            0.03                 0.27                       0.3
Total        0.38                 0.62                       1
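The table and the two conditional probabilities asked for can be computed directly from the given information by applying P(S/WFH) = [P(S∩WFH)]/P(WFH) and its counterpart for NFH. This is a sketch only; the variable names are illustrative.

```python
from fractions import Fraction as F

prior = {"S": F("0.7"), "M": F("0.3")}           # where the banished were sent
p_hat_given = {"S": F("0.5"), "M": F("0.1")}     # P(WFH / location)

# Joint probabilities: location combined with hat / no hat.
joint = {}
for loc in prior:
    joint[(loc, "WFH")] = prior[loc] * p_hat_given[loc]
    joint[(loc, "NFH")] = prior[loc] * (1 - p_hat_given[loc])

# Marginal probabilities of seeing a fur hat or not.
p_WFH = sum(joint[(loc, "WFH")] for loc in prior)
p_NFH = sum(joint[(loc, "NFH")] for loc in prior)

# Conditional probabilities of being in Siberia.
p_S_given_WFH = joint[("S", "WFH")] / p_WFH
p_S_given_NFH = joint[("S", "NFH")] / p_NFH

print(p_S_given_WFH, p_S_given_NFH)  # prints: 35/38 35/62
```

As decimals these are roughly 0.92 and 0.56, so seeing a fur hat raises the probability of Siberia above the 0.7 prior and seeing no fur hat lowers it.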