Sampling, Probability Laws and Theories

By Dr. Shopeyin – Dosunmu Azeezat O.
NPMCN MD BIOSTATISTIC TOPIC SEMINAR PRESENTATION 18/01/2023

OUTLINE
• Sampling
• Introduction
• Sampling Methods
• Conclusion
• Probability Laws and Theories

• Introduction
• Probability Laws and Theories
• Conclusion

SAMPLING

INTRODUCTION
• What percentage of registered voters approve of the way the Nigerian
President is handling his job?
• This is difficult to determine exactly as there are 93.4 million registered voters in Nigeria
• But it’s not difficult to estimate this percentage quite well:
• Sample 1,000 (say) registered voters at random from the Northern geopolitical zones in
Nigeria. Then use the approval percentage among those registered voters as an estimate for
the approval percentage of all registered voters
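
The estimation idea above can be sketched in Python. This is only an illustration under stated assumptions: the population of 100,000 voters, the 40% true approval rate, and the function name `estimate_approval` are hypothetical and are not figures from the presentation.

```python
import random

def estimate_approval(population, sample_size, rng):
    """Estimate the approval proportion from a simple random sample."""
    sample = rng.sample(population, sample_size)  # drawn without replacement
    return sum(sample) / sample_size

# Hypothetical population: 100,000 voters, 40% of whom approve (1 = approve).
rng = random.Random(2023)
population = [1] * 40_000 + [0] * 60_000

n = 1_000
estimate = estimate_approval(population, n, rng)
# Approximate 95% margin of error for a proportion: 1.96 * sqrt(p(1 - p) / n).
margin = 1.96 * (estimate * (1 - estimate) / n) ** 0.5
print(f"estimated approval: {estimate:.3f} +/- {margin:.3f}")
```

With 1,000 respondents the margin of error is roughly three percentage points, which is why a modest random sample can stand in for a very large population.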

INTRODUCTION
• The Theoretical or Target Population: who are those with the characteristic of interest to whom generalization will be made?
• The Study Population: what population can be accessed?
• The Sampling Frame: how can they be accessed?
• The Sample: who is in the study?

INTRODUCTION
[Figure: the SAMPLE drawn from the STUDY POPULATION, which is part of the TARGET POPULATION]

INTRODUCTION
• Sampling unit
• Individual unit of selection to be studied (Subjects … patients, hospitals, populace, communities)
• Nigerian registered voter
• A representative sample has all the important characteristics of the population from
which it is drawn
• Nigerian registered voters
• Generalizability refers to the extent to which we can apply the findings of our research to
the target population we are interested in
• 1,000 voters → 93.4 million voters
INTRODUCTION
• Sampling
• Systematic process of selecting a number of units or participants from a defined study
population
• Sampling is an attempt to achieve representativeness
• The procedure by which some members of the population are selected as representatives
of the entire population

SAMPLING METHODS
• Non probability sampling methods
• Probability sampling methods
• The difference between the two is whether the sample selection is based on
randomization or not
• With randomization, every element has a known chance (in the simplest designs, an equal chance) of being picked to be part of the sample for study
SAMPLING METHODS
• Non – Probability Sampling Methods
• Selection of sample is not based on probabilities
• It does not rely on randomization
• This technique relies more on the researcher’s ability to select elements for a sample
• The outcome of sampling might be biased, because not all elements of the population have an equal chance of being part of the sample

SAMPLING METHODS
• Non-Probability Sampling Methods include:
• Purposive sampling
• Panel sampling
• Snowball sampling
• Convenience sampling
• Quota sampling

Non-Probability: Purposive Sampling
• Also known as Judgmental sampling technique
• Based on the intention or the purpose of study
• Researcher chooses the sample based on who they think would be appropriate for the study
• Used primarily when there is a limited number of people that have expertise in the area
being researched
• E.g. to select focus group discussion participants


Non-Probability: Panel Sampling
• Involves selecting a group of participants through a random sampling method and then asking that group for the same information several times over a period of time
• That is, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave"
• Often chosen for large scale or nation-wide studies in order to gauge changes in the
population with regard to any number of variables from chronic illness to job stress
to weekly food expenditures etc.
Non-Probability: Snowball sampling
• Also known as Network or chain referral sampling
• This technique can be used in situations where the population is a small group of individuals
who are completely unknown, difficult to reach/find and with special characteristic or rare
conditions
• Such as homeless people, drug abusers etc.
• We therefore start with the first person who meets the criteria for inclusion in the study, and that person is asked to name others who meet these criteria. This referral technique goes on, increasing the size of the sample like a snowball until the targeted sample size has been attained

Non-Probability: Convenience Sampling
• Sometimes known as grab or opportunity sampling or accidental or haphazard
sampling
• A type of nonprobability sampling which involves the sample being drawn from
that part of the population which is close to hand. That is, readily available and
convenient
• This method is useful when

• The availability of sample is rare and also costly
• Selecting participants for FGD, pilot testing
• Getting general idea of phenomenon under consideration

Non-Probability: Quota Sampling
• This type of sampling depends on some pre-set standard
• In advance the general composition of the sample’s characteristic (such as age, sex, religion, social
class, income, education etc.) is decided and quota/numbers predetermined
• The population is first segmented into mutually exclusive sub-groups, just as in stratified
sampling
• Then judgement is used to select subjects or units non-randomly from each segment until the exact proportions of each type are obtained or sufficient data in the different categories are collected
• It is this second step which makes the technique one of non-probability sampling
• The problem is that these samples may introduce bias into research findings because not everyone gets a chance of selection
• This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years
• A drawback is that the persons selected may not be representative of the total population in each category
• Generalization may not be correct

SAMPLING METHODS
• Probability Sampling Methods (recommended)
• Every unit in the population has a chance (greater than zero) of being selected in the
sample, and this probability can be accurately determined
• When every element in the population does have the same probability of selection, this is
known as an 'equal probability of selection' (EPS) design
• Such designs are also referred to as 'self-weighting' because all sampled units are given the same
weight
• This sampling technique uses randomization so that every element of the population has a known, non-zero chance of being part of the selected sample
SAMPLING METHODS
• Probability (Random) Sampling Methods include:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Multistage sampling
• Multiphase sampling
• Cluster sampling

Probability Sampling: Simple Random Sampling
• Avoid a common misconception: random does not mean haphazard; it means that selection is left to chance, without conscious bias
• Every member of the entire target population has an equal chance of being
selected as a member of the sample
• Only chance determines selection

• This is done by assigning a number to each unit in the sampling frame
• Sample units to be selected can be determined by

• Balloting
• Casting dice
• Tossing of coin
• Lottery system
• Use of computer generated numbers or
• Table of random numbers (most objective)

• Advantages:
• It provides for greatest number of possible samples
• Applicable when population is small, homogeneous & readily available
• Estimates are easy to calculate
• Disadvantages:
• It can be very demanding to achieve (in time, effort and money)
• If the sampling frame is large, this method may be challenging or impracticable
• Minority subgroups of interest in population may not be present in sample in sufficient
numbers for study
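
The computer-generated-numbers route can be sketched as follows. This is a minimal illustration, assuming a hypothetical sampling frame of 500 numbered units; the function name `simple_random_sample` is chosen here for illustration and is not from the presentation.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame, each unit equally likely, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 to 500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Numbering every unit in the frame first, as the slide describes, is what lets the random number generator do the selecting.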
Probability Sampling: Systematic sampling
• A sample selected by listing a population sequentially and choosing members
at regular intervals
• First subject is selected then others are systematically selected through a
predetermined sampling interval
• There must be no periodicity in the population or sampling frame
• Sampling fraction (ratio) = sample size ÷ number of units in sampling frame; the sampling interval is the reciprocal of this fraction
• Technique usually requires a numbered list

• ADVANTAGES:
• Sample easy to select
• Suitable sampling frame can be identified easily
• Sample evenly spread over entire reference population
• DISADVANTAGES:
• Relatively time, effort and money consuming
• Sample may be biased if hidden periodicity in population coincides with that of selection.
• Difficult to assess precision of estimate from one survey
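
A minimal sketch of systematic selection with a random start is given below. It assumes a hypothetical numbered frame of 500 units and a desired sample of 20, so the sampling interval is 500 ÷ 20 = 25; the function name `systematic_sample` is illustrative only.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th unit after it."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start: index 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical numbered list of 500 units.
frame = list(range(1, 501))
sample = systematic_sample(frame, 20, seed=3)
print(sample)
```

Only the first unit is chosen by chance; every later unit is fixed by the interval, which is why hidden periodicity in the list can bias the sample.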
Probability Sampling: Stratified sampling
• This technique divides the elements of the population into subgroups (strata) based on similarity, in such a way that the elements within each stratum are homogeneous while the strata differ from one another (heterogeneous between strata)
• Then the elements are randomly selected from each of these strata
• We need to have prior information about the population to create subgroups

• Proportional allocation
• Involves dividing your population into homogeneous subgroups and then taking a
simple random sample in each subgroup
• Not only the overall population is represented but also key subgroups of the population,
e.g. small minority groups
• In conducting a proportionate stratified random sampling you use the same sampling
fraction within strata
• If different sampling fractions are used in the strata, it is called disproportionate stratified random sampling
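
Proportionate allocation can be sketched as below: the same sampling fraction is applied within every stratum, and a simple random sample is taken in each. The strata names ("urban", "rural"), their sizes, and the 10% fraction are hypothetical values chosen for illustration.

```python
import random

def proportional_stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample from each stratum using one common sampling fraction."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)  # stratum sample size
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical strata: 600 urban and 400 rural units.
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
sample = proportional_stratified_sample(strata, 0.10, seed=7)
print({name: len(units) for name, units in sample.items()})
```

Passing a different fraction per stratum instead of one common value would turn this into the disproportionate design.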


• Drawbacks to using stratified sampling:
• Sampling frame of entire population has to be prepared separately for each stratum
• When examining multiple criteria, stratifying variables may be related to some, but not to
others, further complicating the design, and potentially reducing the utility of the strata
• In some cases (such as designs with a large number of strata, or those with a specified
minimum sample size per group), stratified sampling can potentially require a larger
sample than would other methods

Probability Sampling: Multistage sampling
• Useful for large scale surveys
• Selection is done in stages until the final sampling unit is reached
• May use different sampling methods at each stage
• Combines any of the methods described earlier in a variety of ways that help address sampling needs in the most efficient and effective manner possible
• First stage, a number of wards is chosen at random in all states
• Followed by random number of streets
• Then third stage units will be houses
• All ultimate units (houses, for instance) selected at last step are surveyed
• Not as effective as true random sampling, but probably solves more of the problems inherent
to random sampling
• An effective strategy because it banks on multiple randomizations, as such, extremely useful
• Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate
• Moreover, by avoiding the use of all sample units in all selected clusters, multistage
sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster
sampling
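
The wards, streets and houses example can be sketched as a staged selection. The nested dictionary below is a made-up frame (6 wards, 4 streets per ward, 5 houses per street), and the function name `multistage_sample` is hypothetical; a real survey would build its frame from field listings.

```python
import random

def multistage_sample(wards, n_wards, n_streets, seed=None):
    """Stage 1: sample wards. Stage 2: sample streets in each chosen ward.
    Stage 3: take every house on each chosen street."""
    rng = random.Random(seed)
    houses = []
    for ward in rng.sample(sorted(wards), n_wards):
        streets = wards[ward]
        for street in rng.sample(sorted(streets), n_streets):
            houses.extend(streets[street])
    return houses

# Hypothetical frame: ward -> street -> list of houses.
wards = {
    f"ward{w}": {
        f"w{w}-street{s}": [f"w{w}-s{s}-house{h}" for h in range(5)]
        for s in range(4)
    }
    for w in range(6)
}
selected = multistage_sample(wards, n_wards=2, n_streets=2, seed=11)
print(len(selected), "houses selected")
```

Note that only the selected wards need a street list, and only the selected streets need a house list, which is the practical saving the slide refers to.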
Probability Sampling: Multi Phase sampling
• Part of the information collected from whole sample & part from subsample
• E.g. in a TB survey, Mantoux test (MT) in all cases – Phase I
• Then, chest X-ray in MT +ve cases – Phase II
• Sputum examination in X-ray +ve cases – Phase III
• Survey by such procedure is less costly, less laborious & more purposeful
Probability Sampling: Cluster sampling
• An example of ‘two-stage sampling’ – a subset of sampling units within
selected clusters are randomly selected for inclusion in the sample
• Cluster sampling method is useful

• When sampling a population that is spread across a wide geographic region that requires
covering lots of ground geographically in order to get to each participant
• Often used in its adapted form (EPI 30 x 7 Cluster survey method) to evaluate
vaccination coverage in EPI

• First stage, divide population into groups of homogeneous population units (usually along geographic boundaries)
• Sampling units are groups (known as clusters) rather than individuals
• Sampling frame is a list of chosen clusters rather than individual units
• Second stage, simple random sampling of those cluster(s) is done to select
sample of respondents
• Then study all individual units within selected cluster(s)

• Advantages :
• More convenient
• Cuts down on the cost/time of preparing a sampling frame
• This can reduce cost/time spent on travel and other administrative costs
• Disadvantage:
• Sampling error is higher than for a simple random sample of the same size

SAMPLING METHODS
• BIAS IN SAMPLING
• Systematic error in sampling procedures that leads to a distortion in the results of the
study
• Causes include
• Volunteer sample group
• Non probability sampling
• Loss of sample subjects in non random manner
• Non response of survey sample members in non random manner
CONCLUSION
• Sampling is defined as the process of selecting certain members or a subset of the population to make statistical inferences from them and to estimate characteristics of the whole population
• Probability sampling methods include simple random, systematic, stratified, cluster and multistage sampling
• Non-probability sampling methods include convenience, purposive and quota sampling
PROBABILITY LAWS AND THEORIES

INTRODUCTION
• Statistics and probability theory constitute a branch of mathematics for dealing with uncertainty
• Probability theory provides a basis for the science of statistical inference
from data
• Probability is a number that reflects the likelihood that an event will occur
• We hear about probabilities in many everyday situations, ranging from
weather forecasts (probability of rainy or sunny weather) to the lottery
(probability of hitting the big jackpot)

INTRODUCTION
• Statistical Experiment: Any action or activity that results in data generation
is a statistical experiment
• Trial: This is the act of performing an experiment
• Outcome: This is the result of a trial for an experiment

INTRODUCTION
Special Events
The Null Event, The Empty Event - ∅
∅ = { } = the event that contains no outcomes
The empty event, ∅, never occurs
The Entire Event, The Sample Space - S
S = the event that contains the set of all possible outcomes of an experiment
The entire event, S, always occurs

INTRODUCTION
• EXAMPLE: GAMES OF CHANCE

- Games of chance commonly involve the toss of a coin, the roll of a die, or the use of a pack of
cards
- The roll of a die

A usual six-sided die has a sample space S = {1, 2, 3, 4, 5, 6}
[Figure: Venn diagram for universal set S]

INTRODUCTION
• An event A is a subset of the sample space S
• It collects outcomes of particular interest
• The probability of an event A, p(A), is obtained by summing the probabilities of the outcomes
contained within the event A
• An event is said to occur if one of the outcomes contained within the event occurs

INTRODUCTION
• Types of Event
• Simple or Elementary Event:
This is an event with only one possible outcome
• Compound Event:
An event with more than one possible outcome

INTRODUCTION
• Mutually Exclusive events:
• Two or more events (e.g. A and B) are said to be mutually exclusive if they cannot occur together
• Occurrence of one precludes the occurrence of the other
• Independent events
• The occurrence of one event does not affect the occurrence of the other event

INTRODUCTION
• Complements of Events
The event Aʹ or Ā, the complement of event A, is the event consisting of everything in the sample space S
that is not contained within the event A
In all cases, the sum of all probabilities in a set is 1:
p(A)+p(Ā)=1
p(A)+p(Aʹ)=1
• The complement is the event which occurs when, and only when, event A fails to occur; A and its complement are mutually exclusive
For example, if the probability of females (F) is 15/40, then the probability of its complement, males (M), is
p(M) = 1 – p(F) = 25/40
INTRODUCTION
• Equally likely: if all outcomes have equal chance of occurrence
• In some situations, notably games of chance, the experiments are conducted in such a way
that all of the possible outcomes can be considered to be equally likely, so that they must be
assigned identical probability values
• n outcomes in the sample space that are equally likely => each probability value will be 1/n
• In the toss of a coin, there are two equally likely outcomes, so the probability of any of the
outcomes in any toss is one in two
There are three common theories used to define probability

1. Theoretical or Classical
2. Empirical or Frequency
3. Subjective

• Theoretical/Classical Probability
• Assumes all outcomes in the sample space are equally likely to occur
• Probability is the number of favourable outcomes divided by the total number of possible outcomes

• Theoretical/Classical Probability
- A fair die (properly balanced) when rolled vigorously will have all of the six outcomes equally
likely, n = 6
S ={1, 2, 3, 4, 5, 6}
p(1)=p(2)=p(3)=p(4)=p(5)=p(6)=1/6
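
The classical definition (favourable outcomes over possible outcomes) can be checked with exact fractions. This is a small sketch; the helper name `classical_probability` is chosen here for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favourable outcomes divided by all equally likely outcomes."""
    favourable = len(set(event) & set(sample_space))
    return Fraction(favourable, len(sample_space))

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die
p_each = {face: classical_probability({face}, S) for face in S}
print(p_each[4])          # probability of rolling a four
print(sum(p_each.values()))  # the six probabilities add up to 1
```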

• Empirical or Frequency Probability
• Many outcomes in real life are not equally likely
• Hence, focus is usually on the relative frequency approach
• The likelihood of occurrence of an event is calculated based on the information we collected from a series of
repeated trials actually observed or experienced
• Based on Empirical Law of Averages which assumes that the world works in such a way that the relative
frequency with which an event occurs in repeated trials always settles down to a limit

• Empirical or Frequency Probability
• p(A) = nA/N
• N = total number of trials
• nA = number of times that A occurs
• If we toss a coin 100 times, and we have heads 50 times, we speak of p(H) = 50/100 = 1/2

• Subjective Probability
• Probability is a value based on an educated guess
• This is the probability based on our own judgement. i.e. on our personal experiences
• Measures one’s degree of belief (confidence) or disbelief (doubt) about a person, an event or a phenomenon
occurring
• E.g. there will be a cure for Ebola Viral Disease in the next 10 years
• Nigeria will get better under APC government

• Properties of Probability
• It is non-negative
• Probability of 0 = event is certain not to occur (no chance event will occur)
• Probability of 1 = event is certain to occur
• Probability of 0.5 = event is expected to occur with 50% certainty
• The closer the value to 1, the more likely the event is to occur
• 0 ≤ p(A) ≤ 1
• p(A)≥0
• p(A)≤1
• The Law of Large Numbers says that as the number of trials in an experiment increases, the empirical probability approaches the theoretical probability.
• If an experiment is done many times, everything tends to “even out.”
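
A simulated fair coin gives a feel for this. The sketch below is an illustration only: the trial counts and the seed are arbitrary choices, and the exact frequencies printed depend on the random number generator.

```python
import random

def relative_frequency(n_trials, rng):
    """Empirical p(H) = number of heads / number of tosses, for a simulated fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

rng = random.Random(42)
freqs = {n: relative_frequency(n, rng) for n in (10, 100, 10_000, 100_000)}
for n, freq in freqs.items():
    print(f"N = {n:>7}: relative frequency of heads = {freq:.4f}")
```

Small runs can land well away from 0.5, while the largest run should sit very close to the theoretical value.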

• Addition Law
• For any two events A and B, p(A or B) = p(A) + p(B) – p(A and B)
• If two or more events are mutually exclusive, the probability that one or the other occurs is the sum of their individual probabilities
• If A and B are mutually exclusive events
• p(A and B) = 0
• p(A or B) = p(A) + p(B) – 0
• = p(A) + p(B)

If a die is tossed once, find the probability of the occurrence of even or prime number
Solution
S = {1,2,3,4,5,6}
A = {2,4,6} Even
B = {2,3,5} Prime
A and B = {2}
p(A or B) = p(A) + p(B) – p(A and B)
= 3/6 + 3/6 – 1/6 = 5/6
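
The same worked example can be verified with Python sets, where `&` is the intersection (A and B) and `|` is the union (A or B). The helper `p` is a name introduced here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # even
B = {2, 3, 5}  # prime

def p(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event), len(S))

p_union = p(A) + p(B) - p(A & B)  # addition law
print(p_union)   # 5/6
print(p(A | B))  # counting the union directly gives the same value
```

Subtracting p(A and B) removes the outcome 2, which would otherwise be counted twice.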
• Independent events: One event’s outcome does not depend on (or is not
influenced by) the other
• Multiplication Law
• When two or more events (e.g. A and B) are independent, the probability of joint occurrence (e.g. the occurrence of A and B) is the product of the individual probabilities
p(A and B) = p(A) × p(B)

• The probability of getting a four in a single throw of a die is 1/6
• The probability of getting another four in the next throw is 1/6
• The probability of getting two fours in two consecutive throws is 1/6 × 1/6 = 1/36
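
The multiplication law for two throws can be confirmed by listing all 36 equally likely pairs of outcomes. This is a small sketch using only the standard library.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two consecutive throws of a fair die.
throws = list(product(range(1, 7), repeat=2))

p_four = Fraction(1, 6)  # a four on any single throw
favourable = sum(1 for first, second in throws if first == 4 and second == 4)
p_two_fours = Fraction(favourable, len(throws))

print(p_two_fours)      # 1/36 by direct counting
print(p_four * p_four)  # 1/36 by the multiplication law
```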

• Conditional Probability
• The probability that event A occurs, given that another event B has occurred (or is known for certain to occur), is called conditional probability
• If A and B are two non-mutually exclusive events, then the conditional probability of A given B is denoted p(A|B)
Read as the probability of A given B

Sex      15 – 24   25 – 34   35 – 44   Total
Male          40        20        20      80
Female        60        40        20     120
Total        100        60        40     200
• In general p(A|B) = p(A and B)/p(B)
• Consider the probability of (25 – 34), given female = p(25 – 34|female)
• p(25 – 34|female) = p(25 – 34 and female)/p(female)
• p(25 – 34|female) = (40/200)/(120/200) = 40/120 = 1/3
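
The table calculation can be reproduced from the cell counts. In this sketch the dictionary layout and the function name `conditional` are illustrative choices; the counts are the ones in the table.

```python
from fractions import Fraction

# Cell counts from the table, keyed by (sex, group).
counts = {
    ("Male", "15-24"): 40, ("Male", "25-34"): 20, ("Male", "35-44"): 20,
    ("Female", "15-24"): 60, ("Female", "25-34"): 40, ("Female", "35-44"): 20,
}

def conditional(counts, group, sex):
    """p(group | sex) = p(group and sex) / p(sex)."""
    total = sum(counts.values())
    p_joint = Fraction(counts[(sex, group)], total)
    p_sex = Fraction(sum(n for (s, _), n in counts.items() if s == sex), total)
    return p_joint / p_sex

print(conditional(counts, "25-34", "Female"))  # 1/3
```

Conditioning on "female" simply restricts attention to the 120 people in the Female row.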
• Law of Total Probability

• If $A_1, \dots, A_n$ is a partition of a sample space, then the probability of an event B can be obtained from the probabilities $p(A_i)$ and $p(B \mid A_i)$ using the formula
$$p(B) = \sum_{i=1}^{n} p(A_i)\, p(B \mid A_i)$$

• Car Warranties
A company sells a certain type of car, which it assembles in one of four
possible locations. Plant I supplies 20%; plant II, 24%; plant III, 25%; and
plant IV, 31%. A customer buying a car does not know where the car has
been assembled, and so the probabilities of a purchased car being from each
of the four plants can be thought of as being 0.20, 0.24, 0.25, and 0.31.

• Car Warranties
Each new car sold carries a 1-year bumper-to-bumper warranty.
P( claim | plant I ) = 0.05, P( claim | plant II ) = 0.11
P( claim | plant III ) = 0.03, P( claim | plant IV ) = 0.08
For example, a car assembled in plant I has a probability of 0.05 of receiving a claim on its warranty.
Notice that claims are clearly not independent of assembly location because these four conditional probabilities
are unequal

• Car Warranties
If $A_1$, $A_2$, $A_3$ and $A_4$ are, respectively, the events that a car is assembled in plants I, II, III, and IV, then they provide a partition of the sample space, and the probabilities $p(A_i)$ are the supply proportions of the four plants
B = { a claim is made }
$p(B \mid A_i)$ = the claim rates for the four individual plants
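
Applying the law of total probability to the car warranty figures gives the overall claim probability. The sketch below uses the supply proportions and claim rates stated above; the function and variable names are illustrative.

```python
def total_probability(priors, likelihoods):
    """p(B) = sum over the partition of p(A_i) * p(B | A_i)."""
    return sum(priors[plant] * likelihoods[plant] for plant in priors)

supply = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}      # p(A_i)
claim_rate = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}  # p(claim | A_i)

p_claim = total_probability(supply, claim_rate)
print(f"p(claim) = {p_claim:.4f}")  # 0.0687
```

So about 6.9% of cars sold are expected to receive a warranty claim, a supply-weighted average of the four plant claim rates.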

• Bayes’ Theorem
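If $A_1, \dots, A_n$ is a partition of a sample space, then the posterior probabilities of the events $A_i$ conditional on an event B can be obtained from the probabilities $p(A_i)$ and $p(B \mid A_i)$ using the formula
$$p(A_i \mid B) = \frac{p(A_i)\, p(B \mid A_i)}{\sum_{j=1}^{n} p(A_j)\, p(B \mid A_j)}$$
• Car Warranties
- The prior probabilities (car assembled in plant I, II, III or IV)
$p(A_1) = 0.20$, $p(A_2) = 0.24$, $p(A_3) = 0.25$, $p(A_4) = 0.31$
- If a claim is made on the warranty of the car, how does this change these probabilities? By the law of total probability, $p(B) = 0.20 \times 0.05 + 0.24 \times 0.11 + 0.25 \times 0.03 + 0.31 \times 0.08 = 0.0687$, so
$p(A_1 \mid B) = (0.20 \times 0.05)/0.0687 \approx 0.146$
$p(A_2 \mid B) = (0.24 \times 0.11)/0.0687 \approx 0.384$
$p(A_3 \mid B) = (0.25 \times 0.03)/0.0687 \approx 0.109$
$p(A_4 \mid B) = (0.31 \times 0.08)/0.0687 \approx 0.361$

• No claim is made on the warranty
Here $p(B') = 1 - 0.0687 = 0.9313$, so
$p(A_1 \mid B') = (0.20 \times 0.95)/0.9313 \approx 0.204$
$p(A_2 \mid B') = (0.24 \times 0.89)/0.9313 \approx 0.229$
$p(A_3 \mid B') = (0.25 \times 0.97)/0.9313 \approx 0.260$
$p(A_4 \mid B') = (0.31 \times 0.92)/0.9313 \approx 0.306$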
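
The posterior probabilities for the car warranty example can be computed with a short Bayes' theorem helper. This is a sketch using the supply proportions and claim rates given earlier; the function name `posterior` and the dictionary names are illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: p(A_i | B) = p(A_i) p(B | A_i) / sum_j p(A_j) p(B | A_j)."""
    joint = {plant: priors[plant] * likelihoods[plant] for plant in priors}
    evidence = sum(joint.values())  # p(B), by the law of total probability
    return {plant: value / evidence for plant, value in joint.items()}

supply = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}      # prior p(A_i)
claim_rate = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}  # p(claim | A_i)
no_claim_rate = {plant: 1 - rate for plant, rate in claim_rate.items()}

given_claim = posterior(supply, claim_rate)
given_no_claim = posterior(supply, no_claim_rate)
for plant in supply:
    print(f"plant {plant}: prior {supply[plant]:.2f}, "
          f"claim {given_claim[plant]:.3f}, no claim {given_no_claim[plant]:.3f}")
```

A claim shifts belief towards plants II and IV, which have the higher claim rates, while the absence of a claim moves the probabilities only slightly because most cars from every plant receive no claim.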

CONCLUSION
• Probability laws and theories are important to help us understand how to calculate probabilities and make sense of probability values stated in relation to the real world
• Probability is important in making most of life’s decisions to ensure successful outcomes

BIBLIOGRAPHY
• Sullivan LM. Essentials of biostatistics in public health. Third edition. Burlington, Massachusetts: Jones & Bartlett Learning; 2018. 376 p.
• Rong J. Introduction to Probability Theory.
• Law of Large Numbers and Probability.
• Probability - Models for random phenomena.
• Probability Review. 2013 Sep.
• KAIST. Probability theory and law.

BIBLIOGRAPHY
• Probability Theory: summary.

• Stanford Center for Professional Development. Producing Data, Sampling. Stanford.
• Odugbemi T. Sampling methodology. College of Medicine, University of Lagos MPH class; 2022 Dec 17; Nigeria.
• Kanupriya C. sampling methods.
• Odeyemi K. SAMPLING TECHNIQUES. 2021 presented at: National Postgraduate Part 1
Update Course; Nigeria.

THANKS FOR LISTENING
• Questions?
• Comments?
• Contribution?
