
Sampling and Sampling Distributions

Population and Samples


Parameters and Statistics
Types of Sampling
Simple Random, Stratified, Systematic and Cluster sampling
Sampling distributions
Standard errors
Sampling from normal and non-normal populations
Central Limit Theorem
Finite Population Multiplier
Population
Any well defined set (group) of objects about which a
statistical enquiry is being made is called a population or
universe.

The total number of objects (individuals or members) in a
population is known as the size of the population, which
may be finite or infinite.

The population can refer to things as well as people.

For example,
All members of the cultural society of your city.
All students of mathematics of Ithaca college.
All Americans who saw 'TITANIC' last year.
Heights of all students of your school.
Weights of all the citizens of the city of Lucknow above 20
years of age.
Mileages of Dunlop automobile tyres, etc.
Sample

A finite set of objects drawn from the population with a particular
aim is called a sample.

Even in everyday life we base many of our decisions on samples,
though we are not aware of it. I met Jackson yesterday for the first
time, for an hour or two, and concluded that "Jackson is crazy",
which may be wrong.

We just take a little from a gunny bag of rice, we judge its quality and
then we purchase the whole bag.

If we want to taste milk, we just take a glassful of milk from the can
and taste it.

Note that taking a sample is easy in many cases where the
population is uniform or homogeneous. When the population is
heterogeneous (not uniform), the selection of a sample is not very
easy.
Inference Process
[Figure: a sample is drawn from the population; the sample statistic (X̄) is computed from the sample; estimates and tests based on that statistic are used to draw inferences about the population.]
Population and Samples,
Why Sampling
Definition of sampling

Procedure by which some members of a given population are
selected as representatives of the entire population
Problems in Sampling?

• What problems do you know about?
• What issues are you aware of?
• What questions do you have?
Why do we use samples ?
Get information from large populations
– At minimal cost
– At maximum speed
– At increased accuracy
– Using enhanced tools
Sampling
[Figure: precision versus cost]
What we need to know
• Concepts
– Representativeness
– Sampling methods
– Choice of the right design
Key Sampling Concepts
Sampling and representativeness

Target Population → Sampling Population → Sample


Sampling and representativeness
Study on prevalence of gynecological infection in women
in Bangalore

Target Population: female population of Bangalore
Sampling Population: female population of 4 city wards
Sample: women drawn from those wards


PARAMETERS AND
STATISTICS
• A parameter is a numerical quantity that describes
some characteristic of a population. Parameters usually
have to be estimated because their value is generally
unknown, especially when the population is so large
that it is impossible or impractical to obtain
measurements for all observations. Parameters are
normally represented by Greek letters. The most
common parameters are the population mean and
variance, denoted µ and σ², respectively.

• A statistic is a quantitative value that is calculated
from the observations in a sample. Statistics are usually
represented by lowercase English letters with other
symbols. The sample mean and variance, two of the
most common statistics derived from samples, are
denoted by the symbols x̄ and s², respectively.
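To make the distinction concrete, here is a minimal sketch in Python. The population is made-up data (hypothetical student heights), and all the variable names are illustrative assumptions. It computes the parameters µ and σ² from every member of the population, then the statistics x̄ and s² from a sample of 50.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 students (made-up data).
population = [random.gauss(165, 8) for _ in range(10_000)]

# Parameters describe the whole population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)   # divides by N

# Statistics are computed from a sample only.
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)            # divides by n - 1

print(f"population: mu={mu:.2f}, sigma^2={sigma2:.2f}")
print(f"sample:     x_bar={x_bar:.2f}, s^2={s2:.2f}")
```

The sample values land near the population values but do not equal them; that gap is what sampling distributions describe.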
Population and Sample

Population: use parameters to summarize features
Sample: use statistics to summarize features

Inference on the population from the sample

Sampling Methods

• Simple random sampling


• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
Stratified Random Sampling
• The population is first divided into groups of
elements called strata.
• Each element in the population belongs to one
and only one stratum.
• Best results are obtained when the elements
within each stratum are as much alike as
possible (i.e. homogeneous group).
• A simple random sample is taken from each
stratum.
• Formulas are available for combining the
stratum sample results into one population
parameter estimate.
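The steps above can be sketched as follows. This is only an illustration under stated assumptions: the strata are made-up salary data by department, the function name `stratified_mean` is hypothetical, and the combining rule shown is the usual weighting of each stratum mean by its share of the population (N_h / N).

```python
import random
import statistics

def stratified_mean(strata, n_per_stratum, rng):
    """Estimate the population mean from a simple random sample in each stratum.

    strata: dict mapping stratum name -> list of values (the whole stratum).
    n_per_stratum: dict mapping stratum name -> sample size for that stratum.
    """
    total = sum(len(values) for values in strata.values())
    estimate = 0.0
    for name, values in strata.items():
        sample = rng.sample(values, n_per_stratum[name])  # SRS within stratum
        weight = len(values) / total                      # N_h / N
        estimate += weight * statistics.fmean(sample)
    return estimate

rng = random.Random(1)
# Hypothetical strata (made-up data): salaries by department.
strata = {
    "sales": [rng.gauss(40, 2) for _ in range(600)],
    "engineering": [rng.gauss(70, 3) for _ in range(300)],
    "management": [rng.gauss(100, 5) for _ in range(100)],
}
est = stratified_mean(strata, {"sales": 30, "engineering": 15, "management": 5}, rng)
true_mean = statistics.fmean(v for values in strata.values() for v in values)
print(f"stratified estimate = {est:.2f}, population mean = {true_mean:.2f}")
```

Because each stratum is internally homogeneous here, a total of only 50 sampled values gives an estimate close to the population mean.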
Stratified Random Sampling
• Advantage: If strata are
homogeneous, this method is as
“precise” as simple random sampling
but with a smaller total sample size.
• Example: The basis for forming the
strata might be department,
location, age, industry type, etc.
Cluster Sampling
• The population is first divided into separate
groups of elements called clusters.
• Ideally, each cluster is a representative small-
scale version of the population (i.e.
heterogeneous group).
• A simple random sample of the clusters is then
taken.
• All elements within each sampled (chosen) cluster
form the sample.
Cluster Sampling
• Advantage: The close proximity of elements can
be cost effective (i.e. many sample observations
can be obtained in a short time).
• Disadvantage: This method generally requires a
larger total sample size than simple or stratified
random sampling.
• Example: A primary application is area
sampling, where clusters are city blocks or other
well-defined areas.
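A minimal sketch of area sampling, assuming a made-up city of 20 blocks with hypothetical household IDs. The helper name `cluster_sample` is illustrative. It draws a simple random sample of clusters and keeps every element of each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

rng = random.Random(2)
# Hypothetical city: 20 blocks, each with 10 to 29 households (made-up IDs).
clusters = {
    f"block-{b:02d}": [f"block-{b:02d}/house-{h}" for h in range(rng.randint(10, 29))]
    for b in range(1, 21)
}
chosen, sample = cluster_sample(clusters, 4, rng)
print(chosen)
print(len(sample), "households in the sample")
```

Only the list of blocks is needed up front; households are enumerated just for the blocks that were drawn.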
Systematic Sampling
• If a sample size of n is desired from a
population containing N elements, we
might sample one element for every
N/n elements in the population.
• We randomly select one of the first N/n
elements from the population list.
• We then select every (N/n)th element
that follows in the population list.
• This method has the properties of a
simple random sample, especially if the
list of the population elements is a
random ordering.
Systematic Sampling
• Advantage: The sample usually will
be easier to identify than it would be
if simple random sampling were
used.
• Example: Selecting every 100th
listing in a telephone book after the
first randomly selected listing.
Convenience Sampling
• It is a nonprobability sampling technique.
Items are included in the sample without
known probabilities of being selected.
• The sample is identified primarily by
convenience.
• Advantage: Sample selection and data
collection are relatively easy.
• Disadvantage: It is impossible to determine
how representative of the population the
sample is.
• Example: A professor conducting research
might use student volunteers to constitute a
sample.
Judgment Sampling
• The person most knowledgeable on the
subject of the study selects elements of the
population that he or she feels are most
representative of the population.
• It is a nonprobability sampling technique.
• Advantage: It is a relatively easy way of
selecting a sample.
• Disadvantage: The quality of the sample
results depends on the judgment of the person
selecting the sample.
• Example: A reporter might sample three or
four senators, judging them as reflecting the
general opinion of the senate.
Simple random sampling

• Principle
– Equal chance of
drawing each unit

• Procedure
– Number all units
– Randomly draw
units
Simple random sampling
Example: evaluate the prevalence of
tooth decay among the 1200
children attending a school

• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers
between 1 and 1200
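As a sketch of the draw itself, Python's standard `random.sample` selects without replacement, so no child is picked twice. The seed is an arbitrary choice made only for reproducibility.

```python
import random

rng = random.Random(42)           # seed chosen only for reproducibility
children = range(1, 1201)         # children numbered 1..1200
selected = sorted(rng.sample(children, 100))  # draw without replacement
print(selected[:10])
```

Every child has the same chance of selection, which is the defining property of simple random sampling.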
How to randomly select?
Simple random sampling
Table of random numbers
57172 42088 70098 11333 26902 29959 43909 49607
33883 87680 28923 15659 09839 45817 89405 70743
77950 67344 10609 87119 15859 74577 42791 75889
11607 11596 01796 24498 17009 67119 00614 49529
56149 55678 38169 47228 49931 94303 67448 31286
80719 65101 77729 83949 83358 75230 56624 27549
93809 19505 82000 79068 45552 86776 48980 56684
40950 86216 48161 17646 24164 35513 94057 51834
12182 59744 65695 83710 41125 14291 74773 66391
13382 48076 73151 48724 35670 38453 63154 58116
38629 94576 48859 75654 17152 66516 78796 73099
60728 32063 12431 23898 23683 10853 04038 75246
01881 99056 46747 08846 01331 88163 74462 14551
23094 29831 95387 23917 07421 97869 88092 72201
15243 21100 48125 05243 16181 39641 36970 99522
53501 58431 68149 25405 23463 49168 02048 31522
07698 24181 01161 01527 17046 31460 91507 16050
22921 25930 79579 43488 13211 71120 91715 49881
68127 00501 37484 99278 28751 80855 02035 10910
55309 10713 36439 65660 72554 77021 46279 22705
92034 90892 69853 06175 61221 76825 18239 47687
50612 84077 41387 54107 09190 74305 68196 75634
81415 98504 32168 17822 49946 37545 47201 85224
38461 44528 30953 08633 08049 68698 08759 45611
07556 24587 88753 71626 64864 54986 38964 83534
60557 50031 75829 05622 30237 77795 41870 26300
Systematic sampling
• N = 1200, and n = 60
⇒ sampling interval = N/n = 1200/60 = 20
(sampling fraction = n/N = 1/20)
• List persons from 1 to 1200
• Randomly select a number between 1 and 20
(e.g. 8)
⇒ 1st person selected = the 8th on the
list
⇒ 2nd person = 8 + 20 = the 28th
⇒ 3rd person = 28 + 20 = the 48th
etc.
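The procedure can be sketched in a few lines of Python. The function name `systematic_sample` is hypothetical, and the sketch assumes n divides N exactly, as it does in this example (1200/60 = 20).

```python
import random

def systematic_sample(N, n, rng):
    """Return the 1-based list positions chosen by systematic sampling."""
    k = N // n                      # step between selected units (assumes n divides N)
    start = rng.randint(1, k)       # random start among the first k units
    return list(range(start, N + 1, k))

picked = systematic_sample(1200, 60, random.Random(7))
print(picked[:3], "...", picked[-1], f"({len(picked)} units)")
```

With the start fixed at 8, as in the example, the selected positions are 8, 28, 48, and so on up to 1188.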
Systematic sampling
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55 ...
Systematic sampling
Example: systematic sampling
Stratified sampling

• Principle:
– Classify population into internally homogeneous
subgroups (strata)
– Draw sample in each stratum
– Combine results of all strata
Example: Stratified
sampling
• Determine vaccination coverage in
a country
• One sample drawn in each region
• Estimates calculated for each
stratum
• Each stratum weighted to obtain
estimate for country (average)
Stratified sampling
• Advantages
– More precise if variable associated
with strata
– All subgroups represented,
allowing separate conclusions
about each of them
• Disadvantages
– Sampling error difficult to measure
– Loss of precision if very small
numbers sampled in individual
strata
Cluster sampling
• Principle
– Random sample of groups (“clusters”) of units
– In selected clusters, all units or
proportion (sample) of units
included
Example: Cluster sampling
[Figure: an area divided into five clusters, Section 1 to Section 5]
Cluster sampling
• Advantages
– Simple as complete list of sampling
units within population not required
– Less travel/resources required

• Disadvantages
– Imprecise if clusters homogeneous
and therefore sample variation
greater than population variation
(large design effect)
– Sampling error difficult to measure
Sample design
• The focus of the design for a sample
must be on the magnitude of the
standard errors of sampling rather than
on an arbitrary percentage of the
target population.
• The standard errors are used to
calculate confidence intervals around
the sample data.
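As an illustration of the last point, the sketch below builds an approximate 95% confidence interval as x̄ ± z·s/√n. The ten measurements are made-up data, and the normal multiplier z ≈ 1.96 is an assumption of this sketch; for a sample this small a t multiplier is the more usual choice.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements (made-up data).
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]
n = len(sample)
x_bar = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error
z = NormalDist().inv_cdf(0.975)                # about 1.96 for a 95% interval
low, high = x_bar - z * se, x_bar + z * se
print(f"mean = {x_bar:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The width of the interval depends on the standard error, not on what percentage of the target population was sampled.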
Standard Error of Mean

• 1. Standard Deviation of All Possible Sample Means, $\bar{X}$
– Measures Scatter in All Sample Means, $\bar{X}$

• 2. Less Than Pop. Standard Deviation

• 3. Formula (Sampling With Replacement)

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
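A quick simulation can check the formula. In this sketch the population is assumed normal with µ = 50 and σ = 10, the sample size is n = 16, and the number of repetitions is arbitrary. The standard deviation of the simulated sample means should come out close to σ/√n = 2.5.

```python
import math
import random
import statistics

rng = random.Random(3)
mu, sigma, n = 50, 10, 16          # population values assumed for illustration
reps = 20_000                      # number of simulated samples

sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]
print(f"simulated SD of sample means: {statistics.stdev(sample_means):.3f}")
print(f"sigma / sqrt(n):              {sigma / math.sqrt(n):.3f}")
```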
Properties of Sampling Distribution of Mean
• 1. Unbiasedness
– Mean of Sampling Distribution Equals
Population Mean

• 2. Efficiency
– Sample Mean Comes Closer to Population
Mean Than Any Other Unbiased Estimator

• 3. Consistency
– As Sample Size Increases, Variation of
Sample Mean from Population Mean
Decreases
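Unbiasedness
[Figure: sampling distributions P(X̄) of an unbiased estimator (curve A, centered on µ) and a biased estimator (curve C, centered away from µ)]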
Efficiency
[Figure: sampling distribution of the mean (curve B) and sampling distribution of the median (curve A), both plotted as P(X̄) around µ]
Consistency
[Figure: sampling distributions P(X̄) around µ for a larger sample size (curve B) and a smaller sample size (curve A)]
Sampling from Normal Populations

• Central Tendency
$\mu_{\bar{X}} = \mu$

• Dispersion (sampling with replacement)
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Population distribution: µ = 50, σ = 10
Sampling distribution: $\mu_{\bar{X}} = 50$
– n = 4: $\sigma_{\bar{X}} = 5$
– n = 16: $\sigma_{\bar{X}} = 2.5$
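Standardizing Sampling Distribution of Mean

$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

[Figure: the sampling distribution of $\bar{X}$, with mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$, is mapped to the standardized normal distribution of Z, with µ = 0 and σ = 1]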
Thinking Challenge

• You're an operations analyst for AT&T. Long-distance
telephone calls are normally distributed with
µ = 8 min. and σ = 2 min. If you select random
samples of 25 calls, what percentage of the sample
means would be between 7.8 and 8.2 minutes?
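Sampling Distribution Solution

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{7.8 - 8}{2/\sqrt{25}} = -.50$

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{8.2 - 8}{2/\sqrt{25}} = .50$

Sampling distribution: $\sigma_{\bar{X}} = .4$. Standardized normal distribution: σ = 1.

The area between Z = -.50 and Z = .50 is .1915 + .1915 = .3830, so about 38.3% of the sample means fall between 7.8 and 8.2 minutes.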
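The table lookup can be cross-checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 2, 25
se = sigma / math.sqrt(n)                     # standard error of the mean: 0.4
z_low = (7.8 - mu) / se                       # about -0.5
z_high = (8.2 - mu) / se                      # about 0.5
p = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(f"P(7.8 < sample mean < 8.2) = {p:.4f}")
```

The result agrees with the table value of about .383; any difference in the fourth decimal comes from the rounding of the table entries.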


Sampling from Non-Normal Populations

• Central Tendency
$\mu_{\bar{X}} = \mu$

• Dispersion (sampling with replacement)
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Population distribution: µ = 50, σ = 10
Sampling distribution: $\mu_{\bar{X}} = 50$
– n = 4: $\sigma_{\bar{X}} = 5$
– n = 30: $\sigma_{\bar{X}} \approx 1.8$
Central Limit Theorem

As sample size gets large enough (n ≥ 30), the sampling
distribution of $\bar{X}$ becomes almost normal, with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
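A small simulation illustrates the theorem on a clearly non-normal population. The exponential population (mean 1, standard deviation 1), the sample size n = 30, and the number of repetitions are all assumptions chosen for this sketch. If the sample means are close to normal, roughly 68% of them should fall within one standard error of µ.

```python
import math
import random
import statistics

rng = random.Random(4)
n, reps = 30, 20_000
# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
# Share of sample means within one standard error of mu; about 0.68 for a normal curve.
se = 1 / math.sqrt(n)
within = sum(abs(m - 1.0) <= se for m in means) / reps
print(f"mean of sample means = {center:.3f} (mu = 1)")
print(f"SD of sample means   = {spread:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1 SE    = {within:.3f}")
```

Even though individual observations are strongly skewed, the sample means center on µ with spread close to σ/√n.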