# Sampling

Objectives of presentation
• Definition of sampling
• Why do we use samples?
• Concept of representativeness
• Main methods of sampling
• Sampling error
• Sample size calculation

Definition of sampling

Procedure by which some members of a given population are selected as representatives of the entire population

Definition of sampling terms
• Sampling unit
– Subject under observation on which information is collected

• Sampling fraction
– Ratio between the sample size and the population size

• Sampling frame
– Any list of all the sampling units in the population

• Sampling scheme
– Method of selecting sampling units from sampling frame

Why do we use samples?
Get information from large populations
– At minimal cost
– At maximum speed
– At increased accuracy
– Using enhanced tools

Sampling

[Figure: precision vs. cost]

What we need to know
• Concepts
– Representativeness
– Sampling methods
– Choice of the right design

• Calculations
– Sampling error
– Design effect
– Sample size

Sampling and representativeness

Target population → Sampling population → Sample

Representativeness
• Person
– Demographic characteristics (age, sex…)
– Exposure/susceptibility

• Place (e.g., urban vs. rural)
• Time
– Seasonality
– Day of the week
– Time of the day

Ensure representativeness before starting; confirm it once the survey is completed!

Types of samples
• Non-probability samples

• Probability samples

Non-probability samples
• Quotas
– Sample reflects population structure
– Time/resources constraints

• Convenience samples (purposive units)
– Biased
– Best or worst scenario

Probability of being chosen: unknown

Probability samples
• Random sampling
• Each subject has a known probability of being chosen
• Reduces possibility of selection bias
• Allows application of statistical theory to results

Sampling error
• No sample is the exact mirror image of the population
• Magnitude of error can be measured in probability samples
• Expressed by the standard error
– of a mean, proportion, difference, etc.

• Function of
– amount of variability in the factor of interest
– sample size
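A minimal Python sketch of the two standard errors mentioned above, assuming simple random sampling; the function names `se_proportion` and `se_mean` are hypothetical. It uses $SE = \sqrt{p(1-p)/n}$ for a proportion and $SE = s/\sqrt{n}$ for a mean, and shows that quadrupling the sample size only halves the error.

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean, given the standard deviation sd and sample size n."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error
print(round(se_proportion(0.15, 100), 4))   # 0.0357
print(round(se_proportion(0.15, 400), 4))   # 0.0179

# More variability (p closer to 0.5) gives a larger error for the same n
print(se_proportion(0.50, 100) > se_proportion(0.15, 100))   # True
```

Both formulas make the two drivers explicit: variability (p(1 − p) or sd) in the numerator, sample size in the denominator.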

Methods used in probability samples
• Simple random sampling
• Systematic sampling
• Stratified sampling
• Multistage sampling
• Cluster sampling

PRA becomes scientific provided it is:
• Formulated for a research purpose
• Planned systematically
• Recorded systematically
• Subjected to cross-checking (triangulation)
• Controlled for reliability and validity

Limitations of PRA/observation as a research tool
• Perceptions and inferences
• Selectivity and frame of reference
• Social phenomena dispersed temporally and spatially
• Emotional entanglements of the observer
• Observer's anxiety
• Sensitizing effect of the observer
• Validity of theoretical interpretation

Increasing reliability/validity of PRA
• Proper definition, conceptualisation and operationalisation of indicators
• Increase confidence in judgement
• Improve training and practice in PRA
• Reduce categories (not more than 50)
• Explanation of categories and rules for their use

Aids for observation
• Diary
• Note down key words to write a full report
• Write details during each phase of field work
• Reanalyse and categorise the notes/points daily
• Use sociometry/social/other maps as aids
• Use camera/video
• Tape recorder

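Kendall's coefficient of concordance (W)

$$W = \frac{s}{\frac{1}{12}k^{2}\left(N^{3}-N\right) - k\sum T}$$

where
• $s = \sum \left(R_j - \bar{R}\right)^2$ is the sum of squared deviations of the rank totals $R_j$ from their mean, for ranks given by $k$ judges to $N$ indicators
• $\bar{R} = \dfrac{\sum R_j}{N}$
• $T = \dfrac{\sum \left(t^{3}-t\right)}{12}$ is the correction for ties, $t$ being the number of observations tied at a given rank

For more than 30 indicators, use the Z approximation.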
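The formula can be checked with a short Python sketch. It assumes the ranks are supplied as a matrix with one row per judge and one column per indicator, tied indicators sharing their average rank; the function name `kendalls_w` is hypothetical, and the Z approximation for large N is not implemented.

```python
from collections import Counter

def kendalls_w(ranks: list[list[float]]) -> float:
    """Kendall's coefficient of concordance with the correction for ties.

    ranks[i][j] is the rank given by judge i to indicator j
    (tied indicators share the average rank).
    """
    k = len(ranks)        # number of judges
    n = len(ranks[0])     # number of indicators (N)
    # R_j: total rank received by each indicator across all judges
    totals = [sum(judge[j] for judge in ranks) for j in range(n)]
    mean_total = sum(totals) / n
    s = sum((r - mean_total) ** 2 for r in totals)
    # T = sum(t^3 - t) / 12 for each judge's groups of ties, summed over judges
    # (untied ranks have t = 1 and contribute 0)
    sum_t = sum(
        sum(t**3 - t for t in Counter(judge).values()) / 12
        for judge in ranks
    )
    return s / (k**2 * (n**3 - n) / 12 - k * sum_t)

# Three judges in perfect agreement on four indicators
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))   # 1.0
# Two judges in exactly opposite order
print(kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]]))                 # 0.0
```

W runs from 0 (no agreement among judges) to 1 (complete agreement).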


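Reliability
• A criterion is stable or reproducible
• Reliability contributes to validity
• A reliable instrument need not be valid

Characteristics
• Stability
• Dependability
• Accuracy: least error of measurement
• Equivalence

$$\text{Reliability} = \frac{\text{True variance}}{\text{Total variance}} = \frac{S_t^2}{S^2}$$

An observed score is a true score plus an error term (which may be positive or negative), and the variances add:

$$X = X_t + X_e \qquad S^2 = S_t^2 + S_e^2$$

Dividing throughout by $S^2$:

$$1 = \frac{S_t^2}{S^2} + \frac{S_e^2}{S^2} \quad\Rightarrow\quad \text{Reliability} = 1 - \frac{S_e^2}{S^2}$$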

Validity
• Internal validity = epistemic correlation between TD and OD

[Figure: Type I, Type II and Type III]

• External validity: generalizability to populations

Criterion-related validity
• Relevance
• Freedom from bias
• Reliability
• Availability

Quality of an estimate

[Figure: three targets labelled "Precision & validity", "No precision" (random error) and "Precision but no validity" (systematic error, bias)]

Simple random sampling
• Principle
– Equal chance of drawing each unit

• Procedure
– Number all units
– Randomly draw units

Simple random sampling
• Advantages
– Simple
– Sampling error easily measured

• Disadvantages
– Need complete list of units
– Does not always achieve best representativeness
– Units may be scattered

Simple random sampling
Example: evaluate the prevalence of tooth decay among the 1200 children attending a school
• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200
How to randomly select?
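Besides a printed table of random numbers, the draw can be done in software. A minimal Python sketch using the standard `random` module; the seed value is arbitrary and only makes the draw reproducible.

```python
import random

N = 1200   # children on the school list, numbered 1 to N
n = 100    # sample size

rng = random.Random(2024)   # fixed seed so the same draw can be repeated
# Draw n distinct numbers between 1 and N (sampling without replacement)
sample = sorted(rng.sample(range(1, N + 1), n))
print(sample[:10])
```

`sample` draws without replacement, so no child can be selected twice, and every child has the same chance of selection.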

Simple random sampling

Table of random numbers
57172 33883 77950 11607 56149 80719 93809 40950 12182 13382 38629 60728 01881 23094 15243 53501 07698 22921 68127 55309 92034 50612 81415 38461 07556 60557 42088 87680 67344 11596 55678 65101 19505 86216 59744 48076 94576 32063 99056 29831 21100 58431 24181 25930 00501 10713 90892 84077 98504 44528 24587 50031 70098 28923 10609 01796 38169 77729 82000 48161 65695 73151 48859 12431 46747 95387 48125 68149 01161 79579 37484 36439 69853 41387 32168 30953 88753 75829 11333 15659 87119 24498 47228 83949 79068 17646 83710 48724 75654 23898 08846 23917 05243 25405 01527 43488 99278 65660 06175 54107 17822 08633 71626 05622 26902 09839 15859 17009 49931 83358 45552 24164 41125 35670 17152 23683 01331 07421 16181 23463 17046 13211 28751 72554 61221 09190 49946 08049 64864 30237 29959 45817 74577 67119 94303 75230 86776 35513 14291 38453 66516 10853 88163 97869 39641 49168 31460 71120 80855 77021 76825 74305 37545 68698 54986 77795 43909 89405 42791 00614 67448 56624 48980 94057 74773 63154 78796 04038 74462 88092 36970 02048 91507 91715 02035 46279 18239 68196 47201 08759 38964 41870 49607 70743 75889 49529 31286 27549 56684 51834 66391 58116 73099 75246 14551 72201 99522 31522 16050 49881 10910 22705 47687 75634 85224 45611 83534 26300

EPITABLE: random number listing


Systematic sampling
• N = 1200 and n = 60 ⇒ sampling interval k = N/n = 1200/60 = 20 (sampling fraction = 1/20)
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (e.g., 8)
⇒ 1st person selected = the 8th on the list
⇒ 2nd person = 8 + 20 = the 28th, etc.
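The procedure above can be sketched in Python as follows. It assumes N is an exact multiple of n (as with 1200 and 60), so that every random start yields exactly n people; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(N: int, n: int, start: int) -> list[int]:
    """Positions selected from a list numbered 1..N, using interval k = N // n."""
    k = N // n                        # sampling interval: 1200 // 60 = 20
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the sampling interval")
    return list(range(start, N + 1, k))

k = 1200 // 60
start = random.randint(1, k)          # random start between 1 and 20 inclusive
selected = systematic_sample(1200, 60, start)

# With the example start of 8: the 8th, 28th, 48th person, ...
print(systematic_sample(1200, 60, 8)[:3])   # [8, 28, 48]
```

Only the starting point is random; every later selection follows from it, which is why a periodic pattern in the list can bias a systematic sample.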

Systematic sampling

[Figure: list of sampling units numbered 1, 2, 3, … 55, …]

Example: systematic sampling

Systematic sampling

Stratified sampling
• Principle :
– Classify population into internally homogeneous subgroups (strata)
– Draw a sample in each stratum
– Combine results of all strata

Stratified sampling
• Advantages
– More precise if variable associated with strata
– All subgroups represented, allowing separate conclusions about each of them

• Disadvantages
– Sampling error difficult to measure
– Loss of precision if very small numbers sampled in individual strata

Example: Stratified sampling
• Determine vaccination coverage in a country
• One sample drawn in each region
• Estimates calculated for each stratum
• Each stratum weighted to obtain an estimate for the country (weighted average)
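A minimal Python sketch of the final weighting step. It assumes each stratum is weighted by its share of the target population, which is a common choice but is not spelled out on the slide; the region names, sizes and coverages are hypothetical.

```python
def stratified_estimate(strata: dict[str, tuple[int, float]]) -> float:
    """Combine stratum-specific proportions into one overall estimate.

    strata maps a stratum name to (population size, estimated proportion);
    each stratum is weighted by its share of the total population.
    """
    total = sum(size for size, _ in strata.values())
    return sum(size / total * p for size, p in strata.values())

# Hypothetical regions: (target population, estimated vaccination coverage)
regions = {"North": (20000, 0.80), "South": (50000, 0.60), "East": (30000, 0.70)}
print(round(stratified_estimate(regions), 3))   # 0.67
```

An unweighted mean of the three regional coverages (0.70) would overstate national coverage here, because the largest region has the lowest coverage.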

Multiple stage sampling
Principle
• Consecutive samplings
• Example: sampling unit = household
– 1st stage: drawing areas or blocks
– 2nd stage: drawing buildings, houses
– 3rd stage: drawing households

Cluster sampling
• Principle
– Random sample of groups ("clusters") of units
– In selected clusters, all units or a proportion (sample) of units included

Example: Cluster sampling
[Figure: area divided into Sections 1 to 5]

Cluster sampling
• Advantages
– Simple, as a complete list of sampling units within the population is not required
– Less travel/resources required

• Disadvantages
– Imprecise if units within each cluster are similar to one another (internally homogeneous clusters), so that sample variation is greater than under simple random sampling (large design effect)
– Sampling error difficult to measure

EPI cluster sampling
To evaluate vaccination coverage:
• Without a list of persons
• Total population of villages
• Randomly choose 30 clusters
• 30 clusters of 7 children each = 210 children

Drawing the clusters
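You need:
– Map of the region
– Distribution of population (by villages or area)
– Age distribution (population aged 12–23 months: 3%)

| Village | Population | Population aged 12–23 months |
|---|---|---|
| A | 53000 | 1600 |
| B | 7300 | 220 |
| C | 106000 | 3200 |
| D | 13000 | 400 |
| E | 26500 | 800 |
| F | 6600 | 200 |
| G | 40000 | 1200 |
| H | 6600 | 200 |
| I | 53000 | 1600 |
| J | 13200 | 400 |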

Distribution of the clusters
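Compute cumulated population

| Village | Population aged 12–23 months | Cumulated population |
|---|---|---|
| A | 1600 | 1600 |
| B | 220 | 1820 |
| C | 3200 | 5020 |
| D | 400 | 5420 |
| E | 800 | 6220 |
| F | 200 | 6420 |
| G | 1200 | 7620 |
| H | 200 | 7820 |
| I | 1600 | 9420 |
| J | 400 | 9820 |

Total population aged 12–23 months = 9820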

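Distribution of the clusters
Then compute the sampling interval: K = 9820 / 30 ≈ 327
• Draw a random number between 1 and 327 (example: 62)
• Start from the village whose cumulated population includes 62, then keep adding the sampling interval (62, 389, 716, …); the village containing each value receives one cluster

| Village | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Cumulated population | 1600 | 1820 | 5020 | 5420 | 6220 | 6420 | 7620 | 7820 | 9420 | 9820 |
| Clusters | 5 | 1 | 10 | 1 | 2 | 1 | 4 | 0 | 5 | 1 |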
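The cumulated-population procedure is systematic sampling with probability proportional to size. A minimal Python sketch using the village figures and the example random start of 62; the function name `clusters_per_village` is hypothetical, and the interval is truncated to an integer (327) as in the example.

```python
import bisect
import itertools

villages = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
pop_12_23 = [1600, 220, 3200, 400, 800, 200, 1200, 200, 1600, 400]

def clusters_per_village(sizes: list[int], n_clusters: int, start: int) -> list[int]:
    """Number of clusters allocated to each village by systematic PPS selection."""
    cumulative = list(itertools.accumulate(sizes))   # 1600, 1820, 5020, ..., 9820
    interval = cumulative[-1] // n_clusters          # 9820 // 30 = 327
    counts = [0] * len(sizes)
    for i in range(n_clusters):
        point = start + i * interval                 # 62, 389, 716, ...
        # index of the first village whose cumulated population reaches the point
        counts[bisect.bisect_left(cumulative, point)] += 1
    return counts

counts = clusters_per_village(pop_12_23, 30, start=62)
print(dict(zip(villages, counts)))
# {'A': 5, 'B': 1, 'C': 10, 'D': 1, 'E': 2, 'F': 1, 'G': 4, 'H': 0, 'I': 5, 'J': 1}
```

Large villages can receive several clusters and small ones none, which is what gives each child roughly the same overall chance of selection.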

Drawing households and children
On the spot
• Go to the center of the village and choose a direction at random
• Number the houses in this direction (e.g., 21)
• Draw a random number between 1 and 21 to identify the first house to visit
• From this house, progress until 7 children are found (itinerary rules fixed beforehand)

Design effect
Global variance

$$\mathrm{Var}_{srs} = \frac{p(1-p)}{n}$$

Cluster variance

$$\mathrm{Var}_{clus} = \frac{\sum \left(p_i - p\right)^2}{k(k-1)}$$

$$\text{Design effect} = \frac{\mathrm{Var}_{clus}}{\mathrm{Var}_{srs}}$$

p = global proportion; p_i = proportion in each cluster; n = number of subjects; k = number of clusters; srs = simple random sampling
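The three formulas can be combined in a minimal Python sketch. It assumes clusters of equal size (as in the EPI design of 7 children per cluster), so that the global proportion is the mean of the cluster proportions; the function name and the example data are hypothetical.

```python
def design_effect(cluster_props: list[float], n: int) -> float:
    """Design effect = Var_clus / Var_srs for a proportion.

    cluster_props: proportion observed in each of the k clusters
    n: total number of subjects
    """
    k = len(cluster_props)
    p = sum(cluster_props) / k        # global proportion (equal-sized clusters assumed)
    var_srs = p * (1 - p) / n
    var_clus = sum((pi - p) ** 2 for pi in cluster_props) / (k * (k - 1))
    return var_clus / var_srs

# Hypothetical: 4 clusters of 10 children each (n = 40)
print(round(design_effect([0.2, 0.4, 0.6, 0.8], 40), 2))   # 2.67
```

The more the cluster proportions differ from one another, the larger Var_clus and hence the design effect.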

EPITABLE: Calculating design effect

Selecting a sampling method
• Population to be studied
– Size/geographical distribution
– Heterogeneity with respect to the variable of interest

• Level of precision required
• Resources available
• Importance of having a precise estimate of the sampling error

Steps in estimating sample size
• Identify major study variable
• Determine type of estimate (%, mean, ratio, ...)
• Indicate expected frequency of factor of interest
• Decide on desired precision of the estimate
• Decide on acceptable risk that the estimate will fall outside its real population value
• Adjust for estimated design effect
• Adjust for expected response rate
• (Adjust for population size? For small populations only)

Sample size formula in descriptive survey
Simple random / systematic sampling

$$n = \frac{z^2 \, p \, q}{d^2}$$

Cluster sampling

$$n = g \, \frac{z^2 \, p \, q}{d^2}$$

z: alpha risk expressed as a z-score; p: expected prevalence; q: 1 − p; d: absolute precision; g: design effect

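Example: p = 0.15, d = 0.03, z = 1.96

$$n = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$

With a design effect g = 2:

$$n = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 1088$$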
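A minimal Python sketch of the sample size formula, including the adjustments for design effect and response rate listed in the steps above. Inflating n by dividing by the expected response rate is an assumed, commonly used adjustment; the function name and the 90% response rate are hypothetical. The function returns the unrounded value; the example rounds to 544 and 1088, while in practice sample sizes are usually rounded up.

```python
import math

def sample_size(p: float, d: float, z: float = 1.96, deff: float = 1.0,
                response_rate: float = 1.0) -> float:
    """n = deff * z^2 * p * (1 - p) / d^2, inflated for expected non-response."""
    n = deff * z**2 * p * (1 - p) / d**2
    return n / response_rate

print(round(sample_size(0.15, 0.03), 1))           # 544.2 (simple random sampling)
print(round(sample_size(0.15, 0.03, deff=2), 1))   # 1088.5 (cluster sampling, g = 2)
# Hypothetical 90% response rate, rounded up to whole children
print(math.ceil(sample_size(0.15, 0.03, deff=2, response_rate=0.9)))   # 1210
```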

EPITABLE: cluster sample size calculation

Place of sampling in descriptive surveys
• Define objectives
• Define resources available
• Identify study population
• Identify variables to study
• Define precision required
• Establish plan of analysis (questionnaire)
• Create sampling frame
• Select sample
• Pilot data collection
• Collect data
• Analyse data
• Communicate results
• Use results

Conclusions
• Probability samples are the best
• Beware of…
– refusals
– absentees
– "do not know"

Conclusions
• If in doubt…

Call a statistician!
