Survey and Research Methods

Session 6: Sampling

Dr. Gangadhar Mahesh, B. Arch, M. Eng, Ph.D

NICMAR
ACM – Survey and Research Methods

Sampling
• Sampling: a means of selecting a subset of units from a population in order to collect information from those units and draw inferences about the population as a whole
• Two types of sampling
- Non-probability sampling
- Probability sampling
• The choice depends primarily on whether reliable inferences are to be made about the population

Non-probabilistic Sampling
• Method of selecting units from a population using a subjective (i.e., non-random) method
• Advantages
- Quick and convenient
- Relatively inexpensive
- Does not require a survey frame

• Disadvantages
- Making inferences about the population requires strong assumptions about the representativeness of the sample
- It is impossible to determine the probability that a unit in the population is selected for the sample
- Reliable estimates and estimates of sampling error cannot be computed

• Can be used as
- An idea-generating tool
- A preliminary step towards the development of a probability sample survey
- A follow-up step to help understand the results of a probability sample survey

Non-probabilistic Sampling Types 1
• Haphazard Sampling
- Assumes that the population is homogeneous
- Units are selected in an aimless, arbitrary manner with little or no planning involved
- Example: the ‘man in the street’ interview, where the interviewer selects any person who happens to walk by

• Volunteer Sampling
- Respondents are volunteers, generally screened so as to obtain a set of characteristics suitable for the purposes of the survey
- Subject to large selection biases, but is sometimes necessary, for example for ethical reasons
- Often used in medical research and exploratory studies

• Judgement Sampling
- Based on previous ideas of population composition and behavior
- Subject to the researcher's biases and is perhaps even more biased than haphazard sampling
- Can be useful in exploratory studies, for example in selecting members for focus groups or in-depth interviews to test specific aspects of a questionnaire

Non-probabilistic Sampling Types 2
• Quota Sampling
- Units are selected until a specific number of units (a quota) has been reached for each of various subpopulations
- Quotas may be based on population proportions
- Considered preferable to other forms of non-probability sampling because it forces the inclusion of members of different subpopulations

• Modified Probability Sampling
- A combination of probability and non-probability sampling
- The first stage is usually based on probability sampling and the last stage is a non-probability sample, usually a quota sample
- Example: geographical areas may be selected using a probability design, and then within each area a quota sample of individuals may be drawn

• Network / Snowball Sampling
- Used to find rare or specialized individuals in the population, when one knows of the existence of some of these individuals and can contact them
- Those contacted identify further individuals with the same characteristic, so the sample grows like a snowball rolling down a hill, ideally including virtually everybody with that characteristic
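
The quota and snowball procedures above can be sketched in Python. This is a minimal illustration under assumed inputs: `stream` stands for the order in which an interviewer happens to encounter people, `group_of` and `quotas` are hypothetical names for the subpopulation classifier and the quota targets, and `referrals` is an assumed mapping from each respondent to the people they name. Neither function uses random numbers: who ends up in the sample depends on arrival order and on who knows whom, which is why inclusion probabilities cannot be computed.

```python
from collections import deque


def quota_sample(stream, group_of, quotas):
    """Accept units in the order they arrive until every subpopulation quota is filled."""
    counts = {g: 0 for g in quotas}
    sample = []
    for unit in stream:
        g = group_of(unit)
        # Keep the unit only if its group still has room in its quota
        if g in counts and counts[g] < quotas[g]:
            sample.append(unit)
            counts[g] += 1
        # Stop as soon as all quotas are met
        if all(counts[q] >= quotas[q] for q in quotas):
            break
    return sample


def snowball_sample(seeds, referrals, max_size):
    """Start from known individuals and follow referrals wave by wave."""
    queue = deque(dict.fromkeys(seeds))  # de-duplicated seeds, original order kept
    seen = set(queue)
    sample = []
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each respondent names further people with the same characteristic
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample
```

For example, with quotas of two men and two women, `quota_sample` simply keeps the first two of each it meets and ignores everyone else, however unrepresentative those first arrivals are.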

Probabilistic Sampling
• Avoids bias by randomly selecting units from the population
• Random does not mean arbitrary but is based on chance: all units in the survey population have a non-zero probability of inclusion in the sample, and these probabilities can be calculated

• It is not necessary for all units to have the same inclusion probability; in most complex surveys, the inclusion probability varies from unit to unit
• Advantages: reliable estimates, and an estimate of the sampling error of each estimate, can be produced, which means that inferences can be made about the population using a relatively small sample

• Disadvantages: sampling is more difficult, takes longer, is harder to manage and is usually more expensive
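
One standard way to see why known, non-zero inclusion probabilities matter is the Horvitz–Thompson estimator, used here purely as an illustration (the slides do not name an estimator). Writing π_i for the inclusion probability of unit i in population U, s for the selected sample and Y for the population total, each sampled unit is weighted by 1/π_i, and the estimator is unbiased because every π_i is non-zero and known:

```latex
\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
E\big[\hat{Y}_{HT}\big] = \sum_{i \in U} \pi_i \, \frac{y_i}{\pi_i} = \sum_{i \in U} y_i = Y
```

Under SRS without replacement every π_i = n/N, so the estimator reduces to (N/n) Σ y_i = N ȳ; when inclusion probabilities vary from unit to unit, only the weights change.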

Probabilistic Sampling – Simple Random Sampling
• A one-step selection method that ensures that every possible sample of size n has an equal chance of being selected
• May be done with or without replacement: with replacement, a unit may be selected more than once; without replacement, once a unit has been selected it cannot be selected again

• Advantages
- It is the simplest sampling technique
- It requires no additional information on the frame in order to draw the sample; the only information required is a complete list of the survey population and contact information
- It needs no technical development: standard formulas exist to determine the sample size, population estimates and variance estimates, and these formulas are easy to use

• Disadvantages
- It makes no use of auxiliary information even if such information exists, which can result in estimates being less statistically efficient than if another sample design had been used
- It can be expensive if personal interviews are used and the sample is geographically spread out
- It is possible to draw a ‘bad’ SRS sample, i.e. one that is unrepresentative purely by chance
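
A minimal Python sketch of SRS using only the standard library: `random.sample` draws without replacement (every subset of size n is equally likely) and `random.choices` draws with replacement. The helper names and the choice of estimating a mean are illustrative assumptions; the standard error uses the usual SRS-without-replacement formula with finite population correction, SE = sqrt((1 − n/N) s² / n).

```python
import math
import random
import statistics


def srs(population, n, replace=False, rng=random):
    """Draw a simple random sample of size n (hypothetical helper)."""
    if replace:
        return rng.choices(population, k=n)  # a unit may be drawn more than once
    return rng.sample(population, n)  # every subset of size n is equally likely


def srs_mean_estimate(sample, N):
    """Estimate the population mean and its standard error (SRS without replacement)."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    fpc = 1 - n / N  # finite population correction
    se = math.sqrt(fpc * s2 / n)
    return ybar, se
```

For instance, `srs(list(range(1, 101)), 10, rng=random.Random(42))` returns ten distinct units; passing a seeded `random.Random` makes the draw reproducible. When n = N the correction factor is zero, so a census has no sampling error.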

Probabilistic Sampling – Systematic Sampling 1
• Units are selected from the population at regular intervals
• Used when one would like to use SRS but no list is available, or when the list is roughly random in order, in which case SYS is even simpler to conduct than SRS
• A sampling interval and a random start are required
• When a list frame is used and the population size, N, is a multiple of the sample size, n, every kth unit is selected (units r, r + k, r + 2k, …, r + (n − 1)k), where the interval k is equal to N/n; the random start, r, is a single random number between 1 and k, inclusive

• Like SRS, each unit has an inclusion probability, π, equal to n/N but, unlike SRS, not every combination of n units has an equal chance of being selected
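
The linear systematic procedure above can be sketched as follows, assuming (as the slide does) that N is a multiple of n; `frame` is a hypothetical ordered list of units. Only one random number, the start r, is drawn, which is why just k distinct samples are possible.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Linear systematic sampling; this sketch requires N to be a multiple of n."""
    N = len(frame)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n  # sampling interval
    r = rng.randint(1, k)  # random start between 1 and k, inclusive
    # Units r, r + k, ..., r + (n - 1)k, converted from 1-based positions to list indices
    return [frame[r - 1 + j * k] for j in range(n)]
```

With a frame of 100 units and n = 10, k = 10; a start of r = 4 yields units 4, 14, …, 94. Each unit has π = 10/100, but only 10 of the many possible 10-unit combinations can ever be drawn.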
Probabilistic Sampling – Systematic Sampling 2
• Advantages
- A proxy for SRS when there is no list frame
- Like SRS, does not require auxiliary frame information
- Can result in a sample that is better dispersed than SRS (depending on the sampling interval and how the list is sorted)
- Is simpler than SRS, since only one random number is required

• Disadvantages
- Can result in a ‘bad’ sample if the sampling interval matches a periodicity in the population
- Does not use auxiliary information that might be available on the frame, and thus can result in an inefficient sampling strategy
- The final sample size is not known in advance when a conceptual frame is used
- Does not have an unbiased estimator of the sampling variance; for variance estimation, the systematic sample is often treated as if it were a simple random sample, which is only appropriate when the list is sorted randomly
- Can lead to a variable sample size if the population size N cannot be evenly divided by the desired sample size n (this can be avoided using circular SYS)
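
Circular SYS, mentioned in the last point above, treats the list as a circle so that exactly n units are always drawn. A minimal sketch, with one stated assumption: the interval is taken as k = N // n (rounded down), which guarantees that the n positions never wrap onto a unit already selected. Some texts round N/n to the nearest integer instead.

```python
import random


def circular_systematic_sample(frame, n, rng=random):
    """Circular systematic sampling: always returns exactly n distinct units."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n  # assumed interval: rounded down, so (n - 1) * k < N
    r = rng.randrange(N)  # random start anywhere on the list (0-based)
    # Step through the list k units at a time, wrapping around past the end
    return [frame[(r + j * k) % N] for j in range(n)]
```

With N = 103 and n = 10, linear SYS would give 10 or 11 units depending on the start; the circular version always gives 10.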

Probabilistic Sampling – Probability-Proportional-to-Size Sampling
• Uses auxiliary data (a size measure for each unit) and yields unequal probabilities of inclusion: larger units have a higher probability of being selected
• Used when population units vary in size and these sizes are known; when the size measure is correlated with the survey variables, this increases statistical efficiency
• Disadvantages
- Requires a survey frame that contains good-quality, up-to-date auxiliary information for all units on the frame that can be used as size measures
- Is inappropriate if the size measures are not accurate or stable; in such circumstances, it is better to create size groupings and perform stratified sampling
- Is not always applicable, since not every population has a stable size measure that is correlated with the main survey variables
- Can result in a sampling strategy that is less statistically efficient than SRS for survey variables that are not correlated with the size variables
- Estimation of the sampling variance of an estimate is more complex
- Frame creation is more costly and complex than for SRS or SYS, since the size of each unit in the population needs to be measured and stored
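
One common way to implement PPS, shown here as an illustrative choice rather than the method the slides prescribe, is systematic selection along the cumulated size measures: each unit occupies a stretch of the running total proportional to its size, and n equally spaced points are laid over it. The function names are hypothetical; the inclusion probability n·x_i / X is only a valid probability when no unit is larger than the interval X/n (larger units can be hit more than once).

```python
import bisect
import itertools
import random


def pps_systematic(sizes, n, rng=random):
    """Systematic PPS: return indices of selected units (a very large unit may repeat)."""
    if n <= 0 or any(x <= 0 for x in sizes):
        raise ValueError("need n > 0 and positive size measures")
    cum = list(itertools.accumulate(sizes))  # running totals of the size measure
    total = cum[-1]
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for j in range(n):
        point = start + j * interval
        # Unit i covers [cum[i-1], cum[i]); bisect_right finds the unit containing the point
        i = bisect.bisect_right(cum, point)
        picks.append(min(i, len(sizes) - 1))  # guard against float round-off at the end
    return picks


def pps_inclusion_probabilities(sizes, n):
    """pi_i = n * x_i / X, valid while n * x_i <= X."""
    total = sum(sizes)
    return [n * x / total for x in sizes]
```

With sizes [50, 10, 10, 10, 10, 10] and n = 2, the interval is 50, so the large unit is selected with certainty (π = 1) and each small unit has π = 0.2.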

Probabilistic Sampling – Cluster Sampling
• The process of randomly selecting complete groups (clusters) of population units from the survey frame
• A two-step process: first, the population is grouped into clusters; second, a sample of clusters is selected and all units within the selected clusters are interviewed

• Advantages
- Can greatly reduce the cost of collection by having a less dispersed sample than SRS
- Easier to apply than SRS or SYS to populations that are naturally clustered
- Allows the production of estimates for the clusters themselves
- Can be more statistically efficient than SRS if the units within clusters are heterogeneous with respect to the study variables and the clusters are similar to one another (homogeneous cluster means)

• Disadvantages
- Can be less statistically efficient than SRS if the units within clusters are homogeneous with respect to the study variables
- The final sample size is not usually known in advance, since it is not usually known how many units are within a cluster until after the survey has been conducted
- Survey organization and variance estimation can be more complex than for other methods
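
A minimal sketch of one-stage cluster sampling, assuming a hypothetical `clusters` mapping from each cluster label (e.g. a city block) to the survey values of all its units. Clusters are drawn by SRS and every unit in a chosen cluster is kept. The total is estimated with the standard estimator for SRS of clusters, (M/m) × (sum of sampled cluster totals), where M is the number of clusters in the population; this estimator is an illustrative addition, not something the slides specify.

```python
import random


def cluster_sample(clusters, m, rng=random):
    """One-stage cluster sampling: SRS of m clusters, keeping every unit in each."""
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster labels
    return {c: list(clusters[c]) for c in chosen}


def estimate_total(sampled, M):
    """Estimate the population total as (M / m) * sum of sampled cluster totals."""
    m = len(sampled)
    return M / m * sum(sum(values) for values in sampled.values())
```

Because whole clusters are taken, the number of units collected depends on which clusters are drawn, matching the point above that the final sample size is not known in advance.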

Probabilistic Sampling – Stratified Sampling 1
• The population is divided into homogeneous, mutually exclusive groups called strata, and then independent samples are selected from each stratum using an appropriate sampling method

• Reasons for stratification
- To make the sampling strategy more efficient than SRS or SYS
- To ensure adequate sample sizes for specific domains of interest for which analysis is to be performed
- To protect against drawing a ‘bad’ sample

Probabilistic Sampling – Stratified Sampling 2
• Advantages
- Can increase the precision of overall population estimates, resulting in a more efficient sampling strategy; a smaller sample can save a considerable amount of the survey cost, particularly in data collection
- Can guarantee that important subgroups, when defined as strata, are well represented in the sample, resulting in statistically efficient domain estimators
- Can be operationally or administratively convenient
- Can protect against selecting a ‘bad’ sample
- Allows different sampling frames and procedures to be applied to different strata

• Disadvantages
- Requires that the sampling frame contain high-quality auxiliary information for all units on the frame, not just those in the sample, that can be used for stratification
- Frame creation is more costly and complex than for SRS or SYS, since the frame requires good auxiliary information
- Can result in a sampling strategy that is less statistically efficient than SRS for survey variables that are not correlated with the stratification variables
- Estimation is slightly more complex than for SRS or SYS
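
A minimal sketch of stratified sampling with independent SRS in each stratum. Two choices here are assumptions, not taken from the slides: the sample is allocated in proportion to stratum size N_h (rounded by the largest-remainder rule so the parts add up to n), and the mean is estimated as ȳ_st = Σ W_h ȳ_h with W_h = N_h / N. The `strata` mapping from stratum label to unit values is hypothetical.

```python
import math
import random
import statistics


def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to N_h, using largest-remainder rounding."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: math.floor(q) for h, q in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata, n, rng=random):
    """Independent SRS within each stratum, sized by proportional allocation."""
    alloc = proportional_allocation({h: len(u) for h, u in strata.items()}, n)
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}


def stratified_mean(sample, stratum_sizes):
    """ybar_st = sum of W_h * ybar_h; assumes every stratum has at least one sampled unit."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * statistics.fmean(ys) for h, ys in sample.items())
```

For strata of 20 and 80 units and n = 10, the allocation is 2 and 8. Disproportionate allocation (e.g. oversampling a small domain of interest) would only change `proportional_allocation`, though the estimator would then rely on the W_h weights even more.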

Probabilistic Sampling – Multi-stage & Multi-phase Sampling
• Multi-stage sampling
- The process of selecting a sample in two or more successive stages
- The units at each stage are different in structure and are hierarchical
- Commonly used with area frames to overcome the inefficiencies of one-stage cluster sampling
- Can have any number of stages, but the complexity of the design increases with the number of stages

• Multi-phase sampling
- Collects basic information from a large sample of units and then, for a subsample of these units, collects more detailed information
- Useful when the frame lacks auxiliary information that could be used to stratify the population
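
The difference between the two designs can be sketched in a few lines, assuming SRS at every stage or phase (real designs often use PPS or stratification instead). In the hypothetical two-stage example, primary sampling units (PSUs) such as blocks are drawn first and then units within them; in two-phase sampling the same kind of unit is sampled twice, the second sample being a subsample of the first, once basic information has been collected.

```python
import random


def two_stage_sample(psus, m, n_per_psu, rng=random):
    """Stage 1: SRS of m primary sampling units (PSUs).
    Stage 2: SRS of up to n_per_psu secondary units within each selected PSU."""
    chosen = rng.sample(sorted(psus), m)
    return {p: rng.sample(psus[p], min(n_per_psu, len(psus[p]))) for p in chosen}


def two_phase_sample(frame, n1, n2, rng=random):
    """Phase 1: large SRS for basic information; phase 2: SRS subsample of phase 1."""
    phase1 = rng.sample(frame, n1)  # basic information is collected from these units
    phase2 = rng.sample(phase1, n2)  # detailed information from this subsample
    return phase1, phase2
```

In practice the phase-1 information would typically be used to stratify the phase-2 subsample, which is where multi-phase sampling helps when the frame lacks stratification variables.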
