
AEA/CDC Summer Evaluation Institute

Offering 30: Basics of Probability and Purposeful Sampling

Description: Choosing and implementing an appropriate sampling strategy can affect the validity, credibility and cost of an evaluation. Some studies require sophisticated probability sampling methods to produce accurate estimates of the characteristics of the populations served or of the size of the effects of the program or policy on the target population. Other studies may appropriately use purposeful samples to support theory development or to do detailed case analysis. In this workshop, you will be exposed to alternative sampling strategies that are frequently used in evaluation and social research. The instructor will address the 14 questions from his book Practical Sampling (Sage, 1990) that should be answered prior to sample design, as a part of sample design, and prior to analysis of the data. You will become acquainted with ways to plan and implement sampling strategies that meet the needs of an evaluation. Examples will be used to illustrate the designs and issues that arise in implementation, and you will have the opportunity to raise specific sampling issues that you have encountered in your own work.

Audience: Attendees who are new to sampling and working in any context

Gary T. Henry holds the Duncan MacRae ’09 and Rebecca Kyle MacRae Professorship of Public Policy in the Department of Public Policy and directs the Carolina Institute for Public Policy at the University of North Carolina (UNC) at Chapel Hill. He also holds an appointment as Senior Statistician at the Frank Porter Graham Institute for Child Development at UNC-Chapel Hill. He previously served as the Director of Evaluation and Learning Services for the David and Lucile Packard Foundation. Henry has evaluated a variety of public policies and programs and is the author of Practical Sampling (Sage 1990), Graphing Data (Sage 1995) and the coauthor of Evaluation: An Integrated Framework for Understanding, Guiding, and Improving Policies and Programs (Jossey-Bass 2000). He received the Evaluation of the Year Award from the American Evaluation Association in 1998 for his work with Georgia’s Council for School Performance and the Joseph S. Wholey Distinguished Scholarship Award in 2001 from the American Society for Public Administration and the Center for Accountability and Performance.

Offered (Two Rotations of the Same Content - Do not register for both):
• Tuesday, June 24, 9:25 – 12:45 (20 minute break within)
• Wednesday, June 25, 9:25 – 12:45 (20 minute break within)

Sampling 101: Basics of Probability and Purposeful Sampling

Gary T. Henry

AEA/CDC Summer Evaluation Institute June 2008

Agenda

1. The Basics
   – Defining Sample
   – Sampling, Validity, and Error

2. Purposeful Sampling
   – Justification
   – Types

3. Introduction to Probability Sampling
   – Error and Sampling
   – Sources of Error
   – Target Population and the Sample Frame

4. Probability Sampling Methods
   – Simple random sampling
   – Systematic sampling
   – Stratified sampling
   – Cluster sampling
   – Multi-stage sampling

5. Making Choices
   – Pre-sampling Choices
   – Sampling Choices
   – Post-sampling Choices

Sampling Basics

Defining Sample
– Specimen NOT!
– A subset of the population that will be used as a model
– Representative?
   Judgment not a description
   Probability or purposeful, explain what and why

Validity Issues for Sampling
– External validity -- persons, settings, and time
– Sample size and establishing covariation
– Reliability of measures
– Sampling bias and covariation

Error, Precision, and Power

Purposeful Sampling

Non-probability Sampling Approaches
– Convenience samples
   Internal & external validity
– Contrasting cases
   Counterfactual evidence for explanation & theory development
   Intentional heterogeneity
– Typical case
– Critical case
– Snowball
– Quota

Purposeful Sampling: Counterfactual

A purposeful sample for the qualitative part of the ECS Mixed Methods Design
– Contrasting cases
– Counterfactual comparisons
– Theory development

Strategy: choosing children from high poverty households that beat the odds and those who are behind their peers at preschool to explore family differences

ECS Qualitative Study Design

Based upon a composite measure of their baseline scores on the assessments with national norms, a sample of children who met or exceeded age-based expectations (termed “resilient”) and a group that fell below age-based expectations (termed “non-resilient”) were selected for the in-depth study. Of the 46 children selected and located in a first-grade or kindergarten classroom, interviews were conducted with 36 parents (32 mothers, 2 fathers) or guardians (2 extended family members).

ECS Qualitative Study Design

Table 4.3 Letter Word-Recognition and Basic Math Test Scores and Gains from Preschool
Columns: WJ-LW Fall 2001, WJ-LW Spring 2003, WJ-LW Spring 2004, WJ-AP Fall 2001, WJ-AP Spring 2003, WJ-AP Spring 2004
Rows: Overall Study Sample, Resilient Children, Non-Resilient Children, Difference Between Resilient and Non-Resilient
[Table 4.3 cell values: scores with gains in parentheses]

ECS Non-probability Sample

Role of religion and faith-based communities

Parental Sense of Purpose differed between the parents of resilient children and the parents of non-resilient children

Four themes emerged as related to sense of purpose:
– Quality of and Time spent interacting with Child
– Value of Education
– Advocacy on Behalf of the Child
– Parent Outlook

Role as Parents
– Non-resilient: Pal looking for Approval or Harsh Guide
– Resilient: Pals and Guides as the Situation Demands

A Cautionary Tale about Non-probability Samples

The most infamous example of the bias that can occur with quota sampling involves polling for the 1948 presidential election. This election pitted Harry Truman, the Democratic candidate who had assumed the presidency when Franklin D. Roosevelt died, against Thomas Dewey, the Republican candidate. Three polling firms, Gallup, Roper, and Crossley, all declared Dewey the winner based on interviews that had been conducted using quota sampling methods. Of course, Truman won with just under 50 percent of the popular vote, beating Dewey by nearly 5 percentage points. After that election, newspapers, as well as many commentators, were filled with stories about the failure of the pollsters. The quota samples were carefully matched to Census data on age, race, gender, rent paid, and residence, but the interviewers had chosen too many Republicans. Republicans were easier to interview, either more willing to participate or more approachable, and thus Democrats were under-represented in the interviews. Subsequently, all three polling firms dropped quota sampling techniques.

Introduction to Probability Sampling

Bias
– Impact of human discretion in selection
– Remove, minimize, estimate

Variability -- Sampling error
– Estimate, tolerate (intervals and effect size)

Credibility and Validity Issues:
– Sampling frame or non-frame options
– Precision or power
– Total Error: Bias and Variability

Sources of Bias and Variability

Simple Random Sampling

– Equal probability of selection
– Every unit has an equal, known, non-zero probability of selection

Requirements:
– A list of the entire study population
– A fixed number of units to be selected
– A random selection mechanism

Situations
– Population list is available and accurate
– No other variables known or useful
– Data collection does not require travel
– Sampling handled centrally
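The three requirements above (a complete list, a fixed sample size, a random selection mechanism) can be sketched in a few lines. This is a minimal illustration, not the instructor's procedure: the `frame` of client records, the function name, and the seed are all hypothetical.

```python
import random

def simple_random_sample(population_list, n, seed=None):
    """Draw n units without replacement; each unit has probability n/N."""
    if n > len(population_list):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(population_list, n)

# Hypothetical study population: 1,000 numbered client records.
frame = [f"client-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, n=50, seed=2008)
print(len(sample), "units selected from", len(frame))
```

Because the draw is without replacement from the full list, no unit on the list is excluded and every unit has the same known chance of selection.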

Simple Random Sampling

Sampling Error (SRS):
$s_{\bar{x}} = s / \sqrt{n}$ for continuous variables
$s_p = \sqrt{pq / n}$ for proportions
Finite population correction $(1 - n/N)$, if n > .25(P)

Statistical software assumes SRS:
– Estimates likely to be biased, if uncorrected
– Sampling errors likely to be biased, if uncorrected

Sampling Error for SRS

Table 1: Sampling and Tolerable Error Calculations for Simple Random Samples

Sampling Errors for Proportions (SRS)
                                        Sample Sizes
Proportion  Percentage     30      50     100     500     800   1,000   1,200
0.05          5.00%      0.040   0.031   0.022   0.010   0.008   0.007   0.006
0.10         10.00%      0.055   0.042   0.030   0.013   0.011   0.009   0.009
0.25         25.00%      0.079   0.061   0.043   0.019   0.015   0.014   0.013
0.40         40.00%      0.089   0.069   0.049   0.022   0.017   0.015   0.014
0.50         50.00%      0.091   0.071   0.050   0.022   0.018   0.016   0.014
0.60         60.00%      0.089   0.069   0.049   0.022   0.017   0.015   0.014
0.75         75.00%      0.079   0.061   0.043   0.019   0.015   0.014   0.013
0.90         90.00%      0.055   0.042   0.030   0.013   0.011   0.009   0.009
0.95         95.00%      0.040   0.031   0.022   0.010   0.008   0.007   0.006

Tolerable Errors (95% Confidence)
                                        Sample Sizes
Proportion  Percentage     30      50     100     500     800   1,000   1,200
0.05          5.00%      0.078   0.060   0.043   0.019   0.015   0.014   0.012
0.10         10.00%      0.107   0.083   0.059   0.026   0.021   0.019   0.017
0.25         25.00%      0.155   0.120   0.085   0.038   0.030   0.027   0.025
0.40         40.00%      0.175   0.136   0.096   0.043   0.034   0.030   0.028
0.50         50.00%      0.179   0.139   0.098   0.044   0.035   0.031   0.028
0.60         60.00%      0.175   0.136   0.096   0.043   0.034   0.030   0.028
0.75         75.00%      0.155   0.120   0.085   0.038   0.030   0.027   0.025
0.90         90.00%      0.107   0.083   0.059   0.026   0.021   0.019   0.017
0.95         95.00%      0.078   0.060   0.043   0.019   0.015   0.014   0.012

Table 1 (cont.): Sampling and Tolerable Error Calculations for Simple Random Samples

Sampling Errors for Continuous Variables (SRS)
Standard                                Sample Sizes
Deviation (S)      30      50     100     500     800   1,000   1,200
  0.01           0.002   0.001   0.001   0.000   0.000   0.000   0.000
  0.05           0.009   0.007   0.005   0.002   0.002   0.002   0.001
  0.10           0.018   0.014   0.010   0.004   0.004   0.003   0.003
  0.50           0.091   0.071   0.050   0.022   0.018   0.016   0.014
  1.00           0.183   0.141   0.100   0.045   0.035   0.032   0.029
  3.00           0.548   0.424   0.300   0.134   0.106   0.095   0.087
  5.00           0.913   0.707   0.500   0.224   0.177   0.158   0.144
 10.00           1.826   1.414   1.000   0.447   0.354   0.316   0.289
100.00          18.257  14.142  10.000   4.472   3.536   3.162   2.887

Tolerable Errors (95% Confidence)
Standard                                Sample Sizes
Deviation (S)      30      50     100     500     800   1,000   1,200
  0.01           0.004   0.003   0.002   0.001   0.001   0.001   0.001
  0.05           0.018   0.014   0.010   0.004   0.003   0.003   0.003
  0.10           0.036   0.028   0.020   0.009   0.007   0.006   0.006
  0.50           0.179   0.139   0.098   0.044   0.035   0.031   0.028
  1.00           0.358   0.277   0.196   0.088   0.069   0.062   0.057
  3.00           1.074   0.832   0.588   0.263   0.208   0.186   0.170
  5.00           1.789   1.386   0.980   0.438   0.346   0.310   0.283
 10.00           3.578   2.772   1.960   0.877   0.693   0.620   0.566
100.00          35.785  27.719  19.600   8.765   6.930   6.198   5.658

Power Analysis
See other hand-out
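The Table 1 entries follow directly from the two SRS formulas, with the tolerable error taken as 1.96 standard errors for 95% confidence. The sketch below is one way to reproduce a cell; the function names are illustrative, and applying the finite population correction as a square-root factor on the standard error is an assumption (the usual convention), since the slide only gives the factor (1 - n/N).

```python
import math

def se_mean(s, n, N=None):
    """Sampling error of a mean: s / sqrt(n), optionally with the FPC."""
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(1 - n / N)  # assumption: FPC applied to the variance
    return se

def se_proportion(p, n, N=None):
    """Sampling error of a proportion: sqrt(pq / n), optionally with the FPC."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt(1 - n / N)
    return se

def tolerable_error(se, z=1.96):
    """Half-width of the confidence interval; z = 1.96 gives 95% confidence."""
    return z * se

se = se_proportion(0.50, 100)
print(f"{se:.3f} {tolerable_error(se):.3f}")  # 0.050 0.098
print(f"{se_mean(100.0, 30):.3f}")            # 18.257
```

Reading the result the other way around gives a rough sample-size check: with p = 0.50 and n = 100, estimates are expected to fall within about 10 percentage points of the population value 95% of the time.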

Systematic Sampling

Characteristics
– Equal probability of selection
– Every unit has an equal, known, non-zero probability of selection
– List may be ordered to simulate proportional stratification

Requirements:
– Known number of units in the study population
– A fixed number of units to be selected
– Selection interval: i = N/n, round down to nearest integer
– Random start between 1 and i
– Representation of the study population (files, voters, etc.)
– Select every ith unit

Systematic Sampling

Situations
– Population list is available or is unavailable but physically represented (e.g. files)
– Number of population members known
– List or files are not cyclically arranged
– Data collection does not require travel
– Sampling handled centrally or in field
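The selection rule above (interval i = N/n rounded down, random start between 1 and i, then every ith unit) can be sketched as follows. The function name and the numbered frame are hypothetical; stopping after exactly n units is an assumption made to honor the "fixed number of units" requirement.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units: random start in 1..i, then every i-th unit."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    interval = N // n                  # i = N/n, rounded down
    rng = random.Random(seed)
    start = rng.randint(1, interval)   # random start between 1 and i (1-based)
    # start + (n - 1) * i never exceeds N, so n units are always available.
    positions = [start + k * interval for k in range(n)]
    return [frame[pos - 1] for pos in positions]

# Hypothetical list of 1,000 numbered files; interval is 1000 // 50 = 20.
files = list(range(1, 1001))
chosen = systematic_sample(files, n=50, seed=7)
print(chosen[:3])
```

When N is not a multiple of n, the units beyond position n * i are never reached by this sketch, which is one reason the number of population members should be known before the interval is set.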

Stratified Random Sampling

Stratified Sampling Characteristics
– Equal or unequal probability of selection
– Every unit has a known, non-zero probability of selection

Requirements:
– A list of the entire study population
– One or more known variables for every member of the population identified
– Every member of the population assigned to one stratum based on variable(s)
– A fixed number of units to be selected within each stratum
– Selection can be proportional or disproportional within strata
– A random selection mechanism

Stratified Random Sampling

Situations
– Population list is available and accurate
– Other variables known or useful, especially when correlated with dependent variable(s) for study
– Data collection does not require travel
– Sampling handled centrally
– Variances are higher in some strata
– Efficiency is very important
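A proportional stratified draw can be sketched as below: units are grouped by a known variable, and a simple random sample is taken inside each stratum in proportion to its size. The frame, the `stratum_of` function, and the allocation by rounding are hypothetical choices for illustration. The second function uses the usual textbook stratified standard error, the square root of the sum of w_k^2 s_k^2 / n_k with w_k = N_k / N, which is an assumption about notation rather than a quotation from the handout.

```python
import math
import random
from collections import defaultdict

def proportional_stratified_sample(frame, stratum_of, n, seed=None):
    """Draw about n units, allocating to strata in proportion to their size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # each unit lands in one stratum
    N = len(frame)
    sample = {}
    for name, units in strata.items():
        # Rounding can make the total differ slightly from n.
        n_k = min(len(units), max(1, round(n * len(units) / N)))
        sample[name] = rng.sample(units, n_k)
    return sample

def stratified_se(strata_stats):
    """strata_stats: list of (N_k, s_k, n_k) tuples, one per stratum."""
    N = sum(N_k for N_k, _, _ in strata_stats)
    return math.sqrt(sum((N_k / N) ** 2 * s_k ** 2 / n_k
                         for N_k, s_k, n_k in strata_stats))

# Hypothetical frame: 600 urban and 400 rural households.
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = proportional_stratified_sample(frame, lambda u: u[0], n=100, seed=1)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

A disproportional design would replace the proportional allocation with stratum-specific sample sizes, for example larger samples where variances are higher, and would then require weights at the analysis stage.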

Stratified Random Sampling Error

Sampling Error Stratified Sample
$s_{\bar{x}} = \left[ \sum_k \left( w_k^2 s_k^2 / n_k \right) \right]^{1/2}$
where $w_k = N_k / N$ and $n_k$ is the number of units sampled in stratum k

Cluster Sampling

Cluster Sampling Characteristics
– Equal probability of selection for clusters
– Every cluster has a known, non-zero probability of selection

Requirements:
– A list of the clusters that include the entire study population
– A fixed number of clusters to be selected
– A random selection mechanism
– The average number and range of units in a cluster

Cluster Sampling

Situations
– Cluster list is available and accurate
– Every member of the study population is in one and only one cluster
– Clusters may be stratified
– Data collection does require travel
– Sampling handled centrally
– Every member of the cluster is surveyed which can cause some difference in the number of population members included

Cluster Sampling Error
$s = \left[ (1 - c/C) \sum (\Theta_C - x_c)^2 / \big( (c - 1)(c) \big) \right]^{1/2}$
Where
c = number of clusters selected
C = total number of clusters in population
$\Theta_C$ = grand mean
$x_c$ = cluster mean for cluster c
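The cluster design and its error formula can be sketched together: whole clusters are drawn at random, every member of a selected cluster is surveyed, and the standard error is computed from the spread of the cluster means. The cluster names and scores are made up, and taking the grand mean as the unweighted mean of the selected cluster means is an assumption, since the slide only labels it "grand mean".

```python
import math
import random

def cluster_sample(clusters, c, seed=None):
    """Randomly select c whole clusters; every member of each is surveyed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), c)
    return {name: clusters[name] for name in chosen}

def cluster_se(cluster_means, C):
    """s = sqrt((1 - c/C) * sum((grand mean - cluster mean)^2) / ((c - 1) * c))."""
    c = len(cluster_means)
    grand_mean = sum(cluster_means) / c  # assumption: mean of the cluster means
    ss = sum((grand_mean - m) ** 2 for m in cluster_means)
    return math.sqrt((1 - c / C) * ss / ((c - 1) * c))

# Hypothetical population: 30 classrooms, each with a list of test scores.
rng = random.Random(0)
classrooms = {f"room-{k:02d}": [rng.gauss(100, 15) for _ in range(20)]
              for k in range(30)}
selected = cluster_sample(classrooms, c=6, seed=3)
means = [sum(scores) / len(scores) for scores in selected.values()]
print(f"standard error from {len(means)} clusters: {cluster_se(means, C=30):.2f}")
```

Because the error depends on how much the cluster means differ from one another, adding more clusters usually reduces it faster than adding more members per cluster.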

Complex Sample Designs

Combine Sampling Methods

Benefits
– Efficiency
– Cost reduction
– Reduce bias from available sampling frames

Most useful approaches
– Stratified cluster sample
– Multi-stage samples with stratification

Complex Sample Designs: Example

Sample Stratification and Selection of Children for ECS
1. County by strata
2. Sites within counties from list
3. Classroom from within site
4. Eligible children (with signed consent) from within classrooms
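The four selection stages listed above can be sketched as nested random draws. Everything in this example is hypothetical: the nested `frame` structure, the names, and the number of units taken at each stage are illustrative and are not the ECS study's actual allocation.

```python
import random

def multistage_sample(frame, counties_per_stratum, sites_per_county,
                      classrooms_per_site, children_per_classroom, seed=None):
    """Stage 1: counties by stratum; 2: sites; 3: classrooms; 4: children."""
    rng = random.Random(seed)
    selected = []
    for stratum, counties in frame.items():
        for county in rng.sample(sorted(counties),
                                 min(counties_per_stratum, len(counties))):
            sites = counties[county]
            for site in rng.sample(sorted(sites),
                                   min(sites_per_county, len(sites))):
                rooms = sites[site]
                for room in rng.sample(sorted(rooms),
                                       min(classrooms_per_site, len(rooms))):
                    children = rooms[room]  # eligible children only
                    k = min(children_per_classroom, len(children))
                    for child in rng.sample(children, k):
                        selected.append((stratum, county, site, room, child))
    return selected

# Hypothetical frame: 2 strata x 5 counties x 3 sites x 2 classrooms x 12 children.
frame = {
    s: {f"county-{s}-{c}": {f"site-{t}": {f"room-{r}": [f"child-{k}" for k in range(12)]
                                          for r in range(2)}
                            for t in range(3)}
        for c in range(5)}
    for s in (1, 2)
}
children = multistage_sample(frame, 2, 2, 1, 4, seed=24)
print(len(children), "children selected")
```

In a design like this, selection probabilities differ across strata and stages, so each child's overall probability is the product of the stage probabilities and is normally carried forward as a weight.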

9% 22.0% Sampling Design Questions Pre-Sampling 1. Is sampling appropriate? Strategies for developing and selecting among alternative sampling frames 16 .1% 100.526 23. What is the nature of the study? 2.999 1000 .0% 100.3999 4000 + Total Counties per Strata 70 59 17 9 4 159 Strata Counties Selected 4 8 4 4 4 24 Pre-K Sites 4 16 13 16 20 69 HS Sites 4 8 4 4 9 29 Private Sites 3 5 4 7 11 30 Population Estimates Estimated Population 9.2% 18.095 119.228 Sample Size 1 2 3 4 5 % 8. What are the variables of greatest interest? 3.756 26.3% 31.349 37.2% 22. How will the data be collected? 6.4% 17. Estimated Population and Sample Distribution in 5 Strata Sites Selected Estimated 4 year-olds per County < 250 250 .0% n 50 141 110 134 195 630 % 7.1999 2000 .6% 31.Complex Sample Designs: Example Table 2. Are the subpopulations or groups important? 5.5% 21. What is the target population? 4.502 22.9% 19.

Sampling Design Questions

Sampling
7. What list of the target population can be used?
8. What is the tolerable error?
9. What type of sampling method will be used?
10. Will the probability of selection be equal or unequal?
11. How many units will be selected?

Strategies for resource utilization: sample size versus follow-up trade-off

Sampling Design Questions

Post-Sampling
12. How can non-response be evaluated?
13. Is weighting necessary?
14. What are the standard errors and confidence intervals?

Weighting issues and missing data

Post-stratification Weights

                              Response   Population    Sample
School   FTE    Responses     Rate       Proportion    Proportion   Weight
A         29       11          38%         0.0278        0.0161     1.7256
B         33       26          79%         0.0316        0.0380     0.8308
C         23        7          30%         0.0220        0.0102     2.1506
D         17       14          82%         0.0163        0.0205     0.7948
E         37       29          78%         0.0354        0.0424     0.8351
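The weights in this table appear to be the population proportion divided by the sample proportion for each school. The sketch below checks that reading against the five rows; the function name is illustrative, and because the printed proportions are rounded to four decimals the recomputed weights match the table only to about two decimal places.

```python
def post_stratification_weight(population_proportion, sample_proportion):
    """Weight that restores a group's share of the sample to its population share."""
    return population_proportion / sample_proportion

# (population proportion, sample proportion, weight) as printed in the table.
schools = {
    "A": (0.0278, 0.0161, 1.7256),
    "B": (0.0316, 0.0380, 0.8308),
    "C": (0.0220, 0.0102, 2.1506),
    "D": (0.0163, 0.0205, 0.7948),
    "E": (0.0354, 0.0424, 0.8351),
}

weights = {name: post_stratification_weight(pop, samp)
           for name, (pop, samp, _) in schools.items()}

for name, (_, _, printed) in schools.items():
    print(f"School {name}: recomputed {weights[name]:.2f}, table {printed:.4f}")
```

Schools with low response rates (A and C) are under-represented among the responses and receive weights above 1; schools with high response rates receive weights below 1.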

PLANNING APPLIED RESEARCH
Practical Sampling

4 PRACTICAL SAMPLING

Gary T. Henry

Conducting an applied research project that involves primary data collection requires that the study team develop and implement a sampling plan, which includes deciding how individuals or other units will be selected, carrying out the selection process, encouraging participation of those selected, and assessing the extent to which departures from the expectations set when planning the sampling process may have affected the study findings. The study population for an applied research project can be individuals or other units, such as cities, hospitals, or defined geographic areas like census tracts. When individuals are the focus of a study, they can be members of a general population, which is defined by age and place of residence at a specific time, for example, adults living in New York between October 1 and October 27, 2006, or members of a special population. Special populations are usually defined by participation or membership in a specific group during a prescribed time period, such as eighth-graders enrolled in public schools in North Carolina during the 2005-2006 school year or adult mental health service consumers in Seattle who initiated service in 2006. In most cases, evaluations and other applied studies focus on special populations, often on populations who are eligible to participate in a certain program or those who have actually received services. However, there are examples of general population surveys that are used for applied research purposes, such as statewide polls reported in the news media or surveys for assessing specific needs or measuring attitudes of the population concerning their support for a new program or policy.

Sampling or selecting a subset of the population is a part of most applied research projects. Sampling is required when not all members of the study population can be surveyed or included in the data collection. Constraints on time and budget often limit the number of members of the population who can be the subjects of the data collection and thereby require that only a subset of the population be selected for a study. When a subset of a study population is to be selected for data collection, the selection process is known as sampling. Not all studies involve sampling; for example, census surveys in which the entire study population is selected for the study do not require sampling. However, even when census surveys are to be used, many of the planning and implementation procedures related to sampling which are described in this chapter, such as obtaining an accurate listing of the study population and evaluating the impact of non-response, are germane.

However, it is important to note that even on the few occasions when resources permit collecting data from the entire study population, it can be more accurate to collect data from a sample than to conduct a census survey. Greater accuracy can be achieved when choosing a subset of the population allows the researchers to use their resources to encourage more of the selected members of the population to participate in the data collection, thereby reducing the amount of missing data (Dillman 1999, Fowler 1993), or to improve measurement techniques in ways that could not have been done if they had attempted to collect data from all members of the study population. For example, evaluations of early childhood education programs often face the choice of using teacher ratings of the children’s skills which are collected for the entire population or direct assessments of a sample of the children who attend these programs. Because of the bias that can result in having teachers rate the skills of their students (Mashburn & Henry, 2004), scores on direct assessments from a sample of children can be more accurate measures of the children’s skills than teachers’ ratings on the entire population of children.

Probability and Non-probability Sampling

Samples are generally categorized as either probability samples or non-probability samples. The distinction between the two is that probability samples use random processes rather than human judgments to select the individuals or other units for a study. Non-probability samples allow human judgments, either purposefully or unintentionally, to influence which individuals or units are selected for a study. Researchers using probability samples forego exercising their judgments about which individuals are selected for a sample by allowing a random process to decide which members of the study population are designated for participation in the study. The use of probability sampling techniques can enhance the accuracy and credibility of the study findings. Relying on random procedures to select the sample for a study eliminates a very important source of bias from the study. The major benefit of eliminating human judgments in the selection process is that the probability sample that results is a statistical model of the study population. Relying on random processes to choose the members of the study population who are selected for the study allows researchers to use well-grounded theories and methods to estimate the characteristics of the study population from the sample data or to test hypotheses about the study population.

Major purposes of probability samples are to estimate characteristics of the population from the sample data or to use sample data to infer that a difference exists between two groups in the study population or between members of the study population at two time periods. Probability samples make it possible to estimate averages or percentages for the study population (as well as other population parameters), estimate the range around the sample average (or other population parameter) within which the true average for the population is likely to occur, test hypotheses about the study population, and calculate indicators of sample bias when bias cannot be entirely eliminated. It is possible to calculate these estimates because probability sampling rests on probability theory. Probability theory requires that every member of the study population must have a known, non-zero chance of being included in the sample. This means that no known member of the study population is excluded from the sample and that all members have a known probability of selection.

Non-probability samples are used to guide data collection about the specific experiences of some members of the study population, to explore a perceived social problem or issue, or to develop theories that are grounded in the actual experiences of some actual members of the study population. Often the cases selected through purposeful nonprobability sampling have particular theoretical or practical significance and can be used for developing theories or to generate explanations for the ways in which interesting or high performing cases differ from other cases. When non-probability samples are used, it is not reasonable to attribute the results to the entire study population. This limit on attributing the sample results to the study population is imposed since the judgments that led to selection of the sample, whether purposeful or merely convenient, can create bias. That is, the selected cases can be systematically different from the others in the study population and there is no means to adjust or estimate how similar or different these cases selected through non-probability sampling may be. This situation contrasts with probability samples, where the “random chance” of selection allows the sample to model the study population.

Put another way, non-probability samples are best used to provide information about specific cases or members of the study population that are intrinsically interesting or important for the study. In contrast, probability samples exist to provide information about the study population and to allow researchers familiar with the particular study population and measures to assess the adequacy of the sample from which the data were drawn for the study’s purpose.

Perhaps the most infamous case in which the characteristics of non-probability samples were attributed to the study population occurred in the polling done to predict the 1948 presidential election in the U.S. Three prominent polling firms, all of which used a form of non-probability sampling known as quota sampling, were convinced that Thomas Dewey would defeat Harry Truman by a significant margin. Truman actually received 50% of the popular vote, compared with Dewey’s 45%. The subjective bias of interviewers tilted toward the selection of more Republicans for interviews, even though the sample proportions matched the voting population proportions in terms of location, age, race, and economic status. The unintended bias affected the accuracy and credibility of the polls and caused polling firms to begin to use more costly probability samples. It will be interesting to follow the use of internet surveys to predict elections to see if they suffer a similar fate. These types of internet surveys use non-probability sampling procedures and it remains to be seen if the polling organizations are able to model the processes by which individuals are selected for the surveys and agree to participate in them, and if the relationship between their responses and the actual vote can be used to predict the voting totals accurately.

An important difference between the use of probability samples and nonprobability samples is in the rigorous tracking and reporting of the potential for bias from probability samples. Individuals, whether they have been selected by random processes or human judgments, have a right to exercise their own judgments about participation in the study. Just as the researchers can exercise judgment in the selection processes, the individuals selected have a right to choose if they will participate in a study. While probability samples eliminate researchers’ judgments about which individuals will be selected to participate in a study, both probability and non-probability samples have the potential for systematic error, also referred to as bias, in attributing sample characteristics to the entire study population when individuals decide not to participate in a study. For example, it is often required or at least commonly expected that researchers using probability samples will use standard definitions for calculating response rates, such as those that have been promulgated by the American Association of Public Opinion Research (2006). Response rates are the selected sample members that participated in the study divided by the total sample and expressed in percentage terms. Reporting the response rates using the standard calculation methods makes the potential for bias transparent to the reader. While similar monitoring and reporting procedures could be applied to nonprobability samples, presenting information about participation rates is highly variable and much less standardized. It is very difficult if not impossible to specify what levels of response rates are required to reduce bias to a negligible amount. For example, Keeter et al. (2000) show that it is extremely rare for findings to differ in a statistically significant way between a survey with an exceptionally high response rate (60.6 percent) and one with a more common response rate (36.0 percent).

As this discussion begins to show, probability and non-probability samples differ in very fundamental and significant ways. Perhaps the most significant difference is whether the sample data present a valid picture of the study population or rather are used to provide evidence about the individuals or cases in the sample itself. Before beginning to develop a sampling plan, the research team must make a definitive statement about the purpose for which the study is undertaken. For studies that are undertaken to describe the study population or test hypotheses that are to be attributed to the membership of the study population, probability samples are required. Non-probability sampling is appropriate when individuals or cases have intrinsic interest or when contrasting cases can help to develop explanations or theories about why differences occur. The evaluation literature is filled with exemplary or best case studies and studies that seek to explain the differences in more and less successful individuals. Using non-probability samples for these studies makes good sense and can add explanatory evidence to the discussion about how to improve social programs. However, once the decision is made to use non-probability sampling methods, it is inappropriate to present the findings in ways that suggest that they apply to the study population. Conversely, probability samples will not always produce sharp contrasts that allow for the development of explanatory theories. Therefore, the next section of the chapter provides some guidance about the types of non-probability samples that applied research could consider and the methods for implementing them. Then, we will turn to an in-depth coverage of probability sampling methods because these methods have been more extensively developed.

Non-probability Sampling

Non-probability samples are important tools for applied research which can be used to:
• choose cases that can be used to construct socially or theoretically significant contrasts,
• obtain evidence about individuals whose experiences are particularly relevant to the study’s research questions,
• obtain data at a low cost that motivates more extensive, systematic research,
• establish the feasibility of using particular instruments or survey procedures for more costly research using probability samples, or
• collect data about a group for whom it would be too costly or too difficult to use probability sampling techniques for a specific study.

Nonprobability samples are often used very effectively in qualitative research designs (see Maxwell, Chapter 3, this volume), but their utility is certainly not limited to qualitative studies.

A very important but perhaps underutilized non-probability sampling method is to select cases that allow the researchers to contrast high performing cases or individuals with lower performing cases or individuals in order to find differences between the two. Using this approach, which falls under the umbrella of contrasting cases designs and which will be discussed in more detail later, allows researchers to gather evidence on the characteristics or processes that differ between the higher and lower performing cases. These provide empirically grounded explanations of the differences that can be used as a basis for theory and further systematic assessment. Contrasting cases along with five other non-probability sampling designs that are used frequently in social research are listed in Table 4.1, along with descriptions of their selection strategies (each of these designs is described more fully in Henry, 1990).

Perhaps the most frequently used type of non-probability sample is the convenience sample. Convenience samples, although somewhat denigrated by their label, often capitalize on identifying individuals who are readily available to participate in a study or individuals for whom some of the needed study data have already been collected. Often convenience samples are used for studies where high degrees of internal validity or unbiased estimates of a program’s effects are needed but it is impractical to conduct the research in a way that allows for extrapolating the results to the entire population served by the program. An example of this type of sample is the study of the impact of pre-kindergarten in Oklahoma that used data collected about children enrolled in the pre-k program operated by Tulsa Public Schools (Gormley and Gayer, 2005). Gormley and Gayer made strategic use of available data and were able to calculate program impacts in ways that have enhanced knowledge about the impacts of state sponsored pre-kindergarten programs. However, the estimates of effects cannot be extrapolated beyond the Tulsa Public School population.

To illustrate the use of convenience samples, let’s consider a hypothetical example that is similar to actual studies in many fields. Psychologists interested in the relationship between violence in movies and aggressive behaviors by the American public may choose to recruit volunteers from an introductory psychology class in an experiment. The researchers may survey the students about their attitudes and behaviors relating to violence and then show them a movie containing graphic violence. After the movie, the researchers could administer the same survey a second time, which fits the schema of a simple pretest-posttest design (see Bickman et al., Chapter X, and Mark & Reichardt, Chapter X, this volume).

Table 4.1 Nonprobability Sample Designs

Type of Sampling: Selection Strategy
Convenience: Select cases based on their availability for the study and ease of data collection.
Contrasting cases: Select cases that are judged to represent very different conditions. Often well used when a theoretically or practically important variable can be used as the basis for the contrast.
Typical cases: Select cases that are known beforehand to be useful and not to be extreme.
Critical cases: Select cases that are key or essential for overall acceptance or assessment.
Snowball: Group members identify additional members to be included in sample.
Quota: Interviewers select sample that yields the same proportions as in the population on easily identified variables.

To expose and then clarify a point of confusion that often arises when discussing random samples (which I label probability samples, in part, to avoid this confusion) and random assignment, I will add a randomly assigned control group to this design. Before the treatment is administered (in this case, before the movie is shown), each student is randomly assigned to either a treatment group, that is, a movie with graphic violence, or a non-treated group which receives a placebo, a movie without violence. Random assignment means that the students are assigned by some method that makes it equally likely that each student will be assigned to either the treatment group or the placebo group (Boruch, Chapter X, this volume). In this case, the design employs random assignment but convenience sampling.

The strength of this design is in its ability to detect differences in the two groups that are attributable to the treatment, watching a violent movie. Although this type of design can rate highly in isolating the effect of violent movies, the convenience sample restricts the researchers’ ability to extrapolate or generalize the findings to the general population. The generalizability of findings refers to the external validity of the findings. The randomized assignment that was used increases the internal validity of a study, but it should not be confused with random sampling. Random sampling is a probability sampling technique that increases external validity. Although applied studies can be designed to provide high levels of both internal validity and generalizability, most prioritize one over the other due to practical concerns such as costs or study purposes or because there are gaps in the current knowledge about the topic that the research sets out to examine that lead to developing strategies to fill an important gap.

If we are interested in the effect of violent movies on the U.S. population, the use of a convenience sample severely constrains the study’s external validity. The differences in these two groups cannot be used to formally estimate the impact of violent movies on the U.S. population. Other conditions, such as age, may alter responses to seeing violent movies. The students in this sample are likely to be in their teens and early 20s if they were attending a traditional college or university and their reactions to the violent movie may be different from the reactions of older adults. Applying the effects found in this study to the entire U.S. population could be misleading.
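The distinction between random assignment and random sampling can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the function names, the population of 10,000 adults, and the class of 40 volunteers are all hypothetical and do not come from any actual study.

```python
import random

def random_sample(population, n, seed=None):
    """Random sampling: draw n units so that every unit in the population
    has the same known chance (n / N) of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def random_assignment(participants, seed=None):
    """Random assignment: shuffle the participants already in hand and
    split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical data (assumption): identifiers only, no real respondents.
population = [f"adult_{i}" for i in range(10_000)]
volunteers = [f"student_{i}" for i in range(40)]  # convenience sample

# Random assignment of a convenience sample supports internal validity.
treatment, placebo = random_assignment(volunteers, seed=1)

# Random sampling from the population supports external validity.
probability_sample = random_sample(population, 40, seed=1)
```

In the sketch, the two groups produced by random_assignment still consist only of the student volunteers, so any difference between them speaks to the effect of the treatment within that group. Only random_sample draws from the population to which one might wish to generalize.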

Convenience sampling and contrasting cases sampling are but two of the many types of non-probability samples that are frequently used in applied social research. Quota sampling, which was mentioned earlier, was frequently used by polling firms and other survey research organizations but has been largely discarded. Quota samples exactly match the study population on easily observed characteristics, but because the interviewers select the respondents, bias can produce significant differences between the sample and the study population.

Snowball samples are very commonly used for studies where the study population members are not readily identified or located. Examples of these types of populations are individuals involved with gangs, drugs, or other activities that are not condoned by society or populations that may be stigmatized or potentially suffer other consequences if their membership in the group is known, such as individuals living with HIV/AIDS or undocumented workers. Snowball sampling involves recruiting a few members of the study population to participate in the study and asking them to identify or help recruit other members of the study population for the study. Snowball samples may be significantly biased if the individuals recruited for the study have limited knowledge of other members of the group. However, when time and resources are limited or developing a list of the members is considered unethical, snowball samples may be used to obtain evidence about some members of the study population.

Probability Samples

As I stated earlier, probability samples have the distinguishing characteristic that each unit in the population has a known, nonzero probability of being selected for the sample. To have this characteristic, a sample must be selected through a random mechanism. Random selection mechanisms are independent means of selection that are free from human judgment and the other biases that can inadvertently undermine the independence of each selection.

Random selection mechanisms include a lottery-type procedure in which balls on which members of the population have been identified are selected from a well-mixed bowl of balls, a computer program that generates a random list of units from an automated listing of the population, and a random-digit dialing procedure that provides random lists of four digits matched with working telephone prefixes in the geographic area being sampled (see Lavrakas, Chapter 15, this volume, for an example). Random selection requires ensuring that the selection of any unit is not affected by the selection of any other unit. The procedure must be carefully designed and carried out to eliminate any potential human or inadvertent biases. Random selection does not mean arbitrary or haphazard (McKean, 1987).

The random selection process underlies the validity, precision, power, and credibility of sample data and statistics. The validity of the data affects the accuracy of generalizing sample results to the study population and drawing correct conclusions about the population from the analytical procedures used to establish differences between two groups or co-variation.

Sampling theory provides the basis for calculating the precision of statistics for probability samples. Because sampling variability has an established relationship to several factors (including sample size and variance), the precision for a specific sample can be planned in advance of conducting a study. Precision applies to the size of the confidence interval around a parameter estimate such as the mean or a percentage. The confidence interval is the interval around the sample mean estimate in which the true mean is likely to fall given the degree of confidence specified by the analyst. For example, when a newspaper reports that a poll has a margin of error of plus or minus 3 percent, that is a way of expressing the precision of the sample. It means that the analyst is confident that 95 out of 100 times, the true percentage will fall within 3 percentage points of the percentage estimated for the sample.

Power is closely related to precision. Power refers to the probability of detecting a difference of a specified size between two groups or a relationship of a specified size between two variables, given a probability sample of a specific size. The principal means of increasing precision and power is increasing sample size, although sample design can have a considerable effect as will be discussed later in this chapter.

Credibility, in large measure, rests on absence of perceived bias in the

sample selection process that would result in the sample being systematically different than the study population. Probability sampling can increase credibility by eliminating the potential bias that can arise from using human judgment in the selection process. Credibility is a subjective criterion while validity, precision, and power are objective criteria and have widely agreed upon technical definitions.

A distinct advantage of probability samples is that sampling theory provides the researcher with the means to decompose and in many cases calculate the probable error associated with any particular sample. One form of error is known as bias. Bias, in sampling, refers to systematic differences between the sample and the population that the sample represents. Bias can occur because the listing of the population from which the sample has been drawn (sampling frame) is flawed or because the sampling methods cause some populations to be overrepresented in the sample. Bias is a direct threat to the external validity of the results.

The other form of error in probability samples, sampling variability, is the amount of variability surrounding any sample statistic that results from the fact that a random subset of cases is used to estimate population parameters. Because a probability sample is chosen at random from the population, different samples will yield somewhat different estimates of the population parameter. Sampling variability is the expected amount of variation in the sample statistic based on the variance of the variable and the size of the sample.

Taken together, bias and sampling variability represent total error for the sample. Error can arise from other sources, as other contributors to this volume point out, but here the focus is on total error that arises from the design and administration of the sampling process. In the next sections, I describe the sources of total error in some detail.

Sources of Total Error in Sampling Design

The researcher can achieve the goal of practical sampling design by minimizing the amount of total error in the sample selection to an acceptable level given the purpose and resources available for the research. Total error is defined as the difference between the true population value for the target population and the estimate based on the sample data. Total error has three distinct components:
• Nonsampling bias: systematic error not related to sampling, such as differences in target and study populations or nonresponse
• Sampling bias: systematic error in the actual sampling that produces an overrepresentation of a portion of the study population, such as a sampling frame that lists some population members more than once
• Sampling variability: the fluctuation of sample estimates around the study population parameters that results from the random selection process

Each component of error generates specific concerns for researchers and all three sources of error should be explicitly considered in the sampling plan and adaptation of the plan during the research process. Each of the three components of total error and some examples of the sources of each are illustrated in Figure 4.1.

Because sample design takes place under resource constraints, decisions that allocate resources to reduce error from one component necessarily affect the resources available for reducing error from the other two components. Limited resources force the researcher to make trade-offs in reducing total error. The researcher must be fully aware of the three components of error to make the best decisions based on the trade-offs to be considered in reducing total error. Below, I describe each of the three sources of error, and then return to the concept of total error for an example.
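The difference between bias and sampling variability can be seen in a small simulation. What follows is a minimal sketch, not a procedure taken from the chapter: the population of 5,000 values, the sample size of 100, the 2,000 repeated draws, and the flawed frame that omits the lowest fifth of the population are all assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_sample_means(values, n, draws, seed=0):
    """Repeatedly draw simple random samples (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(values, n)) for _ in range(draws)]

# Hypothetical target population (assumption): 5,000 values, arbitrary units.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(5_000)]
true_mean = statistics.fmean(population)

# Complete frame: estimates scatter around the true value.
means = simulate_sample_means(population, n=100, draws=2_000)
spread = statistics.stdev(means)              # sampling variability
bias = statistics.fmean(means) - true_mean    # close to zero

# Rough benchmark for the spread: population SD divided by sqrt(n).
benchmark = statistics.pstdev(population) / 100 ** 0.5

# Flawed frame (assumption): the lowest 20% of units are not listed.
frame = sorted(population)[1_000:]
biased_means = simulate_sample_means(frame, n=100, draws=2_000)
frame_bias = statistics.fmean(biased_means) - true_mean  # systematic error

print(f"spread={spread:.2f} benchmark={benchmark:.2f}")
print(f"bias={bias:.2f} frame_bias={frame_bias:.2f}")
```

In this sketch, drawing more samples or larger samples shrinks the spread of the estimates but does nothing to the error introduced by the incomplete frame, which is why the components of total error have to be considered separately.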


Nonsampling Bias

Nonsampling bias is the difference between the true target population value and the population value that would be obtained if the data collection procedures were administered with the entire population. Nonsampling bias results from decisions as well as implementation of the decisions during data collection efforts that are not directly related to the selection of the sample. For example, the definition of the study population may exclude some members of the target population that the researcher would like to include in the study findings. Even if data were collected on the entire study population in this case, the findings would be biased because of the exclusion of some target population members. For example, using the Atlanta telephone directory as the sampling frame for the current residents of the Atlanta metropolitan area would produce biased estimates of household characteristics due to unlisted numbers, new listings, and residents without phones, including the homeless and those who rely exclusively on cellular phones.

Differences in the true mean of the population and the survey population mean arise from several sources. A principal difference relevant to sample design is the difference between the target population and the study population. The target population is the group about which the researcher would like to make statements. The target population can be defined based on conditions and concerns that arise from the theory being tested or factors specific in the policy or program being evaluated, such as eligibility criteria. For instance, in a comprehensive needs assessment for homeless individuals, the target population should include all homeless individuals, whether served by current programs or not. On the other hand, an evaluation of the community mental health services provided to the homeless should include only homeless recipients of community mental health care, which may exclude large numbers of the homeless. The target population for the needs assessment is more broadly defined and inclusive of all homeless.

Also, nonresponse creates nonsampling bias. Nonresponse results from the researcher’s inability to contact certain members of the population or from some target population members’ choice to exercise their right not to participate in a survey or provide other data for the research. If nonresponse is truly random, it does not represent a bias, but this is frequently not the case and nonresponse should never be assumed to be missing at random or even ignorable without careful examination. More frequently, nonrespondents come from a definable subgroup of the population that may regard the research project as less salient or more of an intrusion than others. The omission of subgroups such as these from the data that are actually collected creates a bias in the results.

Sampling Bias

Sampling bias is the difference between the study population value and the expected value for the sample. The expected value of the mean is the average of the means obtained by repeating the sampling procedures on the study population. The expected value of the mean is equal to the study population value if the sampling and calculation procedures are unbiased. Sampling bias can be subdivided into two components: selection bias and estimation bias.

Selection bias occurs when not all members of the study population have an equal probability of selection. Estimation procedures can adjust for the unequal probabilities, when the probabilities of selection are known. When the probability of selection is not equal, researchers adjust the estimates of the population parameters by using weights to compensate for the unequal probabilities of selection.

An illustrative example of selection bias is a case in which a sample is selected from a study population list that contains duplicate entries for some members of the population. In the citizen survey example presented in Henry (1990), two lists are combined to form the study population list: state income tax returns and Medicaid-eligible clients. An individual appearing on both lists would have twice the likelihood of being selected for the sample. It may take an inordinate amount of resources to purge such a combined list of all duplicate listings, but it could be feasible to identify sample members that appeared on both lists and adjust for the unequal probability of selection that arises. To adjust for this unequal probability of selection, a weight (w) equal to the inverse of the ratio of the

probability of selection of the unit to the probability of selection of units listed only once (r) should be applied in the estimation process: $w = 1/r = 1/2 = 0.5$. The probability of selection for this individual was twice the probability of selection for the members of the study population appearing on the list only once. Therefore, this type of individual would receive only one-half the weight of the other population members to compensate for the increased likelihood of appearing in the sample. The logic here is that those with double listings have been overrepresented by a factor of two in the sample and therefore must be given less weight in the estimation procedures to compensate.

Estimation bias occurs when the average calculated using an estimation technique on all possible simple random samples from a population does not equal the study population value. For example, the median is a biased estimate of the central tendency for the population. This is because the expected value of the sample median is not equal to the true study population mean. Generally, biased estimators, such as the median, are used to overcome other issues with the data and therefore, the estimation bias is outweighed by other factors. For example, the median income of a population is often estimated rather than the mean income because relatively few very high income individuals can cause the mean to be high relative to the median and to the income that most members of the population actually receive.

Sampling Variability

The final component of total error in a sample is directly attributable to the fact that statistics from randomly selected samples will vary from one sample to the next due to chance. In any particular sample, some members of the study population will be included and others will be excluded, which produces this variation. Because it is rare for sample estimates to be exactly equal to the study population value, it is useful to have an estimate of their likely proximity to the population value, or in the terms that I have used before, the precision of the sample estimate.

Sampling theory can be used to provide a formula to estimate the precision of any probability sample based on information available from the sample. Because the standard deviation for the population can be estimated from the sample information and the sample size is known, a formula can be used to estimate the standard deviation of the sampling distribution, referred to hereafter as the standard error of the estimate, the statistic that measures the final component of total error; in this particular case, the standard error of the mean:

$s_{\bar{x}} = s / \sqrt{n}$

where $s_{\bar{x}}$ is the estimate of the standard error of the mean, $s$ is the estimate of the standard deviation, and $n$ is the sample size. Using this formula allows the researcher to estimate the standard error of the mean, based solely on information from the sample.

Two factors have the greatest influence on the standard error: the amount of variation around the mean of the variable (standard deviation or square root of the variance) and the size of the sample. The larger the sample, the smaller the standard deviation of the sampling distribution. Smaller standard deviations reduce the sampling error of the mean.

The standard error is used to compute a confidence interval around the mean (or other estimate of a population parameter), or the range which is likely to include the true mean for the study population. The likelihood that the confidence interval contains the true mean is based on the t statistic chosen for the following formula:

$I = \bar{x} \pm t(s_{\bar{x}})$

The confidence interval is the most popular direct measure of the precision of the estimates and it is common practice to use the value that represents 95 percent confidence, 1.96, for t. In most cases, the researcher should report the confidence interval along with the point estimate for the mean to give the audience an understanding of the precision of the estimates.

Two more technical points are important for discussion here. First, probability sampling design discussions thus far in this chapter have assumed that the sample would be selected without replacement, that is, once a unit has been randomly drawn from the population to appear in the sample, it is set aside and not eligible to be selected again. Sampling without replacement limits the cases available for selection as more are drawn from the population. If a sample is drawn from a finite population, sampling without replacement may cause a finite population correction factor (FPC) to be needed in the computation of the standard error of the estimate. For the standard error of the mean, the formula using the FPC is

$s_{\bar{x}} = \sqrt{1 - n/N} \, (s / \sqrt{n})$

where $N$ is the size of the population. As a rule of thumb, the sample must contain more than 5% of the population to require the FPC. This is based on the fact that the finite population correction factor is so close to 1 when the sampling fraction is less than .05 that it does not appreciably affect the standard error calculation.

Second, standard error calculations are specific to the particular population parameter being estimated. For example, the standard error for proportions is also commonly used:

$s_p = \sqrt{pq/n}$

where $p$ is the sample proportion and $q = 1 - p$. Most statistics textbooks present formulas for the standard error of several estimators, including regression coefficients. Also, they are calculated for the statistic being used by almost any statistical software package. These formulas, like the formulas presented above, assume that a simple random sample design has been used to select the sample. Formulas must be adjusted for more complex sampling techniques (Henry, 1990; Kish, 1965).

One further note on terminology: The terms sampling error and standard error are used interchangeably in the literature. They are specific statistics that measure the more general concept of sampling variability. Standard error, however, is the preferred term. The common use of sampling error is unfortunate for two reasons. First, it implies an error in procedure rather than an unavoidable consequence of sampling. Second, the audience for a study could easily assume that sampling error is synonymous with the total error concept, which could lead to the audience’s ignoring other sources of error. For example, when newspapers report the margin of error for polling results that they publish (usually $s_p \times 1.96$), they typically ignore other sources of error, such as non-response that could be indicated by calculating and publishing the response rate using the appropriate formulas published by the American Association for Public Opinion Research (2006).

Total Error

Total error combines the three sources of error described above. Sample design is a conscious process of making trade-offs to minimize these three components of total error. Too frequently, reducing the standard error becomes the exclusive focus of sample design because it can be readily estimated. Because the two bias components cannot be calculated as readily, they are often given short shrift during the design process. When this occurs, sample planning is reduced to the calculation of sample size and selection of the type of probability sample to be used. However, failing to consider and to attempt to reduce all three components of total error sufficiently can reduce the validity and credibility of the study findings. In the next section of this

chapter, the practical sampling design framework will be described.

Practical Sampling Design Framework

The framework for practical sampling design is a heuristic tool for researchers and members of the audience for research findings to use in sample design as well as an aid in interpretation of the findings. The framework is, in essence, a series of choices that must be made, with each choice having implications for the validity and integrity of the study. My purpose in providing the framework here is to help researchers and consumers of research structure their thinking about design choices and the effects of their choices on total error.

No single sample design will accomplish all of the goals for studying a particular population and choices may be made differently by different research teams. The process involves both calculations and judgment. In some situations or with certain populations, some types of error raise greater concerns than others, so knowledge of prior research, including the sampling designs used in previous studies of the target population, may add important information to the sample planning process to fill in important gaps in knowledge about the population or program, to avoid problems experienced with the earlier studies, or to adhere to commonly accepted practices. While much of the framework applies to non-probability samples, especially the presampling questions, the framework was originally developed for probability samples.

The framework includes three phases of the overall design of the research project, which have been further subdivided into 14 questions (see Table 4.3). By answering the questions presented in the framework, applied researchers can assess the options available to reduce total error while developing a sample plan and adapting the plan to the unexpected events that occur when the plan is being implemented. As researchers work through the choices presented in the framework, issues may be raised that cause them to reassess earlier decisions.

Table 4.3 Questions for Sample Design

Presampling choices
• What is the nature of the study: exploratory, developmental, descriptive, or analytic?
• What are the variables of greatest interest?
• What is the target population for the study?
• Are subpopulations important for the study?
• How will the data be collected?
• Is sampling appropriate?

Sampling choices
• What listing of the target population can be used for the sampling frame?
• What is the precision or power needed for the study?
• What sampling design will be used?
• Will the probability of selection be equal or unequal?
• How many units will be selected for the sample?

Postsampling choices
• How can the impact of nonresponse be evaluated?
• Is it necessary to weight the sample data?
• What are the standard errors and related confidence intervals for the study estimates?

The answers to these questions will result in a plan to guide the sampling process, assist the researchers in analyzing the data correctly, and provide ways to assess the amount of error that is likely to be present in the

sample data. In the next three sections, we will focus on making choices that impact sample planning and implementation as well as understanding some of the implications of those choices. More detail on the implications of the various choices, as well as four detailed examples that illustrate how choices were actually made in four sample designs, is provided in Henry (1990). In addition, other chapters in this handbook provide discussion of the other issues.

Presampling Choices

What Is the Nature of the Study: Exploratory, Developmental, Descriptive, or Explanatory?

Establishing the primary purpose of the study is one of the most important steps in the entire research process (see Bickman et al., Chapter 1, this volume). Exploratory research is generally conducted to provide an orientation or familiarization with the topic under study. It serves to orient the researcher to salient issues, helps focus future research on important variables, and generates hypotheses to be tested. Exploratory research is often conducted on newly emerging social issues or recently developed social programs. In these cases, the research base is often slim or not much is known about the issue or program in the specific area or region in which the study has been commissioned. In some cases, exploratory studies are undertaken in the early phases of an evaluation and the findings are used to develop a plan for more thorough-going evaluation studies.

Sampling approaches for exploratory studies are quite reasonably limited by resource and time constraints placed on them. Preferred sampling methods include those that ensure a wide range of groups are covered in the study rather than those that reduce error, because estimates, such as averages and percentages, are not reasonable study products. Sample designs that ensure coverage of a wide range of groups or, said another way, intentionally heterogeneous samples are purposeful samples or small stratified samples. These approaches can yield a diverse sample at relatively low cost.

Developmental studies are a recent addition to the list of study purposes to emphasize the importance of studies that are commissioned for theory development or methodological development. Developing theories or explanations for socially or theoretically important phenomena can require studies with special sampling strategies. One option for studies designed to develop theory, which was mentioned earlier, is the contrasting cases non-probability design. This design can be extremely useful for evaluations that attempt to explain why some programs or program administrative units (for example, schools or clinics) perform better than others. Data collection could be either qualitative or quantitative depending on the existing state of theoretical development in the field. A non-probability design might select only high performing and low performing units for the purpose of collecting qualitative data to contrast these two groups. Alternatively, a probability sampling approach could be selected which divides the units into high, “average”, and low performing units and samples a higher proportion of high and low performing units but samples some “average” performers as well. One advantage of the probability sampling approach is that once the organizational level or other factors correlated with performance are identified, an estimate of the frequency with which they are currently utilized in the study population could be calculated from the available sample data.

For example, in the field of early childhood education there is a growing need to assess the language, cognitive, and social skills of children who do not speak English within their household, but we have few assessment instruments and little evidence about how to assess the children. Should the children who do not speak English at home be assessed in both their home language and English or only one? What are the implications for the length of the assessments and test fatigue if children are tested in both languages? To gather evidence to address questions of this sort, the organization that oversees the pre-kindergarten program for Los Angeles, CA, recently commissioned a developmental study of measurement issues as the first phase of an evaluation of two of the pre-kindergarten programs operating in LA County. The sampling plan for the developmental phase calls for over-sampling children who do not speak English at home, in order to compare the consequences of alternative measures and measurement protocols.
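The probability sampling option described above for developmental studies, in which high and low performing units are sampled at a higher rate than “average” performers, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of any actual study: the 1,000 hypothetical units, the sampling fractions, the trait labelled uses_practice, and both function names are invented for the example.

```python
import random

def stratified_sample(strata, fractions, seed=0):
    """Draw a simple random sample within each stratum at its own rate.
    Each sampled unit carries a weight equal to the inverse of its
    probability of selection."""
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        n = max(1, round(len(units) * fractions[name]))
        weight = len(units) / n
        for unit in rng.sample(units, n):
            sample.append((name, unit, weight))
    return sample

def weighted_proportion(sample, has_trait):
    """Estimate the share of the study population with a trait,
    weighting each unit by the number of units it represents."""
    total = sum(w for _, _, w in sample)
    hits = sum(w for _, unit, w in sample if has_trait(unit))
    return hits / total

# Hypothetical study population (assumption): 1,000 program units, with a
# practice that is more common among the high performing units.
strata = {
    "high": [{"uses_practice": i % 5 != 0} for i in range(100)],
    "average": [{"uses_practice": i % 5 < 2} for i in range(800)],
    "low": [{"uses_practice": i % 10 == 0} for i in range(100)],
}
# Oversample the high and low performers relative to the average group.
fractions = {"high": 0.5, "average": 0.05, "low": 0.5}

sample = stratified_sample(strata, fractions)
estimate = weighted_proportion(sample, lambda unit: unit["uses_practice"])
print(f"sampled {len(sample)} units; weighted estimate = {estimate:.2f}")
```

Without the weights, the oversampled high and low performers would count for far more than their share of the population, so the unweighted proportion would not describe how often the practice is used across all units.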

Descriptive research is the core of many survey research projects in which estimates of population characteristics, attributes, or attitudes are study objectives (see Fowler, Chapter 12, and Lavrakas, Chapter 15, this volume). In fact, probability sampling designs were originally developed for this type of research. Therefore, most sampling texts, especially older ones, emphasize using sample data to develop estimates of the characteristics of the study population, such as averages and percentages. But it has become common for probability studies to be needed for explanatory research purposes.

Explanatory research examines expected relationships between groups and/or relationships between variables, and the focus of these studies is explaining variation in one or more variables or testing cause-effect relationships. In practice, it is common that practical considerations lead researchers to conduct their explanatory studies in more limited geographic areas than the entire area in which certain services are provided or programs operate. For instance, Gormley and Gayer (2004) focused their evaluation of the impact of the pre-kindergarten program in Oklahoma on the children who participated in the program in Tulsa Public Schools, not the other children attending the state sponsored pre-kindergarten in Tulsa or the children served in the prekindergarten programs operated by the other 493 school districts in the state of Oklahoma. This is an example of researchers placing greater emphasis on their ability to accurately estimate the size of the effect attributable to a program for a subset of the participants of the entire program than on the external validity or generalizability of the effect to the entire population served by the program. Even if a complete census survey of pre-kindergarteners attending Tulsa Public Schools had been possible, the effects that were estimated would only formally generalize to the children who attended the Tulsa Public Schools program. In cases such as these, it requires substantive expertise and knowledge of the populations being served in the locality chosen for the study to assess the reasonableness of suggesting that the effects would be similar for other children in the target population who were not eligible for participation in the study. Often such choices are fruitful and well justified, as was the case with Gormley and Gayer’s study, so that gaps in existing knowledge can be reduced and the state of knowledge in a field can move forward. It is the slow steady increments to knowledge rather than the “ideal” that will often shape the decision for the type of study to be conducted at a particular time and in specific circumstances.

Both descriptive and analytic studies are concerned with reducing total error. Although they have similar objectives for reducing both types of bias, the sampling variability component of total error is quite different. For descriptive studies, the focus is on the precision needed for estimates. For analytic studies, the most significant concern is whether the sample will be powerful enough to allow the researcher to detect an effect given the expected effect size. This is done through a power analysis (see Lipsey, Chapter 2, this volume). Moreover, many studies attempt both descriptive and analytic tasks, which means that the researchers may need to assess both precision and power as decisions about sample design are being considered. Analytic and descriptive studies will be the primary focus in the responses to the remaining questions. Usually, the emphasis for descriptive studies will be the precision of the estimates, while analytical studies will need to pay attention to the power to detect effects if the effects actually occur.

What Are the Variables of Greatest Interest?

Selecting the most important variables for a study is an important precursor to the sampling design. Studies often have multiple purposes. For example, a study of student performance may seek to assess the impacts of a program on both achievement and retention in grade. In addition, the researcher may envision including many descriptive tables in the write-up or using several statistical tools to examine expected relationships. Choosing the variable of greatest interest is a matter of setting priorities. Typically, the most important dependent variable in an applied study will be the one of greatest interest. At times, applied researchers must default to practical considerations such as choosing a dependent variable that can be measured within the study’s time frame, even though other important variables must be reduced to secondary priorities as a result of the practical priorities. Measuring the dependent variables as well as program participation and any control variables will need to be considered. The variables of greatest interest are then used to develop responses to the questions that come later in the design process.

What Is the Target Population for the Study?

The target population for a study is the group about which the researcher would like to be able to speak in the reports and presentations that they develop from the findings. The population can be individuals (residents of North Carolina or homeless in Los Angeles), groups of individuals (households in Richmond or schools in Wisconsin), or other units (invoices, state-owned cars, schools, or dwelling units). In many cases, the study sponsor may be interested in a particular target population. For example, a state agency responsible for the administration of a statewide pre-k program may want the study findings to generalize to the entire state, but a local program operator may be more focused on the program in her particular locality. Decisions about target population definitions should be made with both researchers and study sponsors fully aware of the limitations on extrapolating the findings beyond the target population once the study is completed.

Are Subpopulations Important for the Study?

Often a researcher will choose to focus on a part of the target population for additional analysis. For example, households headed by single, working females were of particular interest to some scientists examining the impact of income maintenance experiments (Skidmore, 1983). It is most important to identify the subgroups for which separate analyses are to be conducted, including both estimating characteristics of the subpopulation using the sample data and explanatory analyses. When subgroups are important focal points for separate analyses, later sampling design choices, such as sample size and sampling technique, must consider this. A sample designed without taking the subpopulation into account can yield too few of the subpopulation members in the sample for reliable analysis. Increasing the overall sample size or disproportionately increasing the sample size for the subpopulation of interest, if the members of the subpopulation can be identified before sampling, are potential remedies, as will be discussed later.

How Will the Data Be Collected?

Certain sampling choices can be used only in conjunction with specific data collection choices. For example, random-digit dialing, a technique that generates a probability sample of households with working phones, is an option when interviews are to be conducted over the phone (see Lavrakas, Chapter X, this volume). A probability sample of dwelling units is useful mainly for studies in which on-site fieldwork, usually in the form of personal interviews, is to be used. The collection of data from administrative records or mailed questionnaires also poses specific sampling concerns. For example, mailed questionnaires can have a high proportion of nonrespondents for some populations (see Mangione, Chapter X, this volume). Nonresponse affects sampling variability and will cause nonsampling bias to the extent that the members of the sample who choose not to respond are different from those who do. In making a decision about sample size, which comes a bit later in these questions, the researcher should factor nonresponse into the final calculation. Because the sampling error depends on the number who actually respond, not the number surveyed, it is common to divide the desired sample size by the proportion expected to respond. For example, a desired sample size of 500 with an expected response rate of .80 will require an initial sample size of 625. If an alternative method of administering the instrument is expected to reduce response rates, it will increase the sample size required for the same number of completes.

Is Sampling Appropriate?

The decision to sample rather than conduct a census survey should be made deliberatively. In most cases, resources available for the study mandate sampling. Once again it is important to note that when resources are limited, sampling can produce more accurate results than a population or census-type study.
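The nonresponse adjustment described above, dividing the desired number of completed responses by the proportion expected to respond, can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the function name `initial_sample_size` is hypothetical, and rounding up to a whole unit is an assumption.

```python
import math


def initial_sample_size(desired_completes: int, response_rate: float) -> int:
    """Number of units to draw so that roughly `desired_completes` respond.

    Hypothetical helper: divides the desired number of completes by the
    expected response rate and rounds up (rounding up is an assumption).
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # The small tolerance guards against floating-point noise before rounding up.
    return math.ceil(desired_completes / response_rate - 1e-9)


# The example from the text: 500 completes at a .80 response rate.
print(initial_sample_size(500, 0.80))
```

A lower expected response rate, for instance from a different mode of administration, raises the initial sample needed for the same number of completes.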

Often, resources for studies of entire populations are consumed by attempts to contact all population members. Response to the first contact is often far less than 50 percent, raising the issue of substantial nonsampling bias. Sampling would require fewer resources for the initial survey administration and could allow the investment of more resources in follow-up activities designed to increase responses, paying dividends in lowering nonsampling bias. Even when automated databases that contain all members of the population are being used, sampling can improve the accuracy of results. Missing data are a frequent problem with automated databases. Missing data are another form of nonresponse bias, because the missing data cannot be assumed to be missing at random. The cost of collecting the data missing from the database or supplementing information for variables that have not been collected will be less for the sample than for the entire population. In addition, when access to the target population is through organizations which serve the population, gaining access can require substantial resources. For instance, many organizations such as school districts have research review committees that require proposals be submitted, reviewed, and approved, which can require substantial revisions, before access can be gained. Obviously, these increase the time and resources required for data collection.

On the other hand, small populations and use of the information in the political environment may weigh against sampling. For studies that may affect funding allocations or when there is expert knowledge of specific cases that may appear to be “unusual” or “atypical,” the use of a sample can affect the credibility of a study. Credibility is vital when study results are used to inform policy or program decisions. Because program decisions often determine winners and/or losers, credibility rather than validity may be the criterion on which the use of the findings turns.

Sampling Choices

What Listing of the Target Population Can Be Used for the Sampling Frame?

The sampling frame, or the list from which the sample is selected, provides the definition of the study population. The sampling frame is, in nearly every case, the operational definition of the population, the group about which the researchers can reasonably speak. Differences between the target population and the study population as listed in the sampling frame constitute a significant component of nonsampling bias. For general population surveys, it is nearly impossible to obtain an accurate listing of the target population. A telephone directory would seem to be a likely explicit sampling frame for a study of the population in a community. However, it suffers from all four flaws that are commonplace in sampling frames:

Omissions: target population units missing from the frame (example: new listings and unlisted numbers)
Duplications: units listed more than once in the frame (example: households listed under multiple names)
Ineligibles: units not in the target population (example: households recently moved out of the area)
Cluster lists: groupings of units listed in the frame (example: households, not individuals, listed)

The most difficult flaw to overcome is the omission of part of the target population from the sampling frame. This can lead to a bias that cannot be estimated for the sample data. An alternative would be to use additional listings that include omitted population members to formulate a combination frame or to choose a technique that does not require a frame, such as random-digit dialing instead of the phone book.

Duplications, or multiple listings of the same unit, increase the probability of selection for these units. Unchecked duplications result in sampling bias. For random-digit dialing, households with two or more phones are considered duplications, since the same household is listed two or more times.

In some evaluations of program services, duplications can occur because lists of program participants are actually lists of enrollees, and individuals may be enrolled at some time during the study period in more than one program. In some cases, researchers can address duplications by removing them from the list before sampling. In other cases, weights can be calculated based on the number of duplications for each case in the sample (Henry, 1990) and used to adjust estimates.

Ineligibles occur when cases that are not members of the target population appear on the sampling list. When ineligibles can be screened from the list or from the sample, the only concerns are the cost of screening and the reduction of the expected sample size. The cost of screening for a telephone survey includes making contact with someone in the household to determine eligibility. This can require several phone calls and can become quite costly, especially when the proportion of ineligibles is large. In addition to screening, it is likely that the sample size will need to be increased so that sampling errors will not increase due to the screening.

Cluster listings are caused by sampling frames that include groups of units that are to be analyzed, rather than the units themselves. Many general population surveys, such as random-digit dialing telephone surveys, actually sample households. Listings for special population surveys may also contain multiple units. For example, welfare rolls may actually be listings of cases that include all members of affected families. The primary issue for sampling is the selection of the unit or units from the cluster listing. In most cases, information is sought only from one individual per cluster listing. If the selection of the individual is done randomly, a correction may be needed to compensate for the probability of selection if the clusters are unequal in size. To return to the telephone survey example, a member of a household with four adults is half as likely to be selected out of that household as is a member of a household with two adults. If the selection is not done randomly, a systematic bias may be introduced.

What Is the Sampling Variability That Can Be Tolerated for the Study?

The sampling variability affects the precision of the estimates for descriptive studies and the power to detect effects for explanatory studies. Precision refers to the size of the confidence interval that is drawn around the sample mean or proportion estimates. The level of precision required relates directly to the purpose for which the study results will be used. A confidence interval of ±5% may be completely satisfactory for a study to assess the need for a particular type of service within a community, but entirely too large for a mayoral candidate to decide on spending funds on more advertising in the midst of a campaign in the same locality. Precision requirements are used in the calculations of efficient sample sizes. The objective of the researcher is to produce a specified interval within which the true value for the study population is likely to fall. Sample size is a principal means by which the researcher can achieve this objective. But the efficiency of the sampling design can have considerable impact on the amount of sampling error and the estimate of desired sample size.

For explanatory studies, the sampling variability that can be tolerated is based on the desire to be able to detect effects or relationships if they occur. A power analysis is conducted to assess the needs for a particular study (see Lipsey, Chapter X, this volume, for more detail). The power analysis requires that the researchers have an estimate of the size of the effect that they expect the program or intervention to produce and the degree of confidence that they would like to be able to have to detect the effects. Effect sizes are stated in standard deviation units; for example, an effect size of .25 means that the effect is expected to be one quarter of a standard deviation unit. In practice, it has become common to specify an 80 percent chance of detecting the effect. Power analysis software is available from several sources to determine what sample size would be required to detect an effect of a specified size.

What Types of Sampling Designs Will Be Used?

The five probability sampling designs are simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multistage sampling.
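To make the power analysis discussed above more concrete, the sketch below approximates the sample size per group needed to detect an effect stated in standard deviation units. It is not the chapter's method: it assumes a two-group comparison of means with equal group sizes, a two-sided significance level of .05, and the normal approximation, none of which is specified in the text. The function name `n_per_group` is hypothetical. Dedicated power analysis software should be preferred for real designs.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate units per group for a two-group comparison of means.

    Assumptions (not from the text): equal group sizes, two-sided test,
    normal approximation: n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Effect size of .25 standard deviations with an 80 percent chance of detection.
print(n_per_group(0.25))
```

Because the required size varies with the inverse square of the effect size, halving the expected effect roughly quadruples the sample needed.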

Table 4.4 presents the definitions of all five types of sampling techniques, as well as their requirements and benefits. For illustrative purposes a two-stage sample is described in the table. However, the multistage sampling design, which is also referred to as complex sample design, has many variations and is best considered a category of designs rather than a particular design. It is also common to use stratified cluster sampling, in which the clusters, such as schools or clinics, are placed into strata and then sampled, either proportionately or disproportionately. The choice of a design will depend on several factors, including availability of an adequate sampling frame, the cost of travel for data collection, and the availability of prior information about the target population. However, the choices do not end with the selection of a design. Choices branch off independently for each design. If stratified sampling is chosen, how many strata should be used? If cluster sampling is chosen, how should the clusters be defined? For multistage samples, how many sampling stages should be used?

Will the Probability of Selection Be Equal or Unequal?

Choices about the probability of selection will also affect sampling bias. For simple random sampling, the probability of selecting any individual unit is equal to the sampling fraction, or the proportion of the population selected for the sample (n/N). The probability of selecting any unit is equal to the probability of selecting any other unit. For stratified sample designs, the probability of selection for any unit is the sampling fraction for the stratum in which the unit is placed. Probabilities using a stratified design can be either equal or unequal, as can multistage sample designs. If separate estimates or explanatory analyses are needed for certain subpopulations, or some strata are known to have much higher variability for important variables, a disproportionate sampling strategy should be considered, which would result in unequal probability of selection.
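The selection probabilities described above can be illustrated with a short sketch. For each stratum, the probability of selection is the stratum sampling fraction, and a common way to compensate for unequal probabilities is to weight each sampled unit by the inverse of that fraction. The stratum names and counts below are made up for illustration, and the function names are hypothetical; the inverse-probability weight is a standard convention rather than something spelled out in the text.

```python
def selection_probabilities(strata: dict) -> dict:
    """Sampling fraction n_h / N_h per stratum.

    `strata` maps a stratum name to (population_count, sample_size).
    """
    return {name: n / N for name, (N, n) in strata.items()}


def design_weights(strata: dict) -> dict:
    """Inverse-probability weight N_h / n_h per stratum (standard convention)."""
    return {name: N / n for name, (N, n) in strata.items()}


# Hypothetical disproportionate design: the smaller stratum is oversampled.
strata = {"urban": (8000, 400), "rural": (2000, 400)}
print(selection_probabilities(strata))
print(design_weights(strata))
```

With equal sampling fractions in every stratum the weights are all the same and can be ignored; with a disproportionate design they differ and need to be applied when estimating for the whole study population.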

Table 4.4 Probability Sampling Techniques

Simple Random
Definition: Equal probability of selection sample where n units are drawn from population list.
Requirements: List of study population; count of study population (N); sample size (n); random selection of individuals or units.
Benefits: Easy to administer; no weighting required; standard error calculation is automatic in most software.

Systematic
Definition: Equal probability of selection sample where a random start less than or equal to the sampling interval is chosen and every unit that falls at the start and at the interval from the start is selected.
Requirements: List or physical representation of study population; approximate count of study population (N); sample size (n); sampling interval (I = N/n rounded down to integer); random start R such that R ≤ I.
Benefits: Easy to administer in field or with physical objects, such as files or invoices, when list unavailable.

Stratified
Definition: Either equal or unequal probability of selection sample where population is divided into strata (or groups) and a simple random sample of each stratum is selected.
Requirements: List of study population divided into strata; count of study population for each stratum; sample size for each stratum.
Benefits: Reduces standard error; disproportionate stratifications can be used to increase sample size of subpopulations.

Cluster
Definition: Clusters that contain members of the study population are selected by a simple random sample and all members of the selected clusters are included in the study.
Requirements: List of clusters in which all members of study population are contained in one and only one cluster; count of clusters (C); approximate size of clusters (Nc); number of clusters to be sampled (c); random selection mechanism.
Benefits: List of study population unnecessary; limits costs associated with travel or approvals from all clusters; clusters can be stratified for efficiency.

Multistage (two stage)
Definition: First, clusters of study population members are sampled, then study population members are selected from each of the sampled clusters, both by random sampling.
Requirements: List of primary sampling units; list of members for selected primary sampling units; count of primary sampling units; number of primary sampling units to be selected; number of members to be selected from primary sampling units; random selection mechanism for primary sampling units and members.
Benefits: Same benefits as for cluster, plus may reduce standard error; most complex but most efficient and flexible.
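The systematic sampling requirements listed in the table (a sampling interval I = N/n rounded down, and a random start R no larger than I) can be sketched as follows. This is a minimal illustration under stated assumptions: units are numbered 1 through N, and the function name `systematic_sample` and the seeded generator are hypothetical choices made for reproducibility.

```python
import random


def systematic_sample(population_size: int, sample_size: int, rng=None) -> list:
    """Return the 1-based positions selected by systematic sampling.

    Interval I = N // n (N/n rounded down); random start R with 1 <= R <= I;
    every unit at R, R + I, R + 2I, ... up to N is selected.
    """
    rng = rng or random.Random()
    interval = population_size // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed population_size")
    start = rng.randint(1, interval)  # randint includes both endpoints
    return list(range(start, population_size + 1, interval))


# Hypothetical file drawer of 1,000 invoices, sampling 100 of them.
positions = systematic_sample(1000, 100, random.Random(1))
print(len(positions), positions[:5])
```

Because the interval is rounded down, the number of units actually selected can slightly exceed n when N is not an exact multiple of n, which is one reason the table asks only for an approximate count of the study population.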

How Many Units Will Be Selected for the Sample?

Determining the sample size is where many discussions of sampling begin, but as this framework points out, the research team needs a great deal of information before the sample size is determined for the study. In descriptive studies, researchers must answer this question: What sample size will produce estimates that are precise enough to meet the study’s purpose, given the sample design? Precision, from the sampling perspective, is a function of the size of the confidence interval, which is influenced primarily by three variables: the standard deviation of the variable of interest, the sample size, and the level of confidence required (represented by the t statistic). In cases when the population is relatively small, it is also influenced by the sampling fraction as a result of the finite population correction. The researcher directly controls only the sample size; to produce an estimate from the sample that is precise enough for the study objectives, the researcher can adjust the sample size. But increasing the sample size means increasing the cost of data collection. Trade-offs between precision and cost are inherent at this juncture.

For a descriptive study, assuming a simple random sample, the sample size calculation is done using the following formulas:

n' = s^2 / (te/t)^2
n = n' / (1 + f)

where n' is the sample size computed in the first step, s is the estimate of the standard deviation, te is the tolerable error, t is the t value for the desired probability level, n is the sample size using the finite population correction factor, and f is the sampling fraction.

The most difficult piece of information to obtain for these formulas, considering it is used prior to conducting the actual data collection, is the estimate of the standard deviation. A number of options are available, including prior studies, small pilot studies, and estimates using the range.

Although the sample size is the principal means for influencing the precision of the estimate once the design has been chosen, an iterative process can be used to examine the impact on efficient sample size if an alternative design were used. Stratification or the selection of more primary sampling units in multistage sampling can improve the precision of a sample without increasing the number of units in the sample. Of course, these adjustments may increase costs also, but perhaps less than increasing the sample size would.

In addition, other sample size considerations should be brought to bear at this point. For example, will the number of members of subpopulations that the sample can be expected to yield be sufficient for precise subpopulation estimates?

Determining the sample size is generally an iterative process. The researcher must consider and analyze numerous factors that may alter earlier choices, for example, changes in the study population definition from using different sampling frames, the expected response rate, or the percentage of ineligibles that may be included in the sampling frame. It is important for the researcher to review the proposed alternatives carefully in terms of total error and feasibility.

Postsampling Choices

How Can the Impact of Nonresponse Be Evaluated?

Nonresponse for sampling purposes means the proportion of sampled individuals that did not provide useable responses, calculated by subtracting the response rate from 1. Nonresponse can occur when a respondent refuses to participate in the survey, or when a respondent cannot be contacted.
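The two-step sample size calculation for a descriptive study with a simple random sample, n' = s^2 / (te/t)^2 followed by n = n' / (1 + f), can be worked through in code. The sketch assumes that the sampling fraction is f = n'/N, which the text does not state explicitly, and the input values (standard deviation 10, tolerable error 1, t of 1.96, population of 2,000) are invented for illustration. The function name `srs_sample_size` is hypothetical.

```python
import math


def srs_sample_size(s: float, te: float, t: float, population_size=None):
    """Return (n_prime, n) for a simple random sample.

    n_prime = s^2 / (te / t)^2 is the first-step size.
    If a population size N is given, the finite population correction
    n = n_prime / (1 + f) is applied with f = n_prime / N (assumed definition).
    The final size is rounded up to a whole unit.
    """
    n_prime = s ** 2 / (te / t) ** 2
    if population_size is None:
        return n_prime, math.ceil(n_prime)
    f = n_prime / population_size  # sampling fraction (assumption: n'/N)
    return n_prime, math.ceil(n_prime / (1 + f))


# Hypothetical inputs: s = 10, tolerable error = 1, t = 1.96, N = 2,000.
first_step, corrected = srs_sample_size(10, 1, 1.96, population_size=2000)
print(round(first_step, 2), corrected)
```

The correction matters only when the first-step size is a noticeable share of the population; for large populations f is close to zero and n is essentially n'.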

In addition, nonresponse can occur when an individual who is participating in a survey cannot or will not provide an answer to a specific question. If the nonresponding portion of the population is reduced, the nonsampling bias is reduced (Kalton, 1983). Fowler (1993; see also Chapter 12, this volume) and Dillman (1999) discuss several ways of reducing nonresponse. It is often necessary for the researcher to evaluate the impact of nonresponse by conducting special studies of the nonrespondents, comparing the sample characteristics with known population parameters, or examining the sensitivity of the sample estimates to weighting schemes that may provide greater weight to responses from individuals who are considered to have characteristics more like the nonrespondents (Henry, 1990; see also Braverman, 1996; Couper & Groves, 1996; Krosnick, Narayan, & Smith, 1996).

Is It Necessary to Weight the Sample Data?

Weighting is usually required to compensate for sampling bias when unequal probabilities result from the researcher’s sampling choices. Unequal probabilities of selection can occur inadvertently in the sampling process, as with duplicates on the sampling frame or cluster listings. They can also arise from deliberate choices, such as disproportionate stratification. Generally, weights should be applied in all these cases. (For a discussion of the calculation of appropriate weights, see Henry, 1990.) Also, when the response rates are higher for some subgroups within the sample than others, many survey organizations increase the weights for the groups with lower response rates such that the proportion of each subgroup in the sample estimates equals the proportional representation of that subgroup in the study population.

What Are the Standard Errors and Related Confidence Intervals for the Study Estimates?

The precision of the estimates and the power of hypothesis tests are determined by the standard errors. It is important to recognize that the sampling error formulas are different for the different sampling techniques. Formulas for calculating the standard error of the mean for simple random samples were presented earlier in the chapter. Other sampling techniques require modifications to the formula and can be found in Henry (1990), Kalton (1983), Sudman (1976), and Kish (1965). However, some general guidance can be provided.

Stratification lowers the sampling error, all other things held constant, when compared to simple random samples. Sampling error can be further lowered when larger sampling fractions are allocated to strata that have the highest standard deviations.

Cluster sampling inflates the standard error of the estimates relative to simple random sampling. This occurs because the number of independent choices is the number of clusters in cluster sampling, not the number of units finally selected. The effect is reduced when clusters are internally heterogeneous on the important study variables (large standard deviations within the clusters) or cluster means do not vary.

The standard error for a cluster sample can often be reduced by stratification of the clusters before selection. This means that the clusters must be placed into strata before selection, and the variables used to define the strata must be available for all clusters. This type of sampling strategy can result in standard errors very close to those associated with simple random samples when the sample is properly designed.

Summary

The challenge of sampling lies in making trade-offs to reduce total error while keeping study goals and resources in mind. The researcher must act to make choices throughout the sampling process to reduce error, but reducing the error associated with one choice can increase errors from other sources. Faced with this complex, multidimensional challenge, the researcher must concentrate on reducing total error. Error can arise systematically from bias or can occur due to random fluctuation inherent in sampling. Error cannot be eliminated entirely. Reducing error is the practical objective, and this can be achieved through careful design.

T. T. R. S. D. 3. T. Practical sampling. (1996). & Groves. What sample plan would you develop for describing the uninsured population of your state? 6.. (2005). (1993). New York: John Wiley. Couper. G. W. For probability samples. Henry. Cohen. (1963). (1994). Gormley. Household-level determinants of survey nonresponse. In M. K. (1996). S. Advances in survey research (pp. Public Opinion Quarterly. Quasi-experimentation: Design and analysis issues for field settings. 997–1003. 71. T. Journal of Human Resources 40:3.). M. R. G. Mail and Internet Surveys: The Tailored Design Method. Experimental and quasi-experimental designs for research. American Psychologist.. How would you go about determining the variable of greatest interest for an evaluation of adolescent mental health programs? 5. P. (1996). Campbell.Discussion Questions 1. Henry. Survey use in evaluation.org/pdfs/standarddef_4. 64:2. (2000).. Fowler. The earth is round.pdf. 71. Newbury Park. F. & Gayer. Dillman. C. 3–15. downloaded January 3. J. Kalton. CA: Sage. Consequences of Reducing Nonresponse in a National Telephone Survey.. CA: Sage. Keeter. & Campbell. G. (1983). 2nd Ed.). D. & Presser. Groves. Introduction to survey sampling. Newbury Park. 125-148. J. (1990).aapor. Page 22 .. & Stanley. M. Miller. Braverman & J. Chicago: Rand McNally. What is a confidence interval? What does it measure? 4. 63–70). T. 533-558. (1979). CA: Sage.. T. A.T. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. Kohut. San Francisco: Jossey-Bass.D. Jr. M. D. 85–90. In what circumstances might you choose a convenience sample over a probability sample? 7. What are the main differences in probability and non-probability samples? 2.. Cook. Does the public have a role in evaluation? Surveys and democratic discourse.. Beverly Hills. what are the main alternatives to simple random samples? Name one circumstance in which each one might become a preferred option for the sampling design. T. 
Promoting School Readiness in Oklahoma. Slaters (Eds. New Directions in Evaluation. 2007 from www. T. A. Braverman. Chicago: Rand McNally. Survey research methods (2nd ed. What are the major factors that contribute to standard error of the mean? Which of the factors can be most easily controlled by researchers? References American Association of Public Opinion Research (2006). New Directions in Evaluation. C. J. (1999). 49.

McKean. Newbury Park. Newbury Park.). U. Kraemer. January). F. C. Kellerman. (1996). & Fajman. (2004). Educational Measurement: Issues and Practice 23(4). W. Washington. (1987.Kish. S. Assessing School Readiness: Validity and Bias in Preschool and Kindergarten Teachers’ Ratings. L. In M.. W.S. Todd. Department of Education. Mashburn. CA: Sage. San Francisco: Jossey-Bass.. Sudman. Washington. K. Skidmore. J.. H. Discover. M. Survey sampling. Design sensitivity: Statistical power for experimental research. Narayan. An evaluation of lottery expenditures for public school safety in Georgia.. G. S. A. Satisficing in surveys: Initial evidence. & Smith. & Henry. J. Lipsey. W. Advances in survey research (pp. Krosnick. CA: Sage. New York: Academic Press. Overview of the Seattle-Denver Income Maintenance Experiment: Final report. Dossey. DC: Government Printing Office. A. 72–81. Page 23 . T. H. R. N. 29–44). T. (1993). K. I. J.. (1990). A.. New York: John Wiley.. S. (1965). & Thiemann. (1983). (1976). (1996. S. Wald. G. (1987). DC: U.. NAEP 1992 mathematics report card for the nation and the states. Paper presented to the Council for School Performance. & Phillips. Applied sampling. 1630.. Mullis. October). Slaters (Eds. A. L. Lipscomb. Braverman & J. The orderly pursuit of pure disorder. How many subjects? Statistical power analysis in research. L. K. M.