Sampling: What is it?

Quantitative Research Methods
ENGL 5377
Spring 2007

Bobbie Latham
March 8, 2007


Introduction

In any research conducted, people, places, and things are studied. The opportunity to
study the entire population of those people, places, and things is an endeavor that most
researchers do not have the time and/or money to undertake. The idea of gathering data
from a population is one that has been used successfully over the years and is called a
census. This method is mentioned several times in the Bible (Wikipedia). It was also
used by the Ancient Egyptians “to obtain empirical data describing their subjects”
(Babbie 37). In past years, the idea of collecting data from the entire population was used
by political entities to collect opinions about potential political candidates. Census data
collection is still very popular for collecting public opinion for political endeavors. For
most researchers, however, collecting data from an entire population is almost impossible
because of the number of people, places, or things within the population. Taking a census
requires a great deal of time and money, resources that most researchers do not have.
To collect data on a smaller scale, researchers gather data from a portion, or
sample, of the population.

The purpose of this paper is to describe sampling as a method of data collection.
Probability and non-probability sampling as well as the surrounding validity issues will
be discussed. Sampling theory may be adapted for content analysis, laboratory
experiments, and participant observation (Babbie 100). However, this paper will focus on
sampling as a method to select participants for surveys, more specifically interviews
and self-administered questionnaires.


Sampling Definitions

The sample method involves taking a representative selection of the population and using
the data collected as research information. A sample is a “subgroup of a population”
(Frey et al. 125). It has also been described as a representative “taste” of a group
(Berinstein 17). The sample should be “representative in the sense that each sampled unit
will represent the characteristics of a known number of units in the population” (Lohr 3).
All disciplines conduct research using sampling of the population as a method, and the
definition is standard across these disciplines. Only the creative description of sampling
changes for purposes of creating understanding. The standard definition always includes
the ability of the researcher to select a portion of the population that is truly representative
of that population.

Sampling theory is important to understand when selecting a sampling method
because it seeks to “make sampling more efficient” (Cochran 5). Cochran posits that
using correct sampling methods allows researchers to reduce research costs,
conduct research more quickly, have greater flexibility, and achieve
greater accuracy (2).

Two standard categories of the sampling method exist. These two categories are called
probability sampling and non-probability sampling. Probability sampling is sometimes
called random sampling, and non-probability sampling is sometimes called non-random
sampling. These terms are interchangeable. For the purpose of this paper, I will use
probability and non-probability as the naming conventions for the two sampling method
categories. It is important to note that all sample selection methods described are
selection without replacement; that is, “once a unit is selected in the sampling process, it
is removed from the pool eligible for future selection” (Henry 27). All other texts
referenced in this paper assume selection without replacement. Gary Henry defines
selection with replacement as a process where “the selected unit is returned to the pool
eligible for selection” (28); however, no other references were found to this type of
selection method.
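
The difference between the two selection processes can be shown in a minimal Python sketch. The ten-unit population and the sample size of five are hypothetical; the sketch assumes that `random.sample` stands in for selection without replacement and `random.choices` for selection with replacement.

```python
import random

# Hypothetical population of ten numbered units (an assumption for illustration).
population = list(range(1, 11))

def select_without_replacement(units, n, rng):
    """Each selected unit leaves the pool, so no unit can appear twice."""
    return rng.sample(units, n)

def select_with_replacement(units, n, rng):
    """Each selected unit returns to the pool, so repeats are possible."""
    return rng.choices(units, k=n)

rng = random.Random(2007)  # fixed seed so the sketch is repeatable
without = select_without_replacement(population, 5, rng)
with_repl = select_with_replacement(population, 5, rng)
print(without, with_repl)
```

In the first list every unit is distinct; in the second, the same unit may appear more than once.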

The choice to use probability or non-probability sampling depends on the goal of the
research. When a researcher needs to have a certain level of confidence in the data
collection, probability sampling should be used (MacNealy 125). Frey et al. indicate
that the two sampling methods “differ in terms of how confident we are about the ability
of the selected sample to represent the population from which it is drawn” (126).
Probability samples can be “rigorously analyzed to determine possible bias and likely
error” (Henry 17). Non-probability sampling does not provide this advantage but is useful
for researchers “to achieve particular objectives of the research at hand” (Henry 17).
These objectives may lead a researcher to select a sample because it was acquired by accident, because the
sample “knows” the most, or because the sample is the most typical (Fink & Kosecoff
53). Probability and non-probability sampling each have advantages and disadvantages, and the
use of each is determined by the researcher’s goals in relation to data collection and
validity. Each sampling category includes various methods for the selection process.


Probability Sampling

Probability sampling provides an advantage because of the researcher’s ability to calculate
specific bias and error with regard to the data collected. Probability sampling is defined as
having the “distinguishing characteristic that each unit in the population has a known,
nonzero probability of being included in the sample” (Henry 25). It is described more
simply as “every subject or unit has an equal chance of being selected” from the
population (Fink 10), although equal chances are strictly a special case of Henry’s
definition. It is important to give everyone an equal chance of being selected
because it “eliminates the danger of researchers biasing the selection process because of
their own opinions or desires” (Frey, et al. 126). When bias is eliminated, the results of
the research may be generalized from the sample to the whole of the population because
“the sample represents the population” (Frey, et al. 126).

There are four types of probability sampling that are standard across disciplines. These
four include simple random sampling, systematic random sampling, stratified random
sampling, and cluster sampling (Table 1).

Simple Random Sampling

Simple random sampling is often called straight random sampling. The choice between
these names does not depend on the discipline but on the researcher or author of the
various books and articles referenced. That is to say, the two terms are interchangeable
and are not tied to a specific discipline within academia.

Simple random sampling requires that each member of the population have an equal
chance of being selected (as is the main goal of probability sampling). A simple random
sample is selected by assigning a number to each member in the population list and then
using “a random number table to draw out the members of the sample” (MacNealy 155).
Sharon Lohr explains that by using simple random sampling, the researcher “is in effect
mixing up the population before grabbing n units” (24). Another way of viewing simple
random sampling presupposes that “all members of the study population are either
physically present or listed, and the members are selected at random until a previously
specified number of members or units has been selected” (Henry 27). Each member of
the population is “selected one at a time, independent of one another and without
replacement; once a unit is selected, it has no further chance to be selected” (Fowler, Jr. 14).
Regardless of the process used for simple random sampling, the process can be laborious
if the list of the population is long or if it is completed manually without the aid of a
computer (Babbie 84; Fowler, Jr. 14).

An example of simple random sampling may include writing the name of each member of the
population on a piece of paper and putting the pieces in a hat. Selecting the sample from the hat is
random, and each member of the population has an equal chance of being selected. This
example is not feasible for a large population, but can be completed easily if the population
is very small.
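
When a computer is available, the hat can be replaced by a few lines of code. The following is a minimal sketch, assuming a hypothetical list of 25 members and a sample size of five; the function name `simple_random_sample` is introduced here for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population list: 25 names written on "slips of paper".
hat = [f"Member {i}" for i in range(1, 26)]
sample = simple_random_sample(hat, 5, seed=1)
print(sample)
```

Fixing the seed makes the draw repeatable, which may help when the selection must be documented; leaving it as `None` gives a fresh draw each time.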

Researchers who choose simple random sampling must be cognizant of the numbers that
they choose. Researcher bias toward preferred numbers can be a problem for the
end results of sample selection (Frey, et al. 126). It is best to ask other
researchers to aid in the selection of the numbers to be used in the selection process. It is
also important to note that by using simple random sampling, the sample selected may
not include all “elements in the population that are of interest” (Fink 11).


Systematic Random Sampling

Systematic random sampling is usually preferred over simple random sampling insofar
as it is more convenient for the researcher. This type of probability sampling is also
called ordinal sampling or pseudo-simple random sampling (Frey, et al. 128; Henry 28).
Systematic random sampling includes “selection of sampling units in sequences separated
on lists by the interval of selection” (Kish 21). The selection of the sample from the
population list is made by randomly selecting a beginning and choosing every nth name
(MacNealy 155). Frey et al. call the interval used to select every nth name the
sampling rate (28). Earl Babbie calls the same interval the sampling interval (84).

Before selecting from the population list, determine the “number of entries on the list and
the number of elements from the list that are to be selected” (Fowler, Jr. 14). For
example, if there are 129 people on the population list, select a beginning or starting point
at random and choose every tenth name that appears on the list. If you randomly choose
to begin on the name that appears on line 24, you will select for the sample the names
that appear on lines 24, 34, 44, and so on.
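
The example above can be sketched in Python. This is a minimal illustration, not a definitive procedure: the function name `systematic_sample` is hypothetical, line numbers stand in for names, and the random start is drawn from the whole list to mirror the example (many texts instead draw the start from within the first interval so that the entire list is covered).

```python
import random

def systematic_sample(population, interval, start=None, seed=None):
    """Select every `interval`-th element, beginning at a 1-based line number `start`.

    If `start` is not given, it is drawn at random from the whole list,
    mirroring the example in the text (an assumption of this sketch).
    """
    if start is None:
        start = random.Random(seed).randint(1, len(population))
    return population[start - 1::interval]

# 129 line numbers standing in for the names on the population list.
lines = list(range(1, 130))
picked = systematic_sample(lines, interval=10, start=24)
print(picked)  # lines 24, 34, 44, ... up to 124
```

With a start on line 24 and an interval of ten, the sketch selects eleven names and never reaches lines 1 through 23, which is one reason to examine the starting rule as well as the ordering of the list.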

The most important element of systematic random sampling is that the selection starting
point is random (Babbie 84; Fowler, Jr. 14; Henry 28; MacNealy 155). One inherent
disadvantage to systematic random sampling that researchers face is that the population
list should be carefully examined for arrangement order (Babbie 85). Babbie goes on to
explain that “if the elements are arranged in any particular order, you should ascertain
whether that order will bias the sample to be selected and should take steps to counteract
any possible bias” (85). Henry expounds by describing that this issue arises when “the
population listing is arranged in cyclical fashion and the cycle coincides with the
selection interval” (28). This problem can be remedied by examining the list and making
sure that the list of names is not arranged in any type of order.


Table 1
Probability Sampling Methods

Type of Sampling    Selection Strategy
Simple              Each member of the study population has an
                    equal probability of being selected.
Systematic          Each member of the study population is either
                    assembled or listed, a random start is
                    designated, then members of the population are
                    selected at equal intervals.
Stratified          Each member of the study population is
                    assigned to a group or stratum, then a simple
                    random sample is selected from each stratum.
Cluster             Each member of the study population is
                    assigned to a group or cluster, then clusters are
                    selected at random and all members of a
                    selected cluster are included in the sample.
(Henry 27)

Stratified Random Sampling

Stratified random sampling is “one in which the population is divided into subgroups or
‘strata,’ and a random sample is then selected from each subgroup” (Fink 11). When a
few characteristics are known about a population, stratified random sampling is preferable
because the population may be arranged in subgroups and then a random sample may be
selected from each of these subgroups (Babbie 85; Cochran 65; Fowler, Jr. 15; Henry 28;
Kish 21). MacNealy further advises “arranging the original unit into categories so that the
distribution of a particular group in the population of interest will be closely replicated in
the sample” (156). These subgroups can exhibit characteristics including but not limited
to gender, race, ethnicity, religion, and age groups.

Two types of stratified random sampling include proportionate and disproportionate.
Proportionate stratification is “often done to insure representation of groups that have
importance to the research” and disproportionate is “done to allow analysis of some
particular strata members or to increase the overall precision of the sample estimates”
(Henry 29). The main difference between the two lies in the sampling fraction.
Proportionate stratification uses the same fraction for each subgroup, and disproportionate
stratification uses different fractions for each subgroup. To choose which is right for a research project,
the researcher must be aware of the number of members in each subgroup. Take
for instance a population of churches in Lubbock, Texas. Whereas the First Baptist
Church may have 700 members in the subgroup, the Assembly of God may only have
130 members. A single fraction of, say, 10% would yield 70 members from the first church
but only 13 from the second, so the researcher must decide whether the smaller subgroup
needs a larger fraction. This is yet another choice the researcher must make.

A simpler example is a population of high school students that is 55% female and 45%
male. The population should be listed by
gender. The selection process would then include selecting every nth female from the
female list until 55% of the sample is female. The remaining 45% should be
selected from the male list by choosing every nth male. This ensures that the sample is
representative of the population insofar as gender is concerned.
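
Proportionate stratification can be sketched as follows. The school of 1,000 students, the 10% fraction, and the function name are assumptions made for illustration; note also that this sketch draws a simple random sample within each stratum (as in Table 1) rather than selecting every nth name.

```python
import random

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from every stratum using simple random sampling.

    `strata` maps a stratum label to the list of its members.
    """
    rng = random.Random(seed)
    sample = {}
    for label, members in strata.items():
        n = round(len(members) * fraction)
        sample[label] = rng.sample(members, n)
    return sample

# Hypothetical school of 1,000 students: 55% female, 45% male.
strata = {
    "female": [f"F{i}" for i in range(550)],
    "male": [f"M{i}" for i in range(450)],
}
sample = proportionate_stratified_sample(strata, fraction=0.10, seed=3)
print({label: len(members) for label, members in sample.items()})
# {'female': 55, 'male': 45}
```

Because the same fraction is applied to both lists, the sample of 100 keeps the 55/45 split of the population. A disproportionate design would pass a different fraction for each stratum.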

A concern when using stratified random sampling is that the researcher must identify and
justify the subgroups (Fink 13). By using stratified random sampling, there is an attempt
“to control for sampling error” (MacNealy 156). To control for sampling error,
researchers must not only identify and justify the subgroups but make sure they are truly
representative of the population.

Cluster Sampling

Cluster sampling, on the surface, is very similar to stratified sampling in that “survey
population members are divided into unique, nonoverlapping groups prior to sampling”
(Henry 29). These groups are referred to as clusters instead of strata because they are
“naturally occurring groupings such as schools, households, or geographic units” (Henry
29). Whereas a stratified sample “involves selecting a few members from each group or
stratum,” cluster sampling involves “the selection of a few groups and data are collected
from all group members” (Henry 29). This sampling method is used when no master list
of the population exists but “cluster” lists are obtainable (Babbie 88; Frey, et al. 130;
Henry 29; Lohr 24; MacNealy 156).

For example, a researcher wants a list of all special education teachers in the United
States. A comprehensive list does not exist; therefore, the researcher must then contact
each state and ask for a comprehensive list for that specific state. If each state does not
compile a comprehensive list, the researcher would then contact each school in that state
asking for a list of special education teachers. The groups of special education teachers
compiled may then be put into groups or clusters depending on the state in which they are
teaching.

It is important to note that cluster sampling contains an additional sampling
method: multistage sampling. At least one reference
separated multistage sampling from cluster sampling as a probability sampling method
(Henry 30). Another, Fowler, Jr., named only multistage sampling and left the word
cluster out altogether (18). Henry indicates that multistage sampling is an extension of
cluster sampling, whereas all others include multistage sampling as
part of cluster sampling. Multistage sampling occurs when a researcher must cluster
together certain groups because a master list is not available but encounters a more
complex design. It involves two stages: 1) select clusters randomly from the population
and list their members, and 2) select individuals randomly from the clusters (Babbie 88; Frey et al. 130).
While multistage sampling is a part of cluster sampling in most of the books researched, not all see
it as one method.

A drawback to using cluster sampling occurs within the precision of the statistics (Babbie
88; Henry 30). While cluster sampling is convenient when a master list of the population
does not exist, the researcher will run the risk of inaccurate findings. One way to increase
the accuracy of results from cluster sampling is to use many clusters when implementing
multistage sampling (Fink 16). Fink goes on to explain “as you increase the number of
clusters, you can decrease the size of the sample within each” (16).
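
The two stages of multistage cluster sampling can be sketched in Python. The state names, cluster sizes, and function name below are hypothetical and serve only to illustrate the idea.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick individuals at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # small clusters are taken whole
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical cluster lists: special education teachers grouped by state.
clusters = {
    "Texas": [f"TX teacher {i}" for i in range(40)],
    "Ohio": [f"OH teacher {i}" for i in range(25)],
    "Maine": [f"ME teacher {i}" for i in range(8)],
    "Utah": [f"UT teacher {i}" for i in range(15)],
}
sample = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=10, seed=5)
for state, teachers in sample.items():
    print(state, len(teachers))
```

Raising `n_clusters` while lowering `n_per_cluster` follows Fink's advice to use more clusters with fewer members drawn from each. Setting `n_per_cluster` to the size of the largest cluster reproduces single-stage cluster sampling, in which every member of a selected cluster is included.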


Non-probability Sampling

The advantage of non-probability sampling is that it is a convenient way for researchers to
assemble a sample with little or no cost and/or for those research studies that do not
require representativeness of the population (Babbie 97). Non-probability sampling is a
good method to use when conducting a pilot study, when attempting to question groups
who may have sensitivities to the questions being asked and may not want to answer those
questions honestly, and for those situations when ethical concerns may keep the
researcher from speaking to every member of a specific group (Fink 17). In non-
probability sampling, subjective judgments play a specific role (Henry 16). Researchers
must be careful not to generalize results based on non-probability sampling to the general
population.

Non-probability sampling includes various methods. None of the sources agree on all
of them. MacNealy indicates three methods, whereas Frey, et al. and Henry provide five
methods under non-probability sampling. Frey, et al. and Henry do not agree on the
naming conventions of the five given in each of their books. I attempt to summarize the
various non-probability sampling methods in Table 2.



Table 2
Various Non-probability Sampling Methods by Author

Author         Types of Non-probability Sampling
Babbie         - Purposive or judgmental sampling
               - Quota sampling
               - Reliance on available subjects (Convenience)
Fink           - Convenience
               - Snowball sampling
               - Quota sampling
               - Focus groups
Frey, et al.   - Convenience
               - Volunteer
               - Purposive
               - Quota
               - Network (snowball)
Henry          - Convenience samples
               - Most similar/most dissimilar samples (purposive)
               - Typical case samples (purposive)
               - Critical case samples (purposive)
               - Snowball samples
               - Quota samples
MacNealy       - Convenience sampling
               - Purposeful sampling
               - Snowball sampling


The following non-probability sampling methods will be discussed in this section with
reference to the various naming conventions in Table 2: Convenience, Purposive,
Snowball, and Quota. These four methods of non-probability sampling cover all those
listed in Table 2 although the naming conventions are not the same. Table 3 summarizes
the four non-probability sampling methods.

Convenience

Convenience sampling includes participants who are readily available and agree to
participate in a study (Fink 18; Frey, et al. 131; Henry 18; MacNealy 156). MacNealy
indicates that convenience sampling is often called accidental (156), while Frey, et al.
agree with the alternate title of accidental but also include haphazard as an alternate title
(131). Babbie does not use the specific title of convenience, but calls this same type of
non-probability sample “reliance on available subjects” (99).

All of these alternate names for convenience non-probability sampling include the same
definition. Convenience is just that… convenient. This is a relatively easy choice for
researchers when a group of people cannot be found to survey or question.

For example, convenience sampling may include going to a place of business (mall,
restaurant, etc.) and questioning or surveying those people who are available and consent
to being questioned. If the researcher is interested in what people think of hair cutting
techniques from a consumer perspective, the researcher may go to a hair salon and a
barber shop and poll those patrons leaving the establishment after getting their hair cut.

While convenience sampling includes only those ready and available, there is no excuse
for sloppiness (Babbie 99). Babbie goes on to explain that “survey researchers need to
find ways of procuring a sample that will represent the population they are interested in
learning about” (99). In the example above, the interest is in people who have had their
hair cut recently. The researcher would get far fewer results from those people exiting a
restaurant. While some of those people may have had their hair cut that day, the better
selection is to go to a place where haircuts take place.

Purposive

Purposive non-probability sampling is also known as judgment or judgmental sampling (Babbie 97;
Jones 766). It is referred to as purposeful by MacNealy (157). Gary Henry breaks
purposive down into three different methods: Most similar/dissimilar cases, typical cases,
and critical cases. No matter the naming convention used, all authors agree on the
definition of this non-probability sampling method.

Purposive sampling is selecting a sample “on the basis of your own knowledge of the
population, its elements, and the nature of your research aims” (Babbie 97). That is, the
population is “non-randomly selected based on a particular characteristic” (Frey, et al.
132). The individual characteristics are selected to answer necessary questions about a
“certain matter or product” (MacNealy 157). The researcher is then able to select
participants based on internal knowledge of said characteristic. This method is useful if a
researcher wants to study “a small subset of a larger population in which many members
of the subset are easily identified but the enumeration of all is nearly impossible” (Babbie
97). Pilot studies are well suited to this type of non-probability sampling method.

For example, if a researcher wants to know student thoughts on using an online
registration system, those students who attempt to use the system would be surveyed. If
this survey took place at one institution, the results could not be generalized to every
institution utilizing web registration, only the institution where the survey took place.

Frey, et al. indicate that purposive non-probability sampling and stratified probability
sampling are very similar but warn that there is a crucial difference between the two.
Researchers using purposive sampling do not “select respondents randomly from each
group within the stratification categories,” whereas stratified sampling includes random
sampling at its core (132). All respondents, not only those randomly selected, “who
possess the characteristic are included” (132). It is important to note that purposive
sampling presupposes that the researcher understands the characteristics clearly and
thoroughly enough to choose the sample and relates those findings only to that specific
group and not to the population as a whole.

Table 3
Non-probability Sampling Methods

Type of Sampling    Selection Strategy
Convenience         Select cases based on their availability for the
                    study.
Purposive           Select cases that are judged to represent similar
                    characteristics.
Snowball            Group members identify additional members to
                    be included in the sample.
Quota               Interviewers select a sample that yields the
                    same proportions as the population proportions
                    on easily identified variables.
(Henry 18)

Snowball

Frey, et al. call snowball sampling “network” sampling (133). The definitions are the
same. Snowball sampling is used “in those rare cases when the population of interest
cannot be identified other than by someone who knows that a certain person has the
necessary experience or characteristics to be included” (MacNealy 157). Snowball
sampling also includes relying on previously identified group members to identify others
who may share the same characteristics as the group already in place (Henry 21).

For example, a researcher wants to find usability engineers who have lost their jobs due to
company downsizing. A list of these types of people does not exist, but if the researcher
knows someone who has experienced this, that person may know of others and give
contact information so that others may be added to the group. MacNealy describes this as
“one participant leads to another” (157). Again, this type of non-probability sampling
cannot be generalized to a population but can be generalized to the group who shares the
same characteristics.
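
The way “one participant leads to another” can be sketched as a walk through a referral network. The names and referrals below are invented for illustration, and the breadth-first order is an assumption of this sketch rather than a rule given by any of the authors cited.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Grow a sample by following referrals from already-identified members.

    `referrals` maps each person to the people they can name.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network among downsized usability engineers.
referrals = {
    "Ana": ["Ben", "Cruz"],
    "Ben": ["Dee"],
    "Cruz": ["Ana", "Eli"],
    "Dee": [],
}
print(snowball_sample(["Ana"], referrals, max_size=4))
# ['Ana', 'Ben', 'Cruz', 'Dee']
```

The sample can only reach people connected to the first contact, which is one way to see why the results cannot be generalized to a wider population.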



Quota

MacNealy does not include quota sampling in her non-probability section, but others do
well to define it as dividing the “population group being studied into subgroups. Then
based on the proportions of the subgroups needed for the final sample, interviewers are
given a number of units from each subgroup that they are to select and interview” (Henry
22). Quota sampling is a good method to use to non-randomly select groups based on
gender, age, race, and ethnicity, to name a few.

Frey, et al. describe quota sampling as a method in which “respondents are selected non-randomly on the
basis of their known proportion to the population” (133). Henry compares quota sampling
to stratified probability sampling but identifies a major difference between the two:
“quota sampling allows the interviewer
discretion in the selection of the individuals for the sample” (22).
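
The contrast with stratified sampling can be seen in a minimal Python sketch of quota selection. The passers-by, the quotas, and the function name are hypothetical; the point is that respondents are accepted in the order the interviewer happens to meet them, with no random draw.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order encountered until each subgroup quota is filled.

    `respondents` is an iterable of (name, subgroup) pairs in the order the
    interviewer happens to meet them; no random selection is involved.
    """
    remaining = dict(quotas)
    sample = []
    for name, subgroup in respondents:
        if remaining.get(subgroup, 0) > 0:
            sample.append((name, subgroup))
            remaining[subgroup] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return sample

# Hypothetical stream of passers-by and hypothetical quotas.
passers_by = [("P1", "female"), ("P2", "female"), ("P3", "male"),
              ("P4", "female"), ("P5", "male"), ("P6", "male")]
print(quota_sample(passers_by, {"female": 2, "male": 2}))
# [('P1', 'female'), ('P2', 'female'), ('P3', 'male'), ('P5', 'male')]
```

The proportions come out as planned, but who fills each quota depends entirely on who was available first, which is where the interviewer's discretion and the resulting bias enter.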

There are a number of problems that researchers should be aware of when choosing to
use this method of non-probability sampling:

• The list of subgroups and the proportions identified must be accurate before the
sampling begins (Babbie 99).
• The selection of the sample elements within a given cell (for proportion choice)
may include bias although the proportion of the population is estimated correctly
(Babbie 99).
• Non-response is hidden in quota sampling because the interviewer may simply
select another household to interview and “may under represent the proportion of
the population that is difficult to reach” (Henry 22).
• Generalizations to the population cannot be made when using quota sampling
(Henry 22).

Gary Henry concludes his discussion of quota sampling by indicating that this non-
probability sampling method has “fallen out of favor” for the reasons stated above as well
as indicating that “quotas is a biased sampling technique, although the bias is generally
small” (23). He also indicates that sampling error may be higher when using this technique
(23).

Sample Errors

As with all research methods, sampling provides some room for error on the part of the
researcher. Being aware of those possible errors is essential in selection of the sampling
method used as well as calculation of the data collected. Simply being aware of possible
errors is often not enough. Arlene Fink believes that no matter how thorough and
proficient the researcher is, “sampling bias or error is inevitable” (25). Sampling error
may be defined as “the error that results from taking one sample instead of examining the
whole population” (Lohr 15). Lohr simply defines several types of sample errors as
“undercoverage, nonresponse, and sloppiness in data collection” (16).

Undercoverage refers to selecting a sample that is not large enough. The error here is that
the information gathered from a small sample is not representative of the population and
cannot be generalized to that population. Gary Henry indicates that “small sample size
may contribute to a conservative bias (Type II error) in the application of the statistical
test” (13). This happens when “a null hypothesis is not rejected although in fact it is
false” (13).

Nonresponse is a nonsampling error that occurs when some members of the population
who are eligible to be sampled are unwilling to participate or do not answer all questions
on the survey(s) (Cochran 292; Fink 26; Henry 124; Lohr 6). Lohr indicates that “the
main problem caused by nonresponse is potential bias of population estimates” (257).

Nonsampling error “occurs because of imprecision in the definition of the target and
study population and errors in survey design and measurement” (Fink 25). Some
nonsampling errors include changes due to historical circumstances, neglect of definitions and
of inclusion and exclusion criteria, and instrument or survey process bias
(Fink 26).

Researchers should keep in mind that an “increase in sample size and an increased
homogeneity of the elements being sampled” allow for the reduction of sampling error
(Babbie 89). However, Lohr warns that “increasing the sample size without targeting
nonresponse does nothing to reduce nonresponse bias; a larger sample size merely
provides more observations from the class of persons that would respond to the survey”
(257).
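
The effect of sample size and homogeneity on sampling error can be illustrated with the standard error of a sample proportion. The formula used here, the square root of p(1 - p)/n, is the standard textbook expression for simple random sampling and is an assumption of this sketch rather than something quoted from the sources above; it also ignores the finite population correction.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# An evenly split (heterogeneous) population, p = 0.5:
print(round(standard_error(0.5, 100), 3))  # 0.05
print(round(standard_error(0.5, 400), 3))  # 0.025, quadrupling n halves the error
# A more homogeneous population, p = 0.9, at the same sample size:
print(round(standard_error(0.9, 100), 3))  # 0.03
```

None of these figures reflects nonresponse bias, which, as Lohr warns, a larger sample does not reduce.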

Conclusion

Researchers may choose from a variety of sampling methods. The researcher’s goals
determine which sampling method is best for the research to be conducted. The main choice
in selecting a sampling method is whether or not the researcher wants to generalize
the findings from the sample to the whole of the population being studied. Being aware of
possible errors due to the sample method chosen is also very important because reporting
possible errors within the results section allows the study to be regarded as valid. Many
sample method choices are available; the researcher must choose the method that is right
for the study.

References


Babbie, Earl. Survey Research Methods. 2nd ed. Belmont, California: Wadsworth Publishing
Company, 1990.

Berinstein, Paula. Business Statistics on the Web: Find Them Fast – At Little or No Cost.
New Jersey: CyberAge Books, 2003.

Cochran, William G. Sampling Techniques. New York: John Wiley & Sons, Inc., 1953.

Fink, Arlene. How to Sample in Surveys. Vol. 6. London: Sage Publications, 1995.

Fowler, Floyd J., Jr. Survey Research Methods. 2nd ed. Vol. 1. London: Sage Publications,
1993.

Frey, Lawrence R., Carl H. Botan, and Gary L. Kreps. Investigating Communication: An
Introduction to Research Methods. 2nd ed. Boston: Allyn and Bacon, 2000.

Henry, Gary T. Practical Sampling. Vol. 21. London: Sage Publications, 1990.

Jones, Howard L. “The Application of Sampling Procedures to Business Operations.”
Journal of the American Statistical Association. 50.271 (1955): 763-774.

Kish, Leslie. Survey Sampling. New York: John Wiley & Sons, 1965.

Lohr, Sharon L. Sampling: Design and Analysis. Albany: Duxbury Press, 1999.

MacNealy, Mary Sue. Strategies for Empirical Research in Writing. New York:
Longman, 1999.

Wikipedia. The Free Encyclopedia. 7 March 2007. 22 February 2007
http://en.wikipedia.org/wiki/Sampling_(statistics)#History_of_sampling