Chapter 7 Sampling and Sampling Design

Types of Sampling Methods

 A sample is defined as the portion of a population that has been selected for analysis. The
results of the sample are then used to estimate characteristics of the entire population.
Statistical sampling procedures focus on collecting a small representative portion of the larger
population.
 There are three main reasons for selecting a sample:
 Selecting a sample is less time-consuming than selecting every item in the population.
 Selecting a sample is less costly than selecting every item in the population.
 Analyzing a sample is less cumbersome and more practical than analyzing the entire
population.
 The sampling process begins by defining the frame, a listing of items that make up the
population. Frames are data sources such as population lists, directories, or maps.
 Inaccurate or biased results can occur if a frame excludes certain portions of the population.
 Using different frames to generate data can lead to different conclusions.
 There are two types of samples: nonprobability samples and probability samples.
 In a nonprobability sample, you select the items or individuals without knowing their
probabilities of selection. The theory of statistical inference that has been developed for
probability sampling cannot be applied to nonprobability samples.
 A common type of nonprobability sampling is convenience sampling. Items selected are easy,
inexpensive, or convenient to sample.
 In many cases, participants in the sample select themselves.
 For many studies, only a nonprobability sample such as a judgment sample is available. In a
judgment sample, you get the opinions of preselected experts in the subject matter. Although
the experts may be well informed, you cannot generalize their results to the population.
 Nonprobability samples can have certain advantages, such as convenience, speed, and low cost.
However, their lack of accuracy due to selection bias and the fact that the results cannot be
used for statistical inference more than offset these advantages.
 In a probability sample, you select items based on known probabilities.
 Probability samples allow you to make inferences about the population of interest.
 The four types of probability samples most commonly used are simple random, systematic,
stratified, and cluster samples.

Simple Random Samples


 Every item from a frame has the same chance of selection as every other item.
 Simple random sampling is the most elementary random sampling technique.
 It forms the basis for the other random sampling techniques.
 You use n to represent the sample size and N to represent the frame size. You number every
item in the frame from 1 to N. The chance that you will select any particular member of the
frame on the first selection is 1/N.
 You select samples with replacement or without replacement.
 Sampling with replacement means that after you select an item, you return it to the frame,
where it has the same probability of being selected again.
 Sampling without replacement means that once you select an item, you cannot select it again.
The chance that you will select any particular item not previously chosen on the second selection
is now 1 out of N - 1.
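 Fishbowl methods, in which items are drawn by hand from a container, are not very useful in
practice because it is hard to mix the items thoroughly and to draw them truly at random.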
 A table of random numbers consists of a series of digits listed in a randomly generated
sequence. The table can be read either horizontally or vertically.
 You first need to assign code numbers to the individual items of the frame. Then you generate
the random sample by reading the table of random numbers and selecting those individuals
from the frame whose assigned code numbers match the digits found in the table.
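
The two selection schemes can be sketched in Python. This is a minimal illustration rather than a prescribed procedure: the frame of N = 20 coded items, the seed, and the helper names `sample_with_replacement` and `sample_without_replacement` are assumptions made for the example, and the standard library's pseudorandom generator stands in for a printed table of random numbers.

```python
import random

def sample_with_replacement(frame, n, rng):
    """Draw n items; every draw sees the full frame, so repeats are possible."""
    return [rng.choice(frame) for _ in range(n)]

def sample_without_replacement(frame, n, rng):
    """Draw n distinct items; a selected item cannot be selected again."""
    return rng.sample(frame, n)

# Hypothetical frame: items coded 1..N
N = 20
frame = list(range(1, N + 1))
rng = random.Random(7)  # fixed seed only so the sketch is repeatable

with_repl = sample_with_replacement(frame, 5, rng)
without_repl = sample_without_replacement(frame, 5, rng)
print(with_repl, without_repl)
```

With replacement, the same code number may appear more than once in the result. Without replacement, every code number appears at most once.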

Systematic Samples
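 In a systematic sample, you partition the N items in the frame into n groups of k items, where
k = N/n, rounded to the nearest integer. You choose the first item at random from the first k items
in the frame and then select the remaining n - 1 items by taking every kth item thereafter.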
 A systematic sample is also a convenient mechanism for collecting data from telephone books,
class rosters, and consecutive items coming off an assembly line.
 Simple random sampling and systematic sampling are simpler than other, more sophisticated,
probability sampling methods, but they generally require a larger sample size.
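 Systematic sampling is prone to selection bias when the frame contains a pattern, because
selecting items at a fixed interval can repeatedly pick items from the same part of that pattern.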
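
A sketch of systematic selection follows: one item is chosen at random from the first k items, and every kth item is taken after that. The frame of 100 numbered items, the seed, and the function name `systematic_sample` are hypothetical, and the sketch assumes that N is an exact multiple of n so that no rounding of k is needed.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    k = len(frame) // n        # group size; assumes N is a multiple of n
    start = rng.randrange(k)   # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))    # hypothetical frame of N = 100 items
rng = random.Random(3)         # fixed seed only so the sketch is repeatable
picked = systematic_sample(frame, 10, rng)
print(picked)
```

Because the selected items are always exactly k positions apart, any pattern in the frame that repeats every k items would be carried straight into the sample.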

Stratified Samples

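 You first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is
defined by some common characteristic, such as gender or year in school. You then select a simple
random sample from each stratum, with sample sizes proportional to the sizes of the strata, and
combine the results.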
 Stratified sampling is more efficient than either simple random sampling or systematic sampling.
 The homogeneity of items within each stratum provides greater precision in the estimates of
underlying population parameters.
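
One way to picture stratified sampling is the sketch below, which assumes proportional allocation (each stratum contributes in proportion to its size). The frame of 100 students, the two strata labels, and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, rng):
    """Simple random sample within each stratum, sized in proportion to the stratum."""
    strata = defaultdict(list)
    for item in frame:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n for awkward proportions.
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample

# Hypothetical frame: 60 first-year and 40 second-year students
frame = ([(i, "first-year") for i in range(60)]
         + [(i, "second-year") for i in range(60, 100)])
rng = random.Random(5)
sample = stratified_sample(frame, lambda item: item[1], 10, rng)
counts = {year: sum(1 for _, y in sample if y == year)
          for year in ("first-year", "second-year")}
print(counts)
```

Every stratum is guaranteed to appear in the sample in its population proportion, which a simple random sample of the same size would only achieve on average.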

Cluster Samples

 You divide the N items in the frame into clusters that contain several items. Clusters are often
naturally occurring designations, such as counties, election districts, city blocks, households, or
sales territories. You then take a random sample of one or more clusters and study all items in
each selected cluster.
 Cluster sampling is often more cost-effective than simple random sampling. However, cluster
sampling often requires a larger sample size to produce results as precise as those from simple
random sampling or stratified sampling.
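
Cluster selection can be sketched as follows. The five city blocks, the four households per block, and the function name `cluster_sample` are invented for illustration.

```python
import random

def cluster_sample(clusters, m, rng):
    """Randomly choose m clusters and keep every item in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

# Hypothetical frame: 5 city blocks with 4 households each
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 5)]
    for b in range(1, 6)
}
rng = random.Random(11)
chosen, households = cluster_sample(clusters, 2, rng)
print(chosen, households)
```

Only the clusters are selected at random. Within a chosen cluster nothing is sampled, because every item is studied.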

Evaluating Survey Worthiness

 Surveys are used to collect data.


 First, you must evaluate the purpose of the survey, why it was conducted, and for whom it was
conducted. The second step in evaluating the worthiness of a survey is to determine whether it
was based on a probability or nonprobability sample.

Survey Error
 There are four types of survey errors:
 Coverage error - occurs if certain groups of items are excluded from the frame so that
they have no chance of being selected in the sample. Coverage error results in a
selection bias. Any random probability sample selected will provide only an estimate of
the characteristics of the frame, not the actual population.
 Nonresponse error - Not everyone is willing to respond to a survey. In fact, research has
shown that individuals in the upper and lower economic classes tend to respond less
frequently to surveys than do people in the middle class. Nonresponse error arises from
failure to collect data on all items in the sample and results in a nonresponse bias. You
should make several attempts to convince such individuals to complete the survey.
Personal interviews and telephone interviews usually produce a higher response rate
than do mail surveys—but at a higher cost.
 Sampling error - reflects the variation, or “chance differences,” from sample to sample,
based on the probability of particular individuals or items being selected in the
particular samples. The margin of error is the sampling error. You can reduce sampling
error by using larger sample sizes.
 Measurement error - the process of measurement is often governed by what is
convenient, not what is needed. The measurements you get are often only a proxy for
the ones you really desire. In order to avoid leading questions, you need to present
questions in a neutral manner. Three sources of measurement error are ambiguous
wording of questions (that is, poor questionnaire wording), the Hawthorne effect, and
respondent error. The Hawthorne effect occurs when a respondent feels obligated to
please the interviewer. Respondent error occurs as a result of an overzealous or
underzealous effort by the respondent.
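
The difference between sampling error and coverage error can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population values, the group that the flawed frame leaves out, the number of repetitions, and the helper name `spread_of_sample_means` are all hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 8,000 values near 50 and 2,000 values near 80.
population = ([rng.gauss(50, 10) for _ in range(8000)]
              + [rng.gauss(80, 10) for _ in range(2000)])
# A flawed frame that omits the second group entirely (coverage error).
flawed_frame = population[:8000]

def spread_of_sample_means(frame, n, reps=500):
    """Return (average, standard deviation) of the sample means over repeated samples."""
    means = [statistics.fmean(rng.sample(frame, n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

pop_mean = statistics.fmean(population)
small = spread_of_sample_means(population, 25)     # complete frame, small samples
large = spread_of_sample_means(population, 400)    # complete frame, large samples
biased = spread_of_sample_means(flawed_frame, 400) # flawed frame, large samples
print(pop_mean, small, large, biased)
```

In this setup, the larger samples from the complete frame vary less from sample to sample, which is the reduction in sampling error. The samples from the flawed frame stay centered on the mean of the frame rather than the mean of the population, however large they are.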

Ethical Issues

 Coverage error can result in selection bias and becomes an ethical issue if particular groups or
individuals are purposely excluded from the frame so that the survey results are more
favourable to the survey’s sponsor.
 Nonresponse error can lead to nonresponse bias and becomes an ethical issue if the sponsor
knowingly designs the survey so that particular groups or individuals are less likely than others
to respond.
 Sampling error becomes an ethical issue if the findings are purposely presented without
reference to sample size and margin of error so that the sponsor can promote a viewpoint that
might otherwise be inappropriate.
 Measurement error becomes an ethical issue in one of three ways: (1) a survey sponsor chooses
leading questions that guide the responses in a particular direction; (2) an interviewer, through
mannerisms and tone, purposely creates a Hawthorne effect or otherwise guides the responses
in a particular direction; or (3) a respondent wilfully provides false information.
 Ethical issues also arise when the results of nonprobability samples are used to form conclusions
about the entire population.

Sampling Distributions
 Your main concern when making a statistical inference is reaching conclusions about a
population, not about a sample.
 Hypothetically, to use the sample statistic to estimate the population parameter, you could
examine every possible sample of a given size that could occur.
 A sampling distribution is the distribution of the results if you actually selected all possible
samples.

Sampling Distribution of the Mean

 The sampling distribution of the mean is the distribution of all possible sample means if you
select all possible samples of a given size.
 The sample mean is unbiased because the mean of all the possible sample means (of a given
sample size, n) is equal to the population mean, 𝜇.

 Number of possible samples of size n (sampling with replacement from N items) = $N^n$
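
For a very small population you can list every possible sample and check the unbiasedness claim directly. The sketch below assumes sampling with replacement, in which case N items give N to the power n ordered samples of size n. The four population values are hypothetical.

```python
from itertools import product
from statistics import fmean

population = [3, 2, 1, 4]   # hypothetical population values, N = 4
n = 2                       # sample size

# Every ordered sample of size n drawn with replacement: N**n of them.
all_samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in all_samples]

number_of_samples = len(all_samples)
mean_of_sample_means = fmean(sample_means)
population_mean = fmean(population)
print(number_of_samples, mean_of_sample_means, population_mean)
```

Individual sample means range from 1 to 4, yet their average over all possible samples coincides with the population mean.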

Standard Error of the Mean

 As the sample size increases, the effect of a single extreme value becomes smaller because it is
averaged with more values.
 The value of the standard deviation of all possible sample means, called the standard error of
the mean, expresses how the sample means vary from sample to sample.
 As the sample size increases, the standard error of the mean decreases in inverse proportion to
the square root of the sample size.
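
This relationship is usually written as shown below, where σ is the population standard deviation and n is the sample size. The symbol for the standard error is notation introduced here. The formula assumes sampling with replacement, or sampling without replacement from a population that is large relative to the sample.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

For example, increasing the sample size from 25 to 100 (a factor of 4) cuts the standard error of the mean in half, because the square root of 4 is 2.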

Sampling from Non-Normally Distributed Populations: The Central Limit Theorem

 The Central Limit Theorem states that as the sample size (i.e., the number of values in each
sample) gets large enough, the sampling distribution of the mean is approximately normally
distributed. This is true regardless of the shape of the distribution of the individual values in the
population.
 When the sample size is at least 30, the sampling distribution of the mean is approximately
normal.
 In the case in which the distribution of a variable is extremely skewed or has more than one
mode, you may need sample sizes larger than 30 to ensure normality in the sampling
distribution of the mean.
 The mean of the sampling distribution of the mean is always equal to the mean of the population.
 When the population is normally distributed, the sampling distribution of the mean is normally
distributed for any sample size.
 For a symmetrical but non-normal population (for example, a uniform one), a peaking, or central
limiting, effect is already at work when samples of size n = 2 are selected. For n = 5, the sampling
distribution is bell-shaped and approximately normal. When n = 30, the sampling distribution looks
very similar to a normal distribution. In general, the larger the sample size, the more closely the
sampling distribution will follow a normal distribution.
 An exponential population is extremely right-skewed. When n = 2, the
sampling distribution is still highly right-skewed but less so than the distribution of the
population. For n = 5, the sampling distribution is slightly right-skewed. When n = 30, the
sampling distribution looks approximately normal.
 Conclusions regarding the Central Limit Theorem:
 For most population distributions, regardless of shape, the sampling distribution of the
mean is approximately normally distributed if samples of at least size 30 are selected.
 If the population distribution is fairly symmetrical, the sampling distribution of the
mean is approximately normal for samples as small as size 5.
 If the population is normally distributed, the sampling distribution of the mean is
normally distributed, regardless of the sample size.
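
The effect of sample size on a skewed population can be checked with a short simulation. The sketch below is an illustration under stated assumptions, not a proof: it uses an exponential population with mean 1, 10,000 repeated samples per sample size, a fixed seed, and a simple moment-based `skewness` helper, all chosen for the example.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: average cubed deviation in standard-deviation units."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population with mean 1."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
skew_by_n = {n: skewness(sample_means(n, 10000, rng)) for n in (2, 5, 30)}
print(skew_by_n)
```

A skewness of 0 corresponds to a symmetrical distribution. The simulated skewness of the sample means shrinks toward 0 as n goes from 2 to 5 to 30, in line with the conclusions above.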

Sampling Distribution of the Proportion

 The population proportion, represented by π, is the proportion of items in the entire population
with the characteristic of interest.
 The sample proportion, represented by p, is the proportion of items in the sample with the
characteristic of interest.
 The sample proportion, a statistic, is used to estimate the population proportion, a parameter.
 To calculate the sample proportion, you assign one of two possible values, 1 or 0, to represent
the presence or absence of the characteristic. You then sum all the 1 and 0 values and divide by
n, the sample size.
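
In symbols, the calculation just described can be written as follows. Here X is notation introduced for the number of items in the sample that have the characteristic of interest, which is the sum of the 1 and 0 values.

```latex
p = \frac{X}{n}
```

For example, if 12 of the 40 items in a sample have the characteristic, then p = 12/40 = 0.30.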

 The statistic p is an unbiased estimator of the population proportion, π.

 You can use the normal distribution to approximate the binomial distribution when nπ and
n(1 - π) are each at least 5.
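
The sample proportion and the rule of thumb for the normal approximation can be sketched as below. The function names and the ten indicator values are hypothetical. The standard error expression, the square root of π(1 - π)/n, is not stated in these notes; it is the standard formula for the standard error of the proportion and is included here as an assumption.

```python
import math

def sample_proportion(indicators):
    """Proportion of 1s among 0/1 indicator values."""
    return sum(indicators) / len(indicators)

def normal_approximation_ok(n, pi):
    """Rule of thumb from the notes: n*pi and n*(1 - pi) are each at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5

def standard_error_of_proportion(n, pi):
    """sqrt(pi * (1 - pi) / n); an assumption added for this sketch."""
    return math.sqrt(pi * (1 - pi) / n)

# Hypothetical sample: 1 = has the characteristic, 0 = does not
indicators = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p = sample_proportion(indicators)
print(p, normal_approximation_ok(100, 0.1), normal_approximation_ok(20, 0.1))
```

With π = 0.1, a sample of 100 satisfies the rule (nπ = 10 and n(1 - π) = 90), whereas a sample of 20 does not (nπ = 2).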
