Important Terms of Sampling
Sampling: It is the process of choosing a representative sample from a target population and collecting data
from that sample in order to come to a conclusion about the population as a whole.
Census: A census is the process of collecting information about every member of a selected population. A
national census, for example, provides statistical information such as the population of a country, the ratio
of different genders, the number of employed people, the total number of educated people, etc.
Objective of Sampling: To obtain the optimum results, i.e., the maximum information about the
characteristics of the population with the resources at our disposal in terms of time, money and
manpower, by studying the sample values only, and to obtain the best possible estimates of the population
parameters.
Discrete and Continuous: Discrete variables are usually obtained by counting. There are a finite or
countable number of choices available with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all examples
of continuous variables. Since continuous variables are real numbers, we usually round them. This
implies a boundary depending on the number of decimal places. For example: 64 is really anything
63.5 <= x < 64.5. Likewise, if there are two decimal places, then 64.03 is really anything 64.025 <= x
< 64.035. Boundaries always have one more decimal place than the data and end in a 5.
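The boundary rule can be checked with a short Python sketch. The function name rounding_bounds is hypothetical, and the sketch assumes the recorded value is passed as a string so that its number of decimal places is preserved.

```python
from decimal import Decimal

def rounding_bounds(value: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) so that any x with lower <= x < upper is recorded as `value`."""
    d = Decimal(value)
    # Exponent of the last recorded digit: 0 for "64", -2 for "64.03".
    exponent = d.as_tuple().exponent
    # Half a unit in the last recorded place: 0.5 for "64", 0.005 for "64.03".
    half_unit = Decimal(5).scaleb(exponent - 1)
    return d - half_unit, d + half_unit

print(rounding_bounds("64"))
print(rounding_bounds("64.03"))
```

For "64" the bounds are 63.5 and 64.5; for "64.03" they are 64.025 and 64.035, each with one more decimal place than the data and ending in a 5.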
Parameter: The summary description of a given variable in a population.
Random Sample: A random sample is a subset of a statistical population in which each member of the
population has an equal probability of being chosen. A simple random sample is meant to be an unbiased
representation of a group.
Non-Random Sample:
Sampling Units/Units: A sample from a population can be drawn one unit at a time, or more than one unit at
a time (one can sample clusters of units). The fundamental unit of the sample is called the sampling unit. It
need not be a unit of the population.
Variable of Interest (Random Variable X): A random variable is an assignment of numbers to possible
outcomes of a random experiment. For example, consider tossing three coins. The number of heads showing when
the coins land is a random variable: it assigns the number 0 to the outcome {T, T, T}, the number 1 to the outcome
{T, T, H}, the number 2 to the outcome {T, H, H}, and the number 3 to the outcome {H, H, H}.
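As an illustration, the three-coin random variable can be tabulated in Python. This is only a sketch: it assumes the coins are distinguishable, so that there are 8 ordered outcomes, and the names outcomes, heads and distribution are chosen for this example.

```python
from itertools import product

# All ordered outcomes of tossing three coins, e.g. ('H', 'T', 'T').
outcomes = list(product("HT", repeat=3))

def heads(outcome):
    """The random variable X: the number of heads in one outcome."""
    return outcome.count("H")

# How many outcomes are mapped to each value of X.
distribution = {}
for outcome in outcomes:
    x = heads(outcome)
    distribution[x] = distribution.get(x, 0) + 1

for x in sorted(distribution):
    print(x, distribution[x], "of", len(outcomes), "outcomes")
```

With fair coins, each ordered outcome has probability 1/8, so X takes the values 0, 1, 2, 3 with probabilities 1/8, 3/8, 3/8, 1/8.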
Advantages of Sampling:
Sample design: A set of rules or procedures that specify how a sample is to be selected. This can either be
probability or non-probability.
Sampling Frame: List of sampling units from which the sample, or some stage of the sample, is selected. It is
simply a list of the study population.
Probability Sampling: A sample drawn from a population using a random mechanism so that every element
of the population has a known chance of ending up in the sample.
Non-probability Sampling: Non-probability sampling is a sampling technique where the samples are
gathered in a process that does not give every individual in the population a known chance of being selected.
Sampling with and without replacement: Suppose we have a bowl of 100 unique numbers from 0 to 99. We
want to select a random sample of numbers from the bowl. After we pick a number from the bowl, we can put
the number aside or we can put it back into the bowl. If we put the number back in the bowl, it may be
selected more than once; if we put it aside, it can be selected only one time. When a population element can be
selected more than one time, we are sampling with replacement. When a population element can be
selected only one time, we are sampling without replacement.
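The bowl example can be sketched with Python's standard random module. The seed and the sample size of 10 are arbitrary choices for illustration, not part of the original example.

```python
import random

rng = random.Random(42)   # fixed seed (arbitrary) so the run is repeatable
bowl = list(range(100))   # the 100 unique numbers 0 to 99

# With replacement: each pick is put back, so a number may appear more than once.
with_replacement = rng.choices(bowl, k=10)

# Without replacement: each pick is set aside, so a number appears at most once.
without_replacement = rng.sample(bowl, k=10)

print(with_replacement)
print(without_replacement)
```

random.choices draws every item independently from the full bowl, while random.sample never returns the same position twice.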
Sampling errors: Degree of error to be expected for a given sample design or the difference between the
sample mean and the population mean.
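A minimal sketch of the second reading of this definition, the difference between the sample mean and the population mean. The population values are simulated and purely hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(3)  # arbitrary seed for repeatability

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A simple random sample of 100 units, drawn without replacement.
sample = rng.sample(population, 100)

# Sampling error of the mean for this particular sample.
sampling_error = fmean(sample) - fmean(population)
print(sampling_error)
```

Drawing a different sample gives a different error; the spread of these errors over repeated samples is what a sample design tries to keep small.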
Non-sampling errors: A statistical error caused by human error to which a specific statistical analysis is
exposed. These errors can include, but are not limited to, data entry errors, biased questions in a
questionnaire, biased processing/decision making, inappropriate analysis conclusions and false information
provided by respondents.
Sampling Bias: The notion that those selected are not "typical" or "representative" of the larger population
from which they have been chosen.
Random Number Table: A random number table is a list of numbers, composed of the digits 0, 1, 2, 3, 4, 5,
6, 7, 8, and 9. Numbers in the list are arranged so that each digit has no predictable relationship to the digits
that preceded it or to the digits that followed it. In short, the digits are arranged randomly. Numbers in a
random number table are random numbers.
Probability sampling (SRS, Stratified Sampling, Systematic sampling, Cluster Sampling, Multi Stage,
Multi-Phase Sampling, Sequential sampling)
Simple Random Sampling: A simple random sample (SRS) of size n is produced by a scheme which
ensures that each subset of n members of the population has an equal probability of being chosen as the
sample.
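To make the definition concrete, the sketch below lists every possible sample of size 2 from a hypothetical population of 5 labelled units and then draws one SRS. The labels and sizes are assumptions for illustration.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5
n = 2                                   # sample size

# Every possible sample of size n; an SRS scheme makes them all equally likely.
possible_samples = list(combinations(population, n))
probability_each = 1 / comb(len(population), n)

rng = random.Random(0)                  # arbitrary seed
srs = rng.sample(population, n)         # one simple random sample

print(len(possible_samples), probability_each)
print(srs)
```

Here there are 10 possible samples, so each has probability 1/10 of being the one selected.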
Stratified Sampling: There may often be factors which divide up the population into sub-
populations (groups / strata) and we may expect the measurement of interest to vary among the
different sub-populations. This has to be accounted for when we select a sample from the population
in order that we obtain a sample that is representative of the population. This is achieved by
stratified sampling. A stratified sample is obtained by taking samples from each stratum or sub-
group of a population. When we sample a population with several strata, we generally require that
the proportion of each stratum in the sample should be the same as in the population. Stratified
sampling techniques are generally used when the population is heterogeneous, or dissimilar, where
certain homogeneous, or similar, sub-populations can be isolated (strata). Simple random sampling
is most appropriate when the entire population from which the sample is taken is homogeneous.
Some reasons for using stratified sampling over simple random sampling are:
Example
Suppose a farmer wishes to work out the average milk yield of each cow type in his herd which consists of
Ayrshire, Friesian, Galloway and Jersey cows. He could divide up his herd into the four sub-groups and take
samples from these.
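The farmer's herd can be used for a small sketch of stratified sampling with proportional allocation, that is, the same sampling fraction in every stratum. The herd sizes, the 10% fraction and the function name stratified_sample are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical herd: breed -> list of cow identifiers.
sizes = {"Ayrshire": 40, "Friesian": 30, "Galloway": 20, "Jersey": 10}
herd = {breed: [f"{breed}-{i}" for i in range(size)] for breed, size in sizes.items()}

chosen = stratified_sample(herd, 0.10, random.Random(1))
for breed, cows in chosen.items():
    print(breed, len(cows))
```

Each breed contributes 10% of its cows (4, 3, 2 and 1), so the proportions in the sample match those in the herd.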
Systematic sampling: A type of probability sampling method in which sample members from a
larger population are selected according to a random starting point and a fixed, periodic interval.
This interval, called the sampling interval, is calculated by dividing the population size by the desired
sample size. Despite the sample population being selected in advance, systematic sampling is still
thought of as being random, provided the periodic interval is determined beforehand and the
starting point is random.
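A minimal sketch of this procedure, assuming the population size is an exact multiple of the sample size. The function name systematic_sample and the numbers are chosen for illustration.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick a random start, then take every k-th unit, where k = N // n."""
    interval = len(population) // sample_size  # the sampling interval
    start = rng.randrange(interval)            # random start within the first interval
    return [population[start + i * interval] for i in range(sample_size)]

units = list(range(1, 101))                    # hypothetical population numbered 1 to 100
picked = systematic_sample(units, 10, random.Random(7))
print(picked)
```

With N = 100 and n = 10 the interval is 10, so the selected units are always 10 apart and only the starting point varies between runs.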
Cluster Sampling: Cluster sampling is a sampling technique where the entire population is divided
into groups, or clusters, and a random sample of these clusters is selected. All observations in the
selected clusters are included in the sample.
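As a sketch, cluster sampling can be written as a random draw of whole clusters. The village and household names and the function name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: village -> households in that village.
villages = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2"],
}

selected = cluster_sample(villages, 2, random.Random(5))
print(selected)
```

The random step applies only to the clusters; inside a selected cluster nothing is left out.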
Multi stage sampling: Sometimes the population is too large and scattered for it to be practical to
make a list of the entire population from which to draw an SRS. For instance, when a polling
organization samples US voters, they do not do an SRS. Since voter lists are compiled by counties, they
might first do a sample of the counties and then sample within the selected counties. This illustrates
two stages. In some instances, they might use even more stages. At each stage, they might do a
stratified random sample on sex, race, income level, or any other useful variable on which they could
get information before sampling.
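The two stages described here can be sketched as follows. The county names, voter identifiers and the function name two_stage_sample are hypothetical, and both stages use simple random sampling rather than the stratified variant mentioned above.

```python
import random

def two_stage_sample(counties, n_counties, voters_per_county, rng):
    """Stage 1: sample counties. Stage 2: sample voters within each selected county."""
    stage_one = rng.sample(sorted(counties), n_counties)
    result = {}
    for county in stage_one:
        voters = counties[county]
        k = min(voters_per_county, len(voters))  # cannot take more voters than exist
        result[county] = rng.sample(voters, k)
    return result

# Hypothetical voter lists compiled by county.
voter_lists = {f"County{c}": [f"c{c}-voter{v}" for v in range(50)] for c in range(8)}

picked = two_stage_sample(voter_lists, 3, 5, random.Random(11))
for county, voters in picked.items():
    print(county, voters)
```

Only the lists for the selected counties are needed at the second stage, which is what makes the approach practical for large, scattered populations.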
Multi-phase sampling:
A sampling method in which certain items of information are collected from all the units of a sample
and certain other items of information are collected only from a subsample of those units.
Sequential sampling: Sequential sampling is a non-probabilistic sampling technique, initially
developed as a tool for product quality control. The sample size, n, is not fixed in advance, nor is the
timeframe of data collection. The process begins with the sampling of a single observation or a
group of observations. These are then tested to see whether or not the null hypothesis can be
rejected. If the null is not rejected, then another observation or group of observations is sampled and
the test is run again. In this way the test continues until the researcher is confident in his or her results.
Purposive Sampling: Purposive sampling is when a researcher chooses specific people within the
population to use for a particular study or research project. Unlike random studies, which deliberately
include a diverse cross section of ages, backgrounds and cultures, the idea behind purposive sampling is to
concentrate on people with particular characteristics who will better be able to assist with the relevant
research.
Quota Sampling: Quota sampling is a method of sampling widely used in opinion polling and market
research. Interviewers are each given a quota of subjects of a specified type to attempt to recruit. For example,
an interviewer might be told to go out and select 20 adult men and 20 adult women, 10 teenage girls and 10
teenage boys so that they could interview them about their television viewing. It suffers from a number of
methodological flaws, the most basic of which is that the sample is not a random sample and therefore the
sampling distributions of any statistics are unknown.
Central Limit Theorem: The central limit theorem in its shortest form states that the sampling distribution
of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the
shape of the population distribution (provided the population has a finite variance). So the sample means
will be approximately normally distributed (as a rule of thumb, when the sample size is above 30) whether the
population is positively skewed, negatively skewed or even binary (having only 2 outcomes).
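A small simulation can illustrate the theorem. The sketch below assumes a positively skewed population (exponential with mean 1 and standard deviation 1); the seed, the sample size of 30 and the 2000 repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(2024)   # arbitrary seed for repeatability
sample_size = 30
n_samples = 2000

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

center = fmean(sample_means)        # should be close to the population mean, 1
spread = stdev(sample_means)        # should be close to sigma / sqrt(n)
theoretical = 1 / sqrt(sample_size)

print(center, spread, theoretical)
```

The sample means cluster around the population mean with a spread near sigma / sqrt(n), and a histogram of sample_means looks roughly bell-shaped even though the population itself is strongly skewed.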