School of Planning and Architecture, Vijayawada

An Institute of National Importance,


MHRD, Government of India.
DEPARTMENT OF PLANNING

URBAN AND REGIONAL PLANNING

GATE SCHOLARSHIP WORK - OCTOBER


Sampling fundamentals

- BY
Siva sankari s
2190400113

Sampling Fundamentals

• Sampling is a process of selecting representative units from the entire population of a study.
• Sampling refers to the process used to select any number of persons to represent the population according to some rule or plan, on the basis of some selected measures.
• In general statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate the characteristics of the whole population.
• In research we often want to know certain characteristics of a large population, but we are almost never able to do a complete census of it. So we draw a sample (a subset of the population) and conduct research on that relatively small subset. Then we generalize the results, with an allowance for sampling error, to the entire population from which the sample was selected.
• The capacity to generalize sample results to an entire population is not inherent in just any sample. If we interview people in a "convenience" sample (those passing by on the street, for example), we cannot be confident that a census of the population would yield similar results.
• To have confidence in generalizing sample results to the whole population requires a "probability sample" of the population.

NEED FOR SAMPLING

• Only feasible method for collecting information
• Reduces demands on resources (time, finance, etc.)
• Results obtained more quickly
• Better accuracy of collected data
• Ethically acceptable
• Only way when population contains infinite members

Key Principles of Probability Sampling

1. Define carefully the population to be surveyed.
2. Determine how to access the survey population (the sampling frame).
3. Draw a sample by some random process.
4. Know the probability (at least in relative terms) of selecting each element of the population into the sample.
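The key principles above can be illustrated with a minimal sketch of simple random sampling. The frame of 200 household labels, the sample size of 20 and the function name are assumptions made up for this illustration; the point is only that the draw is random and the selection probability of every unit is known.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely.

    Returns the sample and the (known) probability that any given unit
    of the frame ends up in the sample.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    sample = rng.sample(frame, n)           # sampling without replacement
    inclusion_probability = n / len(frame)  # same for every unit
    return sample, inclusion_probability


# Hypothetical sampling frame: a list of 200 household identifiers.
frame = [f"household_{i:03d}" for i in range(1, 201)]

sample, prob = simple_random_sample(frame, n=20, seed=42)
print("units drawn:", len(sample))
print("inclusion probability of each unit:", prob)
```

A convenience sample has no such known selection probability, which is why its results cannot be generalized with the same confidence.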

Some fundamental definitions

1. Universe/Population (finite or infinite)

• A population can be defined as including all people or items with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.
• Note also that the population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is a large but not complete overlap between these two groups, due to frame issues etc.
• Sometimes they may be entirely separate. For instance, we might study rats in order to get a better understanding of human health, or we might study records from people born in 2008 in order to make predictions about people born in 2009.

2. Sampling frame

• It is a list of all the elements or subjects in the population from which the sample is drawn.
• A sampling frame could be prepared by the researcher, or an existing frame may be used.
• For example, a researcher may prepare a list of all the households of a locality which have pregnant women, or may use a register of pregnant women for antenatal care available with the local anganwadi worker.

3. Sampling design

• It refers to the procedure the researcher would follow or adopt in selecting the sampling units from which inferences about the population are drawn.
• A sample design is made up of two elements, the first of which is the sampling method.
• Sampling method. The sampling method refers to the rules and procedures by which some elements of the population are included in the sample. Some common sampling methods are simple random sampling, stratified sampling, and cluster sampling.

4. Statistic/parameter

• A statistic is a numerical value based upon a sample, whereas a parameter is a numerical value based upon the population.
• When we calculate the mean from a sample, it is called a statistic because it describes the characteristics of a sample.
• When the same mean is calculated from a population, it is called a parameter because it describes the characteristics of a population.
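The difference between a parameter and a statistic can be shown with a small sketch. The population below (household sizes for 1,000 households) is entirely hypothetical and generated at random; in real work the population values are usually not available, which is the reason for sampling in the first place.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: household size for each of 1,000 households.
population = [rng.randint(1, 8) for _ in range(1000)]

# Parameter: the mean computed from every unit of the population.
parameter = statistics.mean(population)

# Statistic: the same mean computed from a random sample of 50 units.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print("population mean (parameter):", parameter)
print("sample mean (statistic):    ", statistic)
```

The statistic changes from sample to sample, while the parameter is a fixed property of the population.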

5. Sampling error:

• The sampling error is a number that describes the precision of an estimate obtained from any one of the possible samples.
• It is usually expressed as a margin of error associated with a statistical level of confidence. For example, a presidential preference poll may report that the incumbent is favored by 51% of the voters, with a margin of error of plus-or-minus 3 points at a confidence level of 95%.
• This means that if the same survey were conducted with 100 different samples of voters, about 95 of them would be expected to give an interval (estimate ± 3 points) that contains the true share of voters favoring the incumbent. For the reported poll that interval is 48% to 54% (51% ± 3%).

6. Precision:

• Precision is the range within which the population average (or other parameter) will lie in accordance with the reliability specified in the confidence level, expressed either as a percentage of the estimate (±) or as a numerical quantity.

7. Confidence level and significance level:

• The confidence level is the expected percentage of times that the actual value will fall within the stated precision limits.
• Precision is the range within which the answer may vary and still be acceptable; the confidence level indicates the likelihood that the answer will fall within that range, and the significance level indicates the likelihood that the answer will fall outside that range.
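As a rough check on the poll example, the margin of error of a sample proportion can be sketched as follows. This assumes simple random sampling and the usual normal approximation; the sample size of 1,067 voters is a hypothetical figure chosen for illustration and is not given in the text.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Margin of error of a sample proportion under the normal approximation.

    p: sample proportion, n: sample size,
    z: standard normal value for the confidence level (1.96 for 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)


p_hat = 0.51   # incumbent favored by 51% of sampled voters
n = 1067       # hypothetical number of voters interviewed

moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe

print(f"margin of error: +/- {moe:.3f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

With these assumed inputs the margin comes out close to 3 percentage points, matching the 48% to 54% range quoted in the example.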

8. Sampling distribution

• A sampling distribution is a probability distribution of a statistic obtained from a large number of samples drawn from a specific population.
• The sampling distribution of a given population is the distribution of frequencies of a range of different outcomes that could possibly occur for a statistic of a population.
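For a very small population the sampling distribution can be listed exactly instead of being approximated. The sketch below uses a made-up population of five values and enumerates every possible sample of size 2 drawn without replacement; both the population and the sample size are assumptions chosen to keep the listing short.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical population of 5 units
n = 2                           # sample size

# Every possible sample of size n (without replacement).
samples = list(combinations(population, n))

# The statistic of interest (the mean) for each possible sample.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each possible mean with its relative frequency.
distribution = Counter(means)
for value in sorted(distribution):
    share = Fraction(distribution[value], len(samples))
    print(f"mean = {float(value):4.1f}  probability = {share}")

mean_of_means = sum(means) / len(means)
print("mean of all sample means:", mean_of_means)
print("population mean:         ", Fraction(sum(population), len(population)))
```

The mean of all possible sample means equals the population mean, which is why the sample mean is used to estimate it.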

The concept of a sampling distribution is perhaps the most basic concept in inferential statistics. Commonly used sampling distributions include:

• The sampling distribution of the mean
• The sampling distribution of proportion
• Student's t-distribution
• F distribution
• Chi-square ($\chi^2$) distribution

Distributions: population, sample and sampling distributions

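1. Sampling distribution of mean/Z distribution:

Sampling distribution of mean refers to the probability distribution of all the possible means of random samples of a given size that we take from a population. If samples are taken from a normal population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the mean is also normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where n is the sample size.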

2. The sampling distribution of proportion/binomial distribution

Usually the statistics of attributes correspond to the conditions of a binomial distribution, which tends to become a normal distribution as n becomes larger and larger. If p represents the proportion of defectives, i.e. of successes, and q the proportion of non-defectives, i.e. of failures (so q = 1 – p), and if p is treated as a random variable, then the sampling distribution of the proportion of successes has mean = p and standard deviation = $\sqrt{pq/n}$, where n is the sample size.

3. Student's t-distribution

• The variable t differs from z in the sense that we use the sample standard deviation in the calculation of t, whereas we use the standard deviation of the population in the calculation of z.
• There is a different t distribution for every possible sample size, i.e. for different degrees of freedom.
• The degrees of freedom for a sample of size n is n – 1. As the sample size gets larger, the shape of the t distribution becomes approximately equal to the normal distribution.

4. F distribution:

• The F ratio is formed from the variances of two independent samples drawn from two independent normal populations having the same variance.
• The calculated value of F from the sample data is compared with the corresponding table value of F, and if the former is equal to or exceeds the latter, then we infer that the null hypothesis of the variances being equal cannot be accepted. We shall make use of the F ratio in the context of hypothesis testing and also in the context of the ANOVA technique.

5. Chi-square distribution:

• The distribution is not symmetrical and all its values are positive; it has (n – 1) degrees of freedom.
• The chi-square distribution is encountered when we deal with collections of values that involve adding up squares.

Point estimator:

A point estimator is a formula that uses sample data to calculate a single number (a sample statistic) that can be used as an estimate of a population parameter, e.g. $\bar{x}$ and $s$ are used to estimate $\mu$ and $\sigma$.
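The point estimators mentioned above, and the standard deviation of a sample proportion from the binomial case, can be computed in a few lines. The measurements and the values p = 0.1, n = 100 are hypothetical numbers picked only to make the arithmetic easy to follow.

```python
import math
import statistics

# Hypothetical sample of eight measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]

# Point estimators: x-bar estimates mu, s estimates sigma.
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)   # uses the n - 1 divisor

print(f"x-bar = {x_bar:.3f} (estimate of the population mean)")
print(f"s     = {s:.3f} (estimate of the population standard deviation)")

# Standard deviation of the sampling distribution of a proportion:
# sqrt(p * q / n), with q = 1 - p.
p, n = 0.1, 100                # hypothetical proportion and sample size
q = 1 - p
proportion_sd = math.sqrt(p * q / n)
print(f"sd of sample proportion = {proportion_sd:.3f}")
```

Note that `statistics.stdev` divides by n – 1, which matches the n – 1 degrees of freedom used with the t distribution.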

Central Limit Theorem:

The central limit theorem states that:
• Given a population with a finite mean $\mu$ and a finite non-zero variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with a mean of $\mu$ and a variance of $\sigma^2/N$ as N, the sample size, increases.

Example of the Central Limit Theorem in Practice:
• Roll 30 dice and calculate the average (sample mean) of the numbers that you get on each die.
• Now repeat this experiment 1000 times, each time rolling 30 dice and computing a new sample mean.
• Plot a histogram of the 1000 sample means that you have obtained.
• This plot will look approximately normal.

SAMPLING THEORY

(i) Statistical estimation:
• Sampling theory helps in estimating unknown population parameters from a knowledge of statistical measures based on sample studies.
• In other words, to obtain an estimate of a parameter from a statistic is the main objective of sampling theory.
• The estimate can either be a point estimate or an interval estimate. A point estimate is a single estimate expressed in the form of a single figure, but an interval estimate has two limits, viz. the upper limit and the lower limit, within which the parameter value may lie.
• Interval estimates are often used in statistical induction.

(ii) Testing of hypotheses:
• The second objective of sampling theory is to enable us to decide whether to accept or reject a hypothesis; sampling theory helps in determining whether observed differences are actually due to chance or whether they are really significant.

(iii) Statistical inference:
• Sampling theory helps in making generalisations about the population/universe from the studies based on samples drawn from it.
• It also helps in determining the accuracy of such generalisations.
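The dice experiment described under the Central Limit Theorem can be run as a small simulation. This is a sketch using Python's pseudo-random generator with an arbitrary fixed seed; the text histogram and its bin width of 0.2 are presentation choices made for this illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable


def mean_of_dice(num_dice):
    """Roll num_dice fair six-sided dice and return their average."""
    rolls = [rng.randint(1, 6) for _ in range(num_dice)]
    return statistics.mean(rolls)


# Repeat the 30-dice experiment 1000 times.
sample_means = [mean_of_dice(30) for _ in range(1000)]

# For one fair die, mu = 3.5 and sigma^2 = 35/12, so the theorem
# predicts a standard deviation of sqrt(sigma^2 / 30) for the means.
expected_sd = (35 / 12 / 30) ** 0.5

print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sd predicted by CLT: ", round(expected_sd, 3))

# Crude text histogram with bins of width 0.2.
bins = {}
for m in sample_means:
    key = int(m * 5)
    bins[key] = bins.get(key, 0) + 1
for key in sorted(bins):
    print(f"{key / 5:.1f}  {'#' * (bins[key] // 5)}")
```

The printed bars should pile up around 3.5 and thin out symmetrically on both sides, which is the approximately normal shape the theorem describes.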

i) To test the significance of the mean of a random sample

SANDLER'S A-TEST

Researchers can use the A-test when correlated samples are employed and the hypothesised mean difference is taken as zero, i.e. $H_0: \mu_D = 0$.
Psychologists generally use this test in the case of two groups that are matched with respect to some extraneous variable(s).
While using the A-test, we work out the A-statistic, which yields exactly the same results as Student's t-test.
The A-statistic is found as follows:

$A = \dfrac{\sum D_i^2}{\left(\sum D_i\right)^2}$

where $D_i$ is the difference between the two observations of the i-th pair.

The number of degrees of freedom (d.f.) in the A-test is the same as with Student's t-test, i.e. d.f. = n – 1, n being equal to the number of pairs. The critical value of A, at a given level of significance for given d.f., can be obtained from a table of the A-statistic.
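A short sketch can make the link between the A-statistic and Student's t concrete. It assumes the standard definition of Sandler's A (the sum of squared pair differences divided by the square of their sum) and uses five made-up before/after pairs; the function names are hypothetical.

```python
import math


def sandler_a(before, after):
    """Sandler's A: sum of squared differences / (sum of differences)^2."""
    diffs = [a - b for a, b in zip(after, before)]
    total = sum(diffs)
    if total == 0:
        raise ValueError("A is undefined when the differences sum to zero")
    return sum(d * d for d in diffs) / total ** 2


def paired_t(before, after):
    """Student's t for paired samples with hypothesised mean difference 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


# Hypothetical matched pairs (e.g. scores before and after a treatment).
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 14]

n = len(before)
a_stat = sandler_a(before, after)
t_stat = paired_t(before, after)

print(f"A = {a_stat:.4f}, t = {t_stat:.4f}, d.f. = {n - 1}")
# Algebraic identity linking the two statistics:
print("A from t:", (n - 1) / (n * t_stat ** 2) + 1 / n)
```

Because A is a fixed function of t for a given n, the two tests always lead to the same decision; a larger t corresponds to a smaller A.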

CONCEPT OF STANDARD ERROR

The standard deviation of the sampling distribution of a statistic is called its standard error. E.g. if different samples of the same size n are drawn from a population, we get different values of the sample mean $\bar{x}$. The S.D. of $\bar{x}$ is called the standard error of $\bar{x}$. It is obvious that the standard error of $\bar{x}$ will depend upon the size of the sample and the variability of the population.

Standard error statistics are a class of inferential statistics that function somewhat like descriptive statistics in that they permit the researcher to construct confidence intervals about the obtained sample statistic. The confidence interval so constructed provides an estimate of the interval in which the population parameter will fall. The two most commonly used standard error statistics are the standard error of the mean and the standard error of the estimate.

The standard error of the mean permits the researcher to construct a confidence interval in which the population mean is likely to fall. The quantity (1 – P), with P most often set at 0.05, is the confidence level attached to the calculated interval (usually 95%).

The standard error of the estimate is the other standard error statistic most commonly used by researchers. This statistic is used with the correlation measure, the Pearson r. It can allow the researcher to construct a confidence interval within which the true population correlation will fall. The computations derived from r and the standard error of the estimate can be used to determine how precise an estimate of the population correlation the sample correlation statistic is.

The standard error is an important indicator of how precise an estimate of the population parameter the sample statistic is. Taken together with such measures as effect size, p-value and sample size, the standard error can be a useful tool to the researcher who seeks to understand the accuracy of statistics calculated on random samples.
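The standard error of the mean and the confidence interval built from it can be worked through on a small example. The ten observations are hypothetical, and the critical value 2.262 is the two-sided 95% value of Student's t for 9 degrees of freedom taken from a standard t table; it is used here because the sample is small and the population standard deviation is assumed unknown.

```python
import math
import statistics

# Hypothetical sample of ten observations.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 51]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)            # sample standard deviation

# Standard error of the mean: s / sqrt(n).
standard_error = s / math.sqrt(n)

# Two-sided 95% critical value of t for n - 1 = 9 degrees of freedom
# (table value; an assumption of this sketch, not taken from the text).
t_critical = 2.262
half_width = t_critical * standard_error
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean}, s = {s:.3f}, standard error = {standard_error:.3f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

Quadrupling the sample size would halve the standard error, which shows how the standard error depends on both the sample size and the variability of the data.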
ESTIMATION

Estimation of Population Parameters: The basic purpose of estimation is to learn about the population parameters with the help of sample statistics. If the population is completely unknown, and we find the population parameters using only the knowledge of sample values, the process is called estimation of population parameters.

Types of Estimation: Estimation is divided into two groups: (i) Point Estimation (ii) Interval Estimation

Point Estimation: In point estimation a single statistic is used to provide an estimate of the population parameter. In other words, the estimate of a population parameter given by a single number is called the point estimate of the parameter. In point estimation, we find a statistic which may be used to replace an unknown parameter of the population for all practical purposes.

Interval Estimation: There are situations where point estimation is not desirable and we are interested in finding limits within which, with a known probability or to a known degree of reliability, the value of the population parameter is expected to lie. Such a process of estimation is called interval estimation.

Determination of Sample Size

• The need for determination of the proper size of the sample is very great for practical use in business, where either the standard error is known on the basis of past experience or a given absolute level of accuracy is desired.
• If the sample size is too large, more money and time have to be spent, but the result obtained from the large sample may not be more accurate than that from a smaller sample.
• On the other hand, a valid conclusion may not be reached if the sample size is too small.
• The method of determining a proper size is given for the following two cases;

SAMPLE SIZE AND ITS DETERMINATION

The size of the sample should be determined by a researcher keeping in view the following points:

• Nature of universe
• Number of classes proposed
• Nature of study
• Type of sampling
• Standard of accuracy and acceptable confidence level
• Availability of finance
• Other considerations

Determination of appropriate Sample Size

• The computation of the appropriate sample size is generally considered to be one of the most important steps in a statistical study. But it is observed that in most studies this particular step has been overlooked.
• The sample size computation must be done appropriately, because if the sample size is not appropriate for a particular study then the inference drawn from the sample will not be authentic and it might lead to some wrong conclusions.
• Again, when we draw an inference about a parameter from a statistic, some kind of error arises. The error which arises because only a sample is used to estimate the population parameters is termed sampling error or sampling fluctuation.

• Whatever the degree of caution in selecting the sample, there will always be a difference between the parameter and its corresponding estimate.
• A sample with the smallest sampling error will always be considered a good representative of the population.
• Bigger samples have smaller sampling errors. When the sample survey becomes a census survey, the sampling error becomes zero.
• On the other hand, smaller samples may be easier to manage and have less non-sampling error.
• Handling of bigger samples is more expensive than handling smaller ones. The non-sampling error increases with the increase in sample size.

There are various approaches for computing the sample size [5, 57, 117]. To determine the appropriate sample size, the basic factors to be considered are the level of precision required by users, the confidence level desired and the degree of variability.

i) Level of Precision:

• Sample size is to be determined according to some pre-assigned 'degree of precision'.
• The 'degree of precision' is the margin of permissible error between the estimated value and the population value.
• In other words, it is the measure of how close an estimate is to the actual characteristic in the population.
• The level of precision may be termed the sampling error. According to W. G. Cochran (1977), the desired precision may be specified by giving the amount of error that we are willing to tolerate in the sample estimates.
• The difference between the sample statistic and the related population parameter is called the sampling error. The tolerable error depends on the amount of risk a researcher is willing to accept while using the data to make decisions.
• It is often expressed as a percentage.
• If the sampling error or margin of error is ±5%, and 70% of the units in the sample have some attribute, then it can be concluded that between 65% and 75% of the units in the population have that attribute.
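One widely used way of turning the three factors above (precision, confidence level and variability) into a sample size is the formula n = z²·p·(1 – p)/e² for estimating a proportion. The text does not state a formula, so this is offered only as an illustrative sketch; the function name and the default of maximum variability (p = 0.5) are assumptions, and no finite population correction is applied.

```python
import math


def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion within +/- margin.

    margin: desired level of precision (e.g. 0.05 for +/- 5 points)
    p:      anticipated proportion (0.5 is the most conservative choice)
    z:      standard normal value for the confidence level (1.96 for 95%)
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# +/- 5 points at 95% confidence with maximum variability.
n_conservative = sample_size_for_proportion(0.05)

# Same precision when about 70% of units are expected to have the attribute.
n_seventy = sample_size_for_proportion(0.05, p=0.70)

print("sample size with p = 0.5:", n_conservative)
print("sample size with p = 0.7:", n_seventy)

# Interval implied by a 70% sample result with a +/- 5 point margin.
print(f"interval: {0.70 - 0.05:.2f} to {0.70 + 0.05:.2f}")
```

Halving the margin of error roughly quadruples the required sample size, which is why a high level of precision is costly.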

• A high level of precision requires larger sample sizes and a higher cost to achieve those samples.

ii) Confidence level desired:

• The confidence or risk level is ascertained through the well established probability model called the normal distribution and an associated theorem called the Central Limit Theorem.
• The probability density function (p.d.f.) of the normal distribution with parameters $\mu$ and $\sigma$ is given by

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$

DETERMINATION OF SAMPLE SIZE THROUGH THE APPROACH BASED ON BAYESIAN STATISTICS

• This approach of determining 'n' is known as the Bayesian approach.
• The procedure for finding the optimal value of 'n', or the sample size, under this approach is as follows:
(i) Find the expected value of the sample information (EVSI) for every possible n;
(ii) Also work out a reasonably approximated cost of taking a sample of every possible n;
(iii) Compare the EVSI and the cost of the sample for every possible n. In other words, work out the expected net gain (ENG) for every possible n as stated below: For a given sample size (n): (EVSI) – (Cost of sample) = (ENG)
(iv) From (iii) above, the optimal sample size, i.e. that value of n which maximises the difference between the EVSI and the cost of the sample, can be determined.
• The computation of EVSI for every possible n, and then comparing the same with the respective cost, is often a very cumbersome task and is generally feasible only with mechanised or computer help.
• Hence this approach, although theoretically optimal, is rarely used in practice.
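Steps (iii) and (iv) of the Bayesian procedure amount to a simple search once the EVSI and the cost are available for each candidate n. The sketch below assumes those figures have already been worked out; the EVSI values and the cost rule (a fixed 100 plus 2 per unit) are invented for illustration, and computing a real EVSI requires a decision model that is outside the scope of this example.

```python
def optimal_sample_size(evsi_by_n, cost_by_n):
    """Return the n with the largest expected net gain, plus all ENG values.

    ENG(n) = EVSI(n) - cost of a sample of size n.
    """
    eng_by_n = {n: evsi_by_n[n] - cost_by_n[n] for n in evsi_by_n}
    best_n = max(eng_by_n, key=eng_by_n.get)
    return best_n, eng_by_n


# Hypothetical EVSI for each candidate sample size (diminishing returns).
evsi = {50: 400.0, 100: 650.0, 150: 800.0, 200: 880.0, 250: 920.0}

# Hypothetical cost: fixed cost of 100 plus 2 per sampled unit.
cost = {n: 100.0 + 2.0 * n for n in evsi}

best_n, eng = optimal_sample_size(evsi, cost)

for n in sorted(eng):
    print(f"n = {n:3d}  EVSI = {evsi[n]:6.1f}  cost = {cost[n]:6.1f}  ENG = {eng[n]:6.1f}")
print("optimal sample size:", best_n)
```

The search itself is trivial; the cumbersome part noted in the text is obtaining the EVSI for every candidate n.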
