
5. Again, bias may arise due to carelessness on the part of the investigator in asking questions, or due to inaccuracies in recording, or due to negligence or false recording without actually visiting the sample.

Arrangements should therefore be made to impart training to the staff in the collection of data, and the importance of accurate information in the success of the investigation should be impressed upon them. Supervisory staff may also be used to check the work of the investigators in the collection of data. As soon as the collection of data begins, arrangements should be made to receive the completed forms in batches, and scrutiny should begin immediately. Sometimes, some of the returns may be referred back to the enumerators for correcting the inconsistencies, or completing the defective returns, if necessary.

3.3 TYPES OF SAMPLING

There are many different ways of selecting a sample. We describe below some of the important types of sampling: (1) Simple random sampling, (2) Purposive sampling, (3) Stratified sampling, (4) Systematic sampling, (5) Multi-stage sampling.
(1) Simple Random Sampling
Simple Random Sampling (also called Random Sampling) is the process of selection of a group of units in such a manner that every unit of the population has an "equal chance" of being included in the sample. The group of units thus obtained is called a Simple Random Sample (or Random Sample only). In practice, the members of the sample are drawn one by one.
There are two ways of drawing a simple random sample:
(a) Simple Random Sampling With Replacement (SRSWR): Simple random sampling is said to be "with replacement", when the sample members are drawn from the population one by one, and after each drawing, the selected population unit is noted and then returned to the population before the next one is drawn. This means that at each stage of the sampling process all the population units (including those obtained in earlier drawings) are considered for selection with equal probability. Thus the population remains the same before each drawing, and any of the population units may appear more than once in the sample.
(b) Simple Random Sampling Without Replacement (SRSWOR): Simple random sampling is said to be "without replacement", when either the sample members are drawn all at a time, or drawn one by one in such a manner that after each drawing the selected unit is not returned to the population when the next one is drawn. This means that when drawing is made one by one, at each stage of the sampling process the population units already chosen are not considered for subsequent selections, but the drawing is made with equal probability only from those units not selected in any of the earlier drawings. It is evident that in simple random sampling without replacement from a finite population, the size of the population goes on diminishing as the sampling process continues. Consequently, no population unit can appear more than once in the sample.
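The two schemes can be sketched with Python's standard library. The population of N = 20 serially numbered units, the sample size n = 5 and the fixed seed below are illustrative assumptions, not from the text:

```python
import random

# Sketch of the two drawing schemes.  The population of N = 20 serially
# numbered units, the sample size n = 5 and the seed are illustrative.
population = list(range(1, 21))
rng = random.Random(42)

# SRSWR: every draw is made from the full population, so repeats can occur
srswr = [rng.choice(population) for _ in range(5)]

# SRSWOR: a unit once drawn is not returned, so all members are distinct
srswor = rng.sample(population, 5)

print("SRSWR :", srswr)
print("SRSWOR:", srswor)
```

Note that `random.choice` draws with replacement, while `random.sample` draws without replacement and can therefore never repeat a unit.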
Let a simple random sample of n members be drawn from a finite population consisting of N members (i.e. sample size = n, population size = N). Then the total number of possible samples with distinct 'permutations' of members
Sampling Theory 151

(i) in SRSWR is N^n;
(ii) in SRSWOR is NPn = N(N − 1)(N − 2) ... (N − n + 1).

Among these, the number of cases favourable to each group (ignoring the order of drawing), or distinct 'combination' of members,
(i) in SRSWR is not the same;
(ii) in SRSWOR is the same, viz. n!.

For this reason, we generally consider
(i) in SRSWR, N^n possible samples with distinct 'permutations' of members, each with probability 1/N^n;
(ii) in SRSWOR, NCn = N(N − 1) ... (N − n + 1)/n! possible samples with distinct 'combinations' of members, each with probability 1/NCn.

Let x₁, x₂, ..., xₙ represent a simple random sample from the population. In SRSWR, the probability of selection of any particular member X of the population remains a constant 1/N, because before any drawing the population contains all the N members. It may be shown that this result is also true in SRSWOR. Thus, although the population size varies at each stage of selection in SRSWOR, the probability of obtaining the member X_k (suppose) at the i-th drawing is a constant 1/N both in SRSWR and in SRSWOR; i.e.

P(xᵢ = X_k) = 1/N, for all i = 1, 2, ..., n and k = 1, 2, ..., N.

Random sampling is the simplest and hence also the most important among the various sampling techniques. It is absolutely free from the influence of human bias, and is also called Lottery Sampling. Chance alone determines whether one unit or the other is selected. The selection of a random sample is facilitated by what are known as Random Numbers.

Random sampling is the most appropriate in cases when the population is more or less homogeneous with respect to the characteristic under study. The theories of sampling distribution and test of significance are based on random sampling only.
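The result P(xᵢ = X_k) = 1/N stated above can be checked empirically for SRSWOR. The values N = 5, n = 3, the drawing examined (the 3rd) and the trial count are illustrative assumptions:

```python
import random
from collections import Counter

# Empirical check that in SRSWOR the probability of obtaining any particular
# member at the i-th drawing is still 1/N.  The values N = 5, n = 3, the
# drawing examined (i = 3) and the trial count are illustrative assumptions.
N, n, trials = 5, 3, 100_000
rng = random.Random(0)

hits = Counter()
for _ in range(trials):
    sample = rng.sample(range(N), n)  # one-by-one drawing without replacement
    hits[sample[2]] += 1              # unit obtained at the 3rd drawing

rel_freq = {unit: hits[unit] / trials for unit in range(N)}
print(rel_freq)  # every relative frequency should be close to 1/N = 0.2
```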
Simple Sampling: This is a special case of simple random sampling in which the probability of selection of any particular member remains constant throughout the sampling process, irrespective of whether the member had been selected earlier or not. Therefore SRSWR from a finite or an infinite population will always give a 'simple sample'. However, SRSWOR gives a simple sample only when the population is infinite.
(2) Purposive Sampling
A sample which is selected on the basis of the individual judgement of the sampler is called a Purposive Sample. There is no special technique for selecting a purposive sample; the sampler picks out a typical or representative sample according to his own judgement. It all depends on the personal factor, and chance is not allowed to
152 Statistical Methods

play at all. Consequently, there is much scope for bias, and the degree of accuracy of the estimates is not known. Purposive sampling may be useful when the sample is small; but as the sample size increases the estimates become unreliable due to accumulation of bias. The advantage of purposive sampling is that whereas a random sample may vary widely from the average, a purposive sample will not.
(3) Stratified Sampling
In Stratified Sampling, the population is subdivided into several parts, called strata
and then a sub-sample is chosen from each of them. All the sub-samples combined
together give the Stratified Sample. If the selection from strata is done by random
sampling, the method is known as Stratified Random Sampling. The subdivision of
the population into strata is done by purposive method, but the selection of sub-
samples from within the strata depends purely on chance. Stratified random sampling
may therefore be viewed as a mixture of both purposive and random sampling, and
combines the advantages of both.
(Stratified sampling is generally used when the population is heterogeneous, but can be subdivided into strata within each of which the heterogeneity is not so prominent. Some prior knowledge is, therefore, necessary for subdivision into strata,
called stratification.) If a proper stratification can be made such that the strata differ
from one another as much as possible, but there is much homogeneity within each of
them, then a stratified sample will yield better estimates than a random sample of the
same size. This is so, because in a stratified sample the different sections of the
population are suitably represented through the sub-samples, while in random sampling
some of these sections may be over-represented or under-represented or may even be
omitted.
Usually the same fraction of members from each stratum is included in the sample, i.e., sub-sample sizes are made proportional to sub-population sizes. The principal purposes of stratification are: (i) to increase the precision of the overall estimates; (ii) to ensure that all sections of the population are adequately represented; (iii) to avoid a large size of the sample; and (iv) to avoid the heterogeneity of the population.
Uses: (i) Administrative convenience may dictate the use of stratified sampling. For example, in conducting a sample survey over West Bengal, the different districts may be taken as strata, so that the district authorities can supervise the survey and collect data more efficiently from their own regions. (ii) Sometimes, different parts of the population involve different sampling problems, and special measures are necessary for dealing with these cases. Stratified sampling is extremely useful in such cases.
Advantages: (i) If data of a given precision are required for certain parts of the population, it is advisable to treat each part as a population in its own right. (ii) Stratification also brings about a gain in precision of the estimates obtained.
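Proportional allocation as described above can be sketched as follows. The three strata, their sizes and the 10% sampling fraction are invented for illustration:

```python
import random

# Proportional allocation: sub-sample sizes proportional to stratum sizes.
# The three strata, their sizes and the 10% sampling fraction are invented.
strata = {
    "urban":      list(range(0, 600)),     # 600 units
    "semi-urban": list(range(600, 900)),   # 300 units
    "rural":      list(range(900, 1000)),  # 100 units
}
fraction = 0.10
rng = random.Random(1)

stratified_sample = {
    name: rng.sample(units, round(len(units) * fraction))
    for name, units in strata.items()
}
sizes = {name: len(s) for name, s in stratified_sample.items()}
print(sizes)  # {'urban': 60, 'semi-urban': 30, 'rural': 10}
```

Within each stratum the selection is left purely to chance (`random.sample`), while the division into strata is fixed in advance, mirroring the mixture of purposive and random sampling described above.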
(4) Systematic Sampling
Systematic Sampling involves the selection of sample units at equal intervals, after all the units in the population have been arranged in some order. If the population size is finite, the units may be serially numbered and arranged. From the first k of these, a single unit is chosen at random. This unit and every k-th unit thereafter constitute a Systematic Sample. In order to obtain a systematic sample of 500 villages out of 40,000 in West Bengal, i.e. one out of every 80 on an average, all the villages have to be numbered serially. From the first 80 of these a village is selected at random, suppose the village with serial number 27. Then the villages with serial numbers 27, 107, 187, 267, 347, ... constitute the systematic sample.

If the characteristic under study is independent of the order of arrangement of the units, then a systematic sample is practically equivalent to a random sample; and the actual selection of the sample is easier and quicker. Systematic sampling is suitable when the units are described on serially numbered cards, e.g. workers listed on cards. Then the sample can be drawn easily by looking at the serial numbers. The sample may be biased if there are periodic features associated with the sampling interval.
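The villages example can be sketched directly. Only the figures 40,000 and 500 come from the text; the seed is an assumption:

```python
import random

# The villages example: a systematic sample of 500 out of N = 40,000, i.e.
# one unit in every k = 80.  Only these figures come from the text; the seed
# is an assumption.
N, sample_size = 40_000, 500
k = N // sample_size                 # sampling interval: 80

rng = random.Random(7)
start = rng.randrange(1, k + 1)      # a random unit among the first 80
systematic_sample = list(range(start, N + 1, k))

print(start, systematic_sample[:5], len(systematic_sample))
```

Whatever the random start in 1..80, taking every 80th serial number thereafter always yields exactly 500 villages.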
(5) Multi-stage Sampling
Multi-stage Sampling refers to a sampling procedure which is carried out in several stages. The population is first divided into large groups, called first-stage units. These first-stage units are again divided into smaller units, called second-stage units; the second-stage units into third-stage units, and so on, until we reach the ultimate units. Initially some of the first-stage units are selected, from each of which some second-stage units are chosen; and the process is carried on from stage to stage until the selection of the ultimate units. For example, in order to introduce a scheme on an experimental basis in the villages, we may have to select a few villages from the whole of the State. If we apply three-stage sampling, Sub-divisions may be used as the first-stage units, Anchals forming the second-stage units, and then Villages as the ultimate units.
Multi-stage sampling enables existing divisions and subdivisions of the population to be used as units at the various stages, and permits the field work to be concentrated although a large area is covered. Another advantage is that the subdivision into second-stage units need be carried out for only those first-stage units which are included in the sample. It therefore helps in surveys of underdeveloped areas, where no sampling frame is sufficiently detailed and accurate for subdivision of the natural units into reasonably small sampling units. Usually, considerable saving in cost is achieved through multi-stage sampling. However, this method is in general less accurate than any other sampling method using the same number of ultimate units by some single-stage process.
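A two-stage version of this procedure might be sketched as follows. The six hypothetical "districts" of 50 "villages" each, and the selection of 2 districts with 5 villages from each, are assumptions, not from the text:

```python
import random

# Two-stage sketch: choose some first-stage units, then sample second-stage
# units only within those chosen.  The 6 hypothetical districts of 50
# villages each, and the 2 x 5 selection, are assumptions, not from the text.
districts = {f"district-{d}": [f"d{d}-village-{v}" for v in range(50)]
             for d in range(6)}
rng = random.Random(3)

chosen_districts = rng.sample(sorted(districts), 2)   # first stage
two_stage_sample = {d: rng.sample(districts[d], 5)    # second stage
                    for d in chosen_districts}

for d, villages in two_stage_sample.items():
    print(d, villages)
```

Note that the village lists of the four unchosen districts are never touched, which is exactly the frame-building economy described above.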

3.4 METHOD OF DRAWING RANDOM SAMPLE

Example 3.6 What are random numbers?
[I.C.W.A., June '74; W.B.H.S. '79; C.U., M.Com. '77, '80; M.B.A. '78]
Solution Random numbers (also called Random sampling numbers) refer to some well-known sequences of digits in which the successive figures appear in a perfectly 'random' order. This means that if a digit is blindly selected from the Random Number Table, any of the ten digits 0, 1, 2, ..., 9 is likely to occur with the same probability 1/10. Similarly, if two consecutive digits are taken, the number formed by them (in the order they appear) may be any of the hundred numbers 00, 01, 02, ..., 99, with the same probability 1/100, and so on. The series of random numbers are usually available in groups of four digits to facilitate easy reading, as follows:
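A small block of such four-digit groups can be generated for illustration. A pseudo-random generator stands in here for a published Random Number Table; the seed and the table dimensions are assumptions:

```python
import random

# A small block of random sampling numbers in groups of four digits, in the
# style described above.  A pseudo-random generator stands in for a published
# Random Number Table; the seed and table dimensions are assumptions.
rng = random.Random(2024)
table = [[f"{rng.randrange(10000):04d}" for _ in range(8)] for _ in range(5)]

for row in table:
    print(" ".join(row))
```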
The parameters are unknown constants. The value of a statistic varies from sample to sample, and this variation is called its 'sampling fluctuation'. Usually statistics are used as estimates of parameters. The probability distribution of a statistic is called its 'sampling distribution', and the standard deviation in the sampling distribution of a statistic is called its 'standard error'. However, since the parameter is a constant, it has neither a sampling distribution nor a standard error.

The following notations will be used to distinguish between statistic and parameter:

                        Parameter                   Statistic
                        (from all Population        (from Sample
                        Values)                     Values)
Mean                    μ                           x̄
Standard Deviation      σ                           s
Proportion              P                           p
rth Raw Moment          μr'                         mr'
rth Central Moment      μr                          mr
Example 3.11 Explain clearly the concept of Sampling Distribution of a statistic.
[W.B.H.S. '78, '82; C.U., B.A. (Econ) '74, '81; C.A., Nov. '78; I.C.W.A., June '73, '76, '78, '79, Dec. '74, '81]
Solution Sampling Distribution of a statistic may be defined as the probability law which the statistic follows, if repeated random samples of a fixed size are drawn from a specified population.
Let us consider a random sample x₁, x₂, ..., xₙ of size n drawn from a population containing N units. Let us further suppose that we are interested in the sampling distribution of the statistic x̄ (i.e., sample mean), where
x̄ = (x₁ + x₂ + ... + xₙ)/n.
If the population size N is finite, there is a finite number (say K) of possible ways of drawing n units in the sample out of a total of N units in the population. For each of these K samples we calculate the value of x̄ (see Tables 3.3 and 3.5. Here N = 4, n = 2, and K = N^n = 16 for SRSWR and K = NCn = 6 for SRSWOR). Although the K samples are distinct, the sample means may not be all different; but each of these samples occurs with equal probability. Thus, we can construct a table showing the set of possible values of the statistic x̄ and also the probability that x̄ will take each of these values. The probability distribution of the statistic x̄ will be called the 'sampling distribution' of sample mean (Tables 3.4 and 3.6). The above method is quite general, and the sampling distribution of any other statistic, say, median, or standard deviation of the sample, may be obtained.
If, however, the number (N) of units in the population is large, the number (K) of possible distinct samples being even larger, the above method of finding the sampling distribution cannot be applied. In this case, the values of x̄ obtained from a large number of samples may be arranged in the form of a relative frequency distribution. The limiting form of this relative frequency distribution, when the number of samples considered becomes infinitely large, will be called the sampling distribution of the statistic. When the population is specified by a theoretical distribution (e.g., binomial, or normal), the sampling distribution can be theoretically obtained. The knowledge of sampling distribution is necessary in finding 'confidence limits' for parameters and in 'testing statistical hypotheses'.
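For the small case mentioned above (N = 4, n = 2; K = 16 for SRSWR and 6 for SRSWOR), the sampling distribution of x̄ can be constructed exhaustively. Since Tables 3.3-3.6 are not reproduced here, the population values 1, 2, 3, 4 are an assumed stand-in:

```python
from itertools import product, combinations
from collections import Counter
from fractions import Fraction

# Exhaustive construction for the small case N = 4, n = 2.  The population
# values 1, 2, 3, 4 are an assumed stand-in for Tables 3.3 and 3.5, which
# are not reproduced here.
population = [1, 2, 3, 4]
n = 2

# SRSWR: K = N^n = 16 equally likely ordered samples
srswr_means = [Fraction(a + b, n) for a, b in product(population, repeat=n)]
# SRSWOR: K = C(4, 2) = 6 equally likely combinations
srswor_means = [Fraction(a + b, n) for a, b in combinations(population, n)]

def sampling_distribution(means):
    """Possible values of the sample mean with their probabilities."""
    K = len(means)
    return {value: Fraction(count, K) for value, count in Counter(means).items()}

for value, prob in sorted(sampling_distribution(srswr_means).items()):
    print(value, prob)
```

In both cases the probabilities add to 1 and the mean of the sampling distribution equals the population mean, here 5/2.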
3. S.E. of the difference of sample proportions (p₁ − p₂) = √(P₁Q₁/n₁ + P₂Q₂/n₂), where Q = 1 − P.

4. For a random sample from a normal population with s.d. σ:

S.E. of sample s.d. (s) = σ/√(2n)     (3.7.5)

S.E. of sample variance (s²) = σ²√(2/n)     (3.7.6)

Formulae (3.7.5) and (3.7.6) are however approximate, and used only when the sample size n is large (say, greater than 50).
Example 3.12 Discuss the concept of 'Standard error' of a statistic. What does the standard error of a statistic measure?
[C.U. M.Com. '71, '77; C.U. B.A. (Econ) '78; C.A., Nov. '78; I.C.W.A., June '73, '75, '76, '78, Dec. '81]
Solution Let x₁, x₂, ..., xₙ be a random sample of size n drawn from a specified population. On the basis of this sample, let us calculate the value of a certain statistic, say, sample mean x̄.

We repeat the process of drawing a random sample of the fixed size a large number of times, and calculate the value of the statistic (here, sample mean) for each sample. The relative frequency distribution of these sample means, when the number of samples considered is infinitely large, is called the sampling distribution of sample mean.
The sampling distribution of any statistic will have its own mean, standard deviation, moments, etc. The standard deviation calculated from the sampling distribution of a statistic is called its 'Standard Error'. The standard error gives a measure of dispersion of the concerned statistic. It depends on the sample size n, and goes on diminishing as the sample size increases. It is used to set up 'confidence limits' for population parameters and in 'tests of significance'. Thus, the standard errors of sample mean (x̄) and sample proportion (p) are used to find confidence limits for the population mean (μ) and the population proportion (P) respectively.
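The construction just described can be imitated by simulation: repeated samples of a fixed size n are drawn from a normal population, and the standard deviation of the resulting means is compared with σ/√n. The population (μ = 50, σ = 10), the sample size n = 25 and the number of repeated samples are all assumptions:

```python
import random
import statistics

# Simulation of the standard error of the sample mean: repeated samples of a
# fixed size n are drawn from a normal population, and the s.d. of the
# resulting means is compared with sigma/sqrt(n).  All numbers are assumed.
mu, sigma, n, n_samples = 50.0, 10.0, 25, 20_000
rng = random.Random(11)

sample_means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
                for _ in range(n_samples)]

se_simulated = statistics.stdev(sample_means)
se_formula = sigma / n ** 0.5   # S.E. of the mean = sigma/sqrt(n) = 2.0
print(se_simulated, se_formula)
```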

Example 3.13 State the formulae for standard error of sample mean and sample proportion.
[M.B.A. '77; I.C.W.A., June ...]
Consequently,
P{|t − θ| < P.E.} = 1/2.
Thus, there is a 50 : 50 chance that the statistic t differs from θ by more than or less than P.E. In this sense, P.E. measures the sampling fluctuation, like S.E.
In random samples of size n from a normal population with mean μ and s.d. σ, it can be shown that very nearly

P.E. of mean = 0.6745 σ/√n = 0.6745 × (S.E. of mean)     (3.7A.2)

Even if the population is not normal, this relation holds approximately for large n. For this reason, the probable error (P.E.) is sometimes defined as 0.6745 times the standard error (S.E.). For example, in large samples, S.E. of correlation coefficient is (1 − r²)/√n, so that

P.E. of correlation coefficient = 0.6745 (1 − r²)/√n     (3.7A.3)

P.E. is rarely used in statistical theory.

3.8 DISTRIBUTIONS USED IN SAMPLING THEORY

Four important probability distributions, which are derived from the Normal distribution and used in sampling theory, are:
(i) Standard Normal distribution,
(ii) Chi-square (χ²) distribution,
(iii) Student's t distribution,
(iv) Snedecor's F distribution.

(A) Standard Normal Distribution

If a random variable x is normally distributed with mean μ and standard deviation σ, then

z = (x − μ)/σ = (Normal variable − Mean)/S.D.     (3.8.1)

is called a Standard Normal Variate. The probability distribution of z is called Standard Normal Distribution, and is defined by the p.d.f.

p(z) = (1/√(2π)) e^(−z²/2),  (−∞ < z < +∞)     (3.8.2)

Characteristics:
1. The standard normal distribution is a special case of the normal distribution, with mean = 0 and s.d. = 1.
2. It has no parameters (unlike the normal distribution, which has two parameters μ and σ).
3. The central moments are μ₂ = 1, μ₃ = 0, μ₄ = 3. Also, β₁ = 0, β₂ = 3; Skewness (γ₁) = 0, Kurtosis (γ₂) = 0.
4. The standard normal curve is symmetrical about the mean 0, and the two tails of the curve extend to infinity on either side of the mean. The points of inflexion are at z = ±1.

This means that the area under the curve to the right of the ordinate at zₚ is p. For example, from the table of Area under Standard Normal Curve (Appendix, Table I), we find that the area to the right of the ordinate at z = 1.645 is .05; i.e.,

P(z > 1.645) = .05 = 5%.

The upper 5% point of the standard normal distribution is thus 1.645, and we express this as z.05 = 1.645. The following percentage points are useful in sampling theory:

z.05 = 1.645,  z.025 = 1.96,  z.01 = 2.33,  z.005 = 2.58     (3.8.7)

Again, because of the symmetry of the standard normal curve, the area to the left of the ordinate at z = −1.645 is .05; i.e.

P(z < −1.645) = .05.

The lower 5% point of the distribution is thus −1.645. Since the area to the right of −1.645 is 0.95, hence the lower 5% point is z.95 = −1.645 = −z.05. In general, for the standard normal distribution, the lower percentage point is just the negative of the upper percentage point; i.e.,

z₁₋ₚ = −zₚ     (3.8.8)
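The percentage points in (3.8.7) can be recovered from `statistics.NormalDist` in the Python standard library. Note that the table's 2.33 and 2.58 are two-decimal roundings of 2.326 and 2.576:

```python
from statistics import NormalDist

# The percentage points quoted in (3.8.7), recovered from the standard
# normal distribution: the upper p-point z_p satisfies P(z > z_p) = p,
# so z_p = inv_cdf(1 - p).
z = NormalDist()  # mean 0, s.d. 1
upper_points = {p: z.inv_cdf(1 - p) for p in (0.05, 0.025, 0.01, 0.005)}

for p, zp in upper_points.items():
    print(f"z_{p} = {zp:.3f}")  # 1.645, 1.960, 2.326, 2.576

# The lower point is the negative of the upper point, as in (3.8.8)
assert abs(z.inv_cdf(0.05) + upper_points[0.05]) < 1e-9
```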

(B) Chi-square (χ²) Distribution

A random variable x is said to follow Chi-square (χ²) distribution if its p.d.f. is of the form

f(x) = K e^(−x/2) x^((n/2)−1),  (0 < x < ∞)     (3.8.9)

where K is a constant. The parameter n (positive integer) is called the number of degrees of freedom. (χ is a letter of the Greek alphabet and pronounced 'ky'.) A variable which follows chi-square distribution is called a Chi-square variate, and often denoted by the symbol χ².
In statistical work it is common to use the same symbol both for the random variable and a specified value of the variable. For example, in the χ² distribution with n degrees of freedom (d.f.) the percentage points are denoted by χ²(p, n), or briefly χ²(p), if the d.f. n is understood from the context. These are values of the variable such that P(χ² > χ²(p)) = p. Percentage points are given in Statistical Tables for various values of p (Appendix, Table II).
Characteristics:
1. Mean = n, S.D. = √(2n), where n is the number of degrees of freedom (d.f.) of the chi-square distribution.
2. The chi-square curve is positively skew, and starting from 0 extends to infinity on the right [Fig. 3.1(B)].
3. If x and y are independent chi-square variates with d.f. n₁ and n₂ respectively, then their sum (x + y) also follows chi-square distribution with d.f. (n₁ + n₂).
4. When the d.f. n is large, √(2χ²) − √(2n − 1) approximately follows the standard normal distribution.
Theorem II
If z₁, z₂, ..., zₙ are n independent standard normal variates, then z₁² + z₂² + ... + zₙ² follows chi-square distribution with n degrees of freedom.
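Theorem II gives a convenient way to check characteristic 1 by simulation: sums of n squared independent standard normal variates are chi-square variates, so their mean and s.d. should approach n and √(2n). The d.f. n = 10, the seed and the trial count below are assumptions:

```python
import random
import statistics

# Empirical check of the chi-square characteristics via Theorem II: a sum of
# n squared independent standard normal variates is a chi-square variate
# with n d.f., so its mean and s.d. should be near n and sqrt(2n).  The
# d.f. n = 10, the seed and the trial count are assumptions.
n, trials = 10, 40_000
rng = random.Random(5)

chi_sq = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

print(statistics.fmean(chi_sq))  # close to n = 10
print(statistics.stdev(chi_sq))  # close to sqrt(2n) = 4.472...
```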
Theorem III
If x₁, x₂, ..., xₙ is a random sample from a normal population with mean μ and s.d. σ, then

(i) Σᵢ₌₁ⁿ (xᵢ − μ)²/σ²     (3.8.11)

follows chi-square distribution with n degrees of freedom; and

(ii) Σᵢ₌₁ⁿ (xᵢ − x̄)²/σ²     (3.8.12)

follows chi-square distribution with (n − 1) degrees of freedom. (Here, x̄ is the mean of the sample.)
(C) Student's t Distribution

A random variable is said to follow Student's t distribution, or simply t distribution, if its p.d.f. is of the form

f(t) = K (1 + t²/n)^(−(n+1)/2),  (−∞ < t < +∞)     (3.8.13)

where K is a constant. The parameter n (positive integer) is called the number of degrees of freedom (d.f.). This distribution was discovered by W.S. Gosset, who wrote under the pen-name 'Student', and hence it is called Student's distribution. A variable which follows Student's distribution is denoted by the symbol t.
The percentage points of the t distribution with n degrees of freedom are denoted by t(p, n), or briefly t(p), if the d.f. n is understood. Since the t-curve is symmetrical about zero [Fig. 3.1(C)], t(1−p) = −t(p); for example, t.95 = −t.05.
Characteristics:
1. Mean = 0, S.D. = √(n/(n − 2)), (n > 2).
2. The t-curve is symmetrical about 0, extending from −∞ to +∞ (like the standard normal curve). It has zero skewness and positive kurtosis (leptokurtic); i.e. β₁ = 0, β₂ > 3.
3. When the d.f. n is large, the t distribution can be approximated by the standard normal distribution.
Theorem IV
If z and y are independent random variables, where z follows standard normal distribution and y follows chi-square distribution with n degrees of freedom, then t = z/√(y/n) follows t distribution with n degrees of freedom.
[In particular, for a random sample x₁, x₂, ..., xₙ from a normal population, y = nS²/σ² follows chi-square distribution with (n − 1) d.f., where S² = Σ(xᵢ − x̄)²/n is the sample variance; also, the variables z = (x̄ − μ)/(σ/√n) and y are independently distributed.]

Theorem V
If a random sample of size n is drawn from a normal population with mean μ and s.d. σ, then

t = (x̄ − μ)/(S/√(n − 1))     (3.8.15)

follows t distribution with (n − 1) degrees of freedom, where x̄ and S denote the mean and s.d. of the sample.

Theorem VI
If two independent random samples of sizes n₁ and n₂ are drawn from two normal populations with means μ₁ and μ₂ respectively and a common s.d. σ, then

t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / [S √(1/n₁ + 1/n₂)]     (3.8.16)

follows t distribution with (n₁ + n₂ − 2) degrees of freedom, where x̄₁, x̄₂ denote the means and S₁, S₂ the s.d.s of the samples, and

S² = (n₁S₁² + n₂S₂²)/(n₁ + n₂ − 2)     (3.8.17)
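Formulae (3.8.16) and (3.8.17) can be sketched numerically. The text defines Sᵢ² with divisor nᵢ, which corresponds to `statistics.pvariance`, so nᵢSᵢ² is just the sum of squared deviations. The two small samples below are made-up illustrative data:

```python
from statistics import fmean, pvariance

# Numerical sketch of (3.8.16) and (3.8.17).  The text defines S_i^2 with
# divisor n_i, which is statistics.pvariance, so n_i * S_i^2 is the sum of
# squared deviations.  The two small samples are made-up illustrative data.
x = [12.0, 14.0, 11.0, 15.0, 13.0]   # n1 = 5
y = [10.0, 13.0, 12.0, 11.0]         # n2 = 4

n1, n2 = len(x), len(y)
pooled_var = (n1 * pvariance(x) + n2 * pvariance(y)) / (n1 + n2 - 2)  # (3.8.17)
S = pooled_var ** 0.5

# t of (3.8.16) under the hypothesis mu1 = mu2, with n1 + n2 - 2 = 7 d.f.
t = (fmean(x) - fmean(y)) / (S * (1 / n1 + 1 / n2) ** 0.5)
print(S, t)
```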
(D) Snedecor's F Distribution

A random variable is said to follow F distribution with degrees of freedom (n₁, n₂) if its p.d.f. is of the form

f(F) = K F^((n₁/2)−1) (n₂ + n₁F)^(−(n₁+n₂)/2),  (0 < F < ∞)     (3.8.18)

where K is a constant. The distribution was discovered by G.W. Snedecor, and named F in honour of the distinguished mathematical statistician Sir R.A. Fisher. The

percentage points of F distribution with d.f. (n₁, n₂) are denoted by F(p; n₁, n₂), or briefly F(p), if the d.f. are understood. The lower percentage points are given by

F(1 − p; n₁, n₂) = 1/F(p; n₂, n₁)     (3.8.19)
i.e., the lower percentage point is the reciprocal of the upper percentage point with
the order of d.f. reversed.

Characteristics:
1. Mean = n₂/(n₂ − 2);  Mode = n₂(n₁ − 2)/[n₁(n₂ + 2)];
S.D. = [n₂/(n₂ − 2)] √[2(n₁ + n₂ − 2)/(n₁(n₂ − 4))];
provided they exist and are positive.
2. The F-curve is positively skew, and starting from 0 extends to infinity [Fig. 3.1(D)].
Theorem VII
If y₁ and y₂ are independent chi-square variates with degrees of freedom n₁ and n₂ respectively, then

F = (y₁/n₁)/(y₂/n₂)     (3.8.20)

follows F distribution with degrees of freedom (n₁, n₂).
In view of the results given in Theorem III, and as a consequence of the above theorem, we have
Theorem VIII
If x₁, x₂, ..., x(n₁) and y₁, y₂, ..., y(n₂) are independent random samples of sizes n₁ and n₂ respectively from two normal populations with means μ₁, μ₂ and common standard deviation σ, then
